Multi-sensor imagery has been used by researchers for the semantic segmentation of images of buildings and outdoor scenes. Because such methods require large amounts of multi-sensor data, researchers have implemented many simulation approaches to create synthetic datasets. They have also synthesized thermal images, because thermal information can potentially improve segmentation accuracy. However, current approaches are mostly based on the laws of physics and are limited by the level of detail (LOD) of the geometric models, which describes the overall planning or modeling state. Another issue with current physics-based approaches is that the thermal images cannot be aligned with the RGB images, although this alignment is important for segmentation. The misalignment arises because the configuration of the virtual camera used to render thermal images is difficult to synchronize with that of the real camera used to capture RGB images. In this study, we propose an image translation approach that directly converts RGB images into simulated thermal images to expand segmentation datasets. We aim to investigate the benefits of using an image translation approach to generate synthetic aerial thermal images and to compare it with physics-based approaches. Our datasets for generating thermal images are from a city center and a university campus in Karlsruhe, Germany. We found that using the generating model trained on the city center dataset to generate thermal images for the campus dataset performed better than using the model trained on the campus dataset to generate thermal images for the city center dataset. We also found that a generating model trained on one building style performed well when generating thermal images for datasets with the same building style. Therefore, for an image translation approach, we suggest using training datasets that contain richer and more diverse architectural information and more complex building envelope structures, and whose building styles are similar to those of the testing datasets.