Homogenization of Multi-Institutional Chest X-Ray Images in Various Data Transformation Schemes

Published

Journal of Medical Imaging, 2023, 10(6): 061103

Authors

Hyeongseok Kim¹, Seoyoung Lee², Woo Jung Shim³, Min-Seong Choi³, Seungryong Cho¹

Affiliations

¹Institute for Artificial Intelligence, Korea Advanced Institute of Science and Technology (Republic of Korea)
²Korea Advanced Institute of Science and Technology (Republic of Korea)
³Radisen Co., Ltd. (Republic of Korea)

Purpose

Although there are several options for improving the generalizability of learned models, a data instance-based approach is desirable when stable data acquisition conditions cannot be guaranteed. Data transformation methods are widely used to reduce discrepancies between data domains, yet a detailed analysis explaining their performance is lacking.

Approach

This study compares several data transformation methods on a tuberculosis detection task with multi-institutional chest x-ray (CXR) data. Five data transformations were implemented: normalization, standardization with and without lung masking, and multi-frequency-based (MFB) standardization with and without lung masking. A tuberculosis detection network was trained on a reference dataset, and data from six other sites were used to compare network performance. To analyze data harmonization performance, we extracted radiomic features, calculated the Mahalanobis distance, and visualized the features with a dimensionality reduction technique. Deep features of the trained networks were analyzed with similar methods to examine the models’ responses to data from the various sites.
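To make the simpler transformations concrete, the following is a minimal sketch, not the authors' implementation. It assumes that "normalization" means min-max scaling to [0, 1], that "standardization" means z-scoring, and that lung masking means computing the mean and standard deviation from lung pixels only and then applying them to the whole image. The function names, the `eps` parameter, and the synthetic image are hypothetical. MFB standardization is not sketched because the abstract does not describe its details.

```python
import numpy as np


def normalize(image):
    """Min-max scale an image to [0, 1] (assumed meaning of 'normalization')."""
    image = image.astype(np.float64)
    lo, hi = image.min(), image.max()
    if hi == lo:
        # A constant image carries no contrast; map it to zeros.
        return np.zeros_like(image)
    return (image - lo) / (hi - lo)


def standardize(image, mask=None, eps=1e-8):
    """Z-score an image (assumed meaning of 'standardization').

    If a lung mask is given, the mean and standard deviation are taken
    from the masked pixels only, then applied to the whole image.
    """
    image = image.astype(np.float64)
    region = image if mask is None else image[mask.astype(bool)]
    if region.size == 0:
        raise ValueError("mask selects no pixels")
    return (image - region.mean()) / (region.std() + eps)


# Synthetic stand-in for a CXR and a rectangular stand-in for a lung mask.
rng = np.random.default_rng(0)
cxr = rng.normal(loc=120.0, scale=30.0, size=(64, 64))
lung_mask = np.zeros((64, 64), dtype=bool)
lung_mask[16:48, 8:56] = True

cxr_norm = normalize(cxr)
cxr_std = standardize(cxr)
cxr_std_masked = standardize(cxr, lung_mask)
```

Under these assumptions, masking restricts the intensity statistics to the lung region, so content outside the lungs does not shift the scaling.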

Results

Across the numerical assessments, MFB standardization with lung masking provided the highest network performance on the non-reference datasets. In the radiomic and deep feature analyses, the features of the multi-site CXRs after MFB standardization with lung masking were well homogenized to the reference data, whereas the other transformations showed limited performance.
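One way to quantify how well a site's features are homogenized to the reference data is the Mahalanobis distance named in the Approach. The sketch below is illustrative only. It assumes the distance is measured from each sample's feature vector to the reference site's feature distribution (its mean and covariance); the function name and the synthetic feature arrays are hypothetical and do not come from the study.

```python
import numpy as np


def mahalanobis_to_reference(features, reference):
    """Mahalanobis distance of each row of `features` to the `reference` set.

    Both arrays have shape (n_samples, n_features). The mean and covariance
    are estimated from the reference data only (an assumption of this sketch).
    """
    mean = reference.mean(axis=0)
    cov = np.cov(reference, rowvar=False)
    # The pseudo-inverse tolerates a singular covariance matrix.
    cov_inv = np.linalg.pinv(cov)
    diff = features - mean
    d2 = np.einsum("ij,jk,ik->i", diff, cov_inv, diff)
    return np.sqrt(np.maximum(d2, 0.0))


# Synthetic stand-ins for feature vectors from a reference site and two others.
rng = np.random.default_rng(0)
reference = rng.normal(size=(200, 4))
matched_site = rng.normal(size=(50, 4))            # same distribution
shifted_site = rng.normal(loc=3.0, size=(50, 4))   # offset distribution

d_matched = mahalanobis_to_reference(matched_site, reference).mean()
d_shifted = mahalanobis_to_reference(shifted_site, reference).mean()
```

In this toy setup, the site whose features follow the reference distribution has a smaller mean distance than the shifted site, which is the sense in which a smaller distance indicates better homogenization.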

Conclusions

Conventional normalization and standardization showed suboptimal performance in minimizing feature differences among various sites. Our study emphasizes the strengths of MFB standardization with lung masking in terms of network performance and feature homogenization.
