Sparse-view CBCT reconstruction using meta-learned neural attenuation field and hash-encoding regularization

Published

Computers in Biology and Medicine

Authors

Heejun Shin ^a, Taehee Kim ^a, Jongho Lee ^b, Se Young Chun ^c, Seungryong Cho ^d, Dongmyung Shin ^a

Affiliations

^aArtificial Intelligence Engineering Division, Radisen Co. Ltd., Seoul, Republic of Korea
^bLaboratory for Imaging Science and Technology, Department of Electrical and Computer Engineering, Seoul National University, Seoul, Republic of Korea
^cIntelligent Computational Imaging Laboratory, Department of Electrical and Computer Engineering, Seoul National University, Seoul, Republic of Korea
^dMedical Imaging and Radiotherapy Laboratory, Department of Nuclear and Quantum Engineering, Korea Advanced Institute of Science and Technology, Daejeon, Republic of Korea

Highlights

•Limited number of projections in CBCT scans degrades image quality significantly.

•The proposed method provides good-quality CBCT images from a limited number of projections (< 50).

•The method produces consistent quality across different body parts and scanners.

Abstract

Cone beam computed tomography (CBCT) is an emerging medical imaging technique to visualize the internal anatomical structures of patients. During a CBCT scan, several projection images acquired at different angles (views) are used together to reconstruct a tomographic image. However, reducing the number of projections in a CBCT scan while preserving the quality of the reconstructed image is challenging because the reconstruction is an ill-posed inverse problem. Recently, a neural attenuation field (NAF) method was proposed, adapting a neural radiance field algorithm to CBCT reconstruction and demonstrating fast and promising results using only 50 views. However, decreasing the number of projections further is still preferable to reduce potential radiation exposure, and a faster reconstruction time is required considering a typical scan time. In this work, we propose a fast and accurate sparse-view CBCT reconstruction (FACT) method that provides better reconstruction quality and faster optimization with a minimal number of views (< 50 views). In the FACT method, we meta-trained a neural network and a hash-encoder using a few scans (= 15), and a new regularization technique is utilized to reconstruct the details of an anatomical structure. In conclusion, we have shown that the FACT method produced better and faster reconstruction results than the other conventional algorithms on CBCT scans of different body parts (chest, head, and abdomen) and CT vendors (Siemens, Philips, and GE).

Introduction

Cone beam computed tomography (CBCT) is an emerging medical imaging technique to visualize the internal anatomical structures of patients. Compared to fan beam CT (FBCT), CBCT offers higher image resolution and faster scanning time [1]. During a CBCT scan, a cone-shaped X-ray beam diverges from a source that rotates circularly around the anatomy of interest together with an X-ray detector. This results in several projection images at different angles (i.e., views) that are used together to reconstruct a tomographic image. Nowadays, CBCT is widely adopted, for example as an imaging guidance or patient positioning tool, for anatomies including teeth [2], [3], extremities [4], [5], and chest [6], [7].

However, the amount of ionizing radiation delivered to a patient during a CBCT scan is much higher than in conventional radiography, which limits its wider application [1], [8]. Reducing the number of projections (i.e., sparse-view; < 100 views) in a CBCT scan can effectively reduce the exposure; however, preserving the quality of the reconstructed image is challenging because the reconstruction is an ill-posed inverse problem [9].

Conventional CBCT reconstruction methods such as the Feldkamp–Davis–Kress (FDK) algorithm [10] provide the best image quality in an ordinary CBCT scan with hundreds of views but suffer from streak artifacts in a sparse-view scan. Iterative optimization algorithms, such as the simultaneous algebraic reconstruction technique (SART) or adaptive steepest descent-projection onto convex sets (ASD-POCS) [11], [12], produce good image quality with a limited number of views, but they require more computational time and produce unsatisfactory results when the number of views becomes very limited (e.g., 50 views or fewer).
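To make the iterative family concrete, the following is a minimal sketch of a SART-style update on a toy linear system. It is an illustration only and not taken from the paper: the system matrix `A`, the 2×2 "image", the five rays, and the function name `sart` are all assumptions, and a real CBCT projector would replace the dense matrix.

```python
import numpy as np


def sart(A, b, n_iters=500, relax=1.0):
    """SART-style iteration for A x = b, assuming a non-negative matrix A."""
    A = np.asarray(A, dtype=float)
    b = np.asarray(b, dtype=float)
    row_sums = A.sum(axis=1)  # total weight along each ray
    col_sums = A.sum(axis=0)  # total weight each voxel contributes to all rays
    row_sums[row_sums == 0] = 1.0  # guard against empty rays
    col_sums[col_sums == 0] = 1.0  # guard against unseen voxels
    x = np.zeros(A.shape[1])
    for _ in range(n_iters):
        # Normalize the projection residual per ray, back-project it,
        # then normalize per voxel.
        residual = (b - A @ x) / row_sums
        x = x + relax * (A.T @ residual) / col_sums
    return x


# Hypothetical toy problem: a 2x2 image (4 unknowns) seen by 5 rays
# (two rows, two columns, one diagonal).
A = np.array([
    [1, 1, 0, 0],
    [0, 0, 1, 1],
    [1, 0, 1, 0],
    [0, 1, 0, 1],
    [1, 0, 0, 1],
], dtype=float)
x_true = np.array([1.0, 2.0, 3.0, 4.0])
b = A @ x_true
x_rec = sart(A, b)
```

In this toy case the five rays determine the image uniquely, so the iteration recovers `x_true`. With very few views the system becomes underdetermined, which is the regime where the text reports that iterative methods give unsatisfactory results.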

More recently, several deep learning-based FBCT reconstruction techniques, primarily based on convolutional neural networks (CNNs), have achieved superior image quality in a sparse-view setting. These methods typically fall into one of three categories: sinogram-domain methods [13], [14], [15] that directly interpolate or extrapolate sinograms to fill in missing information, image-domain methods [16], [17], [18], [19] that use CNNs to restore reconstructed images containing artifacts, and dual-domain methods [20], [21], [22], [23], [24] that combine these two approaches to exploit mutual information in both domains. However, these methods, which rely on one sinogram to reconstruct each 2D FBCT slice, are difficult to generalize to CBCT reconstruction, which requires using multiple 2D projections simultaneously to reconstruct a 3D tomographic image. In addition, the methods inherently require a large amount of FBCT training data and suffer from potential training biases.

Recently, the neural radiance field (NeRF) algorithm [25] has gained tremendous popularity as an effective way to synthesize novel views of natural scenes or objects from a set of captured views. In NeRF, rather than a complex CNN, a simple feed-forward network is adopted to learn the relationship (i.e., an implicit neural representation [26]) between the spatial coordinates in a 3D volume and quantities at these coordinates, such as colors and densities, to synthesize novel views. Based on the NeRF method, [27] proposed a new self-supervised method for CBCT reconstruction, called a neural attenuation field (NAF), demonstrating fast CBCT reconstruction with good quality using only 50 projections. Like the NeRF method, the NAF method does not require any training data, and it produces better reconstruction results than other conventional methods.
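The idea of an implicit neural representation for attenuation can be sketched in a few lines. The code below is an illustrative assumption, not the NAF or FACT implementation: it uses an untrained plain MLP (no hash encoding), a softplus output to keep attenuation non-negative, and a midpoint-rule line integral along a single ray as a stand-in for the X-ray projection model. All function names, layer sizes, and ray parameters are hypothetical.

```python
import numpy as np

rng = np.random.default_rng(0)


def init_mlp(sizes):
    """Randomly initialized weights and biases for a small feed-forward network."""
    return [
        (rng.normal(0.0, 1.0 / np.sqrt(m), (m, n)), np.zeros(n))
        for m, n in zip(sizes[:-1], sizes[1:])
    ]


def attenuation_field(params, xyz):
    """Map 3D coordinates of shape (N, 3) to non-negative attenuation values (N,)."""
    h = xyz
    for W, b in params[:-1]:
        h = np.maximum(h @ W + b, 0.0)  # ReLU hidden layers
    W, b = params[-1]
    out = h @ W + b
    return np.logaddexp(0.0, out)[:, 0]  # softplus keeps the output positive


def render_ray(field_fn, origin, direction, near, far, n_samples=64):
    """Approximate the line integral of attenuation along one ray (midpoint rule)."""
    direction = direction / np.linalg.norm(direction)
    delta = (far - near) / n_samples
    t = near + (np.arange(n_samples) + 0.5) * delta
    points = origin + t[:, None] * direction
    return float(np.sum(field_fn(points)) * delta)


params = init_mlp([3, 32, 32, 1])
origin = np.array([-1.0, 0.0, 0.0])
direction = np.array([1.0, 0.0, 0.0])
projection = render_ray(lambda p: attenuation_field(params, p), origin, direction, 0.0, 2.0)

# Sanity check with a constant field: attenuation 0.5 over a path of length 2.
constant = render_ray(lambda p: np.full(len(p), 0.5), origin, direction, 0.0, 2.0)
```

In a self-supervised setup of this kind, the network weights would be optimized so that rendered ray integrals match the measured projections of one scan, which is why no external training data is needed.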

Although the NAF method reported promising results as a novel approach to CBCT reconstruction, some room for improvement remains. First, decreasing the number of projections, even below 50 views, is preferable to reduce radiation exposure, as long as the image quality remains acceptable for certain clinical purposes (e.g., imaging guidance [7], patient positioning [28], emergency scans [29], etc.). Second, a faster reconstruction time is desirable since the convergence time of NAF is still relatively long compared to a typical CBCT scan time [30].

In this work, we propose a fast and accurate sparse-view CBCT reconstruction (FACT) method, which utilizes a meta-learning framework and a novel feature-encoding technique to alleviate the potential disadvantages of NAF. In the FACT method, we first pre-trained a neural network using a small set of CBCT scans (= 15 scans) based on a meta-learning framework. Then, this meta-learned network is used to quickly reconstruct a 3D tomographic image from a new CBCT scan at test time. During the reconstruction, we regularized the hash-encoding process to further optimize the details of the anatomical structure even with a minimal number of views (< 50 views).
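The benefit of a meta-learned initialization can be illustrated on a toy problem. The sketch below is an assumption throughout: the text here does not name the meta-learning algorithm, so a Reptile-style update is swapped in for illustration, and 1D linear-regression "tasks" stand in for CBCT scans. Only the task count of 15 mirrors the number of scans mentioned above; all function names and hyperparameters are hypothetical.

```python
import numpy as np


def make_task(rng):
    """Hypothetical task: fit y = slope * x + intercept, with similar tasks."""
    slope = 2.0 + 0.1 * rng.standard_normal()
    intercept = -1.0 + 0.1 * rng.standard_normal()
    x = rng.uniform(-1.0, 1.0, 32)
    return x, slope * x + intercept


def loss_and_grad(theta, x, y):
    """Mean squared error of the linear model and its gradient."""
    err = theta[0] * x + theta[1] - y
    grad = np.array([np.mean(2.0 * err * x), np.mean(2.0 * err)])
    return float(np.mean(err ** 2)), grad


def adapt(theta, x, y, steps, lr=0.1):
    """Inner loop: a few gradient steps on one task, starting from theta."""
    phi = theta.copy()
    for _ in range(steps):
        _, grad = loss_and_grad(phi, x, y)
        phi -= lr * grad
    return phi


def reptile(n_tasks=15, meta_epochs=50, inner_steps=5, meta_lr=0.5, seed=0):
    """Outer loop (Reptile-style, an assumption): move theta toward adapted weights."""
    rng = np.random.default_rng(seed)
    tasks = [make_task(rng) for _ in range(n_tasks)]
    theta = np.zeros(2)
    for _ in range(meta_epochs):
        for x, y in tasks:
            phi = adapt(theta, x, y, inner_steps)
            theta += meta_lr * (phi - theta)
    return theta


theta_meta = reptile()

# Test time: adapt to an unseen task with only a few steps.
x_new, y_new = make_task(np.random.default_rng(1))
loss_meta, _ = loss_and_grad(adapt(theta_meta, x_new, y_new, steps=3), x_new, y_new)
loss_zero, _ = loss_and_grad(adapt(np.zeros(2), x_new, y_new, steps=3), x_new, y_new)
```

Because the tasks are similar, the meta-learned starting point lies close to every task's solution, so the same small number of adaptation steps ends at a much lower loss than starting from zeros. This mirrors the motivation in the text: a good initialization shortens per-scan optimization.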

In summary, the main contributions of this paper are the following:

– We propose the FACT method, which provides faster optimization and better CBCT reconstruction quality than the NAF method with a minimal number of views (< 50 views), based on a meta-learning framework and a novel hash-encoding regularization process.

– Unlike previous supervised CT reconstruction methods that require a large amount of CT data (> a few hundred scans) to train neural networks, the FACT method requires only a small number of scans (= 15 scans) to meta-train a neural network, avoiding the preparation of a large amount of data and potential biases in AI training.

– Through extensive experiments, we have shown that, regardless of CT vendor (Siemens, Philips, and GE) and body part (chest, head, and abdomen), the FACT method produced consistently better CBCT reconstruction results.

Section snippets

Sparse-view CT reconstruction

Classical CBCT reconstruction algorithms [10], [11], [12] include analytical methods (e.g., FDK) that estimate attenuation coefficients based on the inverse Radon transform and iterative methods (e.g., SART, ASD-POCS) that repeat reconstruction updates until convergence. However, FDK works best when hundreds of projections are available and suffers from artifacts in a sparse-view scan. In contrast, iterative methods, such as ASD-POCS, can suppress artifacts but consume

CBCT reconstruction based on implicit neural representation

The goal of CBCT reconstruction is to extract 3D tomographic information using a set of 2D X-ray projections. Compared to an FBCT scan, a CBCT scan uses a circularly rotating X-ray source with a panel detector to acquire a set of N 2D X-ray images with different views (i.e., X-ray projection set; X = {X_1, X_2, …, X_N}, X_i ∈ ℝ^(H×W), where H and W are the height and width of images, respectively). The CBCT acquisition (φ), therefore, depends on a 3D object of interest (C), scanning angles of each view α = {α_1, α_2, …,) …

CBCT reconstruction results

Fig. 2 plots the 3D SSIM and 3D PSNR values against the number of input views (50, 40, 30, 20, and 10 views) for the SART, ASD-POCS, NAF, and FACT methods. Regardless of the number of views, the FACT method outperformed the others (see Table 1 for details, including FDK). In the SSIM graph in particular (Fig. 2a), FACT showed larger gains over the other methods as the number of views decreased. Indeed, when we investigated the reconstructed images (Fig. 3), …
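For readers unfamiliar with the metric, a volume-wise PSNR can be computed as in the minimal sketch below. This is an assumption about how such a metric is commonly defined, not the paper's evaluation code: the function name `psnr_3d`, the default choice of data range (the reference volume's value range), and the toy volumes are all hypothetical.

```python
import numpy as np


def psnr_3d(reference, estimate, data_range=None):
    """Peak signal-to-noise ratio in dB, computed over a whole 3D volume."""
    reference = np.asarray(reference, dtype=float)
    estimate = np.asarray(estimate, dtype=float)
    if reference.shape != estimate.shape:
        raise ValueError("volumes must have the same shape")
    if data_range is None:
        # Assumption: use the value range of the reference volume.
        data_range = reference.max() - reference.min()
    mse = np.mean((reference - estimate) ** 2)
    if mse == 0:
        return float("inf")  # identical volumes
    return float(10.0 * np.log10(data_range ** 2 / mse))


# Hypothetical toy volumes: a reference with value range 1 and an estimate
# that is offset by 0.1 everywhere (mean squared error 0.01).
reference = np.zeros((4, 4, 4))
reference[0, 0, 0] = 1.0
estimate = reference + 0.1
score = psnr_3d(reference, estimate)
```

Higher values mean the estimate is closer to the reference. PSNR only measures voxel-wise error, which is why it is usually reported alongside a structural metric such as SSIM, as in the figure described above.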

Discussion

The proposed FACT reconstruction method utilizes a meta-learning framework to pre-train a neural network using a small set of CBCT scans and regularizes the hash-encoding process to increase the optimization speed and the quality of the CBCT reconstruction with no additional computational cost. Although the reconstruction results were not yet good enough for diagnostic use when compared to the ground truths, we showed that the 3D CBCT images of FACT had better quality than those of the other …

Conclusion

In conclusion, with the meta-initialization and the new regularization process, the FACT method demonstrated faster and more accurate sparse-view CBCT reconstruction than the other conventional (FDK, SART, and ASD-POCS) and NAF methods. The proposed method is expected to provide benefits in some clinical situations, such as CBCT imaging guidance, patient positioning, and emergency scans [7], [28], [29], ensuring consistent reconstruction quality across multiple CT machines (Siemens, …
