Research on automatic redshift estimation of astronomical spectra based on a CNN architecture with correction mechanism
Bi, Jingzhuo (2026)
Bachelor's thesis
Bi, Jingzhuo
2026
School of Engineering Science, Information Technology
The permanent address of this publication is
https://urn.fi/URN:NBN:fi-fe2026052553442
Abstract
Spectroscopic redshift is a fundamental observational quantity in extragalactic astronomy, and modern sky surveys require reliable automated methods for estimating it. CNN-based redshift regression models trained with a mean-squared-error loss may exhibit regression-to-the-mean behaviour when the training redshift distribution is imbalanced, leading to systematic overprediction at low redshifts and underprediction at high redshifts.
This thesis evaluates a CNN architecture with a residual correction branch designed to mitigate this effect. A shared 1-D convolutional feature extractor feeds an initial redshift predictor and a correction branch conditioned on the initial prediction, which estimates a residual offset. A cleaned dataset of 49,252 SDSS galaxy spectra covering the range 0 < z < 0.5 is prepared with wavelength alignment, median despiking, and robust z-score normalisation. The proposed model is compared against a baseline CNN of comparable capacity under identical training conditions.
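The abstract names median despiking and robust z-score normalisation without giving their parameters. The following is a minimal sketch of one common way to implement both steps, not the thesis's actual pipeline. The function names, the window `half_width`, the clipping `threshold`, and the use of the median absolute deviation (MAD) as the scale estimate are all assumptions made for illustration.

```python
from statistics import median

# Scales the MAD to a Gaussian-equivalent standard deviation.
MAD_TO_SIGMA = 1.4826


def median_despike(flux, half_width=2, threshold=5.0):
    """Replace isolated spikes with the local running median.

    half_width and threshold are illustrative values, not taken from the thesis.
    """
    n = len(flux)
    # Running median over a window of up to 2 * half_width + 1 pixels.
    local = [
        median(flux[max(0, i - half_width):min(n, i + half_width + 1)])
        for i in range(n)
    ]
    residual = [f - m for f, m in zip(flux, local)]
    # Global noise estimate from the residuals about the running median.
    noise = MAD_TO_SIGMA * median(abs(r) for r in residual)
    # If noise is zero, every pixel that differs from its local median is replaced.
    return [
        m if abs(r) > threshold * noise else f
        for f, m, r in zip(flux, local, residual)
    ]


def robust_zscore(flux):
    """Centre on the median and scale by the MAD instead of mean and std."""
    centre = median(flux)
    scale = MAD_TO_SIGMA * median(abs(f - centre) for f in flux)
    if scale == 0:
        # Degenerate case: more than half the pixels equal the median.
        return [0.0 for _ in flux]
    return [(f - centre) / scale for f in flux]


if __name__ == "__main__":
    spectrum = [1.0, 1.1, 0.9, 1.0, 50.0, 1.0, 1.1, 0.9, 1.0]
    cleaned = median_despike(spectrum)
    print(robust_zscore(cleaned))
```

Median and MAD are used here because, unlike the mean and standard deviation, they are barely affected by a few extreme pixels such as cosmic-ray hits or strong emission lines.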
On the held-out test set, the proposed model achieves σ_NMAD = 0.0065, an absolute bias of approximately 10⁻³ in Δz/(1 + z) units, and a catastrophic outlier rate of 0.07%, corresponding to a 14.5% reduction in σ_NMAD relative to the baseline. The redshift-dependent bias trend observed in the baseline model is also reduced in amplitude, suggesting that an explicit residual correction mechanism conditioned on the initial prediction can help mitigate systematic error in CNN-based spectroscopic redshift estimation.
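The reported metrics can be illustrated with a short sketch. It assumes the conventional definitions used in the redshift-estimation literature, which the abstract does not spell out: the normalised residual is (z_pred − z_true)/(1 + z_true), σ_NMAD is 1.4826 times the median absolute deviation of those residuals, and a catastrophic outlier is a residual whose magnitude exceeds a chosen threshold. The function names are hypothetical, and the threshold is left as a required argument because the abstract does not state the value used in the thesis.

```python
from statistics import mean, median


def normalised_residuals(z_pred, z_true):
    """Residuals in units of dz / (1 + z), using the true redshift in the denominator."""
    return [(p - t) / (1.0 + t) for p, t in zip(z_pred, z_true)]


def sigma_nmad(delta):
    """Normalised median absolute deviation of the residuals (assumed definition)."""
    centre = median(delta)
    return 1.4826 * median(abs(d - centre) for d in delta)


def bias(delta):
    """Mean normalised residual; the abstract quotes its absolute value."""
    return mean(delta)


def outlier_rate(delta, threshold):
    """Fraction of residuals whose magnitude exceeds the given threshold."""
    return sum(abs(d) > threshold for d in delta) / len(delta)


def relative_reduction(baseline, proposed):
    """Fractional improvement of a metric relative to the baseline value."""
    return (baseline - proposed) / baseline


if __name__ == "__main__":
    # Toy numbers for illustration only; these are not thesis data.
    z_true = [0.1, 0.2, 0.3, 0.4]
    z_pred = [0.11, 0.2, 0.29, 0.8]
    delta = normalised_residuals(z_pred, z_true)
    print(sigma_nmad(delta), bias(delta), outlier_rate(delta, threshold=0.15))
```

Under these definitions, a 14.5% reduction ending at σ_NMAD = 0.0065 would imply a baseline value of roughly 0.0065 / (1 − 0.145) ≈ 0.0076. That figure is back-calculated from rounded numbers and is not quoted in the abstract.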
