Abstract
Contributed Talk - Splinter EScience
Thursday, 12 September 2024, 16:15 (S13)
Performance analysis of source classification using the Gaia DR3
S. Jamal, C. Bailer-Jones
Max Planck Institute for Astronomy
In Gaia Data Release 3 (GDR3), the Discrete Source Classifier (DSC) provides probabilistic classification of sources using machine learning algorithms, a Bayesian framework and a global prior. For the extragalactic classes (quasars and galaxies), the DSC Combmod classifier in GDR3 achieved a high completeness but a low purity due to contamination from the star class. However, single classification metrics mask significant variation in performance with magnitude and sky position. We introduce two-dimensional (2-d) representations of the completeness and the purity as a function of Galactic latitude and source brightness. We also reevaluate the DSC performance on a cleaner validation set (excluding contamination from the Magellanic Clouds), and find for GDR3 DSC Combmod average 2-d completenesses ≳ 92% and average 2-d purities of 55% and 89% for the quasar and galaxy classes, respectively. Using the published DSC probabilities, we introduce a new parametric combination, named Combmod-α, that achieves a significant improvement in purity for a small loss of completeness for the extragalactic classes. On the cleaner validation set and using the global prior, we obtain average 2-d completenesses of 82% and 93% and average 2-d purities of 79% and 93% for the quasar and galaxy classes, respectively. The extragalactic catalog from the new combiner constitutes a pure sample (about 1.7M quasar candidates and 2.9M galaxy candidates, excluding contamination from the Magellanic Clouds).
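As an illustration of the 2-d completeness and purity described above, the following is a minimal sketch, not the authors' implementation. It assumes a validation set with a true label, a predicted label, a Galactic latitude and a G magnitude per source. The function name, the bin edges and the toy data are hypothetical. The sketch uses raw counts per bin and applies no correction for class fractions or priors, and it takes the "average 2-d" value to be the unweighted mean over non-empty bins, which is also an assumption.

```python
import numpy as np

def completeness_purity_2d(is_true, is_pred, lat_deg, g_mag, lat_edges, mag_edges):
    """Per-bin completeness and purity for one class (hypothetical helper).

    is_true / is_pred: boolean arrays, True where a source truly belongs /
    is classified as belonging to the class of interest.
    Returns two arrays of shape (len(lat_edges)-1, len(mag_edges)-1);
    bins with an undefined ratio hold NaN.
    """
    is_true = np.asarray(is_true, dtype=bool)
    is_pred = np.asarray(is_pred, dtype=bool)
    lat_deg = np.asarray(lat_deg, dtype=float)
    g_mag = np.asarray(g_mag, dtype=float)

    def count(mask):
        # Number of selected sources in each (latitude, magnitude) bin.
        hist, _, _ = np.histogram2d(lat_deg[mask], g_mag[mask],
                                    bins=[lat_edges, mag_edges])
        return hist

    tp = count(is_true & is_pred)    # true positives
    fn = count(is_true & ~is_pred)   # false negatives
    fp = count(~is_true & is_pred)   # false positives

    with np.errstate(divide="ignore", invalid="ignore"):
        completeness = np.where(tp + fn > 0, tp / (tp + fn), np.nan)
        purity = np.where(tp + fp > 0, tp / (tp + fp), np.nan)
    return completeness, purity

# Toy data (invented for illustration): six sources in two occupied bins.
lat = [10, 10, 10, 60, 60, 60]
mag = [15, 15, 15, 19, 19, 19]
truth = [True, True, False, True, False, False]
pred = [True, False, True, True, True, False]

comp, pur = completeness_purity_2d(truth, pred, lat, mag,
                                   lat_edges=[0, 45, 90],
                                   mag_edges=[14, 17, 20])

# Unweighted averages over the bins where the metric is defined.
avg_completeness = np.nanmean(comp)
avg_purity = np.nanmean(pur)
```

Keeping the per-bin arrays rather than a single number is what exposes the variation with magnitude and sky position that a global metric hides.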