Estimation of missing values in astronomical survey data: An improved local approach using cluster directed neighbor selection

2022, Information Processing & Management

This paper develops a new imputation method to better handle missing values encountered in astronomical data analysis, in particular the classification of transient events in a sky survey from the GOTO project. The framework of cluster-directed neighbor selection, which has proven effective for the benchmark local imputation techniques KNNimpute and LLSimpute, is extended to new multi-stage models. The resulting combinations, Iterative-CKNN and Iterative-CLLS, are novel, as is their application to the analysis of sky survey data. They bring together advantages from both local approaches: estimates are summarized from neighbors in the same data cluster, within an iterative process that refines previous guesses. In experiments with simulated datasets corresponding to different survey sizes and missing ratios between 1% and 20%, they usually outperform the baseline models and BPCA, a well-known global technique. For instance, at a 10% missing rate, Iterative-CLLS appears to be the most accurate, with an NRMSE of 0.190, while BPCA and the best of its baseline models reach 0.351 and 0.249, respectively. In practical terms, these methods have also proven effective for classifying transients using common algorithms such as KNN, Naive Bayes and Random Forest.
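To make the idea of cluster-directed, iterative neighbor-based imputation concrete, the following is a minimal Python sketch. It is not the authors' Iterative-CKNN implementation: the abstract does not specify the clustering method, the distance measure, the neighbor weighting, or the stopping rule, so everything here is an assumption. In particular, the function names (`kmeans_labels`, `iterative_cluster_knn_impute`, `nrmse`), the use of a small k-means with farthest-point initialisation, the unweighted neighbor mean, the fixed number of rounds, and the NRMSE normalisation (RMSE over the missing entries divided by the standard deviation of their true values) are illustrative choices only.

```python
import numpy as np

def kmeans_labels(X, n_clusters, n_iter=20):
    """Tiny k-means (assumed clustering step), farthest-point initialisation."""
    centers = [X[0]]
    for _ in range(1, n_clusters):
        # Squared distance of every row to its closest chosen center.
        d = np.min([((X - c) ** 2).sum(axis=1) for c in centers], axis=0)
        centers.append(X[d.argmax()])
    centers = np.array(centers, dtype=float)
    for _ in range(n_iter):
        dists = ((X[:, None, :] - centers[None, :, :]) ** 2).sum(axis=2)
        labels = dists.argmin(axis=1)
        for c in range(n_clusters):
            members = X[labels == c]
            if len(members) > 0:
                centers[c] = members.mean(axis=0)
    return labels

def iterative_cluster_knn_impute(X, n_clusters=2, k=3, n_rounds=3):
    """Hypothetical sketch: KNN estimates restricted to same-cluster rows,
    repeated so that each round starts from the previous round's guesses."""
    X = np.asarray(X, dtype=float)
    missing = np.isnan(X)
    # Initial guess: column means of the observed values.
    filled = np.where(missing, np.nanmean(X, axis=0), X)
    for _ in range(n_rounds):
        labels = kmeans_labels(filled, n_clusters)
        updated = filled.copy()
        for i in np.flatnonzero(missing.any(axis=1)):
            peers = np.flatnonzero(labels == labels[i])
            peers = peers[peers != i]
            if len(peers) == 0:
                continue  # no same-cluster neighbor: keep the current guess
            obs = ~missing[i]
            # Distances use only the columns observed in row i.
            d = np.linalg.norm(filled[peers][:, obs] - filled[i, obs], axis=1)
            nearest = peers[np.argsort(d)[:k]]
            updated[i, missing[i]] = filled[nearest][:, missing[i]].mean(axis=0)
        filled = updated
    return filled

def nrmse(x_true, x_imputed, missing):
    """Assumed definition: RMSE on missing entries / std of their true values."""
    err = x_true[missing] - x_imputed[missing]
    return np.sqrt(np.mean(err ** 2)) / np.std(x_true[missing])
```

In use, one would mask known entries of a complete table with `np.nan`, call `iterative_cluster_knn_impute` on the masked copy, and compare the result against the original values with `nrmse`. An LLS-based variant would presumably replace the neighbor mean with a local least-squares fit, which is not shown here.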
