Jarosław Chilimoniuk (Białystok / PL), Krystyna Grzesiak (Białystok / PL; Wrocław / PL), Jakub Kołodziejczyk (Białystok / PL), Dominik Nowakowski (Białystok / PL), Adam Kretowski (Białystok / PL), Rafał Kolenda (Norwich / GB; Wrocław / PL), Michał Ciborowski (Białystok / PL), Michał Burdukiewicz (Białystok / PL; Barcelona / ES)
Missing values in proteomics data pose significant challenges, impacting downstream analysis and interpretation. To address this issue, we introduce imputomics 2.0, a wrapper that brings together a wide array of missing value imputation algorithms (MVIAs) tailored specifically for proteomic data. imputomics 2.0 will integrate multiple state-of-the-art imputation algorithms, including k-nearest neighbors (KNN), random forest (RF), expectation-maximization (EM), and singular value decomposition (SVD), offering researchers a single toolkit to handle missing data effectively.
imputomics 2.0 will provide a user-friendly web interface that simplifies the application of different imputation strategies, allowing researchers to experiment with various algorithms and parameter settings. This flexibility will enable users to customize and optimize imputation methods to suit the specific characteristics of individual proteomic datasets, ultimately enhancing the accuracy and reliability of downstream analyses.
Central to our approach is a unified framework that streamlines the use of different imputation algorithms. To fit into this framework, every implemented algorithm must satisfy three conditions. Uniformity: the wrapper function standardizes input and output to 'data.frames' for tidy data processing, preserving the original data structure and compatibility with auxiliary functions designed specifically for 'data.frames'. Input data verification: input data are automatically checked for missing values ('NAs') or zeros/ones, ensuring that only numeric, non-negative values are used, with optional additional integrity checks where the authors of the original imputation function recommend them. Passing arguments to function calls: relevant arguments are passed to the underlying imputation functions with their default values retained, any change to default behavior is clearly documented, and each function gains a boolean 'verbose' argument to suppress unnecessary screen prompts.
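The three conditions can be illustrated with a minimal sketch. It is written in Python purely for illustration and is not the imputomics R API: the names 'validate_table', 'impute_min' and 'make_wrapper' are hypothetical, a table is represented as a dictionary of columns instead of a 'data.frame', and the toy imputation method (a fraction of the column minimum) is an assumption chosen only to keep the example short.

```python
import math


def validate_table(table):
    """Input verification: equal-length, numeric, non-negative columns (NaN = missing)."""
    if len({len(col) for col in table.values()}) > 1:
        raise ValueError("all columns must have the same length")
    for name, col in table.items():
        for value in col:
            if isinstance(value, bool) or not isinstance(value, (int, float)):
                raise TypeError(f"column {name!r} contains a non-numeric value")
            if not math.isnan(value) and value < 0:
                raise ValueError(f"column {name!r} contains a negative value")


def impute_min(table, fraction=0.5):
    """Toy method (assumption): fill missing cells with a fraction of the column minimum."""
    result = {}
    for name, col in table.items():
        observed = [v for v in col if not math.isnan(v)]
        if not observed:
            result[name] = list(col)  # nothing to learn from; leave column as is
            continue
        fill = fraction * min(observed)
        result[name] = [fill if math.isnan(v) else v for v in col]
    return result


def make_wrapper(impute_fn):
    """Wrap an imputation function so that it follows the three conditions."""
    def wrapper(table, verbose=False, **kwargs):
        validate_table(table)                      # input data verification
        if verbose:
            print(f"running {impute_fn.__name__} with {kwargs}")
        result = impute_fn(table, **kwargs)        # arguments passed through, defaults kept
        if list(result) != list(table):            # uniformity: same columns, same order
            raise RuntimeError("imputation changed the table structure")
        return result
    return wrapper


nan = float("nan")
table = {"protein_1": [1.0, nan, 3.0], "protein_2": [4.0, 2.0, nan]}
impute = make_wrapper(impute_min)
imputed = impute(table, fraction=0.5)
```

Because every method is exposed through the same wrapper, a caller can swap one algorithm for another without changing how data are passed in or read back.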
Furthermore, we will conduct an extensive benchmark to evaluate the efficacy of each MVIA across diverse proteomic datasets. Additionally, we will examine the computational efficiency and scalability of each algorithm, crucial considerations for large-scale proteomic studies.
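One common way to set up such a benchmark, shown here only as a sketch under stated assumptions and not as the protocol used in this work, is to hide a fraction of known values in a complete dataset, impute them, and compare the imputed values with the hidden truth while timing the call. The function names, the root-mean-square error metric, the masking fraction and the mean-imputation baseline are all hypothetical choices made for this illustration, which is again written in Python.

```python
import math
import random
import time


def mask_values(table, fraction, seed=0):
    """Hide a random fraction of cells (set to NaN) and return their positions."""
    rng = random.Random(seed)
    cells = [(name, i) for name, col in table.items() for i in range(len(col))]
    hidden = rng.sample(cells, int(round(fraction * len(cells))))
    masked = {name: list(col) for name, col in table.items()}
    for name, i in hidden:
        masked[name][i] = float("nan")
    return masked, hidden


def rmse(truth, imputed, hidden):
    """Root-mean-square error over the hidden cells only."""
    if not hidden:
        raise ValueError("no cells were hidden")
    squared = [(imputed[name][i] - truth[name][i]) ** 2 for name, i in hidden]
    return math.sqrt(sum(squared) / len(squared))


def impute_mean(table):
    """Baseline method (assumption): fill missing cells with the column mean."""
    result = {}
    for name, col in table.items():
        observed = [v for v in col if not math.isnan(v)]
        fill = sum(observed) / len(observed) if observed else float("nan")
        result[name] = [fill if math.isnan(v) else v for v in col]
    return result


def benchmark(truth, impute_fn, fraction=0.2, seed=0):
    """Return imputation error and wall-clock time for one method on one dataset."""
    masked, hidden = mask_values(truth, fraction, seed)
    start = time.perf_counter()
    imputed = impute_fn(masked)
    elapsed = time.perf_counter() - start
    return {"rmse": rmse(truth, imputed, hidden), "seconds": elapsed, "hidden": len(hidden)}


truth = {"protein_1": [2.0] * 10, "protein_2": [5.0] * 10}
result = benchmark(truth, impute_mean, fraction=0.2, seed=1)
```

Repeating the call over several datasets, masking fractions and seeds would give one error and one timing per method, which can then be compared across algorithms.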
Our tool will be available as a web server, a Shiny application (GUI), and an R package (CLI).