Introduction The rapid growth of cohort sizes and multiomics analyses generates increasingly vast amounts of data. Analyzing and interpreting these data, precisely replicating analyses, and sharing results are all crucial. To address these needs, we developed SimpliFi (simplifi.protifi.com), a cloud-based, browser-driven data-to-meaning engine with an intuitive interface accessible to users of all experience levels. We also introduce SimpliFi Box, a private deployment of SimpliFi for organizations that require stringent data privacy and security, for use on an internal network or in a secure virtual private cloud.
Methods SimpliFi models biology using exclusively nonparametric statistics, with biological replicates defining their own distributions. P-values and fold-changes are determined based on biological variation, sample numbers, observations, and measurement error. SimpliFi accepts and integrates all types of omics data and provides confidence intervals for all values, including p-values. Importantly, SimpliFi does not transform data and accommodates increased data variance at low or high intensities. Its user interface allows for the efficient analysis of large sample sets. Projects can be shared via private or public URLs, facilitating collaborative data exploration.
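As a rough illustration of how replicates can define their own distributions, the sketch below computes a permutation p-value and a percentile-bootstrap confidence interval for a fold-change on untransformed intensities. It is not SimpliFi's algorithm, which is not described here. The function names, the choice of the median as the summary statistic, and the example intensities are all assumptions made for illustration.

```python
import random
import statistics


def permutation_p_value(group_a, group_b, n_permutations=10_000, seed=0):
    """Two-sided permutation p-value for a difference in group medians."""
    rng = random.Random(seed)
    observed = abs(statistics.median(group_a) - statistics.median(group_b))
    pooled = list(group_a) + list(group_b)
    n_a = len(group_a)
    hits = 0
    for _ in range(n_permutations):
        # Reassign replicate labels at random and recompute the statistic.
        rng.shuffle(pooled)
        diff = abs(statistics.median(pooled[:n_a]) - statistics.median(pooled[n_a:]))
        if diff >= observed:
            hits += 1
    # Add-one correction keeps the estimate strictly above zero.
    return (hits + 1) / (n_permutations + 1)


def bootstrap_fold_change_ci(group_a, group_b, n_resamples=5_000, alpha=0.05, seed=0):
    """Percentile bootstrap interval for median(group_b) / median(group_a)."""
    rng = random.Random(seed)
    ratios = []
    for _ in range(n_resamples):
        # Resample replicates with replacement within each group.
        resample_a = rng.choices(group_a, k=len(group_a))
        resample_b = rng.choices(group_b, k=len(group_b))
        ratios.append(statistics.median(resample_b) / statistics.median(resample_a))
    ratios.sort()
    lower = ratios[int((alpha / 2) * n_resamples)]
    upper = ratios[int((1 - alpha / 2) * n_resamples) - 1]
    return lower, upper


# Hypothetical raw intensities for one analyte (no log transform applied).
control = [100, 110, 95, 105, 120, 98]
treated = [210, 190, 230, 205, 250, 198]

p_value = permutation_p_value(control, treated)
ci_low, ci_high = bootstrap_fold_change_ci(control, treated)
print(f"p = {p_value:.4f}, fold-change 95% CI = ({ci_low:.2f}, {ci_high:.2f})")
```

Both quantities come only from the observed replicates, so no Gaussian assumption or data transformation is needed. The same resampling idea could in principle be repeated to place an interval on the p-value itself.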
Preliminary Data SimpliFi employs nonparametric statistics in which biological replicates yield their own empirical distributions, so its p-values can differ significantly from those derived from parametric tests such as t-tests. The difference arises from the non-Gaussian nature of biological and omics data. Sample size variations, over- or under-sampling of variability, and outliers can lead to false negatives or false positives. SimpliFi presents analyses through interactive displays of pathways, tissue states, diseases, cells, and molecular-level classifications.
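The sensitivity of a parametric test to a single outlier can be seen in a small worked example. The sketch below uses an exact rank-sum test as a familiar nonparametric stand-in (it is not SimpliFi's method) and compares it with the Welch t statistic. The function names and the data are hypothetical, and the rank-sum helper assumes there are no tied values.

```python
from itertools import combinations
import math
import statistics


def welch_t(group_a, group_b):
    """Welch t statistic (unequal variances) for mean(b) - mean(a)."""
    standard_error = math.sqrt(
        statistics.variance(group_a) / len(group_a)
        + statistics.variance(group_b) / len(group_b)
    )
    return (statistics.mean(group_b) - statistics.mean(group_a)) / standard_error


def exact_rank_sum_p(group_a, group_b):
    """Exact two-sided rank-sum p-value by full enumeration (assumes no ties)."""
    pooled = sorted(list(group_a) + list(group_b))
    rank = {value: i + 1 for i, value in enumerate(pooled)}
    n_a, n = len(group_a), len(pooled)
    expected = n_a * (n + 1) / 2  # mean rank sum of group_a under the null
    observed = abs(sum(rank[value] for value in group_a) - expected)
    total = extreme = 0
    # Every way of assigning n_a of the n ranks to group_a is equally likely.
    for combo in combinations(range(1, n + 1), n_a):
        total += 1
        if abs(sum(combo) - expected) >= observed:
            extreme += 1
    return extreme / total


# Hypothetical intensities: the two groups are cleanly separated.
group_a = [10.1, 10.4, 9.8, 10.0, 10.3]
group_b = [12.0, 12.3, 11.9, 12.4, 12.1]
# Same comparison with one extreme replicate in the second group.
group_b_outlier = [12.0, 12.3, 11.9, 12.4, 60.0]

for label, b in [("clean", group_b), ("with outlier", group_b_outlier)]:
    print(label, round(welch_t(group_a, b), 2), round(exact_rank_sum_p(group_a, b), 4))
```

In this example the outlier inflates the variance estimate, so the t statistic collapses even though the groups remain fully separated, while the rank-based p-value is unchanged. This is one way a parametric test can produce a false negative on non-Gaussian data.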
Front-end displays are optimized for large cohorts, and quality control features automatically flag unusual samples. Users can filter analytes (proteins, metabolites, etc.) by various parameters to quickly identify biomarkers or other significant analytes, and then perform pathway and other analyses on the selected subgroups. Differential expression analysis is available at a population-wide level, with visualizations such as distributions, violin plots, and box-and-whisker plots that enable immediate understanding of experiments with hundreds to thousands of samples. Graphs and diagrams can be saved as vector or raster images with a right-click, and customized reports can be generated easily.
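The criteria SimpliFi uses to flag unusual samples are not described here. As one plausible sketch, a robust outlier rule based on the median and the median absolute deviation (MAD) can flag samples without assuming Gaussian data. The function name, the per-sample summary values, and the 3.5 cutoff (a common convention for modified z-scores) are assumptions for illustration only.

```python
import statistics


def flag_unusual_samples(sample_summaries, threshold=3.5):
    """Return names of samples whose summary value is a robust outlier."""
    values = list(sample_summaries.values())
    center = statistics.median(values)
    mad = statistics.median([abs(value - center) for value in values])
    if mad == 0:
        # No spread among the samples, so nothing can be ranked as unusual.
        return []
    flagged = []
    for name, value in sample_summaries.items():
        # 0.6745 rescales the MAD to match a standard deviation for normal data.
        modified_z = 0.6745 * (value - center) / mad
        if abs(modified_z) > threshold:
            flagged.append(name)
    return flagged


# Hypothetical per-sample median intensities.
summaries = {"S1": 1000, "S2": 1040, "S3": 980, "S4": 1010, "S5": 2500, "S6": 995}
print(flag_unusual_samples(summaries))
```

Because both the center and the spread are medians, a single aberrant sample does not mask itself by inflating the scale estimate, which matters in cohorts where a few samples fail.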
SimpliFi operates on a cloud-based model in which users upload data to a server equipped with GPUs for rapid statistical analysis. With SimpliFi Box, organizations can instead run SimpliFi on-site, ensuring that private data remains on premises. Projects can be shared internally, so users can explore data collaboratively with access to all features.