About This Project
Multi-omic data (e.g., genomics, transcriptomics) holds the key to identifying novel solutions for some of the most pressing problems in biomedical sciences, such as cancer precision therapy [1]. A large amount of this data exists in databases but remains unanalyzed [2].
We hypothesize that extracting and integrating these multi-omic features across major cancer types will reveal novel molecular signatures that correlate with treatment response or disease progression.
What is the context of this research?
Advances in next-generation sequencing (NGS) methods (e.g., genomics, transcriptomics) have enabled a range of cancer omic analyses at the molecular level [1]. These methods identify features underlying cancer pathophysiology and are pivotal for devising therapeutics [2]. Integrated analysis of multiple omic features can suggest an optimal treatment course at the patient level, an approach termed precision medicine [3].
To gain insights, patient omic data is matched against existing omic patterns accumulated through years of research [4]. A major hurdle here is that a large amount of omic data with targetable but as yet unanalyzed features sits in several databases. In addition, this data is heterogeneous, requires complex analysis steps, has not been validated for clinical actionability, and lacks a multi-omic view [5].
We hypothesize that integrating 15 key omic features across major cancer types will reveal molecular patterns that can guide more effective, data-driven approaches to cancer precision therapy.
What is the significance of this project?
A multi-omic view of disease progression holds the key to identifying novel solutions for some of the most pressing problems in healthcare and biomedical sciences, such as cancer precision therapy and finding new drug targets or biomarkers [1].
Rapid advances in DNA/RNA/protein sequencing methods have made it possible to obtain a complete molecular profile of a patient [2]. However, the existing multi-omic data needed for matching against patient data remains largely unanalyzed and scattered [3]. This omic data is large and heterogeneous, and, importantly, extracting meaningful insights from it requires specific technical skills, scientific know-how, and computational infrastructure, which are largely lacking among primary stakeholders [4].
This project will provide a proof-of-concept for how a user-facing, AI/ML-driven tool encompassing pre-analyzed, pre-validated, clinically actionable omic features can drive deeper and faster biological insights.
What are the goals of the project?
Completion of this project would provide a proof-of-concept and identify novel multi-omic, oncologic patterns that correlate with clinical outcomes of cancer patients.
The outcome of this work is also critical for our larger goal of developing a disease-wide Multi-omic Tensor covering 50+ omic features across 85+ human disease conditions (>2 billion data points). Our eventual goal is to combine this Multi-omic Tensor with AI/ML-driven omic data analysis kits in the cloud to build the world's first GPS for biomedical sciences.
Budget
The funds will be used to cover computational costs associated with mining, processing, and analyzing the omic data. Funds will also pay for one bioinformatics analyst and for training and testing an unsupervised machine learning model. Finally, funds will allow us to deploy the Onco-Tensor in the cloud, where stakeholders can easily explore these patterns.
Project Timeline
Sequencing data (FASTQ format) for 1000 cancer datasets from 3 major cancer databases (TCGA, GEO, and UK Biobank) will be mined. Data will be harmonized, quality-controlled, and mapped to the genome. Using bioinformatic workflows, genome-wide counts for 15 omic features will be obtained. Machine learning will stratify features into cancer-specific molecular subtypes. A repository and a user-facing analytic dashboard will be developed for exploration.
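To illustrate the harmonization step in the plan above, here is a minimal sketch of one common way to make feature counts from differently sequenced datasets comparable: keep only the features present in every sample, scale each sample to counts per million (CPM), and log-transform. This is an assumption for illustration, not the project's confirmed method; the function names (`counts_per_million`, `harmonize`), the dictionary layout, and the pseudocount value are all hypothetical.

```python
import math


def counts_per_million(counts):
    """Scale one sample's raw feature counts to counts per million (CPM)."""
    total = sum(counts.values())
    if total == 0:
        raise ValueError("sample has no counts to normalize")
    return {feature: value * 1_000_000 / total for feature, value in counts.items()}


def harmonize(samples, pseudocount=1.0):
    """Return log2(CPM + pseudocount) for the features shared by all samples.

    `samples` maps a sample name to a {feature: raw_count} dictionary
    (a hypothetical layout chosen for this sketch).
    """
    if not samples:
        return {}
    # Keep only features that every sample reports, so samples are comparable.
    shared = set.intersection(*(set(counts) for counts in samples.values()))
    harmonized = {}
    for name, counts in samples.items():
        # Normalize on the shared features only, then log-transform.
        cpm = counts_per_million({f: counts[f] for f in shared})
        harmonized[name] = {f: math.log2(cpm[f] + pseudocount) for f in sorted(shared)}
    return harmonized
```

With this scheme, two samples that differ only in sequencing depth end up with the same harmonized values, which is the property the downstream clustering step relies on.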
Nov 20, 2025
Project Launched
Dec 31, 2025
Mining of raw sequencing data for 1000 cancer datasets from TCGA, GEO, and UK Biobank (15 cancer types).
Jan 21, 2026
Harmonizing the data and setting up for feature extraction.
Feb 28, 2026
Data quality control, mapping to genome, analysis and extraction of 15 omic features.
Mar 16, 2026
Machine-learning-based unsupervised clustering.
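The timeline does not name a specific clustering algorithm. As one possible illustration of the unsupervised clustering milestone, the sketch below implements plain k-means with a deterministic farthest-point initialization, using only the Python standard library. The choice of k-means, the function names (`farthest_point_init`, `kmeans`), and the representation of each sample as a tuple of feature values are assumptions made for this example.

```python
import math


def farthest_point_init(points, k):
    """Pick k starting centroids: the first point, then repeatedly the point
    farthest from all centroids chosen so far (deterministic)."""
    centroids = [points[0]]
    while len(centroids) < k:
        next_point = max(points, key=lambda p: min(math.dist(p, c) for c in centroids))
        centroids.append(next_point)
    return centroids


def kmeans(points, k, max_iterations=100):
    """Cluster equal-length tuples of feature values into k groups.

    Returns (labels, centroids), where labels[i] is the cluster index of points[i].
    """
    centroids = farthest_point_init(points, k)
    labels = None
    for _ in range(max_iterations):
        # Assignment step: each point joins its nearest centroid.
        new_labels = [
            min(range(k), key=lambda j: math.dist(p, centroids[j])) for p in points
        ]
        if new_labels == labels:
            break  # assignments stopped changing, so the algorithm has converged
        labels = new_labels
        # Update step: move each centroid to the mean of its members.
        for j in range(k):
            members = [p for p, label in zip(points, labels) if label == j]
            if members:
                centroids[j] = tuple(sum(axis) / len(members) for axis in zip(*members))
    return labels, centroids
```

In practice each point would be one sample's vector of harmonized omic feature values, and the resulting labels would be candidate molecular subtypes to check against clinical outcomes. A library implementation would likely be preferred at the scale of 1000 datasets.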
Meet the Team
Deepak Sharma
I am a PhD scientist with a background in analyzing large-scale multi-omic data (e.g., genomics, transcriptomics, proteomics) and finding clinically actionable patterns in it [1]. Ever since I began my journey in science, I have been fascinated by the hidden patterns that omic data holds, and by the patterns already present in the large amount of existing data that are currently difficult to interpret.
Throughout my 15+ year research career, I have worked at several universities and published >30 research articles in prestigious journals (e.g., Nature [2], JCI [3]). Currently I work as the Principal Bioinformatics Scientist at Bainom, a startup that I co-founded [4]. At Bainom, our goal is to create a GPS for biomedical sciences by merging multi-omic data analysis tools (as standalone, query-specific, biology-centric "bioinformatic kits") and all existing data from 80+ databases into a single flow that accelerates drug discovery, precision health, and biomedical research.
Project Backers
- 1 Backer
- 1% Funded
- $50 Total Donations
- $50.00 Average Donation



