The directions we are taking
With the growth of high-throughput sequencing projects, modern biology is facing novel bottlenecks due to Big Data issues. One of the challenges is to extract relevant information from these high-volume data while accounting for their intrinsic heterogeneity. So far, genomic screenings have profiled thousands of samples, providing insights into the transcriptome of the cell. However, disentangling the heterogeneity of these transcriptomic Big Data to identify defective biological processes remains challenging. We addressed this challenge by introducing the novel concept of discretization of gene expression levels, which we derived from probabilistic modelling and shaped upon knowledge of RNA biology. Our Gene Set Enrichment Class Analysis (GSECA) algorithm exploits the bimodal behavior of RNA-sequencing gene expression profiles to identify altered gene sets in heterogeneous patient cohorts.
We showed that GSECA outperformed state-of-the-art algorithms in handling gene sets characterized by expression changes of groups of genes that are more intensively activated or repressed in a heterogeneous manner across samples. It can detect functionally related altered cell mechanisms in a condition of interest in cohorts that are more heterogeneous than those other available methods can handle. By boosting the signal-to-noise ratio, GSECA can successfully manage the heterogeneity of thousands of samples and provide useful insights into the clinical and biological patterns characteristic of a phenotype.
With this work we introduced the paradigm shift of "less is more" in treating large heterogeneous RNA-seq datasets, showing that it improves the detection of altered biological processes in the phenotype of interest. Like looking at a bunch of photographs from a distance, you might be able to get the big message!
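As a rough illustration of the discretization idea described above (and not the GSECA implementation itself), the sketch below fits a two-component Gaussian mixture to the log-expression values of one simulated sample and assigns each gene to a "low" or "high" class. The two-class scheme, the function names (fit_two_gaussians, classify, high_fraction), and the simulated data are all assumptions made for this example.

```python
import math
import random


def normal_pdf(x, mu, sd):
    """Density of a normal distribution with mean mu and standard deviation sd."""
    z = (x - mu) / sd
    return math.exp(-0.5 * z * z) / (sd * math.sqrt(2.0 * math.pi))


def fit_two_gaussians(values, n_iter=200):
    """Fit a two-component 1-D Gaussian mixture with plain EM (illustrative only).

    Returns [(weight, mean, sd), (weight, mean, sd)] ordered by increasing mean.
    """
    lo, hi = min(values), max(values)
    spread = (hi - lo) or 1.0
    mu = [lo + 0.25 * spread, lo + 0.75 * spread]  # crude initial guesses
    sd = [spread / 4.0, spread / 4.0]
    w = [0.5, 0.5]
    for _ in range(n_iter):
        # E-step: posterior probability of each component for each value
        resp = []
        for x in values:
            p = [w[k] * normal_pdf(x, mu[k], sd[k]) for k in (0, 1)]
            total = sum(p) or 1e-300
            resp.append((p[0] / total, p[1] / total))
        # M-step: update weights, means and standard deviations
        for k in (0, 1):
            nk = sum(r[k] for r in resp)
            if nk < 1e-9:
                continue
            w[k] = nk / len(values)
            mu[k] = sum(r[k] * x for r, x in zip(resp, values)) / nk
            var = sum(r[k] * (x - mu[k]) ** 2 for r, x in zip(resp, values)) / nk
            sd[k] = max(math.sqrt(var), 1e-3)
    order = sorted((0, 1), key=lambda k: mu[k])
    return [(w[k], mu[k], sd[k]) for k in order]


def classify(model, x):
    """Discretize one expression value into 'low' or 'high'."""
    (w_lo, mu_lo, sd_lo), (w_hi, mu_hi, sd_hi) = model
    p_lo = w_lo * normal_pdf(x, mu_lo, sd_lo)
    p_hi = w_hi * normal_pdf(x, mu_hi, sd_hi)
    return "high" if p_hi > p_lo else "low"


def high_fraction(model, expression, gene_set):
    """Fraction of genes in gene_set that fall in the 'high' class."""
    calls = [classify(model, expression[g]) for g in gene_set if g in expression]
    return sum(c == "high" for c in calls) / len(calls) if calls else float("nan")


# Simulated bimodal log-expression profile (hypothetical data, fixed seed)
rng = random.Random(0)
sample = {f"gene{i}": rng.gauss(0.0, 1.0) for i in range(200)}
sample.update({f"gene{i}": rng.gauss(8.0, 1.5) for i in range(200, 500)})

model = fit_two_gaussians(list(sample.values()))
gene_set = [f"gene{i}" for i in range(190, 210)]  # hypothetical gene set
print(model)
print(high_fraction(model, sample, gene_set))
```

Once every sample is reduced to class labels, a gene set can be compared between two cohorts by looking at how the proportion of genes per class changes, rather than by comparing raw expression values. This is the sense in which "less is more".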
In the N.A.R. paper, we generated a comprehensive assessment of the effect of PTEN loss across different cancer types. Our data showed that the impact of PTEN silencing on cellular program regulation is proportional to the impaired modulation of the PI3K/AKT signaling cascade, with the strongest effects in gliomas, endometrial, head and neck, and breast carcinomas, melanomas, and sarcomas. GSECA correctly highlighted the role of PTEN in controlling immune-related processes in the majority of cancer types, particularly in those showing a significant alteration of the tumor immune microenvironment (TIME) composition. These data support the importance of PTEN in modulating the immune system and therapy resistance.
Emerging evidence has suggested that PTEN loss is an immunosuppressive event in prostate tumors. However, the connection between PTEN and the immune system is complex and involves both pro- and anti-tumorigenic immune responses, depending on the cellular phenotype and the TIME. Our analyses supported the notion that PTEN-loss prostate cancers are non-T-cell-inflamed, or "cold", tumors. Furthermore, we showed that the immunosuppressive TIME of PTEN-loss prostate tumors could be driven by the significant activation of STAT3. Our findings on PTEN loss were pivotal in showing the shorter disease-free survival of these patients and in underlining the biomarker potential of PTEN expression levels.
Details on our pan-cancer analysis of PTEN loss are here in Nucleic Acids Research!
Artificial intelligence (AI), the discipline of developing computational algorithms able to perform tasks that require human intelligence, is becoming a fundamental asset for healthcare and life science research. AI is the pivotal tool to exploit the information available in genomic Big Data and ultimately “deliver” precision medicine. The COVID-19 pandemic has opened up new possibilities for AI development. Since the first drafts of the human genome, 20 years ago, the number of scientific works employing sequencing data has increased exponentially.
Machine Learning (ML) and Deep Learning (DL) can leverage the heterogeneity of transcriptomic Big Data to achieve consistent predictions without the need to model the system of interest. These algorithms perform tasks that normally require human intelligence. While ML algorithms still need human guidance to improve their predictions, DL methods can autonomously determine the accuracy of a prediction.
Our review on Artificial Intelligence is out in the International Journal of Molecular Sciences; have a look there for the details! We also have an R-based tutorial for AI development that is available here.