Publications

2020

A small number of key molecules can completely change the cell's state, for example, a stem cell differentiating into distinct types of blood cells or a healthy cell turning cancerous. How can we uncover the important cellular events that govern complex biological behavior? One approach to answering the question has been to elucidate the mechanisms by which genes and proteins control each other in a cell. These mechanisms are typically represented in the form of a gene or protein regulatory network. The resulting networks can be modeled as a system of mathematical equations, also known as a mathematical model. The advantage of such a model is that we can computationally simulate the time courses of various molecules. Moreover, we can use the model simulations to predict the effect of perturbations such as deleting one or more genes. A biologist can perform experiments to test these predictions. Subsequently, the model can be iteratively refined by reconciling any differences between the prediction and the experiment. In this thesis I present two novel solutions aimed at dramatically reducing the time and effort required for this build-simulate-test cycle. The first solution I propose is in prioritizing and planning large-scale gene perturbation experiments that can be used for validating existing models. I then focus on taking advantage of the recent advances in experimental techniques that enable us to measure gene activity at a single-cell resolution, known as scRNA-seq. This scRNA-seq data can be used to infer the interactions in gene regulatory networks. I perform a systematic evaluation of existing computational methods for building gene regulatory networks from scRNA-seq data. Based on the insights gained from this comprehensive evaluation, I propose novel algorithms that can take advantage of prior knowledge in building these regulatory networks. The results underscore the promise of my approach in identifying cell-type specific interactions. 
These context-specific interactions play a key role in building mathematical models to study complex cellular processes, such as a developmental process that drives transitions from one cell type to another.
Pratapa A, Jalihal A, Law J, Bharadwaj A, Murali. Benchmarking algorithms for gene regulatory network inference from single-cell transcriptomic data. Nat Methods. 2020;17(2):147–154. doi:10.1038/s41592-019-0690-6
We present a systematic evaluation of state-of-the-art algorithms for inferring gene regulatory networks from single-cell transcriptional data. As the ground truth for assessing accuracy, we use synthetic networks with predictable trajectories, literature-curated Boolean models and diverse transcriptional regulatory networks. We develop a strategy to simulate single-cell transcriptional data from synthetic and Boolean networks that avoids pitfalls of previously used methods. Furthermore, we collect networks from multiple experimental single-cell RNA-seq datasets. We develop an evaluation framework called BEELINE. We find that the area under the precision-recall curve and early precision of the algorithms are moderate. The methods are better in recovering interactions in synthetic networks than Boolean models. The algorithms with the best early precision values for Boolean models also perform well on experimental datasets. Techniques that do not require pseudotime-ordered cells are generally more accurate. Based on these results, we present recommendations to end users. BEELINE will aid the development of gene regulatory network inference algorithms.
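The two accuracy measures named in the abstract above can be illustrated with a minimal sketch. It assumes that early precision is the fraction of true edges among the top-k ranked predictions, with k equal to the number of edges in the ground-truth network, and it uses average precision as a step-wise estimate of the area under the precision-recall curve. The function names and the toy network are hypothetical and are not taken from the BEELINE code base.

```python
def early_precision(ranked_edges, true_edges):
    """Fraction of the top-k predictions that are true edges.

    Assumption: k is the number of edges in the ground-truth network.
    """
    k = len(true_edges)
    top_k = ranked_edges[:k]
    return sum(edge in true_edges for edge in top_k) / k


def average_precision(ranked_edges, true_edges):
    """Step-wise estimate of the area under the precision-recall curve.

    Averages the precision observed at each rank where a true edge appears.
    """
    hits = 0
    total = 0.0
    for rank, edge in enumerate(ranked_edges, start=1):
        if edge in true_edges:
            hits += 1
            total += hits / rank
    return total / len(true_edges)


# Hypothetical toy example: two true regulatory edges, four ranked predictions.
true_edges = {("A", "B"), ("B", "C")}
ranked_edges = [("A", "B"), ("C", "A"), ("B", "C"), ("A", "C")]

ep = early_precision(ranked_edges, true_edges)
ap = average_precision(ranked_edges, true_edges)
print(ep, ap)
```

Both measures depend only on the ranking of predicted edges, so they can be applied to any inference method that outputs edge scores.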
Mason M, Schinke C, Eng C, Towfic F, Gruber F, Dervan A, White B, Pratapa A, Guan Y, Chen H, et al. Multiple Myeloma DREAM Challenge reveals epigenetic regulator PHF19 as marker of aggressive disease. Leukemia. 2020;34(7):1866–1874. doi:10.1038/s41375-020-0742-z
While the past decade has seen meaningful improvements in clinical outcomes for multiple myeloma patients, a subset of patients does not benefit from current therapeutics for unclear reasons. Many gene expression-based models of risk have been developed, but each model uses a different combination of genes and often involves assaying many genes, making them difficult to implement. We organized the Multiple Myeloma DREAM Challenge, a crowdsourced effort to develop models of rapid progression in newly diagnosed myeloma patients and to benchmark these against previously published models. This effort led to more robust predictors and found that incorporating specific demographic and clinical features improved gene expression-based models of high risk. Furthermore, post-challenge analysis identified a novel expression-based risk marker, PHF19, which has recently been found to have an important biological role in multiple myeloma. Lastly, we show that a simple four-feature predictor composed of age, ISS, and expression of PHF19 and MMSET performs similarly to more complex models with many more gene expression features included.

2019

Wagner M, Pratapa A, Murali. Reconstructing signaling pathways using regular language constrained paths. Bioinformatics. 2019;35(14):i624-i633. doi:10.1093/bioinformatics/btz360
MOTIVATION: High-quality curation of the proteins and interactions in signaling pathways is slow and painstaking. As a result, many experimentally detected interactions are not annotated to any pathways. A natural question that arises is whether or not it is possible to automatically leverage existing pathway annotations to identify new interactions for inclusion in a given pathway. RESULTS: We present RegLinker, an algorithm that achieves this purpose by computing multiple short paths from pathway receptors to transcription factors within a background interaction network. The key idea underlying RegLinker is the use of regular language constraints to control the number of non-pathway interactions that are present in the computed paths. We systematically evaluate RegLinker and five alternative approaches against a comprehensive set of 15 signaling pathways and demonstrate that RegLinker recovers withheld pathway proteins and interactions with the best precision and recall. We used RegLinker to propose new extensions to the pathways. We discuss the literature that supports the inclusion of these proteins in the pathways. These results show the broad potential of automated analysis to attenuate difficulties of traditional manual inquiry. AVAILABILITY AND IMPLEMENTATION: https://github.com/Murali-group/RegLinker. SUPPLEMENTARY INFORMATION: Supplementary data are available at Bioinformatics online.
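The idea of constraining paths with a regular language can be sketched as a shortest-path search over pairs of (network node, automaton state). This is a minimal illustration and not the RegLinker implementation: it finds a single path rather than multiple paths, and it assumes one particular language, namely paths that use at most a fixed number of non-pathway edges (label 'n', as opposed to pathway edges labeled 'p'). The function name, edge labels, weights, and toy network are all hypothetical.

```python
import heapq


def constrained_shortest_path(edges, sources, targets, max_non_pathway):
    """Cheapest source-to-target path using at most max_non_pathway 'n' edges.

    edges maps node -> list of (neighbor, weight, label), label in {'p', 'n'}.
    The search state (node, used) pairs a network node with the automaton
    state, which here is simply the count of 'n' edges used so far.
    """
    heap = [(0.0, s, 0, (s,)) for s in sources]
    heapq.heapify(heap)
    settled = set()
    while heap:
        cost, node, used, path = heapq.heappop(heap)
        if (node, used) in settled:
            continue
        settled.add((node, used))
        if node in targets:
            return cost, list(path)
        for neighbor, weight, label in edges.get(node, []):
            next_used = used + (1 if label == "n" else 0)
            # Reject any transition that would leave the accepted language.
            if next_used <= max_non_pathway and (neighbor, next_used) not in settled:
                heapq.heappush(heap, (cost + weight, neighbor, next_used, path + (neighbor,)))
    return None


# Hypothetical toy network: receptor R, transcription factor TF.
edges = {
    "R": [("A", 1.0, "p"), ("X", 0.5, "n")],
    "A": [("TF", 1.0, "p")],
    "X": [("TF", 0.5, "n")],
}

strict = constrained_shortest_path(edges, ["R"], {"TF"}, max_non_pathway=1)
relaxed = constrained_shortest_path(edges, ["R"], {"TF"}, max_non_pathway=2)
print(strict, relaxed)
```

With the stricter bound the cheaper route through X is rejected because it needs two non-pathway edges, so the search returns the route through A instead.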

2018

Pratapa A, Adames N, Kraikivski P, Franzese N, Tyson J, Peccoud J, Murali. CrossPlan: systematic planning of genetic crosses to validate mathematical models. Bioinformatics. 2018;34(13):2237–2244. doi:10.1093/bioinformatics/bty072
Motivation: Mathematical models of cellular processes can systematically predict the phenotypes of novel combinations of multi-gene mutations. Searching for informative predictions and prioritizing them for experimental validation is challenging since the number of possible combinations grows exponentially in the number of mutations. Moreover, keeping track of the crosses needed to make new mutants and planning sequences of experiments is unmanageable when the experimenter is deluged by hundreds of potentially informative predictions to test. Results: We present CrossPlan, a novel methodology for systematically planning genetic crosses to make a set of target mutants from a set of source mutants. We base our approach on a generic experimental workflow used in performing genetic crosses in budding yeast. We prove that the CrossPlan problem is NP-complete. We develop an integer-linear-program (ILP) to maximize the number of target mutants that we can make under certain experimental constraints. We apply our method to a comprehensive mathematical model of the protein regulatory network controlling cell division in budding yeast. We also extend our solution to incorporate other experimental conditions such as a delay factor that decides the availability of a mutant and genetic markers to confirm gene deletions. The experimental flow that underlies our work is quite generic and our ILP-based algorithm is easy to modify. Hence, our framework should be relevant in plant and animal systems as well. Availability and implementation: CrossPlan code is freely available under GNU General Public Licence v3.0 at https://github.com/Murali-group/crossplan. Supplementary information: Supplementary data are available at Bioinformatics online.
Raman K, Pratapa A, Mohite O, Balachandran S. Computational Prediction of Synthetic Lethals in Genome-Scale Metabolic Models Using Fast-SL. Methods Mol Biol. 2018;1716:315–336. doi:10.1007/978-1-4939-7528-0_14
In this chapter, we describe Fast-SL, an in silico approach to predict synthetic lethals in genome-scale metabolic models. Synthetic lethals are sets of genes or reactions where only the simultaneous removal of all genes or reactions in the set abolishes growth of an organism. In silico approaches to predict synthetic lethals are based on Flux Balance Analysis (FBA), a popular constraint-based analysis method based on linear programming. FBA has been shown to accurately predict the viability of various genome-scale metabolic models. Fast-SL builds on the framework of FBA and enables the prediction of synthetic lethal reactions or genes in different organisms, under various environmental conditions. Predicting synthetic lethals in metabolic network models allows us to generate hypotheses on possible novel genetic interactions and potential candidates for combinatorial therapy, in the case of pathogenic organisms. We here summarize the Fast-SL approach for analyzing metabolic networks and detail the procedure to predict synthetic lethals in any given metabolic model. We illustrate the approach by predicting synthetic lethals in Escherichia coli. The Fast-SL implementation for MATLAB is available from https://github.com/RamanLab/FastSL/.

2015

Pratapa A, Balachandran S, Raman K. Fast-SL: an efficient algorithm to identify synthetic lethal sets in metabolic networks. Bioinformatics. 2015;31(20):3299–3305. doi:10.1093/bioinformatics/btv352
MOTIVATION: Synthetic lethal sets are sets of reactions/genes where only the simultaneous removal of all reactions/genes in the set abolishes growth of an organism. Previous approaches to identify synthetic lethal genes in genome-scale metabolic networks have built on the framework of flux balance analysis (FBA), extending it either to exhaustively analyze all possible combinations of genes or formulate the problem as a bi-level mixed integer linear programming (MILP) problem. We here propose an algorithm, Fast-SL, which surmounts the computational complexity of previous approaches by iteratively reducing the search space for synthetic lethals, resulting in a substantial reduction in running time, even for higher order synthetic lethals. RESULTS: We performed synthetic reaction and gene lethality analysis, using Fast-SL, for genome-scale metabolic networks of Escherichia coli, Salmonella enterica Typhimurium and Mycobacterium tuberculosis. Fast-SL also rigorously identifies synthetic lethal gene deletions, uncovering synthetic lethal triplets that were not reported previously. We confirm that the triple lethal gene sets obtained for the three organisms have a precise match with the results obtained through exhaustive enumeration of lethals performed on a computer cluster. We also parallelized our algorithm, enabling the identification of synthetic lethal gene quadruplets for all three organisms in under 6 h. Overall, Fast-SL enables an efficient enumeration of higher order synthetic lethals in metabolic networks, which may help uncover previously unknown genetic interactions and combinatorial drug targets. AVAILABILITY AND IMPLEMENTATION: The MATLAB implementation of the algorithm, compatible with COBRA toolbox v2.0, is available at https://github.com/RamanLab/FastSL. CONTACT: kraman@iitm.ac.in. SUPPLEMENTARY INFORMATION: Supplementary data are available at Bioinformatics online.
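The definition of a synthetic lethal set given above can be made concrete with a minimal sketch of the exhaustive baseline, which tests every combination of reactions up to a given order. This is not the Fast-SL algorithm and it does not use flux balance analysis: as a loudly labeled simplification, growth is modeled as simple reachability of a biomass metabolite from a nutrient. The reaction names, the toy network, and both function names are hypothetical.

```python
from itertools import combinations


def grows(reactions, removed, nutrient="nutrient", target="biomass"):
    """Toy viability check (assumption: reachability stands in for FBA).

    reactions maps a reaction name to a (substrate, product) pair.
    """
    available = {nutrient}
    changed = True
    while changed:
        changed = False
        for name, (substrate, product) in reactions.items():
            if name not in removed and substrate in available and product not in available:
                available.add(product)
                changed = True
    return target in available


def synthetic_lethals(reactions, max_order=2):
    """Enumerate minimal lethal reaction sets up to max_order by brute force."""
    lethal_sets = []
    for order in range(1, max_order + 1):
        for combo in combinations(sorted(reactions), order):
            # A set containing a smaller lethal set is lethal but not minimal.
            if any(set(found) <= set(combo) for found in lethal_sets):
                continue
            if not grows(reactions, set(combo)):
                lethal_sets.append(combo)
    return lethal_sets


# Hypothetical toy network with two redundant routes from A to B.
reactions = {
    "uptake": ("nutrient", "A"),
    "r1": ("A", "B"),
    "r2": ("A", "B"),
    "growth": ("B", "biomass"),
}

lethals = synthetic_lethals(reactions, max_order=2)
print(lethals)
```

Here r1 and r2 are individually dispensable but jointly essential, which is the pattern a synthetic lethal pair describes. The number of combinations tested grows quickly with the order, which is the cost that motivates reducing the search space.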