Publications

2022

Battaglia, Sofia, Kevin Dong, Jingyi Wu, Zeyu Chen, Fadi J Najm, Yuanyuan Zhang, Molly M Moore, Vivian Hecht, Noam Shoresh, and Bradley E Bernstein. 2022. “Long-Range Phasing of Dynamic, Tissue-Specific and Allele-Specific Regulatory Elements”. Nature Genetics 54 (10): 1504-13. https://doi.org/10.1038/s41588-022-01188-8.

Epigenomic maps identify gene regulatory elements by their chromatin state. However, prevailing short-read sequencing methods cannot effectively distinguish alleles, evaluate the interdependence of elements in a locus or capture single-molecule dynamics. Here, we apply targeted nanopore sequencing to profile chromatin accessibility and DNA methylation on contiguous ~100-kb DNA molecules that span loci relevant to development, immunity and imprinting. We detect promoters, enhancers, insulators and transcription factor footprints on single molecules based on exogenous GpC methylation. We infer relationships among dynamic elements within immune loci, and order successive remodeling events during T cell stimulation. Finally, we phase primary sequence and regulatory elements across the H19/IGF2 locus, uncovering primate-specific features. These include a segmental duplication that stabilizes the imprinting control region and a noncanonical enhancer that drives biallelic IGF2 expression in specific contexts. Our study advances emerging strategies for phasing gene regulatory landscapes and reveals a mechanism that overrides IGF2 imprinting in human cells.

Kartha, Vinay K, Fabiana M Duarte, Yan Hu, Sai Ma, Jennifer G Chew, Caleb A Lareau, Andrew Earl, et al. 2022. “Functional Inference of Gene Regulation Using Single-Cell Multi-Omics”. Cell Genomics 2 (9). https://doi.org/10.1016/j.xgen.2022.100166.

Cells require coordinated control over gene expression when responding to environmental stimuli. Here we apply scATAC-seq and single-cell RNA sequencing (scRNA-seq) in resting and stimulated human blood cells. Collectively, we generate ~91,000 single-cell profiles, allowing us to probe the cis-regulatory landscape of the immunological response across cell types, stimuli, and time. Advancing tools to integrate multi-omics data, we develop functional inference of gene regulation (FigR), a framework to computationally pair scATAC-seq with scRNA-seq cells, connect distal cis-regulatory elements to genes, and infer gene-regulatory networks (GRNs) to identify candidate transcription factor (TF) regulators. Utilizing these paired multi-omics data, we define domains of regulatory chromatin (DORCs) of immune stimulation and find that cells alter chromatin accessibility and gene expression at timescales of minutes. Construction of the stimulation GRN elucidates TF activity at disease-associated DORCs. Overall, FigR enables elucidation of regulatory interactions across single-cell data, providing new opportunities to understand the function of cells within tissues.

2021

de Goede, Olivia M, Daniel C Nachun, Nicole M Ferraro, Michael J Gloudemans, Abhiram S Rao, Craig Smail, Tiffany Y Eulalio, et al. 2021. “Population-Scale Tissue Transcriptomics Maps Long Non-Coding RNAs to Complex Disease”. Cell 184 (10): 2633-2648.e19. https://doi.org/10.1016/j.cell.2021.03.050.

Long non-coding RNA (lncRNA) genes have well-established and important impacts on molecular and cellular functions. However, among the thousands of lncRNA genes, it is still a major challenge to identify the subset with disease or trait relevance. To systematically characterize these lncRNA genes, we used Genotype Tissue Expression (GTEx) project v8 genetic and multi-tissue transcriptomic data to profile the expression, genetic regulation, cellular contexts, and trait associations of 14,100 lncRNA genes across 49 tissues for 101 distinct complex genetic traits. Using these approaches, we identified 1,432 lncRNA gene-trait associations, 800 of which were not explained by stronger effects of neighboring protein-coding genes. This included associations between lncRNA quantitative trait loci and inflammatory bowel disease, type 1 and type 2 diabetes, and coronary artery disease, as well as rare variant associations to body mass index.

Tognon, Manuel, Vincenzo Bonnici, Erik Garrison, Rosalba Giugno, and Luca Pinello. 2021. “GRAFIMO: Variant and Haplotype Aware Motif Scanning on Pangenome Graphs”. PLoS Computational Biology 17 (9): e1009444. https://doi.org/10.1371/journal.pcbi.1009444.

Transcription factors (TFs) are proteins that promote or reduce the expression of genes by binding short genomic DNA sequences known as transcription factor binding sites (TFBS). While several tools have been developed to scan for potential occurrences of TFBS in linear DNA sequences or reference genomes, no tool exists to find them in pangenome variation graphs (VGs). VGs are sequence-labelled graphs that can efficiently encode collections of genomes and their variants in a single, compact data structure. Because VGs can losslessly compress large pangenomes, TFBS scanning in VGs can efficiently capture how genomic variation affects the potential binding landscape of TFs in a population of individuals. Here we present GRAFIMO (GRAph-based Finding of Individual Motif Occurrences), a command-line tool for the scanning of known TF DNA motifs represented as Position Weight Matrices (PWMs) in VGs. GRAFIMO extends the standard PWM scanning procedure by considering variations and alternative haplotypes encoded in a VG. Using GRAFIMO on a VG based on individuals from the 1000 Genomes project we recover several potential binding sites that are enhanced, weakened or missed when scanning only the reference genome, and which could constitute individual-specific binding events. GRAFIMO is available as an open-source tool, under the MIT license, at https://github.com/pinellolab/GRAFIMO and https://github.com/InfOmics/GRAFIMO.
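
The standard PWM scanning procedure that the abstract refers to can be illustrated on plain linear sequences. The sketch below is not GRAFIMO's code and does not traverse a variation graph; it only scores every window of two hypothetical haplotype strings against a toy log-odds PWM, to show how a single variant can weaken a candidate binding site. The motif counts, the sequences, the uniform background and the pseudocount are all assumptions made for illustration.

```python
import math

def log_odds_pwm(counts, background=0.25, pseudocount=1.0):
    """Turn per-position base counts into log2-odds scores against a uniform background."""
    pwm = []
    for column in counts:
        total = sum(column.values()) + 4 * pseudocount
        pwm.append({
            base: math.log2(((column.get(base, 0) + pseudocount) / total) / background)
            for base in "ACGT"
        })
    return pwm

def scan(sequence, pwm):
    """Score every window of len(pwm) bases; returns (start, window, score) tuples."""
    width = len(pwm)
    hits = []
    for start in range(len(sequence) - width + 1):
        window = sequence[start:start + width]
        score = sum(pwm[i][base] for i, base in enumerate(window))
        hits.append((start, window, score))
    return hits

def best_hit(sequence, pwm):
    """Return the highest-scoring window of the sequence."""
    return max(scan(sequence, pwm), key=lambda hit: hit[2])

# Hypothetical motif with consensus TGCA, built from 8 aligned sites.
toy_counts = [{"T": 8}, {"G": 8}, {"C": 8}, {"A": 8}]
pwm = log_odds_pwm(toy_counts)

# Hypothetical reference haplotype and an alternative haplotype with one SNP (C>A).
reference = "AATGCATT"
alternative = "AATGAATT"

ref_hit = best_hit(reference, pwm)
alt_hit = best_hit(alternative, pwm)
print("reference  :", ref_hit)
print("alternative:", alt_hit)
```

In this toy case both haplotypes place their best window at the same position, but the alternative allele scores lower, which is the kind of weakened site that scanning only the reference genome would not reveal.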

2020

Ma, Sai, Bing Zhang, Lindsay M LaFave, Andrew S Earl, Zachary Chiang, Yan Hu, Jiarui Ding, et al. 2020. “Chromatin Potential Identified by Shared Single-Cell Profiling of RNA and Chromatin”. Cell 183 (4): 1103-1116.e20. https://doi.org/10.1016/j.cell.2020.09.056.

Cell differentiation and function are regulated across multiple layers of gene regulation, including modulation of gene expression by changes in chromatin accessibility. However, differentiation is an asynchronous process precluding a temporal understanding of regulatory events leading to cell fate commitment. Here we developed simultaneous high-throughput ATAC and RNA expression with sequencing (SHARE-seq), a highly scalable approach for measurement of chromatin accessibility and gene expression in the same single cell, applicable to different tissues. Using 34,774 joint profiles from mouse skin, we develop a computational strategy to identify cis-regulatory interactions and define domains of regulatory chromatin (DORCs) that significantly overlap with super-enhancers. During lineage commitment, chromatin accessibility at DORCs precedes gene expression, suggesting that changes in chromatin accessibility may prime cells for lineage commitment. We computationally infer chromatin potential as a quantitative measure of chromatin lineage-priming and use it to predict cell fate outcomes. SHARE-seq is an extensible platform to study regulatory circuitry across diverse cells in tissues.

Kim-Hellmuth, Sarah, François Aguet, Meritxell Oliva, Manuel Muñoz-Aguirre, Silva Kasela, Valentin Wucher, Stephane E Castel, et al. 2020. “Cell Type-Specific Genetic Regulation of Gene Expression across Human Tissues”. Science (New York, N.Y.) 369 (6509). https://doi.org/10.1126/science.aaz8528.

The Genotype-Tissue Expression (GTEx) project has identified expression and splicing quantitative trait loci in cis (QTLs) for the majority of genes across a wide range of human tissues. However, the functional characterization of these QTLs has been limited by the heterogeneous cellular composition of GTEx tissue samples. We mapped interactions between computational estimates of cell type abundance and genotype to identify cell type-interaction QTLs for seven cell types and show that cell type-interaction expression QTLs (eQTLs) provide finer resolution to tissue specificity than bulk tissue cis-eQTLs. Analyses of genetic associations with 87 complex traits show a contribution from cell type-interaction QTLs and enable the discovery of hundreds of previously unidentified colocalized loci that are masked in bulk tissue.

2019

Vinyard, Michael E, Cindy Su, Allison P Siegenfeld, Amanda L Waterbury, Allyson M Freedy, Pallavi M Gosavi, Yongho Park, et al. 2019. “CRISPR-Suppressor Scanning Reveals a Nonenzymatic Role of LSD1 in AML”. Nature Chemical Biology 15 (5): 529-39. https://doi.org/10.1038/s41589-019-0263-0.

Understanding the mechanism of small molecules is a critical challenge in chemical biology and drug discovery. Medicinal chemistry is essential for elucidating drug mechanism, enabling variation of small molecule structure to gain structure-activity relationships (SARs). However, the development of complementary approaches that systematically vary target protein structure could provide equally informative SARs for investigating drug mechanism and protein function. Here we explore the ability of CRISPR-Cas9 mutagenesis to profile the interactions between lysine-specific histone demethylase 1 (LSD1) and chemical inhibitors in the context of acute myeloid leukemia (AML). Through this approach, termed CRISPR-suppressor scanning, we elucidate drug mechanism of action by showing that LSD1 enzyme activity is not required for AML survival and that LSD1 inhibitors instead function by disrupting interactions between LSD1 and the transcription factor GFI1B on chromatin. Our studies clarify how LSD1 inhibitors mechanistically operate in AML and demonstrate how CRISPR-suppressor scanning can uncover novel aspects of target biology.