The state and behaviour of a cell can be influenced by both genetic and environmental factors. In particular, tumour progression is determined by underlying genetic aberrations [1-4] as well as the makeup of the tumour microenvironment [5,6]. Quantifying the contributions of these factors requires new technologies that can accurately measure the spatial location of genomic sequence together with phenotypic readouts. Here we developed slide-DNA-seq, a method for capturing spatially resolved DNA sequences from intact tissue sections. We demonstrate that this method accurately preserves local tumour architecture and enables the de novo discovery of distinct tumour clones and their copy number alterations. We then apply slide-DNA-seq to a mouse model of metastasis and a primary human cancer, revealing that clonal populations are confined to distinct spatial regions. Moreover, through integration with spatial transcriptomics, we uncover distinct sets of genes that are associated with clone-specific genetic aberrations, the local tumour microenvironment, or both. Together, this multi-modal spatial genomics approach provides a versatile platform for quantifying how cell-intrinsic and cell-extrinsic factors contribute to gene expression, protein abundance and other cellular phenotypes.
Publications
2022
The combination of single-cell transcriptomics with mitochondrial DNA variant detection can be used to establish lineage relationships in primary human cells, but current methods are not scalable to interrogate complex tissues. Here, we combine common 3' single-cell RNA-sequencing protocols with mitochondrial transcriptome enrichment to increase coverage by more than 50-fold, enabling high-confidence mutation detection. The method successfully identifies skewed immune-cell expansions in primary human clonal hematopoiesis.
Global methylation changes in aging cells affect cancer risk and tissue homeostasis.
Epigenomic maps identify gene regulatory elements by their chromatin state. However, prevailing short-read sequencing methods cannot effectively distinguish alleles, evaluate the interdependence of elements in a locus or capture single-molecule dynamics. Here, we apply targeted nanopore sequencing to profile chromatin accessibility and DNA methylation on contiguous 100-kb DNA molecules that span loci relevant to development, immunity and imprinting. We detect promoters, enhancers, insulators and transcription factor footprints on single molecules based on exogenous GpC methylation. We infer relationships among dynamic elements within immune loci, and order successive remodeling events during T cell stimulation. Finally, we phase primary sequence and regulatory elements across the H19/IGF2 locus, uncovering primate-specific features. These include a segmental duplication that stabilizes the imprinting control region and a noncanonical enhancer that drives biallelic IGF2 expression in specific contexts. Our study advances emerging strategies for phasing gene regulatory landscapes and reveals a mechanism that overrides IGF2 imprinting in human cells.
Cells require coordinated control over gene expression when responding to environmental stimuli. Here we apply scATAC-seq and single-cell RNA sequencing (scRNA-seq) in resting and stimulated human blood cells. Collectively, we generate 91,000 single-cell profiles, allowing us to probe the cis-regulatory landscape of the immunological response across cell types, stimuli, and time. Advancing tools to integrate multi-omics data, we develop functional inference of gene regulation (FigR), a framework to computationally pair scATAC-seq with scRNA-seq cells, connect distal cis-regulatory elements to genes, and infer gene-regulatory networks (GRNs) to identify candidate transcription factor (TF) regulators. Utilizing these paired multi-omics data, we define domains of regulatory chromatin (DORCs) of immune stimulation and find that cells alter chromatin accessibility and gene expression at timescales of minutes. Construction of the stimulation GRN elucidates TF activity at disease-associated DORCs. Overall, FigR enables elucidation of regulatory interactions across single-cell data, providing new opportunities to understand the function of cells within tissues.
2021
Long non-coding RNA (lncRNA) genes have well-established and important impacts on molecular and cellular functions. However, among the thousands of lncRNA genes, it is still a major challenge to identify the subset with disease or trait relevance. To systematically characterize these lncRNA genes, we used Genotype-Tissue Expression (GTEx) project v8 genetic and multi-tissue transcriptomic data to profile the expression, genetic regulation, cellular contexts, and trait associations of 14,100 lncRNA genes across 49 tissues for 101 distinct complex genetic traits. Using these approaches, we identified 1,432 lncRNA gene-trait associations, 800 of which were not explained by stronger effects of neighboring protein-coding genes. This included associations between lncRNA quantitative trait loci and inflammatory bowel disease, type 1 and type 2 diabetes, and coronary artery disease, as well as rare variant associations with body mass index.
Transcription factors (TFs) are proteins that promote or reduce the expression of genes by binding short genomic DNA sequences known as transcription factor binding sites (TFBS). While several tools have been developed to scan for potential occurrences of TFBS in linear DNA sequences or reference genomes, no tool exists to find them in pangenome variation graphs (VGs). VGs are sequence-labelled graphs that can efficiently encode collections of genomes and their variants in a single, compact data structure. Because VGs can losslessly compress large pangenomes, TFBS scanning in VGs can efficiently capture how genomic variation affects the potential binding landscape of TFs in a population of individuals. Here we present GRAFIMO (GRAph-based Finding of Individual Motif Occurrences), a command-line tool for the scanning of known TF DNA motifs represented as Position Weight Matrices (PWMs) in VGs. GRAFIMO extends the standard PWM scanning procedure by considering variations and alternative haplotypes encoded in a VG. Using GRAFIMO on a VG based on individuals from the 1000 Genomes project we recover several potential binding sites that are enhanced, weakened or missed when scanning only the reference genome, and which could constitute individual-specific binding events. GRAFIMO is available as an open-source tool, under the MIT license, at https://github.com/pinellolab/GRAFIMO and https://github.com/InfOmics/GRAFIMO.
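As a rough illustration of the standard PWM scanning procedure that the abstract above says GRAFIMO extends, the sketch below scores every window of a plain linear sequence against a toy motif using log-odds scores. It is not GRAFIMO's implementation and does not operate on a variation graph: the motif values, the uniform background, the function names and the two example sequences are all assumptions made up for illustration. Scanning a reference and an alternate sequence that differ by one base shows how a variant can weaken a potential binding site.

```python
import math

# Hypothetical 4-position motif (toy probabilities, not from any real TF).
PWM = [
    {"A": 0.7, "C": 0.1, "G": 0.1, "T": 0.1},
    {"A": 0.1, "C": 0.1, "G": 0.7, "T": 0.1},
    {"A": 0.1, "C": 0.7, "G": 0.1, "T": 0.1},
    {"A": 0.1, "C": 0.1, "G": 0.1, "T": 0.7},
]
BACKGROUND = 0.25  # assumed uniform base frequency


def window_score(window):
    """Log-odds score of one window: sum over positions of log2(p_motif / p_background)."""
    return sum(math.log2(column[base] / BACKGROUND)
               for column, base in zip(PWM, window))


def scan(sequence):
    """Score every window of motif width; returns (start, window, score) tuples."""
    width = len(PWM)
    return [(start, sequence[start:start + width],
             window_score(sequence[start:start + width]))
            for start in range(len(sequence) - width + 1)]


def best_hit(sequence):
    """Highest-scoring window in the sequence."""
    return max(scan(sequence), key=lambda hit: hit[2])


# Two made-up sequences differing by a single base (T -> A at index 5).
reference = "TTAGCTAA"
alternate = "TTAGCAAA"

ref_hit = best_hit(reference)
alt_hit = best_hit(alternate)
print("reference:", ref_hit)
print("alternate:", alt_hit)
```

In this toy case the best window sits at the same position in both sequences, but the variant lowers its score. A graph-aware scanner would have to enumerate such alternative paths rather than take them as separate input strings.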
2020
Cell differentiation and function are regulated across multiple layers of gene regulation, including modulation of gene expression by changes in chromatin accessibility. However, differentiation is an asynchronous process precluding a temporal understanding of regulatory events leading to cell fate commitment. Here we developed simultaneous high-throughput ATAC and RNA expression with sequencing (SHARE-seq), a highly scalable approach for measurement of chromatin accessibility and gene expression in the same single cell, applicable to different tissues. Using 34,774 joint profiles from mouse skin, we develop a computational strategy to identify cis-regulatory interactions and define domains of regulatory chromatin (DORCs) that significantly overlap with super-enhancers. During lineage commitment, chromatin accessibility at DORCs precedes gene expression, suggesting that changes in chromatin accessibility may prime cells for lineage commitment. We computationally infer chromatin potential as a quantitative measure of chromatin lineage-priming and use it to predict cell fate outcomes. SHARE-seq is an extensible platform to study regulatory circuitry across diverse cells in tissues.
The Genotype-Tissue Expression (GTEx) project has identified expression and splicing quantitative trait loci in cis (QTLs) for the majority of genes across a wide range of human tissues. However, the functional characterization of these QTLs has been limited by the heterogeneous cellular composition of GTEx tissue samples. We mapped interactions between computational estimates of cell type abundance and genotype to identify cell type-interaction QTLs for seven cell types and show that cell type-interaction expression QTLs (eQTLs) provide finer resolution to tissue specificity than bulk tissue cis-eQTLs. Analyses of genetic associations with 87 complex traits show a contribution from cell type-interaction QTLs and enable the discovery of hundreds of previously unidentified colocalized loci that are masked in bulk tissue.
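One common way to express an interaction between genotype and cell type abundance is a linear model with a product term, expression = b0 + b_g * genotype + b_c * abundance + b_gc * genotype * abundance, where a non-zero b_gc indicates that the genotype effect depends on cell type abundance. The sketch below fits that model by ordinary least squares on synthetic data. The abstract does not state the exact model used, so the model form, the simulated values and all names here are assumptions for illustration only.

```python
import random


def fit_least_squares(rows, y):
    """Solve the normal equations (X^T X) b = X^T y by Gauss-Jordan elimination."""
    p = len(rows[0])
    # Augmented matrix [X^T X | X^T y]
    aug = [[sum(row[i] * row[j] for row in rows) for j in range(p)]
           + [sum(row[i] * yi for row, yi in zip(rows, y))]
           for i in range(p)]
    for col in range(p):
        # Partial pivoting for numerical stability
        pivot = max(range(col, p), key=lambda k: abs(aug[k][col]))
        aug[col], aug[pivot] = aug[pivot], aug[col]
        for k in range(p):
            if k != col:
                factor = aug[k][col] / aug[col][col]
                aug[k] = [a - factor * b for a, b in zip(aug[k], aug[col])]
    return [aug[i][p] / aug[i][i] for i in range(p)]


# Synthetic data (entirely simulated; not GTEx data).
rng = random.Random(0)
TRUE_INTERACTION = 1.5  # hypothetical genotype-by-abundance effect
n_samples = 500
genotype = [rng.choice([0, 1, 2]) for _ in range(n_samples)]   # allele dosage
abundance = [rng.random() for _ in range(n_samples)]           # estimated cell type fraction
expression = [2.0 + 0.3 * g + 1.0 * c + TRUE_INTERACTION * g * c + rng.gauss(0, 0.1)
              for g, c in zip(genotype, abundance)]

# Design matrix columns: intercept, genotype, abundance, genotype x abundance
design = [[1.0, g, c, g * c] for g, c in zip(genotype, abundance)]
intercept, beta_g, beta_c, beta_gc = fit_least_squares(design, expression)
print("estimated interaction effect:", round(beta_gc, 3))
```

A real analysis would also include covariates and a significance test for the interaction coefficient; both are omitted here to keep the sketch short.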
2019
Understanding the mechanism of small molecules is a critical challenge in chemical biology and drug discovery. Medicinal chemistry is essential for elucidating drug mechanism, enabling variation of small molecule structure to gain structure-activity relationships (SARs). However, the development of complementary approaches that systematically vary target protein structure could provide equally informative SARs for investigating drug mechanism and protein function. Here we explore the ability of CRISPR-Cas9 mutagenesis to profile the interactions between lysine-specific histone demethylase 1 (LSD1) and chemical inhibitors in the context of acute myeloid leukemia (AML). Through this approach, termed CRISPR-suppressor scanning, we elucidate drug mechanism of action by showing that LSD1 enzyme activity is not required for AML survival and that LSD1 inhibitors instead function by disrupting interactions between LSD1 and the transcription factor GFI1B on chromatin. Our studies clarify how LSD1 inhibitors mechanistically operate in AML and demonstrate how CRISPR-suppressor scanning can uncover novel aspects of target biology.
