Publications

2024

Yao, David, Josh Tycko, Jin Woo Oh, Lexi R Bounds, Sager J Gosai, Lazaros Lataniotis, Ava Mackay-Smith, et al. 2024. “Multicenter Integrated Analysis of Noncoding CRISPRi Screens.” Nature Methods 21 (4): 723-34. https://doi.org/10.1038/s41592-024-02216-7.

The ENCODE Consortium's efforts to annotate noncoding cis-regulatory elements (CREs) have advanced our understanding of gene regulatory landscapes. Pooled, noncoding CRISPR screens offer a systematic approach to investigate cis-regulatory mechanisms. The ENCODE4 Functional Characterization Centers conducted 108 screens in human cell lines, comprising >540,000 perturbations across 24.85 megabases of the genome. Using 332 functionally confirmed CRE-gene links in K562 cells, we established guidelines for screening endogenous noncoding elements with CRISPR interference (CRISPRi), including accurate detection of CREs that exhibit variable, often low, transcriptional effects. Benchmarking five screen analysis tools, we find that CASA produces the most conservative CRE calls and is robust to artifacts of low-specificity single guide RNAs. We uncover a subtle DNA strand bias for CRISPRi in transcribed regions with implications for screen design and analysis. Together, we provide an accessible data resource, predesigned single guide RNAs for targeting 3,275,697 ENCODE SCREEN candidate CREs with CRISPRi and screening guidelines to accelerate functional characterization of the noncoding genome.

Ma, Rosa, Stephanie D Conley, Michael Kosicki, Danila Bredikhin, Ran Cui, Steven Tran, Maya U Sheth, et al. 2024. “Molecular Convergence of Risk Variants for Congenital Heart Defects Leveraging a Regulatory Map of the Human Fetal Heart.” MedRxiv: The Preprint Server for Health Sciences. https://doi.org/10.1101/2024.11.20.24317557.

Congenital heart defects (CHD) arise in part due to inherited genetic variants that alter genes and noncoding regulatory elements in the human genome. These variants are thought to act during fetal development to influence the formation of different heart structures. However, identifying the genes, pathways, and cell types that mediate these effects has been challenging due to the immense diversity of cell types involved in heart development as well as the superimposed complexities of interpreting noncoding sequences. As such, understanding the molecular functions of both noncoding and coding variants remains paramount to our fundamental understanding of cardiac development and CHD. Here, we created a gene regulation map of the healthy human fetal heart across developmental time, and applied it to interpret the functions of variants associated with CHD and quantitative cardiac traits. We collected single-cell multiomic data from 734,000 single cells sampled from 41 fetal hearts spanning post-conception weeks 6 to 22, enabling the construction of gene regulation maps in 90 cardiac cell types and states, including rare populations of cardiac conduction cells. Through an unbiased analysis of all 90 cell types, we find that both rare coding variants associated with CHD and common noncoding variants associated with valve traits converge to affect valvular interstitial cells (VICs). VICs are enriched for high expression of known CHD genes previously identified through mapping of rare coding variants. Eight CHD genes, as well as other genes in similar molecular pathways, are linked to common noncoding variants associated with other valve diseases or traits via enhancers in VICs. In addition, certain common noncoding variants impact enhancers with activities highly specific to particular subanatomic structures in the heart, illuminating how such variants can impact specific aspects of heart structure and function. 
Together, these results implicate new enhancers, genes, and cell types in the genetic etiology of CHD, identify molecular convergence of common noncoding and rare coding variants on VICs, and suggest a more expansive view of the cell types instrumental in genetic risk for CHD, beyond the working cardiomyocyte. This regulatory map of the human fetal heart will provide a foundational resource for understanding cardiac development, interpreting genetic variants associated with heart disease, and discovering targets for cell-type specific therapies.

Amrute, Junedh M, Paul C Lee, Ittai Eres, Chang Jie Mick Lee, Andrea Bredemeyer, Maya U Sheth, Tracy Yamawaki, et al. 2024. “Single Cell Variant to Enhancer to Gene Map for Coronary Artery Disease.” MedRxiv: The Preprint Server for Health Sciences. https://doi.org/10.1101/2024.11.13.24317257.

Although genome wide association studies (GWAS) in large populations have identified hundreds of variants associated with common diseases such as coronary artery disease (CAD), most disease-associated variants lie within non-coding regions of the genome, rendering it difficult to determine the downstream causal gene and cell type. Here, we performed paired single nucleus gene expression and chromatin accessibility profiling from 44 human coronary arteries. To link disease variants to molecular traits, we developed a meta-map of 88 samples and discovered 11,182 single-cell chromatin accessibility quantitative trait loci (caQTLs). Heritability enrichment analysis and disease variant mapping demonstrated that smooth muscle cells (SMCs) harbor the greatest genetic risk for CAD. To capture the continuum of SMC cell states in disease, we used dynamic single cell caQTL modeling for the first time in tissue to uncover QTLs whose effects are modified by cell state and expand our insight into genetic regulation of heterogenous cell populations. Notably, we identified a variant in the COL4A1/COL4A2 CAD GWAS locus which becomes a caQTL as SMCs de-differentiate by changing a transcription factor binding site for EGR1/2. To unbiasedly prioritize functional candidate genes, we built a genome-wide single cell variant to enhancer to gene (scV2E2G) map for human CAD to link disease variants to causal genes in cell types. Using this approach, we found several hundred genes predicted to be linked to disease variants in different cell types. Next, we performed genome-wide Hi-C in 16 human coronary arteries to build tissue specific maps of chromatin conformation and link disease variants to integrated chromatin hubs and distal target genes. Using this approach, we show that rs4887091 within the ADAMTS7 CAD GWAS locus modulates function of a super chromatin interactome through a change in a CTCF binding site. 
Finally, we used CRISPR interference to validate a distal gene, AMOTL2, linked to a CAD GWAS locus. Collectively, we provide a disease-agnostic framework to translate human genetic findings to identify pathologic cell states and genes driving disease, producing a comprehensive scV2E2G map with genetic and tissue level convergence for future mechanistic and therapeutic studies.

Schnitzler, Gavin R, Helen Kang, Shi Fang, Ramcharan S Angom, Vivian S Lee-Kim, Rosa Ma, Ronghao Zhou, et al. 2024. “Convergence of Coronary Artery Disease Genes onto Endothelial Cell Programs.” Nature 626 (8000): 799-807. https://doi.org/10.1038/s41586-024-07022-x.

Linking variants from genome-wide association studies (GWAS) to underlying mechanisms of disease remains a challenge. For some diseases, a successful strategy has been to look for cases in which multiple GWAS loci contain genes that act in the same biological pathway. However, our knowledge of which genes act in which pathways is incomplete, particularly for cell-type-specific pathways or understudied genes. Here we introduce a method to connect GWAS variants to functions. This method links variants to genes using epigenomics data, links genes to pathways de novo using Perturb-seq and integrates these data to identify convergence of GWAS loci onto pathways. We apply this approach to study the role of endothelial cells in genetic risk for coronary artery disease (CAD), and discover 43 CAD GWAS signals that converge on the cerebral cavernous malformation (CCM) signalling pathway. Two regulators of this pathway, CCM2 and TLNRD1, are each linked to a CAD risk variant, regulate other CAD risk genes and affect atheroprotective processes in endothelial cells. These results suggest a model whereby CAD risk is driven in part by the convergence of causal genes onto a particular transcriptional pathway in endothelial cells. They highlight shared genes between common and rare vascular diseases (CAD and CCM), and identify TLNRD1 as a new, previously uncharacterized member of the CCM signalling pathway. This approach will be widely useful for linking variants to functions for other common polygenic diseases.

Rahmat, Mahshid, Kendell Clement, Jean-Baptiste Alberge, Romanos Sklavenitis-Pistofidis, Rohan Kodgule, Charles P Fulco, Daniel Heilpern-Mallory, et al. 2024. “Selective Enhancer Gain-of-Function Deregulates MYC Expression in Multiple Myeloma.” Cancer Research 84 (24): 4173-83. https://doi.org/10.1158/0008-5472.CAN-24-1440.

MYC deregulation occurs in the majority of multiple myeloma cases and is associated with progression and worse prognosis. Enhanced MYC expression occurs in about 70% of patients with multiple myeloma, but it is known to be driven by translocation or amplification events in only ∼40% of myelomas. Here, we used CRISPR interference to uncover an epigenetic mechanism of MYC regulation whereby increased accessibility of a plasma cell-type-specific enhancer leads to increased MYC expression. This native enhancer activity was not associated with enhancer hijacking events but led to specific binding of cMAF, IRF4, and SPIB transcription factors that activated MYC expression in the absence of known genetic aberrations. In addition, focal amplification was another mechanism of activation of this enhancer in approximately 3.4% of patients with multiple myeloma. Together, these findings define an epigenetic mechanism of MYC deregulation in multiple myeloma beyond known translocations or amplifications and point to the importance of noncoding regulatory elements and their associated transcription factor networks as drivers of multiple myeloma progression. Significance: The discovery of a native developmental enhancer that sustains the expression of MYC in a subset of myelomas could help identify predictive biomarkers and therapeutic targets to improve the outcomes of patients with multiple myeloma.

Tervi, Anniina, Markus Ramste, Erik Abner, Paul Cheng, Jacqueline M Lane, Matthew Maher, Jesse Valliere, et al. 2024. “Genetic and Functional Analysis of Raynaud’s Syndrome Implicates Loci in Vasculature and Immunity.” Cell Genomics 4 (9): 100630. https://doi.org/10.1016/j.xgen.2024.100630.

Raynaud's syndrome is a dysautonomia where exposure to cold causes vasoconstriction and hypoxia, particularly in the extremities. We performed meta-analysis in four cohorts and discovered eight loci (ADRA2A, IRX1, NOS3, ACVR2A, TMEM51, PCDH10-DT, HLA, and RAB6C) where ADRA2A, ACVR2A, NOS3, TMEM51, and IRX1 co-localized with expression quantitative trait loci (eQTLs), particularly in distal arteries. CRISPR gene editing further showed that ADRA2A and NOS3 loci modified gene expression and in situ RNAscope clarified the specificity of ADRA2A in small vessels and IRX1 around small capillaries in the skin. A functional contraction assay in the cold showed lower contraction in ADRA2A-deficient and higher contraction in ADRA2A-overexpressing smooth muscle cells. Overall, our study highlights the power of genome-wide association testing with functional follow-up as a method to understand complex diseases. The results indicate temperature-dependent adrenergic signaling through ADRA2A, effects at the microvasculature by IRX1, endothelial signaling by NOS3, and immune mechanisms by the HLA locus in Raynaud's syndrome.

Jefsen, Oskar Hougaard, Katrine Holde, John J McGrath, Veera Manikandan Rajagopal, Clara Albiñana, Bjarni Jóhann Vilhjálmsson, Jakob Grove, et al. 2024. “Polygenic Risk of Mental Disorders and Subject-Specific School Grades.” Biological Psychiatry 96 (3): 222-29. https://doi.org/10.1016/j.biopsych.2023.11.020.

BACKGROUND: Education is essential for socioeconomic security and long-term mental health; however, mental disorders are often detrimental to the educational trajectory. Genetic correlations between mental disorders and educational attainment do not always align with corresponding phenotypic associations, implying heterogeneity in the genetic overlap.

METHODS: We unraveled this heterogeneity by investigating associations between polygenic risk scores for 6 mental disorders and fine-grained school outcomes: school grades in language and mathematics in ninth grade and high school, as well as educational attainment by age 25, using nationwide-representative data from established cohorts (N = 79,489).

RESULTS: High polygenic liability of attention-deficit/hyperactivity disorder was associated with lower grades in language and mathematics, whereas high polygenic risk of anorexia nervosa or bipolar disorder was associated with higher grades in language and mathematics. Associations between polygenic risk and school grades were mixed for schizophrenia and major depressive disorder and neutral for autism spectrum disorder.

CONCLUSIONS: Polygenic risk scores for mental disorders are differentially associated with language and mathematics school grades.

Huerta-Chagoya, Alicia, Philip Schroeder, Ravi Mandla, Jiang Li, Lowri Morris, Maheak Vora, Ahmed Alkanaq, et al. 2024. “Rare Variant Analyses in 51,256 Type 2 Diabetes Cases and 370,487 Controls Reveal the Pathogenicity Spectrum of Monogenic Diabetes Genes.” Nature Genetics 56 (11): 2370-79. https://doi.org/10.1038/s41588-024-01947-9.

Type 2 diabetes (T2D) genome-wide association studies (GWASs) often overlook rare variants as a result of previous imputation panels' limitations and scarce whole-genome sequencing (WGS) data. We used TOPMed imputation and WGS to conduct the largest T2D GWAS meta-analysis involving 51,256 cases of T2D and 370,487 controls, targeting variants with a minor allele frequency as low as 5 × 10⁻⁵. We identified 12 new variants, including a rare African/African American-enriched enhancer variant near the LEP gene (rs147287548), associated with fourfold increased T2D risk. We also identified a rare missense variant in HNF4A (p.Arg114Trp), associated with eightfold increased T2D risk, previously reported in maturity-onset diabetes of the young with reduced penetrance, but observed here in a T2D GWAS. We further leveraged these data to analyze 1,634 ClinVar variants in 22 genes related to monogenic diabetes, identifying two additional rare variants in HNF1A and GCK associated with fivefold and eightfold increased T2D risk, respectively, the effects of which were modified by the individual's polygenic risk score. For 21% of the variants with conflicting interpretations or uncertain significance in ClinVar, we provided support of being benign based on their lack of association with T2D. Our work provides a framework for using rare variant GWASs to identify large-effect variants and assess variant pathogenicity in monogenic diabetes genes.

Koller, Dora, Marina Mitjans, Manuela Kouakou, Eleni Friligkou, Brenda Cabrera-Mendoza, Joseph D Deak, Natalia Llonga, et al. 2024. “Genetic Contribution to the Comorbidity Between Attention-Deficit/Hyperactivity Disorder and Substance Use Disorders.” Psychiatry Research 333: 115758. https://doi.org/10.1016/j.psychres.2024.115758.

We characterized the genetic architecture of the attention-deficit hyperactivity disorder-substance use disorder (ADHD-SUD) relationship by investigating genetic correlation, causality, pleiotropy, and common polygenic risk. Summary statistics from genome-wide association studies (GWAS) were used to investigate ADHD (Neff = 51,568), cannabis use disorder (CanUD, Neff = 161,053), opioid use disorder (OUD, Neff = 57,120), problematic alcohol use (PAU, Neff = 502,272), and problematic tobacco use (PTU, Neff = 97,836). ADHD, CanUD, and OUD GWAS meta-analyses included cohorts with case definitions based on different diagnostic criteria. PAU GWAS combined information related to alcohol use disorder, alcohol dependence, and the items related to alcohol problematic consequences assessed by the alcohol use disorders identification test. PTU GWAS was generated from a multi-trait analysis including information regarding the Fagerström Test for Nicotine Dependence and cigarettes per day. Linkage disequilibrium score regression analyses indicated positive genetic correlation with CanUD, OUD, PAU, and PTU. Genomic structural equation modeling showed that these genetic correlations were related to two latent factors: one including ADHD, CanUD, and PTU and the other with OUD and PAU. The evidence of a causal effect of PAU and PTU on ADHD was stronger than the reverse in the two-sample Mendelian randomization analysis. Conversely, similar strength of evidence was found between ADHD and CanUD. CADM2 rs62250713 was a pleiotropic SNP between ADHD and all SUDs. We found seven, one, and twenty-eight pleiotropic variants between ADHD and CanUD, PAU, and PTU, respectively. Finally, OUD, CanUD, and PAU PRS were associated with increased odds of ADHD. Our findings demonstrated the contribution of multiple pleiotropic mechanisms to the comorbidity between ADHD and SUDs.