Publications

2025

Czech, Eric, Will Tyler, Tom White, Ben Jeffery, Timothy R Millar, Benjamin Elsworth, Jérémy Guez, et al. 2025. “Analysis-Ready VCF at Biobank Scale Using Zarr.” GigaScience 14. https://doi.org/10.1093/gigascience/giaf049.

BACKGROUND: Variant Call Format (VCF) is the standard file format for interchanging genetic variation data and associated quality control metrics. The usual row-wise encoding of the VCF data model (either as text or packed binary) emphasizes efficient retrieval of all data for a given variant, but accessing data on a field or sample basis is inefficient. The Biobank-scale datasets currently available consist of hundreds of thousands of whole genomes and hundreds of terabytes of compressed VCF. Row-wise data storage is fundamentally unsuitable and a more scalable approach is needed.

RESULTS: Zarr is a format for storing multidimensional data that is widely used across the sciences, and is ideally suited to massively parallel processing. We present the VCF Zarr specification, an encoding of the VCF data model using Zarr, along with fundamental software infrastructure for efficient and reliable conversion at scale. We show how this format is far more efficient than standard VCF-based approaches, and competitive with specialized methods for storing genotype data in terms of compression ratios and single-threaded calculation performance. We present case studies on subsets of 3 large human datasets (Genomics England: $n$=78,195; Our Future Health: $n$=651,050; All of Us: $n$=245,394) along with whole genome datasets for Norway Spruce ($n$=1,063) and SARS-CoV-2 ($n$=4,484,157). We demonstrate the potential for VCF Zarr to enable a new generation of high-performance and cost-effective applications via illustrative examples using cloud computing and GPUs.

CONCLUSIONS: Large row-encoded VCF files are a major bottleneck for current research, and storing and processing these files incurs a substantial cost. The VCF Zarr specification, building on widely used, open-source technologies, has the potential to greatly reduce these costs, and may enable a diverse ecosystem of next-generation tools for analysing genetic variation data directly from cloud-based object stores, while maintaining compatibility with existing file-oriented workflows.
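The row-wise versus field-wise access argument in the abstract above can be made concrete with a toy byte count. The sketch below is not the VCF Zarr implementation and does not use the Zarr library; the field names, sizes, and packed-record layout are assumptions chosen only for illustration. It packs the same made-up data once as interleaved per-variant records and once as one buffer per field, then compares how many bytes must be scanned to read a single field.

```python
import random
import struct

random.seed(0)
n_variants, n_samples = 1000, 50

# Hypothetical toy fields, loosely modelled on VCF columns.
pos = list(range(n_variants))
qual = [random.random() for _ in range(n_variants)]
gt = [[random.randrange(3) for _ in range(n_samples)] for _ in range(n_variants)]

# Row-wise store: one packed record per variant with all fields interleaved
# (int32 position, float32 quality, one int8 genotype code per sample).
record = struct.Struct(f"<if{n_samples}b")
row_store = b"".join(record.pack(p, q, *g) for p, q, g in zip(pos, qual, gt))

# Column-wise store: one independent buffer per field.
col_store = {
    "pos": struct.pack(f"<{n_variants}i", *pos),
    "qual": struct.pack(f"<{n_variants}f", *qual),
    "gt": b"".join(struct.pack(f"<{n_samples}b", *g) for g in gt),
}


def qual_from_rows(buf):
    """Read QUAL by visiting every record in the row-wise store."""
    return [record.unpack_from(buf, i * record.size)[1] for i in range(n_variants)]


def qual_from_columns(store):
    """Read QUAL directly from its own buffer."""
    return list(struct.unpack(f"<{n_variants}f", store["qual"]))


bytes_scanned_rows = len(row_store)             # every record is traversed
bytes_scanned_columns = len(col_store["qual"])  # only the QUAL buffer is read

print(bytes_scanned_rows, bytes_scanned_columns)
```

Both readers return the same values, but in this toy layout the row-wise reader has to walk through every genotype to reach the quality scores, and the gap widens as the number of samples grows.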

Sept, Corriene E, Esther Tak, Viraat Goel, Mital S Bhakta, Christian G Cerda-Smith, Haley M Hutchinson, Marco Blanchette, et al. 2025. “High-Resolution CTCF Footprinting Reveals Impact of Chromatin State on Cohesin Extrusion.” Nature Communications 16 (1): 4506. https://doi.org/10.1038/s41467-025-57775-w.

Cohesin-mediated DNA loop extrusion enables gene regulation by distal enhancers through the establishment of chromosome structure and long-range enhancer-promoter interactions. The best characterized cohesin-related structures, such as topologically associating domains (TADs) anchored at convergent CTCF binding sites, represent static conformations. Consequently, loop extrusion dynamics remain poorly understood. To better characterize static and dynamically extruding chromatin loop structures, we use MNase-based 3D genome assays to simultaneously determine CTCF and cohesin localization as well as the 3D contacts they mediate. Here we present CTCF Analyzer (with) Multinomial Estimation (CAMEL), a tool that identifies CTCF footprints at near base-pair resolution in CTCF MNase HiChIP. We also use Region Capture Micro-C to identify a CTCF-adjacent footprint that is attributed to cohesin occupancy. We leverage this substantial advance in resolution to determine that the fully extruded (CTCF-CTCF loop) state is rare genome-wide with locus-specific variation from 1-10%. We further investigate the impact of chromatin state on loop extrusion dynamics and find that active regulatory elements impede cohesin extrusion. These findings support a model of topological regulation whereby the transient, partially extruded state facilitates enhancer-promoter contacts that can regulate transcription.

Nagano, Masahiro, and Anders S Hansen. 2025. “Distance Matters: How Protein Regulators Facilitate Enhancer-Promoter Interactions and Transcription.” Cell Genomics 5 (3): 100817. https://doi.org/10.1016/j.xgen.2025.100817.

Cohesin, transcription factors (TFs), and mediator complex components regulate gene expression partly by regulating enhancer-promoter (E-P) communication. A new study combined E-P distance-controlled reporter screens with the inhibition or degradation of regulatory proteins and uncovered a distance-dependent effect across cohesin, TFs, and mediator complex components.

Narducci, Domenic N, and Anders S Hansen. 2025. “Putative Looping Factor ZNF143/ZFP143 Is an Essential Transcriptional Regulator With No Looping Function.” Molecular Cell 85 (1): 9-23.e9. https://doi.org/10.1016/j.molcel.2024.11.032.

Interactions between distal loci, including those involving enhancers and promoters, are a central mechanism of gene regulation in mammals, yet the protein regulators of these interactions remain largely undetermined. The zinc-finger transcription factor (TF) ZNF143/ZFP143 has been strongly implicated as a regulator of chromatin interactions, functioning either with or without CTCF. However, how ZNF143/ZFP143 functions as a looping factor is not well understood. Here, we tagged both CTCF and ZNF143/ZFP143 with dual-purpose degron/imaging tags to combinatorially assess their looping function and effect on each other. We find that ZNF143/ZFP143, contrary to prior reports, possesses no general looping function in mouse and human cells and that it largely functions independently of CTCF. Instead, ZNF143/ZFP143 is an essential and highly conserved transcription factor that largely binds promoters proximally, exhibits an extremely stable chromatin dwell time (>20 min), and regulates an important subset of mitochondrial and ribosomal genes.

Iyer, Ashwin R, Aishwarya Gurumurthy, Shih-Chun A Chu, Rohan Kodgule, Athalee R Aguilar, Travis Saari, Abdullah Ramzan, et al. 2025. “Selective Enhancer Dependencies in MYC-Intact and MYC-Rearranged Germinal Center B-Cell Diffuse Large B-Cell Lymphoma.” Blood Cancer Discovery 6 (3): 233-53. https://doi.org/10.1158/2643-3230.BCD-24-0126.

Aberrant MYC activity defines the most aggressive GCB-DLBCLs. We characterized a mechanism of MYC transcriptional activation via a native enhancer that is active in MYC-intact GCB-DLBCL, establishing fitness-sustaining cis- and trans-regulatory circuitry in GCB-DLBCL models that lack MYC enhancer-hijacking rearrangement. See related commentary by Mulet-Lazaro and Delwel, p. 149.

Marston, Nicholas A, Frederick K Kamanu, Giorgio E M Melloni, Gavin Schnitzler, Aaron Hakim, Rosa X Ma, Helen Kang, et al. 2025. “Endothelial Cell-Related Genetic Variants Identify LDL Cholesterol-Sensitive Individuals Who Derive Greater Benefit from Aggressive Lipid Lowering.” Nature Medicine 31 (3): 963-69. https://doi.org/10.1038/s41591-025-03533-w.

The role of endothelial cell (EC) dysfunction in contributing to an individual's susceptibility to coronary atherosclerosis and how low-density lipoprotein cholesterol (LDL-C) concentrations might modify this relationship have not been previously studied. Here, from an examination of genome-wide significant single nucleotide polymorphisms associated with coronary artery disease (CAD), we identified variants with effects on EC function and constructed a 35 single nucleotide polymorphism polygenic risk score comprising these EC-specific variants (EC PRS). The association of the EC PRS with the risk of incident cardiovascular disease was tested in 3 cohorts: a primary prevention population in the UK Biobank (UKBB; n = 348,967); a primary prevention cohort from a trial that tested a statin (JUPITER, n = 8,749); and a secondary prevention cohort that tested a PCSK9 inhibitor (FOURIER, n = 14,298). In the UKBB, the EC PRS was independently associated with the risk of incident CAD (adjusted hazard ratio (aHR) per 1 s.d. of 1.24 (95% CI 1.21-1.26), $P < 2 \times 10^{-16}$). Moreover, LDL-C concentration significantly modified this risk: the aHR per 1 s.d. was 1.26 (1.22-1.30) when LDL-C was 150 mg dl$^{-1}$ but 1.00 (0.85-1.16) when LDL-C was 50 mg dl$^{-1}$ ($P_{\text{interaction}}$ = 0.004). The clinical benefit of LDL-C lowering was significantly greater in individuals with a high EC PRS than in individuals with low or intermediate EC PRS, with relative risk reductions of 68% (HR 0.32 (0.18-0.59)) versus 29% (HR 0.71 (0.52-0.95)) in the primary prevention cohort ($P_{\text{interaction}}$ = 0.02) and 33% (HR 0.67 (0.53-0.83)) versus 8% (HR 0.92 (0.82-1.03)) in the secondary prevention cohort ($P_{\text{interaction}}$ = 0.01). We conclude that EC PRS quantifies an independent axis of CAD risk that is not currently captured in medical practice and identifies individuals who are more sensitive to the atherogenic effects of LDL-C and who would potentially derive substantially greater benefit from aggressive LDL-C lowering.
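The relative risk reductions quoted in the abstract above are consistent with the usual conversion from a hazard ratio, RRR = (1 − HR) × 100%. The short check below assumes that conversion (the abstract does not state the formula) and uses a hypothetical helper name, applying it to the four reported hazard ratios.

```python
def relative_risk_reduction(hazard_ratio: float) -> int:
    """Relative risk reduction in percent, assuming RRR = (1 - HR) * 100."""
    return round((1.0 - hazard_ratio) * 100)


# Hazard ratios as reported in the abstract (point estimates only).
reported = {
    "primary prevention, high EC PRS": 0.32,
    "primary prevention, low/intermediate EC PRS": 0.71,
    "secondary prevention, high EC PRS": 0.67,
    "secondary prevention, low/intermediate EC PRS": 0.92,
}

for group, hr in reported.items():
    print(f"{group}: HR {hr:.2f} -> RRR {relative_risk_reduction(hr)}%")
```

Under this assumption the four hazard ratios map to 68%, 29%, 33%, and 8%, matching the figures given in the abstract.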

Christiansen, Gitte Bundgaard, Liselotte Vogdrup Petersen, Hannah Chatwin, Zeynep Yilmaz, Diana Schendel, Cynthia M Bulik, Jakob Grove, et al. 2025. “The Role of Co-Occurring Conditions and Genetics in the Associations of Eating Disorders With Attention-Deficit/Hyperactivity Disorder and Autism Spectrum Disorder.” Molecular Psychiatry 30 (5): 2127-36. https://doi.org/10.1038/s41380-024-02825-w.

Eating disorders (EDs) commonly co-occur with other psychiatric and neurodevelopmental disorders including attention-deficit/hyperactivity disorder (ADHD) and autism spectrum disorder (ASD); however, the pattern of family history and genetic overlap among them requires clarification. This study investigated the diagnostic, familial, and genetic associations of EDs with ADHD and ASD. The nationwide population-based cohort study included all individuals born in Denmark, 1981-2008, linked to their siblings and cousins. Cox regression was used to estimate associations between EDs and ADHD or ASD, and mediation analysis was used to assess the effects of intermediate mood or anxiety disorders. Polygenic scores (PGSs) were used to investigate the genetic association between anorexia nervosa (AN) and ADHD or ASD. Significantly increased risk for any ED was observed following an ADHD or ASD diagnosis. Mediation analysis suggested that intermediate mood or anxiety disorders could account for 44%-100% of the association between ADHD or ASD and ED. Individuals with a full sibling or maternal half sibling with ASD had increased risk of AN compared to those with siblings without ASD. A positive association was found between ASD-PGS and AN risk whereas a negative association was found between AN-PGS and ADHD. In this study, positive phenotypic associations between EDs and ADHD or ASD, mediation by mood or anxiety disorder, and genetic associations between ASD-PGS and AN and between AN-PGS and ADHD were observed. These findings could guide future research in the development of new treatments that can mitigate the development of EDs among individuals with ADHD or ASD.

Martyn, Gabriella E, Michael T Montgomery, Hank Jones, Katherine Guo, Benjamin R Doughty, Johannes Linder, Deepa Bisht, et al. 2025. “Rewriting Regulatory DNA to Dissect and Reprogram Gene Expression.” Cell 188 (12): 3349-3366.e23. https://doi.org/10.1016/j.cell.2025.03.034.

Regulatory DNA provides a platform for transcription factor binding to encode cell-type-specific patterns of gene expression. However, the effects and programmability of regulatory DNA sequences remain difficult to map or predict. Here, we develop variant effects from flow-sorting experiments with CRISPR targeting screens (Variant-EFFECTS) to introduce hundreds of designed edits to endogenous regulatory DNA and quantify their effects on gene expression. We systematically dissect and reprogram 3 regulatory elements for 2 genes in 2 cell types. These data reveal endogenous binding sites with effects specific to genomic context, transcription factor motifs with cell-type-specific activities, and limitations of computational models for predicting the effect sizes of variants. We identify small edits that can tune gene expression over a large dynamic range, suggesting new possibilities for prime-editing-based therapeutics targeting regulatory DNA. Variant-EFFECTS provides a generalizable tool to dissect regulatory DNA and to identify genome editing reagents that tune gene expression in an endogenous context.

Bjune, Jan-Inge, Samantha Laber, Laurence Lawrence-Archer, Patrizia M C Nothnagel, Shuntaro Yamada, Xu Zhao, Pouda Panahandeh Strømland, et al. 2025. “IRX3 Controls a SUMOylation-Dependent Differentiation Switch in Adipocyte Precursor Cells.” Nature Communications 16 (1): 7248. https://doi.org/10.1038/s41467-025-62361-1.

IRX3 is linked to predisposition to obesity through the FTO locus and is upregulated during early adipogenesis in risk-allele carriers, shifting adipocyte fate toward fat storage. However, how this elevated IRX3 expression influences later developmental stages remains unclear. Here we show that IRX3 regulates adipocyte fate by modulating epigenetic reprogramming. ChIP-sequencing in preadipocytes identifies over 300 IRX3 binding sites, predominantly at promoters of genes involved in SUMOylation and chromatin remodeling. IRX3 knockout alters expression of SUMO pathway genes, increases global SUMOylation, and inhibits PPARγ activity and adipogenesis. Pharmacological SUMOylation inhibition rescues these effects. IRX3 KO also reduces SUMO occupancy at Wnt-related genes, enhancing Wnt signaling and promoting osteogenic fate in 3D cultures. This fate switch is partially reversible by SUMOylation inhibition. We identify IRX3 as a key transcriptional regulator of epigenetic programs, acting upstream of SUMOylation to maintain mesenchymal identity and support adipogenesis while suppressing osteogenesis in mouse embryonic fibroblasts.