Publications

2025

Marshall, Lysette, Soumya Raychaudhuri, and Sebastien Viatte. 2025. “Understanding Rheumatic Disease through Continuous Cell State Analysis.” Nature Reviews Rheumatology. https://doi.org/10.1038/s41584-025-01253-6.

Autoimmune rheumatic diseases are a heterogeneous group of conditions, including rheumatoid arthritis (RA) and systemic lupus erythematosus. With the increasing availability of large single-cell datasets, novel disease-associated cell types continue to be identified and characterized at multiple omics layers, for example, 'T peripheral helper' (TPH) (CXCR5−PD-1hi) cells in RA and systemic lupus erythematosus and MerTK+ myeloid cells in RA. Despite efforts to define disease-relevant cell atlases, the very definition of a 'cell type' or 'lineage' has proven controversial as higher resolution assays emerge. This Review explores the cell types and states involved in disease pathogenesis, with a focus on the shifting perspectives on immune and stromal cell taxonomy. These understandings of cell identity are closely related to the computational methods adopted for analysis, with implications for the interpretation of single-cell data. Understanding the underlying cellular architecture of disease is also crucial for therapeutic research as ambiguity hinders translation to the clinical setting. We discuss the implications of different frameworks for cell identity for disease treatment and the discovery of predictive biomarkers for stratified medicine, an unmet clinical need for autoimmune rheumatic diseases.

Weinand, Kathryn, Erica M Langan, Michelle Curtis, and Soumya Raychaudhuri. 2025. “Defining Effective Strategies to Integrate Multi-Sample Single-Nucleus ATAC-Seq Datasets via a Multimodal-Guided Approach.” bioRxiv: The Preprint Server for Biology. https://doi.org/10.1101/2025.04.02.646871.

BACKGROUND: Chromatin accessibility, measured via single-nucleus Assay for Transposase-Accessible Chromatin with sequencing (snATAC-seq), can reveal the underpinnings of transcriptional regulation across heterogeneous cell states. As the number and scale of snATAC-seq datasets increase, we need robust computational pipelines to integrate samples within a dataset and datasets across studies. These integration pipelines should correct cell-state-obfuscating technical effects while conserving underlying biological cell states, as has been shown for single-cell RNA-seq (scRNA-seq) pipelines. However, scRNA-seq integration methods have performed inconsistently on snATAC-seq datasets, potentially due to sparsity and genomic feature differences.

RESULTS: Using single-nucleus multimodal datasets profiling ATAC and RNA simultaneously, we can measure snATAC-seq integration method performance by comparison to independently integrated snRNA-seq gold standard embeddings and annotations. Here, we benchmark 58 pipelines, incorporating 7 integration methods plus 1 embedding correction method with 5 feature sets. Using our command-line tool, we assessed 5 multimodal datasets at 3 different resolutions using 2 novel metrics to determine the best practices for multi-sample snATAC-seq integration. ATAC features outperformed Gene Activity Score (GAS) features, and embedding correction with Harmony was generally useful. SnapATAC2, PeakVI, and ArchR's iterative Latent Semantic Indexing (LSI) performed well.

CONCLUSIONS: We recommend SnapATAC2 + Harmony with pre-defined ENCODE candidate cis-regulatory element (cCRE) features as a first-pass pipeline given its metric performance, generalizability of features, and method resource-efficiency. This and other high-performing pipelines will guide future comprehensive gene regulation maps.

Inamo, Jun, Joshua Keegan, Alec Griffith, Tusharkanti Ghosh, Alice Horisberger, Kaitlyn Howard, John F Pulford, et al. 2025. “Deep Immunophenotyping Reveals Circulating Activated Lymphocytes in Individuals at Risk for Rheumatoid Arthritis.” The Journal of Clinical Investigation 135 (6). https://doi.org/10.1172/JCI185217.

Rheumatoid arthritis (RA) is a systemic autoimmune disease currently with no universally highly effective prevention strategies. Identifying pathogenic immune phenotypes in at-risk populations prior to clinical onset is crucial to establishing effective prevention strategies. Here, we applied multimodal single-cell technologies (mass cytometry and CITE-Seq) to characterize the immunophenotypes in blood from at-risk individuals (ARIs) identified through the presence of serum antibodies against citrullinated protein antigens (ACPAs) and/or first-degree relative (FDR) status, as compared with patients with established RA and people in a healthy control group. We identified significant cell expansions in ARIs compared with controls, including CCR2+CD4+ T cells, T peripheral helper (Tph) cells, type 1 T helper cells, and CXCR5+CD8+ T cells. We also found that CD15+ classical monocytes were specifically expanded in ACPA-negative FDRs, and an activated PAX5lo naive B cell population was expanded in ACPA-positive FDRs. Further, we uncovered the molecular phenotype of the CCR2+CD4+ T cells, expressing high levels of Th17- and Th22-related signature transcripts including CCR6, IL23R, KLRB1, CD96, and IL22. Our integrated study provides a promising approach to identify targets to improve prevention strategy development for RA.

Xu, Ziqi, Arya Massarat, Laurie Rumker, Melissa Gymrek, Soumya Raychaudhuri, Wei Zhou, and Tiffany Amariuta. 2025. “Estimating the Cis-Heritability of Gene Expression Using Single Cell Expression Profiles Controls False Positive Rate of eGene Detection.” bioRxiv: The Preprint Server for Biology. https://doi.org/10.1101/2025.02.24.639892.

For gene expression traits, cis-genetic heritability can quantify the strength of genetic regulation in particular cell types, elucidating the cell-type-specificity of disease variants and genes. To estimate gene expression heritability, standard models require a single gene expression value per individual, forcing data from single cell RNA-sequencing (scRNA-seq) experiments to be "pseudobulked". Here, we show that applying standard heritability models to pseudobulk data overestimates gene expression heritability and produces inflated false positive rates for detecting cis-heritable genes. Therefore, we introduce a new method called scGeneHE (single cell Gene expression Heritability Estimation), a Poisson mixed-effects model that quantifies the cis-genetic component of gene expression using individual cellular profiles. In simulations, scGeneHE has a consistently well-calibrated false positive rate for eGene detection and unbiasedly estimates cis-heritability at many parameter settings. We applied scGeneHE to scRNA-seq data from 969 individuals, 11 immune cell types, and 822,552 cells from the OneK1K cohort to infer cell-type-specificity of genetic regulation at risk genes for immune-mediated diseases and trace the fluctuation of cis-heritability across cellular populations of varying resolution. In summary, we developed a new statistical method that resolves the analytical challenge of estimating gene expression cis-heritability from native scRNA-seq data.
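As an illustration of the "pseudobulking" step this abstract refers to, the following is a minimal sketch that assumes a cells-by-genes count matrix and one donor label per cell. The function name `pseudobulk` and its arguments are hypothetical and chosen for this example only; this is not code from scGeneHE, and summing counts is only one common way to aggregate (averaging is another).

```python
import numpy as np

def pseudobulk(counts, donor_ids):
    """Collapse per-cell counts to one expression profile per donor.

    counts:    (n_cells, n_genes) array-like of non-negative counts.
    donor_ids: length-n_cells sequence of donor labels (hypothetical input).
    Returns (donors, bulk), where bulk has shape (n_donors, n_genes) and
    row i is the sum over all cells belonging to donors[i].
    """
    counts = np.asarray(counts)
    # Map each cell to the row index of its donor (donors come back sorted).
    donors, inverse = np.unique(np.asarray(donor_ids), return_inverse=True)
    bulk = np.zeros((donors.size, counts.shape[1]), dtype=counts.dtype)
    # Unbuffered accumulation so repeated donor indices are all added.
    np.add.at(bulk, inverse, counts)
    return donors, bulk

# Three cells, two genes, two donors.
donors, bulk = pseudobulk([[1, 0], [2, 1], [0, 3]], ["a", "b", "a"])
```

After this aggregation each donor contributes a single value per gene, so cell-to-cell variation within a donor is no longer visible to the downstream model. That loss is the limitation the abstract sets out to address by modeling individual cellular profiles directly.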

Millard, Nghia, Jonathan H Chen, Mukta G Palshikar, Karin Pelka, Maxwell Spurrell, Colles Price, Jiang He, Nir Hacohen, Soumya Raychaudhuri, and Ilya Korsunsky. 2025. “Batch Correcting Single-Cell Spatial Transcriptomics Count Data With Crescendo Improves Visualization and Detection of Spatial Gene Patterns.” Genome Biology 26 (1): 36. https://doi.org/10.1186/s13059-025-03479-9.

Spatial transcriptomics facilitates gene expression analysis of cells in their spatial anatomical context. Batch effects hinder visualization of gene spatial patterns across samples. We present the Crescendo algorithm to correct for batch effects at the gene expression level and enable accurate visualization of gene expression patterns across multiple samples. We show Crescendo's utility and scalability across three datasets ranging from 170,000 to 7 million single cells across spatial and single-cell RNA sequencing technologies. By correcting for batch effects, Crescendo enhances spatial transcriptomics analyses to detect gene colocalization and ligand-receptor interactions and enables cross-technology information transfer.

Reshef, Yakir, Lakshay Sood, Michelle Curtis, Laurie Rumker, Daniel J Stein, Mukta G Palshikar, Saba Nayar, et al. 2025. “Powerful and Accurate Case-Control Analysis of Spatial Molecular Data With Deep Learning-Defined Tissue Microniches.” bioRxiv: The Preprint Server for Biology. https://doi.org/10.1101/2025.02.07.637149.

As spatial molecular data grow in scope and resolution, there is a pressing need to identify key spatial structures associated with disease. Current approaches often rely on hand-crafted features such as local abundances of manually annotated, discrete cell types, which may overlook important signals. Here we introduce variational inference-based microniche analysis (VIMA), a method that combines deep learning with principled statistics to discover associated spatial features with greater flexibility and precision. VIMA uses a variational autoencoder to extract numerical "fingerprints" from small tissue patches that capture their biological content. It uses these fingerprints to define a large number of "microniches": small, potentially overlapping groups of tissue patches with highly similar biology that span multiple samples. It then uses rigorous statistics to identify microniches whose abundance correlates with case-control status. We show in simulations that VIMA is well calibrated and more powerful and accurate than other approaches. We then apply VIMA to a 140-gene spatial transcriptomics dataset in Alzheimer's dementia, a 54-marker CO-Detection by indEXing (CODEX) dataset in ulcerative colitis (UC), and a 7-marker immunohistochemistry dataset in rheumatoid arthritis (RA), in each case recapitulating known biology and identifying novel spatial features of disease.

Donado, Carlos A, Erin Theisen, Fan Zhang, Aparna Nathan, Madison L Fairfield, Karishma Vijay Rupani, Dominique Jones, et al. 2025. “Granzyme K Activates the Entire Complement Cascade.” Nature. https://doi.org/10.1038/s41586-025-08713-9.

Granzymes are a family of serine proteases mainly expressed by CD8+ T cells, natural killer cells, and innate-like lymphocytes. Although their primary function is thought to be the induction of cell death in virally infected and tumor cells, accumulating evidence indicates certain granzymes can elicit inflammation by acting on extracellular substrates. Recently, we found that the majority of tissue CD8+ T cells in rheumatoid arthritis (RA) synovium and in inflamed organs across other diseases express granzyme K (GZMK), a tryptase-like protease with poorly defined function. Here, we show that GZMK can activate the complement cascade by cleaving C2 and C4. The nascent C4b and C2b fragments form a C3 convertase that cleaves C3, enabling assembly of a C5 convertase that cleaves C5. The resulting convertases generate all the effector molecules of the complement cascade: the anaphylatoxins C3a and C5a, the opsonins C4b and C3b, and the membrane attack complex. In RA synovium, GZMK is enriched in regions with abundant complement activation, and fibroblasts are the major producers of complement proteins that serve as substrates for GZMK-mediated complement activation. Further, Gzmk-deficient mice have less severe arthritis and dermatitis with concomitant decreases in complement activation. Our findings describe the discovery of a previously unidentified mechanism of complement activation that is entirely driven by lymphocyte-derived GZMK. Given the widespread abundance of GZMK-expressing T cells in tissues in chronic inflammatory diseases, GZMK-mediated complement activation is likely to be an important contributor to tissue inflammation in multiple disease contexts.

Mueller, Alisa A, Angela E Zou, Lucy-Jayne Marsh, Samuel Kemble, Saba Nayar, Gerald F M Watts, Cassandra L Murphy, et al. 2025. “Wnt Signaling Drives Stromal Inflammation in Inflammatory Arthritis.” bioRxiv: The Preprint Server for Biology. https://doi.org/10.1101/2025.01.06.631510.

The concept that fibroblasts are critical mediators of inflammation is an emerging paradigm. In rheumatoid arthritis (RA), they are the main producers of IL-6 as well as a host of other cytokines and chemokines. Their pathologic activation also directly causes cartilage and bone degradation. Yet, therapeutic agents specifically targeting fibroblasts are not available. Here, we find that Wnt receptors and modulators are predominantly expressed in stromal populations in the synovium. Importantly, non-canonical Wnt activation induces robust inflammatory gene expression including an abundance of cytokines and chemokines in synovial fibroblasts in vitro. Strikingly, the addition of Wnt ligands or inhibition of Wnt secretion exacerbates or reduces arthritis severity, respectively, in vivo in a murine model of inflammatory arthritis. These observations are relevant in human disease, as Wnt activation signatures are enhanced in fibroblasts derived from inflamed RA synovial tissue as well as fibroblasts across other inflammatory diseases. Together, these findings implicate Wnt signaling as a major driver of fibroblast-mediated inflammation and joint pathology. They further suggest that targeting the Wnt pathway is a therapeutically relevant approach to rheumatoid arthritis, particularly in patients who do not respond to conventional treatments and who often express fibroblast-predominant synovial phenotypes.

2024

Lagattuta, Kaitlyn A, Ayano C Kohlgruber, Nouran S Abdelfattah, Aparna Nathan, Laurie Rumker, Michael E Birnbaum, Stephen J Elledge, and Soumya Raychaudhuri. 2024. “The T Cell Receptor Sequence Influences the Likelihood of T Cell Memory Formation.” Cell Reports 44 (1): 115098. https://doi.org/10.1016/j.celrep.2024.115098.

The amino acid sequence of the T cell receptor (TCR) varies between T cells of an individual's immune system. Particular TCR residues nearly guarantee mucosal-associated invariant T (MAIT) and natural killer T (NKT) cell transcriptional fates. To define how the TCR sequence affects T cell fates, we analyze the paired αβTCR sequence and transcriptome of 961,531 single cells. We find that hydrophobic complementarity-determining region (CDR)3 residues promote regulatory T cell fates in both the CD8 and CD4 lineages. Most strikingly, we find a set of TCR sequence features that promote the T cell transition from naive to memory. We quantify the extent of these features through our TCR scoring function "TCR-mem." Using TCR transduction experiments, we demonstrate that increased TCR-mem promotes T cell activation, even among T cells that recognize the same antigen. Our results reveal a common set of TCR sequence features that enable T cell activation and immunological memory.

Tegtmeyer, Matthew, Jatin Arora, Samira Asgari, Beth A Cimini, Ajay Nadig, Emily Peirent, Dhara Liyanage, et al. 2024. “High-Dimensional Phenotyping to Define the Genetic Basis of Cellular Morphology.” Nature Communications 15 (1): 347. https://doi.org/10.1038/s41467-023-44045-w.

The morphology of cells is dynamic and mediated by genetic and environmental factors. Characterizing how genetic variation impacts cell morphology can provide an important link between disease association and cellular function. Here, we combine genomic sequencing and high-content imaging approaches on iPSCs from 297 unique donors to investigate the relationship between genetic variants and cellular morphology to map what we term cell morphological quantitative trait loci (cmQTLs). We identify novel associations between rare protein altering variants in WASF2, TSPAN15, and PRLR with several morphological traits related to cell shape, nucleic granularity, and mitochondrial distribution. Knockdown of these genes by CRISPRi confirms their role in cell morphology. Analysis of common variants yields one significant association and nominates over 300 variants with suggestive evidence (P < 10⁻⁶) of association with one or more morphology traits. We then use these data to make predictions about sample size requirements for increasing discovery in cellular genetic studies. We conclude that, similar to molecular phenotypes, morphological profiling can yield insight about the function of genes and variants.