Publications

2022

Schlesinger, Daphne E., Nathaniel Diamant, Aniruddh Raghu, Erik Reinertsen, Katherine Young, Puneet Batra, Eugene Pomerantsev, and Collin M. Stultz. (2026) 2022. “A Deep Learning Model for Inferring Elevated Pulmonary Capillary Wedge Pressures From the 12-Lead Electrocardiogram”. JACC: Advances 1 (1): 100003. https://doi.org/10.1016/j.jacadv.2022.100003.

Pirruccello, James P., Mark D. Chaffin, Elizabeth L. Chou, Stephen J. Fleming, Honghuang Lin, Mahan Nekoui, Shaan Khurshid, et al. (2026) 2022. “Deep Learning Enables Genetic Analysis of the Human Thoracic Aorta”. Nature Genetics 54 (1): 40-51. https://doi.org/10.1038/s41588-021-00962-4.

Enlargement or aneurysm of the aorta predisposes to dissection, an important cause of sudden death. We trained a deep learning model to evaluate the dimensions of the ascending and descending thoracic aorta in 4.6 million cardiac magnetic resonance images from the UK Biobank. We then conducted genome-wide association studies in 39,688 individuals, identifying 82 loci associated with ascending and 47 with descending thoracic aortic diameter, of which 14 loci overlapped. Transcriptome-wide analyses, rare-variant burden tests and human aortic single nucleus RNA sequencing prioritized genes including SVIL, which was strongly associated with descending aortic diameter. A polygenic score for ascending aortic diameter was associated with thoracic aortic aneurysm in 385,621 UK Biobank participants (hazard ratio = 1.43 per s.d., confidence interval 1.32–1.54, P = 3.3 × 10^−20). Our results illustrate the potential for rapidly defining quantitative traits with deep learning, an approach that can be broadly applied to biomedical images.

Khurshid, Shaan, Samuel Friedman, Christopher Reeder, Paolo Di Achille, Nathaniel Diamant, Pulkit Singh, Lia X. Harrington, et al. (2026) 2022. “ECG-Based Deep Learning and Clinical Risk Factors to Predict Atrial Fibrillation”. Circulation 145 (2): 122-33. https://doi.org/10.1161/CIRCULATIONAHA.121.057480.

Background: Artificial intelligence (AI)–enabled analysis of 12-lead ECGs may facilitate efficient estimation of incident atrial fibrillation (AF) risk. However, it remains unclear whether AI provides meaningful and generalizable improvement in predictive accuracy beyond clinical risk factors for AF. Methods: We trained a convolutional neural network (ECG-AI) to infer 5-year incident AF risk using 12-lead ECGs in patients receiving longitudinal primary care at Massachusetts General Hospital (MGH). We then fit 3 Cox proportional hazards models, composed of ECG-AI 5-year AF probability, CHARGE-AF clinical risk score (Cohorts for Heart and Aging in Genomic Epidemiology–Atrial Fibrillation), and terms for both ECG-AI and CHARGE-AF (CH-AI), respectively. We assessed model performance by calculating discrimination (area under the receiver operating characteristic curve) and calibration in an internal test set and 2 external test sets (Brigham and Women’s Hospital [BWH] and UK Biobank). Models were recalibrated to estimate 2-year AF risk in the UK Biobank given limited available follow-up. We used saliency mapping to identify ECG features most influential on ECG-AI risk predictions and assessed correlation between ECG-AI and CHARGE-AF linear predictors. Results: The training set comprised 45 770 individuals (age 55±17 years, 53% women, 2171 AF events) and the test sets comprised 83 162 individuals (age 59±13 years, 56% women, 2424 AF events). Area under the receiver operating characteristic curve was comparable using CHARGE-AF (MGH, 0.802 [95% CI, 0.767–0.836]; BWH, 0.752 [95% CI, 0.741–0.763]; UK Biobank, 0.732 [95% CI, 0.704–0.759]) and ECG-AI (MGH, 0.823 [95% CI, 0.790–0.856]; BWH, 0.747 [95% CI, 0.736–0.759]; UK Biobank, 0.705 [95% CI, 0.673–0.737]). Area under the receiver operating characteristic curve was highest using CH-AI (MGH, 0.838 [95% CI, 0.807 to 0.869]; BWH, 0.777 [95% CI, 0.766 to 0.788]; UK Biobank, 0.746 [95% CI, 0.716 to 0.776]). Calibration error was low using ECG-AI (MGH, 0.0212; BWH, 0.0129; UK Biobank, 0.0035) and CH-AI (MGH, 0.012; BWH, 0.0108; UK Biobank, 0.0001). In saliency analyses, the ECG P-wave had the greatest influence on AI model predictions. ECG-AI and CHARGE-AF linear predictors were correlated (Pearson r: MGH, 0.61; BWH, 0.66; UK Biobank, 0.41). Conclusions: AI-based analysis of 12-lead ECGs has similar predictive usefulness to a clinical risk factor model for incident AF and the approaches are complementary. ECG-AI may enable efficient quantification of future AF risk.
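
As an illustration of the discrimination metric named in this abstract, the following is a minimal sketch of the area under the receiver operating characteristic curve, computed as the fraction of case/non-case pairs in which the case receives the higher predicted risk. It assumes a plain binary outcome with no censoring, which is a simplification: the abstract does not say how follow-up time entered the calculation. The function name `auroc` and the toy values are hypothetical and are not study data.

```python
def auroc(scores, labels):
    """Fraction of (case, non-case) pairs ranked correctly; ties count as 0.5."""
    cases = [s for s, y in zip(scores, labels) if y == 1]
    non_cases = [s for s, y in zip(scores, labels) if y == 0]
    if not cases or not non_cases:
        raise ValueError("need at least one case and one non-case")
    wins = 0.0
    for c in cases:
        for n in non_cases:
            if c > n:
                wins += 1.0
            elif c == n:
                wins += 0.5
    return wins / (len(cases) * len(non_cases))


# Hypothetical predicted risks and outcomes (1 = incident AF), not study data.
risk = [0.02, 0.10, 0.35, 0.08, 0.50, 0.04]
af = [0, 0, 1, 1, 1, 0]
toy_auroc = auroc(risk, af)  # 8 of the 9 case/non-case pairs are ordered correctly
```

A value of 0.5 corresponds to chance-level ranking and 1.0 to perfect separation of cases from non-cases.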

Khurshid, Shaan, Julieta Lazarte, James P. Pirruccello, Lu-Chen Weng, Seung Hoan Choi, Amelia W. Hall, Xin Wang, et al. (2026) 2022. “Clinical and Genetic Associations of Deep Learning-Derived Cardiac Magnetic Resonance-Based Left Ventricular Mass”. https://doi.org/10.1101/2022.01.09.22268962.

Increased left ventricular (LV) mass (LVM) and LV hypertrophy (LVH) are risk markers for adverse cardiovascular events, and may indicate an underlying cardiomyopathy. Cardiac magnetic resonance (CMR) is the gold standard for LVM estimation, but is challenging to obtain at scale, which has limited the power of prior genetic analyses. In the current study, we performed a genome-wide association study (GWAS) of CMR-derived LVM indexed to body surface area (LVMI) estimated using a deep learning algorithm within nearly 50,000 participants from the UK Biobank. We identified 12 independent associations (1 known at TTN and 11 novel) meeting genome-wide significance, implicating several candidate genes previously associated with cardiac contractility and cardiomyopathy. Greater CMR-derived LVMI was associated with higher risk of incident dilated (hazard ratio [HR] 2.58 per 1-SD increase, 95% CI 2.10-3.17) and hypertrophic (HR 2.62, 95% CI 2.09-3.30) cardiomyopathies. A polygenic risk score (PRS) for LVMI was also associated with incident hypertrophic cardiomyopathy within a separate set of UK Biobank participants (HR 1.12, 95% CI 1.01-1.12) and among individuals in an external Mass General Brigham dataset (HR 1.18, 95% CI 1.01-1.37). In summary, using CMR-derived LVM available at scale, we have identified 12 common variants associated with LVMI (11 novel) and demonstrated that both CMR-derived and genetically determined LVMI are associated with risk of incident cardiomyopathy. Journal Subject Terms: machine learning, left ventricular hypertrophy, genetics

Klarqvist, Marcus D.R., Saaket Agrawal, Nathaniel Diamant, Patrick T. Ellinor, Anthony Philippakis, Kenney Ng, Puneet Batra, and Amit V. Khera. (2026) 2022. “Estimating Body Fat Distribution - a Driver of Cardiometabolic Health - from Silhouette Images”. https://doi.org/10.1101/2022.01.14.22269328.

Background: Inter-individual variation in fat distribution is increasingly recognized as clinically important but is not routinely assessed in clinical practice because quantification requires medical imaging. Objectives: We hypothesized that a deep learning model trained on an individual's body shape outline - or silhouette - would enable accurate estimation of specific fat depots, including visceral (VAT), abdominal subcutaneous (ASAT), and gluteofemoral (GFAT) adipose tissue volumes, and VAT/ASAT ratio. We additionally set out to study whether silhouette-estimated VAT/ASAT ratio may stratify risk of cardiometabolic diseases independent of body mass index (BMI) and waist circumference. Methods: Two-dimensional coronal and sagittal silhouettes were constructed from whole-body magnetic resonance images in 40,032 participants of the UK Biobank and used to train a convolutional neural network to predict VAT, ASAT, and GFAT volumes, and VAT/ASAT ratio. Logistic and Cox regressions were used to determine the independent association of silhouette-predicted VAT/ASAT ratio with type 2 diabetes and coronary artery disease.

Diamant, Nathaniel, Erik Reinertsen, Steven Song, Aaron D. Aguirre, Collin M. Stultz, and Puneet Batra. (2026) 2022. “Patient Contrastive Learning: A Performant, Expressive, and Practical Approach to Electrocardiogram Modeling”. PLOS Computational Biology 18 (2): e1009862. https://doi.org/10.1371/journal.pcbi.1009862.

Supervised machine learning applications in health care are often limited due to a scarcity of labeled training data. To mitigate the effect of small sample size, we introduce a pre-training approach, Patient Contrastive Learning of Representations (PCLR), which creates latent representations of electrocardiograms (ECGs) from a large number of unlabeled examples using contrastive learning. The resulting representations are expressive, performant, and practical across a wide spectrum of clinical tasks. We develop PCLR using a large health care system with over 3.2 million 12-lead ECGs and demonstrate that training linear models on PCLR representations achieves a 51% performance increase, on average, over six training set sizes and four tasks (sex classification, age regression, and the detection of left ventricular hypertrophy and atrial fibrillation), relative to training neural network models from scratch. We also compared PCLR to three other ECG pre-training approaches (supervised pre-training, unsupervised pre-training with an autoencoder, and pre-training using a contrastive multi ECG-segment approach), and show significant performance benefits in three out of four tasks. We found an average performance benefit of 47% over the other models and an average of a 9% performance benefit compared to the best model for each task. We release PCLR to enable others to extract ECG representations at https://github.com/broadinstitute/ml4h/tree/master/model_zoo/PCLR.

2021

Agrawal, Saaket, Marcus D. R. Klarqvist, Nathaniel Diamant, Patrick T. Ellinor, Nehal N. Mehta, Anthony Philippakis, Kenney Ng, Puneet Batra, and Amit V. Khera. (2026) 2021. “Association of Machine Learning-Derived Measures of Body Fat Distribution in >40,000 Individuals With Cardiometabolic Diseases”. MedRxiv, 2021.05.07.21256854. https://doi.org/10.1101/2021.05.07.21256854.

Background: Obesity is defined based on body-mass index (BMI), a proxy for overall adiposity. However, for any given BMI, individuals vary substantially in fat distribution. The clinical implications of this variability are not fully understood. Methods: We studied MRI imaging data of 40,032 UK Biobank participants. Using previously quantified visceral (VAT), abdominal subcutaneous (ASAT), and gluteofemoral (GFAT) adipose tissue volume in up to 9,041 to train convolutional neural networks, we quantified these depots in the remainder of the participants. We derived new metrics for each adipose depot – fully independent of BMI – by quantifying deviation from values predicted by BMI (e.g. VAT adjusted for BMI, VATadjBMI) and determined associations with cardiometabolic diseases. Results: Machine learning models based on two-dimensional projection images enabled near-perfect estimation of VAT, ASAT, and GFAT, with r^2 in a holdout testing dataset >0.97 for each. Using the newly derived measures of local adiposity – residualized based on BMI – we note marked heterogeneity in associations with cardiometabolic diseases. Taking presence of type 2 diabetes as an example, VATadjBMI was associated with significantly increased risk (odds ratio per standard deviation increase (OR/SD) 1.49; 95%CI: 1.43-1.55), while ASATadjBMI was largely neutral (OR/SD 1.08; 95%CI: 1.03-1.14) and GFATadjBMI conferred protection (OR/SD 0.75; 95%CI: 0.71-0.79). Similar patterns were observed for coronary artery disease. Conclusions: For any given BMI, measures of local adiposity have variable and divergent associations with cardiometabolic diseases.
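
The BMI-adjusted measures described in this abstract are deviations from the value predicted by BMI. A minimal sketch of that idea follows, under the assumption of a single least-squares line fitted across all individuals; the abstract does not specify the regression model, so the functional form, the helper name `residualize`, and the toy values are all hypothetical.

```python
from statistics import mean


def residualize(values, covariate):
    """Return each value minus its least-squares prediction from the covariate."""
    x_bar, y_bar = mean(covariate), mean(values)
    sxx = sum((x - x_bar) ** 2 for x in covariate)
    sxy = sum((x - x_bar) * (y - y_bar) for x, y in zip(covariate, values))
    slope = sxy / sxx
    intercept = y_bar - slope * x_bar
    return [y - (intercept + slope * x) for x, y in zip(covariate, values)]


# Hypothetical toy values (BMI in kg/m^2, VAT in litres), not study data.
bmi = [22.0, 25.0, 28.0, 31.0, 34.0]
vat = [1.8, 3.1, 3.4, 5.2, 5.5]
vat_adj_bmi = residualize(vat, bmi)  # positive = more VAT than BMI predicts
```

By construction these residuals average zero and are uncorrelated with BMI in the sample they were fitted on, which is the sense in which such a measure is independent of BMI.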

Rambarat, Paula, Emily K Zern, Dongyu Wang, Shahrooz Zarbafian, Elizabeth E Liu, Jessica K Wang, Jenna N McNeill, et al. (2026) 2021. “Abstract 10389: Identifying High-Risk Clinical Phenogroups of Pulmonary Hypertension Through a Clustering Analysis”. Circulation 144 (Suppl_1): A10389-A10389. https://doi.org/10.1161/circ.144.suppl_1.10389.

Introduction: The classification and management of pulmonary hypertension (PH) is challenging due to clinical and hemodynamic heterogeneity of patients. We sought to identify distinct phenogroups of PH that are at particularly high-risk for adverse events. Methods: A hospital-based cohort of patients referred for right heart catheterization between 2005-2016 with PH (mean pulmonary artery pressure > 20 mmHg at rest) were included. Key exclusion criteria were shock, cardiac arrest, cardiac transplant or valvular surgery. K-prototypes, an unsupervised clustering algorithm, was used to cluster patients into subgroups based on 11 clinical covariates. The optimal number of clusters was determined using the silhouette method. Results: Among 5208 patients with mean age 64 (SD 12) years, 39% women, we identified 6 phenogroups when clustering on baseline clinical comorbidities (Table 1). Phenogroups 2 and 4 had the greatest baseline prevalence of heart failure (both) and diabetes (group 4). Over a median follow-up of 6.3 (IQR 3.6 to 9.8) years we observed 2182 deaths and 2002 major cardiovascular events (MACE). Phenogroups 2 and 4 had the highest risk for future adverse events including death (age and sex adjusted HR 1.33, 95% CI 1.05-1.68 and 1.42, 95% CI 1.13-1.77, each compared with the lowest risk group 3 respectively) and MACE (HR 5.97, 95% CI 4.83-7.38 and 4.06, 95% CI 3.30-4.99, compared with group 3 respectively; Figure 1). Conclusions: Cluster-based analyses identify patients with PH and specific comorbid cardiovascular burden that are at higher risk for adverse clinical outcomes. Further studies are needed to better understand clinical heterogeneity among patients with PH.
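
The silhouette method mentioned in this abstract scores a clustering by comparing each point's mean distance to its own cluster with its mean distance to the nearest other cluster, and the number of clusters with the highest mean score is preferred. A minimal sketch follows. It uses Euclidean distance on numeric features only, which is an assumption: k-prototypes handles mixed numeric and categorical covariates, and the abstract does not state which dissimilarity was used for the silhouette. The function name and toy points are hypothetical.

```python
from math import dist
from statistics import mean


def silhouette_score(points, labels):
    """Mean silhouette coefficient over all points, using Euclidean distance."""
    clusters = {}
    for p, lab in zip(points, labels):
        clusters.setdefault(lab, []).append(p)
    if len(clusters) < 2:
        raise ValueError("silhouette needs at least two clusters")
    scores = []
    for p, lab in zip(points, labels):
        own = clusters[lab]
        if len(own) == 1:
            scores.append(0.0)  # usual convention for singleton clusters
            continue
        # a: mean distance to the other members of the point's own cluster
        a = sum(dist(p, q) for q in own) / (len(own) - 1)
        # b: smallest mean distance to the members of any other cluster
        b = min(
            mean(dist(p, q) for q in members)
            for other, members in clusters.items()
            if other != lab
        )
        denom = max(a, b)
        scores.append((b - a) / denom if denom > 0 else 0.0)
    return mean(scores)


# Hypothetical toy points: two well-separated pairs.
pts = [(0.0, 0.0), (0.0, 1.0), (10.0, 10.0), (10.0, 11.0)]
good = silhouette_score(pts, [0, 0, 1, 1])  # groups neighbours together
bad = silhouette_score(pts, [0, 1, 0, 1])   # splits neighbours apart
```

In practice the score would be computed for each candidate number of clusters and the candidate with the highest mean silhouette kept.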

Khurshid, Shaan, Samuel Friedman, Christopher Reeder, Paolo Di Achille, Nathaniel Diamant, Pulkit Singh, Lia Harrington, et al. (2026) 2021. “Abstract 12922: Electrocardiogram-Based Deep Learning and Clinical Risk Factors to Predict Incident Atrial Fibrillation”. Circulation 144 (Suppl_1): A12922-A12922. https://doi.org/10.1161/circ.144.suppl_1.12922.

Introduction: Deep learning-derived representations of 12-lead electrocardiograms (ECGs) may allow for atrial fibrillation (AF) risk prediction. However, it remains unclear whether ECG-based artificial intelligence improves prediction beyond established clinical risk factors for AF and whether predictions are generalizable. Methods: Within a dataset comprising over 500,000 individuals receiving regular primary care at a multi-institutional network, we trained a convolutional neural network to predict incident AF using 12-lead ECGs (“ECG-AI”). ECG-AI was trained in individuals with ≥1 ECG performed at Massachusetts General Hospital (MGH) within 3 years prior to start of follow-up. We then fit a Cox proportional hazards model with incident AF as the outcome and a) logit-transformed ECG-AI AF probability, and b) the Cohorts for Aging and Genomic Epidemiology AF (CHARGE-AF) score, as covariates (“CH-AI”). We compared the discrimination and calibration of CHARGE-AF versus CH-AI in three independent samples: MGH (n=4,166), Brigham and Women’s Hospital (BWH, n=37,963) and the UK Biobank (n=41,034). Based on available follow-up, AF was evaluated at 5 years in MGH and BWH, and 2 years in the UK Biobank. Results: ECG-AI was trained in 36,081 individuals with an ECG performed at MGH (mean age 55±17, 53% female). CH-AI had substantially better discrimination (area under the receiver operating characteristic curve [AUROC]: MGH 0.838, BWH 0.777, UK Biobank 0.746; average precision [AP] 0.30, 0.21, 0.06) versus CHARGE-AF (AUROC: 0.802, 0.752, 0.732; AP 0.21, 0.17, 0.02, Figure). CH-AI was well-calibrated in MGH (calibration error 0.012) and BWH (0.019), but overestimated AF risk in the UK Biobank (0.068). Calibration in the UK Biobank was excellent after recalibration to the sample-level 2-year AF hazard (error 7.1 × 10^−5). Conclusions: A model combining clinical AF risk factors with deep learning-derived ECG-based AF risk is favorable for predicting 5-year risk of AF.

Al-alusi, Mostafa A, Shaan Khurshid, Xin Wang, Christopher Reeder, Pulkit Singh, Rachael Venn, Daniel Pipilas, et al. (2026) 2021. “Abstract 11436: Trends in Consumer Wearable Devices in a Primary Care Cohort”. Circulation 144 (Suppl_1): A11436-A11436. https://doi.org/10.1161/circ.144.suppl_1.11436.

Introduction: Consumer wearable devices that record data relevant to cardiac health are widely available. However, whether providers find these devices relevant to clinical care, and in which patients, is unknown. Methods: We identified mention of wearable devices in provider notes between 2005 and 2020 in an EHR cohort of primary care patients in the Mass General Brigham healthcare system. Outpatient notes containing >250 characters were included. Search terms included “AliveCor,” “Apple Watch,” “Fitbit,” “Garmin,” “smartwatch,” “fitness tracker,” and alternative spellings. Demographics and cardiovascular comorbidities were compared between patients with and without device mentions. Notes with and without device mention were compared with respect to clinic type and diagnosis codes occurring on the same date. Incidence rates for device mention were calculated for each year. Results: The final sample comprised 494,254 patients and 27,221,019 notes. The mean age was 49.1 ± 17.0 years, and median follow-up was 6.9 (Q1 2.7, Q3 11.9) years. A total of 8,955 (1.8%) patients had ≥1 note mentioning a device. The proportion of notes mentioning devices increased over time (Figure), with increased incidence of device mention from 0.006 to 7.5 per 1000 person-years between 2005 and 2019. Most patients with device mentions were female (69.4%) and white (86.7%). At the time of first device mention, the mean age was 55.3 ± 14.0 years and 1,183 (13.2%) patients had prevalent atrial fibrillation (AF). Cardiology clinic notes comprised a greater proportion of notes with device mentions versus notes without device mentions (15.7% [5,336] vs 3.7% [996,971], p<0.001). A total of 9.3% (3,160) of notes with device mentions had same-day diagnosis codes for AF, versus 2.7% (730,614) of notes without device mentions (p < 0.001). Conclusion: Provider mention of consumer wearable devices in EHR notes has increased over time, and is associated with cardiology clinic notes and diagnosis codes for AF.
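
The incidence rates in this abstract are expressed per 1000 person-years. As a small worked illustration of that unit, the sketch below divides a count of first events by the follow-up time at risk and rescales. The function name and the input counts are hypothetical and chosen only to make the arithmetic easy to follow; they are not the study's yearly counts, which the abstract does not report.

```python
def incidence_rate_per_1000(events, person_years):
    """Events per 1000 person-years of follow-up at risk."""
    if person_years <= 0:
        raise ValueError("person_years must be positive")
    return 1000.0 * events / person_years


# Hypothetical inputs: 15 first mentions observed over 2000 person-years.
rate = incidence_rate_per_1000(events=15, person_years=2000.0)  # 7.5
```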