Motivations for the Cancer Cell Line Encyclopedia (CCLE)

Cancer cell lines are the most commonly used models for studying cancer biology, validating cancer targets, and defining drug efficacy. Prior to the CCLE, cell line investigations were limited to a few commonly used cell lines or, at most, the 60 cell lines of the NCI60 panel. For example, at the time of the discovery of EGFR mutations in lung cancer, EGFR inhibitors had been developed using a single cell line, A549, as the EGFR-inhibitor-sensitive model. This contrasts starkly with the number of patients (n=952) treated in the initial phase III trials of EGFR inhibitors. Hence, the profound sensitivity of cancers bearing activating EGFR mutations was initially missed, at least in part because of the lack of large-scale, robust, well-defined cancer cell line models. As The Cancer Genome Atlas (TCGA) project embarked on its effort to define the genetic basis of human cancers, it was clear that a similar effort would be required to characterize the cancer cell lines.

Initial forays into the large-scale genetic and chemical characterization of cancer cell lines

With the advent of high-density SNP arrays, the Sellers lab undertook the genetic characterization of the NCI60 cell lines. Intersecting the SNP-array-derived copy-number and LOH data with mRNA expression data generated by the NCI60 cell line team led to the discovery of novel amplification events in melanoma targeting the MITF transcription factor.

Following this work, NCI60 cell line genomic DNA was subjected to mutation-specific genotyping to identify known oncogenic mutations in K-RAS and other oncogenes. These data, along with the published BRAF mutation data, were used to search for selective compound sensitivities among the 42,796 compounds for which the −log10(GI50) was available from the NCI60 profiling efforts. Here, several MEK inhibitors were found to have markedly increased anti-proliferative activity in BRAF-mutant melanoma cells. In short, BRAF mutation predicted sensitivity to MEK inhibition, a finding later confirmed in phase III trials. In aggregate, these data suggested that larger-scale genetic characterization of cancer cell lines, coupled to compound or other cell perturbations, might unveil predictive drug sensitivities in cancer.

Integrative genomic analyses identify MITF as a lineage survival oncogene amplified in malignant melanoma
Nature 2005;436(7047):117‐122. DOI:10.1038/nature03664

BRAF mutation predicts sensitivity to MEK inhibition
Nature 2006;439(7074):358‐362. DOI:10.1038/nature04304

The Cancer Cell Line Encyclopedia Project - A collaboration between the Broad Institute and the Novartis Institutes for Biomedical Research

In 2006, Sellers (Novartis), Garraway (Broad Institute), and Schlegel (Novartis) crafted the initial project plan for large-scale genetic characterization of ~1000 cancer cell lines. This project was subsequently renewed on two occasions, and hence we think of these as the three phases of the CCLE project.

Phase I of the Cancer Cell Line Encyclopedia project

Initiated in January 2008, the overarching goals of this collaboration were: 1) to conduct a detailed genetic and pharmacologic characterization of a large panel of human cancer models; 2) to develop integrated computational analyses that link distinct pharmacologic vulnerabilities to characteristic genetic, gene expression, and cell lineage patterns; and, 3) to translate cell line integrative genomics into cancer patient stratification. 

Accordingly, the team set out to generate the following datasets from a comprehensive genetic characterization of 1000 human cancer models. In phase I, the collective teams acquired 1000 cell lines directly from the relevant publicly accessible cell line repositories, including ATCC (American Type Culture Collection), DSMZ (Deutsche Sammlung von Mikroorganismen und Zellkulturen), and the KCLB (Korean Cell Line Bank). Thus, the genomics data generated are as close to the repository cell line derivatives as we could achieve.

After expansion of each cell line, DNA and RNA were extracted and used to generate Affymetrix SNP 6.0 data, Affymetrix U133 Plus 2.0 expression array data, point mutation profiles using a SNP genotyping platform called OncoMap 3.0, and hybrid capture exon sequencing of >1600 known or putative cancer genes across the CCLE. Finally, pharmacologic testing was performed across ~500 cell lines for a set of anti-cancer therapeutics. It is important to note that XX cell lines were found to be mislabeled versions of already known cell lines, and XX cell lines were found to harbor no genetic alterations and had expression profiles consistent with fibroblasts.

These data were reported in:

The Cancer Cell Line Encyclopedia enables predictive modelling of anticancer drug sensitivity
Nature 2012, Mar 28;483(7391):603-7. DOI: 10.1038/nature11003

Pharmacogenomic Agreement Between Two Cancer Cell Line Data Sets
Nature 2015, Dec 3;528(7580):84-7. DOI:10.1038/nature15736

The data files from phase I of the CCLE can be found here


Phase II of the Cancer Cell Line Encyclopedia project

Phase II of the CCLE project expanded on the original characterizations by applying the then-emerging next-generation sequencing technologies and additional profiling methods. Expressed mRNAs were further characterized through RNA-seq, and genetic alterations were further characterized through exome sequencing (in this case complementing the work of the Sanger Institute by filling in the uncovered cell lines). In addition, the miRNA content of all cell lines was characterized, the abundance of 225 metabolites was quantified across the CCLE, bulk histone H3 tail modifications were quantified by multiple reaction monitoring (MRM) mass spectrometry, and reverse phase protein array analysis was performed on the CCLE in collaboration with Michael Davis and Gordon Mills at MD Anderson.

The resulting data sets from the Phase II project have been published in the following manuscripts:

Global chromatin profiling reveals NSD2 mutations in pediatric acute lymphoblastic leukemia
Nat Genet. 2013 Nov;45(11):1386-91. DOI: 10.1038/ng.2777

Next-generation characterization of the Cancer Cell Line Encyclopedia
Nature. 2019 May;569(7757):503-508. DOI: 10.1038/s41586-019-1186-3. Epub 2019 May 8.

The landscape of cancer cell line metabolism
Nat Med. 2019 May;25(5):850-860 DOI: 10.1038/s41591-019-0404-8. Epub 2019 May 8.


Phase III of the Cancer Cell Line Encyclopedia project

As characterization of cell lines at the level of nucleic acids reached new levels of completeness, we continued to strive towards an understanding of the protein content of cell lines. The vast majority of therapeutics act by interrupting or altering protein function, and with the growing interest in antibody-drug conjugates, antibody-dependent cellular cytotoxicity (ADCC), and CAR-T cells, all directed at surface proteins, we sought to define the CCLE proteome through mass spectrometry.

To this end, the Gygi lab performed tandem mass tag (TMT) mass spectrometry to quantify the abundance of proteins in whole-cell extracts derived from 375 of the CCLE cell lines. In addition, serine/threonine phosphorylation events were quantified by cxxxxx. In collaboration with the Carr Mass Spectrometry platform at the Broad Institute, tyrosine phosphorylation was quantified in a small set of cell lines under conditions of distinct therapeutic perturbations.

The first of these data sets has been published:

Quantitative Proteomics of the Cancer Cell Line Encyclopedia
Cell. 2020;180(2):387‐402.e16. DOI:10.1016/j.cell.2019.12.023