A data resource for Canadians, by Canadians: Understanding the MOHCCN Gold Cohort’s place in the global context
Precision oncology has the potential to transform cancer care, allowing patients to receive specific therapies targeted to the exact genetic changes that cause their cancer, thus improving their outcomes and quality of life. But to this day, only a small subset of patients benefits from this new approach to cancer care.
To make precision medicine a reality for more cancer patients, clinicians and researchers need to better understand how cancer is formed in each individual, why certain patients respond differently to similar treatments and how to best target the unique vulnerabilities in each patient’s cancer. And to gain this understanding they need access to data—more than any single cancer centre in Canada can generate on its own.
Recognizing this, in 2021 the Terry Fox Research Institute and Foundation, along with the Government of Canada and partners across the country, launched the Marathon of Hope Cancer Centres Network (MOHCCN) to unite the Canadian cancer research and care communities and to create and deploy a national strategy to accelerate precision oncology research and care for the benefit of all cancer patients in Canada. One of the main pillars of this strategy is the creation of the MOHCCN Gold Cohort, a national data resource aimed at unlocking data-driven discoveries and transforming how cancer is studied and treated.
As the Network celebrates the completion of Phase 1 of this resource with data from more than 15,000 cancer patients treated in centres across the country, it's important to address three questions we often get about the Gold Cohort: what exactly is it; how does it differ from other large-scale genomics initiatives around the world; and how will it be deployed to benefit cancer patients in Canada and beyond.
Understanding the MOHCCN Gold Cohort
Funding agreements signed with the Government of Canada in 2021 outlined that one of the main deliverables of the first phase of the Network (2021-26) was the creation of a national data resource that included high-quality data from 15,000 diverse cancer patients treated across the country. This resource would be built in a way that would help identify specific genetic and molecular changes driving each patient’s cancer, better understand why some patients respond to specific therapies while others don’t and eventually help personalize therapies for more patients, improving their outcomes and quality of life.
Importantly, this data resource would be built in Canada, by Canadians for Canadians. It would reflect Canada’s unique and diverse populations and unite experts and institutions across the country to enable solutions tailored to the Canadian context.
To determine what this resource would look like, the Network brought together thought leaders and experts from across the country. From the outset, their view was clear: the Network needed to invest in generating data that are complete, accurate, permanent, and accessible.
In practice, these principles mean that data should be as comprehensive (complete) and high-quality (accurate) as current technologies allow; structured in ways that support analyses both today and into the future (permanent); and shared broadly to maximize research impact rather than remaining siloed within individual institutions (accessible).
To meet these goals, experts determined that the MOHCCN Gold Cohort, as the resource was named, should include the following types of data:
Whole-genome sequencing (WGS) of tumour DNA – provides a comprehensive view of all genetic alterations in a patient’s cancer, enabling the discovery of known and novel drivers of disease.
RNA sequencing (RNA-seq) of tumour tissue, when technically achievable – captures gene activity within tumours, helping researchers understand how genetic changes are expressed and influence cancer behaviour.
Whole-genome sequencing of matched normal (non-cancerous) DNA, typically from blood – distinguishes inherited genetic variants from tumour-specific mutations, clarifying what changes are driving the cancer.
Standardized clinical data (captured using the MOHCCN Clinical Data Model) – links genomic findings to patient characteristics, treatments and outcomes, enabling insights that can inform more personalized care.
Together, these data provide the most complete view of a cancer’s DNA and RNA possible with technologies available today, capturing both known and yet-to-be-discovered features of tumour biology. This allows researchers to better understand which genetic alterations are present in a patient’s cancer cells and which may have been inherited, and to explore how molecular characteristics influence disease progression, treatment response and the tendency to recur in people with cancer.
The result is a unique resource that makes possible new investigations into how cancers arise, progress and respond to treatments—and how this information can be leveraged to provide personalized cancer care to patients across Canada.
Building on global efforts
The decision to create this resource and include these distinct data types did not happen in a vacuum. In fact, the Gold Cohort builds on years of work from many large-scale global initiatives that have set the stage for the characterization of cancers at the molecular level.
Early large-scale initiatives
The Cancer Genome Atlas (TCGA), launched in 2006 by the US National Cancer Institute and the National Human Genome Research Institute, was the first large-scale cancer genomics initiative. Over a decade, TCGA generated molecular data from more than 11,000 cases across 33 cancer types. While transformative for the field, TCGA cases were not uniformly profiled and mostly relied on technologies such as exome sequencing, which interrogate only a small fraction (~2%) of the human genome.
TCGA was also a major contributor to the International Cancer Genome Consortium (ICGC), whose initial dataset—generated through multiple initiatives across different countries—encompassed data from approximately 24,000 patients worldwide.
A shift toward comprehensive profiling
As sequencing costs declined through the 2010s, whole-genome sequencing became increasingly feasible at scale. In 2020, the ICGC Pan-Cancer Analysis of Whole Genomes (PCAWG) consortium published analyses of tumour and matched normal whole-genome data from 2,658 patients across 28 cancer types, representing nearly all cancer genomes that were publicly available at the time of the study. RNA-seq data were also available for just under half of these cases.
More recently, national initiatives have produced larger WGS-based datasets. In 2024, the Cancer Programme of the UK’s 100,000 Genomes Project reported analyses of 13,880 tumour genomes across 33 cancer types. As of writing, the Netherlands’ Hartwig Medical Foundation has also generated whole-genome sequencing data from more than 8,000 patients with metastatic cancer, with RNA-seq data available for about one-third of cases.
Scaling clinical genomics differently
While comprehensive whole-genome profiling is invaluable for research, more targeted sequencing approaches — such as ‘panel’ tests that deeply profile up to hundreds of known cancer genes — are widely used in clinical settings to identify actionable mutations. Leveraging the scale of these tests, Project GENIE, led by the American Association for Cancer Research, has assembled genomic and clinical data from more than 200,000 patients across multiple countries, including Canada.
Similarly, the ICGC’s Accelerating Research in Genomic Oncology (ICGC ARGO) initiative aims to analyze genomic data from 100,000 patients globally by 2028, using a mix of sequencing approaches. Several Canadian projects contribute data to both ICGC ARGO and the Gold Cohort.
Whole genomes in the clinic
While genetic testing has been increasingly integrated into routine clinical cancer care, the shift to more comprehensive profiling in research has also revealed potential benefits for this type of approach in clinical settings. Several countries have launched programs to study the impact of comprehensive genomics in clinical care, particularly in cancer and rare diseases. Examples include France’s Genomic Medicine Initiative (PFMG2025), which has generated whole-genome and transcriptome sequencing data from more than 3,000 cancer patients, and the Australian Genomics initiative, which has generated whole-genome sequencing data from about 2,000 cancer patients.
Against this backdrop, the Gold Cohort occupies a distinct and complementary position within the global cancer genomics ecosystem.
3 ways the Gold Cohort is unique
While the Gold Cohort is not the first large-scale cancer genomics dataset, nor the biggest in terms of number of patients, it is distinctive in its combination of scale and scope as well as in the diversity of data it contains. Together, these features position it as a globally unique resource designed to accelerate discovery, support innovation and ultimately help bring precision oncology to all cancer patients in Canada, improving their outcomes and quality of life.
1. Unmatched scale and scope of comprehensive profiling
The Gold Cohort combines scale (over 15,000 patients to date) with depth, through consistent whole-genome and transcriptome analysis across the dataset. Notably, approximately 85 per cent of cases so far have RNA-seq data. Bringing together comprehensive data at this scale within a single, coordinated framework is a unique achievement.
This combination enables analyses that are simply not possible in smaller or less comprehensive datasets, including the identification of rare genetic events, cross-cancer comparisons and discovery of novel mechanisms driving cancer development and progression.
2. Deep, standardized clinical data linked to genomics
A defining strength of the Gold Cohort is the integration of comprehensive genomic data with high-quality clinical data.
Each case includes clinical information captured using the MOHCCN Clinical Data Model, which comprises more than 100 standardized data elements. These include details of diagnosis, treatments received, treatment responses and outcomes. Contributing sites are expected to update clinical data annually for at least five years, enabling long-term follow-up.
This level of clinical annotation supports powerful association studies (for example, linking genetic alterations to treatment response or survival) and creates opportunities to generate insights that can ultimately inform more personalized approaches to care.
Importantly, the MOHCCN Clinical Data Model is closely aligned with the ICGC ARGO Data Dictionary. This interoperability allows the Gold Cohort to be integrated with other major datasets, amplifying its value and enabling even larger, international analyses.
3. Diversity across patients, cancers and disease stages
The Gold Cohort also stands out for its diversity—both in the populations it represents and in the cancers it includes.
Genomics research has historically relied heavily on data from individuals of European ancestry, limiting the generalizability of findings and contributing to inequities in care. Canada’s diverse population provides an opportunity to address this imbalance, and the Gold Cohort reflects patients treated across the country, capturing broad demographic, cultural and ancestral diversity. This is crucial to enable research that can benefit the full range of people who experience cancer, both in this country and around the world.
The dataset also reflects the diversity of cancer itself:
More than 40 cancer types are represented.
Patients span the age spectrum, from paediatric cases to older adults, including at least 1,000 adolescents and young adults.
The cohort includes cancers across disease stages, including at least 2,700 advanced cancers.
At least 2,500 cases involve rare cancers, such as uncommon sarcoma subtypes and rare gynaecological and hematological malignancies.
Crucially, the Gold Cohort also includes both treatment-naïve primary tumours and recurrent or metastatic cancers that have been exposed to therapy.
Earlier initiatives such as TCGA focused almost exclusively on untreated primary tumours, enabling foundational molecular classification of cancers. However, treatment resistance—which contributes to most cancer-related deaths—can only be understood by studying cancers after therapy. By including treatment-resistant and metastatic disease, the Gold Cohort is well positioned to support research into the mechanisms of treatment failure and disease progression.
As explorations of the dataset expand and our understanding of it deepens, its full diversity and potential to improve how we detect, diagnose and treat cancer will continue to emerge.
An invaluable resource for the future of cancer research and personalized care
Completing the first phase of the Gold Cohort is a huge achievement for Canada’s cancer research and care communities, reflecting an extraordinary effort by Canadians for Canadians. Beyond the transformational power of the data themselves, the joint effort required to build this resource has laid the foundation for a new era of collaboration across Canadian institutions and their research and clinical communities. This united approach has helped create a national strategy for the acceleration of precision medicine across the country and has led to the development of publicly available guidelines, standards and best practices that can support other large-scale genomic initiatives in Canada and internationally.
As the Network advances into its second phase, the Gold Cohort will continue to grow, along with its impact in the clinic. Of the 15,000 patients who donated their genomic and clinical data to Phase 1 of the resource, at least 1,500 had results of their genomic analysis returned to their care teams to help inform their treatment — this is precision oncology in action. In Phase 2, the Network will build on this success to leverage comprehensive genomic profiling in real time to inform care for more patients across Canada, accelerating innovation and improving the lives of people with cancer. The Gold Cohort dataset will empower this transition to the clinic and will continue to grow – both in size and types of data, including different profiling technologies and more robust socio-demographic data – as more patients are treated across the Network.
Phase 2 will also see the Network increase its attention toward data analysis and broad research use of the Gold Cohort resource. This will help unlock discoveries for the benefit of all cancer patients in Canada while helping position Canada as a global leader in precision oncology research and care.
Importantly, in these shifting geopolitical times, the Gold Cohort plays a dual role: it helps assert Canadian data sovereignty in precision oncology research while also serving as a tool for enhanced global integration. It is a made-in-Canada solution that will not only benefit Canadian cancer patients but also help position Canada as a global leader in precision oncology research and care.
Indeed, the Gold Cohort—and the Network more broadly—are not isolated, but an integral part of a global ecosystem that both benefits from and contributes to collective efforts to accelerate cancer genomics and precision oncology research worldwide.