Gene expression prediction from cfMeDIP-seq data

Using machine learning to improve the efficacy of liquid biopsies 

Over the last few years, a new generation of blood tests has emerged with the promise of transforming how we detect, monitor and treat cancer. These tests – known as liquid biopsies – measure small fragments of tumour DNA circulating in the blood, allowing doctors to find and study a patient’s specific cancer without using invasive tissue biopsies.

While current liquid biopsies are becoming increasingly efficient at providing insights into the DNA of tumours, they still don’t accurately measure an important factor that contributes to cancer development: gene expression. 

“In the past, scientists mostly studied how changes in our DNA, called genetic mutations, cause cancer. But cancer doesn’t happen only because of these mutations. Sometimes, cancer starts because certain genes are turned on or off, which affects how much mRNA is made,” explains Yahan Zhang, a PhD candidate at the Princess Margaret Cancer Centre. “However, it’s currently really hard to measure mRNA levels in blood, which limits the effective use of liquid biopsies to measure these important levels.” 

Zhang hopes to solve this challenge with new funding from the Marathon of Hope Cancer Centres Network (MOHCCN) through its 2025 Health Informatics and Data Science Award. Over the next year, she will receive a total of $80,000 (split evenly between the Network and her institution) to build a machine learning tool that predicts mRNA levels from blood samples, with the goal of making liquid biopsies more effective.

The tool will rely on cfMeDIP-seq (cell-free methylated DNA immunoprecipitation and sequencing), a new technique that researchers use to analyze levels of DNA methylation, a chemical modification known to control gene activity, in the tiny fragments of DNA found in blood. The machine learning model will then be built to infer mRNA levels from these methylation data.

“Profiling tumour-specific mRNA from blood remains in its early research stages, but this technique has the potential to transform cancer care workflows by integrating molecular profiling of biological analytes,” said Zhang. “With support from MOHCCN, I am grateful for the opportunity to contribute to this challenge by applying machine learning models to predict gene expression and unlock new possibilities for downstream applications.” 

Zhang will train her model on matched blood and tumour datasets across several cancer types and validate its accuracy on unseen data to ensure broad applicability. By doing this, she hopes to pave the way for faster, more accessible, and more personalized cancer care. 
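For readers curious about what "train on matched data, then validate on unseen data" can look like in practice, here is a minimal sketch in Python. It is not Zhang's model: it assumes a single gene, a single methylation measurement per sample, a simple straight-line fit, and entirely synthetic numbers in which higher methylation means lower expression. All function names are hypothetical and chosen only for illustration.

```python
import random
import statistics


def fit_line(x, y):
    """Ordinary least squares with one predictor; returns (slope, intercept)."""
    mean_x, mean_y = statistics.fmean(x), statistics.fmean(y)
    sxx = sum((xi - mean_x) ** 2 for xi in x)
    sxy = sum((xi - mean_x) * (yi - mean_y) for xi, yi in zip(x, y))
    slope = sxy / sxx
    return slope, mean_y - slope * mean_x


def pearson(a, b):
    """Pearson correlation between two equal-length lists of numbers."""
    mean_a, mean_b = statistics.fmean(a), statistics.fmean(b)
    cov = sum((ai - mean_a) * (bi - mean_b) for ai, bi in zip(a, b))
    var_a = sum((ai - mean_a) ** 2 for ai in a)
    var_b = sum((bi - mean_b) ** 2 for bi in b)
    return cov / (var_a * var_b) ** 0.5


def make_synthetic_cohort(n_samples, seed):
    """Synthetic stand-in for matched blood methylation and tumour mRNA data.

    Assumption for illustration only: expression falls linearly as
    methylation rises, with Gaussian noise added.
    """
    rng = random.Random(seed)
    methylation = [rng.uniform(0.0, 1.0) for _ in range(n_samples)]
    expression = [10.0 - 6.0 * m + rng.gauss(0.0, 0.5) for m in methylation]
    return methylation, expression


def train_and_validate(methylation, expression, holdout_fraction=0.25):
    """Fit on the first samples, then score predictions on the held-out rest."""
    n_train = int(len(methylation) * (1 - holdout_fraction))
    slope, intercept = fit_line(methylation[:n_train], expression[:n_train])
    predicted = [slope * m + intercept for m in methylation[n_train:]]
    return slope, pearson(predicted, expression[n_train:])


meth, expr = make_synthetic_cohort(200, seed=0)
slope, held_out_r = train_and_validate(meth, expr)
print(f"fitted slope: {slope:.2f}, held-out correlation: {held_out_r:.2f}")
```

The key design point is that the held-out samples play no part in fitting, so the correlation reported at the end reflects how well the fit carries over to data the model has not seen. A real model would draw on many methylated regions per gene and several cancer types rather than one synthetic measurement.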

“This project will facilitate the conversion of data from advanced liquid biopsy technology from a patient blood sample into information about gene expression programs in cancer cells,” said Dr. Robert Kridel, who will mentor Zhang as part of this project. “The methodology we develop will assist clinicians in making better diagnoses and choosing the most effective therapies earlier in the course of cancer progression.”