Jul 20

Cancer Data and Computation in the Cloud: One Path to Affordable Genomics Research

By Gad Getz, Ph.D., Broad Institute / MGH

The cost of DNA sequencing has dropped more than one million-fold over the last decade, making it increasingly possible to discover the genetic basis of cancer and response to treatment. Three challenges, however, impede this goal:

1) Analysts lack the resources to download, store and compute on the data;

2) Existing tools were not designed to scale to petabytes or exabytes of data; and

3) Sharing and collaboration are hindered by the current model of storing data locally.

Large-scale sequencing efforts such as The Cancer Genome Atlas (TCGA) have begun to elucidate the genetic pathogenesis of cancer, enabling the development of targeted therapies. To enter an era of true precision medicine, however, we need to create sophisticated information technologies to store, analyze, and share data. FireCloud, and other cloud-based analysis platforms, offer a solution.

FireCloud, one of three NCI-funded Cancer Genomics Cloud (CGC) Pilots, democratizes data access and facilitates collaboration by providing a robust, scalable platform accessible to the public. Cloud-based analysis platforms like FireCloud provide elastic compute capacity that will enable the cancer research community to perform powerful analyses and facilitate the discovery of new biological findings.

Much of the cost of genomic research lies in the massive computational resources and huge amounts of storage it requires. Some large institutions may have the resources to fund these activities and establish such an infrastructure, but many do not. FireCloud and the other two pilots, from Seven Bridges Genomics and the Institute for Systems Biology, co-locate the data and the computational power in the cloud, so researchers can access the data and run analyses from anywhere with an internet connection. Eliminating the need for redundant, costly infrastructure drastically reduces the cost of genomics research. While there will still be some cost associated with using the Cloud Pilots, they are much more affordable for a wide variety of institutions and scientists, democratizing access to the data and the ability to compute on it.

Moving forward, the plan is for FireCloud and the other NCI CGC Pilots to support increasingly large datasets in the cloud, so that users will not need to download and store their own data locally. Currently, FireCloud provides curated TCGA data and will soon include data from the Cancer Cell Line Encyclopedia (CCLE) project, the Cancer Genome Characterization Initiative (CGCI), the Therapeutically Applicable Research To Generate Effective Treatments (TARGET) initiative, and the Genotype-Tissue Expression (GTEx) project.




Using the scalable cloud-compute infrastructure of FireCloud, my lab hopes to leverage these large datasets to obtain sufficient power to significantly enhance our understanding of driver genes and pathways, biomarkers associated with clinical outcome, molecular subtypes of cancer, mutational processes, and germline risk alleles.

My goal is that other cancer genomics labs will explore how cloud-based analysis platforms like FireCloud can drive breakthroughs in their own research. My hope is that the elastic compute capacity of FireCloud will provide a much more affordable alternative to a lab’s internal computing capabilities. Since much of the data will be housed in open-access cloud buckets, researchers will not have to worry about downloading and storing data, and can thus focus more on the science and the discoveries that lead to new cancer treatments.

You can provide feedback and ask questions in the FireCloud Forum. We are still actively developing FireCloud and would like to hear from you.

In addition, please use the FireCloud Forum to let us know if you are interested in contributing your own tools and pipelines. Let’s work together to build a better system!


– G.G.

Dr. Getz is an internationally acclaimed leader in cancer genome analysis and a pioneer of widely used analytic programs for cancer genomic sequence analysis. He is an Associate Professor of Pathology at Harvard Medical School. He is a faculty member and Director of Bioinformatics at the Massachusetts General Hospital Cancer Center and Department of Pathology, and is an Institute Member of the Broad Institute of Harvard and MIT, where he directs the Cancer Genome Computational Analysis Group. He is also the inaugural incumbent of the Paul C. Zamecnik Chair in Oncology at the MGH Cancer Center.


Permanent link to this article: https://ncip.nci.nih.gov/blog/cancer-data-computation-cloud-one-path-affordable-genomics-research/



  1. Cancer is a group of diseases involving abnormal cell growth with the potential to invade or spread to other parts of the body. These contrast with benign tumors, which do not spread to other parts of the body.

  2. The cost of DNA sequencing has dropped more than one million-fold over the last decade, making it increasingly possible to discover the genetic basis of cancer and response to treatment. That’s true!!

  3. Cloud computing is sure to go a long way in the medical field and in data management as well. It will help to manage cancer data in cloud storage, where it can be accessed anywhere, anytime, though this comes with the challenge of managing the data. Better project management and collaboration tools can help.

  4. Thanks for this website and the important analytic data, quite insightful!

  5. Great Post! That is very knowledgeable and useful content. Thanks for sharing.

  6. nice post
