In a series of previous blog posts, Juli Klemm and I discussed the challenges associated with the growth of biomedical research data, and the results of an informal survey of computational limitations currently experienced by biomedical researchers. One possible solution to these challenges is what we are calling a “cancer genomics cloud,” a computing environment that provides co-located storage and computational infrastructure, pre-loaded with an authoritative copy of NCI public data. Scientists would be able to combine their own data with the NCI reference data set and perform analyses without needing to provision millions of dollars of IT equipment (and associated support staff) or spend the year or more it might take to download the entire NCI public data set. More importantly, such a cloud (or clouds) would have the added benefit of democratizing access to NCI public data, allowing the broader cancer research community to access and use these valuable resources.
On Monday, June 24, 2013, I presented a concept to two of the NCI’s Federal Advisory Commissions, the National Cancer Advisory Board (NCAB) and the Board of Scientific Advisors (BSA), to initiate a set of Cancer Genomics Cloud Pilots. The concept envisions that NCI will fund the development of up to three cancer genomics cloud pilots at a level of 2.5 petabytes (PB) of core data; that is, scaled to the amount of data that The Cancer Genome Atlas (TCGA) will generate by its conclusion in late September of 2014. The three pilots would be cooperatively managed, along with the NCI Genomic Data Commons (the next phase of the TCGA Data Coordinating Center), sharing a common core data set, but with each of the three pilots supporting at least one additional TCGA data type and a set of unique capabilities. When complete, they will represent a unique set of resources available to the biomedical research community and, if successful, can be scaled to support the needs of the research community for years to come. The cloud pilots would be part of an integrated genomics infrastructure that includes the forthcoming NCI Genomic Data Commons; together, they would resolve many of the issues relating to access to and analysis of large, high-value data sets produced from NCI-sponsored genomic research.
Now that the BSA has approved the concept, the NCI has begun the process of making this a reality. Our goal is to have up to three groups from the academic community, industry, the not-for-profit world, or joint teams across multiple sectors develop these clouds under the auspices of a Broad Agency Announcement (BAA), a contract mechanism ideally suited to support research and development projects driven by a broad set of technical guidelines. Under a BAA, each successful respondent receives a contract that is defined by their particular response, rather than having a variety of different capabilities forced into the confines of a single statement of work. Further, BAAs utilize scientific peer review, rather than the internal review process that is associated with standard federal contracting mechanisms.
The first step of this process is a federal “sources sought” notice, a synopsis of an agency requirement that is posted at the FedBizOpps website. A sources sought notice is not a solicitation or a request for proposals. Rather, as part of the government’s market research, it asks potentially interested parties to submit capability statements that indicate their ability to carry out the work described. Interested parties can find the cloud pilot notice on the FedBizOpps.gov site.
The NCIP blog will continue to post information about the cloud pilots as they proceed, along with opportunities for you to remain engaged. To be notified about new developments, sign up for the NCIP Announce Listserv, follow us on Twitter @NCI_NCIP, or connect with us on LinkedIn.
 The NIH has archived the videocast of the meeting at: http://videocast.nih.gov/summary.asp?Live=12906. The presentation starts at time reference 05:58:00.
 See http://en.wikipedia.org/wiki/Broad_Agency_Announcement for an overview, and http://www.acquisition.gov/far/current/html/Subpart%2035_0.html#wp1085187 for details about BAAs.
George Komatsoulis, Ph.D., is Interim Director of the Center for Biomedical Informatics and Information Technology (CBIIT) and Chief Information Officer at the National Cancer Institute (NCI). You may reach George via email at firstname.lastname@example.org.