The rise of machine learning in genomics
Recently published research indicates that the pharmaceutical industry spends an average of $2.6 billion and 10 years on research and development (R&D) per approved drug. These resource requirements are largely due to the costly trial-and-error process embedded in traditional R&D activities.
Machine learning in genomics is poised to improve this lengthy and costly process by narrowing the number of promising drug targets for investigation and predicting biological outcomes for hereditary diseases. The potential impact of machine learning in genomics is being buoyed by the plummeting cost of sequencing and the resulting increase in the availability of large genomic data sets.
Figure 1: Plummeting cost per genome sequencing 
Deep Genomics: Machine learning research and commercial priorities
Founded in 2015, Deep Genomics is a Toronto-based company focused on using machine learning to find difficult-to-detect disease triggers and to develop drugs that treat the resulting hereditary diseases. Deep Genomics’ scientists have released numerous publications on the utility of machine learning in the genomics field, including results from a computational technique that provided insight into the genetic underpinnings of nonpolyposis colorectal cancer, spinal muscular atrophy, and autism spectrum disorder.
Figure 2: Supervised machine learning in biomedicine 
Deep Genomics has developed and patented systems and processes that take a DNA or RNA sequence as input, extract features, and apply an algorithm to compare the test variant to variants known to cause disease. Importantly, Deep Genomics’ automated platform is built using proprietary and public datasets, incorporating the company’s own published work and other scientists’ research (after in-depth yet rapid evaluation). In addition, the company accelerates platform improvement by feeding data on on-target and genome-wide off-target effects back into the system for platform updates.
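To make the "sequence in, features out, compare to known variants" idea concrete, the following is a minimal illustrative sketch in Python. It is not Deep Genomics' patented method, which the source describes only at a high level. As stand-in assumptions, it uses k-mer counts as features and cosine similarity as the comparison step; the function names and the toy sequences are hypothetical.

```python
from collections import Counter
from math import sqrt


def kmer_features(seq, k=3):
    """Count overlapping k-mers in a sequence (a simple stand-in for feature extraction)."""
    seq = seq.upper()
    return Counter(seq[i:i + k] for i in range(len(seq) - k + 1))


def cosine_similarity(a, b):
    """Cosine similarity between two sparse count vectors stored as Counters."""
    dot = sum(a[key] * b[key] for key in a.keys() & b.keys())
    norm_a = sqrt(sum(v * v for v in a.values()))
    norm_b = sqrt(sum(v * v for v in b.values()))
    if norm_a == 0 or norm_b == 0:
        return 0.0
    return dot / (norm_a * norm_b)


def most_similar_known_variant(test_seq, known_variants, k=3):
    """Return (label, score) for the known variant most similar to test_seq.

    known_variants maps a hypothetical label to a sequence string.
    """
    test_features = kmer_features(test_seq, k)
    scores = {
        label: cosine_similarity(test_features, kmer_features(seq, k))
        for label, seq in known_variants.items()
    }
    best_label = max(scores, key=scores.get)
    return best_label, scores[best_label]


if __name__ == "__main__":
    # Toy sequences invented for illustration only; they carry no biological meaning.
    known = {"variant_a": "ACGTACGTAC", "variant_b": "TTTTGGGGCC"}
    print(most_similar_known_variant("ACGTACGTAC", known))
```

A production system would presumably learn its features and scoring function from labeled data rather than rely on a fixed similarity measure; the sketch only shows the shape of the pipeline.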
In 2017, Deep Genomics announced its entrance into drug development. Over the next three years, Deep Genomics aims to leverage its machine learning platform to efficiently identify new classes of therapies and advance them to clinical trials starting in 2020. Its first priority is early-stage drug development for diseases resulting from a single genetic mutation, which affect 350 million people globally. In the short term, Deep Genomics will invest $10 million to develop preclinical therapies for metabolic and neurodegenerative disorders.
In the short and medium term, Deep Genomics is also seeking pharmaceutical partners that are developing drugs for hereditary diseases and need support accelerating the R&D process. In April 2018, Deep Genomics announced a partnership with Cambridge, MA-based Wave Life Sciences to identify and develop novel therapies for neuromuscular diseases such as Duchenne muscular dystrophy.
Additional considerations for the future
As Deep Genomics transitions to its new strategic direction in drug development and pharmaceutical partnerships, it must be sure to maintain its machine learning thought leadership position and core platform. In the short term, leadership should be careful to allocate sufficient resources to maintain its competitive edge within the machine learning space (i.e., be thoughtful about keeping resources in the machine learning research/academic arm of the company and not over-allocating resources to its sales or partnerships teams). Deep Genomics’ industry leadership, proprietary data, and constantly evolving model (including incorporation of rapidly growing public data) were essential to creating its strong reputation in the market, and will be critical to its short-term ability to ink pharma partnerships and to its long-term success.
In addition, as Deep Genomics continues to build its platform and commercial offering, it must be cognizant of biases in its data, as machine learning findings are only as “good” (i.e., representative and valid) as the data in the platform. As one example of potential bias, insufficient diversity and inclusion in biomedical and genomic research in the United States (particularly regarding individuals of African and Hispanic ancestry) is well documented. Deep Genomics should seek to include quality genomic data (from its own or public resources) drawn from research populations that are as representative of the larger population as possible. This will help to ensure that Deep Genomics’ advances in genomics-informed drug development are broadly applicable.
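One simple way to start checking for this kind of bias is to compare each group's share of a training cohort with its share of the population the findings are meant to serve. The Python sketch below illustrates that comparison under stated assumptions: the function name, the group labels, and the reference shares are all hypothetical placeholders, not real demographic figures and not a description of Deep Genomics' practice.

```python
from collections import Counter


def representation_gaps(sample_labels, reference_shares):
    """Return each group's share in the sample minus its reference share.

    Negative values mean the group is under-represented in the sample
    relative to the reference population.
    """
    total = len(sample_labels)
    if total == 0:
        raise ValueError("sample_labels must not be empty")
    counts = Counter(sample_labels)
    return {
        group: counts.get(group, 0) / total - share
        for group, share in reference_shares.items()
    }


if __name__ == "__main__":
    # Placeholder labels and shares, invented purely for illustration.
    cohort = ["group_a", "group_a", "group_a", "group_b"]
    reference = {"group_a": 0.5, "group_b": 0.5}
    for group, gap in representation_gaps(cohort, reference).items():
        print(f"{group}: {gap:+.2f}")
```

A check like this only flags headcount imbalance; it does not show whether a model's predictions are equally accurate across groups, which would need separate evaluation.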
As Deep Genomics embarks on its goal to leverage machine learning for drug development, some questions remain. How should Deep Genomics determine priority, strategy, and resource allocation between internal drug discovery/development and external pharmaceutical partnerships? Further, as the cost of genetic sequencing continues to plummet and machine learning in genomics continues to accelerate, competition is bound to increase significantly. How can Deep Genomics defend its market position against entrants, such as new start-ups and well-financed multinational companies seeking to create internal machine learning capabilities?
DiMasi, Joseph A.; Grabowski, Henry G.; Hansen, Ronald W. “Innovation in the pharmaceutical industry: New estimates of R&D costs.” Journal of Health Economics (2016): 20-33, ScienceDirect, accessed November 2018.
Lohr, Steve. “From Agriculture to Art – the A.I. Wave Sweeps In.” New York Times, October 21, 2018, https://www.nytimes.com/2018/10/21/business/from-agriculture-to-art-the-ai-wave-sweeps-in.html, accessed November 2018.
NIH National Human Genome Research Institute, “DNA Sequencing Costs: Data.” https://www.genome.gov/sequencingcostsdata/, accessed November 2018.
Xiong, Hui Y.; Alipanahi, Babak; Lee, Leo; Bretschneider, Hannes; Merico, Daniele. “The human splicing code reveals new insights into the genetic determinants of disease.” Science (2015), ScienceDirect, accessed November 2018.
Wainberg, Michael; Merico, Daniele; Delong, Andrew; Frey, Brendan J. “Deep learning in biomedicine.” Nature Biotechnology (2018): 829-838, ScienceDirect, accessed November 2018.
Google Patents, “Systems and methods for classifying, prioritizing and interpreting genetic variants and therapies using a deep neural network.” https://patents.google.com/patent/US20160364522A1/en, accessed November 2018.
Deep Genomics, “Platform.” https://www.deepgenomics.com/platform/, accessed November 2018.
Deep Genomics, “Deep Genomics Takes on Metabolic & Neurogenerative Disorders.” https://www.deepgenomics.com/updates/taking-on-metabolic-neurodegenerative-disorders/, accessed November 2018.
Knight, Will. “An AI-Driven Genomics Company is Turning to Drugs.” MIT Technology Review, May 3, 2017, https://www.technologyreview.com/s/604305/an-ai-driven-genomics-company-is-turning-to-drugs/, accessed November 2018.
“U of T’s Deep Genomics inks partnership with U.S. biotech firm.” University of Toronto, April 11, 2018, https://www.utoronto.ca/news/u-t-s-deep-genomics-inks-partnership-us-biotech-firm, accessed November 2018.
Bentley, Amy R.; Callier, Shawneequa; Rotimi, Charles N. “Diversity and inclusion in genomic research: why the uneven progress?” J Community Genet. (2017): 255-266. https://www.ncbi.nlm.nih.gov/pmc/articles/PMC5614884/, accessed November 2018.