Big Data Can Cure Cancer

The potential of health IT is enormous, yet it isn't technology that is holding it back; it is people.

Health IT is fast becoming one of the most monetizable areas in tech. As initiatives around the world seek to digitize healthcare data, there are huge opportunities for game-changing tools and platforms. However, even with potentially a yottabyte (10^24 bytes) of healthcare data in the United States alone (Raghupathi, 2014), we are no closer to a utopian, data-driven world of healthcare. Existing data is unstandardized, highly fragmented, and stored on incompatible legacy platforms. The technology exists, so why is healthcare so far behind other sectors in using it, and what needs to be done to catch up?


In 2011, the US Office of the National Coordinator for Health Information Technology released an aggressive plan to digitize health information across the nation (IT, n.d.). It would do so by incentivizing primary care physicians and hospitals to adopt approved electronic health record (EHR) systems and to use them in a meaningful and consistent way. Concurrently, regional health information organizations (RHIOs) and statewide health information networks would consolidate and centralize disparate data so that multiple parties could use it meaningfully. Connecting these statewide networks together would then improve the cost and efficacy of healthcare across the country.

In New York State, the New York eHealth Collaborative (NYeC) was tasked with architecting and implementing the Statewide Health Information Network of New York (SHIN-NY) and with driving incentives for EHR adoption (Collaborative, n.d.). A team of the nation’s finest business, health policy and health IT professionals was hired to make this happen. Through close alignment with RHIOs, hospital networks, policy leaders and premier health IT companies, NYeC intended to become the model for a nationwide rollout. Yet the SHIN-NY proved elusive. So what went wrong?

In 2012, NYeC held its second annual Digital Health Conference in Manhattan. IBM’s Watson was on hand to showcase what machine learning and information retrieval can do for diagnosis and R&D when applied to high-quality health data. Attendees, many of whom were among the most influential healthcare professionals in the world, were amazed. It was clear that stakeholders within the industry had bought into the importance of both the quality and the quantity of digitized data. The problem, however, was the general public: not enough people trust, understand, or appreciate the role of technology in transforming healthcare. They question the security of their data and who can access it, and these are valid concerns.


Because consumers can opt in to or out of health data sharing, only a limited share of the population’s data is actually shared. This is an inherent problem with the operating model of NYeC and similar enterprises across the US: patient and consumer advocacy is not part of the conversation. Similarly, there isn’t enough alignment among stakeholders to ensure that healthcare data is collected methodically and used meaningfully. EHR developers are often reluctant to make interoperability seamless, because keeping data inside their own systems lets them lock in regional markets through network effects.

HL7 (Health Level Seven) is a set of international standards for transferring clinical and administrative data between the software applications used by various healthcare providers (Wikipedia, n.d.). Together with approved EHR vendor lists (all of whose vendors conform to HL7 standards), these standards should remedy many of the issues and finally allow interoperability to become prevalent through the SHIN-NY and similar networks. However, the US clearly lags behind many other developed nations, as a 2013 Health Affairs survey (Affairs, n.d.) of EHR adoption in 10 developed nations demonstrates:

    1. Norway (98%)
    2. Netherlands (98%)
    3. United Kingdom (97%)
    4. New Zealand (97%)
    5. Australia (92%)
    6. Germany (82%)
    7. United States (69%)
    8. France (67%)
    9. Canada (56%)
    10. Switzerland (41%)
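To make the HL7 standards mentioned above more concrete, here is a minimal sketch that parses a message in the pipe-delimited HL7 version 2 format, one widely used member of the HL7 family. The sample message, the patient details and the helper names (`parse_hl7`, `get_field`) are invented for illustration. The sketch ignores escape sequences, repetitions and sub-components, so a real system would rely on a maintained HL7 library rather than this code.

```python
# Minimal HL7 v2 parsing sketch (illustrative only).
# Assumptions: hypothetical sample message; escape sequences, field
# repetitions (~) and sub-components (&) are deliberately not handled.

SAMPLE_MESSAGE = (
    "MSH|^~\\&|SendingApp|SendingFac|ReceivingApp|ReceivingFac|202301011200||ADT^A01|MSG00001|P|2.5\r"
    "PID|1||123456^^^HOSP^MR||DOE^JOHN||19800101|M"
)


def parse_hl7(message: str) -> dict[str, list[list[str]]]:
    """Split an HL7 v2 message into {segment_id: [list_of_fields, ...]}."""
    segments: dict[str, list[list[str]]] = {}
    # Segments are separated by carriage returns; tolerate stray newlines too.
    for raw in message.replace("\n", "\r").split("\r"):
        if not raw:
            continue
        fields = raw.split("|")
        segments.setdefault(fields[0], []).append(fields)
    return segments


def get_field(segments: dict[str, list[list[str]]], segment_id: str,
              position: int, occurrence: int = 0) -> str:
    """Return field SEG-position using HL7's 1-based field numbering."""
    fields = segments[segment_id][occurrence]
    # In MSH, field 1 is the '|' separator itself, so indices shift by one.
    index = position - 1 if segment_id == "MSH" else position
    return fields[index] if index < len(fields) else ""


msg = parse_hl7(SAMPLE_MESSAGE)
message_type = get_field(msg, "MSH", 9).split("^")        # ['ADT', 'A01']
family, given = get_field(msg, "PID", 5).split("^")[:2]   # 'DOE', 'JOHN'
print(message_type, family, given)
```

The one-position offset for MSH exists because HL7 v2 counts the field separator itself as MSH-1; every other segment numbers its fields from the first value after the segment name. Components within a field (such as family and given name) are separated by `^`.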

When large repositories of health data eventually exist, the possibilities are exciting. The SHIN-NY, for example, will have an application programming interface (API) that will allow developers to access de-identified data and build valuable applications. Consumer health devices such as heart rate monitors will be able to push data to a patient’s profile, and hospitals will be able to use devices that automatically record measurements such as pulse rate and blood sugar levels over time. These capabilities are incredibly valuable for individual patients and, when consolidated, can provide even more value for R&D. But until interoperability is achieved, this is all a pipe dream.
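Data exposed through such an API has to be stripped of identifying details first. The post does not describe how the SHIN-NY would do this, so the following is only a hypothetical sketch: the field names are invented, and the small identifier set is an illustrative subset loosely modelled on the HIPAA Safe Harbor approach (remove direct identifiers, reduce dates to a year, truncate ZIP codes). It is not a compliant de-identification implementation.

```python
from datetime import date

# Hypothetical field names; an illustrative subset, NOT the full list of
# identifiers that HIPAA Safe Harbor requires removing.
DIRECT_IDENTIFIERS = {"name", "address", "phone", "email", "mrn", "ssn"}


def deidentify(record: dict) -> dict:
    """Return a new record with direct identifiers removed and
    quasi-identifiers generalized; the input record is not modified."""
    # Drop direct identifiers outright (builds a new dict).
    cleaned = {k: v for k, v in record.items() if k not in DIRECT_IDENTIFIERS}
    # Generalize a full birth date to just the year.
    if "birth_date" in cleaned:
        cleaned["birth_year"] = cleaned.pop("birth_date").year
    # Keep only the first three digits of the ZIP code.
    if "zip" in cleaned:
        cleaned["zip3"] = cleaned.pop("zip")[:3]
    return cleaned


patient = {
    "name": "Jane Doe",
    "mrn": "123456",
    "birth_date": date(1980, 1, 1),
    "zip": "10027",
    "heart_rate_bpm": [72, 75, 71],  # e.g. readings pushed by a consumer device
}
print(deidentify(patient))
# {'heart_rate_bpm': [72, 75, 71], 'birth_year': 1980, 'zip3': '100'}
```

Generalizing rather than deleting the birth date and ZIP code keeps some analytic value (age cohorts, rough geography) for R&D while making individual patients harder to re-identify.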

Next Steps

NYeC needs to broaden the scope of its operating model to tackle the most fundamental bottleneck in the digitization of health data. It should prioritize consumer advocacy so that people understand the importance of sharing their data, and it should put more pressure on EHR companies to promote interoperability. It should also join the discussion on common platforms, so that IoT healthcare devices, machine learning and manual data input all comply with shared standards. That would produce rich data capable of enabling interoperability, diagnosis, health management and affordable care for everyone and, potentially, the data needed to cure cancer.



Works Cited

Affairs, H. (n.d.). Retrieved from Becker's Hospital Review:

Collaborative, N. Y. (n.d.). Retrieved from

IT, O. o. (n.d.). Health IT Dashboard. Retrieved from

Raghupathi, W. R. (2014). Big data analytics in healthcare: promise and potential. Health Inf Sci Syst.

Wikipedia. (n.d.). Health Level 7. Retrieved from Wikipedia:





3 thoughts on “Big Data Can Cure Cancer”

  1. Anton, thank you for highlighting the potential and challenges of healthcare data sharing. I would like to call out an additional point regarding adoption and one potential solution.

    Protected Health Information (PHI) has become the newest target for hackers, making headlines following the Anthem data breach of 2015. Large healthcare systems have invested millions to avoid this fate but still face day-to-day threats to the security of their data. As a result, opening up their PHI to thousands of users outside their system is perceived as risky. There’s also an element of the waiting game… if Player B hasn’t put themselves out there, why should Player A?

    Currently, government incentive programs are used to accelerate the adoption of health IT (such as the “Meaningful Use Program” to accelerate electronic medical record adoption). These programs lay out standards for technology proficiency, set reimbursement schedules, and are often used to defray the costs of implementing the technology. I think a nationwide (rather than statewide) program will be needed to motivate health systems to take the leap into data sharing.

  2. Really interesting topic Anton, thanks for sharing. I agree with MB’s points (who is MB???) on privacy, but I think if we are able to address those concerns, this digitization of health records would have monumental impact.

    You have some great ideas for how to pressure the system to move in this direction, but I’m skeptical about whether this will go far enough to fully align all the players, with so many differences from state to state, provider to provider, insurer to insurer and customer to customer. This is an area where I think government could be best suited to step in and align incentives, but, as we’ve seen, the Affordable Care Act doesn’t get us there and is likely to be changed.

    Another question I have is who will have access to the data once it is aggregated (i.e., who owns it, as we discussed in class). To be of widespread use toward cancer cures, for example, it would be great for any and all researchers in the medical community to have this access. But, back to the patient privacy concerns, ideally data will not leak beyond individuals and their providers.

    This movement towards better collection and standardization of medical records does have tremendous potential for healthcare. However, as you rightly pointed out, the opt-in nature of medical data poses a challenge. I would even venture to suggest that opt-in data collection might lead to selection bias among patients and bias in the data, which is dangerous if you’re going to push medical professionals to rely on statistical evidence from the data to back up their hypotheses.

    I am also unsure to what extent Big Data can “cure” cancer. That said, I do know that Big Data is used a lot in pharmacogenomics which then helps to determine the efficacy of different drugs for different patients.

    You can read more about pharmacogenomics here:
