Big Data Analytics in Healthcare – High-Value Use Cases and Challenges

Leveraging big data is one of the best ways for the healthcare industry to improve its affordability, efficiency and quality. Undoubtedly, big data analytics in healthcare is one of the most significant leaps technology has made towards improving human lives.

Although the healthcare industry had a slow start with big data analytics, it is making steady strides and is projected to become a market of US$67.82 billion by 2025. If you’re still sceptical about the importance of big data in healthcare, this article will fill you with enough information to make a “data-driven” acknowledgement.

What Is Big Data Analytics In Healthcare?

Big data refers to large data sets comprised of both structured and unstructured data that are analysed to find insights, details and patterns.

In the healthcare industry, big data is generated by various sources, including electronic health records (EHRs), electronic medical records (EMRs), personal health records (PHRs), wearable medical devices and health apps on mobile devices.

This data is then analysed to guide decision-making, improve patient outcomes and decrease healthcare costs, among other uses.

Why Is Big Data Important in Healthcare?

First, the basics.

As in any other industry, for example retail or telecommunications analytics, big data in healthcare refers to the abundance of data collected from numerous sources, including but not limited to electronic health records (EHRs), medical devices, genomic sequencing, medical imagery and test results.

Although big data analytics in healthcare still has a long way to go before it becomes the standard, its current applications are promising enough.

Here are some of the applications and examples of big data in healthcare.

  1. Predictive medicine
  2. Accurate and faster diagnostics
  3. Reducing readmissions
  4. Systematic triage
  5. Prediction of clinical events
  6. Virtual care using IoT devices
  7. Faster drug discoveries

The Four Vs of Big Data in Healthcare

To better understand big data analytics in healthcare, let’s take a look at the following four dimensions of big data:

  1. Volume: This is the amount of data produced; big data, by definition, involves very large amounts. According to an EMC report, worldwide healthcare data totalled 153 exabytes in 2013 and was projected to reach 2,300 exabytes by 2020, growing by 48% annually.
  2. Velocity: This is the speed at which new data is generated and analysed. With the proliferation of regular at-home monitoring, including daily glucose, blood pressure, EKG and other measurements, the velocity of healthcare data will continue to increase.
  3. Variety: Healthcare data is divided into three categories – structured, semi-structured, and unstructured. Processing a combination of these three different categories is crucial in making the data accessible and useful. Among the existing healthcare data are medical records, handwritten nurse and doctor notes, paper prescriptions, radiology images, and biometric sensor readings, among others.
  4. Veracity: This is the credibility, reliability and accuracy of the data. Accurate data helps medical professionals make better decisions and avoid medical errors.
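
The growth figures quoted under Volume can be sanity-checked with a quick compound-growth calculation. The sketch below assumes the 48% rate compounds once a year from 2013 to 2020; the function name is purely illustrative.

```python
# Figures quoted in the Volume item above (EMC report).
START_EXABYTES = 153   # worldwide healthcare data in 2013
ANNUAL_GROWTH = 0.48   # 48% growth per year
YEARS = 2020 - 2013    # seven years of compounding (assumed yearly)


def project_volume(start: float, growth: float, years: int) -> float:
    """Compound `start` by `growth` once per year for `years` years."""
    return start * (1 + growth) ** years


projected = project_volume(START_EXABYTES, ANNUAL_GROWTH, YEARS)
print(f"Projected 2020 volume: {projected:,.0f} exabytes")
```

Under these assumptions, the projection lands close to the rounded 2,300 exabytes quoted in the report.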

How Does Big Data Work in Healthcare?

To present you with a clearer perspective of how big data analytics is being utilised in the healthcare industry, here are some real-world applications of big data and how they brought in notable transformations.

1. Analysing electronic health records

One of the most notable benefits of big data in healthcare is its usefulness in analysing electronic health records (EHRs). EHRs are digital versions of patients’ paper charts and offer real-time records for instant retrieval. An EHR contains all vital information regarding a patient, including medical history, treatment plans, radiology images, and test results.

[Figure: Adoption of electronic health records by physicians.]

Coupling EHRs with big data analytics can promise large-scale analysis that will aid in assigning triage (deciding the order of treatment) in case of emergencies.

It will also help in unlocking population health analytics – which is essential for formulating population health management strategies, especially during widespread outbreaks like COVID-19.

2. IBM Watson helps fight cancer

From its humble beginnings as a project developed to beat the best humans in the TV game show Jeopardy!, to scanning 15 million pages of medical data at the click of a button, the story of IBM Watson is quite awe-inspiring.

[Figure: IBM Watson’s oncology diagnosis.]

Powered by artificial intelligence (AI) and big data analytics, Watson is a question-answering machine that has enormous potential in the healthcare industry. In countries like India, where the proportion of oncologists to cancer patients is around 1:2000, Watson can augment the doctors’ intuition, capability and expertise.

It would take approximately 10,000 weeks for a doctor to read and analyse 10 million patient files, and at least 160 hours of reading per week for a doctor to stay up-to-date with the latest medical knowledge (on top of how long it will take to consider their relevance or application). Watson does all of that in 15 seconds.

The IBM Watson Health Cloud can help doctors find medications that align with a patient’s lifestyle and also suggest the latest and best evidence-based treatments – as keeping up with the ever-increasing medical data is nearly impossible for doctors, but not for Watson.

Watson also allows doctors to quickly analyse treatment history from other doctors or that of the patient’s family members, as well as gain information from wearable devices.

More importantly, as doctors and patients interact more with Watson, it acquires more knowledge, processes it and feeds it back into the system.

According to a study on Clinical Trial Matching (CTM) published in the Journal of Clinical Oncology, Watson was able to reduce the time taken to screen cancer patients by 78%. At Manipal Comprehensive Cancer Center in Bangalore, India, members of the multidisciplinary tumour board changed their treatment decisions in 13.6% of the cases with respect to the information provided by IBM Watson.

Watson is currently trained in 13 different types of cancer. Although Watson’s oncology computing system isn’t fully prepared for worldwide adoption, it is making small wins towards understanding cancer and the medications associated with it.

3. Prediction of clinical events

When it comes to executing big data in healthcare, prediction and prevention of clinical events are the two life-saving goals the industry seeks. The Phoenix Children’s Hospital in Arizona is extensively using big data analytics to identify new acute kidney injuries (AKIs) before they develop into life-threatening conditions.

According to the Division Chief of Nephrology, the hospital uses a combination of real-time surveillance and EHR data for detecting anomalies and sends colour-coded alerts to patients if they are entering the AKI danger zone.

To do this effectively, the hospital scans the EHR laboratory data every six hours for patients who have been prescribed medications that may have adverse effects on their kidneys. Factors such as the frequency of kidney function tests ordered for the patient are also taken into account.
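
To make the idea concrete, here is a minimal sketch of what one such scan could look like. Everything in it is an assumption for illustration: the field names, the use of a creatinine ratio and the colour thresholds are hypothetical, and they are neither the hospital's actual rules nor clinical guidance.

```python
from dataclasses import dataclass
from typing import Dict, Iterable, Optional

SCAN_INTERVAL_HOURS = 6  # scan cadence described in the text


@dataclass
class LabSnapshot:
    """One patient's latest lab picture (all fields hypothetical)."""
    patient_id: str
    on_nephrotoxic_medication: bool
    baseline_creatinine: float  # mg/dL
    latest_creatinine: float    # mg/dL


def alert_colour(snapshot: LabSnapshot) -> Optional[str]:
    """Return an alert colour, or None if the patient is not under surveillance."""
    if not snapshot.on_nephrotoxic_medication:
        return None
    if snapshot.baseline_creatinine <= 0:
        raise ValueError("baseline creatinine must be positive")
    ratio = snapshot.latest_creatinine / snapshot.baseline_creatinine
    if ratio >= 1.5:    # assumed threshold, for illustration only
        return "red"
    if ratio >= 1.25:   # assumed threshold, for illustration only
        return "amber"
    return "green"


def scan(snapshots: Iterable[LabSnapshot]) -> Dict[str, str]:
    """Run one surveillance pass and collect a colour per monitored patient."""
    alerts = {}
    for snapshot in snapshots:
        colour = alert_colour(snapshot)
        if colour is not None:
            alerts[snapshot.patient_id] = colour
    return alerts
```

In a real deployment, a scheduler would call something like `scan` once per `SCAN_INTERVAL_HOURS`, and the thresholds would come from clinicians rather than from constants in code.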

[Figure: Graph showing the number of AKI cases before and after implementing big data analytics.]

With big data analytics, Phoenix Children’s Hospital was able to flag high-risk cases of AKI within hours of the initial injury. Within one year of deploying the strategy, the hospital witnessed a 34% drop in AKI cases, despite the fact that more patients were admitted during that time.

4. Predictive Analytics to Prevent Suicide and Self Harm

In the US alone, there are 132 suicides per day – making suicide the 10th leading cause of death. The Mental Health Research Network conducted a study and found that big data analytics can accurately predict suicide risks within the 90 days following a mental health visit.

The researchers used the EHR data and results of a depression questionnaire, along with 313 clinical and demographic characteristics taken from the records of individuals, from up to five years before making a mental health visit.

[Figure: Graph showing the validation data set for prediction of suicide attempts and deaths.]

The data included records on prior suicide attempts, substance use diagnoses, and psychiatric medications dispensed. The predictive model analysed and concluded that patients who made mental health speciality visits with top 5% risk scores accounted for 43% of suicide attempts and 48% of suicide deaths.
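
A statement like "the top 5% of risk scores accounted for 43% of attempts" is a capture-rate calculation. The sketch below shows how such a figure could be computed from a list of risk scores and observed outcomes; the function name and the toy data are hypothetical and are not taken from the study.

```python
from typing import Sequence


def share_captured_by_top(
    scores: Sequence[float],
    outcomes: Sequence[int],
    top_fraction: float = 0.05,
) -> float:
    """Fraction of all events (outcome == 1) found among the highest-scoring visits."""
    if len(scores) != len(outcomes):
        raise ValueError("scores and outcomes must have the same length")
    total_events = sum(outcomes)
    if total_events == 0:
        return 0.0
    # Rank visits from highest to lowest predicted risk.
    ranked = sorted(zip(scores, outcomes), key=lambda pair: pair[0], reverse=True)
    # Keep at least one visit so tiny data sets still work.
    cutoff = max(1, int(len(ranked) * top_fraction))
    top_events = sum(outcome for _, outcome in ranked[:cutoff])
    return top_events / total_events


# Toy data: 20 visits, two events, one of them at the highest risk score.
toy_scores = [i / 100 for i in range(1, 21)]
toy_outcomes = [0, 0, 0, 1] + [0] * 15 + [1]
print(share_captured_by_top(toy_scores, toy_outcomes))
```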

5. Better patient utilisation rates

Along with helping hospitals get ahead of no-show appointments, predictive analytics can also be used for better patient utilisation rates by giving a heads-up that things are going to get busier.

Before utilising big data analytics, Wake Forest Baptist Health in North Carolina had uneven utilisation of its haematology-oncology clinic. Poor utilisation caused headaches for both the nurses and pharmacists, which in turn made things difficult for patients.

[Figure: Utilisation curve before employing analytics.]

During peak hours, 10 am to 2 pm, the resources were stressed to the maximum and appointments were fully-packed – even though many of the 43 treatment chairs would sit idle for the rest of the day.

The over-crowded atmosphere during peak hours also raised concerns about patients’ safety in the light of infectious diseases. Although the hospital tried manual scheduling to reduce the rush, they saw little difference and finally looked into a big data analytics solution.

[Figure: Utilisation curve after employing analytics.]

Since the clinic serves patients with varying oncology needs, modifying the schedules by considering a large variety of treatment protocols was arduous. With analytics, the clinic was able to flatten the curve and observed a 10% increase in overall chair utilisation, not to mention how relieved the nurses were.
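
Chair utilisation and "flattening the curve" can be expressed with two simple metrics: overall utilisation (occupied chair-hours over available chair-hours) and peak load. The sketch below uses the 43 chairs mentioned above, but the hourly occupancy numbers are invented purely to illustrate the calculation and do not reproduce the clinic's data.

```python
from typing import Sequence

TREATMENT_CHAIRS = 43  # chair count quoted in the text


def utilisation(occupied_per_hour: Sequence[int], chairs: int = TREATMENT_CHAIRS) -> float:
    """Occupied chair-hours divided by available chair-hours."""
    return sum(occupied_per_hour) / (chairs * len(occupied_per_hour))


def peak_load(occupied_per_hour: Sequence[int], chairs: int = TREATMENT_CHAIRS) -> float:
    """Share of chairs in use during the busiest hour."""
    return max(occupied_per_hour) / chairs


# Invented hourly occupancy for a ten-hour clinic day (illustration only).
before = [10, 20, 43, 43, 43, 43, 20, 12, 8, 5]     # packed from 10 am to 2 pm
after = [28, 30, 32, 32, 32, 32, 30, 28, 14, 10]    # same clinic, flatter curve

for label, day in (("before", before), ("after", after)):
    print(f"{label}: utilisation={utilisation(day):.1%}, peak={peak_load(day):.1%}")
```

A scheduling optimiser would aim to raise the first number while lowering the second, which is what a flatter curve means in practice.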

6. Prevent Healthcare Fraud and Abuse

In the fiscal years 2013 and 2014, the Centers for Medicare & Medicaid Services (CMS, the federal agency that administers the major healthcare programs in the US) saved nearly US$42 billion with the help of predictive analytics.

Before analytics, fraud control relied on establishing new rules and hiring new experts, and each rule change exposed known and suspected cases of fraud. However, this system had numerous flaws. Firstly, as the rules kept changing, clerical errors crept in because physicians weren’t trained to perform billing.

Secondly, the illicit beneficiaries of the system who stole millions of dollars were sophisticated players and were aware that the industry was watching them, giving them the advantage of being a step ahead.

With the addition of advanced big data analytics, payers (the insurance companies) are now capable of digesting multiple data sets such as demographics, provider (doctor) credentials, geographic location and past claims data.

[Figure: Share of CMS savings from prevention activities, 68% in fiscal year 2013 and 74% in fiscal year 2014.]

Analytics is also capable of eliminating false positives, enabling investigation teams to differentiate between real cases of repeated clerical errors and the fraudsters. For CMS, 68% and 74% of the total savings in the fiscal years 2013 and 2014, respectively, came from prevention activities using big data.
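
One simple way to picture how claims data can surface suspicious billing is peer comparison: total each provider's billing and flag those far above the median of comparable providers. The sketch below is a toy heuristic under that assumption, not the method CMS or any payer actually uses; the tuple layout, peer-group labels and the threshold of three times the median are all hypothetical.

```python
from collections import defaultdict
from statistics import median
from typing import Iterable, List, Tuple

Claim = Tuple[str, str, float]  # (provider_id, peer_group, billed_amount)


def flag_outlier_providers(claims: Iterable[Claim], ratio_threshold: float = 3.0) -> List[str]:
    """Return providers whose total billing exceeds `ratio_threshold` x their peer median."""
    totals = defaultdict(float)
    group_of = {}
    for provider_id, peer_group, amount in claims:
        totals[provider_id] += amount
        group_of[provider_id] = peer_group

    # Collect provider totals per peer group (e.g. same specialty and region).
    totals_by_group = defaultdict(list)
    for provider_id, total in totals.items():
        totals_by_group[group_of[provider_id]].append(total)

    flagged = []
    for provider_id, total in totals.items():
        peer_median = median(totals_by_group[group_of[provider_id]])
        if peer_median > 0 and total > ratio_threshold * peer_median:
            flagged.append(provider_id)
    return sorted(flagged)
```

A flag like this is only a starting point for investigators. Separating honest clerical errors from fraud, as described above, needs far richer features and human review.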

7. Personalised medicines

Most senior medical researchers (jokingly) feel that a biologist nowadays must be a programmer and a statistician first, before ever considering clinical research.

It is intriguing to note that DNA, which is essentially a set of instructions to reproduce cells, is being studied by high-end computers, which, in essence, also run sets of instructions called algorithms.

Although humans as a species are 99.9% the same (DNA-wise), there will be at least 3 million differences between your genome and that of anyone else you pick at random. This means that the efficacy of a specific medicine isn’t the same for everyone.
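
The 3 million figure follows directly from the 99.9% similarity. The short calculation below assumes a human genome of roughly 3 billion base pairs, a commonly cited approximation that is not stated in this article.

```python
# Assumption (not from the article): the human genome has roughly 3 billion base pairs.
GENOME_BASE_PAIRS = 3_000_000_000

# 99.9% shared means 1 in every 1,000 positions differs; integer maths avoids float rounding.
SHARED_PER_THOUSAND = 999
differing_base_pairs = GENOME_BASE_PAIRS * (1000 - SHARED_PER_THOUSAND) // 1000

print(f"Approximate differences between two genomes: {differing_base_pairs:,}")
```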

[Figure: Use of big data for implementing personalised medication.]

Thus arose the need for personalised medicine (aka precision medicine). In fact, one of the primary goals of the Human Genome Project was to empower personalised medicine. Indeed, the project has given the industry a considerable boost to break free from the “one-size-fits-all” formula when it comes to medication.

Research studies in genomics (the study of genomes of organisms) generate humongous volumes of data (big data), which requires advanced analytics to decode and make useful. Although precision medicine is still in its infancy, big data has the potential to deliver systematic ways to achieve it.

According to Gil McVean, a professor at the University of Oxford’s Big Data Institute, 90% of a biomedical research centre today will be composed of computers and just 10% will be a wet lab.

One easy way of applying big data analytics in genomics is by looking at 10,000 genomes of people with a disease and 10,000 without. By using an algorithm to compare them, we can find the differences between the two groups more easily.

This process will help in identifying the genes that are linked to a particular disease, even without having a hint of which ones they might be, beforehand.
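
The comparison described above can be sketched as a simple case-control scan: for every genetic variant, compare how often it appears in people with the disease versus people without, then rank variants by the gap. This is a deliberately simplified illustration; the variant IDs and data layout are hypothetical, and real studies rely on proper statistical tests and corrections rather than a raw frequency difference.

```python
from typing import Dict, List, Sequence

# Each person is recorded as 0, 1 or 2 copies of the variant allele.
Genotypes = Dict[str, Sequence[int]]  # variant_id -> one allele count per person


def allele_frequency(allele_counts: Sequence[int]) -> float:
    """Share of chromosomes carrying the variant (two chromosomes per person)."""
    return sum(allele_counts) / (2 * len(allele_counts))


def rank_variants(cases: Genotypes, controls: Genotypes) -> List[str]:
    """Order variants by the absolute frequency gap between cases and controls."""
    shared = cases.keys() & controls.keys()
    gap = {
        variant: abs(allele_frequency(cases[variant]) - allele_frequency(controls[variant]))
        for variant in shared
    }
    # Largest gap first; ties broken alphabetically for a stable order.
    return sorted(gap, key=lambda variant: (-gap[variant], variant))


# Toy data with made-up variant IDs and four people per group.
toy_cases = {"variant_A": [2, 1, 2, 1], "variant_B": [0, 1, 0, 1]}
toy_controls = {"variant_A": [0, 0, 1, 0], "variant_B": [0, 1, 1, 0]}
print(rank_variants(toy_cases, toy_controls))
```

Note that nothing in the scan needs prior knowledge of which variants matter, which is the point made above.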

What are the limitations of big data analytics in healthcare?

Methods for big data analytics and management are being continually developed. Yet, there are some limitations and challenges associated with data collection and storage. Let’s take a look at those:

1. Storage

Storing large volumes of data is one of the primary challenges of big data analytics in healthcare. The bigger a data set is, the more difficult it is to store and analyse it.

Although numerous organisations have the capability to do so, several small to medium-sized clinics find it expensive to rely on cloud-based storage, let alone maintain an on-site server network.

Even if organisations succeed in acquiring high volumes of storage spaces at low cost, there will still be issues regarding security, up-time and ease of access.

2. Accuracy

Healthcare needs vast quantities of data in order to make informed decisions, but this data requires careful analysis. Any inaccuracies can destabilise a healthcare model.

Several researchers have observed that the data collected and stored in EHRs is not entirely accurate. This can be traced back to the individuals responsible for data collection or to inaccuracies in the devices used.

Since big data analytics is only as good as the data it uses, discrepancies can lead to a flawed understanding of the data and call the veracity of data-driven decisions into question.

Additionally, up-to-date data reports need to be clear and accessible to stakeholders, clinicians and administrators. As such, healthcare bodies need to stop relying on historical data and instead, perform monthly reporting in order to optimise cost savings and improve the reliability and accuracy of the data.

3. Adoption

As big data analytics in healthcare is still in its nascent stage, numerous physicians and administrators are still sceptical about its effectiveness. Additionally, big data analytics requires organisations to have in-house data scientists, which can be expensive for small to medium-sized clinics.

4. Updating

Data sets can vary in type, format and volatility. While some data sets of patients such as their date of birth, name and gender remain the same, there are numerous other sets of data such as marital status and address that may rarely change and others like vital signs that change continuously.

Considering such differences, organisations must have a clear understanding of which data sets can be manually updated, which require automation and how data sets can be updated without damaging their integrity.

Unnecessary duplication of data can also make it difficult for physicians and lab technicians to rightly perform their duties.
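
One way to act on the distinction between stable, occasionally changing and continuously changing fields is to tag each field with a volatility class and only accept updates from sources suited to that class. The sketch below is a hypothetical policy, with invented field names and source labels, meant only to show the shape of such a rule set.

```python
from enum import Enum
from typing import Any, Dict


class Volatility(Enum):
    STATIC = "static"          # e.g. date of birth
    OCCASIONAL = "occasional"  # e.g. address, marital status
    CONTINUOUS = "continuous"  # e.g. vital signs


# Hypothetical mapping of record fields to volatility classes.
FIELD_VOLATILITY = {
    "date_of_birth": Volatility.STATIC,
    "address": Volatility.OCCASIONAL,
    "marital_status": Volatility.OCCASIONAL,
    "heart_rate": Volatility.CONTINUOUS,
}

# Hypothetical policy: which update sources each class accepts.
ALLOWED_SOURCES = {
    Volatility.STATIC: {"reviewed_correction"},
    Volatility.OCCASIONAL: {"manual", "reviewed_correction"},
    Volatility.CONTINUOUS: {"device_feed"},
}


def apply_update(record: Dict[str, Any], field: str, value: Any, source: str) -> bool:
    """Apply the update only if `source` is allowed for the field; report whether it was applied."""
    if source not in ALLOWED_SOURCES[FIELD_VOLATILITY[field]]:
        return False
    record[field] = value
    return True
```

Rejecting an automated write to a static field, rather than silently overwriting it, is one way to protect integrity while still automating the fields that change constantly.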

5. Security

From phishing attacks to accidentally misplaced devices, healthcare data is exposed to vulnerabilities, just like data in any other industry. Although safeguards such as encryption and multi-factor authentication can reduce the associated risks, the system is still vulnerable to physical breaches or ransomware episodes.

Case study to understand data science application in healthcare

In Malaysia, private hospitals leveraged advanced AI and data science to refine their market strategy for health tourism, traditionally dependent on inefficient manual methods. By integrating a comprehensive range of data, including demographics and global economic indicators, they utilised machine learning to identify profitable markets.

This approach resulted in a 25% increase in revenue per marketing dollar spent, optimised market targeting, and reduced investment risks. Enhanced predictive capabilities also improved customer service by personalising marketing to anticipated needs.


Final Thoughts

With the growing availability and usage of diverse data, big data analytics has the potential to become an essential part of the healthcare industry.

Indeed, big data will be able to significantly improve the ability of healthcare providers to deliver more personalised and effective patient care, as well as to identify and address emerging health trends and issues and improve the quality of medical procedures and treatment plans.

Within a few years, big data analytics will enable the healthcare industry to break free from the “one-size-fits-all” approach towards treatment and medication. Researchers estimate that big data analytics can help save more than US$300 billion per year in the US alone.

Although the applications of big data analytics in the healthcare industry are still in their infancy, rapid advancements in the tools and methods used for data collection can significantly accelerate their maturation process.