Sunday, May 3, 2020

Big Data Tutorial 3 | Big Data in Healthcare | Big Data Testing

This is the third tutorial in the Big Data Tutorials for Beginners. This Big Data beginner tutorial explains Big Data in Healthcare, Big Data challenges, Big Data Testing, Big Data Testing challenges and Big Data Testing tools. Please view the Big Data tutorial 3 or read on...

First, let us learn about big data in healthcare. The healthcare industry is highly regulated and works with healthcare big data such as patient health records, laboratory test results and test reports, prescriptions, claims and payments.

The main problem has been analyzing the healthcare big data quickly. The Hadoop framework is widely used in the healthcare industry to host the big data for quick processing by Map Reduce jobs. The use cases that I mentioned in my Big Data Tutorial 2 also apply to the healthcare industry, e.g. creating a 360-degree view of patients and physicians, and classifying patients to personalize care and improve efficiency. Big data applications in healthcare may enable improved prescription accuracy, reduced treatment cost and epidemic prediction. In the future, big data may also be used to provide continuous patient monitoring using wearable sensors and Internet of Things devices.

Big data challenges are to perform Data Capture, Data Storage and Data Transfer quickly and cost effectively, and to blend data in multiple formats together during Data Analysis. For example, one of the challenges in Data Capture is data ingestion in Hadoop. Data ingestion means migrating data from source systems to a Hadoop cluster. Since there can be numerous source systems and different ways to ingest data into Hadoop, ingestion can become very complex. Big Data Search, Data Sharing, Data Visualization and Information Privacy are also challenging.

Big Data Testing: Big data testing deals with testing the quality of the data. High quality big data allows an organization to make accurate business decisions. Big data testing includes big data applications testing, data testing, functional testing and performance testing. Data testing includes:
  • Data Staging Validation: This is data ingestion testing. It validates the data being loaded into the Hadoop framework by comparing the source data with the data loaded into Hadoop. It also tests that the data has been loaded correctly and at the correct location. Data staging validation checks the completeness, accuracy, integrity, consistency, validity and standardization of the data, and that it contains no duplicates. Data staging validation of structured data is simpler than that of semi-structured and unstructured data.
  • Map Reduce Validation: This is data processing testing. It tests the business logic and the outputs of the big data applications working on Hadoop. Map reduce validation checks that the Map Reduce process implements the data segregation and data aggregation rules and generates the key value pairs correctly.
  • Output Validation: This is output testing. It tests that the Hadoop data matches the data moved into target systems like data warehouses. Output validation first checks the data quality of the output data files generated by Hadoop. Then, it tests the ETL process. Finally, it compares the Hadoop data with the data in the target system to check that the load is complete and accurate.
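The data testing checks listed above can be illustrated with a small, self-contained Python sketch. It is only an illustration on in-memory records, not how any particular Hadoop tool works. The sample lab records and the names `row_fingerprint`, `compare_datasets`, `map_record` and `reduce_counts` are hypothetical. The first part compares a source dataset with a loaded (target) dataset, as data staging validation and output validation do. The second part runs a toy map step and reduce step and compares the resulting key value pairs with a hand-computed expectation, as map reduce validation does.

```python
import hashlib
from collections import Counter, defaultdict


def row_fingerprint(row):
    """Return a stable hash of one record (a tuple of field values)."""
    joined = "\x1f".join(str(field) for field in row)
    return hashlib.sha256(joined.encode("utf-8")).hexdigest()


def compare_datasets(source_rows, target_rows):
    """Compare two record collections as multisets of row fingerprints."""
    source = Counter(row_fingerprint(r) for r in source_rows)
    target = Counter(row_fingerprint(r) for r in target_rows)
    return {
        "source_count": sum(source.values()),
        "target_count": sum(target.values()),
        # Records present in the source but absent (or fewer) in the target.
        "missing_in_target": sum((source - target).values()),
        # Records the target has more copies of than the source (e.g. duplicates).
        "unexpected_in_target": sum((target - source).values()),
    }


def map_record(row):
    """Hypothetical mapper: emit (test_name, 1) for each lab result."""
    _patient_id, _date, test_name, _value = row
    yield test_name, 1


def reduce_counts(pairs):
    """Hypothetical reducer: sum the values emitted for each key."""
    totals = defaultdict(int)
    for key, value in pairs:
        totals[key] += value
    return dict(totals)


# Made-up sample data: (patient id, date, lab test, result value).
source_rows = [
    ("P001", "2020-04-01", "glucose", 5.4),
    ("P002", "2020-04-01", "glucose", 6.1),
    ("P003", "2020-04-02", "hba1c", 7.0),
]
target_rows = [
    ("P001", "2020-04-01", "glucose", 5.4),
    ("P002", "2020-04-01", "glucose", 6.1),
    ("P002", "2020-04-01", "glucose", 6.1),  # loaded twice by mistake
]

# Staging/output validation: equal counts alone would hide the problem here.
report = compare_datasets(source_rows, target_rows)
print(report)

# Map reduce validation: compare the job output with a hand-computed result.
pairs = [pair for row in source_rows for pair in map_record(row)]
counts = reduce_counts(pairs)
expected_counts = {"glucose": 2, "hba1c": 1}
print(counts == expected_counts)
```

In this sample both datasets hold three records, so a simple row count comparison would pass. The fingerprint comparison still reports one missing record and one unexpected (duplicated) record, which is why a record-level comparison is useful in addition to counts.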
Functional testing of the big data applications consists of testing the functionality of the big data applications provided in their user interface. Performance testing of the big data applications consists of measuring the data ingestion speed and data processing speed (of Map Reduce jobs) with metrics like throughput (of data ingestion), core utilization and memory utilization. Performance testing includes failover testing (to find if Big Data processing continues in the presence of failed nodes) and sub-component performance testing (to test each component of the Hadoop framework in isolation).
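As a rough illustration of the throughput metric mentioned above, the following Python sketch times a stand-in ingestion step and reports records per second. The function `fake_ingest` is a hypothetical placeholder that only copies records into a list. In a real test it would be replaced by the actual ingestion into the cluster, and core utilization and memory utilization would be collected separately with monitoring tools, which this sketch does not do.

```python
import time


def measure_throughput(ingest, records):
    """Time one call to ingest(records) and return records per second."""
    start = time.perf_counter()
    ingest(records)
    elapsed = time.perf_counter() - start
    # Guard against a zero reading on very coarse timers.
    return len(records) / elapsed if elapsed > 0 else float("inf")


def fake_ingest(records):
    """Hypothetical stand-in for a real ingestion step."""
    sink = []
    for record in records:
        sink.append(record)
    return sink


# Made-up records: (patient id, lab test, result value).
records = [("P%05d" % i, "glucose", 5.0) for i in range(100_000)]

rate = measure_throughput(fake_ingest, records)
print(f"{rate:,.0f} records/second")
```

The number printed depends on the machine and on the ingestion step being measured, so it is only meaningful when compared across runs made under the same conditions.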

Big Data Testing challenges include the availability of enough source test data, the complexity of the QA environment and the skills needed to build it, the complexity of testing unstructured data and the need for multiple tools, and the high skill level required to automate big data testing (because of unforeseen issues that may occur in unstructured data).

Big Data Testing tools: The Big Data Tester can use the tools in the Hadoop ecosystem for big data testing. Due to the complexity of the big data QA environment and the volume, velocity and variety of big data, no single tool can currently do end to end big data testing. Some big data testing tools are:
  • Tricentis Tosca BI and Data Warehouse Testing tests data integrity with built-in automated tests like pre-screening tests, ETL tests (such as completeness, uniqueness and referential integrity tests) and other tests.
  • QuerySurge compares the source and target data systems and highlights data differences automatically. It also has features like test management integration, test monitoring and reporting, and its own API.
  • TestingWhiz works with Hadoop, MongoDB and Teradata. It allows data validation tests and performance tests in big data testing.
Want to learn more including Big Data challenges in Healthcare and Big Data Questions and Answers? Please view my Big Data tutorial 3. Thank you.

