April 26, 2020

Big Data Tutorial 2 | Big Data Analytics | Hadoop Tutorial for Beginners

This is the second tutorial in the Big Data Tutorials for Beginners series. It also serves as the Hadoop Tutorial for Beginners. This Big Data beginner tutorial explains data analytics, big data analytics, Big Data and Hadoop, big data applications, big data analytics tools, big data visualization tools and big data use cases. Please view the Big Data tutorial 2 or read on...

What is Data Analytics? It is the analysis of data sets to find patterns and insights. Data Analytics uses multiple technologies and techniques, and it enables informed business decisions.


As shown above, Data Analytics can be divided into the following sub-categories:
  • Descriptive analytics: analysis of past data to describe the current state
  • Predictive analytics: data analysis to find patterns and forecast the future situation
  • Prescriptive analytics: data analysis to recommend actions to exploit an advantage or mitigate a future issue
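
The three sub-categories can be illustrated with a small Python sketch. Everything in it is an assumption made for illustration: the sales figures are made up, the straight-line forecast is only one simple way to predict, and the function names (`describe`, `forecast_next`, `recommend`) and the capacity threshold are hypothetical.

```python
from statistics import mean

# Hypothetical monthly sales figures (made-up numbers for illustration).
monthly_sales = [100.0, 110.0, 120.0, 130.0, 140.0, 150.0]


def describe(values):
    """Descriptive analytics: summarize what has already happened."""
    return {"mean": mean(values), "min": min(values), "max": max(values)}


def forecast_next(values):
    """Predictive analytics: fit a straight line by least squares
    and extend it one step into the future."""
    n = len(values)
    xs = range(n)
    x_bar = mean(xs)
    y_bar = mean(values)
    covariance = sum((x - x_bar) * (y - y_bar) for x, y in zip(xs, values))
    variance = sum((x - x_bar) ** 2 for x in xs)
    slope = covariance / variance
    intercept = y_bar - slope * x_bar
    return intercept + slope * n


def recommend(forecast, capacity):
    """Prescriptive analytics: turn the forecast into a suggested action.
    The capacity threshold is an assumed business rule."""
    if forecast > capacity:
        return "increase stock"
    return "keep current stock"


if __name__ == "__main__":
    print(describe(monthly_sales))
    next_month = forecast_next(monthly_sales)
    print(next_month)
    print(recommend(next_month, capacity=150.0))
```

Real projects would typically use a statistics or machine learning library for the predictive step. The point here is only the division of work: describe the past, forecast the future, then recommend an action.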
Next, what is Big Data Analytics? It is the process of examining and analyzing big data to find patterns, correlations and trends. Big Data Analytics allows data analysts to make informed decisions faster. It draws inferences using techniques like natural language processing, statistics, machine learning, predictive analytics and data mining. Big Data Analytics tools include Hadoop and related tools like HBase, Hive and Pig.

Big Data and Hadoop: Hadoop is an open-source big data framework from the Apache Software Foundation. Apache Hadoop uses distributed storage (many computers) to handle big data, and it uses the MapReduce technique to analyze that data. Hadoop has two main components: 1) HDFS (Hadoop Distributed File System), which manages big data storage, and 2) MapReduce, which manages data processing. Hadoop divides data into many blocks and distributes these blocks across the computers in a cluster. Then, Hadoop sends code to the nodes for data processing using the Map and Reduce technique. All the tools used by an organization in its big data architecture form the big data stack. Some tools in the Hadoop ecosystem include:
  • Apache Hadoop YARN is the resource manager, job scheduler and job monitor in Hadoop.
  • Apache HBase is the distributed database that runs on top of HDFS. HBase is a non-relational database that stores data as key-value pairs.
  • Apache Hive is a tool for data querying and analysis. Hive allows SQL-like queries to fetch data from HDFS and the databases managed by Hadoop.
  • Apache Mahout is a tool for machine learning and data mining tasks.
  • Apache Pig is a platform to write code to run on Hadoop. Pig uses the Pig Latin language, which makes it easier to write programs using the MapReduce technique.
  • Apache Ambari is a tool to provision, manage and monitor Hadoop clusters.
  • Apache Spark is a compute engine for massive data. Spark offers a programming model for ETL, streaming, machine learning and graph processing. To use Apache Spark, we can write programs in Java, Python, Scala, R or SQL.
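
The Map and Reduce technique described above can be sketched in plain Python with a word count, the usual beginner example. This is a single-machine illustration, not Hadoop's actual API: the function names and the tiny block size are assumptions, and a real cluster would run the map step on many nodes in parallel.

```python
from collections import defaultdict


def split_into_blocks(lines, block_size):
    """Split the input into fixed-size blocks, loosely mirroring
    how Hadoop divides data into blocks across a cluster."""
    return [lines[i:i + block_size] for i in range(0, len(lines), block_size)]


def map_phase(block):
    """Map: emit a (word, 1) pair for every word in one block."""
    return [(word.lower(), 1) for line in block for word in line.split()]


def shuffle(mapped_pairs):
    """Shuffle: group all values by key so each word is reduced once."""
    grouped = defaultdict(list)
    for key, value in mapped_pairs:
        grouped[key].append(value)
    return grouped


def reduce_phase(grouped):
    """Reduce: sum the counts collected for each word."""
    return {word: sum(counts) for word, counts in grouped.items()}


def word_count(lines, block_size=2):
    blocks = split_into_blocks(lines, block_size)
    # On a real cluster each block would be mapped on a different node;
    # here the blocks are simply processed one after another.
    mapped = [pair for block in blocks for pair in map_phase(block)]
    return reduce_phase(shuffle(mapped))


if __name__ == "__main__":
    sample = ["big data", "big cluster", "data data"]
    print(word_count(sample))
```

Because each block is mapped independently, the map step can be spread across many computers, which is what lets this technique scale to big data.
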
There are many big data applications and big data tools that help organizations create their custom applications. Some examples of big data tools are the Teradata database (to import data to Hadoop, query data and export data from Hadoop) and big data analytics software like Statistica (for predictive analytics), IBM's Watson Analytics and MongoDB (for querying unstructured data). Big data analytics means analysis of big data sets to find patterns and extract insights. Some examples of big data analytics tools are Tableau Public, KNIME, Plotly and Elasticsearch. Some of the popular big data visualization tools are Tableau, Google Charts and D3.js. Other tools for big data visualization include DataWrapper, FusionCharts and Plotly.

Some of the popular big data use cases applicable to many industry domains are shown above. These are:
  • 360-degree view creation of an entity (e.g., the customer, the patient or the student)
  • Customer classification (into several categories) for relevant communication
  • Price optimization based on demand, competition and customer profiles (especially useful in eCommerce, airline and hotel industries)
  • New product/service development (based on features that contribute to success)
  • Distribution optimization (based on forecasted demand, expected traffic conditions and so on)
  • Fraud prevention (to flag potentially fraudulent transactions in real-time)
These are just a few big data analytics examples. Big data analytics enables risk assessment in the insurance industry, product recommendation in eCommerce and customer care in every industry.
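
As one illustration of the fraud prevention use case above, the sketch below flags transactions that are far larger than the typical amount. The rule (more than five times the median), the function name `flag_suspicious` and the sample amounts are all assumptions made for illustration. Production fraud systems use far richer features and models, and they score transactions as they arrive rather than in a batch like this.

```python
from statistics import median


def flag_suspicious(amounts, multiplier=5.0):
    """Return the indexes of transactions far above the typical amount.

    'Typical' is the median of the amounts; the multiplier is an
    assumed threshold, not an industry standard.
    """
    typical = median(amounts)
    return [i for i, amount in enumerate(amounts) if amount > multiplier * typical]


if __name__ == "__main__":
    # Made-up transaction amounts; the fifth one is unusually large.
    transactions = [20.0, 25.0, 22.0, 30.0, 500.0, 24.0]
    print(flag_suspicious(transactions))
```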

Want to learn more about Big Data tools, or see Big Data questions and answers? Please view my Big Data tutorial 2. Thank you.
