This is the second tutorial in the Big Data Tutorials for Beginners. It is also the Hadoop Tutorial for Beginners. This Big Data beginner tutorial explains what data analytics is and covers big data analytics, Big Data and Hadoop, big data applications, big data analytics tools, big data visualization tools and big data use cases. Please view the Big Data tutorial 2 or read on...

What is Data Analytics? It is the analysis of data sets to find patterns and insights. Data Analytics uses multiple technologies and techniques, and it enables informed business decisions.
As shown above, Data Analytics can be divided into the following sub-categories:
- Descriptive analytics: analysis of past data to describe the current state
- Predictive analytics: data analysis to find patterns and forecast the future situation
- Prescriptive analytics: data analysis to recommend actions to exploit an advantage or mitigate a future issue
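The three sub-categories can be illustrated with a small Python sketch. This is only a toy example under stated assumptions: the sales figures, the function names and the reorder rule are all hypothetical, and the "prediction" is a simple straight-line fit rather than a production forecasting method.

```python
from statistics import mean

# Hypothetical monthly unit sales, made up for illustration.
monthly_sales = [100, 110, 120, 130, 140, 150]


def describe(sales):
    """Descriptive analytics: summarize what has already happened."""
    return {"total": sum(sales), "average": mean(sales), "latest": sales[-1]}


def predict_next(sales):
    """Predictive analytics: fit a least-squares line and extrapolate one month."""
    n = len(sales)
    xs = list(range(n))
    x_mean, y_mean = mean(xs), mean(sales)
    covariance = sum((x - x_mean) * (y - y_mean) for x, y in zip(xs, sales))
    variance = sum((x - x_mean) ** 2 for x in xs)
    slope = covariance / variance
    intercept = y_mean - slope * x_mean
    return slope * n + intercept


def prescribe(forecast, stock_on_hand):
    """Prescriptive analytics: turn the forecast into a recommended action."""
    shortfall = forecast - stock_on_hand
    if shortfall > 0:
        return f"order {round(shortfall)} more units"
    return "no reorder needed"


summary = describe(monthly_sales)
forecast = predict_next(monthly_sales)
action = prescribe(forecast, stock_on_hand=120)
print(summary, forecast, action)
```

Here `describe` looks only at past data, `predict_next` projects the pattern forward, and `prescribe` recommends an action based on that projection.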
Big Data and Hadoop: Hadoop is an open-source big data framework developed under the Apache Software Foundation. Apache Hadoop uses distributed storage (many computers) to handle big data, and it uses the MapReduce data analysis technique. Hadoop has two core components: 1) HDFS (Hadoop Distributed File System), which manages the big data storage, and 2) MapReduce, which manages the data processing. Hadoop divides data into many blocks and distributes these blocks across the computers (nodes) in a cluster. Then, Hadoop sends code to the nodes so that each node processes its own blocks using the Map and Reduce technique.

All the tools used by an organization in its big data architecture form the big data stack. Some tools in the Hadoop ecosystem include:
- Apache Hadoop YARN is the resource manager, job scheduler and job monitor in Hadoop.
- Apache HBase is the distributed database that works on HDFS in big data. HBase is a non-relational, column-oriented database that stores data as key-value pairs.
- Apache Hive is a tool for data querying and analysis. Hive allows SQL-like queries to fetch data from HDFS and the databases managed by Hadoop.
- Apache Mahout is a tool for machine learning and data mining tasks.
- Apache Pig is a platform to write code to run on Hadoop. Pig uses Pig Latin, a scripting language that makes it easier to write programs using the MapReduce technique.
- Apache Ambari is a tool to provision, manage and monitor Hadoop clusters.
- Apache Spark is a compute engine for massive data. Spark offers a programming model for ETL, streaming, machine learning and graph processing. To use Apache Spark, we can write programs in Java, Python, Scala, R or SQL.
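To make the Map and Reduce technique described earlier more concrete, here is a minimal single-machine sketch in plain Python. It does not use Hadoop or any Hadoop API. The function names, the sample lines and the block size are assumptions chosen for illustration, and real HDFS splits files into blocks by size in bytes rather than by line count.

```python
from collections import defaultdict


def split_into_blocks(records, block_size):
    """Split the input into fixed-size blocks (a stand-in for HDFS blocks)."""
    return [records[i:i + block_size] for i in range(0, len(records), block_size)]


def map_phase(block):
    """Map: emit a (word, 1) pair for every word in one block."""
    return [(word.lower(), 1) for line in block for word in line.split()]


def shuffle(mapped_pairs):
    """Shuffle: group all emitted values by their key."""
    grouped = defaultdict(list)
    for key, value in mapped_pairs:
        grouped[key].append(value)
    return grouped


def reduce_phase(grouped):
    """Reduce: combine the values for each key into a single count."""
    return {key: sum(values) for key, values in grouped.items()}


def word_count(lines, block_size=2):
    """Run the whole pipeline; on a cluster each block would be mapped on its own node."""
    blocks = split_into_blocks(lines, block_size)
    mapped = [pair for block in blocks for pair in map_phase(block)]
    return reduce_phase(shuffle(mapped))


# Hypothetical input lines, made up for illustration.
lines = [
    "big data needs big storage",
    "hadoop stores big data",
    "spark processes data",
]
counts = word_count(lines)
print(counts)
```

In this sketch the blocks are processed one after another in a single process. The point of Hadoop is that the map step runs in parallel on the nodes that hold each block, and only the grouped intermediate results travel across the network to the reduce step.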
Some of the popular big data use cases applicable to many industry domains are shown above. These are:
- 360-degree view creation of an entity (e.g., the customer, the patient or the student)
- Customer classification (into several categories) for relevant communication
- Price optimization based on demand, competition and customer profiles (especially useful in eCommerce, airline and hotel industries)
- New product/service development (based on features that contribute to success)
- Distribution optimization (based on forecasted demand, expected traffic conditions and so on)
- Fraud prevention (to flag potentially fraudulent transactions in real-time)
Want to learn more about Big Data tools, or see Big Data questions and answers? Please view my Big Data tutorial 2. Thank you.