Sunday, April 19, 2020

Big Data Tutorial 1 | What is big data | Characteristics of big data

This is the first tutorial in the Big Data Tutorials for Beginners series. It explains what Big Data is, what Big Data technology is, and the types and characteristics of Big Data. Please view the Big Data tutorial 1 or read on...

First, what is big data? Big data means massive data sets that we can analyze to find patterns. Such data sets are too big, complex, varied or dynamic to be processed by traditional DBMSs, software applications and procedures. Traditional data, on the other hand, sits in relational databases and files in various formats. Big data comes from multiple sources, including emails, messages, devices (e.g. mobile devices, cameras, RFID readers and wireless sensors), software logs and databases. That is the difference between big data and traditional data. Synonyms for big data include large data, massive data, lots of data and large data volumes.

Next, what is big data technology? Big data technologies include Apache Hadoop, Apache Spark, NoSQL databases and predictive analytics tools. They enable organizations to analyze their operations and make informed decisions. The advantages of big data for organizations are that they can acquire new customers, retain existing customers, increase revenues and decrease costs. The advantages for consumers are personalized service and more features. The advantages for the public are health benefits and social benefits.

Next, a short introduction to big data in everyday life. Each one of us creates and uses big data daily. For example, if you search for a term on a search engine like Google, the search engine uses big data to provide the search results. If you search for a product on an ecommerce website like Amazon, you see recommended products. The recommendation engines behind them use big data.


Next, let us look at the types of big data. Structured data is organized, labelled and exists in a fixed arrangement, e.g. tabular data in spreadsheets and relational databases. Semi-structured data has some organization, such as tags or keys, but does not follow a fixed arrangement, e.g. XML and JSON files. Unstructured data is neither organized nor in any fixed arrangement, e.g. text files, images, audio files, video files, emails and social media data (tweets and blog posts). Unstructured data forms the majority of big data.
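To make the three types concrete, here is a minimal Python sketch using only the standard library. The order records, field names and review text are made-up sample data for illustration, not taken from any real system.

```python
import csv
import io
import json

# Structured: fixed columns, so every row has exactly the same fields.
csv_text = "order_id,customer,amount\n1001,Asha,25.50\n1002,Ravi,40.00\n"
rows = list(csv.DictReader(io.StringIO(csv_text)))

# Semi-structured: keys describe the data, but records may differ in shape.
json_text = """
[
  {"order_id": 1001, "customer": "Asha", "tags": ["gift"]},
  {"order_id": 1002, "customer": "Ravi", "address": {"city": "Pune"}}
]
"""
records = json.loads(json_text)

# Unstructured: free text with no fields at all; we can only split it up.
review = "Ordered last week, arrived late but the product is great!"
words = review.split()

# Compare the field names found in each record.
structured_fields = [sorted(row.keys()) for row in rows]
semi_fields = [sorted(record.keys()) for record in records]

print("Structured rows share fields:", structured_fields[0] == structured_fields[1])
print("Semi-structured records share fields:", semi_fields[0] == semi_fields[1])
print("Unstructured text word count:", len(words))
```

The structured rows always share the same fields, the JSON records do not, and the review text has no fields to compare in the first place.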

Next, let us learn the characteristics of big data, also called the 5 Vs of Big Data. The first V is Volume, which is the size of the data or the growth in the quantity of data. The second V, Variety, refers to the different formats of the data. Big data can be in various formats like relational data, text (documents, emails, messages in social media), images, audio, video and machine-generated data (from mobile devices, wearables, sensors, RFID tags and server logs).


Velocity is the speed of data generation or processing. Typically, the velocity of structured data is less than the velocity of unstructured data. Big data can be processed via batch processing or (typically costlier) real-time processing. Extreme velocity often calls for scalable, typically cloud-based, technology to process the data quickly.

Veracity refers to the quality of data, i.e. the accuracy, consistency and completeness of data. Big data can include data of differing quality.

Value refers to turning big data into increased revenue, reduced costs or other benefits like customer satisfaction, employee satisfaction, social benefits or health benefits. An example of value in ecommerce is recommending products that the customer is likely to buy. In logistics, it is route optimization leading to shorter delivery times and reduced transportation costs. In utilities, it is reduced customer churn. Value is the most important Big Data V.
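Going back to Velocity, the difference between batch processing and real-time processing can be sketched in a few lines of Python. This is only an illustration under simple assumptions: the sensor readings are made-up values, and the function names `batch_average` and `streaming_average` are hypothetical. Real systems would use tools such as Hadoop or Spark.

```python
from typing import Iterable, Iterator, List


def batch_average(readings: List[float]) -> float:
    """Batch processing: wait for the full data set, then compute once."""
    return sum(readings) / len(readings)


def streaming_average(readings: Iterable[float]) -> Iterator[float]:
    """Real-time style processing: update the result as each reading arrives."""
    count = 0
    total = 0.0
    for value in readings:
        count += 1
        total += value
        yield total / count  # an up-to-date answer after every reading


# Made-up temperature readings from a sensor.
sensor_readings = [20.0, 22.0, 21.0, 25.0]

batch_result = batch_average(sensor_readings)
stream_results = list(streaming_average(sensor_readings))

print("Batch result:", batch_result)
print("Streaming results:", stream_results)
```

The batch version gives one answer at the end, while the streaming version gives an answer after every reading. Both agree once all the data has arrived.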

There are even more big data characteristics. Valence refers to the connections between the data. Data with more interconnections is more complex than data with fewer interconnections. Variability refers to how consistently the data is generated, that is, whether the data is available regularly or only intermittently. Virality refers to the speed at which the data spreads.
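One common way to put a number on valence is the ratio of actual connections to all possible connections between data items. The sketch below assumes that definition. The function name `valence` and the user names are hypothetical sample data.

```python
from typing import Set, Tuple


def valence(num_items: int, connections: Set[Tuple[str, str]]) -> float:
    """Ratio of actual connections to all possible pairwise connections."""
    possible = num_items * (num_items - 1) / 2
    if possible == 0:
        return 0.0
    return len(connections) / possible


# Four users; each pair in the set is one connection between two users.
sparse_links = {("ana", "bob"), ("bob", "cy")}
dense_links = {
    ("ana", "bob"), ("ana", "cy"), ("ana", "dee"),
    ("bob", "cy"), ("bob", "dee"),
}

sparse_valence = valence(4, sparse_links)
dense_valence = valence(4, dense_links)

print("Sparse valence:", round(sparse_valence, 2))
print("Dense valence:", round(dense_valence, 2))
```

With four users there are six possible pairs, so the second data set, with five connections, has the higher valence and is the more complex of the two.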

Want to learn more, with examples, and see Big Data interview questions and answers? Please view my Big Data tutorial 1. Thank you.

3 comments:

  1. Thanks for this blog post admin, you have done really nice work.
    Manual testing

  2. Inder, thank you for such an informative post!
    I'd like to know your opinion about my article on software development cost https://www.cleveroad.com/blog/software-development-costs. What do you think?

    Replies
    1. @Yuliia, thank you for sharing the article on app development cost estimation. Very useful.
