The term big data is becoming increasingly popular and seems to be everywhere. Using these two words in any context draws attention from everyone, from executives to data scientists and business users, in organizations of any size.
While there are many definitions for big data, we offer you a simple definition: Big data can be defined as data that cannot be processed using traditional data processing techniques due to its characteristics and complexity.
To better understand what big data really is, we'll first explore its different categories:
Unstructured data – Text, videos, audio and images.
Semi-structured data – Emails, earnings reports, spreadsheets, software modules.
Structured data – Sensor data, machine data and mathematical model outputs.
If you take a step back and look closely at these categories of data, they share a few common characteristics that you need to understand:
Volume – Each of these data categories is, by default, large in volume and variable in size.
Variety – The data can be available in a variety of formats, languages and sources.
Ambiguity – The data can contain metadata about itself or have no metadata.
Quality – The quality of data in the unstructured and semi-structured categories is unreliable.
These characteristics make the acquisition and processing of big data an extremely complex activity.
The biggest challenge in this process is proving the value gained from integrating big data.
Today, big data programs in organizations are often tied to exploring the potential of a platform (like Hadoop), and the associated business case includes social media data integration, blog post parsing or machine learning exploration.
The underlying value of these exercises is quantified as ROI in terms of sales lift, market share and customer centricity, but the critical path is understanding the data and its relevance to the business. Before you put a business case document together, spend some time understanding the data and its content, so that you can see what value can be derived from it. This is where data governance comes into play.
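As one possible starting point for understanding the data, a small profiling pass can summarize the four characteristics described earlier (volume, variety, ambiguity and quality). The sketch below is only an illustration under stated assumptions: the record layout with `format`, `content` and `metadata` keys, the `profile_records` function and the sample records are all hypothetical, and a real program would profile far larger data on a platform suited to it.

```python
from collections import Counter


def profile_records(records):
    """Summarize volume, variety, ambiguity and quality for a list of records.

    Assumption: each record is a dict with the hypothetical keys
    'format', 'content' and 'metadata'.
    """
    # Volume: how many records there are and how large their content is.
    total_records = len(records)
    total_bytes = sum(
        len(str(r.get("content") or "").encode("utf-8")) for r in records
    )

    # Variety: how many records arrive in each format.
    formats = Counter(r.get("format", "unknown") for r in records)

    # Ambiguity: records that carry no metadata about themselves.
    missing_metadata = sum(1 for r in records if not r.get("metadata"))

    # Quality: a very crude check that only counts records with empty content.
    empty_content = sum(1 for r in records if not r.get("content"))

    return {
        "volume_records": total_records,
        "volume_bytes": total_bytes,
        "variety_formats": dict(formats),
        "ambiguity_missing_metadata": missing_metadata,
        "quality_empty_content": empty_content,
    }


# Hypothetical sample data, invented for illustration only.
sample = [
    {"format": "text", "content": "Great product!", "metadata": {"lang": "en"}},
    {"format": "email", "content": "Quarterly results attached.", "metadata": None},
    {"format": "sensor", "content": "", "metadata": {"unit": "C"}},
]

report = profile_records(sample)
for name, value in report.items():
    print(f"{name}: {value}")
```

A summary like this does not prove business value by itself, but it gives the business case a factual footing: it shows how much data there is, in which formats, and how much of it lacks metadata or content.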
Big data is more than simply a matter of size; it is an opportunity to find insights in new and emerging types of data and content, to make your business more agile, and to answer questions that were previously considered beyond your reach.