Big data has been one of the most popular topics in IT over the last year, and that trend is likely to continue. The term is appearing everywhere – on social networks, blogs and news portals. But what is it all about?
Given the current state of IT, most organizations now generate and store data in large quantities. This data becomes Big data when its volume, velocity or variety exceeds the ability of the organization's IT systems to ingest, store, analyze, and process it. The amount of data available presents a tremendous opportunity to gain better business insights through analytics. However, traditional analytics methods are not equipped to deal with data growing at such speed and in such a variety of types. Data is collected not only from computers, but also from billions of mobile phones, tens of billions of social media posts, and an ever-expanding array of networked sensors in cars, utility meters, shipping containers, shop floor equipment, point of sale terminals and many other sources. Because it is acquired from many different sources in many different formats, both structured and unstructured, this data usually doesn't fit well into relational databases. Besides useful information, it also contains a lot of noise that should be removed. It is therefore only logical that the data should be pre-processed, most importantly reduced, before being stored in a data warehouse.
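The pre-processing step described above can be sketched in a few lines. This is a minimal illustration, not a production pipeline: the record format, field names and filtering rules are all hypothetical, standing in for whatever cleaning a real feed would need before a warehouse load.

```python
import json

# Hypothetical raw feed: mixed-quality JSON records from different sources.
raw_lines = [
    '{"sensor_id": "A1", "reading": 21.5, "ts": "2013-05-01T10:00:00"}',
    '{"sensor_id": "A1", "reading": 21.5, "ts": "2013-05-01T10:00:00"}',  # duplicate
    'corrupted ### not json',                                             # noise
    '{"sensor_id": "B7", "reading": null, "ts": "2013-05-01T10:01:00"}',  # missing value
    '{"sensor_id": "C3", "reading": 19.8, "ts": "2013-05-01T10:02:00"}',
]

def preprocess(lines):
    """Parse, filter and deduplicate raw records before the warehouse load."""
    seen = set()
    clean = []
    for line in lines:
        try:
            record = json.loads(line)
        except ValueError:
            continue  # drop records that are not valid JSON (noise)
        if record.get("reading") is None:
            continue  # drop records missing the actual measurement
        key = (record["sensor_id"], record["ts"])
        if key in seen:
            continue  # drop exact duplicates
        seen.add(key)
        clean.append(record)
    return clean

cleaned = preprocess(raw_lines)
print(len(cleaned))  # 2 of the 5 raw lines survive the reduction
```

Even this toy version shows the point: the five raw lines reduce to two usable records, so the warehouse stores a fraction of what arrived.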
Relational databases contain structured data, for which the relationships are already known. This enables BI users to know what they can expect and how to analyze this data. Big data analysis involves making “sense” out of large volumes of varied data without a defined data model or relationships. During this process you should expect:
- Discovery – Detecting relationships between different data sets.
- Iteration – Repeating the process to uncover new answers with each iteration.
- Flexible Capacity – Being prepared to use more time and resources, due to the iterative nature of big data analysis.
- Mining and Predicting – Predictive analytics can give many insights into patterns and relationships while mining the data.
- Decision Management – If you are often using big data analytics to drive operational decisions, you should consider how to automate and optimize the implementation of all those actions.
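The decision-management point above can be sketched as a thin rule layer on top of a model: a score crosses a threshold, and an action fires automatically. The account IDs, scores and threshold below are all assumed values for illustration; in practice the scores would come from a real predictive model and the threshold would be tuned per campaign.

```python
# Hypothetical churn scores produced by a predictive model (0.0 - 1.0).
churn_scores = {"acct-001": 0.92, "acct-002": 0.15, "acct-003": 0.71}

RETENTION_THRESHOLD = 0.7  # assumed cut-off, tuned per campaign in practice

def decide(scores, threshold=RETENTION_THRESHOLD):
    """Map each account to an automated action based on its model score."""
    actions = {}
    for account, score in scores.items():
        if score >= threshold:
            actions[account] = "send_retention_offer"
        else:
            actions[account] = "no_action"
    return actions

actions = decide(churn_scores)
print(actions["acct-001"])  # send_retention_offer
```

Automating the rule is the easy part; the operational work is in monitoring how often it fires and whether the triggered actions actually pay off.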
To analyze Big data efficiently, it is important to minimize data movement, use existing skills and attend to data security. Minimizing data movement conserves computing resources, which matters most when dealing with large volumes of data. Since most organizations have more people who can analyze data using SQL than using MapReduce, it is important to support both types of processing. Unstructured data sources and open source analysis tools often lack administration policies and security controls, so great attention should be paid to ensuring data security.
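The SQL/MapReduce point can be made concrete by expressing the same aggregation both ways. The snippet below is a toy sketch: the `sales` data is invented, and the map and reduce phases run in-process rather than on a cluster, but the shape of the computation mirrors how a real MapReduce job would answer the SQL query shown in the comment.

```python
from itertools import groupby
from operator import itemgetter

# Hypothetical sales records. A SQL analyst would write:
#   SELECT region, SUM(amount) FROM sales GROUP BY region;
sales = [
    ("east", 100), ("west", 250), ("east", 75), ("north", 40), ("west", 10),
]

def map_phase(records):
    # Emit (key, value) pairs; here the records are already in that shape.
    for region, amount in records:
        yield (region, amount)

def reduce_phase(pairs):
    # Sort by key, group, and sum values, as a MapReduce reducer would.
    result = {}
    for key, group in groupby(sorted(pairs, key=itemgetter(0)), key=itemgetter(0)):
        result[key] = sum(v for _, v in group)
    return result

totals = reduce_phase(map_phase(sales))
print(totals)  # {'east': 175, 'north': 40, 'west': 260}
```

Both routes produce the same per-region totals; the organizational question is which skill set your analysts already have, which is why supporting both matters.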
Finally, let’s look at some real world examples of Big data use:
- Wal-Mart Stores Inc. – Using Polaris, a platform that relies on text analysis, machine learning and even synonym mining to produce relevant search results, Wal-Mart brought semantic search to Walmart.com's search engine. According to the company, semantic search increased the rate at which online shoppers complete a purchase by 10% to 15%.
- American Express Co. – After recognizing the limits of traditional BI, American Express started looking for indicators that could predict customer loyalty. This led to developing sophisticated predictive models that analyze historical transactions and more than 100 variables to forecast potential churn. AmEx believes it can now identify 24% of the Australian accounts that will close within the next four months.
- Morton’s the Steakhouse – A customer jokingly tweeted that Morton's should have dinner waiting for him at the airport. Morton's responded by analyzing its data: it discovered he was a frequent customer, determined what he might order and, after figuring out his flight, sent a tuxedo-clad delivery person to serve him his dinner. Even though this was a publicity stunt, the fact remains that the company had the data and was able to use it.
- Tipp24 AG – One of Europe's leading lottery brokers uses KXEN software to analyze transactions and customer attributes, and to develop predictive models that target customers and personalize marketing messages on the fly. This has cut the time needed to build predictive models by 90%.