Companies have spent many years building enterprise data warehouses and using business intelligence (BI) tools to report on the business. But predictive analytics is different. Advanced statistical, data mining, and machine learning algorithms dig deeper to find patterns that traditional BI tools may not reveal.
Many of these techniques are not new, but big data has breathed new life into the possibilities because more data can mean more and better predictive models. Big data is the fuel and predictive analytics is the engine that companies need to discover, deploy, and profit from the knowledge they gain.
Forrester defines big data predictive analytics solutions as:
Software and/or hardware solutions that allow companies to discover, evaluate, optimize, and deploy predictive models by analyzing big data sources to improve business performance or mitigate risk.
Predictive analytics uses algorithms to find patterns in data that might predict similar outcomes in the future. A common example of predictive analytics is to find a model that will predict which customers are likely to churn. But this isn’t a one-time operation; companies must rerun their analysis on new data to make sure the models are still effective and to respond to changes in customer desires and competitors. Many companies analyze data weekly or even continuously.
In order to maximize success with predictive analytics programs, companies must:
Set the business goals
Clearly stated business goals lie at the center of any successful predictive analytics project. For example, the goal might be to recommend items to upsell to existing customers — or to prevent life-threatening and costly hospital re-admittance. Businesses can also use predictive analytics to achieve more generic business goals, such as increasing revenue, because it enables them to discover correlations that may suggest strategy improvements.
Understand data from a variety of sources
In large companies, potentially valuable data often exists in multiple silos. In addition, many companies are now using external data from social media, government data, and other public sources to augment their internal data. Advanced data visualization tools can help data analysts explore the data from various sources to determine what might be relevant for a predictive analytics project. Increasingly, many data analysts collect every shred of data available to let the predictive analysis algorithms find what is most relevant.
Prepare the data
Data preparation for predictive analysis is a key challenge. Raw data is often unsuitable for predictive analytics. Data analysts must often perform extensive preprocessing of the data before running analysis algorithms. For example, data analysts might need to enrich the data with calculated aggregate fields, strip out extraneous characters or information that would choke the algorithms, or combine data from multiple sources.
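These preparation steps can be sketched in code. The snippet below uses pandas on two small, hypothetical tables (a customer table with messy names and a transaction table, standing in for two data silos) to illustrate all three operations the text mentions: stripping extraneous characters, enriching with calculated aggregate fields, and combining sources.

```python
import pandas as pd

# Hypothetical customer and transaction tables standing in for two data silos.
customers = pd.DataFrame({
    "customer_id": [1, 2, 3],
    "name": ["Ann#", "Bob!", "Cara"],  # extraneous characters to strip
})
transactions = pd.DataFrame({
    "customer_id": [1, 1, 2, 3, 3, 3],
    "amount": [20.0, 35.0, 10.0, 5.0, 12.0, 8.0],
})

# Strip out characters that could choke downstream algorithms.
customers["name"] = customers["name"].str.replace(r"[^A-Za-z ]", "", regex=True)

# Enrich with calculated aggregate fields (total and average spend).
agg = (transactions.groupby("customer_id")["amount"]
       .agg(total_spend="sum", avg_spend="mean")
       .reset_index())

# Combine the two sources into a single analysis-ready table.
prepared = customers.merge(agg, on="customer_id", how="left")
print(prepared)
```

In practice the cleaning rules, aggregate fields, and join keys depend entirely on the data sources and the business question; this is only the shape of the work.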
Create the predictive model and evaluate it
Predictive analytics is not about absolutes; it is about probabilities. To evaluate the predictive power of a model, data analysts run it against a held-out test data set. If the model predicts outcomes better than a random selection would, they have found an effective predictive model. Data analysts can continue to try other types of algorithms until they find the one that is most predictive; alternatively, they may find none, because there is not enough data or the data is too random to uncover a predictive model for the desired business outcome.
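The create-and-evaluate cycle looks roughly like the following sketch, which uses scikit-learn on a synthetic stand-in for a churn data set. The key idea from the text is the baseline: a random selection scores an AUC of about 0.5, so a model is only effective if it beats that on held-out test data.

```python
from sklearn.datasets import make_classification
from sklearn.model_selection import train_test_split
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import roc_auc_score

# Synthetic stand-in for a churn data set (features plus churned yes/no label).
X, y = make_classification(n_samples=2000, n_features=10, random_state=0)
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.25, random_state=0)

# Fit a candidate model on the training portion only.
model = LogisticRegression(max_iter=1000).fit(X_train, y_train)

# Evaluate on the held-out test set. Random selection scores AUC ~0.5,
# so a meaningfully higher score indicates real predictive power.
auc = roc_auc_score(y_test, model.predict_proba(X_test)[:, 1])
print(f"test AUC: {auc:.3f}")
```

Analysts would repeat this loop with other algorithm families (trees, ensembles, and so on), keeping whichever evaluates best on the test set.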
Deploy the model
Analysts must deploy effective predictive models in production applications to accrue the business benefits. A deployed model consists of logic to run the predictive rules and/or formulas and a method to get the data that the model needs and return the result.
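A minimal sketch of that deployed shape, with all names and the rule weights invented for illustration: the scoring logic (here a hand-coded formula standing in for the rules exported from the modeling phase) plus a function that fetches the data the model needs and returns the result to the calling application.

```python
# Hypothetical feature store standing in for the production data source.
FEATURE_STORE = {
    "cust-001": {"months_inactive": 5, "support_tickets": 4},
    "cust-002": {"months_inactive": 0, "support_tickets": 1},
}

def churn_score(features):
    """Predictive rules/formula exported from the modeling phase
    (weights here are purely illustrative)."""
    score = 0.1 * features["months_inactive"] + 0.05 * features["support_tickets"]
    return min(score, 1.0)

def score_customer(customer_id):
    """Fetch the data the model needs, run the model, return the result."""
    features = FEATURE_STORE[customer_id]
    return {"customer_id": customer_id, "churn_risk": churn_score(features)}

print(score_customer("cust-001"))
print(score_customer("cust-002"))
```

In a real deployment the same three pieces appear as a model artifact, a feature lookup, and an API endpoint or batch job, but the division of responsibilities is the same.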
Monitor the effectiveness of the model
As financial companies caution, “Past results do not guarantee future performance.” It is essential to monitor the effectiveness of the predictive model.
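Monitoring can be as simple as logging the model's evaluation metric over time and flagging drift below the baseline established at deployment. The sketch below assumes weekly AUC scores; the numbers and thresholds are illustrative only.

```python
from statistics import mean

# Hypothetical weekly AUC scores logged for a deployed churn model.
weekly_auc = [0.81, 0.80, 0.79, 0.74, 0.68]

def model_degraded(scores, baseline=0.78, window=3):
    """Flag the model when its recent average AUC drifts below the
    baseline established at deployment (thresholds are illustrative)."""
    recent = scores[-window:]
    return mean(recent) < baseline

print(model_degraded(weekly_auc))
```

When the flag trips, the team loops back to earlier steps: gather fresh data, re-prepare it, and retrain or replace the model.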
Companies must continue the predictive analytics process to stay on top of business goals, understand new data, prepare better data, refine models with new algorithms, evaluate the models, and deploy and monitor the models in a never-ending cycle.