Stop living in the past when it comes to open data delivery
It’s six in the morning, and your phone comes off ‘do not disturb’ mode with a slew of messages and email notifications. Your eyes scan the list, looking for anything marked High Priority.
And then you see it. The overnight batch load has failed. There will be no updated business dashboards this morning.
While this might sound like the opening to an extremely niche horror novel, it is an everyday reality for many BI teams and IT managers when they start their day.
A scheduled process to extract yesterday’s operational data from across multiple business systems and then ingest and prepare that data for business use has failed at some point in the pipeline.
Perhaps it was a broken connection to a data source. Perhaps it was poor-quality data that failed a consistency check. Perhaps it was simply a time-out in one of the many stages of the extract-transform-load (ETL) cycle that shunts data from systems of record to systems of insight.
When a data pipeline cracks and breaks down overnight there are difficult decisions to be made in the near term. Can the process be restarted from the point of failure? Does the entire data pipeline need to be rerun from scratch?
The implications of such a cold restart should not be underestimated: ETL processes are notoriously resource intensive. Is the business willing to work with slow applications for the next six hours while data-hungry queries extract yesterday’s records and clog the operational systems? Is there a guarantee that what broke last time won’t just happen all over again in the middle of the business day?
The alternative isn’t much better. A business that relies on operational analytics to make tactical decisions on mission-critical processes will be flying blind until the following day. Potentially up to 24 hours without telemetry for orders, procurement, billing or other business activities. 24 hours with nothing but gut feel and stale data to steer the ship.
The quest for fresh data to drive analytics insights and business decisions is nothing new, of course. Traditional analytics platforms have continued to shrink the window from data being captured in operational systems to being modeled and presented to analysts in BI tools and reports – from monthly updates that fell in line with classical accounting procedures to overnight loads that made yesterday’s transactions available for descriptive, diagnostic and predictive analysis.
But demands from the business to make informed decisions on the most up-to-date information, to respond quickly to changes in business operating conditions and to meet regulatory requirements have strained the traditional overnight batch job.
Enter the incremental batch load.
Modern analytics platforms have recognized the need to load and process incoming data more frequently than just once per day. By monitoring source systems for changes in records during a defined time window, it becomes possible to ‘miniaturize’ the end-to-end data pipeline for just those records that have been created, updated or deleted by business users within a specific time frame.
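As a rough illustration of the idea, here is a minimal Python sketch that selects only the records changed within a given time window. The row layout, the `changed_at` and `op` fields and the `select_changed` helper are all hypothetical, invented for this example; a real source system would expose changes through its own mechanism, such as a last-modified column or a change log.

```python
from datetime import datetime, timedelta


def select_changed(rows, window_start, window_end):
    """Return rows whose last change falls in [window_start, window_end)."""
    return [r for r in rows if window_start <= r["changed_at"] < window_end]


# Hypothetical source rows; 'op' marks whether the record was
# created, updated or deleted, and 'changed_at' is when that happened.
now = datetime(2024, 1, 1, 12, 0)
rows = [
    {"id": 1, "op": "create", "changed_at": now - timedelta(hours=30)},
    {"id": 2, "op": "update", "changed_at": now - timedelta(minutes=40)},
    {"id": 3, "op": "delete", "changed_at": now - timedelta(minutes=5)},
]

# A one-hour incremental window picks up only the recent changes.
batch = select_changed(rows, now - timedelta(hours=1), now)
changed_ids = [r["id"] for r in batch]
print(changed_ids)
```

Only the records touched in the last hour enter the pipeline; the record last changed 30 hours ago is left alone. Using a half-open window (start included, end excluded) is one way to keep consecutive batches from processing the same change twice.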
Most organizations take baby steps with their incremental loads, moving from a 24-hour window to 12 hours, then 6 hours, gradually reducing the time window until they can process operational data changes every 1-2 hours. Why the plateau at this time window? It’s often because of the ‘T’ in ETL: the transformations, reshaping and remodeling that are needed to process raw operational data into analytics-ready models for traditional BI tools and queries.
With operational data sources such as ERP platforms, there’s hidden complexity in the raw data models that requires a great deal of pre-processing (‘transformation’) for most analytics tools to be able to serve up insights with reasonable performance. Even with an incremental load of data every hour, platforms struggle to run the complex data engineering needed to transform data from raw into processed analytics-ready data.
To remove this speed barrier to freshness, organizations must rethink their approach to data transformation within their analytics pipelines. Minimizing (or even eliminating) transformation steps means that the interval between incremental batch loads can be brought down by another order of magnitude, to every 5 or 10 minutes, allowing customers to run hundreds of so-called ‘micro batches’ of data every single day.
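The arithmetic behind ‘hundreds’ is easy to check. The short Python sketch below counts how many load cycles fit into a day at a fixed interval; the `batches_per_day` helper is purely illustrative, and it assumes loads run back to back around the clock.

```python
MINUTES_PER_DAY = 24 * 60


def batches_per_day(interval_minutes: int) -> int:
    """Number of complete load cycles that fit in one day at a fixed interval."""
    if interval_minutes <= 0:
        raise ValueError("interval_minutes must be positive")
    return MINUTES_PER_DAY // interval_minutes


# From a single nightly load down to micro batches.
for interval in (24 * 60, 120, 60, 10, 5):
    print(f"every {interval:>4} min -> {batches_per_day(interval):>3} loads per day")
```

Under that assumption, a 10-minute interval gives 144 loads per day and a 5-minute interval gives 288, compared with 12 to 24 loads per day at the 1-2 hour plateau.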
However, running analytics on raw operational data from sources such as ERP systems introduces new challenges. BI tools work best when query complexity has already been reduced through an analytical data model (think of a star schema with far fewer relationships between entities, or even flattened views or reporting tables). These traditional BI tools struggle to perform when facing complex queries involving hundreds of joins across raw ERP source tables, placing a huge load on the BI application servers and the underlying analytics database.
What’s needed is a fundamental innovation around working with raw operational data that ensures that data reshaping or remodeling is kept to a minimum so that fresh data can be delivered from ERP source systems through fast, incremental loads – as close to real-time as possible.
With Direct Data Mapping, Incorta offers a way to deliver blazing ERP analytics performance against raw data, without needing expensive, resource-heavy transformation steps in the data pipeline.
With Incorta, companies like Broadcom, Comcast and Shutterfly have moved from a single overnight batch load to running up to 100 incremental batches every day, delivering fresh data to their teams in operations, finance and supply chain management and enabling them to monitor financial and operational performance.
“For the first time ever, we can truly see how the business is performing in real-time. This level of insight has made a measurable impact on the way we function as an organization. The best part is that it’s so easy to use…”
Karim Shahine, Henkel Technical Team Leader
Data freshness drives new models of business transformation.
As digital transformation sweeps away legacy practices in departments like Finance or Operations, new practices emerge in their place. Terms like ‘continuous accounting’ reflect a modern, data-driven approach that emphasizes access to continuously updated financial records and more automated accounting practices.
Similarly, an operations team aims to monitor production metrics in real time, observing and anticipating changes in inventory levels and production output. This enables teams to quickly identify any issues in the production process and take corrective action to minimize downtime and maintain production efficiency.
Incorta’s ability to deliver fresh data in these time frames is essential for organizations that are embracing modern, data-driven practices. By providing updated insights throughout the working day, teams can make faster and more informed decisions, giving them a huge competitive advantage over companies still relying on the fragile nightly data load.
“By implementing Incorta, we went from almost a total absence of reporting to immediate access to virtually real-time information. We’re seeing a big increase in productivity and feedback from our users is incredibly positive.”
Business Systems Manager, Top 10 Global University
If you’re interested in learning more about how to improve your organization’s data freshness and ensure your team is making the most informed decisions possible, then check out our on-demand webinar “Beating the Data Delay” (https://go.incorta.com/recording-beating-the-data-delay) where financial services giant AXA talk through their use of Incorta to deliver the data insights they need, whenever they need them!
Also, take a look at our whitepaper “5 Critical Considerations for Building an Agile Data Pipeline” (https://go.incorta.com/whitepaper-5-critical-considerations-for-building-an-agile-data-pipeline), which explores how Incorta’s open data delivery platform can dramatically improve the freshness and resilience of your critical business analytics.