What Is A Semantic Layer? More Importantly, What’s Underneath?

If you’ve ever worked with data in a business context, you’ve probably always viewed it through a semantic layer. You may not even have realized it, but you would certainly notice its absence.

Business systems tend to store a shocking number of individual data elements. For example, think of all the ways to express a company address anywhere in the world. There are billing addresses, shipping addresses, physical addresses, per-order drop-ship addresses, warehouse addresses, and so on, along with the effective dates and change histories for all of these. Even a moderately complex system can have dozens or hundreds of data tables.

Business systems also store a lot of data that has no value for analytics, such as data used only to run the application: user preferences, default values, and system configuration. They also store data that is neither categorical nor numerical, like descriptions, comments, drawings, pictures, and audio recordings. To be useful for analysis, these need to be converted into values that can be counted.

Most applications display calculated data to the user. Since these values are computed on the fly, they aren’t typically stored as data, so they have to be recomputed in the analytic system. Sometimes you might also create new calculated fields just for the purpose of analysis.

A business semantic layer cuts down on the amount and complexity of the data to make it both more meaningful and easier to use. Creating a semantic layer involves four main tasks:

  • Selecting what’s useful from the raw data
  • Naming fields and columns in a way that makes sense
  • Combining data from different tables that should be logically grouped together
  • Recreating formulas and calculations
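
As a rough illustration of these four tasks, here is a minimal sketch that uses an in-memory SQLite database as a stand-in for any analytic engine. The table and column names (ord_hdr, ord_ln, cust_mst, and so on) are hypothetical, and this is not a description of how any particular product implements its semantic layer.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    -- Hypothetical raw application tables with cryptic names.
    CREATE TABLE ord_hdr  (ord_id INTEGER, cust_id INTEGER, ui_theme TEXT);
    CREATE TABLE ord_ln   (ord_id INTEGER, qty INTEGER, unit_prc REAL);
    CREATE TABLE cust_mst (cust_id INTEGER, cust_nm TEXT);

    INSERT INTO ord_hdr  VALUES (1, 10, 'dark'), (2, 11, 'light');
    INSERT INTO ord_ln   VALUES (1, 2, 5.0), (1, 1, 20.0), (2, 3, 15.0);
    INSERT INTO cust_mst VALUES (10, 'Acme'), (11, 'Globex');

    -- The "semantic layer" here is just a view over the raw tables.
    CREATE VIEW order_lines AS
    SELECT h.ord_id           AS order_id,       -- naming: friendly column names
           c.cust_nm          AS customer_name,  -- combining: join customer data
           l.qty              AS quantity,
           l.unit_prc         AS unit_price,
           l.qty * l.unit_prc AS line_total      -- recreating a calculation
    FROM ord_ln l
    JOIN ord_hdr  h ON h.ord_id  = l.ord_id
    JOIN cust_mst c ON c.cust_id = h.cust_id;
    -- selecting: ui_theme is application-only data, so the view leaves it out
""")

# A business user queries the friendly view, not the raw tables.
rows = conn.execute(
    "SELECT customer_name, SUM(line_total) FROM order_lines "
    "GROUP BY customer_name ORDER BY customer_name"
).fetchall()
print(rows)
```

Because the view is only a definition, the raw rows underneath it stay untouched and can still be queried at full detail.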

Most business users only encounter data that has already been curated this way, never realizing what raw application data really looks like.

They may also not realize that there are two vastly different approaches to arriving at this business-friendly end-state, and that the choice of approach has big implications for their ability to experiment, pivot, drill in, and answer new questions.

The traditional approach involves transforming the data so that it conforms to a simplified data model. The complexities and details are stripped out, and what is left is a distillation of the original data. The data model is designed to answer a particular set of business questions, along with a reasonable set of variations, quickly and efficiently.

The problem with this approach is that, at the outset, nobody really knows what questions they will need to ask. As data is explored and threads are followed, new and unanticipated questions emerge. The model may prove inadequate and need to be redesigned, which can mean weeks of delay. By then, the original curiosity may have faded and the business opportunity may have been missed.

The Incorta approach is different. Underneath Incorta’s semantic layer is all your data that’s fit for analytics, in full fidelity and with all the transactional detail. Data selection, renaming, combining, and calculations are done on the fly — all with blazing speed. What that means is that you can drill deep into your data and answer more questions — an order of magnitude more — without having to go back to square one.

This is the key difference between Incorta and traditional systems.

With traditional analytics, you can’t drill down very far because the semantic layer sits on top of data that has been transformed — the dreaded ‘T’ in the ETL process. The data has been pruned, aggregated, flattened, and put into a dimensional model. Then the semantic layer is built on top of it.

Not only do modeling and transformation take time, but the transformation process also changes the nature of the data. What you get is representative of the data, but is not the data itself. You can’t do root-cause analysis. You generally don’t want to do machine learning on this data. Your options for exploratory analysis are limited. And the transformation step itself is a potential source of errors, which can erode trust.
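
The loss of detail described above can be sketched with a small, generic example. It again uses in-memory SQLite with hypothetical table names (sales_detail, sales_by_region, sales) and is meant only to contrast a pre-aggregated table with a view resolved at query time, not to show any vendor's actual mechanism.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE sales_detail (region TEXT, product TEXT, amount REAL);
    INSERT INTO sales_detail VALUES
        ('East', 'Widget', 100.0),
        ('East', 'Gadget', 900.0),
        ('West', 'Widget', 400.0);

    -- Transform-first style: materialize a summary; product detail is dropped.
    CREATE TABLE sales_by_region AS
    SELECT region, SUM(amount) AS total_amount
    FROM sales_detail
    GROUP BY region;

    -- Query-time style: a view that keeps every detail row reachable.
    CREATE VIEW sales AS
    SELECT region, product, amount FROM sales_detail;
""")

# The summary table can say East is large, but it has no product column
# left to explain why.
summary_columns = [
    row[1] for row in conn.execute("PRAGMA table_info(sales_by_region)")
]
print(summary_columns)

# The view over detailed data can answer the follow-up question directly.
drill_down = conn.execute(
    "SELECT product, amount FROM sales "
    "WHERE region = 'East' ORDER BY amount DESC"
).fetchall()
print(drill_down)
```

In the transform-first case, answering the product question would mean going back and rebuilding the summary table with an extra column, which is the redesign cycle described earlier.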

Incorta’s semantic layer is better because the data underneath it is better. Our Direct Data Mapping technology makes it possible to present a business-friendly view on top of the same data that a data scientist would use. Our business views are resolved at query time, no matter how complex your data sources are or how much data you have.

You can easily join together data from different parts of the business simply by dragging and dropping into a view. You can combine and recombine your views, and share them with other people. If you want to chase an idea and explore in a different direction, you can. If you need to drill in and investigate why a number is the way it is, you can.

The semantic layer serves a very important purpose in data analytics. You would never want to do analysis with all the raw data that’s extracted from your source systems. Some level of curation is required. But given a choice, you also wouldn’t want to do analysis on data that has been overly curated. That is why, as important as the semantic layer is, what’s underneath it is even more important. Having all of your detailed data underneath the semantic layer can make a big difference to your business.

Ready to get hands-on with Incorta’s semantic layer? Start your free trial today and try it for yourself.