For the past several years I’ve been a keen proponent of the OLAP approach to deliver Business Intelligence.
We have different models to work from but I can sum it up in terms of :
By pulling the data from the core system we can be assured that we won’t impact it beyond the initial extraction (which is often scheduled off-peak).
The next phase to transform the data is sometimes essential. The data in the source systems is difficult to query directly. The complexity can be reduced in the transformation phase. At this point the data resides in a Data Warehouse (DW).
The transformation can include turning some of the DW into a number of multi-dimensional cubes.
We can use the cube to carry out our own analysis without having to involve IT. Queries are fast and reliable All good so far? Well, yes and no.
If you get this far it’s likely you’ll soon have multiple cubes. Don’t get me wrong, I use cubes every day in my analytic role. They are very powerful.
The difficulty comes when we need to join 2 cubes. Cubes are often designed along functional lines e.g Sales, Finance, Procurement, Marketing.
The value of the data can be increased on a huge scale if the data within the functional areas is linked. Think about some of these basic questions:
- How does the marketing effort affect Sales? How effective are our campaigns?
- Why is the stock so high?
- Why are the lead-times increasing?
- Which products will our customers likely buy next? Which customers will churn?
- What’s eating free cash-flow, how will it impact on EBITDA next quarter?
To answer these questions we often have to link the data. It’s not easy to link cubes due to their multi-dimensional aspects. Instead we have to query the DW or worst case, the source system.
When I talk to BI developers about this conundrum they say the cube needs to be remodeled. Sometimes this is correct but many times the cube shouldn’t be changed at all. The next worst thing from having no cube is having a cube that is difficult to use.
With the latest in-memory tools from QlikView, JasperSoft, Microsoft and many others it’s possible for non IT workers to write cross functional queries on the DW that are just as fast as those queries run against cubes.
To really benefit from these cutting edge tools we need the Master Keys.
We use keys to join the data. Think of a key as the look-up value in VLOOKUP.
I once read that the customer is king in business. This naturally means lots of analysis is centered around the customer. I find it odd that in many organisations the customer exists in many different parts of the business each with a record that is different to the other.
We see the customer in the Sales cube. In the Marketing cube we see Leads. Some of these Leads will become Customers but as the Customer isn’t present in the Marketing cube it’s almost impossible to link the 2 two objects. We go to the DW and find the customer key isn’t present in the table. This means we have to infer a fuzzy key using the address and possibly name. Far from ideal.
What we really need is the Master Customer Key that represents a single view of customer that can be shared across the different areas.
At the time when source systems are designed it’s often the case that no one is thinking about analytics such as this nor master sources. Data modelers need to be asking questions such as :
- How do leads relate to customers?
- Is a lead created for an existing customer?
- How does Procurement relate to Finance?
- How does the Customer relate to Finance?
When you have the Master Keys your analytical capability will be boosted.
Until then have a read of my post on Fuzzy Lookups.
As always, please feel free to comment or share your experience.