Call SSAS OLAP cubes from R stats

Lee Hawthorn May 17, 2019 #R #SSAS

When data is stored in OLAP cubes on SQL Server this can be a convenient data source for analysts as the data is pre-cleaned and the model is usually named with known business entities like Product, Division, Customer etc...

Low latency

OLAP cubes should be optimised to return query results within a few seconds. Compare this to a SQL query that can take hours. This helps the analyst with data exploration. I wouldn't recommend using SSAS cubes for training data but you can at least follow the data lineage from the cube to locate the relevant source data.

Not on CRAN

You can query SSAS cubes with a version of R that comes from Microsoft.

The package needed is olapR - note, I can't find this in the CRAN repository, hence, you have to use the R version provided by Microsoft.

It would be easier if Microsoft published the package to CRAN but you can't have everything. When I first learnt R in 2015 there was no OLAP package in CRAN and I had to use manual hacks to get OLAP data into R.

If you're using the GNU R with R Studio you can change your version of R temporarily :

R Server

Once you've restarted R you can install the package

Install Package

Note, the Microsoft docs need updating for SQL Server 2017 Tabular Cubes with a compatibility of 1400 (I've submitted feedback to them via GitHub docs)

Assuming you have a SSAS tabular cube deployed and processed here is what you need (when SSAS is on same machine).

library(olapR)
cnnstr <- "Data Source=localhost;Provider=MSOLAP;Initial Catalog=DemoCube"
olapCnn <- OlapConnection(cnnstr)

Note, with 1400 cubes you need to include the cube name in the connection string under [Initial Catalog]

In the code below you need to name the cube "Model" as this is the alias that is used. This is different to 1200 compatibility (SQL Server 2016) where you need to replace Model with the name of your cube.

qry <- Query()
cube(qry) <- "Model"

mdx <- " SELECT NON EMPTY { [Measures].[Sum of Profit] } ON COLUMNS, NON EMPTY { ([Dimension City].[City].[City].ALLMEMBERS ) } ON ROWS FROM [Model] "

result <- execute2D(olapCnn, mdx)

It's best if you craft the MDX as this will give you the ultimate power that MDX provides. OlapR comes with an API that can be used in place of the MDX but this only works with legacy multi-dimensional cubes. You may have a tool that produces MDX, hence, don't fret.

If using a tabular model you can write DAX :-) I can't see this documented on the Microsoft site but it seems to work.

dax <- "EVALUATE SUMMARIZE('Dimension Date', 'Dimension Date'[Calendar Year Label],'Dimension Date'[Short Month])"

This code loads the results of the MDX/DAX query into a Dataframe.

Dataframe

You'll notice there is a bit of cleaning up of column names required.

Summary

Many organisations have cubes already deployed, hence, they provide a great source for data exploration to give the analyst/data scientist a head start. Leverage the data assets already in place.