Data Analysis with Lee Hawthorn

Why is R so useful?

Topics: R

As an Excel power user, I know Excel can do pretty much anything. I’ve even seen it used to play the Game of Life. If that is the case, why do we need R?

In this post I’ll tell you why and then show you.

Reproducible

We can write an R script once to do any of the following:

  • Acquire data
  • Clean
  • Transform
  • Analyse
  • Model
  • Report
  • Publish

If the R script is written the right way, with every step in code and nothing done by hand, it’s reproducible by default. This is very useful if the data changes or if you need to pass the script to a colleague: you don’t want them getting a different result.
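To make this concrete, here is a minimal sketch of a script that cleans and summarises data entirely in code. The data frame, its column names and the seed value are made up for illustration; only base R is used.

```r
# Hypothetical example data; in a real script this step would read
# from a file or a database instead.
set.seed(42)  # fix the seed so the random values repeat on every run
raw <- data.frame(
  group = rep(c("A", "B"), each = 5),
  value = c(rnorm(5, mean = 10), rnorm(5, mean = 20))
)
raw$value[3] <- NA  # simulate a missing value

# Clean: drop rows with missing values.
clean <- raw[!is.na(raw$value), ]

# Analyse: mean value per group.
summary_tbl <- aggregate(value ~ group, data = clean, FUN = mean)
print(summary_tbl)
```

Because nothing is edited by hand, a colleague running the same script on the same input should get the same table.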

Flexible

Excel is flexible, as mentioned above, but it is also limited by the resources Microsoft decides to invest in the product. Even if Microsoft had the resources, I think they would only include functionality that is useful to a broad spectrum of people.

How about R? First off, it’s open source, which means that if we need a really niche feature we can write it ourselves, and many people do. Furthermore, the architecture of R is based around a package system, which makes it highly adaptable. To be fair, Excel has extensibility (VBA, COM, .NET), but to write an extension one has to learn a different language. Packages in R are written in R, with C where it’s needed. There’s a little extra to learn, but learn to use R and you’re very close to being a package author.

At the time of writing there are over 6,000 packages available for use. Given R’s long lineage, I am willing to bet that the problem you have has already been solved by someone else.

Scalability and Availability

You can run R in lots of different places:

  • Laptop
  • Tablet
  • Cloud
  • In databases
  • On really big machines

It’s important to know that R works in memory: by default, the data it works on is loaded into RAM. This makes it very fast, although the amount of available memory can be a constraint.
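A quick way to get a feel for the memory constraint is to ask R how much space an object takes. This is only a sketch: the vector below is an arbitrary example, and the size of real data depends on its types and structure.

```r
# An arbitrary example: one million double-precision numbers.
x <- numeric(1e6)

# object.size() estimates the memory used by an object.
size_bytes <- as.numeric(object.size(x))
print(object.size(x), units = "MB")
```

Each double takes 8 bytes, so a vector like this needs about 8 million bytes. Scaling that up gives a rough idea of whether a data set will fit in RAM.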

Publish

With R we can send the output to lots of different places. We’re not constrained by a worksheet.

  • Slides
  • Document
  • Web page
  • Application
  • Web Service

Okay, enough theory. Here’s an example of customer analysis. The problem we’re seeing is that customers are disappearing. We need to gain more insight before we can look to mitigate the problem.

As you can see in the Excel screenshot below, I’ve pulled in data from a database and grouped customers based on demographics. We see 12 months of customer count.

The second table is a copy of the first but with the customer counts aligned to the left (for charting/formula clarity). I did this by hand, although to be fair you could do it with a formula. The next table calculates Retention, which takes the current month’s count and divides it by the month 1 count. With this third table it’s simple to create a chart.
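For comparison, the same two steps (left-aligning the counts and dividing by month 1) can be sketched in a few lines of base R. The counts, the group names and the helper `left_align` are invented for illustration and are not the data from the worksheet.

```r
# Hypothetical counts: rows are customer groups, columns are calendar months.
# NA means the group did not exist yet in that month.
counts <- rbind(
  g1 = c(100, 90, 85, 80),
  g2 = c(NA, 200, 150, 120),
  g3 = c(NA, NA, 50, 45)
)

# Shift each row's non-missing values to the left, padding the end with NA.
left_align <- function(x) {
  v <- x[!is.na(x)]
  c(v, rep(NA, length(x) - length(v)))
}
aligned <- t(apply(counts, 1, left_align))
colnames(aligned) <- paste0("month_", seq_len(ncol(aligned)))

# Retention: each month's count divided by that group's month 1 count.
retention <- aligned / aligned[, 1]
print(round(retention, 2))
```

Dividing a matrix by its first column works row by row here because R recycles the column down each column of the matrix, so every value is divided by the month 1 count of its own group.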

[Screenshot: Excel]

We’ve solved our problem. What is going on with group 4?

It’s not all good though. What happens if we see more groups in the database? Or more months? This worksheet doesn’t grow. I know we can structure the workbook to make it expandable. We can use dynamic ranges to populate the chart. This is okay for me. I’m a power user after all. What about the other users? Wouldn’t it be good if we could forget about structure and layout and just focus on the problem at hand?

This is what R gives us. I wrote the code and published it to a website using R Markdown. R Markdown lets us write a report or slides and embed R directly in the document. You can find a copy of the R Markdown here and the published report here.
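As a rough illustration of the idea (this is not the code from the linked report), here is a sketch in base R that calculates and charts retention without caring how many groups or months the data contains. The data frame and its column names are assumptions made up for the example.

```r
# Hypothetical long-format data: one row per group and month.
customers <- data.frame(
  group = rep(c("1", "2", "3"), each = 4),
  month = rep(1:4, times = 3),
  count = c(100, 90, 85, 80, 200, 150, 120, 110, 50, 45, 30, 20)
)

# Sort so that the first row of each group is its first month.
customers <- customers[order(customers$group, customers$month), ]

# Retention relative to each group's first month; ave() applies the
# function within each group, however many groups there are.
customers$retention <- customers$count /
  ave(customers$count, customers$group, FUN = function(x) x[1])

# Reshape to a months-by-groups matrix and draw one line per group.
wide <- tapply(customers$retention,
               list(customers$month, customers$group), mean)
matplot(as.numeric(rownames(wide)), wide, type = "l", lty = 1,
        xlab = "Month", ylab = "Retention")
legend("bottomleft", legend = colnames(wide),
       col = seq_len(ncol(wide)), lty = 1)
```

If a new group or an extra month appears in the data, the same lines run unchanged. There is no range to extend and no layout to adjust.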
