Data Analysis with Lee Hawthorn

SSIS Dependency Diagram with DiagrammeR and Sprockit

Topics: SSIS R

SQL Server Integration Services (SSIS) is a mature data integration tool bundled with SQL Server.

SSIS like many other tools can lead to 'dependency hell'. This is where packages or stored procedures are developed that carry out many tasks. With many of these packages it can be very hard for a new developer to understand the dependencies. This makes for expensive maintenance, testing, error recovery.

There are frameworks around that can limit this.

One of the execution frameworks I've had lots of success with is Sprockit.

Sprockit is a powerful but lightweight ETL process controller for managing SSIS packages and T-SQL stored procedure execution. It's free, open-source and written purely in T-SQL. Richard Swinbank developed it.

In this post I will show how you can create a diagram of dependencies with R stats.

Taking up from the end of the tutorial on Richard Swinbanks site.

We first need to create a stored procedure that we can query from R. This procedure creates Digraph text.

I have to give a lot of credit to Richard as he developed the initial SQL before I hacked it to work with DiagrammeR

USE [SprocketTutorial]
GO

SET ANSI_NULLS ON
GO

SET QUOTED_IDENTIFIER ON
GO


CREATE  PROCEDURE [sprockit].[Diagrammer]
AS
 
DECLARE @gv NVARCHAR(MAX) = ''
-- Nodes (processes)
SELECT 
  @gv += '
  n' + CAST(ProcessId AS VARCHAR) + ' [label=''' + ProcessName + ''']'

FROM sprockit.Process
 
SET @gv += '
'
 
-- Edges (process dependencies)
SELECT 
  @gv += '
  n' + CAST(PredecessorId AS VARCHAR) + ' -> n' + CAST(SuccessorId AS VARCHAR) 
FROM sprockit.uvw_ProcessDependency
 
SELECT 'digraph G {  

graph [layout = dot]

node [shape = circle,
      style = filled,
      color = lightgrey,
      fontname = Helvetica,
      fixedsize = true, 
      width = 2.5]
' 
  + @gv 
  + '}' AS res
 
 RETURN 
GO

I've included some format code in the code above.

graph [layout = dot]

node [shape = circle,
      style = filled,
      color = lightgrey,
      fontname = Helvetica,
      fixedsize = true, 
      width = 2.5]

This code is used by DiagrammeR.

You can adjust this code for your own taste.

With this stored procedure done we can look at the R code.

I'm using RODBC to connect to SQL Server. The database is on my machine and I am an admin on the database called SprocketTutorial.

When the data gets into R it comes with \r\n - these need to be removed.

require("RODBC")
require(DiagrammeR)

#open the ODBC connection
dbhandle <- odbcDriverConnect('driver={SQL Server};server=DESKTOP-31S60FC;database=SprocketTutorial;trusted_connection=true')

# construct a query string
query <- "EXEC [sprockit].[Diagrammer]";
# execute the query
df<-sqlQuery(dbhandle, query)

#Remove \R\N
clean <- gsub("\r\n"," ", df$res)

grViz(clean)

Based on the tutorial on Richards site we have the following diagram :

Diagram

You can have a look at the DiagrammeR site as there is a lot more power available for creating these graph diagrams.

This diagram is great for helping developers understand the dependencies that are at the heart of the Sprockit execution tool.

You can run this code as part of your CR pipeline to ensure your documentation is kept up to date.

Previous PostGoogle Trends with R stats
Next PostCleaning Football Data with R Tidyverse