Python script to get data for Power BI

Lee Hawthorn July 04, 2020 #Python #PowerBI

Update : Nov 23 :

I had to recently process a deeply nested JSON file with over 100 columns. It surprised me how well Power BI (dataflows) handled that. As an enabler you sometimes have to take a leap of faith. It doesn't always go well,but it's good when it does.

Update : 4th Jul 2020 :

I tested this with the latest version of Power BI and the problem described below has been solved. I ran a quick test with Python and it's due to PredictHQ amending their API as far as I can see.


For a recent meetup I wanted to query the website predithq. They have a dataset of events and I wanted to use this data in a meetup.

Unfortunately Power BI isn't compatible with the API this provider uses. They use a 308 Permanent redirect as the initial response, requiring an additional request. After a bit of digging Power Query only handles redirects up to 307. I have a suspicion this is .net problem, hence, may not be within control of the Power BI team to sort out.

I used this blockage to test out using a Python script as a data source for Power BI.

The Python web client has no problem handling 308 redirects, thankfully.

I'm running Python under a Conda environment so I had to set the options in Power BI to look at this environment path.

Options

I then created Python code and tested it with VS Code. Note, to get the script working with Power BI you need to install packages for pandas and matplotlib.

Power Query is not very good at processing complex, deeply nested JSON,hence, I cleaned the data in Python and constructed the Data Frame with individual lists so I could take care of missing elements.

import requests
import matplotlib
import pandas as pd


response = requests.get(
    url="https://api.predicthq.com/v1/events",
    headers={
        "Authorization": "Bearer $accesscode$",
        "Accept": "application/json"

    }, params={"limit": "50"}
)


data = response.json()


res = data['results']
category = list(map(lambda x: x['category'], res))
country = list(map(lambda x: x['country'], res))
description = list(map(lambda x: x['description'], res))
duration = list(map(lambda x: x['duration'], res))
end = list(map(lambda x: x['end'], res))
entities = list(map(lambda x: x['entities'], res))
firstseen = list(map(lambda x: x['first_seen'], res))
iid = list(map(lambda x: x['id'], res))
labels = list(map(lambda x: x['labels'], res))
location = list(map(lambda x: x['location'], res))
entity_id = list(map(lambda x: x[0]['entity_id'] if x else '', entities))
formatted_address = list(map(lambda x: x[0]['formatted_address'] if x else '', entities))
venue_name = list(map(lambda x: x[0]['name'] if x else '', entities))
venue_type = list(map(lambda x: x[0]['type'] if x else '', entities))


df = pd.DataFrame(list(zip(
    category, country, formatted_address, venue_name, venue_type,  description, duration, end, firstseen, labels,)),
    columns=['category', 'country', 'formatted_address', 'venue_name', 'venue_type', 'description', 'duration', 'end', 'firstseen', 'labels'])

print(df)

Once your happy with the data, open Power BI and Get Data from Python Script.

GetData

Paste in the Python Script.

Power BI will then execute the script returning data like any other data, giving you the option to Load or Transform.

I did some more data cleaning in Power Query on some of the list data. It's great to see the immediate feedback from cleaning with Power BI.

PowerQuery

In summary, you have a quite a bit of Power to extend the Get Data source without having to learn about the Custom Connector route. I hope Microsoft handle 308 responses soon as I only had to write this code due to the non-conformance of the RFC, which by the way was issued in 2015!

If any MVP's pick this bug up I would appreciate you feeding back to Microsoft as I expect more and more providers to use 308 redirects for future proofing.