Uncategorized

Data Studio Hack – Joining Author Data in Google Sheets for Google Data Studio

Let’s join Google Analytics with author data in Google Sheets, to display it nicely in Google Data Studio. Why? I write, so I’m naturally interested in evaluating content heavy websites. Content analysis usually consists of two parts:

  1. Looking at the data from the platform on which you distribute your content, e.g. your blog (who reads what, and how often?)
  2. Looking at the meta-data of your content (which topic, who wrote this, when, how long is it?)

Those two pieces of information are usually stored in different places. (1) is in your web tracking software, e.g. Google Analytics, (2) is somewhere else e.g. in a content calendar or in your WordPress database. In a last post, we already analyzed these two pieces of data in separate tables in one Google Data Studio dashboard.

Today the goal is to put these two pieces of data into one table!

The basic steps we have to do for that are the following ones:

  1. Getting Google Analytics data into Google Sheets (Google Data Studio doesn’t yet support “joins”)
  2. Getting additional data, namely the author, to join with the Google Analytics data
  3. Calculating some new metrics and displaying everything in Google Data Studio

Our goal is to join the data together, using the “post URL” as “join key” to get a dashboard that tells us which author produces the most posts, most sessions, most sessions per post, and their bounce rate.

Here’s the example setup for this blog post.

  • data source 1: Google Analytics data, a report with
    • metric 1: Sessions
    • metric 2: bounceRate
    • dimension 1: pagePath.
  • data source 2: A Google sheet with a publishing calendar data, which would be
    • dimension 1: post URL
    • dimension 2: the author, let’s call them “Arthur” and “Bianca”
    • dimension 3: the category.

One note: I don’t filter out parameters in my GA account (right now), so in addition to joining the data I will take care of URLs like http://www.datacisions.com?utm=campaign1. If you don’t have this issue, skip Step 3.2 and Step 4.

Step 1: Getting Google Analytics data into Google Sheets

There’s already a bunch of information out there that covers this. The simplest way to get your Google Analytics data into Google Sheets is to use the plugin.

Once you have the plugin running go to “Add-ons > Google Analytics > Create new report” and configure it as follows:

  • Metrics: ga:sessions, ga:bounceRate
  • Dimensions: ga:pagePath

You can then run the report and will get a second tab called “Traffic Data” which will look something like this:

Next, we need to get the additional data from the editorial calendar into Google Sheets.

Step 2: Additional Data inside Google Sheets

Let’s create a third tab with our editorial calendar. Heres my example:

Really the only thing you have to watch out for is the dimension “Page”, as this is our “join key”. So make sure it’s in the same format as the key in the other source. Otherwise, you’ll have to normalize one or the other to match.

Step 3: Join Data together

To join the data together create the fourth tab.

Use IMPORTRANGE(….,A:C) to import pagePath, sessions, bounceRate from the “Traffic Data” tab.

Use REGEXEXTRACT  to prettify the URLs (in my case my GA data still has a bunch of parameters, so this will kill all parameters, that’s everything after the question mark in the URL). The RegEx I use to strip the URL from parameter is ([^\?]*) which matches everything until a question mark appears.

Use the functions index and match as  INDEX(,…,MATCH())  to match your additional data to your first data.

Step 4: Clean Up Data, Duplicates

We have most of our data ready. But right now, we have a bunch of duplicate rows because of query parameters. I like to have a final tab with “prettified” data, so I don’t mess up things when I import it into Google Data Studio. Here I simply use a basic SQL query to sum & average over all the articles we have.

Step 5: Creating the Data Source and Calculating new Metrics

Now let’s head over to Google Data Studio and work on the data source. First, create a new data source for this Google Sheet:

Second create two new calculated metrics, Sessions Per Post, and Post Count:

Finally, spin everything up in a report, and make it look like this:

Running Everything on Schedule

And that’s it! Don’t forget, your Google Analytics data source in Google Data Studio updates itself all the time, your Google Sheets sheet on the other hand only updates when you run the plugin. Luckily you can schedule it to run on a regular basis using the plugin itself.

Leave a Reply

Your email address will not be published. Required fields are marked *