Skip to main content

UA | Universal Analytics

Tutorial: How to create your own custom channel grouping

In this tutorial I will show you - step by step - how to mirror the default channel grouping according to the Google Analytics definitions and set up a custom channel grouping, all in BigQuery.

πŸ’‘
This article is about GA3 - Universal Analytics

If you need the Default Channel Grouping, just use the channel grouping dimension. However, there can be various reasons to build your own Channel Grouping.

When you use intraday data, for instance, as the channel grouping dimension is not available there. Or when you need to β€˜repair’ data quality issues on historical data, as this is not possible with the Default Channel Grouping dimension in the Google Analytics UI.

I will show you how you can mirror the standard Google Analytics definitions of the Default Channel Grouping in BigQuery. If you use this query as a starting point it’s not so difficult anymore to create your own custom or Default Channel Grouping.

Preparations

In this article we will use the Google Analytics Sample dataset for BigQuery, which contains analytics data from the Google Merchandise Store. I recommend you use your own data set instead.

Basic query

We start with the framework of our query. First let's define a time frame. In this case I'll take a static range with a start and end date. If you need a dynamic date range, read this.

with date_range as (
  select
    '20160801' as start_date,
    '20160802' as end_date)

Next, add a source for our data. To be able to query multiple day tables at once we use a wildcard * in the table name.

with date_range as (
  select
    '20160801' as start_date,
    '20160802' as end_date)
    
select
  *
from
  `bigquery-public-data.google_analytics_sample.ga_sessions_*` as sessions

The query above will return an error. To make the wildcard work we need to add a _table_suffix and refer to the tables we want to query, using a cross join (in short:,) after our sessions table.

with date_range as (
  select
    '20160801' as start_date,
    '20160802' as end_date)
    
select
  *
from
  `bigquery-public-data.google_analytics_sample.ga_sessions_*` as sessions,
  date_range
where
  _table_suffix between date_range.start_date and date_range.end_date

Finally, we need to add another condition. We only want to count visits that contain interaction: totals.visits = 1. By default and opposing to the Google Analytics user interface, the BigQuery export tables contains all sessions, including the ones without interactions.

with date_range as (
  select
    '20160801' as start_date,
    '20160802' as end_date)
    
select
  *
from
  `bigquery-public-data.google_analytics_sample.ga_sessions_*` as sessions,
  date_range
where
  _table_suffix between date_range.start_date and date_range.end_date
  and totals.visits = 1

Add default channel grouping

As a reference we add the channelgrouping dimension and a measure for sessions. We group the sessions by the alias default_channel_grouping and order by sessions descending.

with date_range as (
  select
    '20160801' as start_date,
    '20160802' as end_date)
    
select
  channelgrouping as default_channel_grouping,
  count(distinct concat(fullvisitorid, cast(visitstarttime as string))) as sessions
from
  `bigquery-public-data.google_analytics_sample.ga_sessions_*` as sessions,
  date_range
where
  _table_suffix between date_range.start_date and date_range.end_date
  and totals.visits = 1
group by
  default_channel_grouping
order by
  sessions desc

Mirror default channel grouping

Next, we start mirroring the default channel grouping according to the documentation Google provides us. Basically we add a monstrous case when statement with the exact definitions that Google Analytics uses by default to create the channel grouping. Don't forget to add ,unnest(hits) as hits because the logic needs some hit level data too.

To add flexibility and efficiency in your query, you can use regular expression.

with date_range as (
  select
    '20160801' as start_date,
    '20160802' as end_date)
    
select
  channelgrouping as default_channel_grouping,
  case
    when trafficsource.source = '(direct)' and (trafficsource.medium = '(not set)' or trafficsource.medium = '(none)') then 'Direct'
    when trafficsource.medium = 'organic' then 'Organic Search'
    when hits.social.hassocialsourcereferral = 'Yes' or regexp_contains(trafficsource.medium, '^(social|social-network|social-media|sm|social network|social media)$') then 'Social'
    when trafficsource.medium = 'email' then 'Email'
    when trafficsource.medium = 'affiliate' then 'Affiliates'
    when trafficsource.medium = 'referral' then 'Referral'
    when regexp_contains(trafficsource.medium, '^(cpc|ppc|paidsearch)$') and trafficsource.adwordsclickinfo.adnetworktype != 'Content' then 'Paid Search'
    when regexp_contains(trafficsource.medium, '^(cpv|cpa|cpp|content-text)$') then 'Other Advertising'
    when regexp_contains(trafficsource.medium, '^(display|cpm|banner)$') or trafficsource.adwordsclickinfo.adnetworktype = 'Content' then 'Display'
  else
  '(Other)'
end
  as custom_channel_grouping,
  count(distinct concat(fullvisitorid, cast(visitstarttime as string))) as sessions
from
  `bigquery-public-data.google_analytics_sample.ga_sessions_*` as sessions,
  unnest(hits) as hits,
  date_range
where
  _table_suffix between date_range.start_date and date_range.end_date
  and totals.visits = 1
group by
  default_channel_grouping,
  custom_channel_grouping
order by
  sessions desc

The query above will probably generate some slight differences, even when the default channel grouping is not already adjusted in the Google Analytics user interface. You can check this in Admin > Channel Settings > Channel Grouping. Whenever you see System defined just follow the default definition for this channel.

If the default channel grouping in your Google Analytics view is already customised for some channels, you'll have to add all additions on top of the default logic.

Here you are not limited by the building blocks that are already in the query: you can add any combination of conditions you want.

In the Google Analytics user interface there is a possibility to make a distinction between Generic and Branded Paid Search traffic. Although the query type is not available in our BigQuery export, we can use some custom logic if the campaign names are set up in a consistent manner. In this example branded paid search always has the word brand in the campaign title:

with date_range as (
  select
    '20160801' as start_date,
    '20160802' as end_date)
    
select
  channelgrouping as default_channel_grouping,
  case
    when trafficsource.source = '(direct)' and (trafficsource.medium = '(not set)' or trafficsource.medium = '(none)') then 'Direct'
    when trafficsource.medium = 'organic' then 'Organic Search'
    when hits.social.hassocialsourcereferral = 'Yes' or regexp_contains(trafficsource.medium, '^(social|social-network|social-media|sm|social network|social media)$') then 'Social'
    when trafficsource.medium = 'email' then 'Email'
    when trafficsource.medium = 'affiliate' then 'Affiliates'
    when trafficsource.medium = 'referral' then 'Referral'
    when regexp_contains(trafficsource.medium, '^(cpc|ppc|paidsearch)$') and trafficsource.adwordsclickinfo.adnetworktype != 'Content' and lower(trafficsource.campaign) like '%brand%' then 'Branded Paid Search'
    when regexp_contains(trafficsource.medium, '^(cpc|ppc|paidsearch)$') and trafficsource.adwordsclickinfo.adnetworktype != 'Content' and lower(trafficsource.campaign) not like '%brand%' then 'Generic Paid Search'
    when regexp_contains(trafficsource.medium, '^(cpv|cpa|cpp|content-text)$') then 'Other Advertising'
    when regexp_contains(trafficsource.medium, '^(display|cpm|banner)$') or trafficsource.adwordsclickinfo.adnetworktype = 'Content' then 'Display'
  else
  '(Other)'
end
  as custom_channel_grouping,
  count(distinct concat(fullvisitorid, cast(visitstarttime as string))) as sessions
from
  `bigquery-public-data.google_analytics_sample.ga_sessions_*` as sessions,
  unnest(hits) as hits,
  date_range
where
  _table_suffix between date_range.start_date and date_range.end_date
  and totals.visits = 1
group by
  default_channel_grouping,
  custom_channel_grouping
order by
  sessions desc

Now it's your turn!

I hope you've enjoyed this tutorial and feel a bit more confident to utilise your own Google Analytics data in BigQuery. Please let me know if you have any feedback or suggestions to improve the quality of the content.