Skip to main content

🔒 GA4 | tutorials

Creating Default and Custom Channel Grouping in BigQuery - Part II: the Grouping Logic

This tutorial is the second out of two articles that walk you through how to build a comprehensive channel grouping using SQL, based on GA4, Google Ads and external data. This second part builds the actual channel grouping logic.

This article is a continuation of the previous Channel Grouping article published last week. Part I walked us through the steps to collect data from various places (these are – surprise-surprise – GA4; plus a source categorization reference table and the Google Ads data transfer).

Part II is a shorter piece: it is simply the channel grouping logic itself. Ultimately, an enormus CASE WHEN statement. As such, it can also be wrapped into a user-defined function, as described in an earlier article. The logic uses the following attributes:

  • manual_source (from GA4, fixed based on gclids)
  • manual_medium (from GA4, fixed based on gclids)
  • manual_campaign_name (from GA4, mostly for custom channel grouping rules, but Google's logic also uses it a little bit, though perhaps in a somewhat "forced" manner)
  • source_category (from an "external" mapping table)
  • ad_network_type (from Google Ads)
  • campaign_advertising_channel_type (from Google Ads)
  • page_referrer (from GA4, not used in default channel grouping)

There are two "layers" of doing this. The first one is to try to recreate a default channel grouping that matches Google's logic:

  • Direct: where all the source columns are NULL (note: it is worth still keeping the channel grouping as NULL too if you plan on doing any non-direct attribution logic in a subsequent step)
  • Cross-network: based on campaign_advertising_channel_type and manual_campaign_name
  • Display: based on manual_source, manual_medium and campaign_advertising_channel_type
  • Paid Video: based on manual_source, source_category, manual_medium, ad_network_type and campaign_advertising_channel_type
  • Paid Search: based on source_category, manual_medium, ad_network_type and campaign_advertising_channel_type
  • Paid Social: based on manual_source, source_category, manual_medium
  • Organic Social: based on manual_source, source_category, manual_medium
  • Email: based on manual_source, manual_medium
  • Affiliates: based on manual_medium
  • Paid Shopping: based on source_category, manual_medium
  • Organic Shopping: based on manual_medium, source_category, manual_campaign
  • Other Advertising: based on manual_medium
  • Organic Search: based on source_category, manual_medium
  • Organic Video: based on source_category, manual_medium
  • Referral: based on manual_medium
  • Audio: based on manual_medium
  • SMS: based on manual_source, manual_medium
  • Mobile Push Notifications: based on manual_source, manual_medium
  • Paid Other: based on source_category, manual_medium

Then you can add more channels of interest or more granular in the areas that matter most:

  • more granular break-down of Google Ads, Doubleclick or other paid channels - e.g. brand vs generic campaigns, or splitting branded and generic search campaigns; typically using manual_campaign_name
  • traffic attibutable to AI (LLMs): page_referrer
  • Google Discover (page_referrer): page_referrer
  • distiguishing different subtypes of organic traffic based on the landing page they arrive to: page_location (or a derivative of that)
  • ... etc!
💡
Note that most channels are identified by a combination of columns / rules (sometimes with an AND and sometimes with an OR logical operator, and sometimes both), and some if these rules (in isolation) overlap with those of other channels, so care should be taken when deciding the sequential order of CASE WHEN statements.

Let's see how all that manifests in SQL code.