Skip to main content

πŸ”’ GA4 | tutorials

Conversion attribution (event level): last non-direct click

This article describes the way to perform different flavors of last-click attribution on GA4 data in BigQuery, on the event level, populating traffic acqusition fields for events (e.g. conversions).

Despite the fact that there are now five sets of attribution columns in the GA4 export (or more if we count the Doubleclick ones), we digital analysts might come across situations where we have to create our own rules (or use the client's rules) to determine the originating channel for conversion events, and then create an aggregated report to show how many conversions were driven by the different acquisition channels.

Setting the stage: attribution on session and event levels

Attibution is a rather "loaded" topic for many reasons, but to a large part because it can be done in many-many different ways. It is also a subjective thing; there is no right or wrong approach (though some can safely be called worse than others πŸ˜†)β€”the question is, whether decisions can be made based on the approach you pick, and whether those decisions bring value to your business.

We'll casually omit the fact that Marketing Mix Modelling, incrementality and geo-holdout tests, and lead scoring based approaches might be better (or equally good) ways to analyze channel performance, and we'll just focus on multi-touch attribution, which is still a widely used mechanism and it has its rightful place in our arsenal.

Even within the Multi-Touch Attribution (MTA) space, there are different "levels" on which attribution can be done. It is often done on a session level, where sessions are collected before a conversion (with their originating channels), and then the conversion is distributed among all those sessions. Last click gives 1 conversion to the last session and 0 to others, linear distribution divides 1 by the number of sessions and each session gets that score, etc, you know the drill. (If you don't, this article gives a decent enough overview, or this one if you want to dive straight into Markov chains, the most sophisticated way of doing this.)

πŸ’‘
If you want to work on session level data, check out our article on sessionization, using the default ga_session_id, or by applying custom session definition rules.

Sometimes digital marketers look at it in a simpler way. We have conversion events in GA4 (and in advertising systems), and sometimes we just want to map each of those conversion events to a channel. In other words, we want to create a (textual) column for each (conversion) event record (we could call it something like attributed_channel), which states which channel is "responsible" for this particular conversion. This is a simpler approach, because we are not "distributing" conversions across sessions; we just simply add a "label" to each record. And since in one event record we can only assign one value, we're pretty much tied to some flavours of a first click or last click approach.

So, this is what we're going to look at now: how we can label conversions using a customized flavour of a last-click mechanism. Another article in the near future will deal with different first-click attribution types.

Why would we want to manually attribute conversions?

Google Analytics already attributes conversions, that too in multiple ways, as described in the corresponding dimensions and metrics article. (I plan to update it in the near future, but it already explains the most important options.) We'll do a brief re-cap in the next section.

So why would we want to do this ourselves? A few different reasons.

  1. The event scoped source, medium etc. attribution columns are often not recorded against conversion events. This is because since the summer of 2024, they are only populated for the first batch of events on each page (so usually only the page_view, session_start and first_visit events.) This is briefly described in Jules Stuifbergen's blog post and this MeasureSlack discussion archive.
  2. Google, most likely for the above reason, introduced the session_traffic_source_last_click collection, a STRUCT column containing multiple STRUCTs with source, medium etc. attribution fields, populated by a last non-direct click approach on different (sub)sets of channels.
    However, there are very specific rules around how they are calculated; these fields give us the ability to produce similar numbers to session scoped dimensions on the GA4 frontend. And we might not like those rules, or our business might require changing a few aspects:
        1. the lookback window for the last non-direct approach
        2. we might want to establish a priority order between channels if multiple channels are available within the lookback window
        3. we might want to exclude or only include certain channels from the mix
        4. the session_traffic_source_last_click is built based on the collected_traffic_source values at the start of each session, but it does not consider source, medium etc values arriving in the middle of the session. This in particular is something we might want to control (and change).
  3. We might not be happy with the last non-direct approach in the first place. (Note: first click flavours will be discussed in a subsequent article, though the principles for some of those will be similar to this one.)
  4. We're nerds and we want to train our neurons using SQL in BigQuery!

Using the default attribution fields

If we do like Google's default rules, and we want to produce numbers that align with the GA4 frontend, then we could use this field with no attribution:

SELECT    collected_traffic_source.manual_campaign_id,
          collected_traffic_source.manual_campaign_name,
          collected_traffic_source.manual_source,
          collected_traffic_source.manual_medium

FROM      -- change this to your google analytics 4 export location in bigquery
          `projectname.analytics_123456789.events_*`
          
WHERE     -- define static and/or dynamic start and end date
          _TABLE_SUFFIX BETWEEN '20250801' 
            AND FORMAT_DATE('%Y%m%d',DATE_SUB(CURRENT_DATE(), INTERVAL 1 DAY))

Or this one, with Google's last non-direct approach:

SELECT    session_traffic_source_last_click.manual_campaign.campaign_id,
          session_traffic_source_last_click.manual_campaign.campaign_name,
          session_traffic_source_last_click.manual_campaign.source,
          session_traffic_source_last_click.manual_campaign.medium

FROM      -- change this to your google analytics 4 export location in bigquery
          `projectname.analytics_123456789.events_*`

WHERE     -- define static and/or dynamic start and end date
          _TABLE_SUFFIX BETWEEN '20250801' 
            AND FORMAT_DATE('%Y%m%d',DATE_SUB(CURRENT_DATE(), INTERVAL 1 DAY))

The latter has multiple flavors by:

  • based on the UTM parameters (manual_campaign)
  • based on GCLID / Google Ads link (google_ads_campaign)
  • taking both the above into consideration (cross_channel_campaign)
  • based on the Google Marketing Platform (dv360_campaign, sa360_campaign, cm360_campaign)

So if any of these fits your needs, then we're done here. If you want to customize the approach, then let's dig in below.

Building a customized approach using window functions

In the code below, we'll use the public GA4 ecommerce dataset, and we'll perform the following:

  1. preparing the data: collecting all the fields necessary (including a hit_number, a rudimentary channel grouping field and calculating the days elapsed since the previous event (session), to be used when we define a lookback window
  2. a Last Click approach using window functions that takes the last available channel grouping within the session and propagating that to each event. This is useful to plug the GA4 "bug" described above, whereby event scoped attibution fields are empty for all non-pageview events (including conversion events), unless they are sent in the first batch along with the pageview.
  3. a Last Non-Direct Click approach using window functions that takes the last available channel grouping within the last 60 days of the user's journey and propagating that to each event. This will be different from session_traffic_source_last_click in the lookback window (especially if you customize it), and also because this will take into account when new source/mediums arrive in the middle of the session (which is a more accurate view on the journey).

So, first, let's prepare the data.

Building a base event table (demo GA4 dataset)