How to (back)stitch your custom user id to GA4's client id in BigQuery to enable cross-device analysis

In this tutorial you will learn how to stitch a user_id retroactively to a clientId (user_pseudo_id) in order to identify users across devices and track their behaviour. User stitching can be applied at the user, session and event level.

Do your website or app users have the ability to create an account and log in? Then you can collect a userId (or a custom identity id stored in a user property) in Google Analytics 4. If implemented correctly, this value will not only be used by GA4 to stitch users on different devices, but the user id will also be available in the BigQuery data export for you to utilise. One of our premium subscribers noticed this and asked:

'Is it possible to stitch userId retroactively to clientId? Scenario: Visitor made 10 events. Then logged in. We want to be able to recognize the user for the previous 10 events when the user was anonymous and not yet logged in.'

Is it possible? Yes, and this tutorial will show you exactly how to accomplish this.

💡
Should you do it? That depends on the purpose of user stitching (is it used for optimisation or advertising?), personal and organisational policies around privacy and consent (is the user aware of this?), legal implications (is this approved by your legal team?) and so on.

By default - when no custom user id is collected - the 'best' identifier available in the GA4 data export is the user_pseudo_id, also known as clientId. This is an id GA4 uses to identify a browser on a specific device, based on a cookie that is stored client side. Using this identifier - in combination with event_timestamp - we can list all events (and sessions) in the order they happened for every anonymous user.
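To make the idea concrete, here is a minimal sketch in Python rather than BigQuery SQL, so the logic can be run without a dataset. The sample rows and the function name `timeline_per_client` are hypothetical; only the column names user_pseudo_id, event_timestamp and event_name come from the GA4 export, where event_timestamp is expressed in microseconds.

```python
from collections import defaultdict

# Hypothetical sample rows, shaped like three columns of the GA4 export.
anonymous_events = [
    {"user_pseudo_id": "111.222", "event_timestamp": 1700000003000000, "event_name": "scroll"},
    {"user_pseudo_id": "333.444", "event_timestamp": 1700000002000000, "event_name": "page_view"},
    {"user_pseudo_id": "111.222", "event_timestamp": 1700000001000000, "event_name": "page_view"},
]


def timeline_per_client(rows):
    """Group events by user_pseudo_id, each group ordered by event_timestamp."""
    timelines = defaultdict(list)
    # Sorting all rows first means every per-client list ends up in time order.
    for row in sorted(rows, key=lambda r: r["event_timestamp"]):
        timelines[row["user_pseudo_id"]].append(row["event_name"])
    return dict(timelines)


print(timeline_per_client(anonymous_events))
```

Each browser gets its own timeline, which is exactly why the same person on a laptop and a phone looks like two different users at this stage.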

We can build the same ordered list of events across browsers and devices for users who logged in at least once in each browser they use, as long as a user id is sent to GA4 at login. This is called user (back)stitching. Possible use cases include:

  • creating a frequently updated user_id - user_pseudo_id mapping table to query for later use
  • connecting online user behaviour with data from your CRM
  • creating an advanced path explorer (using user_id as identifier)
  • providing input for (attribution) models
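The first use case, the mapping table, is the core of back-stitching, and it can be sketched in a few lines of Python. This is an illustration of the logic under stated assumptions, not the BigQuery query itself: the sample rows, the helper names `build_mapping` and `backstitch`, and the output field `stitched_user_id` are all hypothetical. The sketch also assumes that when several user ids appear on one user_pseudo_id, the most recent one wins; other rules are possible.

```python
# Hypothetical sample rows: one person on two devices, plus one anonymous device.
raw_events = [
    {"user_pseudo_id": "laptop.1", "event_timestamp": 1, "event_name": "page_view", "user_id": None},
    {"user_pseudo_id": "laptop.1", "event_timestamp": 2, "event_name": "login", "user_id": "u42"},
    {"user_pseudo_id": "phone.9", "event_timestamp": 3, "event_name": "page_view", "user_id": None},
    {"user_pseudo_id": "phone.9", "event_timestamp": 4, "event_name": "login", "user_id": "u42"},
    {"user_pseudo_id": "tablet.5", "event_timestamp": 5, "event_name": "page_view", "user_id": None},
]


def build_mapping(rows):
    """Map each user_pseudo_id to the most recent non-empty user_id seen on it."""
    mapping = {}
    for row in sorted(rows, key=lambda r: r["event_timestamp"]):
        if row.get("user_id"):
            # Assumption: a later user_id overwrites an earlier one.
            mapping[row["user_pseudo_id"]] = row["user_id"]
    return mapping


def backstitch(rows, mapping):
    """Return copies of the rows with the mapped user id attached to every event."""
    return [
        {**row, "stitched_user_id": mapping.get(row["user_pseudo_id"])}
        for row in rows
    ]


id_mapping = build_mapping(raw_events)
stitched_events = backstitch(raw_events, id_mapping)
```

The page views that happened before each login now carry the same id as the login itself, while events from a device that never saw a login keep an empty `stitched_user_id`.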

Although the outcomes of this tutorial will give you a better view of user behaviour than you would have without a user_id, it is good to remember the limitations of this approach. Users who don't want to be tracked and block the GA4 tracking script will still not show up in your reports. Furthermore, this tutorial assumes that a user_id is always unique to a customer account.

For this tutorial I will use the - anonymous - user_id that GA4 collects for logged-in subscribers on this website. These ids cannot and will not be tied to actual persons and are only used for analytical purposes.