Tutorial: How to customise the default engaged session definition (GA4)
6 min read

Tutorial: How to customise the default engaged session definition (GA4)

In this tutorial I will show you - step by step - how to replicate the engaged session definition in GA4, so you can customise it at will.
Tutorial: How to customise the default engaged session definition (GA4)

The calculation for engaged sessions above is based on the event user_engagement. To be more specific: when the string.value for parameter key session_engaged = '1'. The definition of engaged sessions, according to the GA4 user interface, is:

The number of sessions that lasted longer than 10 seconds, or had a conversion event, or had 2 or more screen or page views.

You can set a conversion event in the UI, but the definitions of session length or the amount of page views cannot be changed if you use the event user_engagement. But what if we want to customise the definition of an engaged session? In this tutorial I'll show you how to create your own engaged session definition, that can be applied on your historical data too.

Date range

We start with the framework of our query. First let's define a time frame. In this case I'll take a range with a static start and dynamic end date. If you're wondering what your options are, then read this tutorial first.

-- subquery to define static and/or dynamic start and end date
with date_range as (
select
    '20210101' as start_date,
    format_date('%Y%m%d',date_sub(current_date(), interval 1 day)) as end_date)

Replicate GA4 definition

When you look at the definition of an engaged session in GA4 above, you'll see that we're counting sessions that meet at least one of the next three conditions:

  • session length greater than 10 seconds, or
  • 2 or more screen / page views, or
  • conversion event

We will build our query in the same structure, containing a base sub query and three more sub queries, each checking one of the conditions above. Then we collect all sessions in another sub query and deduplicate them. Finally we count the sessions by date in our main query.

Translated to SQL the framework of our query is going to look like this:

-- subquery to define static and/or dynamic start and end date for the whole query
with date_range as (
select
    '20210101' as start_date,
    format_date('%Y%m%d',date_sub(current_date(), interval 1 day)) as end_date),
    
-- base query to pull in all raw data
base as (  
select
    'loremimpsum'
from
    -- change this to your google analytics 4 export location in bigquery
    `ga4bigquery.analytics_250794857.events_*`,
    date_range
where
    _table_suffix between date_range.start_date and date_range.end_date),
    
-- subquery to get all sessions with a length > 10 seconds
session_length as (
select
    'loremimpsum'
from
    base),
    
-- subquery to get all sessions with 2 or more (unique) page views
multiple_pageviews as (
 select
    'loremimpsum'
from
    base),
    
-- subquery to get all sessions with a conversion event (in this example 'first_visit')
conversion_event as (
select
    'loremimpsum'
from
    base),

-- subquery to combine and deduplicate all subqueries generated earlier
dedup as (
select
    *
from
    session_length
union distinct 
select
    *
from
    multiple_pageviews 
union distinct
select
    *
from
    conversion_event)

-- main query to count unique engaged sessions by date in descending order
select
    *
from
    dedup

Now we have our framework, it is merely a matter of filling in the details. After that you can easily tweak the conditions to create your own definition of an engaged session in GA4.

Base query

I'll just provide the base query here. It might not make a lot of sense yet, but we need this to get going. This is the only query that pulls data in from GA4's events_ schema, so make sure to change the reference to your export location.

-- base query to pull in all raw data
base as (
select
    user_pseudo_id,
    (select value.int_value from unnest(event_params) where key = 'ga_session_id') as session_id,
    event_name,
    event_timestamp,
    (select value.string_value from unnest(event_params) where event_name = 'page_view' and key = 'page_location') as pageview_location
from
    -- change this to your google analytics 4 export location in bigquery
    `ga4bigquery.analytics_250794857.events_*`,
    date_range
where
    _table_suffix between date_range.start_date and date_range.end_date)

Session length

Let's start with our first meaningful sub query, that will check for each session in our date range whether the session length is greater than 10 seconds. For this condition we need:

  • the user id
  • the session id
  • the first event timestamp of the session
  • a calculation of the time in seconds between the first and last event timestamp of the session

Then we group everything by user id and session id. The 10 second condition is configured in the having clause.

-- subquery to get all sessions with a length > 10 seconds
session_length as (
select
    user_pseudo_id,
    session_id,
    timestamp_micros(min(event_timestamp)) as session_start_time,
    (max(event_timestamp)-min(event_timestamp))/1000000 as session_length_seconds
from
    base
group by
    user_pseudo_id,
    session_id
having
    -- change this number to adjust the desired session length
    session_length_seconds > 10
order by
    user_pseudo_id,
    session_id)

Multiple pageviews

Next up: the sub query that scans each session in our date range on the multiple pageviews condition. To make this work we need:

  • the user id
  • the session id
  • the first event timestamp of the session
  • the amount of pageviews in the session

To prevent lots of duplicate rows I make use of a window function. Then we group everything by user id and session id. The pageview amount condition is configured in the where clause.

-- subquery to get all sessions with 2 or more (unique) page views
multiple_pageviews as (
select
    user_pseudo_id,
    session_id,
    session_start_time,
    pageviews
from (
    select
        user_pseudo_id,
        session_id,
        timestamp_micros(min(event_timestamp)) as session_start_time,
        count(pageview_location) over (partition by user_pseudo_id, session_id) as pageviews,
        row_number() over (partition by user_pseudo_id, session_id) as row_number
    from
        base
    group by
        user_pseudo_id,
        session_id,
        pageview_location)
where
    row_number = 1
    -- change this number to adjust the desired amount of page views
    and pageviews >= 2)

Conversion event

The next part of our journey is relatively simple. We just need to select sessions that contain a conversion event. Again, the configuration is done in the where clause.

Furthermore, we grab from the base query:

  • the user id
  • the session id
  • the first event timestamp of the session
-- subquery to get all sessions with a conversion event (in this example 'first_visit')
conversion_event as (
select
    user_pseudo_id,
    session_id,
    timestamp_micros(min(event_timestamp)) as session_start_time
from
    base
where
    -- change this event_name to adjust the desired conversion event
    event_name = 'first_visit'
group by
    user_pseudo_id,
    session_id)

Deduplicate results

Because the selected sessions defined in the previous sub queries can overlap, we need to deduplicate the results, so that we can count unique sessions, later on in our main query. To accomplish this we use union distinct.

-- subquery to combine and deduplicate all subqueries generated earlier
dedup as (
select
    user_pseudo_id,
    session_id,
    session_start_time
from
    session_length
union distinct 
select
    user_pseudo_id,
    session_id,
    session_start_time
from
    multiple_pageviews 
union distinct
select
    user_pseudo_id,
    session_id,
    session_start_time
from
    conversion_event)

Count sessions

The dedup query generates a list of unique sessions that we only need to count, group by date and present in descending order.

-- main query to count unique engaged sessions by date in descending order
select
    date(session_start_time) as date,
    count(distinct concat(user_pseudo_id,session_id)) as engaged_sessions
from
    dedup
group by
    date
order by
    date desc

Final query

Now we have our building blocks, let's put everything together following the framework we designed earlier in this tutorial.

-- subquery to define static and/or dynamic start and end date
with date_range as (
select
    '20210101' as start_date,
    format_date('%Y%m%d',date_sub(current_date(), interval 1 day)) as end_date),

-- base query to pull in all raw data
base as (
select
    user_pseudo_id,
    (select value.int_value from unnest(event_params) where key = 'ga_session_id') as session_id,
    event_name,
    event_timestamp,
    (select value.string_value from unnest(event_params) where event_name = 'page_view' and key = 'page_location') as pageview_location
from
    -- change this to your google analytics 4 export location in bigquery
    `ga4bigquery.analytics_250794857.events_*`,
    date_range
where
    _table_suffix between date_range.start_date and date_range.end_date),

-- subquery to get all sessions with a length > 10 seconds
session_length as (
select
    user_pseudo_id,
    session_id,
    timestamp_micros(min(event_timestamp)) as session_start_time,
    (max(event_timestamp)-min(event_timestamp))/1000000 as session_length_seconds
from
    base
group by
    user_pseudo_id,
    session_id
having
    -- change this number to adjust the desired session length
    session_length_seconds > 10
order by
    user_pseudo_id,
    session_id),
    
-- subquery to get all sessions with 2 or more (unique) page views
multiple_pageviews as (
select
    user_pseudo_id,
    session_id,
    session_start_time,
    pageviews
from (
    select
        user_pseudo_id,
        session_id,
        timestamp_micros(min(event_timestamp)) as session_start_time,
        count(pageview_location) over (partition by user_pseudo_id, session_id) as pageviews,
        row_number() over (partition by user_pseudo_id, session_id) as row_number
    from
        base
    group by
        user_pseudo_id,
        session_id,
        pageview_location)
where
    row_number = 1
    -- change this number to adjust the desired amount of page views
    and pageviews >= 2),
    
-- subquery to get all sessions with a conversion event (in this example 'first_visit')
conversion_event as (
select
    user_pseudo_id,
    session_id,
    timestamp_micros(min(event_timestamp)) as session_start_time
from
    base
where
    -- change this event_name to adjust the desired conversion event
    event_name = 'first_visit'
group by
    user_pseudo_id,
    session_id),

-- subquery to combine and deduplicate all subqueries generated earlier
dedup as (
select
    user_pseudo_id,
    session_id,
    session_start_time
from
    session_length
union distinct 
select
    user_pseudo_id,
    session_id,
    session_start_time
from
    multiple_pageviews 
union distinct
select
    user_pseudo_id,
    session_id,
    session_start_time
from
    conversion_event)

-- main query to count unique engaged sessions by date in descending order
select
    date(session_start_time) as date,
    count(distinct concat(user_pseudo_id,session_id)) as engaged_sessions
from
    dedup
group by
    date
order by
    date desc

Now it's your turn!

I hope you've enjoyed this tutorial and feel a bit more confident to utilise your own Google Analytics data in BigQuery. Please let me know if you have any feedback or suggestions to improve the quality of the content.


Enjoying this post? Subscribe to the GA4BigQuery newsletter!

You will receive regular updates (max 1x/week) straight in your inbox. You can unsubscribe anytime.