Skip to main content

UA | Universal Analytics

Tutorial: How to replicate the 'Behavior | Site content | All pages' report

In this tutorial I will show you - step by step - how to replicate the Google Analytics All pages report in BigQuery.

💡
This article is about GA3 - Universal Analytics

In this tutorial I will show you - step by step - how to replicate the Google Analytics All pages report in BigQuery.

All pages report

In the Behaviour | Site content | All pages report you'll find data about the performance of your site content, segmented by page and page title.

Preparations

In this article we will use the Google Analytics Sample dataset for BigQuery, which contains analytics data from the Google Merchandise Store. I recommend you use your own data set instead.

Now, let's take a look at the primary dimensions and measures we need to replicate this report. As a reference I'll use the definitions in the Google Analytics user interface.

Primary dimensions:

  • page
  • page title

Measures

  • pageviews
  • unique pageviews
  • avg. time on page
  • entrances
  • bounce rate
  • exit rate
  • page value

Basic query

We start with the framework of our query. First let's define a time frame. In this case I'll take a static range with a start and end date. If you need a dynamic date range, read this.

with date_range as (
  select
    '20160801' as start_date,
    '20160802' as end_date)

Next, add a source for our data. To be able to query multiple day tables at once we use a wildcard * in the table name.

with date_range as (
  select
    '20160801' as start_date,
    '20160802' as end_date)
    
select
  *
from
  `bigquery-public-data.google_analytics_sample.ga_sessions_*` as sessions

The query above will return an error. To make the wildcard work we need to add a _table_suffix and refer to the tables we want to query, using a cross join (in short:,) after our sessions table.

with date_range as (
  select
    '20160801' as start_date,
    '20160802' as end_date)
    
select
  *
from
  `bigquery-public-data.google_analytics_sample.ga_sessions_*` as sessions,
  date_range
where
  _table_suffix between date_range.start_date and date_range.end_date

As we are replicating the All pages report, we have to unnest our session data so that we have access to the hit level data (again we use a cross join).

with date_range as (
  select
    '20160801' as start_date,
    '20160802' as end_date)
    
select
  *
from
  `bigquery-public-data.google_analytics_sample.ga_sessions_*` as sessions,
  unnest(hits) as hits,
  date_range
where
  _table_suffix between date_range.start_date and date_range.end_date

Now BigQuery will look at all hits. But as we only need the hits that contain pageviews, we can add a condition to the filter accordingly.

with date_range as (
  select
    '20160801' as start_date,
    '20160802' as end_date)
    
select
  *
from
  `bigquery-public-data.google_analytics_sample.ga_sessions_*` as sessions,
  unnest(hits) as hits,
  date_range
where
  _table_suffix between date_range.start_date and date_range.end_date
  and hits.type = 'PAGE'

Finally, we need to add another condition. We only want to count visits that contain interaction: totals.visits = 1. By default and opposing to the Google Analytics user interface, the BigQuery export tables contains all sessions, including the ones without interactions.

with date_range as (
  select
    '20160801' as start_date,
    '20160802' as end_date)
    
select
  *
from
  `bigquery-public-data.google_analytics_sample.ga_sessions_*` as sessions,
  unnest(hits) as hits,
  date_range
where
  _table_suffix between date_range.start_date and date_range.end_date
  and hits.type = 'PAGE'
  and totals.visits = 1

Add dimensions

Next, we add the dimensions we need to replicate this report. As we can see in the BigQuery export schema we need:

  • hits.page.pagepath (page)
  • hits.page.pagetitle (page title)
with date_range as (
  select
    '20160801' as start_date,
    '20160802' as end_date)
    
select
  hits.page.pagepath as page,
  hits.page.pagetitle as page_title
from
  `bigquery-public-data.google_analytics_sample.ga_sessions_*` as sessions,
  unnest(hits) as hits,
  date_range
where
  _table_suffix between date_range.start_date and date_range.end_date
  and hits.type = 'PAGE'
  and totals.visits = 1

Add measure: pageviews

Pageviews is the total number of pages viewed. Repeated views of a single page are counted.

To count the pageviews for our dimensions we just have to count all rows and group them by our dimensions. Remember, we can only use this approach in combination with the hits.type = 'PAGE' condition.

By default the All pages report is sorted on pageviews (descending). We add that too.

with date_range as (
  select
    '20160801' as start_date,
    '20160802' as end_date)
    
select
  hits.page.pagepath as page,
  hits.page.pagetitle as page_title,
  count(*) as pageviews
from
  `bigquery-public-data.google_analytics_sample.ga_sessions_*` as sessions,
  unnest(hits) as hits,
  date_range
where
  _table_suffix between date_range.start_date and date_range.end_date
  and hits.type = 'PAGE'
  and totals.visits = 1
group by
  page,
  page_title
order by
  pageviews desc

Add measure: unique pageviews

Unique Pageviews is the number of sessions during which the specified page was viewed at least once. A unique pageview is counted for each page URL + page title combination.

This last sentence from the definition is the tricky part here. There are multiple ways to calculate this measure, depending on the dimensions you choose to include in your report.

  • if you have page and page title as dimensions, you can just use a count of sessions:

count(distinct concat(cast(fullvisitorid as string), cast(visitstarttime as string))) as unique_pageviews

  • if you only have the dimension page in your report, you can use the same code, but it will be a quick and dirty calculation (as it will only be counted for each page url, and not for a combination with page title)
  • if you only have the dimension page in your report and strictly follow Google Analytics' definition, then you'll need a subquery (up) to achieve this and join the result (unique pageviews) in with a left join:
with date_range as (
  select
    '20160801' as start_date,
    '20160802' as end_date),
    
up as (
select
  page,
  page_title,
  sum(unique_pageviews) as unique_pageviews
from (
  select
    hits.page.pagepath as page,
    hits.page.pagetitle as page_title,
    concat(hits.page.pagepath,hits.page.pagetitle) as page_concat,
    count(distinct concat(cast(fullvisitorid as string), cast(visitstarttime as string))) as unique_pageviews
  from
    `bigquery-public-data.google_analytics_sample.ga_sessions_*`,
    unnest(hits) as hits,
    date_range
  where
    _table_suffix between start_date and end_date
    and totals.visits = 1
    and hits.type = 'PAGE'
  group by
    page,
    page_title,
    page_concat)
group by
  page,
  page_title)
  
select
  hits.page.pagepath as page,
  hits.page.pagetitle as page_title,
  count(*) as pageviews,
  up.unique_pageviews
from
  `bigquery-public-data.google_analytics_sample.ga_sessions_*` as sessions,
  unnest(hits) as hits,
  date_range
  left join up on hits.page.pagepath = up.page
  and hits.page.pagetitle = up.page_title
where
  _table_suffix between date_range.start_date and date_range.end_date
  and hits.type = 'PAGE'
  and totals.visits = 1
group by
  page,
  page_title,
  up.unique_pageviews
order by
  pageviews desc

In this tutorial we use both dimensions page and page value, so we can proceed without the subquery and count sessions as unique pageviews.

with date_range as (
  select
    '20160801' as start_date,
    '20160802' as end_date)
    
select
  hits.page.pagepath as page,
  hits.page.pagetitle as page_title,
  count(*) as pageviews,
  count(distinct concat(cast(fullvisitorid as string), cast(visitstarttime as string))) as unique_pageviews
from
  `bigquery-public-data.google_analytics_sample.ga_sessions_*` as sessions,
  unnest(hits) as hits,
  date_range
where
  _table_suffix between date_range.start_date and date_range.end_date
  and hits.type = 'PAGE'
  and totals.visits = 1
group by
  page,
  page_title
order by
  pageviews desc

Add measure: average time on page

The average amount of time users spent viewing a specified page or screen, or set of pages or screens.

Next up, average time on page. For this measure we cannot escape to add a subquery, as it is quite complicated.

Again, we start with the building blocks of our subquery: we're going in deep here: for every visit(or) we look at the timestamp of every pageview and identify the last pageview in the session and for every page we set the timestamp of the next pageview.

select
  fullvisitorid,
  visitstarttime,
  hits.page.pagepath,
  hits.page.pagetitle,
  hits.time / 1000 as hit_time,
  hits.type,
  hits.isexit,
  max(if(hits.isinteraction = true,hits.time / 1000,0)) over (partition by fullvisitorid, visitstarttime) as last_interaction,
  lead(hits.time / 1000) over (partition by fullvisitorid, visitstarttime order by hits.time / 1000) as next_pageview
from
  `bigquery-public-data.google_analytics_sample.ga_sessions_*`,
  unnest(hits) as hits,
  date_range
where
  _table_suffix between start_date and end_date
  and hits.type = 'PAGE'
  and totals.visits = 1

Next step: calculate the time on page for each page for each session.

select
  fullvisitorid,
  visitstarttime,
  pagepath,
  pagetitle,
  hit_time,
  type,
  isexit,
  case when isexit is not null then last_interaction - hit_time else next_pageview - hit_time end as time_on_page
from (
  select
    fullvisitorid,
    visitstarttime,
    hits.page.pagepath,
    hits.page.pagetitle,
    hits.time / 1000 as hit_time,
    hits.type,
    hits.isexit,
    max(if(hits.isinteraction = true,hits.time / 1000,0)) over (partition by fullvisitorid, visitstarttime) as last_interaction,
    lead(hits.time / 1000) over (partition by fullvisitorid, visitstarttime order by hits.time / 1000) as next_pageview
  from
    `bigquery-public-data.google_analytics_sample.ga_sessions_*`,
    unnest(hits) as hits,
    date_range
  where
    _table_suffix between start_date and end_date
    and hits.type = 'PAGE'
    and totals.visits = 1)

Then we group pageviews, exits and total time on page by page and page title.

select
  pagepath,
  pagetitle,
  count(*) as pageviews,
  countif(isexit is not null) as exits,
  sum(time_on_page) as total_time_on_page
from (
  select
    fullvisitorid,
    visitstarttime,
    pagepath,
    pagetitle,
    hit_time,
    type,
    isexit,
    case when isexit is not null then last_interaction - hit_time else next_pageview - hit_time end as time_on_page
  from (
    select
      fullvisitorid,
      visitstarttime,
      hits.page.pagepath,
      hits.page.pagetitle,
      hits.time / 1000 as hit_time,
      hits.type,
      hits.isexit,
      max(if(hits.isinteraction = true,hits.time / 1000,0)) over (partition by fullvisitorid, visitstarttime) as last_interaction,
      lead(hits.time / 1000) over (partition by fullvisitorid, visitstarttime order by hits.time / 1000) as next_pageview
    from
      `bigquery-public-data.google_analytics_sample.ga_sessions_*`,
      unnest(hits) as hits,
      date_range
    where
      _table_suffix between start_date and end_date
      and hits.type = 'PAGE'
      and totals.visits = 1))
group by
  pagepath,
  pagetitle

Finally we calculate the average time on page, based on the total time on page and the amount of pageviews minus the exits.

select
  pagepath as page,
  pagetitle,
  case when pageviews = exits then 0 else total_time_on_page / (pageviews - exits) end as avg_time_on_page
from (
  select
    pagepath,
    pagetitle,
    count(*) as pageviews,
    countif(isexit is not null) as exits,
    sum(time_on_page) as total_time_on_page
  from (
    select
      fullvisitorid,
      visitstarttime,
      pagepath,
      pagetitle,
      hit_time,
      type,
      isexit,
      case when isexit is not null then last_interaction - hit_time else next_pageview - hit_time end as time_on_page
    from (
      select
        fullvisitorid,
        visitstarttime,
        hits.page.pagepath,
        hits.page.pagetitle,
        hits.time / 1000 as hit_time,
        hits.type,
        hits.isexit,
        max(if(hits.isinteraction = true,hits.time / 1000,0)) over (partition by fullvisitorid, visitstarttime) as last_interaction,
        lead(hits.time / 1000) over (partition by fullvisitorid, visitstarttime order by hits.time / 1000) as next_pageview
      from
        `bigquery-public-data.google_analytics_sample.ga_sessions_*`,
        unnest(hits) as hits,
        date_range
      where
        _table_suffix between start_date and end_date
        and hits.type = 'PAGE'
        and totals.visits = 1))
  group by
    pagepath,
    pagetitle)

Now we are ready to insert this subquery into our main query. We do this by wrapping it in a with subquery and inserting it by a left join on page and page title. This will result in quite a large query.

with date_range as (
  select
    '20160801' as start_date,
    '20160801' as end_date),

avg_time as (
select
  pagepath as page,
  pagetitle,
  case when pageviews = exits then 0 else total_time_on_page / (pageviews - exits) end as avg_time_on_page
from (
  select
    pagepath,
    pagetitle,
    count(*) as pageviews,
    countif(isexit is not null) as exits,
    sum(time_on_page) as total_time_on_page
  from (
    select
      fullvisitorid,
      visitstarttime,
      pagepath,
      pagetitle,
      hit_time,
      type,
      isexit,
      case when isexit is not null then last_interaction - hit_time else next_pageview - hit_time end as time_on_page
    from (
      select
        fullvisitorid,
        visitstarttime,
        hits.page.pagepath,
        hits.page.pagetitle,
        hits.time / 1000 as hit_time,
        hits.type,
        hits.isexit,
        max(if(hits.isinteraction = true,hits.time / 1000,0)) over (partition by fullvisitorid, visitstarttime) as last_interaction,
        lead(hits.time / 1000) over (partition by fullvisitorid, visitstarttime order by hits.time / 1000) as next_pageview
      from
        `bigquery-public-data.google_analytics_sample.ga_sessions_*`,
        unnest(hits) as hits,
        date_range
      where
        _table_suffix between start_date and end_date
        and hits.type = 'PAGE'
        and totals.visits = 1))
  group by
    pagepath,
    pagetitle))
      
select
  hits.page.pagepath as page,
  hits.page.pagetitle as page_title,
  count(*) as pageviews,
  count(distinct concat(cast(fullvisitorid as string), cast(visitstarttime as string))) as unique_pageviews,
  avg_time_on_page
from
  `bigquery-public-data.google_analytics_sample.ga_sessions_*` as sessions,
  unnest(hits) as hits,
  date_range
  left join avg_time on hits.page.pagepath = avg_time.page
  and hits.page.pagetitle = avg_time.pagetitle
where
  _table_suffix between start_date and end_date
  and totals.visits = 1
  and hits.type = 'PAGE'
group by
  page,
  page_title,
  avg_time_on_page
order by
  pageviews desc

Add measure: entrances

Entrances is the number of times visitors entered your site through a specified page or set of pages.

The next few measures are relatively easy. Let's start with adding entrances. There is a column on hit level that indicates whether a hit is an entrance, we only have to count the cases.

with date_range as (
  select
    '20160801' as start_date,
    '20160801' as end_date),

avg_time as (
select
  pagepath as page,
  pagetitle,
  case when pageviews = exits then 0 else total_time_on_page / (pageviews - exits) end as avg_time_on_page
from (
  select
    pagepath,
    pagetitle,
    count(*) as pageviews,
    countif(isexit is not null) as exits,
    sum(time_on_page) as total_time_on_page
  from (
    select
      fullvisitorid,
      visitstarttime,
      pagepath,
      pagetitle,
      hit_time,
      type,
      isexit,
      case when isexit is not null then last_interaction - hit_time else next_pageview - hit_time end as time_on_page
    from (
      select
        fullvisitorid,
        visitstarttime,
        hits.page.pagepath,
        hits.page.pagetitle,
        hits.time / 1000 as hit_time,
        hits.type,
        hits.isexit,
        max(if(hits.isinteraction = true,hits.time / 1000,0)) over (partition by fullvisitorid, visitstarttime) as last_interaction,
        lead(hits.time / 1000) over (partition by fullvisitorid, visitstarttime order by hits.time / 1000) as next_pageview
      from
        `bigquery-public-data.google_analytics_sample.ga_sessions_*`,
        unnest(hits) as hits,
        date_range
      where
        _table_suffix between start_date and end_date
        and hits.type = 'PAGE'
        and totals.visits = 1))
  group by
    pagepath,
    pagetitle))
      
select
  hits.page.pagepath as page,
  hits.page.pagetitle as page_title,
  count(*) as pageviews,
  count(distinct concat(cast(fullvisitorid as string), cast(visitstarttime as string))) as unique_pageviews,
  avg_time_on_page,
  countif(hits.isentrance = true) as entrances
from
  `bigquery-public-data.google_analytics_sample.ga_sessions_*` as sessions,
  unnest(hits) as hits,
  date_range
  left join avg_time on hits.page.pagepath = avg_time.page
  and hits.page.pagetitle = avg_time.pagetitle
where
  _table_suffix between start_date and end_date
  and totals.visits = 1
  and hits.type = 'PAGE'
group by
  page,
  page_title,
  avg_time_on_page
order by
  pageviews desc

Add measure: bounce rate

The percentage of single-page sessions in which there was no interaction with the page. A bounced session has a duration of 0 seconds.

Luckily, we have data on the amount of bounces, so we are able to calculate the bounce rate by dividing them by the amount of sessions that included a certain page.

with date_range as (
  select
    '20160801' as start_date,
    '20160801' as end_date),

avg_time as (
select
  pagepath as page,
  pagetitle,
  case when pageviews = exits then 0 else total_time_on_page / (pageviews - exits) end as avg_time_on_page
from (
  select
    pagepath,
    pagetitle,
    count(*) as pageviews,
    countif(isexit is not null) as exits,
    sum(time_on_page) as total_time_on_page
  from (
    select
      fullvisitorid,
      visitstarttime,
      pagepath,
      pagetitle,
      hit_time,
      type,
      isexit,
      case when isexit is not null then last_interaction - hit_time else next_pageview - hit_time end as time_on_page
    from (
      select
        fullvisitorid,
        visitstarttime,
        hits.page.pagepath,
        hits.page.pagetitle,
        hits.time / 1000 as hit_time,
        hits.type,
        hits.isexit,
        max(if(hits.isinteraction = true,hits.time / 1000,0)) over (partition by fullvisitorid, visitstarttime) as last_interaction,
        lead(hits.time / 1000) over (partition by fullvisitorid, visitstarttime order by hits.time / 1000) as next_pageview
      from
        `bigquery-public-data.google_analytics_sample.ga_sessions_*`,
        unnest(hits) as hits,
        date_range
      where
        _table_suffix between start_date and end_date
        and hits.type = 'PAGE'
        and totals.visits = 1))
  group by
    pagepath,
    pagetitle))
      
select
  hits.page.pagepath as page,
  hits.page.pagetitle as page_title,
  count(*) as pageviews,
  count(distinct concat(cast(fullvisitorid as string), cast(visitstarttime as string))) as unique_pageviews,
  avg_time_on_page,
  countif(hits.isentrance = true) as entrances,
  countif(totals.bounces = 1) / count(distinct concat(fullvisitorid, cast(visitstarttime as string))) as bounce_rate
from
  `bigquery-public-data.google_analytics_sample.ga_sessions_*` as sessions,
  unnest(hits) as hits,
  date_range
  left join avg_time on hits.page.pagepath = avg_time.page
  and hits.page.pagetitle = avg_time.pagetitle
where
  _table_suffix between start_date and end_date
  and totals.visits = 1
  and hits.type = 'PAGE'
group by
  page,
  page_title,
  avg_time_on_page
order by
  pageviews desc

Add measure: exit rate

%Exit is (number of exits) / (number of pageviews) for the page or set of pages. It indicates how often users exit from that page or set of pages when they view the page(s).

Same as with entrances: we check if a hit is an exit. If yes, then we divide the total amount of exits per page by the amount of sessions for that page.

with date_range as (
  select
    '20160801' as start_date,
    '20160801' as end_date),

avg_time as (
select
  pagepath as page,
  pagetitle,
  case when pageviews = exits then 0 else total_time_on_page / (pageviews - exits) end as avg_time_on_page
from (
  select
    pagepath,
    pagetitle,
    count(*) as pageviews,
    countif(isexit is not null) as exits,
    sum(time_on_page) as total_time_on_page
  from (
    select
      fullvisitorid,
      visitstarttime,
      pagepath,
      pagetitle,
      hit_time,
      type,
      isexit,
      case when isexit is not null then last_interaction - hit_time else next_pageview - hit_time end as time_on_page
    from (
      select
        fullvisitorid,
        visitstarttime,
        hits.page.pagepath,
        hits.page.pagetitle,
        hits.time / 1000 as hit_time,
        hits.type,
        hits.isexit,
        max(if(hits.isinteraction = true,hits.time / 1000,0)) over (partition by fullvisitorid, visitstarttime) as last_interaction,
        lead(hits.time / 1000) over (partition by fullvisitorid, visitstarttime order by hits.time / 1000) as next_pageview
      from
        `bigquery-public-data.google_analytics_sample.ga_sessions_*`,
        unnest(hits) as hits,
        date_range
      where
        _table_suffix between start_date and end_date
        and hits.type = 'PAGE'
        and totals.visits = 1))
  group by
    pagepath,
    pagetitle))
      
select
  hits.page.pagepath as page,
  hits.page.pagetitle as page_title,
  count(*) as pageviews,
  count(distinct concat(cast(fullvisitorid as string), cast(visitstarttime as string))) as unique_pageviews,
  avg_time_on_page,
  countif(hits.isentrance = true) as entrances,
  countif(totals.bounces = 1) / count(distinct concat(fullvisitorid, cast(visitstarttime as string))) as bounce_rate,
  countif(hits.isexit = true) / count(*) as exit_rate
from
  `bigquery-public-data.google_analytics_sample.ga_sessions_*` as sessions,
  unnest(hits) as hits,
  date_range
  left join avg_time on hits.page.pagepath = avg_time.page
  and hits.page.pagetitle = avg_time.pagetitle
where
  _table_suffix between start_date and end_date
  and totals.visits = 1
  and hits.type = 'PAGE'
group by
  page,
  page_title,
  avg_time_on_page
order by
  pageviews desc

Add measure: page value

The average value of this page or set of pages. Page Value is ((Transaction Revenue + Total Goal Value) divided by Unique Pageviews for the page or set of pages)).

Our last measure from the All pages report is a bit less well known. In the documentation Google is more detailed than the description in the user interface:

Page Value is the average value for a page that a user visited before landing on the goal page or completing an Ecommerce transaction (or both).

As this measure is custom for every Google Analytics user, it is hard to include this in a standardised way. When I stumble upon a good use case and example data I may write a separate article about it.

Final query

with date_range as (
  select
    '20160801' as start_date,
    '20160801' as end_date),

avg_time as (
select
  pagepath as page,
  pagetitle,
  case when pageviews = exits then 0 else total_time_on_page / (pageviews - exits) end as avg_time_on_page
from (
  select
    pagepath,
    pagetitle,
    count(*) as pageviews,
    countif(isexit is not null) as exits,
    sum(time_on_page) as total_time_on_page
  from (
    select
      fullvisitorid,
      visitstarttime,
      pagepath,
      pagetitle,
      hit_time,
      type,
      isexit,
      case when isexit is not null then last_interaction - hit_time else next_pageview - hit_time end as time_on_page
    from (
      select
        fullvisitorid,
        visitstarttime,
        hits.page.pagepath,
        hits.page.pagetitle,
        hits.time / 1000 as hit_time,
        hits.type,
        hits.isexit,
        max(if(hits.isinteraction = true,hits.time / 1000,0)) over (partition by fullvisitorid, visitstarttime) as last_interaction,
        lead(hits.time / 1000) over (partition by fullvisitorid, visitstarttime order by hits.time / 1000) as next_pageview
      from
        `bigquery-public-data.google_analytics_sample.ga_sessions_*`,
        unnest(hits) as hits,
        date_range
      where
        _table_suffix between start_date and end_date
        and hits.type = 'PAGE'
        and totals.visits = 1))
  group by
    pagepath,
    pagetitle))
      
select
  hits.page.pagepath as page,
  -- hits.page.pagetitle as page_title,
  count(*) as pageviews,
  count(distinct concat(cast(fullvisitorid as string), cast(visitstarttime as string))) as unique_pageviews,
  avg_time_on_page,
  countif(hits.isentrance = true) as entrances,
  countif(totals.bounces = 1) / count(distinct concat(fullvisitorid, cast(visitstarttime as string))) as bounce_rate,
  countif(hits.isexit = true) / count(*) as exit_rate
from
  `bigquery-public-data.google_analytics_sample.ga_sessions_*` as sessions,
  unnest(hits) as hits,
  date_range
  left join avg_time on hits.page.pagepath = avg_time.page
  and hits.page.pagetitle = avg_time.pagetitle
where
  _table_suffix between start_date and end_date
  and totals.visits = 1
  and hits.type = 'PAGE'
group by
  page,
  avg_time_on_page
  -- ,page_title
order by
  pageviews desc

Now it's your turn!

I hope you've enjoyed this tutorial and feel a bit more confident to utilise your own Google Analytics data in BigQuery. Please let me know if you have any feedback or suggestions to improve the quality of the content.