Skip to main content

UA | Universal Analytics

How to query realtime tables and views (and combine it with daily tables)

How to query a ga_realtime_sessions_ streaming data table and combine it with daily tables. Some example queries to get started.

πŸ’‘
This article is about GA3 - Universal Analytics

If you don’t see an intraday table, but realtime tables and view, streaming export is enabled for your Google Analytics view. To query this data and join this with historical data from the ga_sessions_ tables, another approach is needed.

For each day, streaming export creates 1 new table and 1 (BigQuery) view of that table: Table: ga_realtime_sessions_YYYYMMDD is an internal staging table that includes all records of sessions for all activity that took place during the day. Data is exported continuously approximately every 15 minutes. Within this table are multiple records of a session when the session spans multiple export operations. The ga_realtime_sessions_YYYYMMDD tables should not be used (and are not supported by Google Analytics technical support) for queries. Queries on these tables may yield unexpected results as they may contain duplicate records of some sessions. Query the ga_realtime_sessions_view_YYYYMMDD view instead. View: ga_realtime_sessions_view_YYYYMMDD sits on top of the exported tables and is there to deduplicate multiple records of repeated sessions that exist across export boundaries. Query this table for deduplicated streaming data. (source)

At the time of writing this daily generated realtime view is only queryable with Legacy SQL. Any Standard SQL query will result in this error:

To be able to use Standard SQL, we have to create our own realtime view (source). Change the project and dataset in the following query and run it:

-- this query is not valid as there is no realtime data available in the sample data set

create view `project.dataset.ga_realtime_view` as
  select
    parse_date('%y%m%d',_table_suffix) as suffix,
    visitkey,
    array_agg(
    (select as struct t.* except (visitkey))
    order by exporttimeusec desc limit 1)[offset(0)].*
  from
  `project.dataset.ga_realtime_sessions_20*` as t
  group by
  suffix,
  visitkey

Now you have a deduplicated realtime view called ga_realtime_view. Next, you can query this view with Standard SQL:

-- this query is not valid as there is no realtime data available in the sample data set

select
  date,
  count(distinct concat(fullvisitorid, cast(visitstarttime as string))) as sessions
from
  `project.dataset.ga_realtime_view`
where
  suffix = current_date()
  and totals.visits = 1
group by
  date

Apart from this workaround, there is another thing about realtime data: not all data fields are available in the realtime view, such as ads data and the Default Channel Grouping dimension.

In the next example query we will combine data from our realtime view with historical data from the ga_sessions_ tables and deal with the absence of the Default Channel Grouping in the realtime view as a bonus (also see this tutorial). To do this we’ll use union all.

-- this query is not valid as there is no realtime data available in the sample data set

select
  date,
  channelgrouping as default_channel_grouping,
  sessions
from (
  select
    date,
    case
      when trafficsource.source = '(direct)' and (trafficsource.medium = '(not set)' or trafficsource.medium = '(none)') then 'Direct'
      when trafficsource.medium = 'organic' then 'Organic Search'
      when hits.social.hassocialsourcereferral = 'yes' or regexp_contains(trafficsource.medium,'^(social|social-network|social-media|sm|social network|social media)$') then 'Social'
      when trafficsource.medium = 'email' then 'Email'
      when trafficsource.medium = 'affiliate' then 'Affiliates'
      when trafficsource.medium = 'referral' then 'Referral'
      when regexp_contains(trafficsource.medium,'^(cpc|ppc|paidsearch)$') and trafficsource.adwordsclickinfo.adnetworktype != 'content' then 'Paid Search'
      when regexp_contains(trafficsource.medium,' ^(cpv|cpa|cpp|content-text)$') then 'Other Advertising'
      when regexp_contains(trafficsource.medium,'^(display|cpm|banner)$') or trafficsource.adwordsclickinfo.adnetworktype = 'content' then 'Display'
    else
    '(Other)'
  end
    as channelgrouping,
    count(distinct concat(fullvisitorid, cast(visitstarttime as string))) as sessions
  from
    `project.dataset.ga_realtime_view`,
    unnest(hits) as hits
  where
    suffix = current_date()
    and totals.visits = 1
  group by
    date,
    default_channel_grouping
  union all
  select
    date,
    channelgrouping,
    count(distinct concat(fullvisitorid, cast(visitstarttime as string))) as sessions
  from
    `bigquery-public-data.google_analytics_sample.ga_sessions_*`
  where
    _table_suffix between '20160801'
    and format_date('%Y%m%d',date_sub(current_date(), interval 1 day))
    and totals.visits = 1
  group by
    date,
    channelgrouping)
order by
  date desc