Skip to main content

πŸ”’ GA4 | tutorials

Boost Cross-Selling with Market Basket Analysis

This article describes the way to carry out a basic form of market basket analysis, a common and interesting type of investigation in the retail and ecommerce space. With this approach, you can identify pairs of products that are frequently purchased together.

Editor's note: this article is a guest post by Enio KurteΕ‘i. Worth following him for updates and looking at his earlier work (which we'll also bring over to GA4BigQuery in the near future).

Introduction

Understanding what products customers buy together is one of the most powerful insights in retail analytics. This type of analysis, known as market basket analysis, can transform how businesses approach product placement, inventory management, cross-selling strategies, and promotional campaigns. When you're managing an e-commerce platform or retail operation, questions like "What products should we bundle together?" or "Which items should be placed near each other in our store layout?" become critical for maximizing revenue and improving customer experience.

The analysis we're exploring today addresses these fundamental business questions by identifying the most frequently co-purchased product pairs. 

Marketing teams can use these product pair insights to create targeted bundle promotions, while merchandising teams can optimize product placement both online and in physical stores. The data also informs inventory management decisions, helping ensure that frequently co-purchased items maintain appropriate stock levels.

For e-commerce platforms, these insights can power recommendation engines that suggest complementary products during the shopping experience. When customers add one item to their cart, the system can intelligently recommend the products most likely to be purchased together, increasing average order value and improving customer satisfaction through relevant suggestions.

In today’s article, we’ll use GA4 and BigQuery to analyze the most commonly purchased product pairs, helping to answer key business questions around customer buying behavior.

Let me walk you through a SQL solution that tackles this market basket analysis challenge using Google Analytics 4 e-commerce data. This query demonstrates how to efficiently process transaction data and extract meaningful product relationship insights.

Full Code

WITH purchase_items AS (
  -- Get all items from purchase events
  SELECT
    ecommerce.transaction_id,
    item.item_name
  FROM
    `bigquery-public-data.ga4_obfuscated_sample_ecommerce.events_202101*`,
    UNNEST(items) AS item
  WHERE
    event_name = 'purchase'
    AND ecommerce.transaction_id IS NOT NULL
    AND item.item_name IS NOT NULL
),


product_pairs AS (
  -- Create pairs of products bought in same transaction
  SELECT
    a.item_name AS item1_name,
    b.item_name AS item2_name,
    a.transaction_id
  FROM
    purchase_items a
    JOIN purchase_items b ON a.transaction_id = b.transaction_id
  WHERE
    a.item_name < b.item_name  -- Avoid duplicates like (A,B) and (B,A)
)


-- Count and show most common pairs
SELECT
  item1_name,
  item2_name,
  COUNT(DISTINCT transaction_id) AS times_bought_together
FROM
  product_pairs
GROUP BY
  item1_name, item2_name
ORDER BY
  times_bought_together DESC

First CTE: Extracting Purchase Items

WITH purchase_items AS (
  -- Get all items from purchase events
  SELECT
    ecommerce.transaction_id,
    item.item_name
  FROM
    `bigquery-public-data.ga4_obfuscated_sample_ecommerce.events_202101*`,
    UNNEST(items) AS item
  WHERE
    event_name = 'purchase'
    AND ecommerce.transaction_id IS NOT NULL
    AND item.item_name IS NOT NULL
),

The first CTE establishes our foundation by extracting all purchased items from the GA4 events data. We're specifically filtering for purchase events because we only want to analyze completed transactions, not items that were merely viewed or added to cart. The UNNEST function is crucial here because GA4 stores multiple items within a single event record as an array, and we need to flatten this structure to get individual item records.

We're also applying essential data quality filters by excluding records where the transaction ID or item name is null. This ensures our analysis is based on clean, complete data. The transaction ID serves as our primary key for grouping items that were purchased together, while the item name gives us the product identifier we'll use for pairing analysis.

Second CTE: Creating Product Pairs

product_pairs AS (
  -- Create pairs of products bought in same transaction
  SELECT
    a.item_name AS item1_name,
    b.item_name AS item2_name,
    a.transaction_id
  FROM
    purchase_items a
    JOIN purchase_items b ON a.transaction_id = b.transaction_id
  WHERE
    a.item_name < b.item_name  -- Avoid duplicates like (A,B) and (B,A)
)

The second CTE performs the critical work of creating product pairs from items purchased in the same transaction. We achieve this through a self-join on the transaction ID, which effectively creates a Cartesian product of all items within each transaction. This mathematical approach ensures we capture every possible combination of products that appeared together in a single purchase.

The WHERE clause condition a.item_name < b.item_name is a elegant solution to a common analytical challenge. Without this filter, we would generate duplicate pairs like both (Product A, Product B) and (Product B, Product A), which represent the same relationship. By enforcing alphabetical ordering, we ensure each unique product pair appears only once in our results, making our analysis more efficient and accurate.

Final Query: Aggregating and Ranking

SELECT
  item1_name,
  item2_name,
  COUNT(DISTINCT transaction_id) AS times_bought_together
FROM
  product_pairs
GROUP BY
  item1_name, item2_name
ORDER BY
  times_bought_together DESC

The final portion of our query aggregates the product pairs and quantifies their co-occurrence frequency. We group by both item names to create unique product pair combinations, then count the distinct transaction IDs to determine how many times each pair was purchased together.

The ORDER BY clause sorts our results in descending order by frequency, immediately surfacing the most popular product combinations. This ranking allows business stakeholders to quickly identify the strongest product relationships and prioritize their strategic decisions accordingly.

Now it's your turn!

I hope you've enjoyed this tutorial and feel a bit more confident to utilise your own Google Analytics data in BigQuery. Drop a line in the comments if you have any questions, feedback or suggestions related to this article.