Data Sampling in Google Analytics 4: Cardinality, Thresholding & Data Sampling

by | Nov 28, 2023

10 min read

Data sampling in Google Analytics 4 is back! That is bad news for Google Analytics 4 users. It looks like Google is doing everything in its power to push users towards BigQuery or a paid 360 license.

Prefer a quick video over reading? Watch Head of Analytics, Benoit Weber, break it down.

 

Data sampling in Google Analytics 4 is back

Data sampling and thresholding already existed to some degree in Universal Analytics, but they are implemented a little differently in Google Analytics 4 (GA4).

Until this month (November 2023), sampling only affected certain sections and reports in GA4.

But unfortunately, sampling is back!

You may already have seen it: the small icon at the top of your GA4 reports. Google calls it the “data quality” indicator, and it changes to show whether cardinality limits, sampling, or thresholding have been applied to the report.


Depending on how much data you collect and how much of it you report on, you may face three different challenges: cardinality, thresholding, and sampling.

Read on for an explanation of each concept, its impact on your reporting, and the alternatives, plus what I recommend you do.

Cardinality in GA4

What is cardinality?

Cardinality refers to the number of unique values assigned to a dimension. Some dimensions have a fixed number of unique values.

Each dimension has its own cardinality.

Example:

  • Day of the week has a cardinality of 7: Monday, Tuesday, Wednesday, Thursday, Friday, Saturday, Sunday
  • Device has a cardinality of 3: Mobile, Tablet, Desktop
  • Logged In Status has a cardinality of 2: true or false

Some dimensions can be considered high-cardinality, such as Page Path, Item Name, and Transaction ID.

How could cardinality become a problem for me?

Cardinality becomes a problem once your reports include high-cardinality dimensions. GA4 considers a dimension high-cardinality if it takes more than 500 unique values in a single day.
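
To make the 500-values-a-day rule concrete, here is a minimal Python sketch that counts the unique values of a dimension per day and flags the days that exceed the limit. The event structure, field names, and the `high_cardinality_days` helper are hypothetical and only for illustration; this is not how GA4 computes it internally.

```python
from collections import defaultdict

HIGH_CARDINALITY_LIMIT = 500  # unique values per day, the limit quoted above


def high_cardinality_days(events, dimension, limit=HIGH_CARDINALITY_LIMIT):
    """Return {date: unique_count} for days where `dimension` exceeds `limit`."""
    values_per_day = defaultdict(set)
    for event in events:
        values_per_day[event["date"]].add(event[dimension])
    return {
        day: len(values)
        for day, values in values_per_day.items()
        if len(values) > limit
    }


# Hypothetical data: 600 distinct transaction IDs on one day, 3 on the next.
events = [{"date": "2023-11-27", "transaction_id": f"T{i}"} for i in range(600)]
events += [{"date": "2023-11-28", "transaction_id": f"T{i}"} for i in range(3)]

print(high_cardinality_days(events, "transaction_id"))
```

Only the first day is flagged, because it is the only one with more than 500 distinct transaction IDs.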

High-cardinality dimensions increase the number of rows in a report, making it more likely that a report hits its row limit, causing any data past the limit to go into the (other) row.

[Screenshot: Some data condensed in GA4]

Example:

11% of pageviews are condensed under (other) due to high cardinality. In this example, 12,397 distinct Page path and screen class values are listed in the report. All remaining values are condensed under (other).
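
The effect of a row limit can be sketched in a few lines of Python. This is a simplified model, assuming the report keeps the largest rows and folds everything else into (other); the page paths, counts, and the row limit of 3 are made up, and GA4's real limits and row selection are not reproduced here.

```python
from collections import Counter


def apply_row_limit(counts, row_limit):
    """Keep the top `row_limit` rows and fold the remainder into "(other)"."""
    ranked = Counter(counts).most_common()
    kept = dict(ranked[:row_limit])
    overflow = sum(value for _, value in ranked[row_limit:])
    if overflow:
        kept["(other)"] = overflow
    return kept


# Hypothetical pageviews per page path.
pageviews = {"/": 500, "/pricing": 300, "/blog/a": 89, "/blog/b": 60, "/blog/c": 51}

report = apply_row_limit(pageviews, row_limit=3)
other_share = report.get("(other)", 0) / sum(pageviews.values())

print(report)
print(f"{other_share:.0%} of pageviews fall under (other)")
```

The total is preserved, but the detail for the smaller pages is lost: you can no longer tell which pages make up the (other) row.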

[Screenshot: (other) category in GA4]

What can I do?

  • Modify your reporting period to limit the data analysis scope.
  • Prefer standard reports for their aggregate tables, minimising the risk of data consolidating under an “(other)” row.
  • If you encounter an “(other)” row in a report, explore the same data in explorations for detailed insights.
  • Consider exporting your data to BigQuery for more extensive and customisable analysis, enabling advanced querying and exploration beyond the capabilities of standard reports.

Data Thresholding in GA4

What are data thresholds?

Data thresholds are applied to prevent anyone viewing a report or exploration from inferring the identity or sensitive information of individual users based on demographics, interests, or other signals present in the data.

When thresholding is applied you will see the following message:

[Screenshot: Thresholding applied in GA4]

Google’s documentation is vague about the exact limits, but thresholds can be applied when a report uses demographic data, Google signals data, or search query information.

The exact limit is not published, but thresholding is usually observed once a row is based on only around 30-50 users or events.
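
As a rough illustration of what thresholding does to a report, the Python sketch below withholds any row whose user count is under a minimum. The value of 50 is an assumption taken from the upper end of the observed range above, not a documented GA4 setting, and the rows are made up.

```python
ASSUMED_MIN_USERS = 50  # assumption: upper end of the observed 30-50 range


def apply_threshold(rows, min_users=ASSUMED_MIN_USERS):
    """Split rows into those shown and those withheld for low user counts."""
    shown = [row for row in rows if row["users"] >= min_users]
    withheld = [row for row in rows if row["users"] < min_users]
    return shown, withheld


# Hypothetical demographic report rows.
rows = [
    {"age": "18-24", "users": 420},
    {"age": "25-34", "users": 1310},
    {"age": "65+", "users": 12},
]

shown, withheld = apply_threshold(rows)
print([row["age"] for row in shown])
print(sum(row["users"] for row in withheld), "users missing from the report")
```

The withheld row simply disappears from the table, which is why totals in a thresholded report can be lower than expected.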

What can I do?

  • Increase your reporting period so that each row covers more users and is less likely to be thresholded.
  • Turn off Google Signals in your reporting settings.
  • Use the device-based reporting identity rather than Blended or Observed.
  • Consider exporting your data to BigQuery for more extensive and customisable analysis, enabling advanced querying and exploration beyond the capabilities of standard reports.

Data Sampling in Google Analytics 4

What is data sampling?

Data sampling is the data-analysis practice of analysing a subset of data to uncover meaningful information from a larger data set. The practice enables you to retrieve data more quickly with minimal impact on data quality.

Originally, until November 2023, data sampling was only applied to the Explore and Advertising sections in Google Analytics. But this is no longer the case.

Unfortunately, Google Analytics 4 has now reintroduced sampling in standard reports.

You may now see this type of message while accessing a standard report.

[Screenshot: Data sampling in GA4]

What are the limits for data sampling in GA4?

The quota limit for event-level queries is 10 million events for standard Google Analytics properties and up to 1 billion events for Google Analytics 360 properties. When a query needs more events than the quota allows, GA4 samples the data.

This approach keeps Google’s compute costs down, but it means a report can show an approximate value instead of the exact one.
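
To get a feel for these quotas, the Python sketch below estimates what share of your events a query would be based on. It assumes, purely for illustration, that a sampled query uses exactly the quota’s worth of events; the function name and the 40 million event figure are hypothetical.

```python
STANDARD_QUOTA = 10_000_000   # events per query, standard property
GA360_QUOTA = 1_000_000_000   # events per query, 360 property


def sampling_rate(events_in_query, quota):
    """Fraction of events used if the query is capped at `quota` events."""
    if events_in_query <= quota:
        return 1.0  # under the quota: no sampling needed
    return quota / events_in_query


# Hypothetical query covering 40 million events.
print(f"Standard: {sampling_rate(40_000_000, STANDARD_QUOTA):.0%} of events")
print(f"360:      {sampling_rate(40_000_000, GA360_QUOTA):.0%} of events")
```

Under this assumption, the same query is based on a quarter of the events on a standard property and on all of them on a 360 property.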

What is the impact of data sampling in Google Analytics 4?

When data sampling is applied, the numbers presented to you are estimates rather than exact counts. The smaller the sample, the less accurate the estimate. In other words, GA4 reports do not always give you the exact numbers.
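
The loss of accuracy can be simulated with a short Python sketch: take a random subset of events, count conversions in it, and scale the count back up. This uses simple random sampling as a stand-in, since the actual sampling method in GA4 is not described here, and the traffic figures are made up.

```python
import random


def estimate_conversions(events, rate, seed=0):
    """Estimate total conversions from a random sample scaled up to full size."""
    rng = random.Random(seed)
    sample_size = max(1, round(len(events) * rate))
    sample = rng.sample(events, sample_size)
    return sum(sample) * len(events) / sample_size


# Hypothetical traffic: 100,000 events, 2% of which are conversions (1 = conversion).
events = [1] * 2_000 + [0] * 98_000
exact = sum(events)

for rate in (1.0, 0.25, 0.01):
    estimate = estimate_conversions(events, rate)
    print(f"rate={rate}: estimate={estimate:,.0f} (exact={exact:,})")
```

At a rate of 1.0 the estimate matches the exact count. At lower rates it drifts away from it, and the drift tends to grow as the sample shrinks; change the `seed` to see how much the estimate moves between samples.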

Google is applying sampling because processing data is computationally expensive.

  • Sampling now impacts any report inside the GA4 UI
  • Sampling also affects Looker Studio
  • Sampling also affects the Google Analytics API

What can I do?

  • Modify your reporting period to limit the data analysis scope.
  • Consider limiting your data collection to minimise the amount of data you collect. Keep it simple, stupid (KISS) and do not track every single interaction on your site.
  • Consider exporting your data to BigQuery for more extensive and customisable analysis, enabling advanced querying and exploration beyond the capabilities of standard reports.
  • Explore the option of obtaining a GA4 360 license. With a 360 license, you can choose in the UI between more detailed results (up to 1B events) and faster results (100M events). You can also request unsampled results that will be sent to you.

Google is pushing us to BigQuery and the 360 license!

In terms of reporting, Google Analytics 4 offers us multiple options.

In the user interface:

  • Standard reports: Reports, Advertising
  • Custom Reports: Explore

Outside of GA4:

  • Dashboards created using the GA4 Looker Studio connector
  • Reports using data exported via the API
  • Reports using the GA4 export to BigQuery

With the exception of BigQuery, all of the above are subject to:

  • Cardinality: when the underlying table in the report reaches its row limit, any data past the limit goes into the (other) row.
  • Data thresholding: applied when the data you request is based on a small number of users, which could lead to privacy issues.
  • Data sampling: applied when a query covers more events than your property’s quota, so the results are estimated from a subset of the data.

In summary, if you don’t want to face GA4 data limitations, you have 3 solutions:

  1. Do not collect too much data
  2. Set up a data warehouse via BigQuery
  3. Purchase a 360 license

What I recommend

My recommended solution is to start working with BigQuery, as it addresses all three issues without having to buy a license for Google Analytics 360.

  • Sampling: Connect GA4 to BigQuery to access unsampled data in its raw format. This ensures that you can analyse your data in BigQuery without encountering sampling issues.
  • Thresholding: The BigQuery export does not include Google Signals data, so the thresholds that Google Signals triggers in GA4 reports and explorations do not apply to your exported data.
  • Cardinality: By integrating GA4 with BigQuery, you can store all your data in a platform without cardinality limits. This eliminates row limit errors, allowing access to all your data, even when it involves high-cardinality dimensions.
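
As a starting point, here is a hedged Python sketch that builds a BigQuery SQL query counting pageviews and users per page from the GA4 export tables. The project and dataset names are hypothetical placeholders, and the `build_pageview_query` helper is my own illustration; it only produces the SQL text, which you would then run in the BigQuery console or with a client of your choice.

```python
def build_pageview_query(project, dataset, start, end):
    """Return SQL counting page_view events per page from the GA4 export tables."""
    return f"""
SELECT
  (SELECT value.string_value
   FROM UNNEST(event_params)
   WHERE key = 'page_location') AS page_location,
  COUNT(*) AS pageviews,
  COUNT(DISTINCT user_pseudo_id) AS users
FROM `{project}.{dataset}.events_*`
WHERE _TABLE_SUFFIX BETWEEN '{start}' AND '{end}'
  AND event_name = 'page_view'
GROUP BY page_location
ORDER BY pageviews DESC
""".strip()


# Hypothetical names: replace with your own project and analytics_<property_id> dataset.
query = build_pageview_query("my-project", "analytics_123456789", "20231101", "20231128")
print(query)
```

Because the query runs on the raw exported events, every page gets its own row: there is no (other) row, no sampling, and no thresholding. The string formatting is for illustration only; use query parameters if the inputs come from users.
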
Benoit Weber