Fix Referral Spam in GA4 – Diagnosis and Solution

Dec 3

The Case File

Your GA4 property is recording sessions from domains you've never heard of. Traffic from sites like semalt.com, buttons-for-website.com, or more recent offenders like trafficbooster.pro and news.grets.store appears in your Traffic Acquisition report under the Referral channel. These sessions didn't originate from real users clicking links. They're referral spam, and they're polluting your analytics data.

Referral spam is fraudulent traffic that appears in your GA4 reports as legitimate referral visits. It inflates session counts, distorts engagement metrics like bounce rate and session duration, and obscures genuine user behavior patterns. The goal? Spammers want you to notice their domain in your reports and visit their site out of curiosity.

The Root Causes

Referral spam infiltrates GA4 through multiple attack vectors. Understanding each mechanism is critical for implementing the right fix.

1. Ghost Spam (Measurement Protocol Abuse)

The most insidious form. Ghost spam doesn't visit your website at all. Instead, attackers send fake hits directly to Google's collection servers, imitating the requests that legitimate tracking code sends from a visitor's browser.

All they need is your Measurement ID (the G-XXXXXXXXXX code visible in your page source or GTM container). They craft HTTP requests with fabricated referrer data and fire them at GA4's collection endpoint. GA4's official Measurement Protocol also requires an API secret, but the browser-facing endpoint used by gtag.js does not, which is why the Measurement ID alone is enough. Because these hits never touch your server, traditional defenses like .htaccess rules or Cloudflare blocks are useless.

2. Crawler Spam (Bot-Based Referrals)

These bots actually crawl your site, but they spoof the HTTP referrer header to inject spam domains into your data. Unlike ghost spam, crawler bots do generate server requests, making them detectable (and blockable) at the infrastructure level.

GA4 includes automatic bot filtering, but it's not comprehensive. Google's known-bot exclusion list doesn't catch new or sophisticated spam bots, especially those that mimic human behavior patterns.
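Because crawler spam does reach your server, one way to block it is a referrer check at the application layer. The following is a minimal sketch, assuming a Python WSGI application; the SPAM_DOMAINS set, the is_spam_referrer helper, and the BlockSpamReferrers class are hypothetical names chosen for illustration, and the blocklist would need to come from your own reports.

```python
from urllib.parse import urlparse

# Hypothetical blocklist; populate it from your own Traffic Acquisition report.
SPAM_DOMAINS = {"semalt.com", "buttons-for-website.com", "trafficbooster.pro"}


def is_spam_referrer(referer, blocklist=SPAM_DOMAINS):
    """Return True if the Referer host is a blocked domain or a subdomain of one."""
    host = (urlparse(referer).hostname or "").lower()
    return any(host == d or host.endswith("." + d) for d in blocklist)


class BlockSpamReferrers:
    """WSGI middleware that answers 403 when the Referer matches the blocklist."""

    def __init__(self, app, blocklist=SPAM_DOMAINS):
        self.app = app
        self.blocklist = blocklist

    def __call__(self, environ, start_response):
        if is_spam_referrer(environ.get("HTTP_REFERER", ""), self.blocklist):
            start_response("403 Forbidden", [("Content-Type", "text/plain")])
            return [b"Forbidden"]
        # Anything else is passed through to the wrapped application untouched.
        return self.app(environ, start_response)
```

A check like this only helps against crawler spam. Ghost spam never sends a request to your server, so it has to be handled on the reporting side.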

3. Exposed Measurement IDs

If you implement GA4 via Google Tag Manager or directly via the gtag.js snippet, your Measurement ID is visible in your site's HTML. Spam scrapers harvest these IDs at scale and target them with ghost spam campaigns.

Server-side GTM reduces this exposure but doesn't eliminate it: determined attackers can still recover your Measurement ID through network traffic analysis.

4. Misconfigured Unwanted Referrals List

GA4's List Unwanted Referrals feature (found in Admin > Data Streams > Configure Tag Settings) is designed to prevent self-referrals from payment processors or login flows. However, it does not block spam referrals—it only prevents legitimate domains from breaking session attribution.

Many analysts mistakenly believe adding spam domains here will filter them out. It won't. This setting only affects session continuity, not data inclusion.

The "So What?" (Business Impact)

Referral spam isn't just a cosmetic annoyance—it has tangible business consequences:

Inflated Traffic Metrics

Spam sessions artificially boost your total session count, making your site appear more popular than it is. This skews conversion rate calculations downward (more sessions, same conversions = lower CVR).
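To make the dilution concrete, here is a small worked example. The session and conversion counts are hypothetical figures chosen only to illustrate the arithmetic.

```python
def conversion_rate(conversions, sessions):
    """Conversions divided by sessions, guarding against an empty denominator."""
    return conversions / sessions if sessions else 0.0


# Hypothetical figures for illustration only.
real_sessions = 10_000
spam_sessions = 2_000
conversions = 300  # spam sessions never convert, so this number does not change

true_cvr = conversion_rate(conversions, real_sessions)
reported_cvr = conversion_rate(conversions, real_sessions + spam_sessions)

print(f"True CVR:     {true_cvr:.2%}")      # 3.00%
print(f"Reported CVR: {reported_cvr:.2%}")  # 2.50%
```

In this scenario, spam equal to 20% of real traffic makes a 3.00% conversion rate appear as 2.50%.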

Distorted User Behavior Analysis

Spam traffic typically has:

100% bounce rate (single-hit sessions)
0-second session duration
No key event completions

When aggregated with real user data, these metrics drag down your averages, making genuine user engagement appear worse than reality.
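The drag on averages can be sketched as a session-weighted mean. The function name and all figures below are hypothetical, and the sketch assumes spam sessions contribute zero engagement time and a 100% bounce rate, as described above.

```python
def blended_average(real_sessions, real_avg, spam_sessions, spam_avg=0.0):
    """Session-weighted average of a per-session metric over real and spam traffic."""
    total = real_sessions + spam_sessions
    if total == 0:
        return 0.0
    return (real_sessions * real_avg + spam_sessions * spam_avg) / total


# Hypothetical property: 8,000 real sessions and 2,000 spam sessions.
avg_engagement = blended_average(8_000, 75.0, 2_000, 0.0)  # real users average 75 s
bounce_rate = blended_average(8_000, 0.35, 2_000, 1.0)     # real users bounce 35% of the time

print(f"Reported engagement time: {avg_engagement:.0f} s")  # 60 s instead of 75 s
print(f"Reported bounce rate: {bounce_rate:.0%}")           # 48% instead of 35%
```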

Broken Attribution Models

If spam referrals account for 10-20% of your referral traffic, your multi-channel attribution reports become unreliable. You might under-invest in high-performing channels because spam is diluting the referral channel's apparent value.

Wasted Ad Spend (Indirect)

If you use GA4 audiences for remarketing or similar audiences in Google Ads, spam users pollute your segments. You might end up targeting fake users, reducing campaign efficiency and inflating your Customer Acquisition Cost (CAC).

Compliance Risk

In rare cases, spam referrals from adult or gambling sites could create brand safety issues if stakeholders review your analytics reports without context.

The Investigation

Before implementing fixes, confirm the extent of the problem. Here's how to manually identify spam referrals in GA4:

Method 1: Traffic Acquisition Report

Navigate to Reports > Acquisition > Traffic Acquisition
Click the dropdown next to Session default channel group
Select Session source/medium or Session source as your primary dimension
Look for unfamiliar domains in the Referral channel
Check for red flags:

High session counts with zero key events
Bounce rates near 100%
Average engagement time of 0 seconds
Domains you've never partnered with or heard of
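If you export the report rows, the first three red flags can be checked programmatically. This is a rough sketch, not a definitive detector: the SourceRow fields, the thresholds, and the sample rows are all assumptions, and the fourth red flag (domains you don't recognize) still needs human judgment.

```python
from dataclasses import dataclass


@dataclass
class SourceRow:
    """One row of an exported Traffic Acquisition report (hypothetical field names)."""
    source: str
    sessions: int
    engaged_sessions: int
    key_events: int
    avg_engagement_seconds: float


def is_suspicious(row, min_sessions=50, max_engagement_rate=0.05, max_avg_seconds=1.0):
    """Flag sources with many sessions, no key events, and almost no engagement."""
    if row.sessions < min_sessions:
        return False  # too little traffic to judge either way
    engagement_rate = row.engaged_sessions / row.sessions
    return (
        row.key_events == 0
        and engagement_rate <= max_engagement_rate  # bounce rate near 100%
        and row.avg_engagement_seconds <= max_avg_seconds
    )


rows = [
    SourceRow("trafficbooster.pro", 420, 0, 0, 0.0),
    SourceRow("partner.example", 300, 180, 12, 48.5),
]
flagged = [r.source for r in rows if is_suspicious(r)]
print(flagged)  # ['trafficbooster.pro']
```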

Method 2: Exploration Report (Advanced)

Go to Explore and create a new Free Form exploration
Add dimensions:

Page referrer (shows full referring URL)
Session source
Landing page

Add metrics:

Sessions
Engaged sessions
Key events

Apply a filter: Session default channel group exactly matches "Referral"
Sort by Sessions descending and review the Page referrer column

Known spam patterns to watch for:

Domains ending in .xyz, .top, .store (common TLDs for spam)
Referral sessions concentrated in unexpected geolocations (for example Poland or Russia, if you have no audience there)
Referrers with generic names like trafficbooster, free-traffic, get-more-visitors
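The domain-based patterns above can be turned into a quick screening helper. This is a heuristic sketch only: the TLD and keyword lists are taken from the examples in this section, the function name is hypothetical, and a match is a reason to look closer rather than proof of spam.

```python
SUSPICIOUS_TLDS = (".xyz", ".top", ".store")
SUSPICIOUS_KEYWORDS = ("trafficbooster", "free-traffic", "get-more-visitors")


def spam_signals(domain):
    """Return the list of heuristic signals that a referring domain triggers."""
    d = domain.strip().lower()
    signals = []
    if d.endswith(SUSPICIOUS_TLDS):  # str.endswith accepts a tuple of suffixes
        signals.append("tld")
    if any(keyword in d for keyword in SUSPICIOUS_KEYWORDS):
        signals.append("keyword")
    return signals


for domain in ["news.grets.store", "trafficbooster.pro", "example.com"]:
    print(domain, spam_signals(domain))
```

Plenty of legitimate sites use these TLDs, so treat the output as a shortlist for manual review.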

Method 3: Real-Time Report Check

Open Reports > Realtime and watch for suspicious referral spikes. Spam campaigns often generate bursts of fake traffic over 24-48 hours.
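Bursts like these can also be spotted after the fact from daily referral session counts. The sketch below assumes you have exported one count per day from a standard report; the function name, the 3x multiplier, and the sample counts are hypothetical.

```python
from statistics import median


def find_bursts(daily_sessions, multiplier=3.0):
    """Return the indexes of days whose sessions exceed multiplier x the median day."""
    if not daily_sessions:
        return []
    baseline = median(daily_sessions)
    threshold = max(baseline * multiplier, 1)  # avoid a zero threshold on quiet properties
    return [i for i, count in enumerate(daily_sessions) if count > threshold]


# Hypothetical week of referral sessions with a two-day spam burst.
counts = [40, 38, 45, 410, 395, 42, 39]
print(find_bursts(counts))  # [3, 4]
```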

The Solution

Eliminating referral spam requires a multi-layered defense strategy. No single method is 100% effective, so combine approaches based on your technical capabilities.

Solution 1: List Unwanted Referrals (For Self-Referrals Only)

Important: This does not block spam. It only keeps the listed legitimate domains from being credited as the referral source of a session.

When to use: If you see referrals from your own payment gateway (e.g., paypal.com, stripe.com) or authentication providers (e.g., accounts.google.com) that are breaking user journeys.

Steps:

Go to Admin > Data Streams
Click your web data stream
Scroll down and click Configure tag settings
Click Show More, then List unwanted referrals
Add domains like paypal.com (without https:// or www.)

Solution 2: Create a Custom Data Filter (Regex-Based)

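This is the closest thing GA4 offers to a native, property-level block. Be aware that GA4's built-in data filters currently come in two types, Internal traffic and Developer traffic, and the internal traffic filter matches on the traffic_type event parameter rather than on page_referrer. If the options below are not available in your property, the same regex can still be applied wherever you can filter on page referrer, such as Explorations, BigQuery, or Looker Studio.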

Steps:

Compile a list of spam domains from your Traffic Acquisition report
Go to Admin > Data Settings > Data Filters
Click Create Filter
Configure:

Filter Name: Exclude Referral Spam
Filter Type: Exclude
Parameter: page_referrer
Match Type: matches regex

Value: Build a regex pattern like:

(semalt\.com|buttons-for-website\.com|trafficbooster\.pro|leadsgo\.io|news\.grets\.store)

Set Filter State to Testing first
Monitor for 48 hours using the Test data filter name dimension in reports
If results look correct, change Filter State to Active

Pro tip: Use \. to escape periods in domain names. The pipe | character means "OR" in regex.
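Maintaining the pattern by hand gets error-prone as the list grows. One option is to generate it from a plain list of domains and test it before pasting it into a filter. This is a minimal sketch: the build_spam_regex helper and the test URLs are hypothetical, and it uses Python's re module, whose syntax agrees with the simple pattern shown above.

```python
import re


def build_spam_regex(domains):
    """Join domains into one alternation, escaping periods as the tip describes."""
    escaped = [d.strip().lower().replace(".", r"\.") for d in domains if d.strip()]
    return "(" + "|".join(escaped) + ")"


spam_domains = [
    "semalt.com",
    "buttons-for-website.com",
    "trafficbooster.pro",
    "leadsgo.io",
    "news.grets.store",
]
pattern = build_spam_regex(spam_domains)
spam_re = re.compile(pattern)
print(pattern)

# Check the pattern against a few referrer URLs before using it in a filter.
for referrer in [
    "https://semalt.com/crawler",    # spam domain: should match
    "https://www.example.com/blog",  # unrelated site: should not match
    "https://semaltXcom.example/",   # escaped dot prevents this false match
]:
    print(referrer, bool(spam_re.search(referrer)))
```

The third URL shows why the escaping matters: with an unescaped period, semalt.com would also match semaltXcom.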

Solution 3: BigQuery Post-Processing (Enterprise)

If you export GA4 data to BigQuery, filter spam in your SQL queries or create cleaned views.

Example query:

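-- Keep every event whose page_referrer does not match a known spam domain.
-- COALESCE keeps events that have no page_referrer at all (e.g. direct traffic);
-- a bare NOT LIKE would drop them, because NULL NOT LIKE '...' evaluates to NULL.
SELECT *
FROM `your-project.analytics_XXXXXX.events_*`
WHERE NOT REGEXP_CONTAINS(
  COALESCE(
    (SELECT value.string_value
     FROM UNNEST(event_params)
     WHERE key = 'page_referrer'),
    ''
  ),
  r'semalt\.com|trafficbooster\.pro'
)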

Solution 4: Looker Studio Filtering

If you build reports in Looker Studio (like Watson does), add a filter to your data source:

Edit your GA4 data source
Add a filter: Exclude Page referrer Contains semalt.com (repeat for each spam domain)
Apply to all reports using that data source

Case Closed

Finding referral spam manually requires constant vigilance—checking Traffic Acquisition reports, building custom Explorations, maintaining regex patterns, and updating filters as new spam domains emerge. It's a time-consuming game of whack-a-mole.

The Watson Analytics Detective dashboard spots this Critical error instantly, alongside 60+ other data quality checks. Watson automatically identifies likely spam sessions by cross-referencing known spam domains and behavioral patterns (zero engagement, abnormal geographic clusters, suspicious referrer strings). It calculates the percentage of your referral traffic that's fraudulent and flags it for immediate action—saving you hours of manual investigation.

Stop chasing spam referrals. Let Watson do the detective work. Explore Watson Analytics Detective →

Ali Izadi