Detect & Verify Long Mediums in GA4
Fix Extra Long Mediums in GA4
The Case File
Your GA4 reports show session medium values exceeding 20 characters. While not a critical data breach, these bloated parameters are quietly corrupting your channel grouping accuracy and fragmenting your traffic attribution reports.
The symptom: Instead of clean, standard medium values like cpc, email, or social, you're seeing entries like email-newsletter-promotional-campaign or social-media-organic-facebook-post. These verbose parameters break GA4's default channel grouping logic and scatter your traffic data across dozens of unintended categories.
What this check measures: Session medium length. GA4 allows up to 100 characters for utm_medium parameters, but best practices dictate keeping values under 20 characters. Anything longer typically signals a UTM tagging error, not an intentional naming choice.
The Root Causes
1. UTM Parameter Misunderstanding
Many marketers treat utm_medium as a descriptive field rather than a categorical classifier. The medium parameter should answer "how did the traffic arrive?" not "what was the specific campaign?"
Common mistakes:
Using utm_medium=spring-sale-promotional-email-blast instead of utm_medium=email
Combining multiple attributes: utm_medium=paid-social-facebook-carousel-ad instead of utm_medium=paidsocial
Including campaign details that belong in utm_campaign or utm_content
According to Google's official documentation, standard medium values should be broad categorical identifiers like cpc (cost-per-click), email, social, referral, affiliate, or display.
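A quick way to apply this rule is to check every campaign URL before it ships. The following is a minimal Python sketch, not an official tool: the function name check_medium, the allow-list, and the 20-character threshold are assumptions drawn from the values named in this section.

```python
from urllib.parse import urlsplit, parse_qs

# Assumed allow-list, taken from the standard values named above.
STANDARD_MEDIUMS = {"cpc", "email", "social", "referral", "affiliate", "display"}
MAX_MEDIUM_LENGTH = 20  # threshold used by this check, not a GA4 limit


def check_medium(url: str) -> list[str]:
    """Return a list of problems found with the utm_medium in `url`."""
    query = parse_qs(urlsplit(url).query)
    values = query.get("utm_medium")
    if not values:
        return ["utm_medium is missing"]
    medium = values[0]
    problems = []
    if len(medium) > MAX_MEDIUM_LENGTH:
        problems.append(
            f"utm_medium is {len(medium)} characters (limit {MAX_MEDIUM_LENGTH})"
        )
    if medium not in STANDARD_MEDIUMS:
        problems.append(f"utm_medium '{medium}' is not an approved value")
    return problems
```

An empty list means the URL passes. Adjust the allow-list to match your own documented conventions.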
2. Automated URL Builder Concatenation Errors
Marketing automation platforms and URL builders can inadvertently create long medium values through variable concatenation.
Technical scenarios:
Template errors: A tracking template concatenates multiple variables ({campaign_type}-{ad_format}-{placement}), producing utm_medium=performance-video-feed
Dynamic parameter generation: CRM systems or email platforms that auto-generate UTM parameters by combining database fields
Spreadsheet formula mistakes: Excel/Google Sheets formulas that concatenate cells without validation (for example, =A2&"-"&B2&"-"&C2), creating unintentionally long strings
3. URL Encoding Expansion
Special characters in UTM parameters get percent-encoded, dramatically expanding character count.
Example transformation:
Original: utm_medium=email newsletter
Encoded: utm_medium=email%20newsletter (adds 2 characters per space)
With multiple spaces or special characters: utm_medium=social%20media%20-%20organic (28 characters)
Spaces, ampersands, quotes, and non-ASCII characters all trigger encoding, turning a 15-character medium into a 25+ character string.
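To measure the expansion for your own values, here is a small Python sketch using the standard library's urllib.parse.quote. The helper name encoded_growth is hypothetical; it only compares the raw and percent-encoded lengths of a value.

```python
from urllib.parse import quote


def encoded_growth(medium: str) -> tuple[int, int]:
    """Return (raw_length, percent_encoded_length) for a medium value."""
    # safe="" forces every reserved character to be encoded.
    encoded = quote(medium, safe="")
    return len(medium), len(encoded)


# The two examples from this section:
print(encoded_growth("email newsletter"))
print(encoded_growth("social media - organic"))
```

Each space grows from one character to three, and non-ASCII characters grow more because every UTF-8 byte is encoded separately.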
4. GTM Variable Misconfiguration
Google Tag Manager implementations can create long medium values through improper variable setup.
Common GTM issues:
Data Layer concatenation: Variables that combine multiple dataLayer values without length validation
Lookup table errors: Variable lookup tables that return full descriptive strings instead of abbreviated codes
RegEx extraction failures: RegEx variables that capture too much of a URL or string, failing to extract only the intended medium value
Default value mistakes: Setting overly descriptive default values like organic-social-media-referral instead of (not set)
5. Third-Party Platform Defaults
Some advertising and marketing platforms generate their own UTM parameters with verbose naming conventions.
Platform-specific examples:
Email service providers (ESPs) that auto-generate: utm_medium=email-automated-workflow-trigger
Social media management tools using: utm_medium=social-organic-scheduled-post
Affiliate networks with: utm_medium=affiliate-commission-based-referral
6. Legacy Migration Issues
Organizations migrating from Universal Analytics or other platforms may carry over non-standard naming conventions that worked differently in previous systems.
Migration pitfalls:
Universal Analytics was more forgiving with channel grouping pattern matching
Custom channel definitions in UA that accommodated long medium values
Historical tracking templates not updated for GA4's stricter matching logic
The "So What?" (Business Impact)
Broken Channel Grouping
GA4's default channel grouping matches medium values against fixed lists and regular expressions. According to Google's official documentation, paid channels like "Paid Social" look for a medium matching ^(.*cp.*|ppc|retargeting|paid.*)$ combined with a recognized social source.
The problem: A medium value of paid-social-facebook-carousel won't match the pattern ^(social|social-network|social-media|sm|social network|social media)$ that describes the medium values accepted for the Organic Social channel. If the source isn't on Google's list of recognized social sites either, the traffic gets dumped into "Unassigned" or misclassified entirely.
Business consequence: Your Traffic Acquisition report shows fragmented data. Instead of seeing consolidated "Paid Social" performance, you see dozens of individual medium variations, making it impossible to compare channel effectiveness.
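To see why the match fails, here is a minimal Python sketch that applies the Organic Social pattern quoted above to a few medium values. The function name is hypothetical, and GA4 evaluates its channel rules on its own servers; this only mirrors the anchored match locally.

```python
import re

# Organic Social medium pattern as quoted in this section.
ORGANIC_SOCIAL = re.compile(
    r"^(social|social-network|social-media|sm|social network|social media)$"
)


def matches_organic_social(medium: str) -> bool:
    """True when the whole medium value is one of the accepted alternatives."""
    return ORGANIC_SOCIAL.fullmatch(medium) is not None


for value in ["social", "social-media", "paid-social-facebook-carousel"]:
    print(value, matches_organic_social(value))
```

Because the pattern is anchored at both ends, any extra text in the medium value prevents a match, even when the value contains the word "social".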
Reporting Fragmentation
Each unique medium value creates a separate row in your reports. Ten variations of what should be "email" (like email-newsletter, email-promotional, email-transactional-automated) create ten rows instead of one.
Impact on analysis:
Impossible trend analysis: Month-over-month comparisons fail when medium names change
Broken dashboards: Looker Studio reports with hardcoded filters miss new medium variations
Wasted analyst time: Hours spent manually consolidating data that should be grouped automatically
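The manual consolidation looks roughly like the Python sketch below. It is a crude heuristic that assumes the intended medium is the text before the first hyphen, which will not hold for values like paid-social. The session counts in the usage example are made up for illustration.

```python
from collections import Counter


def consolidate(rows: dict[str, int]) -> Counter:
    """Sum sessions per leading segment of each medium value."""
    totals: Counter = Counter()
    for medium, sessions in rows.items():
        # Assumption: the text before the first hyphen is the real medium.
        root = medium.lower().split("-", 1)[0]
        totals[root] += sessions
    return totals


# Hypothetical report rows (medium -> sessions).
fragmented = {
    "email-newsletter": 120,
    "email-promotional": 80,
    "email-transactional-automated": 40,
    "email": 10,
}
print(consolidate(fragmented))
```

Four report rows collapse into a single email row. Fixing the tags at the source removes the need for this step altogether.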
Attribution Data Loss
GA4's attribution models rely on accurate channel classification. When traffic is misclassified due to long or non-standard medium values, your attribution reports become unreliable.
Specific failures:
ROAS calculations: Paid channel performance metrics exclude misclassified campaigns
Conversion path analysis: Multi-touch attribution breaks when the same channel appears under different names
Audience building: Audiences based on traffic source/medium fail to capture all relevant users
Data Quality Perception
For organizations with data governance requirements or those preparing for audits, inconsistent UTM tagging signals broader data quality issues.
Stakeholder concerns:
Executive dashboards show unexplained "Unassigned" traffic spikes
Marketing teams lose confidence in GA4 data accuracy
Budget allocation decisions based on flawed channel performance data
The Investigation
Method 1: Traffic Acquisition Report Analysis
Steps to identify long mediums manually:
Navigate to Reports > Acquisition > Traffic acquisition
Click the pencil icon (Customize report) in the top right
Under Report data, change the dimension from "Session default channel group" to "Session medium"
Click Apply
Scan the list for medium values that appear unusually long or descriptive
Look for patterns: multiple words, hyphens connecting several terms, or encoded characters (%20, %2D)
Red flags to watch for:
Medium values longer than your screen width
Values with 3+ hyphenated segments
Encoded special characters (%20, %26, %2C)
Descriptive phrases instead of categorical terms
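If you export the medium list, the red flags above can be checked programmatically. This is a minimal Python sketch under stated assumptions: the function name red_flags is hypothetical, and the 20-character threshold stands in for "longer than your screen width".

```python
import re

# Matches a percent-encoded byte such as %20 or %2C.
ENCODED_CHAR = re.compile(r"%[0-9A-Fa-f]{2}")


def red_flags(medium: str, max_length: int = 20) -> list[str]:
    """Return the red flags that apply to a single medium value."""
    flags = []
    if len(medium) > max_length:
        flags.append("too long")
    if len(medium.split("-")) >= 3:
        flags.append("3+ hyphenated segments")
    if ENCODED_CHAR.search(medium):
        flags.append("percent-encoded characters")
    return flags


for value in ["email", "social-media-organic-facebook-post", "email%20newsletter"]:
    print(value, red_flags(value))
```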
Method 2: Exploration Report with Regex Filter
For more precise analysis, create a custom exploration:
Navigate to Explore and create a new exploration
Choose the Free form template
Add Session medium as a dimension
Add Sessions as a metric
Add both to the report and sort by Sessions descending to see the highest-traffic values first
In Tab Settings, add a filter on Session medium with the "matches regex" condition and the expression ^.{21,}$
GA4 explorations have no string-length function for dimensions, so the regex filter does the job: ^.{21,}$ matches only values longer than 20 characters, which isolates the problematic entries.
Method 3: BigQuery Analysis (for GA4 BigQuery Export users)
Query your GA4 BigQuery export to identify all medium values exceeding 20 characters:
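-- Note: traffic_source holds the user's first-touch medium. Newer exports
-- also expose the session-scoped collected_traffic_source.manual_medium.
SELECT
  traffic_source.medium AS session_medium,
  LENGTH(traffic_source.medium) AS medium_length,
  -- A session is identified by user_pseudo_id plus ga_session_id.
  -- CONCAT requires strings, so the integer session ID is cast first.
  COUNT(DISTINCT CONCAT(
    user_pseudo_id,
    CAST((SELECT value.int_value FROM UNNEST(event_params)
          WHERE key = 'ga_session_id') AS STRING)
  )) AS sessions
FROM
  `project.dataset.events_*`
WHERE
  _TABLE_SUFFIX BETWEEN '20250101' AND '20250131'
  AND traffic_source.medium IS NOT NULL
  AND LENGTH(traffic_source.medium) > 20
GROUP BY
  session_medium, medium_length
ORDER BY
  sessions DESC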
This query returns all medium values over 20 characters with session counts, helping you prioritize fixes.
Method 4: Real-Time Debugging
To catch long mediums as they occur:
Go to Reports > Realtime
Under "Event count by Event name," click View event count by Page title and screen name
Change dimension to Session source/medium
Click on any entry to see the full source/medium combination
Test your campaign URLs in real-time to verify medium values before full deployment
The Solution
Fix 1: Establish UTM Naming Conventions
Create a documented standard for your organization.
Recommended medium values by channel:
| Channel Type | utm_medium Value | Use Case |
|---|---|---|
| Organic Search | organic | Natural search results |
| Paid Search | cpc or ppc | Google Ads, Bing Ads |
| Display Advertising | display | Banner ads, programmatic |
| Paid Social | social or paidsocial | Facebook Ads, LinkedIn Ads |
| Organic Social | social | Unpaid social posts |
| Email Marketing | email | All email campaigns |
| Affiliate Marketing | affiliate | Affiliate partner links |
| Referral | referral | Partner websites, backlinks |
Implementation steps:
Create a master UTM documentation sheet with approved values
Limit medium values to 5-10 standard options for your organization
Use utm_campaign and utm_content for specificity (e.g., utm_medium=email&utm_campaign=spring-newsletter-2025)
Enforce lowercase only (GA4 is case-sensitive: "Email" ≠ "email")
Avoid spaces in multi-word values: join the words or separate them with hyphens (e.g., paidsocial or paid-social, not paid social)
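The lowercase and no-spaces rules are easy to enforce in code before a value reaches a URL. The following is a minimal Python sketch; normalize_medium is a hypothetical helper, and it only handles case and whitespace, not the approved-value check.

```python
import re


def normalize_medium(raw: str) -> str:
    """Trim, lowercase, and replace runs of whitespace with a single hyphen."""
    value = raw.strip().lower()
    return re.sub(r"\s+", "-", value)


for raw in ["Email", " Paid Social ", "paid  social"]:
    print(repr(raw), "->", normalize_medium(raw))
```

Normalizing at the point of entry keeps "Email" and "email" from ever becoming two separate rows.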
Fix 2: Audit and Update Existing Campaign URLs
For active campaigns:
Inventory all link sources: Email templates, social media schedulers, ad platforms, affiliate dashboards
Search for long medium values: Use Find/Replace in your URL management system
Update systematically by channel:
Email: Replace email-newsletter-promotional → email
Social: Replace social-media-organic-facebook → social
Paid: Replace paid-search-google-brand-campaign → cpc
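When you have many URLs to update, the replacements above can be scripted. This Python sketch assumes you can export your links as plain strings; the MEDIUM_FIXES mapping and the fix_medium function are illustrative names, and the mapping holds only the three replacements listed above.

```python
from urllib.parse import parse_qsl, urlencode, urlsplit, urlunsplit

# Long value -> standard value, taken from the replacements listed above.
MEDIUM_FIXES = {
    "email-newsletter-promotional": "email",
    "social-media-organic-facebook": "social",
    "paid-search-google-brand-campaign": "cpc",
}


def fix_medium(url: str) -> str:
    """Rewrite utm_medium when it has a known fix; leave other parameters alone."""
    parts = urlsplit(url)
    pairs = [
        (key, MEDIUM_FIXES.get(value, value) if key == "utm_medium" else value)
        for key, value in parse_qsl(parts.query, keep_blank_values=True)
    ]
    return urlunsplit(parts._replace(query=urlencode(pairs)))


print(fix_medium(
    "https://example.com/sale?utm_source=newsletter"
    "&utm_medium=email-newsletter-promotional&utm_campaign=spring"
))
```

One caveat: urlencode re-encodes the whole query string, so a space written as %20 in another parameter comes back as +. Both decode to the same value, but compare the output before bulk-replacing links.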
For Google Ads:
Navigate to Settings > Account settings
Check that auto-tagging is enabled (this uses gclid instead of UTM parameters)
If using manual tagging, update your Tracking template at account or campaign level
Use ValueTrack parameters properly: utm_medium=cpc (static) not utm_medium={campaignid}-{adgroupid}
For email platforms:
Access your ESP's UTM parameter settings (varies by platform)
Update the medium template from dynamic fields to static value: email
Move campaign-specific details to utm_campaign: {campaign_name} or {mailing_id}
Fix 3: Implement URL Builder Governance
Create a centralized URL generation process:
Use Google's Campaign URL Builder: https://ga-dev-tools.google/campaign-url-builder/
Or create a custom spreadsheet builder with dropdown validation:
Column A: Base URL
Column B: utm_source (dropdown with approved values)
Column C: utm_medium (dropdown with 5-10 approved values)
Column D: utm_campaign (free text with character limit)
Column E: Generated URL (formula-based)
Spreadsheet formula example:
=A2&"?utm_source="&B2&"&utm_medium="&C2&"&utm_campaign="&SUBSTITUTE(LOWER(D2)," ","-")
Add data validation:
Restrict Column C (medium) to dropdown list only
Set character limit for campaign names (50 characters max)
Use conditional formatting to highlight URLs exceeding 200 characters total
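If your team generates links in code instead of a spreadsheet, the same governance rules can live in a small builder function. This is a sketch under assumptions: build_url is a hypothetical name, the approved list mirrors the recommended medium values in this article, and the base URL is assumed to have no existing query string (as in the spreadsheet formula).

```python
from urllib.parse import urlencode

# Assumed approved values, mirroring the recommended mediums in this article.
APPROVED_MEDIUMS = {
    "organic", "cpc", "ppc", "display", "social",
    "paidsocial", "email", "affiliate", "referral",
}
MAX_CAMPAIGN_LENGTH = 50   # character limit for campaign names
MAX_URL_LENGTH = 200       # total URL length to flag


def build_url(base_url: str, source: str, medium: str, campaign: str) -> str:
    """Build a tagged URL, rejecting values that break the conventions."""
    if medium not in APPROVED_MEDIUMS:
        raise ValueError(f"utm_medium '{medium}' is not an approved value")
    # Same transformation as the spreadsheet formula: lowercase, spaces to hyphens.
    slug = campaign.strip().lower().replace(" ", "-")
    if len(slug) > MAX_CAMPAIGN_LENGTH:
        raise ValueError("utm_campaign exceeds the character limit")
    query = urlencode(
        {"utm_source": source, "utm_medium": medium, "utm_campaign": slug}
    )
    url = f"{base_url}?{query}"
    if len(url) > MAX_URL_LENGTH:
        raise ValueError("generated URL exceeds the total length limit")
    return url


print(build_url("https://example.com/sale", "newsletter", "email", "Spring Newsletter 2025"))
```

Raising an error instead of silently truncating forces the person creating the link to choose an approved value.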
Fix 4: Clean Historical Data (Advanced)
While you can't change historical GA4 data, you can create calculated dimensions for reporting:
In Looker Studio:
Create a calculated field in your GA4 data source
Name: "Cleaned Medium"
Formula:
CASE
  WHEN REGEXP_CONTAINS(Session medium, "email") THEN "email"
  WHEN REGEXP_CONTAINS(Session medium, "social") THEN "social"
  WHEN REGEXP_CONTAINS(Session medium, "cpc|ppc|paid.*search") THEN "cpc"
  WHEN REGEXP_CONTAINS(Session medium, "display|banner") THEN "display"
  ELSE Session medium
END
In BigQuery:
Create a view with cleaned medium values:
CREATE OR REPLACE VIEW `project.dataset.cleaned_sessions` AS
SELECT
  *,
  -- SELECT * omits the _TABLE_SUFFIX pseudo-column, so expose it for date filtering.
  _TABLE_SUFFIX AS table_suffix,
  -- Rules are checked in order; the first match wins. (?i) makes each match case-insensitive.
  CASE
    WHEN REGEXP_CONTAINS(traffic_source.medium, r'(?i)email') THEN 'email'
    WHEN REGEXP_CONTAINS(traffic_source.medium, r'(?i)social') THEN 'social'
    WHEN REGEXP_CONTAINS(traffic_source.medium, r'(?i)cpc|ppc') THEN 'cpc'
    WHEN REGEXP_CONTAINS(traffic_source.medium, r'(?i)display') THEN 'display'
    ELSE traffic_source.medium
  END AS cleaned_medium
FROM
  `project.dataset.events_*`
Fix 5: Ongoing Monitoring
Set up alerts:
Create a custom insight in GA4 (from the Insights card, choose View all insights > Create)
Condition: When "Sessions" from "Session medium" contains unusual patterns
Unfortunately, GA4 doesn't support length-based alerts directly
Alternative: Weekly audit process:
Schedule weekly exploration report review
Filter for medium values over 20 characters
Trace back to source (email template, ad account, etc.)
Fix at source before data accumulates
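The filtering step of the weekly audit can be scripted against a CSV export. The Python sketch below assumes a cleaned export with exactly two column headers, "Session medium" and "Sessions"; a real GA4 download may include extra header lines that you need to strip first. The function name long_mediums and the sample rows are illustrative.

```python
import csv
import io


def long_mediums(csv_text: str, max_length: int = 20) -> list[tuple[str, int]]:
    """Return (medium, sessions) pairs over the limit, busiest first."""
    reader = csv.DictReader(io.StringIO(csv_text))
    offenders = [
        (row["Session medium"], int(row["Sessions"]))
        for row in reader
        if len(row["Session medium"]) > max_length
    ]
    return sorted(offenders, key=lambda pair: pair[1], reverse=True)


# Made-up sample rows in the assumed export format.
sample = (
    "Session medium,Sessions\n"
    "email,500\n"
    "email-newsletter-promotional-campaign,42\n"
    "social-media-organic-facebook-post,97\n"
)
for medium, sessions in long_mediums(sample):
    print(sessions, medium)
```

Sorting by sessions puts the values that distort your reports most at the top, so you know which source to trace first.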
Use Watson Analytics Detective:
The manual auditing process described above takes 15-30 minutes weekly. Watson automates this check, flagging long mediums instantly alongside 60+ other data quality issues.
Case Closed
Finding extra long mediums manually requires building custom explorations, creating calculated fields, or writing BigQuery SQL. Even then, you're only seeing the problem after it's already contaminated your reports.
The Watson Analytics Detective dashboard spots this Advice-level error instantly, displaying all session medium values exceeding 20 characters with session counts and trend analysis. It sits alongside 50+ other automated checks for PII breaches, referral spam, broken eCommerce tracking, and attribution issues—all in a single Looker Studio dashboard connected to your GA4 property.
Stop manually hunting for data quality issues. Let Watson investigate while you focus on insights.