Detect & Verify Long Mediums in GA4

Fix Extra Long Mediums in GA4

The Case File

Your GA4 reports show session medium values exceeding 20 characters. While not a critical data breach, these bloated parameters are quietly corrupting your channel grouping accuracy and fragmenting your traffic attribution reports.

The symptom: Instead of clean, standard medium values like cpc, email, or social, you're seeing entries like email-newsletter-promotional-campaign or social-media-organic-facebook-post. These verbose parameters break GA4's default channel grouping logic and scatter your traffic data across dozens of unintended categories.

What this check measures: Session medium length. GA4 allows up to 100 characters for utm_medium parameters, but standard medium values such as cpc, email, or referral are far shorter, so this check flags anything over 20 characters. A value that long typically signals a UTM tagging error, not an intentional naming choice.

The Root Causes

1. UTM Parameter Misunderstanding

Many marketers treat utm_medium as a descriptive field rather than a categorical classifier. The medium parameter should answer "how did the traffic arrive?" not "what was the specific campaign?"

Common mistakes:

  • Using utm_medium=spring-sale-promotional-email-blast instead of utm_medium=email

  • Combining multiple attributes: utm_medium=paid-social-facebook-carousel-ad instead of utm_medium=paid-social (a plain utm_medium=social would be counted as Organic Social)

  • Including campaign details that belong in utm_campaign or utm_content

According to Google's official documentation, standard medium values should be broad categorical identifiers like cpc (cost-per-click), email, social, referral, affiliate, or display.

2. Automated URL Builder Concatenation Errors

Marketing automation platforms and URL builders can inadvertently create long medium values through variable concatenation.

Technical scenarios:

  • Template errors: A tracking template concatenates multiple variables: {campaign_type}-{ad_format}-{placement} resulting in utm_medium=performance-video-feed

  • Dynamic parameter generation: CRM systems or email platforms that auto-generate UTM parameters by combining database fields

  • Spreadsheet formula mistakes: Excel/Google Sheets formulas that concatenate cells without proper validation: =A2&"-"&B2&"-"&C2 creating unintended long strings

3. URL Encoding Expansion

Special characters in UTM parameters get percent-encoded, and every encoded byte turns into a three-character %XX sequence, inflating the character count.

Example transformation:

  • Original: utm_medium=email newsletter

  • Encoded: utm_medium=email%20newsletter (adds 2 characters per space)

  • With multiple spaces or special characters: utm_medium=social%20media%20-%20organic (28 characters)

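Spaces, ampersands, quotes, and non-ASCII characters all trigger encoding. Each space or ASCII symbol becomes three characters, and an accented letter such as é becomes six (%C3%A9, one %XX per UTF-8 byte), so the 22-character social media - organic grows to 28 characters once encoded.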
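Python's standard urllib.parse.quote applies the same percent-encoding rules, so it offers a quick way to see how much a value grows once encoded. A minimal sketch; the sample values are illustrative, and the French medium is a made-up example of non-ASCII input:

```python
from urllib.parse import quote


def encoded_length(medium):
    """Return the percent-encoded form of a medium value and its length."""
    # safe="" means only unreserved characters (letters, digits, "-", ".", "_", "~")
    # stay as-is; every other UTF-8 byte becomes a three-character %XX sequence.
    encoded = quote(medium, safe="")
    return encoded, len(encoded)


for raw in ["email newsletter", "social media - organic", "réseaux sociaux"]:
    encoded, length = encoded_length(raw)
    print(f"{raw!r} ({len(raw)} chars) -> {encoded} ({length} chars)")
```

With these inputs the three values grow from 16, 22, and 15 characters to 18, 28, and 22, pushing the last two past the 20-character threshold.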

4. GTM Variable Misconfiguration

Google Tag Manager implementations can create long medium values through improper variable setup.

Common GTM issues:

  • Data Layer concatenation: Variables that combine multiple dataLayer values without length validation

  • Lookup table errors: Variable lookup tables that return full descriptive strings instead of abbreviated codes

  • RegEx extraction failures: RegEx variables that capture too much of a URL or string, failing to extract only the intended medium value

  • Default value mistakes: Setting overly descriptive default values like organic-social-media-referral instead of (not set)
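The RegEx extraction failure is easy to reproduce outside GTM. GTM evaluates its patterns in JavaScript rather than Python, but a greedy capture misbehaves the same way in both for a pattern this simple. A minimal sketch (the URL is a made-up example) comparing a greedy capture with one bounded at the next & or #:

```python
import re

# Made-up landing page URL with three UTM parameters.
URL = "https://example.com/?utm_source=newsletter&utm_medium=email&utm_campaign=spring-sale"

# Greedy: ".*" runs to the end of the string and swallows every later parameter.
GREEDY = re.compile(r"utm_medium=(.*)")
# Bounded: the capture stops at the next "&" or "#", so only the medium is returned.
BOUNDED = re.compile(r"[?&]utm_medium=([^&#]*)")


def extract_medium(pattern, url):
    """Return the first captured group, or None when utm_medium is absent."""
    match = pattern.search(url)
    return match.group(1) if match else None


print(extract_medium(GREEDY, URL))   # email&utm_campaign=spring-sale
print(extract_medium(BOUNDED, URL))  # email
```

The greedy version returns a 30-character "medium" that is really the rest of the query string, exactly the kind of value this check flags.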

5. Third-Party Platform Defaults

Some advertising and marketing platforms generate their own UTM parameters with verbose naming conventions.

Platform-specific examples:

  • Email service providers (ESPs) that auto-generate: utm_medium=email-automated-workflow-trigger

  • Social media management tools using: utm_medium=social-organic-scheduled-post

  • Affiliate networks with: utm_medium=affiliate-commission-based-referral

6. Legacy Migration Issues

Organizations migrating from Universal Analytics or other platforms may carry over non-standard naming conventions that worked differently in previous systems.

Migration pitfalls:

  • Universal Analytics was more forgiving with channel grouping pattern matching

  • Custom channel definitions in UA that accommodated long medium values

  • Historical tracking templates not updated for GA4's stricter matching logic

The "So What?" (Business Impact)

Broken Channel Grouping

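GA4's default channel grouping assigns each session to a channel by checking its source and medium against fixed rules: some are exact-match lists, others regular expressions. According to Google's official documentation, Email requires a source or medium of exactly email, e-mail, e_mail, or e mail; Organic Social requires a source on Google's list of social sites or a medium of exactly social, social-network, social-media, sm, social network, or social media; and Paid Social requires a social source plus a medium matching ^(.*cp.*|ppc|retargeting|paid.*)$.

The problem: A medium value of social-media-organic-facebook-post matches none of the Organic Social values, and email-newsletter-promotional-campaign fails the Email rule the same way. Unless the source alone satisfies some other rule, these sessions end up in "Unassigned". Long values can also be misclassified: paid-social-facebook-carousel matches the paid pattern because it starts with paid, so with a source that isn't on Google's social list it lands in "Paid Other" rather than "Paid Social".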

Business consequence: Your Traffic Acquisition report shows fragmented data. Instead of seeing consolidated "Paid Social" performance, you see dozens of individual medium variations, making it impossible to compare channel effectiveness.
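To see how these rules treat long mediums, here is a simplified Python sketch of four GA4 default channels (Paid Social, Paid Other, Organic Social, Email), using the medium values and paid-medium pattern from Google's default channel group documentation as we read it, checked in the relative order we understand GA4 to use. The social source list is a short hypothetical stand-in for Google's much longer list, and every other channel is left out, so treat this as an illustration rather than a reimplementation:

```python
import re

# Hypothetical, heavily abbreviated stand-in for Google's list of social sources.
SOCIAL_SOURCES = {"facebook", "instagram", "linkedin", "t.co"}

EMAIL_VALUES = {"email", "e-mail", "e_mail", "e mail"}
ORGANIC_SOCIAL_MEDIUMS = {"social", "social-network", "social-media", "sm",
                          "social network", "social media"}
PAID_MEDIUM = re.compile(r"^(.*cp.*|ppc|retargeting|paid.*)$")


def simplified_channel(source, medium):
    """Approximate four GA4 default channels; all other channels are ignored."""
    is_paid = PAID_MEDIUM.match(medium) is not None
    if source in SOCIAL_SOURCES and is_paid:
        return "Paid Social"
    if is_paid:
        return "Paid Other"
    if source in SOCIAL_SOURCES or medium in ORGANIC_SOCIAL_MEDIUMS:
        return "Organic Social"
    if source in EMAIL_VALUES or medium in EMAIL_VALUES:
        return "Email"
    # Channels not modeled here (Referral, Display, Organic Search, ...) are skipped,
    # so this fallback is only meaningful for the examples below.
    return "Unassigned"


for source, medium in [
    ("newsletter", "email"),
    ("newsletter", "email-newsletter-promotional-campaign"),
    ("scheduler-tool", "social-media-organic-facebook-post"),
    ("fb-ads", "paid-social-facebook-carousel"),
]:
    print(f"{source} / {medium}: {simplified_channel(source, medium)}")
```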

Reporting Fragmentation

Each unique medium value creates a separate row in your reports. Ten variations of what should be "email" (such as email-newsletter, email-promotional, or email-transactional-automated) create ten rows instead of one.

Impact on analysis:

  • Impossible trend analysis: Month-over-month comparisons fail when medium names change

  • Broken dashboards: Looker Studio reports with hardcoded filters miss new medium variations

  • Wasted analyst time: Hours spent manually consolidating data that should be grouped automatically

Attribution Data Loss

GA4's attribution models rely on accurate channel classification. When traffic is misclassified due to long or non-standard medium values, your attribution reports become unreliable.

Specific failures:

  • ROAS calculations: Paid channel performance metrics exclude misclassified campaigns

  • Conversion path analysis: Multi-touch attribution breaks when the same channel appears under different names

  • Audience building: Audiences based on traffic source/medium fail to capture all relevant users

Data Quality Perception

For organizations with data governance requirements or those preparing for audits, inconsistent UTM tagging signals broader data quality issues.

Stakeholder concerns:

  • Executive dashboards show unexplained "Unassigned" traffic spikes

  • Marketing teams lose confidence in GA4 data accuracy

  • Budget allocation decisions based on flawed channel performance data

The Investigation

Method 1: Traffic Acquisition Report Analysis

Steps to identify long mediums manually:

  1. Navigate to Reports > Acquisition > Traffic acquisition

  2. Click the pencil icon (Customize report) in the top right

  3. Under Report data, change the dimension from "Session default channel group" to "Session medium"

  4. Click Apply

  5. Scan the list for medium values that appear unusually long or descriptive

  6. Look for patterns: multiple words, hyphens connecting several terms, or encoded characters (%20, %2D)

Red flags to watch for:

  • Medium values longer than 20 characters, the threshold this check uses

  • Values with 3+ hyphenated segments

  • Encoded special characters (%20, %26, %2C)

  • Descriptive phrases instead of categorical terms
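If you export the Session medium column (for example, as a CSV download of the report), the red flags can be checked in bulk. A minimal sketch: the 20-character threshold comes from this check, and the space test is our own rough stand-in for "descriptive phrases":

```python
import re

PERCENT_ENCODED = re.compile(r"%[0-9A-Fa-f]{2}")


def red_flags(medium, max_length=20):
    """Return which of the red flags listed above a medium value trips."""
    flags = []
    if len(medium) > max_length:
        flags.append("too long")
    if len(medium.split("-")) >= 3:
        flags.append("3+ hyphenated segments")
    if PERCENT_ENCODED.search(medium):
        flags.append("encoded characters")
    if " " in medium:
        # A literal space is a rough proxy for a descriptive phrase.
        flags.append("contains spaces")
    return flags


for medium in ["email", "social%20media", "email-newsletter-promotional-campaign"]:
    print(medium, red_flags(medium))
```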

Method 2: Exploration Report with a Regex Filter

For more precise analysis, create a custom exploration:

  1. Navigate to Explore > Create a new exploration

  2. Choose Free form template

  3. Add Session medium as a dimension

  4. Add Sessions as a metric

  5. In the Filters section of Tab settings, add a filter on Session medium:

    • Match type: matches regex

    • Expression: .{21,} (any value of 21 or more characters)

  6. Apply the filter

  7. Sort by Sessions descending to see the highest-traffic offenders first

Why not a calculated field: GA4 calculated metrics can only combine numeric metrics, so Explorations have no LENGTH(Session medium) function. If you want an explicit length column, create LENGTH(Session medium) as a calculated field in Looker Studio instead.
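Where a tool offers regex filters but no length function, the pattern .{21,} applied as a full match selects the same values as a length greater than 20 (line breaks aside, since . does not match a newline). A quick Python check of that equivalence, with re.fullmatch standing in for a full-match filter condition:

```python
import re

LONGER_THAN_20 = re.compile(r".{21,}")


def is_long_medium(medium):
    """True when the whole value matches .{21,}, i.e. it has more than 20 characters."""
    # fullmatch anchors both ends, like a full-match ("matches regex") filter condition.
    return LONGER_THAN_20.fullmatch(medium) is not None


for medium in ["cpc", "email", "performance-video-feed", "a" * 20, "a" * 21]:
    # The regex test and a plain length test should agree on every sample.
    assert is_long_medium(medium) == (len(medium) > 20), medium
    print(f"{medium!r}: {is_long_medium(medium)}")
```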

Method 3: BigQuery Analysis (for GA4 BigQuery Export users)

Query your GA4 BigQuery export to identify all medium values exceeding 20 characters:

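```sql
-- collected_traffic_source.manual_medium is the utm_medium collected on each event.
-- (traffic_source.medium is the medium that FIRST acquired the user, not the session's.)
SELECT
  collected_traffic_source.manual_medium AS utm_medium,
  LENGTH(collected_traffic_source.manual_medium) AS medium_length,
  -- A session is user_pseudo_id + ga_session_id; CONCAT needs strings, so cast the ID.
  COUNT(DISTINCT CONCAT(user_pseudo_id, '-',
    CAST((SELECT value.int_value FROM UNNEST(event_params)
          WHERE key = 'ga_session_id') AS STRING))) AS sessions
FROM
  `project.dataset.events_*`
WHERE
  _TABLE_SUFFIX BETWEEN '20250101' AND '20250131'
  AND collected_traffic_source.manual_medium IS NOT NULL
  AND LENGTH(collected_traffic_source.manual_medium) > 20
GROUP BY
  utm_medium, medium_length
ORDER BY
  sessions DESC
```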

This query returns all medium values over 20 characters with session counts, helping you prioritize fixes.
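To check the query's counting logic on a few hand-made rows before running it against the export, the same aggregation can be mirrored in Python. A sketch with hypothetical event rows; it keys each session on the (user_pseudo_id, ga_session_id) pair, and a tuple key avoids the collision a bare string join can produce (u1 + 12 and u11 + 2 both become u112):

```python
from collections import defaultdict

# Hypothetical event rows: (user_pseudo_id, ga_session_id, utm_medium on that event).
EVENTS = [
    ("u1", 12, "email-newsletter-promotional-campaign"),
    ("u1", 12, "email-newsletter-promotional-campaign"),  # second event, same session
    ("u11", 2, "email-newsletter-promotional-campaign"),  # would collide as "u112"
    ("u2", 7, "email"),
    ("u3", 9, None),  # event with no UTM parameters
]


def sessions_by_long_medium(rows, max_length=20):
    """Return (medium, length, distinct sessions) for mediums longer than max_length."""
    sessions = defaultdict(set)
    for user_id, session_id, medium in rows:
        if medium is not None and len(medium) > max_length:
            sessions[medium].add((user_id, session_id))
    result = [(m, len(m), len(keys)) for m, keys in sessions.items()]
    # Highest session count first, like ORDER BY sessions DESC.
    return sorted(result, key=lambda row: row[2], reverse=True)


print(sessions_by_long_medium(EVENTS))
```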

Method 4: Real-Time Debugging

To catch long mediums as they occur:

  1. Go to Reports > Realtime

  2. Under "Event count by Event name," click View event count by Page title and screen name

  3. Change dimension to Session source/medium

  4. Click on any entry to see the full source/medium combination

  5. Test your campaign URLs in real-time to verify medium values before full deployment
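Real-time reports only show a value once traffic arrives; a tagged URL can also be checked before it is sent anywhere. A small sketch using Python's standard urllib.parse (the URLs are made up). It measures the decoded value, which is not necessarily how GA4 will store it:

```python
from urllib.parse import urlsplit, parse_qs


def check_campaign_url(url, max_length=20):
    """Return (decoded utm_medium, passes length check) for a campaign URL."""
    values = parse_qs(urlsplit(url).query).get("utm_medium")
    if not values:
        return None, False  # a missing or empty utm_medium also fails
    medium = values[0]  # parse_qs decodes %20 and "+" back to spaces
    return medium, len(medium) <= max_length


for url in [
    "https://example.com/?utm_source=newsletter&utm_medium=email&utm_campaign=spring-sale",
    "https://example.com/?utm_source=newsletter&utm_medium=email%20newsletter%20promotional%20blast",
]:
    print(check_campaign_url(url))
```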

The Solution

Fix 1: Establish UTM Naming Conventions

Create a documented standard for your organization.

Recommended medium values by channel:

| Channel Type | utm_medium Value | Use Case |
|---|---|---|
| Organic Search | organic | Natural search results |
| Paid Search | cpc or ppc | Google Ads, Bing Ads |
| Display Advertising | display | Banner ads, programmatic |
| Paid Social | paidsocial or cpc | Facebook Ads, LinkedIn Ads |
| Organic Social | social | Unpaid social posts |
| Email Marketing | email | All email campaigns |
| Affiliate Marketing | affiliate | Affiliate partner links |
| Referral | referral | Partner websites, backlinks |

Implementation steps:

  1. Create a master UTM documentation sheet with approved values

  2. Limit medium values to 5-10 standard options for your organization

  3. Use utm_campaign and utm_content for specificity (e.g., utm_medium=email&utm_campaign=spring-newsletter-2025)

  4. Enforce lowercase only (GA4 is case-sensitive: "Email" ≠ "email")

  5. Avoid spaces in multi-word values: use paidsocial or paid-social, not paid social (spaces get percent-encoded)
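The conventions above are easy to enforce in a script that checks each medium before a link goes out. A minimal sketch; APPROVED_MEDIUMS is a hypothetical list drawn from the recommended values table, so swap in your own documented set:

```python
# Hypothetical approved list drawn from the recommended values table; use your own.
APPROVED_MEDIUMS = {"organic", "cpc", "ppc", "display", "paidsocial", "social",
                    "email", "affiliate", "referral"}


def validate_medium(medium):
    """Return a list of convention violations; an empty list means the value passes."""
    problems = []
    if medium != medium.lower():
        problems.append("not lowercase")
    if " " in medium:
        problems.append("contains spaces")
    if medium.lower() not in APPROVED_MEDIUMS:
        problems.append("not on the approved list")
    return problems


for medium in ["email", "Email", "paid social", "email-newsletter-promotional"]:
    print(f"{medium!r}: {validate_medium(medium) or 'ok'}")
```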

Fix 2: Audit and Update Existing Campaign URLs

For active campaigns:

  1. Inventory all link sources: Email templates, social media schedulers, ad platforms, affiliate dashboards

  2. Search for long medium values: Use Find/Replace in your URL management system

  3. Update systematically by channel:

    • Email: Replace email-newsletter-promotional → email

    • Social: Replace social-media-organic-facebook → social

    • Paid: Replace paid-search-google-brand-campaign → cpc
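The find-and-replace step can also be scripted when links live in a CSV or CMS export. A sketch using Python's urllib.parse; MEDIUM_FIXES is a hypothetical map built from the example replacements above:

```python
from urllib.parse import urlsplit, urlunsplit, parse_qsl, urlencode

# Replacement map built from the examples above; extend it with your own audit findings.
MEDIUM_FIXES = {
    "email-newsletter-promotional": "email",
    "social-media-organic-facebook": "social",
    "paid-search-google-brand-campaign": "cpc",
}


def fix_medium(url, fixes=MEDIUM_FIXES):
    """Swap a known-bad utm_medium for its standard value, keeping other parameters."""
    parts = urlsplit(url)
    pairs = parse_qsl(parts.query, keep_blank_values=True)
    fixed = [(key, fixes.get(value, value) if key == "utm_medium" else value)
             for key, value in pairs]
    return urlunsplit(parts._replace(query=urlencode(fixed)))


print(fix_medium("https://example.com/sale?utm_source=newsletter"
                 "&utm_medium=email-newsletter-promotional&utm_campaign=spring-sale"))
```

One caveat: urlencode re-encodes the whole query string, so a space that arrived as %20 leaves as +. Both are standard query-string encodings of a space, but spot-check a few rewritten URLs before replacing links in bulk.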

For Google Ads:

  1. Navigate to Settings > Account settings

  2. Check Auto-tagging is enabled (this uses gclid instead of UTM parameters)

  3. If using manual tagging, add your UTM parameters in the Final URL suffix (or update your Tracking template) at account or campaign level

  4. Use ValueTrack parameters properly: utm_medium=cpc (static) not utm_medium={campaignid}-{adgroupid}

For email platforms:

  1. Access your ESP's UTM parameter settings (varies by platform)

  2. Update the medium template from dynamic fields to static value: email

  3. Move campaign-specific details to utm_campaign: {campaign_name} or {mailing_id}

Fix 3: Implement URL Builder Governance

Create a centralized URL generation process:

  1. Use Google's Campaign URL Builder: https://ga-dev-tools.google/campaign-url-builder/

  2. Or create a custom spreadsheet builder with dropdown validation:

    • Column A: Base URL

    • Column B: utm_source (dropdown with approved values)

    • Column C: utm_medium (dropdown with 5-10 approved values)

    • Column D: utm_campaign (free text with character limit)

    • Column E: Generated URL (formula-based)

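Spreadsheet formula example (the IF switches to & when the base URL already contains a query string):

```
=A2&IF(ISNUMBER(FIND("?",A2)),"&","?")&"utm_source="&B2&"&utm_medium="&C2&"&utm_campaign="&SUBSTITUTE(LOWER(D2)," ","-")
```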

Add data validation:

  • Restrict Column C (medium) to dropdown list only

  • Set character limit for campaign names (50 characters max)

  • Use conditional formatting to highlight URLs exceeding 200 characters total
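The same governance rules can live in a small script for teams that generate links programmatically. A sketch mirroring the spreadsheet: the approved sets are hypothetical placeholders for your dropdown lists, and the 50- and 200-character limits come from the validation rules above:

```python
from urllib.parse import urlencode

# Hypothetical approved values standing in for the dropdown lists.
APPROVED_SOURCES = {"google", "newsletter", "facebook", "linkedin"}
APPROVED_MEDIUMS = {"cpc", "display", "paidsocial", "social", "email", "affiliate", "referral"}
MAX_CAMPAIGN_LENGTH = 50
MAX_URL_LENGTH = 200


def build_campaign_url(base_url, source, medium, campaign):
    """Build a tagged URL, rejecting values the spreadsheet validation would block."""
    if source not in APPROVED_SOURCES:
        raise ValueError(f"unapproved utm_source: {source!r}")
    if medium not in APPROVED_MEDIUMS:
        raise ValueError(f"unapproved utm_medium: {medium!r}")
    # Same normalization as SUBSTITUTE(LOWER(D2), " ", "-").
    campaign = campaign.lower().replace(" ", "-")
    if len(campaign) > MAX_CAMPAIGN_LENGTH:
        raise ValueError(f"utm_campaign longer than {MAX_CAMPAIGN_LENGTH} characters")
    # Append with "&" if the base URL already has a query string.
    separator = "&" if "?" in base_url else "?"
    url = base_url + separator + urlencode(
        {"utm_source": source, "utm_medium": medium, "utm_campaign": campaign})
    if len(url) > MAX_URL_LENGTH:
        # Mirrors the conditional-formatting highlight: warn rather than fail.
        print(f"warning: URL is {len(url)} characters")
    return url


print(build_campaign_url("https://example.com/sale", "newsletter", "email", "Spring Sale 2025"))
```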

Fix 4: Clean Historical Data (Advanced)

While you can't change historical GA4 data, you can create calculated dimensions for reporting:

In Looker Studio:

  1. Create a calculated field in your GA4 data source

  2. Name: "Cleaned Medium"

  3. Formula:

```
CASE
  WHEN REGEXP_CONTAINS(Session medium, "(?i)email") THEN "email"
  WHEN REGEXP_CONTAINS(Session medium, "(?i)social") THEN "social"
  WHEN REGEXP_CONTAINS(Session medium, "(?i)cpc|ppc|paid.*search") THEN "cpc"
  WHEN REGEXP_CONTAINS(Session medium, "(?i)display|banner") THEN "display"
  ELSE Session medium
END
```

In BigQuery:

Create a view with cleaned medium values:

```sql
CREATE OR REPLACE VIEW `project.dataset.cleaned_sessions` AS
SELECT
  *,
  -- Uses the utm_medium collected on each event; traffic_source.medium would give
  -- the user's first-acquisition medium instead of the session's.
  CASE
    WHEN REGEXP_CONTAINS(collected_traffic_source.manual_medium, r'(?i)email') THEN 'email'
    WHEN REGEXP_CONTAINS(collected_traffic_source.manual_medium, r'(?i)social') THEN 'social'
    WHEN REGEXP_CONTAINS(collected_traffic_source.manual_medium, r'(?i)cpc|ppc') THEN 'cpc'
    WHEN REGEXP_CONTAINS(collected_traffic_source.manual_medium, r'(?i)display') THEN 'display'
    ELSE collected_traffic_source.manual_medium
  END AS cleaned_medium
FROM
  `project.dataset.events_*`
```
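Both CASE expressions above follow the same first-match-wins logic. To preview how a list of existing medium values will be mapped before building the field or view, here is a Python sketch of the Looker Studio rules (re.search stands in for REGEXP_CONTAINS, and re.IGNORECASE for the (?i) flag):

```python
import re

# Checked in order; the first matching rule wins, like the CASE expressions.
RULES = [
    (re.compile(r"email", re.IGNORECASE), "email"),
    (re.compile(r"social", re.IGNORECASE), "social"),
    (re.compile(r"cpc|ppc|paid.*search", re.IGNORECASE), "cpc"),
    (re.compile(r"display|banner", re.IGNORECASE), "display"),
]


def clean_medium(medium):
    """Map a messy medium to a standard value; unmatched values pass through unchanged."""
    if medium is None:
        return None
    for pattern, cleaned in RULES:
        if pattern.search(medium):  # search = "contains", like REGEXP_CONTAINS
            return cleaned
    return medium


for medium in ["email-newsletter-promotional", "paid-search-google-brand-campaign",
               "paid-social-facebook-carousel", "referral"]:
    print(f"{medium} -> {clean_medium(medium)}")
```

Because the first match wins, paid-social-facebook-carousel comes out as social, merging paid and organic social. If you report them separately, add a paid-social rule ahead of the generic social rule.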

Fix 5: Ongoing Monitoring

Set up alerts:

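  1. Create a custom insight in GA4 (from the Insights section of your reports) that alerts you when Sessions change sharply

  2. Condition: limit it to specific Session medium values you already know to be problematic

  3. Unfortunately, GA4 doesn't support length-based alerts directly, so an insight can't catch a brand-new long medium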

Alternative: Weekly audit process:

  1. Schedule weekly exploration report review

  2. Filter for medium values over 20 characters

  3. Trace back to source (email template, ad account, etc.)

  4. Fix at source before data accumulates
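The weekly review is mostly a diff: which long mediums are new since last week? A sketch assuming you export Session medium and Sessions from the exploration each week into a simple medium-to-sessions mapping (the export format and sample numbers are hypothetical):

```python
def new_long_mediums(last_week, this_week, max_length=20):
    """Return long mediums present this week but absent last week, busiest first.

    Both arguments map a medium value to its session count.
    """
    fresh = [(medium, sessions) for medium, sessions in this_week.items()
             if len(medium) > max_length and medium not in last_week]
    return sorted(fresh, key=lambda item: item[1], reverse=True)


LAST_WEEK = {"email": 900, "email-newsletter-promotional-campaign": 40}
THIS_WEEK = {
    "email": 950,
    "cpc": 300,
    "email-newsletter-promotional-campaign": 35,
    "social-media-organic-facebook-post": 12,
    "paid-social-facebook-carousel": 60,
}

print(new_long_mediums(LAST_WEEK, THIS_WEEK))
```

Known offenders still show up in the full filtered exploration; the diff just surfaces the new ones to trace back first.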

Use Watson Analytics Detective:

The manual auditing process described above takes 15-30 minutes weekly. Watson automates this check, flagging long mediums instantly alongside 60+ other data quality issues.

Case Closed

Finding extra long mediums manually requires building custom explorations, creating calculated fields, or writing BigQuery SQL. Even then, you're only seeing the problem after it's already contaminated your reports.

The Watson Analytics Detective dashboard spots this Advice-level error instantly, displaying all session medium values exceeding 20 characters with session counts and trend analysis. It sits alongside 50+ other automated checks for PII breaches, referral spam, broken eCommerce tracking, and attribution issues—all in a single Looker Studio dashboard connected to your GA4 property.

Stop manually hunting for data quality issues. Let Watson investigate while you focus on insights.

Explore Watson Analytics Detective →

