May 13, 2026 · 4 min read

Why Your GA4 Data Is Messy (and How to Fix It)

Duplicate sources, (not set) campaigns, and inflated session counts. Here's what causes messy GA4 data and how to clean it up.

analytics utm seo

Open your GA4 source report

Go to Acquisition, then Traffic acquisition, then group by Session source. Scroll down. If you've been running campaigns for more than a few months, you'll find at least three of these problems:

Duplicate sources. facebook and Facebook and fb all listed separately. Same platform, three rows. The "Facebook drove 1,200 sessions" figure you reported counts only one of those rows. The real total is spread across all three, and it's higher.

"(not set)" campaigns. Someone tagged the source and medium but forgot the campaign name. Or a platform auto-tagged the click with gclid but didn't populate the campaign dimension. Now you have traffic you can't attribute to anything specific.

Internal referrals. Your own subdomain showing up as a traffic source. app.yoursite.com referring to www.yoursite.com. When the subdomains aren't measured as one site (different GA4 tags or cookie settings), each crossing can be counted as a fresh visit from a "referrer," inflating your session count and breaking the attribution chain.

"(direct) / (none)" overload. Half your traffic shows as direct. Some of it is real direct traffic. Some of it is dark social (links shared in Slack, iMessage, email clients that strip referrer headers). Some of it is just untagged campaign links. You can't tell which is which.
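To see how much the duplicate rows hide, one option is to export the report and merge the rows yourself. The sketch below assumes a hypothetical list of (source, sessions) pairs and a hand-written alias table. The names and numbers are invented for illustration, not taken from a real property.

```python
from collections import defaultdict

# Hypothetical alias table: variant spellings -> canonical source name.
ALIASES = {"fb": "facebook", "ig": "instagram"}

def canonical_source(raw: str) -> str:
    """Trim and lowercase a source value, then apply the alias table."""
    cleaned = raw.strip().lower()
    return ALIASES.get(cleaned, cleaned)

def merge_sources(rows):
    """Sum sessions per canonical source from (source, sessions) pairs."""
    totals = defaultdict(int)
    for source, sessions in rows:
        totals[canonical_source(source)] += sessions
    return dict(totals)

# Made-up export rows, for illustration only.
rows = [("facebook", 1200), ("Facebook", 340), ("fb", 95), ("google", 2100)]
merged = merge_sources(rows)
print(merged)
```

In this invented example the three Facebook rows collapse into one total, which is the number the report should have shown in the first place.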

Why this happens

GA4 doesn't clean your data. It records what it receives. If your team sends inconsistent UTM values, GA4 stores inconsistent UTM values. If a link has no parameters, GA4 guesses the source from the referrer header (or defaults to direct if there's no referrer).
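The fallback order described above can be sketched in a few lines. This is a deliberately simplified model, not GA4's actual logic: it ignores click IDs such as gclid, search-engine detection, and everything else GA4 does internally. The function name guess_source is hypothetical.

```python
from urllib.parse import urlsplit, parse_qs

def guess_source(landing_url: str, referrer: str = "") -> tuple:
    """Return (source, medium) using a simplified fallback order."""
    params = parse_qs(urlsplit(landing_url).query)
    if "utm_source" in params:
        # Values are kept exactly as received: no lowercasing, no aliasing.
        source = params["utm_source"][0]
        medium = params.get("utm_medium", ["(not set)"])[0]
        return (source, medium)
    if referrer:
        # No UTM parameters: fall back to the referring hostname.
        return (urlsplit(referrer).hostname or "(not set)", "referral")
    # No parameters and no referrer: the visit lands in the direct bucket.
    return ("(direct)", "(none)")
```

Note that "Fb" comes back as "Fb". Nothing in this path normalizes the value, which is why the cleanup has to happen before or after it.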

The root causes are almost always:

  1. No enforced naming conventions. People type what feels right. "fb" feels right to one person. "facebook" feels right to another. Both are correct in their own minds and wrong in your reports.

  2. UTMs on internal links. This is the most damaging mistake. When someone adds UTM parameters to a link between pages on your own site, GA4 records a new campaign partway through the visit. Unlike Universal Analytics, GA4 does not start a new session when the campaign changes, but the damage is similar: one real visit now carries two or three "sources," and conversions can be credited to your own banner instead of the channel that actually brought the visitor.

  3. Missing parameters. A link with utm_source but no utm_campaign still loads the page normally. GA4 records the source but shows "(not set)" for the campaign. It's not an error. It's a gap.

  4. Platform auto-tagging conflicts. Google Ads auto-tags with gclid. If you also add manual UTM parameters, GA4 has to decide which to use. Sometimes they conflict. The result is duplicate entries or misattributed traffic.

How to fix it

Fix the naming problem

Write down your allowed values for utm_source, utm_medium, and utm_campaign. Share it with everyone who creates links. Better yet, use a tool that enforces the values so nobody can deviate.
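If you don't have a tool yet, even a small script can act as the gatekeeper. Here is a minimal sketch that checks a link against an allow-list before anyone publishes it. The allowed values and the check_link name are placeholders, so swap in your own conventions.

```python
from urllib.parse import urlsplit, parse_qs

# Hypothetical allow-list; replace with your team's agreed values.
ALLOWED = {
    "utm_source": {"facebook", "google", "newsletter", "linkedin"},
    "utm_medium": {"paid_social", "cpc", "email", "social"},
}
REQUIRED = ("utm_source", "utm_medium", "utm_campaign")

def check_link(url: str) -> list:
    """Return a list of problems; an empty list means the link passes."""
    params = parse_qs(urlsplit(url).query)
    problems = []
    for name in REQUIRED:
        if name not in params:
            problems.append(f"missing {name}")
            continue
        value = params[name][0]
        allowed = ALLOWED.get(name)
        # utm_campaign has no allow-list here, so any non-empty value passes.
        if allowed is not None and value not in allowed:
            problems.append(f"{name}={value!r} is not an allowed value")
    return problems
```

The check is case-sensitive on purpose: "Facebook" fails just like "fb" does, because GA4 would store it as a separate row.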

For existing data, create a custom channel group in GA4 to merge the duplicates. Go to Admin, then Data display, then Channel groups. Map "fb," "Facebook," and "facebook" to the same channel. This doesn't fix the underlying data (the Session source dimension still lists all three values), but reports grouped by that channel show one row.

Stop tagging internal links

Search your site for internal links with UTM parameters. Common culprits: navigation menus, footer links, CTAs on landing pages that link to other pages on the same domain. Remove the UTMs. Use GA4 event tracking for internal click measurement.
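One way to run that search is to scan each page's HTML for links that stay on your domain but carry utm_ parameters. The sketch below uses only Python's standard library. The class and function names are made up for this example, and it assumes you already have the page HTML as a string.

```python
from html.parser import HTMLParser
from urllib.parse import urljoin, urlsplit, parse_qs

class InternalUtmFinder(HTMLParser):
    """Collect <a href> targets that stay on the site but carry utm_* parameters."""

    def __init__(self, page_url: str, site_domain: str):
        super().__init__()
        self.page_url = page_url
        self.site_domain = site_domain
        self.flagged = []

    def handle_starttag(self, tag, attrs):
        if tag != "a":
            return
        href = dict(attrs).get("href")
        if not href:
            return
        absolute = urljoin(self.page_url, href)  # resolve relative links
        parts = urlsplit(absolute)
        host = parts.hostname or ""
        # Treat the bare domain and any of its subdomains as internal.
        internal = host == self.site_domain or host.endswith("." + self.site_domain)
        has_utm = any(key.startswith("utm_") for key in parse_qs(parts.query))
        if internal and has_utm:
            self.flagged.append(absolute)

def find_internal_utm_links(html: str, page_url: str, site_domain: str) -> list:
    """Return absolute URLs of internal links that include UTM parameters."""
    finder = InternalUtmFinder(page_url, site_domain)
    finder.feed(html)
    return finder.flagged
```

Links to other domains are left alone, since tagging those is the partner's concern, not yours.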

Set up cross-domain tracking

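If you run multiple subdomains (www, app, blog, shop), make sure they all load the same GA4 tag. Subdomains of one root domain share GA4's cookies by default, so visits across them should stay in one session without extra setup. If the journey crosses separate root domains (a checkout hosted on another domain, for example), configure cross-domain measurement: go to Admin, then Data streams, then your stream, then Configure tag settings, then Configure your domains.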

Tag everything external

The "(direct) / (none)" bucket shrinks when you tag more of the links you publish elsewhere that point back to your site. Every email link, every social post, every partner mention should have UTM parameters. The traffic you don't tag is the traffic you can't attribute.
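Tagging by hand is where typos creep in, so it helps to generate the links. A minimal sketch of a builder, assuming you want all three core parameters lowercased and any old utm_ values replaced (add_utm is a hypothetical helper name):

```python
from urllib.parse import urlsplit, urlunsplit, parse_qsl, urlencode

def add_utm(url: str, source: str, medium: str, campaign: str) -> str:
    """Return url with utm_source, utm_medium and utm_campaign set."""
    parts = urlsplit(url)
    # Keep existing non-UTM parameters; drop any stale utm_* values.
    query = [
        (key, value)
        for key, value in parse_qsl(parts.query, keep_blank_values=True)
        if not key.startswith("utm_")
    ]
    query += [
        ("utm_source", source.strip().lower()),
        ("utm_medium", medium.strip().lower()),
        ("utm_campaign", campaign.strip().lower()),
    ]
    return urlunsplit(parts._replace(query=urlencode(query)))

link = add_utm(
    "https://www.yoursite.com/pricing?plan=pro",
    "Newsletter", "Email", "spring_sale",
)
print(link)
```

Lowercasing inside the builder means "Newsletter" and "newsletter" can never end up as two rows.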

Fix it at the source

GA4 cleanup is reactive. You're fixing data after it's already dirty. The proactive fix is to correct UTM values before they reach GA4.

Attri's a.js snippet does this. It reads the URL parameters when the page loads, applies your alias and normalization rules, and rewrites the URL before GA4's tag fires. GA4 never sees "fb." It sees "facebook." The data is clean from the first pageview.
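To make the idea concrete, here is a rough sketch of what alias-and-normalize rewriting looks like. This is not Attri's code: the real snippet runs as JavaScript in the browser, and the rule table and function name below are assumptions made up for illustration. It only shows the shape of the transformation.

```python
from urllib.parse import urlsplit, urlunsplit, parse_qsl, urlencode

# Hypothetical alias rules: raw value -> canonical value, per parameter.
ALIAS_RULES = {
    "utm_source": {"fb": "facebook", "ig": "instagram"},
    "utm_medium": {"e-mail": "email"},
}

def normalize_utm_url(url: str) -> str:
    """Lowercase utm_* values and apply alias rules; leave other parameters alone."""
    parts = urlsplit(url)
    cleaned = []
    for key, value in parse_qsl(parts.query, keep_blank_values=True):
        if key.startswith("utm_"):
            value = value.strip().lower()
            value = ALIAS_RULES.get(key, {}).get(value, value)
        cleaned.append((key, value))
    return urlunsplit(parts._replace(query=urlencode(cleaned)))
```

Given a landing URL with utm_source=FB, the rewritten URL carries utm_source=facebook, and that is the value an analytics tag reading the URL afterwards would pick up.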

That's the difference between cleaning your reports quarterly and never having to clean them at all.

Set up parameter resolution or start with the free UTM builder to enforce conventions on new links.
