The typical B2B database is one provider, 40% coverage, 25% stale data. The waterfall approach combines seven to ten providers in sequence, pays only on hits, and delivers 80%+ coverage. Here's the complete blueprint.
If I had to summarize the work of a GTM Engineer in a single project that delivers the most value in the least time, it would be the waterfall enrichment stack. Not a tactic, not a tool config — a systematic approach that lifts your data layer from "half is guesswork" to "we know who we're approaching."
This is part of our GTM Engineering series. For context, read What is GTM Engineering? first. Here you get the exact playbook.
Why one enrichment provider falls short by definition
The typical early-stage stack: you buy Apollo or ZoomInfo, run your outbound, and accept that 40-60% of your records are incomplete. That's become normal. It's also wrong.
No single enrichment provider has full coverage on the European market, especially the Benelux. Apollo is strong in the US, weak in NL/BE. Cognism is good in EU but misses many scale-ups. LeadMagic finds emails where Apollo fails. Findymail verifies what Hunter delivers. Datagma fills mobile numbers. Lusha handles LinkedIn data differently than Apollo.
The waterfall math is simple: if provider A finds 60% of your records, and provider B finds 50% — with partial overlap — together you hit 75-85% coverage. Provider C adds another 5-10%. By the end of a well-stacked waterfall you reach 80-92% verified data, per benchmarks from Vanderbuild.
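The arithmetic behind that range can be sketched in a few lines. This is a minimal illustration that assumes the providers' hits are statistically independent, which is a simplification: real providers overlap more or less than that, which is why the range above is 75-85% rather than a single number. The function name `combined_coverage` is made up for this example.

```python
from math import prod

def combined_coverage(hit_rates):
    """Coverage of several providers, assuming their hits are independent."""
    # A record stays unmatched only if every provider misses it.
    miss_all = prod(1.0 - rate for rate in hit_rates)
    return 1.0 - miss_all

# Provider A finds 60%, provider B finds 50% (figures from the text).
print(f"A + B: {combined_coverage([0.60, 0.50]):.0%}")  # A + B: 80%
```

If the two providers draw on similar sources, their overlap is larger and the combined figure lands below this estimate; if they complement each other, it lands above.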
The six layers of a complete waterfall
A good waterfall has six distinct layers, each with a specific purpose. Don't skip the details: this is where GTM Engineering work shows up versus tool config.
Layer 1: Identification (who is this person?)
First step: know who you're dealing with. The input is usually an email or LinkedIn URL. The output is a record with name, title, company, and location. Primary providers: Apollo or Cognism as the base. Backup: LeadMagic or People Data Labs.
Expected hit rate: 65-80% on an average B2B list. Dutch leads sit at the low end of that range, American leads at the high end.
Layer 2: Email verification (is this address valid?)
Identification gives you an email. Verification checks whether it exists and is active. Skipping this step leaves you with a 5-15% bounce rate and a damaged domain reputation. Tools: NeverBounce, ZeroBounce, Findymail (verification is often included), Bouncer.
Key rule: verify before you send. A clean list covering 60% of your records always beats a dirty list covering 95%. Landbase documents that bounce rates above 2% reduce inbox placement by 15-25%.
Layer 3: Contact data enrichment (mobile, alternate emails)
For multichannel outbound you need more than one contact point. Mobile numbers (where legal), secondary emails, LinkedIn URLs. Tools: Datagma, Cognism, Lusha, Hunter.
Watch: GDPR. Collecting mobile numbers of European contacts requires a lawful basis, such as legitimate interest. Collecting them for contacts outside your ICP, or without a full opt-out flow, is a legal risk.
Layer 4: Firmographics
Company size, industry, revenue, headquarters, tech stack. Tools: Crunchbase, BuiltWith, ZoomInfo, Apollo for basics. For Dutch companies: KvK API as supplement.
Tip: don't enrich everything for every record. Identification + verification first. Only when a record is classified as ICP-fit, expand with deep firmographics. Otherwise you burn credits on records you'll never approach.
Layer 5: Signal data
Buying signals: recent job postings, leadership changes, funding, tech changes, news. Tools: Common Room, Trigify, BuiltWith for tech changes, LinkedIn Sales Navigator alerts. See Signal-based outbound for the deep dive.
Layer 6: Intent data (third-party)
External intent platforms like Bombora, G2, or TrustRadius indicate which companies are researching your category. Optional for early-stage, nearly essential for enterprise sales. Only start when the first five layers are running.
Order matters: a concrete waterfall
Here's a waterfall we recently set up for a Dutch SaaS scale-up for email discovery on European ICP accounts.
- Step 1: LeadMagic (cheapest, strong on EU). Hit rate ~50%.
- Step 2 (if LeadMagic fails): Findymail. Hit rate ~60% on remaining 50%.
- Step 3 (if Findymail fails): Apollo direct API. Hit rate ~40% on remaining 20%.
- Step 4 (if Apollo fails): Hunter.io. Hit rate ~25% on remaining 12%.
- Step 5 (if Hunter fails): Datagma. Hit rate ~20% on remaining 9%.
- Step 6 (if all fail): Manual LinkedIn lookup flag for SDR.
Cumulative coverage after step 5: about 93% (50 + 30 + 8 + 3 + 1.8). The remaining 7% goes to manual review. Credit usage: you only pay on hits, so cost scales with coverage and peaks only when coverage does.
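To check the cumulative figure yourself, here is a small sketch that replays the step-by-step hit rates listed above. Each rate is conditional: it applies only to the records every earlier provider missed. The rates are the approximate ones from this example, not guarantees for your own list.

```python
# Conditional hit rates from the steps above: each rate applies only to
# the records that every earlier provider missed.
STEPS = [
    ("LeadMagic", 0.50),
    ("Findymail", 0.60),
    ("Apollo", 0.40),
    ("Hunter", 0.25),
    ("Datagma", 0.20),
]

def cumulative_coverage(steps):
    """Return (provider, found_share, cumulative_share) for each step."""
    remaining = 1.0
    rows = []
    for name, rate in steps:
        found = remaining * rate   # share of the whole list found here
        remaining -= found         # share still without an email
        rows.append((name, found, 1.0 - remaining))
    return rows

for name, found, total in cumulative_coverage(STEPS):
    print(f"{name:<10} +{found:6.1%}  cumulative {total:6.1%}")
```

Swapping in the hit rates you measure on your own test batch shows quickly whether a fourth or fifth provider is still worth its credits.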
Credit strategy: don't pay for useless data
The biggest cost in a waterfall is wasted credits. Three rules to prevent this.
Rule 1: qualify before enrichment, not after. Filter first on ICP criteria you can determine cheaply (LinkedIn data is free to scrape, company size often sits in public sources). Only when a record meets the minimum baseline do you spend credits on email enrichment. This alone can cut credit usage by 60-70%.
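A rough sketch of what that pre-filter can look like. Everything here is hypothetical: the `Lead` fields, the country list, the employee range, and the title keywords stand in for your own ICP definition. The point is only that the check uses data you already have and costs no credits.

```python
from dataclasses import dataclass

@dataclass
class Lead:
    linkedin_url: str
    country: str
    employee_count: int
    title: str

# Hypothetical ICP baseline; replace with your own criteria.
ICP_COUNTRIES = {"NL", "BE", "LU"}
ICP_MIN_EMPLOYEES = 20
ICP_MAX_EMPLOYEES = 500
ICP_TITLE_KEYWORDS = ("sales", "revenue", "growth")

def passes_icp_baseline(lead: Lead) -> bool:
    """Cheap check that uses only data you already have, no paid credits."""
    title = lead.title.lower()
    return (
        lead.country in ICP_COUNTRIES
        and ICP_MIN_EMPLOYEES <= lead.employee_count <= ICP_MAX_EMPLOYEES
        and any(keyword in title for keyword in ICP_TITLE_KEYWORDS)
    )

def split_for_enrichment(leads):
    """Return (to_enrich, skipped) so only qualified leads consume credits."""
    to_enrich = [lead for lead in leads if passes_icp_baseline(lead)]
    skipped = [lead for lead in leads if not passes_icp_baseline(lead)]
    return to_enrich, skipped
```

Only the `to_enrich` list goes into the email waterfall. The skipped records stay in your table, so you can revisit them if the ICP definition changes.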
Rule 2: cache, cache, cache. A record you enriched last month doesn't need to be re-queried. Build a simple cache layer (in Postgres or even Airtable) storing records with enrichment results and timestamp. For 90% of cases, 90-day-old data is still usable. Shorter for mobile numbers; longer for firmographics.
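As an illustration of the idea, here is a minimal cache with a freshness window per data type. It keeps rows in memory as a stand-in for a Postgres or Airtable table. Only the 90-day window comes from the rule above; the 30-day and 180-day values, the class name, and the `enrich` callable are assumptions made for this sketch.

```python
from datetime import datetime, timedelta, timezone

# Assumed freshness windows per data type. Only the 90-day default comes
# from the rule above; the other two values are illustrative.
MAX_AGE = {
    "email": timedelta(days=90),
    "mobile": timedelta(days=30),
    "firmographics": timedelta(days=180),
}

class EnrichmentCache:
    """In-memory stand-in for a Postgres or Airtable table."""

    def __init__(self):
        self._rows = {}  # (record_key, data_type) -> (value, enriched_at)

    def get(self, record_key, data_type, now=None):
        now = now or datetime.now(timezone.utc)
        row = self._rows.get((record_key, data_type))
        if row is None:
            return None
        value, enriched_at = row
        if now - enriched_at > MAX_AGE[data_type]:
            return None  # stale: the caller should re-enrich
        return value

    def put(self, record_key, data_type, value, now=None):
        now = now or datetime.now(timezone.utc)
        self._rows[(record_key, data_type)] = (value, now)

def get_or_enrich(cache, record_key, data_type, enrich, now=None):
    """Return cached data when fresh, otherwise call the paid provider once."""
    cached = cache.get(record_key, data_type, now)
    if cached is not None:
        return cached
    value = enrich(record_key)  # hypothetical paid lookup
    if value is not None:
        cache.put(record_key, data_type, value, now)
    return value
```

In a real setup the dictionary becomes a table with the same three pieces of information: the record key, the enrichment result, and the timestamp.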
Rule 3: stop on hit, don't continue. It sounds trivial, but a wrong tool config gets it wrong easily: once a provider delivers a hit, you don't proceed to the next one. Otherwise you pay for data you already have. Clay's waterfall system handles this natively; in Make or n8n you must build it explicitly with conditional branches.
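If you build the waterfall yourself, the stop-on-hit logic is essentially an early return. The sketch below models each provider as a plain function that returns an email or nothing. Those functions are placeholders for your own API clients, not real vendor SDK calls, and `run_waterfall` is a name invented for this example.

```python
from typing import Callable, Optional, Sequence, Tuple

# A provider is modelled as a (name, lookup) pair. The lookup takes a lead
# identifier and returns an email or None. These callables are placeholders
# for real API clients, not actual vendor SDK calls.
Provider = Tuple[str, Callable[[str], Optional[str]]]

def run_waterfall(lead_id: str, providers: Sequence[Provider]) -> dict:
    """Try providers in order and stop at the first hit."""
    for name, lookup in providers:
        email = lookup(lead_id)
        if email:
            # Early return: later (paid) providers are never called.
            return {"email": email, "source": name, "needs_manual_review": False}
    # No provider found anything: flag for a manual LinkedIn lookup.
    return {"email": None, "source": None, "needs_manual_review": True}
```

Storing the `source` alongside the email also gives you the per-provider hit rates you need for monitoring later.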
What it actually costs in Clay
Concrete numbers for a typical scale-up workload of 5,000 enrichment runs per month on European B2B leads:
- Clay subscription: $349/month (Pro);
- LeadMagic credits: ~$200/month;
- Findymail credits: ~$150/month;
- Apollo API: ~$100/month (low-tier);
- Datagma + Hunter: ~$100/month;
- Verification (NeverBounce): ~$50/month.
Total: ~$950/month, or about $11,400 per year, for a complete enrichment layer with 80%+ coverage. Compare that to one ZoomInfo Pro license: typically $15-30K per year. The waterfall is cheaper, more accurate, and more flexible.
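The same budget as a quick calculation, so you can plug in your own line items. The amounts are the approximate monthly figures listed above; the per-run cost is simply the total divided by the 5,000 monthly runs.

```python
MONTHLY_COSTS_USD = {
    "Clay Pro subscription": 349,
    "LeadMagic credits": 200,
    "Findymail credits": 150,
    "Apollo API": 100,
    "Datagma + Hunter": 100,
    "NeverBounce verification": 50,
}
RUNS_PER_MONTH = 5_000

monthly_total = sum(MONTHLY_COSTS_USD.values())
annual_total = monthly_total * 12
cost_per_run = monthly_total / RUNS_PER_MONTH

print(f"Monthly: ${monthly_total:,}")    # Monthly: $949
print(f"Annual:  ${annual_total:,}")     # Annual:  $11,388
print(f"Per run: ${cost_per_run:.2f}")   # Per run: $0.19
```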
The seven pitfalls I regularly see
Pitfall 1: same provider in multiple layers. Apollo sometimes returns slightly different results from one endpoint than from another, but the underlying data is largely the same. Counting that as two layers means paying twice for it. Test at setup.
Pitfall 2: no confidence score. A "hit" isn't automatically a "high-quality hit." Some providers return a confidence score. Filter on it before marking a record "done."
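One way to express that filter, as a sketch. The 0.8 threshold and the status labels are assumptions for illustration, and providers report confidence on different scales, so normalize to a 0-1 range before comparing.

```python
from typing import Optional

# Hypothetical threshold; tune it against bounce data from your own sends.
MIN_CONFIDENCE = 0.8

def classify_hit(email: Optional[str], confidence: Optional[float]) -> str:
    """Decide whether a provider response really closes the record."""
    if not email:
        return "miss"            # nothing found: continue the waterfall
    if confidence is None:
        return "unscored"        # provider gave no score: verify separately
    if confidence >= MIN_CONFIDENCE:
        return "done"
    return "low_confidence"      # treat like a miss and try the next provider
```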
Pitfall 3: forgetting GDPR. Especially for mobile and personal email. Build your opt-out flow from the start. Keep audit trails per record.
Pitfall 4: no monitoring. A waterfall with 80% coverage today can sit at 60% six months later because a provider's data degraded or its pricing changed. Review monthly.
Pitfall 5: too many providers. Diminishing returns. After five or six providers, each extra one adds 1-3% coverage at disproportionate credit cost. Stop there.
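A monthly review can start as a simple comparison of per-provider hit rates against the baseline you measured at setup. The sketch below assumes you already log hit rates per provider; the function name and the 10-point alert threshold are illustrative choices, not a standard.

```python
def hit_rate_alerts(baseline, current, max_drop=0.10):
    """Flag providers whose hit rate fell more than max_drop (absolute)."""
    alerts = []
    for provider, base_rate in baseline.items():
        now_rate = current.get(provider)
        if now_rate is None:
            alerts.append((provider, "no data this month"))
        elif base_rate - now_rate > max_drop:
            alerts.append((provider, f"hit rate {base_rate:.0%} -> {now_rate:.0%}"))
    return alerts

# Example with made-up numbers: provider B dropped 20 points.
baseline = {"A": 0.50, "B": 0.60}
current = {"A": 0.48, "B": 0.40}
print(hit_rate_alerts(baseline, current))
```

An alert is a prompt to investigate, not a verdict: a changed input list can move hit rates just as much as a degraded provider.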
Pitfall 6: not syncing to CRM. The data sits in Clay but never reaches HubSpot/Salesforce. Build the sync at setup, not later.
Pitfall 7: only focusing on email. For multichannel you also need LinkedIn URLs, mobile, and title data. Plan for it.
The 5-day implementation
Day 1: Define ICP, prep test batch of 500 records. Set up Clay account or refine existing. Shortlist providers based on geography and data type.
Day 2: Build layers 1-2 (identification + verification). Test on your 500-record batch. Measure hit rate per layer, cost per layer, output quality.
Day 3: Add layers 3-4 (contact data + firmographics). Build cache layer. Set up CRM sync.
Day 4: Layers 5-6 (signal + optional intent). Confidence scoring and confidence-based routing. Monitoring dashboard.
Day 5: End-to-end test on a new batch of 2,000 records. Document what works. Train your team to operate it.
Five days of work for a result that lasts years, if maintained. For an interim/fractional GTM Engineer this typically costs €6-10K. For a full-time hire doing it themselves: a week's work. The investment is almost always recovered within 60 days.
In the next post I cover layer 5 specifically: which buying signals actually work in B2B and how to detect them.