Clay changed how ops teams think about lead enrichment. You can pull in firmographic data, scrape websites, run waterfall enrichment across providers, and pipe it all into a structured table in minutes.
But when it comes to scoring, something breaks down.
Most Clay workflows end with a prompt. An LLM reads the enriched fields (company size, industry, tech stack, job title) and decides whether the lead is a good fit. Sometimes it assigns a letter grade. Sometimes a number. Sometimes a label like “high intent.”
This feels like intelligence. It’s not.
Prompt-based scoring is rules-based scoring in a trench coat
When you write a prompt that says “score this lead from 1 to 100 based on company size, industry fit, and whether they use Salesforce,” you’re encoding rules. The same rules you’d put in a spreadsheet or a HubSpot workflow, just written in natural language instead of formulas.
The LLM doesn’t know which leads actually convert in your pipeline. It doesn’t have access to your historical close rates by segment. It can’t tell you that “days since last activity” matters 3x more than “company size” in your specific business.
It’s pattern-matching against its training data, which is the internet’s collective opinion of what a good lead looks like. Not yours.
Prompt-based scoring is a more flexible way to write rules. It’s not a replacement for statistical learning.
Where the gap shows up
The problem isn’t that Clay workflows are bad. They’re genuinely useful for enrichment and routing. The problem is what happens at the end of the workflow, at the scoring step.
A prompt-based score can’t tell you:
- Which fields actually predict conversion in your pipeline. It guesses based on general knowledge, not your data.
- How confident the score is. There’s no AUC, no precision/recall, no way to measure whether the scoring is working.
- Whether the explanation is stable. Ask the same LLM to score the same lead twice with slightly different phrasing and you’ll get different results.
- What changed. When a lead’s score shifts, there’s no feature importance breakdown showing which field moved it.
This is the difference between “this lead looks good based on what I generally know about B2B” and “this lead has a 78% conversion likelihood based on 14 months of your closed-won data, driven primarily by engagement recency and deal size.”
What a predictive endpoint adds to the workflow
The fix isn’t to rip out Clay. It’s to replace the prompt-based scoring step with an actual predictive score.
ax1om exposes conversion likelihood scores as an API endpoint. That endpoint slots directly into a Clay workflow:
- Clay enriches the lead. Firmographic data, tech stack, intent signals, whatever you’re pulling in.
- Clay sends the enriched record to ax1om’s API. A single call with the relevant fields.
- ax1om returns a 0 to 100 conversion likelihood score plus feature importances. Trained on your historical CRM data, not general AI knowledge.
- Clay routes based on the score. High-fit leads go to sales, mid-fit leads go to nurture, and low-fit leads get deprioritized.
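As a concrete sketch of steps 2 and 3, here's what the request and response handling might look like. The endpoint URL, payload field names, and response shape below are illustrative assumptions, not ax1om's documented API — check the actual docs for the real schema and auth.

```python
import json

# Placeholder endpoint -- the real URL, auth headers, and schema
# come from ax1om's API documentation, not this sketch.
API_URL = "https://api.ax1om.example/v1/score"

def build_payload(lead: dict) -> str:
    """Serialize an enriched Clay record into a scoring request (assumed fields)."""
    return json.dumps({
        "company_size": lead.get("company_size"),
        "industry": lead.get("industry"),
        "tech_stack": lead.get("tech_stack", []),
        "days_since_last_engagement": lead.get("days_since_last_engagement"),
    })

def parse_score(response_body: str) -> tuple[int, dict]:
    """Pull the 0-100 score and feature importances out of the (assumed) response."""
    body = json.loads(response_body)
    return body["score"], body["feature_importances"]

# Clay's HTTP API column would POST build_payload(lead) to API_URL;
# here we just parse a response of the assumed shape.
sample = '{"score": 78, "feature_importances": {"engagement_recency": 0.41, "deal_size": 0.27}}'
score, importances = parse_score(sample)
```

In Clay, this maps to a single HTTP API column: the enriched fields go out, and the score plus importances come back as columns you can route on.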
The enrichment layer stays the same. The scoring layer upgrades from subjective to statistical.
What changes in practice
Before (prompt-based): Your Clay workflow enriches a lead, then an LLM prompt scores it as “B+” based on industry and company size. Your team routes it to an SDR. The SDR works it. It doesn’t convert. Nobody knows why the score was wrong because there’s no feedback loop.
After (predictive endpoint): Your Clay workflow enriches a lead, then ax1om’s API returns a score of 34 with feature importances showing that “days since last engagement” and “deal velocity” are both low. The lead gets routed to nurture instead. Your SDR focuses on the leads scoring 70 and above. Close rate goes up.
The difference is that the predictive score is accountable. You can measure its AUC. You can see which fields drive it. You can track whether it improves outcomes over time. When it’s wrong, you can see why.
The enrichment-scoring stack
The real power of tools like Clay isn’t scoring, it’s data assembly. Clay is excellent at pulling together the inputs that make a model more accurate.
The stack that makes sense:
- Enrichment (Clay). Assemble firmographic, technographic, and intent data.
- Scoring (ax1om API). Predict conversion likelihood from enriched and CRM data.
- Activation (Salesforce, Outreach, Slack). Route, prioritize, alert based on scores.
Each layer does what it’s best at. Clay assembles data. ax1om scores it. Your CRM and sequencing tools act on it.
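The activation layer often reduces to a thresholding step. A minimal sketch — the cutoffs below (70 for sales, 20 for deprioritization) are assumptions chosen to match the worked example earlier, and in practice you'd tune them against your own precision/recall tradeoffs:

```python
def route(score: int) -> str:
    """Map a 0-100 conversion likelihood score to a destination (assumed cutoffs)."""
    if score >= 70:
        return "sales"        # high-fit: straight to an SDR
    if score >= 20:
        return "nurture"      # mid-fit: drip sequence
    return "deprioritized"    # low-fit: no active outreach
```

With the earlier examples, a 78 routes to sales and a 34 routes to nurture. The same function shape works whether the destination is a Salesforce queue, an Outreach sequence, or a Slack alert.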
The bottom line
AI-powered enrichment is a real advancement. AI-powered scoring, at least the prompt-based kind, is a lateral move from the spreadsheet era. It’s faster to set up. It’s more flexible. But it’s still someone’s opinion encoded as criteria, just translated into natural language.
Predictive scoring trained on your actual conversion data is a different category entirely. It learns from outcomes. It explains itself. It improves over time.
If you’re running Clay workflows today, the enrichment is working. The scoring step is where the upgrade lives.
ax1om’s API endpoint is available on Pro plans and above. Connect Salesforce, train your first model, and start returning scores in your Clay workflows in minutes, not months.