Why rules-based lead scoring fails, and what to replace it with

Rules-based scoring is an opinion in a spreadsheet. ML scoring is what your data actually shows. A case study from a B2B software company with 23,154 lead records explains why.

Most B2B companies still score leads the same way they did a decade ago: assign points to actions, add them up, and hand the list to sales.

Downloaded a whitepaper? 10 points. Visited the pricing page? 35 points. Attended a webinar? 5 points. Cross a threshold, and you’re an MQL.
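That whole system fits in a few lines. A minimal sketch of the point model described above, with illustrative weights and a hypothetical MQL threshold (the threshold value is an assumption, not from the study):

```python
# Hypothetical point weights, mirroring the examples above.
POINTS = {
    "whitepaper_download": 10,
    "pricing_page_visit": 35,
    "webinar_attendance": 5,
}
MQL_THRESHOLD = 40  # arbitrary cutoff, as in most point systems


def score_lead(events):
    """Sum the points for each recorded event."""
    return sum(POINTS.get(e, 0) for e in events)


lead = ["whitepaper_download", "pricing_page_visit"]
total = score_lead(lead)         # 10 + 35 = 45
is_mql = total >= MQL_THRESHOLD  # crosses the threshold, so: MQL
```

The simplicity is the appeal, and the problem: every number in `POINTS` is a decision someone made, not a pattern someone measured.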

The problem isn’t that this system is wrong. It’s that it’s made up.

The point system is a guess

In a traditional scoring model, a marketing team sits down and assigns weights to lead behaviors based on intuition. A CTA click gets 35 points. A webinar gets 5. Why? Because someone decided CTA clicks matter more.

A recent study by Gonzalez-Flores, Rubiano-Moreno, and Sosa-Gomez (2025) documented exactly this pattern in a B2B software company case study. The company had a traditional scoring model with manually assigned weights for everything from email opens (2 points) to demo requests (30 points). Sales reps were still overwhelmed, late to contact high-intent leads, and losing deals to competitors who responded faster.

The scores looked precise. They weren’t. They were opinions formatted as numbers.

What actually breaks

Rules-based scoring fails in three predictable ways:

1. The weights are arbitrary. No one validates whether webinar attendance (5 points) actually correlates with conversion at 1/7th the rate of a CTA click (35 points). The ratios are invented, not measured.

2. The model is static. Buyer behavior changes. New channels emerge. The scoring rules don’t update unless someone manually rewrites them. Most teams don’t.

3. It ignores feature interactions. A lead who downloads content and comes from a referral source might convert at 4x the rate of either signal alone. Point-based systems can't capture this; they just add.
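The third failure can be shown in a toy example. The data below is synthetic (not from the study), with two hypothetical binary features where conversion requires both signals together. Any additive point system gives the combined lead a score equal to the sum of its parts, while even a shallow decision tree learns the interaction:

```python
from sklearn.tree import DecisionTreeClassifier

# Synthetic toy data: [downloaded_content, referral_source]
X = [[0, 0], [1, 0], [0, 1], [1, 1]]
y = [0, 0, 0, 1]  # converts only when BOTH signals are present


def additive_score(row, weights=(10, 10)):
    """A point system: fixed per-feature weights, summed."""
    return sum(f * w for f, w in zip(row, weights))


# The additive score of the converting lead is exactly the sum of the
# two non-converting leads' scores -- no weight choice can separate them.
combined = additive_score([1, 1])
parts = additive_score([1, 0]) + additive_score([0, 1])

# A depth-2 tree captures the AND relationship directly.
tree = DecisionTreeClassifier(random_state=0).fit(X, y)
preds = tree.predict(X)
```

The same limitation applies no matter how carefully the weights are tuned; it is structural to the additive form.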

The research found that sales representatives lacked the information needed to judge which opportunities were most viable relative to the rest of the pipeline. The downstream effects: delayed outreach, intuition-based decisions, wasted resources, inaccurate forecasts, and lost revenue.

What ML scoring actually does differently

The researchers tested 15 classification algorithms on 4 years of real CRM data (23,154 lead records from Microsoft Dynamics). The best performer, Gradient Boosting Classifier, hit 98.39% accuracy with an AUC of 0.9891.

That’s not a marginal improvement. That’s a fundamentally different capability.

Here’s what changes when you move from rules to ML:

Weights are learned, not assigned. The model discovers which features actually predict conversion from your historical data. No guessing required.

The model sees interactions. Gradient boosting and tree-based algorithms naturally capture feature interactions: combinations of attributes that together predict conversion better than any single signal.

It retrains. As your data changes, the model updates. New patterns get captured. Old ones that stop working get dropped.

Every lead gets a probability. Not a point total, but an actual conversion likelihood based on the full picture of that lead's attributes and behavior.
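In code, the whole shift is small. A minimal sketch using scikit-learn's `GradientBoostingClassifier` (the algorithm the study found best); the dataset here is synthetic stand-in data, not the study's CRM records:

```python
from sklearn.datasets import make_classification
from sklearn.ensemble import GradientBoostingClassifier
from sklearn.metrics import roc_auc_score
from sklearn.model_selection import train_test_split

# Synthetic stand-in for historical lead records (7 features, binary outcome).
X, y = make_classification(n_samples=2000, n_features=7, n_informative=5,
                           random_state=0)
X_train, X_test, y_train, y_test = train_test_split(X, y, random_state=0)

# Weights are learned from the data, not assigned by a marketing team.
model = GradientBoostingClassifier(random_state=0).fit(X_train, y_train)

# Every lead gets a conversion probability, not a point total.
probs = model.predict_proba(X_test)[:, 1]
auc = roc_auc_score(y_test, probs)  # held-out discrimination, as in the study
```

Retraining is the same `fit` call on fresh data, which is what makes the model adaptive where a rules spreadsheet is static.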

The features that actually matter

When the researchers ran feature importance analysis on the trained model, the top predictors weren’t what most marketing teams would guess:

  1. Lead source: where the lead originated
  2. Reason for state: why the lead is in its current status
  3. Lead classification: how the lead was categorized
  4. Product: what product the lead expressed interest in
  5. Number of responses: total interactions with the company
  6. Account type: commercial, educational, or research
  7. Interest level: stated purchase intent

Notice what’s missing from the top of the list: page views, email opens, and content downloads. The behaviors that traditional models weight most heavily weren’t the strongest predictors. The structural attributes (source, classification, product fit) carried more signal.

This is the core insight: you cannot determine which features matter through assumption alone. Measurement is essential.
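The measurement step itself is mechanical once a model is trained. A sketch of feature-importance ranking with scikit-learn; the feature names mirror the study's top predictors, but the data is synthetic, with the signal deliberately planted in `lead_source` and `product` for illustration:

```python
import numpy as np
from sklearn.ensemble import GradientBoostingClassifier

rng = np.random.default_rng(0)
features = ["lead_source", "reason_for_state", "lead_classification",
            "product", "num_responses", "account_type", "interest_level"]

# Synthetic data: conversion driven mostly by lead_source, partly by product.
X = rng.normal(size=(1000, len(features)))
y = (X[:, 0] + 0.5 * X[:, 3]
     + rng.normal(scale=0.5, size=1000) > 0).astype(int)

model = GradientBoostingClassifier(random_state=0).fit(X, y)

# Rank features by learned importance instead of assigned points.
ranked = sorted(zip(features, model.feature_importances_),
                key=lambda pair: -pair[1])
```

On real CRM data, this ranking replaces the guesswork: the model tells you which attributes carried signal, and by how much.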

The gap between theory and practice

The paper notes that “a disconnect between theory and applied practical models has persisted” in consumer behavior research since the 1970s. The same applies to lead scoring. The theory is clear: ML outperforms rules. The academic evidence is consistent. But most B2B companies are still running point systems designed by a marketing manager in a spreadsheet.

The barrier isn’t technical. Gradient boosting classifiers are well understood, computationally efficient, and available in every major ML framework. The barrier is operational: ops teams don’t have the data science resources to build and maintain custom models.

That’s the gap worth closing.

The bottom line

Rules-based scoring tells you what your marketing team thinks matters. ML scoring tells you what your data shows actually converts.

The research is clear: a properly trained classification model on real CRM data outperforms traditional scoring by a wide margin. The question isn’t whether to make the switch. It’s how quickly you can get there.


Reference: Gonzalez-Flores, L., Rubiano-Moreno, J., & Sosa-Gomez, G. (2025). The relevance of lead prioritization: a B2B lead scoring model based on machine learning. Frontiers in Artificial Intelligence, 8, 1554325.