How the score is computed
The 0–100 score, √-decay over repeats, severity weights, and what a regression looks like in CI.
The doctor produces a single integer from 0 to 100 plus a letter
grade. This page explains the math, why the √-decay matters, and
how to read the trend.
The formula
For a given audit, group the findings by rule id. For each group:
penalty_for_rule = weight × Σ ( 1 / √(i + 1) for i in 0..count-1 )
Where weight is the rule's severity weight:
| severity | weight |
|---|---|
| error | 5 |
| warn | 2 |
| info | 0.5 |
The total penalty is the sum across all rules. The final score is:
score = max(0, round(100 − penalty))
And the grade:
| score range | grade |
|---|---|
| 95–100 | A |
| 85–94 | B |
| 70–84 | C |
| 50–69 | D |
| 0–49 | F |
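The formula and the grade table can be sketched in a few lines of TypeScript. This is a minimal illustration, not the doctor's implementation. The `Finding` shape and the names `penaltyForRule`, `computeScore`, and `gradeFor` are hypothetical, and the grouping step assumes every finding of a given rule id carries the same severity.

```ts
// Hypothetical sketch of the scoring formula and grade table.
// Names and the Finding shape are illustrative, not the doctor's API.
type Severity = "error" | "warn" | "info";
type Grade = "A" | "B" | "C" | "D" | "F";

interface Finding {
  ruleId: string;
  severity: Severity;
}

// Severity weights from the table above.
const WEIGHTS: Record<Severity, number> = { error: 5, warn: 2, info: 0.5 };

// penalty_for_rule = weight × Σ 1/√(i + 1) for i in 0..count-1
function penaltyForRule(weight: number, count: number): number {
  let sum = 0;
  for (let i = 0; i < count; i++) {
    sum += 1 / Math.sqrt(i + 1);
  }
  return weight * sum;
}

function computeScore(findings: Finding[]): number {
  // Group by rule id. Assumes all findings of one rule share a severity.
  const groups: Record<string, { severity: Severity; count: number }> = {};
  for (const f of findings) {
    const g = groups[f.ruleId];
    if (g) {
      g.count += 1;
    } else {
      groups[f.ruleId] = { severity: f.severity, count: 1 };
    }
  }
  // Total penalty is the sum of the per-rule penalties.
  let penalty = 0;
  for (const ruleId of Object.keys(groups)) {
    const { severity, count } = groups[ruleId];
    penalty += penaltyForRule(WEIGHTS[severity], count);
  }
  return Math.max(0, Math.round(100 - penalty));
}

// Grade boundaries from the table; scores are integers, so lower bounds suffice.
function gradeFor(score: number): Grade {
  if (score >= 95) return "A";
  if (score >= 85) return "B";
  if (score >= 70) return "C";
  if (score >= 50) return "D";
  return "F";
}
```

With no findings, `computeScore([])` returns 100 and `gradeFor(100)` returns `"A"`.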
Why √-decay
A single em-dash on every page of a 200-file repo shouldn't tank the score. By the same token, a 200-file repo with 200 different rules firing is a real problem. A flat per-finding penalty would treat these two cases identically; √-decay makes the first one far cheaper than the second.
The 1/√(i+1) sequence decays as:
| occurrence (i) | multiplier | effective weight (error) |
|---|---|---|
| 0 (1st) | 1.0 | 5.0 |
| 1 (2nd) | 0.707 | 3.54 |
| 2 (3rd) | 0.577 | 2.89 |
| 3 (4th) | 0.5 | 2.5 |
| 9 (10th) | 0.316 | 1.58 |
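The whole table follows from the multiplier 1/√(i+1). A short sketch that regenerates it; `multiplier` and `ERROR_WEIGHT` are illustrative names, not part of the doctor.

```ts
// Regenerates the decay table: multiplier for the i-th occurrence (0-based)
// is 1/√(i + 1); the effective error cost is the error weight (5) times that.
const ERROR_WEIGHT = 5;

function multiplier(i: number): number {
  return 1 / Math.sqrt(i + 1);
}

const occurrences: number[] = [0, 1, 2, 3, 9];
const tableLines: string[] = occurrences.map(
  (i) => `${i} | ${multiplier(i).toFixed(3)} | ${(ERROR_WEIGHT * multiplier(i)).toFixed(2)}`
);
console.log(tableLines.join("\n"));
```

The printed rows match the table, with `toFixed` adding trailing zeros (for example `1.000` and `2.50`).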
The 10th occurrence of the same error rule costs 1.58 instead of
5.0: the rule has clearly been triaged, and we're not going to
keep punishing the team for it.
By contrast, 10 different error rules would each cost 5.0
for their first occurrence, totaling 50.0, a 50-point drop
regardless of which files the findings appear in.
The √-decay only applies within a single rule's repeat set. Two
findings of the same warn rule decay; two findings of different warn rules do not. The doctor groups findings by rule id before applying the decay, so what matters is how many times each rule fires, not the
total number of findings.
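To make the grouping concrete, here is a small sketch comparing 10 findings of one error rule with 10 findings spread across 10 different error rules. `ruleCost` is a hypothetical helper that applies the per-rule formula; the first case comes to about 25.1, the second to exactly 50.

```ts
// Compares decay within one rule vs. no decay across distinct rules.
// ruleCost is an illustrative helper, not the doctor's API.
const ERROR_WEIGHT = 5;

// Cost of `count` findings that all share one rule id:
// weight × Σ 1/√(i + 1) for i in 0..count-1
function ruleCost(weight: number, count: number): number {
  let sum = 0;
  for (let i = 0; i < count; i++) {
    sum += 1 / Math.sqrt(i + 1);
  }
  return weight * sum;
}

// One rule firing ten times: occurrences decay.
const sameRuleCost = ruleCost(ERROR_WEIGHT, 10);

// Ten rules firing once each: every finding is a first occurrence.
let distinctRulesCost = 0;
for (let r = 0; r < 10; r++) {
  distinctRulesCost += ruleCost(ERROR_WEIGHT, 1);
}
```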
Determinism
The score is a pure function of the diff and the ruleset. There is no randomness, no timing, no machine state, no network. The same diff audited against the same ruleset a year from now produces the same number.
This is what makes it useful as a CI gate and a trend line:
- CI gate: a drop of more than 3 points in a single commit is a regression. A sub-point change in the penalty (say 0.3) is too small to matter and may not move the rounded score at all.
- Trend line: the score over a series of commits is a chart you can ship in a status report.
The trade-off is that the score is not comparable across projects. A score of 88 in a 5-file prototype and a score of 88 in a 200-file production app represent very different levels of effort. Compare within a project, not across them.
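Because the score is deterministic, a score recorded on an earlier commit can be compared directly with the current one. A minimal sketch of the "more than 3 points" rule, assuming you already have the scores in hand; `isRegression`, `regressionsInSeries`, and the default `maxDrop` of 3 are illustrative, taken from the guidance above rather than from the doctor itself.

```ts
// Hypothetical CI helpers built on the "drop of more than 3 points" rule.
function isRegression(baseline: number, current: number, maxDrop: number = 3): boolean {
  return baseline - current > maxDrop;
}

// Returns the indexes of commits whose score fell by more than maxDrop
// relative to the previous commit in the series.
function regressionsInSeries(scores: number[], maxDrop: number = 3): number[] {
  const flagged: number[] = [];
  for (let k = 1; k < scores.length; k++) {
    if (isRegression(scores[k - 1], scores[k], maxDrop)) {
      flagged.push(k);
    }
  }
  return flagged;
}
```

For the series `[91, 90, 85, 86, 80]`, the drops 90→85 and 86→80 are flagged, so `regressionsInSeries` returns `[2, 4]`.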
Worked example
Say a project has:
- 1 `error` of `watch-without-cleanup`
- 2 `warn`s of `no-em-dash-in-str` (two findings of the same rule, so the second is a repeat)
- 1 `info` of `prefer-script-setup-for-new-files`
Then:
watch-without-cleanup: 5.0 × 1.0 = 5.0
no-em-dash-in-str: 2.0 × (1.0 + 0.707) = 3.41
prefer-script-setup: 0.5 × 1.0 = 0.5
----
total: 8.91
score = max(0, round(100 − 8.91)) = 91
grade = B
The first em-dash costs 2.0; the second (same rule) costs
2.0 × 0.707 = 1.41. If a third em-dash appeared, it would cost
2.0 × 0.577 = 1.15. The decay is gentle but real.
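The same arithmetic, line by line, as a sketch. The weights come from the severity table and the rule ids and counts from the example; `penaltyForRule` and `occurrenceCost` are illustrative helper names.

```ts
// Recomputes the worked example. Helper names are illustrative.
function penaltyForRule(weight: number, count: number): number {
  let sum = 0;
  for (let i = 0; i < count; i++) sum += 1 / Math.sqrt(i + 1);
  return weight * sum;
}

// Cost of the i-th occurrence (0-based) of a rule on its own.
function occurrenceCost(weight: number, i: number): number {
  return weight / Math.sqrt(i + 1);
}

const breakdown: Array<[string, number]> = [
  ["watch-without-cleanup", penaltyForRule(5.0, 1)],
  ["no-em-dash-in-str", penaltyForRule(2.0, 2)],
  ["prefer-script-setup-for-new-files", penaltyForRule(0.5, 1)],
];

let totalPenalty = 0;
for (const [ruleId, cost] of breakdown) {
  totalPenalty += cost;
  console.log(`${ruleId}: ${cost.toFixed(2)}`);
}
const exampleScore = Math.max(0, Math.round(100 - totalPenalty));
console.log(`total: ${totalPenalty.toFixed(2)}, score: ${exampleScore}`);

// Marginal cost of the 1st, 2nd and 3rd em-dash: 2.00, 1.41, 1.15.
const emDashCosts: string[] = [0, 1, 2].map((i) => occurrenceCost(2.0, i).toFixed(2));
```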
What a regression looks like
A single commit that introduces a new rule firing (not a repeat of
an existing one) costs the full weight on first appearance. If the
diff adds 3 warn findings from 3 different rule ids that weren't firing before, the score drops by
3 × 2.0 = 6.0 points, a 6-point regression.
This is intentional: a new pattern in the diff is a signal that the diff is doing something the existing ruleset didn't anticipate. The team should look at it.
If the same diff instead adds 3 more occurrences of a warn rule
that was already firing three times (occurrences i = 0..2), the new findings are occurrences 3 through 5, so the cost is:
2.0 × Σ ( 1 / √(i + 1) for i in 3..5 )
= 2.0 × (0.5 + 0.447 + 0.408)
= 2.0 × 1.355
= 2.71
A 2.7-point regression for the same kind of warning, much softer.
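Both regressions can be checked with one helper that sums the decayed cost over a range of occurrence indexes. This is a sketch; `costOfOccurrences` is a hypothetical name, and the repeat case assumes the rule had exactly three prior findings, as in the text.

```ts
// Compares the two regressions: 3 brand-new warn rules vs.
// 3 more findings of a warn rule that already fired 3 times.
const WARN_WEIGHT = 2.0;

// weight × Σ 1/√(i + 1) for i in from..to (inclusive)
function costOfOccurrences(weight: number, from: number, to: number): number {
  let sum = 0;
  for (let i = from; i <= to; i++) {
    sum += 1 / Math.sqrt(i + 1);
  }
  return weight * sum;
}

// Three new rule ids: each finding is occurrence i = 0 of its own rule.
const newRulesCost = 3 * costOfOccurrences(WARN_WEIGHT, 0, 0); // 6.0

// Three repeats of an existing rule: occurrences i = 3, 4, 5.
const repeatsCost = costOfOccurrences(WARN_WEIGHT, 3, 5); // about 2.71
```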
Threshold and --fail-on
Two related but distinct exit-code signals:
- `--fail-on <level>` exits non-zero if any finding is at or above the given severity. This is a count gate.
- `--threshold <n>` exits non-zero if the score is below `n`. This is a score gate.
A typical setup:
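```ts
export default defineConfig({
  failOn: 'error', // count gate: any error finding fails the run
  threshold: 80,   // score gate: a score below 80 fails the run
});
```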
This config means: "block the merge if there's a single error finding or if
the score drops below 80." The two signals are independent: you
can be at score 95 with an error finding (gate fails), or at
score 75 with only warns (gate also fails).
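A sketch of how the two signals might combine into one pass/fail decision. The option names mirror the config keys, but `gateFails`, `RANK`, and `GateOptions` are hypothetical illustrations rather than the doctor's implementation; an unset threshold is treated as 0, as described below.

```ts
// Hypothetical evaluation of the count gate and the score gate.
type Severity = "error" | "warn" | "info";

// Higher rank = more severe.
const RANK: Record<Severity, number> = { info: 0, warn: 1, error: 2 };

interface GateOptions {
  failOn?: Severity;  // count gate: fail on any finding at or above this severity
  threshold?: number; // score gate: fail if score < threshold (unset acts like 0)
}

function gateFails(severities: Severity[], score: number, opts: GateOptions): boolean {
  let countGate = false;
  if (opts.failOn !== undefined) {
    const minRank = RANK[opts.failOn];
    countGate = severities.some((s) => RANK[s] >= minRank);
  }
  const scoreGate = score < (opts.threshold ?? 0);
  // The gates are independent: either one failing fails the run.
  return countGate || scoreGate;
}
```

With `{ failOn: "error", threshold: 80 }`, a run at score 95 with one error finding fails, and so does a run at score 75 with only warns.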
For pre-commit, drop the threshold to 50 and set failOn: 'error'.
For PR CI, the config above works. For release branches, raise the
threshold to 90 and set failOn: 'warn'.
If you're just starting with the doctor, leave threshold unset
(0) and only use failOn: 'error'. The score is a useful dashboard
metric; the threshold is a hard wall. Don't make the wall the first
thing you set.
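The suggested setups can be written down as plain objects and checked against a sample run. The preset names, the `GatePreset` type, and `blockingPresets` are illustrative; only `failOn` and `threshold` correspond to the config keys shown above.

```ts
// The recommended setups as data, plus a check of which would block a run.
type Severity = "error" | "warn" | "info";

interface GatePreset {
  failOn: Severity;
  threshold: number; // 0 means no score gate
}

const PRESETS: Record<string, GatePreset> = {
  gettingStarted: { failOn: "error", threshold: 0 },
  preCommit: { failOn: "error", threshold: 50 },
  pullRequest: { failOn: "error", threshold: 80 },
  releaseBranch: { failOn: "warn", threshold: 90 },
};

const RANK: Record<Severity, number> = { info: 0, warn: 1, error: 2 };

// Names of the presets that would block a run with these findings and score.
function blockingPresets(severities: Severity[], score: number): string[] {
  return Object.keys(PRESETS).filter((name) => {
    const p = PRESETS[name];
    const countGate = severities.some((s) => RANK[s] >= RANK[p.failOn]);
    return countGate || score < p.threshold;
  });
}
```

A run at score 82 with only warn findings passes every preset except `releaseBranch`.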
Edge cases
- Zero findings: `penalty = 0`, `score = 100`, `grade = A`.
- All `info`: the penalty rarely exceeds 50; getting there takes more than 100 distinct `info` rules or thousands of `info` findings on a single rule. Even then, `score` floors at `0`.
- Empty diff (`--diff` with no changes): the doctor exits `0` with no report.
The score is clamped to [0, 100]. It is never negative and never
above 100.
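Two of these edge cases can be checked numerically. In the sketch below, `findingsToReach` is a hypothetical helper that counts how many findings of a single rule it takes for that rule's decayed penalty to reach a target; for `info` and a target of 50 it comes out a little over 2,500. `clampScore` applies the clamped score formula.

```ts
// Checks the all-info and clamping edge cases. Helper names are illustrative.
const INFO_WEIGHT = 0.5;

// Number of findings of one rule needed for its penalty to reach `target`.
function findingsToReach(weight: number, target: number): number {
  let penalty = 0;
  let count = 0;
  while (penalty < target) {
    penalty += weight / Math.sqrt(count + 1); // cost of occurrence i = count
    count += 1;
  }
  return count;
}

// score = max(0, round(100 − penalty)); never negative, never above 100.
function clampScore(penalty: number): number {
  return Math.max(0, Math.round(100 - penalty));
}

const infoFindingsForFifty = findingsToReach(INFO_WEIGHT, 50);
```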