How the score is computed · the-doctor.report

The doctor produces a single integer from 0 to 100 plus a letter grade. This page explains the math, why the √-decay matters, and how to read the trend.

The formula

For a given audit, group the findings by rule id. For each group:


penalty_for_rule = weight × Σ ( 1 / √(i + 1)  for i in 0..count-1 )

where `weight` is the rule's severity weight and `count` is the number of findings for that rule:

severity	weight
`error`	5
`warn`	2
`info`	0.5

The total penalty is the sum across all rules. The final score is:


score = max(0, round(100 − penalty))

And the grade:

score range	grade
95–100	A
85–94	B
70–84	C
50–69	D
0–49	F
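The formula, the clamp and the grade bands can be sketched in a few lines of TypeScript. This is a minimal illustration, not the doctor's implementation. The `Finding` shape and every function name are hypothetical. The sketch also assumes that each rule id has a single severity and that "round" behaves like `Math.round`.

```typescript
// Sketch of the scoring formula. Assumptions: Finding and all function
// names are hypothetical, each rule id has one severity, and "round"
// is taken to be Math.round (halves round up).

type Severity = "error" | "warn" | "info";

interface Finding {
  ruleId: string;
  severity: Severity;
}

const WEIGHTS: Record<Severity, number> = { error: 5, warn: 2, info: 0.5 };

// penalty_for_rule = weight × Σ 1/√(i + 1) for i in 0..count-1
function penaltyForRule(severity: Severity, count: number): number {
  let sum = 0;
  for (let i = 0; i < count; i++) {
    sum += 1 / Math.sqrt(i + 1);
  }
  return WEIGHTS[severity] * sum;
}

// Group findings by rule id, then sum the per-rule penalties.
function totalPenalty(findings: Finding[]): number {
  const groups: Record<string, { severity: Severity; count: number }> = {};
  for (const f of findings) {
    const g = groups[f.ruleId];
    if (g) {
      g.count += 1;
    } else {
      groups[f.ruleId] = { severity: f.severity, count: 1 };
    }
  }
  let total = 0;
  for (const ruleId of Object.keys(groups)) {
    total += penaltyForRule(groups[ruleId].severity, groups[ruleId].count);
  }
  return total;
}

// score = max(0, round(100 − penalty))
function scoreFromPenalty(penalty: number): number {
  return Math.max(0, Math.round(100 - penalty));
}

// Grade bands from the table above.
function gradeFor(score: number): "A" | "B" | "C" | "D" | "F" {
  if (score >= 95) return "A";
  if (score >= 85) return "B";
  if (score >= 70) return "C";
  if (score >= 50) return "D";
  return "F";
}
```

Only the count per rule enters the sum, so the order in which findings are reported cannot change the result.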

Why √-decay

A single em-dash on every page of a 200-file repo shouldn't tank the score. On the other hand, a 200-file repo with 200 different rules firing is a real problem. A flat penalty, where every occurrence costs the full weight, would treat these two cases identically (200 findings each). √-decay makes the first case far cheaper than the second.
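To make the contrast concrete, here is a small sketch that prices both repos under a flat penalty and under √-decay. For simplicity it assumes every finding is a `warn` (weight 2). The function names are illustrative only.

```typescript
// Compares a flat per-occurrence penalty with √-decay for the two repos
// above. Assumption: every finding is a `warn` (weight 2); names are
// illustrative only.

const WARN_WEIGHT = 2;

// Each entry is how many times one rule fires.
type RuleCounts = number[];

// Flat penalty: every occurrence costs the full weight.
function flatPenalty(counts: RuleCounts): number {
  let total = 0;
  for (const count of counts) total += WARN_WEIGHT * count;
  return total;
}

// √-decay: occurrence i (zero-based) of a rule costs weight / √(i + 1).
function sqrtDecayPenalty(counts: RuleCounts): number {
  let total = 0;
  for (const count of counts) {
    for (let i = 0; i < count; i++) total += WARN_WEIGHT / Math.sqrt(i + 1);
  }
  return total;
}

// Repo 1: one rule (say, an em-dash rule) firing in all 200 files.
const oneNoisyRule: RuleCounts = [200];

// Repo 2: 200 distinct rules, each firing once.
const manyRules: RuleCounts = [];
for (let r = 0; r < 200; r++) manyRules.push(1);

console.log("flat:", flatPenalty(oneNoisyRule), flatPenalty(manyRules));
console.log("sqrt-decay:", sqrtDecayPenalty(oneNoisyRule).toFixed(1), sqrtDecayPenalty(manyRules).toFixed(1));
```

Under the flat penalty, both repos cost 400. Under √-decay, the single noisy rule costs about 53.7, while the 200 distinct rules still cost 400.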

The 1/√(i+1) sequence decays as:

occurrence (`i`)	multiplier	effective weight (`error`)
0 (1st)	1.0	5.0
1 (2nd)	0.707	3.54
2 (3rd)	0.577	2.89
3 (4th)	0.5	2.5
9 (10th)	0.316	1.58

The 10th occurrence of the same error rule costs 1.58 instead of 5.0. By then the rule has clearly been triaged, and we're not going to keep punishing the team for it.

By contrast, 10 different error rules each cost the full 5.0 for their first occurrence, totaling 50.0. That is a 50-point drop, regardless of which files the rules fire in.
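The following sketch reproduces the decay table for an `error` rule. It then compares the cumulative cost of ten findings on one rule against ten distinct rules. The helper names are illustrative, not part of the doctor.

```typescript
// Reproduces the decay table for an `error` rule (weight 5) and totals
// ten repeats of one rule against ten distinct rules. Names are illustrative.

const ERROR_WEIGHT = 5;

// Cost of the occurrence with zero-based index i: weight × 1/√(i + 1).
function occurrenceCost(weight: number, i: number): number {
  return weight / Math.sqrt(i + 1);
}

// Total cost of `count` occurrences of one rule.
function ruleCost(weight: number, count: number): number {
  let total = 0;
  for (let i = 0; i < count; i++) total += occurrenceCost(weight, i);
  return total;
}

for (const i of [0, 1, 2, 3, 9]) {
  const multiplier = (1 / Math.sqrt(i + 1)).toFixed(3);
  console.log(`i=${i} multiplier=${multiplier} cost=${occurrenceCost(ERROR_WEIGHT, i).toFixed(2)}`);
}

const tenRepeats = ruleCost(ERROR_WEIGHT, 10);      // one rule, ten findings
const tenDistinct = 10 * ruleCost(ERROR_WEIGHT, 1); // ten rules, one finding each
console.log(tenRepeats.toFixed(1), tenDistinct.toFixed(1));
```

Ten findings of one error rule cost about 25.1 in total, roughly half of the 50.0 charged for ten distinct error rules.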

Determinism

The score is a pure function of the diff and the ruleset. There is no randomness, no timing, no machine state and no network access. The same diff audited against the same ruleset produces the same number today or a year from now.

This is what makes it useful as a CI gate and a trend line:

CI gate: a drop of more than 3 points is a regression. A sub-point change in the penalty is noise, and it may not even move the rounded score.
Trend line: the score over a series of commits is a chart you can ship in a status report.

The trade-off is that the score is not comparable across projects. A score of 88 in a 5-file prototype and a score of 88 in a 200-file production app represent very different levels of effort. Compare within a project, not across them.

Worked example

Say a project has:

1 error of watch-without-cleanup
2 warns of no-em-dash-in-str (two occurrences of the same rule, so the second one decays)
1 info of prefer-script-setup-for-new-files

Then:


watch-without-cleanup:   5.0 × 1.0  =  5.0
no-em-dash-in-str:       2.0 × (1.0 + 0.707)  =  3.41
prefer-script-setup:     0.5 × 1.0  =  0.5
                                              ----
                              total:            8.91

score = max(0, round(100 − 8.91)) = 91
grade = B
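The worked example can be checked mechanically. This sketch is self-contained and uses a hypothetical `Finding` shape and `penalty` function. It assumes `Math.round` for "round".

```typescript
// Recomputes the worked example. The Finding shape and function names are
// hypothetical; "round" is assumed to be Math.round.

type Severity = "error" | "warn" | "info";

interface Finding {
  ruleId: string;
  severity: Severity;
}

const WEIGHTS: Record<Severity, number> = { error: 5, warn: 2, info: 0.5 };

function penalty(findings: Finding[]): number {
  // Count findings per rule id, remembering each rule's severity.
  const groups: Record<string, { severity: Severity; count: number }> = {};
  for (const f of findings) {
    if (groups[f.ruleId]) groups[f.ruleId].count += 1;
    else groups[f.ruleId] = { severity: f.severity, count: 1 };
  }
  // Apply weight × 1/√(i + 1) to each occurrence of each rule.
  let total = 0;
  for (const id of Object.keys(groups)) {
    const { severity, count } = groups[id];
    for (let i = 0; i < count; i++) total += WEIGHTS[severity] / Math.sqrt(i + 1);
  }
  return total;
}

const findings: Finding[] = [
  { ruleId: "watch-without-cleanup", severity: "error" },
  { ruleId: "no-em-dash-in-str", severity: "warn" },
  { ruleId: "no-em-dash-in-str", severity: "warn" },
  { ruleId: "prefer-script-setup-for-new-files", severity: "info" },
];

const total = penalty(findings);
const score = Math.max(0, Math.round(100 - total));
console.log(total.toFixed(2), score); // 8.91 91
```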

The first em-dash costs 2.0; the second (same rule) costs 2.0 × 0.707 = 1.41. If a third em-dash appeared, it would cost 2.0 × 0.577 = 1.15. The decay is gentle but real.

What a regression looks like

A single commit that makes a rule fire for the first time (not a repeat of an already-firing one) pays the full weight for that first appearance. If the diff adds 3 new warn rules with different ids, the penalty rises by 3 × 2.0 = 6.0, a 6-point regression.

This is intentional: a new pattern in the diff is a signal that the diff is doing something the existing ruleset didn't anticipate. The team should look at it.

If the same diff instead adds 3 more occurrences of a warn rule that was already firing 3 times (occurrences i = 0..2), the new occurrences are i = 3..5 and the cost is:

2.0 × (1/√(i+1) for i in 3..5)
= 2.0 × (0.5 + 0.447 + 0.408)
= 2.0 × 1.355
= 2.71

That is roughly a 2.7-point regression for the same number of new warnings, against 6.0 for three new rules: much softer.
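Both diffs can be priced as the difference in a rule's total cost before and after the change. This is a sketch assuming `warn` weight 2. `ruleCost` is an illustrative helper.

```typescript
// Compares the two diffs from this section by the penalty they add.
// Assumption: both involve `warn` rules (weight 2); names are illustrative.

const WARN = 2;

// Total cost of `count` occurrences of one rule: weight × Σ 1/√(i + 1).
function ruleCost(weight: number, count: number): number {
  let total = 0;
  for (let i = 0; i < count; i++) total += weight / Math.sqrt(i + 1);
  return total;
}

// Diff A: three warn rules that were not firing before, one finding each.
const threeNewRules = 3 * (ruleCost(WARN, 1) - ruleCost(WARN, 0));

// Diff B: three more findings of a warn rule already firing three times,
// i.e. occurrences i = 3, 4, 5.
const threeMoreRepeats = ruleCost(WARN, 6) - ruleCost(WARN, 3);

console.log(threeNewRules.toFixed(2), threeMoreRepeats.toFixed(2)); // 6.00 2.71
```

The score is rounded to an integer, so the visible drop for diff B is 2 or 3 points, depending on where the penalty started.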

Threshold and `--fail-on`

Two related but distinct exit-code signals:

--fail-on <level> exits non-zero if any finding is at or above the given severity. This is a count gate.
--threshold <n> exits non-zero if the score is below n. This is a score gate.

A typical setup:

doctor.config.ts

export default defineConfig({
  failOn: 'error',
  threshold: 80,
});

This means: "block the merge if there's a single error finding or if the score drops below 80." The two signals are independent. You can be at score 95 with an error finding (gate fails), or at score 75 with only warns (gate also fails).

For pre-commit, drop the threshold to 50 and set failOn: 'error'. For PR CI, the config above works. For release branches, raise the threshold to 90 and set failOn: 'warn'.
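Here is a sketch of how the two signals might combine into one pass/fail decision, using the three setups above. `GateConfig` mirrors the `failOn` and `threshold` keys from `doctor.config.ts`. The type, the `gateFails` function and the info < warn < error ordering are assumptions for illustration, not the doctor's actual exit-code logic.

```typescript
// Sketch of combining the count gate and the score gate. GateConfig mirrors
// the failOn/threshold keys in doctor.config.ts; everything else here is
// hypothetical and assumes the ordering info < warn < error.

type Severity = "info" | "warn" | "error";

interface GateConfig {
  failOn: Severity;   // count gate: any finding at or above this severity fails
  threshold: number;  // score gate: a score below this fails
}

const RANK: Record<Severity, number> = { info: 0, warn: 1, error: 2 };

function gateFails(config: GateConfig, score: number, severities: Severity[]): boolean {
  const countGate = severities.some((s) => RANK[s] >= RANK[config.failOn]);
  const scoreGate = score < config.threshold;
  return countGate || scoreGate; // the two gates are independent
}

// The three setups suggested above.
const preCommit: GateConfig = { failOn: "error", threshold: 50 };
const prCi: GateConfig = { failOn: "error", threshold: 80 };
const release: GateConfig = { failOn: "warn", threshold: 90 };

console.log(gateFails(prCi, 95, ["error"]));  // true: error finding
console.log(gateFails(prCi, 75, ["warn"]));   // true: score below 80
console.log(gateFails(release, 92, ["warn"])); // true: warn trips failOn
```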

Edge cases

Zero findings: penalty = 0, score = 100, grade = A.
All info: each info finding costs at most 0.5, so a penalty of 50 takes thousands of findings on a single rule (or about 100 distinct info rules). However extreme the count, the score floors at 0.
Empty diff (--diff with no changes): the doctor exits 0 with no report.

The score is clamped to [0, 100]. It is never negative and never above 100.
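These edge cases can be explored with a short sketch. It clamps the score and counts how many findings on a single `info` rule it takes to push the score down. It assumes `Math.round` for "round". The helper names are hypothetical.

```typescript
// Explores the edge cases. Assumptions: "round" is Math.round and all
// findings belong to one `info` rule (weight 0.5); names are hypothetical.

const INFO_WEIGHT = 0.5;

// Clamped to [0, 100]. The penalty is never negative, so in practice only
// the lower bound can bind.
function clampedScore(penalty: number): number {
  return Math.min(100, Math.max(0, Math.round(100 - penalty)));
}

// Smallest number of findings on a single rule that pushes the score
// down to `targetScore` or below.
function findingsToReachScore(weight: number, targetScore: number): number {
  let penalty = 0;
  let n = 0;
  while (clampedScore(penalty) > targetScore) {
    penalty += weight / Math.sqrt(n + 1); // cost of occurrence i = n
    n += 1;
  }
  return n;
}

console.log(clampedScore(0));                        // zero findings: 100
console.log(findingsToReachScore(INFO_WEIGHT, 50));  // a few thousand
console.log(findingsToReachScore(INFO_WEIGHT, 0));   // about ten thousand
```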
