scoring logic v2: the score finally earns the number

a bad score is annoying. a fake-good score is worse.

that was the problem with the old voice score. you could paste a draft with obvious ai writing patterns, five things that still needed fixing, and a voice that did not really match the profile. somehow the number could still sit around 70 or 80. not because the writing earned it. because the scoring logic was too forgiving in the wrong places.

it made the product feel confused. the issue cards were saying "fix this." the score was saying "mostly fine." users were right to ask what was going on.

[image: scoring logic v2 comparison showing an old 78 score beside the new four-bucket score model]
the new score is built from the draft itself: brand voice match, ai-pattern cleanliness, remaining fix load, and writing strength.

what changed

scoring logic v2 is a rebuild of the premise. the score does not start at a polite middle number and drift up or down. it earns points from evidence in the draft.

brand voice match: 40 points
ai-pattern cleanliness: 25 points
remaining fix load: 20 points
writing strength: 15 points

that split is intentional. writing quality still matters, but it cannot rescue a draft that sounds like ai or ignores the active brand voice. a polished generic paragraph should not score like a clean on-profile draft. that is the whole point of the product.
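
to make the split concrete, here is a minimal sketch of a score that earns points from zero. the bucket maximums are the ones listed above. everything else (the `BUCKET_MAX` table, the `total_score` function, the idea that each bucket reports a 0.0 to 1.0 share) is an assumption for illustration, not the product's actual code.

```python
# bucket maximums come from the post; the names and structure are hypothetical
BUCKET_MAX = {
    "brand_voice_match": 40,
    "ai_pattern_cleanliness": 25,
    "remaining_fix_load": 20,
    "writing_strength": 15,
}


def total_score(fractions: dict[str, float]) -> int:
    """Sum the points each bucket earns.

    `fractions` maps a bucket name to the share of that bucket earned (0.0 to 1.0).
    A bucket with no evidence earns nothing, so the score starts at 0,
    not at a polite middle number.
    """
    total = 0.0
    for name, max_points in BUCKET_MAX.items():
        share = min(1.0, max(0.0, fractions.get(name, 0.0)))  # clamp to [0, 1]
        total += share * max_points
    return round(total)


# a polished but generic draft: full marks for writing strength, weak elsewhere
polished_generic = total_score({
    "writing_strength": 1.0,
    "brand_voice_match": 0.25,
    "ai_pattern_cleanliness": 0.2,
    "remaining_fix_load": 0.5,
})
print(polished_generic)  # 40
```

in this sketch, perfect writing strength contributes at most 15 points, so it cannot lift a draft that earns little in the other three buckets.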

brand voice now carries real weight

the biggest bucket is brand voice match. that means the score looks at the active profile first: the voice keywords, sentence length, rhythm variance, paragraph shape, argument pattern, opening moves, and never-list words when a v2 profile has them.
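
two of those signals, sentence length and rhythm variance, are easy to picture in code. this is a rough sketch under my own assumptions: the `rhythm_features` function, the naive sentence splitter, and the choice of population variance are all hypothetical, and the real profile comparison may measure these differently.

```python
import re
import statistics


def rhythm_features(text: str) -> dict[str, float]:
    """Measure average sentence length and how much it varies, in words.

    Sentences are split naively on ., ! and ? (an assumption; abbreviations
    and decimals would be split incorrectly).
    """
    sentences = [s for s in re.split(r"[.!?]+\s*", text) if s.strip()]
    lengths = [len(s.split()) for s in sentences]
    if not lengths:
        return {"mean_sentence_length": 0.0, "rhythm_variance": 0.0}
    return {
        "mean_sentence_length": statistics.fmean(lengths),
        "rhythm_variance": statistics.pvariance(lengths),
    }


flat = rhythm_features("a bad score is annoying. a fake-good score is worse.")
varied = rhythm_features("Short. This one is a much longer sentence.")
print(flat)    # two 5-word sentences: mean 5.0, variance 0
print(varied)  # 1 and 7 words: mean 4.0, variance 9
```

a draft could then be compared against the same numbers stored on the active profile, with a larger gap earning fewer brand voice points.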

this matters because "good writing" and "your writing" are not the same thing. a draft can be grammatically clean and still sound like the wrong person. v2 makes that drift expensive.

ai patterns are no longer a side note

the old score deducted for ai patterns, but it did not punish clusters hard enough. one cliche opener is one problem. four formulaic phrases in a short email is a different problem. v2 counts both the number of ai patterns and their density.
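
here is one way counting both could look. treat it as a sketch: the function name `ai_cleanliness_points`, the per-100-words density measure, and the two penalty weights are assumptions i picked for illustration. only the 25-point bucket size comes from the model above.

```python
def ai_cleanliness_points(pattern_count: int, word_count: int, max_points: int = 25) -> float:
    """Points earned for ai-pattern cleanliness (hypothetical formula).

    Penalizes the raw number of patterns and their density per 100 words,
    so a cluster in a short draft costs more than the same count spread
    across a long one.
    """
    if pattern_count <= 0:
        return float(max_points)
    density = pattern_count / max(word_count, 1) * 100  # patterns per 100 words
    count_penalty = 3.0 * pattern_count   # assumed weight
    density_penalty = 4.0 * density       # assumed weight
    return max(0.0, max_points - count_penalty - density_penalty)


one_opener_long_post = ai_cleanliness_points(pattern_count=1, word_count=400)
four_in_short_email = ai_cleanliness_points(pattern_count=4, word_count=80)
four_in_long_article = ai_cleanliness_points(pattern_count=4, word_count=800)
print(one_opener_long_post, four_in_short_email, four_in_long_article)
```

with these made-up weights, four patterns in an 80-word email score lower than the same four patterns in an 800-word article, which is the behavior described above.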

that is why an ai-heavy sample in the stress test landed at 30 instead of pretending to be a passable 78. the score is lower because the evidence is worse. that is not harsh. that is honest.

the score improves when the user fixes the issues

this was the most important part for me. if the issue panel says there are red and yellow cards left, the score now reflects that. when the user accepts a fix, edits a line, or clears an issue, the draft is rescored from the issues that remain visible.

in the stress test, the same clean on-profile draft moved like this as issues disappeared: 66, 69, 72, 86, 91. that is what the score should do. not jump randomly. not stay stuck. move because the draft actually got better.
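
the rescoring idea can be sketched for the fix-load bucket alone. the `Issue` class, the severity costs, and `fix_load_points` are hypothetical, and the numbers this produces are not the stress-test numbers above. it only shows the shape: points are recomputed from whatever is still open.

```python
from dataclasses import dataclass

# assumed cost per open card; only the 20-point bucket size comes from the post
SEVERITY_COST = {"red": 6, "yellow": 3}
FIX_LOAD_MAX = 20


@dataclass
class Issue:
    label: str
    severity: str          # "red" or "yellow", like the cards in the issue panel
    resolved: bool = False


def fix_load_points(issues: list[Issue]) -> int:
    """Recompute the bucket from the issues that are still open."""
    open_cost = sum(SEVERITY_COST[i.severity] for i in issues if not i.resolved)
    return max(0, FIX_LOAD_MAX - open_cost)


issues = [
    Issue("cliche opener", "red"),
    Issue("formulaic transition", "red"),
    Issue("off-profile word", "yellow"),
    Issue("flat closing line", "yellow"),
]

history = [fix_load_points(issues)]
for issue in issues:
    issue.resolved = True                    # user accepts a fix or clears the card
    history.append(fix_load_points(issues))  # rescore from what remains

print(history)  # [2, 8, 14, 17, 20]
```

because the bucket is always derived from the open issues rather than adjusted incrementally, it cannot drift out of sync with the issue panel.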

writing strength still matters, but it is smaller on purpose

there is still a writing-strength bucket. it looks at signal quality, mechanics, and format fit: specificity, a clear ask in an email, readable sentence rhythm, and whether the draft fits the selected format.

but it is only 15 points. that is the right size. a strong hook should help. it should not hide five ai patterns.

why this version feels better

i do not want a score that flatters the user. i want a score that explains the work left.

if a pasted draft starts at 45, that is not a product failure. it is useful feedback if the score also shows why: too many ai tells, too many open fixes, not enough profile match. the number should give you a path, not a mood.

scoring logic v2 is not perfect. no scoring system is. but it is now grounded in the things the product is actually asking you to fix. that is the bar. if the draft gets cleaner, the score goes up. if it still sounds generic, the score says so.

that is the whole improvement.