
I Made AI Grade Its Own Writing and Refuse to Hand Me Anything Below 8 Out of 10

The six-dimension rubric I built that makes AI score its own drafts and refuse to hand back anything below 8 out of 10, with the prompt file to run it yourself.

“Make it better” is the most useless instruction you can give an AI. Say it and you get back a clean, lifeless draft, every time. Here is the six-dimension rubric I built that makes the AI grade its own writing and refuse to show me anything under an 8.

The Problem

I publish something almost every day. A blog post, a newsletter, a stack of social posts on top of that. Most of those start as an AI draft, because writing every one from scratch isn’t a thing one person has the hours for. The problem was the editing loop. I’d read a draft, decide it was fine, and type “make this better.” Then I’d get back something that was also fine. Cleaner sentences, same flat draft. “Make it better” isn’t an instruction. It’s a vibe, and an AI reads a vibe by smoothing everything into competent mush. I was still doing all the real editing by hand, except now I was doing it twice.

The Workflow

Here’s what I built. It’s a text file and a loop, not software.

  1. Wrote the standard down. Six dimensions, the things that actually make a post good: hook, specificity, my voice, anti-slop (does it read like a human or like AI), teaching payload, and the call to action. Each one scored 1 to 10. The part that matters is the anchor. Every dimension has a written definition at 3, 6, and 9, so the score isn’t a guess. A 9 on “hook” means something specific. Without the anchors, “8 out of 10” is just the model’s mood that pass.

  2. Set a boundary instead of a target. The instruction isn’t “make it good.” It’s “revise until every dimension hits 8, or stop after 5 passes and hand it back with the scorecard.” A boundary tells the model two things a target can’t: keep going, and also when to quit and escalate. “Make it better” never ends. “Get every score to 8 or stop at five tries” ends.

  3. Made it fix the weak thing, not rewrite everything. Each pass, it finds the lowest-scoring dimension and revises that one specifically, then leaves the rest alone. If the hook is a 5 and everything else is an 8, it fixes the hook. That’s how I actually edit. I don’t burn down a draft that’s 80% there, I fix the one line that’s dragging it.

  4. Added four hard gates that override the score. A single em dash fails the whole piece. So do lifestyle flexing, anything I wouldn’t want my CEO reading, and raw real-time venting. It doesn’t matter if every dimension scored a 9. Trip one hard gate and it does not ship. The gates get checked before the scores even get read.

  5. Wired it in as a gate, not a suggestion. It’s Step 7 in my content process now. Every channel runs through it before anything gets scheduled. Not “you could score this if you want.” Nothing schedules until it’s scored.
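As a rough illustration, the loop in steps 2 through 4 could be sketched in Python like this. It is a sketch under assumptions, not the setup described here, which is a text file and a prompt. `score` and `revise` are hypothetical stand-ins for whatever model calls you would make, and counting a hard-gate fix as one of the five passes is a choice made for the sketch.

```python
from typing import Callable, Dict, List, Tuple

THRESHOLD = 8        # every dimension must reach this
MAX_PASSES = 5       # the boundary: stop revising and escalate after this
EM_DASH = "\u2014"   # hard gate 1; written as an escape on purpose

Scores = Dict[str, int]


def failed_gates(draft: str) -> List[str]:
    """Return the hard gates this draft fails.

    Only the em dash gate is mechanical. The other three need judgment,
    so this sketch leaves them to the model or to a human.
    """
    return ["em dash"] if EM_DASH in draft else []


def run_loop(
    draft: str,
    score: Callable[[str], Scores],      # hypothetical: asks the model for a scorecard
    revise: Callable[[str, str], str],   # hypothetical: asks the model to fix one target
) -> Tuple[str, Scores, str]:
    """Revise until every score clears THRESHOLD or MAX_PASSES is used up.

    Returns (draft, most recent scorecard, status). Status is "cleared"
    or "escalate".
    """
    scores: Scores = {}
    for revisions in range(MAX_PASSES + 1):
        gates = failed_gates(draft)
        if gates:
            # Gates are checked before any score is read.
            target = "hard gate: " + gates[0]
        else:
            scores = score(draft)
            # Fix only the weakest dimension and leave the rest alone.
            target = min(scores, key=lambda name: scores[name])
            if scores[target] >= THRESHOLD:
                return draft, scores, "cleared"
        if revisions == MAX_PASSES:
            break  # out of passes: hand back the scorecard instead of shipping
        draft = revise(draft, target)
    return draft, scores, "escalate"
```

The design point is the two exits. "cleared" only happens when the lowest score is at or above the threshold with no gate tripped, and "escalate" is guaranteed after five revisions, so the loop cannot run forever.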

What Broke

The loop works exactly as designed, and that’s the problem. It scores itself up every pass and lands on all-8s, every time. Hand it anything, wait two or three passes, get back a clean post that clears every dimension. The trap is in what it can’t see. It grades the six things written on the card and goes completely quiet on the one thing I never managed to write down: does this actually land for the person I’m writing for, framed the way I’d frame it. I write for independent solo professionals, the consultant or the fractional exec running an entire business alone. The rubric doesn’t know that reader. So it hands me a technically perfect post aimed about fifteen degrees off from the human I’m trying to reach, and I’m still the one who reframes it by hand. A rubric only enforces what you put in it. Relatability to a specific person is the part I haven’t figured out how to encode, so it stays my job.

The Result

First live run, I fed it a flat draft I’d written about cleaning up an email list. Robot Wayne, start to finish. The hook scored a 4. Two passes later, every dimension was at 8 or above, and the post actually had a pulse. I didn’t touch it between those two scores. The loop found the weak line, rewrote it, scored it again, and kept going until it cleared.

The real result isn’t the two passes though. It’s that the system now rejects its own work before it reaches me. I used to get a first draft and a job to go with it: babysit this up to publishable. Now I get a draft that’s already cleared the bar on the six things a rubric can catch, and the only work left is the one thing it can’t, which happens to be the part I’m actually good at. Same amount of me in the final post. A lot less of me getting it there.

The Prompt

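Here’s the actual rubric, lightly trimmed. Most dimensions show only their 9 anchor here, so write your own 3 and 6 for each before you rely on the scores. Drop it into whatever AI tool you already use, hand it a draft, and tell it to run the loop.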

You are scoring a draft of my writing before I'm allowed to publish it.

THE BOUNDARY
Revise until every dimension below scores 8 or higher, OR you've done 5
passes, whichever comes first. Each pass, fix the single lowest-scoring
dimension and leave the rest alone. At 5 passes without all-8s, stop and hand
it back with the scorecard and the weakest dimension named. Never ship under
threshold quietly.

HARD GATES (auto-fail, override every score, fix before scoring continues)
1. An em dash anywhere. Replace with a period, comma, or rewrite.
2. Lifestyle or income flexing, fake urgency, fabricated results.
3. Anything I wouldn't want my CEO reading.
4. Real-time raw venting. Retrospective ("here's what I learned") is fine;
   "I'm drowning right now" is not.

THE SIX DIMENSIONS (score each 1-10, with a written anchor at 3/6/9)
1. Hook - does the first line make someone stop? (9 = a flat-told specific
   that creates a question the reader needs answered.)
2. Specificity - real tools, real numbers, real named failures? (9 = the
   detail does the persuading, not the adjectives.)
3. Voice - is the actual human on the page, or cleaned-up AI? (9 = at least
   one sarcastic or self-deprecating beat that advances the point.)
4. Anti-slop - clean of AI tells? (Deduct for: delve, crucial, pivotal,
   underscore, "it's not X it's Y," rule-of-three padding, "-ing" tails.)
5. Teaching payload - can the reader do or decide something after reading?
   (9 = a transferable move they can use Monday.)
6. CTA - one earned, channel-right close? Never three asks.

SCORECARD (emit every pass)
PASS n
1 Hook .......... x/10 - reason
2 Specificity ... x/10 - reason
3 Voice ......... x/10 - reason
4 Anti-slop ..... x/10 - flags, or "clean"
5 Teaching ...... x/10 - reason
6 CTA ........... x/10 - reason
Hard gates: PASS/FAIL (which)
Lowest: <dimension> -> next pass targets this

The trick isn’t the six categories. It’s the written anchor under each one. “Hook: 9” means nothing until you’ve defined what a 3, a 6, and a 9 actually look like. Once you do, “8 out of 10” is a measurement instead of a feeling.
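Because the model is grading itself, it can help to check the emitted scorecard mechanically rather than trust a claim that everything cleared. The Python sketch below parses the scorecard format from the prompt and applies the threshold. The function names and the verdict strings are made up for illustration, and it assumes the model follows the scorecard layout closely, which is not guaranteed.

```python
import re
from typing import Dict, Optional, Tuple

# Matches lines such as "1 Hook .......... 4/10 - reason"
SCORE_LINE = re.compile(
    r"^\s*\d+\s+([A-Za-z-]+)\s*\.+\s*(\d{1,2})/10", re.MULTILINE
)
# Matches "Hard gates: PASS" or "Hard gates: FAIL (which)"
GATES_LINE = re.compile(r"^\s*Hard gates:\s*(PASS|FAIL)", re.MULTILINE)


def parse_scorecard(text: str) -> Tuple[Dict[str, int], Optional[bool]]:
    """Pull dimension scores and the hard-gate result out of one scorecard.

    gates_ok is None when no "Hard gates:" line was found.
    """
    scores = {name: int(value) for name, value in SCORE_LINE.findall(text)}
    gate = GATES_LINE.search(text)
    gates_ok = None if gate is None else gate.group(1) == "PASS"
    return scores, gates_ok


def verdict(text: str, threshold: int = 8) -> str:
    """Decide whether a scorecard clears the bar, gates first."""
    scores, gates_ok = parse_scorecard(text)
    if len(scores) != 6 or gates_ok is None:
        return "unreadable scorecard"
    if not gates_ok:
        return "hard gate failed"
    lowest = min(scores, key=lambda name: scores[name])
    if scores[lowest] < threshold:
        return f"below threshold: {lowest} at {scores[lowest]}"
    return "cleared"
```

Paste the final scorecard into `verdict` before scheduling anything. A scorecard that is missing a dimension or the gates line comes back as "unreadable scorecard" rather than passing by default.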
