When a Detector Judges Your Writing: Grammarly vs ZeroGPT on 160 Samples

Imagine turning in a paper you wrote yourself… and a tool insists it “looks AI.” That one number can affect a grade, a scholarship, or your confidence. So I tested two popular detectors—Grammarly’s AI detector and ZeroGPT—across 160 text samples to see how they behave when the stakes are real.

What I Measured (and What the Score Actually Means)

Each tool outputs a human score between 0 and 1. Higher means “more likely written by a human.” For readability, you can think of 0.85 as roughly 85% human-like.

The dataset includes 78 human-written samples and 82 AI-generated samples. Every sample went through both tools, and I recorded their scores.

Quick scoreboard (higher is better on Human text, lower is better on AI text)

Human-written text: Grammarly averaged 98.96% human, while ZeroGPT averaged 89.26% human.
AI-generated text: ZeroGPT averaged 23.11% human (good: it’s skeptical), while Grammarly averaged 57.83% human (risky: it often “trusts” AI).

The Big Question for Students: “Will it falsely accuse me?”

AI detectors are often used like a yes/no decision. To simulate that, I used a simple cut-off: if the score is ≥ 0.5, the tool says “human”; if it’s below 0.5, it says “AI.”

Two mistakes can happen: false accusation (a human piece gets flagged as AI), and AI slip-through (AI text gets treated as human).
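
To make the rule concrete, here is a minimal Python sketch of the 0.5 cut-off and the two kinds of mistake. The function names and the five toy scores are invented for illustration; they are not part of the 160-sample dataset.

```python
def label(score, cutoff=0.5):
    """Return 'human' if the human score is at or above the cut-off, else 'AI'."""
    return "human" if score >= cutoff else "AI"


def error_counts(samples, cutoff=0.5):
    """Count both kinds of mistake for a list of (true_label, score) pairs."""
    false_accusations = sum(
        1 for truth, score in samples
        if truth == "human" and label(score, cutoff) == "AI"
    )
    slip_throughs = sum(
        1 for truth, score in samples
        if truth == "AI" and label(score, cutoff) == "human"
    )
    return false_accusations, slip_throughs


# Hypothetical scores, made up for this example only.
toy_samples = [
    ("human", 0.97),
    ("human", 0.42),  # human text that gets falsely accused
    ("AI", 0.10),
    ("AI", 0.50),     # exactly 0.5 counts as "human", so this slips through
    ("AI", 0.81),     # slips through
]

false_accusations, slip_throughs = error_counts(toy_samples)
print(false_accusations, slip_throughs)
```

Note that a score of exactly 0.5 lands on the "human" side, because the rule uses ≥ rather than >.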

What happened with the 0.5 cut-off

False accusations (Human → flagged AI): ZeroGPT did this 7/78 times (9.0%). Grammarly did it 0/78 times (0.0%).
AI slip-through (AI → treated Human): ZeroGPT missed 19/82 AI samples (23.2%). Grammarly missed 41/82 AI samples (50.0%).
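
The percentages above follow directly from the counts. The short sketch below redoes that arithmetic and also derives an overall "share of correct calls" for each tool, which is not reported in the article and is only meaningful here because the two classes are nearly the same size. Variable and function names are illustrative.

```python
HUMAN_TOTAL = 78  # human-written samples
AI_TOTAL = 82     # AI-generated samples

# tool -> (false accusations, AI slip-throughs) at the 0.5 cut-off, as reported
reported = {"ZeroGPT": (7, 19), "Grammarly": (0, 41)}


def rates(false_accusations, slip_throughs):
    """Return (false accusation rate, slip-through rate, share of correct calls)."""
    false_rate = false_accusations / HUMAN_TOTAL
    miss_rate = slip_throughs / AI_TOTAL
    correct = (HUMAN_TOTAL - false_accusations) + (AI_TOTAL - slip_throughs)
    return false_rate, miss_rate, correct / (HUMAN_TOTAL + AI_TOTAL)


for tool, (fa, st) in reported.items():
    false_rate, miss_rate, correct_share = rates(fa, st)
    print(f"{tool}: false accusations {false_rate:.1%}, "
          f"slip-through {miss_rate:.1%}, correct calls {correct_share:.1%}")
```

On these counts ZeroGPT gets 134 of 160 calls right and Grammarly 119 of 160, roughly 84% versus 74%.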

So the trade-off is clear: Grammarly is “gentle” on human writers but lets a lot of AI pass. ZeroGPT catches more AI but is more likely to wrongly doubt a real student.

Chart 1: The Average Scores Tell a Story

Averages are not everything, but they’re a helpful first clue. This bar chart compares the average score each tool gave to human-written vs AI-generated text.

Bar chart comparing average human scores for ZeroGPT vs Grammarly across human-written and AI-generated text

Average human score by tool. Higher on “Human-written” is better. Lower on “AI-generated” is better.

Chart 2: Where the Scores Spread Out (and Where They Overlap)

To see consistency, I used a box plot (a simple chart that shows the “middle chunk” of scores and how far they spread). When boxes overlap a lot, the tool has a harder time separating human and AI.

Box plot showing score distributions for ZeroGPT and Grammarly on human-written and AI-generated text

Score spread by tool. Wider overlap usually means more ambiguous decisions.
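
For readers who want to see what a box plot summarizes, here is a rough sketch using Python's standard library. It computes the five numbers behind a box (minimum, lower quartile, median, upper quartile, maximum) and checks whether two "middle chunks" overlap. The scores are invented for illustration, and plotting libraries may use a slightly different quartile convention.

```python
from statistics import quantiles


def box_summary(scores):
    """Five-number summary: the values a box plot is drawn from."""
    q1, median, q3 = quantiles(scores, n=4, method="inclusive")
    return {"min": min(scores), "q1": q1, "median": median,
            "q3": q3, "max": max(scores)}


def boxes_overlap(a, b):
    """True if the middle-50% ranges (q1 to q3) of two groups share any values."""
    return a["q1"] <= b["q3"] and b["q1"] <= a["q3"]


# Hypothetical human scores for one tool, made up for this example.
toy_human = [0.70, 0.85, 0.90, 0.95, 1.00]
toy_ai = [0.05, 0.20, 0.40, 0.75, 0.90]

human_box = box_summary(toy_human)
ai_box = box_summary(toy_ai)
print(human_box)
print(ai_box)
print("boxes overlap:", boxes_overlap(human_box, ai_box))
```

In this toy case the boxes do not overlap even though the highest AI score (0.90) sits above the lowest human score (0.70), which is why the whiskers and outliers matter as well as the box.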

Notice the pattern: Grammarly’s scores are packed very close to 1.0 for human text, which is great, but it also gives many AI samples surprisingly high scores. ZeroGPT separates the two groups more cleanly, but a few of its human samples drop low.

Chart 3: Error Rates (The Part Students Actually Feel)

If you only remember one chart, make it this one. It turns the scores into “real-world mistakes” at the 0.5 cut-off.

Bar chart comparing error rates at a 0.5 threshold for ZeroGPT and Grammarly

ZeroGPT makes more false accusations; Grammarly misses more AI.

Chart 4: Do the Tools Agree With Each Other?

This scatter plot shows each sample as a dot: the x-axis is ZeroGPT’s score and the y-axis is Grammarly’s score. If both tools judged a text similarly, dots would cluster near the diagonal line.

Scatter plot comparing ZeroGPT and Grammarly human scores for each sample

Each point is one sample. Dots far from the diagonal are “disagreements.”
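
One simple way to put a number on "agreement" is sketched below: the share of samples where both tools land on the same side of the 0.5 cut-off, plus the average gap between their scores. This is an illustrative measure and not something the article reports; the four score pairs are invented.

```python
def agreement(pairs, cutoff=0.5):
    """pairs: list of (zerogpt_score, grammarly_score) for the same samples.

    Returns (share of samples with the same verdict, mean absolute score gap).
    """
    same_verdict = sum(
        1 for z, g in pairs if (z >= cutoff) == (g >= cutoff)
    )
    mean_gap = sum(abs(z - g) for z, g in pairs) / len(pairs)
    return same_verdict / len(pairs), mean_gap


# Hypothetical (ZeroGPT, Grammarly) score pairs, made up for this example.
toy_pairs = [(0.9, 1.0), (0.2, 0.3), (0.1, 0.9), (0.6, 0.8)]

same_share, mean_gap = agreement(toy_pairs)
print(f"same verdict: {same_share:.0%}, mean gap: {mean_gap:.2f}")
```

The third pair (0.1 vs 0.9) is the kind of dot that would sit far from the diagonal in the scatter plot.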

So Which Detector Is “Better”?

It depends on the risk you care about.

If your priority is protecting students from false accusations: Grammarly was the safer tool in this dataset (0 of 78 human samples flagged at the 0.5 cut-off). But the downside is huge: it treated half of the AI samples (41 of 82) as “human,” which makes it weak as an enforcement tool.

If your priority is catching AI text more often: ZeroGPT was more skeptical of AI overall and missed fewer AI samples (19 of 82, versus Grammarly’s 41). However, it still falsely flagged 9.0% of genuine human samples (7 of 78), meaning real students can get dragged into explaining themselves.

Optional nerd note (kept simple): I also computed a single “separation score” called AUC. Think of it as: “Across every possible cut-off, how well can the tool rank human texts above AI texts?” A score of 1.0 means perfect separation and 0.5 means no better than a coin flip. ZeroGPT scored 0.871 vs Grammarly’s 0.808 (higher is better).
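
For the curious, the ranking idea behind AUC can be sketched in a few lines: compare every human sample with every AI sample and count how often the human one gets the higher human score (ties count as half). This is the usual pairwise definition, not necessarily the exact routine behind the 0.871 and 0.808 figures, and the scores below are invented.

```python
def auc(human_scores, ai_scores):
    """Chance that a random human text outscores a random AI text (ties = 0.5)."""
    wins = 0.0
    for h in human_scores:
        for a in ai_scores:
            if h > a:
                wins += 1.0
            elif h == a:
                wins += 0.5
    return wins / (len(human_scores) * len(ai_scores))


# Hypothetical human scores, made up for this example.
toy_human = [0.9, 0.8, 0.4]
toy_ai = [0.1, 0.5, 0.8]

print(round(auc(toy_human, toy_ai), 3))
```

Because it looks at rankings across all pairs, AUC does not depend on where the cut-off is placed, which is why it can favor ZeroGPT overall even though Grammarly made no false accusations at 0.5.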

What Students Should Do If a Detector Is Used Against Them

Detectors are not proof. If you want protection, keep evidence of your writing process:

Save drafts (Google Docs version history counts).
Keep your outline and sources.
Write a short reflection: what you changed and why.
If you used tools like Grammarly for grammar fixes, note that clearly.

Deep Dive Screenshots: What the Tools Look Like

Grammarly AI Detector Examples

Grammarly Check Result 1

Grammarly Check Result 2

Grammarly Check Result 3

Grammarly Check Result 4

ZeroGPT Detector Examples

ZeroGPT Check Result 1

ZeroGPT Check Result 2

ZeroGPT Check Result 3

ZeroGPT Check Result 4

The Bottom Line

In this 160-sample test, Grammarly’s detector was less likely to falsely accuse a human writer—but it also let a lot of AI pass as human. ZeroGPT did a better job spotting AI overall, but it sometimes doubted genuine writing.

If you’re a student, the most important takeaway is this: a detector score should start a conversation, not end one. Keep your drafts, document your process, and treat any single number as a clue—not a verdict.