
[STUDY] When an AI Judges Your Writing: Grammarly vs ZeroGPT on 160 Samples

Written by Shadab Sayeed
March 06, 2026


Imagine turning in a paper you wrote yourself… and a tool insists it “looks AI.” That one number can affect a grade, a scholarship, or your confidence. So I tested two popular detectors—Grammarly’s AI detector and ZeroGPT—across 160 text samples to see how they behave when the stakes are real.

What I Measured (and What the Score Actually Means)

Each tool outputs a human score between 0 and 1. Higher means “more likely written by a human.” For readability, you can think of 0.85 as roughly 85% human-like.

The dataset includes 78 human-written samples and 82 AI-generated samples. Every sample went through both tools, and I recorded their scores.


Quick scoreboard (higher is better on human text, lower is better on AI text)

  • Human-written text: Grammarly averaged 98.96% human, while ZeroGPT averaged 89.26% human.
  • AI-generated text: ZeroGPT averaged 23.11% human (good: it’s skeptical), while Grammarly averaged 57.83% human (risky: it often “trusts” AI).

The Big Question for Students: “Will it falsely accuse me?”

AI detectors are often used as if they gave a yes/no verdict. To simulate that, I used a simple cut-off: if the score is ≥ 0.5, the tool says “human”; if it’s below 0.5, it says “AI.”

Two mistakes can happen: false accusation (a human piece gets flagged as AI), and AI slip-through (AI text gets treated as human).
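
In code, that cut-off is a one-liner. Here’s a minimal sketch (my own illustration, not Grammarly’s or ZeroGPT’s internal logic):

```python
# Minimal sketch of the yes/no decision rule used in this study.
# Not either vendor's actual code; just the cut-off described above.
def verdict(human_score: float, cutoff: float = 0.5) -> str:
    """Label a 0-1 'human score' as 'human' or 'AI' at a fixed cut-off."""
    return "human" if human_score >= cutoff else "AI"

print(verdict(0.85))  # -> human
print(verdict(0.23))  # -> AI
```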


What happened with the 0.5 cut-off

  • False accusations (Human → flagged AI): ZeroGPT did this 7/78 times (9.0%). Grammarly did it 0/78 times (0.0%).
  • AI slip-through (AI → treated Human): ZeroGPT missed 19/82 AI samples (23.2%). Grammarly missed 41/82 AI samples (50.0%).

So the trade-off is clear: Grammarly is “gentle” on human writers but lets a lot of AI pass. ZeroGPT catches more AI but is more likely to wrongly doubt a real student.
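
For the curious, those rates come from simple counting. Here is a sketch of the calculation; the scores and labels below are placeholders, not the real 160 samples:

```python
# Count both error types at the 0.5 cut-off.
# Placeholder data; the actual study had 78 human and 82 AI samples.
scores   = [0.97, 0.12, 0.55, 0.31]    # detector's human score per sample
is_human = [True, False, True, False]  # ground truth per sample

false_accusations = sum(1 for s, h in zip(scores, is_human) if h and s < 0.5)
slip_throughs     = sum(1 for s, h in zip(scores, is_human) if not h and s >= 0.5)

n_human = sum(is_human)
n_ai    = len(is_human) - n_human
print(f"False accusation rate: {false_accusations / n_human:.1%}")
print(f"AI slip-through rate:  {slip_throughs / n_ai:.1%}")
```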

Chart 1: The Average Scores Tell a Story

Averages are not everything, but they’re a helpful first clue. This bar chart compares the average score each tool gave to human-written vs AI-generated text.


[Chart 1: bar chart comparing average human scores for ZeroGPT vs Grammarly on human-written and AI-generated text]
Average human score by tool. Higher on “Human-written” is better; lower on “AI-generated” is better.
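
If you want to redraw Chart 1 yourself, a minimal matplotlib sketch using the averages reported above:

```python
# Rebuild Chart 1 from the reported averages (% "human").
import matplotlib.pyplot as plt

tools     = ["ZeroGPT", "Grammarly"]
human_avg = [89.26, 98.96]  # average score on human-written text
ai_avg    = [23.11, 57.83]  # average score on AI-generated text

x, width = range(len(tools)), 0.35
plt.bar([i - width / 2 for i in x], human_avg, width, label="Human-written")
plt.bar([i + width / 2 for i in x], ai_avg, width, label="AI-generated")
plt.xticks(list(x), tools)
plt.ylabel("Average human score (%)")
plt.legend()
plt.show()
```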

Chart 2: Where the Scores Spread Out (and Where They Overlap)

To see consistency, I used a box plot (a simple chart that shows the “middle chunk” of scores and how far they spread). When boxes overlap a lot, the tool has a harder time separating human and AI.

[Chart 2: box plot of score distributions for ZeroGPT and Grammarly on human-written and AI-generated text]
Score spread by tool. Wider overlap usually means more ambiguous decisions.
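
A box plot like this takes only a few lines of matplotlib; the score lists below are placeholders, not the study data:

```python
# Sketch of Chart 2: one box per (tool, text type) group.
import matplotlib.pyplot as plt

groups = {  # placeholder scores, not the real dataset
    "ZeroGPT\nhuman":   [0.92, 0.97, 0.60, 0.99],
    "ZeroGPT\nAI":      [0.10, 0.30, 0.20, 0.55],
    "Grammarly\nhuman": [0.99, 1.00, 0.98, 0.97],
    "Grammarly\nAI":    [0.40, 0.70, 0.90, 0.30],
}
fig, ax = plt.subplots()
ax.boxplot(list(groups.values()))
ax.set_xticklabels(groups.keys())
ax.set_ylabel("Human score")
plt.show()
```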

Notice the pattern: Grammarly’s scores are packed very close to 1.0 for human text, which is great, but it also gives many AI samples surprisingly high scores. ZeroGPT separates the two groups more cleanly, but a few genuinely human samples score worryingly low.

Chart 3: Error Rates (The Part Students Actually Feel)

If you only remember one chart, make it this one. It turns the scores into “real-world mistakes” at the 0.5 cut-off.

[Chart 3: bar chart of error rates at the 0.5 threshold for ZeroGPT and Grammarly]
ZeroGPT makes more false accusations; Grammarly misses more AI.

Chart 4: Do the Tools Agree With Each Other?

This scatter plot shows each sample as a dot: the x-axis is ZeroGPT’s score and the y-axis is Grammarly’s score. If both tools judged a text similarly, dots would cluster near the diagonal line.

[Chart 4: scatter plot of ZeroGPT vs Grammarly human scores for each sample]
Each point is one sample. Dots far from the diagonal are “disagreements.”
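
Sketching your own agreement plot is straightforward; the paired scores below are placeholders:

```python
# Sketch of Chart 4: each point pairs one sample's two scores.
import matplotlib.pyplot as plt

zerogpt   = [0.95, 0.10, 0.60, 0.30]  # placeholder scores
grammarly = [0.99, 0.55, 0.98, 0.45]  # placeholder scores

plt.scatter(zerogpt, grammarly)
plt.plot([0, 1], [0, 1], linestyle="--")  # perfect-agreement diagonal
plt.xlabel("ZeroGPT human score")
plt.ylabel("Grammarly human score")
plt.show()
```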

So Which Detector Is “Better”?

It depends on the risk you care about.

If your priority is protecting students from false accusations: Grammarly was the safer tool in this dataset (0% false accusations at the 0.5 cut-off). But the downside is huge: it treated half of the AI samples as “human,” which makes it weak as an enforcement tool.

If your priority is catching AI text more often: ZeroGPT was more skeptical of AI overall and missed fewer AI samples. However, it still falsely flagged about 9.0% of genuine human samples—meaning real students can get dragged into explaining themselves.

Optional nerd note (kept simple): I also computed a single “separation score” called AUC. Think of it as: “Across every possible cut-off, how well can the tool rank human texts above AI texts?” ZeroGPT scored 0.871 vs Grammarly’s 0.808 (higher is better).
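
If you want the same AUC number for your own scores, scikit-learn computes it in one call (the labels and scores below are placeholders):

```python
# AUC = chance that a random human sample outranks a random AI sample.
from sklearn.metrics import roc_auc_score

is_human = [1, 0, 1, 0, 1]             # 1 = human-written, 0 = AI
scores   = [0.9, 0.2, 0.8, 0.6, 0.95]  # detector's human score

print(roc_auc_score(is_human, scores))  # 1.0 here: perfect separation
```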

What Students Should Do If a Detector Is Used Against Them

Detectors are not proof. If you want protection, keep evidence of your writing process:

  • Save drafts (Google Docs version history counts).
  • Keep your outline and sources.
  • Write a short reflection: what you changed and why.
  • If you used tools like Grammarly for grammar fixes, note that clearly.

Deep Dive Screenshots: What the Tools Look Like

Grammarly AI Detector Examples
[Screenshots of the Grammarly AI detector]

ZeroGPT Detector Examples
[Screenshots of the ZeroGPT detector]

The Bottom Line

In this 160-sample test, Grammarly’s detector was less likely to falsely accuse a human writer—but it also let a lot of AI pass as human. ZeroGPT did a better job spotting AI overall, but it sometimes doubted genuine writing.

If you’re a student, the most important takeaway is this: a detector score should start a conversation, not end one. Keep your drafts, document your process, and treat any single number as a clue—not a verdict.

About the Author

Shadab Sayeed
CEO & Founder · DecEptioner

Shadab is the CEO of DecEptioner — a developer, programmer, and seasoned content writer all at once. His path into the online world began as a freelancer, but everything changed when a close friend received an 'F' for a paper he'd spent weeks writing by hand, because his professor was convinced it was AI-generated.

Refusing to accept that, Shadab investigated and found that even archived Wikipedia and New York Times articles were being flagged as "AI-written" by popular detectors. That settled it. After months of building, DecEptioner launched: a tool built to defend writers who've been wrongly accused. Today he spends his days improving the platform, his nights writing for clients, still driven by that same moment.
