[STUDY] Can Pangram Detect Undetectable AI?

If you are a student using AI tools, the real question is not whether a humanizer can swap words around. The real question is this: can it fool a detector without making the writing worse? To find out, we ran 100 Undetectable AI rewrites through Pangram and recorded how human each one looked. The results were impressive on the surface, but a closer look shows a much messier story.

What We Tested

Each sample paired an original passage with its Undetectable AI rewrite. The rewritten version was then checked in Pangram. Because detector tools usually return an AI score, we converted each result into a human score instead. In this article, that means a score of 100 looks fully human to Pangram, while a score of 0 looks fully AI.
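For readers who want to reproduce the scoring, here is a minimal sketch of the conversion. It assumes the human score is simply the complement of a 0 to 100 AI score; the article does not spell out the exact formula, and the function name to_human_score is our own.

```python
def to_human_score(ai_score: float) -> float:
    """Convert a 0-100 AI-likelihood score into a 0-100 human score.

    Assumption: the conversion is a plain complement (100 - AI score).
    """
    if not 0 <= ai_score <= 100:
        raise ValueError("ai_score must be between 0 and 100")
    return 100 - ai_score


print(to_human_score(0))    # fully human on the article's scale
print(to_human_score(100))  # fully AI on the article's scale
```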

This matters because a high score may look like a clean win. But detection is only one part of the story. Students also care about whether the final writing still sounds natural, keeps its meaning, and stays readable.

What stood out right away

Average Pangram human score: 72.0/100
Median score: 100/100 (the median is the middle score when all 100 samples are sorted)
Perfect 100s: 61 out of 100 samples
Very strong passes: 70 samples scored 90 or above
Clear failures: 25 samples scored 10 or below
Average length change: the rewrites were 22.2% longer than the originals

Pangram Often Bought the Rewrite — But Not Every Time

The first chart shows the big picture. Pangram did not respond in a smooth, gradual way. Instead, the scores were heavily split. A large group of samples landed at 100, while another chunk collapsed into the very low range. In simple terms, Undetectable AI often looked extremely human to Pangram, but when it failed, it tended to fail hard.

Bar chart of Pangram score bands across 100 Undetectable AI rewrites

Score bands across the 100-sample test. The biggest single group landed at a perfect 100 human score.

The threshold view makes that split even clearer. 70% of the rewrites scored 90 or higher on the human scale, and 61% scored a perfect 100. That is a strong sign that Undetectable AI can often push text past Pangram. At the same time, 25% of the samples still scored 10 or below, which means the bypass was far from guaranteed.

Bar chart showing how often Undetectable AI rewrites reached common Pangram score thresholds

Undetectable AI cleared Pangram’s higher score ranges surprisingly often, but a meaningful failure group remained.

One more detail is important here: the median score was 100, but the average was only 72.0. That gap tells us the results were not balanced. The tool produced many big wins, but the misses dragged the overall average down.
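To see how a median of 100 and an average of 72 can sit side by side, here is a small sketch. The scores are synthetic, not the study's raw data: the group sizes follow the reported summary (61 perfect scores, 70 at 90 or above, 25 at 10 or below), while the individual values 95, 39, and 2 are made up so that the average comes out to 72.

```python
from statistics import mean, median

# Synthetic scores, NOT the study's raw data. Group sizes match the reported
# summary; the values 95, 39, and 2 are invented for illustration only.
scores = [100] * 61 + [95] * 9 + [39] * 5 + [2] * 25


def share_at_least(values, threshold):
    """Percentage of values at or above the threshold."""
    return 100 * sum(v >= threshold for v in values) / len(values)


print("mean:", mean(scores))
print("median:", median(scores))
print("share scoring 90 or above:", share_at_least(scores, 90))
print("share scoring 100:", share_at_least(scores, 100))
```

Because more than half of the samples sit at exactly 100, the middle of the sorted list is 100 no matter how low the failures go. The average, by contrast, is pulled down by every low score.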

The Hidden Cost: The Rewrites Usually Got Longer

Passing a detector is not the same as writing well. One of the clearest patterns in the dataset is that Undetectable AI often expanded the text instead of simply refining it. The average original sample was about 185 words. The average rewrite climbed to about 231 words.

That is not a tiny change. More than half of the dataset (54 samples) became at least 20% longer, and 12 samples grew by 50% or more. For students, this matters because longer writing is not always better writing. It can become more repetitive, more padded, and less direct.
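A length change like this can be measured with a few lines of code. The sketch below assumes words are counted by splitting on whitespace, which may differ from the counting method used for the dataset; the function names are ours. Note that the rounded averages quoted above (about 185 and 231 words) imply an increase of roughly 25%, while the reported 22.2% is presumably the average of the per-sample changes, and those two figures do not have to match.

```python
def word_count(text: str) -> int:
    """Count whitespace-separated words (an assumed counting method)."""
    return len(text.split())


def pct_change(before: float, after: float) -> float:
    """Percentage change from before to after."""
    if before <= 0:
        raise ValueError("before must be positive")
    return (after - before) / before * 100


def length_change(original: str, rewrite: str) -> float:
    """Percentage change in word count between an original and its rewrite."""
    return pct_change(word_count(original), word_count(rewrite))


# Change implied by the rounded averages quoted in the article.
print(round(pct_change(185, 231), 1))  # 24.9
```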

Bar chart showing how much longer or shorter the rewrites became compared with the originals

Most rewrites expanded the source text instead of keeping it tight.

Did that extra length help the bypass? Not really. The scatter plot below shows almost no relationship between added length and Pangram’s final score. The dots are spread all over the chart, and the correlation is only 0.06, which is very close to no linear relationship at all.

Scatter plot comparing word-count change and Pangram human score

In plain English: making the rewrite longer did not automatically make it look more human to Pangram.
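For anyone who wants to run the same check on their own samples, here is a minimal sketch of a correlation calculation. It assumes the 0.06 figure is a Pearson correlation between word-count change and human score; the article does not say which correlation measure was used. The sample numbers are toy values, not the study's data.

```python
from math import sqrt


def pearson_r(xs, ys):
    """Pearson correlation coefficient between two equal-length sequences."""
    if len(xs) != len(ys) or len(xs) < 2:
        raise ValueError("need two sequences of equal length, at least 2 items")
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    if var_x == 0 or var_y == 0:
        raise ValueError("correlation is undefined when one sequence is constant")
    return cov / sqrt(var_x * var_y)


# Toy values only: length change in percent, and a made-up score for each.
length_changes = [1, 2, 3, 4]
toy_scores = [1, -1, -1, 1]
print(pearson_r(length_changes, toy_scores))
```

A value near 1 or -1 means the two measures move together in a straight-line way. A value near 0, like the one reported here, means knowing the length change tells you almost nothing about the score.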

What the Scores Missed: Problems Inside the Rewrites

This is where the test gets interesting. A detector score can say “human,” but the text itself may still look odd to an actual reader. During review of the 100-sample dataset, several recurring rewrite problems showed up.

Broken openings and sentence fragments. Some rewrites started with damaged text such as “The them to over heat...”, “N of millions...”, or “L soon enough...”. These are obvious quality problems that a student or teacher would notice immediately.
Merged-word glitches. In a few samples, words were jammed together, creating artifacts like givenAn, behaviorFear, and offeredSince.
Duplicate-word errors. A few rewrites repeated words in a visibly awkward way, including “high high,” “low low,” and “The The Moon.”
Formatting damage. Some numbered headings were preserved badly, producing lines like “1. your target audience:” or “Advantages of Fixed Deposits for Financial Stability.The”
Meaning drift. A number of rewrites added filler, changed emphasis, or inserted details that were not clearly present in the original. That does not always create a direct contradiction, but it can still weaken accuracy.
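Some of these problems are mechanical enough to flag automatically. The sketch below is a rough illustration and not the review method used for this dataset: the patterns and the function name flag_artifacts are our own assumptions, and they only cover merged words, repeated words, and missing spaces after punctuation. Broken openings and meaning drift still need a human reader.

```python
import re

# Lowercase run, then a capital in the middle of a word (e.g. "givenAn").
# Brand names such as "iPhone" will also match, so treat hits as candidates.
MERGED_WORD = re.compile(r"\b[a-z]+[A-Z][a-z]+\b")

# The same word twice in a row (e.g. "high high"). Legitimate repeats such as
# "that that" will also match.
DUPLICATE_WORD = re.compile(r"\b(\w+)\s+\1\b", re.IGNORECASE)

# Sentence punctuation followed directly by a capital (e.g. "Stability.The").
MISSING_SPACE = re.compile(r"[a-z][.!?][A-Z]")


def flag_artifacts(text: str) -> dict:
    """Return candidate rewrite glitches found in text, grouped by type."""
    return {
        "merged_words": MERGED_WORD.findall(text),
        "duplicate_words": [m.group(0) for m in DUPLICATE_WORD.finditer(text)],
        "missing_spaces": MISSING_SPACE.findall(text),
    }


sample = "The The Moon was givenAn orbit of high high value for Stability.The end"
print(flag_artifacts(sample))
```

A check like this only surfaces candidates. Every hit should still be read in context before it is counted as an error.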

What is striking is that some of these flawed outputs still scored extremely well in Pangram. In other words, a perfect human score did not always mean a clean rewrite. That is a key takeaway for students: detector success and writing quality are not the same thing.

Bar chart of rewrite problems spotted during review

These counts are based on visible issues found during review. Some categories overlap, so one sample can appear in more than one bucket.

One especially important example: the dataset included at least one case where the rewrite was effectively unchanged, and Pangram gave it a 0 human score. That is a reminder that simple paraphrasing shortcuts do not reliably work.

What the Interfaces Looked Like

The screenshots below help connect the numbers to real examples. The first gallery shows Undetectable AI rewriting text. The second shows Pangram’s decisions on different rewritten samples. Together, they make the bigger point: some outputs sail through, some do not, and the “successful” ones are not always the strongest writing.

Undetectable AI rewrite examples 1 to 5

Pangram result examples 1 to 6

The Final Take

Undetectable AI was often effective at bypassing Pangram in this 100-sample test. A perfect 100 human score appeared in 61% of the dataset, and 70% of the rewrites scored 90 or above. That is not a small result.

But the win comes with a catch. The rewrites were usually longer, often more padded, and sometimes visibly damaged by broken phrasing, formatting issues, or meaning drift. So the real lesson is not just that Undetectable AI can beat Pangram fairly often. It is that beating a detector is a much lower bar than producing strong writing.

For students, that distinction matters. A detector might be fooled, but a reader still has eyes. And in the long run, clear thinking and clean writing are harder to fake than a score.

Shadab Sayeed