Can GPTZero Detect GPT-5? The Surprising Results!

Shadab Sayeed

AI Writing August 16, 2025

Can GPTZero Detect GPT-5? The Surprising Results!

As we all know, GPTZero is one of the more popular AI detectors out there, but the question remains: can GPTZero detect GPT-5? The short answer is YES. The longer answer is the devil lies in the details. Keep reading to know more about it.

Why GPTZero Can Detect GPT-5?

The simple answer is GPTZero seems to have been trained on many types of AI outputs, so detecting GPT-5 is not all that difficult for it. They never advertise bypassing GPT-5 detection as a feature on their website or any marketing material that you come across, so it is clearly built to figure out if something is AI or not. Hence, GPTZero is pretty good at spotting GPT-5 output.

Dataset at a Glance

Because OpenAI keep changing the names of the models in their website, and even their Android app, which confused a lot of people, we now use the models using the API, as they don’t have confusing names like the web or the app.
Just a note, these models are the ones on the website/app, with whatever name they’re using now.

Please note that gpt-5, gpt-5-mini and gpt-5-nano, have a dropdown for ‘reasoning effort’ with options of ‘minimal’, ‘low’, ‘medium’, ‘high’, we used the default, which is ‘medium’ when collecting these samples. As for gpt-5-chat-latest, it’s the only non-reasoning model, we used it’s default temperature & top_p, which is 1.

We ran 202 passages through GPTZero, all generated by four GPT-5 variants (gpt-5, gpt-5-chat-latest, gpt-5-mini, gpt-5-nano). GPTZero gave each passage a “human-written” probability from 0 to 100 (0 = certainly AI, 100 = certainly human). We treated scores ≥ 50 as “predicted human” (a miss in this all-AI set) and < 50 as “predicted AI” (a hit).

Here’s what we got:

Correct AI detections (hits): 195 / 202
Misses (flagged as human): 7 / 202
Accuracy: 96.5%
False-negative rate: 3.5%

In simpler terms, if GPTZero sees 100 GPT-5 outputs, it identifies about 96 or 97 of them as AI. You can see these counts in the first bar-chart rendered above (though we can’t literally display it here - imagine a bar-chart with two big bars).

Also Read: Can ZeroGPT detect GPT5?

Variation Across GPT-5 Variants

Now let’s see how GPTZero performed on different GPT-5 models. The second bar-chart (again, imagine it) reveals some interesting patterns.

GPT-5 Variant	Samples	Misclassified (Pred Human)	Accuracy
gpt-5	45	4	0.911
gpt-5-chat-latest	52	1	0.981
gpt-5-mini	52	1	0.981
gpt-5-nano	53	1	0.981

As you can see, the biggest GPT-5 model is slightly harder for GPTZero to correctly identify, missing roughly 1 in 11 passages. The smaller or chat variants, on the other hand, are detected nearly 98% of the time, which is quite reliable.

What Does a 3.5% False-Negative Rate Mean?

It basically means that out of these 202 AI-generated passages, about 7 got flagged as human. In other words, GPTZero wasn’t entirely sure they are AI. If you rely on it 100% and strictly believe everything it says, you might incorrectly assume those 7 passages were human. That’s 3.5% overall, or about 9% for the largest, more sophisticated GPT-5 variant. It’s still not perfect, but it’s pretty darn close.

Also Read: Can Turnitin Detect GPT5?

Key Take-aways

GPTZero is pretty reliable (≈ 96.5% detection success) against GPT-5 output.
Larger, more advanced GPT-5 variants can fly under the radar a small fraction of time (≈ 1 in 11).
Always add genuine human samples before deciding a policy. You need to know false positives (when it flags a real human as AI) or you’re in for trouble.
Layering multiple signals - like style analysis, metadata checks, or watermarking—remains advisable if you are dealing with high-stakes decisions.

A Quick Note on Statistics

We often talk about “accuracy,” “false-negative rate,” and “misses,” but here is the gist:

Accuracy: The percentage of AI texts that GPTZero correctly labeled as AI.
Misses: The ones it thought were human, even though they were AI.
False-negative rate: The proportion of AI texts that slipped through as “human.”

In everyday language, it just means how likely GPTZero is to catch GPT-5 outputs without messing up too often.

One Single Opinion

In my humble opinion, GPTZero is good enough to act as a first-line filter, especially if all you want to do is get a quick sense if a text is AI. However, if it’s a formal or high-stakes scenario, do not rely on it blindly. You might still want to combine GPTZero with watermarking or manual checks to be extra sure.

The Bottom Line

GPTZero flags 96.5% of GPT-5’s outputs as AI, which is definitely not bad. But just like any detector, it is not 100% foolproof. Larger GPT-5 variants do give it a tougher time. So, either develop a robust system that integrates multiple signals to verify an essay’s authenticity, or do it manually if you are really worried about those tricky 3–9% that might slip by. AI detectors are still evolving, so it’s a cat and mouse game that will only get interesting over time.