Is CoPilot Detectable by ZeroGPT? - The Surprising Truth!

As we all know, CoPilot is GitHub's AI-based code-suggestion system. But is it easily detectable by ZeroGPT? The short answer is yes. The longer answer is that the devil lies in the details. Keep reading to find out more.

Why can ZeroGPT detect CoPilot?

The simple answer: like any other AI generator, CoPilot was never designed to evade AI detectors such as ZeroGPT, so there is no reason to expect it to slip past them. ZeroGPT, meanwhile, explicitly advertises AI detection, so it is bound to pick up CoPilot's output.

ZeroGPT returns two things: a categorical label (“AI” or “Human”) and a numeric score. It labels text as “AI” if it thinks the text was likely generated by AI, and “Human” if it thinks a person wrote it. The numeric score, meanwhile, is supposed to represent how “human-like” the text is. But it’s not that simple, as you’ll see below.

What’s in the data

We tested ZeroGPT on a dataset of 100 texts: half written by CoPilot, half by humans. The goal was to check how often ZeroGPT labels them correctly, and whether the numeric “ZeroGPT Score” can actually help separate CoPilot output from real human writing.
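In sketch form, the evaluation loop looks like the following. Note that the helper names and the toy length-based classifier here are illustrative assumptions, not ZeroGPT's real API:

```python
# Sketch of the evaluation setup. The `evaluate` helper and the toy
# classifier below are hypothetical stand-ins for illustration only.
def evaluate(texts, labels, classify):
    """Fraction of texts where the classifier's label matches ground truth.

    texts    : list of strings
    labels   : list of "AI" / "Human" ground-truth labels
    classify : callable returning "AI" or "Human" for a text
    """
    correct = sum(1 for t, y in zip(texts, labels) if classify(t) == y)
    return correct / len(texts)

# Toy stand-in classifier: flags long texts as "AI". Real detectors
# are far more sophisticated; this only demonstrates the harness.
texts = ["short human note", "a much longer machine-generated passage here"]
labels = ["Human", "AI"]
accuracy = evaluate(texts, labels, lambda t: "AI" if len(t) > 20 else "Human")
print(accuracy)  # → 1.0
```

With 100 texts and the real ZeroGPT labels plugged in, this harness produces the accuracy figure reported below.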

Key Findings

  1. ZeroGPT’s categorical label vs. Ground Truth
    • Accuracy: 82% (ZeroGPT was correct 82 times out of 100 texts.)
    • Precision (AI): 0.776 (Out of all texts labeled AI, about 77.6% really were AI/CoPilot.)
    • Recall (AI): 0.900 (Out of all the AI texts, it caught about 90% of them.)
    • F1 (AI): 0.833 (A combined score of precision and recall; the closer to 1, the better.)

    The confusion matrix was [[37, 13], [5, 45]]: 37 human texts correctly recognized as human (true negatives), 13 human texts flagged as AI (false positives), 5 AI texts flagged as human (false negatives), and 45 AI texts flagged as AI (true positives). So ZeroGPT is quite good at catching CoPilot texts (90% recall), but it also wrongly flags 13 out of 50 human texts as AI.

  2. ZeroGPT’s numeric “human-score” is paradoxical
    • Mean human-score for CoPilot texts: 86.2
    • Mean human-score for Human texts: 32.4

    Essentially, it’s the reverse of what you’d expect. Judging by the numeric score alone, you’d mistakenly conclude that CoPilot is more human-like than actual people.

  3. Thresholding the numeric score for detection is unreliable

    If you treat “AI-likelihood” as (100 − ZeroGPT’s human-score), it doesn’t track the ground truth at all. The ROC curve had an AUC of around 0.12, far worse than random guessing (0.5). So do not rely on that numeric score alone to detect CoPilot.
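The paradox is easy to reproduce with a ranking-based AUC, which is the probability that a randomly chosen AI text gets a higher AI-likelihood than a randomly chosen human text. A minimal sketch, using toy human-scores that mirror the averages above (not the study's raw data):

```python
# Ranking-based AUC: fraction of (AI, human) pairs where the AI text's
# AI-likelihood exceeds the human text's (ties count as half).
def auc(ai_scores, human_scores):
    wins = sum(
        1.0 if a > h else 0.5 if a == h else 0.0
        for a in ai_scores
        for h in human_scores
    )
    return wins / (len(ai_scores) * len(human_scores))

# Toy human-scores mirroring the paradox: CoPilot texts score around 86,
# human texts around 32, so (100 - score) ranks AI texts *below* humans.
ai_likelihood_ai = [100 - s for s in (86, 90, 80, 88)]     # 14, 10, 20, 12
ai_likelihood_human = [100 - s for s in (32, 25, 40, 30)]  # 68, 75, 60, 70
print(auc(ai_likelihood_ai, ai_likelihood_human))  # → 0.0 (worse than chance)
```

Because every CoPilot text in this toy sample gets a lower AI-likelihood than every human text, the AUC collapses to 0.0; the real data's 0.12 reflects the same inversion with some overlap.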

Which tool or method is made to do this task perfectly?

Right now, none. ZeroGPT tries, but as you can see, it isn’t perfect. If you want to rely on ZeroGPT for detecting CoPilot, stick to its categorical label; otherwise, you might need to build your own specialized classifier. The numeric “human-score” is misleading, since it ironically rates CoPilot as more “human.”

Frequently Asked Questions

Q1. Does ZeroGPT really detect CoPilot?

Yes, ZeroGPT detects CoPilot with about 82% accuracy in our dataset. It has a high recall of 90% for AI texts, meaning it flags most CoPilot texts correctly as “AI.”

Q2. What is the numeric “human-score,” and why is it contradictory?

ZeroGPT’s numeric score is intended to show how human- or AI-like a text is. However, it gave an average of 86.2 for CoPilot and 32.4 for humans—exactly the opposite of expectations.

Q3. Why do we care about precision and recall?

Precision tells you how often ZeroGPT is correct when it says “AI.” Recall tells you how many of the true AI texts it actually catches. Balancing both is what lets you trust the detector at scale.
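These definitions can be checked directly against the confusion matrix reported in the findings. A minimal sketch recomputing the headline numbers from [[37, 13], [5, 45]]:

```python
# Recompute accuracy, precision, recall, and F1 from the confusion
# matrix [[TN, FP], [FN, TP]] reported above ("AI" is the positive class).
cm = [[37, 13], [5, 45]]  # [[human→human, human→AI], [AI→human, AI→AI]]
tn, fp = cm[0]
fn, tp = cm[1]

accuracy = (tp + tn) / (tp + tn + fp + fn)
precision = tp / (tp + fp)  # of texts labeled "AI", how many truly were
recall = tp / (tp + fn)     # of true AI texts, how many were caught
f1 = 2 * precision * recall / (precision + recall)

print(accuracy, round(precision, 3), round(recall, 3), round(f1, 3))
# → 0.82 0.776 0.9 0.833
```

The output matches the figures in the Key Findings section, which is a quick sanity check that the reported matrix and metrics are consistent with each other.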

Q4. Can I rely on the numeric “human-score” to separate AI from human text?

No. Based on our findings, it’s misleading. Use the categorical label or train your own model.

Q5. Is using CoPilot cheating?

Not automatically. Think of CoPilot as an advanced autocomplete that learns from public code and text. You still need to review, understand, and test everything it outputs. If your course or company prohibits AI tools, then using it would count as cheating, so always check your instructor’s or employer’s policy first. Also note that ZeroGPT only analyzes natural language, not code logic: it won’t catch that you used CoPilot to write code, though AI-generated comments or docs might get flagged.

Why the discrepancy?

It may be that ZeroGPT’s scoring mechanism wasn’t designed with CoPilot’s style in mind. Many AI detectors are trained on output from large language models like ChatGPT; CoPilot’s output may show different patterns that ironically look “human” to their system.

The Bottom Line

CoPilot is indeed detectable by ZeroGPT’s categorical label (82% overall accuracy and 90% recall for AI). However, you should not rely on the numeric “human-score”: it’s reversed, with CoPilot texts receiving a higher human-score than real human texts. If you absolutely need to detect CoPilot, use ZeroGPT’s label (and expect some false positives) or build a custom classifier. For casual use, the label is probably enough, but don’t let that numeric “human-score” fool you!