[SHORT STUDY] How accurate is Turnitin? - An Independent Analysis

As we all know, Turnitin’s AI detector has made life miserable for plenty of students. But is it really accurate, or is it simply overhyped? The short answer: it’s pretty reliable when it says a text is AI-written. The longer answer: the devil is in the details. Keep reading to find out.

Before I dive in, I want to share my personal experience. In my own use, I’ve always felt that Turnitin keeps its false-positive rate very low and takes deliberate steps to keep it that way. For example, it won’t even show you an AI score if it is below 20%, which wasn’t the case previously. This suggests that, unlike some other AI detectors, Turnitin isn’t just after the money and openly accepts that its detector is not 100% accurate.

So, let’s jump into the nitty-gritty details of how Turnitin’s AI detection performed in a data-backed test at a 50% threshold.

1. The Dataset and the “50% Threshold” Rule

  • We tested Turnitin’s AI detector on a total of 160 texts. Out of these:
    • 82 were actually AI-written (by GPT or some other large language model).
    • 78 were actually Human-written.
  • Prediction rule (sketched in code after this list):
    • If Turnitin’s AI Score ≥ 50%, label the text “AI”.
    • Otherwise, label it “Human”.
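
In code, the decision rule is just a comparison against the percentage Turnitin reports. Here is a minimal Python sketch; the `label_text` function and the sample scores are purely illustrative and have nothing to do with Turnitin’s actual software.

```python
def label_text(ai_score: float) -> str:
    """Apply the study's decision rule to a Turnitin AI score (0-100)."""
    return "AI" if ai_score >= 50 else "Human"

print(label_text(63))  # -> AI
print(label_text(18))  # -> Human
```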

Also Read: Does Scribbr AI use Turnitin's AI detection?

2. Overall Performance (50% Threshold)

  • Accuracy: 82.5% – 132 out of 160 were correctly labeled.
  • Precision (PPV) on AI: 98.2% – of AI-labeled texts, 98.2% were truly AI.
  • Recall (Sensitivity) on AI: 67.1% – Turnitin caught about two-thirds of actual AI texts.
  • Specificity (True Negative Rate): 98.7% – almost 99% of real human text was correctly labeled “Human.”
  • False Positive Rate: 1.3% – only 1 human text out of 78 was wrongly flagged.
  • False Negative Rate: 32.9% – about one in three AI texts slipped through.
  • F1 score for AI class: 0.7971 – balances precision and recall.
  • Balanced Accuracy: 0.829 – average detection rate for AI and human texts.
  • ROC AUC: 0.8736 – raw scores separate AI vs. human quite well.
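
All of these figures except ROC AUC follow directly from the confusion-matrix counts reported in the next section (77 true negatives, 27 false negatives, 1 false positive, 55 true positives). Here is a minimal Python sketch that reproduces them; ROC AUC needs the raw per-text scores, so it is not recomputed here.

```python
# Counts from the confusion matrix in section 3.
tp, fn = 55, 27   # AI texts: correctly flagged vs. missed
tn, fp = 77, 1    # Human texts: correctly cleared vs. wrongly flagged

accuracy     = (tp + tn) / (tp + tn + fp + fn)                # 0.825
precision    = tp / (tp + fp)                                 # ~0.982
recall       = tp / (tp + fn)                                 # ~0.671
specificity  = tn / (tn + fp)                                 # ~0.987
fpr          = fp / (fp + tn)                                 # ~0.013
fnr          = fn / (fn + tp)                                 # ~0.329
f1           = 2 * precision * recall / (precision + recall)  # ~0.797
balanced_acc = (recall + specificity) / 2                     # ~0.829

print(f"Accuracy {accuracy:.3f}, Precision {precision:.3f}, Recall {recall:.3f}, "
      f"F1 {f1:.4f}, Balanced accuracy {balanced_acc:.3f}")
```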

3. Confusion Matrix (50% Threshold)

                   Ground Truth: Human    Ground Truth: AI
Predicted Human             77                   27
Predicted AI                 1                   55

Key points:
• 1 out of 78 human texts was wrongly flagged as AI (a false positive).
• 27 out of 82 AI texts were missed and labeled “Human” (false negatives).

4. Methodology

We took the “Written By” column as ground truth, used Turnitin’s raw AI percentage, and applied a single 50% cutoff to label AI vs. Human. No extra calibrations or special post-processing were performed.
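
For readers who want to replicate this, the whole pipeline fits in a few lines of pandas. The file name and the "AI Score" column below are hypothetical stand-ins for however you export the results; only the "Written By" column name comes from the study description.

```python
import pandas as pd

# Hypothetical export of the study data: one row per text, with a
# "Written By" ground-truth column and Turnitin's raw "AI Score" (0-100).
df = pd.read_csv("turnitin_results.csv")

THRESHOLD = 50  # single cutoff, no calibration or post-processing

df["Predicted"] = df["AI Score"].apply(
    lambda score: "AI" if score >= THRESHOLD else "Human"
)

# Confusion matrix: predictions vs. the "Written By" ground truth.
print(pd.crosstab(df["Predicted"], df["Written By"]))
```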

Also Read: Can Turnitin detect ChatGPT in another language?

5. Main Takeaways

  • Turnitin is very conservative at the 50% threshold—rarely accuses humans wrongly.
  • It misses about one in every three AI-generated texts.
  • High precision (98.2%) means flagged AI is almost certainly AI.
  • Lower recall (67.1%) shows many AI texts slip by.
  • ROC AUC of 0.8736 suggests room to adjust thresholds for different goals.

6. Practical Implications

  • If you hate false accusations, the 50% threshold is a sweet spot.
  • To catch more AI, consider lowering the threshold (e.g., 30% or 20%), but expect more false positives; the sketch after this list shows how to explore that trade-off.
  • Combine Turnitin’s score with drafts, style analysis, or formatting clues for thorough checks.
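
If you have the per-text scores and ground-truth labels (the same hypothetical CSV layout as in the methodology sketch above), scikit-learn makes the precision/recall trade-off easy to inspect. This is only a sketch of the general technique, not something Turnitin provides.

```python
import pandas as pd
from sklearn.metrics import precision_recall_curve, roc_auc_score

df = pd.read_csv("turnitin_results.csv")          # hypothetical export, as above
y_true = (df["Written By"] == "AI").astype(int)   # 1 = AI, 0 = Human
scores = df["AI Score"] / 100.0                   # Turnitin's raw percentage

# Precision/recall at every possible cutoff: lower thresholds catch more
# AI (higher recall) at the cost of more false positives (lower precision).
precision, recall, thresholds = precision_recall_curve(y_true, scores)
for p, r, t in zip(precision, recall, thresholds):
    print(f"threshold {t:.2f}: precision {p:.3f}, recall {r:.3f}")

print("ROC AUC:", roc_auc_score(y_true, scores))
```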

7. Limitations and Context

  • Dataset of 160 texts—results may vary with different essays or lengths.
  • Assumed “Written By” labels were accurate.
  • No advanced calibrations or custom fine-tuning were used.

8. Suggested Citation

Analysis of Turnitin AI Detector on a 160-text dataset using a 50% AI-score threshold. Accuracy 82.5%, Precision 98.2%, Recall 67.1%, Specificity 98.7%, ROC AUC 0.874; 1.3% false positive rate and 32.9% false negative rate.

Frequently Asked Questions

Q1. How accurate is Turnitin’s AI detection at 50% threshold?

At a 50% threshold, overall accuracy is 82.5%—132 out of 160 texts were correctly labeled.

Q2. Why does Turnitin not show my AI score sometimes?

They don’t show the AI score if it’s below 20%, to keep false positives to a minimum.

Q3. If Turnitin flags me as AI, is it always correct?

Almost. Precision is 98.2%, so flagged AI is highly likely to be truly AI.

Q4. Can Turnitin detect all AI content?

No. The false negative rate is 32.9%, meaning about one in three AI texts slip through at this threshold.

Q5. Does Turnitin falsely detect AI?

It depends on the threshold at which you treat a text as AI-written. At a 50% threshold, as in this study, only 1 out of 78 human samples was falsely flagged, so the probability of Turnitin wrongly detecting your text as AI-written is very low.

A Single Opinion (That Might Help You)

If your primary goal is to avoid falsely accusing genuine writers, Turnitin’s conservative threshold is a decent pick. Yes, it misses some AI text, but that’s often better than scaring away original writers.

The Bottom Line

Turnitin shines in minimizing false positives—only 1 out of 78 human texts was wrongly flagged. However, about 1 in 3 AI-generated texts was missed. Students writing their own work can rest somewhat easy. Instructors wanting to catch every AI sample should consider adjusting thresholds or using additional methods. AI detection is still evolving, so expect ongoing fine-tuning from Turnitin.