AI Text Detection: How Accurate Are AI Detection Tools & Can We Bypass Them?

by | Feb 8, 2023

With AI-generated text becoming increasingly popular, you’ve probably wondered (like the rest of us) how easy it is to detect. Our Data & Engineering team set out to find out how accurate AI text detection tools actually are, and whether we can bypass them.

AI Detection Test

Below we’ll endeavour to answer the following questions:

  1. How accurate are AI text detection tools?
  2. Can we bypass them?
  3. Can paraphrasing help bypass AI detection?

For the experiment, we used only AI-generated text, produced with GPT-3 Davinci-003. We tested 12 pieces of content, each 60 to 130 words long. The content was generated over multiple iterations, varying the tone, subtopic, writing style and model parameters. We compared how accurately five AI detection tools flagged the AI-generated text, then retested after running the text through grammar and paraphrasing tools to see whether they decreased the level of detection.

AI Text Detection Tools Used

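The five AI text detection tools tested were CopyLeaks, Originality.ai, Contentatscale.ai, Writer.com and the OpenAI Detector (hosted on Hugging Face).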

Paraphrasing Tools Used

We ran the text through two paraphrasing tools: Quillbot (free version) and Jasper AI (content improver). We also used Grammarly to fix grammar and spelling mistakes in the text, to test whether this alone decreased detection.

Results

How Accurate Are AI Detection Tools?

First, we asked: How accurate are AI text detection tools?

Reminder: We set out to test the accuracy of five different AI tools using AI-generated text across 12 pieces of content.

Results

The best-performing AI text detection tools correctly identified AI content just 7 out of 12 times (58.3 per cent).

None of the five AI detection tools tested reached 60 per cent accuracy. CopyLeaks correctly predicted that only four of the 12 pieces of content were written by AI; Originality.ai correctly flagged 7 of the 12 as AI-generated; Contentatscale.ai identified half as AI-written; Writer.com established that only five were AI-generated; and the OpenAI Detector (Hugging Face) also correctly identified 7 of the 12.
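To make the arithmetic behind these figures explicit, here is a minimal Python sketch that turns the per-tool counts quoted above into percentages. The variable and function names are our own, and the pooled figure across all five tools is our own calculation from those counts rather than a number taken from the original results charts.

```python
# Correct detections out of 12 AI-generated pieces, as quoted in the text.
TOTAL_PIECES = 12
correct_by_tool = {
    "CopyLeaks": 4,
    "Originality.ai": 7,
    "Contentatscale.ai": 6,  # "half" of the 12 pieces
    "Writer.com": 5,
    "OpenAI Detector (Hugging Face)": 7,
}


def accuracy(correct: int, total: int) -> float:
    """Return the share of AI-written pieces correctly flagged, in per cent."""
    return 100.0 * correct / total


# Accuracy of each tool on its own.
per_tool = {tool: accuracy(n, TOTAL_PIECES) for tool, n in correct_by_tool.items()}

# Accuracy when all 5 x 12 = 60 individual checks are pooled together.
pooled = accuracy(sum(correct_by_tool.values()), TOTAL_PIECES * len(correct_by_tool))

for tool, pct in per_tool.items():
    print(f"{tool}: {correct_by_tool[tool]}/{TOTAL_PIECES} = {pct:.1f}%")
print(f"Pooled across tools: {pooled:.1f}%")
```

On these counts the best tools sit at 58.3 per cent, and pooling all 60 checks gives 29 correct, or 48.3 per cent.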

Can We Bypass AI Detection Tools?

With none of the five AI text detection tools correctly flagging more than 7 of the 12 pieces (58 per cent), we then tested whether we could bypass them by rerunning the test after using grammar and paraphrasing tools.

We ran our original 12 pieces of content through Quillbot (free version) and Jasper AI (content improver).

Results

We found that paraphrasing boosts the originality score of the content across all detection tools tested.

AI Detection Scores After Paraphrasing Using Quillbot

AI text detection results after paraphrasing with Quillbot (free bot)

Can Quillbot be detected? The Quillbot paraphrasing tool (free version) boosted the originality score for most pieces of content.

AI Detection Scores After Paraphrasing Using Jasper AI

The Jasper AI content improver (paragraph generator) bypassed all the content detection tools, earning high originality scores. Only one piece of AI text was correctly detected across 60 tests (12 pieces checked by five tools).
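As a quick check on the scale of that result, the short Python sketch below works out the detection rate after Jasper AI paraphrasing. It assumes the 60 tests are the 12 pieces each checked by the five detection tools; the variable names are our own.

```python
PIECES = 12
DETECTION_TOOLS = 5
detected_after_jasper = 1  # "only one" piece was correctly flagged as AI

# Every piece is checked by every tool, giving the 60 tests mentioned above.
total_tests = PIECES * DETECTION_TOOLS
detection_rate = 100.0 * detected_after_jasper / total_tests

print(f"{detected_after_jasper}/{total_tests} detected = {detection_rate:.1f}%")
```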

AI Detection Scores After Correction with Grammarly

We also reran the detection tools after using Grammarly to correct spelling and grammar. Interestingly, CopyLeaks was now better at detecting the content as AI, correctly identifying half of the pieces as opposed to just four in the original test. Similarly, Contentatscale.ai and Writer.com were better able to detect AI, while the OpenAI Detector (Hugging Face) detected just five pieces correctly instead of seven as in the original test.

ChatGPT AI Text Detection

Lastly, we checked whether ChatGPT could detect the AI-written content (both the original and paraphrased versions). Unlike the other tools, ChatGPT detected all the AI-generated text from GPT-3 Davinci-003 and Jasper AI. This is likely because all of these tools are modelled on GPT-3. However, we also saw some instances where, for certain writing styles and topics, AI-written text was not detected by ChatGPT (though this needs more research to be certain).

Summary

  • For short-length content, the best-performing AI text detection tools predicted correctly 7 out of 12 times
  • Paraphrasing boosts the originality score of the content for all detection tools
  • Use of uncommon words/high vocabulary increases the originality score
  • Quillbot paraphrasing tool was able to boost the originality score for most pieces of content
  • Jasper AI content improver was able to bypass all content detection tools with high originality scores
  • ChatGPT detects all AI-written content generated from GPT3-Davinci-003 and Jasper AI
  • There were some instances where the style of writing and topic written by AI was not detected by ChatGPT

Caveats/Challenges

  • Some of the AI detection tools need 200 words to predict accurately
  • The experiments were done on short-length paragraphs (60-130 words)
  • Topic was fairly similar across all 12 pieces of content
  • Jasper AI content improver does not take more than 800 characters for paraphrasing
  • Jasper AI paragraph generator does not offer flexibility on the length of the content
  • Quillbot free version does not allow paraphrasing beyond 125 words
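
The length limits in these caveats can be checked before running a piece through the tools. The Python sketch below is a hypothetical helper, not part of the original experiment: the constant and function names are our own, and it assumes words are counted by splitting on whitespace, which may differ from how each tool counts.

```python
from dataclasses import dataclass

# Limits quoted in the caveats above; the names are our own.
DETECTOR_MIN_WORDS = 200       # some detection tools need this many words
JASPER_MAX_CHARS = 800         # Jasper AI content improver input limit
QUILLBOT_FREE_MAX_WORDS = 125  # Quillbot free version paraphrasing limit


@dataclass
class LengthCheck:
    words: int
    chars: int
    long_enough_for_detectors: bool
    fits_jasper: bool
    fits_quillbot_free: bool


def check_lengths(text: str) -> LengthCheck:
    """Compare a text's length against the limits listed in the caveats."""
    words = len(text.split())  # assumption: whitespace-separated word count
    chars = len(text)
    return LengthCheck(
        words=words,
        chars=chars,
        long_enough_for_detectors=words >= DETECTOR_MIN_WORDS,
        fits_jasper=chars <= JASPER_MAX_CHARS,
        fits_quillbot_free=words <= QUILLBOT_FREE_MAX_WORDS,
    )


# A dummy 130-word text, the upper end of the lengths used in the experiment.
sample = " ".join(["word"] * 130)
result = check_lengths(sample)
print(result)
```

A 130-word piece like this falls short of the 200 words some detectors need and exceeds the Quillbot free limit, which illustrates why the short test paragraphs are listed as a caveat.
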
James Bardsley