With AI-generated text becoming increasingly popular, you've probably wondered (like the rest of us) how easy it is to detect. Our Data & Engineering team set out to find out how accurate AI text detection tools actually are, and whether we can bypass them.
AI Detection Test
Below we’ll endeavour to answer the following questions:
- How accurate are AI text detection tools?
- Can we bypass them?
- Can paraphrasing help bypass AI detection?
For the experiment, we used only AI-generated text, produced with GPT3-Davinci-003. We tested 12 pieces of content, each 60 to 130 words long. The content was generated over multiple iterations, varying the tone, subtopic, writing style and model parameters. We compared the accuracy of five AI detection tools at detecting AI-generated text, then retested after running the text through grammar and paraphrasing tools to see whether they decreased the level of detection.
AI Text Detection Tools Used
- CopyLeaks
- Originality.ai
- Contentatscale.ai
- Writer.com
- OpenAI Detector (Hugging Face)
Paraphrasing Tools Used
- Quillbot (free version)
- Jasper AI (content improver)
We also used Grammarly to clean up any grammar and spelling mistakes in the text to test whether this method also decreased detection.
Results
How Accurate Are AI Detection Tools?
First, we asked: How accurate are AI text detection tools?
Reminder: We set out to test the accuracy of five different AI tools using AI-generated text across 12 pieces of content.
Results
Even the best-performing AI text detection tools correctly identified the AI content in just 7 of the 12 pieces (58.3 per cent).
Every one of the five AI detection tools tested scored below 60 per cent accuracy. CopyLeaks correctly predicted that only four of the 12 pieces of content were written by AI. Originality.ai and OpenAI Detector (Hugging Face) each correctly identified seven of the 12. Contentatscale.ai identified half (six) as AI-written, and Writer.com identified only five.
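To make the comparison easier to read, the per-tool counts above can be converted to percentages. The snippet below is a minimal sketch that uses only the counts reported in this article. The names `correct`, `SAMPLES` and `accuracy_table` are ours, chosen for illustration, and the pooled figure is simply all correct detections divided by all 60 sample/tool pairs.

```python
# Correct detections out of 12 AI-generated samples, as reported in this article.
correct = {
    "CopyLeaks": 4,
    "Originality.ai": 7,
    "Contentatscale.ai": 6,
    "Writer.com": 5,
    "OpenAI Detector (Hugging Face)": 7,
}
SAMPLES = 12  # pieces of content tested per tool


def accuracy_table(correct, samples):
    """Return {tool: percentage of samples correctly flagged as AI}."""
    return {tool: round(100 * n / samples, 1) for tool, n in correct.items()}


per_tool = accuracy_table(correct, SAMPLES)

# Pooled accuracy: every correct detection over every sample/tool pair.
overall = round(100 * sum(correct.values()) / (SAMPLES * len(correct)), 1)

for tool, pct in per_tool.items():
    print(f"{tool}: {pct}%")
print(f"All tools pooled: {overall}%")
```

On these numbers, the best tools reach 58.3 per cent and the pooled figure across all five comes out lower, at 48.3 per cent. With only 12 samples per tool, each percentage moves by more than eight points for every single piece of content.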
Can We Bypass AI Detection Tools?
With the five AI text detection tools correctly detecting AI-written content at best 58 per cent of the time, we then tested whether we could bypass them by rerunning the test after using grammar and paraphrasing tools.
We ran our original 12 pieces of content through Quillbot (free version) and Jasper AI (content improver).
Results
We found that paraphrasing boosts the originality score of the content across all detection tools tested.
AI Detection Scores After Paraphrasing Using Quillbot
Can Quillbot be detected? The Quillbot paraphrasing tool (free version) boosted the originality score for most pieces of content.
AI Detection Scores After Paraphrasing Using Jasper AI
The Jasper AI content improver (paragraph generator) bypassed all of the content detection tools, earning high originality scores. Only one piece of AI text was correctly detected across 60 tests (12 pieces of content on each of five tools).
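The figure of 60 tests follows from 12 pieces of content checked by each of five tools. As a rough sketch using only numbers reported in this article, the pooled detection rate before and after Jasper AI paraphrasing can be compared like this (the variable names and the `rate` helper are ours, for illustration):

```python
SAMPLES = 12  # pieces of content
TOOLS = 5     # AI detection tools
total_tests = SAMPLES * TOOLS  # 60 sample/tool pairs

# Correct detections in the original test, per tool:
# CopyLeaks 4, Originality.ai 7, Contentatscale.ai 6,
# Writer.com 5, OpenAI Detector (Hugging Face) 7.
baseline_hits = 4 + 7 + 6 + 5 + 7

# Correct detections after paraphrasing with Jasper AI.
jasper_hits = 1


def rate(hits, total):
    """Percentage of tests in which AI text was correctly detected."""
    return round(100 * hits / total, 1)


print(f"Original text: {baseline_hits}/{total_tests} = {rate(baseline_hits, total_tests)}%")
print(f"After Jasper:  {jasper_hits}/{total_tests} = {rate(jasper_hits, total_tests)}%")
```

On these numbers, pooled detection falls from 29 of 60 tests (48.3 per cent) to 1 of 60 (1.7 per cent).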
AI Detection Scores After Correction with Grammarly
We also tested AI detection across the tools after using Grammarly to correct spelling and grammar. Interestingly, CopyLeaks became better at detecting the content as AI, correctly identifying six of the 12 pieces as opposed to just four in the original test. Similarly, Contentatscale.ai and Writer.com were better able to predict AI, while OpenAI Detector (Hugging Face) detected just five correctly instead of seven as in the original test.
ChatGPT AI Text Detection
Lastly, we checked whether ChatGPT could detect the AI-written content (both the original and paraphrased versions). Unlike the other tools, ChatGPT detected all AI-generated text from GPT3-Davinci-003 and Jasper AI. This is likely because all of the existing tools are modelled on GPT3. However, we did see some instances where, depending on the writing style and topic, AI-written text was not detected by ChatGPT (though this needs more research to be certain).
Summary
- For short-length content, the AI text detection tools predicted correctly at best 7 of 12 times
- Paraphrasing boosts the originality score of the content for all detection tools
- Use of uncommon words/high vocabulary increases the originality score
- Quillbot paraphrasing tool was able to boost the originality score for most pieces of content
- Jasper AI content improver was able to bypass all content detection tools with high originality scores
- ChatGPT detects all AI-written content generated from GPT3-Davinci-003 and Jasper AI
- There were some instances where the style of writing and topic written by AI was not detected by ChatGPT
Caveats/Challenges
- Some of the AI detection tools need 200 words to predict accurately
- The experiments were done on short-length paragraphs (60-130 words)
- Topic was fairly similar across all 12 pieces of content
- Jasper AI content improver does not take more than 800 characters for paraphrasing
- Jasper AI paragraph generator does not offer flexibility on the length of the content
- Quillbot free version does not allow paraphrasing beyond 125 words
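The length limits listed above can be checked before a sample is submitted to any tool. The following is a minimal sketch, assuming words are counted by splitting on whitespace (the tools themselves may count differently). The constants and the `check_sample` function are hypothetical helpers of ours, not part of any tool's API.

```python
# Limits taken from the caveats in this article.
MIN_DETECTOR_WORDS = 200       # some detectors need this many words to predict accurately
JASPER_MAX_CHARS = 800         # Jasper AI content improver input limit
QUILLBOT_FREE_MAX_WORDS = 125  # Quillbot free version input limit


def check_sample(text: str) -> dict:
    """Report which of the length limits a piece of text satisfies."""
    words = len(text.split())  # assumption: whitespace-separated word count
    chars = len(text)
    return {
        "words": words,
        "chars": chars,
        "long_enough_for_detectors": words >= MIN_DETECTOR_WORDS,
        "fits_jasper_improver": chars <= JASPER_MAX_CHARS,
        "fits_quillbot_free": words <= QUILLBOT_FREE_MAX_WORDS,
    }


# A placeholder 100-word sample, within the 60-130 word range used in the experiment.
sample = " ".join(["word"] * 100)
report = check_sample(sample)
print(report)
```

A sample in the 60 to 130 word range fits the Quillbot free limit up to 125 words but falls short of the 200 words some detectors need, which is one reason to read the accuracy figures with caution.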