How To Block Copyscape And Why You Should Block Them
This article on how to block Copyscape isn’t in any way, shape or form meant to be malicious towards Copyscape or its parent company. It has been compiled solely to help innocent webmasters and business owners avoid copyright infringement warnings over content whose similarity to someone else’s is purely coincidental.
Skip The Content And Jump Straight To: How To Block Copyscape
What Is Copyscape And How Do I Block Them?
According to its about page, Copyscape is provided by Indigo Stream Technologies Ltd. To be honest, the page doesn’t give much away about how the platform works or what it does with the information it gathers, other than saying that it uses major search engines and then “post-processes” the results. To put it bluntly, Copyscape is a free and paid-for plagiarism checker that compares websites or documents for matching content. They send out bots to crawl and index your website into a database so their users can then compare their websites to others on the internet.
Why Should I Block Copyscape?
Now, I don’t have a problem with Copyscape or their services, and this article isn’t a campaign about why Copyscape is wrong. The issue I have is one that most web designers will face or have experienced before: the end user. Copyscape has a paid-for service that automatically checks for copyright infringements and notifies you if anything looks remotely the same as your content. Once you have discovered a match, you can send an email to the website owner, the domain registrar, the company hosting the servers and, if you want, the company that built the website.
Why The End User Is The Issue
Recently, my client and I received a “DMCA Takedown Notice” giving us all this legal mumbo jumbo about how their content is protected by law and how they swear that they are the true owners of the content, etc, etc. At first, there was panic and confusion because when we build a website from the ground up, we write each page about a product or service ourselves and never do a “copy and paste” job.
When we began to look into it further, we realised that a few sentences started out the same as on the other website and a few combinations of words matched but, taken all together, it was really quite different content. Instead of looking into the “copyright infringement”, the end user decided that this was unacceptable and sent out an email asking for the content to be removed. You will see below just how petty this process is and how minimal the “plagiarised” content is.
For demo purposes, here are a few sentences that Copyscape picked up on. I’ve highlighted the matching content:
If you’re looking for a carpenter in XXXXXXX then call XXXXXX
We offer high quality workmanship with a truly professional carpentry service
Call us today on XXXXXX
Our expert knowledge and advice extends from our time served team of carpenters that have over 30 years’ experience
The text that was deemed to be an infringement came to 16% of the page. This included stop words such as “and”, “to”, “at” and “be”. Not to mention that “Call us today on 07xxxxxxxx” was written three times on the page.
This really does seem to be a flaw in the way this information is processed and analysed. Not only is the platform picking up stop words and repeated calls to action, it is also matching fragments that are separated from each other within a sentence. If you’re going to write content for a business like this (a carpenter) then, most of the time, you will end up using phrases and wording that others have come up with before.
I decided to search for the matched phrases on Google UK to see how many results each one returns.
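To show why stop words and stock phrases inflate a figure like this, here is a minimal Python sketch. I don’t know how Copyscape actually scores a page, so this is not their algorithm: the function names, the three-word phrase length and the stop-word list are all my own assumptions, chosen purely for illustration.

```python
import re

# Hypothetical stop-word list for illustration; a real checker would use its own.
STOP_WORDS = {"and", "to", "at", "be", "a", "the", "on", "in", "of", "for", "us"}


def words(text):
    """Lowercase the text and split it into simple word tokens."""
    return re.findall(r"[a-z0-9]+", text.lower())


def ngrams(tokens, n):
    """Return the set of n-word phrases found in a token list."""
    return {tuple(tokens[i:i + n]) for i in range(len(tokens) - n + 1)}


def matched_share(page, other, n=3, drop_stop_words=False):
    """Fraction of the words in `page` that sit inside an n-word phrase
    which also appears somewhere in `other`."""
    a, b = words(page), words(other)
    if drop_stop_words:
        a = [w for w in a if w not in STOP_WORDS]
        b = [w for w in b if w not in STOP_WORDS]
    shared = ngrams(a, n) & ngrams(b, n)
    covered = set()
    for i in range(len(a) - n + 1):
        if tuple(a[i:i + n]) in shared:
            covered.update(range(i, i + n))
    return len(covered) / len(a) if a else 0.0


page = "Call us today on the phone"
other = "Call us today at the office"
print(matched_share(page, other))                        # 0.5
print(matched_share(page, other, drop_stop_words=True))  # 0.0
```

With stop words counted, half of that tiny “page” looks copied because of a single call to action. Once the filler words are ignored, nothing matches at all.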
If you’re looking for a carpenter = About 1,540,000 results
then call = About 8,160,000 results
We offer high quality workmanship = About 131,000 results
professional carpentry service = About 53,300 results
Call us today = About 23,400,000 results
Our expert knowledge and advice = About 45,300 results
time served team of carpenters that = About 22 results
Why This Can Damage Your Credibility
If a Copyscape user were to send a DMCA notice then they would *presumably* send it to the domain registrar, the server hosting company, the company that built the website and the business owner. This chain of reporting can be really dangerous because if the email goes to, say, GoDaddy, then GoDaddy’s copyright team can shut you down without warning. The same goes for the company hosting the servers.
As you will notice from the examples I’ve given above, this is merely an innocent coincidence of commonly used wording. The websites in question are carpentry businesses, and the sentences that Copyscape has picked up on are all either calls to action or text that describes the business, its services or its history.
To us, the content seemed to consist of the following:
- Calls to action
- Promoting their experience
- Ensuring the customer knows they are an established business
- A few search terms
None of the matched content was designed to promote work completed by other businesses, which I feel would have been a better reason to send out an infringement notice.
How To Block Copyscape
Now, if you are looking to block Copyscape for “darker” reasons then I highly suggest you change your morals. This technique should only be used to avoid misunderstandings and over-sensitive Copyscape users who jump the gun.
Block Copyscape’s IP Address
To block Copyscape from crawling your website, simply copy and paste this into your .htaccess file.
We tested messing around with robots.txt first; however, Copyscape’s bots can choose to ignore these types of requests. Placing an IP deny rule in your .htaccess file will block them and their bots entirely from your website, making it a whole lot harder to compare your content for plagiarism.
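As a rough sketch of what such a deny rule can look like, the small Python helper below builds one from a list of addresses. It assumes Apache 2.4 with .htaccess overrides for authorisation enabled, and the helper name build_deny_block is my own. The address 203.0.113.10 is only a placeholder from the reserved documentation range and is not Copyscape’s real address; swap in whichever address you have confirmed from your own server logs.

```python
import ipaddress


def build_deny_block(addresses):
    """Return Apache 2.4 .htaccess lines that refuse the given IPs or CIDR ranges."""
    lines = ["<RequireAll>", "    Require all granted"]
    for addr in addresses:
        # ip_network validates the input and accepts single IPs as well as ranges
        net = ipaddress.ip_network(addr, strict=False)
        # Write single hosts as a bare address, ranges in CIDR form
        target = str(net.network_address) if net.num_addresses == 1 else str(net)
        lines.append(f"    Require not ip {target}")
    lines.append("</RequireAll>")
    return "\n".join(lines)


# Placeholder address only, NOT Copyscape's real IP
print(build_deny_block(["203.0.113.10"]))
```

On the older Apache 2.2 the equivalent rule uses the Order, Allow from and Deny from directives instead, so check which version your host runs before pasting anything in.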
Test it out for yourself and you should get a 403 response.
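One way to check the status code without a browser is a few lines of Python. This is only a sketch: fetch_status is a name I made up, and you will only see the 403 if the request comes from an address covered by your deny rule (for instance, your own IP added to the rule temporarily).

```python
import urllib.error
import urllib.request


def fetch_status(url, timeout=10):
    """Return the HTTP status code for `url`, including error codes such as 403."""
    try:
        with urllib.request.urlopen(url, timeout=timeout) as resp:
            return resp.status
    except urllib.error.HTTPError as err:
        # urllib raises on 4xx/5xx responses; the code is still the answer we want
        return err.code
```

Call fetch_status with your own site’s URL: a blocked address should get 403 back, while everyone else should still see 200.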