By Amine Rahal, entrepreneur & writer. Amine is the CEO of IronMonk, a digital marketing agency specializing in SEO, and CMO at Regal Assets, an IRA company.
A duplicate content penalty can devastate your SEO rankings. As the owner of two digital marketing agencies, I dread the very words “duplicate content.” If Google’s algorithms flag your pages for duplicate content, you can kiss your chances of ranking goodbye until the problem is fixed.
Needless to say, it’s crucial that you avoid duplicate content if you want to succeed with your content strategy. But sometimes, even without being aware of it, we can accidentally publish non-original content on our websites. Luckily, if you do happen to have duplicate content, there are relatively simple solutions available to fix the problem.
In this article, I’ll go over my tried-and-true strategies for correcting duplicate content and recovering your rankings after publishing non-original content.
How To Detect Duplicate Content
First, it’s important to note that not all duplicated content is published with malicious intent. Although the figure is now a bit dated, Matt Cutts, the former head of Google’s web spam team, remarked in 2013 that at least 25% of the internet’s content was duplicative. Clearly, not all of this is intentionally plagiarized; much of it is accidental.
Your first step is to run an SEO audit using a tool such as SEMrush, Moz or Ahrefs. These software solutions effectively do the same thing, and they all offer free trials, so it shouldn’t matter which one you select. Running a “Site Audit” using these tools will produce a report that includes the URLs of all your highly duplicated pages (i.e., >5% duplication).
Some SEOs on a budget simply copy and paste the first sentence of their article into Google Search. If anything other than their own URL pops up, they likely have duplicated material on their hands. However, this method is imprecise and can generate a lot of false negatives. That’s why I recommend dedicated plagiarism software instead.
Earlier in my career, I used a service called Copyscape (or Siteliner) to crawl the web for plagiarized or duplicated content. As a rule, I like to make sure nothing more than 4% of a website’s material exists elsewhere on the internet. If my Copyscape results come back in excess of that, then I edit the content until it’s under the 4% mark.
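To make the idea of a duplication percentage concrete, here is a minimal sketch of the kind of comparison such tools perform. It estimates what share of one text’s overlapping word sequences (“shingles”) also appear in another text. This is not Copyscape’s actual algorithm, and the sample sentences and 5-word shingle size are assumptions for illustration only.

```python
# Minimal sketch of a duplication check using overlapping word n-grams
# ("shingles"). NOT Copyscape's actual algorithm -- just an illustration
# of how a duplication percentage can be estimated.

def shingles(text, n=5):
    """Return the set of n-word shingles in a text, lowercased."""
    words = text.lower().split()
    return {" ".join(words[i:i + n]) for i in range(len(words) - n + 1)}

def duplication_pct(page_text, other_text, n=5):
    """Percentage of the page's shingles that also occur in the other text."""
    page = shingles(page_text, n)
    if not page:
        return 0.0  # text too short to form any shingles
    return 100.0 * len(page & shingles(other_text, n)) / len(page)

# Hypothetical example texts:
page = "Our agency offers full service SEO audits for small businesses every month"
copy = "They said our agency offers full service SEO audits for small businesses too"
print(f"{duplication_pct(page, copy):.1f}% duplicated")  # 75.0% duplicated
```

A real checker compares against pages crawled from across the web, not a single known text, but the threshold logic is the same: a result above your cutoff (such as the 4% mark mentioned above) means the page needs a rewrite.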
A Note On Short Content And Duplicated Content
Shorter content containing fewer words is more likely to have high duplication results. This is especially true for “listicle” or roundup review articles in which products are mentioned by name. Often, simply writing out the long-form of a product title (e.g., “Joe Smith’s Ultra Healthy Canine Superfood for Large Adult Dogs”) several times can be enough to trigger 5% duplication or more in articles that only consist of a few hundred words.
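A quick back-of-the-envelope calculation shows how easily this happens. The word counts below are hypothetical, but the arithmetic illustrates why a single long product title, repeated a few times, can push a short article past a 5% duplication threshold on its own.

```python
# Hypothetical figures: one long product title repeated in a short listicle.
title = "Joe Smith's Ultra Healthy Canine Superfood for Large Adult Dogs"
title_words = len(title.split())   # 10 words per mention
mentions = 3                       # title appears three times
article_words = 400                # total length of a short roundup article

share = 100.0 * title_words * mentions / article_words
print(f"{share:.1f}% of the article is the repeated title")  # 7.5%
```

Even before counting any other duplicated phrasing, the repeated title alone accounts for 7.5% of this hypothetical 400-word article.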
If you can work around this issue by abbreviating the title names, then do so. However, there’s often no way to avoid it entirely when creating short listicle articles. If that’s the case, don’t panic. I’ve ranked countless short listicles with relatively high duplication scores due to this inevitability, and I believe Google’s algorithms make allowances in these cases.
Cleaning Up Your Content
Once you’ve compiled a list of all the URLs under your domain with content that’s 5% duplicated or more, you can begin the editing process. If you have a large website (i.e., hundreds of pages) replete with duped content, then you might want to consider outsourcing the editing to an SEO content writing agency. If not, you’ll have to rewrite the content yourself.
Plagiarism checkers will issue a report for each page that highlights the duplicated content. Simply keep this tab open in a side-by-side view with your text editor, and manually go through each article and substantively rewrite each highlighted text segment. There’s no “easy” way out of the problem — it has to be a thorough rewrite.
It’s not enough to merely swap out a few keywords here and there with synonyms. Instead, I always delete the duplicated text outright and start again from scratch. I try to find a completely different thought to express in its place, or at least rewrite the text so that every word is original and therefore meaningfully different from its previous version. Remember, Google’s algorithms are intelligent and can see through lazy attempts to rewrite.
When you’re finished, run the article through Copyscape again or run a full Site Audit using your SEO research tool. If the page doesn’t appear or comes back with less than 4% of its content flagged, you can move on to the next piece.
Protect Against Web Scrapers
Web scraper bots are designed to steal high-quality content from websites and republish it on their own sites. This is unethical and usually a violation of copyright law. Unfortunately, it can also result in a duplication flag against your own website.
Running a Site Audit or Copyscape query can help detect when your website has been scraped. However, I also recommend setting up a Google Alert for each of your blog post titles. This way, if a bot scrapes your content and republishes it, you will receive an alert to your inbox. From there, you can contact the web host and request they remove the content as it constitutes a copyright violation.
Keep It Real With Your Content
We all know that plagiarizing is wrong, but few know that you can unintentionally plagiarize or republish content — even your own — and get penalized for it.
To keep your SEO performance strong, make sure you’re habitually running Site Audits and always run your articles through Copyscape before posting them. To ward off scrapers, I also advise that you set up a Google Alert for each article title. If you can follow these rules, you’ll stay free of duplication penalties and your SEO results will show for it.