
Content Theft Worsens

If you follow my comment feed, you may have noticed that I am getting a huge amount of trackback spam. Why not just turn off trackbacks? Because these people are stealing my content, and likely your content, for their own personal gain, and the trackback is the easiest way to find them. Yes, they generate a link back to Reality Me, which in theory should help my page rank, but not when the link sits on duplicate content. I have installed the Antileech WordPress plugin, but I am still figuring out how to use it without cutting off my feeds to legitimate readers. If you do end up getting a "this content is stolen" message instead of the actual post, please email juggler at gmail.com and I will fix it. That said, can you confirm which feed reader you use, based on the following user agents from my logs:

  • Blogdigger/2.0 (http://www.blogdigger.com/; contact@blogdigger.com) Referred by: http://www.zimbio.com/Jaycees/trackers/7/Blog+Search+Tracker
  • Feedfetcher-Google; ( http://www.google.com/feedfetcher.html; 6 subscribers; feed-id=3701543567382179734) Referred by: http://www.google.com/reader/view/
  • Feedfetcher-Google; ( http://www.google.com/feedfetcher.html; 9 subscribers; feed-id=8604077678671105327) Referred by: http://www.google.com/reader/view/?tab=my
  • Feedster Crawler/3.0; Feedster, Inc. Referred by: http://ranchero.com/
  • Gregarius/0.5.4 ( http://devlog.gregarius.net/docs/ua) Referred by: http://blognetwork.knoxnews.com/feed.php?channel=81
  • Liferea/1.4.3b (Linux; en_US.UTF-8; http://liferea.sf.net/)
  • NewsGatorOnline/2.0 (http:/www.newsgator.com; 1 subscribers) Referred by: http://www.newsgator.com/ngs/subscriber/WebEd2.aspx?fld=0
  • NewzCrawler/1.8 (compatible; MSIE 6.00; Newz Crawler 1.8; http://www.newzcrawler.com/ )
  • SharpReader/0.9.7.0 (.NET CLR 1.1.4322.2407; WinNT 5.1.2600.0) Referred by: http://127.0.0.1:12108/sharpreader/page.html
  • Wasabot/1.4 (+ http://www.wasalive.com ) Java/1.6.0_02

I am assuming that Blogdigger, Gregarius, and Wasabot are used by content thieves.
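For anyone who wants to tally entries like these, here is a minimal Python sketch that splits each entry into its user agent and referrer and pulls out the leading product name. The function names (`split_entry`, `product_name`) and the "Referred by:" separator handling are my own assumptions based on how the list above is laid out, not anything Antileech provides. Keep in mind that a user agent is self-reported by the client, so the product name is only a hint.

```python
import re

# A few entries copied from the list above.
# Layout assumed: "<user agent> Referred by: <url>", with the referrer optional.
ENTRIES = [
    "Blogdigger/2.0 (http://www.blogdigger.com/; contact@blogdigger.com) "
    "Referred by: http://www.zimbio.com/Jaycees/trackers/7/Blog+Search+Tracker",
    "Feedfetcher-Google; ( http://www.google.com/feedfetcher.html; 6 subscribers; "
    "feed-id=3701543567382179734) Referred by: http://www.google.com/reader/view/",
    "Liferea/1.4.3b (Linux; en_US.UTF-8; http://liferea.sf.net/)",
    "Wasabot/1.4 (+ http://www.wasalive.com ) Java/1.6.0_02",
]


def split_entry(entry):
    """Return (user_agent, referrer); referrer is None when absent."""
    agent, _, referrer = entry.partition(" Referred by: ")
    return agent.strip(), (referrer.strip() or None)


def product_name(agent):
    """Return the leading token of a user agent, up to the first '/', ';' or space."""
    match = re.match(r"[^/;\s]+", agent)
    return match.group(0) if match else ""


if __name__ == "__main__":
    for entry in ENTRIES:
        agent, referrer = split_entry(entry)
        print(product_name(agent), "|", referrer)
```

Grouping by product name makes it easier to see which readers show up with a referrer and which never do.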

14 thoughts on “Content Theft Worsens”

  1. I generally use Liferea but occasionally use Akregator or SharpReader.

    How much content theft are you seeing? I get one or two bad links a week on my blog, and Katie probably gets double or triple that, but that’s not enough for me to justify the effort in trying to do anything about it.

  2. In the past two days I count roughly 21 websites that have scraped content from RSS feeds. They include URLs like videogamearticles.info, ipod.2webhost.info, businessteacher.info, and new-age.dailygeektoy.com.

    They are all automated and riddled with Google ads.

  3. I’m using Google Reader.

  4. Blogdigger is a blog search engine; we periodically visit your feed and index new posts so they are searchable by keyword, tag, etc. on our site. We do offer feeds for individuals to subscribe to search results, and we pass on only a headline and short excerpt of your post, along with links back to your post and site. There are some who abuse our service, and we’re working constantly to prevent them from abusing our site and your content by blocking them from accessing our service.

    Hope this helps. Let me know if I can be of further assistance; I’m happy to help in any way that I can.

  5. Thanks Greg Gershman! I’m setting my Antileech preferences to allow Blogdigger. I’m finding a lot of these content theft sites are only up a week or so before they start 404ing. I’m not even sure how they profit. It doesn’t seem like they last long enough to get indexed by search engines. How do they get traffic? Surely the trackbacks don’t provide them enough AdSense revenue.

  6. Actually, I doubt that the spam bloggers are using those actual programs. It is possible to mask your scraping software as any other application in a bid to reduce how easily it is blocked. If you represent your evil RSS scraper as Google Reader, for example, it is harder to block because you can’t just filter based on the application type.

    User agents can be spoofed and changed at will, so you’re better off checking out the IP addresses and seeing where they lead.

    Let me know if you need any help with that!

  7. Thanks Jonathan Bailey! I’m examining my server logs to see if I can determine any patterns. Of course, if they make any decent money at their spam blogs, and I’m sure they do, then they probably have this down to a science, and I bet beating them at the game could turn into a full-time job.

  8. One thing that might help … in a couple of different ways … is to not publish full posts in your feeds. I am not very familiar with WordPress and how it works, but I have a separate text area field set up in which I type my excerpt, or the first bit of my post. I try to make it interesting or enough of a hook to get the reader to click through to my actual site to read the whole post. This way, the feed does not have enough content to make it worthwhile to scrape, and it forces people to actually visit the site.

    Not that traffic is a huge deal to me … these are just tips passed along from elsewhere.

  9. No, no, no, no, no, no, no. It’s just not right to have teaser feeds. I for one simply can’t stand them. I’m not going to every blog to read the whole thing. The whole point of the feed reader is to read feeds, not to read teasers.

    I used to have a bunch of blog feeds in my reader that were teasers. About six months ago, I went through and deleted them all. I had slowly stopped reading them because they were such a hassle, and they were just cluttering my reader.

  10. Ah yes, that is the point of view of the READER. That is different from the point of view of the web site owner. Many bloggers use RSS feeds as a tool to generate traffic to their site, which is actually the intended use of RSS feeds. If all their posts can be read from a reader, then few people will visit. “Teasers” or summaries only in the feed are far more effective in driving traffic – that benefits bloggers who rely on traffic stats for ad rates, and it also defeats the scrapers – which is what Doug is concerned with here.

  11. Full feeds are the only way to go. Teasers just annoy me, same as Jonathan said above.
    As a website owner, I don’t want to annoy my readers, hence I will only use and promote full feeds on all my sites.
    There’s an interesting discussion about this at ProBlogger:
    http://www.problogger.net/archives/2007/09/12/full-or-partial-rss-feeds-the-great-feed-debate/

  12. Has Antileech worked for you? I just installed it this morning.

  13. Frankly I haven’t had the time to invest in it. It is not automatic. You have to make choices and I wasn’t willing to risk cutting off legitimate readers for these content thieves. What they are doing is wrong but in essence, it is not much different than what most bloggers do. They are taking an excerpt and linking back to the source.

    I had trouble getting Antileech to work on http://domesticpsychology.com/blog/ and I am thinking it is because the blog is in a subdirectory of the domain.

    I think Antileech has the right idea. I just need to confirm with the server logs that I’m blocking the right stuff before I am willing to put a little check in that box.

    Another approach would be to block everything that Antileech detects and show a message asking the reader to email you. When, or if, your readers contact you, you could then uncheck the appropriate feed.

  14. You were supposed to tell me you got it to work perfectly & it’s a piece of cake.

    *sigh*
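The suggestion in the comments to look at IP addresses instead of trusting user agents can be tried on the server logs with something like the Python sketch below. It assumes the logs are in the Apache "combined" format and that the feed lives under `/feed`; both are assumptions about the setup, and the sample lines use made-up documentation IP addresses and timestamps, not real log data. The heuristic of flagging an IP that presents more than one user agent is my own illustration, and a shared proxy could trip it too, so treat any hit as a lead to investigate rather than proof.

```python
import re
from collections import defaultdict

# Apache "combined" format (assumed):
# ip ident user [time] "request" status bytes "referer" "user-agent"
LINE_RE = re.compile(
    r'^(?P<ip>\S+) \S+ \S+ \[[^\]]*\] "(?P<method>\S+) (?P<path>\S+)[^"]*" '
    r'\d{3} \S+ "(?P<referer>[^"]*)" "(?P<agent>[^"]*)"$'
)


def agents_by_ip(lines, feed_prefix="/feed"):
    """Map each IP that requested the feed to the set of user agents it sent."""
    seen = defaultdict(set)
    for line in lines:
        match = LINE_RE.match(line.strip())
        if match and match.group("path").startswith(feed_prefix):
            seen[match.group("ip")].add(match.group("agent"))
    return dict(seen)


def suspicious_ips(lines, feed_prefix="/feed"):
    """Return IPs that fetched the feed under more than one user agent."""
    grouped = agents_by_ip(lines, feed_prefix)
    return sorted(ip for ip, agents in grouped.items() if len(agents) > 1)


# Hypothetical sample lines (documentation IP ranges, invented timestamps).
SAMPLE = [
    '192.0.2.10 - - [01/Oct/2007:10:00:00 -0400] "GET /feed/ HTTP/1.1" 200 5120 '
    '"-" "Liferea/1.4.3b (Linux; en_US.UTF-8; http://liferea.sf.net/)"',
    '198.51.100.7 - - [01/Oct/2007:10:01:00 -0400] "GET /feed/ HTTP/1.1" 200 5120 '
    '"-" "Feedfetcher-Google; ( http://www.google.com/feedfetcher.html)"',
    '198.51.100.7 - - [01/Oct/2007:10:02:00 -0400] "GET /feed/ HTTP/1.1" 200 5120 '
    '"-" "Wasabot/1.4 (+ http://www.wasalive.com ) Java/1.6.0_02"',
]

if __name__ == "__main__":
    print(suspicious_ips(SAMPLE))
```

From there, a reverse DNS or WHOIS lookup on the flagged addresses shows where they lead.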
