
Content Theft Worsens

If you follow my comment feed, you may have noticed that I am getting huge amounts of trackback spam. Why not just turn off trackbacks? Because these people are stealing my content, and likely yours, for their own gain, and the trackback is the easiest way to find them. Yes, each trackback generates a link back to Reality Me, which in theory should help my page rank, but not when the link comes from duplicate content. I have installed the Antileech WordPress plugin, but I am still figuring out how to use it without cutting off my feeds to legitimate readers. If you do end up getting a "this content is stolen" message instead of the actual post, please email juggler at gmail.com and I will fix it. That said, can you confirm which feedreader you use based upon the following:

  • Blogdigger/2.0 (http://www.blogdigger.com/; contact@blogdigger.com) Referred by: http://www.zimbio.com/Jaycees/trackers/7/Blog+Search+Tracker
  • Feedfetcher-Google; ( http://www.google.com/feedfetcher.html; 6 subscribers; feed-id=3701543567382179734) Referred by: http://www.google.com/reader/view/
  • Feedfetcher-Google; ( http://www.google.com/feedfetcher.html; 9 subscribers; feed-id=8604077678671105327) Referred by: http://www.google.com/reader/view/?tab=my
  • Feedster Crawler/3.0; Feedster, Inc. Referred by: http://ranchero.com/
  • Gregarius/0.5.4 ( http://devlog.gregarius.net/docs/ua) Referred by: http://blognetwork.knoxnews.com/feed.php?channel=81
  • Liferea/1.4.3b (Linux; en_US.UTF-8; http://liferea.sf.net/)
  • NewsGatorOnline/2.0 (http:/www.newsgator.com; 1 subscribers) Referred by: http://www.newsgator.com/ngs/subscriber/WebEd2.aspx?fld=0
  • NewzCrawler/1.8 (compatible; MSIE 6.00; Newz Crawler 1.8; http://www.newzcrawler.com/ )
  • SharpReader/0.9.7.0 (.NET CLR 1.1.4322.2407; WinNT 5.1.2600.0) Referred by: http://127.0.0.1:12108/sharpreader/page.html
  • Wasabot/1.4 (+ http://www.wasalive.com ) Java/1.6.0_02

I am assuming that Blogdigger, Gregarius, and Wasabot are used by content thieves.
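For anyone who wants to sort their own logs the same way, here is a rough Python sketch of how entries like the ones above could be split into a product name and a referrer. It assumes every entry has the shape "user agent, then an optional `Referred by:` URL", as in my list. The function names and the `SUSPECTED` set are made up for illustration, and the set only encodes my guess above, not a verified list of thieves.

```python
import re

# Assumption: these three names are only the guess made in this post,
# not a confirmed list of content thieves.
SUSPECTED = {"Blogdigger", "Gregarius", "Wasabot"}


def parse_entry(line):
    """Split one log entry into (product, user_agent, referrer).

    Assumes the entry looks like "<user agent> Referred by: <url>",
    where the "Referred by:" part may be missing.
    """
    agent, _, referrer = line.partition(" Referred by: ")
    agent = agent.strip()
    # Treat everything before the first "/", ";" or whitespace as the product name.
    product = re.split(r"[/;\s]", agent, maxsplit=1)[0]
    return product, agent, referrer.strip() or None


def is_suspected(line):
    """True when the entry's product name is on the suspected list."""
    return parse_entry(line)[0] in SUSPECTED
```

Matching on the product name alone is easy to fool, since a scraper can send any user agent it likes, so treat a match as a hint rather than proof.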


Anyone want to do a group project?

I’ve never done an open source project before. Here’s my proposal. We write a WordPress plugin that helps create a blacklist of known content-thieving IPs. When an IP on the blacklist requests the RSS feed or a direct link to a post on the WordPress blog, we deliver a content-theft notice instead of the actual content. The plugin will be able to deliver a custom message, so people can make the payload as obscene or as marketable as they like. I have some thoughts on implementation, since the splog that publishes the stolen content might have a different IP than the scavenger that fetched it. For instance, the plugin could alter the comment interface to include a checkbox that marks a comment or trackback as potential content theft. The plugin would then have to examine the server logs to try to draw a correlation between when the real content was posted, the IPs that requested the RSS feed or post, and the time the stolen content was posted. With large samplings fed into a single database, I think we could be very effective at blocking the thieves. Now, what’s the abuse potential here?
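To make the correlation idea a little more concrete, here is a minimal sketch in Python rather than as an actual WordPress plugin. Everything in it is an assumption for illustration: the function names, the `NOTICE` text, the shape of the log data (pairs of IP and request time), and the `min_incidents` threshold. It treats an IP as a candidate when it fetched a post between publication and the appearance of the stolen copy, and blacklists it only after it turns up across several flagged thefts.

```python
from collections import Counter
from datetime import datetime

# Placeholder text; the proposal lets each blog supply its own message.
NOTICE = "This content is stolen."


def respond(ip, blacklist, content, notice=NOTICE):
    """Return the notice for blacklisted IPs and the real content otherwise."""
    return notice if ip in blacklist else content


def candidate_ips(requests, published_at: datetime, stolen_at: datetime):
    """IPs that fetched the post between publication and the stolen copy appearing.

    requests: iterable of (ip, request_time) pairs taken from the server log.
    """
    return {ip for ip, when in requests if published_at <= when <= stolen_at}


def build_blacklist(incidents, min_incidents=3):
    """Blacklist IPs that are candidates in at least min_incidents flagged thefts.

    incidents: iterable of (requests, published_at, stolen_at) tuples,
    one per comment or trackback marked as potential content theft.
    """
    counts = Counter()
    for requests, published_at, stolen_at in incidents:
        # Each IP counts once per incident, however many requests it made.
        counts.update(candidate_ips(requests, published_at, stolen_at))
    return {ip for ip, n in counts.items() if n >= min_incidents}
```

Requiring repeat appearances is one possible guard against the abuse question: a legitimate reader who happens to fetch a post inside one theft window should not land on the list from that alone. The right threshold is a guess and would need real data.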

After getting this working on WordPress, I think we could extend it to other platforms.

Update: Looks like Owen Winkler (Antileech) has already written this! Kudos! Lorelle gives an overview and also recommends Digital Fingerprint Detecting Content Theft WordPress Plugin.