
TechFold is technology discussion, commentary, reviews, and opinions from well outside the valley. There's no koolaid to drink here, and TechFold is not in SL, or on Twitter.

http:BL - Anti-Spam Tech for Servers from Project Honey Pot

Let me preface this by saying that this sort of thing is more or less beyond my level of technical acumen. That said, it sounds like the next step in the battle against spammers and harvesters, and it is being well received on Digg. http:BL exposes the database of Project Honey Pot, a two-year-old project to identify spammers and email address harvesters. More importantly, it does so through server modules and a web API, and it returns rich data about IPs as opposed to a yea/nay blacklist response.

Project Honey Pot is the first and only distributed system for identifying spammers and the spambots they use to scrape addresses from your website. Using the Project Honey Pot system you can install addresses that are custom-tagged to the time and IP address of a visitor to your site. If one of these addresses begins receiving email we not only can tell that the messages are spam, but also the exact moment when the address was harvested and the IP address that gathered it. [from: About Us]

http:BL makes this database available both as an Apache module that can block spam originators from your site and as a web API. Using either, you can get a formatted response from http:BL describing each IP that you query (i.e., each IP visiting your site) and then act on it. The response looks like an IP address; the example the spec walks through below is 127.3.5.1:

The first octet (127 in the example above) is always 127 and is pre-defined to not have a specified meaning related to the particular visitor.

The second octet represents the number of days since last activity. In the example above, it has been 3 days since the last time the queried IP address saw activity on the Project Honey Pot network. This value ranges from 0 days to 255 days. This value is useful in helping you assess how “stale” the information provided by http:BL is and therefore the extent to which you should rely on it.

The third octet (5 in the example above) represents a threat score for IP. This score is assigned internally by Project Honey Pot based on a number of factors including the number of honey pots the IP has been seen visiting, the damage done during those visits (email addresses harvested or forms posted to), etc. The range of the score is from 0 to 255, where 255 is extremely threatening and 0 indicates no threat score has been assigned. In the example above, the IP queried has a threat score of “5”, which is relatively low. While a rough and imperfect measure, this value may be useful in helping you assess the threat posed by a visitor to your site.

The fourth octet (1 in the example above) represents the type of visitor. Defined types include: “search engine,” “suspicious,” “harvester,” and “comment spammer.” Because a visitor may belong to multiple types (e.g., a harvester that is also a comment spammer) this octet is represented as a bitset with an aggregate value from 0 to 255. In the example above, the type is listed as 1, which means the visitor is merely “suspicious.” A chart outlining the different types appears below. This value is useful because it allows you to treat different types of robots differently.

[From: API Spec]
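To make the octet breakdown concrete, here is a minimal Python sketch of how a response such as 127.3.5.1 might be decoded. The function and class names are my own inventions for illustration, not part of any official client. The post only confirms that a type value of 1 means "suspicious"; the values 2 for "harvester" and 4 for "comment spammer", and 0 for "search engine", are assumptions based on the spec's chart, which isn't reproduced here.

```python
from dataclasses import dataclass
from typing import List

# Assumed bit flags for the fourth octet. Only 1 = "suspicious" is confirmed
# by the text above; the other values are assumptions taken from the spec's chart.
TYPE_FLAGS = [
    (1, "suspicious"),
    (2, "harvester"),
    (4, "comment spammer"),
]


@dataclass
class HttpBLResult:
    days_since_last_activity: int  # second octet, 0-255
    threat_score: int              # third octet, 0-255
    visitor_types: List[str]       # decoded from the fourth octet's bitset


def parse_httpbl_response(response: str) -> HttpBLResult:
    """Split a dotted-quad http:BL response into its documented fields."""
    octets = [int(part) for part in response.strip().split(".")]
    if len(octets) != 4 or any(not 0 <= o <= 255 for o in octets):
        raise ValueError(f"not a dotted quad: {response!r}")

    first, days, threat, type_bits = octets
    if first != 127:
        # The spec excerpt says the first octet is always 127.
        raise ValueError(f"unexpected first octet: {first}")

    if type_bits == 0:
        types = ["search engine"]  # assumed meaning of an empty bitset
    else:
        types = [name for bit, name in TYPE_FLAGS if type_bits & bit]

    return HttpBLResult(days, threat, types)


if __name__ == "__main__":
    print(parse_httpbl_response("127.3.5.1"))
```

Because the fourth octet is a bitset, a visitor flagged as both a harvester and a comment spammer would (under the assumed values) come back as 6, and the sketch reports both labels.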

This is powerful: a big step beyond standard blacklisting, and the API access means it's likely to turn up in all sorts of places. I can see someone quickly adding it to WordPress, for example, as a plugin complement to Akismet.

There are already people out there writing code to protect their blog comment forms, and no doubt more will follow.
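As a rough idea of what such comment-form protection could look like, here is a small Python sketch of a blocking policy built on the three meaningful octets. Everything in it is hypothetical: the function name, the 30-day staleness cutoff, and the threat threshold of 25 are placeholders I picked for illustration, and the harvester (2) and comment spammer (4) bit values are assumptions based on the spec's chart rather than anything quoted above.

```python
# Assumed bit values for the visitor-type octet (only 1 = "suspicious"
# is confirmed by the spec excerpt quoted in this post).
HARVESTER = 2
COMMENT_SPAMMER = 4


def should_block_comment(days: int, threat: int, type_bits: int,
                         max_age_days: int = 30, min_threat: int = 25) -> bool:
    """Decide whether to reject a comment based on http:BL octets.

    The thresholds are arbitrary placeholders; tune them for your own site.
    """
    if type_bits == 0:
        # Assumed to mean a search engine: never block.
        return False
    if days > max_age_days:
        # Data is stale, so don't rely on it.
        return False
    if type_bits & (HARVESTER | COMMENT_SPAMMER):
        # Known bad actor with recent activity.
        return True
    # Merely "suspicious": block only if the threat score is high enough.
    return threat >= min_threat


if __name__ == "__main__":
    # The spec's example (127.3.5.1): recent, low threat, merely suspicious.
    print(should_block_comment(days=3, threat=5, type_bits=1))
```

The point of the design is the one the spec makes: because the response is richer than yes/no, you can ignore stale records and treat a "suspicious" visitor more leniently than a known comment spammer.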
