
TechFold is technology discussion, commentary, reviews, and opinions from well outside the valley. There's no koolaid to drink here, and TechFold is not in SL, or on Twitter.

My spare time TechMeme competitor

I have a rudimentary knowledge of a lot of things: statistics, programming theory, php, mysql, consumer behaviour, and so on. Individually, I’m an expert in none, and could not do any professionally to save my life. Taken as a package though, I know just enough info about just enough disciplines to have a little bit of fun.

In that vein, over the last week, while recovering from a flu/cold/lackofsleep, I glued together a clunky TechMeme competitor as a conceptual exercise for myself to see if I could apply the bits and pieces that I know to generate relevant results.

You can see the system’s (live) output here: http://hddb.net/techstream_index.html

(hddb.net is a different venture of mine, and a convenient development box)
(“techstream” is just my internal development prefix, not branding)

It refreshes every half hour, and as of right now, is actively following 167 blogs in the technology sector. It uses MagpieRSS to cache and check feeds, so hopefully if you’re on the list you’re not seeing weird spikes from hddb.net.

Once it picks up your post, it does a bunch of brute force, ugly stuff to it to try and place it in a larger context. It looks for other posts that yours links to, and other posts that link to yours. It splits up your post title into tags, and searches for other posts that share common tags. When all is said and done, it stitches together tag relevance, links to and from, how long the post has been kicking around, click-through popularity, etc., and through magic that I can’t even really follow any more, it spits out the output that you see.
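The post doesn’t spell out how those signals are actually combined, so here is only a minimal sketch of one way such a score could be stitched together. Everything in it is an assumption made for illustration: the `PostSignals` fields, the weights, and the half-life decay are hypothetical and not the formula running behind the page.

```python
from dataclasses import dataclass


@dataclass
class PostSignals:
    """Hypothetical bundle of the signals mentioned above for one post."""
    tag_relevance: float  # 0..1, e.g. best title-tag match against other posts
    links_in: int         # tracked posts that link to this one
    links_out: int        # tracked posts that this one links to
    age_hours: float      # how long the post has been kicking around
    clicks: int           # click-throughs recorded from the index page


def rank_score(post: PostSignals, half_life_hours: float = 12.0) -> float:
    """Combine the signals into one number; higher sorts nearer the top.

    The weights are made up for this sketch. The decay term halves the
    score every `half_life_hours`, so older posts sink over time.
    """
    raw = (
        3.0 * post.tag_relevance
        + 2.0 * post.links_in
        + 1.0 * post.links_out
        + 0.5 * post.clicks
    )
    decay = 0.5 ** (post.age_hours / half_life_hours)
    return raw * decay


fresh = PostSignals(tag_relevance=0.4, links_in=3, links_out=1, age_hours=0.0, clicks=10)
older = PostSignals(tag_relevance=0.4, links_in=3, links_out=1, age_hours=12.0, clicks=10)

# Sort a batch of posts so the highest score comes first.
ranked = sorted([older, fresh], key=rank_score, reverse=True)
```

In a scheme like this, a shorter half-life would push hot items off the top of the page sooner, at the cost of burying slow-burning stories.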

The algorithm takes about a minute to run during low post volume times (like Sunday evenings), and swells to as much as 5 minutes when the blogosphere is cooking.

The result? Not bad, IMHO. It’s not as balanced as TechMeme, in that hot items will cling to the top of the page for longer than they should. The page is also longer than it should be. It also doesn’t have the breadth of TechMeme - I imagine Gabe’s algorithm is following more than 167 blogs. I’m also manually adding feeds - one of TechMeme’s greatest strengths is that it (I think) picks up new blogs to follow automatically, based on link volumes that it sees. I’d like to get there eventually. I’m also missing any RSS output - there’s nothing to subscribe to yet, and I’m not even sure what form such a subscription would take (I’ve never really followed TechMeme’s bulk feed output).

Anyway, enjoy. Your thoughts/comments/etc. are welcome. If you’d like your blog added to the index, let me know.


3 Responses to “My spare time TechMeme competitor”

  1. James Thomas |

    Very cool… What you’re doing is kinda close to an idea I started on. I was pulling tech RSS feeds and posting them as a thread in discussion forum software. The idea was to eventually make a firefox plugin so you can track the resulting discussion as you visit the remote pages.

    In the end, parsing the RSS feeds was too difficult, as even RSS feeds have different values in different tags… and I didn’t have the motivation to finish it. :)

  2. Rod |

    Cheers James! I got around the tagging issue to a degree by disregarding “tags” that people may have actually assigned to their post (pulled from technorati, or whatnot), and instead just using post titles.

    I start with the post title and drop all punctuation, etc.
    Then I ditch words with 4 letters or fewer.
    Then I ditch words from a “ban list” that I have of common words that don’t carry any meaning (in this context).

    When all of that is done, a post title like:

    “Apple to announce subnotebook at January Macworld”

    would have the following tags:

    apple
    announce
    subnotebook
    january
    macworld

    Then, a routine checks tagsets for other posts, and looks for matches. An article like:

    “January might see Apple subnote”

    …would match on “January” and “Apple” and get a match strength score of 0.4 (2 tags of 5 had matches), which determines the “strength” of the relationship used in determining page position.

    So - it’s still not 100% as, for instance, “subnote” and “subnotebook” don’t match even though they’re communicating the same concept. And an article about “Google to announce Sprint deal in January” would also have a 0.4 match on “announce” and “january” - even though it’s not conceptually related. So - there’s still work to be done, but it’s all entertaining, and gratifying to see when it works.
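
    The title-to-tags steps and the match score above can be sketched roughly as follows. This is an illustration in Python rather than the code the system runs: the function names are invented, the ban list entries are placeholders (the real list isn’t given here), and it assumes the score is the number of shared tags divided by the first post’s tag count, which is what the 2-of-5 example implies.

```python
import re

# Placeholder ban list; the real list of "common words" is not given.
BAN_LIST = {"might", "about", "their", "there", "which", "would"}


def title_to_tags(title, ban_list=BAN_LIST, min_len=5):
    """Lowercase the title, drop punctuation, then drop short and banned words."""
    words = re.sub(r"[^\w\s]", "", title.lower()).split()
    tags = []
    for word in words:
        # Keep words longer than 4 letters that are not banned or repeated.
        if len(word) >= min_len and word not in ban_list and word not in tags:
            tags.append(word)
    return tags


def match_strength(tags, other_tags):
    """Fraction of this post's tags that also appear in the other post's tags."""
    if not tags:
        return 0.0
    shared = set(tags) & set(other_tags)
    return len(shared) / len(tags)


apple = title_to_tags("Apple to announce subnotebook at January Macworld")
rumour = title_to_tags("January might see Apple subnote")
google = title_to_tags("Google to announce Sprint deal in January")

print(apple)                          # the five tags listed above
print(match_strength(apple, rumour))  # shares "apple" and "january"
print(match_strength(apple, google))  # shares "announce" and "january"
```

    Both comparisons come out at 0.4, which shows the weakness described above: exact word matching can’t tell the related post from the unrelated one, and “subnote” never matches “subnotebook”.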

    -R

  3. TechWatching.com v1.5 is Live « TechFold |

    […] mentioned earlier that I might spin off my relevance algorithm as a standalone site - which I’ve now gone ahead and done at TechWatching.com. TechWatching is a blog/news […]
