
TechFold is technology discussion, commentary, reviews, and opinions from well outside the valley. There's no koolaid to drink here, and TechFold is not in SL, or on Twitter.

Track your Memes on Twitter!

Here’s some fun news: this afternoon I decided to figure out how to use the Twitter API, which is decidedly simple and functional. Anyway, if you’re a twitterholic, you can now get your dose of technology, automotive, environmental, or Sun Microsystems news via Twitter - each of my memetrackers now merrily posts all of its front-page updates to a Twitter account for your enjoyment!
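For the curious, here is a rough sketch of the one fiddly part of posting headlines to Twitter: squeezing a story title and its link into a single 140-character update. This is not the code the memetrackers run. The function name `format_update` and the trimming rules are assumptions made for illustration, and the API call that actually sends the update is left out.

```python
TWEET_LIMIT = 140  # Twitter's per-message character limit at the time of writing


def format_update(title: str, url: str, limit: int = TWEET_LIMIT) -> str:
    """Build a 'title url' status, trimming the title so the URL always fits."""
    title = " ".join(title.split())  # collapse stray whitespace and newlines
    room = limit - len(url) - 1      # characters left for the title (1 for the space)
    if room < 4:
        return url[:limit]           # no room for a meaningful title: send the link only
    if len(title) > room:
        # Leave 3 characters for "..." and drop a trailing partial word if there is one.
        cut = title[: room - 3].rsplit(" ", 1)[0]
        title = cut.rstrip() + "..."
    return f"{title} {url}"
```

Trimming the title rather than the URL keeps the link clickable, which is presumably the point of a headline feed.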

TechWatching (tech and web): http://twitter.com/techwatching
SunMeme (sun micro): http://twitter.com/sunmeme
WheelScore (automotive): http://twitter.com/wheelscore
PlasticBasket (environmental): http://twitter.com/plasticbasket.net

On a side note, I’m appreciating the utility of Twitter more and more every day as a broadcast medium and selective news filter.


If you enjoyed this post, make sure you subscribe to my RSS feed!

Follow Stories with Cluster Permalinks

I just rolled out a minor but important new feature across the memetracker network that I’m building (TechWatching, WheelScore, SunMeme, PlasticBasket) - story cluster permalinks.

For background, a “story cluster” is a group of stories around a certain topic - a “meme” or summary of a blogosphere discussion. A story cluster permalink gives you a permanent URL for viewing a story cluster - so if you’re following Cadillac’s hybrid motorbike or Fred Wilson’s views on Triangulation, you can follow that story today, tomorrow, a month from now, a year from now - whenever, not just when it’s still on the front page.

To get to a cluster’s permalink, just look for the “PERMALINK THIS CLUSTER” link at the top of each cluster.

So - enjoy!


Tweetmeme DOA?

I’m fitting this in while visiting family, so this post is late and likely short on insight. That being said, after pouring hours and days and weeks of hobby-time into building a relevancy algorithm (see: TechWatching), I feel the need to comment on Tweetmeme, which launched yesterday to much fanfare.

Divining interesting-ness from a content pool depends on several signals that relate disparate pieces of content to one another, allowing them to be linked together into a topical unit (i.e.: a cluster of stories all discussing a certain topic) that makes sense. Over on TechWatching, I use four things to create topical units (“story clusters”):

  1. forward links - the pages that a blog post links to
  2. back links - the pages that link to a given page
  3. keywords - the meaningful words that posts share - i.e.: proper names like “Google” are counted, conjunctions like “and” are not
  4. time - content must have some chronological proximity to be considered “linked” - i.e.: an article about Google from two months ago is less likely to be discussing the topic-du-jour than an article from two hours ago
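To make the four signals concrete, here is a minimal sketch of how they might be combined into a single relatedness score for a pair of posts. It is an illustration only, not the TechWatching algorithm: the `Post` class, the tiny stop list, the equal weighting and the 12-hour half-life are all assumptions.

```python
from dataclasses import dataclass, field
from datetime import datetime, timedelta

# Hypothetical stop list; a real one would be far longer.
STOP_WORDS = {"and", "or", "but", "the", "a", "an", "of", "to", "in", "on", "for", "is"}


@dataclass
class Post:
    url: str
    title: str
    published: datetime
    forward_links: set = field(default_factory=set)  # pages this post links to
    back_links: set = field(default_factory=set)     # pages that link to this post


def keywords(text: str) -> set:
    """Lower-cased words minus stop words (a crude stand-in for keyword extraction)."""
    words = (w.strip(".,:;!?\"'()").lower() for w in text.split())
    return {w for w in words if w and w not in STOP_WORDS}


def jaccard(a: set, b: set) -> float:
    """Overlap of two sets, from 0.0 (disjoint) to 1.0 (identical)."""
    return len(a & b) / len(a | b) if (a or b) else 0.0


def relatedness(p: Post, q: Post, half_life: timedelta = timedelta(hours=12)) -> float:
    """Combine the four signals; the equal weights and 12h half-life are arbitrary."""
    link_score = jaccard(p.forward_links, q.forward_links)
    back_score = jaccard(p.back_links, q.back_links)
    word_score = jaccard(keywords(p.title), keywords(q.title))
    gap = abs(p.published - q.published)
    time_factor = 0.5 ** (gap / half_life)  # halves for every half_life of separation
    return (link_score + back_score + word_score) / 3 * time_factor
```

Posts that score above some cutoff against each other would then be grouped into one story cluster. Multiplying by the time factor, rather than adding it, means that a two-month-old post scores close to zero no matter how many links and keywords it shares.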

Now, TechWatching indexes blog posts - which are characterized by all of the above four points - i.e.: blogs are noted for linking, posting quickly (chronological proximity), using relevant keywords and so on. My question about Tweetmeme is whether Twitter provides the same fertile breeding ground of memetic confluence as blogs… Personally, I think its ephemeral nature and limited length work against it. Aside from the occasional exceptions (Apple events, the SoCal fires), Twitter seems to be pretty scattershot - and those special situations seem to me to be better served by something more explicit, like hashtags.


TechWatching vs. MacWorld: MacWorld 1, TechWatching 0

TechWatching is now serving more relevant results and better links, and is doing so more frequently, thanks to a new & improved algorithm behind the scenes, prompted mainly by the MacWorld disaster.

I say “disaster” because the volume of stories related to MacWorld was the perfect storm of relevancy overabundance - the prior version of TechWatching could not scale to sort the volume, and the site quickly devolved to a huge mish-mash of barely organized, repetitious, muddled posts.

The new ranking & relevancy algorithm is actually a lot simpler than the old one: I ended up cutting code length by about 30% to get better results, and more efficient db use means that I can pump up the update frequency without melting down the server - all good.

If you stop by the site, you’ll see better “topical roundup tags” at the top, and more complete story clusters down below.

As always, your feedback on the site and the recent changes would be very appreciated.


SimplePie RSS Parsing

I just switched TechWatching from MagpieRSS to SimplePie. Both are RSS feed parsers built for PHP, and both offer great features like caching and conditional requests based on the HTTP Last-Modified header, which lessen the overhead of feed-checking. MagpieRSS, for whatever reason, has died on the vine - i.e.: its last blog post is from October of 2006. SimplePie has stepped up in the meantime to fill the gap.
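For readers who haven't met the Last-Modified trick, here is a minimal sketch of a conditional feed request. It is written in Python with the standard library rather than PHP, and it is not how SimplePie or MagpieRSS implement it. The function names are made up for this example, and the ETag handling is an extra that the post above does not mention.

```python
import urllib.request
from urllib.error import HTTPError


def conditional_headers(last_modified=None, etag=None):
    """Headers that ask the server to reply only if the feed has changed."""
    headers = {}
    if last_modified:
        headers["If-Modified-Since"] = last_modified  # value of the earlier Last-Modified header
    if etag:
        headers["If-None-Match"] = etag               # value of the earlier ETag header
    return headers


def fetch_if_changed(url, last_modified=None, etag=None, timeout=10):
    """Return (body, last_modified, etag), or None if the server answers 304 Not Modified."""
    request = urllib.request.Request(url, headers=conditional_headers(last_modified, etag))
    try:
        with urllib.request.urlopen(request, timeout=timeout) as response:
            return (
                response.read(),
                response.headers.get("Last-Modified"),
                response.headers.get("ETag"),
            )
    except HTTPError as err:
        if err.code == 304:  # unchanged since the last check: nothing to parse
            return None
        raise
```

The caller stores the returned Last-Modified and ETag values alongside the feed and passes them back on the next check, so an unchanged feed costs only a short 304 response instead of a full download and parse.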

So - I’m looking forward to not having to deal with a bunch of problems I thought I’d have to hack Magpie to handle: character encoding, CDATA-encoded content blocks, inconsistent namespaces, etc. SimplePie does an awesome job of handling feeds and their data. Big shout out to the SimplePie folks for their great work - thank you.


What’s up with NYT’s “Blog”runner?

I just took a first look at Blogrunner, to see what it does differently than TechMeme or TechWatching. Hmmm. Was it always like this? Or has it changed since the NYT bought it?

Here’s a snap from the Blogrunner landing page:

Here’s a list of the above-the-fold links on that page:

BBC News
BarackObama.com
The NYT
The NYT again
The Washington Post
The Financial Times
AP
The NYT yet again

Hmmmmmm. Doesn’t seem very blog-y. Scrolling down below the fold, there’s a long, narrow list of content composed of individual stories. Of the 33 listed when I stopped by…

  1. 7 were from the NYT.
  2. 19 were from what I would consider to be MSM news sources (WSJ, CNN, Forbes, LATimes, newswires, etc.) - that’s 58%

Blogrunner certainly has blog content - you just need to dig through a layer of MSM window dressing to get to it. Anyway, it seems odd to me that a service that proclaims blog-centric aggregation is dominated by MSM news sources.

One behavioural element that I’ve noticed in working on the TechWatching algorithm is that MSM sources will dominate a content pool - say what you will about blogging, the MSM still leads most “breaking” stories, and as such, bloggers will point to MSM outlets. A link analysis engine will then float the MSM news sources to the top of the “most relevant” pile - creating a presentation like what we see on Blogrunner, or what often happens on TechMeme. (Note: I’m not saying that’s how those sites work, but I can see how link-watching algorithms could create a page like that on any aggregator).


TechWatching Update… new & improved

Some exciting news tonight! Well, exciting for me anyway. My TechMeme alternative and Tailrank-competitor TechWatching now offers vastly improved “conceptual clustering.”

Previously, the homepage organized “hot topics” into keyword clusters, based on the relative prevalence of a given keyword within a pool of content from a certain time period. So, for example, “Google” would be associated with “wireless,” “spectrum,” “auction,” “reader,” and “street.” That’s where it stopped, however, not taking into account that wireless/spectrum/auction all referred to the same topic. The result was that “reader” and “street” were crowded out of the “Hot Topics” area by three similar tags about the upcoming wireless spectrum auction - which left some interesting Google-related news out of sight.

Now, that second layer of clustering is in effect, grouping together wireless/spectrum/auction under “spectrum,” which was the most “powerful” keyword in the concept cluster based on observed volume. That leaves room for “reader” and “street” and lets you know about updates to Google’s RSS reader and street-level views - hurray! The “other” conceptual group member tags - i.e.: wireless and auction - now provide alternative views of the same story cluster, available via an “Also See” link.
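Here is one way the second layer could be sketched: treat each keyword as the set of stories that mention it, and fold a keyword into an existing group when its stories mostly overlap with the group's label. This is a guess at the general shape, not the code running on TechWatching. The function name, the greedy strategy and the 0.5 overlap threshold are all assumptions.

```python
def group_keywords(stories_by_keyword: dict, threshold: float = 0.5) -> dict:
    """Greedily merge keywords whose story sets overlap; label each group by its largest member."""
    # Visit high-volume keywords first so they become the group labels.
    ordered = sorted(stories_by_keyword, key=lambda k: len(stories_by_keyword[k]), reverse=True)
    groups = {}  # label keyword -> list of "also see" keywords
    for kw in ordered:
        stories = stories_by_keyword[kw]
        for label in groups:
            base = stories_by_keyword[label]
            union = stories | base
            overlap = len(stories & base) / len(union) if union else 0.0
            if overlap >= threshold:
                groups[label].append(kw)  # same concept: file it under the label
                break
        else:
            groups[kw] = []  # no close match: start a new concept group
    return groups
```

With the Google example above, a call might look like this (the story ids are made up):

```python
related = {
    "spectrum": {1, 2, 3, 4},
    "wireless": {1, 2, 3},
    "auction": {2, 3, 4},
    "reader": {5, 6},
    "street": {7},
}
print(group_keywords(related))
```

Here "wireless" and "auction" end up as "also see" tags under "spectrum", while "reader" and "street" keep their own slots.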

Anyway, it all sounds sort of complex: but the upshot, hopefully, is simplicity for TechWatching readers. Have a look below, or check it out for real:

On a side note, a few short weeks ago, I knew nothing about algorithms and content analysis - not to say that I know much now, but I’m certainly having fun figuring it out.


Is there a market for niche-vertical Digg Clones?

Dave Winer is talking elitism again, suggesting that he’d like a Digg clone with 25, 100, or 1000 members, made up of those whom he considers worthy.

First observation: FARK is essentially a hybrid between Winer’s vision and Digg. The FARK elite are the mods who have the power to “greenlight” a submission from TotalFark and push it up onto the frontpage of fark.com. FARK imposes a Winer-style “elite filter” on top of a Digg-style submitter-pool. ParisLemon points out that Calacanis’ Netscape tried for a similar model, with more public moderation and moderated and user-voted content in parallel.

Second Observation: The Blogosphere already does this, to a degree. This is the concept that TechMeme figured out first: each link in a blog post is essentially a “submission” to the blogosphere, if the blogosphere were itself a digg-style aggregator. That is to say, bloggers are the curators: Each blog post that links to the same story/webpage/etc. counts as another vote in its favour. To carry over Winer’s idea, he’d like a customizable pool of blogs to draw on to populate his aggregator.
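The "links as votes" idea is simple enough to sketch in a few lines. The example below is an illustration under assumed inputs (a mapping from blog name to the URLs it linked to), not TechMeme's or TechWatching's actual method; the optional `pool` argument stands in for Winer's hand-picked set of curators.

```python
from collections import Counter


def rank_by_votes(links_by_blog: dict, pool=None) -> list:
    """Count one vote per blog per linked URL, optionally limited to a curated pool of blogs."""
    votes = Counter()
    for blog, links in links_by_blog.items():
        if pool is not None and blog not in pool:
            continue              # this blog is not one of the chosen curators
        votes.update(set(links))  # set(): a blog linking twice still votes once
    return votes.most_common()    # (url, votes) pairs, most-linked first
```

Passing a different `pool` changes the front page without changing the algorithm, which is more or less the customizable aggregator described above.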

IMHO, TechMeme lost its way a bit over time, by including things like MSM coverage, dugg stories, and press releases - opening the curation too wide (hence my response at TechWatching).

Bring it all together…

And we’re talking about a spectrum of curation with two axes:

Dave Winer’s staked out the corner that says “community moderation of a small pool of submitters,” compared to, for example, FARK’s position at “direct moderation of a large pool,” or Digg’s “indirect (community) moderation of a large pool.” Netscape sits awkwardly in between, hoping to hold the middle ground.

So: is there a market in Dave’s corner of aggregation-space? What do you think? The chart above does nothing to relate position to “quality” - so I suppose ultimately it depends on the quality of the Elite that Winer deigns to include. Perhaps there’s a different middle ground to be found by voting on who’s considered “elite” enough to be one of the curators…


TechWatching Blog Search is coming along…

Want to know what the tech blogosphere has said about Windows Vista in the past week? Click here to have a look.

TechWatching now surfaces the indexes that it uses for meme-tracking as a searchable database. Some comparative analysis:

  1. It’s very vertical. This searches only those blogs that TechWatching indexes - about 225 as of this writing. This number will grow over time of course, but it will never have the breadth of, say, Google Blog Search. On the upside, results will be spam free and from “trusted” sources vetted by the entire tech blogosphere.
  2. It’s not deep yet. TechWatching isn’t crawling sites - just following feeds. So the content is very chronologically shallow so far, going back only 7 days (that’s when the current database was rolled out). Of course, it gets deeper second by second, but if you’re looking for older stuff, it’s back to Google or Technorati.
  3. It indexes titles only. I’m trading depth for efficiency here, based on the assumption that if a blogger is writing about something for a feed-reading audience, they’re going to put their important topical keywords in their post titles. Undoubtedly some posts will fall through the cracks because of this - but FWIW, if a blogger is constantly obfuscating their posts with titles that bear at best a tangential relation to their post content, well… that’s annoying for feed-scanning people too, not just robots.

I also built in a time-boxer today, so you can see posts from the last few hours, today, etc. Of course, the periods longer than two weeks don’t mean a lot as the index doesn’t go back that far yet….
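A title-only index with a time-boxer can be sketched quite compactly. What follows is an in-memory toy written for illustration, not the TechWatching search engine (which sits on a database); the `TitleIndex` class and its methods are hypothetical names.

```python
from collections import defaultdict
from datetime import datetime, timedelta


class TitleIndex:
    """Tiny inverted index over post titles with a time-window filter."""

    def __init__(self):
        self._posts = []                  # (published, title, url) tuples
        self._by_word = defaultdict(set)  # word -> positions in self._posts

    def add(self, title: str, url: str, published: datetime) -> None:
        position = len(self._posts)
        self._posts.append((published, title, url))
        for word in title.lower().split():
            self._by_word[word.strip(".,:;!?\"'()")].add(position)

    def search(self, query: str, now: datetime, window: timedelta) -> list:
        """Posts whose titles contain every query word, newest first, within the window."""
        words = query.lower().split()
        if not words:
            return []
        hits = set.intersection(*(self._by_word.get(w, set()) for w in words))
        recent = [self._posts[i] for i in hits if now - self._posts[i][0] <= window]
        return sorted(recent, reverse=True)
```

Searching for "windows vista" with a `window` of `timedelta(days=7)` corresponds to the "past week" view, and shrinking the window to a few hours gives the shorter time boxes. Because only titles are indexed, the index stays small, at the cost of missing posts whose titles don't mention their topic.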

Anyway - enjoy. I built this engine on top of TechWatching because I find Technorati frustrating and inconsistent, and lord only knows, the world needs an alternative to Google. Also, I believe in the power of focused verticals to deliver superior results.

More to follow - stay tuned.


TechWatching.com v1.5 is Live

I mentioned earlier that I might spin off my relevance algorithm as a standalone site - which I’ve now gone ahead and done at TechWatching.com. TechWatching is a blog/news aggregator that competes squarely with TechMeme.

TW is divided into 4 sections:

  1. Hot Topics: TW maintains a keyword index of the blogs it follows. When a keyword shows up more frequently than average, it can be promoted to a Hot Topic - in which case it will show up at the top of the page with closely related keywords and relevant stories.
  2. New Stories: Immediately under the Hot Topics resides “new stories” - stories that are getting attention in the tech blogosphere, but that don’t fit into a specific Hot Topic. This area is analogous to the TechMeme presentation.
  3. Below the Fold: Under New Stories resides BTF: Stories that have fallen out of the New Stories area because they are aging and haven’t gotten enough momentum behind them to stay “new” or to be promoted to a Hot Topic.
  4. Must Read: The right-hand column contains a list of the most-clicked on stories of the last 24 hours, collected from the site and RSS readers. Presumably lots of people are reading these - and thus, so should you.
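The Hot Topics rule ("a keyword shows up more frequently than average") might look something like the sketch below. The thresholds, the function name and the shape of the baseline data are assumptions for the sake of the example, not TechWatching's actual numbers.

```python
from collections import Counter


def hot_topics(recent_titles, baseline_rate, min_count=3, ratio=2.0):
    """Keywords appearing in recent titles at least `ratio` times their usual rate."""
    counts = Counter()
    for title in recent_titles:
        counts.update(set(title.lower().split()))  # count each keyword once per post
    total = len(recent_titles)
    hot = []
    for word, count in counts.items():
        usual = baseline_rate.get(word, 0.0)  # long-run share of posts mentioning the word
        rate = count / total                  # share of recent posts mentioning the word
        if count >= min_count and rate >= ratio * usual:
            hot.append((word, count))
    return sorted(hot, key=lambda item: item[1], reverse=True)
```

Comparing against a per-keyword baseline, rather than using raw counts, keeps everyday words and perennial names from being promoted just because they are always around; the `min_count` floor keeps a single stray post from creating a topic.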

That’s it in a nutshell. It’s only refreshing hourly right now, but I’ll be ramping up the rate tonight.

Below is a screenshot, or click over to http://techwatching.com to see the real thing. Feedback is welcome and appreciated!
