TechFold is technology discussion, commentary, reviews, and opinions from well outside the valley. There's no koolaid to drink here, and TechFold is not in SL, or on Twitter.

Tweetmeme DOA?

I’m fitting this in while visiting family, so this post is late, and likely short on insight. That being said, after pouring hours and days and weeks of hobby-time into building a relevancy algorithm (see: TechWatching), I feel the need to comment on Tweetmeme, which launched yesterday to much fanfare.

Divining interesting-ness from a content pool depends on pieces of information that establish relationships between otherwise disparate posts, allowing them to be linked together into a topical unit that makes sense (i.e.: a cluster of stories all discussing a certain topic). Over on TechWatching, I use four things to create these topical units (“story clusters”):

  1. forward links - the pages that a blog post links to
  2. back links - the pages that link to a given page
  3. keywords - the meaningful words that posts share - i.e.: proper names like “Google” are counted, conjunctions like “and” are not
  4. time - content must have some chronological proximity to be considered “linked” - i.e.: an article about Google from two months ago is less likely to be discussing the topic-du-jour than an article from two hours ago
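To make the four signals concrete, here is a minimal Python sketch of a pairwise “are these two posts about the same thing?” test. It is not the TechWatching code; the `Post` fields, the tiny stopword list, the 48-hour window, and the two-shared-keywords cutoff are all illustrative assumptions.

```python
from dataclasses import dataclass, field
from datetime import datetime, timedelta

# Hypothetical stopword list; a real one would be much longer.
STOPWORDS = {"a", "an", "and", "the", "of", "to", "in", "on", "for", "is", "or", "but"}


@dataclass
class Post:
    url: str
    title: str
    published: datetime
    links_out: set = field(default_factory=set)  # forward links


def keywords(text):
    """Lowercase the words, strip punctuation, and drop stopwords like "and"."""
    words = (w.strip(".,:;!?\"'()").lower() for w in text.split())
    return {w for w in words if w and w not in STOPWORDS}


def related(a, b, window=timedelta(hours=48), min_shared=2):
    """Decide whether two posts belong in the same story cluster (a sketch)."""
    # time: posts too far apart chronologically are never linked
    if abs(a.published - b.published) > window:
        return False
    # forward links / back links: one post links directly to the other
    if b.url in a.links_out or a.url in b.links_out:
        return True
    # both posts point at the same page
    if a.links_out & b.links_out:
        return True
    # keywords: enough meaningful words in common
    return len(keywords(a.title) & keywords(b.title)) >= min_shared
```

Checking time first is a cheap way to rule out most pairs before doing any set comparisons.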

Now, TechWatching indexes blog posts, which are characterized by all four of the above points: blogs are noted for linking, posting quickly (chronological proximity), using relevant keywords, and so on. My question about Tweetmeme is whether Twitter provides the same fertile breeding ground of memetic confluence as blogs… Personally, I think its ephemeral nature and limited length work against it. Aside from the occasional exceptions (Apple events, the SoCal fires), Twitter seems to be pretty scattershot - and those special situations seem to me to be better served by something more explicit, like hashtags.

If you enjoyed this post, make sure you subscribe to my RSS feed!

What’s up with NYT’s “Blog”runner?

I just took a first look at Blogrunner, to see what it does differently than TechMeme or TechWatching. Hmmm. Was it always like this? Or has it changed since the NYT bought it?

Here’s a snap from the Blogrunner landing page:

Here’s a list of the above-the-fold links on that page:

BBC News
BarackObama.com
The NYT
The NYT again
The Washington Post
The Financial Times
AP
The NYT yet again

Hmmmmmm. Doesn’t seem very blog-y. Scrolling down below the fold, there’s a long, narrow list of content composed of individual stories. Of the 33 listed when I stopped by…

  1. 7 were from the NYT.
  2. 19 were from what I would consider to be MSM news sources (WSJ, CNN, Forbes, LATimes, newswires, etc.) - that’s 58%

Blogrunner certainly has blog content - you just need to dig through a layer of MSM window dressing to get to it. Anyway, it seems odd to me that a service that proclaims blog-centric aggregation is dominated by MSM news sources.

One behavioural element that I’ve noticed in working on the TechWatching algorithm is that MSM sources will dominate a content pool - say what you will about blogging, the MSM still leads most “breaking” stories, and as such, bloggers will point to MSM outlets. A link analysis engine will then float the MSM news sources to the top of the “most relevant” pile - creating a presentation like what we see on Blogrunner, or what often happens on Techmeme. (Note: I’m not saying that’s how those sites work, but I can see how link-watching algorithms could create a page like that on any aggregator.)

TechWatching Update… new & improved

Some exciting news tonite! Well, exciting for me anyway. My TechMeme alternative and Tailrank-competitor TechWatching now offers vastly improved “conceptual clustering.”

Previously, the homepage organized “hot topics” into keyword clusters, based on the relative prevalence of a given keyword within a pool of content from a certain time period. So, for example, “Google” would be associated with “wireless,” “spectrum,” “auction,” “reader,” and “street.” That’s where it stopped, however, not taking into account that wireless/spectrum/auction all referred to the same topic. The result was that “reader” and “street” were crowded out of the “Hot Topics” area by three similar tags about the upcoming wireless spectrum auction - which left some interesting Google-related news out of sight.

Now, that second layer of clustering is in effect, grouping wireless/spectrum/auction together under “spectrum,” which was the most “powerful” keyword in the concept cluster based on observed volume. That leaves room for “reader” and “street” and lets you know about updates to Google’s RSS reader and street-level views - hurray! The “other” members of the concept group (i.e.: “wireless” and “auction”) now provide alternative views of the same story cluster, available via an “Also See” link.
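A rough sketch of that second clustering layer, assuming each hot keyword maps to the set of story IDs it appears in: keywords whose story sets overlap heavily (Jaccard similarity at or above a hypothetical 0.5 cutoff) are merged, the highest-volume keyword becomes the cluster label, and the rest become “Also See” entries. The real TechWatching grouping rule may well differ.

```python
def concept_clusters(stories_by_keyword, threshold=0.5):
    """Group hot keywords that mostly point at the same stories.

    stories_by_keyword maps keyword -> set of story ids. Returns a list of
    (label, also_see) pairs, where label is the highest-volume keyword.
    """
    # Visit keywords from highest to lowest volume so each cluster's first
    # member is its most "powerful" keyword.
    ordered = sorted(stories_by_keyword,
                     key=lambda k: len(stories_by_keyword[k]), reverse=True)
    clusters = []  # list of (label, members, union of story ids)
    for kw in ordered:
        stories = stories_by_keyword[kw]
        for _label, members, pool in clusters:
            union = stories | pool
            # Jaccard similarity between this keyword's stories and the cluster's
            if union and len(stories & pool) / len(union) >= threshold:
                members.append(kw)
                pool |= stories  # in-place update of the cluster's story set
                break
        else:
            clusters.append((kw, [kw], set(stories)))
    return [(label, members[1:]) for label, members, _pool in clusters]
```

Fed toy data in which “wireless” and “auction” mostly share the stories tagged “spectrum,” this returns “spectrum” as the label with the other two under “Also See,” leaving “reader” and “street” as separate topics.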

Anyway, it all sounds sort of complex: but the upshot, hopefully, is simplicity for TechWatching readers. Have a look below, or check it out for real:

On a side note, a few short weeks ago, I knew nothing about algorithms and content analysis - not to say that I know much now, but I’m certainly having fun figuring it out.

Is there a market for niche-vertical Digg Clones?

Dave Winer is talking elitism again, suggesting that he’d like a Digg clone with 25, 100, or 1000 members made up of people he considers worthy.

First observation: FARK is essentially a hybrid between Winer’s vision and Digg. The FARK elite are the mods who have the power to “greenlight” a submission from TotalFark and push it up onto the front page of fark.com. FARK imposes a Winer-style “elite filter” on top of a Digg-style submitter pool. ParisLemon points out that Calacanis’ Netscape tried a similar model, with more public moderation, running moderated and user-voted content in parallel.

Second observation: The blogosphere already does this, to a degree. This is the concept that TechMeme figured out first: each link in a blog post is essentially a “submission” to the blogosphere, as if the blogosphere were itself a Digg-style aggregator. That is to say, bloggers are the curators: each blog post that links to the same story/webpage/etc. counts as another vote in its favour. To carry over Winer’s idea, he’d like a customizable pool of blogs to draw on to populate his aggregator.
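The “links as votes” idea, plus Winer’s hand-picked pool, can be sketched in a few lines of Python. This is an illustration under assumptions, not how TechMeme actually counts: each blog contributes at most one vote per linked URL, and an optional `curators` set (a hypothetical parameter) restricts who gets to vote.

```python
from collections import Counter


def tally_votes(links_by_blog, curators=None):
    """Count each linked URL as one vote per blog that links to it.

    links_by_blog maps a blog's name to the URLs its recent posts link to.
    If curators is given, only those blogs vote (the Winer-style elite pool);
    otherwise every indexed blog votes.
    """
    votes = Counter()
    for blog, urls in links_by_blog.items():
        if curators is not None and blog not in curators:
            continue  # not in the curated pool, so no vote
        votes.update(set(urls))  # one vote per blog, however often it links
    return votes.most_common()
```

Swapping the `curators` set is all it takes to move from “everyone votes” to “a small pool votes,” which is roughly the difference between the Digg and Winer corners.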

IMHO, TechMeme lost its way a bit over time, by including things like MSM coverage, dugg stories, and press releases - opening the curation too wide (hence my response at TechWatching).

Bring it all together…

And we’re talking about a spectrum of curation with two axes:

Dave Winer has staked out the corner that says “community moderation of a small pool of submitters,” compared to, for example, FARK’s position at “direct moderation of a large pool,” or Digg’s “indirect (community) moderation of a large pool.” Netscape sits awkwardly in between, hoping to hold the middle ground.

So: is there a market in Dave’s corner of aggregation-space? What do you think? The chart above does nothing to relate position to “quality” - so I suppose ultimately it depends on the quality of the Elite that Winer deigns to include. Perhaps there’s a different middle ground to be found by voting on who’s considered “elite” enough to be one of the curators…

TechWatching.com v1.5 is Live

I mentioned earlier that I might spin off my relevance algorithm as a standalone site - which I’ve now gone ahead and done at TechWatching.com. TechWatching is a blog/news aggregator that competes squarely with TechMeme.

TW is divided into 4 sections:

  1. Hot Topics: TW maintains a keyword index of the blogs it follows. When a keyword shows up more frequently than average, it can be promoted to a Hot Topic - in which case it will show up at the top of the page with closely related keywords and relevant stories.
  2. New Stories: Immediately under the Hot Topics resides “new stories” - stories that are getting attention in the tech blogosphere, but that don’t fit into a specific Hot Topic. This area is analogous to the TechMeme presentation.
  3. Below the Fold: Under New Stories resides BTF: Stories that have fallen out of the New Stories area because they are aging and haven’t gotten enough momentum behind them to stay “new” or to be promoted to a Hot Topic.
  4. Must Read: The right-hand column contains a list of the most-clicked stories of the last 24 hours, collected from the site and RSS readers. Presumably lots of people are reading these - and thus, so should you.
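The Hot Topics promotion rule (“shows up more frequently than average”) could look something like the sketch below, which compares a keyword’s count in the current window with its average over earlier windows. The ratio threshold, minimum count, and +1 smoothing are hypothetical choices, not TechWatching’s actual parameters.

```python
def hot_topics(current_counts, history, min_ratio=3.0, min_count=5):
    """Return (keyword, ratio) pairs for keywords well above their usual rate.

    current_counts: keyword -> number of posts mentioning it this window.
    history: list of earlier windows, each a keyword -> count dict.
    """
    hot = []
    for kw, count in current_counts.items():
        if count < min_count:
            continue  # too few mentions to call it a topic
        baseline = sum(w.get(kw, 0) for w in history) / max(len(history), 1)
        # +1 smoothing so a brand-new keyword doesn't divide by zero
        ratio = count / (baseline + 1)
        if ratio >= min_ratio:
            hot.append((kw, round(ratio, 2)))
    # Strongest spikes first, i.e. the top of the Hot Topics area
    return sorted(hot, key=lambda item: item[1], reverse=True)
```

Measuring against a keyword’s own baseline keeps perennially busy words like “google” from being promoted just for being common.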

That’s it in a nutshell. It’s only refreshing hourly right now, but I’ll be ramping up the rate tonite.

Below is a screenshot, or click over to http://techwatching.com to see the real thing. Feedback is welcome and appreciated!

My spare time TechMeme competitor

I have a rudimentary knowledge of a lot of things: statistics, programming theory, php, mysql, consumer behaviour, and so on. Individually, I’m an expert in none, and could not do any professionally to save my life. Taken as a package though, I know just enough info about just enough disciplines to have a little bit of fun.

In that vein, over the last week, while recovering from a flu/cold/lackofsleep, I glued together a clunky TechMeme competitor as a conceptual exercise for myself to see if I could apply the bits and pieces that I know to generate relevant results.

You can see the system’s (live) output here: http://hddb.net/techstream_index.html

(hddb.net is a different venture of mine, and a convenient development box)
(”techstream” is just my internal development prefix, not branding)

It refreshes every half hour, and as of right now, is actively following 167 blogs in the technology sector. It uses MagpieRSS to cache and check feeds, so hopefully if you’re on the list you’re not seeing weird spikes from hddb.net.

Once it picks up your post, it does a bunch of brute force, ugly stuff to it to try and place it in a larger context. It looks for other posts that yours links to, and other posts that link to yours. It splits up your post title into tags, and searches for other posts that share common tags. When all is said and done, it stitches together tag relevance, links to and from, how long the post has been kicking around, click-through popularity, etc., and through magic that I can’t even really follow any more, it spits out the output that you see.
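The “magic” at the end amounts to blending several signals into one number. A minimal, hypothetical version: a weighted sum of tag matches, inbound links, outbound links, and clicks, multiplied by an exponential age decay. The weights and the six-hour half-life are made up for illustration and are not the real TechWatching values.

```python
def story_score(tag_matches, links_in, links_out, clicks, age_hours,
                half_life_hours=6.0, weights=(1.0, 2.0, 0.5, 0.25)):
    """Blend relevance signals into one score; weights are illustrative guesses."""
    w_tags, w_in, w_out, w_clicks = weights
    raw = (w_tags * tag_matches + w_in * links_in
           + w_out * links_out + w_clicks * clicks)
    # Halve the score every half_life_hours so older posts drift down the page.
    decay = 0.5 ** (age_hours / half_life_hours)
    return raw * decay
```

One knob for hot items that overstay their welcome at the top of the page would be a shorter `half_life_hours`.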

The algorithm takes about a minute to run during low post volume times (like Sunday evenings), and swells to up to 5 minutes when the blogosphere is cooking.

The result? Not bad, IMHO. It’s not as balanced as TechMeme, in that hot items will cling to the top of the page for longer than they should. The page is also longer than it should be. It also doesn’t have the breadth of TechMeme - I imagine Gabe’s algorithm is following more than 167 blogs. I’m also manually adding feeds, whereas one of TechMeme’s greatest strengths is that it (I think) picks up new blogs to follow automatically, based on the link volumes it sees. I’d like to get there eventually. I’m also missing any RSS output - there’s nothing to subscribe to yet, and I’m not even sure what form such a subscription would take (I’ve never really followed TechMeme’s bulk feed output).
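For the automatic feed discovery I’d like to get to, one plausible approach (a guess, not a description of how TechMeme does it) is to count outbound links from the blogs already followed and flag any unfollowed domain that gets linked often enough. The `min_links` cutoff is an arbitrary assumption.

```python
from collections import Counter
from urllib.parse import urlparse


def discover_blogs(outbound_links, followed_domains, min_links=3):
    """Suggest new sources: domains that followed blogs keep linking to."""
    counts = Counter()
    for url in outbound_links:
        domain = urlparse(url).netloc.lower()
        if domain and domain not in followed_domains:
            counts[domain] += 1
    # Most-linked candidates first, dropping anything below the cutoff
    return [d for d, n in counts.most_common() if n >= min_links]
```

The output is only a candidate list; a human (or a second rule) would still need to check that each domain actually publishes a feed.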

Anyway, enjoy. Your thoughts/comments/etc. are welcome. If you’d like your blog added to the index, let me know.

AT&T Protest Graphic for your blog, website, whatever

AT&T is stumbling through the Internet age with a stunning lack of foresight: in support of their TV offering, and to make friends with Hollywood, they’ve agreed to work towards actively scanning all AT&T network traffic for copyrighted material.

I’m not going to go into the details. CenterNetworks, Uneasy Silence, Dave Winer, Techcrunch, Ars Technica, AllThingsDigital, and others have covered it exhaustively already.

I will, however, contribute a protest button/graphic/badge. Right-click & Save As, and deploy on your blog/website/tshirt/etc as you see fit. Link it back to whatever post, petition, goatse picture, or whatever you choose. Please don’t hotlink. It’s a friendly 14 KB PNG, with a popular pop-culture reference to make it extra topical.

UPDATE: I took it down until I can figure out if the AT&T corporate logo can be used for critical purposes under the “fair use” clause without first seeking permission. I am a legal chicken-shit, yes.

UPDATE 2: What do you say - would it be covered under nominative use?

UPDATE 3: Removed literal trademark elements while retaining visual cues and tagline syntax. Should be legal, IMHO.

Apple, Sun, Vista, and ZFS

A few days back, Sun CEO Jonathan Schwartz made an announcement about Sun’s next-gen file system (ZFS) being one of the key features of Apple OSX Leopard. Today, Apple rejected that sound bite, claiming HFS+ (from 1998) is still the order of the day.

First point: this reminds me a lot of Windows Vista and the whole WinFS debacle. It’s amazing how much trouble filesystems seem to give OS manufacturers.

Second point: How does one have so severe a disconnect between major companies that embarrassing PR situations like this take place?

I’m not much of a geek, but one thing that does get me excited is file systems. Speed, searchability, security, reliability - all of these are tied to your FS, and ZFS seems to offer great advantages for each. So - I wish Apple or MS would get to it already.

Staying on TechMeme

Huh. Looks like I’m de-listed from TechMeme - again. TechMeme seems to drop sites from its “to-index” list if you don’t link to other stories that get on TechMeme regularly. To be honest, I don’t really like that - it reinforces the feedback loop that clusters blog posts around whatever the two or three hot topics of the day may be. That decreases the breadth of stories that get written, as writers are encouraged to write on TechMeme’d topics and discouraged from writing on others. It also, of course, decreases the breadth of stories that hit TechMeme.

Of course no one should be writing “for” TechMeme - but TechMeme to some degree has come to define what’s “topical” in the tech blogosphere. I think Gabe R. would state that TechMeme just chronicles the discussion taking place, but I’d argue that by virtue of existing, TechMeme is influencing. The traffic that TechMeme can deliver provides motivation to write on stories that are on TechMeme - making it an active part of the blogosphere - not an impartial observer. I’ve certainly written posts on TechMeme-listed news expressly to get traffic - and have been rewarded for it with nice spikes.

Over the past two weeks, however, I’ve been actively trying to read TechMeme less, so as to step away from the topics that bloggers flock to and focus my writing on what really interests me and, hopefully, TechFold readers. The hope is that creating a blog with a personality of its own, as opposed to a mere reflection of What’s Hot on TechMeme, will in the long run garner this site more & more dedicated readers.

Thoughts? Please share.

A Suggestion for TechMeme: Split into bigbusiness & grassroots by indexing stock price

I love Techmeme, visit it all the time. But more and more, it seems to be getting stormed by “big” stories from Google or Microsoft, leaving the other happenings of the Tech Blogosphere hanging off the bottom, unnoticed.

So - I have a proposal for Gabe: map story keywords to stock prices. If the dominant keyword in a story (e.g.: Google = GOOG) is publicly traded and has a stock price over a certain threshold ($100? $50?), the story and all of its “discussion” and “related” stories go into BlueChipMeme or BigMeme or something.

That would be a nice, opinion-agnostic way of separating out big business vs. grassroots/startups. Dell, IBM, Google, Microsoft, Yahoo!, etc. would be chronicled on one site; startups, opinions, and blogosphere happenings on another.

Plus: it’s a cool mashup!

EDIT: Another option - keep all stories in the same site, but offer a stock price threshold slider at the top of the page to let readers tailor the stories being displayed to their preferences.
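Both versions of the proposal fit in one small function: look up the dominant keyword’s ticker, compare its price against a threshold, and route the story accordingly. The slider from the EDIT is just the `threshold` argument. The keyword-to-ticker map here, and the prices in the usage, are hypothetical placeholders, not real quotes or a real market-data API.

```python
# Illustrative only: a hand-made keyword -> ticker map.
TICKERS = {"google": "GOOG", "microsoft": "MSFT", "yahoo": "YHOO", "dell": "DELL"}


def split_by_price(stories, prices, threshold=100.0):
    """Route each story to a big-business or grassroots list.

    stories: list of (title, dominant_keyword) pairs.
    prices: ticker -> share price, from whatever quote source is available.
    """
    big, grassroots = [], []
    for title, keyword in stories:
        ticker = TICKERS.get(keyword.lower())
        price = prices.get(ticker) if ticker else None
        if price is not None and price > threshold:
            big.append(title)
        else:
            grassroots.append(title)  # unlisted, unknown, or below threshold
    return big, grassroots
```

Calling it again with a lower `threshold` moves more companies into the big-business bucket, which is exactly what dragging the slider would do.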
