TechFold is technology discussion, commentary, reviews, and opinions from well outside the valley. There's no koolaid to drink here, and TechFold is not in SL, or on Twitter.

Another Tag Silo - Twitter Hashtags

A few days ago, I riffed on how the failure of user-powered tagging was what was driving the need for a semantic web - that jumbled, discontiguous tagging implementations had created a plethora of tag city-states whose inability to talk on a “national” level had reduced the tagging movement to a curiosity.

Today, another entrant in the form of Hashtags - tags for Twitter posts. Again, useful within the silo of the Twitter-verse, but clunky to extend outwards. You can read more on hashtags via stoweboyd, or stephanie booth, or check out full coverage.

The stated purpose of hashtags is to allow one to follow a topical Twitter stream, as was useful for those techies fleeing the SoCal fires this past year. But how much cooler would it be if you could stitch together Twitter content, Flickr coverage, posted videos, blog posts, and news, into a single realtime view of a given situation? That would look a lot like the output of a semantic application.

To do so now would require onerous hard-coding of proprietary hooks into each service’s API (Twitter, Flickr, YouTube, etc.), with more custom coding to parse out time and geo-relevance data. As I mentioned in my previous article, a two-tiered tagging system composed of machine and human tags, shared in a consistent format, and conforming to common baseline standards would enable this.
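To make the "single realtime view" idea concrete, here is a minimal Python sketch. It assumes every service already hands over items in one shared shape: a timestamp (a machine tag) plus a set of human tags. The `Item` class, the `normalize` rule, and the sample data are all invented for illustration; nothing here calls a real Twitter, Flickr, or YouTube API.

```python
from dataclasses import dataclass
from datetime import datetime, timezone


@dataclass(frozen=True)
class Item:
    source: str          # label only, e.g. "twitter"; no real API is involved
    posted_at: datetime  # machine tag: publication time
    tags: frozenset      # human tags, already normalized
    text: str


def normalize(tag: str) -> str:
    """Lower-case a tag and drop '#', spaces, hyphens and other punctuation."""
    return "".join(ch for ch in tag.lower() if ch.isalnum())


def make(source, iso_time, raw_tags, text):
    """Build an Item from loosely formatted input (hypothetical helper)."""
    posted = datetime.fromisoformat(iso_time).replace(tzinfo=timezone.utc)
    return Item(source, posted, frozenset(normalize(t) for t in raw_tags), text)


def timeline(items, tag):
    """Return every item carrying `tag`, oldest first, regardless of source."""
    wanted = normalize(tag)
    return sorted((i for i in items if wanted in i.tags), key=lambda i: i.posted_at)


# Made-up sample data standing in for three different services.
items = [
    make("flickr", "2007-10-23T18:05:00", ["SoCal Fires"], "photo of smoke"),
    make("twitter", "2007-10-23T17:40:00", ["#socalfires"], "evacuating now"),
    make("blog", "2007-10-23T19:10:00", ["socal-fires", "news"], "roundup post"),
    make("twitter", "2007-10-23T18:30:00", ["#lunch"], "unrelated"),
]

for item in timeline(items, "#SoCalFires"):
    print(item.posted_at.isoformat(), item.source, item.text)
```

The interesting part is how little code is left once the format is shared: the per-service "hooks" shrink to whatever fills in `make`, and the stitching itself is a filter and a sort.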


If you enjoyed this post, make sure you subscribe to my RSS feed!

The Failure of Tagging is what’s driving Web 3.0

Tagging is the dream that everyone more or less forgot about.

Tags - little topical keyword snippets that were supposed to herald the start of a new, exciting web of user-classified content. By appending your blogposts/photos/videos/user-generated-content (ugc) with a few simple keywords, the web was expected to “self-organize” into an organic, emergent user-powered taxonomy that would provide the framework for parsing out “meaning.” It was going to make it easy to find deep information on any topic - linking up the long tail of all content types into a parseable, coherent, human-organized whole. Effectively, tagging was intended to provide a distributed, free, human powered search index - one publicly-owned, free super-index to rule them all.

Unfortunately, it didn’t happen. The practicalities of execution were pushed to the wayside in the rush to embrace the decentralized nature of ugc under the guise of woodsy, campfire “folksonomies.” A point of balance between “free” and “structured” was never identified, and “Tagging” was never executed in a way that would give it a fighting chance to live up to its promise; a hodge-podge of formats, options, and implementations guaranteed that the concept’s deployments would remain silo’d and keep conceptually linked material fragmented. The glow from each of the folksonomy campfires didn’t extend off of its own URL.

As a result, what we’ve come to accept as “tagging” is the silo’d view. Flickr and del.icio.us are the two brightest burning fires here - both have deep tag taxonomies that have been successfully integrated into the core of their respective user experiences, but neither is designed to extend outwards from their individual campground silos.

And therein lies the core of the problem: a standardized means of sharing tags was never devised, and as much of an anathema as it might be to the “open” nature of Web 2.0, a clearing-house mechanism to reconcile divergent taxonomies never arose. Technorati tried and failed to provide this: remember the days when “tag search” was a prominent Technorati home page option? It’s not any more. Do you see “Tag Search” on Google Blog Search? No - because Google’s algorithm does a better job of stitching together meaning than user-supplied tags.

The fundamental broken-ness of tagging has been compounded over time by users’ recognition that outside of those certain successful silos, tagging really has no point - consider the example of tag-based classified search engine Edgeio - which just went belly up. Or “geotagging” as a concept outside of Flickr. At the core of the dismissal is the user’s desire for consistency - with a Google-esque algorithm you at least know that the logic that generated your SERP (search engine results page) was applied consistently; with tags each content source has its own human-powered logic which likely bears limited resemblance to anyone else’s. Thus - people search by tag less, leading content producers to tag less, etc. - an unvirtuous cycle.

So - in a nutshell, tagging as a Web 2.0 concept is a mess: it’s fragmented and forgotten.

Let’s look at how it got there, and possibly how it can come back:

  1. Standardized means of tagging: Standards. Yes, they add value, as clunky and overbearing and non-west-coast-hipster-info-wants-to-be-free as the concept may sound. Consider: from a technical standpoint, how do you describe what a tag is? It’s meta-data, sure. But how is its content encoded? Is it one word only, or n-words? How do you separate multiple words? Are blog post categories “tags” as well, or are they strictly user-supplied? What about other meta-data that may be collected - like EXIF stuff from pictures? IMHO, a standard definition of what a tag actually is would have been a good first step to interoperability, laying out a baseline for connecting tags between sites. You can successfully implement tagging on a site without giving much thought to these questions - but you won’t be able to do much else with your tag data.
  2. Standardized means of communicating tags: Tags in RSS in a non-proprietary format - seems like a no-brainer to me. Tags on pages: the “rel=tag” concept never seemed to get consistently executed; let’s nail down a crawler-friendly spec, and complement it with a useful meta tag spec. Take a look at a Flickr photo page and hit “view source” - how are the tags identified as such? Not with a rel attribute. Take a look at del.icio.us - no rel attributes there either. And each blogging platform and plugin has its own, different way of going about it. No wonder Technorati gave up on tags - crawling this stuff is a nightmare.
  3. Standardized “complement” tags: Before you get your back up on “standardizing” tagging, I’m proposing a two-tiered tag system: Date, time, latitude, longitude, camera type, movie length, format, etc. are meta-data types that lend themselves to structure. A community-defined “standard” for tagging time, for instance, might simply list descriptors in different languages - “morning,” “evening,” “mid-day” - etc., which content platforms (WordPress, Flickr, etc.) could then add to posts in a consistent fashion. That would allow for searches along the lines of “media:pictures [taken in the] time:evening [around] location:Sydney+Harbour.”
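As a rough illustration of the two-tiered idea in point 3, the Python sketch below keeps structured "machine" tags (key/value pairs) separate from free-form human tags, and evaluates the example query against a tiny made-up catalog. The `Tagged` class, the `key:value` query syntax beyond what the example shows, and the treatment of bracketed words as ignorable filler are all assumptions for this sketch, not part of any existing spec.

```python
import re
from dataclasses import dataclass, field


@dataclass
class Tagged:
    title: str
    machine: dict                             # structured tier: key -> value
    human: set = field(default_factory=set)   # free-form tier


def parse_query(query: str):
    """Split a query into machine constraints (key:value) and human tags."""
    query = re.sub(r"\[[^\]]*\]", " ", query)  # drop bracketed filler words
    machine, human = {}, set()
    for token in query.split():
        key, sep, value = token.partition(":")
        if sep and value:
            machine[key.lower()] = value.replace("+", " ").lower()
        else:
            human.add(token.lower())
    return machine, human


def matches(item: Tagged, machine: dict, human: set) -> bool:
    """True if every machine constraint and every human tag is satisfied."""
    structured_ok = all(
        item.machine.get(k, "").lower() == v for k, v in machine.items()
    )
    return structured_ok and human <= {t.lower() for t in item.human}


# Invented sample content, as three platforms might publish it.
catalog = [
    Tagged("Bridge at dusk",
           {"media": "pictures", "time": "evening", "location": "Sydney Harbour"},
           {"bridge", "sunset"}),
    Tagged("Ferry commute",
           {"media": "pictures", "time": "morning", "location": "Sydney Harbour"},
           {"ferry"}),
    Tagged("Harbour timelapse",
           {"media": "video", "time": "evening", "location": "Sydney Harbour"}),
]

machine, human = parse_query(
    "media:pictures [taken in the] time:evening [around] location:Sydney+Harbour"
)
print([item.title for item in catalog if matches(item, machine, human)])
```

The structured tier is what makes the query answerable by exact match; the human tier stays as loose as folksonomies like it, and is only consulted when the query includes bare words.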

Ok - so let’s sum it all up: what I’m advocating isn’t really about tagging at all: it’s about meta-data interoperability, for which tagging is one convenient vector. Convenient because bits and pieces of the technology are in place already, and because there’s some familiarity with it already.

So - my suggestion is to bring tagging back by creating a body like the RSS Advisory Board, composed of individuals and organizational stakeholders, to kickstart a “Universal Tag Metadata Format” (UTMF). Publish an initial spec, get some buy in from the big players (Google, Flickr, etc.), and then try this again.

In the title for this post, I note that the failure of tagging is driving “Web 3.0”: if you think about it, the promise of the semantic web is more or less the same as that of tagging - information that knows what it is and its context. Much of the 3.0 development hinges on AI techniques to achieve what human tagging hasn’t - consistent, exhaustive classification of information so that it can be linked together. I’m of the mind that truly effective AI is still decades away, and that most “3.0” plays are just extensions of 1.0 algorithmic search engine technology.

The wait for AI leaves a significant gap - which a well-thought-out human tagging structure could fill.


Google: new unavailable_after meta tag

From funny blog Liam is Big comes news of a new meta tag that Google indexes and uses: the unavailable_after tag. This will apparently take stuff out of Google’s index after a certain date so that limited-time pages (like contests and promotions) won’t clutter up the tubes with 404s after they’re done.


When AdSense Fails

For all Google’s algorithmic awesomeness, the AdSense crawler still has the incredible ability to suck at keyword analysis. Take, for example, the awesomely popular Desktop Tower Defence game. Check out the AdSense placements:

Yes, that’s an ad for some type of antenna tower, because the page says “Tower” on it in a number of places.

Meanwhile, the perfectly serviceable meta keyword and content tags tell the real story:

<meta name="description" content="A flash version of Warcraft III TD">
<meta name="keywords" content="warcraft, flash, game">

So - Google is missing some killer targeted inventory, and HandDrawnGames is missing revenue. Is there no opportunity to create a better connection between content and ad placement here?

  1. Meta Tags: I understand that meta tags are easily abused and Google by-and-large disregards them. What about algorithmically assessing the credibility of meta tags on a site by site basis on the criteria of URL age, history, and traffic pattern?
  2. Webmaster Tools & AdSense: Why not let webmasters categorize their sites in Google’s Webmaster Tools, allowing superior placement? Again, a credibility algorithm could reduce the impact from link farms, etc.
  3. Tie into DMOZ: Ok, DMOZ is dead in the water. But perhaps it’s time to resurrect it, and make use of it as a categorization engine for AdSense. Crank up the community profile of DMOZ again, and surface its “category lookup” as a free API, of which AdSense would be the biggest but not only customer.
  4. Del.icio.us: Ok, the Yahoo ownership might make this sticky for Google, but Del.icio.us URL tag history would be a great way to categorize sites for AdSense inventory purposes. Sure del.icio.us can be gamed, but so can anything, and community self-policing tends to dampen gamed popularity spikes. Perhaps Yahoo should be using this as a source of competitive advantage in Panama?
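To show what the "credibility algorithm" in point 1 might look like in the smallest possible form, here is a toy Python sketch. It scores a site on the three criteria mentioned (URL age, history, traffic pattern) and uses that score to decide how much weight a page's meta keywords get against raw word counts from the body text. The function names, weights, and sample word counts are all invented; this is not how AdSense works, only one way the idea could be prototyped.

```python
from collections import Counter


def credibility(age_years: float, clean_history: bool, traffic_steady: bool) -> float:
    """Toy trust score in [0, 1]; the weights are arbitrary placeholders."""
    score = min(age_years, 5.0) / 5.0 * 0.5
    score += 0.3 if clean_history else 0.0
    score += 0.2 if traffic_steady else 0.0
    return score


def ad_keywords(page_words, meta_keywords, trust, top=3):
    """Rank keywords by body-text frequency plus a trust-weighted meta boost."""
    counts = Counter(w.lower() for w in page_words)
    total = sum(counts.values()) or 1
    scores = {w: c / total for w, c in counts.items()}
    for kw in meta_keywords:
        kw = kw.lower()
        scores[kw] = scores.get(kw, 0.0) + trust
    # Highest score first; ties broken alphabetically for stable output.
    ranked = sorted(scores.items(), key=lambda kv: (-kv[1], kv[0]))
    return [w for w, _ in ranked[:top]]


# Made-up body text in which "tower" dominates, as on the game page.
body = ["tower"] * 6 + ["defence"] * 3 + ["desktop"] * 2 + ["game"]
meta = ["warcraft", "flash", "game"]

trusted = credibility(age_years=4, clean_history=True, traffic_steady=True)
untrusted = credibility(age_years=0.1, clean_history=False, traffic_steady=False)

print(ad_keywords(body, meta, trusted))    # meta keywords win on a trusted site
print(ad_keywords(body, meta, untrusted))  # body text wins on an untrusted one
```

With a high trust score the ranking follows the meta keywords (game, flash, warcraft); with a low one it falls back to body-text frequency, which is where the antenna tower ad comes from.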

Are people at the search engines thinking of these sorts of things? I would have thought Google would be all over this, given that relevance was what made AdSense king in the first place.
