TechFold - Bold tech & web commentary
TechFold is technology discussion, commentary, reviews, and opinions from well outside the valley. There's no koolaid to drink here, and TechFold is not in SL, or on Twitter.
The Failure of Tagging is what’s driving Web 3.0
Tagging is the dream that everyone more or less forgot about.
Tags - little topical keyword snippets - were supposed to herald the start of a new, exciting web of user-classified content. By appending a few simple keywords to your blogposts/photos/videos/user-generated-content (ugc), the web was expected to “self-organize” into an organic, emergent user-powered taxonomy that would provide the framework for parsing out “meaning.” It was going to make it easy to find deep information on any topic - linking up the long tail of all content types into a parseable, coherent, human-organized whole. Effectively, tagging was intended to provide a distributed, free, human-powered search index - one publicly-owned, free super-index to rule them all.
Unfortunately, it didn’t happen. The practicalities of execution were pushed to the wayside in the rush to embrace the decentralized nature of ugc under the guise of woodsy, campfire “folksonomies.” A point of balance between “free” and “structured” was never identified, and “Tagging” was never executed in a way that would give it a fighting chance to live up to its promise; a hodge-podge of formats, options, and implementations guaranteed that the concept’s deployments would remain silo’d and keep conceptually linked material fragmented. The glow from each of the folksonomy campfires didn’t extend beyond its own URL.
As a result, what we’ve come to accept as “tagging” is the silo’d view. Flickr and del.icio.us are the two brightest burning fires here - both have deep tag taxonomies that have been successfully integrated into the core of their respective user experiences, but neither is designed to extend outwards from their individual campground silos.
And therein lies the core of the problem: a standardized means of sharing tags was never devised, and as much of an anathema as it might be to the “open” nature of Web 2.0, a clearing-house mechanism to reconcile divergent taxonomies never arose. Technorati tried and failed to provide this: remember the days when “tag search” was a prominent Technorati home page option? It’s not anymore. Do you see “Tag Search” on Google Blog Search? No - because Google’s algorithm does a better job of stitching together meaning than user-supplied tags.
The fundamental broken-ness of tagging has been compounded over time by users’ recognition that outside of those certain successful silos, tagging really has no point - consider the example of tag-based classified search engine Edgeio, which just went belly up. Or “geotagging” as a concept outside of Flickr. At the core of the dismissal is the user’s desire for consistency: with a Google-esque algorithm you at least know that the logic that generated your SERP (search engine results page) was applied consistently; with tags, each content source has its own human-powered logic, which likely bears limited resemblance to anyone else’s. Thus - people search by tag less, leading content producers to tag less, etc. - an unvirtuous cycle.
So - in a nutshell, tagging as a Web 2.0 concept is a mess: it’s fragmented and forgotten.
Let’s look at how it got there, and possibly how it can come back:
- Standardized means of tagging: Standards. Yes, they add value, as clunky and overbearing and non-west-coast-hipster-info-wants-to-be-free as the concept may sound. Consider: from a technical standpoint, how do you describe what a tag is? It’s meta-data, sure. But how is its content encoded? Is it one word only, or n-words? How do you separate multiple words? Are blog post categories “tags” as well, or are tags strictly user-supplied? What about other meta-data that may be collected - like EXIF stuff from pictures? IMHO, a standard definition of what a tag actually is would have been a good first step to interoperability, laying out a baseline for connecting tags between sites. You can successfully implement tagging on a site without giving much thought to these questions - but you won’t be able to do much else with your tag data.
- Standardized means of communicating tags: Tags in RSS in a non-proprietary format - seems like a no brainer to me. Tags on pages: the “rel=tag” concept never seemed to get consistently executed; let’s nail down a crawler-friendly spec, and complement it with a useful meta tag spec. Take a look at a Flickr photo page and hit “view source” - how are the tags identified as such? Not with a rel attribute. Take a look at del.icio.us - no rel attributes there either. And each blogging platform and plugin has its own, different way of going about it. No wonder Technorati gave up on tags - crawling this stuff is a nightmare.
- Standardized “complement” tags: Before you get your back up on “standardizing” tagging, I’m proposing a two-tiered tag system: free-form tags stay as they are, and a second, structured tier sits alongside them. Date, time, latitude, longitude, camera type, movie length, format, etc. are meta-data types that lend themselves to structure. A community defined “standard” for tagging time, for instance, might simply list descriptors in different languages - “morning,” “evening,” “mid-day” - etc., which content platforms (WordPress, Flickr, etc.) could then add to posts in a consistent fashion. That would allow for searches along the lines of “media:pictures [taken in the] time:evening [around] location:Sydney+Harbour.”
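To make the two-tiered idea a bit more concrete, here is a minimal Python sketch of how a search like the one above could be resolved. Everything in it is an assumption made for illustration: the key names (media, time, location), the Item class, and the rule that “+” stands in for a space are hypothetical, not part of any existing standard.

```python
from dataclasses import dataclass, field

# Hypothetical structured keys; a real standard would define its own list.
STRUCTURED_KEYS = {"media", "time", "location"}


@dataclass
class Item:
    title: str
    structured: dict = field(default_factory=dict)  # tier 1: agreed-upon keys
    free_tags: set = field(default_factory=set)     # tier 2: free-form user tags


def normalize(value):
    """Lower-case the value and treat '+' and runs of spaces as one separator."""
    return " ".join(value.replace("+", " ").lower().split())


def parse_query(query):
    """Split a query into key:value constraints and free-form tags.

    Bracketed filler such as "[taken in the]" is skipped.
    """
    structured, free = {}, set()
    in_brackets = False
    for token in query.split():
        if token.startswith("["):
            in_brackets = True
        if in_brackets:
            if token.endswith("]"):
                in_brackets = False
            continue
        key, sep, value = token.partition(":")
        if sep and key.lower() in STRUCTURED_KEYS:
            structured[key.lower()] = normalize(value)
        else:
            free.add(normalize(token))
    return structured, free


def search(query, items):
    """Return titles of items that satisfy every constraint in the query."""
    structured, free = parse_query(query)
    results = []
    for item in items:
        keys_ok = all(
            normalize(item.structured.get(key, "")) == value
            for key, value in structured.items()
        )
        tags_ok = free <= {normalize(tag) for tag in item.free_tags}
        if keys_ok and tags_ok:
            results.append(item.title)
    return results


ITEMS = [
    Item("Opera House at dusk",
         {"media": "pictures", "time": "evening", "location": "Sydney Harbour"},
         {"sunset"}),
    Item("Harbour Bridge at dawn",
         {"media": "pictures", "time": "morning", "location": "Sydney Harbour"}),
]

QUERY = "media:pictures [taken in the] time:evening [around] location:Sydney+Harbour"
print(search(QUERY, ITEMS))
```

The structured tier only works because both sites and searchers agree on the keys and their allowed values; the free-form tier is left alone, which is the balance between “free” and “structured” argued for above.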
Ok - so let’s sum it all up: what I’m advocating isn’t really about tagging at all: it’s about meta-data interoperability, for which tagging is one convenient vector. Convenient because bits and pieces of the technology are in place already, and because there’s some familiarity with it already.
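One of those pieces already in place is the “rel=tag” link convention mentioned earlier. As a rough sketch of what a crawler would do with it, the Python below pulls tags out of a page using only the standard library. Under the rel-tag microformat the tag is the last path segment of the link’s href rather than the link text; the example.com URLs are made up, and decoding “+” as a space is an assumption of this sketch.

```python
from html.parser import HTMLParser
from urllib.parse import unquote, urlparse


class RelTagParser(HTMLParser):
    """Collects tags from <a rel="tag" href="..."> links."""

    def __init__(self):
        super().__init__()
        self.tags = []

    def handle_starttag(self, tag, attrs):
        if tag != "a":
            return
        attr = dict(attrs)
        # rel may hold several space-separated values, e.g. "tag nofollow".
        rel_values = (attr.get("rel") or "").lower().split()
        href = attr.get("href")
        if "tag" in rel_values and href:
            path = urlparse(href).path.rstrip("/")
            last_segment = path.rsplit("/", 1)[-1]
            if last_segment:
                # Assumption: '+' encodes a space, as '%20' does.
                self.tags.append(unquote(last_segment.replace("+", " ")))


def extract_rel_tags(html):
    parser = RelTagParser()
    parser.feed(html)
    return parser.tags


SAMPLE_PAGE = """
<a href="http://example.com/tags/sydney+harbour" rel="tag">Sydney Harbour</a>
<a href="http://example.com/tags/evening/" rel="tag nofollow">evening</a>
<a href="http://example.com/about">About</a>
"""

print(extract_rel_tags(SAMPLE_PAGE))
```

A crawler this simple only works on pages that mark their tags up the same way, which is exactly the consistency that is missing today.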
So - my suggestion is to bring tagging back by creating a body like the RSS Advisory Board, composed of individuals and organizational stakeholders, to kickstart a “Universal Tag Metadata Format” (UTMF). Publish an initial spec, get some buy in from the big players (Google, Flickr, etc.), and then try this again.
In the title for this post, I note that the failure of tagging is driving “Web 3.0”: if you think about it, the promise of the semantic web is more or less the same as that of tagging - information that knows what it is and its context. Much of the 3.0 development hinges on AI techniques to achieve what human tagging hasn’t - consistent, exhaustive classification of information so that it can be linked together. I’m of the mind that truly effective AI is still decades away, and that most “3.0” plays are just extensions of 1.0 algorithmic search engine technology.
The wait for AI leaves a significant gap - which a well thought-out human tagging structure could fill.
semantics, tagging, tags, web3.0
I’ve been thinking about this issue for a while, too. I like your thoughts but would like to know more, notably about your concluding paragraphs.
Jon, to expand a little:
The “semantic web” implies a future in which a page about penguins can communicate that it’s about “penguins,” “animals,” “flightless birds,” and “cold climate fauna” - and not about NHL hockey teams, or Linux distributions.
That is to say, meta data provides context for data, allowing it to be parsed intelligently compared to straight keyword analysis.
So - the big question is where that metadata comes from. In the past, Google has been successful by looking to link-text for meta data - i.e.: If I link to our hypothetical penguin page with “penguins: our flightless, arctic friends,” Google can increase the probability that the page is about “penguins: flightless birds, arctic” compared to “penguins: nhl hockey.” By analyzing thousands of links, Google can often craft a pretty good picture.
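As a toy illustration of the link-text idea (and emphatically not Google’s actual algorithm), the Python sketch below counts the words used in links pointing at a page and treats the most frequent ones as a rough guess at its topic. The URL and the link texts are invented for the example.

```python
import re
from collections import Counter, defaultdict


def anchor_text_profile(links):
    """Build a word count per target page from (target_url, link_text) pairs."""
    profile = defaultdict(Counter)
    for target, text in links:
        profile[target].update(re.findall(r"[a-z0-9]+", text.lower()))
    return profile


PENGUIN_PAGE = "http://example.com/penguins"  # hypothetical page

LINKS = [
    (PENGUIN_PAGE, "penguins: our flightless, arctic friends"),
    (PENGUIN_PAGE, "flightless birds"),
    (PENGUIN_PAGE, "Penguins and other cold climate fauna"),
]

profile = anchor_text_profile(LINKS)
print(profile[PENGUIN_PAGE].most_common(2))
```

With enough inbound links, “penguins” and “flightless” rise to the top while hockey never appears, which is the kind of context a bare keyword match on the page itself would miss.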
Obviously though, algorithmic analysis has holes and is vulnerable to fraud, and generally works at a high level, whereas a truly “semantic web” should be able to parse out deep, granular meaning about a page’s contents, and do so correctly, with a modicum of human intuition.
My thought is that a true semantic web will depend on artificially intelligent crawlers that are still many years away. Tagging will not get us to the same place as an automated AI crawler, but can perform a bridging function between now and AI’s availability - if the key players can agree upon a rational means of collecting and sharing tag-based meta data.
[…] few days ago, I riffed on how the failure of user-powered tagging was what was driving the need for a semantic web - that jumbled, discontiguous tagging implementations had created a plethora of tag city-states […]