Showing posts with label www. Show all posts

Tuesday, October 02, 2007

SearchLight

I mucked around with Custom Search Engines today, and created SearchLight, a mini search-engine restricted to retrieving results from websites about off-camera photographic lighting.

Give it a whirl, and let me know if there are sites you'd like me to add.

Monday, May 07, 2007

Methinks...

...we should rechristen it SueTube. Don't you?

Monday, February 19, 2007

New Blog Home

Assuming DNS has updated, you should now notice that this humble blog is at http://blog.randomprocesses.net/. It's still hosted on Google servers (so I don't pay extra for bandwidth), but by mucking about with DNS and adding a CNAME alias from the hostname "blog" to ghs.google.com, I get it all working quite seamlessly.

Monday, October 09, 2006

Foot in blog syndrome

I think it's funny as hell that the post on how much we value security is followed by the one explaining that our blog was hacked into.

I love my company :)

Thursday, October 05, 2006

Upgrade

I switched to the new "Blogger Beta" this weekend. The downside was that I lost some of the customization that I'd done to the old template, but I like the new interface much more than I did the old one. And we finally have labels! Oh joy! I had tried the roundabout del.icio.us route before, but that was simply too cumbersome to last more than about 2 posts.

Mmm... even has that new blog smell...

Thursday, August 31, 2006

New Home

The old lab server went titsup about 3 weeks ago, and I've been without a web home while the hard drive gets some recovery work done. So I finally bit the bullet and picked up a domain I've been eyeing for a while.

Welcome to my new home. I even did the decor myself. It's somewhat sparse on content, but that'll get fleshed out whenever I get another 3.8 minutes to spare.

Sunday, April 30, 2006

"Manual" is the new "algorithmic"

A recent conversation with a friend about the whole Web 2.0 madness got him to flatter me into pimping my opinion on the subject to the blog. The title of the post was (of course) inspired by a conversation with the other wife.

In the beginning, there were directories (think Yahoo! and the ODP). These were manually populated by some trusted community of people, who made sure that the links pointed to relevant content. Eventually the amount of content on the web grew way beyond the capability of manual discovery, and fairly complicated algorithms crawled and sifted through the mounds of crud to find the data relevant to most queries (think Google).

Once it became obvious that search was an incredibly powerful driving force for web commerce, it wasn't long before an entire community of black-hat search engine optimizers (SEOs) popped up to manipulate the rankings to their advantage. After all, "There was GOLD in them thar SERPs!" (Search Engine Result Pages). Most search engines, of course, have groups of people dedicated to making sure the ranking algorithms are wise to the black-hats' tricks.

Fast-forward to 2003(ish), and the pendulum swings back to manual labor. Del.icio.us and Flickr introduce this novel concept: Let users "tag" content (URLs and images respectively) with words representative of the content (like "family", "poodle", "jazz"). This works great. Free labelled data! Naively, one could use this as a direct relevance statement. An object tagged with the term "jazz" must obviously be a valid search result for the query "jazz", right? Quite so, but the real power of this turns up if you can generalize the labelling to unlabelled content on the web. That's exactly what machine learning algorithms do. If Yahoo!'s smart, their boffins are using their acquisitions of del.icio.us and Flickr to do exactly that.
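To make those two uses concrete, here's a rough Python sketch. Everything in it is made up for illustration: the URLs, the page text, the function names and the 0.2 threshold are my assumptions, and the "learning" is just a nearest-neighbour lookup on word overlap, nowhere near what a real search engine would run. The first function treats tags as a direct relevance statement; the second borrows tags from the most similar tagged page to label an untagged one.

```python
from collections import defaultdict

# Hypothetical user-tagged items: URL -> (page text, tags supplied by users).
tagged = {
    "example.com/coltrane": ("saxophone quartet live recording bebop", {"jazz"}),
    "example.com/miles": ("trumpet quintet studio recording modal", {"jazz"}),
    "example.com/poodle": ("puppy grooming clipper coat show", {"poodle"}),
}


def build_tag_index(items):
    """Naive relevance: a query term matches every URL tagged with it."""
    index = defaultdict(set)
    for url, (_text, tags) in items.items():
        for tag in tags:
            index[tag].add(url)
    return index


def jaccard(a, b):
    """Word-set overlap between two pages, from 0.0 to 1.0."""
    union = a | b
    return len(a & b) / len(union) if union else 0.0


def predict_tags(text, items, threshold=0.2):
    """Generalize: copy the tags of the most similar tagged page.

    Returns an empty set when nothing is similar enough (assumed threshold).
    """
    words = set(text.split())
    best_url, best_score = None, 0.0
    for url, (other_text, _tags) in items.items():
        score = jaccard(words, set(other_text.split()))
        if score > best_score:
            best_url, best_score = url, score
    if best_url is None or best_score < threshold:
        return set()
    return set(items[best_url][1])


index = build_tag_index(tagged)
print(sorted(index["jazz"]))  # direct lookup by tag
print(predict_tags("live saxophone trio recording", tagged))  # inferred tags
```

The untagged "saxophone trio" page shares most of its words with the Coltrane page, so it inherits the "jazz" tag without anyone having typed it in.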

From a naive point of view, it would appear that we're done. We've solved the relevance problem if the users themselves tell us what's relevant. Right?

Not quite.

Keep in mind that the only reason index spam wasn't a problem with algorithmic search from 1998 to about 2002 was that it didn't (yet) drive commerce. Once Yahoo! really does start using that label data, and the black-hats catch on that tagging is being used to influence search results, what's going to stop an SEO from tagging affiliate pages for online casinos with "cooking"? Pretty much nothing. At that point the value of the labelled data is zilch. We'll have to resort to natural language techniques for summarization to automatically generate tags. Guess what? That's back to algorithmic information retrieval again.
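As a back-of-the-envelope illustration of generating tags from the content itself, here's a small Python sketch. Counting words is a crude stand-in for real summarization techniques, and the stopword list, the sample page and the function names are all invented for the example. It derives tags from the page text and flags any user-supplied tag that the content doesn't back up.

```python
import re
from collections import Counter

# Tiny illustrative stopword list; a real system would use a much larger one.
STOPWORDS = {"the", "and", "for", "with", "your", "our", "from", "that"}


def generate_tags(text, k=3):
    """Pick the k most frequent non-stopword terms as automatic tags."""
    words = re.findall(r"[a-z]+", text.lower())
    counts = Counter(w for w in words if w not in STOPWORDS and len(w) > 2)
    return [word for word, _count in counts.most_common(k)]


def suspicious_tags(text, user_tags, k=5):
    """Return user tags that do not appear among the content-derived terms."""
    content_terms = set(generate_tags(text, k))
    return {tag for tag in user_tags if tag not in content_terms}


# Hypothetical affiliate page that a spammer has tagged "cooking".
page = "Casino bonus! Play casino slots and poker. Best casino bonus for poker and slots."
print(generate_tags(page))
print(suspicious_tags(page, {"cooking", "casino"}))
```

The "cooking" tag gets flagged because nothing on the page supports it, which is the algorithm second-guessing the humans all over again.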

So that's my $0.02. We're in a temporarily happy phase where "manual is the new algorithmic" (smile Coe). In a couple of years' time we'll be back to where we started. Enjoy it while it lasts.
