8 February 2005



For the past few months I've been randomly tweaking and subsequently corrupting my Web server. With all of this comes frequent browsing through the server logs, so I began cataloging all of the search engine bots that were passing through. You get pretty familiar with the first couple of octets of their IPs and quickly recognize sections in the logs that are all bot all the time.
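That kind of cataloging is easy to script. Below is a minimal sketch, assuming Apache's combined log format (the user agent is the last quoted field); the sample log lines and the bot/crawl/spider keyword heuristic are illustrative, not taken from my actual logs.

```python
import re
from collections import Counter

# Illustrative sample of Apache combined-format log lines; in practice
# you'd read these from your server's access log file.
SAMPLE_LOG = """\
66.249.64.1 - - [08/Feb/2005:16:01:02 -0500] "GET /archives/ HTTP/1.1" 200 5120 "-" "Googlebot/2.1 (+http://www.google.com/bot.html)"
207.46.98.5 - - [08/Feb/2005:16:01:10 -0500] "GET /index.html HTTP/1.1" 200 10240 "-" "msnbot/1.0 (+http://search.msn.com/msnbot.htm)"
192.168.1.20 - - [08/Feb/2005:16:02:00 -0500] "GET /index.html HTTP/1.1" 200 10240 "http://example.com/" "Mozilla/4.0 (compatible; MSIE 6.0)"
"""

# In the combined format, the last double-quoted field is the user agent.
UA_PATTERN = re.compile(r'"([^"]*)"\s*$')

def tally_bots(log_text):
    """Count requests whose user agent looks like a crawler."""
    counts = Counter()
    total = 0
    for line in log_text.splitlines():
        match = UA_PATTERN.search(line)
        if not match:
            continue
        total += 1
        ua = match.group(1)
        # Crude heuristic: most crawlers of the era identify themselves
        # with "bot", "crawl", or "spider" somewhere in the UA string.
        if any(word in ua.lower() for word in ("bot", "crawl", "spider")):
            # Use the token before the version slash as the bot's name.
            counts[ua.split("/")[0]] += 1
    return counts, total

if __name__ == "__main__":
    bots, total = tally_bots(SAMPLE_LOG)
    for name, n in bots.most_common():
        print(name, n)
    print("bot requests: %d of %d" % (sum(bots.values()), total))
```

Pointing it at a real access log is just a matter of swapping `SAMPLE_LOG` for the file's contents; the keyword list will miss stealthier crawlers, which is where the per-IP familiarity comes in.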

I began to realize that, I swear, 90% of traffic is bot traffic. And it's a lot of traffic. Is this the case with the rest of the untravelled world--excluding the Amazons and the Microsofts or the BoingBoings and Instapundits? I began to think that there's a major section of the Internets that exists merely to be indexed. Millions of pages being created then crawled, created then crawled, ad infinitum. And there are so many search engine bots that I'd never heard of: Convera, Become, Kinja. ?!?

But now I understand the difference in weight of each request. Hundreds of search engine hits are required before one person can find that single page where you describe, say, how to configure Tomcat on IIS. The crawling needed for a single page to be found and requested dwarfs that page itself in sheer volume. There's no equivalence: bots grab syntax, people grab semantics. The Internet is not devouring itself or existing merely to continue existing (although I'm still very tempted to go the social construction [Wikipedia] route here: would I need to search for instructions on setting up Tomcat on IIS if I didn't have a Web site that required that information? And without Web sites, we wouldn't need Web crawlers to index the pages that tell us how to set up Web sites.).


[ posted by sstrader on 8 February 2005 at 4:23:14 PM in Culture & Society ]