Friday, October 29, 2004

Firefox accounts for 19% of ZDNet users!

In the article "Firefox aims for 10 percent of Web surfers", ZDNet says:

«ZDNet UK's own figures show that since the beginning of this year, there has been an increase in the percentage of site visitors using a Mozilla browser. In February, about 9 percent of site visitors were using a Mozilla-based browser; this increased to 19 percent in October. Over the same period, IE use decreased from 88 percent to 79 percent.»

Go Firefox!

Tuesday, October 26, 2004

Picasa and Hello

Two cool products from Google: Picasa is a digital photo organizing tool, and Hello is an IM client for sharing photos with chat buddies. Too bad they are Windows-only.

Zero to (free) shopping cart in sixty minutes

Richard Soderberg demonstrates how a few disparate (and free) web services can be integrated into an e-commerce site. Very cool!

A testament to the power of the Web.

Bush and Kerry are related?

Kerry and Bush are related, going back several generations (to the 16th century), according to this ancestry.com genealogical chart.

Friday, October 22, 2004

Whitehouse and robots.txt

The Brad Blog has this story about how the White House might be attempting to "clean up" its site by removing some audio/video clips. Some commenters suggested that it's a moot point since the Wayback Machine archives everything anyway, and others corrected them, saying that multimedia content is not archived.

More interesting for me was one of the comments which said:

«Even for HTML files (such as the list of Coalition members) it would be a simple matter for the White House to instruct the Wayback Machine to remove them from its archive using robots.txt, like it has already done for most Iraq-related documents.»
So, I was curious and got the robots.txt file [1].
$ wc -l whitehouse-gov-robots.txt
1972 whitehouse-gov-robots.txt

$ grep iraq whitehouse-gov-robots.txt | wc -l
835
So, "iraq" was mentioned in more than 42% of the lines in the file. Here are some of the lines:
Disallow:       /911/911day/iraq
Disallow:       /911/progress/iraq
Disallow:       /911/sept112002/iraq
Disallow:       /deptofhomeland/analysis/iraq
Disallow:       /deptofhomeland/iraq
OK. Maybe the White House doesn't want the public to find the President's public statements about Iraq and 9/11, which might come back to haunt him later.
Disallow:       /firstlady/healthystart/iraq
Disallow:       /firstlady/iraq
Disallow:       /firstlady/whitehouselife/iraq
Disallow:       /firstlady/recipes/iraq
Hmm. I guess the First Lady has some recipes for Iraqi people, but doesn't want Google to index them.
Disallow:       /kids/barney/iraq
Disallow:       /kids/pets/iraq
Disallow:       /teeball/iraq
Disallow:       /tee-ball/iraq
Right! We don't want anyone to know what the White House has said about kids and tee-ball in Iraq. That's highly sensitive material.
Disallow:       /vote/iraq
Ah, the truth has come out. Finally!

Looks like somebody went overboard and added /iraq to every folder on the site.
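
For anyone who wants to check the 42% figure, here is a minimal Python sketch of the same count that wc and grep did above. The function name share_of_matching_lines is made up for illustration; the two totals (1972 and 835) are the ones reported in the post. Like plain grep, the match is case-sensitive.

```python
def share_of_matching_lines(lines, needle="iraq"):
    """Return (matching, total, percent) for lines that contain `needle`.

    Hypothetical helper for illustration: it mirrors `grep needle file | wc -l`
    and `wc -l file`, using a case-sensitive substring match.
    """
    lines = list(lines)
    total = len(lines)
    matching = sum(1 for line in lines if needle in line)
    percent = 100.0 * matching / total if total else 0.0
    return matching, total, percent


# Totals reported by wc and grep in the post.
total_lines = 1972
iraq_lines = 835
print(f"{100.0 * iraq_lines / total_lines:.1f}%")  # prints 42.3%
```

To run it against a saved copy of the file, pass the open file object: share_of_matching_lines(open("whitehouse-gov-robots.txt")).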

[1] For those who might not know what a robots.txt file is, it's a file that website administrators maintain to "suggest" that web-crawling robots (like Google's) not scan the parts of their site which they don't want indexed. A well-behaved web crawler is supposed to heed the suggestions.
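
As a rough illustration of how a well-behaved crawler heeds those suggestions, here is a small sketch using Python's standard urllib.robotparser module. The rules are a handful of the Disallow lines quoted above; the "User-agent: *" line, the allowed helper, and the "ExampleBot" agent name are assumptions added for the example, since the post does not show that part of the file.

```python
from urllib.robotparser import RobotFileParser

# A few Disallow rules copied from the listing in the post. The
# "User-agent: *" line is an assumption; the post does not quote it.
SAMPLE_ROBOTS_TXT = """\
User-agent: *
Disallow: /911/911day/iraq
Disallow: /firstlady/recipes/iraq
Disallow: /vote/iraq
"""


def allowed(path, robots_txt=SAMPLE_ROBOTS_TXT, agent="ExampleBot"):
    """Return True if `agent` may fetch `path` under the given rules.

    Hypothetical helper: "ExampleBot" is a made-up crawler name.
    """
    parser = RobotFileParser()
    parser.parse(robots_txt.splitlines())
    return parser.can_fetch(agent, path)


print(allowed("/vote/iraq"))  # False: matches a Disallow rule
print(allowed("/vote/"))      # True: no rule covers this path
```

Nothing forces a crawler to run a check like this before fetching a page, which is why the file only "suggests".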