The Brad Blog has this story about how the White House might be attempting to "clean up" its site by removing some audio/video clips.
Some commenters suggested that it's a moot point since the Wayback Machine archives everything anyway, and others corrected them, saying that multimedia content is not archived.
More interesting for me was one of the comments which said:
«Even for HTML files (such as the list of Coalition members) it would be a simple matter for the White House to instruct the Wayback Machine to remove them from its archive using robots.txt, like it has already done for most Iraq-related documents.»
So, I was curious and got the robots.txt file [1].
$ wc -l whitehouse-gov-robots.txt
1972 whitehouse-gov-robots.txt
$ grep iraq whitehouse-gov-robots.txt | wc -l
835
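As a quick sanity check, the share of matching lines can be worked out from those two counts in the shell. This is only a sketch: `pct` is a made-up helper name, not part of the original session, and it assumes awk is installed.

```shell
# Hypothetical helper: print "matches" as a percentage of "total", to one decimal place.
pct() {
  awk -v m="$1" -v t="$2" 'BEGIN { printf "%.1f%%\n", 100 * m / t }'
}

# Counts taken from the grep and wc output above.
pct 835 1972   # prints 42.3%
```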
So, "iraq" was mentioned in more than 42% of the lines in the file. Here are some of the lines:
Disallow: /911/911day/iraq
Disallow: /911/progress/iraq
Disallow: /911/sept112002/iraq
Disallow: /deptofhomeland/analysis/iraq
Disallow: /deptofhomeland/iraq
OK. Maybe the White House doesn't want the public to know about the President's public utterances about Iraq and 9/11, which might come back to haunt him later.
Disallow: /firstlady/healthystart/iraq
Disallow: /firstlady/iraq
Disallow: /firstlady/whitehouselife/iraq
Disallow: /firstlady/recipes/iraq
Hmm. I guess the First Lady has some recipes for Iraqi people, but doesn't want Google to index them.
Disallow: /kids/barney/iraq
Disallow: /kids/pets/iraq
Disallow: /teeball/iraq
Disallow: /tee-ball/iraq
Right! We don't want anyone to know what the White House has said about kids and tee-ball in Iraq. That's highly sensitive material.
Disallow: /vote/iraq
Ah, the truth has come out. Finally!
Looks like somebody went overboard and added /iraq to every folder on the site.
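One rough way to test that guess is to strip the trailing /iraq from each rule and look at the directories that remain. The sketch below assumes the file was saved as whitehouse-gov-robots.txt, as in the commands earlier; `iraq_parents` is a hypothetical helper name, and the pattern only catches rules that end exactly in /iraq.

```shell
# Hypothetical helper: for every "Disallow: <dir>/iraq" rule in the file given
# as $1, print <dir> once. Rules that do not end in /iraq are ignored.
iraq_parents() {
  sed -n 's#^Disallow:[[:space:]]*\(.*\)/iraq[[:space:]]*$#\1#p' "$1" | sort -u
}

# Usage (assumes the file name used earlier in this post):
#   iraq_parents whitehouse-gov-robots.txt | wc -l
```

If the number of distinct parent directories comes out close to the number of directories on the site, that would fit the "somebody went overboard" theory. The file alone can't confirm it, though, since it doesn't say how many directories the site actually has.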