Prevent Robots from accessing an action

11 September, 2006


If, like on this site, you have some demo applications with some data in them, it's wise to try to ensure that the test data you spent an hour or two creating doesn't get deleted.

For the demos that were on this site (they'll be coming back after some tweaking), I set up a script that runs once per hour to truncate and reimport the test data. That ensures that malicious data, should there be any, is removed and that there is always some data to play with.
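As a rough illustration of the truncate-and-reimport idea, here is a minimal Python sketch. It is not the script used on this site: the `posts` table, the `FIXTURES` rows and the `reset_demo_data` name are all made up for the example, and SQLite stands in for whatever database the demos actually use.

```python
import sqlite3

# Hypothetical fixture rows; the real demo data is not shown in this post.
FIXTURES = [
    (1, "First demo post"),
    (2, "Second demo post"),
]


def reset_demo_data(conn: sqlite3.Connection) -> int:
    """Empty the (assumed) posts table, reload the fixtures, return the row count."""
    with conn:  # commits on success, rolls back if anything raises
        conn.execute(
            "CREATE TABLE IF NOT EXISTS posts (id INTEGER PRIMARY KEY, title TEXT)"
        )
        conn.execute("DELETE FROM posts")  # SQLite has no TRUNCATE statement
        conn.executemany("INSERT INTO posts (id, title) VALUES (?, ?)", FIXTURES)
    return conn.execute("SELECT COUNT(*) FROM posts").fetchone()[0]
```

Something like this could then be run from an hourly cron entry (`0 * * * *`), so that whatever a visitor or crawler deletes is back within the hour.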

More than likely, most users who visit a site with a demonstration aren't going to click a link marked "delete" unless they want to check that it works. There is, however, one group of site users who will click all of your delete links, leaving subsequent visitors with no data to play with.

Who is this group? It should be obvious: they are site crawlers.

There's an easy way to stop an innocent robot from deleting all your data: use your robots.txt file to define where the robot can and cannot go, like this:

User-agent: *    # applies to all robots
Disallow: /*/delete/    # applies to any URL that contains the 'folder' delete

Google will certainly honor such an entry. I'm not sure about other crawlers, but I'll check the logs to see ;)
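To get a feel for which URLs a rule such as `/*/delete/` would cover, the wildcard matching can be sketched in a few lines of Python. This is only an approximation of how wildcard-aware crawlers read such patterns (`*` matches any run of characters, a trailing `$` anchors the end of the URL, and rules match from the start of the path). The function names are made up for the example, and no particular crawler's implementation is implied.

```python
import re


def robots_pattern_to_regex(pattern: str) -> re.Pattern:
    """Translate a robots.txt path pattern into a compiled regex.

    '*' matches any run of characters and a trailing '$' anchors the end
    of the URL; everything else is treated literally.
    """
    anchored = pattern.endswith("$")
    if anchored:
        pattern = pattern[:-1]
    body = ".*".join(re.escape(part) for part in pattern.split("*"))
    return re.compile(body + ("$" if anchored else ""))


def is_disallowed(path: str, pattern: str = "/*/delete/") -> bool:
    """Return True if the URL path is covered by the Disallow pattern."""
    # re.match anchors at the start of the string, like a robots.txt rule.
    return robots_pattern_to_regex(pattern).match(path) is not None


for path in ("/posts/delete/5", "/posts/view/5", "/delete/5"):
    print(path, is_disallowed(path))
```

Under these assumptions, `/posts/delete/5` is blocked while `/posts/view/5` is not. A top-level `/delete/5` is not blocked either, because the pattern expects something between the first slash and `/delete/`, so a site with delete URLs at the root would need an extra `Disallow: /delete/` line.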