Our crawler (SpiderMonkey) visits URLs during server off-peak load times and feeds the results to the cluster's index. The main database is refreshed frequently. The temporary database is minimally crawled on a schedule.
SpiderMonkey is a Canadian Web search project first coded in 1993. It abides by the original 1994 Robots Exclusion Standard; where the non-ratified 1996 draft superseded the 1994 standard, the proposed standard is followed instead. The /robots.txt file is a de facto standard not owned by any standards body, but its generally accepted usage is explained by the tools linked from this page. SpiderMonkey will obey the first record in the robots.txt file with a User-Agent containing "SpiderMonkey". If there is no such record,
it will obey the first record with a User-Agent of "*", the wildcard that applies to all crawlers.
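This record-matching behaviour can be sketched with Python's standard `urllib.robotparser` module, which implements the same fallback logic: a crawler obeys the record naming it, and otherwise falls back to the "*" record. The robots.txt content, paths, and URLs below are invented for illustration.

```python
from urllib.robotparser import RobotFileParser

# Hypothetical robots.txt with a SpiderMonkey-specific record and a wildcard record.
ROBOTS_TXT = """\
User-agent: SpiderMonkey
Disallow: /private/

User-agent: *
Disallow: /tmp/
"""

rp = RobotFileParser()
rp.parse(ROBOTS_TXT.splitlines())

# SpiderMonkey matches its own record, so only /private/ is off limits to it.
print(rp.can_fetch("SpiderMonkey", "http://example.com/private/page.html"))  # False
print(rp.can_fetch("SpiderMonkey", "http://example.com/tmp/page.html"))      # True

# Any other crawler has no record of its own and falls back to "*".
print(rp.can_fetch("OtherBot", "http://example.com/tmp/page.html"))          # False
```

Note that the SpiderMonkey-specific record completely replaces the wildcard record for that crawler; the rules are not merged.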