Web crawler

A web crawler, also known as a search engine robot, spider, Internet bot, or just crawler for short, is a software application that systematically browses and scans the Internet for the purpose of indexing pages. In general, a web crawler works by reading a page and identifying the hyperlinks on it, then following each of those hyperlinks and repeating the process recursively on every page it reaches.
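The link-following loop described above can be sketched in a few lines of Python. This is only a minimal illustration, not how any particular search engine works: the names LinkExtractor, crawl, and SITE are hypothetical, the "site" is a small in-memory dictionary so the example runs offline, and the recursion is written as a breadth-first queue with a set of already-seen URLs so that pages linking back to each other are not visited twice.

```python
from collections import deque
from html.parser import HTMLParser
from urllib.parse import urldefrag, urljoin


class LinkExtractor(HTMLParser):
    """Collects the href value of every <a> tag in a page."""

    def __init__(self):
        super().__init__()
        self.links = []

    def handle_starttag(self, tag, attrs):
        if tag == "a":
            for name, value in attrs:
                if name == "href" and value:
                    self.links.append(value)


def crawl(seed, fetch, max_pages=100):
    """Breadth-first crawl starting at seed.

    fetch(url) is assumed to return the page's HTML, or None if the
    page cannot be retrieved. Returns a dict of url -> HTML.
    """
    seen = {seed}           # URLs already queued, to avoid loops
    queue = deque([seed])   # URLs waiting to be visited
    pages = {}
    while queue and len(pages) < max_pages:
        url = queue.popleft()
        html = fetch(url)
        if html is None:
            continue
        pages[url] = html
        parser = LinkExtractor()
        parser.feed(html)
        for href in parser.links:
            # Resolve relative links and drop any #fragment part.
            absolute, _fragment = urldefrag(urljoin(url, href))
            if absolute not in seen:
                seen.add(absolute)
                queue.append(absolute)
    return pages


# Hypothetical three-page site held in memory (assumption for the demo).
SITE = {
    "http://example.test/": '<a href="/a">A</a> <a href="/b">B</a>',
    "http://example.test/a": '<a href="/">home</a> <a href="/b#top">B</a>',
    "http://example.test/b": "<p>No links here.</p>",
}

pages = crawl("http://example.test/", SITE.get)
print(sorted(pages))
```

Passing the fetch function in as a parameter keeps the sketch independent of any network library. A real crawler would replace SITE.get with an HTTP client and would also need politeness delays and error handling, which are left out here.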

Internet search engines and a few other types of sites use crawlers to refresh their index of other websites' content. Web crawlers record and store the pages they visit (or crawl), which are then processed by a search engine. The engine indexes the content, allowing for the fast and efficient searching that today's Internet users have become accustomed to.
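The text above does not say how a search engine builds its index. One common approach is an inverted index, which maps each word to the set of pages that contain it. The sketch below assumes that approach purely for illustration; build_index, search, and the sample DOCS data are hypothetical names, and real engines add ranking, stemming, and much more.

```python
import re
from collections import defaultdict


def tokenize(text):
    """Split text into lowercase alphanumeric words."""
    return re.findall(r"[a-z0-9]+", text.lower())


def build_index(pages):
    """Map each word to the set of URLs whose text contains it."""
    index = defaultdict(set)
    for url, text in pages.items():
        for word in tokenize(text):
            index[word].add(url)
    return index


def search(index, query):
    """Return the URLs that contain every word in the query."""
    words = tokenize(query)
    if not words:
        return set()
    results = set(index.get(words[0], set()))
    for word in words[1:]:
        results &= index.get(word, set())
    return results


# Hypothetical crawled pages (plain text, assumed already extracted).
DOCS = {
    "http://example.test/a": "Web crawlers browse the web",
    "http://example.test/b": "Search engines index pages",
}

index = build_index(DOCS)
print(search(index, "web crawlers"))
```

Because the index is built once ahead of time, answering a query only requires looking up a few words and intersecting sets, instead of rescanning every stored page.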

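Websites can include specific instructions that tell crawlers they do not want to be crawled or indexed by search engines, for example a robots.txt file at the root of the site or a robots meta tag with the value "noindex" in a page's HTML. These instructions are requests that well-behaved crawlers honor, not an access control mechanism. This technique of staying invisible to web crawlers is often used for sites and applications that are still in development.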
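One common form of such an instruction is a robots.txt file. As a rough sketch of how a crawler might respect it, the example below uses Python's standard urllib.robotparser module. The rules, the ExampleBot user agent, and the example.test URLs are made up for illustration, and the file contents are parsed from a string so the example runs without network access.

```python
from urllib.robotparser import RobotFileParser

# Hypothetical robots.txt for a site whose /staging/ area is still
# in development and should not be crawled.
ROBOTS_TXT = """\
User-agent: *
Disallow: /staging/
"""

parser = RobotFileParser()
parser.parse(ROBOTS_TXT.splitlines())

# A polite crawler checks each URL before fetching it.
blocked = parser.can_fetch("ExampleBot", "http://example.test/staging/new.html")
allowed = parser.can_fetch("ExampleBot", "http://example.test/index.html")

print(blocked)  # False
print(allowed)  # True
```

A crawler working against a live site would typically call set_url() and read() on the parser to download the site's actual robots.txt instead of parsing a string.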