Robot tags

Robot tags (also called robots meta tags) are HTML meta tags, or the equivalent X-Robots-Tag HTTP header, that tell search engines how to handle a specific page on a website. They tell web crawlers whether to index the page and whether to follow its links.

Common directives in robot tags include:

noindex: Tells search engines not to index the page.
nofollow: Prevents crawlers from following links on the page.
noarchive: Stops search engines from showing a cached copy of the page in search results.
nosnippet: Prevents search engines from showing snippets or descriptions of the page in search results.
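
To make the directives above concrete, here is a minimal sketch of how a script might read them from a page's HTML. It uses only Python's standard html.parser module; the names RobotsMetaParser and robots_directives are hypothetical helpers invented for this illustration, not part of any SEO tool.

```python
from html.parser import HTMLParser


class RobotsMetaParser(HTMLParser):
    """Collects directives from <meta name="robots" content="..."> tags."""

    def __init__(self):
        super().__init__()
        self.directives = set()

    def handle_starttag(self, tag, attrs):
        if tag != "meta":
            return
        # Attribute values can be None (e.g. <meta content>), so default to "".
        attr_map = {name: (value or "") for name, value in attrs}
        if attr_map.get("name", "").strip().lower() != "robots":
            return
        # Directives are comma-separated and case-insensitive.
        for part in attr_map.get("content", "").split(","):
            directive = part.strip().lower()
            if directive:
                self.directives.add(directive)


def robots_directives(html):
    """Return the set of robots directives declared in an HTML document."""
    parser = RobotsMetaParser()
    parser.feed(html)
    return parser.directives


page = '<html><head><meta name="robots" content="noindex, nofollow"></head></html>'
print(sorted(robots_directives(page)))  # ['nofollow', 'noindex']
```

The sketch only looks at the generic "robots" name. Crawler-specific names such as "googlebot" would need an extra check.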

Unlike robots.txt, which sets crawl rules for URL paths across a site, robot tags give page-level control over indexing. For example, adding <meta name="robots" content="noindex, nofollow"> to a page's <head> keeps that page out of the index and tells crawlers not to follow its links.
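
The same directives can be delivered in either form: as a meta tag inside the HTML, or as an X-Robots-Tag response header, which is useful for non-HTML files such as PDFs. The sketch below builds both from a list of directives. The function names robots_meta_tag and x_robots_header are assumptions made for this example.

```python
from html import escape


def robots_meta_tag(directives):
    """Build a robots meta tag for a page's <head>."""
    content = ", ".join(directives)
    return f'<meta name="robots" content="{escape(content, quote=True)}">'


def x_robots_header(directives):
    """Build the equivalent HTTP response header as a (name, value) pair."""
    return ("X-Robots-Tag", ", ".join(directives))


print(robots_meta_tag(["noindex", "nofollow"]))
# <meta name="robots" content="noindex, nofollow">
print(x_robots_header(["noindex"]))
# ('X-Robots-Tag', 'noindex')
```

How the header pair is attached to a response depends on the web server or framework in use, so that step is left out here.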

Robot tags mainly help address index bloat by ensuring only important pages are indexed. Their effect on crawl budget is indirect: a crawler still has to fetch a page to read its tag, although search engines may revisit noindexed pages less often over time, which leaves more crawl resources for valuable pages.
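
One practical check follows from how crawling works: a crawler can only obey a robot tag on a page it is allowed to fetch. If robots.txt disallows the URL, a noindex tag on that page is never read. The sketch below uses Python's standard urllib.robotparser to test this. The function name tag_can_be_read, the sample rules, and the example.com URLs are made up for illustration.

```python
from urllib.robotparser import RobotFileParser


def tag_can_be_read(robots_txt, url, user_agent="*"):
    """Return True if robots.txt lets the crawler fetch the URL,
    which it must do before it can see any robot tag on the page."""
    parser = RobotFileParser()
    parser.parse(robots_txt.splitlines())
    return parser.can_fetch(user_agent, url)


# Hypothetical robots.txt rules for this example.
rules = "User-agent: *\nDisallow: /private/\n"

print(tag_can_be_read(rules, "https://example.com/private/page.html"))  # False
print(tag_can_be_read(rules, "https://example.com/tag/news"))           # True
```

A False result for a page that relies on noindex suggests the two mechanisms are working against each other on that URL.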

Proper use of robot tags supports good website architecture by keeping the focus on essential content and ensuring search engines index only what matters. However, overusing these tags can unintentionally hide valuable pages from search results.
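
A simple audit can catch valuable pages that have been hidden by mistake. The sketch below assumes you already have a mapping of URLs to their robots directive strings, for example from a crawl export, plus a list of pages that should stay visible. The function name hidden_important_pages and the sample data are hypothetical.

```python
def hidden_important_pages(page_directives, important_urls):
    """Return the important URLs whose directives keep them out of the index."""
    # "none" is shorthand for "noindex, nofollow".
    blocking = {"noindex", "none"}
    hidden = []
    for url in important_urls:
        raw = page_directives.get(url, "")
        directives = {part.strip().lower() for part in raw.split(",") if part.strip()}
        if directives & blocking:
            hidden.append(url)
    return sorted(hidden)


# Made-up crawl data for illustration.
crawl_export = {
    "https://example.com/": "index, follow",
    "https://example.com/pricing": "noindex, nofollow",
    "https://example.com/search?q=shoes": "noindex",
}
important = ["https://example.com/", "https://example.com/pricing"]

print(hidden_important_pages(crawl_export, important))
# ['https://example.com/pricing']
```

Here the internal search page is noindexed on purpose and is not reported, while the pricing page is flagged because it appears on the list of important pages.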