Pattern · TypeScript

robots.txt and robots metadata in Next.js App Router

Static robots via app/robots.ts and per-route noindex via generateMetadata.

Level: beginner · Time estimate: ~25 min

Global rules live in robots.ts; draft or utility pages opt out via route metadata.

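  • Do not send conflicting signals between robots.txt and meta robots: a URL disallowed in robots.txt is never fetched, so a noindex tag on that page is never seen.
  • The sitemap URL in robots.txt must be the canonical, absolute URL.
  • Preview deployments often need noindex on every page.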
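The preview-deployment point can be sketched as a small helper that derives a site-wide robots value from the deployment environment. This is a minimal sketch, not the project's code: `siteRobotsMeta` is a hypothetical name, and the `deployEnv` argument assumes something like `process.env.VERCEL_ENV` on Vercel; other hosts expose a different variable.

```typescript
// Structurally matches the object form of the `robots` field in Next.js
// `Metadata`; declared locally so the sketch compiles on its own.
type RobotsMeta = { index: boolean; follow: boolean }

// Hypothetical helper. `deployEnv` is an assumption: on Vercel it would be
// process.env.VERCEL_ENV ('production' | 'preview' | 'development').
function siteRobotsMeta(deployEnv: string | undefined): RobotsMeta {
  const isProduction = deployEnv === 'production'
  // Anything not explicitly production is treated as non-indexable,
  // so a missing variable fails closed.
  return { index: isProduction, follow: isProduction }
}
```

In a root layout this could be used as `robots: siteRobotsMeta(process.env.VERCEL_ENV)`. For the tag to take effect, robots.txt on the preview host has to leave pages crawlable, otherwise the noindex is never read.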

Crawlers read both robots.txt and <meta name="robots">. Split responsibilities: the file for site-wide rules, metadata for pages that must not be indexed.

Code

app/robots.ts in this project:

import type { MetadataRoute } from 'next'
import { getSiteUrl } from '@/lib/site'

// Strip a trailing slash so `${host}/sitemap.xml` never contains "//".
const host = getSiteUrl().replace(/\/$/, '')

export default function robots(): MetadataRoute.Robots {
  return {
    rules: [
      // Default rule for every crawler.
      { userAgent: '*', allow: '/' },
      // AI crawlers are listed by name so each policy is explicit and easy to change.
      { userAgent: 'GPTBot', allow: '/' },
      { userAgent: 'Google-Extended', allow: '/' },
      { userAgent: 'CCBot', allow: '/' },
      { userAgent: 'anthropic-ai', allow: '/' },
      { userAgent: 'Claude-Web', allow: '/' },
      { userAgent: 'ClaudeBot', allow: '/' },
      { userAgent: 'PerplexityBot', allow: '/' },
      { userAgent: 'Applebot-Extended', allow: '/' },
    ],
    sitemap: `${host}/sitemap.xml`,
    // Host is a non-standard directive; many crawlers ignore it.
    host,
  }
}
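The `replace(/\/$/, '')` call above only removes a trailing slash. If the configured site URL might also carry a path, mixed-case host, or an `http:` scheme, a stricter normalization could look like the sketch below. `canonicalOrigin` and `sitemapUrl` are hypothetical helpers, and forcing HTTPS is an assumption that would not suit local development.

```typescript
// Hypothetical helper, not part of the project: reduces a configured site URL
// to a canonical HTTPS origin with no path, query, or trailing slash.
function canonicalOrigin(raw: string): string {
  const url = new URL(raw) // throws on malformed input, surfacing config errors early
  url.protocol = 'https:' // assumption: production is served over HTTPS only
  return url.origin // scheme + lowercased host (+ any non-default port)
}

// Builds the absolute sitemap URL from the canonical origin.
function sitemapUrl(raw: string): string {
  return `${canonicalOrigin(raw)}/sitemap.xml`
}
```

For example, `sitemapUrl('http://Example.com/blog/')` yields `https://example.com/sitemap.xml`.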

Per-route noindex for a draft:

import type { Metadata } from 'next'

export async function generateMetadata(): Promise<Metadata> {
  return {
    title: 'Draft',
    robots: { index: false, follow: false },
  }
}
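When many routes share the same rule, the noindex decision can be pulled into one function instead of being repeated per page. A minimal sketch, assuming a hypothetical `PageStatus` field on your content model; neither the type nor `robotsForStatus` exists in Next.js.

```typescript
// Hypothetical content states; adapt to whatever your CMS or files expose.
type PageStatus = 'published' | 'draft' | 'utility'

// Structurally matches the object form of `Metadata['robots']`.
type RobotsMeta = { index: boolean; follow: boolean }

// Returns undefined for published pages so they inherit the site default;
// drafts and utility pages opt out of indexing explicitly.
function robotsForStatus(status: PageStatus): RobotsMeta | undefined {
  if (status === 'published') return undefined
  return { index: false, follow: false }
}
```

Inside `generateMetadata` the result would be passed as `robots: robotsForStatus(page.status)`, where `page` is whatever the route loads.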

Verification

  • GET /robots.txt returns plain text with User-agent rules, optional Disallow, and a single Sitemap: line with the canonical HTTPS sitemap URL.
  • Excluded pages expose noindex in HTML and are omitted from the sitemap when the policy is full exclusion.
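The first check can be automated with a small text check on the response body. This is a sketch under simple assumptions (line-based directives, case-insensitive names); `checkRobotsTxt` is a hypothetical helper, not a full robots.txt parser.

```typescript
// Returns a list of problems found in a robots.txt body; empty means the
// basic checks from this section pass.
function checkRobotsTxt(body: string): string[] {
  const problems: string[] = []
  const lines = body.split(/\r?\n/).map((line) => line.trim())

  // At least one User-agent group must be present.
  if (!lines.some((line) => /^user-agent\s*:/i.test(line))) {
    problems.push('no User-agent line')
  }

  // Collect the value of every Sitemap directive.
  const sitemaps = lines
    .filter((line) => /^sitemap\s*:/i.test(line))
    .map((line) => line.slice(line.indexOf(':') + 1).trim())

  if (sitemaps.length !== 1) {
    problems.push(`expected 1 Sitemap line, found ${sitemaps.length}`)
  }
  for (const url of sitemaps) {
    if (!url.startsWith('https://')) problems.push(`Sitemap is not HTTPS: ${url}`)
  }
  return problems
}
```

In a test or script, the body would come from fetching `/robots.txt` on the deployed origin and passing the response text to `checkRobotsTxt`.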
