Pagination Handling in Headless Architectures

Without an explicit pagination contract between your CMS API and frontend, crawlers encounter orphaned endpoints, infinite scroll traps, and index-bloating query-string duplicates. This page documents how to design the full pagination stack — from API contract to CDN header — so every /page/{n}/ route resolves predictably, carries the correct indexation signals, and doesn’t squander crawl budget on low-value archive pages.

Prerequisites

Before implementing pagination, confirm each item below is in place:

Node.js 20+ and the CLI for your framework (Next.js 14+, Nuxt 3.12+, SvelteKit 2+, or Astro 4+)
CMS_URL environment variable set to your headless CMS base endpoint in .env.local (and in your CI secret store)
CMS API returns pagination metadata — at minimum totalPages or a nextCursor field in every list response
Edge/CDN middleware support — Vercel Edge Middleware, Cloudflare Workers, or Netlify Edge Functions for header injection
Screaming Frog or similar crawler available for post-deploy validation
Google Search Console property verified so URL Inspection is accessible

Execution Path: Offset Pagination vs Cursor Pagination

The pagination strategy you choose at the API layer determines which URL shapes and indexation patterns are feasible downstream.
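When the CMS only exposes cursors, the /page/{n}/ URLs still need a stable page number behind them. A minimal sketch of that translation layer, assuming a hypothetical `fetchByCursor` callback that returns `{ items, nextCursor }` for one page: it walks the list once at build time and records which cursor starts each page, so the handler for /page/{n}/ can fetch with that cursor.

```typescript
// Hypothetical shape of one cursor-paginated CMS response; adjust field names to your CMS.
export interface CursorPage<T> {
  items: T[];
  nextCursor?: string;
}

export type FetchByCursor<T> = (cursor: string | undefined) => Promise<CursorPage<T>>;

/**
 * Walks a cursor-paginated list once and maps page number -> starting cursor.
 * Page 1 starts with no cursor. Empty trailing pages are not published,
 * so every recorded page number resolves to real content.
 */
export async function buildCursorIndex<T>(
  fetchByCursor: FetchByCursor<T>,
  maxPages = 1000 // guard against a cursor loop in a misbehaving API
): Promise<Map<number, string | undefined>> {
  const cursorForPage = new Map<number, string | undefined>();
  let cursor: string | undefined = undefined;

  for (let page = 1; page <= maxPages; page++) {
    const res = await fetchByCursor(cursor);
    if (page > 1 && res.items.length === 0) break; // avoid soft-404 pages
    cursorForPage.set(page, cursor);
    if (!res.nextCursor) break; // last page reached
    cursor = res.nextCursor;
  }
  return cursorForPage;
}
```

Because the index is rebuilt on every build, page boundaries can shift as content is published; rebuilding on CMS publish events keeps the numbered URLs in step with the API.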

Step-by-Step Implementation

Step 1 — Define the API pagination contract

Create a TypeScript interface that normalises both offset and cursor responses into a single shape your route generators can consume:

// lib/pagination.ts
export interface PaginationMeta {
  totalPages: number;
  currentPage: number;
  nextCursor?: string;
  prevCursor?: string;
}

export interface PaginatedResponse<T> {
  items: T[];
  pagination: PaginationMeta;
}

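export async function fetchPage<T>(
  endpoint: string,
  page: number,
  pageSize = 10
): Promise<PaginatedResponse<T>> {
  const base = process.env.CMS_URL;
  if (!base) throw new Error('CMS_URL is not set');

  // Build the query with URLSearchParams so endpoints that already carry a query string still work
  const url = new URL(`${base}${endpoint}`);
  url.searchParams.set('page', String(page));
  url.searchParams.set('pageSize', String(pageSize));

  const res = await fetch(url);
  if (!res.ok) throw new Error(`CMS fetch failed: ${res.status} for ${url}`);
  const raw = await res.json();

  // Normalise vendor-specific shapes (adjust these field names to your CMS)
  const totalPages = Number(raw.totalPages ?? raw.meta?.total_pages);
  // Route generation depends on totalPages: fail the build loudly instead of
  // silently generating zero or one page. Cursor-only APIs need a page -> cursor index first.
  if (!Number.isInteger(totalPages) || totalPages < 1) {
    throw new Error(`CMS response for ${url} has no usable totalPages`);
  }

  return {
    items: raw.data ?? raw.items ?? raw.results ?? [],
    pagination: {
      totalPages,
      currentPage: page,
      nextCursor: raw.nextCursor,
      prevCursor: raw.prevCursor,
    },
  };
}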

Validation: Call fetchPage('/articles', 1) in a test script and assert pagination.totalPages is a positive integer before wiring it into route generation.
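A sketch of the assertion for that test script, using a hypothetical `assertPaginationMeta` helper that re-declares the `PaginationMeta` shape so it runs on its own:

```typescript
// Mirrors PaginationMeta from lib/pagination.ts so this snippet is self-contained.
interface PaginationMeta {
  totalPages: number;
  currentPage: number;
  nextCursor?: string;
  prevCursor?: string;
}

/** Throws if the metadata cannot safely drive route generation. */
export function assertPaginationMeta(meta: PaginationMeta): void {
  if (!Number.isInteger(meta.totalPages) || meta.totalPages < 1) {
    throw new Error(`totalPages must be a positive integer, got ${meta.totalPages}`);
  }
  if (
    !Number.isInteger(meta.currentPage) ||
    meta.currentPage < 1 ||
    meta.currentPage > meta.totalPages
  ) {
    throw new Error(`currentPage ${meta.currentPage} is outside 1..${meta.totalPages}`);
  }
}
```

In the test script it would be called as `assertPaginationMeta((await fetchPage('/articles', 1)).pagination)` before any route generation runs.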

Step 2 — Generate static routes at build time

Static pre-rendering of every /page/{n}/ route guarantees 200 responses and eliminates client-side routing fallbacks that block crawlers. This is the core technique covered in dynamic route generation for headless builds.

See framework-specific implementations in the next section.
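Pre-rendering only helps if page numbers outside the generated set return a real 404 rather than an empty 200 (a soft 404). A sketch of a hypothetical `parsePageParam` helper that a route handler could call before fetching; it also rejects zero-padded values such as `01`, which would otherwise duplicate /page/1/.

```typescript
/**
 * Converts a raw route segment into a page number, or null when the route should 404.
 * Accepts only canonical positive integers ("1", "2", ...) within 1..totalPages.
 */
export function parsePageParam(raw: string, totalPages: number): number | null {
  if (!/^[1-9]\d*$/.test(raw)) return null; // digits only, no leading zero, no "0"
  const page = Number(raw);
  return page <= totalPages ? page : null;
}
```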

Step 3 — Inject `rel=prev`/`next` and canonical per page

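These link tags describe the pagination sequence, and every page should also carry a self-referencing canonical. Google stopped using rel=prev/next as an indexing signal in 2019 and treats each paginated URL as a standalone page, so crawlable <a href> links between pages matter more than the tags themselves; other engines may still read them, and they cost nothing to emit.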
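A framework-agnostic sketch of the per-page link set, using a hypothetical `paginationLinks` helper: every page gets a self-referencing canonical, page 1 gets no `prev`, and the last page gets no `next`. The framework examples below inline the same logic.

```typescript
export interface PaginationLinks {
  canonical: string;
  prev?: string;
  next?: string;
}

// base is the archive prefix without a trailing slash, e.g. 'https://seo-architecture.com/blog/page'
export function paginationLinks(base: string, current: number, totalPages: number): PaginationLinks {
  const href = (n: number) => `${base}/${n}/`;
  return {
    canonical: href(current), // page 2's canonical is page 2, never page 1
    prev: current > 1 ? href(current - 1) : undefined,
    next: current < totalPages ? href(current + 1) : undefined,
  };
}
```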

Step 4 — Apply indexation directives to deep pages

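Pages 2 and beyond rarely warrant independent index slots: the first page owns the keyword ranking. Set noindex, follow on page 2+ to consolidate signals on page 1 and keep thin archive pages out of the index. Crawlers still have to fetch these URLs to read the directive, so noindex limits index bloat; the crawl-budget savings come from the consistent URL patterns enforced in Step 5.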
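Keeping the indexation rule in one function stops the `<meta>` tag and the `X-Robots-Tag` header from drifting apart, for example through an off-by-one that noindexes page 1. `robotsDirectiveFor` is a hypothetical helper name:

```typescript
/**
 * Single source of truth for the robots directive, shared by the <meta name="robots">
 * tag and the X-Robots-Tag header. Returns null for page 1 (emit nothing, default to index).
 */
export function robotsDirectiveFor(page: number): string | null {
  if (!Number.isInteger(page) || page < 1) {
    throw new RangeError(`invalid page number: ${page}`);
  }
  return page > 1 ? 'noindex, follow' : null;
}
```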

Step 5 — Enforce canonical URL patterns at the edge

Query-string variants (?page=2) must redirect 301 to path-based equivalents (/page/2/). Apply this at the CDN layer before requests hit your origin. This is part of the broader canonical URL enforcement strategy.
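The mapping itself can live in a small pure function so the same logic runs in Vercel middleware, a Cloudflare Worker or a Netlify Edge Function. This sketch assumes archives live under /blog and paginate as /blog/page/{n}/; `paginationRedirectTarget` is a hypothetical name, and invalid values such as `?page=abc` are stripped back to /blog/ rather than passed through.

```typescript
// Archive paths that may carry a legacy ?page= parameter: /blog, /blog/, /blog/page/{n}/
const ARCHIVE_PATH = /^\/blog(\/page\/\d+)?\/?$/;
const PAGE_NUMBER = /^[1-9]\d*$/;

/** Returns the URL to 301 to, or null when the request needs no redirect. */
export function paginationRedirectTarget(requestUrl: string): string | null {
  const url = new URL(requestUrl);
  const page = url.searchParams.get('page');
  // Article URLs and parameter-free requests are left alone
  if (page === null || !ARCHIVE_PATH.test(url.pathname)) return null;

  url.searchParams.delete('page'); // other parameters are preserved
  url.pathname = PAGE_NUMBER.test(page) ? `/blog/page/${page}/` : '/blog/';
  return url.toString();
}
```

In a Worker-style fetch handler this would be used as `const target = paginationRedirectTarget(request.url); if (target) return Response.redirect(target, 301);`.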

Step 6 — Validate and monitor

Run curl -sI header checks, Lighthouse CI, and GSC URL Inspection against every /page/{n}/ route after each deploy. See the Validation Protocol section below.

Framework-Specific Implementations

Next.js App Router

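generateStaticParams pre-renders all paginated routes at build time. The App Router does not support next/head, so emit the canonical and robots tags through generateMetadata and render the prev/next <link> elements from the page component: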

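// app/blog/page/[page]/page.tsx
import type { Metadata } from 'next';
import { notFound } from 'next/navigation';
import { fetchPage } from '@/lib/pagination';

const base = 'https://seo-architecture.com/blog/page';

// Only the page numbers returned by generateStaticParams exist; anything else returns 404
export const dynamicParams = false;

export async function generateStaticParams(): Promise<Array<{ page: string }>> {
  const { pagination } = await fetchPage('/articles', 1);
  return Array.from({ length: pagination.totalPages }, (_, i) => ({
    page: (i + 1).toString(),
  }));
}

// Next.js 14 passes params as a plain object (Next.js 15 passes a Promise)
type Props = { params: { page: string } };

export async function generateMetadata({ params }: Props): Promise<Metadata> {
  const current = parseInt(params.page, 10);
  return {
    alternates: { canonical: `${base}/${current}/` }, // self-referencing canonical
    robots: current > 1 ? { index: false, follow: true } : undefined, // noindex, follow on page 2+
  };
}

export default async function BlogPage({ params }: Props) {
  const current = parseInt(params.page, 10);
  if (!Number.isInteger(current) || current < 1) notFound();
  const { items, pagination } = await fetchPage('/articles', current);

  return (
    <>
      {/* Relies on React hoisting <link> elements into <head>; confirm in the raw HTML (see Validation) */}
      {current > 1 && <link rel="prev" href={`${base}/${current - 1}/`} />}
      {current < pagination.totalPages && (
        <link rel="next" href={`${base}/${current + 1}/`} />
      )}
      {/* render items */}
    </>
  );
}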

SEO impact: Pre-renders all paginated routes as static HTML served from the CDN edge. Crawlers receive immediate 200 responses with correct rel signals already in the markup — no JavaScript execution required.

Validation: Inspect .next/server/app/blog/page/, which should contain one prerendered HTML file per page number (e.g. 2.html). Run curl -s https://your-domain.com/blog/page/2/ | grep -E 'rel="(prev|next|canonical)"' to confirm the tags are present in the raw HTML.

SvelteKit

SvelteKit’s +page.server.ts load function handles server-side page resolution and lets you inject X-Robots-Tag headers directly:

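// src/routes/blog/page/[page]/+page.server.ts
import { error } from '@sveltejs/kit';
import type { PageServerLoad } from './$types';
import { fetchPage } from '$lib/pagination';

export const load: PageServerLoad = async ({ params, setHeaders }) => {
  // Reject 0, zero-padded, decimal and non-numeric segments instead of rendering an empty 200
  if (!/^[1-9]\d*$/.test(params.page)) error(404, 'Page not found');
  const current = Number(params.page);

  const { items, pagination } = await fetchPage('/articles', current);
  if (current > pagination.totalPages) error(404, 'Page not found');

  if (current > 1) {
    setHeaders({ 'X-Robots-Tag': 'noindex, follow' });
  }

  return { items, pagination, current };
};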

<!-- src/routes/blog/page/[page]/+page.svelte -->
<script lang="ts">
  import type { PageData } from './$types';

  export let data: PageData;
  const base = 'https://seo-architecture.com/blog/page';

  // Reactive destructuring so the tags update on client-side navigation between pages
  $: ({ items, pagination, current } = data);
</script>

<svelte:head>
  <link rel="canonical" href="{base}/{current}/" />
  {#if current > 1}<link rel="prev" href="{base}/{current - 1}/" />{/if}
  {#if current < pagination.totalPages}<link rel="next" href="{base}/{current + 1}/" />{/if}
</svelte:head>

<!-- render items -->

SEO impact: Header injection via setHeaders happens before the HTML response leaves the server — CDN and crawler both see X-Robots-Tag without any client-side dependency.

Validation: curl -sI https://your-domain.com/blog/page/2/ and assert x-robots-tag: noindex, follow appears in the response headers.

Nuxt 3

Nuxt’s useHead composable applies rel tags during SSR, ensuring they appear in the initial HTML payload rather than being injected by client JavaScript:

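<!-- pages/blog/page/[page].vue -->
<script setup lang="ts">
import { fetchPage } from '~/lib/pagination';

const route = useRoute();
const current = Number(route.params.page);
// Return a real 404 for /blog/page/0/, /blog/page/abc/ etc. instead of an empty 200
if (!Number.isInteger(current) || current < 1) {
  throw createError({ statusCode: 404, statusMessage: 'Page not found' });
}

// fetchPage reads process.env.CMS_URL, which is only available server-side (SSR / nuxi generate)
const { data } = await useAsyncData(`blog-page-${current}`, () =>
  fetchPage('/articles', current)
);
if (data.value && current > data.value.pagination.totalPages) {
  throw createError({ statusCode: 404, statusMessage: 'Page not found' });
}

const base = 'https://seo-architecture.com/blog/page';

useHead({
  link: [
    { rel: 'canonical', href: `${base}/${current}/` },
    ...(current > 1
      ? [{ rel: 'prev', href: `${base}/${current - 1}/` }]
      : []),
    ...(data.value && current < data.value.pagination.totalPages
      ? [{ rel: 'next', href: `${base}/${current + 1}/` }]
      : []),
  ],
  meta: current > 1
    ? [{ name: 'robots', content: 'noindex, follow' }]
    : [],
});
</script>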

SEO impact: SSR-rendered rel tags are visible in the raw HTML — no hydration lag means crawlers see correct pagination signals on the first parse.

Validation: nuxi generate then check dist/blog/page/2/index.html for rel="prev", rel="canonical", and <meta name="robots" content="noindex, follow"> in the <head>.

Astro

Astro’s built-in paginate() helper generates the paginated routes and exposes the current, previous and next URLs on the page prop; you still render the head tags yourself:

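---
// src/pages/blog/[...page].astro
export async function getStaticPaths({ paginate }) {
  // Assumes the CMS list endpoint returns { items, pagination: { totalPages } }
  const fetchList = async (n: number) => {
    const res = await fetch(`${import.meta.env.CMS_URL}/articles?pageSize=10&page=${n}`);
    if (!res.ok) throw new Error(`CMS fetch failed: ${res.status}`);
    return res.json();
  };

  // paginate() needs the full list, so walk every CMS page at build time
  const first = await fetchList(1);
  const allItems = [...first.items];
  for (let n = 2; n <= first.pagination.totalPages; n++) {
    allItems.push(...(await fetchList(n)).items);
  }
  return paginate(allItems, { pageSize: 10 });
}

const { page } = Astro.props;
const isDeepPage = page.currentPage > 1;
// page.url.current is a relative path; resolving it requires `site` in astro.config.mjs
const canonical = new URL(page.url.current, Astro.site).href;
---
<html lang="en">
<head>
  <link rel="canonical" href={canonical} />
  {page.url.prev && <link rel="prev" href={page.url.prev} />}
  {page.url.next && <link rel="next" href={page.url.next} />}
  {isDeepPage && <meta name="robots" content="noindex, follow" />}
</head>
<body>
  <!-- render page.data -->
</body>
</html>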

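SEO impact: with the rest parameter [...page], paginate() generates /blog/ for page 1 and /blog/2/, /blog/3/ for the rest, with page.url.prev and page.url.next populated automatically, reducing the manual routing errors common in hand-rolled implementations. To match the /blog/page/{n}/ shape used elsewhere on this page, place the file at src/pages/blog/page/[page].astro instead.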

Validation: run astro build, then verify that dist/blog/index.html (page 1) and one dist/blog/{n}/index.html per additional page exist, each with the expected canonical, prev/next and robots tags.

HTTP Headers and CDN Directives

The table below documents every header relevant to paginated headless routes. Configure these at the CDN or edge middleware layer so they apply consistently regardless of framework; the X-Robots-Tag and Link values depend on the page number:

Header	Required value	Rationale
`X-Robots-Tag`	`noindex, follow` (page 2+ only)	Server-side copy of the robots directive; applies even if the `<meta>` tag is missing from the initial HTML. Never sent on page 1
`Link` (canonical)	`<https://domain.com/page/{n}/>; rel="canonical"`	HTTP-level canonical signal, picked up by crawlers before HTML parsing
`Link` (pagination)	`<…/page/1/>; rel="prev"`, `<…/page/3/>; rel="next"` (values shown for page 2)	Explicit pagination sequence for engines that still read it; Google has not used rel=prev/next as an indexing signal since 2019
`Cache-Control`	`public, max-age=86400, stale-while-revalidate=604800`	Lets the CDN cache each paginated page for a day; where the CDN supports `stale-while-revalidate`, it serves the cached copy while fetching a fresh one in the background, so crawlers never wait on revalidation
`Vary`	`Accept-Encoding`	Keeps brotli and gzip variants cached separately so no client receives an encoding it did not request
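A sketch of how edge middleware might assemble these headers for page `n`, assuming the /blog/page/{n}/ URL shape; `paginatedResponseHeaders` is a hypothetical helper and the cache values are copied from the table:

```typescript
/** Builds the response headers for one paginated archive page. */
export function paginatedResponseHeaders(
  origin: string, // e.g. 'https://seo-architecture.com'
  page: number,
  totalPages: number
): Record<string, string> {
  const href = (n: number) => `${origin}/blog/page/${n}/`;

  // Multiple link-values share one Link header, separated by commas (RFC 8288)
  const links = [`<${href(page)}>; rel="canonical"`];
  if (page > 1) links.push(`<${href(page - 1)}>; rel="prev"`);
  if (page < totalPages) links.push(`<${href(page + 1)}>; rel="next"`);

  const headers: Record<string, string> = {
    Link: links.join(', '),
    'Cache-Control': 'public, max-age=86400, stale-while-revalidate=604800',
    Vary: 'Accept-Encoding',
  };
  if (page > 1) headers['X-Robots-Tag'] = 'noindex, follow'; // never on page 1
  return headers;
}
```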

Redirect rule (Next.js middleware running on Vercel Edge; port the same logic to a Cloudflare Worker or Netlify Edge Function if you deploy there):

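// middleware.ts (Next.js / Vercel Edge)
import { NextResponse } from 'next/server';
import type { NextRequest } from 'next/server';

// Archive URLs that may carry a legacy ?page= parameter: /blog, /blog/, /blog/page/{n}/
const ARCHIVE_PATH = /^\/blog(\/page\/\d+)?\/?$/;
const PAGE_NUMBER = /^[1-9]\d*$/;

export function middleware(request: NextRequest) {
  const url = request.nextUrl.clone();
  const pageParam = url.searchParams.get('page');

  // Leave article URLs (e.g. /blog/my-post?page=2) and parameter-free requests alone
  if (pageParam === null || !ARCHIVE_PATH.test(url.pathname)) {
    return NextResponse.next();
  }

  // Redirect ?page=2 → /blog/page/2/; strip invalid values (?page=0, ?page=abc) back to /blog/
  url.searchParams.delete('page');
  url.pathname = PAGE_NUMBER.test(pageParam) ? `/blog/page/${pageParam}/` : '/blog/';
  return NextResponse.redirect(url, 301);
}

export const config = {
  matcher: ['/blog/:path*'],
};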

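This redirect rule is an extension of the redirect chain management patterns that apply across all headless routing scenarios. Because every redirect target ends in a trailing slash, set trailingSlash: true in next.config.js; with the default (false), Next.js redirects /blog/page/2/ to /blog/page/2 and turns each 301 into a two-hop chain.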

Sitemap Chunking for Paginated Routes

List the first page of every paginated archive in your XML sitemap generation pipeline so crawlers can discover each archive on sites with deep content; the articles themselves belong in their own sitemap. Each sitemap file is capped at 50,000 URLs and 50 MB uncompressed, so keep the archive entries in a dedicated sitemap-paginated.xml and reference it from your sitemap_index.xml:

<!-- sitemap_index.xml -->
<sitemapindex xmlns="http://www.sitemaps.org/schemas/sitemap/0.9">
  <sitemap>
    <loc>https://seo-architecture.com/sitemap-articles.xml</loc>
    <lastmod>2026-06-22</lastmod>
  </sitemap>
  <sitemap>
    <loc>https://seo-architecture.com/sitemap-paginated.xml</loc>
    <lastmod>2026-06-22</lastmod>
  </sitemap>
</sitemapindex>

Only include page 1 of each section in sitemaps. Pages 2+ carry noindex and their inclusion in the sitemap sends contradictory signals to Googlebot.
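A sketch of a generator for sitemap-paginated.xml that enforces the page-1-only rule, assuming a hypothetical `ArchiveRoute` input that lists each archive URL with its page number:

```typescript
// Hypothetical input shape: one entry per generated archive route
export interface ArchiveRoute {
  loc: string;
  page: number;
  lastmod?: string; // W3C date, e.g. '2026-06-22'
}

// <loc> values must be XML-escaped (e.g. & in query strings)
const escapeXml = (s: string) =>
  s
    .replace(/&/g, '&amp;')
    .replace(/</g, '&lt;')
    .replace(/>/g, '&gt;')
    .replace(/"/g, '&quot;')
    .replace(/'/g, '&apos;');

export function buildPaginatedSitemap(routes: ArchiveRoute[]): string {
  const entries = routes
    .filter((r) => r.page === 1) // pages 2+ are noindex, so keep them out
    .map((r) => {
      const lastmod = r.lastmod ? `<lastmod>${r.lastmod}</lastmod>` : '';
      return `  <url><loc>${escapeXml(r.loc)}</loc>${lastmod}</url>`;
    });
  if (entries.length > 50_000) {
    throw new Error('Split into multiple sitemap files: 50,000 URL limit per file');
  }
  return [
    '<?xml version="1.0" encoding="UTF-8"?>',
    '<urlset xmlns="http://www.sitemaps.org/schemas/sitemap/0.9">',
    ...entries,
    '</urlset>',
  ].join('\n');
}
```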

Validation Protocol

Run these checks after every deploy touching pagination routes:

1. Header audit (all paginated routes)

for page in 1 2 3 5 10; do
  echo "=== /blog/page/${page}/ ==="
  curl -sI "https://your-domain.com/blog/page/${page}/" \
    | grep -iE '(x-robots-tag|link:|cache-control|location|http/)'
done

Expected output for page 2+: x-robots-tag: noindex, follow and no location: line (no unexpected redirect). Page 1 must not return an x-robots-tag header at all.
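The same expectations can be encoded as a check that fails CI. This sketch assumes a hypothetical `auditPaginationHeaders` helper that receives the status code and a header getter, so it works with any HTTP client:

```typescript
export type HeaderGetter = (name: string) => string | null;

/** Returns human-readable problems for one paginated route; an empty array means it passed. */
export function auditPaginationHeaders(page: number, status: number, get: HeaderGetter): string[] {
  const problems: string[] = [];
  if (status !== 200) problems.push(`expected 200, got ${status}`);

  const location = get('location');
  if (location) problems.push(`unexpected redirect to ${location}`);

  const robots = get('x-robots-tag')?.toLowerCase() ?? '';
  if (page > 1 && !robots.includes('noindex')) {
    problems.push(`page ${page} is missing X-Robots-Tag: noindex`);
  }
  if (page === 1 && robots.includes('noindex')) {
    problems.push('page 1 must not be noindex');
  }
  return problems;
}
```

With Node 18+ `fetch`, one way to feed it: `const res = await fetch(url, { method: 'HEAD', redirect: 'manual' }); auditPaginationHeaders(n, res.status, (name) => res.headers.get(name));`.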

2. Markup verification

curl -s "https://your-domain.com/blog/page/2/" \
  | grep -oE '<(link|meta)[^>]*(rel="(canonical|prev|next)"|name="robots")[^>]*>'

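On a middle page such as page 2, all four tags (canonical, prev, next, robots) must appear in the raw HTML, not injected post-hydration. The SvelteKit example sends the robots directive as an X-Robots-Tag header instead, which the header audit covers.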

3. Redirect chain check

# Confirm ?page=2 redirects 301 to /page/2/
curl -sI "https://your-domain.com/blog?page=2" \
  | grep -iE '(HTTP/|location:)'

Expected: HTTP/2 301 followed by location: .../blog/page/2/.

4. GSC URL Inspection

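Use the Google Search Console URL Inspection API to confirm that page 1 is indexed (verdict PASS) and that pages 2+ report an indexingState of BLOCKED_BY_META_TAG, or BLOCKED_BY_HTTP_HEADER where the directive comes from X-Robots-Tag.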
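A sketch of how the inspection results might be checked automatically. It reads a subset of the API's `indexStatusResult` object (`verdict`, `indexingState`, `userCanonical`, `googleCanonical`); `checkInspection` is a hypothetical helper, and the enum strings should be confirmed against the current API reference:

```typescript
// Subset of the URL Inspection API's indexStatusResult read by this check
export interface IndexStatusResult {
  verdict?: string; // e.g. 'PASS', 'NEUTRAL'
  indexingState?: string; // e.g. 'INDEXING_ALLOWED', 'BLOCKED_BY_META_TAG'
  userCanonical?: string;
  googleCanonical?: string;
}

export function checkInspection(page: number, r: IndexStatusResult): string[] {
  const problems: string[] = [];
  const blocked =
    r.indexingState === 'BLOCKED_BY_META_TAG' || r.indexingState === 'BLOCKED_BY_HTTP_HEADER';

  if (page === 1) {
    if (r.verdict !== 'PASS') problems.push(`page 1 verdict is ${r.verdict}`);
    if (blocked) problems.push(`page 1 is blocked: ${r.indexingState}`);
  } else if (!blocked) {
    problems.push(`page ${page} is not noindexed (indexingState ${r.indexingState})`);
  }

  // Flags pages where Google picked a different canonical than the one declared
  if (r.userCanonical && r.googleCanonical && r.userCanonical !== r.googleCanonical) {
    problems.push(`Google chose ${r.googleCanonical} over declared ${r.userCanonical}`);
  }
  return problems;
}
```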

5. Lighthouse CI threshold

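Add Lighthouse CI assertions to your pipeline to catch regressions. canonical and robots-txt are built-in Lighthouse SEO audits:

# lighthouserc.yml
ci:
  collect:
    url:
      - https://your-domain.com/blog/page/1/
      - https://your-domain.com/blog/page/2/
  assert:
    assertions:
      canonical: ['error', { minScore: 1 }]
      robots-txt: ['error', { minScore: 1 }]
      # Don't assert is-crawlable here: it fails by design on the noindex pages 2+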

Troubleshooting

Symptom	Root cause	Fix
Pages 2–N return 404 after deploy	`generateStaticParams` read `totalPages` as 1 (or undefined) from the page-1 response, so only page 1 (or nothing) was generated	Assert `totalPages > 0` and log the raw CMS response during build; check `CMS_URL` env var is set in CI
`rel="next"` missing from `<head>`	Tag injected by client JS after hydration, so it is not present in the initial HTML	Move `rel` injection to the server-rendered head (`generateMetadata` or server-rendered `<link>` elements in Next.js, `useHead` in Nuxt, `<svelte:head>` in SvelteKit, the page `<head>` in Astro)
Query-string URLs (`?page=2`) being indexed	301 redirect rule not applied at edge before HTML response	Deploy middleware redirect and verify with `curl -sI` — confirm `location` header present
Duplicate canonical URLs across page 2+	All pages setting canonical to page 1 without a unique self-canonical	Each page must carry its own self-referential canonical; page 1 is NOT the canonical for page 2
`X-Robots-Tag: noindex` applied to page 1	Off-by-one error in page comparison	Change `page >= 1` condition to `page > 1`; validate with header check script above
Sitemap includes all page numbers	`noindex` pages included in sitemap creates contradictory signals	Filter sitemap generator to `page === 1` only; run `xmllint` against generated sitemap
ISR revalidation exposes stale `totalPages`	New content published but page count not updated until next revalidation	Set `revalidate` to match CMS publish frequency; add on-demand revalidation webhook triggered by CMS publish events

Child Pages

Pagination SEO Best Practices for Headless APIs — canonical tag patterns, rel=prev/next header strategies, and crawl budget controls to prevent index fragmentation across API-driven archive pages.
Crawlable Pagination Without rel=next/prev in Headless — self-referencing canonicals, unique per-page metadata, and real server-rendered links now that Google ignores rel=next/prev.
Infinite Scroll SEO for Headless Product Listings — pairing the scroll UX with paginated crawlable URLs, History API sync, and server-rendered fallback anchors.

Frequently Asked Questions

Should I use offset-based or cursor-based pagination for SEO? Offset-based pagination producing /page/2/, /page/3/ paths is strongly preferred for SEO. It creates predictable, crawlable URLs that search engines can discover and re-crawl without complex state tracking. Cursor-based pagination suits API performance at scale but requires a secondary URL translation layer to remain crawlable.

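How do I handle noindex for paginated pages beyond page 1? Apply noindex, follow to pages 2+. This keeps the index focused on page 1, while the follow directive lets crawlers continue to the content pages those archives link to. Googlebot still fetches each page to read the directive, so noindex curbs index bloat rather than crawl volume, and Google has indicated that a long-standing noindex is eventually treated like nofollow as well; list every article in your sitemap so deep content never depends on archive links alone.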

Do headless frameworks automatically inject rel=prev/next? No. Next.js, Nuxt, SvelteKit and Astro all require explicit rel=prev/next injection through their head-management APIs. They route the pages but add no pagination link tags without configuration; Astro's paginate() exposes page.url.prev and page.url.next, but you still render the tags yourself.

How does pagination affect Core Web Vitals in headless setups? Poorly implemented pagination causes CLS from dynamic content injection and delayed LCP from client-side data fetching. Pre-rendering /page/{n}/ routes as static HTML eliminates both problems by serving fully-rendered pages from the CDN edge.

Part of: Dynamic Routing & Indexation Workflows

Related

Dynamic Route Generation — build-time static path generation for archive and taxonomy routes
Canonical URL Enforcement — preventing duplicate indexation across parameterised URL variants
Slug Normalization Strategies — standardising URL shapes to eliminate redirect chains before they form
XML Sitemap Generation for Headless — chunking and submitting sitemaps that cover paginated archive routes
Crawl Budget Impact in Headless — allocating crawler quota across SSG, ISR, and server-rendered route types