How to benchmark Hugo vs Astro build speeds
A diagnostic framework for measuring and comparing Hugo and Astro compilation times under identical content loads. This guide establishes reproducible metrics, cache isolation protocols, and configuration-level bottleneck resolution.
Standardize content volume, asset types, and routing depth across both generators. Isolate plugin overhead by disabling non-essential features during baseline tests. Track cold versus warm cache metrics on identical CI runner specifications. For context on where acceptable performance thresholds lie, see Choosing the Right Static Site Generator for Production.
Environment Standardization & Baseline Configuration
Hardware and runtime variables must be eliminated to ensure deterministic benchmark results. Pin Node.js and Go versions to exact releases across all test environments. Disable OS-level background services and thermal throttling mechanisms.
Use Docker containers with identical resource limits to prevent host-level contention. See Hugo Build Times for Large Repositories for baseline memory-allocation thresholds.
docker run --cpus=2 --memory=4g -it node:20-alpine /bin/bash
docker run --cpus=2 --memory=4g -it golang:1.22-alpine /bin/bash
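Version drift between runs is easy to miss. A small helper, sketched here under the assumption that hugo, node, and go are on PATH inside the container (the filename toolchain_versions.txt is arbitrary), records exact toolchain versions alongside each benchmark:

```shell
#!/bin/bash
# Record exact toolchain versions so each benchmark run is auditable.
# Tools missing from PATH are simply skipped.
record_versions() {
  {
    echo "date: $(date -u +%Y-%m-%dT%H:%M:%SZ)"
    command -v hugo >/dev/null 2>&1 && echo "hugo: $(hugo version)"
    command -v node >/dev/null 2>&1 && echo "node: $(node --version)"
    command -v go   >/dev/null 2>&1 && echo "go: $(go version)"
    true  # keep exit status 0 even when the last tool is absent
  } > toolchain_versions.txt
}
```

Commit the resulting file next to the benchmark CSV so any anomalous run can be traced back to a toolchain change.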
Content Volume & Asset Injection Protocol
Generate identical synthetic datasets to stress-test markdown parsing, image processing, and routing generation. Create 10k, 50k, and 100k markdown files with randomized frontmatter. Inject identical image sets at 1MB, 5MB, and 10MB across WebP, AVIF, and PNG formats.
Disable external API calls and remote data fetching during all tests. Verify directory structure parity between content/ for Hugo and src/content/ for Astro.
python3 -c "import os; os.makedirs('test_repo/content', exist_ok=True); [open(f'test_repo/content/post-{i}.md', 'w').write(f'---\ntitle: Post {i}\n---\n\nBody {i}\n') for i in range(10000)]"
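A one-liner scales poorly once randomized frontmatter is needed. A fuller sketch follows; the paths hugo-site/content/posts and astro-site/src/content/posts are illustrative, and the fixed seed keeps the dataset byte-identical across both generators:

```python
import os
import random

def generate_posts(root: str, count: int, seed: int = 42) -> None:
    """Write `count` markdown files with randomized YAML frontmatter under `root`."""
    rng = random.Random(seed)  # fixed seed: identical dataset for every run
    os.makedirs(root, exist_ok=True)
    tags = ["go", "node", "ssg", "benchmark", "ci"]
    for i in range(count):
        lines = [
            "---",
            f'title: "Post {i}"',
            f'date: "2024-01-{(i % 28) + 1:02d}"',
            f"tags: [{', '.join(rng.sample(tags, 2))}]",
            "draft: false",
            "---",
            "",
            f"Paragraph for post {i}. " * 50,
        ]
        with open(os.path.join(root, f"post-{i:05d}.md"), "w") as f:
            f.write("\n".join(lines) + "\n")

# Identical datasets in both layouts: Hugo reads content/, Astro src/content/
# generate_posts("hugo-site/content/posts", 10_000)
# generate_posts("astro-site/src/content/posts", 10_000)
```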
Benchmark Execution & Metric Collection
Capture precise compilation durations, memory peaks, and I/O wait times. Use the time command or perf for wall-clock and CPU cycle tracking. Log verbose outputs to isolate pipeline bottlenecks.
Record peak RSS memory and disk I/O operations per run. Execute five iterations per generator, discard the highest and lowest values, and average the remaining three (a trimmed mean) to damp outliers.
for i in {1..5}; do /usr/bin/time -v hugo --gc --minify 2>> hugo_metrics.log; done
CI/CD Pipeline Integration & Caching Tests
Simulate production deployment workflows to evaluate incremental build performance. Configure GitHub Actions with identical runner types. Test with empty cache, partial cache, and full cache states.
Separately from production builds, measure hugo server --disableFastRender versus astro dev cold-start times to capture developer-experience overhead. Validate the impact of artifact upload and download overhead on total pipeline duration.
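Dev-server cold starts can be timed by polling the listen port until it accepts connections. The sketch below assumes bash (it relies on /dev/tcp) and that you know each server's port in advance:

```shell
#!/bin/bash
# Start a dev server in the background and print seconds until its port
# accepts connections. Gives up after ~30s. Bash-only (/dev/tcp).
time_dev_startup() {
  local port="$1"; shift
  local start end tries=0 pid
  start=$(date +%s%N)
  "$@" >/dev/null 2>&1 &
  pid=$!
  until (exec 3<>"/dev/tcp/127.0.0.1/$port") 2>/dev/null; do
    sleep 0.1
    tries=$((tries + 1))
    if [ "$tries" -ge 300 ]; then
      kill "$pid" 2>/dev/null
      return 1
    fi
  done
  end=$(date +%s%N)
  kill "$pid" 2>/dev/null
  awk -v s="$start" -v e="$end" 'BEGIN {printf "%.2f\n", (e - s) / 1e9}'
}
# Usage:
#   time_dev_startup 1313 hugo server --disableFastRender
#   time_dev_startup 4321 npx astro dev
```

Port readiness is only a proxy for "usable": both servers may continue compiling after the socket opens, so treat this as a lower bound on cold-start time.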
# GitHub Actions cache configuration
- uses: actions/cache@v3
  with:
    path: |
      .hugo_cache/
      node_modules/.astro/
    key: ${{ runner.os }}-ssg-benchmark-${{ hashFiles('**/package-lock.json', 'go.sum') }}
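To force a genuine cold-cache state before a run, remove the default cache locations for both toolchains; the paths below assume stock configurations and should be adjusted if your configs override them:

```shell
#!/bin/bash
# Remove generator and bundler caches so the next build is a true cold start.
clear_ssg_caches() {
  rm -rf resources/_gen .hugo_cache              # Hugo processed assets
  rm -rf .astro node_modules/.astro              # Astro content/build caches
  rm -rf node_modules/.vite node_modules/.cache  # Vite and misc tool caches
}
```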
Configuration Tuning for Isolated Testing
Strip non-essential features to isolate pure markdown-to-HTML compilation speed. Disable RSS generation, sitemap creation, and comment systems. Prevent Vite minification and sourcemap generation.
Hugo Configuration (config.toml)
baseURL = "http://localhost/"
languageCode = "en-us"
title = "Benchmark Test"

# RSS, sitemaps, and taxonomy pages are skipped natively via disableKinds;
# theme-level features such as comments are toggled under [params].
disableKinds = ["RSS", "sitemap", "taxonomy", "term"]

[params]
  disableComments = true

[build]
  writeStats = false
  noJSConfigInAssets = true

[markup]
  [markup.goldmark]
    [markup.goldmark.renderer]
      unsafe = false
Astro Configuration (astro.config.mjs)
import { defineConfig } from 'astro/config';

export default defineConfig({
  site: 'http://localhost',
  output: 'static',
  build: {
    inlineStylesheets: 'auto',
    format: 'directory',
    concurrency: 4
  },
  vite: {
    build: {
      minify: false,
      sourcemap: false,
      rollupOptions: {
        output: { manualChunks: () => null }
      }
    }
  }
});
Automated Timing Wrapper
Use the following script to capture wall-clock time and peak RSS memory across iterations. Output structured CSV data for statistical analysis.
#!/bin/bash
set -e

ITERATIONS=5
LOG_FILE="benchmark_results.csv"
TIME_LOG="time_output.txt"

echo "Generator,Iteration,WallTime_s,PeakRSS_MB" > "$LOG_FILE"

for gen in hugo astro; do
  for i in $(seq 1 "$ITERATIONS"); do
    if [ "$gen" = "hugo" ]; then
      CMD=(hugo --gc --minify)
    else
      CMD=(npx astro build)
    fi
    START=$(date +%s%N)
    # GNU time writes its report to stderr; capture it for RSS parsing
    /usr/bin/time -v "${CMD[@]}" 2> "$TIME_LOG"
    END=$(date +%s%N)
    WALL=$(awk -v s="$START" -v e="$END" 'BEGIN {printf "%.2f", (e - s) / 1e9}')
    # GNU time reports peak RSS in kilobytes; convert to megabytes
    PEAK_RSS=$(awk '/Maximum resident set size/ {printf "%.1f", $NF / 1024}' "$TIME_LOG")
    echo "$gen,$i,$WALL,$PEAK_RSS" >> "$LOG_FILE"
  done
done
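The resulting CSV can be reduced with the trim-and-average rule described earlier (drop the fastest and slowest of five runs, average the rest); a minimal sketch:

```python
import csv
import statistics
from collections import defaultdict

def trimmed_means(csv_path: str) -> dict:
    """Per generator: drop the fastest and slowest run, average the rest."""
    times = defaultdict(list)
    with open(csv_path, newline="") as f:
        for row in csv.DictReader(f):
            times[row["Generator"]].append(float(row["WallTime_s"]))
    means = {}
    for gen, vals in times.items():
        vals.sort()
        trimmed = vals[1:-1] if len(vals) > 2 else vals  # trim only with >2 samples
        means[gen] = round(statistics.mean(trimmed), 2)
    return means
```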
Common Pitfalls
Inconsistent cache states between test runs
Hugo caches processed images in resources/_gen. Astro uses .astro/ plus Vite's cache in node_modules/.vite/. Failing to clear these directories before cold-start tests lets a supposedly cold run behave like a warm one, understating true cold-build times.
Dev server versus production build confusion
Running hugo server or astro dev triggers HMR, file watchers, and unminified asset bundling. Benchmarks must exclusively use hugo and astro build to reflect actual deployment compilation times.
Uncontrolled plugin and integration overhead
Astro integrations and Hugo module pipelines execute during builds. Leaving them enabled without identical test datasets skews processing time disproportionately and masks core generator performance.
Thermal throttling and CPU governor variance
Laptops and unmanaged CI runners dynamically scale CPU frequency under sustained load. Pin CPU governor to performance mode. Enforce Docker resource limits for deterministic execution.
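On Linux the governor pin can be scripted; in this sketch the sysfs root is parameterized purely so the write loop can be exercised outside a privileged environment:

```shell
#!/bin/bash
# Pin every CPU core to the "performance" cpufreq governor (Linux, needs root).
# Pass an alternate root only when testing the loop itself.
set_performance_governor() {
  local root="${1:-/sys/devices/system/cpu}"
  local gov
  for gov in "$root"/cpu*/cpufreq/scaling_governor; do
    [ -e "$gov" ] || continue
    echo performance > "$gov"
  done
}
# Usage (as root): set_performance_governor
```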
FAQ
Should I benchmark cold or warm builds for production relevance?
Benchmark both. Cold builds reflect initial CI/CD deployment times and cache misses. Warm builds represent incremental content updates. Production relevance depends on deployment frequency and cache retention policies.
How do I normalize Astro's Vite cache versus Hugo's Go cache?
Delete .astro/, node_modules/.vite/, node_modules/.cache/, and resources/_gen/ before each cold test. For warm tests, preserve only the generator-specific cache directories and apply identical incremental content additions.
What CI runner specs guarantee reproducible results?
Use fixed-spec runners. Pin Docker base images. Enforce --cpus and --memory limits to prevent host-level resource contention. Avoid shared runners with unpredictable background workloads.
Does markdown frontmatter parsing affect comparative speeds?
Yes. Hugo uses Go's native TOML/YAML parsers; Astro relies on Node.js libraries. Standardize on YAML frontmatter only and disable schema validation during tests to isolate parsing overhead.