Self-Hosting SearXNG
Every time you type into Google, you file a deposition. The query graph — what you searched, when, from where, in what order — is worth more than the answers you got back. You are not the customer. You are the corpus.
This weekend I moved my daily driver search to a self-hosted SearXNG instance at search.c0xl.ch. Here is the reasoning, the architecture, and the three mistakes I avoided.
"Self-hosted search engine" is three different projects
People conflate them. They shouldn't.
- Meta-search — a frontend that fans your query out to Google, Bing, Brave, DuckDuckGo, Mojeek, and twenty others in parallel, deduplicates, reranks, strips tracking. No crawler, no index. Tool: SearXNG.
- Own crawler + own index — you scrape the web and build a real index. Tool: YaCy. Honest assessment: result quality oscillates between mediocre and "why did I do this". Only defensible if sovereignty over the index itself is the point.
- Search over your own content — your blog, your wiki, your docs. Tool: Meilisearch or Typesense. Not a web search engine. A search service for your stack.
Most people saying "I want to self-host search" mean the first. They discover this about six months in, after trying the second and giving up. Save the six months.
Architecture
SearXNG itself is laughably small. One Python container, plus one Valkey instance (the BSD-licensed Redis fork, relevant since Redis's 2024 SSPL relicensing) for the rate limiter and the link-token mechanism that distinguishes browsers from bots. Put it behind Caddy with HSTS and strict security headers. Done.
The only architectural choice worth arguing about is the Docker network layout. The naive pattern — every service publishes a port on 127.0.0.1, Caddy proxies to `127.0.0.1:${SERVICE_PORT}` — works, and also means:
- Every service's port is a shared global variable.
- Nothing is actually network-isolated; everything sits on the default bridge with everything else.
- Caddy needs host-level network access to each service.
The correct pattern is one pre-created Docker network — call it `proxy-net` — that Caddy joins, and that every reverse-proxied service joins. No port publishing, no 127.0.0.1, no port conflicts. Caddy resolves services by container name through Docker's embedded DNS:
```
search.c0xl.ch {
    reverse_proxy searxng:8080
}
```
That is it. `searxng` is a hostname because Docker makes it one. Rename the container, update two lines. No hardcoded IPs live anywhere.
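For concreteness, here is a minimal compose sketch of that layout, assuming the official `searxng/searxng` and `valkey/valkey` images; service names, tags, and mounts are illustrative, and `proxy-net` is created once, out of band:

```yaml
# docker network create proxy-net   # run once, before compose up
services:
  caddy:
    image: caddy:2
    ports:
      - "80:80"
      - "443:443"        # Caddy is the only container that publishes ports
    volumes:
      - ./Caddyfile:/etc/caddy/Caddyfile:ro
    networks:
      - proxy-net

  searxng:
    image: searxng/searxng:latest
    volumes:
      - ./searxng:/etc/searxng   # settings.yml lives here
    networks:
      - proxy-net        # no ports: section; only Caddy can reach it

  valkey:
    image: valkey/valkey:8-alpine
    networks:
      - proxy-net

networks:
  proxy-net:
    external: true       # pre-created, shared with every proxied service
```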
Three things that will bite you
The limiter is not optional. Set `server.limiter: false` and a single crawler finding your instance will burn your upstream engines within hours — Google and Bing start returning CAPTCHAs, then 429s, then silence. The limiter plus the Valkey backend is what distinguishes "a private search instance" from "an involuntary open proxy."
The secret key is not optional either. SearXNG refuses to boot without a real `server.secret_key`. By design: the default value in the example config is a sentinel that crashes on startup, specifically so you cannot deploy without rotating it. Generate with `openssl rand -hex 32`. Paste. Move on.
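Both knobs live in the same settings.yml. A fragment, assuming a recent SearXNG release that takes the `valkey:` block (older releases spelled it `redis:`) and the container names from the compose sketch above:

```yaml
# /etc/searxng/settings.yml (fragment)
server:
  limiter: true          # never false on anything internet-adjacent
  secret_key: "paste the output of openssl rand -hex 32 here"

valkey:
  # limiter state and link tokens live here; "valkey" resolves via Docker DNS
  url: valkey://valkey:6379/0
```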
Publishing the instance publicly is almost always wrong. A public SearXNG instance requires exit-IP rotation, aggressive rate limiting, bot detection tuned past the defaults, and someone to babysit when Google blocks your egress. A private instance, reachable only through your WireGuard mesh or behind an `@allowed remote_ip` block in Caddy, requires none of that. Decide deliberately. There is no "just put it online" option that survives six months of unattended operation.
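The Caddy variant of that lockdown, assuming a hypothetical WireGuard subnet of 10.8.0.0/24 (yours will differ):

```
search.c0xl.ch {
    @outside not remote_ip 10.8.0.0/24
    abort @outside
    reverse_proxy searxng:8080
}
```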
Engines worth enabling
The defaults are serviceable. Three additions pay back immediately:
- Brave as a general engine. Independent index, doesn't rate-limit the way Google does.
- A Linux cluster — Arch Wiki, Gentoo Wiki, Stack Exchange (`stackoverflow`, `unix`, `askubuntu`), `man`, GitHub. Bangs: `!al`, `!ge`, `!st`, `!ux`, `!man`, `!gh`.
- Your own content via `json_engine`. If your blog exposes a JSON search API, SearXNG can fold your own writing into the result set (a sketch follows this list).
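The `json_engine` entry is a few lines of settings.yml. This sketch invents the endpoint and the JSON field names (`url`, `title`, `excerpt`); map them to whatever your blog's API actually returns:

```yaml
engines:
  - name: my blog              # hypothetical personal-site engine
    engine: json_engine
    shortcut: blog
    search_url: https://c0xl.ch/api/search?q={query}
    url_query: url             # JSON key holding the result link
    title_query: title
    content_query: excerpt
    categories: general
```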
Firefox integration is an OpenSearch descriptor. Add `https://search.c0xl.ch/search?q=%s` as a custom engine, assign the keyword `c0`, and every address-bar query routes through your own infrastructure.
What this actually defends against
Be honest about the threat model. Self-hosted search does not hide you from Google if you log into Gmail five minutes later. It does not anonymize you from a TLS-inspecting corporate proxy. What it does:
- Breaks the session graph. Google sees one egress IP making many queries on behalf of many things. The semantic clustering that makes ad targeting work falls apart.
- Removes a third party from your tool-calling LLM stack. If you run local inference and want it to search, `search.c0xl.ch/search?q=...&format=json` beats routing through Tavily or SerpAPI on privacy, latency, and — at this scale — cost (one settings catch, shown below).
- Forces you to understand your own query patterns. You become the operator, not the product.
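That catch: a stock instance refuses `format=json` requests until the JSON output format is enabled in settings.yml:

```yaml
search:
  formats:
    - html
    - json    # required for &format=json; html stays for the browser UI
```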
The last one is underrated. After a week of operating your own search you notice which upstream engines return useful results and which return SEO slurry. You notice how many of your queries are documentation lookups that never needed a general web search. You start curating. The feedback loop runs in the correct direction for once.