Serpwise AI

A technical look at how the Serpwise gateway intercepts requests, evaluates rules, and modifies responses.

The Request Pipeline

Every request that flows through Serpwise follows the same pipeline inside the gateway. Understanding this pipeline helps you write effective rules and diagnose unexpected behavior.

Incoming Request
      │
      ▼
1. Domain Resolution     ← Identify which domain config to load
      │
      ▼
2. Shield Check          ← Block exploit paths, check IP blacklist
      │
      ▼
3. Bot Detection         ← Identify crawlers and AI bots
      │
      ▼
4. Redirect Check        ← Short-circuit with redirect if matched
      │
      ▼
5. Origin Fetch          ← Forward request to your origin server
      │
      ▼
6. Rule Evaluation       ← Match rules against request + response
      │
      ▼
7. HTML Modification     ← Apply matched rule actions to the HTML
      │
      ▼
8. Response Return       ← Send modified response to visitor/CDN

Non-HTML responses (images, fonts, JSON API responses, videos) are streamed directly from origin without modification; the gateway only modifies HTML documents.
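The pipeline above can be sketched as a chain of stages in which the early stages may annotate the request or answer it outright. This is a minimal Python sketch, not the gateway's implementation. `Response`, `handle`, and the stage callables are hypothetical. It also assumes that non-HTML responses skip rule evaluation, although the text above only states that they are not modified.

```python
from dataclasses import dataclass, field
from typing import Callable, Optional


@dataclass
class Response:
    status: int
    headers: dict = field(default_factory=dict)
    body: str = ""


# Hypothetical stage contract: annotate the request dict in place and return None
# to continue, or return a Response to short-circuit (e.g. a shield block or redirect).
PreStage = Callable[[dict], Optional[Response]]


def handle(request: dict,
           pre_stages: list[PreStage],
           fetch_origin: Callable[[dict], Response],
           apply_rules: Callable[[dict, Response], Response]) -> Response:
    # Stages 1-4: domain resolution, shield, bot detection, redirect check.
    for stage in pre_stages:
        early = stage(request)
        if early is not None:
            return early  # origin is never contacted
    # Stage 5: origin fetch.
    response = fetch_origin(request)
    # Non-HTML responses pass straight through (assumed to skip rule evaluation too).
    content_type = response.headers.get("Content-Type", "").lower()
    if not content_type.startswith("text/html"):
        return response
    # Stages 6-7: rule evaluation and HTML modification.
    return apply_rules(request, response)
```

Placing the short-circuiting stages ahead of the origin fetch is what lets blocked requests and redirects avoid an origin round trip.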

Domain Resolution

When a request arrives, the gateway identifies which domain configuration to load. It matches against either:

The Host header (e.g. www.example.com) — used in production with CDN routing
The proxy subdomain (e.g. abc123.edge.serpwise.ai) — used during setup and testing

Domain configurations are cached in memory inside the gateway. When you save changes in the dashboard, Serpwise automatically invalidates the cache for that domain so the next request picks up the new config.
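Here is a minimal sketch of the two lookup paths. It assumes configurations are keyed by production hostname and by proxy subdomain label. `resolve_domain`, the two dictionaries, and the port-stripping step are illustrative assumptions. Only the `.edge.serpwise.ai` suffix comes from the example above.

```python
from typing import Dict, Optional

PROXY_SUFFIX = ".edge.serpwise.ai"  # from the proxy subdomain example above


def resolve_domain(host_header: str,
                   by_hostname: Dict[str, dict],
                   by_proxy_id: Dict[str, dict]) -> Optional[dict]:
    # Normalize: drop any port and lowercase ("WWW.Example.com:443" -> "www.example.com").
    # Note: this simple split does not handle bracketed IPv6 literals.
    host = host_header.split(":", 1)[0].strip().lower()
    if host.endswith(PROXY_SUFFIX):
        # Setup/testing traffic: "abc123.edge.serpwise.ai" -> "abc123"
        return by_proxy_id.get(host[: -len(PROXY_SUFFIX)])
    # Production traffic routed through a CDN keeps the real Host header.
    return by_hostname.get(host)
```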

Shield Check

Before touching the origin, the Serpwise Shield evaluates the incoming request path against a list of known exploit patterns (.env, wp-config.php, xmlrpc.php, etc.).

If a match is found:

The request is blocked immediately (no origin hit)
The source IP is added to an automatic blackhole for 24 hours
All subsequent requests from that IP return a 403 during the ban period
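The shield behavior can be sketched as a path check plus a per-IP ban table. The sketch rests on three assumptions. Patterns are matched against whole path segments. The triggering request also gets a 403, although the text only says it is blocked. `Shield` and `check` are hypothetical names. An injectable clock keeps the 24-hour window testable.

```python
import time
from typing import Callable, Dict, Optional

# Example patterns from the list above; the real list is longer.
EXPLOIT_SEGMENTS = (".env", "wp-config.php", "xmlrpc.php")
BAN_SECONDS = 24 * 60 * 60  # 24-hour blackhole


class Shield:
    def __init__(self, clock: Callable[[], float] = time.monotonic):
        self._clock = clock
        self._banned_until: Dict[str, float] = {}

    def check(self, ip: str, path: str) -> Optional[int]:
        """Return a status code to block with, or None to let the request continue."""
        now = self._clock()
        until = self._banned_until.get(ip)
        if until is not None:
            if now < until:
                return 403  # still inside the ban window
            del self._banned_until[ip]  # ban expired; forget it
        # Ignore the query string and compare whole path segments (assumption).
        segments = path.split("?", 1)[0].lower().split("/")
        if any(pattern in segments for pattern in EXPLOIT_SEGMENTS):
            self._banned_until[ip] = now + BAN_SECONDS
            return 403  # assumed status for the triggering request
        return None
```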

Bot Detection

The gateway maintains a registry of 50+ known crawlers and bots. Each request's User-Agent header is matched against this registry to determine:

Whether the request is from a bot
Which bot it is (e.g. Googlebot, Bingbot, GPTBot, ClaudeBot)
Whether it's an AI training crawler

This information is available as rule conditions (request.is_bot, request.bot_name) and is recorded in request logs for analytics.
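Here is a minimal sketch of registry matching by User-Agent substring. The four entries and their AI-training flags are illustrative. The result fields mirror the `request.is_bot` and `request.bot_name` conditions, plus a hypothetical `is_ai_training` flag. The real registry has 50+ entries, and its matching rules are not documented here.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class BotMatch:
    is_bot: bool
    bot_name: Optional[str] = None
    is_ai_training: bool = False


# Illustrative subset: (lowercase User-Agent token, display name, AI training crawler?)
BOT_REGISTRY = (
    ("googlebot", "Googlebot", False),
    ("bingbot", "Bingbot", False),
    ("gptbot", "GPTBot", True),
    ("claudebot", "ClaudeBot", True),
)


def detect_bot(user_agent: str) -> BotMatch:
    ua = user_agent.lower()
    for token, name, ai_training in BOT_REGISTRY:
        if token in ua:
            return BotMatch(True, name, ai_training)
    return BotMatch(False)
```

A User-Agent string is self-reported and easy to spoof. This sketch therefore identifies what a client claims to be rather than verifying it.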

Redirect Check

Before fetching from origin, the gateway checks the request path against your configured redirects. Two tiers of matching are evaluated in order:

Exact match — O(1) hash map lookup. If the full path matches a redirect, the gateway returns the redirect response immediately without contacting origin.
Regex match — Evaluated in priority order. The first matching regex pattern wins, and capture groups can be used in the destination URL.

Because a matched redirect is returned without contacting origin, redirects add minimal latency overhead.
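The two-tier lookup can be sketched with a dictionary for exact paths and a priority-sorted list of compiled patterns. The sketch assumes the following:

- Lower priority numbers win.
- Patterns must match the whole path.
- Destinations reference capture groups with Python's `\1` syntax via `re.Match.expand`.

`RedirectTable` is a hypothetical name.

```python
import re
from typing import Dict, List, Optional, Tuple

Redirect = Tuple[str, int]  # (destination, HTTP status)


class RedirectTable:
    def __init__(self, exact: Dict[str, Redirect],
                 patterns: List[Tuple[int, str, str, int]]):
        # Tier 1: exact paths in a dict, so lookup is O(1) on average.
        self._exact = exact
        # Tier 2: (priority, regex, destination, status), sorted so lower numbers are tried first.
        self._patterns = [(re.compile(rx), dest, status)
                          for _, rx, dest, status in sorted(patterns, key=lambda p: p[0])]

    def match(self, path: str) -> Optional[Redirect]:
        hit = self._exact.get(path)
        if hit is not None:
            return hit
        for rx, dest, status in self._patterns:
            m = rx.fullmatch(path)
            if m:
                # Substitute capture groups (\1, \g<name>) into the destination.
                return m.expand(dest), status
        return None  # no redirect: continue to the origin fetch
```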

Origin Fetch

If no redirect matches, the gateway forwards the request to your configured origin server. The origin connection supports:

HTTP and HTTPS protocols
Custom Host header
Configurable port
TLS/SSL verification toggle (useful for self-signed certificates in staging environments)
Configurable connection timeout

The gateway sends the original request headers to origin, including User-Agent, so your origin can distinguish bots from humans if needed.
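The origin options above map naturally onto Python's standard library. This sketch uses `urllib.request` and `ssl`. `OriginConfig` and its field names are hypothetical. The sketch omits details a production proxy needs, such as stripping hop-by-hop headers.

```python
import ssl
import urllib.error
import urllib.request
from dataclasses import dataclass
from typing import Dict, Optional


@dataclass
class OriginConfig:
    scheme: str = "https"              # "http" or "https"
    host: str = "origin.example.com"
    port: Optional[int] = None         # None uses the scheme's default port
    host_header: Optional[str] = None  # custom Host header sent to origin
    verify_tls: bool = True            # False allows self-signed certs (staging)
    timeout_s: float = 10.0            # seconds allowed for connecting and reading


def build_origin_request(cfg: OriginConfig, path: str, method: str,
                         headers: Dict[str, str]) -> urllib.request.Request:
    netloc = cfg.host if cfg.port is None else f"{cfg.host}:{cfg.port}"
    req = urllib.request.Request(f"{cfg.scheme}://{netloc}{path}", method=method)
    for name, value in headers.items():
        if name.lower() == "host":
            continue  # Host is set below or derived from the origin URL
        req.add_header(name, value)  # forwards User-Agent unchanged
    if cfg.host_header:
        req.add_header("Host", cfg.host_header)
    return req


def tls_context(cfg: OriginConfig) -> ssl.SSLContext:
    ctx = ssl.create_default_context()
    if not cfg.verify_tls:
        ctx.check_hostname = False  # must be disabled before CERT_NONE
        ctx.verify_mode = ssl.CERT_NONE
    return ctx


def fetch(cfg: OriginConfig, path: str, method: str = "GET",
          headers: Optional[Dict[str, str]] = None):
    req = build_origin_request(cfg, path, method, headers or {})
    ctx = tls_context(cfg) if cfg.scheme == "https" else None
    try:
        return urllib.request.urlopen(req, timeout=cfg.timeout_s, context=ctx)
    except urllib.error.HTTPError as err:
        return err  # 4xx/5xx responses are still responses; pass them through
```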

Rule Evaluation

After receiving the origin response, the gateway evaluates your enabled rules (ordered by priority) against the combined context of the request and response.

A rule matches when its condition group evaluates to true. Conditions can check:

Request properties (URL path, query parameters, method, headers, user agent, bot identity)
Response properties (status code, content type, response headers, body content)

Multiple conditions within a rule can be combined with AND or OR logic, and condition groups can be nested.

Only rules whose conditions evaluate to true are applied. If several rules match, all of them are applied in priority order.
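Nested AND/OR condition groups evaluate naturally as a recursive tree walk. The following names in this sketch are hypothetical: `Condition`, `Group`, `Rule`, the operator names, and the flat context keys (such as `request.path`). The sketch also assumes that a lower priority number runs first.

```python
from dataclasses import dataclass, field
from typing import Any, Callable, Dict, List, Union

# Hypothetical operators: (actual value from context, value configured in the rule) -> bool
OPS: Dict[str, Callable[[Any, Any], bool]] = {
    "eq": lambda actual, expected: actual == expected,
    "contains": lambda actual, expected: isinstance(actual, str) and expected in actual,
    "startswith": lambda actual, expected: isinstance(actual, str) and actual.startswith(expected),
}


@dataclass
class Condition:
    key: str    # e.g. "request.path", "response.status"
    op: str
    value: Any


@dataclass
class Group:
    logic: str  # "AND" or "OR"
    items: List[Union[Condition, "Group"]] = field(default_factory=list)


@dataclass
class Rule:
    name: str
    priority: int
    enabled: bool
    when: Group


def evaluate(node: Union[Condition, Group], ctx: Dict[str, Any]) -> bool:
    if isinstance(node, Condition):
        return OPS[node.op](ctx.get(node.key), node.value)
    results = (evaluate(child, ctx) for child in node.items)
    # all()/any() short-circuit, so later children are skipped once the outcome is known.
    return all(results) if node.logic == "AND" else any(results)


def matching_rules(rules: List[Rule], ctx: Dict[str, Any]) -> List[Rule]:
    """Every enabled rule whose condition group is true, in priority order."""
    ordered = sorted((r for r in rules if r.enabled), key=lambda r: r.priority)
    return [r for r in ordered if evaluate(r.when, ctx)]
```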

HTML Modification

The gateway uses an HTML parser to apply rule actions to the response body. The parser operates on the full HTML document, so actions can target any element:

Head modifications — meta tags, link elements, and scripts are inserted into the <head>
Body modifications — HTML snippets can be injected at the start or end of <body>
Element removal — any element matching a CSS selector can be removed
Find & replace — text substitution across the entire document, with optional regex support
Structured data — JSON-LD scripts are parsed and can be injected or modified

After all matching rules are applied, the modified HTML is returned to the visitor.
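The gateway uses a real HTML parser. The sketch below substitutes plain regular expressions only to show where snippets land and how the optional regex toggle for find & replace behaves. It is illustrative only: regexes can match inside comments or scripts, and the sketch does not cover CSS-selector removal. All function names are hypothetical.

```python
import json
import re


def inject_head(html: str, snippet: str) -> str:
    # Insert just before the first </head> (case-insensitive).
    # A function replacement keeps backslashes in the snippet literal.
    return re.sub(r"</head\s*>", lambda m: snippet + m.group(0), html, count=1, flags=re.IGNORECASE)


def inject_body(html: str, snippet: str, at: str = "end") -> str:
    if at == "start":
        # Insert right after the opening <body ...> tag.
        return re.sub(r"<body\b[^>]*>", lambda m: m.group(0) + snippet, html, count=1, flags=re.IGNORECASE)
    return re.sub(r"</body\s*>", lambda m: snippet + m.group(0), html, count=1, flags=re.IGNORECASE)


def find_replace(html: str, find: str, replace: str, use_regex: bool = False) -> str:
    # Plain text substitution by default; regex mode supports \1-style group references.
    return re.sub(find, replace, html) if use_regex else html.replace(find, replace)


def jsonld_script(data: dict) -> str:
    # Escape "<" so a value containing "</script>" cannot close the tag early.
    payload = json.dumps(data).replace("<", "\\u003c")
    return f'<script type="application/ld+json">{payload}</script>'
```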

Gateway Cache

Domain configurations (rules, redirect lists, origin settings, security settings) are loaded from our central database and cached locally for performance.

Cache invalidation happens automatically: whenever you save changes in the dashboard, the affected domain's cache is refreshed instantly. The next incoming request uses the updated configuration.

The in-memory cache means the gateway never queries the database on the hot path. All per-request decisions are made from in-memory data, keeping latency minimal.
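The section describes two properties: instant refresh on save, and no database queries on the hot path. One way to honor both is to reload a domain's configuration eagerly when it is invalidated and to serve requests only from memory. This is a sketch under that assumption. `ConfigCache`, `refresh`, `warm`, and the loader callback are hypothetical, and the source does not say how the gateway is notified of a save.

```python
import threading
from typing import Any, Callable, Dict, Iterable, Optional


class ConfigCache:
    def __init__(self, load_from_db: Callable[[str], Dict[str, Any]]):
        self._load = load_from_db            # hypothetical central-database loader
        self._configs: Dict[str, Dict[str, Any]] = {}
        self._lock = threading.Lock()

    def refresh(self, domain: str) -> None:
        """Invalidation hook, e.g. called after a dashboard save for this domain."""
        fresh = self._load(domain)           # the database read happens here, off the request path
        with self._lock:
            self._configs[domain] = fresh    # swap in the new config

    def warm(self, domains: Iterable[str]) -> None:
        """Preload configs (e.g. at startup) so requests never wait on the database."""
        for domain in domains:
            self.refresh(domain)

    def get(self, domain: str) -> Optional[Dict[str, Any]]:
        # Hot path: a dictionary read only; an unknown domain returns None instead of querying.
        return self._configs.get(domain)
```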
