Edge Crawler & Indexer
Automatically warm your edge cache and build a structural index of your pages for AI features.
The Edge Crawler & Indexer is an asynchronous background system powered by Bastio AI (Firecrawl). It systematically navigates your website through the SerpWise Edge Proxy to accomplish two major goals:
- Cache Warming: By visiting your URLs, the crawler triggers a "Cache Miss" on the first hit, causing SerpWise to run your rules and store the final HTML in memory. When real visitors or Googlebot arrive later, they get instant "Cache Hits".
- Page Indexing: The crawler extracts clean, LLM-ready Markdown, raw HTML, and Meta Titles from every page it visits. This structured index powers future SerpWise features (like AI Product Feeds, automatic XML Sitemap generation, and cross-site context for Gemini).
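The cache-warming idea above can be sketched in a few lines: visit each URL once so the edge renders and stores it, and record which visits succeeded. This is an illustrative Python sketch, not SerpWise code; the `fetch` callable is a stand-in for any real HTTP client (e.g. a `requests.get(...).status_code` wrapper).

```python
from typing import Callable, Iterable


def warm_cache(urls: Iterable[str], fetch: Callable[[str], int]) -> dict:
    """Visit each URL once; the first (uncached) visit makes the edge
    render the page and store the final HTML for later hits.

    `fetch` returns an HTTP status code; here it is injected so the
    sketch stays self-contained.
    """
    results = {"warmed": [], "failed": []}
    for url in urls:
        status = fetch(url)
        bucket = "warmed" if 200 <= status < 400 else "failed"
        results[bucket].append(url)
    return results


# Stub fetcher standing in for real HTTP requests.
stub = lambda url: 200 if "/products/" in url else 404
report = warm_cache(
    ["https://example.com/products/a", "https://example.com/missing"], stub
)
print(report["warmed"])  # → ['https://example.com/products/a']
```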
Credit Usage: The Edge Crawler consumes 1 Credit per successfully crawled and indexed page.
Setting Up Crawlers
You can configure multiple independent crawlers for a single domain (e.g., a "Daily Products Crawl" and a "Weekly Blog Crawl").
- Navigate to your Domain Dashboard and click the Edge Crawler tab.
- Click Add Crawler.
Configuration Options
- Name: A friendly identifier (e.g., "XML Sitemap Warmer").
- Schedule: Choose how often this crawler should run automatically (`Daily`, `Weekly`, `Monthly`, or `Manual Only`).
- Start URL: The entry point for the crawler. We highly recommend using an XML Sitemap URL (e.g., `https://example.com/sitemap_products.xml`), as this is the most efficient way to discover all your important pages. You can also use your homepage (`/`).
- Max Pages Limit: A safety ceiling. The crawler will stop once it hits this many pages, preventing you from accidentally burning through all your credits on an infinitely deep website.
- Max Depth: How many "clicks" deep from the Start URL the crawler should go. If you use a Sitemap, a Depth of `1` is usually sufficient.
- Include / Exclude Paths: You can restrict the crawler to only visit specific sections of your site (e.g., Include `*/products/*`, Exclude `*/checkout/*`).
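The Include / Exclude patterns above behave like shell-style globs. The exact matching rules SerpWise applies aren't documented here, so this is a minimal sketch of the usual semantics (exclusions win; an empty include list means "everything"), using Python's standard `fnmatch` module:

```python
from fnmatch import fnmatchcase


def should_crawl(path: str, include: list, exclude: list) -> bool:
    """Decide whether a URL path passes glob-style include/exclude filters.

    Semantics assumed for illustration: any exclude match rejects the path;
    with no include patterns, everything not excluded is allowed.
    """
    if any(fnmatchcase(path, pat) for pat in exclude):
        return False
    return not include or any(fnmatchcase(path, pat) for pat in include)


print(should_crawl("/products/shoe-42", ["*/products/*"], ["*/checkout/*"]))  # → True
print(should_crawl("/checkout/pay", ["*/products/*"], ["*/checkout/*"]))      # → False
```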
Manual Execution
Even if a crawler is on a schedule, you can trigger it instantly at any time:
- Find the crawler in the Saved Crawlers table.
- Click the Run Now (Play) button.
- A banner will appear indicating the crawler is running, and you'll see the "Pages" count update in real time as webhooks arrive.
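The live "Pages" counter is driven by webhook events arriving as the crawl progresses. As a sketch of that pattern, here is how a client could fold incoming events into a local progress view. The payload shape (`event`, `url` fields) is a hypothetical illustration, not the documented SerpWise schema:

```python
def apply_crawl_webhook(state: dict, payload: dict) -> dict:
    """Fold one webhook event into a local crawl-progress view.

    Hypothetical event names for illustration:
      - "page.indexed":   one more page was crawled and saved
      - "crawl.completed": the run finished
    """
    event = payload.get("event")
    if event == "page.indexed":
        state["pages"] = state.get("pages", 0) + 1
    elif event == "crawl.completed":
        state["running"] = False
    return state


state = {"running": True, "pages": 0}
state = apply_crawl_webhook(state, {"event": "page.indexed", "url": "/products/a"})
state = apply_crawl_webhook(state, {"event": "crawl.completed"})
print(state)  # → {'running': False, 'pages': 1}
```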
The Page Index
Below the crawler configurations, you will find the Indexed Pages table. This is a unified view of every URL the crawler has successfully discovered, fetched, and saved to the database.
It displays:
- The URL Path
- The final HTTP Status Code (returned after SerpWise rules are applied)
- The extracted `<title>` tag
- The relative time since it was last updated
When SerpWise generates automated XML sitemaps or AI-powered Meta Suggestions, it queries this exact database table instead of re-scraping your site, saving you credits and processing time.
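To make the "query the index instead of re-scraping" idea concrete, here is a self-contained sketch that builds an XML sitemap from an indexed-pages table. The SQLite schema and column names are illustrative assumptions, not the actual SerpWise database:

```python
import sqlite3
from xml.sax.saxutils import escape

# Hypothetical local mirror of the indexed-pages table; the column names
# (path, status, title) are illustrative, not the real SerpWise schema.
db = sqlite3.connect(":memory:")
db.execute("CREATE TABLE indexed_pages (path TEXT, status INTEGER, title TEXT)")
db.executemany(
    "INSERT INTO indexed_pages VALUES (?, ?, ?)",
    [("/", 200, "Home"), ("/products/a", 200, "Product A"), ("/gone", 404, "")],
)


def build_sitemap(conn, origin: str) -> str:
    """Emit a sitemaps.org urlset from the index, skipping error pages."""
    rows = conn.execute(
        "SELECT path FROM indexed_pages WHERE status < 400 ORDER BY path"
    )
    urls = "".join(
        f"<url><loc>{escape(origin + path)}</loc></url>" for (path,) in rows
    )
    return (
        '<?xml version="1.0" encoding="UTF-8"?>'
        '<urlset xmlns="http://www.sitemaps.org/schemas/sitemap/0.9">'
        f"{urls}</urlset>"
    )


xml = build_sitemap(db, "https://example.com")
print("/gone" not in xml)  # → True
```

No page is fetched again: everything the sitemap needs is already in the table, which is exactly why reusing the index saves credits.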
What Crawling Enables
The page index built by the Edge Crawler powers a wide range of URL Intelligence features:
SEO Audit Engine
Automated 50+ point checks run on every crawled page — scoring, categories, and actionable fixes.
Content Analysis
Word count, readability, keyword extraction, and duplicate detection for every indexed page.
Internal Link Graph
Map internal link structure, find orphan pages, and measure link depth.
Change Detection
Detect SEO-critical changes between crawls — title, canonical, robots, and content modifications.
Sitemaps
Auto-generate XML and HTML sitemaps from your crawled page index.
Product Feeds
Generate product feeds from pages with Product schema markup.