88/100: structured extraction pass rate in the current 100-case production benchmark.
What Haunt can extract.
Haunt is strongest when you send a public URL plus a plain-English prompt and want structured JSON back. The current production benchmark showed 88/100 structured extraction cases passing after hardening.
That number is useful because it has edges. Haunt is good at normal public web pages, docs, APIs, product pages, metadata, GitHub repositories, and dedicated Reddit/HN-style public data paths. It is not a way around login walls or CAPTCHAs, and it is not a people-scraping tool for X/Twitter, Facebook, Instagram, or LinkedIn.
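As a rough illustration of the "public URL plus plain-English prompt" call shape, the sketch below builds a request for the /v1/extract route without sending it. The base URL, the Bearer auth header, and the `url` and `prompt` field names are assumptions made for this example, not Haunt's documented API, and the POST method is assumed as well; check the real reference before relying on any of them.

```python
import json
import urllib.request
from urllib.parse import urlparse

# Hypothetical values: the base URL, auth scheme, and JSON field names are
# assumptions for illustration, not taken from Haunt's documentation.
BASE_URL = "https://api.haunt.example"


def build_extract_request(url: str, prompt: str, api_key: str) -> urllib.request.Request:
    """Build (but do not send) a request for the /v1/extract route."""
    # Only plain http(s) URLs make sense as public extraction targets.
    if urlparse(url).scheme not in ("http", "https"):
        raise ValueError("expected a public http(s) URL")
    body = json.dumps({"url": url, "prompt": prompt}).encode("utf-8")
    return urllib.request.Request(
        BASE_URL + "/v1/extract",
        data=body,
        headers={
            "Content-Type": "application/json",
            "Authorization": f"Bearer {api_key}",  # assumed auth scheme
        },
        method="POST",  # assumed method
    )


req = build_extract_request(
    "https://example.com/pricing",
    "Return plan names and monthly prices as a JSON list.",
    api_key="YOUR_KEY",
)
```

Sending the request would be a matter of passing `req` to `urllib.request.urlopen` and parsing the JSON response, once the real endpoint and field names are confirmed.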
Average latency in the benchmark. Heavy pages can still take longer.
Observed P95 latency. This is an API, not a fake instant oracle.
Green lanes
Company websites
Extract names, descriptions, positioning, pricing links, product text, metadata, and public business facts.
Docs and API pages
Turn public docs, API pages, OpenAPI JSON, changelogs, and reference pages into structured JSON.
GitHub repositories
Public repositories and their metadata are a strong fit, through both dedicated API-backed routes and normal extraction.
Product and pricing pages
Extract public plan names, prices, product lists, descriptions, and table-shaped content when present in the page.
JSON, XML, OpenAPI
Structured source formats are among Haunt's strongest benchmark categories.
Reddit and HN public routes
Use dedicated routes where public data paths are stable instead of forcing a generic browser scrape.
Yellow lanes
| Target | What works | Boundary |
|---|---|---|
| LinkedIn company pages | Public metadata, title, description, company shell, sometimes website and employee signals. | No promises on people scraping, comments, members, logged-in views, or post history. |
| Instagram business profiles | OpenGraph/profile metadata when exposed publicly. | Metadata only. Not followers, comments, private media, or logged-in content. |
| Facebook business pages | Partial public page-card metadata when visible. | Not a reliable social feed scraper. |
| Heavy JavaScript apps | Sometimes works through browser rendering. | Timeouts and thin shells are possible. |
Red lanes
| Target | Verdict | Reason |
|---|---|---|
| X/Twitter | Unsupported for generic extraction | Heavy anti-bot/login-wall behavior. Competitors often special-case this with paid data APIs. |
| CAPTCHA and human verification | Clean failure | Haunt is CAPTCHA-aware and does not bypass human-verification challenges. |
| Private/login-only pages | Pro/Scale authorised extraction only | Only use sessions, cookies, or headers you are allowed to provide. |
| Personal profiles, comments, groups, followers | Not a product promise | Risky, unreliable, and not what Haunt is built for. |
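The yellow and red tables can be turned into a cheap pre-flight check before spending a request. The sketch below is a client-side helper, not part of Haunt: the function name `lane_hint` and the host lists are assumptions derived from the tables above, and a hostname alone cannot detect CAPTCHAs, login walls, heavy JavaScript apps, or personal profiles, so "yellow" means "partial metadata at best" and "other" means "not ruled out by host".

```python
from urllib.parse import urlparse

# Host lists derived from the yellow and red tables; hypothetical helper,
# not a Haunt feature.
RED_HOSTS = ("x.com", "twitter.com")
YELLOW_HOSTS = ("linkedin.com", "instagram.com", "facebook.com")


def _matches(host: str, domains: tuple) -> bool:
    # Match the domain itself or any subdomain of it (e.g. m.facebook.com).
    return any(host == d or host.endswith("." + d) for d in domains)


def lane_hint(url: str) -> str:
    """Return 'red', 'yellow', or 'other' based on the URL's hostname only."""
    host = (urlparse(url).hostname or "").lower()
    if _matches(host, RED_HOSTS):
        return "red"
    if _matches(host, YELLOW_HOSTS):
        return "yellow"
    return "other"


for candidate in (
    "https://x.com/someaccount",
    "https://www.linkedin.com/company/acme",
    "https://example.com/docs",
):
    print(candidate, "->", lane_hint(candidate))
```

A "red" hint is a reason to skip the call entirely; a "yellow" hint is a reason to ask only for public metadata.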
Use the right route
For ordinary pages, use /v1/extract. For low-cost lookups of public data, prefer a dedicated route when one is available:
GET /v1/github/repo?owner=browserbase&repo=stagehand
GET /v1/hackernews/item/40310896
POST /v1/company/enrich
POST /v1/reddit
POST /v1/reddit/comments
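One way to apply "use the right route" in client code is to pick the route from the target URL before calling. The sketch below is a hypothetical helper (`pick_route` is not a Haunt function) that maps a URL to one of the routes listed above and falls back to /v1/extract. The POST method for /v1/extract and the rule that Reddit thread URLs go to /v1/reddit/comments are assumptions; request bodies are not shown because their shape is not given here, and /v1/company/enrich is left out for the same reason.

```python
from urllib.parse import parse_qs, urlencode, urlparse


def pick_route(url: str) -> tuple:
    """Return (method, path) for the most specific route matching the URL."""
    parts = urlparse(url)
    host = (parts.hostname or "").lower()
    if host.startswith("www."):
        host = host[4:]
    segments = [s for s in parts.path.split("/") if s]

    # github.com/<owner>/<repo> -> dedicated repo route
    if host == "github.com" and len(segments) >= 2:
        query = urlencode({"owner": segments[0], "repo": segments[1]})
        return ("GET", "/v1/github/repo?" + query)

    # news.ycombinator.com/item?id=<n> -> dedicated Hacker News item route
    if host == "news.ycombinator.com" and parts.path == "/item":
        item_id = parse_qs(parts.query).get("id", [""])[0]
        if item_id.isdigit():
            return ("GET", "/v1/hackernews/item/" + item_id)

    # Reddit: thread URLs contain a "comments" path segment (assumed mapping)
    if host == "reddit.com" or host.endswith(".reddit.com"):
        if "comments" in segments:
            return ("POST", "/v1/reddit/comments")
        return ("POST", "/v1/reddit")

    # Everything else goes through generic extraction (method assumed)
    return ("POST", "/v1/extract")


print(pick_route("https://github.com/browserbase/stagehand"))
print(pick_route("https://news.ycombinator.com/item?id=40310896"))
print(pick_route("https://example.com/pricing"))
```

Keeping the mapping in one pure function makes it easy to unit-test and to extend if more dedicated routes appear.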
Honest limit: Haunt should be sold as structured extraction from public and authorised pages, not broad social scraping. That honesty is a feature, not a confession. Agent builders should start with the focused AI agent web extraction path, which is the shortest route to a demo, a signup, and a first call.
Ready to try it?
Start with the fixed demo response, then get a free key and make one safe first extraction.