What Haunt can extract.
Haunt is strongest when you send a public URL plus a plain-English prompt and want structured JSON back. Use this guide as a current capability map, not a promise that every target will work.
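As a rough sketch of that flow, the Python snippet below builds (but does not send) a POST /v1/extract request. The base URL, the `url` and `prompt` body fields, and the bearer-token header are assumptions made for illustration; this guide does not confirm them, so check the API reference for the actual request shape.

```python
import json
import urllib.request

# Hypothetical base URL; replace with the real API host.
BASE_URL = "https://api.haunt.example"


def build_extract_request(target_url: str, prompt: str, api_key: str) -> urllib.request.Request:
    """Build a POST /v1/extract request without sending it.

    The "url" and "prompt" body fields and the Bearer auth header are
    assumed names, not confirmed API parameters.
    """
    body = json.dumps({"url": target_url, "prompt": prompt}).encode("utf-8")
    return urllib.request.Request(
        f"{BASE_URL}/v1/extract",
        data=body,
        method="POST",
        headers={
            "Content-Type": "application/json",
            "Authorization": f"Bearer {api_key}",
        },
    )


req = build_extract_request(
    "https://example.com/pricing",
    "List each plan name and its monthly price.",
    api_key="YOUR_API_KEY",
)
# To send it: urllib.request.urlopen(req), then json.load() the response body.
```

Building the request separately from sending it keeps the payload easy to inspect and test before any network call is made.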
Good fit.
Use Haunt where public or authorised content is visible enough to support evidence-backed extraction.
Company websites
Extract names, descriptions, positioning, pricing links, product text, metadata, and public business facts.
Docs and API pages
Turn public docs, API references, OpenAPI JSON, changelogs, and reference pages into structured JSON.
GitHub repositories
Repositories and their public metadata are a strong fit, both through dedicated API-backed routes and through normal extraction.
Product and pricing pages
Extract public plan names, prices, product lists, descriptions, and table-shaped content when present in the page.
JSON, XML, OpenAPI
Structured formats such as JSON, XML, and OpenAPI are among Haunt's strongest source categories.
Reddit and HN public routes
Use dedicated routes where public data paths are stable instead of forcing a generic browser scrape.
Sometimes useful, with boundaries.
| Target | What works | Boundary |
|---|---|---|
| LinkedIn company pages | Public metadata, title, description, company shell, sometimes website and employee signals. | No people scraping, and no promises for comments, members, logged-in views, or post history. |
| Instagram business profiles | OpenGraph/profile metadata when exposed publicly. | Metadata only. Not followers, comments, private media, or logged-in content. |
| Facebook business pages | Partial public page-card metadata when visible. | Not a reliable social feed scraper. |
| Heavy JavaScript apps | Sometimes works through bounded browser rendering. | Timeouts and thin shells are possible. |
Red lanes.
X/Twitter: unsupported for generic extraction because of heavy anti-bot and login-wall behaviour.
CAPTCHA and human verification: clean failure. Haunt is CAPTCHA-aware and returns an explicit human-verification error instead of guessing.
Private/login-only pages: Pro/Scale authorised extraction only. Use sessions, cookies, or headers you are allowed to provide.
Personal profiles, comments, groups, followers: not a product promise.
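One way to apply these lanes before sending a request is a small pre-flight check on the target host. The sketch below is a hypothetical client-side helper, not part of Haunt; the host lists are taken from the table and red lanes above, and the lane names are made up for illustration.

```python
from urllib.parse import urlparse

# Hosts taken from this guide; the lane labels are illustrative only.
BOUNDED_HOSTS = {"linkedin.com", "instagram.com", "facebook.com"}
RED_HOSTS = {"x.com", "twitter.com"}


def _matches(host: str, hosts: set) -> bool:
    """True if host is one of the listed domains or a subdomain of one."""
    return any(host == h or host.endswith("." + h) for h in hosts)


def classify_lane(target_url: str) -> str:
    """Return "red", "bounded", or "default" for a target URL."""
    host = (urlparse(target_url).hostname or "").lower()
    if _matches(host, RED_HOSTS):
        return "red"
    if _matches(host, BOUNDED_HOSTS):
        return "bounded"
    return "default"
```

A "default" result only means the host is not on either list. CAPTCHA walls, login-only pages, and thin JavaScript shells can still appear at request time, so callers should handle those failures as well.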
POST /v1/extract
GET /v1/github/repo?owner=browserbase&repo=stagehand
GET /v1/hackernews/item/40310896
POST /v1/company/enrich
POST /v1/reddit
POST /v1/reddit/comments
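To illustrate preferring a dedicated route over a generic scrape, the sketch below maps a target URL to one of the routes listed above. It covers only the GitHub and Hacker News routes, whose parameters are visible in the examples; the Reddit and company-enrich routes are left out because their request bodies are not described here. The helper name `dedicated_route` is hypothetical.

```python
from urllib.parse import parse_qs, urlencode, urlparse


def dedicated_route(target_url: str) -> tuple:
    """Return (method, path) for a dedicated route, else the generic extract route."""
    parsed = urlparse(target_url)
    host = (parsed.hostname or "").lower()
    parts = [p for p in parsed.path.split("/") if p]

    # github.com/<owner>/<repo> maps to the GitHub repo route.
    if host in ("github.com", "www.github.com") and len(parts) >= 2:
        query = urlencode({"owner": parts[0], "repo": parts[1]})
        return ("GET", f"/v1/github/repo?{query}")

    # news.ycombinator.com/item?id=<n> maps to the Hacker News item route.
    if host == "news.ycombinator.com" and parsed.path == "/item":
        item_id = parse_qs(parsed.query).get("id", [""])[0]
        if item_id.isdigit():
            return ("GET", f"/v1/hackernews/item/{item_id}")

    # Everything else falls back to generic extraction.
    return ("POST", "/v1/extract")
```

For example, `dedicated_route("https://github.com/browserbase/stagehand")` yields the same GitHub path shown in the route list, while an ordinary company page falls through to POST /v1/extract.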
Choose the job page that matches your visitor.
Use focused pages instead of making people decode the whole product.