capability proof

What Haunt can extract.

Haunt is strongest when you send a public URL plus a plain-English prompt and want structured JSON back. Use this guide as a current capability map, not a promise that every target will work.
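Here is a minimal sketch of that request shape in Python, using only the standard library. Several details are assumptions for illustration: the base URL `https://api.haunt.example`, the Bearer `Authorization` header, and the JSON field names `url` and `prompt`. Check the API reference for the real values. The helper `build_extract_request` is hypothetical, and it builds the request without sending it.

```python
import json
import urllib.request

# Hypothetical values: this guide does not document the host or auth scheme.
BASE_URL = "https://api.haunt.example"
API_KEY = "YOUR_API_KEY"


def build_extract_request(url: str, prompt: str) -> urllib.request.Request:
    """Build (but do not send) a POST /v1/extract request.

    The body field names "url" and "prompt" are assumptions that mirror
    the "public URL plus a plain-English prompt" description.
    """
    body = json.dumps({"url": url, "prompt": prompt}).encode("utf-8")
    return urllib.request.Request(
        f"{BASE_URL}/v1/extract",
        data=body,
        method="POST",
        headers={
            "Content-Type": "application/json",
            # Bearer auth is an assumption, not a documented requirement.
            "Authorization": f"Bearer {API_KEY}",
        },
    )


if __name__ == "__main__":
    req = build_extract_request(
        "https://example.com/pricing",
        "List every plan name and its monthly price.",
    )
    print(req.get_method(), req.full_url)
    # To actually send it: urllib.request.urlopen(req)
```

Keeping request construction separate from sending makes the payload easy to inspect or log before any network call.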

capability map
Green lanes: standard public pages, docs, APIs, product pages, metadata, GitHub repositories, and the public Reddit and Hacker News (HN) routes.
Yellow lanes: public social or JavaScript-heavy pages where visible content may be partial.
Red lanes: login walls, human verification, private data, and unsupported social profile scraping.
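If you triage targets in code before calling Haunt, you can encode the hostname-specific parts of the map as a small lookup. This is a client-side heuristic sketch, not Haunt's own logic. The function name `lane_for` and the host sets are hypothetical. Ordinary pages such as docs, product pages, and company sites come back as `unknown`, because a hostname alone cannot tell you whether a page is public.

```python
from urllib.parse import urlparse

# Host sets drawn from the capability map. This grouping is a hypothetical
# client-side heuristic, not Haunt's internal routing logic.
GREEN_HOSTS = {"github.com", "reddit.com", "news.ycombinator.com"}
YELLOW_HOSTS = {"linkedin.com", "instagram.com", "facebook.com"}
RED_HOSTS = {"x.com", "twitter.com"}


def lane_for(url: str) -> str:
    """Return "green", "yellow", "red", or "unknown" for a target URL."""
    host = (urlparse(url).hostname or "").lower()
    # Build every parent domain so www.reddit.com also matches reddit.com.
    parts = host.split(".")
    candidates = {".".join(parts[i:]) for i in range(len(parts))}
    # Check red first so a blocked host is never reported as usable.
    if candidates & RED_HOSTS:
        return "red"
    if candidates & YELLOW_HOSTS:
        return "yellow"
    if candidates & GREEN_HOSTS:
        return "green"
    return "unknown"


if __name__ == "__main__":
    for target in ("https://github.com/browserbase/stagehand",
                   "https://www.linkedin.com/company/example",
                   "https://x.com/someone",
                   "https://example.com/docs"):
        print(target, lane_for(target))
```

Treat `unknown` as "try `POST /v1/extract` and inspect the result", not as a failure.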
green lanes

Good fit.

Use Haunt where public or authorised content is visible enough to support evidence-backed extraction.

Company websites

Extract names, descriptions, positioning, pricing links, product text, metadata, and public business facts.

Docs and API pages

Turn public docs, API references, OpenAPI JSON, changelogs, and reference pages into structured JSON.

GitHub repositories

Public repositories and their metadata are a strong fit, through both dedicated API-backed routes and normal extraction.

Product and pricing pages

Extract public plan names, prices, product lists, descriptions, and table-shaped content when present in the page.

JSON, XML, OpenAPI

Structured formats like these are among Haunt's strongest source categories.

Reddit and HN public routes

Use dedicated routes where public data paths are stable instead of forcing a generic browser scrape.

yellow lanes

Sometimes useful, with boundaries.

Target | What works | Boundary
LinkedIn company pages | Public metadata, title, description, company shell, and sometimes website and employee signals. | No promises for people scraping, comments, members, logged-in views, or post history.
Instagram business profiles | OpenGraph/profile metadata when exposed publicly. | Metadata only. Not followers, comments, private media, or logged-in content.
Facebook business pages | Partial public page-card metadata when visible. | Not a reliable social feed scraper.
JavaScript-heavy apps | Sometimes works through bounded browser rendering. | Timeouts and thin (near-empty) page shells are possible.

Red lanes.

X/Twitter: unsupported for generic extraction because of heavy anti-bot and login-wall behaviour.

CAPTCHA and human verification: clean failure. Haunt is CAPTCHA-aware and returns an explicit human-verification error instead of guessing.

Private/login-only pages: authorised extraction on Pro/Scale only. Provide only sessions, cookies, or headers you are authorised to use.

Personal profiles, comments, groups, and followers: not a product promise.
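Haunt fails cleanly on human verification instead of guessing, so client code should surface that failure rather than retry. The sketch below assumes a JSON error body shaped like `{"error": {"code": ..., "message": ...}}` and a code value of `human_verification_required`. Neither is documented here. Treat both as hypothetical, along with the `HumanVerificationError` class and the `parse_extract_response` helper.

```python
import json

# Hypothetical error contract: the guide says Haunt returns an explicit
# human-verification error, but it does not document the field names or
# code value. Both are assumptions.
HUMAN_VERIFICATION_CODE = "human_verification_required"


class HumanVerificationError(Exception):
    """Raised when the target sits behind a CAPTCHA or similar check."""


def parse_extract_response(status: int, body: str) -> dict:
    """Return the parsed payload, or raise a specific error on failure."""
    payload = json.loads(body)
    if status >= 400:
        error = payload.get("error", {})
        code = error.get("code")
        if code == HUMAN_VERIFICATION_CODE:
            # Retrying will not get past the check; report the blocked target.
            raise HumanVerificationError(error.get("message", code))
        raise RuntimeError(f"extraction failed with HTTP {status}: {code}")
    return payload
```

Keeping human verification as its own exception type lets callers skip retry loops for that case while still retrying transient failures if they choose.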

use the right route
POST /v1/extract
GET /v1/github/repo?owner=browserbase&repo=stagehand
GET /v1/hackernews/item/40310896
POST /v1/company/enrich
POST /v1/reddit
POST /v1/reddit/comments
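One way to act on "use the right route" is to map recognisable URLs to the dedicated routes listed above and send everything else to `POST /v1/extract`. In this sketch, `choose_route` is a hypothetical helper. The route paths come from the list, but the URL patterns it matches (for example `news.ycombinator.com/item?id=...`) are assumptions. Request bodies for the POST routes are left out, and `/v1/company/enrich` and `/v1/reddit/comments` are not mapped, because this page does not describe their inputs.

```python
from urllib.parse import parse_qs, urlencode, urlparse


def choose_route(target: str) -> tuple[str, str]:
    """Pick a dedicated route for a target URL, else fall back to /v1/extract.

    Returns (HTTP method, path with query). The URL patterns matched here
    are illustrative assumptions, not documented Haunt behaviour.
    """
    parsed = urlparse(target)
    host = (parsed.hostname or "").lower()
    segments = [s for s in parsed.path.split("/") if s]

    # github.com/<owner>/<repo> -> GET /v1/github/repo?owner=...&repo=...
    if host in {"github.com", "www.github.com"} and len(segments) >= 2:
        query = urlencode({"owner": segments[0], "repo": segments[1]})
        return "GET", f"/v1/github/repo?{query}"

    # news.ycombinator.com/item?id=<n> -> GET /v1/hackernews/item/<n>
    if host == "news.ycombinator.com" and parsed.path == "/item":
        item_id = parse_qs(parsed.query).get("id", [""])[0]
        if item_id.isdigit():
            return "GET", f"/v1/hackernews/item/{item_id}"

    # Reddit pages go to the dedicated POST /v1/reddit route.
    if host == "reddit.com" or host.endswith(".reddit.com"):
        return "POST", "/v1/reddit"

    # Everything else: generic URL-plus-prompt extraction.
    return "POST", "/v1/extract"


if __name__ == "__main__":
    print(choose_route("https://github.com/browserbase/stagehand"))
    print(choose_route("https://news.ycombinator.com/item?id=40310896"))
    print(choose_route("https://example.com/pricing"))
```

A GitHub URL without a repository segment falls through to `/v1/extract`, which keeps the dedicated route reserved for the repository lookups it is built for.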
next path

Choose the page that matches the job you need done.

Use focused pages instead of making people decode the whole product.

AI agent web extraction

Demo, key, then first agent call.

Turn URL into JSON

URL plus prompt to structured JSON.

Extract pricing tables

Visible plan cards and pricing pages.

MCP web scraping

Hosted and local MCP setup.