Company Website Data Extraction API: Turn Business Pages into JSON
Most company websites are written for humans, not databases. That is fine until you need to enrich leads, compare competitors, monitor partner pages, or feed clean company facts into an internal tool.
You can scrape the HTML yourself, maintain selectors for every layout, and spend the afternoon arguing with cookie banners. Or you can send Haunt a URL and a plain-English extraction prompt, then get structured JSON back.
Use case: extract useful business facts from company websites — name, category, services, pricing hints, location, contact routes, proof points, and calls to action — without building a custom parser for every site.
What company website extraction is useful for
This pattern is useful when you already have URLs and need the page turned into structured records:
- Lead enrichment: add company category, offer, audience, pricing hints, and contact routes to a lead list.
- Competitor monitoring: track changes to pricing pages, feature pages, testimonials, or positioning.
- Partner/vendor research: extract what a service does, who it is for, and whether it has docs, pricing, or proof.
- Directory building: convert messy service pages into consistent profiles for review.
- Sales research: summarize what a company sells before writing a human, relevant note.
The data you can ask for
Because Haunt uses a natural-language prompt, you are not limited to a fixed schema. You describe the fields you want.
```json
{
  "company_name": "...",
  "website": "...",
  "category": "...",
  "one_sentence_summary": "...",
  "target_customers": ["..."],
  "products_or_services": ["..."],
  "pricing_signals": ["..."],
  "contact_routes": ["..."],
  "trust_signals": ["..."],
  "primary_call_to_action": "..."
}
```
The important bit: Haunt should return what the page supports. If the page does not mention pricing, the result should say that clearly instead of inventing a number. Fake confidence is worse than missing data.
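Downstream, it helps to check which requested fields actually came back empty so you can treat them as gaps rather than facts. A minimal sketch, where the helper name and the sample record are illustrative, not part of the Haunt API:

```python
# Flag which requested fields came back absent, null, or empty, so missing
# data stays visibly missing. Field names match the example schema above.
EXPECTED_FIELDS = [
    "company_name", "one_sentence_summary", "category",
    "target_customers", "products_or_services", "pricing_signals",
    "contact_routes", "trust_signals", "primary_call_to_action",
]

def missing_fields(record: dict) -> list:
    """Return the expected fields that are absent, null, or empty."""
    missing = []
    for field in EXPECTED_FIELDS:
        value = record.get(field)
        if value is None or value == "" or value == []:
            missing.append(field)
    return missing

# Illustrative record for a page that showed no pricing.
record = {
    "company_name": "Example Co",
    "category": "B2B SaaS",
    "pricing_signals": [],
}
print(missing_fields(record))
```

A record that comes back with `pricing_signals` in the missing list is a page that genuinely does not show pricing, which is exactly the signal you want preserved.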
Example API request
Send a URL plus a prompt describing the company facts you want:
```bash
curl -X POST https://hauntapi.com/v1/extract \
  -H "Content-Type: application/json" \
  -H "X-API-Key: YOUR_HAUNT_API_KEY" \
  -d '{
    "url": "https://example-company.com",
    "prompt": "Extract company facts for lead enrichment. Return JSON with company_name, one_sentence_summary, category, target_customers, products_or_services, pricing_signals, contact_routes, trust_signals, and primary_call_to_action. If a field is not visible on the page, return null or an empty list rather than guessing."
  }'
```
Example Python workflow
For a small batch, keep the prompt stable and swap the URL:
```python
import requests

headers = {
    "Content-Type": "application/json",
    "X-API-Key": "YOUR_HAUNT_API_KEY",
}

prompt = """
Extract company facts for lead enrichment.
Return JSON with company_name, one_sentence_summary, category,
target_customers, products_or_services, pricing_signals,
contact_routes, trust_signals, and primary_call_to_action.
Do not guess missing fields.
"""

urls = [
    "https://example-company.com",
    # ...the rest of your lead list
]

for url in urls:
    response = requests.post(
        "https://hauntapi.com/v1/extract",
        headers=headers,
        json={"url": url, "prompt": prompt},
        timeout=60,
    )
    response.raise_for_status()
    print(response.json())
```
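On a real lead list, some sites will be slow or down. A sketch of a more tolerant loop body: it skips and logs failures instead of aborting the batch. The skip-and-log policy is our choice here, not documented API behavior.

```python
import requests

def extract(url, prompt, headers):
    """POST one URL to the extract endpoint; return parsed JSON, or
    None if the request fails for any reason (timeout, DNS, HTTP error)."""
    try:
        response = requests.post(
            "https://hauntapi.com/v1/extract",
            headers=headers,
            json={"url": url, "prompt": prompt},
            timeout=60,
        )
        response.raise_for_status()
        return response.json()
    except requests.RequestException as exc:
        # One broken site should not kill the other 499 lookups.
        print(f"skipped {url}: {exc}")
        return None
```

Call `extract(url, prompt, headers)` inside the loop and collect only non-`None` results.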
Why not just parse HTML?
HTML parsing is fine when the layout is stable and you know exactly where the data lives. Company sites are usually not like that. One uses a pricing table. One hides pricing behind copy. One has the offer in the hero. One has the contact route in the footer. One ships half the page through JavaScript.
| Approach | Best for | Weak spot |
|---|---|---|
| CSS selectors | Known layout, repeated pages | Breaks when the site changes |
| Raw scraping API | Fetching HTML at scale | You still need parsing and cleanup |
| Haunt extraction | Turning varied pages into structured JSON | Best results need clear prompts and public page content |
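The brittleness in the first row is easy to demonstrate: a pattern written against one layout finds nothing in another, even when both pages state the same fact. Both HTML snippets below are invented examples.

```python
import re

# The same pricing fact under two layouts.
table_html = '<td class="price">$49/mo</td>'
copy_html = '<p>Plans start at $49 per month for small teams.</p>'

# A pattern written against the first layout, selector-style.
pattern = re.compile(r'class="price">([^<]+)<')

print(pattern.search(table_html).group(1))  # finds "$49/mo"
print(pattern.search(copy_html))            # finds nothing: None
```

Multiply that by every site on a lead list and per-site parsers stop being a quick afternoon project.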
Prompt tips for better lead-enrichment JSON
- Say exactly which fields you want.
- Tell the model not to guess missing values.
- Ask for arrays when a page may contain multiple services, audiences, or contact routes.
- Keep source-specific notes in separate fields, so your app can review them later.
- Store the URL and timestamp next to the extracted JSON. Websites change. Annoying little beasts.
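The last tip, storing the URL and timestamp next to the extracted JSON, can be as simple as an append-only JSONL file. A sketch; the file name and envelope keys are illustrative choices:

```python
import json
import time

def to_record(url, extracted):
    """Wrap an extraction result with its source URL and a UTC timestamp,
    so the record can be re-checked when the site changes."""
    return {
        "source_url": url,
        "extracted_at": time.strftime("%Y-%m-%dT%H:%M:%SZ", time.gmtime()),
        "data": extracted,
    }

def append_record(path, record):
    # One JSON object per line keeps appends cheap and re-reads simple.
    with open(path, "a", encoding="utf-8") as f:
        f.write(json.dumps(record) + "\n")

record = to_record("https://example-company.com", {"company_name": "Example Co"})
append_record("extractions.jsonl", record)
```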
When this is not the right tool
Do not use company website extraction as a magic database. If you need private CRM data, logged-in dashboards, guaranteed fresh legal filings, or verified phone numbers, use the proper source. Haunt is for extracting visible web page content into useful structure.
It is also not a permission slip to spam people. Use extracted lead context to be more relevant, not louder.
Try company website extraction
Start with one URL and one plain-English prompt. If the output is useful, batch the same prompt across your lead list.
Get a Haunt API key · Try on RapidAPI