Company Website Data Extraction API: Turn Business Pages into JSON

May 2026 · 7 min read

Most company websites are written for humans, not databases. That is fine until you need to enrich leads, compare competitors, monitor partner pages, or feed clean company facts into an internal tool.

You can scrape the HTML yourself, maintain selectors for every layout, and spend the afternoon arguing with cookie banners. Or you can send Haunt a URL and a plain-English extraction prompt, then get structured JSON back.

Use case: extract useful business facts from company websites — name, category, services, pricing hints, location, contact routes, proof points, and calls to action — without building a custom parser for every site.

What company website extraction is useful for

This pattern is useful when you already have URLs and need each page turned into a structured record:

- Lead enrichment: turn a prospect's homepage into clean fields for your CRM.
- Competitor comparison: pull the same facts from every site in a segment.
- Partner monitoring: re-run the same extraction to spot changed offers or contact routes.
- Internal tools: feed consistent company facts into dashboards and workflows.

The data you can ask for

Because Haunt uses a natural-language prompt, you are not limited to a fixed schema. You describe the fields you want.

{
  "company_name": "...",
  "website": "...",
  "category": "...",
  "one_sentence_summary": "...",
  "target_customers": ["..."],
  "products_or_services": ["..."],
  "pricing_signals": ["..."],
  "contact_routes": ["..."],
  "trust_signals": ["..."],
  "primary_call_to_action": "..."
}

The important bit: Haunt should return what the page supports. If the page does not mention pricing, the result should say that clearly instead of inventing a number. Fake confidence is worse than missing data.
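You can also enforce that behavior on your side. Here is a minimal sketch (the field names follow the example schema above) that flags which fields the page actually supported:

```python
def missing_fields(record, expected_fields):
    """Return the expected fields that came back null, empty string, or empty list."""
    missing = []
    for field in expected_fields:
        value = record.get(field)
        if value is None or value == "" or value == []:
            missing.append(field)
    return missing

record = {"company_name": "Acme", "pricing_signals": [], "category": None}
missing_fields(record, ["company_name", "pricing_signals", "category"])
# -> ["pricing_signals", "category"]
```

A record with many missing fields is a signal to review the page, not a reason to backfill with guesses.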

Example API request

Send a URL plus a prompt describing the company facts you want:

curl -X POST https://hauntapi.com/v1/extract \
  -H "Content-Type: application/json" \
  -H "X-API-Key: YOUR_HAUNT_API_KEY" \
  -d '{
    "url": "https://example-company.com",
    "prompt": "Extract company facts for lead enrichment. Return JSON with company_name, one_sentence_summary, category, target_customers, products_or_services, pricing_signals, contact_routes, trust_signals, and primary_call_to_action. If a field is not visible on the page, return null or an empty list rather than guessing."
  }'

Example Python workflow

For a small batch, keep the prompt stable and swap the URL:

import requests

headers = {
    "Content-Type": "application/json",
    "X-API-Key": "YOUR_HAUNT_API_KEY",
}

prompt = """
Extract company facts for lead enrichment.
Return JSON with company_name, one_sentence_summary, category,
target_customers, products_or_services, pricing_signals,
contact_routes, trust_signals, and primary_call_to_action.
Do not guess missing fields.
"""

urls = ["https://example-company.com"]  # swap in your own lead list

for url in urls:
    response = requests.post(
        "https://hauntapi.com/v1/extract",
        headers=headers,
        json={"url": url, "prompt": prompt},
        timeout=60,
    )
    response.raise_for_status()
    print(response.json())
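Before the raw JSON lands in a spreadsheet or CRM, a light normalization pass keeps records consistent. A sketch, assuming the field names from the schema above:

```python
def normalize_record(record, url):
    """Trim string values, coerce missing list fields to [], and attach the source URL."""
    list_fields = [
        "target_customers", "products_or_services",
        "pricing_signals", "contact_routes", "trust_signals",
    ]
    clean = {"website": url}
    for key, value in record.items():
        # Strip stray whitespace that often rides along with page text.
        clean[key] = value.strip() if isinstance(value, str) else value
    for field in list_fields:
        # Treat null the same as "not on the page": an empty list.
        if clean.get(field) is None:
            clean[field] = []
    return clean
```

Replacing `print(response.json())` with `normalize_record(response.json(), url)` gives you rows that are safe to append to one file or table.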

Why not just parse HTML?

HTML parsing is fine when the layout is stable and you know exactly where the data lives. Company sites are usually not like that. One uses a pricing table. One hides pricing behind copy. One has the offer in the hero. One has the contact route in the footer. One ships half the page through JavaScript.

| Approach | Best for | Weak spot |
| --- | --- | --- |
| CSS selectors | Known layout, repeated pages | Breaks when the site changes |
| Raw scraping API | Fetching HTML at scale | You still need parsing and cleanup |
| Haunt extraction | Turning varied pages into structured JSON | Best results need clear prompts and public page content |

Prompt tips for better lead-enrichment JSON

A few habits noticeably improve the output:

- Name the exact fields you want, in the order you want them.
- Tell Haunt not to guess: ask for null or an empty list when a field is not on the page.
- Keep one prompt stable across the whole batch so records stay comparable.
- State the purpose ("for lead enrichment") so summaries stay short and factual.
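One lightweight way to apply these tips is to build the prompt from a single canonical field list, so the prompt and the code that reads the response never drift apart. A sketch, using the field names from the schema above:

```python
# One canonical field list, shared by the prompt and any downstream parser.
FIELDS = [
    "company_name", "one_sentence_summary", "category",
    "target_customers", "products_or_services", "pricing_signals",
    "contact_routes", "trust_signals", "primary_call_to_action",
]

def build_prompt(fields):
    """Assemble a stable extraction prompt that names every field explicitly."""
    field_list = ", ".join(fields)
    return (
        "Extract company facts for lead enrichment. "
        f"Return JSON with exactly these keys: {field_list}. "
        "If a field is not visible on the page, return null or an "
        "empty list rather than guessing."
    )

prompt = build_prompt(FIELDS)
```

Because the same `FIELDS` list can drive both the prompt and a validation step, a schema change is a one-line edit instead of a hunt through string literals.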

When this is not the right tool

Do not use company website extraction as a magic database. If you need private CRM data, logged-in dashboards, guaranteed fresh legal filings, or verified phone numbers, use the proper source. Haunt is for extracting visible web page content into useful structure.

It is also not a permission slip to spam people. Use extracted lead context to be more relevant, not louder.

Try company website extraction

Start with one URL and one plain-English prompt. If the output is useful, batch the same prompt across your lead list.

Get a Haunt API key · Try on RapidAPI