Traditional web scraping for text extraction follows a tedious pattern: fetch the HTML, parse it with BeautifulSoup, strip out navigation and footers, remove scripts and styles, then hope what's left is the actual article content.
It works — until it doesn't. Modern websites use dynamic rendering, shadow DOMs, and client-side frameworks that make simple HTML parsing unreliable. You end up with navigation text mixed into your content, or worse, empty results because the content loaded via JavaScript after your scraper already finished.
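For contrast, here is roughly what the DIY pattern looks like using only the standard library — a minimal sketch, not production code, and note that it still sees nothing rendered client-side:

```python
from html.parser import HTMLParser

class NaiveExtractor(HTMLParser):
    """The DIY approach: walk the markup, skip script/style/nav, keep the rest."""
    SKIP = {"script", "style", "nav", "footer", "header"}

    def __init__(self):
        super().__init__()
        self.depth = 0      # > 0 while inside a skipped element
        self.chunks = []

    def handle_starttag(self, tag, attrs):
        if tag in self.SKIP:
            self.depth += 1

    def handle_endtag(self, tag):
        if tag in self.SKIP and self.depth:
            self.depth -= 1

    def handle_data(self, data):
        if self.depth == 0 and data.strip():
            self.chunks.append(data.strip())

html = """<html><body>
<nav>Home | About</nav>
<p>The actual article text.</p>
<script>trackPageView();</script>
<footer>(c) 2024</footer>
</body></html>"""

parser = NaiveExtractor()
parser.feed(html)
print(" ".join(parser.chunks))  # only the <p> text survives
```

Even this toy version needs a skip-list you must maintain per site, and it returns nothing useful for JavaScript-rendered pages — which is exactly the maintenance burden an extraction API takes off your plate.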
A text extraction API handles all of this for you. Here's the simplest way to extract readable text from any webpage using Python:
import requests

response = requests.post(
    "https://hauntapi.com/v1/extract",
    headers={
        "X-API-Key": "your-api-key",
        "Content-Type": "application/json",
    },
    json={"url": "https://en.wikipedia.org/wiki/Web_scraping"},
)

data = response.json()
print(data["content"])  # Clean, extracted text
That's it. The API returns the page's main content as clean text — no HTML tags, no navigation clutter, no boilerplate.
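The full response schema isn't shown here, but based on the fields used in these examples, a successful response presumably looks something like this (illustrative only):

```json
{
  "content": "Web scraping is the process of extracting data from websites..."
}
```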
If you'd rather not construct requests by hand, the hauntapi Python SDK wraps the same endpoint:

pip install hauntapi

from hauntapi import HauntClient

client = HauntClient(api_key="your-api-key")
result = client.extract("https://en.wikipedia.org/wiki/Web_scraping")
print(result.content)
Same thing in Node.js using the Fetch API:
const response = await fetch("https://hauntapi.com/v1/extract", {
  method: "POST",
  headers: {
    "X-API-Key": "your-api-key",
    "Content-Type": "application/json"
  },
  body: JSON.stringify({
    url: "https://en.wikipedia.org/wiki/Web_scraping"
  })
});

const data = await response.json();
console.log(data.content);
Sometimes you don't want all the text — you want specific data points. Haunt API lets you provide an extraction prompt to pull exactly what you need:
response = requests.post(
    "https://hauntapi.com/v1/extract",
    headers={
        "X-API-Key": "your-api-key",
        "Content-Type": "application/json",
    },
    json={
        "url": "https://news.ycombinator.com",
        "prompt": "Extract the top 5 stories with their titles, points, and URLs as JSON",
    },
)

print(response.json()["data"])  # Returns structured JSON with exactly what you asked for
This is where traditional scrapers fall short. With BeautifulSoup, you'd need to inspect the HTML, find the right CSS selectors, handle edge cases, and write brittle parsing code. With an extraction API, you just describe what you want in plain English.
Need to extract text from hundreds or thousands of pages? Here's a production-ready batch script with error handling:
import requests
import json
from concurrent.futures import ThreadPoolExecutor, as_completed

API_KEY = "your-api-key"
URLS = [
    "https://example.com/page-1",
    "https://example.com/page-2",
    "https://example.com/page-3",
    # ... hundreds more
]

def extract_text(url):
    try:
        resp = requests.post(
            "https://hauntapi.com/v1/extract",
            headers={"X-API-Key": API_KEY, "Content-Type": "application/json"},
            json={"url": url},
            timeout=30,
        )
        resp.raise_for_status()
        return {"url": url, "text": resp.json().get("content", ""), "status": "ok"}
    except Exception as e:
        return {"url": url, "text": "", "status": "error", "error": str(e)}

results = []
with ThreadPoolExecutor(max_workers=5) as pool:
    futures = {pool.submit(extract_text, url): url for url in URLS}
    for future in as_completed(futures):
        results.append(future.result())

# Save to file
with open("extracted.json", "w") as f:
    json.dump(results, f, indent=2)

ok_count = sum(1 for r in results if r["status"] == "ok")
print(f"Extracted {ok_count}/{len(URLS)} pages")
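The script above treats every failure as final. In production you'll often want retries with exponential backoff for transient errors (timeouts, rate limits, 5xx responses). A minimal, API-agnostic sketch — the retry policy and the wrapped function are illustrative, not part of Haunt API:

```python
import time

def with_retries(fn, attempts=3, base_delay=1.0):
    """Call fn(); on exception, wait base_delay * 2**i seconds and retry."""
    for i in range(attempts):
        try:
            return fn()
        except Exception:
            if i == attempts - 1:
                raise  # out of attempts, surface the error
            time.sleep(base_delay * 2 ** i)

# In extract_text you would wrap the request, e.g.:
#   resp = with_retries(lambda: requests.post(..., timeout=30))

# Demo with a function that fails twice, then succeeds:
calls = {"n": 0}
def flaky():
    calls["n"] += 1
    if calls["n"] < 3:
        raise RuntimeError("transient")
    return "ok"

print(with_retries(flaky, base_delay=0.01))  # succeeds on the third attempt
```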
How the approaches stack up for text extraction: a hand-rolled BeautifulSoup scraper is free per request but brittle, selector-dependent, and blind to JavaScript-rendered content; an extraction API costs per request but handles rendering, parsing, and cleaning for you.
Haunt API gives you 100 free requests per month on the Basic plan. No credit card needed. For production use, the Pro plan is $0.01 per request — that's $10 for 1,000 extractions. Compare that to the developer time you'd spend writing and maintaining BeautifulSoup scrapers.
Start extracting text from webpages in minutes. 100 free requests, no credit card required.
Get Your Free API Key →

The key insight: if you're still manually parsing HTML to get text from webpages, you're spending time on the wrong problem. Let the API handle rendering, parsing, and cleaning — focus on actually using the data.
Ready to extract clean text from any webpage?
Try Haunt API Free →