MCP Server for Web Scraping — Give Your AI Agent Real-Time Web Data

AI agents are only as useful as the data they can access. Large language models like Claude and GPT-4 have vast knowledge baked in during training, but they can't access live web data on their own. That's where MCP (Model Context Protocol) comes in — and web scraping is one of the most valuable capabilities you can give your agent through it.

What is the Model Context Protocol?

MCP is an open protocol introduced by Anthropic that standardizes how AI models connect to external data sources and tools. Think of it as USB-C for AI — a universal connector that lets any AI client talk to any data source through a consistent interface.

An MCP server exposes tools that an AI agent can call during a conversation. Instead of hard-coding API integrations, you configure an MCP server once and your agent discovers and uses its capabilities automatically.
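To make the discovery step concrete: MCP messages are JSON-RPC 2.0, and a client finds and invokes tools with the tools/list and tools/call methods. The sketch below only builds the two request objects a client would send; it does not talk to a real server. The tool name get_page_title and its url argument are made up for illustration, and the helper names are assumptions, not part of any SDK.

```typescript
// Shape of a JSON-RPC 2.0 request, the envelope MCP messages travel in.
interface JsonRpcRequest {
  jsonrpc: "2.0";
  id: number;
  method: string;
  params?: Record<string, unknown>;
}

let nextId = 1;

// Discovery: ask the server which tools it offers.
function listToolsRequest(): JsonRpcRequest {
  return { jsonrpc: "2.0", id: nextId++, method: "tools/list" };
}

// Invocation: call one tool by name with JSON arguments.
function callToolRequest(
  name: string,
  args: Record<string, unknown>,
): JsonRpcRequest {
  return {
    jsonrpc: "2.0",
    id: nextId++,
    method: "tools/call",
    params: { name, arguments: args },
  };
}

// Hypothetical tool name and arguments, for illustration only.
const discoverMessage = listToolsRequest();
const callMessage = callToolRequest("get_page_title", {
  url: "https://example.com",
});

console.log(JSON.stringify(discoverMessage));
console.log(JSON.stringify(callMessage));
```

In practice an MCP client library builds and sends these messages for you; the point is that the agent learns about tools at runtime rather than from integrations you hard-code.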

Why Web Scraping + MCP is Powerful

Web scraping as an MCP tool unlocks use cases that were previously impossible or required complex multi-step orchestration:

The key insight: your AI agent can now access any public web page as if it were browsing the internet, without you writing custom scrapers for each site.

Building an MCP Server for Web Extraction

Here's the minimal structure of an MCP server that provides web scraping as a tool:

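// server.ts: MCP server with a web extraction tool
import { Server } from "@modelcontextprotocol/sdk/server/index.js";
import { StdioServerTransport } from "@modelcontextprotocol/sdk/server/stdio.js";
import {
  CallToolRequestSchema,
  ListToolsRequestSchema,
} from "@modelcontextprotocol/sdk/types.js";

const server = new Server({
  name: "web-scraper",
  version: "1.0.0",
}, {
  capabilities: { tools: {} },
});

// Advertise the tools this server offers so clients can discover them.
server.setRequestHandler(ListToolsRequestSchema, async () => ({
  tools: [{
    name: "extract_web_data",
    description: "Extract structured data from any web page. " +
      "Handles JavaScript rendering, Cloudflare protection, " +
      "and returns clean structured data.",
    inputSchema: {
      type: "object",
      properties: {
        url: { type: "string", description: "URL to extract data from" },
        prompt: { type: "string", description: "What data to extract" },
      },
      required: ["url", "prompt"],
    },
  }],
}));

// Run the tool when a client calls it.
server.setRequestHandler(CallToolRequestSchema, async (request) => {
  if (request.params.name !== "extract_web_data") {
    throw new Error(`Unknown tool: ${request.params.name}`);
  }
  const { url, prompt } = (request.params.arguments ?? {}) as {
    url: string;
    prompt: string;
  };
  const response = await fetch("https://hauntapi.com/v1/extract", {
    method: "POST",
    headers: {
      "Authorization": "Bearer YOUR_API_KEY", // replace with your own key
      "Content-Type": "application/json",
    },
    body: JSON.stringify({ url, prompt }),
  });
  if (!response.ok) {
    // Report the failure to the agent instead of returning a broken payload.
    return {
      content: [{ type: "text", text: `Extraction failed: HTTP ${response.status}` }],
      isError: true,
    };
  }
  const data = await response.json();
  return { content: [{ type: "text", text: JSON.stringify(data) }] };
});

// Serve over stdio so a local client can launch this process and talk to it.
const transport = new StdioServerTransport();
await server.connect(transport);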

The server exposes a single extract_web_data tool that takes a URL and a natural language prompt describing what to extract. The AI agent calls this tool whenever it needs live web data.
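Because the agent chooses the arguments, it can help to check them before spending an API request. The following is a minimal sketch of one way to do that, assuming the url and prompt fields from the input schema above; parseExtractArgs and errorResult are hypothetical helper names, and the http(s)-only rule is an illustrative choice rather than a requirement of MCP or of the extraction API. The isError flag is the field MCP tool results use to signal a tool-level failure.

```typescript
// Result shape for a tool call; isError marks a tool-level failure.
interface ToolResult {
  content: { type: "text"; text: string }[];
  isError?: boolean;
}

interface ExtractArgs {
  url: string;
  prompt: string;
}

// Build an error result the agent can read and react to.
function errorResult(message: string): ToolResult {
  return { content: [{ type: "text", text: message }], isError: true };
}

// Return validated arguments, or a ToolResult explaining what was wrong.
function parseExtractArgs(raw: unknown): ExtractArgs | ToolResult {
  const args = (raw ?? {}) as Record<string, unknown>;
  const { url, prompt } = args;
  if (typeof url !== "string" || !/^https?:\/\//i.test(url)) {
    return errorResult("url must be an http(s) URL");
  }
  if (typeof prompt !== "string" || prompt.trim() === "") {
    return errorResult("prompt must be a non-empty string");
  }
  return { url, prompt };
}

// Example: one valid call and one rejected call.
const accepted = parseExtractArgs({
  url: "https://example.com/pricing",
  prompt: "List the plan names and monthly prices",
});
const rejected = parseExtractArgs({ url: "ftp://example.com", prompt: "" });

console.log("content" in accepted ? "rejected" : "accepted");
console.log("content" in rejected ? "rejected" : "accepted");
```

Inside the call handler, a result that has a content field can be returned to the client directly, so the agent sees the message and can retry with corrected arguments.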

The Haunt API MCP Server (Ready to Use)

You don't have to build this from scratch. Haunt API ships a pre-built MCP server that handles all the complexity:

Install it via npm:

npm install -g @hauntapi/mcp-server

Or use it with npx:

npx @hauntapi/mcp-server --api-key haunt_xxx

Using It With Claude Desktop and Other Clients

Add the Haunt MCP server to your Claude Desktop configuration file (on macOS, ~/Library/Application Support/Claude/claude_desktop_config.json):

{
  "mcpServers": {
    "haunt": {
      "command": "npx",
      "args": ["-y", "@hauntapi/mcp-server"],
      "env": {
        "HAUNT_API_KEY": "haunt_your_key_here"
      }
    }
  }
}
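If the configuration file already lists other MCP servers, the haunt entry needs to be added alongside them rather than replacing the whole mcpServers object. As a rough sketch of that merge, assuming the entry shape shown above: addHauntServer is a hypothetical helper, and the existing "filesystem" entry and its package name are placeholders, not a recommendation.

```typescript
// One entry under "mcpServers": how to launch a server process.
interface McpServerEntry {
  command: string;
  args: string[];
  env?: Record<string, string>;
}

// The parts of the config this sketch cares about; other keys pass through.
interface ClaudeDesktopConfig {
  mcpServers?: Record<string, McpServerEntry>;
  [key: string]: unknown;
}

// Return a new config with the haunt server added, leaving the input untouched.
function addHauntServer(
  config: ClaudeDesktopConfig,
  apiKey: string,
): ClaudeDesktopConfig {
  return {
    ...config,
    mcpServers: {
      ...config.mcpServers,
      haunt: {
        command: "npx",
        args: ["-y", "@hauntapi/mcp-server"],
        env: { HAUNT_API_KEY: apiKey },
      },
    },
  };
}

// Placeholder existing config with one unrelated server.
const existingConfig: ClaudeDesktopConfig = {
  mcpServers: {
    filesystem: { command: "npx", args: ["-y", "some-other-mcp-server"] },
  },
};

const mergedConfig = addHauntServer(existingConfig, "haunt_your_key_here");
console.log(JSON.stringify(mergedConfig, null, 2));
```

The printed JSON can be written back to the configuration file; Claude Desktop needs a restart to pick up changes.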

Once configured, Claude can extract data from any website during your conversation. Just ask:

MCP vs Traditional Web Scraping for AI

Here's how MCP-based web scraping compares to the old approach:

For ad-hoc research, the MCP approach can be roughly 10x faster, because there is no per-site scraper to write first, and it makes your AI agent genuinely useful for work that depends on real-time web data.

Get started with the Haunt MCP server in 60 seconds. Free tier includes 100 requests/month.
