
What It Does

  • Fetches content from public web pages using a static URL or a column of URLs
  • Two AI modes: generate plain-English summaries or structured data mapped to new columns
  • Works row-by-row (if you’re using a column of URLs) or once for all rows (if a single URL is entered)
  • Custom prompts guide what the AI returns—bullets, data fields, or summaries
  • Schema support allows you to define the structure of extracted data (e.g., JSON → columns)

🏁 Getting Started

[Screenshot: AI Web Scraping node configuration]
  1. Add the Node
     Drag the AI Web Scraping node into your workflow.
  2. Choose URL Source
     Either paste a single Web URL or select a column of URLs.
  3. Select a Method
     Choose between Summary (text output) or Structured Output (JSON → columns).
  4. Write Your Prompt
     Give the AI clear instructions: what to summarize or extract.
  5. (Optional) Add a Response JSON Schema
     If using Structured Output, provide a Response JSON Schema to define how output fields map to columns.
  6. (Optional) Name Your Output Column
     For Summary only. The default is SUMMARY, but you can rename it.
  7. Run the Node
     Click Run to fetch and enrich your table, once per row or once for all rows.

Inputs

🛠️ Required Fields

  • Web URL or Column (✅)
    Either paste a single URL or select a column containing URLs.
    Why it matters: Controls whether the node runs once (for all rows) or row-by-row with different results.
  • Method (✅)
    Choose Summary or Structured Output.
    Why it matters: Summary returns a single text column; Structured Output uses your schema to return multiple fields.
  • Prompt (✅)
    Write instructions like “Summarize in 3 bullets” or “Extract name, price, and features as JSON.”
    Why it matters: The AI follows this prompt to determine what content to pull from each page.

🎯 Optional Fields

  • Response JSON Schema (⚪️)
    A sample JSON structure showing what you want the AI to return.
    Why you’d use it: Required only for Structured Output. This tells the node how to turn the AI’s response into new columns.
  • Output Column Name (⚪️)
    For Summary only. Choose your output column name (default: SUMMARY).
    Why you’d use it: Keeps naming aligned with your CRM or reporting conventions.
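
As an illustration of the Response JSON Schema field, here is a minimal sketch of a sample structure for a pricing-page extraction. The field names (plan_name, price, key_benefits) and their values are hypothetical. The one-key-per-column mapping is an assumption inferred from the description above, not confirmed node behavior.

```python
import json

# Hypothetical sample structure for the Response JSON Schema field.
# Keys and values are illustrative; use the fields your prompt asks for.
sample_schema = {
    "plan_name": "Pro",
    "price": "$49/month",
    "key_benefits": "Unlimited seats; priority support",
}

# Text you could paste into the schema field.
schema_text = json.dumps(sample_schema, indent=2)
print(schema_text)

# Assumption: each top-level key becomes one new output column.
expected_columns = list(sample_schema.keys())
print(expected_columns)
```

Keeping the structure flat (no nested objects) makes it easier to see which column each key should produce.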

Output

  • A SUMMARY column (or custom name) when using Summary mode
  • One or more new columns mapped from your JSON schema (when using Structured Output)

[Screenshot: AI Web Scraping node output example showing Summary column]

How It Works

  1. Detects whether you’ve provided a static URL or selected a URL column
  2. For each run:
    • Fetches the webpage
    • Builds an AI prompt using your instructions (and schema, if any)
    • Sends it to the AI engine
    • Parses the result: either plain text or JSON
  3. Appends new column(s) to the table:
    • Same output across all rows (if using one URL)
    • Row-specific output (if using a URL column)
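
The steps above can be sketched in Python. This is a rough model of the flow under stated assumptions, not the node's actual implementation: `run_node`, `build_prompt`, `fetch`, and `ask_ai` are hypothetical names, and the fetcher and AI call are passed in as plain functions so the example runs without network access.

```python
import json
from typing import Optional


def build_prompt(instructions: str, page_text: str, schema: Optional[dict]) -> str:
    """Combine the instructions, the optional schema, and the fetched page text."""
    parts = [instructions]
    if schema is not None:
        parts.append("Return JSON shaped like: " + json.dumps(schema))
    parts.append("Page content:\n" + page_text)
    return "\n\n".join(parts)


def run_node(rows, fetch, ask_ai, instructions, static_url=None,
             url_column=None, schema=None, output_column="SUMMARY"):
    """Append AI output to each row; fetch and ask_ai are hypothetical callables."""
    if static_url is None and url_column is None:
        raise ValueError("Provide either static_url or url_column")

    def scrape(url):
        answer = ask_ai(build_prompt(instructions, fetch(url), schema))
        if schema is None:
            return {output_column: answer}  # Summary: plain text
        parsed = json.loads(answer)         # Structured Output: JSON
        return {key: str(parsed.get(key, "")) for key in schema}

    if static_url is not None:
        shared = scrape(static_url)         # one run, same result for every row
        return [{**row, **shared} for row in rows]
    # URL column: one run per row, row-specific result
    return [{**row, **scrape(row[url_column])} for row in rows]


# Usage with stand-in fetch and AI functions.
pages = {"https://example.com/a": "Alpha page", "https://example.com/b": "Beta page"}
rows = [{"URL": "https://example.com/a"}, {"URL": "https://example.com/b"}]
per_row = run_node(rows, pages.get, lambda p: "Summary of: " + p.splitlines()[-1],
                   "Summarize in one line", url_column="URL")
print(per_row)
```

Passing `static_url` instead of `url_column` triggers a single scrape whose result is copied to every row, which mirrors the "same output across all rows" case.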

🚀 Example Use Cases & Prompts

| Use Case | Prompt |
| --- | --- |
| Competitor Research | “Summarize the 3 main features of this product.” |
| Pricing Table Extraction | “Extract plan name, price, and key benefits as JSON.” |
| Blog Summarization | “Summarize this post in 3 key bullets for an executive briefing.” |
| Change Monitoring | “List new sections added since the last version of this page.” |
| Product Catalog Structuring | “Extract product name, launch year, and category as a JSON object.” |

✨ Pro Tips

Know your method: A JSON-style prompt alone won’t return structured output. Select Structured Output and fill in the Response JSON Schema; otherwise the node won’t know how to structure the output.
Validate early: Start with one URL and check your prompt and schema before scaling across hundreds of rows.
Be descriptive: Use prompts like “Extract X, Y, and Z as JSON with keys a, b, and c” to guide the AI clearly.
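
One way to act on the "validate early" tip is to run a single URL, copy the resulting row, and check that every schema field came back filled. The sketch below assumes the row is available as a simple column-to-value mapping; `blank_fields` and the sample data are hypothetical.

```python
def blank_fields(row: dict, schema: dict) -> list:
    """Return schema keys whose column is missing or empty in an output row."""
    blanks = []
    for key in schema:
        value = row.get(key)
        if value is None or not str(value).strip():
            blanks.append(key)
    return blanks


# Hypothetical schema and the row produced by a one-URL test run.
schema = {"plan_name": "", "price": "", "key_benefits": ""}
test_row = {"URL": "https://example.com/pricing", "plan_name": "Pro", "price": " "}

print(blank_fields(test_row, schema))
```

A non-empty result suggests the prompt or schema needs adjusting before the node is run over a full URL column.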

⚠️ Important Considerations

Every row counts: If using a URL column, each row is billed separately.
Same result across all rows: If you paste a single URL, the output is identical for every row in your table.
Schema is not optional in Structured Output: Leaving it blank means the node won’t know how to create columns from the AI’s response, which leads to unreliable output.
Some websites might not work as expected: If a page requires a login, hides content behind a click, or loads slowly, the node might miss or skip parts of it.

🛠 Troubleshooting & Gotchas

| Symptom | Probable Cause | Quick Fix |
| --- | --- | --- |
| All rows have the same result | A single URL was entered manually | Use a column to scrape each row’s unique URL |
| Summary looks generic or off | Prompt was too broad or unclear | Tweak the prompt to be more specific and focused |
| Blank values in some rows | Some rows had missing or invalid URLs | Check your URL column for empty or malformed entries |
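
To check a URL column for empty or malformed entries before running the node, a small pre-flight script along these lines may help. It uses Python's standard `urllib.parse`. The rule that a usable URL needs an http or https scheme and a host is an assumption, since the node's own validation rules are not documented here. `is_scrapable_url` and `find_bad_rows` are hypothetical helper names.

```python
from urllib.parse import urlparse


def is_scrapable_url(value) -> bool:
    """Assumed rule: a usable URL has an http(s) scheme and a host."""
    if not isinstance(value, str):
        return False
    try:
        parsed = urlparse(value.strip())
    except ValueError:  # raised for some malformed inputs, e.g. bad IPv6 brackets
        return False
    return parsed.scheme in ("http", "https") and bool(parsed.netloc)


def find_bad_rows(urls) -> list:
    """Return the zero-based positions of entries that fail the check."""
    return [i for i, value in enumerate(urls) if not is_scrapable_url(value)]


# Illustrative column values.
url_column = [
    "https://example.com/pricing",
    "",
    "example.com/blog",
    None,
    "http://example.org",
]
print(find_bad_rows(url_column))
```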

📝 FAQ

To scrape many pages at once, select a column that contains URLs; the node will run one scrape per row.
Output split into multiple columns is available only in Structured Output mode, and you must also fill the Response JSON Schema field for it to work correctly.
Redirects are followed automatically. The final URL is tracked behind the scenes.

💰 Pricing

Each URL scraped, whether it comes from a single value or a column, costs 5 credits.
Credit usage varies by node depending on complexity and AI operations.
Check the documentation of each node for specific credit details.
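
As a quick worked example of the pricing rule, the sketch below multiplies the number of URLs scraped by the 5-credit rate stated above. It assumes a single pasted URL counts as one scrape regardless of table size (the node runs once for all rows in that case) and that a URL column counts one scrape per row. `estimate_credits` is a hypothetical helper.

```python
CREDITS_PER_URL = 5  # rate stated in the pricing note


def estimate_credits(url_count: int) -> int:
    """Estimate credits for a run that scrapes url_count URLs."""
    if url_count < 0:
        raise ValueError("url_count must be non-negative")
    return CREDITS_PER_URL * url_count


print(estimate_credits(1))    # single pasted URL
print(estimate_credits(250))  # URL column with 250 rows
```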

From pricing pages to product updates to competitor blogs—this node grabs the gold so you don’t have to. Drop it into your flow and turn any URL into usable, structured insights. 🔍📊