# AI Web Scraping

> Scrape public web pages—one at a time or row by row—and turn content into summaries or structured data.

## What It Does

* **Fetches content** from public web pages using a static URL or a column of URLs
* **Two AI modes**: generate plain-English summaries or structured data mapped to new columns
* **Works row-by-row** (if you're using a column of URLs) or once for all rows (if a single URL is entered)
* **Custom prompts** guide what the AI returns—bullets, data fields, or summaries
* **Schema support** allows you to define the structure of extracted data (e.g., JSON → columns)

***

## 🏁 Getting Started

<Frame>
  <img src="https://mintcdn.com/nurturev/OitEIaKlfl7lCKCJ/images/AI%20Web%20Scraping%20node%20configuration%20screenshot.png?fit=max&auto=format&n=OitEIaKlfl7lCKCJ&q=85&s=4990c68020ecaa34de4dda891742c3b2" alt="AI Web Scraping node configuration screenshot" style={{ borderRadius: '0.5rem', width: '100%', margin: '1.5rem 0' }} width="1182" height="1664" data-path="images/AI Web Scraping node configuration screenshot.png" />
</Frame>

<Steps>
  <Step title="Add the Node">
    Drag the **AI Web Scraping** node into your workflow.
  </Step>

  <Step title="Choose URL Source">
    Either paste a **single Web URL** or select a **column of URLs**.
  </Step>

  <Step title="Select a Method">
    Choose between `Summary` (text output) or `Structured Output` (JSON → columns).
  </Step>

  <Step title="Write Your Prompt">
    Give the AI clear instructions—what to summarize or extract.
  </Step>

  <Step title="Add a Response JSON Schema (Structured Output only)">
    If using `Structured Output`, provide a **Response JSON Schema** to define how output fields map to columns. Skip this step for `Summary`.
  </Step>

  <Step title="(Optional) Name Your Output Column">
    For `Summary` only. The default is **SUMMARY**, but you can rename it.
  </Step>

  <Step title="Run the Node">
    Click **Run** to fetch the page content and enrich your table: once per row for a URL column, or once for all rows for a single URL.
  </Step>
</Steps>

***

## Inputs

### 🛠️ Required Fields

* **Web URL or Column (✅)**\
  Either paste a single URL or select a column containing URLs.\
  *Why it matters:* Controls whether the node runs once (for all rows) or row-by-row with different results.

* **Method (✅)**\
  Choose `Summary` or `Structured Output`.\
  *Why it matters:* `Summary` returns a single text column; `Structured Output` uses your schema to return multiple fields.

* **Prompt (✅)**\
  Write instructions like “Summarize in 3 bullets” or “Extract name, price, and features as JSON.”\
  *Why it matters:* The AI follows this prompt to determine what content to pull from each page.

### 🎯 Optional Fields

* **Response JSON Schema (⚪️)**\
  A sample JSON structure showing what you want the AI to return.\
  *Why you’d use it:* Required **only for `Structured Output`**. This tells the node how to turn the AI’s response into new columns.

* **Output Column Name (⚪️)**\
  For `Summary` only. Choose your output column name (default: `SUMMARY`).\
  *Why you’d use it:* Keeps naming aligned with your CRM or reporting conventions.
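
As an illustration of what a sample structure for the **Response JSON Schema** field might look like, here is a minimal Python sketch for a pricing page. The field names (`plan_name`, `price`, `key_benefits`) are hypothetical, and the idea that each top-level key becomes one new column is an assumption based on the "JSON → columns" description above, not confirmed node behavior.

```python
import json

# Hypothetical sample structure for a pricing page. The field names are
# illustrative assumptions, not names the node requires.
sample_schema = {
    "plan_name": "Pro",
    "price": "$49/month",
    "key_benefits": "Unlimited seats, priority support",
}

# Text you could paste into the Response JSON Schema field.
schema_text = json.dumps(sample_schema, indent=2)
print(schema_text)

# Assumption: each top-level key maps to one new column in the table.
expected_columns = list(sample_schema)
print(expected_columns)
```

Keeping the structure flat, with one key per column you want, makes it easier to check the result against your table after a single-URL test run.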

***

## Output

* A **`SUMMARY`** column (or custom name) when using `Summary` mode
* One or more **new columns** mapped from your JSON schema (when using `Structured Output`)

***

## How It Works

1. Detects whether you’ve provided a **static URL** or selected a **URL column**
2. For each run:
   * Fetches the webpage
   * Builds an AI prompt using your instructions (and schema, if any)
   * Sends it to the AI engine
   * Parses the result: either plain text or JSON
3. Appends new column(s) to the table:
   * Same output across all rows (if using one URL)
   * Row-specific output (if using a URL column)
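
The steps above can be sketched in Python. This is only an illustration of the described flow, not the node's actual implementation: `run_node`, `build_prompt`, and the prompt wording are hypothetical, and the page fetcher and AI engine are passed in as plain functions so the example runs offline with stand-ins.

```python
import json
from typing import Callable, Optional


def build_prompt(instructions: str, page_text: str, schema: Optional[str] = None) -> str:
    """Combine the user's instructions, the optional schema, and the page text."""
    parts = [instructions]
    if schema:
        parts.append("Return JSON shaped like:\n" + schema)
    parts.append("Page content:\n" + page_text)
    return "\n\n".join(parts)


def run_node(rows, instructions, fetch: Callable[[str], str], ask_ai: Callable[[str], str],
             *, static_url=None, url_column=None, schema=None, output_column="SUMMARY"):
    """Hypothetical sketch: return a copy of each row with the new column(s) appended."""

    def scrape(url):
        answer = ask_ai(build_prompt(instructions, fetch(url), schema))
        # Structured Output: parse JSON into fields. Summary: keep the plain text.
        return json.loads(answer) if schema else {output_column: answer}

    if static_url is not None:
        shared = scrape(static_url)  # one scrape, same output on every row
        return [{**row, **shared} for row in rows]
    # URL column: one scrape per row, row-specific output
    return [{**row, **scrape(row[url_column])} for row in rows]


# Stand-ins for the real fetcher and AI engine (assumptions for this demo).
pages = {"https://example.com/a": "Alpha page", "https://example.com/b": "Beta page"}


def fake_fetch(url):
    return pages[url]


def fake_ai(prompt):
    return "Summary of: " + prompt.rsplit("\n", 1)[-1]


rows = [{"url": "https://example.com/a"}, {"url": "https://example.com/b"}]
per_row = run_node(rows, "Summarize in one line.", fake_fetch, fake_ai, url_column="url")
same_for_all = run_node(rows, "Summarize in one line.", fake_fetch, fake_ai,
                        static_url="https://example.com/a")
```

With the URL column, each row gets its own `SUMMARY` value. With the static URL, both rows carry the same one.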

***

## 🚀 Example Use Cases & Prompts

| Use Case                    | Prompt                                                              |
| --------------------------- | ------------------------------------------------------------------- |
| Competitor Research         | “Summarize the 3 main features of this product.”                    |
| Pricing Table Extraction    | “Extract plan name, price, and key benefits as JSON.”               |
| Blog Summarization          | “Summarize this post in 3 key bullets for an executive briefing.”   |
| Change Monitoring           | “List new sections added since the last version of this page.”      |
| Product Catalog Structuring | “Extract product name, launch year, and category as a JSON object.” |

***

## ✨ Pro Tips

<Tip>
  **Know your method**: A JSON-style prompt alone won’t return structured output—be sure to select `Structured Output` and fill the **Response JSON Schema**.
</Tip>

<Tip>
  **Validate early**: Start with one URL and check your prompt and schema before scaling across hundreds of rows.
</Tip>

<Tip>
  **Be descriptive**: Use prompts like “Extract X, Y, and Z as JSON with keys a, b, and c” to guide the AI clearly.
</Tip>

<Tip>
  **Fill in the schema**: If using `Structured Output`, always complete the **Response JSON Schema**. Without it, the node won’t know how to structure the output.
</Tip>

***

## ⚠️ Important Considerations

<Warning>
  **Every row counts**: If using a URL column, each row is billed separately.
</Warning>

<Warning>
  **Same result across all rows**: If you paste a single URL, the output is **identical for every row** in your table.
</Warning>

<Warning>
  **Schema is not optional in `Structured Output`**: If you leave it blank, the node can’t map the AI’s response to columns, and the results may be unreliable.
</Warning>

<Warning>
  **Some websites might not work as expected**: If a page requires a login, hides content behind clicks, or loads slowly, the node might miss or skip parts of it.
</Warning>

***

## 🛠 Troubleshooting & Gotchas

| Symptom                       | Probable Cause                        | Quick Fix                                            |
| ----------------------------- | ------------------------------------- | ---------------------------------------------------- |
| All rows have the same result | A single URL was entered manually     | Use a column to scrape each row’s unique URL         |
| Summary looks generic or off  | Prompt was too broad or unclear       | Tweak the prompt to be more specific and focused     |
| Blank values in some rows     | Some rows had missing or invalid URLs | Check your URL column for empty or malformed entries |
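
To catch empty or malformed entries before a run, you could pre-check the URL column outside the node. The following is a minimal sketch using Python's standard `urllib.parse`. The helper names and the rule "must be `http` or `https` with a host" are assumptions for illustration, not the node's own validation logic.

```python
from urllib.parse import urlparse


def is_scrapable_url(value) -> bool:
    """Return True for a non-empty http(s) URL that includes a host."""
    if not isinstance(value, str) or not value.strip():
        return False
    try:
        parsed = urlparse(value.strip())
    except ValueError:  # raised for some malformed inputs, e.g. broken IPv6 hosts
        return False
    return parsed.scheme in ("http", "https") and bool(parsed.netloc)


def find_bad_rows(rows, url_column):
    """Return the indexes of rows whose URL is missing or malformed."""
    return [i for i, row in enumerate(rows) if not is_scrapable_url(row.get(url_column))]


rows = [
    {"url": "https://example.com"},
    {"url": ""},                     # empty
    {"url": "example.com/pricing"},  # missing scheme
    {"url": None},                   # missing value
]
bad_rows = find_bad_rows(rows, "url")
print(bad_rows)
```

Fixing or removing the flagged rows first avoids blank outputs for entries that could never be fetched.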

***

## 📝 FAQ

<AccordionGroup>
  <Accordion title="Can I scrape a list of URLs?">
    Yes—select a column that contains URLs, and the node will run one scrape per row.
  </Accordion>

  <Accordion title="Do I need to write JSON in my prompt?">
    Only in `Structured Output` mode—and you must also fill the **Response JSON Schema** field for it to work correctly.
  </Accordion>

  <Accordion title="What happens if a page redirects?">
    Redirects are followed automatically. The final URL is tracked behind the scenes.
  </Accordion>
</AccordionGroup>

***

## 💰 Pricing

> Each URL scraped, whether from a single value or a column, costs **5 credits**.

<Note>
  Credit usage varies by node depending on complexity and AI operations.\
  Check the documentation of each node for specific credit details.
</Note>
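
For a rough estimate before a run, the arithmetic can be sketched as follows. This assumes a single pasted URL counts as one scrape in total while a URL column counts as one scrape per row. The function name is hypothetical, and whether rows with empty URLs are billed is not stated here, so treat the column figure as an upper bound.

```python
CREDITS_PER_URL = 5  # rate stated on this page


def estimate_credits(row_count: int, uses_url_column: bool) -> int:
    """Estimate credits for a table with at least one row.

    Assumption: a static URL is scraped once in total,
    while a URL column is scraped once per row.
    """
    scrapes = row_count if uses_url_column else 1
    return scrapes * CREDITS_PER_URL


column_cost = estimate_credits(200, uses_url_column=True)   # 200 scrapes
static_cost = estimate_credits(200, uses_url_column=False)  # 1 scrape
print(column_cost, static_cost)
```

Under these assumptions, a 200-row table costs 1000 credits with a URL column and 5 credits with a single URL.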

***

<p style={{ fontSize: '1rem', fontWeight: 'bold', marginTop: '1.5rem' }}>
  From pricing pages to product updates to competitor blogs—this node grabs the gold so you don’t have to. Drop it into your flow and turn any URL into usable, structured insights. 🔍📊
</p>
