Limit (Top N) Node

What It Does

Restricts rows globally or within groups, limiting dataset size.
Optionally sorts rows by any specified column before the limit is applied.
Preprocesses selectively: grouping keys are normalized, while the sort column and all other columns are left unchanged.
Supports multiple grouping keys to limit rows per category.
Falls back gracefully to a global limit when limit_across_groups is true but no grouping keys are provided.

🏁 Getting Started

Add the Limit (Top N) Node

Drag and drop the Limit (Top N) Node into your workflow.

Define Limit Settings

Specify the number of rows to return, sorting options, and grouping keys if required.

Run the Workflow

Execute the workflow to limit the rows in the output DataFrame.

Monitor the Output

The output DataFrame will contain the same columns as the input with a limited number of rows based on your settings.

Inputs

Input Name	Type	Required	Description
`input_df_s3_url`	`Optional[str]`	Yes, if template variables are used	S3 URL to the input DataFrame (CSV/Parquet). Required when using template variables in settings.

Outputs

The node returns a List[Dict[str, Any]] where each dictionary contains:

Output Name	Type	Description
`s3_output_url`	`str`	S3 URL of the output DataFrame (Parquet format)
`s3_output_url_csv`	`str`	S3 URL of the output DataFrame (CSV format)
`file_info`	`Dict`	Contains metadata: `rows_count` (int), `columns_count` (int), `columns` (List[str])
`handle_condition`	`str`	Always `"_default"` for this node (no conditional outputs)
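
As an illustration of the shape described in the table above, the following sketch builds one output record in plain Python. The function name `describe_output`, the single-entry list, and the `s3://example-bucket/...` URLs are assumptions made for the example; only the field names come from the table.

```python
from typing import Any, Dict, List


def describe_output(
    columns: List[str],
    rows_count: int,
    parquet_url: str,
    csv_url: str,
) -> List[Dict[str, Any]]:
    """Build a result list shaped like the node's documented output (hypothetical helper)."""
    record: Dict[str, Any] = {
        "s3_output_url": parquet_url,        # Parquet copy of the output DataFrame
        "s3_output_url_csv": csv_url,        # CSV copy of the same data
        "file_info": {
            "rows_count": rows_count,
            "columns_count": len(columns),
            "columns": list(columns),
        },
        "handle_condition": "_default",      # this node has no conditional outputs
    }
    return [record]


# Placeholder URLs, not real locations.
result = describe_output(
    columns=["region", "score"],
    rows_count=2,
    parquet_url="s3://example-bucket/limited.parquet",
    csv_url="s3://example-bucket/limited.csv",
)
```

A downstream step would typically read `result[0]["s3_output_url"]` and use `file_info` to confirm that the row count matches the configured limit.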

Output DataFrame Structure

The output DataFrame will contain the same columns as the input DataFrame, with the following characteristics:

All input columns preserved: No columns are added or removed.
Row count limited: The number of rows is reduced based on limit settings.
Selective preprocessing: Only grouping keys are preprocessed.
- Grouping keys: Converted to string format with nulls replaced by '(Empty)'.
- Column to sort: No preprocessing; uses pandas default null handling.
- Other columns: Preserved in original format.
Sorting applied: If specified, rows are sorted by the designated column using pandas default null handling.
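
The grouping-key preprocessing described above can be approximated in pandas as follows. This is a minimal sketch, not the node's actual code: the helper name `preprocess_grouping_keys` and the sample columns are hypothetical.

```python
from typing import List

import pandas as pd


def preprocess_grouping_keys(df: pd.DataFrame, grouping_keys: List[str]) -> pd.DataFrame:
    """Return a copy where only the grouping key columns are normalized."""
    out = df.copy()
    for key in grouping_keys:
        # Nulls become the literal "(Empty)"; everything else is cast to str.
        out[key] = out[key].map(lambda v: "(Empty)" if pd.isna(v) else str(v))
    return out


sample = pd.DataFrame(
    {
        "region": ["EU", None, "US"],   # grouping key, contains a null
        "score": [1.0, None, 3.0],      # sort column, left untouched
    }
)
prepared = preprocess_grouping_keys(sample, ["region"])
```

After this step the `region` column holds only strings, so null keys form their own `(Empty)` group instead of being dropped by a later group-by, while `score` keeps its original dtype and its null.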

How It Works

Data Loading: Loads input data from S3 using the data loading helper.
Field Validation: Ensures all referenced columns exist in the input data.
Data Preprocessing:
- Grouping keys: Converts to string format and replaces nulls with '(Empty)'.
- Column to sort: No preprocessing applied - uses pandas default null handling.
- Other columns: Left unchanged.
Sorting: If column_to_sort is specified, sorts data before applying the limit, using pandas default null handling.
Limit Logic Application:
- Without Grouping: Applies the limit to the entire dataset.
- With Grouping: Groups data by the specified keys, applies the limit to each group, then combines the results.
- Graceful Fallback: If limit_across_groups is True but no grouping keys are provided, behaves as if no grouping was requested.
Test Mode: If enabled, limits output to 5 rows regardless of limit setting.
Output Generation: Saves results to S3 in both Parquet and CSV formats.
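
The steps above can be sketched in pandas as shown below. This is an approximation under stated assumptions, not the node's implementation: the function name `limit_rows`, the `limit`, `ascending`, and `test_mode` parameters, and the choice to apply the test-mode cap to the final result are all hypothetical. S3 loading and saving are omitted.

```python
from typing import List, Optional

import pandas as pd

TEST_MODE_ROWS = 5  # the documented test-mode row count


def limit_rows(
    df: pd.DataFrame,
    limit: int,
    grouping_keys: Optional[List[str]] = None,
    limit_across_groups: bool = False,
    column_to_sort: Optional[str] = None,
    ascending: bool = True,
    test_mode: bool = False,
) -> pd.DataFrame:
    keys = list(grouping_keys or [])

    # Field validation: every referenced column must exist.
    referenced = keys + ([column_to_sort] if column_to_sort else [])
    missing = [c for c in referenced if c not in df.columns]
    if missing:
        raise ValueError(f"Columns not found in input: {missing}")

    out = df.copy()

    # Preprocessing: only grouping keys are touched.
    for key in keys:
        out[key] = out[key].map(lambda v: "(Empty)" if pd.isna(v) else str(v))

    # Sorting happens before the limit; nulls follow the pandas default.
    if column_to_sort:
        out = out.sort_values(column_to_sort, ascending=ascending, kind="stable")

    # Limit per group, or globally (also the fallback when no keys are given).
    if limit_across_groups and keys:
        out = out.groupby(keys, sort=False).head(limit)
    else:
        out = out.head(limit)

    # Assumption: test mode caps the final output at 5 rows.
    if test_mode:
        out = out.head(TEST_MODE_ROWS)

    return out.reset_index(drop=True)


customers = pd.DataFrame(
    {
        "region": ["EU", "EU", "US", "US", None],
        "score": [5, 9, 1, 7, 3],
    }
)
top_per_region = limit_rows(
    customers,
    limit=1,
    grouping_keys=["region"],
    limit_across_groups=True,
    column_to_sort="score",
    ascending=False,
)
```

Here `groupby(...).head(limit)` keeps the first `limit` rows of each group in their sorted order, which is one way to "combine results" without reshuffling rows.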

🚀 Example Use Cases & Prompts

Use Case	Setup or Prompt Example
Sampling Large Datasets	Limit to a small number of rows for a preview
Top N by Metric	Limit to top N rows based on a sorting column (e.g., `score`)
Grouped Limiting	Limit rows within groups (e.g., top N customers per region)
Performance Optimization	Reduce the dataset size for faster processing

✨ Pro Tips

Use grouping_keys for applying limits within different categories (e.g., top N customers per region).

If your dataset is large, use Test Mode to preview the output with just 5 rows for quick validation.

⚠️ Important Considerations

If limit_across_groups is set to True but no grouping_keys are provided, the node will behave as if limit_across_groups is False.

Sorting uses pandas default null handling: with the default `na_position='last'`, null values are placed at the end for both ascending and descending sorts.
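
A quick way to confirm where nulls land is to sort a small frame in both directions. The sketch below uses plain pandas `sort_values` with its default `na_position="last"`; the column name and data are made up for the example, and the node's own behavior is best verified against its actual output.

```python
import pandas as pd

scores = pd.DataFrame({"score": [2.0, None, 3.0, 1.0]})

# Default na_position is "last" for both sort directions.
ascending_sorted = scores.sort_values("score", ascending=True)
descending_sorted = scores.sort_values("score", ascending=False)

# Position of the null row in each result (True marks the null).
nulls_ascending = ascending_sorted["score"].isna().tolist()
nulls_descending = descending_sorted["score"].isna().tolist()
```

Because nulls sort last, a small limit on a descending sort returns the highest non-null values first, and null rows appear only if the limit exceeds the number of non-null rows.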

🛠 Troubleshooting & Gotchas

Symptom	Likely Cause	Quick Fix
Limit applied to the whole dataset instead of per group	Missing `grouping_keys`	Set `grouping_keys` when `limit_across_groups` is True; without them the node falls back to a global limit.
Unexpected row order	Sort settings issue	Verify `column_to_sort` and `sorting_order` settings.
No data found	Invalid S3 URL	Ensure the correct S3 URL is provided for the input DataFrame.

📝 FAQ

Can I apply a limit within groups?

Yes, set limit_across_groups to true and specify grouping_keys to limit rows within each group.

What happens if no sorting column is specified?

The node will apply the limit to the unsorted data, returning rows in their original order.

💰 Pricing

The Limit (Top N) Node incurs no additional cost for limiting rows.

Action	Credit Cost
Row Limiting	0 credits

There is no charge for this node unless it’s used in conjunction with other nodes that incur charges.

Drop this node into your flow to efficiently limit the number of rows and optimize data processing. 🚀
