
What It Does

  • Restrict rows globally or within groups, limiting dataset size.
  • Optional sorting of rows based on any specified column.
  • Selective data preprocessing: only grouping keys are normalized; the sort column and all other columns are left unchanged.
  • Supports multiple grouping keys to limit rows per category.
  • Graceful fallback when limit_across_groups is true but no grouping keys are provided.

🏁 Getting Started

(Screenshot: Limit (Top N) Node configuration)

  1. Add the Limit (Top N) Node: Drag and drop the Limit (Top N) Node into your workflow.
  2. Define Limit Settings: Specify the number of rows to return, sorting options, and grouping keys if required.
  3. Run the Workflow: Execute the workflow to limit the rows in the output DataFrame.
  4. Monitor the Output: The output DataFrame contains the same columns as the input, with the number of rows limited according to your settings.

Inputs

| Input Name | Type | Required | Description |
|---|---|---|---|
| input_df_s3_url | Optional[str] | Yes, if template variables are used | S3 URL of the input DataFrame (CSV or Parquet). Required when settings use template variables. |

Outputs

The node returns a List[Dict[str, Any]] in which each dictionary contains:

| Output Name | Type | Description |
|---|---|---|
| s3_output_url | str | S3 URL of the output DataFrame (Parquet format) |
| s3_output_url_csv | str | S3 URL of the output DataFrame (CSV format) |
| file_info | Dict | Metadata: rows_count (int), columns_count (int), columns (List[str]) |
| handle_condition | str | Always "_default" for this node (no conditional outputs) |
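The shape of this return value can be illustrated with a minimal sketch, assuming the output DataFrame has already been written to S3 (the upload itself is omitted). The function name build_output_record and the bucket URLs are hypothetical; only the dictionary keys and their types come from the Outputs table.

```python
from typing import Any, Dict, List

import pandas as pd


def build_output_record(
    df: pd.DataFrame, parquet_url: str, csv_url: str
) -> List[Dict[str, Any]]:
    """Hypothetical helper: assemble the node's return value for a saved DataFrame."""
    file_info = {
        "rows_count": int(len(df)),          # number of rows after limiting
        "columns_count": int(df.shape[1]),   # same column count as the input
        "columns": [str(c) for c in df.columns],
    }
    return [
        {
            "s3_output_url": parquet_url,    # Parquet copy
            "s3_output_url_csv": csv_url,    # CSV copy
            "file_info": file_info,
            "handle_condition": "_default",  # this node has no conditional outputs
        }
    ]


result = build_output_record(
    pd.DataFrame({"region": ["EU", "US"], "score": [9, 7]}),
    "s3://example-bucket/limit_output.parquet",  # hypothetical URL
    "s3://example-bucket/limit_output.csv",      # hypothetical URL
)
print(result[0]["file_info"])
# {'rows_count': 2, 'columns_count': 2, 'columns': ['region', 'score']}
```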

Output DataFrame Structure

The output DataFrame contains the same columns as the input DataFrame, with the following characteristics:
  • All input columns preserved: No columns are added or removed.
  • Row count limited: The number of rows is reduced according to the limit settings.
  • Selective preprocessing: Only grouping keys are preprocessed.
    • Grouping keys: Nulls are replaced with ‘(Empty)’ and values are converted to strings.
    • Column to sort: Not preprocessed; pandas' default null handling applies when sorting.
    • Other columns: Preserved in their original format.
  • Sorting applied: If a sort column is specified, rows are sorted by it using pandas' default null handling.
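The grouping-key preprocessing described above can be sketched in pandas, assuming the node behaves as documented; preprocess_grouping_keys is a hypothetical helper name. Filling nulls before casting to str matters here: casting first would turn missing values into literal strings such as 'nan' or 'None'.

```python
import pandas as pd


def preprocess_grouping_keys(df: pd.DataFrame, grouping_keys: list[str]) -> pd.DataFrame:
    """Return a copy in which only the grouping-key columns are normalized."""
    out = df.copy()
    for key in grouping_keys:
        # Replace nulls first, then cast, so nulls become "(Empty)" rather than "nan".
        out[key] = out[key].fillna("(Empty)").astype(str)
    return out  # every other column, including the sort column, is untouched


raw = pd.DataFrame({"region": ["EU", None, "US"], "score": [3.0, None, 5.0]})
clean = preprocess_grouping_keys(raw, ["region"])
print(clean["region"].tolist())     # ['EU', '(Empty)', 'US']
print(clean["score"].isna().sum())  # 1: score is not a grouping key, so its null survives
```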

How It Works

  1. Data Loading: Loads input data from S3 using the data loading helper.
  2. Field Validation: Ensures all referenced columns exist in the input data.
  3. Data Preprocessing:
    • Grouping keys: Nulls are replaced with ‘(Empty)’ and values are converted to strings.
    • Column to sort: Not preprocessed; pandas' default null handling applies when sorting.
    • Other columns: Left unchanged.
  4. Limit Logic Application:
    • Without Grouping: Applies limit to the entire dataset.
    • With Grouping: Groups data by specified keys, applies limit to each group, then combines results.
    • Graceful Fallback: If limit_across_groups is True but no grouping keys are provided, the limit is applied to the entire dataset, as if no grouping were configured.
  5. Sorting: If column_to_sort is specified, the data is sorted (using pandas' default null handling) before the limit is applied.
  6. Test Mode: If enabled, limits output to 5 rows regardless of limit setting.
  7. Output Generation: Saves results to S3 in both Parquet and CSV formats.
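The steps above can be condensed into a minimal pandas sketch. It assumes the input is already loaded into a DataFrame (S3 loading and saving are omitted). The function name limit_top_n and the parameters limit, ascending and test_mode are hypothetical; column_to_sort, grouping_keys and limit_across_groups mirror the setting names used on this page. The stable mergesort and applying test mode as a final 5-row cap are choices made for this sketch, not confirmed node behavior.

```python
from typing import Optional, Sequence

import pandas as pd

TEST_MODE_ROWS = 5  # test mode caps the output at 5 rows


def limit_top_n(
    df: pd.DataFrame,
    limit: int,
    column_to_sort: Optional[str] = None,
    ascending: bool = True,
    grouping_keys: Optional[Sequence[str]] = None,
    limit_across_groups: bool = False,
    test_mode: bool = False,
) -> pd.DataFrame:
    keys = list(grouping_keys or [])

    # Field validation: every referenced column must exist in the input.
    referenced = keys + ([column_to_sort] if column_to_sort else [])
    missing = [c for c in referenced if c not in df.columns]
    if missing:
        raise KeyError(f"Columns not found in input: {missing}")

    out = df.copy()
    # Preprocess grouping keys only: nulls -> "(Empty)", then cast to str.
    for key in keys:
        out[key] = out[key].fillna("(Empty)").astype(str)

    # Sort before limiting; na_position is left at its pandas default ("last").
    if column_to_sort:
        out = out.sort_values(column_to_sort, ascending=ascending, kind="mergesort")

    if limit_across_groups and keys:
        # Keep the first `limit` rows of each group, preserving the sorted order.
        out = out.groupby(keys, sort=False).head(limit)
    else:
        # No grouping, or the graceful fallback when no keys were provided.
        out = out.head(limit)

    if test_mode:
        # Assumption: test mode is applied as a final cap on the result.
        out = out.head(TEST_MODE_ROWS)

    return out.reset_index(drop=True)


sales = pd.DataFrame(
    {
        "region": ["EU", "EU", "US", "US", None],
        "customer": ["a", "b", "c", "d", "e"],
        "score": [10, 30, 20, 40, 50],
    }
)

# Top 1 customer per region by descending score.
top1 = limit_top_n(
    sales,
    limit=1,
    column_to_sort="score",
    ascending=False,
    grouping_keys=["region"],
    limit_across_groups=True,
)
print(top1["customer"].tolist())  # ['e', 'd', 'b']
```

Because groupby(...).head() keeps rows in their existing order, sorting first and then taking the head of each group yields the top N per group without a separate per-group sort.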

🚀 Example Use Cases & Prompts

| Use Case | Setup or Prompt Example |
|---|---|
| Sampling Large Datasets | Limit to a small number of rows for a preview |
| Top N by Metric | Limit to the top N rows based on a sorting column (e.g., score) |
| Grouped Limiting | Limit rows within groups (e.g., top N customers per region) |
| Performance Optimization | Reduce the dataset size for faster processing |

✨ Pro Tips

Use grouping_keys for applying limits within different categories (e.g., top N customers per region).
If your dataset is large, use Test Mode to preview the output with just 5 rows for quick validation.

⚠️ Important Considerations

If limit_across_groups is set to True but no grouping_keys are provided, the node will behave as if limit_across_groups is False.
Sorting uses pandas' default null handling (na_position="last"): null values are placed at the end for both ascending and descending sorts, so they are only included in a top-N result when there are fewer than N non-null rows.
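In pandas, DataFrame.sort_values places nulls according to its na_position argument, which defaults to "last" regardless of sort direction. A quick check, assuming the node calls sort_values without overriding na_position:

```python
import pandas as pd

scores = pd.DataFrame({"customer": ["a", "b", "c"], "score": [2.0, None, 1.0]})

asc = scores.sort_values("score")                    # na_position defaults to "last"
desc = scores.sort_values("score", ascending=False)  # still "last" when descending

print(asc["customer"].tolist())   # ['c', 'a', 'b']
print(desc["customer"].tolist())  # ['a', 'c', 'b']
```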

🛠 Troubleshooting & Gotchas

| Symptom | Likely Cause | Quick Fix |
|---|---|---|
| No rows in output | Missing grouping_keys | Ensure grouping_keys is set if limit_across_groups is True. |
| Unexpected row order | Sorting misconfigured | Verify the column_to_sort and sorting_order settings. |
| No data found | Invalid S3 URL | Ensure the correct S3 URL is provided for the input DataFrame. |

πŸ“ FAQ

Can I limit rows within each group?
Yes, set limit_across_groups to True and specify grouping_keys to limit rows within each group.

What happens if I don't specify a column to sort?
The node applies the limit to the unsorted data, returning rows in their original order.

💰 Pricing

The Limit (Top N) Node incurs no additional cost for limiting rows.

| Action | Credit Cost |
|---|---|
| Row Limiting | 0 credits |

This node itself is free; charges apply only to other nodes in the workflow that incur them.

Drop this node into your flow to efficiently limit the number of rows and optimize data processing. 🚀