Building a TaskEstimator#
The TaskEstimator trait controls how many distributed tasks the planner allocates to each stage of the query plan.
The number of tasks is assigned to the different stages in a bottom-up fashion. See
How a Distributed Plan is Built for the overall picture, and the
inject_network_boundaries
source for the details. A TaskEstimator is what hints this process how many tasks should be used.
While a default implementation exists for file-based DataSourceExec nodes (those backed by FileScanConfig), you
can provide custom TaskEstimator implementations for your own ExecutionPlan types.
Providing your own TaskEstimator#
Providing a TaskEstimator allows you to do three things:
Tell the distributed planner how many tasks should be used for your own
ExecutionPlans (task_estimation).Prepare the leaf node at planning time for distributed execution (
scale_up_leaf_node). The recommended approach is to return aDistributedLeafExecwrapping your original plan along withNper-task variants (one per task).DistributedLeafExecis transparent to network boundaries—it exposes the same partition count as the original—and the task spawner automatically replaces it with the appropriate per-task variant before serialising the plan and sending it to a worker.If your leaf node already handles task dispatch internally (e.g., by reading
DistributedTaskContext.task_indexinexecute()), you can omitDistributedLeafExecand simply return the prepared plan directly fromscale_up_leaf_node.Route each task to a specific worker URL (
route_tasks) — see Routing tasks to specific workers below.
There’s a complete example in the examples/ folder:
custom_execution_plan.rs - A complete example showing how to implement a custom execution plan (
numbers(start, end)table function) that works with distributed DataFusion, including a custom codec and TaskEstimator.
A
TaskEstimatordecides a leaf node’s work at planning time. If your leaf’s units of work are only discovered at runtime (paginated APIs, queues, progressive discovery), see Work Unit Feeds for the complementary mechanism.
Routing tasks to specific workers#
By default, the planner spreads a stage’s tasks across the available workers round-robin. When a task’s
data has a home — a worker that already holds it in a cache or on local disk — you can send the task
there instead by implementing TaskEstimator::route_tasks, which returns the worker URL for each task
(returning None, the default, keeps the round-robin behaviour). Routing pairs naturally with
scale_up_leaf_node: that decides what data task i reads, and route_tasks decides where it runs.
For a complete, runnable walkthrough — parquet files consistently routed to workers (by rendezvous hashing of the file path) so each worker can serve them from an in-memory cache on repeat queries — see the custom_worker_url_routing.rs example.