DataFusion Distributed

DataFusion Distributed is a library that adds distributed execution capabilities to Apache DataFusion.

These docs explain how to use the library to build your own Distributed DataFusion cluster and how to contribute changes to the library.

User Guide

  • Concepts
  • Public API
    • DistributedPhysicalOptimizerRule
    • Worker
    • WorkerResolver
    • TaskEstimator
    • DistributedTaskContext
    • ChannelResolver
  • Getting Started
    • How to use Distributed DataFusion
    • Next steps
  • Spawn a Worker
    • Overview
    • Launching the Arrow Flight server
    • WorkerSessionBuilder
    • Serving the Endpoint
  • Implementing a WorkerResolver
    • Static WorkerResolver
    • Dynamic WorkerResolver
  • Building a ChannelResolver
    • Providing your own ChannelResolver
  • Building a TaskEstimator
    • Providing your own TaskEstimator
  • How a Distributed Plan is Built

Contributor Guide

  • Index
  • Setup
    • Prerequisites
    • Clone and Setup
    • Running Examples
    • Resources
  • Tests
    • Running Unit Tests
    • Running Integration Tests
    • Resources
  • Benchmarks
    • Local Benchmarks
    • Remote Benchmarks
