Research | February 3, 2026

Scaling to a Million Units: Why Hierarchical Coordination Wins

Discrete event simulation shows that hierarchical coordination scales to more than one million nodes, while centralized architectures bottleneck at roughly 10,000. The optimal coordinator duty cycle is 24-48 hours.

Research Team, Project Dyson

The Dyson swarm will eventually comprise millions of autonomous units. We built a discrete event simulator to answer the critical question: What coordination architecture can scale that far?

The Three Questions

This simulation addresses three interrelated research questions:

  • RQ-1-24: How do coordination architectures scale to millions of units?
  • RQ-1-39: What's the optimal duty cycle for cluster coordinators?
  • RQ-2-17: At what fleet size do coordination constraints dominate?

The Key Finding: Hierarchy is Essential

Hierarchical coordination scales to 1M+ nodes; centralized hits bottlenecks at ~10,000.

Architecture    Scalability Limit    Communication Overhead
Centralized     ~10,000 nodes        5-15%
Hierarchical    1,000,000+ nodes     2-8%
Mesh            ~100,000 nodes       10-25%

The centralized approach fails not because of bandwidth, but because of message processing latency at the central node.

Why Centralized Fails

In a centralized architecture:

  • Every node reports to a single coordinator
  • Coordinator processing load grows as O(N)
  • At 10,000 nodes, queueing delay at the coordinator exceeds acceptable latency
  • At 100,000 nodes, the system becomes unresponsive

The bottleneck isn't bandwidth—it's processing time.
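
As a rough illustration of why processing, not bandwidth, is the limit, the sketch below models the coordinator as a single queue. The report interval (10 s) and per-message service time (1 ms) are illustrative assumptions, not values from our simulator; with them, the coordinator saturates right around 10,000 nodes and queueing delay diverges.

    # Back-of-envelope queueing model of a single central coordinator.
    # Assumptions (hypothetical): each node reports every 10 s, and the
    # coordinator spends 1 ms of processing per message.

    def utilization(num_nodes, report_interval_s=10.0, service_time_s=0.001):
        arrival_rate = num_nodes / report_interval_s   # messages per second
        return arrival_rate * service_time_s           # fraction of capacity used

    def mean_queue_delay_s(num_nodes, report_interval_s=10.0, service_time_s=0.001):
        """M/M/1 mean waiting time; diverges as utilization approaches 1."""
        rho = utilization(num_nodes, report_interval_s, service_time_s)
        if rho >= 1.0:
            return float("inf")                        # queue grows without bound
        return rho * service_time_s / (1.0 - rho)

    for n in (1_000, 5_000, 9_500, 10_000, 100_000):
        print(f"{n:>7} nodes: utilization {utilization(n):4.2f}, "
              f"mean queue delay {mean_queue_delay_s(n):.3f} s")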

Why Mesh Becomes Inefficient

Mesh topology provides excellent resilience but:

  • Message complexity: O(N²) for gossip protocols
  • At 100,000 nodes, overhead exceeds 25% of communication bandwidth
  • Coordination consistency becomes unreliable

Mesh works well for small clusters but not swarm-scale operations.
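
To see why the overhead fraction grows with swarm size, note that total mesh traffic scales as O(N²) while aggregate bandwidth only scales as O(N), so the overhead fraction grows roughly linearly in N. The calibration below (0.025% overhead in a 100-node cluster) is a hypothetical anchor chosen only to illustrate that linear growth; with it, overhead reaches the ~25% level at 100,000 nodes.

    # O(N^2) total gossip traffic over O(N) aggregate bandwidth means the
    # overhead fraction grows ~linearly with N. The 100-node calibration point
    # is assumed for illustration, not taken from the simulator.

    BASE_NODES = 100
    BASE_OVERHEAD = 0.00025            # assumed 0.025% overhead at 100 nodes

    def mesh_overhead(num_nodes):
        return BASE_OVERHEAD * (num_nodes / BASE_NODES)

    for n in (100, 1_000, 10_000, 100_000):
        print(f"{n:>7} nodes: estimated mesh overhead {mesh_overhead(n):.3%}")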

The Hierarchical Solution

The hierarchical architecture uses ~100-node clusters with rotating coordinators:

       [Ground Control]
              |
       [Regional Coordinators] (10-100)
              |
       [Cluster Coordinators] (100-1000)
              |
       [Node Clusters] (50-100 nodes each)

Message complexity: O(N log N), which remains scalable to millions of nodes.
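
For a feel of how these growth rates diverge, the sketch below compares the three scalings cited in this post, with all constant factors set to 1. It is illustrative only; for the centralized case the real issue is that its O(N) messages all funnel through one processor.

    import math

    # Relative message counts per coordination cycle under the scalings cited
    # above. Constant factors are set to 1; only the growth rates matter here.

    def messages(topology, n):
        if topology == "centralized":
            return n                     # every node reports to one coordinator
        if topology == "hierarchical":
            return n * math.log2(n)      # reports aggregated up a log-depth tree
        if topology == "mesh":
            return n * (n - 1)           # pairwise gossip exchanges
        raise ValueError(topology)

    for n in (10_000, 100_000, 1_000_000):
        row = ", ".join(f"{t}: {messages(t, n):.2e}"
                        for t in ("centralized", "hierarchical", "mesh"))
        print(f"{n:>9} nodes -> {row}")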

Coordinator Duty Cycle: The 24-Hour Sweet Spot

The simulation tested duty cycles from 1 hour to 7 days:

Duty Cycle   Power Variance   Handoff Success   Availability
1 hour       <5%              95%               99.9%
6 hours      8%               98%               99.8%
24 hours     12%              99.5%             99.5%
48 hours     18%              99.8%             99.2%
7 days       35%              99.9%             98%

A 24-48 hour duty cycle provides the best balance (quantified in the sketch after this list):

  • Low enough handoff frequency to minimize overhead
  • Short enough exposure to limit single-point-of-failure risk
  • Predictable timing for handoff scheduling
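
The handoff side of that trade-off is easy to quantify: overhead per term is the state-transfer time divided by the duty cycle. The sketch below uses a 10 s worst-case transfer time, an assumption taken from the 1-10 s range quoted in the state transfer section below.

    # Handoff overhead and handoff count as a function of coordinator duty cycle.
    # Assumes the 10 s worst-case state-transfer time quoted below.

    HANDOFF_S = 10.0

    def handoff_overhead(duty_cycle_hours):
        """Fraction of a coordinator term spent handing over state."""
        return HANDOFF_S / (duty_cycle_hours * 3600.0)

    def handoffs_per_cluster_year(duty_cycle_hours):
        return 365.0 * 24.0 / duty_cycle_hours

    for hours in (1, 6, 24, 48, 7 * 24):
        print(f"{hours:>4} h cycle: handoff overhead {handoff_overhead(hours):.2e}, "
              f"{handoffs_per_cluster_year(hours):6.0f} handoffs per cluster-year")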

Power Budget Implications

Coordinator duty comes with power overhead:

  • Baseline node: 5 W average
  • Coordinator mode: 15-20 W average

With 24-hour duty cycles in 100-node clusters:

  • Each node serves as coordinator ~1% of the time
  • Average power impact: ~0.15 W per node
  • Acceptable within power budget
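
The arithmetic behind the ~0.15 W figure, assuming a uniform rotation so that each node serves one day in every hundred:

    # Average per-node power cost of coordinator duty with a uniform 24-hour
    # rotation in a 100-node cluster (each node serves ~1% of the time).

    CLUSTER_SIZE = 100
    P_BASE_W = 5.0
    duty_fraction = 1.0 / CLUSTER_SIZE

    for p_coord_w in (15.0, 20.0):        # coordinator-mode power range
        extra_w = duty_fraction * (p_coord_w - P_BASE_W)
        print(f"coordinator at {p_coord_w:.0f} W -> average extra {extra_w:.2f} W per node")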

State Transfer Requirements

Each coordinator handoff requires transferring:

  • Ephemeris catalog: 10-50 MB
  • Conjunction queue: 1-5 MB
  • Routing tables: 0.5-1 MB

Total transfer time: 1-10 seconds over an optical inter-satellite link (ISL).

This is fast enough to complete handoffs without disrupting cluster operations.
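
The post does not specify the ISL data rate, so the sketch below treats it as a free parameter; a link in the 50-100 Mbps range (a hypothetical value) reproduces the quoted 1-10 second window for the 11.5-56 MB payload.

    # Handoff transfer time as a function of an assumed optical ISL data rate.
    # The 50-100 Mbps figures are hypothetical; only the payload sizes come
    # from the list above.

    PAYLOAD_MB = {
        "ephemeris catalog": (10.0, 50.0),
        "conjunction queue": (1.0, 5.0),
        "routing tables": (0.5, 1.0),
    }

    low_mb = sum(lo for lo, _ in PAYLOAD_MB.values())    # 11.5 MB
    high_mb = sum(hi for _, hi in PAYLOAD_MB.values())   # 56 MB

    def transfer_time_s(megabytes, link_mbps):
        return megabytes * 8.0 / link_mbps

    for link_mbps in (50, 100):
        print(f"{link_mbps} Mbps link: {transfer_time_s(low_mb, link_mbps):.1f}-"
              f"{transfer_time_s(high_mb, link_mbps):.1f} s per handoff")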

The 50,000-Node Inflection Point

For manufacturing fleet coordination (RQ-2-17), the simulation identifies:

Fleet Size   Hierarchical Overhead   Coordination Viable?
1,000        1%                      Yes
10,000       2%                      Yes
50,000       4%                      Inflection point
100,000      6%                      Marginal
500,000      10%                     Requires optimization

At around 50,000 nodes, coordination overhead approaches 5%, our target threshold for acceptable overhead, and it crosses that threshold before 100,000 nodes. Beyond this point, additional optimizations are required.
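
Reading the table as roughly log-linear in fleet size, a quick interpolation places the exact 5% crossing between the 50,000- and 100,000-node rows. This is just a reading of the table above, not an additional simulation result.

    import math

    # Piecewise log-linear interpolation of the overhead table to locate where
    # hierarchical overhead crosses the 5% target.

    TABLE = [(1_000, 0.01), (10_000, 0.02), (50_000, 0.04),
             (100_000, 0.06), (500_000, 0.10)]

    def overhead_at(fleet_size):
        for (n0, o0), (n1, o1) in zip(TABLE, TABLE[1:]):
            if n0 <= fleet_size <= n1:
                frac = math.log(fleet_size / n0) / math.log(n1 / n0)
                return o0 + frac * (o1 - o0)
        raise ValueError("fleet size outside tabulated range")

    fleet = 50_000
    while overhead_at(fleet) < 0.05:
        fleet += 1_000
    print(f"5% overhead is crossed near {fleet:,} nodes")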

Recommendations for Phase 1

1. Implement Hierarchical Architecture from Day One

Don't start with centralized and migrate later—design for hierarchy from the beginning.

2. Use 100-Node Clusters with 24-Hour Rotation

This provides:

  • Manageable cluster size for coordination
  • Predictable handoff scheduling
  • Balanced power distribution

3. Design for 1M+ Node Scalability

Even if Phase 1 deploys only 10,000 units, the architecture must support Phase 2 scale.

4. Limit Per-Node Bandwidth to 0.5-1 kbps

This constraint ensures the architecture scales without bandwidth bottlenecks.
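
To check that this budget holds up the tree, the sketch below sums per-node traffic at each tier for a 1,000,000-node swarm in 100-node clusters. The 100 regional coordinators are an assumption within the 10-100 range in the diagram above, and no aggregation at intermediate tiers is assumed, which is pessimistic since coordinators would normally summarize reports before forwarding.

    # Aggregate uplink load per tier for a 1,000,000-node swarm at 0.5-1 kbps
    # per node. Assumes 100-node clusters, 100 regional coordinators (within
    # the diagram's 10-100 range), and no aggregation at intermediate tiers.

    TOTAL_NODES = 1_000_000
    CLUSTER_SIZE = 100
    REGIONS = 100

    clusters = TOTAL_NODES // CLUSTER_SIZE
    clusters_per_region = clusters // REGIONS

    for per_node_bps in (500, 1_000):
        cluster_bps = CLUSTER_SIZE * per_node_bps
        region_bps = clusters_per_region * cluster_bps
        ground_bps = REGIONS * region_bps
        print(f"{per_node_bps} bps/node: cluster {cluster_bps/1e3:.0f} kbps, "
              f"region {region_bps/1e6:.1f} Mbps, ground {ground_bps/1e6:.0f} Mbps")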

Try It Yourself

We've published the interactive simulator so you can explore coordination architectures. Adjust node count, topology, cluster size, and duty cycle to see how overhead and scalability change.

Methodology

The simulation uses:

  • Discrete event simulation with message passing
  • Topology modeling (centralized, hierarchical, mesh)
  • Power consumption profiles for coordinator vs baseline nodes
  • 50-100 Monte Carlo runs per configuration

Results should be read as relative comparisons between architectures rather than absolute performance predictions.
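
For readers who want to reproduce the approach, the skeleton below shows the core of a message-passing discrete event simulation: a time-ordered event heap driving periodic node reports into a coordinator queue. It is a minimal sketch, not the project's simulator; the node count, report interval, and jitter are arbitrary illustrative values.

    import heapq
    import random

    # Minimal discrete-event skeleton in the spirit of the methodology above:
    # a time-ordered event heap plus periodic node-report events.

    class Sim:
        def __init__(self, seed=0):
            self.now = 0.0
            self.events = []              # heap of (time, sequence, callback)
            self.seq = 0
            self.rng = random.Random(seed)

        def schedule(self, delay, callback):
            heapq.heappush(self.events, (self.now + delay, self.seq, callback))
            self.seq += 1

        def run(self, until):
            while self.events and self.events[0][0] <= until:
                self.now, _, callback = heapq.heappop(self.events)
                callback()

    def node_report(sim, node_id, coordinator_inbox, interval_s=60.0):
        """One node's periodic status report to its cluster coordinator."""
        coordinator_inbox.append((sim.now, node_id))
        sim.schedule(interval_s * sim.rng.uniform(0.9, 1.1),
                     lambda: node_report(sim, node_id, coordinator_inbox, interval_s))

    sim = Sim()
    inbox = []
    for node in range(100):               # one 100-node cluster
        sim.schedule(sim.rng.uniform(0.0, 60.0),
                     lambda n=node: node_report(sim, n, inbox))
    sim.run(until=3600.0)
    print(f"coordinator received {len(inbox)} reports in one simulated hour")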

What's Next

This research answers RQ-1-24, RQ-1-39, and RQ-2-17, providing a validated coordination architecture for Phase 1 and Phase 2. The hierarchical approach with rotating coordinators is now the baseline design.

Remaining work:

  • Spatial partitioning algorithm benchmarking
  • Adaptive rotation policy evaluation
  • Hardware-in-the-loop validation

Research Questions: RQ-1-24, RQ-1-39, RQ-2-17

Interactive Tool: Swarm Coordination Simulator

Tags: simulation, research-question, phase-1, phase-2, coordination, discrete-event-simulation