Scaling to a Million Units: Why Hierarchical Coordination Wins
Discrete event simulation demonstrates that hierarchical coordination scales to 1M+ nodes while centralized architectures bottleneck at ~10,000. Optimal coordinator duty cycle: 24-48 hours.
Research Team
Project Dyson
The Dyson swarm will eventually comprise millions of autonomous units. We built a discrete event simulator to answer the critical question: What coordination architecture can scale that far?
The Three Questions
This simulation addresses three interrelated research questions:
- RQ-1-24: How do coordination architectures scale to millions of units?
- RQ-1-39: What's the optimal duty cycle for cluster coordinators?
- RQ-2-17: At what fleet size do coordination constraints dominate?
The Key Finding: Hierarchy is Essential
Hierarchical coordination scales to 1M+ nodes; centralized hits bottlenecks at ~10,000.
| Architecture | Scalability Limit | Communication Overhead |
|---|---|---|
| Centralized | ~10,000 nodes | 5-15% |
| Hierarchical | 1,000,000+ nodes | 2-8% |
| Mesh | ~100,000 nodes | 10-25% |
The centralized approach fails not because of bandwidth, but because of message processing latency at the central node.
Why Centralized Fails
In a centralized architecture:
- Every node reports to a single coordinator
- Message-processing load at the coordinator grows as O(N)
- At 10,000 nodes, queueing delay at the coordinator exceeds acceptable latency
- At 100,000 nodes, the system becomes unresponsive
The bottleneck isn't bandwidth—it's processing time.
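To make that concrete, here is a minimal back-of-the-envelope sketch (separate from the simulator itself) that treats the central coordinator as a single server. The 60-second reporting interval and 5 ms per-message processing time are illustrative assumptions, not simulator parameters.

```python
# Back-of-the-envelope saturation check for a single central coordinator.
# Assumptions (illustrative, not simulator parameters): each node sends one
# status message per reporting interval, and the coordinator spends a fixed
# amount of CPU time on every message.

REPORT_INTERVAL_S = 60.0      # assumed reporting interval per node (s)
PROC_TIME_S = 0.005           # assumed per-message processing time (s)

def coordinator_utilization(num_nodes: int) -> float:
    """Fraction of the coordinator's time spent processing messages."""
    arrival_rate = num_nodes / REPORT_INTERVAL_S   # messages per second
    service_rate = 1.0 / PROC_TIME_S               # messages per second
    return arrival_rate / service_rate

for n in (1_000, 10_000, 100_000):
    rho = coordinator_utilization(n)
    status = "saturated" if rho >= 1.0 else f"{rho:.0%} busy"
    print(f"{n:>7} nodes -> {status}")

# With these assumptions the coordinator is ~83% busy at 10,000 nodes and
# saturates near 12,000, consistent with the ~10,000-node limit above.
```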
Why Mesh Becomes Inefficient
Mesh topology provides excellent resilience but:
- Message complexity: O(N²) for gossip protocols
- At 100,000 nodes, overhead exceeds 25% of communication bandwidth
- Coordination consistency becomes unreliable
Mesh works well for small clusters but not swarm-scale operations.
The Hierarchical Solution
The hierarchical architecture uses ~100-node clusters with rotating coordinators:
[Ground Control]
|
[Regional Coordinators] (10-100)
|
[Cluster Coordinators] (100-1000)
|
[Node Clusters] (50-100 nodes each)
Message complexity: O(N × log(N))—scalable to millions.
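To illustrate how quickly these complexity classes diverge, the sketch below counts approximate message-hops for one fleet-wide status collection under each topology. The 100-way fan-out matches the ~100-node clusters described above, and treating mesh as all-pairs exchange is the O(N²) worst case; both are assumptions for illustration.

```python
# Approximate message-hops for one fleet-wide status collection.
# Centralized sends the fewest messages overall (O(N)), but one hub must
# process all of them -- the bottleneck shown in the earlier sketch.
# Hierarchical: each report climbs a tree of fan-out K -> O(N * log_K(N)).
# Mesh (all-pairs state exchange): O(N^2).

FANOUT = 100  # assumed cluster size / tree fan-out

def tree_depth(n: int, k: int) -> int:
    """Depth of a k-ary aggregation tree covering n nodes."""
    depth, capacity = 1, k
    while capacity < n:
        capacity *= k
        depth += 1
    return depth

def centralized(n: int) -> int:
    return n

def hierarchical(n: int, k: int = FANOUT) -> int:
    return n * tree_depth(n, k)

def mesh(n: int) -> int:
    return n * (n - 1)

for n in (10_000, 100_000, 1_000_000):
    print(f"{n:>9}: centralized={centralized(n):>9,}  "
          f"hierarchical={hierarchical(n):>10,}  mesh={mesh(n):>16,}")
```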
Coordinator Duty Cycle: The 24-Hour Sweet Spot
The simulation tested duty cycles from 1 hour to 7 days:
| Duty Cycle | Power Variance | Handoff Success | Availability |
|---|---|---|---|
| 1 hour | <5% | 95% | 99.9% |
| 6 hours | 8% | 98% | 99.8% |
| 24 hours | 12% | 99.5% | 99.5% |
| 48 hours | 18% | 99.8% | 99.2% |
| 7 days | 35% | 99.9% | 98% |
A 24-48 hour duty cycle provides the best balance (a quick overhead check follows this list):
- Low enough handoff frequency to minimize overhead
- Short enough exposure to limit single-point-of-failure risk
- Predictable timing for handoff scheduling
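As a quick cross-check on handoff frequency, the sketch below computes how much cluster time each tested duty cycle spends in handoffs, using the 10-second upper bound of the state-transfer time reported later in this post; that value is the conservative end of the 1-10 second range.

```python
# Fraction of cluster time spent in coordinator handoffs, per duty cycle.
# Uses the 10 s upper bound of the 1-10 s state-transfer time from the
# "State Transfer Requirements" section; everything else follows directly
# from the duty cycle length.

HANDOFF_S = 10.0  # conservative end of the 1-10 s state-transfer estimate

duty_cycles_hours = {"1 hour": 1, "6 hours": 6, "24 hours": 24,
                     "48 hours": 48, "7 days": 168}

for label, hours in duty_cycles_hours.items():
    cycle_s = hours * 3600
    handoffs_per_week = 168 / hours
    in_handoff = HANDOFF_S / cycle_s
    print(f"{label:>8}: {handoffs_per_week:5.1f} handoffs/week, "
          f"{in_handoff:.4%} of time in handoff")
```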
Power Budget Implications
Coordinator duty comes with power overhead:
- Baseline node: 5 W average
- Coordinator mode: 15-20 W average
With 24-hour duty cycles in 100-node clusters:
- Each node serves as coordinator ~1% of the time
- Average power impact: ~0.1-0.15 W per node (see the arithmetic below)
- Acceptable within power budget
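The arithmetic behind these figures is reproduced below; the only inputs are the power numbers above and the 100-node cluster size.

```python
# Per-node power impact of rotating coordinator duty in a 100-node cluster.
# Inputs are the baseline and coordinator power figures quoted above.

BASELINE_W = 5.0
COORDINATOR_W_RANGE = (15.0, 20.0)
CLUSTER_SIZE = 100

coordinator_fraction = 1.0 / CLUSTER_SIZE  # each node leads ~1% of the time

for coord_w in COORDINATOR_W_RANGE:
    extra = (coord_w - BASELINE_W) * coordinator_fraction
    print(f"Coordinator draw {coord_w:.0f} W -> +{extra:.2f} W average per node")

# Prints +0.10 W and +0.15 W, i.e. roughly a 2-3% increase over the 5 W baseline.
```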
State Transfer Requirements
Each coordinator handoff requires transferring:
- Ephemeris catalog: 10-50 MB
- Conjunction queue: 1-5 MB
- Routing tables: 0.5-1 MB
Total transfer time: 1-10 seconds over an optical inter-satellite link (ISL)
This is fast enough to complete handoffs without disrupting cluster operations.
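As a rough consistency check on that 1-10 second figure, the sketch below totals the payload sizes above against an assumed effective optical ISL rate of 100 Mbps; the link rate is an illustrative assumption, not a measured budget.

```python
# Coordinator handoff: total state size and transfer time over an optical ISL.
# Payload sizes come from the list above; the effective link rate is an
# illustrative assumption (100 Mbps), not a measured figure.

LINK_RATE_BPS = 100e6  # assumed effective optical ISL throughput

payload_mb = {
    "ephemeris catalog": (10, 50),
    "conjunction queue": (1, 5),
    "routing tables": (0.5, 1),
}

low = sum(lo for lo, _ in payload_mb.values())    # 11.5 MB
high = sum(hi for _, hi in payload_mb.values())   # 56.0 MB

for label, total_mb in (("best case", low), ("worst case", high)):
    seconds = total_mb * 8e6 / LINK_RATE_BPS
    print(f"{label}: {total_mb:.1f} MB -> {seconds:.1f} s at 100 Mbps")

# ~0.9 s best case and ~4.5 s worst case, consistent with the 1-10 s range
# quoted above once slower or contested links are allowed for.
```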
The 50,000-Node Inflection Point
For manufacturing fleet coordination (RQ-2-17), the simulation identifies:
| Fleet Size | Hierarchical Overhead | Coordination Viable? |
|---|---|---|
| 1,000 | 1% | Yes |
| 10,000 | 2% | Yes |
| 50,000 | 4% | Inflection point |
| 100,000 | 6% | Marginal |
| 500,000 | 10% | Requires optimization |
At ~50,000 nodes, coordination overhead approaches 5%, our target threshold for acceptable overhead. Beyond this point, additional optimizations are required.
Recommendations for Phase 1
1. Implement Hierarchical Architecture from Day One
Don't start with centralized and migrate later—design for hierarchy from the beginning.
2. Use 100-Node Clusters with 24-Hour Rotation
This provides:
- Manageable cluster size for coordination
- Predictable handoff scheduling
- Balanced power distribution
3. Design for 1M+ Node Scalability
Even if Phase 1 deploys only 10,000 units, the architecture must support Phase 2 scale.
4. Limit Per-Node Bandwidth to 0.5-1 kbps
This constraint keeps aggregate traffic at each tier of the hierarchy within link capacity, so the architecture scales without bandwidth bottlenecks (a rough tier-by-tier check follows).
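The check below assumes 1 kbps per node, 100-node clusters, a regional fan-out of 100 clusters, and an illustrative 10:1 summarization factor at each aggregation step; the fan-out and summarization factor are assumptions, not simulator parameters.

```python
# Rough traffic budget per tier of the hierarchy.
# Per-node rate and cluster size match the recommendations above; the
# regional fan-out and 10:1 summarization factor are assumptions.

PER_NODE_BPS = 1_000        # upper end of the 0.5-1 kbps recommendation
CLUSTER_SIZE = 100
CLUSTERS_PER_REGION = 100   # assumed fan-out
SUMMARIZATION = 10          # assumed data reduction at each tier

cluster_ingress = PER_NODE_BPS * CLUSTER_SIZE               # 100 kbps
cluster_uplink = cluster_ingress / SUMMARIZATION            # 10 kbps
region_ingress = cluster_uplink * CLUSTERS_PER_REGION       # 1 Mbps
region_uplink = region_ingress / SUMMARIZATION              # 100 kbps

print(f"cluster coordinator ingress:  {cluster_ingress / 1e3:.0f} kbps")
print(f"cluster -> region uplink:     {cluster_uplink / 1e3:.0f} kbps")
print(f"regional coordinator ingress: {region_ingress / 1e6:.1f} Mbps")
print(f"region -> ground uplink:      {region_uplink / 1e3:.0f} kbps")
```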
Try It Yourself
We've published the interactive simulator so you can explore coordination architectures. Adjust node count, topology, cluster size, and duty cycle to see how overhead and scalability change.
Methodology
The simulation uses:
- Discrete event simulation with message passing
- Topology modeling (centralized, hierarchical, mesh)
- Power consumption profiles for coordinator vs baseline nodes
- 50-100 Monte Carlo runs per configuration
Results represent relative comparisons between architectures.
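For readers who want to reproduce the flavor of the message-passing model, here is a minimal discrete event sketch built on SimPy. It is not the project simulator; the reporting interval, processing time, node count, and run length are illustrative.

```python
# Minimal discrete event sketch of nodes reporting to a single coordinator,
# built on SimPy (pip install simpy). This is not the project simulator;
# all parameters below are illustrative.

import random
import simpy

REPORT_INTERVAL_S = 60.0
PROC_TIME_S = 0.005
NUM_NODES = 1_000
SIM_TIME_S = 3_600

latencies = []

def node(env, coordinator):
    """Each node sends one status message per (jittered) reporting interval."""
    while True:
        yield env.timeout(random.uniform(0.5, 1.5) * REPORT_INTERVAL_S)
        sent_at = env.now
        with coordinator.request() as req:      # queue for the coordinator
            yield req
            yield env.timeout(PROC_TIME_S)      # message processing time
        latencies.append(env.now - sent_at)

env = simpy.Environment()
coordinator = simpy.Resource(env, capacity=1)   # single central processor
for _ in range(NUM_NODES):
    env.process(node(env, coordinator))
env.run(until=SIM_TIME_S)

print(f"messages processed: {len(latencies)}")
print(f"mean queueing + processing latency: "
      f"{sum(latencies) / len(latencies) * 1e3:.1f} ms")
```

Raising NUM_NODES toward 10,000 in this sketch reproduces the qualitative behavior described earlier: the single coordinator's queue grows and message latency climbs.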
What's Next
This research answers RQ-1-24, RQ-1-39, and RQ-2-17, providing a validated coordination architecture for Phase 1 and Phase 2. The hierarchical approach with rotating coordinators is now the baseline design.
Remaining work:
- Spatial partitioning algorithm benchmarking
- Adaptive rotation policy evaluation
- Hardware-in-the-loop validation
Research Questions:
- RQ-1-24: Swarm coordination architecture at scale
- RQ-1-39: Cluster coordinator duty cycle
- RQ-2-17: Fleet coordination scale constraints
Interactive Tool: Swarm Coordination Simulator