120 Versions, 3 AI Reviewers: Sizing the Coordination Link for 100,000 Spacecraft

If you are building a fleet of 100,000 autonomous spacecraft, you need to know how much bandwidth the coordination channel requires before any hardware exists. We set out to derive that number from first principles. After 120 versions of iterative AI peer review, the paper "Design Equations and Parametric Sizing for Hierarchical Coordination in Large Autonomous Space Swarms" lands on a two-test feasibility framework and a 35 kbps S-band coordination link.

The Problem

Earlier this year we published preliminary coordination results from a discrete event simulator. Those results confirmed that hierarchical coordination scales to 1M+ nodes and identified overhead in the 2-8% range. But the simulator modeled bandwidth as a simple product of message rate, message size, and bit width. It had no concept of TDMA slot structure, no physical-layer overhead accounting, and no way to answer the practical question: what PHY rate does the coordinator link actually need?

The paper addresses this.

The Two-Test Framework

The paper decomposes feasibility into two independent tests:

Test A (byte budget): Does the total protocol overhead fit within the per-node bandwidth allocation? This test operates on information bytes and is independent of the physical layer. The overhead has three components: baseline telemetry (20.5% -- periodic status reports, topology-invariant), architecture-specific overhead (5.6% -- heartbeats, summaries, elections), and command traffic (gated by the campaign duty factor d).

Test B (TDMA schedulability): Do all the TDMA time slots fit within the 10-second coordination cycle? Each member node sends a 256-byte ephemeris report in its own time slot. The slot is longer than the raw payload time because of FEC encoding, framing headers, guard intervals, and acquisition preambles. The ratio of payload time to total slot time is the slot efficiency, which we call gamma.

Gamma connects the information layer to the physical layer. At 35 kbps with CCSDS Proximity-1 framing: gamma = 0.732. This means that only 73.2% of each time slot carries actual payload data. The rest is overhead that the information layer never sees but the TDMA scheduler must account for.

Both tests must pass for a configuration to be feasible. Test A is usually non-binding (overhead stays well below 100%); Test B is what drives the PHY rate requirement.

The 35 kbps Result

The coordinator ingress link is the bottleneck. In a 100-node cluster, 99 members each send 256 bytes per 10-second cycle. The rate ladder traces how the information rate translates to a PHY rate requirement:

Step	Value	Basis
1. Information rate requirement	20.2 kbps	99 nodes x 256B x 8 / 10s
2. Slot expansion (1/gamma)	27.1 kbps	20.2 / 0.745
3. Half-duplex constraint (1/alpha_RX)	29.9 kbps	27.1 / 0.908
4. Minimum viable PHY	30 kbps	680 ms margin
5. Recommended PHY	35 kbps	1,830 ms margin

At 30 kbps the system works but has only 680 ms of margin in the superframe -- not enough to absorb ARQ retransmissions under fading. At 35 kbps the margin grows to 1,830 ms, which accommodates the P95 retransmission demand under the Gilbert-Elliott channel model with margin left over.

At 24 kbps, the ingress phase alone exceeds 10 seconds. The link is physically infeasible.

Campaign Duty Factor

Not all mission phases generate the same command traffic. A formation-keeping maneuver every 14 days is very different from a continuous debris removal campaign. The paper parameterizes this with the campaign duty factor d, which represents the fraction of operational time spent in active commanding:

Mission Phase	d	p_cmd	eta_cmd	Rationale
Quiescent / cruise	0.0	---	0%	No commanding
Station-keeping	0.05	0.009	0.4%	Delta-v every ~14 days
Collision avoidance	10^-5	1.0	~0%	~10 events/yr per cluster
Reconfiguration campaign	0.10	1.0	4.1%	Multi-day batch
Active debris removal	0.50	0.5	10.3%	Continuous ops
Stress bound	1.00	1.0	41%	<1% of operational time

The recommended default is d = 0.10 (reconfiguration campaign). Even at the theoretical stress bound of d = 1.0, total utilization reaches 67% -- high but not infeasible. The stress bound requires continuous commanding every cycle for all nodes, which represents less than 1% of operational time based on known mission profiles: station-keeping maneuvers occur every ~14 days (d = 0.05), and conjunction avoidance averages ~10 events per year per 100-node cluster (d approximately 10^-5).

NS-3 Validation

An analytical model is only as trustworthy as its assumptions. To test ours, we implemented an independent packet-level TDMA simulation in NS-3 -- the standard network simulator used in telecommunications research. The NS-3 scenario shares no code with our Python model. It uses NS-3's native PointToPointNetDevice, BurstErrorModel, and FlowMonitor.

Both models agree on the feasibility boundary: 24 kbps is infeasible (>50% deadline miss rate) and 35 kbps is feasible (<1% miss rate). That agreement from two codebases that share no implementation is the main validation outcome.

Gamma values from NS-3 are systematically 3-8% lower than analytical predictions. We decomposed this discrepancy into three components: a 16-bit framing difference (NS-3 uses 88-bit framing vs. our 104-bit CCSDS model, accounting for 1.8%), stochastic acquisition jitter in NS-3 (1.5-4.2% -- NS-3 draws acquisition time from a log-normal distribution rather than using a fixed 5 ms), and residual scheduling granularity (<1%). Under matched assumptions, the discrepancy narrows to <2%.

The NS-3 scenario itself uses idealized assumptions (star topology, no Doppler, no orbital dynamics). It validates the MAC-level timing accounting, not the full physical environment. Flight-hardware validation remains future work.

What Changed From the Preliminary Results

The earlier blog post reported "2-8% overhead" and confirmed that hierarchical coordination scales. The paper refines these findings:

Feature	Preliminary	Paper (Version DQ)
Overhead model	Simple bandwidth = messages/s x bytes x 8 / 1000	Three-component decomposition (baseline + architecture + command)
TDMA analysis	None	Slot-level with gamma, superframe timing, ARQ
Command traffic	Fixed	Campaign duty factor d parameterizes mission phases
Coordinator bottleneck	Not identified	20.2 kbps info-rate; 35 kbps recommended PHY
Validation	DES self-consistency	Independent NS-3 packet-level simulation
Overhead number	"2-8%"	5.6% architecture-specific; 20.5% baseline telemetry
Channel model	None	Gilbert-Elliott with ARQ; inter-cycle recovery

The "2-8%" figure from the preliminary work aligns with the paper's 5.6% architecture-specific overhead. The 20.5% baseline telemetry was always present but not separated out.

The Peer Review Process

This paper went through 120 versions of iterative review by three AI models (Claude Opus, Gemini Pro, GPT-5.2). The process added CCSDS framing analysis, the campaign duty factor, NS-3 validation, Gilbert-Elliott channel modeling, the two-test framework formalization, a gamma lookup table for practitioners, fleet-level spatial reuse analysis, and four alternative TDMA slot structures. Each addition was prompted by specific reviewer concerns.

What This Means for the Roadmap

Per-Cluster Sizing is Established

The coordinator link needs 35 kbps at S-band. This is a low rate -- well within the capability of small patch antennas. Per-node allocation of 1 kbps provides enough headroom for all known mission phases.

Fleet-Level Spatial Reuse is the Next Gap

All results in the paper are per-cluster (50-500 nodes per coordinator). Scaling to 100,000 nodes requires 1,000 clusters operating simultaneously. The paper shows that with spatial reuse factor R >= 3, fleet-level coordination is feasible at the 10-second cycle period. But R >= 3 requires either directional antennas or frequency planning, and this has not been validated. Fleet-level spatial reuse is the next validation priority.

Architecture-Specific Overhead is Small

At 5.6%, the hierarchical coordination architecture adds modest overhead. Command traffic dominates (>60% of total) and is topology-invariant -- you pay the same command cost regardless of whether you use centralized, hierarchical, or mesh coordination. The architecture choice matters for scalability and fault tolerance, not for bandwidth.

Code Availability

The simulation code (Python, MIT license) is open source:

packet_level_tdma.py -- TDMA slot-level model with gamma decomposition
generate_ns3_validation_fig.py -- NS-3 comparison figures

The NS-3 scenario is in publications/ns3-validation/. All results use deterministic seeds for reproducibility.

Research Questions:

Interactive Tool: Swarm Coordination Simulator (updated with TDMA feasibility analysis)

Explore

Community

Contribute