DARPA’s Air Combat Evolution program is now a clear waypoint in the story of autonomy in high-speed, high-consequence domains. What began as a laboratory challenge to see whether algorithms could learn the tactics of within‑visual‑range aerial combat has matured into live-flight experiments that force us to confront not only technical thresholds but also the social and ethical architecture that must surround any deployment of lethal autonomy.

The program’s arc is worth recounting because it demonstrates how a careful, phased approach can move ideas from simulation to aircraft without pretending the hard work is done. ACE’s public narrative begins with the AlphaDogfight Trials of 2019–2020. Those simulated tournaments, streamed and scrutinized by the community, culminated in August 2020 when Heron Systems’ agent defeated an experienced F-16 pilot in a virtual dogfight. The Trials were designed as a seedbed for algorithms and as a way to build confidence among pilots and engineers that AI could meaningfully contribute to air combat tactics.

ACE then adopted a staged progression across phases: Phase 1 emphasized simulation and algorithmic development; Phase 2 moved agents into flight-capable testbeds; Phase 3 was intended to explore complex human-machine collaboration in manned flight environments. These phases are not bureaucratic checkpoints. They are experiments in risk management, trust calibration, and iterative learning at scale. The program’s Phase 1 and later TA3 awards to companies such as Dynetics illustrate how DARPA funded parallel, modular efforts to broaden the technical base.

The first major transition out of the simulator came in late 2022 and 2023. DARPA and its partners ported ACE agents into a specially modified F-16 testbed, the X-62A VISTA, and ran a sequence of flight tests at the Air Force Test Pilot School at Edwards Air Force Base. Those flights demonstrated that reinforcement‑learned and hybrid AI controllers could operate full‑scale fighters under supervised conditions and generated essential data about real aerodynamics, actuator latencies, sensor noise, and pilot reactions—all factors that do not appear in sanitized simulations.
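
To make that gap concrete, consider a toy sketch of what “porting out of the simulator” entails: a control policy that behaves well against an idealized model must suddenly contend with transport delay, actuator lag, and noisy measurements. The Python snippet below is a deliberately simplified illustration, not ACE code; the plant model, gains, delay, and noise figures are placeholder assumptions.

```python
# A minimal, illustrative sketch (not ACE code): wrapping an idealized plant
# model with transport delay, actuator lag, and sensor noise -- the kinds of
# effects that sanitized simulations omit. All dynamics, gains, and noise
# levels are placeholder assumptions.
from collections import deque
import random


class DegradedPlant:
    """Toy rate-commanded plant whose reported state is delayed, lagged, and noisy."""

    def __init__(self, delay_steps=3, lag=0.2, noise_std=0.05, dt=0.02):
        self.dt = dt
        self.position = 0.0
        self.applied_rate = 0.0                      # first-order actuator state
        self.cmd_queue = deque([0.0] * delay_steps)  # fixed transport delay
        self.lag = lag
        self.noise_std = noise_std

    def step(self, commanded_rate):
        # Delay the command, then pass it through first-order actuator lag.
        self.cmd_queue.append(commanded_rate)
        delayed = self.cmd_queue.popleft()
        self.applied_rate += self.lag * (delayed - self.applied_rate)

        # Integrate the dynamics with the degraded command.
        self.position += self.applied_rate * self.dt

        # Return a noisy measurement, as a real sensor would.
        return self.position + random.gauss(0.0, self.noise_std)


if __name__ == "__main__":
    plant = DegradedPlant()
    target, measured = 1.0, 0.0
    for _ in range(500):  # 10 seconds at 50 Hz
        measured = plant.step(commanded_rate=1.5 * (target - measured))
    print(f"true position {plant.position:.3f}, measured {measured:.3f}")
```

Raising the gain or lengthening the delay in this toy quickly degrades or destabilizes the loop even though the idealized, delay-free model stays well behaved; that, in miniature, is the category of surprise the flight tests were designed to surface.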

By spring 2024 DARPA and its collaborators publicly announced a new milestone. ACE teams had flown in-air engagements in which AI agents controlled the X-62A in within‑visual‑range scenarios against human‑piloted F-16s. DARPA described these events as among the first live demonstrations of AI piloting a fighter in contested maneuvers against a human. The effort involved a diverse set of performers and government partners working to test both tactics and trust metrics.

Reading the timeline as data rather than hype yields a set of sobering technical lessons. Algorithms that excel in simulation encounter unexpected failure modes when confronted with imperfect sensors, actuator limits, and environmental variability. The step from simulated dogfights to the X-62A did not eliminate those problems. Instead it revealed new categories of failure that required changes to training curricula, safety envelopes, and the human supervisory interfaces that mediate pilot-agent interaction. Some issues were mundane engineering: measurement latencies, control authority handover, and deterministic time budgets for decision loops. Others were psychological: how to measure operator trust and how pilots adapt mission tactics when paired with an agent whose reasoning is not directly inspectable.
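
As a concrete, deliberately simplified illustration of two of those engineering issues, the sketch below wraps an agent’s decision step in a supervisory check that enforces a per-cycle time budget and an approved command envelope, reverting to a benign command and flagging a handover when either is violated. This is a hypothetical Python sketch of where such checks sit in a supervisory loop, not a description of how ACE implements supervision; the agent, limits, and thresholds are invented for illustration.

```python
# An illustrative sketch (not ACE code) of two "mundane engineering" concerns
# named above: a deterministic time budget for the decision loop and a
# supervisory safety envelope with handover to the human pilot. The agent,
# limits, and fallback behavior are placeholder assumptions.
import time
from dataclasses import dataclass


@dataclass
class Envelope:
    max_pitch_cmd: float = 0.5   # placeholder limits, normalized stick units
    max_roll_cmd: float = 0.7


def toy_agent(observation):
    """Stand-in for a learned policy; deliberately returns an out-of-envelope roll."""
    time.sleep(0.001)            # pretend inference cost
    return {"pitch": 0.3 * observation, "roll": 0.9}


def supervised_step(agent, observation, envelope, budget_s=0.005):
    """Run one decision cycle; fall back to a benign command on overrun or breach."""
    start = time.perf_counter()
    cmd = agent(observation)
    elapsed = time.perf_counter() - start

    if elapsed > budget_s:
        # Missed the deterministic deadline: hold a benign command, flag handover.
        return {"pitch": 0.0, "roll": 0.0}, "HANDOVER: decision loop overran budget"

    if abs(cmd["pitch"]) > envelope.max_pitch_cmd or abs(cmd["roll"]) > envelope.max_roll_cmd:
        # Command violates the approved envelope: clamp it and notify the pilot.
        cmd["pitch"] = max(-envelope.max_pitch_cmd, min(envelope.max_pitch_cmd, cmd["pitch"]))
        cmd["roll"] = max(-envelope.max_roll_cmd, min(envelope.max_roll_cmd, cmd["roll"]))
        return cmd, "CAUTION: command clamped to safety envelope"

    return cmd, "nominal"


if __name__ == "__main__":
    cmd, status = supervised_step(toy_agent, observation=1.0, envelope=Envelope())
    print(cmd, status)
```

A real flight-control supervisor would enforce the budget preemptively, with a watchdog or a separate real-time partition, rather than checking it after the fact, and the envelope would be certified rather than hard-coded; the point of the sketch is only where such checks sit relative to the agent.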

The societal lessons are no less important. ACE deliberately framed autonomy as augmenting rather than replacing human judgment, positioning pilots as battle managers who supervise multiple assets. That is a defensible concept, but it is not a policy solution. If autonomy systems gain kinetic authority, mechanisms for accountability, auditability, and legal responsibility must be explicit before operational fielding. Philosophically, the ACE timeline shows how secrecy and incremental testing can lower immediate risk while obscuring longer-term governance choices. The technology timeline is not the same as the policy timeline. The former proceeds through experiments and software iterations. The latter must catch up with doctrine, rules of engagement, and international norms. My point is straightforward: speed of technical progress does not absolve us from deliberative choices about use.

What comes next for ACE-style autonomy is predictable in shape though not in detail. Expect further integration tests where AI agents operate as wingmen, coordinate multi-axis tactics, and accept constrained levels of autonomy under human supervision. Expect cross-domain lessons to be transferred into maritime and ground autonomy programs, and expect commercial advances in perception and compute to continue lowering technical barriers while raising ethical stakes.

If there is an instructive virtue in the ACE timeline, it is humility. The program shows us how to combine rigorous experimentation with transparency to the pilot community, and it shows where additional work must be done: robust testing across the full ecological envelope of flight, formal methods for certifying learning systems, interfaces that make agent intent legible to human supervisors, and legal frameworks that tie technical capability to clearly allocated responsibility. Technology can produce extraordinary capabilities, but without parallel governance those capabilities will be brittle or worse.

For ethicists, procurement officers, and engineers alike, ACE is a practical case study. It is not destiny. It is an architecture that can be shaped. The choice before the community is whether we will allow technical possibility to set policy or whether we will design institutions and norms that ensure autonomy is a tool of human judgment rather than a substitute for it.