The Department of Defense has for several years sought to reconcile two competing imperatives. On the one hand, there is an operational demand to harness AI for speed, scale, and accuracy. On the other, there is an ethical and legal obligation to ensure that the use of force remains accountable to human judgment. The DoD's formal architecture for resolving that tension rests on successive policy layers: the AI Ethical Principles adopted in 2020, a 2021 implementing memorandum from the Deputy Secretary of Defense, a 2022 Responsible AI Strategy and Implementation Pathway, and operational tools released more recently to help project teams translate principle into practice.
At the rhetorical level the Department has been explicit. The five ethical principles require that DoD AI be responsible, equitable, traceable, reliable, and governable. The language repeatedly places responsibility with human personnel and requires that systems be designed so they can be deactivated or disengaged if they behave outside intended parameters. The 2021 memorandum directed a holistic, disciplined approach across governance, acquisition, testing, and workforce preparation. These statements are not window dressing. They establish a normative baseline that any deployment must reference.
When policy meets metal, however, the mandates become messier. DoD Directive 3000.09, updated in January 2023, governs autonomy in weapon systems. It formalizes senior review requirements and introduces a flowchart to determine when systems require additional scrutiny. But it also recognizes operational realities by allowing for human-supervised modes in certain time-critical defensive contexts. That carve-out is sensible in the abstract. It is nevertheless consequential in practice, because it permits exceptions to more restrictive oversight where reaction time makes keeping a human in the loop infeasible. The net effect is a graduated regime of oversight rather than an absolute rule.
Two consequences follow from the graduated approach. First, the DoD's core ethical phraseology hinges on the term “appropriate levels of human judgment”. That phrase is flexible on purpose. Flexibility can be an asset when commanders must adapt to novel environments. Yet it is also an interpretive void, one that shifts the burden from rulemaking to ex post accountability. Second, much of the Department's operationalization relies on voluntary processes and internal toolkits rather than binding acquisition clauses or statutory mandates. The CDAO's Responsible AI Toolkit, released publicly in late 2023, is a prime example. It is a practical and useful collection of assessments and worksheets, but it is voluntary: components and program offices can adopt it, adapt it, or ignore it. Voluntarism is an attractive short-term strategy for experimentation. Over the long run it risks producing a patchwork of compliance with uneven human oversight.
Independent oversight bodies have noticed the same implementation gap. GAO reviews in recent years have repeatedly found that DoD lacks consistent, department-wide guidance for AI acquisitions, an accurate inventory of AI systems, and clear definitions to identify the AI workforce. Without those fundamentals the Department cannot reliably assure that human oversight is being resourced or embedded into procurement and fielding decisions. In short, the principles are in place, but the bureaucratic plumbing to enforce them is still under construction.
There are operational pressures that push in the opposite direction. Field commanders understandably prize systems that reduce cognitive load and shorten decision loops. Commercial AI vendors add to that pressure with contracts that promise capability quickly and with minimal friction. When an urgent capability promises a tactical advantage, the incentive is to accept narrower oversight or to seek waivers. DoD documents themselves allow for waivers in cases of urgent military need. That is realistic. It is also where human oversight mandates become most vulnerable to erosion.
From an engineering and ethical standpoint, the real work of human oversight is technical and organizational, not merely rhetorical. Human oversight requires design patterns that make human roles meaningful and traceable. That means instrumentation for explainability and confidence metrics, interfaces that present uncertainties rather than opaque outputs, audit trails that record both algorithmic reasoning and operator interventions, robust test and evaluation regimes, and red-team exercises that stress failure modes. It also demands acquisition language that conditions acceptance on demonstrated auditability and governability, and workforce development so that operators and commanders possess the literacy to exercise “appropriate judgment”. The Department has drafted many of these pieces, but the measures are unevenly required and unevenly enforced.
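To make that less abstract, the sketch below shows, in hypothetical Python, what an auditable decision record and a confidence-gated confirmation step might look like. Every name and threshold in it is an assumption introduced for illustration; it is not drawn from the CDAO toolkit or any fielded DoD system, only from the design pattern described above: log the recommendation, its confidence, the rationale shown to the operator, and the operator's intervention, and force explicit human confirmation whenever confidence is low.

```python
# Illustrative sketch only. All names (AuditRecord, needs_explicit_confirmation,
# the 0.90 threshold) are hypothetical, not taken from any DoD system or policy.
from dataclasses import dataclass, field, asdict
from datetime import datetime, timezone
import json
import uuid


@dataclass
class AuditRecord:
    """One traceable decision event: what the system recommended, with what
    uncertainty, and what the human operator actually did."""
    recommendation: str                 # the system's proposed action
    model_confidence: float             # calibrated confidence in [0, 1]
    rationale: str                      # explainability output shown to the operator
    operator_id: str                    # who exercised judgment
    operator_action: str = "pending"    # later: "confirmed", "overridden", or "aborted"
    operator_note: str = ""             # free-text justification for overrides
    event_id: str = field(default_factory=lambda: str(uuid.uuid4()))
    timestamp: str = field(
        default_factory=lambda: datetime.now(timezone.utc).isoformat()
    )

    def to_log_line(self) -> str:
        """Serialize to one append-only JSON line for later audit."""
        return json.dumps(asdict(self))


def needs_explicit_confirmation(record: AuditRecord, threshold: float = 0.90) -> bool:
    """Low-confidence recommendations always require an explicit operator
    confirmation step before any downstream action is taken."""
    return record.model_confidence < threshold


if __name__ == "__main__":
    rec = AuditRecord(
        recommendation="flag track 4471 for operator review",
        model_confidence=0.62,
        rationale="kinematic profile matches 3 of 5 threat criteria",
        operator_id="op-117",
    )
    # Below the threshold, so the interface must surface the uncertainty and wait.
    assert needs_explicit_confirmation(rec)
    rec.operator_action = "overridden"
    rec.operator_note = "track correlated with a friendly flight plan"
    print(rec.to_log_line())  # appended to the audit trail
```

The point of the pattern is that the audit trail captures both the machine's reasoning and the human's decision, so after-action review can establish whether judgment was actually exercised rather than merely ratified.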
Policy recommendations should therefore be concrete. First, the DoD should convert voluntary toolkits that demonstrate efficacy into mandatory acquisition requirements for any system that influences targeting, escalation, or other uses of force. Second, the Department should quantify what it means by “appropriate levels of human judgment” in different operational contexts. Those quantifications should include measurable human-factors milestones: reaction times, interface confirmation steps, and minimum audit-logging requirements. Third, waivers for senior review should be narrowly constrained, publicly logged, and subject to after-action evaluation that is accessible to oversight authorities. Fourth, Congress and Department leadership should close the workforce gap by defining an AI career path with clear skill codes, training requirements, and retention incentives so that human oversight is not performed by improvisation. GAO recommendations already point in these directions; implementing them should be a priority.
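A minimal sketch of how the second recommendation could be operationalized, again in hypothetical Python: the context names, thresholds, and field counts below are invented for illustration, but they show how “appropriate levels of human judgment” can be reduced to measurable acceptance criteria that a test and evaluation team can verify against a fielded system.

```python
# Illustrative sketch only. Context names and numeric thresholds are assumptions
# for illustration, not values drawn from DoD policy or DoDD 3000.09.
from dataclasses import dataclass


@dataclass(frozen=True)
class OversightProfile:
    """Quantified human-factors requirements for one operational context."""
    context: str
    min_operator_reaction_time_s: float   # time the interface must allow before acting
    confirmation_steps: int               # explicit operator confirmations required
    min_audit_log_fields: int             # minimum fields captured per decision event
    waiver_allowed: bool                  # whether urgent-need waivers may apply


# Hypothetical profiles a program office might be required to meet at acceptance.
PROFILES = [
    OversightProfile("deliberate targeting support", 30.0, 2, 12, False),
    OversightProfile("time-critical point defense", 2.0, 1, 8, True),
]


def meets_profile(measured_reaction_time_s: float,
                  confirmation_steps_implemented: int,
                  audit_log_fields: int,
                  profile: OversightProfile) -> bool:
    """True if a system's measured human-factors behavior satisfies the profile."""
    return (measured_reaction_time_s >= profile.min_operator_reaction_time_s
            and confirmation_steps_implemented >= profile.confirmation_steps
            and audit_log_fields >= profile.min_audit_log_fields)


if __name__ == "__main__":
    deliberate = PROFILES[0]
    print(meets_profile(45.0, 2, 14, deliberate))  # True: meets the deliberate profile
    print(meets_profile(5.0, 1, 14, deliberate))   # False: too fast, too few confirmations
```

Expressing the requirement this way turns a contested phrase into something an acquisition clause can reference and an evaluator can test, which is precisely the conversion from principle to practice that the recommendations above call for.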
Finally, the question is not simply whether humans remain nominally in the chain of control. The deeper ethical question is whether human beings retain epistemic authority over the facts and judgments that lead to force. Human oversight that amounts to a superficial button press after an opaque recommendation is neither morally nor legally sufficient. If DoD is to retain both operational advantage and democratic legitimacy, it must pursue oversight that is meaningful, auditable, and institutionalized. Without that, responsibility will be a rhetorical placeholder rather than a practical constraint on the use of force.
Human oversight mandates in DoD AI are therefore no longer a problem of proclamation. They are an engineering and governance problem. The Department has assembled a commendable policy architecture and an array of tools. The remaining challenge is implementation at scale under the incentives and pressures of procurement and combat. Closing that implementation gap will determine whether the DoD’s human oversight commitments are binding constraints or convenient rhetoric.