NATO’s work over the past three years marks a clear shift from abstract principles toward practical steps for integrating humans and machines into cohesive teams. That shift is not uniform, nor is it complete, but it is real. In July 2024 NATO published a revised Artificial Intelligence strategy that explicitly broadens the Alliance’s orientation to include generative AI and AI-enabled information tools while reaffirming the Principles of Responsible Use adopted in 2021.
Those Principles of Responsible Use (PRUs), first articulated in 2021, remain the moral spine of NATO policy. They name lawfulness, responsibility and accountability, explainability and traceability, reliability, governability, and bias mitigation as the compass points for defence AI. That choice of high-level principles was necessary then and remains necessary now, but principles alone do not create operational doctrine.
What has changed is a growing emphasis on operationalisation, the work of moving from principles to practice. NATO has set up bodies and processes to translate the PRUs into testable artefacts, notably through work on certification standards and internal review mechanisms. In early 2023 NATO began a formal process to develop an AI certification standard designed to make governance and engineering expectations concrete for industry and Allied procurement.
Parallel to the certification work, NATO’s community of practice has produced guidance on autonomy and human-autonomy teaming. Practitioners and specialist teams within NATO have contributed an operational lexicon and practical guidance intended to inform force development and procurement decisions. One such output, widely cited in the professional literature, is a set of Autonomy Guidelines for practitioners that offers a scaffold for thinking about levels of autonomy, human-on-the-loop responsibilities, and risk mitigations. These guidelines do not replace national rules of engagement or international law, but they supply common language and practical assessment criteria that are essential for multinational operations.
Operational testing has followed the doctrinal and standards work. NATO agencies and affiliated bodies have run interoperability exercises focused on uncrewed systems and counter-uncrewed capabilities. The C-UAS Technical Interoperability Exercise in September 2024 is one example: sensors, effectors, and data-sharing protocols were tested together with the explicit intent of ensuring that heterogeneous member-state systems can cooperate under stress. Exercises like these test not only hardware and software but also command relationships, information exchange, and the human decisions that bind machines into teams.
Two tensions dominate the doctrinal conversation about human-robot teams. The first is the human-machine responsibility problem. NATO’s principles insist on human accountability, but the operational reality of distributed autonomy makes attribution and meaningful human control more complicated than a slogan can capture. When autonomy is layered across multiple platforms, when decisions are mediated by opaque models, and when timelines compress, the line between human oversight and machine action blurs. NATO’s effort to produce certification standards and traceability requirements is a direct response to this problem. Absent measurable traceability and lifecycle assurance, accountability risks becoming rhetorical.
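What measurable traceability could look like in engineering terms is easiest to see in miniature. The Python sketch below shows one way a decision-audit record might be structured so that every autonomy-mediated action carries an attributable, tamper-evident trail; the field names and the hashing scheme are illustrative assumptions, not NATO standards or certification requirements.

```python
# Hypothetical sketch of a decision-audit record for traceability.
# All field names are illustrative assumptions, not NATO terminology.
from dataclasses import dataclass, field
from datetime import datetime, timezone
import hashlib
import json


@dataclass(frozen=True)
class DecisionAuditRecord:
    """One immutable entry in a mission decision log."""
    platform_id: str       # which uncrewed system acted
    model_id: str          # identifier of the deployed model
    model_version: str     # exact version, for lifecycle assurance
    input_digest: str      # hash of the sensor inputs the model saw
    recommendation: str    # what the autonomy layer proposed
    human_authoriser: str  # who approved, rejected, or overrode it
    decision: str          # "approved" | "rejected" | "overridden"
    timestamp: str = field(
        default_factory=lambda: datetime.now(timezone.utc).isoformat()
    )

    def digest(self) -> str:
        """Content hash, so records can be chained and made tamper-evident."""
        payload = json.dumps(self.__dict__, sort_keys=True).encode()
        return hashlib.sha256(payload).hexdigest()


record = DecisionAuditRecord(
    platform_id="uas-17",
    model_id="detector",
    model_version="2.4.1",
    input_digest=hashlib.sha256(b"sensor frame bytes").hexdigest(),
    recommendation="flag track 042 as hostile",
    human_authoriser="ops-officer-3",
    decision="approved",
)
print(record.digest())
```

The point of such a record is not the particular fields but the property they enforce: if a certification standard demands that every machine recommendation and every human authorisation be logged and hashable, accountability stops being rhetorical and becomes auditable.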
The second tension is interoperability versus national sovereignty. Member states acquire systems according to national procurement cycles and risk appetites. For human-robot teams to function across the Alliance, they need common interfaces, shared data models, agreed autonomy lexicons, and harmonised assurance processes. NATO cannot compel a single procurement path, but it has begun to create the institutional hooks that encourage commonality. DIANA, NATO test centres, and the Alliance’s standardisation fora are instruments intended to close that gap. The question for doctrine writers is how prescriptive NATO should become about thresholds for autonomy, and for which warfare functions coalition-level rules must apply.
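To make “shared data models and agreed autonomy lexicons” concrete, consider a minimal sketch of what a coalition platform-state message might look like. The autonomy-level scale, the field names, and the wire format below are assumptions for illustration, not an existing STANAG or NATO schema; the argument is only that some such shared vocabulary must be agreed before heterogeneous national systems can team.

```python
# Hypothetical sketch of a shared data model for exchanging autonomy state
# between coalition systems. Fields and the AutonomyLevel scale are
# illustrative assumptions, not an agreed NATO standard or STANAG.
from dataclasses import dataclass, asdict
from enum import Enum
import json


class AutonomyLevel(Enum):
    """A notional lexicon: the shared vocabulary matters more than the labels."""
    TELEOPERATED = 1         # human performs the action
    HUMAN_IN_THE_LOOP = 2    # machine proposes, human must approve
    HUMAN_ON_THE_LOOP = 3    # machine acts, human can intervene
    SUPERVISED_AUTONOMY = 4  # machine acts within pre-authorised bounds


@dataclass
class PlatformStateMessage:
    sending_nation: str
    platform_id: str
    autonomy_level: AutonomyLevel
    controlling_unit: str  # who currently holds decision authority
    mission_function: str  # e.g. "ISR", "C-UAS", "logistics"

    def to_wire(self) -> str:
        """Serialise to a neutral format any member system can parse."""
        d = asdict(self)
        d["autonomy_level"] = self.autonomy_level.name
        return json.dumps(d)


msg = PlatformStateMessage(
    sending_nation="NLD",
    platform_id="ugv-09",
    autonomy_level=AutonomyLevel.HUMAN_ON_THE_LOOP,
    controlling_unit="TF-ALPHA",
    mission_function="C-UAS",
)
print(msg.to_wire())
```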
Doctrine for human-robot teams cannot be a single manual. It must be modular, enabling commanders to assemble a doctrinal mosaic for particular missions. That mosaic should include at least four elements. First, clear role definitions that allocate responsibility between human and machine at mission and action levels. Second, measurable assurance criteria for explainability, traceability, reliability, and bias mitigation so systems can be certified for intended contexts. Third, training and simulation regimes that rehearse human-autonomy teaming under degraded and adversarial conditions. Fourth, legal and ethical checklists integrated into rules of engagement and engagement authorisation workflows so that the human in the loop remains a meaningfully informed agent rather than a procedural fig leaf.
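A mission-assembled mosaic becomes enforceable when its elements are machine-checkable. The sketch below encodes the four elements as a hypothetical mission profile and tests a candidate system’s certification against it; every name, threshold, and checklist item is invented for illustration, and real values would come from actual certification work and doctrine.

```python
# Hypothetical sketch: the four doctrinal elements as a machine-checkable
# mission profile. Names and thresholds are invented for illustration.
MISSION_PROFILE = {
    "mission": "coastal ISR, contested EW environment",
    "role_definitions": {
        "track_classification": "machine_proposes_human_approves",
        "sensor_tasking": "machine_autonomous_within_bounds",
        "engagement": "human_only",
    },
    "assurance_criteria": {
        "min_classification_precision": 0.95,  # measurable, testable
        "max_unexplained_decisions": 0.0,      # every action traceable
        "bias_audit_passed": True,
    },
    "training_requirements": ["degraded_comms_drill", "adversarial_input_drill"],
    "legal_checklist": ["roe_annex_reviewed", "article_36_review_complete"],
}


def certified_for(profile: dict, system_cert: dict) -> list[str]:
    """Return the assurance criteria a candidate system fails to meet."""
    failures = []
    crit = profile["assurance_criteria"]
    if system_cert.get("classification_precision", 0.0) < crit["min_classification_precision"]:
        failures.append("classification_precision below mission threshold")
    if system_cert.get("unexplained_decision_rate", 1.0) > crit["max_unexplained_decisions"]:
        failures.append("decisions not fully traceable")
    if not system_cert.get("bias_audit_passed", False):
        failures.append("bias audit missing or failed")
    return failures


print(certified_for(MISSION_PROFILE, {"classification_precision": 0.97,
                                      "unexplained_decision_rate": 0.0,
                                      "bias_audit_passed": True}))  # -> []
```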
Practically speaking, NATO’s next doctrinal moves should be modest and technical rather than grand and vague. Set interoperability baselines for messaging, metadata, and command-state exchange. Define minimum assurance tests for autonomy-enabled functions operating in contested environments. Institutionalise red-teaming and third-party audits as routine parts of procurement. Require mission-level human situational awareness metrics that can be measured in training. And finally, commit to transparency about governance processes so that public legitimacy keeps pace with technical capability. These are not philosophical luxuries. They are the scaffolding that will permit the Alliance to field reliable human-robot teams without outsourcing moral agency.
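The call for measurable human situational awareness is the most concrete of these moves, so a worked example helps. The sketch below computes three candidate measures from a training-exercise alert log: how often operators respond to autonomy-generated alerts, how quickly, and how accurately. The metric definitions are illustrative assumptions, not established NATO measures.

```python
# Hypothetical sketch of mission-level human situational awareness metrics
# measurable in training. The metric definitions are illustrative
# assumptions, not established NATO measures.
from statistics import median


def sa_metrics(events: list[dict]) -> dict:
    """events: one dict per autonomy alert raised during an exercise, with
    'response_time_s' (None if the operator never responded) and
    'operator_correct' (did the operator's call match ground truth)."""
    responded = [e for e in events if e["response_time_s"] is not None]
    return {
        "acknowledgement_rate": len(responded) / len(events),
        "median_response_time_s": median(e["response_time_s"] for e in responded),
        "decision_accuracy": sum(e["operator_correct"] for e in responded) / len(responded),
    }


exercise_log = [
    {"response_time_s": 4.2, "operator_correct": True},
    {"response_time_s": 11.0, "operator_correct": False},
    {"response_time_s": None, "operator_correct": False},  # missed alert
    {"response_time_s": 6.5, "operator_correct": True},
]
print(sa_metrics(exercise_log))
```

Metrics of this kind are what make “the human in the loop” testable: if operators miss alerts or rubber-stamp recommendations under exercise conditions, doctrine has a measurable problem rather than an invisible one.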
One must also accept an uncomfortable truth. Doctrine cannot erase operational risk. It can only change the distribution of that risk and ensure that it is acknowledged, governed, and auditable. NATO’s revised AI strategy, its certification initiative, the autonomy guidelines, and the interoperability exercises together signal a deliberate attempt to move from aspiration to practice. That progress is encouraging. It will be insufficient unless the Alliance insists on measurable standards, invests in shared testbeds, and treats ethics and law as engineering constraints rather than optional commentary.
In the end, human-robot teams are an alliance problem as much as they are a national capability problem. Doctrine is the tool by which the Alliance converts shared values into shared practice. NATO has laid important foundations. Turning those foundations into robust, operational doctrine for human-robot teams is now the urgent task. If NATO succeeds, it will be because it has learned to make principled trade-offs explicit, measurable, and reversible in the face of failure. If it fails, the result will not only be tactical confusion. It will be a deeper erosion of the moral clarity that underpins collective defence.