As of July 20, 2023, the central problem for militaries adopting autonomy is not only engineering capability. It is the cultivation of reliable, calibrated human-robot bonds that allow teams to act coherently under stress. Machines can extend reach and endurance, but without shared mental models, predictable interfaces, and deliberate training, they will be unpredictable teammates rather than force multipliers.
There are three interrelated failures I see repeatedly in current doctrine and experimentation. First, programs too often treat autonomy as a toolset rather than a teammate. Second, human factors research that could guide training is underused in live-force exercises. Third, most training still emphasizes piloting individual systems rather than the relational skill set a human needs to sustain a team-level partnership with autonomous agents. These deficiencies are not philosophical abstractions. They show up as miscalibrated trust, brittle procedures, and catastrophic out-of-the-loop (OOTL) failures when systems behave in unexpected ways.
What should training for the machine-age battlefield aim to achieve? The objective is not simple familiarity. Training must produce calibrated trust, shared situation awareness, and a repertoire of plays that humans and machines can reliably execute together. The National Academies and human-autonomy research programs emphasize shared mental models and shared situation awareness as foundational to effective teaming. Training must therefore construct common ground: representations of intent, role expectations, failure modes, and decision thresholds that both parties can draw on during high-tempo operations.
Practical modalities already exist and should be expanded. Synthetic environments and hybrid field experiments permit safe, repeatable practice of edge cases. DARPA’s OFFSET program demonstrates how iterative, mixed virtual-physical field experiments can mature human-swarm interfaces and tactics before systems are pushed into operational units. The value of these programs is the rapid cycle of designing, exercising, diagnosing, and redesigning interactions between people and many-agent systems. Training doctrine should adopt that cycle as standard practice rather than exceptional research activity.
I propose a competency-based curriculum for human-robot teaming with five pillars:
1) Cognitive Transparency and Mental Model Literacy. Operators must be trained to form accurate mental models of autonomy behavior. Exercises should force trainees to articulate what the agent is “trying” to do and why, using constrained explanation templates (a minimal template is sketched after this list). This is not about anthropomorphism. It is about internal models predictable enough that operators can forecast agent actions and spot divergence early. Research on explainability and shared mental models provides clear scaffolding for such syllabi.
2) Playbook Design and Execution. Teams should practice a compact set of plays, or templates, for common missions. Plays reduce cognitive load and enable rapid mutual adaptation. Training should iterate plays in synthetic environments under communication loss, sensor degradation, and adversary interference so that both people and autonomy learn the boundary conditions of each play (see the play-specification sketch below this list). The playbook concept is already recognized in the human-AI teaming literature and should become doctrine.
3) Trust Calibration and Bias Exposure. Trust is a fragile and idiosyncratic variable. Training must therefore include controlled failure drills, trust repair exercises, and metrics-guided reflection (a toy reliance-calibration measure is sketched after this list). Individual differences in confidence and bias toward machines mean that curricula should include psychometrically informed modules that surface operator tendencies and teach corrective strategies. Empirical work shows training can reduce misplaced reliance and improve decision-making when robot assessments conflict with human judgement.
4) Interface Fluency and Conversation Protocols. Humans must be fluent not only in controls but in the conversational acts that sustain teamwork: declaring intent, requesting a replan, signalling uncertainty, and passing authority (a sketch of such message acts follows this list). DARPA and other programs have experimented with game-like, gesture, and haptic command modalities to reduce operator burden. Training needs to normalize multimodal interaction patterns and make the costs and latency of each channel transparent to users.
5) Ethical and Responsibility Scenarios. Teaming with agents forces hard questions about accountability and escalation. Training must integrate ethically fraught scenarios so commanders and operators practice lawful, proportionate responses when machines present options or act autonomously. This reduces moral shock and clarifies decision authority under uncertainty. National-level doctrine should require that these scenarios appear early and often in qualification.
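To make the first pillar concrete, here is a minimal sketch, in Python, of what a constrained explanation template might look like during an exercise. The field names and the simple forecast check are illustrative assumptions, not a fielded standard.

```python
# Illustrative only: a constrained explanation template an operator might fill in,
# plus a trivial check used in after-action review. All names are hypothetical.
from dataclasses import dataclass

@dataclass
class ExplanationTemplate:
    agent_id: str
    stated_goal: str       # what the agent is "trying" to do, in the operator's words
    evidence: str          # the cues or tasking the operator believes drive that goal
    forecast_action: str   # the next action the operator expects from the agent
    divergence_cue: str    # the observation that would show the mental model is wrong

def forecast_matches(template: ExplanationTemplate, observed_action: str) -> bool:
    """Return True if the observed action matches the operator's forecast."""
    return observed_action.strip().lower() == template.forecast_action.strip().lower()

t = ExplanationTemplate("uav-3", "screen the left flank",
                        "tasked sector plus last waypoint",
                        "hold at phase line",
                        "any movement beyond the phase line")
print(forecast_matches(t, "advance past phase line"))  # False -> flag for mental model repair
```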
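The second pillar's plays can likewise be written down explicitly. The sketch below shows a hypothetical play specification with boundary conditions; the fields, thresholds, and the example play are assumptions chosen for illustration, not doctrine.

```python
# A minimal, hypothetical play specification with explicit boundary conditions.
from dataclasses import dataclass

@dataclass
class Play:
    name: str
    human_tasks: list          # steps the human side owns
    agent_tasks: list          # steps the autonomy owns
    max_comms_gap_s: float     # longest comms outage under which the play remains valid
    min_sensor_confidence: float  # below this, the play's assumptions no longer hold
    abort_signal: str          # the agreed signal that cancels the play

def play_is_valid(play: Play, comms_gap_s: float, sensor_confidence: float) -> bool:
    """Check the play's boundary conditions before or during execution."""
    return (comms_gap_s <= play.max_comms_gap_s
            and sensor_confidence >= play.min_sensor_confidence)

overwatch = Play(
    name="bounding_overwatch",
    human_tasks=["designate bound", "confirm agent set"],
    agent_tasks=["move to overwatch position", "report set", "scan assigned sector"],
    max_comms_gap_s=20.0,
    min_sensor_confidence=0.6,
    abort_signal="ABORT-OVERWATCH",
)
print(play_is_valid(overwatch, comms_gap_s=35.0, sensor_confidence=0.8))  # False: outage too long
```

Writing boundary conditions into the play itself is what lets synthetic exercises probe them systematically.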
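For the third pillar, a toy reliance-calibration summary of the kind an after-action review might use could look like the following; the event format and the three rates are assumptions for the sketch, not a validated psychometric instrument.

```python
# A toy reliance-calibration summary for metrics-guided reflection.
# Each event records whether the operator relied on the agent's recommendation
# and whether that recommendation turned out to be correct.

def calibration_summary(events: list) -> dict:
    over = sum(1 for e in events if e["relied"] and not e["agent_correct"])    # misplaced reliance
    under = sum(1 for e in events if not e["relied"] and e["agent_correct"])   # discounted good advice
    n = len(events)
    return {
        "over_reliance_rate": over / n,
        "under_reliance_rate": under / n,
        "appropriate_rate": 1 - (over + under) / n,
    }

log = [
    {"relied": True,  "agent_correct": True},
    {"relied": True,  "agent_correct": False},   # operator should have questioned the agent
    {"relied": False, "agent_correct": True},    # operator ignored a correct assessment
    {"relied": False, "agent_correct": False},
]
print(calibration_summary(log))
# {'over_reliance_rate': 0.25, 'under_reliance_rate': 0.25, 'appropriate_rate': 0.5}
```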
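For the fourth pillar, the conversational acts and the cost of each channel can be enumerated explicitly so that trainees see them. The sketch below is one hypothetical way to do so; the act names, channels, and latency figures are assumptions used only to make the idea concrete.

```python
# Hypothetical enumeration of conversational acts and per-channel costs.
from dataclasses import dataclass
from enum import Enum

class Act(Enum):
    DECLARE_INTENT = "declare_intent"
    REQUEST_REPLAN = "request_replan"
    SIGNAL_UNCERTAINTY = "signal_uncertainty"
    PASS_AUTHORITY = "pass_authority"

# Nominal latency (seconds) and operator workload per channel, purely illustrative.
CHANNEL_COSTS = {
    "voice":   {"latency_s": 2.0, "workload": "high"},
    "gesture": {"latency_s": 1.0, "workload": "low"},
    "haptic":  {"latency_s": 0.5, "workload": "low"},
    "tablet":  {"latency_s": 5.0, "workload": "medium"},
}

@dataclass
class TeamMessage:
    act: Act
    channel: str
    payload: str

    def expected_latency(self) -> float:
        """Surface the cost of the chosen channel to the user."""
        return CHANNEL_COSTS[self.channel]["latency_s"]

msg = TeamMessage(Act.PASS_AUTHORITY, "tablet",
                  "transfer route authority to uav-3 until checkpoint 2")
print(msg.act.value, msg.expected_latency())  # pass_authority 5.0
```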
How should this curriculum be embedded in existing force structures? My recommendation is a tiered approach: introductory professional military education to build conceptual literacy, advanced operator courses with high-fidelity simulators for playbook practice, and unit-level wargames and live combined-arms exercises that validate human-robot team performance in contested electromagnetic and information environments. Crucially, training pipelines must include cross-training of humans into the role of the machine and vice versa. Letting humans run the autonomy stack in simulation builds empathy for latency, sensing limits, and failure modes. Conversely, including autonomy in the loop during commander decision rehearsals forces software designers to see where explanations fail.
Assessment must change. Stop measuring only platform reliability and operator checklist compliance. Start measuring team fluency, shared situation awareness, and the calibration of authority transitions under stress. Objective measures can include time-to-detect divergence, correct invocation of contingency plays, and qualitative scoring of after-action shared mental model alignment; two of these measures are sketched below. These metrics should guide both acquisition and persistent training. The National Academies committee has emphasized the need for metrics and soldier-in-the-loop experiments to make research relevant to operational users.
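As one illustration, time-to-detect divergence and correct invocation of contingency plays could be computed from an exercise log along these lines. The event names and log format are assumptions made for the sketch, not an existing standard.

```python
# Sketch of two team-fluency measures computed from a hypothetical exercise log.
from typing import Optional

def time_to_detect_divergence(events: list) -> Optional[float]:
    """Seconds between the first agent divergence and the first operator call-out."""
    diverged = next((e["t"] for e in events if e["type"] == "agent_diverged"), None)
    noticed = next((e["t"] for e in events if e["type"] == "operator_flagged"), None)
    if diverged is None or noticed is None:
        return None
    return max(0.0, noticed - diverged)

def contingency_invocation_rate(events: list) -> float:
    """Fraction of triggered contingencies answered with the expected play."""
    triggers = [e for e in events if e["type"] == "contingency_trigger"]
    correct = sum(1 for e in triggers if e.get("invoked_play") == e.get("expected_play"))
    return correct / len(triggers) if triggers else 1.0

log = [
    {"t": 100.0, "type": "agent_diverged"},
    {"t": 118.0, "type": "operator_flagged"},
    {"t": 140.0, "type": "contingency_trigger",
     "expected_play": "lost_link_rally", "invoked_play": "lost_link_rally"},
]
print(time_to_detect_divergence(log), contingency_invocation_rate(log))  # 18.0 1.0
```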
Finally, accept the epistemic humility that machines will remain partial teammates for years. Do not treat autonomy as a panacea. Training is where we convert technological possibility into operational reliability. If we do this well, human-robot bonds will be the product of disciplined practice, transparent expectations, and regular exposure to failure. If we do it poorly, we will produce brittle, overconfident teams that fail when it matters most. The choice is not between human primacy and machine primacy. It is between the deliberate cultivation of partnership and improvisation in crisis. The former requires investment in curriculum, exercise architecture, and human-centered metrics now. The latter guarantees painful lessons later.