Trust by Default
Defense AI capabilities are accelerating rapidly, but the design frameworks for how operators should relate to those systems are not. Miscalibrated trust is becoming an operational failure mode, and right now, many of those trust relationships are being shaped accidentally through engineering defaults rather than intentionally through design.
TL;DR
AI systems will increasingly influence operational decisions across every phase of military workflows.
Over-trust and under-trust both increase risks of operational failure.
Explainability alone does not produce calibrated trust and may increase automation bias.
Operators often oscillate between blind acceptance and exhaustive skepticism.
Autonomous systems introduce new tensions between trust, oversight, and desire for control.
Commercial technology has already documented many of these human-AI trust failures.
Defense needs to treat trust calibration as a formal operational design requirement.
The Relationship Nobody Designed
An operator sits at a terminal, in the box, with an AI recommendation for lethal action onscreen. The system has processed more data, from more sources, faster than the operator could in real time. The recommendation is there, and the window is closing.
The question is not whether the AI is correct. The question is whether the operator’s trust in that recommendation, in that specific moment, against that specific threat, is calibrated correctly.
Too much trust creates one kind of failure. The operator defers to the system when they should have intervened. Too little trust creates another. The operator overrides a correct recommendation, eliminating the very operational advantage the system was designed to provide. Both are operational failures. Both carry consequences. Yet despite the extraordinary amount of momentum surrounding AI in defense, very little attention is being given to how operators should actually relate to these systems once they enter real operational workflows.
Over the last several years working on command-and-control platforms, autonomous systems, and mission-critical interfaces, I have watched conversations around AI capabilities accelerate dramatically. Nearly every discussion now includes some variation of the same themes: decision aids, cognitive load reduction, sensor integration, autonomous execution, machine-assisted analysis. The assumption that AI will become embedded within military workflows is no longer speculative; it is inevitable, and we are barreling toward it.
What has struck me, however, is how rarely those conversations include a meaningful discussion of the warfighter experience and trust calibration. Not trust in the abstract or on a training mission, but real-world operational trust.
How should uncertainty be communicated? How should confidence be surfaced? When should an operator trust the recommendation in front of them, and when should they challenge it? What should meaningful oversight actually look like inside an interface under pressure within a chain of command? These questions are often treated as downstream implementation details, secondary concerns to solve once the technical capability is demonstrated.
This is becoming, if it has not already become, one of the central user experience design problems in modern defense systems.
Miscalibrated Trust
The scale of investment makes this challenge difficult to ignore. Defense organizations and contractors are investing billions into AI-enabled capabilities intended to accelerate decision-making, prioritize information, and increase operational tempo. In many cases, those investments are justified. AI-assisted systems can process sensor data faster than humans, identify patterns at scale, surface relevant information, and disseminate it more quickly than operators working alone ever could.
The problem is not the capability investment. The problem is that the relationship between operators and those capabilities is often being shaped implicitly through engineering defaults rather than intentionally through human-centered design that puts the warfighter first. Every interface communicates a trust model whether the design team intends it to or not.
If every recommendation appears visually identical, operators learn that every recommendation carries equal reliability. If override is cumbersome, operators learn not to question the system. If the system fails without explanation, distrust persists long after the issue itself is corrected. These are not philosophical concerns. They are interaction design decisions with operational consequences.
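As a rough illustration, not drawn from any specific program, consider what an interface needs in order to render those differences at all. The sketch below assumes a hypothetical recommendation model that carries confidence, corroboration, and provenance, so that visual tiering and override affordance become deliberate design decisions rather than accidents of the data pipeline.

```typescript
// Hypothetical recommendation model carrying the signals an interface
// needs to render trust-relevant differences, rather than presenting
// every recommendation identically. Names and thresholds are illustrative.

type ConfidenceTier = "low" | "moderate" | "high";

interface Recommendation {
  id: string;
  action: string;        // e.g. "track", "interrogate", "engage"
  confidence: number;    // model confidence, 0..1
  sourceCount: number;   // number of corroborating sensor or intel sources
  modelVersion: string;  // which model produced this output
}

// Collapse raw confidence plus corroboration into a coarse tier the
// operator can distinguish at a glance.
function tierOf(rec: Recommendation): ConfidenceTier {
  if (rec.confidence >= 0.9 && rec.sourceCount >= 2) return "high";
  if (rec.confidence >= 0.6) return "moderate";
  return "low";
}

// The tier changes the interaction, not just the styling: anything below
// "high" defaults to review rather than accept, and override stays one
// action away instead of buried behind extra steps.
function defaultAffordance(rec: Recommendation): "accept" | "review" {
  return tierOf(rec) === "high" ? "accept" : "review";
}
```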
What I have observed in research environments is that operators trust systems to degrees that may or may not align with what those systems are actually capable of, or with the operators' own training and expertise. The result is trust miscalibration, which can happen in both directions.
The over-trust scenario is relatively familiar. Aviation has spent decades studying automation bias, where operators become overly reliant on system recommendations and fail to intervene appropriately when the system is wrong. As AI systems become more integrated into defense workflows, similar patterns will begin to emerge.
During research sessions evaluating AI-assisted workflows, I have watched soldiers presented with interfaces that identify threats, recommend deterrent solutions, and ask whether they want to engage. In some cases, the response is almost immediate. Yes. Engage. The recommendation itself becomes the authority signal.
What makes these interactions difficult to interpret is that research environments are inherently imperfect, built around imagined scenarios. Soldiers participating in evaluations are often highly trained, highly confident, and motivated to project competence. Many are well accustomed to adapting quickly to new systems and procedures. It can be surprisingly difficult to get beneath the surface layer of “Yeah, this works,” especially when evaluating hypothetical or emerging capabilities.
The opposite pattern appears just as frequently. Rather than over-trusting the system, some operators don't trust it at all. They want every possible input surfaced before acting: every confidence level, every classification criterion, every piece of supporting metadata, every explanation for how the recommendation was generated. In low-pressure environments, this can sound responsible and appropriately cautious. Under the duress of real-time operations, however, it can become debilitating. At some point, the speed advantage the system was designed to provide collapses under the weight of the operator’s need to validate every detail manually.
Control Versus Trust
There is a third category of behavior that I have found particularly interesting because it is not so much about trust as it is about control.
While evaluating autonomous agent workflows, I have often seen operators manually define extremely detailed courses for systems specifically built for high-level autonomous tasking. The concept behind these systems is relatively straightforward: assign agents, define objectives, set actions, and allow the autonomy stack to execute the details. Yet many operators instinctively gravitate toward micromanaging every movement the platform should make. Not necessarily because they distrust the autonomy, but because they want the system to do exactly what they command. Their mental model is not command-and-observe but something much closer to teleoperation.
That distinction matters because the interface required to support autonomous objective-setting may be fundamentally different from the interface that supports high-touch manual control. Neither behavior is inherently wrong and both are understandable. But only one actually aligns with the operational intent of the system being designed.
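A simple way to see the difference is in the shape of the commands each mental model produces. The types below are purely illustrative, assuming a hypothetical tasking API: one message states intent and constraints and leaves execution to the autonomy stack, the other spells out every waypoint.

```typescript
// Illustrative only: two command shapes for the same platform, assuming a
// hypothetical tasking API. Neither is "the" interface; the contrast just
// makes the gap between the two mental models visible.

type GeoPoint = { lat: number; lon: number };
type GeoPolygon = GeoPoint[];

// Command-and-observe: the operator states intent and constraints and
// leaves route planning to the autonomy stack.
interface ObjectiveTasking {
  agentIds: string[];
  objective: "search_area" | "escort" | "observe";
  areaOfInterest: GeoPolygon;
  constraints: { maxAltitudeMeters?: number; noGoZones?: GeoPolygon[] };
}

// Teleoperation-style control: the operator specifies every movement.
interface ManualRoute {
  agentId: string;
  waypoints: { point: GeoPoint; altitudeMeters: number }[];
  speedMetersPerSecond: number;
}
```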
The explainable AI conversation complicates this further. One of the more counterintuitive findings emerging from explainable AI research is that explanations themselves can increase automation bias: a clear, confident explanation can function as an authority signal. The more coherent and polished the rationale appears, the more likely operators may be to defer to it, including in situations where the recommendation is wrong.
DARPA’s framing around explainable AI is important here because it avoids a trap many organizations still fall into: the goal is not complete trust but calibrated trust, and those are not the same thing.
Adding confidence scores to an interface is not the same as designing calibrated trust. More transparency does not automatically produce better judgment. In some cases, it may simply make operators more confident in potentially bad recommendations.
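What calibrated trust means in practice can be made concrete with logged decision data. The sketch below is a minimal, hypothetical example, assuming each operator decision is recorded alongside the system's confidence and the eventual ground truth: trust is calibrated when acceptance rates track actual accuracy within each confidence band, and a persistent gap in either direction flags over-trust or under-trust.

```typescript
// Minimal sketch of measuring trust calibration from logged decisions.
// Field names and the bucketing scheme are illustrative assumptions.

interface DecisionRecord {
  systemConfidence: number;      // 0..1, as shown to the operator
  operatorAccepted: boolean;     // did the operator act on the recommendation?
  recommendationCorrect: boolean; // did the recommendation prove correct?
}

interface BucketCalibration {
  bucket: number;
  acceptanceRate: number;
  accuracy: number;
  gap: number; // positive = over-trust in this band, negative = under-trust
}

function calibrationByBucket(records: DecisionRecord[], buckets = 5): BucketCalibration[] {
  const out: BucketCalibration[] = [];
  for (let b = 0; b < buckets; b++) {
    const lower = b / buckets;
    const upper = (b + 1) / buckets;
    const inBucket = records.filter(
      r => r.systemConfidence >= lower && (r.systemConfidence < upper || b === buckets - 1)
    );
    if (inBucket.length === 0) continue;
    const acceptanceRate = inBucket.filter(r => r.operatorAccepted).length / inBucket.length;
    const accuracy = inBucket.filter(r => r.recommendationCorrect).length / inBucket.length;
    out.push({ bucket: b, acceptanceRate, accuracy, gap: acceptanceRate - accuracy });
  }
  return out;
}
```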
AI Inside the OODA Loop
Part of the reason this problem feels so urgent is that AI will no longer sit adjacent to military decision-making; it will be embedded directly inside it.
The OODA loop, Observe, Orient, Decide, Act, has shaped military thinking about decision-making in high-intensity situations for decades. Historically, systems supported each phase through information hierarchy, contextual organization, structured decision pathways, and transparent execution. AI is now entering every phase of the loop simultaneously: sensors filtering and prioritizing information algorithmically, analysis layers contextualizing and weighting incoming data through machine learning, recommendation engines influencing decisions, and all of it marching toward autonomous systems executing actions directly.
The loop itself has not changed, but what the operator does within each phase absolutely will.
At the Observe stage, operators will need to understand what information has been filtered or prioritized by AI systems versus what remains raw input. During Orient, they will need to understand how machine-assisted analysis should relate to their own expertise and outside situational awareness. During Decide, interfaces must avoid creating automation bias at precisely the phase where critical thinking matters most. This is where concepts like cognitive forcing functions become incredibly important. These trigger guards are deliberate interface interventions that interrupt reflexive deference long enough to require engaged judgment before action occurs.
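To make the idea concrete, here is a minimal sketch of one possible forcing function, assuming a hypothetical engage workflow: the action stays unavailable until the operator has explicitly acknowledged the weakest piece of evidence behind the recommendation. The field names and gating rule are illustrative, not drawn from any fielded system.

```typescript
// Minimal sketch of one possible cognitive forcing function, assuming a
// hypothetical engage workflow. The gate is deliberately narrow: confirm
// the single weakest piece of evidence, not re-validate everything.

interface EvidenceItem {
  label: string;        // e.g. "track correlation", "IFF response"
  confidence: number;   // 0..1
}

interface EngageRequest {
  recommendationId: string;
  evidence: EvidenceItem[];
}

function weakestEvidence(req: EngageRequest): EvidenceItem {
  if (req.evidence.length === 0) {
    throw new Error("engage request has no supporting evidence");
  }
  return req.evidence.reduce((min, e) => (e.confidence < min.confidence ? e : min));
}

// The engage control stays disabled until the operator has explicitly
// reviewed and acknowledged the weakest supporting evidence.
function canEngage(req: EngageRequest, acknowledgedLabels: Set<string>): boolean {
  return acknowledgedLabels.has(weakestEvidence(req).label);
}
```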
At Act, the challenge changes again. Once autonomous systems are maneuvering, sensing, coordinating, or executing tasks independently, the operator is no longer simply deciding whether to follow a recommendation. They are deciding whether they understand the system well enough to intervene appropriately if necessary. That is a fundamentally different design problem than many current conversations acknowledge.
Commercial Lessons With Defense Stakes
What makes all of this particularly important is that these patterns are not theoretical. The commercial technology world has already spent years documenting many of them at scale. Users over-trust systems and make poor decisions. Users under-trust systems and abandon capabilities entirely. Users lose confidence after a single unexplained failure and often never recover trust completely. These are known patterns across consumer software, autonomous systems, medical technology, and aviation interfaces.
Defense is not facing a fundamentally different human problem; the stakes are simply much higher.
There is also an expectation gap already emerging that defense organizations should take seriously. The next generation of warfighters is arriving with deeply ingrained expectations shaped by a lifetime of commercial technology. They have grown up using systems that personalize experiences, support fluid interaction, fail gracefully, increasingly explain their behavior, and are continuously updated. At the same time, military systems still feel fragmented, rigid, workflow-heavy, and just old. I have heard Marines refer to legacy systems and equipment simply as “green gear,” shorthand for technology generations behind what they encounter everywhere else in their lives.
That expectation mismatch matters because operators form trust relationships with systems extremely quickly. Early interactions establish mental models that can persist for years, accurate or not.
Designing Trust Intentionally
The encouraging part is that this problem is solvable. Human factors engineering, usability testing, cognitive task analysis, iterative prototyping, and contextual inquiry are all well-established methodologies. Commercial technology, aviation, and medical systems have already demonstrated the value of designing for calibrated trust rather than assuming it emerges automatically from technical capability alone.
The defense industry does not need to invent these methods from scratch but it does need to begin treating trust calibration as a requirement rather than an afterthought. Regardless of whether organizations acknowledge it, design decisions are already being made, and not by experts in design working to gather and represent the warfighter's perspectives.
The operator wants to trust the system. The question is whether defense programs are intentionally designing that relationship before these systems become problematically ubiquitous.
Further Reading
2026 Defense Industry Outlook — COTS Journal / IDC AI Spending Projection
User Experience Design for Defense Systems with AI — SpringerLink
Military UX and Product Design for Defense Systems — Visual Logic
UX/UI Design in Human-Machine Interfaces Within Defense — Merkur Design
From Swarms to Digital Twins: AI’s Future in Defense — Military Embedded Systems