In 2022, an Air Canada customer asked a chatbot about bereavement fares. The bot confidently gave him a refund policy that didn’t exist. The airline refused to honor it. In 2024, a tribunal ruled in the customer’s favor. The bot hadn’t decided anything; it had predicted an answer based on patterns in its training data. The interface presented that prediction as policy, and the tribunal held the company to it.

This is the risk at the heart of designing with AI today: probabilistic systems wrapped in deterministic interfaces. The AI offers a guess, the interface presents it as truth, and the user, or the organization, acts on it.

Humans are wired for deterministic thinking. We prefer to believe that past actions determine future outcomes. Flip a coin 999 times and get heads every time, and the deterministic mind assumes the coin is rigged. The probabilistic mind accepts that the 1000th flip could still go either way. That second mindset is harder to hold onto, but it is exactly what designers need right now.

Products operate in complex, nonlinear environments, and AI is accelerating that complexity. When designers and product teams treat AI outputs as the answer rather than one of many possible answers, they build fragile experiences, and in some cases, like medical diagnostics or financial forecasting, genuinely dangerous ones.

This article is a practical guide to designing probabilistically with AI as a partner. It is about using AI to sharpen your thinking rather than outsource it, accounting for model bias, human sentiment, and perceived risk along the way.

Probabilistic Thinking + AI

Most questions we ask AI do not produce binary answers. They produce probabilities based on patterns in data. If you ask, “Do aliens exist?” the answer will be somewhere between plausible and uncertain. Scientists consider life elsewhere in the universe likely, but without any concrete evidence, we cannot confirm it. The answer doesn’t resolve the question; it frames it as a probability.

Designers should read AI outputs the same way. They are signals, not conclusions: possible outcomes that have to be interpreted within the context of product goals, user behavior, and business constraints.

Many digital products already work this way. Netflix doesn’t know you’ll enjoy Superstore because you watched The Office; it estimates the probability and surfaces the title accordingly. The interface is responding to a prediction.

Design decisions can follow the same logic. AI models can combine behavioral analytics with research insights to estimate the likelihood of certain outcomes, and those probabilities can act as a yardstick for design strategy. Consider a scenario where analytics suggest either 60% or 90% confidence that users will complete a purchase. At 60%, the design has to do more persuasive work: testimonials, explanations, comparisons, and reassurance signals may help the user move toward a decision. At 90%, the user is already motivated, and the design should start removing friction so the action can happen quickly. Same screen, very different design problem.

Comparison of two hair product ads showing the same model, with the simplified design on the right labeled 90% confidence and the text-heavy design on the left labeled 60% confidence.
Note: This is an oversimplification of the idea. Please be mindful of the intricate details of your product.
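To make the 60% versus 90% scenario concrete, here is a minimal sketch of how a team might map an estimated purchase likelihood to a design treatment. The cut-offs, the `Treatment` type, and the listed interface elements are illustrative assumptions, not values from any real product; the middle band is also an assumption, added to show that in-between estimates call for testing rather than a fixed rule.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Treatment:
    name: str
    elements: tuple[str, ...]


# Hypothetical cut-offs; tune them against your own analytics.
HIGH_INTENT = 0.85
LOW_INTENT = 0.65


def choose_treatment(purchase_likelihood: float) -> Treatment:
    """Map an estimated purchase likelihood (0.0 to 1.0) to a design treatment."""
    if not 0.0 <= purchase_likelihood <= 1.0:
        raise ValueError("purchase_likelihood must be between 0 and 1")
    if purchase_likelihood >= HIGH_INTENT:
        # The user is already motivated: remove friction.
        return Treatment("streamlined", ("single call to action", "short form"))
    if purchase_likelihood <= LOW_INTENT:
        # The design has to persuade: add reassurance.
        return Treatment(
            "persuasive",
            ("testimonials", "explanations", "comparisons", "reassurance signals"),
        )
    # In between, neither extreme is clearly right, so test both.
    return Treatment("experiment", ("run both variants and compare",))


print(choose_treatment(0.90).name)  # streamlined
print(choose_treatment(0.60).name)  # persuasive
```

The estimate selects a direction to explore; it does not settle the design on its own.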

AI can also simulate outcomes using historical data and behavioral models before you commit to a direction. The value of those simulations depends heavily on how prompts are structured: the context they define, the hypothesis being tested, the user motivation they assume, and the edge cases you want stressed.

I can think of one such practical use: evaluating early designs through structured prompts, especially when you don’t have direct access to the user group you’re designing for. The prompt below is a starting point for evaluating a design from the perspective of neurodivergent users. Treat it as a template: adapt the user group, criteria, and output format to your product, and use it as a conversation starter with your team rather than a verdict.

Evaluate the [design file or weblink] for usability, accessibility, and content relevance from the perspective of neurodivergent users such as those with autism spectrum disorder, ADHD, learning disabilities, etc.
Please consider the following criteria:

  1. Is the layout and navigation intuitive for neurodivergent users?
  2. Is the language and content appropriate and engaging for neurodivergent users?
  3. Are there any barriers (technical, cognitive, or sensory) that this group might face when using the site?
  4. How well does the site meet the specific needs or goals of neurodivergent users?

Provide a SWOT analysis, probability score for successful use by neurodivergent users, and any recommendations for improvement.

Note: This is an oversimplification of the idea. Please be mindful of the intricate details of your product and make any appropriate changes.

That said, simulations do not replace experimentation. Because models are trained on historical data, they reflect past behavior more strongly than they predict future change. Imagine designing a voice interface for elderly users who struggle with touchscreens. A model trained on mobile interaction data might predict low engagement, not because the idea lacks value, but because the dataset reflects different user behavior. Simulations should always surface assumptions, not prevent innovation.

Be Cautious of Skewed Probabilistic Thinking Using AI

AI systems are built on historical data, or more specifically, on the datasets they are trained on. That foundation shapes the outputs we receive. During the AI Action Summit in France, India’s Prime Minister Narendra Modi shared an example that illustrates this well. If you ask an AI model to generate an image of a person writing with the left hand, the output may still show a person writing with their right hand. The reason is statistical: most people are right-handed, and the training data reflects that. This may have improved over time, but the point remains relevant. I still occasionally see this behavior when generating images with similar models.

What you receive is not truth. It is the most statistically likely outcome given the data available. Always ask whether past data meaningfully predicts future behavior. If additional context can improve the prediction, include it. Without context, the output is just one of many possible answers dressed up as the only one.

Prompt, which reads: “create an image of a person sitting in his chair facing his desk and writing with his left hand in his notebook,” and the image created for it.

Confidence scores deserve the same scrutiny. Overtrusting a high-confidence output leads to the Air Canada situation. Dismissing a low-confidence one can cause teams to miss a real signal buried in noisy data. A prediction with 90% confidence is not necessarily correct, and a 40% signal is not necessarily useless. Designers must still weigh the possibilities, consider the case in front of them, and bring judgment to what the AI recommends.
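One way to give confidence scores that scrutiny is to check them against what actually happened. The sketch below groups past predictions into confidence bands and compares the stated confidence with the observed accuracy. The function name and the review log are made up for illustration; a real check would need far more than eight data points.

```python
from collections import defaultdict


def calibration_table(predictions, n_bins=5):
    """Compare stated confidence with observed accuracy, bin by bin.

    predictions: iterable of (confidence, was_correct) pairs, where
    confidence is between 0 and 1 and was_correct is a bool.
    """
    bins = defaultdict(list)
    for confidence, was_correct in predictions:
        # Clamp so that a confidence of exactly 1.0 lands in the top bin.
        index = min(int(confidence * n_bins), n_bins - 1)
        bins[index].append((confidence, was_correct))

    rows = []
    for index in sorted(bins):
        items = bins[index]
        rows.append({
            "range": f"{index / n_bins:.1f}-{(index + 1) / n_bins:.1f}",
            "count": len(items),
            "mean_confidence": round(sum(c for c, _ in items) / len(items), 2),
            "accuracy": round(sum(1 for _, ok in items if ok) / len(items), 2),
        })
    return rows


# Made-up review log: (model confidence, did the prediction hold up?)
log = [
    (0.95, True), (0.91, False), (0.93, False), (0.97, True),
    (0.42, True), (0.45, True), (0.48, False), (0.41, True),
]

for row in calibration_table(log):
    print(row)
```

In this invented log, the high-confidence band is right only half the time while the low-confidence band holds up more often than its score suggests, which is exactly the kind of gap worth spotting before a score reaches the interface.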

Transparency is how you make that possible. As AI systems increasingly shape decisions, people need visibility into how outputs are generated: the sources, the reasoning, and the summaries behind a recommendation. Black-box systems breed distrust. Systems that reveal their reasoning let users evaluate outputs for themselves. That transparency is good design and ethical practice. It respects the trust people place in these tools.

Thinking in probabilities often means resisting the temptation of quick answers. AI can accelerate research and surface patterns faster than ever before, but those outputs are starting points, not final decisions.

Practice Probabilistic Design with AI

Design shapes how a product is ultimately experienced — the decisions designers make determine whether the experience feels adequate, intuitive, or exceptional. And design is inherently full of assumptions and bets. Even the most rigorous research can yield multiple valid solutions to the same problem, each carrying a different probability of success.

Thinking probabilistically means recognizing that design decisions rarely produce binary outcomes. They lead to a range of possible results, and the role of the designer is to navigate those possibilities and identify the path most likely to create value. This mindset also builds adaptability: user needs evolve, strategies change, and sometimes ideas fail. Teams that lean on data signals, experimentation, and learning loops move faster toward the most effective solution.

Before the practical principles, one fundamental idea:

Design decisions should be optimized for likelihood, not certainty.

Design for Likelihood, Not Certainty

Every design decision is a bet, not a guarantee. Even when decisions are informed by research and data, they are still based on smaller samples and assumptions about how users will behave at scale. A well-researched idea can still fail in the real world.

The Air Canada chatbot from the introduction is a design lesson as much as a legal one. The bot was doing what language models do, predicting plausible text. The interface, however, communicated that prediction with complete confidence: no caveats, no “here’s what our policy usually says,” no obvious path to a human. The user read confidence as commitment, and legally, so did the tribunal.

This is what happens when probabilistic systems are wrapped in deterministic interfaces. The interface transforms likelihood into certainty, and that is where the risk emerges.

Designing for likelihood means letting the interface carry the uncertainty through to the user: visible uncertainty cues, clear fallbacks to human support, and clear labeling when content is AI-generated, so that a prediction is not mistaken for a commitment.

Designers should avoid binary thinking: a great idea does not guarantee success, and a familiar idea is not guaranteed to fail. Examine variations, confidence levels, and edge cases instead. AI can help here, acting as a portfolio-thinking engine that surfaces different interpretations, highlights risks, and generates structured recommendations. The goal is to optimize for value, not certainty.

Think of the moment in Avengers: Infinity War when Doctor Strange tells Tony Stark that out of millions of possible futures, there is only one where they win. AI cannot tell you the future, but it can help you explore the possible paths. Instead of asking whether an idea will succeed, ask AI to estimate the likelihood and get a score, and use those signals to guide decisions.

Use Data as a Compass, Not a Map

Even an actual probability is not a final answer. Imagine an AI model predicts an 80% likelihood that users prefer a minimal checkout experience. That does not mean the solution is simply “build a minimal checkout.” Data should function as a compass, not a map. Before acting on the number, ask:

  • Why did the model produce that prediction?
  • What data influenced it?
  • What assumptions is it leaning on?
  • What user behavior is it actually detecting?

These questions help designers validate predictions through usability testing and additional research. AI excels at identifying patterns, but it rarely explains why those patterns exist. Understanding motivation is still a human-centered research task.

The clearest cautionary tale here is Amazon’s experimental AI recruitment tool, which the company reportedly scrapped after discovering that the model had learned to downgrade resumes from women. The training data, roughly a decade of historical hiring decisions, was skewed toward male candidates, and the model inherited that skew. It began penalizing resumes that included the word “women’s,” as in “women’s chess club captain,” and favoring language more commonly found on men’s resumes. The system was not intentionally biased — the data was. Amazon reportedly tried to adjust it and eventually shut the project down because they could not guarantee it would not surface other discriminatory patterns.

Examples like this are why interpreting AI output critically matters. Designers need to understand the data behind a prediction and evaluate the reliability of the models they depend on. A recommendation is only as good as the data it was trained on, and the only way to know what that data is hiding is to ask.

Experiment as a Learning System

Experimentation is usually framed as a way to validate a design decision. Want to lift the click-through rate of a CTA? Run an A/B test. Probabilistic thinking reframes this. Experiments should not only confirm solutions but also reduce uncertainty.

  • Traditional approach: Testing features to confirm success.
  • Probabilistic approach: Testing assumptions to reduce uncertainty.

Traditional A/B testing is expensive. It costs engineering time, traffic allocation, and user exposure, especially when a losing variant runs against a significant chunk of your audience. AI simulations can make experimentation more efficient by filtering out weaker ideas before they reach production. User needs shift constantly, and the most effective teams iterate fast.

AI can help evaluate assumptions early by modeling potential outcomes based on historical and behavioral data. These simulations act as a hypothesis filter, pointing to the directions worth investing engineering effort in. This also supports personalization — different users may respond better to different experiences. Version A may resonate with high-intent users while version B works better for exploratory ones. Multiple experiences living side by side are not a flaw; they can be an intentional strategy.

AI amplifies probabilistic thinking by surfacing scenarios, assigning likelihood scores, and enabling personalization at scale. Experimentation becomes a continuous feedback loop:

Predict → Test → Learn → Adjust → Repeat!

A few steps to make it work:

  • Shift the framing

    • Instead of asking: Will this feature succeed?
    • Ask: What assumptions are we testing?
    • Use this template to define the hypothesis:

      We believe [behavioral assumption] will impact [metric] because [reason]. We’ll know we are right when [evidence].

      Example: We believe simplifying the onboarding flow from 5 steps to 3 will increase completion rate because users experience decision fatigue when too many choices are presented. We’ll know we’re right when we see at least a 15% increase in step-to-step conversion with no drop in activation rate.

  • AI simulations

    • Use AI to estimate how likely each assumption is to hold.
    • Then use what you learn to pick the top candidates for testing the hypothesis.
  • Embrace multi-versions

    • It is absolutely fine to have two live versions.
  • Fail fast

    • Reward learning, not just success.
    • Normalize small experiments over sweeping changes. Instead of taking one risky bet, pick a few likely options and test them.
  • Visualize probability

    • Create a probability table that lists each variant alongside its predicted likelihood of success, so you can keep track of all the changes.
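A probability table does not need special tooling. The sketch below is one possible shape for it, assuming each variant has an estimated likelihood of success and, once a test has run, an observed result. The variant names and numbers are invented for illustration.

```python
def probability_table(variants):
    """Return a plain-text table of variants, highest estimate first.

    variants: list of dicts with "name", "estimated" (0 to 1), and
    "observed" (0 to 1, or None if the variant has not been tested yet).
    """
    rows = sorted(variants, key=lambda v: v["estimated"], reverse=True)
    lines = [f"{'Variant':<22}{'Estimated':>10}{'Observed':>10}"]
    for v in rows:
        observed = "pending" if v["observed"] is None else f"{v['observed']:.0%}"
        lines.append(f"{v['name']:<22}{v['estimated']:>10.0%}{observed:>10}")
    return "\n".join(lines)


# Hypothetical onboarding variants and scores.
variants = [
    {"name": "5-step onboarding", "estimated": 0.40, "observed": 0.38},
    {"name": "3-step onboarding", "estimated": 0.62, "observed": None},
    {"name": "Single-page signup", "estimated": 0.55, "observed": None},
]

print(probability_table(variants))
```

Keeping the estimate and the observed result side by side makes it easy to see, over time, how well the predictions held up.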

Communicate Uncertainty Clearly

One of the hardest things for designers is making uncertainty understandable and actionable. When uncertainty is hidden, users treat AI outputs as facts. When it’s communicated clearly, trust increases.

Ranges, estimates, and confidence indicators go a long way. A delivery window of “Friday to Monday” tells the truth about variability without misleading anyone, whereas a specific timestamp that slips erodes trust every time. A face recognition feature that says “this looks like Pratik, is that right?” sets more honest expectations than one that just labels the photo with a name.

Communicating uncertainty does not weaken trust — it strengthens it. The goal is not to eliminate uncertainty but to design for it intelligently.
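Both examples above, the delivery window and the face recognition prompt, come down to how the interface words an estimate. Here is a minimal sketch of that wording logic. The function names, the 0.5 cut-off, and the fallback question are assumptions chosen for illustration.

```python
from datetime import date


def delivery_window(earliest: date, latest: date) -> str:
    """Describe a delivery estimate as a range instead of a single point."""
    if latest < earliest:
        raise ValueError("latest must not be before earliest")
    if earliest == latest:
        return f"Arriving {earliest:%A}"
    return f"Arriving {earliest:%A} to {latest:%A}"


def name_suggestion(name: str, confidence: float, min_confidence: float = 0.5) -> str:
    """Phrase a face match as a question the user can confirm or correct."""
    if confidence < min_confidence:
        # Too uncertain to suggest a name at all (hypothetical cut-off).
        return "Who is this?"
    return f"This looks like {name}, is that right?"


print(delivery_window(date(2025, 1, 3), date(2025, 1, 6)))  # Arriving Friday to Monday
print(name_suggestion("Pratik", 0.82))
```

Neither function asserts more than the system knows: the range stays a range, and the match stays a question.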

Different users respond to uncertainty differently, and your design should account for that:

| User type | Risk | Design goal |
| --- | --- | --- |
| Overtrusting users | They act too quickly and trust AI results too easily. | Show uncertainty more prominently. |
| Distrustful users | They ignore AI entirely. | Show historical accuracy or confidence levels. |
| Skeptical/balanced users | They use AI as a guide, not as a rule. | Reinforce AI assistance and let them decide how results are framed. |

Keep Humans In the Loop

AI should augment human judgment, and certainly not replace it. The most trustworthy systems are designed with clear moments where people can review, challenge, correct, or override machine suggestions. Human-in-the-loop (HITL) is not a safety net — it is a refinement engine. Every override, correction, or rejection becomes high-quality feedback that improves the model over time.

Control is a prerequisite for adoption. Users are more willing to rely on AI when they understand how a suggestion was generated, can evaluate its implications, and can easily intervene. Well-designed products make this explicit: who is acting, what happens if the suggestion is wrong, and where the user can step in.

These interactions are also critical for system improvement. Every accept, reject, or edit is a strong signal, and compared to passive analytics, this kind of feedback produces far more meaningful training data. It closes the loop between real-world usage and model performance.

What Does HITL Look Like in Practice?

GitHub Copilot is a good everyday example. It offers inline code suggestions that developers can accept with a tab, edit, or ignore entirely. The system never commits code on the user’s behalf. Authorship stays with the human. Each accept, edit, or dismissal becomes implicit feedback about which suggestions were useful. Gmail’s Smart Compose works similarly, presenting predicted text as optional, keeping tone and intent in the user’s hands.

In higher-stakes contexts, HITL becomes more explicit. Risk and fraud systems typically use probability scores to route decisions: low-risk cases proceed automatically, medium-risk cases trigger additional verification, and high-risk cases escalate to a human reviewer. This balances speed with judgment without removing oversight.
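The three-tier routing described above can be sketched in a few lines. The thresholds of 0.3 and 0.7 are placeholder assumptions; real systems set them from their own loss data and review capacity.

```python
from enum import Enum


class Route(Enum):
    AUTO_APPROVE = "proceed automatically"
    VERIFY = "trigger additional verification"
    HUMAN_REVIEW = "escalate to a human reviewer"


def route_by_risk(risk_score: float, verify_at: float = 0.3, review_at: float = 0.7) -> Route:
    """Route a decision by its risk score (0.0 = safe, 1.0 = risky)."""
    if not 0.0 <= risk_score <= 1.0:
        raise ValueError("risk_score must be between 0 and 1")
    if risk_score >= review_at:
        return Route.HUMAN_REVIEW
    if risk_score >= verify_at:
        return Route.VERIFY
    return Route.AUTO_APPROVE


for score in (0.05, 0.45, 0.90):
    print(score, "->", route_by_risk(score).value)
```

Where the thresholds sit is a design decision as much as a modeling one: moving `review_at` down trades speed for more human judgment.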

In safety-critical domains like healthcare, human oversight is non-negotiable. AI may flag anomalies or suggest a diagnosis, but the clinician retains final authority. Tools that explain the details help the practitioner understand why a recommendation was made, reinforcing confidence without removing accountability.

Designing for Human Judgment

From a UX perspective, HITL is about matching the interaction pattern to the level of risk. Simple accept/reject affordances work well for low-risk suggestions that improve speed without real consequences. As the stakes climb, impacting data, money, or people, preview and approval steps become essential. Explanations help users calibrate trust rather than blindly accept outputs.

What happens behind the scenes matters just as much. The system should capture user decisions with context, feed them into learning workflows, and log overrides for auditability. Over time, teams can track signals like override rate, confidence accuracy, time-to-approval, and perceived trust. A high override rate is not a user failure. It is a signal that the design or the model needs attention.
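As a rough illustration of tracking override rate, the sketch below logs each user decision with the model's confidence and computes how often suggestions were edited or rejected, overall and split by confidence. The `Decision` record, the action labels, and the 0.8 split are assumptions made for this example.

```python
from dataclasses import dataclass


@dataclass
class Decision:
    suggestion_id: str
    model_confidence: float
    action: str  # "accept", "edit", or "reject"


def override_rate(decisions) -> float:
    """Share of suggestions the user edited or rejected."""
    if not decisions:
        return 0.0
    overridden = sum(1 for d in decisions if d.action in {"edit", "reject"})
    return overridden / len(decisions)


def override_rate_by_confidence(decisions, threshold: float = 0.8) -> dict:
    """Split the override rate into high- and low-confidence suggestions."""
    high = [d for d in decisions if d.model_confidence >= threshold]
    low = [d for d in decisions if d.model_confidence < threshold]
    return {
        "high_confidence": override_rate(high),
        "low_confidence": override_rate(low),
    }


# Invented audit log.
decisions = [
    Decision("s1", 0.92, "accept"),
    Decision("s2", 0.88, "reject"),
    Decision("s3", 0.95, "edit"),
    Decision("s4", 0.55, "accept"),
    Decision("s5", 0.40, "reject"),
    Decision("s6", 0.90, "reject"),
]

print(round(override_rate(decisions), 2))
print(override_rate_by_confidence(decisions))
```

If users override high-confidence suggestions more often than low-confidence ones, as in this invented log, the confidence score itself deserves a closer look.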

The Risk of Getting It Wrong

Poorly implemented HITL systems can fail in subtle ways. Human review can devolve into a rubber stamp. Workflows can slow down so much that users route around the safeguards. Feedback can skew toward a narrow subset of users. These risks are real, but they are design problems, not reasons to remove HITL.

The goal is not to maximize human involvement. It is to focus it where uncertainty, impact, or ethics demand it. Keeping HITL is less about control and more about clarity: clarity about who decides, when uncertainty matters, and how responsibility is shared between people and machines.

Optimize for Resilience, Not Just Conversion

Good design adapts as the landscape shifts. Product design, especially in AI-powered systems, can no longer afford to optimize only for short-term conversion metrics. User intent is fluid, environments change rapidly, and probabilistic systems keep evolving. What works today can quietly break tomorrow. Designing for resilience means building products that stay reliable, trustworthy, and useful even as assumptions, data, and user behaviors change.

Resilient design shifts the question from:

How do we maximize this metric right now? → How does this system behave over time, under stress, and in uncertainty?

A resilient system is one that:

  • Adapts as new data and behaviors emerge.
  • Fails safely rather than catastrophically.
  • Remains transparent and explainable.
  • Avoids brittle, over-optimized interaction patterns.
  • Anticipates second-order and unintended effects.

Do not just look back at last quarter’s numbers. Look ahead to the coming quarters, identify where behavior is shifting, and adjust accordingly.

Build Systems That Adapt as Probabilities Change

Likelihoods shift constantly: AI models drift, contexts evolve, and user needs mature. Designing as if conditions are stable creates fragility in probabilistic environments. A resilient approach assumes volatility as the default.

Think about how recommendation systems tend to evolve. The early version of a content feed optimizes for engagement, and for a while, engagement goes up. Then users start to notice the feed feels narrow, repetitive, maybe even exhausting. Resilient systems rebalance, introducing novelty, diversifying signals, and pulling in long-term satisfaction measures alongside short-term clicks.

Designers should create interfaces that expect change. Dynamic re-ranking, contextual explanations, and escape hatches from stale personalization loops all help systems stay useful as probabilities shift.
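To illustrate what rebalancing a narrow feed could look like, here is a minimal sketch of a greedy re-ranker that discounts an item each time its category has already been shown. This is one simple diversification heuristic, not how any particular recommendation system works; the penalty value and the sample titles are made up.

```python
def rerank_with_diversity(items, repeat_penalty=0.15):
    """Greedy re-ranking that discounts categories already shown.

    items: list of (title, category, predicted_engagement) tuples.
    """
    remaining = list(items)
    shown_per_category = {}
    ranked = []
    while remaining:
        # Pick the item with the best score after the repetition discount.
        best = max(
            remaining,
            key=lambda item: item[2] - repeat_penalty * shown_per_category.get(item[1], 0),
        )
        ranked.append(best)
        remaining.remove(best)
        shown_per_category[best[1]] = shown_per_category.get(best[1], 0) + 1
    return ranked


# Hypothetical feed candidates with predicted engagement scores.
feed = [
    ("Sitcom A", "comedy", 0.90),
    ("Sitcom B", "comedy", 0.85),
    ("Sitcom C", "comedy", 0.80),
    ("Nature series", "documentary", 0.75),
    ("Period drama", "drama", 0.60),
]

for title, category, score in rerank_with_diversity(feed):
    print(title, category, score)
```

Ranking purely by predicted engagement would show three comedies first; with the discount, the documentary moves up to second place and the drama ahead of the third sitcom.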

Optimize for Long-term Outcomes, Not Just Short-term Wins

Short-term conversion gains often hide long-term costs. Speeding up onboarding can reduce comprehension. Maximizing notification CTR can erode trust. Optimizing engagement alone can produce unhealthy usage patterns. Fragile systems maximize numbers while ignoring second-order effects, the downstream consequences that show up weeks or months later.

Duolingo’s hearts system is a good example of designing against this. It introduces friction: if you make too many mistakes, you run out of hearts and have to wait or practice older material to earn more. On paper, that looks like a conversion killer: fewer lessons per session. In practice, the team has publicly discussed how it supports long-term motivation and retention, which is the metric that actually matters for a learning app. Short-term engagement dips, but long-term outcomes improve.

Meta has made a similar, if more reluctant, shift. The company publicly acknowledged that optimizing purely for “time spent” produced unintended emotional and societal effects, which led to a stated pivot toward “meaningful social interactions” as a guiding metric. Whether that shift fully landed is up for debate, but the acknowledgment itself is the point: optimizing for the wrong thing at scale has real downstream cost.

So, designers must routinely ask:

  • What behaviors are we unintentionally reinforcing?
  • Will this interaction still be healthy if repeated at scale?
  • Are we optimizing for the ecosystem’s wellbeing or just the next click?

Plan For Uncertainty the Way You Plan For Scale

Teams routinely plan for traffic spikes, but rarely for uncertainty spikes. Yet AI systems degrade, adversarial behaviors evolve, and external shocks can reshape user behavior overnight. Resilient design assumes variability and prepares for it.

This means designing for degrading confidence. What does your interface do when the AI isn’t sure? Does it quietly fail, or does it gracefully hand off? Does the experience still make sense if AI assistance goes away entirely? A good fallback strategy is as important as the happy path.

Some practical actions:

  • Design for degrading confidence.
    Show fallback states, allow manual overrides, and visualize uncertainty where it matters.
  • Measure long-term user health.
    Track satisfaction, retention quality, and unintended behavior, not just conversion.
  • Build adaptability in.
    Use adjustable ranking rules, dynamic states, and continual experimentation across segments.
  • Model second-order effects early.
    Every optimization casts a shadow; surface it before shipping.
  • Use a resilience checklist before launch.
    How does the system behave under low AI confidence? What’s the safe fallback? What drifts do we anticipate?
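The fallback behavior in that checklist can be prototyped before any interface work begins. The sketch below, under assumed names and an assumed 0.75 threshold, decides what the interface should present when the model is confident, unsure, or unavailable.

```python
from typing import Optional


def present_answer(
    answer: Optional[str],
    confidence: Optional[float],
    min_confidence: float = 0.75,
) -> dict:
    """Decide what the interface shows for an AI answer.

    answer and confidence are None when the model is unavailable.
    """
    if answer is None or confidence is None:
        # AI assistance is gone entirely: the experience must still work.
        return {
            "mode": "manual",
            "message": "Automated help is unavailable right now. You can continue on your own or contact support.",
            "show_human_option": True,
        }
    if confidence < min_confidence:
        # The model is unsure: hand off instead of guessing.
        return {
            "mode": "handoff",
            "message": "I'm not sure about this one. Would you like to talk to a person?",
            "show_human_option": True,
        }
    return {
        "mode": "ai_answer",
        "message": answer,
        "label": "AI-generated",
        "show_human_option": True,
    }


print(present_answer("Your order ships Friday.", 0.91)["mode"])  # ai_answer
print(present_answer("Refunds take 90 days.", 0.40)["mode"])     # handoff
print(present_answer(None, None)["mode"])                        # manual
```

Every branch keeps a path to a human visible, so the fallback is a designed state of the product and not an error case.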

Conclusion

If you take one thing from this article into your next design review, make it this:

Stop asking “Will this work?” and start asking “How likely is this to work, and what happens when it doesn’t?”

That single reframe changes how you write hypotheses, interpret AI output, scope experiments, and design for the moments when the system is wrong. Starting this week, name the assumption behind every AI recommendation you accept, find one place in your product where a probabilistic output is presented as a certainty, fix the framing, and design the fallback before the happy path.

The shift from deterministic to probabilistic design is less about new tools and more about a new posture. AI has not introduced uncertainty into our world. It has simply made the uncertainty that was always there impossible to ignore. AI can estimate, simulate, and recommend, but it cannot decide what matters, which users are being overlooked, or which unconventional idea is worth defending against a model trained on yesterday’s data. Those remain human responsibilities. Think in ranges, not points. Test assumptions, not features. Build for adaptation, not perfection. In a world where prediction is cheap and judgment is rare, the most valuable thing a designer can do is keep asking, “What else might be true?”

Smashing Editorial
(yk)




