Reality check · AI placed honestly

What AI can't do, even when it looks like it can.

AI sounds confident when it's guessing. It tells you what you want to hear. It gives two different answers to the same question. Six risks every business should know before putting it to work - and how we handle them.

What this is about

AI is useful when you know what it isn't. It's not an oracle, not a fact-checker and not a small person inside the computer. It's a statistical language model - very good at sounding plausible, not necessarily at being right.

This page isn't a warning against AI. We use it every day ourselves. But anyone who deploys it blindly risks the very substance of their business. Here are the six risks we see most often - and how we plan around them in our projects, instead of arguing them away.

First family

Content errors

Hallucinations, bias, unstable answers - what makes AI wrong on the substance.

Second family

Structural risks

Data protection, black box, dependency - what only shows up after longer use.

Consequence

With judgement

Use AI where it helps - and safeguard where it can mislead.

One

AI sounds certain, even when it's guessing.

When a language model doesn't know an answer, it doesn't stay silent. It invents one. And it invents it in the same confident tone as the right answers - with sources that don't exist, paragraphs never written, studies never published. This is called hallucination. It isn't a bug, but a property of the underlying maths.

The risk in business isn't that AI gets things wrong - that happens to humans too. The risk is that it gets things wrong with confidence. Get a wrong answer with hesitation, and you become suspicious. Get it with self-assurance, and you accept it. That's why AI must never be the sole source in our projects - it always comes with a mechanism that makes its claims checkable: source citations, original references, four-eyes review for anything consequential.

What the research says

The technical term is hallucination - systematically reviewed by Ji, Lee et al. in "Survey of Hallucination in Natural Language Generation" (ACM Computing Surveys, 2023). How often it happens depends heavily on model, task and domain: for current models, reported rates in uncontrolled applications range between roughly 3 and 27 percent, with peaks in fields like law and medicine where training data is thin.

The effect was spectacularly documented in Mata v. Avianca (2023): two New York attorneys used ChatGPT to find precedents for a brief in a personal-injury lawsuit. Six of the cited rulings didn't exist. The court imposed sanctions. The attorneys hadn't checked - because the answer sounded so credible. That's the hallucination problem in one sentence.

Two

It tells you what you want to hear.

Language models are trained to come across as helpful - and that means: agreeing with their conversation partner where possible. If you frame a question with an assumption, the AI will tend to confirm your assumption rather than contradict it. Whoever asks "Isn't it also true that...?" almost always gets a friendly yes - even when the claim is outlandish.

In business this gets dangerous where AI is used to prepare decisions. Build your preferred outcome into the question, and you get it confirmed. Ask for the risks of an idea, and you get the risks that match your mood. AI isn't a neutral adviser - it's a mirror that shows you in a flattering light. Anyone using it as a sparring partner has to ask the counter-question deliberately. So we always build an explicit counter-position into AI-supported decision processes: the AI may agree - but it must also contradict, in a second, clearly separated step.

What the research says

In an AI context, the effect is called sycophancy - described, among others, in Anthropic's research "Towards Understanding Sycophancy in Language Models" (2023). The paper shows experimentally that leading language models adjust their answers in over 60 percent of cases when the user hints at disagreement - even on objectively verifiable facts.

Closely related is the older confirmation bias from cognitive psychology (Peter Wason, 1960): people preferentially seek information that confirms their hypotheses. AI amplifies this effect because it's trained for helpfulness - it reliably delivers the confirming information. Whoever knows the bias can counteract it. Whoever doesn't uses AI to build an echo chamber at the speed of light.

Three

The same question. Two different answers.

Whoever puts the same question to a language model twice often gets two different answers. Sometimes only in wording, sometimes in content too. That's not a defect - it's how it works: language models pick from several plausible continuations, and which one they pick isn't fully fixed.

In business this has consequences. Have a quote calculated twice in a row, and you may get two different prices. Have a contract clause reviewed, and you may get "legally sound" one time and "questionable" the next. AI isn't deterministic: the same business case can come out two different ways. The consequence: wherever binding results matter, AI must not have the final word. It can prepare, suggest, check. But the decision is made by a human, following a fixed procedure and leaving a record.

What the research says

The technical term is stochastic generation: language models pick each next word from a probability distribution, controlled by a parameter called "temperature." The mathematical basis is laid out in Holtzman et al., "The Curious Case of Neural Text Degeneration" (ICLR 2020). The empirical picture: even at low temperature, answers to the same question can differ on central points.
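The mechanism can be sketched in a few lines. This is a toy illustration, not how any particular product is implemented: the three candidate words and their scores (LOGITS) are invented, and the function names are hypothetical. It shows how temperature reshapes the probabilities, and why two unseeded runs of the same "question" can pick different words.

```python
import math
import random

# Hypothetical scores a model might assign to the next word after
# "This contract clause is ...". Real models score tens of thousands of tokens.
LOGITS = {"sound": 2.0, "questionable": 1.6, "void": 0.3}


def softmax_with_temperature(logits: dict[str, float], temperature: float) -> dict[str, float]:
    """Turn raw scores into probabilities; lower temperature sharpens the distribution."""
    if temperature <= 0:
        raise ValueError("temperature must be positive")
    scaled = {word: score / temperature for word, score in logits.items()}
    top = max(scaled.values())  # subtract the maximum for numerical stability
    weights = {word: math.exp(s - top) for word, s in scaled.items()}
    total = sum(weights.values())
    return {word: w / total for word, w in weights.items()}


def sample_next_word(logits: dict[str, float], temperature: float, rng: random.Random) -> str:
    """Draw one word at random, weighted by its temperature-adjusted probability."""
    probs = softmax_with_temperature(logits, temperature)
    words = list(probs)
    return rng.choices(words, weights=[probs[w] for w in words], k=1)[0]


if __name__ == "__main__":
    for t in (1.0, 0.2):
        probs = softmax_with_temperature(LOGITS, t)
        print("temperature", t, {w: round(p, 2) for w, p in probs.items()})

    # Unseeded generators: repeated runs of the same "question" can differ.
    for run in range(2):
        rng = random.Random()
        print("run", run, [sample_next_word(LOGITS, 0.2, rng) for _ in range(5)])

    # A fixed seed makes this toy sampler repeatable.
    rng_a, rng_b = random.Random(7), random.Random(7)
    same_a = [sample_next_word(LOGITS, 1.0, rng_a) for _ in range(5)]
    same_b = [sample_next_word(LOGITS, 1.0, rng_b) for _ in range(5)]
    print("seeded runs identical:", same_a == same_b)
```

With these made-up scores, lowering the temperature from 1.0 to 0.2 raises the top word's share from about 54 to about 88 percent, yet "questionable" still comes up roughly one time in eight. Real systems add further steps such as top-p (nucleus) sampling, the technique Holtzman et al. propose; whether a hosted model offers anything like a fixed seed depends on the provider and is not assumed here.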

A much-noted Stanford study from 2023 compared the March and June 2023 versions of GPT-3.5 and GPT-4 - and found that identical tasks, sent to the same named model a few months apart, produced markedly different results, because the provider had changed the model behind the name. For reproducible business processes that's a hard problem. The practical consequence: AI for drafting, yes - for legally sound, checkable, repeatable statements, no, or only within a chain that makes the result reproducible again at the end.

Four

You see what happens - but not why.

When an employee makes a decision, you can ask them why. You get a reason you can weigh. With a language model, you also get a reason - but it's invented afterwards, not the actual one. The actual one lies in billions of model weights and isn't legible to humans.

That's the black-box question, and it matters more in SMEs than the hype suggests. Whoever uses AI in customer contact has to be able to explain why it did what it did - for complaints, for damages claims, for legal enquiries. "The AI decided it that way" isn't an answer a customer, a business partner or a judge accepts. So we build AI in where the human makes the decision and AI simplifies the work - not the other way round.

What the research says

The field is called Explainable AI, or XAI for short. It is examined, for instance, in Doshi-Velez and Kim, "Towards A Rigorous Science of Interpretable Machine Learning" (2017). The finding: with classical statistical methods, the decision can be followed mathematically. With neural networks, and language models in particular, that is practically impossible at the current state of research.

Politically, the EU has responded. In the EU AI Act (2024), transparency is a core obligation for many AI applications - with requirements that vary by risk class. For businesses that means: whoever uses AI in HR, lending or safety-critical areas is legally required to make its decisions traceable. "The AI suggested it" is no longer enough - neither technically nor legally.

Five

What you type in doesn't always stay with you.

Type a customer name, a draft contract or a calculation into a public AI tool, and you're sending that data to a server that isn't yours. What the provider does with it is set out in their terms - sometimes the data is used to train further models, sometimes not, and sometimes actual practice differs from what the terms say.

For a business this means: personal data, trade secrets, pricing - anything that shouldn't get out doesn't belong in an uncontrolled AI. The GDPR is not lenient, and the damage is real: in 2023, Samsung engineers pasted internal source code into ChatGPT, where, under the provider's terms at the time, it could be used for training. So we put AI to work in two clearly separated modes: for uncritical content, in the leading tools - where they're best; for sensitive content, in dedicated, contractually secured environments hosted in the EU that pass on no data and allow no training.
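The two-mode setup can be pictured as a routing step in front of every request. The sketch below is a deliberately naive, hypothetical illustration: the target labels public_tool and eu_private, the function route and the marker list are invented for this example, and real detection of personal data or trade secrets needs far more than a few patterns. The one design choice worth keeping is that it fails closed: any hit sends the text to the secured environment.

```python
import re
from dataclasses import dataclass, field

# Hypothetical labels for the two modes; not real services or endpoints.
PUBLIC_TOOL = "public_tool"  # leading public tools, for uncritical content
EU_PRIVATE = "eu_private"    # contractually secured EU environment, no training

# Naive illustrative detectors; real personal-data detection needs much more.
EMAIL = re.compile(r"[\w.+-]+@[\w-]+\.[\w.-]+")
IBAN = re.compile(r"\b[A-Z]{2}\d{2}(?: ?[A-Z0-9]{4}){3,7}(?: ?[A-Z0-9]{1,4})?\b")
SENSITIVE_MARKERS = ("confidential", "salary", "price list")


@dataclass
class RoutingDecision:
    target: str
    reasons: list[str] = field(default_factory=list)


def route(text: str) -> RoutingDecision:
    """Send text to the secured environment if any sensitive signal is found."""
    reasons: list[str] = []
    if EMAIL.search(text):
        reasons.append("email address")
    if IBAN.search(text):
        reasons.append("IBAN-like number")
    lowered = text.lower()
    reasons += [f"marker '{m}'" for m in SENSITIVE_MARKERS if m in lowered]
    # Fail closed: a single hit is enough to keep the text out of public tools.
    return RoutingDecision(EU_PRIVATE if reasons else PUBLIC_TOOL, reasons)


if __name__ == "__main__":
    for prompt in (
        "Summarise the benefits of e-invoicing for small firms.",
        "Draft a payment reminder to anna.meier@example.com",
        "Check this CONFIDENTIAL price list for typos.",
    ):
        decision = route(prompt)
        print(decision.target, decision.reasons)
```

Pattern checks like these miss names, free-text trade secrets and much else, so in practice they could only ever be one safeguard among several, alongside the contractual and hosting measures.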

What the research says

The legal basis in Europe is the GDPR (applicable since 2018), complemented since 2024 by the EU AI Act. Both make clear: personal data may only be given to third-party processors under strict conditions - and a US-based AI provider is, in data protection terms, a third-party processor. The Datenschutzkonferenz (the joint body of Germany's federal and state data protection authorities) issued a 2024 guidance paper explicitly warning that naive use of public AI tools with personal data is, as a rule, unlawful.

In practice the risk has surfaced repeatedly. The Samsung affair, 2023: engineers typed internal code into ChatGPT, where it could end up in the provider's training data. Samsung subsequently banned internal use of such tools. A 2023 Cyberhaven study found that 11 percent of the data employees paste into ChatGPT is classified internally as confidential. Few businesses have a protection scheme for that.

Six

Whoever stops thinking eventually stops thinking at all.

The quietest risk of AI isn't that it gives wrong answers, but that at some point you stop checking. When a tool is right 99 percent of the time, people stop checking for the last one percent - until an error slips through and does visible damage. With AI this threshold is especially low, because the answers come so fluently and so confidently.

For a business that means: employees who use AI have to learn not to switch to autopilot. That's a training question, not a software question. And it's a design principle: we build AI-supported tools so that the human stays engaged - with visible source citations, with confirmation steps at critical points, with occasional "check this consciously" moments. AI should help you work more cleverly, not wean you off thinking.

What the research says

The effect is called automation bias - described, among others, by Linda Skitka et al., "Does Automation Bias Decision-Making?" (International Journal of Human-Computer Studies, 1999). Studies in aviation and medicine show: as soon as a system is regarded as an "intelligent assistant," people adopt its recommendations even when other information clearly contradicts them. The machine's errors are no longer caught by the human - they pass straight through.

Related is research on skill atrophy: when a skill goes unused for long enough, it withers. A widely cited MIT study from 2025, released as a preprint, looked at how intensive use of language models affects people's own writing and argumentation skills - and found measurable declines after just a few weeks. The consequence is not to avoid AI. It's to build it in so that human competence stays awake - and isn't quietly rationalised away.

What these six risks share

AI is powerful enough that honest handling pays off. And fragile enough that naive handling does damage.

We don't sell AI for AI's sake. We put it to work where it really saves time, and we build the mechanisms that keep its risks in check: source citations against hallucinations, counter-questions against sycophancy, reproducible processing chains against stochasticity, humans in the loop against the black box, EU hosting against data leakage, deliberate checkpoints against automation bias. These mechanisms aren't what the AI demo shows. But they make the difference between "we use AI" and "we use it responsibly."

If you want to put AI to work seriously in your business

We build it so that you're cleverer for it afterwards - not more dependent. And we say honestly where we wouldn't use it today.

Why we only ever build AI once the digital foundation is in place: Tools first, AI second. How we clarify the actual problem before a project: Before we build, we figure out what's broken. What the biggest hurdles in a rollout are: The biggest hurdles rarely lie in the technology.