How does an AI work?

What this is about

An AI doesn't think. It calculates. It once read a great deal, learned patterns from it - and now, for every question, it suggests word by word what would plausibly come next. That's the whole mechanism. The rest is scale.

This page walks through the steps by which a modern AI works. You won't need formulas or code to follow it - the images carry the explanation and are meant to stay with you long after you've closed the page. Whoever nods six times by the end has understood the mechanism.

Step one

Reading

During training the model reads billions of texts and remembers which words typically go together.


Step two

Meaning as a map

From what it has read, an inner map forms in which similar meanings lie close together.


Step three

Answering

When you ask, the model wanders through this map and writes the most likely continuation, word by word.

One

An AI learns by solving fill-in-the-blank texts.

Imagine someone hands you a novel, covers every tenth word and asks you to guess it. At the start you'll often be wrong. After the hundredth book you'll get better. After the millionth you'll sense words without really knowing the sentence - simply because certain words almost always turn up in certain contexts. That's exactly how an AI is trained.

During training the model sees billions of sentences, always with a missing piece it's meant to predict. Each time it compares its prediction with the real word. If it was off, it adjusts itself inwardly by a tiny amount. This happens trillions of times. In the end the model is extremely good at guessing the next word - and that very ability is all it can do.

What's behind it

Technically the procedure is called next-token prediction. During training the texts pass through a huge neural network - a mathematical construct with billions of tiny adjustable dials (called parameters or weights). Every wrong prediction nudges these dials, via a procedure called backpropagation, a little in the right direction.

The finished model is nothing more than the set pattern of these dials - frozen, copyable, callable. When you talk to ChatGPT, Claude or Gemini today, you're talking to a file holding these billions of numbers. The model learns nothing new while talking with you - the training happened beforehand and stays separate from use.
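
An optional aside for readers who like to see the idea in code: the sketch below trains a deliberately tiny next-word predictor. It is an illustration under strong assumptions, not how any production model is built. The training text, the `weights` table and the `learning_rate` value are made up for this example, and the model only looks at one previous word, whereas a real network looks at the whole preceding text.

```python
import math

# Toy training text; a real model sees vastly more.
text = "the dog barks . the cat meows . the dog barks .".split()
vocab = sorted(set(text))
index = {word: i for i, word in enumerate(vocab)}

# The "dials": one adjustable score per (previous word, next word) pair.
weights = [[0.0] * len(vocab) for _ in vocab]

def predict(prev_word):
    """Turn the scores for prev_word into probabilities (softmax)."""
    scores = weights[index[prev_word]]
    exps = [math.exp(s) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]

def train(epochs=200, learning_rate=0.5):
    """Compare each prediction with the real next word and nudge the dials."""
    for _ in range(epochs):
        for prev_word, next_word in zip(text, text[1:]):
            probs = predict(prev_word)
            row = weights[index[prev_word]]
            for j in range(len(vocab)):
                target = 1.0 if j == index[next_word] else 0.0
                # Gradient of the cross-entropy loss: prediction minus truth.
                row[j] -= learning_rate * (probs[j] - target)

train()
probs = predict("dog")
best = vocab[probs.index(max(probs))]
print(best)  # barks
```

After training, the `weights` table is the whole "model": it can be saved, copied and queried without changing any further.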

Two

Texts are broken into small building blocks.

Before the model can work with a text at all, the text has to be cut into manageable pieces. These aren't always whole words - often they're syllables, word fragments, sometimes just a single letter. The German word "Versicherungsschein", for instance, is split into several pieces, while the English "the" stays as one. These pieces are called tokens, and to the AI a question is nothing but a string of tokens.

This fragmentation has real consequences. The longer a text, the more tokens - and the more the model has to compute. The costs an AI provider charges are almost always tied to this token count. The famous length limits too ("this model can process 200,000 tokens at once") are measured in this unit. Anyone who builds with AI quickly learns: tokens are the currency in which AI is paid for and thought about.

What's behind it

The splitting procedure is called tokenisation. It isn't the same in every model - English is usually tokenised more efficiently than German, because the training material was historically English-heavy. A rough rule of thumb: 1,000 tokens correspond to roughly 700 to 800 English words.
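
For those who want to put numbers on the rule of thumb, here is a small arithmetic sketch. It assumes 750 words per 1,000 tokens (the middle of the range above), and the price of 2.00 per million tokens is a purely hypothetical figure, not any provider's actual rate.

```python
def estimate_tokens(word_count, words_per_1000_tokens=750):
    """Estimate the token count of an English text from its word count."""
    return round(word_count * 1000 / words_per_1000_tokens)

def estimate_cost(token_count, price_per_million_tokens):
    """Cost of processing token_count tokens at an assumed price."""
    return token_count * price_per_million_tokens / 1_000_000

tokens = estimate_tokens(3000)  # a text of about 3,000 English words
print(tokens)  # 4000
print(estimate_cost(tokens, price_per_million_tokens=2.0))  # 0.008
```

German text would typically come out somewhat higher for the same word count, for the reason given above.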

Tokens are also the reason an AI sometimes fails at seemingly trivial tasks - such as when you ask how many "r"s there are in "strawberry". For the model the word isn't a chain of letters but a clump of tokens. Counting letters is therefore unnaturally hard for an AI - a weakness that follows directly from this mechanism.
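
A minimal sketch of why letters get lost, for readers who want to try it out. The vocabulary `VOCAB` is invented for this example, and the greedy longest-match rule is a simplification; real tokenisers learn their vocabulary from data and use more refined procedures.

```python
# Hypothetical toy vocabulary; real ones hold tens of thousands of pieces.
VOCAB = {"the", "straw", "berry", "Versicherung", "s", "schein"}
TOKEN_IDS = {piece: i for i, piece in enumerate(sorted(VOCAB))}

def tokenize(word):
    """Split a word greedily into the longest known pieces."""
    tokens = []
    position = 0
    while position < len(word):
        # Try the longest remaining piece first, then shorter ones.
        for end in range(len(word), position, -1):
            piece = word[position:end]
            if piece in VOCAB:
                break
        else:
            piece = word[position]  # unknown: fall back to one character
        tokens.append(piece)
        position += len(piece)
    return tokens

print(tokenize("the"))                  # ['the']
print(tokenize("Versicherungsschein"))  # ['Versicherung', 's', 'schein']
pieces = tokenize("strawberry")
print(pieces)                           # ['straw', 'berry']
print([TOKEN_IDS[p] for p in pieces])   # [4, 1]
```

The last line is what the model actually receives: two numbers. Nothing in `[4, 1]` shows how many "r"s the word contains.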

Three

Meaning becomes a place in space.

Here's the image that sticks best. Every token gets a place on a vast inner map within the model. "Dog" and "cat" lie close together, because both are pets. "Dog" and "barking" lie close, because they often appear in the same sentence. "Dog" and "square root" lie far apart. This map doesn't have two dimensions like a city map - it has thousands. But the principle is the same: meaning becomes distance.

From this follows something remarkable: the model doesn't need to know a question word for word to answer it. If you ask about "a loyal four-legged companion", the AI lands in almost the same place on its map as for "dog". That's exactly what makes AI so different from a classic search engine. It doesn't search for words, it searches for meanings.

What's behind it

These places on the inner map are called embeddings. Technically they're long lists of numbers - often between several hundred and several thousand per token, depending on the model. Each number describes one aspect of the meaning. What aspect that is in each case nobody can clearly put into words - the model worked out these dimensions for itself during training.
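
"Meaning becomes distance" can be made concrete with a few lines, offered here as an optional sketch. The three-number vectors in `EMBEDDINGS` are hand-made for illustration; real embeddings are far longer and are learned, not written by hand. Cosine similarity is one common way to measure closeness, assumed here for simplicity.

```python
import math

# Hand-made 3-number embeddings, purely illustrative.
EMBEDDINGS = {
    "dog": [0.9, 0.8, 0.1],
    "cat": [0.8, 0.9, 0.1],
    "square root": [0.1, 0.0, 0.9],
}

def cosine_similarity(a, b):
    """Close to 1.0: same direction (similar). Close to 0.0: unrelated."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b)

dog_cat = cosine_similarity(EMBEDDINGS["dog"], EMBEDDINGS["cat"])
dog_root = cosine_similarity(EMBEDDINGS["dog"], EMBEDDINGS["square root"])
print(round(dog_cat, 2))   # 0.99
print(round(dog_root, 2))  # 0.16
```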

Embeddings are also the basis for the procedure called RAG (retrieval-augmented generation). Here each of your documents gets such a "place", and when you ask something, the system pulls from your document store the pieces that lie closest to the place of your question. That's how an AI gains access to your internal knowledge without it ever having gone into training.
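
The retrieval step of RAG can be sketched in a few lines, again as an optional aside. Everything named here is hypothetical: the `DOCUMENTS` store, the hand-written vectors (a real system would get them from an embedding model) and the wording of the prompt.

```python
import math

# Hypothetical document store: each text with a pre-computed embedding.
DOCUMENTS = [
    ("Vacation requests go through the HR portal.", [0.9, 0.1, 0.0]),
    ("Invoices are paid within 30 days.", [0.1, 0.9, 0.1]),
    ("The server room is on the second floor.", [0.0, 0.2, 0.9]),
]

def cosine_similarity(a, b):
    dot = sum(x * y for x, y in zip(a, b))
    norms = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norms

def retrieve(question_embedding, top_k=1):
    """Return the top_k document texts closest to the question."""
    ranked = sorted(
        DOCUMENTS,
        key=lambda doc: cosine_similarity(question_embedding, doc[1]),
        reverse=True,
    )
    return [text for text, _ in ranked[:top_k]]

def build_prompt(question, question_embedding):
    """Put the retrieved pieces in front of the question."""
    context = "\n".join(retrieve(question_embedding))
    return f"Answer using only this context:\n{context}\n\nQuestion: {question}"

# Assumed embedding for the question below.
question_embedding = [0.8, 0.2, 0.1]
print(build_prompt("How do I apply for leave?", question_embedding))
```

Note that the question never mentions "vacation". The match comes from the closeness of the vectors, not from shared words.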

Four

Attention decides what counts.

An AI doesn't read your question from left to right like a human. It has all the words in front of it at once and decides, for each one, which of the others matter. In the sentence "Anna gave the dog the sausage, because he was hungry" the AI has to grasp that "he" refers to the dog, not to Anna. To do this it actively looks "back" at all the previous words and weights them - some strongly, others barely.

This deliberate looking back and forth is the breakthrough that sets modern AI apart from its predecessors. It's called attention - and it's the reason an AI today can summarise long texts, hold references across several paragraphs and pick up on allusions. What it doesn't have is attention span in the human sense: it distributes attention, it doesn't feel it.

What's behind it

The procedure is called self-attention and is the heart of the transformer architecture, which Google researchers introduced in 2017 in a paper titled "Attention Is All You Need". Everything running today under the names GPT, Claude, Gemini, LLaMA, Mistral or DeepSeek is at its core a variant of this one blueprint.
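
For the curious, here is a heavily simplified sketch of how attention weights come about, using the sentence about Anna and the dog. The two-number vectors in `keys` and `query_for_he` are invented so that the example works out; in a real transformer they are computed by learned projections, and there are many attention heads and layers working in parallel.

```python
import math

def softmax(scores):
    """Turn raw scores into weights that sum to 1."""
    peak = max(scores)
    exps = [math.exp(s - peak) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]

def attention_weights(query, keys):
    """Score each earlier word against the query, then normalise."""
    scale = math.sqrt(len(query))
    scores = [sum(q * k for q, k in zip(query, key)) / scale for key in keys]
    return softmax(scores)

words = ["Anna", "gave", "the", "dog", "the", "sausage"]
# Invented key vectors, one per word.
keys = [[1.0, 0.0], [0.0, 0.2], [0.1, 0.1], [0.2, 3.0], [0.1, 0.1], [0.0, 1.0]]
# Invented query vector for the word "he".
query_for_he = [0.5, 2.0]

weights = attention_weights(query_for_he, keys)
strongest = words[weights.index(max(weights))]
print(strongest)  # dog
```

The weights say how strongly each earlier word feeds into the model's representation of "he": some strongly, others barely.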

In practice, self-attention also limits what an AI can look at in one go, because every token is weighed against every other and the computation grows quickly with length. This limit is called the context window. Current models have windows of 100,000 to several million tokens - enough for hundreds of pages of text. Whatever lies outside the window, the model doesn't know in that particular conversation. That's why an AI by default remembers nothing from one conversation to the next - unless the application actively gives the memory back to it.
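
What "outside the window" means in an application can be sketched as follows. This is one simple strategy among several (keep the most recent messages that fit); the function names and the tiny window of 8 tokens are assumptions for illustration, and counting words stands in for a real tokeniser.

```python
def count_tokens(text):
    # Crude stand-in: one token per whitespace-separated word.
    return len(text.split())

def fit_to_window(messages, max_tokens):
    """Keep the most recent messages that still fit into the window."""
    kept = []
    used = 0
    for message in reversed(messages):
        cost = count_tokens(message)
        if used + cost > max_tokens:
            break  # everything older than this is dropped
        kept.append(message)
        used += cost
    return list(reversed(kept))

history = [
    "My name is Anna.",
    "I work in accounting.",
    "What is my name?",
]
print(fit_to_window(history, max_tokens=8))
# ['I work in accounting.', 'What is my name?']
```

With this tiny window the message containing the name no longer reaches the model, so it cannot answer the question.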

Five

The answer takes shape word by word.

When the AI answers you, it doesn't have a finished text in its head that it merely outputs. It proceeds word by word. For each step it works out which token would most likely come next - then the next, then the next. This happens many times per second. What you see appearing on the screen word by word isn't animation - it's the order in which the AI actually produces the text.

And it doesn't always take the most likely word. If it did, it would sound sterile and predictable. Instead it rolls the dice - steered by a value called temperature. Low temperature: tame, conservative answers. Higher temperature: more creative, more surprising wording, but also more nonsense. That's why the same question rarely gets exactly the same answer twice - and it's one reason an AI sometimes invents, with full conviction, things that aren't true.
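
How temperature loads the dice can be shown with a short optional sketch. The candidate words and their raw `scores` are invented; dividing the scores by the temperature before normalising is the usual way this setting is applied, though real systems often add further sampling rules on top.

```python
import math
import random

def apply_temperature(scores, temperature):
    """Divide raw scores by the temperature, then normalise (softmax)."""
    scaled = [s / temperature for s in scores]
    peak = max(scaled)
    exps = [math.exp(s - peak) for s in scaled]
    total = sum(exps)
    return [e / total for e in exps]

candidates = ["barks", "sleeps", "sings"]
scores = [2.0, 1.0, 0.0]  # invented raw scores for the next word

cold = apply_temperature(scores, 0.2)
hot = apply_temperature(scores, 2.0)
print([round(p, 2) for p in cold])  # [0.99, 0.01, 0.0]
print([round(p, 2) for p in hot])   # [0.51, 0.31, 0.19]

# Rolling the dice with the "hot" probabilities.
rng = random.Random(0)
print(rng.choices(candidates, weights=hot, k=1)[0])
```

At low temperature "barks" is chosen almost every time. At high temperature "sings" gets a real chance.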

What's behind it

The step-by-step generation is called autoregressive generation. Each newly generated token is simply appended to the text so far (your question plus everything generated up to that point), then the model runs again and produces the next. That's exactly why AI answers appear so fluidly - each token is shown as soon as it has been computed.
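
The loop itself fits in a few lines, shown here as an optional sketch. The lookup table `NEXT_TOKEN` is a stand-in for the model and is entirely made up; a real model computes its probabilities from the whole text so far, not just from the last token, and usually samples instead of always taking the top candidate.

```python
# Stand-in "model": invented next-token probabilities per last token.
NEXT_TOKEN = {
    "<start>": {"the": 1.0},
    "the": {"dog": 0.7, "cat": 0.3},
    "dog": {"barks": 0.9, "sleeps": 0.1},
    "cat": {"sleeps": 1.0},
    "barks": {"<end>": 1.0},
    "sleeps": {"<end>": 1.0},
}

def generate(prompt_tokens, max_new_tokens=10):
    """Append one token at a time until the model signals the end."""
    tokens = list(prompt_tokens)
    for _ in range(max_new_tokens):
        probabilities = NEXT_TOKEN[tokens[-1]]
        # Greedy choice: always take the most likely candidate.
        next_token = max(probabilities, key=probabilities.get)
        if next_token == "<end>":
            break
        tokens.append(next_token)  # the output becomes part of the input
    return tokens

print(generate(["<start>"]))  # ['<start>', 'the', 'dog', 'barks']
```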

The inventions are called hallucinations. They're not a malfunction but a consequence of the design: the model is trained always to deliver a plausible continuation, even when it doesn't know the answer. Countermeasures include attaching sources (RAG), asking the model for evidence, and checking answers against a second system. More on this is on the page "What AI can't do, even when it looks like it can".

Six

Training and answering are two different things.

The actual learning of an AI happens once, at enormous effort: thousands of high-performance graphics cards work for weeks to months, with electricity costs in the millions and power consumption on the scale of a whole power-plant unit. What comes out of it is the finished file with the billions of set dials. This phase is called training.

When you talk to the model afterwards, it learns nothing new. It reads your question, computes the answer, forgets everything again. This phase is called inference - the ongoing use. The model you talk to today is exactly the same model someone else talked to yesterday - and that someone else will talk to again tomorrow. That's also why you can't simply teach an AI your company values in conversation: once the window closes, everything is gone. Anyone who wants to give an AI their own knowledge permanently has to build it differently - with databases, RAG, or special retraining.

What's behind it

There are three ways to bring an AI your own knowledge, and they differ considerably in effort and effect. The simplest is prompting - you simply include the relevant information in every request. The second is RAG - before answering, the system itself fetches the matching pieces from your knowledge base. The third is fine-tuning - you retrain the model with your data, which permanently changes its dials.

For the vast majority of mid-sized business projects, RAG is the right answer. Fine-tuning only pays off once a clear, recurring use case justifies the high costs. Which path is the right one in your specific project is one of the questions we clarify in the initial conversation - long before anything is built.

What these six points have in common

An AI is a very well-built prediction machine for the next word. Reading, breaking down, placing, attention, predicting, the separation of learning and answering - that's all it takes to understand the mechanism in broad strokes.

Whoever has this image in mind no longer falls for every advertising claim. They see why AI is astonishingly good at some tasks and astonishingly wide of the mark at others. And they understand why a serious answer to "Can we use AI in our business?" always begins with a counter-question: For what exactly? With which data? Who checks the result? These questions are the real work - and the reason good AI projects rarely fail because of the technology.

If you want to go deeper

The mechanism is one side. The other knowledge pages cover what follows from it - for your data, your possibilities and your limits.

What AI really is today and where the terms come from is covered on "What actually is AI?". Which risks come with this mechanism is covered on "What AI can't do, even when it looks like it can". And the other AI-related terms you'll come across are explained in the AI glossary.
