Glossary

AI language. Briefly explained.

Over a hundred terms that come up in AI conversations — without jargon. For leaders who want to keep up without becoming developers.

Accuracy: Share of correct answers on a test set. A quick yardstick — but blind to nuance, context, and rare failures.
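This minimal sketch uses made-up data to show how accuracy can hide rare failures. The dataset contains 2 fraud cases in 100. A "model" that never flags fraud still scores 98% accuracy. Recall is added here only for contrast and is not part of the glossary. It measures the share of real positives the model actually caught.

```python
# Hypothetical toy data: 1 = "fraud", 0 = "no fraud"; 2 fraud cases out of 100
labels = [1, 1] + [0] * 98

# A useless "model" that always predicts the majority class
predictions = [0] * len(labels)

def accuracy(y_true, y_pred):
    """Share of predictions that match the true label."""
    correct = sum(1 for t, p in zip(y_true, y_pred) if t == p)
    return correct / len(y_true)

def recall(y_true, y_pred, positive=1):
    """Share of actual positive cases the model caught."""
    hits = sum(1 for t, p in zip(y_true, y_pred) if t == positive and p == positive)
    total = sum(1 for t in y_true if t == positive)
    return hits / total

print(accuracy(labels, predictions))  # looks excellent
print(recall(labels, predictions))    # but not a single fraud case was caught
```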
Adversarial Attack: Deliberately manipulated input that misleads a model — for example a prompt that bypasses safety rules or an image that fools a classifier.
Agent: An AI system that plans multiple steps, uses tools and judges intermediate results — rather than just answering once.
AGI: Artificial General Intelligence. Hypothetical AI matching human cognitive ability across any task. Does not exist today: a subject of debate, not a product.
Algorithm: A sequence of steps a computer follows to turn an input into a result; the term goes back to the 9th-century scholar al-Khwarizmi. In classical software, the steps are prescribed by humans and traceable; in AI models, they are learned from data and therefore often hard to see through.
Alignment: The effort of training a model to follow human intent and values — not just the literal prompt.
API: Application Programming Interface. The endpoint through which software calls an AI model. Usually priced per token.
Attention: Mechanism that makes models focus on relevant parts of the input while generating. Core of every transformer architecture.
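This simplified sketch computes attention for one query. It assumes a single head and no learned projection matrices. Real transformers learn separate query, key and value projections and run many heads in parallel. Each input position first gets a relevance score: its dot product with the query, scaled by the square root of the vector size. Softmax turns these scores into weights. The output is the weighted average of the value vectors. All vectors are made up.

```python
import math

def softmax(scores):
    """Turn raw scores into weights that are positive and sum to 1."""
    m = max(scores)  # subtract the max for numerical stability
    exps = [math.exp(s - m) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]

def attention(query, keys, values):
    """Scaled dot-product attention for a single query vector."""
    d = len(query)
    # Relevance of each input position: dot product with the query, scaled by sqrt(d)
    scores = [sum(q * k for q, k in zip(query, key)) / math.sqrt(d) for key in keys]
    weights = softmax(scores)
    # Output: weighted average of the value vectors
    out = [sum(w * v[i] for w, v in zip(weights, values)) for i in range(len(values[0]))]
    return weights, out

# Toy 2-dimensional vectors for three input tokens (made up for illustration)
keys = [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]]
values = [[10.0, 0.0], [0.0, 10.0], [5.0, 5.0]]
query = [1.0, 0.0]

weights, output = attention(query, keys, values)
print([round(w, 3) for w in weights])  # positions 1 and 3 match the query best
print([round(x, 2) for x in output])
```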
AutoML: Automated training and tuning of models. Lowers the entry barrier — but does not replace domain understanding.
Backpropagation: The standard way neural networks learn: the error at the output is propagated backward through the network to work out how much each weight contributed to it, and the weights are then adjusted accordingly.
Batch: A group of training examples passed through the model together. Batch size affects learning quality and memory needs.
Benchmark: Standardized test used to compare models, such as MMLU, HELM or HumanEval. A useful yardstick, but not a reliable proxy for real-world value.
Bias: Systematic distortion in a model’s output, often produced by imbalanced or unrepresentative training data. Not a bug, but a mirror of the world the data comes from.
Black Box: Models whose internal decision logic is not directly inspectable. The core challenge behind explainability and governance.
Chain-of-Thought: A technique where the model "thinks aloud" in intermediate steps before answering. Often more accurate — but slower and more expensive.
ChatGPT: OpenAI’s chat assistant that triggered the public AI breakthrough in 2022. Today a synonym for conversational AI, though only one of many products.
Claude: Anthropic’s model family. Known for long context windows and a rigorous safety-training approach (Constitutional AI).
Constitutional AI: Anthropic’s training approach: the model is given a "constitution" of principles and learns to critique itself against them.
Context Engineering: The deliberate shaping of context — which role, rules, and data the model receives. Often matters more than the model itself.
Context Window: The maximum amount of text a model can hold "in mind" at once. Anything beyond it must be cut or summarized; in long chats, the oldest content typically drops out.
Copilot: An AI assistant that supports professionals in the background — code, text, email. Augments rather than replaces.
Copyright: Unresolved dispute: who owns rights to AI-generated output? And: may a model train on copyrighted data? Legal terrain still shifting.
Corpus: The body of text a model was trained on. Corpus quality and diversity shape what the model can do — and where its bias comes from.
Data Governance: How a company ensures data is handled correctly, safely and in line with rules: roles, processes and accountabilities, plus compliance with regulations such as GDPR or Sarbanes-Oxley. In AI projects, often the thing that should have been settled before the project started.
Data Labeling: Manually annotating training data — labeling images, categorizing texts, marking errors. Time- and cost-intensive, but decisive for quality.
Dataset: A structured data collection used to train, evaluate or refine a model. Quality and representativeness are prerequisites for success.
Dataset Shift: Models are trained on specific data, but reality in production changes over time: customer behaviour moves with the seasons, new terms appear, markets turn. Predictions then quietly degrade, often without anyone noticing. Models age too.
Deep Learning: Subfield of machine learning using deep neural networks with many layers and millions to billions of parameters. Basis of most modern AI breakthroughs.
Diffusion Model: Model type that generates images by starting from random noise and removing it step by step. The technique behind Midjourney, Stable Diffusion and DALL-E.
Distillation: A small model learns from a larger one (teacher-student principle). Result: often comparable quality on the target task at much lower compute.
Embedding: A numeric representation of text that encodes meaning. The basis for search, RAG, and similarity matching.
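Encoder: The input side of a model, which translates text into vectors. Counterpart: the decoder, which generates output from them. Some models use both (classic translation models, for example); most of today’s LLMs are decoder-only.
Epoch: One complete pass of the training dataset through the model. Usually several epochs are needed; too many can lead to overfitting.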
EU AI Act: EU regulation that classifies AI systems by risk tier — bans, obligations, transparency duties. Phased rollout since 2024.
Evaluation: Systematic testing of whether a model does what it should, via benchmarks, human judgement, or real-world A/B tests.
Explainability: A model’s property of making its decisions traceable. Critical in regulated domains (medicine, finance, law).
Feature Engineering: Preparing raw data so that a model actually sees the signals relevant to the task. In classical machine learning, a time-consuming process driven by domain knowledge; with deep learning and LLMs, largely automatic, though careful data preparation remains essential.
Federated Learning: Instead of pooling all data centrally, multiple local nodes — smartphones, hospital servers, bank machines — train the model on their own data. Only the learned model parameters travel to a central place; the data itself stays where it is. Important wherever data cannot leave its local context for privacy, trust or compliance reasons.
Few-Shot Prompting: Giving the model two to five examples before it solves the task. Often more accurate than zero-shot, but costlier in tokens.
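This minimal sketch assembles a few-shot prompt as plain text. The task, the examples and the "Input:/Output:" layout are hypothetical. Any consistent format works. The final string would go to whatever model API is in use.

```python
def build_few_shot_prompt(instruction, examples, new_input):
    """Assemble a plain-text few-shot prompt: instruction, worked examples, then the new case."""
    parts = [instruction, ""]
    for text, label in examples:
        parts.append(f"Input: {text}")
        parts.append(f"Output: {label}")
        parts.append("")
    # The new case ends with an open "Output:" for the model to complete
    parts.append(f"Input: {new_input}")
    parts.append("Output:")
    return "\n".join(parts)

# Hypothetical task and examples
examples = [
    ("The delivery arrived two weeks late.", "complaint"),
    ("Thanks, the new invoice looks correct.", "praise"),
    ("Can I change my billing address?", "question"),
]
prompt = build_few_shot_prompt(
    "Classify each customer message as complaint, praise, or question.",
    examples,
    "My password reset link does not work.",
)
print(prompt)
```

Each extra example makes the prompt longer, which is where the additional token cost comes from.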
Fine-Tuning: Re-training a model on your own data. Powerful but costly — often RAG or better prompting is enough.
Foundation Model: A large, generally trained base model used as the starting point for many applications — GPT, Claude, Llama. Broad rather than specialized.
Function Calling: Instead of inventing an answer, the model outputs a structured request naming an external function (tool, API, database) and its arguments; the application runs it and feeds the result back to the model.
Gemini: Google’s multimodal model family. Tightly integrated into Google products (Workspace, Search, Android).
Generative AI: Umbrella term for AI systems that create new content — text, image, audio, video, code. As opposed to pure classification or prediction models.
GPT: Generative Pre-trained Transformer. Originally OpenAI’s model name, now a generic term for transformer-based text generation models.
GPU: Graphics Processing Unit. Hardware built for parallel compute — the engine of AI training. Nvidia dominates the market.
Gradient Descent: The optimization routine by which models learn: measure the error, nudge each parameter a small step in the direction that reduces it, and repeat.
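This minimal sketch runs gradient descent on a one-parameter problem. The data is made up and generated from y = 3x. The derivative of the mean squared error is written out by hand here. In a neural network, backpropagation computes these derivatives for millions of parameters at once.

```python
# Toy data generated from y = 3x (made up for illustration)
xs = [1.0, 2.0, 3.0, 4.0]
ys = [3.0, 6.0, 9.0, 12.0]

w = 0.0               # the single parameter, starting from a poor guess
learning_rate = 0.01  # hyperparameter: size of each step

for step in range(200):
    # Mean squared error L(w) = mean((w*x - y)^2); its derivative dL/dw = mean(2*(w*x - y)*x)
    grad = sum(2 * (w * x - y) * x for x, y in zip(xs, ys)) / len(xs)
    w -= learning_rate * grad  # step against the gradient, i.e. downhill

print(round(w, 4))  # close to the true value 3
```

If the learning rate is too large, the steps overshoot and the error grows instead of shrinking. That is one reason hyperparameters matter.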
Guardrails: Rules and filters that prevent a model from producing unwanted output — toxicity, hallucinations, confidential data.
Hackathon: A portmanteau of "hack" and "marathon", originally from software development. A time-boxed work sprint — typically 24 to 72 hours — in which teams work under time pressure on a concrete problem and present something tangible at the end: prototype, concept, solution. Today used as an innovation format inside companies, as a community event, or as a team-building instrument.
Hallucination: When a model confidently states something false. Not a bug, but a side effect of generating statistically plausible language.
Human-in-the-Loop: Design principle: a human reviews, corrects or confirms the AI output before it takes effect. Standard in critical applications.
Hyperparameter: Training settings not learned from data but set in advance — learning rate, batch size, layer count. Shape the result substantially.
Inference: Applying a trained model: input in, output out. Cheap per call compared to training — but costly in aggregate at scale.
In-Context Learning: A model’s ability to learn from examples inside the prompt — without retraining. Foundation of all few-shot techniques.
Jailbreak: Prompt technique that circumvents a model’s safety guardrails to produce forbidden output. An arms race between attackers and model operators.
Knowledge Graph: Structured knowledge as nodes and edges (entity — relation — entity). Foundation of semantic search and explainable AI systems.
Latency: Time between a request and the model’s first response. Decisive for interactive applications: too much of it ruins the user experience.
Llama: Meta’s open-weights model family. Free to use under Meta’s license terms, popular for self-hosted enterprise AI applications.
LLM: Large Language Model. A neural network trained on huge text corpora that predicts language — GPT, Claude, Gemini.
LoRA: Low-Rank Adaptation. Fine-tuning method that trains only small add-on weights — resource-efficient and composable.
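This minimal sketch shows the LoRA idea in plain Python. The original weight matrix W stays frozen. Only two small matrices, B and A, are trained, and their product is added to W. The 3x3 numbers are illustrative assumptions, as are the 4096 x 4096 layer size and rank 8. The original method also scales the update by a constant, which is omitted here.

```python
def matmul(X, Y):
    """Plain-Python matrix product of X (n x m) and Y (m x p)."""
    return [[sum(X[i][k] * Y[k][j] for k in range(len(Y))) for j in range(len(Y[0]))]
            for i in range(len(X))]

def lora_adapted(W, B, A):
    """Effective weights W + B @ A; W stays frozen, only B and A are trained."""
    delta = matmul(B, A)
    return [[w + d for w, d in zip(w_row, d_row)] for w_row, d_row in zip(W, delta)]

def trainable_params(d_out, d_in, rank):
    full = d_out * d_in                # full fine-tuning updates every entry of W
    lora = d_out * rank + rank * d_in  # LoRA trains only B (d_out x r) and A (r x d_in)
    return full, lora

# Tiny toy example: 3x3 frozen matrix with a rank-1 update (numbers made up)
W = [[1.0, 0.0, 0.0], [0.0, 1.0, 0.0], [0.0, 0.0, 1.0]]
B = [[0.5], [0.0], [-0.5]]  # 3 x 1
A = [[1.0, 2.0, 0.0]]       # 1 x 3
W_new = lora_adapted(W, B, A)
print(W_new)

# Hypothetical layer size: 4096 x 4096, rank 8
full, lora = trainable_params(4096, 4096, 8)
print(full, lora, f"{lora / full:.2%}")
```

Because each task only needs its own small B and A, several adapters can be kept for one frozen base model. This is what makes the method composable.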
Machine Learning: Umbrella term: systems that learn from data rather than being explicitly programmed. AI today is almost always machine learning.
MCP: Model Context Protocol. Open standard that lets AI systems access external tools, data, and services — vendor-agnostic.
Memory: An AI system’s ability to retain information beyond a single conversation. Foundation of personalized assistants — and a governance risk.
Mistral: The model family of the French company Mistral AI. A mix of open-weights and proprietary models, developed in Europe.
MLOps: Operational discipline for running AI models: deployment, monitoring, versioning, rollback. Counterpart to DevOps in software.
Model Card: Datasheet for an AI model: capabilities, limits, training data, known risks. A transparency instrument; the EU AI Act requires comparable documentation for high-risk systems and general-purpose AI models.
Model Collapse: Quality decay when models are repeatedly trained on AI-generated data — each pass drifts the distribution further from reality.
MoE: Mixture of Experts. Architecture where only part of the network activates per request. Enables very large models without proportionally higher compute cost.
Multi-Agent: Multiple AI agents collaborate or challenge each other — debate, role splitting, verification. Lifts quality, but also cost and complexity.
Multimodal: A model that handles not only text but also images, audio, or video — within one conversation.
Neural Network: Mathematical model loosely inspired by the nervous system — layers of interconnected "neurons" that learn to recognize patterns.
One-Shot Prompting: Giving the model exactly one example before the task. A middle ground between zero-shot and few-shot.
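Open Source: Model whose weights (sometimes also training code) are publicly available, e.g. Llama, Mistral. Enables self-hosting, inspection and customization. Strictly speaking, many such models are "open weights" rather than open source, since training data and license terms often remain restricted.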
Orchestration: Coordinating multiple AI components in a flow — which model, when, with which context, followed by which action. Core of modern AI applications.
Overfitting: A model learns training data too well — reproduces it exactly but fails on new input. A sign of poor generalization.
Parameter: A model’s learnable weights. More parameters = more capacity but also more compute. Modern models: billions to trillions of parameters.
PEFT: Parameter-Efficient Fine-Tuning. Umbrella term for methods that adapt models with minimal resource cost — LoRA, adapters, prompt tuning.
Perplexity: Measure of how much a model is "surprised" by actual data. Lower = better predicted. Classical benchmark for language models.
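This minimal sketch implements the perplexity formula. Perplexity is the exponential of the average negative log-probability the model assigned to the tokens that actually occurred. The probabilities below are invented. A perplexity of 2 means the model was, on average, as uncertain as if it were choosing between two equally likely options.

```python
import math

def perplexity(token_probs):
    """exp of the average negative log-probability assigned to the actual tokens."""
    nll = -sum(math.log(p) for p in token_probs) / len(token_probs)
    return math.exp(nll)

# Hypothetical probabilities two models assigned to the same four actual tokens
confident_model = [0.5, 0.5, 0.5, 0.5]
unsure_model = [0.1, 0.1, 0.1, 0.1]

print(perplexity(confident_model))  # about 2
print(perplexity(unsure_model))     # about 10: more "surprised", worse prediction
```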
PII: Personally Identifiable Information. Data that makes a person identifiable — name, email, IP. AI systems must handle PII with special care.
Positional Encoding: Technique that gives transformer models the order of tokens — without it the model would not know which word comes first.
Pre-training: A model’s first training phase on huge, general datasets. Establishes baseline understanding — expensive, rarely done in-house, usually by foundation model vendors.
Prompt: The instruction given to an AI model. Prompt quality decides answer quality — often more than the model itself.
Prompt Engineering: The craft of giving a model instructions clear and structured enough to produce reliable output. A discipline in its own right — more editorial than engineering.
Prompt Injection: A manipulated input that overrides a model’s original instructions — often via hidden text in documents or web pages. Primary attack vector for agents.
Quantization: Compressing model weights to fewer bits (e.g. 4-bit instead of 16-bit). Makes models smaller and faster, usually at a small cost in quality.
RAG: Retrieval-Augmented Generation. The system first searches your documents for relevant passages, then the model answers based on them. Can substantially reduce hallucination, though not eliminate it.
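This minimal sketch shows symmetric 4-bit quantization with a single shared scale factor. Production libraries use more elaborate schemes, for example separate scales per block of weights. The weights are made up. Each value is stored as a small integer between -7 and 7. The rounding error per weight is at most half a scale step.

```python
def quantize(weights, bits=4):
    """Symmetric linear quantization: floats -> integers in [-qmax, qmax] with one shared scale."""
    qmax = 2 ** (bits - 1) - 1  # 7 for 4-bit
    scale = max(abs(w) for w in weights) / qmax
    q = [round(w / scale) for w in weights]
    return q, scale

def dequantize(q, scale):
    """Recover approximate float weights from the stored integers."""
    return [v * scale for v in q]

weights = [0.12, -0.53, 0.98, -0.09, 0.44]
q, scale = quantize(weights)
restored = dequantize(q, scale)
max_error = max(abs(a - b) for a, b in zip(weights, restored))

print(q)                               # small integers instead of 16-bit floats
print([round(r, 3) for r in restored]) # close to, but not exactly, the originals
print(round(max_error, 3))             # the "small quality cost"
```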
Reasoning Model: A model optimized for multi-step reasoning — spends compute on intermediate steps before answering. Strong on math, logic, code.
Red Teaming: Systematically attacking a model with a dedicated team to find weaknesses before going live. Standard practice, and increasingly a regulatory expectation, for high-risk AI systems.
Reinforcement Learning: Learning through reward and penalty — a model tries, gets feedback, improves. Basis of RLHF and agent training.
Responsible AI: The expectation that AI systems do more than function: that their decisions stay traceable, that bias gets tested, that in the end someone is accountable. Concretely: audit trails, explainability standards, data-protection mechanics. Most important where stakes are high — medicine, hiring, courts, credit decisions.
Retrieval: Finding relevant information in a data source — classically by keyword, today by embedding. Core of every RAG application.
RLHF: Reinforcement Learning from Human Feedback. Humans rate model responses, and the model learns from those ratings. A key reason ChatGPT feels helpful.
Role Prompting: Assigning the model a role ("You are an experienced lawyer …") — to control tone, focus, and assumptions. Simplest lever for better results.
Sampling: The process by which a model chooses among possible next tokens. Controlled by temperature, top-k and top-p.
Scaling Laws: Empirical laws showing how model performance scales with size, data and compute budget. Basis for deciding to train larger.
Semantic Search: Searching by meaning rather than exact match. Uses embeddings to find similar content even when terms differ.
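This minimal sketch implements semantic search via cosine similarity. The three-dimensional "embeddings" are hand-made and hypothetical. Real embeddings come from an embedding model and have hundreds or thousands of dimensions. The query shares no words with the best-matching document. It still ranks first because the two vectors point in a similar direction.

```python
import math

def cosine_similarity(a, b):
    """1.0 = same direction (same meaning), around 0 = unrelated."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b)

# Hand-made toy "embeddings" (hypothetical; dimensions loosely mean: finance, travel, IT)
documents = {
    "Quarterly revenue report": [0.9, 0.1, 0.1],
    "Expense policy for business trips": [0.5, 0.8, 0.0],
    "VPN setup guide": [0.0, 0.1, 0.95],
}
query_vector = [0.6, 0.7, 0.1]  # pretend embedding of "reimbursing hotel costs"

ranked = sorted(documents.items(),
                key=lambda item: cosine_similarity(query_vector, item[1]),
                reverse=True)
for title, vec in ranked:
    print(f"{cosine_similarity(query_vector, vec):.2f}  {title}")
```

A vector database does the same ranking at scale, using index structures instead of comparing the query against every document.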
Self-Supervised Learning: Learning where labels are generated from the data itself — for example hiding a word and letting the model predict it. Basis of all LLM pre-training.
Stop Sequence: A predefined string whose appearance stops the model from generating further. Technical control over output length and format.
Supervised Learning: Learning from labeled examples: for each input the model knows the desired output during training.
Sycophancy: A model’s tendency to flatter the user, confirming positions rather than challenging them. A known side effect of RLHF: answers that raters like get reinforced, whether or not they are correct.
Synthetic Data: Artificially generated training data — often produced by other models. Useful when data is scarce, risky when it leads to model collapse.
System Prompt: The invisible always-on instruction that shapes a model’s behaviour in an application — tone, rules, limits.
Temperature: Creativity dial for sampling. 0 = (nearly) deterministic and repeatable; 1 or higher = experimental and variable. Low for facts, high for ideas.
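This minimal sketch shows how temperature reshapes a model's next-token probabilities. The raw scores (logits) are divided by the temperature before softmax. The scores are invented. A temperature of exactly 0 would divide by zero. Implementations therefore typically treat 0 as "always pick the most likely token".

```python
import math

def softmax_with_temperature(logits, temperature):
    """Turn raw model scores (logits) into probabilities; temperature rescales them first."""
    scaled = [x / temperature for x in logits]
    m = max(scaled)  # subtract the max for numerical stability
    exps = [math.exp(s - m) for s in scaled]
    total = sum(exps)
    return [e / total for e in exps]

# Hypothetical scores for three candidate next tokens
logits = [2.0, 1.0, 0.1]
for t in (0.2, 1.0, 2.0):
    probs = softmax_with_temperature(logits, t)
    # Low t: the top token dominates; high t: probabilities even out
    print(t, [round(p, 3) for p in probs])
```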
Token: The smallest unit a model breaks text into — often a word fragment. Pricing and context are measured in tokens.
Tokenizer: Algorithm that splits raw text into tokens. How it slices determines how much text fits in a context window — and how expensive requests become.
Tool Use: A model’s ability to operate external tools — calculator, database, search, API. Turns a model into an actionable agent.
Top-k / Top-p: Two dials for generation diversity. Top-k: choose only from the k most likely tokens. Top-p (nucleus sampling): choose only from the smallest set of most likely tokens whose cumulative probability reaches p.
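This minimal sketch applies both filters to an invented next-token distribution. After filtering, the remaining probabilities are renormalized and one token is drawn at random from them. The random draw itself is left out here.

```python
def top_k_filter(probs, k):
    """Keep only the k most likely tokens, then renormalize."""
    kept = dict(sorted(probs.items(), key=lambda kv: kv[1], reverse=True)[:k])
    total = sum(kept.values())
    return {tok: p / total for tok, p in kept.items()}

def top_p_filter(probs, p):
    """Keep the smallest set of most likely tokens whose cumulative probability reaches p."""
    kept, cumulative = {}, 0.0
    for tok, prob in sorted(probs.items(), key=lambda kv: kv[1], reverse=True):
        kept[tok] = prob
        cumulative += prob
        if cumulative >= p:
            break
    total = sum(kept.values())
    return {tok: q / total for tok, q in kept.items()}

# Hypothetical next-token distribution after "The meeting is scheduled for"
probs = {"Monday": 0.45, "Tuesday": 0.30, "tomorrow": 0.15, "noon": 0.07, "banana": 0.03}

print(top_k_filter(probs, 2))    # only Monday and Tuesday remain
print(top_p_filter(probs, 0.8))  # Monday, Tuesday, tomorrow (0.45 + 0.30 + 0.15 >= 0.8)
```

Top-p adapts to the situation. When the model is confident, few tokens survive. When it is unsure, more do. Top-k always keeps exactly k.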
Training: The process by which a model learns from data — weights get adjusted until error is minimized. The most compute-intensive phase of the lifecycle.
Training Data: The dataset a model learns from. The model’s boundary: it can only reflect what was in the data (and its transformations).
Transfer Learning: Adapting a pre-trained model to a new task — rather than starting from scratch. Standard practice in modern AI development.
Transformer: The architecture behind nearly all of today’s LLMs. Introduced by Google researchers in 2017. Core: the attention mechanism that computes contextual relationships.
Underfitting: A model has learned too little to explain the data — subpar performance on both training and test. A sign of insufficient capacity or too-short training.
Unsupervised Learning: Learning without labels: the model finds structure in data on its own. Basis of pre-training and clustering.
Validation Set: A held-out slice of the dataset used to monitor training without burning the test set. Key defense against overfitting.
Vector: A list of numbers that encodes meaning in AI. Embeddings are vectors — in high-dimensional space, proximity means similarity.
Vector Database: Database that stores embeddings and enables fast similarity search — Pinecone, Weaviate, pgvector. Infrastructure backbone of RAG systems.
Weights: The numeric values adjusted during training that encode a model’s knowledge. Releasing a model means sharing its weights.
Workflow Automation: Orchestrating business processes via AI components — triaging emails, drafting reports, preparing decisions. Usually higher ROI than one-off chat queries.
Zero-Shot: Solving a task without giving the model examples. Fast, but more error-prone than few-shot.

What next?

Context: Denkwerkstatt. How the terms behave in practice: six essays on architect illusion, hallucination, sovereignty.

Apply: Clarify in sparring. Understanding terms is easier than applying them. When it gets concrete, propose a time.