The only AI glossary you’ll need this year
TechCrunch has published a “living” AI glossary aimed at terms now common in product, investment, and engineering discussions.
Shane Barrett · Updated July 04, 2026

Terminology is becoming an interface layer
The glossary framing is pragmatic. It treats terms such as LLM, RAG, RLHF, AI agent, API endpoint, and chain-of-thought reasoning as working vocabulary rather than marketing labels.
That distinction matters. In applied ML, imprecise terminology creates measurement errors before any benchmark is run. “AI agent,” for example, is described as a system that can execute multistep tasks on a user’s behalf, potentially using multiple AI systems and external services. The important architectural variable is not the label. It is autonomy plus tool access plus task decomposition.
The same applies to API endpoints. The source describes them as callable interfaces that allow software systems to trigger actions or retrieve data. In agentic systems, these endpoints become part of the model’s action space. That changes evaluation. A chatbot can be scored on response quality. An agent using endpoints must also be evaluated for action validity, permission boundaries, failure recovery, and computational overhead.
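The four agent-side criteria above can be turned into numbers. The following is a minimal sketch, not a standard benchmark: the `ToolCall` record, the `score_trace` function, and the rule that a failure counts as "recovered" when a later call to the same endpoint succeeds are all assumptions made for illustration.

```python
from dataclasses import dataclass


@dataclass
class ToolCall:
    """One endpoint call made by an agent (hypothetical record format)."""
    endpoint: str      # name of the endpoint the agent tried to call
    valid: bool        # did the arguments pass the endpoint's schema check?
    succeeded: bool    # did the call return without an error?
    latency_s: float   # wall-clock seconds spent on the call


def score_trace(calls, allowed_endpoints):
    """Summarise one agent run along the four axes named in the text."""
    if not calls:
        raise ValueError("trace contains no tool calls")

    # Action validity: share of calls whose arguments were well-formed.
    valid_rate = sum(c.valid for c in calls) / len(calls)

    # Permission boundaries: calls to endpoints outside the allow-list.
    violations = sum(c.endpoint not in allowed_endpoints for c in calls)

    # Failure recovery (assumed rule): a failed call is "recovered" if a
    # later call to the same endpoint succeeds.
    failures = 0
    recovered = 0
    for i, call in enumerate(calls):
        if not call.succeeded:
            failures += 1
            later = calls[i + 1:]
            if any(c.endpoint == call.endpoint and c.succeeded for c in later):
                recovered += 1
    recovery_rate = recovered / failures if failures else 1.0

    # Computational overhead: total time spent inside tool calls.
    overhead_s = sum(c.latency_s for c in calls)

    return {
        "action_validity": valid_rate,
        "permission_violations": violations,
        "failure_recovery": recovery_rate,
        "overhead_s": overhead_s,
    }


# Illustrative trace; endpoint names and timings are made up.
trace = [
    ToolCall("search", valid=True, succeeded=True, latency_s=0.4),
    ToolCall("payments", valid=True, succeeded=False, latency_s=1.2),
    ToolCall("payments", valid=True, succeeded=True, latency_s=0.9),
    ToolCall("delete_user", valid=False, succeeded=False, latency_s=0.1),
]
report = score_trace(trace, allowed_endpoints={"search", "payments"})
print(report)
```

None of these fields measures response quality, which is the point: a chatbot-style score would miss the permission violation and the unrecovered failure entirely.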
Definitions remain unstable where benchmarks are weakest
AGI is the clearest example of semantic drift. The source notes that definitions vary across OpenAI leadership, OpenAI’s charter, and Google DeepMind. The common axis is broad human-level or beyond-human capability across many economically or cognitively relevant tasks. The measurement problem remains unresolved in the cited material.
For paperscode.org readers, the implication is direct: avoid treating AGI references as empirical claims unless the paper defines the task suite, baseline population, and scoring protocol. Without those three elements, the term carries little experimental content, because there is nothing to reproduce or falsify.
More useful terms are tied to observable mechanisms. LLMs are described as deep neural networks with billions of numerical parameters that learn relationships between words and phrases, forming a multidimensional representation of language. Training is the process of feeding data into a model so it can learn patterns. Inference is the process of running the trained model to make predictions or draw conclusions. These definitions are operational. They map onto cost, latency, memory, evaluation, and deployment constraints.
Chain-of-thought reasoning is also framed operationally: decomposing a problem into intermediate steps can improve output quality, especially in logic or coding contexts, but usually increases time to answer. That is a trade-off, not a universal improvement. Any implementation should report accuracy gain against additional latency and token cost.
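One way to report that trade-off is to put the accuracy gain next to the extra latency and tokens it cost. The sketch below assumes two evaluation runs over the same task set, one with direct prompting and one with chain-of-thought prompting; `RunStats`, `cot_tradeoff`, and the numbers in the example are hypothetical placeholders, not measurements.

```python
from dataclasses import dataclass


@dataclass
class RunStats:
    """Aggregate results of one evaluation run (hypothetical format)."""
    accuracy: float        # fraction of tasks solved, between 0 and 1
    mean_latency_s: float  # mean seconds per answer
    mean_tokens: float     # mean generated tokens per answer


def cot_tradeoff(direct, cot):
    """Compare a chain-of-thought run against a direct-answer baseline."""
    gain = cot.accuracy - direct.accuracy
    extra_latency = cot.mean_latency_s - direct.mean_latency_s
    extra_tokens = cot.mean_tokens - direct.mean_tokens
    # Accuracy gained per 1,000 additional tokens; undefined when the
    # chain-of-thought run did not use more tokens than the baseline.
    per_1k = gain / (extra_tokens / 1000) if extra_tokens > 0 else None
    return {
        "accuracy_gain": gain,
        "extra_latency_s": extra_latency,
        "extra_tokens": extra_tokens,
        "gain_per_1k_extra_tokens": per_1k,
    }


# Placeholder numbers for illustration only.
direct_run = RunStats(accuracy=0.5, mean_latency_s=1.0, mean_tokens=200.0)
cot_run = RunStats(accuracy=0.75, mean_latency_s=3.0, mean_tokens=700.0)
print(cot_tradeoff(direct_run, cot_run))
```

A negative or near-zero `accuracy_gain` alongside a large `extra_tokens` value is the case the text warns about: longer reasoning that costs more without improving results.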
What to audit in implementations
The CryptoRank summary extends the glossary into deployment risk. It flags hallucinations and a reported RAM shortage, described as “RAMageddon,” with possible pressure on token-based model costs and AI-enabled crypto or DeFi infrastructure. The evidence here should be treated cautiously: it is a secondary summary, not a benchmark. Still, the categories are relevant.
Three checks follow from the terminology.
First, separate model capability from system capability. An LLM benchmark does not validate an agent unless the evaluation includes tool calls, endpoint interaction, state management, and recovery from partial failure.
Second, require ablation where technique names are used. If a system claims RAG, RLHF, Mixture of Experts, distillation, or chain-of-thought prompting, the implementation should show what changes when that component is removed or constrained. Otherwise the term functions as packaging, not evidence.
Third, track inference economics. The glossary distinction between training and inference is basic but necessary. Many production bottlenecks emerge after training, when repeated inference, longer reasoning traces, external tool calls, and memory pressure accumulate. Parameter efficiency and latency should be reported alongside task accuracy.
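The second and third checks can be sketched together: run the same tasks through the full system and through a variant with the named component removed, then report the accuracy difference next to the latency difference. This is a toy harness under stated assumptions. `evaluate`, `ablation_report`, the exact-match scoring rule, and the two stand-in systems (a dictionary lookup playing the role of retrieval) are all hypothetical; a real ablation would swap in the actual pipeline and a scoring rule suited to the task.

```python
import time
from statistics import mean


def evaluate(system, tasks):
    """Run `system` on each (input, expected) pair; return accuracy and mean latency."""
    correct = 0
    latencies = []
    for task_input, expected in tasks:
        start = time.perf_counter()
        output = system(task_input)
        latencies.append(time.perf_counter() - start)
        correct += int(output == expected)  # exact-match scoring (assumed)
    return {"accuracy": correct / len(tasks), "mean_latency_s": mean(latencies)}


def ablation_report(full_system, ablated_system, tasks):
    """Report what changes when one component is removed."""
    full = evaluate(full_system, tasks)
    ablated = evaluate(ablated_system, tasks)
    return {
        "full": full,
        "ablated": ablated,
        "accuracy_delta": full["accuracy"] - ablated["accuracy"],
        "latency_delta_s": full["mean_latency_s"] - ablated["mean_latency_s"],
    }


# Toy stand-ins: a lookup table plays the role of the retrieval component.
KNOWLEDGE = {"capital of France": "Paris", "capital of Japan": "Tokyo"}


def with_retrieval(question):
    return KNOWLEDGE.get(question, "unknown")


def without_retrieval(question):
    return "Paris"  # ablated variant: always gives the same guess


tasks = [("capital of France", "Paris"), ("capital of Japan", "Tokyo")]
print(ablation_report(with_retrieval, without_retrieval, tasks))
```

If `accuracy_delta` is close to zero, the component's name is doing more work in the description than the component is doing in the system.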
The practical value of the new glossaries is therefore limited but real. They provide a shared vocabulary. They do not provide validation. For research and code readers, every term should be converted into a testable component: architecture, data path, benchmark, ablation, and runtime cost.