How AI Models Actually Work
Deep technical explanations with code, diagrams, and mathematical foundations. Understand the architecture behind GPT-4, Claude, and other modern LLMs.
Agentic Design Patterns Part 2: Foundational Patterns with Working Code
Deep dive into prompt chaining, routing, parallelization, reflection, tool use, planning, and multi-agent collaboration. Real Python code you can run and modify.
How We Track AI Model Costs: Real Data, Not Marketing Claims
Behind the scenes of our cost analysis methodology. See how we track token usage, calculate real costs, and determine which AI models actually save you money.
Agentic Design Patterns: Complete Guide to Building AI Agents
Deep dive into the 21 essential design patterns for building autonomous AI agents. Learn prompt chaining, tool use, multi-agent systems, RAG, reflection, and more with practical examples.
ChatGPT Evolution: From GPT-3.5 to GPT-4 Turbo
How OpenAI's ChatGPT models have evolved across standardized benchmarks. A performance comparison on MMLU, GSM8K, and TruthfulQA shows the real-world improvements from GPT-3.5 to GPT-4 Turbo.
GSM8K: Testing AI Mathematical Reasoning
How we measure whether AI can actually solve math problems - from word problems to multi-step algebra. Why most models still struggle with grade school math.
MMLU Benchmark: Measuring True AI Intelligence
A deep dive into the Massive Multitask Language Understanding benchmark - the gold standard for evaluating AI reasoning across 57 academic subjects.
Open-Source AI Testing Tools
Our complete suite of benchmarking and evaluation tools for testing AI systems. Run the same tests we use, verify our results, and contribute improvements.
TruthfulQA: Can You Trust Your AI?
Testing whether AI models tell the truth or spread misinformation. How we measure hallucination resistance and factual accuracy across controversial topics.
LLM Hallucinations in Practice: A Claude Sonnet 4.5 Case Study
Real-world analysis of how even advanced LLMs can overcomplicate simple problems - and how prompt engineering helps.
KV Cache and Memory Management
Deep dive into KV cache optimization - the key to fast and efficient LLM inference.
Evaluating Long-Context Performance
How to test whether LLMs actually use their 100K+ token context windows effectively.
Position Encodings in Transformers Explained
How transformers understand word order - from sinusoidal encodings to RoPE and ALiBi.
LLM Inference Optimization: Speed & Cost Guide
How to make LLM inference faster and cheaper - quantization, batching, KV caching, and more.
Training Large Language Models: Complete Guide
How LLMs are trained from scratch - pre-training, fine-tuning, RLHF, and everything in between.
Attention Mechanisms Explained
Visual guide to how attention works in transformers - from basic self-attention to modern sparse patterns.
How Long-Context Models Work: Technical Architecture
Deep dive into the technical innovations that enable models like Claude, Kimi, and GPT-4 to handle 100K+ token contexts.
Transformer Architecture: Complete Visual Guide
How transformers work from input to output - the architecture behind GPT, BERT, and modern LLMs.
More Coming Soon
We're continuously adding new technical deep-dives. Topics in the pipeline:
- RLHF & Alignment Methods
- Quantization & Compression
- Mixture of Experts (MoE)