Technical Documentation

How AI Models Actually Work

Deep technical explanations with code, diagrams, and mathematical foundations. Understand the architecture behind GPT-4, Claude, and other modern LLMs.

Agentic Design Patterns Part 2: Foundational Patterns with Working Code

Deep dive into prompt chaining, routing, parallelization, reflection, tool use, planning, and multi-agent collaboration. Real Python code you can run and modify; a minimal chaining sketch follows this entry.

AI agents technical
Jan 24, 2026
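
A taste of the chaining pattern, as a minimal sketch: each step's output feeds the next step's prompt. Note that call_llm here is a hypothetical stand-in for whatever model client you use, not an API from the article.

    # Minimal prompt chaining: the output of one step becomes the
    # input of the next. call_llm is a placeholder for your client.
    def call_llm(prompt: str) -> str:
        raise NotImplementedError("plug in your model client here")

    def chain(steps, user_input):
        result = user_input
        for template in steps:
            result = call_llm(template.format(input=result))
        return result

    summarize_then_tweet = [
        "Summarize the following text in two sentences:\n{input}",
        "Rewrite this summary as a single tweet:\n{input}",
    ]
    # chain(summarize_then_tweet, long_article_text)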

How We Track AI Model Costs: Real Data, Not Marketing Claims

Behind the scenes of our cost analysis methodology. See how we track token usage, calculate real costs, and determine which AI models actually save you money. The underlying arithmetic is sketched after this entry.

AI technical costs
Jan 24, 2026
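
The core arithmetic is simple even if the tracking is not: providers bill input and output tokens at different per-million-token rates. A minimal sketch; the prices below are placeholders, not quotes from any provider.

    # Hypothetical (input, output) USD prices per million tokens.
    PRICES = {
        "model-a": (3.00, 15.00),
        "model-b": (0.50, 1.50),
    }

    def request_cost(model: str, input_tokens: int, output_tokens: int) -> float:
        p_in, p_out = PRICES[model]
        return input_tokens / 1e6 * p_in + output_tokens / 1e6 * p_out

    # A 2,000-token prompt with an 800-token reply on model-a:
    print(f"${request_cost('model-a', 2000, 800):.4f}")  # $0.0180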

Agentic Design Patterns: Complete Guide to Building AI Agents

Deep dive into the 21 essential design patterns for building autonomous AI agents. Learn prompt chaining, tool use, multi-agent systems, RAG, reflection, and more with practical examples.

AI agents technical
Jan 23, 2026

ChatGPT Evolution: From GPT-3.5 to GPT-4 Turbo

How OpenAI's ChatGPT models have evolved across standardized benchmarks. Performance comparison on MMLU, GSM8K, and TruthfulQA showing real-world improvements from GPT-3.5 to GPT-4 Turbo.

benchmarks ChatGPT GPT-4
Jan 21, 2026

GSM8K: Testing AI Mathematical Reasoning

How we measure whether AI can actually solve math problems - from word problems to multi-step algebra. Why most models still struggle with grade school math. A minimal answer-checking sketch follows this entry.

benchmarks testing GSM8K
Jan 21, 2026
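
A common way GSM8K is scored, sketched minimally: extract the final number from the model's output and exact-match it against the gold answer (GSM8K solutions end with a line like "#### 72").

    import re

    def last_number(text: str):
        # Grab integers and decimals, allowing thousands separators.
        nums = re.findall(r"-?\d[\d,]*(?:\.\d+)?", text)
        return nums[-1].replace(",", "") if nums else None

    def is_correct(model_output: str, gold_solution: str) -> bool:
        gold = gold_solution.split("####")[-1].strip().replace(",", "")
        return last_number(model_output) == gold

    print(is_correct("She sells 16 - 3 - 4 = 9 eggs... so the answer is 18.",
                     "... #### 18"))  # True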

MMLU Benchmark: Measuring True AI Intelligence

A deep dive into the Massive Multitask Language Understanding benchmark - the gold standard for evaluating AI reasoning across 57 academic subjects. A minimal accuracy scorer is sketched after this entry.

benchmarks testing MMLU
Jan 21, 2026
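
For orientation on how such a benchmark is scored: MMLU is four-option multiple choice, and the headline number is plain accuracy. A minimal scorer, assuming records of (subject, predicted letter, gold letter):

    from collections import defaultdict

    def mmlu_accuracy(records):
        per_subject = defaultdict(lambda: [0, 0])   # subject -> [correct, total]
        for subject, pred, gold in records:
            per_subject[subject][0] += int(pred.strip().upper() == gold)
            per_subject[subject][1] += 1
        correct = sum(c for c, _ in per_subject.values())
        total = sum(t for _, t in per_subject.values())
        return correct / total, {s: c / t for s, (c, t) in per_subject.items()}

    overall, by_subject = mmlu_accuracy([
        ("college_physics", "B", "B"),
        ("college_physics", "C", "D"),
        ("world_history", "A", "A"),
    ])
    print(f"{overall:.2f}", by_subject)   # 0.67 {...}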

Open-Source AI Testing Tools

Our complete suite of benchmarking and evaluation tools for testing AI systems. Run the same tests we use, verify our results, and contribute improvements.

Jan 21, 2026

TruthfulQA: Can You Trust Your AI?

Testing whether AI models tell the truth or spread misinformation. How we measure hallucination resistance and factual accuracy across controversial topics.

benchmarks testing TruthfulQA
Jan 21, 2026

LLM Hallucinations in Practice: A Claude Sonnet 4.5 Case Study

Real-world analysis of how even advanced LLMs can overcomplicate simple problems - and how prompt engineering helps.

LLM technical hallucinations
Nov 15, 2025

KV Cache and Memory Management

Deep dive into KV cache optimization - the key to fast and efficient LLM inference. A toy decode-loop sketch follows this entry.

LLM technical optimization
Jan 26, 2024
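
The core idea behind the cache, in a toy single-head sketch: during decoding each new token needs only a fresh query, while the keys and values of earlier tokens are stored once and reused instead of being recomputed every step.

    import numpy as np

    d = 64
    rng = np.random.default_rng(0)
    Wq, Wk, Wv = (rng.standard_normal((d, d)) / np.sqrt(d) for _ in range(3))
    k_cache, v_cache = [], []

    def decode_step(x):                       # x: (d,) embedding of the new token
        q = x @ Wq
        k_cache.append(x @ Wk)                # computed once per token, then reused
        v_cache.append(x @ Wv)
        K, V = np.stack(k_cache), np.stack(v_cache)
        scores = K @ q / np.sqrt(d)           # attend over all cached positions
        weights = np.exp(scores - scores.max())
        weights /= weights.sum()
        return weights @ V                    # (d,) attention output

    for _ in range(5):                        # O(1) new K/V work per decode step
        out = decode_step(rng.standard_normal(d))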

Evaluating Long-Context Performance

How to test if LLMs actually use their 100K+ token context windows effectively. A needle-in-a-haystack prompt builder is sketched after this entry.

LLM technical long-context
Jan 25, 2024
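
One widely used probe here is the "needle in a haystack" test: bury a single fact at a controlled depth inside filler text, then ask the model to retrieve it. A minimal prompt builder; the model call and answer checking belong to your harness.

    def build_needle_prompt(filler: str, needle: str, depth: float) -> str:
        words = filler.split()
        cut = int(len(words) * depth)          # depth in [0, 1]: 0 = start, 1 = end
        doc = " ".join(words[:cut] + [needle] + words[cut:])
        return doc + "\n\nQuestion: what is the magic number mentioned above?"

    prompt = build_needle_prompt(filler="lorem ipsum " * 50_000,
                                 needle="The magic number is 421337.",
                                 depth=0.35)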

Position Encodings in Transformers Explained

How transformers understand word order - from sinusoidal to RoPE and ALiBi. The sinusoidal variant is sketched after this entry.

LLM technical transformers
Jan 24, 2024
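
For reference, the original sinusoidal scheme in a few lines: even dimensions get a sine, odd dimensions a cosine, with wavelengths forming a geometric progression controlled by the 10000 constant from the "Attention Is All You Need" paper.

    import numpy as np

    def sinusoidal_pe(max_len: int, d_model: int) -> np.ndarray:
        pos = np.arange(max_len)[:, None]          # (max_len, 1)
        i = np.arange(0, d_model, 2)[None, :]      # even dimension indices
        angles = pos / np.power(10000.0, i / d_model)
        pe = np.zeros((max_len, d_model))          # assumes d_model is even
        pe[:, 0::2] = np.sin(angles)
        pe[:, 1::2] = np.cos(angles)
        return pe

    print(sinusoidal_pe(max_len=128, d_model=64).shape)  # (128, 64)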

LLM Inference Optimization: Speed & Cost Guide

How to make LLM inference faster and cheaper - quantization, batching, KV caching, and more. An int8 quantization sketch follows this entry.

LLM technical optimization
Jan 23, 2024
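
One of those levers in miniature: symmetric absmax int8 quantization, which maps weights onto [-127, 127] with a single scale, cutting memory 4x versus float32 at a small precision cost. A sketch, not any particular library's implementation.

    import numpy as np

    def quantize_int8(w: np.ndarray):
        scale = np.abs(w).max() / 127.0            # one scale for the whole tensor
        q = np.round(w / scale).astype(np.int8)
        return q, scale

    def dequantize(q: np.ndarray, scale: float) -> np.ndarray:
        return q.astype(np.float32) * scale

    w = np.random.default_rng(0).standard_normal(1024).astype(np.float32)
    q, scale = quantize_int8(w)
    print("max abs error:", np.abs(w - dequantize(q, scale)).max())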

Training Large Language Models: Complete Guide

How LLMs are trained from scratch - pre-training, fine-tuning, RLHF, and everything in between. The core next-token objective is sketched after this entry.

LLM technical training
Jan 22, 2024
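
The heart of pre-training fits in a few lines: shift the token stream by one position and minimize cross-entropy on the next-token prediction. A toy PyTorch sketch with an embedding and output head standing in for a full transformer.

    import torch
    import torch.nn.functional as F

    vocab, d = 100, 32
    emb = torch.nn.Embedding(vocab, d)
    lm_head = torch.nn.Linear(d, vocab)
    opt = torch.optim.AdamW(list(emb.parameters()) + list(lm_head.parameters()), lr=1e-3)

    tokens = torch.randint(0, vocab, (4, 16))         # a batch of token ids
    inputs, targets = tokens[:, :-1], tokens[:, 1:]   # predict token t+1 from tokens <= t
    logits = lm_head(emb(inputs))                     # a real model puts attention blocks here
    loss = F.cross_entropy(logits.reshape(-1, vocab), targets.reshape(-1))
    loss.backward()
    opt.step()
    print(loss.item())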

Attention Mechanisms Explained

Visual guide to how attention works in transformers - from basic self-attention to modern sparse patterns. A NumPy self-attention sketch follows this entry.

LLM technical attention
Jan 20, 2024
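
The basic form, for orientation before the visual guide: softmax(QK^T / sqrt(d_k)) V, computed for a whole sequence at once.

    import numpy as np

    def self_attention(X, Wq, Wk, Wv):
        Q, K, V = X @ Wq, X @ Wk, X @ Wv
        scores = Q @ K.T / np.sqrt(K.shape[-1])        # pairwise similarities
        scores -= scores.max(axis=-1, keepdims=True)   # stabilize the softmax
        weights = np.exp(scores)
        weights /= weights.sum(axis=-1, keepdims=True)
        return weights @ V                             # each row: weighted mix of values

    rng = np.random.default_rng(0)
    seq, d = 6, 16
    X = rng.standard_normal((seq, d))
    make_w = lambda: rng.standard_normal((d, d)) / np.sqrt(d)
    print(self_attention(X, make_w(), make_w(), make_w()).shape)  # (6, 16)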

How Long-Context Models Work: Technical Architecture

Deep dive into the technical innovations that enable models like Claude, Kimi, and GPT-4 to handle 100K+ token contexts.

LLM technical architecture
Jan 20, 2024

Transformer Architecture: Complete Visual Guide

How transformers work from input to output - the architecture behind GPT, BERT, and modern LLMs.

LLM technical transformers
Jan 20, 2024

More Coming Soon

We're continuously adding new technical deep-dives. Topics in the pipeline:

  • RLHF & Alignment Methods
  • Quantization & Compression
  • Mixture of Experts (MoE)