Articles

In-depth articles on building scalable systems and backend engineering.

April 13, 2026

Admission Control: Client, Service, and Proxy Layers

A deep dive into the three layers of admission control — client-side concurrency limits, service-side in-flight counting, and Envoy proxy circuit breakers — with Go examples and comparisons to load shedding, backpressure, throttling, and rate limiting.

16 min read

February 24, 2026

Probability in a Stream: Exact vs Approximate Design

A practical walk-through of the streaming distribution problem, with exact design, chi-square uniformity, and the memory-bounded variant.

9 min read

February 24, 2026

Python Threading Primitives and Queues (and asyncio)

A practical guide to Python threading primitives, queue.Queue, asyncio.Queue, and when to use asyncio.to_thread.

9 min read

February 24, 2026

Bursty Traffic: Queue Growth and Control Patterns

A pragmatic playbook for handling bursty traffic with rate limiting, load shedding, backpressure, and retry control, plus a Dynamo-style case study.

9 min read

February 23, 2026

Tiktoken vs SentencePiece: Methodology, BPE, and When to Use Each

A practical comparison of tiktoken and SentencePiece, including how BPE works, high-level pseudocode, and which model families use each.

10 min read

February 23, 2026

When Time Complexity Lies (And What Actually Matters)

Why Big-O can mislead in production and what to pay attention to instead: constants, cache behavior, input ranges, and p99 latency.

7 min read

February 23, 2026

Smooth Text Streaming in React + Next.js (OpenAI API)

How to render OpenAI streaming responses smoothly in React and Next.js using requestAnimationFrame batching and UI-friendly buffering.

8 min read

February 23, 2026

LLM Jailbreaking in Practice

A grounded overview of jailbreaks, prompt injection, and layered defenses, with references to OWASP, OpenAI evals, and current research.

12 min read

February 23, 2026

Ever Wonder Why Livegrep Is So Fast?

A practical breakdown of how Livegrep works, why indexed search feels instant, and how it compares to GitHub Code Search, Google Code Search, and grep.app.

9 min read

February 20, 2026

Latency Ladder for Pragmatic Engineers

A practical latency cheat sheet across caches, RTT, TCP/TLS handshakes, and LLM TTFT decomposition.

8 min read

February 20, 2026

OSI Model for Pragmatic Engineers

A practical OSI guide focused on debugging flow, fault isolation, and where latency hides in real systems.

7 min read

February 20, 2026

Similarity Systems in Production: A Latency Playbook

A pragmatic guide to autocorrect, typeahead, and web-crawl dedup through the lens of latency, memory, and implementation tradeoffs.

11 min read

February 20, 2026

US West to US East RTT: The Number, The Floor, and the Lie

A practical latency deep-dive: coast-to-coast RTT ranges, physics floors, handshake amplification, and how to budget for p95/p99 reality.

8 min read

February 20, 2026

LLM Inference: From Zero to Modern

A practical bottom-up guide to modern LLM serving: KV cache, continuous batching, PagedAttention, speculative decoding, and disaggregated prefill/decode.

16 min read

February 20, 2026

Production LLM Streaming Architecture

How to build resilient token streaming with SSE, Redis Streams, replayable offsets, and clean cancellation semantics.

14 min read

January 20, 2024

Database Indexing: A Deep Dive

Understanding how database indexes work and when to use them effectively.

12 min read

January 15, 2024

Building Resilient Systems

Strategies for designing systems that gracefully handle failures and recover quickly.

15 min read

January 10, 2024

API Rate Limiting: Best Practices

Implementing effective rate limiting strategies to protect your APIs from abuse.

10 min read