Learnings

Daily tidbits and small discoveries.

February 24, 2026

What is the pattern called when you store every state change as an append-only log?

That’s Event Sourcing. You persist every state-changing event in an append-only log, and rebuild current state by replaying events. It’s often paired with CQRS so read models are derived from the event stream.

Example:

Events:
- AccountCreated(id=123)
- MoneyDeposited(id=123, amount=50)
- MoneyWithdrawn(id=123, amount=10)

Rebuild balance by replaying in order:
0 -> +50 -> -10 = 40
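The replay above, written as a fold over the log (event names mirror the example; shapes are hypothetical, not a framework):

```python
# Minimal event-sourcing replay sketch: current state is a fold over the log.
events = [
    ("AccountCreated", {"id": 123}),
    ("MoneyDeposited", {"id": 123, "amount": 50}),
    ("MoneyWithdrawn", {"id": 123, "amount": 10}),
]

def replay_balance(log):
    balance = 0
    for kind, data in log:
        if kind == "MoneyDeposited":
            balance += data["amount"]
        elif kind == "MoneyWithdrawn":
            balance -= data["amount"]
    return balance

print(replay_balance(events))  # 40
```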

Why must condition variable waits be wrapped in a while loop?

Condition variables can wake up spuriously or be notified when the condition is still false. Always re-check the predicate in a while loop to ensure correctness.

Example:

with self.cv:
    while not self.q and self.running:
        self.cv.wait()  # no timeout here

Does SIGINT target a specific thread or the whole process?

SIGINT is delivered to the process (foreground process group). The default action terminates the process. In Python, custom signal handlers run only in the main thread, so worker threads do not receive SIGINT directly. To stop threads cleanly, set a shared flag/event from the handler.
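A minimal sketch of the shared-flag pattern: the handler (main thread only) sets a threading.Event that workers poll. Here shutdown is simulated by calling stop.set() directly instead of pressing Ctrl+C:

```python
import signal
import threading
import time

stop = threading.Event()

def handle_sigint(signum, frame):
    stop.set()  # handler runs in the main thread; just flag the workers

def worker():
    while not stop.is_set():
        time.sleep(0.01)  # placeholder for real work

signal.signal(signal.SIGINT, handle_sigint)
t = threading.Thread(target=worker)
t.start()
stop.set()  # simulate shutdown (normally triggered by Ctrl+C)
t.join()
```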

Does random.choice accept any iterable?

random.choice requires a sequence (supports __len__ and __getitem__), not a generic iterable. It works with list/tuple/str/range but not generators or sets.
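Quick check of the sequence requirement:

```python
import random

random.choice([10, 20, 30])   # OK: list is a sequence
random.choice("abc")          # OK: str is a sequence
random.choice(range(5))       # OK: range supports __len__ and __getitem__

try:
    random.choice({1, 2, 3})  # set has __len__ but no __getitem__
except TypeError as e:
    print("set rejected:", e)
```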

How do you pick 3 random numbers from a range?

For distinct numbers, use random.sample(range(a, b+1), 3) (no replacement). For numbers with repeats, use random.choices(range(a, b+1), k=3) or [random.randint(a, b) for _ in range(3)].

Example:

r = range(a, b + 1)
indexes = random.sample(r, k=3)  # no replacement
indexes = random.choices(r, k=3)  # with replacement

How do you set the random seed in Python?

Call random.seed(value) before generating numbers to make results reproducible (for the same Python version).
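Reseeding with the same value replays the same sequence:

```python
import random

random.seed(42)
first = [random.random() for _ in range(3)]

random.seed(42)  # same seed -> same sequence
second = [random.random() for _ in range(3)]

print(first == second)  # True
```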

What is Chi-square in statistics?

Chi-square refers to two related things:

  1. χ² distribution: the distribution of a sum of squared standard normals, X = Z1^2 + ... + Zk^2. It is right-skewed and starts at 0. Under H0, the test statistic follows this distribution.
  2. χ² statistic: the deviation measure you compute, χ² = Σ((O - E)^2 / E).

It is not just for uniformity. The same formula tests any expected distribution (uniform, fair dice, binned normal, custom probabilities).

Example:

def chi_square(observed, expected):
    return sum((o - e) ** 2 / e for o, e in zip(observed, expected))

Uniformity special case (E = N/k):

from typing import Any, Dict, List
import math

CRITICAL_005 = {1: 3.84, 2: 5.99, 3: 7.81, 4: 9.49, 5: 11.07, 6: 12.59, 7: 14.07}

def test_uniformity(observed: List[float], alpha: float = 0.05) -> Dict[str, Any]:
    k = len(observed)
    if k < 2:
        return {'uniform': True}

    total = sum(observed)
    expected = total / k
    stat = sum((o - expected) ** 2 / expected for o in observed)
    df = k - 1
    critical = CRITICAL_005.get(df, df + 1.96 * math.sqrt(2 * df))
    return {'uniform': stat <= critical, 'stat': stat, 'df': df}

Key idea: χ² distribution is the math foundation; χ² statistic is the computed deviation. Uniformity is just one application.

When is a chi-square test statistically significant?

It’s significant when the chi-square p-value is below your alpha (e.g., 0.05). The chi-square approximation is unreliable for small samples: expected counts per bin should be large enough (rule of thumb: ≥ 5, often ≥ 10). For k categories, that suggests N ≳ 5k (or N ≳ 10k) before trusting the test. If expected counts are small, combine bins or use an exact test.

What are key design considerations for LLM token encoding/decoding under burst load?

Key considerations:

  1. Tokenizer choice + thread safety (tiktoken/sentencepiece/tokenizers; thread-local if needed).
  2. Hot path latency (avoid locks/allocs; preload merges/vocab).
  3. Burst handling (bounded queues/backpressure; limited thread pool).
  4. Caching (LRU for repeated texts/prefixes; decode cache if memory allows).
  5. Batching (micro-batch with tight max wait to protect p99).
  6. Memory cap (single shared model, byte-budgeted caches).
  7. Concurrency model (native libs may release GIL; use thread-local if unsafe).
  8. Observability (p50/p90/p99, hit rate, queue depth, batch size).
  9. Validation (burst load tests; verify p99, not just average).

Overload and capacity controls (a different category):

  10. Admission control / load shedding (reject to protect p99 under spikes).
  11. Backpressure / rate limiting (slow upstream to avoid queue growth).
  12. Autoscaling / capacity headroom (HPA for ramps; still need shedding for sharp spikes).
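Bounded queues with rejection (items 3 and 10) can be sketched with the stdlib; an illustration, not a production design:

```python
import queue

# Bounded queue: admission control by rejecting when full (load shedding).
requests: "queue.Queue[str]" = queue.Queue(maxsize=2)

def admit(req: str) -> bool:
    try:
        requests.put_nowait(req)   # accept while capacity remains
        return True
    except queue.Full:
        return False               # shed: fast rejection protects p99

print(admit("a"), admit("b"), admit("c"))  # True True False
```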

How do you compute percentiles from a list without a sketch?

Sort the list and index into it. This is O(n log n).

Example:

def percentile(values, p):
    if not values:
        return None
    v = sorted(values)
    idx = int(p * (len(v) - 1))  # floor index; no interpolation between points
    return v[idx]

p50 = percentile(durations, 0.50)
p99 = percentile(durations, 0.99)

Does Uvicorn use processes or threads for concurrency?

Uvicorn runs a single process by default (async event loop). If you start it with --workers N, it spawns N separate processes (multiprocessing).

What is singleflight (request coalescing)?

Singleflight is request coalescing for identical in-flight work. If many identical requests arrive together, only the first does the work; the rest wait and reuse the same result. It is a temporary in-flight cache and is often paired with a normal result cache.
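A minimal asyncio sketch of the idea (illustrative names; a real implementation would also propagate failures of the leader call to the waiters):

```python
import asyncio

# Singleflight: identical concurrent calls share one in-flight future.
_inflight: dict[str, asyncio.Future] = {}
calls = 0

async def fetch(key: str) -> str:
    global calls
    if key in _inflight:
        return await _inflight[key]       # join work already in flight
    fut = asyncio.get_running_loop().create_future()
    _inflight[key] = fut
    try:
        calls += 1
        await asyncio.sleep(0.01)         # stand-in for the real work
        result = f"value:{key}"
        fut.set_result(result)
        return result
    finally:
        del _inflight[key]                # in-flight cache only, not a result cache

async def main():
    results = await asyncio.gather(fetch("k"), fetch("k"), fetch("k"))
    print(results, calls)  # three identical results, one actual call

asyncio.run(main())
```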

How do you use vercel inspect?

Use vercel inspect [deployment-id-or-url] to retrieve deployment info. Add --logs to include logs and --wait to wait for the deployment to be ready. (Vercel CLI docs)

February 23, 2026

Is a BK-tree the only option for near-duplicate word lookup?

No. A BK-tree is just a strong fit for typo-distance queries (edit distance <= k). It works best for small k (1–2) and supports dynamic inserts, which makes it a common choice for autocomplete and spell-checking. Other options include normalized set lookups, phonetic hashing, or SymSpell depending on accuracy, speed, and memory needs.

Why is BK-tree search fast in practice for word-level dedup?

Edit distance per comparison is O(n*m) but words are short, so the cost is small. A BK-tree only computes distances for nodes it visits, and the triangle-inequality pruning keeps that set small when k is low (usually 1–2). Worst-case behavior exists for large k or skewed trees, but typical spell-check workloads are fast.

How long are most English words in practice?

Using Norvig’s Google Books analysis (words with ≥100k mentions), the 99th percentile is around ~12 letters for word tokens and ~15 letters for distinct word types. This keeps per-comparison edit-distance costs small for most workloads.

Why is the edit-distance base case the remaining length?

In DP, dp[i][j] is the cost to transform word1[i:] into word2[j:]. If i == len(word1), the source suffix is empty, so the only valid operations are inserting all remaining characters of word2[j:], costing len(word2) - j. If j == len(word2), the only valid operations are deleting the remaining word1[i:] characters, costing len(word1) - i.
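The base cases in code, using the same suffix-DP formulation (a sketch with a classic example pair):

```python
# Bottom-up edit distance; base cases are the remaining suffix lengths.
def edit_distance(word1: str, word2: str) -> int:
    n, m = len(word1), len(word2)
    dp = [[0] * (m + 1) for _ in range(n + 1)]
    for j in range(m + 1):
        dp[n][j] = m - j          # word1 suffix empty: insert rest of word2
    for i in range(n + 1):
        dp[i][m] = n - i          # word2 suffix empty: delete rest of word1
    for i in range(n - 1, -1, -1):
        for j in range(m - 1, -1, -1):
            if word1[i] == word2[j]:
                dp[i][j] = dp[i + 1][j + 1]
            else:
                dp[i][j] = 1 + min(dp[i + 1][j],      # delete
                                   dp[i][j + 1],      # insert
                                   dp[i + 1][j + 1])  # replace
    return dp[0][0]

print(edit_distance("horse", "ros"))  # 3
```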

How does Python decide whether to load a module or reuse it?

Python first checks the module cache in sys.modules. If the module is already present, import returns it without re-executing. If not cached, the import machinery searches via sys.meta_path/sys.path, creates a module object, inserts it into sys.modules, and then executes the module code.
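The cache is observable via sys.modules:

```python
import sys
import json

# Second import is a cache hit: same module object, no re-execution.
import json as json_again
print(sys.modules["json"] is json is json_again)  # True
```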

Is ... the same as pass in Python?

No. pass is a statement that does nothing and is used where a statement is required. ... is the Ellipsis object, an expression you can use in code (e.g., type stubs or slicing). In a function body, ... acts as a no-op expression statement, but it is not the same as pass.
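Side by side:

```python
# pass is a statement; ... is an expression (the Ellipsis singleton).
def stub():
    ...  # no-op expression statement, common in type stubs

class Empty:
    pass  # a statement is required here; pass fills the slot

print(... is Ellipsis)  # True
print(stub() is None)   # True: the body has no return
```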

How do you force-open Docker Desktop from the command line on macOS?

Use open -a Docker. If it doesn't show, try open -a Docker --args --unattended to launch in the background.

How do you stage only certain lines or hunks in git?

Use interactive staging: git add -p. It shows each hunk and lets you pick y/n, split hunks with s, or edit them with e to stage specific lines.

Example:

git add -p
# y = stage hunk
# n = skip hunk
# s = split hunk
# e = edit hunk

What is the one pruning rule that makes BK-tree fast?

At a node, only follow child edges whose distance label is within k of the query’s distance to that node. Formula: abs(edge_dist - d) <= k, where d = dist(query, current_node). Equivalently, d - k <= edge_dist <= d + k. This rule safely prunes entire branches and is the core speedup.
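A minimal BK-tree built around exactly this rule (a sketch, not production code; dist is a plain Levenshtein DP):

```python
# BK-tree sketch using the pruning rule abs(edge_dist - d) <= k.
def dist(a: str, b: str) -> int:
    # Levenshtein distance, simple O(n*m) DP over two rows
    prev = list(range(len(b) + 1))
    for i, ca in enumerate(a, 1):
        cur = [i]
        for j, cb in enumerate(b, 1):
            cur.append(min(prev[j] + 1, cur[-1] + 1, prev[j - 1] + (ca != cb)))
        prev = cur
    return prev[-1]

class BKTree:
    def __init__(self, word: str):
        self.word, self.children = word, {}  # edge distance -> subtree

    def add(self, word: str):
        d = dist(word, self.word)
        if d in self.children:
            self.children[d].add(word)
        else:
            self.children[d] = BKTree(word)

    def search(self, query: str, k: int, out=None):
        out = [] if out is None else out
        d = dist(query, self.word)
        if d <= k:
            out.append(self.word)
        for edge, child in self.children.items():
            if d - k <= edge <= d + k:  # the pruning rule
                child.search(query, k, out)
        return out

t = BKTree("book")
for w in ["books", "cake", "boo", "cape", "boon"]:
    t.add(w)
print(sorted(t.search("bool", 1)))  # words within edit distance 1 of "bool"
```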

How long does it take to restart a paused Supabase project?

Community reports indicate that resuming a paused Supabase project can take a few minutes, and sometimes longer depending on load. Plan for a non-instant resume when scheduling work that depends on a paused project.

February 22, 2026

How does Python unpacking work with __iter__?

Python unpacking (a, b = x) calls iter(x) under the hood. If your class implements __iter__ and yields values in order, it can be unpacked like a tuple. This is useful for returning object data ergonomically while keeping named attributes.

class Point:
    def __init__(self, x, y):
        self.x = x
        self.y = y

    def __iter__(self):
        yield self.x
        yield self.y

p = Point(3, 5)
a, b = p
print(a, b)  # 3 5

# Also works in loops/functions expecting iterables:
print(list(p))  # [3, 5]

February 20, 2026

Why is ripgrep (rg) useful when working with LLMs on codebases?

rg is fast enough to repeatedly narrow context while iterating with an LLM. It helps you locate symbols, call sites, and file sets quickly so prompts and edits are grounded in real code instead of guesswork. Typical pattern: rg --files to discover, rg -n to locate, then open only relevant files.

# find candidate files
rg --files | rg "(route|controller|service)"

# find symbol usage with line numbers
rg -n "createBookmark|upsertBookmark|/api/admin"

# scope searches by extension
rg -n "TODO|FIXME" src -g "*.ts"

Does Resend use React Email or its own framework for emails?

Resend is an email delivery API/platform, not a required custom rendering framework. You can send html, text, or a React element via the SDK. The common React setup is to build templates with @react-email/components (React Email) and pass them to Resend.

Sources:

  • https://resend.com/docs/api-reference/emails/send-email
  • https://github.com/resend/resend-node
  • https://resend.com/docs/dashboard/emails/email-templates
  • https://resend.com/blog/react-email-5

import { Resend } from 'resend';
import { Html, Body, Container, Heading, Text } from '@react-email/components';

const resend = new Resend(process.env.RESEND_API_KEY);

function WelcomeEmail({ name }: { name: string }) {
  return (
    <Html>
      <Body>
        <Container>
          <Heading>Welcome</Heading>
          <Text>Hey {name}, thanks for joining.</Text>
        </Container>
      </Body>
    </Html>
  );
}

await resend.emails.send({
  from: 'Acme <onboarding@resend.dev>',
  to: ['user@example.com'],
  subject: 'Welcome',
  react: <WelcomeEmail name="Andrew" />,
});

How does Vercel Cron work in simple terms?

You register scheduled jobs that point to your own REST endpoints. At the scheduled time, Vercel sends an HTTP request to that endpoint, and your function runs on Vercel compute. Endpoint auth is optional, but recommended. If you set CRON_SECRET in project env vars, Vercel includes Authorization: Bearer <CRON_SECRET> on cron requests; your route should verify that header and reject anything else.

Architecture (simple)

[vercel.json cron config]
        |
        v
[Vercel Scheduler]
        |
        | HTTP GET at scheduled time (UTC)
        | Authorization: Bearer <CRON_SECRET> (if configured)
        v
[/api/cron/daily-task endpoint on your app]
        |
        | verify bearer token
        v
[run job logic: send email / sync / cleanup]

Example guard (Next.js):

import { NextRequest, NextResponse } from 'next/server';

export async function GET(req: NextRequest) {
  const auth = req.headers.get('authorization');
  if (auth !== `Bearer ${process.env.CRON_SECRET}`) {
    return NextResponse.json({ error: 'unauthorized' }, { status: 401 });
  }

  // do scheduled work
  return NextResponse.json({ ok: true });
}

How do you publish an npm package reliably?

Use a repeatable checklist: pick a unique package name, set bin/files/engines, build first, npm login, then npm publish --access public for scoped public packages. Add prepublishOnly so publish always builds from a clean step.

# inside package folder
npm whoami
npm version patch
npm run build
npm pack --dry-run
npm publish --access public

# one-time local install test
npm i -g .
websitectl --help

What is the practical difference between agent tools, skills, and a knowledge base?

Tools perform actions (side effects), skills define reusable workflow behavior, and a knowledge base stores facts/content for retrieval. For this project: bookmark add is a tool call, curation rubrics are skills, and learnings/bookmarks data is knowledge.

Tool:    bookmark_add(url) -> writes to DB
Skill:   "bookmark curation checklist" -> guides decisions
KB:      learnings/bookmarks -> retrieved context for answers

Can TypeScript repos support multiple projects like Go modules?

Yes. Use npm/pnpm/yarn workspaces for multiple packages and TypeScript project references for incremental typed builds across package boundaries. A common split is apps/web, packages/contracts, and packages/cli.

{
  "workspaces": ["apps/*", "packages/*"],
  "scripts": {
    "build": "turbo run build"
  }
}

# tsconfig project reference
{
  "references": [{ "path": "../contracts" }]
}

What are the practical workspace options beyond Turborepo, and should we use npm workspaces?

For a minimal setup, start with npm workspaces only. Add Turborepo later if build orchestration and caching become painful. Common options: npm workspaces (built-in, simplest), pnpm workspaces (faster installs, strict dependency graph), Yarn workspaces (good but extra tooling conventions), Turborepo/Nx (task orchestration), and Lerna/Changesets (versioning/publishing flows).

# minimal root package.json
{
  "private": true,
  "workspaces": ["apps/*", "packages/*", "tools/*"]
}

# install all workspace deps
npm install

# run a script in one workspace
npm run build --workspace websitectl

What is the standard automation PR flow for content/code updates?

Typical flow: automation creates a branch and commit, opens a PR to the default branch, CI runs checks, and if repository rules allow it, automation enables auto-merge or merges directly after checks pass.

1) create branch + commit
2) open PR -> main
3) run CI (lint/test/build)
4) merge path:
   - auto-merge (if checks pass + rules satisfied), or
   - direct merge by bot/user with permissions

What happens to deploys if Vercel is not connected to the Git repo?

No automatic preview/production deploys are triggered by PR events or merges. Deploys must be triggered manually (for example via vercel --prod or Vercel API). In this mode, merging a PR does not redeploy by itself.

# manual production deploy
npm run deploy

# deploy script (example)
# vercel --prod

How do npm packages control what is exposed to consumers?

Use exports to define importable entry points, bin for CLI command exposure, and files to control what gets published in the tarball. Anything not included by exports is effectively private to consumers, even if it exists in the package contents.

{
  "name": "websitectl",
  "bin": {
    "websitectl": "dist/cli.js"
  },
  "exports": {
    ".": "./dist/index.js"
  },
  "types": "./dist/index.d.ts",
  "files": ["dist", "README.md", "LICENSE"]
}

Why does C++ have and, or, and not?

They are alternative operator tokens for readability and historical portability. In C++, they are keywords with identical behavior to symbolic operators: and == &&, or == ||, not == !.

bool a = true;
bool b = false;

if (a and not b) {
  // same as: if (a && !b)
}

Do extra tabs or spaces affect Python performance?

No. Python parses indentation for block structure, then compiles to bytecode. Extra whitespace does not add runtime cost. The real downside is readability and maintainability for humans.

def a():
    return 1


def b():
        return 1

# Runtime difference is effectively none; style/readability is the concern.

Is there a performance difference between unnecessary async def + await vs plain def?

Yes. A pure async call path can be around an order of magnitude slower per call (commonly microseconds vs hundreds of nanoseconds) due to coroutine object creation and scheduling overhead. Usually negligible compared to real network I/O, but unnecessary async adds cognitive/debugging cost.

def sync_fn(x):
    return x + 1

async def async_fn(x):
    return x + 1

# sync_fn(1) -> direct value
# await async_fn(1) -> coroutine lifecycle overhead

What does Python's "coroutine was never awaited" warning mean?

An async def function was called, producing a coroutine object, but nothing awaited it, so its body never executed. Python warns when that abandoned coroutine is garbage-collected.

async def make_stream():
    return client.stream("POST", url)

# wrong: make_stream() returns a coroutine
async with make_stream() as r:
    ...

# right: await first, then use returned object
stream_cm = await make_stream()
async with stream_cm as r:
    ...

Why does async def change calling behavior even when there is no await in the body?

async def always returns a coroutine object when called. You must await it to get the underlying return value. A plain def returns the value directly.

def sync_fn():
    return "hello"

async def async_fn():
    return "hello"

sync_fn()         # "hello"
async_fn()        # <coroutine object ...>
await async_fn()  # "hello"

When should you use async def vs plain def?

Use async def only when the function body needs to await async operations. If a function only computes/returns values or passes through objects synchronously, keep it as plain def.

def build_payload(msg: str) -> dict:
    return {"message": msg}

async def send_payload(client, payload: dict):
    return await client.post("/send", json=payload)

How do coroutines/tasks/promises compare across Python and Node.js, and what about threads/GIL/multiprocessing?

Python async def returns a coroutine; asyncio.create_task(...) schedules it. Node async functions return a Promise immediately and execution is scheduled on the event loop. Both are event-loop concurrency models for I/O. Node JavaScript execution is single-threaded per process, with background threadpool help for some I/O/crypto. Python supports both threads and async, but CPython's GIL means CPU-bound Python bytecode does not run truly in parallel across threads; use multiprocessing for parallel CPU work. Node's multi-core scaling is usually multiple processes (cluster, process managers, container replicas, worker-based patterns). Python's equivalent is multiprocessing/process pools.

# Python
async def work():
    ...

coro = work()                         # coroutine object (not scheduled yet)
task = asyncio.create_task(work())    # scheduled task
result = await task

# Node.js
async function work() {
  // ...
}

const p = work();                      // Promise (already started by async function semantics)
const result = await p;

# CPU scaling
# Python: multiprocessing / ProcessPoolExecutor
# Node: cluster / multiple processes / worker patterns

Why can QUIC reduce handshake latency vs TCP, and what are the tradeoffs?

TCP+TLS typically needs multiple round trips before application data is flowing on a fresh connection. QUIC (over UDP) combines transport + crypto setup so initial handshakes can require fewer RTTs, and resumed connections can be faster (0-RTT). This helps high-latency links (for example mobile networks). Tradeoffs: UDP blocking/throttling on some networks, more CPU complexity in user-space stacks, 0-RTT replay considerations, and operational debugging can be harder than mature TCP tooling.

Fresh connection (typical):
TCP + TLS: more round trips before useful data
QUIC: fewer round trips; can use 0-RTT on resume

Pros:
- lower connection setup latency
- better behavior under packet loss (stream-level in HTTP/3)

Cons:
- UDP may be filtered/rate-limited in some environments
- implementation/ops complexity
- 0-RTT replay risk for non-idempotent requests

What is out-of-order execution and why does it matter in practice?

Modern CPUs can execute independent instructions out of program order to keep execution units busy, then retire results in-order for correctness. This hides some memory/latency stalls and is why source order is not equal to execution order at the microarchitectural level. Data dependencies, branch mispredicts, and cache misses still limit gains.

Program order:
A: load x
B: add y
C: store z

CPU may execute B before A if independent, then retire in-order.
Net effect: better pipeline utilization, but dependency chains still bottleneck.

How does Amdahl's Law limit parallel speedups?

If a fraction of work is inherently serial, total speedup has a hard upper bound no matter how many cores you add. Even with infinite processors, max speedup is 1 / serial_fraction. This is why reducing serial bottlenecks often beats adding more workers.

Amdahl's Law:
S(N) = 1 / (s + (1 - s)/N)

Example: s = 0.1 (10% serial)
Max theoretical speedup as N -> infinity:
1 / 0.1 = 10x
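The formula above as code, showing how far real core counts fall short of the ceiling:

```python
# Amdahl's Law: S(N) = 1 / (s + (1 - s) / N), s = serial fraction
def speedup(serial_fraction: float, n: int) -> float:
    return 1 / (serial_fraction + (1 - serial_fraction) / n)

print(round(speedup(0.1, 10), 2))    # 10 cores: 5.26, far below the 10x ceiling
print(round(speedup(0.1, 1000), 2))  # 1000 cores: 9.91, still under 10x
```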

What is the core "ngmi" takeaway for engineering teams?

In fast AI-adoption cycles, rankings can change quickly: engineers who operationalize new tools early may leapfrog higher-ranked peers who stay in disbelief too long. The practical lesson is to optimize for adaptation speed, not just historical performance.

Source: https://ghuntley.com/ngmi/

Old ranking stability assumption:
A > B > C (for years)

AI adoption shock:
C adopts early + compounds usage
B delays + underestimates shift

New output ranking can flip within quarters.

What does the "time to oh-f moment" idea imply?

The longer someone remains in early denial/disbelief stages, the longer it takes to recognize they are being outpaced. Teams should shorten this loop with mandatory hands-on practice, not passive observation.

Source: https://ghuntley.com/ngmi/

Belief-stage lag (concept):
longer disbelief -> slower behavior change -> larger output gap

Intervention:
- set adoption deadlines
- require weekly usage artifacts
- review outcomes, not opinions

How can org economics shift without immediate layoffs in AI transitions?

One pattern is silent restructuring: normal attrition plus selective backfills with AI-native engineers can change team composition over time without explicit mass layoffs. The strategic signal is hiring and replacement policy, not just headline layoffs.

Source: https://ghuntley.com/ngmi/

Headcount stays near flat short-term
while composition shifts over replacement cycles:

attrition -> backfill with higher leverage profile
=> rising average output per seat

Why can -> Generator[str] crash at runtime even though Python does not enforce type annotations?

Type annotations are not runtime-validated against values, but they are still evaluated as Python expressions at import time (unless postponed). On Python 3.12 and earlier, Generator[str] is an invalid generic arity for typing.Generator, which requires all three parameters, so the annotation expression itself fails during import. (Python 3.13 added defaults for SendType and ReturnType, so Generator[str] becomes valid there.)

from typing import Generator, Iterator

# CRASHES at import on Python <= 3.12 (Generator needs 3 params there)
def stream_bad(prompt: str) -> Generator[str]:
    yield "hello"

# OK: Iterator needs one type param
def stream_ok(prompt: str) -> Iterator[str]:
    yield "hello"

from __future__ import annotations

# OK: annotation is deferred (stored as string)
def stream_deferred(prompt: str) -> Generator[str]:
    yield "hello"

Why does httpx annotate client.stream() as AsyncIterator[Response] when you use it with async with?

The implementation uses @asynccontextmanager over an async generator that yields one Response. Stubs often reflect the generator yield type, but the returned object is an async context manager to be entered with async with, not awaited as a value.

# source shape (simplified)
@asynccontextmanager
async def stream(...) -> AsyncIterator[Response]:
    response = await self.send(request, stream=True)
    try:
        yield response
    finally:
        await response.aclose()

# usage
async with client.stream("POST", url) as response:  # right
    ...

r = await client.stream("POST", url)                # wrong

Why can't you create a context manager inside a method and return the resource?

Returning exits the async with block immediately, so cleanup runs and the resource is closed before the caller uses it. Use yield from an async generator so the context stays open across emitted items.

# broken: resource closes on return
async def stream_broken(self, prompt):
    async with self.client.stream("POST", ...) as r:
        return AsyncChatStream(r)

# works: yield keeps context alive
async def stream_ok(self, prompt):
    async with self.client.stream("POST", ...) as r:
        async for chunk in parse_sse_data_line_async(r):
            yield chunk

Why is _AsyncGeneratorContextManager not awaitable even though it is "async"?

"Async" and "awaitable" are different protocols. Awaitables implement __await__. Async context managers implement __aenter__/__aexit__. client.stream() returns a context manager, so use async with, not await.

cm = client.stream("POST", url)

await cm             # TypeError: not awaitable

async with cm as r:  # correct
    ...

Why is streaming inherently a leaky abstraction?

A non-streaming call returns a finished value. A stream is an ongoing relationship: connection lifetime, cancellation, backpressure, and mid-stream errors all matter. Callers must participate in lifecycle handling; wrappers can reduce but not eliminate this complexity.

non_streaming() -> value
streaming() -> ongoing resource relationship

You must handle:
- open/close lifecycle
- cancellation
- partial failures
- consumer-driven stop conditions

What's the cleanest pattern for async streaming abstractions in Python?

Async generators. They hold the context manager open via yield — the context manager's lifetime equals the stream's lifetime, not the method's lifetime. return exits the method and triggers __aexit__ (closing the connection). yield suspends the method, keeping the async with block alive until the caller stops iterating. In sync generators, delegation is yield from. In async generators there is no async yield from, so delegation is async for ...: yield ....

# Async: no 'async yield from' syntax, so delegate with async-for + yield
async def stream_ok(self, prompt):
    async with self.client.stream("POST", ...) as r:
        async for chunk in parse_sse_data_line_async(r):
            yield chunk

# Async broken: return exits async-with immediately and closes r
async def stream_broken(self, prompt):
    async with self.client.stream("POST", ...) as r:
        return AsyncChatStream(r)

# Sync equivalent: yield-from exists and delegates directly
def stream_sync(self, prompt):
    with self.client.stream("POST", ...) as r:
        yield from parse_sse_data_line(r)

What is SendType in Generator[YieldType, SendType, ReturnType] and AsyncGenerator[YieldType, SendType]?

Generators are two-way: they can yield values out and also receive values back in via .send() / .asend(). SendType is the type accepted on that inbound channel. This pattern exists but is niche in modern async code; most streaming generators only yield outward and never receive input, so SendType is usually None. That is why AsyncIterator[T] is often preferred over AsyncGenerator[T, None] in API signatures.

# Async generator that yields out and receives values in
async def chat():
    while True:
        user_msg = yield "ready"
        yield f"you said: {user_msg}"

g = chat()
await anext(g)            # "ready"
await g.asend("hello")   # "you said: hello"
await g.asend("world")   # "ready"

# In stream APIs, inbound .asend(...) is usually unused:
# prefer AsyncIterator[T] for simpler signatures.

What is generator.send() and why does it exist?

send(value) resumes a suspended generator and makes the yield expression evaluate to value. next() also resumes, but yield evaluates to None. This makes generators two-way: they can yield values out and receive values in.

This capability was formalized in PEP 342 (2005) so generators could be used as coroutines before native async/await existed (PEP 492, 2015). That history is why Generator and AsyncGenerator include SendType in their type parameters. In modern streaming/iteration code, Iterator[T] / AsyncIterator[T] is usually preferred because inbound send is rarely used.

def accumulator():
    total = 0
    while True:
        value = yield total   # OUT: yield total | IN: receive sent value
        total += value

g = accumulator()
next(g)        # 0   - prime generator (yield receives None)
g.send(10)     # 10  - value=10, total=10
g.send(5)      # 15  - value=5, total=15

# Typing shapes:
# Generator[YieldType, SendType, ReturnType]
# AsyncGenerator[YieldType, SendType]
# Iterator[YieldType] / AsyncIterator[YieldType] (preferred for modern stream APIs)

Does asyncio.FIRST_COMPLETED treat exceptions as "completed"?

Yes. A task is "done" whether it returned a value or raised an exception. With FIRST_COMPLETED, the first completed task may be a failure, so robust hedging logic should skip failed tasks and continue waiting on pending ones.

# bad: assumes first done means success
done, pending = await asyncio.wait(tasks, return_when=asyncio.FIRST_COMPLETED)
result = next(iter(done)).result()  # may raise

# good: keep waiting until one succeeds (or all fail)
success = None
while tasks:
    done, pending = await asyncio.wait(tasks, return_when=asyncio.FIRST_COMPLETED)
    for t in done:
        try:
            success = t.result()
            break
        except Exception:
            pass
    if success is not None:
        for t in pending:
            t.cancel()  # cancel the losers so they don't leak
        break
    tasks = list(pending)

What control methods do generators expose, and what does .close() / .aclose() actually do?

Generators are iterators plus bidirectional control: send, throw, and close (async: asend, athrow, aclose). close/aclose injects GeneratorExit at the suspension point, which unwinds finally and context managers for cleanup.

# sync generator controls:
# __next__(), send(val), throw(exc), close()

# async generator controls (awaitable):
# __anext__(), asend(val), athrow(exc), aclose()

async def stream():
    async with open_connection() as conn:
        while True:
            yield await conn.read()
            # aclose() injects GeneratorExit here -> __aexit__ -> connection closed

How are async generators closed (break, aclose, cancel, Ctrl+C, GC), and which paths are safe?

break/return from async for triggers automatic aclose() (safe). Explicit await gen.aclose() is safe. task.cancel() is safe only if you also await task completion. asyncio.run() on Ctrl+C cancels tasks and processes cleanup. GC finalization is risky for async generators because the collector cannot await cleanup.

Closure paths:
- break/return from async for -> Python calls aclose() (safe)
- await gen.aclose() (safe)
- task.cancel(); await gather(...) (safe)
- Ctrl+C under asyncio.run() -> task cancellation chain (safe)
- GC finalization of orphaned async generators (unsafe/racy)

What causes aclose(): asynchronous generator is already running, and how do you fix it?

It usually happens when async generators are orphaned and GC tries to finalize them while async cleanup is still in progress. Root issue: synchronous GC cannot await async cleanup. Fix by cancelling pending tasks and awaiting them so cleanup finishes in event-loop context before shutdown.

for t in pending:
    t.cancel()

# Required: let cancellation propagate and cleanup complete
await asyncio.gather(*pending, return_exceptions=True)

# Now generators are closed and connections released

When should you use task.cancel(), asyncio.Event, threading.Event, or asyncio.to_thread()?

Use task.cancel() for I/O tasks that regularly await. Use asyncio.Event for broadcast stop signals or cooperative shutdown across many coroutines. Use threading.Event for real OS threads. Use asyncio.to_thread() to bridge unavoidable blocking sync code into async without blocking the event loop.

# task cancellation
task.cancel()

# broadcast shutdown signal
shutdown = asyncio.Event()
# coroutines check shutdown.is_set()

# thread coordination
stop = threading.Event()

# bridge sync function into async
resp = await asyncio.to_thread(requests.get, url)
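
A minimal runnable sketch of the asyncio.Event broadcast pattern (worker/main are illustrative names): one set() stops every cooperating coroutine.

```python
import asyncio

async def worker(stop: asyncio.Event) -> int:
    ticks = 0
    while not stop.is_set():    # cooperative check of the broadcast signal
        ticks += 1
        await asyncio.sleep(0)  # yield control; real I/O would go here
    return ticks

async def main() -> int:
    stop = asyncio.Event()
    tasks = [asyncio.create_task(worker(stop)) for _ in range(3)]
    await asyncio.sleep(0.01)
    stop.set()                  # one set() stops every worker
    results = await asyncio.gather(*tasks)
    return sum(results)

total = asyncio.run(main())
print(total > 0)  # True: each worker ran at least one iteration
```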

Why did chunk get inferred as Any when iterating over an async generator?

The function used yield (generator behavior) but was annotated like it returned a single item type. That mismatch can degrade inference to Any. For async generators, annotate -> AsyncIterator[T]; for sync generators, annotate -> Iterator[T].

from collections.abc import AsyncIterator

async def stream_chunks(...) -> AsyncIterator[ChatStreamingResponseChunkData]:
    yield chunk

Why did r get inferred as Any after await fetch_chat_stream_async()?

In a streaming helper, the call site omitted a required argument (model). Once the function call is invalid, type checkers often fall back to Any, and that Any propagates to r.

# bad: missing `model`, so inference degrades
r = await fetch_chat_stream_async(messages=msgs)

# good: full signature restores inferred type
r = await fetch_chat_stream_async(model="openai/gpt-4o-mini", messages=msgs)

Why did the type checker show coroutine attributes (cr_running, cr_await) instead of httpx.Response attributes?

The stream function used a sync client type (httpx.Client) in async code. httpx.Client.post returns a plain httpx.Response, which is not awaitable, so await client.post(...) is invalid; the checker's inference then degrades and surfaces generic coroutine attributes instead of a concrete httpx.Response.

# bad: sync client typed in async flow
async def fetch(client: httpx.Client, url: str):
    return await client.post(url)  # invalid await

# good: async client + await
async def fetch(client: httpx.AsyncClient, url: str) -> httpx.Response:
    return await client.post(url)

Why did with httpx.AsyncClient() raise AttributeError: __enter__?

AsyncClient only supports async context manager hooks (__aenter__/__aexit__). In streaming code you must use async with, including the nested client.stream(...) block.

async with httpx.AsyncClient(timeout=None) as client:
    async with client.stream("POST", url, json=payload) as r:
        async for line in r.aiter_lines():
            ...

Why do async with and async for exist?

In httpx streaming, enter/exit and line reads can block on network I/O. async with and async for use awaitable hooks (__aenter__/__aexit__, __anext__) so the loop can schedule other tasks while waiting for bytes.

async with client.stream("GET", url) as r:   # async acquire/release
    async for line in r.aiter_lines():        # async next chunk
        ...

What happens if iter_lines() and aiter_lines() are mixed with the wrong loop style?

Use sync iterator + for (iter_lines) in sync code, and async iterator + async for (aiter_lines) in async code. Crossing them raises type/runtime errors (for example, async for over iter_lines() or for over aiter_lines()).

# bad 1: async for on sync iterator
async with client.stream("GET", url) as r:
    async for line in r.iter_lines():
        ...  # TypeError: object has no __aiter__

# bad 2: sync for on async iterator
async with client.stream("GET", url) as r:
    for line in r.aiter_lines():
        ...  # TypeError: async_generator is not iterable

# good
async with client.stream("GET", url) as r:
    async for line in r.aiter_lines():
        ...

How can sync streaming code be used from async code as a workaround?

Wrap the sync function in asyncio.to_thread(...) so it runs in a worker thread and does not block the event loop. This is a bridge strategy when you cannot migrate sync stream code immediately.

import asyncio
import httpx

def read_sync_stream(url: str) -> None:
    with httpx.Client(timeout=None) as client:
        with client.stream("GET", url) as r:
            for line in r.iter_lines():
                if line:
                    print(line)

# async caller
await asyncio.to_thread(read_sync_stream, url)

How do you cancel an httpx async stream with Event?

Use an asyncio.Event as a cooperative stop signal. Check event.is_set() inside the read loop, then break so the async with block closes the response cleanly.

stop_event = asyncio.Event()

async def consume(url: str) -> None:
    async with httpx.AsyncClient(timeout=None) as client:
        async with client.stream("GET", url) as r:
            async for line in r.aiter_lines():
                if stop_event.is_set():
                    break
                if line:
                    handle(line)

# elsewhere (button handler, shutdown task, etc.)
stop_event.set()

How do you limit concurrent async streams?

Guard each stream operation with an asyncio.Semaphore so only N streams run at once. This prevents connection spikes and keeps resource usage predictable.

sem = asyncio.Semaphore(5)

async def stream_one(client: httpx.AsyncClient, url: str) -> None:
    async with sem:
        async with client.stream("GET", url) as r:
            async for line in r.aiter_lines():
                if line:
                    handle(line)

async with httpx.AsyncClient(timeout=None) as client:
    await asyncio.gather(*(stream_one(client, u) for u in urls))

Why did '_AsyncGeneratorContextManager' has no attribute 'aiter_lines' occur?

client.stream(...) returns a context manager, not a response object. Enter it first, then iterate on the response.

async with client.stream("POST", url, json=payload) as r:
    async for line in r.aiter_lines():
        ...

Why does httpx require a context manager for streaming?

Streaming keeps a connection open; context managers guarantee cleanup and return connections to the pool. The manual path exists (client.send(req, stream=True)), but then you must explicitly call await response.aclose().

Does httpx have stream=True on client.post()?

No. client.post() is the eager-body path. Incremental reads use client.stream(...) (or async equivalent), which makes stream lifecycle explicit and safer.

Can you do httpx.get() like requests.get()?

Yes for sync usage. For async usage, use httpx.AsyncClient and await client.get(...); not await httpx.get(...) in async-first code.

Is there an async-first HTTP library with a simple await get(url) API?

Python typically uses client objects for async HTTP to manage pools, timeouts, and lifecycle explicitly. JS environments expose fetch globally because runtime/event-loop integration is different.

How do you annotate return types for async def functions?

Annotate the awaited value type (for example -> str), not Awaitable[str] in normal function signatures. Tooling may still display Coroutine[Any, Any, T] at call sites before awaiting.

async def foo() -> str:
    return "ok"

How do you make an async iterator with generator syntax?

Use async def with yield to create an async generator, then consume with async for.

async def gen():
    yield 1

async def main():
    async for x in gen():
        ...

What happens if you use sync for on an async iterator?

It raises a TypeError because protocols differ: sync iteration expects __iter__/__next__, while async iteration uses __aiter__/__anext__.

Does httpx aiter_lines() return str or bytes?

str. In contrast, requests.iter_lines() is commonly handled as bytes unless decoding is enabled (for example decode_unicode=True) or manual decoding is applied.

Is httpx the same as requests but async?

Mostly in API style: httpx adds async support via AsyncClient, includes HTTP/2 support, and uses httpcore. requests is built around sync urllib3 and remains sync-focused.

Is there a structural typing mechanism in Python like TypeScript interfaces?

Yes: typing.Protocol. Any class matching the required shape satisfies it without inheritance. collections.abc also provides standard protocol abstractions.

from typing import Protocol

class StreamLike(Protocol):
    async def __anext__(self): ...
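
A runnable sketch with runtime_checkable (HasLen and Box are illustrative names): any class with the right shape satisfies the protocol, with no inheritance.

```python
from typing import Protocol, runtime_checkable

@runtime_checkable
class HasLen(Protocol):
    def __len__(self) -> int: ...

class Box:  # note: no inheritance from HasLen
    def __len__(self) -> int:
        return 3

print(isinstance(Box(), HasLen))  # True: structural match
print(isinstance(42, HasLen))     # False: int has no __len__
```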

How do I stream LLM tokens with httpx, and what is the requests equivalent?

Use API-level streaming ("stream": true) so the server emits incremental events, then use HTTP client streaming so your app reads chunks as they arrive instead of buffering the whole body. In requests, you enable this with stream=True. In httpx, you use client.stream(...) / async with client.stream(...). This lowers time-to-first-token in real UIs because tokens can be rendered immediately.

# requests (sync)
payload = {"model": "openai/gpt-4o-mini", "messages": msgs, "stream": True}
with requests.post(url, json=payload, headers=headers, stream=True) as r:
    for line in r.iter_lines(decode_unicode=True):
        if line and line.startswith("data:"):
            print(line)  # parse SSE chunk and render token

# httpx (async)
async with httpx.AsyncClient(timeout=None) as client:
    async with client.stream("POST", url, json=payload, headers=headers) as r:
        async for line in r.aiter_lines():
            if line and line.startswith("data:"):
                print(line)  # parse SSE chunk and render token

How does asyncio.wait (return_when) compare to gather and Node Promise combinators?

asyncio.wait(tasks, return_when=...) gives you low-level control over completion conditions and returns (done, pending). FIRST_COMPLETED is useful for hedged requests (use the fastest provider), FIRST_EXCEPTION for fail-fast orchestration, and ALL_COMPLETED for barrier-style completion. asyncio.gather(...) is higher-level result aggregation in positional order. In Node.js, closest matches are Promise.all (fail fast, ordered results) and Promise.allSettled (wait for all outcomes regardless of failures).

import asyncio

async def fetch(provider):
    ...

async def hedged_request():
    tasks = {asyncio.create_task(fetch(p)) for p in ["openai", "anthropic", "groq"]}
    done, pending = await asyncio.wait(tasks, return_when=asyncio.FIRST_COMPLETED)

    winner = done.pop()
    try:
        return await winner
    finally:
        for t in pending:
            t.cancel()  # cancel slower duplicates

# Other modes:
# await asyncio.wait(tasks, return_when=asyncio.FIRST_EXCEPTION)
# await asyncio.wait(tasks, return_when=asyncio.ALL_COMPLETED)

# gather: ordered aggregation
# results = await asyncio.gather(*tasks, return_exceptions=True)

# Node.js analogs:
# await Promise.all(promises)        // fail-fast
# await Promise.allSettled(promises) // collect all outcomes

What is the difference between a coroutine and asyncio.create_task()?

Calling an async def function returns a coroutine object, but it does not start running until it is awaited (or otherwise scheduled). asyncio.create_task(coro) schedules it on the event loop immediately, so it can run concurrently before you later await the task result.

import asyncio

async def work(name):
    await asyncio.sleep(1)
    return name

async def main():
    coro = work("A")          # not running yet
    result = await coro        # starts and waits

    task = asyncio.create_task(work("B"))  # scheduled now
    # do other work here while B runs
    result2 = await task
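
A runnable sketch of the scheduling difference (names are illustrative): the task starts on the next loop turn, while the bare coroutine starts only when awaited.

```python
import asyncio

order = []

async def work(name: str) -> str:
    order.append(f"start {name}")
    await asyncio.sleep(0)
    order.append(f"end {name}")
    return name

async def main() -> None:
    coro = work("A")                       # coroutine object: not running yet
    order.append("created coro")
    task = asyncio.create_task(work("B"))  # scheduled on the loop right away
    order.append("created task")
    await asyncio.sleep(0)                 # give the loop one turn: B starts
    await coro                             # A only starts here, when awaited
    await task

asyncio.run(main())
print(order)
```

Note that "start B" appears before "start A" even though B was created later: create_task schedules immediately, awaiting a bare coroutine does not.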

What are the async versions of iterator, generator, and context-manager dunder hooks?

Sync iteration uses __iter__ + __next__ (for) and ends with StopIteration. Async iteration uses __aiter__ + __anext__ (async for) and ends with StopAsyncIteration. Sync context managers use __enter__ + __exit__ (with). Async context managers use __aenter__ + __aexit__ (async with). Generators follow the same split: def ... yield for sync generators, async def ... yield for async generators consumed with async for.

# Iteration hooks
class SyncIteratorExample:
    def __iter__(self):
        return self
    def __next__(self):
        # End sync iteration
        raise StopIteration

class AsyncIteratorExample:
    def __aiter__(self):
        return self
    async def __anext__(self):
        # End async iteration
        raise StopAsyncIteration

# Context manager hooks
class SyncContextManagerExample:
    def __enter__(self):
        # Acquire resource
        ...
    def __exit__(self, exc_type, exc, tb):
        # Release resource
        ...

class AsyncContextManagerExample:
    async def __aenter__(self):
        # Acquire async resource
        ...
    async def __aexit__(self, exc_type, exc, tb):
        # Release async resource
        ...

What are contextmanager and asynccontextmanager, and are they stdlib?

They are decorators from Python's standard library module contextlib. @contextmanager builds a sync context manager for with, and @asynccontextmanager builds an async context manager for async with, usually from generator-style functions with yield.

from contextlib import contextmanager, asynccontextmanager

@contextmanager
def managed_resource():
    # Setup before entering `with`
    try:
        yield "resource"
    finally:
        # Cleanup on exit
        pass

@asynccontextmanager
async def managed_async_resource():
    # Async setup before entering `async with`
    try:
        yield "resource"
    finally:
        # Async cleanup on exit
        pass
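
A runnable sketch of the sync decorator's enter/exit ordering (trace and resource are illustrative names):

```python
from contextlib import contextmanager

trace = []

@contextmanager
def resource():
    trace.append("setup")        # runs before the with body
    try:
        yield "handle"           # value bound by `as`
    finally:
        trace.append("cleanup")  # always runs, even on exceptions

with resource() as r:
    trace.append(f"use {r}")

print(trace)  # ['setup', 'use handle', 'cleanup']
```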

How do I use uv to initialize and manage a Python project?

uv is a fast Python project/package manager. Typical flow: initialize a project, add deps, run code, then lock/sync environments. You can also pass --env-file at runtime so you do not need python-dotenv loading in app code.

References:

  • https://docs.astral.sh/uv/
  • https://docs.astral.sh/uv/concepts/projects/init/
  • https://docs.astral.sh/uv/guides/projects/
  • https://docs.astral.sh/uv/concepts/projects/run/

# new folder
mkdir myapp && cd myapp
uv init

# or initialize an existing folder (same directory target)
uv init .

# add a dependency
uv add httpx

# run with env vars from a file (skip dotenv load in code)
uv run --env-file .env main.py

# produce/update lockfile and sync env
uv lock
uv sync

How do generators and iterators interact with __iter__ and __next__?

If __iter__ returns a generator, the class is an iterable and the generator is the iterator. If __iter__ returns self, the class is the iterator and must implement __next__.

class A:
    def __iter__(self):
        return (i for i in range(3))  # iterable only

class B:
    def __iter__(self):
        return self
    def __next__(self):
        ...  # iterator
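
A runnable sketch of the second shape (the Countdown name is illustrative): the class is its own iterator and signals exhaustion with StopIteration.

```python
class Countdown:
    # iterator: __iter__ returns self, __next__ produces values
    def __init__(self, n: int) -> None:
        self.n = n
    def __iter__(self):
        return self
    def __next__(self) -> int:
        if self.n <= 0:
            raise StopIteration  # ends the for loop
        self.n -= 1
        return self.n + 1

print(list(Countdown(3)))  # [3, 2, 1]
```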

Why does a type checker show Coroutine[...] or Any instead of httpx.Response?

Common causes are mismatched client type (httpx.Client vs httpx.AsyncClient), missing function arguments, or await used on a sync call. Once signatures and call sites are corrected, inference typically resolves to httpx.Response.

async def fetch(client: httpx.AsyncClient) -> httpx.Response:
    return await client.get(url)

# bad: await on sync client call can degrade inference
# async def bad(client: httpx.Client):
#     return await client.get(url)

How much latency does a syscall add?

Roughly tens to a few hundred nanoseconds for trivial syscalls, and microseconds for heavier operations depending on OS, hardware, and workload.

# Measure rough syscall overhead (example: getpid)
import os, time
n = 1_000_000
t0 = time.perf_counter_ns()
for _ in range(n):
    os.getpid()
print((time.perf_counter_ns() - t0) / n, "ns/call")

Why does buffered I/O matter for Python LLM token streams?

Python stdout is typically line-buffered on terminals (flushes on newline) and fully buffered when piped or redirected. In token streams, printing partial tokens without a trailing newline can delay visible output unless you flush. Buffering exists to reduce syscall count (write calls), improving CPU efficiency and throughput. The same tradeoff appears in Go (bufio.Writer), C (setvbuf), Rust (BufWriter), and Node/TypeScript streams: larger buffers reduce overhead but can increase per-token display latency.

# Python: line-buffered on TTY; partial tokens may wait without newline
print(token, end="")
print(token, end="", flush=True)  # force per-token visibility

# Go: buffered writer, explicit flush
w := bufio.NewWriter(os.Stdout)
fmt.Fprint(w, token)
w.Flush()

# C: choose buffering mode (line/full/none)
setvbuf(stdout, NULL, _IOLBF, 0); // line buffered

# Rust: BufWriter + explicit flush
write!(writer, "{}", token)?;
writer.flush()?;

# Node/TS: writes may be buffered by stream/runtime
process.stdout.write(token);

What is the difference between requests stream=True and httpx streaming?

requests uses a request flag (stream=True) to avoid eager body download. httpx uses an explicit streaming context (client.stream(...) / async with client.stream(...)) and then iter_*/aiter_* consumption methods.

# requests
r = requests.get(url, stream=True)
for line in r.iter_lines():
    ...

# httpx
with httpx.Client() as c:
    with c.stream("GET", url) as r:
        for line in r.iter_lines():
            ...

Can the OpenAI API query my ChatGPT conversation history?

No. There is no general API to fetch personal chat history from chat.openai.com. You need to provide transcripts yourself or store/retrieve your own conversation data in your app.

# Store your own app conversations; there is no ChatGPT-history API
store.append({"user": prompt, "assistant": answer})
# later: query your own store by conversation_id

What does stream=True actually control?

There are two layers: API payload streaming (stream: true) tells the server to emit incremental events, while Python requests stream=True controls whether the HTTP response body is buffered eagerly or consumed incrementally.

# API/server layer
payload = {"model": model, "messages": msgs, "stream": True}

# HTTP client layer (requests)
r = requests.post(url, json=payload, stream=True)

How do I stream lines with httpx?

Use a streaming context first, then line iterators: with client.stream(...) as r: r.iter_lines() for sync and async with client.stream(...) as r: async for line in r.aiter_lines() for async.

# sync
with httpx.Client() as c:
    with c.stream("GET", url) as r:
        for line in r.iter_lines():
            ...

# async
async with httpx.AsyncClient() as c:
    async with c.stream("GET", url) as r:
        async for line in r.aiter_lines():
            ...

Does stream always mean SSE?

No. SSE is one protocol over HTTP streaming. Streaming can also be chunked bytes, NDJSON, large file transfer, or any incremental response body.

# SSE frame example
# data: {"delta":"hi"}



# NDJSON example
# {"delta":"hi"}

Why do iter_lines-style APIs work on non-streaming responses too?

If the response is already buffered, iterators still work by iterating over in-memory content. You keep a unified API, but lose streaming latency and memory benefits.

# buffered first, then iterate in memory
r = requests.get(url)
for line in r.iter_lines():
    ...

# true streaming from socket
r = requests.get(url, stream=True)
for line in r.iter_lines():
    ...

Why does httpx.Response expose both sync and async line iterators?

The response abstraction supports both sync and async client flows. Use iter_lines() with sync clients and aiter_lines() with async clients; mixing modes raises errors or causes blocking issues.

# sync response
for line in resp.iter_lines():
    ...

# async response
async for line in resp.aiter_lines():
    ...

How does streaming map to Go HTTP clients?

In Go, resp.Body is an io.ReadCloser you can read incrementally by default. You still need API-level streaming flags when the server must emit incremental events (like SSE).

resp, _ := http.Get(url)
defer resp.Body.Close()
scanner := bufio.NewScanner(resp.Body)
for scanner.Scan() {
    line := scanner.Text()
    _ = line
}

Should I create an OpenRouter client per request or reuse one?

Reuse one client across a larger scope. A per-request context manager adds setup/teardown overhead. The context manager is mainly for guaranteed cleanup at lifecycle boundaries.

with OpenRouter(api_key=key) as client:
    client.chat.send(...)
    client.chat.send(...)  # reuse one client

What is the best cleanup pattern in Python for network clients?

Prefer with/context managers or explicit close() usage in try/finally. Avoid relying on __del__ for important cleanup because garbage collection timing is not deterministic.

provider = Provider(...)
try:
    provider.query("hello")
finally:
    provider.close()

Does del obj destroy a Python object immediately?

No. del removes a reference name; the object is reclaimed only when references drop to zero (or GC later collects cycles). It is not deterministic cleanup.

x = SomeObject()
y = x
del x  # object still alive via y
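
A minimal sketch using weakref to observe reclamation (this relies on CPython's immediate refcounting; other implementations may reclaim later):

```python
import weakref

class Obj:
    pass

x = Obj()
y = x                      # two names, one object
ref = weakref.ref(x)       # observe the object without keeping it alive

del x                      # removes the name; object survives via y
print(ref() is not None)   # True

del y                      # last reference gone; CPython reclaims immediately
print(ref() is None)       # True
```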

How should I think about dunder methods conceptually?

Think in terms of syntax hooks: language constructs (with, for, len, indexing, operators, attribute access) dispatch to specific dunder methods under the hood.

class Box:
    def __len__(self): return 3

len(Box())  # calls __len__

Why does Go not use Python-style destructors?

Go prefers explicit deterministic cleanup with defer at function scope instead of GC-timed destructors. This avoids lifecycle ambiguity and keeps cleanup predictable.

f, _ := os.Open("file.txt")
defer f.Close()
// use f

Why is time-to-first-token (TTFT) slower on advanced models?

TTFT is dominated by prefill over the full prompt. Larger models, longer prompts, queueing, cold starts, and hardware/provider differences all increase first-token latency.

t0 = time.perf_counter()
stream = client.chat.completions.create(..., stream=True)
first_chunk = next(iter(stream))  # blocks until the first token arrives
print("TTFT", time.perf_counter() - t0)

How does KV cache avoid recomputing all prior tokens?

During prefill, each token's keys/values are stored by layer and position. During decode, only the new token's K/V is computed and appended; prior tokens are read from cache.

# prefill: cache prompt K/V once
kv = prefill(prompt_tokens)
# decode: append only new token K/V each step
kv.append(step(new_token))

Is there a TypeScript-style interface shape for async clients in Python?

Yes. Use typing.Protocol for structural typing. Any object matching the required async method signatures can satisfy the protocol without explicit inheritance.

class AsyncClientLike(Protocol):
    async def get(self, url: str) -> Any: ...

How should async return types be annotated in Python?

Annotate the awaited result type (e.g., async def f() -> Response). Type checkers still know the call expression returns a coroutine before awaiting.

async def fetch(client: httpx.AsyncClient) -> httpx.Response:
    return await client.get(url)

Why do httpx.AsyncClient, async with, and async for matter?

Async resources expose async protocols (__aenter__, __aexit__, __anext__). Use async with/async for so the event loop can schedule other work during I/O waits.

async with httpx.AsyncClient() as c:
    async with c.stream("GET", url) as r:
        async for line in r.aiter_lines():
            ...

Is httpx basically async requests?

Conceptually yes: similar ergonomic API, but httpx supports both sync and async clients and modern transport features like HTTP/2. requests remains sync-focused.

# requests (sync)
r = requests.get(url)

# httpx async
async with httpx.AsyncClient() as c:
    r = await c.get(url)

How does Go streaming with blocking reads compare to Python async iterators?

In Python, an async iterator cooperatively yields control at each await, so the event loop can run other tasks. In Go, you typically keep a blocking Next(ctx)/ReadBytes loop and run it in a goroutine. The goroutine can block on network I/O while the runtime scheduler runs other goroutines, so you still get async behavior at the system level without an async iterator language feature.

# Python: async iterator style
async def stream_py(resp):
    async for line in resp.aiter_lines():
        yield line

# Go: blocking iterator/read loop in goroutine style
func streamGo(ctx context.Context, r *bufio.Reader, emit func([]byte) bool) error {
    for {
        line, err := r.ReadBytes('\n') // blocking read
        if err != nil {
            if errors.Is(err, io.EOF) { return nil }
            return err
        }
        if !emit(line) { return nil }
        if ctx.Err() != nil { return ctx.Err() }
    }
}

Why do p99/p99.9/p99.99 latencies often matter more than averages?

Users do not experience averages; they experience individual requests. At scale, even rare slow paths happen constantly. A p99.99 outlier means 1 in 10,000 requests is slow; with high request volume, that becomes continuous user pain. Tail latency also compounds across fan-out systems: one slow dependency can dominate the whole request.

# Back-of-envelope tail math
# If you serve 10,000,000 requests/day:
# p99.99 slow-path rate = 0.01% = 1/10,000
# -> about 1,000 slow requests/day

# Fan-out amplification:
# one user request calls 20 downstream services
# even if each service has good median,
# combined tail risk increases significantly.

# Practical rule:
# optimize and monitor p95/p99/p99.9/p99.99,
# not just average or p50.
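
The "averages hide tails" point can be made concrete with a tiny nearest-rank percentile helper (illustrative only, not a production quantile estimator):

```python
import math

def percentile(samples: list[float], p: float) -> float:
    # nearest-rank percentile: smallest value covering p% of samples
    ordered = sorted(samples)
    k = max(0, math.ceil(p / 100 * len(ordered)) - 1)
    return ordered[k]

# 99 fast requests (10 ms) and one 500 ms outlier
latencies = [10.0] * 99 + [500.0]

mean = sum(latencies) / len(latencies)
print(mean)                         # 14.9 -> looks healthy
print(percentile(latencies, 50))    # 10.0
print(percentile(latencies, 99.5))  # 500.0 -> the pain users actually hit
```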

What are sketches (Bloom, HyperLogLog, quantile sketches, MinHash/SimHash), and when should you use them?

A sketch is a compact probabilistic summary that trades exactness for speed and memory. Use sketches when exact storage/computation is too expensive and bounded error is acceptable. They are usually a first-pass filter/estimator, not the final source of truth.

# Sketch chooser (back-of-envelope)
# Need fast membership check with tiny memory?
# -> Bloom filter (false positives possible, no false negatives)

# Need approximate distinct count at scale?
# -> HyperLogLog (cardinality estimation)

# Need approximate percentiles/quantiles in streams?
# -> KLL / t-digest / DDSketch (quantile sketches)

# Need near-dup candidate retrieval for text/docs?
# -> SimHash (Hamming distance on compact fingerprints)

# Need Jaccard-style overlap approximation?
# -> MinHash (+ LSH for scalable lookup)

# Pattern in production:
# sketch first (cheap) -> exact/heavier verification on candidates
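
As a concrete sketch of the "cheap filter first" idea, here is a toy Bloom filter in pure Python (the class name and the m/k parameters are illustrative; production systems use tuned libraries):

```python
import hashlib

class BloomFilter:
    # Toy Bloom filter: k hash positions over an m-bit array packed in one int.
    def __init__(self, m: int = 1024, k: int = 4) -> None:
        self.m, self.k = m, k
        self.bits = 0

    def _positions(self, item: str):
        for i in range(self.k):
            digest = hashlib.sha256(f"{i}:{item}".encode()).digest()
            yield int.from_bytes(digest[:8], "big") % self.m

    def add(self, item: str) -> None:
        for pos in self._positions(item):
            self.bits |= 1 << pos

    def might_contain(self, item: str) -> bool:
        # False means definitely absent; True means "probably present"
        return all(self.bits >> pos & 1 for pos in self._positions(item))

bf = BloomFilter()
for word in ["alpha", "beta"]:
    bf.add(word)

print(bf.might_contain("alpha"))  # True (no false negatives)
print(bf.might_contain("gamma"))  # almost certainly False here
```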
February 19, 2026

What is the simplest way to kill a process on a port (macOS/Linux)?

Use lsof to find the PID on that port, then kill it in one command.

kill $(lsof -ti tcp:3000)

How do I stream responses from the OpenRouter API?

Use the stream: true parameter in your chat completion request. The response will be a stream of Server-Sent Events (SSE) that you can process incrementally.

const response = await openrouter.chat.completions.create({
  model: 'anthropic/claude-3.5-sonnet',
  messages: [{ role: 'user', content: 'Hello!' }],
  stream: true,
});

for await (const chunk of response) {
  const content = chunk.choices[0]?.delta?.content;
  if (content) process.stdout.write(content);
}

Does the OpenRouter SDK support the same interface as OpenAI?

Yes! OpenRouter is fully compatible with the OpenAI SDK. You just need to set a custom baseURL and use your OpenRouter API key.

import OpenAI from 'openai';

const openrouter = new OpenAI({
  baseURL: 'https://openrouter.ai/api/v1',
  apiKey: process.env.OPENROUTER_API_KEY,
});
February 18, 2026

What is the provider field in OpenRouter responses?

The provider field in OpenRouter responses tells you which underlying provider actually served your request. This is useful for debugging and understanding routing behavior, especially when using models available from multiple providers.