“What is RTT from US West to US East?” sounds like a simple question.
It is simple if you want one number. It is useful if you want the distribution.
The practical answer for production systems is:
- Good cloud backbone path: usually around 60-85 ms RTT.
- Very clean, lucky path: can dip into the low 60s.
- Noisy public internet path: often 70-110+ ms.
If you budget one value, use ~70 ms RTT baseline and design for ~90-120 ms tail.
Start With Physics, Not Benchmarks
There is a hard lower bound from propagation delay in fiber.
Light in fiber travels at roughly two-thirds of its vacuum speed, and real routes are not straight lines. Packets follow conduit and peering topology, not a map ruler.
That means:
- There is a best-case floor.
- Real paths are always above that floor.
- Network policy (BGP), not geography alone, picks your actual route.
So the “coast-to-coast RTT” is not a constant. It is an outcome.
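The floor is easy to estimate from first principles. A minimal sketch, assuming a ~4,100 km great-circle distance (roughly SF to NYC), light in fiber at ~2/3 of c, and real fiber routes running ~30% longer than the great circle (all illustrative assumptions, not measured values):

```python
# Back-of-envelope propagation floor for US West <-> US East.
# Assumptions: ~4,100 km great-circle distance, signal at ~2/3 of c,
# and a route_factor for how much longer real fiber paths run.
C_VACUUM_KM_S = 299_792                  # speed of light in vacuum, km/s
FIBER_SPEED_KM_S = C_VACUUM_KM_S * 2 / 3

def rtt_floor_ms(distance_km: float, route_factor: float = 1.3) -> float:
    """Round-trip propagation delay in milliseconds for a given path length."""
    one_way_s = (distance_km * route_factor) / FIBER_SPEED_KM_S
    return 2 * one_way_s * 1000

print(f"straight-line floor: {rtt_floor_ms(4100, 1.0):.0f} ms")  # ~41 ms
print(f"realistic floor:     {rtt_floor_ms(4100, 1.3):.0f} ms")  # ~53 ms
```

A ~53 ms realistic floor is consistent with the low-60s best-case observations above: everything beyond the floor is routing, queueing, and processing.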
Why Your RTT Changes Hour to Hour
Same source region, same destination region, different latency:
- Different egress PoP at provider edge.
- Different peering path on transit networks.
- Queueing under bursty traffic.
- Path asymmetry (A->B route differs from B->A).
This is why p95 and p99 matter more than p50 for user-perceived responsiveness.
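A quick illustration of why the median hides the pain. The probe values below are hypothetical, and `percentile` is a simple nearest-rank implementation, not a specific library's method:

```python
# Why p95/p99 matter more than p50: a hypothetical set of coast-to-coast
# RTT probes (ms). The median looks healthy; the tail does not.
import math

def percentile(values: list[float], p: float) -> float:
    """Nearest-rank percentile: smallest value with at least p% of data at or below it."""
    ordered = sorted(values)
    k = max(0, math.ceil(p / 100 * len(ordered)) - 1)
    return ordered[k]

probes_ms = [68, 69, 70, 70, 71, 71, 72, 73, 95, 118]  # illustrative only

print(f"p50={percentile(probes_ms, 50)} ms")  # -> 71
print(f"p95={percentile(probes_ms, 95)} ms")  # -> 118
print(f"p99={percentile(probes_ms, 99)} ms")  # -> 118
```

A dashboard showing 71 ms median would look fine while one user in ten sees 95+ ms.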
RTT Is Not Request Latency (But It Multiplies It)
Many people measure RTT once and assume request overhead is roughly that number. It is not.
Connection startup cost is often multiples of RTT:
- TCP handshake: ~1 RTT.
- TLS handshake: ~1 RTT (TLS 1.3 full handshake).
- Request/first-byte path: another RTT-scale component.
If you miss keep-alive or connection reuse, the coast-to-coast startup tax gets expensive quickly.
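The startup tax is easy to put in numbers. A sketch using the RTT counts above and the ~70 ms baseline from this article (the function name and structure are illustrative):

```python
# Rough first-byte cost with and without connection reuse,
# assuming the ~70 ms baseline RTT and the per-step RTT counts above.
RTT_MS = 70

def first_byte_ms(reuse_connection: bool) -> float:
    rtts = 1  # request out, first byte back
    if not reuse_connection:
        rtts += 1  # TCP handshake
        rtts += 1  # TLS 1.3 full handshake
    return rtts * RTT_MS

print(first_byte_ms(reuse_connection=False))  # -> 210
print(first_byte_ms(reuse_connection=True))   # -> 70
```

Three round trips versus one: a cold connection triples the transport cost of the first byte before any server work happens.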
Where QUIC / HTTP/3 Changes the Picture
QUIC runs over UDP and integrates TLS 1.3 into the transport handshake.
What it improves:
- Fewer startup RTT penalties in common cases (especially with connection resumption).
- No TCP head-of-line blocking across multiplexed streams at the transport layer.
- Better behavior on lossy/mobile networks in many real-world deployments.
- Connection migration support (e.g., network changes like Wi-Fi to cellular).
What it does not change:
- Physics floor still exists. Coast-to-coast propagation delay is unchanged.
- Bad routing and congestion still hurt p95/p99.
- Application queueing/prefill still dominates many LLM first-token paths.
Tradeoffs to account for:
- Some enterprise/firewall environments block, throttle, or deprioritize UDP.
- Observability/tooling can be less familiar than mature TCP workflows.
- CPU and tuning characteristics differ by implementation.
- Fallback paths (HTTP/2 over TCP) still matter for compatibility.
So HTTP/3 is a strong latency lever, but not a free pass. It reduces transport overhead and improves loss recovery behavior; it does not erase geographic RTT or overloaded backends.
TTFT, QUIC, and SSE (What Is Actually True)
For streaming UX, a useful decomposition is:
TTFT ~= queue_wait + prefill_compute + transport_startup + first_byte_delivery
Where QUIC helps:
- It can reduce transport startup overhead versus fresh TCP+TLS paths.
- It can improve behavior under loss for multiplexed streams.
Where QUIC does not help:
- Queue spikes in model serving.
- Large prefill compute on long prompts.
- Poor region placement.
SSE fact check:
- SSE is an HTTP response format, not a transport protocol.
- SSE can run over HTTP/1.1, HTTP/2, and HTTP/3.
- So "SSE over QUIC" is possible, but not guaranteed on every request.
Whether a specific stream uses QUIC depends on client/runtime support, ALPN negotiation, edge/CDN behavior, and fallback policy.
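The decomposition above can be sketched with hypothetical component values to show why transport wins are real but bounded (all numbers below are assumptions for illustration, not measurements):

```python
# TTFT ~= queue_wait + prefill_compute + transport_startup + first_byte_delivery
# Hypothetical values comparing a fresh TCP+TLS connection against a
# resumed QUIC connection with near-zero transport startup.
def ttft_ms(queue_wait: float, prefill_compute: float,
            transport_startup: float, first_byte_delivery: float) -> float:
    return queue_wait + prefill_compute + transport_startup + first_byte_delivery

cold = ttft_ms(queue_wait=150, prefill_compute=400,
               transport_startup=140, first_byte_delivery=70)  # fresh TCP+TLS
warm = ttft_ms(queue_wait=150, prefill_compute=400,
               transport_startup=0, first_byte_delivery=70)    # resumed QUIC

print(cold, warm)  # -> 760 620
```

The transport term shrinks; the queue and prefill terms do not. If serving-side terms dominate, protocol upgrades alone will not move TTFT much.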
What This Means for LLM UX
Cross-country RTT leaks into first-token experience:
- Prompt upload path.
- Scheduler/queue response path.
- First token delivery path.
That is why TTFT decomposition matters:
- Queue.
- Prefill.
- Network overhead.
Model-side wins can be hidden by transport and placement decisions.
Architecture Moves That Actually Help
If you care about interactive latency across US coasts:
- Region affinity: keep users near inference when possible.
- Connection reuse: avoid repeated handshakes.
- Stream responses: surface progress before full completion.
- Hedge tail latency: race/fallback when p99 matters.
- Measure by percentile and geography: global median hides pain.
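The hedging move can be sketched in a few lines. This is a toy using simulated backends and assumed timings, not a production pattern: fire the primary, and if it misses a hedge deadline, race a backup and take whichever answers first.

```python
# Minimal tail-hedging sketch: primary gets a head start; if it blows
# the hedge deadline, a backup races it. Backends are simulated with
# asyncio.sleep; all latencies here are assumptions.
import asyncio

async def call_backend(name: str, latency_s: float) -> str:
    await asyncio.sleep(latency_s)  # stand-in for a real network call
    return name

async def hedged_call(hedge_delay_s: float = 0.09) -> str:
    primary = asyncio.create_task(call_backend("primary", 0.25))  # slow p99-ish path
    try:
        # shield() keeps the primary running if the deadline fires
        return await asyncio.wait_for(asyncio.shield(primary), hedge_delay_s)
    except asyncio.TimeoutError:
        backup = asyncio.create_task(call_backend("backup", 0.07))  # healthy path
        done, pending = await asyncio.wait(
            {primary, backup}, return_when=asyncio.FIRST_COMPLETED)
        for task in pending:
            task.cancel()
        return done.pop().result()

print(asyncio.run(hedged_call()))  # -> backup
```

The cost is extra load from duplicate requests, so hedging is usually gated on the deadline (e.g. fire the backup only past the observed p95).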
A Better Mental Model
Treat RTT as a budget input, not a KPI.
Use it to reason about:
- Handshake overhead.
- Retry penalties.
- Streaming smoothness.
- Tail latency amplification under load.
Asking “what is West-to-East RTT?” is useful. Building systems as if the answer is a single fixed number is not.
Practical Budget
For US West <-> US East planning:
- Baseline: 70 ms RTT.
- Healthy range: 60-85 ms.
- Tail allowance: up to 120 ms under real internet variability.
That budget is conservative enough to prevent surprise and tight enough to be actionable.
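That budget can live in code as a sanity check on probe data. The thresholds mirror the numbers above; the function name and labels are hypothetical:

```python
# Classify a measured US West <-> US East RTT against the planning
# budget in this section: 60-85 ms healthy, up to 120 ms tail allowance.
def classify_rtt(rtt_ms: float) -> str:
    if rtt_ms <= 85:
        return "healthy"        # within the expected 60-85 ms range
    if rtt_ms <= 120:
        return "tail"           # real-internet variability, still in budget
    return "out_of_budget"      # investigate routing/placement

for sample in (62, 70, 95, 140):
    print(sample, classify_rtt(sample))
```

Wiring this into monitoring turns "RTT feels slow" into an alert with a concrete threshold behind it.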