How an HFT Team Hit the Wall on Solana — and What Happened When They Moved to RPCFast

The call that most high-frequency trading teams eventually make to RPCFast starts the same way. Not with a catastrophic failure, but with a plateau. The strategy is working. The signals are good. The execution is just... consistently not as good as it should be. P99 latency is higher than expected. Bundle acceptance rate is stuck in the 50–60% range regardless of tip size. Slot lag appears during exactly the windows when it's most expensive. And the team has already exhausted the obvious fixes.


This is a reconstructed account of one such team — a three-person prop trading desk running market-making and arbitrage strategies on Solana. The details are composited from common patterns, but the numbers are real.

Where They Started

The team had been running on a premium shared endpoint for eight months. By consumer dApp standards, the setup was solid — a well-known provider, a high-tier plan with gRPC access, reasonable uptime. By HFT standards, it had three problems they hadn't fully diagnosed yet.

The first was resource contention. Shared infrastructure means shared compute. During quiet periods — low network activity, off-peak hours — the node performed well. During the windows that mattered for their strategies — high-volatility trading sessions, major token events, periods of rapid price movement across DEXs — the node was under simultaneous load from every other tenant reacting to the same conditions. Latency spiked. gRPC stream delivery slowed. Execution quality degraded at exactly the moments it was most valuable.

The second problem was geography. Their RPC node was in a US West data center. Roughly 65% of high-stake Solana validators are concentrated in US East. Every transaction submission was crossing the country before it reached the validator most likely to be the current slot leader. The round-trip added 55–70ms of structural latency to every bundle — not a configuration problem, not a provider quality problem, just physics.
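
That structural penalty is easy to verify from your own vantage point. As a rough sketch (the endpoint URLs below are placeholders, not the team's actual hosts), timing a lightweight getHealth call from the trading box gives a serviceable round-trip estimate:

```typescript
// Rough RTT probe: take the median round trip of a lightweight JSON-RPC call.
// The median damps TLS-handshake and GC outliers in the early samples.
async function medianRtt(url: string, samples = 20): Promise<number> {
  const times: number[] = [];
  for (let i = 0; i < samples; i++) {
    const start = performance.now();
    await fetch(url, {
      method: "POST",
      headers: { "Content-Type": "application/json" },
      body: JSON.stringify({ jsonrpc: "2.0", id: 1, method: "getHealth" }),
    });
    times.push(performance.now() - start);
  }
  times.sort((a, b) => a - b);
  return times[Math.floor(times.length / 2)];
}

for (const url of [
  "https://us-west.example-rpc.com", // placeholder: the old endpoint's region
  "https://us-east.example-rpc.com", // placeholder: the leader-dense region
]) {
  console.log(`${url}: ~${(await medianRtt(url)).toFixed(1)}ms median round trip`);
}
```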

The third problem was Geyser feed quality. Their gRPC subscription was delivering account updates 80–120ms after the underlying state change on average, with spikes to 300ms+ under congestion. They didn't know this because they had no reference to compare against. The provider's dashboard showed green. The feed was working. It just wasn't working at the latency tier their strategies required.

The Diagnostic Phase

Before the migration, the team spent three weeks instrumenting their actual performance. This is the step most teams skip, and skipping it is why they often switch providers and see no improvement.

They set up a parallel slot freshness monitor — polling getSlot() from their primary endpoint and two reference providers every 200ms, logging the delta continuously. What they found: during off-peak hours, their node was 0–1 slots behind. During peak trading windows — 13:00–19:00 UTC — it was consistently 2–4 slots behind. They had been trading on state that was 800ms–1,600ms old during the sessions that generated the most opportunity.
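
A minimal version of that monitor, assuming @solana/web3.js and placeholder endpoint URLs, looks something like this:

```typescript
import { Connection } from "@solana/web3.js";

// Placeholder endpoints: the primary node under test plus two references.
const primary = new Connection("https://primary.example-rpc.com", "processed");
const references = [
  new Connection("https://reference-a.example-rpc.com", "processed"),
  new Connection("https://reference-b.example-rpc.com", "processed"),
];

// Every 200ms, poll getSlot everywhere and log how far the primary trails
// the freshest reference. A persistent positive lag during peak windows is
// the signature of the problem described above.
setInterval(async () => {
  try {
    const [primarySlot, ...refSlots] = await Promise.all(
      [primary, ...references].map((c) => c.getSlot())
    );
    const freshest = Math.max(...refSlots);
    console.log(
      `${new Date().toISOString()} primary=${primarySlot}` +
        ` freshest=${freshest} lag=${freshest - primarySlot}`
    );
  } catch (err) {
    console.error("poll failed:", err);
  }
}, 200);
```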

They measured transaction landing rate by sending 50 test memo transactions per hour through their actual submission path, tracking how many appeared on-chain within three slots. The result: 71% landing rate overall, dropping to 58% during peak congestion windows.
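
A sketch of one such probe, with the caveats that the endpoint is a placeholder, the keypair would need funding, and in practice it runs on a 50-per-hour schedule through the same submission path the strategies use:

```typescript
import {
  Connection, Keypair, PublicKey, Transaction, TransactionInstruction,
} from "@solana/web3.js";

const MEMO_PROGRAM = new PublicKey("MemoSq4gqABAXKb96qnH8TysNcWxMyWCqXgDLGmfcHr");
const connection = new Connection("https://primary.example-rpc.com", "processed");
const payer = Keypair.generate(); // stand-in; needs lamports for fees in practice

// Send one tagged memo through the real submission path and report whether
// it appears on-chain within three slots of the send.
async function probeLanding(): Promise<boolean> {
  const sentAtSlot = await connection.getSlot();
  const tx = new Transaction().add(
    new TransactionInstruction({
      keys: [],
      programId: MEMO_PROGRAM,
      data: Buffer.from(`landing-probe-${Date.now()}`),
    })
  );
  const sig = await connection.sendTransaction(tx, [payer]);
  for (;;) {
    const { value } = await connection.getSignatureStatuses([sig]);
    if (value[0]) return value[0].slot - sentAtSlot <= 3;
    if ((await connection.getSlot()) - sentAtSlot > 3) return false; // missed the window
    await new Promise((r) => setTimeout(r, 200));
  }
}
```

Landing rate is then the fraction of probes that return true, bucketed by hour so peak-window degradation stays visible.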

They logged Geyser update timestamps against slot production times. The average delay between an account state change and their bot receiving it via gRPC was 94ms. During congestion, it exceeded 250ms in roughly 12% of updates — the tail that was killing their most time-sensitive strategies.
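
Reproducing that measurement depends on the shape of the feed. As a sketch only, assuming a Yellowstone gRPC Geyser endpoint and the @triton-one/yellowstone-grpc client (the URL and token are placeholders, and the owner filter, Raydium's AMM v4 program, is illustrative), one workable approximation timestamps each slot's first notification on the stream and measures account updates against it. Because the reference clock is the same feed's own slot stream, absolute delays will read slightly low; the distribution and the tail are what matter:

```typescript
import Client, { CommitmentLevel } from "@triton-one/yellowstone-grpc";

// Placeholder endpoint and token for a Yellowstone gRPC Geyser feed.
const client = new Client("https://geyser.example-rpc.com:10000", "x-token", undefined);
const slotFirstSeen = new Map<string, number>();

const stream = await client.subscribe();
stream.on("data", (update: any) => {
  const now = Date.now();
  if (update.slot) {
    // Approximate slot production time by the slot notification's arrival.
    slotFirstSeen.set(update.slot.slot, now);
  } else if (update.account) {
    const produced = slotFirstSeen.get(update.account.slot);
    if (produced !== undefined) {
      console.log(`account update delay ~${now - produced}ms (slot ${update.account.slot})`);
    }
  }
});

// Subscribe to slot ticks plus the accounts the strategies actually watch;
// the Raydium AMM v4 owner filter below is illustrative, not the team's.
stream.write({
  slots: { slots: {} },
  accounts: {
    watched: {
      account: [],
      owner: ["675kPX9MHTjS2zt1qfr1NYHuzeLXfQM9H24wFSUt1Mp8"],
      filters: [],
    },
  },
  transactions: {}, transactionsStatus: {}, blocks: {}, blocksMeta: {},
  entry: {}, accountsDataSlice: [],
  commitment: CommitmentLevel.PROCESSED,
});
```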

With those numbers documented, the diagnosis was unambiguous. This was an infrastructure problem, not a strategy problem.

The Migration

The team moved to dedicated bare-metal infrastructure through RPCFast — colocated in a US East data center, on the same LAN segment as a high-stake validator cluster. The transition took four days including testing: two days to provision and configure the node, two days of parallel running with both endpoints active to validate the numbers before cutting over.

The changes to their stack were surgical. Endpoint URLs updated. Geyser subscription endpoints updated. Geographic routing for Jito bundle submission updated to prioritize US East block engines. Everything else — strategy logic, transaction construction, tip calibration — stayed identical. The goal was to isolate the infrastructure variable, not change everything at once.
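
In config terms, the cutover might look like the sketch below. Every URL is a placeholder rather than the team's actual routing table; the Jito hosts follow the public <region>.mainnet.block-engine.jito.wtf pattern:

```typescript
// Single source of truth for infrastructure endpoints. Strategy logic,
// transaction construction, and tip calibration read only this object,
// so swapping providers never touches them.
export const endpoints = {
  rpc: {
    http: "https://dedicated-us-east.example-rpc.com", // was: shared US West
    ws: "wss://dedicated-us-east.example-rpc.com",
  },
  geyser: {
    grpc: "https://dedicated-geyser-us-east.example-rpc.com:10000",
  },
  // US East block engine first, global endpoint as fallback.
  jitoBlockEngines: [
    "https://ny.mainnet.block-engine.jito.wtf",
    "https://mainnet.block-engine.jito.wtf",
  ],
};
```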

What Changed

After two weeks on the new stack, measured with the same methodology as the diagnostic phase, overall execution quality — the percentage of detected opportunities that resulted in a profitable landed transaction — went from 31% to 67%.

What This Pattern Looks Like in General

The team's story isn't unusual. HFT strategies on Solana tend to hit the same ceiling at roughly the same point in their development: the strategy is good enough that infrastructure is the binding constraint, but the team hasn't yet built the measurement apparatus to prove it. They're experiencing the symptoms — inconsistent execution, unexplained bundle failures, strategies that work in backtesting but underperform live — without a clean diagnosis.

The diagnostic work is the unlock. Two to three weeks of rigorous measurement — slot lag against a reference, actual landing rate from real transactions, Geyser update delay logged against slot production — produces a picture that's usually unambiguous. Either the infrastructure is the problem, or it isn't. And if it is, the fix is known.

The teams that compete at the top of Solana HFT aren't running smarter strategies in isolation. They're running smarter strategies on infrastructure that doesn't lose slots, drop bundles, or serve stale state at the moments when those failures are most expensive.