Edge Compute for Real-Time Fraud Scoring
Why Hitpixel runs risk decisioning at the edge instead of the origin, and how the architecture splits light and heavy models across 300+ POPs.
Edge compute for real-time fraud scoring is what separates a checkout that converts from a checkout that asks for patience. A 200ms detour from a Riyadh shopper to a Virginia datacenter for a single risk decision sounds harmless on a status page. In production it shows up as a measurable drop in completion rate across EMEA, MENA, and APAC corridors, and the gap widens as mobile networks dominate the traffic mix. This post explains how we think about the split between edge and origin, and why we no longer run any first-pass risk model at the origin.
The Latency Math At Checkout
A card-not-present checkout has a fixed budget. Most consumers tolerate roughly 1.2 to 1.8 seconds between pressing pay and seeing a result before they start to suspect something is broken. Past 2.5 seconds, abandonment climbs sharply and the customer reaches for a different tab.
Inside that budget you have to do TLS termination, parse the request, run a fraud decision, hand off to the acquirer, wait for the issuer response, and render the outcome. The acquirer-to-issuer leg alone is often 600 to 900ms in cross-border corridors. If your fraud model adds another 200ms transcontinental round trip, those two legs together consume 800 to 1,100ms, which is more than half of a 1.5-second budget, before TLS termination, parsing, or rendering are counted.
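To make the arithmetic concrete, here is a minimal sketch that sums the legs of a checkout against the budget. The issuer leg and the 200ms fraud hop come from the figures above, and the 8ms edge figure is the light-model scoring time quoted later in this post. The TLS, parsing, and rendering values are illustrative assumptions, not measurements, and the type and function names are hypothetical.

```typescript
// Legs of a card-not-present checkout, in milliseconds.
// tlsMs, parseMs, and renderMs are illustrative assumptions.
interface CheckoutLegs {
  tlsMs: number;
  parseMs: number;
  fraudHopMs: number;
  issuerLegMs: number;
  renderMs: number;
}

// Total wall-clock time if the legs run one after another.
function totalMs(legs: CheckoutLegs): number {
  return legs.tlsMs + legs.parseMs + legs.fraudHopMs + legs.issuerLegMs + legs.renderMs;
}

// Share of the budget taken by the fraud hop plus the issuer leg.
function fraudPlusIssuerShare(legs: CheckoutLegs, budgetMs: number): number {
  return (legs.fraudHopMs + legs.issuerLegMs) / budgetMs;
}

// Midpoint of the 1.2 to 1.8 second tolerance.
const budgetMs = 1500;

// Fraud decision made at a distant origin: 200ms round trip.
const originScored: CheckoutLegs = {
  tlsMs: 50,
  parseMs: 5,
  fraudHopMs: 200,
  issuerLegMs: 750,
  renderMs: 100,
};

// Same checkout with the decision made in the POP that terminated TLS.
const edgeScored: CheckoutLegs = { ...originScored, fraudHopMs: 8 };

console.log("origin-scored total:", totalMs(originScored), "ms");
console.log("edge-scored total:", totalMs(edgeScored), "ms");
console.log("origin share:", fraudPlusIssuerShare(originScored, budgetMs).toFixed(2));
console.log("edge share:", fraudPlusIssuerShare(edgeScored, budgetMs).toFixed(2));
```

With these assumed values, moving the decision to the edge removes roughly 190ms from the total. The issuer leg stays as the dominant cost either way.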
The fix is not a faster model. The fix is a closer model. Running the first-pass risk decision in the same point of presence that terminated the TLS handshake removes the round trip entirely. The cost is that you have to fit the model into the constraints of an edge runtime, which is where the architecture gets interesting.
The Three-Layer Architecture
We run risk in three layers, and only one of them sits at the origin.
Layer One: Device Fingerprint At The Edge
Before any model runs, we collect a deterministic device fingerprint at the edge. IP reputation, ASN classification, JA3 TLS fingerprint, header ordering, language and timezone consistency, request timing relative to page load. This is data that exists at the edge and nowhere else. By the time the request reaches any model, the fingerprint is already attached to the request context.
This layer alone catches the noisy bottom of the fraud distribution. Bot traffic from known abusive ASNs, requests with TLS fingerprints that do not match the claimed user agent, sessions where the request timing relative to page load does not match a human. We block roughly 38% of attempted fraud here without ever running a model.
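The checks in this layer are plain deterministic rules, which the following sketch illustrates. Everything in it is hypothetical: the Fingerprint shape, the rule order, the ASN list (taken from the range reserved for documentation), the placeholder JA3 strings, and the 400ms timing floor are assumptions for illustration, not our production rule set.

```typescript
// Hypothetical shape of the fingerprint attached to the request context.
interface Fingerprint {
  asn: number;
  ja3: string; // JA3 hash of the TLS ClientHello
  userAgent: string;
  msSincePageLoad: number;
}

type LayerOneVerdict = { block: true; reason: string } | { block: false };

// Illustrative data only. 64496 is in the ASN range reserved for documentation.
const ABUSIVE_ASNS = new Set<number>([64496]);

// Placeholder JA3 values mapped to the client family that produces them.
const JA3_TO_CLIENT = new Map<string, string>([
  ["placeholder-ja3-chrome", "Chrome"],
  ["placeholder-ja3-curl", "curl"],
]);

// Assumed floor for the time between page load and pressing pay.
const MIN_HUMAN_MS = 400;

function layerOne(fp: Fingerprint): LayerOneVerdict {
  // Rule 1: traffic from an ASN on the abuse list.
  if (ABUSIVE_ASNS.has(fp.asn)) {
    return { block: true, reason: "abusive_asn" };
  }
  // Rule 2: the TLS fingerprint belongs to a client the user agent does not claim.
  const client = JA3_TO_CLIENT.get(fp.ja3);
  if (client !== undefined && !fp.userAgent.includes(client)) {
    return { block: true, reason: "ja3_user_agent_mismatch" };
  }
  // Rule 3: the request arrived faster after page load than a person could act.
  if (fp.msSincePageLoad < MIN_HUMAN_MS) {
    return { block: true, reason: "implausible_timing" };
  }
  return { block: false };
}
```

Because each rule is a set lookup or a comparison, this layer adds negligible latency, and the reason string makes every block auditable.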
Layer Two: Light Model At The Edge
The remaining traffic gets a gradient-boosted decision tree that runs entirely inside the edge worker. The model is small enough to load in single-digit milliseconds, scores a transaction in under 8ms, and outputs a probability plus a confidence band.
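As a rough illustration of why a gradient-boosted tree ensemble scores this quickly, the sketch below walks a set of small trees stored as flat arrays and turns the summed leaf values into a probability with a logistic function. The node layout, the two toy trees, and the feature order are assumptions for illustration. The confidence band is omitted, and missing features are treated as zero, which is a simplification.

```typescript
// One node of a binary decision tree, stored in a flat array.
// Leaves carry a value; internal nodes carry a split and child indices.
type TreeNode =
  | { leaf: true; value: number }
  | { leaf: false; feature: number; threshold: number; left: number; right: number };

type Tree = TreeNode[];

// Walk one tree from the root (index 0) to a leaf.
function evalTree(tree: Tree, x: number[]): number {
  let i = 0;
  for (;;) {
    const node = tree[i];
    if (node === undefined) throw new Error("tree index out of range");
    if (node.leaf) return node.value;
    const v = x[node.feature] ?? 0; // missing feature treated as 0 (assumption)
    i = v < node.threshold ? node.left : node.right;
  }
}

function sigmoid(z: number): number {
  return 1 / (1 + Math.exp(-z));
}

// Sum the leaf values of every tree, then map the margin to a probability.
function fraudProbability(trees: Tree[], bias: number, x: number[]): number {
  let margin = bias;
  for (const tree of trees) margin += evalTree(tree, x);
  return sigmoid(margin);
}

// Two toy trees over hypothetical features: x[0] = amount, x[1] = card velocity.
const toyTrees: Tree[] = [
  [
    { leaf: false, feature: 0, threshold: 100, left: 1, right: 2 },
    { leaf: true, value: -1.0 },
    { leaf: true, value: 1.0 },
  ],
  [
    { leaf: false, feature: 1, threshold: 3, left: 1, right: 2 },
    { leaf: true, value: -0.5 },
    { leaf: true, value: 1.5 },
  ],
];

console.log(fraudProbability(toyTrees, 0, [50, 1]).toFixed(3));
console.log(fraudProbability(toyTrees, 0, [500, 5]).toFixed(3));
```

Scoring is a handful of comparisons per tree with no allocation, which is why a model of a few hundred shallow trees fits comfortably inside an edge worker's CPU budget.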
Three outcomes are possible. High-confidence approve: the transaction proceeds straight to authorization. High-confidence decline: the transaction is rejected at the edge with no acquirer cost. Borderline: the transaction is forwarded to the origin for the heavy model.
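The three-way split can be sketched as a small routing function. This assumes the confidence band is an interval around the fraud probability and that a decision counts as high confidence only when the whole band clears a threshold. Both threshold values and all names are hypothetical.

```typescript
type RiskRoute = "approve" | "decline" | "escalate";

interface LightScore {
  probability: number; // estimated fraud probability in [0, 1]
  bandLow: number; // lower edge of the confidence band
  bandHigh: number; // upper edge of the confidence band
}

// Hypothetical thresholds; real values would be tuned per corridor.
const APPROVE_BELOW = 0.05;
const DECLINE_ABOVE = 0.9;

function routeTransaction(score: LightScore): RiskRoute {
  // Approve only if even the pessimistic end of the band is low risk.
  if (score.bandHigh < APPROVE_BELOW) return "approve";
  // Decline only if even the optimistic end of the band is high risk.
  if (score.bandLow > DECLINE_ABOVE) return "decline";
  // Anything else is borderline and goes to the heavy model.
  return "escalate";
}
```

Comparing the band edges rather than the point estimate means an uncertain score near a threshold is escalated instead of decided at the edge. Widening or narrowing the thresholds controls what share of traffic reaches the origin.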
The light model uses about 60 features. Device fingerprint, BIN intelligence, basket composition, velocity counters maintained in Cloudflare Workers KV and Durable Objects, and a small set of behavioral signals from the session. It does not need to be the most accurate model. It needs to be the most accurate model that fits in the latency budget of an edge worker, which is a different optimization.
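A velocity counter is the simplest of these features to illustrate. The sketch below is an in-memory sliding-window counter, standing in for state that would live in an edge storage primitive. It deliberately calls no Cloudflare API, and the class name, key format, and window length are assumptions.

```typescript
// Sliding-window event counter keyed by an arbitrary string,
// for example a card hash or a device identifier.
class VelocityCounter {
  private readonly windowMs: number;
  private readonly events = new Map<string, number[]>();

  constructor(windowMs: number) {
    this.windowMs = windowMs;
  }

  // Record an event at nowMs and return how many events for this key
  // fall inside the window ending at nowMs (including this one).
  record(key: string, nowMs: number): number {
    const cutoff = nowMs - this.windowMs;
    const kept = (this.events.get(key) ?? []).filter((t) => t > cutoff);
    kept.push(nowMs);
    this.events.set(key, kept);
    return kept.length;
  }
}

// Usage: count attempts per card over a 60 second window.
const attempts = new VelocityCounter(60_000);
attempts.record("card:example", 0);
attempts.record("card:example", 30_000);
console.log(attempts.record("card:example", 61_000)); // the event at t=0 has expired
```

The returned count feeds the model as a single numeric feature. Pruning on every write keeps the stored list bounded by the number of events in one window.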
Layer Three: Heavy Model At The Origin
Borderline cases, roughly 6 to 9% of traffic depending on the corridor, go to the origin where we have GPU-backed inference and a much larger feature set. Graph features that look at the sender across other merchants in the same client engagement, deep behavioral embeddings, cross-merchant velocity, and a generative component that scores narrative consistency on the order details.
This is the model that costs money to run. Running it on every transaction would burn budget and add latency to traffic that does not need it. Running it only on the borderline 6 to 9% gives us the precision where it matters and keeps the median checkout under one second.
Why Not Just Run It All At The Origin
The honest answer is that operators who route every decision to a central origin are subsidizing customers in the region the origin happens to live in. A US-origin gateway gives a great experience to US shoppers and a degraded experience to everyone else. For client engagements that earn more than half of revenue outside North America, that math does not work.
The other answer is that origin-only architectures are fragile. A single regional outage in the origin cloud takes down checkout globally. An edge-first architecture degrades gracefully. If the origin is unreachable, borderline transactions can fall through to a conservative deny rather than failing open or failing closed across the board.
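A minimal sketch of that degradation policy follows. It models the origin call as a synchronous function that either returns a score or reports that the origin is unreachable. In practice this would be an asynchronous request with a timeout. The names and the 0.5 decline threshold are hypothetical.

```typescript
type EdgeOutcome = "approve" | "decline" | "borderline";

type OriginResult =
  | { kind: "scored"; fraudProbability: number }
  | { kind: "unreachable" };

interface FinalDecision {
  action: "approve" | "decline";
  source: "edge" | "origin" | "edge_fallback";
}

// Hypothetical decision threshold for the heavy model.
const ORIGIN_DECLINE_ABOVE = 0.5;

function finalize(edge: EdgeOutcome, askOrigin: () => OriginResult): FinalDecision {
  // High-confidence edge decisions never touch the origin,
  // so an origin outage cannot affect them.
  if (edge !== "borderline") {
    return { action: edge, source: "edge" };
  }
  const origin = askOrigin();
  // Origin down: only the borderline slice falls to a conservative deny.
  if (origin.kind === "unreachable") {
    return { action: "decline", source: "edge_fallback" };
  }
  const action = origin.fraudProbability > ORIGIN_DECLINE_ABOVE ? "decline" : "approve";
  return { action, source: "origin" };
}
```

During an origin outage the confident majority of traffic is decided exactly as before, and only the borderline share is declined. Recording the source field makes it possible to count how many declines were fallbacks once the origin recovers.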
Comparing The Edge Runtimes
Three runtimes are realistic candidates for this workload. We have run production traffic on all three and the choice is less obvious than the marketing pages suggest.
Cloudflare Workers
The most points of presence by a wide margin and the lowest cold-start variance. The V8 isolate model means there is effectively no cold start, which matters for fraud scoring because a 300ms cold start on a request that needs to complete in 1.2 seconds is unacceptable. Workers KV and Durable Objects give you stateful primitives at the edge that the other runtimes lack. The constraint is the CPU time limit per request, which is generous but not unlimited, and the memory ceiling, which forces you to keep models small.
AWS Lambda@Edge
Tighter integration with the rest of AWS, which matters if your origin already lives in AWS and you want CloudFront in the same control plane. Cold starts are real and variable. The AWS Lambda@Edge documentation is honest about the constraints, including the limited number of edge locations relative to Cloudflare and the restricted runtime feature set compared to standard Lambda.
Vercel Edge Functions
Built on the same V8 isolate primitive as Cloudflare Workers under the hood, with a more developer-friendly deployment story and tight Next.js integration. Excellent for product teams. Less ideal for the kind of low-level control fraud scoring demands, where you want direct access to TLS-level signals and the ability to reach into the request lifecycle. Vercel publishes a thoughtful piece on edge function tradeoffs that is worth reading even if you choose a different runtime.
We use Cloudflare Workers for the fraud scoring layer because the global footprint and the absence of cold-start variance matter more for our workload than tighter cloud integration. A different operator with a different traffic shape might land somewhere else.
What This Buys
The numbers we look at are simple. Median checkout latency dropped from 1.4 seconds to 720ms when we moved the first-pass model to the edge. Approval rate climbed roughly two and a half points across MENA and APAC, almost entirely because the latency improvement reduced abandonment between authorization and 3DS challenge response. Fraud loss rate stayed flat, because the heavy model still catches the borderline cases that the light model defers.
The full architecture sits behind every checkout across the client integrations engineered by our practice, and the broader gateway design is described in our technology overview.
The Operator Lesson
The lesson is not that edge is always better. The lesson is that a global checkout product without an edge story is a US checkout product with international customers attached. If your acquirer mix is global and your shoppers are global, your risk decisioning has to be global too. Running it all at the origin is a choice, and it is the wrong one once your traffic mix tilts away from the origin's home region.
The shift from origin-only to edge-first is not a rewrite. It is a layering exercise. Keep the heavy model where it lives. Move the device fingerprint and the light model to the edge. Measure the median and the P99 on every corridor. The architecture pays for itself in the first month of approval rate uplift, and the resilience benefits compound from there.
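For the measurement step, a sketch of per-corridor median and P99 is below. It uses the nearest-rank percentile definition, which is one of several common conventions. The corridor labels and latency samples are made up for illustration.

```typescript
interface LatencySample {
  corridor: string;
  latencyMs: number;
}

// Nearest-rank percentile over an ascending array, for p in (0, 100].
function percentile(sortedAsc: number[], p: number): number {
  const rank = Math.max(1, Math.ceil((p * sortedAsc.length) / 100));
  const value = sortedAsc[rank - 1];
  if (value === undefined) throw new Error("no samples");
  return value;
}

// Group samples by corridor and report P50 and P99 for each.
function summarize(samples: LatencySample[]): Map<string, { p50: number; p99: number }> {
  const byCorridor = new Map<string, number[]>();
  for (const s of samples) {
    const list = byCorridor.get(s.corridor) ?? [];
    list.push(s.latencyMs);
    byCorridor.set(s.corridor, list);
  }
  const summary = new Map<string, { p50: number; p99: number }>();
  byCorridor.forEach((list, corridor) => {
    const sorted = list.slice().sort((a, b) => a - b);
    summary.set(corridor, { p50: percentile(sorted, 50), p99: percentile(sorted, 99) });
  });
  return summary;
}

// Made-up samples for two hypothetical corridors.
const demoSamples: LatencySample[] = [
  { corridor: "corridor-a", latencyMs: 700 },
  { corridor: "corridor-a", latencyMs: 720 },
  { corridor: "corridor-a", latencyMs: 710 },
  { corridor: "corridor-a", latencyMs: 2400 },
  { corridor: "corridor-a", latencyMs: 690 },
  { corridor: "corridor-b", latencyMs: 650 },
  { corridor: "corridor-b", latencyMs: 800 },
];

summarize(demoSamples).forEach((stats, corridor) => {
  console.log(corridor, "p50:", stats.p50, "p99:", stats.p99);
});
```

Tracking both numbers per corridor matters because a single slow tail request barely moves the median but dominates the P99, and the tail is where abandonment concentrates.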