Why Real ACL Hot Reload Should Not Make Every Connection Pay for It

Why supporting hot rule updates is not enough for a high-performance edge kernel, and how we separated slow-path rule installation from fast-path rule matching through stable snapshots, lazy metadata lookup, and safe retirement of old rule sets.

When many systems talk about rule updates, the first questions are usually simple ones.

  • Can updates happen without downtime?
  • Can they take effect in real time?
  • Can the control plane push them quickly?

All of that matters. But if the goal is a genuinely high-performance edge server kernel, then merely supporting hot updates is nowhere near enough.

Because in production, what really determines the experience is not whether rules can be changed. It is this:

When rules are changing frequently, does the system still avoid slowing down every connection that passes through it?

That is harder than many people expect.

A rule system naturally faces two very different kinds of pressure at once.

  1. A slow path: rule change, compilation, installation, and replacement
  2. A fast path: rule matching, connection judgment, and object selection

If the implementation is not restrained enough, the two paths start contaminating each other.

  • Updates hold locks too heavily and matching threads line up behind them.
  • To support complex rules, every connection pays for metadata lookups that are not always needed.
  • Rule replacement creates half-switched windows between old and new rule sets.
  • A rule update succeeds functionally, yet overall performance falls because management logic slowed the hot path.

This is not a feature-checklist problem. It is a question of whether the rule engine was truly designed for a production hot path.

Our answer was very explicit:

Keep rule updates in the slow path. Protect rule matching in the fast path.

That is what this article is about.

1. The real problem is not whether rules can be hot-updated, but whether hot updates pollute the matching path

In many implementations, rule systems drift naturally toward a direction that looks reasonable at first but becomes expensive in practice.

  • Update shared structures directly when rules change.
  • Let matching threads hold long-lived read locks on those shared structures.
  • Prepare every possible context up front before matching, just in case a rule might need it.
  • Recycle old rule objects immediately after the update finishes.

That style of implementation is easy to write and usually works at first. Once it meets real production traffic, the problems start to surface.

1. Rule updates and rule matching share the same lock

If matching threads hold read locks for long periods, update threads wait on write locks. Once update frequency rises, or rules grow more complex, the outcome is not simply that updates become a bit slower.

  • Matching threads and update threads begin to interfere with one another.
  • Tail latency rises during peak periods.
  • Frequent control-plane changes reduce data-plane stability.

2. Expensive context is prefetched just to avoid branching

Some rules depend on additional context such as process information, device state, source attributes, or the result of external judgment. A lot of systems choose an easy but expensive strategy:

Fetch all of that context up front whether the current rule set actually needs it or not.

The downside is immediate.

  • Connections that only need domain or address checks still pay for extra lookups first.
  • Requests that will never hit a relevant rule still absorb the additional cost.
  • Metadata lookups with timeouts or system-call cost get forced into every connection hot path.

3. Old and new rules lack a stable transition surface

If an update overwrites the old rule object directly and immediately frees the old one, the system can enter a very dangerous state.

  • New rules have started to enter the shared structure.
  • Old rules are still in use by concurrent matching threads.
  • Object lifetime and concurrent reads no longer line up.

On the surface, that looks like a correctness problem. At a deeper level, it is also a performance problem.

Because once a system cannot switch safely, it usually responds by doing one of three things.

  • Add heavier locks,
  • Fall back to more conservative serialized paths,
  • Or make updates slower and rougher than they need to be.

In the end, the fast path is still what pays.

2. Our core principle: let matching threads see a stable snapshot, not a structure that may still be changing

The first thing we changed in this refactor was not to patch the lock story one more time. We changed the model itself:

Matching threads do not touch a rule set that may still be mutating. They read from a stable snapshot.

That means a rule update no longer tries to modify a public shared array in place. It follows a different path.

  1. Compile the new rules on the slow path first.
  2. Build a new rule snapshot.
  3. Atomically switch the visible entry point to that new snapshot.
  4. Let the old snapshot drain naturally.
  5. Recycle old objects only afterward.

The most important point here is not merely that snapshots exist. It is this:

The snapshot is not a control-plane concept. It is a matching-plane concept.

In other words, it is not there to make configuration management look cleaner. It is there so that every connection entering rule matching sees one complete, stable, non-mutating view of the rule set.

That brings three direct benefits.

  • Concurrent matching no longer needs to hold long-lived rule read locks.
  • Update threads do not need to wait for all matching threads to finish before preparing new rules.
  • The retirement of old rules becomes a controlled lifecycle instead of a risky overwrite.
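As a minimal sketch of this publish-and-read model (the `RuleSnapshot` and `Engine` names are illustrative, not our kernel's actual types): matching threads load one immutable snapshot pointer and never take a rule lock, while the slow path builds a complete snapshot off to the side and makes it visible with a single atomic swap.

```go
package main

import "sync/atomic"

// RuleSnapshot is immutable after publication; matching threads only
// ever read a snapshot that can no longer change.
type RuleSnapshot struct {
	version int
	rules   []func(conn string) bool
}

type Engine struct {
	current atomic.Pointer[RuleSnapshot]
}

// Match reads whichever snapshot was current at entry. No lock is
// held, and a concurrent Publish cannot mutate the view being read.
func (e *Engine) Match(conn string) bool {
	snap := e.current.Load()
	if snap == nil {
		return false
	}
	for _, r := range snap.rules {
		if r(conn) {
			return true
		}
	}
	return false
}

// Publish is the slow path: the snapshot is fully built before the
// single atomic swap makes it visible. The retired snapshot is
// returned so its lifecycle can be managed separately.
func (e *Engine) Publish(snap *RuleSnapshot) *RuleSnapshot {
	return e.current.Swap(snap)
}
```

Note that `Publish` returns the old snapshot rather than freeing it; draining and reclaiming it is a separate, later step.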

This may sound like a concurrency detail, but it determines whether a system can stay stable when high concurrency and high-frequency rule updates happen at the same time.

3. The second key point: expensive context is no longer prefetched by default, but triggered lazily only when a rule actually needs it

Making the switch between old and new rules safe is not enough by itself.

If the matching path still prepares a large amount of context on every connection even when most of it will never be used, then the fast path is still wasting work.

So we made a second optimization:

Expensive metadata lookup moved from default execution to on-demand execution when a rule truly requires it.

The logic behind that is very simple.

Not every connection needs process information. If the rules currently being matched care only about source address, destination address, domain name, port, or inbound tags, then there is no reason to perform an additional process-information lookup before any process-related condition is even encountered.

It may sound like just one skipped step, but the impact is large.

That is because these kinds of lookups usually share several properties.

  • They are much more expensive than ordinary in-memory checks.
  • They may involve deeper system-level search.
  • They need timeout protection.
  • They amplify tail latency under high concurrency.

If they become part of the default pre-match path for every connection, then even when process-related conditions are rarely used, the system is still paying for them constantly.

Our design did not remove advanced process-aware capability. Instead, it made the capability self-declared:

Rules declare whether they need that class of context, and only rules that truly depend on it trigger the extra lookup.
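A sketch of that self-declaration, under assumptions: the `Rule`, `MatchCtx`, and `ProcessInfo` types here are hypothetical stand-ins, and the injected `lookupProc` represents the expensive system-level call. The lookup runs at most once per connection, and only when the first process-aware rule is actually evaluated.

```go
package main

// ProcessInfo stands in for expensive, system-level metadata.
type ProcessInfo struct{ Name string }

// Rule declares up front whether it needs process context.
type Rule struct {
	NeedsProcess bool
	Match        func(dest string, proc *ProcessInfo) bool
}

// MatchCtx carries per-connection state plus the injected lookup.
type MatchCtx struct {
	dest       string
	lookupProc func() *ProcessInfo // the expensive call
	proc       *ProcessInfo
	looked     bool
}

// process performs the lookup lazily and caches the result, so even
// several process-aware rules pay for it only once.
func (c *MatchCtx) process() *ProcessInfo {
	if !c.looked {
		c.proc = c.lookupProc()
		c.looked = true
	}
	return c.proc
}

// MatchRules pays the process lookup only if a rule declares it.
func MatchRules(rules []Rule, ctx *MatchCtx) bool {
	for _, r := range rules {
		var p *ProcessInfo
		if r.NeedsProcess {
			p = ctx.process()
		}
		if r.Match(ctx.dest, p) {
			return true
		}
	}
	return false
}
```

A connection that only hits address or domain rules never invokes `lookupProc` at all.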

That kind of lazy evaluation is especially valuable in edge scenarios.

  • Most ordinary rules travel a shorter path.
  • A smaller number of high-complexity rules still retain full expressive power.
  • The cost of advanced capability no longer spills outward onto every request.

A truly high-performance rule system never lays every capability across the hot path at once. It makes the hot path pay only for what it must pay for at that moment.

4. The third key point: old rules must not disappear immediately. They have to wait until concurrent readers drain naturally

One of the easiest things to overlook in rule replacement is object retirement.

"The new rules are live" is not the same thing as "nobody is still reading the old rules."

As long as concurrent matching threads still hold references to the old rule snapshot, reclaiming the old objects immediately is unsafe.

But if the whole matching path is locked just to prevent that, performance collapses back toward the same bottleneck.

So we chose a more restrained model:

An old snapshot enters retirement, but is not reclaimed aggressively. It leaves only after the final concurrent reader is gone.

That mechanism matters enormously because it gives the system an important property:

Rule switching can complete quickly while the lifetime of old rules remains safe, complete, and waitable.

That avoids two equally undesirable extremes.

  1. Making switching heavily serialized with large locks purely for safety
  2. Making object lifetime so aggressive in the name of speed that correctness becomes fragile
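One way to sketch that retirement model is a per-snapshot reader count; this is an illustration of the lifecycle, not our actual mechanism, and the `Snapshot` type with its `close` hook is hypothetical. Retirement only marks the snapshot; whichever path observes the last reader gone performs the deferred cleanup.

```go
package main

import "sync"

// Snapshot carries a reader count; old rule objects are never freed
// while a concurrent read is still inside.
type Snapshot struct {
	mu      sync.Mutex
	readers int
	retired bool
	closed  bool
	close   func() // releases rule-backed resources
}

// Acquire is called by a matching thread before reading the snapshot.
func (s *Snapshot) Acquire() {
	s.mu.Lock()
	s.readers++
	s.mu.Unlock()
}

// Release drops the reference; the final reader of a retired snapshot
// performs the deferred cleanup.
func (s *Snapshot) Release() {
	s.mu.Lock()
	s.readers--
	doClose := s.retired && s.readers == 0 && !s.closed
	if doClose {
		s.closed = true
	}
	s.mu.Unlock()
	if doClose && s.close != nil {
		s.close()
	}
}

// Retire marks the snapshot. It closes immediately only if no reader
// is inside; otherwise cleanup waits for the readers to drain.
func (s *Snapshot) Retire() {
	s.mu.Lock()
	s.retired = true
	doClose := s.readers == 0 && !s.closed
	if doClose {
		s.closed = true
	}
	s.mu.Unlock()
	if doClose && s.close != nil {
		s.close()
	}
}
```

The switch itself stays fast because `Retire` never blocks; only the reclamation is deferred.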

A mature hot-update system is not one that merely switches quickly. It is one where:

  • Switching is fast,
  • the read path stays stable,
  • old-object retirement has boundaries,
  • and concurrent reads and writes do not trample one another.

5. The fourth key point: making a rule live is not just replacing an array. It is an installation process with scope and lifecycle

There is another rule-system problem that is often underestimated:

A rule update is not just text replacement. It is the installation and switching of runtime objects.

That means an update has to handle at least four things.

  1. Compile the new rules into executable matching objects first.
  2. Bind those new rules to the correct scope.
  3. Keep the matching view complete while old and new rules are switching.
  4. Shut down resources correctly when old rules retire.

If those steps are not separated clearly, then under high-frequency change the system quickly drifts into inconsistency.

In this refactor, we turned updates into a clearer pipeline:

  • Compile the rules on the slow path first,
  • bind them into the correct scope,
  • publish them into a new snapshot view,
  • mark the old snapshot as retired and wait for it to drain,
  • and only then close the old rule objects.

That means a rule update is no longer "push some configuration in and see what happens." It becomes a runtime switch with lifecycle management.
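The pipeline above can be sketched as one orchestrating function. Everything here is hypothetical scaffolding (`Installer`, `CompiledRules`, and the injected hooks stand in for the real publish, drain, and close mechanisms); the point is the strict ordering of the steps.

```go
package main

// CompiledRules stands in for executable rule objects bound to a scope.
type CompiledRules struct{ scope string }

// Installer wires the lifecycle steps together via injected hooks.
type Installer struct {
	publish func(*CompiledRules) *CompiledRules // atomic swap, returns old
	drain   func(*CompiledRules)                // blocks until readers leave
	close   func(*CompiledRules)                // frees backing resources
}

// Install runs the full lifecycle for one rule update.
func (i *Installer) Install(text, scope string) {
	// 1+2. Compile on the slow path and bind to the correct scope.
	//      (Parsing of text is elided in this sketch.)
	_ = text
	compiled := &CompiledRules{scope: scope}
	// 3. Publish: one atomic switch; the matching view is never partial.
	old := i.publish(compiled)
	if old == nil {
		return // first install: nothing to retire
	}
	// 4. Wait for concurrent readers of the old snapshot to drain.
	i.drain(old)
	// 5. Only then close the old rule objects.
	i.close(old)
}
```

The ordering is the contract: close never runs before drain, and drain never runs before publish.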

The value of that design is not only performance. It is predictability.

Because what production systems fear most is not that a rule cannot be changed. It is this:

  • The update looks successful, yet some connections are still reading an old structure.
  • Occasional anomalies appear during hot update.
  • Rules appear switched while their backing resources are not ready.
  • Behavior becomes inconsistent under certain concurrency windows.

A production-ready hot-update rule system should not depend on luck.

6. Why this goes deeper than ordinary ACL hot reload

Plenty of systems can claim that they support rule updates. But "supports updates" and "updates without harming the hot path" are very different things.

To do this well, a system has to solve several problems at once.

1. Matching comes first, not updating

The primary responsibility of a rule engine is not to make the control plane pleasant. It is to let every connection match stably, quickly, and concurrently.

2. Advanced capability must not contaminate every request by default

Process recognition, extended context, and richer metadata are all valid capabilities, but they should not become a fixed tax on every connection.

3. Lifecycle must exist independently of configuration text

What participates in matching is not JSON, not a form, and not a fragment of configuration text. It is a runtime object with a real lifecycle.

4. Hot updates cannot rely on one giant lock for correctness

A big lock is easy to understand at small scale. Under high concurrency and frequent updates, it is often one of the first things to damage tail latency.

That is why we care much more about this:

It is acceptable for the update path to be somewhat heavier. The matching path, however, must remain lighter, steadier, and less contended.

That difference is exactly what separates a kernel-style design from the mindset of an ordinary management backend.

7. What this really improves is not just average speed, but stability at peak load

A lot of people evaluate rule engines by average matching speed alone.

But in production, the truly dangerous problems are usually not averages. They are situations like these.

  • The control plane changes a rule and tail latency rises on the data plane.
  • One class of complex rule is enabled and suddenly every connection gets heavier.
  • Frequent rule updates produce visible jitter in concurrent matching.
  • The instant old rules switch to new rules becomes an unstable window.

The optimization set we built improves stability in exactly those scenarios.

  • Updates stay on the slow path.
  • Reads go through stable snapshots.
  • Complex metadata is triggered only on demand.
  • Old rules retire only after readers have drained.

That means the system is not simply making one operation faster. It is doing something more important:

Even in a real environment where rules keep changing, the fast path stays as short, stable, and predictable as possible.

That is the kind of performance an edge platform actually needs.

8. Why we think this is worth writing about in a technical blog

Because this is not a surface feature.

It reflects a very explicit engineering position:

Runtime governance capability inside an edge server kernel must serve the hot path, not force the hot path to pay for governance.

Rules, ACLs, access policies, and runtime judgments are all important in enterprise environments. But a mature platform should not respond to stronger capability by quietly making every connection slower.

What we value more is something else:

Capability keeps expanding, while the hot path stays as protected as possible from that expansion.

That is why the value of this refactor is not merely that the rule system got faster. It is that:

  1. Matching threads see stable snapshots rather than shared structures in motion.
  2. Expensive context is no longer prefetched by default, but triggered lazily on rule demand.
  3. Old and new rules switch with a full lifecycle instead of risky in-place overwrite.
  4. High-frequency updates and high-concurrency matching can coexist with much less interference.

This is not the easiest capability to reduce to one marketing sentence. But it is exactly the kind of thing that speaks to people who really understand systems.

Anyone who has built a high-concurrency production rule system knows immediately where the difficulty lies, and why this kind of implementation raises the level of the platform.

9. If we had to summarize this optimization in one sentence

The right summary is not "we support hot rule updates."

It is this instead:

We turned rule updating into a slow-path behavior and protected rule matching as a fast-path behavior, so that even while rules keep changing, the system still avoids making every connection pay extra for management activity.

That is what we believe deserves to be called performance engineering.
