The KEDA HTTP Addon project contains three major components: the operator, scaler and interceptor.
Of these, the interceptor is the only component that sits in the critical path of all incoming HTTP requests, and we run the interceptors as a fleet that is horizontally scaled by software.
We're going to focus on how we ensure that any interceptor replica can route an incoming request to the correct backing application at any time.
The interceptor component is designed to run in a Deployment that KEDA automatically scales. This high-level design has a few implications:

- Any interceptor replica must be able to route any incoming request to the correct backing Service and port.
- Since the interceptor needs to look up the route in the routing table before forwarding any request (or returning an error code), lookups need to be as fast as possible. That means storing the routing table in memory and keeping each interceptor's in-memory copy up to date with the central copy (we sketch what that in-memory copy might look like below).
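To give a feel for what that in-memory copy could look like, here is a minimal Go sketch. The Target and Table names are ours for illustration, not the addon's actual types: a map keyed by the incoming host, guarded by a read/write mutex so that many concurrent request handlers can read while the update loop occasionally swaps in a fresh copy.

package routing

import "sync"

// Target identifies the backing Service and port that a request should be
// forwarded to. (Hypothetical type: the real addon stores richer metadata.)
type Target struct {
    Service string
    Port    int
}

// Table is an in-memory routing table keyed by the incoming host.
// Lookups vastly outnumber updates, so a sync.RWMutex keeps the hot path cheap.
type Table struct {
    mu     sync.RWMutex
    routes map[string]Target
}

func NewTable() *Table {
    return &Table{routes: make(map[string]Target)}
}

// Lookup runs on the critical path of every incoming request.
func (t *Table) Lookup(host string) (Target, bool) {
    t.mu.RLock()
    defer t.mu.RUnlock()
    tgt, ok := t.routes[host]
    return tgt, ok
}

// Replace swaps in a freshly fetched copy of the central routing table.
func (t *Table) Replace(routes map[string]Target) {
    t.mu.Lock()
    defer t.mu.Unlock()
    t.routes = routes
}

With a structure like this, a lookup is just a mutex-protected map access, which is cheap enough to sit in the critical path of every request.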
We do this with a relatively simple event loop, outlined in the following (Go-like) pseudocode:
// Fetch the initial routing table and signal readiness.
table = fetch_table_from_kubernetes()
report_alive()
ticker = start_ticker(every_1_second)
for {
    select {
    // A change notification arrived: replace the in-memory copy immediately.
    case new_table := <-kubernetes_events_chan:
        table = new_table
    // The ticker fired: do a full refresh from the central copy.
    case <-ticker:
        table = fetch_table_from_kubernetes()
    }
}
A few important points to note about this event loop:

- kubernetes_events_chan receives notifications about changes to the central routing table in near real time. When we get a change notification, we immediately update the in-memory copy.
- ticker fires a signal every second, at which point we do a full refresh of the routing table. This mechanism ensures that all interceptor replicas receive changes to the routing table within one second of the change being made to the central copy.

The KEDA HTTP Addon runs a fleet of interceptors and enlists KEDA to actively scale the fleet. This means that as we send more HTTP traffic to the cluster, we expect the interceptors to automatically scale up. This behavior is one of the most important features of the HTTP Addon.
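Coming back to the event loop itself: fleshed out slightly in Go, and reusing the hypothetical Table and Target types from the earlier sketch, it could look roughly like this. The fetchTable function and events channel here stand in for the addon's real Kubernetes client calls and watch stream; this is a sketch, not the actual implementation.

package routing

import "time"

// runUpdateLoop keeps a local Table in sync with the central routing table.
func runUpdateLoop(
    table *Table,
    fetchTable func() (map[string]Target, error),
    events <-chan map[string]Target,
) {
    ticker := time.NewTicker(1 * time.Second)
    defer ticker.Stop()

    for {
        select {
        case newRoutes := <-events:
            // A change notification arrived: apply it immediately.
            table.Replace(newRoutes)
        case <-ticker.C:
            // Once per second, do a full refresh as a safety net in case
            // a notification was missed.
            if newRoutes, err := fetchTable(); err == nil {
                table.Replace(newRoutes)
            }
        }
    }
}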
We've built this event loop into the interceptors to ensure that there can be thousands [1] of them running at once, and they all stay up to date with the central routing table -- data that they need to do their job.
[1] As we see in the pseudocode above, each interceptor issues requests to the Kubernetes API, so as we scale them out, we generate a correspondingly larger, constant stream of traffic to the cluster's API server. If you scale beyond the low thousands of replicas, you would need to add an intermediate layer of caching between the interceptors and the API to ensure that you don't crash the cluster control plane.