When you provision workload in the cloud to serve an application, having a load balancer at the front end of the applications’ tier is almost always a must, to ensure that users’ requests are redirected to the workload instances that have the capacity to serve the request with better performance.
Load balancing in Google Cloud Platform GCP is a fully scalable and redundant managed service offered in different flavors (global external, regional external, regional internal) This article focuses on the global external load balancer. Figure 1 below illustrates the high level architecture of Google global load balancer that will be discussed in more details in this article.
Generally, Load balancing in the cloud offers two main functions:
Its, obvious from the name this type of LB, it is external (client/user facing) and it’s a global, also, it’s an application layer (HTTP(S)) type of LB. As highlighted in my previous blog “Networking in AWS vs. Google Cloud Platform – Design Considerations” with Google external global load balancer all what you need is a single IP to front end your application stack that cloud be distributed globally without the need to deploy a load balancer per region!
Since the Google global LB is not a single box, it’s not a VM instance and it’s not a cluster, so how does it work and where does it reside?
GCP global LB is constructed of the following components
GCP LB is capable to scale quickly and effortlessly, according to GCP “Cloud Load Balancing is built on the same front-end serving infrastructure that powers Google. It supports 1 Million+ queries per second with consistent high performance and low latency. Traffic enters Cloud Load Balancing through 80+ distinct global load balancing locations, maximizing the distance traveled on Google’s fast private network backbone.”
If LB for encrypted traffic (HTTPS), the target proxy requires signed certificate in order to terminate the SSL/TLS session and the proxy will re initiate a new session with the back for the session/request. If the new connection to the back is also encrypted, then the target VM instance need to have a certificate installed as well.
Once, the requested URL is received by the proxy, the URL policy map will be evaluated. Unless its configured for SSL or TCP proxy, it will be sent directly to the backend without URL map check.
The URL map illustrated in the figure below, shows an example of a website or service that offer uploading photos (static content), in which traffic with the /photo/* is redirected to a multi-regional cloud storage bucket. While traffic destined to multimedia video content, is distributed across different instances’ groups if its HD or non-HD video content.
Although, the processing of SSL can be CPU intensive, especially when the used ciphers are not CPU efficient, it is always recommended to use secure sessions from the proxy to the backend instances and avoid sending the traffic over unencrypted TCP as it it typically will reduce the level of security between the GCP global load balancer and the backend instances. In addition, SSL proxy may handle HTTPS but it is recommended to create HTTPS target proxy with at least one signed SSL certificate installed on the target HTTPS proxy.
GCP recently introduce the support of QUIC for HTTPS load balancing, which is a UDP-based encrypted transport protocol optimized for HTTPS, that is used to deliver traffic to Google products such as Google Web Search, YouTube, and other services that you probably use over Google Chrome. QUIC offers faster and more reliable web-based connections, as it won’t let a problem with one request to slow down other streams in the same connection, even on an unreliable connection. At the time of this blog writing, GCP is the first major public cloud to offer QUIC support for its HTTPS load balancers.
source and more details refer to: Introducing QUIC support for HTTPS load balancing
QUIC: A UDP-Based Multiplexed and Secure Transport draft-ietf-quic-transport-13
Moreover, GCP Global HTTP/HTTPS LB allows you to create custom request headers, in case the default ones are not sufficient or do not meet your requirements.
“User-defined request headers allow you to specify additional headers that the load balancer adds to requests. These headers can include information that the load balancer knows about the client connection, including the latency to the client, the geographic location of the client’s IP address, and parameters of the TLS connection.User-defined request headers are supported for backend services associated with HTTP(S) Load Balancers”
At the time of this blog writing, this capability is in Beta release, which means it is not covered by any SLA or deprecation policy and might be subject to backward-incompatible changes.
When it comes to instances health check, typically the GCP LB health checks are used to decide if an instance(s) is “healthy” and functioning. Functioning here might be checked using an application layer probe such as HTTP(s) probe. With GCP LB, If you check the logs on the instance, you may notice that the health check polling is happening more frequently than what you may have configured. This is because GCP LB offer the ability to create redundant copies of each health checker, which are used to probe your instances. If any health checker fails, a redundant one can take over without delay.
As discussed in the previous blog “Networking in AWS vs. Google Cloud Platform – Design Considerations”, each VPC in GCP has a single virtual software distributed FW, in order to make sure the LB and health check can communicate with the intended VMs in the respective instance group, traffic needs to be explicitly allowed by the FW rules. “You must create a firewall rule that allows traffic from 18.104.22.168/22 and 22.214.171.124/16 to reach your instances. These are IP address ranges that the load balancer uses to connect to backend instances. This rule allows traffic from both the load balancer and the health checker, also, keep in mind that GCP firewall rules block and allow traffic at the instance level, not at the edges of the network. They cannot prevent traffic from reaching the load balancer itself”
One of the key aspects of connectivity design with GCP LB is, that by default, HTTP(S) load balancing distributes requests evenly among available instances. However, some applications behind a NAT device they will appear as sourced from the same IP, also stateful servers used by ads, gaming applications etc. may go through multiple applications’ tiers requests before the user end up on the targeted instance. When the session is disconnected due to poor quality or a moving mobile user, it can lead to bad user experience. That’s why considering “session affinity” to identify requests from a user by the client IP or preferably by the value of a cookie to re-direct client request to the same instance in a consistent manner, assuming the intended instance is healthy and has capacity to handle the request.
Beware that, when auto scaling functionality adds or removes instances within an instance group, technically the backend service may reallocate load, and the target instance may move, therefore to reduce the impact of such situation, you need to ensure that the initial minimum number of instances provisioned by the auto scaling is sufficient to handle the anticipated load, and auto scaling is kicked-off only when there is an unexpected load increase. However, this may not always be the case, because it requires good understanding of the expect load and required workload to handle it, also, the load may not be consistent during the day or day of the week in which the minimum number of instances can not be pre-provisioned to cover the expected load for the entire week (this is where auto-scaling is required).
Furthermore, Kubernetes Engine offers integrated support of Google HTTP LB, where an ingress controller can be created in a cluster, and then Kubernetes Engine creates an HTTP(S) load balancer and configures it to route traffic to application, also, path matcher can be leveraged for more specific requests routing into multiple containers with different images or functions. According to GCP “If you are exposing an HTTP(S) service hosted on Kubernetes Engine, HTTP(S) load balancing is the recommended method for load balancing.”
Technically, the system or cluster admin, need to create a Service resource of type NodePort, to make the frontend Pods deployment reachable within the container cluster, such as Web frontend.
Then with the ingress controller, Kubernetes creates an Ingress resource on the cluster. which is responsible for creating an HTTP(S) Load Balancer to route all external HTTP(s) traffic to the frontend – Web NodePort Service that was exposed.
One of the key design consideration here, is that the LB is Node/VM aware only (typically GCP LB point to instance group(s), that contain the backend cluster nodes), while from Containerized application architecture point of view, almost always VM:Pod is not 1:1. Therefore this may introduce imbalanced load distribution issue here.
Consequently, as illustrated in the figure above, if traffic evenly distributed between the two available nodes (50:50) with Pods part of the targeted Service, the Pod on the left node will handle 50% of the traffic while each Pod hosted by the right node will receive about 25%. Kubernetes Kube-Proxy (IPTebles at kernel level) here deals with the distribution of the traffic to help considering all the Pod part of the specific Service across all nodes.
As it shown in the figure above, because the backend Service, (IPTables) can randomly pick a Pod that potentially residing in a different node, there will be an extra network hop, for the incoming and return traffic. As result, this will create what is commonly known as “Traffic Trombone”.
Also, you may have noticed that there are source and destination NAT has been done. The destination NAT, in order to send traffic to the selected Pod, while the source NAT, is to ensure the traffic will come return to the same originally selected node by the LB, in order to do source NAT to the LB IP before sending the traffic back to the client otherwise there will be mismatch in the session (original client request is toward the LB IP).
Practically, this imbalance issue may not be always a big problem if there is ration of VM:Pod is well balanced and the added latency is not an problem.
That being said, the good news is, recently GCP announced (at GCP Next18) a new capability in GCP load balancing with Kubernetes, which is ‘container native load balancing’ using network endpoint group, in which the GCP LB will be container aware, this means the LB will target containers directly and not the node/VM. This native container support, offer more efficient load distribution, as well as more accurate health checks (container level visibility), without the need for multiple NATing. From external clients’ point of view, this will provide better users experience, due to the optimized data path as there is no proxy in between which reduces the possible latency of multiple hopes packets’ forwarding.
For GCP cloud service consumers who need to use multiple Google Kubernetes Engine clusters to host their applications, for: better resilience, scalability, isolation, compliance as well as archiving low-latency applications access, from anywhere around the world.
GCP introduced the support of Multi-cluster Ingress controller, with a new CLI tool called “kubemci” that help to automatically deploy an ingress controller using Google Cloud Load Balancer to serve multi-cluster Kubernetes Engine environments (Federated or standalone clusters). As result customers will be able to leverage GCP global LB along with the multiple Kubernetes Engine clusters distrusted across different regions, from the closest cluster using a single GCP LB frontend anycast IP address.