When you provision workloads in the cloud to serve an application, a load balancer in front of the application tier is almost always a must. It ensures that users’ requests are directed to the workload instances that have the capacity to serve them with the best performance.
Load balancing in Google Cloud Platform (GCP) is a fully scalable and redundant managed service offered in different flavors (global external, regional external, regional internal). This article focuses on the global external load balancer. Figure 1 below illustrates the high-level architecture of the Google global load balancer, which is discussed in more detail in this article.
Generally, load balancing in the cloud offers two main functions:
As the name suggests, this type of LB is external (client/user facing) and global. It is also an application-layer (HTTP(S)) LB. As highlighted in my previous blog “Networking in AWS vs. Google Cloud Platform – Design Considerations”, with the Google external global load balancer all you need is a single IP address to front-end an application stack that could be distributed globally, without the need to deploy a load balancer per region!
The Google global LB is not a single box, a VM instance, or a cluster. So how does it work, and where does it reside?
The GCP global LB is built from the following components:
The GCP LB can scale quickly and effortlessly. According to GCP, “Cloud Load Balancing is built on the same front-end serving infrastructure that powers Google. It supports 1 Million+ queries per second with consistent high performance and low latency. Traffic enters Cloud Load Balancing through 80+ distinct global load balancing locations, maximizing the distance traveled on Google’s fast private network backbone.”
If the LB handles encrypted traffic (HTTPS), the target proxy requires a signed certificate in order to terminate the SSL/TLS session, and the proxy then initiates a new session with the backend for the request. If the new connection to the backend is also encrypted, the target VM instances need to have a certificate installed as well.
Once the proxy receives the requested URL, the URL map is evaluated. The exception is when the load balancer is configured as an SSL or TCP proxy: in that case the traffic is sent directly to the backend without a URL map check.
The URL map illustrated in the figure below shows an example of a website or service that offers photo uploads (static content), in which traffic matching /photo/* is directed to a multi-regional Cloud Storage bucket. Traffic destined for multimedia video content is distributed across different instance groups, depending on whether it is HD or non-HD video content.
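To make the path-matching idea more concrete, here is a minimal Python sketch of how a URL map style lookup could behave. It is not GCP's implementation. The /photo/* rule comes from the example above, while the /video/hd/* and /video/sd/* paths, the backend names, and the longest-match tie-break are assumptions made purely for illustration.

```python
# Hypothetical path rules: only "/photo/*" comes from the article's example;
# the video paths and all backend names are illustrative assumptions.
PATH_RULES = {
    "/photo/*": "photo-bucket",         # multi-regional Cloud Storage bucket
    "/video/hd/*": "video-hd-backend",  # instance group serving HD video
    "/video/sd/*": "video-sd-backend",  # instance group serving non-HD video
}
DEFAULT_BACKEND = "web-backend"  # used when no path rule matches


def pick_backend(path: str) -> str:
    """Return the backend for a request path, preferring the longest matching rule."""
    best_rule = None
    for pattern in PATH_RULES:
        if pattern.endswith("/*"):
            # "/photo/*" matches anything that starts with "/photo/"
            matched = path.startswith(pattern[:-1])
        else:
            matched = path == pattern
        if matched and (best_rule is None or len(pattern) > len(best_rule)):
            best_rule = pattern
    return PATH_RULES[best_rule] if best_rule else DEFAULT_BACKEND


if __name__ == "__main__":
    for request_path in ("/photo/cat.jpg", "/video/hd/clip.mp4", "/index.html"):
        print(request_path, "->", pick_backend(request_path))
```

Anything that matches no rule falls through to the default backend, which mirrors the role of a default service in a URL map.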
Although SSL processing can be CPU intensive, especially when the ciphers in use are not CPU efficient, it is always recommended to use secure sessions from the proxy to the backend instances. Sending the traffic over unencrypted TCP typically reduces the level of security between the GCP global load balancer and the backend instances. In addition, although an SSL proxy can handle HTTPS, it is recommended to create an HTTPS target proxy with at least one signed SSL certificate installed on it.
Moreover, the GCP global HTTP/HTTPS LB allows you to create custom request headers in case the default ones are not sufficient or do not meet your requirements.
“User-defined request headers allow you to specify additional headers that the load balancer adds to requests. These headers can include information that the load balancer knows about the client connection, including the latency to the client, the geographic location of the client’s IP address, and parameters of the TLS connection. User-defined request headers are supported for backend services associated with HTTP(S) Load Balancers.”
At the time of writing, this capability is in Beta release, which means it is not covered by any SLA or deprecation policy and might be subject to backward-incompatible changes.
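As a rough illustration of the idea, the sketch below expands a header template with values the load balancer knows about the client connection. It is a Python approximation under stated assumptions, not the GCP implementation: the placeholder names (`client_region`, `client_city`, `client_rtt_msec`, `tls_sni_hostname`), the header names, and the choice to substitute an empty string for unknown values are all assumptions for illustration, so check the GCP documentation for the variables that are actually supported.

```python
class _EmptyIfMissing(dict):
    """Mapping that yields an empty string for values the proxy does not know."""

    def __missing__(self, key):
        return ""


def expand_header(template: str, connection: dict) -> tuple:
    """Split 'Name:value-template' and fill {placeholders} from connection data."""
    name, _, value_template = template.partition(":")
    value = value_template.format_map(_EmptyIfMissing(connection))
    return name.strip(), value.strip()


if __name__ == "__main__":
    # Hypothetical connection facts for one client request.
    connection = {"client_region": "US", "client_city": "Mountain View",
                  "client_rtt_msec": 42}
    print(expand_header("X-Client-Geo-Location:{client_region},{client_city}",
                        connection))
    print(expand_header("X-Client-RTT-Msec:{client_rtt_msec}", connection))
```

The backend then reads these headers like any other request header, which keeps the geolocation or latency logic out of the application itself.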
When it comes to instance health checks, the GCP LB health checks are used to decide whether an instance is “healthy” and functioning. “Functioning” here might be verified using an application-layer probe such as an HTTP(S) probe. With the GCP LB, if you check the logs on an instance, you may notice that health check polling happens more frequently than you configured. This is because the GCP LB creates redundant copies of each health checker, all of which probe your instances. If any health checker fails, a redundant one can take over without delay.
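The effect on the instance logs can be estimated with simple arithmetic. The Python sketch below assumes each redundant health checker probes independently at the configured interval. The number of redundant probers is not stated by the source, so the value used here is purely hypothetical.

```python
def observed_probes_per_minute(check_interval_sec: float, prober_count: int) -> float:
    """Estimate probes per minute seen by one instance.

    Assumes every redundant prober sends one probe per configured interval.
    """
    if check_interval_sec <= 0:
        raise ValueError("check_interval_sec must be positive")
    if prober_count < 1:
        raise ValueError("prober_count must be at least 1")
    return prober_count * 60.0 / check_interval_sec


if __name__ == "__main__":
    configured_interval = 5  # seconds between probes, as configured
    hypothetical_probers = 3  # assumption: the real number is not documented here
    print("expected from config:", observed_probes_per_minute(configured_interval, 1))
    print("seen in instance logs:",
          observed_probes_per_minute(configured_interval, hypothetical_probers))
```

If your health check endpoint does real work (a database query, for example), this multiplier is worth keeping in mind when sizing it.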
As discussed in the previous blog “Networking in AWS vs. Google Cloud Platform – Design Considerations”, each VPC in GCP has a single distributed virtual software firewall. To make sure the LB and the health checkers can communicate with the intended VMs in the respective instance group, the traffic needs to be explicitly allowed by the firewall rules. “You must create a firewall rule that allows traffic from 130.211.0.0/22 and 35.191.0.0/16 to reach your instances. These are IP address ranges that the load balancer uses to connect to backend instances. This rule allows traffic from both the load balancer and the health checker.” Also keep in mind that GCP firewall rules “block and allow traffic at the instance level, not at the edges of the network. They cannot prevent traffic from reaching the load balancer itself.”
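When troubleshooting, it can help to check whether a source address seen in an instance's logs belongs to the load balancer or its health checkers. The Python sketch below uses the standard `ipaddress` module and assumes the ranges GCP documents for this purpose, 130.211.0.0/22 and 35.191.0.0/16. The function name is hypothetical, and the ranges should be confirmed against the current GCP documentation before relying on them.

```python
import ipaddress

# Assumption: source ranges GCP documents for HTTP(S) LB and health check traffic.
LB_SOURCE_RANGES = [
    ipaddress.ip_network("130.211.0.0/22"),
    ipaddress.ip_network("35.191.0.0/16"),
]


def is_lb_or_health_check_source(ip: str) -> bool:
    """Return True if the address falls inside one of the LB source ranges."""
    address = ipaddress.ip_address(ip)
    return any(address in network for network in LB_SOURCE_RANGES)


if __name__ == "__main__":
    for source in ("130.211.1.7", "35.191.200.1", "203.0.113.9"):
        print(source, is_lb_or_health_check_source(source))
```

Because the LB proxies client connections, the instances see these ranges as the source of traffic, which is why the firewall rule must allow them rather than the end users' addresses.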
One key aspect of connectivity design with the GCP LB is that, by default, HTTP(S) load balancing distributes requests evenly among the available instances. However, requests from clients behind a NAT device appear to come from the same source IP, and stateful servers, such as those used by ad-serving and gaming applications, may handle several requests across multiple application tiers before the user ends up on the targeted instance. When a session is dropped because of poor connection quality or a moving mobile user, the result can be a bad user experience. That is why you should consider “session affinity”, which identifies requests from a user by the client IP or, preferably, by the value of a cookie, and consistently directs that client's requests to the same instance, assuming the intended instance is healthy and has the capacity to handle the request.
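The difference between client IP affinity and cookie affinity can be sketched in a few lines of Python. This is only an illustration of the concept under simplifying assumptions, not how GCP implements affinity: the hash-and-modulo mapping, the function names, and the instance names are all hypothetical.

```python
import hashlib
from typing import Optional, Sequence


def _stable_hash(key: str) -> int:
    """Deterministic hash so the same key always maps to the same number."""
    digest = hashlib.sha256(key.encode("utf-8")).digest()
    return int.from_bytes(digest[:8], "big")


def pick_instance(instances: Sequence[str], client_ip: str,
                  cookie: Optional[str] = None) -> str:
    """Pick a backend by cookie value when present, otherwise by client IP."""
    if not instances:
        raise ValueError("no healthy instances available")
    affinity_key = cookie if cookie else client_ip
    return instances[_stable_hash(affinity_key) % len(instances)]


if __name__ == "__main__":
    healthy = ["instance-a", "instance-b", "instance-c"]  # hypothetical names
    nat_ip = "198.51.100.7"  # two users behind the same NAT share this address
    # Client IP affinity: both users are always sent to the same instance.
    print(pick_instance(healthy, nat_ip), pick_instance(healthy, nat_ip))
    # Cookie affinity: each user is tracked individually, independent of the IP.
    print(pick_instance(healthy, nat_ip, cookie="user-1-session"))
    print(pick_instance(healthy, nat_ip, cookie="user-2-session"))
```

With client IP affinity, every user behind the same NAT address is pinned to one instance, which is one reason a cookie is the preferable key.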
Beware that when autoscaling adds or removes instances within an instance group, the backend service may reallocate load, and the target instance for a given client may change. To reduce the impact, ensure that the minimum number of instances provisioned by autoscaling is sufficient to handle the anticipated load, so that autoscaling kicks in only when there is an unexpected load increase. However, this is not always achievable, because it requires a good understanding of the expected load and of the capacity required to handle it. The load may also vary during the day or across the days of the week, in which case a fixed minimum number of instances cannot be pre-provisioned to cover the expected load for the entire week (this is where autoscaling is required).
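To get a feel for why a scaling event can disturb affinity, the Python sketch below counts how many synthetic clients land on a different instance when a group grows from three to four instances. It assumes a naive hash-and-modulo mapping chosen only for simplicity. GCP's actual mapping is not described here and may behave differently, so read the result as an illustration rather than a measurement.

```python
import hashlib


def _stable_hash(key: str) -> int:
    """Deterministic hash so the same key always maps to the same number."""
    digest = hashlib.sha256(key.encode("utf-8")).digest()
    return int.from_bytes(digest[:8], "big")


def instance_index(key: str, instance_count: int) -> int:
    """Naive affinity mapping (assumption): hash of the key modulo group size."""
    return _stable_hash(key) % instance_count


def fraction_remapped(keys, old_count: int, new_count: int) -> float:
    """Share of keys whose instance changes when the group is resized."""
    keys = list(keys)
    moved = sum(
        1 for key in keys
        if instance_index(key, old_count) != instance_index(key, new_count)
    )
    return moved / len(keys)


if __name__ == "__main__":
    # Synthetic session keys standing in for cookie values or client IPs.
    session_keys = [f"session-{n}" for n in range(1000)]
    share = fraction_remapped(session_keys, old_count=3, new_count=4)
    print(f"{share:.0%} of sessions moved after scaling from 3 to 4 instances")
```

Under this naive mapping a large share of sessions moves on every resize, which is the motivation for sizing the minimum instance count so that scaling events stay rare.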
Furthermore, Kubernetes Engine offers integrated support for the Google HTTP(S) LB. When you create an Ingress resource in a cluster, Kubernetes Engine creates an HTTP(S) load balancer and configures it to route traffic to your application. Path matchers can also be leveraged for more specific request routing to multiple containers with different images or functions. According to GCP, “If you are exposing an HTTP(S) service hosted on Kubernetes Engine, HTTP(S) load balancing is the recommended method for load balancing.”