Improving Maintenance and Observability of an HR Platform Running on Istio

AWS
Cloud-Native
Kubernetes

A provider of human resources software that processes more than $10 billion in payroll annually turned to Altoros to upgrade its platform's service mesh while keeping critical components operational during the migration.

About the project

Brief results of the collaboration:

  • A service mesh based on Istio was updated without any downtime, keeping critical HR services secure and operational for 1,400+ organizations.
  • With an upgraded platform, the company was able to improve maintainability and transparency, resuming the development of new services.
  • By employing best practices and recommendations provided by Altoros, the organization can troubleshoot services much more easily and benefit from the flexibility, scalability, and high availability of Amazon EKS.

The customer

Based in the USA, the customer is a full-service human resources (HR) platform provider. The company aims to solve some of the largest HR problems mid-sized organizations face, such as talent and time management, benefits administration, and payroll. The customer’s main product is an HR platform with an open API used by 60+ third-party developers. The platform serves 1,400+ companies nationwide and processes more than $10 billion in annual payroll.

The need

The company's services had been running on Amazon Elastic Kubernetes Service (EKS) since 2018. After two years in production, the in-house team ran into issues upgrading Istio, the platform's service mesh, due to a lack of transparency, official documentation, and focused expertise with the product. This meant that critical services providing observability, traffic management, and security would remain outdated and eventually run into incompatibility issues.

The company turned to Altoros, a certified Kubernetes solutions provider and an Amazon partner, for assistance in updating the service mesh.

The challenges

Under the project, the team at Altoros had to address the following issues:

  • The lack of documentation on upgrading Istio from v1.4 to v1.5 and from v1.5 to v1.6 made it difficult to define a clear path for the update process.
  • The HR platform was constantly under load, meaning Istio had to be updated without any downtime.

The solution

Stage 1. Evaluation

Along with the customer, engineers at Altoros assessed the existing Istio v1.4 service mesh and outlined an update strategy.

There was no official documentation for upgrading Istio from v1.4 to v1.5 or from v1.5 to v1.6, and the shift from a microservices-based control plane in v1.4 to a monolithic model in v1.5 introduced multiple incompatibility issues between versions. For these reasons, our developers opted to bypass v1.5 and upgrade directly to Istio v1.6.

Stage 2. Migration

To ensure a smooth and seamless upgrade, our team performed the update using a canary deployment, a new feature added in Istio v1.6. In this manner, our DevOps experts deployed a new Istio v1.6 control plane that ran in parallel with the existing Istio v1.4 control plane.

While both control planes were up and running, engineers at Altoros were able to shift a portion of the customer's workloads to the Istio v1.6 control plane and monitor the effects. This process enabled our team to run exhaustive tests and resolve any issues before redirecting all of the company's traffic to the upgraded control plane. This way, our developers performed the entire upgrade process without experiencing any downtime.
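The parallel control plane approach described above can be sketched in shell. This is a minimal illustration under stated assumptions, not the exact procedure the team ran: the revision name `canary` and the namespace `hr-test` are hypothetical, and the script only prints the commands unless `DRY_RUN=0` is set. The `istioctl install --set revision=...` and `istio.io/rev` namespace label follow the revision-based upgrade flow documented for Istio v1.6.

```shell
# Sketch of a revision-based (canary) control plane upgrade.
# Assumptions: REVISION name and the "hr-test" namespace are hypothetical.
# With DRY_RUN=1 (the default) commands are printed, not executed.
set -eu

DRY_RUN="${DRY_RUN:-1}"
REVISION="${REVISION:-canary}"

# Print the command in dry-run mode, otherwise execute it.
run() {
  if [ "$DRY_RUN" = "1" ]; then
    printf '+ %s\n' "$*"
  else
    "$@"
  fi
}

# Install a second control plane alongside the existing one.
install_canary_control_plane() {
  run istioctl install --set "revision=${REVISION}"
}

# Point one namespace at the new control plane, then restart its
# workloads so their sidecars are re-injected by the new revision.
migrate_namespace() {
  ns="$1"
  run kubectl label namespace "$ns" istio-injection- "istio.io/rev=${REVISION}"
  run kubectl rollout restart deployment -n "$ns"
}

install_canary_control_plane
migrate_namespace "hr-test"
```

Migrating one namespace at a time keeps the blast radius small: workloads in namespaces that have not been relabeled stay on the old control plane until the new one has been verified.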

Stage 3. Training

With the service mesh updated, DevOps experts at Altoros facilitated knowledge transfer with the in-house team. This introduced the customer to Kubernetes best practices, such as creating different namespaces for each team to isolate network resources, adding virtual services for each app to make it easier to troubleshoot errors, etc.
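As an illustration of the two practices named above (a namespace per team and a virtual service per app), the following shell sketch renders the corresponding manifests. The team name `payroll`, the app name `timesheets`, and port `8080` are hypothetical placeholders, and the layout is one possible arrangement rather than the customer's actual configuration.

```shell
# Render a Namespace for a team and a VirtualService for one of its apps.
# All names and the port passed in below are hypothetical examples.
render_team_manifests() {
  team="$1"
  app="$2"
  port="$3"
  cat <<EOF
apiVersion: v1
kind: Namespace
metadata:
  name: ${team}
---
apiVersion: networking.istio.io/v1alpha3
kind: VirtualService
metadata:
  name: ${app}
  namespace: ${team}
spec:
  hosts:
    - ${app}.${team}.svc.cluster.local
  http:
    - name: ${app}-default
      route:
        - destination:
            host: ${app}.${team}.svc.cluster.local
            port:
              number: ${port}
EOF
}

render_team_manifests payroll timesheets 8080
```

The output can be reviewed and then piped to `kubectl apply -f -`. Giving each app its own named route means traffic for that app can be identified by route name when troubleshooting.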

Critically, our team also shared the knowledge and experience the company needed to keep its service mesh up to date, enabling it to take advantage of new features and services.

The outcome

Partnering with Altoros, the company successfully upgraded its Istio service mesh to v1.6 with zero downtime, while also equipping its in-house team with the expertise needed to keep the platform up to date. With an updated service mesh, the customer can ensure that its HR platform, which serves 1,400+ mid-sized companies and processes over $10 billion in payroll annually, remains operational and secure. The organization now has the expertise to perform upgrades, develop new services, and make further improvements by implementing the recommendations and best practices shared by Altoros.

Technology stack

Client platform

Amazon Elastic Kubernetes Service

Scripting languages

Bash

Frameworks and tools

Istio, Kapitan, Jaeger, Kiali, NGINX Ingress Controller, Spinnaker

1,400+ mid-sized companies

$10B annual payroll processed

60+ third-party developers

Contact us

Let’s Talk

Ryan Meharg

Cloud Solutions Architect

ryan.m@altoros.com

650 265-2266

4900 Hopyard Rd. Suite 100 Pleasanton, CA 94588