Troubleshooting the Suite of Financial Tools to Serve 300,000 Daily Users

Finance
Cloud-Native
Kubernetes
VMware

A provider of financial software and services turned to Altoros to train its in-house team on Kubernetes and give the engineers firsthand experience with real-world scenarios.

About the project

Brief results of the collaboration:

  • Within the two-week engagement period, the customer’s engineering team acquired in-depth knowledge of operating the Tanzu Kubernetes Grid that serves 300,000 daily users.
  • The in-house team now has hands-on expertise in configuration, monitoring, troubleshooting, and upgrade scenarios, which helps to maintain the system’s stability under a daily load of 100 billion market data entries, as well as a billion e-mails and instant messages.

The customer

Based in the USA, the customer is a leading software and service provider for the financial industry. The organization primarily focuses on delivering market analytics, data services, and financial news. Founded in 1981, the company has more than 170 global offices with 19,000+ employees.

The need

The customer had a suite of financial software that served 300,000 users daily across 150+ countries. To ensure high availability of services, the company adopted Tanzu Kubernetes Grid Integrated Edition (TKGI). However, the organization was only taking its first steps in the cloud-native journey, and the in-house team still lacked deep expertise in running the platform. Consequently, the team was not able to address issues such as unexpected crashes of virtual machines (VMs) and pods.

The company turned to Altoros, a certified VMware Tanzu solutions provider, for in-depth, hands-on training in configuring, managing, and fine-tuning its TKGI deployment.

The challenges

The company’s suite handles 100 billion market data entries, 2 million news stories, and over a billion e-mails and instant messages daily. This puts the underlying platform under a heavy, sustained load. The ability to scale up without crashing any service is critical. For this reason, the training program had to focus on resolving the existing issues and preventing them from recurring.
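To put these daily volumes into perspective, the short sketch below converts them into average per-second rates. It assumes traffic is spread evenly across 24 hours, which real market traffic is not, so the results are a lower bound on peak load rather than a measured figure.

```python
# Average per-second rates derived from the daily volumes quoted above.
# Assumption: load is spread evenly over the day; real peaks will be higher.
SECONDS_PER_DAY = 24 * 60 * 60

DAILY_VOLUMES = {
    "market data entries": 100_000_000_000,
    "e-mails and instant messages": 1_000_000_000,
    "news stories": 2_000_000,
}


def average_per_second(daily_count):
    """Return the mean number of events per second for a daily total."""
    return daily_count / SECONDS_PER_DAY


if __name__ == "__main__":
    for name, daily in DAILY_VOLUMES.items():
        print(f"{name}: ~{average_per_second(daily):,.0f} per second on average")
```

Even as an average, the market data stream alone works out to more than a million entries per second.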

The solution

Days 1–2. Engineers at Altoros customized the training agenda to address the issues specific to the customer’s TKGI deployment. Our developers also provided recommendations on configuring and managing the platform and shared DevOps best practices.

Days 3–5. The team at Altoros demonstrated how to troubleshoot vSphere virtual machines step by step: export credentials to access BOSH Director, display VMs, open an SSH tunnel, pull logs, disable autohealing to prevent BOSH from recreating missing VMs, etc.
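The steps above map roughly onto the BOSH CLI as sketched below. This is an illustration, not the exact procedure used in the training: the function names, the deployment name, and the instance name are hypothetical, and the code only assembles the commands for review instead of running them against a Director.

```python
import shlex


def bosh_env(director_address, client, client_secret, ca_cert_path):
    """Environment variables the BOSH CLI reads to reach and authenticate
    with the Director. All four values are placeholders supplied by the caller."""
    return {
        "BOSH_ENVIRONMENT": director_address,
        "BOSH_CLIENT": client,
        "BOSH_CLIENT_SECRET": client_secret,
        "BOSH_CA_CERT": ca_cert_path,
    }


def bosh_troubleshooting_commands(deployment, instance):
    """Return the walkthrough as an ordered list of BOSH CLI command strings."""
    d = shlex.quote(deployment)
    i = shlex.quote(instance)
    return [
        "bosh vms",                      # display VMs across all deployments
        f"bosh -d {d} vms --vitals",     # VMs of one deployment with load and disk usage
        f"bosh -d {d} ssh {i}",          # open an SSH session on a single VM
        f"bosh -d {d} logs {i}",         # download that VM's job logs
        "bosh update-resurrection off",  # stop the Director from recreating missing VMs
    ]


if __name__ == "__main__":
    # Hypothetical names, for illustration only.
    for command in bosh_troubleshooting_commands("service-instance_example", "worker/0"):
        print(command)
```

Resurrection is usually switched back on with `bosh update-resurrection on` once the investigation is over, so that the Director resumes recreating failed VMs.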

Then, our engineers showed how to identify deployment issues. The process included listing nodes along with their IP addresses, displaying the roles and pods in a namespace along with the worker VM each pod runs on, retrieving high-level information about a deployment, listing persistent volumes in a given namespace, etc.
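A minimal sketch of these checks with `kubectl` might look as follows. The namespace and deployment names are hypothetical, and the function only builds the argument lists so they can be reviewed before anything is executed. Because persistent volumes themselves are cluster-scoped, the sketch assumes that the per-namespace view is obtained by listing persistent volume claims.

```python
import shlex


def kubectl_inspection_commands(namespace, deployment):
    """Return kubectl invocations (as argument lists) for a basic health review."""
    return [
        # Nodes with internal and external IP addresses.
        ["kubectl", "get", "nodes", "-o", "wide"],
        # Roles defined in the namespace.
        ["kubectl", "get", "roles", "-n", namespace],
        # Pods in the namespace; "-o wide" adds the node each pod runs on.
        ["kubectl", "get", "pods", "-n", namespace, "-o", "wide"],
        # High-level information about one deployment: replicas, conditions, events.
        ["kubectl", "describe", "deployment", deployment, "-n", namespace],
        # Persistent volume claims in the namespace and the volumes bound to them.
        ["kubectl", "get", "pvc", "-n", namespace],
    ]


if __name__ == "__main__":
    # Hypothetical names, for illustration only.
    for argv in kubectl_inspection_commands("example-namespace", "example-app"):
        print(shlex.join(argv))
```

Keeping each command as an argument list means it can later be passed to `subprocess.run` without shell quoting issues.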

To further assist the in-house team with troubleshooting the platform, our developers advised on how to integrate monitoring and logging tools.

Days 6–7. To keep the platform up to date, DevOps experts at Altoros devised a strategy for performing several key operations, such as pulling down updates, upgrading Ops Manager and BOSH Director, updating and creating Windows stemcells, applying tile updates, upgrading TKGI clusters, etc.

Days 8–10. To help cope with potential issues, our engineers walked the in-house team through three real-world scenarios that simulate the main bottlenecks in the customer’s current deployment:

a) Developers at Altoros demonstrated how to detect troublesome processes and restart them.
b) Our team guided the engineers through accessing unresponsive VMs and recreating missing ones.
c) DevOps experts at Altoros showed how to enable the database to accurately display the status of tasks when an error occurs due to a rollback.
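Scenario (a) can be illustrated with a small sketch. On BOSH-managed VMs, processes are supervised by Monit, so one way to spot a troublesome process is to parse the output of `monit summary` and restart whatever is not running. The sample output and process names below are made up for illustration, and the exact status strings depend on the Monit version.

```python
import re
import shlex

# Matches lines such as: Process 'kubelet'    running
PROCESS_LINE = re.compile(r"^Process '(?P<name>[^']+)'\s+(?P<status>.+?)\s*$")


def unhealthy_processes(monit_summary):
    """Map each process that is not in the 'running' state to its reported status."""
    bad = {}
    for line in monit_summary.splitlines():
        match = PROCESS_LINE.match(line.strip())
        if match and match.group("status") != "running":
            bad[match.group("name")] = match.group("status")
    return bad


def restart_commands(processes):
    """Build one 'monit restart' command per unhealthy process."""
    return [f"monit restart {shlex.quote(name)}" for name in processes]


# Illustrative sample only; not captured from the customer's environment.
SAMPLE_SUMMARY = """The Monit daemon 5.2.5 uptime: 2d 3h 11m

Process 'kubelet'                   running
Process 'kube-proxy'                not monitored
Process 'bosh-dns'                  Does not exist
System 'system_localhost'           running
"""

if __name__ == "__main__":
    for command in restart_commands(unhealthy_processes(SAMPLE_SUMMARY)):
        print(command)
```

The commands are printed rather than executed, so an operator can confirm which processes will be restarted before touching a production VM.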

The outcome

By partnering with Altoros and introducing the in-house team to the efficient management of TKGI, the customer gained the expertise necessary to optimize its suite of financial tools that serves 300,000 daily users. With firsthand experience across configuration, monitoring, troubleshooting, and upgrade scenarios, in-house engineers can now operate the platform that handles 100 billion market data entries a day. The company now has the knowledge to identify emerging issues promptly and resolve them on the go.

Technology stack

Server platform

Windows Server

Client platform

Tanzu Kubernetes Grid Integrated Edition

Frameworks and tools

Bash, BOSH

Data storage

VMware vSphere Virtual Machine File System

1B e-mails and messages

100B market data entries

300,000+ users daily

Want to develop something similar?

Ryan Meharg

Technical Director

ryan.m@altoros.com

650 265-2266

4900 Hopyard Rd. Suite 100 Pleasanton, CA 94588