About the project
The updated software solution is optimized to work in cloud environments and can support 20,000,000 users and 200,000,000 machines. Thanks to the scalability of HBase in combination with Apache Hadoop, the system can easily expand while maintaining high performance.
The customer is a global provider of automated IT management services. The legacy system, built on .NET and an RDBMS, was slow and could support more users only by adding expensive new database servers. Our task was to develop an easy-to-scale solution that could store massive sets of structured and unstructured data. The system had to handle the following volumes:
- 2,000,000 tenants
- 20,000,000 users
- 200,000,000 machines
- 40,000,000 support sessions daily
- 50,000,000 files uploaded daily
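To put the daily figures above in perspective, they can be converted into average per-second rates. The short calculation below is only an illustration: it assumes load is spread evenly over 24 hours, which real traffic rarely is, so peak rates would be higher. The function and variable names are ours, not part of the customer's system.

```python
SECONDS_PER_DAY = 24 * 60 * 60  # 86,400 seconds

# Daily volumes taken from the requirements list above.
daily_volumes = {
    "support sessions": 40_000_000,
    "file uploads": 50_000_000,
}


def average_rate_per_second(events_per_day):
    """Average events per second, assuming an even spread over the day."""
    return events_per_day / SECONDS_PER_DAY


for name, per_day in daily_volumes.items():
    rate = average_rate_per_second(per_day)
    print(f"{name}: about {rate:.0f} per second on average")

# Ratios implied by the totals in the list (averages only).
users_per_tenant = 20_000_000 / 2_000_000     # 10.0
machines_per_user = 200_000_000 / 20_000_000  # 10.0
```

On average, that is roughly 463 support sessions and 579 file uploads per second, with about 10 users per tenant and 10 machines per user.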
Other crucial requirements included minimizing downtime, ensuring high availability, and providing rich querying capabilities for the ticketing module and for data analysis.
We had to suggest a solution that would meet all of the customer’s requirements and be compatible with the existing .NET system. The R&D department investigated the subject and offered the following two options:
- a system that uses Couchbase for caching and either HBase or Couchbase for data storage
- a NoSQL solution in combination with Solr
After a thorough feasibility study, we concluded that the first option would be less cost-effective. Couchbase requires the working set to be held in memory; otherwise, access operations slow down considerably. Storing all of the data in Couchbase would therefore be too expensive.
The main advantage of the second option was that the system could be deployed on commodity hardware, with the database easily distributed across more hosts as the load increased. The result would be a scalable, low-cost solution. We chose HBase rather than Couchbase for data storage. HBase supports access to data by key as well as range scans, which covers the basic queries. For more complex queries that filter on secondary fields, we applied denormalization patterns and Solr.
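The idea of key access, range scans, and denormalization can be sketched with a toy in-memory table that keeps rows sorted by key. This is not the HBase API and not the customer's schema: the `SortedTable` class, the key layout, and the ticket fields are all hypothetical and only show how a composite row key turns "all tickets of one tenant" into a range scan, and how a second, denormalized table answers a query on a secondary field.

```python
import bisect


class SortedTable:
    """Toy stand-in for a store that keeps rows sorted by row key."""

    def __init__(self):
        self._keys = []  # row keys in lexicographic order
        self._rows = {}  # row key -> row data

    def put(self, key, row):
        if key not in self._rows:
            bisect.insort(self._keys, key)
        self._rows[key] = row

    def get(self, key):
        """Point lookup by exact row key."""
        return self._rows.get(key)

    def scan_prefix(self, prefix):
        """Range scan: all rows whose key starts with the prefix, in key order."""
        start = bisect.bisect_left(self._keys, prefix)
        result = []
        for key in self._keys[start:]:
            if not key.startswith(prefix):
                break
            result.append((key, self._rows[key]))
        return result


# Hypothetical key layouts; zero-padding keeps numeric order under string sorting.
def ticket_key(tenant_id, ticket_id):
    return f"{tenant_id:08d}|{ticket_id:010d}"


def user_index_key(tenant_id, user, ticket_id):
    return f"{tenant_id:08d}|{user}|{ticket_id:010d}"


tickets = SortedTable()          # primary table, keyed by tenant and ticket
tickets_by_user = SortedTable()  # denormalized copy, keyed by tenant and user


def save_ticket(tenant_id, ticket_id, user, status):
    row = {"user": user, "status": status}
    # Write the same row twice so that both access paths are key-based.
    tickets.put(ticket_key(tenant_id, ticket_id), row)
    tickets_by_user.put(user_index_key(tenant_id, user, ticket_id), row)


save_ticket(42, 7, "alice", "open")
save_ticket(42, 3, "bob", "closed")
save_ticket(43, 1, "alice", "open")

one_ticket = tickets.get(ticket_key(42, 7))                     # access by key
tenant_42 = tickets.scan_prefix(f"{42:08d}|")                   # range scan
alice_in_42 = tickets_by_user.scan_prefix(f"{42:08d}|alice|")   # secondary field
```

The trade-off in this sketch is extra writes and storage in exchange for reads that never need a full table scan. Queries that do not fit any precomputed key layout, such as free-text search over tickets, are the kind of case the text delegates to Solr.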
Before starting work on the production system, we built a prototype and deployed it on Amazon EC2. After thorough research, we implemented a cost-effective, scalable NoSQL-based solution for managing virtual IT infrastructure that provides high availability and high performance.