About the project
This business intelligence solution can obtain data from SQL databases, MS Excel files, tables, text files, as well as from social networks, including Salesforce, Facebook, and LinkedIn while preserving the exact structure and content of any document. Users can pivot rows and columns, as well as join, aggregate, and transform data. Furthermore, it’s possible to sort or filter information according to certain criteria. Processed data can be uploaded to any database or system selected as the output source. Users can create and map data flows. The solution features an interface typical of Microsoft products and the editing process is based on the WYSIWYG principle. It means that users can see how data will be transformed and add new relations between the rows and columns with a mouse click.
Science Warehouse brings cloud-based services and data analysis technologies to procurement organizations. The company offers a range of solutions that simplify purchasing processes by delivering savings and provide real-time desicion support based on collected analytical data.
Initial technical specifications
- Around 10–20 GB of data from different sources is added to the database weekly.
- The Oracle database contains approximately 12,000,000 products data and 500,000,000 price records.
- The procurement solution is capable of setting different prices for different groups of customers.
- The archived price history contains nearly 400,000,000 of records.
- A lot of computing operations are required to properly analyze and visualize data. Several processes are running concurrently: new data is added to the database; users search for products, create orders, and carry out different kinds of transactions.
The business need
The customer wanted to preserve the core architecture of the existing Java application and keep the current version of the Oracle database to avoid additional costs. Apart from buying a more expensive license to upgrade to the Oracle Database Enterprise Edition, the company would have to migrate data from the legacy database to a new store and spend time on testing, tuning, etc. Science Warehouse wanted to check out if there was a NoSQL solution that could help them to improve performance without huge investments like this.
The workshop agenda included
- Introduction to the NoSQL paradigm and overview of the pros and cons of the traditional SQL databases
- Comparison of two approaches: storing all data in one centralized database vs. storing different types of data in several application databases
- A run through the different types of NoSQL databases: key/value, column family, document, and graph—including comparison of their current SQL queries against the various NoSQL queries
- Instructions on data modeling, based on the architecture of the Science Warehouse system
- Practical recommendations on how to improve the system — including current Oracle database architecture and queries — and suggestions on using NoSQL databases to solve particular tasks
While the agenda of the first day, as described above, mainly covered the theoretical aspects of the new non-relational databases, the second day was dedicated to practical examples and actual suggestions for optimization. Altoros’s big data experts pointed out some bottlenecks that were detected while reviewing the system and the database that had been carried out prior to the workshop. They provided some recommendations on how to improve the system and also demonstrated how this can be done.
Altoros suggested using the polyglot persistence approach for optimizing the Science Warehouse solution. It would enable the company to store different types of data in different databases: transactional data would be stored in the Oracle SQL database, while non-sensitive, quickly growing data (such as logging information, price history, etc.) would be kept in a NoSQL database.
The following changes were initiated
- Add a caching layer that would be based on a document store (e.g. Couchbase Server or MondoDB) to speed up access to the most popular items in the catalogue. Altoros’s engineers analyzed the most popular queries and found out that there were millions of simple queries that consumed a lot of database and network resources. For instance, one query was executed nearly 70 million times and the database server processed more than 9 TB of data. With the aspect-oriented programming approach, the caching layer could be seamlessly weaved into the Science Warehouse system without making significant changes to the business logic of the solution.
- Write the log data on visitor activities on the Web site to a Column Family store (e.g. Cassandra or HBase). Previously, this information was recorded to a single table of the Oracle database. Since this data is never being deleted, the table grows constantly. Event logs are not a kind of sensitive business data and they do not require support for ACID characteristics. So, this data can be easily recorded to a separate NoSQL store that can be seamlessly integrated into the system.
- Move a price history data to a Column Family store (e.g. Cassandra or HBase). This option will allow for building analytics reports using data from a NoSQL database instead of querying the main Oracle database.
During a two-day workshop, Science Warehouse’s engineers got enough detailed information on NoSQL to evaluate the possibilities and start prototyping themselves. The recommendations from Altoros helped the company to point in the right direction for the future of their system and define the areas they should focus on. The given practical examples continuously referred to their current setup and everyday work, which helped Science Warehouse to rapidly see the difference between SQL and NoSQL solutions and learn about the possibilities of a multi-database system. The company’s engineers analyzed Altoros’s recommendations and started prototyping the suggested features.
The workshop showed that it is not necessary to re-write a system from scratch or even refactor it to solve scalability and performance issues. The NoSQL database technologies can quite easily be introduced into existing systems to speed up operations, gain scalability, and avoid upgrading to expensive licenses.
“Altoros’s profound knowledge and overview of big data technologies really helped our engineers to save a lot of time—and money, too. Our business-critical products will be optimized significantly based on the workshop we did with Altoros and the features suggested by this company that we are prototyping now. I can highly recommend Altoros’s engineers to help improve software products using cutting-edge technologies.”
Let's see what we can do together
Cloud Solutions Architect
© Copyright 2003-2019 Altoros. All rights reserved.