Enabling Scalability of a Video Platform for Digital Advertising

A service provider of digital marketing solutions turned to Altoros to enable scalability of its platform for launching advertising campaigns.

About the project

Brief results of the collaboration:

  • Now, the system aggregates and analyzes 60 billion records of video metadata in the required time span, with around 1 TB of new data added daily.
  • The platform generates timely BI reports, compiling the lists of videos for targeted advertising and suggestions for revenue improvement.

The customer

The company is a California-based provider of digital marketing solutions. The customer offers promotional campaigns via advertisements embedded in YouTube and Facebook videos.

The need

The customer experienced issues with the analytical module of its digital marketing platform. Generating BI reports either took around a day or ended in a timeout error, while the company's analysts needed up-to-date information daily.

As the company planned to address a bigger market segment, the platform also had to aggregate billions of video metadata records and merge new updates each day. So, there was a need for a distributed data processing solution that would replace the existing DB2-based module.

The challenge

In the course of the project, the team faced the following challenges:

  • The platform needed to be scalable enough to aggregate 60 billion records of video metadata.
  • Daily, the solution was required to merge 1 TB of incoming data with no performance bottlenecks.

The solution

As the customer aimed at aggregating larger data volumes—billions of videos—the legacy DB2 database was no longer an option. So, engineers at Altoros implemented a distributed solution based on Cloudera/Hadoop.

Apache Kafka was used to queue video metadata updates (titles, number of clicks, etc.) in such a way that only the latest information for each video would be sent for processing. The data model was also optimized, improving performance.
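
The case study does not describe how the queue was configured. One common way to get "latest update wins" behavior in Kafka is to key messages by video ID and rely on log compaction, but that is an assumption here, not a confirmed detail of the project. The sketch below shows only the underlying idea in plain Java, without the Kafka client: a batch of updates is collapsed so that a single, most recent update per video remains. The VideoUpdate and LatestUpdates classes and their fields are hypothetical names introduced for illustration.

```java
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

// Hypothetical shape of a metadata update; field names are assumptions.
final class VideoUpdate {
    final String videoId;  // assumed unique key of a video
    final long timestamp;  // assumed event time, epoch milliseconds
    final String title;
    final long clicks;

    VideoUpdate(String videoId, long timestamp, String title, long clicks) {
        this.videoId = videoId;
        this.timestamp = timestamp;
        this.title = title;
        this.clicks = clicks;
    }
}

final class LatestUpdates {
    // Collapses a batch so that only the newest update per video ID survives.
    static Map<String, VideoUpdate> latestPerVideo(List<VideoUpdate> batch) {
        Map<String, VideoUpdate> latest = new LinkedHashMap<>();
        for (VideoUpdate update : batch) {
            // Keep the stored entry unless the incoming one is at least as recent.
            latest.merge(update.videoId, update,
                    (stored, incoming) ->
                            incoming.timestamp >= stored.timestamp ? incoming : stored);
        }
        return latest;
    }
}
```

With this approach, out-of-order or repeated updates for the same video do not multiply the work downstream: the processing layer sees one record per video per batch.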

Experts at Altoros evaluated a variety of distributed frameworks and utilized Apache Spark to enable the system to analyze terabytes of data in parallel.

The team discovered that the BI reports were not responding because of special characters in the input CSV files. To solve the issue, our developers used the OpenCSV library to transform the data into a readable format before merging it into Hive.
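
To illustrate why special characters break a naive CSV import, here is a minimal sketch in plain Java. It is not the OpenCSV API and not the project's code: the CsvCleaner class, its method names, and the choice of tab-separated output are assumptions made for the example. It parses one record while honoring quoted fields (which may contain commas and doubled quotes), then re-emits the fields with control characters replaced, so that a simple delimiter-based loader can read them.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical helper; a stand-in for what a CSV library does internally.
final class CsvCleaner {
    // Splits one CSV record into fields, honoring double-quoted fields
    // that may contain commas and escaped quotes ("").
    static List<String> parseLine(String line) {
        List<String> fields = new ArrayList<>();
        StringBuilder current = new StringBuilder();
        boolean inQuotes = false;
        for (int i = 0; i < line.length(); i++) {
            char c = line.charAt(i);
            if (inQuotes) {
                if (c == '"') {
                    if (i + 1 < line.length() && line.charAt(i + 1) == '"') {
                        current.append('"'); // doubled quote is a literal quote
                        i++;
                    } else {
                        inQuotes = false;    // closing quote
                    }
                } else {
                    current.append(c);       // commas are data inside quotes
                }
            } else if (c == '"') {
                inQuotes = true;
            } else if (c == ',') {
                fields.add(current.toString());
                current.setLength(0);
            } else {
                current.append(c);
            }
        }
        fields.add(current.toString());
        return fields;
    }

    // Joins fields with tabs after replacing control characters
    // (tabs, line breaks, etc.) inside each field with a space.
    static String toTsv(List<String> fields) {
        List<String> clean = new ArrayList<>();
        for (String field : fields) {
            clean.add(field.replaceAll("\\p{Cntrl}", " ").trim());
        }
        return String.join("\t", clean);
    }
}
```

A plain split on commas would cut a title such as "Cats, dogs" into two columns and shift every field after it. This sketch works on a single line at a time, so quoted fields that span several lines are out of its scope; a full CSV library handles that case as well.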

Finally, Apache Oozie automated the process of setting up and running jobs inside the data processing layer. The Zabbix service helped to monitor cluster performance.

The outcome

Cooperating with Altoros, the customer enabled its platform to generate timely BI reports based on a larger number of videos. Now, the company's analysts have up-to-date information every day, which they use to offer targeted ads.

The new distributed data processing module stores and analyzes 30 TB of compressed data, merging 1 TB of new information within a night. The time spent on executing queries within the data processing layer was also reduced several-fold.
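
As a rough sense of scale, the nightly merge can be turned into a sustained throughput figure. The case study does not say how long "a night" is, so the window length below is an assumed parameter, and 1 TB is taken as 10^12 bytes. MergeWindow is a hypothetical helper written for this back-of-the-envelope calculation.

```java
// Hypothetical helper for a back-of-the-envelope throughput estimate.
final class MergeWindow {
    // Returns the sustained rate, in megabytes per second (10^6 bytes),
    // needed to process the given volume within the given window.
    static double requiredMegabytesPerSecond(double terabytes, double windowHours) {
        double bytes = terabytes * 1e12;        // decimal terabytes
        double seconds = windowHours * 3600.0;
        return bytes / seconds / 1e6;
    }

    public static void main(String[] args) {
        // Assumed 8-hour window; the source only says "within a night".
        System.out.printf("%.1f MB/s%n", requiredMegabytesPerSecond(1.0, 8.0));
    }
}
```

Under the assumed 8-hour window, merging 1 TB means sustaining roughly 35 MB/s across the whole pipeline, including reading, parsing, and writing. A shorter window raises the requirement proportionally.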

Technology stack

Server platform: Ubuntu (over AWS)
Platform: Cloudera (CDH 6)
Programming languages: Java, Scala
Technologies: Apache Spark, Hive, Cloudera Impala, Tableau, Zabbix, Apache Oozie, Apache Kafka
Database: DB2, HDFS, SQL Server, SQLite