An Analytical System for Banner Display

The system can serve targeted advertisements and process data retrieved from banner networks.
BUSINESS INTELLIGENCE
CLOUD-NATIVE
HADOOP
JAVA
DIGITAL ADVERTISING

About the project

The main goal was to deliver near real-time statistics on clicks and displays so that advertisers could manage their budgets more effectively. This, in turn, improved the impact of advertisements and increased overall turnover.

The need

The customer is a large Internet service provider that operates 20+ popular Web sites. The company has more than ten years of experience in digital marketing, advertising, Web development, and hosting. It also produces Web security software and runs its own banner network. The company needed a system that would let it show more targeted advertisements.

The challenge

Before Hadoop was implemented, the company had no way to get real-time statistics on displays. Data analysis was extremely slow and could be run only once every 24 hours. The company also had to purchase expensive hardware to scale the system as the number of displays grew.

We also had to select a product for storing data. The key requirement was a low barrier to entry: the query language had to be similar to standard SQL. That way, the DBA engineers already employed by the customer would be able to build data queries, and the customer would not incur additional staff training expenses.

The solution

We built a Hadoop cluster to ensure fast data analysis. The system loads data from Nginx servers at certain intervals. The servers upload banners directly to HDFS via the WebDAV extension. To make this work with CDH3, we had to add some patches to the standard WebDAV implementation.
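The interval-based upload described above can be sketched roughly as follows. This is a minimal illustration, not the project's actual code: the 10-minute interval, the directory layout, the `.dat` file naming, and the class and method names are all assumptions made for the example. It only builds the target URI and a WebDAV-style HTTP PUT request with the standard Java 11 HTTP client; it does not send anything.

```java
import java.net.URI;
import java.net.http.HttpRequest;
import java.time.Instant;
import java.time.ZoneOffset;
import java.time.format.DateTimeFormatter;

// Hypothetical helper: maps an upload to a per-interval path and builds a PUT request.
final class IntervalUploadSketch {
    // Assumed interval length; the source only says "at certain intervals".
    static final long INTERVAL_SECONDS = 600;

    // Assumed layout: <base>/<yyyyMMdd>/<HHmm>/<serverId>.dat, in UTC.
    private static final DateTimeFormatter BUCKET =
        DateTimeFormatter.ofPattern("yyyyMMdd'/'HHmm").withZone(ZoneOffset.UTC);

    // Rounds a timestamp down to the start of its interval.
    static Instant intervalStart(Instant t) {
        long seconds = t.getEpochSecond();
        return Instant.ofEpochSecond(seconds - Math.floorMod(seconds, INTERVAL_SECONDS));
    }

    // Builds the upload target for one server and one interval.
    static URI targetUri(String baseUrl, String serverId, Instant t) {
        return URI.create(baseUrl + "/" + BUCKET.format(intervalStart(t)) + "/" + serverId + ".dat");
    }

    // WebDAV uploads are plain HTTP PUT requests; the request is built here but not sent.
    static HttpRequest putRequest(URI target, byte[] payload) {
        return HttpRequest.newBuilder(target)
            .PUT(HttpRequest.BodyPublishers.ofByteArray(payload))
            .build();
    }
}
```

Grouping uploads by interval start keeps every file for the same time window under one directory, which makes it straightforward to load or reprocess a single window later.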

We used Apache Hive as the data warehouse system for fast data querying and analysis. Data is partitioned by preset time intervals, so we can store active and archived data in one table. This also simplifies system maintenance and reduces costs when results for a past period need to be recalculated. For ETL, we used Pentaho Kettle, an ETL solution by Pentaho that connects to Hive through a standard JDBC connector.
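To illustrate how time-based partitioning keeps recalculation cheap, here is a minimal sketch in Java. The table name `banner_events`, the partition columns `dt` and `slot`, the `event_type` column, and all class and method names are hypothetical; the source does not describe the actual schema. The JDBC method accepts any `java.sql.Connection`, so the driver and connection URL are left to the caller.

```java
import java.sql.Connection;
import java.sql.ResultSet;
import java.sql.SQLException;
import java.sql.Statement;
import java.time.LocalDate;
import java.util.LinkedHashMap;
import java.util.Locale;
import java.util.Map;

// Hypothetical helper around a Hive table partitioned by day (dt) and time slot (slot).
final class HivePartitionSketch {
    // Registers one time slot as a partition that points at already uploaded files.
    static String addPartitionSql(LocalDate day, int slot, String location) {
        return String.format(Locale.ROOT,
            "ALTER TABLE banner_events ADD PARTITION (dt='%s', slot='%04d') LOCATION '%s'",
            day, slot, location);
    }

    // Counts clicks per banner for a single partition; Hive reads only that slot's files.
    static String clicksSql(LocalDate day, int slot) {
        return String.format(Locale.ROOT,
            "SELECT banner_id, COUNT(*) FROM banner_events "
                + "WHERE dt='%s' AND slot='%04d' AND event_type='click' GROUP BY banner_id",
            day, slot);
    }

    // Runs the query through plain JDBC, the same kind of connector an ETL tool would use.
    static Map<Long, Long> clicksPerBanner(Connection hive, LocalDate day, int slot)
            throws SQLException {
        Map<Long, Long> result = new LinkedHashMap<>();
        try (Statement st = hive.createStatement();
             ResultSet rs = st.executeQuery(clicksSql(day, slot))) {
            while (rs.next()) {
                result.put(rs.getLong(1), rs.getLong(2));
            }
        }
        return result;
    }
}
```

Because the filter is on the partition columns, recalculating an elapsed period means rerunning the query for the affected partitions only, while active and archived data stay in the same table.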

NameNode backup was implemented over the NFS protocol to ensure high availability. We also developed some utility programs for Apache Hadoop to simplify the most frequent cluster administration tasks.

The outcome

The resulting system allowed for the following:

  • near real-time report building (statistics are updated every 10 minutes);
  • optimized display of banners (the most popular banners are displayed on top);
  • targeted display of banners (targeting data is received from third-party projects).
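The "most popular banners on top" outcome can be sketched as a simple ranking step over the per-banner statistics. The source does not say how popularity is measured, so this example assumes it means the click count, with the banner ID as a tie-breaker; the `BannerStats` type and all names are hypothetical.

```java
import java.util.Comparator;
import java.util.List;
import java.util.stream.Collectors;

// Hypothetical ranking step: orders banners by click count, most clicked first.
final class BannerRankingSketch {
    // Per-banner counters, e.g. taken from the latest statistics refresh.
    static final class BannerStats {
        private final String bannerId;
        private final long displays;
        private final long clicks;

        BannerStats(String bannerId, long displays, long clicks) {
            this.bannerId = bannerId;
            this.displays = displays;
            this.clicks = clicks;
        }

        String bannerId() { return bannerId; }
        long displays() { return displays; }
        long clicks() { return clicks; }
    }

    // Returns banner IDs sorted by clicks (descending), then by ID for a stable order.
    static List<String> rank(List<BannerStats> stats) {
        return stats.stream()
            .sorted(Comparator.comparingLong(BannerStats::clicks).reversed()
                .thenComparing(BannerStats::bannerId))
            .map(BannerStats::bannerId)
            .collect(Collectors.toList());
    }
}
```

Swapping the comparator for one based on click-through rate (clicks divided by displays) would be an equally plausible reading of "popular"; the tie-breaker only keeps the order deterministic between refreshes.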

Technology stack

Server platforms: RHEL, Nginx
Programming language: Java
Technologies: Apache Hadoop, Apache Hive, Pentaho
Databases: MySQL, HDFS
