The Apriori Algorithm vs. k-means Clustering for a Recommendation Engine

by Alex Khizhniak and Sofia Parfenovich, November 3, 2013

When building a recommender for an online movie store, you can take very different approaches.

Association rules or other options?

Data analytics for a large online store involves a number of challenges. Product data may be inherently complex and reach terabytes in size, your data stores may be distributed (sometimes across geographic regions), and association algorithms may require significant memory.

One of our customers needed a recommendation engine for a media streaming service to increase sales. The task was to develop a model that would provide relevant movie suggestions to users. Because of the extremely large data volume, the customer wanted to avoid clustering, which groups users based on their purchasing history.
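To make "grouping users by purchasing history" concrete, here is a minimal Python sketch of k-means over one-hot purchase vectors. It is an illustration under stated assumptions, not the implementation used in the study: the users, titles, the farthest-first seeding, and the helper names (`to_vectors`, `kmeans`, `recommend`) are all hypothetical, chosen to keep the example small and deterministic. It suggests titles that a user's cluster peers bought but the user did not.

```python
from collections import Counter

# Hypothetical toy data: each user's purchase history as a set of movie titles.
PURCHASES = {
    "u1": {"Alien", "Aliens", "Predator"},
    "u2": {"Alien", "Predator", "The Terminator"},
    "u3": {"Amelie", "Notting Hill", "Love Actually"},
    "u4": {"Notting Hill", "Love Actually", "Pretty Woman"},
    "u5": {"Aliens", "The Terminator", "Predator"},
}


def to_vectors(purchases):
    """One-hot encode each user's history over the sorted movie catalog."""
    catalog = sorted(set().union(*purchases.values()))
    return {u: [1.0 if m in items else 0.0 for m in catalog]
            for u, items in purchases.items()}


def sq_dist(a, b):
    """Squared Euclidean distance between two equal-length vectors."""
    return sum((x - y) ** 2 for x, y in zip(a, b))


def kmeans(points, k, iterations=20):
    """Lloyd's k-means; returns a cluster label for each point."""
    # Farthest-first seeding (hypothetical choice) keeps the sketch deterministic;
    # libraries typically use randomized k-means++ seeding instead.
    centroids = [list(points[0])]
    while len(centroids) < k:
        far = max(range(len(points)),
                  key=lambda i: min(sq_dist(points[i], c) for c in centroids))
        centroids.append(list(points[far]))
    labels = [0] * len(points)
    for _ in range(iterations):
        # Assignment step: each user joins the nearest centroid.
        labels = [min(range(k), key=lambda c: sq_dist(p, centroids[c])) for p in points]
        # Update step: each centroid moves to the mean of its members.
        for c in range(k):
            members = [p for p, lab in zip(points, labels) if lab == c]
            if members:
                centroids[c] = [sum(col) / len(members) for col in zip(*members)]
    return labels


def recommend(user, purchases, k=2, top_n=3):
    """Suggest titles bought by the user's cluster peers but not by the user."""
    users = list(purchases)
    vectors = to_vectors(purchases)
    labels = kmeans([vectors[u] for u in users], k)
    cluster = labels[users.index(user)]
    counts = Counter()
    for other, label in zip(users, labels):
        if other != user and label == cluster:
            counts.update(purchases[other] - purchases[user])
    # Rank by how many peers bought the title; break ties alphabetically.
    ranked = sorted(counts.items(), key=lambda kv: (-kv[1], kv[0]))
    return [title for title, _ in ranked[:top_n]]
```

With this toy data, `recommend("u1", PURCHASES)` returns `['The Terminator']` and `recommend("u4", PURCHASES)` returns `['Amelie']`. Every k-means iteration touches every user vector, so the cost grows with both the number of users and the catalog size.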

Initially, we decided to go with the Apriori algorithm, which builds association rules from frequent itemsets found in transactions. However, when working with real data, we ran into certain limitations and started looking for other options. In the end, k-means clustering produced more relevant recommendations and offered visitors more options.
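For comparison, the Apriori side can be sketched as level-wise frequent-itemset mining followed by rule generation. This minimal Python version is an illustration under assumptions, not the study's code: the transactions, the support and confidence thresholds, and the function names (`apriori`, `generate_rules`, `recommend`) are hypothetical, and scoring suggestions by the best rule confidence is just one simple choice. The core Apriori idea is visible in the loop: candidates of size k are built only from frequent (k-1)-itemsets and discarded if any of their subsets is infrequent.

```python
from itertools import combinations

# Hypothetical toy transactions: sets of movies bought in one order.
TRANSACTIONS = [
    {"Alien", "Aliens", "Predator"},
    {"Alien", "Aliens"},
    {"Alien", "Predator"},
    {"Aliens", "Predator"},
    {"Notting Hill", "Love Actually"},
]


def apriori(transactions, min_support):
    """Return {frozenset(itemset): support} for every itemset meeting min_support."""
    n = len(transactions)

    def support(itemset):
        return sum(1 for t in transactions if itemset <= t) / n

    def keep_frequent(candidates):
        scored = {c: support(c) for c in candidates}
        return {c: s for c, s in scored.items() if s >= min_support}

    # Level 1: frequent single items.
    level = keep_frequent({frozenset([i]) for t in transactions for i in t})
    frequent = dict(level)
    size = 2
    while level:
        # Join step: merge frequent (size-1)-itemsets into size-item candidates.
        candidates = {a | b for a in level for b in level if len(a | b) == size}
        # Prune step: drop candidates that have any infrequent (size-1)-subset.
        candidates = {c for c in candidates
                      if all(frozenset(s) in level for s in combinations(c, size - 1))}
        level = keep_frequent(candidates)
        frequent.update(level)
        size += 1
    return frequent


def generate_rules(frequent, min_confidence):
    """Return (antecedent, consequent, confidence) with confidence = sup(X | Y) / sup(X)."""
    found = []
    for itemset, sup in frequent.items():
        for r in range(1, len(itemset)):
            for lhs in combinations(sorted(itemset), r):
                lhs = frozenset(lhs)
                # Every subset of a frequent itemset is frequent, so this lookup is safe.
                confidence = sup / frequent[lhs]
                if confidence >= min_confidence:
                    found.append((lhs, itemset - lhs, confidence))
    return found


def recommend(basket, rule_list, top_n=3):
    """Score unseen items by the best confidence of any rule the basket triggers."""
    scores = {}
    for lhs, rhs, confidence in rule_list:
        if lhs <= basket:
            for item in rhs - basket:
                scores[item] = max(scores.get(item, 0.0), confidence)
    return sorted(scores, key=lambda i: (-scores[i], i))[:top_n]
```

With `min_support=0.4` and `min_confidence=0.6`, the three sci-fi titles form three frequent pairs (the full triple appears in only one of five orders), which yields six rules, and `recommend({"Alien"}, generate_rules(apriori(TRANSACTIONS, 0.4), 0.6))` returns `['Aliens', 'Predator']`. A title below the support threshold, such as "Notting Hill" here, produces no rules at all, so a buyer of only that title gets no suggestions. This is a general trait of support-based mining, not necessarily the limitation described in the study.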

Figure: An example of clustering

Findings

Today, we present our findings in a detailed research paper, “A Comparison of the Apriori Algorithm vs. k-means Clustering for a Movie Recommendation Engine.” The document contains:

a brief overview of 4 popular algorithms for building recommendations
tips on efficient data preprocessing to reduce the computational resources required
3 methods that can improve the quality of search recommendations
a comparative table with the results of using Apriori vs. k-means
12 diagrams that feature recommendations based on real-life data

Download the study and feel free to send us your feedback.

Want details? See the slides!

Alex Khizhniak is Director of Technical Content Strategy at Altoros and a cofounder of a local Java User Group. He has managed distributed teams since 2004 and has worked as a journalist, an editor-in-chief, a technical writer, a technology evangelist, a project manager, and a product owner. Alex is obsessed with AI/ML, data science, data integration, ETL/DWH, data quality, databases (SQL/NoSQL), big data, IoT, and BI. The articles and industry reports he has created or helped to publish have reached 3,000,000+ tech-savvy readers. Some of the pieces were covered by TechRepublic, ebizQ, NetworkWorld, CIO.com, etc. Find him on Twitter at @alxkh.
