Essential Optimization Methods to Make Apache Spark Work Faster
Apache Spark is one of the most popular technologies for processing, managing, and analyzing big data. Its modules differ in performance, so knowing which optimization methods reduce query runtime is crucial.
This report analyzes two Apache Spark modules, Spark Core and Spark SQL, before and after optimization techniques were applied. The optimization methods include:
- improving slow processes, such as
- reconfiguring the User RDD and the Post RDD
- replacing the default Java serializer with the Kryo serializer
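
As an illustration of the serializer swap, the following is a minimal configuration sketch and not the report's actual setup. The `User` and `Post` case classes, their fields, and the application name are hypothetical stand-ins. Only `spark.serializer`, `KryoSerializer`, and `registerKryoClasses` are standard Spark settings and APIs.

```scala
import org.apache.spark.SparkConf
import org.apache.spark.sql.SparkSession

// Hypothetical record types; the report's real User and Post schemas are not shown here.
case class User(id: Long, name: String)
case class Post(id: Long, userId: Long, title: String)

object KryoConfigSketch {
  def main(args: Array[String]): Unit = {
    val conf = new SparkConf()
      .setAppName("kryo-config-sketch") // assumed name, for illustration only
      .setMaster("local[*]")            // local mode so the sketch runs without a cluster
      // Replace the default Java serializer with Kryo.
      .set("spark.serializer", "org.apache.spark.serializer.KryoSerializer")
      // Registering classes lets Kryo write a compact ID instead of the full class name.
      .registerKryoClasses(Array(classOf[User], classOf[Post]))

    val spark = SparkSession.builder().config(conf).getOrCreate()

    // Small RDD of the hypothetical type, just to show that the configuration loads.
    val users = spark.sparkContext.parallelize(Seq(User(1L, "a"), User(2L, "b")))
    println(users.count())

    spark.stop()
  }
}
```

Kryo mainly affects data that Spark serializes for shuffles and for serialized caching, so any gain depends on how much of a job's runtime is spent in those steps.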
The optimization results are supported by four performance diagrams and four descriptive tables.