The ETL Tool for Collecting Data from Social Networks

The project is a modern ETL (extract, transform, load) tool that provides a secure and efficient way to manage your data.
SOFTWARE
BUSINESS INTELLIGENCE
HADOOP
NOSQL

About the project

This business intelligence solution can obtain data from SQL databases, MS Excel files, tables, and text files, as well as from online services and social networks, including Salesforce, Facebook, and LinkedIn, while preserving the exact structure and content of each document. Users can pivot rows and columns, join, aggregate, and transform data, and sort or filter information by chosen criteria. Processed data can be loaded into any database or system selected as the output target. Users can also create and map data flows. The solution features an interface typical of Microsoft products, and editing follows the WYSIWYG principle: users see how data will be transformed and can add new relations between rows and columns with a mouse click.
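The operations listed above (filter, join, aggregate, pivot) can be illustrated with a minimal sketch in plain Python. It is not the product's implementation: the sample records and the function names `filter_rows`, `join`, `aggregate`, `pivot`, and `run_flow` are all hypothetical and exist only to show how such steps chain into one data flow.

```python
from collections import defaultdict

# Hypothetical sample inputs; a real flow would read them through connectors.
orders = [
    {"customer_id": 1, "quarter": "Q1", "amount": 120.0},
    {"customer_id": 2, "quarter": "Q1", "amount": 80.0},
    {"customer_id": 1, "quarter": "Q2", "amount": 200.0},
    {"customer_id": 3, "quarter": "Q2", "amount": 50.0},
]
customers = [
    {"customer_id": 1, "name": "Acme"},
    {"customer_id": 2, "name": "Globex"},
    {"customer_id": 3, "name": "Initech"},
]


def filter_rows(rows, predicate):
    """Keep only the rows for which predicate(row) is true."""
    return [row for row in rows if predicate(row)]


def join(left, right, key):
    """Inner join: merge each left row with the right row sharing `key`."""
    index = {row[key]: row for row in right}
    return [{**row, **index[row[key]]} for row in left if row[key] in index]


def aggregate(rows, group_by, field):
    """Sum `field` per combination of the `group_by` columns."""
    totals = defaultdict(float)
    for row in rows:
        totals[tuple(row[col] for col in group_by)] += row[field]
    return dict(totals)


def pivot(totals):
    """Turn {(row_key, col_key): value} into {row_key: {col_key: value}}."""
    table = defaultdict(dict)
    for (row_key, col_key), value in totals.items():
        table[row_key][col_key] = value
    return dict(table)


def run_flow():
    """Chain the steps: filter -> join -> aggregate -> pivot."""
    large = filter_rows(orders, lambda row: row["amount"] >= 80.0)
    joined = join(large, customers, "customer_id")
    totals = aggregate(joined, ("name", "quarter"), "amount")
    return pivot(totals)
```

Calling `run_flow()` returns one entry per customer with a column per quarter, which mirrors the kind of row-to-column pivot the interface exposes visually.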

The need

The customer owns a data integration platform that enables users to migrate and manage data across several sources. The company approached Altoros to expand the capabilities of its data analytics platform. The new system had to be based on Hadoop so that it could process data faster than the legacy platform, and it had to support multiple formats.

The challenge

The architecture of the legacy platform made it impossible to add new functionality without rewriting the core structure of the system. Altoros’s engineers solved this by creating the Extensions Framework, which enables users to build custom operators and save them as native components.
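The idea of custom operators that behave like native components can be sketched as a simple registry. This is only an illustration of the general plug-in pattern, not the Extensions Framework itself: the names `register_operator`, `run_pipeline`, and both sample operators are assumptions made for this example.

```python
from typing import Callable, Dict, Iterable, List

Row = Dict[str, object]
Operator = Callable[[Iterable[Row]], List[Row]]

# Hypothetical registry: built-in and custom operators are looked up by name.
_REGISTRY: Dict[str, Operator] = {}


def register_operator(name: str) -> Callable[[Operator], Operator]:
    """Decorator that stores an operator in the registry under `name`."""
    def decorator(func: Operator) -> Operator:
        if name in _REGISTRY:
            raise ValueError(f"operator {name!r} is already registered")
        _REGISTRY[name] = func
        return func
    return decorator


@register_operator("uppercase_names")
def uppercase_names(rows: Iterable[Row]) -> List[Row]:
    """Custom operator: upper-case the 'name' column."""
    return [{**row, "name": str(row["name"]).upper()} for row in rows]


@register_operator("drop_empty_emails")
def drop_empty_emails(rows: Iterable[Row]) -> List[Row]:
    """Custom operator: discard rows with a missing or empty 'email'."""
    return [row for row in rows if row.get("email")]


def run_pipeline(rows: Iterable[Row], operator_names: List[str]) -> List[Row]:
    """Apply registered operators in order; unknown names raise KeyError."""
    result = list(rows)
    for name in operator_names:
        result = _REGISTRY[name](result)
    return result
```

Because the pipeline only refers to operators by name, a new operator can be added without touching `run_pipeline`, which is the property the legacy architecture lacked.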

The solution

Altoros transformed the platform into a flexible system that can be easily extended through custom operators. In addition, third-party developers can now create extensions for any system they want to map. The ETL tool is a good fit when you need:

  • A cross-platform host
  • A big data migration tool
  • Scheduled data flow transformations
  • NoSQL compatibility
  • A simple business intelligence interface

Two options were considered for the data store: Couchbase and HBase. After a thorough feasibility study, we concluded that the first option, Couchbase, would be less cost-effective. Couchbase requires the working set to be held in memory; otherwise, access operations slow down considerably. Storing all of the data in Couchbase would therefore be too expensive.

The main advantage of the second option, HBase, was that the system could be deployed on commodity hardware and the database could be easily distributed across multiple hosts as the load increased. The result would be a scalable, low-cost solution, so HBase was chosen over Couchbase. HBase’s capabilities are sufficient to support queries and range scans, as well as access to data by key. Denormalization patterns and Solr were used to handle more complex queries on secondary fields.
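The access patterns named here (lookup by key, range scans, and denormalization for secondary fields) can be illustrated with a small in-memory stand-in. This sketch does not use the HBase or Solr APIs and does not reflect the project's actual schema: the `SortedKeyStore` class, the `day#event_id` key layout, and the sample events are all assumptions made for illustration.

```python
import bisect


class SortedKeyStore:
    """Toy store that keeps keys sorted, as a key-ordered table would."""

    def __init__(self):
        self._keys = []   # sorted list of row keys
        self._rows = {}   # row key -> row data

    def put(self, key, row):
        if key not in self._rows:
            bisect.insort(self._keys, key)
        self._rows[key] = row

    def get(self, key):
        """Point lookup by exact key; returns None if the key is absent."""
        return self._rows.get(key)

    def scan(self, start, stop):
        """Range scan over keys in [start, stop), in key order."""
        lo = bisect.bisect_left(self._keys, start)
        hi = bisect.bisect_left(self._keys, stop)
        return [(key, self._rows[key]) for key in self._keys[lo:hi]]


events = SortedKeyStore()          # primary table, keyed by day#event_id
events_by_user = SortedKeyStore()  # denormalized copy, keyed by user#day#event_id


def write_event(day, event_id, user, payload):
    """Write the row twice so that queries by user become a key range scan."""
    row = {"user": user, "payload": payload}
    events.put(f"{day}#{event_id}", row)
    events_by_user.put(f"{user}#{day}#{event_id}", row)


def events_for_user(user):
    """Prefix scan: '$' is the character right after '#' in ASCII order."""
    return events_by_user.scan(f"{user}#", f"{user}$")


# Hypothetical sample data.
write_event("2014-01-01", "0001", "alice", "login")
write_event("2014-01-02", "0002", "bob", "upload")
write_event("2014-01-03", "0003", "alice", "export")
```

The second table trades extra storage for cheap reads: a query on the secondary field `user` turns into a scan over a contiguous key range instead of a full-table filter. Full-text or multi-field search, which the project delegated to Solr, is not modeled here.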

The outcome

This ETL tool is a new-generation business intelligence solution built around its users. With its simple interface, the system offers a visually clear way to manage big data and work in cloud environments. The solution helps users gain new insights from unstructured business data through mapping and deep analysis.

Technology stack

Server platforms: Windows, Unix
Programming languages: C# (MS .NET), C/C++, Lua
Technologies: WPF, Actipro WPF Controls, Apache Hadoop

Contact us

Let's see what we can do together

Ryan Meharg

Cloud Solutions Architect

Headquarters

830 Stewart Dr., Suite 119, Sunnyvale, CA 94085