A common problem for researchers who work on genome analysis is the need to store and process terabytes of data quickly. The system was deployed on the Amazon public cloud and powered by Amazon Web Services and Amazon EMR. With this solution, our customer was able to process 150 GB of genome sequencing data within 24 hours in a cost-efficient manner.
Apart from building an algorithm for detecting single-nucleotide polymorphisms (SNPs), we had to determine which hardware configuration could provide the required data processing speed.
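To make the sizing question concrete, the sketch below works out the sustained throughput implied by a target of 150 GB in 24 hours and estimates how many worker nodes that would require. The per-node throughput value and the function names are hypothetical placeholders chosen for illustration, not measurements or code from the project.

```python
import math


def required_throughput_mb_s(data_gb: float, hours: float) -> float:
    """Sustained throughput in MB/s (decimal units) needed to finish on time."""
    return data_gb * 1000 / (hours * 3600)


def nodes_needed(data_gb: float, hours: float, per_node_mb_s: float) -> int:
    """Worker nodes required, assuming throughput scales linearly with node count."""
    return math.ceil(required_throughput_mb_s(data_gb, hours) / per_node_mb_s)


target = required_throughput_mb_s(150, 24)
print(f"Required throughput: {target:.2f} MB/s")  # Required throughput: 1.74 MB/s

# 0.25 MB/s per node is an assumed figure; a real value would come from benchmarking.
print(nodes_needed(150, 24, per_node_mb_s=0.25))  # 7
```

Linear scaling is an optimistic assumption; in practice a benchmark on a small cluster would be needed to obtain a realistic per-node figure.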
The customer helps scientists and laboratories conduct research and experiments in the field of life sciences. Their key services include next-generation sequencing, bioanalytical and mass spectrometry, as well as DNA sequencing. The customer turned to Altoros to develop a solution that would make detecting SNPs in digitized DNA sequences saved in the FASTA/FASTQ format easier and less time-consuming.
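For readers unfamiliar with the input format, here is a minimal sketch of reading FASTQ records. It assumes the common four-line layout (an `@` header, the sequence, a `+` separator, and a quality string of the same length) with no line wrapping. The `FastqRecord` type and `parse_fastq` function are illustrative names, not part of the customer's system.

```python
import io
from typing import Iterator, NamedTuple, TextIO


class FastqRecord(NamedTuple):
    read_id: str
    sequence: str
    quality: str


def parse_fastq(handle: TextIO) -> Iterator[FastqRecord]:
    """Yield records from a FASTQ stream, assuming unwrapped four-line records."""
    while True:
        header = handle.readline()
        if not header:
            return  # end of input
        header = header.rstrip("\r\n")
        if not header:
            continue  # skip blank lines between records
        sequence = handle.readline().rstrip("\r\n")
        separator = handle.readline().rstrip("\r\n")
        quality = handle.readline().rstrip("\r\n")
        if not header.startswith("@") or not separator.startswith("+"):
            raise ValueError(f"Malformed FASTQ record near {header!r}")
        if len(sequence) != len(quality):
            raise ValueError(f"Sequence/quality length mismatch in {header!r}")
        yield FastqRecord(header[1:], sequence, quality)


sample = "@read1\nACGT\n+\nIIII\n@read2\nTTGA\n+\n!!II\n"
for record in parse_fastq(io.StringIO(sample)):
    print(record.read_id, record.sequence)
```

Streaming one record at a time keeps memory use flat, which matters when input files run to many gigabytes.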
The team completed the following tasks for this project:
With the help of the automated SNP detection system, the biological laboratory of our customer was able to process 150 GB of genome sequence data within 24 hours at minimum cost. We started by developing a prototype to test the possible deployment options and make sure the functionality worked correctly. The system for SNP detection was later installed on the customer’s private distributed infrastructure, and data processing was performed with Apache Hadoop.
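The general idea of expressing SNP detection as a map/reduce job can be sketched in a few lines. This is not the algorithm built for the customer, which the case study does not describe. It is a simplified illustration that assumes reads are already aligned to a reference (each read carries its start position) and that a position is reported when the dominant observed base differs from the reference. All function names and thresholds are hypothetical.

```python
from collections import Counter, defaultdict


def map_read(start, sequence):
    """Map step: emit one (reference position, observed base) pair per read base."""
    for offset, base in enumerate(sequence):
        yield start + offset, base


def reduce_position(reference_base, bases, min_depth=3, min_fraction=0.8):
    """Reduce step: return the alternate base if the position looks like a SNP."""
    if len(bases) < min_depth:
        return None  # too few reads cover this position to decide
    base, count = Counter(bases).most_common(1)[0]
    if base != reference_base and count / len(bases) >= min_fraction:
        return base
    return None


def call_snps(reference, aligned_reads, **thresholds):
    """Run map, group by position (the shuffle), then reduce each position."""
    grouped = defaultdict(list)
    for start, sequence in aligned_reads:
        for position, base in map_read(start, sequence):
            grouped[position].append(base)

    snps = {}
    for position in sorted(grouped):
        if position >= len(reference):
            continue  # read extends past the end of the reference
        alt = reduce_position(reference[position], grouped[position], **thresholds)
        if alt is not None:
            snps[position] = (reference[position], alt)
    return snps


reference = "ACGTACGT"
reads = [(0, "ACGAAC"), (2, "GAACGT"), (1, "CGAACG")]
print(call_snps(reference, reads))  # {3: ('T', 'A')}
```

Because each position is reduced independently, the work can be partitioned by position across nodes, which is what makes the problem a natural fit for a framework such as Apache Hadoop.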
Linux, Amazon Web Services
Client Platforms/Application Servers
Internet Explorer, Firefox, Safari, Chrome
Perl, Java, Bash
MapReduce, Java, HTML, Apache Hadoop, Amazon EMR