Planning Your Cloud Stack: Storage and Database Solutions

by Vitaly Sedelnik, October 12, 2010

It is high time to review the possible options for organizing storage and databases in the Cloud. Cloud storage means keeping your data on virtual servers, commonly hosted by third-party companies, with the ability to scale resources as requirements change. As for Cloud database solutions, secure access to a database hosted off-premises is crucial for every company, and GreenEgg is no exception.

Cloud Storage solutions:

  • Amazon S3

Amazon S3 is an online service that provides storage through web service interfaces. S3 can store any amount of data and make it accessible from anywhere on the Web, but much of its usage comes from its tight integration with Amazon's compute platform, EC2. For example, S3 holds the machine images uploaded for an EC2 account, and S3 storage can also be mounted as a file system inside an EC2 instance. Another typical use of S3 is hosting static content, such as photos, audio, and video. Amazon claims that S3 uses the same scalable storage infrastructure that Amazon itself uses to run its global e-commerce network, so the main advantages of S3 include scalability, high availability, and low latency. S3 is accessible via both REST and SOAP interfaces.
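To make the static-content use case more concrete, here is a minimal sketch of how a publicly readable S3 object is addressed over the REST interface. The bucket and key names are hypothetical, and the helper `s3_object_url` is my own illustration rather than part of any Amazon SDK. Private objects additionally require signed requests, which this sketch does not cover.

```python
from urllib.parse import quote


def s3_object_url(bucket: str, key: str) -> str:
    """Build a virtual-hosted-style URL for an S3 object.

    Slashes in the key are kept as path separators; other unsafe
    characters (spaces, for example) are percent-encoded.
    """
    return f"https://{bucket}.s3.amazonaws.com/{quote(key)}"


if __name__ == "__main__":
    # Hypothetical bucket and key, used for illustration only.
    print(s3_object_url("greenegg-static", "photos/product 1.jpg"))
```

A plain HTTP GET on such a URL returns the object, which is why S3 works well as a backend for images and other static assets of an online store.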

  • SimpleDB

SimpleDB, another part of Amazon Web Services, is a distributed store for structured data. Unlike in strict relational databases, data in SimpleDB is organized into domains, items, attributes, and values: a domain is a collection of items, and each item is described by attribute-value pairs. The key benefit of SimpleDB is automatic, geo-redundant replication: for each data item, multiple replicas are created in different data centers within the selected region, which enables high availability and low latency. SimpleDB also automatically indexes data items to enable efficient queries and provides a simple API for storage and access.
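The domain/item/attribute/value model is easier to see in code. The sketch below is a toy in-memory stand-in, not the SimpleDB API: the class `ToyDomain` and its methods are hypothetical names of my own. It only illustrates that items need no fixed schema and that one attribute can hold several values.

```python
from collections import defaultdict


class ToyDomain:
    """In-memory stand-in for a SimpleDB-style domain (not the real API)."""

    def __init__(self, name):
        self.name = name
        # item name -> attribute name -> set of values
        self.items = defaultdict(lambda: defaultdict(set))

    def put(self, item_name, attribute, value):
        """Add a value to an attribute; an attribute may hold several values."""
        self.items[item_name][attribute].add(value)

    def get(self, item_name):
        """Return the item's attributes as a dict of sorted value lists."""
        attrs = self.items.get(item_name, {})
        return {attr: sorted(vals) for attr, vals in attrs.items()}

    def select(self, attribute, value):
        """Return sorted names of items whose attribute contains the value."""
        return sorted(
            name
            for name, attrs in self.items.items()
            if value in attrs.get(attribute, ())
        )


if __name__ == "__main__":
    # Hypothetical catalog data, used for illustration only.
    products = ToyDomain("products")
    products.put("item_001", "category", "eggs")
    products.put("item_001", "color", "green")
    products.put("item_001", "color", "white")
    products.put("item_002", "category", "eggs")
    print(products.get("item_001"))
    print(products.select("color", "green"))
```

Note that the two items carry different attribute sets, which a relational table would not allow without schema changes.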

Cloud Database solutions:

  • The first and most obvious option is to run one of the regular database engines. Most Cloud providers offer ready-to-run server configurations that contain standard database engines, such as MySQL, Oracle, PostgreSQL, Sybase, etc. This option doesn’t really differ from running a database on regular hosting.
  • MySQL Cluster

MySQL Cluster is a technology that adds clustering capabilities to the MySQL database engine. Its main goals are high availability and performance, but at the same time it allows for nearly linear scalability. In MySQL Cluster, each node is independent and self-sufficient, and there is no single point of failure in the system. It uses synchronous replication to guarantee that data is written to multiple nodes when a transaction commits. The MySQL Cluster engine is based on the regular MySQL engine and supports most of its features.
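The idea behind synchronous replication can be sketched in a few lines. This is a toy model under my own assumptions, not MySQL Cluster's actual commit protocol: the `Replica` class and `synchronous_write` function are hypothetical. The point it illustrates is that a write counts as committed only once every replica has accepted it, and is rolled back everywhere otherwise.

```python
class Replica:
    """A toy data node that keeps committed rows in memory."""

    def __init__(self, name, healthy=True):
        self.name = name
        self.healthy = healthy
        self.rows = {}      # committed data
        self.pending = {}   # staged but not yet committed writes

    def prepare(self, key, value):
        """Stage a write; return False if the node cannot accept it."""
        if not self.healthy:
            return False
        self.pending[key] = value
        return True

    def commit(self, key):
        """Make a staged write visible."""
        self.rows[key] = self.pending.pop(key)

    def abort(self, key):
        """Discard a staged write, if any."""
        self.pending.pop(key, None)


def synchronous_write(replicas, key, value):
    """Commit only if every replica staged the write; otherwise roll back."""
    staged = []
    for replica in replicas:
        if replica.prepare(key, value):
            staged.append(replica)
        else:
            # One replica refused: undo the staged writes and report failure.
            for done in staged:
                done.abort(key)
            return False
    for replica in staged:
        replica.commit(key)
    return True


if __name__ == "__main__":
    a, b = Replica("a"), Replica("b")
    print(synchronous_write([a, b], "order:1", "paid"))
    broken = Replica("c", healthy=False)
    print(synchronous_write([a, broken], "order:2", "new"))
```

The cost of this guarantee is that each commit waits for the slowest replica, which is the usual trade-off against asynchronous replication.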

  • ScaleDB

ScaleDB is a pluggable storage engine for MySQL. It turns MySQL into a highly available, clustered database that can be scaled dynamically in the Cloud. Unlike MySQL Cluster, ScaleDB uses a centralized Cluster Manager that enables multiple nodes to share the same physical data. This allows for dynamic scalability, but a centralized manager can introduce a single point of failure into the system. However, ScaleDB claims that if the Cluster Manager fails, one of the database nodes takes over the manager role. ScaleDB also provides an open source API licensed under GPL v2.

  • Other notable database solutions for the Cloud are Amazon RDS, SQL Azure, FathomDB, and LongJump.

In my next two posts, I will get back to the analysis of the most suitable e-commerce solutions and the corresponding Cloud products for running GreenEgg’s business in the Cloud.