Planning Your Cloud Stack for Magento and Apache OFBiz

by Vitaly Sedelnik, October 18, 2010
What technology options are the most suitable for a cloud stack running Magento and Apache Open for Business (OFBiz) as e-commerce platforms?

A cloud stack for a green energy company

Today, I will describe the challenges, problems, and solutions identified in the process of building an enterprise cloud for one of our customers. It is a green energy company, which we will call “GreenEgg” (the name is fictional), that provides its customers with electricity and solar devices. A significant part of GreenEgg’s business is on the web: the company uses its website for billing management and e-commerce (selling solar devices and tracking their shipment). Since the website is deployed on regular hosting, GreenEgg is experiencing several typical issues in its everyday operation:

  1. Website traffic is irregular, with peaks and gaps. During the peaks, the server can be overloaded and respond slowly; during the gaps, the server stands idle and resources are wasted.
  2. It’s vital for GreenEgg’s website to have uptime as close to 100% as possible. This is of critical importance for several large-scale customers. Currently, GreenEgg uses an internal backup server and specific maintenance services, and both options imply additional expenses.
  3. To expand its business to other areas and to attract new customers, GreenEgg needs a smooth way to expand the capacity of the website. This could mean implementing its own cluster with several nodes, load balancing, etc.
  4. Enterprise-level customers of GreenEgg often require a proven secure storage solution for their confidential information (for example, financial and contact information).
  5. In addition, GreenEgg needs a better tool to monitor and analyze the website activity, detect problems, and generate reports. Regular web statistics tools are becoming too slow and complicated as the company grows.

When defining the most appropriate cloud architecture for GreenEgg, we reviewed several systems as candidates to run the company’s website and considered custom development as well. In the end, the top two e-commerce alternatives were Magento and Apache Open for Business (OFBiz).

This blog post describes how a cloud solution helped GreenEgg to resolve the problems above, as well as how the company planned and implemented the solution from scratch. We will focus on utilizing the most popular cloud products and service providers, such as Amazon EC2, RightScale, enStratus, and Eucalyptus. Generally speaking, we’ll focus on a full stack necessary for deploying a PHP/MySQL application in the cloud.

 

Defining the cloud layers

In light of the great variety of cloud products, tools, and services available on the market, we need a way to classify them. That will help us to understand which products can work together, supplement each other, and—vice versa—which ones cannot be used simultaneously in a single cloud stack. Let’s take generic cloud computing layers as a basis:

This scheme is quite clear and straightforward, but five layers are definitely not enough to cover all the variety of cloud products, tools, and services available. For example, many applications allow for using different application servers, web servers, and storage solutions. Say, OFBiz can be configured to use Tomcat or Jetty as a web server. A storage solution for Magento could be standard MySQL, MySQL Cluster, or ScaleDB. Further, the Platform, Infrastructure, and Server layers do not cover such aspects as hardware virtualization, different operating systems, and management tools.

Considering all of the above, we introduce the following 10 layers to represent the cloud stack (from top to bottom):

  1. Clients (programs/devices that users with different roles run/access)
  2. Application’s UI (typically, within a browser or a desktop application)
  3. Application services (basically, the ways for a user to communicate with the application)—typically, it is HTTP (web) and web services API (REST, SOAP, etc.)
  4. Application’s code (core code base of your application)
  5. Middleware (software that connects your application with other software components and allows them to work together)—typically, this is an application and/or web server
  6. Storage (a data store for your application)
  7. Cloud management (a solution that manages instances of your application in the cloud, provides administration and monitoring services—e.g., RightScale or enStratus)
  8. OS (an operating system that runs your application)
  9. Hardware virtualization (an application deployed in a cloud usually does not run on a single dedicated server; it starts up on a cloud node on request and shuts down when it’s no longer needed)
  10. Hardware/infrastructure (includes a physical server(-s) that runs your application and network infrastructure—typically provided by a cloud vendor like Amazon EC2)

The layers above classify products and services in more detail and map better onto the structure of a typical web application. Later, I’ll refer to these layers to analyze and select the most suitable cloud solution for GreenEgg.
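To make the classification concrete, the ten layers can be written down as a small data structure and used as a checklist while evaluating products. The following Python sketch is only an illustration: the `Choice` class, the `unfilled_layers` helper, and the sample `draft` selection are hypothetical names introduced here, not part of any tool mentioned in this post.

```python
from dataclasses import dataclass

# The ten layers from the list above, ordered top to bottom (names shortened).
LAYERS = (
    "Clients",
    "Application UI",
    "Application services",
    "Application code",
    "Middleware",
    "Storage",
    "Cloud management",
    "OS",
    "Hardware virtualization",
    "Hardware/infrastructure",
)


@dataclass(frozen=True)
class Choice:
    """One product picked for one layer (hypothetical helper type)."""
    layer: str
    product: str


def unfilled_layers(choices):
    """Return the layers, top to bottom, that have no product assigned yet."""
    known = set(LAYERS)
    for choice in choices:
        if choice.layer not in known:
            raise ValueError(f"unknown layer: {choice.layer}")
    chosen = {choice.layer for choice in choices}
    return [layer for layer in LAYERS if layer not in chosen]


# A partial, illustrative selection using products named in this post.
draft = [
    Choice("Cloud management", "RightScale"),
    Choice("Hardware/infrastructure", "Amazon EC2"),
    Choice("Storage", "MySQL"),
]

for layer in unfilled_layers(draft):
    print("still to decide:", layer)
```

Keeping the layers in one ordered tuple means every candidate stack can be checked against the same list, which makes gaps easy to spot.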

 

Cloud infrastructure providers

Cloud infrastructure is a physical and organizational environment (servers and software) for hosting and running your cloud applications. The choice of a cloud infrastructure provider depends on many factors, but the first thing a customer needs to do is identify which features (autoscaling, a web-based control panel, advanced configuration possibilities, etc.) are needed and which pricing option fits best. Let’s have a closer look at the cloud infrastructure solutions available on the market.

Public cloud infrastructure:

  • Amazon EC2 is Amazon’s cloud computing platform that enables users to rent virtual computers to run any application. EC2 supports both Windows and Linux virtual servers, scalable deployment with geographical location control, and autoscaling adapted to the website traffic. All the features are controlled via an API. Users are charged per actual running time and data transfer (charges for optional features could be applied, as well). The pricing options include plans from standard instances to high-memory or high-CPU configurations. Along with EC2, Amazon offers the tightly integrated S3 (Simple Storage Service)—a cloud storage solution.
  • GoGrid is a cloud infrastructure provider that hosts Linux and Windows virtual machines. Management is performed via a web-based control panel, while an API is available, too. Among the main features, I can mention load balancing, cloud and dedicated servers, and a cloud storage service.
  • Other notable cloud infrastructure providers include Rackspace, FlexiScale, Nimbus, etc.
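Pay-per-use pricing matters most when traffic is uneven. The Python sketch below compares a fixed deployment sized for the peak with an elastic one that scales down between peaks. All rates, hours, and traffic volumes are hypothetical values chosen for illustration; they are not actual prices of EC2, GoGrid, or any other provider.

```python
def monthly_cost_cents(instance_hours, gb_transferred,
                       hourly_rate_cents, per_gb_rate_cents):
    """Pay-per-use bill: running time plus data transfer, in integer cents."""
    return (instance_hours * hourly_rate_cents
            + gb_transferred * per_gb_rate_cents)


# Hypothetical figures for illustration only; not real provider prices.
HOURLY_RATE_CENTS = 10   # assumed price of one instance-hour
PER_GB_RATE_CENTS = 15   # assumed price of one GB transferred
HOURS_IN_MONTH = 720     # 30 days
PEAK_HOURS = 120         # assumed hours per month with peak traffic
GB_TRANSFERRED = 500     # assumed monthly traffic, same in both scenarios

# Scenario 1: four instances sized for the peak run all month.
fixed_hours = 4 * HOURS_IN_MONTH
# Scenario 2: four instances during peaks, one instance the rest of the time.
elastic_hours = 4 * PEAK_HOURS + 1 * (HOURS_IN_MONTH - PEAK_HOURS)

fixed_bill = monthly_cost_cents(
    fixed_hours, GB_TRANSFERRED, HOURLY_RATE_CENTS, PER_GB_RATE_CENTS)
elastic_bill = monthly_cost_cents(
    elastic_hours, GB_TRANSFERRED, HOURLY_RATE_CENTS, PER_GB_RATE_CENTS)

print(f"fixed:   ${fixed_bill / 100:.2f}")
print(f"elastic: ${elastic_bill / 100:.2f}")
```

Amounts are kept in integer cents to avoid floating-point rounding in the comparison.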

Another option is a private cloud infrastructure that can also be utilized to build up a cloud based on a company’s internal data center. In reality, it refers to corporate data centers adopting the technologies and practices of public cloud infrastructures, including systems management software, cluster/grid technology, load balancing, and virtualization. Private infrastructure helps to avoid the most common security pitfalls associated with other cloud options, since the customer’s data cannot be accessed by any third party. With this option, resources are reached via cloud management software—for example, Eucalyptus or Terracotta.

These are just a few of the cloud infrastructure providers, chosen from among the industry leaders.

 

Cloud management providers

Now that we know the infrastructure providers available to GreenEgg, we need to find a proper cloud management solution. A cloud management platform lets you manage entire deployments instead of administering individual servers. Here, I will give a brief overview of the leading cloud management providers.

  • RightScale is a web-based cloud management platform that supports multiple infrastructure providers. In particular, it can work with EC2, GoGrid, FlexiScale, and Rackspace. Its top features include automated creation of servers and server arrays, quick deployment, dynamic scalability, and an API for implementing monitoring, alerting, load balancing, etc. All tasks are performed via a web-based application, the RightScale Dashboard. RightScale is available on a subscription basis, but a free Developer edition exists as well.
  • Scalr is an open-source, web-based cloud computing platform for managing the Amazon EC2 cloud. It’s the main competitor of RightScale. The code of Scalr is freely available under the GNU General Public License (version 2) and is hosted at Google Code. Support is available on a monthly subscription basis.
  • Eucalyptus is an open-source software platform that implements cloud computing on computer clusters. The platform allows for organizing a private cloud based on a company’s data center(-s) and enables users to access cloud computing resources. The Eucalyptus interface is compatible with EC2, so it can be used to implement hybrid clouds. Eucalyptus supports Linux and Windows virtual machines, multiple clusters, elastic IPs, as well as user and group management. The platform is implemented in Java and C. There is also a commercial Enterprise Edition of Eucalyptus available.
  • Other notable cloud management providers and solutions are Microsoft Azure, enStratus, and Terracotta.
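The dynamic scalability these platforms offer usually comes down to a rule that compares a load metric with thresholds and adjusts the number of servers. The following Python sketch shows one such threshold rule. It is a hypothetical illustration and not the API of RightScale, Scalr, or Eucalyptus; the function name, the CPU metric, and the default thresholds are all assumptions.

```python
def desired_servers(current, avg_cpu, *, scale_up_at=0.75, scale_down_at=0.25,
                    min_servers=1, max_servers=10):
    """Return the server count after one evaluation of a threshold rule.

    avg_cpu is the average CPU utilization across the array, from 0.0 to 1.0.
    The array grows or shrinks by one server per evaluation and always stays
    within [min_servers, max_servers].
    """
    if avg_cpu > scale_up_at:
        target = current + 1
    elif avg_cpu < scale_down_at:
        target = current - 1
    else:
        target = current
    return max(min_servers, min(max_servers, target))


# Simulate a traffic peak followed by a quiet period.
servers = 2
for load in (0.90, 0.85, 0.50, 0.10, 0.10, 0.10):
    servers = desired_servers(servers, load)
    print(f"load={load:.2f} -> {servers} server(s)")
```

Changing the array by one server per evaluation and clamping it to a fixed range keeps the rule from overreacting to a short spike.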

Our next step will be to identify what solutions are available to organize a database/storage in the cloud.

 

Storage and database solutions

It is time to review the possible options for organizing storage and databases in the cloud. Cloud storage implies keeping your data on virtual servers, commonly hosted by third-party companies, with the ability to scale the resources as the requirements change. As for cloud database solutions, secure access to a database hosted off-premises is crucial for every company, and GreenEgg is no exception.

Cloud storage options:

  • Amazon S3 is an online service that provides storage via the web interface. S3 can store arbitrary data of any size accessible from anywhere through the web, but most of its usage arises from the fact that S3 is tightly integrated with Amazon’s cloud platform, EC2. S3 is used to upload machine images to an EC2 account. Also, S3 storage can be mounted as a file system into an EC2 instance. Another typical use of S3 is hosting of static content, such as photos, audio/video, etc. Amazon claims that S3 utilizes the same scalable storage infrastructure that Amazon.com uses to run its own global e-commerce network, so the main advantages of S3 include scalability, high availability, and low latency. S3 enables access via the REST and SOAP interfaces, as well.
  • SimpleDB is another part of Amazon Web Services, a structured distributed storage solution. Unlike strict relational databases, SimpleDB data is organized in domains, items, attributes, and values. Domains are collections of items that are described by attribute-value pairs. The key benefit of SimpleDB is automatic, georedundant replication. So, for each data item, multiple replicas are created in different data centers within a selected region. This enables high availability and low latency. SimpleDB also automatically indexes data items to enable efficient queries and provides a simple API for storage and access.
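The SimpleDB data model described above (domains that contain items, which are described by attribute-value pairs) can be pictured with a small in-memory model. The Python sketch below is only a toy illustration of that structure. It does not call the SimpleDB API, and the `Domain` class, its methods, and the sample data are hypothetical.

```python
from collections import defaultdict


class Domain:
    """Toy in-memory model of a domain: items described by attribute values."""

    def __init__(self, name):
        self.name = name
        # item name -> attribute name -> set of values
        self._items = defaultdict(lambda: defaultdict(set))

    def put(self, item_name, attribute, value):
        """Add a value to an attribute; an attribute may hold several values."""
        self._items[item_name][attribute].add(str(value))

    def get(self, item_name):
        """Return all attributes of an item as {attribute: sorted values}."""
        attrs = self._items.get(item_name, {})
        return {attr: sorted(values) for attr, values in attrs.items()}

    def select(self, attribute, value):
        """Return the names of items whose attribute contains the value."""
        return sorted(
            name for name, attrs in self._items.items()
            if str(value) in attrs.get(attribute, ())
        )


devices = Domain("devices")
devices.put("device-1", "type", "solar panel")
devices.put("device-1", "color", "black")
devices.put("device-1", "color", "blue")
devices.put("device-2", "type", "inverter")

print(devices.get("device-1"))
print(devices.select("color", "blue"))
```

Unlike a relational table, there is no fixed schema here: each item carries only the attributes it needs, and one attribute can hold more than one value.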

Cloud database options:

  • The first, quite obvious option is to run a regular RDBMS. Most cloud providers offer ready-to-run server configurations that contain standard database engines, such as MySQL, Oracle, PostgreSQL, Sybase, etc. This option doesn’t really differ from running a database on regular hosting.
  • MySQL Cluster is another technology that provides clustering capabilities for the MySQL database engine. Its main goals are high availability and performance, but at the same time it allows for nearly linear scalability. In MySQL Cluster, each node is independent and self-sufficient, and there is no single access point across the system. It uses synchronous replication in order to guarantee that data is written to multiple nodes upon committing the data. The MySQL Cluster engine is based on the regular MySQL engine and supports most of its features.
  • ScaleDB is a pluggable storage engine for MySQL. It turns MySQL into a highly available, clustered database that can be scaled dynamically in the cloud. Unlike MySQL Cluster, ScaleDB uses a centralized cluster manager that enables multiple nodes to share the same physical data. This allows for dynamic scalability, but utilizing a centralized manager can bring a single point of failure into the system. However, ScaleDB claims that if the cluster manager fails, one of the database nodes takes over the manager role. ScaleDB also provides an open-source API licensed under GPL v2.
  • Other notable database solutions for the cloud are Amazon RDS, SQL Azure, FathomDB, and LongJump.
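The idea behind synchronous replication, as mentioned for MySQL Cluster above, is that a write counts as committed only once every replica has it. The Python sketch below illustrates that all-or-nothing idea in a deliberately simplified form. It is not how MySQL Cluster is implemented internally: the real engine’s commit protocol and its handling of node failures are more involved and are not modeled here. The `Replica` class and `synchronous_write` function are hypothetical names.

```python
class Replica:
    """Toy data node that stores rows in memory."""

    def __init__(self, name):
        self.name = name
        self.available = True
        self.rows = {}


def synchronous_write(replicas, key, value):
    """Apply a write to every replica, or to none of them.

    Returns True if the write was committed on all replicas and False if it
    was refused because at least one replica was unavailable.
    """
    if not replicas or not all(r.available for r in replicas):
        return False  # refuse rather than leave the replicas out of sync
    for replica in replicas:
        replica.rows[key] = value
    return True


node_a, node_b = Replica("node-a"), Replica("node-b")

first = synchronous_write([node_a, node_b], "order:1", "paid")
node_b.available = False  # simulate an outage of one node
second = synchronous_write([node_a, node_b], "order:2", "paid")

print("first committed:", first)
print("second committed:", second)
print("replicas identical:", node_a.rows == node_b.rows)
```

The price of this guarantee is that every commit waits for all replicas, which is why network latency between data nodes matters for such setups.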

Let’s get back to the analysis of the most suitable e-commerce solutions and the corresponding products for running GreenEgg’s business in the cloud.

 

A cloud stack for Magento

Magento is an open-source e-commerce web application based on the components of Zend Framework. Magento uses Apache as a web server and MySQL as a database engine. So, to build a complete solution stack, we need to define a cloud infrastructure provider, a management solution (optionally), and a database engine.

Infrastructure options:

  • Private infrastructure needs to be considered because of the security and privacy requirements. The major drawback of the private infrastructure is that it takes a lot of administration and maintenance effort. That’s why it is hardly as cost-effective as the solutions offered by specialized providers. In addition, it could make sense to implement a hybrid cloud model, hosting all sensitive data in-house.
  • One of the most mature cloud infrastructure providers is EC2: it has ready-to-use server images for PHP/MySQL applications and great extension capabilities. Also, most cloud services and tools on the market support EC2 (in many cases, they were developed for EC2; take RightScale, for instance). All this can greatly reduce the required administration effort. So, as a cloud provider, EC2 seems to be the best option for GreenEgg.

Management options:

  • Manual administration is needed in any case, especially at the beginning of the cloud implementation process. But later, it’s more effective to utilize a specialized management solution.
  • Eucalyptus can be useful in case of running a hybrid cloud, since it implements the EC2 interface and can integrate private infrastructure into a public EC2-based cloud.
  • RightScale is the best candidate for managing an EC2 cloud. It integrates tightly with EC2 and has everything needed to run an EC2 cloud, including monitoring, alerting, automated scaling, etc. However, because of its price, manual administration could also be considered at the initial stages of cloud deployment.

Database options:

  • Running a regular MySQL engine can be an option at the beginning of implementing the cloud solution, because no additional configuration or management is needed. However, as the traffic and website popularity grow, scalable solutions may be required.
  • ScaleDB implements a centralized architecture that allows for dynamic scalability, but it can bring a single point of failure into the system. ScaleDB might be a good solution if the site traffic is not stable and has unpredictable peaks. However, in the case of high and geographically distributed traffic, ScaleDB might show poor performance.
  • MySQL Cluster seems to be the best database engine for Magento because of its distributed and scalable architecture. It can easily replace a regular MySQL database and handle high traffic and regional expansion by adding a new data node in an appropriate region of the EC2 cloud.
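The reasoning in the three database options above can be condensed into a simple decision rule. The Python sketch below is only one possible reading of that reasoning, not a definitive selection procedure; the function name and its boolean inputs are hypothetical simplifications of real traffic measurements.

```python
def suggest_database(high_traffic=False, geo_distributed=False,
                     unpredictable_peaks=False):
    """Map a rough traffic profile to one of the engines discussed above."""
    if high_traffic and geo_distributed:
        # Distributed, scalable architecture; data nodes can be added per region.
        return "MySQL Cluster"
    if unpredictable_peaks:
        # Dynamic scalability through a centralized cluster manager.
        return "ScaleDB"
    if high_traffic or geo_distributed:
        return "MySQL Cluster"
    # Early stage: no additional configuration or management needed.
    return "MySQL"


print(suggest_database())
print(suggest_database(unpredictable_peaks=True))
print(suggest_database(high_traffic=True, geo_distributed=True))
```

In practice, the inputs would come from monitoring data rather than flags, and the thresholds for "high" traffic would need to be agreed on per project.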

Now, let’s sum everything up. If Magento is implemented, the proposed cloud solution stack for GreenEgg is as follows:

  1. EC2
  2. manual management or RightScale
  3. MySQL Cluster
  4. Apache

For other possible technologies within this stack, please see the reference architecture below.

A cloud stack for Magento

 

A cloud stack for Apache OFBiz

Let’s elaborate on the options for another e-commerce solution, Apache Open for Business (OFBiz). Again, I’ll focus on cloud infrastructure, database, and management options most suitable for this cloud stack.

OFBiz is an open-source enterprise automation software project that includes components such as CMS, e-commerce, CRM, ERP, etc. OFBiz is based on the Java EE platform and makes use of JavaDB as a database server, Apache Geronimo as an application server, and Tomcat or Jetty as a web server.

The infrastructure options for OFBiz are the same as for Magento: the preferred option is EC2, but a private infrastructure or a hybrid cloud should also be considered. OFBiz utilizes JavaDB (also called Apache Derby) as a database engine. JavaDB has no built-in clustering support. However, it could be integrated with another open-source solution called Sequoia to provide replication over multiple data nodes, load balancing, and failover. Sequoia is distributed under the Apache license.

OFBiz, the Accounting Manager app

Since OFBiz and its application server, Apache Geronimo, are both based on the Java EE platform, OFBiz has built-in support for clustering at the application-server level, so clustering becomes a matter of properly configuring Geronimo. With a management solution like RightScale, application nodes can be added automatically; the only thing you need to do is prepare appropriate instance templates and scripts.

Another promising clustering option for running OFBiz is the Terracotta framework, which allows virtually any Java application to be executed on several instances within a cloud. Because Terracotta works at the JVM level, it makes it possible to run tasks such as sending e-mails, preparing reports, or doing maintenance jobs on dedicated nodes and thereby improve the overall system performance.
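The idea of moving background work to dedicated nodes can be sketched as a small routing rule. The Python example below is a language-neutral illustration of that idea only. It does not use Terracotta or any of its APIs, and the task kinds, node names, and `assign_node` function are hypothetical.

```python
# Hypothetical task kinds, taken from the examples in the paragraph above.
BACKGROUND_KINDS = {"email", "report", "maintenance"}


def assign_node(task_kind, web_nodes, worker_nodes, counter):
    """Pick a node name for a task, round-robin within the matching pool.

    Background tasks go to dedicated worker nodes when any exist; all other
    tasks, and background tasks when there are no workers, go to web nodes.
    """
    use_workers = task_kind in BACKGROUND_KINDS and bool(worker_nodes)
    pool = worker_nodes if use_workers else web_nodes
    if not pool:
        raise ValueError("no nodes available")
    return pool[counter % len(pool)]


web = ["web-1", "web-2"]
workers = ["worker-1"]

for i, kind in enumerate(["checkout", "email", "checkout", "report"]):
    print(kind, "->", assign_node(kind, web, workers, i))
```

Keeping slow jobs off the nodes that serve customer requests is what improves the perceived performance of the storefront.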

So, the proposed solution stack for OFBiz is as follows:

  1. EC2–RightScale–JavaDB
  2. Sequoia–Terracotta
  3. Geronimo–Tomcat

For other possible technologies within this stack, please see the reference architecture below.

A cloud stack for Apache OFBiz

Let me know if you have any technical questions or want us to assist you with an architecture design for your cloud system.

 

About the author

Vitaly Sedelnik is Lead Cloud Solutions Engineer at Altoros. He has extensive experience with managing development, operations, and support teams for both startups and enterprises. As a solutions architect, Vitaly helped several customers of Altoros move to the cloud and increase DevOps productivity by adopting Cloud Foundry. He was also actively involved in working with other cloud platforms—such as RightScale, OpenStack, vSphere, CloudStack, etc.


This blog post is written by Vitaly Sedelnik; edited by Alex Khizhniak.