Building a Custom BOSH CPI for the Cloud Foundry PaaS: A GCE Example

by Alexander Lomov, September 1, 2014
Learn about version compatibility and architectural issues, how to address them, how to bind BOSH and a CPI, create custom stemcells, etc.

Portability and cross-platform compatibility are the fundamental principles and key advantages of the Cloud Foundry PaaS. Despite that, until recently its architecture supported only a limited number of cloud platforms: OpenStack, AWS, vSphere, vCloud, and Warden. However, thanks to the efforts of the community, some new names have been added to the list of available IaaS vendors. At the end of May 2014, Pivotal released its Google Compute Engine CF-BOSH CPI. Developers are currently discussing ways to create a CPI for Microsoft Azure in the BOSH Developers Google group. Finally, the BOSH team has released an experimental version of the external CPI that can serve as a new way to create CPIs.

In this post, we share our experience of developing a custom CPI for Cloud Foundry using the standard CPI mechanism. You will also learn about the issues we encountered and get some tips on how to address them.

 

Teaching Cloud Foundry to work with a new IaaS

Although Cloud Foundry itself can be deployed on almost any cloud infrastructure (with custom tools), BOSH, the official tool for installing this PaaS, supports only a limited range of platforms. It interacts with different IaaS platforms through the BOSH Cloud Provider Interface (CPI), a set of common methods for working with a specific cloud. These methods create and remove images, start and stop VMs, set up networks, and perform other cloud management tasks. Since BOSH did not support GCE out of the box, we had to extend BOSH and create a CPI for this infrastructure.
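To make the idea of "a set of common methods" more concrete, here is a minimal Ruby sketch of a CPI-like class backed by an in-memory fake cloud. The method names mirror some of those used in BOSH CPIs (create_stemcell, create_vm, delete_vm, has_vm?, create_disk, attach_disk, get_disks), but the argument lists are deliberately simplified and the InMemoryCpi class itself is hypothetical. It only illustrates the shape of the interface, not how a real CPI talks to a cloud.

```ruby
require 'securerandom'

# Hypothetical in-memory stand-in for a cloud; not part of BOSH.
# Signatures are simplified compared to a real CPI.
class InMemoryCpi
  def initialize
    @stemcells = {}
    @vms = {}
    @disks = {}
  end

  # Registers an image and returns its identifier.
  def create_stemcell(image_path)
    id = "stemcell-#{SecureRandom.hex(4)}"
    @stemcells[id] = image_path
    id
  end

  # "Boots" a VM from a previously registered stemcell.
  def create_vm(stemcell_id)
    raise ArgumentError, "unknown stemcell #{stemcell_id}" unless @stemcells.key?(stemcell_id)

    id = "vm-#{SecureRandom.hex(4)}"
    @vms[id] = { stemcell: stemcell_id, disks: [] }
    id
  end

  def has_vm?(vm_id)
    @vms.key?(vm_id)
  end

  def delete_vm(vm_id)
    @vms.delete(vm_id)
    nil
  end

  # Creates a persistent disk of the given size in MB.
  def create_disk(size_mb)
    id = "disk-#{SecureRandom.hex(4)}"
    @disks[id] = size_mb
    id
  end

  def attach_disk(vm_id, disk_id)
    raise ArgumentError, "unknown VM #{vm_id}" unless @vms.key?(vm_id)
    raise ArgumentError, "unknown disk #{disk_id}" unless @disks.key?(disk_id)

    @vms[vm_id][:disks] << disk_id
    nil
  end

  # Returns the identifiers of the disks attached to a VM.
  def get_disks(vm_id)
    @vms.fetch(vm_id)[:disks].dup
  end
end

cpi = InMemoryCpi.new
stemcell = cpi.create_stemcell('/tmp/image.tar.gz') # path is illustrative only
vm = cpi.create_vm(stemcell)
disk = cpi.create_disk(1024)
cpi.attach_disk(vm, disk)
puts cpi.has_vm?(vm)
```

Whatever the target IaaS is, the caller sees the same method names; only the implementation behind them changes.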

 

Creating a custom CPI for Google Compute Engine

Since BOSH did not have a CPI for GCE, it had to be created from scratch. To do that, we used fog, a Ruby-based tool that makes it possible to work with a cloud API. This Ruby Gem provides a simple DSL that allows for working with a variety of cloud providers. The fog library is used to map the XML/JSON responses from an IaaS provider into objects. That is one of the reasons why it is used inside BOSH, for instance, in OpenStack CPI, the BOSH Registry, and Director components. So, we decided that fog would be a good choice for creating a GCE CPI.

 
Version compatibility issues

While we were researching this, Google released a new version of the Google Compute Engine API. At that point, fog was still using v1beta16. To make fog compatible with MicroBOSH, we introduced a number of changes and shared the updates with the community. These covered support for GCE v1, working with Google Cloud Storage, and fixing tests. Here is the list of our contributions:

  • Fixed tests for the Google Cloud module of fog—#2508
  • Fixes that improve the operation of Google Cloud Storage—#2994
  • Fixes that add a possibility to create public buckets in Google Cloud Storage—#2555, #2554, and #2556
  • Refactoring—#2496, #3000, and #2995

In addition to these improvements, we patched fog inside our GitHub repo. This enabled us to provide support for GCE API v1 and the required version of fog inside BOSH.

 
Architectural issues

Incompatibility of product versions was not the only issue we had to deal with. Almost all requests to the GCE API are asynchronous and return operation objects (there are three types of objects: GlobalOperation, RegionalOperation, and ZoneOperation). This allows for building asynchronous functions. However, at that moment, fog did not support asynchronous operations, which was discussed in fog PR #2501. This issue has been partially fixed now, but it still remains relevant for functions that create resources.
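A client that needs a synchronous result from an asynchronous API usually polls the returned operation until it completes. The sketch below shows that pattern in plain Ruby. The wait_for_operation helper is hypothetical (it is not a fog or BOSH function), and the operation is simulated by an array of the status values that GCE operations report (PENDING, RUNNING, DONE); in real code, the block would re-fetch the operation from the API.

```ruby
# Hypothetical helper: polls an operation until it reports DONE.
# The block must return the current status string of the operation.
def wait_for_operation(max_attempts: 30, interval: 1.0)
  max_attempts.times do
    status = yield
    return true if status == 'DONE'

    sleep(interval)
  end
  raise "operation did not finish after #{max_attempts} attempts"
end

# Simulated operation that moves through the status values one poll at a time.
statuses = %w[PENDING RUNNING RUNNING DONE]
polls = 0
wait_for_operation(interval: 0) do
  polls += 1
  statuses.shift
end
puts "finished after #{polls} polls"
```

Bounding the number of attempts keeps a stuck operation from blocking the caller forever; a production version would likely also inspect the operation for errors once it is done.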

Another issue was caused by the way the fog library maps responses from IaaS providers. It converts responses into fields of an object. The process is simple and straightforward. However, the format of the fields fetched from GCE does not always match the format required for managing resources. For instance, when you fetch disk information, the zone is represented as a zone name, while you need it as a URL to perform any actions on this disk.
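One way to work around such a mismatch is to normalize the field before using it. The following is a minimal sketch, assuming the usual GCE v1 resource URL layout (projects/<project>/zones/<zone> under the Compute API base URL). The helper names zone_url and zone_name, as well as the project name, are hypothetical and are not part of fog.

```ruby
GCE_API_BASE = 'https://www.googleapis.com/compute/v1'.freeze

# Hypothetical helper: accepts either a bare zone name or a full URL
# and always returns the URL form.
def zone_url(project, zone)
  return zone if zone.start_with?('https://')

  "#{GCE_API_BASE}/projects/#{project}/zones/#{zone}"
end

# Hypothetical helper: the reverse mapping, from a URL (or name) to the bare name.
def zone_name(zone)
  zone.split('/').last
end

puts zone_url('my-project', 'us-central1-a')
puts zone_name('https://www.googleapis.com/compute/v1/projects/my-project/zones/us-central1-a')
```

Accepting both forms on input means the caller does not have to know which representation a particular response happened to contain.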

To eliminate the drawbacks described above, we submitted a pull request (#2501), but these changes have not been accepted yet.

 

Updating BOSH

The built-in components of BOSH use fog v1.14.0 (the current stable version is 1.22.1), which is still an issue for many developers. To get the required functionality, we decided to extend fog v1.14 with a monkey patch.
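For readers unfamiliar with the technique, a monkey patch in Ruby reopens an existing class at runtime to add or override methods without changing the library's source. The sketch below shows the general idea on a stand-in class; OldLib::Disk and its methods are hypothetical and do not reproduce fog's actual classes or our actual patch.

```ruby
# Stand-in for a class shipped by an older library version (hypothetical).
module OldLib
  class Disk
    attr_reader :name, :zone

    def initialize(name, zone)
      @name = name
      @zone = zone
    end

    def describe
      "#{name} in #{zone}"
    end
  end
end

# Monkey patch: reopen the class and add what the old version lacks.
module OldLib
  class Disk
    # New method that the old release does not provide.
    def zone_name
      zone.split('/').last
    end

    # Override an existing method, keeping the original reachable under an alias.
    alias_method :describe_without_patch, :describe
    def describe
      "#{name} in #{zone_name}"
    end
  end
end

disk = OldLib::Disk.new('data', 'https://example.invalid/zones/us-central1-a')
puts disk.describe
```

The trade-off is that a patch like this is tied to the internals of one library version, so it has to be re-checked whenever that library is upgraded.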

Since BOSH was initially designed to work with AWS and OpenStack, it reflects the architectures of these two cloud platforms. In particular, it uses three types of disks: system, ephemeral, and persistent. This approach to data storage has been used since the first BOSH releases to reduce the cost of deployment. However, this variety of disks causes some issues. For example, when an instance is terminated and then started again, its ephemeral disk is recreated from scratch, and BOSH will not find the necessary disks. This bug has been fixed in the latest version of BOSH. If an instance flavor does not have an ephemeral disk, all data is written to persistent storage. However, the error is still relevant for AWS deployments.

 

Binding BOSH and a CPI

So, we had created a CPI and updated fog, but that was not the end of the story. Now, we had to add the new CPI to BOSH. To do this, you need to make adjustments inside the BOSH components, for instance, the BOSH Agent, the release creator, the components used for building stemcells, and other parts of the solution. You will also have to change the MicroBOSH defaults and managers. Many of these components use a simple “case..when” construct for choosing a CPI, which makes it challenging to provide support for new cloud platforms in BOSH; even a simple factory method pattern would fit this case better. Work on fixing these drawbacks is underway. You can read more about adding external CPIs in this project and this issue in Pivotal Tracker. Fortunately, an experimental version of the external CPI was released recently.
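The difference between branching on the CPI name and using a factory can be sketched as follows. All class and method names here (AwsCpi, OpenstackCpi, GceCpi, cpi_by_case, CpiFactory) are hypothetical and are not taken from the BOSH code base; the point is only that the branching version must be edited for every new platform, while a registry-based factory lets a new CPI be added from the outside.

```ruby
# Hypothetical CPI classes; real CPIs take configuration and do real work.
class AwsCpi; end
class OpenstackCpi; end
class GceCpi; end

# Branching approach: every new platform means editing this method.
def cpi_by_case(plugin)
  case plugin
  when 'aws' then AwsCpi.new
  when 'openstack' then OpenstackCpi.new
  else raise ArgumentError, "unknown CPI: #{plugin}"
  end
end

# Registry-based factory: new platforms register themselves.
class CpiFactory
  @registry = {}

  class << self
    def register(name, klass)
      @registry[name] = klass
    end

    def create(name)
      klass = @registry.fetch(name) { raise ArgumentError, "unknown CPI: #{name}" }
      klass.new
    end
  end
end

CpiFactory.register('aws', AwsCpi)
CpiFactory.register('openstack', OpenstackCpi)
CpiFactory.register('gce', GceCpi) # added without touching the factory itself

puts CpiFactory.create('gce').class
```

When the same choice is repeated in several components, each copy of the branching code has to be found and updated, which is where most of the integration effort goes.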

Although a lot has been done to simplify adding new CPIs, BOSH supports only a few architectures, such as OpenStack, AWS, vSphere, vCloud, and Warden. You may still find it difficult to add other platforms that are based on different architectural principles. However, given the active community supporting the project, this should become possible and easy in the future.

 
Creating custom stemcells

To work with a certain cloud platform, BOSH requires a stemcell (image) created specifically for it. One of the requirements for this stemcell and the GCE archive containing it is that they must be created with the sparse option. This means that “holes” in files (areas that have never been written to) are interpreted as zeros and stored compactly in the archive instead of being written out in full. For instance, an AWS image of 1.2 GB will occupy 10 GB if it is not archived. Compared to the standard procedure for creating an Amazon Machine Image, this method takes more time.
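The notion of a sparse file can be illustrated with a short Ruby sketch. It creates a file with a large apparent size but only one written byte, then compares the apparent size with the space actually allocated on disk. The helper name create_sparse_file is hypothetical, and the allocated size depends on the file system: on file systems without sparse file support, the two numbers will be close to each other. When such a file is archived, GNU tar's -S (--sparse) option is what keeps the holes from being stored as long runs of zeros.

```ruby
require 'tmpdir'

# Creates a file whose apparent size is `size` bytes but that contains
# a single written byte at the very end; everything before it is a "hole".
def create_sparse_file(path, size)
  File.open(path, 'wb') do |f|
    f.seek(size - 1)
    f.write("\0")
  end
end

Dir.mktmpdir do |dir|
  path = File.join(dir, 'disk.raw')
  create_sparse_file(path, 10 * 1024 * 1024)

  stat = File.stat(path)
  apparent = stat.size
  # `blocks` counts 512-byte units on most Unix systems and may be nil elsewhere.
  allocated = stat.blocks ? stat.blocks * 512 : nil

  puts "apparent size:  #{apparent} bytes"
  puts "allocated size: #{allocated.inspect} bytes"
end
```

Reading from a hole returns zeros, which is why tools that copy or archive the file without sparse handling end up writing the full apparent size.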

Tools for building stemcells require special skills and knowledge about images in the IaaS you have selected. With the default settings, you may also need to log into your AWS account to use them with Vagrant. At the same time, there are some nice modern solutions, such as Packer (currently used inside BOSH Lite), that automate the creation of machine images. Normally, it takes several hours to build a stemcell.

MicroBOSH has its own stemcell with a compiled blob store that has everything you need to run BOSH inside a particular virtual machine. Because of that, you need to rebuild the MicroBOSH stemcell every time you make changes to BOSH release components. This can take quite some time if you want to modify a BOSH release and test the changes by deploying it with MicroBOSH, which complicates and slows down stemcell debugging and CPI development.

 

Conclusions

When we had reached this point, Pivotal released its Google Compute Engine CF-BOSH CPI. It uses the updated fog v1.22.0 that eliminates most of the issues we had to overcome at the beginning of this project. So, we were moving in the same direction, and our commits to fog helped Pivotal to deliver their update faster. In addition, we detected some issues that can be fixed and improved in the upcoming Cloud Foundry releases. We will keep on contributing to fog and other open-source solutions around Cloud Foundry.

 

Further reading