Organizing Storage in Multiple Fog Containers Using CarrierWave

by Dmitry Savitski, May 23, 2013
This blog post explores CarrierWave capabilities and how to benefit from the solution's flexibility.

CarrierWave is one of the most popular Ruby-on-Rails solutions for file upload and storage. Most of us have used it more than once. In this blog post, I want to dig into how flexible the solution really is. First, we will look a little deeper into how it works at the lower levels (feel free to skip that part if you already know this stuff) and then see how you can take advantage of its flexibility in a couple of simple scenarios.

 

How it works

First of all, CarrierWave has a number of modules extending your object-relational mapping (ORM) classes. The gem itself includes the extension only for ActiveRecord, but extensions for other ORMs are readily available as separate gems. All the extension does is create a class method, mount_uploader, that can be pointed at any string column of an ORM model.

# == Schema Information
#
# Table name: items
#
#  id         :integer          not null, primary key
#  file       :string(255)

class Item < ActiveRecord::Base
  mount_uploader :file
end

Every time you instantiate an object of the Item class, your ORM loads the data from your database and instantiates the object as usual. Now, however, there is an object of the Uploader::Base class mounted where just a string attribute would have been. Methods such as file and file= no longer access the data in the @file variable directly; there is a middle man now.
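This middle-man arrangement can be sketched in plain Ruby. The snippet below is a deliberately simplified, hypothetical imitation of mount_uploader (FakeUploader, FakeItem, and the paths are all made up), showing how the column keeps only a name string while the mounted object builds the full path:

```ruby
# A toy uploader: it knows how to turn a bare file name into a path.
class FakeUploader
  attr_reader :filename

  def initialize(filename)
    @filename = filename
  end

  # The uploader, not the database column, is responsible for the path.
  def path
    "uploads/items/#{filename}"
  end
end

class FakeItem
  # A stripped-down imitation of CarrierWave's mount_uploader.
  def self.mount_uploader(column)
    # The reader wraps the stored string in an uploader object.
    define_method(column) do
      FakeUploader.new(instance_variable_get("@#{column}"))
    end
    # The writer keeps only the file's base name, not the full path.
    define_method("#{column}=") do |file|
      instance_variable_set("@#{column}", File.basename(file))
    end
  end

  mount_uploader :file
end

item = FakeItem.new
item.file = '/tmp/upload/report.pdf'
puts item.file.filename # => "report.pdf"
puts item.file.path     # => "uploads/items/report.pdf"
```

The real implementation does far more (caching, callbacks, storage strategies), but the shape of the delegation is the same.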

The file= setter method of a new object now proceeds through the following steps:

  1. Accepts an object of any class extending File, including various types of streams, such as Tempfile and ActionDispatch::Http::UploadedFile. The latter is the object you receive from HTTP multipart file uploads.
  2. Caches the received file in a temporary directory locally.
  3. Assigns the file’s name to our @file variable. Unlike some other upload-and-store gems, CarrierWave doesn’t save a full file path to the database, only the file’s original name. The responsibility of building the full path is delegated to the Uploader object.
  4. Waits for the object to be persisted.
  5. Optionally conducts file processing, resulting in one or more new files.
  6. Copies the file (or processed files) from the temporary storage to a persistent storage. Again, the exact location and specifics of the storage are determined by the Uploader object.

When you update an existing object, the process is generally the same, except that the Uploader also downloads the previously uploaded version, caches it, and restores it if persisting the object failed.
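The cache-then-store flow above can be sketched with plain file operations (no CarrierWave involved; the file names and directory prefixes are made up):

```ruby
require 'fileutils'
require 'tmpdir'

cache_dir     = Dir.mktmpdir('cache') # step 2: temporary cache
permanent_dir = Dir.mktmpdir('store') # step 6: persistent storage

upload = File.join(cache_dir, 'photo.jpg')
File.write(upload, 'fake image bytes')  # steps 1-2: incoming file is cached

filename = File.basename(upload)        # step 3: only the name is kept

# steps 4-6: once the model persists, the file moves from cache to storage
FileUtils.cp(upload, File.join(permanent_dir, filename))

puts File.exist?(File.join(permanent_dir, 'photo.jpg')) # => true
```

In real CarrierWave the last step is performed by the storage strategy, which may copy to a local directory or upload to a cloud container.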

So, how exactly does the Uploader choose where and how to store the uploaded file, and how to retrieve it? CarrierWave::Uploader::Base defines a large number of default methods. Some of them are grouped into Strategy modules, each specifying one aspect of the process (names for cache/storage directories, specifics of processing, etc.). These methods also have access to the model object the Uploader is mounted on, which means we can tweak the handling of each uploaded file.

Of course, there is the CarrierWave.configure method accepting a block of configurations. However, it merely provides default return values for these uploader instance methods.

CarrierWave has two main storage strategies: :file for local storage and :fog for remote storage. A user can add other strategies, as long as they support storing and retrieving, of course, but the two above already cover most options. The fog storage strategy itself delegates to the fog gem, which provides a common API for multiple cloud storage services. To initialize the CarrierWave fog storage, you provide a service name, a container name, and credentials.
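A typical initializer looks something like the following (the provider here is AWS for illustration; the bucket name and environment variable names are placeholders for your own):

```ruby
CarrierWave.configure do |config|
  config.storage = :fog
  config.fog_credentials = {
    provider:              'AWS', # the service name fog should talk to
    aws_access_key_id:     ENV['AWS_ACCESS_KEY_ID'],
    aws_secret_access_key: ENV['AWS_SECRET_ACCESS_KEY']
  }
  config.fog_directory = 'my-bucket-name' # the container name
end
```

Each of these configuration keys simply supplies a default for the corresponding uploader instance method (fog_credentials, fog_directory, and so on), which is why individual uploaders can override them, as we will see below.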

 

How to take advantage of CarrierWave’s flexibility

All code snippets given here are excerpts from a more complex working application. This means some details have been left out, and some may have changed while I was simplifying the code.

 
1. Specifying upload directories

Both default storage options rely on a number of uploader methods to determine how to handle uploads. The main one is the store_dir method, which by default looks like this:

def store_dir
  "uploads/#{model.class.to_s.underscore}/#{mounted_as}/#{model.id}"
end

Uploaded files will be stored locally in a number of nested folders under the application’s public directory, with each record getting its own folder.

  • Generally, we try to normalize the database so that a single class has at most one file field, which means the mounted_as segment could be left out.
  • On your development machine, you will probably run the application in different environments (at least test). So, consider adding Rails.env to store_dir.
  • Relying on the model’s class name can make data inaccessible if you rename the model: the file will still be there, but the Uploader will be unable to retrieve it, since it will look in the wrong place. I prefer to explicitly set a folder name in each Uploader.

Anyway, you can now see that by overriding the store_dir method in your uploaders, you can store your uploads any way you like. For example, you can group files by their creator’s identity rather than by their type.

class GeneralApplicationUploader < CarrierWave::Uploader::Base

  def store_dir
    folder = respond_to?(:folder_name) ? folder_name : model.class.to_s.underscore
    base_upload_dir + "#{folder}/#{model.id}"
  end

  private

  def base_upload_dir
    "uploads/#{Rails.env}/"
  end

end

class ItemUploader < GeneralApplicationUploader

  def folder_name
    'items'
  end

end

There is one thing you should always remember, though: once a file has been uploaded, any change in how store_dir resolves will prevent the Uploader from finding that file.

 
2. Differences in handling local and cloud storages

Now, if you expect a lot of upload/download activity, and you have an option of using a remote storage in production, you should definitely do that. Using a remote storage for development or test environments, on the other hand, can be troublesome (that is, too expensive).

Ideally, strategies for handling the file and fog storages should behave in the exact same way. For most cases, they do. It is up to you to decide on whether you should develop an application using a cloud storage all the time or cut your expenses and develop using a local storage, being prepared to deal with a few differences.

If, for example, you have a number of text documents stored, and you want to show a document’s text on a page, there will be a difference. It exists because the fog storage mostly hands out file URLs for the client browser to fetch, rather than reading file contents on the server.

There is a method you can employ to check if you are using a local or a remote storage.

require 'open-uri' # needed to read remote files by URL

class TextUploader < GeneralApplicationUploader
  def store_local?
    _storage == CarrierWave::Storage::File
  end

  # Then the text file content can be accessed as:

  def body
    store_local? ? File.read(path) : open(url).read
  end
end

class Item &lt; ActiveRecord::Base
  mount_uploader :file, TextUploader
end

This way, Item.first.file.body will return the same text regardless of whether a file is stored remotely or locally.

 
3. Safe file names

By default, CarrierWave already sanitizes the name of a file it receives, keeping only English letters and numbers (there is also a configuration option that lets you keep all Unicode characters). This sanitization also helps you avoid file path injection vulnerabilities. However, storing a file under its unchanged name still has some disadvantages. For example, one can upload a file with a name so long that saving it causes an exception on the file system level. To avoid this, we rename all files that are saved to the system, encoding the old file names.

  
def full_filename(for_file)
  original_name = for_file || model.read_attribute(mounted_as)
  [Digest::MD5.hexdigest(original_name), File.extname(original_name)].join
end

Above, the full_filename method is used both when storing a file (for_file is the file name of an incoming upload) and when retrieving one (for_file is nil). The good part is that the name is stored in the database in its original form, while on disk it is properly encoded.
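The encoding itself is easy to check in isolation. Whatever the length of the incoming name (the sample name below is invented), the result is always a 32-character hex digest plus the original extension:

```ruby
require 'digest'

original = 'Very long user-supplied file name (final) v2.pdf'
encoded  = [Digest::MD5.hexdigest(original), File.extname(original)].join

puts encoded.length # => 36 (32 hex characters plus ".pdf")
```

This guarantees file-system-safe, fixed-length names on disk while the database keeps the human-readable original.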

Now, you can offer this file for download under its original name, using a link like this.

= link_to 'Download', item.file.url, download: item[:file], target: '_blank'

Cached files are also saved to the disk, so, we will probably have to encode their names too.

  
def cache_name
  if cache_id && original_filename
    name = Digest::MD5.hexdigest(full_original_filename + cache_id)
    extension = File.extname(full_original_filename)
    [name, extension].join
  end
end

 
4. Switching remote storage containers of a file

In some cloud file storages, for example on Rackspace, a file can be stored in two types of containers.

  • Public. A file is readily available for download via HTTP, sometimes even with content delivery network (CDN) support. Content is delivered quickly, but you can’t even dream of many security features, such as hotlinking protection.
  • Private. A file is available only via an SSL-secured temporary link. However, the file is difficult to download other than through your application, as CDN is unavailable.
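For the private case, CarrierWave exposes configuration that controls both the visibility and the lifetime of those temporary links (the 600-second value below is just an example):

```ruby
CarrierWave.configure do |config|
  config.fog_public = false                      # store files in a private container
  config.fog_authenticated_url_expiration = 600  # temporary links valid for 10 minutes
end
```

With fog_public set to false, calling url on an uploader returns an authenticated, expiring link instead of a plain public one.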

Let’s imagine that we need to store some type of files, but it is a user who decides whether it should be publicly available or hidden. Naturally, we will want the interface to be as simple as possible.

# == Schema Information
#
# Table name: items
#
#  id         :integer          not null, primary key
#  file       :string(255)
#  hidden     :boolean          not null, default(FALSE)

class Item &lt; ActiveRecord::Base
  mount_uploader :file, SwitchingStoragesUploader
end

Uploader::Base has two methods—fog_public and fog_directory—that decide where to store uploads.

class SwitchingStoragesUploader < GeneralApplicationUploader

  def fog_public
    !(model.respond_to?(:hidden) && (model.hidden_changed? ? model.hidden_was : model.hidden))
  end

  def fog_directory
    fog_public ? 'public_container_name' : 'private_container_name'
  end

end

This way, when you create a new object of the class Uploader is mounted on, CarrierWave chooses a storage depending on the persisted value of the hidden field. If you change that parameter on the existing object, though, without touching the file field, an uploaded file won’t move, and the reference to it will be lost.

class SwitchingStoragesUploader < GeneralApplicationUploader

  def initialize(*)
    super # must run first, so that model and mounted_as are set
    if model && model.respond_to?(:hidden)
      column = mounted_as
      # Redefine the setter on this instance only: flipping the flag
      # marks the mounted column as changed, forcing a re-store.
      model.define_singleton_method(:hidden=) do |new_value|
        send "#{column}_will_change!"
        super(new_value)
      end
    end
  end

end

That redefinition, though hacky, will make the Uploader work as expected: when you change the value of the hidden parameter, the Uploader downloads the file from the old container, deletes it from the storage, then uploads the cached file to the new container after the object is persisted.

Do we really want to burden our application server with these upload/download routines all the time? I think we don’t. As I mentioned before, all interactions with cloud storages are conducted through the fog gem, which provides a common API for these storage services. This API is, in fact, wider than what CarrierWave uses.

class SwitchingStoragesUploader < GeneralApplicationUploader

  def initialize(model = nil, mounted_as = nil)
    super
    if model && model.respond_to?(:hidden) &&
       !model.respond_to?(:copy_mounted_file_with_fog)
      column = mounted_as
      # Register the callback once per model class; the guard makes
      # sure it only fires for persisted records with a changed flag.
      model.class.class_eval do
        define_method(:copy_mounted_file_with_fog) do
          public_send(column).copy_with_fog
        end
        before_save :copy_mounted_file_with_fog,
                    if: ->(record) { record.persisted? && record.hidden_changed? }
      end
    end
  end

  # store_local? is the method from section 2; move it to
  # GeneralApplicationUploader to share it between uploaders.
  def copy_with_fog
    return if store_local? || mounted_column_changed?

    source_container = fog_directory
    # fog_directory already resolves to the *old* container (see
    # fog_public above), so the target is the opposite one.
    target_container = fog_public ? 'private_container_name' : 'public_container_name'

    fog_api = Fog::Storage.new(fog_credentials)

    fog_api.copy_object(source_container, store_path, target_container, store_path)
    fog_api.delete_object(source_container, store_path)
  rescue Fog::Errors::Error
    model.errors.add(mounted_as, 'Error occurred while migrating file in storage!')
    false # halts the save (use throw :abort in Rails 5+)
  end

  def mounted_column_changed?
    cached?.present?
  end

end

The mounted_column_changed? method skips the copying when a new file has been provided; in that case, the new file will be stored in the new container anyway. That’s also why we check whether the model was persisted before.

As you can see, all we do is initialize an object of Fog::Storage, use it to copy the file from one container to another, then delete the file in the old container. If this somehow fails, we add an error to the model and cancel the save, providing a level of strong exception safety to the process.

 

Conclusion

The approach is simple: control as much of the process as possible through application entities, rather than storing full paths in the database. For this purpose, nearly every aspect of the process is arranged as a method defined on the Uploader object, giving a developer a good level of flexibility. CarrierWave is a nice tool to help you store your files, but, like any tool, it requires some knowledge to handle it well.

 

Further reading