Implementing and Improving PostgreSQL Full-Text Search in Ruby Apps

by Pavel AstraukhAugust 28, 2013
Learn how to enable a full-text search by a title, content, the author's name, and comments for a Ruby-on-Rails application.

PostgreSQL is a great relational database with a lot of features. This blog post describes how to implement PostgreSQL full-text search and add it to your Ruby-on-Rails app. I will demonstrate how it works on a test app that searches articles by a title, content, the author’s name, and comments. In addition, the article explains how to speed up search by eliminating excessive join queries that greatly slow down the system’s performance.

 

Preparation

There already exist ready-made solutions for implementing PostgreSQL full-text search. We will be using this one. So, you need to add the following line to your Gemfile.

gem 'pg_search'

Then, add the following line to your application.rb.

config.active_record.schema_format = :sql

 

A sample app

My sample app will have three models.

  1. class Article < ActiveRecord::Base
    belongs_to :author
    has_many :comments

    attr_accessible :content, :title
    end
  2. class Author < ActiveRecord::Base
    attr_accessible :name
    has_many :articles
    end
  3. class Comment < ActiveRecord::Base
    belongs_to :article
    attr_accessible :content
    end

With these models, search will be implemented for articles. It will become possible to search an article by its title, content, comments, and the author’s name. In addition, search will have other settings about which you can read at gem’s page. So, to enable search for articles you should add the followong to the article model.

include PgSearch

pg_search_scope :search,
against: [:title, :content],
associated_against: {
author: :name,
comments: :content
},
using: {
tsearch: {
dictionary: 'english',
any_word: true,
prefix: true
}
}

Then, you can perform search using the Article.search method.

irb(main):011:0> Article.search('title')
=> [

Hooray, it found an article by the word title! However, look at the following query.

Article Load (1.9ms)  SELECT "articles".*, ((ts_rank((to_tsvector('simple', coalesce("articles"."title"::text, '')) || to_tsvector('simple', coalesce("articles"."content"::text, '')) || to_tsvector('simple', coalesce(pg_search_1eb533ea18bbbe0846ef24.pg_search_a612c20e7f822205b5b540::text, '')) || to_tsvector('simple', coalesce(pg_search_121ea89914a721445aee70.pg_search_6e76a7a40d9cb3861e7fb2::text, ''))), (to_tsquery('simple', ''' ' || 'title' || ' ''')), 0))) AS pg_search_rank FROM "articles" LEFT OUTER JOIN (SELECT "articles"."id" AS id, string_agg("authors"."name"::text, ' ') AS pg_search_a612c20e7f822205b5b540 FROM "articles" INNER JOIN "authors" ON "authors"."id" = "articles"."author_id" GROUP BY "articles"."id") pg_search_1eb533ea18bbbe0846ef24 ON pg_search_1eb533ea18bbbe0846ef24.id = "articles"."id" LEFT OUTER JOIN (SELECT "articles"."id" AS id, string_agg("comments"."content"::text, ' ') AS pg_search_6e76a7a40d9cb3861e7fb2 FROM "articles" INNER JOIN "comments" ON "comments"."article_id" = "articles"."id" GROUP BY "articles"."id") pg_search_121ea89914a721445aee70 ON pg_search_121ea89914a721445aee70.id = "articles"."id" WHERE (((to_tsvector('simple', coalesce("articles"."title"::text, '')) || to_tsvector('simple', coalesce("articles"."content"::text, '')) || to_tsvector('simple', coalesce(pg_search_1eb533ea18bbbe0846ef24.pg_search_a612c20e7f822205b5b540::text, '')) || to_tsvector('simple', coalesce(pg_search_121ea89914a721445aee70.pg_search_6e76a7a40d9cb3861e7fb2::text, ''))) @@ (to_tsquery('simple', ''' ' || 'title' || ' ''')))) ORDER BY pg_search_rank DESC, "articles"."id" ASC

Above, you can see that the query has several joins, which is not good, because it greatly slows performance. I will show you how to avoid this behavior and improve search performance.

 

Improving performance

I am going to add a new column to the tables with articles. The new column will have the
tsvector type and store all the words by which an article can be searched. I will also add the GIN index to speed up search. So, we need to generate migration.

class AddTsvectorColumnToArticles < ActiveRecord::Migration
def up
add_column :articles, :search_vector, :tsvector

execute <<-EOS
CREATE INDEX articles_search_vector_idx ON articles USING gin(search_vector);
EOS
end

def down
remove_column :articles, :search_vector
end
end

I’ve added the tsvector column to the articles named search_vector. Now, we need to fill up this vector. For the purpose, I am going to write the postgresql function for filling up the vector and a trigger that will fill up this vector when inserting or updating an article. Below is an example of migration for creating a trigger and a function.

class CreateFunctionAndTriggerForFillingSearchVectorOfArticles < ActiveRecord::Migration
def up
execute <<-EOS
CREATE OR REPLACE FUNCTION fill_search_vector_for_acticle() RETURNS trigger LANGUAGE plpgsql AS $$
declare
article_author record;
article_comments record;

begin
select name into article_author from authors where id = new.author_id;
select string_agg(content, ' ') as content into article_comments from comments where article_id = new.id;

new.search_vector :=
setweight(to_tsvector('pg_catalog.english', coalesce(new.title, '')), 'A') ||
setweight(to_tsvector('pg_catalog.english', coalesce(new.content, '')), 'B') ||
setweight(to_tsvector('pg_catalog.english', coalesce(article_author.name, '')), 'B') ||
setweight(to_tsvector('pg_catalog.english', coalesce(article_comments.content, '')), 'B');

return new;
end
$$;
EOS

execute <<-EOS
CREATE TRIGGER articles_search_content_trigger BEFORE INSERT OR UPDATE
ON articles FOR EACH ROW EXECUTE PROCEDURE fill_search_vector_for_acticle();
EOS

Article.find_each(&:touch)
end

def down
execute <<-EOS
DROP FUNCTION fill_search_vector_for_acticle();
DROP TRIGGER articles_search_content_trigger ON articles;
EOS
end
end

Each time, when an article is created or updated, a new vector will be built for the article. Look at the following example.

select name into article_author from authors where id = new.author_id;
select string_agg(content, ' ') as content into article_comments from comments where article_id = new.id;

I am fetching a name from an author and concatenated content of an article. Then, I create a new vector.

new.search_vector :=
setweight(to_tsvector('pg_catalog.english', coalesce(new.title, '')), 'A') ||
setweight(to_tsvector('pg_catalog.english', coalesce(new.content, '')), 'B') ||
setweight(to_tsvector('pg_catalog.english', coalesce(article_author.name, '')), 'B') ||
setweight(to_tsvector('pg_catalog.english', coalesce(article_comments.content, '')), 'B');

More details about syntax of each function can be found in this documentation.

We also need to update an article each time when a comment is updated. For this purpose, we will add touch true for association at the comments model.

belongs_to :article, touch: true

Now, every time when comments are updated, an article will be updated true. This will call a trigger that will update the articles’ search_vector column.

The last step is to make search use the new search_vector column. For the purpose, we need to change pg_search_scope to the article model.

pg_search_scope :search,
against: :search_vector,
using: {
tsearch: {
dictionary: 'english',
any_word: true,
prefix: true,
tsvector_column: 'search_vector'
}
}

 

Results

Now, our articles have the tsvector column that stores searches by text.

irb(main):051:0> Article.last.search_vector
=> "'astraukh':6B 'comment':9B 'content':4B,10B 'pavel':5B 'titl':1A"
irb(main):052:0> Article.search('title')
=> [

A query looks like this.

Article Load (0.4ms)  SELECT "articles".*, ((ts_rank(("articles"."search_vector"), (to_tsquery('english', ''' ' || 'title' || ' ''' || ':*')), 0))) AS pg_search_rank FROM "articles" WHERE ((("articles"."search_vector") @@ (to_tsquery('english', ''' ' || 'title' || ' ''' || ':*')))) ORDER BY pg_search_rank DESC, "articles"."id" ASC

As you see, the query doesn’t have any joins, and this behavior greatly improves search performance. Now, you know how to implement and improve your search using PostgreSQL full-text search.

 

Further reading

 

About the author

Pavel Astraukh developed a number of projects using Ruby on Rails and other technologies. He worked in small and large teams utilizing the Agile methodology. Find him on GitHub.