Ruby on Rails + ElasticSearch

This is a tutorial how to use ElasticSearch with Ruby on Rails. ElasticSearch is a distributed RESTful Search Engine build on top of Apache Lucene.

Sure! You can use your SQL database for search. But that is usually slow and you will not get very good search results. With ElasticSearch you can deliver a fuzzy search. Let’s say you have a record “Hibernate” in the database. If somebody is doing a search for “hibernate” you will get a match with a simple SQL query. But what if your customers input looks like this:

  • hibernate 3.2
  • hibernate.jar
  • hibernate.jar 3.5

In this cases you will have 0 results with a simple SQL query. With ElasticSearch you would still have results. Depending on your configuration. So let’s start!

ES

Just download the current version from here: http://www.elasticsearch.org/download/. Unpack it and run the ES server with this command:

./bin/elasticsearch -f

The ES server is now running on localhost:9200. If you type in “http://localhost:9200/_search” into your browser you should get some basic results.

I assume that you know already Ruby on Rails and ActiveRecord. Of course there is a GEM to interact with ElasticSearch. Checkout the Tire GEM. That is a pretty good wrapper for ElastisSearch. Just add it to your Gemfile.

gem 'tire', '0.5.4'

and run:

bundle update

In your application.rb you need to require the new package:

require 'tire'

And in environment.rb you need to init the package:


begin
  Tire.configure do
    logger STDERR
    url Settings.elasticsearch_url
  end
rescue => e
  p "Wrong configuration: #{e}"
end

On the GitHub Page there is a description how to integrate Tire into your model. But honestly I don’t like that very much. That just blows up the model class. I prefer a clear separation between my models and the interaction with ElasticSearch.

The model I wanted to map and to make searchable with ElastiSearch is “product.rb”. It is located in “app/models/”. I created another directory called “app/elastics/”. And here I placed a new file “product_elastic.rb”, which is mapping my Product class to ElasticSearch and is responsible for the interaction with the ES server.

The first thing you have to do is to create a mapping. You have to map your properties from your model to ElasticSearch. This is how I did my first mapping:


def self.create_index_with_mappings
  Tire.index Settings.elasticsearch_product_index do
    create :mappings => {
      :product => {
        :properties => {
          :_id => { :type => 'string', :analyzer => 'keyword', :include_in_all => false },
          :name => {:type => 'string', :analyzer => 'snowball', :boost => 100},
          :description => { :type => 'string', :analyzer => 'snowball' },
          :description_manual => { :type => 'string', :analyzer => 'snowball' },
          :language => { :type => 'string', :analyzer => 'keyword'}
        }
      }
    }
  end
 end

The analyzers are documented on the ElasticSearch homepage: http://www.elasticsearch.org/guide/. Here is the magic happening 😉

Than I wrote 2 more methods (clean_all and reset) to delete the “product” index at ES and to create the mappings.


def self.clean_all
  Tire.index( Settings.elasticsearch_product_index ).delete
end

def self.reset
  self.clean_all
  self.create_index_with_mappings
end

Don’t call the reset method in production 😉 With this 3 methods you can delete old mapping and create a new one. You can now run the rails console and try out the methods. That should all work fine.

If creating the mapping was successful, the next step is to index the data in the database. To index a model, the model itself must offer a method which returns their values as JSON. I added a “to_indexex_json” method to the Product class:


def to_indexed_json
 {
   :_id => self.id.to_s,
   :_type => "product",
   :name => self.name,
   :description => self.description ? self.description : "" ,
   :description_manual => self.description_manual ? self.description_manual : "" ,
   :language => self.language,
   :group_id => self.group_id ? self.group_id : "",
   :prod_key => self.prod_key,
 }
 end

The first 2 attributes are required by Tire, to link the response from ES with your model. And here is the method in “product_elastic.rb” to index one record.


def self.index( product )
  Tire.index Settings.elasticsearch_product_index do
    store product.to_indexed_json
    product.update_attribute(:reindex, false)
  end
rescue => e
  p "ERROR in index(product) Message: #{e.message}"
  p "ERROR in index(product) backtrace: #{e.backtrace}"
end

And this here is the method to index all products from the DB:


def self.index_all
  Product.all.each do |product|
    ProductElastic.index product
  end
  self.refresh
end

def self.refresh
  Tire.index( Settings.elasticsearch_product_index ).refresh
end

Easy! Right? You can try the methods in the rails console. All right. You can now create mappings and index data. The only thing which is missing now is the search method. There is a lot to say about the search. I really recommend that you take your time and read the documentation about the search at the Tire homepage and the ES homepage. But here is one example.


def self.search(q, page_count = 1)
  if (q.nil? || q.empty?)
    raise ArgumentError, "query is empty! This is not allowed"
  end

  page_count = 1 if page_count.nil? || page_count.to_i < 1
  results_per_page = 30
  from = results_per_page * (page_count.to_i - 1)

  q = "*" if !q || q.empty?

  s = Tire.search( Settings.elasticsearch_product_index,
   load: true,
   from: from,
   per_page: results_per_page,
   size: results_per_page) do |search|

    search.sort { by [{:_score => 'desc'}] }

    search.query do |query|
      query.string "name:" + q
    end

  end

  s.results
end

The method above is with paging. You can easily use it with the “will_paginate” GEM.

There is much more to say about the mapping and the search. But this is the part there you have to invest time to figure it out, to deliver the search results you want. You will find a pretty good documentation on the ElasticSearch homepage: http://www.elasticsearch.org/guide/. And the core committer from the Tire project always responded in less than 24 hours to my tickets. The project support is very good.

Together with Timo I integrated ElasticSearch into VersionEye. Since them the search is much faster and the results are better. Even if there is no perfect match you will get results.

3 thoughts on “Ruby on Rails + ElasticSearch

  1. Hello, thanks for the nice article!

    I’ve got couple of notes to the code:

    1/ No need to `reset :url` in your configuration block, just set the new one.

    2/ I like to see different approaches for separating the concerns in the application; the default model integration is really to get people started easily, of course it’s possible to design more separated/composed architectures. See eg. https://github.com/karmi/tire/issues/481#issuecomment-9482185 for an example with a `Searchable` module which gets included in the models.

    3/ Your `ProductElastic.create_mappings` method is incorrectly named — it creates the _index_ with mappings. Hence: `ProductElastic.create_index_with_mappings`. Also, consider changing `ProductElastic` to something more descriptive.

    4/ It’s preferable to you `:analyzer => “keyword”` instead of `:index => ‘not_analyzed’`.

    5/ In the `clean_all` method, it’s possible to just call `Tire.index(Settings.elasticsearch_product_index).delete`, no need for a block.

    6/ In the `ProductElastic.index` method, just pass the `product` object — it will be automatically converted to JSON by calling the `to_indexed_json` method.

    7/ The `ProductElastic.index_all` method is actually very problematic, since you’re loading _all_ products into memory and iterating over them. Tire API and DSL has the `Index.import` and `MyModel.import` method, which uses pagination libraries to import your records into batches — please research that. If you use a pagination library such as WillPaginate or Kaminari, just call `Product.import` and everything will be handled for you, effectively.

    8/ Lastly, please don’t use the `query_string` query, unless you need to support the full Lucene query syntax. Use the `match` query (see integration tests for documentation).

    1. Hi Karel. Many thanks for your feedback. I really appreciate that. I customised the code above according to your recommendations.

      But point 6 doesn’t work like you recommended. I am testing my code with RSpec and for every method in ProductElastic there are at least 2 tests. After I removed the “.to_indexed_json” call, the tests failed. It seems that this method will not be called automatically.

      For point 7 and 8 I will do some research. A better solution for point 7 would be great 🙂

    2. I did some import because of the bulk import. I tried this here:

      Tire.index( Settings.elasticsearch_product_index ).import Product.all

      But that returns an error. This here:

      [ERROR] 400 > {“error”:”Failed to derive xcontent from org.elasticsearch.common.bytes.BytesArray@0″}, retrying (2)…

      What I am doing wrong?

Leave a Reply

Fill in your details below or click an icon to log in:

WordPress.com Logo

You are commenting using your WordPress.com account. Log Out / Change )

Twitter picture

You are commenting using your Twitter account. Log Out / Change )

Facebook photo

You are commenting using your Facebook account. Log Out / Change )

Google+ photo

You are commenting using your Google+ account. Log Out / Change )

Connecting to %s