Geek2Geek – Centralized Logging

Last week it happened again. Geek2Geek!

This time we came together at Flyeralarm in Berlin to talk about centralized logging. That is an interesting topic for any company that has to scale. As soon as you have more than one server, you need to think about how to collect and analyze your log files in a distributed system. There are a couple of good solutions out there for this problem.

Jilles van Gurp gave the first talk, about the ELK stack. ELK stands for Elasticsearch, Logstash and Kibana. All three products belong to the Elasticsearch company and they work together smoothly. Jilles showed us how they use the ELK stack at Linko to build the LinkoApp.

Jilles gave us a short intro to the technology on a couple of slides before he switched to the live demo. It was very interesting to hear about his real-world experiences with the ELK stack.

The lesson from the past couple of months: it is easy to set up, but you should be careful with the Elasticsearch cluster. Don’t shut it down all at once 😉

After the first presentation the Pizza arrived and we took a little break with Pizza & Beer.

Lennart is THE guy behind Graylog2. He started the project a couple of years ago at Jimdo. The very first version was implemented in Ruby. Graylog2 is a complete rewrite in Java. Lennart is also co-founder of Torch, the company behind Graylog2.

Lennart gave a short intro about the history, intention and philosophy behind Graylog2.

I was impressed by how much he knows about the other logging solutions, such as Logstash/Kibana and Splunk. He was not afraid to talk about feature comparisons and the pros & cons of the different solutions.

Graylog2 is built for enterprise usage. It is optimized for speed and high-volume data. The interesting thing is that you can use it together with Elasticsearch and Logstash.

Many thanks to Jilles and Lennart for the great talks. Both solutions are very interesting. If you still read logs on the server with “less”, you should definitely check out these 2 great solutions!

By the way, I also tried to organize a Splunk talk, but unfortunately I couldn’t find any Splunkies willing to give a talk about Splunk at Geek2Geek.

Many thanks to Flyeralarm for sponsoring Location, Pizza and Beer! You guys are awesome!

By the way, Flyeralarm just opened a new branch in Berlin. They have a really nice office. This is their meeting room, for example:

[Image: Flyeralarm meeting room]

And they are currently looking for experienced PHP developers. If you are interested you should contact Thomas.

Ruby on Rails + ElasticSearch

This is a tutorial on how to use ElasticSearch with Ruby on Rails. ElasticSearch is a distributed RESTful search engine built on top of Apache Lucene.

Sure! You can use your SQL database for search. But that is usually slow and you will not get very good search results. With ElasticSearch you can deliver a fuzzy search. Let’s say you have a record “Hibernate” in the database. If somebody searches for “hibernate” you will get a match with a simple SQL query. But what if your customer’s input looks like this:

  • hibernate 3.2
  • hibernate.jar
  • hibernate.jar 3.5

In these cases you will get 0 results with a simple SQL query. With ElasticSearch you would still get results, depending on your configuration. So let’s start!
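
Before installing anything, the difference can be illustrated with a small plain-Ruby sketch. It is not how ElasticSearch or Lucene work internally; it only shows why an exact comparison fails for the inputs above while matching on word tokens still finds the record. The method names, the sample product list and the tokenizing rule are assumptions made up for this illustration.

```ruby
# Toy illustration only: real ElasticSearch analyzers are far more capable.
PRODUCT_NAMES = ['Hibernate', 'Spring', 'JUnit'].freeze

# Case-insensitive equality, similar in spirit to a simple SQL "name = ?" lookup.
def exact_match(names, input)
  names.select { |name| name.downcase == input.downcase }
end

# Splits text into lowercase word tokens; digits and punctuation are dropped.
def tokenize(text)
  text.downcase.scan(/[a-z]+/)
end

# A name matches when it shares at least one token with the input.
def token_match(names, input)
  input_tokens = tokenize(input)
  names.select { |name| (tokenize(name) & input_tokens).any? }
end

['hibernate', 'hibernate 3.2', 'hibernate.jar', 'hibernate.jar 3.5'].each do |input|
  exact = exact_match(PRODUCT_NAMES, input)
  token = token_match(PRODUCT_NAMES, input)
  puts "#{input.inspect}: exact=#{exact.inspect} token=#{token.inspect}"
end
```

Only the first input produces an exact match, while the token-based comparison finds “Hibernate” for all four.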

ES

Just download the current version from here: http://www.elasticsearch.org/download/. Unpack it and run the ES server with this command:

./bin/elasticsearch -f

The ES server is now running on localhost:9200. If you open “http://localhost:9200/_search” in your browser you should get some basic results.
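
The same check can be done from Ruby instead of the browser. The following is a minimal sketch using only the standard library; the helper names `es_search_uri` and `es_search` are made up for this example, and `es_search` needs a running ES server, so it is only defined here and not called.

```ruby
require 'uri'
require 'net/http'
require 'json'

# Builds the URI of the _search endpoint. The optional query is sent
# as the "q" URL parameter.
def es_search_uri(base_url, query = nil)
  uri = URI.join(base_url, '/_search')
  uri.query = URI.encode_www_form(q: query) if query
  uri
end

# Sends the request and parses the JSON response body.
# Requires a running ES server, so it is not called in this sketch.
def es_search(base_url, query = nil)
  response = Net::HTTP.get_response(es_search_uri(base_url, query))
  JSON.parse(response.body)
end

puts es_search_uri('http://localhost:9200')
puts es_search_uri('http://localhost:9200', 'name:hibernate')
```

With the server from the previous step running, `es_search('http://localhost:9200')` should return the parsed response as a Ruby hash.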

I assume that you already know Ruby on Rails and ActiveRecord. Of course there is a gem to interact with ElasticSearch. Check out the Tire gem. It is a pretty good wrapper for ElasticSearch. Just add it to your Gemfile.

gem 'tire', '0.5.4'

and run:

bundle install

In your application.rb you need to require the new package:

require 'tire'

And in environment.rb you need to init the package:


begin
  Tire.configure do
    logger STDERR
    url Settings.elasticsearch_url
  end
rescue => e
  p "Wrong configuration: #{e}"
end
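
The `Settings` object used above is not shown in this post. Any object that responds to `elasticsearch_url` and `elasticsearch_product_index` would do. As a sketch under that assumption, here is a minimal stand-in; the environment variable names and the default values are hypothetical and not taken from the real application.

```ruby
# Hypothetical stand-in for the Settings object used in this tutorial.
AppSettings = Struct.new(:elasticsearch_url, :elasticsearch_product_index)

# Reads the values from an env-like hash and falls back to assumed defaults.
def build_settings(env = ENV)
  AppSettings.new(
    env.fetch('ELASTICSEARCH_URL', 'http://localhost:9200'),
    env.fetch('ELASTICSEARCH_PRODUCT_INDEX', 'products')
  )
end

Settings = build_settings
puts Settings.elasticsearch_url
puts Settings.elasticsearch_product_index
```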

On the GitHub page there is a description of how to integrate Tire into your model. But honestly I don’t like that very much. It just blows up the model class. I prefer a clear separation between my models and the interaction with ElasticSearch.

The model I wanted to map and make searchable with ElasticSearch is “product.rb”. It is located in “app/models/”. I created another directory called “app/elastics/”. There I placed a new file, “product_elastic.rb”, which maps my Product class to ElasticSearch and is responsible for the interaction with the ES server.

The first thing you have to do is to create a mapping. You have to map your properties from your model to ElasticSearch. This is how I did my first mapping:


def self.create_index_with_mappings
  Tire.index Settings.elasticsearch_product_index do
    create :mappings => {
      :product => {
        :properties => {
          :_id => { :type => 'string', :analyzer => 'keyword', :include_in_all => false },
          :name => {:type => 'string', :analyzer => 'snowball', :boost => 100},
          :description => { :type => 'string', :analyzer => 'snowball' },
          :description_manual => { :type => 'string', :analyzer => 'snowball' },
          :language => { :type => 'string', :analyzer => 'keyword'}
        }
      }
    }
  end
end
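
Because the mapping is just a nested Ruby hash, it can also be built and inspected without a running ES server. The following sketch produces the same structure as above; the helper names `string_field` and `product_mappings` are hypothetical.

```ruby
# Builds the definition of one string field with the given analyzer.
def string_field(analyzer, extra = {})
  { :type => 'string', :analyzer => analyzer }.merge(extra)
end

# Returns the same mapping hash as in create_index_with_mappings.
def product_mappings
  {
    :product => {
      :properties => {
        :_id => string_field('keyword', { :include_in_all => false }),
        :name => string_field('snowball', { :boost => 100 }),
        :description => string_field('snowball'),
        :description_manual => string_field('snowball'),
        :language => string_field('keyword')
      }
    }
  }
end

product_mappings[:product][:properties].each do |field, definition|
  puts "#{field}: #{definition.inspect}"
end
```

Under that assumption, the `create` call above could be written as `create :mappings => product_mappings`.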

The analyzers are documented on the ElasticSearch homepage: http://www.elasticsearch.org/guide/. This is where the magic happens 😉

Then I wrote 2 more methods (clean_all and reset) to delete the “product” index at ES and to create the mappings.


def self.clean_all
  Tire.index( Settings.elasticsearch_product_index ).delete
end

def self.reset
  self.clean_all
  self.create_index_with_mappings
end

Don’t call the reset method in production 😉 With these 3 methods you can delete the old mapping and create a new one. You can now run the rails console and try out the methods. That should all work fine.

If creating the mapping was successful, the next step is to index the data in the database. To index a model, the model itself must offer a method which returns its values as JSON. I added a “to_indexed_json” method to the Product class:


def to_indexed_json
  {
    :_id => self.id.to_s,
    :_type => "product",
    :name => self.name,
    :description => self.description || "",
    :description_manual => self.description_manual || "",
    :language => self.language,
    :group_id => self.group_id || "",
    :prod_key => self.prod_key
  }
end
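
To see what such a document looks like without Rails or ES, here is a small sketch with a plain Ruby stand-in for the model. `ProductStub` and the sample values are made up for this illustration; only the shape of the hash follows the method above.

```ruby
require 'json'

# Plain Ruby stand-in for the Product model (hypothetical, no database).
ProductStub = Struct.new(:id, :name, :description, :description_manual,
                         :language, :group_id, :prod_key) do
  # Same shape as the to_indexed_json method of the real model.
  def to_indexed_json
    {
      :_id => id.to_s,
      :_type => 'product',
      :name => name,
      :description => description || '',
      :description_manual => description_manual || '',
      :language => language,
      :group_id => group_id || '',
      :prod_key => prod_key
    }
  end
end

# Sample data, invented for this example.
product = ProductStub.new(42, 'Hibernate', nil, nil, 'Java', nil, 'hibernate')
document = product.to_indexed_json
puts JSON.generate(document)
```

Missing values end up as empty strings, so the indexed document always contains every field.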

The first 2 attributes are required by Tire, to link the response from ES with your model. And here is the method in “product_elastic.rb” to index one record.


def self.index( product )
  Tire.index Settings.elasticsearch_product_index do
    store product.to_indexed_json
    product.update_attribute(:reindex, false)
  end
rescue => e
  p "ERROR in index(product) Message: #{e.message}"
  p "ERROR in index(product) backtrace: #{e.backtrace}"
end

And this here is the method to index all products from the DB:


def self.index_all
  Product.all.each do |product|
    ProductElastic.index product
  end
  self.refresh
end

def self.refresh
  Tire.index( Settings.elasticsearch_product_index ).refresh
end
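
The `reindex` attribute that is set to false in the `index` method suggests a flag which marks records that need to be indexed again. Assuming that is its purpose, a variation of `index_all` could index only the flagged records. This is a plain-Ruby sketch: `ProductRow`, `index_pending` and the injected `indexer` are hypothetical stand-ins for the model and for `ProductElastic.index`.

```ruby
# Hypothetical stand-in for a product record with a reindex flag.
ProductRow = Struct.new(:name, :reindex)

# Indexes only the products whose reindex flag is set and clears the flag
# afterwards. `indexer` is any callable that takes one product.
# Returns the number of indexed products.
def index_pending(products, indexer)
  pending = products.select(&:reindex)
  pending.each do |product|
    indexer.call(product)
    product.reindex = false
  end
  pending.size
end

indexed_names = []
products = [ProductRow.new('Hibernate', true), ProductRow.new('Spring', false)]
count = index_pending(products, ->(product) { indexed_names << product.name })
puts "indexed #{count} product(s): #{indexed_names.inspect}"
```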

Easy! Right? You can try the methods in the rails console. All right. You can now create mappings and index data. The only thing which is missing now is the search method. There is a lot to say about the search. I really recommend that you take your time and read the documentation about the search at the Tire homepage and the ES homepage. But here is one example.


def self.search(q, page_count = 1)
  if q.nil? || q.empty?
    raise ArgumentError, "query is empty! This is not allowed"
  end

  # Fall back to the first page for missing or invalid page numbers.
  page_count = 1 if page_count.nil? || page_count.to_i < 1
  results_per_page = 30
  from = results_per_page * (page_count.to_i - 1)

  s = Tire.search( Settings.elasticsearch_product_index,
                   load: true,
                   from: from,
                   per_page: results_per_page,
                   size: results_per_page) do |search|

    # Best matches first.
    search.sort { by [{:_score => 'desc'}] }

    # Query string syntax: the input is prefixed with the "name" field.
    search.query do |query|
      query.string "name:" + q
    end

  end

  s.results
end
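
The paging arithmetic in this method can be checked in isolation. The sketch below extracts it into a small helper; the name `page_offset` is made up for this example, while the page size of 30 is taken from the method above.

```ruby
RESULTS_PER_PAGE = 30

# Returns the zero-based offset ("from") of the first result on a page.
# Missing or invalid page numbers are treated as page 1.
def page_offset(page_count, results_per_page = RESULTS_PER_PAGE)
  page = page_count.to_i
  page = 1 if page < 1
  results_per_page * (page - 1)
end

[1, 2, 3, nil].each do |page|
  puts "page #{page.inspect} starts at result #{page_offset(page)}"
end
```

Page 1 starts at offset 0, page 2 at offset 30 and page 3 at offset 60.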

The method above supports paging. You can easily use it with the “will_paginate” gem.

There is much more to say about the mapping and the search. But this is the part where you have to invest time to figure out how to deliver the search results you want. You will find pretty good documentation on the ElasticSearch homepage: http://www.elasticsearch.org/guide/. And the core committer of the Tire project always responded to my tickets in less than 24 hours. The project support is very good.

Together with Timo I integrated ElasticSearch into VersionEye. Since then the search is much faster and the results are better. Even if there is no perfect match you will get results.