Introduction to Ansible

Ansible is a great tool for IT automation. It is agentless, talks to your servers over plain SSH, and you describe your setup in simple YAML playbooks.

I'm using Ansible to manage the whole infrastructure for VersionEye. Currently I have 36 roles and 15 playbooks defined for VersionEye. I can set up the whole infrastructure with one single command, or just parts of it. I even use Ansible for deployments: deploying the VersionEye crawlers into the Amazon cloud is one single command for me. And I even rebuilt the Capistrano deployment process for Rails apps with Ansible.
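
To give you an idea what that looks like: such a run is just ansible-playbook pointed at an inventory and a playbook. A minimal sketch; the file names and the tag below are hypothetical placeholders, not the actual VersionEye setup:

# Set up everything described in site.yml on the hosts
# listed in the production inventory file.
ansible-playbook -i production.ini site.yml

# Or run just a part of it, e.g. only the crawler role, via a tag.
ansible-playbook -i production.ini site.yml --tags crawlers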

I just wrote an introduction to Ansible on the VersionEye Blog. Check it out here.

Daemon script

This shell script runs forever; it checks whether the Rails worker is running and, if not, starts it again:

#!/bin/bash
# Watchdog loop. Expects APP_ROOT, BUNDLE and CONF to be set to the
# application directory, the bundle binary and the unicorn config file.
while :
do
  # The 'unicorn_rails worker' processes match the pattern 'rails worker';
  # 'grep -v grep' filters the grep process itself out of the ps output.
  if ps ax | grep -v grep | grep 'rails worker' > /dev/null
  then
      echo "service running, everything is fine"
      sleep 5
  else
      echo "service is not running. Let's start it again"
      cd "$APP_ROOT"
      $BUNDLE exec unicorn_rails -D -c "$CONF"
      echo "restarted with PID $(cat /rails/pids/unicorn.pid)"
      sleep 15
  fi
done
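
One way to run the watchdog is to detach it with nohup, so it keeps running after you log out. A minimal sketch, assuming the script is saved as worker_watchdog.sh (a hypothetical name) and your own paths for the three variables:

# All three values are placeholders for your own setup.
export APP_ROOT=/var/www/myapp
export BUNDLE=/usr/local/bin/bundle
export CONF=$APP_ROOT/config/unicorn.rb

# Detach the watchdog from the terminal and log its output.
nohup ./worker_watchdog.sh >> /var/log/worker_watchdog.log 2>&1 &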

Geek2Geek – Centralized Logging

Last week it happened again. Geek2Geek!

This time we came together at Flyeralarm in Berlin to talk about centralized logging. That is an interesting topic for every company that has to scale. As soon as you have more than one server, you need to think about how you collect and analyze your log files in a distributed system. There are a couple of good solutions out there for this problem.

Jilles van Gurp gave the first talk, about the ELK stack. ELK stands for Elasticsearch, Logstash and Kibana. All three products belong to the Elasticsearch company, and they all work together smoothly. Jilles showed us how they use the ELK stack at Linko to build the LinkoApp.

Jilles gave us a short intro to the technology on a couple of slides before he switched to the live demo. It was very interesting to listen to his real-world experiences with the ELK stack.

His learnings from the past couple of months: the stack is easy to set up, but you should be careful with the Elasticsearch cluster. Don't shut it down all at once ;-)

After the first presentation the pizza arrived and we took a little break with pizza & beer.

Lennart is THE guy behind Graylog2. He started the project a couple of years ago at Jimdo. The very first version was implemented in Ruby; the current Graylog2 is a complete rewrite in Java. Lennart is also co-founder of Torch, the company behind Graylog2.

Lennart gave a short intro about the history, intention and philosophy behind Graylog2.

I was impressed by how much he knows about the other logging solutions, such as Logstash/Kibana and Splunk. He was not afraid to talk about feature comparisons and the pros & cons of the different solutions.

Graylog2 is built for enterprise usage. It is optimized for speed and high-volume data. The interesting thing is that you can use it together with Elasticsearch and Logstash.

Many thanks to Jilles and Lennart for the great talks. Both solutions are very interesting. If you still read logs on the server with "less", you should definitely check out these two great solutions!

By the way, I also tried to organize a Splunk talk, but unfortunately I couldn't find any Splunkies willing to give a talk about Splunk at Geek2Geek.

Many thanks to Flyeralarm for sponsoring the location, pizza and beer! You guys are awesome!

By the way, Flyeralarm just opened a new branch in Berlin. They have a really nice office. This is their meeting room, for example:

[Photo: the meeting room in Flyeralarm's Berlin office]

And they are currently looking for experienced PHP developers. If you are interested you should contact Thomas.

Centralized Logging at Geek2Geek

The next Geek2Geek event is this Friday in Berlin Kreuzberg. The event is hosted by Flyeralarm. They even sponsor beer & pizza. Many thanks for that!

This time the topic is centralized logging. We have two awesome speakers for the event!

Lennart is THE guy behind Graylog2, the cutting-edge logging technology for the new enterprise. He will give us a short intro to the framework and show the newest features.

Jilles is the lead architect at Linko. He will show us how he is using the ELK stack to build Linko. ELK stands for Elasticsearch, Logstash and Kibana.

Don't miss it! Come to Geek2Geek, learn something new, eat a pizza and drink a beer!

MongoDB could not restore backup because of “key too large to index” error

Recently I made a db dump on a MongoDB 2.4 server like this:

mongodump --db veye_dev

And I tried to restore it on a MongoDB 2.6 server like this:

mongorestore dump/veye_dev

Unfortunately at some point the restore process failed with this error message:

mongo error: "Btree::insert: key too large to index, failing

MongoDB 2.6 enforces a restriction on the size of index keys: an insert fails if the index entry is larger than 1024 bytes, while MongoDB 2.4 only logged a warning. Luckily I found the issue in MongoDB's ticket system. The import can succeed if the mongod process is running with the parameter "failIndexKeyTooLong=false". Just start it like that:

sudo mongod --setParameter failIndexKeyTooLong=false

Now run the mongorestore again. That worked for me.
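
Alternatively, if you don't want to restart the mongod process, the same parameter can also be set at runtime through the setParameter admin command:

mongo admin --eval "db.runCommand({ setParameter: 1, failIndexKeyTooLong: false })"

That should have the same effect for the current server run.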

Deployment with Capistrano 3

Capistrano is a Ruby-based deployment tool which executes commands in parallel on multiple remote machines via the SSH protocol. With Capistrano you can deploy your Rails application to N servers with one single command from your dev machine. You don't even need to log in to your servers via SSH. This command rolls out your application on N servers:

cap production deploy

And if something goes wrong you can easily roll back to the last stable deployment, just like this:

cap production deploy:rollback

Capistrano is pretty cool. I already used the previous version 2.x. The new version 3.x I have been using in production for a couple of months now, and it is super stable.

If you are deploying your Rails application to dedicated servers or instances on AWS, then Capistrano is the way to go!

Before you start with Capistrano, you have to set up SSH with authentication keys instead of passwords. That way you can log in to your server with a simple "ssh user@server", without a password. This works when your public SSH key is on the server; that way the server "knows" you.
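
In case you haven't set that up yet, the usual way looks like this (user and host name are placeholders):

# Generate an SSH key pair on your dev machine, if you don't have one yet.
ssh-keygen -t rsa

# Copy your public key into the server's authorized_keys.
ssh-copy-id ubuntu@myapp_server

# This should now log you in without a password prompt.
ssh ubuntu@myapp_server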

First of all you need to add the gem to your Gemfile.

gem 'capistrano'

And if you are using Rails and Bundler you want to add these two lines as well.

gem 'capistrano-rails' , '~> 1.1.1'
gem 'capistrano-bundler', '~> 1.1.2'

Now you have to run Bundler to install the packages.

bundle install

As the next step you have to prepare your Rails project for Capistrano. Note that in Capistrano 3 the old "capify" command has been replaced by "cap install". Just run:

cap install STAGES=production,staging,test

That will create some files in your project.

mkdir -p config/deploy
create config/deploy.rb
create config/deploy/production.rb
create config/deploy/staging.rb
create config/deploy/test.rb
mkdir -p lib/capistrano/tasks
create Capfile
Capified

In the Capfile you can require some Capistrano packages. For a Rails app it will look like this.

require 'capistrano/setup'
require 'capistrano/deploy'
require 'capistrano/bundler'
require 'capistrano/rails'
require 'capistrano/rails/assets'
require 'capistrano/rails/migrations'

# Loads custom tasks from `lib/capistrano/tasks' if you have any defined.
Dir.glob('lib/capistrano/tasks/*.cap').each { |r| import r }

In Capistrano 3 most of the magic happens in the deploy.rb file, which is the central configuration file for Capistrano. In general a deployment fetches the current code from your Git server, runs Bundler and the database migrations, precompiles your assets and starts/restarts the Ruby app server.

Here is my deploy.rb with some additional comments.


# Force rake through bundle exec
SSHKit.config.command_map[:rake] = "bundle exec rake"

# Force rails through bundle exec
SSHKit.config.command_map[:rails] = "bundle exec rails"

set :migration_role, 'app' # Defaults to 'db'
set :assets_roles, [:app] # Defaults to [:web]

# The name of your application
set :application, 'myapp'

# Configuration for the source control management system
set :scm , :git
set :repo_url, 'git@github.com:myorga/myapp.git'
set :branch , "master"

# This forwards your local SSH agent to the server, so the
# Git checkout there can authenticate with your local key.
set :ssh_options, {:forward_agent => true}

# User on remote server
set :user , "ubuntu"

# Application root directory on remote server
set :deploy_to , '/var/www/myapp'

# Shared directories over different deployments
set :linked_dirs, %w(pids log)

# Configuring capistrano log output
set :format , :pretty
set :log_level, :info # :debug :error :info

# Keeps the last 5 deployments on the server for rollback scenarios
set :keep_releases, 5

namespace :deploy do

  desc 'Start application'
  task :start do
    on roles(:app), in: :sequence, wait: 5 do
      execute "/etc/init.d/unicorn.sh start"
    end
  end

  desc 'Stop application'
  task :stop do
    on roles(:app), in: :sequence, wait: 5 do
      execute "/etc/init.d/unicorn.sh stop"
    end
  end

  desc 'Restart application'
  task :restart do
    on roles(:app), in: :sequence, wait: 5 do
      execute "/etc/init.d/unicorn.sh restart"
    end
  end

  after :finishing, 'deploy:restart'
  after :finishing, 'deploy:cleanup'

end

The script for starting and stopping Unicorn can be found here: http://robert-reiz.com/2012/02/29/running-unicorn-as-a-service-on-debian-linux/.

In Capistrano you have different environments (stages), for example "test", "staging" and "production". You can define as many as you want. Each environment has its own configuration file under "config/deploy/", for example "config/deploy/production.rb", which might look like this:

set :stage, :production

# Setting RAILS_ENV environment variable on server
set :rails_env, :production

set :normalize_asset_timestamps, %{public/images public/javascripts public/stylesheets}

role :app, %w{ubuntu@myapp_server}

set :ssh_options, {
   forward_agent: true # , auth_methods: %w(password)
}

The most important line is the one with the role. In Capistrano you can define different roles and assign them to different servers, so that some deployment commands are only executed on specific servers. You can read more about that in the official documentation. For this article I keep it simple and go ahead with only one role and one server.
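
Each stage is then addressed by its name on the command line. With the three stage files from above that gives you, for example:

cap staging deploy
cap test deploy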

On the remote server(s) you have to create the application root directory. If your application has the name “myapp” it would look like this:

  /var/www/myapp
  /var/www/myapp/releases
  /var/www/myapp/shared

Make sure that the user you defined in the deploy.rb file has full read and write access to these directories. For each deployment Capistrano will create a separate directory in the "releases" directory, named with the timestamp of the deployment, and "/var/www/myapp/current" is a symbolic link to the latest of these release directories.
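
A minimal sketch for creating the directories with the right owner, assuming the "ubuntu" user from the deploy.rb above:

sudo mkdir -p /var/www/myapp/releases /var/www/myapp/shared
sudo chown -R ubuntu:ubuntu /var/www/myapp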

Now you can deploy with:

cap production deploy

If you have done everything right the deployment will run through and deploy your application.

This command shows you all possible Capistrano tasks:

cap -T

If you don't deploy on Heroku or CloudControl, then Capistrano is a big help. It makes life much easier :-)

Let me know if you have questions. Either in the comments or on Twitter.