Inject SSH pub key to Vagrant image

Usually when you create a Vagrant VM, an insecure private key gets injected into the VM; on the host it is located at ~/.vagrant.d/insecure_private_key. In Ansible you can reference that key to ensure a passwordless login to the VM. Since Vagrant 1.8.5 this doesn’t work anymore, for security reasons. That’s why I now use this shell provisioner with a bit of Ruby code to inject my public SSH key into the VM:

config.vm.provision "shell" do |s|
  # Read the first line of the local public SSH key
  ssh_pub_key = File.readlines("#{Dir.home}/.ssh/id_rsa.pub").first.strip
  s.inline = <<-SHELL
    echo "#{ssh_pub_key}" >> /home/ubuntu/.ssh/authorized_keys
    echo "#{ssh_pub_key}" >> /root/.ssh/authorized_keys
    apt-get -y install python-simplejson
  SHELL
end

The line with File.readlines is pure Ruby code. It reads the public SSH key from the default .ssh directory in the home directory and stores the content in the ssh_pub_key variable. The first two lines of the shell provisioner append the SSH key to the authorized_keys files of the ubuntu and the root user.

With that the VM is built together with my own public SSH key and I can log in to the VM via SSH without entering a password. That also makes it super easy to handle the VM later with Ansible.
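For context, here is where the provisioner sits in a minimal Vagrantfile sketch. The box name and IP are just placeholders:

Vagrant.configure("2") do |config|
  config.vm.box = "ubuntu/xenial64"                         # placeholder box
  config.vm.network "private_network", ip: "192.168.33.10"  # placeholder IP

  # ... the shell provisioner from above goes here ...
end

After a vagrant up, a passwordless login should then work with a plain ssh ubuntu@192.168.33.10.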

MongoDB Map & Reduce with Date filter

We are using MongoDB as the primary DB at VersionEye, together with Mongoid. Each software package is a document in the “products” collection, and each product document embeds a subcollection of “versions”. Assume we want to know how many versions/artifacts existed for a given language at a given point in time.

That is not a simple query in MongoDB. Queries like this can be handled with Map & Reduce, which lets you execute JavaScript on the DB level. Here is the current solution:



border = until_date.at_midnight + 1.day

map = %Q{
  function() {
    // skip products without any versions
    if ( this.versions == null || this.versions.length == 0 ) return;

    that_day = new ISODate("#{border.iso8601}");
    // "version" is the index/key here, not the version object itself!
    for (var version in this.versions){
      created = this.versions[version].created_at;
      if (created != null && created.getTime() < that_day.getTime()){
        emit( this.versions[version]._id, { count: 1 } );
      }
    }
  }
}

reduce = %Q{
  function(key, values) {
    var result = { count: 0 };
    values.forEach(function(value) {
      result.count += value.count;
    });
    return result; 
  }
}

Product.where(:language => language, :created_at.lt => border ).map_reduce(map, reduce).out(inline: true)

The tricky part was figuring out how to convert a Ruby Date object into a JavaScript Date object. This line does it, by interpolating the ISO 8601 string into an ISODate constructor:

that_day = new ISODate("#{border.iso8601}");

Beyond that, you have to know that even though you are iterating over the versions collection, you cannot access the version object through “version”. The loop variable holds the index/key, so you have to access the object this way:

this.versions[version]

Apart from that it works fine 🙂
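For completeness, here is a sketch of how the inline result can be consumed on the Ruby side. I’m assuming Mongoid’s usual output format here, where each emitted key ends up as one document with an _id and a value field:

result = Product.where(:language => language, :created_at.lt => border)
                .map_reduce(map, reduce)
                .out(inline: true)

# One document per emitted version id: { "_id" => ..., "value" => { "count" => 1.0 } }
total_versions = result.count
puts "#{total_versions} versions existed before #{border}"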

PDFKit – invalid byte sequence in US-ASCII

I’m using PDFKit at VersionEye to generate the PDF invoices. It’s a really awesome project. The idea behind PDFKit is that you generate the documents as HTML and CSS and then convert them to PDF. That works really well. Generating a PDF works like this:

kit = PDFKit.new(html, :footer_html => footer_file, :page_size => 'A4')

The first parameter “html” is the HTML as a string. In addition you can pass a path to a separate HTML file as footer. And of course you can choose the output format, in this case DIN A4.
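To actually render the document, PDFKit offers to_pdf, which returns the PDF as a string, and to_file, which writes it to disk. A minimal sketch; the output path is just an example:

pdf_data = kit.to_pdf                 # the PDF as a binary string
kit.to_file("/tmp/invoice.pdf")       # or write it straight to a file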

That all worked really well, but sometimes I got this exception:

invalid byte sequence in US-ASCII

I found out that there was some kind of special character in the HTML. That can happen if you fill the HTML template with usernames, for example, and one of the users is a French dude or even worse a Chinese dude, then you have some odd characters in your markup 🙂 But luckily there is a solution for that: you can enforce UTF-8 encoding for the string.

This line fixed it for me.

html = html.force_encoding(Encoding::UTF_8)

daemon script

This shell script runs forever and checks if the Rails worker is running; if not, it starts it again:

#!/bin/bash
# APP_ROOT, BUNDLE and CONF have to be set before this loop starts.

while :
do
  # is there a 'rails worker' process (ignoring the grep itself)?
  if ps ax | grep -v grep | grep 'rails worker' > /dev/null
  then
      echo "service running, everything is fine"
      sleep 5
  else
      echo "service is not running. Lets start again"
      cd $APP_ROOT
      $BUNDLE exec unicorn_rails -D -c $CONF
      echo "restarted with PID $(cat /rails/pids/unicorn.pid)"
      sleep 15
  fi
done
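To keep the script running after logout you can start it with nohup; watchdog.sh is just a hypothetical name for the script above:

nohup ./watchdog.sh >> /var/log/watchdog.log 2>&1 &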

Geek2Geek – Centralized Logging

Last week it happened again. Geek2Geek!

This time we came together at Flyeralarm in Berlin to talk about centralized logging. That is an interesting topic for all companies that have to scale. As soon as you have more than one server, you need to think about how you collect and analyze your log files in a distributed system. There are a couple of good solutions out there for this problem.

Jilles van Gurp did the first talk, about the ELK stack. ELK stands for E = Elasticsearch, L = Logstash and K = Kibana. All three products belong to the Elasticsearch company and they all work together smoothly. Jilles showed us how they use the ELK stack at Linko to build the LinkoApp.

Jilles gave us a short intro to the technology on a couple of slides before he switched to the live demo. It was very interesting to listen to his real-world experiences with the ELK stack.

His learnings from the past couple of months: the stack is easy to set up, but you should be careful with the Elasticsearch cluster. Don’t shut it down all at once 😉

After the first presentation the pizza arrived and we took a little break with pizza & beer.

Lennart is THE guy behind Graylog2. He started the project a couple of years ago at Jimdo. The very first version was implemented in Ruby. Graylog2 is a complete rewrite in Java. Lennart is also co-founder of Torch, the company behind Graylog2.

Lennart gave a short intro about the history, intention and philosophy behind Graylog2.

I was impressed how much he knows about the other logging solutions, such as Logstash/Kibana and Splunk. He was not afraid to talk about feature comparisons and pros & cons of the different solutions.

Graylog2 is built for enterprise usage. It is optimized for speed and high data volumes. The interesting thing is that you can use it together with Elasticsearch and Logstash.

Many thanks to Jilles and Lennart for the great talks. Both solutions are very interesting. If you still read logs on the server with “less”, you should definitely check out these two great solutions!

By the way, I also tried to organize a Splunk talk, but unfortunately I couldn’t find any Splunkies willing to give a talk about Splunk at Geek2Geek.

Many thanks to Flyeralarm for sponsoring the location, pizza and beer! You guys are awesome!

By the way, Flyeralarm just opened a new branch in Berlin. They have a really nice office.

And they are currently looking for experienced PHP developers. If you are interested you should contact Thomas.

Deployment with Capistrano 3

Capistrano is a Ruby-based deployment tool which executes commands in parallel on multiple remote machines via the SSH protocol. With Capistrano you can deploy your Rails application on N servers with one single command from your dev machine. You don’t even need to log in to your server via SSH. This command can roll out your application on N servers:

cap production deploy

And if something goes wrong you can easily roll back to the last stable deployment, just like this:

cap production deploy:rollback

Capistrano is pretty cool. I already used the previous version 2.x. I have been using the new version 3.x in production for a couple of months now and it is super stable.

If you are deploying your Rails application to dedicated servers or instances on AWS, then Capistrano is the way to go!

Before you start with Capistrano, you have to set up SSH with authentication keys instead of passwords, so that you can log in to your server with a simple “ssh user@server” without a password. That is possible if your public SSH key is on the server. In that way the server “knows” you.

First of all you need to add the Gem to your Gemfile.

gem 'capistrano'

And if you are using Rails and Bundler you want to add these two lines as well.

gem 'capistrano-rails' , '~> 1.1.1'
gem 'capistrano-bundler', '~> 1.1.2'

Now you have to run Bundler to install the packages.

bundle install

As the next step you have to capify your Rails project (note: newer Capistrano 3 versions ship this as cap install instead of capify). Just run:

capify .

That will create some files in your project.

[add] writing './Capfile'
[add] writing './config/deploy.rb'
[add] writing './config/deploy/production.rb'
[add] writing './config/deploy/staging.rb'
[add] writing './config/deploy/test.rb'
[done] capified!

In the Capfile you can require some Capistrano packages. For a Rails app it will look like this.

require 'capistrano/setup'
require 'capistrano/deploy'
require 'capistrano/bundler'
require 'capistrano/rails'
require 'capistrano/rails/assets'
require 'capistrano/rails/migrations'

# Loads custom tasks from `lib/capistrano/tasks' if you have any defined.
Dir.glob('lib/capistrano/tasks/*.cap').each { |r| import r }

In Capistrano 3 most of the magic happens in the deploy.rb file, which is the central configuration file for Capistrano. In general a deployment fetches the current code from your Git server, runs Bundler and rake db:migrate, precompiles your assets and starts/restarts the Ruby app server.

Here is my deploy.rb with some additional comments.


# Force rake through bundle exec
SSHKit.config.command_map[:rake] = "bundle exec rake"

# Force rails through bundle exec
SSHKit.config.command_map[:rails] = "bundle exec rails"

set :migration_role, 'app' # Defaults to 'db'
set :assets_roles, [:app] # Defaults to [:web]

# The name of your application
set :application, 'myapp'

# Configuration for the source control management system
set :scm , :git
set :repo_url, 'git@github.com:myorga/myapp.git'
set :branch , "master"

# This forwards the user agents and uses the local
# user for the git authentification.
set :ssh_options, {:forward_agent => true}

# User on remote server
set :user , "ubuntu"

# Application root directory on remote server
set :deploy_to , '/var/www/myapp'

# Shared directories over different deployments
set :linked_dirs, %w(pids log)

# Configuring capistrano log output
set :format , :pretty
set :log_level, :info # :debug :error :info

# Keeps the last 5 deployments on the server for rollback scenarios
set :keep_releases, 5

namespace :deploy do

  desc 'Start application'
  task :start do
    on roles(:app), in: :sequence, wait: 5 do
      execute "/etc/init.d/unicorn.sh start"
    end
  end

  desc 'Stop application'
  task :stop do
    on roles(:app), in: :sequence, wait: 5 do
      execute "/etc/init.d/unicorn.sh stop"
    end
  end

  desc 'Restart application'
  task :restart do
    on roles(:app), in: :sequence, wait: 5 do
      execute "/etc/init.d/unicorn.sh restart"
    end
  end

  after :finishing, 'deploy:restart'
  after :finishing, 'deploy:cleanup'

end

The script for starting and stopping Unicorn can be found here: https://robert-reiz.com/2012/02/29/running-unicorn-as-a-service-on-debian-linux/.

In Capistrano you have different environments, for example “test”, “staging” and “production”. You can define as many as you want. Each environment has its own configuration file under “config/deploy/”, for example “config/deploy/production.rb”, which might look like this:

set :stage, :production

# Setting RAILS_ENV environment variable on server
set :rails_env, :production

set :normalize_asset_timestamps, %{public/images public/javascripts public/stylesheets}

role :app, %w{ubuntu@myapp_server}

set :ssh_options, {
   forward_agent: true # , auth_methods: %w(password)
}

The most important line is the one with the role. In Capistrano you can define different roles and assign them to different servers, so that some deployment commands are only executed on specific servers. You can read more about that in the official documentation. For this article I keep it simple and go ahead with only one role and one server.
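Just for illustration, a setup with several roles could look like this (the hostnames are hypothetical):

role :web, %w{ubuntu@web1.example.com ubuntu@web2.example.com}
role :app, %w{ubuntu@app1.example.com}
role :db,  %w{ubuntu@db1.example.com}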

On the remote server(s) you have to create the application root directory. If your application has the name “myapp” it would look like this:

  /var/www/myapp
  /var/www/myapp/releases
  /var/www/myapp/shared

Make sure that the user you defined in the deploy.rb file has full read and write access to these directories. For each deployment, Capistrano will create a separate directory in the “releases” directory, named with the timestamp of the deployment. The “current” directory is a symbolic link to the latest deployment in “/var/www/myapp/releases”.
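A quick sketch for preparing these directories on the server, assuming the ubuntu user from deploy.rb:

sudo mkdir -p /var/www/myapp/releases /var/www/myapp/shared
sudo chown -R ubuntu:ubuntu /var/www/myapp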

Now you can deploy with:

cap production deploy

If you have done everything right the deployment will run through and deploy your application.

This command shows you all possible Capistrano tasks:

cap -T

If you don’t deploy on Heroku or CloudControl, then Capistrano is a big help. It makes life much easier 🙂

Let me know if you have questions. Either in the comments or on Twitter.

Configuring host and port for Selenium/Capybara

I’m using Capybara and Selenium together with RSpec to test the web interface of VersionEye. That works very well. For an integration test I needed a callback on localhost:3000/auth/*. By default Capybara starts the test server on a random host and port to avoid conflicts with localhost:3000, which is the default host and port for Rails apps in development. It took me something like 30 minutes to find out how to force Capybara to run all tests on localhost:3000. That’s why I think it’s worth blogging 🙂

Either in your `spec_helper.rb` or in `spec/support/capybara.rb` you will have these imports:

require 'capybara/rails'
require 'capybara/rspec'
require 'capybara/firebug'

Below that you can configure Capybara like this.

Capybara.app_host = "http://localhost:3000"
Capybara.server_host = "localhost"
Capybara.server_port = "3000"

That worked for me.

has_secure_password with Rails 4.1

I just started a new project with Rails 4.1 and explored the has_secure_password feature. Really awesome stuff!

I hope you are not storing passwords in clear text in your database! You should always store some kind of hashed value instead of a clear text password. In case somebody steals your database, he or she still doesn’t have the passwords.

There are a couple of good tutorials on how to hash and store passwords in a secure way. I implemented it a couple of times myself in Ruby.

A more sophisticated solution is devise, a very robust gem for security and authentication.

However, since Rails 3.1 you can take advantage of has_secure_password. This mechanism in Rails takes care of password validation and hashing. It requires a field ‘password_digest’ in your model, where it will store the hashed password. Let’s generate a simple model.

rails g model user username:string password_digest:string

Let’s add this line to the user model.

has_secure_password

This will add the attributes password and password_confirmation to your model. These two fields are now part of your model but not part of the database schema, because we don’t want to store clear text passwords.

Let’s add some tests.

require 'spec_helper'

describe User do

  it "fails because no password" do
    User.new({:username => "hans"}).save.should be_false
  end

  it "fails because password is too short" do
    User.new({:username => "hans", :password => 'han'}).save.should be_false
  end

  it "succeeds because password is long enough" do
    User.new({:username => "hans", :password => 'hansohanso'}).save.should be_true
  end

end

Three very simple tests. Persisting a new user without a password should fail. Persisting a new user with a too short password should fail as well. And creating a new user with a long password should succeed. If you run these tests with RSpec, the 2nd test will fail: by default Rails doesn’t have a validation for the length of the password. So let’s add it to our user model.

class User < ActiveRecord::Base

  has_secure_password
  validates :password, :length => { :minimum => 5 }

end

If you run the tests again, they will be green. If you take a look into your database you will see that the user table/collection has a column password_digest with a very cryptic value. But there are no columns for the password! That’s exactly what we wanted.

Now let’s do the authentication. Assume a new user signed up at your portal and now he wants to log in. This is how you authenticate him.

user = User.find_by_username("USERNAME").authenticate("CLEAR_TEXT_PASSWORD")

If the username and the clear text password from the HTTP request are correct, it will return a valid user from the database. If the password is wrong, authenticate returns false. And be aware that find_by_username returns nil if no user with that username exists, so guard against that before calling authenticate.
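A minimal sketch of how that could look in a sessions controller; the parameter names and redirect targets are just assumptions:

class SessionsController < ApplicationController

  def create
    user = User.find_by_username(params[:username])
    if user && user.authenticate(params[:password])
      session[:user_id] = user.id  # remember the logged-in user
      redirect_to root_path
    else
      render :new  # unknown username or wrong password
    end
  end

end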

has_secure_password validates the password at creation time, the very first time you persist the record. It doesn’t check the password field after that, for updates for example. And that’s OK, because it means you can load a user from the DB later, change it and persist it without knowing the password.

Another feature of this mechanism is password confirmation. has_secure_password also adds an attribute password_confirmation to your model. This attribute only gets validated if it’s not nil; if it’s not nil, it must be equal to the password attribute. Let’s add two more tests for that.

  it "fails because password confirmation doesn't match" do
    User.new({:username => "hans",
      :password => 'hansohanso',
      :password_confirmation => 'aa'}).save.should be_false
  end

  it "succeeds because password & confirmation match" do
    User.new({:username => "hans",
      :password => 'hansohanso',
      :password_confirmation => 'hansohanso'}).save.should be_true
  end

To make these tests pass you have to add one more line to the model.

class User < ActiveRecord::Base
  has_secure_password
  validates :password, :length => { :minimum => 5 }
  validates_confirmation_of :password
end

The line “validates_confirmation_of :password” will check the password confirmation.

Rails doesn’t force you to have a password confirmation for your model, but if you want it you can turn it on.

I really like this feature because it saves me a lot of code and development time. And for most applications this is really enough.

Let me know what you think about this, either in the comments or on Twitter.

Comparison of Application Level Package Managers

I have to work with a lot of different package managers (9 right now) in my daily work at VersionEye. Part of our mission is to make manual updating of dependencies extinct, because it’s a manual and time consuming task which nobody enjoys. That’s why we are building a notification system for open source software libraries to make Continuous Updating easy and fun. And since we support several programming languages – 8 at this point! – I get to write crawlers and parsers for all of them. To give you a better overview of the strengths and weaknesses of these package managers, I picked the most popular one for each language and will compare them. The contenders are:

  • RubyGems / Bundler (Ruby)
  • PIP / PyPI (Python)
  • Packagist / Composer (PHP)
  • NPM (Node.JS)
  • Bower (JS, CSS, HTML)
  • CocoaPods (Objective-C)
  • Maven (Java)
  • Lein (Clojure)

Comparison Matrix

Here are the results of my comparison.

Comparison of Package Managers

You can read the complete article on the VersionEye Blog and follow the discussion on Hacker News and Reddit.

Semantic Versioning

Do you know semantic versioning? You should! It describes how to name version numbers. Check it out here: semver.org.

This is the pattern it describes:

MAJOR.MINOR.PATCH

MAJOR version when you make incompatible API changes 
MINOR version when you add functionality in a backwards-compatible manner
PATCH version when you make backwards-compatible bug fixes

The cool thing here is that from the version number alone you can already see how big the changes in the new release are. A typical semantic version number is this:

3.2.1 

Let’s say I am using version “3.2.0” in my project. Now I can immediately see that the new version of the package “only” contains a patch. That means I can update without worrying. On the other hand, if this version comes out:

4.0.0 

And I am using version “3.2.1” of the package in my project, I can now immediately see that this update will very likely break my build! In this case I have to look into the change logs and follow the migration paths.

Semantic versioning even addresses alpha and beta versions. If you are working on version “4.0.0” but it’s not quite ready and you want to release something anyway, you can name it like this:

4.0.0-a

That means this is version “4.0.0” alpha. And this here would be the beta version:

4.0.0-b 

Another convention is “RC”, which means “Release Candidate”. You can use it like this:

4.0.0-RC1 
4.0.0-RC2

The complete order over all of them looks like this; the highest and newest version is at the top.

4.0.0
4.0.0-RC2 
4.0.0-RC1
4.0.0-b
4.0.0-a

That basically means 4.0.0 > 4.0.0-RC2 > 4.0.0-RC1 > 4.0.0-b > 4.0.0-a.
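As a quick illustration, Ruby’s built-in Gem::Version gets the release-versus-pre-release ordering right for simple tags like these. It treats a dash as the start of a pre-release; note that its ordering of mixed-case tags like “RC” doesn’t follow SemVer exactly:

require 'rubygems'

release = Gem::Version.new('4.0.0')
beta    = Gem::Version.new('4.0.0-b')  # the dash marks a pre-release
alpha   = Gem::Version.new('4.0.0-a')

puts release > beta    # => true, a release beats its pre-releases
puts beta > alpha      # => true, 'b' sorts after 'a'
puts beta.prerelease?  # => true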

I’m the author of naturalsorter, an open source library that sorts semantic version numbers in the correct way.

HTTP_REFERER for RSpec is missing

Recently I got this error message after executing my RSpec tests:

ActionController::RedirectBackError:
 No HTTP_REFERER was set in the request to this action, so redirect_to :back could not be called successfully. If this is a test, make sure to specify request.env["HTTP_REFERER"].

The error message and Stack Overflow tell you to set request.env["HTTP_REFERER"]. I did that:

request.env["HTTP_REFERER"] = "/signin"

But that didn’t help. Instead I set the HTTP_REFERER directly in the post call. Here is the snippet from my test code:

post "/sessions", {:session => {:email => user.email, :password => user.password}}, {"HTTPS" => "on", 'HTTP_REFERER' => '/signin'}

That fixed my problem.

Testing AJAX with Capybara and Selenium

In the past days I migrated my tests from Webrat to Capybara and I wrote a couple of new acceptance tests with RSpec, Capybara and the selenium-webdriver. All in all it’s pretty cool.

You can just keep writing your acceptance tests as usual with RSpec and Capybara. Here is a small example.

describe "Empty Payment History", :js => true do
  it "shows correct message when there's no history" do
    visit "/settings/payments"
    page.should have_css "#payment_history", text: "You dont have any Payment history"
  end
end

This test sends a request to “/settings/payments” and checks whether the element with the ID “payment_history” contains the given text. Pretty easy. This you could also do with Webrat. But the magic is in the first line: “:js => true” tells Capybara to execute the test with the selenium-webdriver. That will basically start your browser (Firefox) and you can see how the test gets executed. This is not possible with Webrat.

It’s just getting a little bit tricky if you do a lot of AJAX requests on the page. The Capybara documentation says you should use the “find” methods, because they wait until an element appears on the page. That didn’t work out for me; the test always failed. Somebody on Stack Overflow wrote that this construct would work for AJAX pages.

within('#payment_history') do
  page.all('a',  :text => 'View receipt')
end

And he was right! This test always succeeded. ALWAYS! Even if the test was completely wrong! 😀 Yeah. Very funny! *LOL* Seems like a bug. I did a bit more research and finally found a solution which works correctly.

using_wait_time 10 do
  page.should have_content("View receipt")
end

With “using_wait_time” you can force Capybara to wait for a couple of seconds until the AJAX requests are done. That finally worked out and the tests now run correctly.
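If you need this in many places you can also raise the wait time globally. In the Capybara versions of that era the setting was called default_wait_time (newer versions renamed it to default_max_wait_time):

Capybara.default_wait_time = 10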

Don’t use Webrat anymore

Webrat is a testing framework for Ruby. In general it is pretty cool, but DEAD! The last version was released more than 2 years ago, and there are only 200 gems referencing it.

The newest pull requests on GitHub are 1 year old! Not an active project! Don’t use dead projects!

I moved my tests to Capybara. This project is much more active. VersionEye shows that the newest version was released 1 month ago and there are almost 2300 gems referencing it.

And the newest pull requests on GitHub are only 4 days old. That all shows me that it’s still active, and I feel better knowing that there are developers fixing bugs 🙂

Moving Tests from Webrat to Capybara

In one of my applications I had a bunch of tests written with RSpec and Webrat. Unfortunately it seems that Webrat is no longer actively maintained. That’s why it is a good decision to move to Capybara, an actively maintained test framework for Ruby.

The migration was pretty smooth so far. Most of the time it was a simple replacement of code. Usually I had to replace something like this:

response.should contain("STRING_TO_TEST")

With this :

response.body.should match("STRING_TO_TEST")

Beyond that, assertions like this caused problems:

response.status.should == 401

That worked again as soon as I wrote it like this:

response.status.should eq(401)

Two times I got the error message that response is nil. I could resolve that by assigning it explicitly.

response = post @project_uri, {:api_key => @user_api.api_key}, "HTTPS" => "on"
response.status.should eq(403)

Apart from that, it worked out pretty well.

Testing SSL with Capybara and Selenium

I am using Capybara with Selenium as the JS engine to write acceptance tests for a Ruby on Rails application. In some controllers I am forcing SSL with the “force_ssl” filter from Rails. Running the tests with Selenium caused some problems: Selenium launches Firefox and calls the URL https://127.0.0.1:3000/signin. Of course there is no SSL for localhost! This causes an error and the test fails.

I did some research on this. There are some tickets on GitHub and Stack Overflow about it, but nothing that actually solves the core problem. For right now I just solved it by running the filter only in production mode and not in test mode.

force_ssl if Rails.env.production?

Now Firefox is launching on http://127.0.0.1:3000/signin.

undefined method `visit’ for RSpec with Capybara

I just started to write an acceptance test with Capybara. I followed the code example on the GitHub page and got this odd error:

Failure/Error: visit 'http://127.0.0.1:3000/signin'
 NoMethodError:
 undefined method `visit' for #<RSpec::Core::ExampleGroup::Nested_1::Nested_1:0x007fda48e0f680>

I placed my test in “spec/requests”. After some research I found out that the new Capybara gem expects the tests to be in “spec/features”. After I moved my test file to the right directory it worked perfectly.

How to mock the GitHub API

If you write code against the GitHub API you have to mock it somehow, otherwise it can be tricky to test. Here is how I did it. I found the great gem FakeWeb. With this gem you can fake web requests: it allows you to register URLs with fixed responses. Here is an example:

FakeWeb.register_uri(:get, "http://github.com/", :body => "Awesome")

If you now do an HTTP request to github.com you will get “Awesome” as the response.

Net::HTTP.get(URI.parse("http://github.com"))
=> "Awesome"

Here is how I mocked the GitHub OAuth login process. I just registered these two URLs:

FakeWeb.register_uri(:get, "https://github.com/login/oauth/access_token?client_id=#{Settings.github_client_id}&client_secret=#{Settings.github_client_secret}&code=123", :body => "token=token_123")
FakeWeb.register_uri(:get, "https://api.github.com/user?access_token=token_123", :body => "{\"id\": 1, \"email\": \"test@test.de\"}")

And that’s it. With that you can now test your callback.


get "/auth/github/callback?code=123"
assert_response :success

I am using RSpec for testing. Let me know if you have questions.

Determine scopes for given GitHub Token

If you have a given token from GitHub and you want to know which scopes it has, you have to check the response headers. Just use the token for any resource on the GitHub API and double check the headers of the response. The “x-oauth-scopes” header field tells you which scopes the token has.

Here is a small example with Ruby and HTTParty.


response = HTTParty.get("https://api.github.com/user?access_token=#{token}", :headers => {"User-Agent" => A_USER_AGENT } )
response.headers['x-oauth-scopes']
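The header value is a comma separated string, so a quick sketch for checking a required scope could look like this:

scopes = response.headers['x-oauth-scopes'].to_s.split(',').map(&:strip)
puts "The token can access private repos!" if scopes.include?('repo')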

If you HTTParty, party hard! 😉