MongoDB Map & Reduce with Date filter

We use MongoDB as the primary database at VersionEye, together with MongoID. Each software package is a document in the “products” collection, and every product document embeds a list of “versions”. Assume we want to know how many versions/artifacts existed for a given language at a given point in time.

That is not a simple query in MongoDB. This kind of query can be handled with Map & Reduce, which lets you execute JavaScript at the database level. Here is the current solution:



border = until_date.at_midnight + 1.day

map = %Q{
  function() {
    if ( this.versions == null || this.versions.length == 0 ) return;

    var that_day = new ISODate("#{border.iso8601}");
    for (var version in this.versions){
      var created = this.versions[version].created_at;
      if (created != null && created.getTime() < that_day.getTime()){
        emit( this.versions[version]._id, { count: 1 } );
      }
    }
  }
}

reduce = %Q{
  function(key, values) {
    var result = { count: 0 };
    values.forEach(function(value) {
      result.count += value.count;
    });
    return result; 
  }
}

Product.where(:language => language, :created_at.lt => border ).map_reduce(map, reduce).out(inline: true)

The tricky part was this line:

that_day = new ISODate("#{border.iso8601}");

The challenge was finding out how to convert a Ruby Date object into a JavaScript Date object. Interpolating the ISO 8601 string into the ISODate constructor does the job.
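To make the conversion easier to try out without Rails, here is a minimal sketch in plain Ruby. The helper names `border_for` and `iso_date_literal` are made up for this illustration, and `border_for` is only a stand-in for `until_date.at_midnight + 1.day`, assuming UTC.

```ruby
require "date"
require "time" # adds Time#iso8601

# Hypothetical helper: midnight (UTC) of the day after `until_date`,
# a plain-Ruby stand-in for `until_date.at_midnight + 1.day`.
def border_for(until_date)
  next_day = until_date + 1
  Time.utc(next_day.year, next_day.month, next_day.day)
end

# Hypothetical helper: renders a Ruby Time as a JavaScript ISODate expression
# that can be interpolated into the map function.
def iso_date_literal(time)
  %Q{new ISODate("#{time.getutc.iso8601}")}
end

border = border_for(Date.new(2014, 3, 1))
puts iso_date_literal(border)
```

The printed string is what ends up inside the JavaScript source, so the database never sees a Ruby object, only an ISO 8601 timestamp.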

You also have to know that even though you are iterating over the versions collection, you cannot access the version object through “version”. A JavaScript for...in loop yields the keys (here the array indexes), not the elements, so you have to access the element this way:

this.versions[version]

Otherwise it works fine 🙂
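To check what the map and reduce functions above compute, the same logic can be mirrored in plain Ruby on in-memory data. This is only a sketch: the sample products are made up, `versions_before` is a hypothetical name, and the extra `created_at` filter on the product itself is left out.

```ruby
# Made-up sample data standing in for documents of the "products" collection.
PRODUCTS = [
  { language: "Ruby", versions: [
    { _id: "v1", created_at: Time.utc(2013, 1, 10) },
    { _id: "v2", created_at: Time.utc(2014, 5, 1) },
    { _id: "v3", created_at: nil }
  ] },
  { language: "Ruby", versions: nil },
  { language: "Java", versions: [{ _id: "v4", created_at: Time.utc(2012, 7, 4) }] }
].freeze

# Hypothetical helper mirroring map + reduce: one counter per version id
# for every version created before `border`.
def versions_before(products, language, border)
  counts = Hash.new(0)
  products.each do |product|
    next unless product[:language] == language
    # Products without versions are skipped, like the early return in map.
    (product[:versions] || []).each do |version|
      created = version[:created_at]
      counts[version[:_id]] += 1 if created && created < border
    end
  end
  counts
end

result = versions_before(PRODUCTS, "Ruby", Time.utc(2014, 1, 1))
puts result.size
```

Because the key is the version id, the number of entries in the result is the number of versions that existed before the border date.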

MongoDB could not restore backup because of “key too large to index” error

Recently I made a db dump on a MongoDB 2.4 server like this:

mongodump --db veye_dev

And I tried to restore it on a MongoDB 2.6 server like this:

mongorestore dump/veye_dev

Unfortunately at some point the restore process failed with this error message:

mongo error: "Btree::insert: key too large to index, failing

MongoDB 2.6 enforces the index key size limit more strictly than 2.4: instead of silently leaving an oversized value out of the index, it fails the insert or the index build. Luckily I found the issue in MongoDB's ticket system. The import can succeed if the mongod process is running with the parameter “failIndexKeyTooLong=false”. Just start it like this:

sudo mongod --setParameter failIndexKeyTooLong=false

And now execute the mongorestore again. That worked for me.
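If you would rather find the offending documents than relax the check, a rough pre-check is possible. The sketch below works on plain Ruby hashes and assumes a limit of about 1024 bytes per index entry; the real limit applies to the BSON index entry including overhead, so this is only an approximation. The constant and the helper name are made up for this example.

```ruby
# Approximate index key limit in bytes (assumption for this sketch).
INDEX_KEY_LIMIT_BYTES = 1024

# Hypothetical helper: returns the documents whose value for `field`
# is too large to fit into an index entry.
def oversized_values(docs, field, limit = INDEX_KEY_LIMIT_BYTES)
  docs.select { |doc| doc[field].to_s.bytesize >= limit }
end

# Made-up sample documents.
docs = [
  { name: "rails", description: "short" },
  { name: "huge",  description: "x" * 2000 }
]

puts oversized_values(docs, :description).map { |doc| doc[:name] }.inspect
```

Using `bytesize` instead of `length` matters here, because multi-byte characters count with their full size.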

MongoID Lessons Learned

I am using MongoID to access MongoDB from a Ruby on Rails application. It is a good library. But there are some things I found out during the project that I want to share here. Nothing bad, just some behaviors you should know about.

Case Insensitive Search
There are different ways to write queries with MongoID. One pretty cool feature is that you can use a regex in your queries. If you want to have all users whose names start with “mike”, you can write something like this:

query = User.where(name: /^#{name}/)

That is very useful. You can also easily make the search case insensitive by adding an “i” to the end of the regex.

query = User.where(name: /^#{name}/i)

Very useful feature. But you should know that the case insensitive search was about 50% slower than the default regex search, at least in my measurements. Funny that such a small “i” can have such a big impact 🙂

One fast workaround is to store the names twice: once as entered and once lowercased. That way you can execute the fast query on the lowercased field and show the normal name in the UI.
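The workaround can be sketched in plain Ruby. The `User` struct, `build_user` and `prefix_regex` are hypothetical names for this illustration and do not use MongoID. The sketch also escapes the search term with `Regexp.escape`, so that characters like “.” in user input are not interpreted as regex syntax.

```ruby
# Hypothetical stand-in for a user document with the name stored twice.
User = Struct.new(:name, :name_downcase)

def build_user(name)
  User.new(name, name.downcase)
end

# Builds a case sensitive prefix regex for the lowercased field.
def prefix_regex(prefix)
  /^#{Regexp.escape(prefix.downcase)}/
end

users   = ["Mike Smith", "MIKE.JONES", "Anna"].map { |name| build_user(name) }
pattern = prefix_regex("Mike")

# Search on the lowercased copy, display the original name.
matches = users.select { |user| user.name_downcase =~ pattern }
puts matches.map(&:name).inspect
```

With MongoID the same regex would go into a `where` on the lowercased field instead of the in-memory `select`.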

Returning an Empty Query
Sometimes you want to build a criteria object dynamically. And sometimes it happens that you just return an empty criteria object, something like this:

Mongoid::Criteria.new(User)

I don’t know why, but I assumed that this would return an empty result list with 0 users. In fact it is a criteria without any conditions, so it returns all users from the database 🙂

This is the code that really returns 0 users:

Mongoid::Criteria.new(User, {_id: -1})

This is a criteria object with the constraint “_id = -1”. And because there is no object in the collection with an ID equal to -1, it returns 0 users.
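The reason is the same as with any filter that has no conditions: every document satisfies “all of zero conditions”. The following plain Ruby sketch shows the effect on an in-memory list; `matching` is a hypothetical helper and not part of MongoID.

```ruby
# Hypothetical helper: keeps the documents that satisfy every condition.
def matching(docs, conditions)
  docs.select do |doc|
    # `all?` on an empty hash is true, so no conditions means "match everything".
    conditions.all? { |field, expected| doc[field] == expected }
  end
end

# Made-up sample documents.
users = [{ _id: 1, name: "Hans" }, { _id: 2, name: "Mike" }]

puts matching(users, {}).size
puts matching(users, { _id: -1 }).size
```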

Failed to connect to primary node

If you try to connect to a MongoDB Replica Set via MongoID and you get this error message:

/opt/local/lib/ruby1.9/gems/1.9.1/gems/mongo-1.5.2/lib/mongo/repl_set_connection.rb:165:in `connect': Failed to connect to primary node. (Mongo::ConnectionFailure)
from /opt/local/lib/ruby1.9/gems/1.9.1/gems/mongo-1.5.2/lib/mongo/repl_set_connection.rb:500:in `setup'
from /opt/local/lib/ruby1.9/gems/1.9.1/gems/mongo-1.5.2/lib/mongo/repl_set_connection.rb:144:in `initialize'
from /opt/local/lib/ruby1.9/gems/1.9.1/gems/mongoid-2.3.4/lib/mongoid/config/replset_database.rb:24:in `new'
from /opt/local/lib/ruby1.9/gems/1.9.1/gems/mongoid-2.3.4/lib/mongoid/config/replset_database.rb:24:in `configure'
from /opt/local/lib/ruby1.9/gems/1.9.1/gems/mongoid-2.3.4/lib/mongoid/config.rb:316:in `configure_databases'
from /opt/local/lib/ruby1.9/gems/1.9.1/gems/mongoid-2.3.4/lib/mongoid/config.rb:119:in `from_hash'
from /opt/local/lib/ruby1.9/gems/1.9.1/gems/mongoid-2.3.4/lib/mongoid/config.rb:136:in `load!'
from /opt/local/lib/ruby1.9/gems/1.9.1/gems/mongoid-2.3.4/lib/mongoid.rb:147:in `load!'
from /Users/reiz/workspace/versioneye/versioneye/config/application.rb:33:in `class:Application'
from /Users/reiz/workspace/versioneye/versioneye/config/application.rb:18:in `<module:Versioneye>'
from /Users/reiz/workspace/versioneye/versioneye/config/application.rb:17:in'
from /opt/local/lib/ruby1.9/gems/1.9.1/gems/railties-3.1.0/lib/rails/commands.rb:52:in `require'
from /opt/local/lib/ruby1.9/gems/1.9.1/gems/railties-3.1.0/lib/rails/commands.rb:52:in `block in '
from /opt/local/lib/ruby1.9/gems/1.9.1/gems/railties-3.1.0/lib/rails/commands.rb:49:in `tap'
from /opt/local/lib/ruby1.9/gems/1.9.1/gems/railties-3.1.0/lib/rails/commands.rb:49:in'
from script/rails in `require'
from script/rails:6:in'

Then you should check out this link: https://github.com/mongoid/mongoid/issues/1783#issuecomment-4319677

Durran Jordan helped me to solve the problem. Thx for that!

Rails + MongoDB ReplicaSet Configuration

This article shows how to configure a Ruby on Rails app for a MongoDB ReplicaSet. I described how to set up a MongoDB ReplicaSet here: MongoDB ReplicaSet Tutorial. And how to configure Rails to work together with MongoDB is described here: Rails + MongoDB Quickstart Tutorial.

If you want to connect to a ReplicaSet you just have to change your configuration a little bit. This is how your mongoid.yml file should look:

production:
  database: mydb_prod
  hosts:
    - - mongonode1:1222
    - - mongonode2:1222
    - - mongonode3:1222
  read: :secondary

“mongonode1”, “mongonode2” and “mongonode3” should be mapped to real IP addresses, of course! Usually you do that in “/etc/hosts”.

You don’t have to define a PRIMARY! The driver will figure out which of them is the PRIMARY. I like this! You also don’t need a load balancer, because all nodes are known to the driver and it routes the operations itself.

The property “read” can have 2 values: [“:secondary”, “:primary”]. “:secondary” means that read operations can also be routed to secondary nodes. “:primary” means that all read operations will be routed to the PRIMARY node.
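To see what the nested “- -” entries mean, the configuration above can be parsed with Ruby's standard YAML library. This is a small sketch that only inspects the parsed structure; it does not show how MongoID itself consumes the file, and splitting each entry at the colon is an assumption made for this example.

```ruby
require "yaml"

# The configuration from above, embedded as a string for this sketch.
CONFIG = <<~YAML
  production:
    database: mydb_prod
    hosts:
      - - mongonode1:1222
      - - mongonode2:1222
      - - mongonode3:1222
    read: :secondary
YAML

# Symbols such as :secondary have to be permitted explicitly for safe_load.
settings = YAML.safe_load(CONFIG, permitted_classes: [Symbol])["production"]

# Each host entry is a one-element array, so "hosts" is an array of arrays.
nodes = settings["hosts"].map do |entry|
  host, port = entry.first.split(":")
  [host, Integer(port)]
end

puts nodes.inspect
puts settings["read"].inspect
```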

MongoDB ReplicaSet Tutorial

MongoDB is a document-oriented database. One cool feature is that you can easily set up a MongoDB cluster, a Replica Set. A Replica Set should have at least 3 nodes, as illustrated by the diagram in the MongoDB documentation.

In a Replica Set there is always one Master, the “PRIMARY” node. All other nodes are “SECONDARY” nodes. Write operations always have to go to the PRIMARY node. Read operations can also go to SECONDARY nodes.

If you want to start a mongod process as part of a Replica Set you can use these parameters:

./bin/mongod --port <PORT> --bind_ip <IP-ADDRESS> --replSet <REPL-NAME> --rest

for example:

./bin/mongod --port 1222 --bind_ip 192.168.0.101 --replSet myreplset --rest

You can do that on 3 different servers, or more if you want. Then log in to the server which should be the first PRIMARY. There you have to start the mongo shell and initiate the replica set.

rs.initiate()

That is initiating the replica set. Now the replica set is running with 1 node. Now you have to add the other nodes.

rs.add("192.168.0.102:1222")
rs.add("192.168.0.103:1222")

All right. Now the replica set is running with 3 nodes. Now you can open a browser and navigate to the IP address of the PRIMARY node. The HTTP status interface listens on the mongod port plus 1000, so with port 1222 it is reachable on port 2222:

http://192.168.0.101:2222/_replSet

Because we started the mongod process with the parameter “--rest”, we can now see a small web application showing us the status of our replica set. Here we should see 3 members: 1 PRIMARY and 2 SECONDARY.

Now you can connect to the PRIMARY via the mongo shell console and add a document to the database. For example:

mongo --host 192.168.0.101 --port 1222 
> use myfirstdb
> db.users.save ( {name : "Hans"} )

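If you now connect to a SECONDARY you will see the document there. In the mongo shell you first have to allow reads from a SECONDARY with rs.slaveOk().

mongo --host 192.168.0.102 --port 1222
> rs.slaveOk()
> use myfirstdb
> db.users.find()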

If the PRIMARY goes down, there is an election in the replica set, so that one of the SECONDARIES becomes the new PRIMARY. When the old PRIMARY is up and running again, it will sync with the other nodes and rejoin the replica set. With default settings it usually rejoins as a SECONDARY and only becomes PRIMARY again if it wins a later election, for example because it has a higher priority.

I have set up a replica set for a Ruby on Rails application and played around with it. I just rebooted some of the nodes at random, but my Ruby on Rails application was still available and could deliver the data. That is pretty cool! 🙂
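An election needs a majority of the voting members, which is why 3 nodes are the practical minimum. The arithmetic can be sketched in a few lines of Ruby; the method names are made up for this illustration.

```ruby
# Votes needed for a majority in a set with `members` voting nodes.
def votes_needed(members)
  members / 2 + 1
end

# True if the reachable nodes can still elect a PRIMARY.
def primary_electable?(members, reachable)
  reachable >= votes_needed(members)
end

puts primary_electable?(3, 2) # one of three nodes is down
puts primary_electable?(3, 1) # two of three nodes are down
puts primary_electable?(2, 1) # a two node set cannot survive a failure
```

With 3 nodes one node can fail and the remaining two still form a majority, while a set of 2 nodes loses its PRIMARY as soon as either node goes down.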

You can read more about Replica Set Configuration here: MongoDB Replica Set Configuration.

 

MongoID read_secondary deprecated

OK. If you are using Rails + MongoID and you get this error message here:

read_secondary options has now been deprecated and will be removed in driver v2.0. Use the :read option instead.

Then you just have to change your configuration. Just replace “read_secondary” with “read”.
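If the settings are available as a Ruby hash, the rename can be sketched like this. The helper name is hypothetical, and the mapping of a true value to `:secondary` and a false value to `:primary` is an assumption based on the two values of “read” described in the ReplicaSet configuration post above.

```ruby
# Hypothetical helper: replaces the deprecated "read_secondary" key with "read".
# Assumption: true maps to :secondary, anything else to :primary.
def migrate_read_option(settings)
  return settings unless settings.key?("read_secondary")

  migrated = settings.reject { |key, _value| key == "read_secondary" }
  # An explicit "read" setting wins over the deprecated option.
  migrated["read"] ||= settings["read_secondary"] ? :secondary : :primary
  migrated
end

old_settings = { "database" => "mydb_prod", "read_secondary" => true }
puts migrate_read_option(old_settings).inspect
```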