MongoDB Replication & Recovery

In a non-trivial system, we’d normally look to have three types of database set-up. A primary is set up as a writeable database, one or more secondaries are set up as read-only databases and, finally, an arbiter is set up to help decide which secondary database takes over if the primary database goes down. The arbiter holds no data itself; it only votes.

Note: An arbiter is added to stop tied votes when electing a secondary to take over as primary, and thus should only be used where an even number of data-bearing instances of mongodb exists in a replication set.
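To see why the arbiter matters, here is a minimal sketch of the voting arithmetic, assuming every member has one vote and that a primary needs a strict majority of the voting members. The function names are made up for illustration and are not part of mongodb.

```javascript
// Votes needed to elect a primary: a strict majority of the voting members.
function majorityNeeded(votingMembers) {
  return Math.floor(votingMembers / 2) + 1;
}

// How many voting members can be lost while a primary can still be elected.
function faultTolerance(votingMembers) {
  return votingMembers - majorityNeeded(votingMembers);
}

// Primary + secondary only: the majority is 2, so losing either member
// leaves the survivor unable to win an election.
var withoutArbiter = faultTolerance(2); // 0

// Primary + secondary + arbiter: the majority is still 2, but one member may fail.
var withArbiter = faultTolerance(3); // 1
```

So adding the arbiter to a two-database set doesn’t add any storage, but it does mean the set can survive the loss of one member.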

The secondary databases will be “eventually consistent”. Replication is asynchronous, so when data is written to the primary database it is not immediately replicated to the secondary databases, but will “eventually” be replicated. Until then, a read from a secondary may return slightly stale data.

Let’s look at an example replication set…

To set up a replication set, we would start with a minimum of three instances of, or machines running, mongodb. As previously mentioned, this replication set would consist of a primary database, a secondary database and an arbiter.

Let’s run three instances on a single machine to begin with, so we need to create three database folders, for example

mkdir MyData\database1
mkdir MyData\database2
mkdir MyData\database3

Obviously, if all three are running on the same machine, we need to give each mongodb instance its own port. For example, run each of the following commands in its own command prompt

mongod --dbpath MyData\database1 --port 30000 --replSet "sample"
mongod --dbpath MyData\database2 --port 40000 --replSet "sample"
mongod --dbpath MyData\database3 --port 50000 --replSet "sample"

“sample” is an arbitrary, user-defined name for our replication set.

Note: Running all the databases on the same machine, as above, is solely an example. Obviously no production system should implement this strategy; each instance (primary, secondary and arbiter) should be run on its own machine.

The replication set still hasn’t been created at this point, though. To create it, we first need to run the shell against one of the servers, for example

mongo --port 30000

Now we need to create the configuration for our replication set, for example

var sampleConfiguration =
{ _id : "sample", 
   members : [
     {_id : 0, host : 'localhost:30000', priority : 10 },
     {_id : 1, host : 'localhost:40000'},
     {_id : 2, host : 'localhost:50000', arbiterOnly : true } 
   ]
}

This describes the replication set. The host on port 30000 will be the primary, because it has the highest priority (in this example). The host on port 40000 keeps the default priority and doesn’t set arbiterOnly, so this is the secondary, and finally the host on port 50000 is the arbiter.
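Before initiating, it can be worth sanity checking a configuration like the one above, as a typo in a host or a duplicated _id is easy to make. The following is only a sketch: checkReplicaSetConfig is a hypothetical helper (not a mongodb command), and it assumes every member has one vote, which is why it flags an even number of members.

```javascript
// Hypothetical helper (not part of mongodb): returns a list of problems found
// in a replication set configuration object shaped like sampleConfiguration.
function checkReplicaSetConfig(config) {
  var problems = [];
  var seenIds = {};
  var seenHosts = {};
  var members = config.members || [];

  if (typeof config._id !== "string" || config._id.length === 0) {
    problems.push("config._id should be the replication set name");
  }

  members.forEach(function (member) {
    if (seenIds[member._id]) {
      problems.push("duplicate member _id: " + member._id);
    }
    if (seenHosts[member.host]) {
      problems.push("duplicate host: " + member.host);
    }
    seenIds[member._id] = true;
    seenHosts[member.host] = true;
  });

  // Assumption: one vote per member, so an even count allows tied votes.
  if (members.length % 2 === 0) {
    problems.push("even number of members; votes could tie");
  }
  return problems;
}

// The same configuration as in the text, repeated so the sketch is self-contained.
var sampleConfiguration = {
  _id: "sample",
  members: [
    { _id: 0, host: "localhost:30000", priority: 10 },
    { _id: 1, host: "localhost:40000" },
    { _id: 2, host: "localhost:50000", arbiterOnly: true }
  ]
};

var problemsFound = checkReplicaSetConfig(sampleConfiguration); // empty array
```

An empty array here only means these simple checks passed; it doesn’t prove the hosts are reachable.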

At this point we’ve created the configuration but we still need to actually initiate/run the configuration. So, again, from the shell we write

rs.initiate(sampleConfiguration)

Note: This can take a few minutes while all the instances which make up the replication set are configured. Eventually the shell will return from the initiate call and should report “ok”.

The shell prompt should now change to show the replication set name and the state of the currently connected server (e.g. sample:PRIMARY>).
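The prompt only tells us about the server we’re connected to. To see every member at once we can look at the result of rs.status(), which includes a members array with a name and stateStr for each member. Here is a small sketch that boils this down to a host-to-state map; summariseMembers is a hypothetical helper, and exampleStatus is a hand-written stand-in for a real rs.status() result.

```javascript
// Hypothetical helper: maps each member's host name to its reported state,
// using the "members", "name" and "stateStr" fields of an rs.status() result.
function summariseMembers(status) {
  var summary = {};
  (status.members || []).forEach(function (member) {
    summary[member.name] = member.stateStr;
  });
  return summary;
}

// In the mongo shell this would be called as: summariseMembers(rs.status())
// Below is a hand-written stand-in for that result (illustrative values only).
var exampleStatus = {
  set: "sample",
  members: [
    { name: "localhost:30000", stateStr: "PRIMARY" },
    { name: "localhost:40000", stateStr: "SECONDARY" },
    { name: "localhost:50000", stateStr: "ARBITER" }
  ]
};

var exampleSummary = summariseMembers(exampleStatus);
```

Running the helper against the real rs.status() before and after taking a server offline is a simple way to watch the states change.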

Now if we write data to the primary it will “eventually” be replicated to all secondary databases.

If we take the primary database offline (or, worse still, a fault occurs and it’s taken offline without our involvement), a secondary database will be promoted to become the primary database. Obviously in our example we only have one secondary, so this will take over as the primary. If/when the original primary comes back online and has caught up, it will again become the primary database, because we gave it the highest priority in our configuration, and the secondary will, of course, return to being a secondary database.

Don’t forget you can use

rs.help()

to view help for the various replication commands.