Category Archives: MongoDB

Docker, spring boot and mongodb

I wanted to create a Docker build to run a Spring Boot based application along with its MongoDB database, which proved interesting. Here’s what I found out.

Dockerfile

To begin with, we need to create a Docker configuration file, named Dockerfile. This will be used to create a Docker image which will host our Spring Boot JAR. Obviously this requires that we base our image upon a Java image (or create our own). So let’s base our image on a lightweight OpenJDK 1.8 image, openjdk:8-alpine.

Below is an example Dockerfile

FROM openjdk:8-alpine
MAINTAINER putridparrot

RUN apk update

ENV APP_HOME /home
RUN mkdir -p $APP_HOME

ADD putridparrot.jar $APP_HOME/putridparrot.jar

WORKDIR $APP_HOME

EXPOSE 8080

CMD ["java","-Dspring.data.mongodb.uri=mongodb://db:27017/","-jar","/home/putridparrot.jar"]

The above will be used to create our image, based upon openjdk:8-alpine. We then run an update (in case it’s required) and create an environment variable for our application folder (we’ll simply install our application into /home, but it could be more specific, such as /home/putridparrot/app or whatever). We then create that folder.

Next we ADD our JAR, so this is going to in essence copy our JAR from our host machine into the docker image so that when we run the image it’ll also include the JAR within that image.

I’m also exposing port 8080 as my JAR application listens on port 8080. Note that EXPOSE only documents the port the container listens on; the mapping that lets us reach it from the host is set up later, in the ports section of the docker-compose.yml file.

Finally we add a command (CMD) which will run when a container is started from the image. In this case we run the executable JAR, passing in some configuration to allow it to access a mongodb instance (which will be running in another docker container).

Note: The use of the db host is important. It need not be named db, but the name needs to be the same as the service name we’ll be using within the upcoming docker-compose.yml file.

Before we move on to the mongodb container we need to try to build our Dockerfile. Here are the commands

docker rmi putridparrot --force
docker build -t putridparrot .

Note: These commands should be run from the folder containing our Dockerfile.

The first command will force remove any existing putridparrot image and the second command will then build the docker image.

docker-compose.yml

So we’ve created a Dockerfile which will be used to create our docker image, but we now want to create a docker-compose file which will run both our newly created image and a mongodb image. The two are linked by the depends_on option and by the name of our mongo service (which we used within the JAR execution command). Here’s the docker-compose.yml file

version: "3.1"

services:
  putridparrot:
    build: .
    restart: always
    ports: 
      - "8080:8080"
    depends_on:
      - db

  db:
    image: mongo
    volumes:
      - ./data:/data/db
    ports:
      - "27017:27017"
    restart: always

The first line simply sets the version of the docker-compose syntax, in this case 3.1. This is followed by the services which will be run by docker-compose. The first service listed is our JAR’s image. In fact we do not reference a prebuilt image; we rebuild it (if required) via the build option, which looks for a Dockerfile in the supplied folder (in this case we assume it’s in the same folder as the docker-compose.yml file). We then set up the port forwarding to the container. This service depends on a running mongodb, hence the depends_on option.

The next service is our mongodb image. As mentioned previously, the name here can be whatever you want, but to allow our other service to connect to it, the same name must be used within our JAR configuration. Think of it this way: this name is the hostname of the mongodb service, and docker will handle the name resolution between containers.
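To make the link between the service name and the connection string concrete, here is a minimal JavaScript sketch. The helper name mongoUri is hypothetical (it is not part of Docker, Spring or MongoDB); it only shows that the host part of the URI is the compose service name when connecting from another container, and the host machine's name plus the published port when connecting from outside.

```javascript
// Hypothetical helper: builds a MongoDB connection URI from a host name.
// Inside the compose network the host is the service name ("db");
// from the host machine it would be localhost plus the published port.
function mongoUri(host, port = 27017, database = "") {
  return `mongodb://${host}:${port}/${database}`;
}

const fromAppContainer = mongoUri("db");
const fromHostMachine = mongoUri("localhost", 27017);

console.log(fromAppContainer); // mongodb://db:27017/
console.log(fromHostMachine);  // mongodb://localhost:27017/
```

If the service were renamed in docker-compose.yml, only the first argument would need to change to match.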

Finally, we use the mongo image, expose the ports to allow access to the running instance and store the mongodb data on our host machine, which allows it to be reused when a new instance of this service is started.

Now we need to run docker-compose using

docker-compose up

If all goes well, this will (if required) build a new image of our JAR, then bring up the services. As the first service depends_on the second, the mongodb container is started before our application’s container. Note that depends_on only controls start order; it does not wait for mongodb to be ready to accept connections, so the application should tolerate the database being briefly unavailable at startup.

MongoDB revisited

As I’m busy setting up my Ubuntu server, I’m going to revisit a few topics that I’ve covered in the past, to see whether there are changes to working with various servers. Specifically I’ve gone Docker crazy and want to run these various servers in Docker.

First up, let’s see what we need to do to get a basic MongoDB installation up and running and the C# client to access it (it seems some things have changed since I last did this).

Getting the server up and running

First off we’re going to get the Ubuntu server set up with an instance of MongoDB. So let’s get the latest version of mongo for Docker

docker pull mongo:latest

This will simply download the latest version of the MongoDB image but not run it. So our next step is to run the MongoDB Docker instance. By default the port MongoDB uses is 27017, but this isn’t available to the outside world. So we’re going to want to map this to a port accessible to our client machine(s). I’m going to use port 28000 (there’s no specific reason for this port choice). Run the following command from Ubuntu

docker run -p 28000:27017 --name my-mongo -d mongo

We’ve mapped MongoDB to the previously mentioned port and named the instance my-mongo. This will run MongoDB in the background. We can now look to write a simple C# client to access the instance.

Interactive Shell

Before we proceed to the client, we might wish to set-up users etc. in MongoDB and hence run its shell. Now running the following

docker exec -t my-mongo mongo

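didn’t quite work as expected. Whilst I was placed inside the MongoDB shell, commands didn’t seem to run.

Note: When pressing enter, the shell seemed to think I was about to add another command. The likely cause is the missing -i flag: without it docker exec does not keep STDIN open, so typed input never reaches the shell. Using docker exec -it my-mongo mongo should behave as expected.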

To work with the shell I found it simpler to connect to the Docker instance using bash, i.e.

docker exec -it my-mongo bash

then run

mongo

to access the shell.

I’m not going to set up any users etc. at this point, we’ll just use the default setup.

Creating a simple client

Let’s fire up Visual Studio 2015 and create a console application. Then using NuGet add the MongoDB.Driver by MongoDB, Inc. Now add the following code to your Console application

using MongoDB.Bson;
using MongoDB.Driver;

public class Person
{
   public ObjectId Id { get; set; }
   public string FirstName { get; set; }
   public string LastName { get; set; }
   public int Age { get; set; }
}

class Program
{
   static void Main(string[] args)
   {
      // Connect to the MongoDB instance published by Docker on port 28000
      var client = new MongoClient("mongodb://xxx.xxx.xxx.xxx:28000");
      // The database and collection are created on first write
      var r = client.GetDatabase("MyDatabase");
      var collection = r.GetCollection<Person>("People");
      collection.InsertOne(new Person
      {
         FirstName = "Scooby",
         LastName = "Doo",
         Age = 27
      });
   }
}

Obviously replace the xxx.xxx.xxx.xxx with the IP address of your server (in my case my Ubuntu server box); the port matches the port we exposed via Docker. You don’t need to “create” the database explicitly via the shell or a command, you can just run this code and it’ll create MyDatabase, then the collection People, and then insert a document.

Did it work?

Hopefully your Console application just inserted a record. There should have been no timeout or other exception. Of course we can also use the Console application to read the data back, for example

var client = new MongoClient("mongodb://xxx.xxx.xxx.xxx:28000");
var r = client.GetDatabase("MyDatabase");
var collection = r.GetCollection<Person>("People");
foreach (var p in collection.FindSync(_ => true).ToList())
{
   Console.WriteLine($"{p.FirstName} {p.LastName}");                
}

I’m using the synchronous methods to find and create the list, solely because my Console application is obviously pretty simple, but the MongoDB driver library offers Async versions of these methods as well.

The above code will write out Scooby Doo as the only entry in our DB, so all worked fine. How about we do the same thing using the shell.

Switch back to the server and, if it’s not already running, start the MongoDB shell as previously outlined. From the shell run the following

use MyDatabase
db.People.find()

We should now see a single entry

{ 
  "_id" : ObjectId("581d9c5065151e354837b8a5"), 
  "FirstName" : "Scooby", 
  "LastName" : "Doo", 
  "Age" : 27 
}

Just remember, we didn’t set this instance of MongoDB up to use a Docker Volume and hence when you remove the Docker instance the data will disappear.

So let’s quickly revisit the command to run MongoDB within Docker and fix this. First off exit back to the server’s prompt (i.e. out of the Mongo shell and out of the Docker bash instance).

Now stop my-mongo using

docker stop my-mongo

You can restart mongo at this point using

docker start my-mongo

and your data will still exist, but if you run the following after stopping the mongo instance

docker rm my-mongo

and run a new Mongo instance, the data will have gone. To fix this we add a volume option to the command line, so we execute the following

docker run -p 28000:27017 -v mongodb:/data/db --name my-mongo -d mongo

The inclusion of -v maps the directory in which the mongo image stores its data (/data/db) to a volume on the local machine named mongodb. By default this is created in /var/lib/docker/volumes, but of course you could supply a path to an alternate location.

Remember, at this point we’re still using default security (i.e. none). I will probably create a post on setting up mongo security in the near future.

Mongodb Replication & Recovery

In a non-trivial system, we’d normally look to have three types of database set up. A primary is set up as a writeable database, one or more secondary databases are set up as read-only databases, and finally an arbiter is set up to help decide which secondary database takes over should the primary database go down.

Note: An arbiter is added to stop tied votes when deciding a secondary to take over as primary and thus should only be used where an even number of instances of mongodb exists in a replication set.
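The arithmetic behind this note can be sketched in a few lines of JavaScript. This is only an illustration, assuming every member has one vote and that electing a primary needs a strict majority of voting members; the function names majority and faultTolerance are made up for the example.

```javascript
// Votes needed to elect a primary: a strict majority of voting members.
function majority(votingMembers) {
  return Math.floor(votingMembers / 2) + 1;
}

// How many voting members can be lost while a primary can still be elected.
function faultTolerance(votingMembers) {
  return votingMembers - majority(votingMembers);
}

for (const n of [2, 3, 4, 5]) {
  console.log(`${n} voters: majority ${majority(n)}, tolerates ${faultTolerance(n)} failure(s)`);
}
```

Under these assumptions two data-bearing instances on their own tolerate no failures, whereas adding an arbiter as a third voter lets the set survive the loss of one member. Going from three voters to four gains nothing, which is why an odd number of voters is the aim.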

The secondary databases will be “eventually consistent” in that when data is written to the primary database it is not immediately replicated to the secondary databases, but will “eventually” be replicated.

Let’s look at an example replication set…

To set up a replication set, we would start with a minimum of three instances of, or machines running, mongodb. As previously mentioned, this replication set would consist of a primary database, a secondary database and an arbiter.

Let’s run three instances on a single machine to begin with, so we need to create three database folders, for example

mkdir MyData\database1
mkdir MyData\database2
mkdir MyData\database3

Obviously, if all three are running on the same machine, we need to give the mongodb instances their own ports, for example run the following commands each in their own command prompt

mongod --dbpath MyData\database1 --port 30000 --replSet "sample"
mongod --dbpath MyData\database2 --port 40000 --replSet "sample"
mongod --dbpath MyData\database3 --port 50000 --replSet "sample"

Note: the sample above, showing all databases on the same machine, is solely an example. No production system should implement this strategy; each instance of primary, secondary and arbiter should be run on its own machine.

“sample” denotes an arbitrary, user-defined name for our replication set. However the replication set still hasn’t been created at this point. We instead need to run the shell against one of the servers, for example

mongo --port 30000

Now we need to create the configuration for our replication set, for example

var sampleConfiguration =
{ _id : "sample", 
   members : [
     {_id : 0, host : 'localhost:30000', priority : 10 },
     {_id : 1, host : 'localhost:40000'},
     {_id : 2, host : 'localhost:50000', arbiterOnly : true } 
   ]
}

This sets up the replication set, stating the host on port 30000 is the primary (due to its priority being set, in this example). The host on port 40000 doesn’t have a priority (or arbiterOnly) so this is the secondary and finally we have the arbiter.

At this point we’ve created the configuration but we still need to actually initiate/run the configuration. So, again, from the shell we write

rs.initiate(sampleConfiguration)

Note: This can take a few minutes, as all the instances which make up the replication set are configured. Eventually the shell will return from the initiate call and should say “ok”.

The shell prompt should now change to show the replication set name and the state of the currently connected server (e.g. sample:PRIMARY>).

Now if we write data to the primary it will “eventually” be replicated to all secondary databases.

If we take the primary database offline (or worse still a fault occurs and it’s taken offline without our involvement) a secondary database will be promoted to become the primary database (in our example we only have one secondary, so this will take over as the primary). If/when the original primary comes back online, it will again become the primary database (as it has the highest priority in our configuration) and the secondary will return to being a secondary database.
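The failover behaviour described here can be modelled with a small JavaScript sketch. It is a deliberately simplified model and not MongoDB's actual election algorithm: the function expectedPrimary is hypothetical, it assumes every member has one vote, and it simply picks the reachable data-bearing member with the highest priority (members without a priority get the default of 1) provided a majority of members is reachable.

```javascript
// Same shape as the replication set configuration used above.
const config = {
  _id: "sample",
  members: [
    { _id: 0, host: "localhost:30000", priority: 10 },
    { _id: 1, host: "localhost:40000" },
    { _id: 2, host: "localhost:50000", arbiterOnly: true },
  ],
};

// Hypothetical, simplified model: returns the host expected to be primary
// given the reachable hosts, or null when no majority is reachable.
function expectedPrimary(cfg, reachableHosts) {
  const up = cfg.members.filter((m) => reachableHosts.includes(m.host));
  const votesNeeded = Math.floor(cfg.members.length / 2) + 1;
  if (up.length < votesNeeded) return null; // no majority, no election

  const candidates = up
    .filter((m) => !m.arbiterOnly) // arbiters vote but hold no data
    .sort((a, b) => (b.priority ?? 1) - (a.priority ?? 1)); // default priority is 1
  return candidates.length > 0 ? candidates[0].host : null;
}

const all = config.members.map((m) => m.host);
console.log(expectedPrimary(config, all)); // localhost:30000
console.log(expectedPrimary(config, ["localhost:40000", "localhost:50000"])); // localhost:40000
console.log(expectedPrimary(config, ["localhost:40000"])); // null
```

The last call shows why the arbiter matters: with only the secondary reachable there is no majority, so in this model nothing is promoted.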

Don’t forget you can use

rs.help()

to view help for the various replication commands.

MongoDB notes

This post is meant as a catch-all for those bits of information which do not need a full post (of course this doesn’t mean I won’t create a full post at a later date should the need arise).

Atomicity

Whilst MongoDB guarantees atomicity of a write operation at the single-document level, there is no such guarantee across multiple documents (unless the writes are wrapped in a multi-document transaction, which MongoDB supports from version 4.0).

So for example if one is creating a relational style model with related data in more than one document (i.e. not an embedded relational model) then writing to the multiple documents is NOT atomic by default.

Indexes

By default an index is created for the _id of a document. To create our own we use the ensureIndex method on a collection (deprecated in later versions of MongoDB in favour of createIndex).

The MongoDB documentation states that

  • Each index requires at least 8KB of data space
  • Whilst query performance on the indexed data is increased the write operation will see a negative performance impact – therefore indexes can be expensive if there’s a high write to read ratio on documents

Overriding Default functionality

As the mongo shell uses JavaScript we can use JavaScript code to override some default functionality.

Let’s say for example, we want to stop users accidentally dropping a database. We can override the dropDatabase function thus

DB.prototype.dropDatabase = function() {
   print("dropDatabase disabled");
}

db.dropDatabase = DB.prototype.dropDatabase;
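The mechanism at work is ordinary JavaScript prototype replacement, which can be tried outside the shell. The sketch below uses a stand-in DB constructor written for this example (it is not the shell's real DB type) and returns strings instead of calling print, since print is a shell function that does not exist in plain JavaScript runtimes such as Node.

```javascript
// Stand-in for the shell's DB type; NOT the real mongo shell implementation.
function DB(name) {
  this.name = name;
}
DB.prototype.dropDatabase = function () {
  return `dropped ${this.name}`;
};

const db = new DB("CDDatabase");

// Replace the method on the prototype, so every DB object picks it up.
DB.prototype.dropDatabase = function () {
  return "dropDatabase disabled";
};

console.log(db.dropDatabase()); // dropDatabase disabled
console.log(new DB("other").dropDatabase()); // dropDatabase disabled
```

In this stand-in, objects created before the override also see the new function, because the method is looked up on the prototype at call time.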

Loading scripts from the shell

Typing in code, such as the above override of the dropDatabase function every time, would end up being time consuming. Being that this is JavaScript it may come as no surprise to find out that the mongo shell can load script files.

To do so, use load('myscript.js');

Loading scripts at startup

In the above example, we’ve created a script which we might need to load every time we start mongo. So it’d be better if we could just load the script file automatically at startup.

So to load scripts at startup on a Windows machine, place the script file into c:\users\\.mongorc.js

Note: It’s also possible that we might want to ignore scripts at startup when debugging, in such a case we start mongo using

mongo --norc

More to come…

Indexing your MongoDB data

By default MongoDB creates an index on the _id (ObjectId) field, but we can easily add indexes to other fields.

Using the JavaScript shell

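In the JavaScript shell simply call ensureIndex on the collection, for example db.cds.ensureIndex({artist: 1}) for an ascending index on the artist field.

Using the 10gen drivers in C#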

In C# using the 10gen drivers we can create an index using the following

collection.EnsureIndex(new IndexKeysBuilder().Ascending("artist"));

where collection is

MongoCollection<CD> collection = db.GetCollection<CD>("cds");

To remove an index we can simply use

collection.DropIndex(new IndexKeysBuilder().Ascending("artist"));

Handling case-sensitive mapping from MongoDB to a POCO

So the convention appears to be to use camel case for column/field names within MongoDB. For example, we might create an entry using db.cds.insert({artist:"Alice Cooper"}).

In C# the convention is for properties, for example, to be written in Pascal case. So we’d have something like

public class CD
{
   public ObjectId Id { get; set; }
   public string Artist { get; set; }
   public string Title { get; set; }
   public string Category { get; set; }
}

So obviously MongoDB has a field name artist and we need to map it to the property name “Artist”.

To handle this we can use the BsonElement attribute in the MongoDB.Bson.Serialization.Attributes namespace, as per

public class CD
{
   public ObjectId Id { get; set; }
   [BsonElement("artist")]
   public string Artist { get; set; }
   [BsonElement("title")]
   public string Title { get; set; }
   [BsonElement("category")]
   public string Category { get; set; }
}

or we can set up the mappings using the following

BsonClassMap.RegisterClassMap<CD>(cm =>
{
   cm.AutoMap();
   cm.GetMemberMap(c => c.Artist).SetElementName("artist");
   cm.GetMemberMap(c => c.Title).SetElementName("title");
   cm.GetMemberMap(c => c.Category).SetElementName("category");
});

Note: we do not need to set up any mapping for the Id field, as the driver maps a member named Id to the document’s _id by convention.

A class map may only be registered once; we can use BsonClassMap.IsClassMapRegistered if need be to ensure this.
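Conceptually, both approaches set up a table of property names and element names that is applied in each direction. The JavaScript sketch below illustrates that idea only; the names elementNames, toDocument and fromDocument are invented for the example and have nothing to do with how the C# driver is implemented.

```javascript
// Hypothetical mapping table: C#-style property name -> MongoDB element name.
const elementNames = { Artist: "artist", Title: "title", Category: "category" };

// Rename keys on the way into the database...
function toDocument(poco) {
  const doc = {};
  for (const [property, value] of Object.entries(poco)) {
    doc[elementNames[property] ?? property] = value;
  }
  return doc;
}

// ...and back again when reading.
function fromDocument(doc) {
  const byElement = Object.fromEntries(
    Object.entries(elementNames).map(([property, element]) => [element, property])
  );
  const poco = {};
  for (const [element, value] of Object.entries(doc)) {
    poco[byElement[element] ?? element] = value;
  }
  return poco;
}

const stored = toDocument({ Artist: "Alice Cooper", Title: "Billion Dollar Babies", Category: "Rock" });
console.log(Object.keys(stored)); // [ 'artist', 'title', 'category' ]
console.log(fromDocument(stored).Artist); // Alice Cooper
```

Names missing from the table pass through unchanged in this sketch.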

More information can be found at Serialize Documents with the CSharp Driver

Starting out with MongoDB

This is a post on some of the basics of using MongoDB (on Windows).

mongod.exe

So to run the mongo server we use mongod.exe. If all goes well the server will start up and state it’s waiting for connections.

mongo.exe

The JavaScript shell used as the client admin application for the mongo server is named mongo.exe.

Shell Commands

  1. use <database> is used to switch the shell to using the named <database>. If this command is not used, by default the test database will be used.
  2. show dbs can be used to get a list of all the databases stored in the instance of MongoDB.
  3. show collections lists the available collections within the selected database. This is analogous to listing the tables in an SQL database.
  4. show users lists the users added to the database. To add a user via the JavaScript Shell we can use the following
    db.addUser({user: "UserName", pwd: "Password", roles: ["readWrite", "dbAdmin"]})
    

    Beware: the roles are not validated against any schema or role list, so you can assign role names which are not recognised as specific mongo roles, e.g. I made a typo for dbAdmin and it was accepted as a role.

  5. db is the database object, so we can execute commands such as db.stats()
  6. db.myobjects: let’s assume we created a database and added some objects into a collection named myobjects. Because we’re using a JavaScript shell we can access the myobjects collection off of the db object. So for example we can now use db.myobjects.count() to get a count of the number of “rows” in the collection. Equally we can use commands such as db.myobjects.find() to list the rows in the collection.

Example workflow

This is a simple example of using mongo.exe to create a database and some data, then querying the database.

For this example we’ll create a CD database which will be used to contain information on all the CDs we own.

  1. If it’s not already running, run mongod.exe
  2. In a separate command prompt, run mongo.exe
  3. We’ll create the database by typing use CDDatabase, remember it will not be fully created until some data is added to it
  4. We can check which database we’re currently viewing by typing db. This will show us we’re in the CDDatabase, but if you now type show dbs you’ll notice the database still doesn’t actually exist in the list of available databases
  5. We can now create our first entry in the database. Notice, unlike an SQL database we do not create tables etc. So to create an entry we can type the following
    a = {artist:"Alice Cooper", title:"Billion Dollar Babies", category:"Rock"}
    db.cds.insert(a)
    

    Note: we don’t need to create a variable as we’ve done here, we can just replace ‘a’ in the insert method with the right hand side of the = operator as per

    db.cds.insert({artist:"Alice Cooper", title:"Billion Dollar Babies", category:"Rock"})
    

    So we should have now successfully added an entry into the “cds” collection. Notice again we do not need to define the “cds” collection first, it’s dynamically created when we start using it.

    We can use the command show collections to display a list of the collections that make up the database.

    Feel free to add more data if you wish.

  6. Let’s now see what’s in the cds collection. To do this simply type db.cds.find(). You should now see all the entries within the “cds” collection that you created. Notice that a field _id of type ObjectId has been added to our data.

    To find a single entry (in this example we’ll find the artist by name) we simply use db.cds.find({artist:"Alice Cooper"}).

  7. So as part of the standard CRUD operations on data, we’ve seen how to Create and Read data, now let’s look at updating data. Run the following command
    db.cds.insert({artist:"Iron Maiden", title:"Blood Brothers"})
    

    Now I’ve left off the category here, so we now need to update this entry. We use the collection’s update method, thus

    db.cds.update({artist:"Iron Maiden"}, {$set:{category:"Heavy Metal"}})
    

    The first argument in the update method, {artist:"Iron Maiden"}, is the query, in other words update where artist is “Iron Maiden”. The second argument is the update action, which simply tells update to set the category to “Heavy Metal”.

    If we run this command we’ll see the category added, but this command works only on the first item found, so if there were multiple “Iron Maiden” entries missing the category this would be a bit of a problem. However we can apply a third argument to update to solve this.

    db.cds.update({category:"Rock"}, {$set:{category:"Heavy Rock"}},{multi:true})
    

    The addition of {multi:true} will allow the update to be applied to all matching entries.

  8. Now to the final CRUD operation – Delete. This is simply a case of using the collection’s remove method. For example
    db.cds.remove({artist:"Alice Cooper"})
    

    Be careful not to forget the query part of this method call, the {artist:"Alice Cooper"}, as an empty query, db.cds.remove({}), will remove all entries from the collection (older versions of the shell treat db.cds.remove() the same way).
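The first-match versus multi behaviour of update, and the effect of an empty query on remove, can be illustrated without a server. The JavaScript sketch below is an in-memory stand-in written for this post, not MongoDB's implementation; the functions matches, update and remove are hypothetical, only equality queries and $set-style updates are modelled, and the second Alice Cooper row is made-up sample data.

```javascript
// In-memory stand-in for a collection; not the MongoDB implementation.
const cds = [
  { artist: "Alice Cooper", title: "Billion Dollar Babies", category: "Rock" },
  { artist: "Alice Cooper", title: "School's Out", category: "Rock" },
  { artist: "Iron Maiden", title: "Blood Brothers" },
];

// True when every field in the query equals the same field in the document.
const matches = (doc, query) =>
  Object.entries(query).every(([field, value]) => doc[field] === value);

// Mimics update(query, {$set: ...}, {multi}): first match only unless multi is true.
function update(docs, query, set, { multi = false } = {}) {
  let modified = 0;
  for (const doc of docs) {
    if (!matches(doc, query)) continue;
    Object.assign(doc, set);
    modified++;
    if (!multi) break;
  }
  return modified;
}

// Mimics remove(query): returns the documents that survive.
function remove(docs, query) {
  return docs.filter((doc) => !matches(doc, query));
}

console.log(update(cds, { category: "Rock" }, { category: "Heavy Rock" })); // 1
console.log(update(cds, { category: "Rock" }, { category: "Heavy Rock" }, { multi: true })); // 1
console.log(remove(cds, {}).length); // 0, an empty query matches everything
```

The first call changes only the first matching entry, so the second call still finds one “Rock” entry left to change.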

Creating a simple database in MongoDB with C#

This is a quick post on getting up and running with MongoDB using the “Official MongoDB C# driver” and both creating a database and adding some data to it.

  1. Using Nuget, add the “Official MongoDB C# driver” from 10gen to your project
  2. Add the following using clauses
    using MongoDB.Bson;
    using MongoDB.Driver;
    
  3. Create a POCO object to represent the data you want to persist. The key thing to remember is to add a property of type ObjectId; without this we’ll get an exception stating “No IdGenerator found”.

    So an example POCO might look like this

    public class Person
    {
       public ObjectId Id { get; set; }
       public string FirstName { get; set; }
       public string LastName { get; set; }
       public int Age { get; set; }
    }
    
  4. Now we need to connect to the server and create/use a database.

    MongoClient client = new MongoClient();
    MongoServer server = client.GetServer();
    MongoDatabase db = server.GetDatabase("MyDatabase");
    

    Obviously the first two lines create a mongo client and then get access to the server. The line server.GetDatabase("MyDatabase") will get the database if it exists, or create it if it doesn’t.

    Note: if you are creating a database using GetDatabase it will not exist until you actually store data in it.

  5. Next we’re going to assume we want to store a collection of employees (a collection of Person objects). So we want to get the collection of “employees”. As with the creation of the database, if no employees currently exist we still get a collection object which we can then save data to.

    MongoCollection<Person> collection = db.GetCollection<Person>("employees");
    
  6. Let’s now create a Person object ready for adding to the collection and ultimately to the database.

    Create the following

    Person p = new Person
    {
       Id = ObjectId.GenerateNewId(),
       FirstName = "Bob",
       LastName = "Baker",
       Age = 36
    };
    

    Notice that we generate the Id using ObjectId.GenerateNewId().

  7. Our next step is to save the new Person to the collection. This will add the data to the collection and thus the database, so we can then query for this data using the JavaScript shell.

    collection.Save(p);