Thursday, November 29, 2012

Let's shard something

It's been a while so, let's get right to it...

Today I'll be talking about sharded clusters in MongoDB using Ubuntu Precise and Juju.

According the the mongodb documentation found on their website, one way of deploying a Shard Cluster is as follows:
  • deploy config servers
  • deploy a mongo shell (mongos)
  • deploy shards
  • connect the config servers to the mongo shell
  • add the shards to the mongo shell
This is a pretty involved process that normally takes several hours to accomplish.  In this post, I will show you a way of deploying a sharded cluster in minutes using UbuntuJuju and the MongoDB charm.

For the impatient, here is a video of the entire deployment:



For the not so impatient, here is a more detailed explanation of the deployment.

A few things to keep in mind when configuring this sharded cluster configuration:
  • Each shard will be a replica set with three nodes each.  As such, each shard must have a different replica-set name so it can be successfully registered as part of the cluster.
  • We will deploy three configuration servers
Bootstrap the environment
juju bootstrap



Mongo Shell
juju deploy mongodb mongos



As described in the MongoDB charm, there are many options that can be configured to suit your needs.  For this deployment, most of the configuration options will work with the exception of the replicaset which will need to be different for each shard.

Fortunately, a simple yaml file with our configuration overrides is all that's needed.

Prepare a configuration file similar to the following:
shard1:
  replicaset: shard1
shard2:
  replicaset: shard2
shard3:
  replicaset: shard3
configsvr:
  replicaset: configsvr

We'll save this one as ~/mongodb-shard.yaml



Config Servers ( we'll deploy 3 of them )
juju deploy mongodb configsvr --config ~/mongodb-shard.yaml -n3




Shards ( We'll deploy three replica-sets )
juju deploy mongodb shard1 --config ~/mongodb-shard.yaml -n3
juju deploy mongodb shard2 --config ~/mongodb-shard.yaml -n3
juju deploy mongodb shard3 --config ~/mongodb-shard.yaml -n3




Connect the Config Servers to the Mongo shell (mongos)
juju add-relation mongos:mongos-cfg configsvr:configsvr




Connect each Shard to the Mongo shell (mongos)
juju add-relation mongos:mongos shard1:database
juju add-relation mongos:mongos shard2:database
juju add-relation mongos:mongos shard3:database




With the above commands, we should now have a three replica-set sharded cluster running.
Using the default configuration, here are some details of our sharded cluster:
  • mongos is running on port 27021
  • configsvr is running on port 27019
  • the shards are running on the default mongodb port of 27017
  • The web admin is turned on by default and accessible with your browser on port 28017 on each of the shards after exposing the service.



After a few minutes, your sharded cluster should be ready.  Let's verify that everything went as planned:
  • Verify your config servers
    • juju expose configsvr
    • juju status configsvr
    • Open your browser to http://<public-address-of-configsvr:28017
  • Verify that each shard
    • juju expose <shard1|shard2|shard3>
    • juju status <shard1|shard2|shard3>
    • Open your browser to http://<public-address-of-shard>:28017
  • Verify that each shard has been successfully register with the cluster:
    • juju expose mongos
    • juju status mongos
    • mongo --host <public-address-of-mongos>:27021
    • Once connected:
      • sh.status()
      • Results should be similar to the following:
MongoDB shell version: 2.0.6
connecting to: ec2-184-169-254-0.us-west-1.compute.amazonaws.com:27021/test
mongos> sh.status()
--- Sharding Status --- 
  sharding version: { "_id" : 1, "version" : 3 }
  shards:
        {  "_id" : "shard1",  "host" : "shard1/ec2-184-169-236-25.us-west-1.compute.amazonaws.com:27017,ec2-54-241-89-206.us-west-1.compute.amazonaws.com:27017,ip-10-170-173-51:27017" }
        {  "_id" : "shard2",  "host" : "shard2/ec2-204-236-159-194.us-west-1.compute.amazonaws.com:27017,ip-10-170-22-104:27017" }
        {  "_id" : "shard3",  "host" : "shard3/ec2-184-169-219-11.us-west-1.compute.amazonaws.com:27017,ec2-184-169-239-214.us-west-1.compute.amazonaws.com:27017,ip-10-170-215-20:27017" }
  databases:
        {  "_id" : "admin",  "partitioned" : false,  "primary" : "config" }

mongos> 
      • In the above status, you should see the hosts associated with each of your shards.

You now have a MongoDB sharded cluster accessible via the public-address of the mongos instance... enjoy ... and don't forget to destroy the environment when you're done with it:

juju destroy-environment




-Juan