Every now and then I start asking some of the people I know and work with about their backup strategies. I thought it was time for me to share my backup strategy.
I use several systems throughout my house, but they all follow the same basic strategy:
daily backups of each computer to a storage device in my home network
daily backups of my local storage device to a remote location
In the following paragraphs, I will provide more details on how I currently back up my main laptop.
My main computer uses Déjà Dup to back up my home directory and saves to a shared network drive via SSH. This gives me the flexibility of backing up to my drive ( at home ) from anywhere I have network connectivity.
Let's go through this in a bit more detail:
Backup Drive
First things first. Let's get a drive on which to back up our data. I have a 2TB USB drive attached to one of my servers that I make available via SSH to my local network.
Here are the steps that I took to get the drive connected, partitioned, formatted, mounted and available:
Connect drive to any available USB port on your server.
Make sure to turn on the power to your USB drive :)
On my server, the USB drive showed up as /dev/sdd1 as there is an existing partition there
Let's partition the drive with cfdisk ( Please read the manual for cfdisk BEFORE doing anything with cfdisk. )
sudo cfdisk /dev/sdd ( notice that I am working with the entire drive here and not just the partition )
Delete any partitions on the drive and create one partition ( to be formatted as ext4 ) that uses the entire drive.
Use the menu items at the bottom to delete the existing partitions.
When you create a new partition, cfdisk defaults to using all available space; keep those defaults.
The Linux partition type is 83 ( cfdisk should default to this, but just in case you need it ).
Make sure to Write the changes to disk
At this point, you should have a new partition on your backup drive:
In my case, my USB drive was recognized as /dev/sdd.
I created one partition on that drive.
The above tells me that my partition should be /dev/sdd1
Let's format our partition to get things going:
sudo mkfs.ext4 /dev/sdd1
At last! We have a partition ready for use. Let's mount it ( mine is mounted on /media/Backup1 ):
sudo mkdir -p /media/Backup1
sudo mount /dev/sdd1 /media/Backup1
If you've made it this far, you should now have a new "mount point" where you can access your backup drive. Let's verify that by checking what partitions have been mounted:
df -h
The backup drive should be listed there as one of the partitions. Here is what the relevant part of mine looks like:
/dev/sdd1 1.8T 466G 1.3T 27% /media/Backup1
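If you ever want to script that check ( say, before a backup job starts writing ), the df output can be scanned for the mount point. Here is a minimal sketch; has_mount is a made-up helper name, and it assumes mount points without spaces:

```shell
#!/bin/sh
# has_mount: read `df -P` output on stdin and succeed if the given
# mount point appears in the last column of any data line.
# Hypothetical helper; assumes mount points contain no spaces.
has_mount() {
    awk -v target="$1" '
        NR > 1 && $NF == target { found = 1 }   # skip the header line
        END { exit !found }                     # exit 0 only when found
    '
}

# Usage (not run here):
#   df -P | has_mount /media/Backup1 && echo "backup drive is mounted"
```

Reading the df output from standard input keeps the helper easy to try out against saved output before pointing it at a live system.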
The last part of getting the backup drive ready is to ensure that it can be accessed from your network. In my case, I have a username on all of the boxes that can access the backup server via SSH key. I also ensure that /media/Backup1 is owned by this username:
sudo chown -R negronjl:negronjl /media/Backup1
( Feel free to use whatever username you want for the above )
Déjà Dup
I use Ubuntu on my systems, so I already have Déjà Dup installed and only need to configure a few things to get going.
Open the dash ( Super key ) and type backup. Déjà Dup will be in that list under the name of Backup.
Click on Backup.
My Déjà Dup configuration looks something like:
Storage tab:
Backup location: SSH
Server: <my backup server>
Port: 22
Folder: /media/Backup1/DejaDup
Username: <negronjl> ( use whatever username you used in the previous step )
The Folders tab:
Folders to back up: <home directory>
Folders to ignore: Trash, Downloads
In the above settings, feel free to add any other folder that needs to be backed up, and to ignore any folder that is already being backed up elsewhere or that you don't need.
Schedule:
How often to back up: Daily
Keep backups: At least 28 days ( This can be changed )
Close the window ... everything should be up and running now.
Backupninja
I use backupninja to back up my entire machine in case of catastrophic failure. Here is how to set up and configure backupninja:
sudo apt-get install backupninja
This is what my backupninja configuration file looks like:
when = everyday at 01
options = --s3-use-new-style
nicelevel = 19
testconnect = no
tmpdir = /home/backupninja/tmp
[gpg]
sign = yes
encryptkey = <your GPG key>
password = <your GPG key password>
[source]
include = /etc
include = /home
include = /opt
include = /usr/local
include = /root
exclude = /home/*/.gvfs
exclude = /home/*/.Private
exclude = /home/.ecryptfs
[dest]
incremental = yes
increments = 30
keep = 60
desturl = rsync://<backup_username>@<backup_server>//media/Backup1/Backups/backupninja/negronjl-laptop/negronjl-laptop
awsaccesskeyid = <YOUR_AWS_ACCESS_KEY_ID>
awssecretaccesskey = <YOUR_AWS_SECRET_KEY>
Adapt the above file to suit your needs ( usernames, passwords and such ) and save it to /etc/backup.d as something like <number>-<name>.dup, where <number> determines the order of execution of the files in the /etc/backup.d/ directory and <name> is an identifying name for the file ( something that tells you what the file is/does ). As an example, my backupninja configuration file is named:
Today I'll be talking about sharded clusters in MongoDB using Ubuntu Precise and Juju.
According to the MongoDB documentation found on their website, one way of deploying a sharded cluster is as follows:
deploy config servers
deploy a mongos instance ( the query router; not to be confused with the mongo shell )
deploy shards
connect the config servers to mongos
add the shards to mongos
This is a pretty involved process that normally takes several hours to accomplish. In this post, I will show you a way of deploying a sharded cluster in minutes using Ubuntu, Juju and the MongoDB charm.
For the impatient, here is a video of the entire deployment:
For the not so impatient, here is a more detailed explanation of the deployment.
A few things to keep in mind when configuring this sharded cluster:
Each shard will be a three-node replica set. As such, each shard must have a different replica-set name so it can be successfully registered as part of the cluster.
We will deploy three configuration servers
Bootstrap the environment:
juju bootstrap
Deploy mongos:
juju deploy mongodb mongos
As described in the MongoDB charm, there are many options that can be configured to suit your needs. For this deployment, the defaults will work for most of them, with the exception of replicaset, which needs to be different for each shard.
Fortunately, a simple yaml file with our configuration overrides is all that's needed.
Prepare a configuration file similar to the following:
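As a rough illustration of the shape such a file can take, here is a minimal sketch that prints one. It assumes the shards will be deployed as services named shard1, shard2 and shard3 ( the names used in the relation commands ), that the charm option is called replicaset as mentioned above, and that each replica set simply reuses its service name. The helper name shard_config and the output file name are made up for this example.

```shell
#!/bin/sh
# Print a juju config file that gives every shard its own replica-set name.
# Assumptions: one top-level key per service name, and a charm option
# called "replicaset"; the replica-set name reuses the service name.
shard_config() {
    for shard in "$@"; do
        printf '%s:\n  replicaset: %s\n' "$shard" "$shard"
    done
}

# Usage (not run here):
#   shard_config shard1 shard2 shard3 > mongodb-sharding.yaml
```

The resulting file would then be handed to juju deploy through its --config option when each shard service is deployed.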
Connect the config servers to mongos:
juju add-relation mongos:mongos-cfg configsvr:configsvr
Connect each shard to mongos:
juju add-relation mongos:mongos shard1:database
juju add-relation mongos:mongos shard2:database
juju add-relation mongos:mongos shard3:database
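With more shards, the per-shard relation commands are easier to drive from a loop. A minimal sketch, assuming the shard services are already deployed under the names you pass in; relate_shards is a hypothetical helper name:

```shell
#!/bin/sh
# Relate every named shard service to mongos, one add-relation call each.
# Stops at the first failure so a typo in a service name is noticed early.
relate_shards() {
    for shard in "$@"; do
        juju add-relation mongos:mongos "$shard:database" || return 1
    done
}

# Usage (not run here): relate_shards shard1 shard2 shard3
```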
With the above commands, we should now have a three replica-set sharded cluster running.
Using the default configuration, here are some details of our sharded cluster:
mongos is running on port 27021
configsvr is running on port 27019
the shards are running on the default mongodb port of 27017
The web admin is turned on by default and accessible with your browser on port 28017 on each of the shards after exposing the service.
After a few minutes, your sharded cluster should be ready. Let's verify that everything went as planned:
Verify your config servers
juju expose configsvr
juju status configsvr
Open your browser to http://<public-address-of-configsvr>:28017
Verify each shard
juju expose <shard1|shard2|shard3>
juju status <shard1|shard2|shard3>
Open your browser to http://<public-address-of-shard>:28017
Verify that each shard has been successfully registered with the cluster:
juju expose mongos
juju status mongos
mongo --host <public-address-of-mongos>:27021
Once connected:
sh.status()
Results should be similar to the following:
MongoDB shell version: 2.0.6
connecting to: ec2-184-169-254-0.us-west-1.compute.amazonaws.com:27021/test
In the above status, you should see the hosts associated with each of your shards.
You now have a MongoDB sharded cluster accessible via the public-address of the mongos instance... enjoy ... and don't forget to destroy the environment when you're done with it: juju destroy-environment
A while back I started experimenting with Juju and was intrigued by the notion of services instead of machines.
A bit of background on Juju from their website:
Formerly called Ensemble, juju is DevOps Distilled™. Through the use of charms (renamed from formulas), juju provides you with shareable, re-usable, and repeatable expressions of DevOps best practices. You can use them unmodified, or easily change and connect them to fit your needs. Deploying a charm is similar to installing a package on Ubuntu: ask for it and it’s there, remove it and it’s completely gone.
I come from a DevOps background and know first hand the troubles and tribulations of deploying production services, webapps, etc. One that's particularly "thorny" is hadoop.
To deploy a hadoop cluster, we would need to download the dependencies ( java, etc. ), download hadoop, configure it and deploy it. This process is somewhat different depending on the type of node that you're deploying ( i.e. namenode, job-tracker, etc. ). It is a multi-step process that requires too much human intervention, and it is difficult to automate and reproduce. Imagine a 10, 20 or 50 node cluster deployed using this method. It can get frustrating quickly, and it is prone to mistakes.
With this experience in mind ( and a lot of reading ), I set out to deploy a hadoop cluster using a Juju charm.
First things first, let's install Juju. Follow the Getting Started documentation on the Juju site here.
According to the Juju documentation, we just need to follow some file naming conventions for what they call "hooks" ( executable scripts in your language of choice that perform certain actions ). These "hooks" control the installation, relationships, start, stop, etc. of your charm. We also need to describe the charm in a file called metadata.yaml. The metadata.yaml file describes the charm, its interfaces, and what it requires and provides, among other things. More on this file later when I show you the one for hadoop-master and hadoop-slave.
Armed with a bit of knowledge and a desire for simplicity, I decided to split the hadoop cluster in two:
hadoop-master (namenode and jobtracker )
hadoop-slave ( datanode and tasktracker )
I know this is not an all-encompassing list, but it will take care of a good portion of deployments, and the Juju charms are easy enough to modify that you can work your changes into them.
One of my colleagues, Brian Thomason, did a lot of packaging for these charms, so my job is now easier. The configuration for the packages has been distilled down to three questions:
namenode ( leave blank if you are the namenode )
jobtracker ( leave blank if you are the jobtracker )
hdfs data directory ( leave blank to use the default: /var/lib/hadoop-0.20/dfs/data )
Due to the magic of Ubuntu packaging, we can even "preseed" the answers to those questions to avoid being asked about them ( and stopping the otherwise automatic process ). We'll use the utility debconf-set-selections for this. Here is a piece of the code that I use to preseed the values in my charm:
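The general shape of such preseeding can be sketched as follows. This is only an illustration: debconf-set-selections reads lines of the form "package question type value", but the question names used below ( hadoop/namenode and so on ) are placeholders, not the ones the hadoop packages actually define, and preseed_hadoop is a made-up helper name.

```shell
#!/bin/sh
# Emit debconf preseed lines in the "package question type value" format
# that debconf-set-selections reads from standard input.
# The question names are placeholders; check the package's debconf
# templates for the real ones before using anything like this.
preseed_hadoop() {
    pkg=$1          # package that owns the questions
    namenode=$2     # namenode address
    jobtracker=$3   # jobtracker address
    datadir=$4      # hdfs data directory
    printf '%s hadoop/namenode string %s\n' "$pkg" "$namenode"
    printf '%s hadoop/jobtracker string %s\n' "$pkg" "$jobtracker"
    printf '%s hadoop/hdfsdatadir string %s\n' "$pkg" "$datadir"
}

# Usage (not run here, needs root):
#   preseed_hadoop hadoop-0.20-namenode 10.0.0.1 10.0.0.1 \
#       /var/lib/hadoop-0.20/dfs/data | debconf-set-selections
```

Because the answers are stored before the package is installed, the install hook can then run apt-get without stopping to ask questions.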
Thanks to Brian's work, I now just have to install the packages ( hadoop-0.20-namenode and hadoop-0.20-jobtracker). Let's put all of this together into a Juju charm.
Create a directory for the hadoop-master charm ( mkdir hadoop-master )
Make a directory for the hooks of this charm ( mkdir hadoop-master/hooks )
Let's start with the always needed metadata.yaml file ( hadoop-master/metadata.yaml ):
ensemble: formula
name: hadoop-master
revision: 1
summary: Master Node for Hadoop
description: |
  The Hadoop Distributed Filesystem (HDFS) requires one unique server, the
  namenode, which manages the block locations of files on the
  filesystem. The jobtracker is a central service which is responsible
  for managing the tasktracker services running on all nodes in a
  Hadoop Cluster. The jobtracker allocates work to the tasktracker
  nearest to the data with an available work slot.
provides:
  hadoop-master:
    interface: hadoop-master
Every Juju charm has an install script ( in our case: hadoop-master/hooks/install ). This is an executable file in your language of choice that Juju will run when it's time to install your charm. Anything and everything that needs to happen for your charm to install, needs to be inside of that file. Let's take a look at the install script of hadoop-master:
#!/bin/bash
# Here do anything needed to install the service
# i.e. apt-get install -y foo or bzr branch http://myserver/mycode /srv/webroot
There are a few other files that we need to create ( start and stop ) to get the hadoop-master charm installed. Let's see those files:
start
#!/bin/bash
# Here put anything that is needed to start the service.
# Note that currently this is run directly after install
# i.e. 'service apache2 start'
set -x
service hadoop-0.20-namenode status && service hadoop-0.20-namenode restart || service hadoop-0.20-namenode start
service hadoop-0.20-jobtracker status && service hadoop-0.20-jobtracker restart || service hadoop-0.20-jobtracker start
stop
#!/bin/bash
# This will be run when the service is being torn down, allowing you to disable
# it in various ways..
# For example, if your web app uses a text file to signal to the load balancer
# that it is live... you could remove it and sleep for a bit to allow the load
# balancer to stop sending traffic.
# rm /srv/webroot/server-live.txt && sleep 30
set -x
juju-log "stop script"
service hadoop-0.20-namenode stop
service hadoop-0.20-jobtracker stop
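One thing to be aware of with the status && restart || start one-liner in the start hook: if the restart itself fails, the start command runs as well. Where that matters, the same decision can be spelled out with an explicit if/else. A minimal sketch; start_or_restart is a made-up helper name:

```shell
#!/bin/sh
# Restart the service if it is already running, otherwise start it.
# Unlike "status && restart || start", a failed restart does not fall
# through to a second start attempt.
start_or_restart() {
    svc=$1
    if service "$svc" status >/dev/null 2>&1; then
        service "$svc" restart
    else
        service "$svc" start
    fi
}

# Usage (not run here):
#   start_or_restart hadoop-0.20-namenode
#   start_or_restart hadoop-0.20-jobtracker
```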
Let's go back to the metadata.yaml file and examine it in more detail:
ensemble: formula
name: hadoop-master
revision: 1
summary: Master Node for Hadoop
description: |
  The Hadoop Distributed Filesystem (HDFS) requires one unique server, the
  namenode, which manages the block locations of files on the
  filesystem. The jobtracker is a central service which is responsible
  for managing the tasktracker services running on all nodes in a
  Hadoop Cluster. The jobtracker allocates work to the tasktracker
  nearest to the data with an available work slot.
provides:
  hadoop-master:
    interface: hadoop-master
The emphasized section ( provides ) tells Juju that this charm provides an interface named hadoop-master that can be used in relationships with other charms ( in our case, we'll be using it to connect the hadoop-master with the hadoop-slave charm that we'll be writing a bit later ). For this relationship to work, we need to let Juju know what to do ( more detailed information about relationships in charms can be found here ).
Per the Juju documentation, we need to name our relationship hook hadoop-master-relation-joined, and it should also be an executable script in your language of choice. Let's see what that file looks like:
#!/bin/sh
# This must be renamed to the name of the relation. The goal here is to
# affect any change needed by relationships being formed
# This script should be idempotent.
set -x
juju-log "joined script started"
# Calculate our IP Address
IP_ADDRESS=`unit-get private-address`
# Preseed our Namenode, Jobtracker and HDFS Data directory
juju add-relation hadoop-slave hadoop-master # ( connects the hadoop-slave to the hadoop-master )
As you can see, once you have the charm written and tested, deploying the cluster is really a matter of a few commands. The above example gives you one hadoop-master ( namenode, jobtracker ) and one hadoop-slave ( datanode, tasktracker ).
To add another node to this existing hadoop cluster, we run:
juju add-unit hadoop-slave # ( this adds one more slave )
Run the above command multiple times to continue to add hadoop-slave nodes to your cluster.
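If you need more than a couple of extra nodes, the repetition can be wrapped in a loop. A minimal sketch; add_slaves is a hypothetical helper name and the count is whatever you pass in:

```shell
#!/bin/sh
# Call "juju add-unit hadoop-slave" the requested number of times,
# stopping at the first failure.
add_slaves() {
    count=$1
    i=0
    while [ "$i" -lt "$count" ]; do
        juju add-unit hadoop-slave || return 1
        i=$((i + 1))
    done
}

# Usage (not run here): add_slaves 5
```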
Juju allows you to catalog the steps needed to get your service/application installed, configured and running properly. Once your knowledge has been captured in a Juju charm, it can be re-used by you or others without much knowledge of what's needed to get the application/service running.
In the DevOps world, this code re-usability can save time, effort and money by providing self-contained charms that provide a service or application.
The video version ( better than the not so short version above but not enough details )
The details ( pretty much all of them )
This deployment will be a bit different as we'll be deploying everything in Ubuntu 11.10 ( Oneiric ) and we'll be using large instances in Amazon EC2. Let's get started...
Copy the above code and save it somewhere on your computer ( I named mine ubuntu-latest-image and saved it in ~/bin so it's in my PATH ). The script depends on wget, grep, sed and awk, all of which are available in the repositories and more than likely already installed on your system. Just in case you don't have them installed, run sudo apt-get install wget grep sed gawk ( awk itself is provided by a package such as gawk or mawk ).
Execute the script as follows:
ubuntu-latest-image oneiric m1.large
It should return something like the following:
Release: oneiric
Size: m1.large
Image ID: ami-d131f2b8
We now need to modify our default juju configuration so we can deploy large (m1.large) oneiric ( ami-d131f2b8 ) instances.
Edit juju's configuration file (~/.juju/environments.yaml) and make it look something like this:
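As a rough sketch of the kind of settings involved, the helper below prints an environments.yaml for a given release, instance size and image. It assumes an EC2 environment named sample and the legacy juju keys default-series, default-instance-type and default-image-id; the control-bucket and admin-secret values are placeholders that you would replace with the ones already in your own file. The helper name print_environment is made up for this example.

```shell
#!/bin/sh
# Print an environments.yaml that pins the release, instance size and image.
# Assumptions: an EC2 environment called "sample" and the legacy juju keys
# default-series, default-instance-type and default-image-id.
# control-bucket and admin-secret are placeholders, not real values.
print_environment() {
    series=$1
    size=$2
    ami=$3
    cat <<EOF
environments:
  sample:
    type: ec2
    control-bucket: <your-control-bucket>
    admin-secret: <your-admin-secret>
    default-series: $series
    default-instance-type: $size
    default-image-id: $ami
EOF
}

# Usage (not run here; it would overwrite your existing file):
#   print_environment oneiric m1.large ami-d131f2b8 > ~/.juju/environments.yaml
```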
2011-09-09 21:30:50,925 INFO 'status' command finished successfully
== DynDNS ==
If you want your CloudFoundry server to be accessible ( and usable ) remotely, you'll need to have a DNS entry that also creates a wildcard record. If you have a DynDNS account, you can enter your hostname and credentials right into this charm. Upon deployment, the configuration will be done for you. Edit cloudfoundry-server/hooks/install and change the following lines:
USE_DYNDNS="true"                      # make sure it is set to "true"
DYNDNS_USERNAME="dyndnsusername"       # your DynDNS username
DYNDNS_PASSWORD="dyndnspassword"       # your DynDNS password
DYNDNS_HOSTNAME="cf-host.dyndns.org"   # the DynDNS host you created for this
Deploy the cloudfoundry-server charm by typing the following:
In order to be able to connect to the cloudfoundry-server, we need to tell juju to expose (open) the ports specified in the charm ( 80, 443 and 4222 in this case ). Let's do that:
juju expose cloudfoundry-server
juju status should now look similar to this:
2011-09-09 21:46:01,008 INFO Connecting to environment.
2011-09-09 22:36:57,701 INFO 'status' command finished successfully
It looks a bit different now doesn't it? :)
Here's what we just did:
added a new DEA
added a new MySQL
added a new MongoDB
added a new Redis
This version of the deployment looks a lot better.
But wait!!! There's more!
The newly deployed units can "grow" the deployment as needed. For example:
juju add-unit cf-mysql
juju add-unit cf-mongodb
juju add-unit cf-redis
juju add-unit cloudfoundry-server-dea
Each one of the above commands will add another unit to the corresponding deployed service, thus horizontally scaling our deployment. Go ahead and add a few units, then run juju status so you can get a better idea of how all of these services are orchestrated.
You may have noticed that I haven't gone into any details as to how these charms work ( especially if you have read any of my previous posts ). The idea I am trying to convey here is that with Ubuntu Server and Juju you really don't have to know how CloudFoundry needs to be installed and configured in order to be able to deploy and scale it. Juju charms neatly encapsulate all of the necessary knowledge so you don't have to. That is not to say that you shouldn't or are not able to. These charms are available for download, review, contributions and feedback ( <--- the emphasis means that I would really like comments, contributions and general feedback )
I look forward to your comments/questions and general feedback. Let me know what you think.