Monday, November 7, 2011

Hadoop cluster with Ubuntu server and Juju

A while back I started experimenting with Juju and was intrigued by the notion of services instead of machines.

A bit of background on Juju from their website:

  • Formerly called Ensemble, juju is DevOps Distilled™. Through the use of charms ( renamed from formulas ), juju provides you with shareable, re-usable, and repeatable expressions of DevOps best practices. You can use them unmodified, or easily change and connect them to fit your needs. Deploying a charm is similar to installing a package on Ubuntu: ask for it and it’s there, remove it and it’s completely gone.

I come from a DevOps background and know firsthand the trials and tribulations of deploying production services, webapps, etc.  One that's particularly "thorny" is Hadoop.

To deploy a Hadoop cluster, we would need to download the dependencies ( Java, etc. ), download Hadoop, configure it and deploy it.  This process is somewhat different depending on the type of node that you're deploying ( ie: namenode, job-tracker, etc. ).  This is a multi-step process that requires too much human intervention.  It is also a process that is difficult to automate and reproduce.  Imagine a 10, 20 or 50 node cluster deployed using this method.  It can get frustrating quickly and it is prone to mistakes.

With this experience in mind ( and a lot of reading ), I set out to deploy a hadoop cluster using a Juju charm.

First things first, let's install Juju.  Follow the Getting Started documentation on the Juju site here.

According to the Juju documentation, we just need to follow some file naming conventions for what they call "hooks" ( executable scripts in your language of choice that perform certain actions ).  These "hooks" control the installation, relationships, start, stop, etc. of your charm.  We also need to summarize the description of the charm in a file called metadata.yaml.  The metadata.yaml file describes the charm, its interfaces, and what it requires and provides, among other things.  More on this file later when I show you the one for hadoop-master and hadoop-slave.

Armed with a bit of knowledge and a desire for simplicity, I decided to split the hadoop cluster in two:

  • hadoop-master (namenode and jobtracker )
  • hadoop-slave ( datanode and tasktracker )
I know this is not an all-encompassing list, but it will take care of a good portion of deployments, and the Juju charms are easy enough to modify that you can work your changes into them.

One of my colleagues, Brian Thomason, did a lot of packaging for these charms, so my job is now easier.  The configuration for the packages has been distilled down to three questions:

  1. namenode ( leave blank if you are the namenode )
  2. jobtracker ( leave blank if you are the jobtracker )
  3. hdfs data directory ( leave blank to use the default: /var/lib/hadoop-0.20/dfs/data )
Due to the magic of Ubuntu packaging, we can even "preseed" the answers to those questions to avoid being asked about them ( and stopping the otherwise automatic process ). We'll use the utility debconf-set-selections for this.  Here is a piece of the code that I use to preseed the values in my charm:
  • echo debconf hadoop/namenode string ${NAMENODE}| /usr/bin/debconf-set-selections
  • echo debconf hadoop/jobtracker string ${JOBTRACKER}| /usr/bin/debconf-set-selections
  • echo debconf hadoop/hdfsdatadir string ${HDFSDATADIR}| /usr/bin/debconf-set-selections
The variable names should be self-explanatory.
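
If you want to double-check what got preseeded, the debconf-get-selections utility ( from the debconf-utils package ) will show the stored answers.  A quick sketch, assuming the question names used above:
sudo apt-get install -y debconf-utils
debconf-get-selections | grep hadoop/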

Thanks to Brian's work, I now just have to install the packages ( hadoop-0.20-namenode and hadoop-0.20-jobtracker ).  Let's put all of this together into a Juju charm.

  • Create a directory for the hadoop-master formula ( mkdir hadoop-master )
  • Make a directory for the hooks of this charm ( mkdir hadoop-master/hooks )
  • Let's start with the always needed metadata.yaml file ( hadoop-master/metadata.yaml ):
ensemble: formula
name: hadoop-master
revision: 1
summary: Master Node for Hadoop
description: |
  The Hadoop Distributed Filesystem (HDFS) requires one unique server, the
  namenode, which manages the block locations of files on the
  filesystem.  The jobtracker is a central service which is responsible
  for managing the tasktracker services running on all nodes in a
  Hadoop Cluster.  The jobtracker allocates work to the tasktracker
  nearest to the data with an available work slot.
provides:
  hadoop-master:
    interface: hadoop-master

  • Every Juju charm has an install script ( in our case: hadoop-master/hooks/install ).  This is an executable file in your language of choice that Juju will run when it's time to install your charm.  Anything and everything that needs to happen for your charm to install, needs to be inside of that file.  Let's take a look at the install script of hadoop-master:
#!/bin/bash
# Here do anything needed to install the service
# i.e. apt-get install -y foo  or  bzr branch http://myserver/mycode /srv/webroot


##################################################################################
# Set debugging
##################################################################################
set -ux
juju-log "install script"


##################################################################################
# Add the repositories
##################################################################################
export TERM=linux
# Add the Hadoop PPA
juju-log "Adding ppa"
apt-add-repository ppa:canonical-sig/thirdparty
juju-log "updating cache"
apt-get update


##################################################################################
# Calculate our IP Address
##################################################################################
juju-log "calculating ip"
IP_ADDRESS=`hostname -f`    # hostname -f returns the FQDN, which is what gets used as the namenode/jobtracker address below
juju-log "Private IP: ${IP_ADDRESS}"


##################################################################################
# Preseed our Namenode, Jobtracker and HDFS Data directory
##################################################################################
NAMENODE="${IP_ADDRESS}"
JOBTRACKER="${IP_ADDRESS}"
HDFSDATADIR="/var/lib/hadoop-0.20/dfs/data"
juju-log "Namenode: ${NAMENODE}"
juju-log "Jobtracker: ${JOBTRACKER}"
juju-log "HDFS Dir: ${HDFSDATADIR}"

echo debconf hadoop/namenode string ${NAMENODE}| /usr/bin/debconf-set-selections
echo debconf hadoop/jobtracker string ${JOBTRACKER}| /usr/bin/debconf-set-selections
echo debconf hadoop/hdfsdatadir string ${HDFSDATADIR}| /usr/bin/debconf-set-selections


##################################################################################
# Install the packages
##################################################################################
juju-log "installing packages"
apt-get install -y hadoop-0.20-namenode
apt-get install -y hadoop-0.20-jobtracker


##################################################################################
# Open the necessary ports
##################################################################################
if [ -x /usr/bin/open-port ];then
   open-port 50010/TCP
   open-port 50020/TCP
   open-port 50030/TCP
   open-port 50105/TCP
   open-port 54310/TCP
   open-port 54311/TCP
   open-port 50060/TCP
   open-port 50070/TCP
   open-port 50075/TCP
   open-port 50090/TCP
fi


  • There are a few other files that we need to create ( start and stop ) to get the hadoop-master charm installed.  Let's see those files:
    • start
#!/bin/bash
# Here put anything that is needed to start the service.
# Note that currently this is run directly after install
# i.e. 'service apache2 start'

set -x
service hadoop-0.20-namenode status && service hadoop-0.20-namenode restart || service hadoop-0.20-namenode start
service hadoop-0.20-jobtracker status && service hadoop-0.20-jobtracker restart || service hadoop-0.20-jobtracker start

    • stop
#!/bin/bash
# This will be run when the service is being torn down, allowing you to disable
# it in various ways..
# For example, if your web app uses a text file to signal to the load balancer
# that it is live... you could remove it and sleep for a bit to allow the load
# balancer to stop sending traffic.
# rm /srv/webroot/server-live.txt && sleep 30

set -x
juju-log "stop script"
service hadoop-0.20-namenode stop
service hadoop-0.20-jobtracker stop

Let's go back to the metadata.yaml file and examine it in more detail:

ensemble: formula
name: hadoop-master
revision: 1
summary: Master Node for Hadoop
description: |
  The Hadoop Distributed Filesystem (HDFS) requires one unique server, the
  namenode, which manages the block locations of files on the
  filesystem.  The jobtracker is a central service which is responsible
  for managing the tasktracker services running on all nodes in a
  Hadoop Cluster.  The jobtracker allocates work to the tasktracker
  nearest to the data with an available work slot.
provides:
  hadoop-master:
    interface: hadoop-master

The emphasized section ( provides ) tells Juju that this charm provides an interface named hadoop-master that can be used in relationships with other charms ( in our case we'll be using it to connect the hadoop-master with the hadoop-slave charm that we'll be writing a bit later ).  For this relationship to work, we need to let Juju know what to do ( more detailed information about relationships in charms can be found here ).

Per the Juju documentation, we need to name our relationship hook hadoop-master-relation-joined, and it should also be an executable script in your language of choice.  Let's see what that file looks like:

#!/bin/sh
# This must be renamed to the name of the relation. The goal here is to
# affect any change needed by relationships being formed
# This script should be idempotent.

set -x

juju-log "joined script started"

# Calculate our IP Address
IP_ADDRESS=`unit-get private-address`

# Preseed our Namenode, Jobtracker and HDFS Data directory
NAMENODE="${IP_ADDRESS}"
JOBTRACKER="${IP_ADDRESS}"
HDFSDATADIR="/var/lib/hadoop-0.20/dfs/data"

relation-set namenode="${NAMENODE}" jobtracker="${JOBTRACKER}" hdfsdatadir="${HDFSDATADIR}"



juju-log "$JUJU_REMOTE_UNIT joined"

Your charm directory should now look something like this:
natty/hadoop-master
natty/hadoop-master/metadata.yaml
natty/hadoop-master/hooks/install
natty/hadoop-master/hooks/start
natty/hadoop-master/hooks/stop
natty/hadoop-master/hooks/hadoop-master-relation-joined
This charm should now be complete...  It's not too exciting yet, as it doesn't have the hadoop-slave counterpart, but it is a complete charm.
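
Before deploying, it's worth making sure the hook scripts are executable ( Juju runs them directly ) and that the layout matches.  A quick check, assuming the directory names above:
chmod +x natty/hadoop-master/hooks/*
find natty/hadoop-master -type f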

The latest version of the hadoop-master charm can be found here if you want to get it.

The hadoop-slave charm is almost the same as the hadoop-master charm with some exceptions.  Those I'll leave as an exercise for the reader.
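
As a hint, here is a guess at what the hadoop-slave metadata.yaml could look like ( a sketch, not the actual file from the charm ): it mirrors the master's, except that it requires the hadoop-master interface instead of providing it:
cat > hadoop-slave/metadata.yaml <<'EOF'
ensemble: formula
name: hadoop-slave
revision: 1
summary: Slave Node for Hadoop
description: |
  Datanode and tasktracker nodes for a Hadoop cluster.
requires:
  hadoop-master:
    interface: hadoop-master
EOF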

The hadoop-slave charm can be found here if you want to get it.

Once you have both charms ( hadoop-master and hadoop-slave ), you can easily deploy your cluster by typing:

  • juju bootstrap   # ( creates/bootstraps the juju environment )
  • juju deploy --repository . local:natty/hadoop-master # ( deploys hadoop-master )
  • juju deploy --repository . local:natty/hadoop-slave # ( deploys hadoop-slave )
  • juju add-relation hadoop-slave hadoop-master # ( connects the hadoop-slave to the hadoop-master )
As you can see, once you have the charm written and tested, deploying the cluster is really a matter of a few commands.  The above example gives you one hadoop-master ( namenode, jobtracker ) and one hadoop-slave ( datanode, tasktracker ).
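
While the instances come up, you can keep an eye on the deployment ( re-run it until every unit reports its state as started ):
juju status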

To add another node to this existing hadoop cluster, we add:

  • juju add-unit hadoop-slave # ( this adds one more slave )
Run the above command multiple times to continue to add hadoop-slave nodes to your cluster.
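
For example, a quick shell loop to add three more slaves in one go:
for i in 1 2 3; do
    juju add-unit hadoop-slave
done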

Juju allows you to catalog the steps needed to get your service/application installed, configured and running properly.  Once your knowledge has been captured in a Juju charm, it can be re-used by you or others without much knowledge of what's needed to get the application/service running.

In the DevOps world, this code re-usability can save time, effort and money by providing self-contained charms that provide a service or application.

Friday, September 9, 2011

From zero to DrawBridge via Ubuntu Server, Juju and CloudFoundry in less than 10 minutes

** This is an updated post reflecting the renaming of the project formerly known as Ensemble and now known as Juju **

As Dustin mentioned in his blog posts here and here, we've been working with VMware on a CloudFoundry deployment using Ubuntu Server and Juju.

Today, I would like to build on those posts and talk a bit about:
  • deploy a CloudFoundry server environment using Ubuntu Server and Juju
  • show how Juju charms can make it easy to horizontally scale your deployment
  • deploy a sample application


The short version ( not really that short after all )

Dependencies
sudo apt-get install bzr wget grep sed gawk

Juju
sudo apt-add-repository ppa:juju/pkgs
sudo apt-get update
sudo apt-get install juju

Charms and supporting scripts
bzr branch lp:~canonical-sig/+junk/cloudfoundry-server
bzr branch lp:~canonical-sig/+junk/cloudfoundry-server-dea
bzr branch lp:~canonical-sig/+junk/cf-mysql
bzr branch lp:~canonical-sig/+junk/cf-mongodb
bzr branch lp:~canonical-sig/+junk/cf-redis
bzr branch lp:~kirkland/+junk/drawbridge
bzr branch lp:~negronjl/+junk/ubuntu-latest-image
chmod +x ./ubuntu-latest-image
juju
echo "    default-image-id: `ubuntu-latest-image oneiric m1.large | grep Image | awk '{ print $3 }'`" >> ~/.juju/environments.yaml
echo "    default-instance-type: m1.large" >> ~/.juju/environments.yaml

Bootstrap
juju bootstrap

Deploy
juju deploy --repository . cloudfoundry-server
juju deploy --repository . cloudfoundry-server-dea
juju deploy --repository . cf-mysql
juju deploy --repository . cf-redis
juju deploy --repository . cf-mongodb

Relationships
juju add-relation cloudfoundry-server cloudfoundry-server-dea
juju add-relation cloudfoundry-server cf-mysql
juju add-relation cloudfoundry-server cf-mongodb
juju add-relation cloudfoundry-server cf-redis

Scale
juju add-unit cloudfoundry-server-dea
juju add-unit cf-mysql
juju add-unit cf-mongodb
juju add-unit cf-redis

DrawBridge
cd drawbridge
vmc
vmc target <your hostname here>
vmc add-user
vmc push

Take it all down
juju destroy-environment

The video version ( better than the not so short version above but not enough details )


The details ( pretty much all of them )

This deployment will be a bit different as we'll be deploying everything in Ubuntu 11.10 ( Oneiric ) and we'll be using large instances in Amazon EC2.  Let's get started...

Juju
sudo apt-add-repository ppa:juju/pkgs
sudo apt-get update
sudo apt-get install juju

Bazaar ( so we can download the charms )
sudo apt-get install bzr

CloudFoundry Juju charms
bzr branch lp:~canonical-sig/+junk/cloudfoundry-server
bzr branch lp:~canonical-sig/+junk/cloudfoundry-server-dea
bzr branch lp:~canonical-sig/+junk/cf-mysql
bzr branch lp:~canonical-sig/+junk/cf-mongodb
bzr branch lp:~canonical-sig/+junk/cf-redis

DrawBridge app ( Thanks Dustin )
bzr branch lp:~kirkland/+junk/drawbridge

Configure Juju
juju ( will create the configuration file if needed )

Get the latest Ubuntu Oneiric image from here or you can just use this little script that will do it for you:
#!/bin/bash

release="$1"
size="$2"
[ -z "$release" ] && release="oneiric"
[ -z "$size" ] && size="t1.micro"
echo "Release: ${release}"
echo "Size: ${size}"
result=`wget -q -O - http://uec-images.ubuntu.com/$release/current/ | grep -m1 "$size.*us-east" | sed -e "s/^.*<tt>//" -e "s/ <\/tt>.*$//" -e 's/${EC2_KEYPAIR_US_EAST_1}//'`
image_id=`echo $result | awk '{ print $2 }'`
echo "Image ID: ${image_id}"
Copy the above code and save it somewhere on your computer ( I named mine ubuntu-latest-image and saved it in ~/bin so it's in my PATH ).  The script depends on wget, grep, sed and awk, all available in the repositories and more than likely already installed on your system.  Just in case you don't have them installed, run sudo apt-get install wget grep sed gawk.


Execute the script as follows:
  • ubuntu-latest-image oneiric m1.large
It should return something like the following:
Release: oneiric
Size: m1.large
Image ID: ami-d131f2b8

We now need to modify our default juju configuration so we can deploy large (m1.large) oneiric ( ami-d131f2b8 ) instances.

Edit juju's configuration file (~/.juju/environments.yaml) and make it look something like this:
environments:
  sample:
    type: ec2
    access-key: --removed--
    secret-key: --removed--
    control-bucket: --removed--
    admin-secret: --removed--
    default-image-id: ami-d131f2b8
    default-instance-type: m1.large
The last two lines ( default-image-id and default-instance-type ) are the important ones; you should already have the other lines in your configuration.

Bootstrap Juju:
juju bootstrap

Make sure the environment has been bootstrapped by running:
juju status 

The output should look similar to this:
2011-09-09 21:30:48,363 INFO Connecting to environment.
machines:
  0: {dns-name: ec2-184-73-104-84.compute-1.amazonaws.com, instance-id: i-58f72138}
services: {}
2011-09-09 21:30:50,925 INFO 'status' command finished successfully

DynDNS
If you want your CloudFoundry server to be accessible ( and usable ) remotely, you'll need to have a DNS entry that also creates a wildcard record.  If you have a DynDNS account, you can enter your hostname and credentials right into this charm.  Upon deployment, the configuration will be done for you.  Edit cloudfoundry-server/hooks/install and change the following lines:
USE_DYNDNS="true" <---- make sure it is set to "true"
DYNDNS_USERNAME="dyndnsusername"  <----- Your DynDNS username
DYNDNS_PASSWORD="dyndnspassword" <----- Your DynDNS password
DYNDNS_HOSTNAME="cf-host.dyndns.org" <--- The DynDNS host you created for this.
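
If you'd rather not open an editor, something like the following would make the same changes ( the values are placeholders for your own DynDNS account, and this assumes the variables appear at the start of their lines as shown above ):
# Set the DynDNS values in the install hook in place
sed -i 's/^USE_DYNDNS=.*/USE_DYNDNS="true"/' cloudfoundry-server/hooks/install
sed -i 's/^DYNDNS_USERNAME=.*/DYNDNS_USERNAME="myuser"/' cloudfoundry-server/hooks/install
sed -i 's/^DYNDNS_PASSWORD=.*/DYNDNS_PASSWORD="mypassword"/' cloudfoundry-server/hooks/install
sed -i 's/^DYNDNS_HOSTNAME=.*/DYNDNS_HOSTNAME="cf-host.dyndns.org"/' cloudfoundry-server/hooks/install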

Deploy the cloudfoundry-server charm by typing the following:
juju deploy --repository . cloudfoundry-server

After a few minutes, the output of juju status should look similar to this:
2011-09-09 21:43:38,203 INFO Connecting to environment.
machines:
  0: {dns-name: ec2-184-73-104-84.compute-1.amazonaws.com, instance-id: i-58f72138}
  1: {dns-name: ec2-50-17-173-69.compute-1.amazonaws.com, instance-id: i-c6f224a6}
services:
  cloudfoundry-server:
    charm: local:cloudfoundry-server-26
    relations: {}
    units:
      cloudfoundry-server/0:
        machine: 1
        relations: {}
        state: started

In order to be able to connect to the cloudfoundry-server, we need to tell juju to expose (open) the ports specified in the charm ( 80, 443 and 4222 in this case ).  Let's do that:
juju expose cloudfoundry-server

juju status should now look similar to this:
2011-09-09 21:46:01,008 INFO Connecting to environment.
machines:
  0: {dns-name: ec2-184-73-104-84.compute-1.amazonaws.com, instance-id: i-58f72138}
  1: {dns-name: ec2-50-17-173-69.compute-1.amazonaws.com, instance-id: i-c6f224a6}
services:
  cloudfoundry-server:
    exposed: true
    charm: local:cloudfoundry-server-26
    relations: {}
    units:
      cloudfoundry-server/0:
        machine: 1
        open-ports: [80/tcp, 443/tcp, 4222/tcp]
        relations: {}
        state: started
2011-09-09 21:46:05,037 INFO 'status' command finished successfully

We should now have ports 80, 443 and 4222 open to the world.

In order to connect to our CloudFoundry server, we need to install ruby-vmc ( available in Ubuntu 11.10 ).  Let's install it by typing:
sudo apt-get install ruby-vmc

Once installed, connect to your CloudFoundry server by typing:
vmc target api.<your dns entry here>
ie:  vmc target api.cf-host.dyndns.org

Create a user:
vmc add-user --email <some email address> --passwd <some password>
ie: vmc add-user --email 'test@example.com' --passwd 'test'

Deploy DrawBridge
cd drawbridge
vmc push
Would you like to deploy from the current directory? [Yn]: Y
Application Name: drawbridge
Application Deployed URL: 'drawbridge.cf-host.dyndns.org'? 
Detected a Node.js Application, is this correct? [Yn]: Y
Memory Reservation [Default:64M] (64M, 128M, 256M, 512M, 1G or 2G) 
Creating Application: OK
Would you like to bind any services to 'drawbridge'? [yN]: y
The following system services are available::
1. mongodb
2. mysql
3. redis
Please select one you wish to provision: 2
Specify the name of the service [mysql-ce0c8]: 
Creating Service: OK
Binding Service: OK
Uploading Application:
  Checking for available resources: OK
  Processing resources: OK
  Packing application: OK
  Uploading (16M): OK   
Push Status: OK
Staging Application: OK                                                         
Starting Application: OK  
cd ..                                                      

Open your browser to the URL you selected when pushing the app ( in this example: http://drawbridge.cf-host.dyndns.org )




That's all there is to that!!  You have deployed a CloudFoundry server via Ubuntu Server and Juju, and DrawBridge via vmc and CloudFoundry.

But wait!!!  There's more!

A single cloudfoundry-server is probably not your idea of scalability.  Let's deploy a few more scalable components: an extra DEA, MySQL node, MongoDB node and Redis node.
cd <directory where your charms are>
juju deploy --repository . cloudfoundry-server-dea
juju deploy --repository . cf-mysql
juju deploy --repository . cf-mongodb
juju deploy --repository . cf-redis

.... and connect them all together with the cloudfoundry-server
juju add-relation cloudfoundry-server cloudfoundry-server-dea
juju add-relation cloudfoundry-server cf-mysql
juju add-relation cloudfoundry-server cf-mongodb
juju add-relation cloudfoundry-server cf-redis

After a few minutes, run juju status again.  It should look similar to this:
2011-09-09 22:36:40,991 INFO Connecting to environment.
machines:
  0: {dns-name: ec2-184-73-104-84.compute-1.amazonaws.com, instance-id: i-58f72138}
  1: {dns-name: ec2-50-17-173-69.compute-1.amazonaws.com, instance-id: i-c6f224a6}
  2: {dns-name: ec2-107-20-82-210.compute-1.amazonaws.com, instance-id: i-7615c316}
  3: {dns-name: ec2-107-20-98-149.compute-1.amazonaws.com, instance-id: i-4e15c32e}
  4: {dns-name: ec2-184-72-209-9.compute-1.amazonaws.com, instance-id: i-5815c338}
  5: {dns-name: ec2-50-19-63-40.compute-1.amazonaws.com, instance-id: i-0015c360}
services:
  cf-mongodb:
    charm: local:cf-mongodb-1
    relations: {cf-server: cloudfoundry-server, mongodb-cluster: cf-mongodb}
    units:
      cf-mongodb/0:
        machine: 4
        relations:
          cf-server: {state: up}
          mongodb-cluster: {state: up}
        state: started
  cf-mysql:
    charm: local:cf-mysql-1
    relations: {cf-server: cloudfoundry-server, mysql-cluster: cf-mysql}
    units:
      cf-mysql/0:
        machine: 3
        relations:
          cf-server: {state: up}
          mysql-cluster: {state: up}
        state: started
  cf-redis:
    charm: local:cf-redis-1
    relations: {cf-server: cloudfoundry-server, redis-cluster: cf-redis}
    units:
      cf-redis/0:
        machine: 5
        relations:
          cf-server: {state: up}
          redis-cluster: {state: up}
        state: started
  cloudfoundry-server:
    exposed: true
    charm: local:cloudfoundry-server-26
    relations: {cf-server: cf-mysql}
    units:
      cloudfoundry-server/0:
        machine: 1
        open-ports: [80/tcp, 443/tcp, 4222/tcp]
        relations:
          cf-server: {state: up}
        state: started
  cloudfoundry-server-dea:
    charm: local:cloudfoundry-server-dea-26
    relations: {cf-dea-cluster: cloudfoundry-server-dea, cf-server: cloudfoundry-server}
    units:
      cloudfoundry-server-dea/0:
        machine: 2
        relations:
          cf-dea-cluster: {state: up}
          cf-server: {state: up}
        state: started
2011-09-09 22:36:57,701 INFO 'status' command finished successfully

It looks a bit different now doesn't it?  :)

Here's what we just did:
  • added a new DEA
  • added a new MySQL
  • added a new MongoDB
  • added a new Redis
This version of the deployment looks a lot better.

But wait!!!  There's more!

The newly deployed units can "grow" the deployment as needed.  For example:
  • juju add-unit cf-mysql
  • juju add-unit cf-mongodb
  • juju add-unit cf-redis
  • juju add-unit cloudfoundry-server-dea
Each one of the above commands will add another unit of the existing deployed services, thus horizontally scaling our deployment.  Go ahead and add a few units, and run juju status afterwards so you can get a better idea of how all of these services are orchestrated.

You may have noticed that I haven't gone into any details as to how these charms work ( especially if you have read any of my previous posts ).  The idea I am trying to convey here is that with Ubuntu Server and Juju you really don't have to know how CloudFoundry needs to be installed and configured in order to be able to deploy and scale it.  Juju charms neatly encapsulate all of the necessary knowledge so you don't have to.  That is not to say that you shouldn't or are not able to.  These charms are available for download, review, contributions and feedback ( and I would really like comments, contributions and general feedback ).


I look forward to your comments/questions and general feedback.  Let me know what you think.


-Juan
http://blog.xtremeghost.com

Monday, August 29, 2011

Membase deployment and scaling with Ubuntu Server and Juju

** This is an updated post reflecting the new name of the project formerly known as Ensemble and now known as Juju **

Let's talk about Membase

Membase ( from their website ):
Membase Server is the lowest latency, highest throughput NoSQL database technology on the market. When your application needs data, right now, it will get it, right now. A distributed key-value data store, Membase Server is designed and optimized for the data management needs of interactive web applications, so it allows the data layer to scale out just like the web application logic tier – simply by adding more commodity servers.

When I first read about Membase, my first thought was something like: yup... yet another NoSQL thing like memcached, cassandra, mongodb, etc.  I have done Juju charms for MongoDB here and Cassandra here, so I thought I would do a Membase one as well.

Well... I stand corrected.  Membase is different in some significant ways.  I got the following points from their website but, I gotta tell you, I actually agree with them:
  • Membase is simple
    • Easy to get, install, manage, expand and use
      • Especially now that I have the juju charm ready for use
    • Production ready
      • I have seen this deployed in networks that handle some really heavy traffic
  • Membase is fast
    • Durable Speed Without Compromising Safety.
      • Again, I have seen this thing in action and I'm impressed.
    • Managed Memory Caching Layer.
  • Membase server is elastic
    • Zero Downtime Topology Change
    • Spreads Data Across Cluster
I should probably stop now before I turn this into a Membase commercial :)


If you are not familiar with Juju, I highly recommend that you go over to their website and read up on it.  I plan on doing a quick howto and basic intro about Juju but not today, so go read about it.  I'll be here when you get back.

Welcome back .... By now I hope you like Juju as much as I do and are ready to check out some of the inner workings of the Membase Juju charm.  

Juju charms are pretty liberal.  They just need to have a basic directory structure and some aptly named files.  One of those files is the metadata.yaml file.  

The metadata.yaml file contains the charm's description and some information about what it provides, requires and "peers" with.

Here is the metadata.yaml file for the Membase charm:
name: membase
revision: 2
summary: Membase Server
description: |
   Membase Server is the leading distribution of memcached and
   membase, created and supported by top contributors to the memcached
   and membase open source projects.
provides:
  db:
    interface: membase
peers:
  cluster:
    interface: membase-cluster
This metadata file is pretty simple.  It contains the charm's description ( taken directly from their package ), what the charm provides ( it provides a db by the name of membase ) and its "peers" interface.

"peers" interfaces are not required but are the special interfaces used to have peers share information, settings, etc. between nodes.  In fact, peers interfaces are how I have the different Membase nodes communicate with each other and incorporate themselves into the cluster.  Read up about the different types of interfaces and much more on Juju's documentation page.  Let's take a look at how this all works.  The interesting bit of this charm is in the cluster-relation-changed hook that is responsible for adding new nodes to the existing cluster.  Here is the script:

#!/bin/bash 
set -ux
MASTER_INSTALL_TIME=`facter membase-install-time`
MASTER_MEMBER=$JUJU_UNIT_NAME
# Find out the oldest/first node
for MEMBER in `relation-list`
do
   INSTALL_TIME=`relation-get install-time ${MEMBER}`
   if [ ${INSTALL_TIME} -lt ${MASTER_INSTALL_TIME} ]; then
      # Remember the oldest install time seen so far so the true oldest node wins
      MASTER_INSTALL_TIME=${INSTALL_TIME}
      MASTER_MEMBER=${MEMBER}
   fi
done
if [ ${JUJU_UNIT_NAME} != ${MASTER_MEMBER} ]; then
   # I am not the master node so, I need to join up with the master node
    /opt/membase/bin/membase rebalance \
                 -u `relation-get username ${MASTER_MEMBER}` -p `relation-get password ${MASTER_MEMBER}` \
                 -c `relation-get ip ${MASTER_MEMBER}` \
                 --server-add=`facter membase-ip` \
                 --server-add-username=`facter membase-username` \
                 --server-add-password=`facter membase-password`
fi
exit 0
Not too complicated for what it does huh?  Juju provides the foundation for all of this to happen easily and Membase provides a lot of the facilities that simplify the needed actions ( in our case, add a new node and rebalance the cluster ).

The above script looks for the oldest node ( during install, I keep track of the installation time ).  The oldest node in the deployment is considered to be the master.  Once the "master" has been identified, all new nodes in the deployment add themselves to the master's cluster and re-balance it.  Membase provides a good CLI from which to accomplish all that's needed, hence the simplicity of this script.
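
For completeness, here is a sketch of what the matching cluster-relation-joined hook could look like, guessed from the values the script above reads with relation-get ( this is not the actual hook from the charm ):
#!/bin/bash
# Publish this node's details on the peer relation so other members can
# work out which node is the oldest ( the master ) and how to reach it.
set -ux
relation-set install-time=`facter membase-install-time` \
             ip=`facter membase-ip` \
             username=`facter membase-username` \
             password=`facter membase-password`
exit 0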

Juju's installation instructions are here.  The charm can be downloaded from here.

Here is a quick rundown on the commands you will need to run in order to get your Membase deployment ready:
  • mkdir ~/juju-charms
  • cd ~/juju-charms
  • bzr branch lp:principia/membase
  • juju bootstrap
  • juju deploy --repository . membase
  • juju status
    • You should see something like this:
2011-08-26 20:08:54,112 INFO Connecting to environment.
machines:
  0: {dns-name: ec2-50-16-161-217.compute-1.amazonaws.com, instance-id: i-65ece204}
  1: {dns-name: ec2-50-19-19-37.compute-1.amazonaws.com, instance-id: i-29ebe548}
services:
  membase:
    charm: local:membase-2
    relations: {cluster: membase}
    units:
      membase/0:
        machine: 1
        relations: {}
        state: null
  • The dns-name of the membase unit ( machine 1 in the output above ) is the part you would want to copy and paste into your browser.  It will point you to Membase's web interface.  It defaults to port 8091 so you would ( in this case ) open your browser to http://ec2-50-19-19-37.compute-1.amazonaws.com:8091
  • Log in with the default username and password ( Administrator/administrator )
    • You can change the default username and password in the file membase/hooks/install
  • You should see something similar to this:


  • Leave that Browser running, go back to the command line and add another unit by typing:
    • juju add-unit membase
  • In a few minutes, you'll see the new node in the Web interface.  The data will be rebalanced and, when complete, the node will be available as a member of the cluster ( you can also check this from the command line, as sketched below ).
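
If you prefer the command line, the membase CLI can list the cluster members from any node ( a sketch using the default credentials mentioned above; adjust if you changed them ):
/opt/membase/bin/membase server-list -c localhost:8091 -u Administrator -p administrator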

Membase does many things right, and Ubuntu Server provides a stable platform on which to deploy.  Juju wraps it all up by making it reusable and simple.

Check out Membase's documentation here as well for information on testing your Membase deployment and more of its features.

Questions/Comments/Feedback.  Let me know ...


-Juan


Thursday, August 18, 2011

HPCC with Ubuntu Server and Ensemble

** This is an updated post reflecting the new name of the project formerly known as Ensemble and now known as Juju **

Let's start this post with a bit of background on the technologies that I'll be using:
  • What is Ubuntu?
    • Ubuntu is a fast, secure and easy-to-use operating system used by millions of people around the world.  
    • Secure, fast and powerful, Ubuntu Server is transforming IT environments worldwide. Realise the full potential of your infrastructure with a reliable, easy-to-integrate technology platform.
  • What is Juju?
    • Juju is a next generation service orchestration framework. It has been likened to APT for the cloud. With juju, different authors are able to create service charms independently, and make those services coordinate their communication through a simple protocol. Users can then take the product of different authors and very comfortably deploy those services in an environment. The result is multiple machines and components transparently collaborating towards providing the requested service.
  • What is HPCC?
    • HPCC (High Performance Computing Cluster) is a massive parallel-processing computing platform that solves Big Data problems. The platform is now Open Source!
 Now that we are all caught up, let's delve right into it.  I will be discussing the details of my newly created hpcc juju charm.

The hpcc charm has been one of the trickiest ones to date to get working properly, so I want to take some time to explain some of the challenges that I encountered.

hpcc seems to use ssh keys for authentication and a single xml file to hold its configuration.  All nodes that are part of the cluster should have identical keys and xml configuration files.

  • The ssh keys are pretty easy to do ( there is even a script that will do it all for you located at /opt/HPCCSystems/sbin/keygen.sh ).  You can just run: ssh-keygen -f path_where_to_save_keys/id_rsa -N "" -q
  • The configuration file environment.xml is a lot trickier to configure, so I will use cheetah templates to help make a template out of this enormous file.
    • According to their website:
      • Cheetah is an open source template engine and code generation tool, written in Python. It can be used standalone or combined with other tools and frameworks. Web development is its principle use, but Cheetah is very flexible and is also being used to generate C++ game code, Java, sql, form emails and even Python code.
With cheetah, I can create self-contained templates that can be rendered into their intended file just by calling cheetah.  This is because we can embed python code inside the template itself, making the template ( environment.tmpl in our case ) more or less a python program that generates a fully functional environment.xml file ready for hpcc to use.

Another very important reason to use a template engine is the ability to create identical configuration files on each node without having to pass them around.  In other words, each node can create its own configuration file and, since all nodes are using the same methods and data to create the file, they will all be exactly the same.
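
As an example of what "just calling cheetah" looks like, rendering the template could be as simple as the following ( the paths are illustrative; the charm's hooks take care of the real locations ):
# Generate environment.xml from the cheetah template, then put it where hpcc expects it
cheetah fill --oext xml templates/environment.tmpl
cp templates/environment.xml /etc/HPCCSystems/environment.xml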

The hpcc configuration file is huge, so I'll just talk about some of the interesting bits of it here:
#import random
#import subprocess
#set $rel_structure = { $subprocess.check_output(['facter', 'install_time']).strip() : { 'name' : $subprocess.check_output(['hostname', '-f']).strip(), 'netAddress' : $subprocess.check_output(['facter','ipaddress']).strip(), 'uuid' : $subprocess.check_output(['facter','uuid']).strip()  } }
#for $member in $subprocess.check_output(['relation-list']).strip().split():
   #set $rel_structure[$subprocess.check_output(['relation-get','install_time', $member]).strip()] = { 'name' : $subprocess.check_output(['relation-get','name', $member]).strip(), 'netAddress' : $subprocess.check_output(['relation-get','netAddress', $member]).strip(), 'uuid' : $subprocess.check_output(['relation-get', 'uuid', $member]).strip() }
#end for
#set $nodes = []
#for $index in $sorted($rel_structure.keys()):
   $nodes.append(($rel_structure[$index]['netAddress'], $rel_structure[$index]['name']))
#end for
The above piece of code is what I am currently using to populate a list with the FQDN and address of each cluster member, sorted by install time.  This puts the "master" of the cluster at the top of the list, which will become useful when populating certain parts of the configuration file.


As we can see by the code above, the main piece of information that we use in this template is the node list.  Here is a sample of how we use it in the environment.tmpl template file:
 #for $netAddress, $name in $nodes:
  <Computer computerType="linuxmachine"
            domain="localdomain"
            name="$name"
            netAddress="$netAddress"/>
#end for
I encourage you to download the charm here and examine the environment.tmpl file in the templates directory.

Here is the complete environment.tmpl file... I know it's pretty small and you can just download the charm and read the file at your leisure, but I wanted to give you an idea of the size and complexity of hpcc's configuration file.

====

#import random
#import subprocess
#set $rel_structure = { $subprocess.check_output(['facter', 'install_time']).strip() : { 'name' : $subprocess.check_output(['hostname', '-f']).strip(), 'netAddress' : $subprocess.check_output(['facter','ipaddress']).strip(), 'uuid' : $subprocess.check_output(['facter','uuid']).strip()  } }
#for $member in $subprocess.check_output(['relation-list']).strip().split():
   #set $rel_structure[$subprocess.check_output(['relation-get','install_time', $member]).strip()] = { 'name' : $subprocess.check_output(['relation-get','name', $member]).strip(), 'netAddress' : $subprocess.check_output(['relation-get','netAddress', $member]).strip(), 'uuid' : $subprocess.check_output(['relation-get', 'uuid', $member]).strip() }
#end for
#set $nodes = []
#for $index in $sorted($rel_structure.keys()):
   $nodes.append(($rel_structure[$index]['netAddress'], $rel_structure[$index]['name']))
#end for
<?xml version="1.0" encoding="UTF-8"?>
<!-- Edited with ConfigMgr on ip 71.204.190.179 on 2011-08-16T00:39:16 -->
<Environment>
 <EnvSettings>
  <blockname>HPCCSystems</blockname>
  <configs>/etc/HPCCSystems</configs>
  <environment>environment.xml</environment>
  <group>hpcc</group>
  <home>/home</home>
  <interface>eth0</interface>
  <lock>/var/lock/HPCCSystems</lock>
  <log>/var/log/HPCCSystems</log>
  <path>/opt/HPCCSystems</path>
  <pid>/var/run/HPCCSystems</pid>
  <runtime>/var/lib/HPCCSystems</runtime>
  <sourcedir>/etc/HPCCSystems/source</sourcedir>
  <user>hpcc</user>
 </EnvSettings>
 <Hardware>
 #for $netAddress, $name in $nodes:
  <Computer computerType="linuxmachine"
            domain="localdomain"
            name="$name"
            netAddress="$netAddress"/>
#end for
  <ComputerType computerType="linuxmachine"
                manufacturer="unknown"
                name="linuxmachine"
                opSys="linux"/>
  <Domain name="localdomain" password="" username=""/>
  <Switch name="Switch"/>
 </Hardware>
 <Programs>
  <Build name="community_3.0.4" url="/opt/HPCCSystems">
   <BuildSet installSet="deploy_map.xml"
             name="dafilesrv"
             path="componentfiles/dafilesrv"
             processName="DafilesrvProcess"
             schema="dafilesrv.xsd"/>
   <BuildSet installSet="deploy_map.xml"
             name="dali"
             path="componentfiles/dali"
             processName="DaliServerProcess"
             schema="dali.xsd"/>
   <BuildSet installSet="deploy_map.xml"
             name="dfuplus"
             path="componentfiles/dfuplus"
             processName="DfuplusProcess"
             schema="dfuplus.xsd"/>
   <BuildSet installSet="deploy_map.xml"
             name="dfuserver"
             path="componentfiles/dfuserver"
             processName="DfuServerProcess"
             schema="dfuserver.xsd"/>
   <BuildSet deployable="no"
             installSet="deploy_map.xml"
             name="DropZone"
             path="componentfiles/DropZone"
             processName="DropZone"
             schema="dropzone.xsd"/>
   <BuildSet installSet="deploy_map.xml"
             name="eclagent"
             path="componentfiles/eclagent"
             processName="EclAgentProcess"
             schema="eclagent_config.xsd"/>
   <BuildSet installSet="deploy_map.xml" name="eclminus" path="componentfiles/eclminus"/>
   <BuildSet installSet="deploy_map.xml"
             name="eclplus"
             path="componentfiles/eclplus"
             processName="EclPlusProcess"
             schema="eclplus.xsd"/>
   <BuildSet installSet="eclccserver_deploy_map.xml"
             name="eclccserver"
             path="componentfiles/configxml"
             processName="EclCCServerProcess"
             schema="eclccserver.xsd"/>
   <BuildSet installSet="eclscheduler_deploy_map.xml"
             name="eclscheduler"
             path="componentfiles/configxml"
             processName="EclSchedulerProcess"
             schema="eclscheduler.xsd"/>
   <BuildSet installSet="deploy_map.xml"
             name="esp"
             path="componentfiles/esp"
             processName="EspProcess"
             schema="esp.xsd"/>
   <BuildSet deployable="no"
             installSet="deploy_map.xml"
             name="espsmc"
             path="componentfiles/espsmc"
             processName="EspService"
             schema="espsmcservice.xsd">
    <Properties defaultPort="8010"
                defaultResourcesBasedn="ou=SMC,ou=EspServices,ou=ecl"
                defaultSecurePort="18010"
                type="WsSMC">
     <Authenticate access="Read"
                   description="Root access to SMC service"
                   path="/"
                   required="Read"
                   resource="SmcAccess"/>
     <AuthenticateFeature description="Access to SMC service"
                          path="SmcAccess"
                          resource="SmcAccess"
                          service="ws_smc"/>
     <AuthenticateFeature description="Access to thor queues"
                          path="ThorQueueAccess"
                          resource="ThorQueueAccess"
                          service="ws_smc"/>
     <AuthenticateFeature description="Access to super computer environment"
                          path="ConfigAccess"
                          resource="ConfigAccess"
                          service="ws_config"/>
     <AuthenticateFeature description="Access to DFU"
                          path="DfuAccess"
                          resource="DfuAccess"
                          service="ws_dfu"/>
     <AuthenticateFeature description="Access to DFU XRef"
                          path="DfuXrefAccess"
                          resource="DfuXrefAccess"
                          service="ws_dfuxref"/>
     <AuthenticateFeature description="Access to machine information"
                          path="MachineInfoAccess"
                          resource="MachineInfoAccess"
                          service="ws_machine"/>
     <AuthenticateFeature description="Access to SNMP metrics information"
                          path="MetricsAccess"
                          resource="MetricsAccess"
                          service="ws_machine"/>
     <AuthenticateFeature description="Access to remote execution"
                          path="ExecuteAccess"
                          resource="ExecuteAccess"
                          service="ws_machine"/>
     <AuthenticateFeature description="Access to DFU workunits"
                          path="DfuWorkunitsAccess"
                          resource="DfuWorkunitsAccess"
                          service="ws_fs"/>
     <AuthenticateFeature description="Access to DFU exceptions"
                          path="DfuExceptionsAccess"
                          resource="DfuExceptions"
                          service="ws_fs"/>
     <AuthenticateFeature description="Access to spraying files"
                          path="FileSprayAccess"
                          resource="FileSprayAccess"
                          service="ws_fs"/>
     <AuthenticateFeature description="Access to despraying of files"
                          path="FileDesprayAccess"
                          resource="FileDesprayAccess"
                          service="ws_fs"/>
     <AuthenticateFeature description="Access to dkcing of key files"
                          path="FileDkcAccess"
                          resource="FileDkcAccess"
                          service="ws_fs"/>
     <AuthenticateFeature description="Access to files in dropzone"
                          path="FileIOAccess"
                          resource="FileIOAccess"
                          service="ws_fileio"/>
     <AuthenticateFeature description="Access to WS ECL service"
                          path="WsEclAccess"
                          resource="WsEclAccess"
                          service="ws_ecl"/>
     <AuthenticateFeature description="Access to Roxie queries and files"
                          path="RoxieQueryAccess"
                          resource="RoxieQueryAccess"
                          service="ws_roxiequery"/>
     <AuthenticateFeature description="Access to cluster topology"
                          path="ClusterTopologyAccess"
                          resource="ClusterTopologyAccess"
                          service="ws_topology"/>
     <AuthenticateFeature description="Access to own workunits"
                          path="OwnWorkunitsAccess"
                          resource="OwnWorkunitsAccess"
                          service="ws_workunits"/>
     <AuthenticateFeature description="Access to others&apos; workunits"
                          path="OthersWorkunitsAccess"
                          resource="OthersWorkunitsAccess"
                          service="ws_workunits"/>
     <AuthenticateFeature description="Access to ECL direct service"
                          path="EclDirectAccess"
                          resource="EclDirectAccess"
                          service="ecldirect"/>
     <ProcessFilters>
      <Platform name="Windows">
       <ProcessFilter name="any">
        <Process name="dafilesrv"/>
       </ProcessFilter>
       <ProcessFilter name="AttrServerProcess">
        <Process name="attrserver"/>
       </ProcessFilter>
       <ProcessFilter name="DaliProcess">
        <Process name="daserver"/>
       </ProcessFilter>
       <ProcessFilter multipleInstances="true" name="DfuServerProcess">
        <Process name="dfuserver"/>
       </ProcessFilter>
       <ProcessFilter multipleInstances="true" name="EclCCServerProcess">
        <Process name="eclccserver"/>
       </ProcessFilter>
       <ProcessFilter multipleInstances="true" name="EspProcess">
        <Process name="esp"/>
        <Process name="dafilesrv" remove="true"/>
       </ProcessFilter>
       <ProcessFilter name="FTSlaveProcess">
        <Process name="ftslave"/>
       </ProcessFilter>
       <ProcessFilter name="RoxieServerProcess">
        <Process name="ccd"/>
       </ProcessFilter>
       <ProcessFilter name="RoxieSlaveProcess">
        <Process name="ccd"/>
       </ProcessFilter>
       <ProcessFilter name="SchedulerProcess">
        <Process name="scheduler"/>
       </ProcessFilter>
       <ProcessFilter name="ThorMasterProcess">
        <Process name="thormaster"/>
       </ProcessFilter>
       <ProcessFilter name="ThorSlaveProcess">
        <Process name="thorslave"/>
       </ProcessFilter>
       <ProcessFilter name="SashaServerProcess">
        <Process name="saserver"/>
       </ProcessFilter>
      </Platform>
      <Platform name="Linux">
       <ProcessFilter name="any">
        <Process name="dafilesrv"/>
       </ProcessFilter>
       <ProcessFilter name="AttrServerProcess">
        <Process name="attrserver"/>
       </ProcessFilter>
       <ProcessFilter name="DaliProcess">
        <Process name="daserver"/>
       </ProcessFilter>
       <ProcessFilter multipleInstances="true" name="DfuServerProcess">
        <Process name="."/>
       </ProcessFilter>
       <ProcessFilter multipleInstances="true" name="EclCCServerProcess">
        <Process name="."/>
       </ProcessFilter>
       <ProcessFilter multipleInstances="true" name="EspProcess">
        <Process name="."/>
        <Process name="dafilesrv" remove="true"/>
       </ProcessFilter>
       <ProcessFilter name="FTSlaveProcess">
        <Process name="ftslave"/>
       </ProcessFilter>
       <ProcessFilter name="GenesisServerProcess">
        <Process name="httpd"/>
        <Process name="atftpd"/>
        <Process name="dhcpd"/>
       </ProcessFilter>
       <ProcessFilter name="RoxieServerProcess">
        <Process name="ccd"/>
       </ProcessFilter>
       <ProcessFilter name="RoxieSlaveProcess">
        <Process name="ccd"/>
       </ProcessFilter>
       <ProcessFilter name="SchedulerProcess">
        <Process name="scheduler"/>
       </ProcessFilter>
       <ProcessFilter name="ThorMasterProcess">
        <Process name="thormaster"/>
       </ProcessFilter>
       <ProcessFilter name="ThorSlaveProcess">
        <Process name="thorslave"/>
       </ProcessFilter>
       <ProcessFilter name="SashaServerProcess">
        <Process name="saserver"/>
       </ProcessFilter>
      </Platform>
     </ProcessFilters>
    </Properties>
   </BuildSet>
   <BuildSet installSet="deploy_map.xml"
             name="ftslave"
             path="componentfiles/ftslave"
             processName="FTSlaveProcess"
             schema="ftslave_linux.xsd"/>
   <BuildSet installSet="deploy_map.xml"
             name="hqltest"
             path="componentfiles/hqltest"
             processName="HqlTestProcess"/>
   <BuildSet deployable="no"
             installSet="deploy_map.xml"
             name="ldapServer"
             path="componentfiles/ldapServer"
             processName="LDAPServerProcess"
             schema="ldapserver.xsd"/>
   <BuildSet deployable="no"
             installSet="auditlib_deploy_map.xml"
             name="plugins_auditlib"
             path="componentfiles/configxml"
             processName="PluginProcess"
             schema="plugin.xsd"/>
   <BuildSet deployable="no"
             installSet="debugservices_deploy_map.xml"
             name="plugins_debugservices"
             path="componentfiles/configxml"
             processName="PluginProcess"
             schema="plugin.xsd"/>
   <BuildSet deployable="no"
             installSet="fileservices_deploy_map.xml"
             name="plugins_fileservices"
             path="componentfiles/configxml"
             processName="PluginProcess"
             schema="plugin.xsd"/>
   <BuildSet deployable="no"
             installSet="logging_deploy_map.xml"
             name="plugins_logging"
             path="componentfiles/configxml"
             processName="PluginProcess"
             schema="plugin.xsd"/>
   <BuildSet deployable="no"
             installSet="parselib_deploy_map.xml"
             name="plugins_parselib"
             path="componentfiles/configxml"
             processName="PluginProcess"
             schema="plugin.xsd"/>
   <BuildSet deployable="no"
             installSet="stringlib_deploy_map.xml"
             name="plugins_stringlib"
             path="componentfiles/configxml"
             processName="PluginProcess"
             schema="plugin.xsd"/>
   <BuildSet deployable="no"
             installSet="unicodelib_deploy_map.xml"
             name="plugins_unicodelib"
             path="componentfiles/configxml"
             processName="PluginProcess"
             schema="plugin.xsd"/>
   <BuildSet deployable="no"
             installSet="workunitservices_deploy_map.xml"
             name="plugins_workunitservices"
             path="componentfiles/configxml"
             processName="PluginProcess"
             schema="plugin.xsd"/>
   <BuildSet installSet="roxie_deploy_map.xml"
             name="roxie"
             path="componentfiles/configxml"
             processName="RoxieCluster"
             schema="roxie.xsd"/>
   <BuildSet installSet="deploy_map.xml" name="roxieconfig" path="componentfiles/roxieconfig"/>
   <BuildSet installSet="deploy_map.xml"
             name="sasha"
             path="componentfiles/sasha"
             processName="SashaServerProcess"
             schema="sasha.xsd"/>
   <BuildSet deployable="no"
             installSet="deploy_map.xml"
             name="SiteCertificate"
             path="componentfiles/SiteCertificate"
             processName="SiteCertificate"
             schema="SiteCertificate.xsd"/>
   <BuildSet installSet="deploy_map.xml"
             name="soapplus"
             path="componentfiles/soapplus"
             processName="SoapPlusProcess"
             schema="soapplus.xsd"/>
   <BuildSet installSet="deploy_map.xml"
             name="thor"
             path="componentfiles/thor"
             processName="ThorCluster"
             schema="thor.xsd"/>
   <BuildSet deployable="no"
             installSet="deploy_map.xml"
             name="topology"
             path="componentfiles/topology"
             processName="Topology"
             schema="topology.xsd"/>
   <BuildSet deployable="no"
             installSet="deploy_map.xml"
             name="ws_ecl"
             path="componentfiles/ws_ecl"
             processName="EspService"
             schema="esp_service_wsecl2.xsd">
    <Properties bindingType="ws_eclSoapBinding"
                defaultPort="8002"
                defaultResourcesBasedn="ou=WsEcl,ou=EspServices,ou=ecl"
                defaultSecurePort="18002"
                plugin="ws_ecl"
                type="ws_ecl">
     <Authenticate access="Read"
                   description="Root access to WS ECL service"
                   path="/"
                   required="Read"
                   resource="WsEclAccess"/>
     <AuthenticateFeature description="Access to WS ECL service"
                          path="WsEclAccess"
                          resource="WsEclAccess"
                          service="ws_ecl"/>
    </Properties>
   </BuildSet>
   <BuildSet deployable="no"
             installSet="deploy_map.xml"
             name="ecldirect"
             path="componentfiles/ecldirect"
             processName="EspService"
             schema="esp_service_ecldirect.xsd">
    <Properties bindingType="EclDirectSoapBinding"
                defaultPort="8008"
                defaultResourcesBasedn="ou=EclDirectAccess,ou=EspServices,ou=ecl"
                defaultSecurePort="18008"
                plugin="ecldirect"
                type="ecldirect">
     <Authenticate access="Read"
                   description="Root access to ECL Direct service"
                   path="/"
                   required="Read"
                   resource="EclDirectAccess"/>
     <AuthenticateFeature description="Access to ECL Direct service"
                          path="EclDirectAccess"
                          resource="EclDirectAccess"
                          service="ecldirect"/>
    </Properties>
   </BuildSet>
  </Build>
 </Programs>
 <Software>
  <DafilesrvProcess build="community_3.0.4"
                    buildSet="dafilesrv"
                    description="DaFileSrv process"
                    name="mydafilesrv"
                    version="1">
#for $netAddress, $name in $nodes:                    
   <Instance computer="$name"
             directory="/var/lib/HPCCSystems/mydafilesrv"
             name="s_$name"
             netAddress="$netAddress"/>
#end for             
  </DafilesrvProcess>
  <DaliServerProcess build="community_3.0.4"
                     buildSet="dali"
                     environment="/etc/HPCCSystems/environment.xml"
                     name="mydali"
                     recoverFromIncErrors="true">                     
   <Instance computer="$nodes[0][1]"
             directory="/var/lib/HPCCSystems/mydali"
             name="s_$nodes[0][1]"
             netAddress="$nodes[0][0]"
             port="7070"/>
  </DaliServerProcess>
  <DfuServerProcess build="community_3.0.4"
                    buildSet="dfuserver"
                    daliServers="mydali"
                    description="DFU Server"
                    monitorinterval="900"
                    monitorqueue="dfuserver_monitor_queue"
                    name="mydfuserver"
                    queue="dfuserver_queue"
                    transferBufferSize="65536">
#for $netAddress, $name in $nodes:                    
   <Instance computer="$name"
             directory="/var/lib/HPCCSystems/mydfuserver"
             name="s_$name"
             netAddress="$netAddress"/>
#end for    
#raw        
   <SSH SSHidentityfile="$HOME/.ssh/id_rsa"
#end raw
        SSHpassword=""
        SSHretries="3"
        SSHtimeout="0"
        SSHusername="hpcc"/>
  </DfuServerProcess>
  <Directories name="HPCCSystems">
   <Category dir="/var/log/[NAME]/[INST]" name="log"/>
   <Category dir="/var/lib/[NAME]/[INST]" name="run"/>
   <Category dir="/etc/[NAME]/[INST]" name="conf"/>
   <Category dir="/var/lib/[NAME]/[INST]/temp" name="temp"/>
   <Category dir="/var/lib/[NAME]/hpcc-data/[COMPONENT]" name="data"/>
   <Category dir="/var/lib/[NAME]/hpcc-data2/[COMPONENT]" name="data2"/>
   <Category dir="/var/lib/[NAME]/hpcc-data3/[COMPONENT]" name="data3"/>
   <Category dir="/var/lib/[NAME]/hpcc-mirror/[COMPONENT]" name="mirror"/>
   <Category dir="/var/lib/[NAME]/queries/[INST]" name="query"/>
   <Category dir="/var/lock/[NAME]/[INST]" name="lock"/>
  </Directories>
  <DropZone build="community_3.0.4"
            buildSet="DropZone"
            computer="$nodes[0][0]"
            description="DropZone process"
            directory="/var/lib/HPCCSystems/dropzone"
            name="mydropzone"/>
  <EclAgentProcess allowedPipePrograms="*"
                   build="community_3.0.4"
                   buildSet="eclagent"
                   daliServers="mydali"
                   description="EclAgent process"
                   name="myeclagent"
                   pluginDirectory="/opt/HPCCSystems/plugins/"
                   thorConnectTimeout="600"
                   traceLevel="0"
                   wuQueueName="myeclagent_queue">
#for $netAddress, $name in $nodes:                    
   <Instance computer="$name"
             directory="/var/lib/HPCCSystems/myeclagent"
             name="s_$name"
             netAddress="$netAddress"/>
#end for  
  </EclAgentProcess>
  <EclCCServerProcess build="community_3.0.4"
                      buildSet="eclccserver"
                      daliServers="mydali"
                      description="EclCCServer process"
                      enableSysLog="true"
                      maxCompileThreads="4"
                      name="myeclccserver"
                      traceLevel="1">
#for $netAddress, $name in $nodes:                    
   <Instance computer="$name"
             directory="/var/lib/HPCCSystems/myeclccserver"
             name="s_$name"
             netAddress="$netAddress"/>
#end for
  </EclCCServerProcess>
  <EclSchedulerProcess build="community_3.0.4"
                       buildSet="eclscheduler"
                       daliServers="mydali"
                       description="EclScheduler process"
                       name="myeclscheduler">
#for $netAddress, $name in $nodes:                    
   <Instance computer="$name"
             directory="/var/lib/HPCCSystems/myeclscheduler"
             name="s_$name"
             netAddress="$netAddress"/>
#end for
  </EclSchedulerProcess>
  <EspProcess build="community_3.0.4"
              buildSet="esp"
              componentfilesDir="/opt/HPCCSystems/componentfiles"
              daliServers="mydali"
              description="ESP server"
              enableSEHMapping="true"
              formOptionsAccess="false"
              httpConfigAccess="true"
              logLevel="1"
              logRequests="false"
              logResponses="false"
              maxBacklogQueueSize="200"
              maxConcurrentThreads="0"
              maxRequestEntityLength="8000000"
              name="myesp"
              perfReportDelay="60"
              portalurl="http://hpccsystems.com/download">
   <Authentication ldapAuthMethod="kerberos"
                   ldapConnections="10"
                   ldapServer=""
                   method="none"/>
   <EspBinding defaultForPort="true"
               defaultServiceVersion=""
               name="myespsmc"
               port="8010"
               protocol="http"
               resourcesBasedn="ou=SMC,ou=EspServices,ou=ecl"
               service="EclWatch"
               workunitsBasedn="ou=workunits,ou=ecl"
               wsdlServiceAddress="">
    <Authenticate access="Read"
                  description="Root access to SMC service"
                  path="/"
                  required="Read"
                  resource="SmcAccess"/>
    <AuthenticateFeature authenticate="Yes"
                         description="Access to SMC service"
                         path="SmcAccess"
                         resource="SmcAccess"
                         service="ws_smc"/>
    <AuthenticateFeature authenticate="Yes"
                         description="Access to thor queues"
                         path="ThorQueueAccess"
                         resource="ThorQueueAccess"
                         service="ws_smc"/>
    <AuthenticateFeature authenticate="Yes"
                         description="Access to super computer environment"
                         path="ConfigAccess"
                         resource="ConfigAccess"
                         service="ws_config"/>
    <AuthenticateFeature authenticate="Yes"
                         description="Access to DFU"
                         path="DfuAccess"
                         resource="DfuAccess"
                         service="ws_dfu"/>
    <AuthenticateFeature authenticate="Yes"
                         description="Access to DFU XRef"
                         path="DfuXrefAccess"
                         resource="DfuXrefAccess"
                         service="ws_dfuxref"/>
    <AuthenticateFeature authenticate="Yes"
                         description="Access to machine information"
                         path="MachineInfoAccess"
                         resource="MachineInfoAccess"
                         service="ws_machine"/>
    <AuthenticateFeature authenticate="Yes"
                         description="Access to SNMP metrics information"
                         path="MetricsAccess"
                         resource="MetricsAccess"
                         service="ws_machine"/>
    <AuthenticateFeature authenticate="Yes"
                         description="Access to remote execution"
                         path="ExecuteAccess"
                         resource="ExecuteAccess"
                         service="ws_machine"/>
    <AuthenticateFeature authenticate="Yes"
                         description="Access to DFU workunits"
                         path="DfuWorkunitsAccess"
                         resource="DfuWorkunitsAccess"
                         service="ws_fs"/>
    <AuthenticateFeature authenticate="Yes"
                         description="Access to DFU exceptions"
                         path="DfuExceptionsAccess"
                         resource="DfuExceptions"
                         service="ws_fs"/>
    <AuthenticateFeature authenticate="Yes"
                         description="Access to spraying files"
                         path="FileSprayAccess"
                         resource="FileSprayAccess"
                         service="ws_fs"/>
    <AuthenticateFeature authenticate="Yes"
                         description="Access to despraying of files"
                         path="FileDesprayAccess"
                         resource="FileDesprayAccess"
                         service="ws_fs"/>
    <AuthenticateFeature authenticate="Yes"
                         description="Access to dkcing of key files"
                         path="FileDkcAccess"
                         resource="FileDkcAccess"
                         service="ws_fs"/>
    <AuthenticateFeature authenticate="Yes"
                         description="Access to files in dropzone"
                         path="FileIOAccess"
                         resource="FileIOAccess"
                         service="ws_fileio"/>
    <AuthenticateFeature authenticate="Yes"
                         description="Access to WS ECL service"
                         path="WsEclAccess"
                         resource="WsEclAccess"
                         service="ws_ecl"/>
    <AuthenticateFeature authenticate="Yes"
                         description="Access to Roxie queries and files"
                         path="RoxieQueryAccess"
                         resource="RoxieQueryAccess"
                         service="ws_roxiequery"/>
    <AuthenticateFeature authenticate="Yes"
                         description="Access to cluster topology"
                         path="ClusterTopologyAccess"
                         resource="ClusterTopologyAccess"
                         service="ws_topology"/>
    <AuthenticateFeature authenticate="Yes"
                         description="Access to own workunits"
                         path="OwnWorkunitsAccess"
                         resource="OwnWorkunitsAccess"
                         service="ws_workunits"/>
    <AuthenticateFeature authenticate="Yes"
                         description="Access to others&apos; workunits"
                         path="OthersWorkunitsAccess"
                         resource="OthersWorkunitsAccess"
                         service="ws_workunits"/>
    <AuthenticateFeature authenticate="Yes"
                         description="Access to ECL direct service"
                         path="EclDirectAccess"
                         resource="EclDirectAccess"
                         service="ecldirect"/>
   </EspBinding>
   <EspBinding defaultForPort="true"
               defaultServiceVersion=""
               name="myws_ecl"
               port="8002"
               protocol="http"
               resourcesBasedn="ou=WsEcl,ou=EspServices,ou=ecl"
               service="myws_ecl"
               workunitsBasedn="ou=workunits,ou=ecl"
               wsdlServiceAddress="">
    <Authenticate access="Read"
                  description="Root access to WS ECL service"
                  path="/"
                  required="Read"
                  resource="WsEclAccess"/>
    <AuthenticateFeature authenticate="Yes"
                         description="Access to WS ECL service"
                         path="WsEclAccess"
                         resource="WsEclAccess"
                         service="ws_ecl"/>
   </EspBinding>
   <EspBinding defaultForPort="true"
               defaultServiceVersion=""
               name="myecldirect"
               port="8008"
               protocol="http"
               resourcesBasedn="ou=EclDirectAccess,ou=EspServices,ou=ecl"
               service="myecldirect"
               workunitsBasedn="ou=workunits,ou=ecl"
               wsdlServiceAddress="">
    <Authenticate access="Read"
                  description="Root access to ECL Direct service"
                  path="/"
                  required="Read"
                  resource="EclDirectAccess"/>
    <AuthenticateFeature authenticate="Yes"
                         description="Access to ECL Direct service"
                         path="EclDirectAccess"
                         resource="EclDirectAccess"
                         service="ecldirect"/>
   </EspBinding>
   <HTTPS acceptSelfSigned="true"
          CA_Certificates_Path="ca.pem"
          certificateFileName="certificate.cer"
          city=""
          country="US"
          daysValid="365"
          enableVerification="false"
          organization="Customer of HPCCSystems"
          organizationalUnit=""
          passphrase=""
          privateKeyFileName="privatekey.cer"
          regenerateCredentials="false"
          requireAddressMatch="false"
          state=""
          trustedPeers="anyone"/>
   #for $netAddress, $name in $nodes:                    
   <Instance computer="$name"
             directory="/var/lib/HPCCSystems/myesp"
             FQDN=""
             name="s_$name"
             netAddress="$netAddress"/>
   #end for
   <ProtocolX authTimeout="3"
              defaultTimeout="21"
              idleTimeout="3600"
              maxTimeout="36"
              minTimeout="30"
              threadCount="2"/>
  </EspProcess>
  <EspService allowNewRoxieOnDemandQuery="false"
              AWUsCacheTimeout="15"
              build="community_3.0.4"
              buildSet="espsmc"
              description="ESP services for SMC"
              disableUppercaseTranslation="false"
              eclServer="myeclccserver"
              enableSystemUseRewrite="false"
              excludePartitions="/,/dev*,/sys,/usr,/proc/*"
              monitorDaliFileServer="false"
              name="EclWatch"
              pluginsPath="/opt/HPCCSystems/plugins"
              syntaxCheckQueue=""
              viewTimeout="1000"
              warnIfCpuLoadOver="95"
              warnIfFreeMemoryUnder="5"
              warnIfFreeStorageUnder="5">
   <Properties defaultPort="8010"
               defaultResourcesBasedn="ou=SMC,ou=EspServices,ou=ecl"
               defaultSecurePort="18010"
               type="WsSMC">
    <Authenticate access="Read"
                  description="Root access to SMC service"
                  path="/"
                  required="Read"
                  resource="SmcAccess"/>
    <AuthenticateFeature description="Access to SMC service"
                         path="SmcAccess"
                         resource="SmcAccess"
                         service="ws_smc"/>
    <AuthenticateFeature description="Access to thor queues"
                         path="ThorQueueAccess"
                         resource="ThorQueueAccess"
                         service="ws_smc"/>
    <AuthenticateFeature description="Access to super computer environment"
                         path="ConfigAccess"
                         resource="ConfigAccess"
                         service="ws_config"/>
    <AuthenticateFeature description="Access to DFU"
                         path="DfuAccess"
                         resource="DfuAccess"
                         service="ws_dfu"/>
    <AuthenticateFeature description="Access to DFU XRef"
                         path="DfuXrefAccess"
                         resource="DfuXrefAccess"
                         service="ws_dfuxref"/>
    <AuthenticateFeature description="Access to machine information"
                         path="MachineInfoAccess"
                         resource="MachineInfoAccess"
                         service="ws_machine"/>
    <AuthenticateFeature description="Access to SNMP metrics information"
                         path="MetricsAccess"
                         resource="MetricsAccess"
                         service="ws_machine"/>
    <AuthenticateFeature description="Access to remote execution"
                         path="ExecuteAccess"
                         resource="ExecuteAccess"
                         service="ws_machine"/>
    <AuthenticateFeature description="Access to DFU workunits"
                         path="DfuWorkunitsAccess"
                         resource="DfuWorkunitsAccess"
                         service="ws_fs"/>
    <AuthenticateFeature description="Access to DFU exceptions"
                         path="DfuExceptionsAccess"
                         resource="DfuExceptions"
                         service="ws_fs"/>
    <AuthenticateFeature description="Access to spraying files"
                         path="FileSprayAccess"
                         resource="FileSprayAccess"
                         service="ws_fs"/>
    <AuthenticateFeature description="Access to despraying of files"
                         path="FileDesprayAccess"
                         resource="FileDesprayAccess"
                         service="ws_fs"/>
    <AuthenticateFeature description="Access to dkcing of key files"
                         path="FileDkcAccess"
                         resource="FileDkcAccess"
                         service="ws_fs"/>
    <AuthenticateFeature description="Access to files in dropzone"
                         path="FileIOAccess"
                         resource="FileIOAccess"
                         service="ws_fileio"/>
    <AuthenticateFeature description="Access to WS ECL service"
                         path="WsEclAccess"
                         resource="WsEclAccess"
                         service="ws_ecl"/>
    <AuthenticateFeature description="Access to Roxie queries and files"
                         path="RoxieQueryAccess"
                         resource="RoxieQueryAccess"
                         service="ws_roxiequery"/>
    <AuthenticateFeature description="Access to cluster topology"
                         path="ClusterTopologyAccess"
                         resource="ClusterTopologyAccess"
                         service="ws_topology"/>
    <AuthenticateFeature description="Access to own workunits"
                         path="OwnWorkunitsAccess"
                         resource="OwnWorkunitsAccess"
                         service="ws_workunits"/>
    <AuthenticateFeature description="Access to others&apos; workunits"
                         path="OthersWorkunitsAccess"
                         resource="OthersWorkunitsAccess"
                         service="ws_workunits"/>
    <AuthenticateFeature description="Access to ECL direct service"
                         path="EclDirectAccess"
                         resource="EclDirectAccess"
                         service="ecldirect"/>
    <ProcessFilters>
     <Platform name="Windows">
      <ProcessFilter name="any">
       <Process name="dafilesrv"/>
      </ProcessFilter>
      <ProcessFilter name="AttrServerProcess">
       <Process name="attrserver"/>
      </ProcessFilter>
      <ProcessFilter name="DaliProcess">
       <Process name="daserver"/>
      </ProcessFilter>
      <ProcessFilter multipleInstances="true" name="DfuServerProcess">
       <Process name="dfuserver"/>
      </ProcessFilter>
      <ProcessFilter multipleInstances="true" name="EclCCServerProcess">
       <Process name="eclccserver"/>
      </ProcessFilter>
      <ProcessFilter multipleInstances="true" name="EspProcess">
       <Process name="esp"/>
       <Process name="dafilesrv" remove="true"/>
      </ProcessFilter>
      <ProcessFilter name="FTSlaveProcess">
       <Process name="ftslave"/>
      </ProcessFilter>
      <ProcessFilter name="RoxieServerProcess">
       <Process name="ccd"/>
      </ProcessFilter>
      <ProcessFilter name="RoxieSlaveProcess">
       <Process name="ccd"/>
      </ProcessFilter>
      <ProcessFilter name="SchedulerProcess">
       <Process name="scheduler"/>
      </ProcessFilter>
      <ProcessFilter name="ThorMasterProcess">
       <Process name="thormaster"/>
      </ProcessFilter>
      <ProcessFilter name="ThorSlaveProcess">
       <Process name="thorslave"/>
      </ProcessFilter>
      <ProcessFilter name="SashaServerProcess">
       <Process name="saserver"/>
      </ProcessFilter>
     </Platform>
     <Platform name="Linux">
      <ProcessFilter name="any">
       <Process name="dafilesrv"/>
      </ProcessFilter>
      <ProcessFilter name="AttrServerProcess">
       <Process name="attrserver"/>
      </ProcessFilter>
      <ProcessFilter name="DaliProcess">
       <Process name="daserver"/>
      </ProcessFilter>
      <ProcessFilter multipleInstances="true" name="DfuServerProcess">
       <Process name="."/>
      </ProcessFilter>
      <ProcessFilter multipleInstances="true" name="EclCCServerProcess">
       <Process name="."/>
      </ProcessFilter>
      <ProcessFilter multipleInstances="true" name="EspProcess">
       <Process name="."/>
       <Process name="dafilesrv" remove="true"/>
      </ProcessFilter>
      <ProcessFilter name="FTSlaveProcess">
       <Process name="ftslave"/>
      </ProcessFilter>
      <ProcessFilter name="GenesisServerProcess">
       <Process name="httpd"/>
       <Process name="atftpd"/>
       <Process name="dhcpd"/>
      </ProcessFilter>
      <ProcessFilter name="RoxieServerProcess">
       <Process name="ccd"/>
      </ProcessFilter>
      <ProcessFilter name="RoxieSlaveProcess">
       <Process name="ccd"/>
      </ProcessFilter>
      <ProcessFilter name="SchedulerProcess">
       <Process name="scheduler"/>
      </ProcessFilter>
      <ProcessFilter name="ThorMasterProcess">
       <Process name="thormaster"/>
      </ProcessFilter>
      <ProcessFilter name="ThorSlaveProcess">
       <Process name="thorslave"/>
      </ProcessFilter>
      <ProcessFilter name="SashaServerProcess">
       <Process name="saserver"/>
      </ProcessFilter>
     </Platform>
    </ProcessFilters>
   </Properties>
  </EspService>
  <EspService build="community_3.0.4"
              buildSet="ws_ecl"
              description="WS ECL Service"
              name="myws_ecl">
   <Properties bindingType="ws_eclSoapBinding"
               defaultPort="8002"
               defaultResourcesBasedn="ou=WsEcl,ou=EspServices,ou=ecl"
               defaultSecurePort="18002"
               plugin="ws_ecl"
               type="ws_ecl">
    <Authenticate access="Read"
                  description="Root access to WS ECL service"
                  path="/"
                  required="Read"
                  resource="WsEclAccess"/>
    <AuthenticateFeature description="Access to WS ECL service"
                         path="WsEclAccess"
                         resource="WsEclAccess"
                         service="ws_ecl"/>
   </Properties>
  </EspService>
  <EspService build="community_3.0.4"
              buildSet="ecldirect"
              clusterName="hthor"
              description="ESP service for running raw ECL queries"
              name="myecldirect">
   <Properties bindingType="EclDirectSoapBinding"
               defaultPort="8008"
               defaultResourcesBasedn="ou=EclDirectAccess,ou=EspServices,ou=ecl"
               defaultSecurePort="18008"
               plugin="ecldirect"
               type="ecldirect">
    <Authenticate access="Read"
                  description="Root access to ECL Direct service"
                  path="/"
                  required="Read"
                  resource="EclDirectAccess"/>
    <AuthenticateFeature description="Access to ECL Direct service"
                         path="EclDirectAccess"
                         resource="EclDirectAccess"
                         service="ecldirect"/>
   </Properties>
  </EspService>
  <FTSlaveProcess build="community_3.0.4"
                  buildSet="ftslave"
                  description="FTSlave process"
                  name="myftslave"
                  version="1">
#for $netAddress, $name in $nodes:                    
   <Instance computer="$name"
             directory="/var/lib/HPCCSystems/myftslave"
             name="s_$name"
             netAddress="$netAddress"
             program="/opt/HPCCSystems/bin/ftslave"/>
#end for
  </FTSlaveProcess>
  <PluginProcess build="community_3.0.4"
                 buildSet="plugins_auditlib"
                 description="plugin process"
                 name="myplugins_auditlib"/>
  <PluginProcess build="community_3.0.4"
                 buildSet="plugins_debugservices"
                 description="plugin process"
                 name="myplugins_debugservices"/>
  <PluginProcess build="community_3.0.4"
                 buildSet="plugins_fileservices"
                 description="plugin process"
                 name="myplugins_fileservices"/>
  <PluginProcess build="community_3.0.4"
                 buildSet="plugins_logging"
                 description="plugin process"
                 name="myplugins_logging"/>
  <PluginProcess build="community_3.0.4"
                 buildSet="plugins_parselib"
                 description="plugin process"
                 name="myplugins_parselib"/>
  <PluginProcess build="community_3.0.4"
                 buildSet="plugins_stringlib"
                 description="plugin process"
                 name="myplugins_stringlib"/>
  <PluginProcess build="community_3.0.4"
                 buildSet="plugins_unicodelib"
                 description="plugin process"
                 name="myplugins_unicodelib"/>
  <PluginProcess build="community_3.0.4"
                 buildSet="plugins_workunitservices"
                 description="plugin process"
                 name="myplugins_workunitservices"/>
  <RoxieCluster allowRoxieOnDemand="false"
                baseDataDir="/var/lib/HPCCSystems/hpcc-data/roxie"
                blindLogging="false"
                blobCacheMem="0"
                build="community_3.0.4"
                buildSet="roxie"
                callbackRetries="3"
                callbackTimeout="500"
                channel0onPrimary="true"
                checkCompleted="true"
                checkFileDate="true"
                checkingHeap="0"
                checkPrimaries="true"
                checkState="false"
                checkVersion="true"
                clusterWidth="$len($nodes)"
                copyResources="true"
                crcResources="false"
                cyclicOffset="1"
                dafilesrvLookupTimeout="10000"
                daliServers="mydali"
                debugPermitted="true"
                defaultConcatPreload="0"
                defaultFetchPreload="0"
                defaultFullKeyedJoinPreload="0"
                defaultHighPriorityTimeLimit="0"
                defaultHighPriorityTimeWarning="5000"
                defaultKeyedJoinPreload="0"
                defaultLowPriorityTimeLimit="0"
                defaultLowPriorityTimeWarning="0"
                defaultMemoryLimit="0"
                defaultParallelJoinPreload="0"
                defaultPrefetchProjectPreload="10"
                defaultSLAPriorityTimeLimit="0"
                defaultSLAPriorityTimeWarning="5000"
                defaultStripLeadingWhitespace="1"
                deleteUnneededFiles="false"
                description="Roxie cluster"
                directory="/var/lib/HPCCSystems/myroxie"
                diskReadBufferSize="65536"
                diskReadStable="true"
                doIbytiDelay="true"
                enableForceKeyDiffCopy="false"
                enableHeartBeat="true"
                enableKeyDiff="true"
                enableSNMP="true"
                enableSysLog="true"
                fastLaneQueue="true"
                fieldTranslationEnabled="false"
                flushJHtreeCacheOnOOM="true"
                forceStdLog="false"
                highTimeout="2000"
                ignoreMissingFiles="false"
                indexReadChunkSize="60000"
                indexReadStable="true"
                initIbytiDelay="100"
                jumboFrames="false"
                keyedJoinFlowLimit="1000"
                keyedJoinStable="true"
                lazyOpen="false"
                leafCacheMem="50"
                linuxYield="true"
                localFilesExpire="-1"
                localSlave="false"
                logFullQueries="false"
                logQueueDrop="32"
                logQueueLen="512"
                lowTimeout="10000"
                maxBlockSize="10000000"
                maxLocalFilesOpen="4000"
                maxLockAttempts="5"
                maxRemoteFilesOpen="1000"
                memoryStatsInterval="60"
                memTraceLevel="1"
                memTraceSizeLimit="0"
                minFreeDiskSpace="1073741824"
                minIbytiDelay="0"
                minLocalFilesOpen="2000"
                minRemoteFilesOpen="500"
                miscDebugTraceLevel="0"
                monitorDaliFileServer="false"
                multicastBase="239.1.1.1"
                multicastLast="239.1.254.254"
                name="myroxie"
                nodeCacheMem="100"
                nodeCachePreload="false"
                numChannels="$len($nodes)"
                numDataCopies="2"
                parallelAggregate="0"
                perChannelFlowLimit="10"
                pingInterval="60"
                pluginsPath="/opt/HPCCSystems/plugins"
                preabortIndexReadsThreshold="100"
                preabortKeyedJoinsThreshold="100"
                preferredSubnet=""
                preferredSubnetMask=""
                remoteFilesExpire="3600000"
                resolveFilesInPackage="false"
                roxieMulticastEnabled="true"
                serverSideCacheSize="0"
                serverThreads="30"
                simpleLocalKeyedJoins="true"
                siteCertificate=""
                slaTimeout="2000"
                slaveConfig="cyclic redundancy"
                slaveThreads="30"
                smartSteppingChunkRows="100"
                soapTraceLevel="1"
                socketCheckInterval="5000"
#raw                
                SSHidentityfile="$HOME/.ssh/id_rsa"
#end raw                
                SSHpassword=""
                SSHretries="3"
                SSHtimeout="0"
                SSHusername="hpcc"
                statsExpiryTime="3600"
                syncCluster="false"
                systemMonitorInterval="60000"
                totalMemoryLimit="1073741824"
                traceLevel="1"
                trapTooManyActiveQueries="true"
                udpFlowSocketsSize="131071"
                udpInlineCollation="false"
                udpInlineCollationPacketLimit="50"
                udpLocalWriteSocketSize="131071"
                udpMaxRetryTimedoutReqs="0"
                udpMaxSlotsPerClient="2147483647"
                udpMulticastBufferSize="131071"
                udpOutQsPriority="0"
                udpQueueSize="100"
                udpRequestToSendTimeout="5"
                udpResendEnabled="true"
                udpRetryBusySenders="0"
                udpSendCompletedInData="false"
                udpSendQueueSize="50"
                udpSnifferEnabled="true"
                udpTraceLevel="1"
                useHardLink="false"
                useLogQueue="true"
                useMemoryMappedIndexes="false"
                useRemoteResources="true"
                useTreeCopy="false">
   <RoxieFarmProcess dataDirectory="/var/lib/HPCCSystems/hpcc-data/roxie"
                     listenQueue="200"
                     name="farm1"
                     numThreads="30"
                     port="9876"
                     requestArrayThreads="5">
#for $netAddress, $name in $nodes:                    
    <RoxieServerProcess computer="$name" name="farm1_$name"/>
#end for
   </RoxieFarmProcess>
#for $netAddress, $name in $nodes:                    
   <RoxieServerProcess computer="$name"
                       dataDirectory="/var/lib/HPCCSystems/hpcc-data/roxie"
                       listenQueue="200"
                       name="farm1_s_$name"
                       netAddress="$netAddress"
                       numThreads="30"
                       port="9876"
                       requestArrayThreads="5"/>
#end for
#for $netAddress, $name in $nodes:                    
   <RoxieSlave computer="$name" name="s_$name">
    <RoxieChannel dataDirectory="/var/lib/HPCCSystems/hpcc-data/roxie" number="$random.randint(1,$len($nodes))"/>
    <RoxieChannel dataDirectory="/var/lib/HPCCSystems/hpcc-data2/roxie" number="$random.randint(1,$len($nodes))"/>
   </RoxieSlave>
#end for
#set $channel = 1
#for $netAddress, $name in $nodes:                    
   <RoxieSlaveProcess channel="$channel"
                      computer="$name"
                      dataDirectory="/var/lib/HPCCSystems/hpcc-data/roxie"
                      name="s_$name"
                      netAddress="$netAddress"/>
#set $channel = $channel + 1                      
#end for
  </RoxieCluster>
  <SashaServerProcess autoRestartInterval="0"
                      build="community_3.0.4"
                      buildSet="sasha"
                      cachedWUat="* * * * *"
                      cachedWUinterval="24"
                      cachedWUlimit="100"
                      coalesceAt="* * * * *"
                      coalesceInterval="1"
                      dafsmonAt="* * * * *"
                      dafsmonInterval="0"
                      dafsmonList="*"
                      daliServers="mydali"
                      description="Sasha Server process"
                      DFUrecoveryAt="* * * * *"
                      DFUrecoveryCutoff="4"
                      DFUrecoveryInterval="12"
                      DFUrecoveryLimit="20"
                      DFUWUat="* * * * *"
                      DFUWUcutoff="14"
                      DFUWUduration="0"
                      DFUWUinterval="24"
                      DFUWUlimit="1000"
                      DFUWUthrottle="0"
                      ExpiryAt="* 3 * * *"
                      ExpiryInterval="24"
                      keepResultFiles="false"
                      LDSroot="LDS"
                      logDir="."
                      minDeltaSize="50000"
                      name="mysasha"
                      recoverDeltaErrors="false"
                      thorQMonInterval="1"
                      thorQMonQueues="*"
                      thorQMonSwitchMinTime="0"
                      WUat="* * * * *"
                      WUbackup="0"
                      WUcutoff="8"
                      WUduration="0"
                      WUinterval="6"
                      WUlimit="1000"
                      WUretryinterval="7"
                      WUthrottle="0"
                      xrefAt="* 2 * * *"
                      xrefCutoff="1"
                      xrefEclWatchProvider="true"
                      xrefInterval="0"
                      xrefList="*">
#for $netAddress, $name in $nodes:                    
   <Instance computer="$name"
             directory="/var/lib/HPCCSystems/mysasha"
             name="s_$name"
             netAddress="$netAddress"
             port="8877"/>
#end for             
  </SashaServerProcess>
  <ThorCluster autoCopyBackup="false"
               build="community_3.0.4"
               buildSet="thor"
               computer="$nodes[0][1]"
               daliServers="mydali"
               description="Thor process"
               localThor="false"
               monitorDaliFileServer="true"
               multiSlaves="false"
               name="mythor"
               pluginsPath="/opt/HPCCSystems/plugins/"
               replicateAsync="true"
               replicateOutputs="true"
               slaves="8"
               watchdogEnabled="true"
               watchdogProgressEnabled="true">
   <Debug/>
#raw   
   <SSH SSHidentityfile="$HOME/.ssh/id_rsa"
#end raw   
        SSHpassword=""
        SSHretries="3"
        SSHtimeout="0"
        SSHusername="hpcc"/>
   <Storage/>
   <SwapNode/>
   <ThorMasterProcess computer="$nodes[0][1]" name="m_$nodes[0][1]"/>
#if $len($nodes) > 1:
#for $netAddress, $name in $nodes[1:]:                    
   <ThorSlaveProcess computer="$name" name="s_$name"/>
#end for   
#end if
   <Topology>
    <Node process="m_$nodes[0][1]">
#if $len($nodes) > 1:
#for $netAddress, $name in $nodes[1:]:                    
     <Node process="s_$name"/>
#end for   
#end if
    </Node>
   </Topology>
  </ThorCluster>
  <Topology build="community_3.0.4" buildSet="topology" name="topology">
   <Cluster name="hthor" prefix="hthor">
    <EclAgentProcess process="myeclagent"/>
    <EclCCServerProcess process="myeclccserver"/>
    <EclSchedulerProcess process="myeclscheduler"/>
   </Cluster>
   <Cluster name="thor" prefix="thor">
    <EclAgentProcess process="myeclagent"/>
    <EclCCServerProcess process="myeclccserver"/>
    <EclSchedulerProcess process="myeclscheduler"/>
    <ThorCluster process="mythor"/>
   </Cluster>
   <Cluster name="roxie" prefix="roxie">
    <EclAgentProcess process="myeclagent"/>
    <EclCCServerProcess process="myeclccserver"/>
    <EclSchedulerProcess process="myeclscheduler"/>
    <RoxieCluster process="myroxie"/>
   </Cluster>
  </Topology>
 </Software>
</Environment>
====


Even just scrolling past the file takes a while!  This behemoth of a file was tamed thanks to Cheetah, the Python templating engine; I highly encourage you to read up on it.
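If you haven't used Cheetah before, the idea is simple: the template above references variables like $nodes and uses directives such as #for, and Cheetah substitutes real values at render time.  Here is a minimal, self-contained sketch of that idea in Python; the node list is made up for illustration, and the charm itself does the equivalent through the cheetah command-line tool:

from Cheetah.Template import Template

# A made-up list of ( netAddress, name ) pairs, standing in for the
# $nodes variable that environment.tmpl loops over.
nodes = [("10.0.0.1", "node001"), ("10.0.0.2", "node002")]

# A tiny template using the same #for construct as environment.tmpl.
tmpl = """#for $netAddress, $name in $nodes:
<Instance computer="$name" netAddress="$netAddress"/>
#end for"""

# searchList supplies the values for the $variables in the template.
print(Template(tmpl, searchList=[{"nodes": nodes}]))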

This charm may require some changes to your environments.yaml file in ~/.juju, as hpcc will only run on 64-bit instances.  Make sure that your juju environment has been properly shut down ( juju destroy-environment ) before you edit this file.  Here is my environments.yaml file, showing the important parts to check:
juju: environments

environments:
  sample:
    type: ec2
    access-key: ( removed ... get your own :) )
    secret-key: ( removed ... get your own :) )
    control-bucket: juju-fbb790f292e14a0394353bb4b63a3403
    admin-secret: 604d18a77fd24e3f91e1df398fcbe9f2
The emphasized parts are the important ones; you can copy them from here and paste them into your ~/.juju/environments.yaml file.
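One thing worth double-checking, and this is my own illustration rather than something taken from the file above, is that the environment defaults to 64-bit instances.  With the EC2 provider that is typically done by pinning the default instance type and/or image, along these lines:

environments:
  sample:
    type: ec2
    # ...access/secret keys, control-bucket, admin-secret as above...
    default-instance-type: m1.large    # any 64-bit EC2 instance type
    default-image-id: ami-xxxxxxxx     # a 64-bit Ubuntu server image for your region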
Now, let's take a look at the charm starting with the metadata.yaml file:
name: hpcc
revision: 1
summary: HPCC (High Performance Computing Cluster)
description: |
  HPCC (High Performance Computing Cluster) is a massive 
  parallel-processing computing platform that solves Big Data problems.
provides:
  hpcc:
    interface: hpcc
requires:
  hpcc-thor:
    interface: hpcc-thor
  hpcc-roxie:
    interface: hpcc-roxie
peers:
  hpcc-cluster:
    interface: hpcc-cluster

There are various provides and requires interfaces in this metadata.yaml file, but for now only the peers interface is being used.  I'll work on the other ones as the charm matures.


Let's look at the hpcc-cluster interface, more specifically the hpcc-cluster-relation-changed hook, where the new configuration is created:
#!/bin/bash
# hpcc-cluster-relation-changed: regenerate the HPCC configuration and restart.
CWD=$(dirname $0)
# Render the Cheetah template into /etc/HPCCSystems/ as environment.xml ( --oext=xml ).
cheetah fill --oext=xml --odir=/etc/HPCCSystems/ ${CWD}/../templates/environment.tmpl
# Restart the HPCC services so they pick up the new configuration.
service hpcc-init restart
It's pretty simple, isn't it?  Since the "heavy lifting" is done by the self-contained Cheetah template, we don't have much to do here other than generate the configuration file and restart hpcc.
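For the curious: the template needs the list of cluster nodes, and in a peer relation that information is available through the standard juju hook tools.  A rough, illustrative sketch of collecting peer addresses (not the charm's actual code) looks like this:

#!/bin/bash
# Sketch: gather the private addresses of this unit and its hpcc-cluster peers.
MY_ADDR=$(unit-get private-address)
PEER_ADDRS=""
for unit in $(relation-list); do
    PEER_ADDRS="${PEER_ADDRS} $(relation-get private-address ${unit})"
done
echo "this unit: ${MY_ADDR} peers:${PEER_ADDRS}"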

The other files in this charm are pretty self-explanatory and simple, so I am leaving their details as an exercise for the reader.

All of the complexity of deploying hpcc has been distilled into the following commands:

  • juju bootstrap
  • bzr branch lp:~negronjl/+junk/hpcc
  • juju deploy --repository . hpcc
    • wait a minute or two
  • juju status
    • you should see something similar to this:

negronjl@negronjl-laptop:~/src/juju/charms$ juju status
2011-08-18 16:00:54,413 INFO Connecting to environment.


machines: 
  0: {dns-name: ec2-184-73-109-244.compute-1.amazonaws.com, instance-id: i-6d61460c} 
  1: {dns-name: ec2-50-16-60-94.compute-1.amazonaws.com, instance-id: i-d5694eb4}


services:
  hpcc:
    charm: local:hpcc-1
    relations: {hpcc-cluster: hpcc}
    units:
      hpcc/0:
        machine: 1
        relations: {}
        state: null


2011-08-18 16:00:58,374 INFO 'status' command finished successfully
negronjl@negronjl-laptop:~/src/juju/charms$
The above commands will give you a single-node cluster.

You can access the web interface of your node by pointing your browser to http://<FQDN>:8010, where FQDN is the fully qualified domain name or public IP address of your hpcc instance.  On the left side there should be a menu; explore the items in the Topology section.  The Target Clusters section should look something like this:


To experience the true power of hpcc, you should probably throw some more nodes at it.  Let's do just that with:


  • juju add-unit hpcc 
    • do this as many times as you feel comfortable   
    • Each command will give you a new node in the cluster
    • wait a minute or two and you should see something similar to this:

negronjl@negronjl-laptop:~$ juju status
2011-08-18 16:25:55,739 INFO Connecting to environment.
machines:
  0: {dns-name: ec2-184-73-109-244.compute-1.amazonaws.com, instance-id: i-6d61460c}
  1: {dns-name: ec2-50-16-60-94.compute-1.amazonaws.com, instance-id: i-d5694eb4}
  2: {dns-name: ec2-50-19-181-98.compute-1.amazonaws.com, instance-id: i-a5795ec4}
  3: {dns-name: ec2-184-72-147-67.compute-1.amazonaws.com, instance-id: i-25446344}
services:
  hpcc:
    charm: local:hpcc-1
    relations: {hpcc-cluster: hpcc}
    units:
      hpcc/0:
        machine: 1
        relations:
          hpcc-cluster: {state: up}
        state: started
      hpcc/1:
        machine: 2
        relations:
          hpcc-cluster: {state: up}
        state: started
      hpcc/2:
        machine: 3
        relations:
          hpcc-cluster: {state: up}
        state: started
2011-08-18 16:26:01,837 INFO 'status' command finished successfully
Notice how we now have more hpcc nodes :)  Here is what the web interface could look like:


Again....we have more nodes :)

Now that we have a working cluster, let's try it.  We'll first do the mandatory Hello World in ECL.  It looks something like this (hello.ecl):
Output('Hello world');
We have to compile hello.ecl before we can use it.  We do that by logging into one of the nodes ( I used juju ssh 1 to log on to the first/master node ) and typing the following:
eclcc hello.ecl -o
We run the file just like we would any other binary:
./hello
... and the output is:
ubuntu@ip-10-111-19-210:~$ ./hello
Hello world
ubuntu@ip-10-111-19-210:~$ 
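Just to give a flavor of ECL beyond Hello World, here is a small illustrative snippet of my own (not taken from the HPCC docs) that defines an inline dataset, filters it, and outputs the result; it compiles and runs the same way as hello.ecl:

// A record layout for some sample data
PersonRec := RECORD
    STRING20  name;
    UNSIGNED1 age;
END;

// An inline dataset using that layout
people := DATASET([{'Alice', 30},
                   {'Bob',   25},
                   {'Carol', 41}], PersonRec);

// Output everyone over 28
OUTPUT(people(age > 28));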
There are far more interesting examples in the Learning ECL documentation here.  I highly encourage you to go read through it.

That's it for now.  Feedback is always welcome, of course, so let me know how I'm doing.

-Juan