We use Zabbix to monitor our systems at work. It’s a great open source alternative. One of the things I’ve been working on recently is auditing our monitoring system for defunct monitoring points, unmonitored services, and proper triggers and alerts based on our SLA requirements. HAProxy was one of those items.

There are standard monitoring points like PID changes, web interface availability, CPU/Memory usage, etc. But what about monitoring things like MAXCONN and CURCONNS? Turns out there’s a way to get this data from HAProxy using what they call a “stats socket.” This information isn’t found in the haproxy-en.txt file, but in the configuration.txt file. In my installation, it isn’t in /usr/share/doc/haproxy like everything else. I actually found this on the official website. Here’s the interesting bit:

> stats socket <path> [{uid | user} <uid>] [{gid | group} <gid>] [mode <mode>]  
> [level <level>]
> 
> Creates a UNIX socket in stream mode at location <path>. Any previously  
> existing socket will be backed up then replaced. Connections to this socket  
> will return various statistics outputs and even allow some commands to be  
> issued. Please consult section 9.2 "Unix Socket commands" for more details.
> 
> An optional "level" parameter can be specified to restrict the nature of  
> the commands that can be issued on the socket :  
> - "user" is the least privileged level ; only non-sensitive stats can be  
> read, and no change is allowed. It would make sense on systems where it  
> is not easy to restrict access to the socket.
> 
> - "operator" is the default level and fits most common uses. All data can  
> be read, and only non-sensible changes are permitted (eg: clear max  
> counters).
> 
> - "admin" should be used with care, as everything is permitted (eg: clear  
> all counters).
> 
> On platforms which support it, it is possible to restrict access to this  
> socket by specifying numerical IDs after "uid" and "gid", or valid user and  
> group names after the "user" and "group" keywords. It is also possible to  
> restrict permissions on the socket by passing an octal value after the "mode"  
> keyword (same syntax as chmod). Depending on the platform, the permissions on  
> the socket will be inherited from the directory which hosts it, or from the  
> user the process is started with.

Simple enough. Edit your haproxy.cfg and add this into your “global” section:

global
        daemon
        maxconn 100
        quiet
        user haproxy
        group haproxy
        stats socket    /tmp/haproxy

and you should now see a socket set up in /tmp (note in the ls output that the “s” at the beginning of the permission set denotes the file type as a socket):

# ls -lah /tmp/haproxy
srwxr-xr-x 1 root root 0 2010-07-14 12:53 /tmp/haproxy
#

Now we can query HAProxy using this socket for some stats. A great way to do this is using socat. If you don’t have it installed, you can compile from source, or use the package management system for your OS (ex: “apt-get install socat” for Ubuntu).

To query for some stats, you can try the following commands:

# echo "show info" | socat unix-connect:/tmp/haproxy stdio
# echo "show stat" | socat unix-connect:/tmp/haproxy stdio
# echo "show errors" | socat unix-connect:/tmp/haproxy stdio
# echo "show sess" | socat unix-connect:/tmp/haproxy stdio

More information on interacting with HAProxy through the stats socket can be found in section “9.2. Unix Socket commands” of the configuration.txt file I linked to above (it’s the last section in the file).
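For monitoring, a single number is usually more useful than the raw output. Here is a minimal sketch of how the MAXCONN and CURCONNS question from earlier could be answered. It assumes that the "show info" output contains "Maxconn:" and "CurrConns:" lines, and that "show stat" returns CSV with the current session count (scur) in the fifth column; check both against your HAProxy version. The function names are my own and are not part of HAProxy.

```shell
# Hypothetical helpers (the names are not part of HAProxy).
# Both read stats socket output on stdin, so they work without a live socket.

# Print current connections as an integer percentage of maxconn.
# Exits non-zero if no usable Maxconn line was seen.
haproxy_conn_pct() {
    awk -F': ' '
        $1 == "Maxconn"   { max = $2 }
        $1 == "CurrConns" { cur = $2 }
        END {
            if (max > 0) { printf "%d\n", cur * 100 / max }
            else         { exit 1 }
        }'
}

# Print the current session count (scur, assumed to be CSV column 5)
# for one proxy/server pair from "show stat" output.
haproxy_scur() {
    awk -F',' -v px="$1" -v sv="$2" '$1 == px && $2 == sv { print $5 }'
}

# Example usage against the socket configured above
# ("www" and "FRONTEND" are placeholder names):
#   echo "show info" | socat unix-connect:/tmp/haproxy stdio | haproxy_conn_pct
#   echo "show stat" | socat unix-connect:/tmp/haproxy stdio | haproxy_scur www FRONTEND
```

Something like a Zabbix agent UserParameter could then call one of these and trigger on the returned value.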

Varnish has become a popular topic of conversation lately, and rightly so. It is amazing. The Varnish-cache project describes it as simply a high-performance HTTP Accelerator. Outside of its use as a caching layer, it is also an excellent load balancer and proxy.

One of the great things about Varnish is the Varnish Configuration Language. VCL files define the policies for Varnish when handling requests and for caching and are written in a syntax similar to C and Perl. What makes this so amazing is the following:

> When a new configuration is loaded, the varnishd management process translates the VCL code to C and compiles it to a shared object which is then dynamically linked into the server process.

This just blows my mind. Talk about going the extra mile to squeeze out as much performance as you can from your application. It makes the VCL syntax much easier to understand when you can think about it in these terms and know what the application is going to do with it.

I’ve been sifting through VCL files in my free time to accumulate a list of nifty tricks that I might be interested in using in the future. VIM being my editor of choice, I hunted for a syntax file that would help when reading VCL files. I ran across one written by Elan Ruusamäe and it was exactly what I needed. Download the latest version and put it in your ~/.vim/syntax directory. Then make the following additions to your .vimrc:

au BufRead,BufNewFile *.vcl :set ft=vcl
au! Syntax vcl source ~/.vim/syntax/vcl.vim

Now I can enjoy syntax highlighting when ripping through example code like I would most other things I edit with VIM.

[2010-07-12] UPDATE: I just confirmed that this works for the latest mainline.

We use Gitorious at work as our Git front-end. We opted not to use GitHub since we wanted something that we didn’t have to pay for and that was internal to our network. It has a great feature set and does pretty much everything we need and more.

There is a bit of documentation on how to get Gitorious installed (as painful as it may be), but there’s little to no documentation on how to upgrade it once it’s installed. I just went through an upgrade and it turned out to be much easier than I was anticipating.

First, we need to get the latest source:

$ cd ~
$ mkdir ~/gitorious
$ git clone git://gitorious.org/gitorious/mainline.git gitorious

Next, we need to go through and compare all of the configuration files.

$ cd gitorious
$ cd config
$ diff <foo> /path/to/current/gitorious/install/configs/<foo>

Where foo is any specific configuration file in the new Gitorious source and your current install. I found that for the most part I was able to copy my old configs in place and not have to change anything.
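With more than a handful of config files, running diff by hand gets tedious. Here is a small sketch that lists only the files that need attention. The function name is made up for illustration, and the paths in the usage line are the same placeholders as above.

```shell
# Hypothetical helper: report config files that are new or changed
# in the new source tree compared to the current install.
compare_configs() {
    new_dir=$1
    old_dir=$2
    for new_file in "$new_dir"/*; do
        [ -f "$new_file" ] || continue          # skip subdirectories
        name=$(basename "$new_file")
        if [ ! -e "$old_dir/$name" ]; then
            echo "new: $name"
        elif ! cmp -s "$new_file" "$old_dir/$name"; then
            echo "differs: $name"
        fi
    done
}

# Example usage:
#   compare_configs ~/gitorious/config /path/to/current/gitorious/install/configs
```

Anything reported as "differs" is a candidate for a closer look with diff; files that are identical produce no output.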

This step is dependent on your install, but I made a few custom modifications to the source. I did not like how the “Projects” and “Repositories” pages were not listed alphabetically. They seemed to be ordered by descending creation date. Here are the changes if you want to do this as well:

--- app/views/projects/index.html.erb       2010-04-27 10:57:09.109711812 -0400
+++ /var/www/gitorious/app/views/projects/index.html.erb        2010-04-27 10:29:50.799722064 -0400
@@ -25,7 +25,7 @@
 <h1>Projects</h1>

 <ul class="project_list">
-<% @projects.each do |project| -%>
+<% @projects.sort_by { |p| p[:title] }.each do |project| -%>
   <li class="project_list_item">
     <%= render :partial => project, :object => project -%>
   </li>
--- app/views/projects/show.html.erb        2010-04-27 10:57:09.109711812 -0400
+++ /var/www/gitorious/app/views/projects/show.html.erb 2010-04-27 10:29:50.799722064 -0400
@@ -32,7 +32,7 @@
   <%= render_markdown(@project.description, :auto_link) -%>
 </div>

-<% @project.repositories.mainlines.each do |repo| -%>
+<% @project.repositories.mainlines.sort_by { |r| r[:name] }.each do |repo| -%>
   <%= render :partial => "repositories/overview", :locals => {:repository => repo} -%>
 <% end -%>

We also use Redmine at work for issues and project management within our team. One of the features of Redmine is the ability to browse source and do code reviews. In order to do this, Redmine needs access to the repositories. A lot of solutions out there suggest writing cronjobs to clone the source into a place Redmine can read. I found this horrible, since the clone will quickly be out of date unless the cronjob runs more frequently than your developers commit.

My solution is a bit better, and allows for live code browsing. We happen to have Redmine on the same system as Gitorious (both being Ruby on Rails, this worked out nicely). In our Redmine install, I created a symlink to the Gitorious repositories folder (you can use an NFS mount if they are hosted on different machines). That gave Redmine direct access to the source.

The remaining problem is that the Gitorious naming scheme is not human-readable, so we needed a way to tell Redmine which folder to look at for a specific project. We solved this by taking the directory information from the Gitorious database and exposing it in a format that can be pasted into Redmine. Here’s the diff for this feature if you want to use it:

--- app/views/repositories/show.html.erb    2010-04-27 10:57:09.109711812 -0400
+++ /var/www/gitorious/app/views/repositories/show.html.erb     2010-04-27 10:29:50.809717943 -0400
@@ -63,6 +63,9 @@
       <strong><%= t("views.repos.created") %>:</strong>
       <%= @repository.created_at.to_s(:short) -%>
     </li>
+    <li>
+      <strong>Redmine: </strong>/var/www/redmine/repositories/<%= @repository.real_gitdir.to_s() -%>
+    </li>
   </ul>

   <ul class="links">

This will show up in Gitorious in the informational section on the right-hand side when viewing a repository.

Now that you have copied over your configuration files and your personalized code changes, it’s time to put the new source in place. Back up your current install and current database before doing anything. Once that is backed up, delete the old install, and put the new one in place. Copy your repositories directory from the old install into the new one. Copy your public/system directory from your old install to the new one to migrate avatars. Then update the database (this is assuming you are using the production environment):

$ cd /path/to/new/gitorious/install
$ rake db:migrate RAILS_ENV=production

This shouldn’t return any errors, just a list of database changes that were made to accommodate the new install. If that is successful, you can restart your services (we use Apache, so it was just a matter of restarting Apache) and then visit your Gitorious page to make sure nothing is horribly broken. If there are any problems, it is most likely an issue with the version of Ruby, or of the gems you have installed. Review the contents of /path/to/new/gitorious/install/README for information on what versions are required and how to install them.
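The backup of the install directory mentioned before the migration can be sketched roughly like this. The function name is hypothetical, and it only covers the files on disk; the database dump depends on which database you run Gitorious on, so it is left out here.

```shell
# Hypothetical helper: archive an install directory before upgrading.
# Prints the path of the archive it created.
backup_install() {
    src=$1      # e.g. /path/to/current/gitorious/install
    dest=$2     # existing directory that will hold the archive
    stamp=$(date +%Y%m%d-%H%M%S)
    archive="$dest/$(basename "$src")-$stamp.tar.gz"
    # -C keeps the paths inside the archive relative to the parent directory.
    tar -czf "$archive" -C "$(dirname "$src")" "$(basename "$src")" || return 1
    echo "$archive"
}

# Example usage (the destination directory is a placeholder):
#   backup_install /path/to/current/gitorious/install /path/to/backups
```

Since the repositories directory lives inside the install, it ends up in the archive as well, which can make it large.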

HAProxy is a high performance load balancer. It is very light-weight, and free, making it a great option if you are in the market for a load balancer and need to keep your costs down.

Lately we’ve been making a lot of load balancer changes at work to accommodate new systems and services. Even though we have two load balancers running with keepalived taking care of any failover situations, I was thinking about how we go about reloading our configuration files. In the event of a change, the “common” way to get the changes to take effect is to run /etc/init.d/haproxy restart. This is bad for a couple of major reasons:

  1. You are temporarily shutting your load balancer down
  2. You are severing any current connections going through the load balancer

You might say, “if you have two load balancers with keepalived, restarting the service should be fine since keepalived will handle the failover.” This, however, isn’t always true. Keepalived uses advertisements to determine when to fail over. The default advertisement interval is 1 second (configurable in keepalived.conf). The skew time helps to keep everyone from trying to transition at once. It is a number between 0 and 1, based on the formula (256 - priority) / 256. As defined in the RFC, the backup must receive an advertisement from the master within (3 * advert_int) + skew_time seconds. If it doesn’t hear anything from the master in that window, it takes over.
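To put a number on that window, here is a small sketch that applies the two formulas above. The function name is made up, and a priority of 100 is just an example value.

```shell
# Hypothetical helper: seconds of silence from the master before a
# VRRP backup takes over, using the formulas quoted above.
# Usage: master_down_interval <priority> [advert_int]
master_down_interval() {
    awk -v prio="$1" -v advert="${2:-1}" 'BEGIN {
        skew = (256 - prio) / 256              # skew time, between 0 and 1
        printf "%.3f\n", (3 * advert) + skew   # (3 * advert_int) + skew_time
    }'
}

master_down_interval 100    # prints 3.609
```

So with a 1 second interval and a priority of 100, the backup waits roughly 3.6 seconds before taking over.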

Let’s assume you are using the default interval of 1 second. On my test machine, this is the duration of time it takes to restart haproxy:

# time /etc/init.d/haproxy restart
 * Restarting haproxy haproxy
   ...done.

real    0m0.022s
user    0m0.000s
sys     0m0.016s

In this situation, haproxy would restart much faster than your 1 second interval. You could get lucky and happen to restart it just before the check, but luck is not consistent enough to be useful. Also, in very high-traffic situations, you’ll be causing a lot of connection issues. So we cannot rely on keepalived to solve the first problem, and it definitely doesn’t solve the second problem.

After sifting through the haproxy documentation (the text-based documentation in /usr/share/doc/haproxy/haproxy-en.txt.gz on Ubuntu, not the man page), I came across this:

    global
        daemon
        quiet
        nbproc  2
        pidfile /var/run/haproxy-private.pid

    # to stop only those processes among others :
    # kill $(</var/run/haproxy-private.pid)

    # to reload a new configuration with minimal service impact and without
    # breaking existing sessions :
    # haproxy -f haproxy.cfg -p $(</var/run/haproxy-private.pid) -st $(</var/run/haproxy-private.pid)

That last command is the one of interest. The -p asks the process to write down each of its children’s pids to the specified pid file, and the -st specifies a list of pids to send a SIGTERM to after startup. But it does this in an interesting way:

The '-st' and '-sf' command line options are used to inform previously running
processes that a configuration is being reloaded. They will receive the SIGTTOU
signal to ask them to temporarily stop listening to the ports so that the new
process can grab them. If anything wrong happens, the new process will send
them a SIGTTIN to tell them to re-listen to the ports and continue their normal
work. Otherwise, it will either ask them to finish (-sf) their work then softly
exit, or immediately terminate (-st), breaking existing sessions. A typical use
of this allows a configuration reload without service interruption :

 # haproxy -p /var/run/haproxy.pid -sf $(cat /var/run/haproxy.pid)

The end result is a reload of the configuration file that is invisible to the customer. It also solves the second problem! Let’s look at an example of the command and compare the time to our example above:

# time haproxy -f /etc/haproxy.cfg -p /var/run/haproxy.pid -sf $(cat /var/run/haproxy.pid)

real    0m0.018s
user    0m0.000s
sys     0m0.004s

I’ve specified the config file I want to use and the pid file haproxy is currently using. The $(cat /var/run/haproxy.pid) takes the output of cat /var/run/haproxy.pid and passes it to the -sf parameter as a list of pids, which is what it expects. You will notice that the reload is also slightly faster than the restart (0.004s less real time and 0.012s less sys time). That may not seem like much, but if you are dealing with very high volumes of traffic, it can be pretty important. More importantly, we’ve been able to reload the haproxy configuration without dropping any connections and without causing any customer-facing issues.

UPDATE: There is a reload in some of the init.d scripts (I haven’t checked every OS, so this can vary), but it uses the -st option which will break existing sessions, as opposed to using -sf to do a graceful hand-off. You can modify the haproxy_reload() function to use the -sf if you want. I also find it a bit confusing that the documentation uses $(cat /path/to/pidfile) whereas this haproxy_reload() function uses $(<$PIDFILE). Either should work, but really, way to lead by example…
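If you end up writing your own reload wrapper, it may be worth checking the configuration first so a typo never reaches the running load balancer. Here is a minimal sketch. The function name is my own (it is not the haproxy_reload() from the init script), and it relies on haproxy's -c option, which only checks the config file and exits non-zero if it finds an error.

```shell
# Hypothetical wrapper: validate the config, then do a graceful -sf reload.
# Usage: safe_haproxy_reload <config file> <pid file>
safe_haproxy_reload() {
    cfg=$1
    pidfile=$2
    # -c parses the configuration without starting anything.
    if ! haproxy -c -f "$cfg" >/dev/null; then
        echo "config check failed, not reloading" >&2
        return 1
    fi
    # $(cat ...) is deliberately unquoted: the pid file can list several
    # pids (nbproc > 1) and each must be passed to -sf as its own argument.
    haproxy -f "$cfg" -p "$pidfile" -sf $(cat "$pidfile")
}

# Example usage:
#   safe_haproxy_reload /etc/haproxy.cfg /var/run/haproxy.pid
```

If the check fails, the old processes are left untouched and keep serving traffic with the previous configuration.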

The official Motorola Droid 2.1 update is out. You can get it here: ESD81-from-ESD56.

Word in the DroidMod community is that we’ve rooted it already. I’m sure over the next day or so it’s going to be a whirlwind of forum updates and IRC chat to get this thing to the masses as quickly as possible. But it also doesn’t seem like it is doing anything more than what DroidMod 1.0 has already provided. When I have more details, I’ll pass it along.