Since my last post, I have been digging in a little deeper on CoreOS. As a playground, I have set up a cluster of three nodes on Digital Ocean. At the moment, there is just a single HAProxy instance routing internet traffic to a single service hosting static HTML. Still, I thought it might be helpful to describe my learnings for others.
cloud-config
As we learned previously, CoreOS allows us to make some alterations during the provisioning of new nodes. The interface is the cloud-config format, generally provided as the contents of a user-data key during provisioning. All of my nodes were provisioned with the same cloud-config, an example of which you can see here:
#cloud-config

---
coreos:
  update:
    reboot-strategy: off
  etcd2:
    discovery: https://discovery.etcd.io/<REDACTED>
    advertise-client-urls: http://$private_ipv4:2379
    initial-advertise-peer-urls: http://$private_ipv4:2380
    listen-client-urls: http://0.0.0.0:2379,http://0.0.0.0:4001
    listen-peer-urls: http://$private_ipv4:2380
  fleet:
    metadata: "platform=cloud,provider=digitalocean,region=sfo1,disk=ssd"
  units:
    - name: etcd2.service
      command: start
    - name: fleet.service
      command: start
    - name: flanneld.service
      drop-ins:
        - name: 50-network-config.conf
          content: |
            [Service]
            ExecStartPre=/usr/bin/etcdctl set /coreos.com/network/config '{ "Network": "10.1.0.0/16" }'
      command: start
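It can be worth a quick sanity check before handing a file like this to a provider. If I remember the tooling correctly, coreos-cloudinit can validate a config locally (the file name here is just a placeholder):

$ coreos-cloudinit -validate --from-file ./user-data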
Since I’m just getting my feet wet, you will notice that I have set coreos.update.reboot-strategy = off. In a more production-like scenario, we would want our nodes to gracefully reboot for regular updates. I have set the relevant components of my etcd configuration to use private addressing, since my entire cluster lives within a single region of a single provider. This brings us to the list of key/value pairs provided as a single string to the coreos.fleet.metadata attribute.
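Fleet exposes these tags at scheduling time, so units can target them later with a MachineMetadata constraint. Purely as an illustration (this isn’t a unit I’m running), pinning a unit to nodes from this provider would look something like:

[X-Fleet]
MachineMetadata=provider=digitalocean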
The final bit of interest in all of this is the configuration override of the flannel service. Mine is pretty simple, but it ensures that each node of the cluster gets a unique private subnet to use as an overlay network for intra-cluster communication among Docker containers. Hopefully, I am able to demonstrate how this simplifies routing inbound internet traffic.
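A quick way to confirm that each node received its own subnet is to look at the leases flannel records in etcd. The leases below are illustrative rather than copied from my cluster, though 10.1.18.0-24 matches the address we will see registered later:

$ etcdctl ls /coreos.com/network/subnets
/coreos.com/network/subnets/10.1.18.0-24
/coreos.com/network/subnets/10.1.52.0-24
/coreos.com/network/subnets/10.1.77.0-24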
Service Units
Here is an overview of the services currently scheduled on the cluster:
$ fleetctl list-units
UNIT                  MACHINE     ACTIVE  SUB
dumbserve.service     53f94a39    active  running
haproxy-dvc.service   7db9339e    active  exited
haproxy.service       7db9339e    active  running
registrator.service   53f94a39    active  running
registrator.service   7db9339e    active  running
registrator.service   88edf1b7    active  running
You should notice that fleet has scheduled a registrator service on all three nodes. You might also notice that fleet has scheduled both the haproxy and haproxy-dvc services on the same node while scheduling the dumbserve service on a separate node. Let’s dig into these a bit more.
Registrator
The good folks at Weave have provided the community with a nifty “service registration bridge for Docker” called Registrator. This tool is generally configured to watch the Docker API of a node for changes and register those changes to a distributed configuration registry. In my case, this is etcd, which is provided out of the box with CoreOS. Let’s take a look at how I have these scheduled on the cluster.
$ fleetctl cat registrator
[Unit]
Description=registrator service registry bridge
After=flanneld.service
Requires=flanneld.service

[Service]
ExecStartPre=-/usr/bin/docker stop %p
ExecStartPre=-/usr/bin/docker rm %p
ExecStartPre=/usr/bin/docker create --name %p --net=host --volume=/var/run/docker.sock:/tmp/docker.sock:ro gliderlabs/registrator:latest -internal etcd:///services
ExecStart=/usr/bin/docker start -a %p
ExecStop=/usr/bin/docker stop %p

[X-Fleet]
Global=true
You may notice that this service depends on the flannel overlay network being successfully configured. You may also notice that I have chosen to use the -internal option to watch for exposed ports instead of published ones. Next, I have chosen to register all services under the /services prefix. I need to bind mount the Docker socket to watch the API for changes. I use the read-only mode of the volume mount option to ensure the registrator service doesn’t inadvertently disrupt the Docker service itself. Finally, I schedule this service on all nodes of the cluster with the Global=true property in the [X-Fleet] section.
This, along with flannel, allows me to simplify my services by not bothering with port mappings or container linking. Now, whenever we schedule a new service on the cluster, no matter what node it ends up on, its private address and exposed port will be registered in etcd. Let’s take a look.
$ fleetctl ssh haproxy etcdctl ls /services
/services/dumbserve
/services/haproxy
$ fleetctl ssh haproxy etcdctl ls /services/dumbserve
/services/dumbserve/core01:dumbserve:8080
$ fleetctl ssh haproxy etcdctl get /services/dumbserve/core01:dumbserve:8080
10.1.18.13:8080
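For reference, the dumbserve unit itself isn’t reproduced here, but a unit following the same pattern might look roughly like the sketch below. The image name is a placeholder; the only important detail is that the image EXPOSEs port 8080 and publishes nothing, so registrator’s -internal mode records the container’s flannel address:

[Unit]
Description=dumbserve static HTML service
After=flanneld.service
Requires=flanneld.service

[Service]
ExecStartPre=-/usr/bin/docker stop %p
ExecStartPre=-/usr/bin/docker rm %p
# No -p flag: the EXPOSEd port and the container's flannel address are
# what registrator picks up in -internal mode.
ExecStartPre=/usr/bin/docker create --name %p example/dumbserve:latest
ExecStart=/usr/bin/docker start -a %p
ExecStop=/usr/bin/docker stop %p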
HAProxy
HAProxy is a wonderful tool for proxying and load balancing traffic. A more production-like environment would involve a highly available, fault-tolerant setup, which might include tools such as Corosync and/or Pacemaker. For the purposes of this simple demonstration, those are considered out of scope.
My setup includes a very simple data volume container and a service container using the “official” haproxy image from the Docker Hub.
$ fleetctl cat haproxy-dvc
[Unit]
Description=HAProxy Data Volume Container
After=registrator.service
Requires=registrator.service

[Service]
Type=oneshot
RemainAfterExit=yes
ExecStartPre=-/usr/bin/docker create -v /usr/local/etc/haproxy/ --name %p library/haproxy:1.5
ExecStart=/usr/bin/docker inspect %p
$ fleetctl cat haproxy
[Unit]
Description=HAProxy service
After=%p-dvc.service
Requires=%p-dvc.service

[Service]
ExecStartPre=-/usr/bin/docker stop %p
ExecStartPre=-/usr/bin/docker rm %p
ExecStartPre=/usr/bin/docker create --name %p -p 80:80 --volumes-from %p-dvc library/haproxy:1.5
ExecStart=/usr/bin/docker start -a %p
ExecStop=/usr/bin/docker stop %p

[X-Fleet]
MachineOf=%p-dvc.service
Notice that my data volume container depends on the registrator service. I have set Type=oneshot and RemainAfterExit=yes in this unit to inform the scheduler that it does not include a long-lived process, but its success may be critical for some dependent service unit. Indeed, you should note that my haproxy service unit depends on the data volume container, and I ensure that it is scheduled on the same node using the MachineOf property of the [X-Fleet] section.
The final piece of this is to configure our HAProxy service to route traffic according to the contents of our configuration registry. We have already shown how our registrator service will dynamically populate etcd with the appropriate information.
global
    maxconn 256

defaults
    mode http
    timeout connect 5000ms
    timeout client 50000ms
    timeout server 50000ms

frontend http_proxy
    bind *:80
    acl host_ncmga hdr_dom(host) norcalmga.com
    use_backend ncmga if host_ncmga

backend ncmga
    server core01:dumbserve:8080 10.1.18.13:8080 maxconn 32
Improvements
This HAProxy config is currently being written by hand, which is clearly not ideal. The next step in making this cluster fully automated is watching the configured etcd prefix for changes, generating an appropriate HAProxy config, and rescheduling the HAProxy service unit.
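I haven’t settled on tooling yet, but confd is a common fit for this pattern: it watches an etcd prefix and regenerates a file from a template whenever keys change. A rough sketch of what that might look like follows; the paths, the reload command, and the wiring into the haproxy-dvc volume are all assumptions rather than something running on this cluster.

# /etc/confd/conf.d/haproxy.toml -- tells confd which keys to watch and
# which file to render (reload command is a placeholder)
[template]
src = "haproxy.cfg.tmpl"
dest = "/usr/local/etc/haproxy/haproxy.cfg"
keys = ["/services"]
reload_cmd = "/usr/bin/docker restart haproxy"

# /etc/confd/templates/haproxy.cfg.tmpl -- only the backend shown; one
# server line is emitted per key registrator wrote under /services/dumbserve
backend ncmga
{{range gets "/services/dumbserve/*"}}    server {{base .Key}} {{.Value}} maxconn 32
{{end}}

confd would then run in watch mode against the local etcd:

$ confd -backend etcd -node http://127.0.0.1:2379 -watch

Restarting the container is a simplification compared with rescheduling the fleet unit; a sidekick unit driving fleetctl would be closer to the goal stated above.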