Since my last post, I have been digging in a little deeper on CoreOS. As a playground, I have set up a cluster of three nodes on Digital Ocean. At the moment, there is just a single HAProxy instance routing internet traffic to a single service hosting static HTML. Still, I thought it might be helpful to describe my learnings for others.
cloud-config
As we learned previously, CoreOS allows us to make some alterations during the provisioning of new nodes. The interface is the cloud-config format, generally provided as the contents of a user-data key during provisioning. All of my nodes were provisioned with the same cloud-config, an example of which you can see here:
#cloud-config

---
coreos:
  update:
    reboot-strategy: off
  etcd2:
    discovery: https://discovery.etcd.io/<REDACTED>
    advertise-client-urls: http://$private_ipv4:2379
    initial-advertise-peer-urls: http://$private_ipv4:2380
    listen-client-urls: http://0.0.0.0:2379,http://0.0.0.0:4001
    listen-peer-urls: http://$private_ipv4:2380
  fleet:
    metadata: "platform=cloud,provider=digitalocean,region=sfo1,disk=ssd"
  units:
    - name: etcd2.service
      command: start
    - name: fleet.service
      command: start
    - name: flanneld.service
      drop-ins:
        - name: 50-network-config.conf
          content: |
            [Service]
            ExecStartPre=/usr/bin/etcdctl set /coreos.com/network/config '{ "Network": "10.1.0.0/16" }'
      command: start
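It can be worth a quick sanity check before handing a file like this to a provider. If I remember the tooling correctly, coreos-cloudinit can validate a config locally (the file name here is just a placeholder):

$ coreos-cloudinit -validate --from-file ./user-data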
Since I’m just getting my feet wet, you will notice that I have set coreos.update.reboot-strategy = off. In a more production-like scenario, we would want our nodes to gracefully reboot for regular updates. I have set the relevant components of my etcd configuration to use private addressing, since my entire cluster lives within a single region of a single provider. This brings us to the list of key/value pairs provided as a single string to the coreos.fleet.metadata attribute.
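Fleet exposes these tags at scheduling time, so units can target them later with a MachineMetadata constraint. Purely as an illustration (this isn’t a unit I’m running), pinning a unit to nodes from this provider would look something like:

[X-Fleet]
MachineMetadata=provider=digitalocean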
The final bit of interest in all of this is the configuration override of the flannel service. Mine is pretty simple, but it ensures that each node of the cluster gets a unique private subnet to use as an overlay network for intra-cluster communication among Docker containers. Hopefully, I am able to demonstrate how this simplifies routing inbound internet traffic.
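A quick way to confirm that each node received its own subnet is to look at the leases flannel records in etcd. The leases below are illustrative rather than copied from my cluster, though 10.1.18.0-24 matches the address we will see registered later:

$ etcdctl ls /coreos.com/network/subnets
/coreos.com/network/subnets/10.1.18.0-24
/coreos.com/network/subnets/10.1.52.0-24
/coreos.com/network/subnets/10.1.77.0-24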
Service Units
Here is an overview of the services currently scheduled on the cluster:
$ fleetctl list-units
UNIT                  MACHINE     ACTIVE  SUB
dumbserve.service     53f94a39    active  running
haproxy-dvc.service   7db9339e    active  exited
haproxy.service       7db9339e    active  running
registrator.service   53f94a39    active  running
registrator.service   7db9339e    active  running
registrator.service   88edf1b7    active  running
You should notice that fleet has scheduled a registrator service on all three nodes. You might also notice that fleet has scheduled both the haproxy and haproxy-dvc services on the same node while scheduling the dumbserve service on a separate node. Let’s dig into these a bit more.
Registrator
The good folks at Weave have provided the community with a nifty “service registration bridge for Docker” called Registrator. This tool is generally configured to watch the Docker API of a node for changes and register those changes to a distributed configuration registry. In my case, this is etcd, which is provided out of the box with CoreOS. Let’s take a look at how I have these scheduled on the cluster.
$ fleetctl cat registrator
[Unit]
Description=registrator service registry bridge
After=flanneld.service
Requires=flanneld.service

[Service]
ExecStartPre=-/usr/bin/docker stop %p
ExecStartPre=-/usr/bin/docker rm %p
ExecStartPre=/usr/bin/docker create --name %p --net=host --volume=/var/run/docker.sock:/tmp/docker.sock:ro gliderlabs/registrator:latest -internal etcd:///services
ExecStart=/usr/bin/docker start -a %p
ExecStop=/usr/bin/docker stop %p

[X-Fleet]
Global=true
You may notice that this service depends on the flannel overlay network being successfully configured. You may also notice that I have chosen to use the -internal option to watch for exposed ports instead of published ones. Next, I have chosen to register all services under the /services prefix. I need to bind mount the Docker socket to watch the API for changes. I use the read-only mode of the volume mount option to ensure the registrator service doesn’t inadvertently disrupt the Docker service itself. Finally, I schedule this service on all nodes of the cluster with the Global=true property in the [X-Fleet] section.
This, along with flannel, allows me to simplify my services by not bothering with port mappings or container linking. Now, whenever we schedule a new service on the cluster, no matter what node it ends up on, its private address and exposed port will be registered in etcd. Let’s take a look.
$ fleetctl ssh haproxy etcdctl ls /services
/services/dumbserve
/services/haproxy
$ fleetctl ssh haproxy etcdctl ls /services/dumbserve
/services/dumbserve/core01:dumbserve:8080
$ fleetctl ssh haproxy etcdctl get /services/dumbserve/core01:dumbserve:8080
10.1.18.13:8080
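For reference, the dumbserve unit itself isn’t reproduced here, but a unit following the same pattern might look roughly like the sketch below. The image name is a placeholder; the only important detail is that the image EXPOSEs port 8080 and publishes nothing, so registrator’s -internal mode records the container’s flannel address:

[Unit]
Description=dumbserve static HTML service
After=flanneld.service
Requires=flanneld.service

[Service]
ExecStartPre=-/usr/bin/docker stop %p
ExecStartPre=-/usr/bin/docker rm %p
# No -p flag: the EXPOSEd port and the container's flannel address are
# what registrator picks up in -internal mode.
ExecStartPre=/usr/bin/docker create --name %p example/dumbserve:latest
ExecStart=/usr/bin/docker start -a %p
ExecStop=/usr/bin/docker stop %p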
HAProxy
HAProxy is a wonderful tool for proxying and load balancing traffic. A more production-like environment would involve a highly available, fault-tolerant setup, which might include tools such as Corosync and/or Pacemaker. For the purposes of this simple demonstration, those are considered out of scope.
My setup includes a very simple data volume container and a service container using the “official” haproxy image from the Docker Hub.
$ fleetctl cat haproxy-dvc
[Unit]
Description=HAProxy Data Volume Container
After=registrator.service
Requires=registrator.service

[Service]
Type=oneshot
RemainAfterExit=yes
ExecStartPre=-/usr/bin/docker create -v /usr/local/etc/haproxy/ --name %p library/haproxy:1.5
ExecStart=/usr/bin/docker inspect %p
$ fleetctl cat haproxy
[Unit]
Description=HAProxy service
After=%p-dvc.service
Requires=%p-dvc.service

[Service]
ExecStartPre=-/usr/bin/docker stop %p
ExecStartPre=-/usr/bin/docker rm %p
ExecStartPre=/usr/bin/docker create --name %p -p 80:80 --volumes-from %p-dvc library/haproxy:1.5
ExecStart=/usr/bin/docker start -a %p
ExecStop=/usr/bin/docker stop %p

[X-Fleet]
MachineOf=%p-dvc.service
Notice that my data volume container depends on the registrator service. I have set Type=oneshot and RemainAfterExit=yes in this unit to inform the scheduler that it does not include a long-lived process, but its success may be critical for some dependent service unit. Indeed, you should note that my haproxy service unit depends on the data volume container, and I ensure that it is scheduled on the same node using the MachineOf property of the [X-Fleet] section.
The final piece of this is to configure our HAProxy service to route traffic according to the contents of our configuration registry. We have already shown how our registrator service will dynamically populate etcd with the appropriate information.
global
    maxconn 256

defaults
    mode http
    timeout connect 5000ms
    timeout client 50000ms
    timeout server 50000ms

frontend http_proxy
    bind *:80
    acl host_ncmga hdr_dom(host) norcalmga.com
    use_backend ncmga if host_ncmga

backend ncmga
    server core01:dumbserve:8080 10.1.18.13:8080 maxconn 32
Improvements
This HAProxy config is currently being written by hand, which is clearly not ideal. The next step in making this cluster fully automated is watching the configured etcd prefix for changes, generating an appropriate HAProxy config, and rescheduling the HAProxy service unit.
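I haven’t settled on tooling yet, but confd is a common fit for this pattern: it watches an etcd prefix and regenerates a file from a template whenever keys change. A rough sketch of what that might look like follows; the paths, the reload command, and the wiring into the haproxy-dvc volume are all assumptions rather than something running on this cluster.

# /etc/confd/conf.d/haproxy.toml -- tells confd which keys to watch and
# which file to render (reload command is a placeholder)
[template]
src = "haproxy.cfg.tmpl"
dest = "/usr/local/etc/haproxy/haproxy.cfg"
keys = ["/services"]
reload_cmd = "/usr/bin/docker restart haproxy"

# /etc/confd/templates/haproxy.cfg.tmpl -- only the backend shown; one
# server line is emitted per key registrator wrote under /services/dumbserve
backend ncmga
{{range gets "/services/dumbserve/*"}}    server {{base .Key}} {{.Value}} maxconn 32
{{end}}

confd would then run in watch mode against the local etcd:

$ confd -backend etcd -node http://127.0.0.1:2379 -watch

Restarting the container is a simplification compared with rescheduling the fleet unit; a sidekick unit driving fleetctl would be closer to the goal stated above.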