Using DNS to query AWS

DNS is hard. Twisted is hard. AWS is easy. :)

At Mozilla Releng we use EC2 a lot. DNS has always been one of our pain points: one always wants to ssh or VNC into a specific VM to debug a problem. On top of that, our Puppet infrastructure requires proper forward and reverse DNS entries to generate an SSL certificate.

Before we switched to PuppetAgain we didn't bother adding VMs to DNS. Instead, we used a script to generate an /etc/hosts-style file to simplify name resolution.

After adding spot instances into the equation we had to switch to a trickier model in which we pre-create EC2 network interfaces, add the corresponding IP addresses to DNS, and tag the interfaces so our AMIs can use that information to set up their hostnames, etc.

This DNS requirement makes some things very inflexible. One has to wait 10-20 minutes for DNS propagation. Even though we can use an API to add new entries, cleaning up old ones has always been tricky.

During one of my 1x1s with catlee, we were brainstorming how to get rid of DNS management and still be able to reach the VMs easily. We came to a simple idea: invent our own DNS server. Yay!

I wrote a simple DNS server using Twisted. It uses boto to query AWS and generate responses. The initial version is pretty simple, has a lot of hard-coded values (like the port, log file, etc.), and has some issues with running boto asynchronously (yay defer.execute()), but it does address some of the issues above.

Some useful examples:

$ dig -p 1253 @localhost bld-linux64-ec2-010.build.releng.use1.mozilla.com
...
;; ANSWER SECTION:
bld-linux64-ec2-010.build.releng.use1.mozilla.com. 600 IN A 10.134.53.24
...

# use wildcards
$ dig -p 1253 @localhost '*-linux64-ec2-010.*.releng.use1.mozilla.com'
...
;; ANSWER SECTION:
tst-linux64-ec2-010.test.releng.use1.mozilla.com. 600 IN A 10.134.57.212
try-linux64-ec2-010.try.releng.use1.mozilla.com. 600 IN A 10.134.64.70
dev-linux64-ec2-010.dev.releng.use1.mozilla.com. 600 IN A 10.134.52.95
bld-linux64-ec2-010.build.releng.use1.mozilla.com. 600 IN A 10.134.53.24
...

# use instance ID
$ dig -p 1253 @localhost i-b462f595
...
;; ANSWER SECTION:
bld-linux64-ec2-158.build.releng.use1.mozilla.com. 600 IN A 10.134.52.9
...

# use tags
$ dig -p 1253 @localhost 'tag:moz-loaned-to=j*,moz-type=tst*'
...
;; ANSWER SECTION:
tst-linux64-ec2-jrmuizel.test.releng.usw2.mozilla.com. 600 IN A 10.132.59.211
...

# do something useful, ping all loaned slaves
$ fping `dig -p 1253 @localhost 'tag:moz-loaned-to=*' +short`
10.134.58.103 is alive
10.134.58.233 is alive
10.132.59.211 is alive
10.134.57.55 is alive
10.134.58.244 is unreachable
10.134.58.8 is unreachable
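
The three query forms above (hostname with optional wildcards, instance ID, and a tag: expression) each have to become an EC2 lookup somewhere in the server. Here is a rough sketch of how that translation could look. This is not the server's actual code: the function name query_to_filters and the FQDN tag are made up for illustration. The only real parts are the filter names, "instance-id" and "tag:<key>", which the EC2 API understands, and the fact that EC2 filter values accept * wildcards, so nothing has to be expanded on our side.

```python
import re

# Loose pattern for EC2 instance IDs such as i-b462f595.
INSTANCE_ID = re.compile(r"^i-[0-9a-f]+$")


def query_to_filters(name):
    """Turn a DNS query name into a dict of EC2 filters.

    Hypothetical helper: the 'FQDN' tag used for hostname lookups is an
    assumption about how instances are tagged, not a fact from this post.
    """
    name = name.rstrip(".")  # resolvers may hand us a trailing dot

    # tag:key1=value1,key2=value2 -> one filter per tag
    if name.startswith("tag:"):
        filters = {}
        for pair in name[len("tag:"):].split(","):
            key, _, value = pair.partition("=")
            filters["tag:" + key] = value
        return filters

    # A bare instance ID
    if INSTANCE_ID.match(name):
        return {"instance-id": name}

    # Anything else is treated as a hostname, wildcards included
    return {"tag:FQDN": name}


if __name__ == "__main__":
    for query in ("i-b462f595",
                  "tag:moz-loaned-to=j*,moz-type=tst*",
                  "*-linux64-ec2-010.*.releng.use1.mozilla.com"):
        print(query, "->", query_to_filters(query))
```

The resulting dict has the shape that boto's filters argument expects, so the resolver would only need to pass it along and turn the matching instances' private IPs into A records.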
