IP Address Renumbering @ OCF

As our resources and needs grow and change, we’ve decided to move some of our services around. The goal is IP ranges that are easier to reason about, with larger address pools for the types of services we expect to grow.

Status on this project is being tracked in the following Google spreadsheet: https://docs.google.com/spreadsheets/d/1m9-vN2gV4oCOkxh78BnnKUBvty_Z7Bxm8F2eSDyg238/edit#gid=0

As we move to upgrade werewolves to stretch, and possibly migrate all our user-facing web hosting behind a reverse proxy, it may make sense to order these projects such that we renumber IP addresses, then set up the universal reverse proxy, then migrate apphosting to the new server.

I’d still like to see a separate range for DMZ servers (tsunami, death, vampires, segfault).

We haven’t committed the renumbering to code yet. If you want to make a DMZ range, you can edit the document with your proposal. Moreover, if we do a single-entrypoint reverse proxy setup, we can probably minimize the number of hosts we would need to put into such a DMZ.

Thoughts on this LDIF for actually performing the change?

https://i.fluffy.cc/Zssz8797QF9K9Mn7090kqm2b4xzwmjD1.html
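For anyone who can't open the link, here is a rough sketch of how LDIF like this could be generated from a host-to-address mapping. It is not the linked file: the base DN, the use of the RFC 2307 ipHostNumber attribute, and the 192.0.2.x addresses are all placeholder assumptions, and renumber_ldif is a made-up helper name.

```python
import ipaddress


def renumber_ldif(changes, base_dn="ou=Hosts,dc=example,dc=com"):
    """Build LDIF modify records that replace each host's ipHostNumber.

    `changes` maps hostname -> new IP address. The base DN and the
    attribute name are assumptions, not taken from the real directory.
    """
    records = []
    for host, new_ip in sorted(changes.items()):
        # Fail early on a malformed address (raises ValueError).
        ipaddress.ip_address(new_ip)
        records.append(
            f"dn: cn={host},{base_dn}\n"
            "changetype: modify\n"
            "replace: ipHostNumber\n"
            f"ipHostNumber: {new_ip}\n"
            "-\n"
        )
    # A blank line separates consecutive LDIF records.
    return "\n".join(records)


if __name__ == "__main__":
    # Placeholder addresses from the 192.0.2.0/24 documentation range.
    print(renumber_ldif({"tsunami": "192.0.2.10", "death": "192.0.2.11"}))
```

Generating the file from a single mapping would at least keep the LDIF and the spreadsheet from drifting apart, assuming the spreadsheet can be exported into that mapping.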

Would we do this all at once? How are you planning on updating DNS with this? My concern is that we update things, and then they all break the next time we restart them (unless we have some way to change the IP on them while they are running, which comes with its own set of problems). I think we should also figure out what has strict IP dependencies (iptables firewall rules, NFS access, and RT access come to mind currently, but I’m sure there are more).
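One way to start on the "what has strict IP dependencies" question is to scan config and code checkouts for hard-coded addresses in the affected range. The sketch below is only an illustration: the 192.0.2.0/24 network is a placeholder for the real range, and find_ip_literals and scan_tree are hypothetical helper names.

```python
import ipaddress
import re
from pathlib import Path

# Dotted-quad candidates that are not part of a longer dotted number.
IPV4_CANDIDATE = re.compile(r"(?<![\d.])\d{1,3}(?:\.\d{1,3}){3}(?!\.?\d)")


def find_ip_literals(text, network):
    """Return (line_number, address) for each IPv4 literal inside `network`."""
    net = ipaddress.ip_network(network)
    hits = []
    for lineno, line in enumerate(text.splitlines(), start=1):
        for match in IPV4_CANDIDATE.finditer(line):
            try:
                addr = ipaddress.ip_address(match.group())
            except ValueError:
                continue  # looked like an address but is not one (e.g. 999.1.1.1)
            if addr in net:
                hits.append((lineno, str(addr)))
    return hits


def scan_tree(root, network):
    """Yield (path, line_number, address) for every readable file under `root`."""
    for path in Path(root).rglob("*"):
        if not path.is_file():
            continue
        try:
            text = path.read_text(errors="ignore")
        except OSError:
            continue  # unreadable file; skip it
        for lineno, addr in find_ip_literals(text, network):
            yield path, lineno, addr


if __name__ == "__main__":
    # Placeholder network; substitute the range being renumbered.
    for path, lineno, addr in scan_tree(".", "192.0.2.0/24"):
        print(f"{path}:{lineno}: {addr}")
```

This only catches literal addresses in files. Anything held in running state (loaded firewall rules, live NFS exports) would still need to be checked on the hosts themselves.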

I don’t think it would be very easy to renumber things incrementally because the ranges conflict with one another. We’d probably want to do this the next time we need to restart services anyways to avoid any unnecessary downtime, and most of the external firewall rules already operate on a DNS basis so they shouldn’t need too much meddling to fix. Same with NFS. RT would need fixing as well, but that’s also not difficult to do. We should also review the codebase to see where else we have IP dependencies and try to move them to CNAME-based ones instead, if possible.
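To make the "ranges conflict" point concrete, a quick check like the following could list which moves would land on an address another host still holds. The hostnames are borrowed from this thread, but every address is a made-up placeholder and find_conflicts is a hypothetical helper.

```python
def find_conflicts(current, planned):
    """Return (host, new_ip, holder) for each move whose target is still in use.

    `current` and `planned` both map hostname -> IP address.
    """
    owner = {ip: host for host, ip in current.items()}
    conflicts = []
    for host, new_ip in sorted(planned.items()):
        holder = owner.get(new_ip)
        # A host keeping its own address is not a conflict.
        if holder is not None and holder != host:
            conflicts.append((host, new_ip, holder))
    return conflicts


if __name__ == "__main__":
    # Placeholder addresses; the real ones live in the spreadsheet.
    current = {"tsunami": "192.0.2.10", "death": "192.0.2.20", "vampires": "192.0.2.30"}
    planned = {"tsunami": "192.0.2.20", "death": "192.0.2.10", "vampires": "192.0.2.40"}
    for host, new_ip, holder in find_conflicts(current, planned):
        print(f"{host} -> {new_ip} is still held by {holder}")
```

If this comes back with a long list, that supports doing the change in one maintenance window. If it comes back short, some hosts might be movable ahead of time.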

A small nit: I’d like to move the Marathon master VMs closer to the hozer range, in anticipation of them being decommissioned in favor of Kubernetes masters (which we should allocate here as well).

This entire range allocation may need to be re-imagined in an OCF with RFC1918 addressing for most internal-only services.
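If we go that way, a small helper like this could check whether a proposed address falls in one of the three RFC 1918 blocks. It spells the blocks out rather than using ipaddress's is_private, which also covers other reserved ranges such as loopback and documentation networks. is_rfc1918 is a hypothetical name.

```python
import ipaddress

# The three private IPv4 blocks defined by RFC 1918.
RFC1918_NETWORKS = [
    ipaddress.ip_network("10.0.0.0/8"),
    ipaddress.ip_network("172.16.0.0/12"),
    ipaddress.ip_network("192.168.0.0/16"),
]


def is_rfc1918(addr):
    """Return True if `addr` lies inside one of the RFC 1918 private blocks."""
    ip = ipaddress.ip_address(addr)
    return any(ip in net for net in RFC1918_NETWORKS)


if __name__ == "__main__":
    for candidate in ("10.1.2.3", "172.32.0.1", "192.168.1.1", "8.8.8.8"):
        print(candidate, is_rfc1918(candidate))
```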