Git-based config store (ocf-etc)

We currently use config files in the collectively-owned ~staff folder on supernova for some parts of our website. Two examples I’m aware of are:

  • ~staff/lab_status.yaml, which creates an urgent announcement banner on our homepage, usually used to inform people of outages
  • ~staff/staff_hours.yaml, which controls https://www.ocf.berkeley.edu/staff-hours

This approach comes with a number of problems:

  • We don’t have a clear edit history of these files
  • If anyone messes up the syntax (YAML is confusing), it breaks the whole document
  • Concurrent edits aren’t handled well: if two staffers are editing the file at the same time, they might (opaquely) overwrite each other’s changes.

A lot of configuration for our services ends up happening in ocflib, our Python library which is installed on all hosts. But updates to ocflib can take a long time to propagate. Also, changes to ocflib are subject to a “heavier” review process, which we don’t want to deal with for files like the staff-hours config, since those change often without needing to be reviewed.

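Keeping these files in a Git repository would give us an edit history and proper handling of concurrent edits. We can also add validators to ensure that files don’t have syntax errors.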

This idea was brought up by ckuehl, who has seen it used in practice at Yelp (see the chat logs).

Roadmap:

  • Make the srv-configs repository on GitHub
  • (probably) Make a new Unix user to “own” the files, with rw permissions for that user and read-only access for everyone else
  • Write Puppet rules to make sure it’s on all the hosts
  • Write Puppet rules to update it every x seconds (probably a cronjob)
  • Write validators (git hooks? something else?)
  • Rewrite the necessary parts of ocfweb to look for configs in the new location
  • (stretch) Store lab hours in srv-configs

Do you have examples of what you mean by configuration in ocflib?

I wonder if root would serve the purpose…

Can we perhaps use Jenkins to do this part? Maybe something like GitHub push notification -> Jenkins validates and pushes to hypervisors -> Everyone else syncs from hypervisors with cronjob?

Unless you can think of a guaranteed way such that everyone pushing configs will have run the validator before pushing. (People can and do forget to install pre-commit hooks…)
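To make the "Jenkins validates" step a bit more concrete, here is a minimal sketch of a syntax check that a CI job could run over the repo. Everything in it is an assumption for illustration: the names check_files and main are made up, and json.loads is only a stdlib stand-in for the parser. For the YAML configs in this thread you would register a YAML loader (for example PyYAML's yaml.safe_load) under ".yaml" instead.

```python
import json
import sys
from pathlib import Path
from typing import Callable, Dict, Iterable, List

# Map file suffix -> parser that raises an exception on a syntax error.
# json.loads is a stand-in; a YAML loader would be registered the same way.
PARSERS: Dict[str, Callable[[str], object]] = {
    ".json": json.loads,
}


def check_files(paths: Iterable[Path],
                parsers: Dict[str, Callable[[str], object]] = PARSERS) -> List[str]:
    """Return one error message per file that fails to parse."""
    errors = []
    for path in paths:
        parser = parsers.get(path.suffix)
        if parser is None or not path.is_file():
            continue  # no validator registered for this file type
        try:
            parser(path.read_text())
        except Exception as exc:  # any parser error counts as a failure
            errors.append(f"{path}: {exc}")
    return errors


def main(repo_root: str) -> int:
    """Exit status for CI: 0 if every config parses, 1 otherwise."""
    errors = check_files(sorted(Path(repo_root).rglob("*")))
    for line in errors:
        print(line, file=sys.stderr)
    return 1 if errors else 0
```

Because the check runs on the CI side, it doesn't matter whether the person pushing remembered to install a pre-commit hook. The same function could still be reused in a hook for faster feedback.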

I agree, Jenkins seems like the best fit for this. I’m just not sure what the best way is to push the update out after Jenkins has run the validators. Maybe Jenkins can connect to the hypervisors once the build is complete and run a command or script to pull in the changes (as is done for the puppet repo when updating the production environment). That seems reasonable to me. Each server could then sync from the hypervisors with rsync or something similar, perhaps through a DNS record that points to all the hypervisors (not great; a virtual IP would be better, but that would need keepalived or something similar on each hypervisor). Thoughts?

I also don’t really think it should be called srv-configs, but yes, that’s the repo name that the inspiration comes from. Unless you really like the name or something.

What if we have a “stable” branch that can only be pushed to by Jenkins? That branch would be protected from invalid commits. That way we can still have a cronjob that pulls every once in a while.

That sounds great to me. Then you can make sure that it’s valid config, or at least run tests on it, before promoting master to whatever the stable branch is named.
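A rough sketch of what the promotion step on the Jenkins side could look like, assuming the branch is called stable and the remote is origin (both placeholders, as is the promote function itself). The idea is to push exactly the commit that was validated, not whatever master happens to point at by the time the job finishes.

```python
import subprocess
from typing import Callable, List


def promote(commit: str,
            validate: Callable[[], bool],
            run: Callable[[List[str]], object] = subprocess.check_call,
            target: str = "stable",
            remote: str = "origin") -> bool:
    """Push `commit` to the target branch only if validation passes.

    `validate` is whatever check the CI job runs (hypothetical here).
    `run` executes a git command line; it is a parameter so the function
    can be exercised without touching a real repository.
    """
    if not validate():
        return False
    # Without --force, git rejects this push unless it is a fast-forward,
    # so the target branch only ever moves forward along validated commits.
    run(["git", "push", remote, f"{commit}:refs/heads/{target}"])
    return True
```

This only covers the push itself. Restricting who may push to the branch would still have to be configured on the hosting side, and the cronjob on each host would then pull the stable branch.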

Here’s an example of one use case for this: hours.yaml, with a schema:

regular:
  - ['9:00', '18:00']
  - ['9:00', '20:00']
  - ['9:00', '20:00']
  - ['9:00', '20:00']
  - ['9:00', '18:00']
  - ['9:00', '20:00']
  - ['9:00', '20:00']
holidays:
  - date: ['2018-11-21', '2018-11-25']
    reason: Thanksgiving Break
  - date: '2018-12-03'
    reason: Late Lab Opening
    hours: ['10:00', '18:00']
  - date: '2018-12-14'
    reason: Last Day of Finals
    hours: ['10:00', '18:00']
  - date: ['2018-12-15', '2019-01-14']
    reason: Winter Break
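For a sense of how a consumer such as ocfweb might read this file, here is a small Python sketch that looks up the hours for a given day. It rests on several assumptions that the file itself doesn't state: that index 0 of regular is Monday (matching Python's date.weekday()), that date ranges are inclusive, and that a holiday without hours means the lab is closed. The function names are made up, and EXAMPLE is a trimmed stand-in for what a YAML loader would return.

```python
from datetime import date
from typing import Optional, Tuple

# Trimmed stand-in for the parsed hours.yaml.
EXAMPLE = {
    "regular": [["9:00", "18:00"], ["9:00", "20:00"], ["9:00", "20:00"],
                ["9:00", "20:00"], ["9:00", "18:00"], ["9:00", "20:00"],
                ["9:00", "20:00"]],
    "holidays": [
        {"date": ["2018-11-21", "2018-11-25"], "reason": "Thanksgiving Break"},
        {"date": "2018-12-03", "reason": "Late Lab Opening",
         "hours": ["10:00", "18:00"]},
    ],
}


def parse_day(value) -> date:
    """Accept a date object or an ISO 'YYYY-MM-DD' string.

    Some YAML loaders turn unquoted dates into date objects, so both
    forms are handled.
    """
    return value if isinstance(value, date) else date.fromisoformat(value)


def hours_on(config: dict, day: date) -> Optional[Tuple[str, str]]:
    """Return (open, close) for `day`, or None if the lab is closed."""
    for holiday in config["holidays"]:
        when = holiday["date"]
        # A single date is treated as a one-day range.
        start, end = when if isinstance(when, list) else (when, when)
        if parse_day(start) <= day <= parse_day(end):
            hours = holiday.get("hours")
            return tuple(hours) if hours else None  # assumed: no hours = closed
    # Assumed: regular[0] is Monday, as in date.weekday().
    return tuple(config["regular"][day.weekday()])
```

With the example data, hours_on(EXAMPLE, date(2018, 11, 22)) falls inside the Thanksgiving range and returns None, while a day outside every holiday entry falls through to the regular list.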

…and here’s what the JSON schema looks like:

{
    "$schema": "http://json-schema.org/draft-07/schema#",
    "$id": "https://ocf.berkeley.edu/schemas/hours.schema.json",

    "definitions": {
        "time": {
            "type": "string",
            "pattern": "^[012]?[0-9]:[0-9][0-9]$"
        },
        "timerange": {
            "type": "array",
            "items": [
                { "$ref": "#/definitions/time" },
                { "$ref": "#/definitions/time" }
            ]
        },

        "date": {
            "type": "string",
            "pattern": "^(2[0-9]{3}-(0[1-9]|1[0-2])-(0[1-9]|[12][0-9]|3[01]))$"
        },
        "daterange": {
            "type": "array",
            "items": [
                { "$ref": "#/definitions/date" },
                { "$ref": "#/definitions/date" }
            ]
        },

        "holiday": {
            "type": "object",
            "properties": {
                "date": {
                    "oneOf": [
                        { "$ref": "#/definitions/date" },
                        { "$ref": "#/definitions/daterange" }
                    ]
                },
                "reason": { "type": "string" },
                "hours": { "$ref": "#/definitions/timerange" }
            },
            "required": ["date", "reason"],
            "additionalProperties": false
        }
    },

    "type": "object",
    "properties": {
        "regular": {
            "type": "array",
            "minItems": 7,
            "maxItems": 7,
            "items": { "$ref": "#/definitions/timerange" }
        },
        "holidays": {
            "type": "array",
            "items": { "$ref": "#/definitions/holiday" }
        }
    },
    "required": ["regular", "holidays"],
    "additionalProperties": false
}
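To show what these constraints amount to, here is a hand-rolled, stdlib-only Python sketch that checks already-parsed data against the same rules. It is not a JSON Schema implementation; in practice a schema library would do this, and validate_hours is a made-up name. Two deliberate differences, both assumptions on my part: it requires ranges to have exactly two elements (the tuple-style "items" above doesn't enforce a length by itself), and the day part of the date pattern is written as [12][0-9]|3[01] so that the 30th and 31st are accepted.

```python
import re
from typing import Any, List

TIME_RE = re.compile(r"^[012]?[0-9]:[0-9][0-9]$")
DATE_RE = re.compile(r"^(2[0-9]{3}-(0[1-9]|1[0-2])-(0[1-9]|[12][0-9]|3[01]))$")


def _is_pair(value: Any, pattern: "re.Pattern[str]") -> bool:
    """True if value is a two-element list of strings matching pattern."""
    return (isinstance(value, list) and len(value) == 2
            and all(isinstance(v, str) and pattern.fullmatch(v) for v in value))


def validate_hours(config: Any) -> List[str]:
    """Return a list of problems; an empty list means the config is valid."""
    if not isinstance(config, dict):
        return ["top level must be a mapping"]
    errors: List[str] = []
    extra = set(config) - {"regular", "holidays"}
    if extra:
        errors.append(f"unexpected keys: {sorted(map(str, extra))}")

    regular = config.get("regular")
    if not (isinstance(regular, list) and len(regular) == 7):
        errors.append("regular must list exactly 7 time ranges")
    else:
        for i, item in enumerate(regular):
            if not _is_pair(item, TIME_RE):
                errors.append(f"regular[{i}] is not a valid time range")

    holidays = config.get("holidays")
    if not isinstance(holidays, list):
        errors.append("holidays must be a list")
        return errors
    for i, entry in enumerate(holidays):
        where = f"holidays[{i}]"
        if not isinstance(entry, dict):
            errors.append(f"{where} must be a mapping")
            continue
        if set(entry) - {"date", "reason", "hours"}:
            errors.append(f"{where} has unexpected keys")
        when = entry.get("date")
        single = isinstance(when, str) and DATE_RE.fullmatch(when)
        if not (single or _is_pair(when, DATE_RE)):
            errors.append(f"{where}.date is not a date or date range")
        if not isinstance(entry.get("reason"), str):
            errors.append(f"{where}.reason must be a string")
        if "hours" in entry and not _is_pair(entry["hours"], TIME_RE):
            errors.append(f"{where}.hours is not a valid time range")
    return errors
```

One thing this makes visible: dates have to arrive as strings. A YAML loader that converts unquoted dates into date objects would produce values that fail the "type": "string" check, so the dates in the config may need quoting.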

JSON is always annoying to write. We could theoretically write the schema in YAML, though I’m not sure if this is considered bad practice or not. Here’s what it looks like when I put it in a JSON->YAML converter:

"$schema": http://json-schema.org/draft-07/schema#
"$id": https://ocf.berkeley.edu/schemas/hours.schema.json
definitions:
  time:
    type: string
    pattern: "^[012]?[0-9]:[0-9][0-9]$"
  timerange:
    type: array
    items:
    - "$ref": "#/definitions/time"
    - "$ref": "#/definitions/time"
  date:
    type: string
    pattern: "^(2[0-9]{3}-(0[1-9]|1[0-2])-(0[1-9]|[12][0-9]|3[01]))$"
  daterange:
    type: array
    items:
    - "$ref": "#/definitions/date"
    - "$ref": "#/definitions/date"
  holiday:
    type: object
    properties:
      date:
        oneOf:
        - "$ref": "#/definitions/date"
        - "$ref": "#/definitions/daterange"
      reason:
        type: string
      hours:
        "$ref": "#/definitions/timerange"
    required:
    - date
    - reason
    additionalProperties: false
type: object
properties:
  regular:
    type: array
    minItems: 7
    maxItems: 7
    items:
      "$ref": "#/definitions/timerange"
  holidays:
    type: array
    items:
      "$ref": "#/definitions/holiday"
required:
- regular
- holidays
additionalProperties: false