Diffstat (limited to 'presentations/repository-structure.txt')
-rw-r--r-- | presentations/repository-structure.txt | 111 |
1 files changed, 111 insertions, 0 deletions
diff --git a/presentations/repository-structure.txt b/presentations/repository-structure.txt
new file mode 100644
index 00000000..327663b7
--- /dev/null
+++ b/presentations/repository-structure.txt
@@ -0,0 +1,111 @@
+$Id$
+
+Copyright (C) 2007--2008 American Registry for Internet Numbers ("ARIN")
+
+Permission to use, copy, modify, and distribute this software for any
+purpose with or without fee is hereby granted, provided that the above
+copyright notice and this permission notice appear in all copies.
+
+THE SOFTWARE IS PROVIDED "AS IS" AND ARIN DISCLAIMS ALL WARRANTIES WITH
+REGARD TO THIS SOFTWARE INCLUDING ALL IMPLIED WARRANTIES OF MERCHANTABILITY
+AND FITNESS. IN NO EVENT SHALL ARIN BE LIABLE FOR ANY SPECIAL, DIRECT,
+INDIRECT, OR CONSEQUENTIAL DAMAGES OR ANY DAMAGES WHATSOEVER RESULTING FROM
+LOSS OF USE, DATA OR PROFITS, WHETHER IN AN ACTION OF CONTRACT, NEGLIGENCE
+OR OTHER TORTIOUS ACTION, ARISING OUT OF OR IN CONNECTION WITH THE USE OR
+PERFORMANCE OF THIS SOFTWARE.
+
+
+This note is an attempt to write up the conclusions that several of us
+came to at the RPKI meeting in Prague, March 2007.
+
+We currently have two examples of possible repository structure,
+neither of which is quite what at least some of us think we want yet.
+
+APNIC's hierarchy of directories named by the g(SKI) function applied
+to the public keys has some nice properties: in particular, the
+hierarchical structure (subjects being stored underneath their issuers
+when sharing a repository) has very nice scaling properties with
+rsync. With this structure, a top-down walk such as rcynic uses will
+pick up an entire tree as a side effect of fetching the top-most
+issuer's SIA collection. Combined with a trivial memory cache to let
+rcynic keep track of which URIs it has already fetched, this sped up
+rcynic's runtime by nearly two orders of magnitude in testing late
+last year. The reason for this turns out to be simple: rsync is a
+reasonably fast protocol, so one rsync connection to fetch an SIA
+collection that hasn't changed takes approximately 500ms on not
+terribly exciting hardware -- but in a system with more than 40,000
+objects, the cumulative total of all those half-second rsync
+connections becomes significant.
+
+There is, however, one problem with the current APNIC structure: the
+g(SKI) naming scheme means that the URIs associated with every
+descendant of a particular node in the tree will change when that
+node's key changes. This means that a key rollover event near the
+root of the tree (eg, when APNIC's own key changes) will require a
+painfully large number of certificates to be reissued. NB: only one
+key is rolling in this scenario; the rest of the certificates are
+being reissued with the same key, and the only reason why they need to
+be reissued at all is that the path has changed.
+
+At the same time, the g(SKI) naming scheme has good properties for
+naming certificates and CRLs, as it means that we can perform "make
+before break" rollovers, leaving the old certificate path in place
+until the new one is fully operational.
+
+So the goal here is to preserve the nice hierarchical structure of
+APNIC's model, and preserve the nice properties of the g(SKI) naming
+for certificates and CRLs, while avoiding the need to reissue every
+descendant after a key rollover. We think that we can do this by using
+static names for the directories and g(SKI)-based names for the files.
+
+This immediately begs the question of what the static names should be.
+Some of us were tempted to make them organization names, but others of
+us felt strongly that they should be completely meaningless, as they
+only need to be unique within a particular parent directory and as we
+really want to avoid all the "identity" issues that we have thus far
+carefully avoided. We did not achieve consensus on this point, but I
+(sra) am firmly in the meaningless name camp on this one.
+
+The upshot of all this is that I expect to be using URIs like:
+
+  rsync://hostname/g0001/g0002/g0003/
+  rsync://hostname/g0001/g0002/g0003/g(ski).cer
+  rsync://hostname/g0001/g0002/g0003/g(ski).crl
+
+where "gNNNN" indicates a generated symbol with no significance other
+than that it is unique enough to avoid collisions. In this particular
+case, the scope in which the symbol must be unique is just the
+directory in which it appears.
+
+We haven't yet discussed how to name ROAs, but since we expect ROAs to
+be tied to particular single-use EE certs, my guess is that a g(SKI)
+name based on the key of the EE cert would be appropriate, so either
+of the following forms would probably work:
+
+  rsync://hostname/g0001/g0002/g0003/g0004/g(ski).roa
+  rsync://hostname/g0001/g0002/g0003/g(ski)/g(ski).roa
+
+The latter violates the rule of only using g(SKI) names for files,
+never for directories, but in this case it'd probably be ok.
+
+This assumes that we're generating a new key for each ROA; if we're
+reusing keys (dunno), we'd need a naming scheme like:
+
+  rsync://hostname/g0001/g0002/g0003/g(ski)/g0004.roa
+
+For those unfamiliar with the notation (borrowed from Lisp): g0001 etc
+are just "gensym" symbols, ie, the output of some function whose sole
+purpose is to generate meaningless symbols.
+
+See images/repository-structure.pdf for an illustration of the problem
+and solution.
+
+There may be compromise-driven rollover cases in which we will need to
+reissue all of the children of a node whose key has been compromised.
+Whether or not this is necessary depends on whether the master copy of
+the authoritative data is safe somewhere else; if it is, and the
+resource certificates are just a signed representation of an
+authoritative database that has not been compromised, reissuing all of
+the descendants may not be necessary, but if the resource certificates
+-are- the database, and one level in it has been compromised, it's
+probably advisable to reissue all the descendants.
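
The naming scheme proposed in the note (static gensym names for
directories, g(SKI)-based names for files) can be illustrated with a
minimal Python sketch. Everything in it is an assumption made for
illustration: the note does not define g(), so g_ski() below simply
takes it to be the SHA-1 hash of the key bytes in URL-safe Base64
without padding; the helper names gensym() and object_uri(), the host
"hostname", and the placeholder key bytes are all hypothetical. A real
SKI is computed over the certificate's subjectPublicKey rather than
over arbitrary bytes.

```python
import base64
import hashlib
import itertools


def g_ski(public_key: bytes) -> str:
    """Return a file-name-safe label derived from a public key.

    Assumption: g(SKI) is taken to be the SHA-1 hash of the key bytes,
    encoded as URL-safe Base64 with the trailing padding removed.
    """
    digest = hashlib.sha1(public_key).digest()  # 20-byte key identifier
    return base64.urlsafe_b64encode(digest).decode("ascii").rstrip("=")


def gensym(prefix: str = "g"):
    """Yield meaningless sequential symbols: g0001, g0002, ..."""
    for n in itertools.count(1):
        yield f"{prefix}{n:04d}"


def object_uri(host: str, directories: list[str], public_key: bytes, suffix: str) -> str:
    """Build an rsync URI with static directory names and a g(SKI) file name."""
    path = "/".join(directories)
    return f"rsync://{host}/{path}/{g_ski(public_key)}.{suffix}"


# Hypothetical example data: three nested publication points and two keys.
symbols = gensym()
directories = [next(symbols) for _ in range(3)]  # g0001, g0002, g0003
old_key = b"placeholder bytes for the old public key"
new_key = b"placeholder bytes for the new public key"

old_uri = object_uri("hostname", directories, old_key, "cer")
new_uri = object_uri("hostname", directories, new_key, "cer")

# After a rollover only the file name changes; the directory part of the
# URI, which everything stored underneath depends on, stays the same.
print(old_uri.rsplit("/", 1)[0] == new_uri.rsplit("/", 1)[0])  # True
```

Because the old and new certificates get different file names inside
the same directory, both can be published side by side during a "make
before break" rollover, and the URIs of objects further down the tree
do not change.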