Diffstat (limited to 'presentations/repository-structure.txt')
-rw-r--r-- presentations/repository-structure.txt | 111
1 files changed, 111 insertions, 0 deletions
diff --git a/presentations/repository-structure.txt b/presentations/repository-structure.txt
new file mode 100644
index 00000000..327663b7
--- /dev/null
+++ b/presentations/repository-structure.txt
@@ -0,0 +1,111 @@
+$Id$
+
+Copyright (C) 2007--2008 American Registry for Internet Numbers ("ARIN")
+
+Permission to use, copy, modify, and distribute this software for any
+purpose with or without fee is hereby granted, provided that the above
+copyright notice and this permission notice appear in all copies.
+
+THE SOFTWARE IS PROVIDED "AS IS" AND ARIN DISCLAIMS ALL WARRANTIES WITH
+REGARD TO THIS SOFTWARE INCLUDING ALL IMPLIED WARRANTIES OF MERCHANTABILITY
+AND FITNESS. IN NO EVENT SHALL ARIN BE LIABLE FOR ANY SPECIAL, DIRECT,
+INDIRECT, OR CONSEQUENTIAL DAMAGES OR ANY DAMAGES WHATSOEVER RESULTING FROM
+LOSS OF USE, DATA OR PROFITS, WHETHER IN AN ACTION OF CONTRACT, NEGLIGENCE
+OR OTHER TORTIOUS ACTION, ARISING OUT OF OR IN CONNECTION WITH THE USE OR
+PERFORMANCE OF THIS SOFTWARE.
+
+
+This note is an attempt to write up the conclusions that several of us
+came to at the RPKI meeting in Prague, March 2007.
+
+We currently have two examples of possible repository structure,
+neither of which is quite what at least some of us think we want yet.
+
+APNIC's hierarchy of directories named by the g(SKI) function applied
+to the public keys has some nice properties: in particular, the
+hierarchical structure (subjects being stored underneath their issuers
+when sharing a repository) has very nice scaling properties with
+rsync. With this structure, a top-down walk such as rcynic uses will
+pick up an entire tree as a side effect of fetching the top-most
+issuer's SIA collection. Combined with a trivial memory cache to let
+rcynic keep track of which URIs it has already fetched, this sped up
+rcynic's runtime by nearly two orders of magnitude in testing late
+last year. The reason for this turns out to be simple: rsync is a
+reasonably fast protocol, so one rsync connection to fetch an SIA
+collection that hasn't changed takes approximately 500ms on not
+terribly exciting hardware -- but in a system with more than 40,000
+objects, the cumulative total of all those half-second rsync
+connections becomes significant.
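+
+As a rough illustration of the arithmetic above, here is a minimal
+Python sketch. It is not rcynic's code: count_connections and the
+example URIs are hypothetical, and it assumes the worst case of one
+rsync connection per object. It counts connections for a top-down
+walk with and without a memory cache of already-fetched collections.
+
```python
RSYNC_CONNECTION_SECONDS = 0.5  # approximate per-connection cost quoted above


def count_connections(sia_uris, use_cache):
    """Count the rsync connections needed to visit SIA URIs in walk order."""
    fetched = []  # the "memory cache": collection URIs already fetched
    connections = 0
    for uri in sia_uris:
        # A recursive fetch of a parent collection already brought this one in.
        if use_cache and any(uri.startswith(parent) for parent in fetched):
            continue
        connections += 1
        fetched.append(uri)
    return connections


# Hypothetical top-down walk over a small tree; every URI ends in "/".
walk = [
    "rsync://hostname/a/",
    "rsync://hostname/a/b/",
    "rsync://hostname/a/b/c/",
    "rsync://hostname/a/d/",
]

print(count_connections(walk, use_cache=False))  # 4
print(count_connections(walk, use_cache=True))   # 1

# Assumed worst case: one connection per object in a 40,000 object system.
print(40_000 * RSYNC_CONNECTION_SECONDS / 3600, "hours")
```
+
+Under that assumption, 40,000 connections at half a second each come
+to 20,000 seconds, or about five and a half hours, which is why
+collapsing a whole subtree into one connection matters so much.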
+
+There is, however, one problem with the current APNIC structure: the
+g(SKI) naming scheme means that the URIs associated with every
+descendant of a particular node in the tree will change when that
+node's key changes. This means that a key rollover event near the
+root of the tree (eg, when APNIC's own key changes) will require a
+painfully large number of certificates to be reissued. NB: only one
+key is rolling in this scenario; the rest of the certificates are
+being reissued with their existing keys, and the only reason they need
+to be reissued at all is that their paths have changed.
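+
+To make the problem concrete, here is a minimal Python sketch, not
+taken from any existing implementation. This note does not define
+g(), so the sketch assumes it is the URL-safe Base64 encoding of a
+SHA-1 hash of the public key with the padding removed. The byte
+strings are hypothetical stand-ins for real public keys, and
+ski_named_uri is a name invented for this example.
+
```python
import base64
import hashlib


def g_ski(public_key: bytes) -> str:
    """Assumed g(SKI): URL-safe Base64 of the SHA-1 key hash, padding removed."""
    digest = hashlib.sha1(public_key).digest()
    return base64.urlsafe_b64encode(digest).decode("ascii").rstrip("=")


def ski_named_uri(key_chain):
    """Build a directory URI whose every path component is g(SKI) of a key."""
    return "rsync://hostname/" + "/".join(g_ski(key) for key in key_chain) + "/"


# Hypothetical keys for an issuer, its child, and its grandchild.
keys = [b"top-level key", b"child key", b"grandchild key"]
before = ski_named_uri(keys)

# Only the top-level key rolls; the two keys below it stay the same.
after = ski_named_uri([b"new top-level key"] + keys[1:])

print(before)
print(after)
print(before == after)  # False: the grandchild's URI moved anyway
```
+
+The last two path components are identical in both URIs, yet the
+full URI differs, so anything that embeds it has to be reissued.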
+
+At the same time, the g(SKI) naming scheme has good properties for
+naming certificates and CRLs, as it means that we can perform "make
+before break" rollovers, leaving the old certificate path in place
+until the new one is fully operational.
+
+So the goal here is to preserve the nice hierarchical structure of
+APNIC's model, and preserve the nice properties of the g(SKI) naming
+for certificates and CRLs, while avoiding the need to reissue every
+descendant after a key rollover. We think that we can do this by using
+static names for the directories and g(SKI)-based names for the files.
+
+This immediately raises the question of what the static names should be.
+Some of us were tempted to make them organization names, but others of
+us felt strongly that they should be completely meaningless, as they
+only need to be unique within a particular parent directory and as we
+really want to avoid all the "identity" issues that we have thus far
+carefully avoided. We did not achieve consensus on this point, but I
+(sra) am firmly in the meaningless name camp on this one.
+
+The upshot of all this is that I expect to be using URIs like:
+
+ rsync://hostname/g0001/g0002/g0003/
+ rsync://hostname/g0001/g0002/g0003/g(ski).cer
+ rsync://hostname/g0001/g0002/g0003/g(ski).crl
+
+where "gNNNN" indicates a generated symbol with no significance other
+than that it is unique enough to avoid collisions. In this particular
+case, the scope in which the symbol must be unique is just the
+directory in which it appears.
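+
+A minimal Python sketch of this layout follows, under stated
+assumptions: g() is taken to be the URL-safe Base64 encoding of a
+SHA-1 hash of the public key with the padding removed (this note does
+not define it), the byte strings stand in for real public keys, and
+publication_uris is a hypothetical helper, not an existing API.
+
```python
import base64
import hashlib


def g_ski(public_key: bytes) -> str:
    """Assumed g(SKI): URL-safe Base64 of the SHA-1 key hash, padding removed."""
    digest = hashlib.sha1(public_key).digest()
    return base64.urlsafe_b64encode(digest).decode("ascii").rstrip("=")


def publication_uris(directory_names, public_key):
    """Static names for the directories, g(SKI)-based names for the files."""
    directory = "rsync://hostname/" + "/".join(directory_names) + "/"
    name = g_ski(public_key)
    return {
        "directory": directory,
        "certificate": directory + name + ".cer",
        "crl": directory + name + ".crl",
    }


dirs = ["g0001", "g0002", "g0003"]

# A rollover changes the file names but leaves the directory alone.
old = publication_uris(dirs, b"old key")
new = publication_uris(dirs, b"new key")

print(old["certificate"])
print(new["certificate"])
print(old["directory"] == new["directory"])  # True
```
+
+Because the old and new file names differ while the directory stays
+put, both can sit side by side during a "make before break" rollover,
+and nothing underneath the directory needs a new URI.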
+
+We haven't yet discussed how to name ROAs, but since we expect ROAs to
+be tied to particular single-use EE certs, my guess is that a g(SKI)
+name based on the key of the EE cert would be appropriate, so either
+of the following forms would probably work:
+
+ rsync://hostname/g0001/g0002/g0003/g0004/g(ski).roa
+ rsync://hostname/g0001/g0002/g0003/g(ski)/g(ski).roa
+
+The latter violates the rule of only using g(SKI) names for files,
+never for directories, but in this case it'd probably be ok.
+
+This assumes that we're generating a new key for each ROA; if we're
+reusing keys (dunno), we'd need a naming scheme like:
+
+ rsync://hostname/g0001/g0002/g0003/g(ski)/g0004.roa
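+
+The three candidate forms can be laid side by side with a small
+Python sketch. The function and its parameter names are hypothetical,
+and it assumes that both g(ski) components in the second form come
+from the same EE key, which this note does not actually settle.
+
```python
def roa_uri_candidates(publication_dir, ee_key_name, gensym_name):
    """Return the three ROA naming forms discussed above.

    publication_dir: directory URI ending in "/"
    ee_key_name:     the g(SKI) string for the EE certificate's key
    gensym_name:     a meaningless generated symbol such as "g0004"
    """
    return {
        # New key per ROA, gensym directory, g(SKI) file name.
        "gensym_directory": f"{publication_dir}{gensym_name}/{ee_key_name}.roa",
        # New key per ROA, g(SKI) used for directory and file (assumed same key).
        "ski_directory": f"{publication_dir}{ee_key_name}/{ee_key_name}.roa",
        # Reused key: g(SKI) directory, one gensym file name per ROA.
        "reused_key": f"{publication_dir}{ee_key_name}/{gensym_name}.roa",
    }


forms = roa_uri_candidates("rsync://hostname/g0001/g0002/g0003/", "g(ski)", "g0004")
for label, uri in forms.items():
    print(label, uri)
```
+
+Passing the literal string "g(ski)" as the key name reproduces the
+three example URIs given above.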
+
+For those unfamiliar with the notation (borrowed from Lisp): g0001 etc
+are just "gensym" symbols, ie, the output of some function whose sole
+purpose is to generate meaningless symbols.
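+
+For readers who have not met gensym before, a minimal Python sketch
+of such a function follows. make_gensym is a hypothetical name, and
+the "g" prefix and four-digit width simply mirror the examples in
+this note; any scheme that avoids collisions would do.
+
```python
import itertools


def make_gensym(prefix="g", width=4):
    """Return a function that hands out a fresh meaningless symbol per call."""
    counter = itertools.count(1)

    def gensym():
        return f"{prefix}{next(counter):0{width}d}"

    return gensym


# One generator per parent directory is enough, since a symbol only
# has to be unique within the directory in which it appears.
new_name = make_gensym()
first = new_name()
second = new_name()
print(first, second)  # g0001 g0002
```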
+
+See images/repository-structure.pdf for an illustration of the problem
+and solution.
+
+There may be compromise-driven rollover cases in which we will need to
+reissue all of the children of a node whose key has been compromised.
+Whether or not this is necessary depends on whether the master copy of
+the authoritative data is safe somewhere else; if it is, and the
+resource certificates are just a signed representation of an
+authoritative database that has not been compromised, reissuing all of
+the descendants may not be necessary, but if the resource certificates
+-are- the database, and one level in it has been compromised, it's
+probably advisable to reissue all the descendants.