-rw-r--r-- | scripts/README | 284 |
1 files changed, 30 insertions, 254 deletions
diff --git a/scripts/README b/scripts/README
index a4cbec03..8ba9bbe5 100644
--- a/scripts/README
+++ b/scripts/README
@@ -218,6 +218,9 @@ TO DO:
 
 - Update operation and installation docs.
 
+  Known current omissions: left-right "rekey" and "revoke" operations,
+  testbed.py's rootd_sia config option.
+
   [Ongoing]
 
 - Update internals docs (Doxygen).
@@ -239,7 +242,7 @@ TO DO:
 
 
 
-Things implemented but not yet tested:
+Things implemented but not yet tested.
 
 - Client side of expiration now assumes that parent will reissue
   when its IRDB changes.
@@ -264,267 +267,40 @@ Things implemented but not yet tested:
 
 
 
-OLD to do list. This isn't really organized as a todo list but it
-contains some useful notes, so retain it for now. Real TODO list is
-above.
-
-- Need scripted tests that shrink and grow and shrink and shrink and
-  grow and shrink and grow and grow and .... Initial tests with
-  APNIC required doing this by hand, and there's a body of code that
-  only gets exercised when the IRDB changes, so need scripted tests.
-
-  testbed.py is the framework for this, no doubt will add more
-  features later. Most urgent issues on this front at the moment are
-
-  a) Coming up with some useful test scenarios (YAML for testbed.py)
-     that test interesting combinations of events, and
-
-  b) Analysis tools so that we can check whether the right things
-     happened during the test.
-
-  A first cut at (b) would just be tools to make the test results
-  readable by humans, but need to automate eventually. Running rcynic
-  on the output would also be useful.
-
-  Being able to specify interaction with other servers (not running
-  under testbed) in a testbed.yaml might be useful for interop tests.
-  Kind of breaks testbed's fundamental model, though. Replacing what
-  testbed thinks is a leaf with somebody else would be easy, so maybe
-  we could specify some way to hang a bunch of rpkids under an
-  external parent? Hmm, data needed would look a lot like
-  testpoke.yaml, maybe we can reuse some of that language?
-
-- Work on a common protocol dump format with APNIC and other
-  implementors. Randy points out that it would be good if we could
-  all read each other's dumps.
-
-  "Obvious" format would be an OpenSSL-style PEM of the CMS, with
-  a "text" portion (the place where "openssl x509 -text" would put a
-  text dump of a cert) showing the wrapped XML.
-
-- Rewrite hooks that call CRL generation and publication to do so
-  immediately rather than waiting for cron. Batching to handle all
-  of a bunch of events at once would be nice, but start by getting it
-  right, then worry about making it faster.
-
-- resource_set_notafter attribute added to RelaxNG but not yet to
-  rpki.up_down.class_elt. Need to convert to and from Python datetime
-  but maybe lxml already has code to help us with that.
-
-- Things implemented but not yet tested:
-
-  - Client side of expiration now assumes that parent will reissue
-    when its IRDB changes.
-
-  - Parent side of revocation (child_cert objects) and CRL generation
-    implemented.
-
-  - Parent side of expiration implemented.
-
-  - Child batch processing loop: regeneration or removal of expired
-    certs based on what's in the IRDB.
-
-  - Batch regeneration of CRLs and manifests for all CAs.
-
-  - Protection against up-down operations specifying a class_name that
-    belongs to some other self context.
-
-  - Rewrote code that handles revoke on shrink to revoke -all- old
-    certs for that key, not just most recent. Not certain, but this
-    may have been the cause of a cert dropping not showing up in the
-    CRL during testing with APNIC in Vancouver.
-
-- Implement remaining left-right control booleans -- among other
-  reasons, these are the IRBE triggers for things like key rollover,
-  which we need to test some of the stuff that's already done.
-
-- Child side of revocation...Common Management Tasks page in the APNIC
-  Wiki shows some states where revocation is triggered by the child
-  after a delay. Other text in Common Management Tasks suggests that
-  ca_detail also needs deferred transitions from pending to active,
-  although Randy and I don't entirely believe that this is necessary
-  or even advisable. Revocation delay is enough to require a
-  deferred state transition timestamp to ca_detail object.
-
-  Model for this is an enumerated state value (which we already had)
-  and a timestamp (which may be NULL) for next scheduled transition.
-  At the moment we think that the state progression is linear, ie,
-  there's no need for a next_state field.
-
-    state := pending | active | deprecated
-    timestamp := NULL | <time of next transition>
-
-  We can check for things with expired timers directly by doing
-  something like:
-
-    SELECT blah FROM ca_detail
-    WHERE timestamp IS NOT NULL and timestamp < UTC_TIMESTAMP()
+Other random notes:
 
-  Well, maybe. I don't really understand MySQL well enough to be sure
-  that it'll do the right thing comparing TIMESTAMP to DATETIME.
+Being able to specify interaction with other servers (not running
+under testbed) in a testbed.yaml might be useful for interop tests.
+Kind of breaks testbed's fundamental model, though. Replacing what
+testbed thinks is a leaf with somebody else would be easy, so maybe we
+could specify some way to hang a bunch of rpkids under an external
+parent? Hmm, data needed would look a lot like testpoke.yaml, maybe
+we can reuse some of that language?
 
-  How do we, as child, find out that a cert has been revoked? In the
-  up-down protocol we just see a new cert, there's no indication what
-  happened to the old one. Either:
+There's a three-way tradeoff lurking in the publication protocol,
+manifest generation, and CRL generation:
 
-  a) We asked to have it revoked, duh.
+1) Consistancy issues for relying parties (eg, don't want to withdraw
+   something that's still listed in the manifest);
 
-  b) Parent reissued with same resource class and key, revoking the
-     old cert (oversize, or something). We have to detect this when
-     processing <list_response/> and probably also <issue_response/>,
-     and perform immediate reissue to any affected children, because
-     the old cert is no good anymore.
+2) Efficiency issues for the RPKI engine (eg, generating a new
+   manifest for each individual change during a batch run could be
+   expensive, would prefer to batch up the changes into a single
+   manifest run); and
 
-  In either case we're done with the old cert once it's been revoked.
-  Since we don't find out about that directly, we're done with it when
-  the parent issues a new cert.
+3) Coherency issues for the RPKI engine (don't want to defer things
+   that could result in loss of state if something bad happens).
 
-  This suggest that the client "deprecated" state only occurs when
-  there's also a timer set to make it go away.
-
-  Common Management Tasks has a delay between receipt of a new cert by
-  the child and that cert going active. Neither Randy nor I sees a
-  need for this delay, but if we need to implement it we do so using
-  the new cert's timer (ie, timer is on pending -> active transition),
-  so active -> deprecated transition for old cert happens as side
-  effect of activation of the new cert.
-
-  Need logic to decide whether to ask parent to revoke when timer goes
-  off to remove from deprecated state. Is the test just whether the
-  cert that's going away has the same key as the active cert?
-
-- Publication protocol and implementation thereof. Defer until core
-  functionality in the main engine is done.
-
-  As an interim measure, I've hacked up a local filesystem publication
-  kludge.
-
-  Need publication hooks for:
-
-  - Cert publication
-
-  - CRL publication
-
-  - Manifest publication
-
-  - Withdrawal of any of the above
-
-  Currently some of the hooks are in the wrong places.
-
-  There's a three-way tradeoff lurking here:
-
-  1) Consistancy issues for relying parties (eg, don't want to
-     withdraw something that's still listed in the manifest);
-
-  2) Efficiency issues for the RPKI engine (eg, generating a new
-     manifest for each individual change during a batch run could be
-     expensive, would prefer to batch up the changes into a single
-     manifest run); and
-
-  3) Coherency issues for the RPKI engine (don't want to defer things
-     that could result in loss of state if something bad happens).
-
-  Considerations (1) and (3) have to dominate, which may mean we take
-  a hit on (2).
-
-- Most of the explicit calls to sql_fetch*() are now encapsulated in
-  one-line methods. The remaining ones are probably hints at minor
-  bits of abstraction still to be done.
-
-- Subsetting (req_* attributes in up-down protocol)
-
-- Error handling: make sure that exceptions map correctly to up-down
-  error codes, flesh out left-right error codes. Note that the same
-  exception may produce different error codes depending on which
-  up-down PDU we're processing (sigh).
-
-- Haven't done anything about db.commit() and db.rollback() yet, for
-  that matter haven't yet whacked MySQL to enable those features.
-
-- Test with larger data set -- Tim gave me plenty of data and I have
-  the low-level tools, just haven't written the glue logic to create
-  child objects for all the entities in the IRDB, poll on behalf of
-  each of them, and check the result for sanity
-
-- Need to figure out how we're going to handle "root" case (IANA/RIR
-  self-signed resource cert issuing to its children). Right now this
-  is a separate little program, which is nice for testing but feels
-  wrong. As Randy puts it, this is kind of a special case of a
-  parent, although we might want to represent it differently. If this
-  were really only IANA and the RIRs, handling it as a special case
-  separate program might make sense, but anybody who wants to certify
-  private address space is going to have this problem, and the
-  separate daemon approach just feels wrong for that.
-
-  If it's not a separate daemon, will need left-right protocol support
-  to configure whatever it is we're going to configure, with all the
-  usual private key hygiene issues. This might imply a level of
-  indirection, eg, the self-signed cert is generated in the IRBE, the
-  RPKI engine generates PKCS#10 for a working cert to be issued by the
-  self-signed cert (perhaps with RFC 3779 inheritance for everything,
-  to keep it small), so that the RPKI engine never needs to hold the
-  private key for the root. Or maybe the root key is no more special
-  than any of the other keys we have to protect. Or maybe it's so
-  special that we take the separate daemon approach so we can
-  sneakernet the root key. Or some combination of the above.
-
-  Deferred for the moment, not sure for how long.
-
-- Need to handle loss of connnection to database server. MySQLdb
-  throws an exception, which we can catch, and retrying is easy
-  enough, but need to be a bit careful about recovery action depending
-  on whether we had uncommitted changes.
-
-- tlslite code seems a bit flakey under heavy use. irdb-setup.py has
-  now failed to run to completion on two different machines, in both
-  cases something myterious and bad happened in the tls code.
-
-- ROA generation. We have a bunch of the primitives for this but we
-  aren't yet generating the ROAs themselves.
+Considerations (1) and (3) have to dominate, which may mean we take a
+hit on (2).
 
-Once this lot is done we'll be close to something that shows at least
-the basics of normal operation, albiet in a form that's not yet usable
-in production.
-
-Follow-up after that will be getting rid of remaining synchronous code
-(make daemon fully event-driven, except perhaps for SQL queries),
-address rollback, commit, and other data integrity issues, and see how
-well the resulting code handles hosting (multiple self objects in same
-daemon). Will need some way of implementing loops in the event
-system. Absent some fancier method, probably just store a list of
-object ids, then have the event handlers step through that fetching
-the next object (and coping with the case where an object that was in
-the list when the initial query was made isn't there anymore...).
-
-Somewhere along the way I'll need to update to the new model of trust
-anchors we ended up with in Amsterdam, first step for which will
-involve writing it down (well, RobK was supposed to do that, but I was
-supposed to convert some pencil sketches into graphviz for him so
-we're both lame on this so far). I don't think this results in major
-changes, probably a few extra cert fields in the self object which we
-then need to toss into the rpki.x509.X509_chain objects before
-verifying CMS or TLS, and perhaps the existing TA fields in various
-objects become pairs of certs instead of a single TA, but this is
-mostly just generalization and reuse of existing code, no bold new
-adventures.
-
-At some point we need to do performance testing.
-
-Although it may not be worth the effort, it'd be nice to get back to
-only one crypto library. As far as I know the only things we're using
-M2Crypto or cryptlib for are TLS. POW supports TLS, but tlslite
-doesn't support POW. I suspect that tlslite has some kind of driver
-interface, since it already supports two crypto packages, so there may
-be a simple answer here. Not worth getting anywhere near this until
-after making the jump to Twisted, as that might also affect this mess.
-If the Python implementation ends up becoming the production version
-of the engine rather than just a prototype, it might make sense to
-revisit this, as the current mess of crypto libraries smells like an
-accident waiting to happen.
+Most of the explicit calls to sql_fetch*() are now encapsulated in
+one-line methods. The remaining ones are probably hints at minor bits
+of abstraction still to be done.
 
 Biz certs currently used by test scripts don't include SKI or AKI.
 I think this is because the test scripts use "openssl x509" rather
 than "openssl ca" when generating these certs. Not critical, and will
-probably become completely irrelevant when RobK supplies the
-all-singing all-dancing biz cert scripts, but should not be a big
-problem to fix either if it gets in the way again.
+probably become completely irrelevant with all-singing all-dancing
+post-Amsterdam biz cert scripts, but should not be a big problem to
+fix either if it gets in the way again.