1 files changed, 30 insertions, 254 deletions
diff --git a/scripts/README b/scripts/README
index a4cbec03..8ba9bbe5 100644
--- a/scripts/README
+++ b/scripts/README
@@ -218,6 +218,9 @@ TO DO:
 
 - Update operation and installation docs.
 
+  Known current omissions: left-right "rekey" and "revoke" operations,
+  testbed.py's rootd_sia config option.
+
   [Ongoing]
 
 - Update internals docs (Doxygen).
@@ -239,7 +242,7 @@ TO DO:
 
 
 
-Things implemented but not yet tested:
+Things implemented but not yet tested.
 
 - Client side of expiration now assumes that parent will reissue
   when its IRDB changes.
@@ -264,267 +267,40 @@ Things implemented but not yet tested:
 
 
 
-OLD to do list.  This isn't really organized as a todo list but it
-contains some useful notes, so retain it for now.  Real TODO list is
-above.
-
-- Need scripted tests that shrink and grow and shrink and shrink and
-  grow and shrink and grow and grow and ....   Initial tests with
-  APNIC required doing this by hand, and there's a body of code that
-  only gets exercised when the IRDB changes, so need scripted tests.
-
-  testbed.py is the framework for this, no doubt will add more
-  features later.  Most urgent issues on this front at the moment are
-
-  a) Coming up with some useful test scenarios (YAML for testbed.py)
-     that test interesting combinations of events, and
-
-  b) Analysis tools so that we can check whether the right things
-     happened during the test.
-
-  A first cut at (b) would just be tools to make the test results
-  readable by humans, but need to automate eventually.  Running rcynic
-  on the output would also be useful.
-
-  Being able to specify interaction with other servers (not running
-  under testbed) in a testbed.yaml might be useful for interop tests.
-  Kind of breaks testbed's fundamental model, though.  Replacing what
-  testbed thinks is a leaf with somebody else would be easy, so maybe
-  we could specify some way to hang a bunch of rpkids under an
-  external parent?  Hmm, data needed would look a lot like
-  testpoke.yaml, maybe we can reuse some of that language?
-
-- Work on a common protocol dump format with APNIC and other
-  implementors.  Randy points out that it would be good if we could
-  all read each other's dumps.
-
-  "Obvious" format would be an OpenSSL-style PEM of the CMS, with
-  a "text" portion (the place where "openssl x509 -text" would put a
-  text dump of a cert) showing the wrapped XML.
-
-- Rewrite hooks that call CRL generation and publication to do so
-  immediately rather than waiting for cron.   Batching to handle all
-  of a bunch of events at once would be nice, but start by getting it
-  right, then worry about making it faster.
-
-- resource_set_notafter attribute added to RelaxNG but not yet to
-  rpki.up_down.class_elt.  Need to convert to and from Python datetime
-  but maybe lxml already has code to help us with that.
-
-- Things implemented but not yet tested:
-
-  - Client side of expiration now assumes that parent will reissue
-    when its IRDB changes.
-
-  - Parent side of revocation (child_cert objects) and CRL generation
-    implemented.
-
-  - Parent side of expiration implemented.
-
-  - Child batch processing loop: regeneration or removal of expired
-    certs based on what's in the IRDB.
-
-  - Batch regeneration of CRLs and manifests for all CAs.
-
-  - Protection against up-down operations specifying a class_name that
-    belongs to some other self context.
-
-  - Rewrote code that handles revoke on shrink to revoke -all- old
-    certs for that key, not just most recent.  Not certain, but this
-    may have been the cause of a cert dropping not showing up in the
-    CRL during testing with APNIC in Vancouver.
-
-- Implement remaining left-right control booleans -- among other
-  reasons, these are the IRBE triggers for things like key rollover,
-  which we need to test some of the stuff that's already done.
-
-- Child side of revocation...Common Management Tasks page in the APNIC
-  Wiki shows some states where revocation is triggered by the child
-  after a delay.  Other text in Common Management Tasks suggests that
-  ca_detail also needs deferred transitions from pending to active,
-  although Randy and I don't entirely believe that this is necessary
-  or even advisable.   Revocation delay is enough to require a
-  deferred state transition timestamp to ca_detail object.
-
-  Model for this is an enumerated state value (which we already had)
-  and a timestamp (which may be NULL) for next scheduled transition.
-  At the moment we think that the state progression is linear, ie,
-  there's no need for a next_state field.
-
-    state     := pending | active | deprecated
-    timestamp := NULL | <time of next transition>
-
-  We can check for things with expired timers directly by doing
-  something like:
-
-    SELECT blah FROM ca_detail
-    WHERE timestamp IS NOT NULL and timestamp < UTC_TIMESTAMP()
+Other random notes:
 
-  Well, maybe.  I don't really understand MySQL well enough to be sure
-  that it'll do the right thing comparing TIMESTAMP to DATETIME.
+Being able to specify interaction with other servers (not running
+under testbed) in a testbed.yaml might be useful for interop tests.
+Kind of breaks testbed's fundamental model, though.  Replacing what
+testbed thinks is a leaf with somebody else would be easy, so maybe we
+could specify some way to hang a bunch of rpkids under an external
+parent?  Hmm, data needed would look a lot like testpoke.yaml, maybe
+we can reuse some of that language?
 
-  How do we, as child, find out that a cert has been revoked?  In the
-  up-down protocol we just see a new cert, there's no indication what
-  happened to the old one.  Either:
+There's a three-way tradeoff lurking in the publication protocol,
+manifest generation, and CRL generation:
 
-  a) We asked to have it revoked, duh.
+1) Consistency issues for relying parties (eg, don't want to withdraw
+   something that's still listed in the manifest);
 
-  b) Parent reissued with same resource class and key, revoking the
-     old cert (oversize, or something).  We have to detect this when
-     processing <list_response/> and probably also <issue_response/>,
-     and perform immediate reissue to any affected children, because
-     the old cert is no good anymore.
+2) Efficiency issues for the RPKI engine (eg, generating a new
+   manifest for each individual change during a batch run could be
+   expensive, would prefer to batch up the changes into a single
+   manifest run); and
 
-  In either case we're done with the old cert once it's been revoked.
-  Since we don't find out about that directly, we're done with it when
-  the parent issues a new cert.
+3) Coherency issues for the RPKI engine (don't want to defer things
+   that could result in loss of state if something bad happens).
 
-  This suggest that the client "deprecated" state only occurs when
-  there's also a timer set to make it go away.
-
-  Common Management Tasks has a delay between receipt of a new cert by
-  the child and that cert going active.  Neither Randy nor I sees a
-  need for this delay, but if we need to implement it we do so using
-  the new cert's timer (ie, timer is on pending -> active transition),
-  so active -> deprecated transition for old cert happens as side
-  effect of activation of the new cert.
-
-  Need logic to decide whether to ask parent to revoke when timer goes
-  off to remove from deprecated state.  Is the test just whether the
-  cert that's going away has the same key as the active cert?
-
-- Publication protocol and implementation thereof.   Defer until core
-  functionality in the main engine is done.
-
-  As an interim measure, I've hacked up a local filesystem publication
-  kludge.
-
-  Need publication hooks for:
-
-  - Cert publication
-
-  - CRL publication
-
-  - Manifest publication
-
-  - Withdrawal of any of the above
-
-  Currently some of the hooks are in the wrong places.
-
-  There's a three-way tradeoff lurking here:
-
-  1) Consistancy issues for relying parties (eg, don't want to
-     withdraw something that's still listed in the manifest);
-
-  2) Efficiency issues for the RPKI engine (eg, generating a new
-     manifest for each individual change during a batch run could be
-     expensive, would prefer to batch up the changes into a single
-     manifest run); and
-
-  3) Coherency issues for the RPKI engine (don't want to defer things
-     that could result in loss of state if something bad happens).
-
-  Considerations (1) and (3) have to dominate, which may mean we take
-  a hit on (2).
-
-- Most of the explicit calls to sql_fetch*() are now encapsulated in
-  one-line methods.  The remaining ones are probably hints at minor
-  bits of abstraction still to be done.
-
-- Subsetting (req_* attributes in up-down protocol)
-
-- Error handling: make sure that exceptions map correctly to up-down
-  error codes, flesh out left-right error codes.  Note that the same
-  exception may produce different error codes depending on which
-  up-down PDU we're processing (sigh).
-
-- Haven't done anything about db.commit() and db.rollback() yet, for
-  that matter haven't yet whacked MySQL to enable those features.
-
-- Test with larger data set -- Tim gave me plenty of data and I have
-  the low-level tools, just haven't written the glue logic to create
-  child objects for all the entities in the IRDB, poll on behalf of
-  each of them, and check the result for sanity
-
-- Need to figure out how we're going to handle "root" case (IANA/RIR
-  self-signed resource cert issuing to its children).  Right now this
-  is a separate little program, which is nice for testing but feels
-  wrong.  As Randy puts it, this is kind of a special case of a
-  parent, although we might want to represent it differently.  If this
-  were really only IANA and the RIRs, handling it as a special case
-  separate program might make sense, but anybody who wants to certify
-  private address space is going to have this problem, and the
-  separate daemon approach just feels wrong for that.
-
-  If it's not a separate daemon, will need left-right protocol support
-  to configure whatever it is we're going to configure, with all the
-  usual private key hygiene issues.  This might imply a level of
-  indirection, eg, the self-signed cert is generated in the IRBE, the
-  RPKI engine generates PKCS#10 for a working cert to be issued by the
-  self-signed cert (perhaps with RFC 3779 inheritance for everything,
-  to keep it small), so that the RPKI engine never needs to hold the
-  private key for the root.  Or maybe the root key is no more special
-  than any of the other keys we have to protect.  Or maybe it's so
-  special that we take the separate daemon approach so we can
-  sneakernet the root key.  Or some combination of the above.
-
-  Deferred for the moment, not sure for how long.
-
-- Need to handle loss of connnection to database server.  MySQLdb
-  throws an exception, which we can catch, and retrying is easy
-  enough, but need to be a bit careful about recovery action depending
-  on whether we had uncommitted changes.
-
-- tlslite code seems a bit flakey under heavy use.  irdb-setup.py has
-  now failed to run to completion on two different machines, in both
-  cases something myterious and bad happened in the tls code.
-
-- ROA generation.  We have a bunch of the primitives for this but we
-  aren't yet generating the ROAs themselves.
+Considerations (1) and (3) have to dominate, which may mean we take a
+hit on (2).
 
-Once this lot is done we'll be close to something that shows at least
-the basics of normal operation, albiet in a form that's not yet usable
-in production.
-
-Follow-up after that will be getting rid of remaining synchronous code
-(make daemon fully event-driven, except perhaps for SQL queries),
-address rollback, commit, and other data integrity issues, and see how
-well the resulting code handles hosting (multiple self objects in same
-daemon).  Will need some way of implementing loops in the event
-system.  Absent some fancier method, probably just store a list of
-object ids, then have the event handlers step through that fetching
-the next object (and coping with the case where an object that was in
-the list when the initial query was made isn't there anymore...).
-
-Somewhere along the way I'll need to update to the new model of trust
-anchors we ended up with in Amsterdam, first step for which will
-involve writing it down (well, RobK was supposed to do that, but I was
-supposed to convert some pencil sketches into graphviz for him so
-we're both lame on this so far).  I don't think this results in major
-changes, probably a few extra cert fields in the self object which we
-then need to toss into the rpki.x509.X509_chain objects before
-verifying CMS or TLS, and perhaps the existing TA fields in various
-objects become pairs of certs instead of a single TA, but this is
-mostly just generalization and reuse of existing code, no bold new
-adventures.
-
-At some point we need to do performance testing.
-
-Although it may not be worth the effort, it'd be nice to get back to
-only one crypto library.  As far as I know the only things we're using
-M2Crypto or cryptlib for are TLS.  POW supports TLS, but tlslite
-doesn't support POW.  I suspect that tlslite has some kind of driver
-interface, since it already supports two crypto packages, so there may
-be a simple answer here.  Not worth getting anywhere near this until
-after making the jump to Twisted, as that might also affect this mess.
-If the Python implementation ends up becoming the production version
-of the engine rather than just a prototype, it might make sense to
-revisit this, as the current mess of crypto libraries smells like an
-accident waiting to happen.
+Most of the explicit calls to sql_fetch*() are now encapsulated in
+one-line methods.  The remaining ones are probably hints at minor bits
+of abstraction still to be done.
 
 Biz certs currently used by test scripts don't include SKI or AKI.  I
 think this is because the test scripts use "openssl x509" rather than
 "openssl ca" when generating these certs.  Not critical, and will
-probably become completely irrelevant when RobK supplies the
-all-singing all-dancing biz cert scripts, but should not be a big
-problem to fix either if it gets in the way again.
+probably become completely irrelevant with all-singing all-dancing
+post-Amsterdam biz cert scripts, but should not be a big problem to
+fix either if it gets in the way again.