$Id$                                                    -*- Text -*-

Python RPKI production tools.

Requires Python 2.5.

See doc/Installation for installation instructions and required
packages.

The full manual is available in both PDF and HTML formats; the PDF is
in doc/manual.pdf, the HTML is in a compressed tarball
doc/manual.tar.gz.

$Revision$

TO DO:

* Rework handling of surprising responses to up-down requests.  Right
  now we get confused when we find that parent has issued a cert that
  we don't remember requesting, even when we have the ca_detail object
  in question sitting in our SQL as pending.  This can happen if we
  throw an exception later and don't clean up properly -- which should
  never happen, but let's try to be robust about this.

  So we need to be smarter about comparing our own state with what we
  get back from our parent and figuring out what to do next.  We
  probably also need to commit changes to SQL earlier.  In general we
  should never have more than one ca_detail in state pending for a
  given ca, but the current code blindly assumes that will never
  happen and never recovers if that assumption has been violated.

  STATUS: Started, not complete.  Internal tracking state for whether
  objects have been published is in place, but there's no code yet to
  force retry of failed publication.  Some support for cleaning up
  extraneous ca_detail objects, dunno (yet) whether this is enough.

  TIME REQUIRED: One week (remaining).

* Error handling: make sure that exceptions map correctly to up-down
  error codes, flesh out left-right error codes.  Note that the same
  exception may produce different error codes depending on which
  up-down PDU we're processing (sigh).  Will require code audit for
  coherency, which is most of the work.

  TIME REQUIRED: Two weeks

  DEPENDS ON: almost everything else, as almost any code change can
  raise new exceptions that we'd need to handle.

  STATUS: Not started

* db.commit(), db.rollback(), code audit for data integrity issues,
  fix any data integrity issues that turn up.
  Among other issues, need to handle loss of connection to database
  server and other MySQL errors.  Need to be careful about recovery
  action depending on whether we had uncommitted changes.

  TIME REQUIRED (commit and rollback): 3-4 weeks

  TIME REQUIRED (data integrity audit): 1 week

  TIME REQUIRED (fix data integrity): Unknown, depends on code audit
  and results of runtime testing.

  STATUS: Not started

* Resource subsetting (req_* attributes in up-down protocol), full
  implementation.  Requires expanding SQL child_cert table to hold
  subset masks and rewriting a fair amount of code.

  TIME REQUIRED: 3-4 weeks

  STATUS: Not started

* Performance testing and profiling.  Getting rid of tlslite was a
  good first step, and RSA will always be slow without an HSM, but
  last time I tried profiling I saw hints that the Python ASN.1 code
  may be a bottleneck.

  TIME REQUIRED: A few days to do profiling.  What happens after that
  depends on what profiling finds.

  DEPENDS ON: Serious load testing may require assistance from others
  with larger test labs than I have directly available.

  STATUS: Barely started

* Clean up rootd.py to be usable in a production system.  Most urgent
  issues are handling of private keys, publishing outputs in pubd, and
  reissuing when details or keys change.  May not need much else, as
  this is not a high-traffic server.

  Alternatively, perhaps rootd's functionality should be merged into
  rpkid after all, given that we now believe that anybody who needs to
  certify private address space may need to run it.

  TIME REQUIRED: One week if just cleaning up rootd.  2-3 weeks if
  folding rootd into rpkid.

  STATUS: Not started

* Update internals docs (Doxygen).  Mostly this means updating
  function comments in the Python code, as the rest is automatic.
  May require a bit more overview text to explain the workings and
  usage of the code.

  TIME REQUIRED: One week.

  STATUS: Ongoing

* Add HSM support.  Architecture includes it, current code does not.
  First step here would be talking to somebody with strong
  understanding of PKCS #11.

  TIME REQUIRED: Unknown

  STATUS: Not started

* Tighten up syntax checking in left-right schema.

  TIME REQUIRED: One day.

  STATUS: Not started

* rcynic handling of RPKI trust anchors does not yet support
  draft-ietf-sidr-ta.  Not needed for technical reasons
  ("trust-anchor-uri-with-key" method is roughly equivalent and much
  simpler), may be required for political reasons.

  TIME REQUIRED: Three days

  STATUS: Not started

* Investigate using EKU (RFC 3280 4.2.1.13) as an alternative to
  wiring in BPKI EE certs for left-right protocol.

  STATUS: Not started

* Django web UI to RPKI code will require some back-end support, not
  yet sure exactly how much.  Current plan is that Django tool just
  drops data into SQL, at which point it becomes my problem; if we
  keep this model, semantics of the existing command line tools should
  map fairly well, so this part will just be a matter of performing
  essentially the same operations that the command line UI does now,
  with SQL tables instead of CSV files as the data store.

  TIME REQUIRED: Two weeks

  STATUS: Not started

* Django web UI to RPKI code as currently envisioned will (also)
  require some additional data from rpkid and rcynic that is not yet
  available in machine-parsable form, so that UI can report on status
  of delegated resources and detailed validation status.  rpkid
  extensions for this should be relatively simple, rcynic work may be
  a bit more complex, or at least tedious, as it'll be C code
  generating XML.  was part of this.

  rcynic XML detail should describe entire certificate validation
  chain and status of validation at each stage.  Given tree walk this
  is probably going to be some kind of XML-representation tree
  structure, which I will probably design as s-expressions first to
  keep my brain from exploding with gratuitous XML syntax.

  TIME REQUIRED: 1-2 weeks.

  STATUS: done.
  rcynic work started along a different path (XML reporting of
  validation failure events, mostly done, still some corner cases);
  unclear whether this will suffice or UI really needs tree structure
  per above.

* At present there is no mechanism by which an IRBE could request
  signing of objects other than ROAs.  Eg, there has been some
  discussion of signing S/MIME letters to humans asking for routing,
  as an alternative to ROAs.  If we decide to support this at all, it
  turns into a generalization of the ROA problem, and suggests that
  perhaps ROA generation should be handled somewhere outside of rpkid
  and only passed to rpkid for signing.  This would be a significant
  change to the architecture, as it would remove rpkid's
  responsibility for keeping ROAs up to date.

  On further analysis: ROAs are different from S/MIME letters, in that
  ROAs are something we want both published and maintained on an
  ongoing basis until canceled, while S/MIME letters are one-offs that
  probably are not published.  So ROAs need -something- to keep them
  current, and that something might as well be rpkid unless we find a
  strong argument that it should be something else.  So the S/MIME
  letter functionality probably stays a different mechanism from ROAs.

  TIME REQUIRED: One week (including deciding what left-right protocol
  semantics for this should be)

  STATUS: Not started

* There has been some discussion both in and out of the SIDR WG on
  perhaps dropping TLS out of the up-down protocol, as it is arguably
  not providing much that we can't do equally well with CMS.
  Left-right and publication are currently not SIDR WG docs, but
  presumably they would follow.  Dunno where this is going to go, but
  assuming for purposes of discussion that we do drop TLS, we'll want
  to rip all that code out.  This includes revising BPKI, SQL,
  left-right and publication protocols, and code using all of these
  both in daemons and UI tools.

  TIME REQUIRED: Three weeks (very rough guess).

  DEPENDS ON: Decision whether to keep or drop TLS.

  STATUS: Not started

* Integrate UI tools into main code base.  Right now there's this odd
  split, with the myrpki stuff off to one side, irdbd (which is also a
  sample implementation, not a core tool) in with rpkid, test code
  scattered hither and yon in all the above places, and none of it set
  up nicely either for running in place or installation.  This all
  needs to be cleaned up, most likely by reorganizing all of the
  Python (and POW) code.

  TIME REQUIRED: Two weeks.

  STATUS: Mostly done.  POW is still hanging off to the side, but the
  myrpki/ directory has now been merged into rpkid/, and various other
  bits have been cleaned up.

* Autoconf review.  Right now we're making minimal use of autoconf,
  just enough to get the code running on Mac OS and clean up a few old
  annoyances.  There are other things that ought to be using autoconf,
  now that we're stuck with it.  Eg, installation scripts, the build
  code for POW, etc.  In the long run we might even want to check for
  usable system OpenSSL code and libraries: the RFC 3779 code is still
  off by default in all known public releases of OpenSSL, but the BPKI
  stuff that myrpki does only requires CMS, not RFC 3779, so it may be
  able to use the system openssl binary in many cases.

  TIME REQUIRED: One week for an initial pass.

  DEPENDS ON: Installation scripts.  Not so much depends on, really,
  as two aspects of one interrelated mess.

  STATUS: Not started

* We need installation scripts.  Right now the only thing we install
  is rcynic, and that only on FreeBSD.

  TIME REQUIRED: One week, longer if installation for many platforms
  is required

  STATUS: Not started

* We need better and unified documentation.  Right now doc is
  scattered between rpkid core manual, various READMEs, internal
  documentation in various tools, etcetera.  This is not kind to the
  user.
  Depending on how much hand-written (as opposed to Doxygen-harvested)
  doc we end up with, might want to convert overall to something like
  the Doxygen/Docbook combination that the Boost project uses
  (Boostbook).

  TIME REQUIRED: At least two weeks, plus at least one more week if
  making a serious change in doc tools (eg, Boostbook).

  DEPENDS ON: Portions of this would make sense to defer until after
  whatever code reorg happens to integrate UI tools, etc.  Most of the
  hand-written content could be done right away, might require minor
  edits later to track reorg changes.

  STATUS: Not started

* Rewrite irbe_cli.py to use cmd module.  Right now irbe_cli is useful
  only as a debugging tool for its author, and the interface is very
  clunky (even by comparison to other clunky bits of code in this
  package).  Rewriting to use cmd module would be a major improvement;
  some minor challenges here because irbe_cli integrates so tightly
  with the Python message classes representing the left-right and
  publication protocols; figuring out how to turn this into a
  cmd-based program without massive (and fragile) duplication of code
  is probably good for a few days of head scratching.

  TIME REQUIRED: Two weeks (including head scratching)

  STATUS: Not started

* Clean up testbed tools.  There's a collection of hacks that have
  been evolving as we've been building the testbed, most of which just
  grew as our needs evolved.  The main scripts are checked into the
  repository, but some of the minor stuff is not, and some of the
  automation used in the testbed (cron scripts, automated use of a
  version control system (currently subversion) to archive changes to
  running data, etcetera) might be useful to others, so it should be
  cleaned up and made available as part of the package.

  TIME REQUIRED: One week

  STATUS: Moving target, but let's say "not started" for the bits I'm
  thinking about as I type this.

* Early (pseudo) operational testing has uncovered a conflict between
  RIRs' need not to be in the business of attesting to identities and
  operators' need to have -some- way of finding out who to call when
  an RPKI cert is broken.  Current proposal is to allow signature and
  publication of blobs of whois-like data; these would be signed by an
  EE cert using RFC 3779 inheritance, and would in essence be a
  self-attestation (no checking by others, no liability incurred by
  others, etc) as to contact information one might use in case of a
  problem.  This would require a minor change to the rescert profile,
  so the SIDR WG would need to sign off on this.  As this data would
  be published, presumably we would want rcynic to be able to check
  it.

  TIME REQUIRED: One week to add to rpkid et al (very rough --
  includes design work to figure out exactly where this would fit,
  actual coding probably relatively minor).  Perhaps an additional day
  or three to add to rcynic and write suitable search and display
  tools.

  DEPENDS ON: Agreement to add this to rescert profile.

  STATUS: Not started

* myrpki.py should have a command that summarizes current state (data
  on file, actions it might make sense to take now, etc).

  TIME REQUIRED: A day or two

  STATUS: Not started

* rcynic needs major rewrite to run multiple rsync processes in
  background, to work around tarpit attack by evil publishers.

  TIME REQUIRED: Three weeks (wild guess)

  STATUS: Not started, beyond some preliminary design thoughts.
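
Note on the rcynic rsync item above: the general shape of "several
fetches in the background, none of which can stall the rest" can be
sketched in a few lines of Python.  This is only an illustration of the
idea, not a design for rcynic (which is C) and not code from this
package.  It assumes a modern Python 3 rather than the 2.5 this package
targets, and the names run_fetch and fetch_all, the worker count, and
the wall-clock limit are all made up for the example.

```python
import subprocess
import sys
from concurrent.futures import ThreadPoolExecutor


def run_fetch(command, timeout):
    """Run one fetch command and return (command, status).

    status is the child's exit code, "timeout" if the child exceeded
    the wall-clock limit, or "error" if it could not be started.
    """
    try:
        completed = subprocess.run(
            command,
            stdout=subprocess.DEVNULL,
            stderr=subprocess.DEVNULL,
            timeout=timeout,
        )
        return command, completed.returncode
    except subprocess.TimeoutExpired:
        # subprocess.run() kills the child before raising this.
        return command, "timeout"
    except OSError:
        # For example, the program is not installed.
        return command, "error"


def fetch_all(commands, max_workers=4, timeout=300):
    """Run commands concurrently; results come back in input order."""
    with ThreadPoolExecutor(max_workers=max_workers) as pool:
        return list(pool.map(lambda c: run_fetch(c, timeout), commands))


if __name__ == "__main__":
    # Stand-ins for rsync: one child that stalls, one that finishes.
    # A real command would look something like
    #   ["rsync", "-rtz", "rsync://rpki.example.net/repo/", "/some/dir/"]
    # where the URI and directory are hypothetical.
    stalled = [sys.executable, "-c", "import time; time.sleep(60)"]
    prompt = [sys.executable, "-c", "pass"]
    for command, status in fetch_all([stalled, prompt], timeout=5):
        print(status, command[-1])
```

The limit here is on total elapsed time per fetch rather than on idle
time, on the assumption that a tarpit can keep a connection technically
alive by trickling data.  Whether a per-fetch limit, a per-publisher
limit, or something smarter is the right policy is part of the design
work this item still needs.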