$Id$ -*- Text -*-

Python RPKI production tools.

Requires Python 2.5.

See doc/Installation for installation instructions and required packages.

$Revision$

TO DO:

- rcynic handling of RPKI trust anchors needs updating, per
  discussions over previous months of how RPKI trust anchors work, how
  we package them, and how we roll them over. The last (TA rollover)
  is the driver for this.

  APNIC is now proposing a CMS-signed ASN.1 blob containing a version
  number and an RPKI certificate. Kent and Housley have not bought
  into this yet.

  RIPE is proposing that trust anchors just be a URL and a public key,
  which one would use by fetching a self-signed RPKI cert from the URL
  and comparing the public key.

  If everybody homes under IANA, none of this is necessary and what
  rcynic already does should suffice.

  Need to pick something and go with it. All but "home under IANA"
  would require minor changes to rcynic.

  PRIORITY: Required for pilot (usability issue for relying parties)
  TIME REQUIRED: One week.
  STATUS: Not started

- Publication protocol and implementation thereof. Tricky bit is
  making sure that repository receives enough information to know
  whether parent has authorized child to use parent's namespace in
  nesting case; in theory this is straightforward but requires careful
  checking. Current implementation just uses a configured path check
  and does not attempt to trace back to permission from parent in
  nested publication case. Class and method design is intended to
  make it easy to drop in additional checks if needed.

  PRIORITY: Required for pilot
  TIME REQUIRED: 3-4 weeks for implementation.
  STATUS: Initial implementation seems to work, not seriously tested
          yet. See above for notes on ACL checking.

- Resource subsetting (req_* attributes in up-down protocol), minimal
  implementation. Recognize this as correct protocol and signal an
  internal server error if ever used.

  PRIORITY: Required for pilot.
  TIME REQUIRED: Two days
  STATUS: code written, not yet tested.

- rcynic does not yet handle manifests. This is both a real problem
  (manifests were added to plug a security hole) and a user acceptance
  problem (without manifest support rcynic checks old certs that are
  supposed to fail because they've been revoked, resulting in what
  appear to be spurious errors, which just annoy the user).

  PRIORITY: Required for pilot
  TIME REQUIRED: Two weeks.
  STATUS: Not started

- User validation tool: fetch and validate certs and ROA for a prefix
  that the user wants to accept in a router filter the user is
  building. This probably uses rcynic's output as one of its inputs.

  PRIORITY: Required
  DEPENDS ON: ROA generation
  TIME REQUIRED: 1-2 weeks
  STATUS: Not started

- Make rpkid fully event-driven (async tasking model), except for SQL
  queries. This probably involves the "twisted" framework.

  PRIORITY: Required (to implement scalable hosting model)
  TIME REQUIRED: Two weeks.
  STATUS: Not started

- Error handling: make sure that exceptions map correctly to up-down
  error codes, flesh out left-right error codes. Note that the same
  exception may produce different error codes depending on which
  up-down PDU we're processing (sigh). Will require code audit for
  coherency, which is most of the work.

  PRIORITY: Required
  TIME REQUIRED: Two weeks
  DEPENDS ON: almost everything else, as almost any code change can
              raise new exceptions that we'd need to handle.
  STATUS: Not started

- db.commit(), db.rollback(), code audit for data integrity issues,
  fix any data integrity issues that turn up. Among other issues,
  need to handle loss of connection to database server and other MySQL
  errors. Need to be careful about recovery action depending on
  whether we had uncommitted changes.

  PRIORITY: Required
  TIME REQUIRED (commit and rollback): 3-4 weeks
  TIME REQUIRED (data integrity audit): 1 week
  TIME REQUIRED (fix data integrity): Unknown, depends on code audit
                                      and results of runtime testing.
  DEPENDS ON: async tasking model rollback.
  STATUS: Not started

- Test framework for multiple self-instances per engine-instance
  (single self-instance per engine-instance is already done).

  PRIORITY: Required for testing
  DEPENDS ON: Async tasking model.
  TIME REQUIRED: One week
  STATUS: Not started

- Current TLS code (tlslite) appeared to be flaky under heavy use back
  in November, and doesn't support all the required certificate checks
  out of the box. Certificate checker has now been replaced with
  something based on OpenSSL/POW, and the result seems to work. If
  the TLS code itself is still unstable, best bet would be to replace
  it with a Tls class cloned from the existing POW Ssl class; the
  current Ssl class isn't adequate either, but there's documentation
  (e.g., the O'Reilly OpenSSL book) that explains in some detail what
  this code would need to do.

  PRIORITY: Required for pilot (cert checking is a security issue).
  TIME REQUIRED: 3-4 weeks
  DEPENDS ON: Async tasking model.
  STATUS: Not started

- Resource subsetting (req_* attributes in up-down protocol), full
  implementation. Requires expanding SQL child_cert table to hold
  subset masks and rewriting a fair amount of code.

  PRIORITY: Required for full implementation.
  TIME REQUIRED: 3-4 weeks
  STATUS: Not started

- Performance testing

  STATUS: Not started

- Clean up rootd.py to be usable in a production system. Most urgent
  issue is handling of private keys. May not need much else, as this
  is not a high-traffic server.

  PRIORITY: Highly desirable (not strictly needed for pilot testing)
  TIME REQUIRED: One week
  STATUS: Not started

- Update internals docs (Doxygen). Mostly this means updating
  function comments in the Python code, as the rest is automatic.
  May require a bit of overview text to explain the workings of the
  code; this overview text may well turn out to be just the current
  flat text documents marked up for inclusion by Doxygen.

  PRIORITY: Desirable
  TIME REQUIRED: One week.
  STATUS: Ongoing

- Reorganize code (directory names, module names, which objects are in
  which modules, add gctx pointers to objects to avoid passing
  explicit gctx pointers in almost every function call) to make it
  easier to understand and maintain. Portions of the existing code
  were done in extreme haste to meet testing deadlines, and it shows.

  PRIORITY: Highly desirable
  TIME REQUIRED: One week.
  STATUS: Explicit gctx eradication done; much file renaming done;
          other stuff not started.

- Add HSM support. Architecture includes it, current code does not.
  First step here would be talking to somebody with strong
  understanding of PKCS #11.

  PRIORITY: Desirable, not required for pilot
  TIME REQUIRED: Unknown
  STATUS: Not started

- Installation packaging, so that rpkid can be built and installed
  like a normal package.

  PRIORITY: Desirable
  TIME REQUIRED: One week, longer if installation for many platforms
                 is required
  STATUS: Not started

- Tighten up syntax checking in left-right schema.

  PRIORITY: Desirable
  TIME REQUIRED: One day.
  STATUS: Not started

- Rethink exposing SQL primary indices in protocols. Right now,
  auto-incremented SQL indices are used in many places in the
  left-right protocol, and are even exposed in a few places in our
  implementation of the up-down protocol. This is nice and unique but
  may be operationally fragile, since up-down usage means that URLs
  contain mechanically assigned identifiers rather than an identifier
  negotiated between the two parties during contract setup. The RIPE
  NCC suggested that we should instead use something like a hash of
  the client's name, which would be probabilistically unique, would
  not expose information, but would be stable even if we had to
  rebuild the database.

  PRIORITY: Rethinking desirable; reworking unknown
  TIME REQUIRED: One week to evaluate. Implementation time if we
                 decide to make a change unknown, but probably on the
                 order of another week.
  STATUS: Not started

- Common protocol dump format with APNIC and other implementors so we
  can exchange protocol dumps.

  PRIORITY: Desirable
  TIME REQUIRED: Two days
  STATUS: Not started

- IETF SIDR WG is still talking about ROAs with multiple signatures.
  No obvious need for this but IETF may mandate it anyway. Full
  implementation would require significant work revising current SQL
  table relations and upgrading CMS support.

  PRIORITY: Minimal, IETF feeping creaturism
  TIME REQUIRED: Unknown
  STATUS: Not started

- Deaddrop of incoming messages, for audit. Absent a better theory,
  steal existing tech for this: preface with minimal RFC 2822 header
  and drop it into a Maildir folder using built-in Python Maildir
  library code, at which point it becomes somebody else's problem.

  STATUS: Not started
  PRIORITY: Desirable, trivial to implement.

- Investigate using EKU (RFC 3280 4.2.1.13) as an alternative to
  wiring in BPKI EE certs for left-right protocol.

Other random notes:

Being able to specify interaction with other servers (not running
under testbed) in a testbed.yaml might be useful for interop tests.
Kind of breaks testbed's fundamental model, though. Replacing what
testbed thinks is a leaf with somebody else would be easy, so maybe we
could specify some way to hang a bunch of rpkids under an external
parent? Hmm, data needed would look a lot like testpoke.yaml, maybe
we can reuse some of that language?

There's a three-way tradeoff lurking in the publication protocol,
manifest generation, and CRL generation:

1) Consistency issues for relying parties (e.g., don't want to
   withdraw something that's still listed in the manifest);

2) Efficiency issues for the RPKI engine (e.g., generating a new
   manifest for each individual change during a batch run could be
   expensive, would prefer to batch up the changes into a single
   manifest run); and

3) Coherency issues for the RPKI engine (don't want to defer things
   that could result in loss of state if something bad happens).

Considerations (1) and (3) have to dominate, which may mean we take a
hit on (2).

Most of the explicit calls to sql_fetch*() are now encapsulated in
one-line methods. The remaining ones are probably hints at minor bits
of abstraction still to be done.

Biz certs currently used by test scripts don't include SKI or AKI. I
think this is because the test scripts use "openssl x509" rather than
"openssl ca" when generating these certs. Not critical, and will
probably become completely irrelevant with all-singing all-dancing
post-Amsterdam biz cert scripts, but should not be a big problem to
fix either if it gets in the way again.
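
Rough sketch of the deaddrop item from the TO DO list, for whoever
picks it up: wrap each incoming message in a minimal RFC 2822 header
and drop it into a Maildir using the standard library's mailbox module.
The function name deaddrop(), the header values, and the default sender
address are all made up for illustration; nothing like this exists in
the code yet. The payload is base64-encoded via MIMEApplication on the
assumption that incoming messages may be binary.

```python
import mailbox
import email.utils
from email.mime.application import MIMEApplication


def deaddrop(maildir_path, payload, sender="rpkid@localhost"):
    """Store one incoming protocol message in a Maildir for audit.

    Hypothetical helper: maildir_path, payload (bytes) and sender are
    illustrative names, not part of the existing code.
    Returns the Maildir key of the stored message.
    """
    # MIMEApplication base64-encodes the body, so arbitrary bytes
    # survive the trip through the mail format unchanged.
    msg = MIMEApplication(payload)

    # Minimal RFC 2822 header; values here are placeholders.
    msg["From"] = sender
    msg["Date"] = email.utils.formatdate()
    msg["Message-ID"] = email.utils.make_msgid()
    msg["Subject"] = "Incoming message"

    # create=True makes the Maildir (tmp/new/cur) if it is missing;
    # the parent directory must already exist.
    box = mailbox.Maildir(maildir_path, factory=None, create=True)
    try:
        return box.add(msg)
    finally:
        box.close()
```

Reading a message back for audit would be something like
box.get_message(key).get_payload(decode=True) on a mailbox.Maildir
opened on the same path, which undoes the base64 encoding.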