author     Rob Austein <sra@hactrn.net>  2008-04-21 22:47:30 +0000
committer  Rob Austein <sra@hactrn.net>  2008-04-21 22:47:30 +0000
commit     8dfec45230670b061f16dbdc1492148d241e62a3
tree       3faaa36ce46792dcd4a930262d97689457a96236
parent     2e30f96bfd94b3831c31e18e7d4fbdcf38dd0103
Update TO DO
svn path=/rpkid/README; revision=1685
-rw-r--r--  rpkid/README  441
1 files changed, 195 insertions, 246 deletions
diff --git a/rpkid/README b/rpkid/README
index 0864cddf..257f2127 100644
--- a/rpkid/README
+++ b/rpkid/README
@@ -52,363 +52,312 @@ $Revision$
TO DO:
-- Update biz trust anchor model to what we came up with in Amsterdam.
- This was a direct result of security review by Kent and Housley.
+ * Update business trust anchor model to what was defined in Amsterdam. This
+ was a direct result of security review by Kent and Housley.
- This has been waiting for work we hope RobK is doing. This is
- probably not a lot of coding, probably a few extra cert fields in
- the self object which we then need to toss into the
- rpki.x509.X509_chain objects before verifying CMS or TLS, and
- perhaps the existing TA fields in various objects become pairs of
- certs instead of a single TA, but this is mostly just generalization
- and reuse of existing code, no bold new adventures.
+ This has been waiting for work hopefully being completed by the RIPE NCC, but
+ is probably not a lot of coding, probably a few extra certificate fields in the
+ self object which needs to go into the rpki.x509.X509_chain objects before
+ verifying CMS or TLS. Possibly the existing TA fields in various objects need
+ to become pairs of certificates instead of a single TA, but this is mostly just
+ generalization and reuse of existing code. Discussion in Philadelphia revealed
+ that this model is not yet a done deal.
- Discussion in Philadelphia revealed that this is not yet a done
- deal. Housley, RobK, and I all seem to be on the same page, and we
- think that what we're proposing will make sense to APNIC once we
- explain it properly, but overall we have not yet converged.
+ PRIORITY: Required for pilot (security issue)
- PRIORITY: Required (security issue)
+ TIME REQUIRED: Two weeks.
- STATUS: Not started
+ STATUS: Not started
-- rcynic handling of RPKI trust anchors needs updating. Discussions
- over last N months of how RPKI trust anchors work, how we package
- them, and how we roll them over. The last (TA rollover) is the
- driver for this.
- APNIC has apparently moved on from their proposal to use CMS-signed
- OpenSSL "PEM" format, they're now proposing a CMS-signed ASN.1
- SEQUENCE OF something. Precise details of APNIC's new model not yet
- known. Need to do analysis to make sure this is adaquate for our
- needs, if so just use it. This would involve minor changes to
- rcynic.
+ * rcynic handling of RPKI trust anchors needs updating, per discussions
+ over previous months of how RPKI trust anchors work, how we package them,
+ and how we roll them over. The last (TA rollover) is the driver for this.
- PRIORITY: Required (usability issue for relying parties)
+ APNIC has apparently moved on from their proposal to use CMS-signed OpenSSL
+ "PEM" format, they're now proposing a CMS-signed ASN.1 SEQUENCE OF
+ something. Precise details of APNIC's new model not yet known. Need to do
+ analysis to make sure this is adequate for our needs, if so just use it.
+ This would involve minor changes to rcynic.
- STATUS: Not started
+ PRIORITY: Required for pilot (usability issue for relying parties)
-- Publication protocol and implementation thereof. Protocol design
- started, Randy had comments that sent me back to the drawing board
- (he was right). Next step is to integrate Randy's advice, which
- probably means picking up more of the left-right protocol framework.
+ TIME REQUIRED: One week.
- Desirable although not strictly required that protcol be agreed upon
- among the RIRs. Might not be practical given how long it takes
- group to decide anything.
+ STATUS: Not started
- Tricky bit is making sure that repository receives enough
- information to know whether parent has authorized child to use
- parent's namespace in nesting case. In theory this is
- straightforward but requires careful checking.
- ARIN can't host output of non-hosted RPKI engines without this, and
- that's critical both to the security model as discussed with ARIN
- staff in late 2006, so I believe we need this capability even as
- part of the initial limited test.
+ * Publication protocol and implementation thereof. Desirable although not
+ strictly required that protocol be agreed upon among the RIRs. Tricky bit
+ is making sure that repository receives enough information to know whether
+ parent has authorized child to use parent's namespace in nesting case; in
+ theory this is straightforward but requires careful checking.
- PRIORITY: Required
+ ARIN can't host output of non-hosted RPKI engines without this, and that's
+ critical both to the security model as discussed with ARIN staff in late 2006,
+ hence this is a required capability even for testing.
- depending on how much of the protocol and implementation I can steal
- from the existing left-right protocol.
+ PRIORITY: Required for pilot
- STATUS: Started
+ TIME REQUIRED: 3-4 weeks for implementation once protocol settled, depending on
+ how much of the existing left-right protocol design and implementation can be
+ reused.
-- ROA generation. We have a bunch of the primitives for this but we
- aren't yet generating the ROAs themselves.
+ STATUS: Started
- For reasons that presumably made sense at the time, the left-right
- protocol for route_origin objects allows ranges as well as prefixes,
- and the SQL for stores everything as ranges, which is nice and
- general...except that ROAs can only hold prefixes. So left-right
- should only allow prefixes, and SQL should only store prefixes.
- Initial pass at ROA generation is done, not usable yet: ROA
- maintenance (after initial generation) not done, and the current CRL
- mechanism needs revision as it's too closely tied to child_cert
- objects. Need to rewrite with a separate CRL table hanging off the
- ca_detail object, then clean up excess baggage that currently hangs
- off the child_cert object.
+ * Resource subsetting (req_* attributes in up-down protocol), minimal
+ implementation. Recognize this as correct protocol and signal an
+ internal server error if ever used.
- PRIORITY: Required
+ PRIORITY: Required for pilot.
- STATUS: Started
+ TIME REQUIRED: Two days
-- rcynic does not yet handle manifests. This is both a real problem
- (manifests were added to plug a security hole) and a user acceptance
- problem (without manifest support rcynic checks old certs that are
- supposed to fail because they've been revoked, resulting in what
- appear to be spurious errors, which just annoy the user).
+ STATUS: Not started
- PRIORITY: Required
- STATUS: Not started
+ * ROA generation code is incomplete, no support yet for maintenance after
+ initial generation, and interaction with CRL mechanism needs work
-- Scripted tests to grow and shrink and revoke and .... See
- testbed.*.yaml, but more systematic testing needed.
+ PRIORITY: Required for pilot
- PRIORITY: Required
+ TIME REQUIRED: One week (remaining)
- STATUS: Ongoing
+ STATUS: Started
-- Randy's "user validation tool" (fetch and validate certs and
- probably the ROA for a prefix I want to accept in a route filter I
- am building in Python/Perl). This probably uses rcync's output as
- one of its inputs.
- This is a basic tool for a sysadmin who wants to -use- all this crud
- we're working so hard to generate. It's not required for the
- generation tools to work, but without it the entire toolset does
- nothing obviously useful, which will make it a very hard sell during
- the limited public test stage.
+ * rcynic does not yet handle manifests. This is both a real problem
+ (manifests were added to plug a security hole) and a user acceptance
+ problem (without manifest support rcynic checks old certs that are supposed
+ to fail because they've been revoked, resulting in what appear to be
+ spurious errors, which just annoy the user).
- PRIORITY: Required
+ PRIORITY: Required for pilot
- DEPENDS ON: ROA generation
+ TIME REQUIRED: Two weeks.
- STATUS: Not started
+ STATUS: Not started
-- Common protocol dump format with APNIC and other implementors so we
- can read each other's dumps. "Obvious" format would be an
- OpenSSL-style PEM of the CMS, with a "text" portion (the place where
- "openssl x509 -text" would put a text dump of a cert) showing the
- wrapped XML.
- PRIORITY: Desirable
+ * User validation tool: fetch and validate certs and ROA for a prefix that
+ the user wants to accept in a router filter the user is building. This
+ probably uses rcynic's output as one of its inputs.
- STATUS: Not started
+ PRIORITY: Required
-- Clean unused cruft out of left-right protocol, or at least have
- control booleans we don't intend to implement at present signal an
- error if used.
+ DEPENDS ON: ROA generation
- Bottleneck here has been deciding what to punt and what to
- implement. Removing unused booleans or raising errors when they're
- used is trivial.
+ TIME REQUIRED: 1-2 weeks
- PRIORITY: Required
+ STATUS: Not started
- STATUS: Error signalling done
-- Subsetting (req_* attributes in up-down protocol)
+ * Make rpkid fully event-driven (async tasking model), except for SQL
+ queries. This probably involves the "twisted" framework.
- Minimal implementation would be to recognize this as correct
- protocol and signal an internal server error if it's ever used.
+ PRIORITY: Required (to implement scalable hosting model)
- More serious implementation would require expanding SQL child_cert
- table to hold subset masks and tweaking almost every bit of code
- that touches that table.
+ TIME REQUIRED: Two weeks.
- PRIORITY: Required
+ STATUS: Not started
- STATUS: Not started
-- Error handling: make sure that exceptions map correctly to up-down
- error codes, flesh out left-right error codes. Note that the same
- exception may produce different error codes depending on which
- up-down PDU we're processing (sigh).
+ * Error handling: make sure that exceptions map correctly to up-down error
+ codes, flesh out left-right error codes. Note that the same exception may
+ produce different error codes depending on which up-down PDU we're
+ processing (sigh).
- Will require code audit for coherency.
+ Will require code audit for coherency, which is most of the work.
- PRIORITY: Required
+ PRIORITY: Required
- DEPENDS ON: almost everything else, as almost any code change can
- raise new exceptions that we'd need to handle.
+ TIME REQUIRED: Two weeks
- STATUS: Not started
+ DEPENDS ON: almost everything else, as almost any code change can raise new
+ exceptions that we'd need to handle.
-- db.commit(), db.rollback(), code audit for data integrity issues,
- fix any data integrity issues that turn up.
+ STATUS: Not started
- Among other issues, we need to handle loss of connnection to
- database server and other MySQL errors. MySQLdb throws an
- exception, which we can catch, and retrying is easy enough, but need
- to be careful about recovery action depending on whether we had
- uncommitted changes.
- PRIORITY: Required
+ * db.commit(), db.rollback(), code audit for data integrity issues, fix any
+ data integrity issues that turn up. Among other issues, need to handle loss
+ of connection to database server and other MySQL errors. Need to be careful
+ about recovery action depending on whether we had uncommitted changes.
- DEPENDS ON: async tasking model, sort of -- could do it first, but
- tasking change will affect the exception handling that triggers
- rollback.
+ PRIORITY: Required
- STATUS: Not started
+ TIME REQUIRED (commit and rollback): 3-4 weeks
-- Test with larger data set -- Tim gave me plenty of data, I have the
- low-level tools and the glue logic to create child objects for all
- the entities in the IRDB, but I don't yet have logic to poll on
- behalf of each of them and check result for sanity.
+ TIME REQUIRED (data integrity audit): 1 week
- Maybe it'd be easier to write something that dumps Tim's database in
- YAML format for testbed.py to chew on?
+ TIME REQUIRED (fix data integrity): Unknown, depends on code audit and results
+ of runtime testing.
- PRIORITY: Highly desirable
+ DEPENDS ON: async tasking model rollback.
- STATUS: Not started
+ STATUS: Not started
-- Clean up rootd.py to be usable in a production system. Most urgent
- issue is handling of private keys. May not need much else, as this
- is not a high-traffic server.
- PRIORITY: Highly desirable (not strictly needed for limited testing)
+ * Test framework for multiple self-instances per engine-instance (single
+ self-instance per engine-instance is already done).
- STATUS: Not started
+ PRIORITY: Required for testing
-- Test framework, multiple self-instances per engine-instance (single
- self-instance per engine-instance is already done).
+ DEPENDS ON: Async tasking model.
- PRIORITY: Required
+ TIME REQUIRED: One week
- DEPENDS ON: async tasking model.
+ STATUS: Not started
- STATUS: Not started
-- tlslite code seems flakey under heavy use, and doesn't support all
- the cert checks we want. Best bet for getting this right is
- probably to hack on the POW Ssl class until it supports everything
- shown in the OpenSSL book; aside from speed, the main advantage here
- is that there -is- a list of all the things one needs to do to use
- TLS properly if one follows this recipe, whereas with TLSlite it's
- all a mystery.
+ * Current TLS code (tlslite) is flakey under heavy use and doesn't support
+ all the required certificate checks. Best fix would be to add what support
+ we need to POW Ssl class.
- Useful side effect of doing this via POW: it brings us back to only
- needing one crypto library (in particular it lets us punt M2Crypto,
- which appears to be coded as an accident waiting to happen).
+ PRIORITY: Required for pilot (cert checking is a security issue).
- PRIORITY: Required (cert checking is a security issue).
+ TIME REQUIRED: 3-4 weeks
- DEPENDS ON: Async tasking model.
+ DEPENDS ON: Async tasking model.
- STATUS: Not started
+ STATUS: Not started
-- Make rpkid fully event-driven (async tasking model), except for SQL
- queries. This probably involves the "twisted" framework.
- PRIORITY: Required (to implement hosting model)
+ * Resource subsetting (req_* attributes in up-down protocol), full
+ implementation. Requires expanding SQL child_cert table to hold subset
+ masks and rewriting a fair amount of code.
- STATUS: Not started
+ PRIORITY: Required for full implementation.
-- Performance testing
+ TIME REQUIRED: 3-4 weeks
- STATUS: Not started
+ STATUS: Not started
-- Update internals docs (Doxygen). Mostly this means updating
- function comments in the Python code, as the rest is automatic. May
- require a bit of overview text to explain the workings of the code,
- this overview text may well turn out to be just the current flat
- text documents marked up for inclusion by Doxygen.
- PRIORITY: Desirable
+ * Performance testing
- STATUS: Ongoing
+ STATUS: Not started
-- Reorganize code (directory names, module names, which objects are in
- which modules, add gctx pointers to objects so we can stop passing
- all these flipping explicit gctx pointers in almost every function
- call) to make it easier to understand and maintain. Portions of the
- existing code were done in extreme haste to meet testing deadlines,
- and it shows.
- STATUS: File renaming mostly done, other stuff not started
+ * Clean up rootd.py to be usable in a production system. Most urgent issue is
+ handling of private keys. May not need much else, as this is not a
+ high-traffic server.
- PRIORITY: Highly desirable (to preserve programmers' and
- maintainers' sanity, if nothing else)
+ PRIORITY: Highly desirable (not strictly needed for pilot testing)
-- Add HSM support. Architecture includes it, current code does not.
- First step here would be talking to somebody who understands PKCS#11
- better than I do, ie, Richard Lamb or Francis Dupont.
+ TIME REQUIRED: One week
- STATUS: Not started
+ STATUS: Not started
- PRIORITY: Desirable. Am guessing ARIN does not require this for
- initial test
-- Installation packaging, so that rpkid can be installed like a normal
- package.
+ * Update internals docs (Doxygen). Mostly this means updating function
+ comments in the Python code, as the rest is automatic. May require a bit of
+ overview text to explain the workings of the code, this overview text may
+ well turn out to be just the current flat text documents marked up for
+ inclusion by Doxygen.
- STATUS: Not started
+ PRIORITY: Desirable
- PRIORITY: Desirable
+ TIME REQUIRED: One week.
-- Tighten up syntax checking in left-right schema.
+ STATUS: Ongoing
- STATUS: Not started
- PRIORITY: Desirable
+ * Reorganize code (directory names, module names, which objects are in which
+ modules, add gctx pointers to objects to avoid passing explicit gctx
+ pointers in almost every function call) to make it easier to understand and
+ maintain. Portions of the existing code were done in extreme haste to meet
+ testing deadlines, and it shows.
-- Rethink exposing SQL primary indices in protocols. Right now, we
- use autoincremented SQL indices in many places in the left-right
- protocol, and they're even expose in a few places in our
- implementation of the up-down protocol. This is nice and unique but
- may be operationally fragile, since up-down usage means that URLs
- contain mechanically assigned identifiers rather than an identifier
- negotiated between the two parties during contract setup.
+ PRIORITY: Highly desirable
- RobK suggested that we should instead use something like a hash of
- the client's name, which would be probabilistically unique, would
- not expose information, but would be stable even if we had to
- rebuild the database.
+ TIME REQUIRED: One week.
- STATUS: Not started
+ STATUS: File renaming mostly done, other stuff not started
- if we decide to make a change unknown, but probably on the order of
- a few days.
- PRIORITY: Rethinking desirable; reworking unknown
+ * Add HSM support. Architecture includes it, current code does not. First
+   step here would be talking to somebody with a strong understanding of
+   PKCS#11.
-- IETF SIDR WG is still mumbling about ROAs with multiple signatures.
- As far as I can tell there is no need for this, but the WG may not
- agree. If ARIN wants me to implement this, will require both
- some SQL work (current table relationships assume ROA is tied to a
- single ca_detail) and some OpenSSL work (OpenSSL doesn't fully
- support multiple signatures yet, have not investigated in depth).
+ PRIORITY: Desirable, not required for pilot
- STATUS: Not started
+ TIME REQUIRED: Unknown
- PRIORITY: Minimal, IETF feeping creaturism
+ STATUS: Not started
-- Deaddrop of incoming messages, for audit. Absent a better theory,
- steal existing tech for this: preface with minimal RFC 2822 header
- and drop it into a Maildir folder using built-in Python Maildir
- library code, at which point it becomes soebody else's problem.
- STATUS: Not started
+ * Installation packaging, so that rpkid can be built and installed like a
+ normal package.
- Priority: Desirable, trivial to implement.
+ PRIORITY: Desirable
+ TIME REQUIRED: One week, longer if installation for many platforms is
+ required
-
+ STATUS: Not started
+
+
+ * Tighten up syntax checking in left-right schema.
+
+ PRIORITY: Desirable
+
+ TIME REQUIRED: One day.
+
+ STATUS: Not started
+
+
+ * Rethink exposing SQL primary indices in protocols. Right now,
+ auto-incremented SQL indices are used in many places in the left-right
+ protocol, and are even exposed in a few places in our implementation of the
+ up-down protocol. This is nice and unique but may be operationally fragile,
+ since up-down usage means that URLs contain mechanically assigned
+ identifiers rather than an identifier negotiated between the two parties
+ during contract setup.
+
+ The RIPE NCC suggested that we should instead use something like a hash of the
+ client's name, which would be probabilistically unique, would not expose
+ information, but would be stable even if we had to rebuild the database.
+
+ PRIORITY: Rethinking desirable; reworking unknown
+
+ TIME REQUIRED: One week to evaluate. Implementation time if we decide to make a
+ change unknown, but probably on the order of another week.
+
+ STATUS: Not started
+
+
+ * Common protocol dump format with APNIC and other implementors so we can
+ exchange protocol dumps.
+
+ PRIORITY: Desirable
+
+ TIME REQUIRED: Two days
+
+ STATUS: Not started
-Things implemented but not yet tested.
-- Client side of expiration now assumes that parent will reissue
- when its IRDB changes.
+ * IETF SIDR WG is still talking about ROAs with multiple signatures. No
+ obvious need for this but IETF may mandate it anyway. Full implementation
+ would require significant work revising current SQL table relations and
+ upgrading CMS support.
-- Parent side of revocation (child_cert objects) and CRL generation
- implemented.
+ PRIORITY: Minimal, IETF feeping creaturism
-- Parent side of expiration implemented.
+ TIME REQUIRED: Unknown
-- Child batch processing loop: regeneration or removal of expired
- certs based on what's in the IRDB.
+ STATUS: Not started
-- Batch regeneration of CRLs and manifests for all CAs.
-- Protection against up-down operations specifying a class_name that
- belongs to some other self context.
+ * Deaddrop of incoming messages, for audit. Absent a better theory,
+ steal existing tech for this: preface with minimal RFC 2822 header
+ and drop it into a Maildir folder using built-in Python Maildir
+     library code, at which point it becomes somebody else's problem.
-- Rewrote code that handles revoke on shrink to revoke -all- old certs
- for that key, not just most recent. Not certain, but this may have
- been the cause of a cert dropping not showing up in the CRL during
- testing with APNIC in Vancouver.
+ STATUS: Not started
-- Kludgy local publication hack seems to work now, including
- withdrawal. rcynic still whines occasionally, but I think that's
- just because, without manifest support, rcynic has no way of telling
- the difference between certs we withdrew on purpose and certs that
- were removed by an attacker, so the first rcynic run after a cert
- has been revoked pulls the old cert from the previous rcynic pass,
- find that it's listed in the CRL, and whines about it.
+ PRIORITY: Desirable, trivial to implement.
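
The deaddrop item names only the mechanism: a minimal RFC 2822 header plus Python's built-in Maildir support. As a rough illustration, assuming the raw incoming message is available as bytes, the standard library's `mailbox` and `email` packages are enough for a sketch. The `deaddrop` function, its parameters, and the header values below are hypothetical and are not taken from rpkid.

```python
import email.utils
import mailbox
import os
import tempfile
from email.message import EmailMessage


def deaddrop(maildir_path, pdu_bytes, sender="rpkid@localhost", subject="incoming PDU"):
    """Archive one incoming protocol message in a Maildir folder for audit.

    Hypothetical helper (name, parameters, and header values are assumptions).
    maildir_path should either not exist yet (it is then created along with
    its tmp/new/cur subdirectories) or already be a valid Maildir.
    """
    msg = EmailMessage()
    # Minimal RFC 2822 header block prefacing the raw message.
    msg["From"] = sender
    msg["Date"] = email.utils.formatdate(localtime=False)
    msg["Subject"] = subject
    msg["Message-ID"] = email.utils.make_msgid()
    # Keep the raw (e.g. CMS-wrapped) PDU as an opaque base64-encoded body.
    msg.set_content(pdu_bytes, maintype="application", subtype="octet-stream")
    box = mailbox.Maildir(maildir_path, create=True)
    try:
        # Returns the Maildir key under which the message was stored.
        return box.add(msg)
    finally:
        box.close()


if __name__ == "__main__":
    with tempfile.TemporaryDirectory() as tmp:
        key = deaddrop(os.path.join(tmp, "deaddrop"), b"\x30\x00")
        print("stored as", key)
```

Once the message is in the Maildir, any mail tool can read it. That is the point of borrowing the format rather than inventing an audit log.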