Update TO DO

svn path=/rpkid/README; revision=1685
author: Rob Austein <sra@hactrn.net> 2008-04-21 22:47:30 +0000
committer: Rob Austein <sra@hactrn.net> 2008-04-21 22:47:30 +0000
commit: 8dfec45230670b061f16dbdc1492148d241e62a3 (patch)
tree: 3faaa36ce46792dcd4a930262d97689457a96236
parent: 2e30f96bfd94b3831c31e18e7d4fbdcf38dd0103 (diff)
1 files changed, 195 insertions, 246 deletions
diff --git a/rpkid/README b/rpkid/README
index 0864cddf..257f2127 100644
--- a/rpkid/README
+++ b/rpkid/README
@@ -52,363 +52,312 @@ $Revision$
 
 TO DO:
 
-- Update biz trust anchor model to what we came up with in Amsterdam.
-  This was a direct result of security review by Kent and Housley.
+      * Update business trust anchor model to what was defined in Amsterdam. This
+	was a direct result of security review by Kent and Housley.
 
-  This has been waiting for work we hope RobK is doing.  This is
-  probably not a lot of coding, probably a few extra cert fields in
-  the self object which we then need to toss into the
-  rpki.x509.X509_chain objects before verifying CMS or TLS, and
-  perhaps the existing TA fields in various objects become pairs of
-  certs instead of a single TA, but this is mostly just generalization
-  and reuse of existing code, no bold new adventures.
+	This has been waiting for work hopefully being completed by the RIPE NCC, but
+	is probably not a lot of coding, probably a few extra certificate fields in the
+	self object which needs to go into the rpki.x509.X509_chain objects before
+	verifying CMS or TLS. Possibly the existing TA fields in various objects need
+	to become pairs of certificates instead of a single TA, but this is mostly just
+	generalization and reuse of existing code. Discussion in Philadelphia revealed
+	that this model is not yet a done deal.
 
-  Discussion in Philadelphia revealed that this is not yet a done
-  deal.  Housley, RobK, and I all seem to be on the same page, and we
-  think that what we're proposing will make sense to APNIC once we
-  explain it properly, but overall we have not yet converged.
+	PRIORITY: Required for pilot (security issue)
 
-  PRIORITY: Required (security issue)
+	TIME REQUIRED: Two weeks.
 
-  STATUS: Not started
+	STATUS: Not started
 
-- rcynic handling of RPKI trust anchors needs updating.  Discussions
-  over last N months of how RPKI trust anchors work, how we package
-  them, and how we roll them over.  The last (TA rollover) is the
-  driver for this.
 
-  APNIC has apparently moved on from their proposal to use CMS-signed
-  OpenSSL "PEM" format, they're now proposing a CMS-signed ASN.1
-  SEQUENCE OF something.  Precise details of APNIC's new model not yet
-  known.  Need to do analysis to make sure this is adaquate for our
-  needs, if so just use it.  This would involve minor changes to
-  rcynic.
+      * rcynic handling of RPKI trust anchors needs updating, per discussions
+	over previous months of how RPKI trust anchors work, how we package them,
+	and how we roll them over. The last (TA rollover) is the driver for this.
 
-  PRIORITY: Required (usability issue for relying parties)
+	APNIC has apparently moved on from their proposal to use CMS-signed OpenSSL
+	"PEM" format, they're now proposing a CMS-signed ASN.1 SEQUENCE OF
+	something. Precise details of APNIC's new model not yet known. Need to do
+	analysis to make sure this is adequate for our needs, if so just use it.
+	This would involve minor changes to rcynic.
 
-  STATUS: Not started
+	PRIORITY: Required for pilot (usability issue for relying parties)
 
-- Publication protocol and implementation thereof.   Protocol design
-  started, Randy had comments that sent me back to the drawing board
-  (he was right).  Next step is to integrate Randy's advice, which
-  probably means picking up more of the left-right protocol framework.
+	TIME REQUIRED: One week.
 
-  Desirable although not strictly required that protcol be agreed upon
-  among the RIRs.  Might not be practical given how long it takes
-  group to decide anything.
+	STATUS: Not started
 
-  Tricky bit is making sure that repository receives enough
-  information to know whether parent has authorized child to use
-  parent's namespace in nesting case.   In theory this is
-  straightforward but requires careful checking.
 
-  ARIN can't host output of non-hosted RPKI engines without this, and
-  that's critical both to the security model as discussed with ARIN
-  staff in late 2006, so I believe we need this capability even as
-  part of the initial limited test.
+      * Publication protocol and implementation thereof. Desirable although not
+	strictly required that protocol be agreed upon among the RIRs. Tricky bit
+	is making sure that repository receives enough information to know whether
+	parent has authorized child to use parent's namespace in nesting case; in
+	theory this is straightforward but requires careful checking.
 
-  PRIORITY: Required
+	ARIN can't host output of non-hosted RPKI engines without this, and that's
+	critical to the security model as discussed with ARIN staff in late 2006,
+	hence this is a required capability even for testing.
 
-  depending on how much of the protocol and implementation I can steal
-  from the existing left-right protocol.
+	PRIORITY: Required for pilot
 
-  STATUS: Started
+	TIME REQUIRED: 3-4 weeks for implementation once protocol settled, depending on
+	how much of the existing left-right protocol design and implementation can be
+	reused.
 
-- ROA generation.  We have a bunch of the primitives for this but we
-  aren't yet generating the ROAs themselves.
+	STATUS: Started
 
-  For reasons that presumably made sense at the time, the left-right
-  protocol for route_origin objects allows ranges as well as prefixes,
-  and the SQL for stores everything as ranges, which is nice and
-  general...except that ROAs can only hold prefixes.  So left-right
-  should only allow prefixes, and SQL should only store prefixes.
 
-  Initial pass at ROA generation is done, not usable yet: ROA
-  maintenance (after initial generation) not done, and the current CRL
-  mechanism needs revision as it's too closely tied to child_cert
-  objects.  Need to rewrite with a separate CRL table hanging off the
-  ca_detail object, then clean up excess baggage that currently hangs
-  off the child_cert object.
+      * Resource subsetting (req_* attributes in up-down protocol), minimal
+	implementation. Recognize this as correct protocol and signal an
+	internal server error if ever used.
 
-  PRIORITY: Required
+	PRIORITY: Required for pilot.
 
-  STATUS: Started
+	TIME REQUIRED: Two days
 
-- rcynic does not yet handle manifests.  This is both a real problem
-  (manifests were added to plug a security hole) and a user acceptance
-  problem (without manifest support rcynic checks old certs that are
-  supposed to fail because they've been revoked, resulting in what
-  appear to be spurious errors, which just annoy the user).
+	STATUS: Not started
 
-  PRIORITY: Required
 
-  STATUS: Not started
+      * ROA generation code is incomplete: no support yet for maintenance after
+	initial generation, and interaction with CRL mechanism needs work.
 
-- Scripted tests to grow and shrink and revoke and ....  See
-  testbed.*.yaml, but more systematic testing needed.
+	PRIORITY: Required for pilot
 
-  PRIORITY: Required
+	TIME REQUIRED: One week (remaining)
 
-  STATUS: Ongoing
+	STATUS: Started
 
-- Randy's "user validation tool" (fetch and validate certs and
-  probably the ROA for a prefix I want to accept in a route filter I
-  am building in Python/Perl).  This probably uses rcync's output as
-  one of its inputs.
 
-  This is a basic tool for a sysadmin who wants to -use- all this crud
-  we're working so hard to generate.  It's not required for the
-  generation tools to work, but without it the entire toolset does
-  nothing obviously useful, which will make it a very hard sell during
-  the limited public test stage.
+      * rcynic does not yet handle manifests. This is both a real problem
+	(manifests were added to plug a security hole) and a user acceptance
+	problem (without manifest support rcynic checks old certs that are supposed
+	to fail because they've been revoked, resulting in what appear to be
+	spurious errors, which just annoy the user).
 
-  PRIORITY: Required
+	PRIORITY: Required for pilot
 
-  DEPENDS ON: ROA generation
+	TIME REQUIRED: Two weeks.
 
-  STATUS: Not started
+	STATUS: Not started
 
-- Common protocol dump format with APNIC and other implementors so we
-  can read each other's dumps.  "Obvious" format would be an
-  OpenSSL-style PEM of the CMS, with a "text" portion (the place where
-  "openssl x509 -text" would put a text dump of a cert) showing the
-  wrapped XML.
 
-  PRIORITY: Desirable
+      * User validation tool: fetch and validate certs and ROA for a prefix that
+	the user wants to accept in a router filter the user is building. This
+	probably uses rcynic's output as one of its inputs.
 
-  STATUS: Not started
+	PRIORITY: Required
 
-- Clean unused cruft out of left-right protocol, or at least have
-  control booleans we don't intend to implement at present signal an
-  error if used.
+	DEPENDS ON: ROA generation
 
-  Bottleneck here has been deciding what to punt and what to
-  implement.  Removing unused booleans or raising errors when they're
-  used is trivial.
+	TIME REQUIRED: 1-2 weeks
 
-  PRIORITY: Required
+	STATUS: Not started
 
-  STATUS: Error signalling done
 
-- Subsetting (req_* attributes in up-down protocol)
+      * Make rpkid fully event-driven (async tasking model), except for SQL
+	queries. This probably involves the "twisted" framework.
 
-  Minimal implementation would be to recognize this as correct
-  protocol and signal an internal server error if it's ever used.
+	PRIORITY: Required (to implement scalable hosting model)
 
-  More serious implementation would require expanding SQL child_cert
-  table to hold subset masks and tweaking almost every bit of code
-  that touches that table.
+	TIME REQUIRED: Two weeks.
 
-  PRIORITY: Required
+	STATUS: Not started
 
-  STATUS: Not started
 
-- Error handling: make sure that exceptions map correctly to up-down
-  error codes, flesh out left-right error codes.  Note that the same
-  exception may produce different error codes depending on which
-  up-down PDU we're processing (sigh).
+      * Error handling: make sure that exceptions map correctly to up-down error
+	codes, flesh out left-right error codes. Note that the same exception may
+	produce different error codes depending on which up-down PDU we're
+	processing (sigh).
 
-  Will require code audit for coherency.
+	Will require code audit for coherency, which is most of the work.
 
-  PRIORITY: Required
+	PRIORITY: Required
 
-  DEPENDS ON: almost everything else, as almost any code change can
-  raise new exceptions that we'd need to handle.
+	TIME REQUIRED: Two weeks
 
-  STATUS: Not started
+	DEPENDS ON: almost everything else, as almost any code change can raise new
+	exceptions that we'd need to handle.
 
-- db.commit(), db.rollback(), code audit for data integrity issues,
-  fix any data integrity issues that turn up.
+	STATUS: Not started
 
-  Among other issues, we need to handle loss of connnection to
-  database server and other MySQL errors.  MySQLdb throws an
-  exception, which we can catch, and retrying is easy enough, but need
-  to be careful about recovery action depending on whether we had
-  uncommitted changes.
 
-  PRIORITY: Required
+      * db.commit(), db.rollback(), code audit for data integrity issues, fix any
+	data integrity issues that turn up. Among other issues, need to handle loss
+	of connection to database server and other MySQL errors. Need to be careful
+	about recovery action depending on whether we had uncommitted changes.
 
-  DEPENDS ON: async tasking model, sort of -- could do it first, but
-  tasking change will affect the exception handling that triggers
-  rollback.
+	PRIORITY: Required
 
-  STATUS: Not started
+	TIME REQUIRED (commit and rollback): 3-4 weeks
 
-- Test with larger data set -- Tim gave me plenty of data, I have the
-  low-level tools and the glue logic to create child objects for all
-  the entities in the IRDB, but I don't yet have logic to poll on
-  behalf of each of them and check result for sanity.
+	TIME REQUIRED (data integrity audit): 1 week
 
-  Maybe it'd be easier to write something that dumps Tim's database in
-  YAML format for testbed.py to chew on?
+	TIME REQUIRED (fix data integrity): Unknown, depends on code audit and results
+	of runtime testing.
 
-  PRIORITY: Highly desirable
+	DEPENDS ON: async tasking model (it affects the exception handling that triggers rollback).
 
-  STATUS: Not started
+	STATUS: Not started
 
-- Clean up rootd.py to be usable in a production system.   Most urgent
-  issue is handling of private keys.   May not need much else, as this
-  is not a high-traffic server.
 
-  PRIORITY: Highly desirable (not strictly needed for limited testing)
+      * Test framework for multiple self-instances per engine-instance (single
+	self-instance per engine-instance is already done).
 
-  STATUS: Not started
+	PRIORITY: Required for testing
 
-- Test framework, multiple self-instances per engine-instance (single
-  self-instance per engine-instance is already done).
+	DEPENDS ON: Async tasking model.
 
-  PRIORITY: Required
+	TIME REQUIRED: One week
 
-  DEPENDS ON: async tasking model.
+	STATUS: Not started
 
-  STATUS: Not started
 
-- tlslite code seems flakey under heavy use, and doesn't support all
-  the cert checks we want.  Best bet for getting this right is
-  probably to hack on the POW Ssl class until it supports everything
-  shown in the OpenSSL book; aside from speed, the main advantage here
-  is that there -is- a list of all the things one needs to do to use
-  TLS properly if one follows this recipe, whereas with TLSlite it's
-  all a mystery.
+      * Current TLS code (tlslite) is flakey under heavy use and doesn't support
+	all the required certificate checks. Best fix would be to add what support
+	we need to POW Ssl class.
 
-  Useful side effect of doing this via POW: it brings us back to only
-  needing one crypto library (in particular it lets us punt M2Crypto,
-  which appears to be coded as an accident waiting to happen).
+	PRIORITY: Required for pilot (cert checking is a security issue).
 
-  PRIORITY: Required (cert checking is a security issue).
+	TIME REQUIRED: 3-4 weeks
 
-  DEPENDS ON: Async tasking model.
+	DEPENDS ON: Async tasking model.
 
-  STATUS: Not started
+	STATUS: Not started
 
-- Make rpkid fully event-driven (async tasking model), except for SQL
-  queries.  This probably involves the "twisted" framework.
 
-  PRIORITY: Required (to implement hosting model)
+      * Resource subsetting (req_* attributes in up-down protocol), full
+        implementation.  Requires expanding SQL child_cert table to hold subset
+        masks and rewriting a fair amount of code.
 
-  STATUS: Not started
+	PRIORITY: Required for full implementation.
 
-- Performance testing
+	TIME REQUIRED: 3-4 weeks
 
-  STATUS: Not started
+	STATUS: Not started
 
-- Update internals docs (Doxygen).   Mostly this means updating
-  function comments in the Python code, as the rest is automatic.  May
-  require a bit of overview text to explain the workings of the code,
-  this overview text may well turn out to be just the current flat
-  text documents marked up for inclusion by Doxygen.
 
-  PRIORITY: Desirable
+      * Performance testing
 
-  STATUS: Ongoing
+	STATUS: Not started
 
-- Reorganize code (directory names, module names, which objects are in
-  which modules, add gctx pointers to objects so we can stop passing
-  all these flipping explicit gctx pointers in almost every function
-  call) to make it easier to understand and maintain.  Portions of the
-  existing code were done in extreme haste to meet testing deadlines,
-  and it shows.
 
-  STATUS: File renaming mostly done, other stuff not started
+      * Clean up rootd.py to be usable in a production system. Most urgent issue is
+	handling of private keys. May not need much else, as this is not a
+	high-traffic server.
 
-  PRIORITY: Highly desirable (to preserve programmers' and
-  maintainers' sanity, if nothing else)
+	PRIORITY: Highly desirable (not strictly needed for pilot testing)
 
-- Add HSM support.  Architecture includes it, current code does not.
-  First step here would be talking to somebody who understands PKCS#11
-  better than I do, ie, Richard Lamb or Francis Dupont.
+	TIME REQUIRED: One week
 
-  STATUS: Not started
+	STATUS: Not started
 
-  PRIORITY: Desirable.  Am guessing ARIN does not require this for
-  initial test
 
-- Installation packaging, so that rpkid can be installed like a normal
-  package.
+      * Update internals docs (Doxygen). Mostly this means updating function
+	comments in the Python code, as the rest is automatic. May require a bit of
+	overview text to explain the workings of the code, this overview text may
+	well turn out to be just the current flat text documents marked up for
+	inclusion by Doxygen.
 
-  STATUS: Not started
+	PRIORITY: Desirable
 
-  PRIORITY: Desirable
+	TIME REQUIRED: One week.
 
-- Tighten up syntax checking in left-right schema.
+	STATUS: Ongoing
 
-  STATUS: Not started
 
-  PRIORITY: Desirable
+      * Reorganize code (directory names, module names, which objects are in which
+	modules, add gctx pointers to objects to avoid passing explicit gctx
+	pointers in almost every function call) to make it easier to understand and
+	maintain. Portions of the existing code were done in extreme haste to meet
+	testing deadlines, and it shows.
 
-- Rethink exposing SQL primary indices in protocols.  Right now, we
-  use autoincremented SQL indices in many places in the left-right
-  protocol, and they're even expose in a few places in our
-  implementation of the up-down protocol.  This is nice and unique but
-  may be operationally fragile, since up-down usage means that URLs
-  contain mechanically assigned identifiers rather than an identifier
-  negotiated between the two parties during contract setup.
+	PRIORITY: Highly desirable
 
-  RobK suggested that we should instead use something like a hash of
-  the client's name, which would be probabilistically unique, would
-  not expose information, but would be stable even if we had to
-  rebuild the database.
+	TIME REQUIRED: One week.
 
-  STATUS: Not started
+	STATUS: File renaming mostly done, other stuff not started
 
-  if we decide to make a change unknown, but probably on the order of
-  a few days.
 
-  PRIORITY: Rethinking desirable; reworking unknown
+      * Add HSM support. Architecture includes it, current code does not. First
+	step here would be talking to somebody with strong understanding of
+	PKCS#11.
 
-- IETF SIDR WG is still mumbling about ROAs with multiple signatures.
-  As far as I can tell there is no need for this, but the WG may not
-  agree.  If ARIN wants me to implement this, will require both
-  some SQL work (current table relationships assume ROA is tied to a
-  single ca_detail) and some OpenSSL work (OpenSSL doesn't fully
-  support multiple signatures yet, have not investigated in depth).
+	PRIORITY: Desirable, not required for pilot
 
-  STATUS: Not started
+	TIME REQUIRED: Unknown
 
-  PRIORITY: Minimal, IETF feeping creaturism
+	STATUS: Not started
 
-- Deaddrop of incoming messages, for audit.  Absent a better theory,
-  steal existing tech for this: preface with minimal RFC 2822 header
-  and drop it into a Maildir folder using built-in Python Maildir
-  library code, at which point it becomes soebody else's problem.
 
-  STATUS: Not started
+      * Installation packaging, so that rpkid can be built and installed like a
+	normal package.
 
-  Priority: Desirable, trivial to implement.
+	PRIORITY: Desirable
 
+	TIME REQUIRED: One week, longer if installation for many platforms is
+	required
 
-
+	STATUS: Not started
+
+
+      * Tighten up syntax checking in left-right schema.
+
+	PRIORITY: Desirable
+
+	TIME REQUIRED: One day.
+
+	STATUS: Not started
+
+
+      * Rethink exposing SQL primary indices in protocols. Right now,
+	auto-incremented SQL indices are used in many places in the left-right
+	protocol, and are even exposed in a few places in our implementation of the
+	up-down protocol. This is nice and unique but may be operationally fragile,
+	since up-down usage means that URLs contain mechanically assigned
+	identifiers rather than an identifier negotiated between the two parties
+	during contract setup.
+
+	The RIPE NCC suggested that we should instead use something like a hash of the
+	client's name, which would be probabilistically unique, would not expose
+	information, but would be stable even if we had to rebuild the database.
+
+	PRIORITY: Rethinking desirable; reworking unknown
+
+	TIME REQUIRED: One week to evaluate. Implementation time if we decide to make a
+	change unknown, but probably on the order of another week.
+
+	STATUS: Not started
+
+
+      * Common protocol dump format with APNIC and other implementors so we can
+	exchange protocol dumps.
+
+	PRIORITY: Desirable
+
+	TIME REQUIRED: Two days
+
+	STATUS: Not started
 
-Things implemented but not yet tested. 
 
-- Client side of expiration now assumes that parent will reissue
-  when its IRDB changes.
+      * IETF SIDR WG is still talking about ROAs with multiple signatures. No
+	obvious need for this but IETF may mandate it anyway. Full implementation
+	would require significant work revising current SQL table relations and
+	upgrading CMS support.
 
-- Parent side of revocation (child_cert objects) and CRL generation
-  implemented.
+	PRIORITY: Minimal, IETF feeping creaturism
 
-- Parent side of expiration implemented.
+	TIME REQUIRED: Unknown
 
-- Child batch processing loop: regeneration or removal of expired
-  certs based on what's in the IRDB.
+	STATUS: Not started
 
-- Batch regeneration of CRLs and manifests for all CAs.
 
-- Protection against up-down operations specifying a class_name that
-  belongs to some other self context.
+      * Deaddrop of incoming messages, for audit.  Absent a better theory,
+        steal existing tech for this: preface with minimal RFC 2822 header
+	and drop it into a Maildir folder using built-in Python Maildir
+	library code, at which point it becomes somebody else's problem.
 
-- Rewrote code that handles revoke on shrink to revoke -all- old certs
-  for that key, not just most recent.  Not certain, but this may have
-  been the cause of a cert dropping not showing up in the CRL during
-  testing with APNIC in Vancouver.
+	STATUS: Not started
 
-- Kludgy local publication hack seems to work now, including
-  withdrawal.  rcynic still whines occasionally, but I think that's
-  just because, without manifest support, rcynic has no way of telling
-  the difference between certs we withdrew on purpose and certs that
-  were removed by an attacker, so the first rcynic run after a cert
-  has been revoked pulls the old cert from the previous rcynic pass,
-  find that it's listed in the CRL, and whines about it.
+	PRIORITY: Desirable, trivial to implement.
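
The deaddrop item above (minimal RFC 2822 header, then drop into a Maildir
using the built-in Python library) could look roughly like the sketch below.
This is only an illustration under stated assumptions, not code from rpkid:
the function name deaddrop(), the header values, and the choice to wrap the
raw message as a base64 application/octet-stream part are all hypothetical.
It uses the standard mailbox and email modules as they exist in current
Python.

```python
import mailbox
from email.mime.application import MIMEApplication
from email.utils import formatdate, make_msgid


def deaddrop(maildir_path, payload, sender="rpkid@localhost"):
    """Store raw message bytes in a Maildir folder for later audit.

    Hypothetical helper: name, arguments, and header values are
    assumptions made for this sketch.
    """
    # Wrap the bytes in a MIME part; MIMEApplication base64-encodes by
    # default, so binary (e.g. DER-encoded CMS) content survives intact.
    msg = MIMEApplication(payload)

    # Minimal RFC 2822 header fields.
    msg["From"] = sender
    msg["Date"] = formatdate()
    msg["Message-ID"] = make_msgid()
    msg["Subject"] = "rpkid deaddrop"

    # create=True makes the Maildir (cur/new/tmp) if it does not exist yet.
    box = mailbox.Maildir(maildir_path, create=True)
    # add() writes the message into the folder and returns its key.
    return box.add(msg)
```

Reading a message back for audit would then be a matter of opening the same
Maildir and calling get_message(key).get_payload(decode=True) to recover the
original bytes.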