$Id$

Python RPKI production tools.

Requires Python 2.5.

External Python packages required:

- lxml, which in turn requires the libxml2 C libraries.

  FreeBSD: /usr/ports/devel/py-lxml

- MySQLdb, which in turn requires MySQL client and server.  I'm
  testing with MySQL 5.1.

  FreeBSD: /usr/ports/databases/py-MySQLdb

- tlslite, which pulls in other crypto packages.

  FreeBSD: /usr/ports/security/py-tlslite

- Cryptlib, at the moment just to support tlslite, but we may end up
  using it for other things later.

  FreeBSD: /usr/ports/security/cryptlib

  ...but the FreeBSD port doesn't (yet?) install the Python bindings,
  sigh, so at the moment you have to do that by hand:

  # cd /usr/ports/security/cryptlib
  # make install
  # cd work/bindings
  # python setup.py install
  # cd ../..
  # make clean

- Eventually I expect that this will require an event-handling package
  like Twisted, but I'm not there yet.

- The testpoke tool (up-down protocol command line test client) also
  uses PyYAML, mostly for compatibility with APNIC's equivalent tool.

  FreeBSD: /usr/ports/devel/py-yaml

We also use a hacked copy of the Python OpenSSL Wrappers (POW)
package, but our copy has enough modifications that it's expanded in
the Subversion tree.  Depending on how this all works out, I may end
up splitting the POW.pkix module out of the POW package and using it
with Cryptlib, as the POW.pkix package is 98% about doing ASN.1 in
pure Python and only 2% about any kind of crypto.



Current TO DO list:

- Representation of timestamps is a mess.  We have four different
  kinds already: seconds from epoch, the two flavors of timestamps
  used in ASN.1, and the timestamps used in MySQL.  Need a unifying
  class to hide all this nastiness.

- Subsetting (req_* attributes in up-down protocol)

- Revocation and CRL generation

  - Need to keep data on unexpired revoked certs to generate CRL
    (added a timestamp for this, sufficient?)

  - Do we ever need to delay revocation of old certs to give their
    replacements time to propagate?

  These two imply that we need fields in child_cert table to
  indicate whether a cert is dead, eg, a date field which is NULL if
  the cert is still live, otherwise is the date after which it should
  be added to the CRL.

- Publication protocol and implementation thereof.   Defer until core
  functionality in the main engine is done.

  As an interim measure, hack some kind of stub publication (not real
  protocol yet, just dump to local filesystem so we can see outputs and
  maybe run rcynic against them); this is a stop-gap to let me
  concentrate on the main engine and defer work on the publication
  protocol and engine.

- Publication hooks everywhere - need not wait for protocol, can just
  log what would happen for now, or write to local file store (perhaps
  even in a form that we can use with rcynic as a relying party).
  Hooks for this go into:

  - Cert publication

  - CRL publication

  - Manifest publication

  - Withdrawal of any of the above

- Child batch processing loop, eg, regeneration or removal of expired
  certs, CRL update, manifest update, etc.  This should probably be an
  iteration over CA objects, as the CA is the actor in pretty much
  everything that might need to be done.

  Figuring out whether to regenerate or remove expired certs requires
  some of the same data as CRL generation.

  - Code to clean up expired certs

  - Code to revoke certs -- need to sort out when we do this
    automatically vs waiting for explicit revoke PDU from child

  - Code to generate CRLs

- Haven't done anything about db.commit() and db.rollback() yet; for
  that matter, haven't yet whacked MySQL to enable those features.

- In theory, all access to object data attributes ought to be through
  accessor methods so that the .set() method can mark the object as
  SQL-dirty automagically.  Not done yet.  One way of hiding the
  grotty bits here might be to make attributes look like elements of a
  dict(), ie, implement __getitem__() and __setitem__() methods,
  probably along with the other "container type" special methods for
  completeness.  An even easier way to do this would be to subclass
  dict() and just customize __setitem__().

  But all of this assumes that we really want automagic SQL-dirty, and
  I'm less convinced of that now than when I made the above note.

- Whack expiration dates of certs to match irdb valid_until value when
  issuing -- valid_until is optional, what do we do if it's not set?
  A default period in the self object seems the obvious answer; neither
  Randy nor I have thought of anything better yet.

- Test with larger data set -- Tim gave me plenty of data and I have
  the low-level tools, just haven't written the glue logic to create
  child objects for all the entities in the IRDB, poll on behalf of
  each of them, and check the result for sanity.

- Need to figure out how we're going to handle "root" case (IANA/RIR
  self-signed resource cert issuing to its children).  Right now this
  is a separate little program, which is nice for testing but feels
  wrong.  As Randy puts it, this is kind of a special case of a
  parent, although we might want to represent it differently.  If this
  were really only IANA and the RIRs, handling it as a special-case
  separate program might make sense, but anybody who wants to certify
  private address space is going to have this problem, and the
  separate daemon approach just feels wrong for that.

  If it's not a separate daemon, will need left-right protocol support
  to configure whatever it is we're going to configure, with all the
  usual private key hygiene issues.  This probably implies a level of
  indirection, eg, the self-signed cert is generated in the IRBE, the
  RPKI engine generates PKCS#10 for a working cert to be issued by the
  self-signed cert (perhaps with RFC 3779 inheritance for everything,
  to keep it small), so that the RPKI engine never needs to hold the
  private key for the root.

  Deferred for the moment, not sure for how long.

- Logging subsystem, including syslog support.
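
For the timestamp item at the top of this list, a unifying class
might look roughly like the sketch below.  Everything here is an
assumption made for illustration: the class name, the method names,
the choice of seconds from epoch as the internal form, and the idea
that MySQL DATETIME values are stored in UTC.  It also leans on
strptime()'s two-digit-year rule, which does not match the X.509
rule for UTCTime in every case (see the comment in the code).

```python
import calendar
import time


class Timestamp(object):
    """Hypothetical wrapper; keeps seconds from epoch (UTC) internally."""

    UTCTIME_FORMAT = "%y%m%d%H%M%SZ"          # ASN.1 UTCTime
    GENERALIZEDTIME_FORMAT = "%Y%m%d%H%M%SZ"  # ASN.1 GeneralizedTime
    MYSQL_FORMAT = "%Y-%m-%d %H:%M:%S"        # MySQL DATETIME, assumed UTC

    def __init__(self, seconds):
        self.seconds = int(seconds)

    @classmethod
    def _parse(cls, text, fmt):
        # calendar.timegm() reads the parsed fields as UTC, which is
        # what we want; time.mktime() would apply the local timezone.
        return cls(calendar.timegm(time.strptime(text, fmt)))

    def _format(self, fmt):
        return time.strftime(fmt, time.gmtime(self.seconds))

    @classmethod
    def from_utctime(cls, text):
        # Caveat: strptime() maps two-digit years 69-99 to 19xx and
        # 00-68 to 20xx, while RFC 5280 puts the pivot at 50.  Years
        # 50-68 would need special handling in real code.
        return cls._parse(text, cls.UTCTIME_FORMAT)

    @classmethod
    def from_generalizedtime(cls, text):
        return cls._parse(text, cls.GENERALIZEDTIME_FORMAT)

    @classmethod
    def from_mysql(cls, text):
        return cls._parse(text, cls.MYSQL_FORMAT)

    def to_utctime(self):
        return self._format(self.UTCTIME_FORMAT)

    def to_generalizedtime(self):
        return self._format(self.GENERALIZEDTIME_FORMAT)

    def to_mysql(self):
        return self._format(self.MYSQL_FORMAT)
```

With something like this, code that moves a date from a certificate
into SQL would call Timestamp.from_utctime(...).to_mysql() and never
touch the string formats directly.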

Once this lot is done we'll be close to something that shows at least
the basics of normal operation, albeit in a form that's not yet usable
in production.
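
On the accessor item in the list above: the "subclass dict() and
just customize __setitem__()" idea can be sketched as follows.  The
class name, the sql_dirty attribute, and the sql_mark_clean() method
are all hypothetical, and the sketch assumes that whatever values an
object is constructed with already match the database.

```python
class DirtyTrackingDict(dict):
    """Hypothetical base class: item assignment flags the object SQL-dirty."""

    def __init__(self, *args, **kwargs):
        dict.__init__(self, *args, **kwargs)
        # Assumption: initial contents came from SQL, so start clean.
        self.sql_dirty = False

    def __setitem__(self, key, value):
        # Only flag a change when the stored value would actually differ.
        if key not in self or self[key] != value:
            self.sql_dirty = True
        dict.__setitem__(self, key, value)

    def sql_mark_clean(self):
        # Hypothetical hook, to be called after a successful SQL write.
        self.sql_dirty = False

    # Caveat: dict.update(), setdefault(), pop() and del do not go
    # through __setitem__(), so they bypass the dirty flag here.  A
    # real version would have to override those as well.
```

The caveat in the last comment is the main argument against the
"even easier" approach: customizing __setitem__() alone leaves
several ways to change the object without marking it dirty.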

Follow-up after that will be getting rid of remaining synchronous code
(making the daemon fully event-driven, except perhaps for SQL queries),
addressing rollback, commit, and other data integrity issues, and seeing
how well the resulting code handles hosting (multiple self objects in
the same daemon).

Somewhere along the way I'll need to update to the new model of trust
anchors we ended up with in Amsterdam, first step for which will
involve writing it down (well, RobK was supposed to do that, but I was
supposed to convert some pencil sketches into graphviz for him so
we're both lame on this so far).  I don't think this results in major
changes, probably a few extra cert fields in the self object which we
then need to toss into the rpki.x509.X509_chain objects before
verifying CMS or TLS, and perhaps the existing TA fields in various
objects become pairs of certs instead of a single TA, but this is
mostly just generalization and reuse of existing code, no bold new
adventures.

At some point we need to do performance testing.  All of our crypto is
done internally except for CMS, for which we still call the OpenSSL
CLI tool.  The simplest solution would probably be to extend POW to
support CMS; cryptlib might be a more entertaining solution, but I'm not
sure it's worth the time sink.
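
To make the cost of the CLI approach concrete for performance
testing, here is a rough sketch of what signing one message through
the OpenSSL CLI involves.  This is not the code in the tree: the
function names are hypothetical, and the flags are just one plausible
"openssl cms" invocation, chosen as an assumption for illustration.

```python
import subprocess


def cms_sign_command(cert_path, key_path):
    """Build an argv list that signs stdin and writes DER to stdout."""
    return ["openssl", "cms", "-sign",
            "-signer", cert_path,
            "-inkey", key_path,
            "-outform", "DER",
            "-nodetach",   # embed the content in the CMS object
            "-binary"]     # leave line endings in the content alone


def cms_sign(data, cert_path, key_path):
    """Run the command above on data (bytes); return the signed bytes."""
    proc = subprocess.Popen(cms_sign_command(cert_path, key_path),
                            stdin=subprocess.PIPE,
                            stdout=subprocess.PIPE)
    output, _ = proc.communicate(data)
    if proc.returncode != 0:
        raise RuntimeError("openssl cms exited with status %d"
                           % proc.returncode)
    return output
```

The point for performance purposes is that every CMS operation done
this way pays for a process spawn plus reading the key and cert from
disk, which an in-process binding in POW would avoid.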

Although it may not be worth the effort, it'd be nice to get back to
only one crypto library.  As far as I know the only thing we're using
M2Crypto or cryptlib for is TLS.  POW supports TLS, but tlslite
doesn't support POW.  I suspect that tlslite has some kind of driver
interface, since it already supports two crypto packages, so there may
be a simple answer here.  Not worth getting anywhere near this until
after making the jump to Twisted, as that might also affect this mess.
If the Python implementation ends up becoming the production version
of the engine rather than just a prototype, it might make sense to
revisit this, as the current mess of crypto libraries smells like an
accident waiting to happen.