$Id$

INTRODUCTION

The design of rpkid and friends assumes that certain tasks can be
thrown over the wall to the registry's back end operation.  This was a
deliberate design decision to allow rpkid et al to remain independent
of existing database schema, business PKIs, and so forth that a
registry might already have.  All very nice, but it leaves someone who
just wants to test the tools or who has no existing back end with a
fairly large programming project.  The tools in this directory attempt
to fill that gap.

This is a basic implementation of what a registry back end would need
to use rpkid and friends.  These tools do not use every available
option, nor are they necessarily as efficient as possible.  Large
registries will almost certainly want to roll their own tools, perhaps
using these as a starting point.  Nevertheless, we hope that these
tools will at least provide a useful example.

The primary tool here is a single command line Python program:
myrpki.py.  myrpki has a number of commands, most of which are used
for initial setup, some of which are used on an ongoing basis.  myrpki
can be run either in an interactive mode or by passing a single
command on the command line when starting the program; the former mode
is intended to be somewhat human-friendly, the latter mode is useful
in scripting, cron jobs, and automated testing.

myrpki use has two distinct phases: setup and data maintenance.  The
setup phase is primarily about constructing the "business PKI" (BPKI)
certificates that the daemons use to authenticate CMS and HTTPS
messages and obtaining the service URLs needed to configure the
daemons.  The data maintenance phase is about configuring local data
into the daemons.

myrpki uses the OpenSSL command line tool for almost all operations on
keys and certificates; the one exception to this is the command which
talks directly to the daemons, as this command uses the same
communication libraries as the daemons themselves do.  The intent
behind using the OpenSSL command line tool for everything else is to
allow all the other commands to be run without requiring all the
auxiliary packages upon which the daemons depend; this can be useful,
e.g., if one wants to run the back end on a laptop while running the
daemons on a server, in which case one might prefer not to have to
install a bunch of unnecessary packages on the laptop.

During setup phase myrpki generates and processes small XML messages
which it expects the user to ship to and from its parents, children,
etc via some out-of-band means (email, perhaps with PGP signatures,
USB stick, we really don't care).  During data maintenance phase,
myrpki does something similar with another XML file, to allow hosting
of RPKI services; in the degenerate case where an entity is just
self-hosting (i.e., is running the daemons for itself, and only for
itself), this latter XML file need not be sent anywhere.

The basic idea here is that a user who has resources maintains a set
of .csv files containing a text representation of the data needed by
the back-end, along with a configuration file containing other
parameters.  The intent is that these be very simple files that are
easy to generate either by hand or as a dump from a relational database,
spreadsheet, awk script, whatever works in your environment.  Given
these files, the user then runs myrpki to extract the relevant
information and encode everything about its back end state into an XML
file, which can then be shipped to the appropriate other party.

Many of the myrpki commands which process XML input write out a new
XML file, either in place or as an entirely new file; in general,
these files need to be sent back to the party that sent the original
file.  Think of all this as a very slow packet-based communication
channel, where each XML file is a single packet.  In setup phase,
there's generally a single round-trip per setup conversation; in the
data maintenance phase, the same XML file keeps bouncing back and
forth between hosted entity and hosting entity.

Note that, as certificates and CRLs have expiration and nextUpdate
values, a low-level cycle of updates passing between resource holder
and rpkid operator will be necessary as a part of steady state
operation.  [The current version of these tools does not yet
regenerate these expiring objects, but fixing this will be a
relatively minor matter.]

The third important kind of file in this system is the configuration
file for myrpki.  This contains a number of sections, some of which
are for myrpki, others of which are for the OpenSSL command line tool,
still others of which are for the various RPKI daemon programs.  The
examples/ subdirectory contains a commented version of the
configuration file that explains the various parameters.

The .csv files read by myrpki can be anything that the Python "csv"
library understands.  By default, they're in tab-delimited format
(because the author finds this easier to read than the comma-delimited
format), but this can be changed to fit local needs.

Please note: tab-delimited CSV is a format defined by a certain
popular spreadsheet program, and is *not* the same as
whitespace-separated text.  Tab characters are *punctuation*, and each
tab character indicates the division between two columns.  Two tab
characters in a row indicate a separator, a blank cell, and another
separator, not a single separator.  The upshot of all this is that
attempting to make your columns line up prettily will not work as you
expect: you will end up with too many cells, some of them empty.
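To see this in action, here is a short sketch using Python's "csv"
module with its standard "excel-tab" dialect (whether myrpki selects
exactly this dialect is an assumption, and the prefix and ASN are
made-up values).  A line where someone typed two tabs to make the
columns line up parses as three cells, not two:

```python
import csv
import io

# Two tab characters between the prefix and the ASN: the second tab is
# NOT padding, it delimits an extra, empty cell.
line = "10.0.0.0/24\t\t64496\n"

# "excel-tab" is the standard-library dialect for tab-delimited data.
rows = list(csv.reader(io.StringIO(line), dialect="excel-tab"))
print(rows)  # [['10.0.0.0/24', '', '64496']]
```

A program that expects the ASN in the second column will find an
empty string there instead.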

Keep reading, and don't panic.

The default configuration file name is myrpki.conf.  You can change
this using the "-c" option when invoking myrpki, or by setting the
environment variable MYRPKI_CONF.
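The lookup can be sketched as follows.  This is a hypothetical helper,
not myrpki's own code, and it assumes that an explicit -c option wins
over MYRPKI_CONF, which in turn wins over the built-in default:

```python
import argparse
import os

DEFAULT_CONF = "myrpki.conf"

def resolve_config_path(argv, environ=os.environ):
    """Hypothetical helper: choose the config file as the text describes.

    Assumed precedence: -c option, then MYRPKI_CONF, then myrpki.conf.
    """
    parser = argparse.ArgumentParser(add_help=False)
    parser.add_argument("-c", dest="config")
    # parse_known_args leaves the myrpki command and its arguments alone.
    opts, _rest = parser.parse_known_args(argv)
    if opts.config:
        return opts.config
    return environ.get("MYRPKI_CONF", DEFAULT_CONF)

print(resolve_config_path(["initialize"], {}))                          # myrpki.conf
print(resolve_config_path(["initialize"], {"MYRPKI_CONF": "a.conf"}))   # a.conf
print(resolve_config_path(["-c", "b.conf", "initialize"],
                          {"MYRPKI_CONF": "a.conf"}))                    # b.conf
```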

See examples/myrpki.conf for details on the variables that you can
(and in some cases must) set.

See examples/*.csv for commented examples of the several CSV files.
Note that the comments themselves are not legal CSV, they're just
present to make it easier to understand the examples.


GETTING STARTED -- OVERVIEW

Which process you need to follow depends on whether you are running
rpkid yourself or will be hosted by somebody else.  We call the first
case "self-hosted", because the software treats running rpkid to
handle resources that you yourself hold as if you are an rpkid
operator who is hosting an entity that happens to be yourself.

"$top" in the following refers to wherever you put the
subvert-rpki.hactrn.net code.  Once we have autoconf and "make
install" targets, this will be some system directory or another; for
now, it's wherever you checked out a copy of the code from the
subversion repository or unpacked a tarball of the code.

Most of the setup process looks the same for any resource holder,
regardless of whether they are self-hosting or not.  The differences
come in the data maintenance phase.

The steps needed during setup phase are:

0) Write a configuration file (copy $top/myrpki/examples/myrpki.conf
   and edit as needed).  You need to configure the [myrpki] section;
   in theory, the rest of the file should be ok as it is, at least for
   simple use.  You also need to create (either by hand or by dumping
   from a database, spreadsheet, whatever) the CSV files describing
   prefixes and ASNs you want to allocate to your children and ROAs
   you want created.

1) Initialization ("initialize" command).  This creates the local BPKI
   and other data structures that can be constructed just based on
   local data such as the config file.  Other than some internal data
   structures, the main output of this step is the "identity.xml" file,
   which is used as input to later stages.

   In theory it should be safe to run the "initialize" command more
   than once; in practice, this has not (yet) been tested.

2) Send (email, USB stick, carrier pigeon) identity.xml to each of your
   parents.  This tells each of your parents what you call yourself,
   and supplies each parent with a trust anchor for your
   resource-holding BPKI.

3) Each of your parents runs the "configure_child" command, giving the
   identity.xml you supplied as input.  This registers your data with
   the parent, including BPKI cross-registration, and generates a
   return message containing your parent's BPKI trust anchors, a
   service URL for contacting your parent via the "up-down" protocol,
   and (usually) either an offer of publication service (if your parent
   operates a repository) or a referral from your parent to whatever
   publication service your parent does use.  Referrals include a
   CMS-signed authorization token that the repository operator can use
   to determine that your parent has given you permission to home
   underneath your parent in the publication tree.

4) Each of your parents sends (...) back the response XML file
   generated by the "configure_child" command.

5) You feed the response message you just got into myrpki using the
   "configure_parent" command.  This registers the parent's information
   in your database, including BPKI cross-certification, and processes
   the repository offer or referral to generate a publication request
   message.

6) You send (...) the publication request message to the repository.
   The <contact_info/> element in the request message should (in
   theory) provide some clue as to where you should send this.

7) The repository operator processes your request using myrpki's
   "configure_publication_client" command.  This registers your
   information, including BPKI cross-certification, and generates a
   response message containing the repository's BPKI trust anchor and
   service URL.

8) Repository operator sends (...) the publication confirmation message
   back to you.

9) You process the publication confirmation message using myrpki's
   "configure_repository" command.
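The division of labor in steps (1)-(9) can be summarized as a small
table of who runs which myrpki command.  The command names come from
the steps above; the role labels and the commands_for() helper are
purely illustrative.  Step (0) and the "send" steps (2), (4), (6), and
(8) are left out because they involve no myrpki command, and steps
(2)-(5) repeat once per parent.

```python
# (step, who runs it, myrpki command, main input)
SETUP_STEPS = [
    (1, "you",        "initialize",                   "config file"),
    (3, "parent",     "configure_child",              "your identity.xml"),
    (5, "you",        "configure_parent",             "parent's response"),
    (7, "repository", "configure_publication_client", "your publication request"),
    (9, "you",        "configure_repository",         "publication confirmation"),
]

def commands_for(role):
    """Return, in order, the myrpki commands a given party has to run."""
    return [cmd for _step, who, cmd, _input in SETUP_STEPS if who == role]

for role in ("you", "parent", "repository"):
    print(f"{role}: {', '.join(commands_for(role))}")
```

This prints one line per party, e.g. "you: initialize,
configure_parent, configure_repository".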

At this point you should, in theory, have established relationships,
exchanged trust anchors, and obtained service URLs from all of your
parents and repositories.  The last setup step is establishing a
relationship with your RPKI service host, if you're not self-hosted,
but as this is really just the first message of an ongoing exchange
with your host, it's handled by the data maintenance commands.

The two commands used in the data maintenance phase are
"configure_resources" and "configure_daemons".  The first is used by
the resource holder, the second is used by the host.  In the
self-hosted case, it is not necessary to run "configure_resources" at
all; myrpki will run it for you automatically.

GETTING STARTED -- CONFIGURATION FILE

The current sample configuration file should, in theory, be much
simpler to use than in earlier versions of this code.  The sample
configuration uses a simple macro-expansion mechanism to place all of
the configuration data you need to touch into the [myrpki] section;
the rest of the configuration file is for the various daemons and
other tools, and is entirely configured via references to the values
defined in the [myrpki] section.
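The idea is similar to what Python's configparser offers with
ExtendedInterpolation, where ${section:option} in one section expands
to a value defined in another.  The sketch below uses that mechanism
purely as an illustration: myrpki's own expansion syntax may differ,
and the option names (handle, bpki_servers_directory, bpki-ta,
irdb-handle) are hypothetical, so see examples/myrpki.conf for the real
ones.

```python
import configparser

SAMPLE = """
[myrpki]
handle = Alice
bpki_servers_directory = ./bpki/servers

[rpkid]
# Daemon settings refer back to [myrpki] instead of repeating values.
bpki-ta = ${myrpki:bpki_servers_directory}/ca.cer
irdb-handle = ${myrpki:handle}
"""

cfg = configparser.ConfigParser(interpolation=configparser.ExtendedInterpolation())
cfg.read_string(SAMPLE)
print(cfg.get("rpkid", "bpki-ta"))      # ./bpki/servers/ca.cer
print(cfg.get("rpkid", "irdb-handle"))  # Alice
```

Changing handle or the directory once in [myrpki] changes every
setting that refers to it, which is why the rest of the file should
rarely need editing.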

GETTING STARTED -- HOSTED CASE

The basic steps involved in getting started for a resource holder who
is being hosted by somebody else are:

a) Run through steps (0)-(9), above.

b) Run the configure_resources command to generate myrpki.xml.

c) Send myrpki.xml to the rpkid operator who will be hosting you.

d) Wait for your rpkid operator to ship you back an updated XML file
   containing a PKCS #10 certificate request for the BPKI signing
   context (BSC) created by rpkid.

e) Run configure_resources again with the XML file received in step
   (d), to issue the BSC certificate and update the XML file again to
   contain the newly issued BSC certificate.

f) Send the updated XML file back to your rpkid operator.
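The round trip in steps (b)-(f) can be modeled as two functions
passing one document back and forth.  This toy simulation uses a
Python dict in place of myrpki.xml and plain strings in place of the
PKCS #10 request and the certificate; the field names are invented, and
nothing here performs real cryptography or reflects myrpki's actual XML
format.

```python
def holder_configure_resources(doc):
    """Resource holder's side: issue a BSC certificate for any pending request."""
    doc = dict(doc)
    if doc.get("bsc_pkcs10") and not doc.get("bsc_cert"):
        doc["bsc_cert"] = "certificate for " + doc["bsc_pkcs10"]
    return doc

def host_configure_daemons(doc):
    """Host's side: ask rpkid for a BSC request, or load the returned cert."""
    doc = dict(doc)
    if not doc.get("bsc_pkcs10"):
        doc["bsc_pkcs10"] = "request from " + doc["handle"]
    elif doc.get("bsc_cert"):
        doc["bsc_loaded"] = True  # stands in for loading the cert into rpkid
    return doc

doc = holder_configure_resources({"handle": "Alice"})  # (b) generate myrpki.xml
doc = host_configure_daemons(doc)                      # (c)-(d) host adds request
doc = holder_configure_resources(doc)                  # (e) holder issues BSC cert
doc = host_configure_daemons(doc)                      # (f) host loads the cert
print(doc["bsc_cert"])  # certificate for request from Alice
```

The same two-pass pattern is why a self-hosted installation has to run
configure_daemons twice.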

At this point you're done with initial setup.  You will need to run
configure_resources again whenever you make any changes to your
configuration file or CSV files.  [Once myrpki knows how to update
BPKI CRLs, you will also need to run configure_resources periodically
to keep your BPKI CRLs up to date.]  Any time you run myrpki's
configure_resources command, you should send the updated XML file to
your rpkid operator, who will [generally?] send you a further updated
XML file in response.

GETTING STARTED -- SELF-HOSTED CASE

The first few steps involved in getting started for a self-hosted
resource holder (that is, a resource holder that runs its own copy of
rpkid) are the same as in the hosted case above; after that the
process diverges.

The [current] steps are:

a) See rpkid/doc/Installation, and follow the basic installation
   instructions there to build the RFC-3779-aware OpenSSL code and
   associated Python extension module.

b) Run through steps (0)-(9), above.

c) Next, you need to set up the MySQL databases that rpkid et al will
   use.  The MySQL database, username, and password values all need to
   match the ones you specified in myrpki.conf.  There are two
   different ways you can do this:

   i) You can use the setup-sql.py script, which prompts you for your
      MySQL root password then attempts to do everything else
      automatically using values from myrpki.conf; or

   ii) You can do it manually.

   The first approach is simple:

   $ python setup-sql.py
   Please enter your MySQL root password:

   The script should tell you what databases it creates.  You can use
   the -v option if you want to see more details about what it's doing.

   If you'd prefer to do the SQL setup manually, perhaps because you
   have valuable data in other MySQL databases and you don't want to
   trust some random setup script with your MySQL root password,
   you'll need to use the MySQL command line tool, as follows:

   $ mysql -u root -p

   mysql> CREATE DATABASE irdb_database;
   mysql> GRANT all ON irdb_database.* TO irdb_user@localhost IDENTIFIED BY 'irdb_password';
   mysql> USE irdb_database;
   mysql> SOURCE $top/rpkid/irdbd.sql;
   mysql> CREATE DATABASE rpki_database;
   mysql> GRANT all ON rpki_database.* TO rpki_user@localhost IDENTIFIED BY 'rpki_password';
   mysql> USE rpki_database;
   mysql> SOURCE $top/rpkid/rpkid.sql;
   mysql> COMMIT;
   mysql> quit

   where "irdb_database", "irdb_user", "irdb_password",
   "rpki_database", "rpki_user", and "rpki_password" are the
   appropriate values from your configuration file.

   If you are running pubd and doing manual SQL setup, you'll also
   have to do:

   $ mysql -u root -p
   mysql> CREATE DATABASE pubd_database;
   mysql> GRANT all ON pubd_database.* TO pubd_user@localhost IDENTIFIED BY 'pubd_password';
   mysql> USE pubd_database;
   mysql> SOURCE $top/rpkid/pubd.sql;
   mysql> COMMIT;
   mysql> quit

d) If you are running your own publication repository (that is, if you
   are running pubd), you will also need to set up an rsyncd server or
   configure your existing one to serve pubd's output.  There's a
   sample configuration file in $top/myrpki/examples/rsyncd.conf, but
   you may need to do something more complicated if you are already
   running rsyncd for other purposes.  See the rsync(1) and
   rsyncd.conf(5) manual pages for more details.

e) Start the daemons.  You can use $top/myrpki/start-servers.py to do
   this, or write your own script.

   If you intend to run pubd, you should make sure that the directory
   you specified as publication_base_directory exists and
   is writable by the userid that will be running pubd, and should
   also make sure to start rsyncd.

f) Run myrpki's configure_daemons command, twice, with no arguments.

   You need to run the command twice because myrpki has to ask rpkid
   to create a keypair and generate a certification request for the
   BSC.  The first pass does this, the second processes the
   certification request, issues the BSC, and loads the result into
   rpkid.  [Yes, we could automate this somehow, if necessary.]
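Steps (c) and (e) can be partly scripted.  The sketch below is a
hypothetical helper, not setup-sql.py or start-servers.py:
mysql_setup_sql() produces the same per-database statements as the
manual SQL setup in step (c), using the database names, users, and
passwords from your myrpki.conf and the directory where you put the
code ($top), while check_publication_dir() checks that
publication_base_directory exists and is writable.  Run the directory
check as the user that will run pubd, because os.access() answers for
the current user.

```python
import os

def mysql_setup_sql(database, user, password, top, schema):
    """Return the manual-setup SQL for one daemon's database.

    schema is the base name of the .sql file under $top/rpkid:
    "irdbd", "rpkid", or "pubd".  Database and user names are assumed
    to be plain identifiers that need no quoting.
    """
    # Escape backslashes and single quotes so the password stays one
    # literal (MySQL's default SQL mode treats backslash as an escape).
    quoted = password.replace("\\", "\\\\").replace("'", "\\'")
    return "\n".join([
        f"CREATE DATABASE {database};",
        f"GRANT all ON {database}.* TO {user}@localhost IDENTIFIED BY '{quoted}';",
        f"USE {database};",
        f"SOURCE {top}/rpkid/{schema}.sql;",
    ])

def check_publication_dir(path):
    """Return a list of problems with pubd's publication_base_directory."""
    if not os.path.isdir(path):
        return [f"{path} does not exist or is not a directory"]
    if not os.access(path, os.W_OK | os.X_OK):
        return [f"{path} is not writable by the current user"]
    return []

# Example values; substitute the ones from your own myrpki.conf.
print(mysql_setup_sql("irdb_database", "irdb_user", "irdb_password",
                      "/path/to/top", "irdbd"))
```

Printing the statements for irdbd, rpkid, and (if needed) pubd,
followed by a final "COMMIT;", gives a file you can review before
feeding it to "mysql -u root -p".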

At this point, if everything went well, rpkid should be up,
configured, and starting to obtain resource certificates from its
parents, generate CRLs and manifests, and so forth.  Next, you should
figure out how to use the relying party tool, rcynic: see
$top/rcynic/README if you haven't already done so.

If and when you change your CSV files, you should run
configure_daemons again to feed the changes into the daemons.

GETTING STARTED -- HOSTING CASE

If you are running rpkid not just for your own resources but also to
host other resource holders (see "HOSTED CASE" above), your setup will
be almost the same as in the self-hosted case (see "SELF-HOSTED CASE",
above), with one procedural change: you will need to tell
configure_daemons to process the XML files produced by the resource
holders you are hosting.  You do this by specifying the names of all
those XML files as arguments to the configure_daemons command.  So,
if you are hosting two friends, Alice and Bob, then, everywhere the
instructions for the self-hosted case say to run configure_daemons
with no arguments, you will instead run it with the names of Alice's
and Bob's XML files as arguments.

Note that configure_daemons sometimes modifies these XML files, in
which case it will write them back to the same filenames.  While it is
possible to figure out the set of circumstances in which this will
happen (at present, only when myrpki has to ask rpkid to create a new
BSC keypair and PKCS #10 certificate request), it may be easiest just
to ship back an updated copy of the XML file every time you run
configure_daemons.
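If you would rather ship back only the files that actually changed,
you can fingerprint them before and after the run.  This is a
hypothetical wrapper: it assumes myrpki is invoked as "python myrpki.py
configure_daemons file1.xml file2.xml ..." from the current directory,
and the file names alice.xml and bob.xml are made up.

```python
import hashlib
import subprocess
import sys

def fingerprint(paths):
    """Map each file to a SHA-256 digest of its contents."""
    digests = {}
    for path in paths:
        with open(path, "rb") as f:
            digests[path] = hashlib.sha256(f.read()).hexdigest()
    return digests

def changed_files(before, after):
    """Files whose contents differ between two fingerprints."""
    return [path for path in before if before[path] != after.get(path)]

def run_configure_daemons(xml_files, myrpki="myrpki.py"):
    """Run configure_daemons on the hosted entities' files; report rewrites."""
    before = fingerprint(xml_files)
    subprocess.run([sys.executable, myrpki, "configure_daemons", *xml_files],
                   check=True)
    return changed_files(before, fingerprint(xml_files))

# Example (assumed file names):
#   for path in run_configure_daemons(["alice.xml", "bob.xml"]):
#       print("send back:", path)
```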

GETTING STARTED -- "PURE" HOSTING CASE

In general we assume that anybody who bothers to run rpkid is also a
resource holder, but the software does not insist on this.

[Er, well, rpkid doesn't, but myrpki now does -- "pure" hosting was an
unused feature that fell by the wayside while simplifying the user
interface.  It would be relatively straightforward to add it back if
we ever need it for anything, but the mechanism it used to use no
longer exists -- the old [myirbe] section of the config file has been
collapsed into the [myrpki] section, so testing for existence of the
[myrpki] section no longer works.  So we'll need an explicit
configuration option, no big deal, just not worth chasing now.]

A (perhaps) plausible use for this capability would be if you are an
rpkid-running resource holder who wants for some reason to keep the
resource-holding side of your operation completely separate from the
rpkid-running side of your operation.  This is essentially the
pure-hosting model, just with an internal hosted entity within a
different part of your own organization.

UPGRADING FROM OLD MYRPKI TOOLS

There's a script that attempts to upgrade from the previous version of
the myrpki tools (myirbe scripts, parents.csv file, etcetera).  The
conversion script is not well tested, so taking a backup (including an
SQL dump) FIRST is STRONGLY recommended.  The script attempts to read
all the necessary settings out of your old myrpki.conf file and the
obsolete {parents,children,pubclients}.csv files, and writes out a new
configuration file (myrpki.conf.new) and a set of "entitydb" files
(the local XML database used by the current myrpki program).  To use
the conversion script, just run

$ python convert-from-csv-to-entitydb.py

with no arguments in the directory where your old myrpki.conf and .csv
files reside.  See the script itself for available command line
options, most of which override various filenames.

Note that the conversion script will not rename existing BPKI
directories to the new convention (./bpki/{resources,servers}/),
instead it will write out myrpki.conf.new using the old directory
names (./bpki.{myrpki,myirbe}/); if you want to switch to the new
convention, move the directories yourself and edit the .conf file to
match.  The script does not delete any of the old files, so you'll
want to clean up yourself after you're sure the conversion worked.
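If you do decide to switch to the new convention, the move can be
sketched as below.  This assumes the old and new names pair up in the
order listed above (bpki.myrpki becomes bpki/resources, bpki.myirbe
becomes bpki/servers), so check that against your own setup first.  The
helper is hypothetical, defaults to a dry run, only moves directories,
and leaves editing myrpki.conf.new to you.

```python
from pathlib import Path

# Assumed pairing of old and new BPKI directory names.
RENAMES = {
    "bpki.myrpki": "bpki/resources",
    "bpki.myirbe": "bpki/servers",
}

def rename_bpki_dirs(base=".", dry_run=True):
    """Plan (and unless dry_run, perform) the old-to-new BPKI moves."""
    moves = []
    # Check everything first so a conflict doesn't leave a half-done move.
    for old, new in RENAMES.items():
        src, dst = Path(base, old), Path(base, new)
        if src.is_dir():
            if dst.exists():
                raise FileExistsError(f"refusing to overwrite {dst}")
            moves.append((src, dst))
    if not dry_run:
        for src, dst in moves:
            dst.parent.mkdir(parents=True, exist_ok=True)
            src.rename(dst)
    return moves
```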

Be warned that the old file format contains less information than the
new XML files do, so in some cases the conversion script is just
making stuff up as best it can.  In theory, the cases where it has to
do this will not matter, but this has not been tested yet.

TROUBLESHOOTING

If you run into trouble setting up this package, the first thing to do
is categorize the kind of trouble you are having.  If you've gotten
far enough to be running the daemons, check their log files.  If
you're seeing Python exceptions, read the error messages.  If you're
getting TLS errors, check to make sure that you're using all the right
BPKI certificates and service contact URLs.

TLS configuration errors are, unfortunately, notoriously difficult to
debug, because connection failures due to misconfiguration happen
early, deep in the guts of the OpenSSL TLS code, where there isn't
enough application context available to provide useful error messages.

If you've completed the steps above, everything appears to have gone
OK, but nothing seems to be happening, the first thing to do is check
the logs to confirm that nothing is actively broken.  rpkid's log
should include messages telling you when it starts and finishes its
internal "cron" cycle.  It can take several cron cycles for resources
to work their way down from your parent into a full set of
certificates and ROAs, so have a little patience.  rpkid's log should
also include messages showing every time it contacts its parent(s) or
attempts to publish anything.

rcynic in fully verbose mode provides a fairly detailed explanation of
what it's doing and why objects that fail have failed.

You can use rsync (sic) to examine the contents of a publication
repository one directory at a time, without attempting validation, by
running rsync with just the URI of the directory on its command line:

  $ rsync rsync://rpki.example.org/where/ever/

[Maybe there should be something here explaining how to use
irbe_cli.py for debugging, but the syntax is fairly obscure as it's
just a command line interface to the left-right and publication
protocols -- almost certainly want a friendlier tool for
troubleshooting.]

KNOWN ISSUES

The lxml package provides a Python interface to the Gnome libxml2 and
libxslt C libraries.  This code has been quite stable for several
years, but initial testing with lxml compiled and linked against a
newer version of libxml2 ran into problems (specifically, gratuitous
RelaxNG schema validation failures).  libxml2 2.7.3 worked; libxml2
2.7.5 did not work on the test machine in question.  Reverting to
libxml2 2.7.3 fixed the problem.  Rewriting the two lines of Python
code that were triggering the lxml bug appears to have solved the
problem, so the code now works properly with libxml2 2.7.5, but if you
start seeing weird XML validation failures, it might be another
variation of this lxml bug.
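When you suspect this problem, the first thing to establish is which
libxml2 your lxml is actually using.  lxml reports both the version it
was compiled against and the version loaded at run time; the small
script below (not part of myrpki) simply prints them.

```python
def format_version(version):
    """Turn a version tuple such as (2, 7, 3) into '2.7.3'."""
    return ".".join(str(part) for part in version)

try:
    from lxml import etree
except ImportError:
    print("lxml is not installed")
else:
    print("libxml2 compiled:", format_version(etree.LIBXML_COMPILED_VERSION))
    print("libxml2 running: ", format_version(etree.LIBXML_VERSION))
```

A mismatch between the two, or a version other than one you know
works, is a good hint that you are looking at this issue.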

An earlier version of this code ran into problems with what appears to
be an implementation restriction in the GNU linker ("ld") on
64-bit hardware, resulting in obscure build failures.  The workaround
for this required use of shared libraries and is somewhat less
portable than the original code, but without it the code simply would
not build in 64-bit environments with the GNU tools.  The current
workaround appears to behave properly, but the workaround requires
that the pathname to the RFC-3779-aware OpenSSL shared libraries be
built into the _POW.so Python extension module.  At the moment, in the
absence of "make install" targets for the Python code and libraries,
this means the build directory; eventually, once we're using autoconf
and installation targets, this will be the installation directory.  If
necessary, you can override this by setting the LD_LIBRARY_PATH
environment variable; see the ld.so man page for details.  This is a
relatively minor variation on the usual build issues for shared
libraries, it's just annoying because shared libraries should not be
needed here and would not be if not for this GNU linker issue.
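Because the dynamic linker reads LD_LIBRARY_PATH when a program starts,
setting it inside an already-running Python process does not help that
process; it has to be in the environment of the process that loads
_POW.so.  A sketch of launching a child process with the variable set,
where LIBDIR is a placeholder for wherever your RFC-3779-aware OpenSSL
shared libraries actually live:

```python
import os
import subprocess

def env_with_library_path(libdir, base_env=None):
    """Copy of the environment with libdir prepended to LD_LIBRARY_PATH."""
    env = dict(os.environ if base_env is None else base_env)
    existing = env.get("LD_LIBRARY_PATH")
    env["LD_LIBRARY_PATH"] = f"{libdir}:{existing}" if existing else libdir
    return env

def run_with_library_path(argv, libdir):
    """Run argv with libdir searched first for shared libraries."""
    return subprocess.run(argv, env=env_with_library_path(libdir), check=True)

# Hypothetical path; substitute your OpenSSL build or install directory.
LIBDIR = "/path/to/openssl/lib"
```

For example, run_with_library_path([sys.executable, "-c", "import
_POW"], LIBDIR) (after "import sys") checks that the extension module
loads with the override in place.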