$Id$ -*- Text -*-

Python RPKI production tools.

Requires Python 2.5.

External Python packages required:

- lxml, which in turn requires the libxml2 C libraries.

  http://codespeak.net/lxml/

  FreeBSD: /usr/ports/devel/py-lxml

- MySQLdb, which in turn requires MySQL client and server.  I'm
  testing with MySQL 5.1.

  http://sourceforge.net/projects/mysql-python/

  FreeBSD: /usr/ports/databases/py-MySQLdb

- TLSLite, which pulls in other crypto packages.

  http://trevp.net/tlslite/

  FreeBSD: /usr/ports/security/py-tlslite

- Cryptlib, at the moment just to support TLSLite, but we may end up
  using it for other things later.

  http://www.cs.auckland.ac.nz/~pgut001/cryptlib/

  FreeBSD: /usr/ports/security/cryptlib

  ...but the FreeBSD port doesn't (yet?) install the Python bindings,
  sigh, so at the moment you have to do that by hand:

  # cd /usr/ports/security/cryptlib
  # make install
  # cd work/bindings
  # python setup.py install
  # cd ../..
  # make clean

- Eventually I expect that this will require an event-handling package
  like Twisted, but I'm not there yet.

- The testpoke tool (up-down protocol command line test client) and
  testbed tools also use PyYAML.

  http://pyyaml.org/

  FreeBSD: /usr/ports/devel/py-yaml

We also use a hacked copy of the Python OpenSSL Wrappers (POW)
package, but our copy has enough modifications that it's expanded in
the Subversion tree.  Depending on how this all works out, I may end
up splitting the POW.pkix module out of the POW package and using it
with Cryptlib, as the POW.pkix package is 98% about doing ASN.1 in
pure Python and only 2% about any kind of crypto.
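
A quick way to see which of the external packages above are installed
is to try importing each one.  This is only a sketch: the import names
in the table are my assumptions (in particular cryptlib_py for the
Cryptlib bindings) and may differ between versions, and it is written
for a current Python rather than 2.5.

```python
# Assumed import name for each package listed above; adjust as needed.
REQUIRED_MODULES = {
    "lxml": "lxml.etree",
    "MySQLdb": "MySQLdb",
    "TLSLite": "tlslite",
    "Cryptlib": "cryptlib_py",
    "PyYAML": "yaml",
}


def check_modules(modules):
    """Return a dict mapping each package label to True if it imports."""
    status = {}
    for label, name in modules.items():
        try:
            __import__(name)
            status[label] = True
        except ImportError:
            status[label] = False
    return status


if __name__ == "__main__":
    for label, ok in sorted(check_modules(REQUIRED_MODULES).items()):
        print("%-10s %s" % (label, "ok" if ok else "MISSING"))
```

This only reports whether each import succeeds; it says nothing about
versions or about whether the underlying C libraries are usable.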



$Revision$

TO DO:

- Scripted tests to grow and shrink and revoke and ....  See
  testbed.*.yaml, but more systematic testing needed.

  PRIORITY: Required

  TIME REQUIRED: open-ended

  STATUS: Ongoing

- Randy's "user validation tool" (fetch and validate certs and
  probably the ROA for a prefix I want to accept in a route filter I
  am building in Python/Perl).  This probably uses rcynic's output as
  one of its inputs.

  This is a basic tool for a sysadmin who wants to -use- all this crud
  we're working so hard to generate.  It's not required for the
  generation tools to work, but without it the entire toolset does
  nothing obviously useful, which will make it a very hard sell during
  the limited public test stage.

  PRIORITY: Required

  DEPENDS ON: ROA generation

  TIME REQUIRED: three days

  STATUS: Not started

- Common protocol dump format with APNIC and other implementors so we
  can read each other's dumps.  "Obvious" format would be an
  OpenSSL-style PEM of the CMS, with a "text" portion (the place where
  "openssl x509 -text" would put a text dump of a cert) showing the
  wrapped XML.

  PRIORITY: Desirable

  TIME REQUIRED: one day

  STATUS: Not started

- Clean unused cruft out of left-right protocol, or at least have
  control booleans we don't intend to implement at present signal an
  error if used.

  Bottleneck here has been deciding what to punt and what to
  implement.  Removing unused booleans or raising errors when they're
  used is trivial.

  PRIORITY: Required

  TIME REQUIRED: Less than one day

  STATUS: Error signalling done

- resource_set_notafter attribute added to RelaxNG but not yet to
  rpki.up_down.class_elt.  Need to convert to and from
  rpki.sundial.datetime.  This is an up-down protocol feature that was
  added fairly late and that none of us properly implement yet, but
  failing to handle it would be a spec violation and eventually cause
  an interop problem.

  PRIORITY: Required

  TIME REQUIRED: Less than one day

  STATUS: Done

- Publication protocol and implementation thereof.   Protocol design
  started, Randy had comments that sent me back to the drawing board
  (he was right).  Next step is to integrate Randy's advice, which
  probably means picking up more of the left-right protocol framework.

  Desirable although not strictly required that protocol be agreed upon
  among the RIRs.  Might not be practical given how long it takes the
  group to decide anything.

  Tricky bit is making sure that repository receives enough
  information to know whether parent has authorized child to use
  parent's namespace in nesting case.   In theory this is
  straightforward but requires careful checking.

  ARIN can't host output of non-hosted RPKI engines without this, and
  that's critical to the security model as discussed with ARIN staff
  in late 2006, so I believe we need this capability even as part of
  the initial limited test.

  PRIORITY: Required

  TIME REQUIRED: 1-2 weeks for implementation once protocol settled,
  depending on how much of the protocol and implementation I can steal
  from the existing left-right protocol.

  STATUS: Started

- Subsetting (req_* attributes in up-down protocol)

  Minimal implementation would be to recognize this as correct
  protocol and signal an internal server error if it's ever used.

  More serious implementation would require expanding SQL child_cert
  table to hold subset masks and tweaking almost every bit of code
  that touches that table.

  PRIORITY: Required

  TIME REQUIRED (minimal version): One day

  TIME REQUIRED (real version): 1-2 weeks

  STATUS: Not started

- Error handling: make sure that exceptions map correctly to up-down
  error codes, flesh out left-right error codes.  Note that the same
  exception may produce different error codes depending on which
  up-down PDU we're processing (sigh).

  Will require code audit for coherency.

  PRIORITY: Required

  TIME REQUIRED: four days

  DEPENDS ON: almost everything else, as almost any code change can
  raise new exceptions that we'd need to handle.

  STATUS: Not started

- db.commit(), db.rollback(), code audit for data integrity issues,
  fix any data integrity issues that turn up.

  Among other issues, we need to handle loss of connection to the
  database server and other MySQL errors.  MySQLdb throws an
  exception, which we can catch, and retrying is easy enough, but we
  need to be careful about the recovery action, which depends on
  whether we had uncommitted changes.

  PRIORITY: Required

  TIME REQUIRED (commit and rollback): Two weeks

  TIME REQUIRED (data integrity audit): Three days

  TIME REQUIRED (fix data integrity): Unknown, depends on code audit
  and results of runtime testing.

  DEPENDS ON: async tasking model, sort of -- could do it first, but
  tasking change will affect the exception handling that triggers
  rollback.

  STATUS: Not started

- Test with larger data set -- Tim gave me plenty of data, I have the
  low-level tools and the glue logic to create child objects for all
  the entities in the IRDB, but I don't yet have logic to poll on
  behalf of each of them and check result for sanity.

  Maybe it'd be easier to write something that dumps Tim's database in
  YAML format for testbed.py to chew on?

  PRIORITY: Highly desirable

  TIME REQUIRED (setup): One day to convert Tim's data to YAML

  TIME REQUIRED (testing): Unknown, depends on what we turn up

  STATUS: Not started

- Clean up rootd.py to be usable in a production system.   Most urgent
  issue is handling of private keys.   May not need much else, as this
  is not a high-traffic server.

  PRIORITY: Highly desirable (not strictly needed for limited testing)

  TIME REQUIRED: Two days

  STATUS: Not started

- Test framework, multiple self-instances per engine-instance (single
  self-instance per engine-instance is already done).

  PRIORITY: Required

  DEPENDS ON: async tasking model.

  TIME REQUIRED: One week

  STATUS: Not started

- tlslite code seems flakey under heavy use, and doesn't support all
  the cert checks we want.  Best bet for getting this right is
  probably to hack on the POW Ssl class until it supports everything
  shown in the OpenSSL book; aside from speed, the main advantage here
  is that there -is- a list of all the things one needs to do to use
  TLS properly if one follows this recipe, whereas with TLSlite it's
  all a mystery.

  Useful side effect of doing this via POW: it brings us back to only
  needing one crypto library (in particular it lets us punt M2Crypto,
  which appears to be coded as an accident waiting to happen).

  PRIORITY: Required (cert checking is a security issue).

  TIME REQUIRED: Two weeks.

  DEPENDS ON: Async tasking model.

  STATUS: Not started

- ROA generation.  We have a bunch of the primitives for this but we
  aren't yet generating the ROAs themselves.

  PRIORITY: Required

  TIME REQUIRED: Three days

  STATUS: Not started

- Make rpkid fully event-driven (async tasking model), except for SQL
  queries.  This probably involves the "twisted" framework.

  PRIORITY: Required (to implement hosting model)

  TIME REQUIRED: one week.

  STATUS: Not started

- Update biz trust anchor model to what we came up with in Amsterdam.
  This was a direct result of security review by Kent and Housley.

  This has been waiting for work we hope RobK is doing.  This is
  probably not a lot of coding, probably a few extra cert fields in
  the self object which we then need to toss into the
  rpki.x509.X509_chain objects before verifying CMS or TLS, and
  perhaps the existing TA fields in various objects become pairs of
  certs instead of a single TA, but this is mostly just generalization
  and reuse of existing code, no bold new adventures.

  PRIORITY: Required (security issue)

  TIME REQUIRED: One week.

  STATUS: Not started

- Performance testing

  STATUS: Not started

- rcynic handling of RPKI trust anchors probably needs updating.
  Discussions over last N months of how RPKI trust anchors work, how
  we package them, and how we roll them over.  The last (TA rollover)
  is the driver for this.

  Last I recall (need to check email archives) APNIC had proposed a
  relatively simple format (CMS signed PEM-encoded X.509 object set,
  or something like that).  Need to do analysis to make sure this is
  adequate for our needs; if so, just use it.  This would involve minor
  changes to rcynic.

  Alternatively, this could be a separate program to keep this grot
  out of rcynic itself, but that's probably a usability nightmare.

  PRIORITY: Required (usability issue for relying parties)

  TIME REQUIRED: Three days.

  STATUS: Not started

- rcynic does not yet handle manifests.  This is both a real problem
  (manifests were added to plug a security hole) and a user acceptance
  problem (without manifest support rcynic checks old certs that are
  supposed to fail because they've been revoked, resulting in what
  appear to be spurious errors, which just annoy the user).

  PRIORITY: Required

  TIME REQUIRED: One week.

  STATUS: Not started

- Update operation and installation docs.

  Known current omissions: left-right "rekey" and "revoke" operations,
  testbed.py's rootd_sia config option.

  TIME REQUIRED (current work items): Less than one day

  PRIORITY: Required

  STATUS: Ongoing

- Update internals docs (Doxygen).   Mostly this means updating
  function comments in the Python code, as the rest is automatic.  May
  require a bit of overview text to explain the workings of the code,
  this overview text may well turn out to be just the current flat
  text documents marked up for inclusion by Doxygen.

  PRIORITY: Desirable

  TIME REQUIRED: Two days

  STATUS: Ongoing

- Reorganize code (directory names, module names, which objects are in
  which modules, add gctx pointers to objects so we can stop passing
  all these flipping explicit gctx pointers in almost every function
  call) to make it easier to understand and maintain.  Portions of the
  existing code were done in extreme haste to meet testing deadlines,
  and it shows.

  STATUS: Not started

  TIME REQUIRED: two days

  PRIORITY: Highly desirable (to preserve programmers' and
  maintainers' sanity, if nothing else)

- Add HSM support.  Architecture includes it, current code does not.
  First step here would be talking to somebody who understands PKCS#11
  better than I do, i.e., Richard Lamb or Francis Dupont.

  STATUS: Not started

  TIME REQUIRED: Unknown

  PRIORITY: Desirable.  Am guessing ARIN does not require this for
  initial test
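
For the db.commit() / db.rollback() item above, one possible shape for
the retry logic is to treat each unit of work as a function that is
re-run from scratch on a fresh connection, so that uncommitted changes
from a failed attempt are simply discarded.  This is a sketch, not the
planned implementation: run_transaction and its parameters are made-up
names, it assumes only the generic DB-API connection methods, and it
is written for a current Python rather than 2.5.  With MySQLdb the
retryable class would presumably be MySQLdb.OperationalError.

```python
def run_transaction(connect, work, retryable, retries=3):
    """Run work(db) as a single transaction, retrying on retryable errors.

    connect   -- zero-argument callable returning a DB-API connection
    work      -- callable taking that connection; it must not commit itself
    retryable -- exception class (or tuple of classes) worth retrying
    """
    if retries < 1:
        raise ValueError("retries must be at least 1")
    last_error = None
    for attempt in range(retries):
        db = None
        try:
            db = connect()
            result = work(db)
            db.commit()               # changes become durable only here
            return result
        except retryable as exc:
            last_error = exc
            if db is not None:
                try:
                    db.rollback()     # discard whatever was uncommitted
                except Exception:
                    pass              # connection may already be gone
        finally:
            if db is not None:
                try:
                    db.close()
                except Exception:
                    pass
    raise last_error
```

The awkward case this does not solve is a failure during commit()
itself, where we can't tell whether the changes landed; work() would
have to be safe to repeat for a blind retry to be correct there.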



Things implemented but not yet tested:

- Client side of expiration now assumes that parent will reissue
  when its IRDB changes.

- Parent side of revocation (child_cert objects) and CRL generation
  implemented.

- Parent side of expiration implemented.

- Child batch processing loop: regeneration or removal of expired
  certs based on what's in the IRDB.

- Batch regeneration of CRLs and manifests for all CAs.

- Protection against up-down operations specifying a class_name that
  belongs to some other self context.

- Rewrote code that handles revoke on shrink to revoke -all- old certs
  for that key, not just most recent.  Not certain, but this may have
  been the cause of a cert dropping not showing up in the CRL during
  testing with APNIC in Vancouver.

- Kludgy local publication hack seems to work now, including
  withdrawal.  rcynic still whines occasionally, but I think that's
  just because, without manifest support, rcynic has no way of telling
  the difference between certs we withdrew on purpose and certs that
  were removed by an attacker, so the first rcynic run after a cert
  has been revoked pulls the old cert from the previous rcynic pass,
  finds that it's listed in the CRL, and whines about it.



Other random notes:

Being able to specify interaction with other servers (not running
under testbed) in a testbed.yaml might be useful for interop tests.
Kind of breaks testbed's fundamental model, though.  Replacing what
testbed thinks is a leaf with somebody else would be easy, so maybe we
could specify some way to hang a bunch of rpkids under an external
parent?  Hmm, data needed would look a lot like testpoke.yaml, maybe
we can reuse some of that language?

There's a three-way tradeoff lurking in the publication protocol,
manifest generation, and CRL generation:

1) Consistency issues for relying parties (eg, don't want to withdraw
   something that's still listed in the manifest);

2) Efficiency issues for the RPKI engine (eg, generating a new
   manifest for each individual change during a batch run could be
   expensive, would prefer to batch up the changes into a single
   manifest run); and

3) Coherency issues for the RPKI engine (don't want to defer things
   that could result in loss of state if something bad happens).

Considerations (1) and (3) have to dominate, which may mean we take a
hit on (2).
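
To make the tradeoff concrete, here is a sketch of one possible
ordering for a batch run: publish new objects first, then issue a
single manifest describing the final state, then withdraw.  That way
nothing is withdrawn while the current manifest still lists it (1),
and only one manifest is generated per batch (2).  BatchRun and the
operation tuples are made up for illustration and are not part of the
existing code.

```python
class BatchRun(object):
    """Collect publish/withdraw requests, then emit them in a safe order."""

    def __init__(self, published):
        self.published = set(published)   # names currently in the repository
        self.to_publish = set()
        self.to_withdraw = set()

    def publish(self, name):
        self.to_publish.add(name)
        self.to_withdraw.discard(name)

    def withdraw(self, name):
        self.to_withdraw.add(name)
        self.to_publish.discard(name)

    def operations(self):
        """Return the ordered list of repository operations for this batch."""
        final = (self.published | self.to_publish) - self.to_withdraw
        ops = [("publish", name) for name in sorted(self.to_publish)]
        # One manifest for the whole batch, listing only the final contents.
        ops.append(("manifest", tuple(sorted(final))))
        # Withdraw only after the new manifest has stopped listing the object.
        ops.extend(("withdraw", name)
                   for name in sorted(self.to_withdraw & self.published))
        return ops
```

Note that this does nothing for (3): the pending sets live only in
memory, so a crash in mid-batch loses them, which is exactly the kind
of deferral that consideration warns about.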

Most of the explicit calls to sql_fetch*() are now encapsulated in
one-line methods.  The remaining ones are probably hints at minor bits
of abstraction still to be done.

Biz certs currently used by test scripts don't include SKI or AKI.  I
think this is because the test scripts use "openssl x509" rather than
"openssl ca" when generating these certs.  Not critical, and will
probably become completely irrelevant with all-singing all-dancing
post-Amsterdam biz cert scripts, but should not be a big problem to
fix either if it gets in the way again.
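
If it does get in the way, "openssl x509 -req" can be told to add the
extensions by pointing it at an extension file.  The sketch below just
builds such a command line; the section name and file names are made
up for illustration, while the flags and the two extension settings
are standard OpenSSL ones.  I haven't checked this against the actual
test scripts.

```python
# Extension section to write into a small config file (name is arbitrary).
EXTENSIONS = """\
[ biz_cert_ext ]
subjectKeyIdentifier   = hash
authorityKeyIdentifier = keyid
"""


def x509_sign_command(csr, ca_cert, ca_key, out, extfile,
                      section="biz_cert_ext"):
    """Build an "openssl x509 -req" command that applies an extension section."""
    return ["openssl", "x509", "-req",
            "-in", csr,
            "-CA", ca_cert, "-CAkey", ca_key, "-CAcreateserial",
            "-extfile", extfile, "-extensions", section,
            "-out", out]
```

To use it, write EXTENSIONS to the file named by extfile and hand the
returned list to subprocess.call().  The AKI is copied from the
issuer's SKI, so the issuing cert needs an SKI of its own first.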