$Id$ -*- Text -*-
Python RPKI production tools.
Requires Python 2.5.
External Python packages required:
- lxml, which in turn requires the libxml2 C libraries.
http://codespeak.net/lxml/
FreeBSD: /usr/ports/devel/py-lxml
- MySQLdb, which in turn requires MySQL client and server. I'm
testing with MySQL 5.1.
http://sourceforge.net/projects/mysql-python/
FreeBSD: /usr/ports/databases/py-MySQLdb
- TLSLite, which pulls in other crypto packages.
http://trevp.net/tlslite/
FreeBSD: /usr/ports/security/py-tlslite
- Cryptlib, at the moment just to support TLSlite but may end up using
it for other things later.
http://www.cs.auckland.ac.nz/~pgut001/cryptlib/
FreeBSD: /usr/ports/security/cryptlib
...but the FreeBSD port doesn't (yet?) install the Python bindings,
sigh, so at the moment you have to do that by hand:
# cd /usr/ports/security/cryptlib
# make install
# cd work/bindings
# python setup.py install
# cd ../..
# make clean
- Eventually I expect that this will require an event-handling package
like Twisted, but I'm not there yet.
- The testpoke tool (up-down protocol command line test client) and
testbed tools also use PyYAML.
http://pyyaml.org/
FreeBSD: /usr/ports/devel/py-yaml
We also use a hacked copy of the Python OpenSSL Wrappers (POW)
package, but our copy has enough modifications that it's expanded in
the Subversion tree. Depending on how this all works out, I may end
up splitting the POW.pkix module out of the POW package and using it
with Cryptlib, as the POW.pkix package is 98% about doing ASN.1 in
pure Python and only 2% about any kind of crypto.

TO DO:

- Scripted tests to grow and shrink and revoke and .... See
testbed.*.yaml, but more systematic testing needed.
PRIORITY: Required
TIME REQUIRED: as needed, open-ended
STATUS: Ongoing
- Randy's "user validation tool" (fetch and validate certs and
probably the ROA for a prefix I want to accept in a route filter I
am building in Python/Perl). This probably uses rcynic's output as
one of its inputs.
This is a basic tool for a sysadmin who wants to -use- all this crud
we're working so hard to generate. It's not required for the
generation tools to work, but without it the entire toolset does
nothing obviously useful, which will make it a very hard sell during
the limited public test stage.
PRIORITY: Required
DEPENDS ON: ROA generation
TIME REQUIRED: three days
STATUS: Not started
- Common protocol dump format with APNIC and other implementors so we
can read each other's dumps. "Obvious" format would be an
OpenSSL-style PEM of the CMS, with a "text" portion (the place where
"openssl x509 -text" would put a text dump of a cert) showing the
wrapped XML.
PRIORITY: Desirable
TIME REQUIRED: one day
STATUS: Not started
- Clean unused cruft out of left-right protocol, or at least have
control booleans we don't intend to implement at present signal an
error if used.
Bottleneck here has been deciding what to punt and what to
implement. Removing unused booleans or raising errors when they're
used is trivial.
PRIORITY: Required
TIME REQUIRED: Less than one day
STATUS: Started
- resource_set_notafter attribute added to RelaxNG but not yet to
rpki.up_down.class_elt. Need to convert to and from
rpki.sundial.datetime. This is an up-down protocol feature that was
added fairly late and that none of us properly implement yet, but
failing to handle it would be a spec violation and eventually cause
an interop problem.
PRIORITY: Required
TIME REQUIRED: Less than one day
STATUS: Not started
- Publication protocol and implementation thereof. Protocol design
started, Randy had comments that sent me back to the drawing board
(he was right). Next step is to integrate Randy's advice, which
probably means picking up more of the left-right protocol framework.
Desirable, although not strictly required, that the protocol be agreed
upon among the RIRs. Might not be practical given how long it takes
the group to decide anything.
Tricky bit is making sure that repository receives enough
information to know whether parent has authorized child to use
parent's namespace in nesting case. In theory this is
straightforward but requires careful checking.
ARIN can't host output of non-hosted RPKI engines without this, and
that's critical to the security model as discussed with ARIN
staff in late 2006, so I believe we need this capability even as
part of the initial limited test.
PRIORITY: Required
TIME REQUIRED: 1-2 weeks for implementation once protocol settled,
depending on how much of the protocol and implementation I can steal
from the existing left-right protocol.
STATUS: Started
- Subsetting (req_* attributes in up-down protocol)
Minimal implementation would be to recognize this as correct
protocol and signal an internal server error if it's ever used.
More serious implementation would require expanding SQL child_cert
table to hold subset masks and tweaking almost every bit of code
that touches that table.
PRIORITY: Required
TIME REQUIRED (minimal version): One day
TIME REQUIRED (real version): 1-2 weeks
STATUS: Not started
- Error handling: make sure that exceptions map correctly to up-down
error codes, flesh out left-right error codes. Note that the same
exception may produce different error codes depending on which
up-down PDU we're processing (sigh).
Will require code audit for coherency.
TIME REQUIRED: four days
DEPENDS ON: almost everything else, as almost any code change can
raise new exceptions that we'd need to handle.
STATUS: Not started
- db.commit(), db.rollback(), and related data integrity issues.
TIME REQUIRED: two weeks for commit and rollback. Data integrity
fuzzier.
DEPENDS ON: async tasking model, sort of -- could do it first, but
tasking change will affect the exception handling that triggers
rollback.
STATUS: Not started
- Test with larger data set -- Tim gave me plenty of data, I have the
low-level tools and the glue logic to create child objects for all
the entities in the IRDB, but I don't yet have logic to poll on
behalf of each of them and check result for sanity. Maybe it'd be
easier to write something that dumps Tim's database in YAML format
for testbed.py to chew on?
STATUS: Not started
- Clean up rootd.py to be usable in a production system. Most urgent
issue is handling of private keys. May not need much else, as this
is not a high-traffic server.
STATUS: Not started
- Handle loss of connection to database server and other MySQL
errors. MySQLdb throws an exception, which we can catch, and
retrying is easy enough, but need to be a bit careful about recovery
action depending on whether we had uncommitted changes.
STATUS: Not started
- Test framework, multiple self-instances per engine-instance.
DEPENDS ON: async tasking model.
STATUS: Not started
- tlslite code seems flakey under heavy use, and doesn't support all
the cert checks we want. Best bet for getting this right is
probably to hack on the POW Ssl class until it supports everything
shown in the OpenSSL book; aside from speed, the main advantage here
is that there -is- a list of all the things one needs to do to use
TLS properly if one follows this recipe, whereas with TLSlite it's
all a mystery.
Useful side effect of doing this via POW: it brings us back to only
needing one crypto library (in particular it lets us punt M2Crypto,
which appears to be coded as an accident waiting to happen).
TIME REQUIRED: one week.
DEPENDS ON: async tasking model.
STATUS: Not started
- ROA generation. We have a bunch of the primitives for this but we
aren't yet generating the ROAs themselves.
STATUS: Not started
- Make rpkid fully event-driven (async tasking model), except for SQL
queries. This probably involves the "twisted" framework.
TIME REQUIRED: one week.
STATUS: Not started
- Update biz trust anchor model to what we came up with in Amsterdam.
This has been waiting for work we hope RobK is doing. This is
probably not a lot of coding, probably a few extra cert fields in
the self object which we then need to toss into the
rpki.x509.X509_chain objects before verifying CMS or TLS, and
perhaps the existing TA fields in various objects become pairs of
certs instead of a single TA, but this is mostly just generalization
and reuse of existing code, no bold new adventures.
TIME REQUIRED: one week.
STATUS: Not started
- Performance testing
STATUS: Not started
- rcynic handling of RPKI trust anchors probably needs updating.
Discussions over last N months of how RPKI trust anchors work, how
we package them, and how we roll them over. Last I recall (need to
check email archives) APNIC had proposed a relatively simple format
(CMS signed PEM-encoded X.509 object set, or something like that).
Need to do analysis to make sure this is adequate for our needs, if
so just use it. This would involve minor changes to rcynic.
Alternatively, this could be a separate program to keep this grot
out of rcynic itself, but that's probably a usability nightmare.
TIME REQUIRED: three days.
STATUS: Not started
- rcynic does not yet handle manifests. This is both a real problem
(manifests were added for a reason) and a user acceptance problem
(without manifest support rcynic checks old certs that are supposed
to fail because they've been revoked, resulting in what appear to be
spurious errors, which just annoy the user).
TIME REQUIRED: one week.
STATUS: Not started
- Update operation and installation docs.
Known current omissions: left-right "rekey" and "revoke" operations,
testbed.py's rootd_sia config option.
STATUS: Ongoing
- Update internals docs (Doxygen).
STATUS: Ongoing
- Reorganize code (directory names, module names, which objects are in
which modules, add gctx pointers to objects so we can stop passing
all these flipping explicit gctx pointers in almost every function
call) to make it easier to understand and maintain. Portions of the
existing code were done in extreme haste to meet testing deadlines,
and it shows.
STATUS: Not started
TIME REQUIRED: two days
PRIORITY: Highly desirable (to preserve programmers' and
maintainers' sanity, if nothing else)
- Add HSM support. Architecture includes it, current code does not.
STATUS: Not started
PRIORITY: Desirable. Am guessing ARIN does not require this for
initial test
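
As a rough illustration of the item above on handling loss of the
database connection: a minimal sketch of a retry wrapper that retries
only when no uncommitted changes were lost. ConnectionLost, Session and
run_with_retry are hypothetical names made up for this note; real code
would catch whatever exception MySQLdb actually raises and would track
the "dirty" state wherever writes happen.

```python
class ConnectionLost(Exception):
    """Hypothetical stand-in for the MySQLdb exception on a dropped connection."""


class Session(object):
    """Hypothetical wrapper that remembers whether we have uncommitted changes."""

    def __init__(self, connect):
        self.connect = connect   # callable returning a fresh connection
        self.db = connect()
        self.dirty = False       # set True on first write, False after commit

    def reconnect(self):
        self.db = self.connect()
        self.dirty = False


def run_with_retry(session, operation, attempts=3):
    """Run operation(session), reconnecting and retrying on ConnectionLost.

    If the connection dropped while changes were uncommitted, those changes
    are gone, so re-raise and let the caller restart the whole transaction.
    """
    for attempt in range(1, attempts + 1):
        try:
            return operation(session)
        except ConnectionLost:
            if session.dirty or attempt == attempts:
                raise
            session.reconnect()
```

The point of the dirty flag is that blind retry is only safe when the
failed operation had not yet changed anything.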

Things implemented but not yet tested.

- Client side of expiration now assumes that parent will reissue
when its IRDB changes.
- Parent side of revocation (child_cert objects) and CRL generation
implemented.
- Parent side of expiration implemented.
- Child batch processing loop: regeneration or removal of expired
certs based on what's in the IRDB.
- Batch regeneration of CRLs and manifests for all CAs.
- Protection against up-down operations specifying a class_name that
belongs to some other self context.
- Rewrote code that handles revoke on shrink to revoke -all- old certs
for that key, not just most recent. Not certain, but this may have
been the cause of a cert dropping not showing up in the CRL during
testing with APNIC in Vancouver.
- Kludgy local publication hack seems to work now, including
withdrawal. rcynic still whines occasionally, but I think that's
just because, without manifest support, rcynic has no way of telling
the difference between certs we withdrew on purpose and certs that
were removed by an attacker, so the first rcynic run after a cert
has been revoked pulls the old cert from the previous rcynic pass,
finds that it's listed in the CRL, and whines about it.
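
To illustrate why the manifest matters in the last item: a minimal
sketch of how a relying party could tell deliberate withdrawals from
suspicious disappearances once manifests exist. This is not rcynic's
logic (rcynic is a separate program and does no such thing yet);
classify_missing and its arguments are hypothetical.

```python
def classify_missing(previous_pass, current_fetch, manifest):
    """Split objects that vanished since the last run into two groups.

    previous_pass: object names seen on the previous validation run
    current_fetch: object names fetched on this run
    manifest:      object names listed in the CA's current manifest

    Returns (withdrawn, suspicious) as sorted lists of names.
    """
    missing = set(previous_pass) - set(current_fetch)
    listed = set(manifest)
    withdrawn = sorted(missing - listed)    # gone and no longer listed
    suspicious = sorted(missing & listed)   # still listed, but gone
    return withdrawn, suspicious
```

Objects in the first group can be dropped quietly; only the second
group deserves a complaint.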

Other random notes:

Being able to specify interaction with other servers (not running
under testbed) in a testbed.yaml might be useful for interop tests.
Kind of breaks testbed's fundamental model, though. Replacing what
testbed thinks is a leaf with somebody else would be easy, so maybe we
could specify some way to hang a bunch of rpkids under an external
parent? Hmm, data needed would look a lot like testpoke.yaml, maybe
we can reuse some of that language?
There's a three-way tradeoff lurking in the publication protocol,
manifest generation, and CRL generation:
1) Consistency issues for relying parties (eg, don't want to withdraw
something that's still listed in the manifest);
2) Efficiency issues for the RPKI engine (eg, generating a new
manifest for each individual change during a batch run could be
expensive, would prefer to batch up the changes into a single
manifest run); and
3) Coherency issues for the RPKI engine (don't want to defer things
that could result in loss of state if something bad happens).
Considerations (1) and (3) have to dominate, which may mean we take a
hit on (2).
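
One way to read the tradeoff above, as a sketch only: queue changes
during a batch run, then at flush time publish new objects, issue a
single new manifest, and only then carry out withdrawals, so nothing is
withdrawn while the current manifest still lists it. Repository and
PublicationBatch are hypothetical names, and the repository is a toy
in-memory one. Note this buys (2) at the expense of (3): the queue
lives only in memory, so a crash before flush() loses the pending
changes.

```python
class Repository(object):
    """Toy in-memory repository: published objects plus the current manifest."""

    def __init__(self):
        self.objects = {}
        self.manifest = []
        self.manifests_issued = 0

    def publish(self, name, data):
        self.objects[name] = data

    def withdraw(self, name):
        # Consideration (1): never withdraw something the manifest still lists.
        assert name not in self.manifest, "%s still listed in manifest" % name
        self.objects.pop(name, None)

    def issue_manifest(self, names):
        self.manifest = sorted(names)
        self.manifests_issued += 1


class PublicationBatch(object):
    """Collect publish/withdraw requests and apply them with one manifest."""

    def __init__(self, repo):
        self.repo = repo
        self.to_publish = {}
        self.to_withdraw = set()

    def publish(self, name, data):
        self.to_withdraw.discard(name)
        self.to_publish[name] = data

    def withdraw(self, name):
        self.to_publish.pop(name, None)
        self.to_withdraw.add(name)

    def flush(self):
        # 1. New objects first, so the new manifest never lists a missing file.
        for name, data in sorted(self.to_publish.items()):
            self.repo.publish(name, data)
        # 2. One manifest for the whole batch, already omitting the withdrawals.
        self.repo.issue_manifest(set(self.repo.objects) - self.to_withdraw)
        # 3. Withdrawals last, once the manifest no longer lists them.
        for name in sorted(self.to_withdraw):
            self.repo.withdraw(name)
        self.to_publish.clear()
        self.to_withdraw.clear()
```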
Most of the explicit calls to sql_fetch*() are now encapsulated in
one-line methods. The remaining ones are probably hints at minor bits
of abstraction still to be done.
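
For illustration only, the pattern described above might look roughly
like this. The sketch uses the standard-library sqlite3 module as a
stand-in for MySQLdb so that it runs on its own; the sql_fetch_where
helper, the class names and the table layouts are assumptions made up
for this note, not the actual code.

```python
import sqlite3


class SqlPersistent(object):
    """Hypothetical base class: subclasses name their table and columns."""

    sql_table = None
    sql_columns = ()

    def __init__(self, *values):
        for column, value in zip(self.sql_columns, values):
            setattr(self, column, value)

    @classmethod
    def sql_fetch_where(cls, db, where, args=()):
        """Return one object per row of cls.sql_table matching the WHERE clause."""
        query = "SELECT %s FROM %s WHERE %s" % (
            ", ".join(cls.sql_columns), cls.sql_table, where)
        return [cls(*row) for row in db.execute(query, args)]


class ChildCert(SqlPersistent):
    sql_table = "child_cert"
    sql_columns = ("child_cert_id", "ca_detail_id", "ski")


class CADetail(SqlPersistent):
    sql_table = "ca_detail"
    sql_columns = ("ca_detail_id", "state")

    def child_certs(self, db):
        """One-line wrapper, so callers never spell out the query themselves."""
        # sqlite3 uses "?" placeholders; MySQLdb would use "%s" here.
        return ChildCert.sql_fetch_where(db, "ca_detail_id = ?", (self.ca_detail_id,))
```

Any explicit fetch call left outside a wrapper like child_certs() is
then easy to spot as a candidate for the same treatment.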
Biz certs currently used by test scripts don't include SKI or AKI. I
think this is because the test scripts use "openssl x509" rather than
"openssl ca" when generating these certs. Not critical, and will
probably become completely irrelevant with all-singing all-dancing
post-Amsterdam biz cert scripts, but should not be a big problem to
fix either if it gets in the way again.