$Id$ -*- Text -*-

Python RPKI production tools.

Requires Python 2.5.

External Python packages required:

- lxml, which in turn requires the libxml2 C libraries.

  http://codespeak.net/lxml/

  FreeBSD: /usr/ports/devel/py-lxml
  Fedora:  python-lxml.i386

- MySQLdb, which in turn requires MySQL client and server.  I'm
  testing with MySQL 5.1.

  http://sourceforge.net/projects/mysql-python/

  FreeBSD: /usr/ports/databases/py-MySQLdb
  Fedora:  MySQL-python.i386

- TLSLite, which pulls in other crypto packages.

  http://trevp.net/tlslite/

  FreeBSD: /usr/ports/security/py-tlslite

- Cryptlib is no longer required.

- Eventually I expect that this will require an event-handling package
  like Twisted, but I'm not there yet.

- The testpoke tool (up-down protocol command line test client) and
  testbed tools also use PyYAML.

  http://pyyaml.org/

  FreeBSD: /usr/ports/devel/py-yaml

We also use a hacked copy of the Python OpenSSL Wrappers (POW)
package, but our copy has enough modifications that it's expanded in
the Subversion tree.  Depending on how this all works out, I may end
up splitting the POW.pkix module out of the POW package and using it
with Cryptlib, as the POW.pkix package is 98% about doing ASN.1 in
pure Python and only 2% about any kind of crypto.



$Revision$

TO DO:

      - Update business trust anchor model to what was defined in Amsterdam. This
	was a direct result of security review by Kent and Housley.

	This is probably not a lot of coding, probably a few extra certificate
	fields that need to be passed in when verifying CMS or TLS.  So far the
	existing TA fields in various objects have become pairs of certificates
	instead of a TA, but they're not yet tied into a real single TA.   We
	may also need a cert or two in the <self/> object so that we can tie
	everything together into a single TA for the entire RPKI engine instance.

	PRIORITY: Required for pilot (security issue)

	TIME REQUIRED: Two weeks.

	STATUS: Started


      - rcynic handling of RPKI trust anchors needs updating, per discussions
	over previous months of how RPKI trust anchors work, how we package them,
	and how we roll them over. The last (TA rollover) is the driver for this.

	APNIC is now proposing a CMS-signed ASN.1 blob containing a version
	number and an RPKI certificate.  Kent and Housley have not bought into
	this yet.  Need to do analysis to make sure this is adequate for our
	needs; if so, just use it.  This would involve minor changes to rcynic.

	PRIORITY: Required for pilot (usability issue for relying parties)

	TIME REQUIRED: One week.

	STATUS: Not started


      - Publication protocol and implementation thereof. Desirable although not
	strictly required that protocol be agreed upon among the RIRs. Tricky bit
	is making sure that repository receives enough information to know whether
	parent has authorized child to use parent's namespace in nesting case; in
	theory this is straightforward but requires careful checking.

	ARIN can't host output of non-hosted RPKI engines without this, and that's
	critical to the security model as discussed with ARIN staff in late 2006,
	hence this is a required capability even for testing.

	PRIORITY: Required for pilot

	TIME REQUIRED: 3-4 weeks for implementation once protocol settled, depending on
	how much of the existing left-right protocol design and implementation can be
	reused.

	STATUS: Started


      - Resource subsetting (req_* attributes in up-down protocol), minimal
	implementation.  Recognize these attributes as valid protocol and signal
	an internal server error if they are ever used.

	PRIORITY: Required for pilot.

	TIME REQUIRED: Two days

	STATUS: Not started
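
A minimal sketch of the check described in the item above, assuming the
request arrives as a parsed XML element.  Only the req_ prefix comes from
the item itself; the exception class, the function name, and the sample
elements (including their attribute names) are hypothetical and exist only
for illustration.  It is written for a current Python standard library
rather than Python 2.5.

```python
import xml.etree.ElementTree as ET


class InternalServerError(Exception):
    """Hypothetical exception, assumed to map to an up-down internal server error."""


def reject_subsetting(request):
    """Accept req_* attributes as valid protocol, but refuse to act on them."""
    used = sorted(name for name in request.attrib if name.startswith("req_"))
    if used:
        raise InternalServerError(
            "resource subsetting not implemented: " + ", ".join(used))


# Illustrative elements only; real up-down PDUs carry more structure.
plain = ET.fromstring('<request class_name="1"/>')
subset = ET.fromstring('<request class_name="1" req_resource_set_as="64496"/>')

reject_subsetting(plain)  # no req_* attributes, so this passes silently
```

Calling reject_subsetting(subset) raises InternalServerError, which is the
"signal an internal server error" half of the minimal implementation.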


      - ROA generation code.  First cut at this seems to work and output looks
        right, but this hasn't been tested properly yet due to lack of a ROA
        validation tool.

	For reasons that presumably made sense at the time, the
	left-right protocol for route_origin objects allows ranges as
	well as prefixes, and the SQL stores everything as ranges,
	which is nice and general...except that ROAs can only hold
	prefixes.  So left-right schema should only allow prefixes,
	and SQL should only store prefixes.

	PRIORITY: Required for pilot

	TIME REQUIRED: One week (remaining)

	STATUS: Started
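
To illustrate the range-versus-prefix mismatch in the item above, here is
a rough sketch that decomposes an address range into the prefixes that
exactly cover it.  It assumes the ipaddress module from the current Python
standard library, which postdates the Python 2.5 this README targets; the
function names are hypothetical and are not part of rpkid.

```python
import ipaddress


def range_to_prefixes(first, last):
    """Return the list of prefixes that exactly cover first..last."""
    lo = ipaddress.ip_address(first)
    hi = ipaddress.ip_address(last)
    return list(ipaddress.summarize_address_range(lo, hi))


def is_single_prefix(first, last):
    """True if the range can be stored as one prefix without loss."""
    return len(range_to_prefixes(first, last)) == 1


# A range aligned on a prefix boundary collapses to one prefix...
aligned = range_to_prefixes("10.0.0.0", "10.0.0.255")
# ...but an arbitrary range needs several, which one stored range hides.
unaligned = range_to_prefixes("10.0.0.0", "10.0.2.255")
```

Here aligned holds the single prefix 10.0.0.0/24, while unaligned holds
10.0.0.0/23 and 10.0.2.0/24.  A check like is_single_prefix() is one way a
schema restricted to prefixes could validate existing range data.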


      - rcynic does not yet handle manifests. This is both a real problem
	(manifests were added to plug a security hole) and a user acceptance
	problem (without manifest support rcynic checks old certs that are supposed
	to fail because they've been revoked, resulting in what appear to be
	spurious errors, which just annoy the user).

	PRIORITY: Required for pilot

	TIME REQUIRED: Two weeks.

	STATUS: Not started


      - User validation tool: fetch and validate certs and ROA for a prefix that
	the user wants to accept in a router filter the user is building. This
	probably uses rcynic's output as one of its inputs.

	PRIORITY: Required

	DEPENDS ON: ROA generation

	TIME REQUIRED: 1-2 weeks

	STATUS: Not started


      - Make rpkid fully event-driven (async tasking model), except for SQL
	queries. This probably involves the "twisted" framework.

	PRIORITY: Required (to implement scalable hosting model)

	TIME REQUIRED: Two weeks.

	STATUS: Not started


      - Error handling: make sure that exceptions map correctly to up-down error
	codes, flesh out left-right error codes. Note that the same exception may
	produce different error codes depending on which up-down PDU we're
	processing (sigh).

	Will require code audit for coherency, which is most of the work.

	PRIORITY: Required

	TIME REQUIRED: Two weeks

	DEPENDS ON: almost everything else, as almost any code change can raise new
	exceptions that we'd need to handle.

	STATUS: Not started
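
One possible shape for the exception mapping in the item above is a table
keyed on both the PDU type and the exception class, so that the same
exception can yield different codes.  This is only a sketch: the exception
class and function name are hypothetical, and the numeric codes are assumed
to match the up-down specification as later published (1201 and 1301 for
"no such resource class" on issue and revoke, 2001 for internal server
error), so check them against the protocol draft actually in use.

```python
class NoSuchResourceClass(Exception):
    """Hypothetical internal exception raised by request handlers."""


# Assumed fallback code for anything the table does not cover.
INTERNAL_SERVER_ERROR = 2001

# (PDU type, exception class) -> up-down error code (assumed values).
ERROR_CODES = {
    ("issue", NoSuchResourceClass): 1201,
    ("revoke", NoSuchResourceClass): 1301,
}


def error_code(pdu_type, exc):
    """Pick the error code for exc, given the PDU being processed."""
    # Walk the class hierarchy so subclasses inherit their parent's mapping.
    for cls in type(exc).__mro__:
        code = ERROR_CODES.get((pdu_type, cls))
        if code is not None:
            return code
    return INTERNAL_SERVER_ERROR
```

Keeping the mapping in one table also gives the code audit a single place
to look when checking coverage.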


      - db.commit(), db.rollback(), code audit for data integrity issues, fix any
	data integrity issues that turn up. Among other issues, need to handle loss
	of connection to database server and other MySQL errors. Need to be careful
	about recovery action depending on whether we had uncommitted changes.

	PRIORITY: Required

	TIME REQUIRED (commit and rollback): 3-4 weeks

	TIME REQUIRED (data integrity audit): 1 week

	TIME REQUIRED (fix data integrity): Unknown, depends on code audit and results
	of runtime testing.

	DEPENDS ON: async tasking model rollback.

	STATUS: Not started
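
A minimal sketch of the commit/rollback discipline from the item above,
using the DB-API connection methods commit() and rollback().  To keep the
example self-contained it uses sqlite3 as a stand-in for MySQLdb (which
uses %s rather than ? as its parameter marker); the table name and the
transaction() helper are hypothetical.  It does not attempt the harder
part, recovering from a lost database connection.

```python
import sqlite3
from contextlib import contextmanager


@contextmanager
def transaction(db):
    """Commit if the block succeeds; roll back and re-raise if it fails."""
    try:
        yield db.cursor()
    except Exception:
        db.rollback()
        raise
    else:
        db.commit()


db = sqlite3.connect(":memory:")
db.execute("CREATE TABLE child (handle TEXT)")  # hypothetical table
db.commit()

with transaction(db) as cur:
    cur.execute("INSERT INTO child VALUES (?)", ("alice",))

try:
    with transaction(db) as cur:
        cur.execute("INSERT INTO child VALUES (?)", ("bob",))
        raise RuntimeError("simulated failure mid-transaction")
except RuntimeError:
    pass  # the uncommitted insert of "bob" was rolled back

rows = [row[0] for row in db.execute("SELECT handle FROM child")]
```

After this runs, rows contains only "alice": the failed block left no
partial state behind.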


      - Test framework for multiple self-instances per engine-instance (single
	self-instance per engine-instance is already done).

	PRIORITY: Required for testing

	DEPENDS ON: Async tasking model.

	TIME REQUIRED: One week

	STATUS: Not started


      - Current TLS code (tlslite) appeared to be flaky under heavy use back
	in November, and doesn't support all the required certificate
	checks out of the box.

	Certificate checker has now been replaced with something based on
	OpenSSL/POW, and the result seems to work.  If the TLS code itself is
	still unstable, best bet would be to replace it with a Tls class cloned
	from the existing POW Ssl class; the current Ssl class isn't adequate
	either, but there's documentation (eg, the O'Reilly OpenSSL book) that
	explains in some detail what this code would need to do.

	PRIORITY: Required for pilot (cert checking is a security issue).

	TIME REQUIRED: 3-4 weeks

	DEPENDS ON: Async tasking model.

	STATUS: Not started


      - Resource subsetting (req_* attributes in up-down protocol), full
        implementation.  Requires expanding SQL child_cert table to hold subset
        masks and rewriting a fair amount of code.

	PRIORITY: Required for full implementation.

	TIME REQUIRED: 3-4 weeks

	STATUS: Not started


      - Performance testing

	STATUS: Not started


      - Clean up rootd.py to be usable in a production system. Most urgent issue is
	handling of private keys. May not need much else, as this is not a
	high-traffic server.

	PRIORITY: Highly desirable (not strictly needed for pilot testing)

	TIME REQUIRED: One week

	STATUS: Not started


      - Update internals docs (Doxygen). Mostly this means updating function
	comments in the Python code, as the rest is automatic. May require a bit of
	overview text to explain the workings of the code; this overview text may
	well turn out to be just the current flat text documents marked up for
	inclusion by Doxygen.

	PRIORITY: Desirable

	TIME REQUIRED: One week.

	STATUS: Ongoing


      - Reorganize code (directory names, module names, which objects are in which
	modules, add gctx pointers to objects to avoid passing explicit gctx
	pointers in almost every function call) to make it easier to understand and
	maintain. Portions of the existing code were done in extreme haste to meet
	testing deadlines, and it shows.

	PRIORITY: Highly desirable

	TIME REQUIRED: One week.

	STATUS: Explicit gctx eradication done; much file renaming done; other
	stuff not started.


      - Add HSM support. Architecture includes it, current code does not. First
	step here would be talking to somebody with strong understanding of
	PKCS #11.

	PRIORITY: Desirable, not required for pilot

	TIME REQUIRED: Unknown

	STATUS: Not started


      - Installation packaging, so that rpkid can be built and installed like a
	normal package.

	PRIORITY: Desirable

	TIME REQUIRED: One week, longer if installation for many platforms is
	required

	STATUS: Not started


      - Tighten up syntax checking in left-right schema.

	PRIORITY: Desirable

	TIME REQUIRED: One day.

	STATUS: Not started


      - Rethink exposing SQL primary indices in protocols. Right now,
	auto-incremented SQL indices are used in many places in the left-right
	protocol, and are even exposed in a few places in our implementation of the
	up-down protocol. This is nice and unique but may be operationally fragile,
	since up-down usage means that URLs contain mechanically assigned
	identifiers rather than an identifier negotiated between the two parties
	during contract setup.

	The RIPE NCC suggested that we should instead use something like a hash of the
	client's name, which would be probabilistically unique, would not expose
	information, but would be stable even if we had to rebuild the database.

	PRIORITY: Rethinking desirable; reworking unknown

	TIME REQUIRED: One week to evaluate. Implementation time if we decide to make a
	change unknown, but probably on the order of another week.

	STATUS: Not started
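
As a rough illustration of the hash-based identifier suggested in the item
above: the sketch below derives a stable identifier from a client name.
The choice of SHA-256, the truncation to 16 hex digits, and the function
name are all assumptions made for the example, not part of the suggestion.

```python
import hashlib


def client_id(name, length=16):
    """Derive a stable, probabilistically unique identifier from a name."""
    digest = hashlib.sha256(name.encode("utf-8")).hexdigest()
    # Truncation trades collision resistance for shorter URLs.
    return digest[:length]
```

Because the identifier depends only on the name, rebuilding the database
yields the same value, unlike an auto-incremented SQL index.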


      - Common protocol dump format with APNIC and other implementors so we can
	exchange protocol dumps.

	PRIORITY: Desirable

	TIME REQUIRED: Two days

	STATUS: Not started


      - IETF SIDR WG is still talking about ROAs with multiple signatures. No
	obvious need for this but IETF may mandate it anyway. Full implementation
	would require significant work revising current SQL table relations and
	upgrading CMS support.

	PRIORITY: Minimal, IETF feeping creaturism

	TIME REQUIRED: Unknown

	STATUS: Not started


      - Deaddrop of incoming messages, for audit.  Absent a better theory,
        steal existing tech for this: preface with minimal RFC 2822 header
	and drop it into a Maildir folder using built-in Python Maildir
	library code, at which point it becomes somebody else's problem.

	STATUS: Not started

	PRIORITY: Desirable, trivial to implement.
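
A minimal sketch of the deaddrop idea in the item above, using the mailbox
and email modules from the current Python standard library (the API
differs slightly from Python 2.5).  The function name, the default sender
address, and the choice of headers are assumptions for illustration.  The
raw message is carried as a base64-encoded application/octet-stream body.

```python
import email.utils
import mailbox
import os
import tempfile
from email.mime.application import MIMEApplication


def deaddrop(maildir_path, payload, sender="rpkid@localhost"):
    """Wrap raw bytes in a minimal header and file them in a Maildir."""
    msg = MIMEApplication(payload)  # base64-encodes the payload
    msg["Date"] = email.utils.formatdate()
    msg["From"] = sender  # hypothetical default address
    msg["Message-ID"] = email.utils.make_msgid()
    box = mailbox.Maildir(maildir_path, create=True)
    try:
        return box.add(msg)  # returns the key of the stored message
    finally:
        box.close()


# Usage: drop one message into a fresh Maildir and read it back.
audit_dir = os.path.join(tempfile.mkdtemp(), "audit")
raw = b"\x30\x82\x01\x00"  # stand-in bytes, not a real CMS message
key = deaddrop(audit_dir, raw)
stored = mailbox.Maildir(audit_dir, create=False).get_message(key)
recovered = stored.get_payload(decode=True)
```

Any Maildir-aware tool can then be pointed at the directory for review.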



Other random notes:

Being able to specify interaction with other servers (not running
under testbed) in a testbed.yaml might be useful for interop tests.
Kind of breaks testbed's fundamental model, though.  Replacing what
testbed thinks is a leaf with somebody else would be easy, so maybe we
could specify some way to hang a bunch of rpkids under an external
parent?  Hmm, data needed would look a lot like testpoke.yaml, maybe
we can reuse some of that language?

There's a three-way tradeoff lurking in the publication protocol,
manifest generation, and CRL generation:

1) Consistency issues for relying parties (eg, don't want to withdraw
   something that's still listed in the manifest);

2) Efficiency issues for the RPKI engine (eg, generating a new
   manifest for each individual change during a batch run could be
   expensive, would prefer to batch up the changes into a single
   manifest run); and

3) Coherency issues for the RPKI engine (don't want to defer things
   that could result in loss of state if something bad happens).

Considerations (1) and (3) have to dominate, which may mean we take a
hit on (2).

Most of the explicit calls to sql_fetch*() are now encapsulated in
one-line methods.  The remaining ones are probably hints at minor bits
of abstraction still to be done.

Biz certs currently used by test scripts don't include SKI or AKI.  I
think this is because the test scripts use "openssl x509" rather than
"openssl ca" when generating these certs.  Not critical, and will
probably become completely irrelevant with all-singing all-dancing
post-Amsterdam biz cert scripts, but should not be a big problem to
fix either if it gets in the way again.