rpkid/README



$Id$ -*- Text -*-

Python RPKI production tools.

Requires Python 2.5.

See doc/Installation for installation instructions and required
packages.


$Revision$

TO DO:


      * Make rpkid fully event-driven (async tasking model), except for SQL
	queries. This may involve the "twisted" framework.

	TIME REQUIRED: Two weeks.

	STATUS: Not started

      * Error handling: make sure that exceptions map correctly to up-down error
	codes, flesh out left-right error codes. Note that the same exception may
	produce different error codes depending on which up-down PDU we're
	processing (sigh).

	Will require code audit for coherency, which is most of the work.

	TIME REQUIRED: Two weeks

	DEPENDS ON: almost everything else, as almost any code change can raise new
	exceptions that we'd need to handle.

	STATUS: Not started

      * db.commit(), db.rollback(), code audit for data integrity issues, fix any
	data integrity issues that turn up. Among other issues, need to handle loss
	of connection to database server and other MySQL errors. Need to be careful
	about recovery action depending on whether we had uncommitted changes.

	TIME REQUIRED (commit and rollback): 3-4 weeks

	TIME REQUIRED (data integrity audit): 1 week

	TIME REQUIRED (fix data integrity): Unknown, depends on code audit and results
	of runtime testing.

	DEPENDS ON: async tasking model rollback.

	STATUS: Not started

      * Test framework for multiple self-instances per engine-instance (single
	self-instance per engine-instance already done).

	DEPENDS ON: Async tasking model.

	TIME REQUIRED: One week

	STATUS: Not started

      * Current TLS code (tlslite) is flaky and slow.  Unless I can
        find a good Python TLS interface that somebody else is
        maintaining, best option would be to add TLS support to POW.

	TIME REQUIRED: 3-4 weeks

	DEPENDS ON: Async tasking model.

	STATUS: Not started

      * Resource subsetting (req_* attributes in up-down protocol), full
        implementation.  Requires expanding SQL child_cert table to hold subset
        masks and rewriting a fair amount of code.

	TIME REQUIRED: 3-4 weeks

	STATUS: Not started

      * Performance testing.  Some very preliminary tests show a
        hotspot in the TLS code, but further testing will be needed,
        particularly after the async tasking model change.

	STATUS: Barely started

      * Clean up rootd.py to be usable in a production system.  Most
	urgent issues are handling of private keys and publishing
	outputs in pubd. May not need much else, as this is not a
	high-traffic server.

	TIME REQUIRED: One week

	STATUS: Not started

      * Update internals docs (Doxygen). Mostly this means updating
	function comments in the Python code, as the rest is
	automatic.  May require a bit more overview text to explain
	the workings and usage of the code.

	TIME REQUIRED: One week.

	STATUS: Ongoing

      * Add HSM support. Architecture includes it, current code does
	not.  First step here would be talking to somebody with strong
	understanding of PKCS #11.

	TIME REQUIRED: Unknown

	STATUS: Not started

      * Installation packaging, so that rpkid can be built and installed like a
	normal package.

	TIME REQUIRED: One week, longer if installation for many platforms is
	required

	STATUS: Not started

      * Tighten up syntax checking in left-right schema.

	TIME REQUIRED: One day.

	STATUS: Not started

      * Rethink exposing SQL primary indices in protocols. Right now,
	auto-incremented SQL indices are used in many places in the
	left-right protocol, and are even exposed in a few places in
	our implementation of the up-down protocol.  This is nicely
	unique but may be operationally fragile, since up-down usage
	means that URLs contain mechanically assigned identifiers
	rather than an identifier negotiated between the two parties
	during contract setup.

	Review by RIPE NCC staff suggested that we should instead use
	something like a hash of the client's name, which would be
	probabilistically unique and would not expose information, but
	would be stable even if we had to rebuild the database.

	TIME REQUIRED: One week to evaluate. If we decide to make a change,
	implementation time is unknown but probably on the order of another week.

	STATUS: Not started

      * IETF SIDR WG is still talking about ROAs with multiple
	signatures. No obvious need for this but IETF may mandate it
	anyway.  Full implementation would require significant work
	revising current SQL table relations and upgrading CMS
	support, and would also require nontrivial rewrite of rcynic.

	TIME REQUIRED: Unknown

	STATUS: Not started

      * rcynic handling of RPKI trust anchors does not yet match the most
        recent agreement by the design team.  Currently waiting for an OID
        assignment for the CMS-wrapped indirection format that the
        design team settled on.

	TIME REQUIRED: Three days

	DEPENDS ON: OID assignment

	STATUS: Not started

      * Publication protocol ACL checking may need revisiting.  Tricky
        bit is making sure that repository receives enough information
        to know whether parent has authorized child to use parent's
        namespace in nesting case; in theory this is straightforward
        but requires careful checking.  Current implementation just
        uses a configured path check and does not attempt to trace
        back to permission from parent in nested publication case.
        Class and method design is intended to make it easy to drop in
        additional checks if needed.

	STATUS: Trivial version (required path check) done.

      * Deaddrop of incoming messages, for audit.  Absent a better
        theory, steal existing tech for this: preface with minimal RFC
        2822 header and drop it into a Maildir folder using built-in
        Python Maildir library code, at which point it becomes somebody
        else's problem.

	STATUS: Not started

      * Investigate using EKU (RFC 3280 4.2.1.13) as an alternative to
        wiring in BPKI EE certs for left-right protocol.

	STATUS: Not started

      * Rethink current ROA generation scheme: why are we pushing
        <route_origin/> objects into rpkid instead of letting rpkid
        pull data from irdbd as it does for resources?  Is there a
        better way to represent proto-ROA data in SQL so that we can
        query directly for the resources we need (NB: this might
        require rethinking other rpkid SQL tables)?

	STATUS: Not started

      * Really need scripts and better doc on BPKI setup.

	TIME REQUIRED: One week

	STATUS: Not started

      * Testing of this by anybody but the author and a few friends is
        going to require some kind of user interface.  A Python-based
        web UI is probably the most cost-effective approach; Django
        might be a good base for this.  Some of the operations
        suggested in an initial brainstorming session on this are
        outside the scope of what rpkid currently knows how to do (eg,
        signing S/MIME "please route" messages), so one of the tasks
        here is to see if trying to write a user interface sheds light
        on required features that are currently missing.

	STATUS: Not started

      * At present there is no mechanism by which an IRBE could
        request signing of objects other than ROAs.  Eg, there has
        been some discussion of signing S/MIME letters to humans
        asking for routing, as an alternative to ROAs.  If we decide
        to support this at all, it turns into a generalization of the
        ROA problem, and suggests that perhaps ROA generation should
        be handled somewhere outside of rpkid and only passed to rpkid
        for signing.  This would be a significant change to the
        architecture, as it would remove rpkid's responsibility for
        keeping ROAs up to date.

	STATUS: Not started
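
For the error-handling item above, a minimal sketch assuming a lookup table keyed by (exception class, PDU type), so that one exception can yield different up-down error codes depending on which PDU was being processed. The exception class names and numeric codes are hypothetical placeholders, not rpkid's real classes or the values defined by the up-down protocol.

```python
# Hypothetical exception classes; rpkid's real exception hierarchy may differ.
class UpDownError(Exception):
    """Base class for errors that get reported to an up-down child."""


class NoSuchResourceClass(UpDownError):
    pass


class NoSuchKey(UpDownError):
    pass


class BadCertificateRequest(UpDownError):
    pass


# Placeholder codes for illustration only; the real values come from the
# up-down protocol specification.
INTERNAL_ERROR = 9999

# (exception class, PDU type) -> error code.  The same exception may map to
# different codes depending on which PDU was being processed.
ERROR_CODES = {
    (NoSuchResourceClass, "issue"): 9001,
    (NoSuchResourceClass, "revoke"): 9002,
    (NoSuchKey, "revoke"): 9003,
    (BadCertificateRequest, "issue"): 9004,
}


def error_code_for(exc, pdu_type):
    """Return the up-down error code for exc raised while handling pdu_type."""
    # Walk the class hierarchy so a subclass inherits its parent's mapping
    # unless it has a more specific entry of its own.
    for cls in type(exc).__mro__:
        code = ERROR_CODES.get((cls, pdu_type))
        if code is not None:
            return code
    # Anything unmapped is reported as a generic internal error.
    return INTERNAL_ERROR
```

Because unmapped pairs fall through to the generic internal-error code, the coherency audit could start by logging every pair that reaches that fallback.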
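
For the db.commit()/db.rollback() item, one possible shape is a context manager that commits when a block of SQL finishes and rolls back when it raises, so a failure never leaves half-applied changes behind. This is a sketch only: transaction() is a hypothetical helper, and sqlite3 stands in for MySQL so the example runs without a server. Detecting a dropped MySQL connection and deciding whether it is safe to retry would still have to be layered on top.

```python
import sqlite3
from contextlib import contextmanager


@contextmanager
def transaction(conn):
    """Run the enclosed SQL as one unit: commit on success, roll back on error."""
    cursor = conn.cursor()
    try:
        yield cursor
    except Exception:
        # Throw away whatever the block changed, then re-raise so the caller
        # still sees the original failure.
        conn.rollback()
        raise
    else:
        conn.commit()
    finally:
        cursor.close()


# Example: the second transaction fails partway, so its insert disappears.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE child (handle TEXT)")
with transaction(conn) as cur:
    cur.execute("INSERT INTO child VALUES (?)", ("alice",))
try:
    with transaction(conn) as cur:
        cur.execute("INSERT INTO child VALUES (?)", ("bob",))
        raise RuntimeError("simulated failure mid-update")
except RuntimeError:
    pass
print(conn.execute("SELECT handle FROM child").fetchall())  # [('alice',)]
```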
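
The RIPE NCC suggestion in the item about SQL primary indices could look roughly like this, assuming SHA-256 and URL-safe base64 truncated to 16 characters (both arbitrary choices for illustration). stable_child_id() is a hypothetical name, not an existing rpkid function.

```python
import base64
import hashlib


def stable_child_id(child_name, length=16):
    """Derive a URL-safe identifier from a child's name.

    Unlike an auto-incremented SQL index, the result depends only on the name,
    so it survives a database rebuild and reveals nothing about how many other
    children exist.
    """
    digest = hashlib.sha256(child_name.encode("utf-8")).digest()
    # URL-safe base64 avoids "/" and "+", so the value can sit in a URL path.
    token = base64.urlsafe_b64encode(digest).decode("ascii").rstrip("=")
    return token[:length]
```

Sixteen base64 characters keep 96 bits of the hash, which makes accidental collisions improbable for any realistic number of children; a real implementation would still want to detect and reject a collision rather than assume it cannot happen.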
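
For the publication ACL item, here is a sketch of the trivial configured-path check plus a pluggable list of checks, so that a later check tracing permission back to the parent in the nested case could simply be appended. The client dictionary, its base_uri key, and the function names are hypothetical; pubd's actual classes may be organized differently.

```python
import posixpath
from urllib.parse import urlsplit


def path_check(client, uri):
    """Allow a request only if uri falls under the client's configured base URI."""
    base = client["base_uri"]
    if not base.endswith("/"):
        # Without the trailing slash, ".../alice" would also match ".../alicex".
        base += "/"
    parts = urlsplit(uri)
    # Collapse "." and ".." segments first so a path cannot climb out of base.
    path = posixpath.normpath(parts.path)
    normalized = "%s://%s%s" % (parts.scheme, parts.netloc, path)
    return normalized.startswith(base)


# Further checks, such as tracing permission back to the parent in the nested
# publication case, would be appended to this list.
CHECKS = [path_check]


def authorized(client, uri, checks=CHECKS):
    """A request is authorized only if every configured check passes."""
    return all(check(client, uri) for check in checks)
```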
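
The deaddrop item maps fairly directly onto Python's standard library: email.mime.application.MIMEApplication wraps the raw PDU in a base64-encoded body with MIME headers, and mailbox.Maildir.add() files it. The deaddrop() function, its arguments, the header fields chosen, and the "rpkid.example" Message-ID domain are assumptions for illustration.

```python
import email.utils
import mailbox
import os
import tempfile
from email.mime.application import MIMEApplication


def deaddrop(maildir_path, sender, pdu_type, pdu_bytes):
    """File a raw incoming PDU in a Maildir for later audit; return its key."""
    # MIMEApplication base64-encodes the body and sets the MIME headers.
    msg = MIMEApplication(pdu_bytes)
    # Minimal RFC 2822 header so ordinary mail tools can index the message.
    msg["From"] = sender
    msg["Date"] = email.utils.formatdate(localtime=True)
    msg["Message-ID"] = email.utils.make_msgid(domain="rpkid.example")
    msg["Subject"] = pdu_type
    box = mailbox.Maildir(maildir_path, create=True)
    return box.add(msg)


# Example: drop one fake DER-encoded PDU into a throwaway Maildir.
audit_dir = os.path.join(tempfile.mkdtemp(), "audit")
key = deaddrop(audit_dir, "child@example.net", "issue", b"\x30\x03\x02\x01\x00")
```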


Other random notes:

Being able to specify interaction with other servers (not running
under testbed) in a testbed.yaml might be useful for interop tests.
Kind of breaks testbed's fundamental model, though.  Replacing what
testbed thinks is a leaf with somebody else would be easy, so maybe we
could specify some way to hang a bunch of rpkids under an external
parent?  Hmm, data needed would look a lot like testpoke.yaml, maybe
we can reuse some of that language?

There's a three-way tradeoff lurking in the publication protocol,
manifest generation, and CRL generation:

1) Consistancy issues for relying parties (eg, don't want to withdraw
   something that's still listed in the manifest);

2) Efficiency issues for the RPKI engine (eg, generating a new
   manifest for each individual change during a batch run could be
   expensive, would prefer to batch up the changes into a single
   manifest run); and

3) Coherency issues for the RPKI engine (don't want to defer things
   that could result in loss of state if something bad happens).

Considerations (1) and (3) have to dominate, which may mean we take a
hit on (2).
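
One way to reconcile (1) and (2) is to queue changes during a batch run and flush them in a fixed order: publish new objects, then publish a single manifest covering the batch, then withdraw removed objects, so nothing is withdrawn while the current manifest still lists it. The sketch below is hypothetical (PublicationBatch and make_manifest are invented names, a dict stands in for the repository, and CRLs are omitted). It also ignores (3): a real version would have to record the pending queue durably, for example in SQL, so that a crash mid-batch does not lose state.

```python
class PublicationBatch:
    """Queue publish/withdraw operations and apply them in a safe order.

    Hypothetical sketch: a dict mapping URI -> object stands in for the
    repository, and the pending queue lives only in memory.
    """

    def __init__(self, repository):
        self.repository = repository
        self.pending_publish = {}
        self.pending_withdraw = set()

    def publish(self, uri, obj):
        self.pending_withdraw.discard(uri)
        self.pending_publish[uri] = obj

    def withdraw(self, uri):
        self.pending_publish.pop(uri, None)
        self.pending_withdraw.add(uri)

    def flush(self, manifest_uri, make_manifest):
        # Step 1: publish new and changed objects, so everything the new
        # manifest will list is already in place.
        self.repository.update(self.pending_publish)
        # Step 2: one manifest for the whole batch, not one per change.
        listed = sorted(uri for uri in self.repository
                        if uri != manifest_uri and uri not in self.pending_withdraw)
        self.repository[manifest_uri] = make_manifest(listed)
        # Step 3: withdraw only after the manifest that no longer lists
        # these objects has been published.
        for uri in self.pending_withdraw:
            self.repository.pop(uri, None)
        self.pending_publish.clear()
        self.pending_withdraw.clear()


# Example: one batch adds c.roa and withdraws b.roa.
repo = {"rsync://repo.example/a.cer": "A", "rsync://repo.example/b.roa": "B"}
batch = PublicationBatch(repo)
batch.publish("rsync://repo.example/c.roa", "C")
batch.withdraw("rsync://repo.example/b.roa")
batch.flush("rsync://repo.example/x.mft", tuple)
print(repo["rsync://repo.example/x.mft"])
# ('rsync://repo.example/a.cer', 'rsync://repo.example/c.roa')
```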