Age | Commit message | Author |
|
May have finally gotten the right balance of indexes for basic use.
Use various optimizations to let us load large data sets before the
heat death of the universe. Some of these optimizations are
dangerous, in the sense that if this script crashes while constructing
the database, you'll have to rebuild the database from scratch.
Probably ought to offer both this and the slow-but-safe approach as
command line options, but:
- The speed improvements look to be worth at least an order of
magnitude in the runtime,
- The speed improvements also prevent all the fsync() calls in the
safe approach from turning the underlying filesystem into cream
cheese while the script is running, and
- This script is just a research analysis tool to begin with.
So I think the risk is justified in this case.
svn path=/trunk/; revision=5934
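
For readers unfamiliar with the trade-off described in the revision 5934 message, here is a minimal sketch of a fast-but-unsafe bulk load. It assumes the database is SQLite accessed through Python's sqlite3 module, which the commit message does not state. The table, column, and index names are hypothetical, and the actual script may use different optimizations.

```python
import sqlite3

def bulk_load(path, rows):
    """Load rows quickly, accepting that a crash mid-load may corrupt the file."""
    conn = sqlite3.connect(path)
    # Assumption: SQLite. With synchronous = OFF, SQLite stops waiting for
    # fsync(), so a crash or power loss can leave the database unusable.
    conn.execute("PRAGMA synchronous = OFF")
    # No rollback journal: faster, but an interrupted load cannot be undone.
    conn.execute("PRAGMA journal_mode = OFF")
    # Hypothetical schema, for illustration only.
    conn.execute(
        "CREATE TABLE IF NOT EXISTS sample ("
        "id INTEGER PRIMARY KEY, timestamp INTEGER, uri TEXT)"
    )
    # One transaction for the whole batch instead of one per row.
    with conn:
        conn.executemany(
            "INSERT INTO sample (timestamp, uri) VALUES (?, ?)", rows
        )
    # Build the index after loading so it is not updated on every insert.
    conn.execute(
        "CREATE INDEX IF NOT EXISTS sample_timestamp ON sample (timestamp)"
    )
    conn.commit()
    return conn
```

If a load is interrupted under these settings, the recovery path is the one the commit message names: delete the database file and rebuild it from scratch.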
|
|
seconds-since-epoch.
svn path=/trunk/; revision=5933
|
|
svn path=/trunk/; revision=5930
|
|
svn path=/trunk/; revision=5928
|
|
svn path=/trunk/; revision=5927
|
|
work of extracting and parsing before discovering that we've hit a
duplicate. Not sure what equivalent would be for Maildir (maybe
Message-ID?) so deferring that for now.
svn path=/trunk/; revision=5925
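
The Message-ID idea that the revision 5925 message floats for Maildir could be sketched roughly as follows. This illustrates the deferred idea and is not code from the repository. The function name and the in-memory set of seen IDs are hypothetical, and a real tool would presumably keep the IDs in its database.

```python
from email.parser import BytesHeaderParser

def unseen_messages(raw_messages, seen_ids):
    """Yield only raw messages whose Message-ID has not been seen before.

    Only the headers are parsed, so a duplicate is rejected before any
    expensive extraction or parsing of the body takes place.
    """
    parser = BytesHeaderParser()  # parses headers only, leaves the body alone
    for raw in raw_messages:
        message_id = parser.parsebytes(raw)["Message-ID"]
        if message_id is not None:
            if message_id in seen_ids:
                continue  # duplicate: skip before doing further work
            seen_ids.add(message_id)
        # Messages lacking a Message-ID are passed through unchanged.
        yield raw
```

Messages without a Message-ID header cannot be deduplicated this way, which may be one reason the question was left open.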
|
|
analysis.
svn path=/trunk/; revision=5923
|