# Turn a wordlist into a cdb file.
#
# cdb is a format invented by Dan Bernstein for fast, constant databases.  The
# database is fixed during creation and cannot be changed without rebuilding
# it, and is optimized for very fast access.  This program takes as input a
# wordlist file (a set of words separated by newlines) and turns it into a cdb
# file with the words as keys and the constant "1" as a value.  The resulting
# database is suitable for fast existence lookups in the wordlist, such as for
# password dictionary checks.

use Carp qw(croak);
use File::Basename qw(basename);
use Getopt::Long qw(GetOptions);

# The path to the cdb utility, used to create the final database.  By default,
# the user's PATH is searched for cdb.
my $CDB = 'cdb';

# print with error checking and an explicit file handle.
#
# $fh   - Output file handle
# @args - Remaining arguments to print
#
# Throws: Text exception on output failure
sub print_fh {
    my ($fh, @args) = @_;
    print {$fh} @args or croak('print failed');
    return;
}
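
# Always flush output.
$| = 1;

# Clean up the script name for error reporting.  Save the original path
# first, since it is needed later to feed the script to perldoc.
my $fullpath = $0;
local $0 = basename($0);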

# Parse the argument list.
my ($ascii, $max_length, $min_length, $manual);
Getopt::Long::config('bundling', 'no_ignore_case');
GetOptions(
    'ascii|a'        => \$ascii,
    'max-length|L=i' => \$max_length,
    'min-length|l=i' => \$min_length,
    'manual|man|m'   => \$manual,
) or exit 1;
if ($manual) {
    print_fh(\*STDOUT, "Feeding myself to perldoc, please wait...\n");
    exec('perldoc', '-t', $fullpath);
}
if (@ARGV != 1) {
    die "Usage: cdbmake-wordlist <wordlist>\n";
}
my $input = $ARGV[0];

# The output file goes in the same directory and is named the same as the
# input but with .data appended.
my $output = $input . '.data';
if (-e $output) {
    die "$0: temporary output file $output already exists\n";
}

# Process the input file into the output file, converting it to cdb input
# format.
open(my $in, '<', $input)
  or die "$0: cannot open input file $input: $!\n";
open(my $out, '>', $output)
  or die "$0: cannot create output file $output: $!\n";
while (defined(my $word = <$in>)) {
    # Lengths are measured without the separating newline.
    chomp($word);
    my $length = length($word);
    next if (defined($min_length) && $length < $min_length);
    next if (defined($max_length) && $length > $max_length);
    if ($ascii) {
        next if $word =~ m{ [^[:ascii:]] }xms;
        next if $word =~ m{ [[:cntrl:]] }xms;
    }
    print_fh($out, "+$length,1:$word->1\n");
}

# The cdbmake input format ends with an extra blank line.
print_fh($out, "\n");
close($in)  or die "$0: cannot read all of input file $input: $!\n";
close($out) or die "$0: cannot write to output file $output: $!\n";

# Run cdb to turn the result into a constant database.  Ignore duplicate keys.
system($CDB, '-c', '-u', "$input.cdb", $output) == 0
  or die "$0: cdb -c failed\n";

# Remove the temporary file.
unlink($output) or die "$0: cannot remove temporary file $output: $!\n";

=for stopwords
cdbmake-wordlist cdb whitespace wordlist lookups lookup sublicense
MERCHANTABILITY NONINFRINGEMENT krb5-strength --ascii Allbery

=head1 NAME

cdbmake-wordlist - Create a cdb database from a wordlist

=head1 SYNOPSIS

B<cdbmake-wordlist> [B<-am>] [B<-l> I<min-length>] [B<-L> I<max-length>]
    I<wordlist>

=head1 DESCRIPTION

cdb is a format invented by Dan Bernstein for fast, constant databases.
The database is fixed during creation and cannot be changed without
rebuilding it, and is optimized for very fast access.  This program takes
as input a wordlist file (a set of words, possibly including whitespace,
separated by newlines) and turns it into a cdb file with the words as keys
and the constant C<1> as a value.  The resulting database is suitable for
fast existence lookups in the wordlist, such as for password dictionary
checks.

B<cdbmake-wordlist> takes one argument, the input wordlist file.  The
output cdb database will have the same name as I<wordlist> but with
C<.cdb> appended.  The input wordlist file does not have to be sorted.
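
As a rough illustration (the file name F<words.txt> and its contents are
hypothetical, chosen only for this example), suppose F<words.txt> contains
the three lines C<password>, C<cat>, and C<correct horse>, and the program
is run with a minimum length of 4:

    cdbmake-wordlist -l 4 words.txt

The word C<cat> is shorter than the minimum and is dropped.  The remaining
words are written to the temporary file F<words.txt.data> as records of the
form C<< +klen,dlen:key->data >>, where the key length is the length of the
word and the data is always the single character C<1>:

    +8,1:password->1
    +13,1:correct horse->1

These records are then handed to B<cdb> to build F<words.txt.cdb>, and the
temporary file is removed.
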

=head1 OPTIONS

=over 4

=item B<-a>, B<--ascii>

Filter all words that contain non-ASCII characters or control characters
from the resulting cdb file, leaving only words that consist solely of
ASCII non-control characters.

=item B<-L> I<maximum>, B<--max-length>=I<maximum>

Filter all words of length greater than I<maximum> from the resulting cdb
database.  The length of each line (minus the separating newline) in the
input wordlist will be checked against I<maximum> and will be filtered out
of the resulting database if it is longer.  Useful for generating
password dictionaries from word lists that contain random noise that's
highly unlikely to be used as a password.

The default is to not filter out any words for maximum length.

=item B<-l> I<minimum>, B<--min-length>=I<minimum>

Filter all words of length less than I<minimum> from the resulting cdb
database.  The length of each line (minus the separating newline) in the
input wordlist will be checked against I<minimum> and will be filtered out
of the resulting database if it is shorter.  Useful for generating password
dictionaries where shorter passwords will be rejected by a generic length
check and no dictionary lookup will be done for a transform of the password
shorter than the specified minimum.

The default is to not filter out any words for minimum length.

=item B<-m>, B<--man>, B<--manual>

Print out this documentation (which is done simply by feeding the script to
C<perldoc -t>).

=back

=head1 AUTHOR

Russ Allbery <eagle@eyrie.org>

=head1 COPYRIGHT AND LICENSE

Copyright 2013 The Board of Trustees of the Leland Stanford Junior
University

Permission is hereby granted, free of charge, to any person obtaining a
copy of this software and associated documentation files (the "Software"),
to deal in the Software without restriction, including without limitation
the rights to use, copy, modify, merge, publish, distribute, sublicense,
and/or sell copies of the Software, and to permit persons to whom the
Software is furnished to do so, subject to the following conditions:

The above copyright notice and this permission notice shall be included in
all copies or substantial portions of the Software.

THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND, EXPRESS OR
IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF MERCHANTABILITY,
FITNESS FOR A PARTICULAR PURPOSE AND NONINFRINGEMENT.  IN NO EVENT SHALL
THE AUTHORS OR COPYRIGHT HOLDERS BE LIABLE FOR ANY CLAIM, DAMAGES OR OTHER
LIABILITY, WHETHER IN AN ACTION OF CONTRACT, TORT OR OTHERWISE, ARISING
FROM, OUT OF OR IN CONNECTION WITH THE SOFTWARE OR THE USE OR OTHER
DEALINGS IN THE SOFTWARE.

=head1 SEE ALSO

The cdb file format is defined at L<http://cr.yp.to/cdb.html>.

The current version of this program is available from its web page at
L<http://www.eyrie.org/~eagle/software/krb5-strength/> as part of the
krb5-strength package.

=cut