sd&m software design & management GmbH & Co. KG, Thomas-Dehler-Straße 27, 81737 München, Phone (0 89) 6 38 12-0, Fax (0 89) 6 38 12-150
http://www.sdm.de
Module “Bit::Vector”
“Bit::Vector - more than the name suggests”
Steffen Beyer
YAPC::Europe, ICA, London, UK, September 22-24, 2000
Agenda
• What does it do?
• Purpose(s)
• Summary of available methods
• Characteristics
• Alternatives
• Some Applications
• Questions & Answers, Suggestions
What does it do?
The Bit::Vector module implements bit arrays of arbitrary size.
Not very sexy, you may think.
But actually bit vectors are the basis of all computations performed by a computer!
Your CPU calls them "processor registers"...
By the way, is everybody familiar with two's complement binary representation and arithmetic?
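For readers who want a refresher, here is a minimal Python sketch of n-bit two's complement. It shows how a signed integer maps to an unsigned bit pattern, how that pattern is decoded using the top bit as the sign, and how negation works as "invert all bits, then add 1". The function names are hypothetical and are not part of Bit::Vector.

```python
def to_twos_complement(value, bits):
    """Encode a signed integer as an unsigned `bits`-wide two's complement pattern."""
    if not -(1 << (bits - 1)) <= value < (1 << (bits - 1)):
        raise OverflowError(f"{value} does not fit into {bits} bits")
    return value & ((1 << bits) - 1)


def from_twos_complement(pattern, bits):
    """Decode a `bits`-wide pattern back to a signed integer (top bit = sign)."""
    if (pattern >> (bits - 1)) & 1:
        return pattern - (1 << bits)
    return pattern


def negate(pattern, bits):
    """Two's complement negation: invert all bits, add 1, keep only `bits` bits."""
    return (~pattern + 1) & ((1 << bits) - 1)
```

For example, `format(to_twos_complement(-6, 8), "08b")` yields `'11111010'`.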
Purpose(s)
• Efficient storage and handling of bit arrays
• Extend your CPU to any desired number of bits
• Efficient set operations
• Efficient big integer arithmetic
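The first purpose, efficient storage of bit arrays, usually means packing bits into machine words. Each bit index is split into a word index and a bit position, and whole-set operations then work on one word at a time. The Python sketch below assumes 32-bit words and uses hypothetical method names. It illustrates the general technique, not Bit::Vector's internals.

```python
class BitArray:
    """Fixed-size bit array packed into 32-bit words (a hypothetical sketch)."""

    WORD_BITS = 32
    WORD_MASK = (1 << WORD_BITS) - 1

    def __init__(self, size):
        self.size = size
        # Round up so that every bit has a home word.
        self.words = [0] * ((size + self.WORD_BITS - 1) // self.WORD_BITS)

    def _locate(self, index):
        """Split a bit index into (word index, single-bit mask)."""
        if not 0 <= index < self.size:
            raise IndexError(index)
        return index // self.WORD_BITS, 1 << (index % self.WORD_BITS)

    def bit_on(self, index):
        word, mask = self._locate(index)
        self.words[word] |= mask

    def bit_off(self, index):
        word, mask = self._locate(index)
        self.words[word] &= ~mask & self.WORD_MASK

    def test(self, index):
        word, mask = self._locate(index)
        return bool(self.words[word] & mask)

    def union(self, other):
        """Set union: one OR per machine word instead of one step per bit."""
        if self.size != other.size:
            raise ValueError("sizes differ")
        result = BitArray(self.size)
        result.words = [a | b for a, b in zip(self.words, other.words)]
        return result
```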
Summary of available methods (see file "BitVector.txt")
• Especially interesting methods:
– "Interval_Substitute()" (is to bit vectors what "splice" is to Perl arrays)
– "Interval_Scan_...()" (finds contiguous blocks of set bits)
– "Chunk_...()" (accesses chunks of bits of selectable size at a time)
– "...Reverse()" (does for bit vectors what Perl's "reverse" does for strings)
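To make the "splice" analogy concrete, here is a minimal Python sketch of an interval substitution. The bit vector is held in a Python int, with bit 0 as the least significant bit. The sketch removes `t_len` bits at offset `t_off` and inserts `s_len` bits taken from a source vector at offset `s_off`, so the vector can grow or shrink. The function name and argument order are hypothetical, not Bit::Vector's actual signature.

```python
def interval_substitute(target, t_size, t_off, t_len, source, s_off, s_len):
    """Splice-like substitution on bit vectors stored as ints.

    Returns (new_value, new_size).
    """
    if t_off + t_len > t_size:
        raise ValueError("interval exceeds target size")
    low = target & ((1 << t_off) - 1)               # bits below the replaced interval
    high = target >> (t_off + t_len)                # bits above the replaced interval
    piece = (source >> s_off) & ((1 << s_len) - 1)  # bits copied from the source
    value = low | (piece << t_off) | (high << (t_off + s_len))
    return value, t_size - t_len + s_len
```

As with `splice`, a zero-length target interval inserts bits and a zero-length source piece deletes them.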
Characteristics (1/3)
• Internally written in C (thus fast)
• Relies on the CPU's machine word operations for maximum speed
• Auto-adapts to the size of the machine word at runtime
• Uses efficient algorithms (mostly "divide-and-conquer"); many functions run in O(1), O(n) or O(n ld n) time (ld = binary logarithm)
• C library at the core can also be used stand-alone (without Perl)
• Free Software (GPL+Artistic), C library also LGPL
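One common way to "auto-adapt to the size of the machine word at runtime" is to take an all-ones `unsigned int` and count how many right shifts it takes to reach zero. The result then drives how bit indices are split into a word index and a bit offset. The Python sketch below uses `ctypes` to stand in for the C type. It is an illustration of the idea only: the names `BITS`, `LOG_BITS` and `MOD_MASK` are hypothetical, and Bit::Vector's own start-up code may differ.

```python
import ctypes


def machine_word_bits():
    """Count the bits of a C `unsigned int` by shifting an all-ones value to zero."""
    value = ctypes.c_uint(~0).value   # ~0 == -1, which ctypes wraps to all ones
    bits = 0
    while value:
        value >>= 1
        bits += 1
    return bits


BITS = machine_word_bits()            # e.g. 32 on common platforms
LOG_BITS = BITS.bit_length() - 1      # log2(BITS), assuming BITS is a power of two
MOD_MASK = BITS - 1


def split_index(index):
    """Split a bit index into (word index, bit position inside that word)."""
    return index >> LOG_BITS, index & MOD_MASK
```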
Characteristics (2/3) - Efficient Algorithms
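• Example: Exponentiation $x^k$, e.g. $27^{13}$ (base 10) $= 11011^{1101}$ (base 2)
$$27^{13} = \underbrace{27 \cdot 27 \cdots 27}_{13\ \text{factors}}, \qquad k = 13 = 1101_2, \qquad n = \lfloor \operatorname{ld} k \rfloor = 3$$
$$27^{13} = \left(27^{8}\right)^{1} \cdot \left(27^{4}\right)^{1} \cdot \left(27^{2}\right)^{0} \cdot \left(27^{1}\right)^{1}$$
Worst case: $2n$ multiplications $= O(n) = O(\operatorname{ld} k)$ instead of $k - 1 = O(k)$; here only 5 instead of 12.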
• Example: Conversion to decimal representation: repeatedly divides the bit vector by the largest power of 10 that fits into a machine word, then uses plain machine-word arithmetic to break each remainder down further into decimal digits
• Example: Bit counting (number of set bits)
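The three examples above can be sketched in Python as follows. This is an illustration under stated assumptions, not Bit::Vector's C code:

- The bit vector is modeled as a little-endian list of 32-bit words, with the word size fixed rather than detected.
- All function names are hypothetical.
- `power` counts its multiplications so the "5 instead of 12" figure can be checked.
- The population count uses the common divide-and-conquer ("SWAR") technique, which may not be the one the library uses.

```python
WORD_BITS = 32
WORD_MASK = (1 << WORD_BITS) - 1
DEC_CHUNK = 10 ** 9    # largest power of 10 that fits into a 32-bit word
DEC_DIGITS = 9


# Exponentiation by repeated squaring ("square and multiply").
def power(x, k):
    """Return (x**k, number of multiplications performed)."""
    result, base, mults = None, x, 0
    while k:
        if k & 1:                     # this bit of k is set: use base = x**(2**i)
            if result is None:
                result = base         # first factor costs no multiplication
            else:
                result *= base
                mults += 1
        k >>= 1
        if k:                         # square only while higher bits remain
            base *= base
            mults += 1
    return (1 if result is None else result), mults


# Conversion to decimal on a little-endian list of 32-bit words.
def to_words(n):
    words = [n & WORD_MASK]
    n >>= WORD_BITS
    while n:
        words.append(n & WORD_MASK)
        n >>= WORD_BITS
    return words


def div_word(words, divisor):
    """Long division by a single word; returns (quotient words, remainder)."""
    quotient, rem = [0] * len(words), 0
    for i in reversed(range(len(words))):
        cur = (rem << WORD_BITS) | words[i]   # needs at most two words
        quotient[i], rem = divmod(cur, divisor)
    while len(quotient) > 1 and quotient[-1] == 0:
        quotient.pop()
    return quotient, rem


def word_digits(value, width):
    """Break one word into at least `width` decimal digits using word arithmetic."""
    digits = []
    while value or len(digits) < width:
        value, d = divmod(value, 10)
        digits.append("0123456789"[d])
    return "".join(reversed(digits))


def to_decimal(words):
    chunks = []                        # base-10**9 "digits", least significant first
    while any(words):
        words, rem = div_word(words, DEC_CHUNK)
        chunks.append(rem)
    if not chunks:
        return "0"
    head = word_digits(chunks[-1], 1)  # most significant chunk: no zero padding
    tail = "".join(word_digits(c, DEC_DIGITS) for c in reversed(chunks[:-1]))
    return head + tail


# Bit counting: divide-and-conquer ("SWAR") population count per word.
def popcount_word(w):
    w = w - ((w >> 1) & 0x55555555)                  # counts per 2-bit field
    w = (w & 0x33333333) + ((w >> 2) & 0x33333333)   # counts per 4-bit field
    w = (w + (w >> 4)) & 0x0F0F0F0F                  # counts per byte
    return ((w * 0x01010101) & WORD_MASK) >> 24      # sum of the four bytes


def count_bits(words):
    return sum(popcount_word(w) for w in words)
```

`power(27, 13)` performs 3 squarings plus 2 combining multiplications, matching the count on the slide.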
Characteristics (3/3)
• Object-oriented interface, e.g.
$vec1->Intersection($vec2,$vec3);
• Optionally (*) provides overloaded operators:
– one set of operators for set operations, e.g.
$set1 = $set2 & $set3;
– one set of operators for big integer math, e.g.
$bigsum += $bigint;
(*) Will become optional in version 6.0 (so that the "plain" module loads faster); currently always loaded.
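The idea of offering both a method interface and overloaded operators can be sketched with Python's operator methods. In the hypothetical class below, `intersection` stores its result in the invocant, in the spirit of the method call shown above. `&`, `|` and `^` act as set operators, and `+` performs fixed-width integer addition that wraps modulo 2**size. None of these names mirror Bit::Vector's real API.

```python
class BitVector:
    """Hypothetical fixed-size bit vector offering both set and integer operators."""

    def __init__(self, size, value=0):
        self.size = size
        self.mask = (1 << size) - 1
        self.value = value & self.mask   # behaves like a `size`-bit register

    # Method style: the result is stored in the invocant.
    def intersection(self, a, b):
        self.value = a.value & b.value

    # Set operators
    def __and__(self, other):            # intersection
        return BitVector(self.size, self.value & other.value)

    def __or__(self, other):             # union
        return BitVector(self.size, self.value | other.value)

    def __xor__(self, other):            # symmetric difference
        return BitVector(self.size, self.value ^ other.value)

    # Big integer operator: addition wraps modulo 2**size
    def __add__(self, other):
        return BitVector(self.size, self.value + other.value)
```

Since no `__iadd__` is defined, Python implements `s += t` as `s = s + t`.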
Alternatives (1/2)
• vec()
– confusing
– insufficiently powerful for many applications
• PDL
– complicated
– designed primarily for astronomical data analysis and heavy-duty number crunching (written in C, internally)
• Math::PARI
– very powerful
– requires the separate C library "PARI"
• Math::BigInt (in the core of Perl 5.6)
– slow (written entirely in Perl, stores digits in Perl arrays)
• Math::BigInteger
– unmaintained, doesn't compile (uses XS and a C library)
Alternatives (2/2)
• Set::Bag - implements multisets
• Set::IntSpan - optimized for .newsrc file type sets (also supported by Bit::Vector, which needs more memory)
• Set::Object - implements sets of arbitrary objects (can be simulated with Bit::Vector using a lookup table; set operations will then be faster)
• Set::Scalar - similar to Set::Object (?), but also allows recursion (sets of sets)
• Set::Window - optimized for intervals of integers (needs much less memory than Bit::Vector, but is of limited use since an interval is always entirely in or out)
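The .newsrc-style sets mentioned for Set::IntSpan are run lists such as "1-3,5". A minimal Python sketch of converting between such a string and a bit vector (held in a Python int) looks like this. Formatting scans for contiguous runs of set bits; parsing sets one block of bits per range. The function names are hypothetical and do not reproduce Bit::Vector's or Set::IntSpan's interfaces.

```python
def to_enum(bits):
    """Format the set bits of `bits` as ranges, e.g. bits {1, 2, 3, 5} -> "1-3,5"."""
    parts = []
    i = 0
    while bits >> i:                      # stop once no set bits remain above i
        if (bits >> i) & 1:
            start = i
            while (bits >> (i + 1)) & 1:  # extend the run of set bits
                i += 1
            parts.append(str(start) if start == i else f"{start}-{i}")
        i += 1
    return ",".join(parts)


def from_enum(text):
    """Parse a run list such as "1-3,5" into a bit vector."""
    bits = 0
    for part in filter(None, text.split(",")):
        lo, _, hi = part.partition("-")
        lo = int(lo)
        hi = int(hi) if hi else lo
        bits |= ((1 << (hi - lo + 1)) - 1) << lo   # set bits lo..hi in one step
    return bits
```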
Simulating Set::Object using lookup table
• See file "SetObject.pl"
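The file "SetObject.pl" is not reproduced here. As a sketch of the general idea under that caveat, the Python class below keeps a shared lookup table that assigns each object a fixed bit index. Each set is then a single bit vector (a Python int), so union and intersection become one OR or AND. The class and method names are hypothetical.

```python
class ObjectSet:
    """A set of arbitrary hashable objects stored as one bit per object.

    Sets that are combined must share the same `universe` lookup table,
    which maps each object to a fixed bit index.
    """

    def __init__(self, universe, members=()):
        self.universe = universe
        self.bits = 0
        for obj in members:
            self.add(obj)

    def add(self, obj):
        # New objects get the next free bit index in the shared table.
        index = self.universe.setdefault(obj, len(self.universe))
        self.bits |= 1 << index

    def __contains__(self, obj):
        index = self.universe.get(obj)
        return index is not None and bool((self.bits >> index) & 1)

    def _combine(self, other, bits):
        if self.universe is not other.universe:
            raise ValueError("sets must share the same lookup table")
        result = ObjectSet(self.universe)
        result.bits = bits
        return result

    def __or__(self, other):    # union: a single OR on the bit vectors
        return self._combine(other, self.bits | other.bits)

    def __and__(self, other):   # intersection: a single AND on the bit vectors
        return self._combine(other, self.bits & other.bits)

    def members(self):
        return {obj for obj, i in self.universe.items() if (self.bits >> i) & 1}
```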
Some Applications
• Set::IntRange - sets of integers (universe = some interval)
• Math::MatrixBool - useful for graph algorithms (e.g. shortest paths / Kleene's algorithm)
• Slice (multiple document version generator)
• Parse table generators for compiler-compilers à la "yacc" (calculating first, follow & lookahead character sets)
• Cryptography
• Easy manipulation of data (files), any number of bits at a time
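As an illustration of why bit vectors suit graph algorithms such as those behind Math::MatrixBool, here is a Python sketch of Warshall's transitive closure. Warshall's algorithm is the boolean special case of the Kleene/Floyd-Warshall scheme. Each matrix row is stored as one bit vector (a Python int), so the inner step "row i can reach k, so it can reach everything k reaches" is a single OR of two rows. This is not Math::MatrixBool's actual interface.

```python
def transitive_closure(rows):
    """Warshall's algorithm on a boolean matrix stored as one bit vector per row.

    rows[i] has bit j set if there is an edge i -> j.
    """
    closure = list(rows)
    n = len(closure)
    for k in range(n):
        for i in range(n):
            if (closure[i] >> k) & 1:      # i reaches k ...
                closure[i] |= closure[k]   # ... so i reaches everything k reaches
    return closure
```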
Application "Slice"
• See
– homepage screenshot "Slice.bmp"
– file "file.in"
– file "Slice.txt"
– file "file.html.en.OK"
– file "file.html.de.OK"
– URL http://www.engelschall.com/sw/slice/
Application "Date::Calc" v5.0 (coming soon)
• Stores years in bit vectors (one year = one bit vector, one day = one bit)
• Bit is "on" if corresponding day is a holiday
• Performs calculations taking holidays into account
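A minimal Python sketch of the one-bit-per-day idea follows. It builds one bit vector (a Python int) per year, turns a bit on for each day off, and advances a date by a number of working days by skipping days whose bit is on. Several details are assumptions of this sketch rather than Date::Calc's design: weekends are marked as well as holidays, the holiday date in the usage test is made up, and the function names are hypothetical.

```python
import datetime


def year_vector(year, holidays):
    """Return (bits, days): bit i is on if day i of `year` (0 = Jan 1) is a day off.

    Weekends are marked as days off too (an assumption of this sketch).
    """
    start = datetime.date(year, 1, 1)
    days = (datetime.date(year + 1, 1, 1) - start).days
    bits = 0
    for i in range(days):
        day = start + datetime.timedelta(days=i)
        if day.weekday() >= 5 or day in holidays:
            bits |= 1 << i
    return bits, days


def add_workdays(date, n, bits, days):
    """Move `date` forward by n working days, skipping days whose bit is on."""
    start = datetime.date(date.year, 1, 1)
    i = (date - start).days
    while n > 0:
        i += 1
        if i >= days:
            raise ValueError("result falls outside this year's bit vector")
        if not (bits >> i) & 1:
            n -= 1
    return start + datetime.timedelta(days=i)
```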
Questions & Answers, Suggestions
• Please feel free to ask!
• Suggestions are welcome.