sd&m software design & management GmbH & Co. KG, Thomas-Dehler-Straße 27, 81737 München, Phone (0 89) 6 38 12-0, Fax (0 89) 6 38 12-150

http://www.sdm.de

Module “Bit::Vector”

“Bit::Vector - more than the name suggests”

Steffen Beyer

YAPC::Europe, London, UK, ICA, September 22-24, 2000

Agenda

• What does it do?

• Purpose(s)

• Summary of available methods

• Characteristics

• Alternatives

• Some Applications

• Questions & Answers, Suggestions

What does it do?

The Bit::Vector module implements bit arrays of arbitrary size.

Not very sexy, you may think.

But bit vectors are actually the basis of all computations performed by a computer!

Your CPU calls them "processor registers"...

By the way, is everybody familiar with two's complement binary representation and arithmetic?
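
To make "bit arrays of arbitrary size" concrete, here is a minimal sketch in C (the language of the module's core), not Bit::Vector's actual code. The type `bitvec` and the functions `bv_new`, `bv_set`, `bv_clear`, `bv_test` and `bv_free` are hypothetical names chosen for illustration; the sketch assumes `unsigned long` as the machine word.

```c
#include <assert.h>
#include <limits.h>
#include <stdbool.h>
#include <stdlib.h>

/* Number of bits in one machine word (assumed here: unsigned long). */
#define WORD_BITS (CHAR_BIT * sizeof(unsigned long))

/* Hypothetical bit array of arbitrary size, packed into machine words. */
typedef struct {
    size_t bits;            /* number of usable bits */
    unsigned long *words;   /* storage; bit i lives in words[i / WORD_BITS] */
} bitvec;

/* Allocate a vector of `bits` bits, all cleared; returns NULL on failure. */
static bitvec *bv_new(size_t bits) {
    size_t nwords = (bits + WORD_BITS - 1) / WORD_BITS;  /* round up */
    bitvec *v = malloc(sizeof *v);
    if (v == NULL) return NULL;
    v->bits = bits;
    v->words = calloc(nwords ? nwords : 1, sizeof *v->words);
    if (v->words == NULL) { free(v); return NULL; }
    return v;
}

static void bv_free(bitvec *v) {
    if (v != NULL) { free(v->words); free(v); }
}

/* Word index = i / WORD_BITS, position inside that word = i % WORD_BITS. */
static void bv_set(bitvec *v, size_t i) {
    assert(i < v->bits);
    v->words[i / WORD_BITS] |= 1UL << (i % WORD_BITS);
}

static void bv_clear(bitvec *v, size_t i) {
    assert(i < v->bits);
    v->words[i / WORD_BITS] &= ~(1UL << (i % WORD_BITS));
}

static bool bv_test(const bitvec *v, size_t i) {
    assert(i < v->bits);
    return (v->words[i / WORD_BITS] >> (i % WORD_BITS)) & 1UL;
}
```

Packing bits into whole machine words rather than bytes is what lets later operations handle a full word of bits per CPU instruction; for example, 1000 bits fit into 16 words when `unsigned long` has 64 bits.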

Purpose(s)

• Efficient storage and handling of bit arrays

• Extend your CPU to any desired number of bits

• Efficient set operations

• Efficient big integer arithmetic
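
What "extending your CPU to any number of bits" means for big integer arithmetic can be sketched in C as addition with a carry propagated from word to word, plus two's complement negation (invert all bits, then add one). The function names `mw_add` and `mw_negate` are hypothetical, and 32-bit words are an assumption chosen for readability; this is an illustration of the technique, not the module's code.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* dst = a + b over n 32-bit words, least significant word first.
   Returns the carry out of the top word (0 or 1). dst may alias a or b. */
static uint32_t mw_add(uint32_t *dst, const uint32_t *a, const uint32_t *b, size_t n) {
    uint64_t carry = 0;
    assert(n == 0 || (dst != NULL && a != NULL && b != NULL));
    for (size_t i = 0; i < n; i++) {
        uint64_t sum = (uint64_t)a[i] + b[i] + carry;  /* at most 2^33 - 1 */
        dst[i] = (uint32_t)sum;                        /* keep the low 32 bits */
        carry = sum >> 32;                             /* propagate 0 or 1 */
    }
    return (uint32_t)carry;
}

/* In-place two's complement negation: invert every bit, then add 1. */
static void mw_negate(uint32_t *x, size_t n) {
    uint64_t carry = 1;
    for (size_t i = 0; i < n; i++) {
        uint64_t sum = (uint64_t)(uint32_t)~x[i] + carry;
        x[i] = (uint32_t)sum;
        carry = sum >> 32;
    }
}
```

Because negation reuses the same carry chain, subtraction `a - b` can be done as `a + (-b)`, the usual two's complement trick; the carry out of the top word is then simply discarded.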

Summary of available methods (see file "BitVector.txt")

• Especially interesting methods:

– "Interval_Substitute()" (is to bit vectors what "splice" is to Perl arrays)

– "Interval_Scan_...()" (finds contiguous blocks of set bits)

– "Chunk_...()" (accesses chunks of bits of a freely chosen size at a time)

– "...Reverse()" (does for bit vectors what Perl's "reverse" does for strings)
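
The idea behind "Interval_Scan_...()" can be sketched in C: starting at a given position, find the next run of consecutive set bits and report its first and last index. The function `scan_interval` and its signature are hypothetical, and the sketch assumes that unused bits in the last word are zero. It skips all-zero words in one step but otherwise inspects single bits, which is simpler (and slower) than a fully word-oriented scan.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Test bit i of a vector stored in 32-bit words, least significant first. */
static bool get_bit(const uint32_t *w, size_t i) {
    return (w[i / 32] >> (i % 32)) & 1u;
}

/* Find the next run of set bits at or after `start`. On success, store the
   first and last index of the run in *min and *max and return true;
   return false if no set bit remains. */
static bool scan_interval(const uint32_t *w, size_t nbits, size_t start,
                          size_t *min, size_t *max) {
    size_t i = start;
    while (i < nbits) {
        if (i % 32 == 0 && w[i / 32] == 0) { i += 32; continue; } /* skip empty word */
        if (get_bit(w, i)) break;
        i++;
    }
    if (i >= nbits) return false;
    *min = i;
    while (i + 1 < nbits && get_bit(w, i + 1)) i++;   /* extend the run */
    *max = i;
    return true;
}
```

Calling it repeatedly with `start = max + 2` (the bit right after `max` is known to be clear) enumerates every run in ascending order.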

Characteristics (1/3)

• Internally written in C (thus fast)

• Relies on the CPU's machine-word operations for maximum speed

• Adapts automatically to the machine word size at runtime

• Uses efficient algorithms (mostly "divide-and-conquer"); many functions run in O(1), O(n) or O(n ld n) time (ld = logarithm to base 2)

• C library at the core can also be used stand-alone (without Perl)

• Free Software (GPL+Artistic), C library also LGPL
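
Adapting to the machine word size at runtime can be sketched as below. This is an assumption about how such detection might work: measure the bit width of `unsigned int` once by shifting, then derive the shift and mask that split a bit index into a word index and a bit offset. The function names are hypothetical; this is not the C core's actual code.

```c
#include <assert.h>
#include <limits.h>
#include <stddef.h>

/* Measure the width of unsigned int at runtime: shift an all-ones value right
   until it becomes zero, counting the shifts. */
static unsigned word_bits(void) {
    unsigned int sample = ~0u;
    unsigned bits = 0;
    while (sample != 0) {
        sample >>= 1;
        bits++;
    }
    return bits;
}

/* log2 of a power-of-two word width, e.g. 5 for 32-bit words. */
static unsigned word_log2(unsigned bits) {
    unsigned lg = 0;
    assert(bits != 0 && (bits & (bits - 1)) == 0);
    while ((1u << lg) < bits) lg++;
    return lg;
}

/* Split a bit index into a word index and a bit offset using a shift and a
   mask instead of division and modulo. */
static void locate_bit(size_t index, unsigned bits, size_t *word, unsigned *offset) {
    *word = index >> word_log2(bits);
    *offset = (unsigned)(index & (size_t)(bits - 1));
}
```

Computing the width once and then using only shifts and masks keeps every later bit access independent of whether the machine has 32- or 64-bit words.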

Characteristics (2/3) - Efficient Algorithms

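• Example: Exponentiation ($x^k$), e.g. $27^{13}$ (base 10) with $k = 13$:

$$27^{13} = \underbrace{27 \cdot 27 \cdot \ldots \cdot 27}_{13\ \text{factors}} = 11011_2^{\,1101_2} \ \ (\text{base } 2), \qquad n = \lfloor \operatorname{ld} k \rfloor = 3$$

$$= \left(11011_2^{\,8}\right)^{1} \cdot \left(11011_2^{\,4}\right)^{1} \cdot \left(11011_2^{\,2}\right)^{0} \cdot \left(11011_2^{\,1}\right)^{1}$$

Worst case: $2n$ multiplications $= O(n) = O(\operatorname{ld} k)$ instead of $k - 1 = O(k)$; here only 5 instead of 12.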

• Example: Conversion to decimal representation: divides the bit vector by the largest power of 10 that fits into a machine word (keeping the remainder), then uses machine-word arithmetic to break that remainder down further

• Example: Bit counting (number of set bits)
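
The three examples above can be sketched in C. The function names are hypothetical and none of this is the module's code: `power` uses square-and-multiply on a single 64-bit word (so it wraps on overflow, unlike a real big integer); `to_decimal` assumes 32-bit words, for which $10^9$ is the largest power of 10 that fits; and `count_bits` uses the common `x &= x - 1` trick, a swapped-in technique that is not necessarily the one Bit::Vector uses.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <stdlib.h>
#include <string.h>

/* Square-and-multiply: x^k with at most 2 * floor(ld k) multiplications.
   *mults receives the number of multiplications performed (wraps mod 2^64). */
static uint64_t power(uint64_t x, unsigned k, unsigned *mults) {
    uint64_t result = 1;
    int started = 0;                 /* first factor is copied, not multiplied */
    *mults = 0;
    while (k != 0) {
        if (k & 1u) {                /* current binary digit of k is 1 */
            if (started) { result *= x; (*mults)++; }
            else { result = x; started = 1; }
        }
        k >>= 1;
        if (k != 0) { x *= x; (*mults)++; }   /* x^(2^i) becomes x^(2^(i+1)) */
    }
    return result;
}

/* x &= x - 1 clears the lowest set bit, so the inner loop runs once per set bit. */
static size_t count_bits(const uint32_t *w, size_t n) {
    size_t count = 0;
    for (size_t i = 0; i < n; i++)
        for (uint32_t x = w[i]; x != 0; x &= x - 1)
            count++;
    return count;
}

/* Divide the n-word number w (least significant word first) in place by d;
   return the remainder. */
static uint32_t div_word(uint32_t *w, size_t n, uint32_t d) {
    uint64_t rem = 0;
    for (size_t i = n; i-- > 0; ) {
        uint64_t cur = (rem << 32) | w[i];
        w[i] = (uint32_t)(cur / d);
        rem = cur % d;
    }
    return (uint32_t)rem;
}

/* Split off 9-digit chunks by dividing by 10^9, then turn each chunk into
   digits with word arithmetic. Consumes w. Returns a malloc'ed string or NULL. */
static char *to_decimal(uint32_t *w, size_t n) {
    size_t cap = 2 * n + 1;                      /* generous bound on chunk count */
    uint32_t *chunks = malloc(cap * sizeof *chunks);
    char *out = malloc(cap * 9 + 1);
    size_t count = 0, skip = 0;
    int nonzero;
    if (chunks == NULL || out == NULL) { free(chunks); free(out); return NULL; }
    do {
        chunks[count++] = div_word(w, n, 1000000000u);
        nonzero = 0;
        for (size_t i = 0; i < n; i++) nonzero |= (w[i] != 0);
    } while (nonzero);
    char *p = out;
    for (size_t c = count; c-- > 0; p += 9) {    /* most significant chunk first */
        uint32_t v = chunks[c];
        for (int d = 8; d >= 0; d--) { p[d] = (char)('0' + v % 10); v /= 10; }
    }
    *p = '\0';
    free(chunks);
    while (out[skip] == '0' && out[skip + 1] != '\0') skip++;  /* drop leading zeros */
    memmove(out, out + skip, strlen(out + skip) + 1);
    return out;
}
```

For instance, `power(27, 13, &m)` performs exactly the 5 multiplications counted on the slide: three squarings ($27^2$, $27^4$, $27^8$) and two multiplications combining $27^1$, $27^4$ and $27^8$.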

Characteristics (3/3)

• Object-oriented interface, e.g.

$vec1->intersection($vec2,$vec3);

• Optionally(*) provides overloaded operators

– one set of operators for set operations, e.g.

$set1 = $set2 & $set3;

– one set of operators for big integer math, e.g.

$bigsum += $bigint;

(*): will become optional in version 6.0 (so that the "plain" module loads faster); currently the overloaded operators are always loaded
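
Whichever interface is used, set operations on bit vectors over the same universe reduce to one bitwise instruction per machine word: intersection is AND, union is OR, difference is AND NOT. A hedged C sketch with hypothetical function names; the result-first argument order merely mirrors `$vec1->intersection($vec2,$vec3)` above.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Sets over a universe of n * 32 elements: bit i is 1 iff element i is a member.
   Each loop iteration handles 32 elements at once. dst may alias a or b. */
static void set_intersection(uint32_t *dst, const uint32_t *a, const uint32_t *b, size_t n) {
    for (size_t i = 0; i < n; i++) dst[i] = a[i] & b[i];
}

static void set_union(uint32_t *dst, const uint32_t *a, const uint32_t *b, size_t n) {
    for (size_t i = 0; i < n; i++) dst[i] = a[i] | b[i];
}

static void set_difference(uint32_t *dst, const uint32_t *a, const uint32_t *b, size_t n) {
    for (size_t i = 0; i < n; i++) dst[i] = a[i] & (uint32_t)~b[i];
}

/* true iff every member of a is also a member of b */
static bool set_subset(const uint32_t *a, const uint32_t *b, size_t n) {
    for (size_t i = 0; i < n; i++)
        if ((a[i] & (uint32_t)~b[i]) != 0) return false;
    return true;
}
```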

Alternatives (1/2)

• vec()

– confusing

– insufficiently powerful for many applications

• PDL

– complicated

– designed primarily for astronomical data analysis and heavy-duty number crunching (written in C, internally)

• Math::PARI

– very powerful

– requires the separate C library "PARI"

• Math::BigInt (part of the Perl 5.6 core)

– slow (written entirely in Perl, stores digits in Perl arrays)

• Math::BigInteger

– unmaintained, doesn't compile (uses XS and a C library)

Alternatives (2/2)

• Set::Bag - implements multisets

• Set::IntSpan - optimized for .newsrc-style sets (also supported by Bit::Vector, but with higher memory use)

• Set::Object - implements sets of arbitrary objects (can be simulated with Bit::Vector using a lookup table; set operations are then faster)

• Set::Scalar - similar to Set::Object (?), but also allows recursion (sets of sets)

• Set::Window - optimized for intervals of integers (needs much less memory than Bit::Vector, but is of limited use since the whole interval is either in or out)

Simulating Set::Object using lookup table

• See file "SetObject.pl"
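
The file itself is not reproduced here. As a hedged illustration of the lookup-table idea from the Set::Object entry, the following C sketch maps each distinct object name to a bit index, after which set operations are plain word operations. All names are hypothetical, and the universe is capped at 64 objects so that each set fits into one 64-bit word.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

#define MAX_OBJECTS 64          /* hypothetical limit: one 64-bit word per set */

/* Lookup table: object name -> bit index. Linear search keeps the sketch
   short; a hash table would be the natural choice for large universes. */
static const char *table[MAX_OBJECTS];
static size_t table_size = 0;

/* Return the index of name, or table_size if it is not in the table. */
static size_t find_object(const char *name) {
    size_t i = 0;
    while (i < table_size && strcmp(table[i], name) != 0) i++;
    return i;
}

/* Return the index of name, appending it on first use.
   Only the pointer is stored, so the string must outlive the table. */
static size_t intern_object(const char *name) {
    size_t i = find_object(name);
    if (i == table_size) {
        assert(table_size < MAX_OBJECTS);
        table[table_size++] = name;
    }
    return i;
}

typedef struct { uint64_t bits; } objset;   /* bit i set => table[i] is a member */

static void objset_insert(objset *s, const char *name) {
    s->bits |= UINT64_C(1) << intern_object(name);
}

static bool objset_contains(const objset *s, const char *name) {
    size_t i = find_object(name);
    return i < table_size && ((s->bits >> i) & 1u);
}

/* Set operations never touch the lookup table: they are single word operations. */
static objset objset_union(objset a, objset b)        { objset r = { a.bits | b.bits }; return r; }
static objset objset_intersection(objset a, objset b) { objset r = { a.bits & b.bits }; return r; }
```

The table lookup is paid only when objects enter or are queried; combining whole sets costs one operation per word regardless of how many objects they contain.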

Some Applications

• Set::IntRange - sets of integers (universe = some interval)

• Math::MatrixBool - useful for graph algorithms (e.g. shortest paths / Kleene's Algorithm)

• Slice (multiple document version generator)

• Parse table generators for compiler-compilers à la "yacc" (calculating first, follow & lookahead character sets)

• Cryptography

• Easy manipulation of data (files), any number of bits at a time
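
Why bit vectors suit graph algorithms on boolean matrices can be shown with Warshall's transitive-closure algorithm, which is closely related to Kleene's algorithm. It is named here as an illustration, not as Math::MatrixBool's implementation. In this C sketch (hypothetical function name, graphs of at most 64 vertices) each matrix row is one 64-bit word, so merging a whole row costs a single OR.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* row[i] bit j == 1 means "j is reachable from i". For each intermediate
   vertex k, every row that can reach k absorbs row k with one OR. */
static void transitive_closure(uint64_t *row, size_t n) {
    assert(n <= 64);
    for (size_t k = 0; k < n; k++)
        for (size_t i = 0; i < n; i++)
            if ((row[i] >> k) & 1u)
                row[i] |= row[k];
}
```

With rows as bit vectors the inner column loop of the textbook $O(n^3)$ algorithm disappears into word operations, which is where the speed-up for larger graphs comes from.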

Application "Slice"

• See

– homepage screenshot "Slice.bmp"

– file "file.in"

– file "Slice.txt"

– file "file.html.en.OK"

– file "file.html.de.OK"

– URL http://www.engelschall.com/sw/slice/

Application "Date::Calc" v5.0 (coming soon)

• Stores years in bit vectors (one year = one bit vector, one day = one bit)

• Bit is "on" if corresponding day is a holiday

• Performs calculations taking holidays into account
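
How such a calendar bit vector could drive a calculation is sketched below in C. The day numbering (0 = January 1), the treatment of weekends as ordinary "on" bits, and all function names are assumptions for illustration, not Date::Calc's planned interface.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

#define MAX_DAYS 366                       /* enough bits for a leap year */
#define YEAR_WORDS ((MAX_DAYS + 31) / 32)  /* 12 words of 32 bits */

/* One year as a bit vector: bit d is on if day d (0 = January 1) is a holiday. */
typedef struct { uint32_t w[YEAR_WORDS]; } year_bits;

static void mark_holiday(year_bits *y, unsigned day) {
    assert(day < MAX_DAYS);
    y->w[day / 32] |= UINT32_C(1) << (day % 32);
}

static bool is_holiday(const year_bits *y, unsigned day) {
    assert(day < MAX_DAYS);
    return (y->w[day / 32] >> (day % 32)) & 1u;
}

/* Advance from `day` by `n` working days, skipping every day whose bit is on.
   Returns the resulting day, or -1 if the end of the year is reached first. */
static int add_working_days(const year_bits *y, unsigned day, unsigned n,
                            unsigned days_in_year) {
    assert(days_in_year <= MAX_DAYS);
    while (n > 0) {
        if (++day >= days_in_year) return -1;
        if (!is_holiday(y, day)) n--;
    }
    return (int)day;
}
```

A whole year takes only 12 words, so keeping one such vector per year (and per region, if holidays differ) stays cheap.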

Questions & Answers, Suggestions

• Please feel free to ask!

• Suggestions are welcome.