Informatics I101
Transcript of Informatics I101
Informatics I101
February 25, 2003
John C. Paolillo, Instructor
Electronic Text
• ASCII — American Standard Code for Information Interchange
• EBCDIC (IBM mainframes, not standard)
• Extended ASCII (8-bit, not standard)
  – DOS Extended ASCII
  – Windows Extended ASCII
  – Macintosh Extended ASCII
• Unicode (originally 16-bit, since extended to a 21-bit code space; standard still in progress)
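Why the 8-bit "extended ASCII" variants count as non-standard can be seen by encoding the same letter with Python's built-in codecs. This is a sketch under an assumption not made on the slide: that the codec names `cp437` (IBM PC/DOS), `cp1252` (Windows Western European) and `mac_roman` (classic Macintosh) stand in for the three extended-ASCII families, and `cp037` for one common EBCDIC variant.

```python
# Encode the same characters under several legacy code pages.
# Assumption: cp437 ~ DOS, cp1252 ~ Windows, mac_roman ~ Macintosh extended ASCII;
# cp037 is one of several EBCDIC code pages used on IBM mainframes.
CODECS = ["ascii", "cp437", "cp1252", "mac_roman", "cp037"]

def byte_values(ch: str) -> dict:
    """Return the byte value(s) of ch in each codec, or None if it cannot be encoded."""
    result = {}
    for codec in CODECS:
        try:
            result[codec] = list(ch.encode(codec))
        except UnicodeEncodeError:
            result[codec] = None  # character is missing from this code page
    return result

for ch in ["A", "\u00e9"]:  # "A" and "é"
    print(repr(ch), byte_values(ch))
```

Plain ASCII letters get the same byte in all three extended-ASCII code pages, but a letter such as "é" gets a different byte in each, and EBCDIC differs even for "A".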
ASCII
01000001 → means → alphabet letter "A" → is displayed as → screen representation: A A A
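A quick check of this bit pattern, sketched in Python: `ord` gives the numeric code of a character, and formatting that code as an 8-digit binary string reproduces 01000001 (ASCII itself needs only the low 7 bits; the leading 0 pads the code to one byte).

```python
def ascii_bits(ch: str) -> str:
    """Return the ASCII code of ch as an 8-bit binary string."""
    code = ord(ch)
    if code > 127:
        raise ValueError(f"{ch!r} is not an ASCII character")
    return format(code, "08b")  # zero-padded to 8 binary digits

def from_bits(bits: str) -> str:
    """Inverse: turn a binary string back into its character."""
    return chr(int(bits, 2))

print(ascii_bits("A"))        # 01000001
print(from_bits("01000001"))  # A
```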
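The ASCII Code
Columns give the first (high) hexadecimal digit of a code and rows the second (low) digit; e.g. "A" is in column 4, row 1, so its code is hex 41 = decimal 65.

| | 0 | 1 | 2 | 3 | 4 | 5 | 6 | 7 |
|---|---|---|---|---|---|---|---|---|
| 0 | NUL | DLE | blank | 0 | @ | P | \` | p |
| 1 | SOH | DC1 | ! | 1 | A | Q | a | q |
| 2 | STX | DC2 | " | 2 | B | R | b | r |
| 3 | ETX | DC3 | # | 3 | C | S | c | s |
| 4 | EOT | DC4 | $ | 4 | D | T | d | t |
| 5 | ENQ | NAK | % | 5 | E | U | e | u |
| 6 | ACK | SYN | & | 6 | F | V | f | v |
| 7 | BEL | ETB | ' | 7 | G | W | g | w |
| 8 | BS | CAN | ( | 8 | H | X | h | x |
| 9 | HT | EM | ) | 9 | I | Y | i | y |
| A | LF | SUB | \* | : | J | Z | j | z |
| B | VT | ESC | + | ; | K | [ | k | { |
| C | FF | FS | , | < | L | \\ | l | \| |
| D | CR | GS | - | = | M | ] | m | } |
| E | SO | RS | . | > | N | ^ | n | ~ |
| F | SI | US | / | ? | O | \_ | o | DEL |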
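The table can be read arithmetically: a character's code is 16 × column + row. A small sketch that recovers the column and row for any ASCII character, and the character for any cell (the function names `table_position` and `table_cell` are illustrative only):

```python
def table_position(ch: str) -> tuple:
    """Return (column, row) of ch in the 16-row ASCII table, as hex digit strings."""
    code = ord(ch)
    if code > 127:
        raise ValueError(f"{ch!r} is not ASCII")
    column, row = divmod(code, 16)  # high and low hexadecimal digits
    return format(column, "X"), format(row, "X")

def table_cell(column: int, row: int) -> str:
    """Return the character stored at a given column and row."""
    return chr(16 * column + row)

print(table_position("A"))  # ('4', '1')
print(table_cell(2, 0xC))   # ,
```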
An Example Text
Text: "This is an example"
Codes: 84 104 105 115 32 105 115 32 97 110 32 101 120 97 109 112 108 101
Note that each ASCII character corresponds to a number, including spaces, carriage returns, etc. Everything must be represented somehow, otherwise the computer couldn’t do anything with it.
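The example line can be reproduced with Python's built-in `ord`, which returns each character's code, spaces included; a minimal sketch:

```python
def char_codes(text: str) -> list:
    """Return the numeric (ASCII) code of every character, spaces included."""
    return [ord(ch) for ch in text]

def from_codes(codes: list) -> str:
    """Rebuild the text from its codes."""
    return "".join(chr(c) for c in codes)

codes = char_codes("This is an example")
print(codes)
# [84, 104, 105, 115, 32, 105, 115, 32, 97, 110, 32, 101, 120, 97, 109, 112, 108, 101]
```

Control characters are handled the same way: a carriage return and line feed come out as the codes 13 and 10.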
Representation in Memory
[Figure: the codes of the example text laid out in memory cells, shown as decimal numbers (e.g. 97, 110, 32, 101, 120), as characters, and as a continuous string of bits.]
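In memory, text is ultimately a run of bits. A sketch that converts a string into the bit string stored for it and back again, assuming one byte per character as for plain ASCII text:

```python
def to_bits(text: str) -> str:
    """Concatenate the 8-bit patterns of each character's ASCII code."""
    data = text.encode("ascii")  # one byte per character
    return "".join(format(byte, "08b") for byte in data)

def from_bits(bits: str) -> str:
    """Split a bit string into 8-bit chunks and decode them as ASCII."""
    chunks = [bits[i:i + 8] for i in range(0, len(bits), 8)]
    return bytes(int(chunk, 2) for chunk in chunks).decode("ascii")

bits = to_bits("an example")
print(len(bits))  # 80: ten characters, eight bits each
```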
Features of ASCII
• 7-bit fixed-length code – all codes have the same number of bits
• Sorting: A precedes B, B precedes C, etc.
• Capital + 32 = lowercase (code of "A" + code of space = code of "a": 65 + 32 = 97)
• Word divisions, etc. must be parsed
ASCII is very widespread and almost universally supported.
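These features can be demonstrated directly on character codes. A sketch assuming plain ASCII input; the helper name `to_lower` is illustrative, not a standard function:

```python
def to_lower(text: str) -> str:
    """Lowercase ASCII letters by adding 32 to the codes of A-Z."""
    out = []
    for ch in text:
        code = ord(ch)
        if ord("A") <= code <= ord("Z"):
            code += 32  # e.g. 65 ("A") + 32 (space) = 97 ("a")
        out.append(chr(code))
    return "".join(out)

# Sorting by code: every capital (65-90) sorts before every lowercase letter (97-122).
print(sorted(["b", "B", "a", "A"]))  # ['A', 'B', 'a', 'b']

# Word divisions are not encoded specially; a program must parse them,
# here by splitting on the space character (code 32).
print("This is an example".split(" "))  # ['This', 'is', 'an', 'example']
print(to_lower("ASCII Text"))  # ascii text
```

The sorting line also shows a side effect of code-order sorting: "B" comes before "a".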
Variable-Length Codes
• Some symbols (e.g. common letters) have shorter codes than others
  – E.g. Morse code: e = dot, j = dot-dash-dash-dash
  – Use the frequency of symbols to assign code lengths
• Why? Space efficiency – compression tools such as gzip and zip use variable-length codes (for repeated strings as well as single characters)
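Python's standard `zlib` module implements DEFLATE, the compression method used in gzip and zip files, so it can illustrate the space savings on repetitive text. A minimal sketch; the exact compressed size depends on the input and the zlib version, so the example only reports a ratio:

```python
import zlib

def compression_ratio(data: bytes) -> float:
    """Compressed size divided by original size (smaller is better)."""
    compressed = zlib.compress(data, 9)  # 9 = strongest compression level
    assert zlib.decompress(compressed) == data  # lossless: the round trip is exact
    return len(compressed) / len(data)

text = ("This is an example. " * 200).encode("ascii")
print(f"{len(text)} bytes -> ratio {compression_ratio(text):.3f}")
```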
Requirements
Starting and ending points of symbols must be clear.
(Simplistic) example: four symbols must be encoded:
0 10 110 1110
• All symbols end with a zero
• Any zero ends a symbol
• Any one continues a symbol
• Average number of bits per symbol depends on how often each symbol occurs: 2.5 if all four are equally frequent ((1 + 2 + 3 + 4) / 4)
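Because every code word ends at its first 0, a decoder needs no separators between symbols. A sketch of this four-symbol code; the symbol names A-D are hypothetical placeholders, since the slide does not name the symbols:

```python
# Hypothetical symbol names for the slide's four code words.
CODE = {"A": "0", "B": "10", "C": "110", "D": "1110"}
DECODE = {bits: sym for sym, bits in CODE.items()}

def encode(symbols: str) -> str:
    return "".join(CODE[s] for s in symbols)

def decode(bits: str) -> str:
    out, current = [], ""
    for bit in bits:
        current += bit       # any 1 continues the current symbol
        if bit == "0":       # any 0 ends it
            out.append(DECODE[current])
            current = ""
    if current:
        raise ValueError("bit string ended in the middle of a symbol")
    return "".join(out)

def average_length(probabilities: dict) -> float:
    """Expected bits per symbol, given each symbol's probability."""
    return sum(p * len(CODE[s]) for s, p in probabilities.items())

print(decode("0101101110"))  # ABCD
print(average_length({"A": 0.25, "B": 0.25, "C": 0.25, "D": 0.25}))  # 2.5
```

If the symbols are not equally frequent, giving the shortest code to the most frequent one lowers the average, which is the idea behind the variable-length codes on the previous slide.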
Example
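• 12 symbols
  – digits 0-9
  – decimal point and space (end of number)

Codes, read from a binary code tree with 0 and 1 on its branches:

| Symbol | Code |
|---|---|
| 0 | 00 |
| 1 | 010 |
| 2 | 0110 |
| 3 | 01110 |
| 4 | 011110 |
| 5 | 011111 |
| 6 | 10 |
| 7 | 110 |
| 8 | 1110 |
| 9 | 11110 |
| _ (space) | 111110 |
| . | 111111 |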
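The twelve-symbol code on this slide is a prefix code: no code word is the beginning of another, so a bit stream splits unambiguously. A sketch that encodes and decodes numbers with exactly these code words (the function names are illustrative):

```python
CODE = {
    "0": "00", "1": "010", "2": "0110", "3": "01110", "4": "011110",
    "5": "011111", "6": "10", "7": "110", "8": "1110", "9": "11110",
    " ": "111110",  # "_" on the slide: space, i.e. end of number
    ".": "111111",
}
DECODE = {bits: sym for sym, bits in CODE.items()}

def encode(text: str) -> str:
    return "".join(CODE[ch] for ch in text)

def decode(bits: str) -> str:
    out, current = [], ""
    for bit in bits:
        current += bit
        if current in DECODE:  # prefix property: the first match is the symbol
            out.append(DECODE[current])
            current = ""
    if current:
        raise ValueError("incomplete code word at end of input")
    return "".join(out)

bits = encode("3.14 ")
print(bits, len(bits))  # 26 bits, versus 5 * 8 = 40 in 8-bit ASCII
```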
Efficient Coding
Huffman coding (gzip)
1. count the number of times each symbol occurs
2. start with the two least frequent symbols
   a) combine them using a tree
   b) put 0 on one branch, 1 on the other
   c) combine their counts and treat the pair as a single symbol
3. continue combining the two least frequent remaining symbols in the same way until every symbol is assigned a place in the tree
4. read the codes from the top of the tree down to each symbol
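The four steps can be sketched in Python with a priority queue (`heapq`) holding the current subtrees, ordered by count. This shows the basic algorithm only; gzip's DEFLATE format adds further rules (such as a limit on code length) that are not modeled here.

```python
import heapq
from collections import Counter
from itertools import count

def huffman_codes(text: str) -> dict:
    """Map each distinct symbol in text to a Huffman code (a string of 0s and 1s)."""
    freqs = Counter(text)  # step 1: count how often each symbol occurs
    if not freqs:
        return {}
    if len(freqs) == 1:
        # Only one distinct symbol: give it a one-bit code.
        return {sym: "0" for sym in freqs}
    tiebreak = count()  # unique sequence numbers so the heap never compares subtrees
    # Heap entries are (count, sequence number, subtree); a subtree is either
    # a symbol or a (left, right) pair of subtrees.
    heap = [(n, next(tiebreak), sym) for sym, n in freqs.items()]
    heapq.heapify(heap)
    while len(heap) > 1:
        # steps 2-3: repeatedly take the two least frequent entries ...
        n1, _, left = heapq.heappop(heap)
        n2, _, right = heapq.heappop(heap)
        # ... and combine them into one node whose count is the sum of theirs
        heapq.heappush(heap, (n1 + n2, next(tiebreak), (left, right)))
    root = heap[0][2]

    codes = {}
    def walk(node, prefix):
        # step 4: read codes from the top of the tree down to each symbol
        if isinstance(node, tuple):
            walk(node[0], prefix + "0")  # 0 on one branch
            walk(node[1], prefix + "1")  # 1 on the other
        else:
            codes[node] = prefix
    walk(root, "")
    return codes

text = "aaaabbc"
codes = huffman_codes(text)
encoded = "".join(codes[ch] for ch in text)
print(codes, len(encoded))  # 10 bits, versus 7 * 8 = 56 bits in 8-bit ASCII
```

Ties between equal counts can be broken either way, so different implementations may produce different but equally efficient codes.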
Information Theory
• Mathematical theory of communication
  – How many bits are needed in an efficient variable-length encoding?
  – How much information is in a chunk of data?
  – How can the capacity of an information medium be measured?
• Probabilistic model of information
  – “Noisy channel” model
  – less frequent ≈ more surprising ≈ more informative
• Measures information using the notion of entropy
Noisy Channel
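[Figure: source symbols 1 and 0 on one side, destination symbols 1 and 0 on the other, joined by every possible path from source to destination.]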
We measure the probability of each possible path (correct reception and errors)
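For a binary channel, the path probabilities follow from two numbers. A sketch under assumptions not stated on the slide: the channel is symmetric, flipping each bit with the same error probability `e`, and the source sends 1 with probability `q` (both parameter names are hypothetical).

```python
def path_probabilities(q: float, e: float) -> dict:
    """Probability of each (sent, received) path through a binary symmetric channel.

    q: probability that the source sends 1 (assumed parameter)
    e: probability that a bit is flipped in transit (assumed parameter)
    """
    return {
        (1, 1): q * (1 - e),        # 1 sent, 1 received (correct)
        (1, 0): q * e,              # 1 sent, 0 received (error)
        (0, 0): (1 - q) * (1 - e),  # 0 sent, 0 received (correct)
        (0, 1): (1 - q) * e,        # 0 sent, 1 received (error)
    }

paths = path_probabilities(q=0.5, e=0.1)
error_rate = paths[(1, 0)] + paths[(0, 1)]
print(paths, error_rate)
```

The four path probabilities always sum to 1, and the two error paths together give the overall error rate.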
Entropy
• Entropy of a symbol is calculated from its probability of occurrence: the number of bits required is $h_s = -\log_2 p_s$
  Average entropy: $H(p) = -\sum_i p_i \log_2 p_i$
• Related to variance (both describe how spread out, or unpredictable, a distribution is)
• Measured in bits (log2)
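The entropy formulas translate directly into code; note the minus sign, since the base-2 logarithm of a probability is negative. A minimal sketch in Python, with probabilities given as a list that sums to 1:

```python
import math

def surprisal(p: float) -> float:
    """Bits of information in a symbol with probability p: -log2(p)."""
    return -math.log2(p)

def entropy(probs: list) -> float:
    """Average information H(p) = -sum(p_i * log2(p_i)); zero-probability terms contribute 0."""
    return -sum(p * math.log2(p) for p in probs if p > 0)

print(surprisal(1 / 8))                    # 3.0: a 1-in-8 symbol carries 3 bits
print(entropy([0.5, 0.5]))                 # 1.0: a fair coin flip
print(entropy([0.25] * 4))                 # 2.0: four equally likely symbols
print(entropy([0.5, 0.25, 0.125, 0.125]))  # 1.75
```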
Base 2 Logarithms
$2^{\log_2 x} = x$; e.g. $\log_2 2 = 1$, $\log_2 4 = 2$, $\log_2 8 = 3$, etc.
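Often we round up to the nearest power of two: for $n$ symbols, $\lceil \log_2 n \rceil$ is the minimum number of bits in a fixed-length code (e.g. 12 symbols → 16 = $2^4$ → 4 bits)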
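In code, the minimum number of bits for a fixed-length code over n symbols is the base-2 logarithm of n rounded up. A sketch that avoids floating-point rounding by using Python's integer `bit_length`:

```python
def min_bits(n: int) -> int:
    """Smallest k with 2**k >= n, i.e. ceil(log2(n)), for n >= 1."""
    if n < 1:
        raise ValueError("need at least one symbol")
    return (n - 1).bit_length()

print(min_bits(12))   # 4: twelve symbols fit in 16 = 2**4 codes
print(min_bits(128))  # 7: the 128 ASCII codes need 7 bits
```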
Unicode
• Administered by the Unicode Consortium
• Assigns a unique code to every written symbol (21 bits; code points U+0000 to U+10FFFF, 1,114,112 in all, although 21 bits could hold 2,097,152)
  – UTF-32: four-byte fixed-length code
  – UTF-16: two- or four-byte variable-length code
  – UTF-8: one- to four-byte variable-length code
• In UTF-8: ASCII block (one byte) + rest of the Basic Multilingual Plane (2-3 bytes) + supplementary planes (4 bytes)
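Python's string encoders show these byte counts. A sketch using characters from different ranges ("A" from the ASCII block, "é" and "€" from the Basic Multilingual Plane, an emoji from a supplementary plane); the `-le` codec variants are used so that no byte-order mark is added to the output:

```python
SAMPLES = {
    "A": "A",               # U+0041, ASCII block
    "e-acute": "\u00e9",    # U+00E9, Basic Multilingual Plane
    "euro": "\u20ac",       # U+20AC, Basic Multilingual Plane
    "emoji": "\U0001F600",  # U+1F600, supplementary plane
}

def byte_lengths(ch: str) -> dict:
    """Number of bytes ch occupies in UTF-8, UTF-16 and UTF-32."""
    return {
        "utf-8": len(ch.encode("utf-8")),
        "utf-16": len(ch.encode("utf-16-le")),  # -le: no byte-order mark
        "utf-32": len(ch.encode("utf-32-le")),
    }

for name, ch in SAMPLES.items():
    print(f"{name:8} U+{ord(ch):04X} {byte_lengths(ch)}")
```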