Fundamentals of Computer Science (Fondamenti di Informatica) - ce.uniroma2.it · Introduction to the course and to computer science Prof....
The teacher
n Emiliano Casalicchio
[email protected]
n Consulting hours
When: Monday 16:00-18:00
Where: room D1-17 (1st floor), “Ingegneria dell’Informazione” building
Textbook and teaching material
n Textbook
Engineering Computation with MATLAB®, Second Edition, David M. Smith
n The MATLAB® tool
http://www.mathworks.com (student edition available for $89)
n More material
Slides presented and suggested readings
Slides don’t replace the textbook: they are complementary tools.
Both are essential to succeed!
Authoritative information
n Official source information and teaching material
http://www.ce.uniroma2.it/courses/FOI/
» friends » forums » Other
18 March 2011 · Emiliano Casalicchio (C)
Definitions
n Several definitions
1. the “science of computers” aka: Computer Science
“Computer science is no more about computers than astronomy is about telescopes.”
E. Dijkstra (Dutch computer scientist, 1930-2002; Turing Award in 1972)
2. the “science of information” or
3. “the science of information representation and (automatic) processing” aka: Informatics
Definitions
n “the science of information representation and (automatic) processing”
n what is information? why are we interested in its representation and (automatic) processing?
n actually, we are interested in “solving problems”
more precisely, in systematic ways of solving problems
Computer Science is concerned with this
A problem to be solved
n what do we need to solve this problem “abstractly”?
a representation of the involved entities
a procedure to be followed (based on the adopted representation)
n representation: a way of expressing the relevant information for the problem at hand
n procedure: a step-by-step process to be followed to get the solution, a “recipe”
Representation + Process
n “the science of information representation and (automatic) processing”
now it should be a bit clearer what we are talking about
the meaning of “automatic” remains to be discussed more explicitly later …
the box-and-door problem again …
n box representation
face A: height(A), width(A), e.g.: height(A) = 220 cm, width(A) = 70 cm
face B: height(B), width(B), e.g.: height(B) = 220 cm, width(B) = 110 cm
face C: height(C), width(C), e.g.: height(C) = 70 cm, width(C) = 110 cm
n door representation
height(door), width(door), e.g.: height(door) = 210 cm, width(door) = 80 cm
the box-and-door problem again …
n procedure (“recipe”)
check: height(A) < height(door) AND width(A) < width(door)
if true, OK; else, check face B
check: height(B) < height(door) AND width(B) < width(door)
if true, OK; else, check face C
check: height(C) < height(door) AND width(C) < width(door)
if true, OK; else, FAILURE
n OK? hint: try with the numbers given in the previous slide …
n a new procedure (“new recipe”)
check: height(A) < height(door) AND width(A) < width(door)
if true, OK; else, “rotate” face A
check: width(A) < height(door) AND height(A) < width(door)
if true, OK; else, check face B …
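The two recipes can be tried out on the numbers from the previous slide. The sketch below is only one possible way of writing them down in Python; the function names (`fits`, `first_recipe`, `new_recipe`) are made up for illustration and do not come from the slides.

```python
# Hypothetical helper names; the measurements are the ones given in the slides (cm).

def fits(height, width, door_height, door_width):
    """One 'check' step: does a face pass through the door as oriented?"""
    return height < door_height and width < door_width


def first_recipe(faces, door):
    """Check each face as given, without rotating it."""
    for name, (h, w) in faces.items():
        if fits(h, w, *door):
            return name
    return None  # FAILURE


def new_recipe(faces, door):
    """Check each face, then the same face rotated by 90 degrees."""
    for name, (h, w) in faces.items():
        if fits(h, w, *door):
            return name
        if fits(w, h, *door):
            return name + " (rotated)"
    return None  # FAILURE


faces = {"A": (220, 70), "B": (220, 110), "C": (70, 110)}  # (height, width)
door = (210, 80)                                            # (height, width)

print(first_recipe(faces, door))  # None
print(new_recipe(faces, door))    # C (rotated)
```

With these numbers the first recipe ends in FAILURE, while the new recipe finds that face C passes once rotated, which is the point of the hint.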
“general” procedure
check: height(A) < height(door) AND width(A) < width(door)
if true, OK; else, “rotate” face A
check: width(A) < height(door) AND height(A) < width(door)
if true, OK; else, check face B
…
n we are interested in general procedures: a “parametric” procedure
n does this procedure work only for this box and this door?
“automatic” processing
n “the science of information representation and (automatic) processing”
n box representation
face A: height(A), width(A)
face B: height(B), width(B)
face C: height(C), width(C)
n door representation
height(door), width(door)
n procedure
check: height(A) < height(door) AND width(A) < width(door)
if true, OK; else, “rotate” face A
check: width(A) < height(door) AND height(A) < width(door)
if true, OK; else, check face B
…
representation + procedure
n what do we mean by “automatic”?
a language to write “recipes”
START: recipe processing begins
END: recipe processing ends
input / output
a recipe step
condition (true / false): selection of alternative paths
a “collateral” recipe
X: a named “container”
X ← expr: put a value in a “container”
an example of “recipe”
n the recipe (flowchart), step by step
1. Start
2. read m and n
3. x ← m ; y ← n
4. if x ≠ 0 AND y ≠ 0 is false: write “trivial answer or impossible to calculate”; End
5. r ← remainder of x/y
6. if r = 0 is true: write “the answer is” y; End
7. otherwise: x ← y ; y ← r ; go back to step 5
n “try” this recipe with different pairs of non-negative integers: are you able to get the “final” answer?
n what does it mean? is it necessary to know that, to process this recipe?
a mathematical problem
n determine z = GCD(i, j), with i, j ∈ N
n D(i): the set of the integer divisors of i
D(i) =def { k | i = k·q, k ∈ N+, q ∈ N }, where N+ = N - {0}
n z = GCD(i, j) = max( D(i) ∩ D(j) )
n as a “black box”: i, j → GCD → z
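The definition can be written down almost literally. The following is a minimal sketch, assuming i and j are strictly positive (so that both divisor sets are finite); the names `divisors` and `gcd_by_definition` are chosen here for illustration.

```python
def divisors(i):
    """D(i): the set of positive integers k such that i = k*q (assumes i > 0)."""
    return {k for k in range(1, i + 1) if i % k == 0}


def gcd_by_definition(i, j):
    """z = max(D(i) ∩ D(j)), for i, j > 0."""
    return max(divisors(i) & divisors(j))


print(sorted(divisors(12)))       # [1, 2, 3, 4, 6, 12]
print(gcd_by_definition(12, 18))  # 6
```

This states what the answer is, but it tries every candidate divisor; it says nothing yet about a clever recipe for finding it.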
matching problems with recipes
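i, j → GCD → z
the recipe:
start
read m and n
x ← m ; y ← n
if x ≠ 0 AND y ≠ 0 is false: write “trivial answer or impossible to calculate”; End
r ← remainder of x/y
if r = 0 is true: write “the answer is” y; End
otherwise: x ← y ; y ← r ; repeat from “r ← remainder of x/y”
BTW: this “recipe” to solve the GCD problem is known as “Euclid’s algorithm” (~300 B.C.)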
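The flowchart translates step by step into a short function. This is a sketch of the recipe as drawn, not an optimized implementation; the function name `recipe` and the choice of returning the written text (instead of printing it) are assumptions made here so the result can be checked.

```python
def recipe(m, n):
    """Follow the flowchart; return the text the recipe would write."""
    x, y = m, n                       # x <- m ; y <- n
    if not (x != 0 and y != 0):       # the "false" branch of the first condition
        return "trivial answer or impossible to calculate"
    r = x % y                         # r <- remainder of x/y
    while r != 0:                     # the "false" branch of "r = 0"
        x = y                         # x <- y
        y = r                         # y <- r
        r = x % y                     # back to: r <- remainder of x/y
    return "the answer is " + str(y)


print(recipe(12, 18))  # the answer is 6
print(recipe(0, 5))    # trivial answer or impossible to calculate
```

Note that the machine running this needs no idea that the result is a greatest common divisor: it only moves values between containers and compares them.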
Selecting Baseball Cards – The Problem
n For example, say you have a big collection of baseball cards and you want to find the names of the 10 “qualified” players with the highest lifetime batting averages.
n To qualify, the players must have been in the league at least 5 years, had at least 100 plate appearances per year, and made fewer than 10 errors per year.
n The cards contain all the relevant information for each player. You just have to organize the cards to solve the problem.
Selecting Baseball Cards – The Steps
Clearly there are a number of steps between the stack of cards and the solution. In no particular order these are:
a. Write down the names of the players from some cards
b. Sort the stack of cards by the lifetime batting average
c. Select all players from the stack with 5 years or more in the league
d. Select all players from the stack with fewer than 10 errors per year
e. Select all players from the stack with over 100 plate appearances per year
f. Keep the first 10 players from the stack
Selecting Baseball Cards – The recipes
The solution might be:
c. Select all players from the stack with 5 years or more in the league
d. Select all players from the stack with fewer than 10 errors per year
e. Select all players from the stack with over 100 plate appearances per year
(steps c, d and e can be done in any order)
b. Sort the stack of cards by the lifetime batting average
f. Keep the first 10 players from the stack
a. Write down the names of the players from these cards
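The same sequence of steps can be sketched in Python. Everything about the data here is hypothetical: the `Card` fields, the player names and the numbers are invented for illustration, and the plate-appearance test uses "at least 100" as in the problem statement.

```python
from dataclasses import dataclass


@dataclass
class Card:
    # Hypothetical fields: one card holds the relevant data for one player.
    name: str
    years_in_league: int
    plate_appearances_per_year: float
    errors_per_year: float
    batting_average: float


def top_players(cards, how_many=10):
    stack = [c for c in cards if c.years_in_league >= 5]                # step c
    stack = [c for c in stack if c.errors_per_year < 10]                # step d
    stack = [c for c in stack if c.plate_appearances_per_year >= 100]   # step e
    stack.sort(key=lambda c: c.batting_average, reverse=True)           # step b
    stack = stack[:how_many]                                            # step f
    return [c.name for c in stack]                                      # step a


cards = [
    Card("Player A", 6, 450, 4, 0.310),
    Card("Player B", 3, 500, 2, 0.340),   # too few years in the league
    Card("Player C", 8, 120, 12, 0.325),  # too many errors per year
    Card("Player D", 10, 380, 7, 0.298),
]
print(top_players(cards, how_many=2))  # ['Player A', 'Player D']
```

Swapping the three selection lines does not change the result, while sorting must come before keeping the first ten: this is the sense in which only some steps can be done "in any order".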
Computer Science / Informatics is about …
n “recipes” to solve problems: the methodological facet
problem solving and information management
n “machines” to process recipes: the technological facet
(presently) electronic computing devices and systems that use them
n both have a long history
methodologies (Euclid ~300 B.C., …, al-Khwarizmi ~800 A.D., …, Hilbert, Gödel, Turing ~1900 A.D., …)
machines (abacus ? B.C., …, Babbage engine ~1800 A.D., …, ENIAC ~1940 A.D., …)
the actual “birthdate” of computer science
n when technology advances made it realistic to build machines able to process recipes millions (and more …) of times faster than humans
around the middle of the past century …
n the MACHINE (computer) does the recipe
however hard, tedious and complex it is
Crank through a million genomes? Find one person on a campus of 30,000? Process a million dots on the screen or a billion sound samples? No problem!
BTW: that’s media computation
n a word of caution: we do not (yet?) have recipes and machines to solve every problem
intractable problems
non-computable problems
Computer science is concerned with the study of recipes
n Computer scientists study…
How the recipes are written: algorithms, software engineering
The “units” used in the recipes: data structures, databases
What recipes can be written for: systems, intelligent systems, theory
How well the recipes work: human-computer interfaces
Specialized Recipes
n computer scientists can also specialize in special kinds of recipes
recipes that create pictures, sounds, movies, animations (graphics, computer music)
like other people specialize in crepes or barbecue ...
n still others look at emergent properties of computer “recipes”
what happens when lots of recipes talk to one another (networking, non-linear systems) …
n our focus will be on recipes to solve engineering problems
computing a liquid level, measuring a solid object, controlling robot arm motion, encryption, processing images
n despite specialization, these recipes share several core concepts with other C.S. fields
playing a particular game, we will learn general rules …
core concept
information representation
n “recipes” work on abstractions (representations) … and we know what they mean
n “machines” execute recipes to manipulate representations … without knowing what they mean
What computers understand
n quotation from previous slide: “machines” execute recipes to manipulate representations without knowing what they mean
n It’s not really multimedia at all. It’s unimedia (said Nicholas Negroponte, founder of MIT Media Lab)
Everything is 0’s and 1’s
n Computers are exceedingly stupid
The only data they understand is 0’s and 1’s
They can only do the most simple things with those 0’s and 1’s:
Move this value here
Add, multiply, subtract, divide these values
Compare these values, and if one is less than the other, go follow this step rather than that one
n Done fast enough, those simple things can be amazing.
How a computer works
n just an outline …
n The part that does the adding and comparing is the Central Processing Unit (CPU).
n The CPU talks to the memory
Think of it as a sequence of millions of mailboxes, each one byte in size, each of which has a numeric address
n The hard disk provides 10 times or more the storage of main memory, but is millions of times slower
n The display is the monitor or LCD (or whatever)
Let’s take a step back to the 19th century
n Babbage’s difference engine
n Designed in 1847-1849 (Difference Engine No. 2)
n Realized in 1991 (Science Museum in London)
The Von Neumann architecture
n Conceived around 1940
ENIAC
Suggested reading
http://en.wikipedia.org/wiki/John_von_Neumann
Central Processing Unit (CPU)
n is in charge of executing the operations of a recipe
the recipe is stored in memory
n Execution cycle
1. Fetch an operation (instruction) from memory
each operation is coded according to predefined “rules”
2. Decode operation
3. Execute operation
n Each CPU is characterized by its own operation codes (machine language):
0100 0000 0000 1000
0100 0000 0000 1001
0000 0000 0000 1000
...
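The execution cycle can be imitated with a toy machine. The instruction format below is purely hypothetical (a 4-bit operation code followed by a 12-bit memory address, with invented codes for LOAD, ADD, STORE and HALT); it is not the encoding of the bit patterns shown on the slide, nor that of any real CPU.

```python
# Hypothetical 16-bit format: 4-bit operation code + 12-bit memory address.
HALT, LOAD, ADD, STORE = 0b0000, 0b0001, 0b0010, 0b0011


def run(memory):
    """Repeat fetch / decode / execute until a HALT instruction is met."""
    pc = 0    # program counter: address of the next instruction
    acc = 0   # accumulator: the single working register of this toy CPU
    while True:
        instruction = memory[pc]        # 1. fetch
        pc += 1
        opcode = instruction >> 12      # 2. decode: top 4 bits
        address = instruction & 0x0FFF  #    decode: low 12 bits
        if opcode == HALT:              # 3. execute
            return memory
        elif opcode == LOAD:
            acc = memory[address]
        elif opcode == ADD:
            acc += memory[address]
        elif opcode == STORE:
            memory[address] = acc
        else:
            raise ValueError("unknown operation code")


# Program and data share the same memory: cells 0-3 hold the recipe, 4-6 the data.
memory = [
    0b0001_0000_0000_0100,  # LOAD  4
    0b0010_0000_0000_0101,  # ADD   5
    0b0011_0000_0000_0110,  # STORE 6
    0b0000_0000_0000_0000,  # HALT
    2, 3, 0,
]
run(memory)
print(memory[6])  # 5
```

The recipe and the data sit side by side in the same list of numbers; only the program counter decides which cells are treated as instructions.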
Memory
n Main memory: central memory
stores data and operations of running programs … in binary format
volatile
“random access” (RAM): constant access time
SRAM, DRAM, etc.
fast (~10-100 ns), expensive
“limited” capacity (up to a few Gigabytes)
hierarchical structure: 1st, 2nd level cache, …
n Secondary memory: hard disk, CD, etc.
non-volatile
large capacity (several hundred Gigabytes)
slow (~ms and more), cheap
Central Memory
n Consisting of cells (locations), each of them consisting in turn of a fixed number of binary elements
each binary element can store (represent) only two values: 0 or 1
binary digit -> bit
usually: one cell = 1 byte (8 bits)
n Each cell is associated with an address in the range [0, 1, …, M-1]
M: memory dimension
main memory can be seen as a “vector” of bytes
n CPU reads/writes cell content by specifying the cell address
read: to fetch the content of a memory cell
write: to modify the content of a memory cell
m-bit address ⇒ address space of 2^m cells
not necessarily: M = 2^m
[Figure: memory drawn as a column of 8-bit cells, from byte 0 and byte 1 up to byte M-2 and byte M-1]
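The “vector of bytes” view maps naturally onto Python’s built-in `bytearray`. This is a minimal sketch: the memory dimension M = 16 and the helper names `read` and `write` are arbitrary choices for illustration.

```python
M = 16                   # memory dimension: number of one-byte cells (arbitrary)
memory = bytearray(M)    # cells at addresses 0 .. M-1, all initially 0


def read(address):
    """Fetch the content of the cell at the given address."""
    return memory[address]


def write(address, value):
    """Modify the content of a cell; a cell holds 8 bits, so 0 <= value <= 255."""
    memory[address] = value


write(3, 74)
print(read(3))   # 74

m = 4            # number of address bits
print(2 ** m)    # 16: an m-bit address can name 2^m different cells
```

Trying `write(0, 256)` raises a `ValueError`, because 256 does not fit in the 8 bits of a cell.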
Layers of abstraction
n usually, we do not have this “raw” view of a computer
n high level operations
mathematical functions (log, sine, …)
text processing
...
n high level memory
set of cells identified by a name
user-defined high level cell content: text, image, set of …, …
secondary memory organized as a set of named files
n obtained through stratified layers of software
Application Software
System Software
Hardware
Key Concept: Encodings
n We can interpret the 0’s and 1’s in computer memory any way we want.
We can treat them as numbers.
We can encode information in those numbers
n Even the notion that the computer understands numbers is an interpretation
We encode the voltages on wires as 0’s and 1’s, eight of these defining a byte
Which we can, in turn, interpret as a decimal number
BTW: why do we interpret this string of 0’s and 1’s as 74?
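The question can be answered with positional notation: each bit contributes a power of 2 according to its position. The figure with the actual bit string is not part of this transcript, so the sketch below assumes it showed 01001010, the 8-bit binary form of 74.

```python
bits = "01001010"   # assumed bit string (the original figure is not available)

# Rightmost bit has weight 2^0, the next 2^1, and so on.
value = 0
for position, bit in enumerate(reversed(bits)):
    value += int(bit) * 2 ** position

print(value)         # 74  (64 + 8 + 2)
print(int(bits, 2))  # 74, the same conversion done by the built-in int()
```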
Layer the encodings as deep as you want
n ASCII encoding for characters
“A” coded as 65
“B” coded as 66
…
If there’s a byte with a 65 in it, and we decide that it’s a character, POOF! It’s an “A”!
n We can string lots of these numbers together to make usable text
“77, 97, 114, 107” stands for “Mark”
“60, 97, 32, 104, 114, 101, 102, 61” stands for “<a href=” (HTML)
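These encodings are easy to check with Python’s built-in `ord`, `chr` and `bytes`; the snippet is just a quick verification of the numbers on the slide.

```python
print(ord("A"), ord("B"))   # 65 66
print(chr(65))              # A

mark = bytes([77, 97, 114, 107])
print(mark.decode("ascii"))  # Mark

link = bytes([60, 97, 32, 104, 114, 101, 102, 61])
print(link.decode("ascii"))  # <a href=
```

The bytes themselves never change; calling `decode("ascii")` is the “decision” to read them as characters.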
Layered encodings
n A number is just a number
n If you have to treat it as a letter, there’s a piece of software that does it
For example, that associates 65 with the graphical representation for “A”
n If you have to treat it as part of an HTML document, there’s a piece of software that does it
That understands that “<A HREF=” is the beginning of a link
n That part that knows HTML communicates with the part that knows that 65 is an “A”
Multimedia is unimedia
n But that same byte with a 65 in it might be interpreted as…
A very small piece of sound (e.g., 1/44100-th of a second)
The amount of redness in a single dot in a larger picture
The amount of redness in a single dot in a larger picture which is a single frame in a full-length motion picture
n We use software to manage all these layers
How do you decide what a number should mean, and how you should organize your numbers to represent all the data you want?
That’s data structures
n If that sounds like a lot of data, it is
To represent all the dots on your screen probably takes more than 3,145,728 bytes
Each second of sound on a CD takes 44,100 samples (176,400 bytes at 2 bytes per sample, in stereo)
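The figures can be reproduced with a little arithmetic. The screen resolution (1024 x 768) and the 4 bytes per dot are assumptions chosen here because they give exactly 3,145,728; the slide does not say how that number was obtained. The CD parameters (44,100 samples per second, 16-bit samples, two channels) are those of standard CD audio.

```python
# Screen: assumed 1024 x 768 dots, 4 bytes per dot (an assumption, see above).
width, height, bytes_per_dot = 1024, 768, 4
screen_bytes = width * height * bytes_per_dot
print(screen_bytes)          # 3145728

# CD audio: 44,100 samples per second, 2 bytes per sample, 2 channels (stereo).
samples_per_second = 44_100
bytes_per_sample = 2
channels = 2
cd_bytes_per_second = samples_per_second * bytes_per_sample * channels
print(cd_bytes_per_second)   # 176400
```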
Why digitize media?
n We work with digital encodings of media (digitization)
n Digitizing media is encoding media into numbers
Real media is analogue (continuous).
To digitize it, we break it into parts where we can’t perceive the parts.
n By converting media into digital format, we can more easily manipulate them, store them, transmit them without error, etc.
How can it work to digitize media?
n Why does it work that we can break media into pieces and we don’t perceive the breaks?
n We can only do it because human perception is limited.
We don’t see the dots in the pictures, or the gaps in the sounds.
n We can make this happen because we know about physics (science of the physical world) and psychophysics (psychology of how we perceive the physical world)
“talking” with computers
n We need a language to exchange information with computers
data, recipes, …
n Different programming languages are different ways (encodings) to tell computers the same things
Programming languages and layers of abstraction
n Different languages at different layers
n machine language
0100 0000 0000 1000
0100 0000 0000 1001
0000 0000 0000 1000
Sequence of binary instructions, directly executable by the CPU
n Assembler
LOAD X
ADD Y
STORE Z
Instructions in 1-to-1 correspondence with binary instructions, but expressed with symbolic (human understandable) names
n high level language
def fun():
    a = 0
    print(a + 5)
Machine independent. Data abstraction
translation from layer to layer
Program in a high level language
def swap(v, k):
    temp = v[k]
    v[k] = v[k+1]
    v[k+1] = temp
compiler
Program in assembler language (MIPS)
swap:
muli $2, $5, 4
add $2, $4, $2
lw $15, 0($2)
lw $16, 4($2)
sw $16, 0($2)
sw $15, 4($2)
jr $31
assembler
Program in binary machine language (MIPS)
00000000101000010000000000011000
00000000100011100001100000100001
10001100011000100000000000000000
10001100111100100000000000000100
10101100111100100000000000000000
10101100011000100000000000000100
00000011111000000000000000001000
Programming language
n Each programming language is characterized by:
1. syntax
2. semantics
n A natural language sentence can be syntactically correct, but with no meaning at all!
the grass reads the house
n The same holds for sentences in programming languages.
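A rough analogue in Python: some text is rejected because it breaks the grammar, while other text is grammatically fine but asks for something that has no meaning. The helper `classify` below is a hypothetical name introduced here; it relies only on the built-ins `compile` and `exec`.

```python
def classify(source):
    """Say whether a piece of Python text fails on syntax, on meaning, or not at all."""
    try:
        code = compile(source, "<example>", "exec")   # syntax check only
    except SyntaxError:
        return "syntax error"
    try:
        exec(code, {})                                # now try to carry it out
    except Exception as error:
        return "runs into " + type(error).__name__
    return "ok"


print(classify("x = = 1"))      # syntax error
print(classify("'grass' + 5"))  # runs into TypeError
print(classify("x = 1 + 5"))    # ok
```

The second example plays the role of “the grass reads the house”: it is a well-formed sentence, yet adding a number to a piece of text is not something Python can give a meaning to.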
Why should you need to study computer science? or “recipes”?
n To understand better the “recipe-way” of thinking
It’s influencing everything, from computational science to bioinformatics
Eventually, it’s going to become part of everyone’s notion of a liberal education
That’s the process argument
n BTW, to work with and manage computer scientists
n AND … to communicate!
Writers, marketers, producers communicate through computation
n We’ll somehow take these in opposite order