CS194-3/CS16x Introduction to Systems Lecture 1 What is a “System”? August 27, 2007 Prof. Anthony D. Joseph http://www.cs.berkeley.edu/~adj/cs16x

CS194-3/CS16x Introduction to Systems

Lecture 1

What is a “System”?

August 27, 2007

Prof. Anthony D. Joseph

http://www.cs.berkeley.edu/~adj/cs16x

Lec 1.2 8/27/07 Joseph CS194-3/16x ©UCB Fall 2007

Who am I?

Professor Anthony D. Joseph
• 465 Soda Hall (RAD Lab)
• adj AT cs.berkeley.edu
• Office hours Mon/Tue 1-2pm in 413 Soda

• Background:
  – MIT undergrad and grad student

• Research areas:
  – Current: Network security, OS security, very large security testbeds
  – Other: Mobile computing, wireless networking, cellular telephony

Goals for Today

• Motivation for a new course
• Topics:
  – Operating systems, Databases, Networking, Security, Software engineering, Distributed systems

• Complexity

Interaction is important!

Ask Questions!

Note: Some slides and/or pictures in the following are adapted from slides ©2005 Silberschatz, Galvin, and Gagne. Slides courtesy of Kubiatowicz, AJ Shankar, George Necula, Alex Aiken, Eric Brewer, Ras Bodik, Ion Stoica, Doug Tygar, and David Wagner.

Why Change CS 162?

• Only minor changes since the early 1990s…
  – Slides!
  – Java version of Nachos
  – Content: More crypto/security, less databases and distributed filesystems
  – Time to update again!!

• Most CS students take CS 162 and 186
  – But not all take EE 122, CS 169/161
  – We’d like all students to have a basic understanding of key concepts from these classes

• Each class introduces the same topics with class-specific biases
  – Concurrency in an Operating System versus in a Database
  – Introduce concepts with a common framework

Rapid Underlying Technology Change

• “Cramming More Components onto Integrated Circuits”
  – Gordon Moore, Electronics, 1965

Computing Devices Everywhere

People-to-Computer Ratio Over Time

From David Culler

Increasing Software Complexity

From MIT’s 6.033 course

But, Latency Improves Slowly…

From MIT’s 6.033 course

Heat is a Major Problem!

From MIT’s 6.033 course

The Internet

The Dark Side of the Internet…

Zombie Networks

• Click on the link and you join the STORM zombie network of 250K-10M “0wned” PCs

• Zombies used by malicious hackers (crackers) for phishing, spamming, identity theft, extortion

• Crackers build zombie networks of 10K-1M compromised machines & sell services
  – Ex: Take down competitor's website for $1K

• Hugely profitable!
  – Massive spamming, ID fraud through phishing
  – Roughly half of all spam is sent by zombies

• How can we secure our machines against folks like this?

Complexity

• How to manage complexity at all levels?

• Many issues and many tradeoffs

• Need a global view of systems
  – Decompose into components

• Need a global understanding of systems
  – Applications, networks, databases, operating systems, security, software engineering…

Course Administration

• Instructor: Anthony D. Joseph (adj AT cs.berkeley.edu)
  – 465 Soda Hall
  – Office Hours: M/Tu 1-2 in 413 Soda

• TAs: Kai Xia

• Website: http://www.cs.berkeley.edu/~adj/cs16x

• Reader and book: TBA
  – Most likely: Silberschatz, Galvin, and Gagne, Operating System Concepts, 7th Ed., 2005

• Projects: First project will likely be Nachos-based

• Grading: TBA

Topic Coverage

• Managing complexity (abstractions, layering, modularity)
• Team programming, IDEs, documentation style
• OS, memory, database, and network security
• Kernel and address spaces, address translation, caching, TLBs, demand paging
• I/O systems, file systems, directories, database buffer pools, tuple layouts, files of tuples
• Internet evolution, architectures, protocols, routing, P2P and overlay networks
• Concurrency, processes, threads, ACID
• Enforcing mutual exclusion, serializability, 2PL, logging, recovery, deadlock
• Viruses, worms, botnets, and DDoS
• Cryptographic algorithms: RSA, MD5, DES
• Simple authentication protocols, PKI
• Query (dataflow) operators, map-reduce

Creating Software Is Awesome

• It’s like art:
  – There’s a vision, a realization, an aesthetic appeal, a sense of ownership and satisfaction

• It’s not like art:
  – The end result is useful
    » To you, and anyone else

• It’s immensely satisfying to do
  – Your project is your baby
    » It’ll keep you up at night, make you proud…
    » But won’t disown you when it’s 14 (though you might disown it)

• Good software engineering can be learned
  – But it is hard to teach
  – Most people only learn through experience (making mistakes)

Group Project Simulates Industrial Environment

• Project teams have 4 or 5 members in the same discussion section
  – Must work in groups in “the real world”

• Communicate with colleagues (team members)
  – Communication problems are natural
  – What have you done?
  – What answers do you need from others?
  – You must document your work!!!
  – Everyone must keep an on-line notebook

• Communicate with supervisor (TAs)
  – How is the team’s plan?
  – Short progress reports are required:
    » What is the team’s game plan?
    » What is each member’s responsibility?

Typical Lecture Format

• 1-Minute Review
• 20-Minute Lecture
• 5-Minute Administrative Matters
• 25-Minute Lecture
• 5-Minute Break (water, stretch)
• 25-Minute Lecture
• Instructor will come to class early & stay after to answer questions

[Figure: student attention over time across the lecture (20 min., break, 25 min., break, 25 min., ending with “In Conclusion, ...”)]

Academic Dishonesty Policy

• Copying all or part of another person's work, or using reference material not specifically allowed, are forms of cheating and will not be tolerated. A student involved in an incident of cheating will be notified by the instructor and the following policy will apply:

http://www.eecs.berkeley.edu/Policies/acad.dis.shtml

• The instructor may take actions such as:
  – require repetition of the subject work,
  – assign an F grade or a 'zero' grade to the subject work,
  – for serious offenses, assign an F grade for the course.

• The instructor must inform the student and the Department Chair in writing of the incident, the action taken, if any, and the student's right to appeal to the Chair of the Department Grievance Committee or to the Director of the Office of Student Conduct.

• The Office of Student Conduct may choose to conduct a formal hearing on the incident and to assess a penalty for misconduct.

• The Department will recommend that students involved in a second incident of cheating be dismissed from the University.

Computer System Organization

• Computer-system operation
  – One or more CPUs, device controllers connect through common bus providing access to shared memory
  – Concurrent execution of CPUs and devices competing for memory cycles
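As a rough illustration of CPUs and devices “competing for memory cycles,” here is a minimal Python sketch of a bus arbiter that grants one memory cycle per bus cycle in round-robin order. The round-robin policy, the requester names (`cpu0`, `cpu1`, `disk`), and the function `round_robin_bus` are assumptions for illustration only; real arbitration happens in hardware and may use fixed priorities instead.

```python
from collections import deque


def round_robin_bus(requests, cycles):
    """Grant one memory cycle per bus cycle, rotating among requesters.

    requests: dict mapping a (hypothetical) requester name to how many
    memory accesses it still wants. Returns granted names in cycle order.
    """
    pending = dict(requests)      # remaining accesses per requester
    order = deque(requests)       # rotation order (dict insertion order)
    grants = []
    for _ in range(cycles):
        # Try each requester once, starting at the head of the rotation.
        for _ in range(len(order)):
            who = order[0]
            order.rotate(-1)      # move the current head to the back
            if pending[who] > 0:
                pending[who] -= 1
                grants.append(who)
                break
        else:
            break                 # nobody wants the bus any more: stop early
    return grants


grants = round_robin_bus({"cpu0": 3, "cpu1": 1, "disk": 2}, cycles=10)
print(grants)  # ['cpu0', 'cpu1', 'disk', 'cpu0', 'disk', 'cpu0']
```

Rotating after every grant keeps one busy requester (here `cpu0`) from starving the others, at the cost of ignoring any notion of urgency.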

Example: Some Mars Rover Requirements

• Serious hardware limitations/complexity:
  – 20 MHz PowerPC processor, 128 MB of RAM
  – Cameras, scientific instruments, batteries, solar panels, and locomotion equipment
  – Many independent processes work together

• Can’t hit the reset button very easily!
  – Must reboot itself if necessary
  – Must always be able to receive commands from Earth

• Individual programs must not interfere
  – Suppose the MUT (Martian Universal Translator Module) is buggy
  – Better not crash the antenna positioning software!

• Further, all software may crash occasionally
  – Automatic restart with diagnostics sent to Earth
  – Periodic checkpoint of results saved?

• Certain functions are time critical:
  – Need to stop before hitting something
  – Must track the orbit of Earth for communication
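One way to picture “automatic restart with diagnostics” is a supervisor loop that reruns a crashed task and records what went wrong. This is a minimal Python sketch under stated assumptions: `supervise` and the deliberately buggy `flaky_translator` (standing in for the MUT) are hypothetical, and catching exceptions inside a single process is only a stand-in for the process-level isolation a real OS would provide.

```python
def supervise(task, max_restarts=3):
    """Run task(attempt); if it crashes, log a diagnostic and restart it.

    Returns (result, diagnostics). Raises RuntimeError once restarts run out.
    """
    diagnostics = []  # stand-in for the report that would be "sent to Earth"
    for attempt in range(max_restarts + 1):
        try:
            return task(attempt), diagnostics
        except Exception as exc:  # a crashing task must not take the supervisor down
            diagnostics.append(f"attempt {attempt}: {type(exc).__name__}: {exc}")
    raise RuntimeError(f"task failed {max_restarts + 1} times: {diagnostics}")


def flaky_translator(attempt):
    """Hypothetical buggy module (the MUT): crashes on its first two runs."""
    if attempt < 2:
        raise ValueError("bad input frame")
    return "translation ok"


result, diag = supervise(flaky_translator)
print(result)  # translation ok
print(diag)    # ['attempt 0: ValueError: bad input frame', 'attempt 1: ValueError: bad input frame']
```

Bounding the number of restarts keeps a permanently broken module from looping forever; the final `RuntimeError` is the point where a real system would escalate (for example, to a full reboot).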

How do we tame complexity?

• Every piece of computer hardware is different
  – Different CPU
    » Pentium, PowerPC, ColdFire, ARM, MIPS
  – Different amounts of memory, disk, …
  – Different types of devices
    » Mice, Keyboards, Sensors, Cameras, Fingerprint readers
  – Different networking environment
    » Cable, DSL, Wireless, Firewalls, …

• Questions:
  – Does the programmer need to write a single program that performs many independent activities?
  – Does every program have to be altered for every piece of hardware?
  – Does a faulty program crash everything?
  – Does every program have access to all hardware?
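The question of whether every program must change for every piece of hardware is usually answered with a common device interface plus per-device drivers. A minimal Python sketch, assuming a hypothetical `InputDevice` interface with a single `read_event` method; the drivers fake their hardware state with small lists, and the scancode table is illustrative only.

```python
from abc import ABC, abstractmethod


class InputDevice(ABC):
    """Hypothetical OS-level interface: every input driver looks the same to applications."""

    @abstractmethod
    def read_event(self) -> str:
        """Return the next event as a device-independent string."""


class MouseDriver(InputDevice):
    def __init__(self):
        self._moves = [(1, 0), (0, -2)]  # fake hardware state: pending (dx, dy) motions

    def read_event(self) -> str:
        dx, dy = self._moves.pop(0) if self._moves else (0, 0)
        return f"move {dx} {dy}"


class KeyboardDriver(InputDevice):
    _KEYMAP = {0x1E: "a", 0x30: "b"}  # hypothetical scancode-to-character table

    def __init__(self):
        self._codes = [0x1E, 0x30]  # fake hardware buffer of scancodes

    def read_event(self) -> str:
        code = self._codes.pop(0) if self._codes else None
        return f"key {self._KEYMAP.get(code, '?')}"


def log_events(device: InputDevice, n: int) -> list[str]:
    """Application code: written once against the interface, works with any driver."""
    return [device.read_event() for _ in range(n)]


print(log_events(MouseDriver(), 2))     # ['move 1 0', 'move 0 -2']
print(log_events(KeyboardDriver(), 2))  # ['key a', 'key b']
```

The application function `log_events` never names a specific device; supporting new hardware means writing a new driver, not changing the application.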

OS Tool: Virtual Machine Abstraction

• Software Engineering Problem:
  – Turn hardware/software quirks into what programmers want/need
  – Optimize for convenience, utilization, security, reliability, etc…

• For any OS area (e.g., file systems, virtual memory, networking, scheduling):
  – What’s the hardware interface? (physical reality)
  – What’s the application interface? (nicer abstraction)

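[Figure: layered stack of Application, Operating System, and Hardware; the Virtual Machine Interface lies between the application and the OS, and the Physical Machine Interface between the OS and the hardware]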
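To make the “physical reality” versus “nicer abstraction” contrast concrete, the sketch below wraps a device that can only read and write whole fixed-size blocks in a file-like object that accepts arbitrary byte offsets. `BlockDevice`, `SimpleFile`, and the 4-byte `BLOCK_SIZE` are hypothetical choices made so the example is easy to trace; real disks use much larger blocks, and real file systems add naming, allocation, and caching.

```python
BLOCK_SIZE = 4  # hypothetical, tiny block size so the example is easy to trace


class BlockDevice:
    """Physical reality: can only read or write one whole block at a time."""

    def __init__(self, num_blocks):
        self._blocks = [bytes(BLOCK_SIZE) for _ in range(num_blocks)]

    def read_block(self, n):
        return self._blocks[n]

    def write_block(self, n, data):
        if len(data) != BLOCK_SIZE:
            raise ValueError("must write exactly one block")
        self._blocks[n] = bytes(data)


class SimpleFile:
    """Nicer abstraction: read and write arbitrary byte ranges."""

    def __init__(self, dev):
        self._dev = dev

    def write(self, offset, data):
        pos = 0
        while pos < len(data):
            n, off = divmod(offset + pos, BLOCK_SIZE)   # which block, and where inside it
            chunk = data[pos:pos + BLOCK_SIZE - off]    # bytes that fit in this block
            block = bytearray(self._dev.read_block(n))  # read-modify-write the block
            block[off:off + len(chunk)] = chunk
            self._dev.write_block(n, block)
            pos += len(chunk)

    def read(self, offset, length):
        out = bytearray()
        for pos in range(offset, offset + length):
            n, off = divmod(pos, BLOCK_SIZE)
            out.append(self._dev.read_block(n)[off])
        return bytes(out)


dev = BlockDevice(num_blocks=4)
f = SimpleFile(dev)
f.write(2, b"hello")                         # spans blocks 0 and 1
print(f.read(2, 5))                          # b'hello'
print(dev.read_block(0), dev.read_block(1))  # b'\x00\x00he' b'llo\x00'
```

The application never sees block boundaries; the read-modify-write inside `write` is exactly the kind of hardware quirk the abstraction hides.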

Interfaces Provide Important Boundaries

• Why do interfaces look the way that they do?
  – History, Functionality, Stupidity, Bugs, Management
  – CS152 Machine interface
  – CS160 Human interface
  – EE122 Protocol stack
  – CS169 Software engineering/management

• Should responsibilities be pushed across boundaries?
  – RISC architectures, Graphical Pipeline Architectures

[Figure: the instruction set as the boundary between software (above) and hardware (below)]

Virtual Machines

• Software emulation of an abstract machine
  – Make it look like hardware has features you want
  – Run programs from one hardware & OS on another one

• Programming simplicity
  – Each process thinks it has all memory/CPU time
  – Each process thinks it owns all devices
  – Different devices appear to have the same interface
  – Device interfaces more powerful than raw hardware
    » Bitmapped display → windowing system
    » Ethernet card → reliable, ordered networking (TCP/IP)

• Fault Isolation
  – Processes unable to directly impact other processes
  – Bugs cannot crash whole machine

• Protection and Portability
  – Java interface safe and stable across many platforms
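The idea that “each process thinks it has all memory” is commonly implemented with per-process address translation. Below is a minimal Python sketch using a single-level page table; `AddressSpace`, `PageFault`, `load`, `store`, and the 256-byte `PAGE_SIZE` are illustrative assumptions, not a description of any particular OS.

```python
PAGE_SIZE = 256  # hypothetical tiny page size


class PageFault(Exception):
    """Raised when a process touches a virtual page it has no mapping for."""


class AddressSpace:
    """Per-process page table: virtual page number -> physical frame number."""

    def __init__(self, page_table):
        self.page_table = dict(page_table)

    def translate(self, vaddr):
        vpn, offset = divmod(vaddr, PAGE_SIZE)
        if vpn not in self.page_table:
            raise PageFault(f"virtual address {vaddr:#x} is not mapped")
        return self.page_table[vpn] * PAGE_SIZE + offset


physical_memory = bytearray(4 * PAGE_SIZE)  # 4 physical frames shared by all processes

proc_a = AddressSpace({0: 2})  # A's virtual page 0 lives in physical frame 2
proc_b = AddressSpace({0: 3})  # B's virtual page 0 lives in physical frame 3


def store(proc, vaddr, value):
    physical_memory[proc.translate(vaddr)] = value


def load(proc, vaddr):
    return physical_memory[proc.translate(vaddr)]


store(proc_a, 0x10, 7)
store(proc_b, 0x10, 9)  # same virtual address, different physical byte
print(load(proc_a, 0x10), load(proc_b, 0x10))  # 7 9
```

Both processes use virtual address `0x10`, yet they touch different physical bytes, and a process cannot even name memory outside its own page table; that is the fault-isolation bullet in miniature.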

Four Components of a Computer System

Definition: An operating system implements a virtual machine that is (hopefully) easier and safer to program and use than the raw hardware.

Virtual Machines (con’t): Layers of OSs

• Useful for OS development
  – When OS crashes, restricted to one VM
  – Can aid testing programs on other OSs

What does an Operating System do?

• Silberschatz and Galvin: “An OS is similar to a government”
  – Raises the question: does a government do anything useful by itself?

• Coordinator and Traffic Cop:
  – Manages all resources
  – Settles conflicting requests for resources
  – Prevents errors and improper use of the computer

• Facilitator:
  – Provides facilities that everyone needs
  – Standard libraries, windowing systems
  – Makes application programming easier, faster, less error-prone

• Some features reflect both tasks:
  – E.g., the file system is needed by everyone (Facilitator)
  – But the file system must be protected (Traffic Cop)
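As a traffic cop, the OS (or a runtime built on it) has to settle conflicting requests for the same resource. A lock is one such mechanism; this Python sketch uses `threading.Lock` so that four threads updating a shared counter never interleave their read-modify-write steps. The counter name `balance` and the thread and iteration counts are arbitrary choices for illustration.

```python
import threading

balance = 0
balance_lock = threading.Lock()  # the "traffic cop" for the shared balance


def deposit_many(times):
    global balance
    for _ in range(times):
        with balance_lock:  # only one thread may read-modify-write at a time
            balance += 1


threads = [threading.Thread(target=deposit_many, args=(10_000,)) for _ in range(4)]
for t in threads:
    t.start()
for t in threads:
    t.join()  # wait until every thread has finished

print(balance)  # 40000
```

Without the lock, two threads can read the same old value and both write back `old + 1`, so updates can be lost; with it, the final total is always 4 × 10,000.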

OS Systems Principles

• OS as illusionist:
  – Make hardware limitations go away
  – Provide illusion of dedicated machine with infinite memory and infinite processors

• OS as government:
  – Protect users from each other
  – Allocate resources efficiently and fairly

• OS as complex system:
  – Constant tension between simplicity and functionality or performance

• OS as history teacher
  – Learn from past
  – Adapt as hardware tradeoffs change
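The “infinite processors” illusion is usually produced by time-slicing one CPU among many processes. This minimal Python sketch models each process as a generator and gives each one step in turn from a ready queue. It is cooperative (each process yields voluntarily), whereas a real OS preempts processes with timer interrupts; the names `process` and `round_robin` are hypothetical.

```python
from collections import deque


def process(name, steps):
    """A 'process' modeled as a generator; each yield gives up the CPU."""
    for i in range(steps):
        yield f"{name}{i}"


def round_robin(procs):
    """Run generator-processes on one CPU, one step (time slice) each in turn."""
    ready = deque(procs)
    trace = []
    while ready:
        proc = ready.popleft()
        try:
            trace.append(next(proc))  # run for one time slice
            ready.append(proc)        # not finished: back of the ready queue
        except StopIteration:
            pass                      # process finished; drop it
    return trace


print(round_robin([process("A", 3), process("B", 2)]))
# ['A0', 'B0', 'A1', 'B1', 'A2']
```

From each process's point of view it runs to completion on “its own” CPU; only the interleaved trace reveals that a single CPU was shared.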

BREAK

Data Complexity

• Need to store information?

• Can put it in a file

• Too big or too complex?
  – Use a database

• How big is the web?
  – 400 million hosts
  – 15-30 billion pages (http://www.pandia.com/sew/383-web-size.html)…

• With a billion users looking for information

What is a Database System Today?

More Complex Database Systems

So… What is a Database?

• We will be broad in our interpretation

• A Database:
  – A very large, integrated collection of data.

• Typically models a real-world “enterprise”
  – Entities (e.g., teams, games)
  – Relationships (e.g., the A’s are playing in the World Series)

• It might surprise you how flexible this is
  – Web search:
    » Entities: words, documents
    » Relationships: word in document, document links to document
  – P2P filesharing:
    » Entities: words, filenames, hosts
    » Relationships: word in filename, file available at host
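To show how the web-search case fits the entities-and-relationships view, this Python sketch stores the “word in document” relationship as a set of pairs and then indexes it by word, which is essentially an inverted index. The documents are made-up toy strings, and `search` is a hypothetical helper.

```python
from collections import defaultdict

# Hypothetical toy documents (entities: words and documents).
docs = {
    "d1": "systems manage complexity",
    "d2": "databases manage data",
    "d3": "networks connect systems",
}

# Relationship "word in document", stored as (word, doc) pairs.
word_in_doc = {(w, d) for d, text in docs.items() for w in text.split()}

# Index the relationship by word so lookups avoid scanning every pair.
index = defaultdict(set)
for word, doc in word_in_doc:
    index[word].add(doc)


def search(*words):
    """Documents related to all of the given words."""
    sets = [index.get(w, set()) for w in words]
    return set.intersection(*sets) if sets else set()


print(sorted(search("manage")))             # ['d1', 'd2']
print(sorted(search("systems", "manage")))  # ['d1']
```

The P2P case has the same shape: replace documents with filenames and add a second relationship, “file available at host,” to answer where a matching file can be fetched.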

What is a Database Management System?

• A Database Management System (DBMS) is:
  – A software system designed to store, manage, and facilitate access to databases.

• Typically this term is used narrowly
  – Relational databases with transactions
    » E.g., Oracle, DB2, SQL Server
  – Mostly because they predate other large repositories
    » Also because of technical richness
  – When we say DBMS in this class we will usually follow this convention
    » But keep an open mind about applying the ideas!
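A small example of what “relational databases with transactions” buy you, using Python's built-in `sqlite3` module: a transfer either applies both updates or neither. The `accounts` table, its `CHECK` constraint, and the `transfer` and `balances` helpers are assumptions for illustration.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE accounts (name TEXT PRIMARY KEY, "
    "balance INTEGER NOT NULL CHECK (balance >= 0))"
)
conn.executemany("INSERT INTO accounts VALUES (?, ?)", [("alice", 100), ("bob", 50)])
conn.commit()


def transfer(conn, src, dst, amount):
    """Move money atomically: both updates happen, or neither does."""
    try:
        with conn:  # commits on success, rolls back if an exception escapes
            conn.execute("UPDATE accounts SET balance = balance + ? WHERE name = ?", (amount, dst))
            conn.execute("UPDATE accounts SET balance = balance - ? WHERE name = ?", (amount, src))
        return True
    except sqlite3.IntegrityError:  # CHECK failed: src would go negative
        return False


def balances(conn):
    return dict(conn.execute("SELECT name, balance FROM accounts ORDER BY name"))


print(transfer(conn, "alice", "bob", 30), balances(conn))  # True {'alice': 70, 'bob': 80}
print(transfer(conn, "bob", "alice", 500), balances(conn))  # False {'alice': 70, 'bob': 80}
```

The failing transfer credits `alice` before the debit trips the `CHECK` constraint; because both statements run inside one transaction, the rollback also undoes the credit.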

Is the WWW a DBMS?

• Fairly sophisticated search available
  – Crawler indexes pages on the web
  – Keyword-based search for pages

• But, currently
  – Data is mostly unstructured and untyped
  – Search only:
    » Can’t modify the data
    » Can’t get summaries or complex combinations of data
  – Few guarantees provided for freshness of data, consistency across data items, fault tolerance, …
  – Web sites typically have a (relational) DBMS in the background to provide these functions.

• The picture is changing quickly
  – Information extraction to get structure from unstructured data
  – New standards (e.g., XML, Semantic Web) can help data modeling

“Search” versus Query

• What if you wanted to find out which actors donated to John Kerry’s presidential campaign?

• Try “actors donated to john kerry” in your favorite search engine.

• If it isn’t “published”, it can’t be searched!

A “Database Query” Approach

“Yahoo Actors” JOIN “FECInfo”

(Courtesy of the Telegraph research group @Berkeley)
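The “Yahoo Actors” JOIN “FECInfo” idea can be sketched with two tables in SQLite: one listing actors and one listing campaign donations, joined on the donor's name. All names and amounts here are placeholders, not real data, and the table layouts are assumptions; joining real sources would also require cleaning and matching names that are spelled differently in each.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE actors (name TEXT)")
conn.execute("CREATE TABLE donations (donor TEXT, recipient TEXT, amount INTEGER)")

# Placeholder rows only: stand-ins for an actor list and a donations list.
conn.executemany("INSERT INTO actors VALUES (?)", [("Actor A",), ("Actor B",), ("Actor C",)])
conn.executemany(
    "INSERT INTO donations VALUES (?, ?, ?)",
    [
        ("Actor A", "Candidate X", 1000),
        ("Someone Else", "Candidate X", 500),
        ("Actor C", "Candidate Y", 250),
    ],
)

# Which actors donated to Candidate X? Neither table alone answers this.
rows = conn.execute(
    """
    SELECT a.name, d.amount
    FROM actors AS a
    JOIN donations AS d ON d.donor = a.name
    WHERE d.recipient = ?
    ORDER BY a.name
    """,
    ("Candidate X",),
).fetchall()
print(rows)  # [('Actor A', 1000)]
```

A keyword search can only find this answer if someone already published it; the join computes it from two separately published sources.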

Why Study Systems – OS/Net/DB/Sec/SE?

• Learn how to build complex systems:
  – How can you manage complexity for future projects?

• Engineering issues:
  – Why is the web so slow sometimes? Can you fix it?
  – What features should be in the next Mars Rover?
  – How do large distributed systems work? (e.g., Skype)

• Business issues:
  – Will my web services application scale to 1M users?

• Buying and using a personal computer:
  – Why do different PCs with the same CPU behave differently?
  – Should you upgrade to Vista or wait?
  – Why does Microsoft have such a bad name (and Apple a good name)?

• Security, viruses, and worms
  – What exposure do you have to worry about?