ITEC 1011 Introduction to Information Technologies 2. Data Formats Chapt. 3.

Page 1: ITEC 1011 Introduction to Information Technologies 2. Data Formats Chapt. 3.

ITEC 1011 Introduction to Information Technologies

2. Data Formats

Chapt. 3

Page 2: ITEC 1011 Introduction to Information Technologies 2. Data Formats Chapt. 3.

Introduction

• Examples

Real-world data → input device → computer data
“Dear Mom:” → keyboard → 10110010…
[image] → digital camera → 10110010…

pp. 59-61

Page 3: ITEC 1011 Introduction to Information Technologies 2. Data Formats Chapt. 3.

Format must be appropriate

• The internal representation must be appropriate for the type of processing to take place (e.g., text, images, sound)

Page 4: ITEC 1011 Introduction to Information Technologies 2. Data Formats Chapt. 3.

Rules/Conventions

• Proprietary formats
– Unique to a product or company
– E.g., Microsoft Word, Corel WordPerfect, IBM Lotus Notes

• Standards
– Evolve two ways:
• Proprietary formats become de facto standards (e.g., Adobe PostScript, Apple QuickTime)
• A committee is struck to solve a problem (e.g., the Moving Picture Experts Group, MPEG)

pp. 61-62

Page 5: ITEC 1011 Introduction to Information Technologies 2. Data Formats Chapt. 3.

Standards Organizations

• ISO – International Organization for Standardization

• CSA – Canadian Standards Association

• ANSI – American National Standards Institute

• IEEE – Institute of Electrical and Electronics Engineers

• Etc.

Page 6: ITEC 1011 Introduction to Information Technologies 2. Data Formats Chapt. 3.

Examples of Standards

Type of Data              Standards
Alphanumeric              ASCII, EBCDIC, Unicode
Image                     JPEG, GIF, PCX, TIFF
Motion picture            MPEG-2, QuickTime
Sound                     Sound Blaster, WAV, AU
Outline graphics/fonts    PostScript, TrueType, PDF

Page 7: ITEC 1011 Introduction to Information Technologies 2. Data Formats Chapt. 3.

Why Standards?

• Standards are “arbitrary”

• They exist because they are
– Convenient
– Efficient
– Flexible
– Appropriate
– Etc.

Page 8: ITEC 1011 Introduction to Information Technologies 2. Data Formats Chapt. 3.

Alphanumeric Data

• Problem: Distinguishing between the number 123 (one hundred and twenty-three) and the characters “123” (one, two, three)

• Four standards for representing letters (alpha) and numbers
– BCD – Binary-Coded Decimal
– ASCII – American Standard Code for Information Interchange
– EBCDIC – Extended Binary-Coded Decimal Interchange Code
– Unicode

pp. 63-69
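The distinction between the number and the characters can be sketched in Python. This is only an illustration, assuming ASCII for the character form; the variable names are made up for the example.

```python
# The number 123 and the characters "123" are stored differently.
number = 123          # one integer value
text = "123"          # three separate characters

# As an integer, 123 fits in a single byte.
number_bits = format(number, "08b")

# As text, each character gets its own ASCII code.
char_codes = [ord(ch) for ch in text]

print(number_bits)   # 01111011
print(char_codes)    # [49, 50, 51]
```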

Page 9: ITEC 1011 Introduction to Information Technologies 2. Data Formats Chapt. 3.

Next 2 slides

Standard Alphanumeric Formats

• BCD

• ASCII

• EBCDIC

• Unicode

Page 10: ITEC 1011 Introduction to Information Technologies 2. Data Formats Chapt. 3.

Binary-Coded Decimal (BCD)

• Four bits per digit

Digit   Bit pattern
0       0000
1       0001
2       0010
3       0011
4       0100
5       0101
6       0110
7       0111
8       1000
9       1001

Note: the following bit patterns are not used: 1010, 1011, 1100, 1101, 1110, 1111

Page 11: ITEC 1011 Introduction to Information Technologies 2. Data Formats Chapt. 3.

Example

• 7093₁₀ = ? (in BCD)

7    0    9    3
0111 0000 1001 0011
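The digit-by-digit conversion above can be sketched in Python. The function name `to_bcd` is hypothetical, and the sketch handles only non-negative integers.

```python
def to_bcd(number: int) -> str:
    """Encode a non-negative integer as BCD, four bits per decimal digit."""
    if number < 0:
        raise ValueError("this sketch handles non-negative integers only")
    # Convert each decimal digit separately to a 4-bit pattern.
    return " ".join(format(int(digit), "04b") for digit in str(number))


print(to_bcd(7093))  # 0111 0000 1001 0011
```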

Page 13: ITEC 1011 Introduction to Information Technologies 2. Data Formats Chapt. 3.

The Problem

• Representing text strings, such as “Hello, world”, in a computer

Page 14: ITEC 1011 Introduction to Information Technologies 2. Data Formats Chapt. 3.

Codes and Characters

• Each character is coded as a byte

• Most common coding system is ASCII (Pronounced ass-key)

• ASCII = American Standard Code for Information Interchange

• Defined in ANSI document X3.4-1977

Page 15: ITEC 1011 Introduction to Information Technologies 2. Data Formats Chapt. 3.

ASCII Features

• 7-bit code
• 8th bit is unused (or used for a parity bit)
• 2^7 = 128 codes
• Two general types of codes:
– 95 are “Graphic” codes (displayable on a console)
– 33 are “Control” codes (control features of the console or communications channel)
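These counts can be checked with a short Python sketch. The slide does not say which parity scheme is used, so even parity is an assumption here, and `with_even_parity` is a hypothetical helper name.

```python
# ASCII is a 7-bit code, so there are 2**7 = 128 code values (0..127).
ALL_CODES = range(2 ** 7)

# Graphic codes run from space (32) through tilde (126).
graphic = [c for c in ALL_CODES if 32 <= c <= 126]
# Control codes are 0..31 plus DEL (127).
control = [c for c in ALL_CODES if c < 32 or c == 127]


def with_even_parity(code: int) -> int:
    """Set the 8th bit so that the total number of 1 bits is even (assumed scheme)."""
    parity = bin(code).count("1") % 2
    return (parity << 7) | code


print(len(graphic), len(control))                 # 95 33
print(format(with_even_parity(ord("a")), "08b"))  # 11100001
```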

Page 16: ITEC 1011 Introduction to Information Technologies 2. Data Formats Chapt. 3.

ASCII Chart

        000   001   010   011   100   101   110   111
0000    NULL  DLE   SP    0     @     P     `     p
0001    SOH   DC1   !     1     A     Q     a     q
0010    STX   DC2   "     2     B     R     b     r
0011    ETX   DC3   #     3     C     S     c     s
0100    EOT   DC4   $     4     D     T     d     t
0101    ENQ   NAK   %     5     E     U     e     u
0110    ACK   SYN   &     6     F     V     f     v
0111    BEL   ETB   '     7     G     W     g     w
1000    BS    CAN   (     8     H     X     h     x
1001    HT    EM    )     9     I     Y     i     y
1010    LF    SUB   *     :     J     Z     j     z
1011    VT    ESC   +     ;     K     [     k     {
1100    FF    FS    ,     <     L     \     l     |
1101    CR    GS    -     =     M     ]     m     }
1110    SO    RS    .     >     N     ^     n     ~
1111    SI    US    /     ?     O     _     o     DEL

(SP = space)

Page 18: ITEC 1011 Introduction to Information Technologies 2. Data Formats Chapt. 3.

• In the ASCII chart, the column heading gives the three most significant bits of a code

• The row label gives the four least significant bits

Page 19: ITEC 1011 Introduction to Information Technologies 2. Data Formats Chapt. 3.

e.g., ‘a’ = 1100001 (column 110, row 0001 of the ASCII chart)
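Reading a code off the chart amounts to joining the column bits and the row bits. A minimal Python sketch of the reverse lookup follows; `chart_position` is a hypothetical name.

```python
def chart_position(ch: str) -> tuple[str, str]:
    """Split a 7-bit ASCII code into (column bits, row bits) for the chart."""
    code = ord(ch)
    if code > 127:
        raise ValueError("not a 7-bit ASCII character")
    bits = format(code, "07b")
    # First three bits select the column, last four select the row.
    return bits[:3], bits[3:]


print(chart_position("a"))  # ('110', '0001')
```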

Page 20: ITEC 1011 Introduction to Information Technologies 2. Data Formats Chapt. 3.

95 Graphic codes

• Space (0100000) through ~ (1111110): columns 010 to 111 of the ASCII chart, except DEL

Page 21: ITEC 1011 Introduction to Information Technologies 2. Data Formats Chapt. 3.

33 Control codes

• Columns 000 and 001 of the ASCII chart (NULL through US), plus DEL (1111111)

Page 22: ITEC 1011 Introduction to Information Technologies 2. Data Formats Chapt. 3.

Alphabetic codes

• A to Z (1000001 to 1011010) and a to z (1100001 to 1111010)

Page 23: ITEC 1011 Introduction to Information Technologies 2. Data Formats Chapt. 3.

Numeric codes

• 0 to 9 (0110000 to 0111001)

Page 24: ITEC 1011 Introduction to Information Technologies 2. Data Formats Chapt. 3.

Punctuation, etc.

• The remaining graphic codes: column 010 of the ASCII chart, plus the symbols in columns 011 to 111 that are neither letters nor digits

Page 25: ITEC 1011 Introduction to Information Technologies 2. Data Formats Chapt. 3.

“Hello, world” Example

Character   Binary     Hexadecimal   Decimal
H           01001000   48            72
e           01100101   65            101
l           01101100   6C            108
l           01101100   6C            108
o           01101111   6F            111
,           00101100   2C            44
(space)     00100000   20            32
w           01110111   77            119
o           01101111   6F            111
r           01110010   72            114
l           01101100   6C            108
d           01100100   64            100
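The binary, hexadecimal, and decimal forms of the string can be generated with a few lines of Python. This is a sketch for checking the codes by hand, not part of the original slides.

```python
text = "Hello, world"
codes = text.encode("ascii")   # bytes object; iterating over it yields integers

# One row per character: the character, then its binary, hex, and decimal code.
for ch, code in zip(text, codes):
    print(repr(ch), format(code, "08b"), format(code, "02X"), code)

hex_string = codes.hex().upper()
print(hex_string)  # 48656C6C6F2C20776F726C64
```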

Page 26: ITEC 1011 Introduction to Information Technologies 2. Data Formats Chapt. 3.

Common Control Codes

Code    Hexadecimal code   Meaning
CR      0D                 carriage return
LF      0A                 line feed
HT      09                 horizontal tab
DEL     7F                 delete
NULL    00                 null

Page 28: ITEC 1011 Introduction to Information Technologies 2. Data Formats Chapt. 3.

Terminology

• Learn the names of the special symbols
– [ ] brackets
– { } braces
– ( ) parentheses
– @ commercial ‘at’ sign
– & ampersand
– ~ tilde

Page 30: ITEC 1011 Introduction to Information Technologies 2. Data Formats Chapt. 3.

Escape Sequences

• Extend the capability of the ASCII code set
• For controlling terminals and formatting output
• Defined by ANSI in documents X3.41-1974 and X3.64-1977
• The escape code is ESC = 1B₁₆
• An escape sequence begins with two codes:

ESC    [
1B₁₆   5B₁₆

Page 31: ITEC 1011 Introduction to Information Technologies 2. Data Formats Chapt. 3.

Examples

• Erase display: ESC [ 2 J

• Erase line: ESC [ K
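The two examples can be built up byte by byte in Python. The names `CSI`, `ERASE_DISPLAY`, `ERASE_LINE`, and `show_bytes` are chosen for this sketch and do not come from the slides.

```python
ESC = "\x1b"          # escape code, 1B hexadecimal
CSI = ESC + "["       # the two codes that begin each sequence: 1B 5B

ERASE_DISPLAY = CSI + "2J"
ERASE_LINE = CSI + "K"


def show_bytes(sequence: str) -> str:
    """Return the sequence as space-separated hexadecimal byte values."""
    return " ".join(format(b, "02X") for b in sequence.encode("ascii"))


print(show_bytes(ERASE_DISPLAY))  # 1B 5B 32 4A
print(show_bytes(ERASE_LINE))     # 1B 5B 4B
```

Writing `ERASE_DISPLAY` to a terminal that honors these sequences should clear the screen; whether a given console does so depends on the terminal.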

Page 33: ITEC 1011 Introduction to Information Technologies 2. Data Formats Chapt. 3.

EBCDIC

• Extended Binary-Coded Decimal Interchange Code (pronounced ebb’-se-dick)

• 8-bit code

• Developed by IBM

• Rarely used today

• IBM mainframes only

Page 35: ITEC 1011 Introduction to Information Technologies 2. Data Formats Chapt. 3.

Unicode

• 16-bit standard

• Developed by a consortium

• Intended to supersede older 7- and 8-bit codes

Page 36: ITEC 1011 Introduction to Information Technologies 2. Data Formats Chapt. 3.

Unicode Version 2.1

• 1998

• Improves on version 2.0

• Includes the Euro sign (20AC₁₆ = €)

• From the standard: “…contains 38,887 distinct coded characters derived from the supported scripts. These characters cover the principal written languages of the Americas, Europe, the Middle East, Africa, India, Asia, and Pacifica.”

http://www.unicode.org
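A short Python sketch shows why the Euro sign needs more than a 7- or 8-bit code. UTF-16 is used here only as one example of a 16-bit encoding; the slides do not name an encoding form.

```python
EURO = chr(0x20AC)                 # the character at code point 20AC hexadecimal
code_point = ord(EURO)

print(format(code_point, "04X"))   # 20AC
print(code_point > 0xFF)           # True: too large for an 8-bit code
print(code_point < 2 ** 16)        # True: fits in 16 bits

# Encoded as big-endian UTF-16, the character occupies exactly two bytes.
utf16_hex = EURO.encode("utf-16-be").hex().upper()
print(utf16_hex)                   # 20AC
```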

Page 37: ITEC 1011 Introduction to Information Technologies 2. Data Formats Chapt. 3.

Keyboard Input

• Key (“scan”) codes are converted to ASCII

• ASCII code sent to host computer

• Received by the host as a “stream” of data

• Stored in buffer

• Processed

• Etc.

p. 69

Page 38: ITEC 1011 Introduction to Information Technologies 2. Data Formats Chapt. 3.

Shift Key

• Inhibits bit 5 in the ASCII code

Key(s)      ASCII code (bits 6 5 4 3 2 1 0)   Character
a           1 1 0 0 0 0 1                     a
Shift + a   1 0 0 0 0 0 1                     A
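The effect of the Shift key on a letter can be sketched as a bit mask in Python. The names `BIT_5` and `shift` are hypothetical, and the sketch applies to letters only; shifted digits and punctuation do not follow this single-bit rule.

```python
BIT_5 = 0b0100000   # mask with only bit 5 set (decimal 32)


def shift(ch: str) -> str:
    """Clear bit 5 of a lowercase letter's ASCII code, as Shift does."""
    return chr(ord(ch) & ~BIT_5)


print(format(ord("a"), "07b"), "->", format(ord(shift("a")), "07b"))
print(shift("a"))  # A
```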

Page 39: ITEC 1011 Introduction to Information Technologies 2. Data Formats Chapt. 3.

Control Key

• Inhibits bits 5 & 6 in the ASCII code

Key(s)     ASCII code (bits 6 5 4 3 2 1 0)   Character
c          1 1 0 0 0 1 1                     c
Ctrl + c   0 0 0 0 0 1 1                     ETX (control code)
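The Control key's masking of bits 5 and 6 can be sketched the same way in Python. `BITS_5_AND_6` and `ctrl` are hypothetical names for this example.

```python
BITS_5_AND_6 = 0b1100000   # mask with bits 5 and 6 set


def ctrl(ch: str) -> int:
    """Clear bits 5 and 6 of a character's ASCII code, as Control does."""
    return ord(ch) & ~BITS_5_AND_6


print(format(ctrl("c"), "07b"))  # 0000011, the ETX control code
print(ctrl("m") == 0x0D)         # True: Ctrl + m gives CR
```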

Page 40: ITEC 1011 Introduction to Information Technologies 2. Data Formats Chapt. 3.

Other Input

• OCR – optical character recognition

• Bar code readers

• Voice/audio input

• Punched cards

• Images / objects

• Pointing devices

pp. 69-86

Page 41: ITEC 1011 Introduction to Information Technologies 2. Data Formats Chapt. 3.

OCR

Page of text (“Hello, world”) → optical scan → computer file (10110110…)

Page 43: ITEC 1011 Introduction to Information Technologies 2. Data Formats Chapt. 3.

Bar Codes

• An automatic identification (Auto ID) technology that streamlines identification and data collection

• See http://www.digital.net/barcoder/barcode.html

Page 45: ITEC 1011 Introduction to Information Technologies 2. Data Formats Chapt. 3.

Voice/audio Input

• Input device: microphone

• Audio input is “digitized” and stored

• Processed in two ways
– As is (no recognition)
– Recognized and converted to alphanumeric data (ASCII)

Microphone → digitize → 10110010…

Page 47: ITEC 1011 Introduction to Information Technologies 2. Data Formats Chapt. 3.

Punched Cards

• Invented by Herman Hollerith (founder of the Tabulating Machine Company, a forerunner of IBM)

• Each card holds 80 characters

Page 49: ITEC 1011 Introduction to Information Technologies 2. Data Formats Chapt. 3.

Images

• Typically images are pictures that are optically scanned and saved as a “bit map” or in some other format

• Many formats
– GIF, JPEG, …

Page 50: ITEC 1011 Introduction to Information Technologies 2. Data Formats Chapt. 3.

Typical “Save As” Dialog

Page 51: ITEC 1011 Introduction to Information Technologies 2. Data Formats Chapt. 3.

Objects

• Images made of geometrically definable shapes

• Offer efficiency, flexibility, small size, etc.

Page 53: ITEC 1011 Introduction to Information Technologies 2. Data Formats Chapt. 3.

Pointing Devices

• Originally used for specifying coordinates (x, y) for graphical input

• Today used as general purpose device for “graphical user interfaces” (GUIs)

Page 54: ITEC 1011 Introduction to Information Technologies 2. Data Formats Chapt. 3.

Thank you