1
Lecture 4: Data Formats
ITEC 1000 “Introduction to Information Technology”
2
Lecture Template:
  Data Forms
  Data Conversion and Representation
  Data Formats
  Alphanumeric Data
  Image Data
  Audio Data
  Data Input
  Data Compression
  Internal Computer Data Format
3
Data Forms
Human communication
  Includes language, images and sounds
Computers
  Process and store all forms of data in binary format
Conversion to a computer-usable representation uses data formats
  Define the different ways human data may be represented, stored and processed by a computer
4
Data Conversion and Representation
5
Data Formats
Proprietary formats
  Unique to a product or company
  E.g., Microsoft Word, WordPerfect
Standards (evolve in two ways):
  Proprietary formats become de facto standards (e.g., Adobe PostScript)
  Invented by an international standards organization (e.g., Moving Picture Experts Group, MPEG)
6
Common Data Representations
Type of Data                 Standard(s)
Alphanumeric                 Unicode, ASCII, EBCDIC
Image (bitmapped)            GIF (Graphics Interchange Format), TIFF (Tagged Image File Format), PNG (Portable Network Graphics), JPEG
Image (object)               PostScript, SWF (Macromedia Flash), SVG
Outline graphics and fonts   PostScript, TrueType
Sound                        WAV, MP3, MIDI, WMA
Page description             PDF (Adobe Portable Document Format), HTML, XML
Video                        QuickTime, MPEG-2, RealVideo, WMV, AVI
7
Alphanumeric Data
Characters (r, T), number digits (0..9), punctuation (!, ;), special-purpose characters ($, &)
Four codes/standards to represent letters and numbers:
  BCD (Binary-Coded Decimal)
  Unicode
  ASCII (American Standard Code for Information Interchange)
  EBCDIC (Extended Binary Coded Decimal Interchange Code)
8
Standard Alphanumeric Formats
BCD, ASCII, EBCDIC, Unicode
Next 2 slides: BCD
9
Binary-Coded Decimal (BCD)
Four bits per digit
Digit  Bit pattern
0 0000
1 0001
2 0010
3 0011
4 0100
5 0101
6 0110
7 0111
8 1000
9 1001
Note: the following 6 bit patterns are not used:
1010, 1011, 1100, 1101, 1110, 1111
10
BCD: Example
7093 (base 10) = ? (in BCD)
7    0    9    3
0111 0000 1001 0011
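The digit-by-digit conversion in this example can be sketched in Python. This is a minimal illustration rather than part of the lecture; the function name `to_bcd` is an assumption made for the example.

```python
def to_bcd(number: int) -> str:
    """Encode a non-negative decimal integer as BCD, four bits per digit."""
    if number < 0:
        raise ValueError("this sketch handles non-negative integers only")
    # Each decimal digit is converted independently to its 4-bit pattern.
    return " ".join(format(int(digit), "04b") for digit in str(number))


print(to_bcd(7093))  # 0111 0000 1001 0011
```

Because each digit is encoded on its own, the six patterns 1010 to 1111 never appear in the output.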
11
Standard Alphanumeric Formats
BCD, ASCII, EBCDIC, Unicode
Next 13 slides: ASCII
12
ASCII Features
Developed by ANSI (American National Standards Institute)
Defined in ANSI document X3.4-1977
7-bit code
  8th bit is unused (or used for a parity bit or to indicate an “extended” character set)
  2^7 = 128 different codes
Two general types of codes:
  95 are “printing” codes (displayable on a console)
  33 are “control” codes (control features of the console or communications channel)
Represents the Latin alphabet, Arabic numerals and standard punctuation characters
Plus a small set of accents and other European special characters (Latin-1 ASCII)
13
ASCII Table
      000  001  010  011  100  101  110  111
0000  NUL  DLE  SP   0    @    P    `    p
0001  SOH  DC1  !    1    A    Q    a    q
0010  STX  DC2  "    2    B    R    b    r
0011  ETX  DC3  #    3    C    S    c    s
0100  EOT  DC4  $    4    D    T    d    t
0101  ENQ  NAK  %    5    E    U    e    u
0110  ACK  SYN  &    6    F    V    f    v
0111  BEL  ETB  '    7    G    W    g    w
1000  BS   CAN  (    8    H    X    h    x
1001  HT   EM   )    9    I    Y    i    y
1010  LF   SUB  *    :    J    Z    j    z
1011  VT   ESC  +    ;    K    [    k    {
1100  FF   FS   ,    <    L    \    l    |
1101  CR   GS   -    =    M    ]    m    }
1110  SO   RS   .    >    N    ^    n    ~
1111  SI   US   /    ?    O    _    o    DEL
14
ASCII Table
Column headings give the three most significant bits
Row labels give the four least significant bits
15
ASCII Table
e.g., ‘a’ = 1100001 (column 110, row 0001)
16
ASCII Table
95 printing codes: SP (010 0000) through ~ (111 1110), i.e. every entry in columns 010 to 111 except DEL
17
ASCII Table
33 control codes: every entry in columns 000 and 001 (32 codes), plus DEL (111 1111)
18
ASCII Table
Alphabetic codes: A to Z occupy 100 0001 to 101 1010; a to z occupy 110 0001 to 111 1010
19
ASCII Table
Numeric codes: 0 to 9 occupy 011 0000 to 011 1001
20
ASCII Table
Punctuation, etc.: the remaining printing codes (SP and the symbols in column 010, plus : ; < = > ? @ [ \ ] ^ _ ` { | } ~)
21
ASCII Table
Columns: most significant hex digit (MSD); rows: least significant hex digit (LSD)
LSD \ MSD  0    1    2    3    4    5    6    7
0          NUL  DLE  SP   0    @    P    `    p
1          SOH  DC1  !    1    A    Q    a    q
2          STX  DC2  "    2    B    R    b    r
3          ETX  DC3  #    3    C    S    c    s
4          EOT  DC4  $    4    D    T    d    t
5          ENQ  NAK  %    5    E    U    e    u
6          ACK  SYN  &    6    F    V    f    v
7          BEL  ETB  '    7    G    W    g    w
8          BS   CAN  (    8    H    X    h    x
9          HT   EM   )    9    I    Y    i    y
A          LF   SUB  *    :    J    Z    j    z
B          VT   ESC  +    ;    K    [    k    {
C          FF   FS   ,    <    L    \    l    |
D          CR   GS   -    =    M    ]    m    }
E          SO   RS   .    >    N    ^    n    ~
F          SI   US   /    ?    O    _    o    DEL
Example: t = 74 (hex) = 111 0100
22
Example: “Hello, world”
Binary: 1001000 1100101 1101100 1101100 1101111 0101100 0100000 1110111 1101111 1110010 1101100 1100100
Hexadecimal: 48 65 6C 6C 6F 2C 20 77 6F 72 6C 64
Decimal: 72 101 108 108 111 44 32 119 111 114 108 100
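The three views of the same text can be reproduced with a short Python sketch. The helper name `ascii_views` is an assumption for illustration; it uses only the standard `str.encode` and `format` built-ins.

```python
def ascii_views(text: str) -> dict:
    """Return the 7-bit binary, hexadecimal and decimal ASCII codes of text."""
    # encode("ascii") raises UnicodeEncodeError if text contains non-ASCII characters.
    codes = list(text.encode("ascii"))
    return {
        "binary": " ".join(format(code, "07b") for code in codes),
        "hexadecimal": " ".join(format(code, "02X") for code in codes),
        "decimal": " ".join(str(code) for code in codes),
    }


views = ascii_views("Hello, world")
for name, value in views.items():
    print(f"{name}: {value}")
```

Each character maps to exactly one code, so all three lines contain twelve entries.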
23
Common Control Codes
Code  Hexadecimal code  Meaning
CR    0D                carriage return
LF    0A                line feed
HT    09                horizontal tab
DEL   7F                delete
NUL   00                null
24
ASCII Table: Common Control Codes
NUL = 000 0000, HT = 000 1001, LF = 000 1010, CR = 000 1101, DEL = 111 1111
25
Standard Alphanumeric Formats
BCD, ASCII, EBCDIC, Unicode
Next 3 slides: EBCDIC
26
EBCDIC
8-bit code
Developed by IBM
IBM and compatible mainframes only
Rarely used today (common in archival data)
Character codes differ from ASCII
Conversion software to/from ASCII available
         ASCII (hex)   EBCDIC (hex)
Space    20            40
A        41            C1
b        62            82
27
EBCDIC Table (1 out of 2)
28
EBCDIC Table (2 out of 2)
29
Standard Alphanumeric Formats
BCD, ASCII, EBCDIC, Unicode
Next 2 slides: Unicode
30
Unicode
Most common 16-bit form represents 65,536 characters
ASCII Latin-1 is a subset of Unicode
  Values 0 to 255 in the Unicode table
Multilingual: defines codes for
  Nearly every character-based alphabet
  Large set of ideographs for Chinese, Japanese and Korean
  Composite characters for vowels and syllabic clusters required by some languages
Allows software modifications for local languages
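The relationship between Latin-1 and the first 256 Unicode values can be checked in Python. This is a small sketch; the sample characters and the helper name `utf16_units` are assumptions chosen for illustration, and Python's `utf-16-be` codec stands in for the 16-bit form.

```python
# Sample characters: A, z, e-acute, y-diaeresis (all within values 0 to 255).
samples = ["A", "z", "\u00e9", "\u00ff"]

for ch in samples:
    # The Latin-1 byte value equals the Unicode value for these characters.
    latin1_value = ch.encode("latin-1")[0]
    print(ch, latin1_value, ord(ch))


def utf16_units(ch: str) -> int:
    """Number of 16-bit units used to store one character in UTF-16."""
    return len(ch.encode("utf-16-be")) // 2


# A Latin letter and a Chinese ideograph each fit in one 16-bit unit.
print(utf16_units("A"), utf16_units("\u4e2d"))  # 1 1
```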
31
Two-byte Unicode Assignment Table
32
Collating Sequence
Collating sequence: the order of the codes in the representation table
Determines sorting and selection of alphanumeric data
Collating sequences differ between ASCII and EBCDIC:
  Lowercase letters precede capitals in EBCDIC; the reverse holds in ASCII
  Numbers collate first in ASCII; in EBCDIC, last
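The effect of the two collating sequences on sorting can be demonstrated in Python by sorting on the encoded bytes. This sketch assumes Python's `cp037` codec (IBM code page 37) as a representative EBCDIC variant; the word list is made up for illustration.

```python
words = ["apple", "Zebra", "42", "Apple", "zebra"]

# Sort by the byte values each code assigns to the characters.
ascii_order = sorted(words, key=lambda word: word.encode("ascii"))
ebcdic_order = sorted(words, key=lambda word: word.encode("cp037"))

print(ascii_order)   # ['42', 'Apple', 'Zebra', 'apple', 'zebra']
print(ebcdic_order)  # ['apple', 'zebra', 'Apple', 'Zebra', '42']
```

The same data sorts differently on an ASCII machine and an EBCDIC machine, which matters when files are moved between the two.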
33
Two Classes of Codes
Printing characters
  Produce output on the screen or printer
Control characters
  Control position of output on screen or printer
  Cause action to occur
  Communicate status between computer and I/O device
34
Control Code Definitions (ASCII Table)
35
Escape Sequences
Extend the capability of the ASCII code set
For controlling terminals and formatting output
Defined by ANSI in documents X3.41-1974 and X3.64-1977
The escape code is ESC = 1B (hex)
An escape sequence begins with two codes: ESC [ = 1B 5B (hex)
36
Escape Sequences: Examples
Erase display: ESC [ 2 J
Erase line: ESC [ K
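These two sequences can be built as ordinary strings in Python. The sketch below only constructs them and shows their hexadecimal codes; the constant names (`ESC`, `INTRODUCER`, `ERASE_DISPLAY`, `ERASE_LINE`) and the helper `show_hex` are assumptions for illustration. Whether a given terminal honors the sequences depends on the terminal.

```python
ESC = "\x1b"                 # escape code, 1B hex
INTRODUCER = ESC + "["       # the two leading codes: 1B 5B hex

ERASE_DISPLAY = INTRODUCER + "2J"
ERASE_LINE = INTRODUCER + "K"


def show_hex(sequence: str) -> str:
    """List the ASCII codes of a sequence in hexadecimal."""
    return " ".join(format(byte, "02X") for byte in sequence.encode("ascii"))


print(show_hex(ERASE_DISPLAY))  # 1B 5B 32 4A
print(show_hex(ERASE_LINE))     # 1B 5B 4B
```

Writing `ERASE_DISPLAY` to a terminal that supports these sequences would clear the screen instead of printing visible characters.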
37
Alphanumeric Input: Keyboard
Scan code
  Two different binary scan codes generated: one when a key is struck and one when it is released
  Converted to Unicode, ASCII or EBCDIC by software in the terminal or PC
  Received by the host as a stream of text and other characters, i.e. in the sequence typed
Advantage
  Easily adapted to different languages or keyboard layouts
  Separate scan codes for key press/release support multiple-key combinations
    Examples: shift and control keys
38
Shift Key
Inhibits (clears) bit 5 in the ASCII code
Key(s)      ASCII code (bits 6 5 4 3 2 1 0)   Character
a           1 1 0 0 0 0 1                     a
Shift + a   1 0 0 0 0 0 1                     A
39
Control Key
Inhibits (clears) bits 5 and 6 in the ASCII code
Key(s)     ASCII code (bits 6 5 4 3 2 1 0)   Character
c          1 1 0 0 0 1 1                     c
Ctrl + c   0 0 0 0 0 1 1                     ETX (control code)
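The bit clearing performed by the Shift and Control keys can be sketched with bit masks in Python. The names `SHIFT_MASK`, `CTRL_MASK`, `with_shift` and `with_ctrl` are assumptions for illustration, and the sketch is meant for lowercase letters only, as in the two tables above.

```python
SHIFT_MASK = 0b0100000   # bit 5
CTRL_MASK = 0b1100000    # bits 5 and 6


def with_shift(letter: str) -> str:
    """Clear bit 5 of a lowercase letter's ASCII code, giving the capital."""
    return chr(ord(letter) & ~SHIFT_MASK)


def with_ctrl(letter: str) -> int:
    """Clear bits 5 and 6, giving the code of the matching control character."""
    return ord(letter) & ~CTRL_MASK


print(with_shift("a"))                  # A
print(format(with_ctrl("c"), "07b"))    # 0000011, the code for ETX
```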
40
Keyboard Input
Three letters are typed: “D”, “I”, “R”, followed by the carriage return
Four scan codes are translated to ASCII binary codes: 1000100, 1001001, 1010010, 0001101
41
OCR (Optical Character Recognition)
Scans text and inputs it as character data
Special OCR software required
Used to read specially encoded characters
  Example: magnetically printed check numbers
Attempts to recognize handwritten input (limited, only carefully printed)
42
Bar Code Readers
Used in applications that require fast, accurate and repetitive input with minimal employee training
  Examples: supermarket checkout counters and inventory control
Alphanumeric data in a bar code (e.g., 780471 108801 90000) is read optically using a wand that converts it into electrical binary signals
A bar code translation module converts the binary input into a sequence of number codes, one code per digit, which are then translated to Unicode or ASCII
43
Other Alphanumeric Input
Magnetic stripe reader: alphanumeric data from credit cards
Voice
  Digitized audio recording is common, but conversion to alphanumeric data is difficult
  Requires knowledge of sound patterns in a language (phonemes) plus rules for pronunciation, grammar and syntax
44
Image Data
Photographs, figures, icons, drawings, charts and graphs
Two approaches:
  Bitmap or raster images of photos and paintings with continuous variation (e.g., GIF, JPEG)
  Object or vector images composed of graphical shapes like lines and curves defined geometrically
Differences include:
  Quality of the image
  Storage space required
  Time to transmit
  Ease of modification
45
Image Input
Image scanning: bitmap image
  Scanner moves over the image, converting it dot by dot into a stream of binary numbers (pixels) representing black or white, levels of gray, or a colour
Digital/video cameras: bitmap image
Pointing devices (mouse, pen): object image
46
Bitmap Images
Each individual pixel (picture element) in a graphic is stored as a binary number
  Pixel: a small area with an associated coordinate location
  Example: each point below represented by a 4-bit code corresponding to 1 of 16 shades of gray
47
Bitmap Display
Monochrome: black or white
  1 bit per pixel
Gray scale: black, white or 254 shades of gray
  1 byte per pixel
Color graphics: 16 colors, 256 colors, or 24-bit true color (16.7 million colors)
  4, 8 and 24 bits per pixel respectively
48
Storing Bitmap Images
Frequently large files
  Example: 600 rows of 800 pixels with 1 byte for each of 3 colors gives a file of about 1.44 MB (1,440,000 bytes)
File size affected by
  Resolution (the number of pixels per inch)
    Amount of detail affecting clarity and sharpness of an image
  Levels: number of bits for displaying shades of gray or multiple colors
  Palette: color translation table that uses a code for each pixel rather than the actual color value
  Data compression
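The link between bits per pixel, number of colors and uncompressed file size can be worked through in Python. This is a rough sketch that ignores file headers, palettes and compression; the function names `colors_for_depth` and `bitmap_bytes` are assumptions for illustration.

```python
def colors_for_depth(bits_per_pixel: int) -> int:
    """Number of distinct values a pixel can take at a given bit depth."""
    return 2 ** bits_per_pixel


def bitmap_bytes(width: int, height: int, bits_per_pixel: int) -> int:
    """Uncompressed pixel data size in bytes, rounded up to a whole byte."""
    return (width * height * bits_per_pixel + 7) // 8


print(colors_for_depth(4), colors_for_depth(8), colors_for_depth(24))  # 16 256 16777216
print(bitmap_bytes(800, 600, 24))  # 1440000 (true color)
print(bitmap_bytes(800, 600, 1))   # 60000 (monochrome)
```

The same 800 x 600 image is 24 times smaller in monochrome than in true color, which is why the number of levels has such a large effect on file size.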
49
GIF (Graphics Interchange Format)
First developed by CompuServe in 1987
GIF89a enabled animated images
  Allows images to be displayed sequentially at fixed time intervals
Color limitation: 256
Image compressed by the LZW (Lempel-Ziv-Welch) algorithm
Preferred for line drawings, clip art and pictures with large blocks of solid color
Lossless compression
50
GIF (Graphics Interchange Format)
51
JPEG (Joint Photographic Experts Group)
Allows more than 16 million colors
Suitable for highly detailed photographs and paintings
Employs a special compression algorithm that
  Discards data to decrease file size and transmission time
  May reduce image resolution; tends to distort sharp lines
52
Other Bitmap Formats
TIFF (Tagged Image File Format): .tif (pronounced tif)
  Used in high-quality image processing, particularly in publishing
BMP (BitMaPped): .bmp (pronounced dot bmp)
  Device-independent format for the Microsoft Windows environment: pixel colors stored independent of output device
PCX: .pcx (pronounced dot p c x)
  Windows Paintbrush software
PNG (Portable Network Graphics): .png (pronounced ping)
  Designed to replace GIF and JPEG for Internet applications
  Patent-free
  Improved lossless compression
  No animation support
53
Object Images
Created by drawing packages or output from spreadsheet data graphs
Composed of lines and shapes in various colors
Computer translates geometric formulas to create the graphic
Storage space depends on image complexity
  Number of instructions to create lines, shapes, fill patterns
Movies such as Shrek and Toy Story use object images
54
Object Images
Based on mathematical formulas
  Easy to move, scale and rotate without losing shape and identity, as bitmap images may
Require less storage space than bitmap images
Cannot represent photos or paintings
Cannot be displayed or printed directly
  Must be converted to bitmap, since output devices other than plotters are bitmap devices
55
Popular Object Graphics Software
Most object image formats are proprietary
  File extensions include .wmf, .dxf, .mgx and .cgm
Macromedia Flash: low-bandwidth animation
Micrografx Designer: technical drawings to illustrate products
CorelDraw: vector illustration, layout, bitmap creation, image-editing, painting and animation software
Autodesk AutoCAD: for architects, engineers, drafters and design-related professionals
W3C SVG (Scalable Vector Graphics): based on the XML Web description language
  Not proprietary
56
PostScript
Page description language: list of procedures and statements that describe each of the objects to be printed on a page
  Stored in an ASCII or Unicode text file
  Interpreter program in the computer or output device reads PostScript to generate the image
Scalable font support
  Font outline objects specified like other objects
57
PostScript Program
58
Representing Characters as Images
Characters stored in a format like Unicode or ASCII
  Text processed and stored primarily for content
Presentation requirements like font stored with the character
  Text appearance is the primary factor
  Example: screen fonts in Windows
Glyphs: Macintosh coding scheme that includes both identification and presentation requirements for characters
59
Bitmap vs. Object Images
Bitmap (Raster)                                       Object (Vector)
Pixel map                                             Geometrically defined shapes
Photographic quality                                  Complex drawings
Paint software                                        Drawing software
Larger storage requirements                           Higher computational requirements
Enlarging images produces jagged edges                Objects scale smoothly
Resolution of output limited by resolution of image   Resolution of output limited by output device
60
Video Images
Require massive amounts of data
  Video camera producing a full-screen 640 x 480 pixel true color image at 30 frames/sec: 27.65 MB of data/sec
  1-minute film clip: about 1.66 GB of storage
Options for reducing file size: decrease size of image, limit number of colors, reduce frame rate
Method depends on how video is delivered to users
  Streaming video: video displayed as it is downloaded from the Web server
    Example: video conferencing
  Local data (file on DVD or downloaded onto system) for higher quality
    MPEG-2: movie-quality images with high compression require substantial processing capability
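The data rate quoted for uncompressed video follows from a simple product, sketched here in Python. The function name `video_bytes_per_second` is an assumption for illustration; true color is taken as 3 bytes per pixel, and MB and GB are decimal (10^6 and 10^9 bytes).

```python
def video_bytes_per_second(width: int, height: int,
                           bytes_per_pixel: int, frames_per_second: int) -> int:
    """Uncompressed video data rate in bytes per second."""
    return width * height * bytes_per_pixel * frames_per_second


rate = video_bytes_per_second(640, 480, 3, 30)
one_minute = rate * 60

print(f"{rate / 1e6:.2f} MB/sec")          # 27.65 MB/sec
print(f"{one_minute / 1e9:.2f} GB/minute")  # 1.66 GB/minute
```

Halving the frame width and height, or the frame rate, reduces the result proportionally, which is why those are the first options for reducing file size.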
61
Audio Data
Transmission and processing requirements less demanding than those for video
Waveform audio: digital representation of sound
MIDI (Musical Instrument Digital Interface): instructions to recreate or synthesize sounds
Analog sound converted to digital values by A-to-D converter
62
Waveform Audio
Sampling rate normally 50 kHz
63
Sampling Rate
Number of times per second that sound is measured during the recording process
  1000 samples per second = 1 kHz (kilohertz)
  Example: audio CD sampling rate = 44.1 kHz
Height of each sample saved as:
  8-bit number for radio-quality recordings
  16-bit number for high-fidelity recordings
  2 x 16 bits for stereo
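Sampling rate and sample size together determine how much data waveform audio produces. The sketch below works this out for CD-style audio; the function name `audio_bytes_per_second` is an assumption for illustration, and the calculation ignores file headers and compression.

```python
def audio_bytes_per_second(sample_rate_hz: int, bits_per_sample: int,
                           channels: int) -> int:
    """Uncompressed waveform audio data rate in bytes per second."""
    return sample_rate_hz * (bits_per_sample // 8) * channels


cd_rate = audio_bytes_per_second(44_100, 16, 2)   # 44.1 kHz, 16-bit, stereo
radio_rate = audio_bytes_per_second(44_100, 8, 1)  # same rate, 8-bit, mono

print(cd_rate)        # 176400 bytes per second
print(cd_rate * 60)   # 10584000 bytes per minute
print(radio_rate)     # 44100 bytes per second
```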
64
MIDI
Music notation system that allows computers to communicate with music synthesizers
Instructions that MIDI instruments and MIDI sound cards use to recreate or synthesize sounds
  Do not store or recreate speaking or singing voices
  More compact than waveform audio
  3 minutes = 10 KB
65
Audio Formats
MP3
  Derivative of MPEG-2 (ISO Moving Picture Experts Group)
  Uses psychoacoustic compression techniques to reduce storage requirements
  Discards sounds outside the human hearing range: lossy compression
WAV
  Developed by Microsoft as part of its multimedia specification
  General-purpose format for storing and reproducing small snippets of sound
66
.WAV Sound Format
67
Data Compression
Compression: recoding data so that it requires fewer bytes of storage space
Compression ratio: the amount by which a file is shrunk
Lossless: inverse algorithm restores data to its exact original form
  Examples: GIF, PCX, TIFF
Lossy: trades off data degradation for file size and download speed
  Much higher compression ratios, often 10 to 1
  Example: JPEG
  Common in multimedia
MPEG-2: uses both forms for ratios of 100:1
68
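Compression Algorithms
Repetition
  0 5 8 7 0 0 0 0 3 4 0 0 0 becomes 0 1 5 8 7 0 4 3 4 0 3 (each run of zeros is stored as 0 followed by the run length)
  Example: large blocks of the same color
Pattern substitution
  Scans data for patterns
  Substitutes new pattern, makes dictionary entry
  Example: 45 to 30 bytes plus dictionary
    Peter Piper picked a peck of pickled peppers.
    [Pe]t[er] [Pi]p[er] [pi][ck][ed] a [pe][ck] of [pi][ck]l[ed] [pe]pp[er]s.
    Each bracketed pair stands for a one-symbol code from the dictionary: Pe, er, Pi, pi, ck, pe, ed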
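The repetition technique can be sketched in Python under one reading of the numeric example: every run of zeros is stored as a 0 followed by the run length, and all other values are copied unchanged. The function names `compress_zero_runs` and `expand_zero_runs` are assumptions for illustration, and the scheme as written only works when run lengths and data values can share the same number range.

```python
def compress_zero_runs(values: list) -> list:
    """Replace each run of zeros with the pair (0, run length)."""
    encoded = []
    i = 0
    while i < len(values):
        if values[i] == 0:
            run = 0
            while i < len(values) and values[i] == 0:
                run += 1
                i += 1
            encoded.extend([0, run])
        else:
            encoded.append(values[i])
            i += 1
    return encoded


def expand_zero_runs(encoded: list) -> list:
    """Inverse of compress_zero_runs: restores the exact original data."""
    values = []
    i = 0
    while i < len(encoded):
        if encoded[i] == 0:
            values.extend([0] * encoded[i + 1])
            i += 2
        else:
            values.append(encoded[i])
            i += 1
    return values


data = [0, 5, 8, 7, 0, 0, 0, 0, 3, 4, 0, 0, 0]
packed = compress_zero_runs(data)
print(packed)  # [0, 1, 5, 8, 7, 0, 4, 3, 4, 0, 3]
```

Expanding `packed` returns `data` exactly, so this is a lossless method. Note that an isolated zero grows from one value to two, so the scheme pays off only when long runs are common.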
69
Internal Computer Data Format
All data stored as binary numbers
Interpreted based on
  Operations the computer can perform
  Data types supported by the programming language used to create the application
70
Five Simple Data Types
Boolean: 2-valued variables or constants with values of true or false
Char: variable or constant that holds an alphanumeric character
Enumerated
  User-defined data types with possible values listed in the definition
  Type DayOfWeek = Mon, Tues, Wed, Thurs, Fri, Sat, Sun
Integer: positive or negative whole numbers
Real
  Numbers with a decimal point
  Numbers whose magnitude, large or small, exceeds the computer’s capability to store as an integer
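For comparison, the five simple types have rough counterparts in Python, sketched below. The variable names and values are made up for illustration. Python has no separate char type, so a one-character string stands in for it, and Python integers are not limited to a fixed size the way integers are in many other languages.

```python
from enum import Enum

# Enumerated: possible values are listed in the definition.
DayOfWeek = Enum("DayOfWeek", "Mon Tues Wed Thurs Fri Sat Sun")

is_open: bool = True          # Boolean
grade: str = "A"              # Char (a one-character string in Python)
today = DayOfWeek.Wed         # Enumerated
balance: int = -250           # Integer
avogadro: float = 6.022e23    # Real (floating point)

for value in (is_open, grade, today, balance, avogadro):
    print(type(value).__name__, value)
```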