Nic Shulver, [email protected] Data Representation Data Usually computing systems are complex...

Nic Shulver, [email protected]

Data Representation

Data

Usually computing systems are complex devices, dealing with a vast array of information categories


Computing Systems Data

Computing systems store, present, and help us modify:
• Text
• Audio
• Images and graphics
• Video


Digital vs. Analog

Computing systems are finite machines. They store a limited amount of information, even if the limit is very big.

The information can be represented in one of two ways: analog or digital.


Digital vs. Analog (1)

Analog data is a continuous representation
• A mercury thermometer is an analog device

Digital data is a discrete representation, breaking the information up into separate (discrete) elements

Computers can’t work with analog information, so the need arises to digitize it
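As a rough illustration of digitizing, the sketch below rounds a continuous temperature reading to the nearest of a fixed number of levels. The function name `quantize`, the 35 to 42 degree range, and the 3-bit resolution are assumptions chosen for the example, not values from the slides.

```python
def quantize(value, lo, hi, bits):
    """Map a continuous value in [lo, hi] to one of 2**bits integer levels."""
    levels = 2 ** bits
    # Clamp so out-of-range readings still map to a valid level.
    clamped = min(max(value, lo), hi)
    # Scale to [0, levels - 1] and round to the nearest discrete level.
    return round((clamped - lo) / (hi - lo) * (levels - 1))

# A thermometer can read any value between 35.0 and 42.0 degrees C,
# but with 3 bits only 8 distinct codes (0..7) are available.
readings = [35.0, 36.6, 37.2, 39.9, 42.0]
codes = [quantize(r, 35.0, 42.0, 3) for r in readings]
print(codes)  # [0, 2, 2, 5, 7]
```

Note that 36.6 and 37.2 end up with the same code: digitizing discards any detail finer than the spacing between levels.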


Why digital signals?

Electronic signals (analog and digital) degrade as they propagate. The strength of the signal fluctuates due to environmental effects.

Analog signals lose information. Since any voltage level within the range is valid, it is impossible to tell whether the received signal differs from the original.

A digital signal can degrade quite a bit before the information is lost, because any value over a certain threshold is considered a high value and any value below the threshold is considered a low value.
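The threshold idea can be sketched in a few lines. The 0 V and 5 V nominal levels, the 2.5 V threshold, and the received voltages are all made-up values for illustration.

```python
THRESHOLD = 2.5  # volts; assumed midpoint between nominal 0 V and 5 V levels

def decode(voltages, threshold=THRESHOLD):
    """Read each received voltage as 1 if above the threshold, else 0."""
    return [1 if v > threshold else 0 for v in voltages]

sent_bits = [1, 0, 1, 1, 0]
# The sender used 5.0 V and 0.0 V; these values were degraded in transit.
received = [3.9, 0.8, 4.4, 3.1, 1.2]

print(decode(received))               # [1, 0, 1, 1, 0]
print(decode(received) == sent_bits)  # True
```

None of the received voltages equals what was sent, yet every bit is read correctly because each one is still on the right side of the threshold.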


Digital vs. Analog (3)

[Figure: a digital signal and an analog signal, each shown before and after degradation. A threshold line separates the digital signal's 1s from its 0s.]

• You can still retrieve the information from a reasonably degraded digital signal

• Periodically a digital signal is reclocked to regain its original shape. As long as it is reclocked before too much degradation, no info is lost.
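A toy model of reclocking, under loudly stated assumptions: the channel simply scales every voltage by 0.8 per hop, nominal levels are 0 V and 5 V, and the threshold is 2.5 V. The helper names `attenuate` and `regenerate` are hypothetical.

```python
def attenuate(voltages, factor=0.8):
    """Toy channel: every hop scales the signal down by a fixed factor."""
    return [v * factor for v in voltages]

def regenerate(voltages, threshold=2.5, high=5.0, low=0.0):
    """Simplified reclocking: snap each sample back to its nominal level."""
    return [high if v > threshold else low for v in voltages]

signal = [5.0, 0.0, 5.0, 5.0]

plain = signal      # travels four hops with no regeneration
repeated = signal   # regenerated after every hop
for _ in range(4):
    plain = attenuate(plain)
    repeated = regenerate(attenuate(repeated))

print(regenerate(plain))  # [0.0, 0.0, 0.0, 0.0]: the 1s were lost
print(repeated)           # [5.0, 0.0, 5.0, 5.0]: identical to the original
```

In this model one hop leaves a high sample at about 4 V, comfortably above the threshold, so regenerating after each hop loses nothing. Four hops without regeneration leave it near 2 V, which reads as a 0.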


Count like a computer

“There are 10 types of people – those who understand binary, and those who don’t.”


Digital Hardware Systems

Digital binary system: two discrete values
• yes, on, non-zero volts, current flowing, "1"
• no, off, 0 volts, no current flowing, "0"

Advantages of binary systems:
• Rigorous mathematical foundation based on logic
• Easy to implement


Binary Representation (1)

Why binary representation (as opposed to decimal, octal, etc.)?
• Because the devices that store and manage the digital data are far less expensive and complex for binary representation. They are also far more reliable when they have to represent one out of two possible values.
• Because the electronic signals are easier to maintain if they carry only binary data.


Binary Representation (2)

One bit can be either 0 or 1. Therefore, one bit can represent only two things.

To represent more than two things, we need multiple bits. Two bits can represent four things because there are four combinations of 0 and 1 that can be made from two bits: 00, 01, 10, 11.

In general, n bits can represent 2^n things because there are 2^n combinations of 0 and 1 that can be made from n bits.

Note that every time we increase the number of bits by 1, we double the number of things we can represent.
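The doubling can be checked by listing the combinations. This is a small sketch using Python's standard `itertools.product`. The helper name `bit_patterns` is made up for the example.

```python
from itertools import product

def bit_patterns(n):
    """Return every distinct string of n bits, in counting order."""
    return ["".join(bits) for bits in product("01", repeat=n)]

print(bit_patterns(2))  # ['00', '01', '10', '11']

# Each extra bit doubles the number of patterns: 2, 4, 8, 16.
for n in range(1, 5):
    print(n, len(bit_patterns(n)))
```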


Binary Bit and Group Definitions

• Bit - a single binary digit
• Nibble - a group of four bits
• Byte - a group of eight bits
• Word - depends on processor; 8, 16, 32, 64 or more bits
• LSB - Least Significant Bit (on the right)
• MSB - Most Significant Bit (on the left)
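These groupings can be picked out of a value with shifts and masks. The byte below is an arbitrary example value, and the variable names are illustrative only.

```python
value = 0b10110010  # one byte (8 bits); arbitrary example value

high_nibble = (value >> 4) & 0xF  # upper four bits
low_nibble = value & 0xF          # lower four bits
lsb = value & 1                   # rightmost bit
msb = (value >> 7) & 1            # leftmost bit of the byte

print(f"{high_nibble:04b} {low_nibble:04b} lsb={lsb} msb={msb}")
# 1011 0010 lsb=0 msb=1
```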


Binary Number System

Just like decimal numbers, except:
• The only valid digits are 0 and 1
• The base is 2 instead of 10

Binary-to-decimal conversion is just the explicit expression of the positional values. For the binary number 101, reading from the rightmost bit:

1 × 2^0 = 1
0 × 2^1 = 0
1 × 2^2 = 4

Total = 5
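The positional sum can be written out directly. `binary_to_decimal` is a hypothetical helper name. Python's built-in `int(bits, 2)` performs the same conversion.

```python
def binary_to_decimal(bits):
    """Sum digit * 2**position, with position 0 at the rightmost bit."""
    total = 0
    for position, digit in enumerate(reversed(bits)):
        total += int(digit) * 2 ** position
    return total

print(binary_to_decimal("101"))  # 5
print(int("101", 2))             # 5, using the built-in conversion
```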


Octal & Hexadecimal Number Systems

Systems with different radix and digits:
• Octal: radix = 8; digits = 0,1,2,3,4,5,6,7
• Hexadecimal: radix = 16; digits = 0,1,2,3,4,5,6,7,8,9,A,B,C,D,E,F

The primary advantage of both is that they are easy to convert to and from binary.
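The conversion is easy because 8 = 2^3 and 16 = 2^4, so each octal digit stands for exactly three bits and each hexadecimal digit for exactly four. Here is a minimal sketch of that grouping. The helper name `group_bits` and the sample bit string are assumptions for illustration.

```python
def group_bits(bits, size):
    """Convert a bit string to base 2**size by grouping bits from the right."""
    # Pad on the left so the length is a multiple of the group size.
    padded = bits.zfill(-(-len(bits) // size) * size)
    groups = [padded[i:i + size] for i in range(0, len(padded), size)]
    # Each group of bits becomes exactly one digit in the target base.
    return "".join("0123456789ABCDEF"[int(g, 2)] for g in groups)

bits = "110101110"
print(group_bits(bits, 3))  # 656  (110 101 110)
print(group_bits(bits, 4))  # 1AE  (0001 1010 1110)
```

No arithmetic on the whole number is needed: each digit depends only on its own group of bits.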


Data Formats - How to Interpret Data

Internal representation must be appropriate for the type of processing taking place, e.g., images and sound have to be digitized:
• Images – need a detailed description of the data and of how colour is represented at each data point
• Sound – needs a sampling rate

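Proprietary formats are unique to a product or company, e.g., FLA. PDF and H.264 are sometimes grouped with these, but PDF (originally proprietary to Adobe) is now published as ISO 32000, and H.264 is a joint ITU-T and ISO/IEC standard.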




Why Standards?

Standards evolve in two ways:
• Proprietary formats become de facto standards (e.g., Adobe PostScript, Apple QuickTime)
• A committee is struck to solve a problem (e.g., the Moving Picture Experts Group, MPEG)

They exist because they are:
• Convenient – time to market is often very important when trying to finish a product, and existing standards may be used to save time
• Efficient – most standards are put together by committees of experienced engineers and designers


Why Standards?

• Flexible – often allow for extensions
• Appropriate – solve a specific problem
• Interoperability of data
• Interoperability of hardware and software

But sometimes standards are arbitrary and have some content derived from “accidents of history”.


Standards Organizations

• ISO – International Organization for Standardization
• CSA – Canadian Standards Association
• ANSI – American National Standards Institute
• IEEE – Institute of Electrical and Electronics Engineers


Examples of Standards

Type of Data              Standards
Alphanumeric              ASCII, Unicode
Image                     JPEG, PNG, PCX, TIFF, etc.
Motion picture            MPEG-2, MPEG-4, etc.
Sound                     WAV, MP3, AAC, FLAC, etc.
Outline graphics/fonts    PostScript, TrueType, PDF


Alphanumeric Data

Three standards for representing letters (alpha) and numbers:
• ASCII – American Standard Code for Information Interchange (old)
• EBCDIC – Extended Binary-Coded Decimal Interchange Code (very old; used on IBM mainframes and rarely seen elsewhere)
• Unicode


Codes and Characters

The problem: representing text strings, such as “Hello, world”, in a computer.

• Each character is coded as a byte (= 8 bits)
• The most common coding system was ASCII
• Defined in ANSI document X3.4-1977


“Hello, world” Example

Binary: 01001000 01100101 01101100 01101100 01101111 00101100 00100000 01110111 01101111 01110010 01101100 01100100

Hexadecimal: 48 65 6C 6C 6F 2C 20 77 6F 72 6C 64

Decimal: 72 101 108 108 111 44 32 119 111 114 108 100

Text: Hello, world

Note: 12 characters – requires 12 bytes
• Each character requires 1 byte
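The three views of the same bytes can be produced with Python's standard string and bytes methods. This is a small sketch, and the variable names are illustrative.

```python
text = "Hello, world"
data = text.encode("ascii")  # one byte per character

decimal = list(data)                          # each byte as an integer
hexadecimal = data.hex().upper()              # two hex digits per byte
binary = " ".join(f"{b:08b}" for b in data)   # eight bits per byte

print(decimal)
print(hexadecimal)
print(binary)
print(len(text), "characters,", len(data), "bytes")  # 12 characters, 12 bytes
```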


Unicode (1)

The extended version of the ASCII character set is not enough for international use.

The Unicode character set was originally designed around 16 bits per character, which can represent 2^16, or over 65 thousand, characters. Since version 2.0 the code space extends to U+10FFFF (just over 1.1 million code points), stored using the UTF-8, UTF-16, or UTF-32 encodings.

Unicode was designed to be a superset of ASCII. That is, the first 128 characters in the Unicode character set correspond exactly to ASCII, and the first 256 correspond to ISO 8859-1 (Latin-1), a common extended ASCII character set.
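The superset relationship can be checked from Python, whose strings are sequences of Unicode code points. The sketch takes ISO 8859-1 (Latin-1) as the extended ASCII set, which is an assumption, since several incompatible 8-bit extensions of ASCII exist.

```python
# ord() gives a character's Unicode code point.
for ch in ("A", "\u00e9", "\u20ac"):
    print(ch, ord(ch), f"U+{ord(ch):04X}")

# Code points 0-127 are the ASCII characters with the same byte values.
ascii_matches = all(chr(i).encode("ascii") == bytes([i]) for i in range(128))
# Code points 0-255 are the Latin-1 characters with the same byte values.
latin1_matches = all(chr(i).encode("latin-1") == bytes([i]) for i in range(256))

print(ascii_matches, latin1_matches)  # True True
```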


Unicode (2)

Version 2.1, 1998:
• Improves on version 2.0
• Includes the Euro sign (U+20AC = “€” = “EURO SIGN”)
• From the standard: “…contains 38,887 distinct coded characters derived from the supported scripts. These characters cover the principal written languages of the Americas, Europe, the Middle East, Africa, India, Asia, and Pacifica.”

The 2012 update of Unicode was 6.2 (see http://www.unicode.org for the current version).
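The Euro sign's code point and official name can be inspected with Python's standard `unicodedata` module. The two encodings shown are chosen only as examples of how one code point maps to different byte sequences.

```python
import unicodedata

euro = "\u20ac"  # the escape names the code point U+20AC directly

print(euro, unicodedata.name(euro))    # € EURO SIGN
print(euro.encode("utf-8").hex())      # e282ac (three bytes in UTF-8)
print(euro.encode("utf-16-be").hex())  # 20ac   (two bytes in UTF-16)
```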
