Data Representation

Topics

• Bit patterns

• Binary numbers

• Data type formats

• Character representation

• Integer representation

• Floating point number representation

Data Representation

• Data representation refers to the manner in which data is stored in the computer

• There are several different formats for data storage

• It is important for computer problem solvers to understand the basic formats

Why is it Important?

• As an example, we will learn that since we have finite storage, it is possible to overflow a storage location by trying to store too large a number

• Most programming languages provide multiple data types each providing different length storage for variables

• It is up to the programmer to choose the data type with a length that won’t overflow

• Knowing how numbers are represented in storage helps one to understand this

Bit Pattern

• As you recall from an earlier presentation, data may take various forms: characters, numbers, graphical, etc.

• All data is stored in the computer as a sequence of bits, that is, binary digits

• This is a universal storage format for all data types, and it is called a bit pattern

Bits and Bytes

• A bit is the smallest unit of data stored in a computer and it has a value of 0 or 1

• It’s like a switch, on (1) or off (0)

• In computers, bits are stored electronically in RAM and auxiliary storage devices by two-state digital circuits

• The storage device itself doesn’t know what the bit pattern represents, but software (application software, operating system, and I/O device firmware) stores and interprets the pattern

• That is, data is coded then stored, and when retrieved it is decoded

• A byte is a string of 8 bits and is called a character when the data is text

Binary Numbers

• Each bit pattern is a binary number, that is, a number represented by 0’s and 1’s rather than 0, 1, 2, …, 9 as decimal numbers are

• For example, bit patterns like 1010 and 101001 are also binary numbers

• Binary numbers are based on powers of 2 rather than powers of 10 as decimal numbers are

Data Type Formats

• As we have learned, fundamentally all data is stored as a bit pattern

• But the different data types have different bit pattern formats

• We want to learn the formats for:

– Characters (for example, left, Lane, a, ?, \)

– Integers (for example, 1, 453, -10, 0)

– Floating point numbers (for example, 3.14159, 1.2, -567.235, 0.009)

Character Representation

• The American Standard Code for Information Interchange (ASCII) is the scheme used to assign a bit pattern to each of the characters

• ASCII charts come in different flavors:

– Some have 7 bit strings, some 8 or more

– Some show the binary code for the various characters as well as the code represented in other number systems, e.g., decimal, hex, octal

• For example, the letter A has the ASCII code:

– 1000001 in binary for a 7-bit chart

– 65 in decimal

– 41 in hex

– 101 in octal

• Note: All four of these numbers represent the same value but using different number systems
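
The four codes listed for the letter A can be checked with a short Python sketch. Python is an assumption here (the slides do not prescribe a language), and the variable names are illustrative only.

```python
# The letter A and its ASCII code written in four number systems.
code = ord("A")                 # numeric ASCII code of the character

binary = format(code, "07b")    # 7-bit binary string
decimal = str(code)             # decimal string
hexa = format(code, "x")        # hexadecimal string
octal = format(code, "o")       # octal string

print(binary, decimal, hexa, octal)   # 1000001 65 41 101

# Reading each string back in its own base gives one and the same value.
same = {int(binary, 2), int(decimal, 10), int(hexa, 16), int(octal, 8)}
print(same)                           # {65}
```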

Subset of ASCII Chart

ASCII Chart

• Uppercase characters have different ASCII codes than lowercase

• Uppercase characters come before lowercase

• Numbers come before letters

• The special characters are spread around

• The digits, the uppercase letters, and the lowercase letters each form an adjacent grouping, so that within a group the codes increment by one

ASCII Chart

• The only difference between the binary codes for the upper and lowercase characters is the sixth bit from the right (place value 32), that is, the decimal code for a lowercase character is 32 greater than the uppercase character’s decimal code

• ASCII codes before decimal 32 are control characters (nonprintable) like bell, backspace, and carriage return

• The final ASCII code in the 7-bit chart is the control character DEL with decimal code 127
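
As a rough illustration of the 32-apart relationship between upper and lowercase codes, the Python sketch below switches case by setting or clearing that one bit. The names CASE_BIT, to_lower, and to_upper are made up for this example, and the helpers are only meaningful for the letters A to Z and a to z.

```python
CASE_BIT = 0b0100000   # place value 32, the sixth bit from the right

def to_lower(ch):
    """Set the case bit of an uppercase ASCII letter."""
    return chr(ord(ch) | CASE_BIT)

def to_upper(ch):
    """Clear the case bit of a lowercase ASCII letter."""
    return chr(ord(ch) & ~CASE_BIT)

print(format(ord("A"), "07b"), format(ord("a"), "07b"))  # 1000001 1100001
print(to_lower("A"), to_upper("z"))                      # a Z

# Digits come before uppercase letters, which come before lowercase letters.
print(ord("0") < ord("A") < ord("a"))                    # True
```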

Extended ASCII & Unicode

• The eight-bit ASCII chart is sometimes called Extended ASCII

• The seven-bit ASCII codes are the same in the eight-bit chart except that they have a zero at the left

• Some manufacturers use the extra bit to create additional special characters; these, however, are nonstandard, e.g., using decimal 171 for ½, or 246 for ÷

• Unicode is another scheme, developed so that the many symbols in international languages may be represented. It also uses bit patterns; UTF-32, for example, uses 32 bits per character.

Numeric Representation

• ASCII codes are an inefficient method for representing numbers

• For example, the number 1,024 using 8-bit ASCII would require four bytes or 32 bits of storage

• Arithmetic operations on numbers represented in ASCII are very complicated

• Representing the precision of a number, that is, the number of digits stored, may require large amounts of space when stored in ASCII

• There are more efficient schemes for numbers
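
The storage cost mentioned above can be seen in a small Python sketch. It assumes the digits of 1024 are stored one ASCII byte each and compares that with a 2-byte unsigned integer; the variable names are illustrative.

```python
number = 1024

as_text = str(number).encode("ascii")   # one byte per digit character
as_binary = number.to_bytes(2, "big")   # the same value as a 2-byte unsigned integer

print(len(as_text), list(as_text))      # 4 [49, 48, 50, 52]
print(len(as_binary), list(as_binary))  # 2 [4, 0]
```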

Integer Representation

• An integer is a whole number, that is, a number without a decimal portion

• Integers may be positive, negative, or zero

• A plus-sign or minus-sign in front of the number is used to represent positive and negative numbers

• The plus-sign is not required for positive numbers and zero

• There are two categories of integer representation: unsigned and signed

Unsigned Integer

• An unsigned integer is an integer without a sign, that is, a non-negative integer

• They range from zero to infinity, but no computer can store all the integers in that range

• So, a maximum unsigned integer is defined

• This maximum is based on the number of bits used to store an integer

• Let’s use 8 and 16-bit (1 and 2 bytes) storage locations in our examples

• The length of storage is set by the data type the programmer specifies for a variable

Unsigned Integer

• An unsigned integer is stored as its value when represented as a binary number

• Leading zeros are added to fill out the storage location

• For example, the decimal number 9 is represented as 00001001 when stored in 1-byte because 00001001₂ = 9₁₀

• When stored in a 2-byte location, 9 would be represented as 0000000000001001
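
A minimal Python sketch of the leading-zero padding described above, using the built-in format function; the variable names are illustrative.

```python
value = 9

one_byte = format(value, "08b")    # pad with leading zeros to 8 bits
two_bytes = format(value, "016b")  # pad with leading zeros to 16 bits

print(one_byte)    # 00001001
print(two_bytes)   # 0000000000001001
```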

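Unsigned Integer

• One may use the following table of place values (powers of two) to work with binary numbers:

Place value:  128   64   32   16    8    4    2    1

• For example, given 00001001, what decimal number does it represent?

Place value:  128   64   32   16    8    4    2    1
Bit:            0    0    0    0    1    0    0    1

• Add the powers of two whose bit is 1, that is, 8 + 1 = 9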
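
The place-value addition can be sketched in Python as follows. The function name bits_to_decimal is hypothetical; Python's built-in int(bits, 2) does the same job and is used here only as a cross-check.

```python
def bits_to_decimal(bits):
    """Add the place value (power of two) of every bit that is 1."""
    total = 0
    place_value = 1                  # the rightmost bit has place value 1
    for bit in reversed(bits):
        if bit == "1":
            total += place_value
        place_value *= 2             # move one cell to the left
    return total

print(bits_to_decimal("00001001"))                       # 9
print(bits_to_decimal("00001001") == int("00001001", 2)) # True
```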

Unsigned Integer

• One may use the same table to go the other way, that is, given the decimal number 13, what is its binary representation?

• Find the largest power of 2 that doesn’t exceed the number and place a 1 in that cell:

Place value:  128   64   32   16    8    4    2    1
Bit:                                1

• Subtract that power of 2 from the number and use this as the new number: 13 – 8 = 5

Unsigned Integer

• Then continue in this way until the sum of the powers of two equals the number:

Place value:  128   64   32   16    8    4    2    1
Bit:                                1    1

• Now, 5 – 4 = 1, and so finally:

Place value:  128   64   32   16    8    4    2    1
Bit:                                1    1         1

• Note that 8 + 4 + 1 = 13

Unsigned Integer

• Then fill in the remaining cells with zeros:

Place value:  128   64   32   16    8    4    2    1
Bit:            0    0    0    0    1    1    0    1

• So, the unsigned integer representation of decimal 13 is 00001101 when stored in 1-byte
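
The subtract-the-largest-power procedure from the preceding slides might look like this in Python. This is a sketch only: decimal_to_bits and its width parameter are names invented for the example.

```python
def decimal_to_bits(number, width=8):
    """Build the bit pattern by trying each power of two, largest first."""
    bits = ""
    for exponent in range(width - 1, -1, -1):
        power = 2 ** exponent
        if power <= number:      # this power fits: place a 1 and subtract it
            bits += "1"
            number -= power
        else:                    # it does not fit: fill the cell with a 0
            bits += "0"
    if number != 0:
        raise OverflowError("value does not fit in the given width")
    return bits

print(decimal_to_bits(13))       # 00001101
print(decimal_to_bits(9, 16))    # 0000000000001001
```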

Unsigned Integer

• If one tries to store a number in a memory location that is not large enough we have what is called overflow

• In this case, depending on the system, one may or may not receive an error message

• So, one must not store a number that is larger than the maximum for a given length of storage

• The maximum number storable in 1-byte is 255

Unsigned Integer

• For example, if one tries to store 256 in 1-byte there is overflow because the largest value storable in 8 bits is 255 as one can see from the following table:

Place value:  128   64   32   16    8    4    2    1
Bit:            1    1    1    1    1    1    1    1

• Note that 128 + 64 + 32 + 16 + 8 + 4 + 2 + 1 = 255
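
A small Python sketch of the overflow limit. Python happens to raise an error when asked to pack 256 into one byte; the masking line merely imitates a system that silently keeps the low 8 bits, which is an assumption about such systems and not something the slides specify.

```python
def max_unsigned(bits):
    """Largest unsigned value storable in the given number of bits."""
    return 2 ** bits - 1

print(max_unsigned(8), max_unsigned(16))   # 255 65535

try:
    (256).to_bytes(1, "big")    # 256 needs 9 bits, so 1 byte is too small
    overflowed = False
except OverflowError:
    overflowed = True
print(overflowed)               # True

wrapped = 256 & 0xFF            # imitate silent overflow: keep the low 8 bits
print(wrapped)                  # 0
```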

Signed Integer

• A sign-and-magnitude format is used to allow for positive and negative numbers (and zero)

• The leading bit is designated as the sign bit: 0 for positive or zero, 1 for negative

• The remaining bits represent the value

• So, in 1-byte of storage the maximum number storable is not 255 as it was for the unsigned integer representation, but 127:

Place value: sign   64   32   16    8    4    2    1
Bit:            0    1    1    1    1    1    1    1

• Note that 64 + 32 + 16 + 8 + 4 + 2 + 1 = 127

Signed Positive Integer

• To determine what the sign-and-magnitude representation of a positive decimal number is:

– Convert the decimal number to binary

– If needed add leading zeros to fill the storage location

• For example, decimal 12 is represented in 1-byte as 00001100 because 8 + 4 = 12:

Place value: sign   64   32   16    8    4    2    1
Bit:            0    0    0    0    1    1    0    0

Signed Positive Integer

• Going the other way, given a sign-and-magnitude representation for a positive number, one can interpret it as follows:

– Leftmost bit will be 0 indicating positive

– Convert the remaining bits to a decimal number

• For example, 00010001 is decimal 17:

Place value: sign   64   32   16    8    4    2    1
Bit:            0    0    0    1    0    0    0    1

• Because 16 + 1 = 17

Signed Negative Integer

• For negative numbers, two’s complement format is used

• Two’s complement still uses the leading bit as the sign bit, as the sign-and-magnitude format does, but the remaining bits are not simply the binary magnitude

• In two’s complement, some of the magnitude bits are flipped from 0 to 1 or 1 to 0

Signed Negative Integer

• To determine what the two’s complement representation of a negative decimal number is:

– Ignore the sign and convert the decimal number to binary

– If needed add leading zeros to fill the storage location

– Leave all the rightmost 0’s and first 1 unchanged, but flip the remaining bits

– Make the sign bit 1

• For example, decimal -14 is represented in 1-byte as 11110010 because (see next slide)

Signed Negative Integer

• Convert 14 to binary (8 + 4 + 2 = 14) and make leading bits zero:

00001110

• Leave the rightmost 0’s and first 1 as is, but flip the remaining bits:

11110010

• Make sign bit 1:

11110010
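
The steps for -14 can be sketched in Python as below. The function name twos_complement_bits and the width parameter are assumptions made for the example; the final step of setting the sign bit is kept for fidelity to the slides, although for values in range the flip already leaves it at 1.

```python
def twos_complement_bits(number, width=8):
    """Two's complement pattern of a negative number, following the listed steps."""
    if not -(2 ** (width - 1)) <= number < 0:
        raise ValueError("expects a negative number that fits in the width")
    # Ignore the sign, convert to binary, and add leading zeros.
    bits = format(-number, "0{}b".format(width))
    # Leave the rightmost 1 and the 0's after it unchanged; flip the rest.
    last_one = bits.rindex("1")
    flipped = "".join("1" if b == "0" else "0" for b in bits[:last_one])
    result = flipped + bits[last_one:]
    # Make the sign bit 1.
    return "1" + result[1:]

print(twos_complement_bits(-14))   # 11110010
print(twos_complement_bits(-22))   # 11101010
```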

Signed Negative Integer

• Going the other way, given a two’s complement representation for a negative number, one can interpret it as follows:

– Leave the rightmost bits up to and including the first 1 unchanged, but flip the remaining bits

– Convert the binary number to decimal

– Put a minus-sign in front

• For example, 11101010 is decimal –22 because (see next slide)

Signed Negative Integer

• Flip all but the rightmost 1 and any following 0’s:

11101010 becomes 00010110

• Convert the binary number to decimal:

Place value: sign   64   32   16    8    4    2    1
Bit:            0    0    0    1    0    1    1    0

• We get 22 because 16 + 4 + 2 = 22

• Put a minus-sign in front yielding -22
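
Decoding can be sketched in Python the same way. The name decode_twos_complement is hypothetical; patterns whose leading bit is 0 are simply read as ordinary binary numbers.

```python
def decode_twos_complement(bits):
    """Interpret a bit string as a signed integer stored in two's complement."""
    if bits[0] == "0":                 # sign bit 0: positive or zero
        return int(bits, 2)
    # Leave the rightmost 1 and the 0's after it unchanged; flip the rest.
    last_one = bits.rindex("1")
    flipped = "".join("1" if b == "0" else "0" for b in bits[:last_one])
    magnitude = int(flipped + bits[last_one:], 2)
    return -magnitude                  # put a minus-sign in front

print(decode_twos_complement("11101010"))   # -22
print(decode_twos_complement("11110010"))   # -14
print(decode_twos_complement("00010001"))   # 17
```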

Signed Negative Integer

• Two’s complement is the standard representation for negative integers in modern computers

• This is because arithmetic operations are simple to implement when integers are stored this way (but this concept is beyond the scope of the course)

• Although on the surface it seems complicated, at a deeper level it allows for simplicity of operations
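
For the curious, a brief Python sketch hints at that simplicity: ordinary binary addition of the stored patterns, with the result kept to one byte, gives the correctly signed answer. The mask and variable names are assumptions for this illustration; the details remain outside the scope of these slides.

```python
MASK = 0xFF                 # keep results to 8 bits (1 byte)

minus_14 = 0b11110010       # two's complement pattern for -14
plus_14 = 0b00001110        # pattern for +14
plus_20 = 0b00010100        # pattern for +20

zero = (minus_14 + plus_14) & MASK
six = (plus_20 + minus_14) & MASK

print(format(zero, "08b"))  # 00000000
print(format(six, "08b"))   # 00000110
```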

Signed Negative Integer

• An alternative but equivalent method for converting a negative number to its two’s complement representation is:

– Ignore the sign and convert the decimal number to binary

– If needed add leading zeros to fill the storage location

– Flip all the bits

– Add 1 to the result of the last step

– Make the sign bit 1

• Some people find this easier

Signed Negative Integer

• For example, -14. First, convert 14 to binary (8 + 4 + 2 = 14) and make leading bits zero:

00001110

• Flip all the bits:

11110001

• Add 1:

11110010

• Make the sign bit 1:

11110010
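
The flip-and-add-1 method can also be sketched in Python. The function name twos_complement_flip_add is invented for the example; the last line cross-checks the result against Python's own bit masking of -14.

```python
def twos_complement_flip_add(number, width=8):
    """Two's complement of a negative number by flipping all bits and adding 1."""
    if not -(2 ** (width - 1)) <= number < 0:
        raise ValueError("expects a negative number that fits in the width")
    # Ignore the sign, convert to binary, and add leading zeros.
    magnitude = format(-number, "0{}b".format(width))
    # Flip all the bits.
    flipped = "".join("1" if b == "0" else "0" for b in magnitude)
    # Add 1 to the result; the sign bit is already 1 for values in range.
    plus_one = (int(flipped, 2) + 1) % (2 ** width)
    return format(plus_one, "0{}b".format(width))

print(twos_complement_flip_add(-14))                              # 11110010
print(twos_complement_flip_add(-14) == format(-14 & 0xFF, "08b")) # True
```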

Floating Point Number Representation

• Floating point numbers are those that have a decimal portion (mathematicians call these real numbers)

• Numbers like 3.14159, 50000.3, and 0.000005

• The method that is used allows for very large or very small numbers to be stored using the same format

Floating Point

• The main idea in this format is that the decimal point is allowed to “float”

• That is, there is an “actual” decimal location in the original number, and there is a “stored” decimal location that is usually different

• The original number is normalized by moving the decimal point so there is only one digit to the left

Floating Point

• The basic idea can be seen from an example although this description glosses over many details

• The number 102.39 is normalized by moving its decimal point two places to the left to become 1.0239, and this number is stored and is called the mantissa

• Also, the fact that the decimal point was moved left by 2 is stored so that the original number may be reconstructed and this is called the exponent

• The sign of the number is also stored (0 for positive or zero, 1 for negative)
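
The decimal version of this idea can be sketched with Python's decimal module. The function name normalize is hypothetical, and the sketch works in base 10 only, so it illustrates the concept and not the binary format a computer actually stores.

```python
from decimal import Decimal

def normalize(text):
    """Split a decimal number into sign, mantissa, and exponent (base 10)."""
    number = Decimal(text)
    sign = 1 if number.is_signed() else 0      # 0 for positive or zero, 1 for negative
    exponent = number.adjusted()               # places the point moves to the left
    mantissa = abs(number).scaleb(-exponent)   # one digit left of the decimal point
    return sign, mantissa, exponent

sign, mantissa, exponent = normalize("102.39")
print(sign, mantissa, exponent)                # 0 1.0239 2

# The original number can be reconstructed from the stored parts.
print(mantissa.scaleb(exponent))               # 102.39
```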

Floating Point

• However, it is actually more complicated than that

• The exponent and mantissa are actually stored in binary

• And the value stored as the mantissa is only the fractional part of the binary number once the point has been moved so that there is a single binary 1 at the left, that is, 1.101001 is stored as 101001 and the leading 1 is assumed

Floating Point

• The representation of numbers in floating point involves a couple of procedures that are complicated and beyond the scope of the course

• These are “repetitive multiplication of a decimal fraction by 2,” and the “excess system” for storing positive and negative numbers

• So, we won’t be converting the numbers manually ourselves

Floating Point

• However, the procedure used to store a number in floating point representation is:

– Store a 0 (positive) or 1 (negative) in the sign field

– Convert the integer part to binary

– Convert the decimal part (fraction) to binary by using “repetitive multiplication by 2”

– Combine the two binary numbers with a decimal point between

– Move the decimal point so that there is a 1 bit at the left and store the remaining bits in the mantissa field

– Store the number of places moved using the “excess system” in the exponent field
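
As a rough sketch of the procedure above, the Python code below converts the fraction part by repeated multiplication by 2 and then lets the struct module reveal the stored fields of a single-precision number. The field widths (1 sign bit, 8 exponent bits, 23 mantissa bits) and the excess value 127 come from the IEEE single-precision format and are not stated in these slides; the function names are invented for the example.

```python
import struct

def fraction_to_binary(fraction, max_bits=23):
    """Repeated multiplication by 2: each integer part produced is the next bit."""
    bits = ""
    while fraction and len(bits) < max_bits:
        fraction *= 2
        if fraction >= 1:
            bits += "1"
            fraction -= 1
        else:
            bits += "0"
    return bits

def float32_fields(value):
    """Return the sign, exponent, and mantissa bit strings of a 4-byte float."""
    (raw,) = struct.unpack(">I", struct.pack(">f", value))
    bits = format(raw, "032b")
    return bits[0], bits[1:9], bits[9:]

# 5.75: integer part 101, fraction part .11, so 101.11 = 1.0111 x 2^2
print(format(5, "b"), fraction_to_binary(0.75))   # 101 11

sign, exponent, mantissa = float32_fields(5.75)
print(sign, exponent, mantissa)    # 0 10000001 01110000000000000000000
print(int(exponent, 2) - 127)      # 2 (places the point was moved)
```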

Floating Point

• Computers store data in binary and in finite space, i.e., they are discrete, finite systems

• However real numbers form a continuous, infinite system

• Hence, computers can only approximate real numbers

• The precision of a floating point number is how close the stored number is to the original number
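
A common way to see this approximation is the Python sketch below. It uses 0.1, 0.2, and 0.3, which are examples chosen here and not taken from the slides; Decimal is used only to display the value that is actually stored.

```python
from decimal import Decimal
import math

total = 0.1 + 0.2

print(total == 0.3)               # False
print(total)                      # 0.30000000000000004
print(Decimal(0.1))               # the exact value stored for 0.1, slightly more than 0.1

# Comparing with a tolerance is the usual way around the small error.
print(math.isclose(total, 0.3))   # True
```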

Floating Point

• Small Basic Example:

• Mathematically c should be 0 but what does the program display for c?

' Two ways of computing two thirds
a = 2 / 3
b = 2 * (1 / 3)
' Mathematically the difference is zero
c = a - b
TextWindow.WriteLine(c)

Floating Point

• The more bits available for the mantissa field the more digits of the original number may be stored

• Programming languages normally allow the programmer to define the precision by the data type chosen

Floating Point

• Institute of Electrical and Electronics Engineers (IEEE) standards:

• Single-Precision (4 bytes)

• Double-Precision (8 bytes)
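
The two sizes, and the difference in precision they bring, can be observed with Python's struct module. This is a sketch only: the helper names through_single and through_double are invented, and 0.1 is an arbitrary test value.

```python
import struct

# Storage sizes of the two formats, in bytes.
print(struct.calcsize("<f"), struct.calcsize("<d"))   # 4 8

def through_single(value):
    """Store the value in 4 bytes and read it back."""
    return struct.unpack("<f", struct.pack("<f", value))[0]

def through_double(value):
    """Store the value in 8 bytes and read it back."""
    return struct.unpack("<d", struct.pack("<d", value))[0]

print(through_single(0.1))   # 0.10000000149011612
print(through_double(0.1))   # 0.1
```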

Floating Point

• Trade off:

• Double precision numbers require more space and therefore programs using them may run slower

• But operations using double precision numbers will be more precise

Summary

• Data are stored as bit patterns

• A bit pattern is a binary number

• There are various data type formats

• Characters are represented in ASCII

Summary

• Integers are represented as either

– Unsigned – stored as the binary number equivalent to the original

– Signed Positive – stored using the sign-and-magnitude format where the magnitude is the binary equivalent

– Signed Negative – stored using the sign-and-magnitude format where the magnitude is in the two’s complement format

• Floating point numbers are represented using sign, exponent, and mantissa

Terminology

• Data representation

• Bit

• Byte

• Bit pattern

• Binary number

• Character

• Integer

• Floating point number

• ASCII

• Control characters

• Extended ASCII

• Unicode

• Unsigned integer representation

• Overflow

• Sign-and-magnitude representation

• Sign

• Two’s complement representation

• Floating point representation

• Normalize

• Exponent

• Mantissa

• Precision

• Single-precision

• Double-precision
