
CMU-ITC-91-098

Multimedia Data Formats: Issues and Instances

Michael McInerny

12-Jun-91

ABSTRACT

Multimedia data formats are the permanent representation of a multimedia document and thus encode the information in that document. Users are limited in their expression ultimately by what the data format permits the application to store. Therefore, it behooves the system designer to be aware of the issues involved in creating and using data formats. This paper sketches some of the user requirements of multimedia, discusses the variety of issues concerning data formats (both common and media-specific), and examines a few key formats in use today.

Copyright (c) 1991 Michael McInerny
Information Technology Center
Carnegie Mellon University

This work was performed as a joint project of Carnegie Mellon University and the IBM Corporation. The views and conclusions contained in this document are those of the author and should not be interpreted as representing the official policies of the IBM Corporation or Carnegie Mellon University.


Table of Contents

Introduction
    Using Multimedia
        Formatted Text
        Rasters and Drawings
        Audio and Video
        Multimedia Documents
Technical Issues
    Common Issues
        Interchange versus Storage
        Internal Organization
        External References
        Scalability
    Static Media Issues
        Text Issues
            Page Description versus Layout Rules
            Deep or Shallow Markup Semantics
        Raster Issues
            Bi-level versus Continuous Tone
            Bitplane Depth
            Color Lookup Tables
            "Alpha Channel" Information
        Drawing Issues
            Application Domain
            Dimensionality
            Richness of Primitives
    Continuous Time Media Issues
        Sampling Rates and Depth
        "Lossy" Compression
        Bandwidth from Storage to Display
        Latency in Delivery
    Compositional Media Issues
        Extensibility
        Reusability
Survey of Data Formats
    Static Media Formats
        Formatted Text
            Scribe and LaTeX
            ATK's Ez
            RTF
            ODA
            SGML & DSSSL
        Rasters
            PBM
            CCITT G3/G4 Fax
            JPEG
            TIFF
        Drawings
            IGES
            CGM
            NAPLPS
            PostScript
    Continuous Time Formats
        Audio
            CD & DAT
            Speech
        Music
            MIDI
        Motion Pictures
            DVI
            MPEG
    Compositional Formats
        IFF
        ATK
        ODA
        SGML
Conclusions
    Access to Substructure
    Mixing Media
    Use in an Information Management System
References

Introduction

The success of a multimedia system depends partly on three factors: its ability to exchange documents with established monomedia applications, access and organize the information in those documents, and exchange documents with other multimedia systems. These factors necessitate an understanding of multimedia data formats and how these formats impact storage and retrieval.

In addition, as multimedia capable systems grow in usage, data formats are being called upon to support novel applications. In particular, users want to freely mix media, reuse parts of other documents, and organize their multimedia documents with information management systems.

In this paper, we will examine the user requirements for multimedia, outline the key technical issues concerning the storage and interchange of each medium, and survey some of the prominent data formats and how they stand against the technical issues.

Using Multimedia

While a digital computer represents all information in terms of numbers (by definition), to require users to deal with media such as text and drawings in terms of their numeric representations would be counterproductive. With the advent of bit-mapped displays, applications can now present images that more closely match the cognitive expectations of the user. For instance, rather than print a list of vectors, a computer-aided design (CAD) application can project a drawing of the design. While arbitrary media are possible, there are four broad classes of media a user is likely to encounter: text, drawings, audio/video, and documents composed of several media.

Formatted Text

Modern text editors let a user add typographical information (often called "styles") to their text and show the effect of these styles on-screen as the user works with the text. For instance, emboldened words look heavier, titles appear in larger type, and subscripts are smaller and sit below the baseline.

Some editors, recognizing that most styles are used to convey the structure of the document or additional semantics of the text, try to capture this information directly: an author builds a hierarchy of paragraphs, subsections, sections, and chapters; or marks passages as quotations. The text editor then induces the appropriate stylistic layout, using author-adjustable parameters.

Rasters and Drawings

Like a tile mosaic, raster images allow a user to manipulate the image down to the picture element (i.e. pixel), or more broadly, as with a brushstroke. Rasters are a flexible medium, as indicated by the plurality of methods for working with them: paint programs allow artists access to a digital canvas via simulated brushes and pencils; scanners and frame grabbers allow photographs and video stills to be captured; image editors allow pictures to be cropped, cut, pasted, rotated, and filtered.

A more structured approach to imaging can be found in drawing editors. Instead of manipulating the pixels, a user defines the image in terms of its component geometric shapes, such as lines, polygons, and circles. More sophisticated drawing editors work in three dimensions and provide the ability to specify surfaces and solids, even texture, material, lighting sources and manufacturing tolerances. Almost all allow the user to package a set of specifications together for reuse as a component, creating a personal or organizational library of designs.

The process of turning a drawing into an image is called rendering. Every drawing program is capable of at least enough rendering to allow the user to interactively manipulate the drawing. But there exist accurate and computationally expensive rendering operations such as ray-tracing and radiosity which can also be employed to generate photo-realistic images. Rendering a drawing into a raster freezes a perspective on the drawing, and opens up the possibility of an artist altering the image.

Audio and Video

Digital audio and video allow sound and motion pictures to be recorded, stored, and played back under computer control. Furthermore, because the media are sampled digitally, the computer has very fine control over how the sample is played back and when. One of the chief advantages that this technology has over computer-controlled tape or laser disc is that careful programming can eliminate the seek times, or pauses, when jumping from one point in the sample to another. Another advantage is that the samples can be digitally reprocessed and filtered, both to remove noise and perform other interesting effects. Digitizing video/audio also permits intermixing of other digital media, e.g., putting a graphic or some text on a video image.

Basic digital audio and video editors use a splice operation to assemble sequences, much like their tape or film counterparts. Filtering and other signal processing effects are becoming more popular, and start to take advantage of the flexibility of the digital medium. The most interesting audio/video applications allow an instructor to create programmed or interactive sequences.

Multimedia Documents

Once an author has access to several different media, he will want to compose a multimedia document, that is, a document which uses more than one medium (e.g. an article with figures). The ability to compose media varies from system to system: some systems allow only an aside, where the non-text media can be requested to appear in another window; others allow in-line media, but only in alternation with the text; the richest multimedia editors allow any media to contain any other media ad infinitum, so that a text can contain a table which contains more formatted text within which lies a raster. The Andrew Toolkit [Palay88] is an example of such a system, and it was used to produce this paper.



Technical Issues

In order to compare data formats, it is necessary to enumerate the technical issues against which the formats can be evaluated. There are some issues common to all data formats, but each media type also adds new issues. For example, continuous time media (like video or audio) have a real-time performance aspect to their presentation, unlike static media (like text or rasters).

Common Issues

All data formats can be evaluated on the issues of interchange, organization, their ability to support external references, and scale.

Interchange versus Storage

Many applications choose to store their information in a private, native format, often for reasons of space or time efficiency. These applications usually feature an option to write the document in a transportable fashion to be used for interchange with other applications. Data formats optimized for storage often take advantage of machine-specific encodings to increase speed and decrease storage costs; implicit and application-specific references are used; and hardware characteristics are assumed. Formats for interchange are careful to either avoid these assumptions or document them explicitly. For instance, numbers can be written out in ASCII and tags can be generated for use as targets of references. Frequently the resulting datastream is readable and can even withstand some careful editing. Unfortunately, this is not always true: the ISO abstract syntax notation (ASN.1 [ISO-8824]) has a binary encoding (BER [ISO-8825]) that does not lend itself to human interpretation and is difficult to modify [Rosenberg91].
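The difference between a machine-specific encoding and an ASCII interchange encoding can be illustrated with a minimal Python sketch. The function names and the sample values are our own assumptions, chosen only for illustration; they are not taken from any of the formats surveyed here.

```python
import struct

def encode_binary(values, byte_order):
    """Pack 32-bit integers; byte_order is '<' (little) or '>' (big endian)."""
    return struct.pack(f"{byte_order}{len(values)}i", *values)

def encode_ascii(values):
    """Write the numbers as decimal ASCII text, one per line."""
    return "".join(f"{v}\n" for v in values).encode("ascii")

def decode_ascii(data):
    """Read the numbers back; no knowledge of the writing machine is needed."""
    return [int(token) for token in data.decode("ascii").split()]

values = [1, 256, 70000]          # hypothetical sample data
little = encode_binary(values, "<")
big = encode_binary(values, ">")
text = encode_ascii(values)

# The binary form is compact, but a reader that assumes the wrong byte
# order silently recovers different numbers.
misread = list(struct.unpack(">3i", little))
```

The binary encodings occupy a fixed 12 bytes but differ between the two byte orders, whereas the ASCII form is larger, readable, and decodes identically everywhere.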

A data format designed for storage can become an interchange standard if enough applications support it. In addition, interchange formats are sometimes used as native storage formats, so that an application has interchange capabilities "built-in".

Internal Organization

The ways of organizing the internal layout of a datastream encompass format, separation of data from form, and sorting. Older formats mimic a deck of punched cards with fixed length and fixed format records while newer formats use a grammar with data tucked between delimiters or keywords. Structured documents raise the issue of whether to mix structure and data, or to group all the raw data and use pointers from the structure back into the data. Sorted data are easier to search than unordered data, but the sort operation can be computationally expensive.
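As a rough illustration of the two record styles, the following Python sketch parses the same information from a fixed-column record and from a keyword-delimited record. The column positions, the `key=value;` syntax, and the field names are all hypothetical.

```python
def parse_fixed(record):
    """Fixed-format record (assumed layout): columns 0-9 name,
    10-14 width, 15-19 height, as on a punched card."""
    return {
        "name": record[0:10].rstrip(),
        "width": int(record[10:15]),
        "height": int(record[15:20]),
    }

def parse_keyword(record):
    """Keyword record (assumed syntax): 'key=value' pairs separated
    by semicolons, in any order."""
    fields = dict(pair.split("=", 1) for pair in record.split(";") if pair)
    return {
        "name": fields["name"],
        "width": int(fields["width"]),
        "height": int(fields["height"]),
    }

fixed = "logo".ljust(10) + "64".rjust(5) + "48".rjust(5)
keyed = "height=48;name=logo;width=64"
```

The fixed record is trivially indexed but breaks if any field outgrows its columns; the keyword record tolerates reordering and longer values at the cost of a parsing step.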

External References

The ability to name or point to a component or subsequence of a medium is dependent upon the structure of the datastream: whether it supports tagging and clustering, or uses compression. Tags allow the author or reader to name a point or a region of interest within the document, which will be adjusted accordingly across edits. Clusters are linked components, which are preserved across edits like tags but can refer to a set of interesting regions. Compression is used to reduce storage requirements, but it interferes with external references since it may become necessary to decompress some or all of the data in order to extract the target region. Most data formats simply are not designed to allow for arbitrary access to the document.
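The cost that compression imposes on references can be sketched in Python. We assume, purely for illustration, a document stored as a single zlib stream with no index; the sample document and the helper names are hypothetical.

```python
import zlib

LINE = b"line %04d of the document\n"
LINE_LEN = len(LINE % 0)

# Hypothetical document: 1000 fixed-length lines.
document = b"".join(LINE % i for i in range(1000))
compressed = zlib.compress(document)

def read_region_plain(data, offset, length):
    """Uncompressed data: the target region is a direct slice."""
    return data[offset:offset + length]

def read_region_compressed(blob, offset, length, chunk=64):
    """Single compressed stream: decompress from the very start until
    the target region has been produced. Returns the region and the
    number of compressed bytes that had to be consumed."""
    decoder = zlib.decompressobj()
    produced = bytearray()
    consumed = 0
    for start in range(0, len(blob), chunk):
        piece = blob[start:start + chunk]
        produced += decoder.decompress(piece)
        consumed += len(piece)
        if len(produced) >= offset + length:
            break
    return bytes(produced[offset:offset + length]), consumed

target = 500 * LINE_LEN
region, consumed = read_region_compressed(compressed, target, LINE_LEN)
```

A format that compresses independent blocks and keeps a block index would reduce this cost, but that is a design choice the format must make explicitly.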

Scalability

The ability of a format to handle large and small documents is its scalability. There are often hard limits to the size of the information being represented: number of dimensions, extent of the dimensions, resolution, total size. In addition to considering how well a format deals with large or complex documents, it is also interesting to note how well it deals with very small or simple documents. If the format requires a lot of overhead in coding the document, authors of applications which generate small documents will probably forego the complicated format in favor of a simpler, less general format or even create a private format.

Static Media Issues

While static media (like text) lack the performance aspect of continuous time media, in many ways they are more sophisticated than their real-time cousins, owing to their need to convey as much information as concisely as possible. The most common types of static media are formatted text, rasters, and drawings.

Text Issues

Page Description versus Layout Rules

When storing formatted text, the datastream can represent either the structure of the document or the formatting. A document's structure includes the text of the document and indicates the layout rules. It is easy to edit and it preserves an author's semantic structure (e.g. SGML). The document's formatting indicates how the text should be positioned on a page and is useful for printing (e.g. PostScript). Some systems allow both types of information to be stored (e.g. ODA).

Deep or Shallow Markup Semantics

Text editors with shallow markup semantics allow text to be marked with style macros, which translate a tag like "chapter" into "new page, bold, bigger, and left justify" typographic conventions (e.g. ATK). While this is sufficient for creating a printed page, it fails to capture the notion that the paragraphs and subsections following the chapter title, up to the next chapter, are logically part of the chapter. In other words, the semantics of "chapter/section/subsection" as a hierarchy are completely lacking. An editor with deep markup semantics actually organizes the text internally as a hierarchy (e.g. IBM's Quill for SGML). Deep markup semantics produce a datastream that can be unambiguously parsed without assumptions as to the semantics meant by the style information.
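To make the distinction concrete, the following Python sketch recovers a hierarchy from a shallow, flat sequence of styled paragraphs. The recovery depends on an assumption the shallow datastream never states: that a heading style opens a container which lasts until the next heading of the same or a higher level. The style names and the tree representation are hypothetical; a deep datastream would store the tree directly and need no such guess.

```python
def build_hierarchy(flat, levels=("chapter", "section", "subsection")):
    """Infer a tree from a flat list of (style, text) pairs.

    ASSUMPTION: the order of `levels` gives the nesting order, and a
    heading closes every open container at its own level or deeper.
    """
    root = {"style": "document", "text": "", "children": []}
    stack = [(-1, root)]                      # (depth, node) of open containers
    for style, text in flat:
        node = {"style": style, "text": text, "children": []}
        if style in levels:
            depth = levels.index(style)
            while stack[-1][0] >= depth:      # close finished containers
                stack.pop()
            stack[-1][1]["children"].append(node)
            stack.append((depth, node))
        else:                                 # ordinary paragraph
            stack[-1][1]["children"].append(node)
    return root

flat = [
    ("chapter", "One"), ("body", "p1"),
    ("section", "1.1"), ("body", "p2"),
    ("chapter", "Two"), ("body", "p3"),
]
tree = build_hierarchy(flat)
```

If an author uses the "chapter" style merely to obtain large bold type, the inferred tree is wrong, which is exactly the ambiguity that deep markup avoids.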



Raster Issues

Bi-level versus Continuous Tone

A bi-level raster¹ uses only two colors, usually black and white. A continuous tone image allows the use of many colors to express the image. Certain compression techniques, such as run-length encoding, work well on bi-level rasters (e.g. in Group 3 fax) and miserably on continuous tone images. Likewise the converse holds true for techniques like discrete cosine transforms (DCTs, e.g. as used in JPEG) [Chen84]. Bi-level images are usually smaller, since they have an obvious binary encoding. However, fractional scaling of bi-level images suffers from aliasing, while continuous tone images can use antialiasing techniques to overcome this problem.
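The sensitivity of run-length encoding to image type can be seen in a minimal Python sketch. This is plain run-length counting, not the Huffman-coded scheme of Group 3 fax, and the two sample scan lines are hypothetical.

```python
def run_lengths(pixels):
    """Collapse a scan line into [value, count] runs."""
    runs = []
    for pixel in pixels:
        if runs and runs[-1][0] == pixel:
            runs[-1][1] += 1
        else:
            runs.append([pixel, 1])
    return runs

def expand(runs):
    """Inverse of run_lengths: rebuild the scan line."""
    return [value for value, count in runs for _ in range(count)]

# Hypothetical bi-level scan line: long stretches of identical pixels.
bilevel = [0] * 40 + [1] * 8 + [0] * 16

# Hypothetical continuous tone scan line: every neighbour differs.
gray = [(i * 37) % 256 for i in range(64)]

bilevel_runs = run_lengths(bilevel)
gray_runs = run_lengths(gray)
```

The 64-pixel bi-level line collapses to three runs, while the continuous tone line yields one run per pixel, so the "compressed" form is larger than the original.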

Bitplane Depth

The number of colors available in an image is capped at an upper bound of two raised to the number of bit planes which make up the raster. Thus, an 8-bit deep raster has a palette of 2^8 or 256 colors, and a 24-bit deep raster has 2^24 (over 16 million) colors (e.g. TIFF).
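The arithmetic is easily checked. The following Python sketch computes the palette size and, under the assumption of tightly packed pixels with no row padding or header, the uncompressed size of a raster; the 640 by 480 dimensions in the usage note are a hypothetical example.

```python
def palette_size(bit_planes):
    """Upper bound on distinct colors: two raised to the number of planes."""
    return 2 ** bit_planes

def raster_bytes(width, height, bit_planes):
    """Uncompressed size in bytes, ASSUMING pixels are packed with no
    row padding and no header."""
    return (width * height * bit_planes + 7) // 8
```

For a 640 by 480 raster, one bit plane needs 38,400 bytes, eight planes need 307,200 bytes, and twenty-four planes need 921,600 bytes, so each added plane doubles the palette while storage grows only linearly.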

Color Lookup Tables

A raster can use a means of indirection called a color lookup table (CLUT), to increase the apparent depth of the image (e.g. GIF). For instance, a picture of an apple on a red tablecloth will need many different shades of red, but few shades of blue and green. So, a CLUT can be created to reflect this imbalance in the color distribution, and the pixels in the raster will be assigned the index value of the appropriate color in the CLUT. This technique allows 24-bit images to be compressed to 8 bits, with minimal loss of information.
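A minimal Python sketch of the indirection follows. The table is built by simply keeping the most frequent colors, which is our own simplifying assumption; practical quantizers use more careful methods. The pixel values are hypothetical (red, green, blue) triples.

```python
from collections import Counter

def build_clut(pixels, max_entries=256):
    """Choose table entries by popularity (an assumed, simplistic policy)."""
    return [color for color, _ in Counter(pixels).most_common(max_entries)]

def nearest_index(color, clut):
    """Index of the table entry closest to `color` in squared RGB distance."""
    return min(
        range(len(clut)),
        key=lambda i: sum((a - b) ** 2 for a, b in zip(color, clut[i])),
    )

def index_image(pixels, clut):
    """Replace each 24-bit pixel by its index into the table."""
    return [nearest_index(pixel, clut) for pixel in pixels]

# Hypothetical image: mostly reds, one stray blue pixel.
pixels = [(200, 0, 0)] * 5 + [(180, 10, 10)] * 3 + [(0, 0, 255)]
clut = build_clut(pixels, max_entries=2)
indices = index_image(pixels, clut)
```

With a two-entry table the eight red pixels are stored exactly, while the lone blue pixel is mapped to its nearest table entry; this is the "minimal loss" that the skewed color distribution makes tolerable.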

"Alpha Channel" Information

It is also possible to allocate additional planes to encode information about the pixels in the image. This so-called alpha channel information can be used as a mask to indicate what parts of the image are in the foreground versus the background, the level of transparency in different regions, even range or distance from the camera (e.g. TIFF).

Drawing Issues

Application Domain

Drawing editors are usually targeted to a domain: oriented towards producing certain types of documents (e.g. computer aided design or CAD). An application's emphasis on allowing the user to express some operations rather than others has an impact on what the datastream is capable of expressing. For instance, a CAD program can store three-dimensional shapes, while a graphic design program cannot. On the other hand, the graphic design program might provide an artist the ability to manipulate font outlines and save that information in a datastream which the CAD program could not read because it does not know how to manipulate fonts in such a manner.

¹ The term monochrome is sometimes used, but monochrome more precisely refers to a continuous tone image made up of shades of one hue.


Dimensionality

One of the basic differences among drawing editors is the number of dimensions they will handle. Graphic designers need only the two dimensions of a sheet of paper. Semiconductor and circuit board layout needs several co-planar and inter-connected 2-D surfaces. CAM systems need to be able to specify parts in three dimensions. The impact is that 2-D systems know nothing about wire frames, surfaces, and solids, and 2-D datastreams cannot represent these forms.

Richness of Primitives

Simple drawing programs provide only lines and polygons as primitives, while sophisticated editors allow curves, surfaces, and solids. Some editors have libraries of predefined shapes like logic symbols or gears. Simple editors can deal with complex shapes only when they are broken down into their simplest elements, losing the semantics of the object.

Continuous Time Media Issues

Because of their novelty, most attention has been focused on the storage of continuous time media for straightforward delivery alone. Unfortunately little has been done to enable users to structure, annotate, synthesize, compose, and synchronize these media. In any case, the real-time component of continuous time media presentation and perception presents the data format designer with challenges in bandwidth, latency, sampling rates, compression, and synchronization.

Sampling Rates and Depth

Continuous time media all start and end as analog phenomena. The process of capturing the analog signal and turning it into a digital stream is called sampling. The frequency at which the analog stream is sampled is the sample rate. The precision to which it is sampled is called the sample depth. Both affect fidelity: the sample rate limits the highest frequencies which can be reproduced, and the sample depth limits the dynamic response, or the ability to capture subtle changes in the signal.
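Both limits can be estimated with standard formulas, sketched here in Python. The highest reproducible frequency is half the sample rate (the Nyquist limit), and the ideal dynamic range of linear samples is 20 log10(2^n) decibels for n bits, roughly 6 dB per bit. The compact disc figures in the usage note (44,100 samples per second, 16 bits) are well-known values that we supply for illustration; they are not drawn from the text above.

```python
import math

def nyquist_limit_hz(sample_rate_hz):
    """Highest frequency that a given sample rate can represent."""
    return sample_rate_hz / 2

def dynamic_range_db(sample_depth_bits):
    """Ideal dynamic range of linear samples of the given depth."""
    return 20 * math.log10(2 ** sample_depth_bits)
```

For compact disc audio these give a limit of 22,050 Hz and a dynamic range of about 96 dB; halving the sample depth to 8 bits halves the dynamic range to about 48 dB.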

"Lossy" Compression

There is a class of compression techniques which result in a loss of information on decompression. These lossy compression techniques can be used effectively when the human perceptual system is relatively insensitive to the information lost. For instance, NTSC video encodes the signal into two components: a high-frequency luminance signal for creating high-resolution black and white images, and a low-frequency chroma signal, for adding color washes to the luminance information [Lippman89, Alexander90].

Bandwidth from Storage to Display

Bandwidth is the volume of data (or data rate) required for presentation of the media. Audio needs around one and a half million bits per second (1.5 Mbits/second) for high quality stereo (CD data rate). Television-grade video needs 30 Mbits/second (NTSC data rate), and high definition television boosts that requirement to over 100 Mbits/second. A serious problem develops if the storage device or operating system overhead puts a ceiling on this bandwidth; the only solution is to use compression to lower the bandwidth through the system and decompression at the output device.
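The audio figure and the resulting compression requirement can be worked through in a short Python sketch. The compact disc parameters (44,100 samples per second, 16 bits, 2 channels) are well-known values supplied by us, and the 1.2 Mbits/second device ceiling used in the usage note is a hypothetical figure chosen only for illustration.

```python
def pcm_bits_per_second(sample_rate_hz, sample_depth_bits, channels):
    """Raw data rate of uncompressed sampled audio."""
    return sample_rate_hz * sample_depth_bits * channels

def required_compression_ratio(media_bps, device_bps):
    """Smallest compression ratio that fits the media under the device
    ceiling; 1.0 means no compression is needed."""
    return max(1.0, media_bps / device_bps)

cd_audio_bps = pcm_bits_per_second(44100, 16, 2)
```

The raw stereo rate comes to 1,411,200 bits per second, consistent with the rounded figure of 1.5 Mbits/second. Against a hypothetical device ceiling of 1.2 Mbits/second, 30 Mbits/second video would need a compression ratio of 25 to 1.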

Latency in Delivery

For continuous time media, it is important to estimate the latency, or delay time, of a data format. A data format can contribute to latency through the use of encodings (e.g. compression) and poor organization. Encodings which are difficult to code or decode can cause data to be delayed, resulting in choppy performance. Poor organization may cause excessive seeking on the hardware device in order to collect all the required data, and the cumulative effect of all the seek times may also cause the data to be delayed. A bad format can also prevent quick error recovery or amelioration.

Compositional Media Issues

Compositional media systems attempt to bring different kinds of media together in a single document. The measure of how flexible a compositional system is in incorporating new media types is called its extensibility. Reusability is the ability to encode and reuse common information.

Extensibility

A compositional system which defines support for only a few media types will soon become obsolete as new media types are invented. A better approach is to allow the datastream (and the application) to support an extension system whereby new media types can be introduced and incorporated into the document. Some systems are extremely flexible and have a well-defined architecture for supporting new media, while other systems provide only a trap-door approach of adding private data. The latter approach is often cumbersome and usually lacks the appropriate semantic support for the media.

Reusability

The ability to record common information once and reuse it later results in smaller, more manageable documents. Multimedia implies different kinds of information, but it is also convenient to support multiple views on a common set of information, e.g.: two different projections of a 3-D drawing, or a table of numbers and a graph of that dataset. Functionally, there is one set of data supporting the multiple views, so that any change to the data from one view will cause all of the views to be updated. A compositional data format for such a system would need a way of encoding a view's reuse of data from another source in the document (or even another document).
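One possible shape for such an encoding is sketched below in Python: the data are recorded once under an identifier, and each view names its source instead of carrying a copy. The dictionary layout, the identifier "d1", and the two view types are hypothetical and do not correspond to any format surveyed in this paper.

```python
# Hypothetical compositional document: one dataset, two views of it.
document = {
    "data": {"d1": [1, 3]},
    "views": [
        {"type": "table", "source": "d1"},
        {"type": "graph", "source": "d1"},
    ],
}

def render(doc):
    """Render every view from the shared data it refers to."""
    output = []
    for view in doc["views"]:
        values = doc["data"][view["source"]]      # follow the reference
        if view["type"] == "table":
            output.append(" ".join(str(v) for v in values))
        elif view["type"] == "graph":
            output.append(" ".join("#" * v for v in values))
    return output

before = render(document)
document["data"]["d1"][0] = 2                     # one edit to the shared data
after = render(document)
```

Because both views resolve the same reference, a single change to the dataset appears in the table and the graph alike, and the document stores the numbers only once.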

Survey of Data Formats

There exist far too many formats for any one paper to document. What follows is a brief summary of some of the more common data formats: how they stand on the technical issues and any additional facts about the format related to the issues.



Static Media Formats

Formatted Text

Scribe and LaTeX

SCRIBE and LaTeX are two non-interactive document preparation systems. Both require that the user employ the text editor of his choice to create the source document, that is, the datastream for the document formatter program itself. Thus, each is a storage format. The text is marked up semantically, but the structure is mostly shallow (SCRIBE does allow the markup forms to create an environment, and environments can be nested). The advanced user can alter the layout rules, but not the casual user. While both systems require a tiny amount of document overhead, there is no limit to the length of a document. There are no provisions for referencing pieces of another document, but there is support for including outside files as subdocuments, and also for dealing with an external bibliographic database. [Reid80, Lamport86]

ATK's Ez

The Andrew Toolkit (ATK) is an interactive multimedia system, which users know through its formatted text application "ez" [Palay88]. The datastream was designed for storage, not interchange. However, by convention the datastream is written using only printable ASCII characters and line lengths less than 70 columns [Borenstein90]. Because of these conventions, ATK datastreams can be passed through nearly all electronic mailers anywhere on the Internet, Bitnet, or Usenet, allowing ATK documents to be mailed from one site to another. Ez can manipulate plain text documents, but styled documents have a small amount of overhead and a few encoding rules. Documents can be arbitrarily long. The page layout is accomplished through rules, which can be altered by casual users. More complicated layout changes (e.g. 2-column text) can be achieved by expert users through backdoor hooks into troff and PostScript (which together form the underlying printing model). However, the markup semantics are purely shallow, and there is no provision to reference pieces of an ATK document. It is possible to reference another ATK document in its entirety, but only by expert users.

RTF

Microsoft's Rich Text Format (RTF) is an industry standard format for the interchange of text with typesetter information [Microsoft90]. The format is grammar-based and oriented towards page layout, although there is limited support for style sheets. The semantics are shallow, but assume a section and paragraph organization to the document. There are no means for referencing other documents, but there are facilities for associating pieces of text (e.g. footnotes and annotations). There is no limit to the length of the text, but the style and font tables are limited in their size. There is a facility for including bi-level rasters and for organizing text into rows and columns.

ODA

The Office Document Architecture (ODA) is an international standard for the interchange of formatted and processable documents [ISO-8613, Rosenberg91, Brown89]. The file format is specified by a grammar, and there are currently two defined encodings: a binary ASN.1 [Rose90, ISO-8824, ISO-8825] encoding called ODIF, and an SGML encoding called ODL/SDIF.² Because of the encoding, it is tedious to represent small documents; on the other hand, documents may have unlimited length. ODA documents can have multiple page layouts, and can be laid out with rules as well. The structure of a document is deep. It is not possible for one ODA document to include another, but it is possible to reference a portion of another document.

SGML & DSSSL

The Standard Generalized Markup Language (SGML) is an international standard for the markup of documents [ISO-8879, ISO-8879/A1, Brown89]. It defines the syntax to be used for declaring and adding structural and other types of markup to a document. It does not specify how to lay out the document. A companion standard, which is in progress, Document Style and Semantics Specification Language (DSSSL), does provide a standardized way to add layout rules to an SGML document [ISO-10179]. SGML documents can be encoded with very little overhead, and SGML imposes no restriction on the length of a document. SGML supports the inclusion of whole documents (called subdocuments) as its only form of external reference. However, SGML does provide the ability to specify multiple versions of a document in a single file/encoded form.

Rasters

PBM

PBM is a collection of utilities for interchanging and manipulating rasters. It is available with the MIT X Windows distribution or from the Usenet comp.sources.unix archives. Early versions supported only bi-level rasters and stored the data with a brief header followed by the bits of the image, each bit encoded as an ASCII one or zero character. Newer versions of the PBM package support grayscale (8 bits deep) and color (up to 24 bits) images. The encoding remains ASCII-based, but a more compact non-compressed binary encoding also exists. While the ASCII encoding is highly transportable, facilitating interchange, it is very bulky, and because arbitrary white space can be present in the file, references into the file are difficult. However, the binary encoding is reasonably space efficient, and offsets into the file can be calculated to select individual pixels (or a range of pixels). The encoding has almost no overhead, and permits arbitrarily large rasters. There are no provisions for a CLUT or an alpha-channel. There are, however, numerous filters for converting to and from many different formats and display devices.
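The difference between the two encodings can be sketched in a few lines of Python. This is an illustration only, assuming the plain bitmap layout ("P1" header, then ASCII 0/1 characters) and the raw 8-bit grayscale layout ("P5" header, then one byte per pixel) of the PBM package; the function names are hypothetical and comments in headers are not handled.

```python
def encode_plain_pbm(rows):
    """Encode a bi-level raster as plain (ASCII) PBM: header, then 0/1 characters."""
    height, width = len(rows), len(rows[0])
    lines = ["P1", f"{width} {height}"]
    lines += [" ".join(str(bit) for bit in row) for row in rows]
    return ("\n".join(lines) + "\n").encode("ascii")


def encode_raw_pgm(rows, maxval=255):
    """Encode an 8-bit grayscale raster as raw PGM; returns (header, whole file)."""
    height, width = len(rows), len(rows[0])
    header = f"P5\n{width} {height}\n{maxval}\n".encode("ascii")
    body = bytes(value for row in rows for value in row)
    return header, header + body


def pixel_offset(header_len, width, x, y):
    """Byte offset of pixel (x, y) in the raw encoding: one byte per pixel, row-major."""
    return header_len + y * width + x


gray = [[0, 10, 20], [30, 40, 50]]
header, data = encode_raw_pgm(gray)
# The pixel at column 1, row 1 can be fetched without scanning the file.
value = data[pixel_offset(len(header), width=3, x=1, y=1)]
```

In the ASCII form no such offset exists, because the amount of white space between pixels is not fixed.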

CCITT G3/G4 Fax

The CCITT Group 3 [CCITT-88-VII.3, Warner90] one-dimensional encoding for bi-level images is a scan-line compression scheme which uses a predefined Huffman table to encode runs of black and white pixels. Group 3 two-dimensional coding (which is also used by Group 4) allows the compression of a scan line by encoding its delta from the previous line, again by using a predefined delta coding table. Both coding schemes take one page at a time and produce a bit string. The bit string makes arbitrary reference into the file meaningless, and the two-dimensional coding requires the decoding of all the scan lines back to the reference scan line (which in Group 4 is at the top of the page!).

2 Currently, all implementations of ODA use the ASN.1 encoding.



CCITT Group 3 (one-dimensional) compression is used primarily by telefacsimile (fax) machines [Glass91]. In addition to the raster encoding, however, fax machines also exchange other information about the transmission (number of pages, identity of the caller) using additional fax protocols. CCITT Group 4 (two-dimensional) encoding is becoming increasingly popular for use as the digital storage format for page images (scanned microfilm, electronic paper archive).

Because the implicit tables were designed to support pages of text and simple line drawings, half-toned images compress poorly. However, on pages of text, a tenfold compression is common.
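The first stage of the one-dimensional scheme, turning a scan line into runs, can be sketched as follows. This is a simplified illustration, not the CCITT coder: the predefined Huffman table that maps each run length to a code word is omitted, and the function names are hypothetical. It assumes 0 stands for white and 1 for black, and that each line starts with a white run (of length zero if the first pixel is black).

```python
def run_lengths(scan_line):
    """Return alternating white/black run lengths for one scan line, white first."""
    runs = []
    current = 0   # colour of the run being counted; a line starts with white
    count = 0
    for pixel in scan_line:
        if pixel == current:
            count += 1
        else:
            runs.append(count)   # close the current run (may be 0 at line start)
            current = pixel
            count = 1
    runs.append(count)
    return runs


def expand_runs(runs):
    """Inverse of run_lengths: rebuild the pixels from alternating runs."""
    line = []
    colour = 0
    for length in runs:
        line.extend([colour] * length)
        colour ^= 1
    return line


text_like = [0] * 40 + [1] * 3 + [0] * 20 + [1] * 2 + [0] * 35
halftone = [0, 1] * 50
print(len(run_lengths(text_like)), len(run_lengths(halftone)))
```

A line with long white stretches collapses to a handful of runs, while an alternating half-tone pattern produces one run per pixel, which is consistent with the poor half-tone compression noted above.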

JPEG

The Joint Photographic Experts Group [JPEG90, Wallace91, Baran90] has defined a lossy compression scheme based on the Discrete Cosine Transform (DCT). The DCT is simply the cosine component of the Discrete Fourier Transform. Performing a DCT on an image produces an array, which when quantized yields many zero elements. This transformed array can be compressed using well-known sparse matrix compression techniques. However, the quantization which leads to the sparse matrix also causes the decompressed image to differ from the original image. Due to the nature of the compression, small amounts of quantization produce negligible loss of quality. Because the quantization factor can be increased, the amount of compression (and also the degradation in image quality) can be increased. Color images up to 24 bits deep can be compressed. There are no facilities for a CLUT or an alpha-channel. Because of the compression, it is only possible to refer to the whole frame and there is no facility to reference other images in a JPEG file.
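The effect of quantizing DCT coefficients can be seen in a one-dimensional sketch. This is an illustration under simplifying assumptions, not the JPEG algorithm: it transforms a single row of eight samples with an orthonormal DCT and uses one uniform quantization step, whereas JPEG works on two-dimensional blocks with a table of step sizes and follows quantization with entropy coding. All function names are hypothetical.

```python
import math


def dct_1d(samples):
    """Orthonormal DCT-II of a short list of samples."""
    n = len(samples)
    out = []
    for k in range(n):
        scale = math.sqrt(1 / n) if k == 0 else math.sqrt(2 / n)
        total = sum(s * math.cos(math.pi * (2 * i + 1) * k / (2 * n))
                    for i, s in enumerate(samples))
        out.append(scale * total)
    return out


def idct_1d(coeffs):
    """Inverse of dct_1d (an orthonormal DCT-III)."""
    n = len(coeffs)
    out = []
    for i in range(n):
        total = 0.0
        for k, c in enumerate(coeffs):
            scale = math.sqrt(1 / n) if k == 0 else math.sqrt(2 / n)
            total += scale * c * math.cos(math.pi * (2 * i + 1) * k / (2 * n))
        out.append(total)
    return out


def quantize(coeffs, step):
    """Divide by the step and round: small coefficients become zero."""
    return [round(c / step) for c in coeffs]


def dequantize(levels, step):
    return [level * step for level in levels]


row = [100, 102, 104, 106, 108, 110, 112, 114]      # a smooth ramp of pixel values
levels = quantize(dct_1d(row), step=16)              # most entries are zero
restored = idct_1d(dequantize(levels, step=16))      # close to, but not equal to, row
```

Raising the step produces more zeros and a larger reconstruction error, which is the trade-off between compression and image quality described above.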

The JPEG standard defines a file format which is self-describing, and thus is suitable for interchange. The high degree of compression makes the format suitable for storage as well. However, as the processing is tedious and geared towards larger rasters, small rasters are unlikely to be encoded with JPEG. Several vendors will license software to perform the compression and have chip-level hardware which will perform the compression at high speed--fast enough to compress NTSC video frames to fast hard drives [Webster90, Rosenthal90].

TIFF

The Tagged Image File Format (TIFF) is an industry standard promoted by both Aldus Corp. and Microsoft Corp. [TIFF5.0, Graef89]. It is a standard file format which allows for the encoding of bi-level, grayscale, CLUT, and full-color RGB images, with various depths, and optional compression. It was designed for interchange among desktop publishing (DTP) applications, but is suitable for storage as well. For each image in the file (there can be more than one), there is a header for the image which is a set of named parameters. The parameters are named, or "tagged", so that they can be omitted (defaulted) or the list of tags extended. Pointers are used heavily to point to variable-sized blocks of data within the file, and images can be broken into horizontal strips of arbitrary height and order. This means that the pixels which make up the image can be scattered in blocks throughout the file; however, it is also possible to lay out the file in a straightforward fashion, with all of the pixels of the image in one block. TIFF is best used for small to large rasters, as there is some overhead and there are hard limits to the extent of a raster. There is no way to reference an external file from inside a TIFF file, but it is feasible for an application to refer to strips of an image within the file, although there is no provision for this referencing within the file itself. There are three kinds of optional compression available: CCITT G3 (both one- and two-dimensional) and G4 [see also TIFF-F]; Macintosh PackBits, a run-length encoding; and Lempel-Ziv & Welch, an adaptive compression scheme. TIFF will probably be extended to incorporate JPEG compression in the near future. There is no provision for an alpha-channel as such, but one could be stored for private interchange as additional RGB planes or with a new private tag.
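The tagged header of a TIFF image can be read with very little code. The sketch below is a minimal illustration and not a TIFF reader: it assumes the usual layout of an 8-byte file header (byte order mark, the number 42, offset of the first image header) followed by a count and a series of 12-byte tagged entries, and it returns the raw value-or-offset field without interpreting it by type. The function name is hypothetical.

```python
import struct


def read_image_header(data, offset=None):
    """Return ({tag: (field_type, count, value_or_offset)}, next_header_offset)."""
    if data[:2] == b"II":
        endian = "<"            # little-endian ("Intel") byte order
    elif data[:2] == b"MM":
        endian = ">"            # big-endian ("Motorola") byte order
    else:
        raise ValueError("not a TIFF byte-order mark")
    magic, first = struct.unpack(endian + "HI", data[2:8])
    if magic != 42:
        raise ValueError("bad TIFF magic number")
    if offset is None:
        offset = first
    (count,) = struct.unpack_from(endian + "H", data, offset)
    entries = {}
    for i in range(count):
        tag, field_type, n, value = struct.unpack_from(
            endian + "HHII", data, offset + 2 + 12 * i)
        entries[tag] = (field_type, n, value)   # unknown tags are simply carried along
    (next_offset,) = struct.unpack_from(endian + "I", data, offset + 2 + 12 * count)
    return entries, next_offset
```

Because every parameter carries its own tag, a reader can skip tags it does not know and supply defaults for tags that are absent. A non-zero `next_offset` points to the header of the next image in the same file.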

Drawings

IGES

The Initial Graphics Exchange Specification (IGES) is a U.S. file standard for the interchange of 2-D and 3-D CAD/CAM documents [Smith88, Mayer87]. The format is record-oriented, with three encodings: fixed card-image, free-form ASCII, and binary. The file is broken down into several sections, the two significant parts being the entity (footnote 3) declaration and entity definition sections. Thus, there is overhead, which works against small drawings, and limits to the number of elements in a drawing. There is no mechanism for referencing external files, but there are several ways of referring to and reusing internal entities. The list of primitives and their attributes is very rich--so rich in fact, that interchange is difficult because few CAD/CAM systems understand all primitives. Work is progressing on a new standard to replace IGES, called Product Data Exchange Standard/Standard for the Exchange of Product Model Data (PDES/STEP) [Wilson89, Owen87, Wilson87].

CGM

Computer Graphics Metafile (CGM) is an international standard for the interchange of drawings and images [Arnold88]. It supports the typical range of 2-D drawing primitives and supports color, multi-fonted text, and rasters. Each file may encode multiple pictures, each of which is independent from the others, permitting easy access to each of the images. References to sub-images are not possible, nor are references to other CGM files. The file format is based on an encoded representation of a sequence of drawing operators. The drawing operators are clustered into pictures, and the pictures into metafiles. There are three defined encodings: binary, which is the easiest to parse and most compact; clear text, which is human readable, self-delimiting, and highly transportable; and ASCII, which is hard to decode, hard to read, but is self-delimiting and reasonably compact. It is also acceptable to define private encodings for use among applications. Additionally, there is a mechanism for private application data, user messages, and an escape mechanism for adding functionality. The addition of 3-D primitives is being considered.

3 An entity is a geometric shape such as a line or surface, an annotation, a structure, or a macro.



NAPLPS

The North American Presentation Level Protocol Syntax was designed to support low-bandwidth picture information for use in Videotex and Teletext applications [Ninke85]. Its data format was optimized to transmit picture elements (polygons, lines, text) in as few bytes as possible and as a superset of the ASCII standard. This means plain text ASCII files are NAPLPS datastreams, but not vice-versa. Nevertheless, it is a reasonable encoding of a 2-D color drawing. Its main limitation is that information providers are supposed to be aware of the "minimum configuration" (called the Service Reference Model, or SRM) which restricts the number of points in the image and the number of colors in the palette. Thus, while the format is capable of fairly rich drawings, most NAPLPS files have been constructed to rely on only the subset of facilities available on the minimum configuration. It is possible to download character sets, patterns, and macros (sequences of drawing commands) for reuse in the picture. There is also no high-level information about the image being stored, nor are there provisions for private extensions, references to other files, or references to parts of the drawing being defined. There is a provision for transmitting rasters, however.

PostScript

PostScript (footnote 4) is a page description language developed by Adobe Systems Incorporated [Adobe90]. It is a stack-based language [Loeliger81] that has powerful two-dimensional page markup primitives and excellent font support. Beyond the language itself, Adobe has established the PostScript Document Structuring Conventions (DSC) and the Encapsulated PostScript File Format (EPSF). The DSC organizes the file in terms of its prolog, page descriptions, and trailer information which is used by a document manager. It also explicitly states any resources needed by the document and allows for intelligent spool management (skipping downloading of resources if the document manager knows that they already exist in the printer). The DSC also allows other files and documents to be imported, and makes it possible to refer to pages, objects, and resources in the document (although DSC files cannot refer to these objects themselves). However, much of this overhead is required, which makes small documents costly to encode; there is no limit to the length of a document. EPSF documents are subsets of DSC documents that define a single page image, intended to be imported into another document for reuse. PostScript is an output option only for most applications, although some applications (InterViews' idraw application from the MIT X Windows distribution and Adobe Illustrator 3.0) use it as a storage medium as well. Of course, most page layout systems allow the inclusion of EPSF documents, but for printing only.

Work is progressing on the definition of an Editable PostScript; that is, an extension to the DSC that organizes the PostScript page descriptions themselves into nested objects which can be intelligently edited. These "out of band" comments are necessary because it is mathematically impossible, in general, to predict what a PostScript program will do. The additional comments form a "contract" which specifies what the enclosed PostScript code accomplishes.

4 And Display PostScript, which is an extension to PostScript to support user interaction.



Continuous Time Formats

Audio

CD & DAT

Compact Disc digital audio and Digital Audio Tape are interchange standards for the transmission of audio samples. The samples are organized into tracks, with indices to points within a track. There is no way for a sample to reference another sample, either within the file or in another file. The audio information is sampled at 44.1 KHz (48 KHz optionally for DAT), 16 bits/sample, left and right channels, for a bandwidth of about 176 Kbytes/second, and no latency (the data must arrive on time). There is no compression. However, there is room for additional data to be stored along with the audio samples. These so-called sub-codes have been put to use in two ways: as storage for MIDI note information; and as crude, limited color graphics and text (CD+GRAPHICS).
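The bandwidth of uncompressed sampled audio follows directly from the sampling parameters. The short calculation below is a sketch; the function name is hypothetical, and it assumes the standard Compact Disc sampling rate of 44,100 samples per second and counts audio samples only, not sub-codes or error correction.

```python
def pcm_bytes_per_second(sample_rate_hz, bits_per_sample, channels):
    """Raw data rate of uncompressed sampled audio, in bytes per second."""
    return sample_rate_hz * (bits_per_sample // 8) * channels


cd_rate = pcm_bytes_per_second(44_100, 16, 2)    # 176,400 bytes/second
dat_rate = pcm_bytes_per_second(48_000, 16, 2)   # 192,000 bytes/second
one_minute_of_cd = cd_rate * 60                  # 10,584,000 bytes
```

A minute of stereo audio at the Compact Disc rate therefore occupies roughly 10 Mbytes, which is why the absence of compression matters for storage.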

Speech

There is no "standard" format for speech information but informally, systems use the following parameters: a sample rate of 8 KHz, 8 bits/sample, single channel or monophonic, for a bandwidth of about 8 Kbytes/second, which is what a voice-grade telephone line provides [Cater83]. A 250 millisecond delay is tolerable as long as it is not frequent, and occasional missed sample segments can be replaced with white noise as long as they are not too long or too frequent. Also, it is possible to perform various kinds of compression, due to the fact that the waveform signal changes relatively slowly. For example, a 4-bit delta encoding is usually sufficient.
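A 4-bit delta encoding of the kind mentioned above can be sketched as follows. This is one simple variant chosen for illustration, not a description of any particular system: each 8-bit sample is replaced by its difference from the previously reconstructed sample, clamped to the signed 4-bit range -8 to 7. The function names and the starting value of 128 are assumptions.

```python
def delta_encode(samples, start=128):
    """Encode 8-bit samples as clamped 4-bit differences (-8..7)."""
    previous = start
    codes = []
    for sample in samples:
        delta = max(-8, min(7, sample - previous))
        codes.append(delta)
        previous += delta      # track what the decoder will reconstruct
    return codes


def delta_decode(codes, start=128):
    """Rebuild samples by accumulating the differences."""
    previous = start
    samples = []
    for delta in codes:
        previous += delta
        samples.append(previous)
    return samples


slow = [128, 130, 133, 135, 134, 130, 126, 125]
codes = delta_encode(slow)            # every difference fits in 4 bits
rebuilt = delta_decode(codes)         # identical to the input for a slow signal
```

When the waveform changes by more than the clamp allows between samples, the decoder lags behind the signal; the encoding halves the data rate only because speech rarely does so.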

Music

MIDI

The Musical Instrument Digital Interface (MIDI) is a one-way serial protocol for sending a musician's gestures, such as key presses and breath control, from one device (e.g., a keyboard) to another (e.g., a synthesizer) [MIDI1.0]. The serial data is sent 8 bits at a time at a data rate of 31.25 kbits/second. The format of the data is a command word followed by a varying number of operands. There is a mechanism which allows the command word to be elided if it would be repeated, for rapid continuous changes. The data is sent in real time, allowing a bandwidth of up to 3k gestures/second. A latency of more than 2 msec is unacceptable to some musicians, and MIDI is not capable of phase-level synchronization [Loy85]. To convert MIDI to a data format, it is customary to transcribe each command into a file, along with the delta in time between two commands inserted between the two commands. This allows faithful playback of the MIDI datastream. It is not possible for the MIDI datastream to refer to itself nor to another datastream, but it is possible to merge two datastreams in time. There are systems which extract a score from the MIDI datastream, once the keys, tempos, and time signatures have been specified. This score can either be linked to the datastream, or can be used to resynthesize a new datastream.
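The transcription of a MIDI datastream into a file of commands and time deltas can be sketched as follows. The record layout here (a list of delta/bytes pairs, with times in milliseconds) is a hypothetical illustration, not a standard file format; the elision of a repeated command word follows the mechanism described above.

```python
def transcribe(events):
    """Turn (time_ms, command, operands) events into (delta_ms, bytes) records."""
    records = []
    last_time = 0
    last_command = None
    for time_ms, command, operands in events:
        delta = time_ms - last_time
        if command == last_command:
            payload = bytes(operands)                    # command word elided
        else:
            payload = bytes([command]) + bytes(operands)
        records.append((delta, payload))
        last_time, last_command = time_ms, command
    return records


def playback_times(records):
    """Recover the absolute time of each record by summing the deltas."""
    now = 0
    times = []
    for delta, _payload in records:
        now += delta
        times.append(now)
    return times


events = [
    (0,   0x90, (60, 100)),   # note on, middle C
    (250, 0x90, (64, 100)),   # note on again: the command word repeats
    (500, 0x80, (60, 0)),     # note off, middle C
]
records = transcribe(events)
```

Storing deltas rather than absolute times keeps the numbers small and lets two transcriptions be merged by interleaving their records in time order.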



MIDI does not have enough bandwidth for some applications--for example those that amplify and compound the gestural input of the user, so there is work on a superset of MIDI with a much wider bandwidth called MIDI-2.

Motion Pictures

DVI

Digital Video Interactive (DVI) is a proprietary video and still compression medium available from Intel Corp. and IBM Corp. [Ripley89, Glass89, Loveria90]. It was designed to allow the storage of low-quality video, high-resolution stills, and variable quality audio onto CD-ROMs. Proprietary lossy three-dimensional (intraframe and interframe) compression of full-motion video (30 frames/second, 256x240 pixel frames, 8 bits of luminance and 1 bit of sub-sampled chroma information) reduces the data rate to 1.5 Mbits/second, which is the data rate of CD-ROM drives. There is no latency in CD-ROM drives, and DVI does not allow any latency. Because a DVI file can be broken up on a magnetic hard drive, the seek time between blocks can cause the image to pause noticeably unless care is taken to properly buffer the data. The DVI format cannot reference other files nor itself, but an application can jump to any point in a video sequence where an absolute frame (a frame coded two-dimensionally, or intraframe only) exists--it is the DVI file author's responsibility to ensure that adequate entry points exist (although the DVI algorithm will automatically insert absolute frames when it would be more space efficient than the interframe coding). Interestingly, the compression is asymmetric: it requires more time to compress the image than to decompress it. This allows the "player" machines to be inexpensive. There are centralized facilities with compression engines which use parallel processing to compress source video for the application developer [Tinker89].

MPEG

The Moving Picture Experts Group (MPEG) motion picture compression scheme is being developed as an international standard. Like JPEG, it will also use DCT compression, but will add interframe compression to the intraframe compression of JPEG. The committee is aiming for a compression ratio of 100 to 1, which will allow full-motion TV-quality video (720x576 pixels/frame, 30fps) and CD-quality audio to be packed into a 1.5 Mbits/second bandwidth channel (e.g. CD-ROM). Intel is involved in the MPEG committee and promises to add the MPEG algorithms to its DVI system [LeGall91, Baran90].

Compositional Formats

IFF

Electronic Arts' Interchange File Format (IFF) is an interchange format in use on Commodore Amiga and Apple Macintosh personal computers (footnote 5). IFF files are hierarchically structured, with tagged blocks of data. There are blocks which can be used to concatenate, sequence, define shared properties, and structure other blocks. There are also media-specific blocks (text, rasters, drawings, scores, samples, etc.) which contain an encoded representation of the media [EA-IFF85]. IFF, like TIFF, has hard limits to the maximum size of a block, and there is a small amount of required overhead. It is not possible for the file to refer to or reuse any part of itself, nor is it possible to point to another file. However, the structure was designed to be parsable without knowledge of the media types, so it is possible to manipulate and point to parts of the file without corrupting the unknown media types. There is a central registry for adding new media types, which helps prevent name space collision.

5 For economy of code, many programs also use IFF as their storage format.
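The claim that an IFF file can be parsed without knowledge of its media types can be illustrated with a small block walker. This is a sketch, not a complete IFF reader: it assumes each block starts with a four-character identifier and a 32-bit big-endian length, that odd-length blocks are padded to an even boundary, and that the structuring blocks (FORM, LIST, CAT and PROP) carry a four-character type followed by nested blocks. The function name is hypothetical.

```python
import struct

GROUP_IDS = {b"FORM", b"LIST", b"CAT ", b"PROP"}   # blocks that contain other blocks


def walk_blocks(data, start=0, end=None, depth=0):
    """Yield (depth, block_id, group_type_or_None, size) for every block."""
    end = len(data) if end is None else end
    pos = start
    while pos + 8 <= end:
        block_id, size = struct.unpack_from(">4sI", data, pos)
        body = pos + 8
        if block_id in GROUP_IDS:
            group_type = data[body:body + 4]
            yield depth, block_id, group_type, size
            # Descend into the nested blocks that follow the group type.
            yield from walk_blocks(data, body + 4, body + size, depth + 1)
        else:
            # Media-specific block: its contents are skipped, not interpreted.
            yield depth, block_id, None, size
        pos = body + size + (size & 1)               # skip the pad byte, if any
```

Because the length precedes the contents, a program can copy, reorder, or point at a block whose identifier it has never seen.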

ATK

The Andrew Toolkit (ATK), as previously mentioned, is a storage format with a grammar structure. However, in addition to formatted text, container objects (like the text, spreadsheet, drawing, and layout objects) allow the placement of arbitrary media types within their datastream, since the protocol for dealing with nested objects (insets in ATK parlance) is well defined. This nesting can be arbitrarily deep, as long as "container" objects are used [Palay88]. Some objects, like rasters and buttons, do not allow insets. Adding new media types occurs in an ad hoc fashion, with no mechanism for preventing name collisions. It is possible to refer to other objects within the file (which allows for multiple views on shared data), but currently it is not possible to refer to objects in another file (only to the whole file).

ODA

In addition to providing for the organization and layout of text, ODA allows other media (currently CCITT Group 4 compressed bi-level rasters and CGM drawings) to be placed at the paragraph level. There is no provision for arbitrary nesting of media types [Brown89]. New media types can be added either via private extensions, or through the international standardization process (tables and color rasters are proceeding through the latter process). It seems possible to cause an embedded media object to appear in multiple locations in a document, but not to change the view of the underlying data.

SGML

Once again, Standard Generalized Markup Language (SGML) follows the ODA lead of allowing CGM datastreams to be included in an SGML document, but also adds support for spreadsheets and mathematical notations. SGML also permits the inclusion of tagged binary data, where the tag would be interpreted privately among applications, while still allowing the SGML document to be correctly scanned. SGML does provide the ability to identify elements for reuse, but permits only whole file reference [ISO-8879, ISO-8879/A1].

Conclusions

Any multimedia system needs the ability to interchange and store different media types. Many of the data formats surveyed were designed for, or are suitable for, interchange. Some of the formats, particularly the compressed formats such as JPEG and DVI, are also suitable for storage and may be an application's only storage option, due to space and bandwidth limitations. Unfortunately, very few formats provide access to portions of a document or substructure. Also, without a uniform representation, maintaining a mixed-media collection can be difficult.



Access to Substructure

In order to encourage reuse, particularly of copyrighted and voluminous data, it is important for authors to be able to select exactly what they need, and nothing more. Since many systems allow a document to be broken up into several files, one poor substitute would be to fragment all datastreams into "interesting" components, and reassemble them via file inclusion to resemble the original document. This would allow authors to reference the "interesting" pieces only. This scheme is insufficient for two reasons: the notion of what is interesting cannot be precomputed; and the ability to recover the context of the referenced piece is lost, since the piece has no knowledge of the larger whole of which it is a part. Further research is needed into reliable and uniform methods for accessing fragments of a medium.

Mixing Media

Using standard data formats for storage requires that the application know how to parse, manage, and rewrite many formats. Due to the diversity of the formats, the software overhead could prove unwieldy. It may be best to synthesize a new extensible format which has the power to express the data and constructs of the other formats, and uses filter functions to preserve the difficult-to-replicate data formats (such as JPEG compression). PostScript Level 2 has the beginnings of such a scheme.

Use in an Information Management System

Ultimately, all multimedia documents should be available under a single information management system (two such systems are Alexandria [Palay90] and IDEX [Ritchie89]). Such a system would have powerful database technology to support the structuring and compositing of arbitrary media. Standards like ODA, which define the structure of the information, and then an encoding, will migrate easily to such a system, since the database would then be just an alternate encoding. However, it will still be necessary to provide interchange functionality with older systems.



References

[Adobe90] Adobe Systems Inc. PostScript Language Reference Manual, 2e. Addison-Wesley Publishing Company, Inc., New York, 1985.

[Arnold88] Arnold, David B. and Bono, Peter R. CGM and CGI: Metafile and Interface Standards for Computer Graphics. Springer-Verlag, New York, 1988.

[Alexander90] Alexander, Michael. "Data storage: grace under pressure," Computerworld vol.24 no.21; May 21, 1990; p. 17.

[Baran90] Baran, Nick. "Putting the Squeeze on Graphics," BYTE, Dec. 1990, pp. 289-294.

[Borenstein90] Borenstein, Nathaniel S. Multimedia Applications Development with the Andrew Toolkit. Prentice Hall, Englewood Cliffs, New Jersey, 1990.

[Brown89] Brown, H. "Standards for structured documents," Computer Journal; vol.32, no.6; December 1989; pp. 505-514.

[Cater83] Cater, John P. Electronically Speaking: Computer Speech Generation. Howard W. Sams & Co., Inc., Indianapolis, Indiana, 1983.

[CCITT-88-VII.3] CCITT. Blue Book, vol. VII.3, recommendations T.4 (Group 3) and T.6 (Group 4), 1988. ISBN 92-61-03611-2.

[Chen84] Chen, Wen-Hsiung and Pratt, William K. "Scene Adaptive Coder," in Visual Communications Systems, eds. Netravali and Prasada, IEEE Press, Inc., New York, 1989. pp. 463-478.

[EA-IFF85] Electronic Arts. "EA IFF 85" Standard for Interchange Format Files. Electronic Arts, January, 1985.

[Glass91] Glass, L.B. "Fax Facts," BYTE; vol.16, no.2; February 1991; pp. 301-308.

[Glass89] Glass, L.B. "Digital Video Interactive," BYTE; vol.14, no.5; May 1989; pp. 283-289.

[Graef89] Graef, G.L. "Graphics Formats," BYTE; vol.14, no.9; Sept. 1989; pp. 305-306, 308-310.

[ISO-8613] Information Processing Systems - Text and Office Systems - Office Document Architecture (ODA). International Organization for Standardization, ISO 1988.

[ISO-8824] ISO/TC 97. Information Processing Systems - Open Systems Interconnection - Specification of Abstract Syntax Notation One (ASN.1). International Organization for Standardization, ISO 8824:1987(E).

[ISO-8825] ISO/TC 97. Information Processing Systems - Open Systems Interconnection - Specification of Basic Encoding Rules for Abstract Syntax Notation One (ASN.1). International Organization for Standardization, ISO 8825:1987(E).



[ISO-8879] ISO/TC 97. Information Processing Systems - Text and Office Systems - Standard Generalized Markup Language (SGML). International Organization for Standardization, ISO 8879:1986(E).

[ISO-8879/A1] ISO/TC 97. Information Processing Systems - Text and Office Systems - Standard Generalized Markup Language (SGML): Amendment 1. International Organization for Standardization, ISO 8879:1986/A1:1988(E).

[ISO-10179] Information Processing Systems - Text Composition - Document Style, Semantics and Specification Language. International Organization for Standardization, ISO 10179:1989.

[JPEG90] Joint Photographic Expert Group (JPEG). Revision 8 of the JPEG Technical Specification, ISO/IEC JTC1/SC2/WG8, CCITT SGVIII, August 14, 1990.

[Lamport86] Lamport, Leslie. LATEX: A Document Preparation System. Addison-Wesley Publishing Company, Inc., Reading, MA, 1986.

[LeGall91] Le Gall, Didier. "MPEG: A Video Compression Standard for Multimedia Applications," Communications of the ACM; vol.34, no.4; April 1991; pp. 46-58.

[Lippman89] Lippman, Andrew and Butera, William. "Coding Image Sequences for Interactive Retrieval," Communications of the ACM, vol. 32, no. 7, July 1989; pp. 852-860.

[Loeliger81] Loeliger, R. G. Threaded Interpretive Languages - Their Design and Implementation, Byte Books, Peterborough, NH, 1981.

[Loveria90] Loveria, Greg and Kinsfler, Don. "Multimedia: DVI Arrives," BYTE IBM Special Edition; Fall 1990; pp. 105-108.

[Loy85] Loy, Gareth. "Musicians Make a Standard: The MIDI Phenomenon," Computer Music Journal, vol. 9, no. 4, Winter 1985.

[Mayer87] Mayer, Ralph J. "IGES," BYTE; vol.12, no.6; June 1987; pp. 209-214.

[MIDI1.0] MIDI Manufacturers Association & Japan MIDI Standards Committee. MIDI 1.0 Detailed Specification (document version 4.1, Jan. 89). The International MIDI Association, Los Angeles, 1989.

[Microsoft90] Microsoft Corporation. "Rich Text Format," Microsoft Word Technical Reference: for Windows and OS/2. Microsoft Press, Redmond, Washington. ch. 10, pp. 381-409.

[Ninke85] Ninke, William H. "Design Considerations of NAPLPS, the Data Syntax for VIDEOTEX and TELETEX in North America," in Visual Communications Systems, eds. Netravali and Prasada, IEEE Press, Inc., New York, 1989. pp. 408-421.

[Palay90] Palay, A., Anderson, D., Horowitz, M., and McInerny, M. The Alexandria Project: In Search of a Unified Environment for Information Access and Management. Information Technology Center, Carnegie Mellon University, Pittsburgh, PA. In preparation.

[Palay88] Palay, A. J., et al. "The Andrew Toolkit - An Overview", Proceedings of the USENIX Winter Conference. Dallas, February, 1988. pp. 9-21.


[Reid80] Reid, Brian K. and Walker, Janet H. SCRIBE - Introductory User's Manual 3e. Unilogic, Ltd., Pittsburgh, PA, 1980.

[Owen87] Owen, Jon and Bloor, M. Susan. "Neutral Formats for Product Data Exchange: the Current Situation," Computer-Aided Design vol.19 no.8; Oct. 1987; pp. 436-443.

[Ripley89] Ripley, G.D. "DVI--a Digital Multimedia Technology," Communications of the ACM; vol.32, no.7; July 1989; pp. 811-822.

[Ritchie89] Ritchie, I. "HYPERTEXT--Moving Towards Large Volumes," The Computer Journal, vol. 32, no. 6, 1989; pp. 516-523.

[Rosenberg91] Rosenberg, Jonathan, et al. Multi-media Document Translation: ODA and the EXPRES Project. Springer-Verlag, New York, 1991.

[Rose90] Rose, Marshall T. "Abstract Syntax," The Open Book: A Practical Perspective on OSI. Prentice Hall, Englewood Cliffs, New Jersey, 1990; ch. 8, pp. 225-335.

[Rosenthal90] Rosenthal, Steve. "JPEG emerging as standard for compressing image files; compression chip price decline predicted," MacWEEK vol.4 no.12; March 27, 1990; pp. 31-32.

[Smith88] Smith, Bradford, et al. Initial graphics exchange specification (IGES), version 4.0, SP-767 (Society of Automotive Engineers).

[TIFF5.0] TIFF Revision 5.0 Specification. Aldus Corporation, Seattle, Washington, 1988.

[TIFF-F] TIFF Class F Specification. Cygnet Technologies, Berkeley, California, 1990.

[Tinker89] Tinker, M. "DVI Parallel Image Compression," Communications of the ACM; vol.32, no.7; July 1989; pp. 844-851.

[Wallace91] Wallace, Gregory K. "The JPEG Still Picture Compression Standard," Communications of the ACM; vol.34, no.4; April 1991; pp. 30-44.

[Warner90] Warner, William C. "Compression Algorithms Reduce Digitized Images to Manageable Size," EDN; June 21, 1990; pp. 203-212.

[Webster90] Webster, Bruce F. "The Color Squeeze: Image Processing Demands a Lot of Resources, but Help may be On the Way," Macworld vol.7 no.4; April 1990; pp. 81-86.

[Wilson89] Wilson, Peter R. "PDES STEPs forward," IEEE Computer Graphics and Applications vol.9 no.2; March 1989; pp. 79-80.

[Wilson87] Wilson, Peter R. "A Short History of CAD Data Transfer Standards," IEEE Computer Graphics and Applications vol.7 no.6; June 1987; pp. 64-67.
