Product Retrieval Statistics Canada / Statistique Canada Chuck Humphrey ACCOLEDS/DLI Training...

Post on 24-Dec-2015

Product Retrieval

Statistics Canada / Statistique Canada

Chuck Humphrey

ACCOLEDS/DLI Training

December, 2001

Preparing to Download

The Steps in Downloading Files
- identify the DLI title and its delivery medium (FTP or CD-ROM)
- know about different file types
- choose a tool to download the files
- have large enough storage space to hold the files
- choose a file naming convention

Delivery Medium

The files of DLI titles are available either on CD-ROM, FTP, or both.

Determining the delivery medium

The DLI web site:
- Name and Acronym of Products
- A List of Files
- Searching the Collection

Name & Acronym of Product

A column specifies whether a DLI title is available on CD, FTP, or Web.

Name & Acronym of Product

Exercise: Go to http://www.statcan.ca/english/Dli/contents.htm and select the link “Name and Acronym of Product”.

Name & Acronym of Product

Exercise (continued): What delivery media are available for the following titles?
Industrial Monitor Health PCCF+ General Social Survey, Cycle 14

A List of Files

“Files” is a misnomer. This is a list of titles and the delivery media on which they are available.

Searching the Collection

Searches by DLI title return either a description of the product and its delivery medium, or the files that can be downloaded using FTP.

Delivery Medium Summary

If the medium is CD-ROM, place an order using the “Submit an Order” link on the web site.

If the medium is FTP or Web, you must determine how you wish to download the file or files.

Background Knowledge

Before discussing which method to use to download files, it is useful to understand two characteristics of files:
- the encoding of their content, and
- the relationship between file extensions and their corresponding computing applications

Content and Extensions

The encoding of file content:
- Binary: executable, compressed, or proprietary (e.g., self-extracting, Zip, IVT, or PDF)
- ASCII: plain text (e.g., raw data or read-me instructions)
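
The distinction between binary and ASCII content can be checked with a short script. The following is a minimal sketch, not a DLI tool: the function name is hypothetical, and the rule it applies (only 7-bit printable characters plus tab, line feed, form feed, and carriage return count as text) is an assumption that suits plain ASCII files but would misclassify text in other encodings.

```python
def looks_like_ascii(data: bytes) -> bool:
    """Return True if every byte is printable 7-bit ASCII or common whitespace."""
    allowed_controls = {0x09, 0x0A, 0x0C, 0x0D}  # tab, LF, form feed, CR
    for byte in data:
        if byte >= 0x7F:
            return False  # DEL or an 8-bit value: treat the content as binary
        if byte < 0x20 and byte not in allowed_controls:
            return False  # other control characters suggest binary content
    return True  # note: empty input is reported as text


# Example: inspect the first few kilobytes of a local file.
# with open("c4microe.txt", "rb") as f:
#     print(looks_like_ascii(f.read(4096)))
```

Checking only the start of a file is quick but not conclusive, so a result of True is best read as "probably plain text".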

Content and Extensions

File extensions and applications: The extensions used with file names can help identify the general contents of files because specific extensions are associated with specific applications.

For example, .pdf is associated with the Adobe Acrobat Reader and a file with this extension is expected to contain a document.

Content and Extensions

File extensions and applications: Knowing the application associated with a file extension can also help identify the nature of its encoded contents. The file formats of most applications are binary.

For example, .pdf is a binary file format.

Basic Rules to Downloading

Knowing whether a file is to be treated in binary or ASCII mode is fundamental to downloading files. Why? Because the file transfer protocol used to move files between computers operates in two modes: binary and ASCII (or text).

Downloading Modes

Which mode to use?

Binary mode preserves all of the content in a file upon transfer, including text and special characters.

Downloading Modes

Which mode to use?

ASCII mode preserves text but may alter special (non-text) characters during the transfer. ASCII mode also converts the end-of-line characters between operating systems.

Downloading Modes

Everything can be downloaded in binary mode and the contents will always be safe.

The only disadvantage of downloading text files in binary is that end-of-line designators differ across operating systems, so the downloaded text keeps the convention of the system it came from.

End of Line Characters

Operating Systems and End of Lines

Operating system     End of line
MS Windows / DOS     CR & LF
Macintosh OS         CR
UNIX                 LF
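
The three end-of-line conventions in the table can be detected and converted with a few lines of code. This is a sketch under stated assumptions: both function names are hypothetical, and the detection reports only the first convention it finds, so a file with mixed line endings is not flagged as such.

```python
def detect_eol(data: bytes) -> str:
    """Report which end-of-line convention appears in the data."""
    if b"\r\n" in data:
        return "CR+LF"  # MS Windows / DOS
    if b"\r" in data:
        return "CR"     # Macintosh OS
    if b"\n" in data:
        return "LF"     # UNIX
    return "none"


def convert_eol(data: bytes, eol: bytes) -> bytes:
    """Rewrite every line ending in the data as the given eol sequence."""
    # Normalize to LF first so that CR+LF is not converted twice.
    unified = data.replace(b"\r\n", b"\n").replace(b"\r", b"\n")
    return unified.replace(b"\n", eol)
```

A text file downloaded in binary mode can be repaired this way after the fact. Running the same conversion on a binary file would change any bytes that happen to equal CR or LF, which illustrates why binary files should not be transferred in ASCII mode.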

End of Line Characters

Exercise: View the user guide in cycle 4 of the GSS using WS_FTP both in binary and ASCII mode. The file to view is c4microe.txt. (Click on the file name, the mode, and then View.)

Files and the FTP Modes

The DLI FTP site contains for each title a ‘readme’ file that lists the names of all files, their FTP mode, a brief description, and the number of records and record length for data files.

Files and the FTP Modes

The readme file for the General Social Survey is at the top of the gss directory and is called Readgss.txt.

Files and the FTP Modes

The FTP mode for each file is identified as A for ASCII and B for binary
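
Once the readme gives the mode for a file, the download itself can be scripted. The sketch below uses Python's standard ftplib module; the function name, the host name, and the login values in the commented usage are placeholders, not the real DLI settings.

```python
from ftplib import FTP


def fetch(ftp, remote_name, local_path, mode):
    """Download one file in the mode the readme lists: 'A' (ASCII) or 'B' (binary)."""
    mode = mode.upper()
    if mode == "A":
        # retrlines transfers in ASCII mode and passes each line without its
        # end-of-line characters; writing "\n" in text mode lets Python apply
        # the local operating system's convention.
        with open(local_path, "w", encoding=ftp.encoding) as out:
            ftp.retrlines("RETR " + remote_name, lambda line: out.write(line + "\n"))
    elif mode == "B":
        # retrbinary transfers the bytes unchanged.
        with open(local_path, "wb") as out:
            ftp.retrbinary("RETR " + remote_name, out.write)
    else:
        raise ValueError("mode must be 'A' or 'B'")


# Hypothetical session (host, id, and password are placeholders):
# ftp = FTP("ftp.example.org")
# ftp.login("your-id", "your-password")
# ftp.cwd("gss")
# fetch(ftp, "Readgss.txt", "Readgss.txt", "A")
# ftp.quit()
```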

Readme File Content

The brief description of the contents of files in the readme file is also helpful in knowing what to expect in each file.

Extensions and Content

Common DLI Extensions

Content                Extension
Data                   .zip
Self-expanding file    .exe
SPSS command files     .sps
SAS command files      .sas
Documentation          .txt
Documentation          .pdf
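
The extensions in the table suggest a simple rule for picking a transfer mode when the readme file is not at hand. The sketch below is an assumption rather than a DLI rule: it treats .txt, .sps, and .sas as plain text and falls back to binary for everything else, since binary mode keeps the contents intact. The function name is hypothetical, and the readme file remains the authority for any given file.

```python
import os

# Extensions assumed to hold plain text; all others are treated as binary.
ASCII_EXTENSIONS = {".txt", ".sps", ".sas"}


def transfer_mode(file_name: str) -> str:
    """Return 'A' for files assumed to be plain text, otherwise 'B'."""
    extension = os.path.splitext(file_name)[1].lower()
    return "A" if extension in ASCII_EXTENSIONS else "B"
```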

Preparing to Download

We’ve reviewed the delivery media of DLI titles and the different file types and their transfer modes. Next we need to discuss the tools to download files.

File Transfer Tools

Two general types of file transfer tools for downloading DLI files:
- independent FTP clients
- Web browser FTP clients

Independent FTP Clients

Different FTP clients have become popular on different operating systems.

- MS Windows: WS_FTP
- Mac OS: Fetch
- UNIX: ftp

Independent FTP Clients

One distinct advantage of all ftp clients is that they allow viewing and retrieving multiple files with a single command or click of a mouse button.

Independent FTP Clients

These clients also allow setting the file transfer mode and generally provide a great deal of flexibility in controlling an ftp session.

Independent FTP Clients

A disadvantage of these clients is that they rely strictly on the names of directories and files to display what is available for downloading. Therefore, you have to know what it is that you want to download by its file name.

Web Browser FTP Clients

Each Web browser has some level of FTP capability incorporated. Two options exist in using most Web browsers to download files:
- connect to the DLI FTP site using the DLI FTP URL, ID, and password
- use “Searching the Collection” on the DLI Web site

Using the DLI FTP URL

Using the FTP URL, the browser displays the directory and file structure of the DLI FTP site. The ID and password are displayed in the URL when using this method.
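
An FTP URL that carries an ID and password follows the general form ftp://id:password@host/path. The sketch below builds one with Python's standard urllib.parse module; the function name, host, ID, and password are placeholders. Characters such as "@" in the ID or password are percent-encoded so they do not break the URL.

```python
from urllib.parse import quote


def ftp_url(host: str, user: str, password: str, path: str = "") -> str:
    """Build an FTP URL with the login details embedded in it."""
    credentials = quote(user, safe="") + ":" + quote(password, safe="")
    return "ftp://" + credentials + "@" + host + "/" + quote(path.lstrip("/"))


# Hypothetical values only:
# print(ftp_url("ftp.example.org", "your-id", "your-password", "gss/Readgss.txt"))
```

Because the password is part of the URL, it is visible to anyone who can see the address bar or the browser history.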

Using the DLI FTP URL

Single or multiple directories or files can be selected using a combination of the Shift and Control keys. Right-clicking the mouse allows a “Copy to Folder” in IE.

“Searching the Collection”

Links to files within DLI titles have been organized on the DLI Web site under the “Searching the Collection” section of the site.

“Searching the Collection”

The files in this survey can be downloaded by right-clicking the mouse and using “Save target as…”. Access to the data file requires an ID and password.

Summary of Pros and Cons

 

FTP Client
Plus: Have access to all of the files on the FTP site
Minus: Must rely on abbreviated file and directory names

Web Browser
Plus: Has an interface similar to Windows Explorer in selecting files
Minus: ID and password must be entered on the URL

DLI Web Site
Plus: Full text descriptions simplify locating files
Minus: Can only retrieve one file at a time

Compression Tools

DLI uses two types of compression:
- PKZIP (.zip)
- Self-extracting Zip file (.exe)

Compression Tools

PKZIP can be uncompressed on multiple platforms:
- MS Windows: WinZip http://www.winzip.com/
- Mac OS: unstuffit http://www.aladdinsys.com/
- UNIX: unzip http://www.info-zip.org/pub/infozip/UnZip.html

Compression Tools

Self-extracting Zip files (.exe) are only executable on MS Windows / DOS. Some unzip utilities will also open self-extracting zip files, including WinZip and Unix unzip.
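
Python's standard zipfile module is another utility that can often read a self-extracting Zip file without running it, because it locates the archive directory from the end of the file. The sketch below is an illustration only: the function name is hypothetical, and not every self-extracting format is guaranteed to open this way.

```python
import zipfile


def list_members(archive):
    """Return the file names stored in a .zip or self-extracting .exe archive.

    `archive` may be a path or an open binary file object.
    """
    with zipfile.ZipFile(archive) as zf:
        return zf.namelist()


# Example with a hypothetical local file name:
# print(list_members("gssc14.exe"))
```

If the file is not a Zip archive at all, zipfile raises zipfile.BadZipFile, which is a safe way to find out before trying anything else.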

Compression Tools

Pay attention in WinZip to the directory in which files are being written. Also, you may wish to turn off the option to restore the folder names used in the compressed archive.
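
The same two choices, where files are written and whether stored folder names are restored, can be made explicit when uncompressing with a script. The following sketch uses Python's zipfile module and a hypothetical function name; it writes every file directly into one chosen directory and ignores the folder names inside the archive.

```python
import os
import shutil
import zipfile


def extract_flat(archive, target_dir):
    """Extract all files into target_dir, dropping folder names from the archive."""
    os.makedirs(target_dir, exist_ok=True)
    written = []
    with zipfile.ZipFile(archive) as zf:
        for info in zf.infolist():
            if info.is_dir():
                continue  # skip folder entries; only files are written
            name = os.path.basename(info.filename)
            destination = os.path.join(target_dir, name)
            with zf.open(info) as source, open(destination, "wb") as out:
                shutil.copyfileobj(source, out)
            written.append(destination)
    return written
```

One caveat of this design: two files with the same name in different folders of the archive would overwrite each other, so it suits archives whose file names are unique.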

File Sizes

Pay attention to the sizes of files as you download them. The DLI Web site as well as the Readme file lists the compressed and uncompressed sizes of files.

File Sizes

You can also determine the uncompressed size of a file in WinZip before attempting to uncompress it.
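
The uncompressed size is stored inside a Zip archive, so it can also be read with a short script before anything is extracted. This sketch uses Python's zipfile and shutil modules; both function names are hypothetical.

```python
import shutil
import zipfile


def uncompressed_size(archive) -> int:
    """Return the total size in bytes of all files once uncompressed."""
    with zipfile.ZipFile(archive) as zf:
        return sum(info.file_size for info in zf.infolist())


def fits_on_disk(archive, target_dir) -> bool:
    """Compare the uncompressed size with the free space in target_dir."""
    return uncompressed_size(archive) <= shutil.disk_usage(target_dir).free
```

The sizes come from the archive's own directory, so the check is only as reliable as the archive itself.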

Maxline Utility

The DLI FTP site has a useful utility to check the record length and number of records in files. This is particularly useful in confirming the contents of raw data files.

Maxline Utility

The maxline utility is under the directory util and is named maxline.exe.

Maxline uses DOS naming conventions (8.3). To find proper DOS names, you may need to use the DOS command: dir /x

Maxline Utility

The line length of raw data files should match the maximum specified in the documentation. Maxline identifies the number of records by counting line feeds.
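
Where maxline.exe cannot be run, the same two figures can be approximated with a short script. This is a sketch of the idea, not a reimplementation of maxline, whose exact behaviour is not documented here. It assumes records end in a line feed and excludes the end-of-line characters from the record length; the function name is hypothetical.

```python
def max_line(path):
    """Return (number_of_records, longest_record_length) for a raw data file."""
    records = 0
    longest = 0
    with open(path, "rb") as data_file:
        for line in data_file:  # binary iteration splits on line feeds
            if line.endswith(b"\n"):
                records += 1  # count a record for each line feed
            # Measure the record without its CR/LF end-of-line characters.
            longest = max(longest, len(line.rstrip(b"\r\n")))
    return records, longest
```

A file that uses only carriage returns (the Macintosh convention) contains no line feeds, so this sketch would report zero records for it. Such a result is a hint to check how the file was transferred.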

Naming Conventions

You may choose to institute a naming convention to help store files locally. For example, you may choose to use the DLI directory names. Alternatively, you may use an accession number to categorize DLI titles.

Naming Conventions

The only concern about changing names of files is that you may at some point need to return to the DLI FTP site to confirm something about a file. You’ll then need to know the original file name that is used on the DLI FTP site.
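
One way to keep the original names at hand is to log every rename in a small manifest file. The sketch below stores the pairs in a CSV file using Python's csv module; the function names, column names, and example file names are all hypothetical.

```python
import csv
import os


def record_name(manifest_path, original_name, local_name):
    """Append one original-name / local-name pair to the manifest."""
    is_new = not os.path.exists(manifest_path)
    with open(manifest_path, "a", newline="", encoding="utf-8") as manifest:
        writer = csv.writer(manifest)
        if is_new:
            writer.writerow(["original_name", "local_name"])
        writer.writerow([original_name, local_name])


def original_name_for(manifest_path, local_name):
    """Look up the name a local file had on the FTP site, or None if unknown."""
    with open(manifest_path, newline="", encoding="utf-8") as manifest:
        for row in csv.DictReader(manifest):
            if row["local_name"] == local_name:
                return row["original_name"]
    return None


# Example with hypothetical names:
# record_name("dli_manifest.csv", "gss/Readgss.txt", "0001_readme.txt")
# print(original_name_for("dli_manifest.csv", "0001_readme.txt"))
```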

Time for Exercises