Using the Web for Research: gathering and referencing online materials

Venue: SOAS, L62

Tutor: Jens Franz ([email protected])

3 consecutive Wednesdays, 10:00 - 12:00 or 14:00 - 16:00, in room L62 on January 25th, February 1st and 8th 2006

NB. This is a three-part course and while you can attend a session on its own, the sessions build upon each other and it is therefore recommended that you attend all three sessions.

Session One - the basics - January 25th:
- Introduction to workshop and participants.
- acquiring and documenting text and graphics content from the internet in a restricted environment such as university computers / internet cafés (with only Internet Explorer & MS Word available).
- citation conventions for material gathered from online sources, including web pages, blogs, usenet postings & e-mail correspondence.
- obtaining background information on a website or web forum contributor: who owns the site and where is it located (IP addresses, WhoIs).

Session Two - advanced archiving techniques - February 1st:
- more efficient methods of gathering online material for research using alternative freeware & open source software (screen capturing, tree-based information managers to collect and structure online clippings, Firefox browser and extensions).
- using a USB stick to run such utilities on other computers.

Session Three - streaming & mirroring - February 8th:
- how to capture and archive streaming audio and flash animations.
- strategies for archiving (ephemeral) sections of a website on your own computer for later analysis (site mirroring).

Objectives & Learning Outcomes:
At the end of the workshops, students should have gained the following skills:
- ability to capture & customise format and appearance of text and graphics gathered on the internet.
- understanding of and competence in referencing online material.
- awareness of underlying structure of the internet and ability to identify ownership of a website and geographic location of a forum contributor.
- ability to archive material that cannot be saved easily and/or is of a fleeting nature.

*****

If you would like to learn more about using online electronic journals and bibliographies available at SOAS, please refer to the SOAS library webpages for details of resources and workshops: http://www.soas.ac.uk/library/index.cfm?navid=224



day one: background and basics

1.1 capturing parts of webpages for presentations/essays

Function of the Print Scrn key: it captures the entire screen to the clipboard, from where it can be pasted into other applications such as MS Word, Wordpad or Paint for further processing. ALT + Print Scrn captures only the active window onto the clipboard. One thing that does not get captured, though, is video clips: the area where the video plays stays black in the screenshot.

1.1.1 what default programmes are there?

The following applications can be found on almost any public Windows computer, together with some tips on how to use them to document your online material.

Internet Explorer (IE):
- Increase the size of small text before capturing. Text can be scaled up without loss of quality while it is still text and not part of an image: IE menu bar: ‘View’ – ‘Text Size’ (this does not scale graphic elements, unlike the Firefox or Opera browsers, which do).
- Use F11 to show a webpage in full screen mode in order to capture a larger section of a webpage than is visible in the browser window.
- Flash animations can be stopped, rewound and stepped through frame by frame through the menu available with a right-click over the animation (this context menu is sometimes disabled by the authors of a Flash animation).

MS Word:
- When copying text from webpages with a complex layout, use the ‘Paste Special’ option (in the Edit menu) to avoid transferring the webpage layout into messy Word tables.
- The Picture Toolbar can be used to manipulate captured images, particularly to crop them to the relevant elements, but also to adjust size and brightness.

[Figure: outcome of an attempt to capture streaming video: the screen remains black.]


[Figure: cropping, before and after. URL: http://www.hostip.info/]

Paint: A very rudimentary image editing programme. It is good enough to save images captured and pasted via the clipboard, but if possible use an image viewer such as Irfanview (will be covered on day 2) for this. Paint in Windows XP now allows files to be saved in compressed formats such as .jpeg & .gif, but still does not allow cropping, so this has to be done in MS Word.

1.1.2 recovering Flash animation files from the browser cache

The cache is a folder where your internet browser (temporarily) stores copies of visited webpages. Multimedia content such as Macromedia Flash animations is usually stored there too, and while you cannot save these files to a local directory from within the browser window, you can look for them in the cache folder and copy them from there to a permanent location (e.g. the My Documents folder). For IE on Windows the cache folder is usually in the following location: C:\Documents and Settings\[username]\Local Settings\Temporary Internet Files, and you can look for files with the extension SWF, DAT or WMV.

There is, however, an easier way to access the cache of a browser:
- In Internet Explorer you can reach it through the Options menu: Tools – Internet Options – General tab – Settings – View Files
- In the Firefox & Opera browsers you can view the cache by typing “about:cache” in the URL bar (“about:config” opens the browser settings instead). A list of the files in the cache is displayed in a browser window and can be searched (CTRL + F).


1.2 citation of online sources

As with references to printed materials, the purpose of consistent and accurate citation of online resources is to allow your readers to access the material you have quoted, and this may require the provision of additional information. Many professional bodies in the US have issued guidelines for citation of online materials. The UK is a bit behind here, and since citation styles vary between academic disciplines, the following can only highlight general points to look out for rather than give definitive prescriptions.

Wherever possible, provide the following information and in this order:
– Author
– Year (or full date)
– Title (the name of the web page)
– Type of Document, i.e. WWW page, Newsgroup, Listserv message
– URL (do include the protocol type, i.e. http://, ftp://, gopher://)
– “date accessed” or “accessed on” or “retrieved on” [Access Date].

e.g.:

Chandler, Daniel (1998): 'Personal Home Pages and the Construction of Identities on the Web'. WWW document, URL: <http://www.aber.ac.uk/media/Documents/short/webident.html>, accessed 14 June 2005.

or if there is no identifiable author or date of publication:

Wikipedia (n.d.): ‘IP address’, WWW document, URL: <http://en.wikipedia.org/wiki/IP_address>, accessed on: 12 June 2005.

Note:
- If a webpage hasn’t got an author or a date (which is often the case), then either quote the institution or compiler of the website, “name (comp.)”, or start with the page title.
- The access date is important because of the fleeting and changing nature of much online content: an article you surf across today might have been changed or deleted from the website tomorrow, or its URL may no longer be correct.
- URLs can often get very long, so the convention is to place a linebreak after a slash or before a dot. Online journals and some weblogs provide a “persistent link” to an online article; quote that rather than the often cryptic and volatile actual URL of the webpage.
- When quoting from private correspondence, do not include the sender’s or recipient’s e-mail address unless you have the explicit consent of the correspondent. Personal communications are also not listed in the bibliography.
- If your work will be available online, do not use underlining to indicate italicization, since underlining in web pages indicates the presence of a hypertext link.
- When providing a hyperlink in your paper, it may be a good idea to leave at least one space before a punctuation mark to clearly separate it from the URL.
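If you keep your sources in a list or spreadsheet, the order of elements recommended above can be applied mechanically. The following Python sketch is an illustration only: the function name `format_citation`, its parameters and the exact punctuation are assumptions modelled on the Chandler example, not a prescribed style.

```python
from datetime import date

MONTHS = ("January", "February", "March", "April", "May", "June", "July",
          "August", "September", "October", "November", "December")


def format_citation(title, url, accessed, author=None, year=None,
                    doc_type="WWW document"):
    """Assemble author, year, title, document type, URL and access date.

    A missing year becomes "n.d."; with no author (or institution/compiler)
    the reference starts with the page title instead.
    """
    when = str(year) if year else "n.d."
    access = f"{accessed.day} {MONTHS[accessed.month - 1]} {accessed.year}"
    if author:
        head = f"{author} ({when}): '{title}'."
    else:
        head = f"'{title}' ({when})."
    return f"{head} {doc_type}, URL: <{url}>, accessed {access}."


print(format_citation(
    author="Chandler, Daniel",
    year=1998,
    title="Personal Home Pages and the Construction of Identities on the Web",
    url="http://www.aber.ac.uk/media/Documents/short/webident.html",
    accessed=date(2005, 6, 14),
))
```

Whatever style your discipline requires, the point of such a helper is consistency: the access date and the full URL including the protocol are never forgotten.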


Comprehensive online guides for quotation styles are provided here:
APA style: http://www.bedfordstmartins.com/online/cite6.html
MLA style: http://www.bedfordstmartins.com/online/cite5.html

And there are two books available in the SOAS library dealing with the citation of sources you didn’t even think of (but not necessarily the ones you did think of, such as a weblog – that’s technological progress for you ;-) ):
♦ Gibaldi, Joseph (2003): MLA Handbook for Writers of Research Papers, 6th ed., The Modern Language Association of America, New York. [SOAS Lib.: A 808.027/834,125]
♦ Walker, Janice R. & Taylor, Todd (1998): The Columbia Guide to Online Style, Columbia University Press, New York. [SOAS Lib.: A 808.027/795,145]


day two: gathering and referencing online materials

1.1 a case for Sam Spade – investigating the background of a website

When looking at a website it can sometimes be useful to find out whether the claims made about the operators of the site are indeed accurate, and also where a specific website or page is hosted geographically. This section gives some basic hints on how to find out, though you should keep in mind that there are many cases in which these tools do not reveal the true owner/operator of a website (such as sites registered through anonymous registrars or hosted on free hosting services such as Tripod or GeoCities).

1.1.1 WhoIs, IP, DNS

A URL with a domain name such as www.samspade.org is the user-friendly way of identifying a website. However, the internet is actually structured by the underlying Internet Protocol (IP) system. An address in the current standard looks like this: 206.117.161.81, and it is used by communicating computers to identify each other. A Domain Name System (DNS) server resolves a domain name sent to it by your internet browser to its corresponding IP address. Each IP address is unique on the internet, and IP addresses are often likened to telephone numbers, except that they don’t have the equivalent of an area code, so the IP address itself gives no indication of where the computer is located geographically.

This is where services such as http://www.hostip.info or http://www.networldmap.com/ come in. They make use of databases of internet servers’ IP addresses and their geographical location. There is also a spreadsheet file containing a list of IP addresses and the name of the country they have been assigned to, which can be downloaded at: http://www.maxmind.com/app/geoip_country .
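The lookup that your browser triggers behind the scenes can be reproduced in a few lines. This is a minimal Python sketch using only the standard library; the helper names `resolve` and `describe` are illustrative, and `resolve` needs a working internet connection for real domain names.

```python
import ipaddress
import socket


def resolve(hostname):
    """Ask the system resolver (and through it DNS) for an IPv4 address."""
    return socket.gethostbyname(hostname)


def describe(address):
    """Show an IP address in dotted form, as a plain number, and its IP version."""
    ip = ipaddress.ip_address(address)
    return {"address": str(ip), "version": ip.version, "as_integer": int(ip)}


# The dotted notation is just a readable way of writing one 32-bit number.
print(describe("206.117.161.81"))
```

With a connection available, `describe(resolve("www.samspade.org"))` performs the name-to-address step described above; the result depends on the site's current DNS records.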

To find out who owns a domain name, the WhoIs service can be used. While there are many such servers responsible for different Top-Level Domains (TLDs) such as .uk, .cn, .org or .tv, there are also convenient websites that try to automatically forward the request to the right server. Two good websites for this task are: http://www.samspade.org and http://centralops.net/co/

For more information about the general workings of the Internet Protocol and also the next generation, IPv6, see: http://en.wikipedia.org/wiki/IP_address .
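What the forwarding websites do can be sketched directly, since a WhoIs query is simply a line of text sent to a server on TCP port 43. The Python below is a hedged sketch, not a replacement for the websites above: the function names are illustrative, the default server whois.iana.org and the `refer:` line it is assumed to return are assumptions about that server's reply format, and the network call will fail on machines where port 43 is blocked.

```python
import socket

WHOIS_PORT = 43  # the standard WhoIs port


def build_query(domain):
    """A WhoIs request is the name followed by CR LF (plain ASCII names only)."""
    return domain.strip().lower().encode("ascii") + b"\r\n"


def whois(domain, server="whois.iana.org", timeout=10.0):
    """Send one query to one WhoIs server and return its text reply."""
    chunks = []
    with socket.create_connection((server, WHOIS_PORT), timeout=timeout) as conn:
        conn.sendall(build_query(domain))
        while True:
            data = conn.recv(4096)
            if not data:  # the server closes the connection when it is done
                break
            chunks.append(data)
    return b"".join(chunks).decode("utf-8", errors="replace")


def find_referral(reply):
    """Return the server named on a 'refer:' line, or None if there is none."""
    for line in reply.splitlines():
        key, _, value = line.partition(":")
        if key.strip().lower() == "refer" and value.strip():
            return value.strip()
    return None
```

A typical use would be to query the default server first, pass the reply to `find_referral`, and then repeat the query against the server it names, which is the "forwarding to the right server" step mentioned above.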


1.2 graphics capture and formatting with Irfanview

1.2.1 GIF, TIF, JPG, dpi:

There are lots of graphic file formats you can use to save graphics in. We will concentrate on the most widespread ones here, namely jpg, gif & tif. The various file formats and compression/quality options available in Irfanview can be selected in the dialog window upon saving an image.

TIF: can save graphics with various compression algorithms, but may be most useful for saving black & white images (e.g. text in image form) at a small file size without loss of quality (using the CCITT Fax 4 compression type).

JPG: uses a lossy compression algorithm to achieve small file sizes, which can lead to artefacts, especially with small text or intricate patterns. Quality can usually be adjusted, though JPGs should be created from an uncompressed original since re-compression increases the artefacts. Recommended format for screenshots containing only graphics.

GIF: reduces the number of colours to an (indexed) palette of 256, but compresses images containing text much better. Recommended for screenshots containing text & graphics.

[Figure: the same text captured as JPG (above; note the artefacts), size 7kB, and as GIF (below), size 2.5kB.]

dpi (= dots per inch): the resolution of the image. While monitor screens have dpi values of 72 or 96, printers operate with 300 dpi upwards. The dpi value does not in itself change the size of an image, but many software programmes use this value to determine the scale at which an image is displayed. In Irfanview you can change the dpi value by clicking the image information button in the middle of the toolbar.
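The relationship between pixels, dpi and printed size is simple division, which the following Python sketch spells out. The function names and the sample image size are illustrative assumptions.

```python
def print_size_inches(width_px, height_px, dpi):
    """Printed size of an image: pixels divided by dots per inch."""
    return width_px / dpi, height_px / dpi


def pixels_needed(width_in, height_in, dpi):
    """Pixels required to fill a given printed size at a given dpi."""
    return round(width_in * dpi), round(height_in * dpi)


# The same 960 x 480 pixel screenshot at screen and at print resolution:
print(print_size_inches(960, 480, 96))    # 10 x 5 inches
print(print_size_inches(960, 480, 300))   # 3.2 x 1.6 inches
```

This is why a screenshot that fills the monitor shrinks to a few inches when printed at 300 dpi: the pixel count stays the same and only the scale changes.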


1.3 links to the applications introduced in the workshop

Below is a list of the applications used in the workshop. A more extensive description of what they do and how to use them is contained in a sample Keynote file which we will be using during the session and which you can download from the USB_files folder on the server (Pandora / usually mapped to drive W: ) and play with.

1.3.1 Mozilla Firefox v1.50

An open-source internet browser to complement/replace IE.
URL: http://www.mozilla.com
The portable version of Firefox can be found at:
URL: http://portableapps.com/internet/browsers/portable_firefox

Firefox is a web browser alternative to MS Internet Explorer, particularly useful for online research due to the extensions available for it, which allow you to customize the browser with additional functions. A catalogue of the extensions available can be browsed at https://addons.mozilla.org/?application=firefox .

Firefox Extensions discussed in the workshop:
- ScrapBook 0.18.5, URL: http://amb.vis.ne.jp/mozilla/scrapbook/
- Mozilla Archive Format (MAF), URL: http://maf.mozdev.org/installation.html
- CopyURL+, URL: http://copyurlplus.mozdev.org/

1.3.1.1 IrfanView v3.98
URL: http://www.irfanview.com
Small graphics viewer that allows basic editing of images.

1.3.1.2 Keynote v1.6.5
URL: http://www.tranglos.com/free/keynote.html
A very handy little information manager that allows you to keep any kind of information in a tree-based node structure.

1.3.1.3 AudioGrabber v1.83
URL: http://www.audiograbber.com-us.net/
Does what it says on the tin: it can be used to capture the sound from streaming audio and video (in fact any sound you can hear on your computer) to mp3 files.

1.3.1.4 HTTrack v3.40
URL: http://www.httrack.com/index.php
An open source offline browser/website mirroring programme that is free for private use. It goes out onto the internet and mirrors parts of a website as a local copy on your computer. You can then work offline with that mirror.
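To give an idea of what a mirroring programme has to do for every page it fetches, here is a simplified Python sketch of two core steps: finding the links that stay on the same site, and deciding where each page is stored locally. This is not HTTrack's actual method; all names (`LinkCollector`, `same_site_links`, `local_path`, the "mirror" folder) are illustrative assumptions, and query strings, downloading and link rewriting are left out.

```python
from html.parser import HTMLParser
from pathlib import PurePosixPath
from urllib.parse import urldefrag, urljoin, urlparse


class LinkCollector(HTMLParser):
    """Collect the href/src targets found in one HTML page."""

    def __init__(self):
        super().__init__()
        self.targets = []

    def handle_starttag(self, tag, attrs):
        for name, value in attrs:
            if name in ("href", "src") and value:
                self.targets.append(value)


def same_site_links(page_url, html):
    """Absolute URLs found in html that stay on page_url's host, no duplicates."""
    collector = LinkCollector()
    collector.feed(html)
    host = urlparse(page_url).netloc
    found = []
    for target in collector.targets:
        # Resolve relative links and drop "#section" fragments.
        absolute, _fragment = urldefrag(urljoin(page_url, target))
        if urlparse(absolute).netloc == host and absolute not in found:
            found.append(absolute)
    return found


def local_path(url, root="mirror"):
    """Map a URL to a relative file path inside the mirror folder."""
    parts = urlparse(url)
    path = parts.path
    if path == "" or path.endswith("/"):
        path += "index.html"  # directory-style URLs need a file name
    return str(PurePosixPath(root, parts.netloc, path.lstrip("/")))
```

A mirroring run repeats these steps page by page: download a page, save it under `local_path`, then queue everything `same_site_links` returns that has not been visited yet.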

day three: gathering and referencing online materials

1.1 Additional Notes for Week Two

Some of you were a bit puzzled by the lack of printed step-by-step instructions on how to use the additional programmes such as Keynote or Irfanview. While the purpose of the workshop is to introduce you to these tools and leave you to discover their full potential and applicability to your specific circumstances, here are some basic hints on how to get them up and running. For further information on these programmes please see the Keynote file “Onlineresearch_wks_2006.knt”, use the help & tutorial files of the individual applications, and also have a look at the programmes’ web-links in the handout for week 2.

Launching and Creating Shortcuts for the Applications

It also became apparent that some people were missing shortcuts to the programme files located within the individual application folders. Please remember that Firefox and Keynote first need to be moved to a location where you have write access, e.g. the Desktop or the My Documents folder (you can do this by click-dragging the relevant folder onto the desktop). From the Desktop you can then also copy the folder to your USB drive (E:). One other thing to keep in mind is that some applications such as Audiograbber or Keynote may take a while to close down (20-30 seconds) when they are running from a server or USB stick. Please just be patient; there is no need for concern that your computer may have crashed.

To help you identify the programme file (.EXE) that you need to double-click inside the respective application folder in order to start an application, please take a look at the images below:


You can create shortcuts to these programmes, but remember that these shortcuts might not work when you access them on a different computer. If you do transfer some of the applications onto a USB stick and use that mainly at SOAS, shortcuts placed anywhere on the USB stick (e.g. root directory E:\) should work ok.

Irfanview and capturing images

Once Irfanview is started, the magic key for beginning to capture screenshots is C. Once you have set the capture options, Irfanview keeps running in the background (see taskbar at bottom of screen) until you press CTRL + F11 to capture the screen/active window. You can then crop the image by dragging a frame around the section you want to keep and choosing ‘Crop’ from the menu { CTRL + Y }. With Irfanview you can also sharpen the image, resize it, alter its colours & contrast, or insert a caption (see the ‘Edit’ menu), and save the result in an image format of your choice.

The screen capture tool is useful mainly when trying to capture an ensemble of text and graphics or individual frames from an animation (Video, Flash or GIF). However, if you only want to copy an image from a webpage, the best way of doing that is to right-click over the image and choose “save picture as”. The saved file will give you the best available quality of the image to archive or manipulate.

1.1.1 Keynote

For using Keynote, the tree-based information management tool, I would recommend that you explore the “sample.knt” file which is located in the Keynote directory. It gives you a quick introduction to the main aspects of this versatile tool.

To automatically capture several bits of text from webpages (or indeed any other source such as PDFs or Word documents) into separate nodes in the document tree, press F11 or choose ‘Note – Clipboard Capture’ from the menu. You can re-arrange the location and hierarchy of nodes and sub-trees by dragging them with the mouse, even into other tabs in the same file. Most functions can be accessed in several ways: by keyboard shortcut, by icon in the toolbar or through the menu.

1.1.2 Firefox & Extensions:

Mozilla Firefox is an alternative to Internet Explorer (IE) that is fast gaining in popularity. It is an open-source programme that conforms to the standards set for web-pages. Its strong points vis-à-vis IE are that it is faster in rendering web-pages, it is slightly more secure, and it allows several web-pages to be opened in the same window as tabs (see below).


But for gathering materials online, Firefox’s biggest advantage is the possibility to add certain functions in the form of ‘Extensions’, and we will focus on the use of an extension called ‘Scrapbook’ to capture graphics and text exactly the way they appear on screen, and complete with their URL and date of access.

While we are using the portable version of Firefox in the workshop, the only difference between this and the standard version is that the latter supports Java – the portable version has to rely on Java already being installed on a computer you operate on.


1.2 Sound Recording

More and more websites provide sound and video in addition to text-based content, and at times it can be useful to record such audio either for research or language-learning purposes. While podcasting and the spread of mp3-players have meant that many websites now offer their audio programmes in a downloadable form, others still only provide streaming audio and video which cannot easily be saved. We will look at ways to record such sound anyway.

Windows does have a little programme called Sound Recorder, and it can be located from the Start Menu under Programmes – Accessories – Entertainment. While it does allow recording of sound, it only exports uncompressed WAV files, i.e. very large files, and what is even worse is that it only records for 60 seconds. Therefore we will turn to a freeware application called Audiograbber to record streaming audio.

Regardless of which programme you use to record streaming audio, it is necessary to adjust the ‘Audio Properties’ of the sound card to tell it what to record. ‘Audio Properties’ are accessed by right-clicking the speaker symbol in the Windows System Tray, choosing ‘Open Volume Control’ and then selecting ‘Properties’ from the ‘Options’ menu. [If the speaker symbol is not visible, you can try to access the audio settings through Start Menu – Control Panel – Sounds and Audio Settings.] See the image below for the relevant settings:

When using Audiograbber, there is actually a shortcut to the audio settings window in the recording window itself that makes life a lot easier. When setting the recording volume, be sure that the level bars do not go beyond 100% - otherwise you will end up with very distorted recordings.


Audiograbber – as the name suggests – is a tool to capture audio from the computer into mp3, wav or other file formats. It was originally written to convert music CDs to mp3 files, but it does have a function called "Line in sampling" under the File menu which records from other sound sources such as an audio or video stream playing in Windows Mediaplayer or Realplayer. You could even record straight onto your mp3-player if you set the output directory for the mp3 files accordingly.
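How much space a recording takes depends on the format you choose. The following Python sketch works through the arithmetic; the default values (CD-quality WAV at 44,100 samples per second, 16 bit, stereo, and an mp3 bitrate of 128 kbit/s) are illustrative assumptions, not the settings used in the workshop.

```python
def wav_bytes(seconds, sample_rate=44_100, bits_per_sample=16, channels=2):
    """Size of the audio data in an uncompressed WAV recording, in bytes."""
    return seconds * sample_rate * (bits_per_sample // 8) * channels


def mp3_bytes(seconds, kbit_per_second=128):
    """Approximate size of a constant-bitrate mp3 recording, in bytes."""
    return seconds * kbit_per_second * 1000 // 8


one_hour = 60 * 60
print(wav_bytes(one_hour) / 1_000_000, "MB as WAV")   # about 635 MB
print(mp3_bytes(one_hour) / 1_000_000, "MB as mp3")   # about 58 MB
```

For speech, a lower mp3 bitrate and mono recording reduce the file size further, which matters if you record straight onto an mp3-player or a USB stick.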

All the necessary adjustments for recording voice quality sound have already been made on the version of Audiograbber in the USB_files folder. All you need to do is give your files a name and press ‘Record’…

For further details on settings and the Website mirroring programme HTTrack please see the “onlineresearch_wks_2006.knt” Keynote file.