Managing experiment data using Excel and Friends

Lane Medical Library & Knowledge Management Center

http://lane.stanford.edu

Managing Experiment Data Using Excel and Friends: Digging Out from Under the Avalanche

Yannick Pouliot, PhD

Bioresearch Informationist

Lane Medical Library & Knowledge Management Center

6/1/2006

© 2006 The Board of Trustees of The Leland Stanford Junior University


Learning to better manage experiment data using MS Excel and MySQL




Course Expectations: Objectives

Demonstrate

… good practices

… useful features

… the value of querying via Excel

Windows vs. Mac

Structure

Examples, use cases

Exercises

Resources

Class evaluation questionnaire: http://www.surveymk.com/s.asp?u=915602161402


Contents (in order of increasing complexity)

Excel good practices

Excel handy functions

Querying Web sites & databases using Excel


So Why Are We Here?

Lots of data

Need for better management of these data

Our needs exceed what Excel can do

Excel was never really meant for data management anyway

Applying common tools to ameliorate the problem

“In IT, there’s no problem that enough money can’t solve”: not the philosophy here…

Instead: invest yourself and you’ll get a handsome return


Essential Tip

Clippy: not as dorky as you might think


How To Help Clippy Give You Better Answers

Read a (good) Excel manual cover to cover

Don’t try to understand everything

Just flip through the pages and let the content sink in

Not fun, but it will give you the requisite vocabulary

Increases your odds of getting the right answer

Gives you an idea of what Excel can do


Part I: Essential Excel Functions


Essential Excel Functions

1. Conditional Formatting

2. Named ranges & Input validation

3. Custom Toolbar

4. PivotTable

5. Web Querying

6. MS Query


Excel Functions 1: Conditional Formatting

Definition: formatting (e.g., cell shading or font color) that Excel applies automatically to a cell when a specified condition is true.

Example: shading a cell green when the test result it contains exceeds a threshold value

Menu: Format > Conditional Formatting

See Spreadsheet1.xls/ConditionalExample1 - try

Reference
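The logic of a single conditional-formatting rule can be sketched in Python as a rough analogy only; this is not how Excel evaluates rules internally. The function name `conditional_fill`, the 2.0 threshold, and the sample values are hypothetical.

```python
def conditional_fill(value, threshold=2.0, fill="green"):
    """Return the fill color for a cell, or None to leave it unformatted.

    Mirrors one rule of the form 'cell value > threshold'.
    The default threshold and color are illustrative assumptions.
    """
    if value > threshold:
        return fill
    return None


# Hypothetical test results, one per cell
results = [0.5, 2.7, 1.9, 3.4]
fills = [conditional_fill(v) for v in results]
print(fills)
```

Cells whose value does not meet the condition keep their normal formatting, which is why the function returns `None` rather than a default color.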


Excel Functions 2: Named Ranges and Validation

Named ranges are ranges of cells that are…named!

Named ranges can be used to validate input data

Important for ensuring data consistency

Essential for queryability

Also useful for avoiding repetitive typing, via a drop-down menu

See: Spreadsheet1.xls/InputValidation - try

How to: here

Other references
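A minimal sketch of list-based input validation, assuming a hypothetical `NAMED_RANGES` dictionary that plays the role of named ranges used as validation lists. The exact, case-sensitive match is a choice made for this sketch, not a claim about how Excel compares entries.

```python
# Hypothetical named ranges: each name maps to the list of allowed entries,
# similar to a named range used as the source of a validation list.
NAMED_RANGES = {
    "Species": ["Mouse", "Rat", "Human"],
    "Tissue": ["Liver", "Kidney", "Brain"],
}


def validate(entry, range_name, named_ranges=NAMED_RANGES):
    """Return True only if entry exactly matches a value in the named range."""
    return entry in named_ranges[range_name]


rows = [("Mouse", "Liver"), ("mouse", "Liver"), ("Human", "Heart")]
for species, tissue in rows:
    ok = validate(species, "Species") and validate(tissue, "Tissue")
    print(species, tissue, "OK" if ok else "REJECTED")
```

Rejecting "mouse" and "Heart" is the point: only the spellings in the list get in, which keeps later sorting and querying consistent.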


Excel Functions 3: Custom Toolbar

Why? Bring frequently used functions together for faster access

DEMO

How to? 50 min online tutorial

Section on custom toolbars here


Excel Functions 4: PivotTables

Automatic summarization of data

Converts data in the same category into summary values

Reshapes tall/skinny data into a wide/fat summary

See: Spreadsheet3.xls/Summary1 - try

Underlying data can always be accessed by double-clicking on a summary cell

Online demo (5 min)

How to? 30 min online tutorial
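What a PivotTable does to tall/skinny data can be approximated in a few lines of Python. The `pivot` helper and the sample records are hypothetical; `agg` defaults to the mean, which corresponds to picking Average as the summary function.

```python
from collections import defaultdict
from statistics import mean

# Hypothetical tall/skinny records: one row per measurement.
records = [
    ("sample1", "GeneA", 2.0),
    ("sample1", "GeneB", 4.0),
    ("sample2", "GeneA", 3.0),
    ("sample2", "GeneA", 5.0),
    ("sample2", "GeneB", 1.0),
]


def pivot(rows, agg=mean):
    """Summarize (row_key, column_key, value) records into a wide table.

    Returns {row_key: {column_key: aggregated value}}.
    """
    # Group all values that share the same (row, column) category
    groups = defaultdict(list)
    for row_key, col_key, value in rows:
        groups[(row_key, col_key)].append(value)
    # Collapse each group into a single summary value
    table = defaultdict(dict)
    for (row_key, col_key), values in groups.items():
        table[row_key][col_key] = agg(values)
    return dict(table)


wide = pivot(records)
for sample in sorted(wide):
    print(sample, wide[sample])
```

Passing `agg=sum` gives totals instead of averages; the `groups` dictionary still holds the underlying values behind each summary cell.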


Excel Functions 5: Web Querying

Why Query the Web Using Excel?

Data in a Web page are only a first step

We need the data in the tool we use for daily work:

Excel

E.g., with a list I can:

Sort

Annotate

Edit


Excel Functions 5: Web Querying

Options

1. Copy/paste a Web page into Excel - try

2. Run a Web query from within Excel: more control - try

Going one step further: creating a refreshable Web query

Excel Web querying is not perfect…

Still limited by how data are formatted on the Web page

Often requires editing

Some Web pages don’t work

No arbitrary querying capability (limited by the Web interface)

The answer: direct querying, e.g., using SQL
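Conceptually, a Web query pulls the rows of an HTML table into a list. The sketch below does the same with Python's standard `html.parser` module on a hypothetical page held in a string (a real page would first have to be downloaded). It also illustrates the limitation above: data that are not marked up as `<tr>`/`<td>` cells are simply not found. The class name `TableExtractor` is hypothetical.

```python
from html.parser import HTMLParser


class TableExtractor(HTMLParser):
    """Collect the text of every <td>/<th> cell, grouped by <tr> row."""

    def __init__(self):
        super().__init__()
        self.rows = []
        self._row = None   # cells of the row currently being read
        self._cell = None  # text fragments of the cell currently being read

    def handle_starttag(self, tag, attrs):
        if tag == "tr":
            self._row = []
        elif tag in ("td", "th") and self._row is not None:
            self._cell = []

    def handle_data(self, data):
        if self._cell is not None:
            self._cell.append(data)

    def handle_endtag(self, tag):
        if tag in ("td", "th") and self._cell is not None:
            self._row.append("".join(self._cell).strip())
            self._cell = None
        elif tag == "tr" and self._row is not None:
            self.rows.append(self._row)
            self._row = None


# Hypothetical page content
page = """
<table>
  <tr><th>Gene</th><th>Chromosome</th></tr>
  <tr><td>TP53</td><td>17</td></tr>
  <tr><td>BRCA1</td><td>17</td></tr>
</table>
"""

parser = TableExtractor()
parser.feed(page)
header, *data = parser.rows
data.sort()  # once the data are rows, sorting is trivial
print(header, data)
```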


BREAK


Part II: Querying Databases Using Excel


Putting MSQuery to Work

MSQuery: an unsung hero

Free

Makes writing SQL queries easier through a graphical interface

What is SQL?

First, you need to find it! Search for “MSQRY32.EXE” using “Search for Files or Folders”

Include hidden files and folders in the search

On my disk, it is located in C:\Program Files\Microsoft Office\OFFICE11

Once you find it, create a shortcut to it and rename it, e.g., MSQuery

Move the shortcut to a convenient location
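If the Windows search tool is not handy, the same search can be scripted. This sketch walks a directory tree and matches the file name case-insensitively; the search root `C:\Program Files` is an assumption based on the location given above, and `find_file` is a hypothetical helper.

```python
import os


def find_file(filename, root):
    """Walk the tree under root and return every path whose name matches
    filename, ignoring case (Windows file names are case-insensitive)."""
    target = filename.lower()
    matches = []
    for dirpath, _dirnames, filenames in os.walk(root):
        for name in filenames:
            if name.lower() == target:
                matches.append(os.path.join(dirpath, name))
    return matches


if __name__ == "__main__":
    # Assumed search root; adjust to wherever Office is installed.
    for path in find_file("MSQRY32.EXE", r"C:\Program Files"):
        print(path)
```

Searching all of `C:\Program Files` can take a while; narrowing the root to the Microsoft Office folder speeds it up.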


Example: Network Querying of Ensembl Database Using MS Query

What happens when you use MS Query

DEMO

May take some time: big database, lots of data to return from far away…

[Diagram: MS Query sends a query to the remote DB; the results come back over the network]


FYI - Bioinformatics Databases: Who Supports Direct Querying?

Direct Queryability of Selected Bioinformatics Databases

Database | Internet SQL querying? | How? | Modality | DB Engine
ArrayExpress | Eventually | | SOAP-based |
Ensembl | Yes | http://www.ensembl.org/info/data/download.html | SQL | MySQL
Mouse Genome Database | Yes | Ask for account | SQL | Sybase
NCBI Entrez | Yes | http://eutils.ncbi.nlm.nih.gov/entrez/query/static/esoap_help.html | SOAP-based | SQL Server
PharmGKB | Yes | http://www.pharmgkb.org/home/projects/webservices/ | SOAP-based | Oracle
Saccharomyces Genome Database | Eventually | | Maybe | Oracle
Stanford Microarray Database | No | | | Oracle


How to Query Using MSQuery

Steps

1. Make sure you have the requisite driver

2. Create a Data Source Name

3. Write your SQL query

4. Get the results back into Excel!
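The four steps can be sketched end to end in Python, with one substitution stated up front: Python's built-in `sqlite3` module and an in-memory database stand in for the ODBC driver and DSN of Steps 1 and 2, so the sketch runs without a database server. The table, its contents, and the 1.0 cutoff are hypothetical; Step 4 is represented by CSV text, a format Excel can open.

```python
import csv
import io
import sqlite3

# Steps 1-2 stand-in: sqlite3 ships with Python, so no separate driver or
# DSN is needed. MySQL or Oracle would instead be reached through an ODBC
# driver and a DSN.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE results (sample TEXT, gene TEXT, expression REAL)")
conn.executemany(
    "INSERT INTO results VALUES (?, ?, ?)",
    [("s1", "GeneA", 2.5), ("s1", "GeneB", 0.7), ("s2", "GeneA", 3.1)],
)

# Step 3: write the SQL query (the cutoff is passed as a parameter).
query = (
    "SELECT sample, gene, expression FROM results "
    "WHERE expression > ? ORDER BY expression DESC"
)
cursor = conn.execute(query, (1.0,))
header = [col[0] for col in cursor.description]
rows = cursor.fetchall()

# Step 4: get the results into a form Excel can open (CSV text here).
buffer = io.StringIO()
writer = csv.writer(buffer)
writer.writerow(header)
writer.writerows(rows)
print(buffer.getvalue())
```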


Step 1: Getting Drivers, Essential for Querying

A driver is a piece of software that lets your operating system talk to a database

Each database engine (Oracle, MySQL, etc.) requires its own driver

Generally must be installed by the user

Drivers are needed by the Data Source Name tool and by querying programs

Installation is simple


MySQL Driver: Needed to Query MySQL Databases

Windows: Download MySQL Connector/ODBC 3.51 here

Must be installed for direct querying using, e.g., Excel

Not necessary if you are using the MySQL Query Browser


Oracle Driver: Needed to Query Oracle Databases

Installing the Oracle “client” software also installs the driver

Windows: Download 10g Client here

Mac: Download 10g Client here

Must be installed if you are querying using, e.g., Excel


Step 2: Creating a Data Source Name

A Data Source Name (DSN) tells programs on your PC where and how to query a database

Populating the fields:

Data Source Name: Unique name of your choice

Description: anything

Server: exactly as given by the database provider

Port number: as specified by database provider

Defaults: MySQL: 3306; Oracle: 1521; MS Access: N/A
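The fields above end up as key/value settings. As a sketch, the hypothetical helper `connection_string` assembles a DSN-less ODBC connection string from the same information. The keywords DRIVER, SERVER, PORT and DATABASE follow MySQL Connector/ODBC usage; other drivers (Oracle's in particular) use different keywords, so treat this as MySQL-oriented. With a saved DSN, the connection string reduces to `DSN=<your DSN name>;`.

```python
# Default ports from the slide; MS Access is file-based and has no port.
DEFAULT_PORTS = {"MySQL": 3306, "Oracle": 1521}


def connection_string(driver, server, engine, port=None, **extra):
    """Assemble an ODBC-style 'KEY=value;' connection string.

    A missing port falls back to the engine's default; extra keyword
    arguments (e.g. DATABASE) are appended in the order given.
    """
    if port is None:
        port = DEFAULT_PORTS[engine]
    parts = {"DRIVER": "{" + driver + "}", "SERVER": server, "PORT": port}
    parts.update(extra)
    return ";".join(f"{key}={value}" for key, value in parts.items()) + ";"


# Hypothetical server and database names: use the values your provider gives.
print(connection_string("MySQL ODBC 3.51 Driver", "db.example.org", "MySQL",
                        DATABASE="mydb"))
```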


Step 3: Building a Query

DEMO
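A query built graphically in MS Query is ultimately SQL: chosen columns, a join between tables on a shared field, criteria, and a sort order. The sketch runs that kind of query against a hypothetical two-table schema in an in-memory `sqlite3` database, used here as a stand-in for a remote MySQL or Oracle server; the table and column names are invented for illustration.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript(
    """
    CREATE TABLE gene (gene_id INTEGER PRIMARY KEY, symbol TEXT, chromosome TEXT);
    CREATE TABLE measurement (gene_id INTEGER, sample TEXT, value REAL);
    INSERT INTO gene VALUES (1, 'TP53', '17'), (2, 'MYC', '8');
    INSERT INTO measurement VALUES (1, 's1', 2.5), (2, 's1', 0.4), (1, 's2', 1.8);
    """
)

# Pick columns, join the two tables on their shared key, filter, and sort.
query = """
    SELECT g.symbol, m.sample, m.value
    FROM gene AS g
    JOIN measurement AS m ON m.gene_id = g.gene_id
    WHERE g.chromosome = ?
    ORDER BY m.value DESC
"""
rows = conn.execute(query, ("17",)).fetchall()
for row in rows:
    print(row)
```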


Resources – Excel: Summarizing Numerical Data

Data summarization (text):

http://office.microsoft.com/en-us/assistance/HA011864391033.aspx


Resources – MS Access: Free Online Training Resources

Using an Access database to store information (2 min): http://office.microsoft.com/en-us/assistance/HA011709681033.aspx

Creating a database from Excel (5 min): http://office.microsoft.com/en-us/assistance/HA012013211033.aspx

Creating tables in Access (50 min): http://office.microsoft.com/training/training.aspx?AssetID=RC061183261033

Writing queries (50 min): http://office.microsoft.com/training/training.aspx?AssetID=RC010776611033


Resources - Excel

[Book covers not reproduced: two titles available via Safari, one accessible from Lane Library]


MS Query Resources

Excellent tutorial:

http://office.microsoft.com/training/Training.aspx?AssetID=RP011856321033&CTT=6&Origin=RC011856161033


Resources – SQL

SQL = Structured Query Language

The language for querying relational databases

Beginning SQL, Wilton P & Colby JW: E

http://jenson.stanford.edu/uhtbin/cgisirsi/5AGuKeptoD/GREEN/59960102/9#holdings

Oracle SQL*Plus, Gennick J

Beginning MySQL: E

http://site.ebrary.com/lib/stanford/Doc?id=10114227
