Beyond The Bench Workshops

ORGANIZERS: SPONSORS: #BeyondTheBench #BECareer2013 #CurrentExchange

Page 1: Beyond The Bench Workshops


Page 2: Beyond The Bench Workshops
Page 3: Beyond The Bench Workshops

Establishing your online presence

Robert Aboukhalil

Page 4: Beyond The Bench Workshops

Why? You’re being Googled

Page 5: Beyond The Bench Workshops

#1: LinkedIn

Page 6: Beyond The Bench Workshops

Why LinkedIn?

•  Online CV + networking

•  Recruiters use LinkedIn

•  Find jobs posted on LinkedIn

•  Apply to jobs

Page 7: Beyond The Bench Workshops

www.linkedin.com/pub/robert-aboukhalil/84/a648df/

Page 8: Beyond The Bench Workshops
Page 9: Beyond The Bench Workshops

#2: Facebook

Page 10: Beyond The Bench Workshops
Page 11: Beyond The Bench Workshops

#3: Twitter

Page 12: Beyond The Bench Workshops

#4: Your website

Page 13: Beyond The Bench Workshops

Step 1: WordPress.com

Page 14: Beyond The Bench Workshops

Page 15: Beyond The Bench Workshops

Step 2: themeforest.net

Page 16: Beyond The Bench Workshops

Page 17: Beyond The Bench Workshops

Step 3: Have an awesome portfolio

Page 18: Beyond The Bench Workshops

Now what?

Page 19: Beyond The Bench Workshops
Page 20: Beyond The Bench Workshops
Page 21: Beyond The Bench Workshops

A language all scientists should know

How R helped me look at billions of genotypes and how it can help you too

Mitchell Bekritsky

WSBS Graduate Student

Page 22: Beyond The Bench Workshops

What is R?

•  Language for statistical analysis, data manipulation and graphics

•  Open source

•  Flexible language

•  Powerful built-in functions

•  Strong user community

•  Publication quality graphs

•  Free!

Graphic from http://blenditbayes.blogspot.com/2013/06/visualising-crime-hotspots-in-england_25.html

Page 23: Beyond The Bench Workshops
Page 24: Beyond The Bench Workshops

Who uses R?

Source: http://www.revolutionanalytics.com/what-is-open-source-r/companies-using-r.php

Page 25: Beyond The Bench Workshops

What is R used for?

•  Movie recommendations

•  Credit risk analysis

•  Tailoring online advertising

•  Predicting economic activity

•  Clinical drug development

•  News graphics

•  Modeling oil spills

•  Predicting election outcomes

Graphic from http://www.nytimes.com/interactive/2009/06/25/arts/0625-jackson-graphic.html

Page 26: Beyond The Bench Workshops

But I’m a biologist…

Page 27: Beyond The Bench Workshops

How R helped me see my data

•  First time looking at microsatellite genotypes

•  How many microsatellites differ from the reference genome?

•  By how much?

Problems:

–  Lots of data (4.7 million genotypes)

–  Complex information

–  Too big for Excel

–  No good graphics in Excel either

Page 28: Beyond The Bench Workshops

One of my first graphs in R

Lessons learned about my data

•  Lots of microsatellites differ from reference by a little bit

•  Thousands differ by ± 20 bp

•  8.27% of all microsatellites differ from reference (~400k)

Lessons learned about my graph

•  This is a terrible graph
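The two counts on the previous slide agree: 8.27% of the 4.7 million genotypes is roughly 389,000, which rounds to the ~400k quoted. The talk did this tabulation in R; below is a minimal sketch of the same idea in Python, assuming each genotype has already been reduced to one integer (called allele length minus reference length, in bp). The function names and toy data are hypothetical, for illustration only.

```python
from collections import Counter


def summarize_differences(diffs):
    """Summarize allele-length differences from the reference.

    diffs: list of ints, called allele length minus reference length (bp).
    Returns (histogram, fraction of genotypes that differ from reference).
    """
    histogram = Counter(diffs)  # number of genotypes at each difference
    differing = sum(count for d, count in histogram.items() if d != 0)
    fraction = differing / len(diffs) if diffs else 0.0
    return histogram, fraction


def text_histogram(histogram):
    """Crude text bar chart: one row per length difference."""
    for d in sorted(histogram):
        print(f"{d:+4d} bp | {'#' * histogram[d]}")


# Hypothetical toy input: most loci match the reference exactly.
toy_diffs = [0] * 12 + [-2, -1, 1, 1, 2, 4, -20, 20]
hist, frac = summarize_differences(toy_diffs)
text_histogram(hist)
print(f"{frac:.2%} of loci differ from reference")

# Scale check for the slide's numbers: 8.27% of 4.7 million genotypes.
print(round(0.0827 * 4_700_000))
```

Even a crude chart like this makes the shape of the data visible (a tall bar at 0, short tails on either side), which is the point the slide makes about a first, imperfect graph.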

Page 29: Beyond The Bench Workshops

A bad R graph is better than no R graph

Bad graphs helped me

•  Understand my data better

•  Improve my analyses

•  Improve how I communicate my data

•  R has incredible flexibility for graphing: if you can dream it, you can probably build it

Page 30: Beyond The Bench Workshops

My best R graphs make one point clearly without clutter

Page 31: Beyond The Bench Workshops

For example…

Page 32: Beyond The Bench Workshops

How R saved my thesis

•  Processing lots of sequencing data in hundreds of people

•  Too many people and processes to monitor every step of the pipeline by eye while data was being processed

Sanity check

•  After data processing, did the data look bi-allelic?

Page 33: Beyond The Bench Workshops

No!!  

Page 34: Beyond The Bench Workshops

Troubleshooting using R

•  People don’t actually have massive deletions and amplifications

•  My pipeline was deleting files because of a bug, which would remove large chunks of chromosomes

•  Thanks to R, I found the people in whom this had happened, tracked down the bug, and didn’t report massive CNVs in autistic children

Side note

•  If it looks too good to be true, it probably is
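The troubleshooting above amounts to asking, for each person and chromosome, whether an implausibly large share of loci look deleted. Here is a hedged Python sketch of that kind of screen; the record layout (sample, chromosome, read depth per locus), the 50% threshold and the function name are hypothetical choices for illustration, not the pipeline from the talk.

```python
from collections import defaultdict


def flag_dropouts(records, max_missing_fraction=0.5):
    """Flag (sample, chromosome) pairs where too many loci have no reads.

    records: iterable of (sample, chromosome, depth) tuples, one per locus.
    A real deletion covering most of a chromosome is implausible, so a high
    missing fraction more likely points to a processing problem.
    """
    total = defaultdict(int)
    missing = defaultdict(int)
    for sample, chrom, depth in records:
        key = (sample, chrom)
        total[key] += 1
        if depth == 0:
            missing[key] += 1
    flagged = []
    for key in sorted(total):
        fraction = missing[key] / total[key]
        if fraction > max_missing_fraction:
            flagged.append((key[0], key[1], fraction))
    return flagged


# Hypothetical toy data: sample "B" lost most of chr2.
toy = (
    [("A", "chr1", 30)] * 10 + [("A", "chr2", 28)] * 10
    + [("B", "chr1", 31)] * 10
    + [("B", "chr2", 0)] * 8 + [("B", "chr2", 25)] * 2
)
for sample, chrom, fraction in flag_dropouts(toy):
    print(f"check sample {sample}, {chrom}: {fraction:.0%} of loci missing")
```

A screen like this does not explain the cause. It points you at the samples worth tracing back through the pipeline, in the spirit of "if it looks too good to be true, it probably is."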

Page 35: Beyond The Bench Workshops

R helped me build a better genotyper

•  Some non-reference alleles aren’t covered well

•  This leads to incorrect genotype calls

Problem

•  How do I develop a smarter genotyper and know that it works?

Page 36: Beyond The Bench Workshops

[Figure: chr19:54772760, A repeat, reference length 8. Coverage of the 8 bp allele (x-axis, 0 to 100) plotted against coverage of the 10 bp allele (y-axis, 0 to 100). Legend: Genotypes 10|-1, 10|10, 8|-1, 8|10, 8|8]

Page 37: Beyond The Bench Workshops

Modeling genotypes in R

•  Built a model for biased genotypes in R

•  The model helped me build a more accurate genotyper

•  Clear improvements when applied to real data
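The slides do not describe the model itself. One common way to build a bias-aware genotyper, shown here as a swapped-in illustration rather than the talk's actual model, is a binomial likelihood: under each candidate genotype, a read comes from the non-reference allele with some probability, and a bias parameter lowers that probability in heterozygotes when the non-reference allele is under-covered. This Python sketch assumes two alleles and a fixed error floor; `alt_bias`, `error` and the read counts are hypothetical.

```python
from math import comb


def genotype_likelihoods(ref_reads, alt_reads, alt_bias=1.0, error=0.01):
    """Binomial likelihoods for hom-ref, het and hom-alt genotypes.

    alt_bias < 1 models a non-reference allele that is under-sampled:
    in a heterozygote, the expected share of alt reads drops below 0.5.
    """
    n = ref_reads + alt_reads
    het_alt_share = alt_bias / (1.0 + alt_bias)  # 0.5 when unbiased
    alt_share = {
        "ref/ref": error,
        "ref/alt": het_alt_share,
        "alt/alt": 1.0 - error,
    }
    return {
        g: comb(n, alt_reads) * p**alt_reads * (1.0 - p) ** ref_reads
        for g, p in alt_share.items()
    }


def call_genotype(ref_reads, alt_reads, alt_bias=1.0):
    """Return the genotype with the highest likelihood."""
    likelihoods = genotype_likelihoods(ref_reads, alt_reads, alt_bias)
    return max(likelihoods, key=likelihoods.get)


# Hypothetical site: 20 reference reads, 2 non-reference reads.
print(call_genotype(20, 2))                # naive model, no bias
print(call_genotype(20, 2, alt_bias=0.3))  # expects alt under-coverage
```

With these toy counts the naive model calls the site homozygous reference, while the bias-aware model calls it heterozygous. Comparing calls like these against simulated data with known genotypes is one way to check that a "smarter" genotyper actually works.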

Page 38: Beyond The Bench Workshops

R finds de novo mutations for me

•  >300 million genotypes

•  How do I find de novo mutations in all that data?

R to the rescue!
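At its core, a de novo screen in family data asks, locus by locus, whether a child carries an allele that neither parent has. Here is a minimal Python sketch of that check; the trio layout and genotype encoding (a pair of allele lengths in bp per person) are assumptions for illustration, and a real screen would also need coverage and quality filters that the slide does not describe.

```python
def novel_alleles(child, mother, father):
    """Return alleles in the child's genotype absent from both parents."""
    parental = set(mother) | set(father)
    return sorted(set(child) - parental)


def find_de_novo(trio_genotypes):
    """Yield (locus, novel alleles) for loci with a candidate de novo allele.

    trio_genotypes: dict mapping locus -> (child, mother, father), each a
    tuple of two allele lengths in bp.
    """
    for locus, (child, mother, father) in trio_genotypes.items():
        novel = novel_alleles(child, mother, father)
        if novel:
            yield locus, novel


# Hypothetical loci and genotypes.
trio = {
    "locus1": ((8, 10), (8, 8), (10, 12)),  # 8 from mother, 10 from father
    "locus2": ((8, 14), (8, 10), (8, 12)),  # 14 bp seen in neither parent
}
for locus, novel in find_de_novo(trio):
    print(locus, novel)
```

This set-based test only catches alleles absent from both parents. It would miss other Mendelian inconsistencies, such as a child who appears to have inherited both alleles from the same parent.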

Page 39: Beyond The Bench Workshops

What R has done for me

Data mining

•  Finding de novo mutations

•  Quality control for my data

Data manipulation

•  Converting raw read counts to genotypes

Data simulation and modeling

•  Finding ways to improve my genotyper

Data visualization

Page 40: Beyond The Bench Workshops

R has extensive support for biologists

Bioconductor is an incredible resource for biological analyses in R

•  Microarrays

•  Differential expression (DESeq, edgeR, cummeRbund)

•  Gene models

•  Flow cytometry (flowCore, flowStats, flowViz)

•  Interacting with Ensembl, COSMIC, Gramene, etc. (biomaRt)

Page 41: Beyond The Bench Workshops

Installing R

•  R can be downloaded from r-project.org

•  R runs on PCs, Macs and Linux computers

•  The R project website has an R manual to get you started

Page 42: Beyond The Bench Workshops

Working in R

The native R interface can be hard to work with

•  Lots of windows

•  Difficult to keep things organized

Page 43: Beyond The Bench Workshops

RStudio interface

•  All your variables, help pages, script windows and consoles in one place

•  Highlights R code for easier programming

•  Tabbed windows for multiple scripts

•  History saves all previous commands; plot history saves all previous plots

•  Find it at rstudio.com

Page 44: Beyond The Bench Workshops

Learning R

Many online tutorials

•  R has its own introduction

•  Statistics Using R with Biological Examples

Take interesting data, use it to explore R

•  Plot, graph, use statistical tests

Ask someone who knows R

•  Getting started is pretty easy

•  Learn what you need when you need it

Page 45: Beyond The Bench Workshops

Thanks!!

Page 46: Beyond The Bench Workshops
Page 47: Beyond The Bench Workshops

The Bioscience Enterprise Club is dedicated to helping CSHL’s science research professionals and alumni cultivate and leverage their cross-disciplinary skill sets and expertise to transition into diverse careers.

Page 48: Beyond The Bench Workshops

Current Exchange is CSHL’s very own student-run magazine. We feature articles about science aimed at a general audience. Check out our inaugural issue at issuu.com/currentexchange. Send your articles to [email protected] by November 5, 2013.