Bioinformatics Stuart M. Brown, Ph.D. NYU School of Medicine.

What is Bioinformatics?
The use of computers to collect, analyze, and interpret biological information at the molecular level.

"The mathematical, statistical and computing methods that aim to solve biological problems using DNA and amino acid sequences and related information."

A set of software tools for molecular sequence analysis

Introduction
- The Human Genome Project
- Challenges of molecular biology computing
- The changing role of the biologist in the Age of Information
- Bioinformatics software
- Genomics
- Impact on medicine

I. The Human Genome Project
The genome sequence is complete - almost!
Approximately 3.2 billion base pairs.

All the Genes
Any human gene can now be found in the genome by similarity searching with over 99% certainty.
However, the sequence still has many gaps:
- hard to find an uninterrupted genomic segment for any gene
- still can't identify pseudogenes with certainty
This will improve as more sequence data accumulates.

Raw Genome Data:

The next step is obviously to locate all of the genes and describe their functions. This will probably take another 15-20 years!

Celera says that there are only ~34,000 genes,
so why are there ~60,000 human genes on Affymetrix GeneChips?
Why does GenBank have 49,000 human gene coding sequences and UniGene have 96,000 clusters of unique human ESTs?

Clearly we are in desperate need of a theoretical framework to go with all of this data.

Implications for Biomedicine
- Physicians will use genetic information to diagnose and treat disease.
- Virtually all medical conditions have a genetic component.
- Faster drug development research
- Individualized drugs
- Gene therapy
- All biologists will use gene sequence information in their daily work.

II. Bioinformatics Challenges
Lots of new sequences being added
- automated sequencers
- Human Genome Project
- EST sequencing

GenBank has over 16 billion bases and is doubling every year!! (problem of exponential growth...)
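To make the exponential-growth problem concrete, here is a minimal sketch that projects database size under a constant doubling time. The starting size (16 billion bases) and the one-year doubling time come from the slide. The function name and the assumption that the doubling rate stays constant are illustrative only.

```python
def projected_bases(start_bases, years, doubling_time_years=1.0):
    """Database size after `years`, assuming it doubles every `doubling_time_years`."""
    return start_bases * 2 ** (years / doubling_time_years)


if __name__ == "__main__":
    # Assumption: growth continues at exactly one doubling per year.
    for year in range(6):
        print(f"year +{year}: {projected_bases(16e9, year):.3g} bases")
```

Under this assumption, five doublings give a 32-fold increase, which is why search tools and hardware both have to scale.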

How can computers keep up?
The huge dataset

New Types of Biological Data
Microarrays - gene expression

Multi-level maps: genetic, physical, sequence, annotation

Networks of Protein-protein interactions

Cross-species relationships
- Homologous genes
- Chromosome organization

Similarity Searching the Databanks
What is similar to my sequence?

Searching gets harder as the databases get bigger - and quality degrades

Tools: BLAST and FASTA = time-saving heuristics (approximate)

Statistics + informed judgement of the biologist

>gb|BE588357.1|BE588357 194087 BARC 5BOV Bos taurus cDNA 5'.
 Length = 369

 Score = 272 bits (137), Expect = 4e-71
 Identities = 258/297 (86%), Gaps = 1/297 (0%)
 Strand = Plus / Plus

Query: 17  aggatccaacgtcgctccagctgctcttgacgactccacagataccccgaagccatggca 76
           |||||||||||||||| | ||| | ||| || ||| | ||||  ||||| |||||||||
Sbjct: 1   aggatccaacgtcgctgcggctacccttaaccact-cgcagaccccccgcagccatggcc 59

Query: 77  agcaagggcttgcaggacctgaagcaacaggtggaggggaccgcccaggaagccgtgtca 136
           |||||||||||||||||||||||| | || ||||||||| | ||||||||||| ||| ||
Sbjct: 60  agcaagggcttgcaggacctgaagaagcaagtggagggggcggcccaggaagcggtgaca 119

Query: 137 gcggccggagcggcagctcagcaagtggtggaccaggccacagaggcggggcagaaagcc 196
            |||||||| | || | ||||||||||||||| ||||||||||| || ||||||||||||
Sbjct: 120 tcggccggaacagcggttcagcaagtggtggatcaggccacagaagcagggcagaaagcc 179

Query: 197 atggaccagctggccaagaccacccaggaaaccatcgacaagactgctaaccaggcctct 256
           ||||||||| | |||||||| |||||||||||||||||| ||||||||||||||||||||
Sbjct: 180 atggaccaggttgccaagactacccaggaaaccatcgaccagactgctaaccaggcctct 239

Query: 257 gacaccttctctgggattgggaaaaaattcggcctcctgaaatgacagcagggagac 313
           || || ||||| ||  ||||||||||| | |||||||||||||||||| ||||||||
Sbjct: 240 gagactttctcgggttttgggaaaaaacttggcctcctgaaatgacagaagggagac 296
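The "Identities" figure in a BLAST hit is simply the count of aligned columns where query and subject carry the same base. A minimal sketch, using the first 60-column block of the hit above as input. The function name is hypothetical, and counting a gap column as a non-match is an assumption that mirrors how the hit reports 258/297.

```python
def percent_identity(query, subject):
    """Return (matches, columns, percent) for two aligned, equal-length strings."""
    if len(query) != len(subject):
        raise ValueError("aligned sequences must have the same length")
    # A column counts as an identity only if both rows hold the same base;
    # gap characters ('-') never count as a match.
    matches = sum(1 for q, s in zip(query, subject) if q == s and q != "-")
    columns = len(query)
    return matches, columns, 100.0 * matches / columns


# First alignment block of the Bos taurus hit (Query 17-76, Sbjct 1-59).
query = "aggatccaacgtcgctccagctgctcttgacgactccacagataccccgaagccatggca"
sbjct = "aggatccaacgtcgctgcggctacccttaaccact-cgcagaccccccgcagccatggcc"

if __name__ == "__main__":
    m, n, pct = percent_identity(query, sbjct)
    print(f"Identities = {m}/{n} ({pct:.1f}%)")  # Identities = 48/60 (80.0%)
```

Percent identity alone says nothing about significance. That is what the Expect value and the biologist's judgement are for.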

Alignment
- Alignment is the basis for finding similarity
- Pairwise alignment = dynamic programming
- Multiple alignment: protein families and functional domains
- Multiple alignment is "impossible" for lots of sequences
- Another heuristic - progressive pairwise alignment
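As an illustration of "pairwise alignment = dynamic programming", here is a minimal sketch of global alignment scoring in the style of Needleman-Wunsch. The scoring values (match +1, mismatch -1, gap -2) and the function name are assumptions chosen for the example. Real tools use substitution matrices and affine gap penalties.

```python
def global_alignment_score(a, b, match=1, mismatch=-1, gap=-2):
    """Best global alignment score of strings a and b (linear gap penalty)."""
    # prev[j] holds the best score for aligning a[:i-1] with b[:j].
    prev = [j * gap for j in range(len(b) + 1)]
    for i in range(1, len(a) + 1):
        curr = [i * gap]  # aligning a[:i] against an empty prefix of b
        for j in range(1, len(b) + 1):
            diag = prev[j - 1] + (match if a[i - 1] == b[j - 1] else mismatch)
            up = prev[j] + gap        # gap in b
            left = curr[j - 1] + gap  # gap in a
            curr.append(max(diag, up, left))
        prev = curr
    return prev[-1]


if __name__ == "__main__":
    print(global_alignment_score("ACGT", "AGT"))
```

The table has len(a) x len(b) cells, so cost grows with the product of the sequence lengths. Extending the same idea to many sequences at once is what makes exact multiple alignment impractical and motivates progressive pairwise heuristics.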

Sample Multiple Alignment

Structure-Function Relationships
Can we predict the function of protein molecules from their sequence?
sequence > structure > function

Conserved functional domains = motifs
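A short motif can be written as a pattern and scanned for with a regular expression. The sketch below uses the PROSITE N-glycosylation site pattern N-{P}-[ST]-{P} as an example. The choice of this motif, the test sequence, and the function name are illustrative assumptions, not part of the original slides.

```python
import re

# PROSITE-style pattern N-{P}-[ST]-{P} translated to a regular expression.
# The lookahead lets overlapping sites be reported as well.
N_GLYC = re.compile(r"(?=(N[^P][ST][^P]))")


def find_motif_sites(protein, pattern=N_GLYC):
    """Return (0-based start, matched residues) for every motif occurrence."""
    return [(m.start(), m.group(1)) for m in pattern.finditer(protein.upper())]


if __name__ == "__main__":
    print(find_motif_sites("MNGTAANPSA"))  # [(1, 'NGTA')]
```

A pattern match is only a candidate site. Short motifs occur by chance, so hits still need biological judgement.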

Prediction of some simple 3-D structures (α-helix, β-sheet, membrane spanning, etc.)

Protein domains (from ProDom database)

DNA Sequencing
Automated sequencers > 40 KB per day
500 bp reads must be assembled into complete genes
- errors, especially insertions and deletions
- error rate is highest at the ends, where we want to overlap the reads
- vector sequences must be removed from ends
Faster sequencing relies on better software
- overlapping deletions vs. shotgun approaches: TIGR
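The core step in assembling reads is finding where the end of one read overlaps the start of another. A minimal sketch, assuming error-free reads and exact matching. The function names and the minimum-overlap parameter are hypothetical, and real assemblers must tolerate the end-of-read errors noted above.

```python
def overlap_length(left, right, min_overlap=3):
    """Length of the longest suffix of `left` that equals a prefix of `right`."""
    min_overlap = max(min_overlap, 1)
    for length in range(min(len(left), len(right)), min_overlap - 1, -1):
        if left[-length:] == right[:length]:
            return length
    return 0


def merge_reads(left, right, min_overlap=3):
    """Join two reads on their overlap, or return None if they do not overlap."""
    n = overlap_length(left, right, min_overlap)
    return left + right[n:] if n else None


if __name__ == "__main__":
    print(merge_reads("ACGTTGCA", "TGCAGGT"))  # ACGTTGCAGGT
```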

Finding Genes in Genome Sequence is Not Easy
About 2% of human DNA encodes functional genes.

Genes are interspersed among long stretches of non-coding DNA.

Repeats, pseudogenes, and introns confound matters.

Pattern Finding Tools
It is possible to use DNA sequence patterns to predict genes:
- promoters
- translational start and stop codons (ORFs)
- intron splice sites
- codon bias
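The simplest of these patterns, an open reading frame bounded by a start and a stop codon, can be sketched in a few lines. This example scans only the three forward frames and uses the standard start codon ATG and stop codons TAA, TAG, TGA. The function name and the `min_codons` threshold are assumptions. It ignores introns, so it suits bacterial or cDNA sequence rather than raw human genomic DNA.

```python
STOP_CODONS = {"TAA", "TAG", "TGA"}


def find_orfs(dna, min_codons=2):
    """Return (start, end, frame) for ORFs on the forward strand.

    start is the 0-based index of the ATG, end is exclusive and includes
    the stop codon, and min_codons counts codons before the stop.
    """
    dna = dna.upper()
    orfs = []
    for frame in range(3):
        start = None
        for i in range(frame, len(dna) - 2, 3):
            codon = dna[i:i + 3]
            if start is None and codon == "ATG":
                start = i  # first ATG opens a candidate ORF
            elif start is not None and codon in STOP_CODONS:
                if (i - start) // 3 >= min_codons:
                    orfs.append((start, i + 3, frame))
                start = None  # look for the next ATG in this frame
    return orfs


if __name__ == "__main__":
    print(find_orfs("CCATGAAATGGTAACC"))  # [(2, 14, 2)]
```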

Can also use similarity to known genes/ESTs

Phylogenetics
Evolution = mutation of DNA (and protein) sequences

Can we define evolutionary relationships between organisms by comparing DNA sequences?
- is there one molecular clock?
- phenetic vs. cladistic approaches
- lots of methods and software, what is the "correct" analysis?
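Distance-based (phenetic) methods start from a pairwise distance between aligned sequences. The sketch below computes the observed proportion of differing sites (p-distance) and applies the Jukes-Cantor correction, d = -3/4 ln(1 - 4p/3). Choosing Jukes-Cantor is an assumption made for illustration. It is only one of many substitution models, and the function names are hypothetical.

```python
import math


def p_distance(a, b):
    """Proportion of differing sites between two aligned sequences, skipping gaps."""
    if len(a) != len(b):
        raise ValueError("aligned sequences must have the same length")
    pairs = [(x, y) for x, y in zip(a, b) if x != "-" and y != "-"]
    if not pairs:
        raise ValueError("no ungapped columns to compare")
    return sum(1 for x, y in pairs if x != y) / len(pairs)


def jukes_cantor(p):
    """Jukes-Cantor corrected distance for an observed p-distance."""
    if p >= 0.75:
        raise ValueError("correction is undefined for p >= 0.75")
    return -0.75 * math.log(1 - 4 * p / 3)


if __name__ == "__main__":
    p = p_distance("ACGTACGT", "ACGTACGA")
    print(p, jukes_cantor(p))
```

The corrected distance is always at least as large as the observed one, because it accounts for repeated substitutions at the same site.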

II. The Biologist in the Age of Information

The Internet provides a wealth of biological information
- can be overwhelming
- e-mail
- USENET
- Web

Info skill = finding the information that you need efficiently

Computing in the lab - everyday tasks (not computational biology)
- ordering supplies
- online reference books
- lab notes
- literature searching

Training "computer savvy" scientists
Know the right tool for the job

Get the job done with tools available

Network connection is the lifeline of the scientist

Jobs change, computers change, projects change, scientists need to be adaptable

The job of the biologist is changing
As more biological information becomes available:
- The biologist will spend more time using computers
- The biologist will spend more time on data analysis (and less doing lab biochemistry)
- Biology will become a more quantitative science (think how the periodic table and atomic theory affected chemistry)

III. Molecular Biology Software Tools

GCG (Wisconsin Package)
The most popular and most comprehensive set of tools for the molecular biologist.
- Runs on mainframe computers (UNIX)
- Web, X-Windows (SeqLab) interfaces
- Inexpensive for large numbers of users
- Requires local databases (on the mainframe computer)
- Allows for custom databases and programming

The Web
Many of the best tools are free over the Web
- BLAST
- ENTREZ/PUBMED
- Protein motifs databases
Bioinformatics service providers
- DoubleTwist, Celera, BioNavigator
Hodgepodge collection of other tools
- PCR primer design
- Pairwise and Multiple Alignment

Personal Computer Programs
Macintosh and Windows applications
- Commercial: Vector NTI, MacVector, OMIGA, Sequencher
- Freeware: Phylip, Fasta, Clustal, etc.
Better graphics, easier to use
Can't access very large databases or perform demanding calculations
Integration with web databases and computing services

Putting it all together
The current state of the art requires the biologist to jump around from Web to mainframe to personal computer

The trend is for integration:
- Web + personal computer will replace text interface to mainframe?
- Will the Web become the ultimate interface for all computing??

The Role of the RCR
- Provide software (site licenses), computing hardware, and databases
- Train scientists to use the software
  - Courses
  - Newsletter & e-mail updates
  - Seminars
  - One-on-one training
- Technical support (on our software!)
  - Phone, e-mail, lab/office visits
- Consulting
  - Recommendations, joint work, do it for you, custom software development

IV. Genomics

The application of high-throughput automated technologies to molecular biology.

The experimental study of complete genomes.

Genomics Technologies
- Automated DNA sequencing
- Automated annotation of sequences
- DNA microarrays
  - gene expression (measure RNA levels)
  - single nucleotide polymorphisms (SNPs)
- Protein chips (SELDI, etc.)
- Protein-protein interactions

cDNA spotted microarrays

Affymetrix Gene Chips

Microarray Data Analysis
- Clustering and pattern detection
- Data mining and visualization
- Controls and normalization of results
- Statistical validation
- Linkage between gene expression data and gene sequence/function/metabolic pathways databases
- Discovery of common sequences in co-regulated genes
- Meta-studies using data from multiple experiments
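Two of the steps listed above, normalization and the similarity measure behind clustering, can be sketched with the standard library alone. The example converts sample/control intensities to log2 ratios, median-centers them, and compares two expression profiles by Pearson correlation. Median centering and Pearson correlation are common choices but are assumptions here, as are all function names and the toy numbers.

```python
import math
from statistics import mean, median


def log2_ratios(sample, control):
    """Per-gene log2(sample / control) expression ratios."""
    return [math.log2(s / c) for s, c in zip(sample, control)]


def median_center(values):
    """Shift values so that their median is zero (a simple array normalization)."""
    m = median(values)
    return [v - m for v in values]


def pearson(x, y):
    """Pearson correlation between two expression profiles of equal length."""
    mx, my = mean(x), mean(y)
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sx = math.sqrt(sum((a - mx) ** 2 for a in x))
    sy = math.sqrt(sum((b - my) ** 2 for b in y))
    return cov / (sx * sy)


if __name__ == "__main__":
    ratios = median_center(log2_ratios([200, 50, 100], [100, 100, 100]))
    print(ratios)
    print(pearson([1, 2, 3], [2, 4, 6]))
```

Genes whose profiles correlate strongly across experiments are candidates for the same cluster. A clustering program would apply a measure like this to every pair of genes.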

Pharmacogenomics
The use of DNA sequence information to measure and predict the reaction of individuals to drugs.

Personalized drugs

Faster clinical trials
- Selected trial populations

Fewer drug side effects
- Toxicogenomics

Impact on Bioinformatics
Genomics produces high-throughput, high-quality data, and bioinformatics provides the analysis and interpretation of these massive data sets.
It is impossible to separate genomics laboratory technologies from the computational tools required for data analysis.

Genomics Software @ the RCR
- Affymetrix Gene Chip Analysis Suite
- GeneSpring
- Research Genetics Pathways (nylon filters)
- TIGR Spotfinder, ScanAlyze, Cluster

Coming soon: a shared microarray database
