Using Internet Databases to Collect Information on Bladder Cancer Riham Soliman Research assistant Bioinformatics group
  • date post

    19-Dec-2015

Page 1: Using Internet Databases to Collect Information on Bladder Cancer Riham Soliman Research assistant Bioinformatics group.

Using Internet Databases to Collect Information on Bladder Cancer

Riham Soliman

Research assistant

Bioinformatics group

Page 2

Objectives of talk

1. Outline the importance of web-based public databases in the medical field

2. Explain the necessity of having a biomedical research portal containing information collected from experiments on Egyptian samples

3. Outline our research goals

4. Explain our research and its benefits to the Egyptian community

Page 3

Introduction

Page 4

The basics

Cytoplasm

Nucleus

DNA double-helix

From: iGenetics CD-ROM (Animation Chapter 1: Genetics: An Introduction)

Page 5

Molecular genetics

Nucleotides are molecules constituting the DNA double-helix

All our traits are encoded in DNA

Genes are specific sequences of nucleotides that characterize our traits passed on from parents

Modified from: iGenetics CD-ROM (Animation Chapter 2: DNA as Genetic Material: The Hershey-Chase Experiment)

[Figure: DNA double helix with complementary base pairs (C pairs with G, A pairs with T); the genome contains 3 billion nucleotides]
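
The base-pairing rule on this slide (C pairs with G, A pairs with T) can be sketched in a few lines of Python. This is only an illustration; the function names are hypothetical and do not come from the talk.

```python
# Watson-Crick base pairing: each base pairs with exactly one complement.
COMPLEMENT = {"A": "T", "T": "A", "C": "G", "G": "C"}


def complement(strand: str) -> str:
    """Return the complementary base for every position of the strand."""
    return "".join(COMPLEMENT[base] for base in strand.upper())


def reverse_complement(strand: str) -> str:
    """Return the opposite strand, read in its own 5'-to-3' direction."""
    return complement(strand)[::-1]
```

For example, `complement("CGAT")` gives `"GCTA"`, which is the pairing shown in the figure labels.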

Page 6

Gene expression

How is DNA transformed into functional output that supports the survival of the cell and, consequently, the organism?

Central dogma: DNA → (transcription) → RNA → (translation) → protein

Gene expression analysis can be performed by studying:
• RNA level: the transcriptome
• Protein level: the proteome
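
The two steps of the central dogma can be sketched as follows. This is a minimal illustration under stated assumptions: the codon table is deliberately partial (only the codons used in the example), the input is taken to be the coding strand, and the function names are hypothetical.

```python
# Partial codon table (mRNA codon -> one-letter amino acid); illustrative only.
CODON_TABLE = {
    "AUG": "M",                          # methionine, also the start codon
    "UUU": "F", "UUC": "F",              # phenylalanine
    "GGC": "G",                          # glycine
    "AAA": "K",                          # lysine
    "UAA": "*", "UAG": "*", "UGA": "*",  # stop codons
}


def transcribe(coding_dna: str) -> str:
    """Transcription: the mRNA equals the coding strand with T replaced by U."""
    return coding_dna.upper().replace("T", "U")


def translate(mrna: str) -> str:
    """Translation: read codons three bases at a time until a stop codon."""
    protein = []
    for i in range(0, len(mrna) - 2, 3):
        amino_acid = CODON_TABLE.get(mrna[i:i + 3], "?")  # "?" = codon not in table
        if amino_acid == "*":
            break
        protein.append(amino_acid)
    return "".join(protein)
```

Calling `translate(transcribe("ATGTTTGGCAAATAA"))` walks through both steps on a short made-up sequence.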

Page 7

Genetic mutations

Changes in the genetic sequence

Required for genetic diversity among individuals

Disease-causing mutations
• Deletions
• Insertions
• Duplications

http://www.genome.gov//Pages/Hyperion/DIR/VIP/Glossary/Illustration/mutation.cfm
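
The three mutation types listed on this slide can be illustrated on a short sequence string. This is a toy sketch; the helper names and the example sequence are hypothetical.

```python
def delete(seq: str, start: int, length: int) -> str:
    """Deletion: remove `length` bases beginning at index `start`."""
    return seq[:start] + seq[start + length:]


def insert(seq: str, position: int, bases: str) -> str:
    """Insertion: add extra bases before index `position`."""
    return seq[:position] + bases + seq[position:]


def duplicate(seq: str, start: int, length: int) -> str:
    """Duplication: repeat a segment immediately after itself."""
    end = start + length
    return seq[:end] + seq[start:end] + seq[end:]
```

With the made-up sequence `"ATGCCGTA"`, `delete(seq, 3, 2)` removes `CC`, `insert(seq, 3, "TT")` adds two bases, and `duplicate(seq, 3, 2)` repeats `CC`.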

Page 8

What is cancer?

Normally, cells grow and divide until the organism has completed development

Some cells retain the ability to grow and divide long after development has ended: carcinogenesis

Uncontrolled cell division arises

The cell prioritizes making more copies of itself over undergoing properly regulated division

Page 9

Cancer-causing mutations

Tumor suppressor genes (TSG)
• Mutations might cause under-expression of TSGs

Proto-oncogenes
• Mutations cause them to become over-expressed
• They become oncogenic (cancer-causing)

Carcinogenesis is a multi-step process
• A single mutation is not enough
• Accumulation of more than one mutation is necessary

Page 10

Mutagenesis: multi-step

http://www.cancervic.org.au/about-cancer/what_is_cancer

Page 11

Bioinformatics: a history

An interdisciplinary field combining medicine, biology, computer science and mathematics
• Serves the biological and medical community
• Based on computational power

Dates back to the 1960s, building on:
• Discovery of the DNA double helix
• Discovery of genes, which contain the information guiding the building of all cellular components

Human Genome Project
• Completed in 2003
• Sequencing of the entire human genome

Today
• Challenge of amalgamating large amounts of data from biomedical research: genetic research and molecular research

Page 12

Databases and information stored within them

Page 13

Why are databases necessary?

Data provided is tailored to the scientist’s requirements

Offers a variety of information on genes, RNA, proteins, diagrams, images, etc.

Databases foster collaborations between scientists
• Improved research
• Data sharing
• Interoperability

Ease of access to stored data

Takes into account that molecular scientists might not be computer proficient

Page 14

Information provided on databases

Literature
• NCBI (National Center for Biotechnology Information)
• General databases: Google search, Google Scholar
• Academic databases: EBSCOhost

Sequence data

Protein
• Sequence
• 3D structure

Level of expression
• Different experimental conditions comparable to the physiological environment
• Time-course experimentation

Protein-protein and protein-DNA interactions

Page 15

KEGG: Kyoto Encyclopedia of Genes and Genomes

Cytoplasm

Nucleus

Nuclear membrane

Page 16

KEGG: bladder cancer

Page 17

MAPK pathway from KEGG

Page 18

WikiPathways

Page 19

MAPK pathway on WikiPathways: downloaded using GenMAPP

GenMAPP is an open-source bioinformatics application to visualize metabolic pathways

Page 20

BioCarta: MAPK pathway

Page 21

Data extraction from NCBI

National Center for Biotechnology Information.

Run and maintained by collaborative efforts of computer scientists, molecular biologists, biochemists, research physicians and structural biologists.

Provides information on diseases, genes, gene sequences, gene transcripts, proteins, protein interactions, function, additional resources.

Page 22

Types of services offered by NCBI

PubMed
• Literature search service of the National Library of Medicine
• Access to over 16 million citations linked to participating online journals
• Fast, efficient, easy to use

BLAST (Basic Local Alignment Search Tool)
• Most famous tool on NCBI
• Used for pair-wise sequence comparison
• Identification of novel sequences and/or determining their properties

Entrez
• One of the most popular search engines in NCBI
• Search query can be the name of a gene or protein (if different), or the accession number for the gene, RNA or protein
• Produces a plethora of relevant information

OMIM (Online Mendelian Inheritance in Man)
• Used mostly by physicians and medical investigators interested in genetic disorders
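
The talk describes searching NCBI through its web pages. As a hedged sketch of how the same kind of search could be scripted, the snippet below builds a query URL for NCBI's E-utilities `esearch` endpoint, which is NCBI's programmatic interface and is not mentioned in the talk. It only constructs the URL and performs no network request; the function name and default values are assumptions.

```python
from urllib.parse import urlencode

# Base URL of the NCBI E-utilities search endpoint.
EUTILS_ESEARCH = "https://eutils.ncbi.nlm.nih.gov/entrez/eutils/esearch.fcgi"


def build_esearch_url(db: str, term: str, retmax: int = 20) -> str:
    """Build an esearch URL for a database (e.g. "pubmed") and a search term."""
    params = {"db": db, "term": term, "retmax": retmax, "retmode": "json"}
    return f"{EUTILS_ESEARCH}?{urlencode(params)}"


# Example: a PubMed literature search for bladder cancer.
url = build_esearch_url("pubmed", "bladder cancer")
```

The resulting URL can be opened in a browser or fetched with any HTTP client; NCBI's usage guidelines for request rates apply.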

Page 23

Cancer-specific databases

Page 24

caBIG

Cancer Biomedical Informatics Grid

An information network connecting the cancer research community

Provided by the National Cancer Institute (NCI) in the USA

Integrative cancer research extending from bench to bedside and back again

Aims to accelerate the discovery of new detection, diagnostic and treatment techniques to improve outcomes

Shares information on clinical research, imaging, pathology and molecular biology

Page 25

caBIG services and resources

Domain workspaces constitute areas of interest to the cancer-researching and medical community

1. Integrative cancer research (ICR) workspaces

2. Clinical trial management systems

3. In vivo imaging workspace

4. Tissue banks and pathology tools workspace

caBIG tools

1. Bioconductor: established open-source collection of software packages for high-throughput genome analysis

2. caArray: open-source, web and programmatically accessible array data management system

3. caIMAGE: database of cancer images

4. caMATCH: system that identifies patients who are potentially eligible for clinical trials

Page 26

Profiling of bladder cancer data from public databases

Page 27

Objectives of research

1. Collecting information on genes involved in bladder cancer.

2. Assembling an interaction network for these genes.

3. Identifying biomarkers

4. Collecting expression level data, e.g., microarray data.

5. Automatic management, processing, visualization of this data.

Page 28

[Figure: bar chart of bladder cancer incidence by world region, with Egypt shown separately. Axis: rate per 100,000 population (0 to 40). Series: males, females. Regions shown: Melanesia, Middle Africa, Micronesia, India, China, Western Africa, South-Central Asia, South-Eastern Asia, Eastern Africa, Eastern Asia, Polynesia, Central America, Caribbean, Japan, South America, Southern Africa, Western Asia, Central & Eastern, Australia/New Zealand, Northern Europe, Northern Africa, Western Europe, Northern America, Southern Europe, Egypt.]

Figure 1.3: Age-standardised (World) incidence rates for bladder cancer, by sex, world regions, 2002 estimates

Source: http://info.cancerresearchuk.org/cancerstats/types/bladder/incidence/

Page 29

Bladder cancer stages

From: Oxford, G. and Theodorescu, D. Review Article: The Role of Ras Superfamily Proteins in Bladder Cancer Progression. The Journal of Urology, 170: 1987-1993, 2003.

Page 30

Bladder cancer types

[Figure labels:]
• Carcinoma in situ
• Transitional cell carcinoma
• Metastatic transitional cell carcinoma
• Squamous cell carcinoma
• Invasive (high grade)
• Superficial (low grade)

From: http://cornellurology.com/bladder/gi/types.shtml

Page 31

Aetiology of bladder cancer in Egypt

Cigarette smoking (3-7 fold risk) (Samanic et al. 2006)

Aromatic amines
• Occupational hazard

Schistosomiasis (Michaud, 2007)
• Bathing in infested waters
• Working in fields
• SCC was more common than TCC during times of high schistosomiasis prevalence

Page 32

Genes involved in bladder cancer

To identify genes involved in bladder carcinogenesis and progression, we searched the internet for information about these genes.

Sources

Publicly available databases, e.g.
• NCBI: www.ncbi.nlm.nih.gov/
• KEGG: http://www.genome.jp/
• BioGRID: http://www.thebiogrid.org/
• GeneOntology: http://amigo.geneontology.org/
• Ensembl: www.ensembl.org/

Literature search using PubMed (NCBI) and Google.

Page 33

Data collection

Genes were collected using Boolean queries, e.g., “Bladder cancer, name of gene”.

We identified 261 genes related to bladder cancer

Data was summarized in a list containing gene information and interacting genes (data annotation):
• Gene name, NCBI accession number, URLs
• Chromosome locus
• Protein-protein interactions
• Function in normal cell
• Function in bladder cancer cell
• Diagnostic/prognostic potential or use
• Literature
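
The annotation list on this slide maps naturally onto a small record type, and the Boolean queries onto a string template. The sketch below is an assumption about how such data could be organized, not the structure used in the study; the class name, field names and the `AND` query syntax are hypothetical.

```python
from dataclasses import dataclass, field


@dataclass
class GeneRecord:
    """One annotated gene, with one field per item in the annotation list."""
    name: str
    accession: str = ""                               # NCBI accession number
    urls: list = field(default_factory=list)
    chromosome_locus: str = ""
    interactions: list = field(default_factory=list)  # protein-protein interactions
    function_normal: str = ""                         # function in normal cell
    function_cancer: str = ""                         # function in bladder cancer cell
    marker_potential: str = ""                        # diagnostic/prognostic potential
    literature: list = field(default_factory=list)


def boolean_query(disease: str, gene: str) -> str:
    """Combine a disease phrase and a gene name into one Boolean query string."""
    return f'"{disease}" AND {gene}'


# Example with KRT16 (keratin 16), a gene discussed later in the talk.
record = GeneRecord(name="KRT16")
query = boolean_query("bladder cancer", record.name)
```

Keeping every gene in the same record shape makes it easier to spot missing annotations and to export the list later.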

Page 34

Biomarker identification

The main target in cancer research is to predict tumor behavior.
• Early diagnosis
• Prevent delayed treatment situations

We need to distinguish harmless early lesions from those that will progress into cancer.

This depends on good tests and tools.

Current diagnosis of bladder cancer: cystoscopy.

The research community is developing good biomarkers for this purpose.
• Biomarkers are molecules that could also be targeted in therapy.

Page 35

Biomarkers in use

Marker | Sensitivity % | Specificity % | Method of detection | Manufacturer
NMP22 | 47-87 | 58-91 | Enzyme immunoanalysis | Matritech
BTA STAT | 57-82 | 61-82 | Antigen-antibody colorimetric | Bard Diagnostics
BTA TRAK | 55-80 | 38-98 | Enzyme immunoanalysis | Bard Diagnostics
FDP | 41-93 | 77-94 | Antigen-antibody colorimetric | Intracel Corp
Telomerase | 53-91 | 46-99 | Polymerase chain reaction | Oncor
Immunocyt | 86-95 | 79-90 | Immunofluorescence immunoassay/cytology | Diagnocure
Quanticyt | 45-59 | 70-93 | Morphometry | Gentian Scientific Software
UBC | 59-79 | 84-96 | Enzyme immunoanalysis | IDL
CYFRA 21-1 | 74-99 | 57-78 | Electrochemiluminescence assay | Roche Diagnostics
BLCA4 | 85-96 | 85-100 | Enzyme immunoanalysis | Eichrom Technologies
Hyaluronic acid/hyaluronidase | 82-92 | 83-96 | Enzyme immunoanalysis |
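
For readers unfamiliar with the two columns in the table, sensitivity and specificity are computed from the counts of correct and incorrect test results. The sketch below uses the standard definitions; the function names are hypothetical and the counts in the example are made up, not taken from any study in the table.

```python
def sensitivity(true_pos: int, false_neg: int) -> float:
    """Fraction of patients with the disease whom the test detects."""
    return true_pos / (true_pos + false_neg)


def specificity(true_neg: int, false_pos: int) -> float:
    """Fraction of people without the disease whom the test clears."""
    return true_neg / (true_neg + false_pos)


# Hypothetical counts: 100 patients with cancer, 100 without.
sens = sensitivity(true_pos=80, false_neg=20) * 100   # percent
spec = specificity(true_neg=90, false_pos=10) * 100   # percent
```

A marker with high sensitivity but low specificity misses few cancers yet raises many false alarms, which is why the table reports both.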

Page 36

Markers inserted onto KEGG’s bladder cancer network

[Figure labels: NMP22, CD44, hyaluronic acid, hyaluronidase, GPSM2]

Page 37

Microarray technology

Measuring gene expression

Page 38

Gene expression analysis: Transcriptomics

Microarray technology: the study of mRNA levels in cells
• Transcriptome

Looks at the abundance of the transcript for thousands of genes
• High throughput

Two platforms
• cDNA microarray: custom-made
• Oligonucleotide: ready-made

Revolutionized by the Affymetrix company

[Figure: Affymetrix array] http://en.wikipedia.org/wiki/DNA_microarray

Page 39

Differential expression

[Figure: microarray chips comparing control and cancer samples; labels: up regulation, down regulation]

From: http://www.fastol.com/~renkwitz/microarray_chips.htm
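
Differential expression between a control and a cancer sample is commonly summarized as a log2 fold change. The sketch below assumes that convention and a cut-off of one log2 unit (a two-fold change); the threshold, the function names and the intensity values are illustrative assumptions, not values from the talk.

```python
import math


def log2_fold_change(cancer: float, control: float) -> float:
    """log2 of the cancer/control intensity ratio (positive = higher in cancer)."""
    return math.log2(cancer / control)


def classify(cancer: float, control: float, threshold: float = 1.0) -> str:
    """Label a gene as up regulated, down regulated or unchanged."""
    change = log2_fold_change(cancer, control)
    if change >= threshold:
        return "up"
    if change <= -threshold:
        return "down"
    return "unchanged"
```

With made-up intensities, `classify(400, 100)` reports up regulation and `classify(100, 400)` reports down regulation.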

Page 40

Output of microarray

Raw image is usually a 16-bit TIFF file.

Microarray image processor converts color intensities into raw quantitative data (probe-level data)

No immediate observations can be made concerning gene expression from raw data

Statistical analysis applications are used to interrogate the data for information on gene expression patterns

Page 41

Raw data storage

Modes of data storage

As files
• Data is stored directly on the institution’s or lab’s computer
• Does not require special software
• Difficult to track and query the data if larger experiments are performed

In local databases
• Commercial or academic
• Allows local storage of data
• Good tracking and management of experimental data and integration with public microarray (MA) databases
• Requires purchase, installation and maintenance of complex software
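
To illustrate why a local database makes experimental data easier to query than loose files, the sketch below stores probe-level intensities in SQLite using Python's standard `sqlite3` module. SQLite stands in here for the commercial or academic systems the slide refers to; the table layout, column names and values are hypothetical.

```python
import sqlite3


def build_store(rows):
    """Create an in-memory database holding (experiment, probe_id, intensity) rows."""
    conn = sqlite3.connect(":memory:")
    conn.execute(
        "CREATE TABLE probe_data (experiment TEXT, probe_id TEXT, intensity REAL)"
    )
    conn.executemany("INSERT INTO probe_data VALUES (?, ?, ?)", rows)
    conn.commit()
    return conn


def mean_intensity(conn, probe_id):
    """Average intensity of one probe across all stored experiments."""
    row = conn.execute(
        "SELECT AVG(intensity) FROM probe_data WHERE probe_id = ?", (probe_id,)
    ).fetchone()
    return row[0]


# Made-up example data: two experiments, two probes.
store = build_store([("exp1", "p1", 100.0), ("exp2", "p1", 300.0), ("exp1", "p2", 50.0)])
```

A single query such as `mean_intensity(store, "p1")` then replaces opening and parsing every file by hand.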

Page 42

Public and commercial microarray databases

PUBLIC
• GEO (Gene Expression Omnibus), NCBI
• ArrayExpress (EBI-EMBL)
• caBIG
• SMD (Stanford Microarray Database)
• Yale Microarray Database
• RED (Rice Expression Database)
• Oncomine

COMMERCIAL
• Oncomine
• Array Informatics
• Limas
• GeNet (Russian website)

OTHER
• CleanEx (SIB)
• GenMAPP

Page 43

Our bladder cancer microarray data collection

Queried “Bladder cancer” in all public databases identified

Collected 14 data sets on bladder cancer from:
• ArrayExpress
• GEO
• Oncomine

Based on the literature, there are also unpublished data sets

Page 44

Gender

Disease state

Disease staging

Page 45

Precomputational analyses

Some databases provide results from preliminary analyses of the data.

This makes data exploration much easier and quicker for the user.

Oncomine

Page 46

ONCOMINE™ RESEARCH

ONCOMINE performs pre-computations on data to make data exploration much easier and quicker

Oncomine is made up of 3 layers
• Data input
• Data analysis
• Data visualization

Page 47

Single and multiple experiment analyses

[Figure: box plots for single-experiment analysis and multiple-experiment analysis; labels: largest value, upper quartile, median, lower quartile, smallest value, outliers]
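
The quantities labelled on the box plots can be computed with Python's standard `statistics` module. This is a sketch under stated assumptions: outliers are flagged with the common 1.5 × IQR rule, which may differ from what Oncomine uses, and the function name and sample values are made up.

```python
import statistics


def box_plot_summary(values):
    """Return the numbers a box plot displays for one sample of expression values."""
    data = sorted(values)
    q1, median, q3 = statistics.quantiles(data, n=4, method="inclusive")
    iqr = q3 - q1
    low_fence, high_fence = q1 - 1.5 * iqr, q3 + 1.5 * iqr
    inside = [v for v in data if low_fence <= v <= high_fence]
    return {
        "smallest": inside[0],        # lower whisker end
        "lower_quartile": q1,
        "median": median,
        "upper_quartile": q3,
        "largest": inside[-1],        # upper whisker end
        "outliers": [v for v in data if v < low_fence or v > high_fence],
    }


# Made-up expression values with one extreme measurement.
summary = box_plot_summary([1, 2, 3, 4, 5, 6, 7, 8, 9, 50])
```

In this example the value 50 falls outside the fences and is reported as an outlier, while the whiskers end at 1 and 9.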

Page 48

SIB (Swiss Institute of Bioinformatics)

Research groups based in different European countries.

The main goal is to provide a bioinformatics platform that brings together and analyzes different data sets

CleanEx microarray database
• Data is analyzed and integrated into their portal for easier access and interpretation

Page 49

CleanEx
• Provided through the Swiss Institute of Bioinformatics (SIB)
• Service similar to ONCOMINE but gathers data sets only from GEO
• Does not allow profile visualization

Page 50

Collecting information on bladder cancer in Egypt specifically

Page 51

Published article on bladder cancer in Egypt

Ewis et al. (2007) studied bilharzia-associated SCC (squamous cell carcinoma)

Analysis performed using microarrays
• 17 patients diagnosed at the Egyptian National Cancer Institute

RESULT
• Showed a change in expression (differential expression) in 82 genes
• 38 genes up regulated
• 44 genes down regulated

Page 52

Our own data analysis on Ewis et al. data

1. Annotated information gathered on each of 82 genes

2. Compared expression pattern for each gene with other data sets from public, free databases

Page 53

3. Identified 7 genes from the Ewis study whose expression was opposite to that in all other data sets collected

4. Identified 3 genes from the Ewis study whose expression agreed with other studies from the databases
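
The comparison in steps 2 to 4 can be sketched as a check of expression direction per gene across data sets. This is an assumed, simplified procedure (each gene is labelled only "up" or "down" in each data set), not the exact analysis performed in the study; the function name and the gene names in the example are placeholders.

```python
def compare_direction(reference, others):
    """Split reference genes into those opposite to, or agreeing with, all other data sets.

    reference: dict mapping gene -> "up" or "down" (e.g. one study's results)
    others:    list of dicts with the same shape, one per public data set
    """
    opposite, concordant = [], []
    for gene, direction in reference.items():
        seen = [dataset[gene] for dataset in others if gene in dataset]
        if not seen:
            continue  # gene not measured elsewhere; nothing to compare
        if all(d != direction for d in seen):
            opposite.append(gene)
        elif all(d == direction for d in seen):
            concordant.append(gene)
    return opposite, concordant


# Placeholder example with three hypothetical genes and two public data sets.
study = {"GENE_A": "up", "GENE_B": "down", "GENE_C": "up"}
public = [
    {"GENE_A": "down", "GENE_B": "down", "GENE_C": "up"},
    {"GENE_A": "down", "GENE_B": "down", "GENE_C": "down"},
]
opposite, concordant = compare_direction(study, public)
```

Genes with mixed results across the public data sets, like `GENE_C` here, fall into neither group.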

Page 54

5. Gathered more detailed information on the 7 genes
• Where do they lie in our KEGG pathway network?
• How vital are they to cell function?
• Do the Ewis data make sense (based on the known function)?

Discrepancies found in results
• Keratin 16

Page 55

KEGG BC pathway with all significant markers for research

[Figure labels: KRT16, KRT7, TGFBR, TGFβ, SMAD2/3, SMAD4, ACVR1B, JNK]

Not much data provided on the remaining proteins

WE NEED TO UNDERSTAND THEIR FUNCTION

Modified from the KEGG database

Page 56

CONCLUSION

Page 57

Follow up of Ewis et al. study

PROS
• Offers good preliminary information on bilharzia-associated bladder cancer in the Egyptian population

CONS
• Several mistakes detected in annotation
• Pooled samples
• Only SCC studied
• Does not explain the present discrepancies in the results, e.g. Keratin 16

FOLLOW UP STUDY IS NECESSARY TO UNDERSTAND DISCREPANCIES AND GENETIC DIFFERENCES BETWEEN WESTERN AND EGYPTIAN PATIENTS

Page 58

Problems with data collection

1. Information in databases is expanding as more research is carried out.

2. No single public database has a complete representation of all molecules.
• It is time-consuming to look through several databases.

3. There is no bladder cancer-specific database.

4. Automated methods are needed to update the data.

Page 59

Long-term objectives of our study

1. Determine the genetic and molecular profile of Egyptian bladder cancer patients
• Based on histology
• Based on the bilharzial status

2. Identify biomarkers to use as drug targets in a clinical setting

3. Improve treatment modalities
• Tailored to the Egyptian profile

Page 60

Thank you