Please have a seat. Our program will commence shortly.

  • date post

    21-Dec-2015
  • Category

    Documents



Biomarker Automated Retrieval Tool

Ronny Chan, Kim Ngo

Earth Science Data Systems Dept.

Bioinformatics Relationship

Science produces massive amounts of data.

Data needs to be analyzed, stored, and retrieved.

This is data mining.

We want to apply computer science to improve this process

Motivation

Problems with conventional data mining:

Time consuming

Accuracy not defined (subjective)

No objective tool for retrieving scientific information

Where are the Biomarkers?

Cancer Biomarkers

An indicator of cancerous growth.

Proposed Solution

Create a program that allows people to quickly scan literature for the most relevant keywords/biomarkers

B.A.R.T.

Example biomarkers: HER-2, HPEBP4, EP-CAM, ERBB2, BAG-1

Significance

What is the need for the project?

More efficient research

Saves time

[Figure: conventional search vs. search enhanced with B.A.R.T.]

Goals

Make biomarker/keyword searches more efficient

Learn Java

Learn SQL

Approach

Write a program that:

reads in articles

uses part of the Vector Space Model algorithm to rank terms

outputs the relevant terms in statistical rankings

[Figure: "they" vs. "BRCA1"]
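As a rough illustration of the approach above, the sketch below reads a handful of article texts, weights every term by tf × idf, and lists the terms from highest to lowest score. It is a minimal sketch under stated assumptions, not the B.A.R.T. code: the class name TermRanker, the tokenizer (lower-casing and splitting on anything other than letters, digits, and hyphens so that names like HER-2 survive), and the rule of summing a term's weights across articles are all hypothetical choices; the slides do not say how per-article weights are combined.

```java
import java.util.*;
import java.util.stream.Collectors;

// Hypothetical sketch: rank terms across articles by summed tf * idf.
// Tokenization and the summing rule are assumptions, not the B.A.R.T. design.
final class TermRanker {

    // Lower-case and split on anything that is not a letter, digit, or hyphen,
    // so gene names such as "HER-2" stay in one piece.
    static List<String> tokenize(String text) {
        return Arrays.stream(text.toLowerCase(Locale.ROOT).split("[^a-z0-9-]+"))
                .filter(s -> !s.isEmpty())
                .collect(Collectors.toList());
    }

    // Score each term as the sum over articles of tf(t, article) * log10(n / DF(t)).
    static Map<String, Double> scoreTerms(List<String> articles) {
        int n = articles.size();
        List<Map<String, Integer>> tfs = new ArrayList<>();
        Map<String, Integer> df = new HashMap<>();
        for (String article : articles) {
            Map<String, Integer> tf = new HashMap<>();
            for (String tok : tokenize(article)) {
                tf.merge(tok, 1, Integer::sum);
            }
            for (String term : tf.keySet()) {
                df.merge(term, 1, Integer::sum); // count each article once per term
            }
            tfs.add(tf);
        }
        Map<String, Double> scores = new HashMap<>();
        for (Map<String, Integer> tf : tfs) {
            for (Map.Entry<String, Integer> e : tf.entrySet()) {
                double idf = Math.log10((double) n / df.get(e.getKey()));
                scores.merge(e.getKey(), e.getValue() * idf, Double::sum);
            }
        }
        return scores;
    }

    // Terms sorted by descending score; ties broken alphabetically.
    static List<String> rank(List<String> articles) {
        Map<String, Double> scores = scoreTerms(articles);
        List<String> terms = new ArrayList<>(scores.keySet());
        terms.sort((a, b) -> {
            int byScore = Double.compare(scores.get(b), scores.get(a));
            return byScore != 0 ? byScore : a.compareTo(b);
        });
        return terms;
    }

    public static void main(String[] args) {
        List<String> articles = List.of(
                "HER-2 is overexpressed in the tumor.",
                "The tumor showed ERBB2 and HER-2 amplification.",
                "The study measured BAG-1 in the tumor.");
        System.out.println(rank(articles));
    }
}
```

With only three short texts, every word that appears in just one article gets the maximum IDF, so function words such as "and" tie with ERBB2; a realistic run would need far more articles or a stop-word list.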

Vector Space Model

An information retrieval model introduced by Gerard Salton in the 1960s

Used widely in search engines
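In the Vector Space Model, a document and a query are each represented as a vector of term weights, and relevance is commonly measured by the cosine of the angle between the two vectors. The sketch below shows just that comparison. The class name CosineSimilarity and the plain double[] representation (one slot per term, in a fixed vocabulary order shared by all vectors) are illustrative assumptions, and the slides do not say whether B.A.R.T. uses cosine similarity.

```java
// Minimal sketch of the VSM comparison step: cosine similarity between
// two term-weight vectors that share the same vocabulary order.
final class CosineSimilarity {

    static double dot(double[] a, double[] b) {
        if (a.length != b.length) {
            throw new IllegalArgumentException("vectors must have the same length");
        }
        double sum = 0.0;
        for (int i = 0; i < a.length; i++) {
            sum += a[i] * b[i];
        }
        return sum;
    }

    // cos(a, b) = (a . b) / (|a| |b|); an all-zero vector is treated as similarity 0.
    static double cosine(double[] a, double[] b) {
        double norms = Math.sqrt(dot(a, a)) * Math.sqrt(dot(b, b));
        return norms == 0.0 ? 0.0 : dot(a, b) / norms;
    }

    public static void main(String[] args) {
        double[] query = {1, 1, 0};
        double[] doc = {1, 0, 1};
        System.out.println(cosine(query, doc));
    }
}
```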

Algorithm for B.A.R.T.

Keywords Input

PubMed Query Agent

Data Store

Data Retrieval and Output

Content Analyzer

Keyword Parser

Content Ranker

Output terms: DCIS, CU-TP3982, ERBB2, HER-2, HPEBP4, BAG-1, EP-CAM, 99M
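One way the components named on the algorithm slide could be wired together is sketched below. The interface names, the data-flow order (keywords go to the PubMed query agent, the results are stored, then the content analyzer parses and ranks them for output), and the placeholder ranker are assumptions for illustration only. The query agent is a stub that returns canned text rather than calling PubMed, and an in-memory list stands in for the SQL data store.

```java
import java.util.*;

// Hypothetical wiring of the B.A.R.T. components; all names are illustrative.
final class BartPipeline {

    interface QueryAgent { List<String> fetch(List<String> keywords); }      // PubMed Query Agent
    interface DataStore { void save(List<String> docs); List<String> all(); } // Data Store
    interface KeywordParser { List<String> parse(String doc); }               // Keyword Parser
    interface ContentRanker { List<String> rank(List<List<String>> docs); }   // Content Ranker

    // In-memory stand-in for the SQL data store.
    static final class InMemoryStore implements DataStore {
        private final List<String> docs = new ArrayList<>();
        public void save(List<String> d) { docs.addAll(d); }
        public List<String> all() { return List.copyOf(docs); }
    }

    // Content Analyzer: parse every stored document, then hand the terms to the ranker.
    static List<String> analyze(DataStore store, KeywordParser parser, ContentRanker ranker) {
        List<List<String>> parsed = new ArrayList<>();
        for (String doc : store.all()) {
            parsed.add(parser.parse(doc));
        }
        return ranker.rank(parsed);
    }

    // Keywords Input -> Query Agent -> Data Store -> Content Analyzer -> Output.
    static List<String> run(List<String> keywords, QueryAgent agent, DataStore store,
                            KeywordParser parser, ContentRanker ranker) {
        store.save(agent.fetch(keywords));
        return analyze(store, parser, ranker);
    }

    public static void main(String[] args) {
        QueryAgent stubAgent = kw -> List.of("HER-2 in breast tumors", "ERBB2 and HER-2");
        KeywordParser parser = doc -> Arrays.asList(doc.split("\\s+"));
        // Placeholder ranker: order terms by how many documents contain them.
        ContentRanker ranker = docs -> {
            Map<String, Integer> df = new TreeMap<>();
            for (List<String> d : docs) {
                for (String t : new HashSet<>(d)) {
                    df.merge(t, 1, Integer::sum);
                }
            }
            List<String> terms = new ArrayList<>(df.keySet());
            terms.sort((a, b) -> Integer.compare(df.get(b), df.get(a)));
            return terms;
        };
        System.out.println(run(List.of("breast cancer"), stubAgent, new InMemoryStore(), parser, ranker));
    }
}
```

Keeping each stage behind its own interface means the stub agent and in-memory store can later be swapped for real PubMed and SQL implementations without touching the analyzer.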

Results

Lessons & Difficulties

Deciding on an algorithm: ease of implementation vs. effectiveness

Limited knowledge of and experience with Java and SQL

Initial implementation is slow

5 ARTICLES = 160 sec

UPDATE: AUGUST 18, 2004

100 ARTICLES = 8^19 years

20 ARTICLES = 1904 sec

100 ARTICLES = 8^38 years

Future work

Apply different term-weighting functions to make the results more robust

Optimize the program for speed
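The future-work item about term-weight functions can be made concrete with a few standard variants of the tf factor from the information-retrieval literature. Which of these, if any, B.A.R.T. would adopt is not stated; the class TermWeights and its method names are hypothetical, and each variant is simply multiplied by the same IDF = log10(n / DF) used in the VSM example.

```java
// Hypothetical sketch of alternative tf factors for the tf * idf weight.
final class TermWeights {

    // Raw count: the weighting used in the worked VSM example.
    static double raw(int tf) { return tf; }

    // Binary: only presence or absence matters.
    static double binary(int tf) { return tf > 0 ? 1.0 : 0.0; }

    // Log-scaled (sublinear): damps the effect of a term repeated many times.
    static double logScaled(int tf) { return tf > 0 ? 1.0 + Math.log10(tf) : 0.0; }

    // Augmented: normalizes by the most frequent term in the same document.
    static double augmented(int tf, int maxTfInDoc) {
        if (tf == 0) return 0.0;
        if (maxTfInDoc < tf) throw new IllegalArgumentException("maxTfInDoc must be >= tf");
        return 0.5 + 0.5 * tf / maxTfInDoc;
    }

    // Full weight: chosen tf factor times IDF = log10(n / df).
    static double weight(double tfWeight, int n, int df) {
        return tfWeight * Math.log10((double) n / df);
    }

    public static void main(String[] args) {
        int tf = 2, n = 3, df = 1; // "breast" in D2 of the VSM example
        System.out.printf("raw=%.3f log=%.3f%n",
                weight(raw(tf), n, df), weight(logScaled(tf), n, df));
    }
}
```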


Acknowledgements

Earth Science Data System, JPL: Tina Xiao, Paul Ramirez, Chris Mattmann, Roshanak Roshandel, Sean Hardman

All SoCalBSI colleagues

National Institutes of Health (NIH)

National Science Foundation (NSF)

Southern California Bioinformatics Summer Institute (SoCalBSI)

SoCalBSI Professors

Jacqueline Heras

Q: malignant breast cancer

D1: detection of malignant level in the cell

D2: sighting of breast stage in the breast cancer

D3: detection of malignant stage in the cancer

Each entry gives the term frequency (tf), with the term's IDF in parentheses.

doc  the   stage    level    sighting  cell     malignant  in    of    breast   detection  cancer
D1   1(0)  0        1(.477)  0         1(.477)  1(.176)    1(0)  1(0)  0        1(.176)    0
D2   1(0)  1(.176)  0        1(.477)   0        0          1(0)  1(0)  2(.477)  0          1(.176)
D3   1(0)  1(.176)  0        0         0        1(.176)    1(0)  1(0)  0        1(.176)    1(.176)
Q    0     0        0        0         0        1(.176)    0     0     1(.477)  0          1(.176)

VSM Example

ID  TERM       DF  IDF
1   the        3   0
2   stage      2   .176
3   level      1   .477
4   sighting   1   .477
5   cell       1   .477
6   malignant  2   .176
7   in         3   0
8   of         3   0
9   breast     1   .477
10  detection  2   .176
11  cancer     2   .176

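$$\mathrm{IDF}(t) = \log_{10}\left(\frac{n}{\mathrm{DF}(t)}\right)$$

where $n = 3$ is the number of documents; for example, $\log_{10}(3/2) = .176$ and $\log_{10}(3/1) = .477$.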
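To check the DF and IDF columns of the VSM example, the short program below recomputes them from the three example documents with IDF(t) = log10(n / DF(t)) and n = 3. Lower-casing and splitting on whitespace are assumptions that happen to reproduce the example's eleven terms, and the class name VsmIdfExample is hypothetical. Note that "malignant" occurs in both D1 and D3, so its document frequency is 2.

```java
import java.util.*;

// Recomputes DF(t) and IDF(t) = log10(n / DF(t)) for the VSM example documents.
final class VsmIdfExample {

    static final List<String> DOCS = List.of(
            "detection of malignant level in the cell",
            "sighting of breast stage in the breast cancer",
            "detection of malignant stage in the cancer");

    // DF(t): number of documents that contain term t at least once.
    static Map<String, Integer> documentFrequencies(List<String> docs) {
        Map<String, Integer> df = new LinkedHashMap<>();
        for (String doc : docs) {
            // A set, so a term repeated within one document is counted once.
            Set<String> seen = new LinkedHashSet<>(
                    Arrays.asList(doc.toLowerCase(Locale.ROOT).split("\\s+")));
            for (String term : seen) {
                df.merge(term, 1, Integer::sum);
            }
        }
        return df;
    }

    static double idf(int n, int df) {
        return Math.log10((double) n / df);
    }

    public static void main(String[] args) {
        Map<String, Integer> df = documentFrequencies(DOCS);
        for (Map.Entry<String, Integer> e : df.entrySet()) {
            System.out.printf("%-10s DF=%d IDF=%.3f%n",
                    e.getKey(), e.getValue(), idf(DOCS.size(), e.getValue()));
        }
    }
}
```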

Example Continued…

Keyword weight: $w_{t,d} = \mathrm{tf}_{t,d} \times \mathrm{IDF}(t)$
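Continuing the example, the sketch below weights every term by tf × idf and then scores the query "malignant breast cancer" against D1 to D3. The slides define only the tf × idf weight; comparing the query with each document by cosine similarity is the standard VSM choice and is assumed here, as is the class name VsmTfIdfExample. Because "breast" occurs twice in D2 and has the highest IDF of the three query terms, D2 should rank first, followed by D3 and then D1 (scores of about 0.82, 0.33, and 0.08).

```java
import java.util.*;

// tf * idf weighting for the VSM example, then cosine scoring of Q against D1..D3.
final class VsmTfIdfExample {

    static final List<String> DOCS = List.of(
            "detection of malignant level in the cell",
            "sighting of breast stage in the breast cancer",
            "detection of malignant stage in the cancer");
    static final String QUERY = "malignant breast cancer";

    static Map<String, Integer> termCounts(String text) {
        Map<String, Integer> tf = new HashMap<>();
        for (String t : text.toLowerCase(Locale.ROOT).split("\\s+")) {
            tf.merge(t, 1, Integer::sum);
        }
        return tf;
    }

    // IDF(t) = log10(n / DF(t)), computed over the documents only.
    static Map<String, Double> idfs(List<String> docs) {
        Map<String, Integer> df = new HashMap<>();
        for (String d : docs) {
            for (String t : termCounts(d).keySet()) {
                df.merge(t, 1, Integer::sum);
            }
        }
        Map<String, Double> idf = new HashMap<>();
        df.forEach((t, f) -> idf.put(t, Math.log10((double) docs.size() / f)));
        return idf;
    }

    // w(t, d) = tf(t, d) * IDF(t); terms absent from the collection get weight 0.
    static Map<String, Double> weights(String text, Map<String, Double> idf) {
        Map<String, Double> w = new HashMap<>();
        termCounts(text).forEach((t, tf) -> w.put(t, tf * idf.getOrDefault(t, 0.0)));
        return w;
    }

    static double cosine(Map<String, Double> a, Map<String, Double> b) {
        double dot = 0.0;
        for (Map.Entry<String, Double> e : a.entrySet()) {
            dot += e.getValue() * b.getOrDefault(e.getKey(), 0.0);
        }
        double na = Math.sqrt(a.values().stream().mapToDouble(x -> x * x).sum());
        double nb = Math.sqrt(b.values().stream().mapToDouble(x -> x * x).sum());
        return (na == 0.0 || nb == 0.0) ? 0.0 : dot / (na * nb);
    }

    // Cosine score of the query against each document, in document order.
    static double[] scores() {
        Map<String, Double> idf = idfs(DOCS);
        Map<String, Double> q = weights(QUERY, idf);
        double[] s = new double[DOCS.size()];
        for (int i = 0; i < DOCS.size(); i++) {
            s[i] = cosine(q, weights(DOCS.get(i), idf));
        }
        return s;
    }

    public static void main(String[] args) {
        double[] s = scores();
        for (int i = 0; i < s.length; i++) {
            System.out.printf("D%d: %.3f%n", i + 1, s[i]);
        }
    }
}
```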