Converting Large NCBI Databases into SAS
Rosa SJ Lin
Division of Statistical Genomics, Washington University in Saint Louis
June 30, 2008
NCBI (http://www.ncbi.nlm.nih.gov)
Contains a large number of databases
Most important are:
- GenBank
- PubMed
- RefSeq
- Online Mendelian Inheritance in Man (OMIM)
- dbSNP
dbSNP Database
NCBI dbSNP
Contains information about SNPs
Submitted data is given an ss number (e.g. ss52079780)
If the data meets the criteria, a reference SNP is created, which has an rs number (e.g. rs530)
dbSNP Data (1)
Each record spans a varying number of lines, and each line has a different length
dbSNP Data (2)
dbSNP Data (3)
Using the SCAN and INDEX functions to assist in reading data (1)
data ncbisnp;
   length rs $12;
   infile din firstobs=1 missover pad;
   input snpline $132.;   /* read each raw line as one string */
   /* the header line of a record contains "updated"; its first "|" field is the rs number */
   if index(snpline,"updated")>0 then do;
      rs=compress(scan(snpline,1,"|"));
      output;
   end;
run;
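To see what each function in the step above contributes, here is a minimal sketch that applies them to a single string. The sample line is hypothetical (only the rs number comes from the slides), and the variables pos and field are illustrative names.

```sas
data _null_;
   length snpline $132 field $40;
   /* Hypothetical header line; the real dbSNP layout may differ */
   snpline = "rs530 | human | 9606 | snp | updated 2008-01-01";
   /* INDEX returns the 1-based position of the substring, or 0 if it is absent */
   pos = index(snpline, "updated");
   /* SCAN picks the n-th "|"-delimited field; COMPRESS removes the blanks around it */
   field = compress(scan(snpline, 1, "|"));
   put pos= field=;   /* write both values to the SAS log */
run;
```

With this input, pos is greater than 0, so the line would pass the IF test, and field should come back as rs530.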
Using the SCAN and INDEX functions to assist in reading data (2)
/* allele line: second "|" field, with the leading "alleles=" (8 characters) dropped */
if index(snpline,"alleles=")>0 then do;
   alleles=substr(compress(scan(snpline,2,"|")),9);
   output;
end;
/* reference assembly line: chromosome from field 3, position from field 4 */
if index(snpline,"assembly=reference")>0 then do;
   chrom=input(substr(compress(scan(snpline,3,"|")),5),8.);
   posc=compress(scan(snpline,4,"|"));
   output;
end;
Use the RETAIN statement: it causes a variable to keep its value from one iteration of the DATA step to the next.
retain markname rs alleles;
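As an illustration of why RETAIN matters here, the following sketch reads a three-line record and writes one observation per SNP. It is a simplified variation, not the full program from the talk: the in-stream record is hypothetical, the dataset name snps is made up, markname is left out because its source is not shown, and OUTPUT is issued only on the assembly line.

```sas
data snps (keep=rs alleles chrom posc);
   length rs $12 alleles $20 posc $24 snpline $132;
   retain rs alleles;   /* keep values found on earlier lines of the record */
   infile datalines truncover;
   input snpline $132.;

   if index(snpline, "updated") > 0 then
      rs = compress(scan(snpline, 1, "|"));
   else if index(snpline, "alleles=") > 0 then
      alleles = substr(compress(scan(snpline, 2, "|")), 9);
   else if index(snpline, "assembly=reference") > 0 then do;
      chrom = input(substr(compress(scan(snpline, 3, "|")), 5), 8.);
      posc  = compress(scan(snpline, 4, "|"));
      output;   /* rs and alleles are still available here thanks to RETAIN */
   end;
   datalines;
rs530 | human | 9606 | snp | updated 2008-01-01
SNP | alleles='A/G' | het=0.5
CTG | assembly=reference | chr=7 | chr-pos=12345 | ctg-start=678
;
run;
```

Without the RETAIN statement, rs and alleles would be reset to missing at the start of each iteration, so the observation written on the assembly line would carry only chrom and posc.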
dbSNP Data (4)
Output SAS Dataset
Readings:
Kim L Kolbe et al., SUGI 22: “Advanced Techniques for Reading Difficult and Unusual Flat Files”.
Clinton S Rickards, SUGI 24: “Reading External Files Using SAS® Software”.