Text-mining practical
Uploaded by lars-juhl-jensen
unix primer
the command line
some useful commands
cat
less
head -10
tail -10
grep 'needle'
cut -f 2
sort
sort -nr
uniq -c
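To try the commands above, a small sketch on toy files may help. The file names and their contents are invented for illustration and are not part of the course data; `head -n 2` is the long form of the `head -2` style used on the slides.

```shell
export LC_ALL=C                      # predictable sort order
demo=$(mktemp -d)                    # scratch directory for the toy files
# Toy data, invented for illustration (setup only).
printf 'doc1\tTP53\ndoc2\tBRCA1\ndoc3\tTP53\ndoc4\tAR\n' > "$demo/genes.tsv"
printf 'AR\nBRCA1\nTP53\nTP53\n' > "$demo/sorted_names.txt"
printf '3\n20\n100\n' > "$demo/numbers.txt"

cat "$demo/genes.tsv"                # print the whole file
# less "$demo/genes.tsv"             # page through it interactively; q quits
head -n 2 "$demo/genes.tsv"          # first 2 lines
tail -n 2 "$demo/genes.tsv"          # last 2 lines
grep 'TP53' "$demo/genes.tsv"        # only lines containing TP53
cut -f 2 "$demo/genes.tsv"           # second tab-separated column
sort "$demo/genes.tsv"               # lines in alphabetical order
sort -nr "$demo/numbers.txt"         # numeric order, largest first
uniq -c "$demo/sorted_names.txt"     # count runs of identical adjacent lines
```

Note that `uniq -c` only merges adjacent identical lines, which is why its input is usually sorted first.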
redirecting output
write to file
command > filename
using pipes
command1 | command2
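Redirection and pipes can be compared side by side on a toy file. This is only a sketch; `letters.txt` and `sorted.txt` are hypothetical names chosen for the example.

```shell
export LC_ALL=C                                 # predictable sort order
demo=$(mktemp -d)                               # scratch directory
printf 'b\na\nc\na\n' > "$demo/letters.txt"     # setup: write four lines to a file

# Redirect: sort's output goes into sorted.txt instead of the terminal.
sort "$demo/letters.txt" > "$demo/sorted.txt"

# Pipe: sort's output becomes the input of uniq -c, with no file in between.
sort "$demo/letters.txt" | uniq -c
```

A redirect with `>` overwrites the target file if it already exists.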
putting it all together
cut -f 4 infile | sort | uniq -c | sort -nr | head -100 > outfile
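The pipeline above can be followed step by step on a toy input. The four-column layout and all values below are invented for illustration, and the cutoff is lowered from 100 to 2 so the effect of `head` is visible.

```shell
export LC_ALL=C                      # predictable sort order
demo=$(mktemp -d)                    # scratch directory
# Four tab-separated columns; only the fourth is used. All values are made up.
printf 'd1\t0\t4\tTP53\nd1\t9\t11\tAR\nd2\t3\t7\tTP53\n' > "$demo/infile"
printf 'd3\t5\t7\tAR\nd3\t8\t12\tTP53\nd4\t1\t6\tBRCA1\n' >> "$demo/infile"

# cut:      keep column 4
# sort:     bring identical values next to each other
# uniq -c:  collapse each run into "count value"
# sort -nr: order by count, largest first
# head:     keep only the top lines
cut -f 4 "$demo/infile" | sort | uniq -c | sort -nr | head -n 2 > "$demo/outfile"

cat "$demo/outfile"                  # the two most frequent values with their counts
```

The first `sort` is needed because `uniq -c` counts only adjacent duplicates; the second one ranks the counts.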
the task
disease gene finding
named entity recognition
human genes
gene prioritization
what I have done
information retrieval
two diseases
prostate cancer
schizophrenia
two sets of documents
62,755 abstracts
65,588 abstracts
one directory with each set
one file with each abstract
dictionary
tab-delimited file
human genes
22,523 entities
synonyms
from many databases
orthographic variation
prefixes and postfixes
automatically generated
2,726,495 names
tagdir program
flexible matching
upper- and lower-case
spaces and hyphens
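One way to picture flexible matching is to normalize names before comparing them. The sketch below is an assumption about the general idea, not a description of how the tagdir program works internally; `normalize` is a hypothetical helper.

```shell
# Hypothetical helper: lower-case a string and drop spaces and hyphens,
# so that spelling variants of a name compare as equal.
normalize() {
  printf '%s\n' "$1" | tr '[:upper:]' '[:lower:]' | tr -d ' -'
}

normalize 'IL-2'     # prints il2
normalize 'il 2'     # prints il2
normalize 'Il2'      # prints il2
```

All three variants collapse to the same key, so a dictionary entry for one of them would also match the other two.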
tab-delimited output
what you will do
named entity recognition
find unfortunate names
create “black list”
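Candidates for the black list can be found by counting how often each name was matched. The sketch below assumes a hypothetical tagger output with three tab-separated columns (document, matched text, gene identifier); the real column layout, the file names, and the threshold `min` are all assumptions, and the data is invented.

```shell
export LC_ALL=C                      # predictable sort order
demo=$(mktemp -d)                    # scratch directory
# Invented tagger output: document <TAB> matched text <TAB> gene identifier.
printf 'd1\tTP53\tG1\nd1\twas\tG2\nd2\twas\tG2\n' > "$demo/matches.tsv"
printf 'd2\tAR\tG3\nd3\twas\tG2\nd3\tTP53\tG1\n' >> "$demo/matches.tsv"

# Count matches per name, most frequent first.
cut -f 2 "$demo/matches.tsv" | sort | uniq -c | sort -nr > "$demo/name_counts.txt"

# Keep names matched at least $min times as candidates for manual review.
# The awk script strips the count that uniq -c put in front of each name.
min=3
awk -v min="$min" '{ n = $1; sub(/^ *[0-9]+ /, ""); if (n + 0 >= min + 0) print }' \
  "$demo/name_counts.txt" > "$demo/blacklist_candidates.txt"

cat "$demo/blacklist_candidates.txt"
```

Frequent names are only candidates: a gene that really is mentioned often would also show up here, so the list still needs to be inspected by hand.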
information extraction
co-mentioning
within abstracts
rank genes for each disease
find shared gene
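Ranking by co-mentioning and finding a shared gene can be sketched as follows. The two-column input (document, gene identifier), the file names, the helper functions `rank_genes` and `top_ids`, and the cutoff of 3 are all hypothetical; the identifiers G1 to G4 are placeholders, not results.

```shell
export LC_ALL=C                      # predictable sort order, needed by comm
demo=$(mktemp -d)                    # scratch directory
# Invented matches: document <TAB> gene identifier, one file per disease.
printf 'p1\tG1\np1\tG1\np2\tG1\np2\tG2\np3\tG3\n' > "$demo/prostate.tsv"
printf 's1\tG4\ns2\tG4\ns2\tG3\ns3\tG3\ns3\tG3\n' > "$demo/schizophrenia.tsv"

# Rank genes by the number of distinct abstracts that mention them:
# sort -u drops repeated mentions of a gene within the same abstract.
rank_genes() {
  sort -u "$1" | cut -f 2 | sort | uniq -c | sort -nr
}
rank_genes "$demo/prostate.tsv"      > "$demo/prostate.rank"
rank_genes "$demo/schizophrenia.tsv" > "$demo/schizophrenia.rank"

# Take the identifiers of the top 3 genes and sort them for comm.
top_ids() {
  head -n 3 "$1" | awk '{ print $2 }' | sort
}
top_ids "$demo/prostate.rank"      > "$demo/prostate.top"
top_ids "$demo/schizophrenia.rank" > "$demo/schizophrenia.top"

# comm -12 prints only the lines present in both sorted files.
comm -12 "$demo/prostate.top" "$demo/schizophrenia.top"
```

Counting distinct abstracts rather than raw matches keeps a single abstract that repeats a name many times from dominating the ranking.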
a helping hand
“black list”
100+ matches
10+ matches
wrap up
prostate cancer
FOLH1
schizophrenia
Glutamate carboxypeptidase II
same protein
synonyms matter
“black list” is crucial
text mining is useful
not black magic
EMBO Practical Course Computational Biology: Genomes to Systems
Puerto Varas, 3-9 April 2014
Thank you!