Enroller Colloquium: Sulman Sarwar
ENROLLER - An e-Research infrastructure for humanities researchers
Sulman Sarwar
Research Associate
National e-Science Centre
University of Glasgow
OUTLINE
Introduction
Current Work
DEMO
Future Work
Conclusions
Introduction - ENROLLER
ENROLLER - An Enhanced Repository for Language and Literature Researchers
JISC funded project (2009 - 2011)
National e-Science Centre and University of Glasgow English Department
Data sets participating in the project
OED (Oxford English Dictionary), HTE (Historical Thesaurus of English), DSL (Dictionary of the Scots Language), SCOTS (Scottish Corpus of Texts and Speech), CMSW (Corpus of Modern Scottish Writing), NECTE (Newcastle Electronic Corpus of Tyneside English)
Objectives
• To develop an interactive research infrastructure providing seamless access to participating data collections
• A well-designed, easy-to-use search system with access to digital sound, video and textual data
• Develop tools for linguistic analysis (such as concordancing, collocation and frequency analysis)
• Seamless, secure access to licensed data through automatically enforced access and usage policies
• Support for the addition of new data collections
• Building large-scale data indexes for searching, exploiting HPC and e-Science facilities (such as ScotGrid and NGS)
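Two of the linguistic analyses named in the objectives, frequency analysis and collocation, can be illustrated with a minimal sketch. This is not the ENROLLER implementation: the function names, the window size and the sample sentence are all assumptions made for illustration.

```python
from collections import Counter
import re


def tokenize(text):
    """Lower-case the text and split it into word tokens."""
    return re.findall(r"[a-z']+", text.lower())


def frequencies(tokens):
    """Frequency analysis: count how often each token occurs."""
    return Counter(tokens)


def collocates(tokens, node, window=2):
    """Collocation: count tokens within `window` positions of each `node` hit."""
    counts = Counter()
    for i, tok in enumerate(tokens):
        if tok == node:
            lo = max(0, i - window)
            hi = i + window + 1
            # Count every neighbour in the window except the node itself.
            counts.update(t for j, t in enumerate(tokens[lo:hi], lo) if j != i)
    return counts


# Invented sample sentence, not taken from any ENROLLER collection.
tokens = tokenize("the timid cat saw the bold cat and the timid mouse")

if __name__ == "__main__":
    print(frequencies(tokens).most_common(3))
    print(collocates(tokens, "timid", window=1))
```

A real tool would normally weight collocates with an association measure rather than raw counts; raw counts are used here only to keep the sketch short.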
Current Work
Simple Searches
Data resides on servers local to the institution
Input: a file containing word(s) or phrases
Output: display on screen or save to file
Cross-collection Searches
Search for the same word(s) in multiple collections
Current Work ...
Bulk Searches (over NGS or ScotGrid)
Data resides on NGS
Input: word(s) or phrases
Output: display on screen or save to file
Execute Workflows (over NGS or ScotGrid)
Data, input and output same as above.
Workflow Example
• Input a word OR upload a file containing the words/phrases (terms) to be searched
• Search the terms in thesaurus (for example: HTE)
• Search the results from thesaurus-search in Scottish Corpus
• Find concordances for each of the words
• Display the thesaurus-search results
• Display the corpus-search results
• Display the concordances
• Save/Download the results
Thesaurus search (HTE) for "timid":
{acolmod, egeful, (ge)forht, forhtful, forhtiendlic, forhtig, forhtmod, herebleaþ, ungedyrstig, unþriste, blethe<bleaþ, fey<fæge, unbold<unbeald, unbold<unbeald, argh<earg, frightful, feared, ferdy, fearful, ferdful, g(h)astful, trembling, timorous, cremetous, cremeuse, craintive, sheepish, meticulous, timid, tremebund, awful, soft, pale, timorsome, tremulous, pigeon-hearted, affrightful, formidolous, pavid, timidous, unsupported, tender-nosed, scary, pippin-hearted, kitten-hearted, funky, tender-footed, fearsome, misventurous, scare, cotton-wool} 51 entries
Corpus-search concordances for "timid":
talk chapped knocked blate bashful timid rax stretch galluses braces yont defend inverewe handkerchief trees wave timid surrender while rhododendrons hurl defiance 1 lawrence wrote of his timid typist nelly morrison dirty bitch
through you i see the timid indeterminate puzzled soul behind that together in one corner a timid scrum we tried to celebrity
bit rupert wis aye a timid cat it jist hatit e wis a tom it wis timid i min fine on it
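The workflow steps (thesaurus search, corpus search, concordances) can be sketched as a small pipeline. This is a minimal illustration under stated assumptions, not the ENROLLER code: the in-memory `THESAURUS` and `CORPUS` are toy stand-ins with invented sentences, and all function names are hypothetical.

```python
import re

# Toy stand-ins (hypothetical data, not the real HTE or SCOTS resources).
THESAURUS = {"timid": ["timid", "fearful", "sheepish"]}
CORPUS = [
    "rupert was always a timid cat",
    "she gave a sheepish grin and left",
    "the fearful crowd fell silent",
]


def thesaurus_search(term):
    """Look the term up in the thesaurus; fall back to the term itself."""
    return THESAURUS.get(term.lower(), [term.lower()])


def corpus_search(words, corpus):
    """Return the documents that contain at least one of the words."""
    wanted = set(words)
    return [doc for doc in corpus if wanted & set(re.findall(r"\w+", doc.lower()))]


def concordance(word, corpus, width=2):
    """Keyword-in-context lines with `width` tokens on either side of the hit."""
    lines = []
    for doc in corpus:
        tokens = re.findall(r"\w+", doc.lower())
        for i, tok in enumerate(tokens):
            if tok == word:
                left = " ".join(tokens[max(0, i - width):i])
                right = " ".join(tokens[i + 1:i + 1 + width])
                lines.append(f"{left} [{word}] {right}".strip())
    return lines


def run_workflow(term, corpus=CORPUS):
    """Chain the steps: thesaurus search, corpus search, then concordances."""
    words = thesaurus_search(term)
    hits = corpus_search(words, corpus)
    kwic = {w: concordance(w, hits) for w in words}
    return {"thesaurus": words, "corpus": hits, "concordances": kwic}


if __name__ == "__main__":
    for word, lines in run_workflow("timid")["concordances"].items():
        for line in lines:
            print(line)
```

Returning all three result sets from `run_workflow` mirrors the display and save steps of the workflow, where each intermediate result is shown to the user.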
Typical Interaction Flow
1. User points browser at Grid resource/portal
2. Shibboleth redirects user to W.A.Y.F. (Where Are You From) service
3. User selects their home institution
4. Home site authenticates user and pushes attributes to the service provider
5. Pass authentication info and attributes to authZ function
[Figure: architecture diagram. Labels: Home Institution, Identity Provider, LDAP, AuthN, AuthZ, Federation, Shibb Frontend, PortalDB, Service Provider, Web Services, Grid Services, Results Aggregator, NGS, MapReduce, Data, OED, HTE, SCOTS, NECTE, Uni. of Glasgow, Uni. of Newcastle, OUP]
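Step 5 of the interaction flow passes the authenticated user's attributes to an authorization (authZ) function. The sketch below shows one way such an attribute-based check could look. It is an assumption-laden illustration: the policy table, the entitlement names and the `UserAttributes` fields are invented and do not reflect the real licence terms of any collection or the actual ENROLLER portal code.

```python
from dataclasses import dataclass

# Hypothetical policy table: collection -> entitlements that grant access.
# An empty set means "open to any authenticated user" in this sketch.
POLICIES = {
    "OED": {"oed-licensee"},
    "HTE": {"hte-licensee", "glasgow-staff"},
    "SCOTS": set(),
}


@dataclass(frozen=True)
class UserAttributes:
    """Attributes pushed by the home institution (field names are assumed)."""
    user_id: str
    home_institution: str
    entitlements: frozenset


def authorize(user, collection, policies=POLICIES):
    """Return True if the user's attributes satisfy the collection's policy."""
    if collection not in policies:
        return False  # unknown collections are denied by default
    required = policies[collection]
    return not required or bool(required & user.entitlements)


def accessible_collections(user, policies=POLICIES):
    """List the collections a cross-collection search may touch for this user."""
    return sorted(c for c in policies if authorize(user, c, policies))


if __name__ == "__main__":
    user = UserAttributes("u1", "glasgow", frozenset({"glasgow-staff"}))
    print(accessible_collections(user))
```

Denying by default for unknown collections keeps a newly added collection closed until a policy is written for it.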
Using NGS/ScotGrid
#!/bin/bash
# Job submission script: launches the scots-app search application on the cluster.
echo "Starting application: scots-app "
echo " submitting to job-manager at: "
echo "$(/bin/hostname -f)"
echo " with arguments to main "
echo "$*"
cd /home/ngs0273/javaprog/scots-app
# Properties file and number of worker threads passed to the application.
props=/home/ngs0273/javaprog/scots-app/src/main/resources/project_ngs.properties
echo "properties file: " "$props"
nthreads=4
echo "# threads= " "$nthreads"
/usr/local/Cluster-Apps/java-1.6.0_03/bin/java -cp target/scots-app-1.0-SNAPSHOT.jar scots.app.App "$1" "$props" "$nthreads"
[Figure: job execution on NGS/ScotGrid. Labels: Job Submission Client, Web Services, Grid Services, Job Submission Script, MapReduce Application, Head Node, Job Manager, CE-1, CE-2, CE-3, CE-4, ..., CE-N, Data, Output]
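The bulk search is described as a MapReduce application run with several threads. The general pattern can be sketched as follows: split the documents into chunks, count term hits per chunk in parallel (map), then merge the partial counts (reduce). This is an illustrative sketch only. The real application is a Java program (`scots.app.App`), and the function names and sample documents here are assumptions.

```python
from collections import Counter
from concurrent.futures import ThreadPoolExecutor
from functools import reduce


def map_chunk(args):
    """Map step: count occurrences of each search term in one chunk."""
    terms, chunk = args
    counts = Counter()
    for doc in chunk:
        tokens = doc.lower().split()
        for term in terms:
            counts[term] += tokens.count(term)
    return counts


def reduce_counts(total, partial):
    """Reduce step: fold one partial result into the running total."""
    total.update(partial)
    return total


def bulk_search(terms, documents, nthreads=4):
    """Split documents into `nthreads` chunks, map in parallel, then reduce."""
    chunks = [documents[i::nthreads] for i in range(nthreads)]
    with ThreadPoolExecutor(max_workers=nthreads) as pool:
        partials = list(pool.map(map_chunk, [(terms, c) for c in chunks]))
    return reduce(reduce_counts, partials, Counter())


# Invented sample documents, not taken from any ENROLLER collection.
DOCS = [
    "a timid cat",
    "timid and fearful",
    "bold cat",
    "the fearful timid mouse",
    "no match here",
]

if __name__ == "__main__":
    print(bulk_search(["timid", "fearful"], DOCS))
```

The default of four workers mirrors the `nthreads=4` setting in the job submission script. On a grid the chunks would be spread across compute nodes rather than threads in one process.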
DEMO
ENROLLER Search
ENROLLER Advanced Search
Types of Searches
Simple Word (single/multiple) Searches
Free text queries / phrase searches
Wild-card searches (can*, t?ll)
Fuzzy searches - to search for terms similar in spelling (roam~ matches foam, roams)
Field searches (title: BBC)
Term boosting - to control the relevance (salmon^4 reid)
Boolean searches ("ayr" AND "scotland", "ayr" -"scotland", "ayr" OR "edinburgh"; likewise the + and NOT operators)
Grouping - ((ayr OR glasgow) AND BBC)
Future Work
• Cross-collection searches
• Development of language analysis tools
• Addition of new data collections
• Addition of UI features in the portal for a better user experience
• Working towards the development of a VRE for the language and literature community
Thank you.
Questions?
THE END.