Project group knowAAN
Final presentation
Computer Science Education Group
University of Paderborn
October 20th 2011
Source: bücker.name/pubs/pgknowaan.final.presentation.pdf
Overview
- Introduction
- System components & Work flow
- Demonstration
- Development process
- Summary & Outlook
- Time for further questions of detail
Overview: First part
- Goals
- Extraction & Storage (of data)
- Exploration (of data)
- System components & Work flow
- Analysis & Visualization (of data)
Goals
- Explore research networks
- Based on: Artifacts (scientific publications) and metadata
- Combination and analysis of data
- Computation of similarities of full texts
- Support for conference management system Ginkgo
- Data visualization
- Recommendations
(Source: PG knowAAN project description)
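
The goal "computation of similarities of full texts" can be illustrated with a minimal sketch. The slides do not say which similarity measure the project uses; cosine similarity over raw term frequencies is assumed here purely as a common baseline, and the function names are hypothetical.

```python
import math
import re
from collections import Counter


def term_frequencies(text: str) -> Counter:
    """Count lowercase word tokens (a stand-in for real tokenization)."""
    return Counter(re.findall(r"[a-z]+", text.lower()))


def cosine_similarity(a: Counter, b: Counter) -> float:
    """Cosine of the angle between two term-frequency vectors."""
    dot = sum(a[t] * b[t] for t in a.keys() & b.keys())
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    if norm_a == 0 or norm_b == 0:
        return 0.0  # an empty text is treated as similar to nothing
    return dot / (norm_a * norm_b)


def similarity(text_a: str, text_b: str) -> float:
    """Similarity of two full texts (assumed measure, not the project's)."""
    return cosine_similarity(term_frequencies(text_a), term_frequencies(text_b))
```

Two texts that share no words score 0.0, and identical texts score 1.0 up to floating-point rounding.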
Imagine you are interested in a conference.
You have downloaded the papers from two or three years.
Now you have nearly 100 publications. How do you explore them?
100 publications. Do you know any tools for this?
Extraction & Storage
First step: Extract data and store it.
Exploration
Second step: Explore data.
Exploring a conference
Which extracted data is available for a publication?
→ Database schema
publication
  id          GUID
  lucuid      VARCHAR(512)
  title       VARCHAR(512)
  booktitle   VARCHAR(512)
  normtitle   VARCHAR(512)
  date        VARCHAR(512)
  editor      VARCHAR(512)
  journal     VARCHAR(512)
  note        VARCHAR(512)
  pages       VARCHAR(512)
  publisher   VARCHAR(512)
  tech        VARCHAR(512)
  volume      VARCHAR(512)
  number      VARCHAR(512)
  rawstring   VARCHAR(4096)
  xmlfile     VARCHAR(512)
  pdffile     VARCHAR(512)
  topicfile   VARCHAR(512)
  created     BIGINT
  modified    BIGINT

author
  id          GUID
  text        VARCHAR(512)
  normtext    VARCHAR(512)
  firstname   VARCHAR(512)
  lastname    VARCHAR(512)
  created     BIGINT
  modified    BIGINT

pub_aut
  publication_id  GUID
  author_id       GUID

affiliation
  id           GUID
  text         VARCHAR(512)
  location_id  GUID

address
  id           GUID
  text         VARCHAR(512)
  location_id  GUID

pub_aff
  publication_id  GUID
  affiliation_id  GUID

pub_add
  publication_id  GUID
  address_id      GUID

citation
  publication1_id  GUID
  publication2_id  GUID

discipline
  id         GUID
  text       VARCHAR(512)
  parent_id  GUID

location
  id         GUID
  latitude   DOUBLE
  longitude  DOUBLE
  text       VARCHAR(512)

keyword
  id    GUID
  text  VARCHAR(512)

pub_key
  publication_id  GUID
  keyword_id      GUID
  score           DOUBLE
  source          VARCHAR(512)

pub_evt
  publication_id  GUID
  event_id        GUID

pub_dis
  publication_id  GUID
  discipline_id   GUID

pub_con
  publication_id  GUID
  concept_id      GUID
  score           DOUBLE
  source          VARCHAR(512)

concept
  id    GUID
  text  VARCHAR(512)

event
  id              GUID
  text            VARCHAR(512)
  filepath        VARCHAR(512)
  predecessor_id  GUID
  successor_id    GUID

eventseries
  id        GUID
  text      VARCHAR(512)
  filepath  VARCHAR(512)

evt_evs
  event_id        GUID
  eventseries_id  GUID

aut_add
  author_id   GUID
  address_id  GUID

aut_aff
  author_id       GUID
  affiliation_id  GUID

pub_cat
  publication_id  GUID
  category_id     GUID
  score           DOUBLE
  source          VARCHAR(512)

category
  id    GUID
  text  VARCHAR(512)

Further tables (columns not shown):
  bib_coupling
  co_author
  co_citation
  keyword_count
  discipline_count
  category_count
  concept_count
  evt_pub_aut_count
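
To make the link tables concrete, the following is a minimal sketch of how publication, author and pub_aut fit together, and how a co-author relation could be derived from them. It rests on several assumptions: SQLite stands in for the project's actual database, GUIDs are stored as text, only a few columns are created, and the query is merely one plausible way to fill a table like co_author (the slides do not show how that table is computed). The helper names and sample data are hypothetical.

```python
import sqlite3
import uuid

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE publication (id TEXT PRIMARY KEY, title VARCHAR(512));
CREATE TABLE author      (id TEXT PRIMARY KEY, lastname VARCHAR(512));
CREATE TABLE pub_aut     (publication_id TEXT REFERENCES publication(id),
                          author_id      TEXT REFERENCES author(id));
""")


def add_author(lastname: str) -> str:
    """Insert an author and return its generated id (GUID stored as text)."""
    author_id = str(uuid.uuid4())
    conn.execute("INSERT INTO author (id, lastname) VALUES (?, ?)",
                 (author_id, lastname))
    return author_id


def add_publication(title: str, author_ids: list) -> str:
    """Insert a publication and link it to its authors via pub_aut."""
    pub_id = str(uuid.uuid4())
    conn.execute("INSERT INTO publication (id, title) VALUES (?, ?)",
                 (pub_id, title))
    conn.executemany("INSERT INTO pub_aut VALUES (?, ?)",
                     [(pub_id, a) for a in author_ids])
    return pub_id


def co_author_counts() -> dict:
    """Count shared publications per author pair (assumed derivation)."""
    rows = conn.execute("""
        SELECT a1.lastname, a2.lastname, COUNT(*)
        FROM pub_aut p1
        JOIN pub_aut p2 ON p1.publication_id = p2.publication_id
                       AND p1.author_id < p2.author_id
        JOIN author a1 ON a1.id = p1.author_id
        JOIN author a2 ON a2.id = p2.author_id
        GROUP BY p1.author_id, p2.author_id
    """).fetchall()
    # Sort each pair by name so the result does not depend on random ids.
    return {tuple(sorted((n1, n2))): shared for n1, n2, shared in rows}


# Hypothetical sample data: two papers with overlapping author lists.
alpha, beta, gamma = add_author("Alpha"), add_author("Beta"), add_author("Gamma")
add_publication("Paper one", [alpha, beta])
add_publication("Paper two", [alpha, beta, gamma])
```

The self-join on pub_aut pairs every two authors of the same publication; the `<` comparison keeps each pair only once.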
System components & Work flow
How is our system structured?
→ Some examples.
Components
Components shown in the component diagram:
- FileStorage
- Backend
- xmlBuilder
- TopicExtraction
- TF-Component
- TrendDetection
- Roundtrip
- Recommendation
- PDFToText
- Clustering
- DB
- Parscit
- DataBase
- SolrWebServices
- DocBrowser
- FrontendReferenceExtraction
- ParscitTrainer
Interfaces shown between components: JDBC, Model, WebServices, FileSystem
Sequence diagram: adding a PDF
Participants: Languagedetection, DB, Solr, NounExtraction, Lemmatizer, Parscit, PDFToText, RoundTripExecutor, RoundTrip, DocumentBrowser
Calls and return values:
a/1)  addPDF
a/2)  writeToFS → Path
a/3)  createThread, then submitThread
b/1)  run
b/2)  getText → Text
b/3)  ParseFullText → ParscitXML
b/4)  extractBodyAndAstract → BodyAndAbstract
b/5)  getLanguage → LanguageString
b/6)  lemmatize → LemmatizedText
b/7)  extractNouns → NounsList
b/8)  lemmatizeNounslist → LemmatizedNouns
b/9)  ReduceToTopNouns → TopNouns
b/10) writeToFiles → Paths
b/11) addTexts → Solrid
b/12) addPublication
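
The shape of this sequence (a request that hands work to a background thread, which then runs the extraction steps in order) can be sketched as follows. This is only an illustration under stated assumptions: every function body is a toy placeholder, real noun extraction and lemmatization need linguistic tooling that is not modeled here, and steps such as PDF conversion, Parscit parsing, Solr indexing and database writes are left out entirely.

```python
import re
from collections import Counter
from concurrent.futures import Future, ThreadPoolExecutor


def extract_nouns(text: str) -> list:
    """Placeholder for extractNouns: returns all word tokens."""
    return re.findall(r"[A-Za-z]+", text)


def lemmatize_nouns(nouns: list) -> list:
    """Placeholder for lemmatizeNounslist: lowercases only."""
    return [n.lower() for n in nouns]


def reduce_to_top_nouns(nouns: list, limit: int = 3) -> list:
    """Keep the most frequent nouns, most frequent first."""
    return [word for word, _ in Counter(nouns).most_common(limit)]


def run(text: str) -> dict:
    """Background job: run the (simplified) steps in sequence."""
    nouns = lemmatize_nouns(extract_nouns(text))
    return {"top_nouns": reduce_to_top_nouns(nouns)}


def submit(text: str, executor: ThreadPoolExecutor) -> Future:
    """Mirror of createThread/submitThread: queue the job and return at once."""
    return executor.submit(run, text)


if __name__ == "__main__":
    with ThreadPoolExecutor(max_workers=2) as pool:
        job = submit("Graph graph cluster graph cluster topic", pool)
        print(job.result())
```

Running the steps in a worker keeps the caller responsive while a document is processed, which matches the split between the a/ and b/ steps in the diagram.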
Work flow
Analysis & Visualization
Third step: Analyze and visualize data.
Analysis of authors
Analysis of scientific publications
Demonstration
Now: Demo.
Image: http://www.flickr.com/photos/plaisanter/5525977163/
Development process
Technologies
Jersey
Methods of agile software development
FDD, XP, Scrum
- Weekly meetings
- Sit together (as much as possible)
- Automated build system
- Continuous integration
- Issue tracking
Summary and Outlook
Summary and future work
Summary
- Integrated processing of scientific papers
- Aggregated visualization of authors, publications and events
- Computation of various analyses over the data
- Cleaning functionality for automatically processed data
Future work
- Parallelized clustering
- Additional graphical visualization
- Improved extraction of metadata from PDF files
Thank you for your attention
Questions?