The DBLP Computer Science Bibliography: Evolution, Research Issues, Perspectives
SPIRE 2002, Lisbon, Portugal, 11 September 2002
dblp.uni-trier.de
Michael Ley
-
Thank you for the invitation
Many of you use DBLP
This talk gives some background information about the service
You are invited to use the DBLP data to test and evaluate your algorithms
-
About me
born March 1959
1986 diploma in informatics from Aachen University of Technology
1993 Ph.D. from University of Trier
since 1993 lecturer at U Trier
  Programming for 1st/2nd year students
  DB implementation, Digital Libraries
-
Outline
History
Technical Background
Perspectives & Research Issues
-
1. History
The Beginning
Person Pages
Service
Early Recognition
ACM SIGMOD Anthology
Sponsor
Growth of DBLP
Software Labs
-
The Beginning
End of 1993: simple test of Web technology (Xmosaic, NCSA HTTP server)
Tables of contents:
  Journals and proceedings
  DataBase systems / Logic Programming
-
Basic Architecture
-
Person-Publication Network
-
Person-Publication Network
Adding a hyperlink from each person name to a page which enumerates this person's publications
The result reflects the complex social network behind research
In DB / IR terms: simply a view, a canned query
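The "view / canned query" idea can be sketched in a few lines of Python. This is only an illustration under assumptions: the record layout (`key`, `authors`, `year`) and the sample names are hypothetical and are not the format DBLP uses internally.

```python
from collections import defaultdict

# Hypothetical sample records; names and fields are made up for illustration.
publications = [
    {"key": "p1", "authors": ["Ann Example", "Bob Sample"], "year": 1993},
    {"key": "p2", "authors": ["Ann Example", "Carol Test"], "year": 1995},
    {"key": "p3", "authors": ["Bob Sample"], "year": 1994},
]


def person_pages(records):
    """Group publications by author name, newest first (the 'canned query')."""
    pages = defaultdict(list)
    for rec in records:
        for name in rec["authors"]:
            pages[name].append(rec)
    for recs in pages.values():
        recs.sort(key=lambda r: r["year"], reverse=True)
    return dict(pages)


def coauthors(pages, name):
    """Names that would be hyperlinked from this person's page."""
    linked = {a for rec in pages[name] for a in rec["authors"]}
    return sorted(linked - {name})


pages = person_pages(publications)
print([r["key"] for r in pages["Ann Example"]])
print(coauthors(pages, "Ann Example"))
```

Following the coauthor links from page to page is what makes the network behind the publications browsable.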
-
DBLP is a service, not a research project
limited resources
trade-off: development of new features & software vs. entering & maintaining contents
-
No DBMS used until today
prototype implementations (diploma theses): DBLP with SHORE, DB2
  + better tools to maintain consistency
  software too large for maintenance
trade-off: disk space vs. CPU speed
-
Early Recognition
1997:
  ACM SIGMOD Service Award
  VLDB Endowment Special Recognition Award
Helped to make DBLP a more official project & to get a small initial fund
-
ACM SIGMOD Anthology
SIGMOD had made some profits with its conferences
Idea by Rick Snodgrass: use the money to scan in historical DB publications
Combine these full texts and an improved version of DBLP into a digital library
-
Anthology: Contents
Journals, Newsletters:
  TODS
  TKDE
  VLDB Journal
  Distr. & Parallel DB
  Data Engineering
  SIGMOD Record
  SIGKDD Expl.
  SIGIR Forum
  Data Base
-
Proceedings:
  ACM DL, ACM GIS, ADBIS, CIKM, COOPIS, DASFAA, DBPL, DOLAP, EDBT, ER, Hypertext, ICDT, KRDB, MFDBS, NPIV, PDIS, PODS, POS, SIGIR, SIGMOD, SIGFIDET, SSD, SSDBM, VLDB, XP
  + several workshops
-
Books:
  Abiteboul/Hull/Vianu: Foundations of DBs
  Bernstein/Hadzilacos/Goodman: Concurrency Control and Recovery in Database Systems
  Maier: Theory of Relational Databases
  Gray: The Benchmark Handbook
  Stonebraker: The INGRES Papers
  Wiederhold: Database Design (2nd Ed.)
  Snodgrass: The TSQL2 Temporal Query L.
-
21 CDROMs / 2 DVDs
>150000 pages full text
-
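Anthology: Citation Links
[Diagram: the reference list of paper A ("References: [1] ... [2] B: xxx. [3] ...") links to paper B; the page of paper B shows "References: [1] ..." and "Referenced by: A: yyy."]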
-
Sponsor Found / Expansion
help by MS Research & Jim Gray made it possible to expand DBLP to cover most areas of computer science
students were hired to enter data
-
Growth of DBLP
-
Teaching Java
At U Trier we teach Java as the first programming language
Assignments: DBLP XML data are used
  graph algorithms
  user interfaces
  simple search engines
-
2. Technical Background
Initial Design
Person Pages
Mirrors
Simple Search
XML-Records
BHT-Files
MG: advanced search
Anthology search
-
Initial Design
Entry pages
Collection of HTML tables of contents
TOCs were parsed to generate Person Pages (customized xmosaic parser)
Person Index
-
TOC Parser / Generation of Person Pages
TOCs -> Parser -> TOC_OUT -> mkauthors -> Person Pages
-
Mirrors
University of Trier had a slow internet connection
DBLP should have high availability
Technique:
  transfer all TOCs + entry pages + TOC_OUT in a tar.gz file
  run mkauthors on the mirror
-
[Diagram labels: TOC_OUT, title, CGI]
-
Bibliographic Records
citation linking, reviews, annotated bibliographies, ...
assign a unique ID to each publication
make it accessible by this ID
store the information in classical bibliographic records
-
Bibliographic Records
You may download them from http://dblp.uni-trier.de/xml/ (uncompressed ~120 MByte)
Simple DTD
Idea: BibTeX++ in XML syntax
Examples
-
author: Andreas Oberweis
author: Wolffried Stucky
title: Die Behandlung von Ausnahmen in Software-Systemen: Eine Literaturübersicht.
pages: 492-502
year: 1991
volume: 33
journal: Wirtschaftsinformatik
number: 6
url: db/journals/wi/wi33.html#OberweisS91
-
author: Andreas Oberweis
author: Oliver Paulzen
author: Hagen J. Sexauer
title: Ein wissensbasiertes Vorgehensmodell zur Gestaltung von CRM-Systemen.
pages: 429-436
year: 2001
booktitle: GI Jahrestagung (1)
url: db/conf/gi/gi2001-1.html#OberweisPS01
-
author: Peter Jaeschke
author: Andreas Oberweis
author: Wolffried Stucky
title: Extending ER Model Clustering by Relationship Clustering.
pages: 451-462
year: 1993
booktitle: ER
url: db/conf/er/er93.html#JaeschkeOS93
cdrom: conf/er/93er93/ER93-P447.pdf
ee: db/conf/er/JaeschkeOS93.html
-
cite: conf/er/CarlsonJA89
cite: journals/tods/Chen76
cite: journals/cj/FeldmanM86
cite: ...
cite: ...
cite: conf/er/RauhS92
cite: ...
cite: conf/er/ScheuermannSW79
cite: journals/csur/TeoreyYF86
cite: journals/cacm/TeoreyWBK89
-
editor: Ramez Elmasri
editor: Vram Kouramajian
editor: Bernhard Thalheim
title: Entity-Relationship Approach - ER'93, 12th International Conference on the Entity-Relationship Approach, Arlington, Texas, USA, December 15-17, 1993, Proceedings
booktitle: ER
series: Lecture Notes in Computer Science
volume: 823
publisher: Springer
year: 1994
isbn: 3-540-58217-7
url: db/conf/er/er93.html
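To make "BibTeX++ in XML syntax" concrete, here is a minimal sketch that reads one record with Python's standard `xml.etree.ElementTree`. The element names follow BibTeX field names and are an assumption here, because the tags did not survive in this transcript; the `key` value is a placeholder, not a real DBLP key.

```python
import xml.etree.ElementTree as ET

# Assumed XML form of the first example record; key="example/key" is a placeholder.
RECORD = """
<article key="example/key">
  <author>Andreas Oberweis</author>
  <author>Wolffried Stucky</author>
  <title>Die Behandlung von Ausnahmen in Software-Systemen: Eine Literaturübersicht.</title>
  <pages>492-502</pages>
  <year>1991</year>
  <volume>33</volume>
  <journal>Wirtschaftsinformatik</journal>
  <number>6</number>
  <url>db/journals/wi/wi33.html#OberweisS91</url>
</article>
"""


def parse_record(xml_text):
    """Turn one record into a dict; repeated author elements become a list."""
    elem = ET.fromstring(xml_text.strip())
    record = {
        "type": elem.tag,
        "key": elem.get("key"),
        "authors": [a.text for a in elem.findall("author")],
    }
    for child in elem:
        if child.tag != "author":
            record[child.tag] = child.text
    return record


rec = parse_record(RECORD)
print(rec["type"], rec["authors"], rec["year"], rec["url"])
```

For the full download, a streaming parser such as `ET.iterparse` would be the more natural choice, since the file is far larger than a single record.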
-
Publication Types
-
XML Parser / Generation of Person Pages
XML-Records -> Parser -> TOC_OUT -> mkauthors -> Person Pages
-
Tables of Contents,
-
Bibliography HyperText (BHT)
include mechanism ()
several additional tags: , , ,
HTML
-
25. SIGIR 2002: Tampere, Finland 25. SIGIR 2002: Tampere,Finland
Web Information Retrieval
Information Retrieval Theory
-
Advanced Search: MG
Managing Gigabytes software by Witten, Moffat, Bell
DBLP XML Records -> MG Documents
Filter: matching terms in the required field
-
DBLP Architecture
[Diagram: Entry Pages; Person Pages; Streams (Conference Series, Journals); Tables of Content (TOC); Person Search; Advanced Search; generated answers]
-
Entering Data
[Diagram labels: Students; WWW, E-Mail; wrapper; mkhtml; BibTeX; BibTeX; BHT]
-
Anthology: Offline Search
Simple search engine for all standard platforms
Java Applet (Java 1.0)
Only author search
Applet reads simple tree data structure from files:
-
root: separators
  curran,john H / cutting,d / czarnecki,h / dabbaghc / dagmar br / daigr / dale ro / dambr / dan cart / dan sm / dang,z / daniel d. fu / daniel j. bur / daniel mend / daniel sa / daniela mer / dannenberg,c / darad
leaves: names (permutations of name parts)
  Mann,Robert I.
  Mann,Sally Fahrenholz-
  Mann,Samuel
  Mann,Stefan
  Mann,Stephen
  Mann-Ho Lee
  Mann-May Yau
  Manna,I.
  Manna,M. La
  Manna,Serena La
  Manna,Zohar
  Mannai,Dhamir N.
  Mannarino,Gabriela Susana
  Mannava,Phanindra K.
  Manne,Fredrik
  Manne,Srilatha
  Manneback,Pierre
-
Anthology: Full Text Search
Acrobat catalog: index construction
capture: OCR for scanned texts
manual entry of title fields was necessary
insufficient software support by Adobe
Acrobat Reader with search not available for Linux
-
3. Perspectives & Research Issues
Funding & collaborations
Person Names
DBLP Browser
Visualization
Classification / Clustering
-
Informatics Portal
DBLP will be a central part of a larger Informatics Portal
Other parts:
  CompuScience (FIZ Karlsruhe)
  Collection of CS Bibliographies (Achilles/Vollmer, Karlsruhe)
  LeaBib (Mayr, Munich)
-
Informatics Portal (2)
Funded by the German Federal Ministry of Education and Research, 3 years
In cooperation with the Gesellschaft für Informatik (German Computer Society)
2 open positions at Trier: service + research
-
Additional Contents
complete coverage of LNCS
more ACM & IEEE publications
cooperation with IFIP, Usenix, ...
cooperation with the libraries at Dagstuhl & Max-Planck-Institut für Informatik (Saarbrücken)
several German series (LNI, IA, IFB)
-
Number of records by publication year (24 August 2002)
-
Challenge: How to manage the exponential growth of (computer) science publications?
maintenance problem
lost in publication space
-
Person Names
widely accepted method to identify persons
names may not be unique
a person may change her/his name (marriage, emigration to another cultural environment)
variations of person names
-
abbreviations: Jeffrey D. Ullman, J. D. Ullman, J. Ullman, Jeff Ullman, ...
nicknames: Michael / Mike, William / Bill, Joseph / Joe
permutations: Liu Bin / Bin Liu
different transcriptions: Andrei / Andrej / Andrey
accents: Stephane / Stéphane
umlauts: Muller / Müller / Mueller
-
ligatures: Weiß / Weiss, Åström / Aastrom
case: Al-AAli / Al-Aali
hyphens: Hans-Peter / Hans Peter
composition: MaoLin / Mao Lin, Kenichi / Ken-ichi / Kenichi
postfixes: Karel Culik II, Jr. / Sr.
typos
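Several of the variations listed above (accents, umlauts written without dots, ß, case, hyphens, composition, permutations) can be collapsed by a simple comparison key. The sketch below is an assumption about how one might do this with Python's standard `unicodedata` module; it is not the procedure used for DBLP, and it deliberately does nothing about nicknames, abbreviations, transcriptions, or "ue"-style umlaut spellings.

```python
import unicodedata


def _tokens(name):
    """Case-fold, strip diacritics, and split on anything that is not a letter or digit."""
    folded = unicodedata.normalize("NFKD", name.casefold())
    chars = []
    for ch in folded:
        if unicodedata.combining(ch):
            continue  # drop accents, umlaut dots, rings
        chars.append(ch if ch.isalnum() else " ")
    return "".join(chars).split()


def compact_key(name):
    """Ignores case, accents, hyphens and spaces (MaoLin / Mao Lin)."""
    return "".join(_tokens(name))


def bag_key(name):
    """Additionally ignores the order of name parts (Liu Bin / Bin Liu)."""
    return " ".join(sorted(_tokens(name)))


print(compact_key("Stéphane"), compact_key("Hans-Peter"), bag_key("Liu Bin"))
```

Two names with the same key are only candidates for being the same person; as the following slides argue, the final decision needs more evidence than spelling alone.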
-
Person Names: Problems
there should be a 1:1 mapping between persons and person pages
how to search persons best?
how to normalize different spellings?
name normalization costs more than 60% of our time
-
Person Name Normalization
for each new entry we try to locate the authors/editors in the existing collection
if spellings differ, but we are confident that they are variations of the same person's name: make them equal
write out most parts of the name
person's preferred spelling?
-
for persons with many publications the name spelling usually converges to a stable & correct state
for persons with a few known publications it is more likely that there are duplicate person pages, incorrect or incomplete spellings
-
Heuristics in the decision process
coauthor relationship gives strong indications for the identity of persons
streams (journals/conference series): condensation points for communities, weaker indication
same keywords in titles
time frame?
a lot of background knowledge
-
Wanted: Tool to make DBLP maintenance more efficient
specialized weight functions for the name normalization problem
query: new entry = list of names, title, publication stream, year, ...
DBLP browser: framework for experiments
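A weight function of the kind asked for here could combine the heuristics named above. The following sketch is purely illustrative: the weights, the record layout, and the sample names are assumptions, chosen only so that shared coauthors count more than a shared stream, which in turn counts more than a shared title word.

```python
import re

STOPWORDS = {"a", "an", "the", "of", "for", "and", "in", "on", "to"}


def title_words(title):
    """Lower-cased title words without a few stop words."""
    return {w for w in re.findall(r"[a-z]+", title.lower()) if w not in STOPWORDS}


def match_score(entry, candidate, w_coauthor=3.0, w_stream=1.0, w_keyword=0.5):
    """Score how well an existing person (candidate) fits a name in a new entry."""
    shared_coauthors = len(set(entry["coauthors"]) & set(candidate["coauthors"]))
    same_stream = 1 if entry["stream"] in candidate["streams"] else 0
    shared_words = len(title_words(entry["title"]) & set(candidate["title_words"]))
    return w_coauthor * shared_coauthors + w_stream * same_stream + w_keyword * shared_words


# Hypothetical new entry and two existing persons with similar names.
entry = {"coauthors": {"B. Sample"}, "stream": "conf/vldb",
         "title": "Query Optimization in Databases"}
cand1 = {"name": "A. Example", "coauthors": {"B. Sample", "C. Test"},
         "streams": {"conf/vldb"}, "title_words": {"query", "optimization"}}
cand2 = {"name": "Alan Example", "coauthors": {"D. Other"},
         "streams": {"conf/crypto"}, "title_words": {"cryptography"}}

ranked = sorted([cand1, cand2], key=lambda c: match_score(entry, c), reverse=True)
print([(c["name"], match_score(entry, c)) for c in ranked])
```

A ranked list like this would only support the human decision; the time frame and background knowledge mentioned above are not modelled.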
-
DBLP Browser: Roles
maintenance tool
bibliographic tool for users of DBLP
  composition of reference lists
  export in popular formats like BibTeX
platform for experiments in visualization (SemIPort project)
-
DBLP Browser
main memory IR system
compressed representation of the bibliographic records
convenient graphical user interface
Java / Swing
first prototype implemented
-
author: Danny De Schreye
author: Maurice Bruynooghe
author: Kristof Verschaetse
title: On the Existence of Nonterminating Queries for a Restricted Class of PROLOG-Clauses.
pages: 237-248
year: 1989
volume: 41
journal: Artificial Intelligence
number: 2
url: db/journals/ai/ai41.html#SchreyeBV89
-
Representation of title fields:
construct canonical Huffman codes on word level [MG-Book]
degree of tree-nodes: 213 (0x2a-0xff)
lexicon: 3-in-4 front coding
Publications are sorted by journal/booktitle, year
-
[Figure: Lexicon of title words. Raw excerpt of the front-coded lexicon around "Comprehending", "Compress", "Compromising", "Comput"; count bytes print as characters: 0='*', 1='+', 2=',', 3='-', 4='.', 5='/', 6='0', 7='1', ...]
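The lexicon technique named above, 3-in-4 front coding, can be sketched as follows, assuming the usual reading from the MG book: in a sorted word list every fourth word is stored in full, and the three words after it store only the length of the prefix shared with their predecessor plus the remaining suffix. The byte layout of the real DBLP Browser lexicon is not reproduced here.

```python
def common_prefix_len(a, b):
    """Number of leading characters that a and b share."""
    n = 0
    for x, y in zip(a, b):
        if x != y:
            break
        n += 1
    return n


def front_encode(sorted_words, block=4):
    """Return (prefix_length, suffix) pairs; the first word of each block is kept whole."""
    entries = []
    for i, word in enumerate(sorted_words):
        if i % block == 0:
            entries.append((0, word))
        else:
            k = common_prefix_len(sorted_words[i - 1], word)
            entries.append((k, word[k:]))
    return entries


def front_decode(entries):
    """Rebuild the word list by reusing the prefix of the previously decoded word."""
    words, prev = [], ""
    for k, suffix in entries:
        prev = prev[:k] + suffix
        words.append(prev)
    return words


words = ["Compress", "Compressibility", "Compression", "Compressionless", "Compressions"]
encoded = front_encode(words)
print(encoded)
```

Keeping one full word per block means a binary search over the block heads still works, and at most three suffixes have to be expanded after it.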
-
[Diagram: record Danny De Schreye, Maurice Bruynooghe, Kristof Verschaetse, c1 c2 c3 ..., 237-248; the title codes c1 c2 c3 ... refer to the lexicon of title words]
-
Representation of author and editor fields:
construct list of all persons
sort them by number of publications
add inverted lists (publication numbers) to all persons
  Peter Smith: 17 36 1003 7790 8800
  Johanna Mayer: 36 2077 9002
-
Representation of inverted indexes
use d-gaps [MG-book]
variable-byte coding (base-128 numbers, 7-bit value + last-byte flag) [Scholer et al., SIGIR 2002]
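As a worked example of d-gaps plus variable-byte coding, the sketch below encodes the inverted list 17 36 1003 7790 8800 from the person example. One detail is an assumption: the high bit is set on the last byte of each number, which is one common convention; the slide only says that a last-byte flag exists.

```python
def to_dgaps(postings):
    """Replace ascending publication numbers by differences to the predecessor."""
    gaps, prev = [], 0
    for p in postings:
        gaps.append(p - prev)
        prev = p
    return gaps


def from_dgaps(gaps):
    """Inverse of to_dgaps: running sum."""
    postings, total = [], 0
    for g in gaps:
        total += g
        postings.append(total)
    return postings


def vbyte_encode(numbers):
    """7 payload bits per byte, most significant group first, flag bit on the last byte."""
    out = bytearray()
    for n in numbers:
        groups = []
        while True:
            groups.append(n & 0x7F)
            n >>= 7
            if n == 0:
                break
        for g in reversed(groups[1:]):
            out.append(g)
        out.append(groups[0] | 0x80)
    return bytes(out)


def vbyte_decode(data):
    numbers, n = [], 0
    for b in data:
        n = (n << 7) | (b & 0x7F)
        if b & 0x80:  # flag set: this byte ends the number
            numbers.append(n)
            n = 0
    return numbers


postings = [17, 36, 1003, 7790, 8800]
gaps = to_dgaps(postings)
encoded = vbyte_encode(gaps)
print(gaps, len(encoded))
```

The gaps are smaller than the original numbers, so more of them fit into a single byte; here the five entries need 8 bytes.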
-
[Figure: raw dump of a compressed person table entry (Alberto H. F. Laender)]
-
[Diagram: compressed record a1 a2 a3, c1 c2 c3 ..., 237-248, Artificial Intelligence; a1 a2 a3 refer to the person table, c1 c2 c3 ... to the lexicon of title words]
-
Representation of journal and booktitle fields:
construct lists of all journals and booktitles
add position of first publication with this journal/booktitle and count
  Artificial Intelligence  5000  603
  TODS  7890  400
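Because the publications are sorted by journal/booktitle, a (first position, count) pair is enough to identify all records of a journal. A minimal sketch using the two example rows; the function names are hypothetical.

```python
# (first publication number, count) per journal, as in the example rows above.
journal_table = {
    "Artificial Intelligence": (5000, 603),
    "TODS": (7890, 400),
}


def publication_range(table, journal):
    """All publication numbers that belong to the journal."""
    first, count = table[journal]
    return range(first, first + count)


def journal_of(table, pub_no):
    """Reverse lookup: which journal does a publication number fall into?"""
    for name, (first, count) in table.items():
        if first <= pub_no < first + count:
            return name
    return None


r = publication_range(journal_table, "Artificial Intelligence")
print(r[0], r[-1], journal_of(journal_table, 7900))
```

With this layout the journal name does not have to be stored in each record at all; it follows from the record's position.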
-
[Figure: raw dump of the compressed journal table; readable entries: CACM, TCS, IEEE Transactions on Computers, Information Processing Letters, TSE, IEEE Computer, JACM, The Computer Journal, SIAM J. Comput., Software - Practice and Experience, Artificial Intelligence, Information and Control, JCSS, IEEE Transactions on Pattern Analysis and Machine Intelligence]
-
[Diagram: compressed record a1 a2 a3, c1 c2 c3 ..., 237-248, j, 2, db/journals/ai/ai41.html#SchreyeBV89, referring to the lexicon of title words, the person table, the journal table and the booktitle table]
-
Representations of Paths: key, ,
construct paths dictionary: bottom-up tree of path elements
[Diagram: path tree with nodes conf; vldb, spire, sigir; Gray88, Ullman91]
-
[Diagram: compressed record for SchreyeBV89 with pointers into the lexicon of title words, person table, journal table, booktitle table, paths dictionary, series table and publishers table]
-
DBLP Browser: Compression
-
DBLP (Browser): next step
search
visualization
BHT files, but standard cases should not require BHT files: extension of the BibTeX data model
-
Extension of the BibTeX data model
[Diagram nodes: inproceedings, proceedings, conference, periodical, pvolume, article, stream, volume, paper; edges labeled crossref and cite]
-
Conference on Very Large Databases
VLDB
db/conf/vldb/index.html
conf/vldb/2002
conf/vldb/2001
-
User Interfaces for Bibliographic Collections
conventional wisdom: provide a good IR system
DBLP: browsing oriented interface + very simple search tools
Why is DBLP nevertheless used?
-
"there was little comprehensive searching by faculty researchers because they worked within specialized forums where they could more efficiently find the materials they needed. This precluded comprehensive searching to identify an exhaustive set of materials in a wider variety of forums. [...] They also used browsing to examine electronic announcements of conferences, and tables of contents of journals" [Lisa Covi, PhD thesis 1996]
-
Problems of the many search-oriented Web portals
How to query?
Description / characterization of the collection unknown
-
views, visualizations, diagrams
global characterizations of the collection (coarse navigation)
stream level diagrams
person page visualizations
-
global characterizations
-
[Figure: number of persons (y axis) by number of publications per person (x axis, 1 to 267). Head of the distribution: 120250 persons with 1 publication, 31568 with 2, 14528 with 3, 8570 with 4, 5624 with 5.]
Labeled points at the tail (number of publications: person):
  169: Avi Wigderson
  170: Oscar H. Ibarra
  172: Jeffrey Scott Vitter
  173: Joseph Halpern / Abraham Silberschatz
  175: Sushil Jajodia / Moti Yung
  187: Michael Stonebraker
  189: Kurt Mehlhorn
  191: Robert Endre Tarjan
  197: Oded Goldreich
  204: Elisa Bertino
  206: Micha Sharir
  218: Kang G. Shin
  220: Moshe Y. Vardi
  223: Christos H. Papadimitriou
  227: Jeffrey D. Ullman
  238: Hector Garcia-Molina
  245: Philip S. Yu
  267: Grzegorz Rozenberg
-
Classification
ACM CR System
classifications only available for a small subset of computer science literature
often criticized as too inflexible / too coarse
intellectual classification (on the paper level) is too expensive
-
Clustering on stream level
related conferences or journals: in which other streams did the authors of this stream publish most frequently?
todo:
  what is "most frequently"?
  what is the neighborhood of a stream?
-
[Diagram: streams CRYPTO, EUROCRYPT, STOC, FOCS, SIAM J. Co., ASIACRYPT, TCS, JCSS, IPL, ACISP, Publ. Key Cry., JoC]
-
Thank you for your attention
Porta Nigra: the most famous landmark of Trier