PSU/Villanova/VT Discussion
Virginia Tech’s Digital Library Research Laboratory
Jan. 10, 2005 -- PSU
Edward A. Fox
Virginia Tech, Blacksburg, VA 24061 USA
http://fox.cs.vt.edu/talks/
http://fox.cs.vt.edu/cv.htm
Acknowledgements (Selected)
• Sponsors: ACM, Adobe, AOL, CAPES, CNI, CONACyT, DFG, IBM, Microsoft, NASA, NDLTD, NLM, NSF (IIS-9986089, 0086227, 0080748, 0325579; ITR-0325579; DUE-0121679, 0136690, 0121741, 0333601), OCLC, SOLINET, SUN, SURA, UNESCO, US Dept. Ed. (FIPSE), VTLS
Acknowledgements: Faculty, Staff
• Lillian Cassel, Debra Dudley, Roger Ehrich, Joanne Eustis, Weiguo Fan, James Flanagan, C. Lee Giles, Eberhard Hilf, John Impagliazzo, Filip Jagodzinski, Rohit Kelapure, Neill Kipp, Douglas Knight, Deborah Knox, Aaron Krowne, Alberto Laender, Gail McMillan, Claudia Medeiros, Manuel Perez, Naren Ramakrishnan, Layne Watson, …
Acknowledgements: Students
• Pavel Calado, Yuxin Chen, Fernando Das Neves, Shahrooz Feizabadi, Robert France, Marcos Goncalves, Nithiwat Kampanya, S.H. Kim, Aaron Krowne, Bing Liu, Ming Luo, Paul Mather, Saverio Perugini, Unni Ravindranathan, Ryan Richardson, Rao Shen, Ohm Sornil, Hussein Suleman, Ricardo Torres, Wensi Xi, Xiaoyan Yu, Baoping Zhang, Qinwei Zhu, …
Stepping Stones & Pathways:
Improving retrieval by chains of relationships between document topics
Fernando Das-Neves, Virginia Tech DLRL
A Little Experiment (compare a simple query with a longer version that explicitly includes stepping stones)
• “Literary Style in Sherlock Holmes stories”
• Note: Numbers are total relevant web pages in top 20 Google results for the query made up of terms on either end of the link.
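[Figure: the query “Sherlock Holmes” + “Literary Style” issued directly (2 relevant documents) vs. routed through the stepping stones “Conan Doyle” and “Victorian Novel” (link counts 4, 5, 20, and 5). Axis: No. of relevant documents.]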
Another Example
• “What is the Relationship between Data Mining and Recommender Systems?”
• Naïve results: There are many matches that are possible answers.
• Discussion: However, many of the pages with co-occurrences give no real information about the requested relationship.
[Figure: Data Mining ↔ Recommender Systems queried directly vs. through the stepping stones Machine Learning, Collaborative Filtering, and Social Networks; the relevant-document counts shown on the links are 7; 10, 10; and 9, 11, 15.]
An Alternative Interpretation of a Query in IR:
• A query represents two related, separable concepts.
• Objective: Retrieve a sequence of documents that support a valid set of chains of relationships between the two concepts.
• Input: a query representing two concepts.
• Output: two groups of documents + a set of stepping stones (document groups, i.e., clusters) connecting the topics by pathways (relations among clusters).
Types of Questions Matching the Alternative Interpretation
• Ill-defined questions, with non-enumerated answers:
– “How or why is X related to Y?”
– “What is the X of Y?”
• Even if queries of the form “give me something about X” lead to relevant documents, the quantity and quality of information in the query result can be increased when relations are made explicit (as our semi-automatic method does).
Why is this useful?
• Questions of this type are common. For example, they often occur:
– during research studies;
– in educational settings, e.g., for homework;
– in workplace settings that require gathering and relating information.
• Current systems often handle this type of question inadequately.
How to Build Stepping Stones and Pathways?
• Our approach uses a belief network to combine content and structure in the document similarity calculation, including citation and co-citation similarities.
• Find two relevant document sets, each related to one of the two original sub-queries.
• Find a diverse set of strong candidates, each connecting the two subsets but as different as possible from the other candidates.
• Create stepping stones by finding documents similar to those candidates; keep the clusters that are heavily cited or whose documents are highly correlated (in all aspects).
• Repeat the process, finding a new stepping stone between each pair of weakly related clusters, until the pathway becomes too long or the similarity is sufficient.
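The pathway-building steps above can be sketched as a greedy loop. This is a minimal sketch under stated assumptions, not the DLRL implementation: clusters are summarized by sparse term-weight centroids, plain cosine similarity stands in for the belief-network combination of content, citation, and co-citation evidence, and the names `Cluster`, `build_pathway`, `threshold`, and `max_len` are hypothetical.

```python
import math
from dataclasses import dataclass, field
from typing import Callable, List, Sequence


@dataclass
class Cluster:
    """A document group (candidate stepping stone) summarized by a sparse term-weight centroid."""
    name: str
    centroid: dict = field(default_factory=dict)


def cosine(a: Cluster, b: Cluster) -> float:
    """Cosine similarity of two centroids; 0.0 if either centroid is empty.

    Stand-in (assumption) for the belief-network similarity described above.
    """
    dot = sum(w * b.centroid.get(t, 0.0) for t, w in a.centroid.items())
    na = math.sqrt(sum(w * w for w in a.centroid.values()))
    nb = math.sqrt(sum(w * w for w in b.centroid.values()))
    return dot / (na * nb) if na and nb else 0.0


def build_pathway(start: Cluster, end: Cluster, candidates: Sequence[Cluster],
                  sim: Callable[[Cluster, Cluster], float] = cosine,
                  threshold: float = 0.3, max_len: int = 6) -> List[Cluster]:
    """Greedily insert stepping stones between the weakest adjacent pair of clusters.

    Stops when every link reaches `threshold`, the path reaches `max_len`
    clusters, the candidate pool is empty, or no candidate improves the weakest link.
    """
    path = [start, end]
    pool = list(candidates)
    while len(path) < max_len and pool:
        # Locate the weakest link in the current pathway.
        weakest, i = min((sim(path[k], path[k + 1]), k) for k in range(len(path) - 1))
        if weakest >= threshold:
            break  # similarity is sufficient everywhere
        left, right = path[i], path[i + 1]

        def bridge(c: Cluster) -> float:
            # A good stepping stone is related to BOTH ends of the weak link.
            return min(sim(left, c), sim(c, right))

        best = max(pool, key=bridge)
        if bridge(best) <= weakest:
            break  # no remaining candidate strengthens this link
        path.insert(i + 1, best)
        pool = [c for c in pool if c is not best]
    return path
```

For example, given centroids for “Data Mining” and “Recommender Systems” that share no terms, plus a “Collaborative Filtering” cluster overlapping both, `build_pathway` returns Data Mining → Collaborative Filtering → Recommender Systems. The `sim` parameter is where citation or co-citation evidence would be plugged in.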
Streams, Structures, Spaces, Scenarios, and Societies (5S): A Formal Digital Library Framework and Its Applications
Marcos André Gonçalves
Doctoral defense
Virginia Tech, Blacksburg, VA 24061 USA
Informal 5S Definition: DLs are complex systems that
• help satisfy info needs of users (societies)
• provide info services (scenarios)
• organize info in usable ways (structures)
• present info in usable ways (spaces)
• communicate info with users (streams)
5Ss (S: examples; objective)
• Streams (text, video, audio, image): describe properties of the DL content, such as encoding and language for textual material, or particular forms of multimedia data.
• Structures (collection, catalog, hypertext, document, metadata): specify organizational aspects of the DL content.
• Spaces (measure; measurable, topological, vector, probabilistic): define logical and presentational views of several DL components.
• Scenarios (searching, browsing, recommending): detail the behavior of DL services.
• Societies (service managers, learners, teachers, etc.): define service managers, responsible for running DL services, and actors, who use those services.
Hypotheses
• A formal theory for DLs can be built based on 5S.
• The formalization can serve as a basis for modeling and building high-quality DLs.
5S Framework and DL Development (Gonçalves)
[Figure: DL development life cycle (requirements, analysis, design, implementation, test) supported by the 5S formal theory/metamodel, 5SL, 5SGraph, 5SLGen (OO classes, workflow, components), the DL XML log, and DL evaluation.]
5SLGen: Automatic DL Generation
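[Figure: a DL expert and a DL designer use 5SGraph with the 5S metamodel to build a 5SL DL model; 5SLGen combines the model with a component pool (ODLSearch, ODLBrowse, ODLRate, ODLReview, …) to produce tailored DL services for practitioners, researchers, and teachers. Stages shown: requirements (1), analysis (2), design (3), implementation (4).]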
Research Questions
1. Can we formally elaborate 5S?
2. How can we use 5S to formally describe digital libraries?
3. What are the fundamental relationships among the Ss and high-level DL concepts?
4. How can we allow digital librarians to easily express those relationships?
5. What are the fundamental quality properties of a DL? Can we use the formalized DL framework to characterize those properties?
6. Where in the life cycle of digital libraries can key aspects of quality be measured and how?
Outline
• Motivation: the problem
– Hypotheses and research questions
• Part 1: Theory
– 5S: introduction, formal definitions
– The formal ontology
• Part 2: Tools/Applications
– Language
– Visualization
– Generation
– Logging
• Part 3: Quality
• Conclusions, Future Work
5S and DL formal definitions and compositions (April 2004 TOIS)
[Figure: dependency diagram of the numbered formal definitions, from mathematical preliminaries (relation, function, sequence, tuple, language, graph, grammar, event, state, transmission; measurable, measure, probability, vector, and topological spaces) through the 5S (streams, structures, spaces, scenarios, societies) to services, structured stream, digital object, collection, metadata catalog, repository, structural and descriptive metadata specifications, hypertext, the indexing, searching, and browsing services, and the minimal digital library (d. 38).]
Digital Library Formal Ontology
[Figure: concepts grouped under Streams (text, audio, image, video), Structures, Spaces (topological, probabilistic, metric, measurable, measure, vector), Scenarios, and Societies, linked by relations such as describes, stores, is_version_of/cites/links_to, extends/reuses, executes, participates_in, recipient, runs, inherits_from/includes, association, uses, employs/produces, belongs_to, contains, is_a, precedes/happens_before, and redefines/invokes.]
Composition of key infrastructure services and of additional services
[Figure: infrastructure services (acquiring, submitting, authoring, digitizing, describing/cataloguing, indexing, linking) build the universal collection, collections, metadata catalogs, indexes, and hypertext; additional information satisfaction services (searching, browsing, recommending, filtering, binding, visualizing, expanding query, rating, training, requesting) are composed on top of them and invoked by the actors of a society.]
Ontology: Taxonomy of Services
• Information Satisfaction Services
– Binding, Browsing, Customizing, Disseminating, Expanding (query), Filtering, Recommending, Requesting, Searching
• Infrastructure Services
– Add Value: Annotating, Classifying, Clustering, Evaluating, Extracting, Indexing, Linking, Logging, Measuring, Rating, Reviewing (peer), Surveying, Training (classifier), Translating, Visualizing
– Repository-Building, Creational: Acquiring, Authoring, Cataloging, Crawling (focused), Describing, Digitizing, Harvesting, Submitting
– Repository-Building, Preservational: Conserving, Converting, Copying/Replicating, Translating (format)
5SL: a DL Modeling language
• Domain-specific languages
– Address a particular class of problems by offering specific abstractions and notations for the domain at hand
– Advantages: domain-specific analysis, program management, visualization, testing, maintenance, modeling, and rapid prototyping.
• XML-based realization of 5S
– Interoperability
– Use of many standard sub-languages (e.g., MIME types, XML Schemas, UML notations)
5SGen – Version 2: ODL, Services, Scenarios
[Figure: 5SGen pipeline]
Model: 5SL-Societies model (1) → XPath/JDOM transform (2) → XMI class model (3) → Xmi2Java (4) → Java classes (model) (5)
Controller: 5SL-Scenario model (6) → XPath/JDOM transform (7) → statechart model (8) → scenario synthesis (9) → deterministic FSM (10) → SMC (11) → Java finite state machine class (controller) (12)
View: JSP user interface (13)
Components: ODLSearch, ODLBrowse, … from the component pool, wrapped in Java and imported; the DL designer binds them into the generated DL services.
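The controller half of this pipeline turns scenarios into a deterministic finite state machine before SMC generates the Java class. The sketch below illustrates only that idea, in Python: scenario traces given as (state, event, next state) triples are merged into one transition table, conflicting transitions are rejected, and a controller enforces the table at run time. The scenario contents and the names `synthesize` and `ScenarioController` are hypothetical; this is not SMC's input format or generated code.

```python
from typing import Dict, Iterable, List, Tuple

Transition = Tuple[str, str, str]  # (state, event, next_state)


def synthesize(scenarios: Iterable[Iterable[Transition]]) -> Dict[Tuple[str, str], str]:
    """Merge scenario traces into one transition table, rejecting non-determinism."""
    table: Dict[Tuple[str, str], str] = {}
    for trace in scenarios:
        for state, event, nxt in trace:
            previous = table.setdefault((state, event), nxt)
            if previous != nxt:
                raise ValueError(f"non-deterministic: ({state}, {event}) -> {previous} or {nxt}")
    return table


class ScenarioController:
    """Deterministic FSM controller: exactly one next state per allowed (state, event)."""

    def __init__(self, table: Dict[Tuple[str, str], str], initial: str = "Idle"):
        self.table = dict(table)
        self.state = initial
        self.history: List[str] = [initial]

    def fire(self, event: str) -> str:
        key = (self.state, event)
        if key not in self.table:
            raise ValueError(f"event {event!r} is not allowed in state {self.state!r}")
        self.state = self.table[key]
        self.history.append(self.state)
        return self.state


# Hypothetical scenarios for a searching/browsing DL service.
SEARCH_SCENARIO = [("Idle", "submit_query", "Results"),
                   ("Results", "select", "Record"),
                   ("Record", "back", "Results")]
BROWSE_SCENARIO = [("Idle", "open_browse", "Browsing"),
                   ("Browsing", "select", "Record")]
TABLE = synthesize([SEARCH_SCENARIO, BROWSE_SCENARIO])
```

Keeping the table as a dictionary keyed by (state, event) makes determinism structural: a second, different next state for the same key can only arise during synthesis, where it is reported instead of silently overwritten.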
The XML Log Format
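[Figure: element hierarchy of the XML log format, by level. Level 1: Log. Level 2: SessionId, MachineInfo, Statement, Transaction, Timestamp. Level 3: SessionInfo, RegisterInfo, Event, ErrorInfo. Level 4: Action. Level 5: Search, Browse, Store, SysInfoUpdate. Level 6: SearchBy, QueryString, Catalog, Collection, PresentationInfo, StatusInfo, Timeout.]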
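The element names in the log format can be exercised with Python's standard `xml.etree.ElementTree`. The nesting used here (Log → Statement → Event → Action → Search) is an assumption read off the figure's levels, and `search_log_entry` and `count_actions` are hypothetical helpers, not part of any published schema or tool.

```python
import xml.etree.ElementTree as ET
from collections import Counter
from datetime import datetime, timezone
from typing import Iterable, Optional


def search_log_entry(session_id: str, query: str, search_by: str, collection: str,
                     when: Optional[datetime] = None) -> str:
    """Serialize one search action as a <Log> element (the nesting is an assumption)."""
    when = when or datetime.now(timezone.utc)
    log = ET.Element("Log")
    ET.SubElement(log, "SessionId").text = session_id
    ET.SubElement(log, "Timestamp").text = when.isoformat()
    statement = ET.SubElement(log, "Statement")
    action = ET.SubElement(ET.SubElement(statement, "Event"), "Action")
    search = ET.SubElement(action, "Search")
    ET.SubElement(search, "SearchBy").text = search_by
    ET.SubElement(search, "QueryString").text = query
    ET.SubElement(search, "Collection").text = collection
    return ET.tostring(log, encoding="unicode")


def count_actions(entries: Iterable[str]) -> Counter:
    """Count action types (Search, Browse, ...) across serialized log entries."""
    counts: Counter = Counter()
    for xml_text in entries:
        for action in ET.fromstring(xml_text).iterfind("Statement/Event/Action"):
            counts.update(child.tag for child in action)
    return counts
```

Because each entry is plain XML, the same path expressions used by `count_actions` can drive other analyses, such as grouping query strings by session.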
Quality and the Information Life Cycle
[Figure: the information life cycle, with phases creation, distribution, utilization, and seeking, and active, semi-active, and inactive states; activities shown include authoring, modifying, describing, organizing, indexing, storing, archiving, networking, accessing, filtering, searching, browsing, recommending, retention, mining, and discard; quality dimensions attached along the cycle include accuracy, completeness, conformance, similarity, pertinence, relevance, significance, timeliness, accessibility, and preservability.]
Rao Shen’s Preliminary Exam: Hypothesis and Research Questions
• The 5S framework provides effective solutions to DL integration. Can it:
– formally define the DL integration problem?
– guide integration of domain-focused DLs?
• How to formally model such domain-specific DLs?
• How to integrate formally defined DL models into a union DL model?
• How to use the union DL model to help design and implement high-quality integrated DLs?
– assess the integration?
Related Work
[Figure: concept map of DL interoperability approaches. Intermediary-based approaches consist of mediators, wrappers, and agents; mapping-based approaches consist of hybrid and composite mappers that use schema mapping (examples: SemInt, LSD). Two architectures, federation and union archiving, are used in these approaches. Also shown: a GA (“trained by”) and the DL integration formalization (“based on”), interrelated with the above.]
Formal Definition of DL Integration
• DL_i = (R_i, DM_i, Serv_i, Soc_i), 1 ≤ i ≤ n, where
– R_i is a network-accessible repository
– DM_i is a set of metadata catalogs for all collections
– Serv_i is a set of services
– Soc_i is a society
• UnionRep
• UnionCat
• UnionServices
• UnionSociety
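Read literally, the union definition builds each component of the union DL from the corresponding components of DL_1 … DL_n. The sketch below assumes catalog records are already in a common format (in practice, harvesting and mapping come first) and prefixes collection names with their source DL to keep provenance; the `DL` class and the `union_dl` function are hypothetical.

```python
from dataclasses import dataclass, field
from typing import Dict, Iterable, Set


@dataclass
class DL:
    """DL_i = (R_i, DM_i, Serv_i, Soc_i), reduced to plain containers (illustrative only)."""
    repository: Dict[str, Set[str]] = field(default_factory=dict)       # R_i: collection -> object ids
    catalogs: Dict[str, Dict[str, dict]] = field(default_factory=dict)  # DM_i: collection -> {id: record}
    services: Set[str] = field(default_factory=set)                     # Serv_i
    society: Set[str] = field(default_factory=set)                      # Soc_i


def union_dl(dls: Iterable[DL]) -> DL:
    """Build UnionRep, UnionCat, UnionServices, and UnionSociety component-wise."""
    union = DL()
    for i, dl in enumerate(dls, start=1):
        for coll, objects in dl.repository.items():
            union.repository[f"DL{i}/{coll}"] = set(objects)  # prefix keeps provenance
        for coll, records in dl.catalogs.items():
            union.catalogs[f"DL{i}/{coll}"] = dict(records)
        union.services |= dl.services  # copies into the union's own set; inputs are unchanged
        union.society |= dl.society
    return union
```

For example, uniting a DL whose society is archaeologists with one whose society is the general public yields a union society containing both, mirroring the union DL architecture that follows.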
Architecture of a Union DL
[Figure: two DLs (DL1 and DL2), each with its own repository, catalog, society, and services such as searching or browsing, with the societies shown being archaeologists and the general public, combine into a union DL with a union repository, a union catalog, a union society (archaeologists, general public), and union services (harvesting, mapping, searching, browsing, clustering, visualization).]
CitiViz: A Visual User Interface to the CITIDEL System
ECDL 2004, Bath, England, September 2004
Nithiwat Kampanya, Rao Shen, Seonho Kim, Chris North, and Edward A. Fox
http://fox.cs.vt.edu
A Minimal DL in the 5S Framework
[Figure: the 5S (streams, structures, spaces, scenarios, societies) underlie structured streams, hypertext, descriptive and structural metadata specifications, digital objects, collections, metadata catalogs, the repository, and the indexing, searching, and browsing services that together form a minimal DL.]
A Minimal ArchDL in the 5S Framework
[Figure: the same construction specialized to archaeology: an archaeological descriptive metadata specification (with SpaTemOrg and StraDia), archaeological digital objects (ArchDO, ArchObj), collections (ArchColl, ArchDColl), a metadata catalog, and a repository (ArchDR) form a minimal ArchDL.]
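[Figure: ETANA-DL design and architecture. An ArchDL expert and an ArchDL designer use 5SGraph with the 5S archaeology metamodel to produce structure and scenario sub-models describing the ETANA-DL union services (harvesting, mapping, searching, browsing, …); 5SGen generates services from the component pool. Site catalogs (VNCatalog, HDCatalog) in the VN and HD metadata formats are exposed via XOAI, wrapped (Wrapper4VN, Wrapper4HD), and converted by a mapping tool into the ETANA-DL metadata format in a union catalog; a search service (index over inverted files), a browse service (index and browse DB), and other ETANA-DL services, backed by a services DB, are reached through a web interface.]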
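The mapping step in this architecture converts each site's metadata into the union format before records enter the union catalog. A minimal crosswalk-based sketch follows; the VN and HD field names, the union field names, and the helpers `to_union` and `build_union_catalog` are all hypothetical, and learned schema matching (as in the related-work slide) is not attempted.

```python
from typing import Dict, Iterable

# Hypothetical crosswalks: site-specific element name -> union-catalog element name.
CROSSWALKS: Dict[str, Dict[str, str]] = {
    "VN": {"objNo": "identifier", "desc": "description", "area": "area", "period": "period"},
    "HD": {"id": "identifier", "description": "description", "square": "area", "era": "period"},
}


def to_union(site: str, record: Dict[str, str]) -> Dict[str, object]:
    """Map one harvested record into the (hypothetical) union format."""
    mapping = CROSSWALKS[site]
    out: Dict[str, object] = {"source": site}
    unmapped: Dict[str, str] = {}
    for name, value in record.items():
        if name in mapping:
            out[mapping[name]] = value
        else:
            unmapped[name] = value  # keep rather than silently drop
    if "identifier" not in out:
        raise ValueError(f"{site} record has no identifier field")
    out["identifier"] = f"{site}:{out['identifier']}"  # avoid id collisions across sites
    out["unmapped"] = unmapped
    return out


def build_union_catalog(harvested: Dict[str, Iterable[Dict[str, str]]]) -> Dict[str, Dict[str, object]]:
    """Union catalog keyed by site-qualified identifier."""
    catalog: Dict[str, Dict[str, object]] = {}
    for site, records in harvested.items():
        for record in records:
            mapped = to_union(site, record)
            catalog[str(mapped["identifier"])] = mapped
    return catalog
```

Keeping unmapped fields alongside each record makes gaps in a crosswalk visible, which is useful when refining the mapping for a new site.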
Computing and Information Technology Interactive Digital Educational Library (CITIDEL)
• Domain: computing / information technology
• Genre: one-stop-shopping for teachers & learners: courseware (CSTC, JERIC), leading DLs (ACM, IEEE-CS, DB&LP, CiteSeer), PlanetMath.org, NCSTRL (technical reports), …
• Submission & Collection: sub/partner collections www.citidel.org
• Led by Virginia Tech, with co-PIs:
– Fox (director, DL systems)
– Lee (history)
– Perez (user interface, Spanish support)
– Students: Ryan Richardson, Kate McDevitt, Jon Pryor, Baoping Zhang
• Partners
– College of New Jersey (Knox)
– Hofstra (Impagliazzo)
– Villanova (Cassel)
– Penn State (Giles)
Digital library architecture for local and interoperable CITIDEL services
[Figure: three layers. Portals serve educators, administrators, and learners; services include multilingual searching, revising, annotating, filtering, browsing, and administering, backed by annotations, filtering profiles, and user profiles; repositories hold union metadata gathered by an OAI data harvester and exposed through an OAI data provider to remote and peer digital libraries (e.g., NSDL-CIS).]
CITIDEL Technology Features
• Component architecture (Open Digital Library)
– Re-use and compose re-deployable digital library components.
• Built using open standards and technologies
– OAI: used to collect DL resources and for DL interoperability
– XSL and XML: interface rendering, with multilingual, community-based translation of screens and content (Spanish, …)
– Perl: component integration
– ESSEX: search engine functionality; very fast, using in-memory processing, with snapshots for persistence
• Multi-scheming (Aaron Krowne, now at Emory U. Library)
– Integrates multiple classifications/views through maps and closure
• Extensions: clustering, visualization, personalization, …
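OAI-PMH harvesting, used above to collect DL resources, pages through `ListRecords` responses using resumption tokens. This sketch uses only the standard library. The verbs, the `metadataPrefix` and `resumptionToken` arguments, and the OAI 2.0 namespace come from the OAI-PMH specification. `harvest` and its injectable `fetch` parameter are hypothetical, and error handling beyond stopping on a response without records is omitted.

```python
import urllib.parse
import urllib.request
import xml.etree.ElementTree as ET
from typing import Callable, Iterator

OAI_NS = {"oai": "http://www.openarchives.org/OAI/2.0/"}


def http_get(url: str) -> bytes:
    """Default fetcher: plain HTTP GET."""
    with urllib.request.urlopen(url) as resp:
        return resp.read()


def harvest(base_url: str, metadata_prefix: str = "oai_dc",
            fetch: Callable[[str], bytes] = http_get) -> Iterator[ET.Element]:
    """Yield every <record> from a repository, following resumption tokens."""
    params = {"verb": "ListRecords", "metadataPrefix": metadata_prefix}
    while True:
        root = ET.fromstring(fetch(base_url + "?" + urllib.parse.urlencode(params)))
        list_records = root.find("oai:ListRecords", OAI_NS)
        if list_records is None:
            return  # e.g., an <error code="noRecordsMatch"> response
        yield from list_records.findall("oai:record", OAI_NS)
        token = list_records.find("oai:resumptionToken", OAI_NS)
        if token is None or not (token.text or "").strip():
            return  # absent or empty token: the list is complete
        # resumptionToken is an exclusive argument: it replaces metadataPrefix.
        params = {"verb": "ListRecords", "resumptionToken": token.text.strip()}
```

Injecting `fetch` keeps the paging logic testable offline; a union-metadata service would pass each yielded record on to its mapping and indexing steps.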
CITIDEL + PIPE
• Adds interaction personalization to CITIDEL
• Automatically handles multi-modal conversion to cell phones, PDAs, etc.
• Can be adapted to any digital data set, requiring only an XML file of the content with its hierarchy maintained.
Naren Ramakrishnan and Saverio Perugini (U. Dayton)
OCKHAM Library Network (NSDL)
[Figure: the OCKHAM Library Network links NSDL services and library services through OCKHAM services for teachers, learners, and librarians.]
OCKHAM (Ming Luo)
• Simplicity (à la Occam’s razor)
• Support by Mellon and DLF
• Four main ideas:
1. Components
2. Lightweight protocols
3. Open reference models (e.g., 5S, OAIS)
4. Community perspective and involvement
• Funded by NSF in NSDL, with P2P, with Emory, Notre Dame, Oregon State, …
OCKHAM Proposed Services
• Alerting
• Browsing
• Cataloging
• Conversion
• OAI – Z39.50
• Pathfinding
• Registry
• (plus others, such as those adapted from ODL)
A Digital Library Case Study
• Domain: graduate education, research
• Genre: ETDs (electronic theses & dissertations)
• Submission: http://etd.vt.edu
• Collection: http://www.theses.org
Project: Networked Digital Library of Theses & Dissertations (NDLTD) http://www.ndltd.org (supported by Ming Luo)
LOCKSS Extensions: Bing Liu, Xiaoyu Zhang, Ji-Sun Kim
• LOCKSS: “Lots of copies keep stuff safe”
• Stanford (Vicky Reich)
• Initial focus on lower levels, journals
• Shift to OAI, esp. for ETDs
• Collaboration with Emory (Martin Halbert)
– NDIIPP: AmericanSouth, MetaArchive
– Help deploy and adapt; apply in other contexts
• Another registry
• Set of publisher manifests (information providers)
• Set of storage systems (archival storage)
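The idea behind “lots of copies keep stuff safe” is that independent copies can detect and repair damage by comparing content. The sketch below reduces that to a single majority vote over SHA-256 digests. The real LOCKSS polling protocol is considerably more elaborate, and `audit_and_repair` and the peer names are hypothetical.

```python
import hashlib
from collections import Counter
from typing import Dict, List, Tuple


def digest(content: bytes) -> str:
    """Content fingerprint compared across peers."""
    return hashlib.sha256(content).hexdigest()


def audit_and_repair(copies: Dict[str, bytes]) -> Tuple[Dict[str, bytes], List[str]]:
    """Majority-vote audit: return repaired copies and the peers whose copy disagreed.

    Raises RuntimeError when no digest holds a strict majority, since the
    correct version then cannot be decided by voting alone.
    """
    if not copies:
        return {}, []
    votes = Counter(digest(c) for c in copies.values())
    winner, count = votes.most_common(1)[0]
    if count * 2 <= len(copies):
        raise RuntimeError("no majority; cannot decide which copy is correct")
    good = next(c for c in copies.values() if digest(c) == winner)
    damaged = sorted(peer for peer, c in copies.items() if digest(c) != winner)
    # Repair by replacing each damaged copy with a copy from the majority.
    repaired = {peer: (good if peer in damaged else c) for peer, c in copies.items()}
    return repaired, damaged
```

The input dictionary is left unchanged. Applied to a set of ETD copies held by cooperating archives, a single bit-rotted copy is outvoted and replaced.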
Open Digital Library
[Figure: an open digital library built from components that communicate via OAI-PMH and extended OAI-PMH (XOAI-PMH), over repositories of programs, documents, images, and video.]
Hussein Suleman (Cape Town, South Africa)
ETD DL for the Networked Digital Library of Theses and Dissertations (www.ndltd.org)
[Figure: ETD collections (ETD-1 to ETD-4) are merged by ODLUnion components and exposed via OAI-PMH to ODLSearch, ODLBrowse, and ODLRecent components (with filters), which students and researchers reach through a user interface.]
Example Open Digital Library
Open Digital Library Deployments
• NDLTD (www.ndltd.org)
• Computer Science Teaching Center (www.cstc.org)
• Computing and Information Technology Interactive Digital Educational Library (www.citidel.org)
• Open Archives Distributed (NSF, DFG) – enhancements to PhysNet
• OCKHAM
• Open to others through DL-in-a-box
Interest-based User Grouping Model
for Collaborative Filtering in Digital Libraries
7th ICADL 2004
Shanghai, P.R. China
Dec. 15, 2004
Edward A. Fox, Seonho Kim
Virginia Tech, Blacksburg, VA 24061 USA
Some Other Students/Projects
• Wensi Xi: matrices, reinforcement, clusters (Microsoft)
• Paul Mather: modeling/simulation of large DLs on clusters; characterization: uses, files (NASA)
• Ming Luo: personalization aided by demographics
• Ryan Richardson: CLIR with concept maps
• Xiaoyan Yu: Stepping Stones and Pathways (NSF; Fernando Das Neves completed & returned to Argentina)
• Baoping Zhang: physics and classification (NSF, DFG)
• Several: TREC with GP
• New projects:
– Superimposed information w. PSU (NSF NSDL)
– Quality and metasearch and structure w. Emory (IMLS)
• …