Terminology in Statistical Information Integration Tasks: What’s the Problem?
Transcript of Terminology in Statistical Information Integration Tasks: What's the Problem?
Open Forum 2003 on Metadata Registries
Thursday, January 23, 2003, 2:00-2:45 pm
Sheila O. Denn
Introduction
This work was undertaken as part of an NSF grant (EIA 0131824) to study the integration of data and interfaces, working toward a Statistical Knowledge Network.
This talk focuses on results from the first phase of a metadata user study to determine what kinds of problems users have with terminology and metadata on government statistical web sites.
[Diagram: four agencies, each with its own backend data behind a firewall and its own intermediary (reports, tables, "planned" DB queries); the end user is a generally passive reader with little interaction who must do all integration.]
Current Situation: each agency has its own backend data and provides its own intermediary. End user has little opportunity for interaction or active manipulation. Burden of finding information and integrating it across agencies (and occasionally within one agency) is on the user.
[Diagram: each agency's backend data sits behind its firewall and feeds a shared public intermediary; a statistical ontology, plus domain ontologies from domain expert and end user communities, support the intermediary; user interfaces connect many end users to it.]
Goal: In the SKN, each agency has its own backend data, which feeds into a common public intermediary (PI) outside of the firewall: variable/concept level, XML-based, a single point of access to information from all agencies. User interfaces link to the PI under user control. End users interact with data from an information/concept perspective, not an agency perspective.
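As a purely illustrative sketch of what variable-level, XML-based access might look like: the record layout, element names, and sample content below are invented for this talk, not the actual PI schema.

```python
# Minimal sketch of reading a hypothetical variable-level record from the
# public intermediary (PI). The XML layout and element names are invented
# for illustration; the talk does not specify the actual PI schema.
import xml.etree.ElementTree as ET

SAMPLE_RECORD = """
<variable id="cpi-u">
  <label>Consumer Price Index, All Urban Consumers</label>
  <agency>BLS</agency>
  <concept>price index</concept>
  <definition level="general">An index combines numbers measuring
    different things into a single number.</definition>
</variable>
"""

record = ET.fromstring(SAMPLE_RECORD)
print(record.get("id"), "-", record.findtext("label"))
print("Agency:", record.findtext("agency"))
print("Definition:", " ".join(record.findtext("definition").split()))
```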
What kinds of problems does terminology cause for users?
[Diagram: user terms vs. agency terms in three problem types. Miss: the user's term has no linked agency term (Term_user -> ??), or the user meets an agency term with no counterpart (?? <- Term_agency). Collision: Term_user and Term_agency conflict, including clusters of similar user terms. Categorization: the user's category groupings (Term_user category) do not line up with the agency's (Term_agency category).]
What kinds of problems does terminology cause for users?
Misses
- There is no agency term or concept that is linked to a term or concept that the user is interested in, or
- The user encounters a term on the system with which she is unfamiliar or about which she has only a vague understanding.
- Examples: seasonal adjustment; consumption vs. production; farm profits vs. market value of agricultural products
What kinds of problems does terminology cause for users?
Collisions
- A user has an understanding of a concept that is different from the way the concept is expressed by the agency.
- The same term is used differently by different agencies, making integration of data difficult.
- Can also apply to clusters of terms where it is not clear what the distinction between them is.
- Examples: labor, labor force, labor supply, workforce, labor force participation rate, labor market; full-time employment; sector

Categorization
- When category groupings do not make sense to the user.
- Example: soybeans
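To make the miss and collision cases concrete, here is a minimal sketch; the glossaries, terms, and definitions are invented stand-ins, not real agency vocabularies.

```python
# Toy illustration of "miss" and "collision": the glossaries below are
# invented stand-ins for agency vocabularies. A "categorization" problem
# would involve comparing category hierarchies and is omitted here.
AGENCY_GLOSSARIES = {
    "Agency A": {"labor force": "persons working or actively seeking work"},
    "Agency B": {"labor force": "civilian noninstitutional population 16+ "
                                "that is employed or unemployed"},
}

def classify(user_term: str) -> str:
    definitions = {
        agency: glossary[user_term]
        for agency, glossary in AGENCY_GLOSSARIES.items()
        if user_term in glossary
    }
    if not definitions:
        return "miss: no agency term is linked to the user's term"
    if len(set(definitions.values())) > 1:
        return "collision: agencies use the same term differently"
    return "match: the agencies agree on the term"

print(classify("workforce"))    # miss
print(classify("labor force"))  # collision
```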
Data Collection
In previous work:
- Transaction logs
- User queries
- Interviews
In the first phase of the current study: interviews with agency and non-agency domain experts.
These sources of evidence yielded categories of terms that can cause difficulty.
Categories of Terms
- Statistical terms
- Date/currency/time
- Geography
- Domain terms
- User terms
Implications for Vocabulary Support Tools
Goals:
- Provide a basic level of statistical literacy
- Not intended to be a highly technical or comprehensive resource
- Include terms users frequently encounter while browsing statistical agency sites

Sources of Evidence:
- Terminology used on frequently visited pages
- Anecdotal evidence from agency and non-agency consultants
- Metadata user study
- Web crawl of agency sites
Implications for Vocabulary Support Tools
- We need to explore how we can use metadata to map between the user terms and the agency terms, and between terms as used by different agencies (a minimal mapping sketch follows this list).
- Users are not likely to browse the glossary as a distinct activity, so they need "just-in-time" vocabulary support.
- Vocabulary support should allow users to remain in context, not lose sight of the task they are working on.
- Context specificity: explanations should be provided at varying levels of specificity:
  - General (context-free or "universal")
  - Agency- or context-specific (term as used by a particular agency or within a particular domain)
  - Table- or statistic-specific (term as it relates to a particular row, column, or statistic)
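One possible shape for such a mapping, sketched with invented concepts and terms (this is not the project's metadata design): a crosswalk that links a user term to an agency's preferred term through a shared concept.

```python
# Hypothetical crosswalk linking user vocabulary to agency vocabulary through
# a shared concept; every concept and term here is invented for illustration.
from typing import Optional

USER_TO_CONCEPT = {
    "jobs": "employment",
    "workforce": "labor force",
    "paycheck": "earnings",
}

CONCEPT_TO_AGENCY_TERM = {
    ("labor force", "Agency A"): "Civilian Labor Force",
    ("labor force", "Agency B"): "Labor Force Status",
    ("earnings", "Agency A"): "Average Hourly Earnings",
}

def agency_term(user_term: str, agency: str) -> Optional[str]:
    """Map a user term to the given agency's preferred term, if any."""
    concept = USER_TO_CONCEPT.get(user_term, user_term)
    return CONCEPT_TO_AGENCY_TERM.get((concept, agency))

print(agency_term("workforce", "Agency A"))  # Civilian Labor Force
print(agency_term("workforce", "Agency B"))  # Labor Force Status
```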
Implications for Vocabulary Support Tools
- Provide explanations of a term or concept that are as relevant to the user's current context as possible.
- The most specific explanations available should be offered at the time a user first invokes help.
- If there are no explanations appropriate for a specific statistic, row, or column, offer an explanation one level up in generality (see the sketch after this list).
- Pathways from specific to general will be based on a statistical ontology currently under development.
- The ontology will also be used to provide patterns (templates) for definitions at each level of specificity.
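A minimal sketch of that specific-to-general fallback; the level names, their ordering, and the stored explanations are assumptions, since the actual pathways will come from the statistical ontology.

```python
# Sketch of "offer the most specific explanation available, else fall back one
# level up in generality". The level ordering and stored explanations are
# assumptions; real pathways would come from the statistical ontology.
LEVELS = ["statistic", "table", "agency", "general"]  # specific -> general

EXPLANATIONS = {
    ("median income", "agency"): "As used by this agency: income of the "
                                 "household at the 50th percentile ...",
    ("median income", "general"): "The value that splits a set of incomes "
                                  "into equal upper and lower halves.",
}

def explain(term: str, level: str) -> str:
    """Return the explanation at the requested level, or the nearest more
    general one."""
    for lvl in LEVELS[LEVELS.index(level):]:
        if (term, lvl) in EXPLANATIONS:
            return EXPLANATIONS[(term, lvl)]
    return f"No explanation available for '{term}'."

# Help invoked on a specific statistic falls back to the agency-level text,
# since no statistic- or table-level explanation exists.
print(explain("median income", "statistic"))
```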
Vocabulary Support Tool Examples
The tools we are working on will provide a basic level of explanation of statistical terms.
Tools may include:
- Definitions
- Examples
- Brief tutorials
- Demonstrations
- Interactive simulations
- Pointers to related terms/concepts
- Pointers to more complete (or more technical) explanations
Index

An index combines numbers measuring different things into a single number. The single number represents all the different measures in a compact, easy-to-use form. Values for an index can be compared to each other, for example, over time.

[Diagram: a "combiner" takes five measures (10.1, 103, 24.759, 6, 42) and produces a single value, index = 12.3. A chart applies the combiner in Jan, Apr, Jul, and Oct, giving index values 12.3, 13.1, 13.9, and 14.3: "The index has increased this year."]
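A toy version of the combiner, with invented weights (chosen only so the January measures from the diagram combine to 12.3); real indexes use carefully designed formulas and weights.

```python
# Toy "combiner": a weighted average that collapses several measures into one
# index value. The weights are invented, picked only so that the January
# measures from the diagram combine to 12.3; the April measures are made up.
def combine(measures, weights):
    assert len(measures) == len(weights)
    return sum(m * w for m, w in zip(measures, weights)) / sum(weights)

weights = [5, 0.03, 1, 3, 0.5]
january = [10.1, 103, 24.759, 6, 42]   # the measures shown in the diagram
april = [10.8, 109, 26.2, 6.4, 44.5]   # hypothetical later measures

print(round(combine(january, weights), 1))  # 12.3
print(round(combine(april, weights), 1))    # 13.1
```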
Consumer Price Index (CPI)
The Consumer Price Index (CPI) represents changes in prices of all goods and services purchased for consumption by urban households. It combines prices into a single number that can be compared over time.

Items are classified into 8 major groups:
- Food and Beverages
- Housing
- Apparel
- Transportation
- Medical Care
- Recreation
- Education and Communication
- Other

[Diagram: a "CPI combiner" takes prices from the eight groups (telephone appears as an example item under education & communication) and produces a single Consumer Price Index. A chart applies the combiner for each year 1997-2001, showing the CPI rising within the 160-180 range: "The Consumer Price Index has increased since 1995."]
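Since an index is useful mainly for comparison, here is a small sketch of the kind of over-time comparison the chart supports; the yearly values are approximate readings consistent with the chart, not official figures.

```python
# Comparing index values over time. The yearly numbers below are approximate
# readings consistent with the chart above, not official CPI figures.
cpi = {1997: 160.5, 1998: 163.0, 1999: 166.6, 2000: 172.2, 2001: 177.1}

def pct_change(series, start, end):
    """Percent change in an index between two periods."""
    return 100 * (series[end] - series[start]) / series[start]

print(f"CPI change, 1997-2001: {pct_change(cpi, 1997, 2001):.1f}%")  # 10.3%
```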
Antiknock Index, also known as Octane Rating
A number used to indicate gasoline’s antiknock performance in motor vehicle engines. The two recognized laboratory engine test methods for determining the antiknock rating, i.e., octane rating, of gasolines are the Research method and the Motor method. In the United States, to provide a single number as guidance to the consumer, the antiknock index (R+M)/2, which is the average of the Research and Motor octane numbers, was developed.
http://www.eia.doe.gov/glossary/glossary_a.htm
[Diagram: an "antiknock combiner" averages the Research method and Motor method octane numbers, (R + M)/2, to give the antiknock index posted at the pump. Grade bands: Regular 85-88, Midrange 88-90, Premium 90 or above.]
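The (R + M)/2 formula and the grade bands from the diagram, as a small sketch; how values at the band boundaries or below 85 are handled is my assumption.

```python
# Antiknock index per the definition above: the average of the Research (R)
# and Motor (M) octane numbers. Grade cutoffs follow the diagram; treatment
# of values at band boundaries or below 85 is assumed for illustration.
def antiknock_index(research: float, motor: float) -> float:
    return (research + motor) / 2

def grade(index: float) -> str:
    if index >= 90:
        return "Premium"
    if index >= 88:
        return "Midrange"
    if index >= 85:
        return "Regular"
    return "below Regular"

aki = antiknock_index(research=91, motor=83)  # example octane numbers
print(aki, grade(aki))  # 87.0 Regular
```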
Evaluation
What do we need to evaluate?
- Technical accuracy
- Usability of interface
- "Effectiveness"
  - Is it attractive enough to entice people to use it?
  - Is it helpful? Is it informative?
  - Does it help the user complete the task?
How do we measure these things? What other kinds of vocabulary support issues do we need to address?
Other Issues
- Implementation
- Ongoing maintenance/responsibility
Project Teams
Metadata User Study Team: Carol Hert, Stephanie Haas, Jenny Fry, Lydia Harris, Sheila Denn
Vocabulary Support Team: Stephanie Haas, Ron Brown, Cristina Pattuelli, Jesse Wilbur
GovStat PIs: Gary Marchionini (UNC-CH), Stephanie Haas (UNC-CH), Carol Hert (Syracuse), Catherine Plaisant (UMd), Ben Shneiderman (UMd)
For More Information
Sheila O. Denn
School of Information and Library Science
University of North Carolina at Chapel Hill
http://ils.unc.edu/govstat/