Evaluation of Citation Enhanced Scholarly Databases INFOPRO 2005 Keynote address Dr. Peter Jacso...


Evaluation of Citation Enhanced Scholarly Databases

INFOPRO 2005 Keynote address

Dr. Peter Jacso, Professor

University of Hawaii, USA

Tokyo, November, 2005

Jacso

Japan in Science & Technology


Japan in Science & Technology publication


The birth of an idea

Eugene Garfield


Thesaurus-based and free-text searching

• Nice in library schools but not in practice

• Is it sciatica or ischialgia?

• Orthopedic or orthopaedic

• Center or centre

• Shiatsu or shiatzu

• Student or pupil

• Bad behavior or bad behaviour

What is a citation?

• Citation or reference

• Citation indexing, indexes or indices

• Citation analysis or analyzis

• Or is it analyses?


Multi-disciplinary – discipline-focused

• WoS and Scopus: the largest multidisciplinary databases

• Google Scholar – it is free, and ….. ?

• Discipline-oriented databases

• arXiv - primarily physics

• NASA/ADS – astrophysics and some related

• PsycINFO - psychology

• CINAHL - nursing

• CiteSeer – computer science

• RePEc and its derivatives – economics

• SMEAL – business

Citation

collecting, parsing, indexing, matching, browsing, searching, sorting, ranking, outputting, linking

The purpose of database evaluation

• We did it with print reference sources for a long time

• Content AND Software

• Practical and financial implications $$$$

• Thermometers, pulsometers, blood pressure meters are not enough

• X-Ray, MRI, blood tests

• Quantitative and qualitative aspects

• Quantifiable, measurable vs. philosophical-ideal

• Can’t do it in your lunch break

• Going beyond PR-info from database publishers

CONTENT MEASURES

• Database size

• Database dimensions

• Scope

• Composition

• Source coverage

• Journal base

• The special aspects of cited references as data elements


Database Size

• Guinness Book of World Records mentality

• Biggest, Greatest, Largest, Greenest

• Fastest, Strongest, Leanest, Meanest

• Where is quality?

• Aeroflot – the biggest airline …. and (one of) the worst before glasnost

• Sports Discus – little muscle, much flab

Database Size

WoS 1980-2005            25.5 million records
WoS 1945-2005            36 million records
WoS Century of Science   37 million records
Scopus                   26 million records
Google Scholar (GS)      maybe 10 million records, but mixing fine jizake with cheap wine

Database Dimensions

• Absolute size is not everything

• Biggest is not always the best(est)

• How is database A bigger than database B?

• In what shape and form?

• How is the “body” of the database built?

• Different disciplines - different preferences

Bigger horizontally (wider) vs. vertically (taller)


Number of records with cited references: Scopus

[Chart: horizontal axis years 1972–2006 (labelled 72 to 06), vertical axis 0 to 1,200,000]

Dialog ISI subset & Scopus: number of records with cited references

[Chart: horizontal axis years 1972–2006 (labelled 72 to 06), vertical axis 0 to 1,200,000]

The fate of two databases

[Chart: PsycINFO vs. MHA; horizontal axis years 1966–2004, vertical axis 0 to 100,000]

Noticeable lack of currency


Google Scholar’s innumeracy: more records for 1965–2005 than for 1955–2005??

GS big in/on Japan: 86% of all of its 1955–2005 records?

And bigger for 1965–2005 than for 1955–2005?
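The inconsistency behind these questions breaks a simple rule: a year range that lies inside another range cannot contain more records than the range enclosing it. A minimal sketch of that check, in Python, might look like the following. The function name and the hit counts are made up for illustration, since the actual numbers are not in the text of the slides.

```python
from itertools import permutations


def nested_range_violations(counts):
    """Find year ranges that report more hits than a range enclosing them.

    `counts` maps (first_year, last_year) to a reported hit count.
    Returns (inner, outer) pairs where inner lies inside outer
    but claims more hits.
    """
    violations = []
    for inner, outer in permutations(counts, 2):
        encloses = outer[0] <= inner[0] and inner[1] <= outer[1]
        if encloses and counts[inner] > counts[outer]:
            violations.append((inner, outer))
    return violations


# Made-up hit counts for illustration only; not real search results.
reported = {(1955, 2005): 9_000, (1965, 2005): 9_500}
print(nested_range_violations(reported))
```

Any pair the function returns is a result the search software owes an explanation for, whatever the cause.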


Subject Scope

• Not static, may have evolved in the past X years

• Obvious subject dominance in Scopus at the journal level

Much more subject dominance at the article level


Apparent Presence of Arts & Humanities in WoS


Composition


Current Science


There are books & conference proceedings in Scopus (but not enhanced with cited refs)

Completeness of records

All records      26,731,691
with keywords    21,706,112
with abstracts   18,538,475
with refs         8,442,048
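The four counts are easier to compare as shares of all records. The sketch below is plain arithmetic on the numbers from the slide; the variable names are made up for illustration.

```python
# Record counts copied from the slide; the names are illustrative only.
counts = {
    "all records": 26_731_691,
    "with keywords": 21_706_112,
    "with abstracts": 18_538_475,
    "with refs": 8_442_048,
}

total = counts["all records"]

# Share of all records that carry each data element, in percent.
shares = {label: round(100 * n / total, 1) for label, n in counts.items()}

for label, pct in shares.items():
    print(f"{label:<15} {counts[label]:>12,} {pct:>6.1f}%")
```

Roughly 81% of the records have keywords, 69% have abstracts, and 32% have cited references.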


Completeness of records: Dialog ISI subset, total items & items with cited references

[Chart: horizontal axis years 1972–2006 (labelled 72 to 06), vertical axis 0 to 1,400,000]

Scopus: number of total items & number of records with cited references

[Chart: horizontal axis years 1972–2006 (labelled 72 to 06), vertical axis 0 to 1,400,000]

The Scientist case

The D-Lib Magazine claim

The D-Lib Magazine’s bold claim

JASIS - JASIS&T case

The Scientist case

JASIS - JASIS&T case

Evaluation of Citation Enhanced Scholarly Databases

Part 2.

Dr. Peter Jacso Professor

University of Hawaii at Manoa, USA

Tokyo, November, 2005


Software Issues

• Software capabilities can make or break a product

• Cited references represent a new and unusual data element

• A new challenge; only a few (WoS, Scopus, CSA) can do it well

• For researchers, adding cited references to their paper is the bane of publishing

• No universal standard for cited reference formats

• Reference management programs support more than 700 citation style formats

ProCite


Many chances for messing up cited references in digitization

• Who can mess them up?

• Authors, editors, copy editors at publisher

• Data entry operators at A/I services

• Programmers at database aggregators

• Programmers when extracting data from publishers’ archives

• Spoiled and careless programmers when doing anything


Selected references


Notes


Google Scholar

• Autonomous citation indexing is not perfect either

• Google Scholar mightily managed to mix up many metadata elements

• Is this an article published in 2006?

• Has it really been cited 98 times already in October 2005?

• No, it’s the page number, a Hungarian postal code, or any other 4-digit number

Careless data entry/OCR-ing can cripple the links


EBSCO

… or 20th century programming can serve references dead cold


as opposed to the native hot-linked WoS version


or the hot and spicy Scopus version with “cited by” (citedness) score


… as long as the cited references have no misspellings

• Typos cripple the impressive “cited by” feature, the best of Scopus and CSA, which can’t undo the misspelling shown earlier or the one made by PsycINFO in the author name here: it is Jacso, not Jasco, thank you
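A searcher can guard against this kind of typo by looking for swapped-letter variants of a name in the cited author index. The Python sketch below shows one way to flag them. It is an illustration only, not how Scopus, CSA or PsycINFO match references, and the function names and index entries are hypothetical.

```python
def is_adjacent_transposition(a: str, b: str) -> bool:
    """True if b is a with exactly one pair of neighbouring letters swapped."""
    a, b = a.casefold(), b.casefold()
    if len(a) != len(b) or a == b:
        return False
    # Positions where the two strings disagree.
    diff = [i for i, (x, y) in enumerate(zip(a, b)) if x != y]
    return (
        len(diff) == 2
        and diff[1] == diff[0] + 1
        and a[diff[0]] == b[diff[1]]
        and a[diff[1]] == b[diff[0]]
    )


def suspicious_variants(correct: str, cited_names: list[str]) -> list[str]:
    """Cited-author strings that look like a swapped-letter typo of `correct`."""
    return [n for n in cited_names if is_adjacent_transposition(correct, n)]


# Hypothetical cited-author index entries, for illustration only.
index_entries = ["Jacso", "Jasco", "Jackson", "Jacobs"]
print(suspicious_variants("Jacso", index_entries))
```

The check only catches one class of error (two neighbouring letters swapped); dropped or doubled letters would need a broader similarity measure.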


In the cited references they are crippled in more ways than one, but we may feel warmed up by 6 “cited by” hot links to records which cite Moed’s article in PsycINFO …

…. that’s why there are relatively few, as opposed to the citedness score of 45 in Scopus shown earlier ….

…. and the citedness score of 55 in WoS for the cited 1995 article of Moed HF. The citedness scores of WoS and Scopus often get close for articles published since the mid-1990s, but not for the earlier ones

Remember, seeing the citedness score is still a two-step process in WoS, but WoS will likely soon show the citedness score directly within the cited reference list, as CSA and Scopus do.

Browsing of citing (source) and cited (target) author and journal names is a must. Still, only a few offer adequate browsing. Scopus offers it only for source author and source title.

Browsing/Looking up citing/cited authors & journals


Inconsistencies and inaccuracies are rampant in source journal names, as in PASCAL

WoS can spell it consistently nearly 20,000 times as source journal - quality control, order instead of disorder


In cited sources all hell breaks loose
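To get a feel for how many spellings of one journal hide in an index, the raw name strings can be grouped under a crude normalized key. The Python sketch below is an assumption-laden illustration, not the quality control procedure of WoS or any other service. The function names and the index entries are made up.

```python
import re
from collections import defaultdict


def journal_key(name: str) -> str:
    """Crude grouping key: lower-case, '&' spelled out, punctuation dropped."""
    name = name.casefold().replace("&", " and ")
    name = re.sub(r"[^a-z0-9 ]+", " ", name)
    return " ".join(name.split())


def group_variants(names):
    """Group raw journal-name strings that share the same key."""
    groups = defaultdict(list)
    for name in names:
        groups[journal_key(name)].append(name)
    return dict(groups)


# Made-up index entries for illustration only.
entries = [
    "Online Information Review",
    "ONLINE INFORMATION REVIEW",
    "Online information review.",
    "Online Inform. Review",
]
for key, variants in group_variants(entries).items():
    print(key, "->", variants)
```

Case and punctuation variants collapse into one group, but the abbreviated form stays separate, which is why browsing the index remains necessary.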


Without browsing and defensive searching you would miss a lot

In Dialog’s version of the ISI subset, misspelled formats are not corrected

Dialog only updates (adds new records); it does not RE-LOAD (to correct old ones)

In CINAHL I have slim chances without browsing the author and cited author fields before searching. Browsing is like looking in the pool before diving to see if there is water, and how much there is.

Be savvy & browse, browse, browse if the software allows

AND WHAT BROWSE OPTIONS does Google Scholar offer?

None

Zilch

Nada

Zero

Kotonashi


SEARCHING

• Rather limited options for cited author, cited title, cited journal

• Menu driven in WoS

• SAME (sentence) option in WoS, but …

• …. No searching in cited title in WoS

• Proximity and positional operators in Scopus

• Mostly command-driven in Advanced Mode in Scopus

• Useful but ugly prefixes in Scopus

• Good menus in EBSCO and Ovid


SEARCHING

• No truncation when searching in REFxxx ?


Result display and sorting

• Short result list for an at-a-glance impression about sources, then sorting by citedness score!

Sorting & relative citedness score

• CSA could sort by citedness but does not offer this feature

• Google Scholar used to rank the results by citedness score

• No one offers an age-adjusted citedness score, even though that would be the fairest

• A 10-year-old and a 2-year-old article have had different chances of receiving citations

• My tests showed big differences for some items when ranking by absolute vs. relative citedness score
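The difference between absolute and relative ranking can be shown with a small Python sketch. It assumes that the relative (age-adjusted) score means citations divided by years since publication, which the slide does not spell out. The class, the function and the two records are made up for illustration.

```python
from dataclasses import dataclass


@dataclass
class Article:
    title: str
    pub_year: int
    citations: int


def citations_per_year(article: Article, as_of_year: int) -> float:
    """Age-adjusted score: citations divided by years since publication.

    Assumption: the age is counted as at least one year, so an article
    published in the current year does not cause a division by zero.
    """
    age = max(as_of_year - article.pub_year, 1)
    return article.citations / age


# Made-up records for illustration only; not taken from any database.
articles = [
    Article("ten-year-old article", 1995, 40),
    Article("two-year-old article", 2003, 20),
]

by_absolute = sorted(articles, key=lambda a: a.citations, reverse=True)
by_relative = sorted(
    articles, key=lambda a: citations_per_year(a, 2005), reverse=True
)

print([a.title for a in by_absolute])
print([a.title for a in by_relative])
```

With these made-up numbers the older article leads on the absolute count (40 vs. 20), while the newer one leads on citations per year (10 vs. 4).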


The many dimensions of citedness scores

Citedness scores can be highly informative in estimating the usefulness & perceived importance of a paper among peers, expressed in the form of citations (= links).

Major differences because of the domains of citing sources

• In a journal publisher’s archive: gathered only from the digitized journals of the publisher

• At aggregators/facilitators: gathered from all databases hosted (except ... PsycINFO, for its not so splendid isolation policy)

• In self-published databases: gathered from the database itself

• In Scopus: gathered from 1996 onward from >10,000++ journals

• In WoS: gathered from 1900/1945/1980 forward from <10,000 journals

All the above assume correct identification, matching & calculation.

Enter Google Scholar – playing fast and loose with the numbers

Make it very fast and very loose

A half-page quickie interview with the author in The Scientist cited 7,380 times?


You can scroll up and down in the purportedly citing Nucleic Acids Research article looking for the name Kraulis, for The Scientist, and for the title; you will not find them. Any of them.

Two articles by Kraulis, but neither is the 1993 piece in The Scientist


But what do you expect from software that cannot even do the most basic Boolean OR operation correctly?
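What "correctly" means for OR can be stated as set arithmetic: the hit count for A OR B can be no smaller than the larger of the two separate counts, and no larger than their sum. A minimal Python sketch of that plausibility check follows; the function name and the hit counts are made up for illustration.

```python
def or_count_is_plausible(hits_a: int, hits_b: int, hits_a_or_b: int) -> bool:
    """Check a reported hit count for `A OR B` against set arithmetic.

    The union of two result sets can never be smaller than the larger
    set, and never larger than the two sets added together.
    """
    return max(hits_a, hits_b) <= hits_a_or_b <= hits_a + hits_b


# Made-up hit counts for illustration; not real search results.
print(or_count_is_plausible(1_000, 400, 1_200))  # consistent with a union
print(or_count_is_plausible(1_000, 400, 700))    # fewer hits than A alone
```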


Indeed, “citation data is subtle stuff” and requires competence
