Improving the effectiveness of Web searching: Methodological issues
![Page 1: Improving the effectiveness of Web searching: Methodological issues](https://reader033.fdocuments.in/reader033/viewer/2022051700/56816016550346895dcf1740/html5/thumbnails/1.jpg)
Improving the effectiveness of Web searching:
Methodological issues
Barry Eaglestone
Department of Information Studies
University of [email protected]
![Page 2: Improving the effectiveness of Web searching: Methodological issues](https://reader033.fdocuments.in/reader033/viewer/2022051700/56816016550346895dcf1740/html5/thumbnails/2.jpg)
Overview
• An inductive study to build evidence-based meta-cognitive models of web searching by the general public.
• Data modelling issues
– A temporal data modelling solution
• Discussion & Final thoughts
![Page 3: Improving the effectiveness of Web searching: Methodological issues](https://reader033.fdocuments.in/reader033/viewer/2022051700/56816016550346895dcf1740/html5/thumbnails/3.jpg)
An inductive study of how the general public search on the web.
Setting the scene – the database approach and state of the art.
![Page 4: Improving the effectiveness of Web searching: Methodological issues](https://reader033.fdocuments.in/reader033/viewer/2022051700/56816016550346895dcf1740/html5/thumbnails/4.jpg)
Motivation
• Need to develop new models for searching: update outdated usage paradigms.
– Improve training methods
– Develop automated assistance systems
![Page 5: Improving the effectiveness of Web searching: Methodological issues](https://reader033.fdocuments.in/reader033/viewer/2022051700/56816016550346895dcf1740/html5/thumbnails/5.jpg)
Previous studies of search logs
• Web search is shallow + promiscuous
• Low use of advanced features
• Global statistics
– Number of queries / search
– Pages viewed / user
– Query reformulation (change in no. of terms)
– Most users enter few terms
– Little to be gained by increasing complexity
![Page 6: Improving the effectiveness of Web searching: Methodological issues](https://reader033.fdocuments.in/reader033/viewer/2022051700/56816016550346895dcf1740/html5/thumbnails/6.jpg)
The Team
Information Seeking
Chemoinformatics
Database
![Page 7: Improving the effectiveness of Web searching: Methodological issues](https://reader033.fdocuments.in/reader033/viewer/2022051700/56816016550346895dcf1740/html5/thumbnails/7.jpg)
Soft Hard
Spectrum of Research Perspective
Modelling/engineering/empirical
Qualitative / quantitative data analysis / modeling
Human / organisational issues
Formally defined problems
Computer world formalisations
Hardware / software solutions
CS: Computer World
IS: People world
Invention
Discovery
Problem solving formalism
![Page 8: Improving the effectiveness of Web searching: Methodological issues](https://reader033.fdocuments.in/reader033/viewer/2022051700/56816016550346895dcf1740/html5/thumbnails/8.jpg)
How will we use it?
Effectiveness?
Meta-cognitive knowledge about web searching?
How do they search?
Who are the searchers?
What are they searching for?
Infer effectiveness from
• search transformation patterns
• subject’s narrative
Context
The GENERAL PUBLIC
Volunteers (c. 500 searches):
ICT courses
University evening classes
City Learning Centre courses
Citizens’ forum
Personal contacts
Library
Advertising
Students and academics
+ over 1,000,000 search logs from anonymous searchers
• Self-selected searches explained through interview and think-aloud protocols
• 2-3 set searches
Observe and record
• Over 1,000,000 anonymous search engine transaction logs
• c. 500 observed and recorded searches; talk to searchers
Determine query similarity
Delimit searches
Code query transformation
Model searches as transformation graphs
Data mine for stereotypical search strategies
Correlate with who, why and effectiveness
Thus, establish evidence-based models of search strategy, related to user and problem characteristics and likelihood of success
Evidence-based meta-cognitive training
Intelligent interfaces
![Page 9: Improving the effectiveness of Web searching: Methodological issues](https://reader033.fdocuments.in/reader033/viewer/2022051700/56816016550346895dcf1740/html5/thumbnails/9.jpg)
Why Meta-cognition?
“Meta-cognition refers to higher order thinking which involves active control over the cognitive processes engaged in learning. ….” Livingston (1997)
• Meta-cognitive knowledge
– “…knowledge of personal variables to general knowledge about how human beings learn and process information, as well as individual knowledge of one’s own learning processes…” e.g. “I have a bad memory!”
• Meta-cognitive regulation
– “… activities used to ensure that a cognitive goal has been met….”, e.g., question yourself about the text and then re-read. Livingston (1997)
![Page 10: Improving the effectiveness of Web searching: Methodological issues](https://reader033.fdocuments.in/reader033/viewer/2022051700/56816016550346895dcf1740/html5/thumbnails/10.jpg)
Cognitive Styles Analysis
http://www.memletics.com/manual/default.asp?ref=ga&data=999+learning+styles+free+test
Holist Analyst
Verbalizer
Imager
![Page 11: Improving the effectiveness of Web searching: Methodological issues](https://reader033.fdocuments.in/reader033/viewer/2022051700/56816016550346895dcf1740/html5/thumbnails/11.jpg)
Syntactical/quantitative: Excite search logs, ~10^6 searches
Semantic/qualitative: Holistic search logs, supplemented with qualitative data
![Page 12: Improving the effectiveness of Web searching: Methodological issues](https://reader033.fdocuments.in/reader033/viewer/2022051700/56816016550346895dcf1740/html5/thumbnails/12.jpg)
Preliminary work
• Analysis of search logs
• Development of descriptive codes
• Aim is to form a basis for the analysis of our experimental data
![Page 13: Improving the effectiveness of Web searching: Methodological issues](https://reader033.fdocuments.in/reader033/viewer/2022051700/56816016550346895dcf1740/html5/thumbnails/13.jpg)
Strengths / Limitations
• Large sample
• Definitely general public.
• No enquiry context – what are they looking for? What are they thinking?
• No measure of success.
• Are they searching or just browsing?
• Where does one enquiry end and another begin?
• Limited to one search engine – what did they do during a delay?
![Page 14: Improving the effectiveness of Web searching: Methodological issues](https://reader033.fdocuments.in/reader033/viewer/2022051700/56816016550346895dcf1740/html5/thumbnails/14.jpg)
Excite Database Sample

| qid | uid | time | rank | query | querymore | totwords |
|---|---|---|---|---|---|---|
| 343 | 000000000000006a | 192141 | 0 | alco fence company ohio | No | 4 |
| 344 | 000000000000006a | 192219 | 0 | alco fence company ohio | No | 4 |
| 345 | 000000000000006a | 192228 | 10 | alco fence company ohio | No | 4 |
| 346 | 000000000000006a | 192243 | 20 | alco fence company ohio | No | 4 |
| 347 | 000000000000006a | 192328 | 0 | lifetime fence company ohio | No | 4 |
| 348 | 000000000000006a | 192359 | 10 | lifetime fence company ohio | No | 4 |
| 349 | 000000000000006a | 192455 | 0 | lifetime wire fence | No | 3 |
| 350 | 000000000000006a | 192634 | 0 | high tensile wire fence | No | 4 |
| 351 | 000000000000006b | 161906 | 0 | sickle cell anemia | No | 3 |
| 352 | 000000000000006b | 162006 | 10 | sickle cell anemia | No | 3 |
| 353 | 000000000000006b | 162130 | 0 | sickle cell anemia | No | 3 |
| 354 | 000000000000006c | 144303 | 0 | Hilton Garden Inn | No | 3 |
| 355 | 000000000000006c | 144331 | 0 | Hilton Garden Inn Jacksonville | No | 4 |
| 356 | 000000000000006c | 144433 | 0 | Hotel Search | No | 2 |
| 357 | 000000000000006c | 144541 | 0 | Jacksonvill Hotel | No | 2 |
| 358 | 000000000000006c | 144728 | 0 | www.hilton.com | No | 1 |

~10^6 queries
Sessions 1, 2 and 3: one per uid shown (…6a, …6b, …6c)
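A minimal Python sketch of how rows like those in the Excite sample might be grouped into sessions. It assumes that one uid corresponds to one session, as the session labels on the slide suggest; the function names `group_sessions` and `distinct_queries` are hypothetical and not taken from the project. Deciding where one enquiry ends and another begins inside a session is a harder problem that this sketch does not attempt.

```python
from collections import defaultdict

# (qid, uid, time, rank, query) rows copied from the Excite sample.
ROWS = [
    (343, "000000000000006a", "192141", 0, "alco fence company ohio"),
    (344, "000000000000006a", "192219", 0, "alco fence company ohio"),
    (345, "000000000000006a", "192228", 10, "alco fence company ohio"),
    (346, "000000000000006a", "192243", 20, "alco fence company ohio"),
    (347, "000000000000006a", "192328", 0, "lifetime fence company ohio"),
    (348, "000000000000006a", "192359", 10, "lifetime fence company ohio"),
    (349, "000000000000006a", "192455", 0, "lifetime wire fence"),
    (350, "000000000000006a", "192634", 0, "high tensile wire fence"),
    (351, "000000000000006b", "161906", 0, "sickle cell anemia"),
    (352, "000000000000006b", "162006", 10, "sickle cell anemia"),
    (353, "000000000000006b", "162130", 0, "sickle cell anemia"),
    (354, "000000000000006c", "144303", 0, "Hilton Garden Inn"),
    (355, "000000000000006c", "144331", 0, "Hilton Garden Inn Jacksonville"),
    (356, "000000000000006c", "144433", 0, "Hotel Search"),
    (357, "000000000000006c", "144541", 0, "Jacksonvill Hotel"),
    (358, "000000000000006c", "144728", 0, "www.hilton.com"),
]


def group_sessions(rows):
    """Group log rows by uid (ASSUMPTION: one uid is one session)."""
    sessions = defaultdict(list)
    for qid, uid, time, rank, query in rows:
        sessions[uid].append((qid, time, rank, query))
    return dict(sessions)


def distinct_queries(session):
    """Return the distinct query strings of a session in order of first use."""
    seen = []
    for _qid, _time, _rank, query in session:
        if query not in seen:
            seen.append(query)
    return seen


sessions = group_sessions(ROWS)
for uid, session in sessions.items():
    print(uid, len(session), "rows,", len(distinct_queries(session)), "distinct queries")
```

Rows that repeat a query with a different rank (0, 10, 20) are kept as separate rows here, since they appear to record result-page requests rather than new queries.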
![Page 15: Improving the effectiveness of Web searching: Methodological issues](https://reader033.fdocuments.in/reader033/viewer/2022051700/56816016550346895dcf1740/html5/thumbnails/15.jpg)
Query Transformations
• Changes in search strategy
– Conceptual, e.g. changes in type of search: broad/specific, text/image
– Linguistic: syntactic, query structure.
• Examples
Q1: shakespeare hamlet
Q2: shakespeare hamlet quotes
Q3: to be or not to be
Q4: “to be or not to be”
Q5: “to be or not to be” +shakespeare
![Page 16: Improving the effectiveness of Web searching: Methodological issues](https://reader033.fdocuments.in/reader033/viewer/2022051700/56816016550346895dcf1740/html5/thumbnails/16.jpg)
Our Preliminary Analysis
• To look at textual (syntactic) changes.
• Link queries by text similarity.
• Infer enquiry change from textual dissimilarity.
• Use these elements to develop a machine-readable codification of QT’s.
• To mine for characteristic patterns.
![Page 17: Improving the effectiveness of Web searching: Methodological issues](https://reader033.fdocuments.in/reader033/viewer/2022051700/56816016550346895dcf1740/html5/thumbnails/17.jpg)
Code Transformation
N New query
R A repeated query / same page rank – relevance feedback.
P Page ranking (seek more)
p Page ranking (earlier pages)
I(k) Identical
C(k) Conjoint
S(k) Sub-phrase in common
s(k) Sub-phrase + words in common
M(k) Other textual similarity
Example Transformations
![Page 18: Improving the effectiveness of Web searching: Methodological issues](https://reader033.fdocuments.in/reader033/viewer/2022051700/56816016550346895dcf1740/html5/thumbnails/18.jpg)
QT graphs
[Figure: QT graph for uid 74, running from Start to END through numbered query nodes (1 to 6 and 20 to 29), with edges labelled by transformation codes (N, M, C, S, s, R, RP(14))]
uid 74: NM(1)C(2)C(3)S(4)s(5)PPRPRRRRPPRRppI(5)s(6)s(22)s(22)s(23)s(25)s(26)s(22)R
nursing careers
paid undergraduate nursing schools in baltimore city maryland
Code Transformation
N New query
R A repeated query /same page rank – relevance feedback.
P Page ranking (seek more)
p Page ranking (earlier pages)
I(k) Identical
C(k) Conjoint
S(k) Sub-phrase in common
s(k) Sub-phrase + words in common
M(k) Other textual similarity
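The coded string for uid 74 can be split back into individual transformations. The sketch below is a hedged illustration in Python, not the project's own tool: it assumes that `k` is the number of the earlier query to which the current query is linked, and that codes without `k` (R, P, p) link to the immediately preceding query. The names `parse_qt` and `qt_edges` are hypothetical. Only the codes in the legend above are handled; codes such as QJ and QD that appear in other sessions are not.

```python
import re

# One legend code, optionally followed by "(k)".
TOKEN = re.compile(r"([NRPpICSsM])(?:\((\d+)\))?")


def parse_qt(code_string):
    """Split a QT string such as 'NM(1)C(2)' into (code, k) pairs."""
    tokens = []
    pos = 0
    while pos < len(code_string):
        match = TOKEN.match(code_string, pos)
        if match is None:
            raise ValueError(f"unrecognised code at position {pos}")
        k = int(match.group(2)) if match.group(2) else None
        tokens.append((match.group(1), k))
        pos = match.end()
    return tokens


def qt_edges(tokens):
    """Return (from_query, to_query, code) edges, numbering queries from 1.

    ASSUMPTION: k names the earlier query this one is linked to, and codes
    without k link to the immediately preceding query.
    """
    edges = []
    for i, (code, k) in enumerate(tokens, start=1):
        if code == "N":
            continue  # a new query has no predecessor
        source = k if k is not None else i - 1
        edges.append((source, i, code))
    return edges


uid74 = "NM(1)C(2)C(3)S(4)s(5)PPRPRRRRPPRRppI(5)s(6)s(22)s(22)s(23)s(25)s(26)s(22)R"
tokens = parse_qt(uid74)
edges = qt_edges(tokens)
print(len(tokens), "transformations,", len(edges), "edges")
```

Under these assumptions the edge list is one possible input for drawing a QT graph of the kind shown on this slide.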
![Page 19: Improving the effectiveness of Web searching: Methodological issues](https://reader033.fdocuments.in/reader033/viewer/2022051700/56816016550346895dcf1740/html5/thumbnails/19.jpg)
QT graphs
[Figure: QT graph for uid 342, running from Start to END through numbered query nodes, with edges labelled N, M, C, P, P(3), P(7), QJ, QD and Delay]
uid 342: NM(1)C(2)QJ(3)_C(2)PI(2)PPPPPPPC(2)PPPQJ(15)QD(15)
molsworth
"us army"
![Page 20: Improving the effectiveness of Web searching: Methodological issues](https://reader033.fdocuments.in/reader033/viewer/2022051700/56816016550346895dcf1740/html5/thumbnails/20.jpg)
Preliminary Conclusions
• We have developed a rich set of codes describing the syntactic part of QT’s
• These can be used to develop a graph-based description
• Correlations between the codes are meaningful/interesting
• They form part of the analysis for our current experimental study.
![Page 21: Improving the effectiveness of Web searching: Methodological issues](https://reader033.fdocuments.in/reader033/viewer/2022051700/56816016550346895dcf1740/html5/thumbnails/21.jpg)
…and if you want to read about our preliminary results….
• Whittle M, Eaglestone B, Ford N, Madden A (2007), Data Mining of Search Logs, Journal of the American Society for Information Science and Technology (in press)
• Whittle M, Eaglestone B, Ford N, Gillet V.J., Madden A (2006), Query Transformations And Their Role In Web Searching By The General Public, Information Research, 12(1) October 2006
• Whittle M, Eaglestone B, Ford N, Gillet V, Madden A (2006), Query transformations and their role in web searching by the general public. Information Seeking in Context Conference 2006 ISIC, Australia
• Andrew Madden, Barry Eaglestone, Nigel Ford, Martin Whittle (2006) Search engines: a first step to finding information: preliminary findings from a study of observed searches, Information Seeking in Context Conference 2006 ISIC, Australia.
![Page 22: Improving the effectiveness of Web searching: Methodological issues](https://reader033.fdocuments.in/reader033/viewer/2022051700/56816016550346895dcf1740/html5/thumbnails/22.jpg)
Sheffield Experimental Study
Screens
Audio
Qualitative analysis
Quantitative analysis
Keystrokes
Queries
Web page titles
Transcribing
Pre-Processing
Temporal database
Model development
![Page 23: Improving the effectiveness of Web searching: Methodological issues](https://reader033.fdocuments.in/reader033/viewer/2022051700/56816016550346895dcf1740/html5/thumbnails/23.jpg)
Data modelling issues
![Page 24: Improving the effectiveness of Web searching: Methodological issues](https://reader033.fdocuments.in/reader033/viewer/2022051700/56816016550346895dcf1740/html5/thumbnails/24.jpg)
Evolution of databases
The database approach – A database should be a natural representation of information as data, suitable for all relevant applications without duplication, including the ones you have not yet thought of.
“A well designed database system will mirror its users’ perceptions of the problem space, and thus allows them to address the problem in hand without complexities and distractions of computer world implementation details… Implicit is the notion that users should work within the bounds of ‘good practice’”
![Page 25: Improving the effectiveness of Web searching: Methodological issues](https://reader033.fdocuments.in/reader033/viewer/2022051700/56816016550346895dcf1740/html5/thumbnails/25.jpg)
The semantic gap
[Figure: ER diagram in which a Sales_Order is Placed_by a Customer and Take_by a Salesperson; cardinality labels 1, n, m, 1]

Customer

| C# | Name | … |
|---|---|---|
| C1 | Dr. Eaglestone | |
| C2 | Ms Smith | |

Salesperson

| SP# | Names | … |
|---|---|---|
| S5 | Mr. Chan | … |
| S8 | Dr. Shao | |

SalesOrder

| C# | SP# | Product | Quantity |
|---|---|---|---|
| C1 | S5 | P99 | 120 |
| C1 | S5 | P2 | 10 |

The gap between what you wish to represent and what you can represent.
![Page 26: Improving the effectiveness of Web searching: Methodological issues](https://reader033.fdocuments.in/reader033/viewer/2022051700/56816016550346895dcf1740/html5/thumbnails/26.jpg)
….. & Data Independence
Applications/Users
External Model
Logical Model
Internal Model
Principles of database technology…
![Page 27: Improving the effectiveness of Web searching: Methodological issues](https://reader033.fdocuments.in/reader033/viewer/2022051700/56816016550346895dcf1740/html5/thumbnails/27.jpg)
QT graphs
[Figure: QT graph for uid 342, repeated from page 19]
uid 342: NM(1)C(2)QJ(3)_C(2)PI(2)PPPPPPPC(2)PPPQJ(15)QD(15)
molsworth
"us army"
![Page 28: Improving the effectiveness of Web searching: Methodological issues](https://reader033.fdocuments.in/reader033/viewer/2022051700/56816016550346895dcf1740/html5/thumbnails/28.jpg)
A Ready-made Temporal data modelling solution
![Page 29: Improving the effectiveness of Web searching: Methodological issues](https://reader033.fdocuments.in/reader033/viewer/2022051700/56816016550346895dcf1740/html5/thumbnails/29.jpg)
GENREG – A ready-made solution that has also been proposed for healthcare?
The Organisation: National Museum of Denmark
Multimedia
– Pictures as well as descriptions
Distributed
– Each department ran their own database system for their collection (ownership!)
Object-oriented design
– Entities, not just values
Relational implementation
![Page 30: Improving the effectiveness of Web searching: Methodological issues](https://reader033.fdocuments.in/reader033/viewer/2022051700/56816016550346895dcf1740/html5/thumbnails/30.jpg)
Database Research
Science
Technology
Application
Praxis
Theory
![Page 31: Improving the effectiveness of Web searching: Methodological issues](https://reader033.fdocuments.in/reader033/viewer/2022051700/56816016550346895dcf1740/html5/thumbnails/31.jpg)
Topology
Danish Pre-history
Department of Antiquity
Ethnographic Department
Coin Collection
LAN
1,000,000 artefacts
200,000 images
![Page 32: Improving the effectiveness of Web searching: Methodological issues](https://reader033.fdocuments.in/reader033/viewer/2022051700/56816016550346895dcf1740/html5/thumbnails/32.jpg)
Design / Abstractions
• Design
– Object oriented
– Based on a curator’s perspective
– “Curators apply scientific training to determine the history of artefacts…creating knowledge about past and present societies by determining relationships which group artefacts within certain times and places in history”
• Abstractions
– Artefact
– Event
– Relationship (relates artefacts which participate in common events)
![Page 33: Improving the effectiveness of Web searching: Methodological issues](https://reader033.fdocuments.in/reader033/viewer/2022051700/56816016550346895dcf1740/html5/thumbnails/33.jpg)
Mould: used to fabricate Brooches
![Page 34: Improving the effectiveness of Web searching: Methodological issues](https://reader033.fdocuments.in/reader033/viewer/2022051700/56816016550346895dcf1740/html5/thumbnails/34.jpg)
GENREG data model
ARTIFACT
EVENT/ARTIFACT
One (or more) artifacts participate in one or more events.
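The artifact/event relationship can be illustrated with a short Python sketch. This is not the GENREG implementation: the `Event` class, the `related_artifacts` function and the sample identifiers (`mould`, `brooch_1`, `sword`, and so on) are hypothetical, and the sketch only assumes that two artifacts count as related when they participate in a common event.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Event:
    """An event and the artifacts that participate in it (hypothetical shape)."""
    event_id: str
    event_type: str
    artifacts: frozenset


def related_artifacts(events, artifact):
    """Return every artifact that shares at least one event with `artifact`."""
    related = set()
    for event in events:
        if artifact in event.artifacts:
            related |= event.artifacts
    related.discard(artifact)
    return related


# Illustrative data only: a mould used to fabricate two brooches,
# and one of the brooches later found in a burial with a sword.
events = [
    Event("e1", "fabricate", frozenset({"mould", "brooch_1", "brooch_2"})),
    Event("e2", "burial", frozenset({"brooch_1", "sword"})),
]

print(sorted(related_artifacts(events, "brooch_1")))
print(sorted(related_artifacts(events, "mould")))
```

Keeping the relationship inside the event, rather than as a direct link between artifacts, means a new interpretation can be added as a new event without changing the artifact records.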
![Page 35: Improving the effectiveness of Web searching: Methodological issues](https://reader033.fdocuments.in/reader033/viewer/2022051700/56816016550346895dcf1740/html5/thumbnails/35.jpg)
Burial site
Grave Grave
Artefact Artefact Artefact Artefact Artefact Artefact
![Page 36: Improving the effectiveness of Web searching: Methodological issues](https://reader033.fdocuments.in/reader033/viewer/2022051700/56816016550346895dcf1740/html5/thumbnails/36.jpg)
[Figure: items labelled A to L; labels: Merchant’s House, Manor House, Rooms, Furniture, Purchase event]
![Page 37: Improving the effectiveness of Web searching: Methodological issues](https://reader033.fdocuments.in/reader033/viewer/2022051700/56816016550346895dcf1740/html5/thumbnails/37.jpg)
![Page 38: Improving the effectiveness of Web searching: Methodological issues](https://reader033.fdocuments.in/reader033/viewer/2022051700/56816016550346895dcf1740/html5/thumbnails/38.jpg)
Integrated Care Pathways Application
[Procter, P., Eaglestone, B.M. & Burdis, C. “A unified model to support an information intensive healthcare environment”, MIE '99]
[Figure: care pathway through nodes P1 to P6, annotated It, It+1 and It+2; labels: Treatment, Alternative diagnoses, Alternative prognoses]
![Page 39: Improving the effectiveness of Web searching: Methodological issues](https://reader033.fdocuments.in/reader033/viewer/2022051700/56816016550346895dcf1740/html5/thumbnails/39.jpg)
A formal GENREG Model
type Genreg = abs [tuple[ Collection : Artifacts, Events : set[Event]]
    new : () → Genreg,
    = : (Genreg × Genreg) → boolean,
    events : (Genreg) → set[Event],
    collection : (Genreg) → Artifacts]
type Artifacts = graph[Artifact]
type Event = abs[ tuple [id : E_Id, type : Event_Type, t : Time, place : Location, actors : set[Actor_Type], edge : set[Edge]]
    = : (Event × Event) → boolean,
    id : (Event) → E_Id,
    type : (Event) → Event_Type,
    t : (Event) → Time,
    place : (Event) → Location,
    actors : (Event) → set[Actor_Type],
    edgeset : (Event) → set[Edges]]
…
![Page 40: Improving the effectiveness of Web searching: Methodological issues](https://reader033.fdocuments.in/reader033/viewer/2022051700/56816016550346895dcf1740/html5/thumbnails/40.jpg)
type Time = abs[tuple[ lower, upper : T]
    new : () → Time,
    = : (Time × Time) → boolean,
    before : (Time × Time) → boolean,
    meets : (Time × Time) → boolean,
    overlaps : (Time × Time) → boolean,
    during : (Time × Time) → boolean,
    starts : (Time × Time) → boolean,
    finishes : (Time × Time) → boolean,
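The operation names on the Time type match the names of Allen's interval relations. The Python sketch below assumes that this is the intended meaning and that `T` can be modelled as an integer (for example a year); neither assumption is confirmed by the slide, and the definitions used are the standard Allen ones rather than anything taken from GENREG.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Time:
    """A closed interval [lower, upper]; T is ASSUMED to be an integer."""
    lower: int
    upper: int

    def __post_init__(self):
        if self.lower > self.upper:
            raise ValueError("lower must not exceed upper")

    def before(self, other):
        # self ends strictly before other starts
        return self.upper < other.lower

    def meets(self, other):
        # self ends exactly where other starts
        return self.upper == other.lower

    def overlaps(self, other):
        # self starts first and ends inside other
        return self.lower < other.lower < self.upper < other.upper

    def during(self, other):
        # self lies strictly inside other
        return other.lower < self.lower and self.upper < other.upper

    def starts(self, other):
        # same start, self ends earlier
        return self.lower == other.lower and self.upper < other.upper

    def finishes(self, other):
        # same end, self starts later
        return self.upper == other.upper and self.lower > other.lower


bronze_age = Time(1950, 2004)
print(Time(1900, 1950).meets(bronze_age))
print(Time(1960, 1970).during(bronze_age))
```

Dataclass equality on `(lower, upper)` plays the role of the `=` operation in the signature.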
![Page 41: Improving the effectiveness of Web searching: Methodological issues](https://reader033.fdocuments.in/reader033/viewer/2022051700/56816016550346895dcf1740/html5/thumbnails/41.jpg)
• add_artifact / delete_artifact (D, a)
• add_event / delete_event (D, e)
• merge (D,F,E)
• select_artefacts (D,p)
• select_events (D,p)
• related_to (D,n)
• related_by (D,e,n)
![Page 42: Improving the effectiveness of Web searching: Methodological issues](https://reader033.fdocuments.in/reader033/viewer/2022051700/56816016550346895dcf1740/html5/thumbnails/42.jpg)
Temporal Data Models (See also SQL/Temporal)
[Figure: cube with axes Entity, Attribute and Time]
Time: 1950; Entity: Barry; Height: 2’ 3’’
Time: 2004; Entity: Barry; Height: 5’ 10’’
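The entity/attribute/time idea can be sketched as an attribute whose values are stamped with the time at which they were recorded. The Python below is an illustration only: the class name `TemporalAttribute` is hypothetical, and it assumes a step-wise reading in which a recorded value is treated as holding until the next recorded time, which the slide does not state.

```python
import bisect


class TemporalAttribute:
    """Time-stamped values of one attribute of one entity."""

    def __init__(self):
        self._times = []   # sorted recording times
        self._values = []  # value recorded at the matching time

    def record(self, time, value):
        """Store `value` as observed at `time`, keeping times sorted."""
        i = bisect.bisect_left(self._times, time)
        if i < len(self._times) and self._times[i] == time:
            self._values[i] = value  # overwrite an existing observation
        else:
            self._times.insert(i, time)
            self._values.insert(i, value)

    def value_at(self, time):
        """Latest value recorded at or before `time`, or None if there is none.

        ASSUMPTION: a value persists until the next recorded time.
        """
        i = bisect.bisect_right(self._times, time)
        return self._values[i - 1] if i else None


height = TemporalAttribute()  # entity: Barry, attribute: Height
height.record(2004, "5' 10''")
height.record(1950, "2' 3''")
print(height.value_at(1950), "|", height.value_at(2004), "|", height.value_at(1949))
```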
![Page 43: Improving the effectiveness of Web searching: Methodological issues](https://reader033.fdocuments.in/reader033/viewer/2022051700/56816016550346895dcf1740/html5/thumbnails/43.jpg)
• Artefact histories are created retrospectively
• Multiple orthogonal time dimensions can be represented (using specialised events), e.g., discovery and historic time.
• Relationships between events and states are modelled.
• Multiple objects can represent different states and interpretations of an entity.
![Page 44: Improving the effectiveness of Web searching: Methodological issues](https://reader033.fdocuments.in/reader033/viewer/2022051700/56816016550346895dcf1740/html5/thumbnails/44.jpg)
![Page 45: Improving the effectiveness of Web searching: Methodological issues](https://reader033.fdocuments.in/reader033/viewer/2022051700/56816016550346895dcf1740/html5/thumbnails/45.jpg)
QT graphs
[Figure: QT graph for uid 342, repeated from page 19 and annotated here with Q3, Q4 and QJt]
uid 342: NM(1)C(2)QJ(3)_C(2)PI(2)PPPPPPPC(2)PPPQJ(15)QD(15)
molsworth
"us army"
![Page 46: Improving the effectiveness of Web searching: Methodological issues](https://reader033.fdocuments.in/reader033/viewer/2022051700/56816016550346895dcf1740/html5/thumbnails/46.jpg)
Some final thoughts…
![Page 47: Improving the effectiveness of Web searching: Methodological issues](https://reader033.fdocuments.in/reader033/viewer/2022051700/56816016550346895dcf1740/html5/thumbnails/47.jpg)
Some final thoughts…
• The Database Approach?
• Semantic gap?
• Data independence?
• Temporal modelling?
• Query language?
• So, what’s happening?
![Page 48: Improving the effectiveness of Web searching: Methodological issues](https://reader033.fdocuments.in/reader033/viewer/2022051700/56816016550346895dcf1740/html5/thumbnails/48.jpg)
IR & DB?
IR – collections of artefacts are available for ad hoc querying (any relevant problem). The problem is modelled by the query.
DB – collections of artefacts are structured to model the problem space.
Server(s): Internet accessible repositories of artefacts
Client(s): Users are researchers who derive knowledge from retrieved artefacts
Problem-related query
Problem-relevant artefacts
Researcher’s workspace – developed to model the problem space
Artefact collection
![Page 49: Improving the effectiveness of Web searching: Methodological issues](https://reader033.fdocuments.in/reader033/viewer/2022051700/56816016550346895dcf1740/html5/thumbnails/49.jpg)
…final thoughts…
• Knowledge of research methodology is important (qualitative and quantitative)
• Nudist, Atlas, SPSS don’t support mixed methods
• Database approach allows integration of qualitative and quantitative data, and organisation of data to evolve to model emerging theory
• Temporal data models are key to modelling evolving strategy…
![Page 50: Improving the effectiveness of Web searching: Methodological issues](https://reader033.fdocuments.in/reader033/viewer/2022051700/56816016550346895dcf1740/html5/thumbnails/50.jpg)
Acknowledgments
• The project team – Nigel Ford, Andrew Madden, Martin Whittle
• Arts and Humanities Research Council (formerly Board) for funding
• Mark Sanderson and Amanda Spink for making the Excite logs available
• Val Gillet and Eleanor Gardiner for help with graphs.
![Page 51: Improving the effectiveness of Web searching: Methodological issues](https://reader033.fdocuments.in/reader033/viewer/2022051700/56816016550346895dcf1740/html5/thumbnails/51.jpg)
![Page 52: Improving the effectiveness of Web searching: Methodological issues](https://reader033.fdocuments.in/reader033/viewer/2022051700/56816016550346895dcf1740/html5/thumbnails/52.jpg)
Summary
Feedback can lead to semantic changes
Complexity can be a hindrance
Searches don’t necessarily end when a searcher leaves a search engine.
![Page 53: Improving the effectiveness of Web searching: Methodological issues](https://reader033.fdocuments.in/reader033/viewer/2022051700/56816016550346895dcf1740/html5/thumbnails/53.jpg)
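Algorithm
Loop over session queries (i); for each, loop over previous queries (j)
for i = 1 to n
    for j = 1 to i-1
        Compare query i with j
    Choose most similar pair i,j
    Analyse to assign QT type
[Figure: queries 1 to n ordered along a time axis, with i the current query and j an earlier one]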
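The double loop above can be sketched in Python as follows. The similarity function used here, a Dice coefficient on whole words, is a stand-in chosen to keep the example short and is not the project's measure; the names `word_overlap` and `link_queries` are hypothetical. The final step of assigning a QT type is left out because its rules are not given on the slide. Indices are 0-based here, whereas the slide counts from 1.

```python
def word_overlap(q1, q2):
    """Stand-in similarity: Dice coefficient on the sets of words."""
    w1, w2 = set(q1.lower().split()), set(q2.lower().split())
    if not w1 or not w2:
        return 0.0
    return 2 * len(w1 & w2) / (len(w1) + len(w2))


def link_queries(queries, similarity=word_overlap):
    """For each query i, find the most similar earlier query j.

    Returns one (j, score) pair per query; j is None when no earlier
    query has a similarity above zero (a candidate new enquiry).
    """
    links = []
    for i in range(len(queries)):          # loop over session queries
        best_j, best_score = None, 0.0
        for j in range(i):                 # loop over previous queries
            score = similarity(queries[i], queries[j])
            if score > best_score:         # ties keep the earliest j
                best_j, best_score = j, score
        links.append((best_j, best_score))
    return links


session = [
    "Hilton Garden Inn",
    "Hilton Garden Inn Jacksonville",
    "Hotel Search",
    "Jacksonvill Hotel",
    "www.hilton.com",
]
links = link_queries(session)
for i, (j, score) in enumerate(links):
    print(i, "->", j, round(score, 2))
```

Because whole words are compared, the misspelt "Jacksonvill" does not match "Jacksonville" in this stand-in; a character-level word similarity would be needed to catch it.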
![Page 54: Improving the effectiveness of Web searching: Methodological issues](https://reader033.fdocuments.in/reader033/viewer/2022051700/56816016550346895dcf1740/html5/thumbnails/54.jpg)
Some Preliminary Observations
• Quote marks are likely to be used with a new query.
• Delay is strongly associated with N (New query): these are successful single queries within a session.
• B (Include Boolean) & C (Conjoint) are positively associated
• B & D (Disjoint) are negatively associated
![Page 55: Improving the effectiveness of Web searching: Methodological issues](https://reader033.fdocuments.in/reader033/viewer/2022051700/56816016550346895dcf1740/html5/thumbnails/55.jpg)
Number of words/query: Excite 2001
[Figure: normalised frequency (0 to 0.35) plotted against terms/query (logarithmic axis, 1 to 100)]
![Page 56: Improving the effectiveness of Web searching: Methodological issues](https://reader033.fdocuments.in/reader033/viewer/2022051700/56816016550346895dcf1740/html5/thumbnails/56.jpg)
Classification of textual QT’s
• Word order, addition, subtraction.
• Inclusion or removal of
– Boolean terms
– “quotes”
• Detection of new enquiries.
• We use similarity methods to compare words and queries.
![Page 57: Improving the effectiveness of Web searching: Methodological issues](https://reader033.fdocuments.in/reader033/viewer/2022051700/56816016550346895dcf1740/html5/thumbnails/57.jpg)
Self-selected searches
Prompts:
• Think about the last time you had trouble finding something you were looking for on the Internet.
• Do you have any hobbies or interests for which the Internet might provide useful information?
![Page 58: Improving the effectiveness of Web searching: Methodological issues](https://reader033.fdocuments.in/reader033/viewer/2022051700/56816016550346895dcf1740/html5/thumbnails/58.jpg)
Hölscher & Strube (2000): Graphical Representation
Close-up of direct interaction with a search engine: numbers show transition probabilities.
Experts and novices doing specific search tasks
![Page 59: Improving the effectiveness of Web searching: Methodological issues](https://reader033.fdocuments.in/reader033/viewer/2022051700/56816016550346895dcf1740/html5/thumbnails/59.jpg)
Set searches
Heads:
• What was written on Neville Chamberlain’s piece of paper?
• You’ve won a holiday to Saga. What can you find out about the place that interests you?
![Page 60: Improving the effectiveness of Web searching: Methodological issues](https://reader033.fdocuments.in/reader033/viewer/2022051700/56816016550346895dcf1740/html5/thumbnails/60.jpg)
Set searches
Tails:
• You’ve received a postcard from friends who say they’re visiting Map. Where are they?
• There are many opportunities to win things on the Internet. Can you find some that relate to your interests?
Additional search:
• Find the postcode of the tallest building in the UK outside of London.
![Page 61: Improving the effectiveness of Web searching: Methodological issues](https://reader033.fdocuments.in/reader033/viewer/2022051700/56816016550346895dcf1740/html5/thumbnails/61.jpg)
All searches recorded using Spector Pro (keystroke recorder) and My Screen Recorder (which records voice + activities on PC).
![Page 62: Improving the effectiveness of Web searching: Methodological issues](https://reader033.fdocuments.in/reader033/viewer/2022051700/56816016550346895dcf1740/html5/thumbnails/62.jpg)
Annotated transcripts
Callouts: time at which stated action takes place; browse time preceding action.
Search 1
00.50 “I might as well go with what I know best”
01.20 (enters ‘CD albums collection’)
01.27 (6s browse) Selects 2nd link (CD universe)
01.53 (31s browse) – selects Dance = 7 of ? (>24) (on LHS). “See this is the trouble, cos I don’t really know what category it would go into. It was a mixed CD so it’s got all sorts of different things on, and there’s not really a category for that, I don’t think.”
01.56 (8s browse) – Selects Dance Collections = 7 of 12 (top of page)
![Page 63: Improving the effectiveness of Web searching: Methodological issues](https://reader033.fdocuments.in/reader033/viewer/2022051700/56816016550346895dcf1740/html5/thumbnails/63.jpg)
Search dimensions
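
| Volunteer | Search no. | On | Off | On / (On+Off) | Depth | Intensity: Mean (s.d.) |
|---|---|---|---|---|---|---|
| 1 | 1 | 2 | 1 | 0.67 | 1 | 43.33 (24.66) |
| 1 | 2 | 10 | 8 | 0.56 | 2 | 14.72 (15.1) |
| 1 | 3 | 6 | 5 | 0.55 | 3 | 12.27 (11.26) |
| 1 | 4 | 3 | 1 | 0.75 | 1 | 7.5 (6.45) |
| 2 | 1 | 30 | 14 | 0.68 | 6 | 4.55 (6.36) |
| 2 | 2 | 22 | 8 | 0.73 | 2 | 7.67 (9.8) |
| 2 | 3 | 8 | 1 | 0.89 | 1 | 13.33 (16.96) |
| 2 | 4 | 24 | 2 | 0.92 | 1 | 6.73 (12.88) |
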
![Page 64: Improving the effectiveness of Web searching: Methodological issues](https://reader033.fdocuments.in/reader033/viewer/2022051700/56816016550346895dcf1740/html5/thumbnails/64.jpg)
Progress
ca. 54 volunteers observed since Oct 2005 (representing c. 200 searches).
![Page 65: Improving the effectiveness of Web searching: Methodological issues](https://reader033.fdocuments.in/reader033/viewer/2022051700/56816016550346895dcf1740/html5/thumbnails/65.jpg)
cf Transaction Logs
Internet searches are often regarded as being ‘shallow and promiscuous’ (= many short, simple searches). This idea supports the perception of searches viewed from search engine transaction logs. Such logs give a useful summary of search engine use, but not of Web search behaviour viewed as a whole.
![Page 66: Improving the effectiveness of Web searching: Methodological issues](https://reader033.fdocuments.in/reader033/viewer/2022051700/56816016550346895dcf1740/html5/thumbnails/66.jpg)
Feedback loops
Learn from previous searches
E.g. semantic shifts
Sheffield Pals Battalion
Richard Sparling
![Page 67: Improving the effectiveness of Web searching: Methodological issues](https://reader033.fdocuments.in/reader033/viewer/2022051700/56816016550346895dcf1740/html5/thumbnails/67.jpg)
Complex search ≠ good search
Familiarity with search engine facilities (Boolean, “”, etc) does not always indicate competence. E.g.: postcode "tallest building outside london" –london.
![Page 68: Improving the effectiveness of Web searching: Methodological issues](https://reader033.fdocuments.in/reader033/viewer/2022051700/56816016550346895dcf1740/html5/thumbnails/68.jpg)
Use the general to find the specialist
Search engine used to find a more focussed search tool. E.g. – searcher looking for info on B&B in York finds a directory of holiday accommodation.
![Page 69: Improving the effectiveness of Web searching: Methodological issues](https://reader033.fdocuments.in/reader033/viewer/2022051700/56816016550346895dcf1740/html5/thumbnails/69.jpg)
• Jansen ref re complexity
• Findings title
• Search dimensions slide
• Database side – modelling.
![Page 70: Improving the effectiveness of Web searching: Methodological issues](https://reader033.fdocuments.in/reader033/viewer/2022051700/56816016550346895dcf1740/html5/thumbnails/70.jpg)
![Page 71: Improving the effectiveness of Web searching: Methodological issues](https://reader033.fdocuments.in/reader033/viewer/2022051700/56816016550346895dcf1740/html5/thumbnails/71.jpg)
Strengths
• Large sample.
• Natural environment.
• Definitely general public.
Limitations
• No enquiry context – what are they looking for? What are they thinking?
• No measure of success.
• Are they searching or just browsing?
• Where does one enquiry end and another begin?
• Limited to one search engine – what did they do during a delay?
![Page 72: Improving the effectiveness of Web searching: Methodological issues](https://reader033.fdocuments.in/reader033/viewer/2022051700/56816016550346895dcf1740/html5/thumbnails/72.jpg)
Experimental Study
• Strengths
– Very detailed information.
– Searching not surfing.
– Comparison of identical enquiries.
• Limitations
– Small sample of queries.
– Limited public sample – volunteers.
![Page 73: Improving the effectiveness of Web searching: Methodological issues](https://reader033.fdocuments.in/reader033/viewer/2022051700/56816016550346895dcf1740/html5/thumbnails/73.jpg)
This work
• Development of quantitative analysis
• Analysis of search logs (Excite 2001)
• Development of descriptive codes
• Aim is to form a basis for the analysis of our experimental data
![Page 74: Improving the effectiveness of Web searching: Methodological issues](https://reader033.fdocuments.in/reader033/viewer/2022051700/56816016550346895dcf1740/html5/thumbnails/74.jpg)
Aims of Quantitative Analysis
• To look at textual (syntactic) changes.
• Link queries by text similarity.
• Infer enquiry change from textual dissimilarity.
• Use these elements to develop a machine-readable codification of QT’s.
![Page 75: Improving the effectiveness of Web searching: Methodological issues](https://reader033.fdocuments.in/reader033/viewer/2022051700/56816016550346895dcf1740/html5/thumbnails/75.jpg)
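Word similarity
Dice Coefficient: $S_W = \frac{2c}{a+b}$, where $a$ and $b$ are the lengths of the two words and $c$ is the number of letters that coincide once one word has been shifted along the other to the best match.
Example: elected / election share five letters (e l e c t) at the best shift, so $S_W = \frac{2 \times 5}{8 + 7} = \frac{10}{15} = 0.667$
Drawback: On this measure doing and going are very similar (0.8) while bug and debugging have $S_W = 0.5$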
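The word similarity measure can be sketched in Python as below. This is a minimal sketch rather than the authors' program: the function name `word_similarity` and the exhaustive search over every shift are assumptions, chosen because they reproduce the values quoted on the slides (0.667, 0.8 and 0.5). Case handling is left to the caller.

```python
def word_similarity(w1: str, w2: str) -> float:
    """Dice-style similarity 2c / (a + b), with c counted at the best shift.

    a and b are the word lengths; c is the largest number of positions at
    which the two words carry the same letter over all relative shifts.
    """
    a, b = len(w1), len(w2)
    if a == 0 or b == 0:
        return 0.0
    best = 0
    # offset is the position of w1[0] relative to w2[0]
    for offset in range(-(a - 1), b):
        matches = sum(
            1
            for i, ch in enumerate(w1)
            if 0 <= i + offset < b and w2[i + offset] == ch
        )
        best = max(best, matches)
    return 2 * best / (a + b)
```

For example, `word_similarity("doing", "going")` gives 0.8 and `word_similarity("bug", "debugging")` gives 0.5, which is the drawback noted on the slide.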
![Page 76: Improving the effectiveness of Web searching: Methodological issues](https://reader033.fdocuments.in/reader033/viewer/2022051700/56816016550346895dcf1740/html5/thumbnails/76.jpg)
Word Similarity Threshold
ding / ping: $S_W = \frac{6}{8} = 0.75$
bring / thing: $S_W = \frac{6}{10} = 0.6$
trying / string: $S_W = \frac{6}{12} = 0.5$
nursing / training: $S_W = \frac{6}{15} = 0.4$
• Partial solution: introduce threshold WST = 0.4
• Anything less similar than WST is given $S_W = 0$
![Page 77: Improving the effectiveness of Web searching: Methodological issues](https://reader033.fdocuments.in/reader033/viewer/2022051700/56816016550346895dcf1740/html5/thumbnails/77.jpg)
Query Similarity
• For each word in query 1 find the most similar word in query 2 and combine results
• Accommodates repeated words (in query 2) without weighting
• Main point of WST is to avoid the accumulation of many small contributions to the query similarity
![Page 78: Improving the effectiveness of Web searching: Methodological issues](https://reader033.fdocuments.in/reader033/viewer/2022051700/56816016550346895dcf1740/html5/thumbnails/78.jpg)
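Query Similarity Example
Query 1: leaf gelatin supplier barcelona
Query 2: gelatine supplies in spain
Word scores for query 1: leaf = 0, gelatin = 0.93, supplier = 0.88, barcelona = 0
$S_Q = \frac{\text{sum of scores}}{\text{max number of words}} = \frac{0 + 0.93 + 0.88 + 0}{4} = 0.453$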
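The worked example can be checked with a short Python sketch. It is an illustration under stated assumptions, not the original implementation: the names `word_similarity` and `query_similarity` are hypothetical, queries are split on whitespace and lower-cased, and a word score below WST is treated as 0 before summing.

```python
def word_similarity(w1: str, w2: str) -> float:
    """2c / (a + b), where c is the best letter overlap over all shifts."""
    a, b = len(w1), len(w2)
    if a == 0 or b == 0:
        return 0.0
    best = max(
        sum(1 for i, ch in enumerate(w1)
            if 0 <= i + off < b and w2[i + off] == ch)
        for off in range(-(a - 1), b)
    )
    return 2 * best / (a + b)


def query_similarity(q1: str, q2: str, wst: float = 0.4) -> float:
    """Sum of best word scores for q1 against q2, over the max word count."""
    words1, words2 = q1.lower().split(), q2.lower().split()
    if not words1 or not words2:
        return 0.0
    total = 0.0
    for w1 in words1:
        best = max(word_similarity(w1, w2) for w2 in words2)
        if best >= wst:  # anything less similar than WST contributes 0
            total += best
    return total / max(len(words1), len(words2))
```

With unrounded word scores (14/15 and 14/16) the two example queries give about 0.452; the slide's 0.453 follows from using the rounded scores 0.93 and 0.88.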
![Page 79: Improving the effectiveness of Web searching: Methodological issues](https://reader033.fdocuments.in/reader033/viewer/2022051700/56816016550346895dcf1740/html5/thumbnails/79.jpg)
Query Similarity Threshold
We are looking for the most similar previous query j to query i
[Figure: timeline showing query i and an earlier query j]
If none are similar, maybe i is a new enquiry
Set QST = 0.3 as lowest acceptable similarity for a valid query connection
![Page 80: Improving the effectiveness of Web searching: Methodological issues](https://reader033.fdocuments.in/reader033/viewer/2022051700/56816016550346895dcf1740/html5/thumbnails/80.jpg)
Setting WST and QST
• Result narrowed down by close inspection
• In the first 300 queries, the setting WST = 0.4 and QST = 0.3 agreed with a human analysis of the best categorisation in all cases bar one, which was in any case an unusual entry.
![Page 81: Improving the effectiveness of Web searching: Methodological issues](https://reader033.fdocuments.in/reader033/viewer/2022051700/56816016550346895dcf1740/html5/thumbnails/81.jpg)
Algorithm
for i = 1 to n (loop over session queries)
    for j = 1 to i-1 (loop over previous queries)
        Compare query i with j
    Choose most similar pair i,j; assign k = j
    Analyse to assign QT type
[Figure: timeline of queries 1 to n, with query i linked back to an earlier query j = k]
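The double loop can be sketched in Python as follows. This is a hedged illustration: `link_queries` is a hypothetical name, the similarity measure is passed in as a function, and the word-overlap function `jaccard` is only a stand-in for the demonstration, not the similarity used in the study. Ties are resolved in favour of the earliest query, which the slides do not specify.

```python
from typing import Callable, List, Optional


def link_queries(
    queries: List[str],
    similarity: Callable[[str, str], float],
    qst: float = 0.3,
) -> List[Optional[int]]:
    """For each query i, return the 1-based index k of the most similar
    earlier query, or None when no earlier query reaches QST."""
    links: List[Optional[int]] = []
    for i, query in enumerate(queries):          # loop over session queries
        best_k, best_score = None, 0.0
        for j in range(i):                       # loop over previous queries
            score = similarity(query, queries[j])
            if score >= qst and score > best_score:
                best_k, best_score = j + 1, score
        links.append(best_k)
    return links


def jaccard(q1: str, q2: str) -> float:
    """Stand-in similarity for the demo only: shared words / all words."""
    s1, s2 = set(q1.lower().split()), set(q2.lower().split())
    return len(s1 & s2) / len(s1 | s2) if (s1 | s2) else 0.0
```

A query with no link (`None`) is a candidate for N, a new enquiry; a linked query would then be analysed against query k to assign its transformation code.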
![Page 82: Improving the effectiveness of Web searching: Methodological issues](https://reader033.fdocuments.in/reader033/viewer/2022051700/56816016550346895dcf1740/html5/thumbnails/82.jpg)
“Trivial” Transformations

| Code | Transformation |
|---|---|
| U | Unique |
| N | New query |
| R | Repeated query |
| P | Page viewing (seek more) |
| p | Page viewing (earlier pages) |
![Page 83: Improving the effectiveness of Web searching: Methodological issues](https://reader033.fdocuments.in/reader033/viewer/2022051700/56816016550346895dcf1740/html5/thumbnails/83.jpg)
Substantive Transformations I

| Code | Transformation (relative to k) |
|---|---|
| I(k) | Identical |
| J(k) | Identical apart from Quotes/Boolean |
| C(k) | Conjoint |
| D(k) | Disjoint |
| S(k) | Sub-phrase in common |
| s(k) | Sub-phrase + words in common |
![Page 84: Improving the effectiveness of Web searching: Methodological issues](https://reader033.fdocuments.in/reader033/viewer/2022051700/56816016550346895dcf1740/html5/thumbnails/84.jpg)
Substantive Transformations II

| Code | Transformation (relative to k) |
|---|---|
| W(k) | Single word in common |
| w(k) | Separated single words in common |
| M(k) | Other textual similarity |

Below Threshold Similarity

| Code | Transformation (relative to k) |
|---|---|
| Z(k) | Not similar but word in common |
| z(k) | Not similar but words in common |
![Page 85: Improving the effectiveness of Web searching: Methodological issues](https://reader033.fdocuments.in/reader033/viewer/2022051700/56816016550346895dcf1740/html5/thumbnails/85.jpg)
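Target: one two three (written as 123)

| Group | Comparison | Symbol | Type |
|---|---|---|---|
| Basic transformations | 1234 | C | Conjunction |
| Basic transformations | 12 | D | Disjunction |
| Common sub-phrase | 124 | S | Replacement |
| Common sub-phrase | 231 | s | Reordering |
| Common sub-phrase | 1243 | s | Insertion/removal |
| Common word | 145 | W | Replacement |
| Common word | 132 | w | Reordering |
| Common word | 143 | w | Replacement/insertion |
| Below threshold similarity | 1456 | Z | Common word |
| Below threshold similarity | 1245678 | z | Common phrase |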
![Page 86: Improving the effectiveness of Web searching: Methodological issues](https://reader033.fdocuments.in/reader033/viewer/2022051700/56816016550346895dcf1740/html5/thumbnails/86.jpg)
Supplementary Transformations

| Code | Transformation |
|---|---|
| B | Include Boolean term |
| b | Remove Boolean term |
| Q | Include quote marks |
| q | Remove quote marks |
| _ | Delay > 1 hour |
![Page 87: Improving the effectiveness of Web searching: Methodological issues](https://reader033.fdocuments.in/reader033/viewer/2022051700/56816016550346895dcf1740/html5/thumbnails/87.jpg)
Example full transformation
May include up to 4 terms, e.g. BQC(4)_
• B: Boolean
• Q: Quote Marks
• C(4): Substantive
• _: Delay
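A modification list such as `N_NC(2)PPPPW(3)NC(9)P` can be split back into individual transformations with a small tokenizer. The sketch below assumes the term order shown above (Boolean, quote marks, main code with optional reference, delay); the names `QT` and `parse_qt_list` and the regular expression are illustrative assumptions, not part of the original program.

```python
import re
from typing import List, NamedTuple, Optional


class QT(NamedTuple):
    boolean: str          # "B", "b" or "" when absent
    quotes: str           # "Q", "q" or "" when absent
    code: str             # main transformation letter, e.g. "N", "C", "s"
    ref: Optional[int]    # k, the query the transformation is relative to
    delay: bool           # True when the code ends with "_"


# Optional B/b, optional Q/q, one main code, optional (k), optional "_".
_QT_RE = re.compile(r"([Bb]?)([Qq]?)([UNRPpIJCDSsWwMZz])(?:\((\d+)\))?(_?)")


def parse_qt_list(text: str) -> List[QT]:
    """Split a concatenated modification list into QT tokens."""
    tokens: List[QT] = []
    pos = 0
    while pos < len(text):
        m = _QT_RE.match(text, pos)
        if m is None:
            raise ValueError(f"cannot parse {text!r} at position {pos}")
        boolean, quotes, code, ref, delay = m.groups()
        tokens.append(
            QT(boolean, quotes, code, int(ref) if ref else None, delay == "_")
        )
        pos = m.end()
    return tokens
```

Parsing `BQC(4)_` yields a single token with all four terms set, and the number of tokens in a list matches the number of queries in the session.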
![Page 88: Improving the effectiveness of Web searching: Methodological issues](https://reader033.fdocuments.in/reader033/viewer/2022051700/56816016550346895dcf1740/html5/thumbnails/88.jpg)
Some examples

| Code | Query1 | Query2 |
|---|---|---|
| QJ(k) | bargain music | “bargain music” |
| QC(k) | Bacteremia | “Pneumoccol Bacteremia” |
| qJ(k) | “university of texas” “alternative medicine” | university of texas” “alternative medicine” |
| qw(k) | "tax law_depreciation system" | tax law/depreciation system |
| BC(k) | "the sopranos" | "the sopranos" +scripts |
| BJ(k) | +"Complaint form letters" Insurance | +"Complaint form letters" +Insurance |
| BS(k) | doppler effect labs | doppler effect +lab |
![Page 89: Improving the effectiveness of Web searching: Methodological issues](https://reader033.fdocuments.in/reader033/viewer/2022051700/56816016550346895dcf1740/html5/thumbnails/89.jpg)
More examples

| Code | Query1 | Query2 |
|---|---|---|
| Bs(k) | conferences image processing | +image +processing +conferences +finland |
| BqW(k) | "Craig Larman" | +Larman +Valtech |
| BqZ(k) | +"lbp 1000" +review | +canon +review +laser +printer |
| BqW(k) | Hevia AND bagpipe | "Spanish bagpipe" |
| bQs(k) | +used +horse +trailer | +arndt +"horse trailer" used |
| bqW(k) | +arndt +"horse trailer" used | +Arndt trailer |
| bqs(k) | +Moby +southside +"Gwen Stefani" +mp3 | +Moby +southside +mp3 |
![Page 90: Improving the effectiveness of Web searching: Methodological issues](https://reader033.fdocuments.in/reader033/viewer/2022051700/56816016550346895dcf1740/html5/thumbnails/90.jpg)
Output for the first 100 Excite queries
Source file: excite.txt
word modification threshold : 0.400000
query modification level : 0.300000
sub-session delay/s : 3600

| qid0 | uid | nq | Modification list |
|---|---|---|---|
| 1 | 1 | 1 | U |
| 2 | 2 | 5 | NW(1)_NPP |
| 7 | 3 | 4 | NS(1)PP |
| 11 | 4 | 1 | U |
| 12 | 5 | 1 | U |
| 13 | 6 | 1 | U |
| 14 | 7 | 5 | N_QNPPP |
| 19 | 8 | 4 | NPPP |
| 23 | 9 | 1 | U |
| 24 | 10 | 4 | NQJ(1)NQN |
| 28 | 11 | 5 | N_NN_NP |
| 33 | 12 | 2 | N_N |
| 35 | 13 | 3 | NR_R |
| 38 | 14 | 1 | U |
| 39 | 15 | 1 | U |
| 40 | 16 | 4 | NM(1)RN |
| 44 | 17 | 21 | N_N_NC(1)PPPPNW(9)PPPPC(10)PPPPPP |
| 65 | 18 | 2 | NP |
| 67 | 19 | 10 | NRPC(1)RP_NS(7)D(7)I(7) |
| 77 | 20 | 1 | QU |
| 78 | 21 | 1 | U |
| 79 | 22 | 1 | U |
| 80 | 23 | 1 | U |
| 81 | 24 | 1 | U |
| 82 | 25 | 1 | QU |
| 83 | 26 | 11 | N_NC(2)PPPPW(3)NC(9)P |
| 94 | 27 | 5 | NNW(2)RR |
| 99 | 28 | 1 | U |
| 100 | 29 | 3 | NW(1)_M(1) |

Highlighted entry (uid 26): N_NC(2)PPPPW(3)NC(9)P
![Page 91: Improving the effectiveness of Web searching: Methodological issues](https://reader033.fdocuments.in/reader033/viewer/2022051700/56816016550346895dcf1740/html5/thumbnails/91.jpg)
One session - 3 sub-sessions

| n | qid | uid | time | rank | query | querymore | totwords | QT |
|---|---|---|---|---|---|---|---|---|
| 1 | 83 | 000000000000001a | 083122 | 0 | chicago sun times | No | 3 | N_ |
| 2 | 84 | 000000000000001a | 105439 | 0 | f8 | No | 1 | N |
| 3 | 85 | 000000000000001a | 105453 | 0 | f8 airplane | No | 2 | C(2) |
| 4 | 86 | 000000000000001a | 105536 | 10 | f8 airplane | No | 2 | P |
| 5 | 87 | 000000000000001a | 105614 | 20 | f8 airplane | No | 2 | P |
| 6 | 88 | 000000000000001a | 105630 | 30 | f8 airplane | No | 2 | P |
| 7 | 89 | 000000000000001a | 105731 | 40 | f8 airplane | No | 2 | P |
| 8 | 90 | 000000000000001a | 105740 | 0 | airplanes f8 | No | 2 | W(3) |
| 9 | 91 | 000000000000001a | 113441 | 0 | ceo compensation | No | 2 | N |
| 10 | 92 | 000000000000001a | 113633 | 0 | 2000 ceo compensation | No | 3 | C(9) |
| 11 | 93 | 000000000000001a | 113752 | 10 | 2000 ceo compensation | No | 3 | P |
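The delay marker on the first query of this session can be reproduced from the time column. The sketch below is an assumption-laden illustration: it reads the time field as HHMMSS within a single day, attaches the flag to the query that precedes the gap (as in `N_` above), and uses the 3600 s sub-session delay from the program output. The function names are hypothetical.

```python
from typing import List


def to_seconds(hhmmss: str) -> int:
    """Convert an HHMMSS string to seconds since midnight."""
    return int(hhmmss[0:2]) * 3600 + int(hhmmss[2:4]) * 60 + int(hhmmss[4:6])


def delay_flags(times: List[str], delay_s: int = 3600) -> List[bool]:
    """flags[i] is True when the gap after query i exceeds delay_s seconds."""
    if not times:
        return []
    secs = [to_seconds(t) for t in times]
    flags = [nxt - cur > delay_s for cur, nxt in zip(secs, secs[1:])]
    return flags + [False]  # nothing follows the last query
```

For the session above only the first gap (08:31:22 to 10:54:39) exceeds one hour; the gap before "ceo compensation" is about 37 minutes, so that query starts a new enquiry (N) without a delay marker.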
![Page 92: Improving the effectiveness of Web searching: Methodological issues](https://reader033.fdocuments.in/reader033/viewer/2022051700/56816016550346895dcf1740/html5/thumbnails/92.jpg)
Query lengths
[Figure: frequency against length in queries, for sessions and sub-sessions, on logarithmic axes]
10% of sub-sessions are at least 7 queries in length
![Page 93: Improving the effectiveness of Web searching: Methodological issues](https://reader033.fdocuments.in/reader033/viewer/2022051700/56816016550346895dcf1740/html5/thumbnails/93.jpg)
QT relative frequencies
[Figure: percentage frequency (0 to 35) of each query transformation: U N P p R I J C D S s W w M Z z B b Q q _]
![Page 94: Improving the effectiveness of Web searching: Methodological issues](https://reader033.fdocuments.in/reader033/viewer/2022051700/56816016550346895dcf1740/html5/thumbnails/94.jpg)
Terminal QT’s
[Figure: terminal QT ratio (0 to 1.2) for each query transformation: U N P p R I J C D S s W w M Z z B b Q q _]
$\text{Ratio} = \frac{\text{FinalFreq}(QT)}{\text{Freq}(QT)}$
i.e.: the last queries in a sub-session
![Page 95: Improving the effectiveness of Web searching: Methodological issues](https://reader033.fdocuments.in/reader033/viewer/2022051700/56816016550346895dcf1740/html5/thumbnails/95.jpg)
QT graphs
[Figure: query transformation graph for one session, running from Start to END; nodes are numbered queries and edges carry QT codes such as M, C, S, s, R and P(14)]
uid 74: NM(1)C(2)C(3)S(4)s(5)PPRPRRRRPPRRppI(5)s(6)s(22)s(22)s(23)s(25)s(26)s(22)R
Queries shown: nursing careers; paid undergraduate nursing schools in baltimore city maryland
![Page 96: Improving the effectiveness of Web searching: Methodological issues](https://reader033.fdocuments.in/reader033/viewer/2022051700/56816016550346895dcf1740/html5/thumbnails/96.jpg)
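QT graphs
[Figure: query transformation graph for one session, running from Start to END; nodes are numbered queries and edges carry QT codes such as M, C, QJ, QD, P(3), P(7) and Delay]
uid 342: NM(1)C(2)QJ(3)_C(2)PI(2)PPPPPPPC(2)PPPQJ(15)QD(15)
Queries shown: molsworth; "us army"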
![Page 97: Improving the effectiveness of Web searching: Methodological issues](https://reader033.fdocuments.in/reader033/viewer/2022051700/56816016550346895dcf1740/html5/thumbnails/97.jpg)
Frequency of nodes with k connections
[Figure: ln(f) against k (0 to 10) for query length 10 and query length 20; slope = -1]
Exponential scaling
![Page 98: Improving the effectiveness of Web searching: Methodological issues](https://reader033.fdocuments.in/reader033/viewer/2022051700/56816016550346895dcf1740/html5/thumbnails/98.jpg)
Intra-QT correlations
• f (A,B) measured coincident frequency of codes A and B
• E{} Expected value
• V{} Variance
$$D_f(A_i, A_j) = \frac{f(A_i, A_j) - E\{f(A_i, A_j)\}}{\sqrt{V\{f(A_i, A_j)\}}}$$
Correlations within a transform e.g. [BQC(3)_]
![Page 99: Improving the effectiveness of Web searching: Methodological issues](https://reader033.fdocuments.in/reader033/viewer/2022051700/56816016550346895dcf1740/html5/thumbnails/99.jpg)
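Intra-QT correlations

| Type | B | b | Q | q | _ |
|---|---|---|---|---|---|
| U | 20.60 | – | 1.32 | – | – |
| N | -1.48 | – | 23.26 | – | 78.27 |
| P | – | – | – | – | -66.16 |
| p | – | – | – | – | -9.63 |
| R | – | – | – | – | 10.53 |
| I | – | – | – | – | 4.45 |
| J | 61.85 | 47.37 | 136.42 | 78.37 | -5.74 |
| C | 46.02 | -42.81 | -15.14 | -19.22 | -4.70 |
| D | -34.07 | 62.20 | -15.09 | 13.45 | -4.79 |
| S | -24.52 | -11.14 | -20.69 | -7.63 | -5.65 |
| s | -2.62 | 9.93 | -7.05 | 3.65 | -8.04 |
| W | -35.00 | -10.35 | -32.99 | -6.81 | -6.05 |
| w | -2.63 | 9.14 | -11.51 | -0.98 | -8.18 |
| M | -21.05 | -12.98 | -37.31 | -13.28 | -1.97 |
| Z | -2.26 | 14.11 | -10.06 | 2.23 | -0.90 |
| z | 1.78 | 2.82 | 0.55 | 1.45 | 0.95 |
| B | 0.00 | – | 1.16 | 76.78 | -15.01 |
| b | – | 0.00 | 74.95 | 10.05 | -11.07 |
| Q | 1.16 | 74.95 | 0.00 | – | -0.28 |
| q | 76.78 | 10.05 | – | 0.00 | -7.77 |
| _ | -15.01 | -11.07 | -0.28 | -7.77 | 0.00 |

Example: [BQC(3)_]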
![Page 100: Improving the effectiveness of Web searching: Methodological issues](https://reader033.fdocuments.in/reader033/viewer/2022051700/56816016550346895dcf1740/html5/thumbnails/100.jpg)
Some Observations
• Quote marks are likely to be used with a new query.
• Delay is strongly associated with N: these are successful single queries within a session.
• B & C are positively associated
• B & D are negatively associated
![Page 101: Improving the effectiveness of Web searching: Methodological issues](https://reader033.fdocuments.in/reader033/viewer/2022051700/56816016550346895dcf1740/html5/thumbnails/101.jpg)
Application to Experimental Results
![Page 102: Improving the effectiveness of Web searching: Methodological issues](https://reader033.fdocuments.in/reader033/viewer/2022051700/56816016550346895dcf1740/html5/thumbnails/102.jpg)
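Query Transforms

| qid | SS | Query | QM(similarity) | QM(preceding) |
|---|---|---|---|---|
| 1 | * | CD albums collection | N | N |
| 2 | | CD albums collection | R | R |
| 3 | * | Autotrader | N | N |
| 4 | * | atlas | N | N |
| 5 | * | place names | N | N |
| 6 | | place names | R | R |
| 7 | * | map | N | N |
| 8 | * | online competitions | N | N |
| 9 | * | Tall British buildings | N | N |
| 10 | | Tall buildings | w(9) | w(9) |
| 11 | | Tall buildings | R | R |
| 12 | | Tall buildings | R | R |
| 13 | | Tall buildings in Britain | w(9) | C(12) |
| 14 | | Tallest building outside London | M(9) | M(13) |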
![Page 103: Improving the effectiveness of Web searching: Methodological issues](https://reader033.fdocuments.in/reader033/viewer/2022051700/56816016550346895dcf1740/html5/thumbnails/103.jpg)
Temporal Database
• A repository of all data for each session
• Accessible to SQL
• Used to build evidence-based models for searching
[Diagram: data sets linked by uid and qid]
• Background details, Web experience, Cognitive style scores
• Subjects' appraisal of searches
• Search queries, Web page titles
• Key stroke record, Activity timings
• Query modification codes
• Qualitative analysis
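How such a repository might be queried through SQL can be sketched with Python's built-in `sqlite3` module. Everything in this sketch is hypothetical: the table and column names, the choice of SQLite, and the placeholder subject row are assumptions made for illustration; only the uid and qid keys and the two example queries with their codes come from the slides.

```python
import sqlite3

# Hypothetical schema: one row per subject, per query and per QT code.
SCHEMA = """
CREATE TABLE subject (
    uid INTEGER PRIMARY KEY,
    web_experience TEXT,
    cognitive_style_score REAL
);
CREATE TABLE query (
    qid INTEGER PRIMARY KEY,
    uid INTEGER NOT NULL REFERENCES subject(uid),
    query_text TEXT NOT NULL
);
CREATE TABLE qt_code (
    qid INTEGER NOT NULL REFERENCES query(qid),
    code TEXT NOT NULL
);
"""


def build_demo_db() -> sqlite3.Connection:
    """Create an in-memory database holding two queries from the slides."""
    conn = sqlite3.connect(":memory:")
    conn.executescript(SCHEMA)
    # Placeholder subject values, not real study data.
    conn.execute("INSERT INTO subject VALUES (1, 'placeholder', 0.0)")
    conn.executemany(
        "INSERT INTO query VALUES (?, ?, ?)",
        [(9, 1, "Tall British buildings"), (10, 1, "Tall buildings")],
    )
    conn.executemany(
        "INSERT INTO qt_code VALUES (?, ?)", [(9, "N"), (10, "w(9)")]
    )
    return conn


def codes_for_subject(conn: sqlite3.Connection, uid: int):
    """Return (qid, query_text, code) rows for one subject, in qid order."""
    return conn.execute(
        "SELECT q.qid, q.query_text, c.code "
        "FROM query q JOIN qt_code c ON c.qid = q.qid "
        "WHERE q.uid = ? ORDER BY q.qid",
        (uid,),
    ).fetchall()
```

Joining on qid and filtering on uid is the kind of access the diagram suggests: query-level records (codes, keystrokes, qualitative analysis) hang off qid, while subject-level records hang off uid.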
![Page 104: Improving the effectiveness of Web searching: Methodological issues](https://reader033.fdocuments.in/reader033/viewer/2022051700/56816016550346895dcf1740/html5/thumbnails/104.jpg)
Acknowledgments
• Arts and Humanities Research Council (formerly Board) for funding
• Mark Sanderson and Amanda Spink for making the Excite logs available
![Page 105: Improving the effectiveness of Web searching: Methodological issues](https://reader033.fdocuments.in/reader033/viewer/2022051700/56816016550346895dcf1740/html5/thumbnails/105.jpg)
Questions ?
![Page 106: Improving the effectiveness of Web searching: Methodological issues](https://reader033.fdocuments.in/reader033/viewer/2022051700/56816016550346895dcf1740/html5/thumbnails/106.jpg)
Setting WST and QST
[Figure: excite, WST = 0.4; frequency (0 to 350000) against values from 0 to 1 on the horizontal axis (labelled Query Transformation); series: Tot New, Tot Mod, z+Z]
![Page 107: Improving the effectiveness of Web searching: Methodological issues](https://reader033.fdocuments.in/reader033/viewer/2022051700/56816016550346895dcf1740/html5/thumbnails/107.jpg)
Inter-QT correlations
• f ( A | B ) measured frequency of codes B following A
• E{} Expected value
• V{} Variance
$$D_f(A_i \mid B_j) = \frac{f(A_i \mid B_j) - E\{f(A_i \mid B_j)\}}{\sqrt{V\{f(A_i \mid B_j)\}}}$$
Correlations of one transform with the next.
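A score of this form can be computed from sequences of codes as sketched below. The slides do not say how the expected value and variance were obtained, so this sketch assumes the simplest model: prior and posterior codes are independent, giving a binomial expectation and variance for each pair count. The function name `inter_qt_scores` and the (prior, posterior) key order are also assumptions.

```python
from collections import Counter
from math import sqrt
from typing import Dict, List, Tuple


def inter_qt_scores(sessions: List[List[str]]) -> Dict[Tuple[str, str], float]:
    """Map (prior, posterior) -> (f - E{f}) / sqrt(V{f}).

    f is the observed count of `posterior` directly following `prior`.
    E and V come from an assumed independence (binomial) model.
    """
    pairs: Counter = Counter()
    for codes in sessions:
        pairs.update(zip(codes, codes[1:]))   # consecutive code pairs
    n = sum(pairs.values())
    if n == 0:
        return {}
    prior: Counter = Counter()
    posterior: Counter = Counter()
    for (a, b), f in pairs.items():
        prior[a] += f
        posterior[b] += f
    scores: Dict[Tuple[str, str], float] = {}
    for a in prior:
        for b in posterior:
            p = (prior[a] / n) * (posterior[b] / n)   # independence assumption
            expected = n * p
            variance = n * p * (1 - p)                # binomial variance
            if variance > 0:
                scores[(a, b)] = (pairs[(a, b)] - expected) / sqrt(variance)
    return scores
```

A positive score means the pair occurs more often than independence would predict, for example P following P in page-viewing runs; a negative score means it occurs less often.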
![Page 108: Improving the effectiveness of Web searching: Methodological issues](https://reader033.fdocuments.in/reader033/viewer/2022051700/56816016550346895dcf1740/html5/thumbnails/108.jpg)
Inter-QT correlations
Rows: posterior transformation. Columns: prior transformation.

| Type | N | P | p | R | I | J | C | D | S | s | W | w | M | Z | z | B | b | Q | q | _ |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| N | 82.40 | -39.20 | -2.92 | 13.26 | 22.95 | 2.22 | 10.77 | 9.92 | 5.92 | -2.37 | 23.22 | 2.37 | 30.55 | 11.24 | 3.99 | 22.86 | 8.45 | 17.84 | 6.60 | 102.17 |
| P | -42.39 | 323.03 | 9.91 | -15.98 | -17.58 | -9.12 | -4.10 | -6.83 | -12.90 | -5.45 | -19.89 | -5.75 | -32.02 | -8.76 | -2.25 | -25.01 | -19.35 | -18.59 | -7.81 | -71.47 |
| p | -50.08 | 79.89 | 154.30 | 17.11 | 4.96 | -8.42 | -18.06 | -10.74 | -15.30 | -10.98 | -21.52 | -11.35 | -18.32 | -8.35 | -2.35 | -21.79 | -10.70 | -17.30 | -7.25 | 21.57 |
| R | 125.10 | -85.27 | 3.73 | 198.05 | 23.30 | -2.83 | 0.55 | -2.51 | -3.93 | -6.24 | 1.94 | -3.17 | 14.86 | 1.31 | -0.46 | -16.30 | -12.24 | -0.72 | -6.71 | 89.80 |
| I | -8.96 | -39.39 | 7.11 | 25.19 | 152.36 | 23.27 | 35.60 | 20.45 | 19.44 | 10.92 | 33.41 | 15.91 | 61.29 | 5.88 | 1.04 | 0.33 | 6.43 | -0.72 | 4.76 | 61.21 |
| J | 31.31 | -28.13 | 0.42 | -1.56 | -2.36 | 45.43 | 29.05 | 12.92 | 21.68 | 19.21 | 15.47 | 15.55 | 10.37 | 7.08 | 4.06 | 66.72 | 37.31 | 70.63 | 46.88 | -5.89 |
| C | 98.65 | -27.61 | -2.25 | -7.92 | -3.51 | 9.43 | 50.98 | -1.42 | 2.57 | -5.27 | 11.76 | -2.43 | 7.80 | 10.78 | 1.98 | 33.37 | 6.34 | 25.51 | 3.16 | -8.53 |
| D | 39.12 | -24.03 | -2.58 | -3.66 | -0.82 | 23.95 | 14.41 | 21.89 | 32.39 | 29.83 | 26.52 | 21.93 | -4.62 | 11.31 | 4.55 | 45.21 | 24.60 | 57.86 | 14.67 | 5.58 |
| S | 35.67 | -30.46 | -3.62 | -7.55 | 0.35 | 12.88 | 31.20 | 28.48 | 108.55 | 44.56 | 27.07 | 25.89 | -6.91 | 26.54 | 5.79 | 56.24 | 35.14 | 39.28 | 17.90 | 6.90 |
| s | 8.44 | -18.69 | -2.58 | -6.79 | -1.78 | 15.49 | 43.13 | 15.71 | 59.83 | 117.15 | 1.57 | 34.34 | -12.48 | 30.55 | 21.59 | 46.67 | 34.77 | 33.33 | 22.27 | 1.00 |
| W | 79.54 | -43.79 | -5.10 | -9.05 | 4.91 | 15.72 | 16.39 | 32.98 | 10.95 | -0.93 | 117.56 | 23.20 | 24.22 | 14.02 | -0.47 | 70.07 | 38.85 | 46.57 | 17.34 | 27.82 |
| w | 17.74 | -17.47 | -2.16 | -5.35 | 2.10 | 12.61 | 23.19 | 16.82 | 22.55 | 23.51 | 44.17 | 66.50 | -2.25 | 18.13 | 3.57 | 39.50 | 35.21 | 26.21 | 14.42 | 6.21 |
| M | 109.09 | -57.39 | -6.00 | 0.68 | 8.81 | 4.55 | -5.14 | 7.04 | -11.05 | -11.98 | 4.69 | -7.25 | 160.36 | -3.45 | -2.86 | 31.61 | 14.40 | 9.17 | 4.19 | 31.52 |
| Z | 37.56 | -13.24 | -3.22 | -0.98 | 1.32 | 6.09 | 9.11 | 5.53 | 17.10 | 13.88 | 5.76 | 5.96 | -2.27 | 19.33 | 3.01 | 29.60 | 10.64 | 12.79 | 6.22 | 30.99 |
| z | 9.83 | -4.61 | 0.69 | 0.25 | -0.56 | 2.35 | 2.28 | -0.82 | 7.06 | 8.53 | -0.52 | 3.29 | -2.42 | 8.85 | 20.34 | 12.08 | 4.22 | 4.48 | 2.57 | 4.33 |
| B | 61.06 | -42.37 | 3.02 | -0.11 | -3.05 | 56.39 | 36.12 | 14.63 | 22.86 | 19.43 | 33.25 | 19.98 | 23.90 | 14.39 | 4.25 | 204.51 | 70.57 | 72.24 | 51.54 | 0.67 |
| b | 38.59 | -32.48 | -8.39 | -14.33 | -4.12 | 50.59 | 17.99 | 24.07 | 35.23 | 41.38 | 27.86 | 27.48 | 12.74 | 19.47 | 9.57 | 247.85 | 145.67 | 44.35 | 48.16 | 4.51 |
| Q | 35.97 | -24.81 | -5.29 | -9.96 | -3.80 | 112.76 | 21.46 | 12.99 | 19.11 | 17.75 | 23.45 | 15.62 | 7.47 | 8.74 | 2.70 | 81.08 | 67.37 | 126.97 | 50.84 | 5.15 |
| q | 18.26 | -22.93 | -2.71 | -5.39 | -0.10 | 54.20 | 17.40 | 22.01 | 23.42 | 28.37 | 23.34 | 14.49 | 6.52 | 7.45 | 3.91 | 41.28 | 40.34 | 173.97 | 135.55 | 5.06 |
| _ | 54.44 | -16.60 | 0.96 | 28.90 | 14.56 | 0.59 | 9.51 | 3.69 | 7.01 | 0.35 | 9.44 | 1.59 | 11.49 | 3.65 | 0.14 | 0.87 | -1.84 | 4.33 | -0.49 | 65.46 |

Example: [BQC(3)_][bqD(5)]
![Page 109: Improving the effectiveness of Web searching: Methodological issues](https://reader033.fdocuments.in/reader033/viewer/2022051700/56816016550346895dcf1740/html5/thumbnails/109.jpg)
Some Observations
• Self-correlations suggest habitual tendencies
• Substantive QT’s rarely follow or precede page-viewing. They are associated with active searching.
• Delay is followed by N, a new query or R or I – suggesting memory refresh.
![Page 110: Improving the effectiveness of Web searching: Methodological issues](https://reader033.fdocuments.in/reader033/viewer/2022051700/56816016550346895dcf1740/html5/thumbnails/110.jpg)
Number of words/query: Excite 2001
[Figure: normalised frequency (0 to 0.35) against terms/query (1 to 100, logarithmic axis)]
![Page 111: Improving the effectiveness of Web searching: Methodological issues](https://reader033.fdocuments.in/reader033/viewer/2022051700/56816016550346895dcf1740/html5/thumbnails/111.jpg)
Hölscher & Strube (2000): Graphical Representation
Close-up of direct interaction with a search engine: numbers show transition probabilities.
Experts and novices doing specific search tasks
![Page 112: Improving the effectiveness of Web searching: Methodological issues](https://reader033.fdocuments.in/reader033/viewer/2022051700/56816016550346895dcf1740/html5/thumbnails/112.jpg)
Word Similarity
[Figure: elected and election compared letter by letter; a logical AND marks 1 where both words have the same letter]
Shift word along until the best match is found
![Page 113: Improving the effectiveness of Web searching: Methodological issues](https://reader033.fdocuments.in/reader033/viewer/2022051700/56816016550346895dcf1740/html5/thumbnails/113.jpg)
Motivation
• Need to develop new models for searching: update outdated usage paradigms.
• Improve training methods
• Develop automated assistance systems
![Page 114: Improving the effectiveness of Web searching: Methodological issues](https://reader033.fdocuments.in/reader033/viewer/2022051700/56816016550346895dcf1740/html5/thumbnails/114.jpg)
Context
• How do the general public search the web?
• Experimental study
– general public volunteers
– record sound, screens, keystrokes
• Goal: evidence-based model of effective searching
![Page 136: Improving the effectiveness of Web searching: Methodological issues](https://reader033.fdocuments.in/reader033/viewer/2022051700/56816016550346895dcf1740/html5/thumbnails/136.jpg)
Conclusions
• We have developed a rich set of codes describing syntactic part of QT’s
• These can be used to develop a graph-based description
• Correlations between the codes are meaningful/interesting
• They will form part of the analysis for our experimental study.
![Page 137: Improving the effectiveness of Web searching: Methodological issues](https://reader033.fdocuments.in/reader033/viewer/2022051700/56816016550346895dcf1740/html5/thumbnails/137.jpg)
Acknowledgments
• Arts and Humanities Research Council (formerly Board) for funding
• Mark Sanderson and Amanda Spink for making the Excite logs available
![Page 138: Improving the effectiveness of Web searching: Methodological issues](https://reader033.fdocuments.in/reader033/viewer/2022051700/56816016550346895dcf1740/html5/thumbnails/138.jpg)
Questions ?
![Page 139: Improving the effectiveness of Web searching: Methodological issues](https://reader033.fdocuments.in/reader033/viewer/2022051700/56816016550346895dcf1740/html5/thumbnails/139.jpg)
Setting WST and QST
excite: WST = 0.4
0
50000
100000
150000
200000
250000
300000
350000
0 0.1 0.2 0.3 0.4 0.5 0.6 0.7 0.8 0.9 1
Query Transformation
Freq
uenc
y
Tot NewTot Modz+Z
![Page 140: Improving the effectiveness of Web searching: Methodological issues](https://reader033.fdocuments.in/reader033/viewer/2022051700/56816016550346895dcf1740/html5/thumbnails/140.jpg)
Inter-QT correlations
• f(A | B): measured frequency of code B following code A
• E{}: expected value
• V{}: variance

$$D_f(A_i \mid B_j) = \frac{f(A_i \mid B_j) - E\{f(A_i \mid B_j)\}}{\sqrt{V\{f(A_i \mid B_j)\}}}$$
Correlations of one transform with the next.
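A rough sketch of how a deviation score of this kind could be computed from sequences of codes. The slide does not say how the expected value and variance are obtained, so this version assumes an independence baseline (expected count proportional to the product of the marginal totals), a binomial variance, and division by the standard deviation. The function name is hypothetical.

```python
import math
from collections import Counter


def transition_deviations(sequences):
    """Map (prior, posterior) code pairs to a standardised deviation.

    Assumed model, not taken from the slides: under independence the
    pair count is binomial with n = total pairs and
    p = P(prior) * P(posterior).
    """
    counts = Counter()
    for seq in sequences:
        counts.update(zip(seq, seq[1:]))  # adjacent (prior, posterior) pairs

    total = sum(counts.values())
    prior_tot, post_tot = Counter(), Counter()
    for (a, b), n in counts.items():
        prior_tot[a] += n
        post_tot[b] += n

    result = {}
    for a in prior_tot:
        for b in post_tot:
            p = (prior_tot[a] / total) * (post_tot[b] / total)
            expected = total * p
            variance = total * p * (1 - p)
            if variance > 0:
                observed = counts[(a, b)]  # 0 for pairs never seen
                result[(a, b)] = (observed - expected) / math.sqrt(variance)
    return result
```

Positive scores mark pairs that occur more often than the baseline predicts, negative scores pairs that occur less often; a large positive score for a code followed by itself would correspond to a self-correlation.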
![Page 141: Improving the effectiveness of Web searching: Methodological issues](https://reader033.fdocuments.in/reader033/viewer/2022051700/56816016550346895dcf1740/html5/thumbnails/141.jpg)
Inter-QT correlations
Columns: prior transformation type. Rows: posterior transformation.

| Posterior \ Prior | N | P | p | R | I | J | C | D | S | s | W | w | M | Z | z | B | b | Q | q | — |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| N | 82.40 | -39.20 | -2.92 | 13.26 | 22.95 | 2.22 | 10.77 | 9.92 | 5.92 | -2.37 | 23.22 | 2.37 | 30.55 | 11.24 | 3.99 | 22.86 | 8.45 | 17.84 | 6.60 | 102.17 |
| P | -42.39 | 323.03 | 9.91 | -15.98 | -17.58 | -9.12 | -4.10 | -6.83 | -12.90 | -5.45 | -19.89 | -5.75 | -32.02 | -8.76 | -2.25 | -25.01 | -19.35 | -18.59 | -7.81 | -71.47 |
| p | -50.08 | 79.89 | 154.30 | 17.11 | 4.96 | -8.42 | -18.06 | -10.74 | -15.30 | -10.98 | -21.52 | -11.35 | -18.32 | -8.35 | -2.35 | -21.79 | -10.70 | -17.30 | -7.25 | 21.57 |
| R | 125.10 | -85.27 | 3.73 | 198.05 | 23.30 | -2.83 | 0.55 | -2.51 | -3.93 | -6.24 | 1.94 | -3.17 | 14.86 | 1.31 | -0.46 | -16.30 | -12.24 | -0.72 | -6.71 | 89.80 |
| I | -8.96 | -39.39 | 7.11 | 25.19 | 152.36 | 23.27 | 35.60 | 20.45 | 19.44 | 10.92 | 33.41 | 15.91 | 61.29 | 5.88 | 1.04 | 0.33 | 6.43 | -0.72 | 4.76 | 61.21 |
| J | 31.31 | -28.13 | 0.42 | -1.56 | -2.36 | 45.43 | 29.05 | 12.92 | 21.68 | 19.21 | 15.47 | 15.55 | 10.37 | 7.08 | 4.06 | 66.72 | 37.31 | 70.63 | 46.88 | -5.89 |
| C | 98.65 | -27.61 | -2.25 | -7.92 | -3.51 | 9.43 | 50.98 | -1.42 | 2.57 | -5.27 | 11.76 | -2.43 | 7.80 | 10.78 | 1.98 | 33.37 | 6.34 | 25.51 | 3.16 | -8.53 |
| D | 39.12 | -24.03 | -2.58 | -3.66 | -0.82 | 23.95 | 14.41 | 21.89 | 32.39 | 29.83 | 26.52 | 21.93 | -4.62 | 11.31 | 4.55 | 45.21 | 24.60 | 57.86 | 14.67 | 5.58 |
| S | 35.67 | -30.46 | -3.62 | -7.55 | 0.35 | 12.88 | 31.20 | 28.48 | 108.55 | 44.56 | 27.07 | 25.89 | -6.91 | 26.54 | 5.79 | 56.24 | 35.14 | 39.28 | 17.90 | 6.90 |
| s | 8.44 | -18.69 | -2.58 | -6.79 | -1.78 | 15.49 | 43.13 | 15.71 | 59.83 | 117.15 | 1.57 | 34.34 | -12.48 | 30.55 | 21.59 | 46.67 | 34.77 | 33.33 | 22.27 | 1.00 |
| W | 79.54 | -43.79 | -5.10 | -9.05 | 4.91 | 15.72 | 16.39 | 32.98 | 10.95 | -0.93 | 117.56 | 23.20 | 24.22 | 14.02 | -0.47 | 70.07 | 38.85 | 46.57 | 17.34 | 27.82 |
| w | 17.74 | -17.47 | -2.16 | -5.35 | 2.10 | 12.61 | 23.19 | 16.82 | 22.55 | 23.51 | 44.17 | 66.50 | -2.25 | 18.13 | 3.57 | 39.50 | 35.21 | 26.21 | 14.42 | 6.21 |
| M | 109.09 | -57.39 | -6.00 | 0.68 | 8.81 | 4.55 | -5.14 | 7.04 | -11.05 | -11.98 | 4.69 | -7.25 | 160.36 | -3.45 | -2.86 | 31.61 | 14.40 | 9.17 | 4.19 | 31.52 |
| Z | 37.56 | -13.24 | -3.22 | -0.98 | 1.32 | 6.09 | 9.11 | 5.53 | 17.10 | 13.88 | 5.76 | 5.96 | -2.27 | 19.33 | 3.01 | 29.60 | 10.64 | 12.79 | 6.22 | 30.99 |
| z | 9.83 | -4.61 | 0.69 | 0.25 | -0.56 | 2.35 | 2.28 | -0.82 | 7.06 | 8.53 | -0.52 | 3.29 | -2.42 | 8.85 | 20.34 | 12.08 | 4.22 | 4.48 | 2.57 | 4.33 |
| B | 61.06 | -42.37 | 3.02 | -0.11 | -3.05 | 56.39 | 36.12 | 14.63 | 22.86 | 19.43 | 33.25 | 19.98 | 23.90 | 14.39 | 4.25 | 204.51 | 70.57 | 72.24 | 51.54 | 0.67 |
| b | 38.59 | -32.48 | -8.39 | -14.33 | -4.12 | 50.59 | 17.99 | 24.07 | 35.23 | 41.38 | 27.86 | 27.48 | 12.74 | 19.47 | 9.57 | 247.85 | 145.67 | 44.35 | 48.16 | 4.51 |
| Q | 35.97 | -24.81 | -5.29 | -9.96 | -3.80 | 112.76 | 21.46 | 12.99 | 19.11 | 17.75 | 23.45 | 15.62 | 7.47 | 8.74 | 2.70 | 81.08 | 67.37 | 126.97 | 50.84 | 5.15 |
| q | 18.26 | -22.93 | -2.71 | -5.39 | -0.10 | 54.20 | 17.40 | 22.01 | 23.42 | 28.37 | 23.34 | 14.49 | 6.52 | 7.45 | 3.91 | 41.28 | 40.34 | 173.97 | 135.55 | 5.06 |
| — | 54.44 | -16.60 | 0.96 | 28.90 | 14.56 | 0.59 | 9.51 | 3.69 | 7.01 | 0.35 | 9.44 | 1.59 | 11.49 | 3.65 | 0.14 | 0.87 | -1.84 | 4.33 | -0.49 | 65.46 |
Example: [BQC(3)_][bqD(5)]
![Page 142: Improving the effectiveness of Web searching: Methodological issues](https://reader033.fdocuments.in/reader033/viewer/2022051700/56816016550346895dcf1740/html5/thumbnails/142.jpg)
Some Observations
• Self-correlations suggest habitual tendencies
• Substantive QTs rarely follow or precede page-viewing. They are associated with active searching.
• A delay tends to be followed by N (a new query), R or I, suggesting a memory refresh.
![Page 143: Improving the effectiveness of Web searching: Methodological issues](https://reader033.fdocuments.in/reader033/viewer/2022051700/56816016550346895dcf1740/html5/thumbnails/143.jpg)
Number of words/query: Excite 2001
[Figure: normalised frequency (0 to 0.35) plotted against terms/query (logarithmic axis, 1 to 100).]
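A distribution like this one can be derived from a query log in a few lines. The sketch below is illustrative only: it assumes terms are separated by whitespace, which may differ from the tokenisation used for the Excite data, and the function name is hypothetical.

```python
from collections import Counter


def normalised_term_count_frequency(queries):
    """Return {terms per query: fraction of non-empty queries}."""
    # Assumption: a "term" is any whitespace-delimited token.
    lengths = Counter(len(q.split()) for q in queries if q.strip())
    total = sum(lengths.values())
    return {n: c / total for n, c in sorted(lengths.items())}
```

Dividing by the number of non-empty queries makes the values sum to 1, so logs of different sizes can be compared on the same axis.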