Download - Xiaoying Gao Computer Science Victoria University of Wellington Intelligent Agents COMP 423.

Xiaoying Gao

Computer Science

Victoria University of Wellington

Intelligent AgentsCOMP 423

Next lecture

• Paper: Personalized Mobile search engine

• Some concepts• Ontology

• Cyc, WordNet, ODP(Open directory project)

• Entropy• disorder, uncertainty• the smaller, the better

• Ranking SVM (Support Vector Machines)• Supervised machine learning

• K-means clustering

Review on information retrieval

• Text representation• Find all the unique words/terms in all documents,

• each term is a feature• Each document is represented as a vector

• Weight for each term• Term frequency: for one particular document• Inversed document frequency : the same for all

documents• Query is also a vector

• Comparison with Query• The similarity between query vector and each document vector

• Cosine similarity • or other distance measure (Euclidean distance)

• Ranking• The documents with higher similarity are more relevant to query• PageRank

Evaluation

• Precision and RecallFor a query, If a system finds 200 results, among them 50 are relevant. The human labels ( model solutions) have 120 relevant documents.

•Precision and recall curve

• AUC: Area under curve

• Top N precision

• ARR: the average rank of the documents rated as “relevant”

Personalised search

Personalised information retrieval

A related area is called Adaptive Hypermedia

Information gathering

• Information gathering approach• Implicit, Explicit, Both

• Type of information• Queries, clicked documents, snippets of documents• Cashed web pages, dwell time on page, desktop

documents• Email, calendar items• Tags and bookmarks on online social applications• User supplied information• User’s categorical interests

• Source of information• Server side, Client –side, user intervention

Information representation

• User model• Short-term interests, long-term interests• Static, dynamic, periodic• Terms or conceptual terms (use wordnet, ontology)• Vector-based

• Models where user’s interests are maintained in a vector of weighted keywords (concepts).

• Semantic network based• Models where user’s interests are maintained in a network

structure of terms and related terms (concepts and related concepts)

Two directions• Query adaptation or result adaptation

• Query expansion/adaptation• implicit

individualised

User model

Aggregate

Usage information (search logs)

Not user-focused

Pseudo-relevance feedback

Global analysis• explicit

individual, relevance feedback, interactive query expansion

Search results filtering/adaptation

• Different applications: individual, aggregate, web search or recommendation, databases search

• Typically use supervised machine learning• Relevant, not relevant: binary classification• Training data:

• Labeled data• Assume clicked docs are relevant

• Machine learning methods• KNN: K nearest neighbour• Naïve Bayes• SVM: Support vector machines• …

Evaluation

• In lab setting

10-500 users

• Quantitative & Qualitative

• System performance

• User evaluation, system usability

• Data sets

open web corpora, in-lab generated logs,

TREC collection, search engine query logs

subset of annotated documents from specific sites