Jong Y. Choi, Joshua Rosen, Siddharth Maini, Marlon E. Pierce, and Geoffrey C. Fox
Community Grids Laboratory, Indiana University
Delicious example
[Screenshot of a Delicious page. Callouts: Bookmark, Tags, Social Networks, People-generated]
Collaborative Tagging
- Online bookmarking with annotations
- Create social networks
- Utilize power of people’s knowledge
Pros and cons
- High-quality classifier by using human intelligence
- But lack of control or authority
Collective Collaborative Tagging (CCT) System
[Architecture diagram. Components: Data Importer, Data Coordinator, User Service, Repository. Labels: Distributed Tagging Data; Populate bookmarks/tags; Query with various options; Search Result. Interfaces: SOAP, REST, … Formats: RDF, RSS, Atom, HTML]
1st - Service and algorithm development
- Identify services and algorithms
2nd - Interface development
- Web 2.0 style interface
- REST, SOAP, …
3rd - Export/import service development
- Merging distributed data sets
- Export data to build mashup sites
So far, we are mainly in the 1st stage and running some experiments in the 2nd stage.
Different Data Sources
Various IR algorithms
Flexible Options
Result Comparison
| Type | Service | Description | Algorithm |
|------|---------|-------------|-----------|
| I | Searching | Given input tags, return the most relevant X (X = URLs, tags, or users) | Latent Semantic Indexing (LSI), FolkRank |
| II | Recommendation | Given indirect input tags, return undiscovered X | |
| III | Clustering | Community discovery: find a group or community with similar interests | K-Means, Deterministic Annealing Clustering |
| IV | Trend detection | Analyze tagging activities as a time series and detect abnormalities | Time Series Analysis |
Vector-space model (bag-of-words model)
- Assume n URLs and q tags
- A URL can be represented by a q-dimensional vector, d_i = (t_1, t_2, …, t_q)
- The total data set can be represented by an n-by-q matrix
Pairwise Dissimilarity Matrix
- n-by-n symmetric matrix
- Distance (Euclidean, Manhattan, …)
- Angles, cosine, sine, …
- O(n^2) complexity
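As a rough illustration of the vector-space model and its pairwise dissimilarity matrix, the sketch below builds a small n-by-q URL/tag matrix and computes two n-by-n dissimilarity matrices with NumPy. The tag names, the counts, and the helper functions `euclidean_dissimilarity` and `cosine_dissimilarity` are made up for this example and are not taken from the CCT system.

```python
import numpy as np

# Hypothetical toy data: 4 URLs (rows) described by 3 tags (columns).
# Entry [i, j] counts how often tag j was applied to URL i.
tag_names = ["python", "grid", "web"]
D = np.array([
    [2, 0, 1],
    [4, 0, 2],
    [0, 3, 0],
    [0, 1, 1],
], dtype=float)

def euclidean_dissimilarity(X):
    """n-by-n matrix of Euclidean distances between the rows of X."""
    diff = X[:, None, :] - X[None, :, :]
    return np.sqrt((diff ** 2).sum(axis=2))

def cosine_dissimilarity(X):
    """n-by-n matrix of 1 - cos(angle) between the rows of X."""
    norms = np.linalg.norm(X, axis=1, keepdims=True)
    unit = X / norms  # assumes no URL has an all-zero tag vector
    return 1.0 - unit @ unit.T

euclid = euclidean_dissimilarity(D)
cosine = cosine_dissimilarity(D)
print(np.round(euclid, 3))
print(np.round(cosine, 3))
```

Both functions compare every pair of rows, which is where the O(n^2) cost comes from. In this toy data, URL 1 is URL 0 scaled by two, so the two are far apart in Euclidean distance but have zero cosine dissimilarity, which is one reason to offer several measures.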
(Source: MSI-CIEC)
Graph model
- Building a graph with nodes and edges
- Edges indicate relationships
- Becoming complex networks (tag graph)
Dissimilarity
- Related to path distance
- Finding paths is important (shortest path problem)
- Naive approach: O(n^3) complexity
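The slides do not say which shortest-path method is meant. One well-known O(n^3) all-pairs algorithm is Floyd-Warshall, sketched below on a hypothetical tag graph. The tag names, edge weights, and the function `all_pairs_shortest_paths` are assumptions for illustration only.

```python
import math

def all_pairs_shortest_paths(n, edges):
    """Floyd-Warshall on an undirected weighted graph with nodes 0..n-1."""
    dist = [[math.inf] * n for _ in range(n)]
    for i in range(n):
        dist[i][i] = 0.0
    for u, v, w in edges:
        dist[u][v] = min(dist[u][v], w)
        dist[v][u] = min(dist[v][u], w)
    # Three nested loops over the nodes give the O(n^3) cost.
    for k in range(n):
        for i in range(n):
            for j in range(n):
                through_k = dist[i][k] + dist[k][j]
                if through_k < dist[i][j]:
                    dist[i][j] = through_k
    return dist

# Hypothetical tag graph: an edge links two tags that co-occur on a bookmark.
tags = ["python", "numpy", "grid", "cloud", "recipes"]
edges = [(0, 1, 1.0), (1, 2, 1.0), (2, 3, 1.0), (0, 2, 3.0)]
dist = all_pairs_shortest_paths(len(tags), edges)
for name, row in zip(tags, dist):
    print(name, row)
```

The resulting matrix can serve as a path-based dissimilarity. Tags with no connecting path, such as "recipes" here, stay at infinite distance from the rest.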
Latent Semantic Indexing
- Using the vector-space model, find the URLs most similar to the user’s query tags
- Dimension reduction from high q to low d (q >> d)
- Removing noisy terms, extracting latent concepts
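A minimal sketch of the idea, assuming LSI is done through a truncated singular value decomposition of the n-by-q URL/tag matrix, which is the usual textbook formulation. The slides do not show the authors' implementation. The toy matrix, the tag names, and the helpers `lsi_fit` and `lsi_rank` are hypothetical.

```python
import numpy as np

def lsi_fit(A, d):
    """Truncated SVD of an n-by-q URL/tag matrix, keeping d latent dimensions."""
    U, s, Vt = np.linalg.svd(A, full_matrices=False)
    return U[:, :d], s[:d], Vt[:d, :]

def lsi_rank(query, U_d, s_d, Vt_d):
    """Rank URLs by cosine similarity to the query in the latent space."""
    docs = U_d * s_d              # URL coordinates in the latent space, shape (n, d)
    q_latent = query @ Vt_d.T     # project the q-dimensional query, shape (d,)
    denom = np.linalg.norm(docs, axis=1) * np.linalg.norm(q_latent)
    scores = docs @ q_latent / np.where(denom == 0, 1.0, denom)
    return np.argsort(-scores), scores

# Hypothetical data: 5 URLs, 4 tags. URL 2 carries only the tag "numpy".
tag_names = ["python", "numpy", "cooking", "recipes"]
A = np.array([
    [2, 1, 0, 0],
    [1, 2, 0, 0],
    [0, 1, 0, 0],
    [0, 0, 1, 2],
    [0, 0, 2, 1],
], dtype=float)

U_d, s_d, Vt_d = lsi_fit(A, d=2)
query = np.array([1.0, 0.0, 0.0, 0.0])   # the single tag "python"
order, scores = lsi_rank(query, U_d, s_d, Vt_d)
print(order, np.round(scores, 3))
```

URL 2 shares no tag with the query, so its similarity in the original tag space is zero. After reduction to two latent dimensions it scores about as high as the URLs tagged "python", because "python" and "numpy" fall into the same latent concept.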
[Figure: precision vs. recall plot. Legend entries: 2 terms, 4 terms, 8 terms, 20% dim. reduction, None, Ideal Line]
- Discover the group structures of URLs
- Non-parametric learning algorithm
- Non-trivial optimization problem
- Should avoid local minima/maxima solutions
- Deterministically avoid local minima
- Trace the global solution by changing the level of energy
- Analogy to the physical annealing process (High → Low)
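The following is a simplified sketch of deterministic annealing clustering, not the authors' code. It assumes the common formulation in which each point is softly assigned to every center with weight proportional to exp(-distance^2 / T), and the temperature T is lowered step by step from high to low. The function names, the cooling schedule, and the toy data are all hypothetical choices.

```python
import numpy as np

def squared_distances(X, centers):
    """Squared Euclidean distance from every point to every center."""
    diff = X[:, None, :] - centers[None, :, :]
    return (diff ** 2).sum(axis=2)

def deterministic_annealing(X, k, t_end=1e-3, cooling=0.9, inner_iters=20, seed=0):
    rng = np.random.default_rng(seed)
    # Start with all centers (almost) at the data mean.
    centers = X.mean(axis=0) + 1e-3 * rng.standard_normal((k, X.shape[1]))
    # Assumed hot start: a temperature above the scale of the data variance.
    T = 2.0 * X.var(axis=0).sum() + 1e-12
    while T > t_end:
        for _ in range(inner_iters):
            d2 = squared_distances(X, centers)
            # Soft assignment; subtracting the row minimum avoids underflow.
            P = np.exp(-(d2 - d2.min(axis=1, keepdims=True)) / T)
            P /= P.sum(axis=1, keepdims=True)
            weights = P.sum(axis=0)
            live = weights > 1e-12   # leave centers with no support untouched
            centers[live] = (P.T @ X)[live] / weights[live, None]
        # Tiny perturbation so that coincident centers can separate when cooled.
        centers += 1e-6 * rng.standard_normal(centers.shape)
        T *= cooling
    labels = squared_distances(X, centers).argmin(axis=1)
    return centers, labels

# Hypothetical data: two well-separated groups of 20 points each.
rng = np.random.default_rng(1)
X = np.vstack([rng.normal(0.0, 0.5, (20, 2)), rng.normal(10.0, 0.5, (20, 2))])
centers, labels = deterministic_annealing(X, k=2)
print(np.round(centers, 2), labels)
```

At high temperature every point belongs almost equally to every center, so all centers sit at the data mean. As T drops, the centers split and the assignments harden. The only random elements are the seeded perturbations, so the procedure depends far less on initialization than K-Means does.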
Classification
- To respond more quickly to users’ requests
- Train on users’ input and answer questions based on the training results
- Artificial Neural Network, Support Vector Machine, …
Trend Detection
- Can be used for prediction/forecasting
- Time-series analysis of tagging activities
- Markov chain model, Fourier transform, …
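As one possible reading of the "Fourier transform" item, the sketch below looks for the dominant periodicity in how often a tag is used per day. The synthetic series, the weekly rhythm, and the function `dominant_period` are assumptions made for illustration. The slides do not describe how the transform would actually be applied.

```python
import numpy as np

def dominant_period(counts):
    """Return the period (in samples) of the strongest non-constant frequency."""
    counts = np.asarray(counts, dtype=float)
    spectrum = np.abs(np.fft.rfft(counts - counts.mean()))
    freqs = np.fft.rfftfreq(len(counts), d=1.0)
    k = 1 + np.argmax(spectrum[1:])   # skip the zero-frequency bin
    return 1.0 / freqs[k]

# Hypothetical daily usage counts of one tag over 10 weeks with a weekly rhythm.
days = np.arange(70)
counts = 20 + 5 * np.sin(2 * np.pi * days / 7)
period = dominant_period(counts)
print(round(period, 3))
```

For this synthetic series the recovered period is approximately 7 days. A shift in the dominant period, or new energy at other frequencies, could be one signal of a change in tagging behavior.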
The goal of our Collective Collaborative Tagging (CCT) system
- Utilize various data sets
- Provide various information retrieval (IR) algorithms
- Help to utilize people-powered knowledge
Currently various models and algorithms are being investigated
Service interfaces and import/export function will be added soon
| | Vector-space Model | Graph Model |
|---|---|---|
| Representation | q-dimensional vector; q-by-n matrix | G(V, E); V = {URL, tags, users} |
| Dissimilarity | Distances, cosine, …; O(N^2) complexity | Paths, hops, connectivity, …; O(N^3) complexity |
| Algorithm | Latent Semantic Indexing; dimension reduction schemes; PCA | PageRank, FolkRank, …; pairwise clustering; MDS |
Pairwise clustering
- Input from vector-based model vs. graph model
- How to avoid local minima/maxima? (e.g., K-Means)
[Figure panels: Graph model; Vector-space model]