Chengqi zhang graph processing and mining in the era of big data

Graph Processing and Mining in the Era of Big Data Chengqi Zhang Centre for Quantum Computation & Intelligent Systems (QCIS) University of Technology, Sydney (UTS)

Transcript of Chengqi zhang graph processing and mining in the era of big data

Page 1: Chengqi zhang graph processing and mining in the era of big data

Graph Processing and Mining in the Era of Big Data

Chengqi Zhang

Centre for Quantum Computation & Intelligent Systems (QCIS)

University of Technology, Sydney (UTS)

Page 2: Chengqi zhang graph processing and mining in the era of big data

Outline: Background; Challenges and Opportunities; Our Work: Graph Semantics; Our Work: Graph Mining; Our Work: Query Processing; Our Work: Indexing; Our Work: Computing Models; Graph Processing System Design; Future Developments

Page 3: Chengqi zhang graph processing and mining in the era of big data

Graph Everywhere!

Social Network: Facebook, Twitter

Web Graph: Google, Yahoo

Road Network

The Internet of Things

Page 4: Chengqi zhang graph processing and mining in the era of big data

Big Data Characteristics

Big Data

Volume
• Petabytes
• Records
• Transactions

Velocity
• Batch
• Real time
• Streaming

Variety
• Structured
• Unstructured
• Semi-structured

Page 5: Chengqi zhang graph processing and mining in the era of big data

Graph in Big Data: Volume

• 1.23 billion active users in 2013
• 190 friends/user on average
• 500 TB data/day in 2012

• 2.1 billion webpages in 2000
• 15 billion edges in 2000
• 20 PB data/day in 2008

• 180-200 PB data in 2011

• 6.5 PB data + 50 TB/day in 2009

Page 6: Chengqi zhang graph processing and mining in the era of big data

Graph in Big Data: Velocity

• Fast flowing data
• Evolving data structures and relationships

Page 7: Chengqi zhang graph processing and mining in the era of big data

Graph in Big Data: Variety

• Directed vs Undirected
• Labeled vs Unlabeled
• Weighted vs Unweighted
• Heterogeneous vs Homogeneous

Page 8: Chengqi zhang graph processing and mining in the era of big data

Outline: Background; Challenges and Opportunities; Our Work: Graph Semantics; Our Work: Graph Mining; Our Work: Query Processing; Our Work: Indexing; Our Work: Computing Models; Graph Processing System Design; Future Developments

Page 9: Chengqi zhang graph processing and mining in the era of big data

Challenges and Opportunities

New Graph Semantics (Variety)

New Query Processing Algorithms (Volume & Velocity)

New Indexing Techniques (Volume & Velocity)

New Computing Models (Volume)

New Graph Mining Tasks (Variety)

Page 10: Chengqi zhang graph processing and mining in the era of big data

New Graph Semantics

Traditional (Google)
• Input: keywords
• Output: webpages containing keywords
• Ranked by PageRank

New (Google)
• Input: keywords
• Output: knowledge graph/subgraph
• Ranking should consider both structural and content information

Page 11: Chengqi zhang graph processing and mining in the era of big data

New Graph Mining Tasks

Chemical Compound Database

Chemical Features

Team of Experts

Several Years

Graph Mining

Several Hours

Page 12: Chengqi zhang graph processing and mining in the era of big data

New Query Processing Algorithms

Location: Spatial query processing, nearest neighbor search …

Relationship: Link analysis, shortest path search, community detection …

Text: Text processing, string matching, semantic analysis …

All of these should be processed in milliseconds

Page 13: Chengqi zhang graph processing and mining in the era of big data

New Indexing Techniques

Traditional: webpages, files ?

Hash table, B-tree, Inverted Index …

New: subgraphs, trees, paths ?

What’s more, the graph is frequently changing…

Page 14: Chengqi zhang graph processing and mining in the era of big data

New Computing Models

Single Machine vs Multiple Machines

Internal Algorithms vs External Algorithms

Single Core vs Multiple Cores

Page 15: Chengqi zhang graph processing and mining in the era of big data

Outline: Background; Challenges and Opportunities; Our Work: Graph Semantics; Our Work: Graph Mining; Our Work: Query Processing; Our Work: Indexing; Our Work: Computing Models; Graph Processing System Design; Future Developments

Page 16: Chengqi zhang graph processing and mining in the era of big data

Structural Keyword Search

[Figure: the keyword query "Jim, data mining" answered by content keyword search and by structural keyword search]

Traditional: Content Keyword Search

VS

New: Structural Keyword Search

Our Work:
• ICDE’07: Finding Top-K Min-Cost Connected Trees in Databases
• SIGMOD’09: Keyword Search in Databases: The Power of RDBMS
• Morgan & Claypool 2009 (Book): Keyword Search in Databases
• VLDBJ’11: Scalable Keyword Search on Large Data Streams
• ICDE’11 & TKDE’12: Computing Structural Statistics by Keywords in Databases

Page 17: Chengqi zhang graph processing and mining in the era of big data

Graph Matching

[Figure: Graph Match between Graph 1 and Graph 2, and Graph Pattern Match, illustrated on graphs with numbered nodes 1 to 7]

NP-Hard Problems

Our Work:
• EDBT’12: Finding Top-K Similar Graphs in Graph Databases
• CIKM’11 & VLDBJ’13: High Efficiency and Quality: Large Graphs Matching
• VLDB’14: Leveraging Graph Dimensions in Online Graph Search

Page 18: Chengqi zhang graph processing and mining in the era of big data

Community Detection

What is a community in a graph?

A cohesive subgraph? A dense subgraph?

Everyone is highly connected to others? Everyone is within a small distance of the others?

An Example: k-core

[Figure: the 1-core, 2-core, and 3-core of an example graph]
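The k-core named above is the largest subgraph in which every vertex has at least k neighbours inside the subgraph. A minimal sketch of the standard peeling procedure follows; the function name `k_core` and the edge-list input format are assumptions made for illustration and are not taken from the talk.

```python
from collections import defaultdict


def k_core(edges, k):
    """Return the vertex set of the k-core of an undirected simple graph.

    `edges` is an iterable of (u, v) pairs (hypothetical input format).
    """
    adj = defaultdict(set)
    for u, v in edges:
        if u != v:  # ignore self-loops
            adj[u].add(v)
            adj[v].add(u)

    alive = set(adj)
    # Start with every vertex whose degree is already below k.
    stack = [v for v in alive if len(adj[v]) < k]
    while stack:
        v = stack.pop()
        if v not in alive:
            continue
        alive.discard(v)
        # Removing v lowers the degree of its surviving neighbours,
        # which may push them below k as well.
        for w in adj[v]:
            if w in alive:
                adj[w].discard(v)
                if len(adj[w]) < k:
                    stack.append(w)
    return alive
```

Peeling with a larger k can only remove more vertices, so the cores are nested: the 3-core lies inside the 2-core, which lies inside the 1-core.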

Page 19: Chengqi zhang graph processing and mining in the era of big data

Community Detection

[Figure: a graph and its 3-core, 4-clique, 3-edge-connected component (3-edge-cc), and 4-truss]

Other Semantics?

Our Work:
• SIGMOD’13: Efficiently Computing k-Edge Connected Components via Graph Decomposition
• SIGMOD’14: Querying k-truss Community in Large and Dynamic Graphs
• VLDB’15: Influential Community Search in Large Networks
• KDD’15: Locally Densest Subgraph Discovery

Page 20: Chengqi zhang graph processing and mining in the era of big data

Influential Community (VLDB’15)

Which are the most influential research groups?

A Collaboration Network

Page 21: Chengqi zhang graph processing and mining in the era of big data

Locally Densest Subgraph (KDD’15)

Which are the most representative dense subgraphs?

Page 22: Chengqi zhang graph processing and mining in the era of big data

Outline: Background; Challenges and Opportunities; Our Work: Graph Semantics; Our Work: Graph Mining; Our Work: Query Processing; Our Work: Indexing; Our Work: Computing Models; Graph Processing System Design; Future Developments

Page 23: Chengqi zhang graph processing and mining in the era of big data

Graph Classification

[Figure: three pipelines that start from a graph database of graphs labeled + and - and end with a classifier]

Traditional: 3 Phases
Graph Database, Frequent Subgraphs, Optimal Subgraphs, Classifier

Our work (CIKM’12): 2 Phases
Graph Database, Optimal Subgraphs (Direct Selection), Classifier

Our work (PR’15): 1 Phase
Graph Database, Optimal Subgraphs and Classifier (Direct Selection)

Our Work:
• CIKM’12: Graph Classification: A Diversified Discriminative Feature Selection Approach
• ICDE’13: Graph Stream Classification using Labeled and Unlabeled Graphs
• IJCAI’13: Graph Classification with Imbalanced Class Distributions and Noise
• TKDE’14: Bag Constrained Structure Pattern Mining for Multi-Graph Classification
• SDM’14: Multi-Graph Learning with Positive and Unlabeled Bags
• ICDM’14: Multi-Graph-View Learning for Graph Classification
• IJCAI’15: Multi-Graph-View Learning for Complicated Object Classification
• TKDE’15: CogBoost: Boosting for Fast Cost-sensitive Graph Classification
• PR’15: Finding the Best not the Most: Regularized Loss Minimization Subgraph Selection for Graph Classification

Page 24: Chengqi zhang graph processing and mining in the era of big data

Outline: Background; Challenges and Opportunities; Our Work: Graph Semantics; Our Work: Graph Mining; Our Work: Query Processing; Our Work: Indexing; Our Work: Computing Models; Graph Processing System Design; Future Developments

Page 25: Chengqi zhang graph processing and mining in the era of big data

Polynomial Delay

Enumeration Problems in Graph?
• Structural keyword search
• Community detection
• Graph pattern matching
• Similar graph search

Polynomial Time w.r.t. Input?
Output can be exponential

Impossible! So…

Polynomial Total: Polynomial to Input+Output

Possible, but…

Page 26: Chengqi zhang graph processing and mining in the era of big data

Polynomial Delay

[Figure: timelines comparing when answers are returned under Polynomial Total and under Polynomial Delay; callouts: "Many answers!", "Can’t you be faster?", "How about this?"]

New Solution
Polynomial Delay: Delay Time Polynomial to Input
Total time is still large, but…
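To make the idea of polynomial delay concrete, here is a small sketch that enumerates every clique (not only the maximal ones) of a graph with a Python generator. It is an illustrative textbook-style example chosen for this note, not one of the algorithms from the papers listed on this slide; the name `enumerate_cliques` and the adjacency-dictionary input are assumptions.

```python
def enumerate_cliques(adj):
    """Yield every non-empty clique of an undirected graph, one at a time.

    `adj` maps each vertex to the set of its neighbours (hypothetical
    input format); vertices must be sortable so each clique is
    produced exactly once, in increasing vertex order.
    """
    order = sorted(adj)

    def extend(clique, candidates):
        # Every call emits an answer before doing any further search,
        # so the wait between two consecutive answers is bounded by a
        # polynomial in the graph size, even though the total number
        # of answers can be exponential.
        yield tuple(clique)
        for i, v in enumerate(candidates):
            # Candidates that come after v and are adjacent to v can
            # still extend the clique that now includes v.
            rest = [w for w in candidates[i + 1:] if w in adj[v]]
            yield from extend(clique + [v], rest)

    for i, v in enumerate(order):
        yield from extend([v], [w for w in order[i + 1:] if w in adj[v]])
```

A caller can stop after the first few answers, for example with `itertools.islice`, without paying for the full enumeration.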

Our Work:
• ICDE’09: Querying Communities in Relational Databases
• Algorithmica’13: Fast Maximal Cliques Enumeration in Sparse Graphs
• EDBT’15: Efficiently Computing Top-K Shortest Path Join
• VLDB’15: Optimal Enumeration - Efficient Top-k Tree Matching

Page 27: Chengqi zhang graph processing and mining in the era of big data

Diversified Graph Search

Enumeration Problems in Graph
• Structural keyword search
• Community detection
• Graph pattern matching
• Frequent graph pattern mining
• …

[Figure: Top-6 Answers vs Top-6 Diversified Answers on a graph]

Top-K Densest Communities? Consider Diversity?

Our Work:
• VLDB’12: Diversifying Top-K Results
• CIKM’12: Graph Classification: A Diversified Discriminative Feature Selection Approach
• VLDB’13 & VLDBJ’15: Top-K Structural Diversity Search in Large Networks
• ICDE’15: Diversified Top-K Clique Search

Page 28: Chengqi zhang graph processing and mining in the era of big data

Diversified Top-K Cliques (ICDE’15)

[Figure: example graph with nodes A to K, showing the Maximum Clique, the Top-2 Maximum Cliques ("Too much overlap!"), and the Diversified Top-2 Maximum Cliques ("Cover All Nodes!")]

Problem Statement: Compute k Cliques to Cover Maximum Number of Nodes
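The problem statement above is an instance of maximum coverage. The sketch below shows the standard greedy heuristic for maximum coverage applied to a given list of candidate cliques; it is not the ICDE’15 algorithm, and the function name `greedy_diversified_cliques` and the assumption that candidate cliques are supplied up front are illustrative only.

```python
def greedy_diversified_cliques(cliques, k):
    """Greedily pick up to k cliques whose union covers many nodes.

    `cliques` is an iterable of node collections (hypothetical input;
    in practice the candidates would come from a clique enumerator).
    Returns the chosen cliques and the set of covered nodes.
    """
    covered = set()
    chosen = []
    remaining = [frozenset(c) for c in cliques]
    for _ in range(k):
        # Take the clique that adds the most nodes not yet covered.
        best = max(remaining, key=lambda c: len(c - covered), default=None)
        if best is None or not (best - covered):
            break  # nothing left that would add a new node
        chosen.append(best)
        covered |= best
        remaining.remove(best)
    return chosen, covered
```

Ranking by new nodes rather than by clique size is what avoids the "too much overlap" outcome: a second large clique that shares most of its nodes with the first one scores low.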

Page 29: Chengqi zhang graph processing and mining in the era of big data

Outline: Background; Challenges and Opportunities; Our Work: Graph Semantics; Our Work: Graph Mining; Our Work: Query Processing; Our Work: Indexing; Our Work: Computing Models; Graph Processing System Design; Future Developments

Page 30: Chengqi zhang graph processing and mining in the era of big data

Shortest Path Computation

Dijkstra’s Algorithm? A* Algorithm?
Traverse the whole graph in worst case

Precompute all-pair shortest paths?
Impractical!

Our approach (VLDBJ’12): Compute a subset of pairs
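For reference, the online baseline mentioned on this slide, Dijkstra's algorithm, can be sketched as follows. This is the textbook algorithm with a binary heap, not the VLDBJ’12 index; the adjacency format (a dictionary mapping each vertex to a list of `(neighbour, weight)` pairs with non-negative weights) is an assumption for illustration.

```python
import heapq


def dijkstra(adj, source):
    """Return shortest distances from `source` to every reachable vertex.

    `adj` maps a vertex to a list of (neighbour, weight) pairs with
    non-negative weights (hypothetical input format).
    """
    dist = {source: 0}
    heap = [(0, source)]
    while heap:
        d, u = heapq.heappop(heap)
        if d > dist.get(u, float("inf")):
            continue  # stale heap entry; a shorter path was found already
        for v, w in adj.get(u, []):
            nd = d + w
            if nd < dist.get(v, float("inf")):
                dist[v] = nd
                heapq.heappush(heap, (nd, v))
    return dist
```

With no index, a single query may visit every vertex and edge before it finishes, which is the worst case the slide refers to.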

Our Work:
• VLDBJ’12: The Exact Distance to Destination in Undirected World
• VLDB’13: Top-K Nearest Keyword Search on Large Graphs
• VLDBJ’13: Computing Weight Constraint Reachability in Large Networks
• SIGMOD’15: Index-based Optimal Algorithms for Computing Steiner Components with Maximum Connectivity

Page 31: Chengqi zhang graph processing and mining in the era of big data

Outline: Background; Challenges and Opportunities; Our Work: Graph Semantics; Our Work: Graph Mining; Our Work: Query Processing; Our Work: Indexing; Our Work: Computing Models; Graph Processing System Design; Future Developments

Page 32: Chengqi zhang graph processing and mining in the era of big data

Our Focus

I/O Efficient Computation

[Figure: memory hierarchy of a processor (control, data-path) and its storage levels]

Level: Speed, Size
Registers: 1 ns, 100 B
On-Chip Cache: 5 ns, KB
Second Level Cache (SRAM): 10 ns, MB
Main Memory (DRAM): 100 ns, GB
Secondary Storage (Disk): 10 ms, TB
Tertiary Storage (Tape): 10 sec, PB

Graph Problems
Main Memory vs Disk
Sequential I/O vs Random I/O
External vs Semi-external
Partition based vs Nested loop based

Our Work:
• EDBT’12: I/O Cost Minimization: Reachability Queries Processing over Massive Graphs
• SIGMOD’13 & VLDBJ’14: I/O Efficient: Computing SCCs in Massive Graphs
• ICDE’14: Contract & Expand: I/O Efficient SCCs Computing
• SIGMOD’15: Divide and Conquer - I/O Efficient Depth-First Search

Page 33: Chengqi zhang graph processing and mining in the era of big data

Parallel Computation

[Figure: a multicore machine (cores with separate L1 caches, L2 caches, a switch, and shared memory) next to a distributed cluster (nodes that each have their own CPU, memory, and disk, connected by a network)]

Multicore
• Computation Sensitive
• Shared Memory
• Separated L1 Cache
• Reduce Cache Miss
• Divide Tasks

Distributed Computing (MapReduce, BSP…)
• Data Sensitive
• Shared Nothing
• Separated CPU, memory, Disk
• Reduce Communication
• Divide Data

Comparison…

Our Work:
• VLDB’10: Ten Thousand SQLs: Parallel Keyword Queries Computing
• SIGMOD’14: Scalable Big Graph Processing in MapReduce
• VLDB’15: Scalable Subgraph Enumeration in MapReduce

Page 34: Chengqi zhang graph processing and mining in the era of big data

Outline: Background; Challenges and Opportunities; Our Work: Graph Semantics; Our Work: Graph Mining; Our Work: Query Processing; Our Work: Indexing; Our Work: Computing Models; Graph Processing System Design; Future Developments

Page 35: Chengqi zhang graph processing and mining in the era of big data

Graph Processing System Design

Objective 1: Extracting Primitive Operators from DB and DM
Challenge: Completeness & Minimality

Objective 2: Scalable Processing Techniques
Challenge: Guarantee of “Optimality”

Objective 3: Characterizing Real-time Tractability
Challenge: Hard & Risky

Page 36: Chengqi zhang graph processing and mining in the era of big data

Graph System Structure

Data Environments
Static, Streaming, Dynamic Graph, Probabilistic, Spatial, Evolving Graph, Random Graph

Computing Models
Main-memory, Distributed/Cloud/MapReduce/BSP/Spark/Pregel, SSD, Parallel/Multi-core, External/Semi-External

Advanced Applications
Social Network (Twitter, Facebook), Geo Social (Checkin), Chemical, Biological, Web Graph (Wiki), Collaboration (DBLP), Public Opinion Mining

Query Primitives
• Given a Graph Pattern: Similarity, Pattern, Sub/Super Graph
• Given a Set of Nodes: Topology: SimRank, Connectivity, Path, K-hop, Flow, Community, Reachability
• Given a Set of Keywords: Knowledge Graph, Attributed Graph, RDF

Mining Primitives
• Subgraph Based: Cohesive Subgraph Mining, Community Detection, Graph Clustering, Partition, Frequent Subgraph Mining
• Aggregate Based: PageRank, Outlier, Anonymity, Influence Maximization

Primitive Computing Paradigms
Joins, BFS, DFS, Topological Sort, Spanning Tree, Diameter

Page 37: Chengqi zhang graph processing and mining in the era of big data

Our Current Development

Computing Models
SIGMOD’15b, VLDB’15a, VLDBJ’14, SIGMOD’14a, SIGMOD’13a, EDBT’12b, VLDB’10

Advanced Applications
VLDB’15c, VLDBJ’13b, VLDB’13a, TKDE’12, ICDE’11, CIKM’11b

Query Primitives
VLDBJ’15, SIGMOD’15a, VLDB’15b, KDD’15, ICDE’15b, VLDB’13b, VLDBJ’12, EDBT’12a, ICDE’09b, ICDE’07

Mining Primitives
Algorithmica’13, CIKM’12, CIKM’11a, IJCAI’15, TKDE’14, SDM’14, ICDE’13a, TKDE’15, ICDE’13b, ICDM’13, IJCAI’13, ICDM’14, PR’15

Primitive Computing Paradigms
ICDE’15a, EDBT’15, ICDE’14, VLDB’14, SIGMOD’13b, VLDBJ’13a, VLDB’12

Data Environments
SIGMOD’14b, VLDBJ’11, SIGMOD’09, ICDE’09a, EDBT’08, SSDBM’08

Page 38: Chengqi zhang graph processing and mining in the era of big data

Outline: Background; Challenges and Opportunities; Our Work: Graph Semantics; Our Work: Graph Mining; Our Work: Query Processing; Our Work: Indexing; Our Work: Computing Models; Graph Processing System Design; Future Developments

Page 39: Chengqi zhang graph processing and mining in the era of big data

Future Developments

Social Network Recommendation

Location Based Social Network

Big Graph Processing in Cloud

Massive Graph Matching

Graph Summary

Graph Stream

Personalized Community Search

High Influence Community Search

Graph Clustering in Cloud

Massive Uncertain Graph

Page 40: Chengqi zhang graph processing and mining in the era of big data

Conclusion

Mining and Query Processing

The Era of Big Data

Indexing

Semantics

Computing Model

Big Graph: Larger, More Complex

More Challenges!

More Opportunities to Explore the Unknown World!

Page 41: Chengqi zhang graph processing and mining in the era of big data

Acknowledgements

1. Dr Lu Qin
2. Prof. Xingquan Zhu
3. Mr Jia Wu
4. Mr Shirui Pan

Page 42: Chengqi zhang graph processing and mining in the era of big data

References

1. Jeffrey Xu Yu, Lu Qin, and Lijun Chang: Keyword Search in Databases, published by Morgan & Claypool, 2009.

2. Xin Huang, Hong Cheng, Rong-Hua Li, Lu Qin, and Jeffrey Xu Yu: Top-K Structural Diversity Search in Large Networks, in the International Journal on Very Large Data Bases (VLDBJ), Vol. 24, No. 3, Pages 319-343, 2015.

3. Zhiwei Zhang, Jeffrey Xu Yu, Lu Qin, Lijun Chang, and Xuemin Lin: I/O Efficient: Computing SCCs in Massive Graphs, in the International Journal on Very Large Data Bases (VLDBJ), Vol. 24, No. 2, Pages 245-270, 2014.

4. Yuanyuan Zhu, Lu Qin, Jeffrey Xu Yu, Yiping Ke, and Xuemin Lin: High Efficiency and Quality: Large Graphs Matching, in the International Journal on Very Large Data Bases (VLDBJ), Vol. 22, No. 3, Pages 345-368, 2013.

5. Miao Qiao, Hong Cheng, Lu Qin, Jeffrey Xu Yu, Philip S. Yu, and Lijun Chang: Computing Weight Constraint Reachability in Large Networks, in the International Journal on Very Large Data Bases (VLDBJ), Vol. 22, No. 3, Pages 275-294, 2013.

6. Lijun Chang, Jeffrey Xu Yu, and Lu Qin: Fast Maximal Cliques Enumeration in Sparse Graphs, in Algorithmica, Vol. 66, No. 1, Pages 173-186, 2013.

7. Lu Qin, Jeffrey Xu Yu, and Lijun Chang: Computing Structural Statistics by Keywords in Databases. Invited paper by IEEE Transactions on Knowledge and Data Engineering (TKDE), Vol. 24, No. 10, Pages 1731-1746, 2012.

8. Lijun Chang, Jeffrey Xu Yu, Lu Qin, Hong Cheng, and Miao Qiao: The Exact Distance to Destination in Undirected World, in the International Journal on Very Large Data Bases (VLDBJ), Vol. 21, No. 6, Pages 869-888, 2012.

9. Lu Qin, Jeffrey Xu Yu, and Lijun Chang: Scalable Keyword Search on Large Data Streams, in the International Journal on Very Large Data Bases (VLDBJ), Vol. 20, No. 1, Pages 35-57, 2011.

10. Lu Qin, Rong-Hua Li, Lijun Chang, and Chengqi Zhang: Locally Densest Subgraph Discovery, to appear in Proceedings of the 21st ACM SIGKDD Conference on Knowledge Discovery and Data Mining (KDD'15), 2015.

Page 43: Chengqi zhang graph processing and mining in the era of big data

References

11. Longbin Lai, Lu Qin, Xuemin Lin, and Lijun Chang: Scalable Subgraph Enumeration in MapReduce, to appear in Proceedings of the Very Large Database Endowment (VLDB), 2015.

12. Lijun Chang, Xuemin Lin, Lu Qin, Jeffrey Xu Yu, Wenjie Zhang: Index-based Optimal Algorithms for Computing Steiner Components with Maximum Connectivity, to appear in Proceedings of ACM Conference on Management of Data (SIGMOD'15), 2015.

13. Zhiwei Zhang, Jeffrey Xu Yu, Lu Qin, and Zechao Shang: Divide & Conquer: I/O Efficient Depth First Search, to appear in Proceedings of ACM Conference on Management of Data (SIGMOD'15), 2015.

14. Lijun Chang, Xuemin Lin, Lu Qin, Jeffrey Xu Yu, and Jian Pei: Efficiently Computing Top-K Shortest Path Join, in Proceedings of the 18th International Conference on Extending Database Technology (EDBT'15), 2015.

15. Rong-Hua Li, Jeffrey Xu Yu, Lu Qin, Rui Mao, and Tan Jin: On Random Walk Based Graph Sampling, in the 31st IEEE International Conference on Data Engineering (ICDE'15), 2015.

16. Long Yuan, Lu Qin, Xuemin Lin, Lijun Chang, and Wenjia Zhang: Diversified Top-K Clique Search, in the 31st IEEE International Conference on Data Engineering (ICDE'15), 2015.

17. Lijun Chang, Xuemin Lin, Wenjie Zhang, Jeffrey Xu Yu, Ying Zhang, and Lu Qin: Optimal Enumeration: Efficient Top-k Tree Matching, in Proceedings of the Very Large Database Endowment (VLDB), Vol. 8, No. 5, Pages 533-544, 2015.

18. Rong-Hua Li, Lu Qin, Jeffrey Xu Yu, and Rui Mao: Influential Community Search in Large Networks, in Proceedings of the Very Large Database Endowment (VLDB), Vol. 8, No. 5, Pages 509-520, 2015.

19. Yuanyuan Zhu, Jeffrey Xu Yu, and Lu Qin: Leveraging Graph Dimensions in Online Graph Search, in Proceedings of the Very Large Database Endowment (VLDB), Vol. 8, No. 1, Pages 85-96, 2015.

20. Xin Huang, Hong Cheng, Lu Qin, Wentao Tian, and Jeffrey Xu Yu: Querying K-Truss Community in Large and Dynamic Graphs, in Proceedings of ACM Conference on Management of Data (SIGMOD'14), Pages 1311-1322, 2014.

Page 44: Chengqi zhang graph processing and mining in the era of big data

References

21. Lu Qin, Jeffrey Xu Yu, Lijun Chang, Hong Cheng, Chengqi Zhang, and Xuemin Lin: Scalable Big Graph Processing in MapReduce, in Proceedings of ACM Conference on Management of Data (SIGMOD'14), Pages 827-838, 2014.

22. Zhiwei Zhang, Lu Qin, and Jeffrey Xu Yu: Contract & Expand: I/O Efficient SCCs Computing, in the 30th IEEE International Conference on Data Engineering (ICDE'14), Pages 208-219, 2014.

23. Xin Huang, Hong Cheng, Rong-Hua Li, Lu Qin, and Jeffrey Xu Yu: Top-K Structural Diversity Search in Large Networks, in Proceedings of the Very Large Database Endowment (VLDB), Vol. 6, No. 13, Pages 1618-1629, 2013.

24. Miao Qiao, Lu Qin, Hong Cheng, Jeffrey Xu Yu, and Wentao Tian: Top-K Nearest Keyword Search on Large Graphs, in Proceedings of the Very Large Database Endowment (VLDB), Vol. 6, No. 10, Pages 901-912, 2013.

25. Lijun Chang, Jeffrey Xu Yu, Lu Qin, Xuemin Lin, Chengfei Liu, and Weifa Liang: Efficiently Computing k-Edge Connected Components via Graph Decomposition, in Proceedings of ACM Conference on Management of Data (SIGMOD'13), Pages 205-216, 2013.

26. Zhiwei Zhang, Jeffrey Xu Yu, Lu Qin, Lijun Chang, and Xuemin Lin: I/O Efficient: Computing SCCs in Massive Graphs, in Proceedings of ACM Conference on Management of Data (SIGMOD'13), Pages 181-192, 2013.

27. Yuanyuan Zhu, Jeffrey Xu Yu, Hong Cheng, and Lu Qin: Graph Classification: A Diversified Discriminative Feature Selection Approach, in Proceedings of 2012 ACM International Conference on Information and Knowledge Management (CIKM'12), Pages 205-214, 2012.

28. Lu Qin, Jeffrey Xu Yu, and Lijun Chang: Diversifying Top-K Results, in Proceedings of the Very Large Database Endowment (VLDB), Vol. 5, No. 11, Pages 1124-1135, 2012.

29. Yuanyuan Zhu, Lu Qin, and Jeffrey Xu Yu: Finding Top-K Similar Graphs in Graph Databases, in Proceedings of the 15th International Conference on Extending Database Technology (EDBT'12), Pages 456-467, 2012.

30. Zhiwei Zhang, Jeffrey Xu Yu, Lu Qin, Qing Zhu, and Xiaofang Zhou: I/O Cost Minimization: Reachability Queries Processing over Massive Graphs, in Proceedings of the 15th International Conference on Extending Database Technology (EDBT'12), Pages 468-479, 2012.

Page 45: Chengqi zhang graph processing and mining in the era of big data

References

31. Yuanyuan Zhu, Lu Qin, Jeffrey Xu Yu, Yiping Ke, and Xuemin Lin: High Efficiency and Quality: Large Graphs Matching, in Proceedings of 2011 ACM International Conference on Information and Knowledge Management (CIKM'11), Pages 1755-1764, 2011.

32. Lijun Chang, Jeffrey Xu Yu, Lu Qin, Yuanyuan Zhu, and Haixun Wang: Finding Information Nebula over Large Networks, in Proceedings of 2011 ACM International Conference on Information and Knowledge Management (CIKM'11), Pages 1465-1474, 2011.

33. Lu Qin, Jeffrey Xu Yu, and Lijun Chang: Computing Structural Statistics by Keywords in Databases, in Proceedings of the 27th IEEE International Conference on Data Engineering (ICDE'11), Pages 363-374, 2011.

34. Lu Qin, Jeffrey Xu Yu, and Lijun Chang: Ten Thousand SQLs: Parallel Keyword Queries Computing, in Proceedings of the Very Large Database Endowment (VLDB), Vol. 3, No. 1, Pages 58-69, 2010.

35. Lu Qin, Jeffrey Xu Yu, and Lijun Chang: Keyword Search in Databases: The Power of RDBMS, in Proceedings of ACM Conference on Management of Data (SIGMOD'09), Pages 681-694, 2009.

36. Lu Qin, Jeffrey Xu Yu, Lijun Chang, and Yufei Tao: Querying Communities in Relational Databases, in Proceedings of the 25th IEEE International Conference on Data Engineering (ICDE'09), Pages 724-735, 2009.

37. Lu Qin, Jeffrey Xu Yu, Lijun Chang, and Yufei Tao: Scalable Keyword Search on Large Data Streams, in Proceedings of the 25th IEEE International Conference on Data Engineering (ICDE'09), Short Paper, Pages 1199-1202, 2009.

38. Lu Qin, Jeffrey Xu Yu, Bolin Ding, and Yoshiharu Ishikawa: Monitoring Aggregate k-NN Objects in Road Networks, in Proceedings of the 20th International Conference on Scientific and Statistical Database Management (SSDBM’08), Pages 168-186, 2008.

39. Bolin Ding, Jeffrey Xu Yu, and Lu Qin: Finding Time-Dependent Shortest Paths over Large Graphs, in Proceedings of the 11th International Conference on Extending Database Technology (EDBT'08), Pages 205-216, 2008.

40. Bolin Ding, Jeffrey Xu Yu, Shan Wang, Lu Qin, Xiao Zhang, and Xuemin Lin: Finding Top-k Min-Cost Connected Trees in Databases, in Proceedings of the 23rd IEEE International Conference on Data Engineering (ICDE'07), Pages 836-845, 2007. (Best Student Paper)

Page 46: Chengqi zhang graph processing and mining in the era of big data

References

41. Jia Wu, Xingquan Zhu, Chengqi Zhang, Philip S. Yu. Bag Constrained Structure Pattern Mining for Multi-Graph Classification. IEEE Transactions on Knowledge and Data Engineering (TKDE), Vol 26, No 10, pp.2382-2396, 2014.

42. Jia Wu, Zhibin Hong, Shirui Pan, Xingquan Zhu, Chengqi Zhang, Zhihua Cai. Multi-Graph Learning with Positive and Unlabeled Bags. SDM 2014: 217-225.

43. Jia Wu, Xingquan Zhu, Chengqi Zhang, Zhihua Cai: Multi-instance Multi-graph Dual Embedding Learning. ICDM’13, 2013: 827-836.

44. Jia Wu, Shirui Pan, Xingquan Zhu, Chengqi Zhang. Multi-Graph-View Learning for Complicated Object Classification. International Joint Conference on Artificial Intelligence (IJCAI’15), 2015

45. Shirui Pan, Jia Wu, and Xingquan Zhu, "CogBoost: Boosting for Fast Cost-sensitive Graph Classification", IEEE Transactions on Knowledge and Data Engineering (TKDE), Accepted, 2015.

46. Shirui Pan, Xingquan Zhu, Chengqi Zhang, and Philip S. Yu. "Graph Stream Classification using Labeled and Unlabeled Graphs", International Conference on Data Engineering (ICDE’13), 2013

47. Shirui Pan and Xingquan Zhu. "CGStream: Continuous Correlated Graph Query for Data Streams". 21st ACM International Conference on Information and Knowledge Management (CIKM), 2012.

48. Shirui Pan and Xingquan Zhu. "Graph Classification with Imbalanced Class Distributions and Noise", 23rd International Joint Conference on Artificial Intelligence (IJCAI), 2013

49. Jia Wu, Zhibin Hong, Shirui Pan, Xingquan Zhu, Chengqi Zhang, Zhihua Cai. "Multi-graph-view Learning for Graph Classification", Proceedings of the 2014 IEEE International Conference on Data Mining (ICDM), 2014

50. Shirui Pan, Jia Wu, Xingquan Zhu, Guodong Long, Chengqi Zhang, “Finding the Best not the Most: Regularized Loss Minimization Subgraph Selection for Graph Classification”, to appear in Pattern Recognition (PR), 2015

Page 47: Chengqi zhang graph processing and mining in the era of big data

Thank you!

Questions?