Final Database
![Page 1: Final Database](https://reader034.fdocuments.in/reader034/viewer/2022042623/55284d4849795928048b4718/html5/thumbnails/1.jpg)
Distributed Databases and Peer-to-Peer Databases
Supervised By:
Dr. Bassam Hasan Hammo
Ayman Fetyani
Mohammed Musaddaq
Mohammed Ghanem
![Page 2: Final Database](https://reader034.fdocuments.in/reader034/viewer/2022042623/55284d4849795928048b4718/html5/thumbnails/2.jpg)
Distributed Databases
![Page 3: Final Database](https://reader034.fdocuments.in/reader034/viewer/2022042623/55284d4849795928048b4718/html5/thumbnails/3.jpg)
Architecture
![Page 4: Final Database](https://reader034.fdocuments.in/reader034/viewer/2022042623/55284d4849795928048b4718/html5/thumbnails/4.jpg)
Architecture
![Page 5: Final Database](https://reader034.fdocuments.in/reader034/viewer/2022042623/55284d4849795928048b4718/html5/thumbnails/5.jpg)
Peer-to-Peer
![Page 6: Final Database](https://reader034.fdocuments.in/reader034/viewer/2022042623/55284d4849795928048b4718/html5/thumbnails/6.jpg)
What are Peer-to-Peer systems?
All nodes are both clients and servers
Multiple connections between nodes
Notion of “equality”, hence “peers”
Pure P2P = zero servers
![Page 7: Final Database](https://reader034.fdocuments.in/reader034/viewer/2022042623/55284d4849795928048b4718/html5/thumbnails/7.jpg)
Potential benefits of P2P systems
Scale up to very large numbers of peers
Dynamic self-organization
Load balancing
Parallel processing
High availability through massive replication
![Page 8: Final Database](https://reader034.fdocuments.in/reader034/viewer/2022042623/55284d4849795928048b4718/html5/thumbnails/8.jpg)
A generic P2P system
• A user at a peer may access sharable data at remote peers
[Figure: three peers, each running P2P software over its own private and sharable data]
![Page 9: Final Database](https://reader034.fdocuments.in/reader034/viewer/2022042623/55284d4849795928048b4718/html5/thumbnails/9.jpg)
![Page 10: Final Database](https://reader034.fdocuments.in/reader034/viewer/2022042623/55284d4849795928048b4718/html5/thumbnails/10.jpg)
Distributed database system (DDBS)
• Distribution transparency
  – Global schema
    • Common data descriptions
    • Distributed data placement
  – Centralized control through global catalog
  – Distributed functions
    • Schema mapping
    • Query processing
    • Transaction management
    • Access control
    • Etc.
[Figure: a distributed database system (DBMS1, DBMS2) receiving queries and transactions across Site 1, Site 2 and Site 3]
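As a rough illustration of centralized control through a global catalog, the sketch below records where each fragment of a global relation is placed and resolves a key to its site. The relation name `employee`, the key ranges, the site names and the `GlobalCatalog` class are all hypothetical; the slides do not specify a catalog layout.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Fragment:
    """A horizontal fragment of a global relation, stored at one site."""
    relation: str
    low: int    # inclusive lower bound of the key range
    high: int   # exclusive upper bound of the key range
    site: str


class GlobalCatalog:
    """Central directory recording where each fragment is placed (assumed design)."""

    def __init__(self, fragments):
        self.fragments = list(fragments)

    def site_for(self, relation, key):
        # Users name only the global relation; the catalog resolves the site.
        for frag in self.fragments:
            if frag.relation == relation and frag.low <= key < frag.high:
                return frag.site
        raise KeyError(f"no fragment of {relation} covers key {key}")

    def sites_for(self, relation):
        # A scan of the whole relation must visit every site holding a fragment.
        return sorted({f.site for f in self.fragments if f.relation == relation})


# Hypothetical placement of one relation over three sites.
catalog = GlobalCatalog([
    Fragment("employee", 0, 1000, "site1"),
    Fragment("employee", 1000, 2000, "site2"),
    Fragment("employee", 2000, 3000, "site3"),
])
```

A query on `employee` never mentions a site: `catalog.site_for("employee", 1500)` resolves the placement on the user's behalf, which is the sense of distribution transparency used here.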
![Page 11: Final Database](https://reader034.fdocuments.in/reader034/viewer/2022042623/55284d4849795928048b4718/html5/thumbnails/11.jpg)
DBS categories
• The various DBS categories can be characterized along the following three dimensions:
• (i) Distribution, ranging from a centralized architecture (no distribution) (D0) through client-server distribution (moderate distribution) (D1) to peer-to-peer, or full-scale, distribution (D2);
• (ii) Autonomy, ranging from zero autonomy (tight integration) (A0) through semi-autonomy (loose integration) (A1) to full autonomy or total isolation (A2);
• (iii) Heterogeneity, ranging from zero heterogeneity (homogeneous systems) (H0) to full heterogeneity (H1).
![Page 12: Final Database](https://reader034.fdocuments.in/reader034/viewer/2022042623/55284d4849795928048b4718/html5/thumbnails/12.jpg)
DBS categories (cont.)
• (A0,D1,H0) identifies properties of distributed database systems, i.e., no heterogeneity and no autonomy.
• (A1,D0,H1) heterogeneous federated database systems.
• (A1,D1,H1) distributed heterogeneous federated database systems.
• (A2,D1,H1) Multi-databases
• (A2,D2,H1) distributed multi-databases
• These systems belong to the class of MDBSs: they are highly decentralized, heterogeneous and totally independent of one another, in the sense that each DBS component is not aware of the existence of all other DBSs and their databases.
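The three dimensions and the named combinations can be written down as a small lookup table. This is only a sketch: the `DBSProfile` type and the `classify` function are illustrative names, and combinations the slides do not name are reported as such rather than guessed.

```python
from typing import NamedTuple


class DBSProfile(NamedTuple):
    autonomy: int        # 0..2, i.e. A0..A2
    distribution: int    # 0..2, i.e. D0..D2
    heterogeneity: int   # 0..1, i.e. H0..H1


# Only the combinations listed on the slide are named here.
CATEGORIES = {
    DBSProfile(0, 1, 0): "distributed database system",
    DBSProfile(1, 0, 1): "heterogeneous federated database system",
    DBSProfile(1, 1, 1): "distributed heterogeneous federated database system",
    DBSProfile(2, 1, 1): "multi-database",
    DBSProfile(2, 2, 1): "distributed multi-database",
}


def classify(autonomy, distribution, heterogeneity):
    """Map an (A, D, H) triple to the category named on the slide, if any."""
    if autonomy not in range(3) or distribution not in range(3):
        raise ValueError("autonomy and distribution must be 0, 1 or 2")
    if heterogeneity not in range(2):
        raise ValueError("heterogeneity must be 0 or 1")
    profile = DBSProfile(autonomy, distribution, heterogeneity)
    return CATEGORIES.get(profile, "unnamed combination")
```

For example, `classify(2, 2, 1)` returns `"distributed multi-database"`, the category with full autonomy and full-scale distribution.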
![Page 13: Final Database](https://reader034.fdocuments.in/reader034/viewer/2022042623/55284d4849795928048b4718/html5/thumbnails/13.jpg)
P2P vs DDBS
![Page 14: Final Database](https://reader034.fdocuments.in/reader034/viewer/2022042623/55284d4849795928048b4718/html5/thumbnails/14.jpg)
Data Integration Architecture
(a) FDBS/MDBS (b) PDBS
![Page 15: Final Database](https://reader034.fdocuments.in/reader034/viewer/2022042623/55284d4849795928048b4718/html5/thumbnails/15.jpg)
MDBS and PDBS
Simplified System Architecture of (a) a MDBS and (b) a PDBS
![Page 16: Final Database](https://reader034.fdocuments.in/reader034/viewer/2022042623/55284d4849795928048b4718/html5/thumbnails/16.jpg)
P2P network topologies
• Unstructured systems
  – No predefined topology for linking the peers to each other. Query routing is done by flooding.
  – e.g. SETI@home, Gnutella
• Structured (DHT) systems
  – There is a specific topology for peer linking.
  – DHTs support a routing mechanism that lets users efficiently find the peer responsible for a key.
  – e.g. CAN, Chord, Pastry, P-Grid
• Super-peer (hybrid) systems
  – Some peers are responsible for indexing and locating the shared data.
  – e.g. Napster, Edutella
![Page 17: Final Database](https://reader034.fdocuments.in/reader034/viewer/2022042623/55284d4849795928048b4718/html5/thumbnails/17.jpg)
P2P unstructured network
• High autonomy (a peer only needs to know a neighbor to log in)
• Searching by flooding the network
  – General but inefficient
• High fault tolerance with replication
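A minimal sketch of flooding, assuming a time-to-live (TTL) hop limit as used by Gnutella-style protocols. The five-peer network, the file name and the `flood_search` function are made up for illustration. Counting messages shows why flooding is inefficient: peers keep receiving a query they have already seen.

```python
from collections import deque


def flood_search(neighbors, data, start, keyword, ttl):
    """Flood a keyword query from `start`, forwarding at most `ttl` hops.

    Returns (peers holding the keyword, number of messages sent).
    """
    visited = {start}
    hits = [start] if keyword in data.get(start, ()) else []
    queue = deque([(start, ttl)])
    messages = 0
    while queue:
        peer, remaining = queue.popleft()
        if remaining == 0:
            continue  # hop budget exhausted: this peer does not forward
        for nb in neighbors.get(peer, ()):
            messages += 1  # a forward costs a message even if nb saw the query
            if nb in visited:
                continue
            visited.add(nb)
            if keyword in data.get(nb, ()):
                hits.append(nb)
            queue.append((nb, remaining - 1))
    return hits, messages


# Hypothetical overlay: each peer knows only its direct neighbors.
NEIGHBORS = {
    "A": ["B", "C"],
    "B": ["A", "D"],
    "C": ["A", "D"],
    "D": ["B", "C", "E"],
    "E": ["D"],
}
DATA = {"D": {"song.mp3"}, "E": {"song.mp3"}}

print(flood_search(NEIGHBORS, DATA, "A", "song.mp3", ttl=2))  # (['D'], 6)
print(flood_search(NEIGHBORS, DATA, "A", "song.mp3", ttl=3))  # (['D', 'E'], 9)
```

With a TTL of 2 the copy at peer E is missed, which illustrates how a hop limit trades completeness of results for fewer messages.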
![Page 18: Final Database](https://reader034.fdocuments.in/reader034/viewer/2022042623/55284d4849795928048b4718/html5/thumbnails/18.jpg)
P2P structured network
• Efficient exact-match search
  – O(log n) for put(key, value) and get(key)
• Limited autonomy, since a peer is responsible for a range of keys
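The put/get interface and the idea that each peer owns a range of keys can be sketched with a consistent-hashing ring. This is a single-process simulation with a global view of the ring, not a real DHT: it does not implement the hop-by-hop routing (for example Chord's finger tables) that gives O(log n) messages in practice. The `SimpleDHT` class, the 32-bit identifier space and the peer names are assumptions.

```python
import bisect
import hashlib


def ring_id(name, bits=32):
    """Hash a peer name or key onto a ring of 2**bits positions."""
    digest = hashlib.sha1(name.encode("utf-8")).digest()
    return int.from_bytes(digest, "big") % (2 ** bits)


class SimpleDHT:
    """Toy ring in which each peer owns the keys up to and including its own id."""

    def __init__(self, peer_names):
        if not peer_names:
            raise ValueError("at least one peer is required")
        self.ring = sorted((ring_id(p), p) for p in peer_names)
        self.ids = [pid for pid, _ in self.ring]
        self.store = {p: {} for p in peer_names}

    def responsible_peer(self, key):
        # Successor rule: the first peer id >= the key id, wrapping around.
        idx = bisect.bisect_left(self.ids, ring_id(key)) % len(self.ring)
        return self.ring[idx][1]

    def put(self, key, value):
        self.store[self.responsible_peer(key)][key] = value

    def get(self, key):
        return self.store[self.responsible_peer(key)].get(key)


dht = SimpleDHT(["peer-a", "peer-b", "peer-c", "peer-d"])
dht.put("report.pdf", "held by peer-x")
```

A peer cannot choose which keys it stores, since placement follows from the hash. That is the limited autonomy noted above, and also why only exact-match lookups are cheap.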
![Page 19: Final Database](https://reader034.fdocuments.in/reader034/viewer/2022042623/55284d4849795928048b4718/html5/thumbnails/19.jpg)
Super-peer network
• Super-peers can perform complex functions (meta-data management, indexing, access control, etc.)
  – Efficiency and QoS
  – Restricted autonomy
  – A super-peer (SP) is a single point of failure, so use several
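One simple way to picture "use several" is to replicate the index on every super-peer and fail over on lookup. The sketch below assumes full replication and an `alive` flag standing in for real failure detection; the class and function names are illustrative and not taken from any particular system.

```python
class SuperPeer:
    """Indexes which ordinary peers hold which keywords."""

    def __init__(self, name):
        self.name = name
        self.alive = True   # stand-in for real failure detection
        self.index = {}     # keyword -> set of peer names

    def register(self, peer, keywords):
        for kw in keywords:
            self.index.setdefault(kw, set()).add(peer)

    def lookup(self, keyword):
        if not self.alive:
            raise ConnectionError(f"{self.name} is down")
        return sorted(self.index.get(keyword, ()))


def register_everywhere(super_peers, peer, keywords):
    # Full replication of the index: simple, but costly with many super-peers.
    for sp in super_peers:
        sp.register(peer, keywords)


def lookup_with_failover(super_peers, keyword):
    # Try each super-peer in turn, so one failure does not block the query.
    for sp in super_peers:
        try:
            return sp.lookup(keyword)
        except ConnectionError:
            continue
    raise ConnectionError("no super-peer reachable")


sp1, sp2 = SuperPeer("sp1"), SuperPeer("sp2")
register_everywhere([sp1, sp2], "peer-7", ["jazz", "blues"])
register_everywhere([sp1, sp2], "peer-3", ["jazz"])
sp1.alive = False  # simulate the failure of one super-peer
print(lookup_with_failover([sp1, sp2], "jazz"))  # ['peer-3', 'peer-7']
```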
![Page 20: Final Database](https://reader034.fdocuments.in/reader034/viewer/2022042623/55284d4849795928048b4718/html5/thumbnails/20.jpg)
Requirements for P2P data management (1)
• Autonomy of peers
  – Peers should be able to join/leave at any time and control their data with respect to other (trusted) peers
• Query expressiveness
  – Key lookup, keyword search, SQL-like
• Efficiency
  – Efficient use of bandwidth, computing power, storage
![Page 21: Final Database](https://reader034.fdocuments.in/reader034/viewer/2022042623/55284d4849795928048b4718/html5/thumbnails/21.jpg)
Requirements for P2P data management (2)
• Quality of service (QoS)
  – User-perceived efficiency: completeness of results, response time, data consistency, …
• Fault tolerance
  – Efficiency and QoS despite failures
• Security
  – Data access control in the context of very open systems
![Page 22: Final Database](https://reader034.fdocuments.in/reader034/viewer/2022042623/55284d4849795928048b4718/html5/thumbnails/22.jpg)
P2P systems comparison
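| Requirements | Unstructured | DHT | Super-peer |
|---|---|---|---|
| Autonomy | high | low | average |
| Query expressiveness | high | low | high |
| Efficiency | low | high | high |
| QoS | low | high | high |
| Fault tolerance | high | high | low |
| Security | low | low | high |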
![Page 23: Final Database](https://reader034.fdocuments.in/reader034/viewer/2022042623/55284d4849795928048b4718/html5/thumbnails/23.jpg)
Data management in P2P systems
• Current research focuses on
  – Decentralized schema mappings
    • PeerDB: unstructured network, keyword search only
  – Extending DHT for complex querying
    • PIER: exact-match and join queries
  – Query reformulation
    • Edutella: super-peer, RDF-based schemas
    • Piazza: graph of pair-wise schema mappings
  – Replication
    • Generally limited to static read-only files
    • P-Grid addresses updates in structured networks
![Page 24: Final Database](https://reader034.fdocuments.in/reader034/viewer/2022042623/55284d4849795928048b4718/html5/thumbnails/24.jpg)
Data management in APPA (Atlas P2P Architecture)
• Objectives
  – Scalability, availability and performance
• Main features
  – Network-independent architecture
  – Layered, service-based architecture
  – Replication with semantics-based reconciliation
  – Decentralized schema management
  – Schema-based query support and optimization
  – Peer data caching
• Prototype on JXTA
  – Network-independent P2P services
![Page 25: Final Database](https://reader034.fdocuments.in/reader034/viewer/2022042623/55284d4849795928048b4718/html5/thumbnails/25.jpg)
Network-independent APPA
![Page 26: Final Database](https://reader034.fdocuments.in/reader034/viewer/2022042623/55284d4849795928048b4718/html5/thumbnails/26.jpg)
Different APPA architectures
![Page 27: Final Database](https://reader034.fdocuments.in/reader034/viewer/2022042623/55284d4849795928048b4718/html5/thumbnails/27.jpg)
Schema management in APPA
• Takes advantage of the collaborative nature of the applications
  – Peers that wish to cooperate agree on a Common Schema Description (CSD)
• Given 2 CSD relation definitions, an example of peer mapping at peer p is: [example shown on the slide image]
• Peer mappings stored as P2P data
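The slide's own mapping example is not preserved in this transcript, so the sketch below uses an entirely hypothetical one to show the general idea: a peer maps its local relation onto an agreed CSD relation, and a query posed against the CSD is reformulated into local terms. The relation names, attribute names and the dictionary format are all assumptions and are not APPA's actual mapping syntax.

```python
# Hypothetical CSD relation:          csd_patient(id, name, disease)
# Hypothetical local relation at p:   p_cases(pid, fullname, diagnosis, ward)
PEER_MAPPING = {
    "csd_relation": "csd_patient",
    "local_relation": "p_cases",
    "attributes": {"id": "pid", "name": "fullname", "disease": "diagnosis"},
}


def reformulate(csd_query, mapping):
    """Rewrite a select/where query on the CSD relation into local names."""
    if csd_query["relation"] != mapping["csd_relation"]:
        raise ValueError("mapping does not cover this relation")
    attrs = mapping["attributes"]
    return {
        "relation": mapping["local_relation"],
        "select": [attrs[a] for a in csd_query["select"]],
        "where": {attrs[a]: v for a, v in csd_query["where"].items()},
    }


def answer(csd_query, mapping, local_rows):
    """Evaluate the reformulated query locally and return rows in CSD terms."""
    local = reformulate(csd_query, mapping)
    to_csd = {loc: csd for csd, loc in mapping["attributes"].items()}
    result = []
    for row in local_rows:
        if all(row[a] == v for a, v in local["where"].items()):
            result.append({to_csd[a]: row[a] for a in local["select"]})
    return result


LOCAL_ROWS = [
    {"pid": 1, "fullname": "Ann", "diagnosis": "flu", "ward": "A"},
    {"pid": 2, "fullname": "Bob", "diagnosis": "cold", "ward": "B"},
]
QUERY = {"relation": "csd_patient", "select": ["name"], "where": {"disease": "flu"}}
print(answer(QUERY, PEER_MAPPING, LOCAL_ROWS))  # [{'name': 'Ann'}]
```

Because the mapping is plain data, it could itself be stored and looked up like any other shared item, in the spirit of keeping peer mappings as P2P data.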
![Page 28: Final Database](https://reader034.fdocuments.in/reader034/viewer/2022042623/55284d4849795928048b4718/html5/thumbnails/28.jpg)
Validation
• Implementation on JXTA
  – Some support for APPA’s basic services
  – Network-independent
• Experimentation on large clusters and grid [Grid 5000]
• Simulation to scale up to very large P2P systems
  – Using SimJava and Brite
![Page 29: Final Database](https://reader034.fdocuments.in/reader034/viewer/2022042623/55284d4849795928048b4718/html5/thumbnails/29.jpg)
Thank you
We hope this presentation has been informative, and thank you for listening.