GridNEWS: A distributed Grid platform for efficient storage, annotating, indexing and searching of large audiovisual news content
Ioannis Konstantinou
School of ECE
Computing Systems Laboratory
National Technical University of Athens
Concept
Text-based search in audiovisual content
Search results: portions of video files containing the selected keywords
Example: a user searches for the keyword “Acropolis”; the video portions containing the spoken word “Acropolis” are located and presented to the user
YouTube-like functionality
Objectives
Keyword extraction from video files using automatic speech recognition (ASR) algorithms
Efficient and scalable distributed storage of large media content
Indexing of the extracted metadata for efficient keyword search
YouTube-like user interface for searching and downloading videos
Contributions to existing Grid middleware using GGF-standardized components
Addressed Issues
Execution: distributed execution of CPU- and data-intensive speech recognition algorithms
Storage server load balancing using performance metrics
Client transfer time optimization using BitTorrent-like algorithms
Increased data availability
Multi-organizational data storage support using Virtual Organizations (VOs)
Video file keyword extraction procedure
[Diagram: the MetaScheduler schedules the execution of the application software (CVSP tool) on the clusters; a data client performs GET/PUT file operations on the storage elements (SE); the exported metadata file is imported by the MetadataParser, which updates the MetadataDB.]
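The indexing step can be illustrated with a minimal Python sketch. It assumes, hypothetically, that the ASR output for a video reduces to (word, start, end) tuples in seconds and that the MetadataDB behaves like an inverted index from keyword to occurrences; the class names and the 5-second context window around each hit are illustrative only and are not part of GridNEWS.

```python
from collections import defaultdict
from dataclasses import dataclass


@dataclass(frozen=True)
class Fragment:
    """A portion of a video file, in seconds, that contains a search hit."""
    video: str
    start: float
    end: float


class KeywordIndex:
    """Hypothetical in-memory stand-in for the MetadataDB keyword index."""

    def __init__(self, context: float = 5.0):
        # Seconds of video kept before and after each spoken occurrence.
        self.context = context
        # Inverted index: lower-cased keyword -> list of (video, word_start, word_end).
        self._postings = defaultdict(list)

    def import_transcript(self, video, words):
        """Import ASR output for one video, given as (word, start_sec, end_sec) tuples."""
        for word, start, end in words:
            self._postings[word.lower()].append((video, start, end))

    def search(self, keyword):
        """Return one fragment around every occurrence of the keyword."""
        hits = self._postings.get(keyword.lower(), [])
        return [
            Fragment(video, max(0.0, start - self.context), end + self.context)
            for video, start, end in hits
        ]


index = KeywordIndex(context=5.0)
index.import_transcript("news_0412.mpg", [("in", 11.0, 12.0), ("Acropolis", 12.0, 13.0)])
print(index.search("acropolis"))  # [Fragment(video='news_0412.mpg', start=7.0, end=18.0)]
```

Returning a time window rather than the whole file is what allows only the matching portion of a video to be served; a real deployment would keep the index in the MetadataDB rather than in memory.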
Distributed Execution Platform Architecture
[Diagram: the user interface connects to the GridWay metascheduler, which runs on Globus Toolkit 4 together with a central MDS server. Local MDS servers publish aggregate cluster information to the central MDS server using SOAP/XML messages. Each computing element runs Globus Toolkit 4 (MDS + WS GRAM) with a PBS scheduler and the Ganglia tool; its worker nodes (WN) run the PBS client service and report Ganglia statistics.]
MDS: Monitoring and Discovery System
PBS: Portable Batch System
WS GRAM: Web Services Grid Resource Allocation and Management
WN: Worker Node
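The information flow of this architecture (local MDS servers publish, the central MDS server aggregates, the metascheduler picks a cluster) can be sketched as follows. The record fields, the "freshest report wins" rule and the "most free nodes" selection policy are hypothetical simplifications; GridWay's actual scheduling policy is configurable and is not reproduced here.

```python
from dataclasses import dataclass
from typing import Dict, List, Optional


@dataclass
class ClusterInfo:
    """Hypothetical summary a local MDS server publishes for its cluster."""
    name: str
    free_nodes: int   # idle worker nodes, e.g. as reported by Ganglia/PBS
    total_nodes: int


class CentralMDS:
    """Keeps the latest report received from each local MDS server."""

    def __init__(self) -> None:
        self._clusters: Dict[str, ClusterInfo] = {}

    def publish(self, info: ClusterInfo) -> None:
        # A newer report for the same cluster replaces the older one.
        self._clusters[info.name] = info

    def clusters(self) -> List[ClusterInfo]:
        return list(self._clusters.values())


def select_cluster(mds: CentralMDS, nodes_needed: int) -> Optional[str]:
    """Return the cluster with the most free nodes that can run the job, if any."""
    candidates = [c for c in mds.clusters() if c.free_nodes >= nodes_needed]
    if not candidates:
        return None  # nothing fits right now; the job would stay queued
    return max(candidates, key=lambda c: c.free_nodes).name


mds = CentralMDS()
mds.publish(ClusterInfo("cluster-a", free_nodes=2, total_nodes=8))
mds.publish(ClusterInfo("cluster-b", free_nodes=5, total_nodes=8))
mds.publish(ClusterInfo("cluster-a", free_nodes=6, total_nodes=8))  # fresher report
print(select_cluster(mds, nodes_needed=4))  # cluster-a
```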
Distributed Storage Platform Architecture
[Diagram: the Storage Subsystem consists of several storage elements (SE). Each SE runs a GridFTP server, the Network Weather Service (NWS) and the DRLSCom service inside a Globus WS (Apache Axis) container, and participates in the DRLS Pastry peer-to-peer overlay; a replication algorithm uses the NWS statistics. On the client machine, the GridNews DataClient component (GridFTP client and parallel downloader) queries the storage subsystem with SOAP/XML messages to obtain candidate storage servers and to get/put LFN-to-PFN mappings, and then transfers files with GridFTP get/put.]
MDS: Monitoring and Discovery System
GridFTP: Grid File Transfer Protocol
DRLS: Distributed Replica Location Service
SE: Storage Element
PFN: Physical File Name
LFN: Logical File Name
Distributed Replica Location Service
Contains the mappings of LFNs to PFNs
DHT used: Pastry P2P
Logarithmic routing: in a network with n nodes, a query needs only O(log n) messages (hops)
Plaxton's algorithm reduces query latencies
Redundancy through replication eliminates single points of failure
Inherent load balancing through consistent hashing
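The sketch below shows only the placement side of a DHT-based replica location service: an LFN is hashed into the same 128-bit identifier space as the nodes, and its PFN list is stored on the r nodes whose identifiers are numerically closest, so losing one node does not lose the mapping. The SHA-1-based identifiers, the class, the LFN and the PFNs are illustrative assumptions, and Pastry's prefix routing (which reaches the responsible node in a logarithmic number of hops) is not modeled; the toy ring is simply searched exhaustively.

```python
import hashlib
from collections import defaultdict
from typing import Dict, Iterable, List, Set

ID_SPACE = 1 << 128  # Pastry-style 128-bit identifier space


def to_id(name: str) -> int:
    """Hash a node name or an LFN into the 128-bit identifier space (SHA-1, truncated)."""
    return int.from_bytes(hashlib.sha1(name.encode("utf-8")).digest()[:16], "big")


def circular_distance(a: int, b: int) -> int:
    """Numeric distance between two identifiers on the circular ID space."""
    d = abs(a - b)
    return min(d, ID_SPACE - d)


class ReplicaLocationRing:
    """Toy DRLS: an LFN's PFNs are kept on the r nodes closest to hash(LFN)."""

    def __init__(self, nodes: Iterable[str], replicas: int = 2) -> None:
        self.node_ids: Dict[int, str] = {to_id(n): n for n in nodes}
        self.replicas = replicas
        self.store: Dict[str, Dict[str, Set[str]]] = defaultdict(dict)

    def responsible_nodes(self, lfn: str) -> List[str]:
        key = to_id(lfn)
        ranked = sorted(self.node_ids, key=lambda nid: circular_distance(nid, key))
        return [self.node_ids[nid] for nid in ranked[: self.replicas]]

    def add_mapping(self, lfn: str, pfn: str) -> None:
        """Store the LFN->PFN mapping on every responsible node (redundancy)."""
        for node in self.responsible_nodes(lfn):
            self.store[node].setdefault(lfn, set()).add(pfn)

    def lookup(self, lfn: str, failed: Iterable[str] = ()) -> List[str]:
        """Return the PFNs for lfn from the first responsible node that is still alive."""
        down = set(failed)
        for node in self.responsible_nodes(lfn):
            if node not in down:
                return sorted(self.store[node].get(lfn, set()))
        return []


ring = ReplicaLocationRing([f"se{i}" for i in range(1, 6)], replicas=2)
lfn = "news/bulletin-0001.mpg"
ring.add_mapping(lfn, "gsiftp://se1.example.org/data/news1.mpg")
ring.add_mapping(lfn, "gsiftp://se3.example.org/data/news1.mpg")
owners = ring.responsible_nodes(lfn)
print(owners, ring.lookup(lfn, failed=[owners[0]]))
```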
Load Balancing
Servers exchange load metrics: CPU, bandwidth, free disk space
Prediction algorithms (e.g. linear regression) forecast future metric values from historical data
Weighted Normalized Metric: $\mathrm{WNM} = W_m \times \frac{M_t}{M_{\max}}$
Total Server Load: $\mathrm{TSL} = \sum_{i=1}^{n} \mathrm{WNM}_i$, summed over the server's metrics
Servers maintain a numerically sorted TSL list $[\mathrm{TSL}_1, \ldots, \mathrm{TSL}_n]$, one entry per server
The TSL list is periodically refreshed
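A minimal sketch of the load metric above, assuming each metric's history is forecast one step ahead with an ordinary least-squares line and that the forecast is used as M_t. The weights, maxima and metric names are hypothetical, and every metric is assumed to be expressed as a utilization (higher means more loaded, e.g. used rather than free disk space) so that the weighted terms can simply be added into the TSL.

```python
from typing import Dict, List, Sequence, Tuple

# Hypothetical weights W_m and maxima M_max. Every metric is expressed as a
# utilization (higher = more loaded), e.g. used rather than free disk space.
WEIGHTS = {"cpu": 0.5, "bandwidth": 0.3, "disk": 0.2}
MAXIMA = {"cpu": 100.0, "bandwidth": 1000.0, "disk": 500.0}


def forecast_next(history: Sequence[float]) -> float:
    """Least-squares line v = a + b*t over t = 0..n-1, evaluated at t = n."""
    n = len(history)
    if n == 0:
        raise ValueError("empty history")
    if n == 1:
        return float(history[0])
    t_mean = (n - 1) / 2
    v_mean = sum(history) / n
    cov = sum((t - t_mean) * (v - v_mean) for t, v in enumerate(history))
    var = sum((t - t_mean) ** 2 for t in range(n))
    return v_mean + (cov / var) * (n - t_mean)


def wnm(metric: str, history: Sequence[float]) -> float:
    """Weighted normalized metric W_m * (M_t / M_max), with the forecast as M_t."""
    m_t = min(max(forecast_next(history), 0.0), MAXIMA[metric])  # clamp extrapolation
    return WEIGHTS[metric] * (m_t / MAXIMA[metric])


def total_server_load(histories: Dict[str, Sequence[float]]) -> float:
    """TSL: sum of the WNM values over all monitored metrics of one server."""
    return sum(wnm(metric, h) for metric, h in histories.items())


def sorted_tsl_list(servers: Dict[str, Dict[str, Sequence[float]]]) -> List[Tuple[float, str]]:
    """(TSL, server) pairs in ascending order: least-loaded server first."""
    return sorted((total_server_load(h), name) for name, h in servers.items())


servers = {
    "se1": {"cpu": [20, 30, 40], "bandwidth": [100, 100, 100], "disk": [200, 200, 200]},
    "se2": {"cpu": [50, 50, 50], "bandwidth": [500, 500, 500], "disk": [250, 250, 250]},
}
print(sorted_tsl_list(servers))
```

In this example se1's CPU load is rising, so its forecast CPU value (50) is used rather than the last sample (40); it still ends up less loaded overall than se2.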
Replication
Upon a STORE client request:
The top k servers are selected from the TSL list (k: configurable static replication factor)
The most suitable server is returned to the client
The client initiates a single GridFTP file upload
The server replicates the new file according to the TSL list and the factor k
The DRLS is informed about the new LFN->PFN mappings
The client is informed upon completion
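The STORE steps can be sketched as below, assuming the ascending server list maintained under Load Balancing (least loaded first) and taking "top k" to mean the k least-loaded servers. The upload, replicate and register callbacks are hypothetical stand-ins for the GridFTP upload, the server-side replication and the DRLS registration.

```python
from typing import Callable, List, Sequence, Tuple


def select_store_targets(tsl_list: Sequence[Tuple[float, str]], k: int) -> Tuple[str, List[str]]:
    """Take the k least-loaded servers: the first receives the upload, the rest get replicas."""
    if k < 1:
        raise ValueError("replication factor k must be at least 1")
    chosen = [server for _, server in tsl_list[:k]]
    if not chosen:
        raise ValueError("no storage servers available")
    return chosen[0], chosen[1:]


def store(lfn: str,
          tsl_list: Sequence[Tuple[float, str]],
          k: int,
          upload: Callable[[str, str], str],
          replicate: Callable[[str, str, str], str],
          register: Callable[[str, str], None]) -> List[str]:
    """Run the STORE flow and return the PFNs of all k copies."""
    primary, others = select_store_targets(tsl_list, k)
    pfns = [upload(primary, lfn)]            # client: single GridFTP upload
    for target in others:                    # server: replicate to the other selected servers
        pfns.append(replicate(primary, target, lfn))
    for pfn in pfns:                         # inform the DRLS of each LFN->PFN mapping
        register(lfn, pfn)
    return pfns                              # returning = client informed of completion


tsl = [(0.36, "se1"), (0.41, "se4"), (0.50, "se2"), (0.72, "se3")]
registry = []
pfns = store(
    "news/bulletin-0001.mpg", tsl, k=3,
    upload=lambda se, lfn: f"gsiftp://{se}/{lfn}",
    replicate=lambda src, dst, lfn: f"gsiftp://{dst}/{lfn}",
    register=lambda lfn, pfn: registry.append((lfn, pfn)),
)
print(pfns)
```

The client only pays for one upload; the remaining k-1 copies are made server-side.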
Parallel Downloader
Upon a GET client request:
The server contacts the DRLS and retrieves the replica locations
The client establishes N GridFTP connections
The client issues N parallel (threaded) requests for small data chunks
After each successful retrieval, the client issues another request
Optimized file transfer time: the larger portion of the file is retrieved from the faster storage nodes
To be replaced by GridTorrent
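The self-balancing effect of issuing a new request after each completed chunk can be demonstrated with a small threaded sketch. Here `fetch` is a hypothetical callback standing in for a partial GridFTP transfer from one replica, one thread models one connection, and two simulated servers with different delays replace real storage elements; retries and failure handling are omitted.

```python
import queue
import threading
import time
from typing import Callable, Dict, List, Tuple


def parallel_download(size: int, chunk_size: int, servers: List[str],
                      fetch: Callable[[str, int, int], bytes]) -> Tuple[bytes, Dict[str, int]]:
    """Download `size` bytes in chunks, one worker thread (connection) per replica server.

    fetch(server, offset, length) is a hypothetical stand-in for a partial GridFTP get.
    Each worker claims a new chunk as soon as its previous one arrives, so faster
    servers end up serving more chunks.
    """
    pending: "queue.Queue[int]" = queue.Queue()
    for offset in range(0, size, chunk_size):
        pending.put(offset)
    pieces: Dict[int, bytes] = {}
    served = {server: 0 for server in servers}
    lock = threading.Lock()

    def worker(server: str) -> None:
        while True:
            try:
                offset = pending.get_nowait()
            except queue.Empty:
                return  # every chunk has been claimed
            data = fetch(server, offset, min(chunk_size, size - offset))
            with lock:
                pieces[offset] = data
                served[server] += 1

    threads = [threading.Thread(target=worker, args=(s,)) for s in servers]
    for t in threads:
        t.start()
    for t in threads:
        t.join()
    return b"".join(pieces[o] for o in sorted(pieces)), served


# Demo with two simulated replicas, one 20x slower than the other.
SOURCE = bytes(range(256)) * 16           # 4096-byte "file"
DELAY = {"fast-se": 0.001, "slow-se": 0.02}


def simulated_fetch(server: str, offset: int, length: int) -> bytes:
    time.sleep(DELAY[server])
    return SOURCE[offset:offset + length]


data, served = parallel_download(len(SOURCE), 128, list(DELAY), simulated_fetch)
print(data == SOURCE, served)
```

Because no chunk is assigned in advance, the slow replica serves only the few chunks it manages to claim and the fast replica absorbs the rest.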
GridTorrent
Metadata fields: current ID, file size, piece size, hashes
Distributed RLS instead of a tracker
Partial GridFTP for the actual transfer
BitTorrent replica selection and tit-for-tat algorithm
Compatible with plain GridFTP servers
The PFN's prefix determines the protocol (gtp://site.fully.qualified.domain.name/path/to/file)
[Diagram: a peer that wants to download a file obtains the size, piece_size, hashes and currentID from the Distributed RLS, which maps the currentID to the PFNs of the peer set (pfn1 pfn2 pfn3 …); peers publish their PFNs to the Distributed RLS. The downloading GridTorrent client opens new connections to other GridTorrent clients (Peer1, Peer2) using BitTorrent and striped GridFTP, and to a plain GridFTP server (Peer3) using striped GridFTP, retrieving the file piece by piece.]
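The metadata fields and the prefix-based protocol choice can be sketched as follows. The slides do not specify the hash function; SHA-1 per piece is assumed here because BitTorrent uses it, and the `gsiftp://` prefix for plain GridFTP servers is likewise an assumption. The dataclass, the function names and the example current ID are hypothetical.

```python
import hashlib
from dataclasses import dataclass
from typing import List
from urllib.parse import urlparse


@dataclass
class GridTorrentMeta:
    """Hypothetical container for the GridTorrent metadata fields."""
    current_id: str
    size: int
    piece_size: int
    hashes: List[str]  # one SHA-1 hex digest per piece (assumption)


def build_metadata(data: bytes, piece_size: int, current_id: str) -> GridTorrentMeta:
    """Split the file into fixed-size pieces (the last may be shorter) and hash each one."""
    hashes = [
        hashlib.sha1(data[i:i + piece_size]).hexdigest()
        for i in range(0, len(data), piece_size)
    ]
    return GridTorrentMeta(current_id, len(data), piece_size, hashes)


def verify_piece(meta: GridTorrentMeta, index: int, piece: bytes) -> bool:
    """Accept a downloaded piece only if its hash matches the published one."""
    return hashlib.sha1(piece).hexdigest() == meta.hashes[index]


def transfer_protocol(pfn: str) -> str:
    """Choose how to fetch a replica from the PFN prefix (the URL scheme)."""
    scheme = urlparse(pfn).scheme
    if scheme == "gtp":
        return "gridtorrent"  # peer runs a GridTorrent client
    if scheme == "gsiftp":
        return "gridftp"      # plain GridFTP server (assumed prefix)
    raise ValueError(f"unsupported PFN prefix: {scheme!r}")


meta = build_metadata(b"x" * 1000, piece_size=256, current_id="currentID-1")
print(len(meta.hashes), transfer_protocol("gtp://site.fully.qualified.domain.name/path/to/file"))
```

Per-piece hashes let a client accept pieces from any mix of GridTorrent peers and plain GridFTP servers, since every piece is checked independently of its source.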
EXISTING MIDDLEWARE CONTRIBUTION
Design and development of a dynamic replica selection/placement algorithm
Support for multiple clusters using the GridWay metascheduler
Replacement of the centralized replica location service with a scalable, distributed peer-to-peer solution
Enhanced BitTorrent-like file downloading from multiple sources
Development Testbed
Hardware
4× dual-core AMD Opteron(tm) Processor 875, 2.2 GHz (8 virtual CPUs), 16 GB RAM
Deployment of 5 Xen virtual machines with 2 GB RAM each
Software
Globus Toolkit v4.0
Globus WS Core (Apache Axis WS container)
Rice Pastry P2P (Java)
Network Weather Service
Torque (OpenPBS) scheduler
GridWay metascheduler
Virtualization
Xen hypervisor
Paravirtualization technique: guest OSes use a special “Xen-aware” kernel
Direct utilization of special CPU instructions
Faster than full virtualization (VMware)
Use of the Xen hypervisor:
Easy prototype management/administration
Simple control of the node lifecycle
Facilitates prototype deployment on many physical nodes
Current Work
Replacing ParallelDownloader with GridTorrent
Deploying the prototype on the PlanetLab testbed
Running experiments and fine-tuning the designed algorithms
GridNEWS Portal
Users can:
Perform keyword searches in the automatically annotated multimedia content
View the videos in their browser, YouTube-style
Download only the fragment of a video in which the keyword occurs
Screenshots
[Screenshots: search keyword, resulting video URLs, and the time position of each match]
Questions