
Management of Distributed Data

Tomasz Müldner, Elhadi Shakshuki*, Zhonghai Luo and Michael Powell

Jodrey School of Computer Science, Acadia University, Wolfville, NS, Canada

* presenting


Contents of the Talk

• General goals for systems that implement data distribution and remote program invocation

• Distributed Data Manager, DDM

• Conclusions

• Future Work


General Goals

• Platform Independence

• Security

• Scalability

• Pull and Push

• Efficiency

• Synchronization

• Fault Tolerance

• Flexibility

• Selective Choice

• Remote Invocation


DDM: Distributed Data Manager

• Two kinds of applications:
  – a subscriber expresses interest in some data that can be provided by a publisher
  – a publisher selects subscribers and specifies the data to be sent to these subscribers

• A single application may be both a publisher and a subscriber at the same time.

• SSL is used to support secure transmission.

• DDM extends our previous system, called DI.


DDM: Channel

• An abstract concept that represents a virtual, unidirectional connection between two nodes (similar to a pipe used in P2P systems).

• Two kinds of channels:
  – P-channel, created by a publisher
  – S-channel, created by a subscriber

• Each channel has:
  – a name
  – an optional description


DDM: P-channel

Each P-channel specifies the source of its data, i.e., the directory in the file system on the publisher’s computer.


DDM: P-channel

• The publisher can push (distribute) some or all of its P-channels to one or more selected subscribers.

• When a subscriber receives an incoming P-channel, it automatically creates the corresponding S-channel based on:
  – the publisher's IP address and port number
  – the name of the P-channel

• Any data in the P-channel is pushed into the newly created S-channel.
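The automatic creation of an S-channel from the publisher's IP, port number, and P-channel name can be sketched in Java as follows. This is only an illustration: the class `SChannelRegistry`, its method names, and the `ip:port/name` key layout are assumptions, not DDM's actual code.

```java
import java.util.LinkedHashMap;
import java.util.Map;

// Hypothetical subscriber-side registry. An S-channel is identified by the
// publisher's IP address, its port number, and the P-channel name.
final class SChannelRegistry {
    private final Map<String, String> channels = new LinkedHashMap<>();

    // Builds the identifying key; the "ip:port/name" layout is an assumption.
    static String key(String publisherIp, int publisherPort, String channelName) {
        return publisherIp + ":" + publisherPort + "/" + channelName;
    }

    // Called when a pushed P-channel arrives. Creates the matching S-channel
    // on first sight and returns true; returns false if it already existed.
    boolean onIncomingPChannel(String publisherIp, int publisherPort, String channelName) {
        String k = key(publisherIp, publisherPort, channelName);
        return channels.putIfAbsent(k, channelName) == null;
    }

    int size() {
        return channels.size();
    }

    public static void main(String[] args) {
        SChannelRegistry registry = new SChannelRegistry();
        // First push creates the S-channel, the second one reuses it.
        System.out.println(registry.onIncomingPChannel("192.0.2.10", 9000, "quote"));
        System.out.println(registry.onIncomingPChannel("192.0.2.10", 9000, "quote"));
    }
}
```

Keying on all three values means two publishers may both offer a channel called "quote" without colliding on the subscriber.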


Publisher’s Configuration Files

• publisher.properties (static):
  – the port number on which the publisher will send data;
  – a root directory for storing all channel descriptions;
  – the filename and password for the keystore used to establish a secure SSL communication path.

• target.properties (dynamic):
  – lists all potential subscribers; it can be changed at runtime.
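As a rough illustration, both files could be read with the standard `java.util.Properties` class. The slides do not show the actual property keys, so every key below (`port`, `channel.root`, `keystore.file`, `keystore.password`, `target.N`) is a hypothetical placeholder, as is the class `PublisherConfig`.

```java
import java.io.IOException;
import java.io.StringReader;
import java.io.UncheckedIOException;
import java.util.ArrayList;
import java.util.List;
import java.util.Properties;
import java.util.TreeSet;

// Sketch of reading the publisher's two configuration files.
// All property keys used here are hypothetical; the slides do not list them.
final class PublisherConfig {
    // Parses text in .properties format.
    static Properties parse(String text) {
        Properties props = new Properties();
        try {
            props.load(new StringReader(text));
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
        return props;
    }

    // Returns the subscriber addresses from target.properties, ordered by key.
    // Re-reading the file before each push would pick up runtime changes.
    static List<String> targets(Properties targetProps) {
        List<String> result = new ArrayList<>();
        for (String key : new TreeSet<String>(targetProps.stringPropertyNames())) {
            result.add(targetProps.getProperty(key).trim());
        }
        return result;
    }

    public static void main(String[] args) {
        Properties publisher = parse(
                "port=9000\nchannel.root=channels\n"
              + "keystore.file=publisher.jks\nkeystore.password=changeit\n");
        Properties targetProps = parse(
                "target.1=192.0.2.21:9100\ntarget.2=192.0.2.22:9100\n");
        System.out.println("port " + Integer.parseInt(publisher.getProperty("port")));
        System.out.println("targets " + targets(targetProps));
    }
}
```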


Publisher’s Dynamic Files

End-users can select at runtime one or more target clients on which to start the remote program at the same time:


Publisher’s Dynamic Files

If there are multiple applications in a channel, the user can select one of them from the list to start:


Subscriber’s Configuration Files

• subscriber.properties:
  – the port number for receiving data;
  – a root directory for storing all incoming channels' data;
  – a channel creation type;
  – the filename and password for the keystore used to establish a secure SSL communication path;
  – a security flag indicating whether or not the data transmission is secure (true by default);
  – a channel update type;
  – a channel update option.
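The "true by default" behavior of the security flag maps naturally onto the two-argument `Properties.getProperty(key, default)` call. In this minimal sketch the key name `secure` and the class `SubscriberConfig` are assumptions; only the default value comes from the slide.

```java
import java.util.Properties;

// Sketch of reading the subscriber's security flag with its default.
final class SubscriberConfig {
    // The key "secure" is hypothetical. A missing entry means secure transfer.
    static boolean isSecure(Properties props) {
        return Boolean.parseBoolean(props.getProperty("secure", "true"));
    }

    public static void main(String[] args) {
        Properties defaults = new Properties();
        Properties insecure = new Properties();
        insecure.setProperty("secure", "false");
        System.out.println(isSecure(defaults)); // flag absent
        System.out.println(isSecure(insecure)); // flag explicitly disabled
    }
}
```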


Relative Virtual Mapping

Example:

1. A publisher P creates a channel "quote" and specifies the corresponding directory "c:\program\quote".

2. The publisher distributes the "quote" channel to a subscriber S with all channel data (this includes all files and sub-directories under "c:\program\quote").

3. On S, a channel named "quote" is automatically created, and all received data (files and sub-directories) are stored with the original hierarchy under the root channel directory (specified in the configuration file subscriber.properties).
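The mapping in this example amounts to taking each file's path relative to the publisher's channel directory and re-rooting it on the subscriber. The Java sketch below uses `java.nio.file.Path` for this. The class name `RelativeMapping` is hypothetical, and placing the data in a sub-directory named after the channel is an assumption; the slide only says the hierarchy is preserved under the root channel directory. Platform-neutral relative paths stand in for the Windows paths of the example.

```java
import java.nio.file.Path;
import java.nio.file.Paths;

// Sketch of relative virtual mapping: the directory hierarchy below the
// publisher's channel directory is reproduced below the subscriber's root.
final class RelativeMapping {
    // Maps a file under the publisher's channel directory to its location
    // under <subscriberRoot>/<channelName> (the per-channel sub-directory
    // is an assumption), preserving the relative hierarchy.
    static Path map(Path publisherChannelDir, Path publisherFile,
                    Path subscriberRoot, String channelName) {
        Path relative = publisherChannelDir.relativize(publisherFile);
        return subscriberRoot.resolve(channelName).resolve(relative);
    }

    public static void main(String[] args) {
        Path channelDir = Paths.get("program", "quote");
        Path file = Paths.get("program", "quote", "data", "today.txt");
        Path root = Paths.get("ddm", "channels");
        System.out.println(map(channelDir, file, root, "quote"));
    }
}
```

Because only the relative part of each path is transmitted, the subscriber never needs to know where the channel lives on the publisher's disk.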


S-channel

• The S-channel is a reference to the corresponding P-channel.

• It is typically created automatically by the subscriber software when it receives incoming P-channels pushed by publishers.

• The subscriber can also create an S-channel manually:
  – the subscriber must know in advance the channel names available on the publisher side;
  – if the channel name specified in the S-channel does not exist on the publisher side, the S-channel is invalid, much like tuning a TV to the wrong frequency for a station.
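The validity rule for manually created S-channels can be pictured as a name lookup on the publisher side. The sketch below is illustrative only; `ChannelDirectory` and its methods are hypothetical names, and DDM's real lookup is not shown in the slides.

```java
import java.util.HashMap;
import java.util.Map;
import java.util.Optional;

// Hypothetical publisher-side lookup used when a subscriber pulls a channel.
final class ChannelDirectory {
    private final Map<String, String> sourceDirByName = new HashMap<>();

    // Registers a P-channel and the directory that holds its data.
    void publish(String channelName, String sourceDir) {
        sourceDirByName.put(channelName, sourceDir);
    }

    // An empty result means the S-channel names a channel the publisher does
    // not offer, i.e. the S-channel is invalid.
    Optional<String> resolve(String channelName) {
        return Optional.ofNullable(sourceDirByName.get(channelName));
    }

    public static void main(String[] args) {
        ChannelDirectory publisher = new ChannelDirectory();
        publisher.publish("quote", "c:\\program\\quote");
        System.out.println(publisher.resolve("quote").isPresent());  // known channel
        System.out.println(publisher.resolve("quotes").isPresent()); // misspelled name
    }
}
```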


S-channel

The GUI provides buttons to create, modify, and delete S-channels, and to update the selected S-channel, i.e., to pull data for this channel from the specified publisher.


New S-channel

• address of the publisher node

• channel name

• description

• update type and option (these can also be set for an existing S-channel, so the subscriber can control the way data are received)


Remote Invocation

To distribute and invoke components C1, C2, C3:

1. A publisher (on N) creates three channels, one for each of N1, N2 and N3.

2. The publisher specifies the Main class in the descriptor file app.xml (explained in the next slide) for each channel.

3. The publisher adds the addresses of N1, N2 and N3 to the file target.properties.

4. The publisher distributes all channels to the three subscribers.

5. The publisher uses the "remote start" button.

[Figure: components C1, C2, C3 on machine N; target machines N1, N2, N3]


XML-based Descriptor

Defines the startup class for the distributed program and the path information needed to load the dependent classes for this program, for example:

<?xml version="1.0" encoding="ISO-8859-1"?>
<App name="CS Lab course">
  <MainClass> ca.acadiau.cs.Lab </MainClass>
  <ClassPath> client.jar </ClassPath>
</App>

The descriptor should be located in the channel and sent to the subscriber along with the channel's other content.
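A subscriber could read such a descriptor with the JDK's built-in DOM parser. The sketch below is one possible way to do it, not DDM's actual loader; the class `DescriptorReader` and its methods are hypothetical, while the element names are those shown in the example descriptor.

```java
import java.io.ByteArrayInputStream;
import java.nio.charset.StandardCharsets;
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.Document;
import org.w3c.dom.Element;
import org.w3c.dom.NodeList;

// Sketch of reading an app.xml descriptor with the standard DOM API.
final class DescriptorReader {
    // Parses the descriptor text and returns its root <App> element.
    static Element parse(String xml) {
        try {
            Document doc = DocumentBuilderFactory.newInstance().newDocumentBuilder()
                    .parse(new ByteArrayInputStream(
                            xml.getBytes(StandardCharsets.ISO_8859_1)));
            return doc.getDocumentElement();
        } catch (Exception e) {
            throw new IllegalArgumentException("invalid descriptor", e);
        }
    }

    // Returns the trimmed text of the first element with the given tag,
    // or null if the descriptor does not contain it.
    static String field(Element app, String tag) {
        NodeList nodes = app.getElementsByTagName(tag);
        return nodes.getLength() == 0 ? null : nodes.item(0).getTextContent().trim();
    }

    public static void main(String[] args) {
        String xml = "<?xml version=\"1.0\" encoding=\"ISO-8859-1\"?>"
                + "<App name=\"CS Lab course\">"
                + "<MainClass> ca.acadiau.cs.Lab </MainClass>"
                + "<ClassPath> client.jar </ClassPath>"
                + "</App>";
        Element app = parse(xml);
        System.out.println(app.getAttribute("name"));
        System.out.println(field(app, "MainClass"));
        System.out.println(field(app, "ClassPath"));
    }
}
```

Trimming the element text matters here because the example descriptor pads its values with spaces.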


XML-based Descriptor

A program can be remotely started with a predefined policy file and arguments.

<?xml version="1.0" encoding="ISO-8859-1"?>
<App name="CS Lab course">
  <MainClass> cs.Lab </MainClass>
  <ClassPath> client.jar </ClassPath>
  <PolicyFile> policy.txt </PolicyFile>
  <Arguments> Michael:27 </Arguments>
</App>
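One way to picture how these four fields might drive a remote start is to assemble a `java` command line from them. The slides do not say how DDM actually launches the program (it may load the class in-process instead), so the sketch below is purely an assumption, as are the class `LaunchCommand` and the choice to split the arguments on whitespace. The `java.security.manager` and `java.security.policy` system properties are standard JVM options, although the Security Manager they configure has been deprecated since Java 17.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

// Sketch: turn descriptor fields into a JVM command line (not executed here).
final class LaunchCommand {
    static List<String> build(String mainClass, String classPath,
                              String policyFile, String arguments) {
        List<String> cmd = new ArrayList<>();
        cmd.add("java");
        if (policyFile != null) {
            // Enable the security manager with the descriptor's policy file.
            cmd.add("-Djava.security.manager");
            cmd.add("-Djava.security.policy=" + policyFile);
        }
        cmd.add("-cp");
        cmd.add(classPath);
        cmd.add(mainClass);
        if (arguments != null && !arguments.trim().isEmpty()) {
            // Assumption: arguments are separated by whitespace.
            cmd.addAll(Arrays.asList(arguments.trim().split("\\s+")));
        }
        return cmd;
    }

    public static void main(String[] args) {
        System.out.println(String.join(" ",
                build("cs.Lab", "client.jar", "policy.txt", "Michael:27")));
    }
}
```

The resulting list could be handed to `ProcessBuilder` on the subscriber, which keeps the started program in a separate process from the subscriber software.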


Conclusions

• DDM provides both data distribution and remote invocation.

• It can be used with firewalls. Consider a specific example of a campus that uses a firewall:
  – it is possible to set up an S-channel from an off-campus machine to a machine inside the firewall;
  – the publisher inside the firewall can push data to the off-campus machine.


Future Work

• The current interface is GUI-based; for some applications it will be useful to provide a text-based interface.

• To improve the flexibility of our system, we will add a publisher broker, which will maintain all available publishers (and their channels) so that subscribers can dynamically set and modify these data.

• We will support invocation of non-Java programs.