Information Searching and Retrieval from Distributed Databases using Mediators, CORBA and XML
Presented by: Vincent Cheung
Supervised by: Prof. Michael Lyu, Prof. K. W. Ng
Dec 18, 2000
Research objectives
  Address the use of XML in enhancing data representation and system communication
  Address the use of a mediator architecture for searching in a distributed environment
  Address the use of XML and HTTP to simulate CORBA IIOP calls
  We have developed a CORBA-based mediator system that overcomes the firewall limitation and can therefore work as a worldwide system
Presentation outline
  What is a mediator?
  Using XML
  Implementation with CORBA
  CORBA vs. firewalls
  Overcoming the firewall
  Evaluation
  Future work
  Conclusion
  Demonstration
Searching in distributed sources
  There are three approaches to integrating the different components of such a system:
    CORBA
    Mediators
    Agents
  They are not orthogonal, e.g.:
    Mediators can be implemented with CORBA
    Agents may use mediators
What is a mediator?
  A middle layer that forwards clients' queries to the appropriate sources and integrates the data before returning it to the users
  [Figure: several client UIs send queries to the mediator; the mediator forwards them to several database engines and returns the integrated result]
Using XML
  We need a COMMON information format so that distributed heterogeneous components can exchange data.
  XML provides a good solution because:
    XML is semi-structured and highly flexible in data representation.
    Highly structured traditional relational and object-oriented data schemas can be mapped to an XML schema without losing information.
    A common XML schema for heterogeneous systems can be achieved.
  Hence, we use an XML data schema in our system.
A piece of XML data
  <news>
    <source>South China Morning Post</source>
    <date>April 15, 2000</date>
    <title>Press warning appropriate, says Beijing</title>
    <reporter>Kong Lai-fan</reporter>
    <reporter>Greg Torode</reporter>
    <content>Beijing yesterday defended remarks made by senior SAR-based official Wang Fengchao that local media should avoid reporting separatist views.</content>
  </news>
XML-QL
  Traditional query languages may not be adequate for querying semi-structured data
  We use XML-QL, which is designed for semi-structured XML data
  An example:
    where <news> $B </news> in "news_db.xml",
          <date><year>2000</year><month> 4 </month><day> 15 </day></date> in $B
    construct <result> $B </result>
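The effect of the example query can be imitated with the DOM parser that ships with the JDK. The sketch below is not an XML-QL engine; it only performs the same selection by hand. The class name NewsDateFilter, its methods, and the <db> wrapper element used in main are assumptions made for illustration, and the code assumes that dates are stored with <year>, <month> and <day> children, as the query expects.

```java
import java.io.StringReader;
import java.util.ArrayList;
import java.util.List;
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.Document;
import org.w3c.dom.Element;
import org.w3c.dom.Node;
import org.w3c.dom.NodeList;
import org.xml.sax.InputSource;

// Hypothetical stand-in for the query above: plain DOM code, not an XML-QL engine.
class NewsDateFilter {
    // Direct child elements of `parent` that are named `tag`.
    static List<Element> children(Element parent, String tag) {
        List<Element> out = new ArrayList<>();
        for (Node n = parent.getFirstChild(); n != null; n = n.getNextSibling()) {
            if (n instanceof Element && n.getNodeName().equals(tag)) {
                out.add((Element) n);
            }
        }
        return out;
    }

    // Trimmed text of the first direct child named `tag`, or null if absent.
    static String text(Element parent, String tag) {
        List<Element> found = children(parent, tag);
        return found.isEmpty() ? null : found.get(0).getTextContent().trim();
    }

    // Every <news> element that has a <date> child with the given year, month and day.
    static List<Element> select(String xml, String year, String month, String day) {
        Document doc;
        try {
            doc = DocumentBuilderFactory.newInstance().newDocumentBuilder()
                    .parse(new InputSource(new StringReader(xml)));
        } catch (Exception e) {
            throw new IllegalArgumentException("not well-formed XML", e);
        }
        List<Element> hits = new ArrayList<>();
        NodeList news = doc.getElementsByTagName("news");
        for (int i = 0; i < news.getLength(); i++) {
            Element item = (Element) news.item(i);
            for (Element date : children(item, "date")) {
                if (year.equals(text(date, "year")) && month.equals(text(date, "month"))
                        && day.equals(text(date, "day"))) {
                    hits.add(item);
                    break;
                }
            }
        }
        return hits;
    }

    public static void main(String[] args) {
        String db = "<db>"
                + "<news><title>a</title><date><year>2000</year><month> 4 </month><day> 15 </day></date></news>"
                + "<news><title>b</title><date><year>2000</year><month>5</month><day>1</day></date></news>"
                + "</db>";
        for (Element hit : select(db, "2000", "4", "15")) {
            System.out.println("<result>" + text(hit, "title") + "</result>");
        }
    }
}
```

Values are trimmed before comparison so that " 4 " in the data matches "4" in the call.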
Why use CORBA?
  Common Object Request Broker Architecture
  Designed for application development within distributed heterogeneous environments
  We use CORBA to build our mediator system.
Architecture of our system
  [Figure: N-tier architecture. 1st tier: Web-based UIs. 2nd tier: servlet interface and a mediator (forwarding queries and integrating results). 3rd tier: data sources and further mediators. ... Nth tier. Queries flow from the UI down through the tiers; results flow back up.]
Handling some special cases
  Some special cases: infinite loops, broken connections, and too many layers of traversal.
  Infinite loops:
    [Figure: three mediators forward the same query, qid = 123, to one another in a cycle]
    Solution: give each query a unique ID. Each mediator keeps track of the IDs of all queries that it has not yet answered; receiving an ID that is already pending indicates that an infinite loop has occurred.
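A minimal sketch of the duplicate-ID check, assuming each mediator keeps its unanswered query IDs in a set. The class PendingQueries and its method names are hypothetical; qid is an int here because that is the Java type an IDL long maps to.

```java
import java.util.Set;
import java.util.concurrent.ConcurrentHashMap;

// Hypothetical sketch of the duplicate-ID check; names are not from the presented system.
class PendingQueries {
    // IDs of queries that have been received but not yet answered.
    private final Set<Integer> pending = ConcurrentHashMap.newKeySet();

    // Registers qid. Returns false if it is already pending, which signals a loop.
    boolean begin(int qid) {
        return pending.add(qid);
    }

    // Forgets qid once its answer has been returned.
    void finish(int qid) {
        pending.remove(qid);
    }

    public static void main(String[] args) {
        PendingQueries tracker = new PendingQueries();
        System.out.println(tracker.begin(123)); // first arrival of qid 123
        System.out.println(tracker.begin(123)); // same qid arrives again through a cycle
        tracker.finish(123);
    }
}
```

A thread-safe set is used because a mediator may serve several queries at the same time.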
Special cases
  Broken connection
    Solution: use a timeout parameter to specify the maximum amount of time that we are willing to wait.
  Too many layers of traversal
    Solution: use a maximum-layer parameter to specify the maximum number of layers that the query may traverse.
  [Figure: a chain of three mediators; the parameters shrink at each hop: Timeout = 15000, Max_layer = 3; then Timeout = 10000, Max_layer = 2; then Timeout = 5000, Max_layer = 1]
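The way the two parameters shrink from hop to hop can be sketched as follows. The class SysParaSketch and the method forNextLayer are hypothetical, and the fixed allowance of 5000 subtracted per hop is only read off the figure's values; the slides do not state the actual rule or the unit of the timeout.

```java
import java.util.Optional;

// Hypothetical Java mirror of the query parameters (qid, timeout, maxlayer).
class SysParaSketch {
    final int qid;
    final int timeout;    // unit not stated in the slides
    final short maxlayer;

    SysParaSketch(int qid, int timeout, short maxlayer) {
        this.qid = qid;
        this.timeout = timeout;
        this.maxlayer = maxlayer;
    }

    // Parameters to pass one layer down, or empty when no further hop is allowed.
    // hopAllowance is the time this layer keeps for itself (an assumption).
    Optional<SysParaSketch> forNextLayer(int hopAllowance) {
        int remaining = timeout - hopAllowance;
        int layers = maxlayer - 1;
        if (layers < 1 || remaining <= 0) {
            return Optional.empty();
        }
        return Optional.of(new SysParaSketch(qid, remaining, (short) layers));
    }

    public static void main(String[] args) {
        Optional<SysParaSketch> p = Optional.of(new SysParaSketch(123, 15000, (short) 3));
        while (p.isPresent()) {
            System.out.println("Timeout = " + p.get().timeout + ", Max_layer = " + p.get().maxlayer);
            p = p.get().forNextLayer(5000);
        }
    }
}
```

With these assumptions a mediator that receives Max_layer = 1 answers from its own sources only and forwards nothing further.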
IDL design of our system
  IDL (Interface Definition Language) defines the exported interfaces of CORBA objects
  Our IDL design:
    The parameter type for handling the special cases
      struct SysPara {
        long qid;
        long timeout;
        short maxlayer;
      };
IDL design of our system
  A mediator may send queries to databases or to other mediators
  Hence, we want databases and mediators to be very similar objects
  Both of them therefore implement the QueryEngine interface
    interface QueryEngine {
      string query(in SysPara para, in string xmlquery);
    };
  [Figure: QueryEngine is implemented by QueryDB and QueryMed]
QueryDB Object
  Connects directly to the data source
  The caller calls query()
  It takes the query statement parameter and queries the related data source
  Returns the answer in XML string stream format
QueryMed Object
  Invoked in the same way, through query()
  Besides QueryEngine, it implements another interface, QueryMediator
    public interface QueryMediator {
      public QueryEngine[] qelist();
      public void qelist(QueryEngine[] arg);
      public void append_result(String res);
    }
  qelist holds the list of QueryEngine objects, i.e. QueryMed or QueryDB objects, that the mediator will call.
  It starts a thread for each target QueryEngine object, and each thread calls append_result() to integrate the results from the various sources
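The fan-out described for QueryMed can be sketched without CORBA as follows. This is an illustration under simplifying assumptions, not the presented implementation: SimpleQueryEngine is a local stand-in for QueryEngine that takes only a timeout in milliseconds instead of SysPara, the synchronized append plays the role of append_result(), and the <results> wrapper element is invented for the example.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

// Simplified local stand-in for the CORBA QueryEngine interface (an assumption).
interface SimpleQueryEngine {
    String query(long timeoutMs, String xmlQuery);
}

// Hypothetical sketch of a mediator that queries all of its targets in parallel.
class QueryMedSketch implements SimpleQueryEngine {
    private final List<SimpleQueryEngine> qelist;

    QueryMedSketch(List<SimpleQueryEngine> qelist) {
        this.qelist = qelist;
    }

    @Override
    public String query(long timeoutMs, String xmlQuery) {
        StringBuilder results = new StringBuilder(); // shared; guarded by synchronized
        List<Thread> workers = new ArrayList<>();
        for (SimpleQueryEngine target : qelist) {
            Thread worker = new Thread(() -> {
                String res = target.query(timeoutMs, xmlQuery);
                synchronized (results) {
                    results.append(res); // plays the role of append_result()
                }
            });
            worker.setDaemon(true); // a hung source must not keep the JVM alive
            worker.start();
            workers.add(worker);
        }
        long deadline = System.currentTimeMillis() + timeoutMs;
        for (Thread worker : workers) {
            long left = deadline - System.currentTimeMillis();
            if (left <= 0) {
                break; // time is up: return whatever has arrived so far
            }
            try {
                worker.join(left);
            } catch (InterruptedException e) {
                Thread.currentThread().interrupt();
                break;
            }
        }
        synchronized (results) {
            return "<results>" + results + "</results>";
        }
    }

    public static void main(String[] args) {
        SimpleQueryEngine db1 = (t, q) -> "<news>from source 1</news>";
        SimpleQueryEngine db2 = (t, q) -> "<news>from source 2</news>";
        System.out.println(new QueryMedSketch(Arrays.asList(db1, db2)).query(2000, "query"));
    }
}
```

Because a mediator is itself a SimpleQueryEngine, it can appear in another mediator's list, which mirrors the way QueryMed and QueryDB share one interface. The order of the results depends on which source answers first.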
Problems with CORBA and firewalls
  We cannot achieve a worldwide query system because of firewalls.
  CORBA uses the Internet Inter-ORB Protocol (IIOP) for communication.
  The message body of IIOP is encoded in Common Data Representation (CDR), which translates IDL data types into a byte-ordering-independent octet string.
  Firewalls cannot decode the IIOP message body at the application level.
  In contrast, application-level gateways exist for telnet, FTP, and HTTP
CORBA firewalls
  Some firewalls are dedicated to CORBA communication
    E.g. IONA Orbix WonderWall and VisiBroker Gatekeeper
  They have limitations:
    Vendor dependent: Orbix firewalls require client and server objects developed with Orbix
    Not all CORBA features can be used: callbacks may not work
    Not commonly used
    Extra purchase, etc.
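CORBA and firewalls
  IIOP cannot pass through the firewall, but HTTP can
  [Figure: two CORBA enclaves separated by a firewall. IIOP is used inside each enclave; between the enclaves the call travels as XML over HTTP to a servlet]
  A truly worldwide CORBA system can be achieved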
Solution to firewall problem
  HTTPGateway also implements QueryEngine.
  HTTPGateway is a virtual query engine that forwards the query to the target system
  [Figure: QueryEngine is implemented by QueryDB, QueryMed and HTTPGateway]
  [Figure: simulated call. In one CORBA enclave, Mediator M calls HTTP Gateway H; H crosses the firewall to reach Servlet Mediator SM in the other CORBA enclave, which calls the DB object]
Sample XML message in parameter passing
  <request object="QueryMed" method="query" return="string">
    <parameter order="1" name="para">
      <SysPara><qid>398498241824033984092</qid><maxlayer>4</maxlayer><timeout>2000</timeout></SysPara>
    </parameter>
    <parameter order="2" name="QueryStatement">
      <string><news> $B </news> in "database.xml"<keyword>satellite</keyword> in $B construct <result> $B </result></string>
    </parameter>
  </request>
DTD of the message
  <!DOCTYPE parapassing [
    <!ELEMENT request (parameter*)>
    <!ATTLIST request object CDATA #REQUIRED>
    <!ATTLIST request method CDATA #REQUIRED>
    <!ATTLIST request return CDATA #REQUIRED>
    <!ELEMENT parameter (SysPara | string)>
    <!ATTLIST parameter order CDATA #REQUIRED>
    <!ELEMENT SysPara (qid,maxlayer,timeout)>
    <!ELEMENT qid (#PCDATA)>
    <!ELEMENT maxlayer (#PCDATA)>
    <!ELEMENT timeout (#PCDATA)>
    <!ELEMENT string (#PCDATA)>
  ]>
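A hedged sketch of how a gateway might serialise a query() call into the request format shown above, and how the receiving servlet might read the values back. The class RequestXml and its methods are hypothetical, and the HTTP transport is left out. Unlike the sample message, the sketch escapes the query text so that the request stays well-formed for any query string; that choice is an assumption, not something the slides state.

```java
import java.io.StringReader;
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.Document;
import org.xml.sax.InputSource;

// Hypothetical helpers; element and attribute names follow the sample message.
class RequestXml {
    // Escapes the characters that are special in XML text content.
    static String escape(String s) {
        return s.replace("&", "&amp;").replace("<", "&lt;").replace(">", "&gt;");
    }

    // Gateway side: serialise one query() call. qid is kept as a String here
    // because the qid in the sample message is too long for a Java long.
    static String build(String qid, int maxlayer, int timeout, String xmlQuery) {
        return "<request object=\"QueryMed\" method=\"query\" return=\"string\">"
                + "<parameter order=\"1\" name=\"para\"><SysPara>"
                + "<qid>" + escape(qid) + "</qid>"
                + "<maxlayer>" + maxlayer + "</maxlayer>"
                + "<timeout>" + timeout + "</timeout>"
                + "</SysPara></parameter>"
                + "<parameter order=\"2\" name=\"QueryStatement\">"
                + "<string>" + escape(xmlQuery) + "</string></parameter>"
                + "</request>";
    }

    // Servlet side: text of the first element named `tag` in the request.
    static String read(String message, String tag) {
        try {
            Document doc = DocumentBuilderFactory.newInstance().newDocumentBuilder()
                    .parse(new InputSource(new StringReader(message)));
            return doc.getElementsByTagName(tag).item(0).getTextContent();
        } catch (Exception e) {
            throw new IllegalArgumentException("malformed request or missing <" + tag + ">", e);
        }
    }

    public static void main(String[] args) {
        String query = "<news> $B </news> in \"database.xml\", "
                + "<keyword>satellite</keyword> in $B construct <result> $B </result>";
        String message = build("398498241824033984092", 4, 2000, query);
        System.out.println(message);
        System.out.println(read(message, "string"));
    }
}
```

The parser undoes the escaping, so read(message, "string") gives back the query text exactly as it was passed to build().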
Evaluation
  ADVANTAGES:
    It overcomes the CORBA IIOP vs. firewall problem by using HTTP calls, XML, and servlets.
    Security can still be maintained: external CORBA objects can call only the objects that are combined with a servlet.
    Both primitive data types, like String, and complex classes, like SysPara, can be represented well in XML for parameter passing.
Evaluation
    Clients and servers are not required to be CORBA implementations.
    Internal CORBA objects do not notice any difference between calling an external object and calling an internal one.
    By sharing the same XML and IDL standards, we can easily achieve a worldwide query system.
Evaluation
  WEAKNESS:
    Using HTTP calls to simulate IIOP is slower than using firewalls dedicated to CORBA, because extra work is needed to initialize the servlet, convert the parameters to XML, invoke the servlet, etc.
Future Work
  Generalize the IIOP simulation
    For all kinds of objects, parameter types, return types, etc.
    Direct code generation from IDL
  Design a mechanism that supports CALLBACKS
    Once callbacks can be simulated, we can enhance the features of our mediator system
Conclusion
  Using mediators to query distributed sources
  How to use CORBA and XML to integrate the system
  Cooperation between XML and CORBA in the simulation of IIOP calls
Demonstration
  Aim: to show that the mediator system can:
    make queries to various sources in parallel
    go beyond firewalls
  [Figure: demonstration setup. User Interface (SHB 1027); Servlet Mediator, Data Source 1 and HTTP Gateway (H111, Kuomao Hall); across the FIREWALL, Servlet Mediator and Data Source 2 (SHB 913, pc90003)]
Q & A Session
  Questions and comments are welcome
<appreciation>Thank You</appreciation>