Web Service Clustering: Building Homogenous Service Communities
Wei Liu, Wilson Wong
2 SOCASE@AAMAS2008
22-Jun-05
Outline
• A brief introduction to
  – Web services
  – Text mining
• Web Service Clustering
  – The motivation
  – The challenges
  – The process
  – The results
What are Web Services
• It is software designed to be used by other software via Internet protocols and formats. (Forrester)
• Web services are self-describing components that can discover and engage other web services or applications to complete complex tasks over the Internet. (Sun Microsystems, Inc)
• Web Services are loosely coupled software components delivered over the Internet via standards-based technologies like XML and SOAP. (Gartner)
• Self-describing, self-contained, modular unit of application logic that provides some business functionality to other applications through an Internet connection… (UDDI.org)
• Web services are Internet-based, modular applications that perform a specific business task and conform to a particular technical format. (IBM)
• A web service is application logic that is programmatically available, exposed using the Internet. (Microsoft)
The Web Service Triangle
• Web services are applications accessible via the Web to be consumed by clients.
• Clients of a Web service are usually referred to as service requesters.
• Standard technologies that support Web service applications are:
  – Web Services Description Language (WSDL), from the W3C
  – Simple Object Access Protocol (SOAP), from the W3C
  – Universal Description, Discovery and Integration (UDDI), from OASIS
What is Web Service Discovery
• Broadly defined as “the act of locating a machine-processable description of a web service that may have been unknown and that meets certain functional criteria”
• Originated from the agent matchmaking paradigm (middle agents and brokers), and later moved on to UDDI [2]
• The discovery mechanisms differ according to which language is used to describe the service (WSDL or OWL-S)
[2] Garofalakis, J., Panagis, Y., Sakkopoulos, E., Tsakalidis, A.: Web service discovery mechanisms: Looking for a needle in a haystack? In: International Workshop on Web Engineering, Hypermedia Development and Web Engineering Principles and Techniques: Put them in use, in conjunction with ACM Hypertext, Santa Cruz (2004)
Ill-fated Registry-Based Structure
• Static and not scalable
  – The registry can become a bottleneck
  – New services have to be added through a laborious process to ensure “correct” categorisation, which deters people from using it
• Search is keyword based
  – Ontology-supported semantic search is only available for agent and Semantic Web services
What we propose
• Make use of the WSDL files collected by Google
• Automatically cluster these files into functionally similar groups using text mining methods
  – linguistic analysis and statistical techniques combined
• The resulting clusters will help service discovery by reducing the size of the haystacks
Challenges
• Traditional Information Retrieval and Document Clustering techniques cannot be borrowed directly, because of the following observations:
  – Web service files do not usually contain a sufficiently large number of words for use as index terms or features.
  – Moreover, the small number of words present in the web service files are erratic and unreliable.
  – Related web pages that describe the WSDL service were also considered, using the Google API to discover web pages that refer to or cite a WSDL file. However, most of the WSDL files do not have related web pages that provide hyperlinks to them. The few that do are typically examples teaching how to program in a service-oriented paradigm.
• These observations concur with [9].
System Architecture
Collected WSDL File
Obtaining Content and Context
• Content
  – Parse the WSDL file for service descriptions in natural language
• Context
  – Relate documents by looking at parent/grandparent directories
  – Tokenising, stemming
  – Remove function words*
  – Remove programming terms*
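The content step can be sketched roughly as follows. This is a minimal illustration, not the authors' implementation: it assumes WSDL 1.1 files whose natural-language descriptions sit in `documentation` elements, the `GetVerse` operation in the sample is made up, the tiny `FUNCTION_WORDS` stop list is a hypothetical stand-in for the statistical function-word filter, and stemming is omitted.

```python
import re
import xml.etree.ElementTree as ET

WSDL_NS = "http://schemas.xmlsoap.org/wsdl/"  # WSDL 1.1 namespace

# Hypothetical stop list; the slides use a statistical filter instead.
FUNCTION_WORDS = {"a", "an", "the", "of", "to", "for", "and", "this", "in"}

def extract_content(wsdl_text):
    """Return the text of every WSDL <documentation> element, in document order."""
    root = ET.fromstring(wsdl_text)
    docs = root.iter("{%s}documentation" % WSDL_NS)
    return " ".join("".join(d.itertext()).strip() for d in docs)

def tokenise(text):
    """Lowercase, split on non-letters, and drop the stand-in function words."""
    tokens = re.findall(r"[a-z]+", text.lower())
    return [t for t in tokens if t not in FUNCTION_WORDS]

# Made-up sample file; only the service name comes from the slides.
SAMPLE = """<definitions xmlns="http://schemas.xmlsoap.org/wsdl/" name="QuranService">
  <documentation>Returns verses of the Quran</documentation>
  <portType name="QuranPort">
    <operation name="GetVerse">
      <documentation>Look up a verse by chapter and number</documentation>
    </operation>
  </portType>
</definitions>"""

content = extract_content(SAMPLE)
tokens = tokenise(content)
print(tokens)
```

Files with no `documentation` elements yield an empty string, which matches the challenge noted earlier: many WSDL files carry very few usable words.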
Content Words vs. Function Words
• One of the properties of content words is that they tend to “clump”, that is, to re-occur once they have appeared [10].
• On the other hand, occurrences of function words tend to be independent of one another.
• Very often, this contrast can be captured through the inability of the Poisson distribution to model the occurrences of content words in documents [11].
• In other words, unlike content words, function words tend to be Poisson distributed.
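One common way to turn this contrast into a test is sketched below; it is an illustration under stated assumptions, not necessarily the exact statistic used by the authors. With a single-parameter Poisson model, a word with collection frequency cf over N documents has rate λ = cf / N and is expected to appear in N(1 − e^(−λ)) documents. A word that clumps appears in far fewer documents than that. The toy collection is made up.

```python
import math
from collections import Counter

def poisson_fit(docs):
    """Map each word to observed_df / expected_df under a one-parameter Poisson.

    Ratios near or above 1 mean the Poisson model is not contradicted
    (function-word-like); ratios well below 1 mean the word clumps into
    fewer documents than predicted (content-word-like).
    """
    n = len(docs)
    cf = Counter()  # total occurrences across the collection
    df = Counter()  # number of documents containing the word
    for doc in docs:
        counts = Counter(doc)
        cf.update(counts)
        df.update(counts.keys())
    ratios = {}
    for word, total in cf.items():
        lam = total / n                        # the single Poisson parameter
        expected_df = n * (1 - math.exp(-lam))
        ratios[word] = df[word] / expected_df
    return ratios

# Hypothetical toy collection of tokenised documents.
DOCS = [
    "the verse of the quran verse verse".split(),
    "the price of the stock".split(),
    "the weather of the city".split(),
    "the list of the items".split(),
]

ratios = poisson_fit(DOCS)
for word in ("the", "verse"):
    print(word, round(ratios[word], 2))
```

Words that occur only once cannot show clumping, so a test of this kind says little about rare words; any cut-off on the ratio is a tuning choice rather than something given in the slides.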
Remove Function Words
A segment of the output during content-word recognition performed on the word tokens in the web service context set for the service QuranService.
(single-parameter Poisson distribution)
Remove Programming Terms
Programming-term clusters are identified using our Tree-Traversing Ants featureless term clustering [12], which is based on the Normalised Google Distance.
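For reference, the Normalised Google Distance between two terms can be computed from search hit counts as sketched here. The formula is the standard published definition; the hit counts and index size below are invented for illustration, and the Tree-Traversing Ants clustering itself is not shown.

```python
import math

def ngd(fx, fy, fxy, n):
    """Normalised Google Distance from page counts.

    fx, fy: number of pages containing each term on its own
    fxy:    number of pages containing both terms
    n:      total number of pages indexed (a normalising constant)
    """
    if fxy == 0:
        return math.inf  # the terms never co-occur
    lx, ly, lxy = math.log(fx), math.log(fy), math.log(fxy)
    return (max(lx, ly) - lxy) / (math.log(n) - min(lx, ly))

# Hypothetical counts, not real search results.
N = 10_000_000_000
close = ngd(fx=2_000_000, fy=1_500_000, fxy=900_000, n=N)  # frequent co-occurrence
far = ngd(fx=2_000_000, fy=1_500_000, fxy=1_000, n=N)      # rare co-occurrence
print(round(close, 3), round(far, 3))
```

Terms that nearly always appear together score close to 0, so programming terms such as those on the next slide would be expected to sit near one another and far from domain words.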
Clustering Results for QuranService
A small oracle: runtime, webservice, developer, module, data
The service host and the service name
• The service host is the second-level and top-level portion of the domain name (i.e. a segment of the authority part of the URI) of the host containing the WSDL file.
• The service name is the name of the WSDL file.
• As one may note, the four features (content, context, service host and service name) are by no means the best or the only ones available for describing a web service.
• However, they are the most accessible and feasible ones to use in this case.
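Both features can be read straight off the URL of a WSDL file, as in this small sketch. It makes two assumptions not stated in the slides: the host is taken naively as the last two labels of the domain name (which is wrong for suffixes such as `co.uk`), and the file extension is dropped from the service name. The URLs are made up.

```python
import posixpath
from urllib.parse import urlparse

def service_host(url):
    """Second-level plus top-level domain of the host serving the WSDL file."""
    host = urlparse(url).hostname or ""   # hostname drops any port number
    return ".".join(host.split(".")[-2:])

def service_name(url):
    """File name of the WSDL file, without its extension (an assumption)."""
    filename = posixpath.basename(urlparse(url).path)
    name, _extension = posixpath.splitext(filename)
    return name

# Hypothetical URL, used only for illustration.
URL = "http://www.example.com/services/QuranService.wsdl"
print(service_host(URL), service_name(URL))
```

Because only the path is inspected, a URL of the form `.../Service.asmx?WSDL` yields `Service` as the name.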
Combining the four features
Web Service Clusters
Conclusions
• The paper presented techniques for the automatic discovery of web services with similar functionality.
• We term such service clusters homogeneous service communities. If the crawling and clustering processes run continuously, as in a typical search engine, the approach has the potential to enable self-organisation of the Web as proposed in [3].
• The proposed web service clustering approach assumes no registries and can automatically reduce the search space of web services. It can therefore be seen as a precursor to Web Service Discovery.
• This paper gathers real service description files from the Web instead of working on hypothetical examples.
• The resulting clusters provide not only a useful glimpse of what services are out there, but also an insight into the types of technologies that have proliferated in this area.
The Web Service “Hype”
• Web services have become a new trend for doing business online.
• U.S.: 65% of companies are working or will work on Web service projects.
• 2003: $3 billion; 2008: $15.8 billion
• Web services help in e-business and e-commerce development.
“Just as the Web revolutionized how users talk to applications, XML transforms how applications talk to each other.” (Bill Gates)
“Web services are expected to revolutionize our life in much the same way as the Internet has during the past decade or so.” (Gartner)
Why IBM, Microsoft and SAP stopped UBR
• The UDDI Business Registry (UBR) was part of the UDDI Project announced in September 2000.
• The project goals were to define a set of specifications to enable description, discovery and integration, and to prove interoperability through operational experience.
• The UBR ran for 5 years, demonstrating live, industrial-strength UDDI implementations managing over 50,000 replicated entries.
Is Popfly service-oriented?
Thank You