Implementing the Genetic Algorithm in XSLT: PoC
Proof of Concept: SOA Application Composition using the Genetic Algorithm
Jim Fuller
http://www.ruminate.co.uk http://www.slgchorus.com
Introduction
• Technical Director / Internet Services Manager for Stuart Lawrence Group companies
• on-IDLE ltd sponsored the 1st XSLT conference in the world, XSLT UK 2001, along with Dave Pawson
• co-founder of the EXSLT effort, along with Dave Pawson, Jeni Tennison, Uche Ogbuji, et al.
• Technical reviewer and author for the now defunct WROX, on books dealing with XML, XSLT and web services
Lecture Overview
How we use WS today
XSLT and S-expressions
Genetic Algorithm refresher
Early Genetic Experiments with XSLT
Application composition using Genetic Algorithm
Conclusions
How we use WS in today's applications
• Indirectly consume web services via WSDL / UDDI and subsequent generation of stub code
• Direct Consumption of SOAP via manual crafting of HTTP Request headers + SOAP envelope
• Primary use cases: Integration and Interoperability
• Emerging use cases: orchestration, higher level business processes, and automated application composition
MVC type architectures are popular
[Diagram: Client Tier, Presentation Tier, Business Tier, Integration Tier, and Resource Tier (Data Repository, XML Binding, Persistence), with Model, View, and Controller components and internal and external web services]
WS MVC with the Browser
Controller: EventHandler, SOAPEventHandler
Model: The Model receives events from the Controller and updates itself, sending data which gets transformed by our view components.
View: IE web service client-side processing, XSLT templates, CSS, Global.xml, Global.xsl
[Diagram: Internet Explorer client, HTTP GET / HTTP POST request, HTTP response, internal web services, external web services]
SOA Anchor
• Stability via web service server: BEA Weblogic, IBM Websphere, Systinet WASP, .NET, ColdFusionMX
• versioning control of web services
• Easy to deploy same web service through multiple transports
• Smooth out learning curve for many of the underlying XML technologies ( SAML )
• security integration with underlying PKI
• Instant solution to some problems
• Deploy existing code as web service, no need for ‘special’ web service code embedded in your own code
Bazaar not opened yet
• Currently developers ask how can *I* use them in *my* applications.
• Web services live behind the firewall and solve integration problems; extraprise.
• Google, Amazon and Microsoft are all examples of monolithic web services.
• Many deployed web services are highly specific to a certain problem domain.
• Who will bind a specific public web service with their precious application ? (Amazon in research pane).
The world of ‘millions of web services’
• The question is not ‘how will a developer find a web service?’ but how will a machine find and use the right web service ?
• How will the developer/machine know it’s the right one? That it’s stable, the correct version, and can be trusted…
• The promise of SOA is real time application composition generating applications or components, based on a set of general evolving criteria
Automatic application composition methods
• One approach, not linked to any problem domain is to use the Genetic Algorithm…though there are obvious constraints using these methods
Random search of the problem domain
AI / intelligent software agent methods
Genetic Algorithm Refresher
• The Genetic Algorithm ( GA ) is a model of the evolution of a population of artificial individuals.
• Each individual is a chromosome which contains discrete units of information; in computers this can be a string, binary numbers, etc.
• With each generation, the fittest individuals are selected for genetic operations to create the new generation
• The driving force behind the search for new and better solutions is the retention and combination of good partial solutions to a problem
Abridged Genetic Algorithm
• The Fundamental Theorem of Genetic Algorithms
M(H, t): number of individuals in population 't' with the schema 'H'.
f(H): average fitness of the individuals with the schema 'H'.
F: average fitness of the entire population.
p1: probability of the schema being destroyed by crossover.
p2: probability of the schema being destroyed by mutation.
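The inequality itself did not survive in the transcript. As a sketch in the slide's own symbols, and assuming M(H, t+1) denotes the expected number of individuals carrying schema H in the next generation and that the two destruction probabilities are small enough to be treated as additive, the theorem is commonly stated as:

```latex
M(H, t+1) \;\ge\; M(H, t) \cdot \frac{f(H)}{F} \cdot \left(1 - p_1 - p_2\right)
```

Read this way, a schema whose average fitness f(H) is above the population average F, and which is unlikely to be broken by crossover or mutation, receives a growing share of the population from one generation to the next.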
GA operations
• Reproduction: An individual is perfectly replicated to a new population
• Crossover ( Recombination ): Parental material is recombined to create offspring to join new population
• Mutation: random changes
• Permutation: reordering
• Editing: evaluation to a terminal
• Encapsulation: single indivisible function
• Decimation: removal of individuals
Genetic Programming Process
Step 0. Create a random initial population of individuals
Step 1. Evaluate the fitness of each individual
Step 2. Select individuals according to their fitness, which will participate in generating offspring (moms+dads)
Step 3. Apply primary and secondary genetic operations to generate new offspring population
Step 4. Repeat steps 1, 2, 3 to generate X number of generations
Step 5. Choose best fit individual
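The steps above can be sketched as a small loop. This is a toy illustration only, not the prototype from the talk: the digit-list encoding, the `evolve` name, truncation selection (keeping the better half) and the 10% mutation rate are all assumptions chosen to keep the example short.

```python
import random

def evolve(target, pop_size=50, generations=40, seed=0):
    """Toy GA following steps 0-5: evolve a list of digits toward `target`."""
    rng = random.Random(seed)
    n = len(target)

    def fitness(ind):
        # Standardized fitness: number of mismatched positions, 0 is perfect.
        return sum(1 for a, b in zip(ind, target) if a != b)

    # Step 0: create a random initial population.
    population = [[rng.randrange(10) for _ in range(n)] for _ in range(pop_size)]

    for _ in range(generations):                  # Step 4: repeat steps 1-3
        ranked = sorted(population, key=fitness)  # Step 1: evaluate fitness
        parents = ranked[: pop_size // 2]         # Step 2: select (assumed truncation)
        offspring = []
        while len(offspring) < pop_size:          # Step 3: genetic operations
            mom, dad = rng.sample(parents, 2)
            cut = rng.randrange(1, n)
            child = mom[:cut] + dad[cut:]         # primary operation: crossover
            if rng.random() < 0.1:                # secondary operation: mutation
                child[rng.randrange(n)] = rng.randrange(10)
            offspring.append(child)
        population = offspring

    best = min(population, key=fitness)           # Step 5: best fit individual
    return best, fitness(best)
```

Calling `evolve([1, 2, 3, 4, 5, 6])` returns the best individual of the last generation together with its mismatch count; a fixed `seed` makes runs repeatable.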
Symbolic expressions and XSLT
• XSLT List questions….I originally wanted to solve ‘I want to transform source xml to target xml using XSLT’. Could use generic templates or some other automated process.
• Vestigial Lisp memories: s-expressions are similar to XSLT / XML, with data and program in one
• XSLT guru David Carlisle’s presence at XSLT UK 2001 opened my eyes to functional programming
• My work with EXSLT defined the limitations of XSLT…which led me to build frameworks to implement complex MVC architectures
Simplest Lisp Example
(+ (* 2 3) 4) evaluates to 10, and the symbolic expression looks like:
[Tree diagram: root +, with children * and 4; * has children 2 and 3]
Hierarchical computer programs are more expressive than manipulating linear strings
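To make the tree reading of the Lisp example concrete, here is a minimal sketch of an evaluator. The nested-tuple encoding and the `evaluate` name are hypothetical choices for illustration, not something from the talk.

```python
import operator

# Hypothetical encoding: an expression is a number or a tuple (op, arg1, arg2, ...).
OPS = {"+": operator.add, "*": operator.mul}

def evaluate(expr):
    """Recursively evaluate a nested-tuple s-expression."""
    if not isinstance(expr, tuple):
        return expr                        # terminal: a plain number
    op, *args = expr
    values = [evaluate(a) for a in args]   # evaluate the subtrees first
    result = values[0]
    for value in values[1:]:
        result = OPS[op](result, value)    # fold the operator over the arguments
    return result

tree = ("+", ("*", 2, 3), 4)               # (+ (* 2 3) 4)
print(evaluate(tree))                      # prints 10
```

Because the program is itself a tree, genetic operations can work on subtrees such as `("*", 2, 3)` rather than on positions in a flat string.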
XSLT stylesheets are also general hierarchical computer programs
<xsl:stylesheet xmlns:xsl="http://www.w3.org/1999/XSL/Transform" version="2.0">
  <xsl:template match="a">
    <d/>
    <c/>
  </xsl:template>
</xsl:stylesheet>
[Tree diagram: <xsl:stylesheet/> contains <xsl:template/>, which contains <d/> and <c/>]
There are some differences, e.g. there are a variety of node types within XML
Problem definition
• Create a GA process that will discover an XSLT program which, given a source.xml, generates a target.xml
• Prototype uses ASF ANT to control the whole process
• Michael Kay’s excellent SAXON XSLT processor: XSLT 2.0 simplified the situation by removing the need to deal with RTFs and node-set usage
• Initially create a simple problem, e.g. that of transforming a source xml into a copy of itself
Source XML
<a>
<b>
<c>
<d></d>
</c>
</b>
</a>
Target XML
<a>
<b>
<c>
<d></d>
</c>
</b>
</a>
Early Genetic Experiment
Step 0. Randomly generate initial population of xslt documents
Step 1. Evaluate fitness via xml diff of target.xml to result.xml
Step 2. Select individuals according to their fitness which can be used by step 3
Step 3. Apply primary and secondary genetic operations to generate new offspring population from selected individuals
Step 4. Repeat steps 1, 2, 3 to generate X number of generations
Step 5. Choose best fit individual of last generation
Objective: Generate an xslt program that transforms source xml into result xml which is equivalent to target xml
Terminal Set: <a/> <b/> <c/> <d/>
Function Set: Subset of xslt instructions
Fitness Cases: One fitness case
Raw fitness: Node count on xmldiff patch file difference between result xml and target xml
Standardized fitness: Same as raw fitness, approaching 0 is better fitness
Parameters: M=500, G=51
Step 0. Generate Initial Population
Used IBM xml generator: com.ibm.XMLGenerator.XMLGenerator to generate a population of xslt documents.
<?xml version='1.0'?>
<!-- Created by IBM XML Generator
numberLevels=10, maxRepeats=3, Random seed=1060890913224
fixedOdds=1, impliedOdds=4, defaultOdds=4
maxIdRefs=3, maxEntities=3, maxNMTokens=3
isExplicitRoot=true, root element name is 'xsl:stylesheet'
entOdds=1 Entity list:[]
doctype declaration?false
-->
<xsl:stylesheet xmlns:xsl="http://www.w3.org/1999/XSL/Transform" version="1.0">
  <xsl:template match="a">
    <c/>
    <d/>
  </xsl:template>
</xsl:stylesheet>
Avoid ‘early taxonomisation’
• No attributes
• No namespaces
• No schemas
• Xmlgenerator DTD defines allowable terminals and functions e.g. xsl:apply-templates, xsl:for-each, xsl:value-of, xsl:copy-of, xsl:choose, xsl:if, xsl:copy.
• Used <a>, <b>, <c>, <d> as the only allowable elements
Ant: generate_initial_population
<target name="generate_initial_population">
  <tempfile property="temp.file" prefix="xslt_" suffix=".xsl" destdir="${dirs.src}"/>
  <!-- defines start.TODAY, start.DSTAMP, start.TSTAMP properties //-->
  <tstamp prefix="start"/>
  <!-- current population number //-->
  <property name="xslt.build_number" value="${gen_count}"/>
  <!-- apply transforms using xslt //-->
  <java classname="com.ibm.XMLGenerator.XMLGenerator"
        fork="true"
        failonerror="false"
        output="${temp.file}">
    <arg value="${xslt.initial_dtd}"/>
    <arg value="-root"/>
    <arg value="${xslt.root_node}"/>
    <arg value="-nodecl"/>
    <arg value="-f"/>
    <arg value="1"/>
    <arg value="-l"/>
    <arg value="10"/>
  </java>
</target>
Step 1: Evaluate Fitness
[Diagram: XSLT generation produces an xslt individual; the transformation applies it to Source.xml to produce result.xml; xml diff compares result.xml with Target.xml to evaluate fitness]
Each individual is ranked, by testing xslt program against a source xml
Step 1. evaluate fitness (cont)
• Could have chosen multiple source and target xml to use in fitness assessment
• Output of transformation (result.xml) is xmldiff’ed with target xml
• I used an extremely simple xml diff tool that just output xml patch
• Converted Diff patch file into a number, which is the number of nodes contained in the patch file
TREEDIFFMERGE DIFFERENCE PATCH, each followed by the RESULT XML from the XSLT individual's transformation of SOURCE XML
<?xml version="1.0" encoding="UTF-8"?>
<diff xmlns:diff='http://diff.org'>
<diff:insert dst="1">
<a>
<b>
<c>
<d />
</c>
</b>
</a>
</diff:insert>
</diff>
<?xml version="1.0" encoding="UTF-8"?><root/>
<?xml version="1.0" encoding="UTF-8"?>
<diff xmlns:diff='http://diff.org'>
<diff:copy src="2" dst="1">
<diff:copy src="16" dst="2" />
</diff:copy>
</diff>
<?xml version="1.0" encoding="utf-8"?><root>
<a/><a><a><c/><c><a><d/></a><c/></c></a><b><b/><a/><c/><b>
<c>
<d/>
</c>
</b></b><a/></a><d><a><c/><a/><a/></a><c/></d><c/>
</root>
<?xml version="1.0" encoding="UTF-8"?>
<diff />
<?xml version="1.0" encoding="utf-8"?><root><a>
<b>
<c>
<d/>
</c>
</b>
</a></root>
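Turning a patch like the ones above into a number can be sketched as follows. This is a minimal sketch under assumptions: the slides only say "number of nodes contained in the patch file", so counting element nodes below the root (and ignoring text or attributes) is a guess at the rule, and the `raw_fitness` name is hypothetical.

```python
import xml.etree.ElementTree as ET

def raw_fitness(patch_xml: str) -> int:
    """Count the element nodes below the root of a diff patch document."""
    root = ET.fromstring(patch_xml)
    # root.iter() yields the root itself plus every descendant element,
    # so subtract one to leave the root <diff> out of the count.
    return sum(1 for _ in root.iter()) - 1

insert_patch = (
    "<diff xmlns:diff='http://diff.org'>"
    "<diff:insert dst='1'><a><b><c><d /></c></b></a></diff:insert>"
    "</diff>"
)
print(raw_fitness(insert_patch))  # prints 5
print(raw_fitness("<diff />"))    # prints 0
```

Under this rule an empty patch scores 0, which matches the standardized fitness convention that lower is better.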
XML Diff issues
• Most diff algorithms are based on a paper published in 1976 by J. W. Hunt and M. D. McIlroy, An Algorithm for Differential File Comparison
• XML is not just text: it has a structure, and text-based diff programs do not take this into account
• Simple example: <footie/> versus <footie></footie>; logically these are equal
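One way to see the <footie/> point in code is to canonicalize both documents before comparing them. This is only a sketch of the idea using the Python standard library (3.8 or later), not the tree diff tool used in the prototype; the `logically_equal` name is hypothetical.

```python
from xml.etree.ElementTree import canonicalize  # available in Python 3.8+

def logically_equal(xml_a: str, xml_b: str) -> bool:
    """Compare two XML documents after C14N canonicalization."""
    # Canonical XML writes empty elements as start/end tag pairs, so the
    # two spellings of an empty element serialize identically.
    return canonicalize(xml_a, strip_text=True) == canonicalize(xml_b, strip_text=True)

print("<footie/>" == "<footie></footie>")                 # prints False
print(logically_equal("<footie/>", "<footie></footie>"))  # prints True
```

A plain string comparison treats the two forms as different, while the canonical forms match.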
Ant: transform_src
<target name="transform_src">
  <java classname="net.sf.saxon.Transform"
        fork="true"
        failonerror="false"
        output="${current_xslt_file}.xml">
    <arg value="${source_xml}"/>
    <arg value="${current_xslt_file}"/>
  </java>
</target>
Ant: fitness_src
<target name="fitness_src">
  <java classname="TreeDiffMerge"
        fork="true"
        failonerror="false"
        output="${current_xslt_file}.fitness.xml">
    <arg value="-d"/>
    <arg value="${current_xslt_file}"/>
    <arg value="${target_xml}"/>
  </java>
</target>
Step 2. Select individuals
• Probabilistic selection to choose which individuals participate in genetic operation
Selected XSLT population
Select individuals for genetic operations, based on their fitness
A word on fitness
• Raw fitness: is the natural representation in terms of the specific problem
• Standardized fitness: lower the better
• Adjusted fitness: lies between 0-1
• Normalized fitness: lies between 0-1 with sum of fitness values = 1
• In our case the lower the number of ‘different’ nodes the better, use standardized fitness
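The fitness variants above can be related with a few lines of code. This sketch assumes the definitions from Koza's Genetic Programming (adjusted = 1 / (1 + standardized), normalized = adjusted divided by the population total); the slides do not spell out the formulas, and the function names are hypothetical.

```python
import random

def adjusted(standardized):
    """Adjusted fitness in (0, 1]; 1.0 means a perfect individual."""
    return [1.0 / (1.0 + s) for s in standardized]

def normalized(standardized):
    """Normalized fitness: adjusted values rescaled so they sum to 1."""
    adj = adjusted(standardized)
    total = sum(adj)
    return [a / total for a in adj]

def select(population, standardized, k, rng=random):
    """Fitness-proportionate selection of k individuals (with replacement)."""
    return rng.choices(population, weights=normalized(standardized), k=k)
```

For patch node counts of 0, 1 and 3, `adjusted` gives 1.0, 0.5 and 0.25, so the individual with the empty patch is the most likely to be drawn by `select`.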
Step 3. Primary Genetic Operations
Selected XSLT population
New generation
Reproduction
Individual reproduced into new generation
Step 3. Primary Genetic Operations
Selected XSLT population
New generation
‘Mom’
‘Dad’
Creates 2 offspring
Crossover ( Recombination )
Select parents then crossover creates 2 offspring
Step 3. Primary Genetic Operations
Crossover ( Recombination )
‘Dad XSLT’ ‘Mom XSLT’
‘offspring xslt’
‘offspring xslt’
New generation
Swap nodes between selected parent xslt
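Swapping nodes between two parent documents can be sketched with the standard library's ElementTree. In the talk the genetic operations were themselves written in XSLT; this Python version is only an illustration, and it ignores any grammar constraints a real stylesheet would impose on where a node may appear. The `crossover` and `_slots` names are hypothetical.

```python
import copy
import random
import xml.etree.ElementTree as ET

def _slots(root):
    """List (parent, index) pairs for every non-root element in the tree."""
    return [(parent, i) for parent in root.iter() for i in range(len(parent))]

def crossover(mom, dad, rng=random):
    """Return two offspring made by swapping one random subtree between copies."""
    child1, child2 = copy.deepcopy(mom), copy.deepcopy(dad)  # parents stay intact
    slots1, slots2 = _slots(child1), _slots(child2)
    if not slots1 or not slots2:
        return child1, child2              # nothing below the root to swap
    p1, i1 = rng.choice(slots1)
    p2, i2 = rng.choice(slots2)
    p1[i1], p2[i2] = p2[i2], p1[i1]        # exchange the chosen subtrees
    return child1, child2
```

Each call yields two offspring, and the total number of nodes across the pair is unchanged because material is only exchanged, never created.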
Step 3. Secondary Genetic Operations
• Mutation: is a form of random crossover
• Permutation: Reorganize nodes
• Editing: evaluate a set of nodes
• Encapsulation: takes a branch and replaces with 1 indivisible node
• Decimation: removes individual based on domain specific criteria
Step 3. Secondary Genetic Operations
mutation
‘selected XSLT’
Pick a node and randomly mutate
Completely new set of instructions
‘offspring xslt’
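Picking a node and replacing it with freshly generated material can be sketched as below. This is an assumption-laden illustration: the prototype generated XSLT instructions from a DTD, whereas this sketch only builds random trees from the four terminal elements, and `random_subtree` and `mutate` are hypothetical names.

```python
import copy
import random
import xml.etree.ElementTree as ET

TERMINALS = ["a", "b", "c", "d"]   # the element names allowed in the experiment

def random_subtree(rng, depth=2):
    """Build a small random tree of terminal elements (hypothetical generator)."""
    node = ET.Element(rng.choice(TERMINALS))
    if depth > 0:
        for _ in range(rng.randint(0, 2)):
            node.append(random_subtree(rng, depth - 1))
    return node

def mutate(individual, rng=random):
    """Return a copy with one randomly chosen non-root node replaced."""
    child = copy.deepcopy(individual)      # leave the selected individual intact
    slots = [(p, i) for p in child.iter() for i in range(len(p))]
    if slots:
        parent, index = rng.choice(slots)
        parent[index] = random_subtree(rng)  # completely new set of nodes
    return child
```

The root element is never replaced, so the offspring keeps the same document element as the selected individual.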
Step 3. Secondary Genetic Operations
permutation
‘selected XSLT’ ‘offspring xslt’
Permuted node order
Step 3. Secondary Genetic Operations
editing
‘selected XSLT’ ‘offspring xslt’
Replace node with evaluated expression
Step 3. Secondary Genetic Operations
encapsulation
‘selected XSLT’ ‘define new function’
Identify useful subtrees and encapsulate by defining new function
‘XSLT’
Step 3. Secondary Genetic Operations
decimation
Identify very poor fitness individuals and remove from population
<xsl:stylesheet xmlns:xsl="http://www.w3.org/1999/XSL/Transform" version="1.0"></xsl:stylesheet>
<xsl:stylesheet/>
Ant: select, perform, and generate new population
<target name="select_crossover_population">
  ….xslt transformation selected crossover using xslt
</target>
<target name="select_reproduction_population">
  ….xslt transformation selected reproduction using xslt
</target>
<target name="perform_genetic_operation">
  ….genetic operations were performed using xslt
</target>
<target name="generate_new_generation">
  …new individuals were copied over to a new directory
</target>
Step 4. Generate X populations
• M = 500, G = 51
• Set initial genetic operation probabilities:
  90% crossover on selected individuals
  10% reproduction on selected individuals
  0% secondary operations on selected individuals
• Define termination criteria if you want an ongoing process until a desired fitness is obtained.
• Iterate until done
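The operation probabilities and the termination criteria can be sketched as two small helpers. The percentages and G = 51 come from the slide; the function names, the single "mutation" entry standing in for all secondary operations, and a desired fitness of 0 are assumptions made for illustration.

```python
import random

# Probabilities from the slide: 90% crossover, 10% reproduction, 0% secondary.
OPERATION_PROBABILITIES = {
    "crossover": 0.90,
    "reproduction": 0.10,
    "mutation": 0.0,   # stands in for all secondary operations
}

def plan_operations(count, rng=random):
    """Draw `count` genetic operations according to the configured odds."""
    names = list(OPERATION_PROBABILITIES)
    weights = list(OPERATION_PROBABILITIES.values())
    return rng.choices(names, weights=weights, k=count)

def should_stop(generation, best_fitness, max_generations=51, desired_fitness=0):
    """Stop after G generations, or earlier once the desired fitness is reached."""
    return generation >= max_generations or best_fitness <= desired_fitness
```

Raising the "mutation" weight later only requires changing the table, which mirrors how the prototype exposes these values as Ant properties.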
Ant properties
<project name="early_genetic_trial" default="build" basedir=".">
  <!-- setup ant-contrib //-->
  <property name="ant-contrib.jar" location="c:\java\ant-contrib-0.3.jar"/>
  <taskdef resource="net/sf/antcontrib/antcontrib.properties"
           classpath="${ant-contrib.jar}"/>
  <!-- genetic parameters //-->
  <property name="genetic.pop_size" value="500"/>
  <property name="genetic.number_of_generations" value="51"/>
  <property name="gen.reproduction_probability" value=".10"/>
  <property name="gen.recombinate_probability" value=".90"/>
  <property name="gen.mutation_probability" value=".0"/>
  <property name="gen.permuation_probability" value=".0"/>
  <property name="gen.editing_probability" value=".0"/>
  <property name="gen.encapsulation_probability" value=".0"/>
  <property name="gen.decimation_probability" value=".0"/>
  <!-- xml properties //-->
  <property name="source_xml" value="c:\_genetic\generate_initial_population\source.xml"/>
  <property name="target_xml" value="c:\_genetic\generate_initial_population\target.xml"/>
  <!-- xslt properties //-->
  …contained xslt properties
  <!-- directory properties //-->
  …contained directory properties
  <!-- report properties //-->
  <property name="xslt.report" value="C:\java\jakarta-ant-1.5.1\etc\log.xsl"/>
Simplified Ant Build Target
<target name="build" depends="clean, create">
  <antcall target="generate_initial_population">
    <param name="no_of_individuals" value="${genetic.pop_size}"/>
  </antcall>
  <antcall target="initiate_genetic_run"/>
  <antcall target="report"/>
  <echo message="successfully ran genetic run"/>
</target>
Results
• Non-normative results indicate acceptable processing time, e.g. approx. 7 minutes to solve this problem on a PIII with 128 MB RAM
• For simple mapping this was an effective technique
• Many times the best-fit individuals were poorly performing XSLT; needed to add a criterion to fitness that measured processing time
Ruminations
• Early success with XSLT approach proved the applicability of GA with xml based technologies
• Was easy to let people define a source and target xml
• Issues of speed and efficiency can be addressed later on
• How could I involve web services into such a process ?
GA Strategies to Consider
• Could apply the genetic algorithm directly with another language: Java or C#?
• Leverage existing XSLT approach and add SOAP as a new function/terminal via XSLT extension
Enhance existing Prototype
• augment XSLT approach and introduce web services into terminal/function set
• Needed a local repository of Web Services to add to existing function set
• Needed to enhance XSLT with a generic SOAP XSLT extension which indirectly invokes a web service via its WSDL definition
• Adjust generate_initial_population to include the SOAP extension
Simple application composition
Step 0. Randomly generate initial population of xslt documents; this is now a 2 stage process to include web services via new function
Step 1. Evaluate fitness via xml diff of target.xml to result.xml
Step 2. Select individuals according to their fitness which can be used by step 3
Step 3. Apply primary and secondary genetic operations to generate new offspring population from selected individuals
Step 4. Repeat steps 1, 2, 3 to generate X number of generations
Step 5. Choose best fit individual of last generation
Web Services Search Engine
• Long term storage in WSIL format
• Data was stored in the Xindice XML repository
• Which is accessible via WebDAV and HTTP GET
• Can query using XPath
• Harvested by a combination of Google, scanning and general web robot techniques
Manual Harvesting of Web Services
• Google ‘file: wsil’ or inspection.wsil
• Google ‘file: wsdl’
• Scanning common Application Server ports, sending simple SOAP messages
• Xmethods and general registries
• Did not want to bind to either WSDL or UDDI….
Simple WSIL example
<?xml version="1.0"?>
<inspection xmlns="http://schemas.xmlsoap.org/ws/2001/10/inspection/">
  <service>
    <description referencedNamespace="http://schemas.xmlsoap.org/wsdl/"
                 location="http://example.com/stockquote.wsdl"/>
  </service>
</inspection>
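Pulling WSDL locations out of a WSIL document like this one can be sketched with the standard library. The prototype queried its Xindice repository with XPath; this sketch instead parses a document in memory with ElementTree's limited XPath subset, and the `wsdl_locations` name is hypothetical.

```python
import xml.etree.ElementTree as ET

WSIL_NS = {"wsil": "http://schemas.xmlsoap.org/ws/2001/10/inspection/"}
WSDL_NS = "http://schemas.xmlsoap.org/wsdl/"

def wsdl_locations(wsil_xml: str) -> list:
    """Return the location of every WSDL description in a WSIL document."""
    root = ET.fromstring(wsil_xml)
    return [
        d.get("location")
        for d in root.findall(".//wsil:service/wsil:description", WSIL_NS)
        # keep only descriptions that point at a WSDL file
        if d.get("referencedNamespace") == WSDL_NS and d.get("location")
    ]

example = """<inspection xmlns="http://schemas.xmlsoap.org/ws/2001/10/inspection/">
  <service>
    <description referencedNamespace="http://schemas.xmlsoap.org/wsdl/"
                 location="http://example.com/stockquote.wsdl"/>
  </service>
</inspection>"""

print(wsdl_locations(example))  # prints ['http://example.com/stockquote.wsdl']
```

Descriptions that reference other namespaces, such as a UDDI entry, are skipped because they carry no WSDL location.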
WSIL with 2 services
<?xml version="1.0"?>
<inspection xmlns="http://schemas.xmlsoap.org/ws/2001/10/inspection/"
            xmlns:wsiluddi="http://schemas.xmlsoap.org/ws/2001/10/inspection/uddi/">
  <service>
    <abstract>A stock quote service with two descriptions</abstract>
    <description referencedNamespace="http://schemas.xmlsoap.org/wsdl/"
                 location="http://example.com/stockquote.wsdl"/>
    <description referencedNamespace="urn:uddi-org:api">
      <wsiluddi:serviceDescription location="http://www.example.com/uddi/inquiryapi">
        <wsiluddi:serviceKey>4FA28580-5C39-11D5-9FCF-BB3200333F79</wsiluddi:serviceKey>
      </wsiluddi:serviceDescription>
    </description>
  </service>
  <service>
    <description referencedNamespace="http://schemas.xmlsoap.org/wsdl/"
                 location="ftp://anotherexample.com/tools/calculator.wsdl"/>
  </service>
  <link referencedNamespace="http://schemas.xmlsoap.org/ws/2001/10/inspection/"
        location="http://example.com/moreservices.wsil"/>
</inspection>
inspection.wsil at XMETHODS
<?xml version='1.0' encoding='UTF-8'?>
<inspection xmlns='http://schemas.xmlsoap.org/ws/2001/10/inspection/'
            xmlns:wsiluddi='http://schemas.xmlsoap.org/ws/2001/10/inspection/uddi/'
            xmlns:wsilxmethods='http://schemas.xmethods.net/ws/2001/10/inspection/'>
  <service>
    <abstract>Get the Barnes &amp; Noble price by ISBN</abstract>
    <description referencedNamespace='http://schemas.xmlsoap.org/wsdl/'
                 location='http://www.abundanttech.com/webservices/bnprice/bnprice.wsdl'/>
    <description referencedNamespace='http://www.xmethods.net/'>
      <wsilxmethods:serviceDetailPage
          location='http://www.xmethods.net/ve2/ViewListing.po?key=uuid:C5119582-90AC-51E7-72AA-ED7D8927C9D1'>
        <wsilxmethods:serviceID>272507</wsilxmethods:serviceID>
      </wsilxmethods:serviceDetailPage>
    </description>
  </service>
  …..
</inspection>
XSLT Generic SOAP client
• Created extension function in SAXON, which grew out of a SOAP debugging tool effort ( another talk ! )
• Ability to invoke a web service via WSDL and randomly choose web service
• Web service invocation called during xslt transformation
• Function prototype: ws:invoke(wsdl,methodname,nodeset)
Example of using a web service in XSLT
<xsl:stylesheet xmlns:xsl="http://www.w3.org/1999/XSL/Transform"
                xmlns:ws="http://www.ruminate.co.uk/ws" version="2.0">
  <xsl:template match="a">
    <xsl:value-of select="ws:invoke('http://somewsdlfile.wsdl','getGUID',a)"/>
    <b/>
  </xsl:template>
</xsl:stylesheet>
Issues
• Step 0 generation required additional stages, to introduce ws:invoke combined with WSIL information
• Encapsulation was applied to xslt statements that contained ws:invoke function, so crossover would not change the statement
• Always choose 1st method ( in order ) in WSDL
• Step 0 consistently generated highly unfit programs, required larger population size
• Mutation seeding ws:invoke statement vastly speeded up process
• New timeout factors necessary
• GA process significantly slowed down due to inclusion of web services
• GA process was more effective with better fitness evaluation; e.g. ranking fitness consisted of 3 source and targets
Objective: Generate an xslt program that multiplies 2 numbers, converts to Celsius and returns number in Chinese
Terminal Set: <a/>, <b/> ( 2 numbers )
Function Set: Subset of xslt instructions + ws:invoke
Fitness Cases: Three fitness cases
Raw fitness: Node count on xmldiff patch file difference between result xml and target xml
Parameters: M=1000, G=51
Results
• Multiply 2 numbers, convert to Celsius and result should be in Chinese: average 2 hours
• Tried a variety of more complicated problems, with many runs never converging to a solution; it is apparent that there is not enough ‘genetic material’ online yet
• Prototype proved that GA can be applied
• Assisting GA always speeded up the process
• Many optimization opportunities
Enhancement
• Could have used Dimitre Novatchev’s FXSL, though this would have imposed a pure fp viewpoint on process
• Use UDDI as web services repository
• Apply GA to ANT or xml pipeline, or even to BPEL, WS-CAF or any xml vocabulary
• Prototyping with ANT was successful, but eventually will embed in a software framework
The Internet as a maturing Software Framework
• Inheritance versus composition reuse mechanism
• Hierarchical versus relational data models
• Synchronous versus asynchronous
• Stateful versus stateless
• Declarative versus OO versus procedural
• Coarse grained versus RPC versus Object based web services
Conclusion
In 5 years’ time, will there be advances in hardware processing to make GA techniques viable?
Problem domain experts can formulate a representation of a problem to be solved using simple xml
Coders become farmers
It’s counterintuitive to generate a million-line ‘messy’ program to solve a problem
Are there any amendments/changes to key specifications that will assist or restrict the GA method?
Thank you, any questions ?
References
• John R. Koza, Genetic Programming, MIT Press 1992
• W3C, SOAP Version 1.2
• W3C, XML Version 1
• W3C, XSLT Version 2
• W3C, WSDL Version 1
• WSIL Version 1
• J. W. Hunt and M. D. McIlroy, An Algorithm for Differential File Comparison, published in 1976
• SAXON XSLT processor by Michael Kay, http://saxon.sourceforge.net
• ASF ANT, http://ant.apache.org
• FXSL, Dimitre Novatchev, http://fxsl.sourceforge.net