WebLicht Tool Integration
Erhard Hinrichs, Marie Hinrichs, Wei Qiu
University of Tübingen
www.clarin-d.net


Introduction

• WebLicht (Web-based Linguistic Chaining Tool)
  – Motivation
  – Overview of the architecture
  – TCF (text corpus format) data-exchange format
• Integration
  – Repository Sets
  – Comet (CMDI orchestration metadata tool)
  – Center Registry Update
• Tool Testing
  – Bombard (tool scalability testing)
  – Awesome (verify actual tool input/output against metadata)
• Usage Statistics
  – Piwik
• Links

WebLicht Tool Integration, 10.11.2016, Ljubljana, Slovenia


Motivation

Many natural language processing tools must be invoked from the command line or from programming code.
• Sometimes difficult to install
• Not always available for all operating systems
• Confusing for those not accustomed to running command-line tools or who don’t like to program
• Not possible to create some annotations with one tool and other annotations with a different tool
  – Input/output formats differ


Overview

WebLicht
• Makes NLP tools available online and enables users to mix-and-match tools to suit their needs.
• The WebLicht web interface guides the construction and execution of processing chains, and provides visualization of results.
• NLP tools are made available by CLARIN centers as web services.


Overview

WebLicht is an execution environment for natural language processing pipelines:
● Uses a service-oriented architecture (SOA)
● Web services are combined to form a chain
● Chains are executed via sequential HTTP POST requests to services on the chain
● The output of service n is the input to service n+1 in the chain
● Most services use Text Corpus Format (TCF) as their input and output
● Services add one or more annotation layer(s)
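The sequential execution described above can be sketched in a few lines of Python. This is only an illustration under stated assumptions, not WebLicht's actual chaining engine: the `post` helper, the `text/tcf+xml` content type, and the stand-in services are all hypothetical.

```python
from typing import Callable, Iterable
import urllib.request


def post(url: str, body: bytes, content_type: str = "text/tcf+xml") -> bytes:
    """Send one HTTP POST request and return the response body.

    The content type is an assumption made for this sketch.
    """
    request = urllib.request.Request(
        url, data=body, headers={"Content-Type": content_type}, method="POST"
    )
    with urllib.request.urlopen(request) as response:
        return response.read()


def run_chain(document: bytes, services: Iterable[Callable[[bytes], bytes]]) -> bytes:
    """Feed the output of service n into service n+1, in order."""
    for service in services:
        document = service(document)
    return document


# Offline demonstration with stand-in services (no network access needed).
def fake_tokenizer(doc: bytes) -> bytes:
    return doc + b"<tokens/>"


def fake_tagger(doc: bytes) -> bytes:
    return doc + b"<POStags/>"


result = run_chain(b"<text/>", [fake_tokenizer, fake_tagger])
print(result)
```

To call real services, each URL could be wrapped as a callable, for example `lambda doc, url=url: post(url, doc)`.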


Main Components

• Services for data processing, distributed
• Repositories containing metadata about the services
• Harvester
  – Collects service metadata from repositories
• Chaining engine
  – Guides the construction of valid chains
  – Executes chains
• Web interface for building and executing chains
• WaaS (WebLicht as a Service)
  – RESTful web service to execute chains
  – Can be called from command-line, scripts, etc.


Component Interactions


TCF: Motivation

TCF (text corpus format) is WebLicht’s data-exchange format
• Service chains are meant to be built flexibly
  – Mix-n-match, Lego style
  – Not “one service does it all”
• Output of service n is input of service n+1
• Services rely on the annotations in the input
• Services must be able to
  – Read annotations produced by previous services
  – Write annotations for use by later services


TCF: Details

• Each service adds one or more annotation layers
• Services are not permitted to change or remove existing annotations
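The "add a layer, never modify existing ones" rule can be illustrated with a toy service. The sketch below is an assumption-laden example, not a real WebLicht tool: the `lemmas`/`lemma` element names and the lower-casing "lemmatizer" are placeholders chosen for illustration, and the namespace is the one used in the TCF example of this talk.

```python
import xml.etree.ElementTree as ET

# Namespace as it appears in the TCF example of this talk.
TC_NS = "http://www.dspin.de/data/textcorpus"


def add_lemma_layer(text_corpus: ET.Element) -> None:
    """Append a toy <lemmas> layer; existing layers are left untouched."""
    tokens = text_corpus.find(f"{{{TC_NS}}}tokens")
    if tokens is None:
        raise ValueError("input has no token layer")
    # Element names for the new layer are assumptions made for this sketch.
    lemmas = ET.SubElement(text_corpus, f"{{{TC_NS}}}lemmas")
    for i, token in enumerate(tokens):
        lemma = ET.SubElement(
            lemmas,
            f"{{{TC_NS}}}lemma",
            {"ID": f"l_{i}", "tokenIDs": token.get("ID")},
        )
        lemma.text = (token.text or "").lower()  # placeholder, not real lemmatization


corpus = ET.fromstring(
    f'<TextCorpus xmlns="{TC_NS}" lang="de">'
    '<tokens><token ID="t_0">Karin</token><token ID="t_1">fliegt</token></tokens>'
    "</TextCorpus>"
)
before = ET.tostring(corpus.find(f"{{{TC_NS}}}tokens"))
add_lemma_layer(corpus)
after = ET.tostring(corpus.find(f"{{{TC_NS}}}tokens"))
layers = [child.tag.split("}")[1] for child in corpus]
print(layers, before == after)
```

The token layer is serialized before and after the call so that the "no changes to existing annotations" property can be checked directly.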


TCF: Details

TCF is an XML format for storing linguistic annotations
• Text, tokens, sentences, lemmas, part-of-speech, morphology, parsing, etc.
• Standoff format
  – Each type of annotation is stored in a separate XML element
  – The token layer can be seen as the central, atomic element to which other annotation layers refer


TCF: Example

<?xml version="1.0" encoding="UTF-8"?>
<D-Spin xmlns="http://www.dspin.de/data" version="0.4">
  <MetaData xmlns="http://www.dspin.de/data/metadata">
    <!-- The execution engine records metadata about the services used to create the document here. -->
  </MetaData>
  <TextCorpus xmlns="http://www.dspin.de/data/textcorpus" lang="de">
    <text>Karin fliegt nach New York. Sie will dort Urlaub machen.</text>


TCF: Example

    <tokens>
      <token ID="t_0">Karin</token>
      <token ID="t_1">fliegt</token>
      <token ID="t_2">nach</token>
      <token ID="t_3">New</token>
      <token ID="t_4">York</token>
      <token ID="t_5">.</token>
      <token ID="t_6">Sie</token>
      <token ID="t_7">will</token>
      <token ID="t_8">dort</token>
      <token ID="t_9">Urlaub</token>
      <token ID="t_10">machen</token>
      <token ID="t_11">.</token>
    </tokens>


• Each token has a unique ID
• All other annotation layers (except the text layer) reference tokens directly or indirectly


TCF: Example

    <sentences>
      <sentence ID="s_0" tokenIDs="t_0 t_1 t_2 t_3 t_4 t_5"></sentence>
      <sentence ID="s_1" tokenIDs="t_6 t_7 t_8 t_9 t_10 t_11"></sentence>
    </sentences>


TCF: Example

    <POStags tagset="stts">
      <tag ID="pt_0" tokenIDs="t_0">NE</tag>
      <tag ID="pt_1" tokenIDs="t_1">VVFIN</tag>
      <tag ID="pt_2" tokenIDs="t_2">APPR</tag>
      <tag ID="pt_3" tokenIDs="t_3">NE</tag>
      <tag ID="pt_4" tokenIDs="t_4">NE</tag>
      <tag ID="pt_5" tokenIDs="t_5">$.</tag>
      <tag ID="pt_6" tokenIDs="t_6">PPER</tag>
      <tag ID="pt_7" tokenIDs="t_7">VMFIN</tag>
      <tag ID="pt_8" tokenIDs="t_8">ADV</tag>
      <tag ID="pt_9" tokenIDs="t_9">NN</tag>
      <tag ID="pt_10" tokenIDs="t_10">VVINF</tag>
      <tag ID="pt_11" tokenIDs="t_11">$.</tag>
    </POStags>
    ...
  </TextCorpus>
</D-Spin>
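To show how the standoff layers fit together, here is a minimal Python sketch that resolves the token references of the sentence and POS layers. It uses only the standard library rather than the Java TCF library mentioned in this talk, and it embeds a shortened copy of the example (first sentence only, metadata omitted); the variable names are illustrative.

```python
import xml.etree.ElementTree as ET

NS = {"tc": "http://www.dspin.de/data/textcorpus"}

# Shortened copy of the example document: first sentence only, no MetaData.
TCF = """<D-Spin xmlns="http://www.dspin.de/data" version="0.4">
  <TextCorpus xmlns="http://www.dspin.de/data/textcorpus" lang="de">
    <tokens>
      <token ID="t_0">Karin</token>
      <token ID="t_1">fliegt</token>
      <token ID="t_2">nach</token>
      <token ID="t_3">New</token>
      <token ID="t_4">York</token>
      <token ID="t_5">.</token>
    </tokens>
    <sentences>
      <sentence ID="s_0" tokenIDs="t_0 t_1 t_2 t_3 t_4 t_5"></sentence>
    </sentences>
    <POStags tagset="stts">
      <tag ID="pt_0" tokenIDs="t_0">NE</tag>
      <tag ID="pt_1" tokenIDs="t_1">VVFIN</tag>
      <tag ID="pt_2" tokenIDs="t_2">APPR</tag>
      <tag ID="pt_3" tokenIDs="t_3">NE</tag>
      <tag ID="pt_4" tokenIDs="t_4">NE</tag>
      <tag ID="pt_5" tokenIDs="t_5">$.</tag>
    </POStags>
  </TextCorpus>
</D-Spin>"""

corpus = ET.fromstring(TCF).find("tc:TextCorpus", NS)

# Token layer: ID -> surface form.
tokens = {t.get("ID"): t.text for t in corpus.iterfind("tc:tokens/tc:token", NS)}

# POS layer: each tag points back at one or more token IDs.
pos = {}
for tag in corpus.iterfind("tc:POStags/tc:tag", NS):
    for token_id in tag.get("tokenIDs").split():
        pos[token_id] = tag.text

# Sentence layer: a sentence is just an ordered list of token IDs.
sentences = [
    [(tokens[tid], pos.get(tid)) for tid in s.get("tokenIDs").split()]
    for s in corpus.iterfind("tc:sentences/tc:sentence", NS)
]

for sentence in sentences:
    print(" ".join(f"{word}/{tag}" for word, tag in sentence))
```

Because every layer refers to token IDs rather than to character offsets or to each other's markup, a reader only needs the token dictionary to interpret any other layer.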


TCF: Processing

• More detailed information about TCF is available in the Developers Manual on the weblicht-wiki:
  – schema
  – online validator
  – Tutorial for reading/writing TCF using the Java library


Building a Tool

To-dos for implementing a WebLicht tool:
• Gather resources needed for the tool
  – Models, lists, etc.
• Implement the tool as a web service
• Deploy the web service on a public server


Integrating a Tool into WebLicht

Making a tool visible to WebLicht:
• Create a set in your center’s repository for WebLicht web services
• Create CMDI metadata for the tool and place it in the repository set
• Assign a PID to the tool
• Update your center’s data in the Center Registry to enable harvesting by WebLicht


Repository Set

• Each center offering WebLicht services must create a set in their repository for metadata about those web services
  – Metadata for WebLicht services should be part of this set
  – The WebLicht harvester only retrieves those sets
  – Avoids harvesting of unnecessary data
• Procedure for creating a set depends on the repository software (Fedora, DSpace, …)
• Set name can be anything, but keep it simple (no spaces or special characters)
  – e.g. WebLichtWebServices


Creating Tool Metadata with Comet

Comet (CMD Orchestration Metadata Editing Tool) can be used to create CMDI metadata using the WebLichtWebService profile
• Two main sections
  – General Information
    • Tool name, description, PID, …
  – "Orchestration" Information
    • Tool language, input/output specification
    • Needed by the chaining and execution engine to
      – Guide users in building valid chains
      – Execute a processing chain


Creating Tool Metadata With Comet

Comet can be used to create metadata for a WebLicht web service:
• create from scratch
• upload metadata for editing
• use existing metadata as a template


Creating Tool Metadata With Comet

General Information


Creating Tool Metadata With Comet

Orchestration Information

Input (TCF):
• tokens
• lang (de|en|fr|it)

Output (adds to input):
• lemmas
• part-of-speech tags
  – tagset depends on input language
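How a chaining engine might use such input/output declarations can be sketched as follows. This is a hypothetical model, not the CMDI profile or WebLicht's real validation logic: `ToolSpec`, `validate_chain`, and the layer names are invented for illustration.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class ToolSpec:
    """Hypothetical, simplified view of a tool's orchestration metadata."""
    name: str
    languages: frozenset
    requires: frozenset   # annotation layers the tool needs in its input
    produces: frozenset   # annotation layers the tool adds to its output


def validate_chain(lang, available_layers, chain):
    """Return a list of problems; an empty list means the chain is valid."""
    problems = []
    layers = set(available_layers)
    for tool in chain:
        if lang not in tool.languages:
            problems.append(f"{tool.name}: language {lang!r} not supported")
        missing = tool.requires - layers
        if missing:
            problems.append(f"{tool.name}: missing input layers {sorted(missing)}")
        layers |= tool.produces  # later tools may rely on these layers
    return problems


tokenizer = ToolSpec(
    "tokenizer", frozenset({"de", "en"}),
    requires=frozenset({"text"}), produces=frozenset({"tokens", "sentences"}),
)
tagger = ToolSpec(
    "tagger", frozenset({"de", "en", "fr", "it"}),
    requires=frozenset({"tokens"}), produces=frozenset({"lemmas", "postags"}),
)

print(validate_chain("de", {"text"}, [tokenizer, tagger]))
print(validate_chain("de", {"text"}, [tagger]))
```

The `tagger` entry mirrors the slide above (tokens and one of four languages in, lemmas and part-of-speech tags out); the `tokenizer` entry is made up so that a two-step chain can be checked.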


Update Center Registry Data

Ask your Center Registry contact person to add these elements to your center’s data:
• WebServicesSet
  – The name of the set in your repository containing CMDI metadata for WebLicht web services
  – No spaces or special characters allowed
  – e.g. WebLichtWebServices
• WebServiceType: WebLicht
• MetadataScheme: CMDI
• OaiAccessPoint
  – URL for harvesting
  – e.g. http://repository.my.org.countrycode/oaiprovider
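To see how these registry entries could be combined, the sketch below builds a harvesting request from the access point and set name. It assumes the access point speaks OAI-PMH (suggested by the element name, not stated in the talk), that the metadata prefix is `cmdi`, and that "no special characters" means letters, digits, and underscores only; all three are assumptions.

```python
import re
from urllib.parse import urlencode


def is_valid_set_name(name: str) -> bool:
    """Accept letters, digits and underscores only (assumed reading of the rule)."""
    return re.fullmatch(r"[A-Za-z0-9_]+", name) is not None


def harvest_url(oai_access_point: str, set_name: str, metadata_prefix: str = "cmdi") -> str:
    """Build an OAI-PMH ListRecords request restricted to one set."""
    if not is_valid_set_name(set_name):
        raise ValueError(f"invalid set name: {set_name!r}")
    query = urlencode(
        {"verb": "ListRecords", "metadataPrefix": metadata_prefix, "set": set_name}
    )
    return f"{oai_access_point}?{query}"


url = harvest_url(
    "http://repository.my.org.countrycode/oaiprovider", "WebLichtWebServices"
)
print(url)
```

Restricting the request to a single set is what keeps the harvester from pulling unrelated records out of the repository.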


Tool Testing

Some useful testing software for tool developers:
• Bombard for scalability testing
• Awesome for testing metadata correctness


Tool Testing: Bombard

Bombard for scalability testing
• Simulates users invoking tool chains
• Flexible test-case configuration
• Multiple test cases can be run simultaneously during one “bombardment”
• Reports statistics for each tool:
  – successes/failures
  – average processing times
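The general idea of such a scalability test can be sketched with Python's standard thread pool. This is not Bombard's implementation or configuration format; the `bombard` function, its parameters, and the stand-in services are hypothetical.

```python
import time
from concurrent.futures import ThreadPoolExecutor
from statistics import mean


def bombard(service, payload, users=5, requests_per_user=3):
    """Call `service` concurrently and report successes, failures and timing."""

    def one_call(_):
        start = time.perf_counter()
        try:
            service(payload)
            ok = True
        except Exception:
            ok = False
        return ok, time.perf_counter() - start

    total = users * requests_per_user
    # One worker thread per simulated user.
    with ThreadPoolExecutor(max_workers=users) as pool:
        results = list(pool.map(one_call, range(total)))

    successes = sum(1 for ok, _ in results if ok)
    return {
        "successes": successes,
        "failures": total - successes,
        "average_seconds": mean(duration for _, duration in results),
    }


# Stand-in services so the sketch runs without a network.
def ok_service(doc):
    time.sleep(0.01)
    return doc


def broken_service(doc):
    raise RuntimeError("simulated server error")


report = bombard(ok_service, b"<text/>")
broken_report = bombard(broken_service, b"<text/>")
print(report["successes"], report["failures"])
print(broken_report["successes"], broken_report["failures"])
```

In a real test, `service` would wrap an HTTP POST to the tool under test.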


Improving Scalability

One way to improve scalability of web services:
• Jesque: a distributed task queue framework
• Exploit the parallel processing capabilities of modern servers and computing clusters
  – parallel processing within requests
  – concurrent processing of requests
  – guarantees with respect to usage of resources
  – fairness (small requests should not have to wait for large ones)
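The fairness goal can be illustrated with a small in-process queue. The sketch below does not use Jesque or its API; it is a simplified stand-in built on Python's `queue.PriorityQueue`, with `process_fairly` and the request sizes invented for illustration. Ordering strictly by size is one possible policy and could starve large requests under sustained load.

```python
import queue
import threading


def process_fairly(requests, handler, workers=1):
    """Process (size, payload) requests smallest-first with a pool of worker threads."""
    pending = queue.PriorityQueue()
    for order, (size, payload) in enumerate(requests):
        # `order` breaks ties so payloads themselves are never compared.
        pending.put((size, order, payload))

    finished = []
    lock = threading.Lock()

    def worker():
        while True:
            try:
                size, _, payload = pending.get_nowait()
            except queue.Empty:
                return  # nothing left to do
            result = handler(payload)
            with lock:
                finished.append((size, result))

    threads = [threading.Thread(target=worker) for _ in range(workers)]
    for thread in threads:
        thread.start()
    for thread in threads:
        thread.join()
    return finished


incoming = [(5000, "large"), (10, "small"), (300, "medium")]
done = process_fairly(incoming, str.upper, workers=1)
print(done)
```

With more than one worker, requests are still dequeued smallest-first, but completion order then depends on how long each handler call takes.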


Tool Testing: Awesome

Awesome (Automated WEb Service Orchestration Monitoring Environment)
• Tests that services’ orchestration metadata matches actual usage
• POSTs a small test file containing the required input annotations to the service
• Checks that the output returned contains the expected additional annotation layers
• Runs tests automatically at regular time intervals
  – Reports grouped by creator, error code, or severity
• Can test individual services
  – Specify PID, input file, expected output file
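The kind of check described above can be sketched as a comparison of annotation layers before and after a service call. This is not Awesome's code; `layer_names`, `check_service_output`, and the sample documents are hypothetical, and the HTTP POST itself is left out so the example runs offline.

```python
import xml.etree.ElementTree as ET

TC_NS = "http://www.dspin.de/data/textcorpus"


def layer_names(tcf_xml: str) -> set:
    """Return the names of the annotation layers inside <TextCorpus>."""
    root = ET.fromstring(tcf_xml)
    if root.tag == f"{{{TC_NS}}}TextCorpus":
        corpus = root
    else:
        corpus = root.find(f"{{{TC_NS}}}TextCorpus")
    if corpus is None:
        raise ValueError("no TextCorpus element found")
    return {child.tag.rsplit("}", 1)[-1] for child in corpus}


def check_service_output(input_xml, output_xml, promised_layers):
    """Compare the layers actually returned with what the metadata promises."""
    before, after = layer_names(input_xml), layer_names(output_xml)
    return {
        "missing": sorted(set(promised_layers) - after),  # promised but absent
        "dropped": sorted(before - after),                # input layers that vanished
    }


TEST_INPUT = (
    f'<TextCorpus xmlns="{TC_NS}" lang="de"><text>Hallo</text>'
    '<tokens><token ID="t_0">Hallo</token></tokens></TextCorpus>'
)
# Simulated service response: adds lemmas but forgets the POS tags.
TEST_OUTPUT = TEST_INPUT.replace(
    "</TextCorpus>",
    '<lemmas><lemma ID="l_0" tokenIDs="t_0">hallo</lemma></lemmas></TextCorpus>',
)

report = check_service_output(TEST_INPUT, TEST_OUTPUT, ["lemmas", "POStags"])
print(report)
```

A non-empty `missing` list means the metadata promises more than the service delivers; a non-empty `dropped` list means the service removed annotations it should have preserved.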


Tool Testing: Awesome


Statistics with Piwik

• WebLicht uses the CLARIN installation of the analytics platform Piwik to gather user statistics
• Each web service invocation from WebLicht or WaaS is recorded
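As a rough illustration of what recording an invocation could look like, the sketch below builds a request for Piwik's HTTP tracking endpoint (`piwik.php` with the `idsite`, `rec`, `action_name` and `url` parameters). How WebLicht actually maps invocations to tracked actions is not described in this talk, so the server address, site ID, service PID, and naming scheme here are all assumptions.

```python
from urllib.parse import urlencode, urlsplit, parse_qs


def tracking_url(piwik_base: str, site_id: int, service_pid: str, source: str) -> str:
    """Build a Piwik tracking request for one web service invocation.

    `source` distinguishes callers, e.g. "WebLicht" or "WaaS" (assumed scheme).
    """
    params = {
        "idsite": site_id,                       # ID of the tracked site in Piwik
        "rec": 1,                                # required to record the request
        "action_name": f"{source}/{service_pid}",
        "url": f"http://weblicht.example/{source}/invoke",  # placeholder URL
    }
    return f"{piwik_base}/piwik.php?{urlencode(params)}"


url = tracking_url("https://stats.example.org", 7, "hdl:0000/example-tagger", "WaaS")
recorded = parse_qs(urlsplit(url).query)
print(recorded["action_name"][0])
```

Sending the request (for example with `urllib.request.urlopen`) is omitted so that the example has no side effects.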


Current Tools

• Languages:
  – German, English most represented
  – Some tools for French, Italian, Spanish, …
  – Need support for more languages
• Tools:
  – Tokenization/Sentence splitters
  – POS taggers
  – Lemmatizers
  – Morphological analyzers
  – Parsers


WebLicht: Login

http://weblicht.sfs.uni-tuebingen.de/weblichtwiki/


Links

• Developer Manual
  – Tutorial for creating WebLicht web services
  – TCF in detail
• Comet: for creating tool metadata
• Bombard: wlsupport[at]sfs.uni-tuebingen.de
• Awesome: tests web services against metadata
• Harvester: list of harvested WebLicht services
• Jesque: to improve scalability
• WaaS: WebLicht as a Service


Thank You!