ToolKit for Large-SCAle studies of Web documents

SI4 June Project: Imane BELLAT, Bastien BLANCHARD, Clément TOCHE, Maurice YARED

Transcript of ToolKit for Large-SCAle studies of Web documents

Page 1: ToolKit for Large-SCAle studies of Web documents

SI4 June Project

Imane BELLAT, Bastien BLANCHARD, Clément TOCHE, Maurice YARED

ToolKit for Large-SCAle studies of Web documents

Page 2: ToolKit for Large-SCAle studies of Web documents

Accessibility: multiple paradigms

* Web documents evaluation process
  * Co-operating services
* Evaluations and Requirements
  * Web Accessibility Initiative (WAI)
  * Checkpoints from the Web Content Accessibility Guidelines
* Additional Services
  * Page Rank
  * W3C Validation
  * Pretty Print of statistics

Agenda: Project Subject, Initial WorkFlow, WorkFlow with Modularity, More Complex WorkFlow, Some Tests, Problems, Outlooks

Page 3: ToolKit for Large-SCAle studies of Web documents

Scalability Tools

* Grid Workflow Efficient Enactment for Data Intensive Applications (Gwendia)
  * Workflow description framework
  * Grid Infrastructure
  * Optimizing distributed computation
* MOTEUR as WorkFlow Manager
* Goal
  * Accessibility tests on elements in HTML, CSS, Script code
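To make the goal concrete, here is a minimal sketch of one accessibility test on HTML elements. It checks for `<img>` tags without an `alt` attribute, in the spirit of WCAG 1.0 checkpoint 1.1 (text equivalents for non-text elements). The class and function names are hypothetical and are not taken from the project; the slides do not say which checkpoints were implemented or how.

```python
from html.parser import HTMLParser


class MissingAltChecker(HTMLParser):
    """Collects <img> tags that carry no alt attribute (hypothetical helper)."""

    def __init__(self):
        super().__init__()
        self.total_images = 0
        self.missing_alt = []  # (line, column) of each offending <img>

    def handle_starttag(self, tag, attrs):
        if tag != "img":
            return
        self.total_images += 1
        # attrs is a list of (name, value) pairs. An empty alt="" is accepted
        # here, so only a fully absent attribute is flagged.
        if "alt" not in dict(attrs):
            self.missing_alt.append(self.getpos())


def check_alt_text(html):
    """Return (number of images, positions of images lacking alt)."""
    checker = MissingAltChecker()
    checker.feed(html)
    checker.close()
    return checker.total_images, checker.missing_alt


# Illustrative input, not project data.
sample = '<p><img src="a.png" alt="Logo"><img src="b.png"></p>'
total, missing = check_alt_text(sample)
print(f"{len(missing)} of {total} images lack an alt attribute")
```

A real test suite would need one such check per checkpoint, plus CSS and script analysis, which this sketch does not attempt.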

Page 4: ToolKit for Large-SCAle studies of Web documents

Agenda

* Project Subject
* Initial WorkFlow
  * Vision of Architecture
  * Code Implementation
  * WorkFlow Structure
* WorkFlow With Modularity
  * Concept
  * WorkFlow Structure
* More Complex WorkFlow
  * Vision of Architecture
  * WorkFlow Structure
* Some Tests
* Problems
* Outlooks

Page 5: ToolKit for Large-SCAle studies of Web documents

Vision of Architecture

1. Data entrance
2. Loading data on pages
3. Retrieving elements
4. Processing elements
5. Results processing
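The five stages can be sketched as a chain of plain functions. This is only an illustration of the data flow, assuming that each stage consumes the previous stage's output. It does not use MOTEUR, the in-memory `FAKE_WEB` dictionary stands in for real page loading, and every name below is hypothetical.

```python
from html.parser import HTMLParser

# Hypothetical in-memory stand-in for the Web, so the sketch runs offline.
FAKE_WEB = {
    "http://example.org/a": "<html><body><a href='/b'>b</a><img src='x.png'></body></html>",
    "http://example.org/b": "<html><body><p>text</p></body></html>",
}


class TagCollector(HTMLParser):
    """Records the name of every start tag it encounters."""

    def __init__(self):
        super().__init__()
        self.tags = []

    def handle_starttag(self, tag, attrs):
        self.tags.append(tag)


def data_entrance():
    """Stage 1: supply the URLs to study."""
    return sorted(FAKE_WEB)


def load_page(url):
    """Stage 2: load the page content for one URL."""
    return FAKE_WEB[url]


def retrieve_elements(html):
    """Stage 3: extract the elements (here, tag names) from a page."""
    collector = TagCollector()
    collector.feed(html)
    collector.close()
    return collector.tags


def process_elements(tags):
    """Stage 4: compute per-page measurements."""
    return {"elements": len(tags), "images": tags.count("img")}


def process_results(per_page):
    """Stage 5: aggregate the per-page measurements."""
    return {
        "pages": len(per_page),
        "elements": sum(r["elements"] for r in per_page),
        "images": sum(r["images"] for r in per_page),
    }


def run_workflow():
    per_page = [
        process_elements(retrieve_elements(load_page(url)))
        for url in data_entrance()
    ]
    return process_results(per_page)


print(run_workflow())
```

Stages 2 to 4 work on one page at a time and do not depend on other pages, which is presumably what makes such a workflow a candidate for distribution on a grid.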

Page 6: ToolKit for Large-SCAle studies of Web documents

Code Implementation

* Subdivision of WorkFlow in Processors
* Encoding processors
  * BeanShell
    * Internal MOTEUR Java Code
    * Directly deployed on the grid
  * Common WebServices
    * Accessible through Web Service Description Language (WSDL)

Page 7: ToolKit for Large-SCAle studies of Web documents

WorkFlow Structure (diagram labels): Data Entrance, Loading Data, Retrieving Elements, Processing data, Processing results

Page 8: ToolKit for Large-SCAle studies of Web documents

Concept of Modularity

* Meeting the needs of developers wishing to expand treatments
* Communication constraints
* Modularity:
  * By Layers
  * By Processors

Contract Models Implementation

Page 9: ToolKit for Large-SCAle studies of Web documents

Contract model (diagram labels): Previous Layer, Next Layer, Data contract, Data flow
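One way to read the data contract between a previous layer and the next one is as a check that the fields a layer expects are actually produced upstream. The sketch below illustrates that reading only. The slides do not describe the contract format, so the `Layer` type, the field names, and the checking rule are all assumptions.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Layer:
    """Hypothetical description of a workflow layer and its data contract."""
    name: str
    requires: frozenset  # field names this layer expects as input
    provides: frozenset  # field names this layer emits as output


def missing_fields(previous, nxt):
    """Return the fields the next layer needs that the previous one lacks."""
    return sorted(nxt.requires - previous.provides)


def check_chain(layers):
    """Check every adjacent pair; return a list of (previous, next, missing)."""
    violations = []
    for previous, nxt in zip(layers, layers[1:]):
        missing = missing_fields(previous, nxt)
        if missing:
            violations.append((previous.name, nxt.name, missing))
    return violations


# Illustrative layers; the field names are invented for the example.
loading = Layer("Loading data", frozenset({"url"}), frozenset({"url", "html"}))
retrieving = Layer("Retrieving elements", frozenset({"html"}),
                   frozenset({"url", "elements"}))
processing = Layer("Processing elements", frozenset({"elements", "rank"}),
                   frozenset({"score"}))

for prev_name, next_name, missing in check_chain([loading, retrieving, processing]):
    print(f"{prev_name} -> {next_name}: missing {missing}")
```

Under this reading, a developer could add or swap a processor inside a layer as long as the layer still provides the fields the next layer requires.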

Page 10: ToolKit for Large-SCAle studies of Web documents

WorkFlow Structure (diagram labels): Data Entrance, Loading data, Retrieve elts, Retrieve elts 2, Processing data, Processing results

Page 11: ToolKit for Large-SCAle studies of Web documents

Vision of Architecture

1. Data entrance
2. Loading domain
3. Loading data on pages
4.1. Retrieving elements level 1
4.2. Retrieving elements level 2
5. Processing elements
6. Results processing
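The slides do not say what the "Loading domain" stage does. Assuming it expands a domain into the pages that belong to it, one step of that expansion could look like the sketch below, which extracts the same-domain links from a page that is already loaded. The function and class names are hypothetical.

```python
from html.parser import HTMLParser
from urllib.parse import urldefrag, urljoin, urlparse


class LinkCollector(HTMLParser):
    """Records the href value of every <a> tag."""

    def __init__(self):
        super().__init__()
        self.hrefs = []

    def handle_starttag(self, tag, attrs):
        if tag == "a":
            href = dict(attrs).get("href")
            if href:
                self.hrefs.append(href)


def same_domain_links(page_url, html):
    """Return the sorted, de-duplicated links that stay on page_url's domain."""
    collector = LinkCollector()
    collector.feed(html)
    collector.close()
    domain = urlparse(page_url).netloc
    links = set()
    for href in collector.hrefs:
        absolute = urljoin(page_url, href)  # resolve relative links
        parsed = urlparse(absolute)
        if parsed.scheme in ("http", "https") and parsed.netloc == domain:
            links.add(urldefrag(absolute).url)  # drop any #fragment
    return sorted(links)


# Illustrative page, not project data.
page = (
    '<a href="/about.html">About</a>'
    '<a href="contact.html#form">Contact</a>'
    '<a href="http://other.example.com/">Elsewhere</a>'
    '<a href="mailto:a@example.org">Mail</a>'
)
print(same_domain_links("http://example.org/index.html", page))
```

Applying this function repeatedly to newly found pages would enumerate a domain, but a real loader would also need a visited set, depth limits, and politeness rules, none of which are shown here.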

Page 12: ToolKit for Large-SCAle studies of Web documents

WorkFlow Structure (diagram labels): Data Entrance, Loading data, Retrieve elts, Retrieve elts 2, Processing data, Processing results, Domain Loader

Page 13: ToolKit for Large-SCAle studies of Web documents

Pretty Printed Statistics

[Chart] Tests over domains: Polytech, W3C, ServPubl; series Access and W3C Valid; value axis 0.00 to 1.00
[Chart] Page Accessibility per Rank: horizontal axis 0 to 10; value axis 0.00 to 1.00
[Chart] Validity Rate: horizontal axis 0 to 10; value axis 0.00 to 1.00
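A per-rank rate such as the one in the "Page Accessibility per Rank" chart could be computed and printed as sketched below. The slides do not show how the statistics were produced, so the record format is an assumption, the function names are hypothetical, and the sample records are invented for illustration and are not the project's results.

```python
from collections import defaultdict


def rate_per_rank(records):
    """Map each rank to the fraction of its pages that passed.

    records is an iterable of (rank, passed) pairs, one per page.
    """
    totals = defaultdict(int)
    passed = defaultdict(int)
    for rank, ok in records:
        totals[rank] += 1
        if ok:
            passed[rank] += 1
    return {rank: passed[rank] / totals[rank] for rank in sorted(totals)}


def pretty_print(rates, width=20):
    """Render the rates as one text bar per rank."""
    lines = []
    for rank, rate in rates.items():
        bar = "#" * round(rate * width)
        lines.append(f"rank {rank:>2}  {rate:4.2f}  {bar}")
    return "\n".join(lines)


# Invented sample records: (page rank, page passed the accessibility tests).
records = [(3, True), (3, False), (5, True), (5, True), (7, False)]
rates = rate_per_rank(records)
print(pretty_print(rates))
```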

Page 14: ToolKit for Large-SCAle studies of Web documents

Problems Encountered

* MOTEUR has no final release version
* MOTEUR simulation uses the common JVM allocated memory
  * Insufficient for large-scale data flows
* WorkFlow graphic modeling is not malleable
* Debugging platform of very little use

Page 15: ToolKit for Large-SCAle studies of Web documents

Some improvements

* Implementation of all WAI checkpoint tests
  * With CSS parsing
  * Dynamic scripts parsing
* Implementation of GlassFish WebServices in JGASW provider
* Implementation of Security Manager Modules to WorkFlow
* Defining automation for new processors implementation
* Conditioning every modification of the MOTEUR Workflow according to contracts
* MOTEUR modeling for neural networks

Page 16: ToolKit for Large-SCAle studies of Web documents

What we have learnt:

* WorkFlow modeling
* Team working and project planning
* Facing and resolving unplanned issues
* Working on a research-oriented project
* Getting along with additional requests from a client

Page 17: ToolKit for Large-SCAle studies of Web documents

"The power of the Web is in its universality. Access by everyone regardless of disability is an essential aspect."

Tim Berners-Lee, W3C Director and inventor of the World Wide Web

Thanks for your attention