Stetl for INSPIRE Data Transformation

Post on 11-May-2015


Slides of a presentation given at the EuroGeographics KEN workshop on INSPIRE Data Harmonization, Paris, Oct 8-9, 2013: http://www.eurogeographics.org/event/inspire-ken-schema-transformation-workshop. The slides describe the Stetl ETL framework and cases of INSPIRE transformation. A video recording of the presentation is available at https://www.youtube.com/watch?v=vjdpYBm4AaM (the first part covers XSLT; Stetl for INSPIRE starts about halfway).

Transcript of Stetl for INSPIRE Data Transformation

INSPIRE Transformation with Stetl

A Lightweight Python Framework for Geospatial ETL

Just van den Broecke

EuroGeographics - KEN Workshop

Paris, Oct 8, 2013

www.justobjects.nl

About Me

Independent Open Source Geospatial Professional

Secretary OSGeo Dutch Local Chapter

Member of the Dutch OpenGeoGroep

Just van den Broecke

just@justobjects.nl

www.justobjects.nl

We have a Problem

The Rich GML Problem

Rich GML = Complex Mess

INSPIRE Dutch National Datasets

Germany: AFIS-ALKIS-ATKIS

UK: OS MasterMap

“Semi GML” e.g. Dutch Addresses & Buildings (BAG)

Arbitrary Nesting

The Street Name!

A Street Element in an INSPIRE Annex I Address..

Complex Model

Transformations

100+ MB GML Files

Millions of Objects

10s of Millions of <Elements>

Multiple Transformation Steps

Solution is Spatial ETL

But How? (with FOSS)

FOSS ETL - DIY ? Maybe

FOSS ETL - High Level

FOSS ETL - Lower Level

Each powerful individually but cannot do the entire ETL

ogr2ogr

FOSS ETL - How to Combine?

[Diagram: ogr2ogr + other FOSS tools = ?]

Example - 2011 Kadaster ESDIN

http://inspire.kademo.nl/doc/design-etl.html

Good ideas, but hard to scale and reuse. A framework is needed.

FOSS ETL: Add Python to the Equation

[Diagram: Python combined with (ogr2ogr + other FOSS tools) = Stetl]

Stetl = Simple Streaming Spatial Speedy ETL

[Diagram: GML1 → Stetl → GML2]

From Barrels of GML to Maps

From Local National Data to INSPIRE DL Services

[Diagram labels: Source <GML> (Dutch Addresses + Buildings), NLExtract, Stetl, deegree blobstore, deegree WFS, INSPIRE <GML>, Atom Feed, INSPIRE Addresses]

Stetl Concepts

Process Chain

Input → Filter → Filter → Output

Source (gml) → Process Chain (Input → Filter → Filter → Output) → Target

Example: GML to PostGIS

Chain: Reader (gml), ogr2ogr

Example: INSPIRE Model Transform

Chain: ogr2ogr, XSLT, Writer (gml); from Simple Features to Complex Features

Example: deegree Store

Chain: ogr2ogr, XSLT, deegreeWriter; or via WFS-T

Process Chain - How?

Input Filters Output
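
The chain idea can be sketched in a few lines of plain Python. This is only an illustration of the pattern, not Stetl's implementation: `run_chain`, `to_upper` and `wrap_in_element` are hypothetical names, and plain strings stand in for the packets that real components exchange.

```python
from typing import Any, Callable, Iterable, List


def run_chain(source: Iterable[Any],
              filters: List[Callable[[Any], Any]],
              output: Callable[[Any], None]) -> int:
    """Push every item from the input through all filters into the output."""
    count = 0
    for data in source:
        for apply_filter in filters:
            data = apply_filter(data)
        output(data)
        count += 1
    return count


def to_upper(data: str) -> str:
    """Hypothetical filter: upper-case the value."""
    return data.upper()


def wrap_in_element(data: str) -> str:
    """Hypothetical filter: wrap the value in an XML element."""
    return "<name>%s</name>" % data


# Input: a list of names; Output: append to a list
collected: List[str] = []
processed = run_chain(["amsterdam", "paris"],
                      [to_upper, wrap_in_element],
                      collected.append)
```

Each filter only needs to know what it receives and what it returns, which is what makes components reusable across chains.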

Example: XML to Shape

Chain: XMLInput → XSLTFilter → ogr2ogrOutput

• The Source
• Prepare XSLT Script
• XSLT GML Output
• The Stetl Config File (Process Chain: XMLInput, XSLT Filter, ogr2ogrOutput)

Running Stetl

stetl -c etl.cfg
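
As a rough sketch of what reading such a config file involves, the standard `configparser` module can split a `chains` entry into its components. The sample config below is an assumption for illustration: the `transformer_xslt` section, its class path `filters.xsltfilter.XsltFilter` and the script name `cities2gml.xsl` are not taken from the slides, and `parse_chains` is a hypothetical helper, not part of Stetl.

```python
import configparser

SAMPLE_CFG = """
[etl]
chains = input_xml_file|transformer_xslt|output_std

[input_xml_file]
class = inputs.fileinput.XmlFileInput
file_path = input/cities.xml

# Assumed class path and script name, for illustration only
[transformer_xslt]
class = filters.xsltfilter.XsltFilter
script = cities2gml.xsl

[output_std]
class = outputs.standardoutput.StandardXmlOutput
"""


def parse_chains(cfg_text):
    """Return a list of chains; each chain is a list of (section, class path)."""
    parser = configparser.ConfigParser()
    parser.read_string(cfg_text)
    chains = []
    # Chains are comma-separated; components within a chain are '|'-separated
    for chain_str in parser.get("etl", "chains").split(","):
        names = [name.strip() for name in chain_str.split("|") if name.strip()]
        chains.append([(name, parser.get(name, "class")) for name in names])
    return chains


chains = parse_chains(SAMPLE_CFG)
```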

Result Shapefile viewed in QGIS

Installing Stetl

via PyPI

Dependencies:
• GDAL + Python bindings
• lxml (XML processing)
• psycopg2 (Postgres)

sudo pip install stetl

Speed: Streaming

gml → Input → Filter → Output
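
Streaming means handing features to the next component one at a time instead of loading a whole 100+ MB file first. A minimal sketch with the standard library follows; Stetl itself uses lxml, and `stream_elements` plus the sample data are hypothetical.

```python
import io
import xml.etree.ElementTree as ET

# Tiny made-up sample standing in for a large GML file
SAMPLE_GML = b"""<FeatureCollection>
  <featureMember><City><name>Amsterdam</name></City></featureMember>
  <featureMember><City><name>Paris</name></City></featureMember>
  <featureMember><City><name>Rome</name></City></featureMember>
</FeatureCollection>"""


def stream_elements(source, tag):
    """Yield each element with the given tag as soon as it is fully parsed."""
    for _event, elem in ET.iterparse(source, events=("end",)):
        if elem.tag == tag:
            yield elem
            # Drop the element's children once the consumer is done with it
            elem.clear()


names = [member.findtext("City/name")
         for member in stream_elements(io.BytesIO(SAMPLE_GML), "featureMember")]
```

Because the generator yields before clearing, the consumer sees the complete element, and memory use depends on the size of one feature rather than the size of the file.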


Speed: Going Native

gml → Input → Filter → Output

Stetl components call native C libraries and programs, e.g. ogr2ogr
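
Delegating the heavy lifting to a native program can be as simple as building a command line and running it as a subprocess. The sketch below is an assumption about how such a call might look, not Stetl's own output component; the function names and file paths are hypothetical, while `-f` and `-overwrite` are standard ogr2ogr options.

```python
import shutil
import subprocess


def build_ogr2ogr_command(src_path, dst_path, driver="ESRI Shapefile"):
    """Build the ogr2ogr argument list (the destination comes before the source)."""
    return ["ogr2ogr", "-overwrite", "-f", driver, dst_path, src_path]


def run_ogr2ogr(src_path, dst_path, driver="ESRI Shapefile"):
    """Run ogr2ogr if it is installed; return its exit code, or None if absent."""
    if shutil.which("ogr2ogr") is None:
        return None
    return subprocess.call(build_ogr2ogr_command(src_path, dst_path, driver))


# Hypothetical paths: convert a GML file to a Shapefile
command = build_ogr2ogr_command("output/cities.gml", "output/cities.shp")
```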


Example Components

Input        Filters            Output
XMLFile      XSLT               GMLFile
ogr2ogr      XMLAssembler       ogr2ogr
LineStream   XMLValidator       WFS-T
deegree*     FeatureExtractor   deegree*
YourInput    YourFilter         YourOutput

Example: XsltFilter (Python)

    from util import Util, etree
    from filter import Filter
    from packet import FORMAT

    log = Util.get_log("xsltfilter")

    class XsltFilter(Filter):
        # Constructor
        def __init__(self, configdict, section):
            Filter.__init__(self, configdict, section, consumes=FORMAT.etree_doc, produces=FORMAT.etree_doc)

            self.xslt_file_path = self.cfg.get('script')
            self.xslt_file = open(self.xslt_file_path, 'r')
            # Parse XSLT file only once
            self.xslt_doc = etree.parse(self.xslt_file)
            self.xslt_obj = etree.XSLT(self.xslt_doc)
            self.xslt_file.close()

        def invoke(self, packet):
            if packet.data is None:
                return packet
            return self.transform(packet)

        def transform(self, packet):
            packet.data = self.xslt_obj(packet.data)
            log.info("XSLT Transform OK")
            return packet

Your Own Components

Step 1 - Define Class

    class MyFilter(Filter):
        # Constructor
        def __init__(self, configdict, section):
            Filter.__init__(self, configdict, section, consumes=FORMAT.etree_doc, produces=FORMAT.etree_doc)

        def invoke(self, packet):
            log.info("CALLING MyFilter OK!!!!")
            return packet

Step 2 - Config Class

    [etl]
    chains = input_xml_file|my_filter|output_std

    [input_xml_file]
    class = inputs.fileinput.XmlFileInput
    file_path = input/cities.xml

    # My custom component
    [my_filter]
    class = my.myfilter.MyFilter

    [output_std]
    class = outputs.standardoutput.StandardXmlOutput
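
The `class = my.myfilter.MyFilter` option names a class by its dotted path, so a framework can load components it has never seen before. One common way to do that in Python is sketched below; this is an assumption about the general technique, not a copy of Stetl's loader, and `load_class` and `create_component` are hypothetical names.

```python
import importlib


def load_class(class_path):
    """Resolve a dotted path such as 'package.module.ClassName' to a class."""
    module_name, _, class_name = class_path.rpartition(".")
    module = importlib.import_module(module_name)
    return getattr(module, class_name)


def create_component(class_path, *args, **kwargs):
    """Instantiate the class named in a config section's 'class' option."""
    return load_class(class_path)(*args, **kwargs)


# A standard-library class stands in for 'my.myfilter.MyFilter' here
ordered = create_component("collections.OrderedDict", [("a", 1)])
```

For a custom component this only works if its package (here `my`) is importable, for example because it is on the Python path.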

Data Structures

• Components exchange Packets
• Packet contains data and status
• Data formats, e.g.:
  xml_line_stream
  etree_doc
  etree_element (feature)
  etree_element_array
  string
  any
  ...
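
Declaring what each component consumes and produces makes it possible to check a chain before any data flows. The sketch below illustrates that idea only: the format names come from the slide, but the `Packet` fields, the `Component` class and both helper functions are hypothetical, not Stetl's data structures.

```python
from dataclasses import dataclass
from typing import Any

ANY = "any"  # format name from the slide: matches every other format


@dataclass
class Packet:
    """Hypothetical packet: data plus a format tag and an assumed status flag."""
    data: Any = None
    format: str = ANY
    end_of_stream: bool = False


@dataclass
class Component:
    """Hypothetical component description with its declared formats."""
    name: str
    consumes: str = ANY
    produces: str = ANY


def is_compatible(upstream, downstream):
    """True if the downstream component can consume what upstream produces."""
    if ANY in (upstream.produces, downstream.consumes):
        return True
    return upstream.produces == downstream.consumes


def check_chain(components):
    """Return the first incompatible (upstream, downstream) name pair, or None."""
    for up, down in zip(components, components[1:]):
        if not is_compatible(up, down):
            return (up.name, down.name)
    return None


reader = Component("input_xml_file", produces="etree_doc")
xslt = Component("transformer_xslt", consumes="etree_doc", produces="etree_doc")
lines = Component("line_writer", consumes="xml_line_stream")

ok_result = check_chain([reader, xslt])
bad_result = check_chain([reader, xslt, lines])
```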

deegree Integration

• Input: DeegreeBlobstoreInput
• Output: DeegreeBlobstoreOutput, DeegreeFSLoaderOutput, WFSTOutput

Cases - The Netherlands

• INSPIRE Download Services
  publish to deegree store (WFS)
  generate GML files (for Atom Feed)

• National GML Datasets
  GML to PostGIS (Top10NL, BGT)

Top10NL Extract

Parameter Substitution

    [etl]
    chains = input_sql_pre|schema_name_filter|output_postgres, input_big_gml_files|xml_assembler|transformer_xslt|output_ogr2ogr, input_sql_post|schema_name_filter|output_postgres

    # Pre SQL file inputs to be executed
    [input_sql_pre]
    class = inputs.fileinput.StringFileInput
    file_path = sql/drop-tables.sql,sql/create-schema.sql

    # Post SQL file inputs to be executed
    [input_sql_post]
    class = inputs.fileinput.StringFileInput
    file_path = sql/delete-duplicates.sql

    # Generic filter to substitute Python-format string values like {schema} in string
    [schema_name_filter]
    class = filters.stringfilter.StringSubstitutionFilter
    # format args {schema} is schema name
    format_args = schema:{schema}

    [output_postgres]
    class = outputs.dboutput.PostgresDbOutput
    database = {database}
    host = {host}
    port = {port}
    user = {user}
    password = {password}
    schema = {schema}

    # The source input file(s) from dir and produce gml:featureMember elements
    [input_big_gml_files]
    class = inputs.fileinput.XmlElementStreamerFileInput
    file_path = {gml_files}
    element_tags = featureMember
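
The `{schema}`-style placeholders in the config are Python format strings, so the substitution itself can be illustrated with `str.format`. This is a sketch under assumptions: `substitute` and `parse_format_args` are hypothetical helpers, the `key:value` syntax is inferred from the single `format_args` line shown, and the SQL text and schema name are made up.

```python
def substitute(template, **params):
    """Fill Python-format placeholders such as {schema} in a string."""
    return template.format(**params)


def parse_format_args(arg_string):
    """Turn 'key:value key2:value2' pairs into a dict (assumed syntax)."""
    return dict(pair.split(":", 1) for pair in arg_string.split())


# Made-up SQL standing in for the contents of a file like sql/create-schema.sql
sql_template = "CREATE SCHEMA {schema}; SET search_path TO {schema},public;"

# First the config value itself is filled in, then it drives the SQL substitution
args = parse_format_args(substitute("schema:{schema}", schema="top10nl"))
sql = substitute(sql_template, **args)
```

Keeping database names and credentials out of the config this way lets one config file serve several datasets and environments.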

Top10NL+BAG (Dutch Topo + Buildings)

BGT - Dutch Large Scale Topo

Cases - INSPIRE Transforms

•Simple: Dutch Admin Borders to AU

•Advanced: Dutch Addresses to AD

INSPIRE - XSLT STRUCTURE

Theme CP:
• Local CP GML to INSPIRE Spatial Dataset
• Local CP GML to INSPIRE GML
• Generate CP INSPIRE GML

Theme AU:
• Local AU GML to INSPIRE Spatial Dataset
• Local AU GML to INSPIRE GML
• Generate AU INSPIRE GML

Theme GN:
• Local GN GML to INSPIRE Spatial Dataset
• Local GN GML to INSPIRE GML
• Generate GN INSPIRE GML

Reusable XSLT Scripts (called by all)

Legend: Locally Specific XSL, Generic XSL, XSLT Template Call

XSLT - 3 MAIN STEPS/SCRIPTS

1. Generate Spatial Dataset GML Container (specific)

2. Extract data values from local OGR simple feature data (specific)

3. Call XSLT template per Theme Feature type (generic)
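
The split between locally specific and generic steps can be shown with a Python analogue of the three steps. Note that this swaps XSLT for plain Python with `xml.etree.ElementTree` purely to show the structure; the element names are simplified stand-ins rather than the real INSPIRE schema, and the input features are made-up sample data.

```python
import xml.etree.ElementTree as ET


def generate_container():
    """Step 1 (specific): create the dataset container element."""
    return ET.Element("SpatialDataSet")


def extract_values(local_feature):
    """Step 2 (specific): pull values out of a local simple feature."""
    return {"name": local_feature.findtext("gemeentenaam"),
            "code": local_feature.findtext("code")}


def administrative_unit(values):
    """Step 3 (generic): build a theme feature from plain values."""
    unit = ET.Element("AdministrativeUnit")
    ET.SubElement(unit, "nationalCode").text = values["code"]
    ET.SubElement(unit, "name").text = values["name"]
    return unit


# Made-up local simple features
local = ET.fromstring(
    "<features>"
    "<gemeente><gemeentenaam>Utrecht</gemeentenaam><code>0344</code></gemeente>"
    "<gemeente><gemeentenaam>Amersfoort</gemeentenaam><code>0307</code></gemeente>"
    "</features>")

dataset = generate_container()
for feature in local.findall("gemeente"):
    member = ET.SubElement(dataset, "member")
    member.append(administrative_unit(extract_values(feature)))

result = ET.tostring(dataset, encoding="unicode")
```

Only steps 1 and 2 know the local data model; step 3 takes plain values, so it can be reused for any country's source data.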

XSLT AU - STEP 1

XSLT AU - STEP 2

XSLT AU - STEP 3

XSLT - REUSE

STETL CONFIG

STETL CONFIG AD

Case: INSPIRE DL Services - Dutch Addresses

[Diagram labels: Source <GML> (Dutch Addresses + Buildings), NLExtract, Stetl, deegree blobstore, deegree WFS, INSPIRE <GML>, Atom Feed, INSPIRE Addresses]

Other Uses (Geocoder etc)

Project Status - Sept 21, 2013

• v1.0.4 installable via PyPI
• Documentation on www.stetl.org
• Real world transforms done
• Seeking feedback, support and contributors

Rich GML Problem Solved?