Running long and complex processes with PostGIS · 2010.foss4g.org/presentations/3548.pdf · 2010. 11....

Running long and complex processes with PostGIS

Vincent Picavet FOSS4G 2010 - Barcelona

Oslandia, who's that ?

Oslandia

Young French SME specialised in Open Source GIS

PostGIS experts: Vincent Picavet & Olivier Courtin

Mainly focuses on:
- Spatial databases (PostGIS, SpatiaLite)
- OGC, ISO, INSPIRE standards and SDI architecture
- Complex analysis: routing, network and graph solutions

Oslandia's ecosystem:

Oslandia's Technologies

3D GDAL GEOS

GRASS GraphServer INSPIRE MapServer

OGC PgRouting PostGIS

PostgreSQL Spatialite TinyOWS

TileCache PyWPS QGIS

Oslandia, Find us at FOSS4G

Running long and complex processes with PostGIS

Vincent Picavet, Wednesday - 12h00 – Sala 6

PostGIS meets the third dimension

Olivier Courtin, Wednesday - 12h30 – Sala 6

State of the Art of FOSS4G for Topology and Network Analysis

Vincent Picavet, Thursday – 14h30 – Sala 5

Breakout Session: Spatial Databases

Code Sprint on Friday: PostGIS

Oslandia : Bronze Sponsor

What you'll see and do next

Step 1 : Use case presentation
Step 2 : Special use characteristics
Step 3 : Issues and solutions
Step 4 : Conclusion
Step 5 : Perspectives

Step 6 : Stay here for Olivier's presentation
Step 7 : Run for lunch

Step 1 : Use case

Use case

Road network data (TA) + custom client data linked to the network
- Initial network data imported in 2004
- Parallel evolution during 4 years
  - Client modified road network data
  - TA modified road network data

No ID stability on TA data → data de-synchronization

Same-same, but different

Use case

Figure: custom data in red over the rasterized road network background; 2004 on the left, 2008 on the right.

Desynchronization

Re-synch custom data with up-to-date network
- Graph pairing = match network streets, nodes, road sections
- Re-link or rebuild custom data on new network
- Have a full road network data update process

Automate this process
- Enable fully automated and regular data updates

Process

Our process
- Load data
- Graph pairing modules (nodes, streets, sections)
  - Semantic, topological and geometrical subprocesses
- Export output data

Facts & numbers

Our data set
- 70% of the French population (~40M)
- 50 tables
- 10M rows
- 150 GB at end of process
- 30K lines of SQL and plpgsql
- 3000 queries, 6000 lines of Python

Our dev team
- 3 MapInfo users and 1 PostGIS expert

Results

2004 → 2008 : 70% road sections pairing 93% custom data pairing

2008 → 2009 : 99% road sections pairing 99.95% custom data pairing

Fewer differences between networks; custom data have been cleaned

Step 2 : Characteristics

Use case characteristics

«ELT» : Extract, Load, Transform PostgreSQL + PostGIS + external tools

Big volumes Long, heavy and complex computation process

Global production time ~ 20 days Pairing : 5 days

Long SQL transactions

Step 3 : Issues and solutions

Issues and solutions

#1 − Hardware and server configuration
#2 − Testing
#3 − Monitoring
#4 − Dealing with corner cases
#5 − Splitting process
#6 − Stability
#7 − Optimization
#8 − Process improvement

Almost all of this is linked to the way you design your process.

#1 − Hardware and configuration

Suitable hardware is essential
- Buy RAM
- Buy more RAM
- Buy more RAM
- Buy disks
- Buy faster disks

Server configuration is hard
- System monitoring
- Depends on the process
- Dynamic configuration

Fine-tune according to query plan
- Needs experience
- Needs testing

And use it !

#2 − Testing

Testing for correctness
- OK on a sample for development
- Corner case problems on full data

Testing for performance
- Meaningless on samples
- Very long on real data

Solutions
- Split process
- «Unit test» modules
- Guess and oversize everything

#3 − Monitoring, validating – MVCC

MVCC = Multi-Version Concurrency Control
→ concurrent access on data
→ transactions isolated until committed
→ no easy way to access a running transaction

Use smaller transactions

Sequence monitoring : sequences live outside of MVCC
- nextval('myseq') in the query
- currval('myseq') gets the progression
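As a hedged illustration of the sequence trick: the long-running query calls nextval('myseq') once per processed row, and because sequences are not transactional, the counter is visible from outside the running transaction. Note that in PostgreSQL currval() only reports the value for the session that itself called nextval(), so a separate monitoring session would usually read the sequence's last_value column instead. The sketch below assumes a DB-API connection (psycopg2, for example), a sequence named myseq that starts at 1, and a known total row count; read_progress is a hypothetical helper, not part of the original process.

```python
# Query run from a *separate* monitoring session. Reading last_value works
# across sessions because sequences are not subject to MVCC isolation.
MONITOR_SQL = "SELECT last_value FROM myseq"


def read_progress(conn, total_rows, start_value=1):
    """Return the approximate fraction (0.0 to 1.0) of rows processed so far.

    conn        -- any DB-API connection (assumption: e.g. psycopg2)
    total_rows  -- number of rows the long-running query will touch
    start_value -- first value handed out by the sequence (assumed 1)
    """
    cur = conn.cursor()
    cur.execute(MONITOR_SQL)
    last_value = cur.fetchone()[0]
    cur.close()
    # Number of nextval() calls made so far, clamped to a sane range.
    done = max(0, last_value - start_value + 1)
    return min(1.0, done / total_rows)
```

The figure is only approximate: sequence caching and a sequence that has never been called can both shift the count by a few rows.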

#3 − Monitoring and validating

System monitoring
- Memory, disk access
- Shows process stability and steps

Post-process monitoring and validation
- Log analysis
- Validation processes on result tables
- Statistics on result tables

Intra-process monitoring and validation
- Split process

#4 − Corner cases - issue

Computation with geometry is not an exact science

<= Data error & imprecision
<= Floating point model limits
<= Robustness of algorithms
<= Error propagation

#4 − Corner cases - issue

99.999999% success
1M rows

Every additional «9» costs a lot more than the previous one
- Performance-wise, code complexity-wise

Success rate drops with computation complexity
<= Error propagation

→ Impossible to predict all corner cases

1 geometry computation error → transaction fails !
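To make the compounding concrete, here is a small arithmetic sketch. It assumes failures are independent from one row to the next, which real data does not guarantee, and the function names are illustrative only. Since a single error aborts the whole transaction, the chance that a transaction completes is the per-operation success rate raised to the number of operations it performs.

```python
def p_transaction_ok(success_rate, n_rows, ops_per_row=1):
    """Probability that every operation in one transaction succeeds.

    Assumes independent failures: one failing geometry operation is
    enough to abort the whole transaction.
    """
    return success_rate ** (n_rows * ops_per_row)


def expected_failures(success_rate, n_rows, ops_per_row=1):
    """Expected number of failing operations over the whole run."""
    return (1.0 - success_rate) * n_rows * ops_per_row


# Figures from the slide: 99.999999% success, 1M rows.
p_single_pass = p_transaction_ok(0.99999999, 1_000_000)
# Same rate, but 10M rows with 10 chained geometry operations each.
p_chained = p_transaction_ok(0.99999999, 10_000_000, ops_per_row=10)
```

Under these assumptions a single pass over 1M rows completes about 99% of the time, while 10M rows with ten chained operations each complete only about 37% of the time, which is one way to read "success rate drops with computation complexity".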

#4 − Corner cases – Actual Solutions

Split process in chunks

Preprocess and simplify data
- Snap to grid (= reduce input precision)
- Simplify

Catch errors to ignore them
- Using exception catching in plpgsql
- Not precise enough (catch all)
- Less stability
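A minimal pure-Python sketch of what snapping to a grid does, assuming plain (x, y) tuples rather than real geometries. PostGIS offers ST_SnapToGrid for this purpose; the talk does not say exactly how it was called, so the helper and grid size below are illustrative assumptions.

```python
def snap_to_grid(coords, size):
    """Round every (x, y) pair to the nearest multiple of `size`."""
    return [(round(x / size) * size, round(y / size) * size) for x, y in coords]


# Two vertices that should coincide but differ by floating point noise.
a = [(0.1 + 0.2, 1.0)]
b = [(0.3, 1.0)]

coincide_before = a == b  # False: 0.1 + 0.2 is not exactly 0.3
coincide_after = snap_to_grid(a, 0.001) == snap_to_grid(b, 0.001)  # True
```

Reducing input precision this way trades a little accuracy for vertices that compare equal, which removes one source of corner cases before the heavier computations run.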

#4 − Corner cases – Potential solutions

Finely handle errors
- Specific exceptions
- Discuss use cases to decide returning NULL or error

Change floating point models
- Enable custom FP models (in JTS and GEOS, not PostGIS)
- Dynamic floating point precision model
- Exact computation (costs a lot)

More robust algorithms

#5 − Split your process

Split computations
Split data
Not possible in plpgsql <= no nested transactions
Needs a process driver

Python is our driver. It enables:
- Intra-process operations (backup, validate, stats, monitor…)
- Partial computation & differential updates
- Parallel computation

Python driver =>

#5 − Split your process
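The driver idea can be sketched as follows: an external script cuts the id range into chunks and runs each chunk in its own transaction, so one failing geometry only costs that chunk. This is a hedged illustration, not Oslandia's actual driver. It uses Python's built-in sqlite3 purely as a stand-in so the example runs anywhere; against PostgreSQL the connection would come from a driver such as psycopg2 and the placeholders would differ. The table names, the NOT NULL constraint used to provoke a failure, and the helper names are all hypothetical.

```python
import sqlite3


def id_chunks(min_id, max_id, size):
    """Yield inclusive (lo, hi) id ranges covering [min_id, max_id]."""
    lo = min_id
    while lo <= max_id:
        hi = min(lo + size - 1, max_id)
        yield lo, hi
        lo = hi + 1


def run_in_chunks(conn, sql, min_id, max_id, size):
    """Run `sql` once per id range, one transaction per chunk.

    Returns (lo, hi, error message) for every chunk that failed, so those
    chunks can be inspected or retried without redoing the rest.
    """
    failed = []
    for lo, hi in id_chunks(min_id, max_id, size):
        try:
            with conn:  # commits on success, rolls back on exception
                conn.execute(sql, (lo, hi))
        except sqlite3.Error as exc:
            failed.append((lo, hi, str(exc)))
    return failed


def demo():
    """Ten rows, one of which makes its chunk fail."""
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE section (id INTEGER PRIMARY KEY, length REAL)")
    conn.execute(
        "CREATE TABLE result (id INTEGER PRIMARY KEY, inv_length REAL NOT NULL)"
    )
    rows = [(i, 0.0 if i == 7 else float(i)) for i in range(1, 11)]
    conn.executemany("INSERT INTO section VALUES (?, ?)", rows)
    conn.commit()
    # Row 7 has length 0: SQLite yields NULL for 1.0 / 0, which violates
    # the NOT NULL constraint and aborts the chunk containing it.
    failed = run_in_chunks(
        conn,
        "INSERT INTO result SELECT id, 1.0 / length FROM section "
        "WHERE id BETWEEN ? AND ?",
        1, 10, 3,
    )
    done = conn.execute("SELECT count(*) FROM result").fetchone()[0]
    return failed, done
```

Between chunks the driver is free to back up, validate, collect statistics or dispatch chunks to parallel workers, which is what a single plpgsql function running inside one transaction cannot do.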

#6 − Stability

Memory management in PG is smart
- Memory allocated and freed per transaction context
- PostGIS uses it, not GEOS

Longer transactions, some GEOS memory leaks and catching geometric errors all increase instability

Use a recent PostgreSQL release
Do. Not. Use. Windows. Servers. Ever. (we did)

#7 − Optimizing

Indexes
- Necessary for geometric operations
- Must be finely tuned
- Drop, modify, recreate (automated in plpgsql)

Constraints
- Same : drop, modify, recreate
- Or replace by validation steps

Maintenance
- Vacuum vs autovacuum

Quit plpgsql
- PostgreSQL C modules are fun ! − and efficient
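The drop, modify, recreate pattern can be sketched from the driver side as a context manager. The talk automated this in plpgsql; the Python version below is a swapped-in illustration, and the index name, table name and `execute` callable are hypothetical.

```python
from contextlib import contextmanager


@contextmanager
def index_dropped(execute, name, create_sql):
    """Drop an index around a bulk step, then rebuild it once at the end.

    execute    -- callable that runs one SQL statement (assumption: e.g.
                  the execute method of a DB-API cursor)
    name       -- name of the index to drop
    create_sql -- full CREATE INDEX statement used to rebuild it
    """
    execute(f"DROP INDEX IF EXISTS {name}")
    yield
    # Reached only if the bulk step succeeded. If it raised, nothing is
    # rebuilt here; when the DROP ran in the same PostgreSQL transaction,
    # rolling back restores the index because DDL is transactional.
    execute(create_sql)


# Usage with a recorder standing in for a real cursor.
statements = []
with index_dropped(
    statements.append,
    "section_geom_idx",
    "CREATE INDEX section_geom_idx ON section USING GIST (geom)",
):
    statements.append("-- bulk update of section.geom goes here")
```

Rebuilding the index once after the bulk step is usually cheaper than maintaining it row by row during a massive update, at the cost of running the step itself without index support.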

#8 − Process improvement

Less geometry computation
- More topology and attribute-based processes
- Base computation on input data

Fewer computation errors
- Less error propagation
- Use original cleaned data

Use PostGIS mainly for data preparation and for geometry rebuilding at the end

Step 4 : Conclusion

So what ?

It works ! Good results at the end

Ease of use for PostgreSQL/PostGIS newbie developers

With expert assistance on problematic points

Designing the process workflow carefully and thoroughly is the key

Step 5 : Perspectives

What more then ?

PostgreSQL improvement
- Hot standby => parallel work
- Nested transaction support ?
- Better autovacuum

In our case
- Horizontal process split effort
- Parallel processing
- Differential work

NoSQL «DB» ?
- Map/Reduce system

That's all folks !

Want to know more ?
Ask now or write to :

Vincent Picavet
vincent.picavet@oslandia.com

www.oslandia.com
