INCOFISH WP3 - Campinas, April 2006
WEB Tools and Data Cleaning
Alexandre Marino, marino@cria.org.br
Centro de Referência em Informação Ambiental, CrIA
WEB Tools and Data Cleaning
These tools were developed within the scope of the speciesLink project, so in some cases they depend entirely on the architecture, the local database and the libraries developed by CRIA.
Data Cleaning started as an idea without a very clear direction and became a highly specialized system.
The speciesLink project was funded by FAPESP (the São Paulo State research funding agency) from October 2001 to October 2005.
Figure: Different data sources, software and systems. Five collections (Col 1 to Col 5) use different platforms: Win2000/Brahms, Linux/MySQL, Win98/Access, Win98/Biota and FreeBSD/PostgreSQL.
Protocol and Content Schema
• DiGIR protocol (Distributed Generic Information Retrieval): potential to be globally accepted
• DiGIR software (Java Portal & PHP Provider): collaborative development
• DarwinCore v.2: covers the basic content elements (taxonomic identification, location and date of the collecting event)
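To make the three basic content elements concrete, here is a minimal Python sketch of a record that carries them, plus a check that all are present. The field names (`catalog_number`, `scientific_name`, `latitude`, `longitude`, `collected_on`) and the sample values are hypothetical simplifications, not the official DarwinCore v.2 element names.

```python
from dataclasses import dataclass
from datetime import date
from typing import Optional


@dataclass
class OccurrenceRecord:
    """Simplified occurrence record; field names are hypothetical, not DwC v.2 names."""
    catalog_number: str
    scientific_name: Optional[str]  # taxonomic identification
    latitude: Optional[float]       # location, decimal degrees
    longitude: Optional[float]
    collected_on: Optional[date]    # date of the collecting event

    def has_basic_elements(self) -> bool:
        """True only when identification, location and date are all present."""
        return (
            bool(self.scientific_name)
            and self.latitude is not None
            and self.longitude is not None
            and self.collected_on is not None
        )


# Made-up example values, for illustration only.
complete = OccurrenceRecord("A-001", "Trachurus trachurus", -23.5, -45.1, date(1999, 3, 12))
no_date = OccurrenceRecord("A-002", "Pteroscion pele", -23.6, -45.2, None)
print(complete.has_basic_elements(), no_date.has_basic_elements())  # True False
```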
Figure: System's architecture. The speciesLink site (presentation layer) is built on the DiGIR Portal (Java) and Perl. Collection A, with fast and stable connectivity, serves its data directly through a PHP provider on top of its collection management system (SQL). Collections B and C, with slow or unstable connectivity, send data from their collection management systems and data repositories through a SOAP client to a SOAP server on a mirror server, which stores it in PostgreSQL and serves it through a PHP provider.
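The split by connectivity can be pictured as a routing rule on the portal side. This is a minimal Python sketch under stated assumptions: the `Collection` type, the `query_endpoint` function and all URLs are hypothetical, and the slides do not describe the real portal logic.

```python
from dataclasses import dataclass

# Hypothetical endpoint; the real speciesLink URLs are not given in the slides.
MIRROR_PROVIDER_URL = "http://mirror.example.org/digir"


@dataclass
class Collection:
    name: str
    stable_link: bool       # True: fast and stable connectivity
    provider_url: str = ""  # the collection's own provider, used only when stable


def query_endpoint(col: Collection) -> str:
    """Pick where a query for this collection should be sent.

    Stable collections answer from their own provider; collections with slow or
    unstable links are answered from the copy their SOAP client pushed to the mirror.
    """
    if col.stable_link and col.provider_url:
        return col.provider_url
    return MIRROR_PROVIDER_URL


collections = [
    Collection("Collection A", True, "http://col-a.example.org/digir"),
    Collection("Collection B", False),
    Collection("Collection C", False),
]
for c in collections:
    print(c.name, "->", query_endpoint(c))
```

The design point is that the portal never waits on an unreliable link at query time: slow collections are queried through the mirror copy instead.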
speciesLink network, March 2006: ~40 connected collections and ~940,000 on-line records (map labels include JBRJ).
WEB Tools
• geoLoc
• spOutlier
• infoXY
• conversor
• speciesMapper
• data cleaning
About geoLoc
• Assists biological collections in georeferencing their data
• Its database includes approximately 110,000 names of Brazilian localities, obtained from:
  - Brazilian Institute of Geography and Statistics (IBGE)
  - GEOnet Names Server (GNS)
  - speciesLink/FAPESP
• Uses an algorithm based on concepts from the Egaz program (Shattuck 1997) that calculates a coordinate from a distance and direction
Example: 26, Noroeste-NW (northwest), Campinas, São Paulo
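geoLoc's own algorithm follows the Egaz program, which is not reproduced here. As a stand-in, the sketch below uses the standard great-circle "destination point" formula on a spherical Earth to turn a locality, a distance and a compass direction into a coordinate. It assumes the distance is in kilometres and uses an approximate, illustrative coordinate for Campinas rather than the gazetteer value; the function and constant names are hypothetical.

```python
import math

EARTH_RADIUS_KM = 6371.0  # mean Earth radius; a spherical model is an assumption

# Eight-point compass rose, bearings in degrees clockwise from north.
BEARINGS = {"N": 0, "NE": 45, "E": 90, "SE": 135, "S": 180, "SW": 225, "W": 270, "NW": 315}


def destination(lat, lon, distance_km, bearing_deg):
    """Point reached by travelling distance_km from (lat, lon) along a great circle."""
    phi1, lam1 = math.radians(lat), math.radians(lon)
    theta = math.radians(bearing_deg)
    delta = distance_km / EARTH_RADIUS_KM  # angular distance in radians
    phi2 = math.asin(math.sin(phi1) * math.cos(delta)
                     + math.cos(phi1) * math.sin(delta) * math.cos(theta))
    lam2 = lam1 + math.atan2(math.sin(theta) * math.sin(delta) * math.cos(phi1),
                             math.cos(delta) - math.sin(phi1) * math.sin(phi2))
    lon2 = (math.degrees(lam2) + 540) % 360 - 180  # normalise to [-180, 180)
    return math.degrees(phi2), lon2


def haversine_km(lat1, lon1, lat2, lon2):
    """Great-circle distance on the same sphere, used to check the result."""
    p1, p2 = math.radians(lat1), math.radians(lat2)
    dphi, dlam = p2 - p1, math.radians(lon2 - lon1)
    a = math.sin(dphi / 2) ** 2 + math.cos(p1) * math.cos(p2) * math.sin(dlam / 2) ** 2
    return 2 * EARTH_RADIUS_KM * math.asin(math.sqrt(a))


# Approximate centre of Campinas (illustrative, not from the geoLoc database).
CAMPINAS = (-22.9056, -47.0608)
lat, lon = destination(*CAMPINAS, 26, BEARINGS["NW"])
print(f"{lat:.4f}, {lon:.4f}")
```

Measuring the distance back to Campinas with the haversine formula on the same sphere returns 26 km, which is a quick sanity check on the direction and distance handling.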
About spOutlier
• Assists biological collections in identifying possible suspect points in existing records
• Uses techniques modified from Chapman 1999 to detect outliers in latitude, longitude and altitude
• Allows users to indicate their data set as either terrestrial or marine
• Useful to biologists around the world who wish to identify possible errors in their data
id, longitude, latitude, altitude
1, -63.25, -4.916666667, 795
2, -67.05, -10.96666667, 805
3, -68.0125, -12.66666667, 809
4, -68.75, -13.60111111, 815
5, -68.9102, -13.83333, 810
6, -72.3666, -14.36611111, 790
7, -78.3166, -14.38916667, 801
8, -72.137, -11.8647, 700
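The slides do not detail the Chapman (1999) techniques that spOutlier adapts. To show the general idea of flagging outliers column by column, the sketch below swaps in a different, widely used rule, the modified z-score (median and median absolute deviation, threshold 3.5), and applies it to the records above. The function name is hypothetical.

```python
import statistics

# Records from the slide above: (id, longitude, latitude, altitude).
RECORDS = [
    (1, -63.25, -4.916666667, 795),
    (2, -67.05, -10.96666667, 805),
    (3, -68.0125, -12.66666667, 809),
    (4, -68.75, -13.60111111, 815),
    (5, -68.9102, -13.83333, 810),
    (6, -72.3666, -14.36611111, 790),
    (7, -78.3166, -14.38916667, 801),
    (8, -72.137, -11.8647, 700),
]


def modified_z_outliers(values, threshold=3.5):
    """Indices whose modified z-score 0.6745 * |x - median| / MAD exceeds threshold."""
    med = statistics.median(values)
    mad = statistics.median([abs(v - med) for v in values])
    if mad == 0:
        return []  # at least half the values equal the median: score undefined
    return [i for i, v in enumerate(values) if 0.6745 * abs(v - med) / mad > threshold]


results = {}
for column, name in ((1, "longitude"), (2, "latitude"), (3, "altitude")):
    values = [record[column] for record in RECORDS]
    results[name] = [RECORDS[i][0] for i in modified_z_outliers(values)]

print(results)  # {'longitude': [], 'latitude': [1], 'altitude': [8]}
```

On these eight records the rule flags record 8 for altitude and record 1 for latitude. Like spOutlier, it only marks points as suspect; deciding whether they are errors is left to the curator.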
marine
id, longitude, latitude
1, -63.25, -4.91667
2, 34.3239, 67.9836
aus, 150.0417, -34.9081
3, -68.0125, -12.6667
4, -22.0400, 63.9514
id_teste, -45, -22
6, -75.3667, -14.3661
7, 71.37, -19.37
eua, -80.8011, 26.0506
9, -120.7642, 58.7217
10, 26.0089, -29.5197
11, -95.3781, 16.7639
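For a data set declared marine, a point on land is suspect, and for a terrestrial data set a point at sea is suspect. A minimal sketch of that check uses even-odd ray casting against land polygons. The single rectangular "land" polygon and the `sea_pt` point are toy, hypothetical data standing in for a real coastline layer; only the `id_teste` coordinates come from the list above, and the slides do not say how spOutlier actually performs this check.

```python
def point_in_polygon(lon, lat, polygon):
    """Even-odd ray casting: cast a ray towards +longitude and count edge crossings."""
    inside = False
    n = len(polygon)
    for i in range(n):
        x1, y1 = polygon[i]
        x2, y2 = polygon[(i + 1) % n]
        if (y1 > lat) != (y2 > lat):  # edge straddles the ray's latitude
            x_cross = x1 + (lat - y1) * (x2 - x1) / (y2 - y1)
            if lon < x_cross:
                inside = not inside
    return inside


def environment_suspects(points, land_polygons, environment):
    """Ids whose land/sea position contradicts the declared environment."""
    suspects = []
    for pid, lon, lat in points:
        on_land = any(point_in_polygon(lon, lat, poly) for poly in land_polygons)
        if (environment == "marine" and on_land) or (environment == "terrestrial" and not on_land):
            suspects.append(pid)
    return suspects


# Toy "land" rectangle as (lon, lat) vertices; a real check would use a coastline layer.
TOY_LAND = [[(-50, -30), (-40, -30), (-40, -20), (-50, -20)]]
points = [("id_teste", -45, -22), ("sea_pt", -35, -25)]  # sea_pt is hypothetical

print(environment_suspects(points, TOY_LAND, "marine"))       # ['id_teste']
print(environment_suspects(points, TOY_LAND, "terrestrial"))  # ['sea_pt']
```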
Input/Output:
- degrees, min, sec
- decimal degrees
- UTM
DATUM:
- WGS84 (World)
- SAD69 (Brazil)
- Córrego Alegre (SP)
Example input (decimal degrees, longitude , latitude):
-3.5800 , 52.0633
34.3239 , 67.9836
-45 , -22
Example output (degrees, min, sec):
03d34'47"W , 52d3'47"N
34d19'23"E , 67d59'0"N
44d59'58"W , 21d59'58"S
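The notation part of the conversion (decimal degrees to and from degrees, minutes, seconds) can be sketched in a few lines of Python. This sketch covers neither UTM nor datum transformations between WGS84, SAD69 and Córrego Alegre, so its output need not match the slide's examples to the second; the function names are hypothetical.

```python
def dd_to_dms(value: float, is_latitude: bool) -> str:
    """Decimal degrees -> string in the slide's 3d34'47"W style (seconds rounded)."""
    if is_latitude:
        hemi = "N" if value >= 0 else "S"
    else:
        hemi = "E" if value >= 0 else "W"
    total_seconds = round(abs(value) * 3600)  # round once, so seconds never reach 60
    degrees, rem = divmod(total_seconds, 3600)
    minutes, seconds = divmod(rem, 60)
    return f"{degrees}d{minutes}'{seconds}\"{hemi}"


def dms_to_dd(degrees: int, minutes: int, seconds: float, hemi: str) -> float:
    """Degrees, minutes, seconds and a hemisphere letter -> signed decimal degrees."""
    value = degrees + minutes / 60 + seconds / 3600
    return -value if hemi in ("S", "W") else value


print(dd_to_dms(-3.58, False))     # 3d34'48"W
print(dd_to_dms(-22, True))        # 22d0'0"S
print(dms_to_dd(3, 34, 47, "W"))   # about -3.5797
```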
Plot georeferenced points on a map.
Available layers:
- World
- South and Central America
- Brazil
- São Paulo State
Example input (longitude latitude):
-95.6 -39.5166
-70.2833 -4.2
-70.033333 -4.35
-69.914889 0.274694
-69.7333 -4.2333
-69.6661 -3.908333
...
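One simple way to place such points on a layer image is a plate carrée (equirectangular) mapping from longitude and latitude to pixels. The sketch below assumes that projection, a layer extent given as a bounding box, and hypothetical function names; the slides do not say how speciesMapper renders its layers.

```python
def parse_points(text):
    """Parse 'longitude latitude' pairs, one per line; other lines (e.g. '...') are skipped."""
    points = []
    for line in text.splitlines():
        parts = line.split()
        if len(parts) == 2:
            points.append((float(parts[0]), float(parts[1])))
    return points


def to_pixel(lon, lat, extent, width, height):
    """Plate carree mapping into a width x height image covering
    extent = (min_lon, min_lat, max_lon, max_lat); pixel y grows downwards."""
    min_lon, min_lat, max_lon, max_lat = extent
    x = (lon - min_lon) / (max_lon - min_lon) * width
    y = (max_lat - lat) / (max_lat - min_lat) * height
    return round(x), round(y)


WORLD = (-180.0, -90.0, 180.0, 90.0)  # hypothetical extent of a "World" layer
SAMPLE = """-95.6 -39.5166
-70.2833 -4.2
..."""

for lon, lat in parse_points(SAMPLE):
    print((lon, lat), "->", to_pixel(lon, lat, WORLD, 720, 360))
```

Switching to a regional layer (for example São Paulo State) would only change the extent tuple, which is why the extent is a parameter rather than a constant.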
Trachurus trachurus
Pteroscion pele
Gaidropsarus biscayensis
Figure: Using the tools. Diagram elements: data in PostgreSQL; spOutlier and geoLoc; a SOAP web service; jobs (job1, job2); maps in PostGIS.
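Submitting a job to a SOAP web service means wrapping the operation and its parameters in a SOAP envelope. The sketch below builds a SOAP 1.1 envelope with Python's standard library; the operation name `submitOutlierJob`, its parameters and the `urn:example:tools` namespace are hypothetical, since the slides do not describe the service interface.

```python
import xml.etree.ElementTree as ET

SOAP_ENV = "http://schemas.xmlsoap.org/soap/envelope/"  # SOAP 1.1 envelope namespace
SERVICE_NS = "urn:example:tools"  # hypothetical service namespace


def build_request(operation: str, params: dict) -> bytes:
    """Wrap a (hypothetical) tool operation and its parameters in a SOAP 1.1 envelope."""
    ET.register_namespace("soap", SOAP_ENV)
    envelope = ET.Element(f"{{{SOAP_ENV}}}Envelope")
    body = ET.SubElement(envelope, f"{{{SOAP_ENV}}}Body")
    op = ET.SubElement(body, f"{{{SERVICE_NS}}}{operation}")
    for name, value in params.items():
        ET.SubElement(op, f"{{{SERVICE_NS}}}{name}").text = str(value)
    return ET.tostring(envelope, encoding="utf-8")


request = build_request(
    "submitOutlierJob",
    {"environment": "marine", "records": "1,-63.25,-4.91667"},
)
print(request.decode("utf-8"))
```

In practice the envelope would be POSTed to the service URL with a `Content-Type: text/xml` header, and a job identifier (like job1 or job2 in the diagram) would come back in the response.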
About Data Cleaning
• Aims to help curators identify possible errors and standardize data
• Records are not modified
• The system only presents "suspect" records
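Consistent with the rule that records are not modified, a suspect-name check can simply report pairs of names for the curator. The sketch below flags names that differ only in case or spacing, or whose spellings are very similar (a `difflib` ratio of at least 0.9, an arbitrary threshold). The misspelled variants are hypothetical test data, and the `dc_tax` table in the slides may be populated quite differently.

```python
import difflib
from itertools import combinations


def normalise(name: str) -> str:
    """Collapse whitespace and lower-case, used only for comparison."""
    return " ".join(name.split()).lower()


def suspect_name_pairs(names, threshold=0.9):
    """Return (a, b, reason) for name pairs that look like variants of each other.

    The input list is never modified; suspects are only reported.
    """
    suspects = []
    for a, b in combinations(sorted(set(names)), 2):
        na, nb = normalise(a), normalise(b)
        if na == nb:
            suspects.append((a, b, "formatting"))
        elif difflib.SequenceMatcher(None, na, nb).ratio() >= threshold:
            suspects.append((a, b, "similar spelling"))
    return suspects


names = [
    "Trachurus trachurus",
    "trachurus  trachurus",   # hypothetical case/spacing variant
    "Trachurus trachurrus",   # hypothetical misspelling
    "Pteroscion pele",
    "Gaidropsarus biscayensis",
]
result = suspect_name_pairs(names)
for a, b, reason in result:
    print(f"{a!r} ~ {b!r}: {reason}")
```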
Figure: How Data Cleaning works. Diagram elements: national collections (Col 1 to Col n) and international collections; a local database (PostgreSQL); Perl scripts that detect suspect records; tables of suspect records (dc_tax, dc_geo; PostgreSQL); chart.pm (Perl); the web interface and the speciesLink Portal (Java).
Demonstration on-line