Geospatial Big Data: Software Architectures and the Role of APIs in ...

76
® Geospatial Big Data: Software Architectures and the Role of APIs in Standardized Environments Apache Big Data Europe 2016 Ingo Simonis Open Geospatial Consortium [email protected] slideset contains material from G. Percivall (OGC), R. Winterton (Pitney Bowes), J. Sanyal (ORNL), A. Asahara (Hitachi), J. Spinney (Pitney Bowes), T. Kolbe(TUM), Rob Emanuele (LocationTech) , ESA

Transcript of Geospatial Big Data: Software Architectures and the Role of APIs in ...

Page 1: Geospatial Big Data: Software Architectures and the Role of APIs in ...

®

Geospatial Big Data: Software Architectures and the Role of APIs in

Standardized EnvironmentsApache Big Data Europe 2016

[email protected]

slideset containsmaterialfromG.Percivall (OGC),R.Winterton (PitneyBowes),J.Sanyal (ORNL),A. Asahara (Hitachi),J.Spinney(PitneyBowes),T.Kolbe(TUM),RobEmanuele(LocationTech),ESA

Page 2: Geospatial Big Data: Software Architectures and the Role of APIs in ...

GeospatialData

• Spatial data is big data

• Apache projects are implementing geospatial functionalities

• Coordination of spatial implementations across Apache projects

• Open standards to increase interoperability and code reuse

• Architectures integrating Big Data Services and Geospatial Services

Page 3: Geospatial Big Data: Software Architectures and the Role of APIs in ...

EarthObservations

• BigEarthDataInitiative(BEDI)- Standardizingandoptimizingcollection,deliveryofU.S.Government’scivilEarthobservationdata.

• SentinelsatellitesoperatedbyESAintheframeworkoftheCopernicusprogramme fundedandmanagedbytheEuropeanCommission.

3

0.00

2.00

4.00

6.00

8.00

10.00

12.00

FY00 FY01 FY02 FY03 FY04 FY05 FY06 FY07 FY08 FY09 FY10 FY11 FY12 FY13

Volum

e(PB

s)

Multi-yearTotalArchiveVolume(PBs)Trend

TOWARDS A JRC EARTH OBSERVATION DATA AND PROCESSING PLATFORM

P. Soille1, A. Burger2, D. Rodriguez1, V. Syrris1, and V. Vasilev2

European Commission, Joint Research Centre (JRC)1Institute for the Protection and Security of the Citizens, Global Security and Crisis Management Unit

2Institute for Environment and Sustainability, Digital Earth and Reference Data Unit

ABSTRACTThe Copernicus programme of the European Union with itsfleet of Sentinel satellites operated by the European SpaceAgency are effectively making Earth Observation (EO) enter-ing the big data era. Consequently, most application projectsat continental or global scale cannot be addressed with con-ventional techniques. That is, the EO data revolution broughtin by Copernicus needs to be matched by a processing revolu-tion. Existing approaches such as those based on the process-ing of massive archives of Landsat data are reviewed and theconcept of the Joint Research Centre Earth Observation Dataand Processing platform is briefly presented.

Index Terms— Earth Observation, Sentinel, Copernicus,Infrastructure

1. INTRODUCTION

To date, the United States (U.S.) Government is the largestprovider of environmental and Earth system data in theworld1. A first data revolution happened in 2008 when theU.S. Geological Survey decided to release for free to the pub-lic its Landsat archive which is the worlds largest collectionof Earth imagery [11]. Still, the European Commission, withits ambitious Copernicus programme and associated Sentinelmissions (S1 to S6 satellite series) operated by the EuropeanSpace Agency and complemented by a range of contribut-ing missions, is on the way to become the main provider ofglobal EO data with a free, full, and open access data policy.With expected data volumes of 10 TB per day (when all Sen-tinel series will reach full operational capacity), data velocityhighlighted by the production of global coverage with repeattime as short as 2 days for Sentinel-3, and data variety result-ing from sensors in the optical and radar ranges at variousspatial, spectral, and temporal resolutions, the Copernicusprogramme is a game changer making EO data effectivelyentering the big data era [10]. Figure 1 shows the overallestimated data throughput for the Sentinel 1–3 missions com-pared to those delivered by the Landsat 8/MODIS satellites.

1http://dels.nas.edu/resources/static-assets/besr/miscellaneous/Stryker.pdf

Fig. 1. Yearly data flow estimates from Sentinel 1–3 (as-suming full operational capacity) compared to MODIS andLandsat 8 data flows.

Whilst the European Union (EU) is making EO enteringthe big data era in terms of data production, innovative de-velopments need to be pursued to fully exploit the potentialof the generated data whether for academic, institutional, orcommercial applications. This also applies to the Joint Re-search Centre where the current fragmented approach of EOdata storage and processing is no longer sustainable.

2. THE SENTINELS AND THE BIG EO DATA ERA

The evolution of the cumulative data produced by the Land-sat missions and Sentinel 1-2-3 with estimations until sum-mer 2018 is shown in Fig. 2. The underlying calculations arebased on the following assumptions and considerations:

1. The data volume of Landsat 1-6 missions is ⇠120 TB2;

2. The data volume of Landsat 7 and 8 is estimated fromthe relating metadata files provided by USGS3;

3. MODIS Terra and Aqua generate 70GB/day each [7];

2http://academic.emporia.edu/aberjame/remote/landsat/landsat.htm

3http://landsat.usgs.gov/metadatalist.php

Exploitation Platforms 2

Proc. of the 2016 conference onBig Data from Space (BiDS’16) doi: 10.2788/854791

65 Santa Cruz de Tenerife, Spain15–17 March 2016

Page 4: Geospatial Big Data: Software Architectures and the Role of APIs in ...

CommercialCloudHosting

• DigitalGlobe– EntireDGarchiveincloudin2016- 45PB- largestEOarchive– Harris/ENVIprocessing

• GoogleEarthEngine– 5PBstorage(Landsatandothers) - 800+LibraryFunctions– Limitingfactoristheabilitytopulldatafromanothercloudtosupportlocalprocessing.

• HexagonGeospatial– Cloudhosteddynamicinformationservice– AirBus archiveinAmazoncloudwithHexagonservices

Page 5: Geospatial Big Data: Software Architectures and the Role of APIs in ...

Geo-Enrichment

Allowsawidevarietyofdatasetstobeappendedtoadatarecord

oWhatarethepropertyattributesofthisinsuredproperty?oWhatdemographicgroupdoesthiscustomerbelongto?oWhatbusinessesareconnectedwiththisareaofpoornetworkcoverage?

Page 6: Geospatial Big Data: Software Architectures and the Role of APIs in ...

Geo-Analytics

Reducethecomplexityofbillionsoftransactionalrecordsbyassigningdatatogeographicbinsandaggregatingresults.

o Istheaverage4Gnetworkcoverageinthisareabetterthanacompetitor?

o Isthisdatapointinsideoroutsideofageo-fence?

Page 7: Geospatial Big Data: Software Architectures and the Role of APIs in ...

NetworkCoverageandPerformance

Connectedcarswillsend25gigabytesofdatatothecloudeveryhourimageby:http://barrachd.co.uk

Page 8: Geospatial Big Data: Software Architectures and the Role of APIs in ...

NetworkCoverageandPerformance

• Singleviewonnetwork:• Improvequalityofservice• Increasenetpromoterscores

• Enableacquisition• Reducechurn

PitneyBowes|Increasingthevalueofdatathroughlocationinsights|09/2016

Page 9: Geospatial Big Data: Software Architectures and the Role of APIs in ...

LayeredInformation

9

Demographics

Page 10: Geospatial Big Data: Software Architectures and the Role of APIs in ...

LayeredInformation

10

Demographics

Eachlayeroflocationdataaddsnewdetailsandadditionalinsight

Demographics

PointsofInterest

BusinessData

BuildingAttributes

AreaBoundaries

StreetNetworks

Page 11: Geospatial Big Data: Software Architectures and the Role of APIs in ...

AllDomains:Profitfromapreciseperspective

11

• Singleviewofrisk• Usagebasedinsurance• Frauddetection

Insurance

Page 12: Geospatial Big Data: Software Architectures and the Role of APIs in ...

PopulationDensityTables

RapidSettlementIdentificationandCharacterization

LandScanGlobalAmbientPopulationDistribution(~1km)

SettlementMapping

FacilityLevelPopulationDensity

AmbientandDay/NightPopulationDistribution(~90m)

Increaseinsp

atialand

tempo

ralresolution

LandScanHD

TopDo

wnandBo

ttomUp

Approach

PopulationDistributionandDynamicsModeling

Page 13: Geospatial Big Data: Software Architectures and the Role of APIs in ...
Page 14: Geospatial Big Data: Software Architectures and the Role of APIs in ...
Page 15: Geospatial Big Data: Software Architectures and the Role of APIs in ...

AddisAbaba,Ethiopia

• 2 Xeon Quad core 2.4GHz CPUs + 4 Tesla GPUs + 48GB

• Image analyzed (0.3m)• 40,000x40,000 pixels (800 sq. km)• RGB bands

• Overall accuracy 93%• Settlement class 89%• Non-settlement class 94%

• Total processing time• 27 seconds

Page 16: Geospatial Big Data: Software Architectures and the Role of APIs in ...

FireStation

MLDAddressFabric

WaterBodies

IncorporatedPlaces

MCDPointinPolygonPointinPolygon

FireStationWithMCDinformation

PointinPolygon

FireProtectionDataBundle

Complexspatialprocessing

Routing&othercomplexspatialprocessing

MLDAddressFabricWithMCD,Incorporated

Places,Nearest3FireStation,NearestWaterbodyinformation

MCD

MLDAddressFabricWithMCDinformation

MLDAddressFabricWithMCD&IncorporatedPlacesinformation

172MillionAddressestoClosest3FireStations&NearestWaterBoundary- 6hoursUsing10NodeElasticMapReduce

ComplexWorkFlows

Page 17: Geospatial Big Data: Software Architectures and the Role of APIs in ...

EcologyMapping

• 1 kmsq gridofUSeachwithninevariables,e.g.,daysbelowfreezing,amountofprecipitationingrowingseason

• Unsupervisedstatisticalmultivariateclustering

• Domains:tundra,prairie,alpine,andsoutheasternforest

A Groundbreaking Observatory to Monitor the Environment -- Pennisi 328 (5977): 418 -- Science

http://www.sciencemag.org/cgi/content/full/328/5977/418[5/10/2010 11:09:49 AM]

The united domains of America.Scientists divided the United Statesinto 20 ecological domains. Three siteswithin each domain will beinstrumented.

SOURCE: NEON

Sign In

More Information

More in CollectionsEcology

Scientific Community

Science and Policy

Related Jobs fromScienceCareers

Ecology

Environmental Science

to practice "ecoinformatics": the use of computers and software tools to integrate different types of information frommany locations. They will need to think about trends across a whole country instead of a single ecosystem. Thesuccess of NEON will depend in large part on whether they embrace or reject that new model.

Not everyone is pleased with how the project is set up. Some, like ecologist David Tilman of the University ofMinnesota, Twin Cities, lament the excision of an experiment to test the effects of global change. They say such anexperiment, deemed too expensive, is essential to obtaining timely answers about climate change. Others complainthat NEON won't be investing enough in the field sites that will host its instruments. There's also some concern thatecologists, untrained in the approach NEON is taking, won't use NEON's data. "Everybody still has some questionsbecause it's a new thing," says John Porter, an ecologist at the University of Virginia in Charlottesville who is notpart of NEON.

LTER on steroids

Monitoring a patch of land over time isn't a new idea for NSF. In 1980, it set up five U.S. sites, including one atNiwot Ridge, under the Long Term Ecological Research (LTER) Network that has grown to 26 sites, including twoin Antarctica and one off the Fiji Islands in the South Pacific. The $30-million-a-year program is widely considereda success, with findings on the effect of global warming on plant diversity, how forests could be overloaded byanthropogenic nitrogen, and the greater stability of diverse ecosystems.

But by the late 1990s, says Williams, "we realized there were limits to the LTER model." Each LTER was designedto answer questions posed by an individual investigator or a small team. Core activities, such as measuring primaryproductivity, were not a high priority, Williams acknowledges. "It was hard to integrate data [from different sites] andto do synthesis," he adds, because investigators followed different timetables and used different instruments.

At about the same time, Williams says, the community began to ask itself, "How do we grow ecology, and how dowe tap additional resources?" For NSF program managers, the goal was to fund construction of a large-scalebiology project without devouring their annual budgets, which nurture thousands of individual investigators(Science, 20 June 2003, p. 1869). Their models were the astronomy and geosciences communities, which havemanaged for decades to build costly instruments such as telescopes and ships without bankrupting their bread-and-butter programs. NSF already had a mechanism: Its budget included a special facilities account to financeconstruction of half a dozen projects at a time, with the understanding that NSF's research directorates would payfor operations and maintenance of those facilities from their annual budgets.

A series of workshops yielded a vision of NEON hailed by then-newly arrived NSF Director Rita Colwell, whoinserted the project into NSF's 2001 budget request to Congress. But the larger ecological community hadreservations. Congress also balked, wondering what particular scientific question NEON would be addressing.

In response to that resistance, NSF asked the American Institute of Biological Sciences to hold three town meetingsin 2002 and 2003. The resulting white paper called for a network of 17 sites in different biomes that, in turn, wouldbe linked to other research sites nearby. Each site was projected to cost $20 million to set up and $3 million a yearto operate.

Again, however, the community was divided. Although some people were excited, others wondered if ecologists,known for being independent, would take full advantage of NEON. To many, the program looked like "LTER onsteroids," says Williams. "It was not a good-enough plan."

Next up was an evaluation by the U.S. National Academies. Theresulting National Research Council (NRC) report endorsed NEON inprinciple but urged that the program be reoriented around six specificresearch questions, including biodiversity and land use. Eachquestion would be the focus of one observatory (Science, 26September 2003, p. 1828). "It forced us to look at large-scaleecological processes and large-scale drivers of change," says NSF'sElizabeth Blood. Nonetheless, Congress chose not to give NSFmoney in 2004 to begin construction, the third time in 4 years it hadpassed on funding NEON.

A plan takes shape

For NSF, the flaw in the NRC proposal was that the observatorieswere too independent to be considered a single entity. That featurewould preclude NEON from being funded by the agency's majorresearch equipment account. For NEON's supporters, the solution

Faculty Positions -Skeletal Muscle Biolog…University of IowaIowa City-IA-United States

FACULTY POSITION INMICROBIOLOGYEast Tennessee StateUniversity-United States

CHAIRUniversity of Florida-United States

POSTDOCTORALFELLOWSHIPPOSITIONSUniversity of MarylandSchool of Medicine-United States

FACULTY POSITION INMICROBIOLOGYUniversity of MinnesotaMinneapolis-MN-UnitedStates

POSTDOCTORALPOSITIONUniversity of Texas, HoustonHouston-TX-United States

INNATE IMMUNITYTENURE-TRACKFACULTY POSIT…University of Texas MedicalBranch

Science23April2010:Vol.328.no.5977,pp.418- 420DOI:

10.1126/science.328.5977.418

NSFNEONEcologicalDomains

Page 18: Geospatial Big Data: Software Architectures and the Role of APIs in ...

Transportation• Toreducetrafficcongestion,tripdemanddatacollectedusingtransportationsurveys• GPSbaseddatacollectionoftripinformationisapplicable,withthebroadavailability

oflocationenabledmobiledevices• TheGPStracksareencodedbyMovingFeaturestoenablesharingbymany

stakeholderssuchaslocalgovernments,buscompanies,andsoon.

Transportationsurvey

12343456

23454567

12343456

12hrtotal24hrtotal

123234

34564567

34567890

TrafficDemandsTrafficCongestionsSmartphones

Peopleinthecity

TracksmeasuredbyGPS(encodedbyMovingFeatures)

Page 19: Geospatial Big Data: Software Architectures and the Role of APIs in ...

Contexts &Possibilities

PRESENT

Behaviors &Actuals

PAST

Predictions &Potentials

FUTURE

LocationBasedMarketing

Page 20: Geospatial Big Data: Software Architectures and the Role of APIs in ...

CityModelsforSmartCities

• Berlin• >500,000 buildings upto

Level of Detail 4• Modeled according to

CityGML• Basis for real estate• Integration of sensors

• New York• 1M buildings plus roads at

LoD 1• NYC Open data • Next - Underground critical

infrastructure www.virtual-berlin.de

Page 21: Geospatial Big Data: Software Architectures and the Role of APIs in ...

GeospatialStandards

• Location• Geometry• Features• Coverages• SensorsandObservations• Processing,Analytics• WebServices

Page 22: Geospatial Big Data: Software Architectures and the Role of APIs in ...

PowerofLocation

• 1stlawofgeography:"Everythingisrelatedtoeverythingelse,butnearthingsaremorerelatedthandistantthings.”

– WaldoTobler

• Bymeasuringentropyofindividual’strajectory,wefind 93%potentialpredictabilityinusermobility

– LimitsofPredictabilityinHumanMobility,Science2010

Page 23: Geospatial Big Data: Software Architectures and the Role of APIs in ...

SomePeculiaritiesaboutSpatial

Page 24: Geospatial Big Data: Software Architectures and the Role of APIs in ...

SomePeculiaritiesaboutSpatial

Page 25: Geospatial Big Data: Software Architectures and the Role of APIs in ...

Latitudeisnotunique!

norisLongitude!

Page 26: Geospatial Big Data: Software Architectures and the Role of APIs in ...

CoordinateReferenceSystems

• Coordinate– oneofasequenceofNnumbersdesignating

thepositionofapointinN-dimensionalspace• CoordinateSystems

– Cartesian2Dand3D– Spherical(3D),Polar(2D)– Cylindrical– Linear- alongapath– Ellipsoidal

• CoordinateReferenceSystem– coordinatesystemrelatedto

realworldbyadatum• Examples

– Geographic– Geocentric– Vertical– Engineering– Image– Temporal– DerivedCRS,e.g.,projections

ReferenceISO19111andOGCAbstractSpecTopic2

Page 27: Geospatial Big Data: Software Architectures and the Role of APIs in ...

Mercatorprojection

Globularprojection

Orthographicprojection

Stereographicprojection

Afamiliarlyshaped‘continent’ indifferentmapprojections

Page 28: Geospatial Big Data: Software Architectures and the Role of APIs in ...

MapProjections

Mercator TransverseMercator

Page 29: Geospatial Big Data: Software Architectures and the Role of APIs in ...

Whaterrorscanyouexpect?

Deviations(undulations)betweentheGeoidandtheWGS84ellipsoid

Page 30: Geospatial Big Data: Software Architectures and the Role of APIs in ...

SeaLevel

Localverticaldatum

Page 31: Geospatial Big Data: Software Architectures and the Role of APIs in ...

SeaLevel

TheNetherlandstoBelgium:-2.34m!

Page 32: Geospatial Big Data: Software Architectures and the Role of APIs in ...

NoMetadata– NoInterpretation

• Nogeodeticmetadataà coordinatescannotbeinterpreted– datum– ellipsoid– primemeridian– mapprojection

Page 33: Geospatial Big Data: Software Architectures and the Role of APIs in ...

HidingGeospatialComplexity

MartinDesruisseaux,Geomatys,presentationtodayaboutApacheSISProject• ItistemptingtoignorethecomplexityofgeospatialinternationalstandardsontheassumptionthateveryonetodayusescoordinatesgivenbyGPS.

• ApacheSISmethodshandlealotofthiscomplexity• Martinwillshowexampleofwhathappenunderthehoodduringacubetransformation,fordemonstratingwhatthedevelopersgainwithSIS.

Page 34: Geospatial Big Data: Software Architectures and the Role of APIs in ...

GeospatialInformation

Feature Data Coverage Data

Metadata Maps

Page 35: Geospatial Big Data: Software Architectures and the Role of APIs in ...

SimpleGeometriesforSimpleFeature

OGC simple features (ISO 1923) geometries are restricted to 0, 1 and 2-dimensional geometric objects that exist in 2-dimensional coordinate space (R2).

Page 36: Geospatial Big Data: Software Architectures and the Role of APIs in ...

A/B A B A B A B

A B A B ABA

Equals Touches Overlaps Contains

Within Disjoint Intersects Crosses

TopologicalRelationsbetweenSpatialObjects

Page 37: Geospatial Big Data: Software Architectures and the Role of APIs in ...

TopologicalRelationsbetweenSpatialObjects

Page 38: Geospatial Big Data: Software Architectures and the Role of APIs in ...

– ogcf:relate(geom1: ogc:WKTLiteral, geom2: ogc:WKTLiteral,patternMatrix: xsd:string): xsd:boolean

– ogcf:sfEquals(geom1: ogc:WKTLiteral,geom2: ogcf:WKTLiteral): xsd:boolean

– ogcf:sfDisjoint(geom1: ogc:WKTLiteral,geom2: ogcf:WKTLiteral): xsd:boolean

– ogcf:sfIntersects(geom1: ogc:WKTLiteral,geom2: ogcf:WKTLiteral): xsd:boolean

– ogcf:sfTouches(geom1: ogc:WKTLiteral,geom2: ogcf:WKTLiteral): xsd:boolean

– ogcf:sfCrosses(geom1: ogc:WKTLiteral,geom2: ogcf:WKTLiteral): xsd:boolean

– ogcf:sfWithin(geom1: ogc:WKTLiteral,geom2: ogcf:WKTLiteral): xsd:boolean

– ogcf:sfContains(geom1: ogc:WKTLiteral,geom2: ogcf:WKTLiteral): xsd:boolean

– ogcf:sfOverlaps(geom1: ogc:WKTLiteral,geom2: ogcf:WKTLiteral): xsd:boolean

GeoSPARQL for Topological Query Functions

Page 39: Geospatial Big Data: Software Architectures and the Role of APIs in ...

GeographicFeatures

Encodings

Access

ImplementationSpecifications

Concepts Vocabulary

StructureAbstractModels

<MultiGeometrygid="c731"srsName="http://www.opengis.net/gml/srs/epsg.xml#4326"><geometryMember><Pointgid="P6776"> <Coord><x>50.0</x><y>50.0</y></Coord> </Point></geometryMember><geometryMember> <LineStringgid="L21216"> <Coord><x>0.0</x><y>0.0</y></Coord> <Coord><x>0.0</x><y>50.0</y></Coord> <Coord><x>100.0</x><y>50.0</y></Coord> </LineString> </geometryMember> <geometryMember></MultiGeometry>

Page 40: Geospatial Big Data: Software Architectures and the Role of APIs in ...

OGCGeographyMarkup Language

TwoDifferentUsagePatterns• Thematiccommunitiesdescribe

spatialdatasets: Cadastre,Topography,Geology,Hydrography,Meteorology,Aviation,CityModels,etc.

• EmbedlocationinotherXMLgrammars: GeoRSS,GeoSPARQL(OGC),Geopriv (IETF),POI(W3C),SensorWeb(OGC),etc.

GML: Geometry, Time, Features,

Reference Systems

Map

ping

XML Technologies (W3C)

Cit

y M

odel

s

Cad

astr

e

Geo

logy

Web

Fee

ds

...

Page 41: Geospatial Big Data: Software Architectures and the Role of APIs in ...

CityGML – GeometryandSemantics

– Geometric entities knowWHAT they are– Semantic entities knowWHERE they are and what their spatial extentsare

CityGML: (Up to) Complex objects with structured geometrySemantics Geometry

Page 42: Geospatial Big Data: Software Architectures and the Role of APIs in ...

CityGML and IndoorGML

1st layer: Topographic space model– building structure– geometric-topological model– network for route planning

2nd layer: Sensor space model– Radio/Beacon footprints– coverage of sensor areas– transition between sensor areas

Page 43: Geospatial Big Data: Software Architectures and the Role of APIs in ...

OGC Moving Features

• "Moving features" - vehicles, pedestrians, airplanes, ships.– This is Big Data – high volume, high velocity.

• CSV and XML encodings

Page 44: Geospatial Big Data: Software Architectures and the Role of APIs in ...

Spatial Temporal Geometry

time

Spatialplane

1prism=1leaf+1sweep(&attribute)

Endleafoftracks

id=1

Id=2

11:11:20.835 11:11:26.215 11:11:28.021 11:11:30.127

(C)

(B)(D)

(A)

OGCMovingFeaturesStandardimplementsISO19141

Page 45: Geospatial Big Data: Software Architectures and the Role of APIs in ...

Social Media in Geospatial Analysis

Social MediaAPIsSilos

GeoSPARQL Linked Data REST API

Web AccessLayer

Human-orientedClients

. . .

OGC Interfaces for Social MediaSocial Media Analysis WPS

Page 46: Geospatial Big Data: Software Architectures and the Role of APIs in ...

Geospatial Coverages

• Pixelgrid(e.g.,visiblebrightness)

Page 47: Geospatial Big Data: Software Architectures and the Role of APIs in ...

Geospatial Coverages

• Pixelgrid(landuse/landcover)

Page 48: Geospatial Big Data: Software Architectures and the Role of APIs in ...

Geospatial Coverages

• Pointgrid(e.g.,windspeed&direction)

Page 49: Geospatial Big Data: Software Architectures and the Role of APIs in ...

Geospatial Coverages

• Triangulatedirregularnetwork(TIN)

Page 50: Geospatial Big Data: Software Architectures and the Role of APIs in ...

OGCPointClouds

• WGestablishedin2015• Focusonalltypesofpointclouds:LiDAR/laser,bathymetric,meteorologic,photogrammetric…

Page 51: Geospatial Big Data: Software Architectures and the Role of APIs in ...

WebCoverageProcessingService• Query Language for nD sensor, image, simulation, statistics data

– Syntax close to XQuery (WCPS 2.0: integration)• Ex: "From MODIS scenes M1, M2, and M3, the difference between red and nir, as TIFF

where nir exceeds 127 somewhere”for $c in ( M1, M2, M3 )where some( $c.nir > 127 )return encode( $c.red - $c.nir, “image/tiff“ )

(tiff1,tiff2)

Page 52: Geospatial Big Data: Software Architectures and the Role of APIs in ...

GeospatialAnalytics

• Analyticexploitationofthespace-timefeatureswillusherinadvancesinhigh-qualitypredictionsystems.– Spacetimefeatures:thehighestorderbits- Jonas,Tucker

• UsingalgorithmicextractionandbigdatagraphstocreateandrelateentitiesontheWeb,organising themthroughasemantictaxonomyandenablingnaturalaccess– Thefutureis‘Where’"- S.Lawler,Bing

Page 53: Geospatial Big Data: Software Architectures and the Role of APIs in ...

SpatialPartitioningTechniques

Page 54: Geospatial Big Data: Software Architectures and the Role of APIs in ...

SpatialindexstoredperfileonHDFS

Zorder(2Dand3D),Hilbert(N-Dimensional)

Zorder(2Dand3D)Binnedperweekforspatiotemporal

N-DimensionalHilbertwitharbitrarybinningandtieredindexing

SpatialIndexing

GEOJINNI(formerly SPATIALHADOOP)

Page 55: Geospatial Big Data: Software Architectures and the Role of APIs in ...

DiscreteGlobalGridSystems

• “…aspatialreferencesystemthatusesahierarchicaltessellationofcells topartitionandaddresstheglobe.DGGSarecharacterizedbythepropertiesoftheircellstructure,geo-encoding,quantizationstrategyandassociatedmathematicalfunctions.”

– OGCDGGSCandidateStandard

Page 56: Geospatial Big Data: Software Architectures and the Role of APIs in ...

StandardizingDiscreteGlobalGridSystemsDifferent Cell Shapes

Square = Familiar Triangular = Fast Hexagonal = Fineness of Fit

00 01 02 03 10 11 12 13 20 21 22 23 30 31 32 33

nD Spatial Analyses ¯

1D Array Processes

Unique Cell Indices• Hierarchy-based, Space-filling Curve, Axes-based or Encoded Address

Page 57: Geospatial Big Data: Software Architectures and the Role of APIs in ...

SpaceFillingCurvesA few different choices…

Page 58: Geospatial Big Data: Software Architectures and the Role of APIs in ...

Sensors Everywhere(Things or Devices)

50 billions Internet-connected things by 2020

Page 59: Geospatial Big Data: Software Architectures and the Role of APIs in ...

OGCSensorWebEnablement

• Quickly discoversensorsandsensordata (secureorpublic)thatcanmeetmyneeds– location,observables,quality,abilitytotask

• Obtainsensorinformation inastandardencodingthatisunderstandablebymeandmysoftware

• Readilyaccesssensorobservations inacommonmanner,andinaformspecifictomyneeds

• Tasksensors,whenpossible,tomeetmyspecificneeds• Subscribetoandreceivealerts whenasensormeasuresaparticularphenomenon

Page 60: Geospatial Big Data: Software Architectures and the Role of APIs in ...

OGCSensorThings forIoT

• Builds on OGC Sensor Web Enablement (SWE) standards that are operational around the world

• Builds on Web protocols; easy-to-use RESTful style • OGC candidate standard for open access to IoT devices

http://ogc-iot.github.io/ogc-iot-api/datamodel.html

Page 61: Geospatial Big Data: Software Architectures and the Role of APIs in ...

OGCEssentials

• SimpleFeaturesforSQL:FundamentalgeometriesandoperationswhichunderlieallOGCstandards.

• WellKnownText:TextencodingofSimpleFeaturesgeometries• WellKnownBinary:binaryencodingofWellKnownText.• CQL/Filter: CommonQueryLanguage andFilterlanguage• GeoPackage:SQLlite forgeospatial• WMTSSimpleTileMatrix

Page 62: Geospatial Big Data: Software Architectures and the Role of APIs in ...

OGCBigGeoDataWhitePaper

BigGeoDataApplications

UseCasesforBigGeoData

OpenStandards

OpenSourceProjects

UseCasesReuseacrossApplications

Codereusebasedonstandards

Context forusecases Implementation

ofUseCases

Page 63: Geospatial Big Data: Software Architectures and the Role of APIs in ...

UseCasesforBigGeoData

HighVelocityIngest

GeoAnalytics,MachineLearning

GeospatialDatabases

SpatialModeling

ObservationSources

Users and consuming

apps

Page 64: Geospatial Big Data: Software Architectures and the Role of APIs in ...

UseCasesforBigGeoData

HighVelocityIngest

GeoAnalytics,MachineLearning

GeospatialDatabases

SpatialModeling

ObservationSources

Users and consuming

apps

IoTMessage Streaming

Social Media Message

Processing

ETL Stream processing using

RDF

Wide Area Motion

Imagery

Entity-oriented Spatial-temporal

analytics

Grid-oriented Spatial-temporal

analytics

Feature Fusion

Remote sensed data processing

Machine Learning

Array databases

NoSQLdatabases

Graphdatabases

Built environment

models

Integrated environmental

models

Modeling and simulation

Page 65: Geospatial Big Data: Software Architectures and the Role of APIs in ...

HighVelocityIngest- UseCases

• OpenSourceProjects– ApacheKafka,ApacheNiFi,ApacheJena,– SensorHub,SensorUp

• OpenStandards– IoT:MQTT,COAP,IPSO,– OGCSensorWebEnablement(SWE),SensorThings– RDF,OWL,GeoSPARQL,– WebProcessingService(WPS)– WideAreaMotionImagery(WAMI)

HighVelocityIngest

ObservationSources

IoTMessage Streaming

Social Media Message

Processing

ETL Stream processing using

RDF

Wide Area Motion

Imagery

DRAFT

Page 66: Geospatial Big Data: Software Architectures and the Role of APIs in ...

GeoAnalytics,MachineLearningUseCases

• OpenSourceProjects– Apache:Accumulo,Storm,Lucene,Hadoop,SIS,Magellan,Marmotta,Mahout,Spark

– LocationTech:GeoWave,GeoTrellis,GeoMesa,GeoJinni,JTSTopologySuite

– OSGeo:GDAL/OGR,OSSIM,pycsw– Others:MrGeo,MonetDB

• OpenStandards– OGCSimpleFeatures,DGGS– GeoTIFF,NetCDF,HDFencodings– WebProcessingService(WPS)

GeoAnalytics,MachineLearning

Entity-oriented Spatial-temporal

analytics

Grid-oriented Spatial-temporal

analytics

Feature Fusion

Remote sensed data processing

Machine Learning

DRAFT

Page 67: Geospatial Big Data: Software Architectures and the Role of APIs in ...

GeospatialDatabasesUseCases

• OpenSourceProjects– Apache:Accumulo,Lucene/Solr,Cassandra,SIS,

Marmotta– OSGeo:degree,GeoServer,OpenLayers,QGIS– EarthServer,THREDDS,RasterStorageArchive– MonetDB

• OpenStandards– WebFeatureService(WFS)– WebCoverageService(WCS)– WebMapService(WMS)– GeographyMarkupLanguage(GML)

GeospatialDatabases

Array databases

NoSQLdatabases

Graphdatabases DRAFT

Page 68: Geospatial Big Data: Software Architectures and the Role of APIs in ...

SpatialModelingUseCase

• OpenSourceProjects– ApacheSIS– CityDB– Cesium

• OpenStandards– CityGML– OpenMI– OGCCDB

SpatialModeling

Built environment

models

Integrated environmental

models

Modeling and simulation DRAFT

Page 69: Geospatial Big Data: Software Architectures and the Role of APIs in ...

OpenSourceandOpenStandards

• Importanceofcoordination– “Havingjustoneimplementationofsomethingisrisky”- TomHardie,IETF– Needtodefinestableinterfaceswithstablestandardreference– Protocols,Interfacesandencodingsdocumentedinopenstandards

• OpenStandardsuseofOpenSource– ReferenceImplementationsofOpenStandards– CodesnippetsinOpenStandards.

Page 70: Geospatial Big Data: Software Architectures and the Role of APIs in ...

Commercial39%

Government27%

NGO8%

Research6%

University20%

TheOpenGeospatialConsortium

Not-for-profit, international voluntary consensus standards organization; leading development of geospatial standards• Founded in 1994• 515+ member organizations• 48 standards• Thousands of implementations • Broad user community

implementation worldwide• Alliances and collaborative activities

with ISO and many other SDO’s

Africa…

AsiaPacific…

Europe209Middle

East34

NorthAmerica…

SouthAmerica

3

Page 71: Geospatial Big Data: Software Architectures and the Role of APIs in ...

ApacheBDUSAMay2016- GeospatialTrack

• Open Geospatial Standards and Open Source – George Percivall, Open Geospatial Consortium (OGC)

• Magellan: Spark as a Geospatial Analytics Engine – Ram Sriharsha

• Applying Geospatial Analytics Using Apache Spark Running on Apache Mesos– Adam Mollenkopf, Esri

• SciSpark: MapReduce in Atmospheric Sciences – Kim Whitehall, NASA Jet Propulsion Laboratory

• Geospatially Enable Your Hadoop, Accumulo, and Spark Applications with LocationTech Projects

– Robert Emanuele, Azavea

© 2016 Open Geospatial Consortium

Page 72: Geospatial Big Data: Software Architectures and the Role of APIs in ...

ApacheBDUSAMay2016- GeospatialTrackII

• Hiding Some of Geospatial Complexity– Martin Desruisseaux, Geomatys

• Geospatial Querying in Apache Marmotta– Sergio Fernandez, Redlink GmbH

• Spatial Data Based People/Vehicles Trails Analysis to Support Precision Urban Planning – Yonghua (Henry) Zeng, IBM

• Crowd Learning for Indoor Positioning – Thomas Burgess, indoo.rs GmbH

© 2016 Open Geospatial Consortium

Page 73: Geospatial Big Data: Software Architectures and the Role of APIs in ...

OGCTestbed-13

Page 74: Geospatial Big Data: Software Architectures and the Role of APIs in ...

ESAEO(Exploitation)Platform

Page 75: Geospatial Big Data: Software Architectures and the Role of APIs in ...

BigDataIntegrationArchitectures

Page 76: Geospatial Big Data: Software Architectures and the Role of APIs in ...

TheOpenGeospatialConsortium

OpenGeospatialConsortiumwww.opengeospatial.org

OGCStandards- freelyavailablewww.opengeospatial.org/standards

OGConYouTubehttp://www.youtube.com/user/ogcvideo

[email protected]