AWSSummit Berlin Day1 ianrob migrating to amazon neptune Mark… · Reasons for migrating to Amazon...
SUMMIT Berlin
© 2019, Amazon Web Services, Inc. or its affiliates. All rights reserved.
Migrating to Amazon Neptune
Ian Robinson
Data Architect
AWS Database Services Customer Advisory
Amazon Neptune
Fast, reliable graph database
Optimized for storing and querying highly connected data
Fully managed graph database
Fast: Query billions of relationships with millisecond latency
Reliable: 6 replicas of your data across 3 AZs with full backup and restore
Open: Supports Apache TinkerPop & W3C RDF graph models
Easy: Build powerful queries easily with Gremlin and SPARQL
Reasons for migrating to Amazon Neptune
Complex domain model
• Difficult to maintain and evolve in current system
Connected data queries
• Joins are slow in current system
• Joins implemented in application layer in current system
Fully managed service
• Provisioning, backup, failover, durability, high availability, patching
Do I have a graph workload?
Complex Domain Model, Variable Schema, Connected Queries
• Large dataset
• Many different entities
• Similar entities may have different properties
• Highly connected
• Entities connected in many different ways
• Navigate connected structure
• Take account of strength, weight or quality of relationships
Variable Structure
Social Networking, Recommendations, Knowledge Graphs, Fraud Detection, Life Sciences, Network & IT Operations
Migration scenarios
Migrate use cases, not data
[Diagram: K-V, Relational and Non-relational sources; Graph target]
Talk to end users and subject matter experts
Use the application or API
You may have to do some archaeology
• Stored procedures
• Application code
• Object model
Design your target data model and queries
[Diagram: K-V, Relational and Non-relational sources; Graph target; Application or Service]
Data modelling guidance
http://bit.ly/neptune-270219-1
Data modelling example
http://bit.ly/neptune-270219-2
Transform from source to target model
[Diagram: K-V, Relational and Non-relational sources; Graph target; Application or Service]
Converting data models guidance
http://bit.ly/neptune-270219-3
Relational table
id   …   f_name
12   …   Alice
37   …   Bob
Foreign keys
Referenced table:
id   …   f_name
12   …   Alice
37   …   Bob
Referencing table (p_id holds the id of a row in the table above):
id    p_id   type   addr_1
512   12     home   High St
655   37     work   Main St
700   12     work   Any St
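As a rough sketch of how the foreign-key pattern could be carried into a graph, the Python below turns each row into a vertex and each p_id value into an edge, and writes them as CSV with the `~id`, `~label`, `~from` and `~to` headers used by the Neptune Gremlin load format. The labels `Person` and `Address`, the ID prefixes, the edge direction and the choice to derive the edge label from the `type` column are all assumptions made for illustration; the slide does not specify them.

```python
import csv
import io

# Rows copied from the slide; the columns elided with "…" are left out.
people = [{"id": 12, "f_name": "Alice"}, {"id": 37, "f_name": "Bob"}]
addresses = [
    {"id": 512, "p_id": 12, "type": "home", "addr_1": "High St"},
    {"id": 655, "p_id": 37, "type": "work", "addr_1": "Main St"},
    {"id": 700, "p_id": 12, "type": "work", "addr_1": "Any St"},
]


def to_csv(header, rows):
    """Render a header and rows as CSV text."""
    buf = io.StringIO()
    writer = csv.writer(buf, lineterminator="\n")
    writer.writerow(header)
    writer.writerows(rows)
    return buf.getvalue()


# Each row becomes a vertex. IDs are prefixed (assumption) so that the two
# tables cannot collide once they share one graph.
person_vertices = to_csv(
    ["~id", "~label", "f_name:String"],
    [[f"person-{p['id']}", "Person", p["f_name"]] for p in people],
)
address_vertices = to_csv(
    ["~id", "~label", "addr_1:String"],
    [[f"address-{a['id']}", "Address", a["addr_1"]] for a in addresses],
)

# Each foreign key value becomes an edge from the person to the address.
# Using the type column as the edge label is an assumption.
edges = to_csv(
    ["~id", "~from", "~to", "~label"],
    [
        [f"edge-{a['id']}", f"person-{a['p_id']}", f"address-{a['id']}", a["type"].upper()]
        for a in addresses
    ],
)

print(person_vertices)
print(address_vertices)
print(edges)
```

The three CSV strings could be written to separate files in S3 and handed to the bulk loader.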
Join tables
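First entity table:
id   …   f_name
12   …   Alice
37   …   Bob
Second entity table:
id    …   name
512   …   Any Co
655   …   Example Co
700   …   Example.com
Join table (p_id and c_id reference the two tables above):
p_id   c_id   from   to
12     512    2012   2015
37     512    2011   2016
37     655    2016   2017
12     700    2015   2017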
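One way to read the join-table pattern is that each join row becomes a single edge and the remaining columns (from, to) become edge properties. The sketch below writes such edges as CSV in the Neptune Gremlin load format. The edge label `WORKED_AT`, the ID scheme and the `person-`/`company-` prefixes are hypothetical names chosen for this example.

```python
import csv
import io

# Join-table rows copied from the slide: (p_id, c_id, from, to).
join_rows = [
    (12, 512, 2012, 2015),
    (37, 512, 2011, 2016),
    (37, 655, 2016, 2017),
    (12, 700, 2015, 2017),
]


def edge_rows(rows):
    """Yield one edge per join row, keeping from/to as edge properties."""
    for p_id, c_id, start, end in rows:
        # The edge ID combines both keys and the start year (assumption),
        # so repeated stints at the same company would stay distinct.
        edge_id = f"worked-{p_id}-{c_id}-{start}"
        yield [edge_id, f"person-{p_id}", f"company-{c_id}", "WORKED_AT", start, end]


buf = io.StringIO()
writer = csv.writer(buf, lineterminator="\n")
writer.writerow(["~id", "~from", "~to", "~label", "from:Int", "to:Int"])
writer.writerows(edge_rows(join_rows))
edges_csv = buf.getvalue()

print(edges_csv)
```

Compared with the relational form, the join table disappears as an entity of its own and a query can step from a person to a company in one hop.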
Facts and dimensions
Fact table:
id   u_id   p_id   l_id   date         …
43   678    94     144    14-12-2018   …
Dimension tables (each id is referenced by a key in the fact table):
id   …
94   …
id    …
678   …
id    …
144   …
Nested documents
{ id: order-1
  delivery-address: {
    // address-1
  }
  payment-address: {
    // address-1
  }
}
{ id: order-2
  delivery-address: {
    // address-1
  }
}
{ id: order-3
Key-Value with implicit hierarchy
id   f_name   state   start   city:dept    tel
     Alice    TX              Austin:Dev
37   Bob      TX              Dallas:Dev   555-0100
99   Dan      TX      2016    Austin:Ops   555-0199
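A possible reading of the implicit hierarchy is that the `state` attribute and the composite `city:dept` attribute encode a state, city, department tree that can be made explicit as vertices and edges. The sketch below derives those edges from the rows on the slide. The edge labels `IN_STATE` and `IN_CITY` and the ID scheme are assumptions, and the first row is given without an id because none survives in the source.

```python
# Rows copied from the slide; attributes missing on the slide are simply absent.
records = [
    {"f_name": "Alice", "state": "TX", "city:dept": "Austin:Dev"},
    {"id": 37, "f_name": "Bob", "state": "TX", "city:dept": "Dallas:Dev", "tel": "555-0100"},
    {"id": 99, "f_name": "Dan", "state": "TX", "start": 2016, "city:dept": "Austin:Ops", "tel": "555-0199"},
]


def hierarchy_edges(record):
    """Split the composite key and return (from, label, to) edges for one record."""
    city, dept = record["city:dept"].split(":")
    state_id = f"state-{record['state']}"
    city_id = f"city-{record['state']}-{city}"
    dept_id = f"dept-{record['state']}-{city}-{dept}"
    return [
        (city_id, "IN_STATE", state_id),
        (dept_id, "IN_CITY", city_id),
    ]


# A set removes the duplicates that arise when records share a city or state.
edges = sorted({edge for record in records for edge in hierarchy_edges(record)})

for edge in edges:
    print(edge)
```

Each person vertex could then be attached to its department vertex with one further edge, which replaces string parsing of the composite key at query time.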
Data load options
Online endpoints
• Gremlin or SPARQL
• Good for ongoing replication and modifying data
• ACID transactions
Bulk loader API
• Load data from S3 into Neptune
• Low overhead, optimized for large datasets
• Good for append-only loads
[Diagram: Bulk load from S3; Database Mgmt.]
Using the online endpoints
Submit multiple items in a single request
All items created in a single transaction
Gremlin:
g.addV('Person').
  property(id, 'alice-id').
  property('firstName', 'Alice').
  as('a').
  addV('Person').
  property(id, 'bob-id').
  property('firstName', 'Bob').
  addE('FOLLOWS').to('a')
SPARQL:
INSERT
{
  c:alice-id c:firstName "Alice" .
  c:alice-id rdf:type c:Person .
  c:bob-id c:firstName "Bob" .
  c:bob-id rdf:type c:Person .
  c:bob-id c:FOLLOWS c:alice-id .
} WHERE {}
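To make "a single request" concrete, here is a minimal sketch that packs the whole Gremlin traversal above into one HTTP POST to the cluster's `/gremlin` endpoint, using only the Python standard library. The host name is a placeholder, and the request is built but not sent, since sending it needs network access to the cluster's VPC.

```python
import json
import urllib.request

# Placeholder endpoint (assumption); substitute your own cluster endpoint.
NEPTUNE_ENDPOINT = "https://your-neptune-endpoint:8182"


def build_gremlin_request(query: str) -> urllib.request.Request:
    """Wrap a Gremlin query string in a single HTTP POST request."""
    body = json.dumps({"gremlin": query}).encode("utf-8")
    return urllib.request.Request(
        f"{NEPTUNE_ENDPOINT}/gremlin",
        data=body,
        headers={"Content-Type": "application/json"},
        method="POST",
    )


# Both vertices and the edge travel in one query, so they are created together.
query = (
    "g.addV('Person').property(id, 'alice-id').property('firstName', 'Alice').as('a')."
    "addV('Person').property(id, 'bob-id').property('firstName', 'Bob')."
    "addE('FOLLOWS').to('a')"
)
request = build_gremlin_request(query)

# Sending is left commented out because it requires access to the cluster:
# with urllib.request.urlopen(request) as response:
#     print(json.load(response))
print(request.get_method(), request.full_url)
```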
Using the online endpoints
Implement idempotent writes using conditional expressions
Multiple executions yield the same result
Gremlin:
g.V('alice-id').
  fold().coalesce(
    unfold(),
    addV('Person').
      property(id, 'alice-id').
      property('firstName','Alice').
      property('lastName','Smith'))
SPARQL:
INSERT
{
  c:alice-id c:firstName "Alice" .
  c:alice-id c:lastName "Smith" .
}
WHERE
{
  FILTER NOT EXISTS { c:alice-id ?p ?o }
}
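The conditional-write pattern above can be generated rather than hand-written. The helper below is a small sketch that builds the same `fold().coalesce(unfold(), addV(...))` query for any vertex ID and set of string properties. The function names `quote` and `upsert_vertex` are hypothetical, and only string property values are handled.

```python
def quote(value: str) -> str:
    """Quote a string for a Gremlin script, escaping backslashes and single quotes."""
    return "'" + value.replace("\\", "\\\\").replace("'", "\\'") + "'"


def upsert_vertex(label: str, vertex_id: str, properties: dict) -> str:
    """Build a query that returns the vertex if it exists and creates it otherwise."""
    props = "".join(
        f".property({quote(key)}, {quote(value)})" for key, value in properties.items()
    )
    return (
        f"g.V({quote(vertex_id)}).fold().coalesce("
        f"unfold(), addV({quote(label)}).property(id, {quote(vertex_id)}){props})"
    )


query = upsert_vertex("Person", "alice-id", {"firstName": "Alice", "lastName": "Smith"})
print(query)
```

Because the query is safe to run more than once, a consumer that retries after a timeout or replays a batch does not create duplicate vertices.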
Encapsulate access to Neptune
Put queries behind an API
e.g. Amazon API Gateway + AWS Lambda
Develop queries test-first
Assert correctness of queries and model
%%unittest
results = (g.V().has('airport','code','LCY').
    out().limit(1).
    values('runways').
    path().by('code').by('code').by().
    next())
# Compare the objects on the returned path; the airport codes are strings
assert results.objects == ['LCY', 'AGP', 2]
Streaming changes from source to target
Buffer writes in a durable stream
• Control concurrency with shards
Poll stream for batch of records
• Send single request to Neptune per batch
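The poll-and-batch step can be sketched without any AWS dependency. In the example below the durable stream is simulated by an in-memory list, each record is assumed to be a Gremlin fragment, and `send` stands in for whatever function posts a query to Neptune. All three are simplifications for illustration.

```python
from typing import Callable, Iterable, Iterator, List


def batches(records: Iterable[str], batch_size: int) -> Iterator[List[str]]:
    """Group records into lists of at most batch_size items."""
    batch: List[str] = []
    for record in records:
        batch.append(record)
        if len(batch) == batch_size:
            yield batch
            batch = []
    if batch:
        yield batch


def apply_stream(records: Iterable[str], batch_size: int, send: Callable[[str], None]) -> int:
    """Chain each batch into one Gremlin query and send one request per batch."""
    requests_sent = 0
    for batch in batches(records, batch_size):
        send("g." + ".".join(batch))
        requests_sent += 1
    return requests_sent


# Simulated stream of five writes, applied in batches of two.
sent: List[str] = []
stream = [f"addV('Person').property(id, 'p-{i}')" for i in range(5)]
count = apply_stream(stream, batch_size=2, send=sent.append)

print(count)
for query in sent:
    print(query)
```

The batch size trades request overhead against the amount of work replayed if a batch fails.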
Streaming example
http://bit.ly/neptune-270219-4
Amazon Neptune Bulk Loader
UTF-8 encoded files in S3
• Supports gzip compression of single files
• CSV formatted files for Gremlin
• 4 standard formats for RDF (N-Triples, N-Quads, RDF/XML, Turtle)
Requires a VPC endpoint for Amazon S3
• Cluster and bucket must be in same region
• Add IAM Role to Neptune allowing s3:Get* and s3:List* permissions for S3 bucket
Load, get status and cancel job via loader HTTP endpoint
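The three loader operations map onto three HTTP requests against the `/loader` path. The sketch below builds them with the standard library and does not send them. The endpoint, bucket, role ARN, region and load ID are placeholders, not values from the talk.

```python
import json
import urllib.request

# Placeholder endpoint (assumption); substitute your own cluster endpoint.
NEPTUNE_ENDPOINT = "https://your-neptune-endpoint:8182"


def start_load_request(source: str, iam_role_arn: str, region: str,
                       data_format: str = "csv") -> urllib.request.Request:
    """Build the POST request that starts a bulk load from S3."""
    body = {
        "source": source,            # S3 URI of a file or folder
        "format": data_format,       # csv for Gremlin; an RDF format otherwise
        "iamRoleArn": iam_role_arn,  # role that lets Neptune read the bucket
        "region": region,            # must match the cluster's region
        "failOnError": "FALSE",
    }
    return urllib.request.Request(
        f"{NEPTUNE_ENDPOINT}/loader",
        data=json.dumps(body).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="POST",
    )


def status_request(load_id: str) -> urllib.request.Request:
    """Build the GET request that reports the status of a load job."""
    return urllib.request.Request(f"{NEPTUNE_ENDPOINT}/loader/{load_id}", method="GET")


def cancel_request(load_id: str) -> urllib.request.Request:
    """Build the DELETE request that cancels a load job."""
    return urllib.request.Request(f"{NEPTUNE_ENDPOINT}/loader/{load_id}", method="DELETE")


start = start_load_request(
    "s3://example-bucket/neptune/",                          # hypothetical bucket
    "arn:aws:iam::123456789012:role/ExampleNeptuneLoadRole",  # hypothetical role
    "eu-central-1",                                           # hypothetical region
)
print(start.get_method(), start.full_url)
print(status_request("example-load-id").full_url)
```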
Event-driven batch import
Upload files to S3
Trigger Lambda on PUT
Invoke Neptune bulk load API
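A Lambda function for this flow mainly has to translate the S3 event into a loader request body. The sketch below shows that translation only; the environment variable `LOADER_ROLE_ARN` is a hypothetical name, and the HTTP call to the loader endpoint is left out because it needs access to the cluster's VPC.

```python
import os
import urllib.parse


def loader_payload_from_s3_event(event: dict, iam_role_arn: str, region: str) -> dict:
    """Turn the first record of an S3 PUT event into a bulk loader request body."""
    s3 = event["Records"][0]["s3"]
    bucket = s3["bucket"]["name"]
    # Object keys arrive URL-encoded in S3 events, with spaces encoded as "+".
    key = urllib.parse.unquote_plus(s3["object"]["key"])
    return {
        "source": f"s3://{bucket}/{key}",
        "format": "csv",
        "iamRoleArn": iam_role_arn,
        "region": region,
        "failOnError": "FALSE",
    }


def handler(event, context):
    """Lambda entry point: build the payload for the uploaded file."""
    payload = loader_payload_from_s3_event(
        event,
        os.environ.get("LOADER_ROLE_ARN", ""),  # hypothetical variable name
        os.environ.get("AWS_REGION", ""),
    )
    # A real function would now POST the payload to the cluster's /loader endpoint.
    return payload


# Trimmed-down example event with a hypothetical bucket and key.
sample_event = {
    "Records": [
        {"s3": {"bucket": {"name": "example-bucket"}, "object": {"key": "people/part+1.csv"}}}
    ]
}
result = loader_payload_from_s3_event(sample_event, "example-role-arn", "eu-central-1")
print(result["source"])
```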
AWS Database Migration Service
Migrate databases to AWS
One-off and continuous data replication
Using DMS for Neptune migrations
Initial extract
• Extract source data to S3
• Format for import
• Bulk load into Neptune
Ongoing replication
• DMS writes Change Data Capture stream to S3
• Lambda function loads from S3 to Amazon Kinesis data stream
• Second Lambda function polls Kinesis stream and applies changes to Neptune
[Diagram: RDBMS, NoSQL and Data Warehouse sources]
Amazon Athena
Serverless, interactive query service that makes it easy to analyze data in Amazon S3 using standard SQL.
Using Athena for Neptune migrations
• Put source files in S3
• Use AWS Glue to crawl and discover data schema
• Query the source data using Athena
• Output the query results as Neptune-formatted CSV to S3
• Bulk load from S3 into Neptune
AWS Glue
Fully managed ETL service
• Build a data catalog
• Generate transformations
• Schedule and run jobs
• Data lake integration
Using Glue for Neptune migrations
• Catalog data sources
• Create jobs that extract and transform data
• Load data to S3 or write directly into Neptune
Migrate from MySQL to Neptune
http://bit.ly/neptune-270219-5
ETL with AWS Glue
http://bit.ly/neptune-270219-5
What have we learned?
Work backwards from your use cases to design a target graph data model
Use common transformation patterns to guide your data model design
Choose a data load path, and a migration tool and migration architecture
Thank you!