Moving Graphs to Production At Scale
Ian Robinson
Overview
• Deployment Options
• Hardware/Software Requirements
• HA Architecture
• Backups
• Monitoring
• Testing
• Performance Tips
Deployment Options
• Embedded
• Server
• Server with Extensions
Deployment Options
Embedded
• Host Neo4j in a Java process
• Access to Neo4j's Java APIs
[Diagram: Application → Java APIs]
Deployment Options
Server
• Server wraps an embedded instance
• HTTP/JSON interface
• Transactional endpoint
[Diagram: Application → Driver → Load balancer → three servers, each exposing a REST API]
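To make the transactional endpoint concrete, here is a minimal sketch of how a client might build one request for it. It assumes the Neo4j 2.x server path /db/data/transaction/commit and the 2.x `{name}` parameter syntax. The helper name `build_commit_request` is hypothetical, and the sketch only builds the request without sending it.

```python
import json

# Assumed path of the transactional Cypher endpoint on a Neo4j 2.x server.
COMMIT_PATH = "/db/data/transaction/commit"


def build_commit_request(statements):
    """Return (path, headers, body) for a begin-and-commit request.

    `statements` is a list of (cypher, params) pairs. All of them are
    sent in one HTTP request and run in a single transaction.
    """
    payload = {
        "statements": [
            {"statement": cypher, "parameters": params}
            for cypher, params in statements
        ]
    }
    headers = {
        "Content-Type": "application/json",
        "Accept": "application/json; charset=UTF-8",
    }
    return COMMIT_PATH, headers, json.dumps(payload)


# Example: one parameterised read statement.
path, headers, body = build_commit_request([
    ("MATCH (c:Country {name: {name}}) RETURN c", {"name": "Norway"}),
])
```

Sending several statements in one request is what keeps the round trips down when the application talks to the server over HTTP.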
Deployment Options
Server with Extensions
• JAX-RS RESTful resources
• Execute complex logic on server
• Close to the data
• Multiple operations per request
• Integrate with backend systems
• Control HTTP request/response format, headers
[Diagram: Application → Driver → Load balancer → three servers, each exposing a REST API plus Extensions]
Hardware
CPU
• Intel Core i3 (minimum)
• Intel Core i7 (recommended)
• Neo4j scales with the number of cores
• Requires Enterprise to scale beyond 4 cores
Disk
• SLC (single-level cell) SSD w/ SATA
• ext4 (recommended), ZFS
• Increase permitted number of open files to 40,000
Memory
• Lots of RAM (for heap + page cache)
• 8-12 GB heap (up to 24 GB)
• Explicitly set page cache to (store size + 10% + headroom)
  – Otherwise defaults to 75% of RAM-minus-heap (50% in 2.3)

dbms.pagecache.memory=10g (neo4j.properties)
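The page cache sizing rule above is simple arithmetic, sketched here for convenience. The function names and the 1.2 GB headroom figure are illustrative assumptions. The slide gives only the formula and the default percentages.

```python
def recommended_page_cache_gb(store_size_gb, headroom_gb):
    """Store size + 10% + headroom, following the rule on the slide."""
    return store_size_gb * 1.10 + headroom_gb


def default_page_cache_gb(ram_gb, heap_gb, fraction=0.75):
    """Page cache size used when none is configured.

    Per the slide: 75% of (RAM minus heap), or 50% in 2.3.
    """
    return (ram_gb - heap_gb) * fraction


# An 8 GB store with 1.2 GB of (assumed) headroom lands at roughly 10 GB,
# which matches the dbms.pagecache.memory=10g setting shown above.
explicit = recommended_page_cache_gb(8.0, 1.2)

# On a 32 GB machine with a 12 GB heap, the unset default would be larger.
implicit = default_page_cache_gb(32, 12)
```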
Software
Java
• OpenJDK 8 (preferred) or 7, or Oracle Java 8 (preferred) or 7
• IBM JDK on POWER8
• G1 garbage collector
• Default in 2.3
• JDK 1.7.0_71 or later
Operating System
• Linux
• HP-UX
• Windows 2012

wrapper.java.additional=-XX:+UseG1GC (neo4j-wrapper.conf)
Instances
• HVM (hardware virtual machine) over PV (paravirtual)
• EBS-optimized
• Dedicated throughput to EBS
• C3 or C4 (compute-optimized)
• E.g. c4.2xlarge (15 GiB RAM, 8 vCPU, 1000 Mbps EBS throughput)
• R3 (memory-optimized)
• E.g. r3.xlarge (30.5 GiB RAM, 4 vCPU)
• Not EBS-optimized by default
Volumes
• Provisioned IOPS (io1) for predictable performance
• For I/O intensive workloads
• Up to 30 IOPS per GiB
  – E.g. 300 GiB volume, 9000 IOPS
HA Architecture
[Diagram: three Neo4j HA instances (1, 2 and 3), each made up of Database, Transaction Propagation and Cluster Management; one instance is labeled Master]
Cluster Configuration
Joining Cluster
• ha.initial_hosts (neo4j.properties)
• List of servers to contact when joining cluster
• All hosts must be available when starting instance
• For large clusters, supply only a small number of hosts, e.g. 3
Pull and Push Transactions
• ha.pull_interval=10s (off by default)
• ha.tx_push_factor=1 (default, but best efforts only)
Tuning
• ha.heartbeat_timeout=11s (default)
• Heartbeats sent, by default, every 5s
• Increase timeouts if pauses cause heartbeats to be delayed
• Warning: it will take longer to discover an instance has failed
• ha.state_switch_timeout=120s (default)
• Increase if new instances time out while catching up with master on startup
HA Endpoints – Useful for Load Balancing
Endpoint                          State    Status Code    Body
/db/manage/server/ha/master       Master   200 OK         true
                                  Slave    404 Not Found  false
                                  Unknown  404 Not Found  UNKNOWN
/db/manage/server/ha/slave        Master   404 Not Found  false
                                  Slave    200 OK         true
                                  Unknown  404 Not Found  UNKNOWN
/db/manage/server/ha/available    Master   200 OK         master
                                  Slave    200 OK         slave
                                  Unknown  404 Not Found  UNKNOWN

From 2.3 onwards: dbms.security.ha_status_auth_enabled=false (neo4j.properties)
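The endpoint table can be read as a lookup that a load balancer health check effectively performs. The sketch below encodes the table and shows which instances would pass a check on a given endpoint. The names `EXPECTED`, `backends_for` and `role_from_available` are hypothetical, and the sketch makes no HTTP calls.

```python
# The table above: endpoint -> instance state -> (status code, body).
EXPECTED = {
    "/db/manage/server/ha/master": {
        "master": (200, "true"),
        "slave": (404, "false"),
        "unknown": (404, "UNKNOWN"),
    },
    "/db/manage/server/ha/slave": {
        "master": (404, "false"),
        "slave": (200, "true"),
        "unknown": (404, "UNKNOWN"),
    },
    "/db/manage/server/ha/available": {
        "master": (200, "master"),
        "slave": (200, "slave"),
        "unknown": (404, "UNKNOWN"),
    },
}


def backends_for(endpoint, states):
    """Return the instances that would pass a health check on `endpoint`.

    `states` maps an instance name to "master", "slave" or "unknown".
    A check passes only when the endpoint answers 200 OK.
    """
    return sorted(
        name for name, state in states.items()
        if EXPECTED[endpoint][state][0] == 200
    )


def role_from_available(status_code, body):
    """Interpret a response from /db/manage/server/ha/available."""
    text = body.strip()
    if status_code == 200 and text in ("master", "slave"):
        return text
    return "unknown"
```

Checking against /db/manage/server/ha/slave therefore yields a read pool of slaves only, while /db/manage/server/ha/master selects the single write target.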
HA JMX Endpoint
JSON Response
• Alive?
• Role
• Last committed transaction ID
• Instances in cluster
  – Role
  – Instance ID
  – Available?
  – URI
Identify slaves falling behind
Does everyone agree on composition of cluster?
/db/manage/server/jmx/domain/org.neo4j/instance%3Dkernel%230%2Cname%3DHigh%20Availability
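The two checks named above (slaves falling behind, and agreement on cluster composition) could be sketched as follows. The record layout with keys `id`, `role` and `last_tx` is a simplified, hypothetical shape chosen for illustration, and it is not the actual JMX response format. A real monitor would first extract these fields from each instance's JSON response.

```python
def lagging_slaves(instances, max_lag):
    """Return {instance id: lag} for slaves more than `max_lag`
    transactions behind the master's last committed transaction ID."""
    masters = [i for i in instances if i["role"] == "master"]
    if not masters:
        raise ValueError("no master found in the supplied instances")
    master_tx = max(m["last_tx"] for m in masters)
    lags = {}
    for inst in instances:
        if inst["role"] == "slave":
            lag = master_tx - inst["last_tx"]
            if lag > max_lag:
                lags[inst["id"]] = lag
    return lags


def cluster_views_agree(views):
    """`views` maps an instance id to the set of member ids it reports.
    The cluster agrees when every instance reports the same membership."""
    return len({frozenset(v) for v in views.values()}) <= 1
```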
Cross-DC Clusters
• Same subnet (consider using a VPN)
• Bandwidth between DCs aligned with write throughput
• Common practice: instances in secondary run as slave-only
• Restricts master election to the primary
• When failing over, reconfigure instances in secondary

ha.slave_only=true (neo4j.properties)
ha.slave_only=false (neo4j.properties)
Multi-Region Clusters in AWS
All Versions
• Recommended: Amazon VPC (Virtual Private Cloud)
2.3 Enterprise
• 2.3 supports multi-region clusters with no additional infrastructure
• Use public DNS names rather than IP addresses in ha.initial_hosts, ha.server and ha.cluster_server
• Warning: uses public internet
Scale Horizontally For High Read Throughput
[Diagram: Application → Load Balancer (HAProxy, ELB, NGINX) → Master, Slave, Slave]
[Diagram: Application → Read Load Balancer and Write Load Balancer → Master, Slave, Slave]
HAProxy Configuration
http://blog.armbruster-it.de/2015/08/neo4j-and-haproxy-some-best-practices-and-tricks/
Configure HAProxy as Read Load Balancer

global
    daemon
    maxconn 256

defaults
    mode http
    timeout connect 5000ms
    timeout client 50000ms
    timeout server 50000ms

frontend http-in
    bind *:80
    default_backend neo4j-slaves

backend neo4j-slaves
    option httpchk GET /db/manage/server/ha/slave
    server s1 10.0.1.10:7474 maxconn 32 check
    server s2 10.0.1.11:7474 maxconn 32 check
    server s3 10.0.1.12:7474 maxconn 32 check

listen admin
    bind *:8080
    stats enable
Health check responses from /db/manage/server/ha/slave:
Master   404 Not Found  false
Slave    200 OK         true
Unknown  404 Not Found  UNKNOWN
Improve Read Performance with Cache Sharding
[Diagram: Application → Load Balancer → instances 1, 2, 3]
MATCH (c:Country{name:'Australia'})...
MATCH (c:Country{name:'Zambia'})...
MATCH (c:Country{name:'Norway'})...

Cache Sharding Using Consistent Routing
[Diagram: Application → Load Balancer → instances 1, 2, 3]
Routing by country name: A-I → 1, J-R → 2, S-Z → 3
MATCH (c:Country{name:'Australia'})... → 1
MATCH (c:Country{name:'Norway'})... → 2
MATCH (c:Country{name:'Zambia'})... → 3
Configure HAProxy for Cache Sharding

global
    daemon
    maxconn 256

defaults
    mode http
    timeout connect 5000ms
    timeout client 50000ms
    timeout server 50000ms

frontend http-in
    bind *:80
    default_backend neo4j-slaves

backend neo4j-slaves
    balance url_param country_code
    server s1 10.0.1.10:7474 maxconn 32
    server s2 10.0.1.11:7474 maxconn 32
    server s3 10.0.1.12:7474 maxconn 32

listen admin
    bind *:8080
    stats enable
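As an illustration of consistent routing, the sketch below sends each country to a fixed instance using the A-I / J-R / S-Z ranges from the slide. The `route` function and its fallback to instance 1 for names outside A-Z are assumptions made for this example. HAProxy's `balance url_param` reaches the same goal differently, by hashing the parameter value instead of using letter ranges.

```python
# Letter ranges from the slide: (first letter from, first letter to, instance).
SHARDS = [("A", "I", 1), ("J", "R", 2), ("S", "Z", 3)]


def route(country_name):
    """Pick the instance for a country so that repeated queries for the
    same country always reach the same (already warm) cache."""
    first = country_name.strip()[:1].upper()
    for low, high, instance in SHARDS:
        if low <= first <= high:
            return instance
    # Assumption: anything outside A-Z falls back to instance 1.
    return 1


targets = {name: route(name) for name in ("Australia", "Norway", "Zambia")}
```

Because the mapping is a pure function of the key, each instance only needs to keep its own slice of the graph hot in memory.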
Backups
Modes
• Full
• Incremental
• On top of a previous backup
• Uses logical logs to apply changes, so logs must be kept at least 2x backup interval
Consistency Check
• Backup and standalone tool
• Evaluate store health
• Part of backup and standalone tool
• -verify false to disable in backup

keep_logical_logs=7 days (neo4j.properties)
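The retention rule above (keep logical logs for at least twice the backup interval) can be checked mechanically. This is a small sketch with hypothetical function names; only the 2x factor comes from the slide.

```python
from datetime import timedelta


def min_log_retention(backup_interval):
    """Logs must be kept for at least 2x the backup interval."""
    return 2 * backup_interval


def retention_ok(keep_logical_logs, backup_interval):
    """True when the configured retention covers the rule."""
    return keep_logical_logs >= min_log_retention(backup_interval)


# keep_logical_logs=7 days comfortably covers hourly backups.
hourly_ok = retention_ok(timedelta(days=7), timedelta(hours=1))
```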
Backup Strategies
• Local or remote backups
• If backing up to remote machine, consistency check takes place offline with respect to the database
• Backup from a dedicated slave or round robin
• Choose a schedule:
  – Full once per day, incremental every hour
• To restore from backup:
  – Stop instance
  – Replace graph.db with backup
  – Start instance
Backup Strategies
[Diagram: Backup Server backing up instances A, B, C]
A – full, consistency check
B – full, consistency check
C – full, consistency check
A – incremental
B – incremental
C – incremental
…
A – incremental
B – incremental
C – incremental
A – full, consistency check
B – full, consistency check
C – full, consistency check

bin/neo4j-backup \
  -from single://neo4j.example.org:20000 \
  -to /backups/201510151318263/graph.db \
  -verify true|false
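A scheduler script might assemble the backup command shown above roughly like this. The sketch uses only the flags from the slide (-from, -to, -verify). The `backup_command` helper and the `%Y%m%d%H%M%S` directory naming are assumptions for illustration, and nothing is executed here.

```python
from datetime import datetime


def backup_command(host, port, backup_root, when, verify=True):
    """Build the argument list for bin/neo4j-backup.

    The target directory is named after the backup time; the exact
    naming scheme is an assumption made for this example.
    """
    target = "{}/{}/graph.db".format(
        backup_root.rstrip("/"), when.strftime("%Y%m%d%H%M%S")
    )
    return [
        "bin/neo4j-backup",
        "-from", "single://{}:{}".format(host, port),
        "-to", target,
        "-verify", "true" if verify else "false",
    ]


cmd = backup_command(
    "neo4j.example.org", 20000, "/backups",
    datetime(2015, 10, 15, 13, 18, 26), verify=False,
)
```

The resulting list can be passed to `subprocess.run` from the Neo4j installation directory; skipping verification this way suits the incremental runs, with the consistency check reserved for full backups.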
Monitoring
Pull
• Metrics available via JMX and HTTP and in browser
Push
• Metrics publishing included in 2.3 (Enterprise)
• Node, relationship, property counts
• HA network usage
• Transactions (active, started, committed, rolled back, etc)
• Neo4j page cache (page faults, evictions, flushes, exceptions)
• JVM
• Published to:
  – Graphite
  – Ganglia
  – CSV

metrics.graphite.enabled=true
metrics.graphite.server=52.29.63.174:2003
metrics.prefix=neo4j-1
(neo4j.properties)
Collate Internal and External Views of the System
System
• collectd
Database
• Metrics
• Tail messages.log
HA Endpoints
• /db/manage/server/ha/master
• /db/manage/server/ha/slave
Server Latencies
• http.log
Cypher Queries
• dbms.querylog.enabled=true
• dbms.querylog.threshold=2s
Application metrics
• End-to-end latencies
Test at Scale
Soak Tests
• Representative dataset and queries
• Peak load and above
Verify
• Correctness
• Performance
• Latency
• Throughput
• Stability
Operations
• Backup
• Disaster recovery
• Replace instances
Performance Tips – Use the Cypher Query Planner
[Query plans: 8,386,880 hits, 59,272 hits]
CREATE INDEX ON :Crime(description)
Performance Tips – JVM
• Look for GC pauses in messages.log
• grep blocked data/graph.db/messages.log
• Caused by
  – Heap too small
  – New/survivor space too small
  – Badly written Cypher query or unmanaged extension
Enable GC Logging
Log will be written to data/log/neo4j-gc.log

wrapper.java.additional=-Xloggc:data/log/neo4j-gc.log
wrapper.java.additional=-XX:+PrintGCDetails
wrapper.java.additional=-XX:+PrintGCDateStamps
wrapper.java.additional=-XX:+PrintGCApplicationStoppedTime
wrapper.java.additional=-XX:+PrintTenuringDistribution
wrapper.java.additional=-XX:+PrintGCCause
(neo4j-wrapper.conf)
Performance Tips – Unmanaged Extensions
• Single request, many operations
• Reduce network latencies
• Multiple implementation options
• Cypher
• Traversal Framework
• Graph Algo Package
• Core API
• Control Request/Response Format
• JSON, CSV, protobuf, etc
• Domain-specific representations
• Compact
• Conserve bandwidth
• HTTP Headers
[Diagram: Application → Extension]
Performance Tips – Write Requests
• Align the number of concurrent write requests with the number of Neo4j server threads on the master
• By default, number of server threads = number of CPUs reported available by the JVM
• Configure the number of threads in neo4j-server.properties using org.neo4j.server.webserver.maxthreads
• Service requests from a thread pool in your application
• Use the thread pool queue depth to apply back pressure
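One way to apply back pressure from a client-side thread pool is sketched below. Python's `ThreadPoolExecutor` has an unbounded internal queue, so this example bounds the number of outstanding tasks with a semaphore. The `BoundedWriter` class and its parameters are hypothetical, and the function passed to `submit` stands in for whatever code issues the write request.

```python
import threading
from concurrent.futures import ThreadPoolExecutor


class BoundedWriter:
    """Thread pool that rejects work once `workers + queue_depth`
    tasks are running or waiting, so callers can back off."""

    def __init__(self, workers, queue_depth):
        self._pool = ThreadPoolExecutor(max_workers=workers)
        self._slots = threading.BoundedSemaphore(workers + queue_depth)

    def submit(self, fn, *args):
        """Return a Future, or None when the pool is saturated."""
        if not self._slots.acquire(blocking=False):
            return None
        try:
            future = self._pool.submit(fn, *args)
        except Exception:
            self._slots.release()
            raise
        # Free the slot when the task finishes, whatever its outcome.
        future.add_done_callback(lambda _f: self._slots.release())
        return future

    def shutdown(self):
        self._pool.shutdown(wait=True)
```

Setting `workers` to the number of server threads on the master keeps the number of in-flight writes aligned with what the server can service, and a `None` result is the signal to retry later or shed load.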
Performance Tips – Batch Writes Using a Queue
[Diagram: multiple Write requests → Queue → Single Thread → Batch]
http://maxdemarzi.com/2013/09/05/scaling-writes/
http://maxdemarzi.com/2014/07/01/scaling-concurrent-writes-in-neo4j/
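The queue-and-single-writer pattern in the diagram could look like the following sketch. It is a generic illustration and not the implementation from the linked posts. `batch_writer`, `STOP` and the `write_batch` callback are hypothetical names, and `write_batch` stands in for code that sends one batch to Neo4j in a single transaction.

```python
import queue
import threading

STOP = object()  # sentinel that tells the writer thread to finish


def batch_writer(q, write_batch, max_batch=100):
    """Drain `q` from a single thread, grouping queued writes into
    batches of at most `max_batch` items."""
    while True:
        item = q.get()          # block until there is something to write
        if item is STOP:
            return
        batch = [item]
        finished = False
        while len(batch) < max_batch:
            try:
                nxt = q.get_nowait()   # take whatever is already queued
            except queue.Empty:
                break
            if nxt is STOP:
                finished = True
                break
            batch.append(nxt)
        write_batch(batch)
        if finished:
            return
```

Request threads only enqueue their writes and return, while the single writer thread turns many small writes into fewer, larger transactions.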
Thank You