NGCC 2016 - Support large partitions
Storage format and key cache changes to support large partitions
ROBERT STUPP, DATASTAX
SOLUTION ARCHITECT, COMMITTER
Read path recap
1. Bloom filter
2. Index summary
3. Primary index
4. Data file
RowIndexEntry
◦ Points to partition in data file
◦ Only for partitions < 64 kB
IndexedEntry extends RowIndexEntry
◦ Points to partition in data file
◦ Only for partitions > 64 kB
◦ Contains one IndexInfo object per 64 kB
IndexedEntry
[Diagram: object tree of an IndexedEntry]
IndexedEntry extends RowIndexEntry
• DeletionTime
• ArrayList of IndexInfo (one IndexInfo per 64 kB)
◦ DeletionTime
◦ BufferClustering: Kind, ByteBuffer[] → ByteBuffer → byte[]
◦ … BufferClustering: Kind, ByteBuffer[] → ByteBuffer → …
Approximation on 16-byte clustering value:
• 1 MB: 3 kB / > 200 objects
• 4 MB: 11 kB / > 800 objects
• 64 MB: 180 kB / > 13k objects
• 512 MB: 1.4 MB / > 106k objects
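The object counts above scale with one IndexInfo per 64 kB of partition data. As a rough sketch (not the author's own calculation), the snippet below counts the IndexInfo entries for each partition size and multiplies by an assumed 13 objects per entry. That per-entry figure is not stated on the slides; it is picked here only because it lands just above each of the listed object counts.

```python
import math

INDEX_BLOCK_BYTES = 64 * 1024   # one IndexInfo per 64 kB (from the slides)
OBJECTS_PER_INDEX_INFO = 13     # assumption: not stated on the slides


def index_info_count(partition_bytes):
    """Number of IndexInfo entries needed to cover a partition."""
    return math.ceil(partition_bytes / INDEX_BLOCK_BYTES)


def approx_object_count(partition_bytes):
    """Rough heap object count for the IndexInfo list of one partition."""
    return index_info_count(partition_bytes) * OBJECTS_PER_INDEX_INFO


if __name__ == "__main__":
    for mb in (1, 4, 64, 512):
        size = mb * 1024 * 1024
        print(f"{mb} MB: {index_info_count(size)} IndexInfo, "
              f"~{approx_object_count(size)} objects")
```

The point of the estimate is the linear growth: every additional 64 kB of partition data adds another small cluster of heap objects.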
Issues with IndexedEntry
• IndexedEntry object tree built during flush/compaction
• IndexedEntry object tree constructed for every read
• Huge number of objects → GC, GC, GC, …
• Nested object structure: harder for garbage collection
• Evicts “legit” entries from the key cache on reads → more disk I/O
Initial approach
• IndexInfo never kept on heap
• Read from disk when needed
• Causes non-negligible performance degradation w/ trades workload
Current approach
• IndexInfo kept on heap if serialized size of IndexedEntry < column_index_cache_size_in_kb
• Otherwise always read from disk when needed
• Write path (flush, compaction) similar:
◦ IndexInfo kept on heap (for key cache) if < column_index_cache_size_in_kb
◦ Otherwise serialized to a buffer (not kept as an object)
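The threshold rule can be pictured with a minimal sketch. The function name `index_storage_mode` and the return values are hypothetical and do not come from the patch; treating 1 kB as 1024 bytes is also an assumption.

```python
def index_storage_mode(serialized_size_bytes, column_index_cache_size_in_kb):
    """Decide where the IndexInfo entries of one IndexedEntry live.

    Returns "heap" when the serialized entry is below the configured
    threshold, otherwise "disk" (entries are read only when needed).
    Hypothetical helper for illustration only.
    """
    threshold_bytes = column_index_cache_size_in_kb * 1024  # assumes 1 kB = 1024 B
    return "heap" if serialized_size_bytes < threshold_bytes else "disk"


if __name__ == "__main__":
    # Threshold of 4 kB is an arbitrary example value, not a recommendation.
    for size_kb in (3, 11, 180):
        print(size_kb, "kB ->", index_storage_mode(size_kb * 1024, 4))
```

With an example threshold of 4 kB, a 3 kB entry stays on heap while 11 kB and 180 kB entries are read from disk on demand.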
Read patterns
• Binary search
• IndexInfo objects revisited by the same “consumer”
• Sequential reads (not using index)
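To illustrate the binary-search pattern without materializing every IndexInfo, here is a sketch that searches a serialized index buffer and decodes only the records it probes. The fixed-width record layout (`RECORD`), the helper names, and the use of 16-byte big-endian keys are all assumptions made for this example; the real on-disk format differs.

```python
import struct

# Hypothetical fixed-width record: first clustering (16 B), last clustering
# (16 B), offset of the block in the data file, width of the block.
RECORD = struct.Struct(">16s16sQQ")


def key(n):
    """Encode an integer as a 16-byte big-endian clustering value."""
    return n.to_bytes(16, "big")


def serialize_index(blocks):
    """Pack (first, last, offset, width) tuples into one bytes buffer."""
    return b"".join(RECORD.pack(*b) for b in blocks)


def find_block(buf, target):
    """Binary-search the serialized buffer for the block that may hold target.

    Returns (record, probes). record is None if target sorts after the last
    block; probes counts the records decoded during the search loop.
    """
    count = len(buf) // RECORD.size
    lo, hi, probes = 0, count, 0
    while lo < hi:
        mid = (lo + hi) // 2
        _first, last, _offset, _width = RECORD.unpack_from(buf, mid * RECORD.size)
        probes += 1
        if last < target:
            lo = mid + 1   # target lies in a later block
        else:
            hi = mid       # this block or an earlier one
    if lo == count:
        return None, probes
    return RECORD.unpack_from(buf, lo * RECORD.size), probes


if __name__ == "__main__":
    blocks = [(key(i * 100), key(i * 100 + 99), i * 65536, 65536)
              for i in range(1024)]
    buf = serialize_index(blocks)
    record, probes = find_block(buf, key(12345))
    print(record[2] // 65536, probes)
```

For 1024 index blocks the search decodes at most 11 records during the loop, instead of building 1024 objects up front.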
What does it buy?
• Less heap pressure during reads
• Less heap pressure during flushes/compactions
• Tested functionality (write, read, full compaction) with 8 GB partitions in a utest (w/ tiny heap)
• Local node load test w/ 280 MB partitions
• GCE (5 node n1-standard-8) cluster load test w/ 770 MB partitions
◦ Cluster constrained by disk I/O + net I/O
Large partitions considerations
• Depending on workload
◦ Increase commitlog_segment_size_in_mb + commitlog_total_space_in_mb
◦ Increase concurrent_compactors (default of 2 might be a bottleneck) and/or compaction_throughput
• Key cache can hold “tons” of partition information
• Repairs still take time (don’t seem to be negatively influenced by the patch)
• Compactions & flushes cause less heap pressure
• Recommendations on max amount of data per node still apply IMO
• Large partitions do not mean large CQL rows (native protocol response buffer)
Findings, Suggestions, Improvements … FOR DISCUSSION TOMORROW
Biggest issue during tests: Endless CMS-GC loops
Reading large partitions results in large responses (duh!)
Concurrent large responses lead to direct-memory OOM → causes “endless” CMS-GC loop → dead node (thank you, ByteBuffer!)
• Solution part #1: separate off-heap memory pool in Netty (4.0.37 + 4.1.1)
• Solution part #2: separate off-heap memory pool in C*
An overloaded 2.2 node recovers from this. (Less direct memory usage: that simple?)
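The idea behind a separate, capped memory pool can be sketched as a simple byte budget: a response reserves its size before buffers are allocated, and the reservation fails fast once the cap is reached instead of exhausting direct memory. `ResponseMemoryBudget` is a hypothetical class for illustration; it is not how Netty or C* implement their pools.

```python
import threading


class ResponseMemoryBudget:
    """Hypothetical byte budget for in-flight responses (illustration only)."""

    def __init__(self, capacity_bytes):
        self._capacity = capacity_bytes
        self._used = 0
        self._lock = threading.Lock()

    def try_reserve(self, nbytes):
        """Reserve nbytes if the budget allows it; never over-allocate."""
        with self._lock:
            if self._used + nbytes > self._capacity:
                return False
            self._used += nbytes
            return True

    def release(self, nbytes):
        """Return nbytes to the budget once the response has been sent."""
        with self._lock:
            self._used = max(0, self._used - nbytes)

    @property
    def used(self):
        with self._lock:
            return self._used


if __name__ == "__main__":
    budget = ResponseMemoryBudget(capacity_bytes=100)
    print(budget.try_reserve(60))  # first large response fits
    print(budget.try_reserve(60))  # second one is rejected, not allocated
    budget.release(60)
    print(budget.try_reserve(60))  # fits again after the first is released
```

A caller that gets `False` back can queue or reject the request, which keeps a burst of concurrent large responses from taking the whole node down.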
Issues during local tests: ME! “How the hell do I set up monitoring?”
• Graphite + Whisper → Python dependency hell → too complicated
• Prometheus → really nice! → had to write a “native” exporter (https://github.com/snazy/prometheus-metrics-exporter)
• Grafana → cool!
Gatling
• GatlingCql, initially by Mikhail Stepura: https://github.com/gatling-cql/GatlingCql
Gatling 2.2.1 + C* driver 3.0.2
Gatling itself emits metrics during
Contributions welcome!
Findings / default configs
• Change default of concurrent_reads/writes/counter_writes/mv_writes from 32 to # of CPUs
• Change default of native_transport_threads from 128 to 2 * CPUs
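The proposed defaults are simple functions of the CPU count. A minimal sketch follows; `suggested_defaults` is a hypothetical helper, and the dictionary keys mirror the shorthand used on the slide, so they may not match the exact option names in cassandra.yaml.

```python
import os


def suggested_defaults(cpus=None):
    """Compute the defaults proposed on the slide from the CPU count.

    Keys follow the slide's shorthand (an assumption), not necessarily
    the exact cassandra.yaml option names.
    """
    cpus = cpus or os.cpu_count() or 1
    return {
        "concurrent_reads": cpus,
        "concurrent_writes": cpus,
        "concurrent_counter_writes": cpus,
        "concurrent_mv_writes": cpus,
        "native_transport_threads": 2 * cpus,
    }


if __name__ == "__main__":
    for name, value in suggested_defaults().items():
        print(f"{name}: {value}")
```

On an 8-CPU machine this yields 8 for each concurrent_* setting and 16 transport threads, compared with the fixed 32 and 128 mentioned above.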
Issues during local tests: resource consumers?
• No “lightweight” metrics to measure CPU and heap consumption of thread groups
• Solution option #1: integrate in our pools (would miss some threads)
• Solution option #2: use prometheus-metrics-exporter (https://github.com/snazy/prometheus-metrics-exporter) + C* patch