(SDD411) Amazon CloudSearch Deep Dive and Best Practices | AWS re:Invent 2014

description

Amazon CloudSearch is a fully managed search service in the cloud that lets you quickly set up and use a search solution for your application. The latest version of CloudSearch adds many new search and administrative features. This session covers how to design for high scale at low cost, as well as best practices for handling multiple languages, ranking your search results, securing your CloudSearch domains, achieving cost-effective multi-tenancy, sourcing from many different systems, and getting the most out of your CloudSearch instances.

Transcript of (SDD411) Amazon CloudSearch Deep Dive and Best Practices | AWS re:Invent 2014

Page 1: (SDD411) Amazon CloudSearch Deep Dive and Best Practices | AWS re:Invent 2014
Page 2: (SDD411) Amazon CloudSearch Deep Dive and Best Practices | AWS re:Invent 2014
Page 3: (SDD411) Amazon CloudSearch Deep Dive and Best Practices | AWS re:Invent 2014
Page 4: (SDD411) Amazon CloudSearch Deep Dive and Best Practices | AWS re:Invent 2014

Pro tip

Page 5: (SDD411) Amazon CloudSearch Deep Dive and Best Practices | AWS re:Invent 2014
Page 6: (SDD411) Amazon CloudSearch Deep Dive and Best Practices | AWS re:Invent 2014
Page 7: (SDD411) Amazon CloudSearch Deep Dive and Best Practices | AWS re:Invent 2014
Page 8: (SDD411) Amazon CloudSearch Deep Dive and Best Practices | AWS re:Invent 2014

Amazon CloudSearch

Actions

Upload

Page 9: (SDD411) Amazon CloudSearch Deep Dive and Best Practices | AWS re:Invent 2014
Page 10: (SDD411) Amazon CloudSearch Deep Dive and Best Practices | AWS re:Invent 2014

_convert_tweet(l)

Page 11: (SDD411) Amazon CloudSearch Deep Dive and Best Practices | AWS re:Invent 2014

Pro tip

Page 12: (SDD411) Amazon CloudSearch Deep Dive and Best Practices | AWS re:Invent 2014

_convert_tweet(r)
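
The slides name a `_convert_tweet` helper but the transcript does not include its body. As a minimal sketch only, assuming each tweet is a dict with `id` and `text` keys (hypothetical input shape) and that the target is the JSON document batch format with `type`, `id`, and `fields` entries, a conversion could look like this:

```python
import json


def convert_tweet(tweet):
    """Map one tweet dict to an 'add' operation for a document batch.

    The input keys ("id", "text") are assumptions, not taken from the slides.
    """
    return {
        "type": "add",
        "id": str(tweet["id"]),
        "fields": {"text": tweet["text"]},
    }


def build_batch(tweets):
    """Serialize a list of tweets as one JSON document batch."""
    return json.dumps([convert_tweet(t) for t in tweets])


batch = build_batch(
    [{"id": 523254764427952129, "text": "idk if its yummy or what lol im hungry"}]
)
```

Grouping many tweets into one batch keeps the number of upload requests low; the batch size to aim for is not stated in the transcript.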

Page 13: (SDD411) Amazon CloudSearch Deep Dive and Best Practices | AWS re:Invent 2014

cloudsearchdomain

Page 14: (SDD411) Amazon CloudSearch Deep Dive and Best Practices | AWS re:Invent 2014
Page 15: (SDD411) Amazon CloudSearch Deep Dive and Best Practices | AWS re:Invent 2014
Page 16: (SDD411) Amazon CloudSearch Deep Dive and Best Practices | AWS re:Invent 2014

AWS CloudTrail

{
  "eventVersion": "1.01",
  "userIdentity": {"type": "Root", "principalId": "...", "arn": "...", "accountId": "...", "accessKeyId": "..."},
  "eventTime": "2014-10-27T20:53:07Z",
  "eventSource": "cloudsearch.amazonaws.com",
  "eventName": "DescribeDomains",
  "awsRegion": "us-east-1",
  "sourceIPAddress": "...",
  "userAgent": "aws-sdk-java/unknown-version Linux/2.6.18-164.el5 Java_HotSpot(TM)_64-Bit_Server_VM/23.25-b01/1.7.0_25",
  "requestParameters": {"domainNames": ["twitter-geo"]},
  "responseElements": null,
  "requestID": "40d6953b-5e1b-11e4-ae8f-97e54e307088",
  "eventID": "9835fa54-b8d3-4fb0-ac6e-ef1403069f7b"
},
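
A record like the one above can be filtered programmatically. The following is a rough sketch, assuming the records have already been loaded into a list of dicts (the loading step and the `cloudsearch_events` helper name are hypothetical); it keeps only events whose `eventSource` is CloudSearch and returns the time, action, and request parameters.

```python
def cloudsearch_events(records):
    """Return (eventTime, eventName, requestParameters) for CloudSearch records."""
    return [
        (r["eventTime"], r["eventName"], r.get("requestParameters"))
        for r in records
        if r.get("eventSource") == "cloudsearch.amazonaws.com"
    ]


# Sample input: the first record uses field values from the slide;
# the second is a made-up record from another service, for contrast.
records = [
    {
        "eventTime": "2014-10-27T20:53:07Z",
        "eventSource": "cloudsearch.amazonaws.com",
        "eventName": "DescribeDomains",
        "requestParameters": {"domainNames": ["twitter-geo"]},
    },
    {
        "eventTime": "2014-10-27T20:54:00Z",
        "eventSource": "example.amazonaws.com",
        "eventName": "SomethingElse",
        "requestParameters": None,
    },
]

events = cloudsearch_events(records)
```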

Page 17: (SDD411) Amazon CloudSearch Deep Dive and Best Practices | AWS re:Invent 2014

Pro tip

Page 18: (SDD411) Amazon CloudSearch Deep Dive and Best Practices | AWS re:Invent 2014
Page 19: (SDD411) Amazon CloudSearch Deep Dive and Best Practices | AWS re:Invent 2014
Page 20: (SDD411) Amazon CloudSearch Deep Dive and Best Practices | AWS re:Invent 2014
Page 21: (SDD411) Amazon CloudSearch Deep Dive and Best Practices | AWS re:Invent 2014

Pro tip

Page 22: (SDD411) Amazon CloudSearch Deep Dive and Best Practices | AWS re:Invent 2014

[Figure: instance scaling with increasing data: Small, Large, XLarge, 2XLarge, then 2XLarge (P1) and 2XLarge (P2)]

Page 23: (SDD411) Amazon CloudSearch Deep Dive and Best Practices | AWS re:Invent 2014

Instance type        Twitter data (search only)          Common-crawl data (search only)

search.m1.small      6.7 GB, 4.7 million documents       4 GB, 625 K documents

search.m1.large      26.8 GB, 18.8 million documents     16 GB, 2.5 million documents

search.m2.xlarge     53.6 GB, 37.6 million documents     34 GB, 5 million documents

search.m2.2xlarge*   107.2 GB, 75.2 million documents    64 GB, 10 million documents
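
The capacity figures above can be turned into a rough sizing estimate. This is only a back-of-the-envelope sketch: it uses the Twitter-data document counts from the table, assumes that data beyond one search.m2.2xlarge is split evenly across partitions, and uses hypothetical helper names. Real capacity depends on document size and index options.

```python
import math

# Approximate documents per instance, Twitter data column of the table above.
TWITTER_CAPACITY = [
    ("search.m1.small", 4_700_000),
    ("search.m1.large", 18_800_000),
    ("search.m2.xlarge", 37_600_000),
    ("search.m2.2xlarge", 75_200_000),
]


def estimate_size(document_count):
    """Return (instance_type, partitions) as a rough estimate for a document count."""
    for instance_type, capacity in TWITTER_CAPACITY:
        if document_count <= capacity:
            return instance_type, 1
    # Beyond the largest type, assume even partitioning on 2xlarge instances.
    largest_type, largest_capacity = TWITTER_CAPACITY[-1]
    return largest_type, math.ceil(document_count / largest_capacity)


small_index = estimate_size(10_000_000)
large_index = estimate_size(200_000_000)
```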

Page 24: (SDD411) Amazon CloudSearch Deep Dive and Best Practices | AWS re:Invent 2014
Page 25: (SDD411) Amazon CloudSearch Deep Dive and Best Practices | AWS re:Invent 2014

                   No options   All options   Highlight   Return   Sort    Facet

Partitions         5 2xl        7 2xl         7 2xl       5 2xl    5 2xl   5 2xl

Percent increase   0%           243%          220.8%      153.2%   12.7%   0.3%

Page 26: (SDD411) Amazon CloudSearch Deep Dive and Best Practices | AWS re:Invent 2014

Instance type Instance threads Connecting threads

search.m1.small 2 1

search.m1.large 5 3

search.m2.xlarge 9 5

search.m2.2xlarge* 17 9
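
One way to apply the table above on the client side is to size a thread pool from the "Connecting threads" column. The sketch below assumes that column is the suggested number of concurrent client connections, which the transcript does not state explicitly; `upload_batch` is a hypothetical stand-in for a real upload call.

```python
from concurrent.futures import ThreadPoolExecutor

# "Connecting threads" column from the table above.
CONNECTING_THREADS = {
    "search.m1.small": 1,
    "search.m1.large": 3,
    "search.m2.xlarge": 5,
    "search.m2.2xlarge": 9,
}


def upload_batch(batch):
    """Hypothetical stand-in for one document-batch upload; returns the batch size."""
    return len(batch)


def upload_all(batches, instance_type):
    """Upload batches using as many worker threads as the table lists."""
    workers = CONNECTING_THREADS[instance_type]
    with ThreadPoolExecutor(max_workers=workers) as pool:
        # pool.map preserves the input order of the batches.
        return list(pool.map(upload_batch, batches))


sent = upload_all([["a", "b"], ["c"], ["d", "e", "f"]], "search.m1.large")
```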

Page 27: (SDD411) Amazon CloudSearch Deep Dive and Best Practices | AWS re:Invent 2014
Page 28: (SDD411) Amazon CloudSearch Deep Dive and Best Practices | AWS re:Invent 2014
Page 29: (SDD411) Amazon CloudSearch Deep Dive and Best Practices | AWS re:Invent 2014

Pro tip

Page 30: (SDD411) Amazon CloudSearch Deep Dive and Best Practices | AWS re:Invent 2014

[Figure: grid of search instances, from Index Partition 1, Replica 1 through Index Partition n, Replica n; axis label: Search request volume and complexity]

Page 31: (SDD411) Amazon CloudSearch Deep Dive and Best Practices | AWS re:Invent 2014

Instance type       Instance threads   JMeter                 Twitter throughput   Com crawl throughput

search.m1.small     20                 2 hosts, 10 threads    25.1 qps, 397 ms     48.3 qps, 206 ms

search.m1.large     20                 4 hosts, 20 threads    108.5 qps, 183 ms    291.5 qps, 68 ms

search.m2.xlarge    20                 8 hosts, 40 threads    419.6 qps, 94 ms     665.9 qps, 59 ms

search.m2.2xlarge   20                 16 hosts, 80 threads   566.4 qps, 140 ms    985.3 qps, 80 ms
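
The throughput figures above suggest a simple replica estimate for a target query rate. The sketch below assumes throughput scales linearly with the number of replicas, which the transcript does not claim, and uses the measured queries per second from the table; the function name is hypothetical.

```python
import math

# Measured queries per second per instance, from the table above.
THROUGHPUT_QPS = {
    "search.m1.small": {"twitter": 25.1, "common_crawl": 48.3},
    "search.m1.large": {"twitter": 108.5, "common_crawl": 291.5},
    "search.m2.xlarge": {"twitter": 419.6, "common_crawl": 665.9},
    "search.m2.2xlarge": {"twitter": 566.4, "common_crawl": 985.3},
}


def replicas_needed(instance_type, target_qps, dataset):
    """Estimate replicas for a target query rate, assuming linear scaling."""
    per_instance = THROUGHPUT_QPS[instance_type][dataset]
    return max(1, math.ceil(target_qps / per_instance))


twitter_replicas = replicas_needed("search.m1.large", 500, "twitter")
crawl_replicas = replicas_needed("search.m1.large", 500, "common_crawl")
```

The two datasets give different answers for the same target rate, so an estimate like this is only meaningful for queries and documents similar to the ones measured.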

Page 32: (SDD411) Amazon CloudSearch Deep Dive and Best Practices | AWS re:Invent 2014

[Figure: search.m1.small instances: Index Partition 1, Replica 1; Index Partition 1, Replica 1; Index Partition 1, Replica 2; Index Partition 1, Replica 3]

Page 33: (SDD411) Amazon CloudSearch Deep Dive and Best Practices | AWS re:Invent 2014
Page 34: (SDD411) Amazon CloudSearch Deep Dive and Best Practices | AWS re:Invent 2014

{"status": {"rid": "i8TQupgpEQocRhU=", "time-ms": 3},
 "hits": {"found": 9234, "start": 0,
  "hit": [
   {
    "id": "523254764427952129",
    "fields": {
     "text": "idk if its yummy or what lol im hungry"
    }
   },...

Page 35: (SDD411) Amazon CloudSearch Deep Dive and Best Practices | AWS re:Invent 2014

{"status": {"rid": "lPfcupgpFAocRhU=", "time-ms": 4},
 "hits": {"found": 6235, "start": 0,
  "hit": [
   {
    "id": "523260481096540160",
    "fields": {
     "text": "idk what it is but ... something's different"
    }
   }, ...

Page 36: (SDD411) Amazon CloudSearch Deep Dive and Best Practices | AWS re:Invent 2014

{"status": {"rid": "9MPvupgpFwocRhU=", "time-ms": 2},
 "hits": {"found": 8997, "start": 0,
  "hit": [
   {
    "id": "523303605575909376",
    "fields": {
     "text": "Idk ... Idk idk idk idk idk idk"
    }
   },

Page 37: (SDD411) Amazon CloudSearch Deep Dive and Best Practices | AWS re:Invent 2014

{"status": {"rid": "+r6Wh5gpBgocRhU=", "time-ms": 178},
 "hits": {"found": 78, "start": 0,
  "hit": [
   {
    "id": "523341488005345280",
    "fields": {
     "text": "I love talking baseball with my dad"
    }
   },...
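
The responses on the preceding pages share one shape: a `status` object with `time-ms` and a `hits` object with `found`, `start`, and a `hit` list. As a small sketch, assuming the response has already been parsed into a dict (the `summarize` helper is hypothetical, and the sample completes the truncated slide output with a single hit), the key numbers can be pulled out like this:

```python
def summarize(response):
    """Extract server time, total matches, and hit texts from a search response."""
    hits = response["hits"]
    return {
        "time_ms": response["status"]["time-ms"],
        "found": hits["found"],
        "texts": [h["fields"]["text"] for h in hits["hit"]],
    }


# Sample built from the values shown on the slide; only the first hit is known.
sample = {
    "status": {"rid": "+r6Wh5gpBgocRhU=", "time-ms": 178},
    "hits": {
        "found": 78,
        "start": 0,
        "hit": [
            {
                "id": "523341488005345280",
                "fields": {"text": "I love talking baseball with my dad"},
            }
        ],
    },
}

summary = summarize(sample)
```

Logging `time_ms` next to each query is one way to compare latencies such as the 2 to 4 ms and 178 ms values shown in these responses.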

Page 38: (SDD411) Amazon CloudSearch Deep Dive and Best Practices | AWS re:Invent 2014

Pro tip

Page 39: (SDD411) Amazon CloudSearch Deep Dive and Best Practices | AWS re:Invent 2014

[Chart: total query latency in milliseconds (axis 0 to 250) by query condition: q=, fq=, fq= (10 queries); series: p50, average, p90]
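
The chart compares queries issued with `q=` against queries that use `fq=`. The sketch below shows how the two request forms differ when built by hand against the 2013-01-01 search API path; the endpoint host and the `lang` field are hypothetical, and the filter expression is just one example of structured filter syntax.

```python
from urllib.parse import urlencode

# Hypothetical search endpoint; a real domain has its own host name.
SEARCH_ENDPOINT = "https://search-example.us-east-1.cloudsearch.amazonaws.com"


def search_url(query, filter_query=None, size=10):
    """Build a search URL, optionally adding a filter query (fq)."""
    params = {"q": query, "size": size}
    if filter_query is not None:
        params["fq"] = filter_query
    return f"{SEARCH_ENDPOINT}/2013-01-01/search?{urlencode(params)}"


plain_url = search_url("baseball")
filtered_url = search_url("baseball", filter_query="(term field=lang 'en')")
```

A filter query narrows the result set without contributing to relevance scoring, which is why it is measured separately from `q=` in comparisons like the one charted here.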

Page 40: (SDD411) Amazon CloudSearch Deep Dive and Best Practices | AWS re:Invent 2014

{"status": {"rid": "vtjHjJgpDwocRhU=", "time-ms": 41},
 "hits": {"found": 10389, "start": 0,
  "hit": [
   {
    "id": "523310760416378881",
    "fields": {
     "text": "Still can't believe it! What a game! Can't wait for Tuesday @sfgiants #worldseries @ AT&T Park http://t.co/TTNP7CPHHP",...

Page 41: (SDD411) Amazon CloudSearch Deep Dive and Best Practices | AWS re:Invent 2014

Great Day of Baseball here at the Junior Fall Classic

Good Morning! Fall #Baseball. #HuntingtonPark

Beautiful Saturday morning for baseball in Norfolk.

A day off. Pretty nice to have one sometimes. No teaching, no #baseball

One word to describe 9th inning....baseball. #SFGiants

I'm on a #SFGiants high. Listening to analysis...

@RealTimers @thejoelstein Unless it's #SFGiants...

Apropos of nothing: #SFGiants are in the Big Show again...

Page 42: (SDD411) Amazon CloudSearch Deep Dive and Best Practices | AWS re:Invent 2014
Page 43: (SDD411) Amazon CloudSearch Deep Dive and Best Practices | AWS re:Invent 2014
Page 44: (SDD411) Amazon CloudSearch Deep Dive and Best Practices | AWS re:Invent 2014

Pro tip

Page 45: (SDD411) Amazon CloudSearch Deep Dive and Best Practices | AWS re:Invent 2014

AmazonCloudSearchDomainClient

Page 46: (SDD411) Amazon CloudSearch Deep Dive and Best Practices | AWS re:Invent 2014

cloudsearchdomain

Page 47: (SDD411) Amazon CloudSearch Deep Dive and Best Practices | AWS re:Invent 2014
Page 48: (SDD411) Amazon CloudSearch Deep Dive and Best Practices | AWS re:Invent 2014

Pro tip

Page 49: (SDD411) Amazon CloudSearch Deep Dive and Best Practices | AWS re:Invent 2014

[Figure: application servers, Amazon ElastiCache, and Amazon CloudSearch, with request flow steps numbered 1 to 4]
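
The transcript does not label the four numbered steps in this figure. One common reading is a cache-aside flow: the application checks the cache, searches on a miss, stores the result, and returns it. The sketch below illustrates that pattern under this assumption, with an in-memory dict standing in for Amazon ElastiCache and a fake search function standing in for Amazon CloudSearch; all names are hypothetical.

```python
import time


class QueryCache:
    """Cache-aside wrapper around a search function, with a time-to-live."""

    def __init__(self, search_fn, ttl_seconds=60.0, clock=time.monotonic):
        self._search_fn = search_fn
        self._ttl = ttl_seconds
        self._clock = clock
        self._entries = {}  # query -> (expiry time, result)

    def search(self, query):
        now = self._clock()
        entry = self._entries.get(query)
        if entry is not None and entry[0] > now:
            return entry[1]  # cache hit: skip the search service
        result = self._search_fn(query)  # cache miss: run the real search
        self._entries[query] = (now + self._ttl, result)
        return result


calls = []


def fake_search(query):
    """Stand-in for a search request; records each call it receives."""
    calls.append(query)
    return {"hits": {"found": 78}}


cache = QueryCache(fake_search)
first = cache.search("baseball")
second = cache.search("baseball")
```

The second lookup is served from the cache, so the search function runs only once. A short time-to-live limits how stale cached results can get after documents are updated.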

Page 50: (SDD411) Amazon CloudSearch Deep Dive and Best Practices | AWS re:Invent 2014

Pro tip

Page 51: (SDD411) Amazon CloudSearch Deep Dive and Best Practices | AWS re:Invent 2014
Page 52: (SDD411) Amazon CloudSearch Deep Dive and Best Practices | AWS re:Invent 2014
Page 53: (SDD411) Amazon CloudSearch Deep Dive and Best Practices | AWS re:Invent 2014
Page 54: (SDD411) Amazon CloudSearch Deep Dive and Best Practices | AWS re:Invent 2014
Page 55: (SDD411) Amazon CloudSearch Deep Dive and Best Practices | AWS re:Invent 2014

Pro tip

Page 56: (SDD411) Amazon CloudSearch Deep Dive and Best Practices | AWS re:Invent 2014

[Figure: application servers send session info, queries/results, clicks, and purchases to Amazon Redshift; other components shown: Amazon CloudSearch, Amazon EMR, Application DB, update processing]

Page 57: (SDD411) Amazon CloudSearch Deep Dive and Best Practices | AWS re:Invent 2014
Page 58: (SDD411) Amazon CloudSearch Deep Dive and Best Practices | AWS re:Invent 2014
Page 59: (SDD411) Amazon CloudSearch Deep Dive and Best Practices | AWS re:Invent 2014

Please give us your feedback on this session.

Complete session evaluations and earn re:Invent swag.

http://bit.ly/awsevals