(SDD411) Amazon CloudSearch Deep Dive and Best Practices | AWS re:Invent 2014

Posted on 29-Jun-2015


description

Amazon CloudSearch is a fully-managed search service in the cloud that lets you quickly and easily set up and use a search solution for your application. The latest version of CloudSearch includes tons of new and advanced search and administrative features. This session covers how to design for high scale at low cost, as well as best practices for handling multiple languages, ranking your search results, securing your CloudSearch domains, achieving cost-effective multi-tenancy, sourcing from many different systems, and getting the most out of your CloudSearch instances.

Transcript of (SDD411) Amazon CloudSearch Deep Dive and Best Practices | AWS re:Invent 2014

Pro tip

Amazon CloudSearch

Actions

Upload

_convert_tweet(l)

Pro tip

_convert_tweet(r)

cloudsearchdomain
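The slides name a `_convert_tweet` helper and the `cloudsearchdomain` client but do not show the code. Here is a minimal Python sketch of that pipeline. It assumes tweets arrive as Twitter API JSON dicts and that the domain has hypothetical `text` and `screen_name` fields. Each tweet becomes an `add` operation in a JSON document batch, batches stay under the 5 MB per-upload limit, and boto3's `cloudsearchdomain` client sends them to the domain's document endpoint. `convert_tweet`, `make_batches`, and `upload` are illustrative names, not the presenters' code.

```python
import json

# Document batches uploaded to CloudSearch are limited to 5 MB each.
MAX_BATCH_BYTES = 5 * 1024 * 1024
COMPACT = (",", ":")  # compact separators keep the byte count predictable


def convert_tweet(tweet):
    """Hypothetical stand-in for the slides' _convert_tweet: one tweet -> one add operation.

    The field names (text, screen_name) are assumptions about the domain's index fields.
    """
    return {
        "type": "add",
        "id": tweet["id_str"],
        "fields": {
            "text": tweet["text"],
            "screen_name": tweet["user"]["screen_name"],
        },
    }


def make_batches(tweets, max_bytes=MAX_BATCH_BYTES):
    """Yield UTF-8 JSON arrays of add operations, each at most max_bytes long
    (a single operation larger than max_bytes is still emitted on its own)."""
    batch, size = [], 2  # 2 bytes for the enclosing "[" and "]"
    for tweet in tweets:
        op = convert_tweet(tweet)
        op_bytes = len(json.dumps(op, separators=COMPACT).encode("utf-8")) + 1  # +1 for the comma
        if batch and size + op_bytes > max_bytes:
            yield json.dumps(batch, separators=COMPACT).encode("utf-8")
            batch, size = [], 2
        batch.append(op)
        size += op_bytes
    if batch:
        yield json.dumps(batch, separators=COMPACT).encode("utf-8")


def upload(batches, doc_endpoint, region_name="us-east-1"):
    """Send each batch to the domain's document endpoint (an https:// URL)."""
    import boto3  # imported here so the batching code above runs without boto3

    client = boto3.client("cloudsearchdomain", endpoint_url=doc_endpoint, region_name=region_name)
    for body in batches:
        client.upload_documents(documents=body, contentType="application/json")
```

The batch generator is lazy, so a large tweet archive can be streamed through `upload(make_batches(tweets), endpoint)` without holding every batch in memory.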

AWS CloudTrail

{
  "eventVersion": "1.01",
  "userIdentity": {"type": "Root", "principalId": "...", "arn": "...", "accountId": "...", "accessKeyId": "..."},
  "eventTime": "2014-10-27T20:53:07Z",
  "eventSource": "cloudsearch.amazonaws.com",
  "eventName": "DescribeDomains",
  "awsRegion": "us-east-1",
  "sourceIPAddress": "...",
  "userAgent": "aws-sdk-java/unknown-version Linux/2.6.18-164.el5 Java_HotSpot(TM)_64-Bit_Server_VM/23.25-b01/1.7.0_25",
  "requestParameters": {"domainNames": ["twitter-geo"]},
  "responseElements": null,
  "requestID": "40d6953b-5e1b-11e4-ae8f-97e54e307088",
  "eventID": "9835fa54-b8d3-4fb0-ac6e-ef1403069f7b"
}
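The record above is one entry from an AWS CloudTrail log file. A log file's top-level object holds a `Records` array. This minimal Python sketch audits CloudSearch configuration calls by keeping only records whose `eventSource` is `cloudsearch.amazonaws.com`. It assumes the log file has already been fetched from S3 and decompressed into a string. `cloudsearch_events` is an illustrative name.

```python
import json


def cloudsearch_events(log_text):
    """Return (eventTime, eventName, domainNames) for each CloudSearch call in a CloudTrail log file."""
    events = []
    for record in json.loads(log_text).get("Records", []):
        if record.get("eventSource") != "cloudsearch.amazonaws.com":
            continue  # skip calls made to other AWS services
        params = record.get("requestParameters") or {}  # may be null in the log
        events.append((record["eventTime"], record["eventName"], params.get("domainNames", [])))
    return events
```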

Pro tip


Figure: instance sizes Small, Large, XLarge, and 2XLarge, then 2XLarge partitions P1 and P2, along an axis labeled "Increasing data".

Instance type         Twitter data (search only)          Common-crawl data (search only)
search.m1.small       6.7 GB, 4.7 million documents       4 GB, 625 K documents
search.m1.large       26.8 GB, 18.8 million documents     16 GB, 2.5 million documents
search.m2.xlarge      53.6 GB, 37.6 million documents     34 GB, 5 million documents
search.m2.2xlarge*    107.2 GB, 75.2 million documents    64 GB, 10 million documents
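CloudSearch picks the instance type and partition count automatically as data grows. The table can still be turned into a rough capacity estimate. This minimal Python sketch uses only the Twitter column and assumes capacity scales with raw data size. The common-crawl column shows that other documents and field options fit far less data per instance, so treat the result as a ballpark figure. `estimate_layout` is an illustrative name.

```python
import math

# Twitter-data capacity per instance type (GB of source data), from the table above.
# These are measurements for one data set; other documents and field options differ.
TWITTER_CAPACITY_GB = [
    ("search.m1.small", 6.7),
    ("search.m1.large", 26.8),
    ("search.m2.xlarge", 53.6),
    ("search.m2.2xlarge", 107.2),
]


def estimate_layout(data_gb, capacity=TWITTER_CAPACITY_GB):
    """Return (instance_type, partitions): the smallest type that fits the data,
    otherwise enough partitions of the largest type."""
    for instance_type, cap_gb in capacity:
        if data_gb <= cap_gb:
            return instance_type, 1
    largest_type, largest_cap = capacity[-1]
    return largest_type, math.ceil(data_gb / largest_cap)
```

For example, 300 GB of tweet-like data comes out as three search.m2.2xlarge partitions.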

Field options      No options   All options   Highlight   Return    Sort      Facet
Partitions         5 2xl        7 2xl         7 2xl       5 2xl     5 2xl     5 2xl
Percent increase   0%           243%          220.8%      153.2%    12.7%     0.3%
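In the table above, turning on every field option took the domain from 5 to 7 partitions. Highlight and return show the largest percent increases. This minimal Python sketch uses boto3's `cloudsearch` configuration client to define fields with every option off unless a query needs it. `define_index_field` only stages the new configuration, so `index_documents` is called afterwards to rebuild the index. The field names and helper functions are hypothetical.

```python
def text_field(name, return_enabled=False, highlight_enabled=False, sort_enabled=False):
    """IndexField definition for a text field; every option defaults to off."""
    return {
        "IndexFieldName": name,
        "IndexFieldType": "text",
        "TextOptions": {
            "ReturnEnabled": return_enabled,
            "HighlightEnabled": highlight_enabled,
            "SortEnabled": sort_enabled,
        },
    }


def literal_field(name, facet_enabled=False, return_enabled=False, sort_enabled=False):
    """IndexField definition for a literal (exact-match) field."""
    return {
        "IndexFieldName": name,
        "IndexFieldType": "literal",
        "LiteralOptions": {
            "SearchEnabled": True,
            "FacetEnabled": facet_enabled,
            "ReturnEnabled": return_enabled,
            "SortEnabled": sort_enabled,
        },
    }


def apply_fields(domain_name, fields, region_name="us-east-1"):
    import boto3  # imported here so the field builders run without boto3

    client = boto3.client("cloudsearch", region_name=region_name)
    for field in fields:
        client.define_index_field(DomainName=domain_name, IndexField=field)
    client.index_documents(DomainName=domain_name)  # rebuild the index with the new options
```

A domain that only displays tweet text and facets on user names might call `apply_fields("twitter-geo", [text_field("text", return_enabled=True), literal_field("screen_name", facet_enabled=True)])`.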

Instance type        Instance threads   Connecting threads
search.m1.small      2                  1
search.m1.large      5                  3
search.m2.xlarge     9                  5
search.m2.2xlarge*   17                 9
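The table pairs each instance type with a thread count on the instance and a smaller number of connecting threads. The slide does not define "connecting threads". This minimal Python sketch assumes it means how many concurrent client requests to send, and uses it to size a thread pool per instance type. `search_fn` is a hypothetical callable that runs one query, for example a wrapper around the `cloudsearchdomain` client.

```python
from concurrent.futures import ThreadPoolExecutor

# "Connecting threads" column from the table above, used here as a client-side concurrency cap.
CONNECTING_THREADS = {
    "search.m1.small": 1,
    "search.m1.large": 3,
    "search.m2.xlarge": 5,
    "search.m2.2xlarge": 9,
}


def run_queries(queries, search_fn, instance_type):
    """Run search_fn over queries with at most CONNECTING_THREADS[instance_type] requests in flight.

    Results are returned in the same order as queries.
    """
    workers = CONNECTING_THREADS[instance_type]
    with ThreadPoolExecutor(max_workers=workers) as pool:
        return list(pool.map(search_fn, queries))
```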

Pro tip

Figure: a grid of search instances, each holding one index partition (1, 2, ..., n) and one replica (1, 2, ..., n), with one axis labeled "Search request volume and complexity".
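In this layout a domain runs partitions × replicas search instances. Partitions split the index across instances, and replicas add copies of each partition to absorb search request volume and complexity. This minimal Python sketch builds a `ScalingParameters` request for boto3's `cloudsearch` client. The API reference describes `DesiredPartitionCount` as valid only with the largest instance type, which was search.m2.2xlarge at the time of this talk. The helper names are illustrative.

```python
LARGEST_TYPE = "search.m2.2xlarge"  # largest search instance type at the time of this talk


def scaling_parameters(instance_type, replicas=1, partitions=1):
    """Build a ScalingParameters dict for UpdateScalingParameters."""
    if partitions > 1 and instance_type != LARGEST_TYPE:
        raise ValueError("a partition count can only be preconfigured on " + LARGEST_TYPE)
    params = {"DesiredInstanceType": instance_type, "DesiredReplicationCount": replicas}
    if partitions > 1:
        params["DesiredPartitionCount"] = partitions
    return params


def instance_count(params):
    """Search instances the layout implies: partitions x replicas."""
    return params.get("DesiredPartitionCount", 1) * params["DesiredReplicationCount"]


def apply_scaling(domain_name, params, region_name="us-east-1"):
    import boto3  # imported here so the helpers above run without boto3

    client = boto3.client("cloudsearch", region_name=region_name)
    client.update_scaling_parameters(DomainName=domain_name, ScalingParameters=params)
```

For example, three partitions with two replicas each means six search.m2.2xlarge instances to pay for.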

Instance type       Instance threads   JMeter                 Twitter throughput (latency)   Common-crawl throughput (latency)
search.m1.small     20                 2 hosts, 10 threads    25.1 qps (397 ms)              48.3 qps (206 ms)
search.m1.large     20                 4 hosts, 20 threads    108.5 qps (183 ms)             291.5 qps (68 ms)
search.m2.xlarge    20                 8 hosts, 40 threads    419.6 qps (94 ms)              665.9 qps (59 ms)
search.m2.2xlarge   20                 16 hosts, 80 threads   566.4 qps (140 ms)             985.3 qps (80 ms)
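The latency figures read as average response times. Throughput × latency comes out close to the number of JMeter threads in every row (25.1 qps × 0.397 s ≈ 10). That is what Little's law (L = λW) predicts when each load-generator thread keeps exactly one request in flight. In a closed-loop test like this one, measured throughput is therefore tied to how many client threads are running. This small Python check reproduces that arithmetic from the table.

```python
# (instance type, JMeter threads, Twitter qps, Twitter ms, common-crawl qps, common-crawl ms),
# copied from the table above.
RESULTS = [
    ("search.m1.small", 10, 25.1, 397, 48.3, 206),
    ("search.m1.large", 20, 108.5, 183, 291.5, 68),
    ("search.m2.xlarge", 40, 419.6, 94, 665.9, 59),
    ("search.m2.2xlarge", 80, 566.4, 140, 985.3, 80),
]


def implied_concurrency(qps, latency_ms):
    """Little's law: requests in flight = throughput x average time in system."""
    return qps * latency_ms / 1000.0


for name, threads, tw_qps, tw_ms, cc_qps, cc_ms in RESULTS:
    print(f"{name}: {threads} threads, "
          f"Twitter implies {implied_concurrency(tw_qps, tw_ms):.1f}, "
          f"common crawl implies {implied_concurrency(cc_qps, cc_ms):.1f}")
```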


Figure: search.m1.small instances holding index partition 1: a single replica (replica 1), and the same partition served by replicas 1, 2, and 3.

{"status": {"rid": "i8TQupgpEQocRhU=", "time-ms": 3},
 "hits": {"found": 9234, "start": 0,
  "hit": [
   {
    "id": "523254764427952129",
    "fields": {
     "text": "idk if its yummy or what lol im hungry"
    }
   },...

{"status": {"rid": "lPfcupgpFAocRhU=", "time-ms": 4},
 "hits": {"found": 6235, "start": 0,
  "hit": [
   {
    "id": "523260481096540160",
    "fields": {
     "text": "idk what it is but ... something's different"
    }
   }, ...

{"status": {"rid": "9MPvupgpFwocRhU=", "time-ms": 2},
 "hits": {"found": 8997, "start": 0,
  "hit": [
   {
    "id": "523303605575909376",
    "fields": {
     "text": "Idk ... Idk idk idk idk idk idk"
    }
   },

{"status": {"rid": "+r6Wh5gpBgocRhU=", "time-ms": 178},
 "hits": {"found": 78, "start": 0,
  "hit": [
   {
    "id": "523341488005345280",
    "fields": {
     "text": "I love talking baseball with my dad"
    }
   },...
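Each response above carries a request ID and the server-side `time-ms` in `status`, the total match count in `hits.found`, and the returned documents in `hits.hit`. This minimal Python sketch pulls those values out of a raw response body, for example to log server time next to client-side latency. The sample uses the first response above, with its hit list cut to one entry. `summarize` is an illustrative name.

```python
import json


def summarize(response_body):
    """Extract request ID, server time, match count, and returned IDs from a search response."""
    response = json.loads(response_body)
    hits = response["hits"]
    return {
        "rid": response["status"]["rid"],
        "time_ms": response["status"]["time-ms"],
        "found": hits["found"],
        "ids": [hit["id"] for hit in hits["hit"]],
    }


SAMPLE = """{"status": {"rid": "i8TQupgpEQocRhU=", "time-ms": 3},
 "hits": {"found": 9234, "start": 0,
  "hit": [{"id": "523254764427952129",
           "fields": {"text": "idk if its yummy or what lol im hungry"}}]}}"""
```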

Pro tip

Figure: total query latency in milliseconds (axis 0 to 250) by query condition (q=, fq=, and fq= with 10 queries), showing p50, average, and p90.
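The chart reports p50, average, and p90 total query latency for each query condition. This minimal Python sketch computes those three statistics from a list of measured latencies. It uses the nearest-rank definition of a percentile, since the slides do not say which definition was used. The function names are illustrative.

```python
import math
from statistics import mean


def nearest_rank(sorted_values, pct):
    """Nearest-rank percentile: the smallest sample with at least pct% of samples at or below it."""
    rank = max(1, math.ceil(pct * len(sorted_values) / 100))
    return sorted_values[rank - 1]


def latency_summary(latencies_ms):
    """p50, average, and p90 of a non-empty list of latencies."""
    values = sorted(latencies_ms)
    return {
        "p50": nearest_rank(values, 50),
        "average": mean(values),
        "p90": nearest_rank(values, 90),
    }
```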

{"status": {"rid": "vtjHjJgpDwocRhU=", "time-ms": 41},
 "hits": {"found": 10389, "start": 0,
  "hit": [
   {
    "id": "523310760416378881",
    "fields": {
     "text": "Still can't believe it! What a game! Can't wait for Tuesday @sfgiants #worldseries @ AT&T Park http://t.co/TTNP7CPHHP",...

Great Day of Baseball here at the Junior Fall Classic

Good Morning! Fall #Baseball. #HuntingtonPark

Beautiful Saturday morning for baseball in Norfolk.

A day off. Pretty nice to have one sometimes. No teaching, no #baseball

One word to describe 9th inning....baseball. #SFGiants

I'm on a #SFGiants high. Listening to analysis... @RealTimers @thejoelstein Unless it's #SFGiants...

Apropos of nothing: #SFGiants are in the Big Show again...

Pro tip

AmazonCloudSearchDomainClient

cloudsearchdomain
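`AmazonCloudSearchDomainClient` (Java SDK) and `cloudsearchdomain` (boto3 and the AWS CLI) are the clients for a domain's search and document endpoints. Here is a minimal Python sketch with boto3, assuming a hypothetical `lang` literal field. The free-text terms go in `query`. A constraint that should narrow the matches without changing how they are scored goes in `filterQuery`, which takes structured query syntax. The helper names are illustrative.

```python
def build_search_kwargs(query, filter_query=None, size=10, fields=("text",)):
    """Keyword arguments for cloudsearchdomain.search(); filter_query uses structured syntax."""
    kwargs = {
        "query": query,
        "queryParser": "simple",
        "size": size,
        "return": ",".join(fields),  # only fields with return enabled can be returned
    }
    if filter_query:
        kwargs["filterQuery"] = filter_query
    return kwargs


def search(search_endpoint, kwargs, region_name="us-east-1"):
    """Run one search and return (total matches, returned document IDs)."""
    import boto3  # imported here so build_search_kwargs runs without boto3

    client = boto3.client("cloudsearchdomain", endpoint_url=search_endpoint, region_name=region_name)
    response = client.search(**kwargs)
    return response["hits"]["found"], [hit["id"] for hit in response["hits"]["hit"]]
```

For example, `build_search_kwargs("baseball", "(term field=lang 'en')")` keeps the language restriction out of relevance scoring.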

Pro tip

Figure: application servers, Amazon ElastiCache, and Amazon CloudSearch, connected by numbered steps 1 to 4.
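The diagram places Amazon ElastiCache between the application servers and Amazon CloudSearch, with four numbered steps that the slide does not spell out. This minimal Python sketch assumes they follow the common cache-aside pattern: check the cache, search on a miss, store the result, and return it. `cache` is any client with `get`/`set` methods, such as a memcached or Redis client pointed at an ElastiCache node. `search_fn` is a hypothetical callable that queries CloudSearch.

```python
import hashlib
import json


class CachedSearch:
    """Cache-aside wrapper: serve repeated queries from the cache, search only on a miss."""

    def __init__(self, cache, search_fn):
        self.cache = cache          # object exposing get(key) and set(key, value)
        self.search_fn = search_fn  # callable: query string -> JSON-serializable result

    @staticmethod
    def _key(query):
        # Hash the query so the key has no spaces and a fixed length (memcached rejects both).
        return "search:" + hashlib.sha1(query.encode("utf-8")).hexdigest()

    def search(self, query):
        key = self._key(query)
        cached = self.cache.get(key)
        if cached is not None:
            return json.loads(cached)            # cache hit: CloudSearch is not called
        result = self.search_fn(query)           # cache miss: query CloudSearch
        self.cache.set(key, json.dumps(result))  # store for the next identical query
        return result
```

In practice the cached entry should also get an expiry time, so results do not outlive index updates. Memcached and Redis clients take that argument differently.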

Pro tip


Figure: application servers, Amazon Redshift (session info, queries/results, clicks, purchases), Amazon CloudSearch, Amazon EMR, the application DB, and update processing.
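The diagram feeds session info, queries/results, clicks, and purchases, along with the application DB, into update processing for Amazon CloudSearch. This Python sketch shows one way to close that loop. It aggregates the signals per document and emits `add` operations. An `add` replaces the whole document, so each operation carries the full field set from the application DB plus the new score. It assumes a hypothetical integer `popularity` field, arbitrary weights, and events already exported as (document ID, action) pairs.

```python
from collections import Counter

WEIGHTS = {"click": 1, "purchase": 10}  # hypothetical weights; unknown actions count as 0


def popularity_updates(events, documents, weights=WEIGHTS):
    """events: iterable of (doc_id, action); documents: doc_id -> full field dict from the app DB.

    Returns add operations with a popularity score merged into each document's fields.
    """
    scores = Counter()
    for doc_id, action in events:
        scores[doc_id] += weights.get(action, 0)
    ops = []
    for doc_id, score in sorted(scores.items()):
        if doc_id not in documents:
            continue  # the document no longer exists in the app DB
        fields = dict(documents[doc_id])  # copy so the caller's data is not modified
        fields["popularity"] = score
        ops.append({"type": "add", "id": doc_id, "fields": fields})
    return ops
```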
