[RakutenTechConf2013] [A-4] The approach of Event in Japan Ichiba
The approach of Big Event in Japan Ichiba
Vol.01 Oct/26/2013 Yusuke Kobayashi
Group Manager, Mall Group, Japan Ichiba Section, Rakuten Ichiba Development Department, Rakuten, Inc. http://www.rakuten.co.jp/
2
Index
Big Sale in Ichiba
3
Introduce Me
Yusuke Kobayashi
Group Manager, Japan Ichiba Section, Japan Mall Group
Rakuten Ichiba Development Department
• Joined in 2005.
• First career in Rakuten was Infoseek.
• Transferred to Ichiba in 2009.
MALL RMS IBS IBE
Japan Ichiba Section
4
Index
1.Scale of Big Sales - Huge Traffic Scale, Amazing Sales.
2.History of success - Share how we could improve our services.
3.Case study - Show trouble case and explain the countermeasure. -- Checkout System -- Infrastructure(Cloud Environment/Network)
5
Shopping Marathon
Shop around points
6
Super Sale
Half Price Items, Point, Topic Items
7
Victory Sale
77%Off, Half Price,1001 yen items
8
1.Scale of Big Sales - Huge Traffic Scale, Amazing Sales.
2.History of success - Share how we could improve our services.
3.Case study - Show trouble case and explain the countermeasure. -- Checkout System -- Infrastructure(Cloud Environment/Network)
9
1.Scale of Big Sale
15 billion yen in sales per day
10
1.Scale of Big Sale
Victory Sale: 5% of the traffic of entire Japan!!
11
1.Scale of Big Sale
Comparison of order numbers between big sale and usual.
[Chart: Order Number over Time (1 hour), hours 0 to 58; series: Sale, Usual]
12
1.Scale of Big Sales - Huge Traffic Scale, Amazing Sales.
2.History of success - Share how we could improve our services.
3.Case study - Show trouble case and explain the countermeasure. -- Checkout System -- Infrastructure(Cloud Environment/Network)
13
Monitoring by 100 DU members
14
Our Energy!
15
2. History
2012/03/04 00:00 – 2012/03/04 23:59
2012/06/03 00:00 – 2012/03/06 01:59
2012/12/02 00:00 – 2012/12/04 01:59
2013/03/03 00:00 – 2013/03/05 01:59
2013/06/02 00:00 – 2013/06/05 01:59
2013/09/01 00:00 – 2013/09/04 01:59
2013/09/27 00:00 – 2013/09/30 01:59
16
2012/03 – Super Sale
24h Limited Half Price
Special Items
TV Commercial, Train AD
17
Top Page
Event Page
Search
ID
AD
Entry
Item Page
Bookmark
Purchase History
Review
Checkout
Search Engine
2012/03 – Super Sale
Point Coupon
18
2012/03 – Super Sale
19
2012/03 – Super Sale
20
2012/03 – Super Sale
Almost all services were delayed.
• Huge traffic.
• High application load.
• Bandwidth exceeded.
• High DB load.
• High NFS load.
The power of mass media was huge.
21
2012/03 – Super Sale
All we could do was restart and reboot.
• Change the Apache configuration.
• Restart Apache.
• Reboot physical servers.
Content deletion and access control by the Creative Web Design team.
22
2012/03 – Super Sale
Countermeasure
• Enhance Web/Apps/NW/DB servers.
• Bandwidth limitation.
• Tuning middleware configuration.
• Decrease traffic.
• Contents control.
• Web front speed up.
23
2012/06 – Super Sale
!?
24
Bookmark
2012/06 – Super Sale
Top Page
Event Page
Search
ID
AD
Entry
Item Page
Purchase History
Review
Checkout
Search Engine
Point Coupon
25
2012/06 – Super Sale
ID went down and Checkout was delayed.
We extended the sale period because of the big troubles.
26
ID Service
[Charts: Fig.1 plots login successes (y-axis 0 to 60,000) from 6/2 22:00 to 6/4 2:00 for three series: #login 20120603, #login 20120527b, #login 20120304b. Fig.2 plots DB connection errors (y-axis 0 to 400000) in 30-minute steps from 2012/6/3 0:00 to 2012/6/4 2:30, with labels ID3 2/3 and ID3 1/4. Time markers: 0:00AM Sun 6/3 and 2:00AM Mon 6/4.]
Fig.1: # of Login Successes
Fig.2: # of DB Connection Errors
[0:00 - 1:09] Just after the launch of the Super Sale, DB connection errors occurred because of massive user access. Some users experienced connection errors. The errors resolved automatically as user access decreased.
[20:20 - 0:34] Serious DB connection errors occurred because of massive user access. Critical login failures resulted from reboots of the ID services, login limitation, etc.
[22:37 - 0:42] Ichiba stopped using the ID service. Purchases were processed only as non-member purchases.
[23:15 - 23:40] The ID service stopped because of a server reboot and a server going down.
[Sun 6/3] Super Sale (2nd), this time
[Sun 5/27] Ordinary Sunday
[Sun 3/4] Super Sale (1st), last time
[0:41] DB connection errors ended just after we stopped the batch program for Fraud Access Management, which ran every 10 min. Users were then able to log in smoothly.
27
ID Service
ID Service had serious DB connection errors during the following time periods.
(1) 0:00 - 1:09 (Sun, 6/3/2012)
Just after the launch of the Super Sale, DB connection errors occurred because of massive user access. Some users experienced connection errors.
# of DB Connection Errors = 160,497 (in 10 min) (ref. # of Login Successes = 2,167,126)
(2) 20:20 (Sun, 6/3/2012) - 0:34 (Mon, 6/4)
Serious DB connection errors occurred because of massive user access. Critical login failures resulted from reboots of the ID services, login limitation, etc.
22:37 - 0:42: Ichiba stopped using the ID service. Purchases were processed only as non-member purchases.
# of DB Connection Errors = 4,063,667 (in 4 hours 15 min)
This impacted the entire Rakuten Group.
28
Checkout
[Diagram: Checkout system made up of Web servers, APP servers, and an API]
Enhance the instances
Enhance the instances
Change the configuration of thread
Change the threshold
29
Have no time!!!!
30
2012/06 – Super Sale
Countermeasures for the next Super Sale
• Migration of DB servers (ID).
• Enhancement of application servers.
-> Move to the cloud environment instead of using physical servers.
• Load test on the production environment.
-> We had tested on the staging environment, but it was not enough.
Differences between the staging and production environments:
• Number of users
• Number of items
• Transactions
• Server specs
31
2012/12 – Super Sale
32
2012/12 – Super Sale
Bookmark
Top Page
Event Page
Search
ID
AD
Entry
Item Page
Purchase History
Review
Cart
Search Engine
Point Coupon
33
2012/12 – Super Sale
The first peak time -> Down
This was the highest traffic of the year.
Search, Item Page and Checkout were down.
34
Search
[Diagram: Search system made up of Web servers, APP servers, an LB, and Search Engine APP servers; annotations: "LB load was high", "Enhance"]
Application load was high
At the beginning of the first peak time, the search applications were under high load because huge traffic came from the event contents.
-> We added 65 instances using the cloud environment within 4 hours.
The LB which we enhanced was then under high load.
35
Item Page
12/02 9:00 pm to 24:00
Disk Util was 100%
36
Item Page
The connection between App and NFS was delayed.
37
Item Page
• We switched to Akamai during peak time.
• Cache for 25 min (at Mikitani-san's suggestion).
• Inventory data was not updated in real time.
• This was an emergency countermeasure.
38
Item Page
39
Checkout
[Diagram: Checkout system made up of Web servers, APP servers, and an API]
High Load
40
2012/12 – Super Sale
Countermeasures for next time
• Continuously run load tests of the checkout system.
• Decrease the number of NFS calls in the Item Page system.
• Move to the cloud environment gradually.
41
Item Page
App server: decrease unnecessary file calls.
Memcached server: cache more kinds of files.
- Shop data, layout data.
- Re-cache whenever the cache expires, whether or not the file has been updated.
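The re-cache rule on this slide can be sketched roughly as follows. This is a minimal illustration, not the actual Item Page code: the class name `TtlFileCache`, the `loader` callback (standing in for an NFS read), and the in-process dictionary (standing in for memcached) are all assumptions made for the example.

```python
import time
from typing import Callable, Dict, Tuple


class TtlFileCache:
    """In-process stand-in for a memcached-style cache with a fixed TTL."""

    def __init__(self, loader: Callable[[str], bytes], ttl_seconds: float,
                 clock: Callable[[], float] = time.monotonic) -> None:
        self._loader = loader      # hypothetical: reads the file from NFS
        self._ttl = ttl_seconds
        self._clock = clock
        self._entries: Dict[str, Tuple[float, bytes]] = {}
        self.loads = 0             # number of calls that reached the loader

    def get(self, path: str) -> bytes:
        now = self._clock()
        entry = self._entries.get(path)
        if entry is not None and now < entry[0]:
            return entry[1]        # fresh hit: no NFS call
        # Missing or expired: reload unconditionally, without checking
        # whether the file changed, then restart the TTL.
        data = self._loader(path)
        self.loads += 1
        self._entries[path] = (now + self._ttl, data)
        return data
```

With this shape, the number of NFS reads per file is bounded by the TTL rather than by the request rate, which is the effect the slide is after.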
42
Item Page
Before / After: 30% down
43
Checkout
[Diagram: JMeter, WEB server, APP server, Cache, DataBase, and APIs α, β, γ, δ]
Load tests run about 50 times at midnight…
44
Checkout
We always run many load tests on the checkout systems before a Super Sale.
Explained later by Hashiyama.
45
2013/03 – Super Sale
46
2013/03 – Super Sale
Bookmark
Top Page
Event Page
Search
ID
AD
Entry
Item Page
Purchase History
Review
STEP
Cart
Search Engine
Basket API
Point Coupon
47
No Trouble!!
Success!!
48
2013/06 & 2013/09 – Super Sale
No Big Trouble!
49
From this year…
50
Victory Sale
Only 3 weeks to prepare for this event..
51
Victory Sale
1st Spike. From Yahoo!
2nd Spike. Event Started.
3rd Spike. Newspaper AD
52
Victory Sale
Bookmark
Top Page
Event Page
Search
ID
AD
Entry
Item Page
Purchase History
Review
STEP
Cart
Search Engine
Basket API
Point Coupon
53
Top & Search
Top Page
Search
The traffic was higher than we expected: around 6 or 7 times!!!!
The only countermeasure for this was to enhance the servers.
54
1.Scale of Big Sales - Huge Traffic Scale, Amazing Sales.
2.History of success - Share how we could improve our services.
3.Case study - Show trouble case and explain the countermeasure. -- Checkout System -- Infrastructure(Cloud Environment/Network)
55
About Me
Name Makito Hashiyama(@capyogu)
Role Team manager of APIs for Rakuten Ichiba
Recent activity GlassFish Community Feedback @ JavaOne 2013
Contact [email protected]
56
Overview of Rakuten Ichiba Checkout
Architecture
• Behaves like a service bus based on SOA
• Calls more than 15 external APIs, mashes them up, and provides them to the client side
Scale
• More than 100 application servers
[Diagram: Client side, SOAP/REST, Checkout API, KVS, External APIs]
57
Overview of Rakuten Ichiba Checkout
Provides over 50 services to the client side
• Get/Set shopper information
• Get/Set merchant information
• Validation
• Update inventory
• Register order data into DB
• etc.
[Diagram: Client side, SOAP/REST, Checkout API, Item API, Cart API, Merchant API]
58
Overview of Rakuten Ichiba Checkout
Stateful API
• Manages session information on behalf of the client side
• Creates a unique key and manages it with the KVS
• The client side only has to call the API with the key
[Diagram: Client side, SOAP/REST, Checkout API, KVS]
key     value
Key1    Session1
Key2    Session2
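The key/value idea above can be sketched as follows. This is only an illustration under assumptions: `SessionStore`, its method names, and the plain dictionary standing in for the KVS are hypothetical, and the slides do not say how the real keys are generated (a random UUID is used here).

```python
import uuid
from typing import Any, Dict, Optional


class SessionStore:
    """Keeps checkout session state server-side, addressed by an opaque key."""

    def __init__(self) -> None:
        # Stand-in for the KVS: key -> session data.
        self._kvs: Dict[str, Dict[str, Any]] = {}

    def create(self) -> str:
        """Create an empty session and return its unique key."""
        key = uuid.uuid4().hex
        self._kvs[key] = {}
        return key

    def update(self, key: str, **fields: Any) -> None:
        """Merge fields into an existing session; raises KeyError if unknown."""
        self._kvs[key].update(fields)

    def load(self, key: str) -> Optional[Dict[str, Any]]:
        """Return a copy of the session, or None if the key is unknown."""
        session = self._kvs.get(key)
        return dict(session) if session is not None else None
```

The client only ever holds the key, so every later call can be as small as "key plus the fields that changed".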
59
Overview of Rakuten Ichiba Checkout
Rakuten Super Sale
• Biggest online sale in Japan
• It causes a huge amount of traffic
Performance bottleneck
• External APIs called by our API slowed down
• We needed to improve the system at peak time
[Diagram: Checkout API and External API (SOAP/REST), annotated with "slow down" and "delay"]
60
How to execute load test
[Diagram: JMeter, Checkout API, and External API (SOAP/REST), annotated with "delay"]
Environment
• On production at midnight
• Executed over 50 times with JMeter
Test case
• 100,000 dummy shoppers
• 1,000,000 dummy items / 6,500 dummy merchants
• Reproduce the sale's load as much as possible
61
How to execute load test
[Diagram: JMeter, WEB server, APP server, Cache, DataBase, and APIs α, β, γ, δ]
As a result, the bottleneck moved to the APP server
62
Improvement to handle a huge traffic
[Diagram: the client side sends requests to the Checkout API, where a task queue feeds five worker threads; the calls to External APIs are marked "delay"]
CPU load was high
63
Improvement to handle a huge traffic
[Diagram: the client side sends requests to the Checkout API, where a task queue now feeds three worker threads that call External APIs]
(1) According to vmstat, the 'run queue' was very high.
(2) We decreased the number of worker threads to keep the 'run queue' low.
(3) As a result, latency increased but throughput improved.
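The idea of capping worker threads can be sketched as below. This is a hedged illustration, not the Checkout API's actual thread pool: the function name `run_with_worker_cap` is hypothetical, and Python's `ThreadPoolExecutor` stands in for whatever pool the application server uses. Extra tasks wait in the pool's queue instead of competing for CPU, which is what keeps the run queue (the `r` column in vmstat) low.

```python
import threading
from concurrent.futures import ThreadPoolExecutor
from typing import Callable, Iterable, List, Tuple


def run_with_worker_cap(tasks: Iterable[Callable[[], int]],
                        max_workers: int) -> Tuple[List[int], int]:
    """Run tasks on a fixed-size pool; return results and peak concurrency."""
    lock = threading.Lock()
    active = 0
    peak = 0

    def tracked(task: Callable[[], int]) -> int:
        nonlocal active, peak
        with lock:
            active += 1
            peak = max(peak, active)
        try:
            return task()
        finally:
            with lock:
                active -= 1

    # Tasks beyond max_workers wait in the executor's internal queue.
    with ThreadPoolExecutor(max_workers=max_workers) as pool:
        results = list(pool.map(tracked, tasks))
    return results, peak
```

Whether a smaller cap actually raises throughput depends on the hardware and workload, so the right value has to come from load tests like the ones described here; the sketch only shows the mechanism.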
64
Improvement to handle a huge traffic
As a result…
• The Checkout API could process over 12,000 transactions / minute
• We also achieved 30,000 TPM in a load test (we did it just yesterday!!!)
65
Overview of Rakuten Ichiba Checkout
In the future
• Set SLA for each external API
• Resolve performance issues
• Synchronous vs. Asynchronous
• Upgrade library / middleware / JDK
• Deep copy (copy constructor vs. serialize)
66
Self introduction
Name : Osamu Iwasaki
Role : Network / Cloud Eng & Mgr
Vice Group Manager, Server Platform Group / Network administration Group
Global Infrastructure Development Department
And Committee member of JANOG (JApan Network Operators' Group)
Twitter @osamuiwasaki
Skype osamu.iwasaki
67
Our traffic history
[Chart: traffic history; y-axis from 0 to 160000, labeled (Gbps)]
Peak traffic at the Victory Sale was over 140Gbps, which was about 5% or more of Japan's Internet traffic
68
Our traffic history
[Chart: the same traffic history, with the Victory Sale and Super Sales peaks labeled]
69
Network traffic trend from 2012/Jan (SS traffic focus)
SuperSale    2012June  2012Dec  2012Mar  2013June  2013Sep  2013Oct(VS)
CDN          60G       78.9G    69.1G    75.8G     73.7G    127.6G
RakutenDC    12.7G     14.2G    12.8G    12.5G     11.7G    12.9G
Total        72.7G     93.1G    81.9G    88.3G     85.4G    140.5G
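As a quick check on the table, the Total row is simply CDN plus RakutenDC, and the CDN's share of the peak can be derived from the same numbers. The sketch below assumes the "G" values are Gbps and keeps the column labels exactly as they appear on the slide; the function names are made up for the example.

```python
# Values copied from the slide; "G" is assumed to mean Gbps.
columns = ["2012June", "2012Dec", "2012Mar", "2013June", "2013Sep", "2013Oct(VS)"]
cdn_gbps = [60.0, 78.9, 69.1, 75.8, 73.7, 127.6]
dc_gbps = [12.7, 14.2, 12.8, 12.5, 11.7, 12.9]
slide_total_gbps = [72.7, 93.1, 81.9, 88.3, 85.4, 140.5]


def recompute_totals(cdn, dc):
    """Total = CDN + RakutenDC, rounded to one decimal like the slide."""
    return [round(c + d, 1) for c, d in zip(cdn, dc)]


def cdn_share_percent(cdn, dc):
    """Percentage of each peak that was served by the CDN."""
    return [round(100.0 * c / (c + d), 1) for c, d in zip(cdn, dc)]


if __name__ == "__main__":
    totals = recompute_totals(cdn_gbps, dc_gbps)
    shares = cdn_share_percent(cdn_gbps, dc_gbps)
    for name, total, share in zip(columns, totals, shares):
        print(f"{name:12s} total={total:6.1f}G  CDN share={share:4.1f}%")
```

The recomputed totals match the slide, and the data-center traffic stays nearly flat across events while the CDN absorbs the growth.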
70
PC/FeaturePhone/Smartphone/Tablet share by Sales
Mobile traffic is increasing rapidly!! Almost 50%
71
Our private cloud history
About 1 year ago, we started with 300 VMs. Now around 10,000 VMs are running for Rakuten Ichiba services. That is over 30 times as many as last year!!!
72
Victory Sale
Only 3 weeks to prepare for this event..
73
But !!
74
Our load balancers went down……
75
What happened at peak time
[Charts: CPU utilization of LoadBalancer-A and LoadBalancer-B, with the peak time marked on each]
Due to heavy traffic at the Victory Sale start time, the CPU load of the load balancers grew rapidly……
76
After the result of re-allocation operation
[Charts: CPU utilization of LoadBalancer-A and LoadBalancer-B]
After the VIP re-allocation, we could move the heavy traffic to the other load balancer
77
Our counteraction for the next Victory Sale
Regular time: 3 times peak capable
• Internet
• VIP Group A, VIP Group B
• Active SLB (target CPU under 30%)
• Standby SLB (CPU 0%)
Big sale time: 6 times peak capable
• Internet
• VIP Group A, VIP Group B
• Active SLB (target CPU under 15%)
• Standby SLB (target CPU under 15%)
3 times is not enough for us; we need 6 times for the Super/Victory Sales.
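One way to read the "3 times" and "6 times" figures is as CPU headroom: if load balancer CPU grows roughly linearly with traffic, a balancer running at a target utilization can absorb about 100 divided by that target before it saturates. The linear-scaling assumption and the function name `headroom_multiple` are ours, not the slides'.

```python
def headroom_multiple(target_cpu_percent: float) -> float:
    """Traffic multiple a load balancer can absorb before reaching 100% CPU,
    assuming CPU utilization scales linearly with traffic (an assumption)."""
    if not 0 < target_cpu_percent <= 100:
        raise ValueError("target_cpu_percent must be in (0, 100]")
    return 100.0 / target_cpu_percent


# Regular time: one active SLB kept under 30% CPU.
regular = headroom_multiple(30.0)
# Big sale time: VIP groups spread over both SLBs, each kept under 15% CPU.
big_sale = headroom_multiple(15.0)
```

Under that assumption, halving the per-balancer target from 30% to 15% doubles the headroom, which is consistent with moving from "3 times" to "6 times" peak capable.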
78
Next Victory sale ready?
79
Next Victory sale ready?
Yes, we are ready !!!
80
Wrap UP
81
Wrap Up
• Traffic : 5% of entire Japan.
• Sales : Over 15B yen/day
• Continuously Tuning/Improvement
• Cloud environment
82
Global Expansion - Super Sale
83
Worldwide Rakuten Super Sale
In future
84
And…
85
86
87
Thank you for listening.
Yusuke Kobayashi
@okoba23
Makito Hashiyama
@capyogu
Osamu Iwasaki
@osamuiwasaki