Post on 14-Dec-2015
Walter Binder
University of Lugano, Switzerland
Niranjan Suri
IHMC, Florida, USA
Green Computing: Energy Consumption
Optimized Service Hosting
2009-01-26
Motivation
• Data centers are becoming ubiquitous
Large installations of computer systems
Providing critical services
• Data centers are big power consumers
Computers operate continuously, regardless of the load
Cooling
Reducing Power Consumption
• Green Grid consortium advocates data center design and management to improve energy efficiency
• Right-sizing data centers at design time
• Energy-efficient cooling
• Virtualization (multiple servers on same physical machine)
• Processor power saving (e.g., clock rate depending on load)
• Powering down unused machines
Computers with dedicated roles (e.g., computers performing backups)
Our Approach
• Load on machines varies over time
• Turn off machines that are not needed, and restart them when the load requires it
• Problems
Load is distributed over multiple machines
Load reduction typically also distributed across multiple machines
Need to consolidate load on a subset of machines in order to free up machines that can be turned off
• Goal: Minimum number of machines running
• Constraint: QoS must be ensured
Service-Level Agreements (SLAs) must not be violated
Example
[Figure: load (0% to 100%) on servers A, B, …, n in three scenarios: Heavy Load; Light Load (Evenly Distributed); Light Load (Consolidated), where idle servers are shut down]
Service Types
• Hosting environment may offer multiple service types
• Service type consists of
Service interface
SLA defining QoS parameters
• SLA parameters specified according to a common ontology
WS-Agreement, WSLA, SLAng, etc.
Here: Single QoS parameter: Response time
Stateless versus Stateful Services
• Stateless service:
Requests are independent
After completing all pending requests, a stateless service may be stopped
• Stateful service:
Requests in one session may depend on prior requests in the same session
Sessions may be explicitly terminated by clients, or expire after some period of inactivity
After termination of all sessions, a stateful service may be stopped
Hosting Environment (1)
• Dedicated machines for three different purposes:
File servers
• Provide all data sources
Compute servers
• Execute service requests
Dispatchers
• Receive service requests and choose compute servers to handle them
• Decide on shutdown and restart of compute servers
• Dispatchers and file servers are continuously running
• Only idle compute servers may be shut down
Hosting Environment (2)
[Figure: Clients send requests to the Dispatcher; the Dispatcher dispatches them to Compute servers; Compute servers access data on File servers]
Hosting Environment (3)
• Heterogeneous environment
Machines have different computing resources
• Dynamically changing environment
New machines may be added
Cores may fail
• Compute servers may host any number of service types, and a service type may be hosted by any number of compute servers
• Compute servers are ranked according to energy efficiency
Node Manager
• Each compute server runs a Node Manager component
• Monitors idle time and average response time for each service type
• Communicates measurements to dispatcher
• Handles server shutdown upon request from dispatcher
• Notifies dispatcher upon startup
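The monitoring duties above could be sketched as follows. This is only an illustration: the class `ServiceStats`, its method names, and the choice to track idle time per machine (rather than per service type) are assumptions, not part of the presented system.

```python
from collections import defaultdict


class ServiceStats:
    """Hypothetical per-interval measurements kept by a Node Manager."""

    def __init__(self):
        self._total_seconds = defaultdict(float)  # summed response time per service type
        self._requests = defaultdict(int)         # completed requests per service type
        self.idle_seconds = 0.0                   # assumed: idle time of the whole machine

    def record_request(self, service_type, response_seconds):
        self._total_seconds[service_type] += response_seconds
        self._requests[service_type] += 1

    def record_idle(self, seconds):
        self.idle_seconds += seconds

    def snapshot(self):
        """Build the report that would be sent to the dispatcher."""
        averages = {
            t: self._total_seconds[t] / self._requests[t]
            for t in self._requests
        }
        return {"idle_seconds": self.idle_seconds, "avg_response": averages}
```

The dispatcher would compare each average in `avg_response` with the response-time limit of the corresponding SLA.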
Shutdown of Compute Servers
• Dispatcher notifies Node Manager on compute server to prepare shutdown
• No further service requests are dispatched to the compute server
• Node Manager waits for
Completion of all previously accepted requests
Termination of all active sessions
• Alternative: Migration of sessions
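A minimal sketch of the drain procedure described above. The class `DrainingNode` and all of its methods are hypothetical. It also assumes that requests belonging to an already active session are still accepted while the server is draining, which the slides do not state.

```python
from dataclasses import dataclass, field


@dataclass
class DrainingNode:
    """Hypothetical Node Manager state used while preparing a shutdown."""
    accepting: bool = True
    pending_requests: int = 0
    active_sessions: set = field(default_factory=set)

    def prepare_shutdown(self):
        # Dispatcher asked for a shutdown: stop taking new work.
        self.accepting = False

    def accept(self, session_id=None):
        """Return True if the request is accepted by this server."""
        in_known_session = session_id in self.active_sessions
        if not self.accepting and not in_known_session:
            return False
        self.pending_requests += 1
        if session_id is not None:
            self.active_sessions.add(session_id)
        return True

    def complete_request(self):
        self.pending_requests -= 1

    def end_session(self, session_id):
        # Called on explicit termination or on inactivity timeout.
        self.active_sessions.discard(session_id)

    def can_shut_down(self):
        # Both waiting conditions must hold before the machine is powered down.
        return (not self.accepting
                and self.pending_requests == 0
                and not self.active_sessions)
```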
Shutdown Options
• Complete shutdown
No power consumption
Ensures clean state upon restart (e.g., no memory leaks)
Slow restart
• Hibernation
No power consumption
Memory saved on persistent storage
Resume by reloading memory snapshot
• Standby
Reduced power consumption
Processor stopped, but memory remains active
Fast restart
Restart of Compute Servers
• Wake on LAN
• Magic packet is broadcast to LAN
Special header: 0xFF repeated 6 times
MAC address of the machine to restart, repeated 16 times
• Dispatcher initiates compute server restart
• Node Manager notifies dispatcher of completed restart
• Dispatcher needs to know MAC addresses of all compute servers
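As an illustration, a dispatcher could build and broadcast a magic packet roughly as below. In the standard Wake on LAN format, the 6-byte 0xFF header is followed by 16 repetitions of the target MAC address. The function names, the broadcast address, and UDP port 9 (a common convention) are assumptions of this sketch.

```python
import socket


def build_magic_packet(mac):
    """Return the 102-byte Wake on LAN payload for a MAC like 'aa:bb:cc:dd:ee:ff'."""
    hex_digits = mac.replace(":", "").replace("-", "")
    if len(hex_digits) != 12:
        raise ValueError(f"not a 48-bit MAC address: {mac!r}")
    mac_bytes = bytes.fromhex(hex_digits)  # raises ValueError on non-hex input
    return b"\xff" * 6 + mac_bytes * 16


def send_wake_on_lan(mac, broadcast="255.255.255.255", port=9):
    """Broadcast the magic packet on the local network (assumed address and port)."""
    packet = build_magic_packet(mac)
    with socket.socket(socket.AF_INET, socket.SOCK_DGRAM) as sock:
        sock.setsockopt(socket.SOL_SOCKET, socket.SO_BROADCAST, 1)
        sock.sendto(packet, (broadcast, port))
```

The target machine's network interface and firmware must have Wake on LAN enabled for the packet to have any effect.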
Service Dispatch: Definitions
• n compute servers <s1,…,sn>
• Sorted according to energy efficiency
sx is more energy efficient than sy ⇔ x < y
• In each configuration
s1 … sr are running (1 ≤ r ≤ n)
sr+1 … sn are shut down
(or in the process of shutting down)
• pT(i): probability that a request for service type T is dispatched to si
Service Dispatch upon Request
• Take a random number z
(0 ≤ z ≤ 1; uniform distribution)
• Choose sc such that
c = min { i : 1 ≤ i ≤ n and z ≤ Σ(j = 1 … i) pT(j) }
• Related to lottery scheduling
Tickets instead of probabilities
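The selection rule can be sketched as a walk over the cumulative probabilities. The function name `choose_server` is hypothetical. The sketch additionally skips servers with probability 0 and falls back to the last server with non-zero probability if rounding leaves the total slightly below z; both details are assumptions beyond the slide.

```python
import random


def choose_server(probabilities, z=None):
    """Return the 1-based index c of the chosen compute server.

    probabilities[i - 1] plays the role of pT(i) for one service type T.
    """
    if z is None:
        z = random.random()  # uniform in [0, 1)
    cumulative = 0.0
    last_positive = None
    for i, p in enumerate(probabilities, start=1):
        cumulative += p
        if p > 0:
            last_positive = i
            if z <= cumulative:
                return i
    if last_positive is None:
        raise ValueError("no server has a non-zero probability")
    return last_positive  # guards against floating-point rounding
```

Because servers that are shut down have probability 0, they are never selected.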
Update of Probabilities (1)
• In regular intervals, dispatcher obtains monitoring data from Node Managers of running compute servers
• If si had idle time and si had no problem meeting the SLAs:
Increase load on si, reduce load on sr
pT(r) := pT(r) – Δp
pT(i) := pT(i) + Δp
• If r > 1 and pT(r) = 0 for all service types T, initiate shutdown of sr
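A minimal sketch of this update step, with 1-based server indices as on the slide. The function names and the dictionary `tables` (one probability list per service type) are hypothetical, and clamping the step so that pT(r) never becomes negative is an assumption.

```python
def shift_load_toward(p, i, r, delta):
    """Move up to delta of probability mass from s_r to s_i for one service type."""
    step = min(delta, p[r - 1])  # assumed clamp: never drive pT(r) below 0
    p[r - 1] -= step
    p[i - 1] += step


def can_shut_down_last(tables, r):
    """True if s_r receives no requests of any service type and r > 1."""
    return r > 1 and all(p[r - 1] == 0 for p in tables.values())
```

For example, with `tables = {"T": [0.7, 0.3]}` and Δp = 0.1, three calls to `shift_load_toward(tables["T"], 1, 2, 0.1)` drain s2 completely, after which `can_shut_down_last(tables, 2)` holds.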
Update of Probabilities (2)
• If compute server si violates the SLA for a service type T (overload situation):
First try to find a running compute server sk (1 ≤ k ≤ r) that has idle time and met the SLAs of all service types
• Balance load between si and sk
• pT(i) := pT(i) – Δp
• pT(k) := pT(k) + Δp
If there is no such compute server sk, initiate restart of sr+1
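The overload rule could look roughly like this. The `Report` type, the function name, and the returned tuples are hypothetical. The slide does not say which sk to prefer when several qualify; this sketch assumes the most energy-efficient one (lowest index) and clamps the step so that pT(i) stays non-negative.

```python
from dataclasses import dataclass


@dataclass
class Report:
    """Hypothetical monitoring summary for one running compute server."""
    had_idle_time: bool
    met_all_slas: bool


def handle_overload(p, reports, i, r, delta):
    """React to an SLA violation on s_i for one service type.

    p is the probability list for that service type, reports maps the
    1-based index of each running server to its Report.
    """
    for k in range(1, r + 1):
        rep = reports[k]
        if k != i and rep.had_idle_time and rep.met_all_slas:
            step = min(delta, p[i - 1])  # assumed clamp
            p[i - 1] -= step
            p[k - 1] += step
            return ("balanced", k)
    if r < len(p):
        return ("restart", r + 1)  # caller would wake s_{r+1}, e.g. via Wake on LAN
    return ("saturated", None)     # all n servers are already running
```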
Future Work (1)
• Testbed and evaluation
Main evaluation metric: Energy savings for given workloads
Service performance must be modeled
Traces of service execution in data centers needed
• Migration of sessions
Reduces the time for preparing shutdown
• Complex optimization criteria
Minimize number of service types hosted on the same compute server
Consider estimated shutdown preparation time when choosing the compute server to shut down
Future Work (2)
• Distribution and replication
Service dispatcher must not become bottleneck
• Fault tolerance
Dispatcher must detect compute server failures
Dispatcher must not become single point of failure
• Sudden load fluctuations
Shutting down machines increases vulnerability to denial-of-service attacks
Conclusions
• Data centers are growing and consume huge amounts of electrical energy
• Energy can be saved by powering down unused machines according to the current load
• Requires consolidation of services on a subset of the available machines
• Probabilistic approach to energy consumption-aware load-balancing