Transcript of "On Choosing An Efficient Service Selection Mechanism In Dynamic Environments" by Murat Sensoy & Pinar Yolum, Bogazici University, Istanbul, Turkey.

Page 1:

On Choosing An Efficient Service Selection Mechanism In Dynamic Environments

Murat Sensoy & Pinar Yolum
Bogazici University, Istanbul, Turkey

Page 2:

OUTLINE

- Introduction
- Comparison of different service selection mechanisms
- Problem statement and proposed approach
- Evaluation
- Conclusions

Page 3:

INTRODUCTION

Service Selection Problem: We examine the problem of service selection in an e-commerce setting where consumer agents cooperate to identify the service providers that would best satisfy their service needs.

Page 4:

INTRODUCTION

Using Selective Ratings (SR): Ratings are taken only from agents who have similar demands.

Using Context-Aware Ratings (CAR): The context of each rating is described using an ontology, so ratings are evaluated with respect to their context.

Using Experiences: Instead of ratings, consumers' experiences are represented using ontologies and shared. An experience represents what was demanded and what was provided in response. There are two approaches for using experiences:

- Parametric classification with a Gaussian Model (GM)
- Case-Based Reasoning (CBR)

Page 5:

Comparison of Service Selection Methods

Simulations: 20 service providers, 400 service consumers, repeated 10 times.

Performance measures:
- Ratio of satisfaction (the fraction of service decisions that result in satisfaction)
- Time required for service selection

Page 6:

SIMULATION ENVIRONMENT

Several factors are varied in the simulations:

Variations in service demand (PCD): Each service consumer changes its demand characteristics after receiving a service, with a predefined probability denoted PCD.

Variations in service quality (PI): With a very small probability, providers deviate from their expected behavior in favor of the consumers (they produce an absolutely satisfactory service). This probability is called the probability of indeterminism (PI).

Page 7:

SIMULATION ENVIRONMENT

Variations in service satisfaction: The misleading similarity factor (β) is roughly the ratio of service consumers who have similar service demands but conflicting satisfaction criteria.

Example: β = 0.5 means that half of the consumers having similar demands have conflicting satisfaction criteria. In this case, half of the ratings given for this demand will be misleading. (A toy code sketch of these three factors follows.)
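As a toy sketch only, one simulation step driven by PCD, PI, and β might look like the following in Python; the constant values, the demand set, and the provider "quality" field are illustrative assumptions, not taken from the paper.

```python
# Toy sketch of one simulation step driven by the three environment factors;
# PCD, PI, and BETA values here are illustrative, not from the paper.
import random

PCD, PI, BETA = 0.1, 0.05, 0.5

def simulation_step(consumer, provider):
    # With probability PCD, the consumer changes its demand characteristics.
    if random.random() < PCD:
        consumer["demand"] = random.choice(["book", "music", "bicycle"])
    # With probability PI, the provider deviates in favor of the consumer
    # and produces an absolutely satisfactory service.
    satisfied = random.random() < PI or provider["quality"] > 0.5
    # With probability BETA, a rater with a similar demand has conflicting
    # satisfaction criteria, so its rating for this demand is misleading.
    rating_is_misleading = random.random() < BETA
    return satisfied, rating_is_misleading
```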

Page 8:

RATIO OF SATISFACTION

Configuration                                   Bad Performance     Good Performance
PCD = 0, β = 0, PI = 0                          { }                 {SR, CAR, CBR, GM}
Consumers vary their demands (PCD > 0)          {SR}                {CAR, CBR, GM}
Tastes of the consumers vary (β > 0)            {SR, CAR}           {CBR, GM}
Indeterminism (PI > 0)                          {CBR}               {SR, CAR, GM}
PCD > 0, β > 0, PI > 0                          {SR, CAR, CBR}      {GM}

Page 9:

TIME CONSUMPTION

Method    Average Time Consumption (msec)
SR        0.9
CAR       10.6
CBR       502.6
GM        2432.9

T_SR < T_CAR < T_CBR < T_GM

Page 10:

PROBLEM

A number of different service selection methods have been briefly described.

Each of these approaches has different strengths and weaknesses in different configurations of the environment.

The configuration of the environment is not observable: consumers can only observe the outcomes of their service selections.

How will an agent select among these methods, given its trade-offs and a partially observable environment?

Page 11:

Using Reinforcement Learning To Choose A Service Selection Mechanism Dynamically

Reinforcement learning (RL) is an ideal learning technique to enable agents to learn the environment and thus decide on which strategy to use in a particular situation.

Hence, we propose to use RL for choosing a service selection mechanism in dynamic environments.

Page 12:

Basics of Reinforcement Learning

In RL, an agent interacts with the environment.

Page 13:

Basics of Reinforcement Learning

The agent partially observes the states of the environment.

Page 14:

Basics of Reinforcement Learning

The agent has a number of actions to take in a given state of the environment.

Page 15:

Basics of Reinforcement Learning

As a result of this action, a new state of the environment is observed.

Page 16:

Basics of Reinforcement Learning

… and a reward is given. The purpose of RL is to construct an optimal action policy that maximizes the total reward (i.e., finds the best service providers throughout).

Page 17:

SERVICE SELECTION & RL

Actions are choosing one of the different service selection mechanisms (e.g., choosing context-aware ratings).

Rewards are computed using the result of the current service selection mechanism and the trade-offs of the agent.

States of the environment are observed in terms of the consequences of the agent's actions.

In order to use standard RL techniques, we need a reward function and a set of discrete states.

Page 18:

Reward Function

The reward function reflects the trade-offs of the service consumers. The reward function used in this study gives:

- A negative reward after choosing an action if there is another action with an expected ratio of satisfaction that is at least 10% better than that of the chosen action.
- A negative reward if the chosen action is at least 10% slower than another action whose ratio of satisfaction is at most 1% worse than that of the chosen action.

(A minimal code sketch of this reward function follows.)
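As a minimal sketch, the reward function above might be written as follows in Python. The 10% and 1% thresholds come from the slide; the penalty magnitude of -1.0 and the dictionary-based inputs are illustrative assumptions.

```python
def reward(chosen, R, T):
    """Return a (possibly negative) reward for choosing a selection mechanism.

    chosen -- name of the chosen mechanism, e.g. "CAR"
    R      -- dict: mechanism -> expected ratio of satisfaction, e.g. {"SR": 0.5, ...}
    T      -- dict: mechanism -> average selection time in msec
    """
    penalty = 0.0
    for other in R:
        if other == chosen:
            continue
        # Another action's expected satisfaction is at least 10% better.
        if R[other] >= 1.10 * R[chosen]:
            penalty -= 1.0  # penalty magnitude is an assumption
        # The chosen action is at least 10% slower than an action whose
        # satisfaction is at most 1% worse than the chosen one.
        if T[chosen] >= 1.10 * T[other] and R[other] >= 0.99 * R[chosen]:
            penalty -= 1.0
    return penalty
```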

Page 19:

States

Although we can parameterize the environment in our simulations (using β, PCD, PI), in real life these parameters are not visible to consumer agents.

Consumer agents observe the environment through the consequences of their actions.

Therefore, states of the environment are coded using the expected ratio of satisfaction of the known service selection mechanisms (actions), i.e., (R_SR, R_CAR, R_CBR, R_GM).

For example, if the consumer observes that R_SR = 0.5, R_CAR = 0.9, R_CBR = 0.7, and R_GM = 0.95, then the agent observes the state of the environment as (0.5, 0.9, 0.7, 0.95).

Page 20:

States

Different values of (R_SR, R_CAR, R_CBR, R_GM) may represent different states of the environment.

This results in a continuous state-space, which must be discretized in order to use standard reinforcement learning approaches.

We propose to use the k-means clustering algorithm to incrementally create discrete states, each of which encapsulates a portion of the continuous state-space. (A simplified code sketch follows.)
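A simplified Python sketch of this incremental discretization, assuming Euclidean distance and a fixed variance threshold; the split heuristic (moving the farthest point into a new cluster) and the threshold value are illustrative, and the paper's incremental k-means may differ in detail.

```python
import math

class StateDiscretizer:
    """Incrementally map continuous observations to discrete states."""

    def __init__(self, variance_threshold=0.05):
        self.threshold = variance_threshold   # assumed value
        self.clusters = []                    # each cluster: list of observation tuples

    @staticmethod
    def _dist(a, b):
        return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))

    @staticmethod
    def _center(points):
        dims = len(points[0])
        return tuple(sum(p[i] for p in points) / len(points) for i in range(dims))

    def observe(self, obs):
        """Return the discrete state id for obs = (R_SR, R_CAR, R_CBR, R_GM)."""
        if not self.clusters:                 # initially a single cluster
            self.clusters.append([obs])
            return 0
        centers = [self._center(c) for c in self.clusters]
        state = min(range(len(centers)), key=lambda i: self._dist(obs, centers[i]))
        cluster = self.clusters[state]
        cluster.append(obs)
        # If the within-cluster variance exceeds the threshold, split off the
        # farthest point into a new cluster (and thus a new state).
        center = self._center(cluster)
        variance = sum(self._dist(p, center) ** 2 for p in cluster) / len(cluster)
        if variance > self.threshold and len(cluster) > 1:
            far = max(cluster, key=lambda p: self._dist(p, center))
            cluster.remove(far)
            self.clusters.append([far])
        return state

# Example: the initial observation from the slides creates the first state.
disc = StateDiscretizer()
assert disc.observe((0.5, 0.7, 0.9, 0.95)) == 0
```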

Page 21:

Discretization Example

Initially there is only one state, represented by a single cluster.

Initial observation of the agent: (0.5, 0.7, 0.9, 0.95)

Page 22:

Discretization Example

Observations of the agent are encapsulated by this cluster.

Page 23:

Discretization Example

If the within-cluster variance exceeds a predefined threshold, a new state and a corresponding new cluster are created.

Pages 24-27: Discretization Example (figure-only slides).

Page 28:

Determination of the Current State

Given the current observation of the agent and the states with their corresponding clusters, how can we determine the current state of the environment?

[Figure: three clusters labeled State 1, State 2, and State 3, plus the current observation.]

Page 29:

Determination of the Current State

The Euclidean distance from the current observation to the center of each cluster is computed. The current state is the nearest state.

[Figure: distances of 0.1, 0.15, and 0.2 from the observation to States 1, 2, and 3; the current state is State 1.]

Page 30:

Determination of the Current State

Then, k-means is used to update the clusters and compute the new cluster centers. If necessary, a new cluster (and a new state) is created.

[Figure: the clusters after the update.]

Page 31:

EVALUATION

We perform several runs to evaluate the performance of the proposed approach. In the first 8 runs, the environment has only one configuration throughout the simulations. In the last run, the environment is changed from the 1st configuration to the 8th configuration during the simulations.

Results (comparing the ratio of satisfaction, R, and the time consumption, T, of the proposed RL approach against GM):
- Same performance as GM, but 114 times faster.
- Almost the same performance as GM, but 46 times faster.
- Same performance as GM, but 95 times faster.
- Performance is 10% less than that of GM; the primary choice of RL is GM.
- When we combine different configurations, the performance of RL is slightly less than that of GM, and it is 32 times faster than GM.

[Figures: performance of modeling each provider using GM, performance of the proposed approach, and the ratio of the time consumptions of GM and the proposed approach.]

Page 32:

CONCLUSION

Our approach allows agents to learn how to choose the most useful service selection mechanism among different alternatives in dynamic environments.

Our experiments show that consumers choose the most useful service selection mechanism using the proposed approach.

The performance of the proposed approach does not go below the lower-bound defined by the trade-offs of the consumers.

As future work, we plan to enable online addition of new service selection mechanisms. We also plan to enable agents to share their observations of the environment.

Page 34:

Comparison of Service Selection Methods

[Table: configuration of the environment vs. performance of the methods in terms of ratio of satisfaction.]

Each approach has the same performance.

Page 35:

Comparison of Service Selection Methods

[Table: configuration of the environment vs. performance of the methods in terms of ratio of satisfaction.]

The performance of the rating-based approach decreases when consumers vary their demands.

Page 36:

Comparison of Service Selection Methods

[Table: configuration of the environment vs. performance of the methods in terms of ratio of satisfaction.]

The performances of the rating-based approach and context-aware ratings decrease when the tastes of the consumers vary significantly.

Page 37:

Comparison of Service Selection Methods

[Table: configuration of the environment vs. performance of the methods in terms of ratio of satisfaction.]

The performance of GM is high and does not change across different configurations of the environment.

The performance of the CBR approach decreases when providers produce services with a little indeterminism.

Page 38:

Comparison of Service Selection Methods

[Table: configuration of the environment vs. time consumption of the methods (msec).]

T_SR < T_CAR < T_CBR < T_GM

Page 39:

INTRODUCTION

Previous approaches to service selection are mainly based on ratings.

Ratings have two major drawbacks:
- Ratings disregard the context of the service demands.
- Ratings reflect the satisfaction criteria and taste of the raters.

Page 40:

Enrich ratings with context information (Context-Aware Rating):

- Define the context using an ontology and attach it to the rating.
- Consumers aggregate ratings from contexts that are similar to their current context.

Example: A consumer wants to buy a book.
- Context: buying a book
- Some context-aware ratings:
  - Positive rating for buying a book from Amazon.
  - Negative rating for buying a bicycle from Amazon.

Page 41:

MAIN IDEA OF EXPERIENCES

Ratings reflect the satisfaction criteria and taste of the raters.

Instead of negative/positive ratings: "Tell me about your experiences and let me evaluate them on my own."

Page 42:

EXPERIENCES

An experience of a consumer contains:
- The service demand of the consumer
- The identity of the selected service provider
- The service supplied in response to the service demand
- The date of the experience
- The commitments between the consumer and the provider, if any

(A minimal data-structure sketch follows.)
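As a minimal sketch, an experience record could be represented as follows in Python; the field names and types are illustrative assumptions, since the paper expresses experiences in an ontology rather than code.

```python
from dataclasses import dataclass, field
from datetime import date
from typing import Dict, List

@dataclass
class Experience:
    demand: Dict[str, str]        # the consumer's service demand
    provider_id: str              # identity of the selected service provider
    supplied: Dict[str, str]      # the service supplied in response
    when: date                    # date of the experience
    commitments: List[str] = field(default_factory=list)  # commitments, if any
```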

Page 43:

MAKING SERVICE DECISIONS USING EXPERIENCES

- Modeling service providers using a multivariate Gaussian model (parametric classification)
- Case-based reasoning

(A sketch of the Gaussian-model scoring idea follows.)
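The following Python sketch illustrates the parametric-classification idea, under the assumption that each provider has several past services encoded as numeric feature vectors; the regularization term, the density-based scoring, and the selection rule are illustrative, not the paper's exact formulation.

```python
import numpy as np
from scipy.stats import multivariate_normal

def fit_provider_model(service_vectors):
    """Summarize a provider's past services by a mean and covariance."""
    X = np.asarray(service_vectors, dtype=float)  # rows: past services as vectors
    mean = X.mean(axis=0)
    cov = np.cov(X, rowvar=False) + 1e-6 * np.eye(X.shape[1])  # regularized
    return mean, cov

def select_provider(demand_vector, models):
    """Pick the provider whose Gaussian assigns the highest density to the demand."""
    return max(models, key=lambda pid: multivariate_normal.pdf(
        demand_vector, mean=models[pid][0], cov=models[pid][1]))
```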

Page 44:

Comparison of Service Selection Methods

- If the demands of the consumers do not change significantly and the tastes of the consumers are similar for a specific demand, ratings are better.
- If consumers significantly change their demands but their tastes are similar for a specific demand, using context-aware ratings is better.
- If consumers significantly change both their demands and their tastes, using experiences with CBR is better.
- In other cases, using experiences with GM is better.

(A compact restatement as code follows.)
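For illustration only, this comparison can be restated as a small lookup in Python; the boolean inputs are a simplification of the slide's "significantly change" conditions.

```python
def recommend(demands_change, tastes_change):
    """Return the mechanism suggested by the comparison above."""
    if not demands_change and not tastes_change:
        return "SR"    # plain (selective) ratings
    if demands_change and not tastes_change:
        return "CAR"   # context-aware ratings
    if demands_change and tastes_change:
        return "CBR"   # experiences with case-based reasoning
    return "GM"        # remaining cases: experiences with the Gaussian model
```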

Page 45:

Basics of Reinforcement Learning

Each state has a value in terms of the maximum discounted reward expected in that state:

V^π(s_t) = E[ r_t + γ·r_{t+1} + γ²·r_{t+2} + … ]

The equation expresses that the expected value of a state s_t is the weighted (discounted) sum of the rewards received when starting in state s_t and following the current policy.

The action selection at each step is based on Q-values, which are related to the goodness of the actions.

The Q-value, Q(s, a), is the total discounted reward that the agent would receive when it starts at a state s, performs an action a, and behaves optimally thereafter.

Page 46:

Basics of Reinforcement Learning

The purpose of RL is to construct an optimal action policy that maximizes the total reward (i.e., finds the best service providers throughout).

There are different approaches to achieve this, such as Q-Learning and SARSA.

We prefer SARSA in our work because it learns rapidly, and in the early part of learning its average policy is better than that of the other RL approaches.

Page 47:

SARSA

Q(s_t, a_t) ← Q(s_t, a_t) + α·[ r_t + γ·Q(s_{t+1}, a_{t+1}) − Q(s_t, a_t) ]

where α is the learning rate, γ is the discount factor, r_t is the reward, Q(s_t, a_t) is the old Q-value, s_{t+1} is the next state, and a_{t+1} is the action chosen in the next state according to the current policy.

(A minimal update-rule sketch follows.)
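A minimal Python sketch of this update rule; the α and γ values and the dictionary-based Q-table are illustrative assumptions.

```python
from collections import defaultdict

alpha, gamma = 0.1, 0.9        # learning rate and discount factor (assumed)
Q = defaultdict(float)         # Q-table keyed by (state, action), default 0.0

def sarsa_update(s, a, r, s_next, a_next):
    """Move Q(s, a) toward r + gamma * Q(s', a'), as in the equation above."""
    Q[(s, a)] += alpha * (r + gamma * Q[(s_next, a_next)] - Q[(s, a)])
```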

Page 48:

Reward Function

Terminology:

R_X = the average ratio of service decisions that result in satisfaction when mechanism X is used for service selection.

T_X = the average time consumed when mechanism X is used for service selection.

For example, R_CAR = 0.8 means that, on average, 80% of service decisions result in satisfaction of the consumer when context-aware ratings are used for service selection.

Pages 49-51: [Figures: ratio of satisfaction of the GM, SR, CBR, CAR, and RL approaches, and the time-consumption ratio T_RL/T_GM, for the evaluation runs.]