Effects-Based Design of Robust Organizations
Student Paper Submission (Suggested Track: Modeling and Simulation)

Sui Ruan(1), E-mail: [email protected]
Candra Meirina(1), E-mail: [email protected]
Haiying Tu(1), E-mail: [email protected]
Feili Yu(1), E-mail: [email protected]
Krishna R. Pattipati(1,2)

(1) University of Connecticut, Dept. of Electrical and Computer Engineering, 371 Fairfield Road, Unit 1157, Storrs, CT 06269-1157
Phone: 860-486-2890; Fax: 860-486-5585; E-mail: [email protected]

*This work was supported by the Office of Naval Research under contract #N00014-00-1-0101.
(1) Electrical and Computer Engineering Department, University of Connecticut, Storrs, CT 06269-1157, USA.
(2) Correspondence: [email protected]
Effects-Based Design of Robust Organizations
Sui Ruan, Candra Meirina, Haiying Tu, Feili Yu, and Krishna R. Pattipati

Abstract - Effects-based design of robust organizations seeks to synthesize an organizational structure and its strategies (resource allocation, task scheduling, and coordination) to achieve the desired effect(s) in a dynamically changing mission environment. In this paper, we model the dynamic system associated with the mission environment (e.g., the environment faced by a joint task force with a military objective [8], or the competitive environment confronted by a consumer electronics company striving to increase its market share) as a finite-state Markov Decision Process (MDP) [1][2][7]. Using this model, we determine a near-optimal action strategy, which specifies the action to take in each state of the MDP, via a Monte Carlo control method. The action strategy determines a range of possible missions the organization may face. The range of missions and the platform utilization measures, in turn, are used to synthesize a robust organizational structure.

Keywords: Organizational Design, Markov Decision Processes, Reinforcement Learning, Monte Carlo Control Method.
I. INTRODUCTION
A. Motivation
Market environments and battlespaces are dynamic and uncertain. Organizations seeking to achieve the desired effects in such environments are confronted with the following: 1) parts of the environment cannot be controlled directly; 2) various exogenous events may impact the state of the environment; 3) the interactions between an organization's potential actions and the dynamics of the environment may result in consequences that cannot be predicted a priori with certainty. Consequently, organizations need to plan for potential contingencies, and to be flexible and adaptable. In other words, they need to be robust learning organizations. In this paper, we apply concepts from Markov Decision Processes (MDP), reinforcement learning, the Monte Carlo control method, and mixed-integer optimization, as in [1]-[5] and [16], to prescribe an optimal decision strategy and the concomitant organization structure to achieve desired effects in an uncertain mission environment.

(a) Electrical and Computer Engineering Department, University of Connecticut, Storrs, CT 06269-1157, USA. E-mail: [sruan, meirina, yu02001, krishna]@engr.uconn.edu, [email protected]. This work is supported by the Office of Naval Research under contract #N00014-00-1-0101.

Over the years, research in organizational decision-making has demonstrated that a strong functional dependency exists between the specific structure of the mission environment and the resulting optimal organizational structure and its decision strategy. That is, the optimality of an organization design depends on the mission environment and the organizational constraints [15]. Such organizations, termed congruent, are expected to perform well. Incongruence with the environmental dynamics hinders the organization's ability to achieve the desired effects, results in higher costs, or leads to adverse effects.

Agility is arguably one of the most important characteristics of successful information-age organizations [8]. Agile organizations do not just happen. They are the result of an organizational structure, command and control approach, concepts of operation, supporting systems, and personnel that have a synergistic mix of the right characteristics. An agile organization is a synergistic combination of the following six attributes: robustness, resilience, responsiveness, flexibility, innovation, and adaptation [8]. Robustness is the ability to retain a level of effectiveness across a range of missions that span the spectrum of conflicts, operating environments, and/or circumstances. An organization designed to cope with dynamic and stochastic mission environments is said to be robust in the sense that it can cope with a range of contingencies.
B. Related Research
Over the past several years, mathematical and computational models of organizations have attracted a great deal of interest in various fields of scientific research [15]. Many research efforts have focused on the problem of quantifying the structural (mis)match between organizations and their tasks. When modeling a complex mission and designing the corresponding organization, the variety of mission dimensions (e.g., functional, geographical, terrain), together with the required depth of model granularity, determines the complexity of the design process [14]. Model-driven synthesis of optimized organizations for a specific (deterministic) mission environment has been widely studied by the operations research community [12][13].

On the other hand, planning and machine learning in uncertain and dynamic environments have been studied extensively by the control and artificial intelligence communities. In this vein, reinforcement learning, a computational approach to understanding and automating goal-directed learning and decision-making [1]-[4], has become a dominant approach for dealing with stochastic planning problems.
C. Scope and Organization of the Paper

In Section II, we provide an overview of our organization design methodology based on MDP, Monte Carlo control, reinforcement learning, and mixed-integer optimization techniques. In Section III, we formulate the dynamic environment as a finite-state MDP and formalize the objectives of the organizational design problem. In Section IV, we apply the Monte Carlo control method to prescribe near-optimal action strategies for the MDP model. Section V provides an integer programming formulation of the congruent organization design problem. Simulation results are presented in Section VI. Finally, the paper concludes with a summary and future research directions in Section VII.
II. ORGANIZATIONAL DESIGN METHODOLOGY
The methodology applied in this paper is shown in Fig. 1. In this study, we consider a dynamic military mission environment and the design of a robust organization for accomplishing the mission. The mission is represented via tasks and platforms (assets). Three types of tasks are considered in this paper, viz., mission tasks, time-critical tasks, and "mosquitoes" (trivial tasks). Mission tasks are those that must be executed, are known in advance, and typically have precedence restrictions in the form of a dependency graph [9][10]. Time-critical tasks are those whose occurrence is uncertain, are time sensitive, and may have substantial impact on the mission. Mosquitoes are relatively trivial tasks whose occurrence is highly uncertain, but which have insignificant impact on the mission. Each task is characterized by a set of resource requirements [12]. A platform (asset) represents a physical entity of an organization that provides resource capabilities used to process tasks. Each platform has a specific resource capability. Successful task execution requires that the task's resource requirements be met by the overall resource capabilities of the platforms allocated to that task.
We model the dynamic environment as a finite-state Markov Decision Process (MDP) [3][4][7]. The MDP is characterized by its state and action sets and by the transition probability matrix. Given any state-action pair (s, a), the probability that the next state is s' is given by

$\mathcal{P}^{a}_{ss'} = \Pr\{ s_{t+1} = s' \mid s_t = s, a_t = a \} \qquad (1)$
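Eq. (1) can be made concrete by sampling transitions from a tabulated transition probability matrix. The sketch below is a minimal illustration with a hypothetical dictionary-based table; the states, actions, and probabilities are illustrative, not taken from the paper's scenario:

```python
import random

# Hypothetical transition table: P[(s, a)] maps each next state s' to
# Pr{s_{t+1} = s' | s_t = s, a_t = a}, as in Eq. (1).
P = {
    ("s0", "a1"): {"s1": 0.7, "s2": 0.3},
    ("s1", "a1"): {"s1": 0.1, "s2": 0.9},
}

def sample_next_state(s, a, rng=random):
    """Draw s' with probability P^a_{ss'}."""
    dist = P[(s, a)]
    states, probs = zip(*dist.items())
    return rng.choices(states, weights=probs, k=1)[0]

# Empirical frequencies over many draws approximate the specified
# probabilities (0.7 and 0.3 for s1 and s2 from (s0, a1)).
random.seed(0)
counts = {"s1": 0, "s2": 0}
for _ in range(10000):
    counts[sample_next_state("s0", "a1")] += 1
print(counts)
```

In the Monte Carlo setting described below, this table is exactly what the learner does *not* have; it only observes sampled transitions like the ones this generator produces.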
We assume that the structure of the MDP model of the environment is known, but that the environmental parameters (e.g., the transition probabilities) are unknown. Since the model parameters are unknown, they need to be estimated ("learned") [3]; the estimated parameters enable us to find a near-optimal action strategy that optimizes an objective function.
Monte Carlo control methods [3][6] are employed to obtain a near-optimal action strategy. Monte Carlo methods require only experience-based sample sequences of states, actions, and rewards from on-line or simulated interaction with the environment. Learning from on-line experience is striking because it requires no prior knowledge of the environment's dynamics, and yet can still attain optimal behavior asymptotically. The near-optimal strategy from the Monte Carlo control method enables us to compute the platform (resource) utilization measures of that strategy. These measures are used to synthesize an organization that implements the near-optimal action strategy.
III. MDP FORMULATION AND ORGANIZATION DESIGN OBJECTIVES

The dynamic stochastic mission environment consists of:
[Fig. 1. Robust Organization Design Process: MDP Modeling Technique → MDP Model of Mission; Monte Carlo Control Method → Near-Optimal Action Strategy; Mixed-Integer Optimization → Robust Organization; inputs: Mission & Organization Constraints]
• Effect set: M = {m1, m2, ..., mn}, the desired effects, with some serving as the end goals. There can be dependencies among the effects.
• Exogenous event set: E = {e1, e2, ..., ef}, which represents uncontrollable random events in the environment.
• Action set: A = {a1, a2, ..., ak}, which denotes controllable influences used to achieve the desired effects and minimize the adverse effects of exogenous events. Each action a has an associated cost c(a).

When applying this modeling approach to a C2 mission environment [9][10], the mission tasks correspond to the effects, and the time-critical tasks and "mosquito" tasks correspond to exogenous events. The actions correspond to the asset allocations used to achieve the desired effects and suppress the effects of exogenous events.
An organization is a team of decision makers (DMs), ORG = {DM_1, DM_2, ..., DM_D}, i.e., the personnel or automated systems that supervise the actions in the system. The DMs coordinate their information, resources, and activities in order to achieve their common goal in the dynamic and uncertain mission environment [12]. The constraint imposed on each DM is in the form of its limited resource-handling capability; in this paper, we represent the resource-handling capability of each DM as a workload threshold. A DM's workload is a combination of internal workload and external workload [12][13].

The dynamic mission environment is modeled as a finite-state Markov Decision Process (MDP). The main sub-elements of the MDP follow the mission characteristics cited earlier. They are as follows:
• States: S = {s1, s2, ..., sz}
  - Each state represents the status of the effects and exogenous events: s_i = (M_i, E_i), where M_i ⊆ M denotes the achieved effects and E_i ⊆ E denotes the existing, but unmitigated, exogenous events.
  - The initial state is s_b = (∅, ∅), where no effect has been achieved and no exogenous event has appeared yet; the terminal states, S_e ⊆ S, represent the attainment of the desired end effects.
  - The state has the Markov property; that is, the next state depends solely on the previous state and action, as in Eq. (1).
• Actions: A = {a1, a2, ..., ak}.
• Reward mechanism:
  - Reward: when a desired end state is reached, a fixed reward r(s_e) > 0 is accrued by the organization.
  - Penalty: when undesirable end effects are reached, a penalty r(s_h) < 0 is imposed on the organization.
  - Action cost: whenever an action a_i ∈ A is pursued, a cost c(a_i) is incurred.
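The state, reward, and cost definitions above can be sketched as simple data structures. In the fragment below, the terminal reward and penalty echo the simulation model of Section VI (5,000 and -3,000 units), while the effect/event names and action costs are hypothetical placeholders:

```python
from typing import FrozenSet, Tuple

# A state pairs the achieved effects M_i with the unmitigated exogenous
# events E_i, as defined in Section III (labels here are illustrative).
State = Tuple[FrozenSet[str], FrozenSet[str]]

initial_state: State = (frozenset(), frozenset())   # s_b = (empty, empty)
terminal_state: State = (frozenset({"M1", "M2", "M3"}), frozenset())

REWARD_WIN = 5000       # r(s_e) > 0, from the simulation model in Sec. VI
PENALTY_LOSE = -3000    # r(s_h) < 0
ACTION_COST = {"a1": 100, "a2": 80}   # c(a_i), illustrative values

def net_reward(end_state: State, actions_taken) -> float:
    """Net reward of an episode: terminal reward or penalty minus the
    accumulated action costs c(a_i)."""
    effects, _events = end_state
    terminal = REWARD_WIN if effects == {"M1", "M2", "M3"} else PENALTY_LOSE
    return terminal - sum(ACTION_COST[a] for a in actions_taken)

# e.g. winning after taking a1 twice and a2 once:
print(net_reward(terminal_state, ["a1", "a1", "a2"]))  # 4720
```

The Monte Carlo control method of Section IV optimizes exactly this kind of episode-level net reward, observed at termination rather than predicted from a model.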
The objectives of the design problem are to:

1) Find an optimal closed-loop action strategy, viz., a mapping from states to actions, that maximizes the expected net reward.
2) Design an organization, i.e., an allocation of platforms to decision makers, such that the overall workload of the organization is minimized, subject to constraints on each DM's workload capability.

The MDP problem posed above is a stochastic optimal control problem [1][2]. Dynamic programming (DP) [2] is widely used to characterize the optimal solution. However, DP suffers from "the curse of dimensionality": its computational requirements grow exponentially with the number of state variables. In addition, DP needs the complete parameters of the model, whereas in real-world applications the transition probabilities are rarely known. Instead, we can gain knowledge of the system from sampled data and train the strategy on it. We propose Monte Carlo control methods [3][6] to obtain a near-optimal strategy for the MDP model without complete knowledge of the MDP parameters (i.e., when the transition probabilities are not known). The Monte Carlo method requires only that the MDP model generate sample state transitions, not the complete probability distributions over all possible state transitions; it learns these quantities from an on-line simulation. This, in turn, enables us to obtain an optimal strategy [3] without prior knowledge of the environment's dynamics. An agent-based simulator, such as the one in [11], can provide a vehicle to operationalize the various mission episodes utilized in the Monte Carlo method.

Platform (resource) utilization measures of the near-optimal strategy are employed to design an organization that is congruent with the mission environment. In this paper, the expected total workload of the organization is minimized. An organization tuned to such a strategy is robust in the sense that it covers the range of missions that are most likely to materialize in the dynamic environment.
IV. MONTE CARLO METHOD FOR NEAR-OPTIMAL ACTION STRATEGY

Monte Carlo methods solve the reinforcement learning problem by averaging sample returns [3][6]. In Monte Carlo methods, we assume that the experience is divided into episodes, and that all episodes eventually terminate regardless of which actions are selected. Only upon the completion of an episode are the value estimates (estimates of the expected returns of states and state-action pairs) and policies (strategies, i.e., mappings from states to actions) changed. Monte Carlo methods are thus incremental in an episode-by-episode sense [3][6], but not in a step-by-step sense. The term "Monte Carlo" is often used more broadly for any estimation method whose operation involves a significant random component; here, we use it specifically for methods based on averaging complete returns.

Monte Carlo methods can be applied to evaluate the state values of any given strategy, viz., a mapping of states to actions, by averaging the returns over sample episodes. Starting from Monte Carlo policy evaluation, it is natural to alternate between evaluation and improvement on an episode-by-episode basis: after each episode, the observed returns are used for policy evaluation, and the policy is then improved at all the states visited in the episode [3]. In addition, Monte Carlo methods are particularly attractive when one requires the values of only a subset of the states: one can generate many sample episodes starting from these states, averaging returns only from them and ignoring all others.
The complete Monte Carlo control algorithm, which combines Exploring Starts (ES) and ε-greedy exploration, is given in Fig. 2. The ES technique starts each episode by randomly choosing the initial state. ε-greedy selection means that most of the time the action with the maximal estimated action value is chosen, but with probability ε an action is selected at random; that is, each non-greedy action is given the minimal selection probability ε/|A(s)|, and the remaining bulk of the probability, 1 − ε + ε/|A(s)|, is given to the greedy action [3], where |A(s)| is the cardinality of the action set A(s) in state s. This enables the Monte Carlo control method to escape local minima.
Monte Carlo Control Algorithm (Monte Carlo ES and ε-greedy combined)

Initialize: ∀ s_i ∈ S, a ∈ A(s_i):
• Q(s_i, a) (value function of s_i and a) ← arbitrary positive real number
• π(s_i) (state-action mapping function of s_i) ← arbitrary action
• Returns(s_i, a) ← empty list

Repeat:
1) Generate an episode, ρ, using exploring starts [3] and π.
2) For each pair (s_i, a) appearing in episode ρ:
   • R ← return following the first occurrence of (s_i, a)
   • Append R to Returns(s_i, a)
   • Q(s_i, a) ← average(Returns(s_i, a))
3) For each s_i in the episode:
   π(s_i) ← arg max_a Q(s_i, a) with probability 1 − ε
   π(s_i) ← rand(A(s_i)) with probability ε
   V(s_i) ← max_a Q(s_i, a)
4) ε → 0
Until V(s_i), ∀ s_i ∈ S, converge, i.e., Σ_{s_i ∈ S} |(V(s_i) − V'(s_i)) / V(s_i)| ≤ δ

Fig. 2. Monte Carlo control method with Exploring Starts and ε-greedy
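A compact, runnable sketch of the control loop of Fig. 2 is given below. The two-state toy MDP (step(), STATES, ACTIONS) is a hypothetical stand-in for the mission simulator, not the paper's scenario, and the decaying schedule ε_i = 2/(2+i) follows the form suggested in the text:

```python
import random
from collections import defaultdict

# Toy episodic MDP standing in for the mission environment.
STATES = ["s0", "s1"]
ACTIONS = ["a1", "a2"]
TERMINAL = "end"

def step(s, a, rng):
    """Toy dynamics: a1 advances s0 -> s1; a2 from s1 ends with reward 10."""
    if s == "s0":
        return ("s1", 1.0) if a == "a1" else ("s0", -1.0)
    return (TERMINAL, 10.0) if a == "a2" else ("s0", -3.0)

def generate_episode(pi, eps, rng):
    """Exploring start on (s, a), then epsilon-greedy behavior; returns the
    trajectory as a list of (state, action, reward) triples."""
    s, a = rng.choice(STATES), rng.choice(ACTIONS)
    traj = []
    while s != TERMINAL and len(traj) < 50:      # cap guarantees termination
        s_next, r = step(s, a, rng)
        traj.append((s, a, r))
        s = s_next
        if s != TERMINAL:
            a = pi[s] if rng.random() > eps else rng.choice(ACTIONS)
    return traj

def mc_control(episodes=2000, seed=0):
    rng = random.Random(seed)
    Q = defaultdict(float)
    returns = defaultdict(list)
    pi = {s: rng.choice(ACTIONS) for s in STATES}
    for i in range(episodes):
        eps = 2.0 / (2 + i)                      # eps_i -> 0, sum eps_i = inf
        traj = generate_episode(pi, eps, rng)
        seen = set()
        for t, (s, a, _) in enumerate(traj):     # first-visit updates
            if (s, a) in seen:
                continue
            seen.add((s, a))
            G = sum(r for (_, _, r) in traj[t:])  # return after first visit
            returns[(s, a)].append(G)
            Q[(s, a)] = sum(returns[(s, a)]) / len(returns[(s, a)])
        for s in STATES:                          # greedy policy improvement
            pi[s] = max(ACTIONS, key=lambda act: Q[(s, act)])
    return pi, Q

pi, Q = mc_control()
print(pi)
```

The first-visit averaging and the episode-wise policy improvement mirror steps 2) and 3) of Fig. 2; with these toy dynamics, the learned strategy settles on driving the system to the rewarding terminal state.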
Note that applying a simple Monte Carlo control method based only on exploring starts [3] is not satisfactory: some states are rarely visited, while other states form cycles. An ε-greedy [3] method is therefore often combined with the ES method. A fixed ε is not efficient: a large ε yields slow convergence and tends to oscillate, while with a small ε it is difficult to eliminate possible state cycles. Typically, one chooses ε_i → 0 such that Σ_i ε_i → ∞. For example, with ε_i = ε/(N + i), N ∈ (2, 6), the results are satisfactory.
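The two conditions on the schedule (decay to zero, divergent partial sums) can be checked numerically; in this small sketch, eps0 = 1 and N = 2 are illustrative choices within the suggested range:

```python
eps0, N = 1.0, 2          # illustrative choices; the text suggests N in (2, 6)

def eps(i: int) -> float:
    """Decaying exploration rate: eps_i = eps0 / (N + i)."""
    return eps0 / (N + i)

# eps_i -> 0 as i grows ...
print(eps(0), eps(10_000))

# ... yet the partial sums diverge (a harmonic series, growing like log n),
# so every state-action pair keeps a nonzero chance of being explored.
for n in (10**2, 10**4, 10**6):
    print(n, sum(eps(i) for i in range(n)))
```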
In the Monte Carlo control algorithm, all the returns for each state-action pair are accumulated and averaged, irrespective of which policy was in force and when they were observed. Convergence of the Monte Carlo control method is expected because the changes to the action-value function decrease over time; however, a formal proof has not yet appeared, and this remains one of the most fundamental open questions in reinforcement learning [3][4].
V. ORGANIZATION DESIGN
In a C2 environment [10], each mission task can be modeled as a desired effect, while time-critical tasks and "mosquito" tasks can be viewed as exogenous events. A task is an activity that entails the use of relevant resources (provided by the organization's platforms) and is carried out by an individual DM or a group of DMs to accomplish the mission objectives [12]. Each task T_i (i = 1, ..., N) has resource requirements specified by the row vector [R_i1, R_i2, ..., R_iL]. A platform is a physical asset of an organization that provides resource capabilities and is used to process tasks. Each platform belongs to a unique platform class. For each platform class P_m (m = 1, ..., K), we define its resource capability vector [r_m1, ..., r_mL], where r_ml specifies the number of units of resource type l available on a platform of class P_m [12]. We assume that the number of platform classes and the number of available platforms in each class are known. In the MDP model, each action a_i is associated with a set of platforms P(a_i) = [x_i1, x_i2, ..., x_im], specifying the numbers of platforms of each class needed to pursue the action. The workload of each decision maker in the organization embodies the activity of supervising its platforms and coordinating them with other decision makers' platforms, so that actions are pursued collaboratively to achieve the desired effects and mitigate the effects of exogenous events.
Starting from the near-optimal strategy, we can design an organization structure that is congruent with this strategy. In this paper, we study only the ownership of each platform of every platform class, chosen so that the expected total workload of the organization is minimized while each decision maker's workload constraint is satisfied.
Let [n_1, ..., n_m] be the available numbers of platforms for the platform classes [P_1, ..., P_m], and let [x_i1, ..., x_im] be the numbers of platforms of each platform class used by action a_i. Define [X_1, ..., X_m] as the numbers of platforms used by any action. Since the occurrence of actions in the optimal strategy is probabilistic, so is [X_1, ..., X_m].

Let y_ki be the number of platforms of class P_i allocated to DM_k. For the organization ORG = (DM_1, ..., DM_D), the workload WL_k of DM_k, k ∈ {1, ..., D}, is approximated as:

$WL_k = \alpha \sum_{i=1}^{m} \frac{y_{ki}}{n_i} E(X_i) + \beta \sum_{i=1}^{m} \sum_{j=1}^{m} \frac{y_{ki}(n_j - y_{kj})}{n_i n_j} E(X_i X_j), \quad \forall k = 1, ..., D \qquad (2)$
where $\alpha \sum_{i=1}^{m} \frac{y_{ki}}{n_i} E(X_i)$ quantifies the internal workload of DM_k, which accounts for the supervision of the platforms that DM_k owns. The internal workload of DM_k incurred by platform class P_i is proportional to:
- the expected number of platforms of class P_i, i.e., E(X_i);
- the proportion of platforms of class P_i that DM_k owns, i.e., y_ki/n_i.

That is, the larger the value of E(X_i), the higher the workload imposed by platform class P_i; and the larger the proportion y_ki/n_i of platforms of class P_i owned by decision maker DM_k, the higher the workload from class P_i on DM_k. The term $\beta \sum_{i=1}^{m} \sum_{j=1}^{m} \frac{y_{ki}(n_j - y_{kj})}{n_i n_j} E(X_i X_j)$ quantifies the external workload, i.e., the coordination effort of DM_k in cooperating with the other DMs in the organization to execute tasks. The external workload on DM_k incurred by platform classes P_i and P_j is proportional to:
- the joint expectation of the numbers of platforms of classes P_i and P_j, i.e., E(X_i X_j);
- the proportion of platforms of class P_i that DM_k owns, i.e., y_ki/n_i;
- the proportion of platforms of class P_j that DM_k does not own, i.e., (n_j − y_kj)/n_j.

Here, the higher-order expectations of the joint numbers of platform classes, e.g., the third-order expectation E(X_i X_j X_w), i, j, w ∈ {1, ..., m}, are ignored. User-specified constants α and β define the weights on the internal and external workloads.
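Eq. (2) can be evaluated directly from the sample moments. In the sketch below, the platform counts n and the moment estimates (which echo Table IV) are used to score a hypothetical allocation y_k; all numeric values are illustrative inputs, not outputs of the paper's optimization:

```python
alpha, beta = 1.0, 1.0
n = [3, 5, 3, 2]                        # n_i: available platforms per class
EX = [1.2, 2.3, 3.1, 2.0]               # sample means approximating E(X_i)
EXX = [[1.5, 0.75, 0.25, 1.0],          # sample means approximating E(X_i X_j)
       [0.75, 1.916, 0.25, 1.58],
       [0.25, 0.25, 0.333, 0.667],
       [1.0, 1.583, 0.667, 2.667]]

def workload(y_k):
    """WL_k of Eq. (2): internal (supervision) plus external (coordination)."""
    m = len(n)
    internal = alpha * sum(y_k[i] / n[i] * EX[i] for i in range(m))
    external = beta * sum(
        y_k[i] * (n[j] - y_k[j]) / (n[i] * n[j]) * EXX[i][j]
        for i in range(m) for j in range(m))
    return internal + external

print(workload([1, 3, 1, 1]))   # score one hypothetical allocation
```

Two sanity checks follow from the formula: an empty allocation has zero workload, and a DM owning every platform has zero coordination term, leaving only the internal workload α Σ E(X_i).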
Thus, the total workload of all DMs is:

$WL_{all} = \alpha \sum_{i=1}^{m} E(X_i) + \beta \sum_{i=1}^{m} \sum_{j=1}^{m} \left(1 - \frac{\sum_{k=1}^{D} y_{ki} y_{kj}}{n_i n_j}\right) E(X_i X_j) \qquad (3)$
The optimization problem associated with organizational design is as follows:

objective:
$\min WL_{all} = \alpha \sum_{i=1}^{m} E(X_i) + \beta \sum_{i=1}^{m} \sum_{j=1}^{m} \left(1 - \frac{\sum_{k=1}^{D} y_{ki} y_{kj}}{n_i n_j}\right) E(X_i X_j) \qquad (4)$

subject to:
$WL_k = \alpha \sum_{i=1}^{m} \frac{y_{ki}}{n_i} E(X_i) + \beta \sum_{i=1}^{m} \sum_{j=1}^{m} \frac{y_{ki}(n_j - y_{kj})}{n_i n_j} E(X_i X_j) \le C_k, \quad \forall k = 1, ..., D$
$\sum_{k=1}^{D} y_{ki} = n_i, \quad \forall i = 1, ..., m$
$y_{ki} \in I^{+}, \quad \forall k = 1, ..., D, \; \forall i = 1, ..., m \qquad (5)$
Note that E(X_i) is the expected number of platforms of class P_i, and E(X_i X_j) is the joint expectation of the numbers of platforms of classes P_i and P_j. These expectations are approximated by their sample means:

$\bar{X}_i \approx E(X_i), \qquad \overline{X_i X_j} \approx E(X_i X_j), \quad \forall i, j \in \{1, ..., m\} \qquad (6)$
The problem defined in Eq. (5) is, in general, a nonlinear constrained integer optimization problem, which is NP-hard. We applied the MINLP (Mixed-Integer Nonlinear Programming) algorithms in the TOMLAB™ [17] optimization software to solve the above integer programming problem; specifically, the solver package minlpBB, for sparse and dense mixed-integer linear, quadratic, and nonlinear programming, was employed.
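The paper's MINLP solver (TOMLAB's minlpBB) is a commercial MATLAB package, so the sketch below instead enumerates all feasible allocations of a tiny hypothetical instance of Eqs. (3)-(5). It is only a small-scale stand-in showing the structure of the search (objective, capacity constraints, and the coupling Σ_k y_ki = n_i), not the solver actually used:

```python
import itertools

# Tiny hypothetical instance: m = 2 platform classes, D = 2 DMs.
alpha, beta = 1.0, 1.0
n = [2, 2]                      # n_i: available platforms per class
EX = [1.0, 1.5]                 # illustrative E(X_i)
EXX = [[1.2, 0.8],
       [0.8, 2.0]]              # illustrative E(X_i X_j)
C = [3.0, 3.0]                  # per-DM workload caps C_k

def wl(y_k):
    """WL_k of Eq. (2) for one DM's allocation vector y_k."""
    m = len(n)
    internal = alpha * sum(y_k[i] / n[i] * EX[i] for i in range(m))
    external = beta * sum(
        y_k[i] * (n[j] - y_k[j]) / (n[i] * n[j]) * EXX[i][j]
        for i in range(m) for j in range(m))
    return internal + external

best_total, best_y = None, None
# Enumerate DM1's allocation; DM2 takes the remainder, so sum_k y_ki = n_i
# holds by construction (the equality constraint of Eq. (5)).
for y1 in itertools.product(range(n[0] + 1), range(n[1] + 1)):
    y2 = tuple(n[i] - y1[i] for i in range(len(n)))
    if all(wl(y) <= C[k] for k, y in enumerate((y1, y2))):
        total = wl(y1) + wl(y2)
        if best_total is None or total < best_total:
            best_total, best_y = total, (y1, y2)
print(best_y, best_total)
```

On this instance the caps are loose enough that giving all platforms to one DM is feasible, which drives the coordination term of Eq. (3) to zero; with tighter caps C_k, the minimizer is forced to split ownership and pay coordination workload, which is the trade-off the full MINLP resolves.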
VI. SIMULATION
A. Simulation Model

We illustrate our approach to effects-based robust organization design using a simple joint-task-force scenario that can be operationalized in the distributed dynamic decision-making (DDD-III) war-gaming simulator [9]. In the example system, there are four platform classes, P1-P4, as defined in Table I; three mission tasks (desired effects), M1-M3; and three time-critical tasks (exogenous events), T1-T3, as defined in Table II. The dependencies among the mission tasks are shown in Fig. 3.

The following three types of resources are considered:
- ASUW - Anti-SUbmarine Warfare
- STK - STriKe warfare
- SOF - Special/ground Operations
Rules of the simulation model, which are unknown to the learning agent, are as follows:
- All mission tasks must be executed while satisfying the dependency constraints;
- All time-critical tasks, if they appear, have to be completed before the final mission task is completed;
- Each resource contributes to task accuracy: the more resources allocated to a task, the higher the probability of completing it;
- If the final mission task is achieved, the game is won and a reward of 5,000 units is earned;
- If three time-critical tasks exist in the system simultaneously, the game is lost and a penalty of 3,000 units is incurred.
B. Monte Carlo Simulations

Using the Monte Carlo control method shown in Fig. 2, we obtain a near-optimal (stable) action strategy via 85,000 runs (episodes). The near-optimal action strategy, i.e., the mapping of states to actions (viz., combinations of platforms), and the values of the states are listed in Table III. A comparison of the net rewards of the near-optimal strategy obtained by the Monte Carlo control method and of a randomized greedy strategy over 2,000 runs is shown in Fig. 4. The greedy strategy chooses randomly, at each state, one of the five most economically feasible platform combinations. After 10,000 runs (sample episodes), the average net reward of the near-optimal strategy is 2985, while that of the greedy strategy is only 2127. Thus, the near-optimal strategy provides a 40% better reward than the greedy strategy.

TABLE I
PLATFORMS
Platform | Name | Number | ASUW | STK | SOF | Cost
P1 | F18s | 3 | 0 | 2 | 0 | 100
P2 | FAB  | 5 | 1 | 0 | 0 | 80
P3 | FOB  | 3 | 1 | 1 | 1 | 160
P4 | SOF  | 2 | 0 | 0 | 1 | 60
Reward (Win): 5000; Penalty (Lose): -3000

TABLE II
MISSIONS AND TIME-CRITICAL TASKS
Task | Name | ASUW | STK | SOF
M1 | NB - Naval Base | 2 | 0 | 2
M2 | AB - Air Base | 0 | 6 | 0
M3 | PRT - Sea Port (final goal) | 2 | 2 | 0
T1 | SCUD - missile launcher | 1 | 1 | 0
T2 | PH - Hostile ship | 1 | 0 | 1
T3 | TSK - complex ground task | 0 | 1 | 1
C. Organization Design Results

The platform utilization statistics, i.e., the sample means of the numbers of platforms of each class,
[Fig. 3. Effect Dependency: mission tasks M1 (Naval Base) and M2 (Air Base) precede the final task M3 (Sea Port)]
[Fig. 4. Comparison of Net Reward Distributions: histograms of net reward (range -8000 to 4000) for the greedy strategy and the near-optimal strategy]
TABLE III
NEAR-OPTIMAL STRATEGY
State | Action (platforms: P1 P2 P3 P4) | State Value
∅ | 1 1 1 3 | 2777.72
M1 | 3 0 0 0 | 3008.48
M2 | 0 2 0 2 | 3565.36
T1 | 1 2 0 1 | 2115.63
T2 | 0 2 0 2 | 2174.47
T3 | 1 0 0 1 | 2291.07
T1, T2 | 1 1 0 0 | 1796.93
M1, T1 | 3 0 0 0 | 2270.26
M1, T2 | 0 2 1 2 | 2278.97
M1, T3 | 3 0 0 0 | 2100.65
M2, T1 | 0 2 0 2 | 2407.18
· · · | · · · | · · ·
M1, T1, T2 | 1 0 1 1 | 1693.73
M1, M2 | 1 2 0 0 | 3221.11
M1, M2, T1 | 1 2 0 1 | 2358.82
· · · | · · · | · · ·
$\bar{X}_i$, i ∈ {1, ..., m}, and the sample means of the joint numbers of platform-class pairs, $\overline{X_i X_j}$, i, j ∈ {1, ..., m}, are listed in Table IV. Using the MINLP algorithms in TOMLAB™, we obtain the congruent organization for the near-optimal strategy, shown in Table V.
VII. CONCLUSION AND FUTURE WORK
In this paper, we proposed a methodology for designing a robust organization for stochastic and dynamic environments. The dynamic environment is modeled as a finite-state Markov Decision Process, and a near-optimal action strategy is obtained using Monte Carlo control methods. An
TABLE IV
PLATFORM UTILIZATION STATISTICS
$\bar{X}_i$: P1 = 1.2, P2 = 2.3, P3 = 3.1, P4 = 2

$\overline{X_i X_j}$ | P1 | P2 | P3 | P4
P1 | 1.5 | 0.75 | 0.25 | 1.00
P2 | 0.75 | 1.916 | 0.25 | 1.58
P3 | 0.25 | 0.25 | 0.333 | 0.667
P4 | 1.00 | 1.583 | 0.667 | 2.667
TABLE V
DECISION MAKER PLATFORM OWNERSHIP (α = 1, β = 1)
DM | P1 | P2 | P3 | P4 | Expected Workload | Workload Constraint
DM1 | 1 | 3 | 1 | 1 | 5.7065 | 8
DM2 | 1 | 1 | 1 | 1 | 4.9265 | 6
DM3 | 1 | 1 | 1 | 0 | 3.3766 | 6
organization congruent with this strategy is then designed by solving an integer optimization problem.

Simulation results support the conclusion that Monte Carlo control methods are effective in obtaining near-optimal strategies in real-world applications where the system parameters are uncertain. The formulation of the organizational design problem, together with mixed-integer optimization algorithms, provides an effective vehicle for designing organizations that are congruent with their dynamic and uncertain environments.
We are pursuing future research along the following directions:
- Incorporate realistic mission environments into the MDP model;
- Include additional organizational structure elements in the design process, e.g., the command structure and the information flow structure;
- Study the mechanisms of organizational adaptation via agent-based simulations.
REFERENCES

[1] D. P. Bertsekas and S. E. Shreve, Stochastic Optimal Control: The Discrete-Time Case, Athena Scientific, 1996.
[2] D. P. Bertsekas, Dynamic Programming and Optimal Control, Athena Scientific, 2001.
[3] R. S. Sutton and A. G. Barto, Reinforcement Learning: An Introduction, The MIT Press, 1998.
[4] L. P. Kaelbling, M. L. Littman, and A. W. Moore, "Reinforcement learning: A survey," Journal of Artificial Intelligence Research, Vol. 4, pp. 237-285, 1996.
[5] P. R. Kumar and P. Varaiya, Stochastic Systems: Estimation, Identification, and Adaptive Control, Prentice-Hall, Englewood Cliffs, NJ, 1986.
[6] G. Tesauro and G. R. Galperin, "On-line policy improvement using Monte-Carlo search," Advances in Neural Information Processing Systems: Proceedings of the Ninth Conference, 1996.
[7] M. L. Puterman, Markov Decision Processes: Discrete Stochastic Dynamic Programming, Wiley-Interscience, 1994.
[8] D. S. Alberts and R. E. Hayes, Power to the Edge: Command and Control in the Information Age, Information Age Transformation Series, CCRP Publications, 2003.
[9] D. L. Kleinman, P. W. Young, and G. S. Higgins, "The DDD-III: A tool for empirical research in adaptive organizations," Proc. 1996 Command and Control Research and Technology Symposium, NPS, Monterey, CA, June 1996.
[10] D. L. Kleinman, G. M. Levchuk, S. G. Hutchins, and W. G. Kemple, "Scenario design for the empirical testing of organizational congruence," 8th International Command and Control Research and Technology Symposium, Monterey, CA, June 2003.
[11] C. Meirina, G. M. Levchuk, and K. R. Pattipati, "A multi-agent decision framework for DDD-III environment," 8th International Command and Control Research and Technology Symposium, Monterey, CA, June 2003.
[12] G. M. Levchuk, Y. N. Levchuk, J. Luo, K. R. Pattipati, and D. L. Kleinman, "Normative design of organizations - Part I: Mission planning," IEEE Trans. on SMC, Part A: Systems and Humans, Vol. 32, No. 3, pp. 346-359, May 2002.
[13] G. M. Levchuk, Y. N. Levchuk, J. Luo, K. R. Pattipati, and D. L. Kleinman, "Normative design of organizations - Part II: Organizational structures," IEEE Trans. on SMC, Part A: Systems and Humans, Vol. 32, No. 3, pp. 360-375, May 2002.
[14] G. M. Levchuk, Y. Levchuk, J. Luo, F. Tu, and K. R. Pattipati, "A library of optimization algorithms for organizational design," 2000 Command and Control Research and Technology Symposium, Monterey, CA, June 26-28, 2000.
[15] K. M. Carley and Z. Lin, "Organizational design suited to high performance under stress," IEEE Transactions on Systems, Man, and Cybernetics, Vol. 25, pp. 221-231, 1995.
[16] G. L. Nemhauser and L. A. Wolsey, Integer and Combinatorial Optimization, Wiley-Interscience, New York, 1988.
[17] K. Holmstrom, A. O. Goran, and M. M. Edvall, User's Guide for TOMLAB /MINLP, http://tomlab.biz/download/manuals.php.
1
Effects-Based Design of Robust Organizations

Command and Control Research and Technology Symposium
June 16, 2004

Sui Ruan
Candra Meirina
Haiying Tu
Feili Yu
Krishna R. Pattipati

Dept. of Electrical and Computer Engineering, University of Connecticut
Contact: [email protected], (860) 486-2890
2
Outline

Introduction
Modeling Mission Environment
Markov Decision Processes for Mission Environment
Illustrative Example
Monte Carlo Control Method
Robust Organization Design
Summary and Future Work
3
Effects-Based Design of Robust Organizations: Design Problem

Objective: design robust organizational structures and strategies to account for a dynamically changing mission environment

Methodology:
- Mission model: finite-state Markov Decision Process
- Methods:
  - Robust strategies: Monte Carlo control methods
  - Robust structures: mixed-integer nonlinear programming
4
Design Methodology

Robust organization design methodology (flow): mission and organization constraints feed an MDP modeling technique that produces an MDP model of the mission; learning methods then yield a near-optimal strategy; finally, mixed-integer optimization produces the robust organization.
5
Modeling Mission Environment 1/3

Characteristics of dynamic and stochastic environments:
- Parts of the environment cannot be controlled directly
- Various exogenous events may impact the environment
- Consequences of actions cannot be predicted a priori with certainty

Requirements for organizations coping with stochastic environments:
- Plan for potential contingencies
- Remain congruent with the dynamic mission environment
- Be robust
6
Modeling Mission Environment 2/3

Dynamic stochastic mission environment:
- Effects: the desired effects, with some serving as the end goals
- Exogenous events: uncontrollable random events
- Actions: controllable influences to achieve the desired effects and minimize the adverse effects of exogenous events

Organization:
- A team of decision makers (DMs)
- Human or automated systems
- Limited resource-handling capability (workload threshold)
7
Modeling Mission Environment 3/3

Command and control mission environment and organization (figure):
- Mission tasks produce effects; exogenous events include time-critical tasks and "mosquito" tasks
- Task T_m: resource requirement vector [r_{m1}, r_{m2}, ..., r_{mL}]
- Platform (asset) P_i: resource capability vector [R_{i1}, R_{i2}, ..., R_{iL}]
- Organization: ownership of platforms; asset-task allocation actions
8
Markov Decision Process for C2 Mission Environment

States: status of effects and exogenous events, S = {s_1, s_2, ..., s_z}, where s_i = (M_i, E_i), with M_i ⊆ M the achieved effects and E_i ⊆ E the unmitigated exogenous events

Actions: A = {a_1, a_2, ..., a_k}, platform-to-task allocations

Reward mechanism:
- Reward r(s_e) > 0 when the desired end effect is reached
- Penalty r(s_h) < 0 when undesirable end effects are reached
- Cost C(a_i) > 0 when an action is pursued

Transition probability matrix: ℘^a_{ss'} = pr{s_{t+1} = s' | s_t = s, a_t = a}

Optimal action strategy: a mapping from states to actions that maximizes the expected net reward
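The MDP elements above can be sketched in code. This is a toy illustration: the state names, transition probabilities, and the particular cost values are made up for the sketch (only the win reward of 5000 and loss penalty of -3000 echo the example), not the paper's actual model.

```python
import random

# Toy MDP in the slide's form: states capture mission status,
# actions are platform-to-task allocations.
STATES = ["start", "M2_T2", "win", "lose"]
ACTIONS = ["a1", "a2"]

# pr(s' | s, a): transition probabilities (illustrative values only)
P = {
    ("start", "a1"): {"M2_T2": 0.7, "lose": 0.3},
    ("start", "a2"): {"M2_T2": 0.4, "lose": 0.6},
    ("M2_T2", "a1"): {"win": 0.8, "lose": 0.2},
    ("M2_T2", "a2"): {"win": 0.5, "lose": 0.5},
}

END_REWARD = {"win": 5000, "lose": -3000}  # reward / penalty for end effects
COST = {"a1": 100, "a2": 80}               # cost of pursuing an action

def step(state, action):
    """Sample the next state and the immediate net reward."""
    dist = P[(state, action)]
    nxt = random.choices(list(dist), weights=list(dist.values()))[0]
    reward = END_REWARD.get(nxt, 0) - COST[action]
    return nxt, reward
```

An episode is then a sequence of `step` calls until a terminal state (`win` or `lose`) is reached, and the net reward is the sum of the sampled rewards.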
9
Illustrative Example

Platforms (assets):
  P1: F18S, 3 units
  P2: FAB, 5 units
  P3: FOB, 3 units
  P4: SOF, 2 units

Mission tasks and exogenous events (resource requirements over the classes ASUW, SOF, STK):
  M1: Naval Base
  M2: Air Base
  M3: Sea Port (final)
  T1: SCUD missile
  T2: Hostile ship
  T3: TSK, complex group task

Rewards and costs: reward 5000 (win); penalty -3000 (lose); action costs 100, 80, 160, 60

Scenario map (figure): Naval Base (M1), Air Base (M2), Sea Port (M3); SCUD, Hostile Ship, TSK; assets F18S, FAB, FOB, SOF
10
Monte Carlo Control Method – 1/4

Objective: learn the optimal action strategy, i.e., a mapping from states to actions that maximizes the expected net reward

Exploration method for finding a start state: each episode starts from a randomly selected initial state

Epsilon-greedy method: select the best-known action with probability 1 - ε and a random action otherwise, avoiding local optima

Example state-action values:
  State     Action              S-A Value
  S1(T2)    a1 (<2P2+2P4> m1)   1560
  S1(T2)    a2 (<3P1> m2)       2000
  S1(T2)    a3 (<P3> T2)        1320

Figure: scenario map (legend: mission task, time-critical task, "mosquitoes", asset) showing state s1, with action a2 applied to the Air Base (M2); Naval Base (M1), Sea Port (M3), SCUD (T2)
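The epsilon-greedy rule can be sketched as follows; the `Q` table holds the illustrative state-action values from this slide, and the function name is our own.

```python
import random

# State-action values (numbers taken from the slide's example)
Q = {"S1(T2)": {"a1": 1560, "a2": 2000, "a3": 1320}}

def epsilon_greedy(Q, state, eps=0.1):
    """With probability 1 - eps exploit the best-known action;
    otherwise explore a uniformly random one."""
    actions = Q[state]
    if random.random() < eps:
        return random.choice(list(actions))
    return max(actions, key=actions.get)
```

With `eps=0` this always picks a2 (value 2000) in state S1(T2); a small positive `eps` keeps the learner from locking onto an early, possibly suboptimal estimate.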
11
Monte Carlo Control Method – 2/4

Example state-action values:
  State        Action                     S-A Value
  S2(M2,T2)    a1 (<2P2+2P4> m1)          1200
  S2(M2,T2)    a3 (<P3> T2)               700
  S2(M2,T2)    a4 (<P2+P4> T1)            1020
  S2(M2,T2)    a5 (<P3> T1, <P2+P4> T2)   1400

Figure: scenario map showing state s2, with Hostile Ship (T1) and SCUD (T2) present and action a5 applied; Naval Base (M1), Air Base (M2), Sea Port (M3)
12
Monte Carlo Control Method – 3/4

Update the state-action values along the episode (terminal state s_n, net reward 3400):
  State        Action                     S-A Value
  S1(T2)       a1 (<2P2+2P4> m1)          1560
  S1(T2)       a2 (<3P1> m2)              2170
  S1(T2)       a3 (<P3> T2)               1320
  S2(M2,T2)    a1 (<2P2+2P4> m1)          1200
  S2(M2,T2)    a3 (<P3> T2)               700
  S2(M2,T2)    a4 (<P2+P4> T1)            1020
  S2(M2,T2)    a5 (<P3> T1, <P2+P4> T2)   2800

Figure: scenario map with Naval Base (M1), Air Base (M2), Sea Port (M3)
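The update step can be sketched as an every-visit Monte Carlo average; the incremental form and the variable names are assumptions for illustration, since the slides do not spell out the exact update rule.

```python
from collections import defaultdict

Q = defaultdict(float)  # state-action value estimates
N = defaultdict(int)    # visit counts per (state, action) pair

def mc_update(episode, net_reward):
    """episode: list of (state, action) pairs visited;
    net_reward: the episode's total return.
    Moves each visited pair's value toward the observed return:
    Q <- Q + (G - Q) / N  (incremental sample average)."""
    for s, a in episode:
        N[(s, a)] += 1
        Q[(s, a)] += (net_reward - Q[(s, a)]) / N[(s, a)]

# e.g. an episode through the two states above, ending with net reward 3400
mc_update([("S1(T2)", "a2"), ("S2(M2,T2)", "a5")], 3400)
```

The values on the slide (2170, 2800) are such averages over several episodes, so a single update with return 3400 pulls the estimates only partway toward 3400.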
13
Monte Carlo Control Method – 4/4

Converged state-action values yield the optimal strategy (a mapping from states to actions):

  State     Action   State-Action Value
  T2        a1       1780
  T2        a2       323
  T2        a3       1356
  M2,T2     a1       2115
  M2,T2     a3       930
  M2,T2     a4       1454
  M2,T2     a5       3020

Optimal strategy:
  State     Optimal Action
  φ         3P1
  M1        P1+P2+P3+3P4
  M2        2P2+2P4
  T1        P1+2P2+P4
  T2        2P2+2P4
  T3        P1+P4
  T1,T2     P1+P2
  M2,T2     P2+P3+P4
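Going from converged values to the strategy is a simple argmax over each state's actions; a minimal sketch, using illustrative values for the two states shown above:

```python
# Converged state-action values (illustrative)
Q = {
    "T2":    {"a1": 1780, "a2": 323, "a3": 1356},
    "M2,T2": {"a1": 2115, "a3": 930, "a4": 1454, "a5": 3020},
}

# Optimal strategy: in each state, pick the action with the highest value
strategy = {s: max(actions, key=actions.get) for s, actions in Q.items()}
# -> {"T2": "a1", "M2,T2": "a5"}
```

This matches the strategy table: in state T2 the best action is a1 (2P2+2P4), and in state (M2, T2) it is a5 (P2+P3+P4).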
14
Robust Organization Design 1/2

Objective: given the asset utilization of the near-optimal strategy, design a congruent organizational structure in terms of DM ownership of platforms, such that the overall workload is minimized

Approach: problem formulation → mixed-integer optimization algorithms → robust organization

Figure: organization of DM1-DM4, with internal workload (within a DM), external workload (coordination across DMs), and a per-DM workload constraint
15
Robust Organization Design 2/2

Workload of DM_k:
  WL_k = internal workload + external workload
  Internal workload ∝ Σ_{i=1}^{m} (activity of platform class P_i) × (number of platforms of class P_i owned by DM_k)
  External workload ∝ Σ_{i=1}^{m} Σ_{j=1}^{m} (cross activity of platform classes P_i and P_j) × (number of platforms of class P_i owned by DM_k) × (number of platforms of class P_j not owned by DM_k)

Integer optimization problem:
  Objective: minimize overall workload
  Subject to:
  1) No DM may exceed his workload constraint
  2) Each platform must be assigned to a DM
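The integer program can be illustrated with a brute-force search on a tiny made-up instance; the activity and cross-activity numbers, the threshold, and the platform list below are hypothetical (the paper solves the full-size problem with mixed-integer nonlinear programming, not enumeration).

```python
from itertools import product

# Hypothetical instance: 5 platform units drawn from 3 classes, 2 DMs
PLATFORMS = ["P1", "P1", "P2", "P2", "P3"]
ACTIVITY = {"P1": 1.0, "P2": 1.1, "P3": 0.3}     # per-class activity
CROSS = {("P1", "P2"): 0.7, ("P2", "P1"): 0.7,   # cross activity between classes
         ("P1", "P3"): 0.2, ("P3", "P1"): 0.2,
         ("P2", "P3"): 0.25, ("P3", "P2"): 0.25}
N_DMS, THRESHOLD = 2, 6.0

def workload(owned, others):
    """WL_k = internal (own platforms' activity) + external (cross
    activity between owned and not-owned platforms), per the slide."""
    internal = sum(ACTIVITY[c] for c in owned)
    external = sum(CROSS.get((i, j), 0.0) for i in owned for j in others)
    return internal + external

def best_assignment():
    """Enumerate every platform-to-DM assignment, keep the feasible one
    (all DMs under THRESHOLD) with the smallest overall workload."""
    best, best_cost = None, float("inf")
    for assign in product(range(N_DMS), repeat=len(PLATFORMS)):
        loads = []
        for k in range(N_DMS):
            owned = [p for p, d in zip(PLATFORMS, assign) if d == k]
            others = [p for p, d in zip(PLATFORMS, assign) if d != k]
            loads.append(workload(owned, others))
        if max(loads) <= THRESHOLD and sum(loads) < best_cost:
            best, best_cost = assign, sum(loads)
    return best, best_cost
```

On this instance the minimizer puts all platforms with one DM (total internal activity 4.5, no cross terms), which is exactly why the real problem also needs the per-DM workload constraint: with a tighter threshold, platforms must be split and the cross-activity terms make the objective nonlinear in the assignment.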
16
Illustrative Example - Revisit

Near-optimal strategy → statistics of platform utilization → mixed-integer nonlinear programming algorithms → robust organizational structure:
  DM1: P1 + 3P2 + P3 + P4
  DM2: P1 + P2 + P3 + P4
  DM3: P1 + P2 + P3

Expected platform coordination (platform classes P1 (3 units), P2 (5), P3 (3), P4 (2)):
        P1     P2     P3     P4
  P1    1.5    0.75   0.25   1.00
  P2    0.75   1.916  0.25   1.58
  P3    0.25   0.25   0.33   0.67
  P4    1.00   1.58   0.67   2.67

Expected platform utilization: 1, 1.08, 0.33, 1.34
17
Summary

- Proposed a methodology for designing robust organizations for dynamic and stochastic environments
- Modeled the mission environment as a finite-state Markov Decision Process
- Applied Monte Carlo control methods to obtain a near-optimal action strategy
- Utilized mixed-integer optimization techniques to design an organizational structure congruent with the strategy
18
Future Work

Modeling parameters: incorporate more realistic mission environments into the MDP model
- Task locations
- Platform locations, velocities

Space reduction in learning:
- Generalization (function approximation)
- Abstraction (factored representation)

Organizational design: include additional organizational structure elements in the design process
- Command structure
- Information flow structure
19
Thank You