Effects-Based Design of Robust Organizations
Student Paper Submission (Suggested Track: Modeling and Simulation)

Sui Ruan(1), E-mail: [email protected]
Candra Meirina(1), E-mail: [email protected]
Haiying Tu(1), E-mail: [email protected]
Feili Yu(1), E-mail: [email protected]
Krishna R. Pattipati(1,2)

(1) University of Connecticut, Dept. of Electrical and Computer Engineering, 371 Fairfield Road, Unit 1157, Storrs, CT 06269-1157
Phone: 860-486-2890; Fax: 860-486-5585; E-mail: [email protected]

*This work was supported by the Office of Naval Research under contract #N00014-00-1-0101.
(1) Electrical and Computer Engineering Department, University of Connecticut, Storrs, CT 06269-1157, USA.
(2) Correspondence: [email protected]
Effects-Based Design of Robust Organizations
Sui Ruan, Candra Meirina, Haiying Tu, Feili Yu, and Krishna R. Pattipati

Abstract - Effects-based design of robust organizations seeks to synthesize an organizational structure and its strategies (resource allocation, task scheduling, and coordination) to achieve the desired effect(s) in a dynamically changing mission environment. In this paper, we model the dynamic system associated with the mission environment (e.g., the environment faced by a joint task force with a military objective [8], or the competitive environment confronted by a consumer electronics company striving to increase its market share) as a finite-state Markov Decision Process (MDP) [1][2][7]. Using this model, we determine a near-optimal action strategy, which specifies the action to take in each state of the MDP, via a Monte Carlo control method. The action strategy determines a range of possible missions the organization may face. The range of missions and the platform utilization measures, in turn, are used to synthesize a robust organizational structure.

Keywords: Organizational Design, Markov Decision Processes, Reinforcement Learning, Monte Carlo Control Method.
I. INTRODUCTION
A. Motivation
Market environments and battlespaces are dynamic and uncertain. Organizations seeking to achieve the desired effects in such environments are confronted with the following: 1) parts of the environment cannot be controlled directly; 2) various exogenous events may impact the state of the environment; 3) the interactions between an organization's potential actions and the dynamics of the environment may result in consequences that cannot be predicted a priori with certainty. Consequently, organizations need to plan for potential contingencies, and to be flexible and adaptable. In other words, they need to be robust learning organizations. In this paper, we apply concepts from Markov Decision Processes (MDP), reinforcement learning, the Monte Carlo control method, and mixed-integer optimization, as in [1]-[5] and [16], to prescribe an optimal decision strategy and the concomitant organization structure to achieve desired effects in an uncertain mission environment.

(a) Electrical and Computer Engineering Department, University of Connecticut, Storrs, CT 06269-1157, USA. E-mail: [sruan, meirina, yu02001, krishna]@engr.uconn.edu, [email protected]. This work is supported by the Office of Naval Research under contract #N00014-00-1-0101.

Over the years, research in organizational decision-making has demonstrated that a strong functional dependency exists between the specific structure of the mission environment and the resulting optimal organizational structure and its decision strategy. That is, the optimality of an organization design depends on the mission environment and the organizational constraints [15]. Such organizations, termed congruent, are expected to perform well. Incongruence with the environmental dynamics hinders the organization's ability to achieve the desired effects, results in higher costs, or leads to adverse effects.

Agility is arguably one of the most important characteristics of successful information-age organizations [8]. Agile organizations do not just happen. They are the result of an organizational structure, command and control approach, concepts of operation, supporting systems, and personnel that have a synergistic mix of the right characteristics. An agile organization is a synergistic combination of the following six attributes: robustness, resilience, responsiveness, flexibility, innovation, and adaptation [8]. Robustness is the ability to retain a level of effectiveness across a range of missions that span the spectrum of conflicts, operating environments, and/or circumstances. An organization designed to cope with dynamic and stochastic mission environments is said to be robust in the sense that it can cope with a range of contingencies.
B. Related Research
Over the past several years, mathematical and computational models of organizations have attracted a great deal of interest in various fields of scientific research [15]. Many research efforts have focused on the problem of quantifying the structural (mis)match between organizations and their tasks. When modeling a complex mission and designing the corresponding organization, the variety of mission dimensions (e.g., functional, geographical, terrain), together with the required depth of model granularity, determines the complexity of the design process [14]. Model-driven synthesis of optimized organizations for a specific (deterministic) mission environment has been widely studied by the operations research community [12][13].

On the other hand, planning and machine learning in uncertain and dynamic environments have been studied extensively by the control and artificial intelligence communities. In this vein, reinforcement learning, a computational approach to understanding and automating goal-directed learning and decision-making [1]-[4], has become a dominant approach for dealing with stochastic planning problems.
C. Scope and Organization of the Paper

In Section II, we provide an overview of our organization design methodology based on MDP, Monte Carlo control, reinforcement learning, and mixed-integer optimization techniques. In Section III, we formulate the dynamic environment as a finite-state MDP and formalize the objectives of the organizational design problem. In Section IV, we apply the Monte Carlo control method to prescribe near-optimal action strategies for the MDP model. Section V provides an integer programming formulation of the congruent organization design problem. Simulation results are presented in Section VI. Finally, the paper concludes with a summary and future research directions in Section VII.
II. ORGANIZATIONAL DESIGN METHODOLOGY
The methodology applied in this paper is shown in Fig. 1. In this study, we consider a dynamic military mission environment and the design of a robust organization for accomplishing the mission. The mission is represented via tasks and platforms (assets). Three types of tasks are considered in this paper, viz., mission tasks, time-critical tasks, and "mosquitoes" (trivial tasks). Mission tasks are those that must be executed, are known in advance, and typically have precedence restrictions in the form of a dependency graph [9][10]. Time-critical tasks are those whose occurrence is uncertain, are time sensitive, and may have substantial impact on the mission. Mosquitoes are relatively trivial tasks whose occurrence is highly uncertain, but which have insignificant impact on the mission. Each task is characterized by a set of resource requirements [12]. A platform (asset) represents a physical entity of an organization that provides resource capabilities used to process tasks. Each platform has a specific resource capability. Successful task execution requires that the task's resource requirements be met by the overall resource capabilities of the platforms allocated to that task.
We model the dynamic environment as a finite-state Markov Decision Process (MDP) [3][4][7]. The MDP is characterized by its state and action sets and by the transition probability matrix. Given any state-action pair (s, a), the probability that the next state is s' is given by

$\mathcal{P}^{a}_{ss'} = \Pr\{ s_{t+1} = s' \mid s_t = s, a_t = a \} \qquad (1)$
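Eq. (1) can be made concrete by sampling transitions from a tabulated transition probability matrix. The sketch below is a minimal illustration with a hypothetical dictionary-based table; the states, actions, and probabilities are illustrative, not taken from the paper's scenario:

```python
import random

# Hypothetical transition table: P[(s, a)] maps each next state s' to
# Pr{s_{t+1} = s' | s_t = s, a_t = a}, as in Eq. (1).
P = {
    ("s0", "a1"): {"s1": 0.7, "s2": 0.3},
    ("s1", "a1"): {"s1": 0.1, "s2": 0.9},
}

def sample_next_state(s, a, rng=random):
    """Draw s' with probability P^a_{ss'}."""
    dist = P[(s, a)]
    states, probs = zip(*dist.items())
    return rng.choices(states, weights=probs, k=1)[0]

# Empirical frequencies over many draws approximate the specified
# probabilities (0.7 and 0.3 for s1 and s2 from (s0, a1)).
random.seed(0)
counts = {"s1": 0, "s2": 0}
for _ in range(10000):
    counts[sample_next_state("s0", "a1")] += 1
print(counts)
```

In the Monte Carlo setting described below, this table is exactly what the learner does *not* have; it only observes sampled transitions like the ones this generator produces.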
We assume that the structure of the MDP model of the environment is known, but that the environmental parameters (e.g., the transition probabilities) are unknown. Since the model parameters are unknown, they need to be estimated ("learned") [3]; the estimated parameters enable us to find a near-optimal action strategy that optimizes an objective function.
Monte Carlo control methods [3][6] are employed to obtain a near-optimal action strategy. Monte Carlo methods require only experience-based sample sequences of states, actions, and rewards from on-line or simulated interaction with the environment. Learning from on-line experience is striking because it requires no prior knowledge of the environment's dynamics, and yet can still attain optimal behavior asymptotically. The near-optimal strategy from the Monte Carlo control method enables us to compute the platform (resource) utilization measures of that strategy. These measures are used to synthesize an organization that implements the near-optimal action strategy.
III. MDP FORMULATION AND ORGANIZATION DESIGN OBJECTIVES

The dynamic stochastic mission environment consists of:
[Fig. 1. Robust Organization Design Process: MDP Modeling Technique → MDP Model of Mission; Monte Carlo Control Method → Near-Optimal Action Strategy; Mixed-Integer Optimization → Robust Organization; inputs: Mission & Organization Constraints]
• Effect set: M = {m1, m2, ..., mn}, the desired effects, with some serving as the end goals. There can be dependencies among the effects.
• Exogenous event set: E = {e1, e2, ..., ef}, which represents uncontrollable random events in the environment.
• Action set: A = {a1, a2, ..., ak}, which denotes controllable influences used to achieve the desired effects and minimize the adverse effects of exogenous events. Each action a has an associated cost c(a).

When applying this modeling approach to a C2 mission environment [9][10], the mission tasks correspond to the effects, and the time-critical tasks and "mosquito" tasks correspond to exogenous events. The actions correspond to the asset allocations used to achieve the desired effects and suppress the effects of exogenous events.
An organization is a team of decision makers (DMs), ORG = {DM_1, DM_2, ..., DM_D}, i.e., the personnel or automated systems that supervise the actions in the system. The DMs coordinate their information, resources, and activities in order to achieve their common goal in the dynamic and uncertain mission environment [12]. The constraint imposed on each DM is in the form of its limited resource-handling capability; in this paper, we represent the resource-handling capability of each DM as a workload threshold. A DM's workload is a combination of internal workload and external workload [12][13].

The dynamic mission environment is modeled as a finite-state Markov Decision Process (MDP). The main sub-elements of the MDP follow the mission characteristics cited earlier. They are as follows:
• States: S = {s1, s2, ..., sz}
  - Each state represents the status of the effects and exogenous events: s_i = (M_i, E_i), where M_i ⊆ M denotes the achieved effects and E_i ⊆ E denotes the existing, but unmitigated, exogenous events.
  - The initial state is s_b = (∅, ∅), where no effect has been achieved and no exogenous event has appeared yet; the terminal states, S_e ⊆ S, represent the attainment of the desired end effects.
  - The state has the Markov property; that is, the next state depends solely on the previous state and action, as in Eq. (1).
• Actions: A = {a1, a2, ..., ak}.
• Reward mechanism:
  - Reward: when a desired end state is reached, a fixed reward r(s_e) > 0 is accrued by the organization.
  - Penalty: when undesirable end effects are reached, a penalty r(s_h) < 0 is imposed on the organization.
  - Action cost: whenever an action a_i ∈ A is pursued, a cost c(a_i) is incurred.
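The state, reward, and cost definitions above can be sketched as simple data structures. In the fragment below, the terminal reward and penalty echo the simulation model of Section VI (5,000 and -3,000 units), while the effect/event names and action costs are hypothetical placeholders:

```python
from typing import FrozenSet, Tuple

# A state pairs the achieved effects M_i with the unmitigated exogenous
# events E_i, as defined in Section III (labels here are illustrative).
State = Tuple[FrozenSet[str], FrozenSet[str]]

initial_state: State = (frozenset(), frozenset())   # s_b = (empty, empty)
terminal_state: State = (frozenset({"M1", "M2", "M3"}), frozenset())

REWARD_WIN = 5000       # r(s_e) > 0, from the simulation model in Sec. VI
PENALTY_LOSE = -3000    # r(s_h) < 0
ACTION_COST = {"a1": 100, "a2": 80}   # c(a_i), illustrative values

def net_reward(end_state: State, actions_taken) -> float:
    """Net reward of an episode: terminal reward or penalty minus the
    accumulated action costs c(a_i)."""
    effects, _events = end_state
    terminal = REWARD_WIN if effects == {"M1", "M2", "M3"} else PENALTY_LOSE
    return terminal - sum(ACTION_COST[a] for a in actions_taken)

# e.g. winning after taking a1 twice and a2 once:
print(net_reward(terminal_state, ["a1", "a1", "a2"]))  # 4720
```

The Monte Carlo control method of Section IV optimizes exactly this kind of episode-level net reward, observed at termination rather than predicted from a model.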
The objectives of the design problem are to:

1) Find an optimal closed-loop action strategy, viz., a mapping from states to actions, that maximizes the expected net reward.
2) Design an organization, i.e., an allocation of platforms to decision makers, such that the overall workload of the organization is minimized, subject to constraints on each DM's workload capability.

The MDP problem posed above is a stochastic optimal control problem [1][2]. Dynamic programming (DP) [2] is widely used to characterize the optimal solution. However, DP suffers from "the curse of dimensionality": its computational requirements grow exponentially with the number of state variables. In addition, DP needs the complete parameters of the model, whereas in real-world applications the transition probabilities are rarely known. Instead, we can gain knowledge of the system from sampled data and train the strategy on it. We propose Monte Carlo control methods [3][6] to obtain a near-optimal strategy for the MDP model without complete knowledge of the MDP parameters (i.e., when the transition probabilities are not known). The Monte Carlo method requires only that the MDP model generate sample state transitions, not the complete probability distributions over all possible state transitions; it learns these quantities from an on-line simulation. This, in turn, enables us to obtain an optimal strategy [3] without prior knowledge of the environment's dynamics. An agent-based simulator, such as the one in [11], can provide a vehicle to operationalize the various mission episodes utilized in the Monte Carlo method.

Platform (resource) utilization measures of the near-optimal strategy are employed to design an organization that is congruent with the mission environment. In this paper, the expected total workload of the organization is minimized. An organization tuned to such a strategy is robust in the sense that it covers the range of missions that are most likely to materialize in the dynamic environment.
IV. MONTE CARLO METHOD FOR NEAR-OPTIMAL ACTION STRATEGY

Monte Carlo methods solve the reinforcement learning problem by averaging sample returns [3][6]. In Monte Carlo methods, we assume that the experience is divided into episodes, and that all episodes eventually terminate regardless of which actions are selected. Only upon the completion of an episode are the value estimates (estimates of the expected returns of states and state-action pairs) and policies (strategies, i.e., mappings from states to actions) changed. Monte Carlo methods are thus incremental in an episode-by-episode sense [3][6], but not in a step-by-step sense. The term "Monte Carlo" is often used more broadly for any estimation method whose operation involves a significant random component; here, we use it specifically for methods based on averaging complete returns.

Monte Carlo methods can be applied to evaluate the state values of any given strategy, viz., a mapping of states to actions, by averaging the returns over sample episodes. Starting from Monte Carlo policy evaluation, it is natural to alternate between evaluation and improvement on an episode-by-episode basis: after each episode, the observed returns are used for policy evaluation, and the policy is then improved at all the states visited in the episode [3]. In addition, Monte Carlo methods are particularly attractive when one requires the values of only a subset of the states: one can generate many sample episodes starting from these states, averaging returns only from them and ignoring all others.
The complete Monte Carlo control algorithm, which combines Exploring Starts (ES) and ε-greedy exploration, is given in Fig. 2. The ES technique starts each episode by randomly choosing the initial state. ε-greedy selection means that most of the time the action with the maximal estimated action value is chosen, but with probability ε an action is selected at random; that is, each non-greedy action is given the minimal selection probability ε/|A(s)|, and the remaining bulk of the probability, 1 − ε + ε/|A(s)|, is given to the greedy action [3], where |A(s)| is the cardinality of the action set A(s) in state s. This enables the Monte Carlo control method to escape local minima.
Monte Carlo Control Algorithm (Monte Carlo ES and ε-greedy combined)

Initialize: ∀ s_i ∈ S, a ∈ A(s_i):
• Q(s_i, a) (value function of s_i and a) ← arbitrary positive real number
• π(s_i) (state-action mapping function of s_i) ← arbitrary action
• Returns(s_i, a) ← empty list

Repeat:
1) Generate an episode, ρ, using exploring starts [3] and π.
2) For each pair (s_i, a) appearing in episode ρ:
   • R ← return following the first occurrence of (s_i, a)
   • Append R to Returns(s_i, a)
   • Q(s_i, a) ← average(Returns(s_i, a))
3) For each s_i in the episode:
   π(s_i) ← arg max_a Q(s_i, a) with probability 1 − ε
   π(s_i) ← rand(A(s_i)) with probability ε
   V(s_i) ← max_a Q(s_i, a)
4) ε → 0
Until V(s_i), ∀ s_i ∈ S, converge, i.e., Σ_{s_i ∈ S} |(V(s_i) − V'(s_i)) / V(s_i)| ≤ δ

Fig. 2. Monte Carlo control method with Exploring Starts and ε-greedy
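A compact, runnable sketch of the control loop of Fig. 2 is given below. The two-state toy MDP (step(), STATES, ACTIONS) is a hypothetical stand-in for the mission simulator, not the paper's scenario, and the decaying schedule ε_i = 2/(2+i) follows the form suggested in the text:

```python
import random
from collections import defaultdict

# Toy episodic MDP standing in for the mission environment.
STATES = ["s0", "s1"]
ACTIONS = ["a1", "a2"]
TERMINAL = "end"

def step(s, a, rng):
    """Toy dynamics: a1 advances s0 -> s1; a2 from s1 ends with reward 10."""
    if s == "s0":
        return ("s1", 1.0) if a == "a1" else ("s0", -1.0)
    return (TERMINAL, 10.0) if a == "a2" else ("s0", -3.0)

def generate_episode(pi, eps, rng):
    """Exploring start on (s, a), then epsilon-greedy behavior; returns the
    trajectory as a list of (state, action, reward) triples."""
    s, a = rng.choice(STATES), rng.choice(ACTIONS)
    traj = []
    while s != TERMINAL and len(traj) < 50:      # cap guarantees termination
        s_next, r = step(s, a, rng)
        traj.append((s, a, r))
        s = s_next
        if s != TERMINAL:
            a = pi[s] if rng.random() > eps else rng.choice(ACTIONS)
    return traj

def mc_control(episodes=2000, seed=0):
    rng = random.Random(seed)
    Q = defaultdict(float)
    returns = defaultdict(list)
    pi = {s: rng.choice(ACTIONS) for s in STATES}
    for i in range(episodes):
        eps = 2.0 / (2 + i)                      # eps_i -> 0, sum eps_i = inf
        traj = generate_episode(pi, eps, rng)
        seen = set()
        for t, (s, a, _) in enumerate(traj):     # first-visit updates
            if (s, a) in seen:
                continue
            seen.add((s, a))
            G = sum(r for (_, _, r) in traj[t:])  # return after first visit
            returns[(s, a)].append(G)
            Q[(s, a)] = sum(returns[(s, a)]) / len(returns[(s, a)])
        for s in STATES:                          # greedy policy improvement
            pi[s] = max(ACTIONS, key=lambda act: Q[(s, act)])
    return pi, Q

pi, Q = mc_control()
print(pi)
```

The first-visit averaging and the episode-wise policy improvement mirror steps 2) and 3) of Fig. 2; with these toy dynamics, the learned strategy settles on driving the system to the rewarding terminal state.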
Note that applying a simple Monte Carlo control method based only on exploring starts [3] is not satisfactory: some states are rarely visited, while other states form cycles. An ε-greedy [3] method is therefore often combined with the ES method. A fixed ε is not efficient: a large ε yields slow convergence and tends to oscillate, while with a small ε it is difficult to eliminate possible state cycles. Typically, one chooses ε_i → 0 such that Σ_i ε_i → ∞. For example, with ε_i = ε/(N + i), N ∈ (2, 6), the results are satisfactory.
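The two conditions on the schedule (decay to zero, divergent partial sums) can be checked numerically; in this small sketch, eps0 = 1 and N = 2 are illustrative choices within the suggested range:

```python
eps0, N = 1.0, 2          # illustrative choices; the text suggests N in (2, 6)

def eps(i: int) -> float:
    """Decaying exploration rate: eps_i = eps0 / (N + i)."""
    return eps0 / (N + i)

# eps_i -> 0 as i grows ...
print(eps(0), eps(10_000))

# ... yet the partial sums diverge (a harmonic series, growing like log n),
# so every state-action pair keeps a nonzero chance of being explored.
for n in (10**2, 10**4, 10**6):
    print(n, sum(eps(i) for i in range(n)))
```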
In the Monte Carlo control algorithm, all the returns for each state-action pair are accumulated and averaged, irrespective of which policy was in force and when they were observed. Convergence of the Monte Carlo control method is expected because the changes to the action-value function decrease over time; however, a formal proof has not yet appeared, and this remains one of the most fundamental open questions in reinforcement learning [3][4].
V. ORGANIZATION DESIGN
In a C2 environment [10], each mission task can be modeled as a desired effect, while time-critical tasks and "mosquito" tasks can be viewed as exogenous events. A task is an activity that entails the use of relevant resources (provided by the organization's platforms) and is carried out by an individual DM or a group of DMs to accomplish the mission objectives [12]. Each task T_i (i = 1, ..., N) has resource requirements specified by the row vector [R_i1, R_i2, ..., R_iL]. A platform is a physical asset of an organization that provides resource capabilities and is used to process tasks. Each platform belongs to a unique platform class. For each platform class P_m (m = 1, ..., K), we define its resource capability vector [r_m1, ..., r_mL], where r_ml specifies the number of units of resource type l available on a platform of class P_m [12]. We assume that the number of platform classes and the number of available platforms in each class are known. In the MDP model, each action a_i is associated with a set of platforms P(a_i) = [x_i1, x_i2, ..., x_im], specifying the numbers of platforms of each class needed to pursue the action. The workload of each decision maker in the organization embodies the activity of supervising its platforms and coordinating them with other decision makers' platforms, so that actions are pursued collaboratively to achieve the desired effects and mitigate the effects of exogenous events.
Starting from the near-optimal strategy, we can design an organization structure that is congruent with this strategy. In this paper, we study only the ownership of each platform of every platform class, chosen so that the expected total workload of the organization is minimized while each decision maker's workload constraint is satisfied.
Let [n_1, ..., n_m] be the available numbers of platforms for the platform classes [P_1, ..., P_m], and let [x_i1, ..., x_im] be the numbers of platforms of each platform class used by action a_i. Define [X_1, ..., X_m] as the numbers of platforms used by any action. Since the occurrence of actions in the optimal strategy is probabilistic, so is [X_1, ..., X_m].

Let y_ki be the number of platforms of class P_i allocated to DM_k. For the organization ORG = (DM_1, ..., DM_D), the workload WL_k of DM_k, k ∈ {1, ..., D}, is approximated as:

$WL_k = \alpha \sum_{i=1}^{m} \frac{y_{ki}}{n_i} E(X_i) + \beta \sum_{i=1}^{m} \sum_{j=1}^{m} \frac{y_{ki}(n_j - y_{kj})}{n_i n_j} E(X_i X_j), \quad \forall k = 1, ..., D \qquad (2)$
where $\alpha \sum_{i=1}^{m} \frac{y_{ki}}{n_i} E(X_i)$ quantifies the internal workload of DM_k, which accounts for the supervision of the platforms that DM_k owns. The internal workload of DM_k incurred by platform class P_i is proportional to:
- the expected number of platforms of class P_i, i.e., E(X_i);
- the proportion of platforms of class P_i that DM_k owns, i.e., y_ki/n_i.

That is, the larger the value of E(X_i), the higher the workload imposed by platform class P_i; and the larger the proportion y_ki/n_i of platforms of class P_i owned by decision maker DM_k, the higher the workload from class P_i on DM_k. The term $\beta \sum_{i=1}^{m} \sum_{j=1}^{m} \frac{y_{ki}(n_j - y_{kj})}{n_i n_j} E(X_i X_j)$ quantifies the external workload, i.e., the coordination effort of DM_k in cooperating with the other DMs in the organization to execute tasks. The external workload on DM_k incurred by platform classes P_i and P_j is proportional to:
- the joint expectation of the numbers of platforms of classes P_i and P_j, i.e., E(X_i X_j);
- the proportion of platforms of class P_i that DM_k owns, i.e., y_ki/n_i;
- the proportion of platforms of class P_j that DM_k does not own, i.e., (n_j − y_kj)/n_j.

Here, the higher-order expectations of the joint numbers of platform classes, e.g., the third-order expectation E(X_i X_j X_w), i, j, w ∈ {1, ..., m}, are ignored. User-specified constants α and β define the weights on the internal and external workloads.
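Eq. (2) can be evaluated directly from the sample moments. In the sketch below, the platform counts n and the moment estimates (which echo Table IV) are used to score a hypothetical allocation y_k; all numeric values are illustrative inputs, not outputs of the paper's optimization:

```python
alpha, beta = 1.0, 1.0
n = [3, 5, 3, 2]                        # n_i: available platforms per class
EX = [1.2, 2.3, 3.1, 2.0]               # sample means approximating E(X_i)
EXX = [[1.5, 0.75, 0.25, 1.0],          # sample means approximating E(X_i X_j)
       [0.75, 1.916, 0.25, 1.58],
       [0.25, 0.25, 0.333, 0.667],
       [1.0, 1.583, 0.667, 2.667]]

def workload(y_k):
    """WL_k of Eq. (2): internal (supervision) plus external (coordination)."""
    m = len(n)
    internal = alpha * sum(y_k[i] / n[i] * EX[i] for i in range(m))
    external = beta * sum(
        y_k[i] * (n[j] - y_k[j]) / (n[i] * n[j]) * EXX[i][j]
        for i in range(m) for j in range(m))
    return internal + external

print(workload([1, 3, 1, 1]))   # score one hypothetical allocation
```

Two sanity checks follow from the formula: an empty allocation has zero workload, and a DM owning every platform has zero coordination term, leaving only the internal workload α Σ E(X_i).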
Thus, the total workload of all DMs is:

$WL_{all} = \alpha \sum_{i=1}^{m} E(X_i) + \beta \sum_{i=1}^{m} \sum_{j=1}^{m} \left(1 - \frac{\sum_{k=1}^{D} y_{ki} y_{kj}}{n_i n_j}\right) E(X_i X_j) \qquad (3)$
The optimization problem associated with organizational design is as follows:

objective:
$\min WL_{all} = \alpha \sum_{i=1}^{m} E(X_i) + \beta \sum_{i=1}^{m} \sum_{j=1}^{m} \left(1 - \frac{\sum_{k=1}^{D} y_{ki} y_{kj}}{n_i n_j}\right) E(X_i X_j) \qquad (4)$

subject to:
$WL_k = \alpha \sum_{i=1}^{m} \frac{y_{ki}}{n_i} E(X_i) + \beta \sum_{i=1}^{m} \sum_{j=1}^{m} \frac{y_{ki}(n_j - y_{kj})}{n_i n_j} E(X_i X_j) \le C_k, \quad \forall k = 1, ..., D$
$\sum_{k=1}^{D} y_{ki} = n_i, \quad \forall i = 1, ..., m$
$y_{ki} \in I^{+}, \quad \forall k = 1, ..., D, \; \forall i = 1, ..., m \qquad (5)$
Note that E(X_i) is the expected number of platforms of class P_i, and E(X_i X_j) is the joint expectation of the numbers of platforms of classes P_i and P_j. These expectations are approximated by their sample means:

$\bar{X}_i \approx E(X_i), \qquad \overline{X_i X_j} \approx E(X_i X_j), \quad \forall i, j \in \{1, ..., m\} \qquad (6)$
The problem defined in Eq. (5) is, in general, a nonlinear constrained integer optimization problem, which is NP-hard. We applied the MINLP (Mixed-Integer Nonlinear Programming) algorithms in the TOMLAB™ [17] optimization software to solve the above integer programming problem; specifically, the solver package minlpBB, for sparse and dense mixed-integer linear, quadratic, and nonlinear programming, was employed.
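The paper's MINLP solver (TOMLAB's minlpBB) is a commercial MATLAB package, so the sketch below instead enumerates all feasible allocations of a tiny hypothetical instance of Eqs. (3)-(5). It is only a small-scale stand-in showing the structure of the search (objective, capacity constraints, and the coupling Σ_k y_ki = n_i), not the solver actually used:

```python
import itertools

# Tiny hypothetical instance: m = 2 platform classes, D = 2 DMs.
alpha, beta = 1.0, 1.0
n = [2, 2]                      # n_i: available platforms per class
EX = [1.0, 1.5]                 # illustrative E(X_i)
EXX = [[1.2, 0.8],
       [0.8, 2.0]]              # illustrative E(X_i X_j)
C = [3.0, 3.0]                  # per-DM workload caps C_k

def wl(y_k):
    """WL_k of Eq. (2) for one DM's allocation vector y_k."""
    m = len(n)
    internal = alpha * sum(y_k[i] / n[i] * EX[i] for i in range(m))
    external = beta * sum(
        y_k[i] * (n[j] - y_k[j]) / (n[i] * n[j]) * EXX[i][j]
        for i in range(m) for j in range(m))
    return internal + external

best_total, best_y = None, None
# Enumerate DM1's allocation; DM2 takes the remainder, so sum_k y_ki = n_i
# holds by construction (the equality constraint of Eq. (5)).
for y1 in itertools.product(range(n[0] + 1), range(n[1] + 1)):
    y2 = tuple(n[i] - y1[i] for i in range(len(n)))
    if all(wl(y) <= C[k] for k, y in enumerate((y1, y2))):
        total = wl(y1) + wl(y2)
        if best_total is None or total < best_total:
            best_total, best_y = total, (y1, y2)
print(best_y, best_total)
```

On this instance the caps are loose enough that giving all platforms to one DM is feasible, which drives the coordination term of Eq. (3) to zero; with tighter caps C_k, the minimizer is forced to split ownership and pay coordination workload, which is the trade-off the full MINLP resolves.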
VI. SIMULATION
A. Simulation Model

We illustrate our approach to effects-based robust organization design using a simple joint-task-force scenario that can be operationalized in the distributed dynamic decision-making (DDD-III) war-gaming simulator [9]. In the example system, there are four platform classes, P1-P4, as defined in Table I; three mission tasks (desired effects), M1-M3; and three time-critical tasks (exogenous events), T1-T3, as defined in Table II. The dependencies among the mission tasks are shown in Fig. 3.

The following three types of resources are considered:
- ASUW - Anti-SUbmarine Warfare
- STK - STriKe warfare
- SOF - Special/ground Operations
Rules of the simulation model, which are unknown to the learning agent, are as follows:
- All mission tasks must be executed while satisfying the dependency constraints;
- All time-critical tasks, if they appear, have to be completed before the final mission task is completed;
- Each resource contributes to task accuracy: the more resources allocated to a task, the higher the probability of completing it;
- If the final mission task is achieved, the game is won and a reward of 5,000 units is earned;
- If three time-critical tasks exist in the system simultaneously, the game is lost and a penalty of 3,000 units is incurred.
B. Monte Carlo Simulations

Using the Monte Carlo control method shown in Fig. 2, we obtain a near-optimal (stable) action strategy via 85,000 runs (episodes). The near-optimal action strategy, i.e., the mapping of states to actions (viz., combinations of platforms), and the values of the states are listed in Table III. A comparison of the net rewards of the near-optimal strategy obtained by the Monte Carlo control method and of a randomized greedy strategy over 2,000 runs is shown in Fig. 4. The greedy strategy chooses randomly, at each state, one of the five most economically feasible platform combinations. After 10,000 runs (sample episodes), the average net reward of the near-optimal strategy is 2985, while that of the greedy strategy is only 2127. Thus, the near-optimal strategy provides a 40% better reward than the greedy strategy.

TABLE I
PLATFORMS
Platform | Name | Number | ASUW | STK | SOF | Cost
P1 | F18s | 3 | 0 | 2 | 0 | 100
P2 | FAB  | 5 | 1 | 0 | 0 | 80
P3 | FOB  | 3 | 1 | 1 | 1 | 160
P4 | SOF  | 2 | 0 | 0 | 1 | 60
Reward (Win): 5000; Penalty (Lose): -3000

TABLE II
MISSIONS AND TIME-CRITICAL TASKS
Task | Name | ASUW | STK | SOF
M1 | NB - Naval Base | 2 | 0 | 2
M2 | AB - Air Base | 0 | 6 | 0
M3 | PRT - Sea Port (final goal) | 2 | 2 | 0
T1 | SCUD - missile launcher | 1 | 1 | 0
T2 | PH - Hostile ship | 1 | 0 | 1
T3 | TSK - complex ground task | 0 | 1 | 1
C. Organization Design Results

The platform utilization statistics, i.e., the sample means of the numbers of platforms of each class,
[Fig. 3. Effect Dependency: mission tasks M1 (Naval Base) and M2 (Air Base) precede the final task M3 (Sea Port)]
[Fig. 4. Comparison of Net Reward Distributions: histograms of net reward (range -8000 to 4000) for the greedy strategy and the near-optimal strategy]
TABLE III
NEAR-OPTIMAL STRATEGY
State | Action (platforms: P1 P2 P3 P4) | State Value
∅ | 1 1 1 3 | 2777.72
M1 | 3 0 0 0 | 3008.48
M2 | 0 2 0 2 | 3565.36
T1 | 1 2 0 1 | 2115.63
T2 | 0 2 0 2 | 2174.47
T3 | 1 0 0 1 | 2291.07
T1, T2 | 1 1 0 0 | 1796.93
M1, T1 | 3 0 0 0 | 2270.26
M1, T2 | 0 2 1 2 | 2278.97
M1, T3 | 3 0 0 0 | 2100.65
M2, T1 | 0 2 0 2 | 2407.18
· · · | · · · | · · ·
M1, T1, T2 | 1 0 1 1 | 1693.73
M1, M2 | 1 2 0 0 | 3221.11
M1, M2, T1 | 1 2 0 1 | 2358.82
· · · | · · · | · · ·
$\bar{X}_i$, i ∈ {1, ..., m}, and the sample means of the joint numbers of platform-class pairs, $\overline{X_i X_j}$, i, j ∈ {1, ..., m}, are listed in Table IV. Using the MINLP algorithms in TOMLAB™, we obtain the congruent organization for the near-optimal strategy, shown in Table V.
VII. CONCLUSION AND FUTURE WORK
In this paper, we proposed a methodology for designing a robust organization for stochastic and dynamic environments. The dynamic environment is modeled as a finite-state Markov Decision Process, and a near-optimal action strategy is obtained using Monte Carlo control methods. An
TABLE IV
PLATFORM UTILIZATION STATISTICS
$\bar{X}_i$: P1 = 1.2, P2 = 2.3, P3 = 3.1, P4 = 2

$\overline{X_i X_j}$ | P1 | P2 | P3 | P4
P1 | 1.5 | 0.75 | 0.25 | 1.00
P2 | 0.75 | 1.916 | 0.25 | 1.58
P3 | 0.25 | 0.25 | 0.333 | 0.667
P4 | 1.00 | 1.583 | 0.667 | 2.667
TABLE V
DECISION MAKER PLATFORM OWNERSHIP (α = 1, β = 1)
DM | P1 | P2 | P3 | P4 | Expected Workload | Workload Constraint
DM1 | 1 | 3 | 1 | 1 | 5.7065 | 8
DM2 | 1 | 1 | 1 | 1 | 4.9265 | 6
DM3 | 1 | 1 | 1 | 0 | 3.3766 | 6
organization congruent with this strategy is then designed by solving an integer optimization problem.

Simulation results support the conclusion that Monte Carlo control methods are effective in obtaining near-optimal strategies in real-world applications where the system parameters are uncertain. The formulation of the organizational design problem, together with mixed-integer optimization algorithms, provides an effective vehicle for designing organizations that are congruent with their dynamic and uncertain environments.
We are pursuing future research along the following directions:
- Incorporate realistic mission environments into the MDP model;
- Include additional organizational structure elements in the design process, e.g., the command structure and the information flow structure;
- Study the mechanisms of organizational adaptation via agent-based simulations.
REFERENCES

[1] D. P. Bertsekas and S. E. Shreve, Stochastic Optimal Control: The Discrete-Time Case, Athena Scientific, 1996.
[2] D. P. Bertsekas, Dynamic Programming and Optimal Control, Athena Scientific, 2001.
[3] R. S. Sutton and A. G. Barto, Reinforcement Learning: An Introduction, The MIT Press, 1998.
[4] L. P. Kaelbling, M. L. Littman, and A. W. Moore, "Reinforcement learning: A survey," Journal of Artificial Intelligence Research, Vol. 4, pp. 237-285, 1996.
[5] P. R. Kumar and P. Varaiya, Stochastic Systems: Estimation, Identification, and Adaptive Control, Prentice-Hall, Englewood Cliffs, NJ, 1986.
[6] G. Tesauro and G. R. Galperin, "On-line policy improvement using Monte-Carlo search," Advances in Neural Information Processing Systems: Proceedings of the Ninth Conference, 1996.
[7] M. L. Puterman, Markov Decision Processes: Discrete Stochastic Dynamic Programming, Wiley-Interscience, 1994.
[8] D. S. Alberts and R. E. Hayes, Power to the Edge: Command and Control in the Information Age, Information Age Transformation Series, CCRP Publications, 2003.
[9] D. L. Kleinman, P. W. Young, and G. S. Higgins, "The DDD-III: A tool for empirical research in adaptive organizations," Proc. 1996 Command and Control Research and Technology Symposium, NPS, Monterey, CA, June 1996.
[10] D. L. Kleinman, G. M. Levchuk, S. G. Hutchins, and W. G. Kemple, "Scenario design for the empirical testing of organizational congruence," 8th International Command and Control Research and Technology Symposium, Monterey, CA, June 2003.
[11] C. Meirina, G. M. Levchuk, and K. R. Pattipati, "A multi-agent decision framework for DDD-III environment," 8th International Command and Control Research and Technology Symposium, Monterey, CA, June 2003.
[12] G. M. Levchuk, Y. N. Levchuk, J. Luo, K. R. Pattipati, and D. L. Kleinman, "Normative design of organizations - Part I: Mission planning," IEEE Trans. on SMC, Part A: Systems and Humans, Vol. 32, No. 3, pp. 346-359, May 2002.
[13] G. M. Levchuk, Y. N. Levchuk, J. Luo, K. R. Pattipati, and D. L. Kleinman, "Normative design of organizations - Part II: Organizational structures," IEEE Trans. on SMC, Part A: Systems and Humans, Vol. 32, No. 3, pp. 360-375, May 2002.
[14] G. M. Levchuk, Y. Levchuk, J. Luo, F. Tu, and K. R. Pattipati, "A library of optimization algorithms for organizational design," 2000 Command and Control Research and Technology Symposium, Monterey, CA, June 26-28, 2000.
[15] K. M. Carley and Z. Lin, "Organizational design suited to high performance under stress," IEEE Transactions on Systems, Man, and Cybernetics, Vol. 25, pp. 221-231, 1995.
[16] G. L. Nemhauser and L. A. Wolsey, Integer and Combinatorial Optimization, Wiley-Interscience, New York, 1988.
[17] K. Holmstrom, A. O. Goran, and M. M. Edvall, User's Guide for TOMLAB /MINLP, http://tomlab.biz/download/manuals.php.
1
Effects-Based Design of Robust Organizations

Command and Control Research and Technology Symposium
June 16, 2004

Sui Ruan
Candra Meirina
Haiying Tu
Feili Yu
Krishna R. Pattipati

Dept. of Electrical and Computer Engineering, University of Connecticut
Contact: [email protected], (860) 486-2890
2
Outline

Introduction
Modeling Mission Environment
Markov Decision Processes for Mission Environment
Illustrative Example
Monte Carlo Control Method
Robust Organization Design
Summary and Future Work
3
Effects-Based Design of Robust Organizations: Design Problem

Objective: design robust organizational structures and strategies to account for a dynamically changing mission environment

Methodology:
- Mission model: finite-state Markov Decision Process
- Methods:
  - Robust strategies: Monte Carlo control methods
  - Robust structures: mixed-integer nonlinear programming
4
Design Methodology

Robust organization design methodology (flow): mission and organization constraints feed an MDP modeling technique that produces an MDP model of the mission; learning methods then yield a near-optimal strategy; finally, mixed-integer optimization produces the robust organization.
5
Modeling Mission Environment 1/3

Characteristics of dynamic and stochastic environments:
- Parts of the environment cannot be controlled directly
- Various exogenous events may impact the environment
- Consequences of actions cannot be predicted a priori with certainty

Requirements for organizations coping with stochastic environments:
- Plan for potential contingencies
- Remain congruent with the dynamic mission environment
- Be robust
6
Modeling Mission Environment 2/3

Dynamic stochastic mission environment:
- Effects: the desired effects, with some serving as the end goals
- Exogenous events: uncontrollable random events
- Actions: controllable influences to achieve the desired effects and minimize the adverse effects of exogenous events

Organization:
- A team of decision makers (DMs)
- Human or automated systems
- Limited resource-handling capability (workload threshold)
7
Modeling Mission Environment 3/3

Command and control mission environment and organization (figure):
- Mission tasks produce effects; exogenous events include time-critical tasks and "mosquito" tasks
- Task T_m: resource requirement vector [r_{m1}, r_{m2}, ..., r_{mL}]
- Platform (asset) P_i: resource capability vector [R_{i1}, R_{i2}, ..., R_{iL}]
- Organization: ownership of platforms; asset-task allocation actions
8
Markov Decision Process for C2 Mission Environment

States: status of effects and exogenous events, S = {s_1, s_2, ..., s_z}, where s_i = (M_i, E_i), with M_i ⊆ M the achieved effects and E_i ⊆ E the unmitigated exogenous events

Actions: A = {a_1, a_2, ..., a_k}, platform-to-task allocations

Reward mechanism:
- Reward r(s_e) > 0 when the desired end effect is reached
- Penalty r(s_h) < 0 when undesirable end effects are reached
- Cost C(a_i) > 0 when an action is pursued

Transition probability matrix: ℘^a_{ss'} = pr{s_{t+1} = s' | s_t = s, a_t = a}

Optimal action strategy: a mapping from states to actions that maximizes the expected net reward
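The MDP elements above can be sketched in code. This is a toy illustration: the state names, transition probabilities, and the particular cost values are made up for the sketch (only the win reward of 5000 and loss penalty of -3000 echo the example), not the paper's actual model.

```python
import random

# Toy MDP in the slide's form: states capture mission status,
# actions are platform-to-task allocations.
STATES = ["start", "M2_T2", "win", "lose"]
ACTIONS = ["a1", "a2"]

# pr(s' | s, a): transition probabilities (illustrative values only)
P = {
    ("start", "a1"): {"M2_T2": 0.7, "lose": 0.3},
    ("start", "a2"): {"M2_T2": 0.4, "lose": 0.6},
    ("M2_T2", "a1"): {"win": 0.8, "lose": 0.2},
    ("M2_T2", "a2"): {"win": 0.5, "lose": 0.5},
}

END_REWARD = {"win": 5000, "lose": -3000}  # reward / penalty for end effects
COST = {"a1": 100, "a2": 80}               # cost of pursuing an action

def step(state, action):
    """Sample the next state and the immediate net reward."""
    dist = P[(state, action)]
    nxt = random.choices(list(dist), weights=list(dist.values()))[0]
    reward = END_REWARD.get(nxt, 0) - COST[action]
    return nxt, reward
```

An episode is then a sequence of `step` calls until a terminal state (`win` or `lose`) is reached, and the net reward is the sum of the sampled rewards.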
9
Illustrative Example

Platforms (assets):
  P1: F18S, 3 units
  P2: FAB, 5 units
  P3: FOB, 3 units
  P4: SOF, 2 units

Mission tasks and exogenous events (resource requirements over the classes ASUW, SOF, STK):
  M1: Naval Base
  M2: Air Base
  M3: Sea Port (final)
  T1: SCUD missile
  T2: Hostile ship
  T3: TSK, complex group task

Rewards and costs: reward 5000 (win); penalty -3000 (lose); action costs 100, 80, 160, 60

Scenario map (figure): Naval Base (M1), Air Base (M2), Sea Port (M3); SCUD, Hostile Ship, TSK; assets F18S, FAB, FOB, SOF
10
Monte Carlo Control Method – 1/4

Objective: learn the optimal action strategy, i.e., a mapping from states to actions that maximizes the expected net reward

Exploration method for finding a start state: each episode starts from a randomly selected initial state

Epsilon-greedy method: select the best-known action with probability 1 - ε and a random action otherwise, avoiding local optima

Example state-action values:
  State     Action              S-A Value
  S1(T2)    a1 (<2P2+2P4> m1)   1560
  S1(T2)    a2 (<3P1> m2)       2000
  S1(T2)    a3 (<P3> T2)        1320

Figure: scenario map (legend: mission task, time-critical task, "mosquitoes", asset) showing state s1, with action a2 applied to the Air Base (M2); Naval Base (M1), Sea Port (M3), SCUD (T2)
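The epsilon-greedy rule can be sketched as follows; the `Q` table holds the illustrative state-action values from this slide, and the function name is our own.

```python
import random

# State-action values (numbers taken from the slide's example)
Q = {"S1(T2)": {"a1": 1560, "a2": 2000, "a3": 1320}}

def epsilon_greedy(Q, state, eps=0.1):
    """With probability 1 - eps exploit the best-known action;
    otherwise explore a uniformly random one."""
    actions = Q[state]
    if random.random() < eps:
        return random.choice(list(actions))
    return max(actions, key=actions.get)
```

With `eps=0` this always picks a2 (value 2000) in state S1(T2); a small positive `eps` keeps the learner from locking onto an early, possibly suboptimal estimate.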
11
Monte Carlo Control Method – 2/4

Example state-action values:
  State        Action                     S-A Value
  S2(M2,T2)    a1 (<2P2+2P4> m1)          1200
  S2(M2,T2)    a3 (<P3> T2)               700
  S2(M2,T2)    a4 (<P2+P4> T1)            1020
  S2(M2,T2)    a5 (<P3> T1, <P2+P4> T2)   1400

Figure: scenario map showing state s2, with Hostile Ship (T1) and SCUD (T2) present and action a5 applied; Naval Base (M1), Air Base (M2), Sea Port (M3)
12
Monte Carlo Control Method – 3/4

Update the state-action values along the episode (terminal state s_n, net reward 3400):
  State        Action                     S-A Value
  S1(T2)       a1 (<2P2+2P4> m1)          1560
  S1(T2)       a2 (<3P1> m2)              2170
  S1(T2)       a3 (<P3> T2)               1320
  S2(M2,T2)    a1 (<2P2+2P4> m1)          1200
  S2(M2,T2)    a3 (<P3> T2)               700
  S2(M2,T2)    a4 (<P2+P4> T1)            1020
  S2(M2,T2)    a5 (<P3> T1, <P2+P4> T2)   2800

Figure: scenario map with Naval Base (M1), Air Base (M2), Sea Port (M3)
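The update step can be sketched as an every-visit Monte Carlo average; the incremental form and the variable names are assumptions for illustration, since the slides do not spell out the exact update rule.

```python
from collections import defaultdict

Q = defaultdict(float)  # state-action value estimates
N = defaultdict(int)    # visit counts per (state, action) pair

def mc_update(episode, net_reward):
    """episode: list of (state, action) pairs visited;
    net_reward: the episode's total return.
    Moves each visited pair's value toward the observed return:
    Q <- Q + (G - Q) / N  (incremental sample average)."""
    for s, a in episode:
        N[(s, a)] += 1
        Q[(s, a)] += (net_reward - Q[(s, a)]) / N[(s, a)]

# e.g. an episode through the two states above, ending with net reward 3400
mc_update([("S1(T2)", "a2"), ("S2(M2,T2)", "a5")], 3400)
```

The values on the slide (2170, 2800) are such averages over several episodes, so a single update with return 3400 pulls the estimates only partway toward 3400.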
13
Monte Carlo Control Method – 4/4

Converged state-action values yield the optimal strategy (a mapping from states to actions):

  State     Action   State-Action Value
  T2        a1       1780
  T2        a2       323
  T2        a3       1356
  M2,T2     a1       2115
  M2,T2     a3       930
  M2,T2     a4       1454
  M2,T2     a5       3020

Optimal strategy:
  State     Optimal Action
  φ         3P1
  M1        P1+P2+P3+3P4
  M2        2P2+2P4
  T1        P1+2P2+P4
  T2        2P2+2P4
  T3        P1+P4
  T1,T2     P1+P2
  M2,T2     P2+P3+P4
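Going from converged values to the strategy is a simple argmax over each state's actions; a minimal sketch, using illustrative values for the two states shown above:

```python
# Converged state-action values (illustrative)
Q = {
    "T2":    {"a1": 1780, "a2": 323, "a3": 1356},
    "M2,T2": {"a1": 2115, "a3": 930, "a4": 1454, "a5": 3020},
}

# Optimal strategy: in each state, pick the action with the highest value
strategy = {s: max(actions, key=actions.get) for s, actions in Q.items()}
# -> {"T2": "a1", "M2,T2": "a5"}
```

This matches the strategy table: in state T2 the best action is a1 (2P2+2P4), and in state (M2, T2) it is a5 (P2+P3+P4).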
14
Robust Organization Design 1/2

Objective: given the asset utilization of the near-optimal strategy, design a congruent organizational structure in terms of DM ownership of platforms, such that the overall workload is minimized

Approach: problem formulation → mixed-integer optimization algorithms → robust organization

Figure: organization of DM1-DM4, with internal workload (within a DM), external workload (coordination across DMs), and a per-DM workload constraint
15
Robust Organization Design 2/2

Workload of DM_k:
  WL_k = internal workload + external workload
  Internal workload ∝ Σ_{i=1}^{m} (activity of platform class P_i) × (number of platforms of class P_i owned by DM_k)
  External workload ∝ Σ_{i=1}^{m} Σ_{j=1}^{m} (cross activity of platform classes P_i and P_j) × (number of platforms of class P_i owned by DM_k) × (number of platforms of class P_j not owned by DM_k)

Integer optimization problem:
  Objective: minimize overall workload
  Subject to:
  1) No DM may exceed his workload constraint
  2) Each platform must be assigned to a DM
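The integer program can be illustrated with a brute-force search on a tiny made-up instance; the activity and cross-activity numbers, the threshold, and the platform list below are hypothetical (the paper solves the full-size problem with mixed-integer nonlinear programming, not enumeration).

```python
from itertools import product

# Hypothetical instance: 5 platform units drawn from 3 classes, 2 DMs
PLATFORMS = ["P1", "P1", "P2", "P2", "P3"]
ACTIVITY = {"P1": 1.0, "P2": 1.1, "P3": 0.3}     # per-class activity
CROSS = {("P1", "P2"): 0.7, ("P2", "P1"): 0.7,   # cross activity between classes
         ("P1", "P3"): 0.2, ("P3", "P1"): 0.2,
         ("P2", "P3"): 0.25, ("P3", "P2"): 0.25}
N_DMS, THRESHOLD = 2, 6.0

def workload(owned, others):
    """WL_k = internal (own platforms' activity) + external (cross
    activity between owned and not-owned platforms), per the slide."""
    internal = sum(ACTIVITY[c] for c in owned)
    external = sum(CROSS.get((i, j), 0.0) for i in owned for j in others)
    return internal + external

def best_assignment():
    """Enumerate every platform-to-DM assignment, keep the feasible one
    (all DMs under THRESHOLD) with the smallest overall workload."""
    best, best_cost = None, float("inf")
    for assign in product(range(N_DMS), repeat=len(PLATFORMS)):
        loads = []
        for k in range(N_DMS):
            owned = [p for p, d in zip(PLATFORMS, assign) if d == k]
            others = [p for p, d in zip(PLATFORMS, assign) if d != k]
            loads.append(workload(owned, others))
        if max(loads) <= THRESHOLD and sum(loads) < best_cost:
            best, best_cost = assign, sum(loads)
    return best, best_cost
```

On this instance the minimizer puts all platforms with one DM (total internal activity 4.5, no cross terms), which is exactly why the real problem also needs the per-DM workload constraint: with a tighter threshold, platforms must be split and the cross-activity terms make the objective nonlinear in the assignment.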
16
Illustrative Example - Revisit

Near-optimal strategy → statistics of platform utilization → mixed-integer nonlinear programming algorithms → robust organizational structure:
  DM1: P1 + 3P2 + P3 + P4
  DM2: P1 + P2 + P3 + P4
  DM3: P1 + P2 + P3

Expected platform coordination (platform classes P1 (3 units), P2 (5), P3 (3), P4 (2)):
        P1     P2     P3     P4
  P1    1.5    0.75   0.25   1.00
  P2    0.75   1.916  0.25   1.58
  P3    0.25   0.25   0.33   0.67
  P4    1.00   1.58   0.67   2.67

Expected platform utilization: 1, 1.08, 0.33, 1.34
17
Summary

- Proposed a methodology for designing robust organizations for dynamic and stochastic environments
- Modeled the mission environment as a finite-state Markov Decision Process
- Applied Monte Carlo control methods to obtain a near-optimal action strategy
- Utilized mixed-integer optimization techniques to design an organizational structure congruent with the strategy
18
Future Work

Modeling parameters: incorporate more realistic mission environments into the MDP model
- Task locations
- Platform locations, velocities

Space reduction in learning:
- Generalization (function approximation)
- Abstraction (factored representation)

Organizational design: include additional organizational structure elements in the design process
- Command structure
- Information flow structure
19
Thank You