Hybrid Agent-Based Modeling: Architectures, Analyses and Applications (Stage One). Li, Hailin.


1

Hybrid Agent-Based Modeling: Architectures, Analyses and Applications (Stage One)

Li, Hailin

2

Outline

Introduction

Least-Squares Method for Reinforcement Learning

Evolutionary Algorithms For RL Problem (in progress)

Technical Analysis based upon hybrid agent-based architecture (in progress)

Conclusion (Stage One)

3

Introduction Learning From Interaction

Interact with the environment
Consequences of actions are used to achieve goals
No explicit teacher, only experience

Examples: a chess player in a game; someone preparing food; the actions of a gazelle calf after it is born

4

Introduction Characteristics

Decision making in an uncertain environment

Actions affect the future situation; their effects cannot be fully predicted

Goals are explicit

Use experience to improve performance

5

Introduction What to be learned

A mapping from situations to actions that maximizes a scalar reward or reinforcement signal

Learning: the agent is not told which actions to take, but must discover which actions yield the most reward by trying them

6

Introduction Challenge

Action may affect not only immediate reward but also the next situation, and consequently all subsequent rewards

Trial-and-error search
Delayed reward

7

Introduction Exploration and exploitation

Exploit what it already knows in order to obtain reward

Explore in order to make better action selections in the future

Neither can be pursued exclusively without failing at the task

Trade-off

8

Introduction Components of an agent

Policy: the decision-making function

Reward (total reward, average reward, or discounted reward): the good and bad events for the agent

Value: rewards in the long run

Model of the environment: the behavior of the environment

9

Introduction Markov Property & Markov Decision Processes

“Independence of path”: all that matters is contained in the current state signal

A reinforcement learning task that satisfies the Markov property is called a Markov decision process, MDP

Finite Markov Decision Process (MDP)

$$P^{a}_{ss'} = \Pr\big\{\, s_{t+1} = s' \mid s_t = s,\ a_t = a \,\big\}$$

$$R^{a}_{ss'} = E\big\{\, r_{t+1} \mid s_t = s,\ a_t = a,\ s_{t+1} = s' \,\big\}$$

10

Introduction Three categories of methods for solving the reinforcement learning problem

Dynamic programming: requires a complete and accurate model of the environment; performs a full backup operation on each state

Monte Carlo methods: a backup for each state based on the entire sequence of observed rewards from that state until the end of the episode

Temporal-difference learning: approximate the optimal value function, and view the approximation as an adequate guide

11

LS Method for Reinforcement Learning

For a stochastic dynamic system
$$x_{t+1} = f(x_t, a_t, w_t)$$

$x_t$: current state
$a_t$: control decision generated by the policy
$w_t$: disturbance, independently sampled from some fixed distribution

$x_0, x_1, x_2, \ldots$ is a Markov chain.

The MDP can be denoted by a quadruple $(S, A, P, R)$:
$S$: state set
$A$: action set
$P$: state transition probability
$R$: denotes the reward function $g(x_t, a_t)$
The policy $\pi$ is a mapping $\pi: S \to \Pr(A)$.

12

LS Method for Reinforcement Learning

For each policy $\pi$, the value function $J^{\pi}$ is defined by the equation:

$$J^{\pi}(x) = E\left[ \sum_{t=0}^{\infty} \gamma^{t}\, g\big(x_t, \pi(x_t)\big) \;\middle|\; x_0 = x \right], \qquad \gamma \in (0,1)$$

The optimal value function $J^{*}$ is defined by

$$J^{*}(x) = \max_{\pi} J^{\pi}(x)$$
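Since the value function is an expected discounted return, it can be estimated directly from simulated trajectories. Below is a minimal Monte Carlo sketch, assuming a hypothetical `simulate_trajectory` helper that rolls out the policy from a start state and returns the observed reward sequence.

```python
import numpy as np

# Minimal Monte Carlo estimate of J^pi(x0): average the discounted return over
# several simulated trajectories. simulate_trajectory is an assumed helper that
# returns the reward sequence [g(x_0, pi(x_0)), g(x_1, pi(x_1)), ...] from x_0.
def estimate_value(simulate_trajectory, x0, gamma=0.95, n_runs=100):
    returns = []
    for _ in range(n_runs):
        rewards = simulate_trajectory(x0)
        returns.append(sum(gamma ** t * g for t, g in enumerate(rewards)))
    return float(np.mean(returns))
```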

13

LS Method for Reinforcement Learning

The optimal action can be generated through

$$a_t = \arg\max_{a \in A} E_w\Big[ g(x_t, a) + \gamma\, J^{*}\big(f(x_t, a, w)\big) \Big]$$

Introducing the Q value function

$$Q(x, a) = E_w\Big[ g(x, a) + \gamma\, J^{*}\big(f(x, a, w)\big) \Big]$$

the optimal action can now be generated through

$$a_t = \arg\max_{a \in A} Q(x_t, a)$$

14

LS Method for Reinforcement Learning

The exact Q-values for all state-action pairs can be obtained by solving the Bellman equations (full backups):

$$Q^{\pi}(x_t, a_t) = \sum_{x_{t+1}} P(x_t, a_t, x_{t+1})\, g(x_t, a_t, x_{t+1}) + \gamma \sum_{x_{t+1}} P(x_t, a_t, x_{t+1})\, Q^{\pi}\big(x_{t+1}, \pi(x_{t+1})\big)$$

or, in matrix format:

$$Q^{\pi} = \mathcal{R} + \gamma P Q^{\pi}$$

where $P$ is the $(|S|\,|A|) \times (|S|\,|A|)$ matrix that denotes the transition probability from $(x_t, a_t)$ to $\big(x_{t+1}, \pi(x_{t+1})\big)$.
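Because the matrix form is linear in $Q^{\pi}$, the exact Q-values of a small MDP with a known model can be computed with a single linear solve, $(I - \gamma P)\,Q^{\pi} = \mathcal{R}$. The sketch below uses a hypothetical 4-pair transition matrix with illustrative numbers only.

```python
import numpy as np

# Hypothetical toy example: 2 states x 2 actions = 4 state-action pairs.
# P[i, j] is the probability of moving from pair i to pair j when the
# environment transitions and the fixed policy picks the next action.
gamma = 0.9
P = np.array([
    [0.0, 0.8, 0.2, 0.0],
    [0.1, 0.0, 0.0, 0.9],
    [0.5, 0.0, 0.0, 0.5],
    [0.0, 0.3, 0.7, 0.0],
])
R = np.array([1.0, 0.0, 0.5, 2.0])  # expected one-step reward for each pair

# Full-backup solution of the Bellman equations: (I - gamma * P) Q = R
Q = np.linalg.solve(np.eye(len(R)) - gamma * P, R)
print(Q)
```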

15

Traditional Q-learning

LS Method for Reinforcement Learning

A popular variant of temporal-difference learning used to approximate Q value functions.

In the absence of a model of the MDP, it uses sample data $(x_t, a_t, r_t, x_{t+1})$.

The temporal difference $d_t$ is defined as:

$$d_t = g(x_t, a_t) + \gamma \max_{a} Q(x_{t+1}, a) - Q(x_t, a_t)$$

For one-step Q-learning, the update equation is:

$$Q(x_t, a_t) \leftarrow Q(x_t, a_t) + \alpha\, d_t, \qquad \alpha \in (0,1)$$
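As a concrete illustration, the one-step update above can be written as a small tabular routine; the 5-state, 2-action sizes and the values of $\gamma$ and $\alpha$ below are assumptions for the example only.

```python
import numpy as np

def q_learning_update(Q, x_t, a_t, r_t, x_next, gamma=0.95, alpha=0.1):
    """Apply one temporal-difference backup to the tabular Q function."""
    d_t = r_t + gamma * np.max(Q[x_next]) - Q[x_t, a_t]  # temporal difference d_t
    Q[x_t, a_t] += alpha * d_t
    return Q

# Example usage with a hypothetical 5-state, 2-action problem:
Q = np.zeros((5, 2))
Q = q_learning_update(Q, x_t=0, a_t=1, r_t=1.0, x_next=3)
```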

16

LS Method for Reinforcement Learning

The final decision based upon Q-learning:

$$a_t = \arg\max_{a \in A} Q(x_t, a)$$

The reasons for the development of approximation methods:
• Size of the state-action space
• The overwhelming requirement for computation

The categories of approximation methods for machine learning:
• Model approximation
• Policy approximation
• Value function approximation

17

Model-Free Least-Squares Q-learning

LS Method for Reinforcement Learning

Linear Function Approximator:

$$\hat{Q}(x, a; W) = \sum_{k=1}^{K} w_k\, \phi_k(x, a) = \phi(x, a)^{T} W$$

$\phi_1, \ldots, \phi_K$: basis functions
$W = (w_1, \ldots, w_K)'$: a vector of scalar weights
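One common way to realize such an approximator is to place a block of radial basis functions per action; the sketch below assumes hypothetical centers, width, and action count, and is not the exact feature configuration used in this work.

```python
import numpy as np

CENTERS = np.array([[-1.0, 0.0], [0.0, 0.0], [1.0, 0.0]])  # assumed state-space centers
SIGMA = 0.5                                                 # assumed RBF width
N_ACTIONS = 2                                               # assumed action count

def phi(x, a):
    """Basis vector phi(x, a): RBF features of the state placed in the block of action a."""
    rbf = np.exp(-np.sum((CENTERS - np.asarray(x)) ** 2, axis=1) / (2 * SIGMA ** 2))
    features = np.zeros(len(CENTERS) * N_ACTIONS)
    features[a * len(CENTERS):(a + 1) * len(CENTERS)] = rbf
    return features

def q_hat(x, a, W):
    """Approximate Q value: phi(x, a)^T W."""
    return float(phi(x, a) @ W)
```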

18

LS Method for Reinforcement Learning

For a fixed policy $\pi$:

$$\hat{Q}^{\pi} = \Phi W^{\pi}$$

$\Phi$ is an $(|S|\,|A|) \times K$ matrix, and $K \ll |S|\,|A|$.

If the model of the MDP, $(P, R)$, is available:

$$\Phi = \begin{bmatrix} \phi(x^{1}, a^{1})^{T} \\ \vdots \\ \phi(x^{|S||A|}, a^{|S||A|})^{T} \end{bmatrix}
\qquad
\mathcal{R} = \begin{bmatrix} \sum_{x_{t+1}} P(x^{1}, a^{1}, x_{t+1})\, g(x^{1}, a^{1}, x_{t+1}) \\ \vdots \\ \sum_{x_{t+1}} P(x^{|S||A|}, a^{|S||A|}, x_{t+1})\, g(x^{|S||A|}, a^{|S||A|}, x_{t+1}) \end{bmatrix}$$

19

LS Method for Reinforcement Learning

The policy weights are given by

$$W^{\pi} = A^{-1} B$$

where $A = \Phi^{T}(\Phi - \gamma P \Phi)$ and $B = \Phi^{T} \mathcal{R}$.

If the model of the MDP, $(P, R)$, is not available (model-free), given samples $(x_t^{i}, a_t^{i}, r_t^{i}, x_{t+1}^{i})$, $i = 1, 2, \ldots, L$, the matrices are accumulated as

$$A \leftarrow A + \phi(x_t, a_t)\big[\phi(x_t, a_t) - \gamma\, \phi\big(x_{t+1}, \pi(x_{t+1})\big)\big]^{T}$$

$$B \leftarrow B + \phi(x_t, a_t)\, r_t$$

with $A_0 = 0$, $B_0 = 0$, and $\gamma \in (0,1)$.
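These accumulation rules translate almost directly into code. The sketch below (an LSTD-Q style routine, not the author's implementation) assumes a feature map `phi` and a fixed policy `pi`, such as the ones sketched earlier, and returns $W^{\pi} = A^{-1}B$.

```python
import numpy as np

def lstdq(samples, phi, pi, n_features, gamma=0.95):
    """Accumulate A and B from samples (x_t, a_t, r_t, x_next) and solve for W."""
    A = np.zeros((n_features, n_features))
    B = np.zeros(n_features)
    for (x_t, a_t, r_t, x_next) in samples:
        f = phi(x_t, a_t)
        f_next = phi(x_next, pi(x_next))        # features of the next pair under policy pi
        A += np.outer(f, f - gamma * f_next)    # A <- A + phi (phi - gamma phi')^T
        B += f * r_t                            # B <- B + phi r
    # Small ridge term for numerical stability (an assumption, not in the slides).
    return np.linalg.solve(A + 1e-6 * np.eye(n_features), B)
```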

20

LS Method for Reinforcement Learning

The optimal policy can be found through

$$\pi_{t+1}(x) = \arg\max_{a}\, \hat{Q}^{\pi_t}(x, a) = \arg\max_{a}\, \phi(x, a)^{T} W^{\pi_t}$$

The greedy policy is represented by the parameter vector $W^{\pi_t}$ and can be determined on demand for any given state.
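Since the greedy policy is only a maximization over the (assumed finite) action set, it can be computed on demand from $W$; a short sketch reusing the hypothetical `phi` from before:

```python
ACTIONS = [0, 1]  # assumed finite action set

def greedy_action(x, W):
    """Greedy action for state x under the weights W: argmax_a phi(x, a)^T W."""
    return max(ACTIONS, key=lambda a: float(phi(x, a) @ W))
```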

21

Simulation
The system is hard to model but easy to simulate
Simulation implicitly indicates the features of the system in terms of the state visiting frequency

Orthogonal least-squares algorithm for training an RBF network
A systematic learning approach for solving the center selection problem (sketched below)
The newly added center always maximizes the amount of energy of the desired network output

LS Method for Reinforcement Learning
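A simplified sketch of forward orthogonal least-squares center selection in the spirit described above (and of the classical OLS algorithm for RBF networks): candidate centers are the visited states themselves, and each step adds the candidate whose orthogonalized regressor explains the most energy of the desired output `d`. The function name, the simplifications, and the choice of candidates are assumptions.

```python
import numpy as np

def ols_select_centers(X, d, sigma, n_centers):
    """X: visited states (N x dim), d: desired output (N,); returns chosen center indices."""
    # Candidate regressor matrix: one Gaussian RBF column per candidate center.
    G = np.exp(-((X[:, None, :] - X[None, :, :]) ** 2).sum(-1) / (2 * sigma ** 2))
    selected, ortho = [], []                     # chosen indices, orthogonalized columns
    for _ in range(n_centers):
        best_idx, best_err, best_w = None, -np.inf, None
        for j in range(G.shape[1]):
            if j in selected:
                continue
            w = G[:, j].copy()
            for q in ortho:                      # orthogonalize against chosen columns
                w -= (q @ G[:, j]) / (q @ q) * q
            if w @ w < 1e-12:
                continue
            err = (w @ d) ** 2 / (w @ w)         # energy of d explained by this regressor
            if err > best_err:
                best_idx, best_err, best_w = j, err, w
        if best_idx is None:
            break
        selected.append(best_idx)
        ortho.append(best_w)
    return selected
```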

22

Hybrid Least-Squares Method

LS Method for Reinforcement Learning

Least-Squares Policy Iteration (LSPI) algorithm

Simulation & Orthogonal Least-Squares regression

[Architecture diagram: Environment, State, Action, Reward, Feature Configuration, Optimal policy]
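One way the blocks of this architecture could be wired together, as a rough sketch only: use simulated samples to place the RBF centers with the OLS selection above, then run LSPI (LSTD-Q evaluation plus greedy improvement) on the resulting features. The helper names refer to the earlier sketches, and the use of the observed rewards as the OLS target is an assumption, not necessarily the author's design.

```python
import numpy as np

def hybrid_lspi(samples, actions, sigma=0.5, n_centers=10, gamma=0.95, n_iterations=20):
    """Sketch: feature configuration from simulation + OLS, then LSPI on those features."""
    # 1. Feature configuration: place RBF centers on visited states via OLS selection,
    #    using the observed rewards as the desired output signal (one possible choice).
    states = np.array([x for (x, a, r, x_next) in samples])
    rewards = np.array([r for (x, a, r, x_next) in samples])
    centers = states[ols_select_centers(states, rewards, sigma, n_centers)]

    def phi(x, a):
        rbf = np.exp(-np.sum((centers - np.asarray(x)) ** 2, axis=1) / (2 * sigma ** 2))
        f = np.zeros(len(centers) * len(actions))
        f[a * len(centers):(a + 1) * len(centers)] = rbf
        return f

    # 2. LSPI: alternate LSTD-Q policy evaluation with greedy policy improvement.
    n_features = len(centers) * len(actions)
    W = np.zeros(n_features)
    for _ in range(n_iterations):
        pi = lambda x, W=W: max(actions, key=lambda a: float(phi(x, a) @ W))
        W = lstdq(samples, phi, pi, n_features, gamma)
    return W
```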

23

LS Method for Reinforcement Learning

24

Simulation

Cart-Pole System

$$\ddot{\theta}_t = \frac{g \sin\theta_t + \cos\theta_t \left[ \dfrac{-F_t - m_p l\, \dot{\theta}_t^{2} \sin\theta_t + \mu_c\, \mathrm{sgn}(\dot{x}_t)}{m_c + m_p} \right] - \dfrac{\mu_p \dot{\theta}_t}{m_p l}}{l \left[ \dfrac{4}{3} - \dfrac{m_p \cos^{2}\theta_t}{m_c + m_p} \right]}$$

$$\ddot{x}_t = \frac{F_t + m_p l \left[ \dot{\theta}_t^{2} \sin\theta_t - \ddot{\theta}_t \cos\theta_t \right] - \mu_c\, \mathrm{sgn}(\dot{x}_t)}{m_c + m_p}$$

where $\theta_t$ is the pole angle, $x_t$ the cart position, $F_t$ the force applied to the cart, $m_c$ and $m_p$ the masses of the cart and the pole, $l$ the half-length of the pole, $g$ the acceleration due to gravity, and $\mu_c$, $\mu_p$ the friction coefficients of the cart and the pole.
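These dynamics can be integrated with a simple Euler step, as in the classic cart-pole benchmark. The physical constants below are the commonly used benchmark values and are assumptions rather than values taken from these slides.

```python
import numpy as np

G, M_CART, M_POLE, LENGTH = 9.8, 1.0, 0.1, 0.5  # l is the pole half-length (assumed values)
MU_C, MU_P, DT = 0.0005, 0.000002, 0.02         # assumed friction coefficients and time step

def cart_pole_step(state, force):
    """Advance (x, x_dot, theta, theta_dot) by one Euler step under the applied force."""
    x, x_dot, theta, theta_dot = state
    total_mass = M_CART + M_POLE
    sin_t, cos_t = np.sin(theta), np.cos(theta)
    temp = (-force - M_POLE * LENGTH * theta_dot ** 2 * sin_t
            + MU_C * np.sign(x_dot)) / total_mass
    theta_acc = (G * sin_t + cos_t * temp - MU_P * theta_dot / (M_POLE * LENGTH)) / (
        LENGTH * (4.0 / 3.0 - M_POLE * cos_t ** 2 / total_mass))
    x_acc = (force + M_POLE * LENGTH * (theta_dot ** 2 * sin_t - theta_acc * cos_t)
             - MU_C * np.sign(x_dot)) / total_mass
    return (x + DT * x_dot, x_dot + DT * x_acc,
            theta + DT * theta_dot, theta_dot + DT * theta_acc)
```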

25

Simulation

26

Conclusion(Stage One)

From the reinforcement learning perspective, the intractability of exact solutions to sequential decision problems requires value function approximation methods.

At present, linear function approximators are the best choice of approximation architecture, mainly due to their transparent structure.

The model-free least-squares policy iteration (LSPI) method is a promising algorithm that uses a linear approximation architecture to achieve policy optimization in the spirit of Q-learning. It may converge in surprisingly few steps.

Inspired by the orthogonal least-squares regression method for selecting the centers of an RBF neural network, a new hybrid learning method for LSPI can produce a more robust and human-independent solution.