Navigate swarms with AEPSO/CAEPSO


Adham Atyabi
Supervisor: Dr. Somnuk Phon-Amnuaisuk
Co-supervisor: Dr. Chin Kuan Ho

• Navigation:
• to steer a course through a medium
• to steer or manage (a boat) in sailing

http://www.merriam-ebster.com/dictionary/navigated

• Different navigational techniques have evolved over the ages, all involving locating one's position relative to known locations or patterns.


(R. Siegwart, I. R. Nourbakhsh, 2004)


The problem is a hostile robotic scenario in which cooperative robots try to navigate to bomb locations and disarm the bombs.

The robots have limited knowledge about the bombs' locations (they only know the likelihood of bombs in each area).

The likelihood information is uncertain (because of noise and Illusion effects).


To identify, design, and evaluate strategies for implementing a new Particle Swarm Optimization (PSO) for robot navigation in hazardous scenarios and hostile situations.

To resolve the uncertainty in the perception of the robots/agents in cooperative learning scenarios.

To reduce the number of robots involved in navigation tasks, with the aim of reducing costs.

To remove the initial-location dependency in navigation scenarios.


Outline: Navigation · Particle Swarm Optimization (PSO) · Area Extension PSO (AEPSO) · Robotic Scenarios & Results · Conclusion

1. CLASSICAL APPROACHES. The classical methods developed to date are variations of a few general approaches: Roadmap (Retraction, Skeleton, or Highway approach), Cell Decomposition (CD), Potential Fields (PF), and mathematical programming.

2. HEURISTIC APPROACHES. Artificial Neural Networks (ANN), Genetic Algorithms (GA), Particle Swarm Optimization (PSO), Ant Colony Optimization (ACO), and Tabu Search (TS).

Heuristic algorithms are not guaranteed to find a solution, but when they do, they are likely to do so much faster than classical methods.


(Latombe, 1991; Keil and Sack, 1985; Masehian and Sedighzadeh, 2007; Pugh et al., 2007; Ramakrishnan and Zein-Sabatto, 2001; Hettiarachchi, 2006; Hu et al., 2007; Liu et al., 2006; Mohamad et al., 2006; McLurkin and Yamins, 2005; Ying-Tung et al., 2004)

• The performance of navigation techniques is highly dependent on their initialization and the reliability of their maps.

• According to the literature, in real robotic domains a small difference in the starting locations of the robots or goals may have a large effect on overall performance.

• Due to the dynamic, noisy, and unpredictable nature of real-world robotic applications, it is quite difficult to implement a navigation technique based on a well-known predefined map.

(Pugh and Zhang, 2005; Pugh and Martinoli, 2006, 2007; Gu et al., 2003)

Outline: Navigation · Particle Swarm Optimization (PSO) · Area Extension PSO (AEPSO) · Robotic Scenarios & Results · Conclusion

PSO is an evolutionary algorithm inspired by animal social behaviors (Kennedy, 1995; Ribeiro and Schlansker, 2005; Chang et al., 2004; Pugh and Martinoli, 2006; Sousa et al., 2003; Nomura, 2007). PSO has outperformed other evolutionary algorithms, such as GA, on some problems (Vesterstrom and Riget, 2002; Ratnaweera et al., 2004; Pasupuleti and Battiti, 2006). Particle Swarm Optimization (PSO) is an optimization technique that models a set of potential problem solutions as a swarm of particles moving about in a virtual search space (Kennedy, 1995). The method was inspired by the movement of flocking birds and their interactions with their neighbors in the group (Kennedy, 1995). PSO achieves optimization using three primary principles:

1) Evaluation, where a quantitative fitness can be determined for some particle location;

2) Comparison, where the best performer out of multiple particles can be selected;

3) Imitation, where the qualities of better particles are mimicked by others.

Every particle in the population begins with a randomized position x_ij and a randomized velocity v_ij in the n-dimensional search space, where i is the particle index and j is the dimension. Each particle remembers the position at which it achieved its highest performance (p). Each particle is also a member of some neighborhood of particles, and remembers which particle achieved the best overall position in that neighborhood (g). The update rules are:

v_ij(t) = w * v_ij(t-1) + c1*r1*(p_ij - x_ij(t-1)) + c2*r2*(g_j - x_ij(t-1))   (inertia + cognitive component + social component)
x_ij(t) = x_ij(t-1) + v_ij(t)
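To make the update rules concrete, here is a minimal runnable Python sketch of the canonical PSO loop described above. The swarm size, search bounds, parameter values, and the sphere fitness function are illustrative assumptions, not the thesis setup.

import random

def sphere(x):
    # Illustrative fitness function (assumption): smaller is better.
    return sum(xi * xi for xi in x)

def pso(fitness, dim=2, n_particles=10, iters=100,
        w=0.7, c1=1.5, c2=1.5, lo=-10.0, hi=10.0):
    # Random initial positions and velocities in the n-dimensional space.
    xs = [[random.uniform(lo, hi) for _ in range(dim)] for _ in range(n_particles)]
    vs = [[random.uniform(-1, 1) for _ in range(dim)] for _ in range(n_particles)]
    pbest = [list(x) for x in xs]                # personal best positions (p)
    pbest_f = [fitness(x) for x in xs]
    g = min(range(n_particles), key=lambda i: pbest_f[i])
    gbest, gbest_f = list(pbest[g]), pbest_f[g]  # neighborhood/global best (g)

    for _ in range(iters):
        for i in range(n_particles):
            for j in range(dim):
                r1, r2 = random.random(), random.random()
                # v(t) = inertia + cognitive component + social component
                vs[i][j] = (w * vs[i][j]
                            + c1 * r1 * (pbest[i][j] - xs[i][j])
                            + c2 * r2 * (gbest[j] - xs[i][j]))
                xs[i][j] += vs[i][j]             # x(t) = x(t-1) + v(t)
            f = fitness(xs[i])
            if f < pbest_f[i]:                   # imitation of better performers
                pbest[i], pbest_f[i] = list(xs[i]), f
                if f < gbest_f:
                    gbest, gbest_f = list(xs[i]), f
    return gbest, gbest_f

print(pso(sphere))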


1. Single-objective domains: improvements on neighborhood topology, the velocity equation, the global best, and the personal best.

2. Multi-objective domains: niching PSO, mutation, parallelism, re-initialization, clearing memory, using sub-swarms.

(Brits, Engelbrecht, and Van Den Bergh, 2002, 2003; Yoshida et al., 2001; Stacey, Jancic and Grundy, 2003; Chang et al., 2005; Vesterstrom and Riget, 2002; Qin et al., 2004; Pasupuleti and Battiti, 2006; Ratnaweera et al., 2004; Peram et al., 2003; Parsopoulos and Vrahatis, 2002)


Outline: Navigation · Particle Swarm Optimization (PSO) · Area Extension PSO (AEPSO) · Robotic Scenarios & Results · Conclusion

• The number of robots used in the literature ranges from 20 to 300 (Lee et al., 2005; Hettiarachchi, 2006; Werfel et al., 2005; Chang et al., 2005; Ahmadabadi et al., 2001; Mondada et al., 2004).

• Robots can use more knowledge (e.g., robots know the locations of goals and of their teammates) (Luke et al., 2005; Ahmadabadi et al., 2001; Yamaguchi et al., 1997; Martinson and Arkin, 2003).

• It is common to train robots individually (Ahmadabadi et al., 2001; Yamaguchi et al., 1997; Hayas et al., 1994).


1. Parallel Learning in Heterogeneous Multi-Robot Swarms (2006, 2007). Evaluation in robotic learning is costly, even more so than the processing of the learning algorithm itself. On real robots, sensors and actuators may perform slightly differently due to variations in manufacturing. As a result, multiple robots of the same model may actually perceive and interact with their environment differently, creating a heterogeneous swarm.

2. Path planning for mobile robot using the particle swarm optimization with mutation operator (2004).

3. Obstacle avoidance with multi-objective optimization by PSO in dynamic environments (2005).

4. Robot path planning using particle swarm optimization of Ferguson splines (2006).

5. Obstacle-avoidance path planning for soccer robots using particle swarm optimization (2006).


Outline: Navigation · Particle Swarm Optimization (PSO) · Area Extension PSO (AEPSO) · Robotic Scenarios & Results · Conclusion

1. To handle dynamic velocity
2. To handle direction and fitness criteria
3. To handle cooperation
4. To handle diversity of search
5. To handle the lack of reliable perception (Pugh and Martinoli, 2006; Bogatyreva and Shillerov, 2005)


1. New velocity heuristic, which solves premature convergence.
2. Credit Assignment heuristic, which solves the cul-de-sac problem.
3. Hot Zone/Area heuristic; different communication-range conditions provide dynamic neighborhoods and sub-swarms.
4. Help Request Signal, which provides cooperation between different sub-swarms.
5. Boundary Condition heuristic, which solves the lack of diversity in basic PSO.
6. Leave Force, which provides a high level of noise resistance.
7. Speculation mechanism, which provides a high level of noise resistance.


The idea is based on dividing the environment into fixed virtual sub-areas with various credits. An area's credit reflects the proportion of goals and obstacles positioned in it. Each particle knows the credits of the first and second layers of areas around its current neighborhood, as in the sketch below.
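To make this bookkeeping concrete, here is a minimal Python sketch of area credits on a virtual grid. The 6x6 grid, the 600-pixel world, and the goals-minus-obstacles credit formula are illustrative assumptions, not the thesis's exact definitions.

# Sketch of the Hot Zone/Area idea: a fixed grid of virtual areas with credits.
GRID = 6

def area_of(pos, world=600.0):
    # Map an (x, y) pixel position to its (row, col) virtual area.
    cell = world / GRID
    return int(pos[1] // cell), int(pos[0] // cell)

def area_credits(goals, obstacles):
    # Credit reflects goals and obstacles inside each area (formula assumed).
    credit = [[0.0] * GRID for _ in range(GRID)]
    for g in goals:
        r, c = area_of(g)
        credit[r][c] += 1.0      # goals raise an area's credit
    for o in obstacles:
        r, c = area_of(o)
        credit[r][c] -= 0.5      # obstacles lower it (weight assumed)
    return credit

def neighborhood_layers(r, c):
    # The first and second layers of areas around (r, c): the credits a
    # particle in that area is allowed to see.
    layers = {1: [], 2: []}
    for dr in range(-2, 3):
        for dc in range(-2, 3):
            rr, cc = r + dr, c + dc
            ring = max(abs(dr), abs(dc))
            if ring in layers and 0 <= rr < GRID and 0 <= cc < GRID:
                layers[ring].append((rr, cc))
    return layers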


• Robots can only communicate with those within their communication range.
• Various communication ranges were used (500, 250, 125, and 5 pixels).
• This heuristic has a major effect on sub-swarm size.
• A help-request signal can propagate through a chain of connections.
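A minimal Python sketch of range-limited communication and the chain-of-connections idea follows. The breadth-first relay over the communication graph is a natural reading of the slide, written under assumed data structures (robots as (x, y) tuples).

import math

def neighbors(robots, i, comm_range=250.0):
    # Robots that robot i can talk to directly (within comm_range pixels).
    xi, yi = robots[i]
    return [j for j, (x, y) in enumerate(robots)
            if j != i and math.hypot(x - xi, y - yi) <= comm_range]

def reachable(robots, i, comm_range=250.0):
    # Chain of connections: everyone reachable by relaying a help-request
    # signal hop by hop (breadth-first search over the communication graph).
    seen, frontier = {i}, [i]
    while frontier:
        nxt = []
        for k in frontier:
            for j in neighbors(robots, k, comm_range):
                if j not in seen:
                    seen.add(j)
                    nxt.append(j)
        frontier = nxt
    return seen  # this connected component forms one sub-swarm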


• Reward and punishment
• Suspend factor
• In AEPSO, robots are suspended each time they cross a boundary line.
• Under this condition they can escape from areas in which they are stuck, and it is as useful as reinitializing the robots' states in the environment.
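A small Python sketch of the suspend-on-boundary idea described above; the world bounds, the clamp-back-inside behavior, and the suspension length are illustrative assumptions.

def step_with_boundary(pos, vel, suspend, bounds=(0.0, 600.0), penalty=50):
    # One movement step with the Boundary Condition heuristic: a robot that
    # crosses a boundary line is frozen for `penalty` iterations, which acts
    # much like reinitializing its state (penalty length assumed).
    if suspend > 0:
        return pos, suspend - 1          # still serving its suspension
    lo, hi = bounds
    x, y = pos[0] + vel[0], pos[1] + vel[1]
    if not (lo <= x <= hi and lo <= y <= hi):
        # Crossed a boundary: clamp back inside and suspend the robot.
        x = min(max(x, lo), hi)
        y = min(max(y, lo), hi)
        return (x, y), penalty
    return (x, y), 0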


• The Illusion idea is inspired by our real-world perception errors and mistakes, which can easily be imagined as corrupted data caused by communication failures (e.g., satellite data) or by weaknesses in the sensing elements (sensors).
• The Illusion effect forces approximately 50% or more noise onto the environment.
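A minimal Python sketch of how an Illusion-style corruption of perceived area credits might look; the uniform +/-50% corruption model is an assumption standing in for the thesis's exact noise model.

import random

def illusory_credit(true_credit, noise_level=0.5):
    # Corrupt a perceived area credit with roughly 50% noise, standing in
    # for the Illusion effect (the corruption model is an assumption; a
    # credit of zero is returned unchanged under this simple model).
    span = noise_level * abs(true_credit)
    return true_credit + random.uniform(-span, span)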


• It is common to run the experiments in two phases (Ahmadabadi et al., 2001):

1. Training
2. Testing

• In the training phase, the choice of training method matters (individual training or team-based training).

• In the testing phase, there are two different options:
1. Use the same initialization as in training
2. Use a different initialization


• The speculation mechanism is based on an extra memory in each robot called a Mask.
• Masks can take values from:
1. The Illusion effect.
2. The robot's own observation.
3. Self-speculation.
4. Neighbors' observations.
5. Neighbors' speculation.
• Leave Force is an extra punishment that forces robots to decrease their current area's credit by 10% after a certain number of iterations.
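A small Python sketch of the Mask update and the Leave Force punishment; the simple averaging of evidence sources and the stay-time threshold are illustrative assumptions (only the 10% credit cut comes from the slide).

def update_mask(mask, area, sources):
    # Blend the evidence sources listed above (illusion reading, self
    # observation, self speculation, neighbors' observations, neighbors'
    # speculation) into the robot's Mask entry for one area. The simple
    # average is an assumed combination rule, not the thesis's exact one.
    values = [v for v in sources if v is not None]
    if values:
        mask[area] = sum(values) / len(values)
    return mask

def leave_force(credit, stay_time, threshold=100):
    # Extra punishment: after `threshold` iterations in the same area
    # (threshold assumed), cut the area's perceived credit by 10% so the
    # robot is pushed to leave.
    if stay_time >= threshold:
        return credit * 0.9
    return credit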


Outline: Navigation · Particle Swarm Optimization (PSO) · Area Extension PSO (AEPSO) · Robotic Scenarios & Results · Conclusion

• Static scenario.

• Dynamic scenario.

• Real-time scenario.

• Cooperative learning scenario:
• Homogeneous
• Heterogeneous


• In contrast with the static scenario, in the dynamic domain bombs are able to run away.
• Bomb velocity is set to 2 pixels/iteration, and robot velocity is a value between 1 and 3 pixels/iteration.
• Bombs' explosion time is set to 20,000 iterations (the maximum iteration).
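A minimal Python sketch of the dynamic-scenario bomb motion; the flee-from-the-nearest-robot rule is an assumption about how "running away" is implemented, while the 2 pixels/iteration speed comes from the slide.

import math

def bomb_step(bomb, robots, speed=2.0):
    # Move a bomb away from the nearest robot at 2 pixels/iteration
    # (the flee direction rule is assumed; robots are (x, y) tuples).
    nearest = min(robots, key=lambda r: math.hypot(r[0] - bomb[0], r[1] - bomb[1]))
    dx, dy = bomb[0] - nearest[0], bomb[1] - nearest[1]
    d = math.hypot(dx, dy) or 1.0     # avoid division by zero
    return bomb[0] + speed * dx / d, bomb[1] + speed * dy / d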

The results are based on 100 runs (each run is 20,000 iterations). In each run, 5 robots, 15 goals, and 44 obstacles are used.


[Charts: experimental results in static vs. dynamic environments — runs vs. iterations (0–20,000), and located bombs vs. iterations (0–20,000).]

Bomb explosion times are random values between 3,000 and 20,000 iterations.

Robots should locate bombs before they reach their explosion times.

A simple noise is assumed in the environment (an additional +/- value on areas' credits).


AEPSO performs better local search compared with basic PSO, random search, and linear search in real-time and dynamic domains.

AEPSO performed well in dynamic environments, and the results were reliable in noisy environments.


• A higher level of noise (Illusion) is presumed.
• The scenarios have two phases:
1. Training
2. Testing
• A higher level of cooperation is needed.


• The robots have limited knowledge about the bombs' locations (they only know the likelihood of bombs in each area).

• The likelihood information is uncertain (because of noise and Illusion effects).

• Robots should find the true credit of each area and observe first those areas that have the most effect on the others.

• Robots can draw on the knowledge from their training results to solve the task faster.

• Robots should give priority to the areas with the highest effect on others.


[Chart: bomb detection with homogeneous robots — detected bombs vs. iterations (2,000–20,000); series: Training/Testing with Same and New Initialization.]

The results are based on 20 runs (each run is 20,000 iterations). In each run, 5 robots, 51 bombs, and 51 obstacles are used.

CAEPSO achieved reliable results with homogeneous robots due to its ability to reduce the effect of Illusion in the environment.

The results show that CAEPSO achieved 99% performance with the same initialization, and 97% performance with a new initialization.


• There are various types of bombs and robots.
• Each robot can only disarm a specific type of bomb.
• Robot and bomb types are set randomly.
• Robots use a more accurate version of the Help Request Signal (see the sketch below).
• Three scenarios are presumed:
1. Homogeneous robots (S1).
2. Heterogeneous robots (S2).
3. Heterogeneous robots (S3).
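A small Python sketch of the more accurate Help Request Signal under the type constraint above; the robot data layout (dicts with "type" and "dist" fields) and the pick-the-closest-match rule are illustrative assumptions.

def route_help_request(bomb_type, robots, reachable_ids):
    # Only a robot whose specialisation matches the bomb's type can answer
    # the call; among those reachable, prefer the closest one. Returns None
    # if no matching robot is within the chain of connections.
    candidates = [i for i in reachable_ids if robots[i]["type"] == bomb_type]
    return min(candidates, key=lambda i: robots[i]["dist"], default=None)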


CAEPSO achieved 95% performance with heterogeneous robots.

CAEPSO was able to reduce the Illusion effect in the environment and improve its movements.


AEPSO showed better movement compared with basic PSO.

AEPSO achieved reliable results with only 5 robots, which is a big advantage compared with the surveyed literature.

AEPSO/CAEPSO proved efficient in complex scenarios based on navigation in hostile situations.

CAEPSO achieved reliable results in the homogeneous scenario under both new- and same-initialization constraints.

CAEPSO achieved reliable results with heterogeneous robots.


• In this study, we introduced AEPSO as a new modified version of basic PSO, and we investigated its effectiveness in static, dynamic, real-time, multi-dimensional, and multi-objective problem domains.
• It is worth mentioning that the small number of particles (only 5 robots) gave AEPSO a great advantage (by reducing costs).
• Robots were able to solve highly complex problems using only a poor level of knowledge (training knowledge) together with a high level of cooperation and experience sharing.
• We are going to compare CAEPSO's results with a behaviour-based version of Q-learning in a cooperative learning scenario with heterogeneous robots.

AEPSO performed better local search compared with other techniques (basic PSO, random search, linear search).

AEPSO and CAEPSO are robust to noise and time dependency.

Cooperation between agents allowed CAEPSO to perform well.


1. "Particle Swarm Optimization with Area Extension (AEPSO)", conf CEC2007, IEEE Congress on Evolutionary Computation, Stanford university of Singapore, accepted in 15 July 2007.

2. "Effects of Communication range, Noise and Help request Signal on Particle Swarm Optimization with Area Extension (AEPSO)", conf WIC / IAT, IEEE / ACM International Conference on Intelligent Agent Technology, Stanford University, USA, 25-28 September 2007.

3. "Particle Swarm Optimizations: A Critical Review", conf IKT07, Third conference of Information and Knowledge Technology, Ferdowsi University, Iran, submitted in 21 July 2007.


4. "Effectiveness of a Cooperative Learning Version of AEPSO in Homogeneous and Heterogeneous Multi-Robot Learning Scenarios", IEEE World Congress on Computational Intelligence (WCCI 2008, CEC08), Hong Kong, accepted 17 March 2008.

5. "Applying Area Extension PSO in Robotic Swarm", journal paper, Evolutionary Computation, MIT Press, submitted 10 December 2007.

6. "Robotic Navigation with PSO", Soft Computing journal, Elsevier, submitted April 2008.


Hettiarachchi, S. (2006). Distributed online evolution for swarm robotics. Autonomous Agents and Multi-Agent Systems.
Kennedy, J. and Eberhart, R. C. (1995). Particle swarm optimization. Proceedings of the 1995 IEEE International Conference on Neural Networks, vol. 4, IEEE Press, pp. 1942-1948.
Kennedy, J. and Mendes, R. (2002). Population structure and particle swarm performance. Proceedings of the 2002 Congress on Evolutionary Computation (CEC '02).
Krink, T., Vesterstrom, J. S., and Riget, J. (2002). Particle swarm optimization with spatial particle extension. Congress on Evolutionary Computation (CEC), 2002 IEEE World Congress on Computational Intelligence.
Lee, C., Kim, M., and Kazadi, S. (2005). Robot clustering. 2005 IEEE International Conference on Systems, Man and Cybernetics.
Luke, S., Sullivan, K., Balan, G. C., and Panait, L. (2005). Tunably decentralized algorithms for cooperative target observation. Fourth International Joint Conference on Autonomous Agents and Multi-Agent Systems (AAMAS 2005).


Werfel, J., Bar-Yam, Y., and Nagpal, R. (2005). Building patterned structures with robot swarms. Nineteenth International Joint Conference on Artificial Intelligence (IJCAI '05).
Yamaguchi, T., Tanaka, Y., and Yachida, M. (1997). Speed up reinforcement learning between two agents with adaptive mimetism. IEEE/RSJ Intelligent Robots and Systems.
Yang, C. and Simon, D. (2005). A new particle swarm optimization technique. 18th International Conference on Systems Engineering (ICSEng 2005).
Zavala, A. E. M., Aguirre, A. H., and Diharce, E. R. V. (2005). Constrained optimization via Particle Evolutionary Swarm Optimization algorithm (PESO). Proceedings of the 2005 Conference on Genetic and Evolutionary Computation.
Zhang, W. and Xie, X. (2003). DEPSO: hybrid particle swarm with differential evolution operator. IEEE Systems, Man and Cybernetics.
Zhao, Y. and Zheng, J. (2004). Particle swarm optimization algorithm in signal detection and blind extraction. IEEE Parallel Architectures, Algorithms and Networks.

Masehian, E. and Sedighzadeh, D. (2007). Classic and heuristic approaches in robot motion planning: a chronological review. Proc. World Academy of Science, Engineering and Technology, vol. 23, Aug 2007, ISSN 1307-6884.
Li, W., Yushu, L., Hongbin, D., and Yuanqing, X. (2006). Obstacle-avoidance path planning for soccer robots using particle swarm optimization. Proc. IEEE Int. Conf. on Robotics and Biomimetics (ROBIO '06), pp. 1233-1238.
Saska, M., Macas, M., Preucil, L., and Lhotska, L. (2006). Robot path planning using particle swarm optimization of Ferguson splines. Proc. IEEE ETFA '06, pp. 833-839.
Xin, C. and Yangmin, L. (2006). Smooth path planning of a mobile robot using stochastic particle swarm optimization. Proc. IEEE Int. Conf. on Mechatronics and Automation, pp. 1722-1727.
Yuan-Qing, Q., De-Bao, S., Ning, L., and Yi-Gang, C. (2004). Path planning for mobile robot using the particle swarm optimization with mutation operator. Proc. Int. Conf. on Machine Learning and Cybernetics, pp. 2473-2478.
Siegwart, R. and Nourbakhsh, I. R. (2004). Introduction to Autonomous Mobile Robots. Springer, Chapter 5.

• In PSO, according to the literature, each robot is responsible for several virtual agents/particles, which it evaluates at each iteration.
• The virtual agents represent a group of possible solutions.
• The number of virtual agents per robot is controlled by a concept called swarm size (at least 20 agents).
• The number of robots used in the literature varies between 20 and 300 (at least 400 evaluations in each iteration).
• In AEPSO and CAEPSO, each particle represents a robot itself. Due to the problem setup, 5 robots were considered during the experiments (35 evaluations overall).


• Ring (lbest)
• Star (gbest)
• Wheel
• Von Neumann
• Cluster
• Pyramid
[Figure: various neighborhood topologies (Kennedy and Mendes, 2002; Zavala, Aguirre and Diharce, 2005).]
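To illustrate how the Ring (lbest) and Star (gbest) topologies differ in practice, here is a small Python sketch of the neighborhood-best lookup each one implies; the one-neighbor-per-side ring width is an illustrative assumption.

def best_of(indices, pbest_f):
    # Index of the best personal best among a set of particles
    # (smaller fitness is better in this sketch).
    return min(indices, key=lambda i: pbest_f[i])

def gbest_neighbor(i, n, pbest_f):
    # Star (gbest): every particle sees the whole swarm.
    return best_of(range(n), pbest_f)

def lbest_neighbor(i, n, pbest_f, k=1):
    # Ring (lbest): particle i sees only its k neighbours on each side.
    ring = [(i + d) % n for d in range(-k, k + 1)]
    return best_of(ring, pbest_f)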


• PSO has been shown to perform as well as or better than GA in several instances.
• Eberhart and Kennedy found that PSO performs on par with GA on the Schaffer f6 function (R. Eberhart and J. Kennedy, 1995).

• In work by Kennedy and Spears, a version of PSO outperformed GA in a factorial time-series experiment (J. Kennedy and W. M. Spears, 1998).

• Fourie showed that PSO appears to outperform GA in optimizing several standard size and shape design problems (P. C. Fourie and A. A. Groenwold, 2002).

• In work by Pugh and Martinoli, a local neighborhood version of PSO outperformed GA in a multi-robot learning scenario with homogeneous and heterogeneous robots (J. Pugh and A. Martinoli, 2005, 2006, 2007).


• In basic PSO, controlling the parameters (c1, c2, w) is a major issue.
• These parameters play major roles in controlling the effectiveness of the social and cognitive components.
• Time-Varying Inertia Weight (TVIW) (Vesterstrom and Riget, 2002; Pasupuleti and Battiti, 2006; Ratnaweera et al., 2004) and Linear Decreasing Weight (LDW) (Pasupuleti and Battiti, 2006).
• Time-Varying Acceleration Coefficients (TVAC) (Vesterstrom and Riget, 2002; Pasupuleti and Battiti, 2006; Ratnaweera et al., 2004).
• Random or even constant/fixed values (RANDIW, FAC) (Ratnaweera et al., 2004).
• TVAC and TVIW achieved better performance on multi-modal functions, in contrast with RANDIW, which can only be effective on unimodal functions.
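A small Python sketch of the TVIW and TVAC schedules cited above; the start/end values shown (w from 0.9 to 0.4, c1 from 2.5 to 0.5, c2 from 0.5 to 2.5) are commonly reported settings, given here as assumptions rather than the thesis's exact parameters.

def tviw(t, t_max, w_start=0.9, w_end=0.4):
    # Time-Varying Inertia Weight: linearly decrease w over the run, so the
    # swarm explores early and exploits late.
    return w_start - (w_start - w_end) * t / t_max

def tvac(t, t_max, c1_start=2.5, c1_end=0.5, c2_start=0.5, c2_end=2.5):
    # Time-Varying Acceleration Coefficients: shift emphasis from the
    # cognitive term (c1) to the social term (c2) as the search matures.
    frac = t / t_max
    c1 = c1_start + (c1_end - c1_start) * frac
    c2 = c2_start + (c2_end - c2_start) * frac
    return c1, c2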


[Chart: runs vs. iterations (9,000–19,000) — series: S2 Training, S2 Testing, S3 Training, S3 Testing, S4.]

[Chart: detected bombs vs. iterations (0–20,000) — series: S1 Training, S1 Testing, S2 Training, S2 Testing, S4.]

[Chart: explosions vs. iterations (5,000–21,000) — series: S1 Training, S1 Testing, S2 Training, S2 Testing, S3 Training, S3 Testing, S4.]

[Charts: overall results in real-time environments; effect of the deserting policy in the real-time static environment (series: 500-C1, 500-C2); bomb explosion results in real-time environments (series: Dynamic, Dynamic Noisy, Static, Static Noisy; Dynamic-500, Dynamic-5, Static-500, Static-5). Axes: detected bombs, explosions, and runs vs. iterations.]