Learning Control at Intermediate Reynolds Numbers
Experiments with perching UAVs and flapping-winged flight
Russ Tedrake
Assistant Professor, MIT Computer Science and Artificial Intelligence Lab
IROS 2008 Learning Workshop, September 22, 2008
Russ Tedrake, MIT CSAIL Learning Control at Intermediate Reynolds Numbers
The Burden of Learning Control
Reinforcement learning has the potential to generate high-performance nonlinear, adaptive control policies for complicated systems...
but our techniques must be validated vs. more traditional control techniques.
There are many problems in robotics where traditional nonlinear control offers relatively few solutions
Underactuated systems, many stochastic systems, ...
Learning control can quickly become the state-of-the-art
Control in complicated fluid dynamics
Example: Landing on a perch
The State-of-the-Art in Perching
Approach 1: Morphing plane
Approach 2: Over-powered hover
Both sacrifice performance to use linear control on modified vehicles (can’t compete w/ birds!)
Can learning nonlinear control produce superior performance on existing vehicles?
Technical challenges for control
Dynamics quickly get too complicated for conventional control design
Fluid dynamics are time-varying and very nonlinear
CFD simulations in these regimes can take days to compute
Severe lack of compact (design accessible) models
Limited control authority
Flow is only partially observable
Stalls result in intermittent losses of control authority
“Underactuated control” - control actions have long-term consequences
Learning Control for Perching
Accurate Navier-Stokes simulations take days to compute, but...
Model-based Reinforcement Learning:
1. Learn approximate model of unsteady dynamics from real flight data
2. Formulate the goal of control as the long-term optimization of a scalar cost
3. Offline model-based numerical optimal control
4. Online model-free optimal control (“learning”)
Experiment Design
Glider (no propeller)
Dihedral (passive roll stability)
Offboard sensing and control
System Identification
Nonlinear rigid-body vehicle model
Linear (w/ delay) actuator model
Real flight data
Very high angle-of-attack regimes
Surprisingly good match to theory
Vortex shedding
[Figure: lift coefficient (Cl) and drag coefficient (Cd) versus angle of attack, comparing glider data with flat plate theory]
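The slide does not spell out which flat plate model the data is compared against. As a hedged illustration, the sketch below assumes one common form, Cl = 2 sin(α) cos(α) and Cd = 2 sin²(α); the function name and the sampled angles are hypothetical choices for this example, not taken from the talk.

```python
import numpy as np

def flat_plate_coefficients(alpha_deg):
    """Lift and drag coefficients under an ASSUMED flat plate model.

    Cl = 2 sin(a) cos(a), Cd = 2 sin(a)^2. The talk only says
    "flat plate theory"; this specific form is an assumption.
    """
    alpha = np.radians(alpha_deg)
    cl = 2.0 * np.sin(alpha) * np.cos(alpha)
    cd = 2.0 * np.sin(alpha) ** 2
    return cl, cd

# Sample the same angle-of-attack range as the plots (-20 to 160 degrees).
for alpha_deg in range(-20, 161, 30):
    cl, cd = flat_plate_coefficients(alpha_deg)
    print(f"alpha={alpha_deg:4d} deg  Cl={cl:+.3f}  Cd={cd:.3f}")
```

Under this assumed model, lift peaks near 45 degrees and drag peaks at 90 degrees, which is the post-stall regime a perching maneuver passes through.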
A dynamic model
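[Figure: planar glider diagram. Recoverable labels: position (x, z), mass m, inertia I, pitch θ, gravity g, forces F with subscripts w and e, lengths l with subscripts w and e, elevator angle −φ]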
Planar dynamics
Aerodynamics from model
State: $\mathbf{x} = [x, y, \theta, \phi, \dot{x}, \dot{y}, \dot{\theta}]$
Only actuator is the elevator angle, $u = \phi$
Feedback Control Design
Optimal feedback control formulation:
$$J^{\pi}(\mathbf{x}) = \min\left[(\mathbf{x} - \mathbf{x}_d)^T Q (\mathbf{x} - \mathbf{x}_d),\; J^{\pi}(\mathbf{x}')\right], \quad \mathbf{x}' = f(\mathbf{x}, \pi(\mathbf{x}))$$
x is the estimated state (from motion capture)
π is the feedback policy, commanding elevator angle as a function of state
f is the identified system model
Q is a positive definite cost matrix
Discretize dynamics on a mesh over state space
Optimized Dynamic Programming algorithm approximates the optimal policy, π∗(x), from model
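To make the recursion concrete, here is a minimal sketch of dynamic programming on a mesh with the same "closest approach to the goal" cost. It is not the glider model or the optimized algorithm from the talk: the grid, the goal cell, and the toy dynamics (the vehicle always advances one cell in x and can shift z by -1, 0, or +1) are hypothetical stand-ins chosen so the example is small and checkable.

```python
import numpy as np

NX, NZ = 12, 9            # mesh size (hypothetical toy grid)
GOAL = (8, 4)             # desired state x_d, as a grid index
ACTIONS = (-1, 0, 1)      # change in z per step (stand-in for the elevator)

# Index arrays for every mesh node.
I, K = np.meshgrid(np.arange(NX), np.arange(NZ), indexing="ij")

def stage_cost(Q=np.eye(2)):
    """(x - x_d)^T Q (x - x_d) evaluated at every mesh node."""
    err = np.stack([I - GOAL[0], K - GOAL[1]], axis=-1).astype(float)
    return np.einsum("...a,ab,...b->...", err, Q, err)

def successors(action):
    """Toy model x' = f(x, u): always move forward in x, shift z by u."""
    return np.minimum(I + 1, NX - 1), np.clip(K + action, 0, NZ - 1)

def solve(max_sweeps=100):
    """Iterate J(x) = min[g(x), min_u J(f(x, u))] until it stops changing."""
    g = stage_cost()
    J = g.copy()
    for _ in range(max_sweeps):
        J_next = np.stack([J[successors(a)] for a in ACTIONS])
        J_new = np.minimum(g, J_next.min(axis=0))
        if np.array_equal(J_new, J):
            break
        J = J_new
    # Greedy policy: index into ACTIONS with the best successor value.
    policy = np.stack([J[successors(a)] for a in ACTIONS]).argmin(axis=0)
    return J, policy

J, policy = solve()
print(J[0, 0], J[6, 0], J[9, 4])
```

Because the toy vehicle cannot move backward, states that start too close to the goal in x but too far in z end up with a nonzero cost, which mirrors the point that control actions have long-term consequences.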
Glider Perching
Enters motion capture @ 6 m/s.
Perch is < 3.5 m away.
Entire trajectory @ 1 second.
Requires Separation!
Flow visualization (very preliminary)
Transition to Outdoors: Gust Response
Current experiments are in “still air”
Outdoor flights require robustness to
ambient wind conditions
persistent flow structures (e.g., around the perch)
short-term “gust” disturbances
Capture statistical environment model
Instrument motion capture arena with known aerodynamic disturbances, obstacles, and wind
Already performed detailed LTV controllability analysis with different actuator configurations
The Intermediate Reynolds Number regime
Reynolds number (Re) - dimensionless quantity that correlates with the resulting kinematics of the fluid
Low Re - viscosity dominates, flow is laminar
High Re - turbulence
Intermediate Re - complicated, but structured flow (e.g., vortex shedding). Glider perching example is Re 50,000 down to Re 15,000.
At Intermediate Re:
Lots of interesting control problems
Almost no good control solutions
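For readers who want the numbers behind the regime, the Reynolds number is conventionally Re = V L / ν (speed times characteristic length over kinematic viscosity). The sketch below is illustrative only: the chord length is a hypothetical value picked so that the 6 m/s entry speed lands near Re 50,000, and the viscosity is a standard room-temperature figure for air, neither of which is stated in the talk.

```python
def reynolds_number(speed_m_s, length_m, kinematic_viscosity_m2_s=1.5e-5):
    """Re = V * L / nu. Default nu is roughly air at room temperature."""
    return speed_m_s * length_m / kinematic_viscosity_m2_s

CHORD_M = 0.125  # hypothetical wing chord, NOT a value from the talk

# Entry speed quoted on the perching slide.
print(reynolds_number(6.0, CHORD_M))
# A slower speed late in the maneuver (assumed value for illustration).
print(reynolds_number(1.8, CHORD_M))
```

With these assumed values the two calls give roughly 50,000 and 15,000, matching the range quoted for the glider perching example.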
Bird-scale flapping flight
Autonomous Flapping-Winged Flight
Motivation
Bird-scale flapping vehicles will not surpass the speed or efficiency of fixed-wing aircraft for steady-level flight in still air
Propellers produce thrust very efficiently
Aircraft airfoils can be highly optimized (for speed or efficiency)
But looking more closely...
Efficient flying machines
An albatross can fly for hours (or even days) without flapping,even migrating upwind (exploiting gradients in the shear layer)
Butterflies migrate thousands of kilometers, carried by the wind
Super-maneuverability
Peregrine falcons have been clocked at 240+ mph in dives, and have the agility to snatch moving prey
Bats have been documented...
Catching prey on their wings
Maneuvering through thick rain forests at high speeds
Making high speed 180 degree turns...
Manipulating the Air
Birds far surpass the performance of our best engineered systems (especially UAVs) in metrics of efficiency, acceleration, and maneuverability.
The secret:
Birds (and fish, ...) exploit unsteady aerodynamics at intermediate Re
A manipulation problem
Requires unconventional mechanical and control designs
Once you start thinking of bird flight as manipulating the air, it becomes harder to appreciate fixed-wing flight
Example: Efficient swimming upstream
from George Lauder’s Lab at Harvard (Liao et al., 2003)
Prospects for machine learning
Key observations about fluid-body interactions at intermediate Re
Considerable previous work in system identification permits the use of approximate models
Won’t always be able to discretize the state space
Relatively compact policies (few parameters) can generate a large repertoire of behaviors
Formal analysis of the policy gradient algorithms reveals:
Performance (via SNR) degrades with the number of control parameters
Performance is (locally) invariant to the complexity of the plant dynamics
The Heaving Foil
work with Jun Zhang (NYU Courant)
The Heaving Foil
[Vandenberghe et al., 2006]
Rigid, symmetric wing
Driven vertically
Free to rotate horizontally
Symmetry breaking leads to forward flight
Flow visualization
Effect of flapping frequency
The control problem
Previous work only used sinusoidal trajectories
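Optimize stroke form to maximize the “efficiency” of forward flight
Add vertical load cell (measures $F_z(t)$)
Dimensionless cost of transport:
$$c_{mt} = \frac{\int_T |F_z(t)\,\dot{z}(t)|\,dt}{mg \int_T \dot{x}(t)\,dt}$$
Fortunately, because $mg$ is a constant, the minimizer does not depend on it:
$$\arg\min c_{mt} = \arg\min \frac{\int_T |F_z(t)\,\dot{z}(t)|\,dt}{\int_T \dot{x}(t)\,dt}$$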
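As a rough illustration of how this cost could be evaluated from sampled measurements, the sketch below integrates the load cell power and the forward velocity with the trapezoid rule. The function names, the mass, and the synthetic signals are all hypothetical; the talk does not describe the actual data pipeline.

```python
import numpy as np

def _integrate(y, t):
    """Trapezoid-rule integral of samples y over sample times t."""
    return float(np.sum(0.5 * (y[1:] + y[:-1]) * np.diff(t)))

def cost_of_transport(t, f_z, z_dot, x_dot, mass, g=9.81):
    """c_mt = int |F_z * z_dot| dt / (m g int x_dot dt) over the samples."""
    t, f_z, z_dot, x_dot = (np.asarray(a, dtype=float)
                            for a in (t, f_z, z_dot, x_dot))
    energy = _integrate(np.abs(f_z * z_dot), t)   # work done by the drive
    distance = _integrate(x_dot, t)               # forward distance covered
    return energy / (mass * g * distance)

# Synthetic one-period example (all values are made up for illustration).
t = np.linspace(0.0, 1.0, 1001)
z_dot = 0.1 * np.cos(2.0 * np.pi * t)        # vertical stroke velocity
f_z = 0.5 * z_dot                            # force opposing the stroke
x_dot = np.full_like(t, 0.2)                 # steady forward speed
print(cost_of_transport(t, f_z, z_dot, x_dot, mass=0.05))
```

Dropping the `mass * g` factor rescales every candidate stroke's cost by the same constant, so the ranking of strokes, and hence the optimum, is unchanged.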
Prospects for optimization
CFD model [Alben and Shelley, 2005]
Takes approximately 36 hours to simulate 30 flaps
Experimental Optimization
Can we perform the optimization directly in the fluid?
Direct policy search
Needs to be robust to noisy evaluations
Minimize number of trials required
Optimized policy gradient
The basic algorithm (weight perturbation):
Perturb the control parameters, p, by some amount z drawn from N(0, σ)
Perform the update:
$$\Delta p = -\eta\,\big(c_{mt}(p + z) - c_{mt}(p)\big)\,z$$
Strong performance guarantees
$$E[\Delta p] \propto -\frac{\partial c_{mt}}{\partial p}$$
Poor performance (requires many trials)
SNR optimized policy gradient [Roberts and Tedrake, 2009]
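The basic weight perturbation update above can be sketched in a few lines. This is the plain algorithm only, not the SNR-optimized variant, and the quadratic cost is a hypothetical stand-in for a measured cost of transport; the learning rate, noise scale, and trial count are likewise assumed values.

```python
import numpy as np

def weight_perturbation(cost, p0, eta, sigma, n_trials, rng):
    """Repeat: sample z ~ N(0, sigma), apply dp = -eta*(c(p+z) - c(p))*z."""
    p = np.asarray(p0, dtype=float).copy()
    for _ in range(n_trials):
        z = rng.normal(0.0, sigma, size=p.shape)   # random perturbation
        p += -eta * (cost(p + z) - cost(p)) * z    # update from two evaluations
    return p

# Hypothetical stand-in for the experiment: a quadratic bowl.
TARGET = np.array([1.0, -0.5])

def toy_cost(p):
    return float(np.sum((p - TARGET) ** 2))

rng = np.random.default_rng(0)
p_final = weight_perturbation(toy_cost, np.zeros(2), eta=5.0, sigma=0.1,
                              n_trials=500, rng=rng)
print(p_final, toy_cost(p_final))
```

Each trial needs only two cost evaluations and no model of the plant, which is why the update can run directly on the physical experiment; the price is a noisy gradient estimate and therefore many trials.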
Learning results
[Figure: reward versus trial (0 to 50) for two initial conditions: sine wave and smoothed square wave]
learns in about 10,000 flaps (@ 15 minutes)
Learning results (cont.)
[Figure: executed waveforms, z (mm) versus t/T, showing the initial, intermediate, and final waveforms]
A dynamic explanation
Forward speed is linear in flapping frequency
from experiments
statement about average speed
Drag forces quadratic in speed ($F \propto \rho S V^2$)
Triangle wave obtains highest average speed w/ minimal drag
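One way to read this argument, offered here as an interpretation rather than the speaker's derivation: for a fixed amplitude and period, every stroke that moves monotonically between its extremes has the same average stroke speed, but a quantity that grows with speed squared is smallest when the speed is constant, which is the triangle wave. The sketch below compares a sine and a triangle stroke numerically; the function name and unit amplitude and period are assumptions for illustration.

```python
import numpy as np

def stroke_speed_stats(kind, amplitude=1.0, period=1.0, n=100_000):
    """Return (mean |z_dot|, mean z_dot^2) over one period of the stroke."""
    t = (np.arange(n) + 0.5) * period / n          # midpoint samples
    if kind == "sine":
        omega = 2.0 * np.pi / period
        z_dot = amplitude * omega * np.cos(omega * t)
    elif kind == "triangle":
        # Constant speed 4A/T, changing sign every half period.
        phase = (t / period + 0.25) % 1.0
        z_dot = np.where(phase < 0.5, 1.0, -1.0) * 4.0 * amplitude / period
    else:
        raise ValueError(f"unknown stroke kind: {kind}")
    return float(np.mean(np.abs(z_dot))), float(np.mean(z_dot ** 2))

sine_mean, sine_sq = stroke_speed_stats("sine")
tri_mean, tri_sq = stroke_speed_stats("triangle")
print("mean speed  sine vs triangle:", sine_mean, tri_mean)
print("mean speed^2 ratio (sine / triangle):", sine_sq / tri_sq)
```

Under these assumptions both strokes share a mean speed of 4A/T, while the sine's mean squared speed is larger by a factor of π²/8, about 23 percent.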
Implications
Enabling tool for experimental fluid dynamicists
Suggests that motor learning algorithms could produce efficient control solutions in fluids
Suggests that we can use this to control robotic birds
Exciting prospects for online learning in changing environments
Summary
Nonlinear, underactuated control w/ imperfect models via machine learning control (birds don’t solve Navier-Stokes)
Allows our machines to exploit unsteady flow effects
Soon, robotic birds will:
Fly efficiently and autonomously
Outperform fixed-wing vehicles in maneuverability
Acknowledgements
Students:
John Roberts
Rick Cory
Zack Jackowski
Steve Proulx
Funding:
MIT, MIT CSAIL
NEC corporation
Lincoln Labs ACC
Microsoft
References
Alben, S. and Shelley, M. (2005). Coherent locomotion as an attracting state for a free flapping body. Proceedings of the National Academy of Sciences, 102(32):11163–11166.
Roberts, J. W. and Tedrake, R. (2009). Signal-to-noise ratio analysis of policy gradient algorithms. To appear in Advances in Neural Information Processing Systems (NIPS) 21, page 8.
Vandenberghe, N., Childress, S., and Zhang, J. (2006). On unidirectional flight of a free flapping wing. Physics of Fluids, 18.