
RECENT DEVELOPMENTS IN MULTILAYER PERCEPTRON NEURAL NETWORKS

Walter H. Delashmit

Lockheed Martin Missiles and Fire Control

Dallas, TX 75265

walter.delashmit@lmco.com

walter.delashmit@verizon.net

Michael T. Manry

The University of Texas at Arlington

Arlington, TX 76010

manry@uta.edu

Memphis Area Engineering and Science Conference 2005

May 11, 2005

Outline of Presentation

• Review of Multilayer Perceptron Neural Networks

• Network Initial Types and Training Problems

• Common Starting Point Initialized Networks

• Dependently Initialized Networks

• Separating Mean Processing

• Summary

Review of Multilayer Perceptron Neural Networks

Typical 3 Layer MLP

[Figure: Input layer x_p(1) … x_p(N); hidden layer with net functions net_p(1) … net_p(Nh) and outputs O_p(1) … O_p(Nh); output layer y_p(1) … y_p(M); input-to-hidden weights w_hi(1,1) … w_hi(Nh,N) and hidden-to-output weights w_oh(1,1) … w_oh(M,Nh)]

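MLP Performance Equations

Mean Square Error (MSE):

$$E = \frac{1}{N_v}\sum_{p=1}^{N_v}\sum_{i=1}^{M}\left[t_p(i) - y_p(i)\right]^2$$

Output:

$$y_p(i) = \sum_{k=1}^{N+1} w_{oi}(i,k)\,x_p(k) + \sum_{j=1}^{N_h} w_{oh}(i,j)\,O_p(j)$$

Net Function:

$$net_p(j) = \sum_{k=1}^{N+1} w_{hi}(j,k)\,x_p(k)$$

$$O_p(j) = f(net_p(j)) = \frac{1}{1 + e^{-net_p(j)}}$$

where N_v is the number of training patterns and x_p(N+1) = 1, so that the thresholds are handled as weights.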
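The performance equations above can be sketched in Python with NumPy as a forward pass plus the MSE. This is a minimal illustration, not the authors' code. It assumes patterns are stored row-wise, that the augmented input x_p(N+1) = 1 carries the thresholds, and that the function names `mlp_forward` and `mse` are hypothetical.

```python
import numpy as np

def mlp_forward(x, w_hi, w_oh, w_oi):
    """Forward pass of the 3-layer MLP.

    x    : (Nv, N) input patterns x_p(k), k = 1..N
    w_hi : (Nh, N+1) input-to-hidden weights; last column is the threshold
    w_oh : (M, Nh) hidden-to-output weights
    w_oi : (M, N+1) input-to-output weights; last column is the threshold
    """
    # Append x_p(N+1) = 1 so thresholds are ordinary weights.
    xa = np.hstack([x, np.ones((x.shape[0], 1))])
    net = xa @ w_hi.T                 # net_p(j) = sum_k w_hi(j,k) x_p(k)
    o = 1.0 / (1.0 + np.exp(-net))    # O_p(j) = f(net_p(j)), sigmoid activation
    y = xa @ w_oi.T + o @ w_oh.T      # y_p(i) = sum_k w_oi(i,k) x_p(k) + sum_j w_oh(i,j) O_p(j)
    return y, o, net

def mse(t, y):
    # E = (1/Nv) sum_p sum_i [t_p(i) - y_p(i)]^2
    return np.sum((t - y) ** 2) / t.shape[0]
```

Storing the threshold as column N+1 lets a single matrix product compute every net function, which matches the summation limits k = 1 … N+1 in the equations.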

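Net Control

Scales and shifts all net functions so that they do not generate small gradients and so that large inputs do not mask the potential effects of small inputs:

$$w_{hi}(j,i) \leftarrow \frac{\sigma_{hd}}{\sigma_h(j)}\,w_{hi}(j,i), \quad 1 \le i \le N+1$$

$$w_{hi}(j,N+1) \leftarrow w_{hi}(j,N+1) + m_{hd} - \frac{\sigma_{hd}}{\sigma_h(j)}\,m_h(j)$$

where m_h(j) and σ_h(j) are the mean and standard deviation of net_p(j) over the training patterns, and m_hd and σ_hd are their desired values.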
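A sketch of the net control step, assuming NumPy, the augmented input x_p(N+1) = 1, and hypothetical default targets m_hd = 0 and σ_hd = 1 (the values used in this research are not stated here). All N+1 weights of each hidden unit are scaled first, and then the threshold is shifted so that the mean lands on m_hd.

```python
import numpy as np

def net_control(x, w_hi, m_hd=0.0, s_hd=1.0):
    """Rescale and shift hidden weights so that every net function net_p(j)
    has mean m_hd and standard deviation s_hd over the training patterns.
    The defaults m_hd = 0 and s_hd = 1 are placeholder assumptions."""
    xa = np.hstack([x, np.ones((x.shape[0], 1))])   # x_p(N+1) = 1
    net = xa @ w_hi.T
    m_h = net.mean(axis=0)              # m_h(j)
    s_h = net.std(axis=0)               # sigma_h(j); assumed nonzero
    scale = s_hd / s_h
    w = w_hi * scale[:, None]           # w_hi(j,i) <- (s_hd / s_h(j)) w_hi(j,i), 1 <= i <= N+1
    w[:, -1] += m_hd - scale * m_h      # shift the threshold so the mean becomes m_hd
    return w
```

Because the threshold multiplies x_p(N+1) = 1, adding a constant to it shifts net_p(j) for every pattern without changing its spread.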

Neural Network Training Algorithms

• Backpropagation Training

• Output Weight Optimization – Hidden Weight Optimization (OWO-HWO)

• Full Conjugate Gradient

Output Weight Optimization – Hidden Weight Optimization (OWO-HWO)

• Used in this development

• Linear equations used to solve for output weights in OWO

• In HWO, a separate error function is used for each hidden unit, and multiple sets of linear equations are solved to determine the weights feeding the hidden units
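Once the hidden weights are fixed, the outputs are linear in w_oi and w_oh, so the OWO step reduces to a linear least-squares problem. The sketch below solves it with `numpy.linalg.lstsq`, a solver swapped in for brevity; it is not necessarily how the OWO-HWO implementation used here solves its linear equations. The function name `owo` is hypothetical, and HWO is not shown.

```python
import numpy as np

def owo(x, o, t):
    """Output weight optimization sketch.

    x : (Nv, N) inputs, o : (Nv, Nh) hidden unit outputs O_p(j),
    t : (Nv, M) desired outputs t_p(i).
    Returns w_oi of shape (M, N+1) and w_oh of shape (M, Nh)."""
    nv, n = x.shape
    xa = np.hstack([x, np.ones((nv, 1))])    # inputs plus x_p(N+1) = 1
    basis = np.hstack([xa, o])               # basis functions seen by the output layer
    # Minimizing the MSE over the output weights is a least-squares fit of t.
    w, *_ = np.linalg.lstsq(basis, t, rcond=None)
    w = w.T                                  # shape (M, N+1+Nh)
    return w[:, :n + 1], w[:, n + 1:]
```

Solving all outputs at once works because each output i has its own independent set of weights sharing the same basis functions.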

Network Initial Types and Training Problems

Problem Definition

• Assume that a set of MLPs of different sizes are to be designed for a given training data set

• Let S_Nh be the set of all MLPs for that training data having Nh hidden units, and let Eint(Nh) denote the corresponding training error of an initial network that belongs to S_Nh

• Let Ef(Nh) denote the corresponding training error of a well-trained network

• Let Nhmax denote the maximum number of hidden units for which networks are to be designed

• Goal: Choose a set of initial networks from {S0, S1, S2, …} such that

$$E_{int}(0) \ge E_{int}(1) \ge E_{int}(2) \ge \cdots \ge E_{int}(N_{hmax})$$

and train the networks to minimize Ef(Nh) such that

$$E_f(0) \ge E_f(1) \ge E_f(2) \ge \cdots \ge E_f(N_{hmax})$$

• Axiom 3.1: If Ef(Nh) ≥ Ef(Nh-1), then the network having Nh hidden units is useless, since training produced a larger, more complex network with a larger or equal training error.
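Axiom 3.1 and the monotonicity goal can be checked mechanically once the final training errors are known. This is a small sketch with hypothetical helper names; it takes Ef(Nh) as a dictionary keyed by Nh.

```python
def useless_sizes(ef):
    """Hidden-unit counts Nh that are useless under Axiom 3.1,
    i.e. Ef(Nh) >= Ef(Nh-1). ef maps Nh -> Ef(Nh)."""
    return [nh for nh in sorted(ef) if nh - 1 in ef and ef[nh] >= ef[nh - 1]]

def is_monotonic_non_increasing(ef):
    # True when Ef never increases as Nh grows (the design goal).
    sizes = sorted(ef)
    return all(ef[b] <= ef[a] for a, b in zip(sizes, sizes[1:]))
```

Note that a curve with equal consecutive errors still satisfies the goal but flags the larger network as useless, which is exactly the distinction the axiom draws.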

Network Design Methodologies

• Design Methodology One (DM-1) – A well-organized researcher may design a set of networks of different sizes in an orderly fashion, each with one or more hidden units more than the previous network

o Thorough design approach

o May take longer to design

o Allows a trade-off between network performance and size to be achieved

• Design Methodology Two (DM-2) – A researcher may design networks of different sizes in no particular order

o May be pursued quickly when only a few networks are needed

o The design could possibly be improved significantly with a bit more attention to network design

Three Types of Networks Defined

• Randomly Initialized (RI) Networks – No members of this set of networks have any initial weights and thresholds in common. Practically this means that the initial random number seeds (IRNS) are widely separated. Useful when the goal is to quickly design one or more networks of the same or different sizes whose weights are statistically independent of each other. Can be designed using DM-1 or DM-2

• Common Starting Points Initialized (CSPI) Networks – When a set of networks is CSPI, each one starts with the same IRNS. These networks are useful when it is desired to compare the performance of networks that start from the same IRNS. Can be designed using DM-1 or DM-2

• Dependently Initialized (DI) Networks – A series of networks is designed with each subsequent network having one or more hidden units more than the previous network. In each larger network, the weights and thresholds it shares with the smaller network are initialized with the final weights and thresholds obtained by training the smaller network. DI networks are useful when the goal is a thorough analysis of network performance versus size, and they are most naturally designed using DM-1.

Network Properties

• Theorem 3.1: If two initial RI networks (1) are the same size, (2) have the same training data set and (3) the training data set has more than one unique input vector, then the hidden unit basis functions are different for the two networks.

• Theorem 3.2: If two CSPI networks (1) are the same size and (2) use the same algorithm for processing random numbers into weights, then they are identical.

• Corollary 3.2: If two initial CSPI networks are the same size and use the same algorithm for processing random numbers into weights, then they have all common basis functions.
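Theorem 3.2 rests on the weights being a deterministic function of the IRNS and the weight-generation algorithm. The sketch below illustrates this with NumPy's `default_rng`; the uniform range [-1, 1], the draw order, and the name `init_network` are assumptions for illustration only.

```python
import numpy as np

def init_network(nh, n, m, seed):
    """Draw initial weights for an MLP with nh hidden units from an
    initial random number seed (IRNS). The draw order is fixed, so the
    same seed and sizes always produce the same weights."""
    rng = np.random.default_rng(seed)
    w_hi = rng.uniform(-1.0, 1.0, size=(nh, n + 1))   # input-to-hidden, incl. thresholds
    w_oh = rng.uniform(-1.0, 1.0, size=(m, nh))       # hidden-to-output
    w_oi = rng.uniform(-1.0, 1.0, size=(m, n + 1))    # input-to-output, incl. thresholds
    return w_hi, w_oh, w_oi

# CSPI: same IRNS and same algorithm -> identical networks (Theorem 3.2).
cspi_a = init_network(4, 3, 2, seed=7)
cspi_b = init_network(4, 3, 2, seed=7)
# RI: widely separated IRNS -> different initial weights.
ri = init_network(4, 3, 2, seed=12345)
```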

Problems with MLP Training

• Non-monotonic Ef(Nh)

• No standard way to initialize and train additional hidden units

• Net control parameters are arbitrary

• No procedure to initialize and train DI networks

• Network linear and nonlinear component interference

Mapping Error Examples

[Figure: Mean square error versus number of hidden units (3 to 12) in three panels: Single seed, Median error, and Minimum error (over seed number)]

Tasks Performed in this Research

• Analysis of RI networks

• Improved initialization in CSPI networks

• Improved initialization of new hidden units in DI networks

• Analysis of separating mean training approaches

CSPI and CSPI-SWI Networks

• Improvement to RI networks: each CSPI network starts with the same IRNS

• Extended to CSPI-SWI (Structured Weight Initialization) networks

o Every hidden unit of the larger network has the same initial weights and threshold values as the corresponding unit of the smaller network

o Input-to-output weights and thresholds are also identical

• Theorem 5.1: If two CSPI networks are designed with structured weight initialization, the hidden unit basis functions in their common subset are identical.

• Corollary 5.1: If two CSPI networks are designed using structured weight initialization, the only initial basis functions that are not the same are the hidden unit basis functions for the additional hidden units in the larger network.

• A detailed flow chart for CSPI-SWI initialization is given in the dissertation
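One way to obtain the CSPI-SWI property, shown here as a hypothetical scheme rather than the flow chart from the dissertation, is to draw each hidden unit's weights from a random stream keyed by (IRNS, unit index) and the input-to-output weights from a separate stream. Unit j then receives the same weights in every network size that contains it. The keying scheme, the uniform range, and the name `swi_init` are assumptions.

```python
import numpy as np

def swi_init(nh, n, m, seed):
    """Hypothetical structured weight initialization for CSPI networks."""
    # Input-to-output weights use their own stream, so they do not depend on nh.
    w_oi = np.random.default_rng([seed, 0, 0]).uniform(-1.0, 1.0, size=(m, n + 1))
    w_hi = np.empty((nh, n + 1))
    w_oh = np.empty((m, nh))
    for j in range(nh):
        # Hidden unit j draws from a stream keyed by (IRNS, j), so it is the
        # same in every network size that contains unit j.
        rng = np.random.default_rng([seed, 1, j])
        w_hi[j] = rng.uniform(-1.0, 1.0, size=n + 1)
        w_oh[:, j] = rng.uniform(-1.0, 1.0, size=m)
    return w_hi, w_oh, w_oi

small = swi_init(4, 3, 2, seed=11)
large = swi_init(6, 3, 2, seed=11)   # shares units 0..3 with `small`
```

Only the two extra units of `large` get new weights, which mirrors Corollary 5.1.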

CSPI-SWI Examples

[Figure: CSPI-SWI error versus Nh for the fm and twod data sets (Nh from 3 to 12 and from 3 to 15)]

DI Network Development and Evaluation

• Improvement over RI, CSPI and CSPI-SWI networks

• The values of the common subset of the initial weights and thresholds for the larger network are initialized with the final weights and thresholds from a previously well-trained smaller network

• Designed with DM-1

• Single network designs are also implementable

• After training, testing is feasible on a different data set

Basic DI Network Flowgraph

(1) Create an initial network with Nh hidden units

(2) Train this initial network

(3) Nh ← Nh + p

(4) If Nh > Nhmax, stop

(5) Initialize the new hidden units, Nh-p+1 ≤ j ≤ Nh:

o woh(k,j) ← 0, 1 ≤ k ≤ M

o whi(j,i) ← RN(ind+), 1 ≤ i ≤ N+1

o Net control for whi(j,i), 1 ≤ i ≤ N+1

(6) Train the new network and return to step (3)
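The key step in the flowgraph is that new hidden units start with zero output weights, so the enlarged network initially computes exactly the same outputs as the trained smaller one; this is why Eint(Nh) = Ef(Nh-p). The sketch below checks that property. The uniform draw stands in for RN(ind+), the helper names are hypothetical, and net control is omitted because it only changes the new units' input weights, which cannot affect the outputs while their output weights are zero.

```python
import numpy as np

def forward(x, w_hi, w_oh, w_oi):
    xa = np.hstack([x, np.ones((x.shape[0], 1))])     # x_p(N+1) = 1
    o = 1.0 / (1.0 + np.exp(-(xa @ w_hi.T)))          # hidden outputs O_p(j)
    return xa @ w_oi.T + o @ w_oh.T                   # network outputs y_p(i)

def grow_di(w_hi, w_oh, w_oi, p, rng):
    """Add p hidden units to a trained network: random input weights,
    zero output weights; the trained weights are kept unchanged."""
    m, n1 = w_oh.shape[0], w_hi.shape[1]              # M and N+1
    new_hi = rng.uniform(-1.0, 1.0, size=(p, n1))     # stand-in for whi(j,i) <- RN(ind+)
    new_oh = np.zeros((m, p))                         # woh(k,j) <- 0, 1 <= k <= M
    return np.vstack([w_hi, new_hi]), np.hstack([w_oh, new_oh]), w_oi.copy()
```

Training then starts from the smaller network's error and can only lower it if the algorithm never accepts an increase, giving the non-increasing Ef(Nh) curve.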

Properties of DI Networks

• Eint(Nh) < Eint(Nh-p)

• Ef(Nh) curve is monotonic non-increasing (i.e., Ef(Nh) ≤ Ef(Nh-p))

• Eint(Nh) = Ef(Nh-p)

Performance Results for DI Networks with Fixed Iterations

[Figure: Training and testing error versus Nh (3 to 12) for DI networks trained with a fixed number of iterations, on the fm, twod, F24 and F17 data sets]

RI Network and DI Network Comparison

(1) DI network: standard DI network design for Nh hidden units

(2) RI type 1: RI networks were designed using a single network for each value of Nh, and each network of size Nh was trained with the same value of Niter used to train the DI network of that size.

(3) RI type 2: RI networks were designed using a single network for each value of Nh, and every network was trained with the total number of iterations used for the entire sequence of DI networks. This can be expressed by

$$N_{iter}(\text{RI type 2}) = \sum_{j} N_{iter}(j)$$

where N_iter(j) is the number of iterations used to train the j-th network in the DI sequence. As a result, an RI type 2 network is actually trained with a larger value of Niter than the corresponding DI network.

RI Network and DI Network Comparison Results

[Figure: Error versus Nh (5 to 12) for the DI network, RI type 1 and RI type 2 on the fm and twod data sets]

Separating Mean Processing Techniques

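• Bottom-Up Separating Mean

• Top-Down Separating Mean

Bottom-Up Separating Mean

(1) Generate linear mapping results t̂_p

(2) Generate new desired output vector t_p - t̂_p

(3) Train MLP using new data (x_p, t_p - t̂_p)

$$E = \frac{1}{N_v}\sum_{p=1}^{N_v}\sum_{i=1}^{M}\left[t_p(i) - \hat{t}_p(i) - y_p(i)\right]^2$$

where t̂_p(i) is the output of the linear mapping and y_p(i) is the MLP output.

Basic Idea:

• A linear mapping is removed from the training data.

• The nonlinear fit to the resulting data may perform better.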
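The bottom-up idea (fit a linear mapping, subtract it, and train the MLP on what remains) can be sketched as follows. The least-squares fit with `numpy.linalg.lstsq` and the helper names are assumptions for illustration; the MLP training itself is not shown, and its output is simply passed in as `mlp_output`.

```python
import numpy as np

def bottom_up_targets(x, t):
    """Fit the linear mapping t_hat_p = A^T [x_p; 1] by least squares and
    return A together with the new desired outputs t_p - t_hat_p."""
    xa = np.hstack([x, np.ones((x.shape[0], 1))])
    a, *_ = np.linalg.lstsq(xa, t, rcond=None)      # shape (N+1, M)
    return a, t - xa @ a

def bottom_up_predict(x, a, mlp_output):
    # Final output = linear mapping + MLP trained on the residual targets.
    xa = np.hstack([x, np.ones((x.shape[0], 1))])
    return xa @ a + mlp_output
```

Because the residual targets are orthogonal to the inputs and the constant, the MLP is left to model only the part of the mapping that no linear function can capture.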

Bottom-up Separating Mean Results

[Figure: Error versus Nh (3 to 12), baseline versus separating mean, for the fm, power12 and single2 data sets]

Top-Down Separating Mean

(1) Determine input and output subsets with similar means

(2) Remove means from corresponding input and output subsets

(3) Train MLP using modified inputs and outputs

Basic Idea:

• If we know which subsets of inputs and outputs have the same means in Signal Models 2 and 3, we can estimate and remove these means.

• The resulting network performance is more robust.
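A heavily hedged sketch of the top-down mean removal step. Signal Models 2 and 3 are not defined in this presentation, so the code assumes, hypothetically, that each pattern p carries one common mean shared by a chosen input subset and output subset. It estimates that mean from the inputs, removes it from both subsets before training, and adds it back to the network outputs afterwards. The subset selection technique mentioned in the conclusions is not reproduced; the subsets are passed in by hand, and all names are hypothetical.

```python
import numpy as np

def top_down_remove_means(x, t, in_subset, out_subset):
    """Assumed model: one per-pattern mean shared by the input columns in
    in_subset and the output columns in out_subset."""
    m_p = x[:, in_subset].mean(axis=1, keepdims=True)   # per-pattern mean estimate
    x_mod, t_mod = x.copy(), t.copy()
    x_mod[:, in_subset] -= m_p                          # remove mean from input subset
    t_mod[:, out_subset] -= m_p                         # remove same mean from output subset
    return x_mod, t_mod, m_p

def top_down_restore(y_mod, m_p, out_subset):
    # Add the estimated mean back to the network outputs in the output subset.
    y = y_mod.copy()
    y[:, out_subset] += m_p
    return y
```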

Separating Mean Results

[Figure: Error versus Nh (3 to 12) on the power12 data set for bottom-up separating mean, top-down separating mean and baseline]

Conclusions

• On average, CSPI-SWI networks have MSE versus Nh curves that are closer to monotonic non-increasing than those of RI networks

• MSE versus Nh curves are always monotonic non-increasing for DI networks

• DI network training was improved by calculating the number of training iterations and limiting the amount of training used for previously trained units

• DI networks always produce more consistent MSE versus Nh curves than RI, CSPI and CSPI-SWI networks

• Separating mean processing using both bottom-up and top-down architectures often produces improved performance

• A new technique was developed to determine which inputs and outputs have similar means for use in top-down separating mean processing
