[IEEE 2011 Fourth International Workshop on Chaos-Fractals Theories and Applications (IWCFTA) -...


Page 1: [IEEE 2011 Fourth International Workshop on Chaos-Fractals Theories and Applications (IWCFTA) - Hangzhou, China (2011.10.19-2011.10.22)] 2011 Fourth International Workshop on Chaos-Fractals

Selection of reconstruction variables and parameters in the phase space reconstruction on genetic algorithm

Hui Tao, Xiao-Ping Ma School of Information and Electrical Engineering

China University of Mining and Technology, Xuzhou 221116, China

e-mail:[email protected], [email protected]

Mei-Ying Qiao School of Electrical Engineering and Automation

Henan Polytechnic University Jiaozuo 454000, China

e-mail:[email protected]

Abstract-To address the interdependence between reconstruction variables and their parameters, this paper adopts a genetic algorithm to determine the reconstruction variables and parameters simultaneously in the phase-space reconstruction of multivariate time series. First, the process and method of phase-space reconstruction of multivariate time series are introduced. Then the theory of using a genetic algorithm to select reconstruction variables and parameters is given: the chromosome is coded as a cascaded binary string of multiple parameters, the fitness function is the average prediction error, and the optimal combination of reconstruction variables and parameters is obtained through genetic operations. Finally, the algorithm is applied to two examples in the Matlab 2009b environment. Simulation results show that the algorithm overcomes the defect of determining reconstruction variables and parameters in isolation and lays a good foundation for chaotic time series analysis and prediction.

Keywords-Genetic algorithm; Reconstruction variables; Embedding dimension; Delay time; Phase space reconstruction; Multivariate

I. INTRODUCTION

The analysis and prediction of chaotic time series are all carried out in the reconstructed phase space, so phase-space reconstruction is the key step of chaotic time series analysis. In theory, a univariate time series is sufficient to reconstruct the original dynamic system, provided that the embedding dimension and delay time are chosen correctly. In practice, however, there are many exceptions. For example, in the Lorenz system the X component reconstructs the original system well, but the Z component does not [1]. So the view that any univariate time series can reconstruct its original dynamic system well is not always correct. In addition, since the measurable time series are often multidimensional in many systems, multivariate series can be used to reconstruct the original dynamic system; the theory and application of phase-space reconstruction of multivariate time series were discussed, and its advantages demonstrated, in [2, 3].

In the phase-space reconstruction of multivariate time series there are two key issues: the selection of reconstruction variables and the determination of reconstruction parameters, namely the embedding dimensions and time delays. For a complex system, multivariate time series are obtained, but only one of any two interdependent time series should be taken into account; otherwise information redundancy and computational complexity increase. The mutual information method is commonly used to measure statistical dependence and thus to select reconstruction variables; however, it requires the reconstruction parameters to be known [4]. In multivariate phase-space reconstruction, the reconstruction parameters are usually selected by first determining the time delays with the methods used for univariate reconstruction and then determining the embedding dimensions by the false nearest neighbors method or by minimizing the prediction error [5]. In addition, Kugiumtzis pointed out that the embedding dimension and time delay are strongly correlated and that their mutual influence should be considered [6]; a GA-based method for determining the reconstruction parameters of univariate time series was then proposed in [7], and a method for determining embedding dimensions and time delays simultaneously was offered in [8]. However, these methods all assume that the reconstruction variables are known. The reconstruction variables and their parameters are therefore interdependent: they cannot be determined in isolation and must be determined simultaneously, taking their interactions into account.

As an adaptive, global and intelligent optimization algorithm, the genetic algorithm (GA) can solve many kinds of complex optimization problems and is characterized by wide adaptability and strong robustness [9]. In this paper, we use GA to select the optimal combination of reconstruction variables and parameters in the phase-space reconstruction of multivariate time series. First, the process and method of phase-space reconstruction of multivariate time series are introduced. Then the theory of using GA to select reconstruction variables and parameters is given: the chromosome is coded as a cascaded binary string of multiple parameters, the fitness function is the average prediction error, and the optimal combination of reconstruction variables and parameters is obtained through genetic operations. Finally, the algorithm is applied to two examples in the Matlab 2009b environment.

II. PHASE SPACE RECONSTRUCTION OF MULTIVARIATE TIME SERIES

A. Reconstruction Process

The process of traditional phase space reconstruction of multivariate time series [4] is shown in Figure 1.

2011 Fourth International Workshop on Chaos-Fractals Theories and Applications

978-0-7695-4560-8/11 $26.00 © 2011 IEEE

DOI 10.1109/IWCFTA.2011.88


Figure 1. The process of traditional phase space reconstruction of multivariate time series

B. Phase Space Reconstruction of Multivariate Time Series

Consider multivariate time series of $I$ independent variables $\{x_{i,n}\}_{n=1}^{N}$ $(i = 1, 2, \ldots, I)$, which are the measured values of the $i$th continuous variable $x_i(t)$:

$$\{x_{i,n}\}_{n=1}^{N} = (x_{1,1}, x_{1,2}, \ldots, x_{1,N};\ x_{2,1}, x_{2,2}, \ldots, x_{2,N};\ \ldots;\ x_{I,1}, x_{I,2}, \ldots, x_{I,N}) \tag{1}$$

Based on the multivariate time series reconstruction theory in [10], the phase-space reconstruction is as follows:

$$X_n = (x_{1,n}, x_{1,n-\tau_1}, \ldots, x_{1,n-(m_1-1)\tau_1},\ x_{2,n}, x_{2,n-\tau_2}, \ldots, x_{2,n-(m_2-1)\tau_2},\ \ldots,\ x_{L,n}, x_{L,n-\tau_L}, \ldots, x_{L,n-(m_L-1)\tau_L}) \tag{2}$$

$$n = N_0, N_0+1, \ldots, N, \qquad N_0 = \max_{1 \le l \le L}\{(m_l - 1)\tau_l + 1\}$$

where $L$ is the number of variables used in the reconstruction, and $\tau_i$ and $m_i$ $(i = 1, 2, \ldots, I)$ are the time delay and embedding dimension of the $i$th variable, respectively. By analogy with the delay-embedding theorem of F. Takens, if $m_i$ or $m = \sum_i m_i$ is large enough, there is a mapping $F: R^m \to R^m$ such that

$$X_{n+1} = F(X_n) \tag{3}$$

or a mapping $F_i: R^m \to R$ such that

$$X_n \mapsto x_{i,n+1}, \qquad x_{i,n+1} = F_i(X_n) \tag{4}$$

The evolution of the reconstructed state then reflects the evolution of the original unknown dynamic system. That is, the geometrical characteristics of the attractor of the original system are equivalent to those of the $m$-dimensional reconstructed phase space.
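The construction of the delay vector in (2) can be sketched in a few lines of Python. This is only an illustrative sketch, not the authors' Matlab code: the function name `embed_multivariate` and its arguments are hypothetical, indices are 0-based (so the first usable index is max((m_l - 1)τ_l) rather than N_0), and a variable with embedding dimension 0 is simply left out, following the convention introduced in Section III.

```python
def embed_multivariate(series, dims, delays):
    """Build the delay vectors X_n of a multivariate reconstruction.

    series: list of equal-length lists, series[i][n] is x_{i,n} (0-based n)
    dims:   embedding dimensions m_i (0 means the variable is not used)
    delays: time delays tau_i (positive integers)
    Returns (n_start, vectors); vectors[k] is X_n for n = n_start + k.
    """
    if not any(m > 0 for m in dims):
        raise ValueError("at least one embedding dimension must be positive")
    length = len(series[0])
    # Largest look-back among the variables that are actually used.
    n_start = max((m - 1) * tau for m, tau in zip(dims, delays) if m > 0)
    vectors = []
    for n in range(n_start, length):
        x_n = []
        for x, m, tau in zip(series, dims, delays):
            # Coordinates x_{i,n}, x_{i,n-tau}, ..., x_{i,n-(m-1)tau}.
            x_n.extend(x[n - j * tau] for j in range(m))
        vectors.append(x_n)
    return n_start, vectors


if __name__ == "__main__":
    demo = [list(range(10)), list(range(100, 110))]
    start, vecs = embed_multivariate(demo, dims=[2, 3], delays=[3, 1])
    print(start, vecs[0])
```

Each returned vector has length m = Σ m_i, which is the dimension of the reconstructed phase space.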

III. DETERMINING RECONSTRUCTION VARIABLES AND PARAMETERS ON GA

A. Chromosome Coding

The target of GA is to determine the optimal combination of reconstruction variables and parameters. When a variable is not used in the phase space reconstruction, its embedding dimension can be regarded as zero. The coding of the reconstruction variables can thus be merged into the coding of the embedding dimensions. Consequently, GA only needs a binary coding of the reconstruction parameters, in which the embedding dimensions take non-negative integer values and the time delays take positive integer values. The chromosome of individual $Y_k$ is composed of $2I_1$ binary strings, adopting the multi-parameter cascade pattern [11]:

$$Y_k = (y_1, y_2, \ldots, y_{I_1}, y_{I_1+1}, y_{I_1+2}, \ldots, y_{2I_1}) \tag{5}$$

where $I_1$ is not the number of independent variables but the number of all actually measured variables, $y_i$ $(i = 1, 2, \ldots, I_1)$ is the binary string coding $m_i$, and $y_{I_1+i}$ $(i = 1, 2, \ldots, I_1)$ is the binary string coding $\tau_i$. Suppose

$$y_i = z_{i,1} z_{i,2} \cdots z_{i,J_1} \quad (i = 1, 2, \ldots, I_1) \tag{6}$$

$$y_{I_1+i} = z_{I_1+i,1} z_{I_1+i,2} \cdots z_{I_1+i,J_2} \quad (i = 1, 2, \ldots, I_1) \tag{7}$$

where $z_{i,j_1}$ and $z_{I_1+i,j_2}$ are binary digits. The reconstruction parameters after decoding are

$$m_i = \sum_{j=1}^{J_1} z_{i,j}\, 2^{J_1 - j} \tag{8}$$

$$\tau_i = \sum_{j=1}^{J_2} z_{I_1+i,j}\, 2^{J_2 - j} + 1 \tag{9}$$

When $m_i$ equals zero, the $i$th time series is not used in the phase space reconstruction; when $m_i$ is greater than zero, the $i$th time series is used to reconstruct the phase space, with reconstruction parameters $m_i$ and $\tau_i$.
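A minimal Python sketch of this decoding step follows. Several things in it are assumptions made for illustration rather than details stated in the paper: the function name `decode_chromosome`, the reading of each gene with the most significant bit first, and the bit widths J1 = 3 and J2 = 4 (chosen only because 7 × (3 + 4) = 49 agrees with the chromosome length reported for the first example). The example chromosome is made up.

```python
def decode_chromosome(bits, n_vars, j1, j2):
    """Decode a cascaded binary chromosome into reconstruction parameters.

    bits:   string of '0'/'1' characters; the first n_vars genes (j1 bits
            each) code the embedding dimensions, the next n_vars genes
            (j2 bits each) code the time delays
    Returns (dims, delays, selected), where selected lists the 0-based
    indices of the variables whose embedding dimension is non-zero.
    """
    if len(bits) != n_vars * (j1 + j2):
        raise ValueError("chromosome length does not match n_vars * (j1 + j2)")
    # Assumption: each gene is read most significant bit first.
    dims = [int(bits[i * j1:(i + 1) * j1], 2) for i in range(n_vars)]
    offset = n_vars * j1
    # The +1 keeps every time delay a positive integer.
    delays = [
        int(bits[offset + i * j2:offset + (i + 1) * j2], 2) + 1
        for i in range(n_vars)
    ]
    selected = [i for i, m in enumerate(dims) if m > 0]
    return dims, delays, selected


if __name__ == "__main__":
    # Hypothetical chromosome assembled gene by gene for readability.
    m_genes = ["110", "010", "000", "000", "000", "000", "000"]
    tau_genes = ["1010", "1011", "0000", "0000", "0000", "0000", "1111"]
    print(decode_chromosome("".join(m_genes + tau_genes), 7, 3, 4))
```

Because a zero embedding dimension switches a variable off, variable selection needs no separate gene: it falls out of the decoded `dims` list.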

B. Initial Population

An initial population of M individuals is generated at random by computer, and GA starts to iterate from these M individuals.

C. Fitness Function

The prediction error on the test data is the main measure of the quality of a phase-space reconstruction: the smaller the prediction error, the more reasonable the combination of reconstruction variables and parameters.

Chaotic time series prediction constructs $\hat{F}$ (or $\hat{F}_i$), an approximation of the mapping $F$ (or $F_i$), such that

$$\hat{X}_{n+1} = \hat{F}(X_n) \tag{10}$$

or

$$\hat{x}_{i,n+1} = \hat{F}_i(X_n) \tag{11}$$

[Figure 1 (flowchart), box labels: Actual Complicated System; Observation or Experiment; Multivariate Time Series; Interdependence among Variables (branches Yes / No); Remove Redundant Variables; Independent Multivariate Time Series; Reconstruction Parameters Computation and Phase Space Reconstruction; Geometric Invariants Computation and Prediction]

A neural network (NN) can approximate any complex nonlinear function and can be trained on sample data, so it can be used to establish the prediction model $\hat{F}$ (or $\hat{F}_i$) [12, 13]. The NNs commonly used for chaos prediction are the BP NN and the RBF NN. The RBF NN is superior to the BP NN in approximation ability and learning speed. Moreover, the number of input-layer nodes of a BP NN must be set before training, which is inconvenient here because the reconstruction variables and embedding dimensions change during the search. Therefore, we use an RBF NN as the prediction model [14]. According to (10) and (11), the RBF NN input is the $m$-dimensional reconstructed state vector

$$X_{n_0} = (x_{1,n_0}, x_{1,n_0-\tau_1}, \ldots, x_{1,n_0-(m_1-1)\tau_1},\ \ldots,\ x_{I,n_0}, \ldots, x_{I,n_0-(m_I-1)\tau_I})$$

and the output is $x_{1,n_0+1}$, the value of the first time series $x_1$ at time $n_0+1$, because the one-step prediction error of the first time series serves as the performance index of the phase-space reconstruction in this paper. To reduce the influence of chance, we take the average prediction error over many points. Definition: the average prediction error of $P$ test samples is

$$E(m_1, \tau_1, \ldots, m_I, \tau_I) = \frac{1}{P}\sum_{p=1}^{P}\left|x_{i,n_0+p} - \hat{x}_{i,n_0+p}\right| \tag{12}$$

Thus, given the prediction performance, the fitness $f_1(Y_K)$ of individual $Y_K$ can be designed as the root mean square error function $E(m_1, m_2, \ldots, m_I, \tau_1, \tau_2, \ldots, \tau_I)$:

$$f_1(Y_K) = RMSE(m_1, \tau_1, \ldots, m_I, \tau_I) = \sqrt{\frac{1}{P}\sum_{p=1}^{P}\left(x_{i,n_0+p} - \hat{x}_{i,n_0+p}\right)^2} \tag{13}$$

In addition, we want the dimension of the reconstructed state space and the number of reconstruction variables to be as small as possible: the lower the dimension, the simpler the state space, and the fewer the variables, the more easily the data are obtained. Therefore, the fitness function $f(Y_K)$ is designed with a penalty term:

$$f(Y_K) = f_1(Y_K) + C_1 \sum_i m_i + C_2 I = \sqrt{\frac{1}{P}\sum_{p=1}^{P}\left(x_{i,n_0+p} - \hat{x}_{i,n_0+p}\right)^2} + C_1 \sum_i m_i + C_2 I \tag{14}$$

where $C_1$ and $C_2$ are the penalty factors.
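The penalized fitness can be written as a short Python function once the predictions are available. This is a sketch under stated assumptions: the name `penalized_fitness` is hypothetical, the predictor that produces `predicted` (an RBF NN in the paper) is outside the snippet, and the term I is read here as the number of variables with a non-zero embedding dimension. The default penalty factors 0.01 are the values reported for the first example.

```python
import math


def penalized_fitness(actual, predicted, dims, c1=0.01, c2=0.01):
    """RMSE of the test predictions plus penalties on the total embedding
    dimension and on the number of selected variables (smaller is better).

    actual, predicted: equal-length sequences of test targets and forecasts
    dims:              decoded embedding dimensions m_i (0 = variable unused)
    """
    if len(actual) != len(predicted) or len(actual) == 0:
        raise ValueError("actual and predicted must be non-empty and equal in length")
    squared = sum((a - p) ** 2 for a, p in zip(actual, predicted))
    rmse = math.sqrt(squared / len(actual))
    # Assumption: I counts only the variables that take part in the embedding.
    n_selected = sum(1 for m in dims if m > 0)
    return rmse + c1 * sum(dims) + c2 * n_selected


if __name__ == "__main__":
    print(penalized_fitness([1.0, 2.0, 3.0], [1.0, 2.0, 5.0], [6, 2, 0]))
```

With equal prediction errors, a candidate that uses fewer variables or a lower total dimension receives the smaller fitness value and is therefore preferred.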

D. Genetic Operator

1) Selection: The proportional selection operator is adopted, and the selection probability $p_s(Y_k)$ of individual $Y_k$ is inversely proportional to its fitness:

$$p_s(Y_k) = \frac{\sum_{j=1}^{M} f(Y_j)}{M f(Y_k)} \tag{15}$$

2) Crossover: Because the chromosome coding adopts the multi-parameter cascade pattern, a multi-point crossover operator is used. The crossover process is as follows. First, all individuals in the population are paired at random. Then several gene loci are selected as cross-points. Finally, for each pair of individuals, the genes between the cross-points are exchanged in alternate segments; the genes between the first gene locus and the first cross-point are not exchanged. New individuals are thus generated.

3) Mutation: For the multi-parameter cascade chromosome coding, a novel mutation operator, cascade single-point mutation, is adopted so that every gene string has the same mutation probability. The steps of the operator are as follows. First, chromosome $Y_k$ is split into $2I_1$ substrings $y_1, y_2, \ldots, y_{I_1}, y_{I_1+1}, y_{I_1+2}, \ldots, y_{2I_1}$. Then single-point mutation is applied to each substring. Finally, a new individual is generated by joining all the mutated substrings in their original order. The cascade single-point mutation operator has two clear advantages. On the one hand, because every gene string has the same mutation probability, large differences in precision between the solutions of different parameters are unlikely. On the other hand, because the mutation of each substring is limited to one gene and the mutated area is small, individuals are stable and oscillation of individual fitness is unlikely.
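The three operators described above can be sketched in Python as follows. The function names are hypothetical and some details are assumptions: selection weights are taken as 1/f and normalized by `random.Random.choices` (the expression in (15) is not itself normalized), the cut points are drawn uniformly, and each substring is mutated with the same probability `p_mut` by flipping one randomly chosen bit.

```python
import random


def select_parents(population, fitness, rng):
    """Draw len(population) parents with weights inversely proportional to fitness."""
    weights = [1.0 / (f + 1e-12) for f in fitness]  # smaller fitness, larger weight
    return rng.choices(population, weights=weights, k=len(population))


def multipoint_crossover(a, b, n_points, rng):
    """Exchange alternate segments between the cut points of two bit strings.

    The segment before the first cut point is never exchanged.
    """
    points = sorted(rng.sample(range(1, len(a)), n_points))
    child_a, child_b = list(a), list(b)
    bounds = points + [len(a)]
    for k in range(0, len(points), 2):
        lo, hi = bounds[k], bounds[k + 1]
        child_a[lo:hi], child_b[lo:hi] = child_b[lo:hi], child_a[lo:hi]
    return "".join(child_a), "".join(child_b)


def cascade_mutation(bits, widths, p_mut, rng):
    """Split the chromosome into substrings of the given widths and, with
    probability p_mut per substring, flip exactly one bit inside it."""
    out, start = [], 0
    for w in widths:
        gene = list(bits[start:start + w])
        if rng.random() < p_mut:
            pos = rng.randrange(w)
            gene[pos] = "1" if gene[pos] == "0" else "0"
        out.append("".join(gene))
        start += w
    return "".join(out)


if __name__ == "__main__":
    rng = random.Random(1)
    widths = [3] * 7 + [4] * 7  # assumed gene widths, 49 bits in total
    a, b = "0" * 49, "1" * 49
    print(multipoint_crossover(a, b, 4, rng))
    print(cascade_mutation(a, widths, 0.05, rng))
```

Mutating per substring rather than per bit gives short and long genes the same chance of changing, which is the property the text attributes to the cascade operator.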

IV. EXAMPLES

A. The coupling system of Rossler equation and Hyper Rossler equation

$$\begin{cases}
x_1' = -\omega y_1 - z_1 + \varepsilon(x_2 - x_1) \\
y_1' = \omega x_1 + 0.15 y_1 \\
z_1' = 0.2 + z_1(x_1 - 10) \\
x_2' = \omega_2 + 0.25 x_2 + z_2 + \varepsilon(x_1 - x_2) \\
y_2' = 3 + y_2 \omega_2 \\
z_2' = -0.5 y_2 + 0.05 z_2 \\
\omega_2' = -x_2 - y_2
\end{cases} \tag{16}$$

where $\omega = 0.925$, the initial states are $x_{1,0} = 0.1$, $y_{1,0} = 0.2$, $z_{1,0} = 0.3$, $x_{2,0} = y_{2,0} = 0$, $z_{2,0} = 15$, $\omega_{2,0} = -20$, and the coupling coefficients $\varepsilon = 0.008$ and $\varepsilon = 0.012$ correspond to nonsynchronized and synchronized evolution, respectively. In the Matlab 2009b simulation environment, 6000 groups of data of all variables are computed using the fourth-order Runge-Kutta integration method with an integration step of 0.004. To reduce the influence of the transient state, the first 5000 groups of data are removed. Of the remaining 1000 groups, normalized to $[-1, 1]$ using the mapminmax function, the first 850 groups, with 5% noise added, serve as the training samples and the last 150 groups as the prediction samples. The number of variables is $I_1 = 7$. In GA, the chromosome length is 49, the population size is 50, the number of evolution epochs is 200, the crossover operator is four-point crossover with crossover probability 0.2, cascade single-point mutation is adopted with mutation probability 0.05, and $C_1 = C_2 = 0.01$.
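The data generation for this example can be sketched in plain Python with a fixed-step fourth-order Runge-Kutta integrator. The paper used Matlab 2009b, so this is an illustrative re-implementation rather than the original code. The right-hand side follows the system (16) as read from the source, with `w2` standing for ω2; the function names are hypothetical, and the noise and normalization steps are not included.

```python
def coupled_rossler(state, eps, omega=0.925):
    """Right-hand side of the coupled Rossler / hyper-Rossler system."""
    x1, y1, z1, x2, y2, z2, w2 = state
    return [
        -omega * y1 - z1 + eps * (x2 - x1),
        omega * x1 + 0.15 * y1,
        0.2 + z1 * (x1 - 10.0),
        w2 + 0.25 * x2 + z2 + eps * (x1 - x2),
        3.0 + y2 * w2,
        -0.5 * y2 + 0.05 * z2,
        -x2 - y2,
    ]


def rk4_step(f, state, h, *args):
    """Advance the state by one classical fourth-order Runge-Kutta step."""
    k1 = f(state, *args)
    k2 = f([s + 0.5 * h * k for s, k in zip(state, k1)], *args)
    k3 = f([s + 0.5 * h * k for s, k in zip(state, k2)], *args)
    k4 = f([s + h * k for s, k in zip(state, k3)], *args)
    return [
        s + h / 6.0 * (a + 2.0 * b + 2.0 * c + d)
        for s, a, b, c, d in zip(state, k1, k2, k3, k4)
    ]


def simulate(eps, n_steps=6000, h=0.004):
    """Integrate from the initial state given in the text and return the
    trajectory as a list of 7-element states, one per step."""
    state = [0.1, 0.2, 0.3, 0.0, 0.0, 15.0, -20.0]
    trajectory = []
    for _ in range(n_steps):
        state = rk4_step(coupled_rossler, state, h, eps)
        trajectory.append(state)
    return trajectory


if __name__ == "__main__":
    data = simulate(0.008)
    kept = data[5000:]  # discard the transient, as described in the text
    print(len(kept), kept[0])
```

Transposing `kept` gives the seven time series from which the GA chooses reconstruction variables.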

Figure 2 shows the evolution curves of the average fitness and the optimal fitness for ε = 0.008; the optimal individual is found at generation 124. With GA, the optimal individual for ε = 0.008 is 1100100000000000000001010101111001100111000000011, while for ε = 0.012 the optimal individual is 0111100000000000000001011101111001100111000000011. After decoding, the reconstruction variables (RV), the corresponding reconstruction parameters (RP) and the 1-step prediction error (RMSE) are shown in Table I, together with the RV, RP and RMSE obtained by the traditional method of [10]. The table shows that GA, using the same number of or even fewer reconstruction variables and the same or a lower dimension than the traditional method, gives smaller prediction errors and a better prediction effect.

For ε = 0.008, based on the optimal individual, the actual value of x1 and its 1-step and 3-step predicted values for the 150 test samples are illustrated in Figure 3. As the figure shows, the predicted values are close to the actual ones and the prediction is effective.

TABLE I. RECONSTRUCTION VARIABLES, PARAMETERS AND PREDICTION ERROR

                      ε = 0.008                                  ε = 0.012
Method                RV           RP (m/τ)           RMSE       RV           RP (m/τ)           RMSE
GA                    x1, y1       6/11, 2/12         0.4208     x1, y1       4/12, 3/15         0.4223
Traditional Method    x1, x2       5/15, 3/12         1.9525     x1, x2       2/15, 6/14         2.1941
Traditional Method    x1, x2, w2   2/15, 5/12, 3/15   1.7435     x1, x2, w2   3/15, 2/14, 1/18   1.8642

B. Rock Burst Prediction on Microseismic Time Series

Rock burst is one of the worst natural disasters in mine production, and microseismic monitoring is a widely used method for monitoring rock burst [15]. The data come from the microseismic monitoring system of the No. 11 mine of Pingdingshan Coal, and the monitoring period is the 200 days from October 1, 2009 to April 18, 2010. Microseismic energy spans a wide range, from $10^1$ to $10^7$ J, and there is a linear relationship between magnitude and the logarithm of microseismic energy, so we take the logarithm of energy as the sample data. The monitored variables are the daily microseismic maximal energy x1, total energy x2, frequency x3 and average energy x4, and the prediction variable is the microseismic maximal energy x1. The first 150 data points are taken as training samples and the last 50 as test samples.

With GA, the optimal reconstruction variables are x1 and x3, and the reconstruction parameters are $\tau_1 = 1$, $\tau_3 = 1$, $m = 6$, $m_1 = 3$ and $m_3 = 3$. The actual values and 1-step predicted values of $x_1$ are shown in Figure 4. The prediction is seen to be highly accurate.

To further illustrate the prediction performance of the optimal combination of variables and parameters, Table II gives the prediction error (RMSE) of different prediction models, including the traditional add-weighted one-rank model (ADOR), the BP NN model and the RBF NN model. Although the prediction effects of the models differ, the prediction accuracy is good overall.

TABLE II. RMSE OF DIFFERENT PREDICTION MODEL

Prediction Model    RMSE (1-step Prediction)    RMSE (3-step Prediction)
ADOR                0.1373                      0.2109
BP NN               0.0954                      0.1261
RBF NN              0.0252                      0.0368

Figure 2. Evolution curves of fitness function (x-axis: Evolution Epochs, 0 to 200; y-axis: fitness, 0 to 3.0; curves: Average Fitness, Optimal Fitness)

Figure 3. Actual value and prediction value of x1 (x-axis: Sample No., 0 to 150; curves: actual value, 1-step prediction, 3-step prediction)

Figure 4. Actual value and prediction value of maximal energy (x-axis: Sample No., 0 to 50; y-axis: x1; curves: prediction value, actual value)

V. CONCLUSION

Given the interdependence between reconstruction variables and parameters, GA is adopted to determine the reconstruction variables and parameters simultaneously in the phase-space reconstruction of multivariate time series, and the algorithm is applied to a dynamic system with a noise signal and to an actual dynamic system. Simulation results show that the algorithm overcomes the defect of determining reconstruction variables and parameters in isolation and gives high prediction accuracy.

ACKNOWLEDGEMENTS

The authors would like to thank Kou JianXin of the Technology Center of Pingdingshan Coal for providing the microseismic time-series data. Financial support for this work, provided by the National Natural Science Foundation of China (No. 60974126), is gratefully acknowledged.

REFERENCES

[1] S. P. Hastings, W. C. Troy, “A shooting approach to chaos in the Lorenz equations,” J of Differential Equations, vol. 127, 1996, pp. 41-53.

[2] L. Y. Cao, A. Mees, K. Judd, “Dynamics from multivariate time series,” Physica D, vol. 121(1-2), 1998, pp. 75-88.

[3] S. Boccaletti, D. L. Valladares, “Reconstructing embedding spaces of coupled dynamical systems from multivariate data,” Physical Review, vol. 65, 2002, pp. 1-4.

[4] H. Y. Wang, Z. H. Sheng, J. Zhang, “Phase space reconstruction of complex systems based on multivariate time series,” Journal of Southeast University (Natural Science Edition), vol. 33(1), 2003, pp. 115-118.

[5] H. Y. Wang, S. Lu, Nonlinear time series analysis and its application, Beijing: Science Press, 2006.

[6] D. Kugiumtzis, “State space reconstruction parameters in the analysis of chaotic time series-The role of the time window length,” Physica D, vol. 95, 1996, pp. 13-28.

[7] C. T. Zhang, Q. L. Ma, H. Peng, “Chaotic time series prediction based on information entropy optimized parameters of phase space reconstruction,” Acta Physica Sinica, vol. 59(11), 2010, pp. 7623-7629.

[8] Y. H. Yue, W. X. Han, G. P. Cheng, “Determination of parameters in the phase-space reconstruction of multivariate time series,” Control and Decision, vol. 20(3), 2005, pp. 290-293.

[9] C. Silvia, C. Valentina, and V. Marco, “Variable selection through genetic algorithms for classification purposes,” Proc. 10th IASTED International Conf. on Artificial Intelligence and Applications, 2010, pp. 6-11.

[10] S. Lu, H. Y. Wang, “Calculation of the maximal Lyapunov exponent from multivariate data,” Acta Physica Sinica, vol. 55(2), 2006, pp. 572-576.

[11] J. Tolvi, “Genetic algorithms for outlier detection and variable selection in linear regression models,” Soft Computing, pp. 527-533, August 2004.

[12] X. B. Zou, J. W. Zhao, and H. P. Mao, “Genetic algorithm interval partial least squares regression combined successive projections algorithm for variable selection in near-infrared quantitative analysis of pigment in cucumber leave,” Applied Spectroscopy, vol. 64, no. 7, 2010, pp. 86-79.

[13] X. D. Xiang, Y. H. Guo, “Chaotic Attractor-Based Time Series Forecasting Method and Its Application,” Journal of Southwest Jiaotong University, vol. 36(5), 2001, pp. 472-475.

[14] L. X. Yang, “Based on improved RBF neural network for chaotic time series prediction,” 2010 Second International Conference on Computational Intelligence and Natural Computing (CINC 2010), 2010, pp. 124-131.

[15] A. Y. Cao, L. M. Dou, R. J. Yan, “Classification of microseismic events in high stress zone,” Mining Science and Technology, vol. 19(6), 2009, pp. 718-723.