
LA-UR-96-854

Los Alamos National Laboratory

Title: Super Synchronization for Fused Video and Time-Series Neural Network Training

Author(s): C. James Elliott, EES-5; Jason Pepin, XCM

Submitted to: Conference: Neural Network Applications in Highway and Vehicle Engineering, July 1, 1996

Los Alamos National Laboratory, an affirmative action/equal opportunity employer, is operated by the University of California for the U.S. Department of Energy under contract W-7405-ENG-36. By acceptance of this article, the publisher recognizes that the U.S. Government retains a nonexclusive, royalty-free license to publish or reproduce the published form of this contribution, or to allow others to do so, for U.S. Government purposes. The Los Alamos National Laboratory requests that the publisher identify this article as work performed under the auspices of the U.S. Department of Energy.


Super Synchronization for Fused Video and Time-Series Neural Network Training

C. James Elliott, EES-5, MS F665, and Jason Pepin, XCM, MS F645, Los Alamos National Laboratory,

Los Alamos, NM 87545

and

Ralph Gillmann, FHWA, HPM-30 400 7th Street, SW

Washington, DC 20590

I. Introduction

A key element in establishing neural networks for traffic monitoring is the ground truth data set that verifies the sensor data. The sensors we use provide time series data gathered from loop and piezo sensors embedded in the highway. These signals are analyzed and parsed into vehicle events. Features are extracted from these sensors and combined to form the vehicle vectors. The vehicle vectors are combined with the video data in a data fusion process, thereby providing the neural network with its training set.

We examine two studies, one by Georgia Tech Research Institute (GTRI) [1] and another by Los Alamos National Laboratory (LANL), that use video information and have had difficulties in establishing the fusion process. That is to say, the correspondence between the video events recorded as the ground truth data and the sensor events has been uncertain. We show that these uncertainties can be removed by establishing a more precise and accurate time measurement for the video events. We call super synchronization the principle that the video time information is inherently precise to better than a frame (1/30 s) and that, by tracing the factors causing imprecision in the timing of events, we can achieve the precision required for unique vehicle identification. In the Georgia data study there was an imprecision on the order of 3 seconds, and in the LANL study an imprecision of nearly a second. In both cases, the imprecision had led to a lack of proper identification of sensor events.

In the case of the Georgia I-20 study, sensors were placed at various distances downstream, up to 250 meters, from the ground truth camera. The original analysis assumed that there was a fixed time offset corresponding to the downstream location. For this case we show that when we restrict the analysis to passenger cars and take into account the speed of the car, we can achieve a precision of approximately 0.3 s. This value is an order of magnitude smaller than that of the previous procedure.

1. B. A. Harvey, G. H. Champion, S. M. Ritchie, C. D. Ruby, Accuracy of Traffic Monitoring Equipment, Technical Report GTRI Project A-9291, June 1995.


In the LANL case, sensors are located within a 30 meter range, and these sensors also were described by a time offset. The LANL procedure also involved identification of a ground truth video frame containing the vehicle. The vehicle, however, could be anywhere in the frame. By taking into account the location of the vehicle in the frame and by measuring the speed of the vehicle within the frame by video means, we show that a simple pinhole camera model of the image provides an accuracy of approximately a tenth of a second and a precision of half that. Similar video techniques have been used for video measurement of down-stream speeds [2]. Our side viewing technique utilizes a uniform metric during the course of travel of any one vehicle. Within the pinhole camera model (perpendicular to the road), the distance precision is at best 20 m divided into 200 pixels (approximately 400 could have been used), or 10 cm. The frame time should be precise to a percent or two of 1/30th of a second. The observed speed precision of about 10% or better suggests that correctable camera distortion is the culprit. Higher precision is not, however, required for unique vehicle identification.

Several terms are clarified. The term precision means the ability to repeat a measurement. Good precision does not require small systematic errors but does require small random errors. The term accuracy indicates the ability to determine the physical quantity being measured; good accuracy requires both systematic errors and random errors to be small.

II. The GTRI Study

In May and September of 1993, Georgia Tech Research Institute undertook a study of traffic on I-20 comparing 13 sensor and classifier configurations from 10 equipment vendors. The emphasis was to determine how well vehicles can be counted and classified into the FHWA 13 vehicle classes. We have used ground truth from that study in conjunction with summaries of vendor time stamps and vendor classification. The ground truth consists of the raw video footage and information gathered from it. As a part of that study, students at GTRI clicked on vehicle positions in digitized presentations of the ground truth and classified the vehicles into the 13 classes augmented by unexpected events. Some of the unexpected difficulties included the inability to record all the near wheel locations of a long flatbed carrying prefab houses. This problem was solved by a technique involving a white pad placed on the highway to enable better observation. The study demonstrated the ability to measure the length of a vehicle to within about one pixel of the video. It did not compute the speed by video techniques for a direct ground truth comparison with the vendor results. Vendor equipment showed poor overall classification accuracy compared to the Intermodal Surface Transportation Efficiency Act requirements.

The idea of super synchronization is quite simple. The master clock has a time that is determined by the integer number of seconds recorded on a video frame and by the number of the event frame with respect to the origin frame. The vendor clock is modeled to have a slow drift that involves a calibration change of 10 seconds per hour or less. The downstream vendor clock is capable of reporting times to milliseconds, but in this study the clocks were configured to record time in seconds. Infrequently, abrupt discontinuities in the vendor clocks were observed.

2. R. M. Inigo, Application of Machine Vision to Traffic Monitoring and Control, IEEE Transactions on Vehicular Technology 38, 112-122, 1989.


The time interval between events is the time it takes the vehicle to travel from the master clock to the vendor site, L/S, where L is the distance downstream and S is the vehicle speed in compatible units, assumed constant for each vehicle in this study. Also, the slow drift is d(t). Thus, we have Tv = [Tm + L/S - L/S0 + d(Tm)], where the brackets indicate the integer part of the quantity. The term L/S0 is included simply for convenience and may formally be regarded as part of d(Tm). This quotient, L/S0, is a reference time required to travel to the vendor site based upon a reference speed S0, which we take to be 31.3 m/s (70 mph). The equation for Tv simply states that the vendor event time is the sum of the master clock time, Tm, plus the relative clock drift, plus the transit time offset to zero at a speed of S0.
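As a concrete illustration of this model, the following short Python sketch (ours, not the authors'; the function name vendor_time and the drift argument are assumptions) evaluates Tv for a single event:

import math

S0 = 31.3  # reference speed in m/s (70 mph)

def vendor_time(tm, L, S, drift):
    # tm   : master clock time of the event, in seconds
    # L    : distance downstream from the master camera, in meters
    # S    : vehicle speed in m/s, assumed constant for the vehicle
    # drift: function returning the slow vendor clock drift d(Tm), in seconds
    transit_offset = L / S - L / S0   # transit time referenced to speed S0
    return math.floor(tm + transit_offset + drift(tm))

# Example with assumed numbers: event at Tm = 6.2 s, vendor 61 m downstream,
# vehicle at 28 m/s, no drift; the predicted vendor stamp is the integer 6.
print(vendor_time(6.2, 61.0, 28.0, lambda t: 0.0))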

All the events we study here are from the data of 9/10/93 in the time interval [144450, 144600]. The reason this region was chosen is that most vendors had 12 vehicles recorded. Of these twelve, seven turned out to be passenger cars. Even in this relatively good interval, one of the vendors had incomplete data for passenger cars. Another vendor did not report speeds at all. These two vendors were not included in Table 1 and Table 2 below. The raw data is shown in Table 1. In Table 1, L indicates the length downstream and F indicates the frame number past the origin frame, where the origin frame was counted as zero. The seven F values are taken from the video and are used to compute the fraction of a second for the master clock time. The six L values indicate that six vendors were used in this study; the lengths, L, are measured in 30 cm units (1 foot).

The upper of the two numbers recorded in each cell of the matrix indicates the difference between the vendor recorded time and the integer seconds time recorded on the video frame of the event of the front of the vehicle crossing a fiducial line. The lower of the two numbers is the post-processed vendor recorded speed of the vehicle measured in units of 0.447 m/s (1 mph). One exception is the speed recorded in the shaded cell of Table 1: it was originally listed as 29, was judged to be in error, and was replaced by its neighboring value of 70. Speeds were rounded to an integer value.

Table 2 shows that super synchronization holds for these vehicles except for the shaded regions.

TABLE 1: Raw integer times and speeds for extended super synchronization study. L is distance downstream, F is frame number. Seven vehicles at six locations.

L F 0 2 19 20 23 24 26 0 5 6 6 6 6 6 6

73 69 73 72 67 68 73 70

200

450

550

-1 72 0 69 -1 71 0 71

-1 69 0 67 -1 68 0 67

-1 71 0 69 -1 71 0 70

-1 72

0 68

0 0 69 66 -1 0 70 68

0 68 1 65 0 66 1 67

0 70 0 67 0 70 0 61

650 -1 -1 -1 -1 0 0 0 71 69 71 71 69 68 69


Table 2: Effective Frames vs Integer Time

car #   downstream distance   effective frame   integer time
0       0      0.0     5
1       0      2.0     6
2       0      19.0    6
3       0      20.0    6
4       0      23.0    6
5       0      24.0    6
6       0      26.0    6
0       70     -0.6    -1
1       70     2.3     -1
2       70     18.7    -1
3       70     19.4    -1
4       70     23.6    0
5       70     24.6    0
6       70     26.0    0
0       200    0.8
1       200    4.6
2       200    19.8
3       200    20.8
4       200    26.5
0       450    -1.9    -1
1       450    5.9     -1
2       450    17.1    -1
3       450    20.0    -1
4       450    26.9    0
5       450    32.0    0
6       450    26.0    0
0       550    -2.3    0
1       550    9.2     0
2       550    19.0    0
3       550    20.0    0
4       550    25.3    1
5       550    31.2    1
0       650    -2.7    -1
1       650    4.8     -1
2       650    16.3    -1
3       650    17.3    -1
4       650    25.8    0
5       650    29.6    0
6       650    28.8    0

At each fixed downstream distance, there are results reported for seven vehicles. The integer time is taken from Table 1. The effective frame represents the frame number that would have been observed by a camera had it been located at the vendor site. The effective frame is the same as the master clock frame if the downstream distance is 0 or if the vehicle speed measured at the vendor location is S0. The transit time is converted into elapsed frames by multiplying by 30 fps. This time is then added to the master clock frame after subtracting the time it would have taken a vehicle traveling the distance L at speed S0.
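As a sketch of this bookkeeping (illustrative Python, not the study's code; the variable names and the example speed are our assumptions), the effective frame follows directly from the master clock frame, the downstream distance, and the vendor-measured speed:

FPS = 30.0              # video frame rate, frames per second
MPH_TO_FTPS = 1.46667   # 1 mph expressed in feet per second
S0_MPH = 70.0           # reference speed S0 (31.3 m/s)

def effective_frame(master_frame, distance_ft, speed_mph):
    # Frame number a camera placed at the vendor site would have observed.
    transit = distance_ft / (speed_mph * MPH_TO_FTPS)    # actual transit time, s
    reference = distance_ft / (S0_MPH * MPH_TO_FTPS)     # transit time at S0, s
    return master_frame + FPS * (transit - reference)

# Example with an assumed speed: master frame 0, vendor 70 ft downstream,
# vehicle at 72 mph; the effective frame comes out near -0.6, as in Table 2.
print(round(effective_frame(0, 70.0, 72.0), 1))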

Compliance with super synchronization is defined to occur if the integer times are in non-decreasing order after sorting the effective frame values in ascending order. This condition applies to vehicles at a fixed downstream location. It is the same as requiring a plateau of the variety reported last month. The shaded entries in Table 2 that do not comply with this criterion are the (car no., downstream distance) pairs (5,200), (6,200), and (6,550). The first two pairs would be in order were they switched; a frame-measuring error of only a fraction of a frame accounts for this discrepancy. The (6,550) out-of-sequence entry apparently is bad data. There is additional evidence supporting this judgment: the vendor reports the vehicle to be a 5 axle (non-passenger car) vehicle and gives 0 for all the axle separation distances, in contrast to the other vendors, who report the vehicle to be a passenger car.
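A minimal sketch of this compliance test (our own illustrative Python; the tuple layout is an assumption):

def complies(entries):
    # entries: list of (effective_frame, integer_time) pairs for one
    # downstream location; super synchronization holds if the integer
    # times are non-decreasing once sorted by effective frame.
    ordered = sorted(entries, key=lambda e: e[0])
    times = [t for _, t in ordered]
    return all(a <= b for a, b in zip(times, times[1:]))

# The 650 ft row of Table 2: sorted by effective frame, the integer times
# run -1, -1, -1, -1, 0, 0, 0, so the criterion is satisfied.
row_650 = [(-2.7, -1), (4.8, -1), (16.3, -1), (17.3, -1),
           (25.8, 0), (29.6, 0), (28.8, 0)]
print(complies(row_650))   # True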

III. The LANL Study

Data for this study was collected in Pojoaque, NM during the summer of 1993. Data includes video, loop, and piezo data for each run. See Fig. 1a for a top view drawing and Fig. 1b for a camera view of the collection site. Every third frame of the video has been digitized for times when a vehicle is present. Since the video was shot at thirty frames per second, vehicles can be viewed at one-tenth of a second intervals.

The first problem with analyzing video data is geometry. The video is generated from an elevated camera located off the shoulder of the road. The camera angle with respect to the ground is generally unknown. A transformation must be calculated from the camera view to a top view in order to take actual measurements from the digitized video. Using a pinhole camera approximation, the transformations are:

y = (A*Y + B)/(Y - Yv)
x = C*Xo + D

where: x, y are top view (actual) coordinates; X, Y are screen (pixel) coordinates; A, B, C, D are constants to be determined by known landmarks and origin specification; Yv is the pixel value of the vanishing point at the horizon; Xo is a projected pixel location along an arbitrary Y origin as described below.

Two lines are drawn on the screen that would be parallel in an actual top view. The intersection of these lines on the screen yields Xv and Yv, the pixel location of the camera view vanishing point. For the y equation, A and B can then be found directly from known road geometry. For the x equation, a common origin for (x,y) and (X,Y) must be established. Any line drawn from the vanishing point of the screen to the x axis (y = 0) must yield a constant x, since these lines would be vertically parallel in the top view. A linear relationship between X and x is assumed to exist along the x axis. Thus, whenever a pixel location (X,Y) is chosen, a line is drawn from (Xv,Yv) through (X,Y) to the x axis. The intersection point is known as Xo. Once again, C and D can be found from known road geometry. Now x and y are known for any pixel location on the screen.
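The whole mapping can be sketched in a few lines of Python (ours, not the authors'; the argument Y0, the screen row of the chosen y = 0 origin line, is an assumption used to make the projection explicit):

def screen_to_top_view(X, Y, A, B, C, D, Xv, Yv, Y0):
    # A, B, C, D : calibration constants fit from known road geometry
    # (Xv, Yv)   : vanishing point of two lines parallel in the top view
    # Y0         : screen row corresponding to the chosen y = 0 origin line
    y = (A * Y + B) / (Y - Yv)
    # Intersect the ray from (Xv, Yv) through (X, Y) with the y = 0 row
    # to obtain the projected pixel Xo, then map Xo linearly to x.
    Xo = Xv + (X - Xv) * (Y0 - Yv) / (Y - Yv)
    x = C * Xo + D
    return x, y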

Three points are needed per vehicle to demonstrate super synchronization. The point and time where the front tire meets the pavement is recorded at successive frames (two different times). A point for the rear tire is recorded at a third time. If a third time is not available (i.e., for fast moving vehicles where only two frames were digitized), the second time is also used with the rear tire point. The speed of the vehicle can be calculated using the two locations and times of the front tire. The velocity can be used to find the position of the front tire at the third time, and a total axle length of the vehicle can be calculated.

Assuming velocity is constant, the time can be found for any position of the vehicle in the screen using the linear equation y = m*x + b where y is distance, x is time, and m is speed. For these results, distance is measured in 30.48 cm units called feet. By inputting loop and piezo locations and computing times for the vehicle to reach these events, a comparison can be done with actual times to prove super synchronization.
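A short sketch of that calculation (illustrative Python with assumed numbers; the sensor positions and variable names are not from the study):

def predict_sensor_times(p1, t1, p2, t2, sensor_positions):
    # p1, t1 : front-tire position (ft) and time (s) in an early frame
    # p2, t2 : front-tire position and time in a later frame
    # sensor_positions : leading-edge positions of loops/piezos, in ft
    speed = (p2 - p1) / (t2 - t1)                      # ft/s, assumed constant
    times = [t1 + (pos - p1) / speed for pos in sensor_positions]
    return speed, times

# Example: the tire moves from 10 ft to 18 ft in 0.1 s (80 ft/s); loops with
# leading edges at 40 ft and 64 ft are predicted to be reached at 0.375 s
# and 0.675 s after the first frame.
print(predict_sensor_times(10.0, 0.0, 18.0, 0.1, [40.0, 64.0]))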

Results

Using run number r378 of the Los Alamos data, super synchronization was demonstrated with three loops and two piezos. Table 3 demonstrates the accuracy of computing speed and time intervals from three points. The time interval in this case is from the front tire of the vehicle first appearing to the rear tire last appearing in the video. View Time from Video and Speed from Video are taken from the original videotape by counting individual frames.


Table 3: Time Interval and Speed Errors

Veh. No.   Cmptd Time (s)   Vid. View Time (s)   % error   Cmptd Speed (30.48 cm/s)   Vid. Speed (30.48 cm/s)   % error   Lane & Fraction
0          0.72             0.67                 7.46      66.64                      72                        8.04      0.1
1          0.70             0.7                  0         75.54                      80                        5.9       0.2
2          0.56             0.6                  6.67      88.46                      90                        1.74      0.2
3          0.76             0.7                  8.57      82.54                      90                        9.04      1.1
4          0.80             0.7                  14.29     75.93                      84.7                      11.55     1.0
5          0.59             0.6                  1.67      82.38                      90                        9.25      0.3
6          0.83             0.73                 13.7      73.71                      90                        22.1      1.1
7          0.83             0.73                 13.7      79.86                      90                        12.7      1.1
8          0.87             0.77                 13.0      70.57                      84.7                      20        1.1
9          0.96             0.87                 10.3      55.71                      65.45                     17.5      0.1
10         0.80             0.73                 9.59      76.53                      84.7                      10.7      1.1
11         0.73             0.7                  4.29      65.08                      72                        10.63     0.2
12         0.85             0.83                 2.41      75.69                      80                        5.7       1.2
13         0.85             0.77                 10.39     55.29                      65.45                     18.4      0.3
14         0.79             0.73                 8.22      79.54                      84.7                      6.5       1.0
15         0.60             0.6                  0         72.21                      77.84                     7.8       0.4
16         0.57             0.6                  5.0       80.23                      84.7                      5.57      0.3
17         0.79             0.77                 2.60      78.61                      84.7                      7.75      1.2
18         0.64             0.57                 12.28     85.21                      90                        5.62      0.3
19         0.64             0.63                 1.59      72.07                      84.7                      17.5      0.3
Std. dev.                                        4.85                                                           5.62

Because the digitized frame shows a wider view than the TV monitor, the computed time interval is longer, thereby causing a small difference. The speeds were found using a 24 foot span from the beginning of one loop to the end of another. The video records thirty frames per second. Since the average speed is 80 ft/s, each frame captures only about 2.6 feet of movement of a vehicle. Therefore, when speed is measured over a 24 foot distance, an error of at least plus or minus ten percent should be expected from a one-frame error in manual frame counting.
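The size of that error can be checked directly (our own arithmetic sketch using the 24 foot span and 30 frames per second quoted above):

SPAN_FT, FPS = 24.0, 30.0
# A vehicle at 80 ft/s covers the 24 ft span in 9 frames; a one-frame
# miscount shifts the inferred speed to 90 or 72 ft/s, i.e. on the order
# of a ten percent error.
for frames in (8, 9, 10):
    print(frames, SPAN_FT / (frames / FPS))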

The next three tables, Tables 4-6, show how closely the super synchronization method predicts the event time for loops and piezos. Vehicle no. 15 is omitted from this study because it is a motorcycle and was not recorded by any loops or piezos. The computed video times are taken when the front tire passes the leading edge of the sensor. Signals from loops and piezos are measured in terms of amplitude. Each event causes a rise in amplitude to a maximum point followed by a similar decrease in amplitude to noise level. For this study, the time at half maximum amplitude when the signal is rising is used for the sensor event time stamp. The video and the data acquired by computer are synchronized by a computer controlled light, which can be seen at bottom center of the picture in Fig. 1b.
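A minimal sketch of that time-stamping rule (illustrative Python; the sampled-signal representation and the linear interpolation are our assumptions):

def half_max_rise_time(times, amps):
    # times, amps: sampled loop or piezo signal for one vehicle event,
    # with amplitudes referenced to the noise level.
    half = max(amps) / 2.0
    for i in range(1, len(amps)):
        if amps[i - 1] < half <= amps[i]:
            # interpolate between the two samples straddling half maximum
            frac = (half - amps[i - 1]) / (amps[i] - amps[i - 1])
            return times[i - 1] + frac * (times[i] - times[i - 1])
    return None

# Example with made-up samples: the rising edge crosses half of the peak
# amplitude (0.5) at t = 0.03 s.
print(half_max_rise_time([0.00, 0.02, 0.04, 0.06], [0.0, 0.3, 0.7, 1.0]))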

Table 4: Loop 2 Data for Lane 1

Vehicle Number   Cmptd Abs Time (s)   Actl Abs Time (s)   time error (s)
3                17.67                17.79               0.12
4                22.37                22.48               0.11
6                28.33                28.44               0.11
7                29.82                29.92               0.10
8                33.98                34.08               0.10
10               35.71                35.81               0.10
12               37.48                37.59               0.11
14               44.72                44.82               0.10
17               57.28                57.39               0.11

Table 5: Loop Data for Lane 0

Veh No.   Cmptd Abs Loop3 Time (s)   Abs Loop3 Time (s)   time error (s)   Cmptd Abs Loop4 Time (s)   Abs Loop4 Time (s)   time error (s)
0         0.9                        1.04                 0.14             1.15                       1.24                 0.09
1         2.34                       2.46                 0.12             2.56                       2.67                 0.11
2         14.26                      14.37                0.11             14.44                      14.55                0.11
5         27.27                      27.38                0.11             27.47                      27.57                0.10
9         34.99                      35.13                0.14             35.28                      35.36                0.08
11        37.29                      37.39                0.10             37.54                      37.63                0.09
13        39.47                      39.57                0.10             39.77                      39.83                0.06
16        55.98                      56.07                0.09             56.18                      56.27                0.09
18        59.81                      59.90                0.09             60.0                       60.08                0.08
19        71.06                      71.15                0.09             71.29                      71.35                0.06


Table 6: Piezo 2 Data for Lane 1

Veh. No.   Cmptd Abs Time (s)   Actual Abs Time (s)   time error (s)
3          17.59                17.66                 0.07
4          22.28                22.36                 0.08
6          28.24                28.31                 0.07
7          29.73                29.81                 0.08
8          33.88                33.95                 0.08
10         35.63                35.68                 0.05
12         37.40                37.45                 0.05
14         44.63                44.69                 0.06
17         57.20                57.26                 0.06

IV. Conclusion

The time errors in every case are very consistent and demonstrate the capability of the super synchronization principle. Even better accuracy could be produced in the LANL study by using a different amplitude point to mark the actual event time. The quarter maximum amplitude time would more closely correspond to the front tire crossing the leading edge of the loop and would result in closer times. Likewise, if the computed time had been calculated elsewhere, similar adjustments could be made. Taking the computed time when the front tire is halfway through the loop, instead of at the leading edge of the loop, would more closely simulate the half maximum amplitude time. The Georgia study shows that when the analysis is restricted to passenger cars, which avoids errors associated with the transit time of a vehicle, similar gains in precision can be obtained. The techniques reported here are collectively called super synchronization. They provide roughly an order of magnitude improvement in techniques for obtaining measurements of vehicle time events. The time is the essential parameter for making a correspondence between the video ground truth and the sensor signals. With this increase in precision, nearly perfect time matching can be obtained.

DISCLAIMER

This report was prepared as an account of work sponsored by an agency of the United States Government. Neither the United States Government nor any agency thereof, nor any of their employees, makes any warranty, express or implied, or assumes any legal liability or responsibility for the accuracy, completeness, or usefulness of any information, apparatus, product, or process disclosed, or represents that its use would not infringe privately owned rights. Reference herein to any specific commercial product, process, or service by trade name, trademark, manufacturer, or otherwise does not necessarily constitute or imply its endorsement, recommendation, or favoring by the United States Government or any agency thereof. The views and opinions of authors expressed herein do not necessarily state or reflect those of the United States Government or any agency thereof.


Figure 1a. Loop and Piezo Configuration


Figure 1b. Camera view with (Xv, Yv) at the intersection
