Cricket Activity Detection

Ashok Kumar, Javesh Garg
Advisor: Dr. Amitabha Mukerjee
Dept. of Computer Science and Engineering
{ashokkrm, javeshg, amit}@iitk.ac.in

Abstract

The use of optical flow analysis along with shot boundary detection can greatly help in the analysis of broadcast sports videos. In this project we classify the different types of cricket strokes played by a batsman during a match. The agent first splits the cricket match video into shots using supervised learning and finds out when the batsman plays a stroke. The agent then classifies the stroke type using an optical flow technique.

Keywords: shot boundary detection, frame classification, optical flow analysis, gradual fade, panning, zooming, angle of view, YUV color space

1 Introduction

In the field of computer vision, the analysis of sports videos is a recently researched topic [1], [2]. In cricket, an important task in activity detection is to classify the batsman's stroke-play during a match. In our paper, we classify the various cricket strokes played by a batsman into four different directions.

This requires solving the problem of shot boundary detection in a video. We split the video into shots with a supervised learning approach using colour histograms. A shot boundary can be either a 'cut' or a 'dissolve/fade'. We then classify the video frames into four classes, namely ground, fielder, pitch and other, using a multi-class SVM. Using this classification, we can find the video segments in which a cricket stroke is played by the batsman. This gives us a basic entity of a few frames in which the batsman hits the ball. We perform optical flow analysis [3] on this part of the video to determine the direction of the stroke played by the batsman, taking camera pan and zoom into account.

The complexity of the problem is due to the fact that the videos are at 25 fps and that broadcast videos use several cameras with multiple transition effects and action replays. The completely dynamic environment also adds to the problem, and finding any meaningful pattern in the video sequence can be quite cumbersome. Intermittent crowd views, scoreboards, etc. further add to the complexity of the problem. The main motivation for choosing this work is that it can be a major step towards an automatic commentary system, which would be a huge contribution to the field of sports. Some work, such as ball-start detection and cricket highlights retrieval, has already been done in this field. Our approach, although presently tested on complete videos, can easily be used with streaming videos as well.

2 Implementation

2.1 Shot boundary detection

Shot boundary detection is one of the fundamental problems in the field of computer vision. It is the very first step in the segmentation and classification of video data.

A shot is an uninterrupted sequence of frames captured by a single camera. Two consecutive shots can be separated by various transitions, such as cuts (Figure 1) or fades (Figure 2), also called shot boundaries.

For detecting cuts and fades we have followed a histogram-based approach [5].

In this approach we represent each frame as a feature vector.

2.1.1 Feature vector generation and classification

To generate the feature vector corresponding to a frame, we represent each frame as a YUV image histogram.


Figure 1: Cut between two frames

Figure 2: Gradual/fade transition between frames

The histogram stores the total number of pixels in each bin (the distribution of colours in the image). We compute both global and local histograms corresponding to a frame.

• Global histogram with n bins. The global histogram is less sensitive to camera motion, but it fails on fades, since the global difference is small, and alone it does not give good results.

• To compute the local histogram we first divide the image into m × m blocks. The local histogram captures the colour distribution in different regions of the image. It is a vector of block histograms, where each histogram corresponds to one block of the image.

• The local histogram takes spatial information into account and, combined with the global histogram, gives good results.

• Still, the histograms are based only on colour distribution, so two very different images may appear similar under this approach.

We quantized the UV plane of the colour space into 40×40 bins and, using these histograms, compute the squared difference of each frame with the histograms of the previous 30 frames.

The difference functions are given by:

$$\mathrm{GlobalDiff}(f, g) = \sum_{c} \frac{\left|\mathrm{Hist}_f(c) - \mathrm{Hist}_g(c)\right|}{\max\left(\mathrm{Hist}_f(c),\ \mathrm{Hist}_g(c)\right)}$$

$$\mathrm{LocalDiff}(f, g) = \sum_{i=1}^{m} \sum_{j=1}^{m} \sum_{c} \frac{\left|\mathrm{Hist}_{f_{i,j}}(c) - \mathrm{Hist}_{g_{i,j}}(c)\right|}{\max\left(\mathrm{Hist}_{f_{i,j}}(c),\ \mathrm{Hist}_{g_{i,j}}(c)\right)}$$

where c ranges over the histogram bins and Hist_{f_{i,j}} denotes the histogram of block (i, j) of frame f.
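As an illustration of these definitions (not the exact code used in the project), the sketch below computes a 40×40 UV histogram and the two difference measures with OpenCV and NumPy; the function names, the block grid m = 4 and the small eps term are our assumptions.

import cv2
import numpy as np

BINS = 40   # 40x40 quantization of the UV plane
M = 4       # m x m block grid for the local histogram (illustrative value)

def uv_histogram(frame_bgr, bins=BINS):
    # Histogram over the U and V channels of a frame.
    yuv = cv2.cvtColor(frame_bgr, cv2.COLOR_BGR2YUV)
    hist = cv2.calcHist([yuv], [1, 2], None, [bins, bins], [0, 256, 0, 256])
    return hist.flatten()

def global_diff(hist_f, hist_g, eps=1e-6):
    # Normalised histogram difference, summed over all colour bins.
    return float(np.sum(np.abs(hist_f - hist_g) / (np.maximum(hist_f, hist_g) + eps)))

def local_diff(frame_f, frame_g, m=M):
    # Sum of block-wise histogram differences over an m x m grid.
    h, w = frame_f.shape[:2]
    bh, bw = h // m, w // m
    total = 0.0
    for i in range(m):
        for j in range(m):
            block_f = frame_f[i * bh:(i + 1) * bh, j * bw:(j + 1) * bw]
            block_g = frame_g[i * bh:(i + 1) * bh, j * bw:(j + 1) * bw]
            total += global_diff(uv_histogram(block_f), uv_histogram(block_g))
    return total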

Data: frame number f
Result: feature vector corresponding to that frame
V ← vector of size 60, all entries initialised to 0
index ← f − 1
for i ← f − 1 down to max(f − 30, 0) do
    V(index − i) ← GlobalDiff(f, i)
    V(index + 30 − i) ← LocalDiff(f, i)
end
return V

Algorithm 1: Feature representation
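A rough Python rendering of Algorithm 1 (assuming the uv_histogram, global_diff and local_diff helpers sketched above and a list frames of decoded frames; the variable names are ours):

import numpy as np

def feature_vector(frames, f):
    # 60-dimensional feature for frame f: global differences (slots 0..29)
    # and local differences (slots 30..59) against up to 30 preceding frames.
    V = np.zeros(60)
    hist_f = uv_histogram(frames[f])
    for k, i in enumerate(range(f - 1, max(f - 31, -1), -1)):
        V[k] = global_diff(hist_f, uv_histogram(frames[i]))
        V[k + 30] = local_diff(frames[f], frames[i])
    return V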

• Training and testing were done using the K-fold cross-validation method with K = 3.

• For cut and fade classification on the testing data we used the K-NN algorithm with K = 5 (a minimal sketch of this setup follows the list).
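A minimal scikit-learn sketch of this evaluation setup (illustrative only; X is the matrix of 60-dimensional feature vectors and y the corresponding boundary labels, both assumed to have been built already):

from sklearn.model_selection import cross_val_score
from sklearn.neighbors import KNeighborsClassifier

def evaluate_boundary_classifier(X, y):
    # 3-fold cross-validation of a 5-NN cut/fade classifier.
    # X: one 60-dim feature vector per frame, y: label (cut / fade / none).
    knn = KNeighborsClassifier(n_neighbors=5)
    scores = cross_val_score(knn, X, y, cv=3)
    return scores.mean()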

2.2 Frame classification

A frame in a cricket match video can be classified into 7 major classes: {Ground view, Batsman view, Crowd view, Long shot, Pitch view, Fielder view, Others}. This is an important step in the analysis of the video for demarcating any major activity during the match, such as the start of a ball, stroke play by the batsman, or a fielding attempt. Most common activities on the field can be tracked using these classes. Say we want to find out when the bowler starts to bowl: at the start of a new ball there is usually a cut in the video.


Figure 3: Different classes of frames in video

Figure 4: Multi-class SVM

After the cut there is a sequence of frames showing the pitch view. Such heuristics can help find patterns in broadcast videos.

For the classification we first tried the grass-pixel-ratio, colour-based approach [6] to classify the frames into field view (Ground view, Long shot and Pitch view) and non-field view (Crowd view, Batsman view, Fielder view and Others). However, supervised learning methods gave better results than this approach, since it depends on the colour of the grass and the time of the game-play. We then used naive Bayes and K-NN approaches to classify the frames, but the results were still not very accurate. The major misclassifications were that these algorithms classified Pitch views as Ground views and also mixed up Batsman views and Crowd views. Batsman and Crowd views got mixed because most of the time the background of a batsman view is full of crowd; also, spectators often wear the same colours as the teams they cheer for, so the algorithms misclassify them. In our work we did not require explicit classification into Batsman view, Crowd view, etc., and thus classified into only 4 major classes: {Ground view, Pitch view, Fielder view, Others}. Finally we used a multi-class SVM to classify the frames into these classes (Figure 4).

We split the single-camera shots obtained in Step 1 into parts where the batsman plays a stroke. We observe that the start of a ball is marked by a series of frames classified as pitch view (when the bowler takes his run-up), so we know that the batsman will play the ball within the next few frames. If the batsman plays a shot, say a cover drive, there is a sequence of ground views after a cut in the video. If the batsman misses the ball, we see no cut and a sequence of fielder views follows (showing the bowler or the wicket-keeper). The third case is when the batsman plays a defensive shot: in this case we find a longer sequence of pitch views followed by a cut or fade in the video.
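These heuristics can be summarised as a simple rule over the per-frame class labels of a segment. The sketch below is illustrative only; the class names and the pitch-view length threshold are our assumptions, not tuned values from the project.

def classify_event(labels, had_cut, defensive_pitch_run=40):
    # Classify what follows a ball start from the sequence of per-frame
    # class labels ('pitch', 'ground', 'fielder', 'other') of the segment.
    # had_cut: whether a cut/fade was detected right after the pitch-view run.
    pitch_run = 0
    for lbl in labels:
        if lbl == 'pitch':
            pitch_run += 1
        else:
            break
    if pitch_run >= defensive_pitch_run:
        return 'defensive shot'      # long pitch-view run, then a cut/fade
    if had_cut and 'ground' in labels:
        return 'stroke played'       # cut followed by a ground view
    if not had_cut and 'fielder' in labels:
        return 'ball missed'         # no cut; fielder (bowler/keeper) view follows
    return 'unknown'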

2.3 Stroke classification using optical flow

For stroke classification we first tried the tree-structure approach [4] for batsman pose estimation, but this approach did not work. We then used an optical flow approach to classify the direction of the stroke played into one of 4 directions.

After obtaining the video segments in which the batsman plays a stroke from Step 2, we first split each segment into a set of images and then compute the net optical flow using Algorithm 2 [3]. It can be observed that the direction of motion of the significant pixels (those whose velocity vectors are at least the average velocity) is opposite to the direction of the stroke played. This is because the camera pans and zooms along that direction, and thus the pixels show the opposite motion. For example, Figure 6 shows the optical flow field for a cover drive, i.e. a stroke played in direction 2. We need to consider only the significant pixels, as the others tend to give false results due to players moving in random directions (the batsman usually moves forward on the pitch, the bowler is completing his run-up, etc.).

Using the video segments we can also determine whether the batsman missed the ball or played a defensive shot. When the batsman misses the ball, we can classify the event using frame classification alone (as stated in Section 2.2); for a defensive shot we find that the velocity vectors are very small in magnitude, which we combine with the frame-classification approach.


Figure 5: Tree-structure approach

Figure 6: Classification of stroke direction into 4 directions

Data: continuous sequence of frames F (|F| ≥ 2)
Result: direction of stroke
[MVx, MVy] ← OpticalFlow(F)                    // using [3]
[x, y] ← size(MVx)                             // all images are n × n
for i ← 1 to 4 do
    count[i] ← 0                               // number of significant pixels moving in direction i
end
AvgVelocity ← ComputeAvgVelocity(MVx, MVy)
for i ← 0 to x do
    for j ← 0 to y do
        if velocity(pixel(i, j)) > AvgVelocity then
            count[Angle(pixel(i, j))] ← count[Angle(pixel(i, j))] + 1
            // Angle(pixel(i, j)) and velocity(pixel(i, j)) are computed from MVx, MVy
        end
    end
end
return direction with maximum count

Algorithm 2: Computing direction of stroke
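A compact sketch of Algorithm 2 in Python, using OpenCV's dense Farnebäck optical flow in place of the Horn–Schunck flow of [3] (which is not bundled with OpenCV); the quantisation of angles into four 90° quadrants is our assumption:

import cv2
import numpy as np

def stroke_direction(frames_gray):
    # frames_gray: list of consecutive grayscale frames (uint8) of the segment.
    # Returns the dominant motion quadrant (1..4) of the significant pixels;
    # note that this motion is opposite to the direction of the stroke.
    counts = np.zeros(4)
    for prev, nxt in zip(frames_gray, frames_gray[1:]):
        flow = cv2.calcOpticalFlowFarneback(prev, nxt, None,
                                            0.5, 3, 15, 3, 5, 1.2, 0)
        mvx, mvy = flow[..., 0], flow[..., 1]
        speed = np.sqrt(mvx ** 2 + mvy ** 2)
        angle = np.arctan2(mvy, mvx)               # in (-pi, pi]
        significant = speed > speed.mean()         # pixels faster than average
        quadrant = ((angle[significant] + np.pi) // (np.pi / 2)).astype(int) % 4
        counts += np.bincount(quadrant, minlength=4)
    return int(counts.argmax()) + 1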

3 Results and Datasets

3.1 Dataset

There is no standard dataset available for cricket match analysis, and hence we had to create our own dataset. For this we chose an 8-over cricket match video played between Australia and England (25 fps). Training and testing were done using the K-fold cross-validation method with K = 3. Approximately 29,000 frames were manually analyzed for training purposes and about 14,000 frames were used for testing.

3.2 Result

The results for shot boundary detection are given in Table 1: we achieved comparable results for cut detection and very good results for fades. We achieved good recall values for finding the video segments in which the bowler bowls and the batsman plays a stroke or a defensive shot:

• Recall: 90.90%

• Precision: 50%

The overall accuracy of stroke direction classification using optical flow analysis of the video segments is 80%. We also achieved good results in detecting defensive shots and balls missed by the batsman.


Figure 7: Results

Transition | Total Present | Total Obtained | Correctly Detected | Precision | Recall
Cuts       | 82            | 81             | 73                 | 90.12%    | 89.02%
Gradual    | 46            | 47             | 39                 | 82.97%    | 84.78%

Table 1: Shot boundary detection

Frame Class  | Total Present | Total Obtained | Correctly Classified | Precision | Recall
Pitch View   | 571           | 573            | 499                  | 87.00%    | 87.39%
Ground View  | 773           | 781            | 732                  | 93.70%    | 94.60%
Fielder View | 453           | 400            | 373                  | 93.25%    | 82.30%

Table 2: Frame classification

Figure 8: Examples of common errors in shot boundary detection. Misses tend to occur in frames with similar backgrounds, while false positives frequently occur when there is a gradual change between frames.

3.3 Analysis of Error

In shot boundary detection we observed 7 cuts correctly. Most of the false positives are observed in frames with a fade/gradual change (Figure 8), while misses are observed in frames with similar backgrounds. For example, when the camera shows the crowd view in two shots separated by a cut, the background is almost the same and the classifier misses the cut.

4 Acknowledgements

We would like to thank Prof. Amitabha Mukerjee for his valuable suggestions and guidance throughout our project. We also thank Prof. Vinay P. Namboodiri and Mr. Dipen S. Rughwani for guiding us.


References

[1] Rughwani, Dipen S. “Shot Classification and Se-mantic Query Processing on Broadcast CricketVideos.” M. Tech Thesis (2008).

[2] Singh, Khushdeep, and Aakash Verma. “TennisStroke Detection.” (2012).

[3] B.K.P. Horn and B.G. Schunck, “Determiningoptical flow.” Artificial Intelligence, vol 17, pp185–203, 1981.

[4] Ramanan, Deva. “Learning to parse images ofarticulated bodies.” NIPS. Vol. 1. No. 6. 2006.

[5] Irena Koprinska and Sergio Carrato. "Temporal video segmentation: A survey." Signal Processing: Image Communication, vol. 16(5), pp. 477–500, 2001.

[6] Kolekar, Maheshkumar H., Kannappan Palaniap-pan, and Somnath Sengupta. “Semantic eventdetection and classification in cricket video se-quence.” Computer Vision, Graphics & ImageProcessing, 2008. ICVGIP’08. Sixth Indian Con-ference on. IEEE, 2008.
