
Bachelor Computer Science

Unsupervised Monitoring and 3D Reconstruction of Single-Camera Snooker Video

Engel Hamer

June 17, 2019

Supervisor: dr. R.G. Belleman

Computer Science, University of Amsterdam


Abstract

Over the past decades computer vision has proven itself as a valuable and innovative technology in the world of sports. The strategic cue sport snooker can benefit from a computer vision application for training purposes. A traditional method of training like videotaping falls short here since it does not provide sufficient spatial awareness of the scene, which is required in the game. This research proposes a system that uses computer vision techniques to analyse a snooker match based on single-camera video footage. The analysis includes extraction of the table area, identification of balls within the scene and trajectory construction of each ball’s movement throughout the match. No human intervention is required for the execution of any of these steps. A web application is created to visualise the resulting reconstruction in a three-dimensional scene. This allows players to review their match from any desired point of view, removing the limited perspective that the video footage implies. In an evaluation of the system, 96.9% of all balls before each stroke were correctly reconstructed. Trajectories were accurate for 87.3% of the ball movements. These results indicate satisfactory accuracy for use in training.


Contents

1 Introduction
    1.1 Research structure
2 Background
    2.1 Outline of the game
    2.2 Computer vision in sports
        2.2.1 Single camera systems
        2.2.2 Vision based calibration
    2.3 Related work
3 Design
    3.1 Match analysis
        3.1.1 Positioning the camera
        3.1.2 Hardware
        3.1.3 Computer vision library
    3.2 Reconstruction
4 Implementation
    4.1 Initialisation and image pre-processing
        4.1.1 Camera undistortion
        4.1.2 Table extraction
        4.1.3 Background extraction
    4.2 Match analysis
        4.2.1 Multi-purpose tracking algorithms
        4.2.2 Ball detection
        4.2.3 Player and cue interference
        4.2.4 Potting balls
        4.2.5 Constructing trajectories
    4.3 Reconstruction
5 Experiments
    5.1 Method
        5.1.1 Reconstruction accuracy
6 Results
    6.1 Reconstruction accuracy
        6.1.1 Ball detection
        6.1.2 Trajectories
7 Discussion
    7.1 Conclusion
    7.2 Future research


CHAPTER 1

Introduction

Over the past decades computer vision has proven itself as a valuable and innovative technology in the world of sports. A wide variety of commercial applications has become available, ranging from virtual referee assistants to analysis tools for television broadcasting. Not only do these applications play a key role in enhancing viewer experience, they also enhance players’ performance, for example by keeping track of movement or other automated match statistics which aid in training and coaching [26].

The strategic cue sport snooker can benefit from a computer vision application for training purposes. The game involves discovering the optimal shot given a composition of balls on the table, which requires sufficient spatial awareness of the scene. A traditional method of training like videotaping falls short here [9]. It allows for analysing the match in hindsight but is limited to a single point of view on the table. This makes it harder to discover possible shots compared to having the ability to physically move around the table, as is often done in reality. For this reason, a more natural approach to analysing a snooker match would be to do so in a three-dimensional scene. This allows players to view the table from any desired point of view at any moment during the match, providing valuable insights into a player’s performance.

The aim of this research is to develop a system that analyses a snooker match and reconstructs it in a three-dimensional interactive scene for training purposes, based on video footage from a single camera. In order to perform the analysis, computer vision techniques will be used to detect and track the positions of balls on the table throughout the video footage. Ball trajectories will then be reconstructed from this data, for visualisation in the three-dimensional scene. Since accuracy and robustness of the reconstruction are of major importance for the usability of the system, this research addresses the following question:

How can computer vision techniques be used to create an accurate and robust system for three-dimensional reconstruction of single-camera snooker video?

1.1 Research structure

The next chapter provides background information on snooker to convey the essence of the game, as is necessary for the rest of this research. Current applications of computer vision techniques in sports are reviewed there as well, emphasising prior studies related to snooker. Both correspondences and differences compared to this research are highlighted. Chapter 3 proceeds with the design of the system. This concerns both its physical setup and the hard- and software used. Several options are considered before commencing with the actual implementation of the system. A specification of the system’s implementation, including the techniques used, follows in Chapter 4. To evaluate the research question, a real-world experiment on the accuracy of the system is conducted in Chapter 5. The results of this experiment are presented in Chapter 6. Finally, Chapter 7 answers the research question and proposes future work.


CHAPTER 2

Background

To ensure that the reader understands the essence of the game, this chapter starts by explaining a condensed version of the official snooker rules established by the World Professional Billiards and Snooker Association [27]. Following this explanation, the current state of computer vision techniques in sports is considered. Lastly, related work on the cutting edge of both topics is discussed, highlighting correspondences and differences compared to this research.

2.1 Outline of the game

Snooker is a strategic cue sport in which players compete in a sequence of individual games called frames. The playing field consists of a rectangular table with four pocket holes in the corners, and another two halfway along the longest sides. The table surface is lined with a (usually green coloured) baize, and surrounded by rubber cushions that allow balls to rebound. Table dimensions for snooker far exceed those of other cue sports such as billiards or pool. The official rules call for a playable area of 3569mm by 1778mm [27]. This increases the complexity of capturing the scene, as will be discussed in section 3.1.2. Pitch markings on the table indicate the dedicated spots of the balls as shown in Figure 2.1. In addition to a (white) cue ball, which is initially placed anywhere within the D-shaped area, 21 object balls are used: fifteen red balls (worth 1 point each), one yellow (2 points), one green (3 points), one brown (4 points), one blue (5 points), one pink (6 points) and one black ball (7 points).

Figure 2.1: Initial table layout. The white cue ball can be placed anywhere in the D-area [11].

At the beginning of a frame the first player may approach the table. Their objective is to pot (enter a pocket) one of the red balls by striking the cue ball once. If a red ball is successfully potted, the player is granted the value of the colour (1 point). They may then proceed to strike the cue ball once again, only this time a coloured (non-red) ball shall be potted. This process repeats itself, alternating between red and coloured balls, until all reds are off the table. At this point, the coloured balls shall be potted in order of ascending value. If the player fails to pot the required ball or commits a foul (referring to a stroke against the rules, such as potting the white ball) their turn ends and the other player may approach the table.

(10)

Unlike reds, coloured balls are replaced once potted. This means that whenever a coloured ball leaves the table it will be placed back on its own dedicated spot (as in Figure 2.1). If this is not possible, i.e. another ball already occupies that spot, the highest value available spot is chosen. In the exceptional case that all spots are occupied the ball is placed as close to its own spot as the situation allows, but always on the straight line between the own spot and the top cushion (closest to the black spot). Once no red balls remain on the table coloured balls are not replaced. They shall be potted in order of ascending points, after which the game is finished.
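As an illustration, the respotting rule described above can be captured in a small routine (a hypothetical sketch; the function name and spot representation are assumptions, not code from this thesis, and the "place on the line" fallback is only signalled, not computed geometrically):

```python
# Spot names keyed to ball values; the line-placement fallback is signalled by None.
SPOT_VALUES = {"yellow": 2, "green": 3, "brown": 4, "blue": 5, "pink": 6, "black": 7}

def respot_spot(colour, occupied):
    """Return the spot name where a potted colour ball is replaced.

    occupied -- set of spot names currently blocked by another ball.
    Returns None when every spot is blocked, meaning the ball goes on
    the line between its own spot and the top cushion instead.
    """
    if colour not in occupied:
        return colour  # own spot is free
    # Otherwise: the highest-value spot that is free.
    for spot in sorted(SPOT_VALUES, key=SPOT_VALUES.get, reverse=True):
        if spot not in occupied:
            return spot
    return None  # all spots occupied
```

For example, a potted blue whose own spot is blocked goes to the black spot if that is free, matching the "highest value available spot" clause.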

2.2 Computer vision in sports

The difference between triumph and demise in modern day sports sometimes comes down to millimetres or milliseconds. This puts pressure on athletes and their coaches, but also makes it difficult for viewers to keep up. For this reason, various studies have researched the usability of computer vision to aid sports analysis. A recent survey by Thomas et al. provides an extensive overview of currently available commercial applications [26]. Most of these are related to television broadcasting, but some are also used by trainers and coaches to analyse and improve players’ performance.

The most common tasks these applications are concerned with are detection and tracking of certain subjects in the playing field, such as players or balls. An example of this is the Hawk-Eye tennis system, which reconstructs tennis ball trajectories based on video footage from cameras placed around the court [23]. The system takes advantage of prior knowledge about tennis, including specific properties of the ball. Candidate ball regions are first detected in the images based on shape and size. These are then compared with plausible tracks and matched between the different camera views. With a reported error of 2.6 mm, the reconstruction is sufficiently accurate for use as a referee tool in official tournaments. Similar systems exist for precision problems in other sports, such as goal line crossings in soccer and fouls in cricket [12, 26].

2.2.1 Single camera systems

Whilst the above systems make use of multiple cameras to accurately reconstruct the scene, it is not impossible to do so using only a single camera. An approach presented by the R&D department of the BBC first performs background subtraction on the camera image based on chroma keying techniques [8]. The foreground objects that remain, e.g. players or balls, are then placed in a three-dimensional model of the playing field. For enhanced quality of the reconstruction, three-dimensional shapes are generated from the objects’ silhouettes based on assumptions about their geometry. This allows audiences to view the reconstruction from any angle, making it a useful tool for television broadcasting. Figure 2.2 shows an example application of this technique during a soccer match, in an implementation developed by Ericsson.

(a) Original video frame (b) Reconstructed virtual stadium

Figure 2.2: Reconstruction of a single-camera soccer video frame (a) in a three-dimensional scene (b) for broadcasting, generated by Ericsson’s Piero Sports Graphics system [26].


2.2.2 Vision based calibration

Formerly, cameras had to be equipped with special gyroscopic sensors and manually calibrated to the scene in order to perform sports tracking [26]. Modern day systems, on the other hand, estimate camera position, orientation and focal length based on visible features of the playing field, for example the presence of prominent pitch marks in soccer [25]. This removes the need for manual calibration and additional (costly) camera hardware, making automated tracking systems for sports analysis easier to deploy.

2.3 Related work

Several of the aforementioned studies researching computer vision in sports are related to snooker. Again, they mostly concern television broadcasting. Guo and Mac Namee presented the Snooker Extraction and 3D Builder (SE3DB), designed to give snooker audiences a look at the table from a player’s perspective [9]. The system exploits specular highlights emerging on the surface of the balls in order to determine their locations in an overhead photograph of a miniature snooker table. Detection of both the locations and the colouring of the balls achieved successful results; however, the environment of an actual snooker match was not taken into account: no players or referees obstructed any of the table area during testing of the system. Various other studies also omitted this problem [17, 24].

In an attempt to improve the real world usability of such a system, a British study added various forms of intelligent filtering [16]. Interference from players and cues is filtered based on colour and size of the candidate ball regions. Additionally, the system was tested on footage of a real snooker match, as shown in Figure 2.3. Although these improvements increase the usability of the system, its success still highly depends on the camera’s colour accuracy. Narrow colour segmentation will not always work due to issues like motion blur, which causes colour bleeding.


CHAPTER 3

Design

Before proceeding with the implementation of the system, several possibilities regarding both hard- and software usage were considered. This chapter provides an overview of these possibilities and motivates the design choices that contributed to the final system.

The anticipated system consists of two main components:

• Match analysis — Recording of the match, as well as performing ball detection and tracking. This requires hardware that is dedicated to the system.

• Reconstruction — Displaying the match in a virtual three-dimensional model of the scene. This does not require dedicated hardware and can be performed on end users’ devices.

3.1 Match analysis

First, different options for the physical setup of the system are considered. The system’s hardware is chosen accordingly: it should suit the needs of this setup, as well as adhere to its limitations. Finally, a software library and programming language for the actual development of the tracking system are chosen.

3.1.1 Positioning the camera

Sports ball tracking systems generally use multiple cameras [12, 23], a necessity which emerges from the sheer size of the field. Additionally, ball movement is not limited to the ground plane but takes place in three-dimensional space [26]. Naturally, a snooker table is significantly smaller than the playing field of regular ball sports. Furthermore, the official rules prohibit jump shots, where a ball leaves the table bed [27]. Since it is assumed that players obey the game rules, a single camera is sufficient for capturing the scene.

Previous studies regarding ball detection in the snooker field have used different camera mounting positions, the most obvious being directly above the table, mounted to the ceiling [9, 17]. For increased portability, a tripod setup is also possible [16, 24]. This allows for ad hoc deployment of the system, but is highly prone to players blocking the camera’s sight. Since moving around the table is common practice during gameplay, a top down ceiling mount was found most suitable for the system. Having the camera in a fixed position this way is beneficial for further processing, since calibration steps only need to be taken once.


3.1.2 Hardware

Mounting the system hardware above the table surface requires a highly compact form factor. The Raspberry Pi, as shown in Figure 3.1a, is a single board computer that fits this requirement, measuring only 85 by 49 mm. It is equipped with a quad core 1.2 GHz ARM processor and 1 GB of memory, as well as networking capabilities and a wide range of I/O. Further benefits include low power consumption and an affordable price tag of $35, making this device a suitable embodiment of the system.

(a) Board model 3B (b) Camera module V2

Figure 3.1: Raspberry Pi model 3B single board computer and camera module V2. Used for recording and tracking.

For video recording, a Raspberry Pi camera module V2, as shown in Figure 3.1b, is chosen as it is the de facto standard when working with Raspberry Pi computers. This camera connects to the board via a ribbon cable. An 8-megapixel Sony sensor provides a maximum recording resolution of 1920 by 1080 pixels at 30 frames per second. Due to the limited space above the table, combined with a prescribed table length of over 3.5 metres, the module’s standard camera lens is unable to capture the entire table surface [27]. For this reason, a fisheye lens with a 160 degree field of view is attached to the camera module (measured diagonally, thus approximating 120 degrees horizontally). For the wider field of view to take full effect, the recording resolution is adapted to a 4:3 aspect ratio at 1296 by 972 pixels.

3.1.3 Computer vision library

The official (Debian-based) operating system for the Raspberry Pi, Raspbian, is used by a large community. Many common packages are available pre-compiled for the ARM architecture of the Pi, including OpenCV, the most popular computer vision and machine learning library in use today. It operates cross-platform, contains over 2500 highly optimised algorithms and focuses on real-time applications. For this reason, OpenCV is the library of choice for performing the match analysis.

In addition to the native C++ implementation, a Python wrapper is available for the OpenCV library. Preference is given to the latter: development benefits greatly from the rapid prototyping that Python offers, whilst the system’s performance is warranted by the underlying C++ implementation. Python is also the language of choice for most Raspberry Pi projects, since it comes pre-packaged with Raspbian.

3.2 Reconstruction

Whereas match analysis takes place behind the scenes on dedicated hardware, end users of the system do interact with the three-dimensional reconstruction. To ensure easy accessibility, it is important that this reconstruction runs on readily available hardware, for example a mobile phone or a laptop computer. Selecting a suitable graphics environment is important for the usability and performance of the reconstruction, especially given these hardware limitations. Two candidates were considered for this research: Unity and A-Frame.

1: https://www.raspberrypi.org/products/raspberry-pi-3-model-b/
2: https://www.raspberrypi.org/products/camera-module-v2/
3: https://www.raspberrypi.org/downloads/raspbian/
4: https://www.opencv.org


Unity is a well-established graphics engine used for creating games, simulations and other experiences. A-Frame was created more recently, and provides a web framework for building three-dimensional and virtual reality scenes using HTML. Since the reconstruction scene consists of only a table model and balls, the functionality offered by both platforms is sufficient for these purposes. Device compatibility is also excellent: Unity supports 25 different platforms, and A-Frame relies on WebGL, which operates in all modern browsers. A benefit of using A-Frame is that no installation is required before use, contributing slightly more to the accessibility of the reconstruction than Unity. Additionally, A-Frame is open source software that can be freely used, whereas Unity is proprietary. This makes A-Frame the better platform for implementing the reconstruction.

5: https://unity3d.com
6: https://aframe.io


CHAPTER 4

Implementation

This chapter provides an overview of the techniques used by the system to analyse and reconstruct a match. In the first section, the system’s initialisation stage is described. This stage ensures all prerequisites for the actual match analysis are met, and calibrates the system according to the scene. After that follows a specification of the match analysis procedure. The chapter concludes with a brief description of the three-dimensional reconstruction. A schematic representation of the system’s image pipeline containing all components is shown in Figure 4.1.

Figure 4.1: The system’s image pipeline, consisting of (a) initialisation (camera calibration, camera undistortion of the video stream, table extraction, perspective transform estimation and background extraction from the initial video frame), (b) match analysis (ball detection, player detection, potting detection and trajectory construction, i.e. tracking) and (c) reconstruction of the JSON output in a three-dimensional scene.

The system’s match analysis component is implemented using OpenCV, a well-established library which provides a multitude of useful computer vision and machine learning algorithms. The implementation is written in the Python programming language and consists of several modular scripts for its various tasks, such as camera correction, table extraction and ball detection. To ensure maintainability of the system, all of its functions are documented according to the conventions described by PEP 257. Furthermore, all source code adheres to the PEP 8 style guidelines.

1: https://www.opencv.org
2: https://www.python.org/dev/peps/pep-0257/
3: https://www.python.org/dev/peps/pep-0008/


4.1 Initialisation and image pre-processing

Before match analysis can start, the system is initialised based on the first frame of the video stream. This initialisation involves vision based calibration of the scene, as is common practice in modern sports computer vision systems [25, 26]. Several parameters that are required for image pre-processing are set here. Since the camera orientation and position remain fixed during the analysis, these parameters remain valid for all subsequent frames.

4.1.1 Camera undistortion

As described in section 3.1.2, a fisheye lens is used to capture the entire table in the camera’s frame. This type of lens is chosen for its large field of view, achieved by visual distortion of the image. Unlike regular rectilinear lenses, it gives images a convex appearance in which straight lines are displayed as curves [21]. An example of this can be seen in Figure 4.2a. The usually straight table edges show strong negative radial distortion, curving towards the top and bottom of the image.

(a) Original image (b) Undistorted image

Figure 4.2: Original (a) and undistorted image (b) of a snooker table at the beginning of a game, taken with a 160 degree (diagonal) fisheye lens.

To understand why this distortion occurs and how to correct for it, the pinhole camera model is used. For any given coordinate vector $X_w$ in the world reference frame, the corresponding coordinate vector in the camera reference frame is computed by [21]:

$$\begin{pmatrix} x \\ y \\ z \end{pmatrix} = R X_w + T \tag{4.1}$$

where $R$ and $T$ denote a rotation and translation, respectively. In this model, the pinhole projection $(a, b)^T$ of the three-dimensional point in the camera reference frame is then:

$$a = x/z \tag{4.2}$$
$$b = y/z \tag{4.3}$$
$$r^2 = a^2 + b^2 \tag{4.4}$$
$$\theta = \arctan(r) \tag{4.5}$$

From here, the fisheye lens introduces a distortion from the pinhole coordinates to the distorted coordinates $(x', y')^T$ in the image plane:

$$\theta_d = \theta\,(1 + k_1\theta^2 + k_2\theta^4 + k_3\theta^6 + k_4\theta^8) \tag{4.6}$$
$$x' = (\theta_d/r)\,a \tag{4.7}$$
$$y' = (\theta_d/r)\,b \tag{4.8}$$


where $k_1, \ldots, k_4$ are the distortion coefficients of the lens. For fisheye lenses these coefficients are positive, which is why points located far away from the centre of the camera’s view (and thus yielding a high value for $r$) appear more distorted than points near the centre.

OpenCV provides functionality for approximating the distortion coefficients as well as the internal camera matrix via a checkerboard calibration procedure. A remap can then be constructed based on this approximation, allowing distorted points to be transformed back to their original location [21]. By using pixel interpolation (in this case bilinear interpolation) the whole camera view can be corrected with this technique. The result of this is shown in Figure 4.2b, where it becomes clear that the table has been transformed back to its original rectangular shape.

Since the distortion coefficients and camera matrix are intrinsic to the camera and lens combination, the calibration step with the checkerboard pattern only has to be performed once for the entire system. From here, the remapping of the distorted pixels takes place for every frame, to provide an undistorted image for the following steps.
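For illustration, the distortion model of equations (4.2)–(4.7) (and the corresponding y′-coordinate) can be evaluated directly in NumPy. This is a toy sketch with a hypothetical helper name; the system itself relies on OpenCV’s calibration and remapping routines:

```python
import numpy as np

def fisheye_distort(point_cam, k):
    """Project a 3D camera-frame point through the fisheye model.

    point_cam -- (x, y, z) in the camera reference frame
    k         -- distortion coefficients (k1, k2, k3, k4)
    Returns the distorted image-plane coordinates (x', y').
    """
    x, y, z = point_cam
    a, b = x / z, y / z                  # pinhole projection (4.2), (4.3)
    r = np.hypot(a, b)                   # radial distance (4.4)
    theta = np.arctan(r)                 # incidence angle (4.5)
    theta_d = theta * (1 + k[0]*theta**2 + k[1]*theta**4
                         + k[2]*theta**6 + k[3]*theta**8)  # (4.6)
    if r == 0:
        return 0.0, 0.0                  # point on the optical axis
    return (theta_d / r) * a, (theta_d / r) * b
```

With all coefficients zero the model reduces to a pure equidistant projection; positive coefficients push off-centre points (large r) further outwards, which is the curving effect visible in Figure 4.2a.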

4.1.2 Table extraction

The next stage in the system involves localisation and extraction of the table area. Since all other pixels in the image are irrelevant for the analysis, this area is called the region of interest (ROI). Extracting this ROI in subsequent frames simplifies the analysis, since only relevant pixels are evaluated. As mentioned in section 2.1, the table surface in snooker is usually green coloured. Previous studies have demonstrated the importance of using such domain-specific knowledge about the scene [9, 16, 23, 24]. For this reason, the system performs table detection based on colour segmentation. To make this segmentation invariant to lighting artefacts, the image is first converted from RGB to HSV colour space. By applying a range threshold on the hue channel for every pixel in the image, a boolean image of candidate table areas is constructed, such that:

$$\mathrm{dst}(x, y) = \begin{cases} 1 & \text{if } \mathrm{src}_H(x, y) \in [H_0, H_1] \\ 0 & \text{otherwise} \end{cases} \tag{4.9}$$

for all x, y in the image, where

• dst(x, y) denotes the value in the destination image at position (x, y).

• src_H(x, y) denotes the value in the hue channel of the source image at position (x, y).

• H_0 and H_1 denote the green hue range. In OpenCV this corresponds to [30, 75].

Note: this threshold can easily be adapted to target a different coloured baize.
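The threshold of equation (4.9) amounts to a vectorised comparison on the hue channel, equivalent to what OpenCV’s inRange performs. A NumPy sketch (the HSV array layout and helper name are assumptions for illustration):

```python
import numpy as np

def hue_threshold(hsv, h0=30, h1=75):
    """Boolean mask implementing equation (4.9): 1 where the hue
    channel lies within [h0, h1] (OpenCV's green range), else 0.

    hsv -- H x W x 3 array with the hue channel at index 0.
    """
    hue = hsv[..., 0]
    return ((hue >= h0) & (hue <= h1)).astype(np.uint8)
```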

The result of this threshold applied to the image from Figure 4.2b is shown in Figure 4.3a. To detect which pixels in the image belong to the table area, connected component labelling is applied [4]. This algorithm labels every pixel in the image such that pixels forming a blob in the image correspond. The largest labelled area in the output of this algorithm is assumed to be the table area, and can be separated from the other green objects as shown in Figure 4.3b. The shape of the table area is obtained by removing all other objects from the image followed by a flood-fill from the top left corner, as shown in Figure 4.3c.

Figure 4.3: Green hue threshold (a), connected components (b) and detected table area (c), derived from Figure 4.2b.
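The labelling and largest-area selection can be illustrated with a small breadth-first flood-fill routine. This is a pure-Python sketch of the idea only; the system relies on a library implementation of connected component labelling [4]:

```python
from collections import deque

def largest_component(mask):
    """Return the set of (row, col) pixels of the largest 4-connected
    blob of 1s in a binary image, mimicking connected component
    labelling followed by largest-area selection."""
    rows, cols = len(mask), len(mask[0])
    seen, best = set(), set()
    for r in range(rows):
        for c in range(cols):
            if mask[r][c] == 1 and (r, c) not in seen:
                # Breadth-first flood fill of one component.
                comp, queue = set(), deque([(r, c)])
                seen.add((r, c))
                while queue:
                    y, x = queue.popleft()
                    comp.add((y, x))
                    for dy, dx in ((1, 0), (-1, 0), (0, 1), (0, -1)):
                        ny, nx = y + dy, x + dx
                        if (0 <= ny < rows and 0 <= nx < cols
                                and mask[ny][nx] == 1 and (ny, nx) not in seen):
                            seen.add((ny, nx))
                            queue.append((ny, nx))
                if len(comp) > len(best):
                    best = comp
    return best
```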


Next, the locations of all four table corners need to be detected from the boolean image. First, the Canny edge detector is applied to the image [3]. The result of this will be the exact contour of the table, as shown in Figure 4.4a. Had the prior connected component labelling or flood-fill steps not taken place, then this edge image would have been polluted with various other edges such as ball outlines or pitch marks. For detecting the straight lines spanning the table area in the Canny edge image, the parametric notion of lines by means of polar coordinates is considered:

r = x cos(θ) + y sin(θ) (4.10)

where

• r denotes the length of a normal from the origin to the line.

• θ denotes the rotation angle between r and the x-axis.

Given a single point in the Cartesian coordinate space, the set of all straight lines going through that point can be represented by a unique sinusoidal curve in the (r, θ)-space. This observation forms the basis for the Hough lines transform, which involves plotting these sinusoids for all points in the detected edge segments [6]. A set of collinear points in the Cartesian coordinate space thereby becomes readily apparent, since it yields curves which intersect at a common point. The result of this transformation applied to the edges from Figure 4.4a is shown in Figure 4.4b. In this example the four intersection points representing the table edges are clearly visible.

To reduce the computational complexity of the line detection step, the continuous curves in (r, θ)-space are approximated using a quantised voting system [6]. Both the r and θ axes are binned with a step size of 1. This results in a two-dimensional accumulator where each cell contains the number of points on the corresponding line in the Cartesian image plane. For the actual table edge detection, all accumulator cells with a bin count higher than 40 are considered as potential candidates. This filters out false detections and ensures a sufficient line length. Remaining lines are categorised based on their positioning and slope as either the top, right, bottom or left edge. In each category the longest line is picked, since it is expected to resemble the edge most accurately. The table corners are obtained by computing the intersection points between these lines, extrapolating them when necessary.

(a) Canny edge detections (b) Hough lines transform (c) Detected corners

Figure 4.4: Canny edge detection (a), Hough lines transform (b) and table corners (c), detected from Figure 4.3c.
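For illustration, the quantised voting scheme can be sketched in a few lines, using the 1-pixel and 1-degree bins described above. This is a toy stand-in for OpenCV’s HoughLines; the helper name and peak-only return value are assumptions:

```python
import math

def hough_peak(points, r_max):
    """Vote edge points into a quantised (r, theta) accumulator and
    return the (r, theta-in-degrees) bin that received the most votes,
    i.e. the most prominent straight line through the points."""
    acc = {}
    for x, y in points:
        for t_deg in range(180):          # theta binned in 1-degree steps
            t = math.radians(t_deg)
            r = round(x * math.cos(t) + y * math.sin(t))  # 1-pixel r bins
            if -r_max <= r <= r_max:
                acc[(r, t_deg)] = acc.get((r, t_deg), 0) + 1
    return max(acc, key=acc.get)
```

In the real pipeline one would keep every bin whose count exceeds the threshold of 40 rather than only the single peak, then categorise the surviving lines by position and slope.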

Finally, the detected table area is extracted from the image. Cropping would fall short here, because the camera’s viewpoint is likely not perfectly perpendicular to the centre of the table. Therefore, a perspective transformation is required to correct the camera view. This maps points from one image directly to another coordinate frame whilst maintaining straight lines within the original image. Incidentally, this somewhat relaxes the requirement for mounting the system directly above the table, allowing camera perspectives at a slight angle [16]. Table 4.1 describes the mapping from points in the video frame to the resulting extracted image. To maintain the table’s original aspect ratio as defined in the official rules the dimensions of the output image will be 1200 by 600 pixels [27].


Table 4.1: Point mapping for table extraction, based on a 1200 by 600 pixels output image.

                i   (x_i, y_i)   (x'_i, y'_i)
Top left        1   (x_1, y_1)   (0, 0)
Top right       2   (x_2, y_2)   (1199, 0)
Bottom left     3   (x_3, y_3)   (0, 599)
Bottom right    4   (x_4, y_4)   (1199, 599)

A perspective transformation from coordinates $(x, y)$ in the input image to coordinates $(x', y')$ in the resulting image is described by [15]:

$$s \begin{pmatrix} x' \\ y' \\ 1 \end{pmatrix} = \begin{pmatrix} a & b & c \\ d & e & f \\ g & h & i \end{pmatrix} \begin{pmatrix} x \\ y \\ 1 \end{pmatrix} \tag{4.11}$$

Ignoring the scale factor $s$, this gives for $x'$ and $y'$:

$$x' = \frac{ax + by + c}{gx + hy + i} \tag{4.12}$$

$$y' = \frac{dx + ey + f}{gx + hy + i} \tag{4.13}$$

This can be rewritten as:

$$x'(gx + hy + i) = ax + by + c \tag{4.14}$$
$$y'(gx + hy + i) = dx + ey + f \tag{4.15}$$
$$0 = ax + by + c - x'(gx + hy + i) \tag{4.16}$$
$$0 = dx + ey + f - y'(gx + hy + i) \tag{4.17}$$

Consequently, in matrix form:

$$\begin{pmatrix} 0 \\ 0 \end{pmatrix} = \begin{pmatrix} x & y & 1 & 0 & 0 & 0 & -x'x & -x'y & -x' \\ 0 & 0 & 0 & x & y & 1 & -y'x & -y'y & -y' \end{pmatrix} \begin{pmatrix} a \\ b \\ c \\ d \\ e \\ f \\ g \\ h \\ i \end{pmatrix} \tag{4.18}$$

Stacking the equations for all four projected points:

$$\begin{pmatrix} 0 \\ 0 \\ \vdots \\ 0 \\ 0 \end{pmatrix} = \begin{pmatrix} x_1 & y_1 & 1 & 0 & 0 & 0 & -x'_1 x_1 & -x'_1 y_1 & -x'_1 \\ 0 & 0 & 0 & x_1 & y_1 & 1 & -y'_1 x_1 & -y'_1 y_1 & -y'_1 \\ \vdots & \vdots & \vdots & \vdots & \vdots & \vdots & \vdots & \vdots & \vdots \\ x_4 & y_4 & 1 & 0 & 0 & 0 & -x'_4 x_4 & -x'_4 y_4 & -x'_4 \\ 0 & 0 & 0 & x_4 & y_4 & 1 & -y'_4 x_4 & -y'_4 y_4 & -y'_4 \end{pmatrix} \begin{pmatrix} a \\ b \\ \vdots \\ h \\ i \end{pmatrix} \tag{4.19}$$

$$0 = A\vec{p} \tag{4.20}$$

Since the method for extracting the four corner points ensures that they are never collinear, this equation should theoretically have one non-trivial solution for the parameter vector $\vec{p}$ given $A$.


However, noise introduced during point detection makes this highly unlikely in practice. The best approximation of $\vec{p}$ minimises $\|A\vec{p}\|$ subject to the constraint $\|\vec{p}\| = 1$, to avoid the trivial solution of the zero vector. Given the singular value decomposition $A = UDV^T$ and $\vec{q} = V^T\vec{p}$, finding $\vec{p}$ comes down to:

$$\min \|A\vec{p}\| \quad \text{s.t.} \quad \|\vec{p}\| = 1 \tag{4.21}$$

which can be rewritten as:

$$\min \|UDV^T\vec{p}\| \quad \text{s.t.} \quad \|\vec{p}\| = 1 \tag{4.23}$$
$$\min \|DV^T\vec{p}\| \quad \text{s.t.} \quad \|\vec{p}\| = 1 \tag{4.24}$$
$$\min \|D\vec{q}\| \quad \text{s.t.} \quad \|V\vec{q}\| = 1 \tag{4.25}$$
$$\min \|D\vec{q}\| \quad \text{s.t.} \quad \|\vec{q}\| = 1 \tag{4.26}$$

Since $D$ is a diagonal matrix with the singular values sorted along the diagonal, $\vec{q} = (0, 0, \cdots, 0, 1)^T$ solves this minimisation. This means that $\vec{p} = V\vec{q}$, the last column of $V$ [15]. Reshaping $\vec{p}$ into a 3 by 3 matrix completes the perspective transformation. Figure 4.5 shows the extracted table area obtained through this transformation.

(a) Original undistorted image (b) Extracted table area

Figure 4.5: Original image (a) and extracted table area (b) using the inverted perspective transformation and bilinear interpolation.
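As an illustration, the SVD-based solution derived above can be sketched in a few lines of NumPy. The corner coordinates below are hypothetical stand-ins for detected values; with four exact point correspondences the estimated transformation maps each corner onto its target.

```python
import numpy as np

def estimate_homography(src, dst):
    """Estimate the 3x3 perspective transformation mapping src points
    to dst points via the SVD-based least-squares solution above."""
    rows = []
    for (x, y), (xp, yp) in zip(src, dst):
        # One pair of rows of A per point correspondence (eq. 4.19).
        rows.append([x, y, 1, 0, 0, 0, -xp * x, -xp * y, -xp])
        rows.append([0, 0, 0, x, y, 1, -yp * x, -yp * y, -yp])
    A = np.array(rows, dtype=float)
    # p is the right singular vector belonging to the smallest singular
    # value, i.e. the last row of Vh in A = U D Vh.
    _, _, Vh = np.linalg.svd(A)
    return Vh[-1].reshape(3, 3)

# Hypothetical detected corners and the 1200 by 600 target of Table 4.1.
src = [(100, 50), (1100, 60), (90, 900), (1150, 880)]
dst = [(0, 0), (1199, 0), (0, 599), (1199, 599)]
H = estimate_homography(src, dst)

# Verify: each corner maps onto its target after dehomogenisation.
for (x, y), (xp, yp) in zip(src, dst):
    u, v, w = H @ np.array([x, y, 1.0])
    assert abs(u / w - xp) < 1e-3 and abs(v / w - yp) < 1e-3
```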

4.1.3 Background extraction

Initial ball positions at the start of each game are inherent to the game, bounded by a predefined region. This region is spanned by the green and yellow ball spots as the vertical boundaries, and the D-area and black ball spot as the horizontal boundaries [27]. By blurring only this region in the first video frame, a grayscale background image of the scene is generated. This is used to perform ball detection during match analysis, as described in section 4.2.2. The use of a 120 by 120 pixels box blur kernel completely obscures all balls, as shown in Figure 4.6b. Pocket holes and edge cushions remain untouched.

(a) Initial ball area (red) (b) Extracted table background

Figure 4.6: Initial ball area (a) and extracted table background (b) generated from the first video frame.
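A minimal sketch of the region-restricted box blur, using a small synthetic image and kernel for brevity (the system itself uses a 120 by 120 kernel on the real frame; all sizes below are illustrative):

```python
import numpy as np

def box_blur_region(image, region, k):
    """Blur only the region (y0, y1, x0, x1) of a grayscale image with
    a k-by-k box (mean) kernel; pixels outside the region are kept."""
    y0, y1, x0, x1 = region
    out = image.astype(float).copy()
    pad = k // 2
    padded = np.pad(image.astype(float), pad, mode="edge")
    for y in range(y0, y1):
        for x in range(x0, x1):
            # Window in padded coordinates is centred on (y, x).
            out[y, x] = padded[y:y + k, x:x + k].mean()
    return out

# A bright "ball" on a uniform table surface vanishes into the mean.
table = np.full((40, 40), 50.0)
table[18:22, 18:22] = 255.0
bg = box_blur_region(table, (0, 40, 0, 40), 15)
assert bg.max() < 100  # the ball no longer stands out in the background
```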

4.2 Match analysis

Once all prerequisites for camera correction and table extraction are in place the actual match analysis can start. This section provides a detailed overview of the techniques used to perform this analysis. These are finally combined into the ball tracking routine described at the end of this section.

4.2.1 Multi-purpose tracking algorithms

To investigate the applicability of some of today's most well-known multi-purpose tracking algorithms for this system, several preliminary prototypes of the ball tracking routine were made. Individual balls were manually selected in the first frame of a short full-colour video sequence containing several strokes. Next, a variety of algorithms were applied to track them. Some of these are known for their high accuracy, like CSRT [18] and KCF [10]. TLD [14], on the other hand, excels at handling occlusion over multiple frames, and MOSSE [2] holds a reputation for its high processing speed. The other algorithms that were used are MIL [1] and MedianFlow [13], as well as the older Boosting tracker [7].

Compared to other applications of object tracking, tracking snooker balls is particularly challenging due to their highly similar appearance and lack of features. Additionally, each ball covers only a small pixel area in the image. These circumstances caused frequent mismatches between balls for most of the trackers. Others were unable to perform any tracking at all, reporting stationary ball locations throughout the entire video. Consequently, none of the investigated tracking algorithms were considered sufficient for the system.

4.2.2 Ball detection

While prototyping the match analysis component several ball detection methods were investigated. The first version of the system applied a circle-based version of the Hough line transform to an edge detection image of the table, in an attempt to detect the balls' outlines [6]. While balls were occasionally detected with this method, it often failed, likely because the balls are too small or because of slight distortions in their circular shape caused by the perspective transformation from section 4.1.2. Additionally, this method is easily broken by motion blur and the morphed appearance of fast-moving balls.

In the final version of the system ball detection is performed based on background subtraction, a method that proved successful in a previous study [16]. Subtracting the previously generated background image (see section 4.1.3) from a grayscale video frame yields a difference image as shown in Figure 4.7b. This only shows intensity for pixels that are brighter than the background. The reflective surface of snooker balls results in strong specular highlights on each ball, which are much brighter than the matte table surface. This appears to be a consistent feature across most imagery and video of snooker tables; therefore, the difference image contains at least some intensity at every ball location [9]. To filter out any noise and make the ball locations more apparent a threshold is applied to the difference image:

    dst(x, y) = { 1   if src(x, y) − bg(x, y) ≥ 30
                { 0   otherwise                                       (4.27)

for all x, y in the image, where

• dst(x, y) denotes the value in the destination image at position (x, y).
• src(x, y) denotes the intensity value in the source image (current frame) at position (x, y).
• bg(x, y) denotes the intensity value in the background image at position (x, y).


(a) Original frame (b) Intensity differences (c) Thresholded differences

Figure 4.7: Original video frame (a), unprocessed (b) and thresholded (c) positive intensity differences with generated background image from Figure 4.6b.
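Equation 4.27 translates almost directly into array operations. A sketch with a synthetic frame and background; the function name and image sizes are illustrative only:

```python
import numpy as np

def ball_mask(frame, background, thresh=30):
    """Binary mask of pixels brighter than the background by at least
    `thresh`, following equation 4.27 (positive differences only)."""
    # Cast to int to avoid uint8 underflow when the frame is darker.
    diff = frame.astype(int) - background.astype(int)
    return (diff >= thresh).astype(np.uint8)

background = np.full((6, 6), 60, dtype=np.uint8)
frame = background.copy()
frame[2, 2] = 200   # specular highlight on a ball
frame[4, 4] = 10    # darker pixel (e.g. a shadow) is ignored
mask = ball_mask(frame, background)
assert mask[2, 2] == 1 and mask[4, 4] == 0 and mask.sum() == 1
```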

To detect the locations of individual balls connected component labelling is applied, similar to the table extraction routine [4]. All connected components that cover a larger-than-expected area (i.e. larger than ball size) are eroded, to prevent the brightest balls from merging into a single blob when near each other (more on erosion in the next section). The obtained labelling is shown in Figure 4.8a. By averaging the Cartesian image coordinates of all pixels in a labelled area its centroid is computed, which corresponds to the ball location, as shown in Figure 4.8b.

(a) Connected components (b) Detected ball locations

Figure 4.8: Connected component labelling (a) and detected ball locations (b) extracted from Figure 4.7c.
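A pure-Python stand-in for the labelling-and-centroid step. The system cites a block-based labelling algorithm [4]; the BFS flood fill below is merely an illustrative equivalent on a tiny binary mask:

```python
from collections import deque

def label_centroids(mask):
    """Centroids of 4-connected components in a binary mask (list of
    rows), found via BFS flood fill; returns (x, y) per component."""
    h, w = len(mask), len(mask[0])
    seen = [[False] * w for _ in range(h)]
    centroids = []
    for y in range(h):
        for x in range(w):
            if mask[y][x] and not seen[y][x]:
                queue, pixels = deque([(y, x)]), []
                seen[y][x] = True
                while queue:
                    cy, cx = queue.popleft()
                    pixels.append((cy, cx))
                    for ny, nx in ((cy - 1, cx), (cy + 1, cx),
                                   (cy, cx - 1), (cy, cx + 1)):
                        if 0 <= ny < h and 0 <= nx < w \
                                and mask[ny][nx] and not seen[ny][nx]:
                            seen[ny][nx] = True
                            queue.append((ny, nx))
                # The average pixel coordinates give the centroid.
                my = sum(p[0] for p in pixels) / len(pixels)
                mx = sum(p[1] for p in pixels) / len(pixels)
                centroids.append((mx, my))
    return centroids

mask = [[0, 1, 1, 0, 0],
        [0, 1, 1, 0, 0],
        [0, 0, 0, 0, 1]]
assert label_centroids(mask) == [(1.5, 0.5), (4.0, 2.0)]
```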

4.2.3 Player and cue interference

So far this research has considered the 'ideal' situation where only balls are in view of the camera, and it never loses sight of any of them. Naturally, this is not a realistic scenario. In real games players constantly interact with the table, often leaning over it and blocking balls from view. To make the system robust against such interference, players' interaction with the table should be anticipated.

As on several other occasions, the official rules of the game contain a clause that can be used to advantage here: players should at all times keep at least one foot on the ground [27]. This means that regardless of how players position themselves over the table, they will always cross the table's edges in at least one place. Since the entire table area including edge cushions is in view, detecting when a player enters the scene from an edge is uncomplicated. Whereas the ball detection step considered only areas of the table where the current frame is brighter than the background image, player detection requires absolute differences. This is because not all player pixels in the image can be expected to be lighter than the table surface, for example when players wear dark clothing. Again, a threshold is applied to detect significant areas:

    dst(x, y) = { 1   if src(x, y) − bg(x, y) ≥ 30
                { 1   if bg(x, y) − src(x, y) ≥ 30
                { 0   otherwise                                       (4.28)


for all x, y in the image, where all variables are the same as in equation 4.27. An example difference image with a player in view as well as the corresponding threshold are shown in Figure 4.9.

(a) Original frame (b) Intensity differences (c) Thresholded differences

Figure 4.9: Original frame (a), unprocessed (b) and thresholded (c) absolute intensity differences with generated background image from Figure 4.6b. Based on a video sequence where a player strokes the cue ball.

From this example, it becomes clear that not all pixels that belong to the player area are interconnected. This is undesired behaviour caused by insufficient intensity differences between the player’s body and the background image. To overcome this problem the thresholded image is dilated. This morphological operation expands the shapes contained in the source image according to the following formula [22]:

    dst(x, y) = max over (x', y') with element(x', y') ≠ 0 of src(x + x', y + y')    (4.29)

for all x, y in the image, where

• dst(x, y) denotes the value in the destination image at position (x, y).
• src(x + x', y + y') denotes the intensity value in the source image (current frame) at position (x + x', y + y').
• x' and y' denote offset values within the structuring element.

Possible offset values used here range between -5 and 5 pixels. This allows 10 pixel gaps to be filled whilst preventing balls adjacent to edge cushions from touching the image boundaries. The resulting dilated image is shown in Figure 4.10a. Since only pixels that are connected to one of the table edges are of interest for player detection, connected component labelling is used to extract only these pixels from the image. The mask obtained through this labelling is shown in Figure 4.10b and highlighted in the original image in Figure 4.10c. Section 4.2.5 explains how it is further used in the system.

(a) Dilated differences (b) Connected edge pixels (c) Detected player area

Figure 4.10: Dilated threshold image (a), all pixels that are connected to the edges of the image (b) and detected player area (c) from the thresholded intensity differences in Figure 4.9c.
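Equation 4.29 with a square structuring element can be sketched as follows. The one-dimensional mask below is synthetic, chosen to show two fragments 10 pixels apart merging under offsets of -5 to 5:

```python
import numpy as np

def dilate(mask, r=5):
    """Dilation with a (2r+1)-by-(2r+1) square element, following
    equation 4.29: each output pixel is the maximum over the window."""
    h, w = mask.shape
    out = np.zeros_like(mask)
    for y in range(h):
        for x in range(w):
            y0, y1 = max(0, y - r), min(h, y + r + 1)
            x0, x1 = max(0, x - r), min(w, x + r + 1)
            out[y, x] = mask[y0:y1, x0:x1].max()
    return out

# Two fragments with a 10-pixel gap merge after dilation with r=5.
mask = np.zeros((1, 30), dtype=np.uint8)
mask[0, 10] = mask[0, 21] = 1
merged = dilate(mask)
assert merged[0, 10:22].all()           # the gap is bridged
assert merged[0, 4] == 0 and merged[0, 27] == 0  # elsewhere untouched
```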

Because the cue and the cue ball collide when a player strokes, the cue ball is sometimes wrongfully included in the player mask, as in Figure 4.10 above. This behaviour should be corrected, which is done using the counterpart of dilation: erosion. Rather than expanding shapes in the source image, this operation shrinks them according to the following, highly similar formula:

    dst(x, y) = min over (x', y') with element(x', y') ≠ 0 of src(x + x', y + y')    (4.30)

for all x, y in the image, where all variables are identical to those in equation 4.29, and (x0, y0), (x1, y1) denote the corners of a 20 by 20 pixels bounding box surrounding the last known location of the white ball, to which the operation is restricted. When applied to the thresholded positive difference image from the previous section (Figure 4.7c) only the bright ball remains, as shown in Figure 4.11.

(a) Original frame cutout (b) Intensity differences (c) Eroded cue ball area

Figure 4.11: Original frame cutout (a), thresholded positive intensity differences (b) with generated background image from Figure 4.6b, and eroded cue ball area (c), indicated in red.

4.2.4 Potting balls

As mentioned in section 2.1, the main goal in snooker is to pot balls. The system thus also has to detect the event of a potted ball to include it in the reconstruction. Denman et al. proposed a simple yet effective method for detecting objects leaving a scene at a particular location, showcased in the context of snooker [5]. They define two regions originating from each pocket: a small region and a larger, encompassing region. These are squares measuring 1/3 and 1/6 of the table width respectively, a size sufficient for fast-moving balls to appear within the regions while in motion. In each region the number of ball detections is counted. By computing the difference between ball counts across two consecutive frames it is possible to determine which events took place, as explained in Figure 4.12.

(a) Rebound from cushion: one fewer ball only in inner region

(b) Successful pot : one fewer ball in inner and outer region

(c) Leaving pocket area: one fewer ball only in outer region

Figure 4.12: Detected events based on ball counts in two overlapping regions originating from each pocket, based on previous (top) and current (bottom) frame.
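The ball-count bookkeeping of Figure 4.12 can be sketched as a small decision function; the function name and the dictionary format for per-region counts are illustrative assumptions:

```python
def classify_pocket_event(prev_counts, curr_counts):
    """Classify what happened near one pocket from ball counts in the
    inner and outer regions across two consecutive frames."""
    inner_lost = prev_counts["inner"] - curr_counts["inner"]
    outer_lost = prev_counts["outer"] - curr_counts["outer"]
    if inner_lost > 0 and outer_lost > 0:
        return "pot"               # ball vanished from both regions
    if inner_lost > 0:
        return "rebound"           # left the inner region only (cushion)
    if outer_lost > 0:
        return "left_pocket_area"  # left the outer region only
    return "none"

assert classify_pocket_event({"inner": 1, "outer": 1},
                             {"inner": 0, "outer": 0}) == "pot"
assert classify_pocket_event({"inner": 1, "outer": 1},
                             {"inner": 0, "outer": 1}) == "rebound"
assert classify_pocket_event({"inner": 0, "outer": 1},
                             {"inner": 0, "outer": 0}) == "left_pocket_area"
```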


4.2.5 Constructing trajectories

To complete the system's match analysis component the previous techniques are combined into a ball tracking routine. The aim of this routine is to construct a trajectory for each ball on the table. Tracking starts by detecting balls in the first video frame as described in section 4.2.2. Next, the colour of each ball is determined. Preliminary experiments with the system's camera hardware exposed colour accuracy limitations which prohibit the use of colour classification for this purpose. An example of this can be seen in Figure 4.2a, where the pink ball appears rather white. To circumvent these limitations, the detected balls are matched with the known initial table layout as shown in Figure 2.1, based on their relative positioning. This provides information about what the colour of each ball should be and what it looks like in the scene. From this a custom colour mapping is constructed to allow classification of replaced balls later in the process. Colour classification was found to be accurate only within the initial ball area indicated in Figure 4.6a; hence it cannot be used to aid tracking itself.

With the initial targets in place, tracking proceeds by finding the best assignment between known trajectories and newly detected balls. This introduces a combinatorial optimisation problem where the last known location of each ball (trajectory) should be associated with a candidate ball location in the next frame of the video sequence [19]. In its general form this is known as the assignment problem, defined as follows:

An optimal assignment of a number of agents to a number of tasks should be chosen, assuming that a cost is given for every agent-task assignment. An optimal assignment is one which minimises the sum of the agents' costs for their assigned tasks [20].

In the context of object tracking, the agents and tasks are trajectories and detections, where the cost of an assignment is given by the Euclidean distance between the two [19]. There exist n! possible assignments for two sets of n trajectories and detections, of which several may be optimal. Considering all assignments in a brute-force manner is therefore computationally infeasible given 22 balls on the table. A more efficient O(n^3) algorithm for obtaining an optimal assignment is the Kuhn-Munkres algorithm [20]. When applied to an example matrix of the costs for assigning trajectories Ti to detections Di this works as follows:

        D1  D2  D3          D1  D2  D3          D1  D2  D3
    T1  40  60  15      T1  25  45   0      T1  25  40   0
    T2  25  30  45  →   T2   0   5  20  →   T2   0   0  20
    T3  55  30  25      T3  30   5   0      T3  30   0   0
       (initial)          (after step 1)      (after step 2)

The matrix after step 2 already admits an optimal assignment (steps 3 and 5 below): T1 → D3, T2 → D1, T3 → D2.

1. Row reduction: Find the minimum cost in each row, then subtract this cost from all entries in the row.

2. Column reduction: Find the minimum cost in each column, then subtract this cost from all entries in the column.

3. Test optimality: Draw the minimum number of straight lines (vertical or horizontal) through the matrix to cover all zeros. If this number equals the number of rows and columns an optimal assignment can be made; proceed to step 5.

4. Shift zeros: If the minimum number of lines is lower than the number of rows and columns, find the minimum cost that is not covered by a line. Subtract this cost from each uncovered value, and add it to every value that is covered by two intersecting lines. Then go back to step 3. In the example above this step is not necessary.

5. Make assignment: Assign each detection to a trajectory by choosing zeros such that each row and column of the matrix contains a single chosen zero.
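For the 3 by 3 worked example the optimal assignment can even be verified by brute force. The sketch below is only feasible for tiny matrices; the system itself would rely on the O(n^3) Kuhn-Munkres algorithm, which yields the same result:

```python
from itertools import permutations

def optimal_assignment(cost):
    """Brute-force solution to the assignment problem: try every
    permutation of detection columns and keep the cheapest one."""
    n = len(cost)
    best = min(permutations(range(n)),
               key=lambda cols: sum(cost[r][c] for r, c in enumerate(cols)))
    return list(best), sum(cost[r][c] for r, c in enumerate(best))

# The worked example: rows are trajectories T1..T3, columns detections D1..D3.
cost = [[40, 60, 15],
        [25, 30, 45],
        [55, 30, 25]]
assignment, total = optimal_assignment(cost)
assert assignment == [2, 0, 1]  # T1 -> D3, T2 -> D1, T3 -> D2
assert total == 70              # 15 + 25 + 30
```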


It is possible that the numbers of rows and columns in the cost matrix do not correspond. Under these circumstances the assignment problem can still be solved by augmenting the cost matrix with an extra row or column filled with zeros, to make it square. The augmented row or column is discarded during the last step, since it cannot actually be assigned. There are several events that could cause a discrepancy between the number of trajectories and the number of detections. To achieve an accurate analysis of the match these events are detected and the trajectories and detections are updated accordingly:

• Player interference: In each frame, the player area is detected as described in section 4.2.3. Each ball with a last known location in this area is assumed to remain in place. A detection at that location is manually added for assignment.

• Potted balls: If the number of detections is lower than the number of tracked balls, potted balls are detected as described in section 4.2.4. For each potential pot, a detection is manually added at the corresponding pocket location. Tracked balls at these locations are outside of the playable area and are marked for removal. To avoid removal due to falsely detected pots they remain eligible for assignment to detections within the next 10 frames.

• Replaced balls: If there are persistently more detections than tracked balls over the course of 50 frames, a replaced ball is detected and added to the set of tracked balls. Its colour is classified based on the minimum Euclidean distance in LAB colour space with regard to the colour mapping established during the system's initialisation.

• Lost balls: If there are persistently fewer detections than tracked balls over the course of 50 frames and no pots are detected, it is assumed that a tracked ball got lost. This can happen if a ball is potted while moving too fast to be detected, or should a ball be manually removed from the table for some reason. Balls that are not assigned to any detections at this point are removed.
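The zero-augmentation of a non-square cost matrix, described earlier in this section, can be sketched as follows (a hypothetical 3 by 2 case: three trajectories, two detections):

```python
import numpy as np

# Hypothetical cost matrix: three trajectories, only two detections.
cost = np.array([[40, 60],
                 [25, 30],
                 [55, 30]])

# Augment with zero-filled columns (or rows) until the matrix is square;
# the dummy column is discarded once the assignment has been made.
n = max(cost.shape)
square = np.zeros((n, n), dtype=cost.dtype)
square[:cost.shape[0], :cost.shape[1]] = cost
```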

The cue ball is of particular importance in the reconstruction of a match. Therefore it is desirable to incorporate precautions in the assignment procedure that minimise the chances of a different detection being wrongfully assigned to its trajectory. An important observation that can be made from sample images of the detected ball areas like those in Figure 4.8a offers a solution here: unlike most other balls, the cue ball is always associated with a remarkably large area. This is due to its white colour, which is significantly brighter than the table surface. The same is true for the yellow and pink ball, but all 19 other balls are easily distinguishable from these three. This information is used to separate detections into two groups: one for the bright balls and another for the dark balls. Aside from making the tracking routine less error-prone, this also reduces the overall complexity of the assignment problem.

Once match analysis is finished, all obtained trajectories are exported in JSON format. Rather than listing the location of each ball at every individual frame, all repeated locations are discarded. This greatly reduces the size of the reconstruction files on disk, since most balls on the table remain stationary for long consecutive periods.
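The deduplicating export can be sketched as follows; the trajectory format (a ball identifier mapped to (frame, x, y) tuples) is an assumption made for illustration:

```python
import json

def export_trajectories(trajectories):
    """Serialise ball trajectories to JSON, dropping frames in which a
    ball did not move relative to the previous frame."""
    compact = {}
    for ball, frames in trajectories.items():
        kept, last = [], None
        for frame, x, y in frames:
            if (x, y) != last:      # keep only changes of location
                kept.append([frame, x, y])
                last = (x, y)
        compact[ball] = kept
    return json.dumps(compact)

data = {"red_1": [(0, 10, 20), (1, 10, 20), (2, 12, 24), (3, 12, 24)]}
assert json.loads(export_trajectories(data)) == {"red_1": [[0, 10, 20], [2, 12, 24]]}
```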

4.3 Reconstruction

For the reconstruction component of the system a web application is created using the A-Frame graphics framework. A benefit of this framework is that it operates in the browser, making reconstructed matches readily accessible from any device without installation. Furthermore, development using a combination of HTML and JavaScript provides rapid prototyping capabilities and a familiar environment for most developers.

Upon launch the application displays the three-dimensional scene as shown in Figure 4.13a. Initially this scene only consists of an empty table model, without any balls. These can be loaded from a JSON file containing the ball trajectories. Each ball is then rendered as a coloured sphere on the table surface. An example reconstruction of the initial table layout is shown in Figure 4.13b.


(a) Empty reconstruction scene. (b) Loaded ball layout from Figure 4.2a.

Figure 4.13: Three-dimensional reconstruction of the table.

By dragging across the screen the user is able to rotate the camera in a 360 degree orbit around the table. Additional touch and mouse controls adjust the height and zoom of the cam-era. This allows for viewing the table from any desired perspective, as shown in Figure 4.14.

Figure 4.14: Various perspectives of the three-dimensional scene shown in Figure 4.13b.

To visualise the effect of a stroke the ball trajectories are animated. Since the original video frame rate of 25 frames per second is relatively low by modern graphics standards, the framework applies linear interpolation between the key frames defined by the known ball locations. This ensures a smoother image at rates up to 90 frames per second. Play and pause buttons can be used to control the animated ball movement in the scene.
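The key-frame interpolation performed by the rendering framework amounts to plain linear interpolation between consecutive known locations. The helper below and its upsampling factor are illustrative only:

```python
def interpolate(key_frames, factor):
    """Linearly interpolate (x, y) key frames recorded at 25 fps up to
    a higher rate; factor 2 doubles the number of rendered positions."""
    out = []
    for (x0, y0), (x1, y1) in zip(key_frames, key_frames[1:]):
        for i in range(factor):
            t = i / factor  # interpolation parameter in [0, 1)
            out.append((x0 + t * (x1 - x0), y0 + t * (y1 - y0)))
    out.append(key_frames[-1])
    return out

frames = interpolate([(0.0, 0.0), (4.0, 8.0)], 2)
assert frames == [(0.0, 0.0), (2.0, 4.0), (4.0, 8.0)]
```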


CHAPTER 5

Experiments

The system developed during this research is designed to aid snooker players in their training by allowing them to revisit the strategic choices made during a match. For this purpose it is important that the three-dimensional reconstruction accurately reflects the events that took place. This chapter describes an experiment to test the system's accuracy.

Previous research that addressed the detection and tracking of snooker balls provided little to no detail about how the proposed implementations were tested. Among the four studies mentioned in the related work (section 2.3), only Guo and Mac Namee include absolute success rates for their system [9]. Unfortunately they used a miniature pool table instead of a real, full-sized snooker table, and only stationary images were considered. Their results were therefore found unsuitable for comparison with this system; the experiment conducted in this research thus serves merely to evaluate its performance.

5.1 Method

An entire frame of a snooker match was recorded using the setup described in section 3.1. Players were not limited in their playing style by any means, to ensure a realistic scenario for the experiment. The match consisted of 58 strokes and 31 potted balls. This resulted in 28:11 minutes of footage with a resolution of 1296 by 972 pixels, at a fixed rate of 25 frames per second. During the match several special events took place, including potting of the cue ball. On another occasion a cue was temporarily put aside on the table surface, as shown in Figure 5.1a. Additionally, players used a variety of tools that elevate and support the tip of the cue for more advanced shots when the cue ball was out of reach. An example of this can be seen in Figure 5.1b. Since all of the above can also occur during real-world usage, the system is expected to behave appropriately.

(a) Cue placed on table surface (b) Use of a cue support tool


5.1.1 Reconstruction accuracy

The previously recorded footage was provided as input to the system's match analysis component described in the previous chapter. Detected ball trajectories were then loaded into the three-dimensional reconstruction scene. To evaluate the system's accuracy, this reconstruction was visually compared with the real situation in the video footage based on various aspects.

First, the detected ball locations in the reconstructed scene were considered. The moment just before a stroke is of particular interest here, since it requires the player to make an important strategic decision. The layout of the table greatly influences this decision, which underlines the importance of the reconstruction’s accuracy. The following questions were addressed before each individual stroke in the comparison:

• Are all balls at the correct position?
• Are all balls of the correct colour?
• Are there any spurious balls?
• Are there any undetected balls?

In addition to the detected locations, ball trajectories are another useful asset for training purposes. These can give players more insight into the effects of a stroke. Accurate motion is of importance here. To evaluate the generated trajectories the following questions were addressed after each individual stroke in the comparison:

• How many balls were moved?

• Were potted balls successfully detected?


CHAPTER 6

Results

Four example frames from the experiment’s match footage and the corresponding reconstruction in the three-dimensional scene are shown in Figure 6.1.

(a) Stroke 11 (b) Stroke 28

(c) Stroke 38 (d) Stroke 46

Figure 6.1: Table area (top) and three-dimensional reconstruction (bottom) of four video frames in the experiment footage, before a stroke.


6.1 Reconstruction accuracy

6.1.1 Ball detection

At the start of all 58 strokes in the match a total of 768 balls appeared in the video footage. The reconstruction displayed 744 (96.9%) of these correctly in terms of both position and colour. Another 22 balls (2.9%) were positioned correctly but had an incorrect colour. Two balls were undetected. In addition to the real detections a total of 15 spurious balls appeared. Figure 6.2 shows the distribution of all detections throughout the match.

Figure 6.2: Amount of fully correct, incorrectly coloured, spurious and undetected balls in the reconstruction before each of the 58 strokes in the match.

6.1.2 Trajectories

In the entire match there were a total of 158 individual ball movements. Ball trajectories in the reconstruction were found to accurately reflect 138 (87.3%) of these. The majority of the inaccurate trajectories ended prematurely due to undetected pots: out of 31 pots, the system successfully recognised 17 (54.8%). This was caused by rapid ball movement, where the system was unable to detect the moment a ball entered the pocket. Additionally, the reconstruction contained 2 falsely detected pots.


CHAPTER 7

Discussion

The success rates of the detected ball locations and trajectories resulting from the experiment indicate an overall high accuracy of the system. This is supported by the examples shown in Figure 6.1, where three out of four reconstructions are unblemished. The fourth example, Figure 6.1d, does show undesired behaviour of the system. Although the top part of the cue was successfully ignored for being larger than the expected ball size, a small bright spot on the cue handle resulted in a spurious green ball in the reconstruction. In a subsequent stroke a true blue ball detection was wrongfully assigned to the trajectory of this spurious ball. Figure 6.2 illustrates this event, where the spurious ball detection for stroke 46 is followed by a series of colour discrepancies. This error persists for the remainder of the match, since no colour classification is performed when assigning detections to existing trajectories.

Other spurious balls in the experiment were a consequence of undetected potted balls. If the potting event remains undetected the ball is not removed from the table. Instead, the system only removes the ball once it finds itself unable to assign any detections to its trajectory during a longer sequence of frames. False detections in the ball's vicinity during this time can delay its removal. For this reason, Figure 6.2 often shows a spurious ball just before the total ball count drops. If the removal of an undetected potted ball is postponed such that a coloured ball is potted in the meantime, the new detection of the replaced ball is wrongfully assigned to the spurious ball's trajectory. This happened to a spurious red ball in stroke 43, when the new detection of a black replaced ball was assigned to its trajectory. This caused a colour discrepancy from stroke 44 until the end of the match, as shown in Figure 6.2. For this reason the black ball in the video frame appears red in the reconstruction in Figure 6.1d.

To prevent this kind of propagating error it would be necessary to incorporate colour data in the tracking procedure described in section 4.2.5. Unfortunately, as explained in that same section, the colour accuracy of the system's camera module is insufficient for this purpose. The large number of undetected pots in the reconstruction also stems from camera limitations. At the current recording rate of 25 frames per second, balls moving at high speed often cannot be captured while approaching a pocket. Further improvement of the reconstruction could therefore be achieved by using a different camera, should the increased accuracy outweigh the potentially higher hardware costs.

Whilst the Raspberry Pi that was used to record the experiment footage was capable of performing the actual match analysis, limitations in computational power undermine its practical use. A measured average processing rate of 2 frames per second would introduce an excessive delay between playing a match and being able to view the reconstruction. To accelerate this process, match analysis is offloaded to more powerful external hardware.


7.1 Conclusion

Computer vision has proven itself a successful tool to aid players in various sports. The aim of this research was to investigate the usability of computer vision techniques in a system for the analysis and three-dimensional reconstruction of single-camera snooker video. Accuracy of the reconstruction is of particular interest, to ensure its usability for training purposes.

The proposed system operates on a top-down video of a snooker table from which it reconstructs all ball trajectories throughout a match. The entire match analysis procedure does not require any human intervention. Instead, it relies on computer vision techniques for every step, including scene calibration, table detection and ball tracking. A web application was created to visualise the resulting reconstruction in a three-dimensional scene, allowing players to review their match from any desired point of view. In an evaluation of the system 96.9% of all balls before each stroke were correctly reconstructed. Trajectories were accurate for 87.3% of the ball movements. These results indicate satisfying accuracy for use in training.

7.2 Future research

As explained before, the current implementation of the system can suffer from propagating colouring errors caused by spurious ball detections. This problem could be solved by changing the camera hardware and performing ball colour classification at every step. Alternatively, software-based solutions that minimise the chances of incorrect trajectories can be researched. An example would be the use of a physics-based model to predict the next location of each ball in the scene based on simulation. The current tracking routine uses only the last known location of each ball for its assignment, leaving knowledge about prior movement unused.

Besides improving the system's match analysis component, the functionality of the reconstruction application can be expanded, for example by adding support for visualising individual ball trails. A more advanced feature that could be useful for training would be to simulate new strokes within the application, again based on a physics-based model. This would allow players to immediately observe the effects of an alternative stroke compared to the reality of the match.


Bibliography

[1] B. Babenko, M. Yang, and S. Belongie. “Visual tracking with online Multiple Instance Learning”. In: 2009 IEEE Conference on Computer Vision and Pattern Recognition. 2009, pp. 983–990. doi: 10.1109/CVPR.2009.5206737.

[2] D. Bolme et al. “Visual object tracking using adaptive correlation filters”. In: IEEE Computer Society Conference on Computer Vision and Pattern Recognition. IEEE, 2010. doi: 10.1109/cvpr.2010.5539960.

[3] J. Canny. “A Computational Approach to Edge Detection”. In: IEEE Transactions on Pattern Analysis and Machine Intelligence PAMI-8.6 (1986), pp. 679–698. doi: 10.1109/TPAMI.1986.4767851.

[4] W.-Y. Chang, C.-C. Chiu, and J.-H. Yang. “Block-Based Connected-Component Labeling Algorithm Using Binary Decision Trees”. In: Sensors 15.9 (2015), pp. 23763–23787. doi: 10.3390/s150923763.

[5] H. Denman, N. Rea, and A. Kokaram. “Content-based analysis for video from snooker broadcasts”. In: Computer Vision and Image Understanding 92.2 (2003). Special Issue on Video Retrieval and Summarization, pp. 176–195. doi: 10.1016/j.cviu.2003.06.005.

[6] R. O. Duda and P. E. Hart. “Use of the Hough transformation to detect lines and curves in pictures”. In: Communications of the ACM 15.1 (1972), pp. 11–15. doi: 10.1145/361237.361242.

[7] H. Grabner, M. Grabner, and H. Bischof. “Real-Time Tracking via On-line Boosting”. In: Proceedings of the British Machine Vision Conference 2006. British Machine Vision Association, 2006. doi: 10.5244/c.20.6.

[8] O. Grau, M. C. Price, and G. A. Thomas. “Use of 3D techniques for virtual production”. In: Videometrics and Optical Methods for 3D Shape Measurement. Ed. by S. F. El-Hakim and A. Gruen. SPIE, 2000. doi: 10.1117/12.410895.

[9] H. Guo and B. Mac Namee. “Using computer vision to create a 3d representation of a snooker table for televised competition broadcasting”. In: AICS: The 18th Irish Conference on Artificial Intelligence & Cognitive Science (2007).

[10] J. F. Henriques et al. “Exploiting the Circulant Structure of Tracking-by-detection with Kernels”. In: Proceedings of the 12th European Conference on Computer Vision. Vol. IV. Springer-Verlag, 2012, pp. 702–715. doi: 10.1007/978-3-642-33765-9_50.

[11] M. Jaros. Snooker table drawn to scale. CC BY 2.5 (https://creativecommons.org/licenses/by/2.5). Retrieved 21-5-2019. 2006. url: https://commons.wikimedia.org/wiki/File:Snooker_table_drawing.svg.

[12] J. Ren, J. Orwell, G. A. Jones, and M. Xu. “Tracking the soccer ball using multiple fixed cameras”. In: Computer Vision and Image Understanding 113.5 (2009). Computer Vision Based Analysis in Sport Environments, pp. 633–642. doi: 10.1016/j.cviu.2008.01.007.

[13] Z. Kalal, K. Mikolajczyk, and J. Matas. “Forward-Backward Error: Automatic Detection of Tracking Failures”. In: 20th International Conference on Pattern Recognition. 2010, pp. 2756–2759. doi: 10.1109/ICPR.2010.675.


[14] Z. Kalal, K. Mikolajczyk, and J. Matas. “Tracking-Learning-Detection”. In: IEEE Transactions on Pattern Analysis and Machine Intelligence 34.7 (2012), pp. 1409–1422. doi: 10.1109/TPAMI.2011.239.

[15] D. Kriegman. “Homography estimation”. In: Lecture Computer Vision I (2007). Retrieved 29-5-2019. url: https://cseweb.ucsd.edu/classes/wi07/cse252a/homography_estimation/homography_estimation.pdf.

[16] P. A. Legg et al. “Intelligent filtering by semantic importance for single-view 3D reconstruction from Snooker video”. In: 18th IEEE International Conference on Image Processing. 2011, pp. 2385–2388. doi: 10.1109/ICIP.2011.6116122.

[17] Y. Ling et al. “The detection of multi-objective billiards in snooker game video”. In: Third International Conference on Intelligent Control and Information Processing. 2012, pp. 594– 596. doi: 10.1109/ICICIP.2012.6391406.

[18] A. Lukezic et al. “Discriminative Correlation Filter with Channel and Spatial Reliability”. In: International Journal of Computer Vision 126 (2016). doi: 10.1007/s11263-017-1061-3.

[19] E. Maggio and A. Cavallaro. Video Tracking: Theory and Practice. John Wiley & Sons, Ltd, 2011. doi: 10.1002/9780470974377.

[20] J. Munkres. “Algorithms for the Assignment and Transportation Problems”. In: Journal of the Society for Industrial and Applied Mathematics 5.1 (1957), pp. 32–38. doi: 10.1137/0105003.

[21] OpenCV. Fisheye camera model. Retrieved 21-5-2019. url: https://docs.opencv.org/4.1.0/db/d58/group__calib3d__fisheye.html.

[22] OpenCV. Image Filtering. Retrieved 21-5-2019. url: https://docs.opencv.org/3.4.3/d4/d86/group__imgproc__filter.html.

[23] N. Owens, C. Harris, and C. Stennett. “Hawk-eye tennis system”. In: 2003 International Conference on Visual Information Engineering VIE 2003. 2003, pp. 182–185. doi: 10.1049/cp:20030517.

[24] W. Shen and L. Wu. “A method of billiard objects detection based on Snooker game video”. In: 2nd International Conference on Future Computer and Communication. Vol. 2. 2010, pp. V2-251-V2-255. doi: 10.1109/ICFCC.2010.5497393.

[25] G. Thomas. “Real-time camera tracking using sports pitch markings”. In: Journal of Real-Time Image Processing 2.2 (2007), pp. 117–132. doi: 10.1007/s11554-007-0041-1.

[26] G. Thomas et al. “Computer vision for sports: Current applications and research topics”. In: Computer Vision and Image Understanding 159 (2017). Computer Vision in Sports, pp. 3–18. doi: 10.1016/j.cviu.2017.04.011.

[27] WPBSA, The World Professional Billiards and Snooker Association. “Official Rules of the Games of Snooker and English Billiards”. 2014. url: www.wpbsa.com/governance/rules-of-snooker/.
