
System architecture can be defined as the design of a set of guidelines for a product so that it complies with a set of requirements. System (or software) architecture and requirements complement and justify each other: the architecture explains how a feature is achieved, while the requirements set what features need to exist.

As explained in Section 4.2, requirements elicitation was an iterative task during this project. In a project with so many moving parts and multiple teams working on it, a new update could require revisiting and reviewing the requirements. Due to the research nature of the project, an iterative prototype-based design was chosen. Since the results needed to be shared and validated with multiple teams, a Rapid Prototyping Approach [18] was followed.

I start this chapter by describing the design challenges that motivated the design. After that, I briefly describe the architecture of the system that contains Insight. Lastly, I explain Insight's design, implementation, and evaluation.

5.1 Design Challenges

During the design phase of Insight, a set of challenges was encountered. These challenges are related to the requirements of the project, and they influenced the design choices. They correspond to either the most important features or the hardest-to-tackle problems.

The design challenges are divided into functional and non-functional challenges, depending on whether they are related to functional or non-functional requirements, respectively.

5.1.1. Functional challenges

These challenges are related to the functional requirements. These requirements define basic system behavior; they are features that allow the system to function as intended.

In the case of Insight, the main functional design challenge was the sequential processing of the data.

The first three Insight Requirements (IR1, IR2, and IR3) state that the system must accept data, analyze it, and produce reports and/or plots. These actions must be executed strictly in order. First, the data must be read from the source. After that, the data must be analyzed; this analysis can change depending on the situation and the demands of the user. Lastly, the output is produced depending on the analysis.

This challenge presented the perfect opportunity to use the software pipeline design pattern [19]. This design pattern allows building and executing a sequence of operations.
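To make this concrete, the sketch below shows a minimal version of the pipeline pattern in Python; the stage names and signatures are illustrative assumptions, not Insight's actual code.

```python
# Minimal sketch of the pipeline pattern: stages run strictly in order.
from typing import Any, Callable, List


class Pipeline:
    """Builds and executes a fixed sequence of processing stages."""

    def __init__(self) -> None:
        self._stages: List[Callable[[Any], Any]] = []

    def add_stage(self, stage: Callable[[Any], Any]) -> "Pipeline":
        self._stages.append(stage)
        return self  # allows chaining: pipeline.add_stage(a).add_stage(b)

    def run(self, data: Any) -> Any:
        # Each stage receives the output of the previous one, enforcing order.
        for stage in self._stages:
            data = stage(data)
        return data


# Hypothetical usage mirroring IR1-IR3: read -> analyze -> produce output.
# pipeline = Pipeline().add_stage(read_data).add_stage(analyze).add_stage(plot)
# pipeline.run("path/to/input")
```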

5.1.2. Non-functional challenges

While the functional requirements define what the system should do, the non-functional requirements define how the system should do it. Even if these requirements were not met, the system would still perform its basic purpose. These requirements are important because they improve usability. In the case of Insight, they are related to extensibility and scalability.

Extensibility

This design challenge is based on the Insight Requirements related to extensibility (IR6, IR6.1, IR6.2, IR6.3). Being a research project, some decisions such as the type of data, the type of animals, or the plots generated may need to be changed or extended in the future. Insight was designed with this in mind, to make it simple to add any new functionality needed.

Scalability

This design challenge is based on the Insight Requirements related to scalability (IR5, IR5.1, IR5.2).

Since the number of animals can change based on the pen, the system must be able to handle a variable number of animals. The performance of the system and the quality of the outputs should not be greatly influenced by a large number of individuals in the input. If the pipeline produces an output that is more demanding in terms of memory (e.g., a video as output), it should ensure that the creation and destruction of objects are properly managed so that no memory leaks happen. Additionally, computations that only require individual animal data could be executed in parallel, improving the performance of the system (if that were required).

Lastly, Insight is conceived to be run in the processing layer of a platform used by multiple team members at the same time (PR1, IR5.1). The pipeline must allow multiple independent executions coming from different users with variable configurations.

5.2 System Design

As mentioned in Section 1.3, Insight is integrated into the Discovery Informatics Platform of the IMAGEN project. In this platform, the animal data obtained from different sources (i.e., the farms) is processed. This data (generally videos) is used to train AI models as well as to validate said models.

The results of this validation process are files containing bounding boxes of the animals. These files are used as input for Insight to produce information such as the trajectories of the animals or a summary of their states.

An overview of this process is shown in Figure 8. In this figure, the Input Data is divided into Training and Testing. The training data is annotated using CVAT to produce Annotated videos. These Annotated videos are used for training a DL model. This model and the testing data are used to detect and track the animals. This tracking information is used as input for Insight (in blue) to produce plots.

Figure 8. Overview of the system. Insight pipelines and output are marked in blue


5.3 Prototype Design

Insight takes AI-generated data and processes it into easier-to-understand plots or analytics. To achieve this, the following components are needed:

• A reader component that reads and preprocesses the concrete data.

• An analyzer component that processes the data.

• A producer component that outputs reports or graphics.

Using these components along with the pipeline pattern produces a high-level design as shown in Figure 9.

In this figure, the Reader component reads the raw data and sends it to the Analyzer, which processes the data and decides what needs to be plotted. This information is then sent to the Plotter, which generates the output.

Figure 9. High-level overview of the Insight pipeline

Figure 10 shows a class diagram that provides a detailed view of the components in Figure 9 adapted to the needs of IMAGEN. In this Figure, the components are represented as software components in a class diagram. I will describe these components and their implementation in the following section.

Figure 10. Class diagram of Insight pipeline


5.3.1. Insight components

In the class diagram shown in Figure 10, the components have been realized as software classes. Starting in the middle of the diagram, there is the Analyzer class. This is the main class of Insight. It takes input files and analyzes them based on a configuration file. This object contains multiple components, such as the Reader, the Plotter, and a list of Animal objects.

The data to be analyzed is read by the Reader object. This object first reads a configuration file using its ConfigurationManager component. Once the configuration has been established, it reads the data from the path written in the configuration. Different types of data can be read depending on the configuration. The Reader class was added as a software interface to decouple the dependencies between the Analyzer and the data to be read; it carries the responsibility for reading the data. The Reader can be extended if new types of data need to be included in Insight (Extensibility design challenge).
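A minimal sketch of how such a decoupling could look in Python is shown below; the class and method names, and the use of pandas.read_csv, are assumptions based on the description above, not the actual implementation.

```python
# Sketch of the Reader interface: hides the concrete data source from the Analyzer.
from abc import ABC, abstractmethod

import pandas as pd


class Reader(ABC):
    """Abstract reader; the Analyzer only depends on this interface."""

    @abstractmethod
    def read(self, path: str) -> pd.DataFrame:
        ...


class MOTFileReader(Reader):
    """Hypothetical concrete reader for tracking files."""

    def read(self, path: str) -> pd.DataFrame:
        return pd.read_csv(path)


# Supporting a new data format only requires a new Reader subclass,
# which addresses the extensibility design challenge without touching the Analyzer.
```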

Once the data has been read, it is placed in a Pandas [20] DataFrame. This data is used by the Analyzer to generate Animal objects of the specific animal (i.e., Pig objects or Chicken objects). These objects are stored in a list, and each of them contains all the information related to one specific animal (i.e., animal IDs and bounding boxes). These objects also contain indirect quantities, such as the velocity or the animal states, that are calculated during the creation of the object.

The Animal abstract class allows new specific animal classes (such as Turkey) to be created if Insight needs to be extended.
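The sketch below illustrates this extension mechanism; the attribute names, bounding-box layout, and velocity formula are assumptions made for illustration only.

```python
# Sketch of the Animal hierarchy: indirect quantities are derived at construction.
from abc import ABC
from typing import List, Tuple

BBox = Tuple[float, float, float, float]  # assumed (left, top, width, height)


class Animal(ABC):
    def __init__(self, animal_id: int, bboxes: List[BBox], fps: float = 1.0) -> None:
        self.animal_id = animal_id
        self.bboxes = bboxes
        # Velocity (pixels per second) is computed when the object is created.
        self.velocities = self._compute_velocities(fps)

    def _compute_velocities(self, fps: float) -> List[float]:
        centers = [(x + w / 2, y + h / 2) for x, y, w, h in self.bboxes]
        return [
            ((x2 - x1) ** 2 + (y2 - y1) ** 2) ** 0.5 * fps
            for (x1, y1), (x2, y2) in zip(centers, centers[1:])
        ]


class Pig(Animal):
    pass


class Chicken(Animal):
    pass


# Extending Insight to a new species is then a one-line subclass, e.g.:
# class Turkey(Animal): pass
```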

The Analyzer object also contains a Plotter object. This object handles plot creation and acts as a software interface to separate specific plotting behavior from the Analyzer. The Plotter object creates Plot objects based on the configuration. Having Plot as an abstract class allows Insight to be extended with new plots if they are required.

5.3.2. Insight implementation

As mentioned in Section 5, the design and implementation of Insight followed a Rapid Prototyping Approach. For this, a prototyping language such as Python (v3.8.5) was found to be a perfect candidate.

Additionally, Python has an extensive list of public libraries, such as Matplotlib (v3.1.4) [21] (a very powerful plotting library) or Pandas (v1.2.3) [20], that fit the needs of the pipeline perfectly. Lastly, other parts of the pipeline (e.g., the computer vision module) are also built in Python, which made the interfaces between the pipelines easier to create. In this section, I explain the most relevant implementation details of the pipeline.

Configuration file

Insight can be configured with a YAML [22] file. YAML is a file format that is intuitive to understand for non-technical users. Thanks to this file, the user can choose different options, such as which animals need to be studied, the path of the input files, the searching radius (relevant for proximity-based plots), or which plots to obtain as output.

An example of the configuration file is shown in Figure 11 below. In this figure, data and images are the folders chosen as the input folders. The type of animal to be analyzed is pig, and all the animals will be considered (by choosing all in desired_animal_indices). If all is not selected, a subset of the animals can be selected by their IDs. The plots to be obtained are also chosen here, with those having a # at the start of their row being excluded.

Figure 11. Configuration file. The user has chosen to obtain a Graph plot with all the animals. The animals to be analyzed are pigs and the searching radius is 100 pixels.
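Since Figure 11 is not reproduced here, the snippet below gives a hedged reconstruction of what such a YAML configuration could look like and how it could be loaded in Python; apart from desired_animal_indices, the key names are illustrative guesses at the schema, not the actual one.

```python
# Hypothetical configuration resembling the one described for Figure 11.
import yaml  # requires PyYAML

example_config = """
input_folder: data
image_folder: images
animal_type: pig
desired_animal_indices: all   # or a list of IDs, e.g. [1, 3, 7]
search_radius: 100            # in pixels, used by the proximity-based plots
plots:
  - trajectory
  - distance_traveled
  # - heatmap                 # rows starting with '#' are excluded
  - graph
"""

config = yaml.safe_load(example_config)
print(config["animal_type"], config["search_radius"])
```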

This configuration file allows Insight to be run as multiple parallel jobs in the IMAGEN platform, enabling users to perform analyses under different sets of configurations.

Radius

All the proximity-based plots use the radius attribute as a threshold. This radius is used to count the number of frames in which a pair of animals are close to each other. The distance between a pair of animals is calculated as the Euclidean distance [23] between the middle points of their bounding boxes. If this distance is lower than the radius, the counter is increased. A graphical representation of the distance is shown in Figure 12 below.
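A sketch of this proximity count is shown below; function and variable names are illustrative, and the bounding-box layout (left, top, width, height) is an assumption.

```python
# Sketch of the per-pair proximity count based on bounding-box centers.
from math import hypot
from typing import List, Tuple

BBox = Tuple[float, float, float, float]  # assumed (left, top, width, height)


def center(box: BBox) -> Tuple[float, float]:
    x, y, w, h = box
    return x + w / 2, y + h / 2


def close_frames(boxes_a: List[BBox], boxes_b: List[BBox], radius: float) -> int:
    """Count frames in which two animals' centers are within the given radius."""
    count = 0
    for box_a, box_b in zip(boxes_a, boxes_b):
        (xa, ya), (xb, yb) = center(box_a), center(box_b)
        if hypot(xa - xb, ya - yb) < radius:  # Euclidean distance vs. threshold
            count += 1
    return count
```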

This radius is measured in pixels. Since the number of pixels depends on the camera quality (which can vary depending on the farm), a way of automating the radius was implemented. This feature allows Insight to suggest and use the most appropriate radius for the situation, in an adaptive way. This suggested radius is calculated taking into account the average size of an animal bounding box in the full scene.

Figure 12. The distance between two animals is calculated as the Euclidean distance
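The exact formula behind the suggested radius is not detailed above, so the snippet below only sketches one plausible choice, scaling the average bounding-box diagonal; the scale factor is a hypothetical parameter.

```python
# One plausible way to derive a suggested radius from the average bounding-box size.
from typing import List, Tuple

BBox = Tuple[float, float, float, float]  # assumed (left, top, width, height)


def suggest_radius(all_boxes: List[BBox], scale: float = 1.0) -> float:
    """Scale the mean bounding-box diagonal (in pixels) to obtain a radius."""
    diagonals = [(w ** 2 + h ** 2) ** 0.5 for _, _, w, h in all_boxes]
    return scale * sum(diagonals) / len(diagonals)
```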

By varying the searching radius, the user can opt for a finer detection of proximity. The smaller this radius is, the closer two animals need to be to be considered an interaction. The results of the analysis are deeply influenced by the choice of the radius.

Animal States

With the current AI algorithms available in IMAGEN, only the position of the animals is detected. This space and time information is not enough for Insight to produce complex behavior analysis but a simple level of animal activity can be achieved.

With the positions across different frames and the time between frames, Insight can estimate the velocity of each animal. These individual velocities are averaged and sorted. Animals whose velocity falls in the first quartile are considered passive, while the animals in the top quartile are considered very active. Animals in the middle quartiles are considered active.

This way of selecting the states compares the individual activity of each animal with that of the rest of the group. While it is not precise, it is adaptable to different groups. It was discussed with and approved by animal behavior experts.
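A sketch of this quartile-based classification is given below; the use of NumPy percentiles is an implementation assumption.

```python
# Sketch of the quartile-based state assignment described above.
from typing import Dict, List

import numpy as np


def classify_states(avg_velocities: Dict[int, float]) -> Dict[int, str]:
    """Map each animal ID to 'passive', 'active', or 'very active'."""
    values: List[float] = list(avg_velocities.values())
    q1, q3 = np.percentile(values, [25, 75])  # first and third quartile thresholds
    states = {}
    for animal_id, v in avg_velocities.items():
        if v <= q1:
            states[animal_id] = "passive"
        elif v >= q3:
            states[animal_id] = "very active"
        else:
            states[animal_id] = "active"
    return states
```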

5.3.3. Insight evaluation

Any software system built must be verified and validated. These are the processes that ensure that the product reaches the expected levels of quality and contains all the required features.

The verification process checks that the system or product meets a set of design specifications. For this purpose, one must ensure that all the parts of the system do exactly what they are supposed to do.

In the case of Insight, this process was done manually by checking that the previous results were consistent and unchanged when new functionality was added. To ensure that the coding style is readable, efficient, and consistent, the PEP 8 Python style guide [24] was followed. This style is widely used by the Python community and is considered the standard style. The code was automatically checked with a Visual Studio Code extension based on autopep8, which highlights any violation of PEP 8.

The validation process checks that the system or product meets the operational needs of the user. For this purpose, one must ensure that the usage and the results comply with the main stakeholders' specifications. In the case of Insight, this process was done iteratively in demonstration sessions throughout the project. There were two types of sessions:

• Brainstorm sessions. In these sessions, I would meet with a small group of members of the IMAGEN subteams (the Pig project or the Chicken project) and brainstorm about possible features to include in the prototypes.

• Demonstration sessions. In these sessions, I would meet the whole IMAGEN team and demonstrate the results of the prototype. These sessions allowed me not only to show my results but also to receive feedback and suggestions. The sessions were interactive, and the participation of the stakeholders was encouraged.

As a result of these evaluation meetings, Insight reached the level of a Minimum Viable Product at an early stage of the project, which allowed me to incrementally add more features. Another result obtained from these sessions was a case study applying Insight to the data of the pig project. This case study is presented in the following chapter.

6. Case Study

In this chapter, I analyze a case study of applying Insight to a one-minute video scene analyzed with IMAGEN's deep learning algorithms. This video scene contains 11 pigs living in a pen and is segmented into 30 frames. An example of a frame can be seen in Figure 13.

Which analysis is done on this video and which plots the Insight system produces can be configured with a configuration file such as the one from Figure 11.

Figure 13. Example of a frame

Insight takes as input a Multi-Object Tracking (MOT) file such as the one in Figure 6 and generates Pig objects (one for each pig detected in the scene). Each of these Pig objects contains the specific ID of the animal and its position given by the bounding boxes on each frame.
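Since Figure 6 is not reproduced here, the sketch below assumes the common MOTChallenge column layout (frame, id, left, top, width, height, ...); the file name and column names are illustrative.

```python
# Sketch of turning a MOT-style tracking file into per-pig bounding boxes.
import pandas as pd

columns = ["frame", "id", "left", "top", "width", "height",
           "conf", "x", "y", "z"]  # assumed MOTChallenge-like layout
mot = pd.read_csv("tracking_output.txt", header=None, names=columns)  # hypothetical path

# One array of bounding boxes per detected pig, ordered by frame.
pigs = {
    int(pig_id): group.sort_values("frame")[["left", "top", "width", "height"]].to_numpy()
    for pig_id, group in mot.groupby("id")
}
print(f"Detected {len(pigs)} pigs in the scene")
```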

The first plot Insight produces based on the MOT file is a trajectory plot as seen in Figure 14. In this plot, the x-axis and y-axis represent the horizontal and vertical distances of the camera’s field of view in pixels. The lines on the plot are the trajectories of individual pigs, i.e., trajectories of the centers of the bounding boxes for each animal ID. Each animal has its trajectory drawn with a different color. The starting point of each trajectory is marked with a red circle and the ending point is marked with a black star.
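A minimal Matplotlib sketch of such a trajectory plot, reusing the pigs dictionary from the previous snippet, could look as follows; the styling details are assumptions.

```python
# Sketch of the trajectory plot: one line per pig, red start marker, black star end marker.
import matplotlib.pyplot as plt

fig, ax = plt.subplots()
for pig_id, boxes in pigs.items():
    xs = boxes[:, 0] + boxes[:, 2] / 2  # bounding-box center x
    ys = boxes[:, 1] + boxes[:, 3] / 2  # bounding-box center y
    ax.plot(xs, ys, label=f"Pig {pig_id}")  # a different color per animal
    ax.plot(xs[0], ys[0], "ro")             # starting point: red circle
    ax.plot(xs[-1], ys[-1], "k*")           # ending point: black star
ax.set_xlabel("x (pixels)")
ax.set_ylabel("y (pixels)")
ax.invert_yaxis()  # image coordinates grow downwards
ax.legend(fontsize="small")
plt.show()
```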

The trajectory plot can also be superimposed on top of a GIF generated from the frames, making it easier for the human eye to keep track of an animal’s trajectory.

Figure 14. Trajectory plot of pigs in a pen. Red dots are the starting points and black stars are the ending points.

Given the trajectory, the system can also determine the total distance traveled by each individual pig. This is an important metric for farmers and breeders. Breeders care about distance traveled because the activity of a pig is related to its feeding efficiency: the more a pig moves, the more food it needs to grow.

Insight obtains the distance traveled and plots it in a bar plot as shown in Figure 15. This plot shows the individual distance traveled per animal in pixel units.

Figure 15. Distance traveled per animal.
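The computation behind this metric can be sketched as the sum of Euclidean steps between consecutive bounding-box centers, again reusing the pigs structure from the earlier snippets; the exact implementation in Insight may differ.

```python
# Sketch of the distance-traveled metric per animal, in pixels.
import numpy as np

distances = {}
for pig_id, boxes in pigs.items():
    centers = np.column_stack([boxes[:, 0] + boxes[:, 2] / 2,
                               boxes[:, 1] + boxes[:, 3] / 2])
    steps = np.linalg.norm(np.diff(centers, axis=0), axis=1)  # frame-to-frame steps
    distances[pig_id] = steps.sum()  # total distance traveled

# A bar plot like Figure 15 then follows directly:
# plt.bar(list(distances.keys()), list(distances.values()))
```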

The two previous plots can also be obtained for only a subset (chosen in the configuration file) of the pigs in the scene. This can be useful for users who only want to study specific individuals. Examples of these same plots with a smaller subset are shown in Appendix C.

The proximity between animals is a crucial factor when analyzing aggressive or positive behavior. In Insight, multiple plots are available to visualize the proximity between certain pairs of animals. For each pair of animals, Insight calculates the distance between them and counts in how many frames this distance is under a predefined threshold.

The number of close encounters between each pair of animals defines a (symmetric) adjacency matrix. This matrix can be visualized with a Heatmap plot as shown in Figure 16. In this figure, each row and column correspond to a pig, and the number at the intersection is the number of frames these two pigs are near each other (e.g., Animals 1 and 3 are close to each other in 38 frames). The diagonal (i.e., where two equal IDs intersect) must be ignored as it has no meaning.

Figure 16. Heatmap plot representing the proximity between pairs of pigs
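The adjacency matrix behind this heatmap can be sketched as follows, reusing close_frames() from the earlier proximity snippet; the radius value and plotting details are illustrative.

```python
# Sketch of the symmetric adjacency matrix of close encounters and its heatmap.
import matplotlib.pyplot as plt
import numpy as np

ids = sorted(pigs.keys())
matrix = np.zeros((len(ids), len(ids)), dtype=int)
for i, id_a in enumerate(ids):
    for j, id_b in enumerate(ids):
        if i < j:
            n = close_frames(list(pigs[id_a]), list(pigs[id_b]), radius=100)
            matrix[i, j] = matrix[j, i] = n  # symmetric by construction

plt.imshow(matrix, cmap="viridis")
plt.colorbar(label="frames in proximity")
plt.xticks(range(len(ids)), ids)
plt.yticks(range(len(ids)), ids)
plt.show()
```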

One of the motivations of Insight is to obtain visual information that is understandable. Due to this, plots that can be understood with a single glance are preferred. Insight combines the aforementioned adjacency matrix with velocity information (obtained from the differences in the bounding boxes throughout multiple frames) to generate a social network. This social network is represented as a graph [25] where the nodes represent the pigs and the edges between them represent interactions.

The Graph plot presented in Figure 17 provides information in two additional ways: color and size. The nodes are colored based on the state of the pig, and the color and thickness of the edges connecting these nodes vary depending on the number of interactions.

Figure 17. Social network plot on pig data. The proximity radius is 80 pixels

In the figure above, the animals are presented with their ID inside of each node. Each node is colored depending on the predominant state of the animal during the scene. Passive animals are colored in grey, active animals are colored in green, and very active animals are colored in red. A passive animal near the feeding station is considered as eating and colored in blue.

The states in the social network are obtained by calculating the average velocity of each pig and assigning these values to an array. This array is sorted and divided into quartiles. Animals in the lowest quartile are considered passive. Animals in the highest quartile are considered very active. The rest are considered active.

A similar process using the number of interactions results in the different colors of the edges. In this case, the interaction with the greatest number of frames is taken as a reference. Interactions with a number lower than one-third of the reference are considered small contacts (green edges). Interactions with a number from one-third to two-thirds are considered medium contacts (yellow edges). Lastly, interactions with a number above two-thirds of the reference are considered the strongest contacts.

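A sketch of how such a social network could be built and drawn is given below; the use of networkx is an assumption (the text only states that the result is represented as a graph [25]), and the color chosen for the strongest edges is likewise an assumption, since it is not named in the text above. The snippet reuses ids and matrix from the heatmap sketch.

```python
# Sketch of the social-network plot: nodes colored by state, edges by interaction count.
import matplotlib.pyplot as plt
import networkx as nx

# `states` is assumed to map each pig ID to a label such as "passive",
# e.g. the output of classify_states() from the earlier sketch.
state_colors = {"passive": "grey", "active": "green",
                "very active": "red", "eating": "blue"}

G = nx.Graph()
for pig_id in ids:
    G.add_node(pig_id, color=state_colors[states[pig_id]])

reference = max(matrix.max(), 1)  # strongest interaction acts as the reference
for i, id_a in enumerate(ids):
    for j, id_b in enumerate(ids):
        if i < j and matrix[i, j] > 0:
            ratio = matrix[i, j] / reference
            if ratio < 1 / 3:
                color = "green"   # small contact
            elif ratio < 2 / 3:
                color = "yellow"  # medium contact
            else:
                color = "red"     # strongest contact (assumed color)
            G.add_edge(id_a, id_b, color=color, weight=ratio)

pos = nx.spring_layout(G, seed=42)
nx.draw(G, pos, with_labels=True,
        node_color=[G.nodes[n]["color"] for n in G.nodes],
        edge_color=[G.edges[e]["color"] for e in G.edges],
        width=[1 + 3 * G.edges[e]["weight"] for e in G.edges])
plt.show()
```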