
University of Groningen

Relationship between Granger non-causality and network graph of state-space representations

Jozsa, Monika

IMPORTANT NOTE: You are advised to consult the publisher's version (publisher's PDF) if you wish to cite from it. Please check the document version below.

Document Version

Publisher's PDF, also known as Version of record

Publication date: 2019

Link to publication in University of Groningen/UMCG research database

Citation for published version (APA):

Jozsa, M. (2019). Relationship between Granger non-causality and network graph of state-space representations. University of Groningen.

Copyright

Other than for strictly personal use, it is not permitted to download or to forward/distribute the text or part of it without the consent of the author(s) and/or copyright holder(s), unless the work is under an open content license (like Creative Commons).

Take-down policy

If you believe that this document breaches copyright please contact us providing details, and we will remove access to the work immediately and investigate your claim.

Downloaded from the University of Groningen/UMCG research database (Pure): http://www.rug.nl/research/portal. For technical reasons the number of authors shown on this cover page is limited to 10 maximum.


Relationship between Granger non-causality and network graphs of state-space representations


This research has been carried out at Johann Bernoulli Institute for Mathematics and Computer Science, Faculty of Mathematics and Natural Sciences, University of Groningen, The Netherlands, and at the Research Unit Informatics and Control Systems (URIA) of IMT-Lille-Douai, France.

The work reported in this dissertation is part of the research program of the Dutch Institute of Systems and Control (DISC). The author has successfully completed the graduate program of DISC.

Printed by: Ridderprint BV, www.ridderprint.nl
ISBN (printed version): 978–94–034–1296–2
ISBN (electronic version): 978–94–034–1295–5


Relationship between Granger non-causality and network graphs of state-space representations

PhD thesis

to obtain the degree of PhD at the University of Groningen

on the authority of the Rector Magnificus Prof. E. Sterken

and in accordance with the decision by the College of Deans. This thesis will be defended in public on

25 February 2019 at 9:00 hours

by

Mónika Józsa

born on 4 July 1989 in Budapest, Hungary


Supervisors:
Prof. Dr. M. Kanat Camlibel
Dr. Mihály Petreczky

Assessment committee:
Prof. Dr. Arjan van der Schaft
Prof. Dr. Peter E. Caines
Prof. Dr. Ralf L. M. Peeters


Contents

List of figures
List of tables
List of algorithms
List of acronyms
Introduction

1 Preliminaries
1.1 Hilbert spaces of stochastic processes
1.2 LTI–SS representations
1.2.1 Introduction to LTI–SS representations
1.2.2 Realization theory of LTI–SS representations
1.3 GB–SS representations
1.3.1 Introduction to GB–SS representations
1.3.2 Realization theory of GB–SS representations
1.4 Network graphs
1.4.1 Network graphs of LTI–SS representations
1.4.2 Network graphs of transfer matrices
1.4.3 Network graphs of GB–SS representations

2 Granger causality and Kalman representations in block triangular form
2.1 Kalman representation in block triangular form
2.2 Characterization of Granger non-causality by Kalman representations in block triangular form
2.3 Computing Kalman representation in block triangular form
2.4 Example for block triangular representation
2.5 Conclusions
2.A Proof of Theorem 2.5 and Corollary 2.6

3 Granger causality and Kalman representations in coordinated form
3.1 Kalman representation in coordinated form
3.2 Conditional Granger causality and coordinated systems
3.3 Computing Kalman representations in coordinated form
3.4 Example for coordinated representation
3.5 Conclusions
3.A Proof of Lemmas 3.2 and 3.6
3.B Proof of Theorem 3.5 and Corollary 3.8

4 Granger causality and Kalman representations with transitive acyclic directed graph zero structure
4.1 Kalman representation with TADG-zero structure
4.2 Granger causality and Kalman representation with TADG-zero structure
4.3 Computing Kalman representations with TADG-zero structure
4.3.1 Auxiliary results
4.3.2 Algorithms for Kalman representation with causal TADG-zero structure
4.4 Conclusions
4.A Proofs of Lemmas 4.10 and 4.13
4.B Proof of auxiliary results in Section 4.3.1
4.C Proofs of Lemma 4.29, Theorem 4.15, and Corollary 4.30

5 Granger causality and innovation transfer matrices
5.1 Innovation transfer matrix
5.2 Granger causality and innovation transfer matrix
5.3 Example for non-TADG and TADG-zero structures
5.4 Conclusions
5.A Proofs

6 Causality and network graph in general bilinear state-space representations
6.1 Granger causality in LTI–SS representations
6.2 GB–Granger causality in GB–SS representations
6.3 Conclusions
6.A Proofs

7 Estimating Kalman representations with given network graphs
7.1 Estimating Kalman representation with specific network graph
7.1.1 Estimating system matrices of minimal Kalman representations in causal block triangular form
7.1.2 Estimating system matrices of minimal Kalman representations in causal coordinated form
7.1.3 Estimating system matrices of minimal Kalman representations with causal TADG-zero structure
7.2 Granger causality test
7.2.1 Theoretical and empirical Geweke–Granger causality
7.2.2 Estimating the distribution of empirical Geweke–Granger causality
7.2.3 Hypothesis testing for Granger causality
7.3 Evaluation of the estimated Kalman representations
7.4 Simulation
7.4.1 Calculation step-by-step
7.4.2 Simulation results
7.5 Conclusions

8 Conclusions
Summary
Samenvatting
Acknowledgments


List of Figures

1 Three-node star graph as a network graph of an LTI–SS representation
2.1 Network graph of Kalman representation in block triangular form
3.1 Network graph of Kalman representation in coordinated form
4.1 Network graph of a Kalman representation with TADG-zero structure
5.1 Network graph of an innovation transfer matrix with TADG-zero structure
5.2 Example for TADG and non-TADG decomposition of innovation representation
6.1 Network graph of the LTI–SS representation with block triangular system matrices
6.2 Network graph of a GB–SS representation with block triangular system matrices
7.1 Network graph of the reference representation in block triangular form
7.2 Illustration of type I error of Granger causality test
7.3 Illustration of type II error of Granger causality test
7.4 Network graph of the reference representation in coordinated form
7.5 Network graph of the reference representation with TADG-zero structure


List of Tables

7.1 Dimension settings for reference representation in block triangular form
7.2 Results on estimating the system matrices of minimal Kalman representation in causal block triangular form
7.3 Dimension settings for reference representation in coordinated form
7.4 Results on estimating minimal Kalman representation in causal coordinated form
7.5 Results on estimating minimal Kalman representation with causal G-zero structure


List of Algorithms

1 Minimal Kalman representation based on output covariances
2 Minimal Kalman representation based on LTI–SS representation
3 Minimal innovation GB–SS representation based on output covariances
4 Minimal Kalman representation in causal block triangular form based on LTI–SS representation
5 Minimal Kalman representation in causal block triangular form based on output covariances
6 Kalman representation in causal coordinated form based on LTI–SS representation
7 Kalman representation in causal coordinated form based on output covariances
8 Extension of an observable Kalman representation in block triangular form
9 Extension of an observable Kalman representation in block diagonal form
10 Extension of an observable Kalman representation in coordinated form
11 Kalman representation with causal G-zero structure based on output covariances
12 Kalman representation with G-zero structure based on LTI–SS representation
13 Minimal innovation GB–SS representation in causal block triangular form
14 Minimal Kalman representation in causal block triangular form
15 Estimating system matrices of minimal Kalman representations in causal block triangular form
16 Minimal Kalman representation in causal coordinated form
17 Estimating system matrices of minimal Kalman representation in causal coordinated form
18 Minimal Kalman representation with G-zero structure
19 Estimating system matrices of minimal Kalman representation with causal G-zero structure
20 Empirical Geweke–Granger causality
21 Calculating samples for the empirical distribution of zero Geweke–Granger causality


List of Acronyms

EEG Electroencephalogram
fMRI Functional Magnetic Resonance Imaging
GB–SS General Bilinear State-Space
LQG Linear Quadratic Gaussian
LTI–SS Linear Time-Invariant State-Space
MA Moving-Average
MEG Magnetoencephalography
STD Standard Deviation
SVD Singular Value Decomposition
TADG Transitive Acyclic Directed Graph
VAR Vector Autoregressive
VARMA Vector Autoregressive Moving-Average
ZMSIR Zero-mean square-integrable with rational spectrum
ZMWSSI Zero-mean weakly stationary with respect to input


Introduction

Detecting interactions among stochastic processes can be of interest for several applications such as mapping interactions in the brain, predicting economic price movements or understanding social group behaviour. The first step towards these applications is to formulate an appropriate definition of interaction. In this thesis we consider two approaches for defining interaction. The first considers the stochastic processes as outputs of dynamical stochastic systems and concentrates on the information flow between the systems that generate the interacting processes. The second approach focuses on statistical properties of the interacting processes. In all cases, the interactions will be one-directional, i.e., one process influences another but the other does not influence the first. These are particularly interesting as they can be measured more easily than bi-directional interactions.

Consider a multidimensional stochastic process that is partitioned into components that interact with each other. Then the first approach is based on the existence of a dynamical system having this process as its output process, such that it is decomposed into subsystems that communicate according to a so-called network graph: Informally, by the network graph of a system we mean a directed graph whose nodes correspond to subsystems, such that each subsystem generates a component of the output process. There are as many subsystems as there are components of the output process. Regarding the edges of the network graph, there is an edge from one node to another if the subsystem which corresponds to the source node sends information to the subsystem which corresponds to the target node. By interaction between two subsystems we mean that some processes of one subsystem serve as an input to another subsystem. In case of state-space systems it will mean that there is an edge from one node to another if the state and noise process of the subsystem corresponding to the source node serve as an input to the subsystem which corresponds to the target node. In case of transfer matrices, we will designate an edge if the noise process of the subsystem of the source node serves as an input to the subsystem of the target node. As an example, let y be the output process of a stochastic dynamical system that is partitioned into two components such as $y = [y_1^T, y_2^T]^T$. Then the network graph of a dynamical system which generates y has two nodes, the first one corresponds to the subsystem which generates $y_1$ and the second one corresponds to the subsystem which generates $y_2$. The edges of the network graph are determined by the information flow between the subsystems: there is a directed edge from one subsystem to another if the first one sends information to the other. If there is a dynamical system generating y, such that in its network graph there is an edge from the first node (corresponding to the subsystem of $y_1$) to the second node (corresponding to the subsystem of $y_2$), then we say that $y_1$ influences $y_2$. This idea is generalized for processes with several components.

The advantage of this approach is that it offers an intuitive mechanistic explanation of how one component of the output process influences the other one. The apparent disadvantage, however, is that the same output process can be generated by systems with different network graphs. As a result, the presence or absence of an interaction between two output components depends on the exact dynamical system that we choose to represent the output process.

Again, consider a multidimensional stochastic process that is an output process of a dynamical system and is partitioned into components. Then the second approach for defining interaction among the components of the process is based on statistical properties of the process. A widely used example of this approach is called Granger causality, a purely statistical concept for defining linear causal relationships between processes: Denote the process in question by y and let it be partitioned into two components such as $y = [y_1^T, y_2^T]^T$. Then intuitively, we can say that $y_1$ Granger causes $y_2$ if the best linear prediction of $y_2$ based on the past values of y is better than the one based on the past values of $y_2$. For a process y that is the output process of a linear dynamical system, we define directed interactions in y by the Granger causalities between $y_1$ and $y_2$. This is further generalized for processes with several components by using an extended version of Granger causality, called conditional Granger causality. As an extension of Granger causality, we will also define interaction in a process that is the output process of a so-called bilinear dynamical system.
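In the Hilbert space notation of Chapter 1, this intuition becomes a projection identity. The display below is a sketch of the formal condition (made precise in Chapter 2), where $E_l[\,\cdot \mid \cdot\,]$ denotes orthogonal projection and $H^{y}_{t-}$, $H^{y_2}_{t-}$ are the Hilbert spaces generated by the past of y and of $y_2$:

\[
y_1 \text{ does not Granger cause } y_2
\quad\Longleftrightarrow\quad
E_l\!\left[y_2(t+k) \mid H^{y}_{t-}\right] = E_l\!\left[y_2(t+k) \mid H^{y_2}_{t-}\right]
\ \text{ for all } t \in \mathbb{Z},\ k \ge 0.
\]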

As the second approach is based on statistical properties of processes, contrary to the first approach, it leads to definitions that do not depend on which dynamical system we use to represent y. However, it does not always offer an explanation of the mechanisms according to which the interaction takes place.

In summary, the first approach focuses on the mechanism inside a dynamical system but is too sensitive to the choice of the system itself. The second approach, solving the issue with the first, is independent of which dynamical system we choose to represent the output process; however, it does not capture the inner mechanism of the interaction in general. It is thus of interest, in order to benefit from the advantages of both approaches, to relate and find equivalence between the network graph of a dynamical system and statistical properties of its output process.

The following dynamical systems are subjects of this thesis: autonomous linear time-invariant state-space (LTI–SS) representations, transfer matrices of autonomous LTI systems and general bilinear state-space (GB–SS) representations. For these systems we derive results on the relation between network graphs and causal properties among the components of the output process. For this, we use the so-called innovation form of the representations, where we choose the noise process to be the so-called innovation process. It is then shown that the existence of a dynamical system in innovation form with a certain network graph is equivalent to causal relations among the components of the observed process. The results include algorithms for the construction of the dynamical systems in question.

Contribution

For simplicity, unless stated otherwise, the terms LTI–SS representation, transfer matrix and GB–SS representation mean autonomous stochastic LTI–SS representation, transfer matrix of a stochastic autonomous LTI system and stochastic GB–SS representation. To LTI–SS or GB–SS representations and to transfer matrices, we associate a so-called network graph. The network graph is defined based on the view of these systems as an interconnection of several subsystems. In this thesis we will restrict attention to network graphs which belong to one of the following classes of graphs: two-node graphs with one edge, star graphs, and transitive acyclic graphs. The objective of this thesis is to associate the existence of systems having one of these network graphs with properties of the observed processes. The properties of the observed processes that help us to impose conditions and show implications of the existence of these systems are the so-called Granger causality, conditional Granger causality and GB–Granger causality. Below, we discuss network graphs and causality in more detail.

Network graph: Let y be an output process of an LTI–SS representation or a transfer matrix or a GB–SS representation and denote the system that represents y by S. Assume that y is partitioned such that $y = [y_1^T, \ldots, y_n^T]^T$ and consider subsystems $S_i$, $i = 1, \ldots, n$ of the system S such that $S_i$ generates the component $y_i$. The network graph of S then has nodes $1, \ldots, n$, with an edge from node i to node j if the noise, and in case of state-space representations also the state process, of $S_i$ serve as an input of $S_j$. In fact, for the systems at hand, an edge $(i, j)$ in the network graph corresponds to non-zero blocks in certain matrix parameters of the system; in parallel, the lack of this edge corresponds to zero blocks in those matrices. Intuitively, an edge in the network graph means that information can flow from the subsystem corresponding to the source node to the subsystem corresponding to the target node, but there is no information flowing the other way around. Figure 1 illustrates the network graph of an LTI–SS representation having the three-node star graph as its network graph.

[Figure 1: LTI–SS representation of a process $y = [y_1^T, y_2^T, y_3^T]^T$ with the three-node star graph as its network graph. The subsystems are $S_3$: $x_3(t+1) = \alpha_{33}x_3(t) + \beta_{33}e_3(t)$, $y_3(t) = \gamma_{33}x_3(t) + e_3(t)$; $S_1$: $x_1(t+1) = \sum_{i \in \{1,3\}} \alpha_{1i}x_i(t) + \beta_{1i}e_i(t)$, $y_1(t) = \sum_{i \in \{1,3\}} \gamma_{1i}x_i(t) + e_1(t)$; and $S_2$: $x_2(t+1) = \sum_{i \in \{2,3\}} \alpha_{2i}x_i(t) + \beta_{2i}e_i(t)$, $y_2(t) = \sum_{i \in \{2,3\}} \gamma_{2i}x_i(t) + e_2(t)$. The state and noise process of subsystem $S_3$ serve as an input to subsystems $S_1$ and $S_2$.]
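For concreteness, the star-graph system of Figure 1 can be simulated directly. The following Python sketch uses scalar states and illustrative parameter values (the numerical values of $\alpha$, $\beta$, $\gamma$ below are chosen for stability and are not taken from the thesis):

```python
import numpy as np

rng = np.random.default_rng(0)
T = 500

# Illustrative scalar parameters (chosen for stability; not from the thesis).
a11, a13, a22, a23, a33 = 0.5, 0.3, 0.6, -0.2, 0.4
b11, b13, b22, b23, b33 = 1.0, 0.5, 1.0, 0.5, 1.0
c11, c13, c22, c23, c33 = 1.0, 0.7, 1.0, -0.4, 1.0

x1 = x2 = x3 = 0.0
y = np.zeros((T, 3))
for t in range(T):
    e1, e2, e3 = rng.standard_normal(3)
    # Outputs: S1 and S2 read the state and noise of the root subsystem S3.
    y[t] = [c11 * x1 + c13 * x3 + e1,
            c22 * x2 + c23 * x3 + e2,
            c33 * x3 + e3]
    # State updates: information flows only from S3 towards S1 and S2.
    x1, x2, x3 = (a11 * x1 + a13 * x3 + b11 * e1 + b13 * e3,
                  a22 * x2 + a23 * x3 + b22 * e2 + b23 * e3,
                  a33 * x3 + b33 * e3)
```

Note that $y_3$ is computed from $x_3$ and $e_3$ alone, while $y_1$ and $y_2$ additionally read $x_3$ and $e_3$; this is exactly the information flow of the three-node star graph.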

Causality: For an output process $y = [y_1^T, \ldots, y_n^T]^T$ of an LTI–SS representation or a transfer matrix, we will study Granger non-causality and conditional Granger non-causality relations among the components $y_1, \ldots, y_n$. Granger non-causality (Granger, 1963) can be explained as follows: $y_1$ does not Granger cause $y_2$ if the knowledge of the past values of $y_1$ and $y_2$ does not yield a more accurate prediction of the future values of $y_2$ than the knowledge of the past values of only $y_2$. Conditional Granger non-causality is a general form of Granger non-causality: Informally, $y_1$ conditionally does not Granger cause $y_2$ with respect to $y_3$ if the knowledge of the past values of $y_1$, $y_2$ and $y_3$ does not yield a more accurate prediction of the future values of $y_2$ than the knowledge of the past values of only $y_2$ and $y_3$.
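The conditional variant admits an analogous sketch (the precise definition appears in Chapter 3): with $H^{(y_1,y_2,y_3)}_{t-}$ and $H^{(y_2,y_3)}_{t-}$ denoting the Hilbert spaces generated by the corresponding joint pasts, $y_1$ conditionally does not Granger cause $y_2$ with respect to $y_3$ when

\[
E_l\!\left[y_2(t+k) \mid H^{(y_1,y_2,y_3)}_{t-}\right]
= E_l\!\left[y_2(t+k) \mid H^{(y_2,y_3)}_{t-}\right]
\quad \text{for all } t \in \mathbb{Z},\ k \ge 0.
\]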

Among the components of the output process of a GB–SS representation, we will formulate an extended definition of Granger causality, called GB–Granger causality. Let $y = [y_1^T, y_2^T]^T$ be the output process of a GB–SS representation and let u be its input process. Then GB–Granger non-causality from $y_1$ to $y_2$ with respect to u has the following intuitive meaning: the knowledge of all the products of the past y and u values does not give a more accurate prediction of $y_2$ than the knowledge of all the products of the past $y_2$ and u values. In the trivial case when u is a constant input, GB–Granger non-causality from $y_1$ to $y_2$ is equivalent to Granger non-causality from $y_1$ to $y_2$.

Results: The presentation of the main results is organized as follows.

In Chapter 2, we show that a process $y = [y_1^T, y_2^T]^T$ admits a specific LTI–SS representation in the so-called block triangular form if and only if $y_1$ does not Granger cause $y_2$. Informally, an LTI–SS representation in block triangular form is a system whose network graph has two nodes, corresponding to two subsystems generating $y_1$ and $y_2$, and an edge from the node associated with $y_2$ to the node associated with $y_1$. For both coercive and non-coercive processes, we give conditions for the minimality of the representations and present algorithms for the construction of the representations.
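To fix intuition, the zero pattern behind such a network graph can be sketched as follows (a sketch only; the exact block triangular form and its minimality conditions are given in Chapter 2). With the state and innovation partitioned conformably with $y = [y_1^T, y_2^T]^T$,

\[
x(t+1) =
\begin{bmatrix} A_{11} & A_{12} \\ 0 & A_{22} \end{bmatrix} x(t) +
\begin{bmatrix} K_{11} & K_{12} \\ 0 & K_{22} \end{bmatrix} e(t),
\qquad
y(t) =
\begin{bmatrix} C_{11} & C_{12} \\ 0 & C_{22} \end{bmatrix} x(t) + e(t),
\]

so the subsystem generating $y_2$ runs autonomously while the subsystem generating $y_1$ may read the state and noise of the other.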

In a similar manner, it is shown in Chapter 3 that a process $y = [y_1^T, y_2^T, \ldots, y_n^T]^T$ admits a specific LTI–SS representation in the so-called coordinated form if and only if $y_i$ does not Granger cause $y_n$ and $y_i$ conditionally does not Granger cause $y_j$ with respect to $y_n$ for all $i \ne j$, $i, j = 1, 2, \ldots, n-1$. Informally, an LTI–SS representation in coordinated form is a system whose network graph is a star graph (a tree of depth one which has one root node and all other nodes of which are leaves) such that the root node of the graph corresponds to the subsystem which generates $y_n$ and the leaves correspond to subsystems which generate $y_j$, $j = 1, \ldots, n-1$. Chapter 3, together with Chapter 2, is based on the journal paper (Jozsa et al., 2018b).

In Chapter 4, the existence of an LTI–SS representation with the so-called transitive acyclic directed graph (TADG) zero structure is characterized by a series of conditional Granger non-causality conditions: Let $y = [y_1^T, y_2^T, \ldots, y_n^T]^T$ be the output process of an LTI–SS representation that has zero structure with a TADG $G = (V, E)$, where the set of nodes is $V = \{1, 2, \ldots, n\}$. Informally, an LTI–SS representation has a TADG zero structure with the graph G if its network graph is the graph G. Then we associate each component $y_i$ of y with a node i of G. The conditional Granger non-causality conditions can be explained as follows: If $(j, i)$ is not an edge of G then $y_i$ does not conditionally Granger cause $y_j$ with respect to the collection of the components of y that correspond to the parent nodes of j. Chapter 4 is partially based on the conference paper (Jozsa et al., 2017a).
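Mechanically, the rule above turns a graph into a finite list of testable conditions. The Python sketch below enumerates them for a hypothetical four-node TADG, reading the pair $(j, i)$ and the parent sets exactly as in the preceding paragraph (the edge set is invented for illustration):

```python
# Hypothetical TADG on four nodes; the relation is transitive and acyclic.
edges = {(4, 1), (4, 2), (4, 3), (3, 1), (3, 2)}
nodes = [1, 2, 3, 4]

def parents(j):
    """Nodes i with an edge (i, j) into j."""
    return sorted(i for i in nodes if (i, j) in edges)

# If (j, i) is not an edge, y_i must not conditionally Granger cause y_j
# with respect to the components corresponding to the parent nodes of j.
for j in nodes:
    for i in nodes:
        if i != j and (j, i) not in edges:
            print(f"y{i} does not conditionally Granger cause y{j} "
                  f"w.r.t. components of nodes {parents(j)}")
```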

As a counterpart of Chapter 4, Chapter 5 studies transfer matrices of LTI–SS representations with TADG network graphs. It deals with transfer matrices with TADG zero structure and shows that these can be characterized by the same series of conditional Granger non-causality conditions as LTI–SS representations with TADG zero structure in Chapter 4. The results of Chapter 5 have been reported in the conference paper (Jozsa et al., 2017b).

In Chapter 6 the main results of Chapter 2 are extended to general bilinear state-space (GB–SS) representations and GB–Granger causality. It deals with GB–SS representations where a multiplicative input is present in the system and thus the relations between the processes of the system are nonlinear. It extends Granger causality to a more general statistical property of the input and output processes, called GB–Granger causality. It then shows that a process $y = [y_1^T, y_2^T]^T$ together with an input process admits a specific GB–SS representation in block triangular form if and only if $y_1$ does not GB–Granger cause $y_2$. The block triangular form of the GB–SS representation defines a network graph that has two nodes, corresponding to two subsystems generating $y_1$ and $y_2$, and an edge from the node associated with $y_2$ to the node associated with $y_1$. Chapter 6 is based on the submitted journal paper (Jozsa et al., 2018a).

Finally, in Chapter 7, we illustrate how the results and algorithms of Chapters 3–5 can be applied to simulated data. The results show that with the help of the results in Chapters 3–5 we can identify LTI–SS representations with different network graphs with precision comparable to classical identification methods.

Motivation

The contributions of this thesis are useful for reverse engineering of the network graph of stochastic dynamical systems. In addition, they can be relevant for distributed estimation/control and for structure preserving model reduction.

Reverse engineering of the network graph: By reverse engineering of the network graph we mean finding out the network graph of a system based on the observed output of the system. This problem arises in several domains such as systems biology (Nordling and Jacobsen, 2011; Julius et al., 2009; Kang et al., 2015; Valdes-Sosa et al., 2011; Westra et al., 2007), neuroscience (Roebroeck et al., 2011c), smart grids (Bolognani et al., 2013; Zhang et al., 2017), etc. To solve this problem, we first need to understand when the observed behavior can be realized by a system with a specific network graph.

An emerging application in neuroscience (Roebroeck et al., 2011c; Valdes-Sosa et al., 2011; Friston et al., 2003; Goebel et al., 2003) is to detect and model interactions between brain regions using, e.g., fMRI, EEG, MEG data. For this purpose, both Granger causality based methods (Goebel et al., 2003) and state-space based methods (Friston et al., 2003) were used. In the former case, the presence of an interaction was identified with the presence of Granger causality between the outputs associated with various brain regions. In the latter case, the presence of an interaction was interpreted as the presence of an edge in the network graph of a state-space representation, whose parameters were identified from data. However, the formal relationship between these methods was not always clear. This has led to a lively debate regarding the advantages/disadvantages of both methods (Valdes-Sosa et al., 2011; David, 2011; Roebroeck et al., 2011b). The results of Chapters 2–4 imply that the network graph of a specific LTI–SS representation defines causal relations in the output process and that the causal properties of the output process restrict the network graph of a potential LTI–SS representation. In fact, considering conditional Granger non-causality conditions and LTI–SS representations in innovation form having a certain network graph, the two approaches are formally equivalent and produce the same outcome. Therefore, these results serve as an answer to the debate on the relationship between state-space methods and Granger causality in LTI–SS representations. Besides, they provide important knowledge for reverse engineering of the network graphs of LTI–SS representations.

The cited applications (Nordling and Jacobsen, 2011; Julius et al., 2009; Kang et al., 2015; Roebroeck et al., 2011c; Valdes-Sosa et al., 2011; Friston et al., 2003) and related papers such as (Yuan et al., 2015; Yuan et al., 2011) use nonlinear state-space representations with inputs. For those state-space representations, there exist no simple methods for checking Granger non-causality. However, we show in Chapter 6 that by extending the concept of Granger causality to the so-called GB–Granger causality, GB–Granger non-causality can be translated to the existence of a GB–SS representation in innovation form having the two-node graph with one edge as its network graph. This can be a first step for further research on relating causality in the observed processes of nonlinear state-space systems to intrinsic properties, or more specifically to the network graph, of those systems.

Distributed estimation/control: For the design of interconnected systems, choosing alternative network graphs realizing the same functionality can be beneficial. For example, for deterministic coordinated LTI–SS systems with inputs (Kempker et al., 2014b; Kempker, 2012), several control problems such as stabilization can be solved in a distributed manner: in order to stabilize the coordinator, no knowledge of the state of the agents is required, and in order to stabilize each agent, only the state of this agent and of the coordinator are needed.

It is therefore of interest to know which observed behaviors can be represented by LTI–SS representations with certain network graphs and how to convert an LTI–SS representation into an LTI–SS representation with certain network graphs while preserving its observed behavior. Such a transformation would allow the design of distributed controllers for systems which do not initially have the given network graph. Chapters 2–4 provide such conditions and transformations for autonomous LTI–SS representations with transitive acyclic directed network graphs. Furthermore, Chapter 6 provides similar results on GB–SS representations with the network graph that has two nodes and one directed edge. Although we do not study control design in this thesis, the results/analysis we provide are necessary first steps towards solving more general cases. In addition, these systems are already useful for distributed state estimation, as for such systems we know the following: the state of each subsystem that corresponds to a node in the network graph can be estimated only by using its own output and the output of the subsystems corresponding to parent nodes in the network graph. Hence, the state of the LTI–SS or GB–SS representations that are the subject of this thesis can be estimated in a distributed manner. Moreover, the proposed algorithms for constructing the representations in Chapters 2–4 and 6 open up the possibility of distributed parameter estimation; for calculating a subsystem that is represented by a node in the corresponding network graph, only the observed process of the subsystem and the observed processes of the subsystems represented by the parent nodes are needed. That is, the results of Chapters 2–4 and 6 provide representations that are suitable for distributed parameter estimation, which serves as a basis for distributed control.

Structure preserving model reduction: The results of the thesis could also be of interest for structure preserving model reduction, where the goal is to replace an interconnected model of the system by another, smaller dimensional (in terms of the dimension of the states) interconnected model which has the same or a similar network graph as the original model, see (van der Schaft, 2015; Sandberg and Murray, 2009) and (Monshizadeh et al., 2014). To model an observed behaviour by interconnected systems or systems with a certain network graph, we first need to understand what property of an observed process allows for representing it by such models. Restricting ourselves to the interconnected models that are subjects of this thesis, by our results one can analyze a process to decide whether or not an interconnected model of the generating system exists. Furthermore, if it exists, then by using the methods of the thesis the interconnected model can be constructed. The constructed model has the following useful property: the reduction of the order of a subsystem corresponding to a node j of the network graph has a local effect, meaning that the subsystems that correspond to any other node i of the network graph such that there is no directed path from i to j or from j to i remain unchanged. Therefore, if the model reduction of a subsystem keeps the interconnection structure of the subsystem locally, then the global interconnection structure remains unchanged. If it does not, then, even though the local interconnection structure changes, the global interconnection structure is still partially preserved.


Related work

Besides the concept of network graph introduced in this thesis, there are several other notions for describing the structure of a system or the network of subsystems in a system. Examples of such notions are: feedback systems (Caines and Chan, 1975; Caines, 1976; Gevers and Anderson, 1982; Hsiao, 1982), dynamical structure function (Gonçalves et al., 2007; Howes et al., 2008; Yuan et al., 2015), dynamic network modeling (Weerts, 2018; Dankers, 2014; Weerts et al., 2018; Van den Hof et al., 2013), dynamic causal modeling (Friston et al., 2003; Havlicek et al., 2015; Penny et al., 2004), and causality graphs (Eichler, 2007; Eichler, 2012). Also, besides Granger and GB–Granger causality, there are several examples of statistical notions that have an essential role in understanding the relation between stochastic processes. We can mention here conditional orthogonality, transfer entropy (Barnett et al., 2009) and directional mutual information (L. Massey, 1990; Kramer, 1998).

The above-mentioned concepts are strongly related to each other; however, discussing this in detail is outside the scope of this thesis. Since Granger causality (Granger, 1963; Granger, 1988; Wiener, 1956) is a key concept in this thesis and in several related papers, to give more insight about the above-mentioned concepts we will focus on relating them to Granger causality. Note that in this thesis Granger causality is defined on covariance-stationary, or in other words weakly-stationary, processes; however, it is possible to extend it to non-stationary, e.g., to co-integrated, processes (Engle and Granger, 1987; Papana et al., 2014). Also, in this thesis Granger causality is defined as a logical value, i.e., it is either present or absent (see (Geweke, 1984) for measures of Granger causality), and in such a way that it does not change in time (see (Lütkepohl, 1993; Dufour and Renault, 1998; Triacca, 2000) for Granger causality in short and long time horizons).

Below, we first present previous research on model dependent formalization of interaction among processes and its model free counterpart. Then, we discuss existing results on the relationship between these two approaches.

Model dependent formalization of interaction among processes

Feedback: Even though the ideas behind Granger causality and feedback are different, in particular Granger causality is a model free concept in contrast to feedback, the two concepts are strongly related (Granger, 1963). The definition of feedback between processes is primarily based on feedback systems, i.e., models where a process serves as an input in the model of another process. It can, however, be defined in many ways. Note that this thesis studies the lack of Granger causality (Granger non-causality) and thus we now focus on the lack of feedback, or in other words, on the feedback free property of processes. In (Caines and Chan, 1975; Gevers and Anderson, 1982) and in (Caines, 1988, Chapter 10), the feedback free property is defined based on network graphs of moving-average (MA) models. In (Caines, 1976) the definition of the feedback free property in (Caines and Chan, 1975) is renamed the weakly feedback free property and a stronger notion, called the strongly feedback free property, is introduced. Both the strongly and weakly feedback free properties are then characterized by the model-free probabilistic concept called conditional orthogonality (Caines, 1976). Moreover, it is pointed out that the lack of causality defined in (Granger, 1963) is equivalent to the definition of the weakly feedback free property, although the strongly feedback free property is also discussed in (Granger, 1963, Section VII); see (Caines, 1988, Chapter 10) for more details. It is important to mention that in this thesis, Granger non-causality is equivalent to the weakly feedback free property of processes ((Caines, 1988, Definition 2.1, Chapter 10)), which is in several applications more realistic than the strongly feedback free property. We note that (Caines, 1988, Chapter 10) studies a more general class of processes than this thesis, namely there is no restriction to processes with rational spectral density matrix, or equivalently, to processes that have LTI–SS representations.

The results in (Granger, 1963; Caines and Chan, 1975; Caines, 1976; Gevers and Anderson, 1982) are fundamental for our work since they relate Granger causality to the network graph of MA and vector autoregressive (VAR) models. The results of Chapter 2 can be viewed as a counterpart of the results in the cited papers for LTI–SS representations. In addition, the results of Chapter 5 can be viewed as an extension of the results on Granger causality and MA models to processes that have LTI–SS representations, where a collection of Granger causalities is characterized in terms of the network graph of MA models, or equivalently, using the terminology of the thesis, transfer matrices.

Dynamical structure function: Dynamical structure function (Gonçalves et al., 2007; Howes et al., 2008; Yuan et al., 2015; Yuan et al., 2011; Gonçalves and Warnick, 2008) describes structural properties of systems. It was first defined on deterministic linear systems such as deterministic LTI–SS and VAR systems. In (Yue et al., 2015) it was further extended to stochastic VAR systems and related to Granger causality. Dynamical structure function is a similar concept to what we call in this thesis the network graph; however, it has not been applied to stochastic LTI–SS representations.

Dynamic network modeling: A dynamic network model in the sense of (Weerts, 2018; Dankers, 2014; Weerts et al., 2018; Van den Hof et al., 2013) is another approach for defining a network of interacting linear systems. The class of systems defined in the cited papers is different from the one considered in this thesis. In the cited papers, there are no general results on the relationship between Granger causality and the network graph of dynamic network models; however, examples were discussed in (Dankers, 2014).

Dynamical causal modeling: Dynamical causal modeling (Friston et al., 2003; Havlicek et al., 2015; Penny et al., 2004) is defined on both linear and nonlinear deterministic models. It is compared to Granger causality in (Roebroeck et al., 2011a), explaining that dynamical causal modeling captures mechanistic inference among variables of a system whereas Granger causality only shows a statistical connection between them. Hence, (Roebroeck et al., 2011a) relates Granger causality to dynamical causal modeling in a similar way to how we relate Granger causality to network graphs of stochastic representations; however, it does not aim to show equivalence between the two concepts.

Coordinated systems: The work in (Kempker, 2012; Kempker et al., 2014a; Ran and van Schuppen, 2014), where deterministic LTI–SS representations in coordinated form were introduced, gave strong motivation for Chapter 3. In (Kempker, 2012) and (Kempker et al., 2014a), a general method was presented to transform a system into coordinated form. In (Kempker, 2012) and (Pambakian, 2011), Gaussian coordinated systems and their LQG control were studied. Note that the cited papers did not relate the coordinated system structure to properties of the observed process.

Model free formalization of interaction among processes

Causality graph: In (Eichler, 2005; Eichler, 2007; Eichler, 2012) interactions in systems are defined with the help of causality graphs. Causality graphs are introduced using the combination of Granger causality and instantaneous coupling, see also (Yue et al., 2015). Note that in this thesis we do not study the notion of instantaneous coupling that is defined between the observed processes of the systems that represent them. From this perspective, the cited papers aim to study more complex statistical properties of stochastic processes. However, in the cited papers causality graphs are defined on processes that are outputs of stochastic VAR models and are related to parameters of VAR models, not to the more general class of systems, LTI–SS representations. In fact, the relation between causality graphs and parameters of VAR models in the cited papers is similar to the relation between the collection of causalities and LTI–SS representations shown in this thesis. Therefore, the results of this thesis can help in extending causality graphs defined in (Eichler, 2005; Eichler, 2007; Eichler, 2012) to processes that are outputs of LTI–SS representations.

Information theoretic concepts: A branch of information theory, called directed information theory (L. Massey, 1990; Kramer, 1998), studies directional relations between stochastic processes. For the purpose of Granger causality, an important notion from this field is the so-called conditional directional mutual information, or simply directed information (L. Massey, 1990), which is based on the notions of conditional transfer entropy and mutual information. In fact, conditional directional mutual information can be formulated using the probabilistic notion of conditional independence which, in the Gaussian case, coincides with conditional orthogonality. Since Granger causality (as well as conditional Granger causality) can be formalized by conditional orthogonality, it is not surprising that under certain conditions conditional directional mutual information can also provide an equivalent form of Granger causality, see (Barnett et al., 2009; Amblard and Michel, 2013).

To sum up the paragraphs above, we discussed different notions for describing statistical properties of stochastic processes and structures of dynamical systems. These were compared to the concepts considered in this thesis, in particular to Granger causality. In contrast to the definitions that were discussed above, we study network graphs of LTI–SS representations, LTI transfer matrices and GB–SS representations. The network graphs of the LTI systems are then related to conditional and unconditional Granger causalities among the components of the output process, and the network graphs of the GB–SS representations are related to the extended form of Granger causality, called GB–Granger causality.

Relationship between model free and model dependent approaches

State-space representation: The first results on Granger causality in terms of LTI–SS representations were presented in (Barnett and Seth, 2015; Solo, 2016). The cited papers characterize Granger causality in terms of properties of LTI–SS representations by using a transfer matrix approach. The ideas behind Chapter 2 and the cited papers are similar; however, we provide a different characterization of Granger causality in terms of properties of LTI–SS representations and, contrary to (Barnett and Seth, 2015; Solo, 2016), we give a construction for LTI–SS representations whose network graph characterizes Granger causality. Note that constructing such an LTI–SS representation is interesting since it provides a mechanistic explanation for Granger causality in its observed process.

The results in (Caines et al., 2003; Caines et al., 2009) are the closest ones to the results in Chapter 3. The cited papers provide necessary and sufficient conditions for the existence of LTI–SS representations in the so-called conditional orthogonal form. Conditionally orthogonal LTI–SS representations form a specific subclass of the LTI–SS representations in coordinated form discussed in Chapter 3, with additional assumptions on the covariance matrix of the noise process.

The conditions of (Caines et al., 2003; Caines et al., 2009) for the existence of such systems are much stronger than the conditions proposed in Chapter 4. The paper (Caines and Wynn, 2007) is the closest one to the results in Chapters 4 and 5. The cited paper studies LTI–SS representations and their transfer matrices of Gaussian processes in a form that is a subclass of the LTI–SS representations with transitive acyclic directed graph (TADG) zero structure discussed in Chapter 4, with additional assumptions on the covariance matrix of the noise process. This additional assumption is closely related to the notion of strongly feedback free property of processes. Recall that we study Granger non-causality, which corresponds to the weakly feedback free property of processes.

In (Caines and Wynn, 2007), the transfer matrices of this class of LTI–SS representations are also studied in terms of conditional orthogonality. The existence of these systems is characterized by stronger conditional orthogonality conditions than the conditional orthogonality conditions that are counterparts of the Granger causality conditions proposed in Chapters 4 and 5. Furthermore, the class of output processes that are modeled is more restrictive. Regarding the proofs of the statements of (Caines et al., 2003; Caines and Wynn, 2007; Caines et al., 2009), only the proof of existence of the LTI–SS representation in conditional orthogonal form can be found in (Caines et al., 2003), and it essentially differs from the proofs of the statements presented in Chapters 4 and 5. Note that (Caines et al., 2003; Caines and Wynn, 2007; Caines et al., 2009) did not provide algorithms to calculate the representations.

Causality in bilinear systems: The results of Chapter 6 relate the network graph of bilinear systems to an extended notion of Granger causality, called GB–Granger causality. To the best of our knowledge, the approach of extending Granger causality to capture nonlinear causal relations among processes, based on the properties of the nonlinear system that generates the processes, is new. We adopt the GB–SS representations from (Petreczky and René, 2017). The advantage of the adopted GB–SS representations is that, contrary to (Favoreel et al., 1999; Desai, 1986), the input process is not necessarily white noise. The disadvantage of the GB–SS representations considered in (Petreczky and René, 2017) and in Chapter 6 is that, contrary to (Chen and Maciejowski, 2001; D'Alessandro et al., 1974; Favoreel et al., 1999), they do not allow an additional input term in the system, only a multiplicative one.

Outline

Below, we briefly describe the outline of the thesis:

Chapter 1 introduces linear time-invariant state-space (LTI–SS) representations and general bilinear state-space (GB–SS) representations and presents results on realization theory of these systems. For background material on LTI–SS representations and their realization theory, we refer to (Lindquist and Picci, 2015; Katayama, 2005; Ljung, 1999; Hannan and Deistler, 1988; Van Overschee and De Moor, 1996). The background material on GB–SS representations and their realization theory can be found in (Petreczky and René, 2017) and the references therein.

Chapter 2 introduces Granger causality between two stochastic processes and presents results on characterizing it by the existence of so-called Kalman representations in block triangular form.

The results of Chapter 2 are generalized in Chapter 3, introducing conditional Granger causality. A collection of conditional Granger causality and Granger causality conditions is then characterized by the existence of so-called Kalman representations in coordinated form. Chapters 2 and 3 are based on the journal paper (Jozsa et al., 2018b).

As a collection of conditional and unconditional Granger causalities, Chapter 4 introduces the transitive acyclic directed graph (TADG) causality structure of ZMSIR processes. By using the results of Chapters 2 and 3, Chapter 4 presents results on characterizing the TADG-causality structure by the existence of Kalman representations with the so-called TADG-zero structure. Chapter 4 includes the results from the conference paper (Jozsa et al., 2017a); however, several additional statements are presented here that were not included in the cited paper.

The implications of the results presented in Chapter 4 for transfer matrices are formulated in Chapter 5. The results of Chapter 5 are presented independently of the results of the previous chapters and are based on the conference paper (Jozsa et al., 2017b).

Moving away from linear state-space models towards nonlinear state-space models, Chapter 6 deals with GB–SS representations. More precisely, we study GB–SS representations in the so-called innovation form and with a certain network graph. The main result shows that the existence of GB–SS representations in innovation form with a certain network graph is equivalent to an extended form of Granger causality, called GB–Granger causality.

Chapter 7 illustrates the main results of Chapters 2–4 in practice, by applying the algorithms from Chapters 2–4 on simulated data.


Chapter 1

Preliminaries

In this thesis we consider multivariate discrete-time stochastic processes where the discrete-time axis is the set of integers $\mathbb{Z}$. Let $(\Omega, \mathcal{F}, P)$ be a probability space, where $\mathcal{F}$ is a $\sigma$-algebra and $P$ is a probability measure on $\mathcal{F}$. Throughout the thesis, all the random variables and stochastic processes are understood with respect to this probability space. We denote the random variable of a process z at time $t \in \mathbb{Z}$ by $z(t)$. If $z(t)$ is k-dimensional (for all $t \in \mathbb{Z}$) then we call $k = \dim(z)$ the dimension of z and we write $z(t) \in \mathbb{R}^k$ or $z \in \mathbb{R}^k$. If $z_1, \ldots, z_n$ are vector-valued processes, then $z = [z_1^T, \ldots, z_n^T]^T$ denotes the process defined by $z(t) = [z_1^T(t), \ldots, z_n^T(t)]^T$, $t \in \mathbb{Z}$. Using standard notation, we denote the covariance matrix of two random variables y and z by $E[yz^T]$ and we denote the conditional expectation of y given a $\sigma$-algebra $\mathcal{F}_1 \subseteq \mathcal{F}$ by $E[y \mid \mathcal{F}_1]$.

Throughout the thesis, the $n \times n$ identity matrix is denoted by $I_n$, or by $I$ when its dimension is clear from the context. Likewise, the $n \times m$ zero matrix is denoted by $0_{n,m}$ or by $0$.

1.1 Hilbert spaces of stochastic processes

The zero-mean square-integrable random variables form a Hilbert space, denoted by $\mathcal{H}$, with the covariance as the scalar product and with the standard multiplication by scalar and addition of random variables, see (Caines, 1988, Chapter 1) and (Gikhman and Skorokhod, 2004, Chapter 4) for more details.

The closed subspace generated by a set $U \subset \mathcal{H}$ is the smallest (with respect to set inclusion) closed subspace of $\mathcal{H}$ which contains $U$. The closed subspaces of $\mathcal{H}$ form Hilbert spaces themselves with the same inner product as $\mathcal{H}$. For this reason we refer to a closed subspace generated by some random variables in $\mathcal{H}$ as the Hilbert space generated by those random variables.

Let $z \in \mathbb{R}^k$ be a zero-mean square-integrable process and consider a time instant $t \in \mathbb{Z}$ as the present time. Then $H^z_{t-}$, $H^z_{t+}$, $H^z_t$ denote the Hilbert spaces generated by the past, future and present values of z, i.e., by the sets $\{\ell^T z(s) \mid s \in \mathbb{Z},\, s < t,\, \ell \in \mathbb{R}^k\}$, $\{\ell^T z(s) \mid s \in \mathbb{Z},\, s \ge t,\, \ell \in \mathbb{R}^k\}$, and $\{\ell^T z(t) \mid \ell \in \mathbb{R}^k\}$, respectively.

If $u \in \mathbb{R}$ is a random variable in $\mathcal{H}$ and $U \subseteq \mathcal{H}$ is a closed subspace, then we denote by $E_l[u \mid U]$ the orthogonal projection of u onto U. The orthogonal projection of a multivariate random variable $u \in \mathbb{R}^k$ onto U is defined element-wise and is denoted by $E_l[u \mid U]$. That is, $E_l[u \mid U]$ is the random variable with values in $\mathbb{R}^k$ obtained by projecting the one-dimensional coordinates of u onto U. Accordingly, the orthogonality of u to U is meant element-wise. The orthogonal projection of a closed subspace $U \subseteq \mathcal{H}$ onto a closed subspace $V \subseteq \mathcal{H}$ is defined by $E_l[U \mid V] := \{E_l[u \mid V] \mid u \in U\}$. Note that for jointly Gaussian processes y and z, the orthogonal projection $E_l[y(t) \mid H^z_t]$ of $y(t)$ onto $H^z_t$ is equivalent to the conditional expectation $E[y(t) \mid \sigma(z(t))]$ of $y(t)$ given the $\sigma$-algebra generated by $z(t)$.

Lastly, we mention that in subsequent chapters, for subspaces of $\mathcal{H}$, we will use the following operations: the sum of two subspaces $U, V \subseteq \mathcal{H}$ is written $U + V := \{u + v \mid u \in U,\, v \in V\}$ and the orthogonal complement of U in V (with respect to $\mathcal{H}$) is written $V \ominus U$; if $U \cap V = \{0\}$ then their direct sum is denoted by $U \dot{+} V$; if U and V are orthogonal then we write the orthogonal direct sum as $U \oplus V$.

1.2 LTI–SS representations

The results of Chapters 2–4 are based on linear stochastic realization theory. Therefore, to provide background material, we introduce the linear stochastic systems that are studied in these chapters and give a brief overview of basic results in the field (see (Lindquist and Picci, 2015)).

1.2.1 Introduction to LTI–SS representations

Below, we provide an introduction to linear time-invariant state-space (LTI–SS) representations. To begin with, we define the class of processes we will work with.

Definition 1.1 (ZMSIR). A stochastic process is called zero-mean square-integrable with rational spectrum (abbreviated by ZMSIR) if it is weakly-stationary, square-integrable, zero-mean, purely non-deterministic, and its spectral density is a proper rational function.

See (Lindquist and Picci, 2015; Rozanov, 1987) for further details on the properties of purely non-deterministic, weakly-stationary (wide-sense stationary in (Rozanov, 1987)) processes with rational spectrum. In the literature, it is common to assume that ZMSIR processes are coercive: Recall from (Lindquist and Picci, 2015, Definition 9.4.1) that y is coercive if its spectrum is strictly positive definite on the unit disk. Coercive and non-coercive processes are discussed separately in the main results of Chapters 2–4.

Next, we define the term LTI–SS representation for the class of ZMSIR processes.

Definition 1.2 (LTI–SS representation). A stochastic LTI–SS representation is a stochastic dynamical system of the form

\[
\begin{aligned}
x(t+1) &= Ax(t) + Bv(t) \\
y(t) &= Cx(t) + Dv(t),
\end{aligned} \tag{1.1}
\]

where $A \in \mathbb{R}^{n \times n}$, $B \in \mathbb{R}^{n \times m}$, $C \in \mathbb{R}^{p \times n}$, $D \in \mathbb{R}^{p \times m}$ for $n \ge 0$, $m, p > 0$, and where $x \in \mathbb{R}^n$, $y \in \mathbb{R}^p$, $v \in \mathbb{R}^m$ are ZMSIR processes. The processes x, y and v are called state, output and noise processes, respectively. Furthermore, we require that A is stable, or equivalently that all its eigenvalues are inside the open unit circle, and that for any $t, k \in \mathbb{Z}$, $k \ge 0$, $E[v(t)v^T(t-k-1)] = 0$ and $E[v(t)x^T(t-k)] = 0$, i.e., $v(t)$ is white noise and uncorrelated with $x(t-k)$. An LTI–SS representation with output process y is called an LTI–SS representation of y.

In (1.1) the state process x is uniquely determined by the noise process v and the system matrices A and B so that $x(t) = \sum_{k=0}^{\infty} A^k B v(t-k)$, where the convergence of the infinite sum is understood in the mean square sense. On this basis, (1.1) is referred to as the LTI–SS representation $(A, B, C, D, v, y)$ or the LTI–SS representation $(A, B, C, D, v)$ of y. Following the classical terminology, we call the dimension of the state process x the dimension of the LTI–SS representation (1.1). Also, an LTI–SS representation (1.1) is called minimal if it has minimal dimension among all the LTI–SS representations of y. Notice that we allow (1.1) to have zero dimension. Zero-dimensional LTI–SS representations correspond to representations of white noise processes ($y = Dv$). Whenever we say that $(A, B, C, D, v)$ is a minimal LTI–SS representation of a white noise process, it means that A, B, C are absent (or they are zero-by-zero empty matrices). Zero-dimensional representations are considered to be minimal, observable and controllable.
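As a small numerical sanity check of the mean-square sum $x(t) = \sum_{k=0}^{\infty} A^k B v(t-k)$, one can compare the recursion in (1.1) with a truncated sum; the matrices below are illustrative and not from the thesis:

```python
import numpy as np

rng = np.random.default_rng(1)
# A small stable example (illustrative values, not from the thesis).
A = np.array([[0.5, 0.2],
              [0.0, 0.3]])
B = np.array([[1.0],
              [0.5]])
T, trunc = 200, 300

v = rng.standard_normal((T + trunc, 1))
# Recursive simulation of x(t+1) = A x(t) + B v(t), started at zero.
x = np.zeros((T + trunc + 1, 2))
for t in range(T + trunc):
    x[t + 1] = A @ x[t] + B @ v[t]

# Truncated version of x(t) = sum_{k >= 0} A^k B v(t - k).
t0 = T + trunc
approx = sum(np.linalg.matrix_power(A, k) @ B @ v[t0 - 1 - k]
             for k in range(trunc))
print(np.allclose(x[t0], approx))  # True up to the truncation error
```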

Note that the class of ZMSIR processes coincides with the class of processes that can be represented by LTI–SS representations. For convenience, we will assume that the outputs of LTI–SS representations, i.e., of ZMSIR processes, have a so-called full-rank property. To define the full-rank property of ZMSIR processes, we use the following terminology: Recall that $H^y_{t-}$ denotes the Hilbert space generated by $\{y(t-k)\}_{k=1}^{\infty}$. We call the process

\[
e(t) := y(t) - E_l[y(t) \mid H^y_{t-}], \qquad t \in \mathbb{Z}
\]

the (forward) innovation process of y.
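In practice the innovation process can only be approximated from a finite past. The Python sketch below approximates $E_l[y(t) \mid H^y_{t-}]$ by a least-squares regression of $y(t)$ on the $p$ most recent past values; this finite-past surrogate is an assumption of the sketch, not the exact (infinite-past) projection:

```python
import numpy as np

def innovation_residuals(y, p=10):
    """Approximate e(t) = y(t) - E_l[y(t) | H^y_{t-}] by the residual of a
    least-squares regression of y(t) on [y(t-1); ...; y(t-p)]."""
    T, d = y.shape
    X = np.hstack([y[p - k:T - k] for k in range(1, p + 1)])  # past values
    Y = y[p:]                                                 # current values
    coef, *_ = np.linalg.lstsq(X, Y, rcond=None)
    return Y - X @ coef
```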

(35)

18 1. Preliminaries

Definition 1.3. An output process y of an LTI–SS representation is called full rank if the variance matrix of the innovation process of y is strictly positive definite.

The following assumption will be in force for the rest of the thesis.

Assumption 1.4. The output process $y$ of an LTI–SS representation (1.1) is full rank.

Assumption 1.4 is a commonly used technical assumption that cannot be made without loss of generality. However, we know that if $z$ is a ZMSIR process with innovation process $e^z$, then there exists a full column rank matrix $M$ and a full rank process $y$ such that $z = My$ and $e^z = Me^y$, where $e^y$ is the innovation process of $y$; see (Lindquist and Picci, 2015, (4.46)).

1.2.2 Realization theory of LTI–SS representations

Stochastic LTI–SS representations of a given process $y$ are strongly related to deterministic LTI–SS realizations of the covariance sequence $\Lambda^y_k := E[y(t+k)y^T(t)]$, $k = 0, 1, 2, \ldots$; see (Lindquist and Picci, 2015, Chapter 6) and (Caines, 1988, Chapter 4) for more details. Below we briefly sketch this relationship, as it plays an important role in deriving the results of Chapter 2. Consider an LTI–SS representation $(A, B, C, D, v)$ of $y$ with state process $x$. Note that weak stationarity implies that the (co)variance matrices are time-independent. Denote the noise variance matrix by $\Lambda^v_0 = E[v(t)v^T(t)]$ and the state variance matrix by $\Lambda^x_0 = E[x(t)x^T(t)]$. Then $\Lambda^x_0$ is the unique symmetric solution of the Lyapunov equation $\Sigma = A\Sigma A^T + B\Lambda^v_0 B^T$, and the covariance $G := E[y(t)x^T(t+1)]$ satisfies
$$G = C\Lambda^x_0 A^T + D\Lambda^v_0 B^T. \qquad (1.2)$$

In light of this, the covariances $\{\Lambda^y_k\}_{k=0}^{\infty}$ of $y$ are equal to the Markov parameters of the deterministic LTI–SS system $(A, G^T, C, \Lambda^y_0)$, where recall that $\Lambda^y_k = E[y(t+k)y^T(t)]$. More precisely, for $k > 0$,
$$\Lambda^y_k = C A^{k-1} G^T. \qquad (1.3)$$
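Continuing the simulation sketch above, the relations (1.2)–(1.3) can be checked numerically: scipy's discrete Lyapunov solver gives $\Lambda^x_0$, from which $G$ and the model covariances follow, and these can be compared with sample covariances of the simulated output. This is only an illustrative consistency check, reusing `A`, `B`, `C`, `D`, `y` and `T` from the previous sketch.

```python
from scipy.linalg import solve_discrete_lyapunov

Lam_v0 = np.eye(1)                                      # E[v(t) v^T(t)]
Lam_x0 = solve_discrete_lyapunov(A, B @ Lam_v0 @ B.T)   # Sigma = A Sigma A^T + B Lam_v0 B^T
G = C @ Lam_x0 @ A.T + D @ Lam_v0 @ B.T                 # (1.2)

# Compare the model covariances (1.3) with sample covariances of the
# simulated output y from the sketch above; agreement is approximate.
for k in range(1, 4):
    model_cov = C @ np.linalg.matrix_power(A, k - 1) @ G.T
    sample_cov = y[k:].T @ y[:-k] / (T - k)
    print(k, model_cov.ravel(), sample_cov.ravel())
```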

Therefore, LTI–SS representations of $y$ yield deterministic LTI–SS systems whose Markov parameters are the covariances $\{\Lambda^y_k\}_{k=0}^{\infty}$ of $y$. Conversely, deterministic LTI–SS systems whose Markov parameters are the covariances $\{\Lambda^y_k\}_{k=0}^{\infty}$ yield LTI–SS representations of $y$. Recall that the process $e(t) := y(t) - E_l[y(t) \mid \mathcal{H}^y_{t-}]$, $t \in \mathbb{Z}$, is the innovation process of $y$.


Assume now that $(A, G^T, C, \Lambda^y_0)$ is a stable minimal deterministic LTI–SS system whose Markov parameters are the covariances of $y$, i.e., (1.3) holds. Call a matrix $M$ a minimal symmetric solution of a matrix equation if, for any other symmetric solution $\tilde{M}$, the matrix $\tilde{M} - M$ is positive semi-definite. Let $\Sigma^x$ be the minimal symmetric solution of the algebraic Riccati equation
$$\Sigma = A\Sigma A^T + (G^T - A\Sigma C^T)(\Delta(\Sigma))^{-1}(G^T - A\Sigma C^T)^T, \qquad (1.4)$$
where $\Delta(\Sigma) = \Lambda^y_0 - C\Sigma C^T$, and set $K$ as
$$K := (G^T - A\Sigma^x C^T)(\Delta(\Sigma^x))^{-1}. \qquad (1.5)$$
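A minimal numerical sketch of (1.4)–(1.5) is given below. It computes the minimal symmetric solution of (1.4) by iterating the associated Riccati difference equation from $\Sigma = 0$, which converges to the minimal solution under standard conditions; this iterative scheme and the helper names `riccati_min` and `kalman_gain` are illustrative assumptions of the sketch, not part of the thesis (the thesis refers to (Katayama, 2005) for Riccati solvers).

```python
import numpy as np

def riccati_min(A, C, G, Lam_y0, iters=5000, tol=1e-12):
    """Iterate the Riccati difference equation associated with (1.4),
    starting from Sigma = 0; under standard conditions the iteration
    converges to the minimal symmetric solution. (A dedicated ARE
    solver could be substituted here.)"""
    Sigma = np.zeros_like(A)
    for _ in range(iters):
        M = G.T - A @ Sigma @ C.T            # G^T - A Sigma C^T
        Delta = Lam_y0 - C @ Sigma @ C.T     # Delta(Sigma)
        Sigma_next = A @ Sigma @ A.T + M @ np.linalg.solve(Delta, M.T)
        if np.max(np.abs(Sigma_next - Sigma)) < tol:
            return Sigma_next
        Sigma = Sigma_next
    return Sigma

def kalman_gain(A, C, G, Lam_y0, Sigma):
    """K = (G^T - A Sigma C^T) Delta(Sigma)^{-1}, as in (1.5)."""
    M = G.T - A @ Sigma @ C.T
    Delta = Lam_y0 - C @ Sigma @ C.T
    return np.linalg.solve(Delta.T, M.T).T
```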

Then we know the following about the tuple $(A, K, C, I, e)$.

Proposition 1.5. (Katayama, 2005, Section 7.7) Let $K$ be as in (1.5) and let $e$ be the innovation process of $y$. Then the tuple
$$(A, K, C, I, e) \qquad (1.6)$$
is a minimal LTI–SS representation of $y$.

If $x$ is the state of $(A, K, C, I, e)$, then the minimal symmetric solution of (1.4) is $\Sigma^x = E[x(t)x^T(t)]$ and $\Delta(\Sigma^x) = E[e(t)e^T(t)]$. Furthermore, $K = E[x(t+1)e^T(t)]\,(E[e(t)e^T(t)])^{-1}$ in (1.5) is the gain of the steady-state Kalman filter (Lindquist and Picci, 2015, Section 6.9). This motivates the following definition:

Definition 1.6. Let $e, y \in \mathbb{R}^p$ be ZMSIR processes and let $A \in \mathbb{R}^{n \times n}$, $K \in \mathbb{R}^{n \times p}$, $C \in \mathbb{R}^{p \times n}$, $D \in \mathbb{R}^{p \times p}$. An LTI–SS representation $(A, K, C, D, e, y)$ is called a Kalman representation if $e$ is the innovation process of $y$ and $D = I_p$.

A Kalman representation with output process $y$ is called a Kalman representation of $y$. A Kalman representation is minimal, and is called a minimal Kalman representation, if it is a minimal LTI–SS representation. The representation in Proposition 1.5 is a minimal Kalman representation; thus, from the discussion above, we can conclude the following.

Proposition 1.7. Every ZMSIR process $y$ has a minimal Kalman representation.

Notice that Proposition 1.7 trivially implies that every ZMSIR process has a minimal LTI–SS representation.

An important feature of Kalman representations is that they can be calculated from the covariance sequence of the output process; see Algorithm 1 below. As a consequence, Kalman representations of a process can be calculated from any LTI–SS representation of that process; see Algorithm 2 below.


In Chapters 2–4, we deal with the so-called coercivity property of ZMSIR processes; see (Lindquist and Picci, 2015, Definition 9.4.1). In terms of Kalman representations, coercivity of a process $y$ is equivalent to the invertibility of any Kalman representation $(A, K, C, I, e)$ of $y$, i.e., to the existence of the inverse matrix $(A - KC)^{-1}$; see (Lindquist and Picci, 2015, Theorem 9.4.2). From this, it is easy to see that if $y$ is coercive, i.e., $(A - KC)^{-1}$ exists, then the innovation process $e$ can be expressed as
$$e(t) = y(t) - \sum_{k=0}^{\infty} C(A - KC)^k K\, y(t-k-1).$$
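On finite data, the expression above can be evaluated recursively. The sketch below (a hypothetical helper `innovation_filter`, not from the thesis) recovers an approximation of $e$ from a sample path of $y$:

```python
import numpy as np

def innovation_filter(A, K, C, y):
    """Recover e(t) from a sample path of y via the inverse (whitening)
    filter, a recursive form of the series above:
        xhat(t+1) = (A - K C) xhat(t) + K y(t),   e(t) = y(t) - C xhat(t).
    Starting from xhat(0) = 0 truncates the infinite past, so the first
    samples carry a transient."""
    F = A - K @ C
    xhat = np.zeros(A.shape[0])
    e = np.zeros_like(y)
    for t in range(len(y)):
        e[t] = y[t] - C @ xhat
        xhat = F @ xhat + K @ y[t]
    return e
```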

In view of the foregoing, we present Algorithms 1 and 2.

Algorithm 1 Minimal Kalman representation based on output covariances

Input: $\{\Lambda^y_k\}_{k=0}^{2N}$, the covariance sequence of $y$.
Output: $\{A, K, C, \Lambda^e_0\}$, the system matrices of (1.6) and the variance of the innovation process of $y$.

Step 1: Define the Hankel and shifted Hankel matrices
$$H_0 = \begin{bmatrix} \Lambda^y_1 & \Lambda^y_2 & \cdots & \Lambda^y_N \\ \Lambda^y_2 & \Lambda^y_3 & \cdots & \Lambda^y_{N+1} \\ \vdots & \vdots & & \vdots \\ \Lambda^y_N & \Lambda^y_{N+1} & \cdots & \Lambda^y_{2N-1} \end{bmatrix}, \qquad H_1 = \begin{bmatrix} \Lambda^y_2 & \Lambda^y_3 & \cdots & \Lambda^y_{N+1} \\ \Lambda^y_3 & \Lambda^y_4 & \cdots & \Lambda^y_{N+2} \\ \vdots & \vdots & & \vdots \\ \Lambda^y_{N+1} & \Lambda^y_{N+2} & \cdots & \Lambda^y_{2N} \end{bmatrix}.$$

Step 2: Calculate the SVD $H_0 = USV^T$.

Step 3: Let $m$ be such that $\Lambda^y_0 \in \mathbb{R}^{m \times m}$ and denote the first $m$ rows of a matrix by $(\cdot)_{1:m}$. Define
$$A = S^{-1/2} U^T H_1 V S^{-1/2}, \qquad C = (US^{1/2})_{1:m}, \qquad G = (VS^{1/2})_{1:m}.$$

Step 4: Find the minimal symmetric solution $\Sigma^x$ of the Riccati equation (1.4) (see, e.g., (Katayama, 2005, Section 7.4.2)).

Step 5: Set $K$ as in (1.5) and define $\Lambda^e_0 = \Lambda^y_0 - C\Sigma^x C^T$.

Note that Steps 1–3 of Algorithm 1 calculate a minimal deterministic LTI–SS system $(A, G^T, C, \Lambda^y_0)$ such that (1.3) holds, using the classical Kalman–Ho realization algorithm.
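A compact numpy sketch of Algorithm 1 is given below. It is illustrative only: the numerical rank threshold is an arbitrary choice, and it reuses the hypothetical helpers `riccati_min` and `kalman_gain` from the sketch after (1.5).

```python
import numpy as np

def algorithm1(Lam, N):
    """Sketch of Algorithm 1. `Lam` is the list [Lam_0, ..., Lam_{2N}] of
    p-by-p output covariances; riccati_min and kalman_gain are the
    hypothetical helpers from the sketch after (1.5)."""
    p = Lam[0].shape[0]
    # Step 1: block Hankel matrices built from Lam_1, ..., Lam_{2N}
    H0 = np.block([[Lam[i + j + 1] for j in range(N)] for i in range(N)])
    H1 = np.block([[Lam[i + j + 2] for j in range(N)] for i in range(N)])
    # Step 2: SVD of H0, truncated at the numerical rank n
    U, s, Vt = np.linalg.svd(H0)
    n = int(np.sum(s > 1e-8 * s[0]))     # arbitrary rank threshold
    U, s, Vt = U[:, :n], s[:n], Vt[:n, :]
    S_ph = np.diag(np.sqrt(s))           # S^{1/2}
    S_mh = np.diag(1.0 / np.sqrt(s))     # S^{-1/2}
    # Step 3: A = S^{-1/2} U^T H1 V S^{-1/2}; C and G from the factors
    A = S_mh @ U.T @ H1 @ Vt.T @ S_mh
    C = (U @ S_ph)[:p, :]
    G = (Vt.T @ S_ph)[:p, :]
    # Steps 4-5: minimal Riccati solution, gain, innovation variance
    Sigma = riccati_min(A, C, G, Lam[0])
    K = kalman_gain(A, C, G, Lam[0], Sigma)
    Lam_e0 = Lam[0] - C @ Sigma @ C.T
    return A, K, C, Lam_e0
```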


Algorithm 2 Minimal Kalman representation based on an LTI–SS representation

Input: $\{\bar{A}, \bar{B}, \bar{C}, \bar{D}, \Lambda^v_0\}$, the system matrices of an LTI–SS representation $(\bar{A}, \bar{B}, \bar{C}, \bar{D}, v)$ of $y$ and the variance of $v(t)$.
Output: $\{A, K, C, \Lambda^e_0\}$, the system matrices of (1.6) and the variance of the innovation process of $y$.

Step 1: Find the solution $\Sigma^x$ of the Lyapunov equation $\Sigma = \bar{A}\Sigma\bar{A}^T + \bar{B}\Lambda^v_0\bar{B}^T$.

Step 2: Define $G := \bar{C}\Sigma^x\bar{A}^T + \bar{D}\Lambda^v_0\bar{B}^T$ and calculate the output covariance matrices $\Lambda^y_0 = \bar{C}\Sigma^x\bar{C}^T + \bar{D}\Lambda^v_0\bar{D}^T$ and $\Lambda^y_k = \bar{C}\bar{A}^{k-1}G^T$ for $k = 1, \ldots, 2n$, where $n$ is such that $\bar{A} \in \mathbb{R}^{n \times n}$.

Step 3: Apply Algorithm 1 with input $\{\Lambda^y_k\}_{k=0}^{2n}$ and denote the output by $\{A, K, C, \Lambda^e_0\}$.
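Algorithm 2 thus reduces to a Lyapunov equation followed by a call to Algorithm 1. A sketch reusing the illustrative `algorithm1` above, with the $k = 0$ covariance computed from the state and noise variances:

```python
import numpy as np
from scipy.linalg import solve_discrete_lyapunov

def algorithm2(A_bar, B_bar, C_bar, D_bar, Lam_v0):
    """Sketch of Algorithm 2, reusing algorithm1 from the sketch above."""
    n = A_bar.shape[0]
    # Step 1: state variance from the Lyapunov equation
    Sigma_x = solve_discrete_lyapunov(A_bar, B_bar @ Lam_v0 @ B_bar.T)
    # Step 2: G as in (1.2); Lam_0 follows from the state and noise
    # variances, Lam_k from (1.3) for k = 1, ..., 2n
    G = C_bar @ Sigma_x @ A_bar.T + D_bar @ Lam_v0 @ B_bar.T
    Lam = [C_bar @ Sigma_x @ C_bar.T + D_bar @ Lam_v0 @ D_bar.T]
    for k in range(1, 2 * n + 1):
        Lam.append(C_bar @ np.linalg.matrix_power(A_bar, k - 1) @ G.T)
    # Step 3: run Algorithm 1 on the covariance sequence
    return algorithm1(Lam, n)
```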

Consider a process $y$ with covariance sequence $\{\Lambda^y_k\}_{k=0}^{\infty}$ and an LTI–SS representation $(\bar{A}, \bar{B}, \bar{C}, \bar{D}, v)$ of $y$. Let $e$ be the innovation process of $y$ and let $N$ be larger than or equal to the dimension of a minimal LTI–SS representation of $y$. Then it follows from (Katayama, 2005, Lemma 7.9, Section 7.7) that if $\{A, K, C, \Lambda^e_0\}$ is the output of Algorithm 1 with input $\{\Lambda^y_k\}_{k=0}^{2N}$, then $(A, K, C, I, e)$ is a minimal Kalman representation of $y$ and $\Lambda^e_0 = E[e(t)e^T(t)]$. Likewise, if $\{A, K, C, \Lambda^e_0\}$ is the output of Algorithm 2 with input $\{\bar{A}, \bar{B}, \bar{C}, \bar{D}, E[v(t)v^T(t)]\}$, then $(A, K, C, I, e)$ is a minimal Kalman representation of $y$ and $\Lambda^e_0 = E[e(t)e^T(t)]$.

Remark 1.9. Algorithms 1 and 2 involve matrix multiplication, matrix inversion, calculating SVDs, and solving Riccati and Lyapunov equations. The computational complexity of all the matrix operations involved is polynomial in the sizes of the matrices (Golub and Van Loan, 2013). Likewise, solving Riccati and Lyapunov equations is polynomial in the size of the solution matrix (Bini et al., 2011). For Algorithm 1, the sizes of the matrices involved are polynomial in the number $2N+1$ of output covariances and in their size $p = \dim(y)$; hence its complexity is polynomial in $N$ and $p$. By similar reasoning, Algorithm 2 has complexity polynomial in the dimensions of the state, output, and noise processes of the input LTI–SS representation $(\bar{A}, \bar{B}, \bar{C}, \bar{D}, v)$.

The algorithms in Chapters 2–4 are based on Algorithms 1–2 and, under certain conditions, they also calculate minimal Kalman representations. Minimal Kalman representations have the following useful properties.

Proposition 1.10. A Kalman representation $(A, K, C, I, e, y)$ is minimal if and only if $(A, K)$ is controllable and $(A, C)$ is observable.

Proposition 1.10 characterizes minimality of a Kalman representation $(A, K, C, I, e, y)$ by minimality of the deterministic system $(A, K, C, I)$. In general, the characterization of minimality of LTI–SS representations is more


involved, and it is related to the minimality of the deterministic LTI–SS system $(A, G^T, C, \Lambda^y_0)$ associated with the stochastic LTI–SS representation (see (Lindquist and Picci, 2015, Corollary 6.5.5)). The next proposition shows that minimal Kalman representations are isomorphic in the sense defined below.

Definition 1.11 (isomorphism). Consider two Kalman representations $(A, K, C, I, e)$ and $(\tilde{A}, \tilde{K}, \tilde{C}, I, e)$ of a process $y$. They are isomorphic if there exists a non-singular matrix $T$ such that $A = T\tilde{A}T^{-1}$, $K = T\tilde{K}$ and $C = \tilde{C}T^{-1}$.

Proposition 1.12. (Lindquist and Picci, 2015, Theorem 6.6.1) Any two minimal Kalman representations of a process $y$ are isomorphic.

Again, in general, this result does not hold for arbitrary pairs of LTI–SS representations of $y$. The statement and its proof can be found in (Lindquist and Picci, 2015, Theorem 6.6.1, Section 6.6), with the modification that here the noise process is not normalized.
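When both representations are minimal, the isomorphism $T$ of Definition 1.11 can be recovered from observability matrices: with the convention above, $\tilde{O} = OT$, hence $T = O^{+}\tilde{O}$. A sketch under the assumption that both systems are indeed minimal (the helper name `isomorphism` is illustrative):

```python
import numpy as np

def isomorphism(A, C, A_til, C_til):
    """Recover T with A = T A_til T^{-1} and C = C_til T^{-1} from the
    observability matrices: O_til = O T, hence T = pinv(O) O_til.
    Assumes both representations are minimal, so O has full column rank."""
    n = A.shape[0]
    O = np.vstack([C @ np.linalg.matrix_power(A, k) for k in range(n)])
    O_til = np.vstack([C_til @ np.linalg.matrix_power(A_til, k)
                       for k in range(n)])
    return np.linalg.pinv(O) @ O_til
```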

1.3 GB–SS representations

This section provides background material for Chapter 6 on general bilinear state-space (GB–SS) representations. We adopt the terminology of (Petreczky and René, 2017) and summarize some of its results. First, some basic notation and terminology are introduced. Then GB–SS representations are defined, and a brief summary of the realization theory of GB–SS representations is presented (for more details, see (Petreczky and René, 2017)).

1.3.1 Introduction to GB–SS representations

To define general bilinear state-space representations, we first introduce the necessary terminology. For the rest of the chapter, we fix a finite set $\{1, 2, \ldots, d\}$, where $d$ is a positive integer, and denote it by $\Sigma$.

Consider the discrete-time stochastic dynamical system
$$x(t+1) = \sum_{\sigma \in \Sigma} \left( A_\sigma x(t) + K_\sigma v(t) \right) u_\sigma(t),$$
$$y(t) = Cx(t) + Dv(t), \qquad (1.7)$$
where the state $x(t) \in \mathbb{R}^n$, noise $v(t) \in \mathbb{R}^m$, output $y(t) \in \mathbb{R}^p$, and input processes $u_\sigma(t) \in \mathbb{R}$, $\sigma \in \Sigma$, are weakly stationary stochastic processes.
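The following sketch simulates a system of the form (1.7) for $\Sigma = \{1, 2\}$ with i.i.d. $\pm 1$ inputs. All matrices are arbitrary illustrative choices, and the sketch does not enforce the additional GB–SS conditions discussed below.

```python
import numpy as np

rng = np.random.default_rng(1)
d, n, p, m, T = 2, 2, 1, 1, 1000

# Arbitrary illustrative matrices, indexed 0, ..., d-1 for sigma in Sigma.
A = [0.4 * np.eye(n), np.array([[0.1, 0.2], [0.0, 0.1]])]
K = [np.ones((n, m)), 0.5 * np.ones((n, m))]
C = np.array([[1.0, 0.0]])
D = np.eye(p)

u = rng.choice([-1.0, 1.0], size=(T, d))   # e.g. i.i.d. +/-1 inputs
v = rng.standard_normal((T, m))
x = np.zeros((T + 1, n))
y = np.zeros((T, p))
for t in range(T):
    y[t] = C @ x[t] + D @ v[t]
    x[t + 1] = sum((A[s] @ x[t] + K[s] @ v[t]) * u[t, s] for s in range(d))
```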

In order to define generalized bilinear state-space (abbreviated GB–SS) representations, we need to impose further restrictions on systems of the form (1.7). More precisely, we adopt the GB–SS representations of (Petreczky and René, 2017), which are state-space representations of the form (1.7) that satisfy a number of additional conditions. Note that these conditions are necessary for the realization theory of representations of the form (1.7). The following notation and terminology help us define these conditions.

Let $\Sigma^+$ be the set of all finite sequences of elements of $\Sigma$, i.e., a typical element of $\Sigma^+$ is a sequence of the form $w = \sigma_1 \cdots \sigma_k$, where $\sigma_1, \ldots, \sigma_k \in \Sigma$. We define the concatenation operation on $\Sigma^+$ in the standard way: if $w = \sigma_1 \cdots \sigma_k$ and $v = \hat{\sigma}_1 \cdots \hat{\sigma}_l$, where $\sigma_1, \ldots, \sigma_k, \hat{\sigma}_1, \ldots, \hat{\sigma}_l \in \Sigma$, then the concatenation of $w$ and $v$, denoted by $wv$, is defined as the sequence $wv = \sigma_1 \cdots \sigma_k \hat{\sigma}_1 \cdots \hat{\sigma}_l$. In the sequel, it will be convenient to extend $\Sigma^+$ by adding a formal unit element $\epsilon \notin \Sigma^+$. We denote this set by $\Sigma^* := \Sigma^+ \cup \{\epsilon\}$. The concatenation operation can be extended to $\Sigma^*$ as follows: $\epsilon\epsilon = \epsilon$ and, for any $w \in \Sigma^+$, $w\epsilon = \epsilon w = w$. Let $w = \sigma_1 \cdots \sigma_k \in \Sigma^+$. Then the length of $w$ is defined as $|w| := k$, and the length of $\epsilon$ is defined as $|\epsilon| := 0$. Consider a set of matrices $\{M_\sigma\}_{\sigma \in \Sigma}$, where $M_\sigma \in \mathbb{R}^{n \times n}$, $n \ge 1$, for all $\sigma \in \Sigma$, and let $w = \sigma_1 \cdots \sigma_k \in \Sigma^+$, where $\sigma_1, \ldots, \sigma_k \in \Sigma$. Then we denote the matrix $M_{\sigma_k} \cdots M_{\sigma_1}$ by $M_w$, and we define $M_\epsilon := I$. In addition, for a set of processes $\{u_\sigma\}_{\sigma \in \Sigma}$ and for $w = \sigma_1 \cdots \sigma_k \in \Sigma^+$, where $\sigma_1, \ldots, \sigma_k \in \Sigma$, we denote the process $u_{\sigma_k}(t) \cdots u_{\sigma_1}(t - |w| + 1)$ by $u_w(t)$ and define $u_\epsilon(t) :\equiv 1$. In a dynamical system (1.7), the past of the noise, state and output processes multiplied by the past of the input processes plays an important role in defining GB–SS representations and in adapting analytical tools from linear system theory to the study of GB–SS representations. For this reason, we define the following processes.

Definition 1.13. Consider a process $r$ and a set of processes $\{u_\sigma\}_{\sigma \in \Sigma}$. Let $w = \sigma_1 \cdots \sigma_k \in \Sigma^*$, where $\sigma_1, \ldots, \sigma_k \in \Sigma$. Then we define the process
$$z^{r-}_w(t) := r(t - |w|)\, u_w(t - 1),$$
which we call the past of $r$ with respect to $\{u_\sigma\}_{\sigma \in \Sigma}$.

Definition 1.14. Consider a process $r$ and a set of processes $\{u_\sigma\}_{\sigma \in \Sigma}$. Let $w = \sigma_1 \cdots \sigma_k \in \Sigma^*$, where $\sigma_1, \ldots, \sigma_k \in \Sigma$. Then we define the process
$$z^{r+}_w(t) := r(t + |w|)\, u_w(t + |w| - 1),$$
which we call the future of $r$ with respect to $\{u_\sigma\}_{\sigma \in \Sigma}$.

Notice that for $w = \epsilon$, both the past $z^{r-}_\epsilon(t)$ and the future $z^{r+}_\epsilon(t)$ of $r$ with respect to $\{u_\sigma\}_{\sigma \in \Sigma}$ equal $r(t)$.
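The word notation and Definitions 1.13–1.14 translate directly into code. The sketch below (illustrative helper names; symbols are 0-based, $\Sigma = \{0, \ldots, d-1\}$, whereas the text uses $\{1, \ldots, d\}$) computes $M_w$, $u_w(t)$, and the past and future processes from sample paths:

```python
import numpy as np

# A word w in Sigma^+ is a tuple of symbols; the empty tuple is epsilon.

def M_word(M, w):
    """M_w = M_{sigma_k} ... M_{sigma_1} for w = sigma_1 ... sigma_k,
    with M_eps = I for the empty word."""
    out = np.eye(M[0].shape[0])
    for sigma in w:              # later letters multiply from the left
        out = M[sigma] @ out
    return out

def u_word(u, w, t):
    """u_w(t) = u_{sigma_k}(t) ... u_{sigma_1}(t - |w| + 1); u_eps(t) = 1."""
    k = len(w)
    return np.prod([u[t - (k - 1) + i, w[i]] for i in range(k)]) if k else 1.0

def z_past(r, u, w, t):
    """z^{r-}_w(t) = r(t - |w|) u_w(t - 1), as in Definition 1.13."""
    return r[t - len(w)] * u_word(u, w, t - 1)

def z_future(r, u, w, t):
    """z^{r+}_w(t) = r(t + |w|) u_w(t + |w| - 1), as in Definition 1.14."""
    return r[t + len(w)] * u_word(u, w, t + len(w) - 1)
```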

The processes in Definitions 1.13 and 1.14 differ slightly from the parallel past and future processes used in (Petreczky and René, 2017).
