SHUFFLING CARDS with Random Transpositions

Bachelor’s Project Mathematics

Name: A.F.G. Pelzer

First supervisor: Dr. D. Rodrigues Valesin

Second supervisor: Prof. Dr. A.C.D. van Enter

Date: 2 July 2018

Contents

1. Introduction
2. Shuffling cards described mathematically
2.1. Preliminaries on Group Theory
2.2. Preliminaries on Markov Chains and Random Walks
2.3. Shuffling cards in mathematics
2.4. Examples
3. Random Transpositions
3.1. Natural method
3.2. The lower bound for the variation distance
3.3. Preliminaries on Representation Theory
3.4. The upper bound for the variation distance
4. Conclusion
5. Epilogue
References


1. Introduction

Playing card games have entertained people for many centuries. The first playing card set was made in China in the ninth century, and playing cards first appeared in Europe around 1370-1400. The most widely used card game was created in France around the year 1480. [12][11]

In this game the playing cards are numbered in the order Ace, 2, 3, 4, ..., 10, J (Jack), Q (Queen) and K (King), and the cards are divided into four suits with the following symbols: Hearts, Diamonds, Clubs and Spades. These symbols stand for the four medieval classes: the Hearts for the clergy, the Spades for the nobility, the Diamonds for the merchants and the Clubs for the farmers.

Other playing card sets can have different symbols; for example, some have Cups instead of Hearts. [10][11]

Since the end of the 17th century the Kings, Queens and Jacks in this French card game have carried fixed names: the Jack of Diamonds is Hector (the hero of the Trojan war, or Lancelot's brother), the Jack of Clubs is Lancelot (a Knight of the Round Table), the Queen of Spades is Pallas (the Greek goddess of wisdom and art) [13], the King of Diamonds is Julius Caesar, and so on. Another interesting fact is that the women on the cards are not actually queens and are not the wives of the Kings.

As in the card game just described, each card has a unique face and a back, and within one deck all backs are identical. Furthermore, every card has the same size and shape.

A card game is played with a deck of $n$ cards, where $n$ can be 32, 52, 54 and so on. The French card game uses 52 cards. [10][11]

Therefore there are $52!$ different ways to put a deck of 52 cards in some order, and $52! \approx 8.0658 \times 10^{67}$, which is a huge number. So for this game it is impossible to inspect all possible orderings of the playing cards, though for a deck with only a few cards we could.
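As a quick check of this arithmetic, one can compute $52!$ directly, e.g. in Python:

```python
import math

# Number of distinct orderings of a standard 52-card deck.
print(math.factorial(52))            # 80658175170943878... (68 digits)
print(f"{math.factorial(52):.9e}")   # ~8.065817517e+67
```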

However, we can never see directly whether a deck of $n$ cards is well shuffled, no matter what $n$ is.

First we will describe shuffling cards mathematically, using Group Theory, Markov chains and Random Walks, in Section 2.

In Section 3.1 the Natural method will be described. This is a shuffling method in which we use random transpositions. The main question is therefore as follows.

How many random transpositions do we need to shuffle a deck of n cards until it is well shuffled?

To answer this main question we will prove theorems giving a lower bound and an upper bound for the variation distance, in Sections 3.2 and 3.4. For the proof of the upper bound we will use Representation Theory, described in Section 3.3. The last section presents the conclusions about shuffling cards.


2. Shuffling cards described mathematically

2.1. Preliminaries on Group Theory.

An important group for describing shuffling cards mathematically is the symmetric group $S_n$. This is the group whose elements are the bijective functions on the set $\{1, 2, \ldots, n\}$, with composition of functions as the group operation. The group $S_n$ has the identity element $\mathrm{id}$, given by $\mathrm{id}(k) = k$, and for every $\sigma \in S_n$ there exists an inverse $\sigma^{-1} \in S_n$. Before we describe shuffling cards mathematically, we need some more information about cyclic groups, cycles and permutations. For this basic material on Group Theory, see [16][19][15].
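As a concrete aside, a permutation $\sigma \in S_n$ can be stored as the tuple $(\sigma(1), \ldots, \sigma(n))$; a minimal Python sketch of the group operations (the helper names are our own):

```python
# A permutation sigma in S_n stored as a tuple p with p[i-1] = sigma(i).
def compose(sigma, tau):
    """Return sigma∘tau, i.e. the permutation i -> sigma(tau(i))."""
    return tuple(sigma[tau[i - 1] - 1] for i in range(1, len(sigma) + 1))

def inverse(sigma):
    """Return sigma^{-1}."""
    inv = [0] * len(sigma)
    for i, j in enumerate(sigma, start=1):
        inv[j - 1] = i
    return tuple(inv)

identity = (1, 2, 3, 4)
sigma = (3, 1, 2, 4)                  # sigma(1)=3, sigma(2)=1, sigma(3)=2, sigma(4)=4
assert compose(sigma, inverse(sigma)) == identity
```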

Definition 1. Let $G$ be a group and $H \subset G$. Let $\langle H \rangle$ be the smallest subgroup of $G$ that contains all the elements of the set $H$. If $\langle H \rangle = G$, then $H$ generates $G$.

Definition 2. A group $G$ is called cyclic if there exists an element $g \in G$ such that
$$G = \langle g \rangle = \{g^n : n \in \mathbb{Z}\}.$$

Definition 3. Let $\sigma \in S_n$ be a permutation and assume that $a_1, a_2, \ldots, a_k \in \{1, 2, \ldots, n\}$ are distinct integers. If
$$\sigma(a_i) = a_{i+1} \text{ for } 1 \le i < k, \qquad \sigma(a_k) = a_1, \qquad \sigma(x) = x \text{ for } x \notin \{a_1, a_2, \ldots, a_k\},$$
then $\sigma$ is a cycle of length $k$, also called a $k$-cycle in $S_n$, denoted by
$$\sigma = (a_1\ a_2\ \ldots\ a_k).$$
A 2-cycle is called a transposition.

Moreover, if two cycles $(a_1 a_2 \ldots a_k)$ and $(b_1 b_2 \ldots b_l)$ have the property $\{a_1, a_2, \ldots, a_k\} \cap \{b_1, b_2, \ldots, b_l\} = \emptyset$, then these two cycles are disjoint.

So the general form of the notation of a cycle is $\sigma = (a_1 a_2 \ldots a_k)$, where $\sigma(a_1) = a_2$, $\sigma(a_2) = a_3$, ..., $\sigma(a_{k-1}) = a_k$ and $\sigma(a_k) = a_1$.

Remark 1. Disjoint cycles commute.

Theorem 1. Every permutation $\sigma \in S_n$ can be written as a product of pairwise disjoint cycles, $\sigma = \sigma_1 \cdots \sigma_r$, where the cycles $\sigma_i$, $i \in \{1, 2, \ldots, r\}$, are pairwise disjoint. This product is unique, apart from the order of the $\sigma_i$.

Proof. We want to show that every $\sigma \in S_n$ can be written as a product of pairwise disjoint cycles $\sigma_i$, $i \in \{1, 2, \ldots, r\}$.

First we prove the existence of this product of disjoint cycles by induction on $n$.

For $n = 1$ the only permutation is $\sigma = (1)$, which trivially is a product of disjoint cycles.

Let $n > 1$ and assume that the existence of this product holds for all permutations in $S_m$, where $m < n$.

Now let $\sigma \in S_n$; then $\{1, \sigma(1), \sigma^2(1), \ldots\} \subset \{1, 2, \ldots, n\}$. Therefore there exist $k$ and $l$ with $k < l$ such that $\sigma^k(1) = \sigma^l(1)$. This is the same as $\sigma^{l-k}(1) = 1$, so there exists $s \in \mathbb{Z}_{>0}$ with $\sigma^s(1) = 1$.

Let the least such number be denoted by $q$; then the integers $1, \sigma(1), \ldots, \sigma^{q-1}(1)$ are pairwise distinct. Thus $\sigma$ contains the $q$-cycle $\sigma_1 = (1\ \sigma(1)\ \ldots\ \sigma^{q-1}(1))$.

Consider the remaining integers in $\{1, 2, \ldots, n\}$. If this set is empty, then $\sigma = \sigma_1$ and we are done.

If it is not empty, then $\sigma$ acts as a permutation on this set. Applying the induction hypothesis for $S_m$ with $m < n$, the restriction of $\sigma$ to this subset can be written as a product of disjoint cycles $\sigma_2, \sigma_3, \ldots, \sigma_r$.

Viewing these cycles as permutations on $\{1, 2, \ldots, n\}$, we obtain $\sigma = \sigma_1 \sigma_2 \cdots \sigma_r$.

Now we show the uniqueness of this product, apart from the order of the $\sigma_i$, $i \in \{1, 2, \ldots, r\}$.

Assume for contradiction that some permutation $\sigma$ can be written as two different products of pairwise disjoint cycles.

Fix $i$ and $j$ such that $\sigma(i) = j$. Then in both products there exists exactly one cycle $\sigma_{r_1}$ of the form $(\ldots\ i\ j\ \ldots)$.

Next, suppose $\sigma(j) = h$; then there exists exactly one cycle $\sigma_{r_2}$ in both products of the form $(\ldots\ j\ h\ \ldots)$.

Suppose that $\sigma(h) = g$. Similarly, there exists one cycle $\sigma_{r_3}$ of the form $(\ldots\ h\ g\ \ldots)$ in both products.

Repeating this argument, we conclude that both products contain the same cycles, a contradiction. Thus every $\sigma \in S_n$ can be written as a product of pairwise disjoint cycles, and this product is unique apart from the order of the cycles. $\square$
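The existence part of this proof is directly algorithmic; a sketch in Python, using the same tuple convention as above:

```python
def cycle_decomposition(sigma):
    """Decompose sigma (tuple, 1-based values) into disjoint cycles,
    as in the proof: start at the smallest unvisited point and
    iterate sigma until it returns."""
    seen, cycles = set(), []
    for start in range(1, len(sigma) + 1):
        if start in seen:
            continue
        cycle, x = [], start
        while x not in seen:
            seen.add(x)
            cycle.append(x)
            x = sigma[x - 1]
        if len(cycle) > 1:            # omit fixed points (1-cycles)
            cycles.append(tuple(cycle))
    return cycles

print(cycle_decomposition((3, 1, 2, 4)))   # [(1, 3, 2)]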

A consequence of the last theorem is the following.

Theorem 2. Every permutation $\sigma \in S_n$ can be written as a product of 2-cycles.

Proof. From Theorem 1 we know that every permutation $\sigma \in S_n$ can be written as a product of cycles $\sigma_i$, $i \in \{1, 2, \ldots, k\}$. Since every cycle $\sigma_i = (a_{i1} a_{i2} \ldots a_{im})$ can be written as a product of 2-cycles as
$$(a_{i1} a_{i2} \ldots a_{im}) = (a_{i1} a_{i2})(a_{i2} a_{i3}) \cdots (a_{i,m-1} a_{im}),$$
$\sigma$ itself can be written as a product of 2-cycles. $\square$

Now we will discuss even and odd permutations.

First some notation: for $n \ge 2$ we write
$$\bar{X} := \{(i, j) \in \mathbb{Z} \times \mathbb{Z} : 1 \le i < j \le n\}.$$

For $\sigma \in S_n$ let $f_\sigma(i, j) = (\min\{\sigma(i), \sigma(j)\}, \max\{\sigma(i), \sigma(j)\})$ and let $h_\sigma : \bar{X} \to \mathbb{Q}$ be given by
$$h_\sigma(i, j) = \frac{\sigma(j) - \sigma(i)}{j - i}.$$

From this a useful lemma follows.

Lemma 1. Let $n \ge 2$. Then:
1) For $\sigma, \tau \in S_n$, $f_{\sigma\tau} = f_\sigma \circ f_\tau$.
2) $f_\sigma$ is a bijection on $\bar{X}$.
3) $\prod_{(i,j) \in \bar{X}} h_\sigma(i, j) = \pm 1$.

We will only prove 2) and 3), because 1) is evident; for the proof of 1) see [19].

Proof. Let $n \ge 2$.

2) $f_\sigma$ is a bijection on $\bar{X}$, since
$$\mathrm{id} = f_{\mathrm{id}} = f_{\sigma\sigma^{-1}} = f_{\sigma^{-1}\sigma} = f_{\sigma^{-1}} \circ f_\sigma = f_\sigma \circ f_{\sigma^{-1}}$$
by 1), so $f_\sigma$ has an inverse, namely $f_\sigma^{-1} = f_{\sigma^{-1}}$.

3) The absolute value of $\prod_{(i,j) \in \bar{X}} h_\sigma(i, j)$ is
$$\prod_{(i,j) \in \bar{X}} |h_\sigma(i, j)| = \prod_{(i,j) \in \bar{X}} \left|\frac{\sigma(j) - \sigma(i)}{j - i}\right| = \frac{\prod_{(i,j) \in \bar{X}} |\sigma(j) - \sigma(i)|}{\prod_{(i,j) \in \bar{X}} (j - i)},$$
because $i < j$.

Since $f_\sigma(\bar{X}) = \bar{X}$, the numerator above equals the product of all $(l - k)$ with $(k, l) \in \bar{X}$. Therefore
$$\prod_{(i,j) \in \bar{X}} |h_\sigma(i, j)| = \frac{\prod_{(k,l) \in \bar{X}} (l - k)}{\prod_{(k,l) \in \bar{X}} (l - k)} = 1.$$
So the absolute value is 1; hence
$$\prod_{(i,j) \in \bar{X}} h_\sigma(i, j) = \pm 1. \qquad \square$$

To determine whether a permutation is even or odd, we need the following definition.

Definition 4. For $n \ge 2$, $\epsilon(\sigma)$ is called the sign of a permutation $\sigma \in S_n$ and is given by
$$\epsilon(\sigma) = \prod_{(i,j) \in \bar{X}} h_\sigma(i, j) = \prod_{1 \le i < j \le n} \frac{\sigma(j) - \sigma(i)}{j - i} = \pm 1.$$
If $n = 1$, then $\epsilon(\sigma) = 1$.

A permutation $\sigma$ is even if $\epsilon(\sigma) = 1$, and odd if $\epsilon(\sigma) = -1$.

The sign $\epsilon(\sigma)$ has the following properties.

Theorem 3. The sign $\epsilon : S_n \to \{\pm 1\}$ is a homomorphism.

Proof. Let $\sigma, \tau \in S_n$; then $f_\tau$ is bijective on $\bar{X}$. Therefore
$$\epsilon(\sigma) = \prod_{(i,j) \in \bar{X}} h_\sigma(i, j) = \prod_{(i,j) \in \bar{X}} h_\sigma(f_\tau(i, j)) = \prod_{1 \le i < j \le n} \frac{\sigma(\tau(j)) - \sigma(\tau(i))}{\tau(j) - \tau(i)}.$$
So
$$\epsilon(\sigma\tau) = \prod_{(i,j) \in \bar{X}} \frac{(\sigma\tau)(j) - (\sigma\tau)(i)}{j - i} = \left(\prod_{1 \le i < j \le n} \frac{\sigma(\tau(j)) - \sigma(\tau(i))}{\tau(j) - \tau(i)}\right) \left(\prod_{(i,j) \in \bar{X}} \frac{\tau(j) - \tau(i)}{j - i}\right) = \epsilon(\sigma)\,\epsilon(\tau).$$
So the sign $\epsilon$ is a homomorphism. $\square$
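The defining product of Definition 4 can be evaluated literally and compared against known signs (a transposition is odd, a 3-cycle is even, as shown in Lemma 2 below); a small numerical sketch:

```python
from itertools import combinations

def sign_by_definition(sigma):
    """epsilon(sigma) = prod over i<j of (sigma(j)-sigma(i))/(j-i), per Definition 4."""
    prod = 1.0
    for i, j in combinations(range(1, len(sigma) + 1), 2):
        prod *= (sigma[j - 1] - sigma[i - 1]) / (j - i)
    return round(prod)               # the product is exactly +-1 up to rounding

assert sign_by_definition((2, 1, 3)) == -1   # the transposition (12) in S_3
assert sign_by_definition((2, 3, 1)) == 1    # the 3-cycle (123) is even
```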

Furthermore:

Lemma 2.
1) For all $\tau \in S_n$ and any $l$-cycle $(a_1 a_2 \ldots a_l) \in S_n$,
$$\tau (a_1 a_2 \ldots a_l) \tau^{-1} = (\tau(a_1)\ \tau(a_2)\ \ldots\ \tau(a_l)).$$
2) Every 2-cycle $(a_1 a_2)$ satisfies $\epsilon((a_1 a_2)) = -1$.

Proof. 1) Let $\tau \in S_n$ and let $(a_1 a_2 \ldots a_l)$ be an $l$-cycle in $S_n$. We have
$$(\tau (a_1 \ldots a_l) \tau^{-1})(\tau(a_l)) = (\tau (a_1 \ldots a_l))(a_l) = \tau(a_1).$$
Similarly, for $1 \le k < l$,
$$(\tau (a_1 \ldots a_l) \tau^{-1})(\tau(a_k)) = (\tau (a_1 \ldots a_l))(a_k) = \tau(a_{k+1}),$$
and for all remaining $i \in \{1, 2, \ldots, n\}$,
$$(\tau (a_1 \ldots a_l) \tau^{-1})(i) = i.$$
So
$$\tau (a_1 a_2 \ldots a_l) \tau^{-1} = (\tau(a_1)\ \tau(a_2)\ \ldots\ \tau(a_l)).$$

2) Assume that $(a_1 a_2)$ is a 2-cycle in $S_n$ and fix a permutation $\tau \in S_n$ such that $\tau(a_1) = 1$ and $\tau(a_2) = 2$. Since $\epsilon$ is a homomorphism, we get
$$\epsilon((12)) = \epsilon((\tau(a_1)\ \tau(a_2))) = \epsilon(\tau (a_1 a_2) \tau^{-1}) = \epsilon(\tau)\,\epsilon((a_1 a_2))\,\epsilon(\tau)^{-1} = \epsilon((a_1 a_2)),$$
so all transpositions have the same sign.

Applying Definition 4 to $\sigma = (12)$ we get
$$\epsilon((12)) = \frac{\sigma(2) - \sigma(1)}{2 - 1} = \frac{1 - 2}{2 - 1} = -1.$$
Hence any 2-cycle $(a_1 a_2)$ satisfies $\epsilon((a_1 a_2)) = -1$. $\square$

From these facts it follows that for any 2-cycle $(a_1 a_2) \in S_n$ and all $\sigma \in S_n$,
$$-\epsilon(\sigma) = \epsilon((a_1 a_2))\,\epsilon(\sigma) = \epsilon((a_1 a_2)\sigma).$$

The alternating group consists of all even permutations in the symmetric group:

Definition 5. For $n \ge 1$, the alternating group $A_n$ is the subgroup of $S_n$ that consists of all even permutations of $S_n$.


2.2. Preliminaries on Markov Chains and Random Walks.

In this subsection we will follow the book [16].

First, a finite Markov chain is a process which moves among the elements of a finite set $\Omega$ in the following way: if it is at $x \in \Omega$, then the next position is chosen according to a fixed probability distribution $P(x, \cdot)$. So:

Definition 6. Let $\Omega$ be the state space, $P$ the transition matrix and $x$ the current state. Assume that $(X_0, X_1, \ldots)$ is a sequence of random variables satisfying the following: for all $x, y \in \Omega$, all $t \ge 1$, and all events $H_{t-1} = \bigcap_{s=0}^{t-1} \{X_s = x_s\}$ satisfying $\mathbb{P}(H_{t-1} \cap \{X_t = x\}) > 0$, we have
$$\mathbb{P}\{X_{t+1} = y \mid H_{t-1} \cap \{X_t = x\}\} = \mathbb{P}\{X_{t+1} = y \mid X_t = x\} = P(x, y). \tag{2.1}$$
Then $(X_0, X_1, \ldots)$ is a Markov chain with state space $\Omega$ and transition matrix $P$.

Equation (2.1) is called the Markov property. It means that the probability of moving from state $x$ to state $y$ is the same no matter what happened in the sequence $x_0, x_1, \ldots, x_{t-1}$ before the current state $x$.

Definition 7. Assume that the Markov chain $(X_0, X_1, \ldots)$ has finite state space $\Omega$ and transition matrix $P$. Let $x \in \Omega$. The hitting time for $x$ is the first time at which the chain visits state $x$:
$$\tau_x = \min\{t \ge 0 : X_t = x\}.$$
When we only count visits to $x$ at times $t \ge 1$, we use the notation
$$\tau_x^+ = \min\{t \ge 1 : X_t = x\}.$$
If $X_0 = x$, then $\tau_x^+$ is called the first return time.

Definition 8. A stopping time $\tau$ for $(X_t)$ is a $\{0, 1, \ldots\} \cup \{\infty\}$-valued random variable such that, for all $t$, the event $\{\tau = t\}$ is determined by $X_0, X_1, \ldots, X_t$.

If $\tau$ is a stopping time, then using the previous definition and the Markov property we get
$$\mathbb{P}_{x_0}\{(X_{\tau+1}, X_{\tau+2}, \ldots, X_{\tau+l}) \in A \mid \tau = k \text{ and } (X_1, X_2, \ldots, X_k) = (x_1, x_2, \ldots, x_k)\} = \mathbb{P}_{x_k}\{(X_1, \ldots, X_l) \in A\}$$
for all $A \subset \Omega^l$, which is the strong Markov property.

Definition 9. A chain $P$ is irreducible if for any states $x, y \in \Omega$ there exists $t \in \mathbb{Z}_{>0}$ such that $P^t(x, y) > 0$.


This means that from any state we can reach any other state with a positive probability. Note that t can depend on the two states x and y.

Before we derive a result from the last definition: what is a random walk?

Definition 10. The random walk on a group $G$ with increment distribution $\mu$ is defined as the Markov chain with state space $G$ which moves by multiplying the current state on the left by a random element of $G$ selected according to $\mu$. So the transition matrix $P$ of this Markov chain is
$$P(g, hg) = \mu(h), \quad \forall g, h \in G.$$

Remark 2. We multiply on the left to be consistent with the usual notation for composition of functions in the non-commutative case. In the commutative case it does not matter, since multiplying on the left or on the right is then the same.

Thus from the two previous definitions and Definitions 1 and 2 we get the following result.

Proposition 1. Let $G$ be a finite group and let $S = \{g \in G : \mu(g) > 0\}$. The random walk on $G$ with increment distribution $\mu$ is irreducible if and only if $\langle S \rangle = G$.

Proof. '$\Rightarrow$': Let $\bar{g}$ be an arbitrary element of $G$. Assume that the random walk is irreducible, so there exists $r > 0$ such that $P^r(\mathrm{id}, \bar{g}) > 0$. Therefore there is a sequence $s_1, \ldots, s_r \in S$ such that $\bar{g} = s_r s_{r-1} \cdots s_1$. So $\langle S \rangle = G$.

'$\Leftarrow$': Now assume that $\langle S \rangle = G$. Let $\bar{g}, \tilde{g} \in G$ be arbitrary; then
$$\tilde{g}\bar{g}^{-1} = s_r s_{r-1} \cdots s_1, \quad s_i \in S,\ i \in \{1, 2, \ldots, r\},$$
because every element of $G$ has finite order, so every inverse occurring in a word for $\tilde{g}\bar{g}^{-1}$ can be rewritten as a positive power of the same group element. For $m = r$ we therefore have
$$P^m(\bar{g}, \tilde{g}) \ge P(\bar{g}, s_1\bar{g})\, P(s_1\bar{g}, s_2 s_1\bar{g}) \cdots P(s_{r-1} s_{r-2} \cdots s_1\bar{g}, (\tilde{g}\bar{g}^{-1})\bar{g}) = \mu(s_1)\mu(s_2)\cdots\mu(s_r) > 0.$$
So the random walk on $G$ with increment distribution $\mu$ is irreducible. $\square$

If all states have period 1, then the chain is aperiodic; if the chain is not aperiodic, it is periodic. The period of a state is defined in the following way.

Definition 11. Let $T(x) := \{t \ge 1 : P^t(x, x) > 0\}$ be the set of times at which it is possible for the chain to return to the starting position $x$. The period of the state $x$ is the greatest common divisor of $T(x)$, denoted by $\gcd(T(x))$.


This implies the following result.

Lemma 3. Let $\mu$ be a probability distribution on a group $G$. If $\mu(\mathrm{id}) > 0$, then the random walk with increment distribution $\mu$ is aperiodic.

Proof. Let $g \in G$; then $\mu(\mathrm{id}) = P(g, \mathrm{id} \cdot g) = P(g, g) > 0$. So $1 \in \{t : P^t(g, g) > 0\}$, therefore $\gcd\{t : P^t(g, g) > 0\} = 1$. Thus $\gcd(T(g)) = 1$ for every $g$, and the chain is aperiodic. $\square$

Definition 12. A distribution $\pi$ on a finite set $\Omega$ is called a stationary distribution of the Markov chain if $\pi$ satisfies
$$\pi = \pi P,$$
where $P$ is the transition matrix.

Before we say something about mixing times, we need the definition of the total variation distance.

Definition 13. The total variation distance between two probability distributions $\mu$ and $\nu$ on a finite set $\Omega$ is given by
$$||\mu - \nu||_{TV} = \max_{A \subset \Omega} |\mu(A) - \nu(A)|. \tag{2.2}$$

Proposition 2. Let $\mu$ and $\nu$ be two probability distributions on $\Omega$. Then
$$||\mu - \nu||_{TV} = \frac{1}{2} \sum_{x \in \Omega} |\mu(x) - \nu(x)|. \tag{2.3}$$

[Figure 1: the two distributions $\mu$ and $\nu$ plotted together; region I lies between the curves where $\mu \ge \nu$ (on $B$) and region II where $\nu > \mu$ (on $B^c$).]


Proof. Let $B = \{x : \mu(x) \ge \nu(x)\}$, as in Figure 1, so that $B^c = \{x : \mu(x) < \nu(x)\}$.

In Figure 1, region I has area $A(\mathrm{I}) = \mu(B) - \nu(B)$ and region II has area $A(\mathrm{II}) = \nu(B^c) - \mu(B^c)$. Note that the total area under each of $\mu$ and $\nu$ is one. Therefore
$$\mu(B) + \mu(B^c) = 1 \Rightarrow \mu(B) = 1 - \mu(B^c), \qquad \nu(B) + \nu(B^c) = 1 \Rightarrow \nu(B) = 1 - \nu(B^c).$$
So
$$A(\mathrm{I}) = \mu(B) - \nu(B) = \nu(B^c) - \mu(B^c) = A(\mathrm{II}),$$
hence regions I and II have the same area.

Now let $A \subset \Omega$ be any event. First, any $x \in A \cap B^c$ satisfies $\mu(x) - \nu(x) < 0$, so the difference in probability cannot decrease if we eliminate all these elements $x \in A \cap B^c$:
$$\mu(A) - \nu(A) \le \mu(A \cap B) - \nu(A \cap B).$$
Including more elements of $B$ cannot decrease the difference either, therefore
$$\mu(A \cap B) - \nu(A \cap B) \le \mu(B) - \nu(B).$$
Hence
$$\mu(A) - \nu(A) \le \mu(B) - \nu(B).$$
Similarly,
$$\nu(A) - \mu(A) \le \nu(A \cap B^c) - \mu(A \cap B^c) \le \nu(B^c) - \mu(B^c).$$
Taking $A = B$ gives
$$|\mu(A) - \nu(A)| = \nu(B^c) - \mu(B^c) = \mu(B) - \nu(B).$$
So
$$\max_{A \subset \Omega} |\mu(A) - \nu(A)| = ||\mu - \nu||_{TV} = \frac{1}{2}\big[\mu(B) - \nu(B) + \nu(B^c) - \mu(B^c)\big] = \frac{1}{2} \sum_{x \in \Omega} |\mu(x) - \nu(x)|.$$
Hence
$$||\mu - \nu||_{TV} = \frac{1}{2} \sum_{x \in \Omega} |\mu(x) - \nu(x)|. \qquad \square$$

Let $P$ be the transition matrix of the Markov chain and $\pi$ its stationary distribution. Define the maximal distance between $P^t(x, \cdot)$ and $\pi$ as
$$d(t) = \max_{x \in \Omega} ||P^t(x, \cdot) - \pi||_{TV}.$$
Then:

Definition 14. The mixing time measures the time required by a Markov chain for the distance to stationarity to be small. The mixing time is defined by
$$t_{\mathrm{mix}}(\epsilon) = \min\{t : d(t) \le \epsilon\}.$$
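Both expressions (2.2) and (2.3) for the total variation distance from Proposition 2 are easy to evaluate on a small state space; a minimal Python sketch (names are our own) checking that they agree, by brute force over all events, so only for a tiny $\Omega$:

```python
from itertools import chain, combinations

def tv_half_sum(mu, nu):
    """||mu - nu||_TV via the half-sum formula (2.3)."""
    return 0.5 * sum(abs(mu[x] - nu[x]) for x in mu)

def tv_max_over_events(mu, nu):
    """||mu - nu||_TV via (2.2): maximise |mu(A) - nu(A)| over all A subset Omega."""
    omega = list(mu)
    subsets = chain.from_iterable(combinations(omega, r) for r in range(len(omega) + 1))
    return max(abs(sum(mu[x] for x in A) - sum(nu[x] for x in A)) for A in subsets)

mu = {"a": 0.5, "b": 0.3, "c": 0.2}
nu = {"a": 0.2, "b": 0.3, "c": 0.5}
assert abs(tv_half_sum(mu, nu) - tv_max_over_events(mu, nu)) < 1e-12   # both 0.3
```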

2.3. Shuffling cards in mathematics.

Now the question is: how can we describe shuffling cards mathematically? We again follow [16].

An ordered arrangement of a deck of $n$ cards can be seen as an element of the symmetric group $S_n$, which consists of all permutations $\sigma$ of the numbers $\{1, 2, \ldots, n\}$.

We interpret permutations of $S_n$ as acting on the locations of the cards. For example, if $\sigma(1) = 3$, $\sigma(2) = 1$, $\sigma(3) = 2$ and $\sigma(4) = 4$, this means that card 1 goes to position 3, card 2 to position 1, card 3 to position 2, and card 4 stays in place 4. This can also be written in cycle notation: $\sigma = (132)(4)$.

Assume that $\mu$ is a probability distribution on $S_n$. We can define a shuffling procedure based on $\mu$ as follows: apply the permutation $\sigma \in S_n$ to the deck with probability $\mu(\sigma)$. This gives the following definition.

Definition 15. Under the above procedure, repeatedly shuffling the deck is the same as running the random walk on the group $S_n$ with increment distribution $\mu$.

Now let $K$ be the support of $\mu$, that is, $K = \{\sigma \in S_n : \mu(\sigma) > 0\}$. From Proposition 1 we get that $\langle K \rangle = S_n$ if and only if the chain obtained by repeatedly shuffling the deck is irreducible. Moreover, from Definition 12 one checks that every such shuffle chain has the uniform distribution as its stationary distribution.


2.3.1. Generating an exactly uniform Random Permutation.

Following [16], we describe a simple algorithm for generating an exactly uniform random permutation.

Definition 16. Let $\sigma_0$ be the identity permutation and let $k \in \{1, 2, \ldots, n\}$. We construct the permutation $\sigma_k$ from the previous permutation $\sigma_{k-1}$ by swapping the cards at locations $k$ and $J_k$, where $J_k$ is an integer chosen uniformly from $\{k, \ldots, n\}$, independently of $\{J_1, J_2, \ldots, J_{k-1}\}$. So
$$\sigma_k(i) = \begin{cases} \sigma_{k-1}(i) & \text{if } i \ne J_k,\ i \ne k, \\ \sigma_{k-1}(J_k) & \text{if } i = k, \\ \sigma_{k-1}(k) & \text{if } i = J_k. \end{cases} \tag{2.4}$$
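The procedure of Definition 16 is a version of the classical Fisher-Yates shuffle; a minimal sketch in Python:

```python
import random

def uniform_random_permutation(n):
    """Build sigma_{n-1} as in Definition 16: start from the identity and,
    for k = 1, ..., n-1, swap positions k and J_k with J_k uniform on {k,...,n}."""
    sigma = list(range(1, n + 1))          # sigma_0 = identity (1-based values)
    for k in range(1, n):                  # k = 1, ..., n-1
        jk = random.randint(k, n)          # J_k uniform on {k, k+1, ..., n}
        sigma[k - 1], sigma[jk - 1] = sigma[jk - 1], sigma[k - 1]
    return tuple(sigma)

print(uniform_random_permutation(52))      # one exactly-uniform deck order
```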

Now we want to prove that this generates a uniformly chosen element of $S_n$.

Lemma 4. Let $J_1, \ldots, J_{n-1}$ be independent integers, where $J_k$ is uniform on the set $\{k, k+1, \ldots, n\}$. Assume that $\sigma_{n-1}$ is the random permutation obtained by Definition 16. Then $\sigma_{n-1}$ is uniformly distributed on $S_n$.

Proof. We prove that $\sigma_{n-1}$ is uniformly distributed on $S_n$ by showing that
$$\mathbb{P}\big(\sigma_k(j) = \eta(j) \text{ for } j \in \{1, 2, \ldots, k\}\big) = \prod_{i=0}^{k-1} (n - i)^{-1} \tag{2.5}$$
by induction on $k \in \{1, 2, \ldots, n-1\}$.

Definition 16 gives us the equations
$$\sigma_k(i) = \begin{cases} \sigma_{k-1}(i) & \text{if } i \ne J_k,\ i \ne k, \\ \sigma_{k-1}(J_k) & \text{if } i = k, \\ \sigma_{k-1}(k) & \text{if } i = J_k. \end{cases}$$
Let a specific permutation $\eta \in S_n$ be given.

Base step: for $k = 1$,
$$\mathbb{P}\big(\sigma_1(j) = \eta(j) \text{ for } j \in \{1\}\big) = \prod_{i=0}^{0} (n - i)^{-1} = n^{-1},$$
since $\sigma_1(1) = \sigma_0(J_1) = J_1$ is uniform on $\{1, \ldots, n\}$ (with $\sigma_0$ the identity). So equation (2.5) is true for $k = 1$.

Induction step: assume that (2.5) holds for $k$; we have to prove it for $k + 1$. Let $\eta(1), \ldots, \eta(k+1)$ be distinct elements of $\{1, 2, \ldots, n\}$.

Then
$$\mathbb{P}(\sigma_{k+1}(1) = \eta(1), \ldots, \sigma_{k+1}(k) = \eta(k), \sigma_{k+1}(k+1) = \eta(k+1))$$
$$= \mathbb{P}(\sigma_{k+1}(1) = \eta(1), \ldots, \sigma_{k+1}(k) = \eta(k)) \cdot \mathbb{P}(\sigma_{k+1}(k+1) = \eta(k+1) \mid \sigma_{k+1}(1) = \eta(1), \ldots, \sigma_{k+1}(k) = \eta(k))$$
$$= \mathbb{P}(\sigma_k(1) = \eta(1), \ldots, \sigma_k(k) = \eta(k)) \cdot \mathbb{P}(\sigma_{k+1}(k+1) = \eta(k+1) \mid \sigma_k(1) = \eta(1), \ldots, \sigma_k(k) = \eta(k)),$$
applying Definition 16 (positions $1, \ldots, k$ are no longer touched after step $k$). The last expression equals
$$\prod_{i=0}^{k-1} (n - i)^{-1} \cdot \frac{1}{n - k} = \prod_{i=0}^{k} (n - i)^{-1},$$
so (2.5) holds for $k + 1$.

Thus by induction
$$\mathbb{P}\big(\sigma_k(j) = \eta(j) \text{ for } j \in \{1, 2, \ldots, k\}\big) = \prod_{i=0}^{k-1} (n - i)^{-1}$$
for all $k \in \{1, 2, \ldots, n-1\}$. Hence $\sigma_{n-1}$ is uniformly distributed on $S_n$. $\square$


2.4. Examples.

Here are some useful examples, from [16], of deciding whether a random walk is irreducible.

Example 1. Let $T$ be the set of all 3-cycles in $S_n$ and let $\mu$ be uniform on $T$. Every permutation in $T$ is even, so every product of 3-cycles is even; hence not every permutation in $S_n$ can be written as a product of 3-cycles, and $T$ does not generate $S_n$. By Proposition 1, the random walk with increment distribution $\mu$ is not irreducible.

Example 2. Now let $T$ be the set of all transpositions in $S_n$ and let $\mu$ be the uniform probability distribution on $T$, so all permutations in $T$ are odd. Since every permutation in $S_n$ can be written as a product of 2-cycles by Theorem 2, the set $T$ generates $S_n$. Therefore the random walk with increment distribution $\mu$ is irreducible by Proposition 1.

However, every element of the support of $\mu$ is odd. So if the random walk is started at the identity, its position must be an even permutation after an even number of steps, and an odd permutation after an odd number of steps; in particular the chain is periodic.

Example 3 (Lazy random transpositions). A more natural way of shuffling cards is the following. Let each card be labeled with a number between 1 and $n$. At time $t$ the shuffler chooses two cards $L_t$ and $R_t$ independently and uniformly at random. If $L_t$ and $R_t$ are different, transpose these two cards; otherwise, do nothing.

We get the following increment distribution $\mu$:
$$\mu(\sigma) = \begin{cases} \dfrac{1}{n} & \text{if } \sigma = \mathrm{id}, \\[4pt] \dfrac{2}{n^2} & \text{if } \sigma \text{ is a transposition in } S_n, \\[4pt] 0 & \text{otherwise.} \end{cases} \tag{2.6}$$
Indeed, there are $n^2$ equally likely pairs $(L_t, R_t)$ in total and $n$ of them give the identity, so the probability of the identity is $n/n^2 = 1/n$. Also, there are two pairs giving the transposition $(ij)$, namely $(i, j)$ and $(j, i)$, so the probability of the transposition $(ij)$ is $2/n^2$.

We will use this method in the next section; a sketch of one step follows below.
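A Python sketch of one step of this lazy shuffle. Here we pick two uniform positions of the deck, which induces the same increment distribution (2.6) as picking two uniform card labels:

```python
import random

def random_transposition_step(deck):
    """One shuffle step of Example 3: choose L_t, R_t independently and
    uniformly; swap the two cards if they differ, otherwise do nothing."""
    n = len(deck)
    l, r = random.randrange(n), random.randrange(n)
    deck[l], deck[r] = deck[r], deck[l]    # a no-op when l == r (the lazy case)
    return deck

deck = list(range(1, 53))
for _ in range(100):                       # k = 100 random transpositions
    random_transposition_step(deck)
print(deck)
```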


3. Random Transpositions

3.1. Natural method.

There are many shuffling methods. One of them is the top-in-at-random shuffle [2]: the top card is removed and inserted into the deck at a random position, and this process is repeated a number of times.

Another method is the riffle shuffle [2][16], the one most often used to shuffle real decks of 52 cards. It works as follows: first the shuffler cuts the deck into two piles, then the piles are riffled together.

We use the random transposition shuffle [16][7], because it is simpler than the other methods. This method is also called the Natural method, because it is a more natural way of shuffling cards. The procedure was described in Example 3.

On the symmetric group $S_n$ we have a probability measure $T$, which models a random transposition as explained in Example 3; from now on we write $T$ instead of $\mu$.

The convolution of this probability measure $T$ with itself $k$ times models the result of $k$ random transpositions. This convolution and the uniform distribution on $S_n$ are as follows.

Definition 17. $T^{*k}$ is the convolution of $T$ with itself $k$ times; it models the result of $k$ random transpositions. $U$ is the uniform distribution on the symmetric group $S_n$, so
$$U(\sigma) = \frac{1}{n!}, \quad \forall \sigma \in S_n. \tag{3.1}$$

Using equations (2.2) and (2.3), the total variation distance is in our case
$$||T^{*k} - U||_{TV} = \frac{1}{2} \sum_{\sigma \in S_n} |T^{*k}(\sigma) - U(\sigma)| = \frac{1}{2} \sum_{\sigma \in S_n} \left|T^{*k}(\sigma) - \frac{1}{n!}\right| = \max_{A \subset S_n} |T^{*k}(A) - U(A)|. \tag{3.2}$$

A lower bound and an upper bound for this variation distance are given in the following sections. First we need the Inclusion-Exclusion formula. We abbreviate $A_i \cap A_j$ by $A_i A_j$ for all $i, j$.


Theorem 4 (Inclusion-Exclusion Formula). If $\mathbb{P}$ is a probability function and $A_1, A_2, \ldots, A_n$ are any sets in $\mathcal{B}$, then
$$\mathbb{P}\Big(\bigcup_{i=1}^{n} A_i\Big) = \sum_{i=1}^{n} \mathbb{P}(A_i) - \sum_{1 \le i < j \le n} \mathbb{P}(A_i A_j) + \sum_{1 \le i < j < k \le n} \mathbb{P}(A_i A_j A_k) - \sum_{1 \le i < j < k < l \le n} \mathbb{P}(A_i A_j A_k A_l) + \ldots + (-1)^{n+1} \mathbb{P}\Big(\bigcap_{i=1}^{n} A_i\Big). \tag{3.3}$$

Proof. The proof is by induction on $n$. Assume that $\mathbb{P}$ is a probability function and $A_1, A_2, \ldots, A_n$ are any sets in $\mathcal{B}$.

First the case $n = 2$: $A_1$ and $A_2 \cap A_1^c$ are disjoint, since $A_1$ and $A_1^c$ are disjoint. So
$$\mathbb{P}(A_1 \cup A_2) = \mathbb{P}(A_1) + \mathbb{P}(A_2 \cap A_1^c). \tag{3.4}$$
Since $A_2 = \{A_2 \cap A_1\} \cup \{A_2 \cap A_1^c\}$, we have
$$\mathbb{P}(A_2) = \mathbb{P}(A_2 \cap A_1) + \mathbb{P}(A_2 \cap A_1^c) \iff \mathbb{P}(A_2 \cap A_1^c) = \mathbb{P}(A_2) - \mathbb{P}(A_1 \cap A_2).$$
So (3.4) becomes
$$\mathbb{P}(A_1 \cup A_2) = \mathbb{P}(A_1) + \mathbb{P}(A_2) - \mathbb{P}(A_1 \cap A_2).$$

Now the case $n = 3$:
$$\mathbb{P}(A_1 \cup A_2 \cup A_3) = \mathbb{P}(A_1 \cup (A_2 \cup A_3)).$$
Using the case $n = 2$, this equals
$$\mathbb{P}(A_1 \cup A_2) + \mathbb{P}(A_3) - \mathbb{P}((A_1 \cup A_2) \cap A_3).$$
Using the case $n = 2$ a few more times, this equals
$$\mathbb{P}(A_1) + \mathbb{P}(A_2) - \mathbb{P}(A_1 \cap A_2) + \mathbb{P}(A_3) - \mathbb{P}((A_1 \cap A_3) \cup (A_2 \cap A_3))$$
$$= \mathbb{P}(A_1) + \mathbb{P}(A_2) + \mathbb{P}(A_3) - \mathbb{P}(A_1 \cap A_2) - \mathbb{P}(A_1 \cap A_3) - \mathbb{P}(A_2 \cap A_3) + \mathbb{P}(A_1 \cap A_2 \cap A_3).$$

Assume the formula holds for the case $n$:
$$\mathbb{P}\Big(\bigcup_{i=1}^{n} A_i\Big) = \sum_{i=1}^{n} \mathbb{P}(A_i) - \sum_{1 \le i < j \le n} \mathbb{P}(A_i A_j) + \sum_{1 \le i < j < k \le n} \mathbb{P}(A_i A_j A_k) - \ldots + (-1)^{n+1} \mathbb{P}\Big(\bigcap_{i=1}^{n} A_i\Big).$$
We have to show it for the case $n + 1$, i.e. that
$$\mathbb{P}\Big(\bigcup_{i=1}^{n+1} A_i\Big) = \sum_{i=1}^{n+1} \mathbb{P}(A_i) - \sum_{1 \le i < j \le n+1} \mathbb{P}(A_i A_j) + \sum_{1 \le i < j < k \le n+1} \mathbb{P}(A_i A_j A_k) - \ldots + (-1)^{n+2} \mathbb{P}\Big(\bigcap_{i=1}^{n+1} A_i\Big).$$
Indeed, using the case $n = 2$ and the case $n$ we get
$$\mathbb{P}\Big(\bigcup_{i=1}^{n+1} A_i\Big) = \mathbb{P}(A_{n+1}) + \mathbb{P}\Big(\bigcup_{i=1}^{n} A_i\Big) - \mathbb{P}\Big(A_{n+1} \cap \bigcup_{i=1}^{n} A_i\Big). \tag{3.5}$$

Then
$$\mathbb{P}\Big(A_{n+1} \cap \bigcup_{i=1}^{n} A_i\Big) = \mathbb{P}\Big(\bigcup_{i=1}^{n} (A_i \cap A_{n+1})\Big) = \sum_{i=1}^{n} \mathbb{P}(A_i A_{n+1}) - \sum_{1 \le i < j \le n} \mathbb{P}(A_i A_j A_{n+1}) + \ldots$$
Therefore (3.5) is equal to
$$\mathbb{P}(A_{n+1}) + \sum_{i=1}^{n} \mathbb{P}(A_i) - \sum_{1 \le i < j \le n} \mathbb{P}(A_i A_j) - \sum_{i=1}^{n} \mathbb{P}(A_i A_{n+1}) + \sum_{1 \le i < j < k \le n} \mathbb{P}(A_i A_j A_k) + \sum_{1 \le i < j \le n} \mathbb{P}(A_i A_j A_{n+1}) - \ldots$$
$$= \sum_{i=1}^{n+1} \mathbb{P}(A_i) - \sum_{1 \le i < j \le n+1} \mathbb{P}(A_i A_j) + \sum_{1 \le i < j < k \le n+1} \mathbb{P}(A_i A_j A_k) - \ldots + (-1)^{n+2} \mathbb{P}\Big(\bigcap_{i=1}^{n+1} A_i\Big).$$
Hence, this proves the Inclusion-Exclusion formula. $\square$
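As a quick numerical sanity check of (3.3), one can compare both sides directly on small random events; a sketch (the uniform measure here is illustrative):

```python
from itertools import combinations
import random

def union_prob(p, sets):
    """Left side of (3.3): P(union of the A_i) for a probability function p on points."""
    union = set().union(*sets)
    return sum(p[x] for x in union)

def inclusion_exclusion(p, sets):
    """Right side of (3.3): alternating sum over all intersections."""
    total, n = 0.0, len(sets)
    for r in range(1, n + 1):
        for idx in combinations(range(n), r):
            inter = set.intersection(*(sets[i] for i in idx))
            total += (-1) ** (r + 1) * sum(p[x] for x in inter)
    return total

points = range(10)
p = {x: 0.1 for x in points}               # uniform probability function
sets = [set(random.sample(points, 4)) for _ in range(3)]
assert abs(union_prob(p, sets) - inclusion_exclusion(p, sets)) < 1e-12
```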

Also useful is the following asymptotic notation, from [4].

Definition 18. We write
$$f(n) = o(g(n))$$
if and only if
$$\lim_{n \to \infty} \frac{|f(n)|}{g(n)} = 0.$$
We write
$$f(n) = O(g(n))$$
if and only if there exists some constant $C > 0$ in $\mathbb{R}$ such that
$$|f(n)| \le C\,g(n).$$


3.2. The lower bound for the variation distance.

Theorem 5 (The lower bound for the variation distance). Let $T$ be the probability measure on $S_n$ given by equation (2.6) in Example 3 (writing $T$ instead of $\mu$), and let $T^{*k}$ be its $k$-fold convolution. Let $U$ be the uniform distribution on $S_n$, so that $U(\sigma) = \frac{1}{n!}$ for all $\sigma \in S_n$. Set
$$c = c(k, n) = \frac{k - \frac{1}{2} n \log n}{n}.$$
Then for all $k$,
$$||T^{*k} - U||_{TV} \ge \Big(\frac{1}{e} - e^{-e^{-2c}}\Big) + o(1), \tag{3.6}$$
as $n \to \infty$.

Remark 3. This lower bound is useful if $c < 0$, that is, if $k < \frac{1}{2} n \log n$.

So by the last remark we only have to consider the case $k < \frac{1}{2} n \log n$ in the proof of Theorem 5.

We now give the proof of Theorem 5, following the sources [7][8].

Proof of the lower bound for the variation distance.

Let $A$ be the set of all permutations in $S_n$ with one or more fixed points. For $i \in \{1, 2, \ldots, n\}$, $i$ is called a fixed point of the permutation $\sigma$ if $\sigma(i) = i$. We have the following claim.

Claim 1.
$$U(A) = 1 - \frac{1}{e} + O\Big(\frac{1}{n!}\Big). \tag{3.7}$$

Proof of Claim 1.

For $\sigma$ drawn uniformly from $S_n$,
$$U(A) = \mathbb{P}(\text{at least one fixed point of the permutation}) = \mathbb{P}\Big(\bigcup_{i=1}^{n} \{\sigma(i) = i\}\Big).$$
Using the Inclusion-Exclusion formula (3.3),
$$\mathbb{P}\Big(\bigcup_{i=1}^{n} \{\sigma(i) = i\}\Big) = \sum_{i=1}^{n} \mathbb{P}(\sigma(i) = i) - \sum_{1 \le i < j \le n} \mathbb{P}(\sigma(i) = i,\ \sigma(j) = j) + \ldots + (-1)^{n+1} \mathbb{P}\Big(\bigcap_{i=1}^{n} \{\sigma(i) = i\}\Big)$$
$$= n\,\mathbb{P}(\sigma(1) = 1) - \binom{n}{2} \mathbb{P}(\sigma(1) = 1, \sigma(2) = 2) + \binom{n}{3} \mathbb{P}(\sigma(1) = 1, \sigma(2) = 2, \sigma(3) = 3) - \ldots$$
We have
$$\mathbb{P}(\sigma(1) = 1) = \frac{1}{n}, \quad \mathbb{P}(\sigma(1) = 1, \sigma(2) = 2) = \frac{1}{n(n-1)}, \quad \mathbb{P}(\sigma(1) = 1, \sigma(2) = 2, \sigma(3) = 3) = \frac{1}{n(n-1)(n-2)},$$
and so on, whence
$$U(A) = n \cdot \frac{1}{n} - \frac{n(n-1)}{2!} \cdot \frac{1}{n(n-1)} + \frac{n(n-1)(n-2)}{3!} \cdot \frac{1}{n(n-1)(n-2)} - \ldots,$$
using, for example, $\binom{n}{2} = \frac{n!}{2!(n-2)!} = \frac{n(n-1)}{2!}$. So
$$U(A) = 1 - \frac{1}{2!} + \frac{1}{3!} - \frac{1}{4!} + \ldots + (-1)^{n+1}\frac{1}{n!} \approx 1 - \frac{1}{e},$$
because $e^x = \sum_{i=0}^{\infty} \frac{x^i}{i!}$, so
$$e^{-1} = \sum_{i=0}^{\infty} \frac{(-1)^i}{i!} = 1 - 1 + \frac{1}{2!} - \frac{1}{3!} + \ldots = \frac{1}{2!} - \frac{1}{3!} + \frac{1}{4!} - \ldots$$

Furthermore, we want to prove that
$$U(A) - (1 - e^{-1}) = O\Big(\frac{1}{n!}\Big)$$
in the sense of Definition 18. We have
$$(1 - e^{-1}) - U(A) = (-1)^{n}\frac{1}{(n+1)!} + (-1)^{n+1}\frac{1}{(n+2)!} + \ldots$$
Then, using the triangle inequality,
$$\big|U(A) - (1 - e^{-1})\big| \le \frac{1}{(n+1)!} + \frac{1}{(n+2)!} + \ldots = \sum_{i=1}^{\infty} \frac{1}{(n+i)!}.$$

Moreover,
$$\sum_{i=1}^{\infty} \frac{1}{(n+i)!} \le \frac{1}{n!} + \sum_{i=1}^{\infty} \frac{1}{(n+i)!} = \frac{1}{n!}\Big(1 + \frac{1}{n+1} + \frac{1}{(n+1)(n+2)} + \ldots\Big) \le \frac{1}{n!}\Big(1 + \frac{1}{n} + \frac{1}{n^2} + \frac{1}{n^3} + \ldots\Big) \le \frac{C}{n!},$$
where $C$ is some constant in $\mathbb{R}$, since $1 + \frac{1}{n} + \frac{1}{n^2} + \frac{1}{n^3} + \ldots$ is a convergent geometric series. Hence
$$\big|U(A) - (1 - e^{-1})\big| \le C\,\frac{1}{n!}$$
for some $C > 0$. Therefore, by Definition 18,
$$U(A) - (1 - e^{-1}) = O\Big(\frac{1}{n!}\Big),$$
which is the same as
$$U(A) = 1 - e^{-1} + O\Big(\frac{1}{n!}\Big),$$
and this proves the claim. $\square$

We have an equation for $U(A)$; now we want a lower bound for $T^{*k}(A)$. Let the random transpositions be $(L_1, R_1), \ldots, (L_k, R_k)$ and let $B$ be the event that the set of labels $\{L_i, R_i\}_{i=1}^{k}$ is strictly smaller than $\{1, 2, \ldots, n\}$. The second claim is as follows.

Claim 2.
$$\mathbb{P}(B) = 1 - e^{-n e^{-2k/n}} + o(1) \tag{3.8}$$
as $n \to \infty$, for all $k \in [0, \lfloor \frac{1}{2} n \log n \rfloor)$. Also, for every $\epsilon > 0$ there exists $n_0$ such that for $n \ge n_0$,
$$\big|\mathbb{P}(B) - \big(1 - e^{-n e^{-2k/n}}\big)\big| < \epsilon, \quad \text{for any } 0 \le k < \tfrac{1}{2} n \log n.$$

Proof of Claim 2.

The event $B$ can be viewed as follows: $2k$ balls are dropped into $n$ boxes (cells), and the probability of $B$ equals the probability that at least one cell is empty.

Each arrangement has probability $\frac{1}{n^{2k}}$. Let $A_j$ be the event that cell $j$ is empty, $j \in \{1, 2, \ldots, n\}$. On $A_j$ one cell is empty, so the $2k$ balls can be placed in the remaining $n-1$ cells in $(n-1)^{2k}$ different ways; for two empty cells we have $(n-2)^{2k}$ different ways, and so on. Therefore
$$\mathbb{P}(A_{j_1}) = \frac{(n-1)^{2k}}{n^{2k}} = \Big(1 - \frac{1}{n}\Big)^{2k}, \qquad \mathbb{P}(A_{j_1} \cap A_{j_2}) = \frac{(n-2)^{2k}}{n^{2k}} = \Big(1 - \frac{2}{n}\Big)^{2k},$$
and so on.

Then, using the Inclusion-Exclusion formula (3.3),
$$\mathbb{P}(\text{at least one cell is empty}) = \sum_{i=1}^{n} \mathbb{P}(A_i) - \sum_{1 \le i < j \le n} \mathbb{P}(A_i A_j) + \ldots + (-1)^{n+1}\mathbb{P}\Big(\bigcap_{i=1}^{n} A_i\Big)$$
$$= n\,\mathbb{P}(A_1) - \binom{n}{2}\mathbb{P}(A_1 \cap A_2) + \ldots + (-1)^{n+1}\mathbb{P}(A_1 \cap \ldots \cap A_n) = \sum_{i=1}^{n} (-1)^{i+1}\binom{n}{i}\Big(1 - \frac{i}{n}\Big)^{2k}.$$
Let $S_i = \binom{n}{i}\big(1 - \frac{i}{n}\big)^{2k}$; then
$$\mathbb{P}(\text{all cells are occupied}) = P_0(2k, n) = 1 - \mathbb{P}(\text{at least one cell is empty}) = \sum_{i=0}^{n} (-1)^i S_i.$$
Our goal is to find the limiting formula for the probability $P_0(2k, n)$ as $n$ tends to infinity with $k < \frac{1}{2} n \log n$.

Let $\lambda = n e^{-2k/n}$. Since $k < \frac{1}{2} n \log n$, $\lambda$ is positive; we let $2k$ and $n$ grow in such a way that $\lambda$ stays in a finite interval $(a, b)$, where $a, b \in \mathbb{R}$ and $0 < a < b$. We follow the proof of Theorem 3 in the book [8].

First we estimate $S_i$. Note that
$$n(n-1)\cdots(n-i+1) = \frac{n!}{(n-i)!}$$
and
$$(n-i)^i < \frac{n!}{(n-i)!} < n^i,$$
which gives
$$(n-i)^i\Big(1 - \frac{i}{n}\Big)^{2k} < i!\,S_i < n^i\Big(1 - \frac{i}{n}\Big)^{2k} \iff n^i\Big(1 - \frac{i}{n}\Big)^{i+2k} < i!\,S_i < n^i\Big(1 - \frac{i}{n}\Big)^{2k}. \tag{3.9}$$

To get another expression for (3.9), we need the following. The geometric series gives
$$\frac{1}{1 - t} = 1 + t + t^2 + t^3 + \ldots = \sum_{j=0}^{\infty} t^j, \quad \text{for } -1 < t < 1,$$
and therefore
$$\frac{t}{1 - t} = t + t^2 + t^3 + t^4 + \ldots$$
Moreover, the Taylor expansion of the natural logarithm is
$$\log t = \sum_{j=1}^{\infty} (-1)^{j-1}\frac{(t-1)^j}{j}, \quad \text{for } |t - 1| \le 1,\ t \ne 0.$$
Substituting $1 - t$ for $t$ in this expansion we get
$$\log(1 - t) = -\sum_{j=1}^{\infty} \frac{t^j}{j} \Rightarrow -\log(1 - t) = \sum_{j=1}^{\infty} \frac{t^j}{j} = t + \frac{1}{2}t^2 + \frac{1}{3}t^3 + \ldots, \quad \text{for } -1 < t < 1.$$
So, for $0 < t < 1$,
$$t < -\log(1 - t) < \frac{t}{1 - t} \iff e^{-t} > 1 - t > e^{-\frac{t}{1-t}}.$$
Using the last inequality with $t = i/n$, the expression (3.9) becomes
$$n^i e^{-\big(\frac{i/n}{1 - i/n}\big)(i + 2k)} < i!\,S_i < n^i e^{-\frac{2ki}{n}} \iff \Big(n e^{-\frac{i+2k}{n-i}}\Big)^i < i!\,S_i < \Big(n e^{-\frac{2k}{n}}\Big)^i. \tag{3.10}$$
Moreover, for a fixed $i$ we get
$$\frac{\big(n e^{-\frac{2k}{n}}\big)^i}{\big(n e^{-\frac{i+2k}{n-i}}\big)^i} \longrightarrow 1 \quad \text{and} \quad \frac{\big(n e^{-\frac{i+2k}{n-i}}\big)^i}{\big(n e^{-\frac{2k}{n}}\big)^i} \longrightarrow 1,$$
as $2k$ and $n$ tend to infinity in such a way that $0 < a < \lambda < b$. Since $\lambda = n e^{-2k/n}$, it follows that
$$\frac{i!\,S_i}{\lambda^i} \longrightarrow 1 \Rightarrow S_i \longrightarrow \frac{\lambda^i}{i!} \Rightarrow 0 \le \frac{\lambda^i}{i!} - S_i \longrightarrow 0, \tag{3.11}$$
if $2k$ and $n$ increase in such a way that $\lambda$ is bounded. The relation (3.11) still holds if $\lambda$ tends to zero, because then $S_i$ tends to zero too.

This relation (3.11) still holds, if λ tends to zero, because then Si tends to zero too.

Since 2k and n increase, such that 0 < a < λ < b, we can use (3.11) to get the following expression.

P0(2k, n) =

n

X

i=0

(−1)iSi

n

X

i=0

(−1)iλi

i! ≈ e−λ Moreover we want to show that

−P0(2k, n) + e−λ = o(1), using Definition 18.

Indeed,
$$\lim_{n \to \infty} \big|e^{-\lambda} - P_0(2k, n)\big| = \lim_{n \to \infty}\left|\sum_{i=0}^{\infty}\frac{(-\lambda)^i}{i!} - \sum_{i=0}^{n} (-1)^i S_i\right| = \left|\sum_{i=0}^{\infty}\frac{(-\lambda)^i}{i!} - \sum_{i=0}^{\infty}\frac{(-\lambda)^i}{i!}\right| = 0,$$
since $2k \to \infty$ and $n \to \infty$ in such a way that $\lambda$ is bounded, so (3.11) applies term by term. Thus, using Definition 18,
$$e^{-\lambda} - P_0(2k, n) = o(1) \iff \mathbb{P}(B) = 1 - e^{-\lambda} + o(1).$$

Now, for any $k < \frac{1}{2} n \log n$, we want to prove that for an arbitrary $\epsilon > 0$ there exists $n_0$ such that for $n \ge n_0$,
$$\big|(1 - e^{-\lambda}) - \mathbb{P}(B)\big| < \epsilon.$$

Indeed,
$$\big|(1 - e^{-\lambda}) - \mathbb{P}(B)\big| = \big|\mathbb{P}(B) - (1 - e^{-\lambda})\big| = \big|e^{-\lambda} - P_0(2k, n)\big| = \left|\sum_{i=0}^{\infty}\frac{(-\lambda)^i}{i!} - \sum_{i=0}^{n} (-1)^i S_i\right|,$$
and by (3.10) this difference is controlled by the tail of the convergent series $\sum_{i \ge 1} (-1)^{i+1}\lambda^i/i!$, which is bounded because $\lambda$ is positive and bounded ($0 < a < \lambda < b$, so $0 < e^{-\lambda} < 1$). Hence for $n$ large enough the difference is smaller than $\epsilon$, i.e.
$$\big|(1 - e^{-\lambda}) - \mathbb{P}(B)\big| < \epsilon.$$
Hence
$$\mathbb{P}(B) = 1 - e^{-n e^{-2k/n}} + o(1)$$
as $n \to \infty$, for all $k \in [0, \lfloor \frac{1}{2} n \log n \rfloor)$; and for every $\epsilon > 0$ there exists $n_0$ such that for $n \ge n_0$,
$$\big|\mathbb{P}(B) - \big(1 - e^{-n e^{-2k/n}}\big)\big| < \epsilon \quad \text{for any } 0 \le k < \tfrac{1}{2} n \log n.$$
This proves the second claim. $\square$
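Claim 2 can be checked empirically: drop 2k balls into n boxes and estimate the probability that at least one box stays empty. A Monte Carlo sketch (the parameters are illustrative):

```python
import math
import random

def prob_some_box_empty(n, k, trials=20000):
    """Monte Carlo estimate of P(B): 2k uniform ball placements
    leave at least one of the n boxes empty."""
    hits = 0
    for _ in range(trials):
        seen = {random.randrange(n) for _ in range(2 * k)}
        hits += len(seen) < n
    return hits / trials

n, k = 52, 80                                   # here (1/2) n log n is about 102.7
approx = 1 - math.exp(-n * math.exp(-2 * k / n))
print(prob_some_box_empty(n, k), "vs", approx)  # the two values should be close
```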

Recall that $c = \frac{k - \frac{1}{2} n \log n}{n}$, so
$$e^{-2c} = e^{-2k/n + \log n} = n e^{-2k/n}.$$
Therefore
$$\mathbb{P}(B) = 1 - e^{-n e^{-2k/n}} + o(1) = 1 - e^{-e^{-2c}} + o(1).$$
Since $B \subset A$ (if some label is never touched, the corresponding card is a fixed point), we have
$$T^{*k}(A) \ge \mathbb{P}(B) = 1 - e^{-e^{-2c}} + o(1). \tag{3.12}$$
Thus, using the variation distance (3.2) and the formulas (3.7) and (3.12), we get
$$||T^{*k} - U||_{TV} \ge |T^{*k}(A) - U(A)| \ge T^{*k}(A) - U(A) \ge 1 - e^{-e^{-2c}} + o(1) - 1 + e^{-1} + O\Big(\frac{1}{n!}\Big) = \Big(\frac{1}{e} - e^{-e^{-2c}}\Big) + o(1)$$
for all $k < \frac{1}{2} n \log n$, as $n \to \infty$, and this proves Theorem 5. $\square$
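The quantity driving this bound is concrete: $T^{*k}(A)$ is the probability that $k$ steps of the lazy shuffle of Example 3, started at the identity, leave at least one fixed point, while $U(A) \approx 1 - 1/e$ by Claim 1. A simulation sketch (parameters illustrative):

```python
import math
import random

def has_fixed_point_after_k_transpositions(n, k):
    """Run k steps of the lazy shuffle of Example 3 from the identity and
    report whether the resulting permutation has at least one fixed point."""
    deck = list(range(n))
    for _ in range(k):
        l, r = random.randrange(n), random.randrange(n)
        deck[l], deck[r] = deck[r], deck[l]    # no-op when l == r
    return any(deck[i] == i for i in range(n))

n, trials = 52, 20000
for k in (40, 80, 120, 160):
    # Estimate T^{*k}(A) by Monte Carlo; U(A) is about 1 - 1/e.
    tk_A = sum(has_fixed_point_after_k_transpositions(n, k)
               for _ in range(trials)) / trials
    print(k, tk_A - (1 - 1 / math.e))          # a lower bound on ||T^{*k} - U||_TV
```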


3.3. Preliminaries on Representation Theory.

Before we can state and prove the upper bound for the variation distance, we need some background on Representation Theory, following the article [7] and the book [17]. In this subsection we will not prove every result we state; the reader can find the missing proofs in [7].

What is a representation? Let $V$ be a vector space over the field $\mathbb{C}$, let $G$ be a finite group and let $GL(V) = \mathrm{Aut}(V)$ be the general linear group on $V$, where $\mathrm{Aut}(V)$ is the group of isomorphisms of $V$ onto itself.

Definition 19. A linear representation of $G$ is a homomorphism
$$\rho : G \to GL(V).$$
In other words, a linear representation is a map $\rho$ such that
$$\rho(st) = \rho(s)\rho(t), \quad \forall s, t \in G.$$
Furthermore $V$ is called the representation space of $G$, or shortly the representation of $G$.

If $\rho$ is the representation map and $V$ the representation space, we write $(\rho, V)$. The degree of a representation $\rho$, denoted $d_\rho$, is the dimension of $V$. We assume that $V$ has finite dimension, so $GL(V)$ can be identified with the group of invertible $d_\rho \times d_\rho$ matrices over the field $\mathbb{C}$.

Example 4. The trivial representation is $\rho(s) = \mathrm{id}$ for all $s \in G$.

Example 5. A representation $\rho$ with $d_\rho = 1$ is a map
$$\rho : G \to GL_1(\mathbb{C}) = \mathbb{C} \setminus \{0\} = \mathbb{C}^\times.$$
Since every element of $G$ has finite order, the values $\rho(s)$ are roots of unity.

Example 6. Let $G = \mathbb{Z}/2\mathbb{Z} \times \mathbb{Z}/2\mathbb{Z}$ and let $\rho : G \to GL_2(\mathbb{C})$ be the homomorphism given by
$$\rho(0, 0) = \begin{pmatrix} 1 & 0 \\ 0 & 1 \end{pmatrix}, \quad \rho(1, 0) = \begin{pmatrix} -1 & 0 \\ 0 & -1 \end{pmatrix}, \quad \rho(0, 1) = \begin{pmatrix} 0 & 1 \\ 1 & 0 \end{pmatrix}, \quad \rho(1, 1) = \begin{pmatrix} 0 & -1 \\ -1 & 0 \end{pmatrix}.$$
Then $\rho$ is a linear representation of $G$ of degree 2.

Definition 20. Let $\rho$ and $\pi$ be representations of the same group $G$. The representations $(\rho, V_\rho)$ and $(\pi, V_\pi)$ are equivalent (or isomorphic) if there exists a bijective linear map $T : V_\rho \to V_\pi$ such that
$$T \circ \rho(s) = \pi(s) \circ T, \quad \forall s \in G.$$
Equivalent representations have the same degree.
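As a quick sanity check of Example 6, one can verify by brute force that the map $\rho$ there respects the group operation; a minimal numpy sketch (helper names are our own):

```python
import itertools
import numpy as np

rho = {
    (0, 0): np.array([[1, 0], [0, 1]]),
    (1, 0): np.array([[-1, 0], [0, -1]]),
    (0, 1): np.array([[0, 1], [1, 0]]),
    (1, 1): np.array([[0, -1], [-1, 0]]),
}

def add(s, t):                                 # group law of Z/2Z x Z/2Z
    return ((s[0] + t[0]) % 2, (s[1] + t[1]) % 2)

for s, t in itertools.product(rho, repeat=2):
    # Homomorphism property of Definition 19: rho(st) = rho(s) rho(t).
    assert np.array_equal(rho[add(s, t)], rho[s] @ rho[t])
```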


Here is the definition of a subrepresentation.

Definition 21. Let $\rho : G \to GL(V)$ be a linear representation and let $W$ be a $G$-invariant vector subspace of $V$, meaning that $\rho(s)w \in W$ for all $w \in W$ and all $s \in G$. The restriction $\rho(s)|_W$ of $\rho(s)$ to $W$ is then an isomorphism of $W$, and
$$\rho|_W : G \to GL(W), \qquad \rho|_W(st) = \rho|_W(s)\,\rho|_W(t).$$
Therefore $\rho|_W$ is a linear representation of $G$ in $W$, called a (linear) subrepresentation of $G$; likewise $W$ is called a subrepresentation of $V$.

The last definition leads to the following.

Definition 22. A representation $\rho : G \to GL(V)$ is called irreducible if $V$ is not zero and the only $G$-invariant subspaces of $V$ are the trivial ones, $\{0\}$ and $V$.

Now we discuss the transform of a function at a representation.

Definition 23. Let $G$ be a finite group and $\rho : G \to GL(V)$ a representation of $G$. For a function $P : G \to \mathbb{C}$, the analog of the Fourier transform of $P$ at the representation $\rho$ is
$$\hat{\rho}(P) = \sum_{\eta \in G} P(\eta)\rho(\eta).$$
Furthermore, if $P_1$ and $P_2$ are functions on a finite group $G$, the convolution of $P_1$ and $P_2$ is given, for $\gamma \in G$, by
$$P_2 * P_1(\gamma) = \sum_{\eta \in G} P_2(\gamma\eta^{-1})P_1(\eta).$$
Therefore:

Lemma 5. If $P_1$ and $P_2$ are two functions on $G$ and $\rho$ is a representation of $G$, then
$$\hat{\rho}(P_1 * P_2) = \hat{\rho}(P_1)\,\hat{\rho}(P_2).$$

Let $\hat{G}$ be the set of all irreducible representations of $G$. For a function $P : G \to \mathbb{C}$ we have the map $\rho \mapsto \hat{\rho}(P)$, where $\rho \in \hat{G}$ and $\hat{\rho}(P) \in GL(V)$; so $P$ determines a matrix-valued function on $\hat{G}$.

Let $|G|$ denote the number of elements of $G$, also called the order of $G$. Further, let $\mathrm{Tr}[\,\cdot\,]$ denote the trace of a matrix and $*$ the complex conjugate transpose. Then:

Lemma 6. For any function $P : G \to \mathbb{C}$, the Plancherel formula holds:
$$\sum_{\eta \in G} |P(\eta)|^2 = \frac{1}{|G|}\sum_{\rho \in \hat{G}} d_\rho\,\mathrm{Tr}[\hat{\rho}(P)\hat{\rho}(P)^*]. \tag{3.13}$$

The character of the representation $\rho$ is denoted by $\chi_\rho$; it is the function $\chi_\rho : G \to \mathbb{C}$ such that
$$\chi_\rho(s) = \mathrm{Tr}[\rho(s)], \quad \forall s \in G.$$
A character of an irreducible representation is called an irreducible character.

Since the $\mathbb{C}$-vector space $V$ has dimension $d_\rho$, we have $\rho(1) = \mathrm{id}$ and $\mathrm{Tr}(\mathrm{id}) = d_\rho$, therefore $\chi_\rho(1) = d_\rho$. The irreducible characters of $G$ are denoted by $\chi_1, \ldots, \chi_r$, with $\chi_i(1) = d_i$ for $1 \le i \le r$.

Before we can go further, we need the definition of the regular representation.

Definition 24. Let $V$ be a vector space of dimension $|G|$ with a basis $(e_t)_{t \in G}$. For each $s \in G$ let $\rho_s : V \to V$ be the linear map such that
$$\rho_s(e_t) = e_{st}.$$
Then $s \mapsto \rho_s$ is called the regular representation of $G$ and $V$ is the regular representation space of $G$. Note that the degree of the regular representation equals the order $|G|$ of $G$, and $e_s = \rho_s(e_1)$.

To make this concrete, here is an example [18].

Example 7. In $S_3$ the vector space $V$ has the basis
$$(e_t)_{t \in S_3} = (e_{\mathrm{id}}, e_{(12)}, e_{(13)}, e_{(23)}, e_{(123)}, e_{(132)})$$
by Definition 24. If we take for example $s = (12)$ in $S_3$, then
$$\rho_{(12)} = \begin{pmatrix} 0 & 1 & 0 & 0 & 0 & 0 \\ 1 & 0 & 0 & 0 & 0 & 0 \\ 0 & 0 & 0 & 0 & 1 & 0 \\ 0 & 0 & 0 & 0 & 0 & 1 \\ 0 & 0 & 1 & 0 & 0 & 0 \\ 0 & 0 & 0 & 1 & 0 & 0 \end{pmatrix},$$
because $\mathrm{id} = (1)$ and $(1)(12) = (12)$, so we get a one in the second place of the first column. Further, $(12)(12) = \mathrm{id}$, so there is a one in the first place of the second column; $(13)(12) = (123)$, so there is a one in the fifth place of the third column, and so on. For the last column there is a one in the fourth place, since $(132)(12) = (23)$.
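This matrix can also be generated mechanically; a sketch that builds it from the stated basis order, using the same composition convention as the computation above (apply the right factor first; helper names are our own):

```python
import numpy as np

# Basis order of Example 7; permutations of S_3 as maps on {1, 2, 3}.
basis = ["e", "(12)", "(13)", "(23)", "(123)", "(132)"]
perm = {"e": {1: 1, 2: 2, 3: 3}, "(12)": {1: 2, 2: 1, 3: 3},
        "(13)": {1: 3, 2: 2, 3: 1}, "(23)": {1: 1, 2: 3, 3: 2},
        "(123)": {1: 2, 2: 3, 3: 1}, "(132)": {1: 3, 2: 1, 3: 2}}

def compose(a, b):
    """The product ab as used in the text: apply b first, then a."""
    m = {x: perm[a][perm[b][x]] for x in (1, 2, 3)}
    return next(name for name in basis if perm[name] == m)

s = "(12)"
M = np.zeros((6, 6), dtype=int)
for col, t in enumerate(basis):
    M[basis.index(compose(t, s)), col] = 1   # column e_t gets a 1 in row e_{t s}
print(M)
```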

Let $\chi_G$ be the character of the regular representation. If $s \ne 1$, then $st \ne t$ for all $t$, so the diagonal elements of the matrix $\rho_s$ are zero and $\chi_G(s) = \mathrm{Tr}(\rho_s) = 0$. Moreover, if $s = 1$, then $\rho_s$ is the identity matrix, so $\chi_G(1) = |G|$.
