Mastermath midterm examination Parallel Algorithms.

Academic year: 2021

Share "Mastermath midterm examination Parallel Algorithms."

Copied!
2
0
0

Bezig met laden.... (Bekijk nu de volledige tekst)

Hele tekst

(1)

Mastermath midterm examination Parallel Algorithms.

Teacher: Rob H. Bisseling, Utrecht University October 24, 2018

Each of the four questions is worth 10 points. Total time 120 minutes. Motivate your answers!

1. (a) [5 pt] What is an h-relation in the BSP model?

(b) [5 pt] What is the cost of an h-relation?

2. An artificial neural network has several layers, each consisting of artificial neurons and having connections to the previous and next layer.

Consider two layers: an input layer represented by a vector $x$ of length $n$, where $x_i$ is the activation value at neuron $i$ of this layer, and an output layer represented by a vector $y$ of length $n$.

(a) [5 pt] Assume that the input and output vectors are distributed by the block distribution over $p$ processors with $n \bmod p = 0$.

Assume that the output value $y_i$ is computed by

$$y_i = \max(a_i x_{i-1} + b_i x_i + c_i x_{i+1} - d_i,\ 0),$$

where the values $a_i, b_i, c_i$ are weights that have been obtained by previous training of the network, and $d_i$ is a bias value, also obtained by training. The weights and the bias are hence represented by vectors of length $n$, also in the block distribution. Define dummy values $x_{-1} = x_n = 0$ to define the boundary behaviour.

Give an efficient BSP algorithm for processor $P(s)$ for this computation in the notation we have learned.

(b) [5 pt] Analyse the BSP cost.
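
For reference, the layer formula of question 2 can be written out sequentially. The following is only a minimal sketch of the formula itself, not a BSP algorithm and not a model answer; the function name `layer_output` and the use of plain Python lists are assumptions made for illustration.

```python
def layer_output(x, a, b, c, d):
    """Sequential reference for y[i] = max(a[i]*x[i-1] + b[i]*x[i] + c[i]*x[i+1] - d[i], 0)."""
    n = len(x)
    # Pad with the dummy boundary values x[-1] = x[n] = 0 from the problem statement.
    padded = [0.0] + list(x) + [0.0]
    y = []
    for i in range(n):
        # padded[i] holds x[i-1], padded[i+1] holds x[i], padded[i+2] holds x[i+1].
        value = a[i] * padded[i] + b[i] * padded[i + 1] + c[i] * padded[i + 2] - d[i]
        y.append(max(value, 0.0))
    return y
```

Each output value depends only on three neighbouring input values, so in a block distribution only the values at block boundaries involve data held by another processor.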

3. Consider the LU decomposition of an n × n matrix A without pivoting.

(a) [2 pt] In stage $k$ of the sequential LU decomposition without pivoting, all matrix elements $a_{ij}$ with $k < i, j < n$ are updated by the statement $a_{ij} := a_{ij} - a_{ik} a_{kj}$. Determine the total cost of the matrix updates in this algorithm. You may use the formula

$$\sum_{k=0}^{n} k^2 = \frac{n(n+1)(2n+1)}{6}.$$

(b) [3 pt] Formulate the computation superstep of stage $k$ of the parallel LU decomposition that corresponds to the matrix update. Assume that we use the square block distribution, with $p = M^2$ processors and $n \bmod M = 0$. Use the notation we have learned to express algorithms.

(c) [3 pt] Analyse the computation cost of the matrix updates in the parallel LU decomposition algorithm for the square block distribution.

(d) [2 pt] What is the maximum speedup on p processors that the parallel LU decomposition algorithm can obtain, compared to the sequential algorithm?
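
As a reminder of where the update statement of question 3(a) sits, here is a minimal sequential sketch of in-place LU decomposition without pivoting. It assumes the usual formulation in which column $k$ is first divided by the pivot $a_{kk}$; that division step, the function name `lu_in_place`, and the list-of-lists storage are assumptions for illustration and are not taken from the exam text.

```python
def lu_in_place(a):
    """Overwrite the square matrix a (list of lists of floats) with L and U, no pivoting.

    After the call, the strict lower triangle holds L (unit diagonal implied)
    and the upper triangle, including the diagonal, holds U.
    """
    n = len(a)
    for k in range(n):
        # Assumed standard step: scale column k below the diagonal by the pivot a[k][k].
        for i in range(k + 1, n):
            a[i][k] /= a[k][k]
        # Matrix update from the exam text: a_ij := a_ij - a_ik * a_kj for k < i, j < n.
        for i in range(k + 1, n):
            for j in range(k + 1, n):
                a[i][j] -= a[i][k] * a[k][j]
    return a
```

For example, `lu_in_place([[4.0, 3.0], [6.0, 3.0]])` stores the multiplier 1.5 below the diagonal and the updated element -1.5 in the lower right corner.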

4. Rook pivoting is a form of pivoting that is more stable than partial pivoting; it is almost as good as complete pivoting, but has a lower cost in finding the pivot. Rook pivoting works as follows: in stage $k$ of the LU decomposition, we search for an element $a_{rc}$ that has maximum absolute value in both its row and its column, among all elements $a_{ij}$ with $i, j \geq k$. This element becomes the pivot element. Then rows $r$ and $k$ are swapped, and columns $c$ and $k$. The final result is an LU decomposition $PAQ = LU$, where $P$ and $Q$ are permutation matrices.

Now assume that we are at the start of the LU decomposition (i.e., $k = 0$) and that all matrix elements are different, so we do not have to break ties. Assume we have $p = M^2$ processors and $n \bmod M = 0$.

Assume the matrix is distributed by the square cyclic distribution.

(a) [5 pt] Design a BSP algorithm that quickly finds a rook pivot.

Where possible and useful, alternate between rows and columns to give both equal treatment. It is sufficient to express the algorithm in words (so no detailed program text is needed here).

(b) [5 pt] Analyse its BSP cost.
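
To make the definition of a rook pivot in question 4 concrete, the following is a minimal sequential sketch of one possible search that alternates between a column and a row. It is not a BSP algorithm and not a model answer; the function name `rook_pivot`, the choice to start in column $k$, and the list-of-lists storage are assumptions for illustration.

```python
def rook_pivot(a, k=0):
    """Return (r, c) such that |a[r][c]| is maximal in row r and column c,
    restricted to the submatrix with row and column indices >= k.

    Assumes all matrix elements differ in absolute value, so there are no ties.
    """
    n = len(a)
    c = k
    # Start with the largest element in column k (assumed starting point).
    r = max(range(k, n), key=lambda i: abs(a[i][c]))
    while True:
        # Row step: largest element in the current row r.
        c_new = max(range(k, n), key=lambda j: abs(a[r][j]))
        if c_new == c:
            return r, c  # a[r][c] is maximal in its row and in its column
        c = c_new
        # Column step: largest element in the current column c.
        r_new = max(range(k, n), key=lambda i: abs(a[i][c]))
        if r_new == r:
            return r, c
        r = r_new
```

The absolute value of the candidate strictly increases at every step that does not return, so under the no-ties assumption the search terminates.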
