
Hardware synthesis with the aid of dynamic programming

Citation for published version (APA):

Woudenberg, van, H., & Born, van den, R. (1988). Hardware synthesis with the aid of dynamic programming. (EUT report. E, Fac. of Electrical Engineering; Vol. 88-E-201). Eindhoven University of Technology.

Document status and date: Published: 01/01/1988. Document version: Publisher's PDF, also known as Version of Record (includes final page, issue and volume numbers).



Hardware Synthesis with the Aid of Dynamic Programming

by H. van Woudenberg and R. van den Born

EUT Report 88-E-201
ISBN 90-6144-201-X

ISSN 0167-9708

Faculty of Electrical Engineering
Eindhoven, The Netherlands

Coden: TEUEDE

HARDWARE SYNTHESIS WITH THE AID OF DYNAMIC PROGRAMMING

by

H. van Woudenberg

and

R. van den Born

EUT Report 88-E-201

ISBN 90-6144-201-X

Eindhoven

June 1988


COOPERATIVE DEVELOPMENT OF AN INTEGRATED, HIERARCHICAL AND MULTIVIEW VLSI DESIGN SYSTEM WITH DISTRIBUTED MANAGEMENT ON WORKSTATIONS (Multiview VLSI-design System ICD)

code: 991, DELIVERABLE, report on activity: 5.1.0

Abstract: This report describes the automated synthesis of hardware structures from behavioural descriptions.

To this end the description is first translated to a demand graph. The nodes of this graph describe operations and algorithmic constructs, the edges describe data flow. Then a dynamic programming based method is used to generate the structure. Dynamic programming is used to restrict the large number of possible implementations by selecting at some intervals the best intermediate structural solutions.

The final hardware structure consists of a list of modules and a net-list with the interconnections between the modules. This hardware description will be completed with a state machine description.

The synthesis program is coded in LISP. Some improvements and completions are suggested. The results are encouraging for further research.

deliverable code: WP 5, task: 5.1, activity: 5.1.0

date: 13-06-1988

partner: Eindhoven University of Technology

authors: H. van Woudenberg, R. van den Born

This report was accepted as a M.Sc. Thesis of H. van Woudenberg by Prof.Dr.-Ing. J.A.G. Jess, Automatic System Design Group, Faculty of Electrical Engineering, Eindhoven University of Technology. The work was supervised by Drs. R. van den Born.

CIP-GEGEVENS KONINKLIJKE BIBLIOTHEEK, DEN HAAG

Woudenberg, H. van

Hardware synthesis with the aid of dynamic programming / by H. van Woudenberg and R. van den Born. - Eindhoven: University of Technology. - Fig. - (Eindhoven University of Technology research reports / Faculty of Electrical Engineering, ISSN 0167-9708; 88-E-201)

Met lit. opg., reg.
ISBN 90-6144-201-X

SISO 664.3 UDC 621.382:681.3.06 NUGI 832

CONTENTS

1. INTRODUCTION
   1.1 CONTENTS OF THIS REPORT
2. SYSTEM OVERVIEW
   2.1 DEMAND GRAPH
       2.1.1 IF STATEMENT
       2.1.2 WHILE LOOP
   2.2 MODULE LIBRARY
   2.3 COST ESTIMATOR
3. DYNAMIC PROGRAMMING
   3.1 SOLVING DECISION PROBLEMS
   3.2 DYNAMIC PROGRAMMING
   3.3 DYNAMIC HARDWARE GENERATION
4. HARDWARE GENERATING PROCESS
   4.1 IMPLEMENTATION
   4.2 PROCESSING A STATE
       4.2.1 IMPLEMENTING A SIMPLE NODE
       4.2.2 IMPLEMENTING A CASE STATEMENT
       4.2.3 IMPLEMENTING A WHILE LOOP
   4.3 COMPARABILITY AND COST FUNCTION
   4.4 POSTPROCESSOR
5. DATA STRUCTURES
   5.1 STATE IDENTIFICATION
   5.2 RELATIONS TO THE DEMAND GRAPH
   5.3 HARDWARE DESCRIPTION
   5.4 COST DATA
   5.5 INPUTS, OUTPUTS & CONSTANTS
   5.6 TRACING NODES AND MODULES
   5.7 CYCLE MECHANISM
       5.7.1 STARTING A NEW CYCLE
       5.7.2 NEW CYCLES FOR CASE STATEMENT
       5.7.3 NEW CYCLES FOR WHILE LOOP
       5.7.4 ENDING A CYCLE
   5.8 STACKS
6. CONCLUSIONS AND RECOMMENDATIONS
REFERENCES
APPENDIX 1: DEMAND GRAPH NODE- AND EDGE-TYPES
APPENDIX 2: STRUCTURE OF THE MODULE LIBRARY
APPENDIX 3: FORMATS FOR COST ESTIMATOR
APPENDIX 4: SYNTAX OF A STATE

LIST OF FIGURES

Figure 2.1. Hardware synthesis system
Figure 2.2. Demand graph for discriminant
Figure 2.3. Demand graph for min-max-sort
Figure 2.4. Demand graph for factorial
Figure 2.5. Standard cell bit-slice layout for factorial
Figure 3.1. Tree for all possible sequences
Figure 3.2. Lattice during dynamic programming
Figure 4.1. Flow chart for hardware-generation
Figure 4.2. Flow chart for process-state
Figure 4.3. Mapping an operator node on an existing module

1. INTRODUCTION

The synthesis of circuit structures or layouts from higher level descriptions is receiving more attention as the need for more powerful design aids increases. The rapid development of Very Large Scale Integrated Circuits (VLSI) creates these needs. This can well be illustrated by the following numbers (from [Latt79]): when the average productivity of a layout designer lies between 5 and 10 devices per day, a VLSI circuit containing 100,000 transistors would take about sixty man-years to lay out and another sixty man-years to debug the design.

The tremendous amount of detail and complexity associated with large systems has necessitated the development and use of design automation tools. First, design automation aids for simulation and verification were developed. Then, with the increasing complexity of the designs, the need for creative aids that synthesise a circuit design grew. The emphasis was on the placement and layout of regular logic structures in silicon. The design of circuits became an interactive process between man and computer, in which more and more work was done by the computer, but the human designer could still not be missed. The advantages of automatic design are obvious: a reduced design time, reduced design costs, no need for verification of the designed hardware, etc. Next to the advantages in speed and costs, design automation tools make the design of even more complex circuits possible and allow a designer who is not versed in the detailed electrical problems of IC design to design his IC chip.

The next step in this automation process is the reduction or even elimination of the human interactive influence on the design process, as will be done by a silicon compiler. Silicon compilation is ([Davi84]) the translation of a (Very Large Scale) Integrated Circuit described in a high level language into a target language describing the integrated circuit layout. The silicon compilation can be divided into several levels. This division starts with the hardware synthesis, also called the higher levels of the silicon compiler. Globally, hardware synthesis consists of three stages ([Thom81]).

First, the transformation of the algorithm, for example into a graph that serves as an equivalent intermediate description of the circuit.

Second, the generation of a net-list from this graph. A net-list represents all connections between the modules that perform the operations prescribed by the algorithm. From this net-list description a data path will be generated. This data path contains the logic components that store and process the data in the circuit.

And third, the design of the control part. The control part represents the sequential state machine that evokes the processing of the information stored in the data path.

After the hardware synthesis a logic optimiser changes the data path following some optimisation rules; then the modules are placed and the connecting wires are routed.

Constructing a silicon compiler is at least as complex as designing a circuit. Some obvious problems, arising from the lower levels of the compiler, are (see [Sahn80]): the partitioning of the design process into sub-implementation problems, keeping wire buildup in the routing channels within tolerable bounds, minimizing the total weighted wire length, routing the wires, minimizing the number of layers, etc. Most problems that arise in the area of design automation are shown to be NP-hard ([Coh083], [Sahn80]). But also at the higher levels NP-hard problems exist, e.g. the circuit realisation problem ([Sahn80]). This points out the importance of heuristics and other tools to obtain algorithms that perform well on the problem instances of interest and thus return solutions that are near the optimal solution.

In some literature (e.g. [Shiv83]), silicon compilation is described as an academic exercise. In this context, mainly its higher levels are meant. But apart from the universities, many industries pay attention to this field of research. As a result, a number of silicon compilers have been developed during the last few years. Mostly, these systems are not general compilers; they only work for a small subset of all electrical circuits, for circuits of some special type. After all, a silicon compiler can best be used when the highest valued criteria are not the speed or the area of the design, but the design costs and especially the design time. Reducing design time overrules the added cost due to less-than-perfect optimisation of design parameters.

The Esprit-991 project at the Automatic System Design Group of the department of Electrical Engineering at the Eindhoven University of Technology concerns a silicon compiler. I worked on this project for my master degree and this thesis reports on the work I have done: it covers the generation of the net-list description and some initialisations for the state machine of the compiler. The demand graph that is constructed is transformed into a hardware description that consists of two parts: a net-list that describes all connections in the hardware and a module-list that describes the cells that have been selected from a module library.

The dynamic programming approach is used to ensure that a near optimal hardware description can be generated within reasonable time.

1.1 CONTENTS OF THIS REPORT

The next chapter gives a survey of the hardware synthesis part of the silicon compiler system. The parts that are of main interest for the hardware generating subsystem are described in more detail.

Chapter 3 deals with dynamic programming. In short, it describes this methodology for solving multi-stage decision problems.

Then chapter 4 comes to the heart of this report: the hardware generating system. This chapter describes the way the demand graph is transformed into hardware and how the dynamic programming approach works for the hardware generation.

Chapter 5 describes the state, the main data structure that holds the status of the system; it also describes the way some parts of the transformation are implemented in a LISP program.

Chapter 6 terminates this report; it gives some conclusions about the system and some recommendations for further research.


2. SYSTEM OVERVIEW

This chapter describes the hardware synthesis part of the Esprit-991 silicon compiler, as presented in figure 2.1. The main subject of this graduation work, the hardware generator, is in the middle of this figure. As shown by the arcs, it uses a demand graph as input data and it communicates with the module library and the cost estimator. Three subsections of this chapter are dedicated to these parts of the system.

The output of the hardware generator, the data path description and the state machine, is described in chapter 5.

Figure 2.1. Hardware synthesis system

The input to the system is a behavioural description of an algorithm in a high level language (LISP, C, Pascal). This algorithm describes the functions that must be fulfilled. The parser analyses the algorithm syntactically with conventional compiler techniques ([Aho86]) and converts this algorithm to an abstract syntax tree. This tree is converted into a demand graph by the demand graph constructor, which has been described in [Stok86]. An optimiser deletes some inefficiencies of the demand graph, converting it into a functionally equivalent demand graph. Such an optimiser performs, for instance, constant folding, dead code elimination, code motion, elimination of redundancies, and strength reduction, i.e. optimisations similar to those used in optimising compilers ([Aho86]). The hardware generator will produce the data path description and the state machine, with the help of the cost estimator and choosing modules from the module library. Then the layout generator should convert the more symbolical data path description and state machine into a detailed layout description. However, this part of the system is beyond the scope of this report.

Comparing the behavioural algorithm at the input of the system with the produced data path and state machine at the (intermediate) output, one can globally say that the variables will be mapped to nets or registers and operators to logic circuits; assignments can be seen as data flow through the logic circuits; special constructs in the algorithm (if statement, while loop) find their representation in the control by the state machine.

2.1 DEMAND GRAPH

A demand graph is a directed graph that represents the dataflow through operators. Figure 2.2 shows a demand graph for the algorithm that computes the discriminant used to solve the quadratic equation, as described by the next Pascal program:

program discriminant (a, b, c; var D);
begin
  read(a); read(b); read(c);
  D := (b * b) - (4 * a * c);
  writeln(D)
end.

The nodes of a demand graph (shown as circles) represent the operations that are performed on the data. The edges (shown as arcs) represent the dataflow from node to node. Each edge is directed from the node that uses the data to the node that produces the data (demand!).

Figure 2.2. Demand graph for discriminant

Appendix 1 summarises the properties that describe the nodes and the edges. A more detailed specification of the node and edge characteristics can be found in [Stok86] and [Veen85]. In the next two sections two special demand graph constructs will be discussed: one for an if statement and one for a while loop.


2.1.1 IF STATEMENT

The algorithm min-max-sort serves as an example for an if statement. This algorithm exchanges the values of min and max when min is larger than max:

program min-max-sort (min, max);
begin
  read(min); read(max);
  if (min > max) then
  begin
    min := min + max;
    max := min - max;
    min := min - max
  end;
  writeln(min); writeln(max)
end.

The corresponding demand graph is shown in figure 2.3. Two new node types ask for attention: the merge node and the branch node, which are drawn separately to the right of the demand graph. The names of the connecting edges for these nodes are written at those edges. For each of the variables involved in the if statement a merge and branch node pair has been created. Depending on the value at the control edge the data flows from the merge nodes to the link-in-1 or to the link-in-2 nodes, where the paths for, respectively, the then and the else clause start. In the branch nodes these paths come together and one of them will be selected depending on the value at the control edge.

Figure 2.3. Demand graph for min-max-sort

The if statement is in fact a special case of the case statement: where the case statement can select between n possible clauses, the if statement can be seen as a case statement for n = 2. For a case statement, each merge node has n incoming edges named inlink-1 .. inlink-n and each branch node consequently has n outgoing edges named outlink-1 .. outlink-n. The control structure will be somewhat more complex, but this does not change the idea behind the graph structure: at the merge nodes n paths start, one path for each clause, and they come together in the branch nodes. Depending on the value of the control signal, one of the paths will be chosen.

2.1.2 WHILE LOOP

The demand graph for a while loop is illustrated with the algorithm factorial that computes the factorial of an integer variable n:

program factorial (n; var f);
begin
  read(n); f := 1;
  while (n > 1) do
  begin
    f := f * n;
    n := n - 1
  end;
  writeln(f)
end.

Figure 2.4. Demand graph for factorial

The demand graph corresponding to this algorithm is shown in figure 2.4. Again two new node types are introduced: the entry node and the exit node. For each variable that is affected in the while loop, an entry- and exit-node pair is introduced. For both of these node types the node with its edges and the names of these edges are listed next to the demand graph for the factorial algorithm.

Initially, the data enters the while loop through the entry edges of the entry nodes. Then the control signal(s) for the entry and exit nodes can be calculated and they are connected to the control edges of both node types. While the control signal is 'true', the data circles around through the while loop: it leaves the exit nodes through the last edges, after the link-in-1 nodes some operations are performed on the data and then the data enters the entry nodes again through the last edges. At this moment the control signal(s) are computed again. The while loop ends when the control becomes 'false'; then the data leaves the exit nodes through the entry edges and becomes available at the connected link-in-2 nodes.

2.2 MODULE LIBRARY

The module library contains a set of predefined cells. Each cell is described by a set of characteristics. These characteristics can be accessed by a set of library access functions. Appendix 2 gives a summary of the syntax of the cell characteristics that are of interest, such as the width and height of the layout, the power dissipation, input-, output- and control-connections, delay time through the cell circuit, the operations that can be performed, etc. A complete description of the module library can be found in [Kais87].

The library cells perform specified functions. Simple cells only perform one function (for instance: add, nand, or greater-than), but more complex cells are also provided (an alu that is able to add, subtract, divide and multiply, for instance). Next to the cells performing logic and arithmetic functions, cells for (de)multiplexers, registers and some other 'functions' are also present.

The hardware generator chooses a cell taking into account the functions it can perform; it also takes into account the area the cell occupies and the delay of the cell.

At the moment all cells in the module library are of a special kind: standard cells for a bit-slice layout methodology. (This is not a limitation of the system, but a methodology that has been chosen.)

Standard cell means that each cell has the same height (the width depends on the complexity of the performed function(s)) and that connections for power supply, ground and (sometimes) clock signals are placed at fixed distances in the height of the cell. These standard cells can economically be placed in rows: the power, ground and clock will then be connected automatically; between the rows channels are created that serve for routing the wires that connect the cells to each other. In a bit-slice layout methodology, each cell only describes one bit of a function. The functional elements will be composed of those bit-slices by taking a number of slices equal to the word length of the data processed. Looking at the layout, the different cells are placed horizontally next to each other, while the composing slices are placed vertically next to each other, separated by wiring channels. This bit-slice approach saves the storage of a lot of network data that is similar for each slice.

In figure 2.5 the symbolic layout of two bit-slices of standard cells has been drawn. This placement has been generated with the hardware generator.

(Notes: 1. the interconnections between the slices (for carry etc.) have been left out; 2. all cells have equal width in this figure, but in reality they are different.)

Figure 2.5. Standard cell bit-slice layout for factorial

2.3 COST ESTIMATOR

The implementation of a node can often be done in several ways. To make a well-considered choice between these possibilities, some different partial implementations, performing the same functions, are generated. Because it is unnecessary and even undesirable to evaluate all possible implementations to the far end, the process must continuously select the best partial implementation(s). Therefore these partial implementations must be compared somehow.

The cost estimator, described completely in [Enge88], produces an estimation of the module and net areas of a partial implementation; these values are used (among others) when implementations are compared.

This estimator uses a model according to the linear placement heuristic of Kang. Linear placement used for modelling hardware is a very time efficient method and hence the estimator can work quickly. This is important, because the estimator will be called frequently.

The linear placement also fits in the standard cell bit-slice methodology. The estimator places the slices in one row, trying to find the most economic sequence, i.e. the sequence causing the lowest wiring density in the wiring channel. The placement algorithm works incrementally: when a new cell must be placed in the row, the old placement row is taken, a number of cells is deleted from the end until the new cell can best be placed, and then the deleted cells are placed again. Although the cost estimator uses a model instead of the real system and therefore is not 100% accurate, this does not matter that much. Because the cost estimator is used for comparing, it is important that it has a good relative accuracy. Furthermore, the model is independent of the actually used layout methodology, so it does not restrict the hardware generating system; a large variety of circuits can be represented. The model leaves out a number of details that are part of the original configuration, thus allowing a considerably smaller complexity in representing the system.

The way the hardware generator communicates with the cost estimator will be described in section 4.3. The syntax diagrams of appendix 3 give a summary of the formats used for the data that is exchanged.
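As a rough illustration of why a linear placement model is cheap to evaluate, the sketch below scores a row of cells by summing the spans of the nets over that row; the function names and the span metric are assumptions made for this sketch only, not the actual wiring density measure of the estimator in [Enge88].

;; Sketch of a linear placement score: each net is given as the list of
;; cells it connects, and its cost is taken as its span in the row.
(defun net-span (net row)
  "Span of NET (a list of cells) in ROW (an ordered list of cells)."
  (let ((positions (mapcar (lambda (cell) (position cell row)) net)))
    (- (reduce #'max positions) (reduce #'min positions))))

(defun row-wiring-cost (row nets)
  "Total wiring cost of the linear placement ROW over all NETS."
  (reduce #'+ nets :key (lambda (net) (net-span net row)) :initial-value 0))

Because such a score only depends on cell positions in one row, trying a new position for a cell is a cheap operation, which is what makes the incremental placement of the estimator fast.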

3. DYNAMIC PROGRAMMING

Dynamic programming is a method used to get an optimal solution to multi-stage decision problems. This chapter first introduces why the dynamic programming strategy is applied to the hardware generating process. Then the method is explained. At the end the special conditions are discussed under which dynamic programming is used in our system.

3.1 SOLVING DECISION PROBLEMS

The class of problems this section deals with is illustrated with the next abstract example.

Consider a process of four jobs. These jobs are numbered 1 through 4. The process chooses one of these jobs as the first one to be executed. When this job has been completed, a next job is chosen and so on, until all jobs have been completed.

An essential characteristic of this process is the fact that costs can be calculated for each subsequence of already completed jobs. The cost of a sequence depends not only on the jobs but also on the order in which they have been executed. Therefore the costs of all sequences will probably differ.

At this moment the problem of this process can be stated: which sequence is the cheapest and, especially, how can you efficiently find out which one is the optimal solution?

One can easily see that there are 4! = 24 different sequences for scheduling these jobs. Figure 3.1 shows all of them in a tree. The numbers in the ellipses indicate the jobs that still have to be done.

Figure 3.1. Tree for all possible sequences

Of course it is possible to compute the costs for all possible sequences (depth first search or breadth first search) and then select the cheapest one just by comparing all costs. But realising that the number of sequences grows proportionally to the factorial of the number of jobs (and that is much worse than exponential!), this way will be quite time consuming even for small sets of jobs (as in the hardware synthesis process). Therefore it is necessary that the tree is restricted; dynamic programming is a method to achieve this.

3.2 DYNAMIC PROGRAMMING

What dynamic programming exactly is has been well described in the literature ([Bell62] and [Leve75] a.o.). This section presents a very short reproduction of these works as far as they concern N-step deterministic decision problems. Then the next section discusses how dynamic programming can be applied to the hardware generating process.

In a deterministic decision problem one has to make a finite number of decisions, each dependent on the status of the process at that moment. Each decision consists of allocating some amount of a resource to an activity. Each activity returns some yield, depending on the amount of invested resource. The objective is to find the optimal policy, i.e. the sequence of decisions that maximizes the total yield of all activities. Therefore it is necessary that the yields of all activities can be measured in some common unit and that the total yield can be obtained as the sum of the individual yields. Validity of the principle of optimality is a third condition that must be satisfied before dynamic programming can be applied.

The principle of optimality:

An optimal policy ensures, independent of the initial state or the first decision, that the next decisions form an optimal policy for the state resulting from the first decision.

As a result of the dynamic programming approach, you get a sequence of decisions that represents an optimal policy.

Suppose N decisions have to be taken, each decision being an investment R_i in activity i (i = 1, 2, ..., N). Then the principle of optimality tells that, having chosen some initial R_N, you do not then examine all policies involving that particular choice of R_N, but rather only those policies which are optimal for an (N-1)-stage process. The same holds for a choice R_N, R_(N-1) and an (N-2)-stage process, and so on. In this 'magical' way, operations will be kept essentially additive rather than multiplicative.

Instead of maximizing the yield when investing in activities, dynamic programming can also be applied when costs have to be minimized.

Figure 3.2. Lattice during dynamic programming

During the dynamic programming process, sequences containing the same jobs (or activities) in just different orders are compared and only the cheapest is maintained. This results in the lattice of figure 3.2, which is much smaller than the tree of figure 3.1. At the bottom of the lattice only one empty ellipse is present: this automatically represents the cheapest sequence and in this way the optimal solution has been found.
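To make the pruning concrete, the following toy Common Lisp sketch (not the synthesis program itself) schedules a small set of jobs: of all partial sequences that contain the same set of jobs, only the cheapest one is kept, as in the lattice of figure 3.2. The cost function seq-cost is a placeholder supplied by the caller, and the result is only guaranteed optimal when the principle of optimality stated above holds for that cost function.

;; Toy dynamic programming over job sequences: sequences containing the
;; same jobs in different orders are compared and only the cheapest kept.
(defun cheapest-sequence (jobs seq-cost)
  "JOBS is a list of jobs; SEQ-COST maps a sequence (a list of jobs) to a cost."
  (let ((current (list '())))               ; start with the empty sequence
    (dotimes (i (length jobs) (first current))
      (let ((next '()))
        (dolist (seq current)
          (dolist (job jobs)
            (unless (member job seq)
              (let* ((new-seq (append seq (list job)))
                     (rival (find-if (lambda (s)
                                       (null (set-exclusive-or s new-seq)))
                                     next)))
                (cond ((null rival)
                       (push new-seq next))
                      ((< (funcall seq-cost new-seq) (funcall seq-cost rival))
                       (setf next (cons new-seq (remove rival next)))))))))
        (setf current next)))))

With four jobs, a call such as (cheapest-sequence '(1 2 3 4) #'my-cost), where my-cost is any cost function of the kind described above, visits the lattice of figure 3.2 instead of the 24-leaf tree of figure 3.1.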

3.3 DYNAMIC HARDWARE GENERATION

The hardware synthesis implements the nodes of the demand graph one by one. Each implementation concerns the mapping of a node on a hardware module. A decision in our hardware generating process is the way a node from the demand graph is implemented into hardware. The resources that are used when a node is implemented consist of area on the chip and time in a machine cycle.

A complete description of a state in the lattice is presented in chapter 5. For the dynamic process, a state represents a hardware description. This hardware has been formed during the implementation of a part of the demand graph. The hardware can perform a certain set of operations; this takes some machine cycle time and some area is needed for the hardware. Each state has a set of demand graph nodes that can be implemented next; implementing one of these nodes delivers a successive state in the state lattice.

The employment of dynamic programming in the hardware generating process rests on two mechanisms: a cost function and comparability.

The cost function is needed to measure in some way what a certain partial circuit costs. This measurement is based on the total area of the cells that are selected and the wires that connect them, and on the maximal delay time through the circuit.

The second mechanism, comparability, is provided to determine which states are equivalent. In section 4.3 the cost function and the comparability are specified in detail.

The strategies that are used for implementing some algorithmic constructs (e.g. if statement, while loop: see subsections 4.2.2 and 4.2.3) cannot assure that the optimal policy will be found; on the contrary, it is more likely that it will not be found. So the principle of optimality is not valid and, strictly speaking, the hardware generating process is not suitable for the dynamic programming approach.

Nevertheless, dynamic programming is used: it can be considered as a breadth first search in which the total number of possibilities is restricted considerably.

4. HARDWARE GENERATING PROCESS

In this chapter the synthesis process will be described. The LISP-coded program is explained top-down by discussing several of its algorithms, going into details when necessary. This chapter often refers to the properties of a state. These properties (always printed in italics) and some of the routines that use them are discussed in detail in the next chapter.

Figure 4.1. Flow chart for hardware-generation


4.1 IMPLEMENTATION

The main part of the hardware generating process is presented in figure 4.1. The algorithm that must be compiled enters the system and is transformed into a demand graph; then it is passed to the hardware generator.

In block 1 of the hardware generator some initialisations are performed: the properties of State-0 and the global variables get the proper initial values.

The dynamic programming cycle is represented by block 2 up to test 2. In block 2 the list current-states keeps the states that represent the partial implementations that have been computed until now. Block 3 and test 1 take care that each of these states is passed to the routine process-state (block 4). This routine will be discussed in section 4.2. The output of process-state is a list of possible new states, each representing an extended partial implementation with regard to the processed state. Each of these possible new states is added to the list next-states only if no cheaper comparable state is present in it. When a new state is added, the comparable but more expensive states that are already present in next-states are deleted. The calculation of the costs and the comparability mechanism are discussed in detail in section 4.3.

When all states in current-states are processed, a set of new states is present in next-states. Then, in block 2, these states move from next-states to current-states, next-states becomes empty and a new cycle of the dynamic process will be passed through. This goes on until all nodes of the demand graph have been implemented; then no new state will be generated and consequently next-states stays empty. Test 2 fails and the states in current-states are passed to block 6: the post-processor selects the cheapest state that has been generated for the full circuit and performs some optimizations on the hardware. See section 4.4 for more details.

Finally, the generated hardware is presented at the output of the hardware generator.
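A minimal Common Lisp sketch of this control loop, assuming hypothetical helpers process-state, comparable-p, state-cost, cheapest-state and postprocess that stand for the routines of sections 4.2 to 4.4, could look as follows.

;; Sketch of the dynamic programming loop of figure 4.1.
(defun hardware-generation (state-0)
  (let ((current-states (list state-0)))
    (loop
      (let ((next-states '()))
        ;; blocks 3 and 4: process every state of the current cycle
        (dolist (state current-states)
          (dolist (new-state (process-state state))
            (setf next-states (insert-if-cheapest new-state next-states))))
        ;; test 2: an empty next-states means all nodes have been implemented
        (when (null next-states)
          (return (postprocess (cheapest-state current-states))))
        ;; block 2: start the next cycle of the dynamic process
        (setf current-states next-states)))))

(defun insert-if-cheapest (new-state states)
  "Add NEW-STATE unless a cheaper (or equally cheap) comparable state is
already present; comparable states that are more expensive are deleted."
  (if (find-if (lambda (s) (and (comparable-p new-state s)
                                (<= (state-cost s) (state-cost new-state))))
               states)
      states
      (cons new-state
            (remove-if (lambda (s) (comparable-p new-state s)) states))))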

4.2 PROCESSING A STATE

Every state has a property bucket that holds the nodes from the demand graph that are free. A node is called free if all nodes that are connected to the outgoing edges of the node have already been implemented. A free node can be implemented only if it is implementable. A free node is called implementable if all related nodes, i.e. those nodes that are controlled by the same control node, are free too. This only applies to special constructs in the algorithm: see the next subsections about the while loop and the case statement. Most node types, however, are independent: they are implementable as soon as they are free. The two tests are sketched below.
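Assuming hypothetical graph accessors node-successors (the nodes connected to the outgoing edges of a node) and related-nodes (the nodes controlled by the same control node), and the state properties of chapter 5, the two tests could read:

(defun node-free-p (node state)
  "A node is free when all nodes on its outgoing edges are implemented."
  (every (lambda (succ) (member succ (state-realised-nodes state)))
         (node-successors node)))

(defun node-implementable-p (node state)
  "A free node is implementable when all its related nodes are free too;
independent node types have no related nodes and are implementable as soon
as they are free."
  (and (member node (state-bucket state))
       (every (lambda (rel) (member rel (state-bucket state)))
              (related-nodes node))))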

Figure 4.2 shows the flow diagram of the state processing routine. This routine handles one state at a time. For every node or set of related nodes in the bucket of this state that can be implemented, a new state is created. Then, depending on the type of the node (set), an implementing routine is called. The different types of implementing routines are the subject of the next subsections. These routines return a list containing one or more new states. All of them are put together in the output list that is passed to the hardware generating routine that called this process.

4.2.1 IMPLEMENTING A SIMPLE NODE

Simple nodes are nodes of type get, put, constant or nodes that represent an operator. These nodes are called simple because they stand alone: they are not related to other nodes. As stated before, a simple node that is present in the bucket is always implementable.

Figure 4.2. Flow chart for process-state

When a simple node is implemented, this node is removed from the bucket and placed in the set realised-nodes. Nodes that become free by the implementation of this node are added to the bucket.

The implementation of an operator node is the most difficult because of the many possibilities: re-using a free module, making an existing module free in a new machine cycle, or selecting a new module from the module library.

Figure 4.3 shows the two possibilities concerning the re-use of an existing hardware module. The implementing routine first searches the property free-modules for modules that perform the right function and that are not used yet in the current machine cycle. When such modules are present, one of them is selected and the node is mapped on this free module (see below). For the selection criterion, see recommendation 3 in chapter 6.

As figure 4.3 shows, if no free module is present to implement the node, the hardware is searched for a module of the proper type that is already used in this machine cycle. If one is found, a new machine cycle will be started (see section 5.7). Then the node can be mapped on a free module as described below.

Besides trying to re-use an existing module, another possible implementation is to extend the hardware with an extra module. To this end a cell must be selected from the module library. The selection criterion implemented selects three cells that all perform the operation requested by the node: the smallest cell, the cell that has the shortest delay time, and the cell that embodies most operations covering the operations that still must be performed by the nodes of the demand graph that have not been implemented. The first one is selected for reasons of area economy, the second one is useful when short delay time is an important objective and the last cell is selected with a view to future re-use and is thus area economical too.

Figure 4.3. Mapping an operator node on an existing module

Each cell that has a delay time exceeding the time left in the current cycle must be removed from this set (see recommendation 5, chapter 6). If no cells are left in the set afterwards, first a new machine cycle will be started and the set is retrieved again.

Possible doubles are removed from this cell selection. For each cell that remains a new state is generated, a module is created for the cell and added to the hardware; then the node is mapped on this (free) module.

All generated states (both for re-used modules and for new modules) are put in the output list and this list is returned to the routine process-state that activated this implementation.

Mapping a node on a free module implies a sequence of actions:

1. The selected module is deleted from the list free-modules.

2. To connect the proper signals to the module, the nodes of the demand graph that produce these signals are determined with the aid of the graph structure routines. Property trace-nodes is used to find out on which modules of the hardware these nodes are mapped. The output nets of these modules are selected; if no net is connected to the output of such a module, a new net is created and then connected to the output of that module. The order in which all data is passed to and fro is always maintained, because this can be important when operators are non-commutative (see chapter 6, recommendation 4).

3. If the module has never been used before, no nets are connected to the inputs and the output nets can directly be connected to the inputs.

But if the module has been used before, the output nets cannot be connected directly. First, multiplexers must be inserted for each input signal. The nets that are already connected to the inputs of the module are then moved to the first input of the multiplexers; new nets are created to connect the outputs of the multiplexers to the inputs of the module. Then the selected output nets can be connected to the second input of the multiplexers.

If these multiplexers have no free inputs left, first a larger multiplexer is selected from the module library to substitute the old multiplexer. The existing connections need not be changed; the selected output nets can now be connected to the first free input of each multiplexer.

4. Property trace-nodes is updated for the node that has been mapped on the module.

Implementing a node of type get takes the following steps.

A get node represents the taking in of a variable from outside. With each get node a port is associated that represents the input pin. Each port will be mapped on a module of type terminal.

The name of the port for the get node is defined in the algorithm that is implemented. In the property inputs can be seen whether the port has been implemented or not; if so, it gives the module on which the port has been mapped.

If no terminal is implemented for the get node, a new terminal module is created. At the input side of this module a dummy module of type dum-term is connected, because (at this moment) the cost estimator cannot handle a module that is not connected to any other module. At the end of the hardware generating process this dummy module will be removed by the postprocessor. The name of the port together with the name of the module is added to the property inputs.

If the terminal is not already used in the current machine cycle, the node is mapped on the free terminal; otherwise a new machine cycle is started first.

Property trace-nodes is updated and the new state is returned to the routine process-state.

Implementing a node of type put starts with looking in the property outputs for the output port where the computed signal must be presented to the outside world. If this port is not implemented, this is done first: a new module of type terminal is created and the name of the port together with the name of the module is added to the property outputs. Then the net that represents the signal of the put node is determined with the aid of trace-nodes. If the terminal is not already used in the current machine cycle, the net is connected to this terminal; otherwise a new cycle is started first.

Implementing a constant node differs a little from the other simple nodes. In fact only some connections to ground or supply voltage must be made. But because of the bit-slice structure this cannot be described in that way. Therefore a module for a cell of type constant is created. The property constants keeps track of which value must be realised by this module. Later, in the layout realisation part of the silicon compiler, the correct connections must be made.

Because (at this moment) the cost estimator cannot handle an unconnected module, a dummy module of type dum-cons is created and connected to the input of the constant module. At the end of the hardware generating process the postprocessor will remove this dummy module.

4.2.2 IMPLEMENTING A CASE STATEMENT

In section 2.1.1 the demand graph for a case statement (or if statement) has been described; figure 2.3 was used to demonstrate the connections between the edges and nodes used.

A case statement consists of several paths in which operations are performed. Depending on a control signal one of these paths is selected and the values that are computed in the selected path will be present at the end of the case statement.

Every path is implemented in a separate state of the state machine to make it simple to select a path. Therefore each path is implemented in a new machine cycle.

When a merge node is passed to the process-state routine, it is tested whether it is implementable or not. This is done by determining all merge nodes that are controlled by the same node that is connected at the control edge of the merge node. If they are all present in the bucket, the entire set of merge nodes is implementable. This set is then passed to the implement-merge-set routine, which performs several actions:

1. The control signal from the control node is connected to a module of type control, representing the state machine as long as the state machine is not implemented in the program (see chapter 6, recommendation 6).

2. The merge nodes that are implemented are deleted from the bucket. The current properties bucket, cycle-nr, cycle-type and free-modules are pushed on stack (see section 5.7) because each path will be implemented in a separate machine cycle.

3. The different machine cycles will be initialised: all link-in-<i> nodes are sorted by the clause they represent and pushed on the bucket-stack. The <i>-value determines the increasing order in which they will be popped. At the same time the right values are pushed on the other stacks.

4. Property trace-nodes is updated for the merge nodes and all link-in-<i> nodes that are connected; the merge nodes are added to realised-nodes.

5. All paths of the case statement come together in branch nodes. For each branch node a module for a multiplexer is created and an association pair (branch-node module-name) is placed in property cycle-save. (A multiplexer is not always needed; see section 4.4 about the postprocessor and the note at the end of this subsection.)

6. For economical reasons (savings in the width and length of the state lattice), every time a state of the state machine for a path of a case statement is started, all link-in-<i> nodes are implemented at once. These nodes do not perform any operation, so implementation is done by only updating the realised-nodes and bucket properties. The trace-nodes property was already updated in step 4.

This ends the implementation of a set of merge nodes. In a number of machine cycles the nodes of each path will be implemented in the normal way. When the implementation of a clause of the case statement is finished, the state of the state machine and therefore the current machine cycle will be ended. The computed signals are connected to the proper multiplexers with the aid of property cycle-save.

A branch node becomes free and is placed in the bucket when all paths going to that branch node have been implemented. A branch node is implementable if the entire set of related branch nodes is present in the bucket.

Implementing the set of branch nodes involves the following actions:

1. The signals of the multiplexers are stored in property trace-nodes.

2. The control signal (see point 1 of implementing a merge set) is connected to the multiplexers.

3. The association pair (branch-node multiplexer-name) is deleted from the property cycle-save.

4. The branch nodes are added to the realised-nodes property and the bucket is updated.

This completes the implementation of a case statement.

If a variable that is computed in a case statement always gets its value from the same operator, all inputs of the multiplexer for this variable will be connected to the same output net of that operator. This multiplexer is then superfluous. The postprocessor detects such a situation and will remove the multiplexer.

4.2.3 IMPLEMENTING A WHILE LOOP

The demand graph for a while loop has been described in section 2.1.2; figure 2.4 shows an example.

A while loop can be described in two parts: a path in which some computations are performed and the computation of a signal that controls this path. While the control signal delivers the value 'true', data will circle around through the path: when it comes to the end of the path it will be fed back to the beginning. Two states of the state machine will be used: one for the computation of the control signal and one for the implementation of the nodes in the path.

A while loop starts with entry nodes. An entry node becomes free when the node at its entry edge has been implemented. An entry node is implementable if all related entry nodes are free. This means that all entry nodes that have their control edges connected to the same node that delivers the control signal must be present in the bucket. In that case the entire set of related entry nodes is implemented, which results in:

1. Move the entry nodes from bucket to realised-nodes. Then push the current bucket, cycle-nr, cycle-type and free-modules properties on their stacks and start a new machine cycle for a new state of the state machine (see section 5.7).

2. Initialise the new cycle: property cycle-type gets its new value and the free nodes after the entry nodes are put in the bucket. These nodes will compute the control signal for the while loop.

3. For each entry node that is used for a variable a module for a two-input multiplexer is created. (So not for an entry node that realises a path between the sink node and nodes of type constant, see section 5.6.)

The initial input signal for an entry node comes from the node that is connected to the edge of type entry. The output nets of the modules on which these nodes are mapped are connected to the first inputs of the created multiplexers. Then properties trace-nodes and trace-invalid are updated.

After the entry nodes have been implemented, normal implementation goes on for the nodes that produce the control signal. When this is ready, the bucket only contains the exit nodes for this while loop. The exit nodes are also implemented as an entire set:

1. Now the control signal for the while loop has been computed. The control edge of all related entry and exit nodes is connected to one node that delivers the control signal. A module of type control, that represents the state machine, is created and the output net of the module on which the controlling node has been mapped is connected to this control module. This module is then connected to the control inputs of the multiplexer modules on which the entry nodes are mapped. It must again be remarked that this control module is only used because the state machine is still not implemented in the program.

2. When the while loop finishes, the computed signals of the while loop will be present in the link-in-2 nodes at the entry edges of the exit nodes. These link-in-2 nodes are added to the bucket-list at the top of the bucket-stack, which contains the nodes that were pushed in step 1 of implementing entry nodes: these nodes will be popped from stack at the end of the while loop.

3. For each exit node that is used for a variable, a module for a two-output demultiplexer is created. (So not for an exit node that realises a path between the sink node and nodes of type constant, see section 5.6.) The nets that carry these variables are connected to these demultiplexers. The control module is connected to the control input of the demultiplexers. The exit nodes are moved from the bucket to realised-nodes.

4. A new state of the state machine is started for the implementation of the loop path. The free nodes of the loop path are put in the bucket.

5. Property trace-nodes is updated for the control node and all link-in-1, link-in-2, and exit nodes.

At this moment all exit nodes are implemented and the first nodes that realise the computations of the while loop are present in the bucket. They are implemented in the normal way. If a variable of an exit node (mapped on a demultiplexer) is used, the net that is connected to the first output will be taken. When all nodes in the while loop are implemented, the bucket again contains the set of entry nodes that have been implemented before. This time the following actions will take place:

1. The computed signals are connected to the second input of the multiplexers for the entry nodes.

2. The cycle of type '(while) is ended; this means that for each node of type link-in-2 on the top of the bucket-stack a connection net is created at the second output of the demultiplexer for its exit node. To the output side of this net a dummy module (of type dum-conn) is connected, because the cost estimator cannot handle a net that is not connected at both sides. This dummy module will be deleted by the post-processor. Also trace-nodes is updated and the top of each stack is popped.

3. The link-in-2 nodes are implemented (i.e. moved from bucket to realised-nodes and the free nodes after the link-in-2 nodes are put in the bucket).

This ends the implementation of the entry and exit nodes and thus completes the description of the implementation of a while loop.

4.3 COMPARABILITY AND COST FUNCTION

For every state in the set current-states a set of new states is computed by the process-state routine. These sets of new states are passed back to the hardware-generation routine (see figure 4.1: block 4, and figure 4.2). Then the mechanism that delimits the number of states is activated: only the cheapest comparable states are put in the set next-states; these states will be processed in the next cycle of the dynamic hardware generation process.

When is State-A comparable to State-B, which is already present in next-states? This is based on two rules:

1. The property realised-nodes of both states must contain exactly the same nodes.

2. The nodes-cells-cover-set of State-B must be smaller than or equal to the nodes-cells-cover-set of State-A.

The nodes-cells-cover-set of a state is the intersection (without removing doubles) of all operations that are represented by nodes in the demand graph and all operations that can be performed by the cells that have already been implemented in hardware.

The first rule assures that the hardware has been generated for the same part of the demand graph; the second rule measures in some way how much the hardware can be used.

All states in next-states that satisfy these conditions are put in a list equal-states. Then the cheapest state out of these equal-states and State-A is determined. This is done with the aid of a cost function. The cost of a state depends on the area A the implemented hardware occupies and the number of machine cycles M that is needed. A scale factor C is needed to be able to add area and time. The cost function is then defined as:

cost = A + C * M

The value of C can be used to emphasise the importance of a small circuit (small C) or the importance of a fast circuit (large C).

This cost function is linear in time and area, but it is also possible to implement another cost function. For example, when the area that can be used is bound to a maximum, a progressive cost function is suggested to restrict hardware extensions when the total available area is almost consumed. Another possibility is to multiply the used area by the number of machine cycles.
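Under the same assumptions as the earlier sketches (hypothetical accessors on a state and on its set of covered operations), the selection mechanism of this section can be summarised as follows; the multiset nature of the cover set is ignored in this sketch.

(defparameter *c* 100
  "Scale factor C, trading chip area against machine cycles.")

(defun state-cost (state)
  "cost = A + C * M (section 4.3)."
  (+ (state-area state) (* *c* (state-machine-cycles state))))

(defun comparable-p (state-a state-b)
  "STATE-A is comparable to STATE-B when both realise exactly the same demand
graph nodes and the nodes-cells-cover-set of STATE-B is contained in that of
STATE-A (doubles are ignored in this sketch)."
  (and (null (set-exclusive-or (state-realised-nodes state-a)
                               (state-realised-nodes state-b)))
       (subsetp (nodes-cells-cover-set state-b)
                (nodes-cells-cover-set state-a))))

A large value of *c* makes the loop of section 4.1 prefer fast circuits, a small value makes it prefer small circuits, as noted above.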

4.4 POSTPROCESSOR

When one final state has been selected, a postprocessor will perform some optimizations on the generated hardware:

1. All dummy modules are removed (see section 4.2.1: implementing a get or constant node, and sections 4.2.3 and 5.7: second connection to the demultiplexer for an exit node); nets that were connected to them are removed or reconnected.

2. If a net is connected to the same multiplexer more than once, only one of the inputs of the multiplexer will stay connected to this net. The state machine will be updated and it is checked whether a smaller multiplexer from the module library can be sufficient.

3. When the output of a multiplexer is connected to the input of another multiplexer, this combination can be substituted by only one multiplexer. A new multiplexer will be selected from the module library and the hardware connections and the state machine will be updated. (This situation occurs when a nested case statement has been implemented.)

4. Remove a demultiplexer with only one output connected. Delete one affected net and reconnect the other one. (This situation can occur when a while loop has been implemented and one of the signals in the loop was only local.)
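As an illustration of the superfluous-multiplexer rule mentioned at the end of section 4.2.2, the sketch below only detects and filters such multiplexers; the reconnection of the surrounding nets and the update of the state machine are left out, and multiplexer-p is an assumed predicate on the hardware description of chapter 5.

(defun superfluous-multiplexer-p (module)
  "True when every data input of MODULE (a multiplexer) is fed by the same net."
  (and (multiplexer-p module)
       (let ((in-nets (module-in-nets module)))
         (and in-nets
              (every (lambda (net) (equal net (first in-nets)))
                     (rest in-nets))))))

(defun remove-superfluous-multiplexers (modules)
  "Filter superfluous multiplexers out of MODULES; reconnecting the
surrounding nets and updating the state machine is not shown."
  (remove-if #'superfluous-multiplexer-p modules))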


5. DATA STRUCTURES

The hardware synthesis process has been implemented in a CommonLISP [Wins84] program. The basic data structure around which the process has been built up is the state. This chapter describes what data exactly is held in a state. Also the program implementations for some of the routines that affect properties are presented.

A state represents a hardware description that has been generated as a result of the implementation of a part of a demand graph. The hardware description and the status of the demand graph implementation are described in the properties of the state.

The complete demand graph is implemented in a dynamic programming process (see chapter 3). If a new state is generated, first all data in the properties of the predecessor state are copied (except for the cost data, see section 4.4); then a next demand graph node or a set of related nodes is implemented and the hardware is extended according to the implementation rules for the node(s). The dynamic process compares equivalent states (hardware implementations) and deletes those states that probably will not develop into an optimal hardware implementation. This way a state lattice is formed, starting with an initial State-0 and resulting in a set of final states from which the best will be selected.

5.1 STATE IDENTIFICATION

A state is identified by its property state-id, an integer number. This number, converted to its character string representation, forms the suffix part of the name of the state; the prefix is "State-". To determine the place of a state in the state lattice, the property prev-state is provided; this is an integer number equal to the state-id of the parent state, i.e. the previous state from which this state is descended.
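A minimal sketch of how such a state could be created is given below, assuming that a state is a symbol whose property list holds the data described in this chapter; the function name make-next-state and the global counter are illustrative and not taken from the original program.

(defvar *state-counter* 0)

(defun make-next-state (parent-state)
  "Create a new state symbol, record its identification and copy the
properties of the parent state (except the cost data, see section 5.4)."
  (let* ((id    (incf *state-counter*))
         (state (intern (format nil "State-~D" id))))
    (setf (get state 'state-id)   id
          (get state 'prev-state) (get parent-state 'state-id))
    (dolist (prop '(realised-nodes bucket mdl-vecs net-vecs mdl-nr net-nr
                    input-conn output-conn constants trace-nodes))
      (setf (get state prop) (get parent-state prop)))
    state))

In a real program the list-valued properties would have to be copied rather than shared, so that one state cannot modify the data of another.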

5.2 RELATIONS TO THE DEMAND GRAPH

The property realised-nodes keeps a list of demand graph nodes that represents the part of the graph that has already been implemented. The property bucket, another list of demand graph nodes, contains the nodes that are free (see section 2.1). Initially, the set realised-nodes is empty and the set bucket contains DmgNode-0 and DmgNode-1, i.e. the sink node and the IO-sink node, together being the starting points of the demand graph.

When a next state is generated, the node (or nodes) that will be implemented is deleted from the bucket and added to the set of realised-nodes. The bucket is extended with the nodes that become free by the implementation of this node.

The computing of new states stops when all nodes have been implemented and the hardware generated covers a complete circuit description that realises the function(s) prescribed by the algorithm.
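This bookkeeping can be sketched as follows; freed-nodes, which should return the demand graph nodes that become free once NODE has been implemented, is a hypothetical helper introduced only for the illustration.

(defun implement-node (state node)
  "Move NODE from the bucket to the realised nodes and add the nodes that
become free to the bucket."
  (setf (get state 'bucket) (remove node (get state 'bucket)))
  (push node (get state 'realised-nodes))
  (setf (get state 'bucket) (union (get state 'bucket) (freed-nodes node)))  ; freed-nodes is assumed
  state)

(defun all-nodes-implemented-p (state)
  "True when no free nodes are left, i.e. the whole demand graph is realised."
  (null (get state 'bucket)))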

5.3 HARDWARE DESCRIPTION

The hardware is described by two sorts of elements. Modules constitute the building blocks for the hardware; a module represents a certain cell from the module library and performs some operation(s) on data. Nets represent the wires that connect the modules, hence they transport data from one module to another.

While the hardware generating process proceeds, the circuit description is constructed. This causes many new modules and new nets to be generated, many searches for, references to and updates of old nets and modules, deletions of nets or modules, insertions of modules in between old connected modules, etc. In other words, the hardware data will be referred to and changed many times. Therefore the access to the nets and modules has to be very fast. This is done using vectors, because in a vector each data field can be accessed directly. Two sorts of vectors are used and will be introduced below: the net-vector and the module-vector.

The net-vector consists of the following three data fields:

net-name: the name of a net; this is a string with prefix "net-" and as suffix a unique integer number (converted into a string).
net-in-modules: a list that contains the module-names of the modules that have an output connected at the input of the net.
net-out-modules: a list that contains the module-names of the modules that have an input connected at the output side of the net.

The module-vector consists of five data fields:

module-name: a string with prefix "mdl-" and as suffix a unique integer number converted into its string representation.
module-in-nets: a list of net-names; each net is connected at a data input of the module.
module-out-nets: a list of net-names; each net is connected at an output of the module.
module-control-nets: a list of net-names; each net is connected at a control input of the module.
library-module: the name of a cell in the module library that has been selected for implementation.

The order in which the net-names occur in the lists module-in-nets, module-out-nets and module-control-nets is important, because these orders determine at which pin of the cell these nets must be connected.
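A minimal sketch of the two vector types, assuming plain Common Lisp vectors with direct indexed access; the constructor names are illustrative only.

(defun make-net-vector (net-name)
  "Fields: 0 net-name, 1 net-in-modules, 2 net-out-modules."
  (vector net-name '() '()))

(defun make-module-vector (module-name library-module)
  "Fields: 0 module-name, 1 module-in-nets, 2 module-out-nets,
3 module-control-nets, 4 library-module."
  (vector module-name '() '() '() library-module))

;; direct access to a field, e.g. the data-input nets of a module:
;; (aref some-module-vector 1)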

The module-vectors and net-vectors are stored in two properties of a state, the lists mdl-vecs and net-vecs, respectively. Further, the property mdl-nr keeps the integer number for the suffix part of the name of the module that will be generated next. The same holds for net-nr and the next net-name.

Some special actions are taken when something must be changed in the hardware data. To be able to access the hardware data, the hardware must be loaded: all nets and modules are interned. Interning is a LISP function; it means that a symbol is created for each net and module. A net-name or a module-name becomes the name of a symbol and the corresponding net-vector or module-vector becomes its value. Then, all hardware data can be accessed immediately and the changes are carried out.

Afterwards, the hardware must be saved. The hardware-save routine first saves the newly created vectors in the properties net-vecs and mdl-vecs; then it uninterns all symbols, i.e. the links between each vector as a value and the name of its symbol are set free. This prevents a state from being confused with the hardware data of another state. When the hardware is saved (this occurs exactly one time for each new state), the changes in hardware are computed and the hardware-save routine activates the cost estimator.
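The load and save actions could look roughly as sketched below. The use of a separate package for the hardware symbols and the helper names are assumptions for the sketch; only the intern/unintern mechanism itself is taken from the text.

(defvar *hardware-package*
  (or (find-package "HARDWARE") (make-package "HARDWARE")))

(defun load-hardware (state)
  "Intern a symbol for every net and module; its value is the vector."
  (dolist (vec (append (get state 'net-vecs) (get state 'mdl-vecs)))
    (setf (symbol-value (intern (aref vec 0) *hardware-package*)) vec)))

(defun save-hardware (state net-vecs mdl-vecs)
  "Store the vectors in the state and unintern all hardware symbols, so that
the hardware data of this state cannot be confused with that of another."
  (setf (get state 'net-vecs) net-vecs
        (get state 'mdl-vecs) mdl-vecs)
  (dolist (vec (append net-vecs mdl-vecs))
    (unintern (find-symbol (aref vec 0) *hardware-package*) *hardware-package*))
  ;; at this point the changes in hardware are computed and the cost
  ;; estimator is activated (see section 5.4)
  state)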


5.4 COST DATA

As stated before, the cost estimator calculates some parameters that define the area of the hardware that has been implemented in a state. Because the cost estimator works incrementally (see section 3.3), some data must be stored in a state for the benefit of the cost estimator. Three properties are used to store all cost data.

The property costs stores the cost vector. This cost vector holds five fields. The first one should give the total delay time of the critical path through the circuit, but this calculation is not performed by the cost estimator. The second field gives the total area occupied by all library modules. The third field gives the total wiring length, the fourth field gives the length of the sequence of standard cells, and the last field gives the maximum number of nets above each other in the wiring channel.

The property placement contains a list of module-names. The order of the module-names determines the order in which the cost estimator has placed the library modules next to each other.

The property new-cost-vecs is not of interest for the hardware generating process. The cost estimator uses this data for its next incremental cost calculation.

Each time a new state is generated, the hardware has been changed. Then the cost estimator is activated: the three cost data types from the old state are passed to the cost estimator and the new values that are returned are saved in the properties of the new state. Therefore it is unnecessary to copy these properties when a new state is initialised.
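The cost vector layout and the way the three cost properties travel from the old to the new state could be sketched as below; update-cost-estimate stands for the entry point of the incremental cost estimator and is a hypothetical name introduced for the sketch.

(defun make-cost-vector ()
  (vector nil   ; 0: delay of the critical path (not computed)
          0     ; 1: total area of all library modules
          0     ; 2: total wiring length
          0     ; 3: length of the standard-cell row
          0))   ; 4: maximum number of nets above each other in the channel

(defun update-cost-data (old-state new-state hardware-changes)
  "Pass the three cost properties of the old state to the cost estimator and
store the returned values in the new state."
  (multiple-value-bind (costs placement new-cost-vecs)
      (update-cost-estimate (get old-state 'costs)        ; update-cost-estimate is assumed
                            (get old-state 'placement)
                            (get old-state 'new-cost-vecs)
                            hardware-changes)
    (setf (get new-state 'costs)         costs
          (get new-state 'placement)     placement
          (get new-state 'new-cost-vecs) new-cost-vecs)))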

5.5 INPUTS, OUTPUTS & CONSTANTS

The data for the circuit that is needed from outside the circuit itself enters the circuit via terminals. Also the output signals generated will be present at terminals. Terminals are cells from the module library. They correspond with ports that have been defined in the algorithm. Which port corresponds to which terminal is stored in two properties: input-conn and output-conn. These properties are filled in during the hardware generation; both can be used as an association list in the program. Such an association list consists of sublists; each sublist has two elements: first the name of the port in the algorithm, then the name of the corresponding module in the hardware description.

Constants will be implemented with the aid of library modules of the constant type. In fact, this is only symbolic: in the integrated circuit a constant will be obtained by connecting the right net-wires for the data bits to ground or supply voltage. The property constants keeps the constant values that these constant modules must realise. It is an association list; for each constant module a sublist is present that contains the name of the module and the value of the constant.
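The shape of these association lists and a lookup could be sketched as follows; the port and module names in the example are made up for illustration.

;; input-conn / output-conn: (port-name module-name) pairs
(defparameter *example-input-conn*
  '(("a"     "mdl-1")
    ("reset" "mdl-2")))

;; constants: (module-name constant-value) pairs
(defparameter *example-constants*
  '(("mdl-7" 4)
    ("mdl-9" 0)))

(defun port-terminal (port input-conn)
  "Return the terminal module that corresponds to an algorithm port."
  (second (assoc port input-conn :test #'string=)))

;; (port-terminal "reset" *example-input-conn*) => "mdl-2"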

5.6 TRACING NODES AND MODULES

During the hardware synthesis, nodes from the demand graph are implemented by modules. When the next node is to be implemented, the module must be connected to those modules that produce the data for its inputs. These modules correspond to the nodes in the demand graph that are connected at the out-edges of the node to be implemented. When the nodes that produce the data are known, the associated modules can be found with the aid of the property trace-nodes.
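The exact layout of trace-nodes is not described in this excerpt; a plausible minimal sketch, under the assumption that it is an association list from demand graph nodes to the modules that deliver their data, is:

(defun node-module (node trace-nodes)
  "Return the module that produces the data for NODE, or NIL if unknown."
  (second (assoc node trace-nodes :test #'equal)))

;; (node-module "DmgNode-5" '(("DmgNode-5" "mdl-3"))) => "mdl-3"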
