chipcflow

download chipcflow

of 4

Transcript of chipcflow

  • 8/4/2019 chipcflow

    1/4

    CHIPCFLOW - A DYNAMIC DATAFLOW MACHINE USING DYNAMIC

    RECONFIGURABLE HARDWARE

    Jorge L. Silva, Joelmir J. Lopes

    Department of Computer Systems

    University of Sao Paulo

    Av. Tabalhador Saocarlense, 400,

    Sao Carlos, SP, Brazil

    emails: [email protected],

    [email protected]

    Valentin O. Roda, Kelton P. Costa

    Department of Electrical Engeneering

    University of Sao Paulo

    Av. Trabalhador Saocarlense, 400,

    Sao Carlos, SP, Brazil

    emails: [email protected],

    [email protected]

    ABSTRACTIn order to convert High Level Language (HLL) into hard-

    ware, a Control Dataflow Graph (CDFG) is a fundamen-

    tal element to be used. Otherwise, Dataflow Architecture,

    can be obtained directly from the CDFG. In the 1970s and

    late 1980s, the Dataflow Model was the focus of attention

    that provided parallelism in a natural form. In particular,

    dynamic dataflow architecture can be generated to produce

    a high level of parallelism. In this paper, the ChipCflow

    project is described as a system to convert HLL into a dy-

    namic dataflow graph to be executed in a dynamic recon-

    figurable hardware, exploring the dynamic reconfiguration.

    The ChipCflow consists of various parts: the compiler to

    convert the C program into a dataflow graph; the operators

    and its instances; the tagged-token; and the matching data.

    Some results are presented in order to show a proof of con-

    cept for the project.

    1. INTRODUCTION

    A Dataflow Architecture is an architecture where a natural

    parallelism is present. This kind of architecture was first

    researched in the 1970s and was discontinued in the 1990s

    [1, 4, 7]. With the advance of the technology of microelec-

    tronics, the Field Programable Gate Array (FPGA) is been

    used, mainly because of its flexibility, the facilities to imple-ment complex systems and the intrinsic parallelism. Thus,

    the dataflow architecture is a topic once more [2, 5], spe-

    cially because of the reconfigurable architecture, witch is

    totally based on FPGAs. On the other hand, a great effort is

    been realized to covert high level language as a C language

    into a hardware, in order to help engineering to project his

    systems using a high level of abstraction as well as a digital

    logic level. In particular, the ChipCflow project is a system

    where a C programs is initially converted into a Dynamic

    Sponsor acknowledgments for CNPq

    Dataflow graph, followed by its execution in a Reconfig-urable Hardware. Its flow diagram is shown in Figure 1.

    As can be clearly seen in Figure 1, the ChipCflow system

    begins in a host machine where a C program is edited, to

    be converted into a control dataflow graph (CDFG) gener-

    ating a CDFG object program. The CDFG object program

    is converted into a VHDL where modules of CDFG are ac-

    cessed from a data base of VHDL modules. After gener-

    ating the complete VHDL program, an EDA tool to con-

    vert the VHDL program into a bitstream and to download

    it to a FPGA is used. A dynamic reconfiguration, present

    in some FPGAs that provide dynamic dataflow execution is

    used, witch is the purpose of this project.

    Fig. 1. The Flow Diagram for ChipCflow tool.

    A dataflow graph is composed by a set of operators in-

    terconnected by arcs. Various items of data can be in an arc

    coming to an operator. When all arcs of an operator are with

    data partners, it fires and send a result to another operator.

    There are two models of dataflow architecture: a static and a

    dynamic dataflow model. In the static model, just one item

    of data can be in an arc waiting for its partner. A protocol

    for a static dataflow model should be used to maintain just

    one item of data in an arc. Consequentially the parallelism

    is limited to one item of data per arc. In a dynamic model,

    978-1-4244-3846-4/09/$25.00 2009 IEEE 213

  • 8/4/2019 chipcflow

    2/4

    more than one item of data can be in an arc. The protocol

    for dynamic dataflow model also should control each item

    of data in the arc and their partners, however a tagged-token

    is used to control the data partners. In this case, the paral-

    lelism is present when all items of data can be in the arcs,

    rightly limited for the size of the hardware to receive thesedata.

    In the section 2 the basic structure for ChipCflow: the

    pre-compiler, its operators, and some examples of graphs is

    presented. In the section 3 the instances model; the tagged-

    token format and the iterative constructors are described witch

    allow several instance of an operator to be executed in the

    dynamic model of dataflow using iterative constructor re-

    spectively. In the section 4 the Matching data that identify

    items of data partners is described. In the section 5 the im-

    plementation of the operator and its instances are described.

    Finally in the section 6 the conclusion and future works are

    described.

    2. THE C PRE-COMPILER FOR DATAFLOW

    GRAPHS

    After lexical analyzing, semantic analyzing and code opti-

    mization, the code generation, in the C pre-compiler, pro-

    vide a file with various packets of bits that represent the

    dataflow graph. The format of the packet is described in

    the Figure 2. As can be clearly seen in the Figure 2, the first

    4-bits of the packet is been used to identify the operator; the

    second, the thirty and the fourth 5-bits is been used to iden-

    tify the three inputs (a,b and c) of the operator; finally the

    sixth and the seventh 5-bits to identify the outputs (s and z)of the operator. This is a generic template for the operator

    with three inputs and two outputs signal, however there are

    operators which less than three inputs signals and just one

    output signal. After compiling a C program, its correspon-

    dent dataflow graph and the packets of bits for this dataflow

    graph are generated. In the Figure 3, one single example of a

    While C statement is described. After compiling the While

    statement, a dataflow graph are generated. It is shown in the

    Fig. 2. The bit stream for dataflow graph.

    Figure 3 that each operator has a set of bits to identify its

    function, as well as each arc has a set of bits to identify its

    interconnections. In particular, in the left top of the figure

    has an operator with the code 0001 and its arcs 00000

    (value 0), 00001 (value i), 00010 (a control sig-

    nal) and 00011 (the output signal), corresponding to three

    input signals and one output signal respectively. The packet

    of bits for this particular operator can be clearly seen in the

    first packet of bits in the Figure 4. The xxxxx in the packet

    of bits represent an arc with no connection signal. Thus, a

    file with these packets of bits, is a binary representation for

    a dataflow graph extracted from a while C statement in the

    C pre-compiler.

    Fig. 3. A C program converted into a bit stream.

    Fig. 4. The bit stream generated from a C program.

    After generating a file of binary representation for a dy-

    namic dataflow graph, the C pre-compiler converts the file

    into a VHDL code. A library with all the operators imple-

    mented in VHDL is used for this conversion. The informa-

    tion in the packet of bits is used to construct a VHDL code

    component for the operator. The set of components are then

    used to generate the final VHDL code to be executed in the

    hardware.

    214

  • 8/4/2019 chipcflow

    3/4

    3. THE INSTANCES OF THE OPERATORS

    As the ChipCflow is based on a Dynamic Dataflow Archi-

    tecture, an arc can be viewed as a buffer of data and various

    items of data can be in the buffer waiting for the data part-

    ner in an operator. However, a new instance of the operatorwill be generated for each item of data coming through the

    arc. It is for this reason that there is no buffer of data in

    the arcs. Thus, various instances of an operator can be gen-

    erated, waiting for the items from their data partners. To

    implement the model of instances, a process to insert and

    remove the sub-graph in an original graph was proposed.

    3.1. The tagged-token

    Tagged-tokens are used in the item of data to instantiate sev-

    eral execution of the same operator. An example of the ap-

    plication of a tagged-token is the execution of a simple loop

    in C described in the following algorithm:

    z=0;

    for (i=0; i

  • 8/4/2019 chipcflow

    4/4

    Fig. 6. The Instances with its Matching Circuits and the

    common variable.

    system to generate the instances; and the control to execute

    these instances. Initially, an ADD operator only for proof-of-concept was implemented. A Statechart diagram of the

    operator is described in Figure 7. The process begins by re-

    ceiving the astr signal, informing the current operator that

    it has an item of data to receive from the previous operator

    and also to test the bitz signal that defines if there is no data

    waiting to be sent to the next operator. Thus, if the condi-

    tions are true, the next step is to observe the incoming data,

    with its partner bufa[47-32]=bufb[47-32], according to the

    tag specified in Figure 5. If there is a partner, the operator

    executes the operation, sends the result zstrb, and acknowl-

    edges all the items of data aack,back. When there are no

    partners, the data is buffered, and no acknowledgements are

    sent back. The implementation was carried out in an EDA

    tool from Xilinx ISE 9.2i for Virtex 5 (360 MHz). The oper-

    ator spent 6ns to deal with all the conditions for the protocol,

    matching data, and the execution. For this implementation,

    static dataflow architecture was tested, without dynamic re-

    configuration, which will be the focus in the future.

    6. CONCLUSION

    Research to convert High Level Language (HLL) into hard-

    ware has put forward various possibilities mainly with the

    flexibility and capacity of the reconfigurable architectures.A Control Dataflow Graph (CDFG) is a fundamental ele-

    ment in this process. Otherwise, a Dataflow Architecture,

    which was the focus in the 1980s, can be obtained directly

    from the CDFG. In particular, dynamic dataflow architec-

    ture can be generated in order to produce a high level of

    parallelism. In this paper, the ChipCflow project was de-

    scribed as a system to convert HLL into a dynamic dataflow

    graph to be executed in dynamic reconfigurable hardware,

    exploring the dynamic reconfiguration. The operator, which

    is the main element in the dataflow graph, was implemented,

    and spent 6ns to execute all the process. The simulation re-

    Fig. 7. The Statechart of an Operator.

    sults demonstrate the proof of concept for the operator. The

    next steps of the ChipCflow project are to implement the

    complete model of instances and generate an analysis with

    benchmarks to verify the impact of this approach.

    7. REFERENCES

    [1] Arvind. Dataflow: Passing the token. ISCA Keynote, 2005.[2] A. Capelli. A dataflow control unit for c-to-configurable

    pipelines compilation flow. IEEE Sumposium on Field-

    Programmable Custom Computing Machines FCCM04,

    2004.[3] A. DEHON. Reconfigurable architecture for general-purpose

    computing. Ph.D. thesis, Massachusetts Institute of Technol-

    ogy, 1996.[4] J. B. Dennis. A preliminary architecture for a basic dataflow

    processor. Proceedings of the 2nd Annual Symposium on Com-

    puter Architecture, 1975.[5] S. Swanson. Wavescalar. 36th Annual International Sympo-

    sium on Microarchitecture, 2003.[6] S. e. a. Swanson. Wavescalar. 36th Annual International Sym-

    posium on Microarchitecture, 2003.[7] A. H. Veen. Dataflow machine architecture. ACM Computing

    Surveys, 18(4):365396, 1986.

    216