Lecture1 Organization[1]

ECE 656M Embedded Systems Design And Prototyping Term 3, 2011-2012

description

Embedded systems design and prototyping

Transcript of Lecture1 Organization[1]

  • ECE 656M

    Embedded Systems Design and Prototyping

    Term 3, 2011-2012

  • Cesar A. Llorente

    Research and teaching interests: reconfigurable computing, machine vision, energy systems

    Contact: Electronics and Communications Engineering, College of Engineering

    Contact: [email protected]

  • ECE 545

    Lecture

    Projects: Project 1 30 %, Project 2 20 %

    Homework 10 %

    Exams: Quiz 20 % (in class), Final 20 % (take home)

  • Lecture (1)

    Lecture 1 - Introduction to Embedded Systems

    Lecture 2 - Introduction to VHDL Combinational Logic. Packages and Components.

    Hands-on Session 1: XST Synthesis and Simulation

    Lecture 3 - Behavioral Modeling of Sequential Logic. Registers, Counters, Shift Registers. Simple Testbenches.

    Lecture 4 - Introduction to FPGA Devices & Tools

    Hands-on Session 2: Tools for FPGA Synthesis and Implementation

    Lecture 5 - Finite State Machines

    Lecture 6 - Algorithmic State Machines. Memories: RAM, ROM.

    Lecture 7 - Advanced Testbenches. File I/O.

    Lecture 8 - Mixed Style RTL Modeling

    Quiz 1

  • Lecture (2)

  • Textbooks

    Required Textbooks:

    Volnei A. Pedroni, Circuit Design with VHDL, The MIT Press, 2004

    Sundar Rajan, Essential VHDL: RTL Synthesis Done Right, S & G Publishing, 1998

    Supplementary Textbooks:

    Stephen Brown and Zvonko Vranesic, Fundamentals of Digital Logic with VHDL Design, 2nd Edition, McGraw-Hill, 2005

    Peter J. Ashenden, The Designer's Guide to VHDL, 2nd Edition, San Francisco: Morgan Kaufmann, 1996, 2002

  • Quiz 2 hours 30 minutes

    in class

    design-oriented

    open-books, open-notes

    Tentative date:

  • Final Examination

    take-home

    full design, including logic synthesis and timing analysis for FPGAs

    Tentative date:

  • Project technologies

    FPGA: Field Programmable Gate Arrays

  • World of Integrated Circuits

    Integrated Circuits: Full-Custom ASICs, Semi-Custom ASICs, User Programmable

    User Programmable: PLD, FPGA

    PLD: PAL, PLA, PML

    FPGA: LUT (Look-Up Table), MUX, Gates

  • Two competing implementation approaches

    ASIC (Application Specific Integrated Circuit)

    - designed all the way from behavioral description to physical layout
    - designs must be sent for expensive and time consuming fabrication in semiconductor foundry

    FPGA (Field Programmable Gate Array)

    - no physical layout design; design ends with a bitstream used to configure a device
    - bought off the shelf and reconfigured by designers themselves

  • Which Way to Go?

    ASICs: High performance, Low power, Low cost in high volumes

    FPGAs: Off-the-shelf, Low development cost, Short time to market, Reconfigurability

  • What is an FPGA Chip?

    Field Programmable Gate Array

    A chip that can be configured by the user to implement different digital hardware

    Configurable Logic Blocks and Programmable Switch Matrices

    Bitstream to configure: function of each block & the interconnection between logic blocks

    Source: [Brown99]

    ****

    FPGAs are reconfigurable off-the-shelf chips. The reconfiguration technique is very similar to the SRAM approach. FPGA architectures consist of CLBs and Programmable Switch Matrices (PSMs). The gates are placed inside the CLBs. The next slide supports this as well. Bullet 3 is not entirely correct, because an FPGA is not a large array of gates with programmable interconnections. As mentioned above, the main power of FPGAs comes from the CLB and its internal components, such as LUTs (look-up tables), carry logic, gates, FFs (flip-flops) and MUXes (multiplexers).

  • CLB Structure

    ****The configurable logic block (CLB) contains two slices. Each slice contains two 4-input look-up tables (LUT), carry & control logic and two registers. There are two 3-state buffers associated with each CLB, that can be accessed by all the outputs of a CLB.

    Xilinx is the only major FPGA vendor that provides dedicated resources for on-chip 3-state bussing. This feature can increase the performance and lower the CLB utilization for wide multiplex functions. The Xilinx internal bus can also be extended off chip.
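The notes above mention carry & control logic sitting next to each look-up table. A minimal Python sketch of one adder bit built that way is given here, assuming the common arrangement in which the LUT produces a propagate signal, a dedicated multiplexer forwards the carry, and a dedicated XOR forms the sum. The function names are hypothetical, and the model ignores the registers, the 3-state buffers and all timing.

```python
def adder_bit(a, b, cin):
    """One adder bit: LUT output plus dedicated carry logic (behavioral model)."""
    propagate = a ^ b                  # value the LUT would compute
    cout = cin if propagate else a     # carry multiplexer selects cin or a
    total = propagate ^ cin            # dedicated XOR forms the sum bit
    return total, cout


def ripple_add(x, y, width=8):
    """Chain adder_bit over `width` bits, as a carry chain through slices would."""
    carry = 0
    result = 0
    for i in range(width):
        bit, carry = adder_bit((x >> i) & 1, (y >> i) & 1, carry)
        result |= bit << i
    return result, carry


result, carry_out = ripple_add(200, 100)   # 300 does not fit in 8 bits
```

Because the carry travels through dedicated logic rather than through another LUT, each additional bit costs one LUT, which is the reason such chains are attractive for arithmetic.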

  • CLB Slice (figure: two look-up tables with inputs G1-G4 and F1-F4, each followed by carry & control logic and a flip-flop; carry chain from CIN to COUT; control inputs CLK, CE, SR, BY, F5IN)


  • LUT (Look-Up Table) Functionality

    Look-Up tables are primary elements for logic implementation

    Each LUT can implement any function of 4 inputs
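The statement that a LUT can implement any function of 4 inputs follows from the LUT being a 16-entry truth table addressed by its inputs. A small Python sketch of that idea is shown here; the helper names and the example function are hypothetical and chosen only for illustration.

```python
def make_lut4(func):
    """Return the 16 configuration bits that make a 4-input LUT compute func."""
    return [
        func((i >> 3) & 1, (i >> 2) & 1, (i >> 1) & 1, i & 1) & 1
        for i in range(16)
    ]


def lut4_read(table, a, b, c, d):
    """Evaluate the LUT: the four inputs simply form an address into the table."""
    return table[(a << 3) | (b << 2) | (c << 1) | d]


def example(a, b, c, d):
    """Arbitrary example function of four inputs."""
    return (a & b) | (c ^ d)


table = make_lut4(example)
number_of_possible_functions = 2 ** len(table)   # one function per bit pattern
```

Changing the function only changes the 16 stored bits, not the hardware, which is what the configuration bitstream does for every LUT in the device.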

  • Major FPGA Vendors

    SRAM-based FPGAs: Xilinx, Inc.; Altera Corp.; Atmel; Lattice Semiconductor

    Flash & antifuse FPGAs: Actel Corp.; QuickLogic Corp.

    Share over 60% of the market

  • Xilinx FPGA Families

    Old families: XC3000, XC4000, XC5200

    old 0.5 µm, 0.35 µm and 0.25 µm technology. Not recommended for modern designs.

    Low-cost families: Spartan/XL (derived from XC4000), Spartan-II (derived from Virtex), Spartan-IIE (derived from Virtex-E), Spartan-3

    High-performance families: Virtex (0.22 µm); Virtex-E, Virtex-EM (0.18 µm); Virtex-II, Virtex-II PRO (0.13 µm); Virtex-4 (0.09 µm)

  • Design process (1)

    Specification: Design and implement a simple unit that speeds up encryption with an RC5-like cipher with a fixed key set on the 8031 microcontroller. Unlike in experiment 5, this time your unit has to be able to perform the encryption algorithm by itself, executing 32 rounds.

    VHDL description (Your VHDL Source Files):

        library IEEE;
        use ieee.std_logic_1164.all;
        use ieee.std_logic_unsigned.all;

        entity RC5_core is
            port(
                clock, reset, encr_decr: in std_logic;
                data_input: in std_logic_vector(31 downto 0);
                data_output: out std_logic_vector(31 downto 0);
                out_full: in std_logic;
                key_input: in std_logic_vector(31 downto 0);
                key_read: out std_logic  -- last port: no trailing semicolon
            );
        end RC5_core;

    Functional simulation

    Synthesis

    Post-synthesis simulation

  • Design process (2)

    Implementation (Mapping, Placing & Routing)

    Configuration

    Timing simulation

    On chip testing

  • Design Process control from Active-HDL

  • Simulation Tools

    Many others

  • architecture MLU_DATAFLOW of MLU is

        signal A1: STD_LOGIC;
        signal B1: STD_LOGIC;
        signal Y1: STD_LOGIC;
        signal MUX_0, MUX_1, MUX_2, MUX_3: STD_LOGIC;

    begin

        A1

  • Synthesis Tools and others

  • Features of synthesis tools

    Interpret RTL code

    Produce synthesized circuit netlist in a standard EDIF format

    Give preliminary performance estimates

    Some can display circuit schematics corresponding to EDIF netlist

  • Implementation

    After synthesis the entire implementation process is performed by FPGA vendor tools

  • Mapping (figure: logic mapped onto look-up tables LUT0-LUT5 and flip-flops FF1, FF2)

  • Placing (figure: CLB slices placed on the FPGA)

  • Routing (figure: programmable connections on the FPGA)

  • Design Process control from Active-HDL

  • Top Level ASIC Digital Design Flow

    Design Inception -> RTL Design -> Synthesis and Macro Development -> Place + Route -> Physical Verification -> Design Complete

  • RTL Design

    Input: Design Inception. Output: Synthesis + Macro Development.

    Design functions: RTL Design, Testbench Development, Mixed Mode Simulation, System Interface Simulation, FPGA Verification (user's discretion), Lint Checking (user's discretion), Code Coverage (user's discretion), Formal Verification

    Digital tools: Cadence NC Verilog, Mentor Graphics ModelSim, Cadence AMS Designer, Xilinx ISE, Cadence Hal, Cadence ICT, Agilent ADS, Matlab, Cadence Conformal

  • Synthesis + Macro Development

    Input: RTL. Outputs: Place + Route, Verification.

    Design functions: Synthesis, Static Timing Analysis, Logical Equivalency, DFT, Gate-Level Simulation, Macro Generation, Macro Verification, Macro Rules Generation / Library Generation

    Digital tools: Synopsys DC, Cadence RC, Synopsys PrimeTime, Cadence Conformal, Synopsys DFT Compiler, Cadence NC Verilog, Mentor Graphics ModelSim, Mentor Graphics Calibre, Artisan/Cadence DFII, Artisan

  • Place + Route

    Input: Synthesis. Output: Verification.

    Design functions: Floorplan, Macro Placement / Std Cell Placement, Placement-Based Optimization, Clock Tree Synthesis, Route, RC Extraction, Signal Integrity, Static Timing Analysis, ATPG, Metal Fill, Spare Cells / Decoupling Cap Filler Cells

    Digital tools: Cadence Encounter, Cadence NanoRoute, Cadence Fire&Ice QX, Cadence CeltIC / Voltage Storm, Synopsys PrimeTime, Mentor Graphics FastScan

  • Physical Verification

    Input: Placed + Routed Design. Output: Design Complete.

    Design functions: GDSII Preparation / Schematic Preparation, DRC, LVS, ERC, Simulation Preparation, Back Annotated Simulation, Layout Chip Finishing, Top-Level Simulation

    Digital tools: Cadence DFII, Cadence NC Verilog, Cadence Virtuoso, Mentor Graphics Calibre, Synopsys Nanosim, Cadence AMS Designer

  • CAD software available at DLSU (1) Xilinx ISE 12.3 (under Windows)

    VCS (under Linux)

    available in the STRC111 Intel Microprocessors Lab

    VHDL simulators Free Student Edition: ISE WebPack available in the STRC111 Intel Microprocessors Lab

  • CAD software available at DLSU (2)

    Tools used for logic synthesis

    Xilinx XST / EDK / SDK (under Windows)

    FPGA synthesis available in the STRC111 Intel Microprocessors Lab

  • CAD software available at DLSU (3)

    Tools used for implementation (mapping, placing & routing) in the FPGA technology

    Xilinx XST (under Windows)

    FPGA synthesis available in the STRC111 Intel Microprocessors Lab

  • Projects Overview

    Project 1 (35 points): January-February (~6 weeks)

    Project 2 (35 points): March (~4 weeks)

    Application: Game Application using Microblaze Processor

    Technology: FPGA

    Target: synthesizable code, downloadable code

    Application: Game Software using state machines

    Technology: FPGA

    Target: synthesizable code, downloadable code

  • Projects 1, 2

    choice between two project topics

    - cryptography (e.g., encryption, authentication, hash)
    - digital signal processing (e.g., digital filter, FFT, image processing, etc.)

    both topics specified by the instructor

    initial specification in the form of

    - pseudocode and/or flowchart
    - detailed interface

    design and source code is required to be scalable, i.e., work for different parameters and operand sizes, specified at the time of synthesis

  • Encryption

    Input: (A, B, C, D), Table S[0..2r+3]

    B = B + S[0]
    D = D + S[1]
    for i = 1 to r do {
        t = (B*(2B+1))
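The pseudocode on this slide is cut off in the transcript. Its visible part (four words A, B, C, D, a table S[0..2r+3], pre-whitening of B and D, and the term B*(2B+1)) matches the published RC6 block cipher, so the Python sketch below assumes RC6 for the missing steps; that is an assumption, not something the slide confirms. The round-key table used in the usage lines is a placeholder rather than a real key schedule, and the word size w is a parameter to illustrate the scalability requirement.

```python
def rotl(x, n, w):
    """Rotate the w-bit word x left by n positions (n taken modulo w)."""
    n %= w
    return ((x << n) | (x >> (w - n))) & ((1 << w) - 1)


def rotr(x, n, w):
    """Rotate the w-bit word x right by n positions (n taken modulo w)."""
    return rotl(x, w - (n % w), w)


def rc6_encrypt(A, B, C, D, S, r, w=32):
    """Encrypt one block, assuming the slide's truncated pseudocode is RC6."""
    mask, lgw = (1 << w) - 1, w.bit_length() - 1   # w assumed a power of two
    B, D = (B + S[0]) & mask, (D + S[1]) & mask
    for i in range(1, r + 1):
        t = rotl((B * (2 * B + 1)) & mask, lgw, w)
        u = rotl((D * (2 * D + 1)) & mask, lgw, w)
        A = (rotl(A ^ t, u, w) + S[2 * i]) & mask
        C = (rotl(C ^ u, t, w) + S[2 * i + 1]) & mask
        A, B, C, D = B, C, D, A
    return (A + S[2 * r + 2]) & mask, B, (C + S[2 * r + 3]) & mask, D


def rc6_decrypt(A, B, C, D, S, r, w=32):
    """Inverse of rc6_encrypt: undo the steps in reverse order."""
    mask, lgw = (1 << w) - 1, w.bit_length() - 1
    C, A = (C - S[2 * r + 3]) & mask, (A - S[2 * r + 2]) & mask
    for i in range(r, 0, -1):
        A, B, C, D = D, A, B, C
        u = rotl((D * (2 * D + 1)) & mask, lgw, w)
        t = rotl((B * (2 * B + 1)) & mask, lgw, w)
        C = rotr((C - S[2 * i + 1]) & mask, t, w) ^ u
        A = rotr((A - S[2 * i]) & mask, u, w) ^ t
    return A, (B - S[0]) & mask, C, (D - S[1]) & mask


rounds = 20
# Placeholder round keys (NOT the RC6 key schedule), only to exercise the code.
S = [(0x9E3779B9 * (i + 1)) & 0xFFFFFFFF for i in range(2 * rounds + 4)]
plain = (0x01234567, 0x89ABCDEF, 0x0F1E2D3C, 0x4B5A6978)
cipher = rc6_encrypt(*plain, S, rounds)
recovered = rc6_decrypt(*cipher, S, rounds)
```

The round trip only shows that the two functions are inverses of each other; checking against published test vectors would additionally require the key schedule, which the slide does not show.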

  • Required interface (figure): Encryption/decryption unit with control & i/o interface, connected to a Key memory unit. Signals shown: clock, reset, enc_dec, data_in, data_available, data_read, data_out, write, full, ready, key_available, key_read, S_i, round number, round key(s); bus widths m and w

  • Projects 1, 2: Optimization Criteria

    Maximum ratio

    Throughput / Circuit Area

    or

    Minimum product

    Latency × Circuit Area

  • Primary timing parameters

    Latency: time to process a single block of data (figure: circuit with input Xi and output Yi)

    Throughput: number of bits processed in a unit of time (figure: circuit with inputs Xi, Xi+1, Xi+2 and outputs Yi, Yi+1, Yi+2)

    Throughput = (Block_size × Number_of_blocks_processed_simultaneously) / Latency
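The throughput relation and the two optimization criteria for the projects can be tried out numerically. The Python sketch below does so for one made-up design point; the block size, cycle count, clock period and slice count are hypothetical numbers, not results from any real implementation.

```python
def throughput(block_size_bits, blocks_in_flight, latency_s):
    """Bits processed per second: block size times blocks in flight over latency."""
    return block_size_bits * blocks_in_flight / latency_s


def throughput_per_area(tp_bits_per_s, area):
    """First optimization criterion: to be maximized."""
    return tp_bits_per_s / area


def latency_area_product(latency_s, area):
    """Second optimization criterion: to be minimized."""
    return latency_s * area


# Hypothetical design point: 128-bit block, 20 clock cycles of 10 ns each,
# one block processed at a time, 500 CLB slices.
latency = 20 * 10e-9
tp = throughput(128, 1, latency)
ratio = throughput_per_area(tp, 500)
product = latency_area_product(latency, 500)
```

For a design that processes one block at a time the two criteria rank designs identically, since throughput is then just the block size divided by the latency; they diverge once several blocks are in flight, as in a pipelined circuit.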

  • Infinite Impulse Response (IIR) Filter

    Equations (1)

    Transfer function

  • Two investigated architectures

    Architecture 1: Direct II Form

  • Architecture 2: Cascade of second-order systems (figure: sections Fi(z))

  • Example of coefficients: Butterworth filter

    Order O=10, Passband Fp=0.3

    Architecture 1: Direct II Form

    Architecture 2: Cascade of second-order systems

    a[1..10] =

    b[1..10] =
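The equations and coefficient values for the two architectures did not survive in this transcript, so the following Python sketch only illustrates the general idea under stated assumptions: a transfer function of the form H(z) = (b0 + b1 z^-1 + ...) / (1 + a1 z^-1 + ...), a textbook Direct Form II recurrence, and two made-up second-order sections. None of the coefficients are the Butterworth values from the slide.

```python
def direct_form_2(b, a, x):
    """Filter the sequence x with a Direct Form II structure; a[0] must be 1."""
    n = max(len(a), len(b)) - 1
    b = list(b) + [0.0] * (n + 1 - len(b))
    a = list(a) + [0.0] * (n + 1 - len(a))
    delay = [0.0] * n                  # single delay line shared by both halves
    y = []
    for sample in x:
        w0 = sample - sum(a[k] * delay[k - 1] for k in range(1, n + 1))
        y.append(b[0] * w0 + sum(b[k] * delay[k - 1] for k in range(1, n + 1)))
        delay = ([w0] + delay)[:n]     # shift the delay line by one sample
    return y


def cascade(sections, x):
    """Run x through a chain of (b, a) sections, each in Direct Form II."""
    for b, a in sections:
        x = direct_form_2(b, a, x)
    return x


def polymul(p, q):
    """Multiply two polynomials given as coefficient lists."""
    out = [0.0] * (len(p) + len(q) - 1)
    for i, pi in enumerate(p):
        for j, qj in enumerate(q):
            out[i + j] += pi * qj
    return out


# Two hypothetical, stable second-order sections.
sections = [([1.0, 2.0, 1.0], [1.0, -0.5, 0.25]),
            ([1.0, 0.0, -1.0], [1.0, 0.3, 0.1])]
impulse = [1.0] + [0.0] * 31
y_cascade = cascade(sections, impulse)
y_direct = direct_form_2(polymul(sections[0][0], sections[1][0]),
                         polymul(sections[0][1], sections[1][1]), impulse)
```

Both architectures realize the same transfer function in exact arithmetic; they differ in how coefficient quantization and rounding affect the result, which is one reason to compare them in fixed-width hardware.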

  • Required interface (figure): IIR Filter with control unit & i/o interface. Signals shown: clock, reset, data_in, data_out, a_i, b_i, c_i, ab_write, process, ready, valid; bus widths wi, wo, wc

  • Project 2b from Fall 2005, to be modified in Fall 2006

  • Using high-level behavioral VHDL, describe an 8-bit microcontroller MC68HC11E1, working in the expanded mode, with the following simplifications:

    1. Inputs and outputs of the microcontroller are reduced to E (clock), RESETn (reset active low), RW (read/write), AS (address strobe), ADDR15..8 (also denoted as PB7..0), ADDR7..0/DATA7..0 (multiplexed address & data, also denoted as PC7..0), PORTD and PORTE.

  • 2. Internal registers are reduced to the registers A, IX, SP, CC (Condition Codes NZVC), and PC.

    3. The only parts of 68HC11E1 implemented in your model are:

    a. CPU

    b. RAM (512 B in the range $0000-$01FF)

    c. parallel I/O (PORTD and PORTE)

    4. Internally generated clock E has a frequency 2 MHz.

    5. Internal I/O registers are limited to

    PORTD at the memory address $1008

    DDRD at the memory address $1009

    PORTE at the memory address $100A

  • 6. Instruction set of the microcontroller is reduced to the following instructions

    Data transfer instructions: LDAA, LDX, LDS, STAA, STX

    Arithmetic instructions: CLRA, NEGA, ADDA, SUBA, ASRA, ASLA

    Logic instructions: ANDA, ORAA, EORA

    Data test instructions: CMPA, CPX, TSTA

    Control instructions: BEQ, BGT, BHI, BSR, JSR, RTS, JMP

    Stack instructions: PSHA, PULA, PSHX, PULX

  • 7. Addressing modes of the microcontroller are reduced to the following modes

    a. immediate

    b. extended

    c. indexed

    d. inherent

    e. relative

    8. Main program is stored in the external RAM starting at the address $4000.

    9. After reset, PC is set to the address $0000 (internal RAM of MC68HC11) where the instruction JMP $4000 is located.

  • Microcontroller system

    The implemented microcontroller system should consist of:

    Microcontroller MC68HC11E1

    8 kB RAM, such as 6164

    74HC373 8-bit latch

    74HC138 decoder chip

    Auxiliary gates, if needed
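The address ranges given in the simplifications (internal RAM, the three I/O registers, and the external RAM holding the main program) amount to a small memory map. The Python sketch below decodes an address into the region that responds. The upper bound of the external RAM is an assumption: the slides give the start address $4000 and the size 8 kB, but not how the 74HC138 decoder is wired, so a contiguous block $4000-$5FFF is assumed here.

```python
INTERNAL_RAM = range(0x0000, 0x0200)        # 512 B, $0000-$01FF
IO_REGISTERS = {0x1008: "PORTD", 0x1009: "DDRD", 0x100A: "PORTE"}
EXTERNAL_RAM = range(0x4000, 0x6000)        # assumption: 8 kB from $4000


def decode(address):
    """Name the region of the simplified memory map that owns this address."""
    if address in INTERNAL_RAM:
        return "internal RAM"
    if address in IO_REGISTERS:
        return IO_REGISTERS[address]
    if address in EXTERNAL_RAM:
        return "external RAM"
    return "unmapped"


def split_bus(address):
    """Split a 16-bit address into the high byte (ADDR15..8) and the low byte
    that shares its pins with the data bus (ADDR7..0/DATA7..0)."""
    return (address >> 8) & 0xFF, address & 0xFF


regions = [decode(a) for a in (0x0000, 0x01FF, 0x1009, 0x4000, 0x6000)]
high, low = split_bus(0x4012)
```

In the real system the low byte is only valid as an address early in the bus cycle, which is why the parts list includes an 8-bit latch to hold it while the same pins carry data.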

  • Write Cycle

  • Features of the model

    1. Your model should allow cycle accurate modeling of the circuit behavior.

    2. Your model should contain debugging features equivalent to the debugging features of the DLX model, discussed in class and described in Ashenden, Chapter 15.

    3. Generic parameters passed to the model should include

    a. name of the file with the contents of the external RAM

    b. clk-to-output delay

    c. debugging mode

    4. Your model should report all undefined opcodes, treat them as NOP, and proceed to the next RAM address.
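The requirement to report undefined opcodes, treat them as NOP, and move on can be illustrated with a very small instruction-level model. The Python sketch below is not cycle accurate and models only six instructions without condition codes; the opcode values are taken from the 68HC11 opcode map and should be checked against the reference manual, and every opcode outside those six is reported as undefined here, which is stricter than the project's instruction list.

```python
# Opcodes for the few instructions modeled (verify against the 68HC11 manual).
NOP, CLRA, JMP_EXT, LDAA_IMM, ADDA_IMM, STAA_EXT = 0x01, 0x4F, 0x7E, 0x86, 0x8B, 0xB7


def run(memory, steps, trace_each_step=False):
    """Execute `steps` instructions from PC = $0000; return (A, PC, log)."""
    a, pc, log = 0, 0x0000, []

    def byte(addr):
        return memory.get(addr & 0xFFFF, 0x00)

    def word(addr):
        return (byte(addr) << 8) | byte(addr + 1)

    for _ in range(steps):
        start, op = pc, byte(pc)
        if op == LDAA_IMM:
            a, pc = byte(pc + 1), pc + 2
        elif op == ADDA_IMM:
            a, pc = (a + byte(pc + 1)) & 0xFF, pc + 2
        elif op == STAA_EXT:
            memory[word(pc + 1)] = a
            pc += 3
        elif op == JMP_EXT:
            pc = word(pc + 1)
        elif op == CLRA:
            a, pc = 0, pc + 1
        elif op == NOP:
            pc += 1
        else:
            # Undefined opcode: report it, behave like NOP, go to next address.
            log.append(f"undefined opcode ${op:02X} at ${start:04X}, treated as NOP")
            pc += 1
        if trace_each_step:
            log.append(f"${start:04X}: opcode ${op:02X}, A=${a:02X}, PC=${pc:04X}")
    return a, pc, log


program = {
    0x0000: 0x7E, 0x0001: 0x40, 0x0002: 0x00,   # JMP $4000 (reset entry)
    0x4000: 0x86, 0x4001: 0x05,                 # LDAA #$05
    0x4002: 0x8B, 0x4003: 0x03,                 # ADDA #$03
    0x4004: 0x41,                               # not modeled: reported, then skipped
    0x4005: 0xB7, 0x4006: 0x01, 0x4007: 0x00,   # STAA $0100
}
a, pc, log = run(program, steps=5)
```

The `trace_each_step` flag stands in for the most detailed debugging mode named in the testing requirements; in the VHDL model the same information would come from report statements driven by a generic.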

  • Testing and debugging

    The behavior of your model should be carefully verified using a testbench instantiating your model with

    a. the external RAM containing a valid program composed of a substantial subset of instructions implemented in the model

    b. debugging mode set to the most detailed mode (trace_each_step)

  • Deliverables

    All source code files.

    Contents of the external RAM used for the model verification, in the hexadecimal notation, and expressed using the corresponding 68HC11 assembly language mnemonics.

    The detailed log/report generated by your model for a given contents of RAM, and with the debugging mode set to trace_each_step.

  • All Projects - Organization

    Projects divided into phases

    Intermediate code submitted through WebCT at selected checkpoints and evaluated by the instructor and/or TA

    Penalty points for falling behind the schedule (below 50% of the work that is supposed to be done by a certain deadline)

    Feedback provided to students on a fair and best effort basis

    Final report and codes submitted by WebCT and graded using a full scale

    Contest for the best results (bonus points awarded to the winners)

    Penalty and bonus points added to the final grade

  • Honor Code Rules

    All students are expected to write and debug their codes individually

    Students are encouraged to help and support each other in all problems related to the

    - operation of the CAD tools,
    - basic understanding of the problem.
