Adaptation of an ARM compatible System on chip as an IP ...700637/FULLTEXT01.pdf · Adaptation of...


IT 14 008

Degree project (Examensarbete), 30 credits, January 2014

Adaptation of an ARM compatible System on chip as an IP-module in a FPGA

Emanuel Wahlqvist

Department of Information Technology (Institutionen för informationsteknologi)


Faculty of Science and Technology, UTH unit. Visiting address: Ångströmlaboratoriet, Lägerhyddsvägen 1, Building 4, Floor 0. Postal address: Box 536, 751 21 Uppsala. Telephone: 018 – 471 30 03. Fax: 018 – 471 30 00. Website: http://www.teknat.uu.se/student

Abstract

Adaptation of an ARM compatible System on chip as an IP-module in a FPGA

Emanuel Wahlqvist

In today's world, fast prototyping and a short time to market are very important factors when developing products. Any effort to reduce these parameters, as well as to make systems easier to maintain, is effort well spent. Syntronic is a consultant company dealing in electronic and software development, testing and maintenance. They see the soft core processor, implemented in a Field Programmable Gate Array, as a step towards more versatile platforms. As a first effort, this thesis presents the specification, implementation and testing of a System on Chip based on an open source ARMv2a compatible processor designed in Verilog. The system targets applications where performance is not the highest priority, but rather a small FPGA area and the possibility to connect many different sensor types. The final result is a system that is able to execute both assembler and C code in simulation. No hardware was available for testing, but the synthesis results are promising. The final system includes interfaces for UART, SPI and I2C, along with support for up to 32 General Purpose Input Output pins. All steps required for modifying and customizing the system are also presented, along with the tools used.

Printed by: Reprocentralen ITC
IT 14 008
Examiner: Philipp Rümmer
Subject reviewer: Leif Gustafsson
Supervisor: Stig Silver


Acknowledgements

I would like to thank:

• Stig Silver at Syntronic for trusting me to solve this problem.

• Lars Johansson, my supervisor at Syntronic, for guidance, knowledge and help along the way.

• Robert Adenmark who, despite being on parental leave, pointed me in the right direction on more detailed FPGA issues.

• Leif Gustafsson, my supervisor at Uppsala University, for reading, correcting and giving knowledgeable input to this report and for directing my attention to related studies in this subject.

And all others at Syntronic who in some way aided me in this work.

Emanuel Wahlqvist


Contents

1. Introduction
   1.1. FPGA fundamentals
      1.1.1. CLB
      1.1.2. RAM blocks
      1.1.3. Routing net
   1.2. FPGAs and processors
   1.3. Why ARM
   1.4. HDL design with Verilog
      1.4.1. Example design
2. Related work
   2.1. Optimizations
3. Specification
   3.1. Core alternatives
      3.1.1. Considerations
4. HDL Design
   4.1. Target
   4.2. Amber project
   4.3. Clock and reset manager
   4.4. The Amber core
      4.4.1. ARMv2a Instruction Set Architecture
      4.4.2. Pipeline
      4.4.3. Pipeline hazards
   4.5. Wishbone bus
      4.5.1. Wishbone signals
      4.5.2. Protocol
      4.5.3. De-multiplexer (Demux)
   4.6. Memory
      4.6.1. Cache
      4.6.2. Boot memory
      4.6.3. Main
      4.6.4. Flash memory
   4.7. I2C
      4.7.1. I2C protocol
      4.7.2. I2C controller
   4.8. SPI
      4.8.1. SPI protocol
      4.8.2. SPI controller
   4.9. UART
      4.9.1. UART protocol
      4.9.2. UART controller
   4.10. Ethernet MAC
   4.11. GPIO
      4.11.1. GPIO controller
   4.12. Timers
      4.12.1. Registers
      4.12.2. Setting up a timer
   4.13. Interrupt controller
      4.13.1. Registers
   4.14. Test module
   4.15. Verilog test bench
      4.15.1. UART
      4.15.2. SPI
      4.15.3. I2C
      4.15.4. GPIO
5. Configuration
   5.1. Parameters
   5.2. Adding or removing a peripheral
6. Tools
   6.1. Xilinx ISE 14.5
      6.1.1. Synthesis
      6.1.2. Simulation
      6.1.3. Bulk simulation
      6.1.4. Debug switches
   6.2. Sourcery CodeBench for ARM processors
   6.3. Amber specific tools
      6.3.1. amber-elfsplitter
      6.3.2. amber-elfsplitter-memcontents
      6.3.3. check_mem_size
   6.4. Installation
7. Testing
   7.1. Assembler tests
      7.1.1. SPI test (spi.S)
      7.1.2. I2C test (i2c.S)
      7.1.3. UART test (uart_tx.S)
      7.1.4. GPIO (gpio.S)
   7.2. C tests
      7.2.1. Libraries
      7.2.2. boot-loader-serial
      7.2.3. dhry
      7.2.4. hello-world
      7.2.5. spi-timer
   7.3. Linux test
8. Result
9. Conclusion
   9.1. Specification
   9.2. Implementation
      9.2.1. Target independence
      9.2.2. Peripheral integration
   9.3. Documentation
      9.3.1. Peripheral controllers
   9.4. Testing
   9.5. Compiler optimization issue
10. Discussion
   10.1. Pros and cons with the Amber SoC
   10.2. Peripherals
11. Future work
12. Bibliography
A. I2C test output
B. Linux test output


List of Figures

1.1. Simple sketch of FPGA layout

4.1. Diagram showing the complete system design
4.2. Overview of the a23 verilog structure.
4.3. Example of pipelined execution.
4.4. Example of control hazard handling.
4.5. Wishbone handshake
4.6. Wishbone single read cycle
4.7. Wishbone single write cycle
4.8. Wishbone synchronous burst cycle
4.9. Schematic of the wishbone demultiplexer
4.10. Tri-state buffers on SDA and SCL.
4.11. I2C transfer.
4.12. SPI transfer timing diagram.
4.13. UART in half duplex mode.
4.14. UART in full duplex mode with RTS and CTS.
4.15. A UART transfer.
4.16. Structural schematic of the UART controller.
4.17. GPIO pin tri-state buffer connection.
4.18. Interrupt vectors and masks.
4.19. Fast interrupt vectors and masks.

6.1. Xilinx ISE design flow.
6.2. Simulation script organization

7.1. SPI transfer of first word.
7.2. SPI transfer of second word.
7.3. Start condition and sending slave address plus write bit (0x20)
7.4. Sending register address 0x01
7.5. Sending data 0xa5
7.6. Sending data 0x5a and a stop condition
7.7. Start condition and sending slave address plus write bit (0x20)
7.8. Sending register address 0x01
7.9. Start condition and sending slave address plus read bit (0x21)
7.10. Reading data 0xa5
7.11. Reading data 0x5a and stop condition
7.12. Start condition and sending slave address plus write bit (0x20)
7.13. Sending invalid register address 0x10 and receiving NACK
7.14. Send character "H"
7.15. Send character "i", receive character "H"
7.16. Send character "!", receive character "i"
7.17. Send character " ", receive character "!"
7.18. Pins [8:1] is "0xDA" and mirrored on pins [16:9]
7.19. Pins [8:1] is "0xBE" and mirrored on pins [16:9]


List of Tables

3.1. Comparison between ARM cores

4.1. ARMv2a instructions supported by the Amber core.
4.2. Some of the control signals for the execute stage.
4.3. Wishbone signals, direction is seen from a master perspective
4.4. Slave numbering in the Wishbone demultiplexer.
4.5. Coprocessor registers. All registers are 32 bits wide.
4.6. Layout of coprocessor register CR0
4.7. I2C registers. All registers are 8 bits wide
4.8. SPI modes
4.9. SPI core registers. All registers are 32 bits wide.
4.10. UART core registers. All registers are 8 bits wide.
4.11. Flag register bits. Bits 2 and 1 are always high.
4.12. GPIO core registers.
4.13. Timer core registers.
4.14. Control register bits. Unused bits are always low.
4.15. Timer prescaler value
4.16. Interrupt vector outline. The unused bits (NA) are initialized to zero.
4.17. Timer core registers.
4.18. Test module registers.

5.1. Parameters to configure the system

6.1. Simulation script options
6.2. Simulation script options
6.3. Required environmental variables

7.1. Files required for a C test


Abbreviations

CISC Complex Instruction Set Computer

CLB Configurable Logic Block

CTS Clear To Send

DMIPS Dhrystone Million Instructions Per Second

DSP Digital Signal Processor

ELF Executable and Linkable Format

FIFO First In First Out

FIRQ Fast Interrupt ReQuest

GPL General Public License

GUI Graphical User Interface

HDL Hardware Description Language

I2C Inter-Integrated Circuit

IP Intellectual Property

IRQ Interrupt ReQuest

ISA Instruction Set Architecture

LGPL Lesser (or Library if old) General Public License

LSB Least Significant Byte

LUT Look-up table

MAC Media Access Control

PC Program Counter

PCB Printed Circuit Board

PLL Phase Locked Loop

RAM Random Access Memory


RISC Reduced Instruction Set Computer

RTS Request To Send

Rx Receive

SPI Serial Peripheral Interface

Tx Transmit

UART Universal Asynchronous Receiver/Transmitter


1. Introduction

Syntronic is a global consultant company dealing in product development, testing and maintenance. They are active in several areas, such as telecommunication, medical and automotive. Their idea is to cover all stages from a design idea to a finished product applied in the field. In doing this, they see a clear advantage in a design flow that not only gets products to market quickly but also makes them easy to maintain and upgrade. As a step in optimizing that design flow, they want to take a closer look at soft processors. The reason is that several of their earlier designs have involved both FPGAs and microprocessors. Integrating the microprocessor into the FPGA has great potential to reduce development time while making the system easier to tailor for future needs.

This thesis covers the specification and implementation of a SoC design around an ARM core in an FPGA. The system is intended for FPGA applications where a small control processor is needed. This could include collecting data from sensors, receiving commands from a controlling system or user interface, running smaller control loops, or coordinating other signal processing algorithms in the FPGA.


1.1. FPGA fundamentals

There are several manufacturers of FPGAs that all use their own architecture, but the main structure is very similar. A general FPGA mainly consists of Configurable Logic Blocks (CLBs) but also contains memory and DSP blocks. All blocks are connected through a configurable routing net that can connect any blocks with each other regardless of their physical location on the FPGA, as shown in Figure 1.1. This makes it possible to create any logic function, ranging from a simple AND gate to extremely complex digital circuits such as processors. Historically these functions were described by drawing a schematic on a drawing board. When designs grew in size and complexity, the use of a Hardware Description Language (HDL), such as VHDL or Verilog, followed by a synthesis process became common.

Figure 1.1.: Simple sketch of FPGA layout

1.1.1. CLB

The CLB consists of several Look-Up Tables (LUTs), which work as logic function generators, and at least one flip-flop per LUT. A typical LUT has four, five or six inputs, one or two outputs, and contains $2^n$ bits, where $n$ is the number of inputs. The CLB usually also contains multiplexers and additional flip-flops or latches. The flip-flops are used to synchronize an output from the LUT with a clock signal. Instead of being used as a logic function, the LUT can also be used as a memory block. This memory type is often referred to as Distributed Random Access Memory (Distributed RAM).
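To make the LUT description concrete, the C sketch below models a k-input LUT as a truth table of $2^k$ bits packed into one integer. It is a hypothetical software model: the names `lut_t`, `lut_eval` and `lut_write` are illustrative and not taken from any vendor library, and the flip-flops and multiplexers around a real LUT are left out. The same stored bits serve both uses described above: as a logic function the inputs select one bit, and as distributed RAM the inputs act as an address.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical model of a k-input LUT (1 <= k <= 6). Bit i of 'init'
 * holds the output for the input combination whose binary value is i,
 * so a k-input LUT needs 2^k bits (16 for k = 4, 64 for k = 6). */
typedef struct {
    unsigned k;     /* number of inputs */
    uint64_t init;  /* 2^k configuration bits */
} lut_t;

/* Logic-function use: the inputs form an index into the truth table. */
static unsigned lut_eval(const lut_t *lut, unsigned inputs)
{
    assert(lut->k >= 1 && lut->k <= 6);
    unsigned index = inputs & ((1u << lut->k) - 1u); /* keep k input bits */
    return (unsigned)((lut->init >> index) & 1u);
}

/* Distributed-RAM use: the same bits are written as a 2^k x 1 memory,
 * with the LUT inputs acting as the address. */
static void lut_write(lut_t *lut, unsigned addr, unsigned bit)
{
    assert(lut->k >= 1 && lut->k <= 6);
    addr &= (1u << lut->k) - 1u;
    if (bit)
        lut->init |= (uint64_t)1 << addr;
    else
        lut->init &= ~((uint64_t)1 << addr);
}
```

For example, a 2-input AND gate is the LUT with `k = 2` and `init = 0x8`, because only input combination 3 (binary 11) produces a 1. Writing bits with `lut_write` and reading them back with `lut_eval` is the distributed RAM view of the same storage.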

1.1.2. RAM blocks

Another type of FPGA memory is the block RAM. Each block connects to the rest of the FPGA through data input and output buses, an address bus, write enable inputs, a clock input and a reset input. The internal memory array can be comparatively large, 36K bits or more. Since a block RAM uses no multiplexers and relatively few flip-flops compared to distributed RAM, it is the preferred way of implementing large memories, as it uses less FPGA area.
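The C sketch below illustrates how the block RAM ports described above behave on each clock edge. It is a hypothetical cycle-level model, not vendor HDL: the depth of 1024 words, the 32-bit width (1024 × 32 = 32768 bits, which fits within a 36K-bit array), the read-first behaviour and a reset that clears only the output register are all assumptions, and real block RAMs differ between vendors and configuration modes.

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

#define BRAM_DEPTH 1024u /* hypothetical: 1024 words x 32 bits = 32768 bits */

typedef struct {
    uint32_t mem[BRAM_DEPTH]; /* internal memory array */
    uint32_t dout;            /* registered data output bus */
} bram_t;

static void bram_init(bram_t *ram)
{
    memset(ram, 0, sizeof *ram);
}

/* One rising clock edge. reset, we (write enable), addr and din are the
 * values present on the input ports when the edge occurs. */
static void bram_clock(bram_t *ram, int reset, int we, unsigned addr, uint32_t din)
{
    assert(addr < BRAM_DEPTH);
    uint32_t old = ram->mem[addr]; /* assumed read-first mode */
    if (we)
        ram->mem[addr] = din;      /* write takes effect on this edge */
    ram->dout = reset ? 0u : old;  /* assumed: reset clears only dout */
}
```

In this model, data written on one edge only appears on `dout` when the same address is read on a later edge. A one-cycle read latency of this kind is typical for synchronous block RAM, and a bus interface placed in front of the memory has to account for it.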

1.1.3. Routing net

The routing nets cover the whole FPGA and can be configured to connect all blocks in many different ways. At every point where nets cross each other there is a configurable switch matrix that is used to connect the nets. There is also a special type of net, called a clock net, that is used to distribute clock signals through the FPGA with minimal delay and skew.

1.2. FPGAs and processors

Because an FPGA can be programmed to perform many tasks in parallel (limited only by the size of the device), it is very suitable for digital signal processing. Earlier, FPGAs were often coupled with a separate microprocessor that took care of communication interfaces, task management and other small organizational tasks. This has led to FPGAs with an integrated hard processor. Examples of this are the Xilinx Zynq[1] and Altera SoC[2] product lines, which combine different FPGAs with an ARM Cortex-A9 processor, and Microsemi's SmartFusion[3], which uses an ARM Cortex-M3 processor. This is a solution for those who need to combine a high capacity FPGA with a very capable processor. Compared to the solution with a separate processor, this has the following benefits:

• Smaller total Printed Circuit Board (PCB) footprint

• No hardware interface needed between the processor and FPGA modules

For someone with lower demands on performance, this is probably not the optimal solution. Since even the cheaper FPGAs have grown in size, it is possible to implement a soft processor core inside them along with the desired parallel logic. This has several benefits over the hard core solution, such as:

• Possibility to change/upgrade the processor in the finished product

• Companies can hide their designs better

• Easier to implement customized multi-core platforms


The fastest way to implement a soft processor core in an FPGA is to use one of the vendor-specific cores. Altera's processor is called NIOS II[4], while Xilinx calls theirs MicroBlaze[5]. Both are 32-bit Reduced Instruction Set Computer (RISC) processors with a variety of configuration parameters. However, they are not the only players in the market. There are several RISC processors in the open source community to choose from, along with ARM's own proprietary soft processor, the Cortex M1.

1.3. Why ARM

Since ARM has a firm grip on the embedded processor market and is likely to keep its position, companies including Syntronic see an advantage in learning and using processors based on ARM architectures. Even though it is not a very big step to move from ARM to a NIOS or MicroBlaze processor, Syntronic wanted to investigate the possibilities of using a soft ARM core in an FPGA.

1.4. HDL design with Verilog

To understand the description of the final system, no deep knowledge of HDLs is needed. It is, however, necessary to be familiar with the basic structure of a Verilog design. The following concepts are enough to follow the reasoning:

• Module

• Top level module

• Port

• Wire

Module A module is a block of logic that can be used once or several times in a design.

Top level module The top level module has the same code structure as a regular module, but it is where all the regular modules are instantiated and organized to create the final design. Thus there can only be one top level module in a design.

Port The port is always defined at the beginning of each module and contains the module's interface, in other words its inputs and outputs.

Wire A wire is used to connect modules or logic together.


1.4.1. Example design

An example of a module implementing a simple AND gate is shown below. Everything between the two keywords "module" and "endmodule" defines the content of the module, while "and_gate" is the name of the module, which is used to reference it later. The code between the parentheses is the port, in this case two inputs and one output, and the implementation is written between the port and the "endmodule" keyword.

/*
 * The port of the and_gate module
 */
module and_gate (
    input  x,
    input  y,
    output z
);

/*
 * The logic of the and_gate module
 */
assign z = x & y;

endmodule


To put several modules together, one can instantiate the desired modules and wire them together, as shown in the example below. Here, two instances of the and_gate module shown above are used together with an OR gate. It is worth noting that the top level module can also contain logic, and in some cases a design actually consists of only a top level module.

/*
 * Make the and_gate definition available. The include is placed outside
 * the module body, since Verilog does not allow nested module definitions.
 */
`include "and_gate.v"

/*
 * The port of the top level module
 */
module top_level (
    input  a,
    input  b,
    input  c,
    input  d,
    output result
);

/*
 * To connect the result with the outcome of the two AND gates
 * one can use a "wire"
 */
wire a_AND_b;
wire c_AND_d;

/*
 * Creating two AND gates by instantiating the and_gate module twice
 */
and_gate gate1 (
    .x(a),
    .y(b),
    .z(a_AND_b)
);

and_gate gate2 (
    .x(c),
    .y(d),
    .z(c_AND_d)
);

/*
 * It is also possible to use logic in the top level module,
 * so let's create an OR gate for the result
 */
assign result = a_AND_b | c_AND_d;

endmodule
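Because the top_level design is purely combinational, its behaviour can be cross-checked with a small software model. The C sketch below is a hypothetical hand-written mirror of the two Verilog modules (it is not generated from them); each function corresponds to a module, and the 1-bit wires are modelled as unsigned values restricted to 0 and 1. The last function packs the complete 16-row truth table into one word, with `a` as the most significant input bit.

```c
#include <assert.h>
#include <stdint.h>

/* Mirror of and_gate: the ports are 1-bit wires, so only 0 and 1 are legal. */
static unsigned and_gate(unsigned x, unsigned y)
{
    assert(x <= 1u && y <= 1u);
    return x & y;                        /* assign z = x & y; */
}

/* Mirror of top_level: two and_gate instances feeding an OR. */
static unsigned top_level(unsigned a, unsigned b, unsigned c, unsigned d)
{
    unsigned a_AND_b = and_gate(a, b);   /* instance gate1 */
    unsigned c_AND_d = and_gate(c, d);   /* instance gate2 */
    return a_AND_b | c_AND_d;            /* assign result = a_AND_b | c_AND_d; */
}

/* Bit v of the result is the output for inputs {a,b,c,d} = binary v. */
static uint16_t top_level_truth_table(void)
{
    uint16_t table = 0;
    for (unsigned v = 0; v < 16u; ++v) {
        if (top_level((v >> 3) & 1u, (v >> 2) & 1u, (v >> 1) & 1u, v & 1u))
            table |= (uint16_t)(1u << v);
    }
    return table;
}
```

The truth table evaluates to 0xF888 (rows 12 to 15 from a AND b, rows 3, 7, 11 and 15 from c AND d). This ties back to the LUT description in Section 1.1.1: the whole design fits in the 16 configuration bits of a single 4-input LUT, so a synthesis tool would typically implement it as one LUT rather than as three separate gates.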


2. Related work

The idea of using a soft core processor in an FPGA is not new. One example is the free LEON processor, written in VHDL and based on the SPARC architecture. The development of LEON started in 1997[6], and the first version was released by the European Space Agency under the Library General Public License¹ (LGPL) in 2000[7]. The second version, LEON2, has since had several successful implementations in space[8]. For the third version, development moved to the Swedish company Aeroflex Gaisler, and it is now on its fourth version. Another example of an open source processor is the OpenRISC 1000, or OR1K, which was released in 2001[9] and marked the beginning of the OpenCores community. The OR1K grew along with the community and now has a complete toolchain, several compatible operating systems and at least two available SoCs developed around it[10]. A commercial example is the NIOS processor, developed by Altera and released in 2001[11].

2.1. Optimizations

Since the subject has become very popular, there have been several studies aiming to make soft processors more efficient in area utilization and to give them better performance. In Sheldon et al.[12] the sharing of resources such as floating point units and multipliers between soft cores is analysed. They managed to decrease the area utilization of a dual-core platform by 16% while introducing a cycle count overhead of only 1%. Another interesting article was written by Lysecky and Vahid [13], where a so-called warp processor based on a MicroBlaze soft core is implemented. The warp processor analyses the software at runtime and uses a dynamic partitioning scheme to implement important software functions as circuits in the FPGA. When comparing the warp processor to a fully equipped MicroBlaze processor, they find that performance increased 5.8 times while power consumption decreased by 57%. Another approach is to optimize the processor for a specific piece of software before synthesis. In an article by Sheldon et al.[14], a method for this based on a MicroBlaze processor is shown. They achieve at most a 200% speedup, and a 20% speedup when using tight size constraints. Yet another approach to application specific optimization is taken by Yiannacouras, Steffan and Rose[15]. They use Verilog-generating software called SPREE to generate application specific processors. By first optimizing away unused features and then removing unused parts of the instruction set, they achieve a 25% increase in performance per area compared to a NIOS II processor.

¹ Now called the Lesser General Public License.


3. Specification

A major task of the system will be to read data from different sensors and to control a variety of chips. This is done through different communication protocols, of which the most commonly used today are SPI and I2C. UARTs have been used for a very long time to communicate with a PC in a simple way, and they will also be included. In discussions with the supervisor about Syntronic's needs, we agreed to also include a GPIO controller and a flash memory. The complete list of the system's peripherals is shown below.

• I2C controller

• SPI controller

• UART controller

• GPIO

• Flash memory for storage

• Main memory

• Boot mechanism

• At least one user interrupt

• At least one user configurable timer


3.1. Core alternatives

There are several processor cores available for developers to choose from. As mentioned earlier, ARM has released its own FPGA-targeted design, called Cortex M1. The major benefit of using this processor is that ARM has verified its function, and the core comes with ARM's warranty. The downside is of course the cost. A free evaluation version exists, but with a fixed configuration and no visibility into the code. Alternatives to the Cortex M1 can be found on the OpenCores website www.opencores.org. For this thesis two projects have been considered, Amber and Storm SoC. Table 3.1 shows a comparison of the cores[16][17][18].

Core | Amber a23/a25¹ | Cortex M1 | Storm SoC
--- | --- | --- | ---
Pipeline stages | 3/5 | 3 | 8
Cache size (kB) | 8-32 | D=0-1, I=0-1 | D=1, I=1
Interrupts | 16 | 1-32 | 32
Frequency (MHz) | 40-80 | 70-200 | 80
DMIPS/MHz | 0.75/1.05 | 0.8 | NA
Occupied area (LUTs) | 9000² | 2600³ | 9000⁴
License | LGPL | Commercial | GPL
Cost | Free | $1/unit, min. $1000 | Free

Table 3.1.: Comparison between ARM cores

3.1.1. Considerations

The most important properties to consider are licensing, cost, performance and area utilization. The Cortex M1 is the most expensive option while also providing the highest performance. However, if there are high demands on system performance, one should instead consider the MicroBlaze and NIOS II mentioned earlier, since they provide more features and higher performance at a lower cost[19]. The Xilinx Zynq and Altera SoC are other high-performance options, as mentioned earlier. That leaves the two open source projects. Apart from cost, their major benefit over the Cortex M1 is that they are already complete systems. With the Cortex M1 one has to add a bus architecture, find suitable peripherals for that bus and create an arbitration scheme between them, and that takes time. Looking at the included peripherals, the Storm SoC has everything listed at the beginning of this chapter, so it would seem to be the best choice for this project. However, since the core is to be used in commercial applications, the biggest difference between the two open source cores is the license. The General Public License (GPL) requires that any product containing GPL-licensed code be shipped along with the source code of the whole system, a requirement the LGPL does not place on the rest of the system. Since some components of a system may be highly advanced and confidential, releasing their source code is not an option. Therefore the system implemented in this thesis is built around the Amber project.

¹ The Amber project includes two different cores, called a23 and a25.
² Core a23 with 16 kB cache.
³ Minimal configuration, no cache.
⁴ Core with 2 kB cache.


4. HDL Design

In this chapter the Amber project is presented in more detail, along with the changes that were made to it in order to obtain the system specified in Chapter 3. An overview of the complete design is shown in Figure 4.1, which illustrates how the core connects to the peripherals over the Wishbone bus[20]. The Wishbone bus is a competitor to ARM's open bus standard AMBA; how it works is described in more detail in Section 4.5. The peripherals that came with the Amber project (UART, interrupt controller, timer controller and test module) are not described in the Amber user guide[21]. The information presented about them was obtained by analyzing and simulating the code. All configuration parameters mentioned in this chapter are detailed in Table 5.1.

Figure 4.1.: Diagram showing the complete system design


4.1. Target

Different FPGAs were discussed as a platform for the project. The supervisor suggested a Xilinx FPGA, since one was going to be used in another project and the hardware could then be shared. Unfortunately, the business arrangements were not completed in time, so no hardware was available for actual testing. For simulation and synthesis, the FPGA targeted by the Amber system, a Spartan 6 LX45T, was used. The synthesis results were only verified by reading the synthesis reports. These do not replace hardware testing, but for device utilization and basic timing analysis we considered them sufficient.

4.2. Amber project

The Amber project was designed by Conor Santifort, a member of OpenCores, who tested it on a Xilinx SP605 development board[21]. The complete specification of the system is shown in the list below.

• ARMv2a compatible core

• Configurable cache size

• 8kB boot memory

• Two UART controllers

• Ethernet MAC

• Interrupts

• Timers

• Spartan-6 DDR3 memory controller

The peripherals connect to the core over a Wishbone bus interface. The boot memory contains a boot loader that uses one of the UART ports to receive programs to be run on the system. The project also contains an extensive suite of hardware test programs written in assembler, along with a bootable Linux image that can be run in a simulator.

4.3. Clock and reset manager

In this module the system-wide clock is generated by a Phase Locked Loop (PLL), and a synchronous reset signal is generated. Originally the Amber project had three clock generators: one PLL for a Spartan 6 FPGA, one for a Virtex 6 FPGA and a third, non-synthesizable, clock generator used for simulations. Since this thesis targets the Spartan 6, we decided to remove the Virtex 6 PLL and the non-synthesizable clock generator. This makes the simulations more realistic and cleans up the code. The remaining PLL uses a 200 MHz differential clock input to generate an 800 MHz clock. This clock is then divided by the AMBER_CLOCK_DIVIDER parameter to obtain the system clock.
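As a worked example of the division, assume a divider value of 20. This is an illustrative value only; the configured value is one of the parameters listed in Table 5.1.

```latex
f_{\mathrm{sys}} = \frac{f_{\mathrm{PLL}}}{\mathrm{AMBER\_CLOCK\_DIVIDER}} = \frac{800\ \mathrm{MHz}}{20} = 40\ \mathrm{MHz}
```

A larger divider gives a slower system clock, which makes timing easier to meet at the cost of performance.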

4.4. The Amber core

The Amber project provides two different cores, called a23 and a25. They are fully software compatible but differ in some respects, as shown in Table 3.1. Since area is preferred over performance, the a23 core is used. Figure 4.2 gives an overview of the core's Verilog structure. The picture is taken from the Amber core specification[22], where it is called Figure 5.

Figure 4.2.: Overview of the a23 verilog structure.

The core has a unified data and instruction cache and executes instructions in a three-stage pipeline. It also supports two interrupts with different priorities, where the Fast Interrupt ReQuest (FIRQ) is prioritized over the normal Interrupt Request (IRQ). A more detailed description of the core and a schematic diagram of the processor architecture can be found in the Amber core specification[22].

4.4.1. ARMv2a Instruction Set Architecture

The core was built to be compatible with the ARMv2a Instruction Set Architecture (ISA), which consists of 32-bit wide instructions. The instructions supported by the Amber core can be divided into categories depending on their purpose, as shown in Table 4.1. For descriptions of each instruction's syntax and operation, see Table 4 in the Amber core specification[22].

Category | Instructions | Description
--- | --- | ---
Data processing | ADC, ADD, AND, BIC, CMN, CMP, EOR, MOV, MVN, ORR, RSB, RSC, SBC, SUB, TEQ, TST | Performs operations on data already in registers
Multiply | MLA, MUL | Performs multiplication
Single data swap | SWP, SWPB | Swaps data in a register with data in memory
Single data transfer | LDR, LDRB, STR, STRB | Moves data between memory and registers
Block data transfer | LDM, STM | Moves a series of words between memory and registers
Branch | B, BL | Branches execution to another place in the program
Coprocessor data transfer | MCR, MRC | Moves data to and from a coprocessor register
Software interrupt | SWI | Raises a software interrupt exception

Table 4.1.: ARMv2a instructions supported by the Amber core.

Registers and modes

The ARMv2a ISA is a load/store architecture, which means that all operations on data take place in the processor's internal registers, and memory is only accessed by dedicated load and store instructions. In the Amber core, and in ARM cores in general, there are 16 internal registers, of which 13 can be used for data operations. Which registers are accessible depends on the mode the processor is in. The Amber core has four modes:

User Non-privileged mode. Most user code is executed here.

IRQ Privileged mode that the processor enters when an interrupt occurs.

FIRQ Privileged mode that the processor enters when a fast interrupt occurs.

Supervisor Privileged mode that the processor enters when a software interrupt occurs.

The current mode is indicated by the two least significant bits of register 15. This register is referred to as the program counter, or PC, as it also keeps track of where in the program the processor is, more precisely which instruction is fetched next (because of the pipeline, this is ahead of the instruction currently being executed). The other reserved registers are registers 14 and 13. Register 14 is called the "link register" and contains the address the processor returns to when the current function call is completed. Register 13 is called the "stack pointer", or SP, and points to the end of the stack. A graphical overview of the registers in the respective modes is shown in Tables 14 and 15 of the Amber core specification[22].
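Because the mode bits share register 15 with the program counter, the register has to be unpacked to read either value. The C sketch below does this, assuming the standard 26-bit ARMv2 layout of R15 (flags N, Z, C, V in bits 31:28, the IRQ and FIRQ disable bits in 27:26, the word-aligned PC in 25:2 and the mode in 1:0). The type and function names are hypothetical and not part of the Amber sources.

```c
#include <stdint.h>

/* Processor modes as encoded in R15[1:0] on a 26-bit ARMv2a core. */
typedef enum { MODE_USR = 0, MODE_FIRQ = 1, MODE_IRQ = 2, MODE_SVC = 3 } arm_mode;

/* Fields of the combined program counter and status register (hypothetical names). */
typedef struct {
    unsigned n, z, c, v;      /* condition flags, R15[31:28] */
    unsigned irq_disable;     /* I bit, R15[27] */
    unsigned firq_disable;    /* F bit, R15[26] */
    uint32_t pc;              /* word-aligned byte address taken from R15[25:2] */
    arm_mode mode;            /* R15[1:0] */
} r15_fields;

static r15_fields decode_r15(uint32_t r15)
{
    r15_fields f;
    f.n            = (r15 >> 31) & 1u;
    f.z            = (r15 >> 30) & 1u;
    f.c            = (r15 >> 29) & 1u;
    f.v            = (r15 >> 28) & 1u;
    f.irq_disable  = (r15 >> 27) & 1u;
    f.firq_disable = (r15 >> 26) & 1u;
    f.pc           = r15 & 0x03FFFFFCu;   /* 64 MB (26-bit) address space */
    f.mode         = (arm_mode)(r15 & 0x3u);
    return f;
}
```

For example, the value 0x40001003 decodes to Z set, PC 0x1000 and Supervisor mode.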

Comparison with other ISAs

Compared to a general Complex Instruction Set Computer (CISC) architecture, a Reduced Instruction Set Computer (RISC) architecture is simpler and contains fewer instructions. This makes it less complicated to implement in digital logic, with the side effect that it uses fewer resources in the FPGA. The ARMv2a ISA in particular offers no major implementation-specific benefits over other RISC ISAs, for example the MIPS ISA. Both require a pipeline to be really efficient, and both contain a relatively small number of instructions.


4.4.2. Pipeline

Using a pipeline means that the execution of an instruction is divided into smaller steps, much like the famous assembly line introduced by Henry Ford. In the a23 core the execution is divided into three steps, or stages, called fetch, decode and execute. Figure 4.3 compares execution on a pipelined processor with execution on a processor without a pipeline. The example is only meant for basic understanding and does not take into account the hazards discussed later in this section or other delays that occur when the execution is divided into several stages.

Figure 4.3.: Example of pipelined execution.
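The ideal timing behind Figure 4.3 can be expressed as a short C sketch. It makes the same simplifications as the figure: every stage takes exactly one clock cycle, and there are no hazards or cache misses. The function names are hypothetical.

```c
#include <stdint.h>

/* Without a pipeline, each instruction passes through all stages before the next starts. */
static uint64_t cycles_unpipelined(uint64_t n_instr, unsigned stages)
{
    return n_instr * stages;
}

/* With a pipeline, the first instruction needs `stages` cycles to fill the pipeline,
 * after which one instruction completes every cycle. */
static uint64_t cycles_pipelined(uint64_t n_instr, unsigned stages)
{
    if (n_instr == 0)
        return 0;
    return stages + (n_instr - 1);
}
```

For four instructions on the three-stage a23 pipeline this gives 6 cycles instead of 12. As the number of instructions grows, the pipelined count approaches one instruction per cycle.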

Fetch

In this stage the instruction, or data, is fetched from the cache. If the cache misses, i.e. the instruction or data is not there, the whole pipeline is stalled while it is fetched from memory over the memory bus. A fetched instruction is passed on to the decode stage, whereas fetched data is passed directly to the execute stage.

Decode

This is the most complicated pipeline stage in the core. Here the fetched instruction is decoded according to the tables in Chapter 4 of the Amber core specification[22] and converted into control signals for the execute stage. Some examples of these control signals are presented in Table 4.2.

Name | Size (bits) | Description
--- | --- | ---
instruction_execute | 1 | If cleared, the instruction passes through the execute stage without being executed. See Section 4.4.3 for why this is necessary.
rn_sel | 4 | Selects which CPU register (r0 to r15) is used as the Rn register in the current instruction.
rm_sel | 4 | Selects which CPU register (r0 to r15) is used as the Rm register in the current instruction.
rds_sel | 4 | Selects which CPU register (r0 to r15) is used as the Rd and Rs register in the current instruction¹.
status_bits_mode | 2 | Shows which mode the processor is in, and thus which registers are to be accessed.

Table 4.2.: Some of the control signals for the execute stage.

Execute

In this stage the control signals from the decode stage are registered and combined with data from the fetch stage. The data passes through the ALU, and the result is written back to the register bank. Additionally, the next address for the fetch stage is generated.

4.4.3. Pipeline hazards

A pipeline generally improves the performance of a processor, but it also introduces some problems, called hazards. First, two consecutive instructions may access the same register, with the first instruction writing to it, so the second instruction risks reading the old value. This is often referred to as a "data hazard". Another problem occurs when an instruction is executed only if a certain condition is met. The condition is determined by the execution of an earlier instruction, but by then the conditional instruction is already in the fetch or decode stage, scheduled for execution. This is often referred to as a "control hazard", or, in the case of a conditional branch instruction, a "branch hazard".

¹ For example, in the MUL instruction the Rd register specified in the Amber core specification[22] is actually the Rn register.

Data hazard

In the a23 core this is dealt with by keeping the second instruction in the decode stage for two extra cycles and preventing the execute stage from executing it during those cycles. Two examples of this are shown in Section 2.2 of the Amber core specification[22]. Disabling the execute stage can be compared with the method where the compiler inserts NOP instructions in the code to avoid this type of hazard. However, the decode stage in the a23 core stores the "stalled" instruction in a register so that it can be decoded directly when execution resumes. This saves one clock cycle, so where the NOP method would waste three clock cycles, the a23 core wastes only two.
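The difference between the two approaches can be summarized with a simple estimate. Assume a sequence of N instructions containing h dependent instruction pairs, ignore cache misses and control hazards, and use the per-hazard costs stated above (two wasted cycles for the a23 stall, three for NOP insertion). N + 2 is the ideal cycle count for a three-stage pipeline:

```latex
T_{\mathrm{a23}} = (N + 2) + 2h, \qquad T_{\mathrm{NOP}} = (N + 2) + 3h
```

For example, N = 100 and h = 10 give 122 cycles for the a23 approach and 132 cycles for NOP insertion.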

Control hazard

This problem is not documented in the specification, but simulations show that it is solved in a similar manner. In the decode stage, the condition field of the instruction is compared with the status bits of the Program Counter (PC). If the condition is not met, the execute stage is disabled when that instruction passes through, as shown in the following example. Consider the following assembler code snippet:

mov  r0, #0x0      @ Load value 0x0 into register 0
mov  r1, #0x1      @ Load value 0x1 into register 1
subs r2, r1, r0    @ Compare the registers and update the condition flags
beq  1f            @ This branch will not be taken, since r1 != r0

Figure 4.4 shows what happens in the pipeline cycle by cycle; the same steps are also described below.

Figure 4.4.: Example of control hazard handling.

1. The "mov r0" instruction enters the fetch stage.

2. "mov r0" is decoded while "mov r1" enters fetch.

3. "mov r0" is executed, "mov r1" is decoded and the "subs" instruction enters fetch.

4. "mov r1" is executed, "subs" is decoded and the "beq" instruction enters fetch.

5. "subs" is executed and "beq" is decoded. The decode stage detects a conditional instruction and starts to read the status flags of the program counter. The execute stage updates these flags once the execution is done.

6. The decode stage has detected that the condition is not met, so it disables the execute stage. The "beq" instruction passes through without being executed.
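The decision taken in steps 5 and 6 can be illustrated with a small C model of ARM condition evaluation: the flags produced by a subs instruction, followed by the standard table of ARM condition codes. This is a sketch based on the general ARM definition, not code from the Amber decode stage, and all names are hypothetical.

```c
#include <stdbool.h>
#include <stdint.h>

typedef struct { bool n, z, c, v; } flags;

/* Flags set by "subs rd, rn, rm" (rd = rn - rm). On ARM, C is set when no borrow occurs. */
static flags flags_after_subs(uint32_t rn, uint32_t rm)
{
    uint32_t res = rn - rm;
    flags f;
    f.n = (res >> 31) & 1u;
    f.z = (res == 0);
    f.c = (rn >= rm);
    f.v = (((rn ^ rm) & (rn ^ res)) >> 31) & 1u;  /* signed overflow */
    return f;
}

/* Condition field (instruction bits 31:28): true if the instruction should execute. */
static bool condition_passes(unsigned cond, flags f)
{
    switch (cond & 0xFu) {
    case 0x0: return f.z;                   /* EQ */
    case 0x1: return !f.z;                  /* NE */
    case 0x2: return f.c;                   /* CS */
    case 0x3: return !f.c;                  /* CC */
    case 0x4: return f.n;                   /* MI */
    case 0x5: return !f.n;                  /* PL */
    case 0x6: return f.v;                   /* VS */
    case 0x7: return !f.v;                  /* VC */
    case 0x8: return f.c && !f.z;           /* HI */
    case 0x9: return !f.c || f.z;           /* LS */
    case 0xA: return f.n == f.v;            /* GE */
    case 0xB: return f.n != f.v;            /* LT */
    case 0xC: return !f.z && f.n == f.v;    /* GT */
    case 0xD: return f.z || f.n != f.v;     /* LE */
    case 0xE: return true;                  /* AL */
    default:  return false;                 /* NV: never executed on ARMv2 */
    }
}
```

For the snippet above, subs r2, r1, r0 computes 1 - 0 = 1, so Z stays clear. The EQ condition of beq therefore fails (condition_passes(0x0, f) returns false), and the instruction is cancelled instead of being executed.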


4.5. Wishbone bus

The Wishbone bus is an open standard designed for interconnecting Intellectual Property (IP) cores. It is widely used in the open source community and is the official interconnect fabric for the cores at opencores.org. In a comparison with the AMBA bus used by ARM, Wishbone is praised for its simplicity and ease of use[23]. There are several options for implementing the Wishbone bus. For example, the topology can be implemented in four different ways:

Point-to-point Connects one master to one slave.

Pipelined The IP cores are connected sequentially and thus act as both master and slave, forwarding the data.

Shared bus Connects one or several masters to one or several slaves through a common bus medium. An arbiter is needed to direct all data traffic.

Crossbar switch Similar to the shared bus, with the addition that several masters can communicate at the same time, as long as they do not try to access the same slave.

There are also two different bus cycle definitions, called classic and registered feedback. Registered feedback includes the classic cycle but adds improvements for sending data in bursts. These improvements come at the cost of a more complicated interface and three additional signals. The Amber a23 system uses a 32-bit wide classic Wishbone bus with the standard protocol and a shared bus topology. The only exception is that the reset signal RST is not used. The bus supports the classic read and write cycles, along with the simplest burst type, called "Synchronous cycle terminated burst". According to the standard, implementing another burst type, for example the "Advanced synchronous cycle terminated burst", would give a performance gain. This is also discussed in Chapter 11, Future Work.

4.5.1. Wishbone signals

Each Wishbone signal name carries a prefix or suffix indicating its direction, where I means in and O means out. An output signal such as strobe (STB) could therefore be named either O_STB or STB_O. In the Amber system the signal direction is relative to each module, so an output of the master module is called an input in the slave module and vice versa. The signals used by the Amber system are:


ADR_O Target address for the current bus cycle.

SEL_O Four-bit signal that indicates which bytes of the 32-bit data are valid in the current cycle.

WE_O Indicates whether the current cycle is a write or a read. 1 indicates a write.

DAT_I 32-bit wide data input line.

DAT_O 32-bit wide data output line.

CYC_O Asserted to the slave targeted by the transfer. If not asserted, the other signals are not valid.

STB_O Strobe line. Asserted to the slave targeted in the current bus cycle.

ACK_I Set by the slave to indicate that the STB and CYC signals have been detected. In a read cycle, the data on DAT_I must be valid at the same positive edge of the Wishbone clock at which the master samples ACK_I.

ERR_I Indicates that the slave cannot perform the requested action. No error handling is implemented in the Amber core, but the signal is still present.

Table 4.3.: Wishbone signals, direction is seen from a master perspective
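How SEL_O relates to byte and word accesses can be illustrated with a small C sketch. It assumes little-endian lane order (SEL_O bit 0 selects data bits 7:0), which is how a little-endian ARM system would normally map bytes. The function names are hypothetical and not taken from the Amber sources.

```c
#include <stdint.h>

/* Byte access (ldrb/strb): only the lane addressed by the two lowest address bits. */
static uint8_t wb_sel_for_byte(uint32_t addr)
{
    return (uint8_t)(1u << (addr & 0x3u));
}

/* Word access (ldr/str): all four byte lanes are valid. */
static uint8_t wb_sel_for_word(void)
{
    return 0xFu;
}

/* Place a byte to be written on the lane selected by its address (DAT_O). */
static uint32_t wb_dat_for_byte(uint32_t addr, uint8_t value)
{
    return (uint32_t)value << (8u * (addr & 0x3u));
}
```

For a byte store, a core may either place the byte only in its own lane, as done here, or replicate it on all four lanes. Both approaches work, because SEL_O tells the slave which lane is valid.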

4.5.2. Protocol

The handshake between master and slave is shown in Illustration 3-3 of the Wishbone standard[20], reproduced below as Figure 4.5. The master asserts CYC_O and STB_O at a positive edge of CLK_I. When the slave is ready to respond, it asserts ACK_I at a following positive clock edge. The master terminates the cycle by negating CYC_O and STB_O.

Figure 4.5.: Wishbone handshake

Figures 4.6 and 4.7 show a single read cycle and a single write cycle, respectively. The pictures are taken from the Wishbone standard[20], where they are named Illustration 3-5 and Illustration 3-7.

A read cycle is performed as follows. At the first clock edge the master asserts CYC_O and STB_O to indicate a valid transfer. It also presents an address on ADR_O and sets SEL_O accordingly. WE_O is kept low to indicate a read cycle. When the slave is ready to present data on the DAT_I lines (here at the next clock edge), it asserts ACK_I and presents the data on DAT_I.


Figure 4.6.: Wishbone single read cycle

The write cycle is similar, with the difference that WE_O is asserted to indicate a write and the data is presented on DAT_O at the first clock edge. The slave still asserts ACK_I when it is ready, which in a write cycle is in most cases at the next clock edge.

Figure 4.7.: Wishbone single write cycle
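To make the handshake concrete, here is a small cycle-level C model of a classic single read and write. The slave is a hypothetical 16-word RAM with a configurable number of wait states, and all type and function names are hypothetical. Timing is simplified so that each call to wb_ram_clock represents one positive edge of CLK_I.

```c
#include <stdbool.h>
#include <stdint.h>

/* Master outputs, named after Table 4.3. */
typedef struct {
    uint32_t adr_o;
    uint32_t dat_o;
    uint8_t  sel_o;
    bool     we_o, cyc_o, stb_o;
} wb_master;

/* Hypothetical slave: a 16-word RAM that acknowledges after `wait_states` extra edges. */
typedef struct {
    uint32_t mem[16];
    unsigned wait_states;
    unsigned waited;
} wb_ram;

/* One positive edge of CLK_I seen by the slave. Returns the ACK level and,
 * for reads, drives *dat_i. */
static bool wb_ram_clock(wb_ram *s, const wb_master *m, uint32_t *dat_i)
{
    if (!(m->cyc_o && m->stb_o)) {          /* no cycle addressed to this slave */
        s->waited = 0;
        return false;
    }
    if (s->waited < s->wait_states) {       /* slave not ready: insert a wait state */
        s->waited++;
        return false;
    }
    s->waited = 0;
    uint32_t idx = (m->adr_o >> 2) & 0xFu;  /* byte address -> word index */
    if (m->we_o) {
        for (unsigned lane = 0; lane < 4; lane++) {
            if (m->sel_o & (1u << lane)) {  /* write only the lanes enabled by SEL_O */
                uint32_t mask = 0xFFu << (8u * lane);
                s->mem[idx] = (s->mem[idx] & ~mask) | (m->dat_o & mask);
            }
        }
    } else {
        *dat_i = s->mem[idx];               /* read data valid together with ACK */
    }
    return true;
}

/* Classic single cycle: assert CYC_O and STB_O, wait for ACK_I, then negate both.
 * Returns the number of clock edges until ACK_I was sampled. */
static unsigned wb_single(wb_ram *s, uint32_t adr, bool we, uint32_t wdata,
                          uint8_t sel, uint32_t *rdata)
{
    wb_master m = { .adr_o = adr, .dat_o = wdata, .sel_o = sel,
                    .we_o = we, .cyc_o = true, .stb_o = true };
    uint32_t dat_i = 0;
    unsigned edges = 0;
    bool ack;
    do {
        ack = wb_ram_clock(s, &m, &dat_i);
        edges++;
    } while (!ack);
    if (!we && rdata)
        *rdata = dat_i;                     /* master samples DAT_I with ACK_I */
    m.cyc_o = m.stb_o = false;              /* terminate the cycle */
    (void)wb_ram_clock(s, &m, &dat_i);      /* idle edge: no request, no ACK */
    return edges;
}
```

With wait_states set to 0, the slave acknowledges at the first edge, as in the single-cycle read and write of Figures 4.6 and 4.7. A larger value models a slower slave that keeps the master waiting with CYC_O and STB_O asserted.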

Figure 4.8 shows a burst access. The picture is taken from the Wishbone standard[20], where it is named Illustration 4-2. Here the master initiates a transfer, but instead of negating STB_O after an acknowledge, it keeps STB_O asserted. This lets the master keep ownership of the bus for several word transfers, even if a master with higher priority is requesting bus access. It also speeds up the transfer, since the clock cycle that would otherwise be spent initiating a new transfer is saved for every transferred word.


Figure 4.8.: Wishbone synchronous burst cycle

4.5.3. De-multiplexer (Demux)

The Wishbone demux connects the Wishbone master (the Amber core) to the slaves (peripherals). It treats the Wishbone signals ADR_O, DAT_O and SEL_O as shared and broadcasts them to all slaves, while the other signals are directed only to the currently addressed slave. It determines which slave is addressed by comparing the address with each slave's base address, and it converts the result to a slave number as shown in Table 4.4. A schematic of the demux is shown in Figure 4.9. The Verilog file is named wishbone_arbiter.v, although the component is actually a demux. The name comes from the original Amber project, where this component also arbitrated between two Wishbone masters. The filename has been kept to preserve the reference to the original code, but for correctness the component is referred to here as a demux.

Number | Base address (hex) | Slave
--- | --- | ---
0 | 2000 | I2C
1 | NA² | Boot memory
2 | NA³ | Main memory
3 | 1600 | UART0
4 | 1700 | UART1
5 | F000 | Test module
6 | 1300 | Timer module
7 | 1400 | Interrupt controller
8 | 1800 | SPI controller
9 | 1900 | GPIO

Table 4.4.: Slave numbering in the Wishbone demultiplexer.

² Depends on the BOOT_MSB parameter, see Section 4.6.2.
³ Depends on the BOOT_MSB and MAIN_MSB parameters, see Section 4.6.3.


Figure 4.9.: Schematic of the wishbone demultiplexer
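The address decoding can be sketched in C as follows. Two assumptions are made: that the four hex digits in Table 4.4 are compared with ADR_O[31:16], and that the two memories are selected using the ranges given by equations 4.1 to 4.3. The BOOT_MSB and MAIN_MSB values below are hypothetical examples, and the Verilog demux itself is not reproduced here.

```c
#include <stdint.h>

/* Hypothetical example configuration; the real values are set in Table 5.1. */
#define BOOT_MSB 12u   /* 8 kB boot memory */
#define MAIN_MSB 16u   /* 128 kB main memory */

/* Returns the slave number of Table 4.4, or -1 if no slave matches. */
static int wb_demux_select(uint32_t adr)
{
    const uint32_t boot_size = 1u << (BOOT_MSB + 1u);
    const uint32_t main_size = 1u << (MAIN_MSB + 1u);

    switch (adr >> 16) {             /* assumed: Table 4.4 lists ADR_O[31:16] */
    case 0x2000: return 0;           /* I2C */
    case 0x1600: return 3;           /* UART0 */
    case 0x1700: return 4;           /* UART1 */
    case 0xF000: return 5;           /* Test module */
    case 0x1300: return 6;           /* Timer module */
    case 0x1400: return 7;           /* Interrupt controller */
    case 0x1800: return 8;           /* SPI controller */
    case 0x1900: return 9;           /* GPIO */
    default:     break;
    }
    if (adr < boot_size)
        return 1;                    /* Boot memory, starting at address 0 */
    if (adr < boot_size + main_size)
        return 2;                    /* Main memory, directly after the boot memory */
    return -1;
}
```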


4.6. Memory

The ARM architecture uses byte addressing, meaning that the smallest addressable unit in memory is one byte. ARMv2a also supports word access, where a chunk of four bytes is accessed at once. An example is the "str" and "strb" assembler instructions, which store a word or a single byte from a register to memory, respectively. Several different memories were supplied with the Amber project. They are listed below.

• Generic RAM with variable size, byte-wide write enable

– Used as boot and cache data memory

– Synthesizes as flip-flops

• Generic RAM with variable size, line-wide write enable

– Used as cache tag memory

– Synthesizes as ram blocks in Spartan 6

• Spartan 6 specific block ram implementations of different fixed sizes

– Used as boot and cache (data and tag) memory

– Synthesizes as ram blocks

– Sizes: 256x21, 256x32, 256x128, 512x128, 1024x128, 2048x32 and 4096x32

– Useful only on Xilinx 6 series FPGAs

• Wishbone to Spartan 6 memory controller bridge with DDR3 model

– Used as main memory along with a DDR3 model generated by Coregen

– Useful only on Spartan 6 designs

• A non-synthesizable memory model of variable size, 32 and 128 MB

– Used as main memory in simulations only

All Spartan 6 specific RAMs were removed, since the code has to be usable on all kinds of FPGAs. For the generic memories, it is desirable that they synthesize into block RAMs when these are available. Of the ones supplied, only the memory with line-wide write enable achieved this. Therefore the other memories were rewritten following the template found in Xilinx UG687 [24]. The line-wide write enable memory was also rewritten to keep a consistent coding style. These memories synthesize in any FPGA, and in a Xilinx 6 series FPGA (and most probably in others as well) they use block RAMs.


4.6.1. Cache

The system uses a unified cache, meaning that data and instructions share the cache space. Its size is configurable through the parameter A23_CACHE_WAYS and can be 2, 4, 8 or 16 kB. In the FPGA the cache is built from two different RAMs, one for the tags and one for the actual data. The cache is controlled by a coprocessor, which in turn is controlled through the registers shown in Table 4.5. These registers are accessed with the assembler instructions mcr and mrc.

Name | Access | Description
--- | --- | ---
CR0 | R | ID register
CR2 | R/W | Cache control register
CR3 | R/W | Cacheable area register
CR4 | R/W | Updateable area register
CR5 | R/W | Disruptive area register
CR6 | R | Fault status register
CR7 | R | Fault address register

Table 4.5.: Coprocessor registers. All registers are 32 bits wide.

ID register (CR0)

This register returns an ID tag of the processor. It has the layout shown in Table 4.6.

Bit | 31:24 | 23:16 | 15:8 | 7:0
--- | --- | --- | --- | ---
Name | Company ID | Manufacturer ID | Part type | Revision
Value (hex) | 41 | 56 | 03 | 00

Table 4.6.: Layout of coprocessor register CR0

Cache control register (CR2)

This register is used to enable and disable the cache memory. Setting bit 0 enables the cache, and clearing it disables the cache. The other bits are unused.

Cacheable area register (CR3)

This register defines which regions of the boot and main memory can be cached. Every bit represents a 2 MB region, where bit 0 represents the lowest 2 MB.

Updateable area register (CR4)

This register marks 2 MB regions as read only, with bit 0 representing the lowest 2 MB. Writes to a read-only region are ignored.


Disruptive area (CR5)

Writing to areas marked by this register flushes the cache. Bit 0 represents the lowest 2MB.

Fault status register (CR6)

If a cache miss occurs, a Fault status can be read from this register.

Fault address register (CR7)

If a cache miss occurs the faulty address is stored in this register.
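Since CR3, CR4 and CR5 all use the same 2 MB granularity, the mapping from an address to its bit can be sketched in C as follows. The helper names are hypothetical, and reading the register value with mrc is not shown.

```c
#include <stdbool.h>
#include <stdint.h>

/* Bit n of CR3, CR4 or CR5 covers the region [n * 2 MB, (n + 1) * 2 MB). */
static unsigned region_bit(uint32_t addr)
{
    return addr >> 21;                       /* 2 MB = 2^21 bytes */
}

/* True if the 2 MB region containing addr is marked in the area register value. */
static bool region_marked(uint32_t area_reg, uint32_t addr)
{
    unsigned bit = region_bit(addr);
    return bit < 32u && ((area_reg >> bit) & 1u);
}
```

With 32 bits of 2 MB each, one register covers 64 MB, which matches the 26-bit address space of an ARMv2a core.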

4.6.2. Boot memory

The boot memory uses a per-byte write enable and is variable in size. The size is changed with the parameter BOOT_MSB, and the resulting size in bytes can be calculated using equation 4.1.

$$\text{Size (bytes)} = 2^{\mathrm{BOOT\_MSB}+1} \tag{4.1}$$

The boot address space starts at address 0, and the highest address is found by subtracting 1 from the result of equation 4.1.

Originally the Amber system loaded the boot memory content in the test bench. At synthesis, the content of the specified block RAM component was loaded through the makefile. Since all Xilinx-specific code was removed, this is no longer possible. Instead, the Verilog system task $readmemh is used, as shown below. It loads content into the boot memory in both simulation and synthesis, using a file generated by the amber-elfsplitter-memcontents tool described in Chapter 6.

initial
begin
    $readmemh("boot_mem_contents.data", mem, 0, 2**(ADDRESS_WIDTH-2)-1);
end

The task takes as arguments a file containing the data, the array to be loaded and the index boundaries of the array. The file "boot_mem_contents.data" is extracted from an ELF⁴ file using the tool amber-elfsplitter-memcontents, described in Section 6.3.2, and contains only data values.
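$readmemh reads a text file of whitespace-separated hexadecimal values and stores them in consecutive array elements, starting at the given start index. To illustrate what such a file could look like, the C sketch below formats an array of 32-bit words with one eight-digit hex value per line. It is not the amber-elfsplitter-memcontents tool, whose exact output is not reproduced here, and the function name is hypothetical.

```c
#include <inttypes.h>
#include <stddef.h>
#include <stdint.h>
#include <stdio.h>

/* Formats 32-bit words as $readmemh-compatible text, one 8-digit hex value per line.
 * Returns the number of characters written (excluding the terminating NUL),
 * or -1 if the buffer is too small. */
static int format_readmemh(const uint32_t *words, size_t count, char *buf, size_t buflen)
{
    size_t used = 0;
    if (buflen > 0)
        buf[0] = '\0';
    for (size_t i = 0; i < count; i++) {
        int n = snprintf(buf + used, buflen - used, "%08" PRIx32 "\n", words[i]);
        if (n < 0 || (size_t)n >= buflen - used)
            return -1;                       /* output would be truncated */
        used += (size_t)n;
    }
    return (int)used;
}
```

Writing the resulting text to boot_mem_contents.data would give a file that the initial block above could load, provided the number of words does not exceed the memory depth.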

4.6.3. Main

The main memory is implemented as a 32-bit wide array whose depth is varied by changing the parameter MAIN_MSB. The size (in bytes) of the memory can be calculated with equation 4.2.

$$\text{Size (bytes)} = 2^{\mathrm{MAIN\_MSB}+1} \tag{4.2}$$

⁴ An ELF file is the executable produced when a program is compiled.

The main memory address space starts where the boot memory ends, which means that the lowest address is given by equation 4.1. The highest address is obtained by adding the size to this base address and subtracting one, as shown in equation 4.3.

$$\text{Address}_{\max} = 2^{\mathrm{BOOT\_MSB}+1} + 2^{\mathrm{MAIN\_MSB}+1} - 1 \tag{4.3}$$
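As a worked example of equations 4.1 to 4.3, take the illustrative values BOOT_MSB = 12 and MAIN_MSB = 16. These are not necessarily the values used in this system, although BOOT_MSB = 12 gives the 8 kB boot memory listed for the Amber project.

```latex
\begin{aligned}
\text{Boot size} &= 2^{12+1} = 8192\ \text{bytes}\ (8\ \text{kB}), \text{ addresses } \texttt{0x0000} \text{ to } \texttt{0x1FFF}\\
\text{Main size} &= 2^{16+1} = 131072\ \text{bytes}\ (128\ \text{kB}), \text{ starting at } \texttt{0x2000}\\
\text{Address}_{\max} &= 8192 + 131072 - 1 = 139263 = \texttt{0x21FFF}
\end{aligned}
```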

In the original memory controllers there was a signal called i_mem_ctrl which, when set, wrapped the memory address at bit 24. Its purpose was to simulate a 32 MB memory even when the physical memory was bigger, like the 128 MB RAM on the SP605 development board. It has been left out of the current implementation, since the size is variable through the MAIN_MSB parameter.

4.6.4. Flash memory

There is no controller for a flash memory in the system. In those cases where it is needed,a serial flash memory can be directly connected to the SPI controller as a slave.


4.7. I2C

I2C is a very common serial communication protocol. In this section the protocol is first described, followed by an introduction to the core that was used in this system.

4.7.1. I2C protocol

I2C uses two signals for communication:

SCL Serial Clock

SDA Serial Data

SCL is a clock line that determines the speed of the transfer. It is controlled by the master, but the slave can force it low to pause the transfer temporarily (this is called clock stretching). SDA is the data line, and its control is shared between the master and the slave. To share a common line, there have to be tri-state buffers at both ends along with an output enable (oe) signal, as shown in Figure 4.10.

Figure 4.10.: Tri-state buffers on SDA and SCL.
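The effect of these tri-state buffers can be modelled as a wired-AND: any device that enables its output driver pulls the line low, and the line is high only when every device has released it. This is also how a slave stretches the clock, by keeping its SCL driver enabled. The C sketch below is a hypothetical software model, not part of the system; it assumes each driver outputs a constant 0 when enabled (open-drain style) and that a pull-up resistor holds the line high otherwise.

```c
#include <stdbool.h>
#include <stddef.h>

/* Resolve the level of an open-drain line (SDA or SCL).
   oe[i] is true when device i enables its output buffer. The driven value
   is assumed to be 0, so any enabled driver pulls the line low. With no
   driver enabled, the pull-up resistor makes the line high. */
bool open_drain_level(const bool *oe, size_t n_devices)
{
    for (size_t i = 0; i < n_devices; i++) {
        if (oe[i])
            return false;  /* at least one device pulls the line low */
    }
    return true;           /* every device has released the line */
}
```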

A typical I2C transfer is shown in Figure 4.11 below, taken from the I2C controller specification [25].

Figure 4.11.: I2C transfer.

Figure 4.11 step by step:

1. The master generates a start condition (the SDA line is pulled low before SCL). All slaves start listening.

2. The master sends the address to the desired slave.

3. The master tri-states SDA, and the addressed slave confirms that it detected the "call" by pulling SDA low. This is called an ACK.

4. The master sends the data.

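5. The slave responds in the acknowledge clock cycle after the data. In the transfer shown, SDA is left high, which is called a NACK (not acknowledge); pulling SDA low would instead acknowledge the byte, as in step 3.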

6. The master generates a stop condition (SCL is pulled high before SDA).
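Step 2 sends the slave address as the first byte after the start condition. With 7-bit addressing, the address occupies the upper seven bits and the least significant bit selects the direction (0 = write, 1 = read). Bits are sent MSB first, and the ninth clock is the acknowledge slot. A minimal C sketch of that encoding follows; the function names are hypothetical, and the test bench slave address 7'b0010000 (0x10, Section 4.15.3) serves as an example.

```c
#include <stdint.h>
#include <stdbool.h>

/* First byte of a 7-bit addressed I2C transfer: address in bits 7:1,
   R/W flag in bit 0 (1 = read from the slave, 0 = write to the slave). */
uint8_t i2c_addr_byte(uint8_t addr7, bool read)
{
    return (uint8_t)(((addr7 & 0x7Fu) << 1) | (read ? 1u : 0u));
}

/* Level of SDA during data clock cycle k (0..7) of a byte; MSB first. */
bool i2c_sda_bit(uint8_t byte, unsigned k)
{
    return ((byte >> (7u - k)) & 1u) != 0;
}
```

For the test bench slave, a write transfer starts with the byte 0x20 and a read transfer with 0x21.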

4.7.2. I2C controller

The I2C controller used in this project was written by Richard Herveille and published on OpenCores in 2001 [26]. The version used in this system was uploaded to OpenCores on 6 June 2010. Some of the key features taken from the specification are:

• Multi master operation

• Clock stretching and wait generation

• Supports 7 and 10 bit addressing mode

• Arbitration lost interrupt

• 8 bit Wishbone interface

Since the Wishbone interface is only 8 bits wide, the controller only supports byte access. To avoid unpredictable behavior caused by undefined values, only the Least Significant Byte (LSB) of the output signal DAT_O is connected to the controller, while the other bytes are set to zero in system.v. All other data signals are likewise connected to the controller through their LSB. The core is configured and controlled by the registers shown in Table 4.7.

Name Address Access Description

PRERlo 0x20000000 R/W Low byte of clock prescaler

PRERhi 0x20000004 R/W High byte of clock prescaler

CTR 0x20000008 R/W Control register

TXR 0x2000000C W Transmit register

RXR 0x2000000C R Receive register

CR 0x20000010 W Command register

SR 0x20000010 R Status register

Table 4.7.: I2C registers. All registers are 8 bits wide

For information on how to set up the registers, see the I2C controller specification [25].
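As an illustration of the register set in Table 4.7, the C sketch below computes the clock prescaler and performs a minimal initialisation and start sequence. The prescaler formula (prescale = f_clk / (5 · f_SCL) − 1), the bit positions in CTR and CR, and the rule that the prescaler should only be changed while the core is disabled are taken from the I2C controller specification [25] as we understand it; check them against the revision in use. The reg_write_fn hook and the function names are hypothetical stand-ins for volatile stores on the target.

```c
#include <stdint.h>

/* Register addresses from Table 4.7 (one 8-bit register per 32-bit word). */
#define I2C_PRERLO 0x20000000u
#define I2C_PRERHI 0x20000004u
#define I2C_CTR    0x20000008u
#define I2C_TXR    0x2000000Cu
#define I2C_CR     0x20000010u

/* Bit positions as read from the I2C controller specification [25];
   verify against the core revision in use. */
#define I2C_CTR_EN 0x80u  /* CTR bit 7: core enable */
#define I2C_CR_STA 0x80u  /* CR bit 7: generate start condition */
#define I2C_CR_WR  0x10u  /* CR bit 4: write TXR to the slave */

/* Hypothetical register-write hook; on the target a volatile store. */
typedef void (*reg_write_fn)(uint32_t addr, uint8_t value);

/* Prescaler value: f_clk / (5 * f_scl) - 1. */
uint16_t i2c_prescale(uint32_t clk_hz, uint32_t scl_hz)
{
    return (uint16_t)(clk_hz / (5u * scl_hz) - 1u);
}

void i2c_init(reg_write_fn wr, uint32_t clk_hz, uint32_t scl_hz)
{
    uint16_t pre = i2c_prescale(clk_hz, scl_hz);
    wr(I2C_CTR, 0u);                        /* disable before changing the prescaler */
    wr(I2C_PRERLO, (uint8_t)(pre & 0xFFu));
    wr(I2C_PRERHI, (uint8_t)(pre >> 8));
    wr(I2C_CTR, I2C_CTR_EN);                /* enable the core */
}

/* Start condition followed by the address byte with R/W = 0 (write). */
void i2c_start_write(reg_write_fn wr, uint8_t addr7)
{
    wr(I2C_TXR, (uint8_t)((addr7 & 0x7Fu) << 1));
    wr(I2C_CR, I2C_CR_STA | I2C_CR_WR);
}
```

For example, a 32 MHz clock (a hypothetical value) and a 100 kHz SCL give a prescaler of 63, so PRERlo = 0x3F and PRERhi = 0x00.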

4.8. SPI

SPI is a full-duplex serial protocol used in a wide variety of applications, ranging from small sensors to transfers of large amounts of data. This section first describes the SPI protocol and then introduces the core used in the system.

4.8.1. SPI protocol

SPI uses four signals for communication:

SS Slave Select. Used to select the slave the master currently wants to address. This eliminates the need to send an address as in I2C, at the cost of some extra hardware.

SCK Serial Clock. This is controlled completely by the master and sets the speed of the transfer.

MISO Master In Slave Out. Data line from slave to master.

MOSI Master Out Slave In. Data line from master to slave.

There are four different modes of SPI communication, called 0, 1, 2 and 3, as shown in Table 4.8. The parameters that determine the modes are:

CPOL Level of SCK in idle state. CPOL = 0 means SCK = low.

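CPHA Phase of SCK, i.e. which edge of SCK the data is sampled on. CPHA = 0 means sampling on the leading (first) edge, which is the rising edge when CPOL = 0 and the falling edge when CPOL = 1. CPHA = 1 means sampling on the trailing edge.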

Mode CPOL CPHA

0 0 0

1 0 1

2 1 0

3 1 1

Table 4.8.: SPI modes
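The relation in Table 4.8 can be expressed compactly: the mode number is 2 · CPOL + CPHA, and the sampling edge follows from both bits (leading edge when CPHA = 0, trailing edge when CPHA = 1, where the leading edge is rising if CPOL = 0 and falling if CPOL = 1). A small C sketch of that decoding; the type and function names are hypothetical.

```c
#include <stdbool.h>

typedef struct {
    bool cpol;           /* idle level of SCK: false = low, true = high */
    bool cpha;           /* false = sample on leading edge, true = trailing edge */
    bool sample_rising;  /* true if data is sampled on the rising edge of SCK */
} spi_mode_info;

/* Decode SPI mode 0..3 as listed in Table 4.8. */
spi_mode_info spi_mode_decode(unsigned mode)
{
    spi_mode_info m;
    m.cpol = (mode >> 1) & 1u;
    m.cpha = mode & 1u;
    /* Leading edge with idle low (CPOL = 0, CPHA = 0) or trailing edge with
       idle high (CPOL = 1, CPHA = 1) are both rising edges. */
    m.sample_rising = (m.cpol == m.cpha);
    return m;
}
```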

A timing diagram of a mode 0 transfer is shown below in Figure 4.12, taken from the SPI controller specification [27].

Figure 4.12.: SPI transfer timing diagram.

4.8.2. SPI controller

The SPI core used in this system was written by Simon Srot and published on OpenCores in 2002 [28]. The version used in this project was uploaded on 10 March 2009. Some of the key features of the core are:

• Full duplex

• Variable length of transfer word up to 128 bits

• MSB or LSB first data transfer

• Supports modes 0 and 1

• Eight slave select lines

• 32 bit Wishbone slave interface

The core is configured and controlled by the registers shown in Table 4.9. For information on how to set up the registers for a specific configuration, see the SPI core specification [27]. However, two points deserve special attention. The first is bit 8 of the control register, called GO_BSY in the specification. Setting this bit starts the transfer, so all registers, including the rest of the control register, must be set up before it is set. Therefore, two writes to the control register are needed to start a transfer. The second is that the receive and transmit registers are implemented in the same flip-flops, so a write to a transmit register during a transfer overwrites the received data in the corresponding receive register.

Name Address Access Description

RX0 0x18000000 R Receive register bits [31:0]

RX1 0x18000004 R Receive register bits [63:32]

RX2 0x18000008 R Receive register bits [95:64]

RX3 0x1800000c R Receive register bits [127:96]

TX0 0x18000000 R/W Transmit register bits [31:0]

TX1 0x18000004 R/W Transmit register bits [63:32]

TX2 0x18000008 R/W Transmit register bits [95:64]

TX3 0x1800000c R/W Transmit register bits [127:96]

CTRL 0x18000010 R/W Control and status register

DIVIDER 0x18000014 R/W SCK divider value.

SS 0x18000018 R/W Slave select register

Table 4.9.: SPI core registers. All registers are 32 bits wide.
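The two-write start sequence described above can be sketched in C as follows. The GO_BSY position (bit 8) comes from the text and the addresses from Table 4.9. The CHAR_LEN field in bits 6:0 (where 0 encodes 128 bits) and the relation f_SCK = f_clk / (2 · (DIVIDER + 1)) are taken from the SPI core specification [27] as we understand it and should be verified there. The reg_write_fn and reg_read_fn hooks and all function names are hypothetical stand-ins for volatile 32-bit accesses.

```c
#include <stdint.h>

/* Register addresses from Table 4.9 (RX0 and TX0 share an address). */
#define SPI_RX0      0x18000000u
#define SPI_TX0      0x18000000u
#define SPI_CTRL     0x18000010u
#define SPI_DIVIDER  0x18000014u
#define SPI_SS       0x18000018u

#define SPI_CTRL_GO_BSY (1u << 8)  /* starts the transfer, reads 1 while busy */

/* Hypothetical register access hooks; volatile loads/stores on the target. */
typedef void (*reg_write_fn)(uint32_t addr, uint32_t value);
typedef uint32_t (*reg_read_fn)(uint32_t addr);

/* Control word without GO_BSY; CHAR_LEN in bits 6:0 (0 encodes 128 bits). */
uint32_t spi_ctrl_word(unsigned char_len)
{
    return (uint32_t)(char_len & 0x7Fu);
}

/* DIVIDER value, assuming f_sck = f_clk / (2 * (DIVIDER + 1)). */
uint32_t spi_divider(uint32_t clk_hz, uint32_t sck_hz)
{
    return clk_hz / (2u * sck_hz) - 1u;
}

void spi_set_clock(reg_write_fn wr, uint32_t clk_hz, uint32_t sck_hz)
{
    wr(SPI_DIVIDER, spi_divider(clk_hz, sck_hz));
}

/* One 8-bit full-duplex transfer to slave 0. */
uint32_t spi_transfer8(reg_write_fn wr, reg_read_fn rd, uint8_t tx)
{
    uint32_t ctrl = spi_ctrl_word(8u);
    wr(SPI_TX0, tx);                        /* data first */
    wr(SPI_SS, 1u);                         /* select slave 0 */
    wr(SPI_CTRL, ctrl);                     /* first write: configuration only */
    wr(SPI_CTRL, ctrl | SPI_CTRL_GO_BSY);   /* second write: start the transfer */
    while (rd(SPI_CTRL) & SPI_CTRL_GO_BSY)
        ;                                   /* wait until the core clears GO_BSY */
    return rd(SPI_RX0) & 0xFFu;             /* received byte in the shared flip-flops */
}
```

Writing TX0 only before GO_BSY is set, and reading RX0 only after GO_BSY has cleared, also avoids the shared-register overwrite described above.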

4.9. UART

UART is one of the most common serial protocols for interfacing between different systems. Its simplicity makes it ideal for sending commands and instructions from a computer or terminal to a smaller system such as this one. It is also used to convert parallel data transmissions to serial, or to interface with RS-232 and RS-485 drivers. The Amber project already had two UART controllers implemented, and both were kept. One of them is used by the included boot loader to communicate with a computer and initiate program downloads.

4.9.1. UART protocol

UART is a point-to-point link and can be used in simplex, half-duplex or full-duplex mode. The transmission speed, called the baud rate, is configured separately at both ends, so no clock signal is needed, and in simplex mode a single data line is enough. In half-duplex mode the data line is shared, and an additional signal controls the direction of the data, as shown in Figure 4.13. The direction signal is controlled by one of the UART controllers and is usually called Request To Send (RTS) or Clear To Send (CTS).

Figure 4.13.: UART in half duplex mode.

For full duplex there are naturally two data lines, and in the smallest configuration this is all that is needed. However, since UART controllers usually use small First In First Out (FIFO) buffers, they often implement two more signals, RTS and CTS. These tell the other end whether the receive buffer is full, so that it can pause the transmission. The RTS signal from one controller is wired to the CTS input of the other, as shown in Figure 4.14.

Figure 4.14.: UART in full duplex mode with RTS and CTS.

A UART transfer is somewhat flexible. It always starts with a start bit, but the data that follows can range from 5 to 9 bits. If the data is 8 bits or fewer, it can be followed by an optional parity bit, and the transmission ends with one or two stop bits. The start bit is low, the data is always sent LSB first, and the stop bits are high. A schematic of this is shown in Figure 4.15.

Figure 4.15.: A UART transfer.
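To make the frame layout concrete, the C sketch below builds the sequence of line levels for one frame in the 8N1 format used by the controller in Section 4.9.2: one low start bit, eight data bits LSB first, no parity and one high stop bit. It is an illustrative model only, and the function name is hypothetical.

```c
#include <stdint.h>
#include <stddef.h>

/* Fill bits[0..9] with the line levels (0 or 1) of one 8N1 UART frame.
   Returns the number of bit periods in the frame. */
size_t uart_frame_8n1(uint8_t data, uint8_t bits[10])
{
    size_t n = 0;
    bits[n++] = 0;                               /* start bit: low */
    for (unsigned i = 0; i < 8u; i++)
        bits[n++] = (uint8_t)((data >> i) & 1u); /* data bits, LSB first */
    bits[n++] = 1;                               /* stop bit: high */
    return n;
}
```

For the character 'A' (0x41), the line carries 0, then 1 0 0 0 0 0 1 0, then 1.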

4.9.2. UART controller

The controller was included in the Amber project, and its main features are:

• Fixed setting of:

– 8 data bits

– No parity bit

– 1 stop bit

• Hardware configurable baudrate, synchronous with system clock

• 1 byte buffer or 16 byte FIFO, enabled in software

• Transmit and receive interrupts

Having a fixed configuration and baud rate makes the controller quite inflexible, but very small in terms of FPGA utilization. Also, because the UART runs synchronously with the system clock, the baud rate is not exact. However, according to the author of the controller [29], the error is well within the 10% offset allowed by the standard. Figure 4.16 shows the structure of the UART controller. The data register is drawn with dotted lines since it is not actually a register, just a common address for the transmit and receive FIFOs. Two UARTs are instantiated in the system, called UART0 and UART1. They are distinguished by the upper 16 bits of the address, which are 0x1600 for UART0 and 0x1700 for UART1. These bits are shown as XXXX in Table 4.10, which lists the UART configuration registers.

Figure 4.16.: Structural schematic of the UART controller.
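The size of the baud rate error can be estimated with a short calculation. The sketch below assumes, as a simplification that may not match the Amber controller exactly, that each bit period is a whole number of system clock cycles obtained by rounding f_clk / baud. The function name and the clock and baud values in the usage note are hypothetical examples.

```c
#include <stdint.h>

/* Relative baud rate error in percent when the bit period must be a whole
   number of system clock cycles (rounded to the nearest cycle). */
double uart_baud_error_percent(uint32_t clk_hz, uint32_t baud)
{
    uint32_t cycles_per_bit = (clk_hz + baud / 2u) / baud;  /* rounded */
    double actual = (double)clk_hz / (double)cycles_per_bit;
    return (actual - (double)baud) / (double)baud * 100.0;
}
```

With a 40 MHz clock and 115200 baud, each bit lasts 347 cycles and the error is about 0.06 %, far below the tolerance quoted above.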

Interrupts

The transmit and receive interrupts share the same output. Thus, when an interrupt occurs, the interrupt status register must be read to determine which kind of interrupt occurred.

Receive interrupt If the FIFO is enabled, the receive interrupt triggers when there are 8 or more bytes in the FIFO. It can therefore be cleared by reading bytes through the data register until fewer than 8 bytes remain. If the FIFO is disabled, the interrupt triggers when a byte is ready in the receive buffer and is reset when the byte is read.

Transmit interrupt If the FIFO is enabled, the transmit interrupt triggers when there are 8 or fewer bytes left in the FIFO. It can therefore be cleared by pushing bytes through the data register until there are more than 8 bytes in the FIFO. If the FIFO is disabled, the interrupt triggers when the transmit buffer is empty and is reset when a byte is pushed to it. The transmit interrupt can also be cleared by a write to the interrupt clear register.

Configuration registers

Name Address Access Description

PID0 0xXXXX0fe0 R Constant value of 0x00000010

PID1 0xXXXX0fe4 R Constant value of 0x00000010

PID2 0xXXXX0fe8 R Constant value of 0x00000004

PID3 0xXXXX0fec R Constant value of 0x00000000

CID0 0xXXXX0ff0 R Constant value of 0x0000000d

CID1 0xXXXX0ff4 R Constant value of 0x000000f0

CID2 0xXXXX0ff8 R Constant value of 0x00000005

CID3 0xXXXX0ffc R Constant value of 0x000000b1

DR 0xXXXX0000 R/W Data register

RSR 0xXXXX0004 R/W Receive Status Register

LCRH 0xXXXX0008 R/W Line Control Register High

LCRM 0xXXXX000c R/W Line Control Register Middle

LCRL 0xXXXX0010 R/W Line Control Register Low

CR 0xXXXX0014 R/W Control Register

FR 0xXXXX0018 R Flag Register

IIR 0xXXXX001c R Interrupt status register

ICR 0xXXXX001c W Interrupt Clear Register

Table 4.10.: UART core registers. All registers are 8 bits wide.

DR, Data register

A write to this register pushes a byte into the transmit FIFO if the FIFO is enabled; otherwise it puts the byte directly in the 1-byte transmit buffer. A read returns either the oldest byte in the receive FIFO or the byte in the receive buffer. The controller initiates a transmission as soon as there is data in the FIFO or transmit buffer and the receiving UART signals that it is ready by pulling the CTS input low. Thus, a write to this register implicitly starts a transmission.

RSR, Receive status register

Not used, but initialised to zero. A write to this register stores whatever value was written, and a read returns that value, or zeroes if no write has occurred.

Line Control Registers

The Line Control Register consists of three bytes, called H (High), M (Middle) and L (Low). Of these, only bit 4 of the high byte is used; setting it enables the FIFOs.

Control Register

In this register bit 4 is used to enable the receive interrupt and bit 5 to enable thetransmit interrupt (high = enabled). The other bits are unused.

FR (Flag Register)

The 8 bits of the flag register are used as shown in Table 4.11.

Bit 7 6 5 4 3 2 1 0

Name TxE RxF TxF RxE Busy Not used Not used CTS

Table 4.11.: Flag register bits. Bits 2 and 1 are always high.

TxE Transmit FIFO empty. This bit is high when no data is present in the FIFO or the buffer is empty.

RxF Receive FIFO full. This bit is high when the FIFO is full or there is data in the buffer.

TxF Transmit FIFO full. This bit is high when the FIFO is full or there is data in the buffer.

RxE Receive FIFO empty. This bit is high when no data is present in the FIFO or the buffer is empty.

Busy UART busy flag. This bit is high when there is data in the buffer or FIFO.

CTS Clear To Send. This bit is high when the device the UART is communicating with is ready to receive data.

IIR Interrupt status register

This register is used to read the status of the interrupts. Bit 2 is the transmit interruptstatus and bit 1 the receive interrupt status (high means interrupt active).

ICR Interrupt clear register

A write with any data to this register clears the transmit interrupt.
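Putting the register descriptions together, a polled transmit and receive routine and an interrupt-source check could look like the C sketch below. Register offsets and bit positions come from Tables 4.10 and 4.11 and the IIR description. The reg_read_fn and reg_write_fn hooks and all function names are hypothetical stand-ins for volatile accesses on the target.

```c
#include <stdint.h>
#include <stdbool.h>

#define UART_DR   0x00u       /* data register */
#define UART_FR   0x18u       /* flag register */
#define UART_IIR  0x1Cu       /* interrupt status (read) */
#define UART_ICR  0x1Cu       /* interrupt clear (write) */

#define UART_FR_TXF  (1u << 5)  /* transmit FIFO/buffer full */
#define UART_FR_RXE  (1u << 4)  /* receive FIFO/buffer empty */
#define UART_IIR_TX  (1u << 2)  /* transmit interrupt active */
#define UART_IIR_RX  (1u << 1)  /* receive interrupt active */

typedef uint32_t (*reg_read_fn)(uint32_t addr);
typedef void (*reg_write_fn)(uint32_t addr, uint32_t value);

/* Upper 16 address bits: 0x1600 for UART0, 0x1700 for UART1. */
uint32_t uart_reg(unsigned unit, uint32_t offset)
{
    return ((0x1600u + (unit & 1u) * 0x100u) << 16) | offset;
}

void uart_putc(reg_read_fn rd, reg_write_fn wr, unsigned unit, uint8_t c)
{
    while (rd(uart_reg(unit, UART_FR)) & UART_FR_TXF)
        ;                                 /* wait for room */
    wr(uart_reg(unit, UART_DR), c);       /* the write starts the transmission */
}

/* Returns the received byte, or -1 if nothing has been received. */
int uart_getc_nonblocking(reg_read_fn rd, unsigned unit)
{
    if (rd(uart_reg(unit, UART_FR)) & UART_FR_RXE)
        return -1;
    return (int)(rd(uart_reg(unit, UART_DR)) & 0xFFu);
}

bool uart_irq_is_tx(uint32_t iir) { return (iir & UART_IIR_TX) != 0; }
bool uart_irq_is_rx(uint32_t iir) { return (iir & UART_IIR_RX) != 0; }

void uart_clear_tx_irq(reg_write_fn wr, unsigned unit)
{
    wr(uart_reg(unit, UART_ICR), 0u);     /* any data clears the transmit interrupt */
}
```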

4.10. Ethernet MAC

The Ethernet Media Access Control (MAC) controller provided in the Amber project was removed. The decision was based on the fact that it represented around 25% of the system's FPGA utilization, while implementing a function that Syntronic did not foresee any need for in this system. If this function is needed in the future, the work required to reinsert the controller is not overwhelming; see Section 5.2 for further information. Before doing that, though, one should also consider using an external controller, since that would simplify the implementation and save a considerable amount of FPGA resources.

4.11. GPIO

GPIOs are exactly what the name states: a set of pins that can be individually configured in software to act as either inputs or outputs. They are very useful for reading buttons or driving LEDs, but can also be used to implement communication protocols such as the I2C and SPI protocols presented earlier.

4.11.1. GPIO controller

The GPIO controller used in this project was written by Richard Herveille and uploaded to OpenCores in 2002 [30]. The version used in this system was uploaded on 10 March 2009. The original version supported 8 GPIO pins and had an 8-bit Wishbone interface, so more pins required instantiating several components. Since that solution would be cumbersome here, we modified the controller to support up to 32 GPIO pins and a 32-bit Wishbone interface in a single instance. The number of usable pins is configured with the GPIO_PINS parameter. Since a pin can be used as both input and output, it has to use a tri-state buffer, as shown in Figure 4.17. The registers CTRL, WRITE and READ are explained below.

Figure 4.17.: GPIO pin tri-state buffer connection.

Configuration registers

The GPIO pins are controlled through two register addresses called CTRL and LINE, where LINE actually points to two different internal registers. Their size depends on the GPIO_PINS parameter, and every bit in the registers represents a pin. Due to an issue when accessing the LINE register from a C program, it has been split into two addresses in register_addresses.h, called WRITE and READ. The software registers are presented in Table 4.12. More details about the access issue are given in Section 9.5.

Name Address Access Description

CTRL 0x19000000 R/W Control the direction of the pins

WRITE 0x19000004 R/W Write output pin values, points to the LINE register

READ 0x19000014 R/W Read input pin values, points to the LINE register

Table 4.12.: GPIO core registers.

CTRL This register controls whether each pin is used as an output or an input. Setting a bit in this register to 1 makes the corresponding pin an output.

WRITE This register is used to set the values of the output pins. Nothing in the hardware prevents a read access, but reading input pins through it may return faulty values, so use the READ register instead. Writing to an input pin has no effect.

READ This register is used to read the values of the input pins. It also reports the state of the output pins, since they share a hardware register.
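The behaviour of the three registers can be summarised in a small host-side C model: for each pin, a 1 in CTRL makes the pin an output driven from the WRITE value, otherwise the pin follows the external signal, and READ returns the resolved level of every pin. This is an illustrative model, not the controller's Verilog; the function names are hypothetical, and n_pins plays the role of the GPIO_PINS parameter.

```c
#include <stdint.h>

/* Mask with the lowest n_pins bits set (n_pins = GPIO_PINS, 1..32). */
uint32_t gpio_mask(unsigned n_pins)
{
    return n_pins >= 32u ? 0xFFFFFFFFu : ((UINT32_C(1) << n_pins) - 1u);
}

/* Value seen in READ: output pins report the WRITE value, input pins the
   external level. WRITE bits for input pins are ignored. */
uint32_t gpio_read_value(uint32_t ctrl, uint32_t write, uint32_t external,
                         unsigned n_pins)
{
    uint32_t level = (ctrl & write) | (~ctrl & external);
    return level & gpio_mask(n_pins);
}
```

With CTRL = 0x0F on an 8-pin configuration, writing 0x05 or 0xF5 gives the same READ value, since the upper four bits belong to input pins.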

4.12. Timers

The timer core was included in the Amber project. Three timers are available, configurable through the registers shown in Table 4.13. The timers are identical and have the following features:

• Individual interrupts

• Either periodic or one-shot

• Three different prescalers

4.12.1. Registers

Name Address Access Description

TIMER0 LOAD 0x13000000 R/W Value set to timer 0

TIMER0 VALUE 0x13000004 R Current value of timer 0

TIMER0 CTRL 0x13000008 R/W Timer 0 control register

TIMER0 CLR 0x1300000c W Timer 0 interrupt clear

TIMER1 LOAD 0x13000010 R/W Value set to timer 1

TIMER1 VALUE 0x13000014 R Current value of timer 1

TIMER1 CTRL 0x13000018 R/W Timer 1 control register

TIMER1 CLR 0x1300001c W Timer 1 interrupt clear

TIMER2 LOAD 0x13000020 R/W Value set to timer 2

TIMER2 VALUE 0x13000024 R Current value of timer 2

TIMER2 CTRL 0x13000028 R/W Timer 2 control register

TIMER2 CLR 0x1300002c W Timer 2 interrupt clear

Table 4.13.: Timer core registers.

LOAD

The LOAD register holds the value that the timer counts down from. The register is two bytes wide, so the largest value that can be loaded is 0xFFFF. The value is kept in the timer until it is overwritten.

VALUE

This register holds the current value of the timer and can be read. A write to thisregister has no effect.

CTRL

The control register is 8 bits wide, and its bits are used as shown in Table 4.14.

Bit 7 6 5:4 3:2 1:0

Name Enable Mode Not used Prescaler Not used

Table 4.14.: Control register bits. Unused bits are always low.

Enable When set the timer is enabled.

Mode This bit controls whether the timer operates in one-shot or periodic mode. Periodic mode is selected when the bit is high and works as expected: when the timer has counted down from the LOAD value, an interrupt is fired and the timer reloads the LOAD value and restarts counting. One-shot mode, however, does not behave as one might expect. Instead of disabling the timer when it has counted down, it loads 0xFFFF and starts counting again. To avoid this, the IRQ handler has to disable the timer at the first interrupt.

Prescaler The prescaler bits determine how many system clock ticks make up one timer count, as shown in Table 4.15.

CTRL[3:2] PS value

00 1

01 16

10 256

11 Not used

Table 4.15.: Timer prescaler value

CLR

Writing any value to this register clears the timer interrupt.

4.12.2. Setting up a timer

A timer is set up by first writing a value to the LOAD register and then writing the control register with the enable bit set and the desired prescaler and mode. The time in seconds that the timer counts is given by equation 4.4, where LOAD is the value in the LOAD register, PS value is the prescaler value from Table 4.15 and Freq is the system clock frequency.

$$\text{Time (s)} = \frac{\text{LOAD} \cdot \text{PS value}}{\text{Freq (Hz)}} \qquad (4.4)$$

When the time expires the timer will fire an interrupt and restart.
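Equation 4.4 can be inverted to find the LOAD value for a wanted period, and the CTRL byte can be assembled from Table 4.14 (Enable in bit 7, Mode in bit 6, prescaler select in bits 3:2). The C sketch below does both and rejects results that do not fit in the 16-bit LOAD register. The function names are hypothetical, and the 40 MHz clock in the usage note is only an example.

```c
#include <stdint.h>
#include <stdbool.h>

#define TIMER_CTRL_ENABLE    (1u << 7)
#define TIMER_CTRL_PERIODIC  (1u << 6)

/* Prescaler values from Table 4.15, indexed by CTRL[3:2]. */
static const uint32_t timer_ps_value[3] = { 1u, 16u, 256u };

/* CTRL byte; ps_sel must be 0, 1 or 2 (3 is not used). */
uint8_t timer_ctrl_word(bool enable, bool periodic, unsigned ps_sel)
{
    uint8_t ctrl = (uint8_t)((ps_sel & 3u) << 2);
    if (enable)
        ctrl |= TIMER_CTRL_ENABLE;
    if (periodic)
        ctrl |= TIMER_CTRL_PERIODIC;
    return ctrl;
}

/* LOAD = Time * Freq / PS value (equation 4.4 solved for LOAD), with the
   time given in microseconds. Returns -1 if the value does not fit in 16 bits. */
int32_t timer_load_for_us(uint32_t period_us, uint32_t clk_hz, unsigned ps_sel)
{
    if (ps_sel > 2u)
        return -1;
    uint64_t load = (uint64_t)period_us * clk_hz
                    / (UINT64_C(1000000) * timer_ps_value[ps_sel]);
    if (load == 0u || load > 0xFFFFu)
        return -1;
    return (int32_t)load;
}
```

For a 1 ms periodic tick at 40 MHz with prescaler 16, LOAD is 2500 and CTRL is 0xC4; as described above, LOAD is written first and CTRL second. For one-shot use, the IRQ handler would write CTRL again with the enable bit cleared.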

4.13. Interrupt controller

The interrupt controller was included in the Amber project and combines all hardware and software interrupts into two interrupt request signals, IRQ and F(ast)IRQ, which are fed into the processor core. The IRQ signal is generated by combining the hardware and software interrupts with two enable masks. The FIRQ signal only contains the hardware interrupts and has two enable masks of its own. The relation between the interrupts and masks is shown in Figure 4.18 for the IRQ signal and in Figure 4.19 for the FIRQ signal. These figures also show the interrupt requests from the test module described in Section 4.14. They are not maskable and their status cannot be read, so they are not discussed further in this section.

Figure 4.18.: Interrupt vectors and masks.

Figure 4.19.: Fast interrupt vectors and masks.

4.13.1. Registers

The registers in Table 4.17 handle the different vectors and masks shown in Figures 4.18 and 4.19 above. The vectors that include the software interrupt are laid out as in Table 4.16; in the others, the software interrupt bit is unused.

Bit 31:10 9 8 7 6 5 4:3 2 1 0

Name NA SPI I2C Timer2 Timer1 Timer0 NA UART1 UART0 SW

Table 4.16.: Interrupt vector outline. The unused bits (NA) are initialized to zero.

Name Address Access Description

IRQ0 STATUS 0x14000000 R IRQ0 status

IRQ0 RAWSTAT 0x14000004 R IRQ0 hw IRQ status

IRQ0 ENABLESET 0x14000008 R/W IRQ0 mask enable

IRQ0 ENABLECLR 0x1400000c W IRQ0 mask disable

INT SOFTSET 0 0x14000010 R/W Set software interrupt 0

INT SOFTCLEAR 0 0x14000014 R/W Clear software interrupt 0

FIRQ0 STATUS 0x14000020 R FIRQ0 status

FIRQ0 RAWSTAT 0x14000024 R FIRQ0 hw IRQ status

FIRQ0 ENABLESET 0x14000028 R/W FIRQ0 mask enable

FIRQ0 ENABLECLR 0x1400002c W FIRQ0 mask disable

IRQ1 STATUS 0x14000040 R IRQ1 status

IRQ1 RAWSTAT 0x14000044 R IRQ1 hw IRQ status

IRQ1 ENABLESET 0x14000048 R/W IRQ1 mask enable

IRQ1 ENABLECLR 0x1400004c W IRQ1 mask disable

INT SOFTSET 1 0x14000050 R/W Set software interrupt 1

INT SOFTCLEAR 1 0x14000054 R/W Clear software interrupt 1

FIRQ1 STATUS 0x14000060 R FIRQ1 status

FIRQ1 RAWSTAT 0x14000064 R FIRQ1 hw IRQ status

FIRQ1 ENABLESET 0x14000068 R/W FIRQ1 mask enable

FIRQ1 ENABLECLR 0x1400006c W FIRQ1 mask disable

INT SOFTSET 2 0x14000090 None Defined but unused

INT SOFTCLEAR 2 0x14000094 None Defined but unused

INT SOFTSET 3 0x140000d0 None Defined but unused

INT SOFTCLEAR 3 0x140000d4 None Defined but unused

Table 4.17.: Interrupt controller registers.

Table 4.17 above contains six types of registers: STATUS, RAWSTAT, ENABLESET, ENABLECLR, SOFTSET and SOFTCLEAR. They are divided among the six interrupt types IRQ0, IRQ1, FIRQ0, FIRQ1, SOFT0 and SOFT1. The registers have exactly the same function for their respective interrupts and masks, so only a general description of them follows.

STATUS Reading from this register returns the value of the masked interrupt vector.

RAWSTAT Reading from this register returns the value of the (unmasked) hardware interrupt vector.

ENABLESET Writing 1 to bits in this register enables the corresponding interrupts. Reading from it returns the current enable vector.

ENABLECLR Writing 1 to bits in this register disables the corresponding interrupts.

SOFTSET Writing 1 to bit zero of this register sets the corresponding software interrupt. Reading from it returns the software interrupt status.

SOFTCLEAR Writing 1 to bit zero of this register clears the corresponding software interrupt. Reading from it returns the software interrupt status.
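A typical use of these registers is to enable a few sources in IRQ0 and, in the handler, read IRQ0 STATUS to find which enabled sources are pending. The C sketch below uses the bit positions of Table 4.16 and the addresses of Table 4.17, and assumes ENABLESET and ENABLECLR only affect the bits written as 1. The reg_write_fn hook and the function names are hypothetical; the pending condition itself still has to be cleared in the peripheral (for example through the timer CLR register), and servicing the lowest bit first is just one possible software policy.

```c
#include <stdint.h>

/* Bit positions from Table 4.16. */
#define IRQ_SW      (1u << 0)
#define IRQ_UART0   (1u << 1)
#define IRQ_UART1   (1u << 2)
#define IRQ_TIMER0  (1u << 5)
#define IRQ_TIMER1  (1u << 6)
#define IRQ_TIMER2  (1u << 7)
#define IRQ_I2C     (1u << 8)
#define IRQ_SPI     (1u << 9)

/* Addresses from Table 4.17. */
#define IRQ0_STATUS     0x14000000u
#define IRQ0_ENABLESET  0x14000008u
#define IRQ0_ENABLECLR  0x1400000cu

typedef void (*reg_write_fn)(uint32_t addr, uint32_t value);

/* Assumed set-only semantics: bits written as 0 are left unchanged. */
void irq0_enable(reg_write_fn wr, uint32_t sources)  { wr(IRQ0_ENABLESET, sources); }
void irq0_disable(reg_write_fn wr, uint32_t sources) { wr(IRQ0_ENABLECLR, sources); }

/* Index of the lowest pending bit in a STATUS value, or -1 if none. */
int irq_lowest_pending(uint32_t status)
{
    for (int bit = 0; bit < 32; bit++) {
        if (status & (UINT32_C(1) << bit))
            return bit;
    }
    return -1;
}
```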

4.14. Test module

This module was included in the Amber project and is used to interface with the Verilog test bench: it tests the interrupt functionality, controls the test bench UART and provides a set of random numbers to the system. It is controlled by the registers shown in Table 4.18.

Name Address Access Description

STATUS 0xf0000000 R/W Test status register

FIRQ TIMER 0xf0000004 R/W FIRQ interrupt test register

IRQ TIMER 0xf0000008 R/W IRQ interrupt test register

UART CONTROL 0xf0000010 R/W Controls the test bench UART

UART STATUS 0xf0000014 R Test bench UART status register

UART TXD 0xf0000018 R/W Test bench UART data feed

SIM CTRL 0xf000001c R Simulation register

MEM CTRL 0xf0000020 R/W Not used

CYCLES 0xf0000024 R Counts system ticks

LED 0xf0000028 R/W Control LEDs on SP605 board

PHY RST 0xf000002c R/W Not used

RANDOM NUM 0xf0000100 R/W Provides a random number

RANDOM NUM00 0xf0000100 R/W Provides a random number

RANDOM NUM01 0xf0000104 R/W Provides a random number

RANDOM NUM02 0xf0000108 R/W Provides a random number

RANDOM NUM03 0xf000010c R/W Provides a random number

RANDOM NUM04 0xf0000110 R/W Provides a random number

RANDOM NUM05 0xf0000114 R/W Provides a random number

RANDOM NUM06 0xf0000118 R/W Provides a random number

RANDOM NUM07 0xf000011c R/W Provides a random number

RANDOM NUM08 0xf0000120 R/W Provides a random number

RANDOM NUM09 0xf0000124 R/W Provides a random number

RANDOM NUM10 0xf0000128 R/W Provides a random number

RANDOM NUM11 0xf000012c R/W Provides a random number

RANDOM NUM12 0xf0000130 R/W Provides a random number

RANDOM NUM13 0xf0000134 R/W Provides a random number

RANDOM NUM14 0xf0000138 R/W Provides a random number

RANDOM NUM15 0xf000013c R/W Provides a random number

Table 4.18.: Test module registers.

STATUS Used to terminate tests in simulation through the Verilog test bench. A write to this register with the value 32'd17 terminates the test and generates a test pass message. A write with any other data generates a test fail. If the data is equal to or greater than 32'h8000, the message is "Failed 'testname' - with error 0x'data'"; otherwise it is "Failed 'testname' - with error on line 'data'". A read returns the value written to the register, or zeroes. This is useful in hardware, but only if the testpass() or testfail() functions are not used, since they put the processor in an infinite loop.

FIRQ TIMER and IRQ TIMER Used to test the interrupt functionality of the processor core. When writing to these registers, only the least significant byte is used, and the data has the following effect:

8'h00 Clears the interrupt

8'h01 Sets the interrupt

8'h02 to 8'hff Initiate a countdown that decreases every system clock tick. When it reaches 8'h01 it fires the interrupt and stops.

Note that these interrupts are not shown in any of the interrupt controller vectors. However, reading from this register returns the interrupt timer value and can therefore be used to check whether an interrupt has been set from here.

UART CONTROL This register controls the UART interface in the Verilog test bench. Only the two lowest bits are used, with the following effect:

Bit 0 When set it enables transmission in the test bench UART.

Bit 1 When set the test bench UART is in loopback mode.

UART STATUS Returns the status of the test bench UART. Only bits 1 and 0 are used, with the following meaning:

Bit 0 High if the UART transmit FIFO is empty.

Bit 1 High if the UART transmit FIFO is full.

UART TXD This register is used to push a byte into the test bench UART's transmit FIFO when it is not in loopback mode. If the FIFO is full, the byte is discarded and a warning message is generated (in simulation).

SIM CTRL This register is used by software to determine whether it runs in simulation or in hardware. If the register is zero the code is running on the FPGA; otherwise it is a simulation. This register is controlled by the run.sh script and a define in the code.

MEM CTRL This register no longer has any effect. Its purpose was to wrap addresses going to the main memory, a feature used by some of the tests.

CYCLES This 32 bit register stores the number of system ticks since startup.

LED This register is used to control the LEDs on the SP605 development board, using bits 3:0.

PHY RST This register is no longer used. Its purpose was to reset the Ethernet controller on the SP605 development board.

RANDOM NUM registers These are a set of one-byte-wide registers containing a random number. A new random number is retrieved by reading the LSB of any of these registers. Writing to any of these registers gives the generator a new seed.
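The rules for the STATUS register can be restated as a small C model of the test bench's decision, which may help when choosing failure codes in test programs. It only mirrors the description of STATUS above; the enum, function and macro names are hypothetical, and the exact message text printed by the Verilog test bench is not reproduced. One consequence worth noting: a program that reports failures with its source line number (via the standard __LINE__ macro) would have a failure on line 17 read as a pass.

```c
#include <stdint.h>

typedef enum {
    TB_PASS,       /* 32'd17: test passed */
    TB_FAIL_CODE,  /* >= 32'h8000: reported as "with error 0x<data>" */
    TB_FAIL_LINE   /* otherwise: reported as "with error on line <data>" */
} tb_result;

tb_result tb_status_result(uint32_t data)
{
    if (data == 17u)
        return TB_PASS;
    if (data >= 0x8000u)
        return TB_FAIL_CODE;
    return TB_FAIL_LINE;
}

/* Hypothetical firmware-side helper: report a failure at the current source
   line through the test module STATUS register (Table 4.18). Beware that a
   failure reported from line 17 is indistinguishable from a pass. */
#define TB_STATUS_ADDR 0xf0000000u
typedef void (*reg_write_fn)(uint32_t addr, uint32_t value);
#define TB_FAIL_HERE(wr) (wr)(TB_STATUS_ADDR, (uint32_t)__LINE__)
```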

4.15. Verilog test bench

The test bench is the top-level entity when running simulations and was included in the Amber project. It instantiates the whole system along with slave modules for the following functions:

• UART

• SPI

• I2C

• GPIO

Additionally, the test bench generates a 200 MHz clock signal, loads the main memory with content and reads the STATUS register of the test module described in Section 4.14 to terminate tests. When a test is terminated, a message is printed that shows the current system status and a Passed or Failed message, as shown below. In the following sections the slave modules are described in more detail.

----------------------------------------------------------------------------
Amber Core   User        FIRQ        IRQ         > SVC
r0           0x20000010
r1           0x000000c1
r2           0x00000002
r3           0x00000080
r4           0xdeadbeef
r5           0x000000a5
r6           0x0000005a
r7           0xdeadbeef
r8           0xdeadbeef  0xdeadbeef
r9           0xdeadbeef  0xdeadbeef
r10          0x00000011  0xdeadbeef
r11          0xf0000000  0xdeadbeef
r12          0xdeadbeef  0xdeadbeef
r13          0xdeadbeef  0xdeadbeef  0xdeadbeef  0xdeadbeef
r14 (lr)     0xdeadbeef  0xdeadbeef  0xdeadbeef  0xdeadbeef
r15 (pc)     0x00000268

Status Bits: N=0, Z=0, C=1, V=0, IRQ Mask 1, FIRQ Mask 1, Mode = Supervisor
----------------------------------------------------------------------------
++++++++++++++++++++
Passed i2c 12364 ticks
++++++++++++++++++++
Stopped at time : 309577500 ps : File "/home/emanuel/workspace/amber SoC/trunk/hw/vlog/tb/tb.v" Line 462

4.15.1. UART

This UART controller has two modes: loopback and transmission. In loopback mode, a received byte is put in the transmission buffer and sent back. In transmission mode, it uses a 16-byte transmission buffer that can be filled with data using the TXD register. The registers are controlled from the test module described in Section 4.14.

4.15.2. SPI

The SPI slave model is a simple loopback model that was included in the SPI project[28], where it was part of that system's test bench. The only modification made was to have it read the CTRL register in order to automatically set the same mode as the Amber SPI controller.

4.15.3. I2C

This I2C slave was included in the I2C project, where it was part of its test bench. It is interfaced as a real I2C device with address 7'b0010000 and contains 16 registers with both read and write access. Since both the slave and the actual controller tri-state SDA and SCL, these lines are pulled up in the test bench top level with the Verilog "pullup" primitive.

4.15.4. GPIO

There was no suitable test model included in the GPIO project[], so a simple loopback model was written. It divides the GPIO signals into two equally sized halves, and the lower half of the signals is mirrored on the upper half.


5. Configuration

This chapter presents the parameters for configuring the system, as well as the steps that were taken to remove and add modules to the system.

5.1. Parameters

The parameters for configuring the system are presented in Table 5.1.

| Parameter | Location | Description |
| --- | --- | --- |
| A23_CACHE_WAYS | a23_config_defines.v | Defines the size of the cache. The size can be either 2, 4, 8 or 16 kB. |
| A23_RAM_REGISTER_BANK | a23_config_defines.v | If set, the register bank is implemented in a RAM block, otherwise in flip-flops. |
| AMBER_CLOCK_DIVIDER | system_config_defines.v | The PLL output is divided by this value to get the system clock. |
| AMBER_UART_BAUD | system_config_defines.v | Specifies the baud rate for both UARTs. |
| BOOT_MSB | memory_configuration.v | Specifies the size of the boot memory. |
| GPIO_PINS | system_config_defines.v | The number of available GPIO pins. Any number from 1 to 32 is valid. |
| MAIN_MSB | memory_configuration.v | Specifies the size of the main memory. |
| SPI_DIVIDER_LEN | spi_defines.v | Sets the bit length of the SPI clock divider. |
| SPI_MAX_CHAR | spi_defines.v | Sets the maximum transmission data block size. |
| SPI_SS_NB | spi_defines.v | Sets the number of slave select signals. |
| WB_SLAVES | system.v | Sets the number of slaves on the Wishbone bus. |

Table 5.1.: Parameters to configure the system


5.2. Adding or removing a peripheral

This section contains a general description of the modifications that were made when adding the SPI, GPIO and I2C controllers. The modifications for removing a peripheral are similar. The files that need to be changed are:

• amber_isim.prj

• amber_registers.h

• global_defines.v

• interrupt_controller.v

• memory_configuration.v

• registry_defines.v

• system.v

• tb.v

• wishbone_arbiter.v

• xs6_source_files.prj

amber_isim.prj

All HDL files needed for simulation are listed here, so all new files should be added, including any test bench files.

amber_registers.h

This header file contains defines for all registers in the system. It should be kept synchronized with register_addresses.v.

global_defines.v

Here a reference to the peripheral is defined. It is not required for the peripheral to work, but it is a convenient way to access signals and components during simulation.

interrupt_controller.v

The interrupts are defined in a 32-bit vector called raw_interrupts. If the new controller has an interrupt output, it should be assigned a bit in this vector and wired to a new input.


memory_configuration.v

The Wishbone demux uses a function (called in_controllername) defined in this file to identify the currently addressed slave. The peripherals' base addresses are also defined in this file.

registry_defines.v

All registers that are accessible through the Wishbone bus should be defined here.

system.v

In this file the following changes have to be made:

• Increase the WB_SLAVES parameter

• Add any new inputs and outputs

• Instantiate the component

• Modify instantiation of the interrupt controller if an interrupt signal is used

• Modify the Wishbone demux instantiation

tb.v

If any inputs or outputs are added to system.v, they also have to be added to the system instantiation. Also instantiate any slave module or test bench needed for testing the peripheral.

wishbone_arbiter.v

In this file the following has to be changed:

• Add inputs and outputs for the peripheral's Wishbone interface

• Add the peripheral to the slave arbitration (assignment to the signal current_slave)

• Add assignments to the peripheral's Wishbone signals

xs6_source_files.prj

All HDL files needed for synthesis should be added to this file. Note that test bench files should not be listed here.


6. Tools

The Amber project includes Linux scripts for running simulations and makefiles for both synthesis and software compilation. They are all introduced in this chapter, along with the compiler, simulator and synthesis software they depend on.

6.1. Xilinx ISE 14.5

Both the simulation script and the synthesis makefile use the Xilinx ISE development suite. It is available in a basic version called WebPACK[31] that is free and contains all the functionality necessary for this project. Since the simulations and synthesis are run from scripts, the built-in Graphical User Interface (GUI) and project manager are not used.

6.1.1. Synthesis

The makefile for synthesis uses the command line design flow described in the Xilinx Command Line Tools User Guide[32] and the XST User Guide[33]. Its overall procedure is shown in Figure 6.1. Before the synthesis step is entered, the serial boot loader software is compiled and the .data file used to load the boot memory is placed in the work folder. The inputs to the synthesis step are a project file (file extension .prj) containing a list of all Verilog source files, and a text file containing a seed from which the placement algorithms generate their starting point. In step two, called the NGDBuild step, a user constraints file (UCF) is also used. This file specifies which pins on the FPGA to use as well as timing constraints on different nets. More information on the UCF and constraints can be found in the Xilinx constraints documentation[34]. The result of the design process is a bitfile that can be downloaded to the target FPGA.


Figure 6.1.: Xilinx ISE design flow.


Synthesis, XST

This application takes the HDL source files and converts them into a list of logic circuits and their connections. The resulting file is called a netlist (file extension .ngc) and is the first step towards programming the FPGA. XST accepts a large number of options, and the ones used can be viewed in the makefile. ISE also includes a netlist viewer called the RTL Viewer, which displays all the circuits and their connections as a schematic diagram. It can be used as a step in verifying the source code.

NGDBuild

This application uses the netlist file created by XST and the UCF to generate a Native Generic Database (NGD) file. The ngd file essentially contains the same information as the netlist file, except that the logic circuits have been transformed into FPGA primitives such as AND gates, OR gates, flip-flops and look-up tables (LUTs).

MAP

In the mapping step the primitives in the ngd file are mapped onto specific places in the targeted FPGA. The file that describes the physical placement is called a Native Circuit Description (NCD) file. Since at this point it only describes the placement of the primitives and not the connections, the output is an intermediate file called map.ncd. It is also possible to do the placement in the next step, but that is not done in this case.

Place and Route, PAR

In this step the connections between the primitives are routed in the FPGA, and the result is output as an ncd file. This is a very complicated process, since there is often a huge number of connections to make and each of them can be routed in many different ways.

Timing analysis, trce

This is the final check of the circuit. It checks the design against the timing constraints given in the user constraints file. If timing fails, the place and route program is rerun in an effort to fix the errors, and a new timing analysis is done. This is iterated until the design passes the test or a maximum number of iterations is reached. If place and route fails to produce an error-free routing, MAP can be rerun with another seed to get a new placement that allows other routes to be made.

BitGen

BitGen, the bitstream generator, takes the ncd file and converts it into a bitstream that the Xilinx programming tool iMPACT can use to program the FPGA.


6.1.2. Simulation

The simulation script uses the Xilinx ISIM simulator, which is included in the ISE design suite, to run a behavioural simulation. ISIM supports the use of a test bench and viewing signals in a configurable waveform viewer. It also supports the Tool Command Language (TCL) for controlling the simulator.

The script invokes the gcc-arm compiler found in the CodeSourcery compiler suite, discussed in Section 6.2, to compile the software used as stimulus for the simulation. A test named "test_my_SoC" is run from the terminal with the following command:

user@computer:/../amber_dir/trunk/hw/isim$ ./run.sh [-options] test_my_SoC

The output from running a test called i2c.S is included in Appendix A. The script can be used with a set of options described in Table 6.1.

| Option | Description |
| --- | --- |
| -h | Bring up a help message |
| -g | Launch the simulation in the ISIM graphical interface |
| -w | Used with the -g option to specify a wave configuration file (wcfg) located in the wcfg directory |
| -b | Specify the size of the boot memory. It is needed to generate the .data¹ file properly. Defaults to 8192 bytes. |

Table 6.1.: Simulation script options

When the script is invoked, it follows the execution order shown in Figure 6.2. The individual actions are explained below.

¹ Boot memory contents file, see Section 4.6.2


Figure 6.2.: Simulation script organization


Checking test type

Three different types of tests are supported. They are described in Table 6.2 below. If the test does not exist, the script exits at this point with the error message "Test 'testName' not found".

| Test type | Location | Description |
| --- | --- | --- |
| Hardware | hw/tests/ | Test written in assembler. Used to test specific hardware functions of the system. |
| Software | sw/testName/ | Test written in C. This can either be a stand-alone application or use the boot loader, which jumps to address 0x8000. |
| vmlinux | sw/vmlinux | Boots a Linux kernel. This is an extensive test that is used to further verify the correctness of the core. It requires the MAIN_MSB parameter to be 24, i.e. 32 MB. |

Table 6.2.: Test types supported by the simulation script

Generate boot .data file

As described in Section 4.6.2, the boot memory is loaded with data from a file called boot_mem_contents.data. This step generates that file for simulations, using the tool amber-elfsplitter-memcontents described in Section 6.3.2. For the file to be correct, the boot memory size needs to be specified correctly.

Launch FUSE

FUSE is part of the ISIM simulator. It is used to generate an executable file from a Verilog design specified in a project file with file extension .prj. Note that this is not the same project file as the one used for synthesis, since the simulation project also contains the Verilog test bench.

Launch ISIM

ISIM is launched by running the executable created by FUSE. It can be run with or without a GUI. If the GUI is used, a wave configuration file can be specified. Such a file configures the wave viewer to show a specific set of signals. This is very useful when debugging, since the signals do not have to be added manually every time. The file should be located in the "wcfg" subdirectory of the "isim" folder.

6.1.3. Bulk simulation

The Amber project also includes a script that runs several assembler tests automatically. The script is called all.sh and is invoked with the following command:


user@computer:/../amber_dir/trunk/hw/isim$ ./all.sh [-b xxxx]

It contains a list of test names and uses the run.sh script to run them all sequentially. The -b option is used to pass a boot memory size to the run.sh script. It can be omitted if the boot memory size is 8192 bytes; otherwise it has to be specified.

6.1.4. Debug switches

A set of debug switches was present in the Amber project and has been kept. The switches are located in the files a23_config_defines.v and system_config_defines.v. Enabling them prints debug messages in the terminal when running simulations. The exception is the AMBER_WISHBONE_DEBUG parameter, which instead adds jitter to the Wishbone interface.

6.2. Sourcery CodeBench for ARM processors

The software used in the system is compiled with the GNU Compiler Collection (gcc), used as a cross compiler. It is available in a ready-made package from Mentor Graphics called Sourcery CodeBench Lite Edition[35]. Downloading it requires registration. It is completely free, but no support is included. The following options have to be given to the compiler:

-march=armv2a Use instructions for the ARMv2a architecture.

-mno-thumb-interwork Do not generate ARM/Thumb interworking code; only ARM instructions are used.

The following option also has to be given to the linker:

--fix-v4bx Rewrites every "bx rM" instruction as "mov pc, rM" (for example, "bx lr" becomes "mov pc, lr").

The reason is that the Amber core supports neither the Thumb instruction set nor the "bx" instruction. This information was found in the Amber user guide[21].

6.3. Amber specific tools

A number of tools are included in the Amber project, and the ones used in this thesis are presented here. One of them was slightly modified for use with the new boot memory, and the modified version was named amber-elfsplitter-memcontents. Some tools are not presented here, such as tools for inspecting disassembled files and tools for generating different memory content files. These may prove useful in the future but, as mentioned above, they have not been used in this thesis. They are located in "$AMBER_BASE/sw/tools/".


6.3.1. amber-elfsplitter

This tool analyses an ELF file (.elf) produced by the linker and generates a .mem file used to initialize specific RAM blocks with content. Since such memories are no longer used, this tool is only used to load the main memory content in the Verilog test bench, which still uses the old initialization scheme.

6.3.2. amber-elfsplitter-memcontents

This tool is a modified version of the amber-elfsplitter described above. It still takes the ELF file as input, but instead outputs a file with content for use with the Verilog $readmemh system task. For that task to work properly, the whole memory array needs to be filled, so the array is padded with zeros after all valid data. This has the positive side effect that the boot memory contains no uninitialized memory locations. To know how many zeros to add, the tool takes an additional argument that specifies the size of the boot memory in bytes. A typical usage with the ELF file test.elf and a memory size of 8 kB is shown below.

amber-elfsplitter-memcontents test.elf 8192 > mem_contents.data

6.3.3. check_mem_size

This tool is used by the boot loader makefile to ensure that the compiled boot loader program fits in the boot memory.


6.4. Installation

In addition to running each software package's installer, one has to configure the environment variables shown in Table 6.3. These are most conveniently set in the .bashrc file or similar.

| Variable | Description |
| --- | --- |
| AMBER_BASE | Absolute path to the trunk of the Amber project |
| AMBER_CROSSTOOL | Name of the gcc compiler, i.e. the prefix added to each GNU tool |
| XILINX | Path to the Xilinx libraries |
| PATH | Both Sourcery CodeBench and Xilinx ISE should be available through the system variable PATH |

Table 6.3.: Required environment variables

A snapshot of the .bashrc file used for this thesis is presented below. It was adapted from an example in the Amber user guide[21].

# Change /proj/amber to where you saved the amber package on your system
export AMBER_BASE=/home/emanuel/workspace/amber_SoC/trunk
# Change /opt/Sourcery to where the package is installed on your system
PATH=/opt/sourcery/bin:${PATH}
# Also need to add Xilinx ISE directory to PATH
PATH=/opt/Xilinx/14.5/ISE_DS/ISE/bin/lin64:${PATH}
# AMBER_CROSSTOOL is the name added to the start of each GNU tool in
# the Code Sourcery bin directory.
# This variable is used in various makefiles to set
# the correct tool to compile code for the Amber core
export AMBER_CROSSTOOL=arm-none-linux-gnueabi
# Xilinx ISE installation directory
# This should be configured for you when you install ISE.
# But check that it has the correct value
# It is used in the run script to locate the Xilinx library elements.
export XILINX=/opt/Xilinx/14.5/ISE_DS/ISE

Additionally, the scripts in "$AMBER_BASE/hw/isim" and "$AMBER_BASE/hw/fpga/bin" must be given permission to execute. This is most easily done by running chmod in each of these directories as follows:

user@computer:/directory$ chmod +x *.sh


7. Testing

The system is tested mainly with three types of tests: assembler, C and Linux tests. All three are discussed in this chapter. Section 7.1, which covers the assembler tests, also shows waveforms of the SPI, I2C, UART and GPIO pins.

7.1. Assembler tests

An extensive suite of hardware tests written in assembler was included in the Amber project. The tests are listed in Table 2 of the Amber user guide[21]. In addition to these, three more tests were written to test the SPI, I2C and GPIO controllers. Since the Ethernet MAC was removed, the tests concerning it were also removed. It is worth noting that the test "addr_ex" requires a main memory of 128 MB to complete successfully, and some other tests require 32 MB. This has no real practical influence, so these tests have been left as they are.

7.1.1. SPI test (spi.S)

The SPI hardware test performs the following actions:

1. Set up controller with the following settings:

• Mode 1

• 40MHz SCK

• 64 bit transfer

• Interrupt enabled

• Send to (testbench) slave 0

2. Start transfer

3. Wait for interrupt

4. Verify the GO_BSY bit is 0

5. Verify loopback of the first transferred word

Figure 7.1 shows the transfer of the first word. Since the slave module's buffer is initialized to zero, the first received word contains only zeros. Figure 7.2 shows the second word transfer, where the first word is looped back. The two figures overlap slightly.


Figure 7.1.: SPI transfer of first word.

Figure 7.2.: SPI transfer of second word.

7.1.2. I2C test (i2c.S)

Since the I2C controller only uses eight bits of the Wishbone data bus, the I2C test uses both word and byte accesses. This verifies that the padding of the 8-bit interface was done correctly. The test performs the following actions:

1. Set a prescaler that gives a 365 kHz SCL at a 40 MHz system clock

2. Enable the core

3. Send data 0xa5 to register 1 of the slave with address 0x10 (address byte 0x20, including the write bit)

4. Send data 0x5a to register 2 using auto-increment

5. Read back data and verify it

6. Write to an invalid register and check NACK

Figures 7.3 to 7.13 show SDA and SCL during the complete sequence. The I2C slave model uses the 7-bit address 0010000 (0x10).

Figure 7.3.: Start condition and sending slave address plus write bit (0x20)


Figure 7.4.: Sending register address 0x01

Figure 7.5.: Sending data 0xa5

Figure 7.6.: Sending data 0x5a and a stop condition

Figure 7.7.: Start condition and sending slave address plus write bit (0x20)

Figure 7.8.: Sending register address 0x01

Figure 7.9.: Start condition and sending slave address plus read bit (0x21)


Figure 7.10.: Reading data 0xa5

Figure 7.11.: Reading data 0x5a and stop condition

Figure 7.12.: Start condition and sending slave address plus write bit (0x20)

Figure 7.13.: Sending invalid register address 0x10 and receiving NACK


7.1.3. UART test (uart_tx.S)

The UART transmission test performs the following actions:

1. Enable loopback in the test UART

2. Send message ”Hi!”, encoded in ASCII

3. Send a space character to flush the loopback buffer

Figures 7.14 to 7.17 show the UART communication lines during the test.

Figure 7.14.: Send character ”H”

Figure 7.15.: Send character ”i”, receive character ”H”

Figure 7.16.: Send character ”!”, receive character ”i”


Figure 7.17.: Send character ” ”, receive character ”!”

7.1.4. GPIO (gpio.S)

The GPIO test uses 16 pins and performs the following actions:

1. Configure pins [16:9] as inputs and pins [8:1] as outputs

2. Set the outputs to 0xDA

3. Check the mirrored inputs

4. Set the outputs to 0xBE

5. Check the mirrored inputs

In Figure 7.18 and Figure 7.19 the GPIO pin values are shown.

Figure 7.18.: Pins [8:1] are 0xDA, mirrored on pins [16:9]


Figure 7.19.: Pins [8:1] are 0xBE, mirrored on pins [16:9]

7.2. C tests

The test software written in C is located in the $AMBER_BASE/sw/ directory, where each test has its own folder. The folder must have the same name as the test software file and must contain at least the files presented in Table 7.1.

| Filename | Description |
| --- | --- |
| Makefile | A makefile that specifies the files and dependencies and any extra variables, and then calls a makefile common to all C tests, called common.mk. |
| sections.lds | Linker file that specifies the different memory sections of the test program. |
| start.S | Assembler start routine. Contains exception and interrupt handlers, stack initialization, etc. Used when the program is run without the boot loader. |
| test.c | The actual test program. |

Table 7.1.: Files required for a C test

7.2.1. Libraries

The Amber project includes a library called mini-libc. Among other things, it contains malloc and a version of printf that uses the UART0 controller. These can be used instead of the standard "stdio.h" library, since mini-libc runs stand-alone on this system. To use it, the parameter "USE_MINI_LIBC" must be defined in the makefile and set to "1". For an example, see the Hello World sample program discussed in Section 7.2.4. The mini-libc library also contains functions, called testpass() and testfail(), for terminating the tests in the same manner as the assembler tests. For cases where mini-libc is not wanted but these functions are still needed, they were extracted into a new library called "test_func". The common makefile was modified so that the test_func library is used if the USE_MINI_LIBC parameter is undefined.

7.2.2. boot-loader-serial

This launches the boot loader that is also used in hardware. The boot loader displays a help message and then the test is terminated.

7.2.3. dhry

This test launches the Dhrystone benchmark program, version 2.1. The accuracy of the test result is unknown, but differences can be seen when various hardware changes are made. For example, when the main memory was changed, an increase in performance of 0.03 DMIPS was observed. This is most likely because the new memory needs one Wishbone cycle less per access than the one it replaced.

7.2.4. hello-world

The Amber project includes a simple "Hello-World" sample program that prints the message "Hello, World!" using the printf function in mini-libc.

7.2.5. spi-timer

This sample program was written to show how the timers and interrupts are set up. It uses a timer to periodically send a message using the SPI controller. The message contains a number that tells how many messages have been sent. The test uses the UART0 controller to print a "$" when an SPI transfer starts and a "!" on every SPI interrupt, followed by the content of the received message.

7.3. Linux test

This test boots a precompiled Linux 2.4 kernel. It is set up for 32 MB of main memory, so the parameter MAIN_MSB must be set to 24 before launching this test. When the kernel has booted, it prints a "hello world" message as shown in Appendix B.


8. Result

The Verilog code, along with this document, is the final product of this thesis. The final SoC not only fulfils the specification in Chapter 3, but is also written in a generic manner that should require minimal effort to synthesize on an arbitrary FPGA. This is important since there might be specific customer demands on the hardware in a future product.

The SoC is also very general in terms of peripheral functionality. Most sensors today use either SPI or I2C, and where something else is used, the GPIO pins can almost certainly be utilized. The GPIO pins can also be used to interface with other FPGA components such as Digital Signal Processors, memories or other project-specific components.

In terms of performance, the finished SoC's 0.78 DMIPS/MHz is not far from the maximum performance of the ARM7 family, 0.9 DMIPS/MHz[36]. At the same clock frequency, this corresponds to roughly 87% of the ARM7's peak throughput. The ARM7 architecture has, for example, been used in the LEGO Mindstorms NXT brick, which has been used in a huge variety of applications, including several control theory projects at Uppsala University[37][38]. In these projects, data from several sensors is collected and processed. Since the ARM7 in the NXT runs at 48 MHz, this indicates that the a23 processor should be powerful enough to handle similar situations.


9. Conclusion

The goal of this thesis was to specify and implement an ARM-based SoC in an FPGA. In the end, no hardware was available for testing the final system. However, simulations show that the SoC executes both assembler and C code correctly and that the peripheral interfaces behave as expected. As further verification, the Verilog code was synthesized for a Spartan-6 LX45T. Timing reports from the synthesis tools indicate that the synthesis was successful and that the code is ready for hardware testing.

9.1. Specification

The ARM Cortex-M1 processor core and two open source SoCs were compared, and one of the SoCs, Amber, was selected because of its advantages in cost and licensing. A specification for the whole system was also set, including communication controllers able to interface with most sensors on today's market.

9.2. Implementation

9.2.1. Target independence

The complete design was written in Verilog HDL, but several functions such as memories, an adder and the multiplication unit were instantiated as Xilinx-specific FPGA blocks. Using these blocks is usually desirable, but not by forcing them through explicit instantiation in the code. Instead, it is common to write generic code that the synthesis tool translates into FPGA blocks when they are available. This makes it much easier to migrate the design to different FPGAs. To achieve this, the block instantiations were replaced by generic code that can be synthesized for an FPGA from any vendor with a tool chain that supports Verilog. For the adder and the multiplication unit, good generic replacement code was already included in the project. Generic variants of the memories were also supplied, but they either did not use the RAM blocks at synthesis or, like the main memory, were not synthesizable at all due to their size. They were instead replaced by new code that is written to suit Xilinx RAM blocks but will synthesize in any FPGA. The functionality of all the replaced code was verified with the included tests.

9.2.2. Peripheral integration

Three peripheral projects were integrated into the system to add SPI, I2C and GPIO functionality. The original Amber project was studied so that the peripherals could be integrated while maintaining the project's code structure as much as possible. All peripheral projects had a Wishbone bus interface, but in some cases the interfaces had to be modified slightly. Since an Ethernet controller was not desired, the Ethernet MAC included in the Amber project was removed. Besides not providing a desired function, it accounted for almost a third of the system's utilized FPGA area. Removing it therefore made the routing job much easier for the synthesis tool.

9.3. Documentation

The Amber project included several scripts and makefiles that use the Xilinx ISE and CodeSourcery CodeBench Lite programs. The scripts and makefiles were studied, and their function is presented along with the installation procedure and workflow of the tools.

9.3.1. Peripheral controllers

The SPI, I2C and GPIO peripherals were only described briefly, and their documentation is referenced in their respective sections. The peripherals included in the Amber project, however, had no documentation at all. Their functionality was documented by analysing the code and running simulation tests. All registers were investigated and presented along with the functions they control. The information presented in this thesis should suffice for a developer who is using or modifying the system. The peripheral integration process was also documented to ease any future expansion of the system.

9.4. Testing

The Amber project included a suite of numerous tests written in assembler and C, along with a Verilog test bench for the whole system. The assembler tests were used to verify different hardware functions such as addition, UART transmit/receive, main memory access and cache access. They were used throughout the whole project to verify that the changes made did not interfere with the rest of the design. When a new peripheral was added, a new assembler test was written to verify its function. Since signal levels cannot be checked from the test code, the peripherals' input and output pins were also inspected in a waveform graph to verify their correctness.

To further test the system, and to provide a more complex sample program than "Hello World!", a program called "spi-timer" was written. It uses a timer to periodically send an SPI message. The program shows the usage of the timer, interrupt and SPI controllers as well as the new test func library.


9.5. Compiler optimization issue

An issue arose when compiling a C program that writes to and reads from a register at the same address in the following manner:

• Writing 0x0000FEED to register 0x19000004

• Reading from register 0x19000004 and storing the result in a variable called "readout"

This illustrates a write to and a read from the GPIO controller's LINE register. In this case the 16 upper bits are used as inputs and could have any value. The compiler optimized away the read and instead copied the written value (0x0000FEED) to readout. This was discovered by watching the gpio_wb_stb signal and observing that the GPIO controller was never accessed for a read cycle.

When reading from register 0x19000014 instead, the value is read correctly. This works because the controller only decodes the third address bit (bit 2), so the two addresses are equivalent. To solve the issue, all compiler optimizations were turned off in the common makefile common.mk, and with them off the read was executed correctly. After some testing it was clear that a combination of several optimizations produced the error, and no further troubleshooting was performed. As a result of this discovery, the GPIO controller got three entries in the register_addresses.h file: one for the CTRL register, one for reading the LINE register and one for writing to it.
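The thesis does not show the C code that triggered the problem. A common cause of exactly this symptom is accessing a memory-mapped register through a pointer that is not declared volatile. In that case the compiler may assume that the value read back equals the value just written.

The following sketch works under that assumption. It shows volatile accessors together with a model of the address decode described above. The macro and function names are hypothetical. The CTRL address 0x19000000 is inferred from the bit-2 decode rather than stated in the text.

```c
#include <stdint.h>

/* Addresses from the text; the macro names are hypothetical. The CTRL
 * address is an assumption based on the bit-2 decode. */
#define GPIO_CTRL_ADDR       0x19000000u
#define GPIO_LINE_ADDR       0x19000004u
#define GPIO_LINE_ALIAS_ADDR 0x19000014u

/* Model of the decode described above: only address bit 2 is used,
 * so 0x19000004 and 0x19000014 select the same register. */
unsigned gpio_reg_select(uint32_t addr)
{
    return (addr >> 2) & 1u; /* 0 = CTRL, 1 = LINE */
}

/* Volatile accessors: each call is a real load or store that the
 * compiler may neither remove nor merge with a neighbouring access. */
void reg_write32(volatile uint32_t *reg, uint32_t value)
{
    *reg = value;
}

uint32_t reg_read32(const volatile uint32_t *reg)
{
    return *reg;
}

/* The write-then-read sequence from the text. On the target, line_reg
 * would be (volatile uint32_t *)GPIO_LINE_ADDR; taking it as a parameter
 * lets the same code run against an ordinary variable on a host. */
uint32_t gpio_line_write_then_read(volatile uint32_t *line_reg, uint32_t out)
{
    reg_write32(line_reg, out);
    return reg_read32(line_reg); /* re-read from the register, not forwarded */
}
```

On the hardware, the value returned for the LINE register may differ from 0x0000FEED in its upper 16 bits, since those are inputs. Qualifying the register accesses as volatile is the usual alternative to disabling optimizations globally in common.mk. It restricts the compiler only at the register accesses themselves.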


10. Discussion

At the beginning of the project, the work to be done was hard to specify due to the variation among the available alternatives. If the Cortex-M1 had been chosen, the focus would have been on getting a structured code base and a working bus architecture. If the Storm SoC had been chosen, all focus would have been on testing and on writing software to integrate the SoC into some existing application. Since the Amber SoC was chosen, the focus instead lay on adding peripherals, generalizing code and improving the documentation of the original Amber system.

10.1. Pros and cons with the Amber SoC

The Amber SoC is a tightly integrated system with scripts tailored for use with the Xilinx tools in a Linux environment, which took quite some study to understand. Luckily this was not a problem, since Syntronic planned to test the final system on a Xilinx FPGA and a Linux environment is quite easy to set up. There were, however, other obstacles connected to the Amber SoC, two of them being:

• No previous experience with Verilog, Bash, assembler or the Xilinx tools

• Lack of documentation of the system architecture, peripherals and scripts

Neither of these is uncommon in the engineering business, nor impossible to deal with. They do, however, take quite some time up front. That said, the Amber system also had several upsides, such as:

• Being a complete working SoC from the beginning

• Well structured code

• An extensive test framework

The structure of the code was easy to follow, and in many cases the code comments compensated for the lack of documentation. That eased the job of extending the documentation, even where the comments were not consistent with the code; in those cases the code was easy enough to follow to spot the errors. Since the Amber SoC was already a fully working SoC, it was already a usable product. All implementation work could therefore be focused on adapting it to Syntronic's specific needs. Since an extensive test suite was included in the project, every change could be tested directly. When no suitable test existed, the existing ones could be used as templates. This way bugs could be found at an early stage, which shortened the total time spent on debugging.


10.2. Peripherals

A couple of factors limited the available choice of peripheral controllers. There was no time to write custom controllers. Generating them with the Xilinx core generator was not an option either, since they had to be vendor independent. The only alternative was to find open source IP cores that suited the specification. This, however, was not a problem: the OpenCores community has had the same need for a long time, and there were a couple of controllers to choose from for every function that was needed.

Since time was a limiting factor, IP cores without a Wishbone interface were sorted out. This greatly limited the available choices. As presented in Chapter 11, there was a trade-off in features in the GPIO and SPI controllers that might otherwise have been avoided. On the other hand, integrating the missing features into the chosen cores may prove to take less time than creating a Wishbone interface for another core.


11. Future work

Since no hardware was available, the next step is to verify the synthesis result and prove that the SoC actually works. All simulations indicate that it does, but this is not certain until a real hardware test has been done. When it comes to further improving the system, the following options should be considered:

Test SPI flash memory To ensure that a permanent storage alternative is available.

SPI modes Adding support for SPI modes 2 and 3 would make the SPI controller even more useful, as it would widen the range of sensors it can interface with.

Central configuration file Moving all parameters to one file would make it easier to set up the system. If the system is going to be used in many different projects, one would also benefit from a mechanism where the peripherals could be enabled or disabled from such a file.

Improving the Wishbone bus Adding support for the "Advanced burst scheme" could improve the performance of the whole system.

Main memory content infusion If one wants to load the main memory with data at synthesis, the "readmemh" function should be added. This has not been done, since there has been no need for it. Doing so would require an adaptation of the Linux boot simulation.

UART baud rate defines Creating individual defines for the UART baud rates would make the UARTs a little more flexible than they are now.

Replace UART controllers An alternative to the item above is to replace the UART controllers with a more flexible one.

GPIO interrupt Adding interrupt support to the GPIO controller would greatly increase its usability.

JTAG For larger software designs a JTAG controller could prove useful for debugging.
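The SPI modes item can be made concrete. The four standard SPI modes combine the clock polarity (CPOL, the SCLK level when idle) with the clock phase (CPHA, whether data is sampled on the leading or the trailing clock edge). The mode number is (CPOL << 1) | CPHA, so modes 2 and 3 are the two in which the clock idles high.

The C sketch below encodes that standard mapping. It could serve as a reference in a test bench model or a driver. The type and function names are hypothetical.

```c
#include <stdbool.h>

/* Timing properties of one SPI mode (hypothetical type name). */
typedef struct {
    bool idle_high;        /* CPOL: SCLK level between transfers         */
    bool sample_on_rising; /* true if data is sampled on the rising edge */
} spi_mode_timing;

/* mode = (CPOL << 1) | CPHA; only the two lowest bits are used. */
spi_mode_timing spi_mode_timing_for(unsigned mode)
{
    bool cpol = (mode >> 1) & 1u;
    bool cpha = mode & 1u;
    spi_mode_timing t;

    t.idle_high = cpol;
    /* CPHA = 0 samples on the leading edge, CPHA = 1 on the trailing edge.
     * The leading edge is rising when the clock idles low and falling when
     * it idles high, so data is sampled on the rising edge exactly when
     * CPOL == CPHA. */
    t.sample_on_rising = (cpol == cpha);
    return t;
}
```

Modes 0 and 3 therefore both sample on the rising edge and differ only in the SCLK idle level. Likewise, modes 1 and 2 both sample on the falling edge.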

Which improvements and modifications to perform is very situation dependent, but in a general sense I consider these three the most important:

• Test SPI flash memory

• Central configuration file

• GPIO interrupt


These will all enhance the flexibility of the system and widen its possible uses. The SPI flash memory would provide the system with permanent storage for large programs, measurement data, logs etc. It would also give a potential customer an easy way to feed data and configurations to the system.

Adding interrupt functionality to the GPIO controller would remove the need to poll the controller at certain intervals. This would increase the available CPU time as well as free up at least one timer. It would also make it possible to extend the system to receive external timer and clock signals.

The central configuration file would mean an extensive change in the code. However, it would dramatically shorten the implementation time of the system in cases where the peripheral functionality needs to be customized. It would be very useful if one could disable unused peripherals, and add several instances of others, just by changing some defines. This would also reduce the FPGA footprint of the system, though most probably without a large size benefit, since the core is by far the most area-demanding component. On the other hand, there could be some improvement in timing, since the routing would be simpler. This could potentially increase the maximum frequency and thus the performance of the system.
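An interrupt-capable GPIO controller would need edge-detection logic, which can be modelled in C before it is written in Verilog. The sketch assumes three additions: a per-pin interrupt enable mask, a per-pin edge-select mask and a write-one-to-clear pending register. None of these exist in the GPIO core used in this thesis, and all names are hypothetical.

```c
#include <stdint.h>

/* Hypothetical model of edge-triggered GPIO interrupt logic. */
typedef struct {
    uint32_t prev;    /* input levels from the previous sample         */
    uint32_t enable;  /* per-pin interrupt enable mask                 */
    uint32_t rising;  /* per pin: 1 = trigger on rising, 0 = falling   */
    uint32_t pending; /* latched interrupt flags, cleared by software  */
} gpio_irq_state;

/* Feed one new sample of the 32 input pins (one clock cycle in hardware).
 * Returns nonzero if the interrupt line towards the CPU should be asserted. */
int gpio_irq_sample(gpio_irq_state *s, uint32_t now)
{
    uint32_t rose = ~s->prev & now;
    uint32_t fell = s->prev & ~now;
    uint32_t edge = (rose & s->rising) | (fell & ~s->rising);

    s->pending |= edge & s->enable;
    s->prev = now;
    return s->pending != 0;
}

/* Write-one-to-clear acknowledge (an assumed convention). */
void gpio_irq_ack(gpio_irq_state *s, uint32_t mask)
{
    s->pending &= ~mask;
}
```

In hardware, prev would be a register holding the previous input sample, and pending a set/clear register readable over Wishbone. The CPU would then only be interrupted when an enabled pin actually changes, instead of polling the LINE register.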


12. Bibliography

[1] Xilinx. Zynq-7000 All Programmable SoC. 2013. url: http://www.xilinx.com/content/xilinx/en/products/silicon-devices/soc/zynq-7000.html (visited on 07/18/2013).

[2] Altera. SoC Overview. 2013. url: http://www.altera.com/devices/processor/soc-fpga/overview/proc-soc-fpga.html (visited on 07/18/2013).

[3] Microsemi. SoC FPGAs. 2013. url: http://www.microsemi.com/products/fpga-soc/soc-fpgas (visited on 11/12/2013).

[4] Altera. Nios II Processor: The World’s Most Versatile Embedded Processor. 2013. url: http://www.altera.com/devices/processor/nios2/ni2-index.html (visited on 07/18/2013).

[5] Xilinx. MicroBlaze Soft Processor Core. 2013. url: http://www.xilinx.com/tools/microblaze.htm (visited on 07/18/2013).

[6] Jan Andersson, Jiri Gaisler, and Roland Weigand. NEXT GENERATION MULTIPURPOSE MICROPROCESSOR. Article. 2010.

[7] Peter Clarke. European Space Agency launches free Sparc-like core. 2000. url: http://www.eetimes.com/document.asp?doc_id=1214267 (visited on 08/30/2013).

[8] European Space Agency. LEON’S FIRST FLIGHTS. 2013. url: http://www.esa.int/Our_Activities/Space_Engineering/LEON_s_first_flights (visited on 08/30/2013).

[9] Opencores community. Main Page. 2013. url: http://opencores.org/or1k/Main_Page (visited on 08/30/2013).

[10] Opencores community. OR1K:Community portal. 2013. url: http://opencores.org/or1k/OR1K:Community_Portal (visited on 08/30/2013).

[11] Altera. Nios Embedded Processor. 2013. url: http://www.altera.com/products/ip/processors/nios/nio-index.html (visited on 07/18/2013).

[12] David Sheldon et al. Conjoining Soft-Core FPGA Processors. Report. 2006.

[13] Roman Lysecky and Frank Vahid. A Study of the Speedups and Competitiveness of FPGA Soft Processor Cores using Dynamic Hardware/Software Partitioning. Report. 2005.

[14] David Sheldon et al. Application-Specific Customization of Parameterized FPGA Soft-Core Processors. Report. 2006.

[15] Peter Yiannacouras, Gregory Steffan, and Jonathan Rose. Exploration and Customization of FPGA-Based Soft Processors. Report. 2007.


[16] Conor Santifort. Amber ARM-compatible core :: Overview. 2013. url: http://opencores.com/project,amber (visited on 07/18/2013).

[17] ARM Ltd. Cortex-M1 Processor. 2013. url: http://www.arm.com/products/processors/cortex-m/cortex-m1.php (visited on 05/08/2013).

[18] Stephan Nolting. Storm Core (ARM7 compatible) :: Overview. 2012. url: http://opencores.org/project,storm_core (visited on 05/08/2013).

[19] Xilinx. Xcelljournal - Solutions for a programmable world. Table 3. 2008. url: http://www.nxtbook.com/nxtbooks/xilinx/xcell64/index.php?startid=58 (visited on 05/20/2013).

[20] OpenCores. Wishbone B4. 2010.

[21] Conor Santifort. Amber Project User Guide. 2013.

[22] Conor Santifort. Amber 2 Core Specification. 2013.

[23] Rudolf Usselmann. OpenCores SoC Bus Review. Report. 2001.

[24] Xilinx. XST User Guide for Virtex-6, Spartan-6, and 7 Series Devices. 2011.

[25] Richard Herveille. I2C-Master Core Specification. 2003.

[26] Richard Herveille. I2C controller core :: Overview. 2013. url: http://opencores.org/project,i2c (visited on 07/01/2013).

[27] Simon Srot. SPI Master Core Specification. 2004.

[28] Simons. SPI controller core :: Overview. 2013. url: http://opencores.org/project,spi (visited on 05/31/2013).

[29] Conor Santifort. uart.v. 2013.

[30] Richard Herveille. Simple General Purpose IO :: Overview. 2009. url: http://opencores.org/project,simple_gpio (visited on 07/01/2013).

[31] Xilinx. ISE WebPACK Design Software. 2013. url: http://www.xilinx.com/products/design-tools/ise-design-suite/ise-webpack.htm (visited on 03/27/2013).

[32] Xilinx. Command Line Tools User Guide. 2009.

[33] Xilinx. XST User Guide. 2009.

[34] Xilinx. Constraints Guide. 2009.

[35] Mentor Graphics. Sourcery CodeBench Lite Edition. 2013. url: http://www.mentor.com/embedded-software/sourcery-tools/sourcery-codebench/editions/lite-edition/ (visited on 05/23/2013).

[36] ARM Limited. Dhrystone and MIPs performance of ARM processors. 2011. url: http://infocenter.arm.com/help/index.jsp?topic=/com.arm.doc.faqs/ka3885.html (visited on 09/25/2013).

[37] Egi Hidayat Uppsala University. Embedded Control Systems Project Groups. 2012. url: http://www.it.uu.se/edu/course/homepage/projektsystek/ht11/Nyheter/Grupper (visited on 09/25/2013).


[38] Egi Hidayat Uppsala University. Embedded Control Systems Project Groups. 2013. url: http://www.it.uu.se/edu/course/homepage/projektsystek/ht12/Nyheter/Grupper (visited on 09/25/2013).


A. I2C test output

No boot memory size specified. Defaulting to 8192 bytes
Test i2c, type 1
Compile ../tests/i2c.S
Running: /opt/Xilinx/14.5/ISE_DS/ISE/bin/lin64/unwrapped/fuse work.tb work.glbl -o amber-test.exe -prj amber-isim.prj -L unisims_ver -d BOOT_MEM_FILE="../tests/i2c.mem" -d MAIN_MEM_FILE="" -d AMBER_LOG_FILE="tests.log" -d AMBER_TEST_NAME="i2c" -d AMBER_SIM_CTRL=1 -d AMBER_TIMEOUT=0 -incremental -i ../vlog/lib -i ../vlog/system -i ../vlog/system/spi -i ../vlog/system/ram -i ../vlog/system/i2c -i ../vlog/amber23 -i ../vlog/tb

ISim P.58f (signature 0xfbc00daa)
Number of CPUs detected in this system: 2
Turning on mult-threading, number of parallel sub-compilation jobs: 4
Determining compilation order of HDL files
Analyzing Verilog file "/opt/Xilinx/14.5/ISE_DS/ISE/./verilog/src/glbl.v" into library work
Analyzing Verilog file "../vlog/system/boot_mem32.v" into library work
Analyzing Verilog file "../vlog/system/clocks_resets.v" into library work
Analyzing Verilog file "../vlog/system/interrupt_controller.v" into library work
Analyzing Verilog file "../vlog/system/system.v" into library work
Analyzing Verilog file "../vlog/system/test_module.v" into library work
Analyzing Verilog file "../vlog/system/timer_module.v" into library work
Analyzing Verilog file "../vlog/system/uart.v" into library work
Analyzing Verilog file "../vlog/system/gpio.v" into library work
Analyzing Verilog file "../vlog/system/wishbone_arbiter.v" into library work
Analyzing Verilog file "../vlog/system/afifo.v" into library work
Analyzing Verilog file "../vlog/system/main_mem.v" into library work
Analyzing Verilog file "../vlog/system/spi/spi_top.v" into library work
Analyzing Verilog file "../vlog/system/spi/spi_clgen.v" into library work
Analyzing Verilog file "../vlog/system/spi/spi_shift.v" into library work
Analyzing Verilog file "../vlog/system/ram/wb_mem.v" into library work
Analyzing Verilog file "../vlog/system/ram/ram_array.v" into library work
Analyzing Verilog file "../vlog/system/i2c/i2c_master_bit_ctrl.v" into library work
Analyzing Verilog file "../vlog/system/i2c/i2c_master_byte_ctrl.v" into library work
Analyzing Verilog file "../vlog/system/i2c/i2c_master_top.v" into library work
Analyzing Verilog file "../vlog/amber23/a23_alu.v" into library work
Analyzing Verilog file "../vlog/amber23/a23_barrel_shift.v" into library work
Analyzing Verilog file "../vlog/amber23/a23_barrel_shift_fpga.v" into library work


Analyzing Verilog file "../vlog/amber23/a23_cache.v" into library work
Analyzing Verilog file "../vlog/amber23/a23_coprocessor.v" into library work
Analyzing Verilog file "../vlog/amber23/a23_core.v" into library work
Analyzing Verilog file "../vlog/amber23/a23_decode.v" into library work
Analyzing Verilog file "../vlog/amber23/a23_decompile.v" into library work
Analyzing Verilog file "../vlog/amber23/a23_execute.v" into library work
Analyzing Verilog file "../vlog/amber23/a23_fetch.v" into library work
Analyzing Verilog file "../vlog/amber23/a23_multiply.v" into library work
Analyzing Verilog file "../vlog/amber23/a23_register_bank.v" into library work
Analyzing Verilog file "../vlog/amber23/a23_ram_register_bank.v" into library work
Analyzing Verilog file "../vlog/amber23/a23_wishbone.v" into library work
Analyzing Verilog file "../vlog/lib/generic_iobuf.v" into library work
Analyzing Verilog file "../vlog/lib/boot_ram_byte_en.v" into library work
Analyzing Verilog file "../vlog/lib/gen_ram_byte_en.v" into library work
Analyzing Verilog file "../vlog/lib/gen_ram_line_en.v" into library work
Analyzing Verilog file "../vlog/tb/i2c_slave_model.v" into library work
Analyzing Verilog file "../vlog/tb/spi_slave_model.v" into library work
Analyzing Verilog file "../vlog/tb/tb_uart.v" into library work
Analyzing Verilog file "../vlog/tb/tb_gpio.v" into library work
Analyzing Verilog file "../vlog/tb/dumpvcd.v" into library work
Analyzing Verilog file "../vlog/tb/tb.v" into library work
Starting static elaboration
Completed static elaboration
Fuse Memory Usage: 102528 KB
Fuse CPU Usage: 1760 ms
Compiling module IBUFGDS(DIFF_TERM="TRUE",IOSTAND...
Compiling module PLL_ADV(CLKFBOUT_MULT=4,CLKIN1_P...
Compiling module BUFG
Compiling module clocks_resets
Compiling module gen_ram_line_en(DATA_WIDTH=32'b0...
Compiling module gen_ram_byte_en(DATA_WIDTH=128,A...
Compiling module a23_cache_default
Compiling module a23_wishbone
Compiling module a23_fetch_default
Compiling module a23_decompile_2
Compiling module a23_decode
Compiling module a23_barrel_shift
Compiling module a23_alu
Compiling module a23_multiply
Compiling module a23_register_bank
Compiling module a23_execute
Compiling module a23_coprocessor
Compiling module a23_core
Compiling module i2c_master_bit_ctrl
Compiling module i2c_master_byte_ctrl
Compiling module i2c_master_top
Compiling module boot_ram_byte_en(DATA_WIDTH=32,A...
Compiling module boot_mem32_default
Compiling module uart(WB_DWIDTH=32,WB_SWIDTH=4)
Compiling module test_module(WB_DWIDTH=32,WB_SWID...
Compiling module timer_module(WB_DWIDTH=32,WB_SWI...


Compiling module interrupt_controller(WB_DWIDTH=3...
Compiling module spi_clgen
Compiling module spi_shift
Compiling module spi_top
Compiling module gpio
Compiling module ram_array_default
Compiling module wb_mem_default
Compiling module wishbone_arbiter(WB_DWIDTH=32,WB...
Compiling module system
Compiling module tb_uart_default
Compiling module spi_slave_model
Compiling module i2c_slave_model
Compiling module tb_gpio
Compiling module dumpvcd
Compiling module tb
Compiling module glbl
Time Resolution for simulation is 1ps.
Waiting for 1 sub-compilation(s) to finish...
Compiled 42 Verilog Units
Built simulation executable amber-test.exe
Fuse Memory Usage: 414448 KB
Fuse CPU Usage: 2930 ms
GCC CPU Usage: 270 ms
ISim P.58f (signature 0xfbc00daa)
WARNING: A WEBPACK license was found.
WARNING: Please use Xilinx License Configuration Manager to check out a full ISim license.
WARNING: ISim will run in Lite mode. Please refer to the ISim documentation for more information on the differences between the Lite and the Full version.
This is a Lite version of ISim.
Time resolution is 1 ps
Simulator is doing circuit initialization process.
log file tests.log, timeout 0, test name i2c
Finished circuit initialization process.

----------------------------------------------------------------------------

Amber Core   User         FIRQ         IRQ          > SVC

r0           0x20000010
r1           0x000000c1
r2           0x00000002
r3           0x00000080
r4           0xdeadbeef
r5           0x000000a5
r6           0x0000005a
r7           0xdeadbeef
r8           0xdeadbeef   0xdeadbeef
r9           0xdeadbeef   0xdeadbeef
r10          0x00000011   0xdeadbeef
r11          0xf0000000   0xdeadbeef
r12          0xdeadbeef   0xdeadbeef
r13          0xdeadbeef   0xdeadbeef   0xdeadbeef   0xdeadbeef


r14 (lr)     0xdeadbeef   0xdeadbeef   0xdeadbeef   0xdeadbeef
r15 (pc)     0x00000268

Status Bits: N=0, Z=0, C=1, V=0, IRQ Mask 1, FIRQ Mask 1, Mode = Supervisor
----------------------------------------------------------------------------

++++++++++++++++++++
Passed i2c 12391 ticks
++++++++++++++++++++
Stopped at time : 309887500 ps : File "/home/emanuel/workspace/amber_SoC/trunk/hw/vlog/tb/tb.v" Line 386


B. Linux test output

No boot memory size specified. Defaulting to 8192 bytes
Test vmlinux, type 3
make -s -C ../mini-libc MIN_SIZE=1
arm-none-linux-gnueabi-ld -Bstatic -Map boot-loader-serial.map --strip-debug --fix-v4bx -o boot-loader-serial.elf -T sections.lds boot-loader-serial.o start.o crc16.o xmodem.o elfsplitter.o ../mini-libc/printf.o ../mini-libc/libc_asm.o ../mini-libc/memcpy.o
arm-none-linux-gnueabi-objcopy -R .comment -R .note boot-loader-serial.elf
../tools/amber-elfsplitter boot-loader-serial.elf > boot-loader-serial.mem
../tools/amber-memparams32.sh boot-loader-serial.mem boot-loader-serial_memparams32.v
../tools/amber-memparams128.sh boot-loader-serial.mem boot-loader-serial_memparams128.v
arm-none-linux-gnueabi-objdump -C -S -EL boot-loader-serial.elf > boot-loader-serial.dis
../tools/check_mem_size.sh boot-loader-serial.mem "@000020"
Running: /opt/Xilinx/14.5/ISE_DS/ISE/bin/lin64/unwrapped/fuse tb -o amber-test.exe -prj amber-isim.prj -d BOOT_MEM_FILE="../../sw/boot-loader-serial/boot-loader-serial.mem" -d MAIN_MEM_FILE="../../sw/vmlinux/vmlinux.mem" -d AMBER_LOG_FILE="tests.log" -d AMBER_TEST_NAME="vmlinux" -d AMBER_SIM_CTRL=3 -d AMBER_TIMEOUT=0 -d AMBER_LOAD_MAIN_MEM -incremental -i ../vlog/lib -i ../vlog/system -i ../vlog/system/spi -i ../vlog/system/ram -i ../vlog/system/i2c -i ../vlog/amber23 -i ../vlog/tb

ISim P.58f (signature 0xfbc00daa)
Number of CPUs detected in this system: 4
Turning on mult-threading, number of parallel sub-compilation jobs: 8
Determining compilation order of HDL files
Analyzing Verilog file "../vlog/system/boot_mem32.v" into library work
Analyzing Verilog file "../vlog/system/clocks_resets.v" into library work
Analyzing Verilog file "../vlog/system/interrupt_controller.v" into library work
Analyzing Verilog file "../vlog/system/system.v" into library work
Analyzing Verilog file "../vlog/system/test_module.v" into library work
Analyzing Verilog file "../vlog/system/timer_module.v" into library work
Analyzing Verilog file "../vlog/system/uart.v" into library work
Analyzing Verilog file "../vlog/system/gpio.v" into library work
Analyzing Verilog file "../vlog/system/wishbone_arbiter.v" into library work
Analyzing Verilog file "../vlog/system/afifo.v" into library work
Analyzing Verilog file "../vlog/system/main_mem.v" into library work
Analyzing Verilog file "../vlog/system/spi/spi_top.v" into library work
Analyzing Verilog file "../vlog/system/spi/spi_clgen.v" into library work
Analyzing Verilog file "../vlog/system/spi/spi_shift.v" into library work
Analyzing Verilog file "../vlog/system/ram/wb_mem.v" into library work


Analyzing Verilog file "../vlog/system/ram/ram_array.v" into library work
Analyzing Verilog file "../vlog/system/i2c/i2c_master_bit_ctrl.v" into library work
Analyzing Verilog file "../vlog/system/i2c/i2c_master_byte_ctrl.v" into library work
Analyzing Verilog file "../vlog/system/i2c/i2c_master_top.v" into library work
Analyzing Verilog file "../vlog/amber23/a23_alu.v" into library work
Analyzing Verilog file "../vlog/amber23/a23_barrel_shift.v" into library work
Analyzing Verilog file "../vlog/amber23/a23_barrel_shift_fpga.v" into library work
Analyzing Verilog file "../vlog/amber23/a23_cache.v" into library work
Analyzing Verilog file "../vlog/amber23/a23_coprocessor.v" into library work
Analyzing Verilog file "../vlog/amber23/a23_core.v" into library work
Analyzing Verilog file "../vlog/amber23/a23_decode.v" into library work
Analyzing Verilog file "../vlog/amber23/a23_decompile.v" into library work
Analyzing Verilog file "../vlog/amber23/a23_execute.v" into library work
Analyzing Verilog file "../vlog/amber23/a23_fetch.v" into library work
Analyzing Verilog file "../vlog/amber23/a23_multiply.v" into library work
Analyzing Verilog file "../vlog/amber23/a23_register_bank.v" into library work
Analyzing Verilog file "../vlog/amber23/a23_ram_register_bank.v" into library work
Analyzing Verilog file "../vlog/amber23/a23_wishbone.v" into library work
Analyzing Verilog file "../vlog/lib/generic_iobuf.v" into library work
Analyzing Verilog file "../vlog/lib/boot_ram_byte_en.v" into library work
Analyzing Verilog file "../vlog/lib/gen_ram_byte_en.v" into library work
Analyzing Verilog file "../vlog/lib/gen_ram_line_en.v" into library work
Analyzing Verilog file "../vlog/tb/i2c_slave_model.v" into library work
Analyzing Verilog file "../vlog/tb/spi_slave_model.v" into library work
Analyzing Verilog file "../vlog/tb/tb_uart.v" into library work
Analyzing Verilog file "../vlog/tb/tb_gpio.v" into library work
Analyzing Verilog file "../vlog/tb/dumpvcd.v" into library work
Analyzing Verilog file "../vlog/tb/tb.v" into library work
Starting static elaboration
Completed static elaboration
Fuse Memory Usage: 100400 KB
Fuse CPU Usage: 1290 ms
Compiling module clocks_resets
Compiling module gen_ram_line_en(DATA_WIDTH=32'b0...
Compiling module gen_ram_byte_en(DATA_WIDTH=128,A...
Compiling module a23_cache_default
Compiling module a23_wishbone
Compiling module a23_fetch_default
Compiling module a23_decompile_2
Compiling module a23_decode
Compiling module a23_barrel_shift
Compiling module a23_alu
Compiling module a23_multiply
Compiling module a23_register_bank
Compiling module a23_execute
Compiling module a23_coprocessor


Compiling module a23_core
Compiling module i2c_master_bit_ctrl
Compiling module i2c_master_byte_ctrl
Compiling module i2c_master_top
Compiling module boot_ram_byte_en(DATA_WIDTH=32,A...
Compiling module boot_mem32_default
Compiling module uart(WB_DWIDTH=32,WB_SWIDTH=4)
Compiling module test_module(WB_DWIDTH=32,WB_SWID...
Compiling module timer_module(WB_DWIDTH=32,WB_SWI...
Compiling module interrupt_controller(WB_DWIDTH=3...
Compiling module spi_clgen
Compiling module spi_shift
Compiling module spi_top
Compiling module gpio
Compiling module ram_array_default
Compiling module wb_mem_default
Compiling module wishbone_arbiter(WB_DWIDTH=32,WB...
Compiling module system
Compiling module tb_uart_default
Compiling module spi_slave_model
Compiling module i2c_slave_model
Compiling module tb_gpio
Compiling module dumpvcd
Compiling module tb
Time Resolution for simulation is 1ps.
Waiting for 22 sub-compilation(s) to finish...
Compiled 38 Verilog Units
Built simulation executable amber-test.exe
Fuse Memory Usage: 671904 KB
Fuse CPU Usage: 2670 ms
GCC CPU Usage: 27950 ms
ISim P.58f (signature 0xfbc00daa)
WARNING: A WEBPACK license was found.
WARNING: Please use Xilinx License Configuration Manager to check out a full ISim license.
WARNING: ISim will run in Lite mode. Please refer to the ISim documentation for more information on the differences between the Lite and the Full version.
This is a Lite version of ISim.
Time resolution is 1 ps
Simulator is doing circuit initialization process.
log file tests.log, timeout 0, test name vmlinux
Load main memory from ../../sw/vmlinux/vmlinux.mem
Finished circuit initialization process.
Amber Boot Loader v20130822150540
j 0x00080000

Linux version 2.4.27-vrs1 (conor@server) (gcc version 4.5.1 (Sourcery G++ Lite 2010.09-58)) #446 Mon Dec 21 14:04:42 GMT 2009

CPU: Amber 2 revision 0
Machine: Amber-FPGA-System
On node 0 totalpages: 1024
zone(0): 1024 pages.
zone(1): 0 pages.


zone(2): 0 pages.
Kernel command line: console=ttyAM0 mem=32M root=/dev/ram
19.91 BogoMIPS (preset value used)
Memory: 32MB = 32MB total
Memory: 31136KB available (496K code, 195K data, 32K init)
Dentry cache hash table entries: 4096 (order: 0, 32768 bytes)
Inode cache hash table entries: 4096 (order: 0, 32768 bytes)
Mount cache hash table entries: 4096 (order: 0, 32768 bytes)
Buffer cache hash table entries: 8192 (order: 0, 32768 bytes)
Page-cache hash table entries: 8192 (order: 0, 32768 bytes)
POSIX conformance testing by UNIFIX
Linux NET4.0 for Linux 2.4
Based upon Swansea University Computer Society NET3.039
Starting kswapd
ttyAM0 at MMIO 0x16000000 (irq = 1) is a WSBN
pty: 256 Unix98 ptys configured
RAMDISK driver initialized: 16 RAM disks of 208K size 1024 blocksize
NetWinder Floating Point Emulator V0.97 (double precision)
RAMDISK: ext2 filesystem found at block 8388608
RAMDISK: Loading 200 blocks [1 disk] into ram disk... done.
Freeing initrd memory: 200K
VFS: Mounted root (ext2 filesystem) readonly.
Freeing init memory: 32K
BINFMT_FLAT: Loading file: /sbin/init
Mapping is 8b0000, Entry point is 8068, data_start is 8dd0
Load /sbin/init: TEXT=8b0040-8b8dd0 DATA=8b8dd4-8b8dff BSS=8b8dff-8b8e04
Hello, World!

----------------------------------------------------------------------------

Amber Core>  User        FIRQ        IRQ         SVC

r0           0x00000010
r1           0x008b8de4
r2           0x00000000
r3           0x00000000
r4           0x00000000
r5           0x00000000
r6           0x00000000
r7           0x00000000
r8           0x00000000  0xdeadbeef
r9           0x00000000  0xdeadbeef
r10          0x00000011  0xdeadbeef
r11          0xf0000000  0xdeadbeef
r12          0x00000000  0xdeadbeef
r13          0x008affb4  0xdeadbeef  0x0210cc24  0x02161fe8
r14 (lr)     0x00000000  0xdeadbeef  0x6209620b  0x008b8068
r15 (pc)     0x008b84bc

Status Bits: N=0, Z=1, C=1, V=0, IRQ Mask 0, FIRQ Mask 0, Mode = User
----------------------------------------------------------------------------

++++++++++++++++++++


Passed vmlinux 12120135 ticks
++++++++++++++++++++
Stopped at time : 303003852500 ps : File "/home/emanuel/workspace/amber_SoC/trunk/hw/vlog/tb/tb.v" Line 395
