Fetch and execute software from an AXI slave with ARM Cortex-A9

Introduction

The Zynq-7000 family is based on the Xilinx All Programmable SoC architecture. These products integrate a feature-rich dual or single-core ARM Cortex-A9 MPCore based processing system (PS) and Xilinx programmable logic (PL) in a single device.

The Cortex-A9 processor implements the ARMv7-A architecture with full virtual memory support and can execute 32-bit ARM instructions, 16-bit and 32-bit Thumb instructions, and 8-bit Java bytecodes in the Jazelle state.

The PL is nearly identical to a Xilinx 7-series Artix FPGA, except that it contains several dedicated ports and buses that tightly couple it to the PS. The PL must be configured either directly by the processor or via the JTAG port.

OpenOCD is an on-chip debugging, in-system programming, and boundary-scan testing tool for various ARM and MIPS systems. The debugger uses an IEEE 1149.1 compliant JTAG TAP bus master to access the on-chip debug functionality available on ARM-based microcontrollers or system-on-chip solutions.

This article shows how to fetch and execute software from an AXI slave with an ARM Cortex-A9. It also shows how to use a second AXI slave as a stack that is exclusive to the execution of this software. Both AXI slaves are Block RAMs located in the Artix-7 programmable logic.

The implementation was carried out on a Digilent Cora Z7 development board. The Digilent Cora Z7 is a ready-to-use, low-cost, and easily embeddable development platform designed around the Zynq-7000 All-Programmable System-on-Chip from Xilinx.

Hardware

The first step is to declare the slave memory that stores the code we want to execute. Its ports must match the memory interface of the AXI BRAM controller that will connect it to the AXI bus. Xilinx provides Language Templates to instantiate a Block RAM.

module slave
#(
  parameter FILE_INIT  = "NONE" //!< path to a file with the hex content
) (
  input  [11:0] addr,   //!< address where to access memory (2 LSB left unused)
  input  clk,           //!< clock
  input  [31:0] wrdata, //!< data to write to memory
  output [31:0] rddata, //!< data read from memory
  input  rst,           //!< synchronous reset
  input  [3:0]  we      //!< write-enable bits, one for each Byte
);
  BRAM_SINGLE_MACRO #(
     .BRAM_SIZE("36Kb"),        // Target BRAM, "18Kb" or "36Kb"
     .DEVICE("7SERIES"),        // Target Device: "7SERIES"
     .DO_REG(0),                // Optional output register (0 or 1)
     .INIT(36'h000000000),      // Initial values on output port
     .INIT_FILE(FILE_INIT),
     .WRITE_WIDTH(32),          // Valid values are 1-72 (37-72 only valid when BRAM_SIZE="36Kb")
     .READ_WIDTH(32),           // Valid values are 1-72 (37-72 only valid when BRAM_SIZE="36Kb")
     .SRVAL(36'h000000000),     // Set/Reset value for port output
     .WRITE_MODE("WRITE_FIRST") // "WRITE_FIRST", "READ_FIRST", or "NO_CHANGE"
  ) i_bram_single (
     .DO(rddata),       // Output data, width defined by READ_WIDTH parameter
     .ADDR(addr[11:2]), // Input address, width defined by read/write port depth
     .CLK(clk),         // 1-bit input clock
     .DI(wrdata),       // Input data port, width defined by WRITE_WIDTH parameter
     .EN(1'b1),         // 1-bit input RAM enable
     .REGCE(1'b0),      // 1-bit input output register enable
     .RST(rst),         // 1-bit input reset
     .WE(we)            // Input write enable, width defined by write port depth
  );
endmodule
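To make the address and write-enable conventions of this module concrete, here is a minimal behavioral sketch in Python. It only illustrates what the port comments describe: the two address LSBs are dropped to obtain a 10-bit word index (1024 words of 32 bits, i.e. 4 KiB), and each bit of we enables one byte lane. The class name BramModel and its access method are hypothetical and not part of the design, and timing is ignored entirely.

```python
class BramModel:
    """Hypothetical behavioral model of the 1024 x 32-bit memory (timing ignored)."""

    DEPTH = 1024  # 10-bit word index, i.e. addr[11:2]

    def __init__(self):
        self.words = [0] * self.DEPTH

    def access(self, addr, wrdata=0, we=0b0000):
        """Apply one access and return the word seen on rddata.

        The returned value already includes the bytes written by this
        access, which mirrors the "WRITE_FIRST" mode selected above.
        """
        index = (addr & 0xFFF) >> 2        # drop the 2 unused LSBs
        word = self.words[index]
        for lane in range(4):              # one write-enable bit per byte
            if we & (1 << lane):
                mask = 0xFF << (8 * lane)
                word = (word & ~mask) | (wrdata & mask)
        self.words[index] = word
        return word


if __name__ == "__main__":
    mem = BramModel()
    mem.access(0x004, 0x12345678, we=0b1111)   # full-word write
    mem.access(0x004, 0xAABBCCDD, we=0b0001)   # only the lowest byte is replaced
    print(hex(mem.access(0x004)))
```

With this model, byte addresses 0x004 to 0x007 all select the same word, and the highest usable byte address is 0xFFF.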

Two of these modules are instantiated: one stores the code and the other serves as the exclusive stack.

We connect each of them to an AXI BRAM controller. Xilinx provides IP cores that work out of the box. These modules and the Zynq-7000 are connected together as described by the following schematic:

Hardware schematic

On the Cora Z7, we do not use any external device, so no connection is described in the constraints file.

Software

Important! When the processor fetches from an AXI slave, instructions are pre-fetched in packets of 32 bytes (8 instructions in ARM state). As a consequence, branch destination addresses must be aligned to 32 bytes; otherwise, a prefetch abort exception occurs.
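As an illustration of this constraint, the following Python sketch computes how many nop instructions are needed to pad a block of code so that the next instruction, typically a branch target, starts on a 32-byte boundary. It assumes ARM state (4 bytes per instruction) and offsets counted from a 32-byte aligned base; the function names are hypothetical helpers, not part of any toolchain.

```python
LINE_BYTES = 32  # pre-fetch packet size stated above
INSN_BYTES = 4   # size of one instruction in ARM state


def is_fetch_aligned(address):
    """Return True if the address is a valid branch target under the 32-byte rule."""
    return address % LINE_BYTES == 0


def nops_to_align(offset):
    """Return the number of nop instructions to insert at `offset` so that
    the following instruction starts on a 32-byte boundary."""
    if offset % INSN_BYTES != 0:
        raise ValueError("offset must be a multiple of the instruction size")
    return (-offset % LINE_BYTES) // INSN_BYTES


if __name__ == "__main__":
    for offset in (0x28, 0x54, 0x70, 0x80):
        print(hex(offset), nops_to_align(offset))
```

For example, a block that ends at offset 0x28 needs 6 nop instructions before a label placed at 0x40. The same effect can be obtained by counting instructions in groups of 8, which is what the runs of nop in the listing of this section do.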

Our software first switches the stack context to use the exclusive stack. It then illustrates that this stack can be used in loops. The virtual addresses of the software and of the stack are 0x40000000 and 0x42000000, respectively.

// saving stack context + switching to exclusive stack:
mov r0, fp
mov r1, sp
movw r2, #0x1000  // __ex_stack_top
movt r2, #0x4200  //
mov fp, r2
mov sp, r2
push {lr}
push {r0}

push {r1}
nop
nop
nop
nop
nop
nop
mov r0, #-1   // set r0 to -1

// push from 0 to 9 to the stack:
my_loop_0:
  add r0, r0, #1
  push {r0}
  cmp r0, #9
  blt my_loop_0
nop
nop
nop
mov r0, #-1

// pop 10 times from the stack:
my_loop_1:
  add r0, r0, #1
  pop {r1}
  cmp r0, #9
  blt my_loop_1
nop
nop
nop
nop
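The effect of the listing above on the exclusive stack can be checked with a small simulation. The Python sketch below models a full-descending stack, which is what push and pop implement on ARM (the stack pointer is decremented before each store), and replays the three context saves followed by the two loops. The class Stack and the function run_listing are hypothetical names used only for this illustration, and the saved registers are represented by placeholder strings.

```python
STACK_TOP = 0x42001000  # __ex_stack_top, loaded with movw/movt in the listing
STACK_LOW = 0x42000000  # lowest address of the exclusive stack


class Stack:
    """Minimal model of a full-descending stack of 32-bit words."""

    def __init__(self, top):
        self.sp = top
        self.lowest = top   # lowest address ever written
        self.mem = {}

    def push(self, value):
        self.sp -= 4                      # decrement first, then store
        self.mem[self.sp] = value
        self.lowest = min(self.lowest, self.sp)

    def pop(self):
        value = self.mem[self.sp]         # load first, then increment
        self.sp += 4
        return value


def run_listing():
    """Replay the pushes and pops of the listing and return the stack and popped values."""
    stack = Stack(STACK_TOP)
    for saved in ("lr", "old fp", "old sp"):   # push {lr}, push {r0}, push {r1}
        stack.push(saved)
    for value in range(10):                    # my_loop_0: push 0 to 9
        stack.push(value)
    popped = [stack.pop() for _ in range(10)]  # my_loop_1: pop 10 times
    return stack, popped


if __name__ == "__main__":
    stack, popped = run_listing()
    print(popped, hex(stack.sp), hex(stack.lowest))
```

In this model the values come back in reverse order (9 down to 0), the stack pointer returns to 0x42000FF4 where the saved context sits, and the lowest address touched is 0x42000FCC, far above the bottom of the 4 KiB stack memory at 0x42000000.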

Our software also writes a witness (a recognizable marker value) at the lowest address of the exclusive stack. Finally, it restores the original stack context and returns.

// witness (0x42000000 == __ex_stack_low):
movw r3, #0x0000
movt r3, #0x4200
movw r2, #0x5678
movt r2, #0x1234
str r2, [r3]
// restoring stack context and old link:
pop {r1}
pop {r0}
pop {lr}

mov fp, r0
mov sp, r1
bx lr // return
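The 32-bit constants in these listings are built with a movw/movt pair: movw writes the lower 16 bits of the register and clears the upper half, then movt writes the upper 16 bits and leaves the lower half unchanged. The Python sketch below shows how a constant is split into the two immediates; the function names are hypothetical and serve only as an illustration.

```python
def split_movw_movt(value):
    """Return the (movw, movt) 16-bit immediates that build a 32-bit constant."""
    if not 0 <= value <= 0xFFFFFFFF:
        raise ValueError("value must fit in 32 bits")
    return value & 0xFFFF, value >> 16


def combine_movw_movt(low, high):
    """Return the register content after 'movw rX, #low' then 'movt rX, #high'."""
    return ((high & 0xFFFF) << 16) | (low & 0xFFFF)


if __name__ == "__main__":
    for constant in (0x42001000, 0x42000000, 0x12345678):
        low, high = split_movw_movt(constant)
        print(f"movw #0x{low:04x} / movt #0x{high:04x}")
```

For the witness value 0x12345678 this gives movw #0x5678 followed by movt #0x1234, as in the listing above.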

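We can execute this code from the main function. In ARM state, reading pc yields the address of the current instruction plus 8, so mov lr, pc stores the address of the instruction that follows bx r1 as the return address: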

movw r1, #0x0000
movt r1, #0x4000
mov lr, pc
bx r1

Run

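We program the board through JTAG with OpenOCD and gdb. The files referenced by the script below must be available: the bitstream (hw/design_0_wrapper.bit), the FSBL (sw/fsbl/main.elf), the code for the AXI slave (sw/app/sw_axi.bin) and the application (sw/app/main.elf).

The following gdb script controls OpenOCD through monitor commands and programs the board: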

set architecture armv7

monitor reset halt
monitor pld load 0 ./hw/design_0_wrapper.bit
monitor gdb_sync

file  sw/fsbl/main.elf
load  sw/fsbl/main.elf
break _boot
jump  _boot

break Loop
break FsblHookFallback
continue
continue

# load the code into the AXI slave:
monitor load_image sw/app/sw_axi.bin 0x40000000

# load the application:
file sw/app/main.elf
load sw/app/main.elf
break _start
jump  _start

break main
continue
continue

Makefile

Building the hardware requires synthesis with Xilinx Vivado. A script is provided in the archive from the associated resources.

Building the software requires compilation. Here we use gcc and ld. A script is provided in the archive from the associated resources.

Running requires two terminals: the first one runs OpenOCD and communicates with the hardware; the second one runs gdb and drives OpenOCD.

Here we use a custom OpenOCD configuration file for the Zynq-7000: the Cora Z7 used here has only one CPU core, so we must not create two targets.

Here is a Makefile that provides the recipe (make sure to indent with tabs):

OCD = ../openocd_zynq.tcl

# these targets do not produce files of the same name
.PHONY: all openocd

all: openocd

openocd: ${OCD} hw/design_0_wrapper.bit gdbinit.gdb
	$(MAKE) -C sw
	# TO RUN IN ANOTHER TERMINAL:
	# gdb-multiarch -ex "set architecture armv7" -ex "target extended-remote localhost:3333" --command="gdbinit.gdb"
	#
	/usr/bin/openocd -f $<

hw/design_0_wrapper.bit:
	$(MAKE) -C hw

Associated resources

Here is an archive which contains the discussed example and its Makefiles.


