Fetch and execute software from an AXI slave with ARM Cortex-A9
Introduction
The Zynq-7000 family is based on the Xilinx All Programmable SoC architecture. These products integrate a feature-rich dual or single-core ARM Cortex-A9 MPCore based processing system (PS) and Xilinx programmable logic (PL) in a single device.
The Cortex-A9 processor implements the ARMv7-A architecture with full virtual memory support and can execute 32-bit ARM instructions, 16-bit and 32-bit Thumb instructions, and 8-bit Java bytecodes in the Jazelle state.
The PL is nearly identical to a Xilinx 7-series Artix FPGA, except that it contains several dedicated ports and buses that tightly couple it to the PS. The PL must be configured either directly by the processor or via the JTAG port.
OpenOCD is an on-chip debugging, in-system programming and boundary-scan testing tool for various ARM and MIPS systems. The debugger uses an IEEE 1149.1 compliant JTAG TAP bus master to access the on-chip debug functionality available on ARM-based microcontrollers and system-on-chip solutions.
This article shows how to fetch and execute software from an AXI slave with an ARM Cortex-A9. It also shows how to use a second AXI slave as a stack that is exclusive to the execution of this software. Both AXI slaves are Block RAMs of the Artix-7 FPGA fabric.
The implementation was carried out on a Digilent Cora Z7 development board. The Digilent Cora Z7 is a ready-to-use, low-cost, and easily embeddable development platform designed around the Zynq-7000 All-Programmable System-on-Chip from Xilinx.
Hardware
The first step is to declare an AXI slave to store the code that we want to execute. The pinout must be AXI compatible. Xilinx provides Language Templates to instantiate a Block RAM.
module slave
#(
    parameter FILE_INIT = "NONE" //!< path to a file with the hex content
) (
    input [11:0] addr, //!< address where to access memory (2 LSB left unused)
    input clk, //!< clock
    input [31:0] wrdata, //!< data to write to memory
    output [31:0] rddata, //!< data read from memory
    input rst, //!< synchronous reset
    input [3:0] we //!< write-enable bits, one for each Byte
);
    BRAM_SINGLE_MACRO #(
        .BRAM_SIZE("36Kb"), // Target BRAM, "18Kb" or "36Kb"
        .DEVICE("7SERIES"), // Target Device: "7SERIES"
        .DO_REG(0), // Optional output register (0 or 1)
        .INIT(36'h000000000), // Initial values on output port
        .INIT_FILE(FILE_INIT),
        .WRITE_WIDTH(32), // Valid values are 1-72 (37-72 only valid when BRAM_SIZE="36Kb")
        .READ_WIDTH(32), // Valid values are 1-72 (37-72 only valid when BRAM_SIZE="36Kb")
        .SRVAL(36'h000000000), // Set/Reset value for port output
        .WRITE_MODE("WRITE_FIRST") // "WRITE_FIRST", "READ_FIRST", or "NO_CHANGE"
    ) i_bram_single (
        .DO(rddata), // Output data, width defined by READ_WIDTH parameter
        .ADDR(addr[11:2]), // Input address, width defined by read/write port depth
        .CLK(clk), // 1-bit input clock
        .DI(wrdata), // Input data port, width defined by WRITE_WIDTH parameter
        .EN(1'b1), // 1-bit input RAM enable
        .REGCE(1'b0), // 1-bit input output register enable
        .RST(rst), // 1-bit input reset
        .WE(we) // Input write enable, width defined by write port depth
    );
endmodule
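The address decoding and the byte-wise write enables of this module can be illustrated with a small host-side model. This is only a behavioural sketch in Python, not cycle-accurate and not part of the original design; the class name BramModel and its access method are hypothetical names chosen for the illustration.

```python
WORDS = 1024  # addr[11:2] is 10 bits wide, hence 1024 words of 32 bits


class BramModel:
    """Behavioural model of the single-port memory (hypothetical helper)."""

    def __init__(self):
        self.mem = [0] * WORDS

    def access(self, addr, wrdata=0, we=0b0000):
        """Write the byte lanes enabled in `we`, then return the addressed word."""
        index = (addr >> 2) & (WORDS - 1)  # drop the two LSBs, keep 10 bits
        for lane in range(4):
            if (we >> lane) & 1:
                mask = 0xFF << (8 * lane)  # we[lane] covers bits 8*lane+7 .. 8*lane
                self.mem[index] = (self.mem[index] & ~mask) | (wrdata & mask)
        # WRITE_FIRST: the freshly written data is what appears on the output
        return self.mem[index]
```

For instance, writing 0x12345678 at byte address 0x004 with we=0b1111 and then writing 0xAABBCCDD with we=0b0001 leaves 0x123456DD in the word, since only the lowest byte lane is enabled for the second write.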
Two of these modules are instantiated:
- one to store the code to execute,
- one to use as a stack that is exclusive to this execution.
We connect them to AXI BRAM controllers. Xilinx provides IP cores that work out-of-the-box. These modules and the Zynq-7000 are connected together as described by the following schematic:
On the Cora Z7, we do not use any external device, so the constraints file describes no connection.
Software
Important! When the microprocessor fetches from an AXI slave, instructions are pre-fetched in packets of 32 bytes (8 instructions in ARM state). As a consequence, branch destination addresses must be aligned to 32 bytes; otherwise, a pre-fetch abort exception occurs.
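The alignment rule above amounts to simple arithmetic on byte offsets. The following Python sketch, with hypothetical helper names is_aligned and padding_slots, computes how many instruction slots have to be filled before a label so that the label starts a new 32-byte packet. It assumes ARM state, that is, 4 bytes per instruction.

```python
LINE = 32  # size in bytes of a pre-fetch packet, as stated above
INSN = 4   # size in bytes of one instruction in ARM state


def is_aligned(address, line=LINE):
    """True if `address` is a valid branch destination under the 32-byte rule."""
    return address % line == 0


def padding_slots(offset, line=LINE, insn=INSN):
    """Number of instruction slots to fill at byte `offset` to reach the next packet."""
    return (-offset % line) // insn
```

In the listing that follows, the context switch takes 9 instructions (36 bytes), so padding_slots(36) gives 7 slots: they are filled with six nop instructions and the mov that initialises r0, which places my_loop_0 at offset 64.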
Our software switches stack context to use the exclusive stack. Then, we illustrate that we can use this stack in loops. The virtual addresses of the software and of the stack are 0x40000000 and 0x42000000, respectively.
// saving stack context + switching to exclusive stack:
mov r0, fp
mov r1, sp
movw r2, #0x1000 // __ex_stack_top
movt r2, #0x4200 //
mov fp, r2
mov sp, r2
push {lr}
push {r0}
push {r1}
nop
nop
nop
nop
nop
nop
mov r0, #-1 // set r0 to -1
// push from 0 to 9 to the stack:
my_loop_0:
add r0, r0, #1
push {r0}
cmp r0, #9
blt my_loop_0
nop
nop
nop
mov r0, #-1
// pop 10 times from the stack:
my_loop_1:
add r0, r0, #1
pop {r1}
cmp r0, #9
blt my_loop_1
nop
nop
nop
nop
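To see which addresses of the exclusive stack the listing above touches, the push and pop instructions can be modelled on the host. This is a minimal Python sketch of a full-descending stack (the stack pointer is decremented before each store); the names DescendingStack and run_loops are hypothetical, and the saved registers are represented by plain strings.

```python
STACK_TOP = 0x42001000  # __ex_stack_top in the listing


class DescendingStack:
    """Full-descending stack, the convention used by push and pop."""

    def __init__(self, top=STACK_TOP):
        self.sp = top
        self.mem = {}

    def push(self, value):
        self.sp -= 4               # decrement first ...
        self.mem[self.sp] = value  # ... then store

    def pop(self):
        value = self.mem.pop(self.sp)  # load ...
        self.sp += 4                   # ... then increment
        return value


def run_loops():
    """Replay the context save and the two loops of the listing."""
    stack = DescendingStack()
    for saved in ("lr", "old fp", "old sp"):  # push {lr}, {r0}, {r1}
        stack.push(saved)
    for value in range(10):                   # my_loop_0 pushes 0 to 9
        stack.push(value)
    lowest_sp = stack.sp
    popped = [stack.pop() for _ in range(10)]  # my_loop_1 pops 10 times
    return lowest_sp, popped, stack.sp
```

Under this model the stack pointer goes down to 0x42000FCC after the thirteen pushes, the values come back in reverse order (9 first, 0 last), and the stack pointer ends at 0x42000FF4, just below the three saved registers.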
Our software also writes a witness value at the lowest address of the exclusive stack. Finally, it restores the saved stack context and returns to the caller.
// witness (0x42000000 == __ex_stack_low):
movw r3, #0x0000
movt r3, #0x4200
movw r2, #0x5678
movt r2, #0x1234
str r2, [r3]
// restoring stack context and old link:
pop {r1}
pop {r0}
pop {lr}
mov fp, r0
mov sp, r1
bx lr // return
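The listings build 32-bit constants with a movw/movt pair: movw writes the lower 16 bits and clears the upper half, then movt writes the upper 16 bits and leaves the lower half unchanged. The Python sketch below, with hypothetical helper names, shows how a constant splits into the two immediates.

```python
def split_for_movw_movt(value):
    """Return (movw_imm, movt_imm) for a 32-bit constant."""
    if not 0 <= value <= 0xFFFFFFFF:
        raise ValueError("not a 32-bit constant")
    return value & 0xFFFF, value >> 16


def combine(movw_imm, movt_imm):
    """Value held by the register after movw followed by movt."""
    return (movt_imm << 16) | movw_imm
```

For the witness value, split_for_movw_movt(0x12345678) yields (0x5678, 0x1234), which matches the immediates used above. Because movw clears the upper half, it has to come before movt.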
We can execute this code from the main function:
movw r1, #0x0000
movt r1, #0x4000
mov lr, pc
bx r1
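The mov lr, pc instruction works as a call because, in ARM state, reading pc yields the address of the current instruction plus 8. The short Python sketch below spells out this arithmetic; call_sequence is a hypothetical helper and the address passed to it is arbitrary.

```python
ARM_PC_READ_OFFSET = 8  # in ARM state, reading pc gives the instruction address + 8


def call_sequence(mov_addr):
    """Addresses involved in `mov lr, pc` followed by `bx r1` (ARM state only)."""
    bx_addr = mov_addr + 4                     # bx r1 follows mov lr, pc
    lr_value = mov_addr + ARM_PC_READ_OFFSET   # value copied into lr
    return {"bx": bx_addr, "lr": lr_value, "after_bx": bx_addr + 4}
```

The value stored in lr is therefore the address of the instruction that follows bx r1, which is where the final bx lr of the fetched code returns. Since bit 0 of 0x40000000 is clear, bx r1 keeps the processor in ARM state.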
Run
We program the board through JTAG with OpenOCD and GDB. The following files must be available:
- hw/design_0_wrapper.bit, the bitstream to program the FPGA,
- sw/fsbl/main.elf, the build of the First Stage Boot Loader,
- sw/app/main.elf, the build of the application,
- sw/app/sw_axi.bin, the build of the code to be placed in the AXI slave.
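Before loading sw/app/sw_axi.bin into the AXI slave, it can be worth checking on the host that the binary fits in the Block RAM. The Python sketch below assumes a capacity of 4096 bytes, which is what the 12-bit byte address of the slave module implies; the address range configured for the AXI BRAM controller is not given here and may differ. The function name check_image is hypothetical.

```python
BRAM_BYTES = 4096  # 1024 words of 32 bits (assumption derived from addr[11:0])


def check_image(image, capacity=BRAM_BYTES):
    """Return the number of free bytes left, or raise if the image is too large."""
    if len(image) > capacity:
        raise ValueError(
            f"image is {len(image)} bytes but the memory holds {capacity} bytes"
        )
    return capacity - len(image)
```

A possible use is check_image(open("sw/app/sw_axi.bin", "rb").read()), run from the directory that contains the sw folder.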
The following GDB script drives OpenOCD and programs the board:
set architecture armv7
monitor reset halt
monitor pld load 0 ./hw/design_0_wrapper.bit
monitor gdb_sync
file sw/fsbl/main.elf
load sw/fsbl/main.elf
break _boot
jump _boot
break Loop
break FsblHookFallback
continue
continue
# load the code into the AXI slave:
monitor load_image sw/app/sw_axi.bin 0x40000000
# load the application:
file sw/app/main.elf
load sw/app/main.elf
break _start
jump _start
break main
continue
continue
Makefile
Building the hardware requires a synthesis using Xilinx Vivado. A script is provided in the archive from the associated resources.
Building the software requires a compilation. Here we use gcc and ld. A script is provided in the archive from the associated resources.
Running requires two terminals: the first one runs OpenOCD and communicates with the hardware; the second one runs GDB and drives OpenOCD.
Here we use a custom OpenOCD configuration file for zynq_7000: the Cora Z7 has only one CPU core, so we must not create two targets.
Here is a Makefile that provides the recipe (make sure to indent with tabs):
OCD = ../openocd_zynq.tcl
all: openocd
openocd: ${OCD} hw/design_0_wrapper.bit gdbinit.gdb
	make -C sw
	# TO RUN IN ANOTHER TERMINAL:
	# gdb-multiarch -ex "set architecture armv7" -ex "target extended-remote localhost:3333" --command="gdbinit.gdb"
	#
	/usr/bin/openocd -f $<
hw/design_0_wrapper.bit:
	make -C hw
Associated resources
Here is an archive which contains the discussed example and its Makefiles.