Platforms and Toolchains

Code is built by a toolchain (compiler, linker) to run on a platform (e.g., a Pixel 7a cellphone).

This project adopts the Bazel framework for building importable exemplars. Platforms describe the foundation on which code will run. Toolchains compile and link code for different platforms. Bazel builds are hermetic, which for our purposes means that platforms and toolchains are all versioned and importable, so build results are the same no matter where the build host may be.

Example of RISCV-64 platforms and toolchains

The directory RISCV64/toolchain defines these platforms:

  • //platforms:riscv_userspace for a generic RISCV-64 Linux appliance with the usual libc and libstdio APIs
  • //platforms:riscv_vector for a more specialized RISCV-64 Linux appliance with vector extensions supported
  • //platforms:riscv_custom for a highly specialized RISCV-64 Linux appliance with vector and vendor-specific extensions supported
  • //platforms:riscv_local for toolchain debugging, using a local file system toolchain under /opt/riscvx

Note: The current binutils and gcc support more vendor-specific instruction set extensions from T-Head than from other vendors, so we arbitrarily use T-Head as the exemplar custom platform.

This directory defines these toolchains:

  • //toolchains:riscv64-default - a gcc-13 stable RISCV compiler, linker, loader, and sysroot of related include files and libraries
  • //toolchains:riscv64-next - a gcc-14 unreleased but feature-frozen RISCV compiler, linker, loader, and sysroot of related include files and libraries
  • //toolchains:riscv64-custom - a variant of //toolchains:riscv64-next with multiple standard and vendor-specific ISA extensions enabled by default
  • //toolchains:riscv64-local - a toolchain executing out of /opt/riscvx instead of a portable tarball. Generally useful only when debugging the generation of a fully portable and hermetic toolchain tarball.

Exemplars are built by naming the platform for each build. Bazel then finds a compatible toolchain to complete the build.

# compile for the riscv_userspace platform, automatically selecting the riscv64-default toolchain with gcc-13.
bazel build -s --platforms=//platforms:riscv_userspace gcc_vectorization:helloworld_challenge
# compile for the riscv_vector platform, automatically selecting the riscv64-next toolchain with gcc-14.
bazel build -s --platforms=//platforms:riscv_vector gcc_vectorization:helloworld_challenge
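
The same pattern extends to the other platforms listed above. As a minimal sketch, the loop below only prints one build command per platform rather than running Bazel. print_builds is a hypothetical helper name, and whether this particular target builds cleanly for every platform is an assumption.

```shell
# Hypothetical helper: print (not run) one bazel build command per platform.
# The platform names are taken from the list above; the target is passed in.
print_builds() {
    target=$1
    for platform in riscv_userspace riscv_vector riscv_custom riscv_local; do
        printf 'bazel build -s --platforms=//platforms:%s %s\n' "$platform" "$target"
    done
}

# Dry run for the exemplar target used above.
print_builds gcc_vectorization:helloworld_challenge
```

Printing the commands first makes it easy to review them, or to pipe them into sh once they look right.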

This table shows relationships between platforms, constraints, toolchains, and default options:

| platform | cpu constraint | toolchain | default options | added optimized options |
|---|---|---|---|---|
| //platforms:riscv_userspace | //toolchains:riscv64 | //toolchains:riscv64-default | | -O3 |
| //platforms:riscv_vector | //toolchains:riscv64-v | //toolchains:riscv64-next | -march=rv64gcv | -O3 |
| //platforms:riscv_custom | //toolchains:riscv64-c | //toolchains:riscv64-custom | -march=rv64gcv_zba_zbb_zbc_zbkb_zbkc_zbkx_zvbc_xtheadba_xtheadbb_xtheadbs_xtheadcmo_xtheadcondmov_xtheadmac_xtheadfmemidx_xtheadmempair_xtheadsync | -O3 |
| //platforms:riscv_local | //toolchains:riscv64-l | //toolchains:riscv64-local | | -O3 |

Notes:

  • The -O3 option is likely too aggressive. The -O2 option would be more common in broadly released software.
  • //toolchains:riscv64-default currently uses a gcc-13 toolchain suite.
  • The other toolchains use various developmental snapshots of the gcc-14 toolchain suite.
  • Version 1.0 of the vector extension is enabled by default on //toolchains:riscv64-next and //toolchains:riscv64-custom.
  • //toolchains:riscv64-custom adds bit manipulation and many of the T-Head extensions supported by binutils.

Warning: C options can be added by the toolchain, within a BUILD file, and on the command line. For options like -O and -march, only the last instance of the option affects the build.
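
To see which value wins, it can help to list the flags in the order they finally reach gcc and pick the last match. The sketch below does that in plain shell. effective_flag is a hypothetical helper, and the sample flag order (toolchain default first, later additions last) is an assumption for illustration, not a statement of the order Bazel actually uses.

```shell
# Hypothetical helper: print the last flag that starts with the given prefix.
# Usage: effective_flag PREFIX FLAG...
effective_flag() {
    prefix=$1
    shift
    result=
    for flag in "$@"; do
        case $flag in
            "$prefix"*) result=$flag ;;   # a later match overrides an earlier one
        esac
    done
    printf '%s\n' "$result"
}

# Assumed order: toolchain defaults first, then a later override.
effective_flag -O -march=rv64gcv -O3 -O2                  # prints -O2
effective_flag -march= -march=rv64gcv -O3 -march=rv64gc   # prints -march=rv64gc
```

The -s option on the bazel build commands above prints the full compiler command lines, which is one way to obtain the real flag order to feed into such a check.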

Toolchain details

Toolchains generally include several components that can affect the generated binaries:

  • the gcc compiler, built from source and configured for a specific target architecture and language set
  • binutils utilities, including a gas assembler with support for various instruction set extensions and disassembler tools like objdump that provide reference handling of newer instructions.
  • linker and linker scripts
  • a sysroot holding files the above subsystems would normally expect to find under /usr, for instance /usr/include files supplied by the kernel and standard libraries
  • libc, libstdc++, etc.
  • default compiler options and include directories

The toolchain prepared for building a kernel module won’t be the same as a toolchain built for userspace programs, even if the compilers are identical.

See adding toolchains for an example of adding a new toolchain to this project.

1 - ISA Extensions

Extensions to a processor family’s Instruction Set Architecture add capability and complexity.

The RISCV community has a rich set of extensions to the base Instruction Set Architecture. That means a diverse set of new binary import targets to test against. This work-in-progress is collected in the riscv64/generated/assemblySamples directory. The basic idea is to compare current Ghidra disassembly with current binutils objdump disassembly, using object files assembled from the binutils gas testsuite. For example:

  • riscv64/generated/assemblySamples/h-ext-64.S was copied from the binutils gas testsuite. It contains unit test instructions for hypervisor support extensions like hfence.vvma and hlv.w.
  • riscv64/exemplars/h-ext-64.o is the object file produced by a current snapshot of the binutils 2.41 assembler. The associated listing is riscv64/exemplars/h-ext-64.list.
  • riscv64/exemplars/h-ext-64.objdump is the output from disassembling riscv64/exemplars/h-ext-64.o using the current snapshot of the binutils 2.41 objdump.

So we want to open Ghidra, import riscv64/exemplars/h-ext-64.o, and compare the disassembly window to riscv64/exemplars/h-ext-64.objdump, then triage any variances.
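
For the comparison it can be convenient to reduce the objdump reference to one "address mnemonic" pair per instruction. The following is a minimal sketch, assuming the .objdump file uses the default tab-separated objdump -d layout (address, raw bytes, mnemonic, operands). objdump_mnemonics is a hypothetical helper, and the two sample lines are illustrative rather than copied from the exemplar file.

```shell
# Hypothetical helper: read objdump -d style text on stdin and print
# "address mnemonic" for every instruction line.
# Assumed line layout: "<addr>:<TAB><bytes><TAB><mnemonic><TAB><operands>"
objdump_mnemonics() {
    awk -F'\t' '/^[ \t]*[0-9a-f]+:\t/ {
        addr = $1
        gsub(/[ \t:]/, "", addr)   # strip padding and the trailing colon
        print addr, $3
    }'
}

# Illustrative input shaped like objdump output, not taken from h-ext-64.objdump.
printf '   0:\t22000073          \thfence.vvma\n   4:\t22050073          \thfence.vvma\ta0\n' |
    objdump_mnemonics
```

With the real file, the input would come from riscv64/exemplars/h-ext-64.objdump instead of printf, and the result can be checked line by line against the Ghidra listing.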

Some variances are trivial. The h-ext-64.S tests include groups of instruction aliases that assemble into the same 4-byte sequence. Disassembly yields only a single instruction for each such sequence, perhaps the simplest of the aliases.

Other variances are harder - it looks like Ghidra expects an earlier, deprecated set of vector instructions rather than the currently approved set.

riscv64/generated/assemblySamples/TODO.md collects some of the variances noted so far.

One big question is what kind of pcode Ghidra should generate for some of these instructions - and how many Ghidra users will care about that pcode. The short term answer is to treat extension instructions as pcode function calls. The longer term answer may be to wait until GCC 14 is released with support for vector extensions, then see what kind of C source is conventionally used when invoking those extensions. The memcpy inline function from libc is a likely place to find early use of vector instructions.

Also, what can we safely ignore for now? The proposed vendor-specific T-Head extension instruction th.l2cache.iall won’t be seen by most Ghidra users. On the other hand, the encoding rules published with those T-Head extensions look like a good example to follow.

The Fedora 39 kernel includes virtual machine cache management instructions that are not necessarily supported by binutils - they are ‘assembled’ with gcc macros before reaching the binutils assembler. We will ignore those instruction extensions for now, and only consider instruction extensions supported by binutils.

Determining the ISA extensions required by a binary

Some newer compilers annotate executable binaries by adding the ISA extensions used during the build.

$ /opt/riscvx/bin/riscv64-unknown-linux-gnu-readelf -A riscv64/exemplars/whisper_cpp_default
Attribute Section: riscv
File Attributes
  Tag_RISCV_stack_align: 16-bytes
  Tag_RISCV_arch: "rv64i2p1_m2p0_a2p1_f2p2_d2p2_c2p0_zicsr2p0_zmmul1p0"

$ /opt/riscvx/bin/riscv64-unknown-linux-gnu-readelf -A riscv64/exemplars/whisper_cpp_vector
Attribute Section: riscv
File Attributes
  Tag_RISCV_stack_align: 16-bytes
  Tag_RISCV_arch: "rv64i2p1_m2p0_a2p1_f2p2_d2p2_c2p0_v1p0_zicsr2p0_zifencei2p0_zmmul1p0_zve32f1p0_zve32x1p0_zve64d1p0_zve64f1p0_zve64x1p0_zvl128b1p0_zvl32b1p0_zvl64b1p0"
  Tag_RISCV_priv_spec: 1
  Tag_RISCV_priv_spec_minor: 11

$ /opt/riscvx/bin/riscv64-unknown-linux-gnu-readelf -A riscv64/exemplars/whisper_cpp_vendor
Attribute Section: riscv
File Attributes
  Tag_RISCV_stack_align: 16-bytes
  Tag_RISCV_arch: "rv64i2p1_m2p0_a2p1_f2p2_d2p2_c2p0_v1p0_zicsr2p0_zifencei2p0_zmmul1p0_zba1p0_zbb1p0_zbc1p0_zbkb1p0_zbkc1p0_zbkx1p0_zvbc1p0_zve32f1p0_zve32x1p0_zve64d1p0_zve64f1p0_zve64x1p0_zvl128b1p0_zvl32b1p0_zvl64b1p0_xtheadba1p0_xtheadbb1p0_xtheadbs1p0_xtheadcmo1p0_xtheadcondmov1p0_xtheadfmemidx1p0_xtheadmac1p0_xtheadmempair1p0_xtheadsync1p0"
  Tag_RISCV_priv_spec: 1
  Tag_RISCV_priv_spec_minor: 11

If Tag_RISCV_arch contains the substring v1p0, then the associated binary was built assuming RV Vector 1.0 extension instructions are present on the executing CPU hardware thread.
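
A plain substring search works for the samples above, but matching whole underscore-delimited tokens is a little more robust. The sketch below shows one way to do that. riscv_arch_has_ext is a hypothetical helper, not part of binutils, and it assumes each extension appears as its name followed by a major and minor version (for example v1p0).

```shell
# Hypothetical helper: succeed if extension $2 appears as a whole token
# "<ext><major>p<minor>" in the Tag_RISCV_arch string $1.
# The leading base token (e.g. rv64i2p1) is treated like any other token.
riscv_arch_has_ext() {
    printf '%s\n' "$1" | tr '_' '\n' | grep -Eq "^$2[0-9]+p[0-9]+\$"
}

# Tag_RISCV_arch values copied from the readelf output above.
default_arch="rv64i2p1_m2p0_a2p1_f2p2_d2p2_c2p0_zicsr2p0_zmmul1p0"
vector_arch="rv64i2p1_m2p0_a2p1_f2p2_d2p2_c2p0_v1p0_zicsr2p0_zifencei2p0_zmmul1p0_zve32f1p0_zve32x1p0_zve64d1p0_zve64f1p0_zve64x1p0_zvl128b1p0_zvl32b1p0_zvl64b1p0"

for arch in "$default_arch" "$vector_arch"; do
    if riscv_arch_has_ext "$arch" v; then
        echo "vector extension required"
    else
        echo "no vector extension"
    fi
done
```

The arch string itself could be pulled out of the readelf -A output with a text filter such as sed; that step is left out here because it depends on the toolchain being installed.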