Issues
Summarize Ghidra import issues here to promote discussion on relative priority and possible solutions.
Thread local storage class handling
Thread local storage provides one instance of the variable per extant thread.
GCC supports this feature as:
__thread char threadLocalString[4096];
Binaries built with this feature will often include ELF relocation codes like R_RISCV_TLS_TPREL64. This relocation code is not recognized by Ghidra, nor is it clear how TLS storage should be handled within Ghidra itself - perhaps as a memory section akin to BSS?
To reproduce, import libc.so.6 and look for lines like Elf Relocation Warning: Type = R_RISCV_TLS_TPREL64.
Alternatively, compile, link, and import riscv64/generated/userSpaceSamples/relocationTest.c.
Vector instruction support
Newer C compiler releases can replace simple loops and standard C library invocations with processor-specific vector instructions. These vector instructions are often handled poorly by Ghidra’s disassembler and worse by its decompiler. See autovectorization and vector_intrinsics for examples.
1 - inferring semantics from code patterns
How can we do a better job of recognizing semantic patterns in
optimized code? Instruction set extensions make that more challenging.
Ghidra users want to understand the intent of binary code. The semantics
and intent of the memcpy(dest,src,nbytes)
operation are pretty clear.
If the compiler converts this into a call to a named external function,
that’s easy to recognize. If it converts it into a simple inline loop of load and
store instructions, that should be recognizable too.
Optimizing compilers like gcc can generate many different instruction sequences from
a simple concept like memcpy or strnlen, especially if the processor for which the code is intended
supports advanced vector or bit manipulation instructions. We can examine the compiler
testsuite to see what those patterns can be, enabling either human or machine translation of
those sequences into the higher level semantics of memory movement or finding the first null
byte in a string.
GCC recognizes memory copy operations via the RTL operation cpymem. Calls to the standard library memcpy and various kinds of struct copies are translated into this RTL (Register Transfer Language) cpymem token. The processor-specific GCC back end then expands cpymem into a half-dozen or so instruction patterns, depending on the size, alignment, and instruction set extensions of the target processor.
In an ideal world, Ghidra would recognize all of those RTL operations as pcode operations, and further recognize all of the common back-end expansions for all processor variants. It might rewrite the decompiler window or simply add comments indicating a likely cpymem pcode expansion.
It’s enough for now to show how to gather the more common patterns to help human Ghidra operators untangle these
optimizations and understand the simpler semantics they encode.
Note: This example uses RISCV vector optimization - many other optimizations are supported by gcc too.
Patterns in gcc vectorization source code
Maybe the best reference on gcc vectorization is the gcc source code itself.
- What intrinsics are likely to be replaced with vector code?
- What patterns of vector assembly instructions are likely to be generated?
- How does the gcc test suite search for those patterns to verify intrinsic replacement is correct?
Start with:
gcc/config/riscv/riscv-vector-builtins.cc
gcc/config/riscv/riscv-vector-strings.cc
gcc/config/riscv/autovec.md
gcc/config/riscv/riscv-string.cc
Ghidra semantics use pcode operations. GCC uses something similar in RTL (Register Transfer Language). These operations are described in gcc/doc/md.texi and include:
cpymem
setmem
strlen
rawmemchr
cmpstrn
cmpstr
The cpymem op covers inline calls to memcpy and structure copies. Trace this out in riscv.md:
(define_expand "cpymem<mode>"
[(parallel [(set (match_operand:BLK 0 "general_operand")
(match_operand:BLK 1 "general_operand"))
(use (match_operand:P 2 ""))
(use (match_operand:SI 3 "const_int_operand"))])]
""
{
if (riscv_expand_block_move (operands[0], operands[1], operands[2]))
DONE;
else
FAIL;
})
riscv_expand_block_move is also mentioned in riscv-protos.h and riscv-string.cc.
Look into riscv-string.cc:
/* This function delegates block-move expansion to either the vector
implementation or the scalar one. Return TRUE if successful or FALSE
otherwise. */
bool
riscv_expand_block_move (rtx dest, rtx src, rtx length)
{
if (TARGET_VECTOR && stringop_strategy & STRATEGY_VECTOR)
{
bool ok = riscv_vector::expand_block_move (dest, src, length);
if (ok)
return true;
}
if (stringop_strategy & STRATEGY_SCALAR)
return riscv_expand_block_move_scalar (dest, src, length);
return false;
...
}
...
/* --- Vector expanders --- */
namespace riscv_vector {
/* Used by cpymemsi in riscv.md . */
bool
expand_block_move (rtx dst_in, rtx src_in, rtx length_in)
{
/*
memcpy:
mv a3, a0 # Copy destination
loop:
vsetvli t0, a2, e8, m8, ta, ma # Vectors of 8b
vle8.v v0, (a1) # Load bytes
add a1, a1, t0 # Bump pointer
sub a2, a2, t0 # Decrement count
vse8.v v0, (a3) # Store bytes
add a3, a3, t0 # Bump pointer
bnez a2, loop # Any more?
ret # Return
*/
}
}
Note that the RISCV assembly instructions in the comment are just an example, and that the C++ implementation handles many different variants. The ret instruction is not part of the expansion; it was just copied into the source code from the testsuite.
The testsuite (gcc/testsuite/gcc.target/riscv) shows which variants are common enough to test against.
A minimalist call to memcpy
void f1 (void *a, void *b, __SIZE_TYPE__ l)
{
memcpy (a, b, l);
}
** f1:
XX \.L\d+: # local label is ignored
** vsetvli\s+[ta][0-7],a2,e8,m8,ta,ma
** vle8\.v\s+v\d+,0\(a1\)
** vse8\.v\s+v\d+,0\(a0\)
** add\s+a1,a1,[ta][0-7]
** add\s+a0,a0,[ta][0-7]
** sub\s+a2,a2,[ta][0-7]
** bne\s+a2,zero,\.L\d+
** ret
*/
A typed call to memcpy
void f2 (__INT32_TYPE__* a, __INT32_TYPE__* b, int l)
{
memcpy (a, b, l);
}
Additional type information doesn’t appear to affect the inline code:
** f2:
XX \.L\d+: # local label is ignored
** vsetvli\s+[ta][0-7],a2,e8,m8,ta,ma
** vle8\.v\s+v\d+,0\(a1\)
** vse8\.v\s+v\d+,0\(a0\)
** add\s+a1,a1,[ta][0-7]
** add\s+a0,a0,[ta][0-7]
** sub\s+a2,a2,[ta][0-7]
** bne\s+a2,zero,\.L\d+
** ret
Memcpy with aligned elements and known size
In this case the arguments are aligned and 64 bytes (512 bits) in length.
extern struct { __INT32_TYPE__ a[16]; } a_a, a_b;
void f3 ()
{
memcpy (&a_a, &a_b, sizeof a_a);
}
The generated sequence varies depending on how much the compiler knows about the target architecture.
** f3: { target { { any-opts "-mcmodel=medlow" } && { no-opts "-march=rv64gcv_zvl512b" "-march=rv64gcv_zvl1024b" "--param=riscv-autovec-lmul=dynamic" "--param=riscv-autovec-lmul=m2" "--param=riscv-autovec-lmul=m4" "-
-param=riscv-autovec-lmul=m8" "--param=riscv-autovec-preference=fixed-vlmax" } } }
** lui\s+[ta][0-7],%hi\(a_a\)
** addi\s+[ta][0-7],[ta][0-7],%lo\(a_a\)
** lui\s+[ta][0-7],%hi\(a_b\)
** addi\s+a4,[ta][0-7],%lo\(a_b\)
** vsetivli\s+zero,16,e32,m8,ta,ma
** vle32.v\s+v\d+,0\([ta][0-7]\)
** vse32\.v\s+v\d+,0\([ta][0-7]\)
** ret
f3: { target { { any-opts "-mcmodel=medlow --param=riscv-autovec-preference=fixed-vlmax" "-mcmodel=medlow -march=rv64gcv_zvl512b --param=riscv-autovec-preference=fixed-vlmax" } && { no-opts "-march=rv64gcv_zvl1024b" } } }
** lui\s+[ta][0-7],%hi\(a_a\)
** lui\s+[ta][0-7],%hi\(a_b\)
** addi\s+[ta][0-7],[ta][0-7],%lo\(a_a\)
** addi\s+a4,[ta][0-7],%lo\(a_b\)
** vl(1|4|2)re32\.v\s+v\d+,0\([ta][0-7]\)
** vs(1|4|2)r\.v\s+v\d+,0\([ta][0-7]\)
** ret
** f3: { target { { any-opts "-mcmodel=medlow -march=rv64gcv_zvl1024b" "-mcmodel=medlow -march=rv64gcv_zvl512b" } && { no-opts "--param=riscv-autovec-preference=fixed-vlmax" } } }
** lui\s+[ta][0-7],%hi\(a_a\)
** lui\s+[ta][0-7],%hi\(a_b\)
** addi\s+a4,[ta][0-7],%lo\(a_b\)
** vsetivli\s+zero,16,e32,(m1|m4|mf2),ta,ma
** vle32.v\s+v\d+,0\([ta][0-7]\)
** addi\s+[ta][0-7],[ta][0-7],%lo\(a_a\)
** vse32\.v\s+v\d+,0\([ta][0-7]\)
** ret
** f3: { target { { any-opts "-mcmodel=medany" } && { no-opts "-march=rv64gcv_zvl512b" "-march=rv64gcv_zvl256b" "-march=rv64gcv_zvl1024b" "--param=riscv-autovec-lmul=dynamic" "--param=riscv-autovec-lmul=m8" "--param=riscv-autovec-lmul=m4" "--param=riscv-autovec-preference=fixed-vlmax" } } }
** lla\s+[ta][0-7],a_a
** lla\s+[ta][0-7],a_b
** vsetivli\s+zero,16,e32,m8,ta,ma
** vle32.v\s+v\d+,0\([ta][0-7]\)
** vse32\.v\s+v\d+,0\([ta][0-7]\)
** f3: { target { { any-opts "-mcmodel=medany" } && { no-opts "-march=rv64gcv_zvl512b" "-march=rv64gcv_zvl256b" "-march=rv64gcv" "-march=rv64gc_zve64d" "-march=rv64gc_zve32f" } } }
** lla\s+[ta][0-7],a_b
** vsetivli\s+zero,16,e32,m(f2|1|4),ta,ma
** vle32.v\s+v\d+,0\([ta][0-7]\)
** lla\s+[ta][0-7],a_a
** vse32\.v\s+v\d+,0\([ta][0-7]\)
** ret
*/
** f3: { target { { any-opts "-mcmodel=medany --param=riscv-autovec-preference=fixed-vlmax" } && { no-opts "-march=rv64gcv_zvl1024b" } } }
** lla\s+[ta][0-7],a_a
** lla\s+[ta][0-7],a_b
** vl(1|2|4)re32\.v\s+v\d+,0\([ta][0-7]\)
** vs(1|2|4)r\.v\s+v\d+,0\([ta][0-7]\)
** ret
2 - autovectorization
If a processor supports vector (aka SIMD) instructions, optimizing compilers will use them. That means Ghidra may need to make sense of
the generated code.
Loop autovectorization
What happens when a gcc toolchain optimizes the following code?
#include <stdio.h>
int main(int argc, char** argv){
const int N = 1320;
char s[N];
for (int i = 0; i < N - 1; ++i)
s[i] = i + 1;
s[N - 1] = '\0';
printf(s);
}
This involves a simple loop filling a character array with integers. It isn’t a well-formed C string, so the printf statement is just there to keep the character array from being optimized away.
The elements of the loop involve incremental indexing, narrowing from 16-bit to 8-bit elements, and storage in a 1320-element vector.
The result depends on the compiler version and the kind of microarchitecture the compiler was told to target.
Compile and link this file with a variety of compiler versions, flags, and microarchitectures to see how well Ghidra tracks toolchain evolution. In each case the decompiler output is manually adjusted to relabel variables like s and i and to remove extraneous declarations.
RISCV-64 gcc-13, no optimization, no vector extensions
$ bazel build -s --platforms=//platforms:riscv_userspace gcc_vectorization:narrowing_loop
...
Ghidra gives:
char s [1319];
...
int i;
...
for (i = 0; i < 0x527; i = i + 1) {
s[i] = (char)i + '\x01';
}
...
printf(s);
The loop consists of 17 instructions and 60 bytes. It is executed 1319 times.
RISCV-64 gcc-13, full optimization, no vector extensions
bazel build -s --platforms=//platforms:riscv_userspace --copt="-O3" gcc_vectorization:narrowing_loop
Ghidra gives:
long i;
char s_offset_by_1 [1320];
i = 1;
do {
s_offset_by_1[i] = (char)i;
i = i + 1;
} while (i != 0x528);
uStack_19 = 0;
printf(s_offset_by_1 + 1);
The loop consists of 4 instructions and 14 bytes. It is executed 1319 times.
Note that Ghidra has reconstructed the target vector s a bit strangely, with the beginning offset by one byte to help shorten the loop.
RISCV-64 gcc-13, full optimization, with vector extensions
bazel build -s --platforms=//platforms:riscv_userspace --copt="-O3" gcc_vectorization:narrowing_loop_vector
The Ghidra import is essentially unchanged - updating the target architecture from rv64igc to rv64igcv makes no difference when building with gcc-13.
RISCV-64 gcc-14, no optimization, no vector extensions
bazel build -s --platforms=//platforms:riscv_vector gcc_vectorization:narrowing_loop
The Ghidra import is essentially unchanged - updating gcc from gcc-13 to gcc-14 makes no difference without optimization.
RISCV-64 gcc-14, full optimization, no vector extensions
bazel build -s --platforms=//platforms:riscv_vector --copt="-O3" gcc_vectorization:narrowing_loop
The Ghidra import is essentially unchanged - updating gcc from gcc-13 to gcc-14 makes no difference when using the default target architecture without vector extensions.
RISCV-64 gcc-14, full optimization, with vector extensions
Build with -march=rv64gcv to tell the compiler to assume the processor supports RISCV vector extensions.
bazel build -s --platforms=//platforms:riscv_vector --copt="-O3" gcc_vectorization:narrowing_loop_vector
/* WARNING: Unimplemented instruction - Truncating control flow here */
halt_unimplemented();
The disassembly window shows that the loop consists of 13 instructions and 46 bytes.
Many of these are vector extension instructions for which Ghidra 11.0 has no semantics.
Different RISCV processors will take a different number of iterations to finish the loop. If the processor has VLEN=128, each vector register holds four 32-bit integers and the loop takes 330 iterations. If VLEN=1024, the loop takes 83 iterations. Either way, Ghidra 11.0 will fail to decompile any such autovectorized loop, and will fail to decompile the remainder of any function containing one.
x86-64 gcc-14, full optimization, with sapphirerapids
Note: Intel’s Sapphire Rapids includes high-end server processors like the Xeon Max family.
A more general choice for x86_64 exemplars would be -march=x86-64-v3 with -O2 optimization. We can expect full Red Hat Linux distributions built with those options soon. The x86-64-v4 microarchitecture is a generalization of the Sapphire Rapids microarchitecture, and would more likely be found in servers specialized for numerical analysis or possibly ML applications.
$ bazel build -s --platforms=//platforms:x86_64_default --copt="-O3" --copt="-march=sapphirerapids" gcc_vectorization:narrowing_loop
Ghidra 11.0’s disassembler and decompiler fail immediately on hitting the first vector instruction, vpbroadcastd, an older AVX2 vector extension instruction.
/* WARNING: Bad instruction - Truncating control flow here */
halt_baddata();
Builtin autovectorization
GCC can replace calls to some functions like memcpy with inline - and potentially vectorized - instructions.
This source file shows the different ways memcpy can be compiled.
include "common.h"
#include <string.h>
int main() {
const int N = 127;
const uint32_t seed = 0xdeadbeef;
srand(seed);
// data gen
double A[N];
gen_rand_1d(A, N);
// compute
double copy[N];
memcpy(copy, A, sizeof(A));
// prevent optimization from removing result
printf("%f\n", copy[N-1]);
}
RISCV-64 builds
Build with:
$ bazel build --platforms=//platforms:riscv_vector --copt="-O3" gcc_vectorization:memcpy_vector
Ghidra 11 gives:
undefined8 main(void)
{
undefined auVar1 [64];
undefined *puVar2;
undefined (*pauVar3) [64];
long lVar4;
long lVar5;
undefined auVar6 [256];
undefined local_820 [8];
undefined8 uStack_818;
undefined auStack_420 [1032];
srand(0xdeadbeef);
puVar2 = auStack_420;
gen_rand_1d(auStack_420,0x7f);
pauVar3 = (undefined (*) [64])local_820;
lVar4 = 0x3f8;
do {
lVar5 = vsetvli_e8m8tama(lVar4);
auVar6 = vle8_v(puVar2);
lVar4 = lVar4 - lVar5;
auVar1 = vse8_v(auVar6);
*pauVar3 = auVar1;
puVar2 = puVar2 + lVar5;
pauVar3 = (undefined (*) [64])(*pauVar3 + lVar5);
} while (lVar4 != 0);
printf("%f\n",uStack_818);
return 0;
}
What would we like the decompiler to show instead? The memcpy pattern should be fairly general and stable.
src = auStack_420;
gen_rand_1d(auStack_420,0x7f);
dest = (undefined (*) [64])local_820;
/* char* dest, src;
dest[0..n] ≔ src[0..n]; */
n = 0x3f8;
do {
lVar2 = vsetvli_e8m8tama(n);
auVar3 = vle8_v(src);
n = n - lVar2;
auVar1 = vse8_v(auVar3);
*dest = auVar1;
src = src + lVar2;
dest = (undefined (*) [64])(*dest + lVar2);
} while (n != 0);
More generally, we want a precomment showing the memcpy in vector terms immediately before the loop. The type definition of dest is a red herring to be dealt with.
x86-64 builds
Build with:
$ bazel build -s --platforms=//platforms:x86_64_default --copt="-O3" --copt="-march=sapphirerapids" gcc_vectorization:memcpy_sapphirerapids
Ghidra 11.0’s disassembler and decompiler bail out when they reach the inline replacement for memcpy - gcc-14 has replaced the call with vector instructions like vmovdqu64, which Ghidra does not recognize.
void main(void)
{
undefined auStack_428 [1016];
undefined8 uStack_30;
uStack_30 = 0x4010ad;
srand(0xdeadbeef);
gen_rand_1d(auStack_428,0x7f);
/* WARNING: Bad instruction - Truncating control flow here */
halt_baddata();
}
3 - vector intrinsics
Invoking RISCV vector instructions from C.
RISCV vector intrinsic functions can be coded directly in C or C++. The RISC-V vector intrinsics specification includes examples of code that might be found shortly in libc:
void *memcpy_vec(void *restrict destination, const void *restrict source,
size_t n) {
unsigned char *dst = destination;
const unsigned char *src = source;
// copy data byte by byte
for (size_t vl; n > 0; n -= vl, src += vl, dst += vl) {
vl = __riscv_vsetvl_e8m8(n);
vuint8m8_t vec_src = __riscv_vle8_v_u8m8(src, vl);
__riscv_vse8_v_u8m8(dst, vec_src, vl);
}
return destination;
}
Note: GCC-14 autovectorization will often convert normal calls to memcpy into something very similar to the memcpy_vec code above, then assemble it down to RISCV vector instructions.
As another example, here is a snippet of code from the whisper.cpp voice-to-text open source project:
...
#ifdef __riscv_v_intrinsic
#include <riscv_vector.h>
#endif
...
#elif defined(__riscv_v_intrinsic)
size_t vl = __riscv_vsetvl_e32m4(QK8_0);
for (int i = 0; i < nb; i++) {
// load elements
vfloat32m4_t v_x = __riscv_vle32_v_f32m4(x+i*QK8_0, vl);
vfloat32m4_t vfabs = __riscv_vfabs_v_f32m4(v_x, vl);
vfloat32m1_t tmp = __riscv_vfmv_v_f_f32m1(0.0f, vl);
vfloat32m1_t vmax = __riscv_vfredmax_vs_f32m4_f32m1(vfabs, tmp, vl);
float amax = __riscv_vfmv_f_s_f32m1_f32(vmax);
...
}
Normally you would expect to see functions like __riscv_vfabs_v_f32m4 defined in the include file riscv_vector.h, where Ghidra could process them and help identify calls to these intrinsics. Instead, the vector intrinsic functions are autogenerated directly into GCC’s internal compiled header format when the compiler is built - there are just too many variants to cope with. The PDF listing of all intrinsic functions is currently over 4000 pages long. For example, the signature for __riscv_vfredmax_vs_f32m4_f32m1 is given on page 734 under Vector Reduction Operations as
vfloat32m1_t __riscv_vfredmax_vs_f32m4_f32m1(vfloat32m4_t vs2, vfloat32m1_t vs1, size_t vl);
There aren’t all that many vector instruction genotypes, but there are an enormous number of contextual variations the compiler and assembler know about.
4 - link time optimization
Link Time Optimization (LTO) is a relatively new form of toolchain optimization that can produce smaller and faster binaries. It can also mutate control flow in those binaries, making Ghidra analysis trickier, especially if one is using BSim to look for control flow similarities.
Can we generate importable exemplars using LTO to show what such optimization steps look like in Ghidra?
LTO needs a command line parameter added for both compilation and linking. With bazel, adding --copt="-flto" --linkopt="-Wl,-flto" is enough to request LTO optimization on a build. These LTO flags can also be defaulted into the toolchain definition or individual build files.
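For example, the flags could be defaulted in a build rule; a hypothetical BUILD entry (target and source names are illustrative):

```python
# Hypothetical cc_binary rule with LTO flags baked in.
cc_binary(
    name = "memcpy_lto",
    srcs = ["memcpy.c"],
    copts = ["-O3", "-flto"],        # LTO at compile time
    linkopts = ["-Wl,-flto"],        # LTO at link time
)
```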
Let’s try this with a progressively more complicated series of exemplars.
# Build helloworld without LTO as a control
$ bazel build -s --copt="-O2" --platforms=//platforms:riscv_vector userSpaceSamples:helloworld
...
$ ls -l bazel-bin/userSpaceSamples/helloworld
-r-xr-xr-x. 1 --- --- 8624 Jan 31 10:44 bazel-bin/userSpaceSamples/helloworld
# The helloworld exemplar doesn't benefit much from link time optimization
$ bazel build -s --copt="-O2" --copt="-flto" --linkopt="-Wl,-flto" --platforms=//platforms:riscv_vector userSpaceSamples:helloworld
$ ls -l bazel-bin/userSpaceSamples/helloworld
-r-xr-xr-x. 1 --- --- 8608 Jan 31 10:46 bazel-bin/userSpaceSamples/helloworld
The memcpy source exemplar can be built three ways:
- without vector extensions and without LTO - build target gcc_vectorization:memcpy
- with vector extensions and without LTO - build target gcc_vectorization:memcpy_vector
- with vector extensions and with LTO - build target gcc_vectorization:memcpy_lto
In this case the LTO options are configured into gcc_vectorization/BUILD.
$ bazel build -s --platforms=//platforms:riscv_vector gcc_vectorization:memcpy
...
INFO: Build completed successfully ...
$ ls -l bazel-bin/gcc_vectorization/memcpy
-r-xr-xr-x. 1 --- --- 13488 Jan 31 11:16 bazel-bin/gcc_vectorization/memcpy
$ bazel build -s --platforms=//platforms:riscv_vector gcc_vectorization:memcpy_vector
INFO: Build completed successfully ...
$ ls -l bazel-bin/gcc_vectorization/memcpy_vector
-r-xr-xr-x. 1 --- --- 13728 Jan 31 11:18 bazel-bin/gcc_vectorization/memcpy_vector
$ bazel build -s --platforms=//platforms:riscv_vector gcc_vectorization:memcpy_lto
ERROR: ...: Linking gcc_vectorization/memcpy_lto failed: (Exit 1): gcc failed: error executing CppLink command (from target //gcc_vectorization:memcpy_lto) ...
lto1: internal compiler error: in riscv_hard_regno_nregs, at config/riscv/riscv.cc:8058
Please submit a full bug report, with preprocessed source (by using -freport-bug).
See <https://gcc.gnu.org/bugs/> for instructions.
lto-wrapper: fatal error: external/gcc-14-riscv64-suite/bin/riscv64-unknown-linux-gnu-gcc returned 1 exit status
compilation terminated.
So it looks like LTO has problems with RISCV vector instructions. We’ll keep testing this as more gcc-14 snapshots become available, but as a lower priority exercise, since LTO does not yet appear to be a widely used optimization.
5 - testing pcode semantics
Ghidra processor semantics needs tests
Processor instructions known to Ghidra are defined in Sleigh pcode or semantic sections.
Adding new instructions - such as instruction set extensions to an existing processor -
requires a pcode description of what that instruction does. That pcode drives both the decompiler
process and any emulator or debugger processes.
This generates a conflict in testing. Should we test for maximum clarity for semantics rendered
in the decompiler window or maximum fidelity in any Ghidra emulator? For example,
should a divide instruction include pcode to test against a divide-by-zero? Should floating point
instructions guard against NaN (Not a Number) inputs?
We assume here that decompiler fidelity is more important than emulator fidelity. That implies:
- ignore any exception-generating cases, including divide-by-zero, NaN, memory access and memory alignment.
- pcode must allow for normal C implicit type conversions, such as between different integer and
floating point lengths.
- this implies pcode must pay attention to Ghidra’s type inference system.
Concept of Operations
Individual instructions are wrapped in C and exercised within a Google Test C++ framework.
The test framework is then executed within a qemu static emulation environment.
For example, let’s examine two riscv-64 instructions: fcvt.w.s and fmv.x.w.
fcvt.w.s converts a floating-point number in floating-point register rs1 to a signed 32-bit or 64-bit integer in integer register rd.
fmv.x.w moves the single-precision value in floating-point register rs1, represented in IEEE 754-2008 encoding, to the lower 32 bits of integer register rd. For RV64, the upper 32 bits of the destination register are filled with copies of the floating-point number’s sign bit.
These two instructions have similar signatures but very different semantics. fcvt.w.s performs a float-to-int type conversion, so the float 1.0 can be converted to the int 1. fmv.x.w moves the raw bits between float and int registers without any type conversion.
We can generate simple exemplars of both instructions with this C code:
int fcvt_w_s(float* x) {
return (int)*x;
}
int fmv_x_w(float* x) {
int val;
float src = *x;
__asm__ __volatile__ (
"fmv.x.w %0, %1" \
: "=r" (val) \
: "f" (src));
return val;
}
Ghidra’s 11.2-DEV decompiler renders these as:
long fcvt_w_s(float *param_1)
{
return (long)(int)*param_1;
}
long fmv_x_w(float *param_1)
{
return (long)(int)param_1;
}
The fmv_x_w rendering is missing a dereference operation. It also implies a type conversion when none is actually performed. Let’s trace how to use these test results to improve the decompiler output.
Running tests
The draft test harness can be built and run from the top level workspace directory.
$ bazel build --platforms=//riscv64/generated/platforms:riscv_userspace riscv64/generated/emulator_tests:testSemantics
Starting local Bazel server and connecting to it...
INFO: Analyzed target //riscv64/generated/emulator_tests:testSemantics (74 packages loaded, 1902 targets configured).
...
INFO: From Executing genrule //riscv64/generated/emulator_tests:testSemantics:
...
INFO: Found 1 target...
Target //riscv64/generated/emulator_tests:testSemantics up-to-date:
bazel-bin/riscv64/generated/emulator_tests/results
INFO: Elapsed time: 4.172s, Critical Path: 1.93s
INFO: 4 processes: 1 internal, 3 linux-sandbox.
INFO: Build completed successfully, 4 total actions
$ cat bazel-bin/riscv64/generated/emulator_tests/results
[==========] Running 4 tests from 1 test suite.
[----------] Global test environment set-up.
[----------] 4 tests from FP
[ RUN ] FP.testharness
[ OK ] FP.testharness (3 ms)
[ RUN ] FP.fcvt
[ OK ] FP.fcvt (10 ms)
[ RUN ] FP.fmv
[ OK ] FP.fmv (1 ms)
[ RUN ] FP.fp16
[ OK ] FP.fp16 (0 ms)
[----------] 4 tests from FP (15 ms total)
[----------] Global test environment tear-down
[==========] 4 tests from 1 test suite ran. (19 ms total)
[ PASSED ] 4 tests.
$ file bazel-bin/riscv64/generated/emulator_tests/libfloatOperations.so
bazel-bin/riscv64/generated/emulator_tests/libfloatOperations.so: ELF 64-bit LSB shared object, UCB RISC-V, RVC, double-float ABI, version 1 (SYSV), dynamically linked, not stripped
Test parameters
What version of Ghidra are we testing against?
What do we do with Ghidra patches that improve the decompilation results?
- If the patched instructions only exist within the isa_ext fork, we will make the changes to that fork and submit a PR
- If the patches come from unmerged PRs we may cherry-pick them into isa_ext. This includes some of the fcvt and fmv patches from other sources.
Test example
Use meld to compare the original C source file floatOperations.c with the exported C decompiler view from Ghidra 11.2-DEV. A quick inspection shows some errors to address:
| Original | Ghidra |
| --- | --- |
| float fcvt_s_wu(uint32_t* i) {return (float)*i;} | float fcvt_s_wu(uint *param_1){return (float)ZEXT416(*param_1);} |
| double fcvt_d_wu(uint32_t* j){return (double)*j;} | double fcvt_d_wu(uint *param_1){return (double)ZEXT416(*param_1);} |
| | long fmv_x_w(float *param_1){return (long)(int)*param_1;} |
| | long fmv_x_d(double *param_1){return (long)(int)*param_1;} |
| | long fcvt_h_w(int param_1){return (long)param_1;} |
| | long fcvt_h_wu(int param_1){return (long)param_1;} |
| | ulong fcvt_h_d(ulong *param_1){return *param_1 & 0xffffffff;} |
The errors include:
- spurious ZEXT416 in two places
- fmv instructions appear to force an implicit type conversion where none is wanted
- missing dereference operation in fcvt_h_w and fcvt_h_wu
- bad mask operation in fcvt_h_d
Next steps
Testing semantics for the zfh half-precision floating point instructions is more complicated than usual. Ghidra’s semantics and pcode system has no known provision for half-precision floating point, so emulation won’t work well. The current zfh implementation makes these _fp16 objects look like 32-bit floats in registers and like 16-bit shorts in memory operations, making Ghidra type inferencing even more confusing.
Let’s look at a more limited scope: the definition of the Ghidra trunc pcode op. The documentation says trunc produces a signed integer obtained by truncating its argument.
- how does trunc set its result type?
- does trunc expect only a floating point double?
- what would it take to define trunc_u to generate an unsigned integer?
- what would it take to accept a half-precision floating point value as an argument?
The documentation also says that float2float ‘copies a floating-point number with more or less precision’, so its implementation may tell us something about type inferencing.
Ghidra/Features/Decompiler/src/decompile/cpp/pcodeparse.cc binds float2float to OP_FLOAT2FLOAT; this leads to CPUI_FLOAT_FLOAT2FLOAT and to several files under Ghidra/Features/Decompiler/src/decompile/cpp.
- functions like FloatFormat::opFloat2Float and FloatFormat::opTrunc look relevant in float.hh and float.cc