1 - Application Survey
Survey a voice-to-text app for common vector instruction patterns
Take an exemplar RISCV-64 binary like whisper.cpp, with its many vector instructions.
Which vector patterns are easy to recognize, either for a human Ghidra user or for a hypothetical Ghidra plugin?
Some of the most common patterns correspond to memcpy or memset invocations where both the number of bytes
and the alignment of the operands are known at compile time.
ML apps like whisper.cpp
often work with parameters of less than 8 bits, so there can be a lot of demarshalling, unpacking,
and repacking operations. That means lots of vector bit manipulation and width conversion operations.
ML apps also do a lot of vector, matrix, and tensor arithmetic, so we can expect to find vectorized arithmetic operations
mixed in with vector parameter conversion operations.
Note: This page is likely to change rapidly as we get a better handle on the problem and develop better analytic tools
to guide the process.
Survey for vector instruction blocks
Most vector instructions come in groups that start with a vsetvli or vsetivli instruction to set up the vector context.
If the number of vector elements is known at compile time and is less than 32, then the vsetivli instruction is often used.
Otherwise the vsetvli instruction is used.
Scanning for these instructions showed 673 vsetvli and 888 vsetivli instructions within whisper.cpp.
The most common vsetvli instruction (343 out of 673) is type 0xc3 or e8,m8,ta,ma. That expands to:
- element width = 8 bits - no alignment checks are needed, 16 elements per vector register if VLEN=128
- multiplier = 8 - up to 8 vector registers are processed in parallel
- tail agnostic - we don’t care about preserving unassigned vector register bits
- mask agnostic - we don’t care about preserving masked-off vector register elements
The most common vsetivli instruction (565 out of 888) is type 0xd8 or e64,m1,ta,ma. That expands to:
- element width = 64 bits - all memory operations should be 64 bit aligned, 2 elements per vector register if VLEN=128
- multiplier = 1 - only the named vector register is used
- tail agnostic - we don’t care about preserving unassigned vector register bits
- mask agnostic - we don’t care about preserving masked-off vector register elements
Another common vsetivli instruction (102 out of 888) is type 0xdb or e64,m8,ta,ma. That expands to:
- element width = 64 bits - all memory operations should be 64 bit aligned, 2 elements per vector register if VLEN=128
- multiplier = 8 - up to 8 vector registers are processed in parallel, or 16 64 bit elements if VLEN=128
- tail agnostic - we don’t care about preserving unassigned vector register bits
- mask agnostic - we don’t care about preserving masked-off vector register elements
The second most common vsetivli instruction (107 out of 888) is type 0xc7 or e8,mf2,ta,ma. That expands to:
- element width = 8 bits
- multiplier = 1/2 - vector registers are only half used, perhaps to allow element widening to 16 bits
- tail agnostic - we don’t care about preserving unassigned vector register bits
- mask agnostic - we don’t care about preserving masked-off vector register elements
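The type codes expanded above can be decoded mechanically. The following is a minimal sketch, assuming the RVV 1.0 vtype layout (vma in bit 7, vta in bit 6, vsew in bits 5:3, vlmul in bits 2:0). The function names decode_vtype and vlmax are hypothetical helpers for illustration, not part of Ghidra or any existing tool.

```python
from fractions import Fraction

# vlmul field -> (assembler name, register group multiplier); encoding 4 is reserved.
LMUL = {
    0: ("m1", Fraction(1)), 1: ("m2", Fraction(2)),
    2: ("m4", Fraction(4)), 3: ("m8", Fraction(8)),
    5: ("mf8", Fraction(1, 8)), 6: ("mf4", Fraction(1, 4)),
    7: ("mf2", Fraction(1, 2)),
}

def _fields(vtype):
    """Split a vtype immediate into (sew_bits, lmul_name, lmul_value, ta, ma)."""
    vlmul = vtype & 0x7
    vsew = (vtype >> 3) & 0x7
    if vlmul not in LMUL or vsew > 3:
        raise ValueError(f"reserved vtype encoding: {vtype:#x}")
    name, value = LMUL[vlmul]
    return 8 << vsew, name, value, bool(vtype & 0x40), bool(vtype & 0x80)

def decode_vtype(vtype):
    """Return the assembler spelling of a vtype immediate, e.g. 'e8,m8,ta,ma'."""
    sew, name, _, ta, ma = _fields(vtype)
    return f"e{sew},{name},{'ta' if ta else 'tu'},{'ma' if ma else 'mu'}"

def vlmax(vtype, vlen_bits=128):
    """Maximum element count per register group: LMUL * VLEN / SEW."""
    sew, _, lmul, _, _ = _fields(vtype)
    return int(lmul * vlen_bits / sew)

for code in (0xC3, 0xD8, 0xDB, 0xC7):
    print(f"{code:#04x}: {decode_vtype(code):<14} VLMAX={vlmax(code)} at VLEN=128")
```

The VLMAX column makes the register-group arithmetic in the lists above explicit for a VLEN of 128 bits.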
How many of these vector blocks can be treated as simple memcpy
or memset
invocations?
For example, this Ghidra listing snippet looks like a good candidate for memcpy
:
00090bdc 57 f0 b7 cd vsetivli zero,0xf,e64,m8,ta,ma
00090be0 07 74 07 02 vle64.v v8,(a4)
00090be4 27 f4 07 02 vse64.v v8,(a5)
A pcode equivalent might be __builtin_memcpy(dest=(a5), src=(a4), 8 * 15)
with a possible context note that
vector registers v8 through v15 are changed.
A longer example might be a good candidate for memset
:
00090b84 57 70 81 cd vsetivli zero,0x2,e64,m1,ta,ma
00090b88 93 07 07 01 addi a5,a4,0x10
00090b8c d7 30 00 5e vmv.v.i v1,0x0
00090b90 a7 70 07 02 vse64.v v1,(a4)
00090b94 a7 f0 07 02 vse64.v v1,(a5)
00090b98 93 07 07 02 addi a5,a4,0x20
00090b9c a7 f0 07 02 vse64.v v1,(a5)
00090ba0 93 07 07 03 addi a5,a4,0x30
00090ba4 a7 f0 07 02 vse64.v v1,(a5)
00090ba8 93 07 07 04 addi a5,a4,0x40
00090bac a7 f0 07 02 vse64.v v1,(a5)
00090bb0 93 07 07 05 addi a5,a4,0x50
00090bb4 a7 f0 07 02 vse64.v v1,(a5)
00090bb8 93 07 07 06 addi a5,a4,0x60
00090bbc a7 f0 07 02 vse64.v v1,(a5)
00090bc0 fd 1b c.addi s7,-0x1
00090bc2 23 38 07 06 sd zero,0x70(a4)
This example is based on a minimum VLEN of 128 bits, so each vector register can hold two 64 bit elements. The vmv.v.i
instruction sets those two elements of v1
to zero.
Seven vse64.v
instructions then store two 64 bit zeros each to successive memory locations, with a trailing scalar double word store to handle the tail.
A pcode equivalent for this sequence might be __builtin_memset(dest=(a4), 0, 0x78)
.
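The 0x78 byte count can be checked by listing each store as an (offset, size) pair and confirming that the stores tile the destination without gaps or overlaps. This is a minimal sketch. The helper name covered_bytes is hypothetical, and the offsets are read from the addi immediates and the final sd in the listing above.

```python
def covered_bytes(stores):
    """Return the total size if the stores tile [0, total) exactly, else None."""
    expected = 0
    for offset, size in sorted(stores):
        if offset != expected:      # a gap or an overlap breaks the memset reading
            return None
        expected += size
    return expected

VL = 2            # vsetivli zero,0x2,...
SEW_BYTES = 8     # e64

# Seven vse64.v stores at a4+0x00, a4+0x10, ... a4+0x60, each writing VL elements.
vector_stores = [(offset, VL * SEW_BYTES) for offset in range(0x00, 0x70, 0x10)]
# One trailing scalar store: sd zero,0x70(a4)
scalar_tail = [(0x70, 8)]

total = covered_bytes(vector_stores + scalar_tail)
print(hex(total))
```

A recognizer along these lines would also need to confirm that every store writes the same fill value before reporting a memset.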
top down scan of vector blocks
The python script objdump_analytic.py
provides a crude scan of a RISCV-64 binary, reporting on likely vector instruction blocks. It doesn’t handle blocks with more than one
vsetvli
or vsetivli
instruction, something common in vector narrowing or widening operations. If we apply this script to whisper_cpp_vector
we can collect a crude field guide to vector expansions.
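To give a feel for what such a scan involves, here is a minimal sketch. It is not the actual objdump_analytic.py script. The function name vector_blocks, the regular expression, and the assumed "address: encoding mnemonic operands" line shape are all illustrative assumptions. A block opens at each vsetvli or vsetivli, and scalar instructions inside a block are skipped.

```python
import re
from collections import Counter

# Assumed line shape: "<hex address>: <hex encoding> <mnemonic> <operands>"
LINE_RE = re.compile(r"^\s*([0-9a-f]+):\s+([0-9a-f]+)\s+(\S+)\s*(.*)$")
VSET = ("vsetvli", "vsetivli")

def vector_blocks(listing):
    """Group vector mnemonics under the vset* instruction that precedes them."""
    blocks, current = [], None
    for line in listing.splitlines():
        match = LINE_RE.match(line)
        if not match:
            continue                      # labels, blank lines, section headers
        addr, _encoding, mnemonic, operands = match.groups()
        if mnemonic in VSET:
            parts = operands.strip().split(",", 2)   # rd, avl, vtype spelling
            config = parts[2] if len(parts) == 3 else ""
            current = {"addr": int(addr, 16), "config": config, "ops": []}
            blocks.append(current)
        elif current is not None and mnemonic.startswith("v"):
            current["ops"].append(mnemonic)          # scalar instructions are skipped
    return blocks

SAMPLE = """\
1d3da: 0c377057 vsetvli zero,a4,e8,m8,ta,ma
1d3de: 02068407 vle8.v v8,(a3)
1d3e2: 02050427 vse8.v v8,(a0)
1d868: 0c3577d7 vsetvli a5,a0,e8,m8,ta,ma
1d86c: 02088407 vle8.v v8,(a7)
1d872: 02080427 vse8.v v8,(a6)
"""

blocks = vector_blocks(SAMPLE)
summary = Counter((b["config"], tuple(b["ops"])) for b in blocks)
for (config, ops), count in summary.most_common():
    print(count, config, " ".join(ops))
```

Counting identical (configuration, mnemonic sequence) pairs is one simple way to rank which patterns deserve an entry in a field guide.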
VLEN in the following is the hart’s vector length, determined at execution time. It is usually something like 128 bits for a general purpose core (aka hart) and up to 1024 bits
for a dedicated accelerator hart.
memcpy with known and limited nbytes
This pattern is often found when copying objects of known and limited size. It is useful with objects as small as 4 bytes if the source alignment is
unknown and the destination object must be aligned on half-word, word, or double-word boundaries.
; memcpy(dest=a0, src=a3, nbytes=a4) where a4 <= 8 * (VLEN/8)
1d3da: 0c377057 vsetvli zero,a4,e8,m8,ta,ma
1d3de: 02068407 vle8.v v8,(a3)
1d3e2: 02050427 vse8.v v8,(a0)
memcpy with unknown nbytes
This pattern is usually found in a simple loop, moving up to 8 * (VLEN/8) bytes at a time.
The a5 register holds the number of bytes processed per iteration.
; memcpy(dest=a6, src=a7, nbytes=a0)
1d868: 0c3577d7 vsetvli a5,a0,e8,m8,ta,ma
1d86c: 02088407 vle8.v v8,(a7)
1d872: 02080427 vse8.v v8,(a6)
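The loop around these three instructions can be modeled in a few lines. This is a minimal sketch, assuming the simplest legal vsetvli behavior, vl = min(remaining, VLMAX). Real hardware may choose a smaller vl when the remaining count lies between VLMAX and 2 * VLMAX. The function name vector_memcpy is hypothetical.

```python
def vector_memcpy(dest, src, nbytes, vlen_bits=128):
    """Copy nbytes from src to dest in strips; return the iteration count."""
    vlmax = 8 * (vlen_bits // 8)     # e8,m8: eight registers of VLEN/8 bytes each
    done = 0
    iterations = 0
    while done < nbytes:
        vl = min(nbytes - done, vlmax)                  # vsetvli a5,a0,e8,m8,ta,ma
        dest[done:done + vl] = src[done:done + vl]      # vle8.v v8,(a7) ; vse8.v v8,(a6)
        done += vl                                      # scalar pointer and count updates
        iterations += 1
    return iterations

src = bytes(range(256)) * 2
dest = bytearray(len(src))
iterations = vector_memcpy(dest, src, 300)
print(iterations, dest[:300] == src[:300])
```

With VLEN=128 each strip moves at most 128 bytes, so a 300 byte copy takes three iterations and the last one handles the 44 byte remainder without a separate scalar tail loop.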
widening floating point reduction
The next example appears to be compiled from estimate_diarization_speaker
whose source is:
double energy0 = 0.0f;
double energy1 = 0.0f;
for (int64_t j = is0; j < is1; j++) {
energy0 += fabs(pcmf32s[0][j]);
energy1 += fabs(pcmf32s[1][j]);
}
This is a typical reduction with widening pattern.
The vector instructions generated are:
242ce: 0d8077d7 vsetvli a5,zero,e64,m1,ta,ma
242d2: 5e0031d7 vmv.v.i v3,0
242d6: 9e303257 vmv1r.v v4,v3
242da: 0976f7d7 vsetvli a5,a3,e32,mf2,tu,ma
242e4: 0205e107 vle32.v v2,(a1)
242e8: 02066087 vle32.v v1,(a2)
242ec: 2a211157 vfabs.v v2,v2
242f0: 2a1090d7 vfabs.v v1,v1
242f8: d2411257 vfwadd.wv v4,v4,v2
242fc: d23091d7 vfwadd.wv v3,v3,v1
24312: 0d8077d7 vsetvli a5,zero,e64,m1,ta,ma
24316: 4207d0d7 vfmv.s.f v1,fa5
2431a: 063091d7 vfredusum.vs v3,v3,v1
2431e: 42301757 vfmv.f.s fa4,v3
24326: 06409257 vfredusum.vs v4,v4,v1
2432a: 424017d7 vfmv.f.s fa5,v4
A hypothetical vectorized Ghidra might decompile these instructions (ignoring the scalar instructions not displayed here) as:
double vector v3, v4; // SEW=64 bit
v3 := vector 0; // load immediate
v4 := v3; // vector copy
float vector v1, v2; // SEW=32 bit
while(...) {
v2 = vector *a1;
v1 = vector *a2;
v2 = abs(v2);
v1 = abs(v1);
v4 = v4 + v2; // widening 32 to 64 bits
v3 = v3 + v1; // widening 32 to 64 bits
}
double vector v1, v3, v4;
v1[0] = fa5; // fa5 is the scalar 'carry-in'
v3[0] = v1[0] + ⅀ v3; // unordered vector reduction
fa4 = v3[0];
v4[0] = v1[0] + ⅀ v4;
fa5 = v4[0];
The vector instruction vfredusum.vs
provides the unordered reduction sum over the elements of a single vector. That’s likely faster than an ordered sum,
but floating point addition is not associative, so the rounded result can differ from an ordered sum and between hardware implementations.
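The effect of summation order is easy to reproduce with ordinary double precision arithmetic. The sketch below compares a strictly ordered sum with one of many schedules an unordered reduction is allowed to use. The function names and the input values are illustrative assumptions, and the pairing shown is not a claim about any particular hart.

```python
def ordered_sum(values, carry_in=0.0):
    """Add the elements strictly left to right, as an ordered reduction would."""
    total = carry_in
    for value in values:
        total += value
    return total

def strided_pair_sum(values, carry_in=0.0):
    """One possible unordered schedule: repeatedly add lane i to lane i + half.

    Assumes the element count is a power of two.
    """
    lanes = list(values)
    while len(lanes) > 1:
        half = len(lanes) // 2
        lanes = [lanes[i] + lanes[i + half] for i in range(half)]
    return carry_in + lanes[0]

values = [1e16, 1.0, -1e16, 1.0]
print(ordered_sum(values))        # 1.0
print(strided_pair_sum(values))   # 2.0
```

In the ordered case the first 1.0 is absorbed by rounding when added to 1e16, while the strided schedule cancels the large terms first and keeps both small ones.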
Note: this whisper.cpp
routine attempts to recognize which of two speakers is responsible for each word of a conversation. A speaker-misattribution
exploit might attack functions that call this.
complex structure element copy
The source code includes:
static drwav_uint64 drwav_read_pcm_frames_s16__msadpcm(drwav* pWav, drwav_uint64 framesToRead, drwav_int16* pBufferOut) {
...
pWav->msadpcm.bytesRemainingInBlock = pWav->fmt.blockAlign - sizeof(header);
pWav->msadpcm.predictor[0] = header[0];
pWav->msadpcm.predictor[1] = header[1];
pWav->msadpcm.delta[0] = drwav__bytes_to_s16(header + 2);
pWav->msadpcm.delta[1] = drwav__bytes_to_s16(header + 4);
pWav->msadpcm.prevFrames[0][1] = (drwav_int32)drwav__bytes_to_s16(header + 6);
pWav->msadpcm.prevFrames[1][1] = (drwav_int32)drwav__bytes_to_s16(header + 8);
pWav->msadpcm.prevFrames[0][0] = (drwav_int32)drwav__bytes_to_s16(header + 10);
pWav->msadpcm.prevFrames[1][0] = (drwav_int32)drwav__bytes_to_s16(header + 12);
pWav->msadpcm.cachedFrames[0] = pWav->msadpcm.prevFrames[0][0];
pWav->msadpcm.cachedFrames[1] = pWav->msadpcm.prevFrames[1][0];
pWav->msadpcm.cachedFrames[2] = pWav->msadpcm.prevFrames[0][1];
pWav->msadpcm.cachedFrames[3] = pWav->msadpcm.prevFrames[1][1];
pWav->msadpcm.cachedFrameCount = 2;
...
}
This gets vectorized into sequences containing:
2c6ce: ccf27057 vsetivli zero,4,e16,mf2,ta,ma ; vl=4, SEW=16
2c6d2: 5e06c0d7 vmv.v.x v1,a3 ; v1[0..3] = a3
2c6d6: 3e1860d7 vslide1down.vx v1,v1,a6 ; v1 = v1[1:3], a6
2c6da: 3e1760d7 vslide1down.vx v1,v1,a4 ; v1 = v1[1:3], a4
2c6de: 3e1560d7 vslide1down.vx v1,v1,a0 ; v1 = (a3,a6,a4,a0)
2c6e2: 0d007057 vsetvli zero,zero,e32,m1,ta,ma ; keep existing vl (=4), SEW=32
2c6e6: 4a13a157 vsext.vf2 v2,v1 ; v2 = vector sext(v1) // widening sign extend
2c6ea: 0207e127 vse32.v v2,(a5) ; memcpy(a5, v2, 4 * 4)
2c6f2: 0a07d087 vlse16.v v1,(a5),zero ; v1[0..3] = *(int16 *)a5 // stride 0 load broadcasts one element
2c6fa: 0cf07057 vsetvli zero,zero,e16,mf2,ta,ma
2c702: 3e1660d7 vslide1down.vx v1,v1,a2 ; v1 = v1[1:3], a2
2c70a: 3e16e0d7 vslide1down.vx v1,v1,a3 ; v1 = v1[1:3], a3
2c70e: 3e1760d7 vslide1down.vx v1,v1,a4 ; v1 = v1[1:3], a4
2c712: 0d007057 vsetvli zero,zero,e32,m1,ta,ma
2c716: 4a13a157 vsext.vf2 v2,v1
2c71a: 0205e127 vse32.v v2,(a1)
That’s the kind of messy code you could analyze if you had to. Hopefully not.
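The first half of that sequence becomes easier to follow when the vector register is modeled as a Python list. This is a minimal sketch with vl fixed at 4. The helper names mirror the instruction mnemonics but are hypothetical, and the scalar register values are made-up test inputs.

```python
def vmv_v_x(x, vl=4):
    """vmv.v.x: splat a scalar into every element."""
    return [x & 0xFFFF] * vl

def vslide1down_vx(v, x):
    """vslide1down.vx: drop element 0, shift down, insert x at the top."""
    return v[1:] + [x & 0xFFFF]

def vsext_vf2(v):
    """vsext.vf2: sign extend each 16 bit element to twice its width."""
    return [e - 0x10000 if e & 0x8000 else e for e in v]

# Made-up scalar inputs standing in for a3, a6, a4, a0.
a3, a6, a4, a0 = 0x0001, 0xFFFF, 0x7FFF, 0x8000

v1 = vmv_v_x(a3)              # [a3, a3, a3, a3]
v1 = vslide1down_vx(v1, a6)   # [a3, a3, a3, a6]
v1 = vslide1down_vx(v1, a4)   # [a3, a3, a6, a4]
v1 = vslide1down_vx(v1, a0)   # [a3, a6, a4, a0]
v2 = vsext_vf2(v1)            # four 32 bit values, ready for vse32.v
print(v2)
```

The splat followed by three slides is just a way to pack four scalars into one vector register so that a single widening instruction and a single store can replace four scalar conversions and stores.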
2 - Application Top Down Analysis
How much complexity do vector instructions add to a top down analysis?
We know that whisper.cpp contains lots of vector instructions. Now we want to understand how few vector instruction blocks we really need to understand.
For this analysis we will assume a specific goal - inspect the final text output phase to see if an adversary has modified the generated text.
First we want to understand the unmodified behavior using a simple demo case. One of the whisper.cpp examples works well. It was built for the x86-64-v3 platform, not the riscv-64 gcv platform,
but that’s fine - we just want to understand the rough sequencing and get a handle on the strings we might find in or near the top level main routine.
what is the expected behavior?
Note: added comments are flagged with //
/opt/whisper_cpp$ ./main -f samples/jfk.wav
whisper_init_from_file_with_params_no_state: loading model from 'models/ggml-base.en.bin'
whisper_model_load: loading model
whisper_model_load: n_vocab = 51864
whisper_model_load: n_audio_ctx = 1500
whisper_model_load: n_audio_state = 512
whisper_model_load: n_audio_head = 8
whisper_model_load: n_audio_layer = 6
whisper_model_load: n_text_ctx = 448
whisper_model_load: n_text_state = 512
whisper_model_load: n_text_head = 8
whisper_model_load: n_text_layer = 6
whisper_model_load: n_mels = 80
whisper_model_load: ftype = 1
whisper_model_load: qntvr = 0
whisper_model_load: type = 2 (base)
whisper_model_load: adding 1607 extra tokens
whisper_model_load: n_langs = 99
whisper_model_load: CPU total size = 147.46 MB (1 buffers)
whisper_model_load: model size = 147.37 MB
whisper_init_state: kv self size = 16.52 MB
whisper_init_state: kv cross size = 18.43 MB
whisper_init_state: compute buffer (conv) = 14.86 MB
whisper_init_state: compute buffer (encode) = 85.99 MB
whisper_init_state: compute buffer (cross) = 4.78 MB
whisper_init_state: compute buffer (decode) = 96.48 MB
system_info: n_threads = 4 / 16 | AVX = 1 | AVX2 = 1 | AVX512 = 0 | FMA = 1 | NEON = 0 | ARM_FMA = 0 | METAL = 0 | F16C = 1 | FP16_VA = 0 | WASM_SIMD = 0 | BLAS = 0 | SSE3 = 1 | SSSE3 = 1 | VSX = 0 | CUDA = 0 | COREML = 0 | OPENVINO = 0 |
// done with initialization, let's run speech-to-text
main: processing 'samples/jfk.wav' (176000 samples, 11.0 sec), 4 threads, 1 processors, 5 beams + best of 5, lang = en, task = transcribe, timestamps = 1 ...
// this is the reference line our adversary wants to modify:
[00:00:00.000 --> 00:00:11.000] And so my fellow Americans, ask not what your country can do for you, ask what you can do for your country.
// display statistics
whisper_print_timings: load time = 183.72 ms
whisper_print_timings: fallbacks = 0 p / 0 h
whisper_print_timings: mel time = 10.30 ms
whisper_print_timings: sample time = 33.90 ms / 131 runs ( 0.26 ms per run)
whisper_print_timings: encode time = 718.87 ms / 1 runs ( 718.87 ms per run)
whisper_print_timings: decode time = 8.35 ms / 2 runs ( 4.17 ms per run)
whisper_print_timings: batchd time = 150.96 ms / 125 runs ( 1.21 ms per run)
whisper_print_timings: prompt time = 0.00 ms / 1 runs ( 0.00 ms per run)
whisper_print_timings: total time = 1110.87 ms
The adversary wants to change the text output from “… ask what you can do for your country.” to “… ask what you can do for your enemy.”
They would likely drop a string substitution into the code between the output of main: processing and whisper_print_timings:, probably very close to
the code printing timestamp intervals like [00:00:00.000 --> 00:00:11.000].
what function names and strings look relevant?
Our RISCV-64 binary retains some function names and lots of relevant strings. We want to accumulate strings that occur in the demo printout,
then glance at the functions that reference those strings.
For this example we will use a binary that includes some debugging type information. Ghidra can determine names of structure types but not necessarily
the size or field names of those structures.
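Collecting candidate strings can be sketched without any Ghidra support at all. The following is a minimal illustration that scans a byte buffer for printable runs and keeps those containing keywords from the demo printout. The function names, the keyword list, and the sample buffer are assumptions for illustration. In practice the buffer would come from reading the binary file.

```python
import re

def extract_strings(data, min_len=6):
    """Return (offset, text) for each run of printable ASCII of at least min_len bytes."""
    pattern = re.compile(rb"[\x20-\x7e\n\t]{%d,}" % min_len)
    return [(m.start(), m.group().decode("ascii")) for m in pattern.finditer(data)]

def find_candidates(data, keywords):
    """Keep only the strings that mention something seen in the demo printout."""
    return [(offset, text) for offset, text in extract_strings(data)
            if any(keyword in text for keyword in keywords)]

# Small stand-in buffer; real use would be open(path, "rb").read().
SAMPLE = (b"\x7fELF\x00\x00[%s --> %s] %s\n\x00abc\x00"
          b"%s: processing '%s' (%d samples\x00")

hits = find_candidates(SAMPLE, ["-->", "processing"])
for offset, text in hits:
    print(hex(offset), repr(text))
```

The offsets returned here are file offsets, so they still need to be mapped to load addresses before looking up cross references in Ghidra.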
strings
%s: processing '%s' (%d samples, %.1f sec), %d threads, %d processors, %d beams + best of %d, lang = %s, task = %s, %stimestamps = %d ...
is referenced
near the middle of main
[%s --> %s]
is referenced by whisper_print_segment_callback
[%s --> %s] %s\n
is referenced by whisper_full_with_state
segment
occurs in several places, suggesting that the word refers to a segment of text generated from speech between two timestamps.
ctx
occurs 33 times, suggesting that a context structure is used - and occasionally displayed with field names
error: failed to initialize whisper context\n
is referenced within main
. It may help in understanding internal data organization.
functions
main
- Ghidra decompiles this as ~1000 C statements, including many vector statements
whisper_print_timings
- referenced directly in main near the end
whisper_full_with_state
- referenced indirectly from main via whisper_full_parallel
and whisper_full
output_txt
- referenced directly in main, invokes I/O routines like std::__ostream_insert<>
. There are
other output routines like output_json
. The specific output routine can be selected as a command line parameter
to main
.
types and structs
Ghidra knows that these exist as names, but the details are left to us to unravel.
gpt_params
and gpt_vocab
- these look promising, at a lower ML level
whisper_context
- this likely holds most of the top-level data
whisper_full_params
and whisper_params
- likely structures related to the optional parameters
revealed with the --help
command line option.
whisper_segment
- possibly a segment of digitized audio to be converted to text.
whisper_vocab
- possibly holding the text words known to the training data.
notes
Now we have enough context to narrow the search. We want to know:
- how does main call either whisper_print_segment_callback or whisper_full_with_state?
whisper_full
is called directly by main
. Ghidra reports this to be about 3000 lines of C. The Ghidra
call tree suggests that this function does most of the speech-to-text tensor math and other ML heavy lifting.
whisper_print_segment_callback
appears to be inserted into a C++ object vtable as a function pointer. The object itself
appears to be built on main
’s stack, so we don’t immediately know its size or use. whisper_print_segment_callback
is less than a tenth the size of
whisper_full_with_state
.
- how does the JFK output text get appended to the string
[%s --> %s]
?
- from what structures is the output text retrieved?
- where are those structures initialized? How large are they, and are any of their fields named
in diagnostic output?
- are there any diagnostic routines displaying the contents of such structures?
next steps
A simple but tedious technique involves a mix of top-down and bottom-up analysis. We work upwards from strings and function references, and down
from the main
routine towards the functions associated with our target text string. Trial and error with lots of backtracking are common here, so
switching back and forth between top-down and bottom-up exploration can provide fresh insights.
Remember that we don’t want to understand any more of whisper.cpp
than we have to. The adversary we are chasing only wants to understand where
the generated text comes within reach. Neither they nor we need to understand all of the ways the C++ standard library might use vector instructions
during I/O subsystem initialization.
On the other hand, they and we may need to recognize basic I/O and string handling operations, since the target text is likely to exist as either a
standard string or a standard vector of strings.
Note: This isn’t a tutorial on how to approach a C++ reverse engineering challenge - it’s an
evaluation of how vectorization might make that more difficult and an exploration of
what additional tools Ghidra or Ghidra users may find useful when faced with vectorization.
That means we’ll skip most of the non-vector analysis.
vectorization obscures initialization
This sequence from main
affects initialization and obscures a possible exploit vector.
vsetivli_e8m8tama(0x17); // memcpy(puStack_110, "models/ggml-base.en.bin", 0x17)
auVar27 = vle8_v(0xa6650);
vsetivli_e8m8tama(0xf); // memcpy(puStack_f0, " [SPEAKER_TURN]", 0xf)
auVar26 = vle8_v(0xa6668);
puStack_f0 = auStack_e0;
vsetivli_e8m8tama(0x17);
vse8_v(auVar27,puStack_110);
vsetivli_e8m8tama(0xf);
vse8_v(auVar26,puStack_f0);
puStack_d0 = &uStack_c0;
vsetivli_e64m1tama(2); // memset(lStack_b0, 0, 16)
vmv_v_i(auVar25,0);
vse64_v(auVar25,&lStack_b0);
*(char *)((long)puStack_110 + 0x17) = '\0';
If the hypothetical adversary wanted to replace the trained model ggml-base.en.bin
with a less benign model, changing the
memory reference within vle8_v(0xa6650)
would be a good place to do it. Note that the compiler has interleaved instructions
generated from the two memcpy expansions, at the cost of two extra vsetivli
instructions. This allows more time for the
vector load instructions to complete.
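The byte counts passed to vsetivli can be cross-checked against the string literals named in the comments. This is a minimal sketch. The literal text comes from the decompiler comments, the leading space in " [SPEAKER_TURN]" is an assumption inferred from the 0xf count, and check_copy is a hypothetical helper.

```python
# (vsetivli immediate, literal being copied); the terminating NUL is stored separately.
COPIES = [
    (0x17, "models/ggml-base.en.bin"),
    (0x0F, " [SPEAKER_TURN]"),
]
VSETIVLI_MAX_AVL = 31   # vsetivli carries a 5 bit unsigned element count

def check_copy(avl, literal):
    """True if the literal's byte length matches avl and fits a vsetivli immediate."""
    nbytes = len(literal.encode("ascii"))
    return nbytes == avl and nbytes <= VSETIVLI_MAX_AVL

results = [check_copy(avl, text) for avl, text in COPIES]
print(results)
```

A match between literal length and vsetivli count is a cheap consistency test that a plugin could run before labeling a load and store pair as a string copy.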
Focus on output_txt
Some browsing in Ghidra suggests that the following section of main
is close to where we need to focus.
lVar11 = whisper_full_parallel
(ctx,(long)pFVar18,(ulong)pvStack_348,
(long)(int)(lStack_340 - (long)pvStack_348 >> 2),
(long)pvVar20);
if (lVar11 == 0) {
putchar(10,pFVar18);
if (params.do_output_txt != false) {
/* try { // try from 0001dce8 to 0001dceb has its CatchHandler @ 0001e252 */
std::operator+(&full_params,(undefined8 *)pFStack_2e0,
(undefined8 *)pFStack_2d8,(undefined8 *)".txt",
(char *)pvVar20);
uVar13 = full_params._0_8_;
/* try { // try from 0001dcfc to 0001dcfd has its CatchHandler @ 0001e2ec */
std::vector<>::vector(unaff_s3,(vector<> *)unaff_s5);
/* try { // try from 0001dd06 to 0001dd09 has its CatchHandler @ 0001e2f0 */
output_txt(ctx,(char *)uVar13,&params,(vector *)unaff_s3);
std::vector<>::~vector(unaff_s3);
std::__cxx11::basic_string<>::_M_dispose((basic_string<> *)&full_params);
}
...
}
Looking into output_txt
Ghidra gives us:
long output_txt(whisper_context *ctx,char *output_file_path,whisper_params *param_3,vector *param_4)
{
fprintf(_stderr,"%s: saving output to \'%s\'\n","output_txt",output_file_path);
max_index = whisper_full_n_segments(ctx);
index = 0;
if (0 < max_index) {
do {
__s = (char *)whisper_full_get_segment_text(ctx,index);
...
sVar8 = strlen(__s);
std::__ostream_insert<>((basic_ostream *)plVar7,__s,sVar8);
...
index = (long)((int)index + 1);
} while (max_index != index);
...
}
...
}
Finally, whisper_full_get_segment_text
is decompiled into:
undefined8 whisper_full_get_segment_text(whisper_context *ctx,long index)
{
gp = &__global_pointer$;
return *(undefined8 *)(index * 0x50 + *(long *)(ctx->state + 0xa5f8) + 0x10);
}
Now the adversary has enough information to try rewriting the generated text from an arbitrary segment of speech.
The text is found in an array linked into the ctx
context variable, probably during the call to whisper_full_parallel
.
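The address arithmetic in whisper_full_get_segment_text can be restated as a small model. This is a minimal sketch. The three constants come straight from the decompiled expression, while the constant names, the read_u64 callback, and the sample addresses are hypothetical. Reading 0x50 as the size of one segment record and 0x10 as the offset of its text field is an interpretation, not something the binary states.

```python
SEGMENTS_PTR_OFFSET = 0xA5F8   # offset within the state object, from the decompiler
SEGMENT_STRIDE = 0x50          # assumed size of one segment record
TEXT_FIELD_OFFSET = 0x10       # assumed offset of the text pointer in a record

def segment_text_slot(read_u64, state_addr, index):
    """Return the address holding the text pointer for segment number index."""
    segments_base = read_u64(state_addr + SEGMENTS_PTR_OFFSET)
    return segments_base + index * SEGMENT_STRIDE + TEXT_FIELD_OFFSET

# Toy memory image: the state object lives at 0x1000, the segment array at 0x20000.
memory = {0x1000 + SEGMENTS_PTR_OFFSET: 0x20000}
slot = segment_text_slot(memory.__getitem__, 0x1000, 2)
print(hex(slot))
```

An analyst or an adversary would watch writes to that slot, or to the buffer it points at, to find where generated text becomes reachable.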
added complexity of vectorization
Our key goal is to understand how much effort to put into Ghidra’s decompiler processing of RISCV-64 vector instructions.
The metric for measuring that effort is relative to the effort needed to understand the other instructions produced by a C++
optimizing compiler implementing libstdc++ containers like vectors.
Take a closer look at the call to output_txt
:
std::vector<>::vector(unaff_s3,(vector<> *)unaff_s5);
output_txt(ctx,(char *)uVar13,&params,(vector *)unaff_s3);
std::vector<>::~vector(unaff_s3);
The unaff_s3
parameter to output_txt
might be important. Maybe we should examine the constructor and destructor for
this object to probe its internal structure.
In fact unaff_s3
is only used when passing stereo audio into output_txt
, so it is more of a red herring
slowing down the analysis than a true roadblock. Its internal structure is a C++ standard vector of C++ standard vectors
of float, so it’s a decent example of what happens when RISCV-64 vector instructions are used to implement vectors
(and two dimensional matrices) at a higher abstraction level.
A little analysis shows us that std::vector<>::vector
is actually a copy constructor for a class generated from
a vector template. The true type of unaff_s3
and unaff_s5
is roughly std::vector<std::vector<float>>
.
Comment: the copy constructor and the associated destructor are likely present only because the programmer didn’t mark
the parameter as a const
reference.
The destructor std::vector<>::~vector(unaff_s3)
listing shows no vector instructions are used. The inner vectors
are deleted and their memory reclaimed, then the outer containing vector is deleted.
The constructor std::vector<>::vector
is different. Vector instructions are used often, but in very simple contexts.
- The only
vset
mode used is vsetivli_e64m1tama(2)
, asking for no more than two 64 bit elements in a vector register
- The most common vector pattern stores 0 into two adjacent 64 bit pointers
- In one case a 64 bit value is stored into two adjacent 64 bit pointers.
Summary
If whisper.cpp is representative of a broader class of ML programs compiled for RISCV-64 vector-enabled hardware, then:
- Ghidra’s sleigh subsystem needs to recognize at least those vector instructions found in the rvv 1.0 release.
- The decompiler view should have access to pcodeops for all of those vector instructions.
- The 20 to 50 most common
vset*
configurations (e.g., e64m1tama
) should be explicitly recognized at the pcodeop layer
and displayed in the decompiler view.
- Ghidra users should have documentation on common RISCV-64 vector instruction patterns generated during compilation.
These patterns should include common loop patterns and builtin expansions for
memcpy
and memset
, plus examples showing
the common source code patterns resulting in vector reduction, width conversion, slideup/down, and gather/scatter instructions.
Other Ghidra extensions would be nice to have but likely deliver diminishing bang-for-the-buck relative to multiplatform
C++ analytics:
- Extend sleigh
*.sinc
file syntax to convey comments or hints to be visible in the decompiler view, either as pop-ups,
instruction info, or comment blocks.
- Take advantage of the open source nature of RISCV ISA to display links to open source documents on vector instructions
when clicking on a given instruction.
- Treat pcodeops as function calls within the decompiler view, enabling signature overrides and type assignment to the
arguments.
- Create a decompiler plugin framework that can scan the decompiled source and translate vector instruction patterns back
into calls such as __builtin_memcpy(...).
- Create a decompiler plugin framework that can scan the decompiled source and generate inline comments in a sensible
vector notation.
The toughest challenges might be:
- Find a Ghidra micro-architecture-independent approach to untangling vector instruction generation.
- Use ML translation techniques to match C, C++, and Rust source patterns to generated vector instruction sequences
for known architectures, compilers, and compiler optimization settings.