Gap Analysis Example

1: Examples
2: Impact
3: Effort
4: Scope
5: Existing Frameworks

What gaps in Ghidra’s import processes need the most long term attention?

Some features are easy or quick to add to Ghidra’s import processes. Other features might be nice to have but just aren’t worth the effort. How do we approach features that are probably going to be important in the long term but would require a lot of effort to address?

As an example, this section considers RISCV-64 code optimized with vector instructions. Either the compiler or the programmer can choose to replace sequences of simple instructions with sequences of vector instructions. Those vector sequences often have no clean C representation in Ghidra’s decompiler view, which makes it difficult for Ghidra users to understand what the code is doing and to look for malware or other pathologies.

The overview introduced an approach to this sort of challenge:

What are current examples of this feature, especially examples that support analysis of the feature or expose its pathologies?
- ⇒see Examples
How and when might this feature impact a significant number of Ghidra analysts?
- ⇒see Impact
How much effort might it take Ghidra developers to fill the implied feature gap? Do we fill it by extending the core of Ghidra, by generating new plugin scripts or tools, or by educating Ghidra users on how to recognize semantic patterns from raw instructions?
- ⇒see Effort
Is this feature specific to RISCV systems or more broadly applicable to other processor families? Would support for that feature be common to many processor families or vary widely by processor?
- ⇒see Scope
What are the existing frameworks within Ghidra that might most credibly be extended to support that feature?
- ⇒see Existing Frameworks

1 - Examples

Where does this gap appear?

memory copy

alignment issues
obfuscated memcpy and strcpy inline code

other pcode or RTL expansions

loop optimization

vector intrinsics

ML and AI subsystems

2 - Impact

What is the impact of this gap?

How

Ghidra’s current limits in handling RISCV-64 vector instructions will affect users in phases: the initial impacts are modest and fairly easy to deal with, while later impacts will take significant design work to address.

The most immediate impact is that Ghidra’s disassembler and decompiler fail when they encounter unrecognized instructions. The Fedora 39 exemplar kernel contains several extension instructions that Ghidra 11 can’t recognize. These are few in number and don’t materially affect someone examining RISCV kernel code. The voice-to-text app whisper.cpp exposes more serious limits: roughly one third of the app’s instructions go unprocessed by Ghidra 11 because of vector and other extension instructions.
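One way to get a rough, independent feel for figures like whisper.cpp’s is to count extension mnemonics in a disassembly listing from a toolchain that already knows them. The sketch below is a minimal, hypothetical example that assumes GNU `objdump -d` output with its usual `address: encoding<TAB>mnemonic<TAB>operands` layout. The prefix rule (RVV mnemonics begin with `v`, T-Head vendor mnemonics with `th.`) is a heuristic, and the sample lines are hand-assembled for illustration, not taken from whisper.cpp. It measures how many instructions use extensions, which is related to, but not the same as, how many Ghidra fails to process.

```python
import re
from collections import Counter

# Assumed GNU objdump -d line layout, e.g.
#   "   10400:\t0c0076d7          \tvsetvli\ta3,zero,e8,m1,ta,ma"
# Label lines such as "0000000000010400 <copy>:" do not match.
INSN_LINE = re.compile(r"^\s*[0-9a-f]+:\s+[0-9a-f]+\s+(\S+)")


def classify(mnemonic: str) -> str:
    """Heuristic (assumption): 'th.' marks T-Head vendor instructions,
    a leading 'v' marks RVV instructions, everything else is base ISA."""
    if mnemonic.startswith("th."):
        return "vendor"
    if mnemonic.startswith("v"):
        return "vector"
    return "base"


def extension_share(objdump_text: str) -> dict:
    """Return the fraction of instructions in each class, or {} if none."""
    counts = Counter()
    for line in objdump_text.splitlines():
        match = INSN_LINE.match(line)
        if match:
            counts[classify(match.group(1))] += 1
    total = sum(counts.values())
    if total == 0:
        return {}
    return {kind: counts[kind] / total for kind in ("base", "vector", "vendor")}


# Hand-assembled illustrative listing of a tiny byte-copy fragment.
SAMPLE = """\
0000000000010400 <copy>:
   10400:\t0c0076d7          \tvsetvli\ta3,zero,e8,m1,ta,ma
   10404:\t02058087          \tvle8.v\tv1,(a1)
   10408:\t020500a7          \tvse8.v\tv1,(a0)
   1040c:\t00d50533          \tadd\ta0,a0,a3
   10410:\t8082              \tret
"""

if __name__ == "__main__":
    print(extension_share(SAMPLE))
```

Feeding it the text of `objdump -d whisper` (or any other binary) gives a quick per-class breakdown that can be compared against what Ghidra reports as undisassembled.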

That impact can be addressed by simply defining the missing instructions, as in Ghidra’s isa_ext experimental branch, which lets the disassembler and decompiler process every instruction in the app. This step is necessary but not sufficient, since many or most of the vector extension instructions have no clean pcode representation. Obvious calls to memcpy will be replaced with one of a half-dozen inline vector instruction sequences. Simple or nested loops will be ‘vectorized’ into fewer iterations of much more complex instruction sequences. Optimizing compilers handle those complexities well; Ghidra users searching for malware will have a harder time of it.
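To see why an inlined memcpy stops looking like memcpy, it helps to model one common shape of the vector sequence: a stripmined loop in which `vsetvli` sets the vector length `vl` for each pass. The Python below is a scalar model, not RISCV code. It assumes VLEN = 128 with `e8, m8` (so VLMAX = 128 bytes) and always picks `vl = min(remaining, VLMAX)`, which the RVV 1.0 rules permit but do not require; compilers may emit other sequences entirely.

```python
def vsetvli(avl: int, vlmax: int) -> int:
    """Model of vsetvli with the requested length (AVL) in a register.
    RVV 1.0 requires vl == AVL when AVL <= VLMAX and vl == VLMAX when
    AVL >= 2*VLMAX; min(avl, vlmax) is one permitted implementation."""
    return min(avl, vlmax)


def vector_memcpy(dst: bytearray, src: bytes, n: int, vlmax: int) -> int:
    """Scalar model of a stripmined byte copy, roughly:
        loop: vsetvli t0, a2, e8, m8, ta, ma   # t0 = bytes this pass
              vle8.v  v0, (a1)                 # load t0 bytes
              vse8.v  v0, (a0)                 # store t0 bytes
              advance a0/a1 by t0; a2 -= t0; bnez a2, loop
    Copies n bytes and returns the number of loop passes."""
    offset = 0
    passes = 0
    while n > 0:
        vl = vsetvli(n, vlmax)
        chunk = src[offset:offset + vl]      # vle8.v
        dst[offset:offset + vl] = chunk      # vse8.v
        offset += vl
        n -= vl
        passes += 1
    return passes


if __name__ == "__main__":
    src = bytes(i % 256 for i in range(300))
    dst = bytearray(300)
    print(vector_memcpy(dst, src, 300, vlmax=128), bytes(dst) == src)
```

A 300-byte copy takes three passes (128 + 128 + 44 bytes) instead of 300 iterations of a byte-at-a-time loop, and the decompiler sees a loop whose step depends on `vl` rather than a call to memcpy.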

The general challenge for Ghidra is reconstructing the original context from sequences of vector extension instructions.
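One approach to that reconstruction is idiom matching: recognize a known instruction shape and report the source-level operation it likely came from. The sketch below is hypothetical throughout. The `Insn` record, the operand spelling, and the matching rules are assumptions for illustration, not a Ghidra API. It recognizes only the stripmined byte-copy loop (`vsetvli`, `vle8.v`, `vse8.v`, a decrement of the length register, and `bnez`).

```python
from __future__ import annotations

from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class Insn:
    """Hypothetical, simplified instruction record (not a Ghidra type)."""
    mnemonic: str
    operands: tuple[str, ...]


# Vector ops allowed in a pure copy; any other vector op that writes the
# copied register means the data is transformed, not just copied.
COPY_OPS = {"vsetvli", "vle8.v", "vse8.v"}


def match_memcpy(body: list[Insn]) -> Optional[str]:
    """Return a memcpy-style summary if one loop body looks like a
    stripmined byte copy, else None."""
    vset = next((i for i in body
                 if i.mnemonic == "vsetvli" and "e8" in i.operands), None)
    if vset is None:
        return None
    vl_reg, avl_reg = vset.operands[0], vset.operands[1]

    # Map vector register -> address operand for byte loads and stores.
    loads = {i.operands[0]: i.operands[1] for i in body if i.mnemonic == "vle8.v"}
    stores = {i.operands[0]: i.operands[1] for i in body if i.mnemonic == "vse8.v"}
    copied = set(loads) & set(stores)
    if not copied:
        return None

    # Reject bodies where another vector op overwrites a copied register.
    if any(i.mnemonic.startswith("v") and i.mnemonic not in COPY_OPS
           and i.operands and i.operands[0] in copied for i in body):
        return None

    # The remaining-length register must shrink by vl and drive the branch.
    counts_down = Insn("sub", (avl_reg, avl_reg, vl_reg)) in body
    loops = any(i.mnemonic == "bnez" and i.operands[0] == avl_reg for i in body)
    if not (counts_down and loops):
        return None

    vreg = min(copied)
    dst = stores[vreg].strip("()")
    src = loads[vreg].strip("()")
    return f"memcpy({dst}, {src}, {avl_reg})"


if __name__ == "__main__":
    body = [
        Insn("vsetvli", ("t0", "a2", "e8", "m8", "ta", "ma")),
        Insn("vle8.v", ("v0", "(a1)")),
        Insn("vse8.v", ("v0", "(a0)")),
        Insn("add", ("a1", "a1", "t0")),
        Insn("add", ("a0", "a0", "t0")),
        Insn("sub", ("a2", "a2", "t0")),
        Insn("bnez", ("a2", "loop")),
    ]
    print(match_memcpy(body))
```

A real matcher would also check instruction order, pointer increments, and data flow across the loop; this one ignores all three, so it can report false positives.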

When

Note: Some material comes as-is from https://www.reddit.com/r/RISCV

The first generally available 64-bit RISCV vector system development kit has just shipped (January 2024), based on the relatively modest THead C908 core. This SDK appears tuned for video processing, perhaps for video surveillance applications that aggregate multiple cameras into a common video feed. We are probably several years from seeing server-class systems built on SiFive P870 cores and fabricated on the fastest available fab lines. Memory bandwidth is poor at present, while energy efficiency is potentially better than that of x86_64 designs.

Judging from internet hype, we can expect RISCV vector code to appear in replacements for ARM systems (automotive and possibly cell phones) and as the extensible basis of AI applications.

Cores announced
- SiFive
  - P670: 2 x 128-bit vector units, up to 16 cores
  - P870: 2 x 128-bit vector units, vector crypto, up to 16 cores
- Alibaba XuanTie THead
  - C908: RVV 1.0 support, 128-bit VLEN; announced 2022
- StarFive
  - StarFive does not appear to offer a vector RISCV core
SDKs available
- CanMV-K230: dual C908 cores, three video camera inputs, $40; one core supports RVV 1.0 at 1.6 GHz; 512 MB RAM; announced 2023
- Sophgo SG2380: due Q3 2024, with a 16-core SiFive P670 and an 8-core SiFive X280
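The VLEN figures above matter to an analyst because a single vector instruction processes at most VLMAX = LMUL × VLEN / SEW elements, where `vsetvli` selects SEW (element width) and LMUL (register grouping). The sketch below computes that RVV 1.0 relationship, assuming VLEN = 128 as listed for the C908; rejecting combinations that yield less than one element is a simplification of this sketch rather than the spec’s exact wording.

```python
from fractions import Fraction

# LMUL values defined by RVV 1.0: fractional 1/8..1/2 and integer 1..8.
LMULS = {Fraction(1, 8), Fraction(1, 4), Fraction(1, 2),
         Fraction(1), Fraction(2), Fraction(4), Fraction(8)}
SEWS = {8, 16, 32, 64}


def vlmax(vlen_bits: int, sew_bits: int, lmul) -> int:
    """Maximum elements per vector instruction: VLMAX = LMUL * VLEN / SEW."""
    lmul = Fraction(lmul)
    if sew_bits not in SEWS or lmul not in LMULS:
        raise ValueError(f"unsupported vtype: SEW={sew_bits}, LMUL={lmul}")
    n = lmul * vlen_bits / sew_bits
    # Simplification: treat settings that give under one element as invalid.
    if n < 1:
        raise ValueError(f"VLMAX below one element for VLEN={vlen_bits}")
    return int(n)


if __name__ == "__main__":
    for sew, lmul in [(8, 1), (8, 8), (32, 1), (64, Fraction(1, 2))]:
        print(f"VLEN=128 SEW={sew} LMUL={lmul}: VLMAX={vlmax(128, sew, lmul)}")
```

The four settings give VLMAX values of 16, 128, 4, and 1. With `e8, m8` a single `vle8.v` on such a core moves up to 128 bytes, while with `e32, m1` an arithmetic instruction covers only 4 elements, so the same source loop can compile to very different iteration counts depending on the vtype the compiler chooses.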

Who is working on this

January 2024 saw a flurry of open source toolchain and framework contributions from several sources.

binutils contributors
- multiple recent contributors from Alibaba, mostly in support of THead extensions
gcc contributors
- Intel, Alibaba, RiVAI (ref XCVsimd extension), Embecosm, SiFive, ESWIN Computing, Ventana Micro, and Andes Technology all contributed to the RISCV testsuite in the last two weeks.
glibc contributions
- some references to Alibaba RISCV extensions
ML framework contributors
- RISCV intrinsics appeared in whisper.cpp in November 2023, synced from llama, originally contributed by https://pk.linkedin.com/in/ahmad-tameem

3 - Effort

How much effort might it take to fill the gap?

4 - Scope

Does the scope of this gap extend to other processors?

x86_64 comparison
alignment

5 - Existing Frameworks

Which Ghidra frameworks might be extended to fill the gap?

Outline

What can we add to sleigh .sinc files?
- add all extension instructions
- add translation of ELF file attributes into vendor-specific processor selection
- flesh out extension mnemonics to convey vector context, especially vset* instructions
- add comments or metadata that is accessible to the decompiler
What can we add to pcode semantics?
- gcc built-ins like __builtin_memcpy or popcount
- cross platform vector notation
- processor dependent decompiler plugins
What can we add to the disassembler?
- generalized instruction information on common use patterns
What can we add to the decompiler?
- reconstruct gcc RTL built-ins
What plugins can we add?
- reconstruct gcc RTL built-ins
What external tools can we leverage?
- generate .sinc updates based on objdump mnemonics
- known source exemplar builds to correlate RTL expressions with instruction sequences
- apply general ML translation to undo pcode expansion into vector instructions
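As a sketch of the ELF-attribute item in the outline, the code below parses a RISCV architecture string, such as the `Tag_RISCV_arch` value that `readelf -A` prints, into an XLEN and a map of extensions with versions, then summarizes whether vector and vendor (`x*`) extensions are present. It assumes the normalized, underscore-separated form with explicit `<major>p<minor>` versions. Mapping the summary onto an actual Ghidra language variant is left open, and the `summarize` fields are hypothetical.

```python
from __future__ import annotations

import re

# Assumed normalized form, e.g.
#   "rv64i2p1_m2p0_a2p1_f2p2_d2p2_c2p0_v1p0_zicsr2p0_xtheadba1p0"
BASE = re.compile(r"^rv(32|64)(.+)$")
# Extension name (may contain digits, e.g. zve32f) followed by <major>p<minor>.
EXT = re.compile(r"^([a-z][a-z0-9]*?)(\d+)p(\d+)$")


def parse_arch(arch: str) -> tuple[int, dict[str, tuple[int, int]]]:
    """Return (xlen, {extension: (major, minor)})."""
    head, *rest = arch.split("_")
    base = BASE.match(head)
    if base is None:
        raise ValueError(f"not a RISCV arch string: {arch!r}")
    xlen = int(base.group(1))
    exts: dict[str, tuple[int, int]] = {}
    # The base ISA letter (i or e) is the first token after "rv32"/"rv64".
    for token in [base.group(2), *rest]:
        ext = EXT.match(token)
        if ext is None:
            raise ValueError(f"unversioned or malformed extension: {token!r}")
        exts[ext.group(1)] = (int(ext.group(2)), int(ext.group(3)))
    return xlen, exts


def summarize(arch: str) -> dict:
    """Hypothetical inputs for choosing a vendor-specific processor variant."""
    xlen, exts = parse_arch(arch)
    return {
        "xlen": xlen,
        "vector": "v" in exts or any(e.startswith("zve") for e in exts),
        "vendor": sorted(e for e in exts if e.startswith("x")),
    }


if __name__ == "__main__":
    print(summarize("rv64i2p1_m2p0_a2p1_f2p2_d2p2_c2p0_v1p0_zicsr2p0_xtheadba1p0"))
```

Keeping the parse separate from the selection step means the same extension map could also drive other outline items, such as deciding which extension `.sinc` files a program actually needs.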