Gap Analysis Example

What gaps in Ghidra’s import processes need the most long-term attention?

Some features are easy or quick to add to Ghidra’s import processes. Other features might be nice to have but just aren’t worth the effort. How do we approach features that are probably going to be important in the long term but would require a lot of effort to address?

This section considers RISCV-64 code optimization by vector instruction insertion as an example. Either the compiler or the coder can choose to replace sequences of simple instructions with sequences of vector instructions. Those vector sequences often do not have a clean C representation in Ghidra’s decompiler view, making it difficult for Ghidra users to understand what the code is doing and to look for malware or other pathologies.

The overview introduced an approach to this sort of challenge:

  1. What are current examples of this feature, especially examples that support analysis of the feature or expose its pathologies?
  2. How and when might this feature impact a significant number of Ghidra analysts?
  3. How much effort might it take Ghidra developers to fill the implied feature gap? Do we fill it by extending the core of Ghidra, by generating new plugin scripts or tools, or by educating Ghidra users on how to recognize semantic patterns from raw instructions?
  4. Is this feature specific to RISCV systems or more broadly applicable to other processor families? Would support for that feature be common to many processor families or vary widely by processor?
  5. What are the existing frameworks within Ghidra that might most credibly be extended to support that feature?

1 - Examples

Where does this gap appear?

memory copy

  • alignment issues
  • obfuscated memcpy and strcpy inline code

other pcode or RTL expansions

loop optimization

vector intrinsics

ML and AI subsystems

2 - Impact

What is the impact of this gap?

How

Ghidra’s current limits in handling RISCV-64 vector instructions will impact users in phases, where the initial impacts are modest and fairly easy to deal with while later impacts will take significant design work to address.

The most immediate impact involves Ghidra disassembly and decompilation failure when encountering unrecognized instructions. The Fedora 39 exemplar kernel contains several extension instructions that Ghidra 11 can’t recognize. These are limited in number and don’t have a material impact on someone examining RISCV kernel code. The voice-to-text app whisper.cpp shows more serious limits - roughly one third of the app’s instructions are unprocessed by Ghidra 11 because of vector and other extension instructions.

That impact can be addressed by simply defining the missing instructions, as in Ghidra’s isa_ext experimental branch. This will allow the disassembler and decompiler to process all instructions in the app. This is necessary but not sufficient, since many or most of the vector extension instructions do not have a clean pcode representation. A call to memcpy that is obvious in the source will be replaced with one of a half-dozen inline vector instruction sequences. Simple or nested loops will be ‘vectorized’, giving fewer iterations but much more complex instruction sequences. Optimizing compilers can handle those complexities, while Ghidra users searching for malware will have a harder time of it.
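To make the vectorized memcpy pattern concrete, here is a minimal Python model of a strip-mined byte copy. It is a sketch under stated assumptions, not a description of what any particular compiler emits: the function names are hypothetical, VLEN defaults to the 128 bits quoted for the C908, and the vector length chosen per pass is modeled as `min(remaining, VLMAX)`, which is the simplest behavior the vector specification permits for `vsetvli`.

```python
def vlmax(vlen_bits: int, sew_bits: int, lmul: int) -> int:
    """Maximum elements per vector operation: VLEN / SEW * LMUL."""
    return vlen_bits // sew_bits * lmul


def vector_memcpy(dst: bytearray, src: bytes, n: int, vlen_bits: int = 128) -> int:
    """Copy n bytes the way a strip-mined vector loop would.

    Returns the number of loop iterations, to contrast with the n
    iterations of a byte-at-a-time scalar loop.
    """
    # Assumption: 8 bit elements grouped 8 registers at a time (e8, m8).
    per_pass = vlmax(vlen_bits, sew_bits=8, lmul=8)
    offset = 0
    iterations = 0
    while n > 0:
        vl = min(n, per_pass)  # models vsetvli choosing this pass's length
        dst[offset:offset + vl] = src[offset:offset + vl]  # models a vector load + store
        offset += vl
        n -= vl
        iterations += 1
    return iterations


src = bytes(range(200))
dst = bytearray(200)
passes = vector_memcpy(dst, src, 200)
print(passes, bytes(dst) == src)
```

The loop body has no fixed trip count and no per-byte index, which is part of why a decompiler that only sees the individual vector instructions struggles to present it as a plain copy.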

The general challenge for Ghidra is that of reconstructing the context from sequences of vector extension instructions.
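One small piece of that context is the element width and register grouping set by the most recent `vsetvli` or `vsetivli`. The sketch below walks a linear instruction listing and annotates each following vector instruction with that setting. It is illustrative only: the `annotate` helper and the sample listing are hypothetical, it ignores control flow entirely, and it gives up when `vsetvl` takes the vector type from a register.

```python
from typing import List, Optional, Tuple

SEW_TOKENS = {"e8", "e16", "e32", "e64"}
LMUL_TOKENS = {"mf8", "mf4", "mf2", "m1", "m2", "m4", "m8"}


def annotate(instructions: List[Tuple[str, str]]) -> List[str]:
    """Append the active element width and grouping to each vector instruction."""
    context: Optional[str] = None
    out = []
    for mnemonic, operands in instructions:
        text = f"{mnemonic} {operands}"
        if mnemonic in ("vsetvli", "vsetivli"):
            fields = [f.strip() for f in operands.split(",")]
            sew = next((f for f in fields if f in SEW_TOKENS), None)
            lmul = next((f for f in fields if f in LMUL_TOKENS), None)
            context = f"{sew},{lmul}" if sew and lmul else None
        elif mnemonic == "vsetvl":
            context = None  # vector type comes from a register, unknown here
        elif mnemonic.startswith("v") and context is not None:
            text += f"  ; vtype {context}"
        out.append(text)
    return out


# Hand-written sample listing, not taken from a real binary.
listing = [
    ("vsetvli", "a5,a2,e8,m8,ta,ma"),
    ("vle8.v", "v8,(a1)"),
    ("vse8.v", "v8,(a0)"),
    ("sub", "a2,a2,a5"),
    ("vsetivli", "zero,4,e32,m1,ta,ma"),
    ("vadd.vv", "v1,v2,v3"),
]
annotated = annotate(listing)
for line in annotated:
    print(line)
```

A real implementation would need to follow branches and merge contexts at join points; the linear walk is only meant to show what kind of information is missing from each instruction when it is read in isolation.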

When

Note: Some material comes as-is from https://www.reddit.com/r/RISCV

The first generally available 64 bit RISCV vector development kit became available in January 2024, based on the relatively modest THead C908 core. This SDK appears tuned for video processing, perhaps video surveillance applications aggregating multiple cameras into a common video feed. We are probably several years from seeing server-class systems built on SiFive P870 cores and fabricated on the fastest available fab lines. Memory bandwidth is poor at present, while energy efficiency is potentially better than x86_64 designs.

Judging from internet hype, we can expect to see RISCV vector code appearing in replacements of ARM systems (automotive and possibly cell phone) and as the extensible basis of AI applications.

  • Cores announced
    • SiFive
      • P670 2 x 128 bit vector units, up to 16 cores
      • P870 2 x 128 bit vector units, vector crypto, up to 16 cores
    • Alibaba XuanTie THead
      • C908 with RVV 1.0 support, 128 bit VLEN; announced 2022
    • StarFive
      • StarFive does not appear to offer a vector RISCV core
  • SDKs available
    • CanMV-K230, dual C908 cores, triple video camera inputs, $40; one core supports RVV 1.0 at 1.6 GHz; 512 MB RAM; announced 2023
    • Sophgo SG2380 due Q3 2024 with 16 core SiFive P670 and 8 core SiFive X280

Who is working this

January 2024 saw a flurry of open source toolchain and framework contributions from several sources.

  • binutils contributors
    • multiple recent contributors from Alibaba, mostly in support of THead extensions
  • gcc contributors
    • intel, alibaba, rivai (ref XCVsimd extension), embecosm, sifive, eswincomputing, ventanamicro, andestech all contributed to the riscv testsuite in the last two weeks.
  • glibc contributions
    • some references to Alibaba riscv extensions
  • ML framework contributors

3 - Effort

How much effort might it take to fill the gap?

4 - Scope

Does the scope of this gap extend to other processors?

  • x86_64 comparison
  • alignment

5 - Existing Frameworks

Which Ghidra frameworks might be extended to fill the gap?

Outline

  • What can we add to sleigh .sinc files?
    • add all extension instructions
    • add translation of Elf file attributes into vendor-specific processor selection
    • flesh out extension mnemonics to convey vector context, especially vset* instructions
    • add comments or metadata that is accessible to the decompiler
  • What can we add to pcode semantics?
    • gcc built-ins like __builtin_memcpy or popcount
    • cross platform vector notation
    • processor dependent decompiler plugins
  • What can we add to the disassembler?
    • generalized instruction information on common use patterns
  • What can we add to the decompiler?
    • reconstruct gcc RTL built-ins
  • What plugins can we add?
    • reconstruct gcc RTL built-ins
  • What external tools can we leverage?
    • generate .sinc updates based on objdump mnemonics
    • known source exemplar builds to correlate RTL expressions with instruction sequences
    • apply general ML translation to undo pcode expansion into vector instructions
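As a starting point for the objdump-based idea in the outline, the following sketch tallies mnemonics from disassembly text and lists those absent from a set of already-defined instructions. It assumes instruction lines are tab-separated as address, encoding, mnemonic, operands. The sample text is hand-written with placeholder encodings, and `KNOWN` is a hypothetical stand-in for whatever the existing .sinc files already define.

```python
from collections import Counter
from typing import Iterable, List


def mnemonic_counts(objdump_text: str) -> Counter:
    """Count mnemonics on lines shaped like '<addr>:\\t<encoding>\\t<mnemonic>\\t<operands>'."""
    counts: Counter = Counter()
    for line in objdump_text.splitlines():
        parts = line.split("\t")
        # Symbol labels and blank lines have no tab-separated mnemonic field.
        if len(parts) >= 3 and parts[0].strip().endswith(":"):
            counts[parts[2].strip()] += 1
    return counts


def missing_mnemonics(counts: Counter, known: Iterable[str]) -> List[str]:
    """Mnemonics seen in the disassembly but not in the known set."""
    known_set = set(known)
    return sorted(m for m in counts if m not in known_set)


# Hand-written sample; the 00000000 encodings are placeholders, not real opcodes.
SAMPLE = (
    "0000000000010000 <copy>:\n"
    "   10000:\t00000000\tvsetvli\ta5,a2,e8,m8,ta,ma\n"
    "   10004:\t00000000\tvle8.v\tv8,(a1)\n"
    "   10008:\t00000000\tvse8.v\tv8,(a0)\n"
    "   1000c:\t00000000\tsub\ta2,a2,a5\n"
    "   10010:\t00000000\tadd\ta1,a1,a5\n"
    "   10014:\t00000000\tadd\ta0,a0,a5\n"
    "   10018:\t00000000\tbnez\ta2,10000 <copy>\n"
    "   1001c:\t00000000\tret\n"
)
KNOWN = {"sub", "add", "bnez", "ret"}  # hypothetical: mnemonics already supported

counts = mnemonic_counts(SAMPLE)
missing = missing_mnemonics(counts, KNOWN)
unrecognized = sum(counts[m] for m in missing) / sum(counts.values())
print(missing, unrecognized)
```

Run against a full application, the same tally would give a rough measure of how much of the binary depends on extension instructions, along with a worklist of mnemonics that still need definitions.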