QEMU Decodetree Explained

2024-03-31 QEMU Source Code Analysis Decodetree, QEMU, TCG

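When QEMU decodes an instruction, it calls the instruction decoders defined by each target. The ARM target, for example, defines disas_arm_insn(), disas_thumb_insn() and disas_thumb2_insn() to decode 32-bit ARM instructions, Thumb instructions and Thumb2 instructions respectively.

Decodetree is a mechanism proposed by Bastian Koppelmann in 2017 while porting RISC-V to QEMU (see: discussion thread 1, discussion thread 2). It was proposed because existing instruction decoders (such as ARM's) were written as large piles of switch-case statements, which are hard to read and hard to maintain.

With Decodetree, developers only describe the format of each instruction in the Decodetree syntax, and Decodetree generates the corresponding instruction decoder .c file, switch-case included.

Decodetree is particularly well suited to ISAs with fixed instruction formats such as RISC-V, because every field sits at a fixed position (for example, the RISC-V opcode is always in bits[6..0]).

Decodetree itself is implemented as a Python script (./scripts/decodetree.py). Its documentation is in ./docs/devel/decodetree.rst, which defines the syntax in detail. When QEMU is built, Decodetree is invoked on the decode files defined by each target to generate the corresponding decoders.

The RISC-V instruction decoders, for example, are defined in ./target/riscv/*.decode, and the target's Makefile.objs contains the following rules: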

...

DECODETREE = $(SRC_PATH)/scripts/decodetree.py

decode32-y = $(SRC_PATH)/target/riscv/insn32.decode
decode32-$(TARGET_RISCV64) += $(SRC_PATH)/target/riscv/insn32-64.decode

...

target/riscv/decode_insn32.inc.c: $(decode32-y) $(DECODETREE)
	$(call quiet-command, \
	  $(PYTHON) $(DECODETREE) -o $@ --static-decode decode_insn32 \
          $(decode32-y), "GEN", $(TARGET_DIR)$@)

The Decodetree syntax consists of five parts: Fields, Argument Sets, Formats, Patterns and Pattern Groups. This article walks through how to use that syntax to generate the decoder for an instruction.

Field

A Field defines how to extract the value of each field of an instruction (e.g. rd, rs1, rs2, imm).

field_def     := '%' identifier ( unnamed_field )* ( !function=identifier )?
unnamed_field := number ':' ( 's' ) number

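The syntax starts with %, followed by an identifier, zero or more unnamed_field entries, and an optional !function.

identifier is chosen by the developer, e.g. rd, imm, and so on.
unnamed_field defines which bits the field occupies. The first number is the field's least-significant bit position and the second is its length in bits. An optional s before the length marks the field as needing sign extension after extraction.
- E.g. %rd 7:5 means rd occupies bits 7 to 11 of the instruction (insn[11:7]), 5 bits in total.
!function names a function that is called on the value after the field has been extracted.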

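For 32-bit instructions, a Field ends up as extract32() and sextract32() calls that pull each field's value out of the instruction:

Field examples

Input	Generated code
%disp 0:s16	sextract(i, 0, 16)
%imm9 16:6 10:3	extract(i, 16, 6) << 3 | extract(i, 10, 3)
%disp12 0:s1 1:1 2:10	sextract(i, 0, 1) << 11 | extract(i, 1, 1) << 10 | extract(i, 2, 10)
%shimm8 5:s8 13:1 !function=expand_shimm8	expand_shimm8(sextract(i, 5, 8) << 1 | extract(i, 13, 1))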
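To make the multi-segment rows of the table concrete, here is a minimal C sketch of how the segments are concatenated, with the first segment ending up most significant. extract32() and sextract32() follow the shape of QEMU's helpers; field_imm9() and field_disp12() are hypothetical names used only for this illustration.

```c
#include <assert.h>
#include <stdint.h>

/* Unsigned extract: `length` bits of `value` starting at bit `start`. */
static inline uint32_t extract32(uint32_t value, int start, int length)
{
    assert(start >= 0 && length > 0 && length <= 32 - start);
    return (value >> start) & (~0U >> (32 - length));
}

/* Signed extract; relies on arithmetic right shift of signed integers. */
static inline int32_t sextract32(uint32_t value, int start, int length)
{
    assert(start >= 0 && length > 0 && length <= 32 - start);
    return ((int32_t)(value << (32 - length - start))) >> (32 - length);
}

/* %imm9 16:6 10:3 -> high 6 bits from insn[21:16], low 3 bits from insn[12:10]. */
static uint32_t field_imm9(uint32_t insn)
{
    return (extract32(insn, 16, 6) << 3) | extract32(insn, 10, 3);
}

/* %disp12 0:s1 1:1 2:10 -> the first (most significant) segment carries the sign. */
static int32_t field_disp12(uint32_t insn)
{
    uint32_t low = (extract32(insn, 1, 1) << 10) | extract32(insn, 2, 10);

    /* Multiply instead of shifting so a negative value stays well-defined in C. */
    return sextract32(insn, 0, 1) * 2048 + (int32_t)low;
}
```

Only the first segment of a Field may be signed in this sketch, which is why field_disp12() sign-extends bit 0 and treats the remaining 11 bits as unsigned.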

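Take the RISC-V U-type instruction as an example:

imm occupies insn[31:12] (20 bits) and rd occupies insn[11:7]; imm must be sign-extended and then shifted left by 12 bits (the 20-bit immediate is shifted left by 12 bits to form U immediates). A RISC-V U-type instruction can therefore be declared as: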

%rd       7:5
%imm_u    12:s20                 !function=ex_shift_12

Here 20 means the field is 20 bits wide.

This eventually generates the following code:

static void decode_insn32_extract_u(DisasContext *ctx, arg_u *a, uint32_t insn)
{
    a->imm = ex_shift_12(ctx, sextract32(insn, 12, 20));
    a->rd = extract32(insn, 7, 5);
}

static void decode_insn32_extract_u() is generated from the Format definition described below, and arg_u *a from the Argument Set definition; both are covered in later sections.

As we can see:

a->imm = ex_shift_12(ctx, sextract32(insn, 12, 20));
a->rd = extract32(insn, 7, 5);

a->imm is taken from insn[31:12] and sign-extended, then passed to ex_shift_12() to be shifted left by 12 bits.

P.S. In RISC-V, ex_shift_12() is expanded from the EX_SH macro defined in ./target/riscv/translate.c:

#define EX_SH(amount) \
    static int ex_shift_##amount(DisasContext *ctx, int imm) \
    {                                         \
        return imm << amount;                 \
    }
EX_SH(1)
EX_SH(2)
EX_SH(3)
EX_SH(4)
EX_SH(12)

a->rd is taken from insn[11:7].
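As a worked example of the U-type extraction, the sketch below decodes the encoding 0x123452b7 (lui t0, 0x12345). It is a standalone approximation rather than the generated code: shift_left_12() plays the role of ex_shift_12() but shifts as unsigned so that negative immediates stay well-defined in plain C, and decode_u() is a hypothetical convenience wrapper.

```c
#include <assert.h>
#include <stdint.h>

typedef struct {
    int imm;
    int rd;
} arg_u;

static inline uint32_t extract32(uint32_t value, int start, int length)
{
    assert(start >= 0 && length > 0 && length <= 32 - start);
    return (value >> start) & (~0U >> (32 - length));
}

/* Relies on arithmetic right shift of signed integers. */
static inline int32_t sextract32(uint32_t value, int start, int length)
{
    assert(start >= 0 && length > 0 && length <= 32 - start);
    return ((int32_t)(value << (32 - length - start))) >> (32 - length);
}

/* Same role as ex_shift_12(), without the DisasContext argument. */
static int shift_left_12(int imm)
{
    return (int32_t)((uint32_t)imm << 12);
}

/* Mirrors decode_insn32_extract_u(): imm from insn[31:12], rd from insn[11:7]. */
static void extract_u(arg_u *a, uint32_t insn)
{
    a->imm = shift_left_12(sextract32(insn, 12, 20));
    a->rd = (int)extract32(insn, 7, 5);
}

/* Hypothetical wrapper that returns the argument set by value. */
static arg_u decode_u(uint32_t insn)
{
    arg_u a;

    extract_u(&a, insn);
    return a;
}
```

For 0x123452b7 this yields imm = 0x12345000 and rd = 5; for 0xfffff0b7 (lui x1, 0xfffff) the sign extension makes imm equal to -4096.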

The Decodetree spec also mentions that a Field may be defined with only a !function, in which case the function is called directly and only the DisasContext is passed to it.

ARM Thumb's ./target/arm/t16.decode, for example, defines:

# Set S if the instruction is outside of an IT block.
%s               !function=t16_setflags

static void disas_t16_extract_addsub_2i(DisasContext *ctx, arg_s_rri_rot *a, uint16_t insn)
{
    a->imm = extract32(insn, 6, 3);
    a->rn = extract32(insn, 3, 3);
    a->rd = extract32(insn, 0, 3);
    a->s = t16_setflags(ctx); 
    a->rot = 0;
}

Note that a Field containing neither unnamed_fields nor a !function is treated as an error.

Argument Set

An Argument Set defines the struct that holds the field values extracted from an instruction.

args_def    := '&' identifier ( args_elt )+ ( !extern )?
args_elt    := identifier

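The syntax starts with &, followed by an identifier, one or more args_elt entries, and an optional !extern.

identifier is chosen by the developer, e.g. regs, loadstore, and so on.
!extern indicates that the argument set has already been defined elsewhere by another decoder. When it is present, the corresponding argument set struct is not generated again.

Argument Set examples

Example 1:

&reg3 ra rb rc

This generates the following argument set struct: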

typedef struct {
    int ra;
    int rb;
    int rc;
} arg_reg3;

Example 2:

&loadstore reg base offset

This generates the following argument set struct:

typedef struct {
    int base;
    int offset;
    int reg;
} arg_loadstore;

So for the RISC-V U-type instruction above, where we need the values of the imm and rd fields, the argument set can be declared as:

&u    imm rd

This generates the following argument set struct:

typedef struct {
    int imm;
    int rd;
} arg_u;

This argument set struct is passed to the extract function generated from the Format definition:

static void decode_insn32_extract_u(DisasContext *ctx, arg_u *a, uint32_t insn)
{
    a->imm = ex_shift_12(ctx, sextract32(insn, 12, 20));
    a->rd = extract32(insn, 7, 5);
}

The arg_u passed in holds the imm and rd values extracted from the instruction for later use.

Format

A Format defines an instruction format (such as the R, I, S, B, U and J types in RISC-V) and generates the corresponding extract function.

fmt_def      := '@' identifier ( fmt_elt )+
fmt_elt      := fixedbit_elt | field_elt | field_ref | args_ref
fixedbit_elt := [01.-]+
field_elt    := identifier ':' 's'? number
field_ref    := '%' identifier | identifier '=' '%' identifier
args_ref     := '&' identifier

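The syntax starts with @, followed by an identifier and one or more fmt_elt entries.

identifier is chosen by the developer, e.g. opr, opi, and so on.
fmt_elt can take any of the following forms:
- fixedbit_elt consists of one or more of 0, 1, . and -, each standing for one bit of the instruction.
  - . means the bit may be either 0 or 1.
  - - means the bit is ignored entirely.
- field_elt declares a field inline, using the Field syntax.
  - E.g. ra:5, rb:5, lit:8
- field_ref takes one of the two forms below (the examples refer to the Fields defined above):
  - '%' identifier: refers directly to a previously defined Field.
    - E.g. %rd generates: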
```
a->rd = extract32(insn, 7, 5);
```
  - identifier '=' '%' identifier: refers to a previously defined Field, but uses the first identifier as the name of the corresponding argument. This allows different argument names to refer to the same Field.
    - E.g. my_rd=%rd generates:
```
a->my_rd = extract32(insn, 7, 5); 
```
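- args_ref specifies the Argument Set passed to the extract function. If no args_ref is given, Decodetree generates an Argument Set automatically from the field_elt or field_ref entries. A Format may contain at most one args_ref.

When any fixedbit_elt or field_elt appears, all bits of the Format must be defined (padding with a fixedbit_elt made of . is an easy way to do so; whitespace is ignored).

Format examples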

@opi    ...... ra:5 lit:8    1 ....... rc:5

This defines the Format opi, where:

insn[31:26] may be 0 or 1.
insn[25:21] is ra.
insn[20:13] is lit.
insn[12] is fixed to 1.
insn[11:5] may be 0 or 1.
insn[4:0] is rc.

This Format generates the following extract function:

typedef struct {
    int lit;
    int ra;
    int rc;
} arg_decode_insn320;

static void decode_insn32_extract_opi(DisasContext *ctx, arg_decode_insn320 *a, uint32_t insn)
{
    a->ra = extract32(insn, 21, 5);
    a->lit = extract32(insn, 13, 8);
    a->rc = extract32(insn, 0, 5);
}

Because no args_ref was specified, Decodetree generated the Argument Set arg_decode_insn320 automatically from the field_elt definitions.

Take the RISC-V I-type instruction as an example:

# Fields:
%rs1       15:5
%rd        7:5

# immediates:
%imm_i    20:s12

# Argument sets:
&i    imm rs1 rd

@i       ........ ........ ........ ........ &i      imm=%imm_i     %rs1 %rd

This defines the Format i, where:

insn[31:20] is imm, sign-extended.
insn[19:15] is rs1.
insn[11:7] is rd.

In addition:

The Format specifies the Argument Set &i, which must contain every argument used (imm, rs1 and rd).
imm refers to the Field %imm_i through renaming.

This example generates the following extract function:

typedef struct {
    int imm;
    int rd;
    int rs1;
} arg_i;


static void decode_insn32_extract_i(DisasContext *ctx, arg_i *a, uint32_t insn)
{
    a->imm = sextract32(insn, 20, 12); 
    a->rs1 = extract32(insn, 15, 5);
    a->rd = extract32(insn, 7, 5);
}

Unlike the first example, this time we specified an args_ref, &i, so the corresponding arg_i is passed to the extract function.
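The I-type extraction can be checked by hand on a real encoding. The sketch below applies the same three extractions to 0xfff58513 (addi a0, a1, -1); decode_i() is a hypothetical wrapper, and the helpers again follow the shape of QEMU's extract32() and sextract32().

```c
#include <assert.h>
#include <stdint.h>

typedef struct {
    int imm;
    int rd;
    int rs1;
} arg_i;

static inline uint32_t extract32(uint32_t value, int start, int length)
{
    assert(start >= 0 && length > 0 && length <= 32 - start);
    return (value >> start) & (~0U >> (32 - length));
}

/* Relies on arithmetic right shift of signed integers. */
static inline int32_t sextract32(uint32_t value, int start, int length)
{
    assert(start >= 0 && length > 0 && length <= 32 - start);
    return ((int32_t)(value << (32 - length - start))) >> (32 - length);
}

/* Hypothetical wrapper around the three extractions of the @i Format. */
static arg_i decode_i(uint32_t insn)
{
    arg_i a;

    a.imm = sextract32(insn, 20, 12);     /* insn[31:20], sign-extended */
    a.rs1 = (int)extract32(insn, 15, 5);  /* insn[19:15] */
    a.rd = (int)extract32(insn, 7, 5);    /* insn[11:7] */
    return a;
}
```

With 0xfff58513 the fields come out as imm = -1, rs1 = 11 (a1) and rd = 10 (a0), which matches the assembly form of the instruction.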

Returning to the RISC-V U-type instruction, its format can be defined the same way as the I-type:

# Fields:
%rd        7:5

# immediates:
%imm_u    12:s20                 !function=ex_shift_12

# Argument sets:
&u    imm rd

@u       ....................      ..... ....... &u      imm=%imm_u          %rd

This defines the Format u, where:

insn[31:12] is imm, sign-extended.
insn[11:7] is rd.

It generates the following extract function:

typedef struct {
    int imm;
    int rd;
} arg_u;


static void decode_insn32_extract_u(DisasContext *ctx, arg_u *a, uint32_t insn)
{
    a->imm = ex_shift_12(ctx, sextract32(insn, 12, 20));
    a->rd = extract32(insn, 7, 5);
}

As we can see:

The Format specifies the Argument Set &u, which must contain every argument used (imm and rd).
imm refers to the Field %imm_u through renaming.

Pattern

A Pattern defines how one actual instruction is decoded. Decodetree generates the switch-case decode branches from the Pattern definitions.

pat_def      := identifier ( pat_elt )+
pat_elt      := fixedbit_elt | field_elt | field_ref | args_ref | fmt_ref | const_elt
fmt_ref      := '@' identifier
const_elt    := identifier '=' number

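The syntax is a user-defined identifier followed by one or more pat_elt entries.

identifier is chosen by the developer, e.g. addl_r, addli, and so on.
pat_elt can take any of the following forms:
- fixedbit_elt is the same as fixedbit_elt in a Format.
- field_elt is the same as field_elt in a Format.
- field_ref is the same as field_ref in a Format.
- args_ref is the same as args_ref in a Format.
- fmt_ref refers directly to a previously defined Format.
- const_elt sets the value of an argument directly.

Because a Pattern defines how an actual instruction is decoded, every bit and every argument (when an args_ref is referenced) must be explicitly defined. If any bit or argument remains undefined after all pat_elt entries are combined, Decodetree reports an error.

The decoder generated from a Pattern finally calls the corresponding translator function, which the developer must implement.

Pattern examples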

addl_i   010000 ..... ..... .... 0000000 ..... @opi

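This defines the Pattern for the addl_i instruction, where:

insn[31:26] is 010000.
insn[11:5] is 0000000.
It refers to the @opi Format defined in the Format examples.
Because every bit of a Pattern must be explicitly defined, @opi must supply the format definition of the remaining insn[25:12] and insn[4:0]; otherwise Decodetree reports an error.

Finally, the addl_i decoder calls the translator function trans_addl_i().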
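The fixed bits of a Pattern, merged with those of its Format, boil down to a mask/value pair that the generated switch-case tests. The following sketch shows one way to derive that pair from a bit string; parse_fixedbits() and the fixedbits struct are hypothetical and are not how decodetree.py is implemented. For addl_i, bit 12 is written as 1 to account for the fixed bit contributed by @opi.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

typedef struct {
    uint32_t mask;   /* 1 where the bit is fixed to 0 or 1 */
    uint32_t value;  /* required value of the fixed bits */
    int width;       /* number of bits described so far */
} fixedbits;

/* Parse a string of '0', '1', '.' and '-' (spaces ignored), MSB first. */
static fixedbits parse_fixedbits(const char *pattern)
{
    fixedbits fb = { 0, 0, 0 };

    for (const char *p = pattern; *p != '\0'; p++) {
        if (*p == ' ') {
            continue;
        }
        assert(fb.width < 32);  /* this sketch handles at most 32 bits */
        fb.mask <<= 1;
        fb.value <<= 1;
        fb.width++;
        if (*p == '0' || *p == '1') {
            fb.mask |= 1;
            fb.value |= (uint32_t)(*p - '0');
        }
        /* '.' and '-' leave both bits clear: they never constrain a match */
    }
    return fb;
}

/* An instruction matches when all fixed bits have the required value. */
static bool fixedbits_match(fixedbits fb, uint32_t insn)
{
    return (insn & fb.mask) == fb.value;
}
```

Parsing "010000 ..... ..... ...1 0000000 ....." gives mask 0xfc001fe0 and value 0x40001000, so any instruction with insn & 0xfc001fe0 == 0x40001000 would be routed to trans_addl_i().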

Combined with the Fields, Argument Sets and Formats introduced earlier, a few complete examples make it clearer how Decodetree generates the decoder for an instruction.

First, the RISC-V lui and auipc instructions:

# Fields:
%rd        7:5

# immediates:
%imm_u    12:s20                 !function=ex_shift_12

# Argument sets:
&u    imm rd

# Formats:
@u       ....................      ..... ....... &u      imm=%imm_u          %rd

# Patterns
lui      ....................       ..... 0110111 @u
auipc    ....................       ..... 0010111 @u

This generates the following decoder for lui and auipc:

typedef struct {
    int imm;
    int rd;
} arg_u;


static void decode_insn32_extract_u(DisasContext *ctx, arg_u *a, uint32_t insn)
{
    a->imm = ex_shift_12(ctx, sextract32(insn, 12, 20));
    a->rd = extract32(insn, 7, 5);
}

static bool decode_insn32(DisasContext *ctx, uint32_t insn)
{
    union {
        arg_u f_u;
    } u;

    decode_insn32_extract_u(ctx, &u.f_u, insn);
    switch (insn & 0x0000007f) {
    case 0x00000017:
    
    
        if (trans_auipc(ctx, &u.f_u)) return true;
        return false;
    case 0x00000037:
    
    
        if (trans_lui(ctx, &u.f_u)) return true;
        return false;
    }
    return false;
}

To recap what has been covered so far:

Argument Sets: the &u argument set contains the two arguments imm and rd.
```
typedef struct {
    int imm;
    int rd;
} arg_u;
```
Fields: imm and rd are located at insn[31:12] and insn[11:7], and imm is sign-extended. After the value of imm is extracted, ex_shift_12() is called on it.
```
a->imm = ex_shift_12(ctx, sextract32(insn, 12, 20));
a->rd = extract32(insn, 7, 5);
```
Formats: @u defines the format of RISC-V U-type instructions.
- It refers to the &u Argument Set, so arg_u is passed to the extract function as a parameter.
- insn[31:12] refers to the imm_u Field (renamed to imm).
- insn[11:7] refers to the rd Field.
```
static void decode_insn32_extract_u(DisasContext *ctx, arg_u *a, uint32_t insn)
{
    a->imm = ex_shift_12(ctx, sextract32(insn, 12, 20));
    a->rd = extract32(insn, 7, 5);
}
```
Patterns:
- The opcode of lui (insn[6:0]) is 0110111, i.e. 0x37, and its case appears in the generated switch-case.
- The lui decoder finally calls trans_lui(), passing the DisasContext and the arg_u filled in by decode_insn32_extract_u().
- The opcode of auipc (insn[6:0]) is 0010111, i.e. 0x17, and its case appears in the generated switch-case.
- The auipc decoder finally calls trans_auipc(), passing the DisasContext and the arg_u filled in by decode_insn32_extract_u().
- P.S. Decodetree noticed that lui and auipc can share decode_insn32_extract_u(), so it hoisted the call out of the switch-case.
```
static bool decode_insn32(DisasContext *ctx, uint32_t insn)
{
    union {
        arg_u f_u;
    } u;

    decode_insn32_extract_u(ctx, &u.f_u, insn);
    switch (insn & 0x0000007f) {
    case 0x00000017:


        if (trans_auipc(ctx, &u.f_u)) return true;
        return false;
    case 0x00000037:


        if (trans_lui(ctx, &u.f_u)) return true;
        return false;
    }
    return false;
}
```
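We can also see that Pattern + Format together give all 32 bits an explicit definition:
- The Pattern defines the opcode (insn[6:0]).
- The Format refers to imm (insn[31:12]) and rd (insn[11:7]).
If any bit is left without an explicit definition, Decodetree reports an error. For example, if we change the top 2 bits of the lui opcode (insn[6:5]) from 01 to ..: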
```
lui      ....................       ..... ..10111 @u
```
Decodetree then reports an error while parsing:

./insn32.decode:17: error: (‘bits left unspecified (0x00000060)’,)

Decodetree tells us that insn[6:5] (0x00000060) has no explicit definition, and reports the line number of the error.
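The 0x00000060 in that message can be reproduced with a little mask arithmetic: a bit is unspecified when it is covered neither by a fixed bit, nor by a field, nor by an ignored (-) bit. The sketch below is a simplified model of that check under this assumption, not the actual decodetree.py logic; field_mask() and unspecified_bits() are hypothetical names.

```c
#include <assert.h>
#include <stdint.h>

/* Mask covering `length` bits starting at bit `start`. */
static uint32_t field_mask(int start, int length)
{
    assert(start >= 0 && length > 0 && length <= 32 - start);
    return (~0U >> (32 - length)) << start;
}

/* Bits of a 32-bit instruction that no element of the Pattern or Format covers. */
static uint32_t unspecified_bits(uint32_t fixedmask, uint32_t fieldmask,
                                 uint32_t ignoremask)
{
    return ~(fixedmask | fieldmask | ignoremask);
}
```

For @u the fields cover field_mask(12, 20) | field_mask(7, 5) = 0xffffff80. With the full opcode (fixed mask 0x7f) nothing is left over, while the shortened opcode ..10111 (fixed mask 0x1f) leaves exactly 0x00000060.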

trans_lui() and trans_auipc() are defined in target/riscv/insn_trans/trans_rvi.inc.c:
```
static bool trans_lui(DisasContext *ctx, arg_lui *a)
{
    if (a->rd != 0) {
        tcg_gen_movi_tl(cpu_gpr[a->rd], a->imm);
    }
    return true;
}

static bool trans_auipc(DisasContext *ctx, arg_auipc *a)
{
    if (a->rd != 0) {
        tcg_gen_movi_tl(cpu_gpr[a->rd], a->imm + ctx->base.pc_next);
    }
    return true;
}
```
As we can see, trans_*() implements the actual semantics of the instruction and generates the corresponding TCG code.
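To see the whole path from encoding to translator function without a QEMU build, the toy model below keeps the shape of decode_insn32() but replaces TCG with direct writes to a register array. ToyCtx is a hypothetical stand-in for DisasContext plus CPU state, and the translator functions here execute the instruction immediately, whereas the real ones emit TCG ops at translation time.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

typedef struct {
    int imm;
    int rd;
} arg_u;

/* Hypothetical stand-in for DisasContext and the guest registers. */
typedef struct {
    uint32_t pc;
    uint32_t gpr[32];
} ToyCtx;

static void extract_u(arg_u *a, uint32_t insn)
{
    /* insn[31:12] sign-extended and shifted left by 12 equals insn & 0xfffff000. */
    a->imm = (int32_t)(insn & 0xfffff000u);
    a->rd = (int)((insn >> 7) & 0x1f);
}

static bool trans_lui(ToyCtx *ctx, arg_u *a)
{
    assert(a->rd >= 0 && a->rd < 32);
    if (a->rd != 0) {           /* x0 is hard-wired to zero */
        ctx->gpr[a->rd] = (uint32_t)a->imm;
    }
    return true;
}

static bool trans_auipc(ToyCtx *ctx, arg_u *a)
{
    assert(a->rd >= 0 && a->rd < 32);
    if (a->rd != 0) {
        ctx->gpr[a->rd] = (uint32_t)a->imm + ctx->pc;
    }
    return true;
}

/* Same dispatch structure as the generated decoder: mask, switch, trans_*(). */
static bool decode_insn32(ToyCtx *ctx, uint32_t insn)
{
    arg_u a;

    extract_u(&a, insn);
    switch (insn & 0x0000007f) {
    case 0x00000017:
        return trans_auipc(ctx, &a);
    case 0x00000037:
        return trans_lui(ctx, &a);
    }
    return false;
}
```

Feeding it 0x123452b7 (lui t0, 0x12345) sets gpr[5] to 0x12345000, and 0x00001297 (auipc t0, 1) with pc = 0x1000 sets gpr[5] to 0x2000; any other opcode makes decode_insn32() return false.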

As mentioned earlier, a pat_elt in a Pattern can also use the field_elt syntax, as in the RISC-V fence instruction:

fence    ---- pred:4 succ:4 ----- 000 ----- 0001111

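insn[27:24] is pred.
insn[23:20] is succ.
insn[14:12] is fixed to 000.
insn[6:0] is the opcode (0001111).
No Format is referenced.
The remaining insn[31:28], insn[19:15] and insn[11:7] are declared as -, so it does not matter that they are not explicitly defined.

The generated fence decoder is as follows: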

typedef struct {
    int pred;
    int succ;
} arg_decode_insn320;


static void decode_insn32_extract_decode_insn32_Fmt_0(DisasContext *ctx, arg_decode_insn320 *a, uint32_t insn)
{
    a->pred = extract32(insn, 24, 4);
    a->succ = extract32(insn, 20, 4);
}

static bool decode_insn32(DisasContext *ctx, uint32_t insn)
{
    union {
        arg_decode_insn320 f_decode_insn320;
    } u;

    decode_insn32_extract_decode_insn32_Fmt_0(ctx, &u.f_decode_insn320, insn);
    switch (insn & 0x0000707f) {
    case 0x0000000f:
    
    
        if (trans_fence(ctx, &u.f_decode_insn320)) return true;
        return false;
    }
    return false;
}

Note that although no Argument Set was referenced this time, Decodetree still generated an arg_decode_insn320 containing pred and succ for us.

trans_fence() is likewise defined in ./target/riscv/insn_trans/trans_rvi.inc.c:

static bool trans_fence(DisasContext *ctx, arg_fence *a)
{
  
    tcg_gen_mb(TCG_MO_ALL | TCG_BAR_SC);
    return true;
}

Pattern Groups

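A Pattern Group consists of one or more Patterns. The main difference from plain Patterns is that the bits of different Patterns in a group may overlap. When a group contains several Patterns, they are tested against the current instruction in declaration order. In addition, when the trans_*() of a matching Pattern returns false, the Pattern is treated as not matching and the next Pattern in the group is tried. Pattern Groups are therefore a good fit for collecting several instructions with similar formats.

group    := '{' ( pat_def | group )+ '}'

A Pattern Group starts with { and ends with }, and nested pattern groups are allowed. The rest of the syntax is the same as for Patterns.

Pattern Group examples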

{
  {
    nop   000010 ----- ----- 0000 001001 0 00000
    copy  000010 00000 r1:5  0000 001001 0 rt:5
  }
  or      000010 rt2:5 r1:5  cf:4 001001 0 rt:5
}

This generates the following decoder:

switch (insn & 0xfc000fe0) {
case 0x08000240:
    if ((insn & 0x0000f000) == 0x00000000) {
        if ((insn & 0x0000001f) == 0x00000000) {
            extract_decode_Fmt_0(&u.f_decode0, insn);
            if (trans_nop(ctx, &u.f_decode0)) return true;
        }
        if ((insn & 0x03e00000) == 0x00000000) {
            extract_decode_Fmt_1(&u.f_decode1, insn);
            if (trans_copy(ctx, &u.f_decode1)) return true;
        }
    }
    extract_decode_Fmt_2(&u.f_decode2, insn);
    if (trans_or(ctx, &u.f_decode2)) return true;
    return false;
}

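When an instruction matches the inner Pattern Group of nop and copy, the decoder first checks whether it matches the definition of nop and whether trans_nop() returns true. If not, it goes on to check the copy instruction in the same group. If neither matches, it then checks the or instruction of the outer Pattern Group. Only if that fails too does it return false to signal a decode failure.

The biggest difference from plain Patterns is that when the trans_*() of a Pattern returns false, the decoder does not immediately return false (decode failure) but continues checking whether the subsequent Patterns match.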
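The fall-through behaviour is easy to observe in a small model. The sketch below reuses the masks of the generated nop/copy/or decoder but replaces the translator functions with stubs that only record which handler accepted the instruction; ToyCtx, the handler enum, the nop_declines flag and decode_which() are all hypothetical and exist only for this illustration.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

typedef enum { HANDLER_NONE, HANDLER_NOP, HANDLER_COPY, HANDLER_OR } handler;

/* Hypothetical context: remembers which trans_*() accepted the instruction. */
typedef struct {
    handler accepted;
    bool nop_declines;  /* when set, trans_nop() returns false */
} ToyCtx;

static bool trans_nop(ToyCtx *ctx)
{
    if (ctx->nop_declines) {
        return false;   /* treated as "no match": the group keeps going */
    }
    ctx->accepted = HANDLER_NOP;
    return true;
}

static bool trans_copy(ToyCtx *ctx)
{
    ctx->accepted = HANDLER_COPY;
    return true;
}

static bool trans_or(ToyCtx *ctx)
{
    ctx->accepted = HANDLER_OR;
    return true;
}

/* Same control flow as the generated decoder for the nested group. */
static bool decode(ToyCtx *ctx, uint32_t insn)
{
    switch (insn & 0xfc000fe0) {
    case 0x08000240:
        if ((insn & 0x0000f000) == 0x00000000) {
            if ((insn & 0x0000001f) == 0x00000000) {
                if (trans_nop(ctx)) return true;
            }
            if ((insn & 0x03e00000) == 0x00000000) {
                if (trans_copy(ctx)) return true;
            }
        }
        if (trans_or(ctx)) return true;
        return false;
    }
    return false;
}

/* Hypothetical helper: which handler ends up accepting `insn`? */
static handler decode_which(uint32_t insn, bool nop_declines)
{
    ToyCtx ctx = { HANDLER_NONE, nop_declines };
    bool ok = decode(&ctx, insn);

    assert(!ok || ctx.accepted != HANDLER_NONE);
    return ok ? ctx.accepted : HANDLER_NONE;
}
```

The encoding 0x08000240 is accepted by nop, but if trans_nop() declines it, the same encoding falls through to copy because its rt2 bits are also zero. An encoding with a non-zero cf field, such as 0x08001240, skips the inner group and goes straight to or.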

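The c.ebreak, c.jalr and c.add instructions of the RISC-V Compressed Extension have very similar formats, which makes them a good fit for a Pattern Group.

The RISC-V spec states:

C.EBREAK shares its opcode with C.ADD, but with rd and rs2 both zero, so it can also use the CR format.
C.JALR is only valid when rs1≠x0; the code point with rs1=x0 corresponds to the C.EBREAK instruction.
C.ADD is only valid when rs2≠x0; the code points with rs2=x0 correspond to the C.JALR and C.EBREAK instructions. The code points with rs2≠x0 and rd=x0 are HINTs.

For the three instructions c.ebreak, c.jalr and c.add:

insn[15:13], insn[12] and insn[1:0] have the same values.
When insn[11:7] and insn[6:2] are both 0 (rs1=0 and rs2=0), the instruction is c.ebreak.
When only insn[6:2] is 0 (rs1≠0 and rs2=0), the instruction is c.jalr.
Otherwise (rs2≠0), the instruction is c.add.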

# Fields
%rd        7:5
%rs2_5     2:5

# Argument Sets
&r         rd rs1 rs2   !extern
&i         imm rs1 rd   !extern

# Formats
@cr        ....  ..... .....  .. &r      rs2=%rs2_5       rs1=%rd     %rd
@c_jalr    ... . .....  ..... .. &i      imm=0 rs1=%rd

# Pattern Groups
{
  ebreak          100 1  00000  00000 10
  jalr            100 1  .....  00000 10 @c_jalr rd=1  # C.JALR
  add             100 1  .....  ..... 10 @cr
}

The generated decoder is as follows:

static void decode_insn16_extract_c_jalr(DisasContext *ctx, arg_i *a, uint16_t insn)
{
    a->imm = 0; 
    a->rs1 = extract32(insn, 7, 5);
}

static void decode_insn16_extract_cr(DisasContext *ctx, arg_r *a, uint16_t insn)
{
    a->rs2 = extract32(insn, 2, 5);
    a->rs1 = extract32(insn, 7, 5);
    a->rd = extract32(insn, 7, 5);
}

static void decode_insn16_extract_decode_insn16_Fmt_2(DisasContext *ctx, arg_decode_insn162 *a, uint16_t insn)
{}


static bool decode_insn16(DisasContext *ctx, uint16_t insn)
{
    union {
        arg_decode_insn162 f_decode_insn162;
        arg_i f_i;
        arg_r f_r;
    } u;

    switch (insn & 0x0000f003) {
    case 0x00009002:
    
        if ((insn & 0x00000ffc) == 0x00000000) {
        
        
            decode_insn16_extract_decode_insn16_Fmt_2(ctx, &u.f_decode_insn162, insn);
            if (trans_ebreak(ctx, &u.f_decode_insn162)) return true;
        
        }
        if ((insn & 0x0000007c) == 0x00000000) {
        
        
            decode_insn16_extract_c_jalr(ctx, &u.f_i, insn);
            u.f_i.rd = 1; 
            if (trans_jalr(ctx, &u.f_i)) return true;
        
        }
    
        decode_insn16_extract_cr(ctx, &u.f_r, insn);
        if (trans_add(ctx, &u.f_r)) return true;
        return false;
    }
    return false;
}

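When an instruction matches the Pattern Group of c.ebreak, c.jalr and c.add, the decoder checks in order whether it matches the definition of c.ebreak, c.jalr and c.add, calling the corresponding trans_*() for each match.

It is also worth noting that the @c_jalr Format sets imm to 0 and the jalr Pattern sets rd to 1, and the generated code assigns those constants in the corresponding places: a->imm = 0 in decode_insn16_extract_c_jalr() and u.f_i.rd = 1 in decode_insn16().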
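The ordering of the three checks can be tried out on concrete 16-bit encodings. The sketch below classifies an instruction with the same masks as the generated decode_insn16() and fills in the c.jalr arguments the way the Format and Pattern constants dictate. rvc_kind, classify_cr() and decode_c_jalr() are hypothetical names, and unlike the real decoder this model does not fall through when a trans_*() returns false.

```c
#include <assert.h>
#include <stdint.h>

typedef enum { RVC_OTHER, RVC_EBREAK, RVC_JALR, RVC_ADD } rvc_kind;

typedef struct {
    int imm;
    int rs1;
    int rd;
} arg_i;

/* Checks are ordered from most to least specific, as in the Pattern Group. */
static rvc_kind classify_cr(uint16_t insn)
{
    if ((insn & 0xf003) != 0x9002) {
        return RVC_OTHER;       /* not funct4 = 1001 with op = 10 */
    }
    if ((insn & 0x0ffc) == 0) {
        return RVC_EBREAK;      /* insn[11:7] = 0 and insn[6:2] = 0 */
    }
    if ((insn & 0x007c) == 0) {
        return RVC_JALR;        /* only insn[6:2] = 0 */
    }
    return RVC_ADD;
}

/* @c_jalr sets imm=0 and rs1=%rd; the jalr Pattern adds rd=1 (link in ra). */
static arg_i decode_c_jalr(uint16_t insn)
{
    arg_i a;

    assert(classify_cr(insn) == RVC_JALR);
    a.imm = 0;
    a.rs1 = (insn >> 7) & 0x1f;
    a.rd = 1;
    return a;
}
```

Here 0x9002 classifies as c.ebreak, 0x9082 (c.jalr ra) as c.jalr with rs1 = 1 and rd = 1, and 0x908a (c.add ra, sp) as c.add.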

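Summary

That covers the Decodetree syntax. With Decodetree we no longer have to hand-write one huge switch-case to decode instructions. Putting different classes of instructions in separate decode files makes them both easier to maintain and easier to read.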

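decodetree.py accepts the following command-line options:

--translate: prefix of the translator functions, trans by default. Once specified, the translator functions are no longer static.
--decode: prefix of the decode function, decode by default, with static scope. Once specified, the decode function is no longer static.
--static-decode: same as --decode, but the decode function remains static.
-o / --output: path of the generated decoder .c file.
-w / --insnwidth: instruction width, e.g. 32 or 16; the default is 32.
--varinsnwidth: instructions are of variable width.
The last argument is the path of the input decode file.

Example invocation: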

./decodetree.py -o target/riscv/decode_insn16.inc.c --static-decode decode_insn16 \
    -w 16 ./insn16.decode

static inline int32_t sextract32(uint32_t value, int start, int length)
{
    assert(start >= 0 && length > 0 && length <= 32 - start);
    return ((int32_t)(value << (32 - length - start))) >> (32 - length);
}

Permalink: https://lifeislife.cn/2024/03/31/QEMU-Decodetree详解/

Copyright notice: Unless otherwise stated, all articles on this blog are licensed under CC BY 4.0 CN. Please credit the source when reposting!

夜云泊, Software Engineer
