Summary. Previous code-generation systems sacrificed speed for portability. VCODE generates machine code very efficiently by emitting it in place, without the use of intermediate representations. The cost is ease of use: VCODE offloads the burden of global optimizations such as register allocation and instruction scheduling to the client, which must code to a MIPS-like instruction set. VCODE has been designed so that retargeting it to a RISC architecture can take as little as 1-4 days.
    iptr mkplus1(struct v_code *ip) {
        v_ref arg[1];                     /* arg holds the register of our one argument */

        /* allocate space on stack for callee-saved regs, and space for
         * instructions to perform save/restore of callee-saved regs. */
        v_lambda("%i", arg, V_LEAF, ip);

        v_addii(arg[0], arg[0], 1);       /* add the integer immediate */
        v_reti(arg[0]);                   /* return integer */
        return (iptr)v_end();
    }

Paper's example from the MIPS implementation of how this works:
    /* vcode instruction to add two unsigned integer registers on the MIPS arch */
    #define v_addu(rd, rs1, rs2) addu(rd, rs1, rs2)

    /* Macro to generate the MIPS addu instruction (function code 0x21) */
    #define addu(dst, src1, src2) \
        (*v_ip++ = (((src1) << 21) | ((src2) << 16) | ((dst) << 11) | 0x21))

    # MIPS assembly code generated by gcc -O2 to implement the "addu" macro
    lw    v1, 1244(gp)    # allocate instruction
    sll   a1, a1, 21      # shift and then or in the register vals
    sll   a2, a2, 16
    or    a1, a1, a2
    sll   a0, a0, 11
    or    a1, a1, a0
    addiu v0, v1, 4       # bump instruction pointer
    sw    v0, 1244(gp)    # store the new instruction pointer
    ori   a1, a1, 0x21    # or in the opcode
    sw    a1, 0(v1)       # store the instruction in memory
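The in-place emission shown by the macro can also be sketched as ordinary C functions, which makes the bit layout easier to check. This is a minimal sketch, not VCODE's code: the `emitter` struct, `encode_addu`, and `emit_addu` are hypothetical names, and the explicit buffer bounds check is an addition the macro version does not have.

```c
#include <stdint.h>
#include <stddef.h>

/* Hypothetical emitter state: a cursor into a caller-supplied code buffer.
 * The macro above uses a global v_ip instead; the struct is an assumption
 * made to keep this sketch self-contained. */
typedef struct {
    uint32_t *ip;   /* next free instruction slot */
    uint32_t *end;  /* one past the last usable slot */
} emitter;

/* Encode the MIPS R-type instruction "addu rd, rs, rt":
 * rs in bits 25..21, rt in bits 20..16, rd in bits 15..11, funct 0x21. */
static uint32_t encode_addu(unsigned rd, unsigned rs, unsigned rt)
{
    return ((uint32_t)(rs & 31u) << 21) |
           ((uint32_t)(rt & 31u) << 16) |
           ((uint32_t)(rd & 31u) << 11) |
           0x21u;
}

/* Write the instruction word directly into the buffer and bump the cursor.
 * Returns 0 on success, -1 if the buffer is full. */
static int emit_addu(emitter *e, unsigned rd, unsigned rs, unsigned rt)
{
    if (e->ip >= e->end)
        return -1;
    *e->ip++ = encode_addu(rd, rs, rt);
    return 0;
}
```

No intermediate data structure is built: each call costs a few shifts, ORs, and one store, which matches the short gcc output listed above.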
Easily retargetable. VCODE was designed to be easily retargetable: the author claims that a port to a RISC architecture takes approximately 1-4 days. The VCODE instruction set consists of a core layer, which must be retargeted, and multiple extension layers that are built on top of the core.
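One way to picture the core/extension split is a small table of core operations that each port supplies, with extension instructions written only in terms of that table. The sketch below assumes that structure; every name in it (`core_ops`, `ext_neg`, the recording "port") is hypothetical and is not VCODE's actual interface.

```c
#include <stddef.h>

/* Hypothetical core layer: the only operations a port has to implement. */
enum op { OP_SETI, OP_SUB };

struct insn { enum op op; int rd, a, b; };

struct core_ops {
    void (*seti)(void *ctx, int rd, int imm);          /* rd = imm       */
    void (*sub)(void *ctx, int rd, int rs1, int rs2);  /* rd = rs1 - rs2 */
};

/* Hypothetical extension instruction: negate, built purely from core ops
 * (tmp = 0; rd = tmp - rs). It needs no per-target code. */
static void ext_neg(const struct core_ops *core, void *ctx,
                    int rd, int rs, int tmp)
{
    core->seti(ctx, tmp, 0);
    core->sub(ctx, rd, tmp, rs);
}

/* A toy "port" that records instructions instead of encoding them,
 * standing in for a real machine back end. */
struct recorder { struct insn buf[8]; size_t n; };

static void rec_seti(void *ctx, int rd, int imm)
{
    struct recorder *r = ctx;
    if (r->n < 8)
        r->buf[r->n++] = (struct insn){ OP_SETI, rd, imm, 0 };
}

static void rec_sub(void *ctx, int rd, int rs1, int rs2)
{
    struct recorder *r = ctx;
    if (r->n < 8)
        r->buf[r->n++] = (struct insn){ OP_SUB, rd, rs1, rs2 };
}

static const struct core_ops recorder_port = { rec_seti, rec_sub };
```

Under this assumption, porting means writing a new `core_ops` table; `ext_neg` and any other extension written the same way carry over unchanged.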
To retarget, must
Applications. The author has used VCODE in three clients to date: tcc (a `C compiler), DPF, and ASH. ASH is a user-level network protocol implementation that leverages the low-level IR to optimize in ways that higher-level code doesn't allow, composing multiple data manipulation steps from different layers (byte-swapping, checksum, copy) into a single pass.
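The benefit of composing the data manipulation steps can be illustrated with a hand-fused loop. This is a static sketch only: ASH performs the composition with dynamically generated code, while the function below is fixed at compile time. The name `copy_swap_sum` and the plain 32-bit additive checksum are assumptions chosen for brevity, not the protocol's real checksum.

```c
#include <stdint.h>
#include <stddef.h>

/* Reverse the byte order of one 32-bit word. */
static uint32_t bswap32(uint32_t w)
{
    return (w >> 24) | ((w >> 8) & 0x0000ff00u) |
           ((w << 8) & 0x00ff0000u) | (w << 24);
}

/* Hypothetical fused pass: each source word is loaded once, added to a
 * running sum (wrapping modulo 2^32), byte-swapped, and stored to dst.
 * Done as three separate layers, the data would be traversed three times. */
static uint32_t copy_swap_sum(uint32_t *dst, const uint32_t *src, size_t nwords)
{
    uint32_t sum = 0;
    for (size_t i = 0; i < nwords; i++) {
        uint32_t w = src[i];   /* single load per word */
        sum += w;              /* checksum step */
        dst[i] = bswap32(w);   /* byte-swap and copy steps */
    }
    return sum;
}
```

Touching each word once keeps it in a register across all three steps, which is the saving the single-pass composition is after.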
Questions. Peephole optimizer and "deep instruction reorder buffers".