VCODE: A Retargetable, Extensible, Very Fast Dynamic Code Generation System

Dawson R. Engler (MIT), 1996

Summary. Previous code-generation systems sacrified speed for portability. At the expense of ease-of-use (VCODE offloads the burden of global optimizations such as register allocation and instruction scheduling, etc.. to the client which must code to a MIPS-like instruction set) VCODE generates machine code very efficiently by doing so in-place without the use of intermediate representations. VCODE has been designed in such a way that to retarget can take as little as 1-4 days for a RISC architecture.

More detail...

Client can violate VCODE abstractions if more efficiency desired in three ways:

Can dynamically control register class (callee/caller- saved, unavailable) to handle special cases like interrupt handlers where all registers are "live".
Trade register allocation flexibility for faster codegen speeds by using hard-coded registers.
Can use portable macros to perform instruction scheduling of load- and branch-delay slots.

Codegen steps. Client writes to vcode macros:

    iptr mkplus1(struct v_code *ip) {
	v_ref arg[1];			/* store register of our one
				         * arg in local arg[1] */
	v_lambda("%i", arg, V_LEAF, ip);/* allocate space on stack for callee-saved
					 * regs, space for instructions
					 * to perform save/restore of
					 * callee-saved regs. */
	v_addii(arg[0], arg[0], 1);	/* add the integer immediate */
	v_reti(arg[0]);			/* return integer */
	return (iptr)v_end();
    }

Paper's example from the MIPS implementation of how this works:

/* vcode instruction to add two unsigned integer registers on the MIPS arch */
#define v_addu(rd, rs1, rs2) addu(rd,rs1,rs2)

/* Macro to generate the MIPS addu instruction (opcode 0x12) */
#define addu(dst, src1, src2) (*v_ip++=(((src1) <<21) | ((src2<<16) | (((dst)<<11) | 0x21))

# MIPS assembly code generated by gcc -O2 to implement the "addu" macro 
	lw	v1, 1244(gp)	#allocate instruction
	sll	a1, a1, 21	#shift and then or in the register vals
	sll	a2, a2, 16
	or	a1, a1, a2
	sll	a0, a0, 11
	or	a1, a1, a0
	addiu	v0, v1, 4	#bump instruction pointer
	sw	v0, 1244(gp)	#store the new instruction pointer
	ori	a1, a1, 0x21	#or in the opcode
	sw	a1, 0(v1)	#store the instruction in memory

Easily retargetable. VCODE was designed to be easily retargetable. Author claims that to retarget to a RISC architecture would take approximately 1-4 days. The VCODE instruction set consists of a core layer that must be retargeted and a multiple extension layers that are built on top of the core.

To retarget, must

construct macros to generate executable code for each machine instruction to be used,
map the core VCODE instruction set onto these macros, and
implement the machine's default calling conventions and activation record management.

Applications. The author used VCODE for three clients (to date): tcc (`C compiler), DPF, ASH (user-level network protocol implementation that was able to leverage low-level IR to optimize in a way that higher-level code doesn't allow and compose multiple data manipulation steps from different layers [byte-swapping, checksum, copy] into a single pass.

Miscellaneous

Engler is proving to be one of my favorite authors due not only to interesting content but also interesting use of English:

"eshewing an intermediate representation"
"boilerplate issues"
"has long been a goal of the networking community"
" have no natural high-level idiom"
" is inelegant but workable"
" must be couched withing the context of a well-formed data structure"
"generated code is generally not created ex nihilo"
see DPF paper for other examples.

Questions. Peephole optimizer and "deep instruction reorder buffers".