Thursday, August 21. 2008
A few months ago, when I first started looking at exception handling support for ggx, I discovered that it would be really easy if only I had a few spare registers. ggx only has 8 registers, and gcc likes to claim some for itself now and then. EH support is possible with only 8, but far too tricky for my taste.
At about the same time, I noticed that gcc was emitting a lot of code that looked like:
add.l $r1, $r1, $r2
This means "add the contents of $r1 to $r2 and store the result in $r1". We don't actually need three operands for these instructions. Two would do just fine, as long as we always put the result in one of the operand registers.
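To make the distinction concrete, here is a minimal C sketch of how a simulator might execute the two forms of `add.l`. The register-file array and the function names are hypothetical and are not taken from the ggx simulator. Registers are unsigned so that overflow wraps instead of being undefined behaviour.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical 8-entry register file, matching the current ggx register count. */
static uint32_t reg[8];

/* Three-operand form, e.g. add.l $r1, $r2, $r3 : rA = rB + rC */
static void add_3op(unsigned a, unsigned b, unsigned c)
{
    assert(a < 8 && b < 8 && c < 8);
    reg[a] = reg[b] + reg[c];
}

/* Two-operand form, e.g. add.l $r1, $r2 : rA = rA + rB.
   The destination register doubles as the first source. */
static void add_2op(unsigned a, unsigned b)
{
    assert(a < 8 && b < 8);
    reg[a] = reg[a] + reg[b];
}
```

Calling `add_3op(1, 1, 2)` and `add_2op(1, 2)` leave the machine in the same state, which is exactly the pattern gcc keeps emitting.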
If you recall, we currently encode three-operand instructions into 16 bits like so:
0ooooooaaabbbccc
oooooo - FORM 1 opcode number
aaa - operand A
bbb - operand B
ccc - operand C
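As a rough illustration of that layout, the sketch below packs and unpacks the fields with shifts and masks: bit 15 is 0, bits 14..9 hold the 6-bit opcode, and bits 8..6, 5..3 and 2..0 hold operands A, B and C. The function names are made up for this example, and the opcode value used with them is arbitrary rather than a real ggx opcode.

```c
#include <assert.h>
#include <stdint.h>

/* Pack a FORM 1 three-operand instruction: 0ooooooaaabbbccc */
static uint16_t encode_form1_3op(unsigned op, unsigned a, unsigned b, unsigned c)
{
    assert(op < 64 && a < 8 && b < 8 && c < 8);  /* 6-bit opcode, 3-bit registers */
    return (uint16_t)((op << 9) | (a << 6) | (b << 3) | c);  /* bit 15 stays 0 */
}

/* Unpack the opcode and the three register fields again. */
static void decode_form1_3op(uint16_t insn, unsigned *op,
                             unsigned *a, unsigned *b, unsigned *c)
{
    *op = (insn >> 9) & 0x3f;
    *a  = (insn >> 6) & 0x7;
    *b  = (insn >> 3) & 0x7;
    *c  = insn & 0x7;
}
```

For example, opcode 5 with registers 1, 2 and 3 encodes as 0x0A53.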
Those three-bit operand fields only let us address 8 registers. But if we only have to encode two operands in a 16-bit instruction, we can do something like this:
0oooooooaaaabbbb
Using 4 bits per operand instead of 3 doubles the number of registers we can address, from 8 to 16!
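Here is the same kind of sketch for the two-operand layout: bit 15 is 0, bits 14..8 hold a 7-bit opcode, and bits 7..4 and 3..0 hold operands A and B. As a side effect of dropping the third field, the opcode gains a bit (128 values instead of 64). Again, the function names and the opcode value are illustrative only.

```c
#include <assert.h>
#include <stdint.h>

/* Pack a two-operand instruction: 0oooooooaaaabbbb */
static uint16_t encode_form1_2op(unsigned op, unsigned a, unsigned b)
{
    assert(op < 128 && a < 16 && b < 16);  /* 7-bit opcode, 4-bit registers */
    return (uint16_t)((op << 8) | (a << 4) | b);  /* bit 15 stays 0 */
}

/* Unpack the opcode and the two register fields again. */
static void decode_form1_2op(uint16_t insn, unsigned *op, unsigned *a, unsigned *b)
{
    *op = (insn >> 8) & 0x7f;
    *a  = (insn >> 4) & 0xf;
    *b  = insn & 0xf;
}
```

With 4-bit fields, a register number of 15 (`$r15`) now fits: opcode 5 with operands 15 and 9 encodes as 0x05F9.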
The downside is that we may end up with larger code in some cases. For instance,
add.l $r1, $r2, $r3
must now be written as:
mov.l $r1, $r2
add.l $r1, $r3
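The extra `mov.l` is only needed when the destination differs from both sources. The following hypothetical helper sketches that decision. It is not code from the ggx gcc port, which would express this through its own machine description. If the destination already matches the first source, one `add.l` suffices. If it matches the second source, addition commutes, so the sources can be swapped. Only otherwise do we copy and then add.

```c
#include <assert.h>
#include <stdio.h>

/* Emit two-operand assembly for the three-operand "dst = src1 + src2".
   Returns the number of instructions emitted. */
static int lower_add3(FILE *out, unsigned dst, unsigned src1, unsigned src2)
{
    assert(dst < 16 && src1 < 16 && src2 < 16);
    if (dst == src1) {                      /* already two-operand shaped */
        fprintf(out, "add.l $r%u, $r%u\n", dst, src2);
        return 1;
    }
    if (dst == src2) {                      /* addition commutes: swap sources */
        fprintf(out, "add.l $r%u, $r%u\n", dst, src1);
        return 1;
    }
    /* dst differs from both sources, so copying src1 into it cannot clobber src2 */
    fprintf(out, "mov.l $r%u, $r%u\n", dst, src1);
    fprintf(out, "add.l $r%u, $r%u\n", dst, src2);
    return 2;
}
```

Only the last case costs an extra instruction, and it is the case shown above.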
Happily, however, giving the compiler 8 more registers actually results in significantly smaller code. Here are some relative code sizes from sample MiBench benchmarks:
The smaller code must be the result of spilling fewer registers to the stack. A hardware implementation would also benefit from the resulting reduction in memory accesses.
For the most part, we also see a corresponding performance improvement. Each of these benchmarks executes tens to hundreds of millions of instructions. Here are the relative instruction counts for the 2-operand vs. 3-operand ISA:
That automotive_bitcount benchmark will need some investigation, but I feel good enough about this change that I've committed it to the ggxdev repository:
http://mercurial.intuxication.org/hg/ggxdev/rev/7dd29acfd29a
This patch also changes how the simulator dumps instruction traces. It turns out that dumping everything to a .csv file is smart, because OpenOffice.org Calc makes a great instruction trace viewer.
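The post doesn't describe the trace format itself. The following is a hypothetical sketch of the idea: one CSV row per executed instruction. The columns (`pc`, `insn`, then `r0` through `r15`) and the function names are assumptions made for illustration, not the simulator's actual output.

```c
#include <assert.h>
#include <inttypes.h>
#include <stdint.h>
#include <stdio.h>

#define NREGS 16  /* two-operand ggx: 16 addressable registers */

/* Column header: pc, raw instruction word, then one column per register.
   Returns 0 on success, -1 on a write error. */
static int trace_header(FILE *out)
{
    assert(out != NULL);
    if (fprintf(out, "pc,insn") < 0)
        return -1;
    for (int i = 0; i < NREGS; i++)
        if (fprintf(out, ",r%d", i) < 0)
            return -1;
    return fputc('\n', out) == EOF ? -1 : 0;
}

/* One row per executed instruction, written as fixed-width hex.
   Returns 0 on success, -1 on a write error. */
static int trace_row(FILE *out, uint32_t pc, uint16_t insn,
                     const uint32_t regs[NREGS])
{
    assert(out != NULL && regs != NULL);
    if (fprintf(out, "0x%08" PRIx32 ",0x%04x", pc, (unsigned)insn) < 0)
        return -1;
    for (int i = 0; i < NREGS; i++)
        if (fprintf(out, ",0x%08" PRIx32, regs[i]) < 0)
            return -1;
    return fputc('\n', out) == EOF ? -1 : 0;
}
```

Because every row has the same columns, a spreadsheet can open the file directly, then filter on `pc` or watch a single register column change over time.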
Sat, 13.12.2008 09:02
Yes, I know about llvm. I wish somebody would rewrite the qemu dynamic compiler to use it!