ggx: three's a crowd. Switching to two-operand instructions.

Thursday, August 21. 2008

A few months ago, when I first started looking at exception handling support for ggx, I discovered that it would be really easy if only I had a few spare registers. ggx only has 8 registers, and gcc likes to claim some for itself now and then. EH support is possible with only 8, but far too tricky for my taste.

About that same time I noticed that gcc was emitting a lot of code that looked like:

add.l $r1, $r1, $r2

This means "add the contents of $r1 to $r2 and store them in $r1". We don't actually need three operands for these instructions. Two would do just fine as long as we always put the result value in one of the operand registers.

If you recall, we currently encode three-operand instructions into 16 bits like so:

0ooooooaaabbbccc

oooooo - FORM 1 opcode number
aaa - operand A
bbb - operand B
ccc - operand C

Those three-bit operand fields only let us address 8 registers. But if we only have to encode two operands in a 16-bit instruction, we can do something like this:

0oooooooaaaabbbb

Using 4 bits to represent the operands instead of 3 effectively doubles the number of registers we can address to 16!

The down side is that it's possible that we'll end up with larger code in some cases. For instance,

add.l $r1, $r2, $r3

must now be...

mov.l $r1, $r2
add.l $r1, $r3

Happily, however, giving the compiler 8 more registers actually results in significantly smaller code. Here are some relative code sizes from sample MiBench benchmarks:



The smaller code must be the result of spilling fewer registers to the stack. A hardware implementation of this would also win from fewer memory accesses.

We also see a corresponding performance improvement for the most part. Each of these benchmarks runs from 10s to 100s of millions of instructions. Here are the relative instruction counts for the 2-operand vs 3-operand ISA:



That automotive_bitcount benchmark will need some investigation, but I feel good enough about this change that I've committed it to the ggxdev repository: http://mercurial.intuxication.org/hg/ggxdev/rev/7dd29acfd29a

This patch also includes a change to how the simulator dumps instruction traces. It turns out that dumping everything to a .csv is smart because OpenOffice.org's calc program makes a great instruction trace viewer.

Mauve's second life?

Thursday, June 19. 2008

Yes, it's fantastic that IcedTea passes that Java TCK. Congratulations! But, personally, I'm more jazzed that Sun will be incorporating Mauve testing into their OpenJDK development process. Mauve is only about a quarter the size of the TCK but, to my knowledge, it is a giant among stand-alone FOSS test suite efforts, with 50+ contributors and ~250k lines of test code.

The Mauve site has a comically dated FAQ (my fault). And, like many an FAQ, it doesn't really list Frequently Asked Questions so much as it lists questions we were hoping people would ask frequently.

It took 10 years, but David Herron finally popped the big question (#8): "I'm Sun. Can I use the Mauve Project's testsuite?". Yes, David. You can!

Fedora 9 and jack audio

Monday, May 12. 2008

The audio production apps in F9 appear to be working well with jack audio. Just be sure to add yourself to the jackuser group via System->Administration->Users and Groups. This will give you the permissions you need to run the jack audio server in Real Time mode (the default setting in qjackctl).

There's still no audio production group in comps.xml, so you'll have to do something like the following to install a nice collection of audio apps:

$ yum install vkeybd hydrogen qjackctl ardour rosegarden4 \*dssi\* zynadd\* phasex libfreebob ladspa\* swh\* seq24 csound sooperlooper

Remember, you don't need a MIDI keyboard to have fun. Just fire up vkeybd and use the qjackctl Connections windows to hook it up to synths in the ALSA tab.



Also make sure that your soft synth audio is routed to the system audio ports.



Have fun! And don't forget about the fedora-music mailing list.

ggx: mailing list and mercurial host

Monday, May 5. 2008

There's a new google group for ggx discussion: http://groups.google.com/group/ggxdev

Also, intuxication.org is hosting a mercurial repository with all of the tools sources. Details are available here.

I'm pretty impressed with how easy it was to create and use the mercurial repo at intuxication.org. Nice job guys!

ggx: MiBench and command line args

Tuesday, April 15. 2008

Regarding yesterday's question, I decided to enhance the simulator by passing simulated open/close/read/write software interrupts on through to the host. This lets me run MiBench benchmarks on my ggx simulator even though these programs read data files (remember -- there's no simulated OS running yet - just code on simulated bare metal).

MiBench programs also require command line arguments. I handled this by marshalling argc and argv from the host into the simulated memory at startup (sim_create_inferior()). I simply drop argc/argv at the simulated address 0x0. This is guaranteed not to be lost, as the ggx linker scripts currently have everything starting at 0x1000, so the first 0x1000 bytes are normally empty. Then I load $r0 and $r1 with argc and argv before calling main() in libgloss' crt0.S. Easy stuff! I poked around at some of the other sims in the src tree and didn't see any that did this trick. I wonder why? It's so handy...

And finally, I should mentioned that I've actually been using the MiBench fork from the MiDataSets project, not the one from the main MiBench page. It's much easier to execute build/test runs with a cross compiler and simulator on this forked version. I've tried out some of the benchmarks, and so far so good. Even a simple automotive benchmark executes well over a billion simulated instructions! Pretty soon I'll establish a baseline with which to evaluate my ISA tweaks.

I'll post tonight's patches on the ggx patch page tomorrow. Must sleep now.

ggx: trampolines, benchmarks, and rebasing patches

Monday, April 14. 2008

Somewhere along the line I botched one of my gcc patches and people have been complaining that they don't all apply nicely. Well today I'm posting two new big patches for src and gcc. These patches apply against the latest src and gcc development sources and contain all of the work to date.

The patches also include stdarg and trampoline support, my lazy gdb port, and a few simulator fixes (like sta.b should actually store a byte, not a word!). This brings the gcc testsuite's unexpected error count down to 315. Looking good!

A quick scan of the remaining test failures tells me that something is amiss with passing and returning aggregate types.

I've also been thinking about how to measure results of ISA changes. In a perfect world, EEMBC benchmarks would be freely available for all to use. EEMBC provides a terrific collection of benchmarks that are representative of different industries, and are all designed to build and run on bare metal embedded targets.

There's a Free work-alike to EEMBC called MiBench. The problem with the MiBench benchmarks is that they aren't designed to run on bare metal systems. For instance, they use file I/O to read and write benchmark data. So right now I think I have three options:

  • Modify MiBench to work more like EEMBC and run on bare metal.

  • Enhance my libgloss stubs to fake file I/O by using the host filesystem.

  • Port a simple OS like eCos to ggx, and then run that on qemu (using src/sim as the simulation core).

I don't know which way to go yet. Suggestions?

Today's patches are in the ggx patch archive.

ggx: stdarg, more ABI changes and a gdb surprise

Sunday, April 13. 2008

US Airways' mechanical problems forced me to stay by SFO one extra night. All that airport wait time gave me a chance to finally tackle stdarg support head on. And for this, I decided to change the ggx ABI one more time.

If you recall, the ggx ABI has the first two 32-bits of function arguments passed in registers $r1 and $r2 $r0 and $r1, and the rest follow on the stack. After much fiddling around, I decided to always reserve stack space for $r1 and $r2 $r0 and $r1 in the caller. For non-stdarg functions, this is just two wasted stack slots per frame. For functions like printf, the callee's function prologue copies registers $r1 and $r2 $r0 and $r1 into those slots and proceeds normally.

This makes the stdarg and non-stdarg ABIs fully compatible. Some architectures have different calling conventions for stdarg functions. It's possible that they may be slightly more space efficient, but if developers ever call printf() without first including stdio.h they'll be in for a surprise. Yes, modern C standards require that you include stdio.h, but it's still a nice property.

$ cat hello.c
#include <stdio.h>

int main()
{
  printf ("Hello %s %s %s! %g\n", "Anthony", "Thomas", "Green", (float) 3.14);
  return 0;
}
$ ggx-elf-gcc -g -o hello hello.c
$ ggx-elf-run hello
Hello Anthony Thomas Green! 3.14


On a different front, a while ago I complained that porting gdb required lots of bit parsing to do things like recognize function prologues. I recently visited Tom and Elyn in Boulder (I love that place!), and Tom pointed out that gdb uses many of those routines as fallbacks only if debug info isn't present. So, I copied the smallest gdb backend I could find (the Vitesse iq2000 port), and did three things: (1) s/iq2000/ggx/ in gdb , (2) describe the ggx registers to gdb, and (3) added a software breakpoint instruction (brk) to the ISA. It worked!
$ ggx-elf-gdb hello
GNU gdb 6.8.50.20080402-cvs
Copyright (C) 2008 Free Software Foundation, Inc.
License GPLv3+: GNU GPL version 3 or later <http://gnu.org/licenses/gpl.html>
This is free software: you are free to change and redistribute it.
There is NO WARRANTY, to the extent permitted by law.  Type "show copying"
and "show warranty" for details.
This GDB was configured as "--host=i686-pc-linux-gnu --target=ggx-elf"...
(gdb) target sim
Connected to the simulator.
(gdb) load
(gdb) b main
Breakpoint 1 at 0x1094: file hello.c, line 5.
(gdb) run
Starting program: /tmp/hello

Breakpoint 1, main () at hello.c:5
5         printf ("Hello %s %s %s! %g\n", "Anthony", "Thomas", "Green", (float) 3.14);
(gdb) x/10i main
0x108c <main>:  ldi.l     $r5, 0xffffffe8
0x1092 <main+6>:        add.l   $sp, $sp, $r5
0x1094 <main+8>:        ldi.l   $r0, 0x8
0x109a <main+14>:       add.l  $r1, $sp, $r0
0x109c <main+16>:       ldi.l  $r0, 0x13ee0
0x10a2 <main+22>:       st.l   ($r1), $r0
0x10a4 <main+24>:       ldi.l  $r0, 0xc
0x10aa <main+30>:       add.l  $r1, $sp, $r0
0x10ac <main+32>:       ldi.l  $r0, 0x13ee8
0x10b2 <main+38>:       st.l   ($r1), $r0
Many parts are broken, but you can still step through instructions, and query memory/registers. This is a huge improvement over reading simulator traces, and I should have done it at the very beginning. Live and learn...

Patches forthcoming.

ggx: testsuite results, an ABI change and next steps

Friday, April 11. 2008

After a short break I've been looking at the ggx-elf-gcc testsuite results. A few obvious fixes later and this is where we stand:

                === gcc Summary ===

# of expected passes            44359
# of unexpected failures          767
# of unexpected successes           1
# of expected failures             80
# of unresolved testcases         134
# of untested testcases            35
# of unsupported tests            535
/home/green/ggx/bg/gcc/xgcc  version 4.4.0 20080410 (experimental) (GCC) 

Note that I also switched to GCC HEAD instead of the stable release branch. This has also caused me some grief as it's often difficult to know who is to blame for certain errors.

Adding proper stdarg support will take a big bite out those unexpected failures. Adding trampoline support (a GCC extension) will also knock that number down.

Speaking of GCC extensions, I recently modified the ABI to support GCC's nested functions extension. A function declared within another function is supposed to be able to access the local variables of the enclosing function. In order to support this, the compiler generates code to pass a hidden parameter, called the static chain pointer, to the nested function. The enclosing function's locals are indexed off of this static chain pointer. Many GCC ports reserve a register in which to pass the static chain pointer, but ggx only has eight registers, so I opted to pass it on the stack. The jsr and jsra instructions now reserve three stack slots: the return address, the caller's frame pointer and the static chain pointer.

Once I get trampolines and stdarg functions working, I'll post the entire patch set against GCC HEAD on the patch archive.

And right now I'm thinking that my first ISA optimization will be to make branches PC relative. Most branches only jump a few words away, but right now I'm using 4-byte absolute addresses for the branch targets. This is a huge waste.


F9 beta. TelePresence

Saturday, March 29. 2008

My laptop drive died the other day, so I used the opportunity to load up the F9 beta using an encrypted file system, and so far so good (apart from the compiz disappearing window decorations bug).

I also had my first Cisco TelePresence-based meeting the other day, and... wow! This is the same system you may have seen from this clip from last season's 24.

We were at three different sites, but it gave a convincing illusion that we were sitting at the same table. Hats off to Cisco for pulling it off. It was amazing.

The Crusher

Wednesday, March 26. 2008

My little ggx project is fun, but it's not the first time I've messed around with tools for a virtual machine architecture. Years ago I worked on a project with Andrew Haley and Jesper Skov called "The Crusher".

The driving assumption behind the crusher project was that memory was expensive, and anything that could be done to reduce code size for embedded systems designers would be a good thing.

Our goal was to produce a toolchain capable of generating directly executable compressed versions of programs. And by directly executable, I mean that program execution alone does not consume RAM (unlike UPX-like program packers).

Given this assignment, Andrew devised the clever strategy, partially inspired by an IBM research paper[1]. The approach was to compile programs down to an abstract virtual machine instruction set, for which we would generate a compact application specific instruction encoding using a special assembler that, as a by-product of the assembly process, produced an application specific interpreter for those instructions![2]

Instructions had variable bit widths. The opcode values were generated through Huffman coding the static usage of instructions within your application. The most frequently used instruction might only be a single bit long, but infrequently used instructions could be very long. Operand references were also Huffman coded, but on a function-by-function basis. If instructions within a given function only referenced four operands, the operand encoding for instructions within that function would only be two bits long. The toolchain would automatically link your program to the interpreter and you'd end up with a normal executable, just smaller.

We also had the ability to annotate individual functions as "native", meaning that they would be compiled down to native machine instructions and not compressed. If you assume that 90% of your program's execution time will be spent within 10% of your application code, then just leave that 10% as native and "crush" the rest. In theory your program should run only slightly slower, and still benefit from the code compression process.[3]

Our first prototype compiler was built with the SUIF toolkit. We eventually implemented a second version using GCC.

Our most impressive demo was a mostly-playable build of DOOM that was about one third the size of the original native program on MIPS.

It should have been eventually possible to siliconize the crusher interpreter, but we never got that far. Memory just got a lot cheaper. It also turns out that you can do pretty well by sticking a general purpose software decompressor in the page fault handler of your virtual memory system. This is what some commercial embedded systems were doing at the time. Or, even more exotically, by sticking a hardware decompressor between main memory and your processor's instruction cache. I believe that IBM produced a PowerPC variant with this feature.

Incidentally, this is where libffi was developed in its current form. We needed a way to call aribitrary native functions from the crusher interpreter. GCC's "ADDRESSOF" middle-end feature and related optimizations (as re-implemented by Jason Merrill) were also a by-product of this project, so it was not all for naught.

[1] Reducing execution parameter through correspondence in computer architecture,
Scott P. Wakefield, Michael J. Flynn,
IBM Journal of Research and Development,
Volume 31 , Issue 4 (July 1987)

[2] Our object files only really contained assembly source, and the real assembler was invoked by a linker driver so we could do whole-program static analysis without changing the user's workflow.

[3] It turns out that the 90/10 rule is a lie for many programs. There's no shortage of large programs that spend 90% of their time in 10% of their code base on a given run, but the 10% that mattered would change on different runs, depending on your input data.

ggx: the gcc testsuite and cooperative multitasking

Tuesday, March 25. 2008

There's good reason to implement support for the "write" system call right away. It's the bare minimum required to run any of the "c-torture/execute" tests. This is the group of tests in GCC's extensive testsuite which require execution on the target platform.

I let the gcc execution tests run last night, and this is what I got:


                === gcc Summary ===

# of expected passes            11945
# of unexpected failures        910
# of unresolved testcases       129
 
Not bad for a first run! Lots of failing tests were timing out. I think there's a bug in how we compile long long and long double support which is creating an infinite loop in libgcc. Or it could be a bug in the simulator. One never knows...

I did find a couple of simple sim bugs, which are fixed in today's patch.

I also implemented setjmp and longjmp support in newlib, and then tested with Cheap Threads.

Cheap Threads is a simple C library providing cooperative multitasking using setjmp/longjmp.

This is first code I've downloaded from the net which built and ran without a hitch:
$ ggx-elf-gcc -DCT_RETURN -c *.c
$ ggx-elf-gcc -o demo1 demo1.o cterror.o  ctsched.o ctassert.o ctalloc.o  ctmemory.o ctsubscr.o
$ ggx-elf-run -v demo1
ggx-elf-run demo1

Cheap threads demo #1
Hello from thread A
This is coming from thread B
Hello from thread A
This is coming from thread B
Hello from thread A
This is coming from thread B
Hello from thread A
This is coming from thread B
Hello from thread A
This is coming from thread B
This is coming from thread B
This is coming from thread B
This is coming from thread B

Maximum memory pools registered: 0
Total memory allocations: 2
Maximum outstanding memory allocations: 2


# instructions executed       44631
44k instructions! This is most we've run so far...

The patch is in the patch archive

ggx: Hello World!

Monday, March 24. 2008

[Patch 19 in a series]

This is it. Today we're building and running a "Hello World" C app on the ggx simulator.

Yesterday we identified nine missing instructions required to link hello.c to newlib. They were all arithmetic and logical operators and were simple to implement in all of the tools.

Today's patch also includes a software interrupt instruction (swi). This is the instruction that ggx code will use to talk to the simulator in order to interface with the outside world. Consider, for instance, the primitive function "write"

int write (int fd, const void *buf, size_t len);
We implement a system call in libgloss like so:

/*
 * Input:
 * $r0      -- File descriptor.
 * $r1      -- String to be printed.
 * -8($fp)  -- Length of the string.
 *
 * Output:
 * $r0    -- Length written or -1.
 * errno  -- Set if an error
 */


write:
        swi     SYS_write /* SYS_write is a constant 5 */
        ret

Then the simulator's handling of the swi instruction includes a switch on the interrupt number:

  case 0x5: /* SYS_write */
    {
        char *str = &memory[cpu.asregs.regs[3]];
        /* String length is at 0x8($fp) */
        unsigned count, len = EXTRACT_WORD(&memory[cpu.asregs.regs[0] + 8]);
        count = len;
        while (count-- > 0)
            putchar (*(str++));
        cpu.asregs.regs[2] = len;
     }

It's not a perfect implementation (all output goes to stdout, and there's no error checking), but it works for now.

Today's patch and tests really put the toolchain through its paces, as we'll be simulating thousands of instructions just for hello.c! Our simulator runs to date have been limited to a dozen or so instructions, so this is a big jump...
$ cat hello.c
#include <stdio.h>

int main()
{
  puts ("Hello World!");
  return 0;
}
$ ggx-elf-gcc -o hello hello.c
$ ggx-elf-run hello
Hello World
$ ggx-elf-run -v hello 
ggx-elf-run hello
Hello World!


# instructions executed        2704

2704 instructions is a lot of code for just Hello World. Consider, however, that this includes all of the system initialization code, such as initializing the heap, allocating IO buffers with malloc(), etc. There's a lot that has to happen before we get to printing our greeting.

To be honest, I skipped a step in there. It's the one where running "hello" fails in many interesting ways and you have to debug simulator traces. There's no avoiding this step. Just think of it as a rite of passage.

In my case, I tracked it down to bad relocation generation in the assembler and was able to fix it thanks again to #gcc IRC folks. Today's patch includes this fix.

When I started this series I wrote that I would show how go from nothing to running real programs on a new architecture by posting daily patches. I think we're there, so I'm going to slow down the blogging effort. But that's not to say that I'm done with ggx. There's still plenty to do. Here are some ideas for work items:

gcc testsuite


GCC comes with a huge suite of tests to exercise code generation. The next obvious step in the development of this toolchain is to start whacking away at bugs identified by the testsuite. I've had a quick crack at it, and a couple of obvious things need fixing. For instance, there's an off-by-one error in ABI handling for vararg functions. There's also no support for trampolines, which are needed for GCC's nested functions extension. Apart from that, the results look pretty good so far.

broader language support


Exception handling is big issue here. The compiler needs to be taught how to deal with C++ and Java exceptions. libffi is also needed for Java.

gdb support


This is a tricky one for ggx. If we were working from a pre-defined ISA, then I would go straight to gdb next. Stepping through code with gdb is much more productive than reading instruction traces from the simulator. But we're not working from a pre-defined ISA with fixed instruction encodings. We're just at a first draft of the ISA and plan to make many changes.

Unfortunately, it looks like a lot of what goes into making gdb work is hard coding recognition of instruction sequences for things like like function prologues. I don't want to have to hack gdb every time I tweak the ISA, so I'll leave this 'til much later in the game.

ISA tweaking


This is where the game really begins. Check out this chart, which shows the static frequency of instruction types used in our hello application:



The most frequently used instruction type is GGX_F1_A4, which is a 16-bit instruction with one operand followed by a 32-bit value. These are all of the "load immediate" and "load absolute" instructions. What we'll want to do is understand everything about these instructions: how and why they're used, and if there's anything we can do to eliminate them or encode them more efficiently. We already know there's 6-bits of waste in the first 16-bits of GGX_F1_A4 because we're not using operands B and C. Perhaps we can use those 6-bits to hold a small constant value. That would turn some number of those 48-bit load-immediate instructions into 16-bit instructions. There will be many opportunities for improvements like this, and it should be interesting to see how dense we can make our code.

Run Linux!


Only half joking here. Truth be told, it builds, but we're a long way from booting...



It's interesting to see that the mix of instructions is completely different in the kernel. The 4th and 5th most frequently used instruction types are swapped. This demonstrates how it will be important to have a good mix of programs at hand while we're tweaking the ISA.

I hope some people have found this series interesting. I'm interested in hearing feedback if you are so inspired.

As usual, today's patch is available in the ggx patch archive.

ggx: libgloss, newlib and hello.c

Sunday, March 23. 2008

[Patch 18 in a series]

We're almost at the next major milestone: building and running a Hello World program in C. But first, a quick recap...

The ggx core is a simulated 32-bit load-store harvard von Neumann architecture processor. It currently has 8 32-bit general purpose registers and 37 instructions (a mix of 16- and 48-bits long).

We're 17 patches into our experiment and as of yesterday we can configure, build and install gas, ld, binutils, a simulator and the GCC C compiler and runtime support library. There's no doubt these things still all need work, but today we'll turn our attention towards newlib and libgloss.

Newlib is a standard C library, much like glibc on Linux. It was originally cobbled together by folks at Cygnus in order to offer a liberally licensed C library to their embedded tools customers. It's still maintained by Jeff Johnston (now at Red Hat) and is widely used by embedded developers and the Cygwin community (newlib provides the standard C library in cygwin.dll).

libgloss is a little more difficult to describe. Rob Savoye (dejagnu, gnash) once told me it was called libgloss because you use it to "gloss" over details when don't have an OS on your embedded system. It sits between newlib and your hardware or simulator, and exports a system call interface to newlib so C programs can interact with it.

Let's say, for instance, that you have a MIPS-based single board computer with a serial port for I/O. You would need to port libgloss to that specific board so that reading and writing through newlib routines (printf, etc) would send data up and down the board's serial port. It's an important element of what you might refer to as a Board Support Package (BSP). In fact, the libgloss sources build a library called "libbsp.a", not "libgloss.a". libgloss provides a few more things (a gdb remote protocol stub for instance), but we're not going to worry about these things for now.

For ggx, we'll need to implement some kind of trap instruction so libgloss can talk to the simulator. Then we'll implement systems calls for things like "write". In the case of "write" we'll send output to the console on the machine running the sim. But let's worry about that tomorrow. I've just stuck "bad" instructions in libgloss' syscall stubs for now. Today's focus is to generate fully built libraries for libgloss and newlib.

The changes are all pretty obvious, and there's lots of sample code in other ports to copy. That being said, the patches are a little on the big side, mostly because of the new autotool generated configury.
As usual, I've placed them in the ggx patch archive.

If you apply the src patches for newlib/libgloss, then configure and build with ggx-elf-gcc, you'll find that you have two new libraries: libbsp.a (from libgloss) and libc.a (from newlib). Install them now, and let's try a hello world:

#include <stdio.h>

int main()
{
  puts ("Hello World!");
  return 0;
}
 
If you try compiling this like so: ggx-elf-gcc -o hello hello.c, you'll be presented with pages and pages of linker errors that look like this:

strlen.c:(.text+0x1a): undefined reference to `__andsi3'
strlen.c:(.text+0x56): undefined reference to `__ashlsi3'
strlen.c:(.text+0x5e): undefined reference to `__ashrsi3'
strlen.c:(.text+0x8e): undefined reference to `__ashlsi3'
strlen.c:(.text+0xb2): undefined reference to `__subsi3'
strlen.c:(.text+0xdc): undefined reference to `__one_cmplsi2'
strlen.c:(.text+0x116): undefined reference to `__one_cmplsi2'

What are these, you ask? They're all references to functions that GCC is trying to use in place of missing instructions.

GCC uses a clever trick during code generation. Let's say it you have some C code that looks like this: "a = b & c", where a, b and c are all ints. GCC needs to generate code to "and" b and c together. If it can't figure out how to do this, it will simply emit a call to a function with a special name (__andsi3 in this case) and hope that the user will provide a library implementation at link time. Try poking around in the linux kernel sources and you'll see that a few ports provide their own versions of "missing" instructions.

In our case, we have yet to implement any of these instructions. They represent (in order), logical and, arithmetic shift left, arithmetic shift right, subtraction, and bitwise "not". These are just the missing instructions required by newlib's strlen(). In total, there are about 9 instructions missing in order to complete the Hello World link. Tomorrow we'll add all 9 instructions to the ISA and link our first full application. But will it run? Tune in tomorrow...

Today's patches are available in the ggx patch archive.

ggx: bytes, shorts, some cleanup and gold

Saturday, March 22. 2008

[Patch 17 in a series]

We hit an important milestone today!...

Yesterday we tried building libgcc after adding compare and branch instructions. The compiler built a few files but eventually failed because it didn't know how to load and store values smaller than 4-bytes. Today we'll introduce patches for dealing with these 16-bit (.s) and 8-bit (.b) values.

The new instructions are directly analogous to their ".l" counterparts: ldi.b, ld.b, lda.b, st.b, sta.b, ldi.s, ld.s, lda.s, st.s, and sta.s.

Once we add these instructions almost all of libgcc builds. The compiler just asks for one more instruction before it'll finish the job: an unconditional jump to an address stored in a register (jmp). Ok, done!

Finally, all 143 object files build to create libgcc.a! [sound of champagne bottle popping open]

Despite hitting this milestone, today's patches are a little boring, so I've included a second patch to GCC.

One of the problems with coding by example is that you have to know which examples to pick in order to be successful. Today's second GCC patch fixes a poor choice I made in yesterday's compare and branch patch.

I had implemented compare/branch using "cc0", a magic condition code register that only the compiler knows about. GCC has plenty of special case logic to deal with this cc0 register, and the GCC hackers hate it. The preferred way to deal with this is to create a normal register to hold condition codes (ggx's is called "?cc"). It still only exists in the mind of the compiler (we'll never generate code referencing it), but I'm told it greatly simplifies things for the compiler developers. Does it change how the compiler optimizes? My understanding is that the underlying logic is more sane, but the output should be (almost?) identical. (comments welcome if I have this wrong) Despite the strong desire to rid GCC of cc0 ports, I think most a few of them still use cc0. Apparently, de-cc0'ing a mature port is a lot of work (another good reason to get this right from day one!), so I don't know if cc0 will be officially deprecated any time soon.

The second change in this patch replaces my 10 branch patterns in the machine description with a single pattern that uses an iterator to produce the 10 different patterns. It doesn't change the compiler's behaviour, but it does let us rip out a bunch of repetitive code.

(Thanks to Hans-Peter and Paolo for their feedback and advice on this last patch)

Today's src and gcc patches are available in the patch archive here: http://spindazzle.org/ggx

BTW - It's worth noting that Ian Taylor's new linker, gold, hit the src repository last night. You'll want to try it out if slow linking is what ails you (x86 and x86-64 only for now).

Tomorrow we'll tackle libgloss and newlib...

ggx: if... then... else...

Friday, March 21. 2008

[Patch 16 in a series]

Architectures with no if-then-else capability are pretty boring, so today we're adding compare and branch instructions.

In addition to the new instructions, we'll be adding a condition code register. This register holds a series of bits that will be set by our compare instruction. The bits currently are:


  • 00001: Greater Than, Signed
  • 00010: Less Than, Signed
  • 00100: Equal
  • 01000: Greater Than, Unsigned
  • 10000: Less Than, Unsigned
We could also probably benefit from a "Zero" bit, and we may want to squeeze arithmetic overlow in there at some point as well, but this collection of 5 bits is certainly good enough for now.

We're not currently exposing this register to the programmer. It's just a bit of internal global state that's implemented in the simulator. This may also change at some point.

GCC supports 10 conditional branch patterns: beq, bne, blt, bgt, bltu, bgtu, bge, ble, bgeu, bleu. They all do the obvious things. For instance, bleu is Branch if Less or Equal (Unsigned). I've implemented all 10 as instructions even though this isn't really necessary. For instance, bgt is essentially ble after you swap the operands. I didn't want to get too clever at this point, and I still have plenty of opcode space. Let's just keep this in our back pocket if we start feeling opcode space pressure.

You'll also note that I encoded all of these branch instructions as GGX_F1_AB4 GGX_F1_4. This means they have two no operands and are followed by a 4-byte value (the target address). This is pretty wasteful. We'll likely end up with a shorter PC-relative branch instruction, but I want to actually measure things before I added this complexity (for instance, how many bits are required to represent X% of all PC-relative branches?).

Here's some sample source and output:
int lessthan (int x, int y)
{
  return (x < y);
}

lessthan:
        ldi.l  $r5, 0
        cmp    $r1, $r0
        bge    .L2
        ldi.l  $r5, 1
.L2:
        mov    $r0, $r5
        ret
This patch also adds jsr and jmp instructions. They both use register operands containing the target address. Both were pretty straight forward to implement and required to get to this next step...

Running "make -k" in the libgcc directory results in 116 object files!

There should be 143, so we're very close. Of the ones that are failing, they all die like so:
../../../gcc/libgcc/../gcc/unwind-c.c: In function ‘__gcc_personality_sj0’:
../../../gcc/libgcc/../gcc/unwind-c.c:59: internal compiler error: in emit_move_multi_word, at expr.c:3247

This error is indicative of missing "mov" patterns for smaller data sizes (short words and bytes). This is what we'll add tomorrow. Once we've built libgcc, we'll move on the newlib and libgloss. We're very close to running a real "Hello World" app!

Today's src and gcc patches are available in the patch archive here: http://spindazzle.org/ggx