lcc for Magic-1
To build your own, simply grab the lcc 4.2 distribution here (and buy the book if you're interested in compilers), edit bind.c to add an m1 target as follows:
and add my machine description file to the src directory (renamed to "m1.md", and with the appropriate makefile additions). I made some local modifications to the IR; if you don't have my other changes (done for D16/M), delete or comment out the "/* byte width */" line in the IR declaration. Then build lcc and invoke the m1 backend as follows:
lcc -c -S -target=m1 testfile.c
The assembly output will show up in testfile.asm (though I'm going to change that from x.asm to x.s soon).
Following are some random thoughts, problems and solutions encountered during the retargeting. Thanks go to Richard Man for his suggestions about better handling 8-bit operations.
7/4/2007 - Fixed what I believe to be a bug in the handling of ANSI integral promotion. ANSI C has some odd rules about promoting unsigned integer types that are somewhat different from pre-ANSI practice: when an unsigned short is widened to an int, it is changed from unsigned to signed. lcc does this - but when sizeof(short) is the same as sizeof(int), no actual widening occurs and the change of sign should not happen. lcc made the change anyway, which broke some code in Minix (and pretty much rendered unsigned short useless). My fix was to add an if clause to binary() that allows an unsigned short to remain unsigned iff short and int widths are the same.
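The effect of the rule is easy to see in a couple of lines of C. The helper names below are hypothetical; they demonstrate behavior on a host where int is wider than short - which is exactly the behavior lcc was wrongly imposing on Magic-1, where the widths match and unsigned short should act like unsigned int:

```c
/* Hypothetical helpers illustrating ANSI value-preserving promotion on a
   host where int is wider than short. */

/* The unsigned short promotes to *signed* int, so 0 - 1 really is -1. */
static int promoted_goes_negative(unsigned short us) {
    return us - 1 < 0;
}

/* An unsigned int never converts to signed: 0 - 1 wraps to UINT_MAX. */
static int unsigned_wraps(unsigned int ui) {
    return ui - 1 > 0;
}
```

On Magic-1 both expressions should wrap, which is what the binary() fix preserves.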
I'm going with a 16-bit int as my basic inttype. Char is 1, short is 2, long is 4 and long long is also 4. On the floating point side, I've chosen to make all of them 4 bytes. I don't expect to make much use of longs & floats, so I'm not all that concerned about efficiency. In lcc, I emit pseudo-ops for all 4-byte data types, which via assembler macros will be converted into combinations of inline code and "millicode" calls. The latter term comes from my HP days, and refers to runtime support subroutine calls that follow a lightweight calling convention. In my case, the convention is no frame built, incoming parameters in a & b as either int data (for conversions) or pointers to the operands in memory. Because I've only got two useful registers here, operations such as add, sub, etc will treat operand 1 as both an input and the result.
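As a sketch of what one of these millicode routines has to do - illustrative C, not the actual runtime code - here is a 32-bit add expressed in terms of 16-bit words, with operand 1 doubling as the result, matching the convention above:

```c
#include <stdint.h>

/* Illustrative millicode sketch: a 32-bit add built from 16-bit halves.
   The operands arrive as pointers to memory, and operand 1 is both an
   input and the result.  Words are stored big-endian: [0] = high word,
   [1] = low word. */
static void milli_add32(uint16_t *op1, const uint16_t *op2) {
    uint32_t lo = (uint32_t)op1[1] + op2[1];            /* add low words */
    op1[1] = (uint16_t)lo;
    op1[0] = (uint16_t)(op1[0] + op2[0] + (lo >> 16));  /* high + carry */
}
```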
As far as the other flags in the interface record, I'm asking for ARGB and CALLB (which means I'm explicitly handling structures as arguments and function results), big-endianness, left-to-right argument evaluation and signed chars.
lcc was really designed for RISC-like architectures with lots of orthogonal registers, so register usage presents a bit of a challenge for M-1. We declare 2 16-bit registers, a and b, tell lcc they are both to be used for temps, and that no registers are to be used for register variables. We also tell lcc that we have 4 32-bit integer registers and 4 32-bit floating point registers. Of course, M-1 doesn't actually have these. Instead, we allow lcc to generate code using them, but we end up mapping these virtual registers to frame locals. The interesting code to handle this deception is in function(). After code generation (but before code emission), we look to see which, if any, of these virtual registers have been used. For those that have, we allocate space in the frame for them and then change the name of the register to actually be the stack offset of the location. For example, when we first set up the virtual registers, we create a register "R_L0". If that register is actually used, we'll change its name to "50(sp)" - or something like that, depending on the actual offset. All long/float assembly instructions are generated as something like:
which get emitted as something like:
which the macro processor will convert into:
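The renaming itself amounts to nothing more than rewriting the register's name string after frame space is allocated. A minimal sketch - buffer handling and naming are illustrative, not lcc's actual code:

```c
#include <stdio.h>

/* Sketch of the renaming trick in function(): once a 32-bit virtual
   register is known to be used and has frame space allocated, rewrite
   its *name* to be its stack-relative address, so the emitter prints
   the frame location wherever a template mentions the register. */
static void rename_virtual_reg(char *name, size_t n, int frame_offset) {
    /* e.g. "R_L0" becomes "50(sp)" */
    snprintf(name, n, "%d(sp)", frame_offset);
}
```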
M-1 has native support for both 8 and 16 bit operations. lcc always generates code to convert data to its supertype prior to arithmetic operations. This means code generated for char operations has lots of sign extends, and would never use my 8-bit operations at all (always extending to 16 bits). A partial solution is to have conversion rules that simply pass along their input operand, doing nothing. However, sometimes you really do need the extends, so we have the notion of two intermediate rule types: reg and reg8. Here are some sample rules:
In short, we have rules which suppress the sign extends producing a "reg8" result. All 8-bit operations then require reg8 operands and return reg8 results. If something actually does need to be extended, it will use the rule which produces the "reg".
The big problem with this scheme is that when we suppress the CVII2, we lose our register targeting. This may simply be a problem of my misunderstanding and there may be a better solution, but it seems to work reasonably well to have target() also target the kid operand if it is a CVII2 or any other operator we may suppress. For example:
In the above code snippet, we naturally make sure both the left operand and target are register a (which M-1 requires) and also if the left operand is the result of a conversion operator, we make sure that the conversion operator's input is also register a. If anyone knows of a better solution, please email me.
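The targeting idea can be sketched with mock types - these are illustrative stand-ins, not lcc's actual Node/Symbol declarations:

```c
#include <string.h>

/* Mock node type standing in for lcc's Node; fields are illustrative. */
typedef struct node *Node;
struct node {
    const char *op;     /* IR operator name, e.g. "ADDI2" or "CVII2" */
    Node kids[2];
    const char *syms0;  /* register this node is targeted to */
};

/* Is this one of the conversion operators whose rules we suppress? */
static int is_suppressed_cvt(Node p) {
    return p != NULL && strncmp(p->op, "CVII", 4) == 0;
}

/* Sketch of the fix: target the left operand to register a, and if that
   operand is a suppressed conversion, target its kid to a as well, so
   the targeting survives the suppression. */
static void target_left(Node p, const char *reg_a) {
    Node left = p->kids[0];
    left->syms0 = reg_a;
    if (is_suppressed_cvt(left))
        left->kids[0]->syms0 = reg_a;
}
```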
Compare & branch
One of my favorite features of M-1 is the rich set of compare and branch instructions. These are a natural match for lcc, which uses compare and branch operators. Because of the space they occupy in my 1-byte opcode encoding, I couldn't afford to have all possible conditions and sizes. Equality and non-equality work just fine for either unsigned or signed data, and I fully support these. For >, <, >= and <=, though, I only directly support < and <= for signed operations. By reversing the compare-and-branch operands when doing a > or >=, I can fully support signed compare-and-branches (note: "a > b" is the same as "b < a", and "a >= b" is the same as "b <= a"). So, my rules for GT and GE reverse the operands and map to LT and LE for all data types.
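The reversal identities the GT and GE rules rely on are easy to sanity-check in C (hypothetical helper):

```c
/* Checks the identities behind the operand reversal:
   a > b   is equivalent to   b < a
   a >= b  is equivalent to   b <= a   */
static int reversal_holds(int a, int b) {
    return ((a > b) == (b < a)) && ((a >= b) == (b <= a));
}
```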
Note that M-1's cmpb instructions only permit a 1-byte signed branch target displacement. If the branch is too far, we need to do a regular cmp followed by the appropriate branch on condition code (br.eq, br.ne, br.lt, br.le, br.ltu, br.leu). For unsigned lt, gt, ge and le, we would always want to generate the long-form cmp/branch sequence. The problem is that the compiler isn't going to know whether the branch will reach; only the assembler knows that. The options include emitting macros for compare-and-branches and letting the assembler choose, or always emitting the worst case and letting the assembler optimize to the short form when possible.
I decided to do something similar to the first option. I'll always emit a cmpb instruction - even for the unsigned cases in which there really isn't a cmpb to map to. In M-1 ISA, I have a lot of variant forms of instructions - long versions for big immediate and short versions for short immediates. I've decided to treat the cmpb in a similar fashion. I can pretend that I have cmpb forms for everything. The assembler will then choose the optimal template based on immediate size. Even though the long form for cmpb is actually two separate instructions, I will just create a template that contains both the cmp and br.
Note that usually you'd want to emit worst-case and then optimize, but I decided the output assembly file would be easier to read if I used the cmpbs.
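The decision the assembler ends up making is just a range check on the displacement - the -128..127 range below is the 1-byte signed displacement mentioned above, and the helper name is illustrative:

```c
/* Range of cmpb's 1-byte signed branch displacement. */
enum { SHORT_MIN = -128, SHORT_MAX = 127 };

/* Sketch: given the byte displacement from the branch to its target,
   decide whether the short cmpb template fits or whether the assembler
   must expand to the two-instruction cmp + br.cc form. */
static int fits_short_form(long disp) {
    return disp >= SHORT_MIN && disp <= SHORT_MAX;
}
```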
I ran into a bit of trouble dealing with the code, lit, bss and data segments. Ideally, I would have liked to combine code and lit and map them to a process's code page table, and combine data and bss mapped to the data page table. My problem is that I also wanted to emit position independent code, and to be able to have multiple processes map the same code/lit at possibly different virtual addresses. This means I can't have embedded absolute code & literal addresses in the code or lit segments. Such addresses must still be supported, but by ensuring that all initialized structures containing them live in the data segment, where load-time fixups can set up the correct values.
It turns out that lcc does emit absolute code addresses in lit for its switch statement branch tables. Rather than rewrite that code to emit pc-relative offsets, I just decided to combine lit, bss and data and put them on the data side of the house. Inelegant, but it works. Perhaps I'll revisit that later.
However, if I do revisit the problem later, I'll have to make sure I can tell which segment a symbol is located within by looking at its symbol table entry. This is probably easily doable, but I ran into some problems figuring it out. There is a seg field, p->u.seg, but it lives in the union space of the record and isn't always valid. The reason for needing to distinguish is that I must generate different code to materialize an address in code space (lea r,xx(pc)) vs. data space (lea r,xx(dp)).
Handling my calling convention in lcc was pretty simple - what I wanted to do looked a lot like what lcc assumed. I thought about passing the last incoming argument in register a, but decided against it (just a bit messy, and not much upside). So, all incoming arguments are stored in the caller's frame. Function return values come back in register a (for 8/16 bits) or registers a & b (for 32 bits). In the latter case, register a contains the least significant word and register b the most significant word.
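The 32-bit return split can be sketched in C - the struct and names below are illustrative stand-ins for the hardware registers:

```c
#include <stdint.h>

/* Stand-ins for the 16-bit hardware registers a and b. */
struct regs { uint16_t a, b; };

/* Sketch of the 32-bit return convention: the callee leaves the least
   significant word in a and the most significant word in b. */
static struct regs return_long(uint32_t value) {
    struct regs r;
    r.a = (uint16_t)(value & 0xFFFF);  /* least significant word -> a */
    r.b = (uint16_t)(value >> 16);     /* most significant word  -> b */
    return r;
}
```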
Handling this in lcc was simple - nothing unusual needed.
I intend to handle a lot of details using macros and a smart macro assembler. Until I write it, I'm just using m4. So, at the moment I have lcc generate a bunch of m4 macros at the head of each file.