Diary entries for 2002
WooHoo! The microcode is complete (though unchecked). Next step is to fiddle with my Perl scripts to extract all of the microcode fields, create a C struct declaration and compile the microcode. Following that, I'll write the microcode-level simulator and start fixing bugs.
OK, I've added the declaration for the microcode word struct and have begun fiddling with the Perl scripts that will create the actual microcode bits from the microcode web page. When I get it a bit further on, I'll add a source page to store the code for the PROM bits as well as the various simulators, compilers, etc.
Dooh! Yes, I'm an idiot. Of course I need to be able to save and restore the interrupt enable flag as well as the user/supervisor mode bit. My strategy throughout has been to not blindly put in features that I know exist in other machines in order to make sure I understand why they are needed. I had thought I could get away with not saving EI because only supervisor code would mask interrupts and it would know to explicitly enable and disable. The problem here is that I also need to disable interrupts while handling traps and faults. So, unless I can save and restore the existing state of EI, I don't know whether to re-enable interrupts when leaving the trap/fault handler.
OK, no problem. Here's what we'll do: each of the control bits, EI, M and P, will be implemented with an individual D flip-flop, and each will have a separate latch control line in the microcode instruction word. They will also have a common enable line to place their values on the X bus. I think I'll rename the FLAGS register to MSW (Machine Status Word), and it will consist of 7 bits: the four ALU flags (Z,C,S and O/H) plus the 3 control bits. I'll pack them all down into the low byte, and will continue to push and pop as 16 bits. When pushed, all values are used, but when popped only the ALU flags are written. There will also be a LD MSW,A instruction that is available in supervisor mode and a LD A,MSW instruction usable everywhere. The other way to set the control bits is to RETI. Finally, I think I'll eliminate the EI and DI instructions, and just require the bits to be manipulated using normal logical bit operations.
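The push/pop asymmetry of the MSW described above can be sketched in C. This is only an illustration: the bit positions are assumptions (the entry says only that the seven bits are packed into the low byte), P is assumed to be the paging bit, and the function names are hypothetical.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical bit assignments within the low byte of the MSW. */
enum {
    MSW_Z  = 1u << 0,  /* zero */
    MSW_C  = 1u << 1,  /* carry */
    MSW_S  = 1u << 2,  /* sign */
    MSW_OH = 1u << 3,  /* overflow / half-carry */
    MSW_EI = 1u << 4,  /* interrupt enable */
    MSW_M  = 1u << 5,  /* user/supervisor mode */
    MSW_P  = 1u << 6   /* assumed: paging */
};

#define MSW_ALU_MASK  (MSW_Z | MSW_C | MSW_S | MSW_OH)
#define MSW_CTRL_MASK (MSW_EI | MSW_M | MSW_P)
#define MSW_ALL_MASK  (MSW_ALU_MASK | MSW_CTRL_MASK)

/* Push: all seven bits are placed in the 16-bit word that goes on the stack. */
uint16_t msw_push(uint16_t msw)
{
    return (uint16_t)(msw & MSW_ALL_MASK);
}

/* Pop: only the ALU flags are written; the control bits keep their values. */
uint16_t msw_pop(uint16_t msw, uint16_t popped)
{
    return (uint16_t)((msw & MSW_CTRL_MASK) | (popped & MSW_ALU_MASK));
}

/* LD MSW,A: writes all seven bits, but only in supervisor mode.
 * Returns false and leaves the MSW untouched in user mode (what the
 * hardware does in that case is not specified in the diary). */
bool msw_load(uint16_t *msw, uint16_t a, bool supervisor)
{
    assert(msw != NULL);
    if (!supervisor)
        return false;
    *msw = (uint16_t)(a & MSW_ALL_MASK);
    return true;
}
```

With this layout, popping a word that has M set while in user mode leaves M unchanged, which is the point of masking the pop. RETI would restore the control bits as well; that path is not modeled here.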
I ran into this problem while writing the interrupt/fault-handling microcode. This sequence (and corresponding RETI code) will, by far, be the longest microcode instruction. Here are a few other random nuggets from my latest ponderings:
Still thrashing around trying to nail down the trap/interrupt mechanism. At the moment, I'm considering using a pair of cascaded 74148 8-line priority encoders. This will give me a total of 16 interrupt/trap lines, and will produce a 4-bit code denoting which one to use. I will further permanently tie the lowest priority line active, and this will be the fetch line. In other words, whenever the NEXT field of the microcode instruction points to "fetch", we will actually use the output of the cascaded priority encoders to get the address of the next microinstruction to execute. That address will be in the range 0..15. Address 0 will belong to the fetch microinstruction, addresses 1..7 will be for external interrupt lines, addresses 8..14 will be for traps & faults, and the highest priority line, 15, will be reserved for DMA request.
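Here is a rough behavioral model of that dispatch in C. It treats the two cascaded encoders as a single 16-line encoder and uses active-high request bits for readability (the real 74148 inputs and outputs are active-low). The function and type names are made up for this sketch.

```c
#include <assert.h>
#include <stdint.h>

/* What a dispatch address in 0..15 stands for, per the plan above. */
typedef enum { SRC_FETCH, SRC_INTERRUPT, SRC_TRAP_FAULT, SRC_DMA } dispatch_src;

/* Bit n of `requests` set means request line n is asserted.
 * Returns the microcode address used when NEXT points to "fetch". */
unsigned dispatch_on_fetch(uint16_t requests)
{
    requests |= 1u;               /* line 0 (fetch) is permanently tied active */
    unsigned line = 15;           /* start at the highest-priority line */
    while (((requests >> line) & 1u) == 0)
        line--;                   /* terminates: bit 0 is always set */
    return line;
}

dispatch_src classify(unsigned addr)
{
    assert(addr <= 15);
    if (addr == 0)  return SRC_FETCH;
    if (addr <= 7)  return SRC_INTERRUPT;
    if (addr <= 14) return SRC_TRAP_FAULT;
    return SRC_DMA;
}
```

With nothing pending, `dispatch_on_fetch(0)` yields 0 and the ordinary fetch microinstruction runs. A pending fault on line 8 wins over an external interrupt on line 2, and a DMA request on line 15 wins over both.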
What I still need to do is complete the detail design of the mechanism which will drive the trap/interrupt request lines active, and then deassert the one that we're going to handle. Finally, I need to write the interrupt transition microcode. I've already done this a few times, but it is tricky.
One additional item I've been wrestling with is interrupt disabling. At this point, I've specified a single R/S flip-flop to hold interrupt enable/disable. I had further planned on allowing this bit to be set or reset while in supervisor mode, and have no provision for detecting the current value of the bit. You could only set/reset it. I'm somewhat uncomfortable with this, but think I'll stick with it a bit longer. My theory here is that user-mode code cannot rely on precise timing in a multi-tasking system, and we can require the use of system calls to acquire semaphores to control critical regions. I also want to force myself to be good about not closing the interrupt window too long while in supervisor mode. Thus, we must make explicit code which will run with interrupts disabled.
Did a bit of cleanup on the web site. I deleted the instruction lists, which had gotten out of date. When I complete the first round of microcode, I'll regenerate fresh ones. Also made a few other minor tweaks to reflect recent design changes.
I've been thinking quite a bit about the details of the fault/interrupt mechanisms. Here's a stream of consciousness.
Regarding the issues discussed there, I think my current working plan is to use a bit in the constructed IVEC to denote whether we took a memory fault using the user vs. supervisor page table. Further, it still appears that I might be able to eliminate the MSW. In its place, we'll have a bit in FLAGS that determines whether you're running in user or supervisor mode. That bit will be located in the high byte of the flags, and is settable only via a RETI or a LD FLAGS,A while in supervisor mode. When you push the flags, you will get the value of that bit pushed in the flag word. However, when you pop flags, you'll only get the low bits (the normal flags). I've still got a bit of thinking to do about how to handle the logic to handshake an external interrupt request, and I also need to see if there's an easy way to deal with DMA requests. (In an earlier design, handling DMA fell out of my scheme for microcode branching in the event of a fault or interrupt. In the current design, I'm not sure that old way will work.)
The other former MSW bit that needs to stay alive (and will also live in the high byte of the FLAGS register) is PE, Paging Enable. If 0, address translation is off and only the low 64kbytes of the address space is addressable. Although not enforced, I would expect this mode to only exist in supervisor mode - and likely just be active from power-on reset through supervisor page table initialization.
Thinking more about the width of the microcode, I've decided to continue to simplify the control circuitry by adding explicit control bits. For example, at the moment I'm using a function "L(ir[bits])" to be able to use the same microcode to pop into A and FLAGS. I'd use a bit of the instruction to select either A or FLAGS. So long as I don't have to increase the width of the microcode from 56 to 64 bits, I'll go ahead and delete the function and just duplicate the microcode and explicitly use control bits (L(a) and L(flags) in the above example).
This is clearly not elegant, and I'm not especially happy about this sort of wastefulness - but given that I've got lots of otherwise unused space in the microcode PROMs it seems like the right thing to do.
OK - just did a round of editing. Now the functions L(ir[bits]) and LA(ir[bits[5..7]]) are no more. So, we'll only be looking at opcode bits for conditional branches and ALU operations (which reduces the number of muxes we'll need). In other minor housecleaning, I previously noted that 3 bits were necessary to select the 4 shift/rotate functions. To clarify, we have one bit that enables the shift/rotate functionality, and two bits to select which kind of shift/rotate to do. Also, I've added a bit to the immediate register (going from 2 to 3). This bit will be wired into the 1024 position to allow me to add 1024 to AD0 after writing a new page table entry with the wpte instruction. This should make initial setup of a process' page table much cleaner when using default page table entries. So, the possible immediates I can generate via microcode are: 0, 1, 1024, 1025, -1, -2, -1025 and -1026.
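One way to read that list of eight immediates is as a 3-bit field: one bit wired to the 1 position, one to the 1024 position, and one that inverts every bit of the result. That interpretation is my assumption (the entry only names the 1024 bit), but it reproduces exactly the values listed. A minimal sketch in C, with a hypothetical function name and bit order:

```c
#include <assert.h>
#include <stdint.h>

/* Assumed field layout: bit 0 -> value 1, bit 1 -> value 1024,
 * bit 2 -> invert all 16 bits of the result. */
int16_t decode_immediate(unsigned field)
{
    assert(field < 8);
    int v = 0;
    if (field & 1u) v += 1;
    if (field & 2u) v += 1024;
    if (field & 4u) v = -v - 1;   /* same result as bitwise inversion in two's complement */
    return (int16_t)v;
}
```

Under this reading, -1026 is simply the inverted form of 1025, and the pre-existing 2-bit field would have produced 0, 1, -1 and -2.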
Here's the latest:
As it stands today, we're only a few bits over a multiple of eight. I imagine I could squeeze a bit or two out to fit into 6 PROMs, but I'm pretty sure that I'm going to need to add some other bits to handle interrupts and faults, so we'll just assume we're going to 7 PROMs.
Also, while editing, I noticed that my microcode branch function was specified based on an old sequencer. When I eliminated the call/return capability for microcode I intended to get rid of the circuitry to generate the address of the next micro-op (i.e. an increment adder). When this existed, I defined microcode conditional branch to use the next field as the "condition met" target and otherwise fall through to the next micro-op. Without the microcode pc incrementer, this doesn't work. However, all is fine: in all cases I was either branching to fetch or falling through. I'll just redefine CBR to be:
I rearranged the opcodes to simplify the generation of the ALU function codes. Now we only have two possibilities - an explicit "sub" or bits 1..3 of the opcode.
Next, I took a look at how the encoding impacted the conditional branch instructions. Again, I ended up rearranging the opcodes. We still end up with a 3-bit field to describe how to handle the CBR microcode function. It is defined such that if the condition is met, we branch to the microcode address in the next field; otherwise goto 0x00 (fetch). Here's how the bits encode:
In other words, we'll use a mux to select whether the CBR bits in the microcode instruction word or bits 1..3 of the opcode to feed into the 3->8 data selector which feeds the branch circuitry.
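The selection logic can be sketched as follows. The meaning of each of the eight condition codes is not modeled, so the conditions are passed in as an opaque array, and the names are hypothetical. The selector is written as an 8-to-1 choice driven by the 3-bit code.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

#define UADDR_FETCH 0x00u   /* microcode address of fetch */

/* Mux: take the 3-bit condition selector either from the CBR field of
 * the microinstruction or from bits 1..3 of the opcode. */
unsigned cbr_select(bool use_opcode, unsigned ucode_cbr, uint8_t opcode)
{
    assert(ucode_cbr < 8);
    return use_opcode ? ((unsigned)(opcode >> 1) & 7u) : ucode_cbr;
}

/* CBR: if the selected condition is met, continue at the address in
 * the next field; otherwise go to fetch. cond[] holds the current state
 * of the eight selectable conditions. */
uint8_t cbr_next(const bool cond[8], unsigned sel, uint8_t next_field)
{
    assert(sel < 8);
    return cond[sel] ? next_field : (uint8_t)UADDR_FETCH;
}
```

For example, opcode 0x0A has bits 1..3 equal to 101, so with the mux set to the opcode the selector is 5, and the branch goes to the next field only if condition 5 currently holds.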
Worked a bit on correcting typos in the microcode, and then extracted the tokens to estimate the width of a microcode instruction. It's pretty wide:
I'm probably being a bit optimistic here, and still haven't fleshed out the details of the fault microcode. So, it's likely going to end up being 7 74S472 PROMs (6 for the control bits, and 1 for the "next" field).
Other than trapping instructions and the new lea's that I haven't defined yet, the microcode is complete (though far from debugged). The only new item of note is that I realized I need for the accumulator to have a separate 3-state output to the X bus (as does TA). This is necessary to allow parallel usage of the address unit.
Very close to completing this round of microcode. Things will probably slow down in the short term as I need to think about some details: First, I haven't yet decided how best to handle addressing for array accesses (as well as structure accesses through a pointer). Most likely I won't attempt to add direct loads and stores, but instead will add some lea's. What I probably need to do here is hand-code some sample C routines. Following this, the next step will be to nail down the microcode control bits. The work here will largely involve trying to simplify the circuits needed to generate the control signals via field decoders. For example, for most ALU operations, I will be obtaining the 74381 function code from bits 1..3 of the opcode. However, there are some cases in which I need to specify the code directly in the microcode. Can I tweak the instruction encoding to eliminate or minimize the extra bits needed to handle the non-standard cases?
The other significant item prior to writing the microcode simulator is nailing down the details of the interrupt/trap logic. I've already done this a couple of times, so it should go fairly quickly -- but I also need to work through some sample trap handlers. In particular, I need to make sure I can get fast access to everything I need for servicing page faults and system calls. The former case involves recovering the necessary info for the faulting address while preserving all register state. In the latter case, I need to be able to handle copying of parameters between user and supervisor space (and in particular, deal with or eliminate the possibility of taking a user-space memory access fault during the syscall transitions).
Worked through more microcode, and it became clear that I wasn't going to even come close to filling up my microcode storage PROMs. Thus, I've decided to simplify the microcode sequencer by eliminating the call/return mechanism and just duplicating the affected microcode. Also, I'm going to need to tweak the logic which drives the X bus to enable me to drive the low half with the contents of the TA register, and the high half with the output of the steering logic. I also ran into a situation in which my internal bus arrangement and lack of multiple temporary registers caused me a bit of trouble. For "call" instructions, I need to save the target displacement, push the return address and then generate the PC-relative branch. Normally I would copy the data to be pushed into the TA register (which lives in the integer side of the house). This allows me to store a byte and simultaneously update the stack pointer. However, because the TA register is occupied by the target displacement, I need to route the return address bytes across the data paths in the address unit. Thus, I'm going to have to serialize the stores and pointer updates. This will make calls a cycle or two slower than I'd like, but not slow them down so much that I'm going to throw more hardware at the problem.
Finished off the atomic compare and branch and test and branch instruction microcode.
More progress on the microcode. I also made a small change in the data paths by adding a 3-state output from the TA register to the X bus. I believe I had this in earlier but dropped it for some reason. It will allow me to better overlap stores and memory address construction (just as happens for loads). I've also decided on a couple more tweaks: at various times I've added and then dropped auto incrementing and auto decrementing operations using AD0 and AD1. Now that I'm doing the microcode for them again, I remember why they weren't there in the last round. If I fault, I need to undo any changes to the registers. That's why I take a snapshot of PC and SP (in SPC and SSP). These registers I allow to be changed as the instruction executes. If we then fault after a change, we can roll back to the original values. To support auto increment/auto decrement operations off of AD0 and AD1, I'd either have to back them up as well or write the microcode such that their values aren't changed until after all possible trapping microinstructions are complete. To keep complexity down, I think I'll just drop them. In truth, those instructions are most useful in block move/compare contexts and will likely be handled by runtime subroutines. Thus, the minor code expansion isn't important at all, and the speed of doing separate LD/INC will actually be about the same as a unified instruction. The other change will be to add a set of indexed effective address generation instructions in the space freed by giving up autoinc/dec. They'll be of the form "lea AD0,(AD1+A)". I may also do an indexed load, perhaps using AD0 as a base and AD1 as an index. Given that my shift/rotate unit is embedded in the accumulator, I won't add automatic scaling of the index to support word loads. This will have to be done explicitly.
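The snapshot-and-rollback idea can be shown with a tiny C model. It assumes the snapshot is taken at the start of each instruction, which the entry does not state explicitly, and the struct and function names are invented for the sketch.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

typedef struct {
    uint16_t pc, sp;     /* live registers, free to change mid-instruction */
    uint16_t spc, ssp;   /* snapshots used to undo a faulting instruction */
    uint16_t ad0, ad1;   /* no snapshot: must not change before a possible fault */
} cpu_regs;

/* Assumed to happen at the start of every instruction. */
void instr_begin(cpu_regs *r)
{
    assert(r != NULL);
    r->spc = r->pc;
    r->ssp = r->sp;
}

/* On a fault, restore PC and SP so the instruction can be restarted. */
void instr_fault(cpu_regs *r)
{
    assert(r != NULL);
    r->pc = r->spc;
    r->sp = r->ssp;
}
```

An auto-increment through AD0 that modified `ad0` before a faulting memory access would leave the register changed after `instr_fault`, which is the reason given above for dropping those instructions rather than adding more snapshot registers.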
Working through the microcode now. Have finished add, sub, or, and, xor. These were among the easiest, and needed only 18 microinstructions in the lower half of the PROMs.
More fleshing out of the architecture and ISA pages. Also tweaked the site color scheme a bit (though still not especially happy with it). Added complete list of opcodes in numerical order. I also tweaked the instruction set a bit, replacing some of the unconditional branch and call instructions which went indirect off of addresses located at (DP+offset) and (SP+offset). In their place, I put traditional conditional branch on condition code instructions (though with my emphasis on "test and branch" and "compare and branch" instructions I'm expecting them to be used fairly infrequently). I'm now starting work on the microcode rewrite, which I'm sure will impact the encoding somewhat.
Conversion of my hand-written drawings and notes continues. I have rather a lot of schematic fragments covering most aspects of the design, but most of them are somewhat out of date. The design underwent a significant change a few months ago (largely in the microcode sequencer and data-path layout) which rendered all of my previous microcode obsolete. Most of the work in the short term will be reworking the microcode. After that, I'll be using a schematic capture program (CircuitMaker Student, probably). At the moment, I plan on doing the schematic in many small fragments rather than a couple of big sheets. I may change my mind here, but the thought is that I only have a normal printer and will want to wire this up a small section at a time.
I only find time to work on this project in short weekend and evening bursts, so I expect it will be a while before I actually begin construction. However, my hope is that I can meet the following rough schedule: