;-*- Mode:Text -*-

Architectural Specification
Programmer's Reference

This should produce two documents: an Architectural Specification that
unambiguously defines the hardware, and a Programmer's Reference that
defines the "virtual machine" as seen from user-level compiled code.

The Architectural Specification describes every detail of the hardware
in somewhat flat terms.

Right now, everything is piled in here.

;;;;;;;;;;;;;;;;

OVERVIEW:

29332 ALU;
dual-port register ram;
function-call hardware;
LISP compiled directly into 64-bit machine instructions;
three-address / three-opcode instructions;
instruction and data caches;
2^26 word virtual memory space;
hardware type-checking and GC assist;
16MB local memory;
nubus interface.

;;;;;;;;;;;;;;;;

SUMMARY OF HARDWARE FEATURES:

ALU
    29332
    LDB and non-LDB instructions
    control, status inputs
    status outputs
    passaround

register ram
    dual port, 4K x 33
    control and use of 33rd bit
    12-bit address from call-hardware; 8/12-bit from indirect

call hardware
    frame select registers O, A, R, plus 4-bit immediate G, form the
    12-bit register ram address for sources and destination.
    automatic frame management
    trap when full
    sources / destinations
        2 bits select O, A, R, G
        4 bit offset
        1 bit func / reg (right and dest only)
        value of O, A, R, G for sources is the value before the
        instruction's call-hardware operation; value for dest is after.
    Legal instruction sequences
        e.g. return-return is not legal.
        (must use tail-recursion to avoid call followed by return)

virtual memory
    main memory
        16MB: 4M x 32
        addressed by VMA
        normal, early start
        data through MD
        read / write control logic stops or traps on access
        trap protocol for MD / VMA access
    map
        26 bit virtual address
        map to local main memory or nubus
        one level, 4K clusters
        64K x 28 bits: addr (20), access, write, volatility (2), local/nubus

instruction memory
    64-bit instructions; 23- or 24-bit PC
    instruction cache
        two cache sets, 2K x 64 each; 4 word blocks
    first 4K words of instruction space is fast memory
    instruction cache is read-only
        write access through main memory; requires cache flush
    cache fill is through virtual memory map
        instruction space appears in top of virtual memory space
        filled from alternating 32-bit words

exceptions
    traps (execution of interrupted instruction is modified)
    (trap routine computes new value to be stored in dest cycle of
    re-executed instruction)
        alu overflow
        data type
    interrupts (interrupted instruction is re-executed exactly)
        page-fault
        GC transport
        nubus error
        nubus interrupts
        local interrupts
        call hardware full / error
    reset
    machine control register has trap-enable bit(s).
    all exceptions vector to PC zero.
    active trap and interrupt requests are read in unencoded form from a
    func source.
    when traps are enabled, any trap request causes a trap and disables
    traps; the cause of the trap request must be reset before traps are
    enabled again.

functional sources
    status
    md
    trap requests
    call-hardware
    microsecond / stat counters

functional destinations
    control
    vma
    md
    map
    call-hardware
    Icache control
    microsecond / stat counters

nubus interface
    nubus access is indicated by map

software single-step
    trace
    execute one instruction and then trap

debug interface
    single-step execution
    test register on MFO, on M board
    IR
    VMA
    MD
    PC

booting
    Execute from boot prom.
    Prom is on main-memory bus; fills IR / Icache with same timing as
    running from main memory.
    Control-reg bit, set on RESET, forces instructions to fetch from boot
    prom instead of main memory.

;;;;;;;;;;;;;;;;

Instructions:

All instructions have three separate opcode fields:

    Instruction Category (ICAT): indicates how other fields are used.
    PC source (NEXTPC): selects PC for next instruction.
    Continuation (CHOP): call-hardware operation.

Illegal combinations may cause the machine to halt or trap, and must not
cause damage.

ICAT: (3 bits)
    ALU     all possible ALU chip inputs are available
    ALUI    8, 16 or 24-bit immediate data combined with ALU operation
    LOADI   32-bit immediate data; ALU op is "Y<-R"
    ADDR    23-bit address for jump or call; ALU op is "Y<-R"
    ALUX    same as ALU but with different set of ALU ops.
    ALUIX   same as ALUX but with different set of ALU ops.

NEXTPC: (2 bits)
    IR      All or part of PC comes from Instruction Register
    Disp    Dispatch; PC taken from ALU output reg at end of current
            instruction (result of instruction fetched two cycles back).
            If JCOND bit 0 is 1, the low 4 bits are forced to zero.
    Ret     Return PC from call-hardware
    PC+1    All or part of PC is current PC + 1.

CHOP: (3 bits)
    no-op       no call hardware operation
    open        allocate new register frame; address in open frame
    call        activate open frame and do function-call protocol
    open-call   open and call together
    t-open      tail-recursive open
    t-call      tail-recursive call
    return      function return
;;  cancel-open undo an open or t-open.

Specially decoded combinations:
    CALLZ   8-bit address of 16-word multiple for call to cluster zero.
    BRANCH  12-bit addr within current 4096-word page, for local jump.

Specially decoded operations:

Certain combinations of the Instruction Opcode, PC-source and
Continuation modify the combination of bits selected for the PC-source.

Opcode  PC    Cont  Effect
======  ==    ====  ======
any     Disp  any   Dispatch: if Jump-condition-select bit zero is 1,
                    PC bits 0-3 are forced to zero.
CALLZ   IR    any   CallZ: PC bits 0-3 and 12-22 are forced to zero.
BRANCH  IR    any   Branch: PC bits 8-22 are forced to select PC+1; if
                    the selected jump-status is false, PC bits 0-7 are
                    also forced to PC+1. I.e., if the jump condition is
                    true, it is a branch within the 256-word page.

Jump status is the condition selected by the PRECEDING instruction; it is
always conditional, but condition "always" is available.

Misc: (fixed)

register-ram boxed bit control: (2 bits)
    select what value is used for boxed bit of register-ram destination
    0: 0
    1: 1
    2: boxed bit from left source
    3: boxed bit from right source

Type-checking control: (3 bits)
    Controls data-type trap for this instruction. Selects what values of
    left and right ALU source data types and boxed bits will cause the
    instruction to be aborted (trap).
    List combinations ...

Statistics: (1 bit)
    Bit may be selected for statistics counter, trap and/or halt
    (VERIFY that we still want it ...)

Source / Dest specification:

Destination: (7 bits)
    (always present) select functional or register-ram destination.
Left source: (6 bits)
    select ALU left-side source; always from register-ram
Right source: (7 bits)
    select ALU right-side source; functional-source or register-ram
Return-destination: (7 bits)
    function-return destination field saved by call-hardware as part of
    function-call protocol. A destination of D-RET (a functional
    destination) causes the current return-destination from the
    call-hardware to be used as the instruction's destination field, in
    place of the indicated destination.
4-bit-immediate: (4 bits)

Source / dest fields select a register-ram location or functional source
or destination. The 7 bit fields have a bit that selects functional vs.
register ram; the 6 bit field always selects the register ram.

Used as register-ram address:
    2 bits: select one of Open, Active, Return or Global register frame
            addresses.
    4 bits: offset within frame
    O, A and R select 8 bit registers that are combined with the 4 bit
    offset to form a 12-bit register ram address.
    G combines the 4-bit immediate field from the instruction with the
    4 bit offset within frame to form an 8 bit address that selects one
    of the first 256 register ram locations.

Used as functional source / dest:
    2 bits of frame select and 4 bits of offset combine to form a 6-bit
    functional source / dest address.
    Destination value of functional-destination all-ones (3F) is used as
    the "garbage" location for instructions that don't write a normal
    dest location.

4-bit-immediate:
    used only when G is selected in the register frame field.

ALU operation: (9 bits)
    9 bits of 29332 ALU instruction including 2-bit byte-width select.
    For machine instructions that don't specify the ALU operation, the
    operation is forced to copy the right (MFO) source to the dest.

ALU position/width: (11 bits)
    Select bit-field position and width for certain ALU operations.

Jump address: (23 bits)
    8- or 23-bit jump or call address.
    If an 8-bit field is used, it is in bits 0-7 for BRANCH and bits 4-11
    for CALLZ.

Jump condition select: (3 bits)
    Instructions that don't include a valid jump condition select may not
    be followed by a BRANCH; the status is garbage.
    Select one of:
        Indir   From machine control register Jump-Status bit (indirect)
        Always  always true; required for unconditional jump
        C       alu status "carry"
        C-      carry inverted
        Z       alu status "zero"
        Z-      zero inverted
        C+Z     carry OR zero
        (C+Z)-  C+Z inverted

;;;;;;;;;;;;;;;;

All jumps and calls select IR as the IR-specified PC-mux select.
Jump-local forces PC:8-22 to select PC+1.
Call-zero forces PC:0-3,12-22 to zero.
Dispatch-x16 selects Dispatch, and forces PC:0-3 to zero.
Trap forces zero for all bits.

conditional jump timing:
    IR0: compute result that will be used as jump condition
    IR1: select jump condition
    IR2: cond-jump jump-addr
    IR3: new code at jump-addr

MD / VMA boxed bits: (destinations)
    vma-start-read (all cases)  set MD_BOXED from IR:55, VMA_BOXED from IR:54.
    vma-start-write             set VMA_BOXED from IR:54, preserve MD_BOXED.
    md-start-write              set MD_BOXED from IR:55, VMA_BOXED from IR:54.
    md                          set MD_BOXED from IR:55.
    vma                         set VMA_BOXED from IR:54.

If a DEST to MD or VMA might have to be aborted because the previous
memory cycle was a write that has not yet reached the point at which it
knows whether it will trap, the machine must be frozen (cache-miss style)
at or before the DEST cycle that would clobber MD or VMA, such that if the
trap is not taken, the DEST is still asserted and the instruction can be
completed as if nothing happened.

The decision to freeze the machine because a write-in-progress is
followed by a DEST to VMA or MD may be as simple as decoding a write to
any functional destination. It is required that if the trap is taken, the
DEST to VMA or MD is inhibited, so that the trap routine can read their
old values.

The timing for writing MD_BOXED and VMA_BOXED can be the same as for
writing MD and VMA. MD_BOXED and VMA_BOXED need not be writable
indirectly; they can be restored by writing "md" and "vma". They should
be readable through some kind of status register. Note that the bits
cannot be written indirectly with the output of the P-board boxed-bit mux.

;;;;;;;;;;;;;;;;

example:

    MD <- foo
    VMA-start-write <- bar
    no-op
    VMA-start-read-early <- bletch

The -start-write cannot be immediately followed by the -start-read-early
because both would be driving the DEST during the same cycle.

Assume that the VMA-start-write causes a page-fault or GC trap. The
VMA-start-write instruction completes normally. The VMA-start-read-early
must be frozen at or before its DEST cycle, until it is known whether or
not the VMA-start-write will cause a trap. If it does cause a trap, the
VMA-start-read-early must be aborted (trapped before its commit point,
with the early DEST inhibited) so that it can be re-executed normally
after the trap is handled. It is not necessary for the trap-on-write to
abort the instruction that caused it.
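
The freeze / inhibit rule above can be modeled as a small decision
function. This is a hypothetical Python sketch, not part of the spec: the
WriteStatus and Action names, and the cycle-by-cycle list of write
states, are assumptions for illustration only. It uses the conservative
decode the text permits (any functional destination is treated as a
possible write to MD or VMA) and replays the example, in which the
VMA-start-write takes a trap.

```python
from __future__ import annotations

from dataclasses import dataclass
from enum import Enum, auto


class WriteStatus(Enum):
    """Hypothetical state of the memory write started by an earlier instruction."""
    NONE = auto()      # no write in progress
    PENDING = auto()   # write started; not yet known whether it will trap
    OK = auto()        # write known to complete without a trap
    TRAPPED = auto()   # write known to trap (e.g. page fault or GC)


class Action(Enum):
    PROCEED = auto()   # assert the DEST normally
    FREEZE = auto()    # hold the machine, cache-miss style, before the DEST cycle
    INHIBIT = auto()   # trap taken: suppress the DEST so MD/VMA keep old values


def dest_action(write: WriteStatus, dest_is_functional: bool) -> Action:
    """Decide what to do with a DEST that follows a memory write.

    Conservative decode: every functional destination is treated as if
    it might clobber MD or VMA; register-ram destinations never freeze.
    """
    if not dest_is_functional or write in (WriteStatus.NONE, WriteStatus.OK):
        return Action.PROCEED
    if write is WriteStatus.PENDING:
        return Action.FREEZE
    return Action.INHIBIT


@dataclass
class MemRegs:
    vma: int = 0
    md: int = 0


def run_dest(regs: MemRegs, value: int, statuses: list[WriteStatus]) -> list[Action]:
    """Simulate a DEST to VMA over successive cycles of write status.

    The machine stays frozen while the write is PENDING; VMA is written
    only if the DEST finally proceeds.
    """
    actions = []
    for status in statuses:
        action = dest_action(status, dest_is_functional=True)
        actions.append(action)
        if action is Action.PROCEED:
            regs.vma = value
        if action is not Action.FREEZE:
            break
    return actions


if __name__ == "__main__":
    # The example above: VMA-start-read-early waits on a write that traps.
    regs = MemRegs(vma=0x100, md=0x42)
    acts = run_dest(regs, 0x200, [WriteStatus.PENDING, WriteStatus.TRAPPED])
    print([a.name for a in acts], hex(regs.vma))  # ['FREEZE', 'INHIBIT'] 0x100
```

Because the inhibited DEST never reaches VMA, the trap routine still sees
the old value (0x100 here), which is the property the text requires.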
;;;;;;;;;;;;;;;;

If an instruction that did VMA-start-read-early is trapped for any reason
(in particular, a case in which the early-DEST write was NOT inhibited),
the memory (local or Nubus) cycle that is started by the -start-read-early
must be aborted before it affects any nubus device, and also, the MD and
MD_BOXED must not be clobbered.
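
One way to satisfy this requirement is sketched below as a behavioral
model in Python. The class and method names are hypothetical, and the
spec does not prescribe this mechanism: here the early DEST only records
the cycle (address and boxed bit), and the cycle is released to the bus
only once the instruction passes its commit point. A trap discards the
recorded cycle, so no nubus device sees it and MD / MD_BOXED keep their
old values.

```python
from dataclasses import dataclass, field
from typing import Optional


@dataclass
class NubusDevice:
    """Hypothetical device whose reads have side effects (e.g. a FIFO)."""
    reads_seen: int = 0

    def read(self, addr: int) -> int:
        self.reads_seen += 1
        return addr & 0xFFFF  # placeholder data


@dataclass
class MemoryUnit:
    md: int = 0
    md_boxed: int = 0
    device: NubusDevice = field(default_factory=NubusDevice)
    # A cycle started by an early DEST but not yet allowed to reach the device.
    pending_addr: Optional[int] = None
    pending_boxed: Optional[int] = None

    def start_read_early(self, addr: int, boxed: int) -> None:
        """Early DEST: record the cycle; touch neither the bus nor MD / MD_BOXED."""
        self.pending_addr = addr
        self.pending_boxed = boxed

    def commit(self) -> None:
        """Instruction passed its commit point: run the bus cycle, then update MD."""
        if self.pending_addr is None:
            return
        self.md = self.device.read(self.pending_addr)
        self.md_boxed = self.pending_boxed
        self._clear()

    def trap(self) -> None:
        """Instruction trapped: drop the cycle and leave MD / MD_BOXED intact."""
        self._clear()

    def _clear(self) -> None:
        self.pending_addr = None
        self.pending_boxed = None


if __name__ == "__main__":
    mem = MemoryUnit(md=0x42, md_boxed=1)
    mem.start_read_early(0x1234, boxed=0)
    mem.trap()
    print(hex(mem.md), mem.md_boxed, mem.device.reads_seen)  # 0x42 1 0
```

The model collapses "start early, cancel before the nubus request goes
out" into "defer the bus access to commit". Since the purpose of
-start-read-early is to begin the cycle sooner, an actual design would
more likely need a cancel path for an in-flight cycle; the observable
guarantees (no device side effect, MD and MD_BOXED preserved) are the same.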