The SAM-32 ISA
Before making a CPU, it is important to decide how the CPU will perform the Fetch, Decode, Execute cycle. I want to make my CPU 32-bit (16 isn't enough to do cool stuff, and 64 is overkill).
The Name - SAM32
Obviously, the most important part of designing an ISA is the name itself. I've chosen to do SAM32 - Super Awesome Machine, the greatest Little-Endian machine of all time.
The Base
Being a guy in High School, designing the entire ISA from scratch (surprisingly doable) isn't worth my time, especially with AP tests coming up. I've chosen to essentially modify RISC-V for my own needs and plans for the future, so that it becomes something Mine. I'm going to remove the stuff I feel is unnecessarily complex and add features that I think make programming easier (like my GOAT, the status register).
Philosophy
Simplicity means I will actually complete this project, and a complete project is better than an incomplete one, no matter how impressive it is. Many of my decisions throughout this entire project, not just the ISA, are based on this.
Registers
Registers are high-speed memory located inside the CPU itself. Think of them as a list of values the CPU needs right now. The values in registers constantly get added, multiplied, incremented, etc. Having more is usually better, as it means your CPU can 'remember' more things at once (it can do this with RAM as well, but STORE-ing and LOAD-ing from RAM is much slower than manipulating stuff already in registers). I've decided to stick with the 32 registers RISC-V uses. It's considered the sweet spot between latency (although I don't have to worry about signal distance), complexity and ease-of-use.
Each of these 32 registers is 32 bits wide, and can be accessed via r0-r31. r0 will be hardwired to 0, and cannot be overwritten (writes to this register will just be ignored). It will be used as a constant zero, as a sink for results nobody needs, and to synthesize pseudo-instructions like CLEAR and NOP.
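Here's a minimal Python sketch of how a register file with a hardwired r0 could behave. The `RegisterFile` class and its method names are made up for illustration, not part of SAM32 itself.

```python
class RegisterFile:
    """32 general-purpose registers, each 32 bits wide; r0 always reads as 0."""

    def __init__(self):
        self._regs = [0] * 32

    def read(self, index):
        return self._regs[index]

    def write(self, index, value):
        if index == 0:
            return  # writes to r0 are silently ignored
        self._regs[index] = value & 0xFFFFFFFF  # keep only the low 32 bits


regs = RegisterFile()
regs.write(5, 0x1_0000_002A)  # too wide for 32 bits, so the top bit is dropped
regs.write(0, 123)            # ignored, r0 stays 0
print(hex(regs.read(5)), regs.read(0))  # 0x2a 0
```

Because the write to r0 is dropped inside `write`, nothing else in a simulator has to special-case it.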
The Program Counter (referred to as PC) is a dedicated register that tracks the current instruction's address. Unless a jump or branch changes it, it will be incremented by 4 (one 32-bit instruction) at the end of each cycle to allow the program to move forward.
The Status Register SR is a new register that tracks the result of the Arithmetic Logic Unit (ALU). It will be automatically updated by the ALU and contains 4 primary flags. As a result, it will be 4 bits wide.
| BIT | FLAG | Purpose |
|---|---|---|
| 0 | Z | ALU result is exactly 0 |
| 1 | N | ALU result is negative |
| 2 | C | ALU addition carried over (result was >32 bits) |
| 3 | V | ALU math caused a signed overflow (result wrapped past the signed 32-bit range) |
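To make the four flags concrete, here's a rough Python sketch of how they could be computed for a 32-bit ADD and packed into the 4-bit SR using the bit positions from the table. The function names are hypothetical, and treating V as signed overflow is my reading of the table.

```python
MASK32 = 0xFFFFFFFF


def add_with_flags(a, b):
    """Add two 32-bit values and return (result, flags)."""
    a &= MASK32
    b &= MASK32
    full = a + b                      # may need 33 bits
    result = full & MASK32
    sign_a, sign_b, sign_r = a >> 31, b >> 31, result >> 31
    flags = {
        "Z": int(result == 0),        # result is exactly 0
        "N": sign_r,                  # top bit set means negative
        "C": int(full > MASK32),      # carry out of bit 31
        # signed overflow: both inputs share a sign the result does not have
        "V": int(sign_a == sign_b and sign_r != sign_a),
    }
    return result, flags


def pack_sr(flags):
    """Pack the flags into 4 bits: Z = bit 0, N = bit 1, C = bit 2, V = bit 3."""
    return flags["Z"] | (flags["N"] << 1) | (flags["C"] << 2) | (flags["V"] << 3)


result, flags = add_with_flags(0x7FFFFFFF, 1)
print(hex(result), bin(pack_sr(flags)))  # 0x80000000 0b1010
```

Adding 1 to the largest positive value sets N and V but not C, while `0xFFFFFFFF + 1` sets Z and C but not V. That difference is why C and V are separate flags.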
Instructions
There are 3 different types of instructions, each starting with the opcode and the destination register, then their respective arguments.
R-Type - Register to Register
| Field | Opcode | Rd | Rs1 | Rs2 | Funct |
|---|---|---|---|---|---|
| Bits | [31:25] | [24:20] | [19:15] | [14:10] | [9:0] |
I-Type
| Field | Opcode | Rd | Rs1 | Immediate |
|---|---|---|---|---|
| Bits | [31:25] | [24:20] | [19:15] | [14:0] |
J-Type
| Field | Opcode | Rd | Target Offset |
|---|---|---|---|
| Bits | [31:25] | [24:20] | [19:0] |
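As a sanity check on the bit layouts, here's a small Python sketch that packs and unpacks the three formats. The helper names (`encode_r`, `encode_i`, `encode_j`, `field`) are hypothetical. Negative immediates are stored in two's complement, which is an assumption based on the SUBI pseudo-instruction using a negated immediate.

```python
def encode_r(opcode, rd, rs1, rs2, funct):
    """R-Type: Opcode [31:25], Rd [24:20], Rs1 [19:15], Rs2 [14:10], Funct [9:0]."""
    return (((opcode & 0x7F) << 25) | ((rd & 0x1F) << 20) | ((rs1 & 0x1F) << 15)
            | ((rs2 & 0x1F) << 10) | (funct & 0x3FF))


def encode_i(opcode, rd, rs1, imm):
    """I-Type: Opcode [31:25], Rd [24:20], Rs1 [19:15], Immediate [14:0]."""
    return (((opcode & 0x7F) << 25) | ((rd & 0x1F) << 20) | ((rs1 & 0x1F) << 15)
            | (imm & 0x7FFF))


def encode_j(opcode, rd, offset):
    """J-Type: Opcode [31:25], Rd [24:20], Target Offset [19:0]."""
    return ((opcode & 0x7F) << 25) | ((rd & 0x1F) << 20) | (offset & 0xFFFFF)


def field(word, hi, lo):
    """Extract bits [hi:lo] of a 32-bit instruction word."""
    return (word >> lo) & ((1 << (hi - lo + 1)) - 1)


word = encode_r(0x01, 3, 1, 2, 0x001)   # ADD r3, r1, r2
print(hex(word))                         # 0x2308801
print(field(word, 31, 25), field(word, 24, 20), field(word, 9, 0))  # 1 3 1
```

The decoder only needs `field` with the bit ranges from the tables, since the opcode and Rd sit in the same place in all three formats.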
## Control Flow
### Branching
Branching allows for the CPU to execute if statements and other conditionals. Essentially, the way it works is: you tell the CPU to make a comparison, and give it a `Target Offset`. If this condition is true, then the CPU 'jumps' (changes the PC) by the `Target Offset`, skipping or looping code.
### Looping
Looping, as mentioned in the previous paragraph, is done entirely via software (decrement, compare, branch). Hardware loops are too complex for the scope of this project, and I don't really have plans to implement hardware loops in the future.
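Here's a toy Python trace of a software loop, assuming a three-instruction program laid out at addresses 0, 4 and 8, and assuming the decrement itself updates Z so no separate compare is needed. The program layout and the `run_countdown` name are made up for illustration.

```python
def run_countdown(start):
    """Simulate: 0: ADDI r3, r3, 1 / 4: SUBI r1, r1, 1 / 8: BNE -8."""
    r1 = start   # loop counter
    r3 = 0       # stands in for the loop body's work
    z = False    # Z flag from the status register
    pc = 0
    while pc < 12:                      # address 12 is "after the loop"
        if pc == 0:                     # loop body
            r3 += 1
            pc += 4
        elif pc == 4:                   # decrement, ALU updates Z
            r1 = (r1 - 1) & 0xFFFFFFFF
            z = r1 == 0
            pc += 4
        else:                           # pc == 8: branch back while Z is clear
            pc = pc - 8 if not z else pc + 4
    return r3


print(run_countdown(5))  # 5
```

The body runs once per count. The branch offset is -8 because it has to hop back over two 4-byte instructions.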
Instructions
Now, for the instructions themselves! These define the actual functions the CPU can perform!
Arithmetic and Logic
| Instruction | Format | Operands | Action | Opcode | Function Code |
|---|---|---|---|---|---|
| ADD | R | Rd, Rs1, Rs2 | Rd = Rs1 + Rs2 | 0x01 | 0x001 |
| ADDI | I | Rd, Rs1, #imm | Rd = Rs1 + imm | 0x08 | |
| SUB | R | Rd, Rs1, Rs2 | Rd = Rs1 - Rs2 | 0x01 | 0x002 |
| MUL | R | Rd, Rs1, Rs2 | Rd = Rs1 * Rs2 | 0x01 | 0x003 |
| MULI | I | Rd, Rs1, #imm | Rd = Rs1 * imm | 0x09 | |
| DIV | R | Rd, Rs1, Rs2 | Rd = Rs1 / Rs2 | 0x01 | 0x004 |
| DIVI | I | Rd, Rs1, #imm | Rd = Rs1 / imm | 0x0A | |
| MOD | R | Rd, Rs1, Rs2 | Rd = Rs1 % Rs2 | 0x01 | 0x005 |
| MODI | I | Rd, Rs1, #imm | Rd = Rs1 % imm | 0x0B | |
| LUI | J | Rd, #imm20 | Take 20-bit immediate, shift left by 12 bits and store in Rd | 0x12 | |
| AND | R | Rd, Rs1, Rs2 | Rd = Rs1 & Rs2 | 0x01 | 0x007 |
| ANDI | I | Rd, Rs1, #imm | Rd = Rs1 & imm | 0x0C | |
| OR | R | Rd, Rs1, Rs2 | Rd = Rs1 \| Rs2 | 0x01 | 0x008 |
| ORI | I | Rd, Rs1, #imm | Rd = Rs1 \| imm | 0x0D | |
| XOR | R | Rd, Rs1, Rs2 | Rd = Rs1 ^ Rs2 | 0x01 | 0x009 |
| XORI | I | Rd, Rs1, #imm | Rd = Rs1 ^ imm | 0x0E | |
| SLL | R | Rd, Rs1, Rs2 | Rd = Rs1 << Rs2 | 0x01 | 0x00A |
| SLLI | I | Rd, Rs1, #imm | Rd = Rs1 << imm | 0x0F | |
| SRL | R | Rd, Rs1, Rs2 | Rd = Rs1 >> Rs2 | 0x01 | 0x00B |
| SRLI | I | Rd, Rs1, #imm | Rd = Rs1 >> imm | 0x10 | |
| SRA | R | Rd, Rs1, Rs2 | Rd = Rs1 >> Rs2 (Preserves Sign Bit) | 0x01 | 0x00C |
| SRAI | I | Rd, Rs1, #imm | Rd = Rs1 >> imm (Preserves Sign Bit) | 0x11 | |
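Two rows in this table are easier to see with numbers: LUI, and the difference between SRL and SRA. Below is a hedged Python sketch with made-up helper names. It assumes a full 32-bit constant is built with LUI for the top 20 bits and ORI for the low 12 bits, and that shift amounts stay between 0 and 31.

```python
MASK32 = 0xFFFFFFFF


def lui(imm20):
    """LUI: take a 20-bit immediate and shift it left by 12 bits."""
    return (imm20 & 0xFFFFF) << 12


def split_constant(value):
    """Split a 32-bit constant into the LUI part and the part left for ORI."""
    value &= MASK32
    return value >> 12, value & 0xFFF


def srl(value, amount):
    """Logical right shift: zeros come in from the left."""
    return (value & MASK32) >> amount


def sra(value, amount):
    """Arithmetic right shift: copies of the sign bit come in from the left."""
    value &= MASK32
    if value & 0x80000000:
        value -= 1 << 32          # reinterpret as a negative Python int
    return (value >> amount) & MASK32


upper, lower = split_constant(0xDEADBEEF)
rebuilt = lui(upper) | lower       # LUI followed by ORI
print(hex(upper), hex(lower), hex(rebuilt))  # 0xdeadb 0xeef 0xdeadbeef
print(hex(srl(0x80000000, 4)), hex(sra(0x80000000, 4)))  # 0x8000000 0xf8000000
```

Since the low part is only 12 bits, it always fits in the 15-bit I-Type immediate with its top bit clear, so it comes out the same whether ORI sign-extends or zero-extends.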
Memory
| Instruction | Format | Operands | Action | Opcode | Function Code |
|---|---|---|---|---|---|
| LW | I | Rd, [Rs1 + #off] | Load Word: Read 32 bits from address Rs1 + #off into Rd | 0x18 | |
| SW | I | Rs2, [Rs1 + #off] | Store Word: Write 32 bits from Rs2 into address Rs1 + #off | 0x19 | |
| LB | I | Rd, [Rs1 + #off] | Load Byte: Read 8 bits from address Rs1 + #off, sign-extend to 32 bits, and store in Rd | 0x1A | |
| LBU | I | Rd, [Rs1 + #off] | Load Byte Unsigned: Read 8 bits from address Rs1 + #off, zero-extend to 32 bits, and store in Rd | 0x1B | |
| SB | I | Rs2, [Rs1 + #off] | Store Byte: Write the lowest 8 bits of Rs2 into address Rs1 + #off | 0x1C | |
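Here's a small Python sketch of the five memory operations on a toy 16-byte memory. It assumes byte-addressed, little-endian storage (SAM32 is described as little-endian). The lowercase function names just mirror the mnemonics and take an already-computed address instead of Rs1 + #off.

```python
memory = bytearray(16)  # toy RAM, far smaller than a real address space


def sw(addr, value):
    """Store Word: write 32 bits, least significant byte first."""
    memory[addr:addr + 4] = (value & 0xFFFFFFFF).to_bytes(4, "little")


def lw(addr):
    """Load Word: read 32 bits."""
    return int.from_bytes(memory[addr:addr + 4], "little")


def sb(addr, value):
    """Store Byte: write only the lowest 8 bits of the value."""
    memory[addr] = value & 0xFF


def lbu(addr):
    """Load Byte Unsigned: upper 24 bits become 0."""
    return memory[addr]


def lb(addr):
    """Load Byte: upper 24 bits become copies of bit 7."""
    byte = memory[addr]
    return byte | 0xFFFFFF00 if byte & 0x80 else byte


sw(0, 0x12345680)
print(hex(lbu(0)), hex(lb(0)))  # 0x80 0xffffff80
```

The byte at address 0 is `0x80` because little-endian puts the lowest byte first. LB turns it into a negative 32-bit value while LBU leaves it as 128.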
Control Flow & Branching
Jump means set PC to
| Instruction | Format | Operands | Action | Opcode | Function Code |
|---|---|---|---|---|---|
| JMP | J | #target | Jump to PC + #target | 0x30 | |
| CALL | J | Rd, #target | Jump to PC + #target, Rd = PC + 4 | 0x31 | |
| RET | I | Rs1 | Jump to Rs1 | 0x32 | |
| BEQ | J | #target | Jump to PC + #target if Z | 0x20 | |
| BNE | J | #target | Jump to PC + #target if not Z | 0x21 | |
| BLT | J | #target | Jump to PC + #target if N != V | 0x22 | |
| BGE | J | #target | Jump to PC + #target if N == V | 0x23 | |
| BHI | J | #target | Jump to PC + #target if not C & not Z | 0x24 | |
| BLS | J | #target | Jump to PC + #target if C \| Z | 0x25 | |
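Here's a hedged Python sketch of how the branch conditions could be evaluated from the flags after a subtraction. Two things here are my assumptions, not settled SAM32 behavior: C acts as a borrow flag on subtraction (set when the first operand is smaller, unsigned), and BHI is read as "neither C nor Z is set" so that it is the exact opposite of BLS. All names are hypothetical.

```python
MASK32 = 0xFFFFFFFF


def compare_flags(a, b):
    """Flags after computing a - b."""
    a &= MASK32
    b &= MASK32
    result = (a - b) & MASK32
    sign_a, sign_b, sign_r = a >> 31, b >> 31, result >> 31
    return {
        "Z": result == 0,
        "N": sign_r == 1,
        "C": a < b,  # assumption: borrow convention for subtraction
        "V": sign_a != sign_b and sign_r != sign_a,  # signed overflow
    }


CONDITIONS = {
    "BEQ": lambda f: f["Z"],
    "BNE": lambda f: not f["Z"],
    "BLT": lambda f: f["N"] != f["V"],              # signed less than
    "BGE": lambda f: f["N"] == f["V"],              # signed greater or equal
    "BHI": lambda f: not f["C"] and not f["Z"],     # unsigned higher
    "BLS": lambda f: f["C"] or f["Z"],              # unsigned lower or same
}


def next_pc(pc, mnemonic, offset, flags):
    """Taken branch: PC + offset. Otherwise fall through to PC + 4."""
    return pc + offset if CONDITIONS[mnemonic](flags) else pc + 4


flags = compare_flags(0xFFFFFFFF, 1)   # -1 vs 1 signed, 4294967295 vs 1 unsigned
print(CONDITIONS["BLT"](flags), CONDITIONS["BHI"](flags))  # True True
```

The same bit pattern is "less than" for BLT and "higher" for BHI, which is the whole reason for having both signed and unsigned branches.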
System & Stack
Yes, you saw that right: PUSH and POP! I love those guys!
| Instruction | Format | Operands | Action | Opcode | Function Code |
|---|---|---|---|---|---|
| PUSH | I | Rs1 | R2 = R2 - 4, then store Rs1 to the memory address in R2 | 0x40 | |
| POP | I | Rd | Load from the memory address in R2 into Rd, then R2 = R2 + 4 | 0x41 | |
| MFSR | I | Rd | Copy SR to Rd, pad to 32 bits | 0x42 | |
| MTSR | I | Rs1 | Copy bottom 4 bits of Rs1 to SR | 0x43 | |
| HALT | I | | Suspends the CPU clock (future use) | 0x44 | |
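Here's a toy Python model of PUSH and POP with R2 as the stack pointer. The 64-byte memory and the starting stack address are assumptions for the demo. In this sketch POP loads first and then adds 4, so it reads back exactly the slot the matching PUSH wrote.

```python
memory = bytearray(64)   # toy RAM
regs = [0] * 32
SP = 2                   # R2 is used as the stack pointer
regs[SP] = 64            # assumption: the stack starts at the top of toy RAM


def push(rs1):
    """R2 = R2 - 4, then store regs[rs1] at the address in R2."""
    regs[SP] = (regs[SP] - 4) & 0xFFFFFFFF
    memory[regs[SP]:regs[SP] + 4] = regs[rs1].to_bytes(4, "little")


def pop(rd):
    """Load the word at the address in R2 into regs[rd], then R2 = R2 + 4."""
    value = int.from_bytes(memory[regs[SP]:regs[SP] + 4], "little")
    regs[SP] = (regs[SP] + 4) & 0xFFFFFFFF
    if rd != 0:          # r0 stays hardwired to 0
        regs[rd] = value


regs[5], regs[6] = 0xAABBCCDD, 0x11223344
push(5)
push(6)
pop(7)   # last in, first out: gets the value pushed from r6
pop(8)
print(hex(regs[7]), hex(regs[8]), regs[SP])  # 0x11223344 0xaabbccdd 64
```

After two pushes and two pops the stack pointer is back where it started, which is a quick way to check that PUSH and POP mirror each other.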
Custom Misc. Instructions!
These are instructions I feel are important, but that aren't part of the base RISC-V instruction set.
| Instruction | Format | Operands | Action | Opcode | Function Code |
|---|---|---|---|---|---|
| MAC | R | Rd, Rs1, Rs2 | Rd = Rd + (Rs1 * Rs2) | 0x01 | 0x006 |
| SLICE | R | Rd, Rs1, Rs2 | Extract bits based on start/length packed into Rs2, shift to bottom of Rd | 0x01 | 0x00D |
| BSWAP | R | Rd, Rs1 | Reverse byte order of Rs1 | 0x01 | 0x00E |
| CLZ | R | Rd, Rs1 | Count leading zeros in Rs1, stores count in Rd | 0x01 | 0x00F |
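Here's a rough Python reference for what these four instructions compute. The table doesn't say how start and length are packed into Rs2 for SLICE, so the layout below (start in bits [4:0], length in bits [9:5]) is purely hypothetical, and so are the function names.

```python
MASK32 = 0xFFFFFFFF


def mac(rd, rs1, rs2):
    """Multiply-accumulate: Rd = Rd + (Rs1 * Rs2), kept to 32 bits."""
    return (rd + rs1 * rs2) & MASK32


def bswap(value):
    """Reverse the byte order of a 32-bit value."""
    return int.from_bytes((value & MASK32).to_bytes(4, "little"), "big")


def clz(value):
    """Count leading zeros in a 32-bit value (32 when the value is 0)."""
    return 32 - (value & MASK32).bit_length()


def slice_bits(rs1, rs2):
    """Extract `length` bits of rs1 starting at bit `start`."""
    start = rs2 & 0x1F            # hypothetical: bits [4:0] of Rs2
    length = (rs2 >> 5) & 0x1F    # hypothetical: bits [9:5] of Rs2
    return ((rs1 & MASK32) >> start) & ((1 << length) - 1)


print(mac(10, 3, 4))                         # 22
print(hex(bswap(0x11223344)))                # 0x44332211
print(clz(1), clz(0x80000000), clz(0))       # 31 0 32
print(hex(slice_bits(0xABCD, 4 | (8 << 5)))) # 0xbc
```

BSWAP is handy on a little-endian machine for anything that arrives in big-endian byte order, and CLZ gives a cheap way to find the highest set bit.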
Pseudo-Instructions
These instructions are meant to only be a 'thing' in the assembler, which translates each of them to one of the instructions above.
| Pseudo-Instruction | Assembled Instruction | Why? |
|---|---|---|
| SUBI Rd, Rs1, #imm | ADDI Rd, Rs1, #-imm | Flips the immediate to negative and uses the hardware ADD circuit. |
| NOP | ADD R0, R0, R0 | Burns a clock cycle |
| CLEAR Rd | ADD Rd, R0, R0 | Wipes the register to zero |
| MOV Rd, Rs1 | ADD Rd, Rs1, R0 | Copies a register from Rs1 to Rd |
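Here's a minimal Python sketch of how an assembler pass could rewrite these pseudo-instructions as text before encoding. The `expand` function and the exact source syntax it accepts (comma-separated operands, `#` before immediates) are assumptions, not a description of a real SAM32 assembler.

```python
def expand(line):
    """Rewrite one pseudo-instruction as a real one; pass others through."""
    mnemonic, _, rest = line.strip().partition(" ")
    args = [arg.strip() for arg in rest.split(",")] if rest else []
    op = mnemonic.upper()
    if op == "NOP":
        return "ADD R0, R0, R0"
    if op == "CLEAR":
        return f"ADD {args[0]}, R0, R0"
    if op == "MOV":
        return f"ADD {args[0]}, {args[1]}, R0"
    if op == "SUBI":
        imm = int(args[2].lstrip("#"), 0)   # base 0 accepts decimal and 0x hex
        return f"ADDI {args[0]}, {args[1]}, #{-imm}"
    return line.strip()                     # already a real instruction


for source in ["NOP", "CLEAR R4", "MOV R3, R4", "SUBI R1, R1, #5"]:
    print(f"{source:18} -> {expand(source)}")
```

Doing the rewrite on text keeps the encoder simple, since it only ever sees the real instructions from the tables above.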