Tuesday, 22 December 2020

Lunchtime Coder #4-4 - Arcade Space Invaders Emulator - The 8080 Processor Part 2

I'm not going to write a full tutorial on how to write an 8080 processor emulator, as enough of these exist already but if you do want to write a Space Invaders emulator, you will need to know what all of these terms mean:


Program Counter (PC) - Memory pointer. Points to the memory location that code is executing from. Can be a variable that starts at 0.
Read the instruction at zero. Carry out the instruction. Increment PC. Repeat.

Opcode - Instruction for the processor.
Read value at memory_array(PC). Decode the value into an Opcode. Write the equivalent instruction in your programming language. Increment PC. Repeat

Operand - Byte or bytes of data that go with the Opcode. Some opcodes are a single byte instruction. Some come with a byte of data and others come with two bytes of data.
Read value at memory_array(PC). Decode the value into an Opcode. If required Read value at memory_array(PC+1) and/or memory_array(PC+2). Write the equivalent instruction in your programming language. Increment PC (By 1, 2 or 3 depending on number of operands). Repeat

OpBytes - The length of the Opcode including the operand/operands in bytes.
Read value at memory_array(PC). Decode the value into an Opcode. If required Read value at memory_array(PC+1) and/or memory_array(PC+2). Write the equivalent instruction in your programming language. Increment PC Opbytes. Repeat

Registers (A, B, C, D, E, H, L) - Registers are the processors workspace. The 8080 processor has 7 registers. Some can be used in pairs for 16 bit address values. Register A or the accu,ulator is a special register that has additional functionality over the others. These can all be variables in an emulator.

Register Pairs (BC, DE, HL, AF) - As stated above, some of the registers work in pairs for when 16 bit memory values are required.

Conditional Flags - When certain opcodes are executed, its important to know whether the result is 0 ot not, Odd or even, Plus or minus, greater than 255 or less than 0. In these cases, certain checks are done on the result and "Condition" flags are set or unset. Its important to know that if an Opcode requires conditional flags to be set, then they must be set at 1 if condition is met, or 0 if condition is not met, not just set to 1 if condition is met.
There is another conditional flag called Auxilliary Carry. This flag is used in converting hexidecimal values into binary values and is associated with Opcode DAA. Space Invaders doesn't use the Auxilliary Carry flag, but it does use DAA to convert the number of credits into a decimal number.

The Stack Pointer (SP) - The Stack Pointer is initialised by the program on the roms and is an area in the game RAM where the processor can store values that it will need back later e.g. if the program goes jumps to a subroutine, it will need to return back to the address that it has jumped from later, so it stores PC in the stack before it jumps to the sub routine address.

PSW - This is a 16 bit "Register Pair" value made up of the contents of register A, and a byte thats made up of the conditional flag bits and some fixed value bits. This exists so that if the conditional flags and value of register A need storing while a sub routine runs, PSW can be put on the stack.

Opcode Timing - Each Opcode execution takes a number of processor clock cycles to execute. As the space invaders 8080 processor is running at approx. 2 MHz or 2 million clocks per second, its important to count how many clock cycles you have used executing opcodes so that you can time screen updates and interrupts.

16 Bit Shift Register (Not part of the 8080 but required for Space Invaders emulator) - As the processor cannot shift bit around as quick as Space Invaders requires, and external 16 bit shift register is used in conjunction with the processor using Opcodes IN and OUT. This will also need emulating.

Interrupts - Interrupts stop the current program running and execute a different piece of code. In reality the interrupt pin on the processor will be clocked, and if the processor has Enabled Interrupts, then it will stop what its doing and accept an instruction from an external source. For emulation, we can count the clocks used by opcodes, and every time the clocks reach a certain value, carry out an interrupt. Space invaders has two interrupts that are Opcodes RST 1 and RST 2. 2000000 Clocks per second. 60 FPS. 33333 clocks per frame. Two interrupts per frame update (First interrupt a RST 1 and second a RST 2). One interrupt per 16666 clock cycles (If the processor has enabled Interrupts, Opcode EI). Update the screen every two interrupts.

Dip Switches - The Space Invaders board has some Dip Switches on it that set how many lives you have and at what score a bonus life is earned, amongst other things. These are read by the processor using Opcode IN.



In addition to the information above, I used the following resources:

1. List of 8080 Opcodes and what they do.
2. A Table of 8080 Opcodes and associated Opbytes and timings.
3. The Intel 8080 Programming manula. Detailed description of each Opcode.
4. An online javascript emulator that executes an opcode at a time. Very useful for debugging.
5. An explanation of shift register, and how IN and OUT opcodes work.

Top Tips

1. Do not implement any opcode until you have tested it. Write a test program and test each one to ensure that the output you get matches the 8080 manual.

2. Build in a debug log that shows PC / Opcode / SP / Registers / Flags etx. Your emulator output can then be checked against the online JS emulator in the event of a bug.

3. Build in error checks.
a. If the Opcode is one that writes to memory, ensure that its the RAM area, not the ROM area its writing to. This seems to be the most common bug encountered by people. If you don't monitor where the processor is writing, it could be in the wrong location writing over the Space Invaders code.
b. Monitor the size of the stack. If the stack depth gets more that 64 bytes, I would start to get concerned. I had a bug where the Stack Pointer decremented through the whole of the RAM and then into the ROM as the code executed.
c. Check that PC hasn't gone higher than the array allows. Theoretically I don't think PC should ever get higher than the ROM area, but I'm not certain on this.


Efficient Code

There is no commands in Blitz3D to deal with values as binaries. Instead you can convert the decimal value into a binary string, and then manipulate it using BIN$ LEFT$ RIGHT$ MID$ etc. I initially used this method a lot.

When the emulator was running at low speeds, and with some knowledge of what processors could do quickly, I started to suspect that all of the Opcodes with these commands in were slowing the program down, and I was right.

I did some speed tests and found out that a routine that used simple arithmetic and if/then, for/next etc ran around 180 times faster.

I also did some speed tests on other commands to see what was faster. It turns out that a single condition SELECT TRUE is faster that an IF/THEN, But an IF/AND/THEN is faster that a two condition SELECT TRUE. If there are two conditions then the fastest way is a single condition SELECT TRU followed by an IF/THEN. Applying this sped up the emulator 3 fold.

Keep rewriting your Opcodes differently. Try three or four different methods and then run each one in a test program 10000 times and measure how long each one takes, then select the fastest.



Bugs That I Experienced

1. I had one of the Opcodes set to 7 Opbytes instead of 7 clock cycles. This meant that when the Opcode was used and PC incremented, a couple of Opcodes were skipped which lead to all manner of different red herring indications.

2. I had an erroneous credit appear in the middle of the attract mode. Attributed to the above bug.

3. When I implemeted CALL opcode, I put the PC of the CALL opcode onto the stack, rather than incrementing PC to the next Opcode and then putting it on the stack. This meant that the PC jumped to the subroutine, but then on the RTN opcode, wnet back to the CALL opcode and got stuck in an infinite loop.

4. Think 8 bit. Opcode ADD A,B. Add the value in register B to Register A. If the result is higher than 8 bits (higher than 255 decimal) then you need to adjust the value so its a single byte, as in 254, 255, 0, 1, 2, 3 etc. A=A+B. If A>255 then A=A-256.

5. Numerous bugs where values were being written into the ROM area. Any opcode that involves a write to memory, put in a check to ensure that the memory address is above the ROM area.

6. Infinite loop that caused SP to decrement down throguh all of the RAM and into the ROM. Debug log was very useful!



Where Do I Start?

1. Probably the best way to start is by writing a disassembler. Disassemble the code, and the check it against one of the numerous disassemblies on the internet.

2. Implement the processor emulator first without Interrupts or shift registers. Let the emulator tell you what Opcode it wants.


opcode=mem(PC) select true default text 10,10,"Opcode: "+right$(hex$(opcode),2)+" has not yet been implemented." end select


After the first 50 Opcodes that the emulator expects have been implemented, you will find that when you run the emulator, after around 50,000 opcodes it goes into an infinite loop at memory locations 0ada, 0add, 0ade. If you get this far, your emulator is running correctly qnd you are ready to add the interrupts.

Monday, 30 November 2020

Lunchtime Coder #4-3 - Arcade Space Invaders Emulator - The 8080 Processor Part 1

Before I go diving into the processor, I am slightly concerned about whether I can achieve the speed required for full speed Space Invaders emulation, so time for a bit of fact finding and then a programming language capability test.

The 8080 processor is clocked at approx. 2 MHz on the Space Invaders arcade board. Each instruction that the processor executes takes a set number of clock cycles to execute. This leads to a lot of different claims about how fast the processor actually runs in terms of MIPS (Million Instructions Per Second).

The normal way to rate the speed of a processor is to take an average number of cycles per instruction and then divide the clock speed by this number, which is why so many sites claim that the 8080 clocked at 2 MHz runs at 0.4 MIPS or 400,000 instructions per second.

I found this somewhat difficult to believe. It seemed somewhat of an overestimate to me. It was based on an instruction taking 5 clock cycles to execute when a lot of them take 7, 11 or even 17 cycles.

After some searching, I found a sensible estimate by Intel who make an educated guess at approximately 0.08 MIPS or 80,000 instructions per second. This sounds a bit on the slow side so the real number is probably somewhere wetween the two figures stated.

The Space Invaders display runs at a refresh rate of 60 Hz, and so with between 80000 and 400000 instructions per second, and a refresh rate of 60 Hz, this means that the processor emulator would need to execute around 1333 instructions between each display update (ish).

There is also over 51000 pixels bits to translate from the memory buffer to the screen, and just using "If value=0 plot black dot / If value=1 plot white dot" would mean over 102,000 "If / Then" checks. That's time consuming from a processor point of view, so simplifying this will save a lot of processing time.

For a start, lets count how many white pixels there actually is on a typical fully populated Space Invaders display.

DEV 1 - White pixel counter. A program that counts the colour of each pixel on a correct resolution, fully loaded Space Invaders screen snapshot (courtesy of MAME).

It turns out that even with a full complement of invaders / lives / shields etc, around 90 % of the screen is black, and therefore by not drawing the black parts of the screen, a lot of time can be saved.

So by only drawing the white pixels onto a background that's already black, we have reduced the If/Then checks by 50 % to 51000 ish, but we are reading the bits in chunks as bytes from the array and then converting the decimal byte into binary bits, and then drawing the bits where required.

If we check whether the byte value is 0 in the array before we split it into bits (and according to DEV 1, 90 % of them are) then we could cut 51000 If/Then checks down to about 5000 on a full screen, and even less as the invaders get wiped out.

Now we have enough information to run a speed test.

DEV 2 - A dummy emulator program / speed test. The program will run at 60 FPS and will execute 1350 simple instructions between each frame update. The program will also run a simulation of reading 5000 bits of the screen buffer to the back buffer ready to display 60 times a second. FPS must remain at 60 and at least 1333 instructions must execute between each frame.

It turns out that Blitz Basic will run a simulated 8080 emulator at 0.4 MIPs or 0.08 MIPS no problem (Thats a suprise to me as its such an old programming language!).

With a high confidence level, I am now ready to dive in. Now for the real challenge... The processor emulator!

Lunchtime Coder #4-2 - Arcade Space Invaders Emulator - Memory Buffer / Memory Map

Before I can start emulating, I need to make a ROM / RAM area or memory buffer to store the ROMs and provide the system with RAM.

And before I can do that, I need to know the Space Invaders memory map.

Space Invaders has four x 2 kB roms:

invaders.e
invaders.f
invaders.g
invaders.h

This rom set is widely available on the internet. I googled for websites supplying MAME 0.36 romsets. If you end up with a romset that has four x 2 kB roms of a different naming convention, you may be able to work out which rom corresponds to the naming convention above, or just find an older set.

Space invaders then has 8 kB of RAM; 1 kB of general purpose RAM and 7 kB of video RAM. It then also has a RAM mirror of a further 8 kB.

The processor addresses these ROMs and RAM as a single 16 kB entity so the memory map in Hex goes:

invaders.h 2 kB 0000 to 07ff
invaders.g 2 kB 0800 to 0fff
invaders.f 2 kB 1000 to 17ff
invaders.e 2 kB 1800 to 1fff
RAM 1 kB 2000 to 23ff
Video RAM 7 kB 2400 to 3fff

Note that already we can start to gather info for the processor emulator. 16 kB cannot be addressed in 8 bits / 1 byte. The 8080 processor is an 8 bit processor, but must use 16 bit memory addressing, or memory addressing spread across two bytes.

Also note that Space Invaders screen resolution is 256 x 224 in black and white, so one bit per pixel. 256 x 224 is 57344 bits or 7168 bytes or 7 kB. This makes for a nice easy bit streaming screen interpreter hopefully.

Two additional important notes to add here:

Note 1. Some Space Invaders Emulation sites state that there is a "RAM Mirror" above memory address 3fff which seems to be frequently misunderstood as to what it is. A RAM mirror isn't an extra area of RAM. It simply means that because of how the memory pins on the processor work, memory addresses above 3fff may be requested by the processor. If the requested memory address is above the stated memory limit of 3fff, then the real memory address is (Address-3fff). I am doubtful whether this situation ever comes up though. Several people have stated that their emulators don't request addresses above 3fff. Several others have stated that an emulator won't work unless you allow for the RAM mirror. I havent seen any need to take this into account in my emulator. Just keep it in mind.

Note 2. Building in error checks as you write the emulator is always a good idea. A good error check is that whenever the processor writes to memory, check to see that the address that its writing to is in the RAM area and not the ROM area. Another good check is to ensure that the program counter doesn't go up beyond the program area. If either of these checks fail, then its a good indicator that there is errors in your code.

Now that I have the memory map details, I can start making the memory buffer.

I have two choices here; Array or Databank. Both should be pretty straight forward to implement. Set up an array or databank with 16384 slots and allow one byte in each slot. The only real question is, which would be faster?

DEV 1 - Program to test the speed of writing to and reading from an array vs writing to and reading from a databank.

This was a nice easy dev program that checks the time in ms, writes a fixed value to each of the 16384 slots in the array, and then reads each of them back out. On completion check the time is ms again and see how long its taken.

Repeat the process for the databank and then compare which one was faster.

I let this run for several thousand cycles capturing the average time taken for each one, and was slightly disappointed to find that both methods take pretty much exactly the same amount of time.. (rolls eyes) and so after literally three seconds of consideration, I went with the array option as the coding to read and write from array slots is shorter.

I have a feeling that this will by far be the easiest part of this emulator.

I have made a single dimension array with 16384 spaces. I have loaded up the Space Invaders roms into first part of the array in the correct order, and left the rest of the array empty... And that's about it!

One last important thing to know. What memory location do you start running the Space Invaders game from?

While this isn't always the case with 8080 code, conveniently Space Invaders starts running from memory location 0000.