Saturday, August 20, 2016

Virtual Machines and CPU emulation - performance.

My work on the MOS 6502 emulator continues.

After adding code to measure emulation speed, I realized my emulator was slow. Really slow. It barely exceeded the speed of a 1 MHz MOS 6502 with only the CPU emulated, and went down to ~30 % when the character I/O and raster graphics devices were enabled. This is not acceptable on today's fast quad-core GHz processors with tons of RAM.

Something was very wrong.

After a few frustrating hours and some minor optimizations that didn't yield satisfying results, I finally found the major bottleneck: a piece of code implementing a debugging facility I had almost forgotten about - the op-code execute history, which can be displayed in the Debug Console and is clearly not needed during normal emulator use. Not only was it implemented without any consideration for speed, but it also needlessly disassembled instructions on every op-code execution to keep the execute history in symbolic form.
During each 6502 op-code execute cycle, the previously executed instruction and its argument were disassembled to symbolic form, combined with the CPU register status, and added to the history queue as a string.
Totally unnecessary and expensive time-wise.
First of all, disassembling to symbolic form is only needed when displaying the history to the user, so it is sufficient to keep the history data in a raw format. Huge performance boost. Second, the history is only needed during debugging, so I disabled the feature by default and added an option in the Debug Console to enable it when necessary.
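
To illustrate the idea, here is a minimal sketch in Python. It is not the emulator's actual code: the class name `ExecHistory`, its methods, the record layout and the tiny `MNEMONICS` table are all made up for this example. The point is only that the hot path stores plain integers and the disassembly happens when the history is displayed.

```python
from collections import deque

# Hypothetical, deliberately tiny op-code table: opcode -> (mnemonic, operand format).
MNEMONICS = {
    0xA9: ("LDA", "#${:02X}"),   # LDA immediate
    0x8D: ("STA", "${:04X}"),    # STA absolute
    0xEA: ("NOP", ""),           # NOP implied
}

class ExecHistory:
    """Keeps raw execution records and formats them only when displayed."""

    def __init__(self, max_len=20, enabled=False):
        self.enabled = enabled                  # assumption: off unless switched on
        self._records = deque(maxlen=max_len)   # oldest entries fall off automatically

    def record(self, pc, opcode, operand, a, x, y, p, sp):
        # Hot path, called once per executed op-code: no string formatting here.
        if self.enabled:
            self._records.append((pc, opcode, operand, a, x, y, p, sp))

    def dump(self):
        # Cold path: disassemble only when the user asks to see the history.
        lines = []
        for pc, opcode, operand, a, x, y, p, sp in self._records:
            name, fmt = MNEMONICS.get(opcode, ("???", ""))
            text = (name + " " + fmt.format(operand)).strip()
            lines.append(
                f"${pc:04X}: {text:<12} "
                f"A=${a:02X} X=${x:02X} Y=${y:02X} P=${p:02X} S=${sp:02X}"
            )
        return lines
```

With the history disabled, `record()` costs a single flag check per op-code. With it enabled, the cost is one tuple append, and the string work is paid only inside `dump()`.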

After fixing the code I am now getting decent performance. Time for some stats.

The program measures emulation performance by counting the number of executed clock ticks (cycles), dividing it by the number of microseconds during which those cycles were executed, and multiplying the result by 100 to get a percentage of the speed of a 1 MHz CPU. The stats are displayed in the Debug Console. In other words, a 1 MHz CPU serves as the reference: 1,000,000 CPU cycles (clock ticks) per second is considered 100 % speed.
Performance is measured over the whole execution run and calculated at the end of the run. The captured speed is summed with the previous result and divided by 2 to produce the average emulation speed for a single session.
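
The calculation described above can be sketched as follows. This is illustrative Python, not the emulator's code. The names `speed_percent` and `SpeedAverager` are made up, and taking the first measurement as-is (since there is no previous result yet) is an assumption.

```python
def speed_percent(cycles, elapsed_us):
    """Emulation speed as a percentage of a 1 MHz CPU.

    A 1 MHz CPU executes one cycle per microsecond, so cycles / microseconds
    is the effective clock in MHz, and multiplying by 100 gives percent.
    """
    if elapsed_us <= 0:
        raise ValueError("elapsed time must be positive")
    return cycles * 100.0 / elapsed_us


class SpeedAverager:
    """Combines per-run speeds as (previous + new) / 2."""

    def __init__(self):
        self.average = None   # no measurement yet

    def add(self, percent):
        if self.average is None:
            # Assumption: the first measurement is taken as-is.
            self.average = percent
        else:
            self.average = (self.average + percent) / 2
        return self.average
```

For example, 6,460,000 cycles executed in 1,000,000 microseconds gives 646 %. Note that halving the sum on every run weights recent runs more heavily than older ones: feeding in 600, 400 and 400 yields 600, 500 and 450, whereas the plain arithmetic mean of the three would be about 467.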

Emulating pure 6502 machine code with all peripheral emulation disabled (memory mapped devices, character I/O etc.) and the time-critical debugging facility, the op-code execute history, also disabled yields performance of around 646 % on PC1 and 468 % on PC2.
(* see annotation at the bottom for the PC configurations used in the tests)
Enabling the op-code execute history causes the performance to drop by roughly a third. With all peripherals disabled and op-code history enabled we are down to 411 % on PC1 and 312 % on PC2.

Adding and enabling emulated memory mapped devices may cause the emulation speed to drop as well. However, with the currently implemented peripherals (character I/O, raster graphics device) enabled and actively used, and the op-code execute history enabled, the performance is still well above 300 % on both PC1 and PC2.
In the same case but with the op-code execute history disabled, performance exceeds 400 % on both PC configurations.

Currently the main emulation loop is not synchronized to an accurate clock tick or raster synchronization signal; it just runs as fast as it can (no real-time simulation). Therefore emulation speed may vary from PC to PC and with the current load on the system.
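
A free-running loop of this kind, measured over the whole run, might look like the sketch below. It is illustrative Python only: `run_free` and the `step` callable (assumed to execute one op-code and return the number of cycles it took) are hypothetical names, not part of the emulator.

```python
import time

def run_free(step, max_steps):
    """Run `step` back to back with no pacing; return (cycles, elapsed_us)."""
    cycles = 0
    start = time.perf_counter()
    for _ in range(max_steps):
        cycles += step()   # hypothetical: one op-code, returns its cycle count
    elapsed_us = (time.perf_counter() - start) * 1_000_000
    return cycles, elapsed_us
```

Nothing in the loop waits for a clock, so the elapsed time, and with it the computed speed, depends entirely on how fast the host gets through the loop at that moment.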

If this code is to be used as a template for implementing an emulator of a real-world machine, like the C-64 or Apple I, it may need some work to improve performance even further, but I think it leaves a pretty good margin for expansion as it is.
On a fast PC, an emulation speed above 600 % with basically nothing but the CPU emulated and the op-code execute history disabled is IMO decent, as long as we don't want to emulate a MOS 6502 machine with a clock much faster than 1 MHz.


Debug Console before running the 6502 functional test. Note that all I/O and the execute history are disabled.


The 6502 functional test program reached its final loop and has been interrupted.
Back in the Debug Console, the emulation speed stats are shown at the top.



Annotations to 'Performance considerations':
*)

PC1 stats:
Type:           Desktop
CPU:            2.49 GHz (64-bit Quad-core Q8300)
RAM:            4,060 MB
OS:             Windows 10 Pro (no SP) [6.2.9200]

PC2 stats:
Type:           Laptop
CPU:            2.3 GHz (64-bit Quad-core i5-6300HQ)
RAM:            15.9 GB
OS:             Windows 10 Home

Thank you for visiting my blog.

M.K.
8/21/2016
