Thursday, September 15, 2016

Virtual Machines and CPU Emulation (VM65), Linux, SDL2 and NCURSES.

One of the goals of my 6502 emulator project was that it would work the same on MS Windows and Linux. I neglected the Linux port for some time and recently faced the difficult task of making the program work again in the Linux environment.

So I added my graphics device emulator code and the SDL2 library without much trouble, but my emulated character I/O stopped working on Linux. After long hours spent investigating the problem, and after trying to no avail all the solutions I knew of for non-blocking character input in C/C++ on Linux (I needed kbhit()/getch() 'conio'-like functionality), I finally gave up, cut my losses and did a huge refactoring, switching all Linux code to the ncurses library.
Ultimately I think I will move character I/O completely to the graphics device, which should be much more portable and also more true to the hardware architecture of real 6502 based computers. Let's just say that the character I/O that I have in place right now emulates a serial port or teletype kind of character I/O. When I implement it on my graphics device, it will be more like console I/O. Anyway, this is how I see it; the future will show where the project takes me.

At the same time I have been expanding the capabilities of my virtual graphics device. The goal is to have a text mode. Well, not exactly a separate text mode, but rather the ability to render 8x8 pixel characters efficiently on the graphics device by simply pointing the device to the character table memory bank using one register and then writing the code of the character and its coordinates to other registers to have the character rendered on the screen. That would enable my console I/O emulation in the future, as mentioned in the previous paragraph.
The proof of concept works. I downloaded the Commodore 64 character ROM, converted it to my memory image definition format and compiled a modified version of EhBasic which has the top of BASIC RAM lowered by 4 kB so I can load the C64 character table right on top of BASIC RAM and before the EhBasic code starts.

To test the proof of concept code, start the emulator (vm65) and issue the following commands in the debug console:


EhBasic will load and prompt for memory size. Just press ENTER and then enter the following program:

10 C=0:M=0:N=22:B=65506:POKE B+9,0
15 POKE B+13,N:POKE B+17,0:POKE B+18,0
20 FOR Y=0 TO 24
30 FOR X=0 TO 39
40 POKE B+14,X:POKE B+15,Y
50 POKE B+16,C
60 C=C+1:IF C<256 THEN 120
70 IF N=22 THEN N=23:GOTO 100
80 N=22:IF M=0 THEN M=1:GOTO 100
90 M=0
100 POKE B+13,N:POKE B+18,M
110 Y=Y+1:X=-1:C=0
116 PRINT " MODE, CHAR BANK ";N*2048
120 GET K$:IF K$=" " THEN END
130 NEXT X
140 NEXT Y
150 GOTO 5

Run the above program to see both banks of C64 characters printed in the SDL2 window, first in normal mode, then in reversed color mode.
The expected result should (after a short while) look similar to this:

Later I did some performance optimizations, such as refreshing the surface only after the whole character is painted instead of painting/refreshing pixel by pixel, and copying the entire character table from the VM's RAM to an internal GraphDisp class buffer each time the virtual graphics device character ROM bank register is updated.
I added the method

void GraphDisp::CopyCharRom8x8(unsigned char *pchrom);

specifically for this purpose.
CopyCharRom8x8() must be called each time the address of the character ROM changes.
A code snippet from the MemMapDev class shows how it's done:

} else if ((unsigned int)addr == mGraphDispAddr + GRAPHDEVREG_CHRTBL) {
  // set new address of the character table, 2 kB bank #0-31
  mGrDevRegs.mGraphDispChrTbl = (unsigned char)(val & 0x003F);
  mCharTblAddr = mGrDevRegs.mGraphDispChrTbl * ((MAX_8BIT_ADDR+1) / 0x20);
  unsigned char char_rom[CHROM_8x8_SIZE];
  for (unsigned int i=0; i<CHROM_8x8_SIZE; i++) {
    char_rom[i] = mpMem->Peek8bitImg((unsigned short)((mCharTblAddr + i) & 0xFFFF));

This way character rendering is handled entirely inside the GraphDisp class by this public method:

/*
 * Method:    PrintChar8x8()
 * Purpose:   Print 8x8 character at specified row and column.
 * Arguments: code - character code
 *            col, row - character coordinates in 8 pixel intervals
 *            reversed - color mode (reversed or normal)
 * Returns:   n/a
 */
void GraphDisp::PrintChar8x8(int code, int col, int row, bool reversed);
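
To illustrate the idea, here is a minimal sketch of how one 8x8 glyph could be expanded into pixels. It is not the actual GraphDisp code. It assumes the C64 character ROM layout (8 consecutive bytes per character, one byte per pixel row, bit 7 being the leftmost pixel) and a flat one-byte-per-pixel frame buffer; the function name RenderGlyph8x8 and its parameters are hypothetical.

```cpp
#include <cstdint>
#include <vector>

// Hypothetical helper, not part of GraphDisp: expands one 8x8 glyph from a
// character table into a monochrome frame buffer (1 = foreground pixel).
// Assumes 8 bytes per character, one byte per row, bit 7 = leftmost pixel.
// The caller must make sure the glyph fits inside the frame buffer.
void RenderGlyph8x8(const std::vector<uint8_t>& chrom, int code,
                    int col, int row, bool reversed,
                    std::vector<uint8_t>& frame, int frameWidth)
{
    for (int y = 0; y < 8; y++) {
        uint8_t bits = chrom[code * 8 + y];
        if (reversed) {
            bits = static_cast<uint8_t>(~bits);   // swap foreground/background
        }
        for (int x = 0; x < 8; x++) {
            bool on = (bits & (0x80 >> x)) != 0;
            // col/row are given in 8 pixel intervals, as in PrintChar8x8()
            frame[(row * 8 + y) * frameWidth + col * 8 + x] = on ? 1 : 0;
        }
    }
}
```

Painting the whole glyph into a buffer first and refreshing the surface once afterwards is what makes this cheaper than a refresh per pixel.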

My GitHub repository is updated with the new code and documentation.
For anybody following this project, I recommend getting this latest version, as I recently found and fixed many problems and added the cool features described above :-).


Thank you for reading!

Marek K.

Saturday, August 20, 2016

Virtual Machines and CPU emulation - performance.

My work on MOS 6502 emulator continues.

After adding code that measures emulation speed, I realized my emulator was slow. Really slow. It barely exceeded the speed of a 1 MHz MOS 6502 with only the CPU emulated, and went down to ~30 % when the character I/O and raster graphics devices were enabled. This is not acceptable on today's fast quad-core GHz processors with tons of RAM etc.

Something was very wrong.

After a few frustrating hours and some minor optimizations that didn't return satisfying enough results I finally found the major bottleneck: a piece of code implementing a debugging facility I had almost forgotten about - the op-codes execute history, which can be displayed in the Debug Console and is clearly not needed during normal emulator use. Not only was it implemented without any consideration for speed, but it also unnecessarily disassembled the instructions during each op-code execution to keep the execute history in symbolic form.
During each 6502 op-code execute cycle, the previously executed instruction with its argument was disassembled to symbolic form, combined with the CPU register status and then added to the history queue as a string.
Totally unnecessary and expensive time-wise.
First of all, disassembling to symbolic form is only needed when displaying the history to the user, therefore it is sufficient to keep the history data in a raw format. Huge performance boost. Second of all, the history is only needed during debugging, so I disabled the feature by default and added an option in the Debug Console that allows enabling it if necessary.
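
The two fixes can be sketched roughly as follows. This is not the emulator's real code: the OpRecord and ExecHistory names, the field layout and the output format are all assumptions made for illustration. The point is that the hot path only copies a few raw bytes, while string formatting (and, in the real program, disassembly) happens only when the history is displayed.

```cpp
#include <cstddef>
#include <cstdint>
#include <cstdio>
#include <deque>
#include <string>

// Hypothetical raw history record: only data the CPU already has at hand.
struct OpRecord {
    uint16_t addr;     // address of the executed op-code
    uint8_t  opcode;   // raw op-code byte
    uint16_t arg;      // raw argument (0 if none)
    uint8_t  a, x, y, flags;
};

class ExecHistory {
public:
    explicit ExecHistory(std::size_t capacity) : mCapacity(capacity) {}

    // The history is off by default and is switched on only for debugging.
    void Enable(bool on) { mEnabled = on; }

    // Called for every executed op-code: cheap, no string formatting.
    void Add(const OpRecord& rec) {
        if (!mEnabled || mCapacity == 0) return;
        if (mQueue.size() == mCapacity) mQueue.pop_front();
        mQueue.push_back(rec);
    }

    std::size_t Size() const { return mQueue.size(); }

    // Called only when the user asks to see the history. A real version
    // would disassemble here; this sketch just prints the raw values.
    std::string Format(std::size_t i) const {
        const OpRecord& r = mQueue.at(i);
        char buf[64];
        std::snprintf(buf, sizeof(buf),
                      "$%04X: %02X %04X A=%02X X=%02X Y=%02X P=%02X",
                      static_cast<unsigned>(r.addr),
                      static_cast<unsigned>(r.opcode),
                      static_cast<unsigned>(r.arg),
                      static_cast<unsigned>(r.a),
                      static_cast<unsigned>(r.x),
                      static_cast<unsigned>(r.y),
                      static_cast<unsigned>(r.flags));
        return std::string(buf);
    }

private:
    std::size_t mCapacity;
    bool mEnabled = false;
    std::deque<OpRecord> mQueue;
};
```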

Now that I have fixed the code, I am getting decent performance. Time for some stats.

The program measures the emulation performance by counting the number of executed clock ticks (cycles), dividing it by the number of microseconds during which the counted cycles were executed, and multiplying the result by 100 to get a percentage of the speed of a model 1 MHz CPU. The stats are displayed in the Debug Console. In other words, a 1 MHz CPU is the reference: 1,000,000 CPU cycles (clock ticks) per second is considered 100 % speed.
Performance is measured during the whole execution cycle and calculated at the end of the run. The captured speed is summed with the previous result and divided by 2 to produce the average emulation speed during a single session.
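
The arithmetic described above can be written down as a short sketch. The function names are hypothetical and the handling of a zero time interval is an assumption; only the formulas come from the description.

```cpp
#include <cstdint>

// Speed as a percentage of a 1 MHz reference CPU:
// 1 cycle per microsecond corresponds to 100 %.
double SpeedPercent(uint64_t cycles, uint64_t microseconds)
{
    if (microseconds == 0) return 0.0;   // assumption: avoid division by zero
    return 100.0 * static_cast<double>(cycles)
                 / static_cast<double>(microseconds);
}

// Session figure as described: the new sample is summed with the previous
// result and the sum is divided by 2.
double UpdateAverage(double previous, double sample)
{
    return (previous + sample) / 2.0;
}
```

For example, 6,460,000 cycles executed in 1,000,000 microseconds give 646 %. Note that averaging this way weights the most recent run by one half, so it is a running figure rather than an arithmetic mean over all runs.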

Emulating pure 6502 machine code with all peripheral emulation disabled (memory mapped devices, character I/O etc.) and with the time-critical debugging facility, the op-codes execute history, also disabled returns performance in the range of 646 % on PC1 and 468 % on PC2.
(* see the annotation at the bottom for the PC configurations used in tests)
Enabling the op-code execute history causes the performance to drop by about 25 %. With all peripherals disabled and op-code history enabled we are down to 411 % on PC1 and 312 % on PC2.

Enabling and adding the emulated memory mapped devices to the pool may cause the emulation speed to drop as well. However, with the currently implemented peripherals (character I/O, graphics raster device) enabled and actively used and the op-codes execute history enabled, the performance is still well above 300 % on both PC1 and PC2.
In the same case but with the op-code execute history disabled, performance exceeds 400 % on both PC configurations.

Currently the main emulation loop is not synchronized to an accurate clock tick or raster synchronization signal but just runs as fast as it can (no real time simulation). Therefore emulation speed may vary per PC and per current load on the system.

If this code is to be used as a template to implement an emulator of a real-world machine, like the C-64 or Apple I, it may need some work to improve performance even further, but I think it leaves a pretty good margin for expansion as it is.
On a fast PC, emulation speed above 600 % with basically nothing but the CPU emulated and the op-codes execute history disabled is IMO decent if we don't want to emulate a MOS 6502 machine with a clock much faster than 1 MHz.

Debug Console before running 6502 functional test. Note all I/O and execute history are disabled.

6502 functional test program reached its final loop and has been interrupted.
Back in Debug Console at the top the emulation speed stats are shown.

Annotations to 'Performance considerations':

PC1 stats:
Type:           Desktop
CPU:            2.49 GHz (64-bit Quad-core Q8300)
RAM:            4,060 MB
OS:             Windows 10 Pro (no SP) [6.2.9200]

PC2 stats:
Type:           Laptop
CPU:            2.3 GHz (64-bit Quad-core i5-6300HQ)
RAM:            15.9 GB
OS:             Win 10 Home.

Thank you for visiting my blog.


Tuesday, August 9, 2016

Virtual Machines and CPU emulation - update.

The prior architecture of my emulator has not been easy to expand.
I decided to improve it a bit by adding a new abstraction layer.
To demonstrate it I also added basic graphics raster device to the system.

In microprocessor based systems, communication with peripheral devices is
in the majority of cases done via registers, which in turn are located at
specific memory addresses.
The programming API responsible for modeling this functionality is implemented
in the Memory and MemMapDev classes. The Memory class implements access to
specific memory locations and maintains the memory image.
The MemMapDev class implements specific device address spaces and handling
methods that are triggered when addresses of the device are accessed by the
CPU.
Programmers can expand the functionality of this emulator by adding the necessary
code emulating specific devices to the MemMapDev and Memory classes' implementation and header files.

In the current version, two basic devices are implemented:

- character I/O and 
- raster (pixel based) graphics display. 

The character I/O device uses 2 memory locations, one for non-blocking I/O
and one for blocking I/O. Writing to a location causes character output, while
reading from a location waits for character input (blocking mode) or reads the
character from the keyboard buffer if available (non-blocking mode).
The graphics display can be accessed by writing to multiple memory locations.

If we assume that GRDEVBASE is the base address of the Graphics Device, the following registers are available:

Offset   Register               Description
 0       GRAPHDEVREG_X_LO       Least significant part of pixel's X (column)
                                coordinate or begin of line coord. (0-255)
 1       GRAPHDEVREG_X_HI       Most significant part of pixel's X (column)
                                coordinate or begin of line coord. (0-1)
 2       GRAPHDEVREG_Y          Pixel's Y (row) coordinate (0-199)
 3       GRAPHDEVREG_PXCOL_R    Pixel's RGB color component - Red (0-255)
 4       GRAPHDEVREG_PXCOL_G    Pixel's RGB color component - Green (0-255)
 5       GRAPHDEVREG_PXCOL_B    Pixel's RGB color component - Blue (0-255)
 6       GRAPHDEVREG_BGCOL_R    Background RGB color component - Red (0-255)
 7       GRAPHDEVREG_BGCOL_G    Background RGB color component - Green (0-255)
 8       GRAPHDEVREG_BGCOL_B    Background RGB color component - Blue (0-255)
 9       GRAPHDEVREG_CMD        Command code
10       GRAPHDEVREG_X2_LO      Least significant part of end of line's X
11       GRAPHDEVREG_X2_HI      Most significant part of end of line's X
12       GRAPHDEVREG_Y2         End of line's Y (row) coordinate (0-199)

Writing values to the above memory locations when the Graphics Device is enabled
sets the corresponding parameters of the device, while writing to the
command register executes the corresponding command (performs an action) per the codes listed below:

Command code                    Command description
GRAPHDEVCMD_CLRSCR = 0          Clear screen
GRAPHDEVCMD_SETPXL = 1          Set the pixel location to pixel color
GRAPHDEVCMD_CLRPXL = 2          Clear the pixel location (set to bg color)
GRAPHDEVCMD_SETBGC = 3          Set the background color
GRAPHDEVCMD_SETFGC = 4          Set the foreground (pixel) color
GRAPHDEVCMD_DRAWLN = 5          Draw line
GRAPHDEVCMD_ERASLN = 6          Erase line

Reading from registers has no effect (returns 0).
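
As an illustration of the register protocol, the sketch below issues the sequence of writes that draws a line. The register offsets and the command code are taken from the tables above; the DrawLine helper, the Poke type and the way the base address is passed are hypothetical, and 'poke' stands for any function that writes a byte into the VM's memory.

```cpp
#include <cstdint>
#include <functional>

// Offsets and command code copied from the tables above (subset).
enum GraphDevReg : uint16_t {
    GRAPHDEVREG_X_LO  = 0,
    GRAPHDEVREG_X_HI  = 1,
    GRAPHDEVREG_Y     = 2,
    GRAPHDEVREG_CMD   = 9,
    GRAPHDEVREG_X2_LO = 10,
    GRAPHDEVREG_X2_HI = 11,
    GRAPHDEVREG_Y2    = 12
};
enum GraphDevCmd : uint8_t { GRAPHDEVCMD_DRAWLN = 5 };

// Hypothetical signature of a function writing one byte to VM memory.
using Poke = std::function<void(uint16_t, uint8_t)>;

// Hypothetical helper: sets both end points, then writes the command code.
void DrawLine(const Poke& poke, uint16_t base,
              int x1, int y1, int x2, int y2)
{
    auto put = [&](uint16_t offset, int value) {
        poke(static_cast<uint16_t>(base + offset),
             static_cast<uint8_t>(value & 0xFF));
    };
    put(GRAPHDEVREG_X_LO,  x1);        // low byte of start X
    put(GRAPHDEVREG_X_HI,  x1 >> 8);   // high byte of start X
    put(GRAPHDEVREG_Y,     y1);
    put(GRAPHDEVREG_X2_LO, x2);        // low byte of end X
    put(GRAPHDEVREG_X2_HI, x2 >> 8);   // high byte of end X
    put(GRAPHDEVREG_Y2,    y2);
    put(GRAPHDEVREG_CMD,   GRAPHDEVCMD_DRAWLN);  // this write performs the action
}
```

Seven memory writes for a single line show where the overhead of this interface comes from.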

The above method of interfacing with the GD requires no dedicated graphics memory space
in the VM's RAM. It is also simple to implement.
The downside - slow performance (multiple memory writes to select/unselect
a pixel or set a color).
I plan to add a graphics frame buffer in the VM's RAM address space in the future.

A simple demo program written in EhBasic that shows how to drive the graphics
screen is included. Please check the newest update in my GitHub repository.

I added a User Manual and Programmers Reference document to the project that describes in more detail the architecture of the emulator and how a programmer can expand upon it.

After you have built the code, start a DOS prompt in the program's directory and type:

vm65 ehbas_grdemo.snap

Re-focus on the DOS window and at the Debug Console prompt enter:

x ffc3

then press ENTER and then type:


and press ENTER.



Sunday, April 17, 2016

Virtual Machines and CPU emulation - update.

Since the last update the development of my MOS 6502 emulator slowed down a bit, but there were some major changes which I completed today. Therefore it is due for an update and I think it may be the last one for the DOS console based emulator.

The most important changes worth mentioning are:

- Cycle-accurate emulation.
- Support for Intel HEX format.
- Ability to save binary snapshot of the memory image with header capturing current state of the CPU and VM configuration.

Other less important changes include:

- Refactoring: huge switch/case in ExecOpcode method was replaced with array of pointers to methods. Each op-code now has its own method in the MKCpu class.
- Changes to the command line arguments.
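
The refactoring mentioned in the list above (a table of pointers to methods instead of one huge switch/case) can be sketched like this. TinyCpu is a hypothetical miniature class with only three real op-codes; it is not the MKCpu class and only demonstrates the dispatch mechanism.

```cpp
#include <array>
#include <cstdint>

// Hypothetical miniature CPU; only the dispatch mechanism is of interest.
class TinyCpu {
public:
    TinyCpu() {
        mOpTable.fill(&TinyCpu::OpIllegal);   // default for unassigned codes
        mOpTable[0xE8] = &TinyCpu::OpINX;     // INX
        mOpTable[0xCA] = &TinyCpu::OpDEX;     // DEX
        mOpTable[0xEA] = &TinyCpu::OpNOP;     // NOP
    }

    // One indexed call through the table replaces the switch/case.
    void ExecOpcode(uint8_t opcode) { (this->*mOpTable[opcode])(); }

    uint8_t x = 0;          // X register
    unsigned illegal = 0;   // count of unassigned op-codes seen

private:
    using OpMethod = void (TinyCpu::*)();

    void OpINX() { ++x; }
    void OpDEX() { --x; }
    void OpNOP() {}
    void OpIllegal() { ++illegal; }

    std::array<OpMethod, 256> mOpTable;
};
```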

Source code is available to download from github as usual.

Thanks for visiting my blog.


Monday, March 14, 2016

Virtual Machines and CPU emulation - update.

In the previous episode I presented version 2.0 of my MOS 6502 simulator.
I wasn't yet fully satisfied with the character I/O emulation performance, so I started looking into whether anything could be improved in this area.
First of all, my character I/O emulation was designed to always be performed on a virtual text display device emulator (class Display). This approach required the contents of the virtual text display memory to be redrawn on the console screen each time the content was updated. This caused performance issues and flicker when characters were put to the emulated screen at a high rate. My friend suggested that I should use native STDIO for rudimentary text I/O. I thought it was a very good idea, and an even better idea was to always keep a shadow copy in the virtual display device.
This way, when executing code in debug mode (either one of the step-by-step modes: S - step, N - go number of steps with or without registers animation - command 'F', or the continuous code execute modes: C - continue or G - go/cont. from new address), the contents of the emulated console can always be recalled to the screen (command 'T') when code is interrupted or the screen contents are messed up. In these modes native STDIO is not used to directly emulate character I/O - the display device is refreshed to the screen at the same rate as the contents of the text are updated - just like in the previous version.
That is OK for debug modes, they don't need to be fast.
However, for the continuous code execute mode (X - execute from new address), I wanted good performance and flicker-free character output. In this mode, I use native DOS/shell console STDIO to output text. The emulated text device still keeps a copy of everything (up to its capacity, of course) that went to the console. However, that copy is not drawn to the console, as the characters are already output via native STDIO.
This came along nicely: performance in the DOS console improved greatly and the flicker is gone. Surprisingly though, this native STDIO mode is much slower on Linux, perhaps because I connect to my Linux box over the network while I work on my DOS console directly. The display device emulation works faster in the Linux shell than in the DOS console though, and has no or almost no flicker.
Also, due to the default character output behavior on Linux (no buffer flush unless a new line is output), I had to add a flush of the 'cout' stream after each character output.
Well, I found the new version satisfactory and better than the previous one despite the disappointing performance problem on Linux, so I updated the code on GitHub. Let's call it version 2.1.
As for future plans, in addition to the goals I set in the previous blog update, I am also thinking of making this emulator cycle-accurate and switching to a GUI at some point.
But I am having a bit of a burnout with this project, so perhaps I will take a break from it, go to my other open projects and return with renewed enthusiasm later. In the meantime, enjoy this version.


Sunday, March 13, 2016

Virtual Machines and CPU emulation - update.

In my last update I was ecstatic about my MOS 6502 emulator running Tiny Basic. A nice achievement, but later, more rigorous testing showed my emulation still had issues. I ran a few test suites available on the internet and they all passed, but I was still unable to run Enhanced Basic in my emulator.
After I established that the problem was not in my copy of the EhBasic code itself, I turned to my code to look for errors. I found and corrected a few obvious issues, but still no luck.
Finally I turned back to internet search to find a better test suite.
I found a very good one - 6502 Functional Test by Klaus Dormann.
After I ran that program, I found all the issues with my 6502 op-codes emulation. Some of them were tricky and I am sure I would have had problems finding them on my own. Therefore 6502 Functional Test by Klaus Dormann deserves huge credit for getting my project going again.

I published version 2.0 on GitHub. There are major changes, many new features and bug fixes. I believe this is a complete or nearly complete DOS/shell version; I have run out of ideas for what else I can add to a version that runs in a text console. To move this project forward I have to move out of the shell window into the GUI world. However, I have a few ideas for improvements, which I will outline below.

To summarize all the changes since originally published version 1.0:

  • Bug fixes in 6502 op-codes emulation.
  • Improved UI.
  • Improved character I/O emulation.
  • ROM emulation.
  • History of executed op-codes.
  • Dis-assembler. 
  • Registers animation mode.
  • Signals handling.
  • Loading binary files.
  • Conversion tool bin2hex (6502 binary to plain text memory image definition format conversion tool).
  • Expanded memory image definition file format (new keywords: IOADDR, ROMBEGIN, ROMEND, ENIO, ENROM, EXEC).
  • Linux port.
  • EhBasic.
  • Microchess.

I have the following plans for the next build:

  • Reset option in debug console, which will initiate CPU reset sequence.
  • API methods in MKCpu class for triggering IRQ, NMI and RESET for future expansion.
  • Interface API to make connecting emulated peripherals easier.
  • Ability to load HEX format files.

With a nicely working step-by-step debugger, the ability to interrupt code at any time, a history of executed op-codes, a disassembler, and memory dump and modification facilities, I think this has become a nice and useful MOS 6502 code development and debugging tool. The program has detailed built-in usage help. The recommended DOS console/shell terminal size is 25 or more rows (more is highly recommended) by 80 columns. It is possible to work with a smaller console, but the menu and register information sections will not be nicely aligned or may be in total disarray and unreadable. The emulated text console display device will work fine with widths less than 80 as well as wider than 80 characters, something that the previous version couldn't do.
Instructions on how to build the code are included in the project's ReadMe file, as well as information about the main software features.

Below I present screenshots of working application.

Thanks for visiting.


Saturday, February 20, 2016

Virtual Machines and CPU emulation.

Simulation and modeling of the real world, emulation of real hardware and creating virtual worlds are IMO the coolest aspects of computing and programming. Speaking of hardware emulation - I always wanted to roll my own MOS 6502 emulator. Not that there is a lack of good emulators out there. Quite the contrary. However, being a MOS 6502 (and retro-computing in general) aficionado, I wanted to challenge myself just for the heck of it, to learn if I could pull it off on my own and how difficult it would be.
The project I am presenting here is the result of the above challenge.
I elected C++ as my hammer to nail it. It is in a very early stage of development, but at this point it is usable and presentable, lacking perhaps some supporting tools (assembler, conversion tools etc.).
It implements the full list of classic MOS 6502 op-codes (including BCD mode), but it is not a hardware-level emulator (I hope I got the emulation vs. simulation terminology right), meaning I do not emulate the processor cycle by cycle, just simulate the functionality of the MOS 6502 in such a way that the resulting state of the virtual processor registers and flags is the same as when the real processor executes the same op-codes. There is no emulation of address lines, data lines and signal changes in real time or even in an emulated time scale, just the virtual representation of the processor's internal state after each op-code.
The program at this point is a single-threaded DOS console executable, equipped with a step-by-step debugger which can also execute the memory image continuously or up to the next BRK op-code, and can also emulate rudimentary character I/O and a virtual 80x25 display.
I prepared an input file 'tinybasic.dat', which allows running the Tiny Basic interpreter in my VM6502 machine.

After downloading the source code and building it (I use Dev C++ 5.11), invoke the program at the DOS prompt like this:


You should see the screen of the debugger console:

then, in the debugger console issue commands:

i e000
x cf0

and then switch to upper case (press Caps Lock) and answer 'C' to the prompt from Tiny Basic for the mode of boot: (Warm/Cold).

Continue working in upper case mode while playing with TB; it does not accept lower case codes and they will cause errors.

Source code, open and free for non-commercial use:

To play with my simulator, you need to prepare a memory image file or enter the machine code by hand in the debug console/monitor (command: W address hexbyte hexbyte ... 100).

The format of the memory image is ASCII with the following protocol:
hexbyte hexbyte hexbyte ...

hexaddr - a 16-bit hexadecimal address in the range $0000..$FFFF, must start with '$' character.
hexbyte - an 8-bit hexadecimal value in the range $00..$FF, must start with '$' character.
All file contents are optional. If not provided, default values are assumed.

Note that you can also use decimal values: when they don't start with '$', they will be interpreted as decimal numbers.
The address in the line following the ADDR keyword is the run address of the program. It is optional, like all other tokens; the default is 0x0000 (I know, this is incorrect and I will change it to the reset vector in the next version) and it can be changed in the debug console with the command: A address.
The address in the line following the ORG keyword is the starting address where the following data will be put in the memory image. You can use ORG several times in the memory image file, so you don't have to put vast amounts of 0 codes when you need to skip large areas of memory to put the next meaningful data in it.

The memory image file is the argument to the program:

mkbasic memoryimagefile

It is not mandatory. If none is provided, the program will try to read the file 'dummy.ram' and will display a warning if that file does not exist either.
With no input provided, all memory is filled with zeros except the IRQ/BRK vector at $FFFE, which is initialized with $FFF0, and the RTI op-code, which is put at address $FFF0. I don't yet initialize the NMI and RST vectors; this will go into the next version.
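
In code, the default image described above amounts to something like the following sketch (the function name MakeDefaultImage is hypothetical). The 6502 stores vectors low byte first, and the RTI op-code is $40.

```cpp
#include <cstdint>
#include <vector>

// Builds the default image: all zeros, the IRQ/BRK vector at $FFFE/$FFFF
// pointing to $FFF0, and an RTI op-code ($40) placed at $FFF0.
std::vector<uint8_t> MakeDefaultImage()
{
    std::vector<uint8_t> ram(0x10000, 0);
    ram[0xFFFE] = 0xF0;   // low byte of $FFF0 (6502 is little-endian)
    ram[0xFFFF] = 0xFF;   // high byte of $FFF0
    ram[0xFFF0] = 0x40;   // RTI
    return ram;
}
```

With this setup a BRK (op-code $00, which is what zero-filled memory contains) jumps to $FFF0 and returns immediately through RTI.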
The program also tries to read the ROM image file 'dummy.rom'. This functionality is not used for now and I am not sure if I will continue it or rather invent a different scheme of memory mapping. I provide both 'dummy.rom' and 'dummy.ram' files with my program, so make sure you have them in the current directory if you want to avoid warning messages.
The ROM area is currently emulated in the RAM image by rendering the addresses $D000 to $DFFF a read-only area. You can load data into this area from memory image file during program startup, but you cannot modify these addresses during simulation. This currently cannot be customized, but I will work on the new memory mapping scheme that will allow some sort of memory configuration mimicking the real world computer systems and allowing user to customize the memory configuration.
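
The read-only behavior can be illustrated with a small sketch. RomAwareMemory and its methods are hypothetical names, and silently dropping a write to the protected range is an assumption about what "cannot modify" means here.

```cpp
#include <cstdint>
#include <vector>

// Hypothetical model of a RAM image with a fixed read-only window.
class RomAwareMemory {
public:
    RomAwareMemory() : mImage(0x10000, 0) {}

    // Used while loading the memory image at startup: no protection.
    void Load(uint16_t addr, uint8_t val) { mImage[addr] = val; }

    // Used by the simulated code: writes into $D000..$DFFF are dropped
    // (assumption: the write is ignored without any error).
    void Poke8bit(uint16_t addr, uint8_t val) {
        if (addr >= kRomBegin && addr <= kRomEnd) return;
        mImage[addr] = val;
    }

    uint8_t Peek8bit(uint16_t addr) const { return mImage[addr]; }

private:
    static constexpr uint16_t kRomBegin = 0xD000;
    static constexpr uint16_t kRomEnd   = 0xDFFF;
    std::vector<uint8_t> mImage;
};
```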

The rudimentary character input/output can be emulated.
This is turned off by default, but can be enabled in the debug console (command: I hexaddr). The default address is $E000 and you should use it when running the Tiny Basic image, as that image has this address hard-coded for character I/O.
The same address $E000 is used to emulate character input (when reading from memory) and output (when writing to memory). The emulator handles this internally by either prompting the user for a single character (non-blocking mode) or putting the character received from the simulated 6502 code into the internal buffer, which is then sent to the display device simulator.

Sorry, as I said before, for now I don't provide conversion tools to prepare the memory image input file, so when you generate machine code with your favorite 6502 assembler, you need to use 3rd party tools or your own scripts or programs to convert the resulting binary code to the format I described above.
I used Kowalski's emulator, which has an integrated assembler and can also save the code as a data file if the dis-assembler window is active. After that you need to process the resulting file in a programmer's editor or other word processor to remove the 'eXXXX:' labels and the '.DB' and '.END' directives and add:
at the beginning. This will result in memory image file compatible with my VM.

I will provide proper tools in the next release.

I hope you like it, and please provide feedback to help me improve it, although don't expect too much in the area of user friendliness. This project is intended for myself rather than for the world. My intention is to create an emulator of the 8-bit home-brew 6502 computer that I am building now (check my other blog), and also a base for a more elaborate Virtual Machine that will have op-codes beyond the MOS 6502, which I will use to build interpreters of some retro computer languages, like BASIC for example.

Thank you for visiting my blog.