Win32Emu / DIY WOW

This is a guest post by CaptainWillStarblazer

When the AXP64 build tools for Windows 2000 were discovered back in May 2023, there was a crucial problem. Not only was it difficult to test the compiled applications since you needed an exotic and rare DEC Alpha machine running a leaked version of Windows, it was also difficult to even compile the programs, since you needed the same DEC Alpha machine to run the compiler; there was no cross-compiler.

As a result, I began writing a program conceptually similar to WOW64 on Itanium (or WX86, or FX-32), only in reverse, to allow RISC Win32 programs to run on x86.

The PE/COFF file format is surprisingly simple once you get the hang of it, so loading a basic Win32 EXE that I assembled with NASM  was pretty simple – just map the appropriate sections to the appropriate areas, fix up import tables, and start executing.

To start, I wrote a basic 386 emulator core. To complement it, I wrote my own set of Windows NT system DLLs (USER32, KERNEL32, GDI32) that execute inside of the emulator and then use an interrupt to signal a system call  which is trapped by the emulator and thunked up to execute the API call on the host.

For example, up above, you can see that the emulated app calls MessageBoxA inside of the emulated USER32, which puts 0 in EAX (the API call number for MessageBoxA) and then does the syscall interrupt (int 0x80 in my case), which causes the emulator to grab the arguments off of the stack and call MessageBoxA.

To ease communication between the host’s Win32 environment and the emulated Win32 environment, I ran the emulated CPU inside of the host’s memory space, which means that to run applications written for a 32-bit version of Windows NT, you need a 32-bit version of win32emu (or a 64-bit version with /LARGEADDRESSAWARE:NO passed to the linker) to avoid pointer truncation issues, to prevent Windows from mapping memory addresses inaccessible by the emulated CPU.

To get “real” apps working, a lot of single-stepping through the CRT was required, but eventually I did get Reversi – one of the basic Win32 SDK samples – to work, albeit with some bugs at first. Calling a window procedure essentially requires a thunk in reverse, so I inserted a thunk window procedure on the host side that calls the emulated window procedure and returns the result.

It’s amazing, it’s reversi!

After this, I got to work on getting more complicated applications to work. Several failed due to lack of floating-point support, some failed due to unsupported DLLs, but I was able to get FreeCell and WinMine to work (with some bugs) after adding SHELL32. I was able to run the real SHELL32.DLL from Windows NT 3.51 under this environment.

Freecell
Minesweeper

One might wonder why I put all this work into running x86 programs on x86, but the reason is that there’s the most information about, and I’m most proficient with, Windows on the 386. Not only does Windows on other CPUs use other CPUs, but also there’s different calling conventions and a lot of other stuff I didn’t want to mess with at first. But this was at least a proof-of-concept to build a framework where I could swap the CPU core for an emulator for MIPS or PPC or Alpha or whatever I wanted and get stuff running.

Astute readers might be wondering why I didn’t take the approach taken by WOW64. For those who don’t know, most system DLLs on WOW64 are the same as those in 32-bit Windows, the only ones that are different are ones with system call stubs that call down to the kernel (NTDLL, GDI32, and USER32, the first of which calls to NTOSKRNL and the latter two calling to WIN32K.SYS). WOW64 instead calls a function with a system call dispatch number, which does essentially the same thing. The reason for this is that the system call numbers are undocumented and change between versions of Windows. WOW64, being an integrated component of Windows, can stay up to date. If I took this approach, I’d either have to stay locked to one emulated set of DLLs (i.e. from NT 4.0) and use their system call numbers on the emulated side, or write my own emulated DLLs and stick to a fixed set of numbers, but either way I’d somehow have to map them to whatever syscall numbers are being used on the host.

As I went on, I should probably also mention that what I said earlier about loading Win32 apps being easy was wrong. Loading a PE image is pretty straightforward, but once you get into populating the TEB and PEB (many of whose fields are undocumented), it quickly gets gnarly, and my PEB emulation is incomplete.

Adding MIPS support wasn’t too much of a hassle, since the MIPS ISA (ignoring delay slots, which gave me no shortage of trouble) is pretty clean and writing  an emulator wasn’t difficult. The VirtuallyFun Discord pointed me to Embedded Visual C++ 4.0, which was invaluable to me during development, since it included a MIPS assembler and disassembler, which I haven’t seen elsewhere. After writing a set of MIPS thunk DLLs and doing some more debugging, I finally got Reversi working.

There’s still some DLL relocation/rebasing issues, but Reversi is finally working in this homebrewed WOW!

I’d encourage someone to write a CPU module for the DEC Alpha AXP (or even PowerPC if anyone for some reason wants that). The API isn’t too complicated, and the i386 emulator is available for reference to see how the CPU emulator interfaces with the Win32 thunking side. An Alpha backend for the thunk compiler can definitely be written without too much trouble. Obviously, the AXP presents the challenge that fewer people are familiar with its instruction set than MIPS or 386, but this approach does free one from having to emulate all of the intricate hardware connections in actual Alpha applications while still running applications designed for it, and I’ve heard the Alpha is actually quite nice and clean. MAME’s Digital Alpha core could be a good place to start, but it’ll need some adaptation to work in this codebase. Remember that while being a 64-bit CPU with 64-bit registers and operations, the Alpha still runs Windows with 32-bit pointers, so it should run in a 32-bit address space (i.e. pass /LARGEADDRESSAWARE:NO to the linker).

Theoretically, recompiling the application to support the full address space should enable emulation of AXP64 applications, since the Alpha’s 64-bit pointers will allow it to address the host’s 64-bit address space, but I’m not sure if my emulator is totally 64-bit clean, or if the AXP64’s calling convention is materially different from that on the AXP32 in such a way that would require substantial changes. In either case, most of the code should still be transferable.

I also want to get more “useful” applications running, like development tools (i.e. the MSVC command line utilities – CL, MAKE, LINK, etc.) and CMD. Most of that probably involves implementing more thunks and potentially fixing CPU bugs.

This project is obviously still in a quite early stage, but I’m hoping to see it grow and become something useful for those in the hobby.

For those who want to play along at home, you can download the binary snapshot here: w32emu.zip

A more complete version of the writeup is available here: https://bhty.github.io/og/win32emu_VirtuallyFun_Post.htm and you can find the project here https://github.com/BHTY/Win32Emu/.

2 thoughts on “Win32Emu / DIY WOW

  1. Cool project! I do have some experience with Alpha, including code generation in a JIT compiler, but I am not sure if I will be able to set aside enough time for this. Maybe a few years down the road… 😉

Leave a Reply

Your email address will not be published. Required fields are marked *

This site uses Akismet to reduce spam. Learn how your comment data is processed.