I've always been intrigued by virtual machines and emulation as well. I've always wanted to try and make an emulator of some kind. I don't know much about the internals of VirtualBox, but my suggestion would be to start "easy" with one CPU/Computer System/Game Console and go from there. That's what I finally did with the 6502 and Commodore 64.
Conventionally, one starts from the CHIP-8, which is indeed a virtual machine rather than a system in a strict sense.
What I've found difficult is the step beyond that. The NES and GameBoy are the typical next steps, but I've been very frustrated by the confusing GameBoy documentation. There are three or four references, but one of them has significant mistakes and another is incomplete. The Pan Docs, on the other hand, should be complete and accurate.
I'm not sure there is an easy middle ground that is also well documented.
The Atari 2600 is architecturally simpler but less well documented, and it also requires very accurate timing. I've read people suggesting systems like the Channel F, Astrocade and Odyssey2, but I'm not sure they're well documented.
I personally lost interest once I found that building an emulator was essentially fighting specifications rather than actually building something.
I built about a third of a NES emulator. The nesdev wiki is mostly decent, although there's a fair number of things where it seems like the first people to figure things out got stuff kind of backwards, and if you flip it, it's a lot easier. That's the sort of fighting the specifications I think you're talking about.
All that said, emulating the CPU was pretty fun. There's a CPU test ROM out there you can run with tracing and compare to the published results. I also got the background tiling from the PPU done, but the foreground processing has a lot of steps, so I've paused indefinitely.
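For anyone who wants to try the trace comparison, here is a minimal sketch of the idea. The trace line format and the names `format_trace` and `first_divergence` are made up for illustration; a real reference log will have its own layout that you would need to match.

```python
from itertools import zip_longest


def format_trace(pc, a, x, y, p, sp):
    """Format one CPU state as a trace line (hypothetical layout)."""
    return f"{pc:04X} A:{a:02X} X:{x:02X} Y:{y:02X} P:{p:02X} SP:{sp:02X}"


def first_divergence(expected_lines, actual_lines):
    """Return (line_number, expected, actual) for the first mismatch, or None.

    Lines are compared field by field, ignoring differences in whitespace.
    A missing line on either side counts as a mismatch.
    """
    pairs = zip_longest(expected_lines, actual_lines)
    for number, (expected, actual) in enumerate(pairs, start=1):
        if expected is None or actual is None:
            return number, expected, actual
        if expected.split() != actual.split():
            return number, expected, actual
    return None
```

The first diverging line usually points at the instruction executed just before it, so that is where I would start looking.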
Also, I had amazingly poor performance, so I wasn't super motivated to continue.
The 2600 has a very similar CPU, but the very limited TIA (Stella) output chip means most games are very timing dependent, which means you have to be super accurate, which adds difficulty. I think you should try to be cycle accurate anyway, but it's easy to mess that up, and having some freedom would be nice.
I did a GameBoy and similarly found the CPU enjoyable and the PPU a huge pain. Perhaps if I understood graphics better, I would have enjoyed it more, but like you say it just felt like a lot of steps.
I don't know if the GameBoy PPU has the background vs foreground split. The background processing was pretty reasonable, and once you got it kind of working, it was fun to debug and get it actually working. My favorite thing was when I was processing everything in the wrong order, so the menu of the ROM I was testing with had all the words backwards.
But the foreground / object sprites have this huge pipeline. IIRC, the PPU determines which sprites to draw in line X + 1 during line X. After that, it has to load the data for each object, etc etc. It was just discouraging. Plus since my frame rate is so low, I have to sit at a blank screen for quite some time waiting for the game to show anything, and longer for the demo to start (I don't have controls)...
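To make the "pick sprites for line X + 1 during line X" step concrete, here is a rough sketch assuming NES-style OAM (64 entries of 4 bytes each: Y, tile, attributes, X) and a limit of 8 sprites per line. The function name and the returned dict layout are my own, and it ignores the per-cycle timing and the overflow flag quirks of the real hardware.

```python
def evaluate_for_next_line(oam, current_line, sprite_height=8, limit=8):
    """Select the sprites to draw on the line after `current_line`.

    `oam` is a flat sequence of bytes, 4 per sprite: Y, tile, attributes, X.
    Sprites are taken in OAM order, and at most `limit` are kept.
    """
    selected = []
    for offset in range(0, len(oam) - 3, 4):
        y, tile, attributes, x = oam[offset:offset + 4]
        row = current_line - y  # which row of the sprite lands on the next line
        if 0 <= row < sprite_height:
            if len(selected) == limit:
                break  # real hardware would flag an overflow here
            selected.append({
                "index": offset // 4,
                "x": x,
                "tile": tile,
                "attributes": attributes,
                "row": row,
            })
    return selected
```

The later steps (fetching the pattern data for each selected sprite, flipping, priority against the background) would consume this list. This only covers the selection part.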
A subset of CP/M calls is a pretty simple "rest of the system" to implement on top of an 8080/Z80 CPU emulation. (It's a bit of a cheat, like qemu's "Linux user mode emulation" or early versions of DOSBox: because you restrict software to interacting with a high-level software interface, there are no lower-level details to aim for fidelity with.)
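As a rough sketch of what that looks like: CP/M programs reach the BDOS by calling address 0x0005 with the function number in register C, so the emulator can trap that address instead of emulating any hardware. The handler below covers only three functions (0 = system reset, 2 = console output, 9 = print '$'-terminated string). The `regs` dict, the `out` list and the name `bdos_call` are assumptions made for the example, not part of any real emulator.

```python
BDOS_ENTRY = 0x0005  # address CP/M programs CALL to reach the BDOS


def bdos_call(regs, memory, out):
    """Handle one trapped BDOS call; return False when the program exits.

    `regs` maps register names to 8-bit values, `memory` is the 64 KiB
    address space, and `out` collects the characters written to the console.
    """
    function = regs["C"]
    if function == 0:  # system reset: the program is done
        return False
    if function == 2:  # console output: character in E
        out.append(chr(regs["E"]))
    elif function == 9:  # print string: address in DE, '$' terminates
        address = (regs["D"] << 8) | regs["E"]
        for _ in range(0x10000):  # bounded in case no '$' is present
            value = memory[address]
            if value == ord("$"):
                break
            out.append(chr(value))
            address = (address + 1) & 0xFFFF
    else:
        raise NotImplementedError(f"BDOS function {function}")
    return True
```

In the CPU loop you would check whether the program counter equals `BDOS_ENTRY` before fetching, call the handler, and then simulate a RET. Programs are loaded at 0x0100, so that is where execution starts.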