Tangential question: why does it normally take so long to start traditional VMs in the first place? At least on Windows, if you start a traditional VM, it takes several seconds for it to start running anything.
Edit: when I say anything, I'm not talking user programs. I mean as in, before even the first instruction of the firmware -- before even the virtual disk file is zeroed out, in cases where it needs to be. You literally can't pause the VM during this interval because the window hasn't even popped up yet, and even when it has, you still can't for a while because it literally hasn't started running anything. So the kernel and even firmware initialization slowness are entirely irrelevant to my question.
You can optimize a lot to start a Linux kernel in under a second, but if you're using a standard kernel, there is all manner of timeouts and poll attempts that make the kernel waste time booting. There's also a non-trivial amount of time the VM spends in the UEFI/CSM system preparing the virtual hardware and initializing the system environment for your bootloader. I'm pretty sure WSL2 uses a special kernel to avoid the unnecessary overhead.
You also need to start OS services, configure filesystems, prepare caches, configure networking, and so on. If you're not booting UKIs (unified kernel images) or similar, you'll also be loading a bootloader, then loading an initramfs into memory, then loading the main OS and starting the services you actually need, with each step requiring certain daemons and hardware probes to work correctly.
There are tools to fix this problem. Amazon's Firecracker can start a Linux VM in a time similar to that of a container (milliseconds), mainly by skipping the BIOS/UEFI and legacy device emulation entirely and booting a kernel directly into a minimal virtio device model; it can also snapshot an initialized VM and restore that instead of performing a real boot. https://firecracker-microvm.github.io/
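To give a sense of how little Firecracker deals with, here's a minimal sketch of a config you could feed it (file names and sizes are placeholders; you'd point it at your own uncompressed kernel and ext4 rootfs):

    {
      "boot-source": {
        "kernel_image_path": "vmlinux",
        "boot_args": "console=ttyS0 reboot=k panic=1"
      },
      "drives": [{
        "drive_id": "rootfs",
        "path_on_host": "rootfs.ext4",
        "is_root_device": true,
        "is_read_only": false
      }],
      "machine-config": { "vcpu_count": 1, "mem_size_mib": 128 }
    }

Run it with something like:

    firecracker --no-api --config-file vm.json

There's no BIOS, no bootloader, no legacy device probing: the kernel entry point is effectively the first guest instruction.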
On Windows, I think it depends on the hypervisor you use. Hyper-V has a pretty slow UEFI environment, its hard disk access always seems rather slow to me, and most Linux distros don't seem to package dedicated minimal kernels for it.
I'm saying it takes a long time for it to even execute a single instruction, in the BIOS itself. Even for the window to pop up, before you can even pause the VM (because it hasn't even started yet). What you're describing comes after all that, which I already understand and am not asking about.
Unsubstantiated hunch: the hypervisor is doing a shitload of probes against the host system before allocating/configuring virtual hardware devices/behaviors. Since the host's hardware/driver/kernel situation can change between hypervisor invocations, it might have to re-answer a ton of questions about the host environment in order to provide things like "the VM/host USB bridge uses so-and-so optimized host kernel/driver functionality to speed up accesses to a VM-attached USB device". Between running such checks for all behaviors the VM needs, and the possibility that wasteful checks (e.g. for rare VM behaviors or virtual hardware that's not in use) are also performed, that could take some time.
On the other hand, it could just as easily be something simple, like setting up hugepages or checksumming virtual hard disk image files.
Both are total guesses, though. Could be anything!
I have always wondered the same; never tried looking into it, but I wouldn't be surprised if Defender at least played a part in it. Defender is a huge source of general slowness on Windows in my experience.
Please tell me you are joking. Even if it’s a lie.
Management Engine... actually, I do not have the energy to deal with paranoid people. I never had that kind of energy. I never will. You’re all so efficient at drawing energy out of conversations and killing them. You’re like conversational vampires. It’s exhausting.
I don’t even care if you’re right or wrong about Intel ME. It is just so exhausting listening to you guys because of your word choices. It’s like you try to get ignored.
I respect your opinion and all, you just need to work on your messaging or something.
I think you need to provide more details on what VM software you’re using. On VirtualBox what you describe is very noticeable, and it didn’t have that delay in older versions. So it could be just an issue with that VM software and not a general “traditional VMs” issue.
Yup, I'm asking about VirtualBox mainly; I just don't understand what the heck it's doing during that time that takes so long. Although I don't recall other VMs (like, say, Hyper-V) being dramatically different either (ignoring WSL2 here).
Are you just guessing or have you actually seen the delay I'm talking about disappear as a result of this (or as a result of anything else for that matter)? Because I've already done this (yes, entirely, even the kernel mode drivers) and it's definitely not the issue.
There was a release of Subversion back in the day that reduced the number of files that were opened during a repo action like update, and the number of times any one file got opened. On Linux it ran about 2-3x faster. Very nice change.
On Windows it was almost 10x faster. On the project where this change was released, my morning ritual was to come in, log on, run an svn update, lock my screen and go get coffee. I had at least ten minutes to kill after I got coffee, if the pot wasn’t empty when I got there.
Windows is hot garbage about fopen, particularly when virus scanning is on.
Yes. The delay you’re complaining about happens because you are looking at general-purpose hypervisors, which come with virtualized hardware and need to mimic a bunch of stuff so that most software will work as usual.
For example: your VM starts up with the CPU in 16-bit mode, because that’s just how things work on x86, and then it waits for the guest OS to set the CPU into 64-bit mode.
This is completely unnecessary if you just want to run x86-64 code in a virtualized environment and you control the guest kernel: you can just assume things are in 64-bit mode, because it’s not the ’70s or whatever.
The guest OS would also need to probe a few ports to find a bootable disk. If you control the kernel then you can just not do that and boot directly.
No, it is not. The “first instruction in the BIOS” is 16-bit mode code when dealing with an x86 VM.
A virtual environment doesn’t even really need any BIOS or anything like that.
You can feel free to test with QEMU direct kernel booting to see that this skips a lot of the delay, without even having to use a specialized hypervisor like Firecracker.
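Something like this, assuming a locally built bzImage with virtio-mmio support and a raw rootfs image (paths are placeholders):

    qemu-system-x86_64 -enable-kvm -M microvm -m 512M -nographic \
        -kernel bzImage -append "console=ttyS0 root=/dev/vda rw" \
        -drive file=rootfs.img,format=raw,if=none,id=hd0 \
        -device virtio-blk-device,drive=hd0

The -M microvm machine type drops most of the legacy PC platform, so there's almost nothing to enumerate, and the kernel is entered directly instead of via firmware and a bootloader.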
A bare VM may not have a BIOS, it's just partitioning supported by the host CPU and OS. The emulation of the legacy PC hardware stack for conventional OS compatibility is a separate thing. If the guest OS is custom-designed to launch in a bare VM with known topology it can boot very, very fast.
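To make "just partitioning supported by the host CPU and OS" concrete, here's a minimal sketch against the Linux KVM API (error handling omitted; the guest code bytes are hand-assembled 16-bit x86) that runs guest instructions with no BIOS and no virtual hardware at all:

    /* kvm-min.c: build with cc kvm-min.c; needs a Linux host with /dev/kvm. */
    #include <fcntl.h>
    #include <linux/kvm.h>
    #include <stdint.h>
    #include <stdio.h>
    #include <string.h>
    #include <sys/ioctl.h>
    #include <sys/mman.h>

    int main(void)
    {
        /* Guest code (16-bit real mode): write 'A' to port 0x3f8, then halt. */
        const uint8_t code[] = {
            0xb0, 'A',        /* mov al, 'A'   */
            0xba, 0xf8, 0x03, /* mov dx, 0x3f8 */
            0xee,             /* out dx, al    */
            0xf4,             /* hlt           */
        };

        int kvm = open("/dev/kvm", O_RDWR | O_CLOEXEC);
        int vm  = ioctl(kvm, KVM_CREATE_VM, 0);

        /* One page of guest RAM, mapped at guest physical address 0. */
        void *mem = mmap(NULL, 0x1000, PROT_READ | PROT_WRITE,
                         MAP_SHARED | MAP_ANONYMOUS, -1, 0);
        memcpy(mem, code, sizeof code);
        struct kvm_userspace_memory_region region = {
            .slot = 0, .guest_phys_addr = 0,
            .memory_size = 0x1000, .userspace_addr = (uint64_t)mem,
        };
        ioctl(vm, KVM_SET_USER_MEMORY_REGION, &region);

        int vcpu = ioctl(vm, KVM_CREATE_VCPU, 0);
        struct kvm_run *run = mmap(NULL, ioctl(kvm, KVM_GET_VCPU_MMAP_SIZE, NULL),
                                   PROT_READ | PROT_WRITE, MAP_SHARED, vcpu, 0);

        /* Point CS:IP at guest address 0 -- no firmware involved anywhere. */
        struct kvm_sregs sregs;
        ioctl(vcpu, KVM_GET_SREGS, &sregs);
        sregs.cs.base = 0;
        sregs.cs.selector = 0;
        ioctl(vcpu, KVM_SET_SREGS, &sregs);
        struct kvm_regs regs = { .rip = 0, .rflags = 0x2 };
        ioctl(vcpu, KVM_SET_REGS, &regs);

        for (;;) {
            ioctl(vcpu, KVM_RUN, NULL);
            if (run->exit_reason == KVM_EXIT_IO && run->io.port == 0x3f8)
                putchar(*((char *)run + run->io.data_offset));  /* prints 'A' */
            else if (run->exit_reason == KVM_EXIT_HLT)
                break;
        }
        return 0;
    }

The first guest instruction executes as soon as KVM_RUN is issued; everything slow about a "traditional" VM launch is layered on top of this.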
In Linux, VM memory allocations can be slow if the hypervisor tries to allocate GBs of RAM using 4K pages. There are ways to help it allocate 1GB at a time (1 GiB hugepages), which vastly speeds it up.
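On a Linux host with QEMU, for example, the rough recipe looks like this (a sketch, assuming a 4 GiB guest, root access, and CPU/kernel support for 1 GiB pages; paths are placeholders):

    # reserve four 1 GiB hugepages and mount a hugetlbfs for them
    echo 4 > /sys/kernel/mm/hugepages/hugepages-1048576kB/nr_hugepages
    mount -t hugetlbfs -o pagesize=1G none /mnt/huge1g

    qemu-system-x86_64 -m 4G \
        -object memory-backend-file,id=mem0,size=4G,mem-path=/mnt/huge1g,share=on,prealloc=on \
        -numa node,memdev=mem0 ...

Note that prealloc=on deliberately pays the whole allocation (and zeroing) cost at startup; dropping it defers that work until the guest touches each page.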
Try Windows Server Core on an SSD. I've seen VMs launch in low single-digit seconds. You can strip it down even further by removing non-64-bit support, Defender, etc...
I mean, it is basically booting a computer from scratch, so it kind of makes sense. You have to allocate memory, start virtual CPUs, initialize devices, run BIOS/UEFI checks, perform hardware enumeration, all that jazz, while emulating all of it, which tends to be slower than "real" implementations. I guess there are a bunch of security-related steps as well, like zeroing pages and similar things, that take additional time.
If I let a VM use most of my hardware, it takes a few seconds from start to login prompt, which is the same time it takes for my Arch desktop to boot from pressing the button to seeing the login prompt.
> You have to allocate memory, start virtual CPUs, initialize devices, run BIOS/UEFI checks, perform hardware enumeration, all that jazz, while emulating all of it, which tends to be slower than "real" implementations.
That's not what I'm asking.
I'm saying it takes a long time for it to even execute a single instruction, in the BIOS itself. Even for the window to pop up, before you can even pause the VM (because it hasn't even started yet). What you're describing comes after all that, which I already understand and am not asking about.
Without any context in terms of what the VM is doing or what VMM software you use, my best guess is that the OS/VMM are pre-allocating memory for the VM. This might involve paging out other processes' memory, which could take some time.
I think Task Manager would tell you if there is a blip of memory usage and paging activity at the time. And I'm sure Windows itself has profilers that can tell you what is happening when the VM is started.
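For example (a sketch; wpr ships with recent Windows, and I'm assuming a VirtualBox VM started via VBoxManage with a placeholder name):

    wpr -start CPU -start VirtualAllocation -filemode
    "C:\Program Files\Oracle\VirtualBox\VBoxManage.exe" startvm "SomeVM"
    wpr -stop vmstart.etl

Opening vmstart.etl in Windows Performance Analyzer should show where the time goes during that pre-boot gap.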
VirtualBox on Windows, primarily. Though I feel like I haven't seen other VMs in the past start up a whole ton faster (maybe somewhat) (ignoring WSL2). Page files are already disabled, there's plenty of free RAM, and it makes no difference how little RAM the guest is allocated (even if it's 256MB). So no, those are not the issues. VirtualBox itself seems to be doing something slow during that time and I don't know what that is.
I remembered something about VirtualBox not playing nicely with Hyper-V on Windows, and dug up a possibly relevant post[0] on their forums. IIRC we ended up moving a few build systems to Docker and dropping VirtualBox because of hyper-v related issues, but it's been a few years.
That's the unrelated green-turtle issue. It's only relevant after the guest has actually started running instructions. I'm talking about before that point.
I'm not aware of any turtles, that was just the first thing I found when trying to see if VirtualBox and Hyper-V were still a problematic combo.
Again, it was a few years ago, but we didn't solve the problem or identify an actual root cause. We stopped banging our heads against that particular wall and switched technologies.
What is your definition of free memory? If the system has read a lot of data, the page cache is probably occupying most of the RAM you consider free. Look at cache and standby counters.
I’ve noticed that Windows can only evict data from the page cache at about 5 GB/s. I do not know if this zeros the memory or if that would need to be done in the allocation path.
A couple years ago I tracked down a long pause while starting qemu on Linux to it zeroing the 100s of GB of RAM given to the VM as 1 GB huge pages.
These may or may not be big contributors to what you are seeing, depending on the VM’s RAM size.
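For the cache/standby counters mentioned above, one way to read them on Windows (assuming the standard perf counters are present) is PowerShell:

    Get-Counter '\Memory\Cache Bytes',
                '\Memory\Standby Cache Normal Priority Bytes',
                '\Memory\Standby Cache Reserve Bytes'

If those numbers account for most of your "free" RAM, a big VM allocation has to evict or zero that memory first.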
I experienced something similar back when Microsoft decided to usurp all hypervisors made for Windows and make Windows itself run as a VM on Hyper-V running as a Type 1 hypervisor on the hardware. That made it so other VMs could only run on Hyper-V alongside Windows or with nested virtualization.
So this meant VMware, VirtualBox, etc., as they were, would no longer work on Windows. Microsoft required all of them to switch to using Hyper-V libs behind the scenes to launch Hyper-V VMs and then present them as their own (while hiding them from the Hyper-V UI).
VirtualBox was slow, hot garbage on its own before this happened, but now it's even worse. They didn't optimize their Hyper-V integration as well as VMware (eventually) did. VMware is still worse off than it was, though, since it has to inherit all of Hyper-V's problems behind the scenes.
> Edit: when I say anything, I'm not talking user programs. I mean as in, before even the first instruction of the firmware -- before even the virtual disk file is zeroed out, in cases where it needs to be. You literally can't pause the VM during this interval because the window hasn't even popped up yet, and even when it has, you still can't for a while because it literally hasn't started running anything. So the kernel and even firmware initialization slowness are entirely irrelevant to my question.
Why is that?