
Hypervisor From Scratch – Part 6: Virtualizing An Already Running System

Sina Karvandi

Introduction

Hello and welcome to the 6th part of the tutorial Hypervisor From Scratch. In this part, I try to give you an idea of how to virtualize an already running system using a hypervisor. Like the other parts, this part depends heavily on the previous ones, so make sure to read them first.

Overview

In the 6th part, we’ll see how to virtualize our currently running system by configuring the VMCS. Then we use monitoring features to detect the execution of some important instructions like CPUID (and change the result of CPUID for both user mode and kernel mode), detect modifications of different control registers, describe VMX capabilities on different microarchitectures, talk about MSR Bitmaps, and lots of other cool things.

Before starting, I should give special thanks to my friend Petr Benes, as he always solves my problems, patiently explains things to me, and gives me ideas for implementing a hypervisor from scratch.

The full source code of this tutorial is available on GitHub :

[https://github.com/SinaKarvandi/Hypervisor-From-Scratch]

Please make sure to have your own lab to test your hypervisor. I test my hypervisor on a 7th-generation Intel processor, so some features might not be supported on your processor. Without a remote kernel debugger (not a local kernel debugger), you might see your system halt or BSOD without understanding the actual error. Anyway, it’s time to see our hypervisor…

Table of contents

  • Introduction
  • Overview
  • Table of contents
  • VMX 0-settings and 1-settings
  • VMX-Fixed Bits in CR0 and CR4
  • Capturing the State of Current Machine
    • Configuring VMCS Fields
    • Changing IRQL on all Cores
  • Changing the User-mode App
    • Getting handle using CreateFile
  • Using VMX Monitoring Features
    • CR3-Target Controls
    • Handling guest CPUID execution
    • Instructions That Cause VM Exits Conditionally
    • Control Registers Modification Detection
    • MSR Bitmaps
      • Handling MSRs Read
      • Handling MSRs Write
  • Turning off VMX and Exit from Hypervisor
  • VM-Exit Handler
  • Let’s Test it!
    • Virtualizing all the cores
    • Changing CPUID using Hypervisor
    • Detecting MSR Read & Write (MSRBitmap)
  • Conclusion
  • References

VMX 0-settings and 1-settings

In the previous parts, we implemented a function called AdjustControls. This is an important part of every hypervisor: you might want to run your hypervisor on many different processors with different microarchitectures, so you should be aware of your processor’s capabilities to avoid undefined behavior and VM-Entry errors.

If you remember from the previous part we used the above function in 4 situations.

A brief look at APPENDIX A - VMX CAPABILITY REPORTING FACILITY gives the explanation of RESERVED CONTROLS AND DEFAULT SETTINGS. In Intel VMX, certain controls are reserved and must be set to a specific value (0 or 1) determined by the processor. The specific value to which a reserved control must be set is its default setting. These settings vary between processors and microarchitectures, but in general there are three classes :

Always-flexible: These have never been reserved.
Default0: These are (or have been) reserved with a default setting of 0.
Default1: They are (or have been) reserved with a default setting of 1.

Now, there are separate capability MSRs for the pin-based VM-execution controls, the primary processor-based VM-execution controls, the VM-Entry controls, the VM-Exit controls, and the secondary processor-based VM-execution controls.

We have these MSRs :

  • MSR_IA32_VMX_PROCBASED_CTLS
  • MSR_IA32_VMX_PROCBASED_CTLS2
  • MSR_IA32_VMX_EXIT_CTLS
  • MSR_IA32_VMX_ENTRY_CTLS
  • MSR_IA32_VMX_PINBASED_CTLS

In all of the above MSRs, bits 31:0 indicate the allowed 0-settings of these controls. VM entry allows control X (bit X) to be 0 if bit X in the MSR is cleared to 0; if bit X in the MSR is set to 1, VM entry fails if control X is 0. Meanwhile, bits 63:32 indicate the allowed 1-settings of these controls. VM entry allows control X to be 1 if bit 32+X in the MSR is set to 1; if bit 32+X in the MSR is cleared to 0, VM entry fails if control X is 1.

Although there are some exceptions, you should now understand the purpose of AdjustControls: it first reads the MSR corresponding to the VM-execution control, then applies the allowed 0-settings and 1-settings, and returns the final result.
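To make the 0-settings and 1-settings rule concrete, here is a minimal user-mode sketch of the same adjustment. The name adjust_controls is hypothetical, and the capability MSR value is passed in as a plain argument (an assumption made so the snippet runs anywhere) instead of being read with __readmsr as the driver would do.

```c
#include <stdint.h>

/* Hypothetical helper: applies the allowed 0-settings and 1-settings
   reported by a VMX capability MSR to a requested control value. */
static uint32_t adjust_controls(uint32_t requested, uint64_t capability_msr)
{
    /* Bits 31:0: a 1 here means the control is not allowed to be 0. */
    uint32_t allowed0 = (uint32_t)(capability_msr & 0xFFFFFFFFu);
    /* Bits 63:32: a 0 here means the control is not allowed to be 1. */
    uint32_t allowed1 = (uint32_t)(capability_msr >> 32);

    requested &= allowed1; /* drop bits the processor cannot set to 1 */
    requested |= allowed0; /* force bits the processor requires to be 1 */
    return requested;
}
```

For example, with a made-up capability value of 0x0000000F00000003, a request of 0x14 loses bit 4 (not allowed to be 1) and gains bits 0 and 1 (not allowed to be 0), so the result is 0x7.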

I really recommend looking at the result of AdjustControls, specifically for MSR_IA32_VMX_PROCBASED_CTLS and MSR_IA32_VMX_PROCBASED_CTLS2, as some bits might be set to 1 without your intention, so you should have a plan for handling the resulting VM-Exits on your specific processor.

VMX-Fixed Bits in CR0 and CR4

For CR0, the IA32_VMX_CR0_FIXED0 MSR (index 486H) and the IA32_VMX_CR0_FIXED1 MSR (index 487H), and for CR4, the IA32_VMX_CR4_FIXED0 MSR (index 488H) and the IA32_VMX_CR4_FIXED1 MSR (index 489H) indicate how bits in CR0 and CR4 may be set in VMX operation. If bit X is 1 in IA32_VMX_CRx_FIXED0, then that bit of CRx is fixed to 1 in VMX operation. Similarly, if bit X is 0 in IA32_VMX_CRx_FIXED1, then that bit of CRx is fixed to 0 in VMX operation. It is always the case that, if bit X is 1 in IA32_VMX_CRx_FIXED0, then that bit is also 1 in IA32_VMX_CRx_FIXED1.
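The fixed-bit rule can be sketched as two small helpers. The names apply_fixed_bits and is_valid_for_vmx are hypothetical, and the FIXED0/FIXED1 values are passed in as arguments rather than read from the MSRs, so treat this as an illustration only.

```c
#include <stdint.h>

/* Hypothetical helper: forces a CR0/CR4 value to satisfy the
   IA32_VMX_CRx_FIXED0 / IA32_VMX_CRx_FIXED1 constraints. */
static uint64_t apply_fixed_bits(uint64_t cr, uint64_t fixed0, uint64_t fixed1)
{
    cr |= fixed0; /* a 1 in FIXED0 means the bit is fixed to 1 */
    cr &= fixed1; /* a 0 in FIXED1 means the bit is fixed to 0 */
    return cr;
}

/* Returns 1 if the value already satisfies both constraints. */
static int is_valid_for_vmx(uint64_t cr, uint64_t fixed0, uint64_t fixed1)
{
    return apply_fixed_bits(cr, fixed0, fixed1) == cr;
}
```

With an example FIXED0 of 0x80000021 and FIXED1 of 0xFFFFFFFF (values chosen for illustration), a CR0 of 0x10 becomes 0x80000031.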

Capturing the State of Current Machine

In the 5th part, we saw how to configure different VMCS fields and finally execute our instruction (HLT) in the guest state. This part is very similar to the previous one, with some minor changes in some VMCS fields. Let’s review and see the differences.

The first thing you need to know is that you have to create a separate stack for each core, as we’re going to virtualize all the cores simultaneously. These stacks will be used whenever a VM-Exit occurs.

As you can see from above code we use VMM_Stack for each core separately (defined in _VirtualMachineState structure).

All the other things, like clearing the VMCS state, loading the VMCS, and executing VMLAUNCH, are exactly the same as in the previous part, so I won’t describe them again. Instead, see the function that is responsible for preparing our current core to be virtualized.

From the above code, Setup_VMCS_Virtualizing_Current_Machine is new, so let’s see what’s inside this function.

Configuring VMCS Fields

VMCS fields are nothing new; they have to be configured to manage the state of the virtualized core.

All the VMCS fields are the same as previous part, except :

For CPU_BASED_VM_EXEC_CONTROL, we set CPU_BASED_ACTIVATE_MSR_BITMAP, which enables the MSR bitmap filter (described later in this part). Setting this bit is more or less mandatory: as you might guess, Windows accesses lots of MSRs during normal kernel execution, so if you don’t set this bit, every MSR access causes a VM-Exit and your VM-Exit handler is called each time. Clearing this bit therefore makes the system notably slower.

For SECONDARY_VM_EXEC_CONTROL, we use CPU_BASED_CTL2_RDTSCP to enable RDTSCP, CPU_BASED_CTL2_ENABLE_INVPCID to enable INVPCID, and CPU_BASED_CTL2_ENABLE_XSAVE_XRSTORS to enable XSAVES and XRSTORS.

This is because I ran the above code on my Windows 10 1809 and saw that Windows uses INVPCID and XSAVES for its internal use (on processors that support these features), so if you don’t enable them before virtualizing the core, it will probably lead to an error.

Note that RDTSCP reads the current value of the processor’s time-stamp counter (a 64-bit MSR) into the EDX:EAX registers and also reads the value of the IA32_TSC_AUX MSR (address C0000103H) into the ECX register. This instruction adds ordering to RDTSC and makes performance measurements more accurate than using RDTSC. INVPCID invalidates mappings in the translation lookaside buffers (TLBs) and paging-structure caches based on the process-context identifier (PCID). XSAVE performs a full or partial save of processor state components to the XSAVE area located at the memory address specified by the destination operand.

Please make sure to review the final value that you put in these fields, as your processor might not support all of these features, in which case you have to implement some additional functions or ignore some of them.
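One way to review the final value is to compare what you requested against the allowed 1-settings and see which features were dropped. The sketch below assumes the bit positions documented in the Intel SDM for the secondary controls (RDTSCP is bit 3, INVPCID is bit 12, XSAVES/XRSTORS is bit 20); the macro and function names are hypothetical and the capability value is passed in instead of being read from MSR_IA32_VMX_PROCBASED_CTLS2.

```c
#include <stdint.h>

/* Hypothetical names; bit positions follow the Intel SDM. */
#define CTL2_ENABLE_RDTSCP  (1u << 3)
#define CTL2_ENABLE_INVPCID (1u << 12)
#define CTL2_ENABLE_XSAVES  (1u << 20)

/* Returns the requested control bits that this processor does not
   allow to be 1 (bits 63:32 of the capability MSR are the allowed
   1-settings). A non-zero result means some feature was not enabled. */
static uint32_t unsupported_controls(uint32_t requested, uint64_t capability_msr)
{
    uint32_t allowed1 = (uint32_t)(capability_msr >> 32);
    return requested & ~allowed1;
}
```

If the result is non-zero, you can log the missing bits before VMLAUNCH instead of discovering the problem through a crash later.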

There is nothing else to explain in this function except GuestStack, which is used as GUEST_RSP. I’ll tell you what to put in this argument later.

OK, now the problem is where to start our hypervisor from. I mean, how do we save the state of a specific core, execute VMLAUNCH on it, and then continue the rest of the execution?

For this purpose, I’ve changed the DrvCreate routine, so you have to change the CreateFile call in the user-mode application (I’ll talk about it later). As a matter of fact, DrvCreate is the function responsible for putting all the cores into the VMX state.

Our tiny driver is designed to be used on just one core, or two, three, or even all the cores simultaneously, so as you can see from the code below, it gets the logical processor count.

You can edit this line to virtualize a certain number of cores or just a specific core, but the above code virtualizes all the cores by default.

Changing IRQL on all Cores

There is a special function called RunOnProcessor. This function takes the processor ID as its first parameter, the initialized EPTP pointer (explained in the 4th part) as the second parameter, and a special routine called VMXSaveState as the third. RunOnProcessor sets the processor affinity to a specific core, then raises the IRQL to DISPATCH_LEVEL so the Windows scheduler can’t kick in and change the context, and then runs our routine. When it returns from VMXSaveState, the currently running core is virtualized, so it can lower the IRQL to what it was before, and Windows continues its normal execution under the hypervisor. IRQL stands for Interrupt Request Level, a Windows-specific mechanism for managing interrupts and prioritizing them by level, so raising the IRQL means your routine executes with higher priority than normal Windows code (PASSIVE_LEVEL & APC_LEVEL). For more information, see “What is IRQL?” in the references.

VMXSaveState has to save the state and call our already implemented function VirtualizeCurrentSystem.

We have to EXTERN this function in our assembly file (VMXState.asm), as VMXSaveState is implemented entirely in assembly.

VMXSaveState is implemented like this :

It first saves a backup of all the registers, then subtracts from the stack pointer to make room for the Shadow Space required by fastcall functions, then puts RSP into R8 and calls VirtualizeCurrentSystem. The reason RSP is moved into R8 (as I mentioned for GuestStack) is that in the x64 fastcall convention, parameters are passed in this order: RCX, RDX, R8, R9, and then the stack. This means the third argument to this function is the current RSP, and this value will be used as GUEST_RSP in our VMCS fields.

If the above function runs without error, we should never reach the “ret” instruction, as execution will later continue in another function called “VMXRestoreState“.

As you can see in the VirtualizeCurrentSystem which eventually calls Setup_VMCS_Virtualizing_Current_Machine, the GUEST_RIP is pointing to VMXRestoreState so the first function (routine) that executes in current core is VMXRestoreState. This function is defined like this :

In the above function, we first remove the Shadow Space and restore the registers’ state. When we return to RunOnProcessor, it’s time to lower the IRQL.

This function will be called many times (based on your logical core count), and eventually all of your cores are under VMX operation, with Windows now running in VMX non-root operation.

Changing the User-mode App

Based on the above assumptions, we have to make some trivial changes to our user-mode application so that, after loading the driver, it can notify the kernel-mode code to start and stop the hypervisor.

Getting handle using CreateFile

After some checks for the vendor and the presence of a hypervisor, we now have to call DrvCreate, which is done through the CreateFile user-mode function.

CreateFile API gives us a handle that can be used in our future functions but whenever you close the application or call CloseHandle then DrvClose is automatically called. DrvClose turns off the hypervisor and restores the state to what it was before (not virtualized).

Using VMX Monitoring Features

After configuring all the above fields, it’s time to use the monitoring features of VMX. You’ll see how unique these features are for security applications.

CR3-Target Controls

The VM-execution control fields include a set of 4 CR3-target values and a CR3-target count. If you see the VMCS I presented in the Setup_VMCS_Virtualizing_Current_Machine then you can see the following lines :

Intel defines CR3-Target Controls as “An execution of MOV to CR3 in VMX non-root operation does not cause a VM exit if its source operand matches one of these values. If the CR3-target count is n, only the first n CR3-target values are considered.”
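The quoted rule can be modeled in a few lines. This is only a user-mode simulation of the processor’s decision, not driver code: the function name mov_to_cr3_exits is hypothetical, and it assumes that the “CR3-load exiting” control is set (otherwise MOV to CR3 does not exit at all).

```c
#include <stdint.h>
#include <stddef.h>

#define CR3_TARGET_MAX 4 /* the VMCS holds a set of 4 CR3-target values */

/* Simulates the check quoted above: returns 0 (no VM exit) if new_cr3
   matches one of the first `count` target values, 1 (VM exit) otherwise. */
static int mov_to_cr3_exits(uint64_t new_cr3,
                            const uint64_t targets[CR3_TARGET_MAX],
                            size_t count)
{
    size_t i;

    if (count > CR3_TARGET_MAX)
        count = CR3_TARGET_MAX;

    for (i = 0; i < count; i++) {
        if (targets[i] == new_cr3)
            return 0; /* matches a target value: no VM exit */
    }
    return 1; /* no match among the first n values: VM exit */
}
```

Note that with a CR3-target count of 0, no value can match, so every MOV to CR3 exits.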

Future processors might extend the CR3-target count. Using this feature looks like this :

I don’t have a good example of how this control might be helpful on a regular Windows system, as there are thousands of CR3 changes across processes, but one of my friends told me that it’s used in some special cases in scientific projects to improve overall performance.

Handling guest CPUID execution

CPUID is one of the main instructions that cause a VM-Exit. As you know, CPUID is used because it allows software to discover details of the processor. [As an additional use, I’ve seen software use CPUID to flush the pipeline on processors that don’t support instructions like RDTSCP, so it can use CPUID + RDTSC and get a somewhat better result.]

Whenever any software at any privilege level executes a CPUID instruction, your handler is called, and you can decide what you want to show to the software. For example, I previously published an article called “Defeating malware’s Anti-VM techniques (CPUID-Based Instructions)“. That article describes how to configure VMWare so that it changes the CPUID results, which prevents malware with anti-VM techniques from figuring out that it’s executing in a virtualized environment by executing CPUID. VMWare (and other virtual environments) use exactly the same mechanism for handling CPUID. In the following example, I just pass the state of the registers (as they were when the VM-Exit occurred) to HandleCPUID. This function decides whether the requested CPUID should have a modified result or should just execute CPUID and return the original results.

Let’s implement our handler,

The default behaviour for handling every VM-Exit caused by the execution of CPUID in VMX non-root is to get the original result by using __cpuidex, which is the intrinsic function for CPUID.

So you can see that VMX non-root by itself isn’t able to execute CPUID; instead, we execute CPUID in VMX root mode and give the results back to VMX non-root mode.

Now, we need to check whether RAX (the CPUID index) was 1. This is because there is an indicator bit that shows whether the current machine is running under a hypervisor. Like many other virtual machines, we set HYPERV_HYPERVISOR_PRESENT_BIT to show that we’re running under a hypervisor.

There is a second check about hypervisor provider, we set it to ‘HVFS‘ to show that our hypervisor is [H]yper[V]isor [F]rom [S]cratch.

Now you can easily add more checks to the above code and customize your CPUID filter, for instance by changing your computer’s vendor string.

Here is the definition of hypervisor-related constants :

Finally, we put them into the registers, so that every time our routine is executed, the guest gets proper results.

Putting all the above codes together we have the following function :
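A minimal user-mode sketch of this filtering logic is shown here. It is not the driver’s HandleCPUID: the function filter_cpuid is hypothetical and receives the four result registers in an array instead of calling the CPUID intrinsic. The value 0x40000001 and the ‘HVFS’ value 0x48564653 come from this article; the hypervisor-present bit is assumed to be bit 31 of ECX for leaf 1.

```c
#include <stdint.h>

#define CPUID_PROCESSOR_FEATURES      0x00000001u
#define HYPERV_CPUID_INTERFACE        0x40000001u
#define HYPERV_HYPERVISOR_PRESENT_BIT 0x80000000u /* ECX bit 31, leaf 1 */
#define HVFS_SIGNATURE                0x48564653u /* 'H' 'V' 'F' 'S' */

/* regs[0..3] hold EAX, EBX, ECX, EDX as returned by the real CPUID.
   The function patches them in place before they go back to the guest. */
static void filter_cpuid(uint32_t leaf, uint32_t regs[4])
{
    if (leaf == CPUID_PROCESSOR_FEATURES) {
        /* Tell the guest that it is running under a hypervisor. */
        regs[2] |= HYPERV_HYPERVISOR_PRESENT_BIT;
    } else if (leaf == HYPERV_CPUID_INTERFACE) {
        /* Report our own interface signature in EAX. */
        regs[0] = HVFS_SIGNATURE;
    }
    /* Every other leaf is passed through unchanged. */
}
```

In the real handler the patched values would then be written into the saved guest RAX, RBX, RCX and RDX.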

It’s somewhat like instruction-level hooking for CPUID. You can have the same kind of handling functions for many other important instructions by configuring the primary and secondary processor-based controls. Below is a list of these instructions.

Instructions That Cause VM Exits Conditionally

Thanks to my friend, @LordNoteworthy the following list is available.

  • Instructions that cause VM exits in VMX non-root operation depending on the setting of the VM-execution controls.
    • CLTS
    • ENCLS
    • HLT
    • IN, INS/INSB/INSW/INSD, OUT, OUTS/OUTSB/OUTSW/OUTSD.
    • INVLPG
    • INVPCID
    • LGDT, LIDT, LLDT, LTR, SGDT, SIDT, SLDT, STR
    • LMSW
    • MONITOR
    • MOV from CR3/CR8, MOV to CR0/1/3/4/8
    • MOV DR
    • MWAIT
    • PAUSE
    • RDMSR, WRMSR
    • RDPMC
    • RDRAND, RDSEED
    • RDTSC, RDTSCP
    • RSM
    • VMREAD, VMWRITE
    • WBINVD
    • XRSTORS, XSAVES

Control Registers Modification Detection

Detecting and handling control register modifications is one of the great security features provided by hypervisors. Imagine that someone exploits the Windows kernel (or any other OS) and wants to clear one of the bits of a control register (let’s say Write Protect or SMEP); the hypervisor can detect this modification and prevent further execution.

These kinds of features are the reason why using a hypervisor as a security mechanism is much better than alternatives like using separate rings (1, 2).

Note that SMEP stands for Supervisor Mode Execution Protection. CR4.SMEP allows pages to be protected from supervisor-mode instruction fetches. If CR4.SMEP = 1, software operating in supervisor mode cannot fetch instructions from linear addresses that are accessible in user mode.

WP stands for Write Protect. CR0.WP allows pages to be protected from supervisor-mode writes. If CR0.WP = 0, supervisor-mode write accesses are allowed to linear addresses with read-only access rights; if CR0.WP = 1, they are not. (User-mode write accesses are never allowed to linear addresses with read-only access rights, regardless of the value of CR0.WP.)

Now it’s time to implement our functions.

First of all, let’s read the GUEST_CRs and EXIT_QUALIFICATION of the VMCS.

The following picture shows how Exit Qualifications can be interpreted.

Note that EXIT_QUALIFICATION is a somewhat general VMCS field that, in some situations like VM-Exits caused by an invalid VMCS layout, control register modifications, I/O bitmaps, and other events, gives additional information about the reason for the VM-Exit. This is an important part of managing VM-Exits.

Now, as you can see from the above picture, let’s make some variables to describe the situation based on EXIT_QUALIFICATION.
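As a rough sketch of that interpretation, the function below decodes the fields of the exit qualification for control-register accesses as laid out in the Intel SDM (bits 3:0 are the control register number, bits 5:4 the access type, bits 11:8 the general-purpose register). The struct and function names are hypothetical.

```c
#include <stdint.h>

/* Hypothetical container for the decoded fields. */
typedef struct {
    unsigned cr_number;   /* bits 3:0  : which control register */
    unsigned access_type; /* bits 5:4  : 0 = MOV to CR, 1 = MOV from CR,
                                         2 = CLTS, 3 = LMSW */
    unsigned gp_register; /* bits 11:8 : 0 = RAX, 1 = RCX, 2 = RDX, ... */
} cr_access_info;

static cr_access_info decode_cr_access(uint64_t exit_qualification)
{
    cr_access_info info;

    info.cr_number   = (unsigned)(exit_qualification & 0xF);
    info.access_type = (unsigned)((exit_qualification >> 4) & 0x3);
    info.gp_register = (unsigned)((exit_qualification >> 8) & 0xF);
    return info;
}
```

For instance, a qualification of 0x304 decodes to a MOV to CR4 from RBX, and 0x13 decodes to a MOV from CR3 into RAX.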

Whenever a VM-Exit is caused by an instruction like MOV CRx, REG, we have to manually modify the guest’s CRx in the VMCS from VMX root operation. The following code shows how to modify the GUEST_CRx field of the VMCS using VMWRITE.

Otherwise, we have to read CRx from our guest VMCS (not the host control register, as it might be different), put it into the corresponding register (among the registers that we saved when the VM-Exit handler was called), and then continue with VMRESUME. This way, the guest thinks it executed MOV reg, CRx successfully.

Putting it all together we have a function like this :

The reason implementing functions like HandleControlRegisterAccess is mandatory is that some processors (even recent Intel processors) force the 1-setting of some processor-based VM-execution controls, like CR3-Load Exiting & CR3-Store Exiting, so you have to manage these VM-Exits yourself. But if your processor can run without these settings, it’s strongly recommended to clear them to reduce the number of VM-Exits, because modern OSs access control registers a lot, so leaving them set has a notable performance penalty.

MSR Bitmaps

Everything here depends on whether you set bit 28 of the Primary Processor-Based controls or not.

On processors that support the 1-setting of the “use MSR bitmaps” VM-execution control, the VM-execution control fields include the 64-bit physical address of four contiguous MSR bitmaps, which are each 1-KByte in size. This field does not exist on processors that do not support the 1-setting of that control.

The definition of the MSR bitmaps is pretty clear in the Intel SDM, so I just copied it from the original manual. After reading it, we’ll start implementing it and put it into our hypervisor.

  • Read bitmap for low MSRs (located at the MSR-bitmap address). This contains one bit for each MSR address in the range 00000000H to 00001FFFH. The bit determines whether execution of RDMSR applied to that MSR causes a VM exit.
  • Read bitmap for high MSRs (located at the MSR-bitmap address plus 1024). This contains one bit for each MSR address in the range C0000000H to C0001FFFH. The bit determines whether execution of RDMSR applied to that MSR causes a VM exit.
  • Write bitmap for low MSRs (located at the MSR-bitmap address plus 2048). This contains one bit for each MSR address in the range 00000000H to 00001FFFH. The bit determines whether execution of WRMSR applied to that MSR causes a VM exit.
  • Write bitmap for high MSRs (located at the MSR-bitmap address plus 3072). This contains one bit for each MSR address in the range C0000000H to C0001FFFH. The bit determines whether execution of WRMSR applied to that MSR causes a VM exit.
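The layout in the list above can be read as a lookup. The following user-mode sketch models the processor’s side of it: given a 4-KByte bitmap, it answers whether an RDMSR or WRMSR of a given MSR would cause a VM exit. The name msr_access_exits is hypothetical, and the bit order inside each byte (bit n of byte n/8) is how I read the manual.

```c
#include <stdint.h>

#define MSR_BITMAP_SIZE 4096 /* four contiguous 1-KByte bitmaps */

/* Returns 1 if the access would cause a VM exit, 0 otherwise. */
static int msr_access_exits(const uint8_t bitmap[MSR_BITMAP_SIZE],
                            uint32_t msr, int is_write)
{
    uint32_t base;  /* byte offset of the 1-KByte bitmap to consult */
    uint32_t index; /* bit index of this MSR inside that bitmap */

    if (msr <= 0x00001FFFu) {
        base = 0;                   /* low MSRs */
        index = msr;
    } else if (msr >= 0xC0000000u && msr <= 0xC0001FFFu) {
        base = 1024;                /* high MSRs */
        index = msr - 0xC0000000u;
    } else {
        return 1;                   /* outside both ranges: always exits */
    }

    if (is_write)
        base += 2048;               /* write bitmaps follow the read bitmaps */

    return (bitmap[base + index / 8] >> (index % 8)) & 1;
}
```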

Ok, let’s implement the above description. If any RDMSR or WRMSR causes a VM-Exit, we have to manually execute RDMSR or WRMSR and put the results into the registers. Because of this, we have a function to manage our RDMSRs like :

Handling MSRs Read

You can see that it just checks the sanity of the MSR, then executes RDMSR, and finally puts the results into RAX and RDX (because a non-virtualized RDMSR does the same thing).
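The register handling in that description can be sketched without touching a real MSR. The helpers below are hypothetical: is_valid_msr mirrors the sanity check on the two ranges covered by the bitmaps, split_msr_value shows how a 64-bit MSR value lands in RAX (low half) and RDX (high half), and join_msr_value is the reverse step that a WRMSR handler needs.

```c
#include <stdint.h>

/* Sanity check: only the two ranges covered by the MSR bitmaps. */
static int is_valid_msr(uint32_t msr)
{
    return msr <= 0x00001FFFu ||
           (msr >= 0xC0000000u && msr <= 0xC0001FFFu);
}

/* RDMSR side: low 32 bits go to RAX, high 32 bits go to RDX. */
static void split_msr_value(uint64_t value, uint64_t *rax, uint64_t *rdx)
{
    *rax = value & 0xFFFFFFFFu;
    *rdx = value >> 32;
}

/* WRMSR side: rebuild the 64-bit value from the guest's EDX:EAX. */
static uint64_t join_msr_value(uint64_t rax, uint64_t rdx)
{
    return ((rdx & 0xFFFFFFFFu) << 32) | (rax & 0xFFFFFFFFu);
}
```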

Handling MSRs Write

There is another function for handling WRMSR VM-Exits :

The function is simple. By now you probably understand that all the hooked RDMSRs and WRMSRs eventually call these functions. One thing that is really worth trying yourself is to avoid setting CPU_BASED_ACTIVATE_MSR_BITMAP in CPU_BASED_VM_EXEC_CONTROL; you’ll see that all MSR reads and writes cause a VM-Exit with these reasons :

  • EXIT_REASON_MSR_READ
  • EXIT_REASON_MSR_WRITE

This time, you have to pass everything to the above functions and log these VM-Exits, so you can see which MSRs Windows uses while running under the hypervisor. But as I told you above, Windows executes a vast number of MSR instructions, so this can make your system much slower than you can bear.

Ok, let’s get back to our MSR Bitmap. We need two helper functions. The first one sets a bit of our MSR Bitmap :

The other function retrieves a specific bit :

Now it’s time to gather everything into one function based on the above description of MSR Bitmaps. The following function first checks the sanity of the MSR, then changes the MSR Bitmap of the target logical core (this is why we hold both the physical address and the virtual address of the MSR Bitmap: the physical address for the VMCS field and the virtual address to ease modification). If it’s a read (RDMSR) of a low MSR, it sets the corresponding bit at the MSR Bitmap virtual address; if it’s a write (WRMSR) to a low MSR, it modifies MSRBitmap + 2048 (as noted in the Intel manual). Exactly the same thing is done for high MSRs (between 0xC0000000 and 0xC0001FFF) at offsets 1024 and 3072, but don’t forget to subtract 0xC0000000, because 0xC000nnnn is not a valid bit index :d.
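Here is a small self-contained sketch of these steps that works on an ordinary 4096-byte buffer instead of the per-core bitmap, so it can be tried in user mode. The names set_bit, get_bit and set_msr_bitmap are hypothetical and the bit order inside a byte is an assumption; the function in the repository also takes the processor ID, which is left out here.

```c
#include <stdint.h>

/* Sets bit number `bit` in a byte array. */
static void set_bit(uint8_t *bytes, uint32_t bit)
{
    bytes[bit / 8] |= (uint8_t)(1u << (bit % 8));
}

/* Returns the value (0 or 1) of bit number `bit` in a byte array. */
static int get_bit(const uint8_t *bytes, uint32_t bit)
{
    return (bytes[bit / 8] >> (bit % 8)) & 1;
}

/* Marks an MSR for read and/or write detection in a 4-KByte bitmap.
   Returns 0 if the MSR is outside the ranges the bitmap can describe. */
static int set_msr_bitmap(uint8_t *bitmap, uint32_t msr,
                          int on_read, int on_write)
{
    uint32_t high_offset = 0;

    if (msr >= 0xC0000000u && msr <= 0xC0001FFFu) {
        high_offset = 1024;  /* high-MSR bitmaps start 1 KByte later */
        msr -= 0xC0000000u;  /* 0xC000nnnn itself is not a bit index */
    } else if (msr > 0x00001FFFu) {
        return 0;            /* sanity check failed */
    }

    if (on_read)
        set_bit(bitmap + high_offset, msr);         /* offset 0 or 1024 */
    if (on_write)
        set_bit(bitmap + 2048 + high_offset, msr);  /* offset 2048 or 3072 */
    return 1;
}
```

For example, set_msr_bitmap(bitmap, 0xC0000082, 1, 0) marks only reads of MSR 0xC0000082, so get_bit(bitmap + 1024, 0x82) becomes 1 while the write bitmap at offset 3072 stays untouched.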

Just one more thing to remember: only the above MSR ranges are covered by the bitmaps on Intel processors, so any RDMSR or WRMSR outside them causes a VM-Exit regardless. The sanity check here is mandatory, as the guest might pass an invalid MSR and crash the whole system (in VMX root mode)!

Turning off VMX and Exit from Hypervisor

It’s time to turn off our hypervisor and restore the processor state to what it was before running the hypervisor.

Just like when we entered the hypervisor (VMLAUNCH), we have to combine our C functions with assembly routines to save the state, execute VMXOFF, free all of our previously allocated pools, and finally restore the state.

The VMXOFF part of this routine has to be executed in VMX root operation. You can’t just execute __vmx_vmxoff in one of your driver functions and expect it to turn off the hypervisor, because Windows and all its drivers are currently executing in VMX non-root operation, so executing any of the VMX instructions there causes a VM-Exit with one of the following reasons.

For turning off the hypervisor, it’s better to use one of our IRP major functions. In our case, we use DrvClose, as it always gets notified whenever a handle to our device is closed. If you remember from above, we created a handle to our device using CreateFile (DrvCreate), and now it’s time to close that handle, which triggers DrvClose.

Nothing special for the above function, just Terminate_VMX().

This function is similar to the routine of executing VMLAUNCH, except it executes VMXOFF instead.

As you can see, it executes RunOnProcessorForTerminateVMX on all the running logical cores and then frees the buffers allocated for VMXON_REGION, VMCS_REGION, VMM_Stack, and MSRBitMap using MmFreeContiguousMemory (of course, converting physical addresses to virtual ones whenever needed).

Note that you have to modify this function by yourself if you virtualized a portion of cores (not all of them).

In RunOnProcessorForTerminateVMX, we have to tell our VMX root operation to turn off the hypervisor. As I told you, this is because we can’t execute any VMX instruction here, and it’s pretty clear that VMX root operation would prevent this operation if there were no mechanism for handling this situation.

You can use many ways to tell your VMX Root Operation about VMXOFF but in our case I used CPUID.

By now, you definitely know that executing CPUID causes a VM-Exit. In our CPUID exit handler routine, whenever a CPUID with RAX = 0x41414141 and RCX = 0x42424242 is executed, we return true to show the caller that the hypervisor needs to be turned off.

There is also another check for DPL.

This check makes sure that the CPUID with RAX = 0x41414141 and RCX = 0x42424242 is executed at the system privilege level (kernel mode), so no user-mode application is able to perform this task.
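The two checks together can be sketched as one predicate. The magic values come from the text above; everything else (the function name, passing the guest CS selector as an argument, and the RPL_MASK and DPL_SYSTEM values) is an assumption for illustration. In the driver, the selector would be read from the VMCS rather than passed in.

```c
#include <stdint.h>

#define VMXOFF_MAGIC_RAX 0x41414141u
#define VMXOFF_MAGIC_RCX 0x42424242u
#define RPL_MASK         3u /* low two bits of a segment selector */
#define DPL_SYSTEM       0u /* ring 0 */

/* Returns 1 only when the magic CPUID was executed from kernel mode. */
static int is_vmxoff_request(uint64_t rax, uint64_t rcx,
                             uint16_t guest_cs_selector)
{
    unsigned privilege = guest_cs_selector & RPL_MASK;

    return rax == VMXOFF_MAGIC_RAX &&
           rcx == VMXOFF_MAGIC_RCX &&
           privilege == DPL_SYSTEM;
}
```

With typical x64 Windows selectors, a kernel CS of 0x10 passes the check while a user-mode CS of 0x33 does not.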

Even without this check, user-mode applications couldn’t simply turn off the hypervisor, because our VMXOFF path doesn’t switch CR3 to the target user-mode process or change the current privilege level to user mode. If you want to let user-mode applications perform this task, you have to handle these cases.

Now our RunOnProcessorForTerminateVMX executes CPUID on all of the cores separately.

In our EXIT_REASON_CPUID case, we know that if the handler returns true, we have to turn the hypervisor off, so there are some other things to think about. For example, Windows expects to continue at GUEST_RIP with GUEST_RSP whenever the VM-Exit handler returns, so we have to save them somewhere and use them later to restore the Windows state.

Also, we have to advance GUEST_RIP, because we want to restore the state after the CPUID instruction.

From the 5th part, you probably know that MainVMExitHandler is called by VMExitHandler (an assembly function from VMExitHandler.asm).

Let’s see it in detail.

First we have to extern some previously defined variables.

Our VMExitHandler works like this: whenever a VM-Exit occurs, our logical core executes VMExitHandler, as it’s defined in HOST_RIP, and RSP is set to HOST_RSP. Then we have to save all the registers, which means we have to create a structure that allows us to read and modify the registers as a C structure.

Just push all the registers in _GUEST_REGS order and pass RSP as the first argument to MainVMExitHandler (RCX in fastcall), then subtract some space from the stack for the Shadow Space.
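As an illustration of such a structure, here is a sketch of a register block together with a helper that picks a register by its hardware encoding (the number reported, for example, in the exit qualification of a control-register access). The field order follows the usual x64 register encoding, which is an assumption about the push order; the type and function names are hypothetical.

```c
#include <stdint.h>
#include <stddef.h>

/* Hypothetical mirror of a register block pushed by the exit handler. */
typedef struct {
    uint64_t rax, rcx, rdx, rbx, rsp, rbp, rsi, rdi;
    uint64_t r8, r9, r10, r11, r12, r13, r14, r15;
} guest_regs;

/* Returns a pointer to the saved register with the given encoding
   (0 = RAX, 1 = RCX, 2 = RDX, 3 = RBX, ... 15 = R15). */
static uint64_t *select_register(guest_regs *regs, unsigned index)
{
    static const size_t offsets[16] = {
        offsetof(guest_regs, rax), offsetof(guest_regs, rcx),
        offsetof(guest_regs, rdx), offsetof(guest_regs, rbx),
        offsetof(guest_regs, rsp), offsetof(guest_regs, rbp),
        offsetof(guest_regs, rsi), offsetof(guest_regs, rdi),
        offsetof(guest_regs, r8),  offsetof(guest_regs, r9),
        offsetof(guest_regs, r10), offsetof(guest_regs, r11),
        offsetof(guest_regs, r12), offsetof(guest_regs, r13),
        offsetof(guest_regs, r14), offsetof(guest_regs, r15)
    };

    return (uint64_t *)((char *)regs + offsets[index & 0xF]);
}
```

Because the handler receives a pointer to this block, writing through select_register changes the value that is popped back into the real register before VMRESUME.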

You can see the VMExitHandler here :

From the above code, when we return from MainVMExitHandler, we have to check whether its return value (in RAX) tells us to turn off the hypervisor or just continue.

If it needs to be continued then restore the registers state and jump to our VM_Resumer function.

VM_Resumer just executes __vmx_vmresume and the processor sets the RIP to GUEST_RIP.

But what if it needs to be turned off ?

Then, based on AL, it jumps to another function called VMXOFFHandler. This is a simple function that executes VMXOFF to turn off the hypervisor (on the current logical core) and then restores the registers to their previous state, as we saved them in VMExitHandler.

The only thing we have to do here is change the stack pointer to GUEST_RSP (saved in gGuestRSP) and jump to GUEST_RIP (saved in gGuestRIP).

Now everything is done. We continue our normal Windows (driver) routine, that is, execution resumes after the last CPUID executed from RunOnProcessorForTerminateVMX, but now we’re no longer in VMX operation.

VM-Exit Handler

Putting all the above code together, we now have to manage different kinds of VM-Exits, so we need to modify our MainVMExitHandler, which was explained in the 5th part. If you’ve forgotten about it, please review the 5th part (VM-Exit Handler); it’s exactly the same, but with different actions for different exit reasons.

The first thing we need to manage is detecting every VMX instruction that is executed in VMX non-root operation. It can be done using the following code :

As I mention in the DbgPrint message, executing these kinds of VMX instructions will eventually cause a BSOD. Some routine might have checked for the presence of a hypervisor before ours was loaded, so the routine that executes these instructions (of course from the kernel) probably thinks it can execute them, and if it doesn’t handle the failure well (which is common), you’ll see a BSOD. Thus you have to discover what invokes these instructions and disable it manually.

If you configured any CPU-based controls, or your processor forces the 1-setting of any of the CR access exiting controls, then you can manage them using the function I described above,

The same is true for MSR accesses. If you didn’t enable the MSR bitmap (so every RDMSR and WRMSR causes a VM-Exit) or you set any bits in the MSR bitmap, then you have to manage them using the following function for RDMSR :

Then this code for managing WRMSR :

And if you want to detect I/O instruction execution then :

Don’t forget to set adequate CPU based control fields if you want to use the above functionalities.

The last thing that is important for us is the CPUID handler. It calls HandleCPUID (described above), and if the result is true, it saves GUEST_RSP and GUEST_RIP so that these values can be used to restore the state after VMXOFF is executed on our core.
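The overall shape of this dispatch can be sketched as a classifier over the basic exit reason. The numeric values are the basic exit reasons from the Intel SDM as I remember them (CPUID is 10, the VMX instructions VMCALL through VMXON are 18 to 27, control-register access is 28, I/O instruction is 30, RDMSR is 31, WRMSR is 32); the enum and function names are hypothetical, and a real handler would call the functions described above instead of returning a tag.

```c
#include <stdint.h>

enum {
    EXIT_REASON_CPUID          = 10,
    EXIT_REASON_VMCALL         = 18, /* first of the VMX instructions */
    EXIT_REASON_VMXON          = 27, /* last of the VMX instructions */
    EXIT_REASON_CR_ACCESS      = 28,
    EXIT_REASON_IO_INSTRUCTION = 30,
    EXIT_REASON_MSR_READ       = 31,
    EXIT_REASON_MSR_WRITE      = 32
};

typedef enum {
    ACTION_CPUID, ACTION_VMX_INSTRUCTION, ACTION_CR_ACCESS,
    ACTION_IO, ACTION_MSR_READ, ACTION_MSR_WRITE, ACTION_UNHANDLED
} exit_action;

/* Maps a raw exit reason field to the kind of handling it needs. */
static exit_action classify_exit(uint32_t exit_reason)
{
    uint32_t basic = exit_reason & 0xFFFFu; /* bits 15:0: basic reason */

    if (basic >= EXIT_REASON_VMCALL && basic <= EXIT_REASON_VMXON)
        return ACTION_VMX_INSTRUCTION;

    switch (basic) {
    case EXIT_REASON_CPUID:          return ACTION_CPUID;
    case EXIT_REASON_CR_ACCESS:      return ACTION_CR_ACCESS;
    case EXIT_REASON_IO_INSTRUCTION: return ACTION_IO;
    case EXIT_REASON_MSR_READ:       return ACTION_MSR_READ;
    case EXIT_REASON_MSR_WRITE:      return ACTION_MSR_WRITE;
    default:                         return ACTION_UNHANDLED;
    }
}
```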

Let’s Test it!

Now it’s time to test our hypervisor.

Virtualizing all the cores

First, we have to load our driver.

Running Driver

Then our DriverEntry is called, so we have to run our user-mode application to virtualize all the cores.

Hypervisor From Scratch App

You can see that if you press any key or close this window, DrvClose is called and the state is restored (VMXOFF).

Driver log

All the cores are now under hypervisor.

Changing CPUID using Hypervisor

Now let’s test the presence of hypervisor, for this case, I used Immunity Debugger to execute CPUID with custom EAX. You can use any other debugger or any custom application.

Setting 0x40000001 as RAX

You have to manually set the EAX to HYPERV_CPUID_INTERFACE (0x40000001). Then execute CPUID.

HVFS is in RAX

As you can see, HVFS (0x48564653) is in EAX, so we successfully hooked the CPUID execution using our hypervisor.

Test HYPERV_CPUID_INTERFACE without hypervisor

Now you have to close the user-mode app window so that it executes VMXOFF on all cores. Let’s test the above example again.

Test CPUID without hypervisor

You can see that the original result appears.

Detecting MSR Read & Write (MSRBitmap)

In order to test MSR Bitmaps, I use a local kernel debugger (WinDbg). In WinDbg, you can execute RDMSR & WRMSR to read and write MSRs. It’s exactly like executing RDMSR and WRMSR from a system driver.

In our VirtualizeCurrentSystem function, the following line is added.

Windbg Local Debugger (RDMSR & WRMSR)

In the remote debugger system, you can see the result as follows,

Remote Kernel Debugger RDMSR Execution Detected !

The execution of RDMSR is detected.

That’s all, folks.

Conclusion

In this part, we saw how to virtualize an already running system by configuring the VMCS fields separately for each logical core. Then we used our hypervisor to change the result of the CPUID instruction and to monitor every access to control registers and MSRs. After this part, our hypervisor is almost ready to be used in a practical project. The next part is about using the Extended Page Table (as I described previously in the 4th part). Personally, I believe most of the interesting work in a hypervisor can be done using EPT, because it has special logging mechanisms, e.g. page read/write access detection, and many other cool things that you’ll see in the next part.

Before finishing, I have to say that I’m neither a System Programmer nor a Hypervisor Developer, so please tell me about any mistakes in the comments section. This way you can help me and many other readers reduce misconceptions.

See you in the next part.

References

[1] Vol 3C – Chapter 24 – VIRTUAL MACHINE CONTROL STRUCTURES (https://software.intel.com/en-us/articles/intel-sdm)

[2] cpu-internals (https://github.com/LordNoteworthy/cpu-internals)

[3] RDTSCP — Read Time-Stamp Counter and Processor ID (https://www.felixcloutier.com/x86/rdtscp)

[4] INVPCID — Invalidate Process-Context Identifier (https://www.felixcloutier.com/x86/invpcid)

[5] XSAVE — Save Processor Extended States (https://www.felixcloutier.com/x86/xsave)

[6] XRSTORS — Restore Processor Extended States Supervisor (https://www.felixcloutier.com/x86/xrstors)

[7] What is IRQL ? (https://blogs.msdn.microsoft.com/doronh/2010/02/02/what-is-irql/)

  1. LordNoteworthy LordNoteworthy

    Hey.

    Some typos:
    – physical *address* for VMCS fields and the virtual address to ease the modification
    – neither a System Program*m*er nor a Hypervisor developer …

    Great work ! Looking for the next article.

    • Sinaei Sinaei

      Thank you 🙏 It’s correct now.

  2. anon anon

    How does the ept work if you dont set the EPT_POINTER in the vmcs ?(line 277 in vmx.c is commented). Anyway great series waiting for the next.

    • Sinaei Sinaei

      If you don’t set CPU_BASED_CTL2_ENABLE_EPT then everything about EPT is ignored by your processor so EPTP is not important in this part.

  3. mohammad mohammad

    I think the code have a conflict requirement that cause BSOD:
    During run RunOnProcessor function, you raised the Irql To DISPATCH_LEVEL and call the routine “VMXSaveState” that is then call “VirtualizeCurrentSystem” which call “PAGED_CODE()” macro. the “PAGED_CODE()” check about irql <= APC_LEVEL.

    Please can you tell me if my think is right? and what your suggestions to solve that?

    • Sinaei Sinaei

      I think it’s my mistake.

      • mohammad mohammad

        But how you solve that? can just comment one of them will solve the problem?

        • Sinaei Sinaei

          If you mean which one is correct then you have to raise IRQL to DISPATCH_LEVEL, there is no need to “PAGED_CODE()” but it didn’t cause BSOD for me, BTW, unfortunately, I don’t have a remote kernel debugging PC to check it again 🙁

  4. Winny Thomas Winny Thomas

    Thank you for this fantastic series!!
    Excellent! When will you be posting part 7. All to eager to experiment with EPT

    • Sinaei Sinaei

      It’s almost ready, just needs some tests. I’m kinda busy for the final tests of the university but post it ASAP.
