A forum for reverse engineering, OS internals and malware analysis 

Forum for discussion about kernel-mode development.
 #31785  by tangptr
 Thu Jul 05, 2018 9:57 am
I am writing a hypervisor (based on Intel VT-x), but something happened that I don't understand:

If I set a breakpoint at the guest RIP, or even a few instructions after it, the breakpoint is hit and execution continues fine afterwards. Nothing bad happens.
If I don't set any breakpoints, then after a certain number of instructions the hypervisor catches a triple fault, which is fatal to the guest.
It is almost as if the breakpoint saved the system!

My conclusion is that the debugger automatically restores certain state in the system.

I also tried setting the exception bitmap field in the VMCS. The results were:
If I intercept #PF, the #PF traps in a never-ending loop.
If I intercept all exceptions except #PF, only the double fault (#DF) is intercepted. I think the first fault might be the #PF.
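
For readers following along, here is a minimal sketch of how the exception bitmap decides whether a guest exception causes a VM exit. The struct and function names are hypothetical (they are not from my hypervisor); the rule itself follows the Intel SDM: each bit of the 32-bit bitmap corresponds to one vector, and for #PF (vector 14) the page-fault error-code mask and match fields also take part.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

#define VECTOR_DF 8u   /* double fault */
#define VECTOR_PF 14u  /* page fault */

/* Hypothetical container for the three VMCS fields involved. */
typedef struct {
    uint32_t exception_bitmap;  /* one bit per exception vector 0..31 */
    uint32_t pfec_mask;         /* page-fault error-code mask */
    uint32_t pfec_match;        /* page-fault error-code match */
} exception_controls;

/* Returns true if an exception with this vector would cause a VM exit. */
static bool exception_causes_vmexit(const exception_controls *c,
                                    uint32_t vector,
                                    uint32_t pf_error_code)
{
    assert(vector < 32);
    bool bit_set = ((c->exception_bitmap >> vector) & 1u) != 0;

    if (vector != VECTOR_PF)
        return bit_set;

    /* For #PF the bitmap bit is combined with the mask/match test:
     * bit 14 set   -> exit when (error_code & mask) == match
     * bit 14 clear -> exit when (error_code & mask) != match */
    bool match = (pf_error_code & c->pfec_mask) == c->pfec_match;
    return bit_set ? match : !match;
}
```

With mask and match both zero, bit 14 alone decides whether #PF exits. A bitmap of `0xFFFFFFFF & ~(1u << 14)` corresponds to my second experiment: the #PF is delivered inside the guest, and only the following #DF reaches the hypervisor.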

So my conclusion is that the breakpoint somehow prevented the #PF from happening, but I don't understand how.
Maybe I have mistakes in my guest-state area settings, and the breakpoint changed one or more guest-state fields.

That is my analysis so far, but it does not give me a solution. I would therefore like to ask how exactly this strange phenomenon occurs and how I can solve it.
Thanks in advance.
 #31791  by feryno
 Fri Jul 06, 2018 2:20 pm
Hi, on a breakpoint inside the guest, the debugger may sanitize something which you did not set properly, e.g. the TSS, or a selector and its base/limit/access rights. Check the guest state. Enable as few features as possible in the execution controls (no EPT, no exceptions in the exception bitmap). If you have an older CPU (e.g. Ivy Bridge), try it there, as you can run with fewer features enabled than on the latest CPUs, e.g. the mentioned CPU does not have the PCID feature and so on. Once you have a stable hypervisor, enable more and more features and test on the latest CPU. That way you can easily tell that, e.g., you made a mistake in handling the page-fault exception after enabling it in the execution controls and writing its VM-exit handler. Do development in the smallest steps possible and test thoroughly after every step.
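
To illustrate the "check guest state" part, here is a rough sketch of decoding one 8-byte code/data descriptor into the base, limit and access-rights values that the VMCS guest segment fields expect. The type and function names are made up for this example. It assumes a legacy 8-byte descriptor; a 64-bit TSS or LDT descriptor is 16 bytes, with base bits 63:32 in the second half, and is not handled here.

```c
#include <assert.h>
#include <stdint.h>

#define AR_UNUSABLE (1u << 16)  /* "segment unusable" bit in VMCS access rights */

/* Hypothetical holder for the values written to the guest segment fields. */
typedef struct {
    uint32_t base;
    uint32_t limit;          /* byte limit, already scaled if G = 1 */
    uint32_t access_rights;  /* VMCS access-rights format */
} guest_segment;

/* Decode a raw 8-byte descriptor as it sits in the GDT. */
static guest_segment decode_descriptor(uint16_t selector, uint64_t desc)
{
    guest_segment s = { 0, 0, AR_UNUSABLE };

    /* Selectors 0..3 are null selectors: mark the segment unusable. */
    if ((selector & 0xFFFCu) == 0)
        return s;

    uint32_t lo = (uint32_t)desc;
    uint32_t hi = (uint32_t)(desc >> 32);

    /* Base is scattered over three places in the descriptor. */
    s.base = (lo >> 16) | ((hi & 0xFFu) << 16) | (hi & 0xFF000000u);

    /* 20-bit raw limit; scale to bytes when the granularity bit is set. */
    uint32_t raw_limit = (lo & 0xFFFFu) | (hi & 0x000F0000u);
    s.limit = (hi & (1u << 23)) ? ((raw_limit << 12) | 0xFFFu) : raw_limit;

    /* Type, S, DPL, P land in bits 0..7 and AVL, L, D/B, G in bits 12..15.
     * Bits 8..11 (limit 19:16 in the descriptor) are reserved and cleared. */
    s.access_rights = (hi >> 8) & 0xF0FFu;
    return s;
}
```

For example, the common flat 32-bit code descriptor `0x00CF9B000000FFFF` decodes to base 0, limit `0xFFFFFFFF` and access rights `0xC09B`. Comparing such decoded values against what was actually written to the VMCS is a quick way to spot a wrong base or limit.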
 #31799  by tangptr
 Mon Jul 09, 2018 2:56 am
Thank you for the reminder. I have found my critical problem now: I forgot to set the structure alignment (packing) of the GDT entries, so I read wrong segment base addresses.
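
In case it helps someone else, here is a minimal sketch of the kind of mistake I mean. The field layout below is only an illustration and not my exact code: a 16-bit `attributes` field at byte offset 5 gets padded by the compiler unless the structure is packed, which shifts every field after it. The sketch assumes a little-endian target and a compiler that accepts `#pragma pack` (MSVC, GCC and Clang do).

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Default alignment: the compiler inserts a padding byte before
 * `attributes`, so this no longer matches the hardware layout
 * (typically 10 bytes instead of 8 on common ABIs). */
typedef struct {
    uint16_t limit_low;
    uint16_t base_low;
    uint8_t  base_mid;
    uint16_t attributes;   /* access byte + flags + limit 19:16 */
    uint8_t  base_high;
} gdt_entry_unpacked;

/* Same fields, packed to 1-byte alignment: matches the 8-byte descriptor. */
#pragma pack(push, 1)
typedef struct {
    uint16_t limit_low;
    uint16_t base_low;
    uint8_t  base_mid;
    uint16_t attributes;
    uint8_t  base_high;
} gdt_entry;
#pragma pack(pop)

/* Catch the layout mistake at compile time instead of via a triple fault. */
_Static_assert(sizeof(gdt_entry) == 8, "GDT entry must be 8 bytes");
_Static_assert(offsetof(gdt_entry, base_high) == 7, "base_high must be byte 7");

/* Read the 32-bit segment base for a selector from a GDT image. */
static uint32_t gdt_entry_base(const void *gdt, uint16_t selector)
{
    assert(gdt != NULL);
    gdt_entry e;
    /* Selector bits 15:3 are the index, so (selector & 0xFFF8) is the byte offset. */
    memcpy(&e, (const uint8_t *)gdt + (selector & 0xFFF8u), sizeof e);
    return (uint32_t)e.base_low
         | ((uint32_t)e.base_mid << 16)
         | ((uint32_t)e.base_high << 24);
}
```

With the unpacked version, both the array stride and the offset of `base_high` are wrong, so the base written to the guest-state area is garbage. The static assertions are cheap insurance against this.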