Changeset 71279 in vbox for trunk/src/VBox/VMM/VMMR3
- Timestamp:
- Mar 8, 2018 7:56:47 PM
- svn:sync-xref-src-repo-rev:
- 121208
- File:
-
- 1 edited
trunk/src/VBox/VMM/VMMR3/NEMR3.cpp
--- trunk/src/VBox/VMM/VMMR3/NEMR3.cpp (r71275)
+++ trunk/src/VBox/VMM/VMMR3/NEMR3.cpp (r71279)
@@ -24,7 +24,9 @@
  *
  * On Windows the Hyper-V root partition (dom0 in zen terminology) does not have
- * nested VT-x or AMD-V capabilities. For a while raw-mode worked in it,
- * however now we \#GP when modifying CR4. So, when Hyper-V is active on
- * Windows we have little choice but to use Hyper-V to run our VMs.
+ * nested VT-x or AMD-V capabilities. For a while raw-mode worked inside it,
+ * but for a while now we've been getting \#GP when trying to modify CR4 in the
+ * world switcher. So, when Hyper-V is active on Windows we have little choice
+ * but to use Hyper-V to run our VMs.
+ *
  *
  * @subsection subsec_nem_win_whv The WinHvPlatform API
@@ -34,5 +36,5 @@
  * This interface is a wrapper around the undocumented Virtualization
  * Infrastructure Driver (VID) API - VID.DLL and VID.SYS. The wrapper is
- * written in C++, namespaced and early version (at least) was using standard
+ * written in C++, namespaced, early versions (at least) was using standard C++
  * container templates in several places.
  *
@@ -73,14 +75,22 @@
  * Running guest code is done thru the WHvRunVirtualProcessor function. It
  * asynchronously starts or resumes hyper-V CPU execution and then waits for an
- * VMEXIT message. Other threads can interrupt the execution by using
- * WHvCancelVirtualProcessor, which which case the thread in
- * WHvRunVirtualProcessor is woken up via a dummy QueueUserAPC and will call
- * VidStopVirtualProcessor to asynchronously end execution. The stop CPU call
- * not immediately succeed if the CPU encountered a VMEXIT before the stop was
- * processed, in which case the VMEXIT needs to be processed first, and the
- * pending stop will be processed in a subsequent call to
- * WHvRunVirtualProcessor.
- *
- * {something about registers}
+ * VMEXIT message. Hyper-V / VID.SYS will return information about the message
+ * in the message buffer mapping, and WHvRunVirtualProcessor will convert that
+ * into it's own WHV_RUN_VP_EXIT_CONTEXT format.
+ *
+ * Other threads can interrupt the execution by using WHvCancelVirtualProcessor,
+ * which which case the thread in WHvRunVirtualProcessor is woken up via a dummy
+ * QueueUserAPC and will call VidStopVirtualProcessor to asynchronously end
+ * execution. The stop CPU call not immediately succeed if the CPU encountered
+ * a VMEXIT before the stop was processed, in which case the VMEXIT needs to be
+ * processed first, and the pending stop will be processed in a subsequent call
+ * to WHvRunVirtualProcessor.
+ *
+ * Registers are retrieved and set via WHvGetVirtualProcessorRegisters and
+ * WHvSetVirtualProcessorRegisters. In addition, several VMEXITs include
+ * essential register state in the exit context information, potentially making
+ * it possible to emulate the instruction causing the exit without involving
+ * WHvGetVirtualProcessorRegisters.
+ *
  *
  * @subsubsection subsubsec_nem_win_whv_cons Issues / Disadvantages
@@ -92,9 +102,11 @@
  * the VidMessageSlotHandleAndGetNext call.
  *
- * IIRC this will make the kernel schedule the callback thru
+ * IIRC this will make the kernel schedule the specified callback thru
  * NTDLL!KiUserApcDispatcher by modifying the thread context and quite
  * possibly the userland thread stack. When the APC callback returns to
  * KiUserApcDispatcher, it will call NtContinue to restore the old thread
- * context and resume execution from there. Upshot this is a bit expensive.
+ * context and resume execution from there. This naturally adds up to some
+ * CPU cycles, ring transitions aren't for free, especially after Spectre &
+ * Meltdown mitigations.
  *
  * Using NtAltertThread call could do the same without the thread context
@@ -122,50 +134,64 @@
  * Since MMIO is currently realized as unmapped GPA, this will slow down all
  * MMIO accesses a tiny little bit as WHvRunVirtualProcessor looks up the
- * guest physical address the checks if it's a pending lazy mapping.
+ * guest physical address to check if it is a pending lazy mapping.
+ *
+ * The lazy mapping feature makes no sense to us. We as API user have all the
+ * information and can do lazy mapping ourselves if we want/have to (see next
+ * point).
  *
  *
  * - There is no API for modifying protection of a page within a GPA range.
  *
- * We're left with having to unmap the range and then remap it with the new
- * protection. For instance we're actively using this to track dirty VRAM
- * pages, which means there are occational readonly->writable transitions at
- * run time followed by bulk reversal to readonly when the display is
- * refreshed.
- *
- * Now to work around the issue, we do page sized GPA ranges. In addition to
- * add a lot of tracking overhead to WinHvPlatform and VID.SYS, it also causes
- * us to exceed our quota before we've even mapped a default sized VRAM
- * page-by-page. So, to work around this quota issue we have to lazily map
- * pages and actively restrict the number of mappings.
- *
- * Out best workaround thus far is bypassing WinHvPlatform and VID when in
- * comes to memory and instead us the hypercalls to do it (HvCallMapGpaPages,
- * HvCallUnmapGpaPages). (This also maps a whole lot better into our own
- * guest page management infrastructure.)
- *
- *
- * - Observed problems doing WHvUnmapGpaRange followed by WHvMapGpaRange.
+ * From what we can tell, the only way to modify the protection (like readonly
+ * -> writable, or vice versa) is to first unmap the range and then remap it
+ * with the new protection.
+ *
+ * We are for instance doing this quite a bit in order to track dirty VRAM
+ * pages. VRAM pages starts out as readonly, when the guest writes to a page
+ * we take an exit, notes down which page it is, makes it writable and restart
+ * the instruction. After refreshing the display, we reset all the writable
+ * pages to readonly again, bulk fashion.
+ *
+ * Now to work around this issue, we do page sized GPA ranges. In addition to
+ * add a lot of tracking overhead to WinHvPlatform and VID.SYS, this also
+ * causes us to exceed our quota before we've even mapped a default sized
+ * (128MB) VRAM page-by-page. So, to work around this quota issue we have to
+ * lazily map pages and actively restrict the number of mappings.
+ *
+ * Our best workaround thus far is bypassing WinHvPlatform and VID entirely
+ * when in comes to guest memory management and instead use the underlying
+ * hypercalls (HvCallMapGpaPages, HvCallUnmapGpaPages) to do it ourselves.
+ * (This also maps a whole lot better into our own guest page management
+ * infrastructure.)
+ *
+ *
+ * - Observed problems doing WHvUnmapGpaRange immediately followed by
+ * WHvMapGpaRange.
  *
  * As mentioned above, we've been forced to use this sequence when modifying
- * page protection. However, when upgrading from readonly to writable, we've
- * ended up looping forever with the same write to readonly memory exit.
+ * page protection. However, when transitioning from readonly to writable,
+ * we've ended up looping forever with the same write to readonly memory
+ * VMEXIT. We're wondering if this issue might be related to the lazy mapping
+ * logic in WinHvPlatform.
  *
  * Workaround: Insert a WHvRunVirtualProcessor call and make sure to get a GPA
- * unmapped exit between the two calls. Terrible for performance and code
- * sanity.
- *
- *
- * - WHVRunVirtualProcessor wastes time converting VID/Hyper-V messages to it's
- * own defined format.
+ * unmapped exit between the two calls. Not entirely great performance wise
+ * (or the santity of our code).
+ *
+ *
+ * - WHVRunVirtualProcessor wastes time converting VID/Hyper-V messages to its
+ * own format (WHV_RUN_VP_EXIT_CONTEXT).
  *
  * We understand this might be because Microsoft wishes to remain free to
  * modify the VID/Hyper-V messages, but it's still rather silly and does slow
- * things down.
+ * things down a little. We'd much rather just process the messages directly.
  *
  *
  * - WHVRunVirtualProcessor would've benefited from using a callback interface:
+ *
  * - The potential size changes of the exit context structure wouldn't be
  * an issue, since the function could manage that itself.
- * - State handling could be optimized simplified (esp. cancellation).
+ *
+ * - State handling could probably be simplified (like cancelation).
  *
  *
@@ -173,10 +199,13 @@
  * internally converts register names, probably using temporary heap buffers.
  *
- * From the looks of things, it's converting from WHV_REGISTER_NAME to
- * HV_REGISTER_NAME that's documented in the "Virtual Processor Register
- * Names" section of "Hypervisor Top-Level Functional Specification". This
- * feels like an awful waste of time. We simply cannot understand why it
- * wouldn't have sufficed to use HV_REGISTER_NAME here and simply checked the
- * input values if restrictions were desired.
+ * From the looks of things, they are converting from WHV_REGISTER_NAME to
+ * HV_REGISTER_NAME from in the "Virtual Processor Register Names" section in
+ * the "Hypervisor Top-Level Functional Specification" document. This feels
+ * like an awful waste of time.
+ *
+ * We simply cannot understand why HV_REGISTER_NAME isn't used directly here,
+ * or at least the same values, making any conversion reduntant. Restricting
+ * access to certain registers could easily be implement by scanning the
+ * inputs.
  *
  * To avoid the heap + conversion overhead, we're currently using the
@@ -184,12 +213,66 @@
  *
  *
+ * - The YMM and XCR0 registers are not yet named (17083). This probably
+ * wouldn't be a problem if HV_REGISTER_NAME was used, see previous point.
+ *
+ *
  * - Why does WINHVR.SYS (or VID.SYS) only query/set 32 registers at the time
  * thru the HvCallGetVpRegisters and HvCallSetVpRegisters hypercalls?
  *
  * We've not trouble getting/setting all the registers defined by
- * WHV_REGISTER_NAME in one hypercall...
- *
- *
- * - .
+ * WHV_REGISTER_NAME in one hypercall (around 80)...
+ *
+ *
+ * - The I/O port exit context information seems to be missing the address size
+ * information needed for correct string I/O emulation.
+ *
+ * VT-x provides this information in bits 7:9 in the instruction information
+ * field on newer CPUs. AMD-V in bits 7:9 in the EXITINFO1 field in the VMCB.
+ *
+ * We can probably work around this by scanning the instruction bytes for
+ * address size prefixes. Haven't investigated it any further yet.
+ *
+ *
+ * - The WHvGetCapability function has a weird design:
+ * - The CapabilityCode parameter is pointlessly duplicated in the output
+ * structure (WHV_CAPABILITY).
+ *
+ * - API takes void pointer, but everyone will probably be using
+ * WHV_CAPABILITY due to WHV_CAPABILITY::CapabilityCode making it
+ * impractical to use anything else.
+ *
+ * - No output size.
+ *
+ * - See GetFileAttributesEx, GetFileInformationByHandleEx,
+ * FindFirstFileEx, and others for typical pattern for generic
+ * information getters.
+ *
+ *
+ * - The WHvGetPartitionProperty function uses the same weird design as
+ * WHvGetCapability, see above.
+ *
+ *
+ * - The WHvSetPartitionProperty function has a totally weird design too:
+ * - In contrast to its partner WHvGetPartitionProperty, the property code
+ * is not a separate input parameter here but part of the input
+ * structure.
+ *
+ * - The input structure is a void pointer rather than a pointer to
+ * WHV_PARTITION_PROPERTY which everyone probably will be using because
+ * of the WHV_PARTITION_PROPERTY::PropertyCode field.
+ *
+ * - Really, why use PVOID for the input when the function isn't accepting
+ * minimal sizes. E.g. WHVPartitionPropertyCodeProcessorClFlushSize only
+ * requires a 9 byte input, but the function insists on 16 bytes (17083).
+ *
+ * - See GetFileAttributesEx, SetFileInformationByHandle, FindFirstFileEx,
+ * and others for typical pattern for generic information setters and
+ * getters.
+ *
+ *
+ * @subsection subsec_nem_win_impl Our implementation.
+ *
+ * Tomorrow...
+ *
  *
  */
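The added comment text suggests that the missing address size for string I/O exits can probably be recovered by scanning the instruction bytes for address size prefixes, and notes that this has not been investigated further. The following is only a minimal sketch of that idea, not code from this changeset or from VirtualBox. The names `CpuMode` and `effectiveAddressBits` are hypothetical, and the sketch assumes the caller already has the raw instruction bytes and knows the default code size of the guest.

```cpp
#include <cstddef>
#include <cstdint>

// Hypothetical: default code size of the guest at the time of the exit.
enum class CpuMode { Bits16, Bits32, Bits64 };

// Hypothetical helper: walks the leading prefix bytes of one x86 instruction
// and returns the effective address size in bits. Scanning stops at the first
// byte that is not a prefix, which is taken to be the opcode.
inline unsigned effectiveAddressBits(const std::uint8_t *pbInstr, std::size_t cbInstr, CpuMode mode)
{
    bool fAddrSizePrefix = false;
    for (std::size_t i = 0; i < cbInstr; i++)
    {
        const std::uint8_t b = pbInstr[i];
        bool fPrefix = false;
        switch (b)
        {
            case 0x67:                          // address size override
                fAddrSizePrefix = true;
                fPrefix = true;
                break;
            case 0x66:                          // operand size override
            case 0xf0: case 0xf2: case 0xf3:    // LOCK, REPNE, REP
            case 0x26: case 0x2e: case 0x36:    // ES, CS, SS overrides
            case 0x3e: case 0x64: case 0x65:    // DS, FS, GS overrides
                fPrefix = true;
                break;
            default:
                // 0x40..0x4f are REX prefixes in 64-bit mode only; elsewhere
                // they are ordinary INC/DEC opcodes.
                fPrefix = mode == CpuMode::Bits64 && (b & 0xf0) == 0x40;
                break;
        }
        if (!fPrefix)
            break;
    }

    // The 0x67 prefix toggles between the default and the alternate size.
    switch (mode)
    {
        case CpuMode::Bits16: return fAddrSizePrefix ? 32 : 16;
        case CpuMode::Bits32: return fAddrSizePrefix ? 16 : 32;
        case CpuMode::Bits64: return fAddrSizePrefix ? 32 : 64;
    }
    return 0; // not reached
}
```

For example, the bytes `67 f3 6e` (an address-size-prefixed `rep outsb`) yield 16 for a 32-bit guest and 32 for a 64-bit guest. Whether the instruction bytes are actually available at that point in the exit handling is not something the changeset states.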