Changeset 4518 in vbox
Timestamp: Sep 4, 2007 2:41:22 PM (17 years ago)
File: 1 edited
trunk/src/VBox/VMM/PGM.cpp
--- trunk/src/VBox/VMM/PGM.cpp (r4511)
+++ trunk/src/VBox/VMM/PGM.cpp (r4518)
@@ -21,5 +21,5 @@
  *
  *
- * @section sec_pg_modesPaging Modes
+ * @section sec_pgm_modes Paging Modes
  *
  * There are three memory contexts: Host Context (HC), Guest Context (GC)
@@ -49,5 +49,5 @@
  *
  *
- * @section sec_pg_shwThe Shadow Memory Context
+ * @section sec_pgm_shw The Shadow Memory Context
  *
  *
@@ -63,5 +63,5 @@
  *
  *
- * @section sec_pg_intThe Intermediate Memory Context
+ * @section sec_pgm_int The Intermediate Memory Context
  *
  * The world switch goes thru an intermediate memory context which purpose it is
@@ -79,5 +79,5 @@
  *
  *
- * @subsection subsec_pg_int_gcGuest Context Mappings
+ * @subsection subsec_pgm_int_gc Guest Context Mappings
  *
  * During assignment and relocation of a guest context mapping the intermediate
@@ -90,7 +90,7 @@
  *
  *
- * @section sec_pg_miscMisc
- *
- * @subsection subsec_pg_misc_diffDifferences Between Legacy PAE and Long Mode PAE
+ * @section sec_pgm_misc Misc
+ *
+ * @subsection subsec_pgm_misc_diff Differences Between Legacy PAE and Long Mode PAE
  *
  * The differences between legacy PAE and long mode PAE are:
@@ -116,5 +116,13 @@
  *
  *
- * @subsection subsec_pg_pgmPhys_AllocPage Allocating a page.
+ * @subsection subsec_pgmPhys_Definitions Definitions
+ *
+ * Allocation chunk - A RTR0MemObjAllocPhysNC object and the tracking
+ * machinery associated with it.
+ *
+ *
+ *
+ *
+ * @subsection subsec_pgmPhys_AllocPage Allocating a page.
  *
  * Initially we map *all* guest memory to the (per VM) zero page, which
@@ -224,5 +232,5 @@
  *
  *
- * @subsection subsec_pgmPhys_SerializingTracking Structures And Their Cost
+ * @subsection subsec_pgmPhys_Tracking Tracking Structures And Their Cost
  *
  * There's a difficult balance between keeping the per-page tracking structures
@@ -233,5 +241,146 @@
  * to 32GB of memory on a 32-bit system and essentially unlimited on 64-bit ones.
  *
- * ...
+ *
+ * @subsubsection subsubsec_pgmPhys_Tracking_Kernel Kernel Space
+ *
+ * The allocation chunks are of fixed size, the size defined at build time.
+ * Each chunk is given a unique ID. Each page can be addressed by
+ * (idChunk << CHUNK_SHIFT) | iPage, where CHUNK_SHIFT is log2(cbChunk / PAGE_SIZE).
+ * Meaning that each page has a unique ID, a sort of virtual page frame number
+ * if you like, so that a page can be referenced in an efficient manner.
+ * No surprise, the allocation chunks are organized in an AVL tree with
+ * their IDs being the key.
+ *
+ * The physical address of each page in an allocation chunk is maintained by
+ * the RTR0MEMOBJ and obtained using RTR0MemObjGetPagePhysAddr. There is no
+ * need to duplicate this information unnecessarily.
+ *
+ * We wish to maintain a reference to the VM owning the page. For the purposes
+ * of defragmenting allocation chunks, it would make sense to keep track of
+ * which page within the VM it's being used as, although this will
+ * obviously make the handy pages a wee more work to realize. For shared
+ * pages we need a reference count so we know when to free the page. But tracking
+ * which VMs are using shared pages will be too complicated and expensive, so we'll
+ * just forget about it. And finally, free pages need to be chained somehow,
+ * so we can do allocations in an efficient manner.
+ *
+ * Putting shared pages in dedicated allocation chunks will simplify matters
+ * quite a bit. It will more or less eliminate the problem with defragmenting
+ * shared pages, by arranging it so that we will never encounter shared pages
+ * and normal pages in the same allocation chunks. And it will I think permit
+ * us to get away with a 32-bit field for each page.
+ *
+ * We'll chain the free pages using this field to indicate the index of the
+ * next page. (I'm undecided whether this chain should be on a per-chunk
+ * level or not, it depends a bit on whether it's desirable to keep chunks
+ * with free pages in a priority list by free page count (ascending) in order
+ * to maximize the number of full chunks.) In any case, there'll be two free
+ * lists, one for shared pages and one for normal pages.
+ *
+ * Shared pages that have been allocated will use the 32-bit field for keeping
+ * the reference counter.
+ *
+ * Normal pages that have been allocated will use the first 24 bits for guest
+ * page frame number (i.e. shift by PAGE_SHIFT and you'll have the physical
+ * address, all 24 bits set means unknown or out of range). The top 8 bits will
+ * be used as VM handle index - we assign each VM a unique handle [0..255] for
+ * this purpose. This implies a max of 256 VMs and 64GB of base RAM per VM.
+ * Neither limit should cause any trouble for the time being.
+ *
+ * The per page cost in kernel space is 32-bit plus whatever RTR0MEMOBJ
+ * entails. In addition there is the chunk cost of approximately
+ * (sizeof(RTR0MEMOBJ) + sizeof(CHUNK)) / 2^CHUNK_SHIFT bytes per page.
+ *
+ * On Windows the per page RTR0MEMOBJ cost is 32-bit on 32-bit Windows
+ * and 64-bit on 64-bit Windows (a PFN_NUMBER in the MDL). So, 64-bit per page.
+ * The cost on Linux is identical, but here it's because of sizeof(struct page *).
+ *
+ *
+ * @subsubsection subsubsec_pgmPhys_Tracking_PerVM Per-VM
+ *
+ * Fixed info is the physical address of the page (HCPhys) and the page id
+ * (described above). Theoretically we'll need 48(-12) bits for the HCPhys part.
+ * Today we're restricting ourselves to 40(-12) bits because this is the current
+ * restriction of all AMD64 implementations (I think Barcelona will up this
+ * to 48(-12) bits, not that it really matters) and I needed the bits for
+ * tracking mappings of a page. 48-12 = 36. That leaves 28 bits, which means a
+ * decent range for the page id: 2^(28+12) = 1TB.
+ *
+ * In addition to these, we'll have to keep maintaining the page flags as we
+ * currently do. Although it wouldn't harm to optimize these quite a bit, like
+ * for instance the ROM shouldn't depend on having a write handler installed
+ * in order for it to become read-only. A RO/RW bit should be considered so
+ * that the page syncing code doesn't have to mess about checking multiple
+ * flag combinations (ROM || RW handler || write monitored) in order to
+ * figure out how to set up a shadow PTE. But this of course, is second
+ * priority at present. Currently this requires 12 bits, but could probably
+ * be optimized to ~8.
+ *
+ * Then there's the 24 bits used to track which shadow page tables are
+ * currently mapping a page for the purpose of speeding up physical
+ * access handlers, and thereby the page pool cache. More bits for this
+ * purpose wouldn't hurt IIRC.
+ *
+ * Then there is a new bit in which we need to record what kind of page
+ * this is, shared, zero, normal or write-monitored-normal. This'll
+ * require 2 bits. One bit might be needed for indicating whether a
+ * write monitored page has been written to. And yet another one or
+ * two for tracking migration status. 3-4 bits total then.
+ *
+ * Whatever is left can be used to record the shareability of a
+ * page. The page checksum will not be stored in the per-VM table as
+ * the idle thread will not be permitted to do modifications to it.
+ * It will instead have to keep its own working set of potentially
+ * shareable pages and their checksums and stuff.
+ *
+ * For the present we'll keep the current packing of the
+ * PGMRAMRANGE::aHCPhys to keep the changes simple, only of course,
+ * we'll have to change it to a struct with a total of 128-bits at
+ * our disposal.
+ *
+ * The initial layout will be like this:
+ * @verbatim
+    RTHCPHYS HCPhys;            The current stuff.
+        63:40                   Current shadow PT tracking stuff.
+        39:12                   The physical page frame number.
+        11:0                    The current flags.
+    uint32_t u28PageId : 28;    The page id.
+    uint32_t u2State : 2;       The page state { zero, shared, normal, write monitored }.
+    uint32_t fWrittenTo : 1;    Whether a write monitored page was written to.
+    uint32_t u1Reserved : 1;    Reserved for later.
+    uint32_t u32Reserved;       Reserved for later, mostly sharing stats.
+ @endverbatim
+ *
+ * The final layout will be something like this:
+ * @verbatim
+    RTHCPHYS HCPhys;            The current stuff.
+        63:48                   High page id (12+).
+        47:12                   The physical page frame number.
+        11:0                    Low page id.
+    uint32_t fReadOnly : 1;     Whether it's readonly page (rom or monitored in some way).
+    uint32_t u3Type : 3;        The page type {RESERVED, MMIO, MMIO2, ROM, shadowed ROM, RAM}.
+    uint32_t u2PhysMon : 2;     Physical access handler type {none, read, write, all}.
+    uint32_t u2VirtMon : 2;     Virtual access handler type {none, read, write, all}.
+    uint32_t u2State : 2;       The page state { zero, shared, normal, write monitored }.
+    uint32_t fWrittenTo : 1;    Whether a write monitored page was written to.
+    uint32_t u20Reserved : 20;  Reserved for later, mostly sharing stats.
+    uint32_t u32Reserved : ;    Reserved for later, mostly sharing stats.
+    uint32_t u32Tracking;       The shadow PT tracking stuff, roughly.
+ @endverbatim
+ *
+ * Cost wise, this means we'll double the cost for guest memory. There isn't any way
+ * around that I'm afraid. It means that the cost of dealing out 32GB of memory
+ * to one or more VMs is: (32GB >> PAGE_SHIFT) * 16 bytes, or 128MB. Or another
+ * example, the VM heap cost when assigning 1GB to a VM will be: 4MB.
+ *
+ * A couple of cost examples for the total cost per-VM + kernel.
+ * 32-bit Windows and 32-bit Linux:
+ *      1GB guest ram, 256K pages:    4MB +  2MB(+) =   6MB
+ *      4GB guest ram, 1M pages:     16MB +  8MB(+) =  24MB
+ *     32GB guest ram, 8M pages:    128MB + 64MB(+) = 192MB
+ * 64-bit Windows and 64-bit Linux:
+ *      1GB guest ram, 256K pages:    4MB +  3MB(+) =   7MB
+ *      4GB guest ram, 1M pages:     16MB + 12MB(+) =  28MB
+ *     32GB guest ram, 8M pages:    128MB + 96MB(+) = 224MB
  *
  *
@@ -280,4 +429,10 @@
  * The 3rd step is identical to what we're already doing when updating a
  * physical handler, see pgmHandlerPhysicalSetRamFlagsAndFlushShadowPTs.
+ *
+ *
+ * @subsection subsec_pgmPhys_Changes Changes
+ *
+ * Breakdown of the changes involved...
+ *
  *
  */
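
The page-id and cost arithmetic described in the new "Kernel Space" and "Per-VM" comment text can be checked with a small sketch. This is not code from the changeset. Every name prefixed with `sketch` or `SKETCH_` is hypothetical, the chunk shift of 9 (a 2MB chunk of 4KB pages) is an assumed value because the comment only says the chunk size is fixed at build time, and placing the guest page frame number in the low 24 bits is one reading of "the first 24 bits".

```c
#include <assert.h>
#include <stdint.h>

/* Assumed values for illustration only; none of these come from the changeset. */
#define SKETCH_PAGE_SHIFT   12u          /* 4KB pages */
#define SKETCH_CHUNK_SHIFT  9u           /* hypothetical: log2(2MB chunk / 4KB page) */
#define SKETCH_GFN_UNKNOWN  0x00ffffffu  /* all 24 bits set: unknown or out of range */

/* Page id = (idChunk << CHUNK_SHIFT) | iPage, as described in the comment. */
static uint32_t sketchMakePageId(uint32_t idChunk, uint32_t iPage)
{
    assert(iPage < (1u << SKETCH_CHUNK_SHIFT));
    return (idChunk << SKETCH_CHUNK_SHIFT) | iPage;
}

/* 32-bit field of an allocated normal page: top 8 bits hold the VM handle
   index, the remaining 24 bits hold the guest page frame number (assumed to
   be the low bits). Frame numbers that do not fit become the "unknown" value. */
static uint32_t sketchPackNormalPage(uint8_t hVM, uint32_t iGuestPfn)
{
    if (iGuestPfn > SKETCH_GFN_UNKNOWN)
        iGuestPfn = SKETCH_GFN_UNKNOWN;
    return ((uint32_t)hVM << 24) | iGuestPfn;
}

/* Total tracking cost in bytes for cbGuestRam bytes of guest memory:
   16 bytes per page in the per-VM table, plus the 32-bit kernel field and
   the per-page RTR0MEMOBJ entry (4 bytes on 32-bit hosts, 8 on 64-bit).
   The per-chunk overhead, the "(+)" in the cost table, is left out. */
static uint64_t sketchTrackingCost(uint64_t cbGuestRam, unsigned cbMemObjEntry)
{
    uint64_t cPages = cbGuestRam >> SKETCH_PAGE_SHIFT;
    return cPages * 16 + cPages * (4 + cbMemObjEntry);
}
```

Under these assumptions, `sketchTrackingCost(1ull << 30, 4)` gives 6MB and `sketchTrackingCost(32ull << 30, 8)` gives 224MB, which agrees with the first and last rows of the cost table in the comment.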