VirtualBox

Changeset 4518 in vbox


Timestamp:
Sep 4, 2007 2:41:22 PM (17 years ago)
Author:
vboxsync
Message:

more words.

File:
1 edited

  • trunk/src/VBox/VMM/PGM.cpp

    r4511 r4518  
    2121 *
    2222 *
    23  * @section         sec_pg_modes            Paging Modes
     23 * @section         sec_pgm_modes           Paging Modes
    2424 *
    2525 * There are three memory contexts: Host Context (HC), Guest Context (GC)
     
    4949 *
    5050 *
    51  * @section         sec_pg_shw              The Shadow Memory Context
     51 * @section         sec_pgm_shw             The Shadow Memory Context
    5252 *
    5353 *
     
    6363 *
    6464 *
    65  * @section         sec_pg_int              The Intermediate Memory Context
     65 * @section         sec_pgm_int             The Intermediate Memory Context
    6666 *
    6767 * The world switch goes through an intermediate memory context whose purpose it is
     
    7979 *
    8080 *
    81  * @subsection      subsec_pg_int_gc        Guest Context Mappings
     81 * @subsection      subsec_pgm_int_gc       Guest Context Mappings
    8282 *
    8383 * During assignment and relocation of a guest context mapping the intermediate
     
    9090 *
    9191 *
    92  * @section         sec_pg_misc             Misc
    93  *
    94  * @subsection      subsec_pg_misc_diff     Differences Between Legacy PAE and Long Mode PAE
     92 * @section         sec_pgm_misc            Misc
     93 *
     94 * @subsection      subsec_pgm_misc_diff    Differences Between Legacy PAE and Long Mode PAE
    9595 *
    9696 * The differences between legacy PAE and long mode PAE are:
     
    116116 *
    117117 *
    118  * @subsection subsec_pg_pgmPhys_AllocPage      Allocating a page.
     118 * @subsection subsec_pgmPhys_Definitions       Definitions
     119 *
     120 * Allocation chunk - An RTR0MemObjAllocPhysNC object and the tracking
     121 * machinery associated with it.
     122 *
     123 *
     124 *
     125 *
     126 * @subsection subsec_pgmPhys_AllocPage         Allocating a page.
    119127 *
    120128 * Initially we map *all* guest memory to the (per VM) zero page, which
     
    224232 *
    225233 *
    226  * @subsection subsec_pgmPhys_Serializing       Tracking Structures And Their Cost
     234 * @subsection subsec_pgmPhys_Tracking      Tracking Structures And Their Cost
    227235 *
    228236 * There's a difficult balance between keeping the per-page tracking structures
     
    233241 * to 32GB of memory on a 32-bit system and essentially unlimited on 64-bit ones.
    234242 *
    235  * ...
     243 *
     244 * @subsubsection subsubsec_pgmPhys_Tracking_Kernel     Kernel Space
     245 *
     246 * The allocation chunks are of fixed size, with the size defined at build time.
     247 * Each chunk is given a unique ID. Each page can be addressed by
     248 * (idChunk << CHUNK_SHIFT) | iPage, where CHUNK_SHIFT is log2(cbChunk / PAGE_SIZE).
     249 * This means that each page has a unique ID, a sort of virtual page frame number
     250 * if you like, so that a page can be referenced efficiently.
     251 * Unsurprisingly, the allocation chunks are organized in an AVL tree keyed
     252 * by their IDs.
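The page ID scheme can be sketched in C as below. This is a minimal illustration that assumes 4KB pages and a hypothetical 1MB chunk size (so CHUNK_SHIFT = 8); the real chunk size is a build-time constant not stated here, and the helper names are made up for the example.

```c
#include <assert.h>
#include <stdint.h>

/* Assumed values for illustration only: 4KB pages and a hypothetical
 * 1MB allocation chunk, i.e. 256 pages per chunk. */
#define SKETCH_PAGE_SIZE  4096u
#define SKETCH_CHUNK_SIZE (1024u * 1024u)
#define CHUNK_SHIFT       8u /* log2(SKETCH_CHUNK_SIZE / SKETCH_PAGE_SIZE) */

/* Compose a page ID from a chunk ID and the page index within that chunk. */
static uint32_t pageIdMake(uint32_t idChunk, uint32_t iPage)
{
    assert(iPage < SKETCH_CHUNK_SIZE / SKETCH_PAGE_SIZE);
    return (idChunk << CHUNK_SHIFT) | iPage;
}

/* The chunk ID, i.e. the AVL tree key, is the page ID shifted down. */
static uint32_t pageIdToChunk(uint32_t idPage)
{
    return idPage >> CHUNK_SHIFT;
}

/* The low CHUNK_SHIFT bits give the page index within the chunk. */
static uint32_t pageIdToIndex(uint32_t idPage)
{
    return idPage & ((1u << CHUNK_SHIFT) - 1u);
}
```

Under these assumptions, page 5 of chunk 3 gets the ID (3 << 8) | 5 = 773, and shifting it right by CHUNK_SHIFT yields the AVL tree key again.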
     253 *
     254 * The physical address of each page in an allocation chunk is maintained by
     255 * the RTR0MEMOBJ and obtained using RTR0MemObjGetPagePhysAddr. There is no
     256 * need to duplicate this information.
     257 *
     258 * We wish to maintain a reference to the VM owning the page. For the purposes
     259 * of defragmenting allocation chunks, it would make sense to keep track of
     260 * which page within the VM it is being used as, although this will
     261 * obviously make the handy pages a wee bit more work to realize. For shared
     262 * pages we need a reference count so we know when to free the page. But tracking
     263 * which VMs are using shared pages would be too complicated and expensive, so we'll
     264 * just forget about it. And finally, free pages need to be chained somehow,
     265 * so we can do allocations in an efficient manner.
     266 *
     267 * Putting shared pages in dedicated allocation chunks will simplify matters
     268 * quite a bit. It will more or less eliminate the problem with defragmenting
     269 * shared pages, by arranging it so that we never encounter shared pages
     270 * and normal pages in the same allocation chunk. And I think it will permit
     271 * us to get away with a 32-bit field for each page.
     272 *
     273 * We'll chain the free pages using this field to indicate the index of the
     274 * next page. (I'm undecided whether this chain should be on a per-chunk
     275 * level or not, it depends a bit on whether it's desirable to keep chunks
     276 * with free pages in a priority list by free page count (ascending) in order
     277 * to maximize the number of full chunks.) In any case, there'll be two free
     278 * lists, one for shared pages and one for normal pages.
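One way to picture the free-page chain is a singly linked list threaded through the 32-bit per-page field. This sketch picks the per-chunk variant the text is undecided about, assumes 256 pages per chunk, and uses hypothetical names (CHUNKSKETCH, FREE_NIL) throughout.

```c
#include <assert.h>
#include <stdint.h>

#define PAGES_PER_CHUNK 256u       /* assumed: 1MB chunk of 4KB pages */
#define FREE_NIL        UINT32_MAX /* hypothetical end-of-chain marker */

/* Hypothetical per-chunk tracking: one 32-bit word per page. While a
 * page is free, its word holds the index of the next free page. */
typedef struct CHUNKSKETCH
{
    uint32_t iFreeHead;                  /* first free page, or FREE_NIL */
    uint32_t cFree;                      /* number of free pages */
    uint32_t au32Pages[PAGES_PER_CHUNK]; /* the per-page 32-bit field */
} CHUNKSKETCH;

/* Chain every page of a fresh chunk onto its free list. */
static void chunkInit(CHUNKSKETCH *pChunk)
{
    for (uint32_t i = 0; i < PAGES_PER_CHUNK; i++)
        pChunk->au32Pages[i] = i + 1 < PAGES_PER_CHUNK ? i + 1 : FREE_NIL;
    pChunk->iFreeHead = 0;
    pChunk->cFree = PAGES_PER_CHUNK;
}

/* Pop the head of the free chain; returns the page index, or FREE_NIL
 * when the chunk is full. The caller then overwrites the page's word
 * with its allocated-state value (refcount or PFN + VM index). */
static uint32_t chunkAllocPage(CHUNKSKETCH *pChunk)
{
    uint32_t iPage = pChunk->iFreeHead;
    if (iPage != FREE_NIL)
    {
        pChunk->iFreeHead = pChunk->au32Pages[iPage];
        pChunk->cFree--;
    }
    return iPage;
}

/* Push a page back onto the head of the free chain. */
static void chunkFreePage(CHUNKSKETCH *pChunk, uint32_t iPage)
{
    assert(iPage < PAGES_PER_CHUNK);
    pChunk->au32Pages[iPage] = pChunk->iFreeHead;
    pChunk->iFreeHead = iPage;
    pChunk->cFree++;
}
```

Pushing and popping at the head keeps both operations O(1); the cFree counter is what a priority list of chunks ordered by free page count would be keyed on.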
     279 *
     280 * Shared pages that have been allocated will use the 32-bit field for keeping
     281 * the reference counter.
     282 *
     283 * Normal pages that have been allocated will use the low 24 bits for the guest
     284 * page frame number (i.e. shift it left by PAGE_SHIFT and you'll have the physical
     285 * address; all 24 bits set means unknown or out of range). The top 8 bits will
     286 * be used as a VM handle index - we assign each VM a unique handle [0..255] for
     287 * this purpose. This implies a max of 256 VMs and 64GB of base RAM per VM.
     288 * Neither limit should cause any trouble for the time being.
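The 32-bit word of an allocated normal page can be packed and unpacked as in this sketch. It assumes PAGE_SHIFT is 12 (4KB pages), and the macro and function names are hypothetical; the compile-time check confirms that 2^24 guest pages of 4KB cover exactly 64GB.

```c
#include <assert.h>
#include <stdint.h>

#define SKETCH_PAGE_SHIFT 12u /* assumed 4KB pages */
#define GUEST_PFN_BITS    24u
#define GUEST_PFN_MASK    ((UINT32_C(1) << GUEST_PFN_BITS) - 1u)
#define GUEST_PFN_UNKNOWN GUEST_PFN_MASK /* all 24 bits set */

/* 2^24 guest pages of 4KB each cover 64GB of base RAM per VM. */
_Static_assert((UINT64_C(1) << (GUEST_PFN_BITS + SKETCH_PAGE_SHIFT)) == (UINT64_C(64) << 30),
               "24-bit guest PFN covers 64GB");

/* Pack the 32-bit word of an allocated normal page:
 * bits 31:24 = VM handle index, bits 23:0 = guest page frame number. */
static uint32_t normalPagePack(uint8_t idxVM, uint32_t uGuestPfn)
{
    if (uGuestPfn > GUEST_PFN_MASK) /* out of range: record as unknown */
        uGuestPfn = GUEST_PFN_UNKNOWN;
    return ((uint32_t)idxVM << GUEST_PFN_BITS) | uGuestPfn;
}

/* The VM handle index lives in the top 8 bits. */
static uint8_t normalPageVMIndex(uint32_t u32)
{
    return (uint8_t)(u32 >> GUEST_PFN_BITS);
}

/* The guest page frame number lives in the low 24 bits. */
static uint32_t normalPageGuestPfn(uint32_t u32)
{
    return u32 & GUEST_PFN_MASK;
}
```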
     289 *
     290 * The per-page cost in kernel space is 32 bits plus whatever the RTR0MEMOBJ
     291 * entails. In addition there is the chunk cost of approximately
     292 * (sizeof(RTR0MEMOBJ) + sizeof(CHUNK)) / 2^CHUNK_SHIFT bytes per page.
     293 *
     294 * On Windows the per-page RTR0MEMOBJ cost is 32 bits on 32-bit Windows and 64 bits
     295 * on 64-bit Windows (a PFN_NUMBER in the MDL), i.e. 64 or 96 bits per page in total.
     296 * The cost on Linux is identical, but here it's because of sizeof(struct page *).
     297 *
     298 *
     299 * @subsubsection subsubsec_pgmPhys_Tracking_PerVM      Per-VM
     300 *
     301 * Fixed info is the physical address of the page (HCPhys) and the page id
     302 * (described above). Theoretically we'll need 48(-12) bits for the HCPhys part.
     303 * Today we're restricting ourselves to 40(-12) bits because this is the current
     304 * restriction of all AMD64 implementations (I think Barcelona will raise this
     305 * to 48(-12) bits, not that it really matters) and I needed the bits for
     306 * tracking mappings of a page. 48-12 = 36, which leaves 28 bits of the 64-bit word
     307 * and a decent range for the page id: 2^(28+12) bytes = 1TB.
     308 *
     309 * In addition to these, we'll have to keep maintaining the page flags as we
     310 * currently do, although it wouldn't hurt to optimize these quite a bit. For
     311 * instance, the ROM shouldn't depend on having a write handler installed
     312 * in order for it to become read-only. A RO/RW bit should be considered so
     313 * that the page syncing code doesn't have to mess about checking multiple
     314 * flag combinations (ROM || RW handler || write monitored) in order to
     315 * figure out how to set up a shadow PTE. But this, of course, is a secondary
     316 * priority at present. Currently this requires 12 bits, but could probably
     317 * be optimized to ~8.
     318 *
     319 * Then there are the 24 bits used to track which shadow page tables are
     320 * currently mapping a page for the purpose of speeding up physical
     321 * access handlers, and thereby the page pool cache. More bits for this
     322 * purpose wouldn't hurt, IIRC.
     323 *
     324 * Then there is a new field in which we need to record what kind of page
     325 * this is: shared, zero, normal or write-monitored-normal. This'll
     326 * require 2 bits. One bit might be needed for indicating whether a
     327 * write-monitored page has been written to, and yet another one or
     328 * two for tracking migration status. 3-4 bits total then.
     329 *
     330 * Whatever is left can be used to record the shareability of a
     331 * page. The page checksum will not be stored in the per-VM table as
     332 * the idle thread will not be permitted to modify it.
     333 * It will instead have to keep its own working set of potentially
     334 * shareable pages and their checksums and stuff.
     335 *
     336 * For the present we'll keep the current packing of
     337 * PGMRAMRANGE::aHCPhys to keep the changes simple, except, of course, that
     338 * we'll have to change it to a struct with a total of 128 bits at
     339 * our disposal.
     340 *
     341 * The initial layout will be like this:
     342 * @verbatim
     343    RTHCPHYS HCPhys;            The current stuff.       
     344        63:40                   Current shadow PT tracking stuff.
     345        39:12                   The physical page frame number.
     346        11:0                    The current flags.
     347    uint32_t u28PageId : 28;    The page id.
     348    uint32_t u2State : 2;       The page state { zero, shared, normal, write monitored }.
     349    uint32_t fWrittenTo : 1;    Whether a write monitored page was written to.
     350    uint32_t u1Reserved : 1;    Reserved for later.
     351    uint32_t u32Reserved;       Reserved for later, mostly sharing stats.
     352 @endverbatim
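The initial layout maps naturally onto a C struct. The sketch below is illustrative only: PGMPAGESKETCH and the accessor names are hypothetical, RTHCPHYS is stood in by a plain uint64_t, and the 16-byte size holds on the common GCC/Clang/MSVC ABIs, since bit-field layout is implementation-defined.

```c
#include <assert.h>
#include <stdint.h>

typedef uint64_t RTHCPHYS; /* stand-in for the IPRT host physical address type */

/* Hypothetical rendering of the initial 128-bit per-page entry. */
typedef struct PGMPAGESKETCH
{
    RTHCPHYS HCPhys;         /* 63:40 shadow PT tracking, 39:12 PFN, 11:0 flags */
    uint32_t u28PageId : 28; /* the page id */
    uint32_t u2State : 2;    /* zero, shared, normal, write monitored */
    uint32_t fWrittenTo : 1; /* whether a write-monitored page was written to */
    uint32_t u1Reserved : 1; /* reserved for later */
    uint32_t u32Reserved;    /* reserved for later, mostly sharing stats */
} PGMPAGESKETCH;

#define SKETCH_TRACKING_SHIFT 40
#define SKETCH_PFN_MASK       UINT64_C(0x000000fffffff000) /* bits 39:12 */
#define SKETCH_FLAGS_MASK     UINT64_C(0x0000000000000fff) /* bits 11:0 */

/* Page-aligned host physical address of the page. */
static RTHCPHYS sketchGetHCPhys(const PGMPAGESKETCH *pPage)
{
    return pPage->HCPhys & SKETCH_PFN_MASK;
}

/* The 24 bits of shadow page table tracking data. */
static uint32_t sketchGetTracking(const PGMPAGESKETCH *pPage)
{
    return (uint32_t)(pPage->HCPhys >> SKETCH_TRACKING_SHIFT);
}

/* The current 12 flag bits. */
static uint32_t sketchGetFlags(const PGMPAGESKETCH *pPage)
{
    return (uint32_t)(pPage->HCPhys & SKETCH_FLAGS_MASK);
}
```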
     353 *
     354 * The final layout will be something like this:
     355 * @verbatim
     356    RTHCPHYS HCPhys;            The current stuff.       
     357        63:48                   High page id (12+).
     358        47:12                   The physical page frame number.
     359        11:0                    Low page id.
     360    uint32_t fReadOnly : 1;     Whether it's readonly page (rom or monitored in some way).
     361    uint32_t u3Type : 3;        The page type {RESERVED, MMIO, MMIO2, ROM, shadowed ROM, RAM}.
     362    uint32_t u2PhysMon : 2;     Physical access handler type {none, read, write, all}.
     363    uint32_t u2VirtMon : 2;     Virtual access handler type {none, read, write, all}.
     364    uint32_t u2State : 2;       The page state { zero, shared, normal, write monitored }.
     365    uint32_t fWrittenTo : 1;    Whether a write monitored page was written to.
     366    uint32_t u20Reserved : 20;  Reserved for later, mostly sharing stats.
     368    uint32_t u32Tracking;       The shadow PT tracking stuff, roughly.
     369 @endverbatim
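In the final layout the page id has no field of its own; it is split across the bits of HCPhys that a 48-bit physical address does not need (16 high bits in 63:48, 12 low bits in 11:0). A minimal sketch of packing and unpacking, with hypothetical macro and function names; the 28-bit id covers 2^28 pages of 4KB, i.e. 1TB.

```c
#include <assert.h>
#include <stdint.h>

#define FINAL_PFN_MASK        UINT64_C(0x0000fffffffff000) /* bits 47:12 */
#define FINAL_PAGEID_LO_BITS  12u
#define FINAL_PAGEID_LO_MASK  UINT64_C(0xfff)              /* bits 11:0 */
#define FINAL_PAGEID_HI_SHIFT 48u                          /* bits 63:48 */
#define FINAL_PAGEID_BITS     28u                          /* 12 low + 16 high */

/* Combine a page-aligned host physical address below 2^48 with a
 * 28-bit page id split across the low and high ends of the word. */
static uint64_t finalHCPhysMake(uint64_t HCPhys, uint32_t idPage)
{
    assert((HCPhys & ~FINAL_PFN_MASK) == 0);
    assert(idPage < (UINT32_C(1) << FINAL_PAGEID_BITS));
    return ((uint64_t)(idPage >> FINAL_PAGEID_LO_BITS) << FINAL_PAGEID_HI_SHIFT)
         | HCPhys
         | (idPage & FINAL_PAGEID_LO_MASK);
}

/* The host physical address is just the PFN bits. */
static uint64_t finalHCPhysGetAddr(uint64_t u64)
{
    return u64 & FINAL_PFN_MASK;
}

/* Reassemble the page id: high 16 bits from 63:48, low 12 bits from 11:0. */
static uint32_t finalHCPhysGetPageId(uint64_t u64)
{
    return (uint32_t)((u64 >> FINAL_PAGEID_HI_SHIFT) << FINAL_PAGEID_LO_BITS)
         | (uint32_t)(u64 & FINAL_PAGEID_LO_MASK);
}
```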
     370 *
     371 * Cost-wise, this means we'll double the cost for guest memory. There isn't any way
     372 * around that, I'm afraid. It means that the cost of dealing out 32GB of memory
     373 * to one or more VMs is: (32GB >> PAGE_SHIFT) * 16 bytes, or 128MB. Or, as another
     374 * example, the VM heap cost when assigning 1GB to a VM will be 4MB.
     375 *
     376 * A couple of examples of the total cost, per-VM + kernel ((+) = plus the per-chunk overhead):
     377 * 32-bit Windows and 32-bit Linux:
     378 *      1GB guest ram, 256K pages:  4MB +  2MB(+) =   6MB
     379 *      4GB guest ram, 1M pages:   16MB +  8MB(+) =  24MB
     380 *     32GB guest ram, 8M pages:  128MB + 64MB(+) = 192MB
     381 * 64-bit Windows and 64-bit Linux:
     382 *      1GB guest ram, 256K pages:  4MB +  3MB(+) =   7MB
     383 *      4GB guest ram, 1M pages:   16MB + 12MB(+) =  28MB
     384 *     32GB guest ram, 8M pages:  128MB + 96MB(+) = 224MB
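The table can be reproduced with a few lines of arithmetic. This sketch assumes 4KB pages, the 16-byte per-VM entry, and a kernel cost of the 32-bit tracking word plus one pointer-sized page-frame entry per page; like the table, it leaves out the amortized per-chunk overhead. The function names are hypothetical.

```c
#include <assert.h>
#include <stdint.h>

#define COST_PAGE_SHIFT 12u /* assumed 4KB pages */
#define PER_VM_BYTES    16u /* 128-bit per-page entry in the VM heap */

/* Kernel bytes per page: the 32-bit tracking word plus one
 * pointer-sized page-frame entry in the RTR0MEMOBJ (PFN_NUMBER in a
 * Windows MDL, struct page * on Linux). Per-chunk overhead excluded. */
static uint64_t kernelBytesPerPage(unsigned cHostBits)
{
    return 4u + cHostBits / 8u;
}

/* Total per-VM + kernel cost in bytes for cbRam bytes of guest RAM. */
static uint64_t totalCost(uint64_t cbRam, unsigned cHostBits)
{
    uint64_t cPages = cbRam >> COST_PAGE_SHIFT;
    return cPages * (PER_VM_BYTES + kernelBytesPerPage(cHostBits));
}
```

Evaluating totalCost for 1GB, 4GB and 32GB on 32-bit and 64-bit hosts gives the 6/24/192MB and 7/28/224MB totals listed above.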
    236385 *
    237386 *
     
    280429 * The 3rd step is identical to what we're already doing when updating a
    281430 * physical handler, see pgmHandlerPhysicalSetRamFlagsAndFlushShadowPTs.
     431 *
     432 *
     433 * @subsection subsec_pgmPhys_Changes           Changes
     434 *
     435 * Breakdown of the changes involved...
     436 *
    282437 *
    283438 */