VirtualBox

source: vbox/trunk/src/VBox/VMM/VMMR0/GMMR0.cpp@ 39436

Last change on this file since 39436 was 39402, checked in by vboxsync, 13 years ago

VMM: don't use generic IPE status codes, use specific ones. Part 1.

1/* $Id: GMMR0.cpp 39402 2011-11-23 16:25:04Z vboxsync $ */
2/** @file
3 * GMM - Global Memory Manager.
4 */
5
6/*
7 * Copyright (C) 2007-2011 Oracle Corporation
8 *
9 * This file is part of VirtualBox Open Source Edition (OSE), as
10 * available from http://www.virtualbox.org. This file is free software;
11 * you can redistribute it and/or modify it under the terms of the GNU
12 * General Public License (GPL) as published by the Free Software
13 * Foundation, in version 2 as it comes in the "COPYING" file of the
14 * VirtualBox OSE distribution. VirtualBox OSE is distributed in the
15 * hope that it will be useful, but WITHOUT ANY WARRANTY of any kind.
16 */
17
18
19/** @page pg_gmm GMM - The Global Memory Manager
20 *
21 * As the name indicates, this component is responsible for global memory
22 * management. Currently only guest RAM is allocated from the GMM, but this
23 * may change to include shadow page tables and other bits later.
24 *
25 * Guest RAM is managed as individual pages, but allocated from the host OS
26 * in chunks for reasons of portability / efficiency. To minimize the memory
27 * footprint all tracking structures must be as small as possible without
28 * unnecessary performance penalties.
29 *
30 * The allocation chunks have a fixed size, defined at compile time by
31 * the #GMM_CHUNK_SIZE \#define.
32 *
33 * Each chunk is given a unique ID. Each page also has a unique ID. The
34 * relationship between the two IDs is:
35 * @code
36 * GMM_CHUNK_SHIFT = log2(GMM_CHUNK_SIZE / PAGE_SIZE);
37 * idPage = (idChunk << GMM_CHUNK_SHIFT) | iPage;
38 * @endcode
39 * Where iPage is the index of the page within the chunk. This ID scheme
40 * permits efficient chunk and page lookup, but it relies on the chunk size
41 * being set at compile time. The chunks are organized in an AVL tree with their
42 * IDs being the keys.
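 *
 * The inverse mapping is just as cheap; a sketch of how the lookup code
 * further down (gmmR0GetChunk, gmmR0GetPage) takes a page ID apart, using the
 * GMM_CHUNKID_SHIFT / GMM_PAGEID_IDX_MASK constants that correspond to the
 * GMM_CHUNK_SHIFT described above:
 * @code
 * idChunk = idPage >> GMM_CHUNKID_SHIFT;    // chunk lookup key (AVL tree / TLB)
 * iPage   = idPage &  GMM_PAGEID_IDX_MASK;  // index into GMMCHUNK::aPages
 * pPage   = &pChunk->aPages[iPage];
 * @endcode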
43 *
44 * The physical address of each page in an allocation chunk is maintained by
45 * the #RTR0MEMOBJ and obtained using #RTR0MemObjGetPagePhysAddr. There is no
46 * need to duplicate this information (it would cost 8 bytes per page if we did).
47 *
48 * So what do we need to track per page? Most importantly we need to know
49 * which state the page is in:
50 * - Private - Allocated for (eventually) backing one particular VM page.
51 * - Shared - Readonly page that is used by one or more VMs and treated
52 * as COW by PGM.
53 * - Free - Not used by anyone.
54 *
55 * For the page replacement operations (sharing, defragmenting and freeing)
56 * to be somewhat efficient, private pages need to be associated with a
57 * particular page in a particular VM.
58 *
59 * Tracking the usage of shared pages is impractical and expensive, so we'll
60 * settle for a reference counting system instead.
61 *
62 * Free pages will be chained on LIFOs.
63 *
64 * On 64-bit systems we will use a 64-bit bitfield per page, while on 32-bit
65 * systems a 32-bit bitfield will have to suffice because of address space
66 * limitations. The #GMMPAGE structure shows the details.
67 *
68 *
69 * @section sec_gmm_alloc_strat Page Allocation Strategy
70 *
71 * The strategy for allocating pages has to take fragmentation and shared
72 * pages into account, or we may end up with 2000 chunks with only
73 * a few pages in each. Shared pages cannot easily be reallocated because
74 * of the inaccurate usage accounting (see above). Private pages can be
75 * reallocated by a defragmentation thread in the same manner that sharing
76 * is done.
77 *
78 * The first approach is to manage the free pages in two sets depending on
79 * whether they are mainly for the allocation of shared or private pages.
80 * In the initial implementation there will be almost no possibility for
81 * mixing shared and private pages in the same chunk (only if we're really
82 * stressed on memory), but when we implement forking of VMs and have to
83 * deal with lots of COW pages it'll start getting kind of interesting.
84 *
85 * The sets are lists of chunks with approximately the same number of
86 * free pages. Say the chunk size is 1MB, meaning 256 pages, and a set
87 * consists of 16 lists. So, the first list will contain the chunks with
88 * 1-15 free pages, the second covers 16-31, and so on. The chunks will be
89 * moved between the lists as pages are freed up or allocated.
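 *
 * A sketch of the list selection (the real thing is gmmR0SelectFreeSetList
 * further down; GMM_CHUNK_FREE_SET_SHIFT is assumed to be the log2 of the
 * per-list granularity):
 * @code
 * iList = cFree >> GMM_CHUNK_FREE_SET_SHIFT;  // chunks with similar cFree share a list
 * pChunk->pFreeNext = pSet->apLists[iList];   // link onto that list's head
 * @endcode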
90 *
91 *
92 * @section sec_gmm_costs Costs
93 *
94 * The per page cost in kernel space is 32 bits plus whatever RTR0MEMOBJ
95 * entails. In addition there is the chunk cost of approximately
96 * (sizeof(RTR0MEMOBJ) + sizeof(CHUNK)) / 2^CHUNK_SHIFT bytes per page.
97 *
98 * On Windows the per page #RTR0MEMOBJ cost is 32 bits on 32-bit Windows and
99 * 64 bits on 64-bit Windows (a PFN_NUMBER in the MDL), i.e. 64 bits per page.
100 * The cost on Linux is identical, but there it's because of sizeof(struct page *).
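 *
 * A rough worked example (illustrative figures, assuming the 2MB chunk size
 * used here and a 64-bit host where a GMMPAGE is 8 bytes):
 * @code
 * pages per chunk    = 2MB / 4KB                         = 512
 * sizeof(GMMCHUNK)  ~= 512 * 8 bytes + header           ~= 4KB
 * chunk cost/page   ~= (sizeof(RTR0MEMOBJ) + 4KB) / 512 ~= 8+ bytes
 * @endcode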
101 *
102 *
103 * @section sec_gmm_legacy Legacy Mode for Non-Tier-1 Platforms
104 *
105 * In legacy mode the page source is locked user pages rather than
106 * #RTR0MemObjAllocPhysNC, which means that a page can only be allocated
107 * by the VM that locked it. We will make no attempt at implementing
108 * page sharing on these systems, just do enough to make it all work.
109 *
110 *
111 * @subsection sub_gmm_locking Serializing
112 *
113 * One simple fast mutex will be employed in the initial implementation, not
114 * two as mentioned in @ref subsec_pgmPhys_Serializing.
115 *
116 * @see @ref subsec_pgmPhys_Serializing
117 *
118 *
119 * @section sec_gmm_overcommit Memory Over-Commitment Management
120 *
121 * The GVM will have to do the system-wide memory over-commitment
122 * management. My current ideas are:
123 * - Per-VM over-commitment policy that indicates how much to initially
124 * commit to it and what to do in an out-of-memory situation.
125 * - Prevent overtaxing the host.
126 *
127 * There are some challenges here; the main ones are configurability and
128 * security. Should we for instance permit anyone to request 100% memory
129 * commitment? Who should be allowed to do runtime adjustments of the
130 * config? And how do we prevent these settings from being lost when the last
131 * VM process exits? The solution is probably to have an optional root
132 * daemon that will keep VMMR0.r0 in memory and enable the security measures.
133 *
134 *
135 *
136 * @section sec_gmm_numa NUMA
137 *
138 * NUMA considerations will be designed and implemented a bit later.
139 *
140 * The preliminary guess is that we will have to try to allocate memory as
141 * close as possible to the CPUs the VM is executed on (EMT and additional CPU
142 * threads), which means it's mostly about allocation and sharing policies.
143 * Both the scheduler and the allocator interface will have to supply some NUMA
144 * info, and we'll need a way to calculate access costs.
145 *
146 */
147
148
149/*******************************************************************************
150* Header Files *
151*******************************************************************************/
152#define LOG_GROUP LOG_GROUP_GMM
153#include <VBox/rawpci.h>
154#include <VBox/vmm/vm.h>
155#include <VBox/vmm/gmm.h>
156#include "GMMR0Internal.h"
157#include <VBox/vmm/gvm.h>
158#include <VBox/vmm/pgm.h>
159#include <VBox/log.h>
160#include <VBox/param.h>
161#include <VBox/err.h>
162#include <iprt/asm.h>
163#include <iprt/avl.h>
164#include <iprt/list.h>
165#include <iprt/mem.h>
166#include <iprt/memobj.h>
167#include <iprt/mp.h>
168#include <iprt/semaphore.h>
169#include <iprt/string.h>
170#include <iprt/time.h>
171
172
173/*******************************************************************************
174* Structures and Typedefs *
175*******************************************************************************/
176/** Pointer to set of free chunks. */
177typedef struct GMMCHUNKFREESET *PGMMCHUNKFREESET;
178
179/**
180 * The per-page tracking structure employed by the GMM.
181 *
182 * On 32-bit hosts some trickery is necessary to compress all
183 * the information into 32 bits. When the fSharedFree member is set,
184 * the 30th bit decides whether it's a free page or not.
185 *
186 * Because of the different layout on 32-bit and 64-bit hosts, macros
187 * are used to get and set some of the data.
188 */
189typedef union GMMPAGE
190{
191#if HC_ARCH_BITS == 64
192 /** Unsigned integer view. */
193 uint64_t u;
194
195 /** The common view. */
196 struct GMMPAGECOMMON
197 {
198 uint32_t uStuff1 : 32;
199 uint32_t uStuff2 : 30;
200 /** The page state. */
201 uint32_t u2State : 2;
202 } Common;
203
204 /** The view of a private page. */
205 struct GMMPAGEPRIVATE
206 {
207 /** The guest page frame number. (Max addressable: 2 ^ 44 - 16) */
208 uint32_t pfn;
209 /** The GVM handle. (64K VMs) */
210 uint32_t hGVM : 16;
211 /** Reserved. */
212 uint32_t u16Reserved : 14;
213 /** The page state. */
214 uint32_t u2State : 2;
215 } Private;
216
217 /** The view of a shared page. */
218 struct GMMPAGESHARED
219 {
220 /** The host page frame number. (Max addressable: 2 ^ 44 - 16) */
221 uint32_t pfn;
222 /** The reference count (64K VMs). */
223 uint32_t cRefs : 16;
224 /** Reserved. Checksum or something? Two hGVMs for forking? */
225 uint32_t u14Reserved : 14;
226 /** The page state. */
227 uint32_t u2State : 2;
228 } Shared;
229
230 /** The view of a free page. */
231 struct GMMPAGEFREE
232 {
233 /** The index of the next page in the free list. UINT16_MAX is NIL. */
234 uint16_t iNext;
235 /** Reserved. Checksum or something? */
236 uint16_t u16Reserved0;
237 /** Reserved. Checksum or something? */
238 uint32_t u30Reserved1 : 30;
239 /** The page state. */
240 uint32_t u2State : 2;
241 } Free;
242
243#else /* 32-bit */
244 /** Unsigned integer view. */
245 uint32_t u;
246
247 /** The common view. */
248 struct GMMPAGECOMMON
249 {
250 uint32_t uStuff : 30;
251 /** The page state. */
252 uint32_t u2State : 2;
253 } Common;
254
255 /** The view of a private page. */
256 struct GMMPAGEPRIVATE
257 {
258 /** The guest page frame number. (Max addressable: 2 ^ 36) */
259 uint32_t pfn : 24;
260 /** The GVM handle. (127 VMs) */
261 uint32_t hGVM : 7;
262 /** The top page state bit, MBZ. */
263 uint32_t fZero : 1;
264 } Private;
265
266 /** The view of a shared page. */
267 struct GMMPAGESHARED
268 {
269 /** The reference count. */
270 uint32_t cRefs : 30;
271 /** The page state. */
272 uint32_t u2State : 2;
273 } Shared;
274
275 /** The view of a free page. */
276 struct GMMPAGEFREE
277 {
278 /** The index of the next page in the free list. UINT16_MAX is NIL. */
279 uint32_t iNext : 16;
280 /** Reserved. Checksum or something? */
281 uint32_t u14Reserved : 14;
282 /** The page state. */
283 uint32_t u2State : 2;
284 } Free;
285#endif
286} GMMPAGE;
287AssertCompileSize(GMMPAGE, sizeof(RTHCUINTPTR));
288/** Pointer to a GMMPAGE. */
289typedef GMMPAGE *PGMMPAGE;
290
291
292/** @name The Page States.
293 * @{ */
294/** A private page. */
295#define GMM_PAGE_STATE_PRIVATE 0
296/** A private page - alternative value used on the 32-bit implementation.
297 * This will never be used on 64-bit hosts. */
298#define GMM_PAGE_STATE_PRIVATE_32 1
299/** A shared page. */
300#define GMM_PAGE_STATE_SHARED 2
301/** A free page. */
302#define GMM_PAGE_STATE_FREE 3
303/** @} */
304
305
306/** @def GMM_PAGE_IS_PRIVATE
307 *
308 * @returns true if private, false if not.
309 * @param pPage The GMM page.
310 */
311#if HC_ARCH_BITS == 64
312# define GMM_PAGE_IS_PRIVATE(pPage) ( (pPage)->Common.u2State == GMM_PAGE_STATE_PRIVATE )
313#else
314# define GMM_PAGE_IS_PRIVATE(pPage) ( (pPage)->Private.fZero == 0 )
315#endif
316
317/** @def GMM_PAGE_IS_SHARED
318 *
319 * @returns true if shared, false if not.
320 * @param pPage The GMM page.
321 */
322#define GMM_PAGE_IS_SHARED(pPage) ( (pPage)->Common.u2State == GMM_PAGE_STATE_SHARED )
323
324/** @def GMM_PAGE_IS_FREE
325 *
326 * @returns true if free, false if not.
327 * @param pPage The GMM page.
328 */
329#define GMM_PAGE_IS_FREE(pPage) ( (pPage)->Common.u2State == GMM_PAGE_STATE_FREE )
330
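/*
 * Usage sketch (illustrative only): classifying a page found in a chunk,
 * similar to what gmmR0CleanupVMScanChunk does further down; pChunk, iPage
 * and the counters are assumed to be in scope.
 *
 *     PGMMPAGE pPage = &pChunk->aPages[iPage];
 *     if (GMM_PAGE_IS_PRIVATE(pPage))
 *         cPrivate++;
 *     else if (GMM_PAGE_IS_SHARED(pPage))
 *         cShared++;
 *     else
 *     {
 *         Assert(GMM_PAGE_IS_FREE(pPage));
 *         cFree++;
 *     }
 */
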
331/** @def GMM_PAGE_PFN_LAST
332 * The last valid guest pfn range.
333 * @remark Some of the values outside the range have special meaning,
334 * see GMM_PAGE_PFN_UNSHAREABLE.
335 */
336#if HC_ARCH_BITS == 64
337# define GMM_PAGE_PFN_LAST UINT32_C(0xfffffff0)
338#else
339# define GMM_PAGE_PFN_LAST UINT32_C(0x00fffff0)
340#endif
341AssertCompile(GMM_PAGE_PFN_LAST == (GMM_GCPHYS_LAST >> PAGE_SHIFT));
342
343/** @def GMM_PAGE_PFN_UNSHAREABLE
344 * Indicates that this page isn't used for normal guest memory and thus isn't shareable.
345 */
346#if HC_ARCH_BITS == 64
347# define GMM_PAGE_PFN_UNSHAREABLE UINT32_C(0xfffffff1)
348#else
349# define GMM_PAGE_PFN_UNSHAREABLE UINT32_C(0x00fffff1)
350#endif
351AssertCompile(GMM_PAGE_PFN_UNSHAREABLE == (GMM_GCPHYS_UNSHAREABLE >> PAGE_SHIFT));
352
353
354/**
355 * A GMM allocation chunk ring-3 mapping record.
356 *
357 * This should really be associated with a session and not a VM, but
358 * it's simpler to associate it with a VM and clean up when the VM object
359 * is destroyed.
360 */
361typedef struct GMMCHUNKMAP
362{
363 /** The mapping object. */
364 RTR0MEMOBJ hMapObj;
365 /** The VM owning the mapping. */
366 PGVM pGVM;
367} GMMCHUNKMAP;
368/** Pointer to a GMM allocation chunk mapping. */
369typedef struct GMMCHUNKMAP *PGMMCHUNKMAP;
370
371
372/**
373 * A GMM allocation chunk.
374 */
375typedef struct GMMCHUNK
376{
377 /** The AVL node core.
378 * The Key is the chunk ID. (Giant mtx.) */
379 AVLU32NODECORE Core;
380 /** The memory object.
381 * Either from RTR0MemObjAllocPhysNC or RTR0MemObjLockUser depending on
382 * what the host can dish up. (Chunk mtx protects mapping accesses
383 * and related frees.) */
384 RTR0MEMOBJ hMemObj;
385 /** Pointer to the next chunk in the free list. (Giant mtx.) */
386 PGMMCHUNK pFreeNext;
387 /** Pointer to the previous chunk in the free list. (Giant mtx.) */
388 PGMMCHUNK pFreePrev;
389 /** Pointer to the free set this chunk belongs to. NULL for
390 * chunks with no free pages. (Giant mtx.) */
391 PGMMCHUNKFREESET pSet;
392 /** List node in the chunk list (GMM::ChunkList). (Giant mtx.) */
393 RTLISTNODE ListNode;
394 /** Pointer to an array of mappings. (Chunk mtx.) */
395 PGMMCHUNKMAP paMappingsX;
396 /** The number of mappings. (Chunk mtx.) */
397 uint16_t cMappingsX;
398 * The mapping lock this chunk is using. UINT8_MAX if nobody is
399 * mapping or freeing anything. (Giant mtx.) */
400 uint8_t volatile iChunkMtx;
401 /** Flags field reserved for future use (like eliminating enmType).
402 * (Giant mtx.) */
403 uint8_t fFlags;
404 /** The head of the list of free pages. UINT16_MAX is the NIL value.
405 * (Giant mtx.) */
406 uint16_t iFreeHead;
407 /** The number of free pages. (Giant mtx.) */
408 uint16_t cFree;
409 /** The GVM handle of the VM that first allocated pages from this chunk, this
410 * is used as a preference when there are several chunks to choose from.
411 * When in bound memory mode this isn't a preference any longer. (Giant
412 * mtx.) */
413 uint16_t hGVM;
414 /** The ID of the NUMA node the memory mostly resides on. (Reserved for
415 * future use.) (Giant mtx.) */
416 uint16_t idNumaNode;
417 /** The number of private pages. (Giant mtx.) */
418 uint16_t cPrivate;
419 /** The number of shared pages. (Giant mtx.) */
420 uint16_t cShared;
421 /** The pages. (Giant mtx.) */
422 GMMPAGE aPages[GMM_CHUNK_SIZE >> PAGE_SHIFT];
423} GMMCHUNK;
424
425/** Indicates that the NUMA properties of the memory are unknown. */
426#define GMM_CHUNK_NUMA_ID_UNKNOWN UINT16_C(0xfffe)
427
428/** @name GMM_CHUNK_FLAGS_XXX - chunk flags.
429 * @{ */
430/** Indicates that the chunk is a large page (2MB). */
431#define GMM_CHUNK_FLAGS_LARGE_PAGE UINT16_C(0x0001)
432/** @} */
433
434
435/**
436 * An allocation chunk TLB entry.
437 */
438typedef struct GMMCHUNKTLBE
439{
440 /** The chunk id. */
441 uint32_t idChunk;
442 /** Pointer to the chunk. */
443 PGMMCHUNK pChunk;
444} GMMCHUNKTLBE;
445/** Pointer to an allocation chunk TLB entry. */
446typedef GMMCHUNKTLBE *PGMMCHUNKTLBE;
447
448
449/** The number of entries in the allocation chunk TLB. */
450#define GMM_CHUNKTLB_ENTRIES 32
451/** Gets the TLB entry index for the given Chunk ID. */
452#define GMM_CHUNKTLB_IDX(idChunk) ( (idChunk) & (GMM_CHUNKTLB_ENTRIES - 1) )
453
454/**
455 * An allocation chunk TLB.
456 */
457typedef struct GMMCHUNKTLB
458{
459 /** The TLB entries. */
460 GMMCHUNKTLBE aEntries[GMM_CHUNKTLB_ENTRIES];
461} GMMCHUNKTLB;
462/** Pointer to an allocation chunk TLB. */
463typedef GMMCHUNKTLB *PGMMCHUNKTLB;
464
465
466/**
467 * The GMM instance data.
468 */
469typedef struct GMM
470{
471 /** Magic / eye catcher. GMM_MAGIC */
472 uint32_t u32Magic;
473 /** The number of threads waiting on the mutex. */
474 uint32_t cMtxContenders;
475 /** The fast mutex protecting the GMM.
476 * More fine grained locking can be implemented later if necessary. */
477 RTSEMFASTMUTEX hMtx;
478#ifdef VBOX_STRICT
479 /** The current mutex owner. */
480 RTNATIVETHREAD hMtxOwner;
481#endif
482 /** The chunk tree. */
483 PAVLU32NODECORE pChunks;
484 /** The chunk TLB. */
485 GMMCHUNKTLB ChunkTLB;
486 /** The private free set. */
487 GMMCHUNKFREESET PrivateX;
488 /** The shared free set. */
489 GMMCHUNKFREESET Shared;
490
491 /** Shared module tree (global). */
492 /** @todo separate trees for distinctly different guest OSes. */
493 PAVLGCPTRNODECORE pGlobalSharedModuleTree;
494
495 /** The chunk list. For simplifying the cleanup process. */
496 RTLISTNODE ChunkList;
497
498 /** The maximum number of pages we're allowed to allocate.
499 * @gcfgm 64-bit GMM/MaxPages Direct.
500 * @gcfgm 32-bit GMM/PctPages Relative to the number of host pages. */
501 uint64_t cMaxPages;
502    /** The number of pages that have been reserved.
503 * The deal is that cReservedPages - cOverCommittedPages <= cMaxPages. */
504 uint64_t cReservedPages;
505 /** The number of pages that we have over-committed in reservations. */
506 uint64_t cOverCommittedPages;
507 /** The number of actually allocated (committed if you like) pages. */
508 uint64_t cAllocatedPages;
509 /** The number of pages that are shared. A subset of cAllocatedPages. */
510 uint64_t cSharedPages;
511 /** The number of pages that are actually shared between VMs. */
512 uint64_t cDuplicatePages;
513    /** The number of shared pages that have been left behind by
514 * VMs not doing proper cleanups. */
515 uint64_t cLeftBehindSharedPages;
516 /** The number of allocation chunks.
517 * (The number of pages we've allocated from the host can be derived from this.) */
518 uint32_t cChunks;
519 /** The number of current ballooned pages. */
520 uint64_t cBalloonedPages;
521
522 /** The legacy allocation mode indicator.
523 * This is determined at initialization time. */
524 bool fLegacyAllocationMode;
525 /** The bound memory mode indicator.
526 * When set, the memory will be bound to a specific VM and never
527 * shared. This is always set if fLegacyAllocationMode is set.
528 * (Also determined at initialization time.) */
529 bool fBoundMemoryMode;
530 /** The number of registered VMs. */
531 uint16_t cRegisteredVMs;
532
533    /** The number of freed chunks ever. This is used as a list generation to
534 * avoid restarting the cleanup scanning when the list wasn't modified. */
535 uint32_t cFreedChunks;
536    /** The previously allocated Chunk ID.
537 * Used as a hint to avoid scanning the whole bitmap. */
538 uint32_t idChunkPrev;
539 /** Chunk ID allocation bitmap.
540 * Bits of allocated IDs are set, free ones are clear.
541 * The NIL id (0) is marked allocated. */
542 uint32_t bmChunkId[(GMM_CHUNKID_LAST + 1 + 31) / 32];
543
544 /** The index of the next mutex to use. */
545 uint32_t iNextChunkMtx;
546 /** Chunk locks for reducing lock contention without having to allocate
547 * one lock per chunk. */
548 struct
549 {
550 /** The mutex */
551 RTSEMFASTMUTEX hMtx;
552 /** The number of threads currently using this mutex. */
553 uint32_t volatile cUsers;
554 } aChunkMtx[64];
555} GMM;
556/** Pointer to the GMM instance. */
557typedef GMM *PGMM;
558
559/** The value of GMM::u32Magic (Katsuhiro Otomo). */
560#define GMM_MAGIC UINT32_C(0x19540414)
561
562
563/**
564 * GMM chunk mutex state.
565 *
566 * This is returned by gmmR0ChunkMutexAcquire and is used by the other
567 * gmmR0ChunkMutex* methods.
568 */
569typedef struct GMMR0CHUNKMTXSTATE
570{
571 PGMM pGMM;
572 /** The index of the chunk mutex. */
573 uint8_t iChunkMtx;
574 /** The relevant flags (GMMR0CHUNK_MTX_XXX). */
575 uint8_t fFlags;
576} GMMR0CHUNKMTXSTATE;
577/** Pointer to a chunk mutex state. */
578typedef GMMR0CHUNKMTXSTATE *PGMMR0CHUNKMTXSTATE;
579
580/** @name GMMR0CHUNK_MTX_XXX
581 * @{ */
582#define GMMR0CHUNK_MTX_INVALID UINT32_C(0)
583#define GMMR0CHUNK_MTX_KEEP_GIANT UINT32_C(1)
584#define GMMR0CHUNK_MTX_RETAKE_GIANT UINT32_C(2)
585#define GMMR0CHUNK_MTX_DROP_GIANT UINT32_C(3)
586#define GMMR0CHUNK_MTX_END UINT32_C(4)
587/** @} */
588
589
590/*******************************************************************************
591* Global Variables *
592*******************************************************************************/
593/** Pointer to the GMM instance data. */
594static PGMM g_pGMM = NULL;
595
596/** Macro for obtaining and validating the g_pGMM pointer.
597 *
598 * On failure it will return from the invoking function with the specified
599 * return value.
600 *
601 * @param pGMM The name of the pGMM variable.
602 * @param rc The return value on failure. Use VERR_GMM_INSTANCE for VBox
603 * status codes.
604 */
605#define GMM_GET_VALID_INSTANCE(pGMM, rc) \
606 do { \
607 (pGMM) = g_pGMM; \
608 AssertPtrReturn((pGMM), (rc)); \
609 AssertMsgReturn((pGMM)->u32Magic == GMM_MAGIC, ("%p - %#x\n", (pGMM), (pGMM)->u32Magic), (rc)); \
610 } while (0)
611
612/** Macro for obtaining and validating the g_pGMM pointer, void function
613 * variant.
614 *
615 * On failure it will return from the invoking function.
616 *
617 * @param pGMM The name of the pGMM variable.
618 */
619#define GMM_GET_VALID_INSTANCE_VOID(pGMM) \
620 do { \
621 (pGMM) = g_pGMM; \
622 AssertPtrReturnVoid((pGMM)); \
623 AssertMsgReturnVoid((pGMM)->u32Magic == GMM_MAGIC, ("%p - %#x\n", (pGMM), (pGMM)->u32Magic)); \
624 } while (0)
625
626
627/** @def GMM_CHECK_SANITY_UPON_ENTERING
628 * Checks the sanity of the GMM instance data before making changes.
629 *
630 * This macro is a stub by default and must be enabled manually in the code.
631 *
632 * @returns true if sane, false if not.
633 * @param pGMM The name of the pGMM variable.
634 */
635#if defined(VBOX_STRICT) && defined(GMMR0_WITH_SANITY_CHECK) && 0
636# define GMM_CHECK_SANITY_UPON_ENTERING(pGMM) (gmmR0SanityCheck((pGMM), __PRETTY_FUNCTION__, __LINE__) == 0)
637#else
638# define GMM_CHECK_SANITY_UPON_ENTERING(pGMM) (true)
639#endif
640
641/** @def GMM_CHECK_SANITY_UPON_LEAVING
642 * Checks the sanity of the GMM instance data after making changes.
643 *
644 * This macro is a stub by default and must be enabled manually in the code.
645 *
646 * @returns true if sane, false if not.
647 * @param pGMM The name of the pGMM variable.
648 */
649#if defined(VBOX_STRICT) && defined(GMMR0_WITH_SANITY_CHECK) && 0
650# define GMM_CHECK_SANITY_UPON_LEAVING(pGMM) (gmmR0SanityCheck((pGMM), __PRETTY_FUNCTION__, __LINE__) == 0)
651#else
652# define GMM_CHECK_SANITY_UPON_LEAVING(pGMM) (true)
653#endif
654
655/** @def GMM_CHECK_SANITY_IN_LOOPS
656 * Checks the sanity of the GMM instance in the allocation loops.
657 *
658 * This macro is a stub by default and must be enabled manually in the code.
659 *
660 * @returns true if sane, false if not.
661 * @param pGMM The name of the pGMM variable.
662 */
663#if defined(VBOX_STRICT) && defined(GMMR0_WITH_SANITY_CHECK) && 0
664# define GMM_CHECK_SANITY_IN_LOOPS(pGMM) (gmmR0SanityCheck((pGMM), __PRETTY_FUNCTION__, __LINE__) == 0)
665#else
666# define GMM_CHECK_SANITY_IN_LOOPS(pGMM) (true)
667#endif
668
669
670/*******************************************************************************
671* Internal Functions *
672*******************************************************************************/
673static DECLCALLBACK(int) gmmR0TermDestroyChunk(PAVLU32NODECORE pNode, void *pvGMM);
674static bool gmmR0CleanupVMScanChunk(PGMM pGMM, PGVM pGVM, PGMMCHUNK pChunk);
675DECLINLINE(void) gmmR0UnlinkChunk(PGMMCHUNK pChunk);
676DECLINLINE(void) gmmR0LinkChunk(PGMMCHUNK pChunk, PGMMCHUNKFREESET pSet);
677DECLINLINE(void) gmmR0SelectSetAndLinkChunk(PGMM pGMM, PGVM pGVM, PGMMCHUNK pChunk);
678#ifdef GMMR0_WITH_SANITY_CHECK
679static uint32_t gmmR0SanityCheck(PGMM pGMM, const char *pszFunction, unsigned uLineNo);
680#endif
681static bool gmmR0FreeChunk(PGMM pGMM, PGVM pGVM, PGMMCHUNK pChunk, bool fRelaxedSem);
682DECLINLINE(void) gmmR0FreePrivatePage(PGMM pGMM, PGVM pGVM, uint32_t idPage, PGMMPAGE pPage);
683DECLINLINE(void) gmmR0FreeSharedPage(PGMM pGMM, PGVM pGVM, uint32_t idPage, PGMMPAGE pPage);
684static int gmmR0UnmapChunkLocked(PGMM pGMM, PGVM pGVM, PGMMCHUNK pChunk);
685#ifdef VBOX_WITH_PAGE_SHARING
686static void gmmR0SharedModuleCleanup(PGMM pGMM, PGVM pGVM);
687#endif
688
689
690
691/**
692 * Initializes the GMM component.
693 *
694 * This is called when the VMMR0.r0 module is loaded and protected by the
695 * loader semaphore.
696 *
697 * @returns VBox status code.
698 */
699GMMR0DECL(int) GMMR0Init(void)
700{
701 LogFlow(("GMMInit:\n"));
702
703 /*
704 * Allocate the instance data and the locks.
705 */
706 PGMM pGMM = (PGMM)RTMemAllocZ(sizeof(*pGMM));
707 if (!pGMM)
708 return VERR_NO_MEMORY;
709
710 pGMM->u32Magic = GMM_MAGIC;
711 for (unsigned i = 0; i < RT_ELEMENTS(pGMM->ChunkTLB.aEntries); i++)
712 pGMM->ChunkTLB.aEntries[i].idChunk = NIL_GMM_CHUNKID;
713 RTListInit(&pGMM->ChunkList);
714 ASMBitSet(&pGMM->bmChunkId[0], NIL_GMM_CHUNKID);
715
716 int rc = RTSemFastMutexCreate(&pGMM->hMtx);
717 if (RT_SUCCESS(rc))
718 {
719 unsigned iMtx;
720 for (iMtx = 0; iMtx < RT_ELEMENTS(pGMM->aChunkMtx); iMtx++)
721 {
722 rc = RTSemFastMutexCreate(&pGMM->aChunkMtx[iMtx].hMtx);
723 if (RT_FAILURE(rc))
724 break;
725 }
726 if (RT_SUCCESS(rc))
727 {
728 /*
729 * Check and see if RTR0MemObjAllocPhysNC works.
730 */
731#if 0 /* later, see #3170. */
732 RTR0MEMOBJ MemObj;
733 rc = RTR0MemObjAllocPhysNC(&MemObj, _64K, NIL_RTHCPHYS);
734 if (RT_SUCCESS(rc))
735 {
736 rc = RTR0MemObjFree(MemObj, true);
737 AssertRC(rc);
738 }
739 else if (rc == VERR_NOT_SUPPORTED)
740 pGMM->fLegacyAllocationMode = pGMM->fBoundMemoryMode = true;
741 else
742 SUPR0Printf("GMMR0Init: RTR0MemObjAllocPhysNC(,64K,Any) -> %d!\n", rc);
743#else
744# if defined(RT_OS_WINDOWS) || (defined(RT_OS_SOLARIS) && ARCH_BITS == 64) || defined(RT_OS_LINUX) || defined(RT_OS_FREEBSD)
745 pGMM->fLegacyAllocationMode = false;
746# if ARCH_BITS == 32
747 /* Don't reuse possibly partial chunks because of the virtual
748 address space limitation. */
749 pGMM->fBoundMemoryMode = true;
750# else
751 pGMM->fBoundMemoryMode = false;
752# endif
753# else
754 pGMM->fLegacyAllocationMode = true;
755 pGMM->fBoundMemoryMode = true;
756# endif
757#endif
758
759 /*
760 * Query system page count and guess a reasonable cMaxPages value.
761 */
762 pGMM->cMaxPages = UINT32_MAX; /** @todo IPRT function for query ram size and such. */
763
764 g_pGMM = pGMM;
765 LogFlow(("GMMInit: pGMM=%p fLegacyAllocationMode=%RTbool fBoundMemoryMode=%RTbool\n", pGMM, pGMM->fLegacyAllocationMode, pGMM->fBoundMemoryMode));
766 return VINF_SUCCESS;
767 }
768
769 /*
770 * Bail out.
771 */
772 while (iMtx-- > 0)
773 RTSemFastMutexDestroy(pGMM->aChunkMtx[iMtx].hMtx);
774 RTSemFastMutexDestroy(pGMM->hMtx);
775 }
776
777 pGMM->u32Magic = 0;
778 RTMemFree(pGMM);
779 SUPR0Printf("GMMR0Init: failed! rc=%d\n", rc);
780 return rc;
781}
782
783
784/**
785 * Terminates the GMM component.
786 */
787GMMR0DECL(void) GMMR0Term(void)
788{
789 LogFlow(("GMMTerm:\n"));
790
791 /*
792 * Take care / be paranoid...
793 */
794 PGMM pGMM = g_pGMM;
795 if (!VALID_PTR(pGMM))
796 return;
797 if (pGMM->u32Magic != GMM_MAGIC)
798 {
799 SUPR0Printf("GMMR0Term: u32Magic=%#x\n", pGMM->u32Magic);
800 return;
801 }
802
803 /*
804 * Undo what init did and free all the resources we've acquired.
805 */
806 /* Destroy the fundamentals. */
807 g_pGMM = NULL;
808 pGMM->u32Magic = ~GMM_MAGIC;
809 RTSemFastMutexDestroy(pGMM->hMtx);
810 pGMM->hMtx = NIL_RTSEMFASTMUTEX;
811
812 /* Free any chunks still hanging around. */
813 RTAvlU32Destroy(&pGMM->pChunks, gmmR0TermDestroyChunk, pGMM);
814
815 /* Destroy the chunk locks. */
816 for (unsigned iMtx = 0; iMtx < RT_ELEMENTS(pGMM->aChunkMtx); iMtx++)
817 {
818 Assert(pGMM->aChunkMtx[iMtx].cUsers == 0);
819 RTSemFastMutexDestroy(pGMM->aChunkMtx[iMtx].hMtx);
820 pGMM->aChunkMtx[iMtx].hMtx = NIL_RTSEMFASTMUTEX;
821 }
822
823 /* Finally the instance data itself. */
824 RTMemFree(pGMM);
825 LogFlow(("GMMTerm: done\n"));
826}
827
828
829/**
830 * RTAvlU32Destroy callback.
831 *
832 * @returns 0
833 * @param pNode The node to destroy.
834 * @param pvGMM The GMM handle.
835 */
836static DECLCALLBACK(int) gmmR0TermDestroyChunk(PAVLU32NODECORE pNode, void *pvGMM)
837{
838 PGMMCHUNK pChunk = (PGMMCHUNK)pNode;
839
840 if (pChunk->cFree != (GMM_CHUNK_SIZE >> PAGE_SHIFT))
841 SUPR0Printf("GMMR0Term: %p/%#x: cFree=%d cPrivate=%d cShared=%d cMappings=%d\n", pChunk,
842 pChunk->Core.Key, pChunk->cFree, pChunk->cPrivate, pChunk->cShared, pChunk->cMappingsX);
843
844 int rc = RTR0MemObjFree(pChunk->hMemObj, true /* fFreeMappings */);
845 if (RT_FAILURE(rc))
846 {
847 SUPR0Printf("GMMR0Term: %p/%#x: RTRMemObjFree(%p,true) -> %d (cMappings=%d)\n", pChunk,
848 pChunk->Core.Key, pChunk->hMemObj, rc, pChunk->cMappingsX);
849 AssertRC(rc);
850 }
851 pChunk->hMemObj = NIL_RTR0MEMOBJ;
852
853 RTMemFree(pChunk->paMappingsX);
854 pChunk->paMappingsX = NULL;
855
856 RTMemFree(pChunk);
857 NOREF(pvGMM);
858 return 0;
859}
860
861
862/**
863 * Initializes the per-VM data for the GMM.
864 *
865 * This is called from within the GVMM lock (from GVMMR0CreateVM)
866 * and should only initialize the data members so GMMR0CleanupVM
867 * can deal with them. We reserve no memory or anything here;
868 * that's done later in GMMR0InitVM.
869 *
870 * @param pGVM Pointer to the Global VM structure.
871 */
872GMMR0DECL(void) GMMR0InitPerVMData(PGVM pGVM)
873{
874 AssertCompile(RT_SIZEOFMEMB(GVM,gmm.s) <= RT_SIZEOFMEMB(GVM,gmm.padding));
875
876 pGVM->gmm.s.enmPolicy = GMMOCPOLICY_INVALID;
877 pGVM->gmm.s.enmPriority = GMMPRIORITY_INVALID;
878 pGVM->gmm.s.fMayAllocate = false;
879}
880
881
882/**
883 * Acquires the GMM giant lock.
884 *
885 * @returns Assert status code from RTSemFastMutexRequest.
886 * @param pGMM Pointer to the GMM instance.
887 */
888static int gmmR0MutexAcquire(PGMM pGMM)
889{
890 ASMAtomicIncU32(&pGMM->cMtxContenders);
891 int rc = RTSemFastMutexRequest(pGMM->hMtx);
892 ASMAtomicDecU32(&pGMM->cMtxContenders);
893 AssertRC(rc);
894#ifdef VBOX_STRICT
895 pGMM->hMtxOwner = RTThreadNativeSelf();
896#endif
897 return rc;
898}
899
900
901/**
902 * Releases the GMM giant lock.
903 *
904 * @returns Assert status code from RTSemFastMutexRelease.
905 * @param pGMM Pointer to the GMM instance.
906 */
907static int gmmR0MutexRelease(PGMM pGMM)
908{
909#ifdef VBOX_STRICT
910 pGMM->hMtxOwner = NIL_RTNATIVETHREAD;
911#endif
912 int rc = RTSemFastMutexRelease(pGMM->hMtx);
913 AssertRC(rc);
914 return rc;
915}
916
917
918/**
919 * Yields the GMM giant lock if there is contention and a certain minimum time
920 * has elapsed since we took it.
921 *
922 * @returns @c true if the mutex was yielded, @c false if not.
923 * @param pGMM Pointer to the GMM instance.
924 * @param puLockNanoTS Where the lock acquisition time stamp is kept
925 * (in/out).
926 */
927static bool gmmR0MutexYield(PGMM pGMM, uint64_t *puLockNanoTS)
928{
929 /*
930 * If nobody is contending the mutex, don't bother checking the time.
931 */
932 if (ASMAtomicReadU32(&pGMM->cMtxContenders) == 0)
933 return false;
934
935 /*
936 * Don't yield if we haven't executed for at least 2 milliseconds.
937 */
938 uint64_t uNanoNow = RTTimeSystemNanoTS();
939 if (uNanoNow - *puLockNanoTS < UINT32_C(2000000))
940 return false;
941
942 /*
943 * Yield the mutex.
944 */
945#ifdef VBOX_STRICT
946 pGMM->hMtxOwner = NIL_RTNATIVETHREAD;
947#endif
948 ASMAtomicIncU32(&pGMM->cMtxContenders);
949 int rc1 = RTSemFastMutexRelease(pGMM->hMtx); AssertRC(rc1);
950
951 RTThreadYield();
952
953 int rc2 = RTSemFastMutexRequest(pGMM->hMtx); AssertRC(rc2);
954 *puLockNanoTS = RTTimeSystemNanoTS();
955 ASMAtomicDecU32(&pGMM->cMtxContenders);
956#ifdef VBOX_STRICT
957 pGMM->hMtxOwner = RTThreadNativeSelf();
958#endif
959
960 return true;
961}
962
963
964/**
965 * Acquires a chunk lock.
966 *
967 * The caller must own the giant lock.
968 *
969 * @returns Assert status code from RTSemFastMutexRequest.
970 * @param pMtxState The chunk mutex state info. (Avoids
971 * passing the same flags and stuff around
972 * for subsequent release and drop-giant
973 * calls.)
974 * @param pGMM Pointer to the GMM instance.
975 * @param pChunk Pointer to the chunk.
976 * @param fFlags Flags regarding the giant lock, GMMR0CHUNK_MTX_XXX.
977 */
978static int gmmR0ChunkMutexAcquire(PGMMR0CHUNKMTXSTATE pMtxState, PGMM pGMM, PGMMCHUNK pChunk, uint32_t fFlags)
979{
980 Assert(fFlags > GMMR0CHUNK_MTX_INVALID && fFlags < GMMR0CHUNK_MTX_END);
981 Assert(pGMM->hMtxOwner == RTThreadNativeSelf());
982
983 pMtxState->pGMM = pGMM;
984 pMtxState->fFlags = (uint8_t)fFlags;
985
986 /*
987 * Get the lock index and reference the lock.
988 */
989 Assert(pGMM->hMtxOwner == RTThreadNativeSelf());
990 uint32_t iChunkMtx = pChunk->iChunkMtx;
991 if (iChunkMtx == UINT8_MAX)
992 {
993 iChunkMtx = pGMM->iNextChunkMtx++;
994 iChunkMtx %= RT_ELEMENTS(pGMM->aChunkMtx);
995
996 /* Try get an unused one... */
997 if (pGMM->aChunkMtx[iChunkMtx].cUsers)
998 {
999 iChunkMtx = pGMM->iNextChunkMtx++;
1000 iChunkMtx %= RT_ELEMENTS(pGMM->aChunkMtx);
1001 if (pGMM->aChunkMtx[iChunkMtx].cUsers)
1002 {
1003 iChunkMtx = pGMM->iNextChunkMtx++;
1004 iChunkMtx %= RT_ELEMENTS(pGMM->aChunkMtx);
1005 if (pGMM->aChunkMtx[iChunkMtx].cUsers)
1006 {
1007 iChunkMtx = pGMM->iNextChunkMtx++;
1008 iChunkMtx %= RT_ELEMENTS(pGMM->aChunkMtx);
1009 }
1010 }
1011 }
1012
1013 pChunk->iChunkMtx = iChunkMtx;
1014 }
1015 AssertCompile(RT_ELEMENTS(pGMM->aChunkMtx) < UINT8_MAX);
1016 pMtxState->iChunkMtx = (uint8_t)iChunkMtx;
1017 ASMAtomicIncU32(&pGMM->aChunkMtx[iChunkMtx].cUsers);
1018
1019 /*
1020 * Drop the giant?
1021 */
1022 if (fFlags != GMMR0CHUNK_MTX_KEEP_GIANT)
1023 {
1024 /** @todo GMM life cycle cleanup (we may race someone
1025 * destroying and cleaning up GMM)? */
1026 gmmR0MutexRelease(pGMM);
1027 }
1028
1029 /*
1030 * Take the chunk mutex.
1031 */
1032 int rc = RTSemFastMutexRequest(pGMM->aChunkMtx[iChunkMtx].hMtx);
1033 AssertRC(rc);
1034 return rc;
1035}
1036
1037
1038/**
1039 * Releases a chunk mutex acquired by gmmR0ChunkMutexAcquire.
1040 *
1041 * @returns Assert status code from RTSemFastMutexRelease.
1042 * @param pMtxState The chunk mutex state.
1043 * @param pChunk Pointer to the chunk if it's still
1044 * alive, NULL if it isn't. This is used to deassociate
1045 * the chunk from the mutex on the way out so a new one
1046 * can be selected next time, thus avoiding contented
1047 * mutexes.
1048 */
1049static int gmmR0ChunkMutexRelease(PGMMR0CHUNKMTXSTATE pMtxState, PGMMCHUNK pChunk)
1050{
1051 PGMM pGMM = pMtxState->pGMM;
1052
1053 /*
1054 * Release the chunk mutex and reacquire the giant if requested.
1055 */
1056 int rc = RTSemFastMutexRelease(pGMM->aChunkMtx[pMtxState->iChunkMtx].hMtx);
1057 AssertRC(rc);
1058 if (pMtxState->fFlags == GMMR0CHUNK_MTX_RETAKE_GIANT)
1059 rc = gmmR0MutexAcquire(pGMM);
1060 else
1061 Assert((pMtxState->fFlags != GMMR0CHUNK_MTX_DROP_GIANT) == (pGMM->hMtxOwner == RTThreadNativeSelf()));
1062
1063 /*
1064 * Drop the chunk mutex user reference and deassociate it from the chunk
1065 * when possible.
1066 */
1067 if ( ASMAtomicDecU32(&pGMM->aChunkMtx[pMtxState->iChunkMtx].cUsers) == 0
1068 && pChunk
1069 && RT_SUCCESS(rc) )
1070 {
1071 if (pMtxState->fFlags != GMMR0CHUNK_MTX_DROP_GIANT)
1072 pChunk->iChunkMtx = UINT8_MAX;
1073 else
1074 {
1075 rc = gmmR0MutexAcquire(pGMM);
1076 if (RT_SUCCESS(rc))
1077 {
1078 if (pGMM->aChunkMtx[pMtxState->iChunkMtx].cUsers == 0)
1079 pChunk->iChunkMtx = UINT8_MAX;
1080 rc = gmmR0MutexRelease(pGMM);
1081 }
1082 }
1083 }
1084
1085 pMtxState->pGMM = NULL;
1086 return rc;
1087}
1088
1089
1090/**
1091 * Drops the giant GMM lock we kept in gmmR0ChunkMutexAcquire while keeping the
1092 * chunk locked.
1093 *
1094 * This only works if gmmR0ChunkMutexAcquire was called with
1095 * GMMR0CHUNK_MTX_KEEP_GIANT. gmmR0ChunkMutexRelease will retake the giant
1096 * mutex, i.e. behave as if GMMR0CHUNK_MTX_RETAKE_GIANT was used.
1097 *
1098 * @returns VBox status code (assuming success is ok).
1099 * @param pMtxState Pointer to the chunk mutex state.
1100 */
1101static int gmmR0ChunkMutexDropGiant(PGMMR0CHUNKMTXSTATE pMtxState)
1102{
1103 AssertReturn(pMtxState->fFlags == GMMR0CHUNK_MTX_KEEP_GIANT, VERR_GMM_MTX_FLAGS);
1104 Assert(pMtxState->pGMM->hMtxOwner == RTThreadNativeSelf());
1105 pMtxState->fFlags = GMMR0CHUNK_MTX_RETAKE_GIANT;
1106 /** @todo GMM life cycle cleanup (we may race someone
1107 * destroying and cleaning up GMM)? */
1108 return gmmR0MutexRelease(pMtxState->pGMM);
1109}
1110
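/*
 * Typical locking pattern (informal sketch; gmmR0CleanupVMScanChunk below does
 * essentially this): take the chunk mutex while holding the giant lock, drop
 * the giant for the slow work, and let the release path retake it.
 *
 *     GMMR0CHUNKMTXSTATE MtxState;
 *     gmmR0ChunkMutexAcquire(&MtxState, pGMM, pChunk, GMMR0CHUNK_MTX_KEEP_GIANT);
 *     ...
 *     gmmR0ChunkMutexDropGiant(&MtxState);        // only the chunk mutex is held now
 *     ...                                         // e.g. RTR0MemObjFree of a mapping
 *     gmmR0ChunkMutexRelease(&MtxState, pChunk);  // retakes the giant (RETAKE_GIANT)
 */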
1111
1112/**
1113 * For experimenting with NUMA affinity and such.
1114 *
1115 * @returns The current NUMA Node ID.
1116 */
1117static uint16_t gmmR0GetCurrentNumaNodeId(void)
1118{
1119#if 1
1120 return GMM_CHUNK_NUMA_ID_UNKNOWN;
1121#else
1122 return RTMpCpuId() / 16;
1123#endif
1124}
1125
1126
1127
1128/**
1129 * Cleans up when a VM is terminating.
1130 *
1131 * @param pGVM Pointer to the Global VM structure.
1132 */
1133GMMR0DECL(void) GMMR0CleanupVM(PGVM pGVM)
1134{
1135 LogFlow(("GMMR0CleanupVM: pGVM=%p:{.pVM=%p, .hSelf=%#x}\n", pGVM, pGVM->pVM, pGVM->hSelf));
1136
1137 PGMM pGMM;
1138 GMM_GET_VALID_INSTANCE_VOID(pGMM);
1139
1140#ifdef VBOX_WITH_PAGE_SHARING
1141 /*
1142 * Clean up all registered shared modules first.
1143 */
1144 gmmR0SharedModuleCleanup(pGMM, pGVM);
1145#endif
1146
1147 gmmR0MutexAcquire(pGMM);
1148 uint64_t uLockNanoTS = RTTimeSystemNanoTS();
1149 GMM_CHECK_SANITY_UPON_ENTERING(pGMM);
1150
1151 /*
1152 * The policy is 'INVALID' until the initial reservation
1153 * request has been serviced.
1154 */
1155 if ( pGVM->gmm.s.enmPolicy > GMMOCPOLICY_INVALID
1156 && pGVM->gmm.s.enmPolicy < GMMOCPOLICY_END)
1157 {
1158 /*
1159 * If it's the last VM around, we can skip walking all the chunks looking
1160 * for the pages owned by this VM and instead flush the whole shebang.
1161 *
1162 * This takes care of the eventuality that a VM has left shared page
1163 * references behind (shouldn't happen of course, but you never know).
1164 */
1165 Assert(pGMM->cRegisteredVMs);
1166 pGMM->cRegisteredVMs--;
1167
1168 /*
1169 * Walk the entire pool looking for pages that belong to this VM
1170 * and leftover mappings. (This'll only catch private pages,
1171 * shared pages will be 'left behind'.)
1172 */
1173 uint64_t cPrivatePages = pGVM->gmm.s.cPrivatePages; /* save */
1174
1175 unsigned iCountDown = 64;
1176 bool fRedoFromStart;
1177 PGMMCHUNK pChunk;
1178 do
1179 {
1180 fRedoFromStart = false;
1181 RTListForEachReverse(&pGMM->ChunkList, pChunk, GMMCHUNK, ListNode)
1182 {
1183 uint32_t const cFreeChunksOld = pGMM->cFreedChunks;
1184 if (gmmR0CleanupVMScanChunk(pGMM, pGVM, pChunk))
1185 {
1186 /* We left the giant mutex, so reset the yield counters. */
1187 uLockNanoTS = RTTimeSystemNanoTS();
1188 iCountDown = 64;
1189 }
1190 else
1191 {
1192 /* Didn't leave it, so do normal yielding. */
1193 if (!iCountDown)
1194 gmmR0MutexYield(pGMM, &uLockNanoTS);
1195 else
1196 iCountDown--;
1197 }
1198 if (pGMM->cFreedChunks != cFreeChunksOld)
1199 break;
1200 }
1201 } while (fRedoFromStart);
1202
1203 if (pGVM->gmm.s.cPrivatePages)
1204 SUPR0Printf("GMMR0CleanupVM: hGVM=%#x has %#x private pages that cannot be found!\n", pGVM->hSelf, pGVM->gmm.s.cPrivatePages);
1205
1206 pGMM->cAllocatedPages -= cPrivatePages;
1207
1208 /*
1209 * Free empty chunks.
1210 */
1211 PGMMCHUNKFREESET pPrivateSet = pGMM->fBoundMemoryMode ? &pGVM->gmm.s.Private : &pGMM->PrivateX;
1212 do
1213 {
1214 fRedoFromStart = false;
1215 iCountDown = 10240;
1216 pChunk = pPrivateSet->apLists[GMM_CHUNK_FREE_SET_UNUSED_LIST];
1217 while (pChunk)
1218 {
1219 PGMMCHUNK pNext = pChunk->pFreeNext;
1220 Assert(pChunk->cFree == GMM_CHUNK_NUM_PAGES);
1221 if ( !pGMM->fBoundMemoryMode
1222 || pChunk->hGVM == pGVM->hSelf)
1223 {
1224 uint64_t const idGenerationOld = pPrivateSet->idGeneration;
1225 if (gmmR0FreeChunk(pGMM, pGVM, pChunk, true /*fRelaxedSem*/))
1226 {
1227 /* We've left the giant mutex, restart? (+1 for our unlink) */
1228 fRedoFromStart = pPrivateSet->idGeneration != idGenerationOld + 1;
1229 if (fRedoFromStart)
1230 break;
1231 uLockNanoTS = RTTimeSystemNanoTS();
1232 iCountDown = 10240;
1233 }
1234 }
1235
1236 /* Advance and maybe yield the lock. */
1237 pChunk = pNext;
1238 if (--iCountDown == 0)
1239 {
1240 uint64_t const idGenerationOld = pPrivateSet->idGeneration;
1241 fRedoFromStart = gmmR0MutexYield(pGMM, &uLockNanoTS)
1242 && pPrivateSet->idGeneration != idGenerationOld;
1243 if (fRedoFromStart)
1244 break;
1245 iCountDown = 10240;
1246 }
1247 }
1248 } while (fRedoFromStart);
1249
1250 /*
1251 * Account for shared pages that weren't freed.
1252 */
1253 if (pGVM->gmm.s.cSharedPages)
1254 {
1255 Assert(pGMM->cSharedPages >= pGVM->gmm.s.cSharedPages);
1256 SUPR0Printf("GMMR0CleanupVM: hGVM=%#x left %#x shared pages behind!\n", pGVM->hSelf, pGVM->gmm.s.cSharedPages);
1257 pGMM->cLeftBehindSharedPages += pGVM->gmm.s.cSharedPages;
1258 }
1259
1260 /*
1261 * Clean up balloon statistics in case the VM process crashed.
1262 */
1263 Assert(pGMM->cBalloonedPages >= pGVM->gmm.s.cBalloonedPages);
1264 pGMM->cBalloonedPages -= pGVM->gmm.s.cBalloonedPages;
1265
1266 /*
1267 * Update the over-commitment management statistics.
1268 */
1269 pGMM->cReservedPages -= pGVM->gmm.s.Reserved.cBasePages
1270 + pGVM->gmm.s.Reserved.cFixedPages
1271 + pGVM->gmm.s.Reserved.cShadowPages;
1272 switch (pGVM->gmm.s.enmPolicy)
1273 {
1274 case GMMOCPOLICY_NO_OC:
1275 break;
1276 default:
1277 /** @todo Update GMM->cOverCommittedPages */
1278 break;
1279 }
1280 }
1281
1282 /* zap the GVM data. */
1283 pGVM->gmm.s.enmPolicy = GMMOCPOLICY_INVALID;
1284 pGVM->gmm.s.enmPriority = GMMPRIORITY_INVALID;
1285 pGVM->gmm.s.fMayAllocate = false;
1286
1287 GMM_CHECK_SANITY_UPON_LEAVING(pGMM);
1288 gmmR0MutexRelease(pGMM);
1289
1290 LogFlow(("GMMR0CleanupVM: returns\n"));
1291}
1292
1293
1294/**
1295 * Scan one chunk for private pages belonging to the specified VM.
1296 *
1297 * @note This function may drop the giant mutex!
1298 *
1299 * @returns @c true if we've temporarily dropped the giant mutex, @c false if
1300 * we didn't.
1301 * @param pGMM Pointer to the GMM instance.
1302 * @param pGVM The global VM handle.
1303 * @param pChunk The chunk to scan.
1304 */
1305static bool gmmR0CleanupVMScanChunk(PGMM pGMM, PGVM pGVM, PGMMCHUNK pChunk)
1306{
1307 /*
1308 * Look for pages belonging to the VM.
1309 * (Perform some internal checks while we're scanning.)
1310 */
1311#ifndef VBOX_STRICT
1312 if (pChunk->cFree != (GMM_CHUNK_SIZE >> PAGE_SHIFT))
1313#endif
1314 {
1315 unsigned cPrivate = 0;
1316 unsigned cShared = 0;
1317 unsigned cFree = 0;
1318
1319 gmmR0UnlinkChunk(pChunk); /* avoiding cFreePages updates. */
1320
1321 uint16_t hGVM = pGVM->hSelf;
1322 unsigned iPage = (GMM_CHUNK_SIZE >> PAGE_SHIFT);
1323 while (iPage-- > 0)
1324 if (GMM_PAGE_IS_PRIVATE(&pChunk->aPages[iPage]))
1325 {
1326 if (pChunk->aPages[iPage].Private.hGVM == hGVM)
1327 {
1328 /*
1329 * Free the page.
1330 *
1331 * The reason for not using gmmR0FreePrivatePage here is that we
1332 * must *not* cause the chunk to be freed from under us - we're in
1333 * an AVL tree walk here.
1334 */
1335 pChunk->aPages[iPage].u = 0;
1336 pChunk->aPages[iPage].Free.iNext = pChunk->iFreeHead;
1337 pChunk->aPages[iPage].Free.u2State = GMM_PAGE_STATE_FREE;
1338 pChunk->iFreeHead = iPage;
1339 pChunk->cPrivate--;
1340 pChunk->cFree++;
1341 pGVM->gmm.s.cPrivatePages--;
1342 cFree++;
1343 }
1344 else
1345 cPrivate++;
1346 }
1347 else if (GMM_PAGE_IS_FREE(&pChunk->aPages[iPage]))
1348 cFree++;
1349 else
1350 cShared++;
1351
1352 gmmR0SelectSetAndLinkChunk(pGMM, pGVM, pChunk);
1353
1354 /*
1355 * Did it add up?
1356 */
1357 if (RT_UNLIKELY( pChunk->cFree != cFree
1358 || pChunk->cPrivate != cPrivate
1359 || pChunk->cShared != cShared))
1360 {
1361 SUPR0Printf("gmmR0CleanupVMScanChunk: Chunk %p/%#x has bogus stats - free=%d/%d private=%d/%d shared=%d/%d\n",
1362                     pChunk, pChunk->Core.Key, pChunk->cFree, cFree, pChunk->cPrivate, cPrivate, pChunk->cShared, cShared);
1363 pChunk->cFree = cFree;
1364 pChunk->cPrivate = cPrivate;
1365 pChunk->cShared = cShared;
1366 }
1367 }
1368
1369 /*
1370 * If not in bound memory mode, we should reset the hGVM field
1371 * if it has our handle in it.
1372 */
1373 if (pChunk->hGVM == pGVM->hSelf)
1374 {
1375 if (!g_pGMM->fBoundMemoryMode)
1376 pChunk->hGVM = NIL_GVM_HANDLE;
1377 else if (pChunk->cFree != GMM_CHUNK_NUM_PAGES)
1378 {
1379 SUPR0Printf("gmmR0CleanupVMScanChunk: %p/%#x: cFree=%#x - it should be 0 in bound mode!\n",
1380 pChunk, pChunk->Core.Key, pChunk->cFree);
1381 AssertMsgFailed(("%p/%#x: cFree=%#x - it should be 0 in bound mode!\n", pChunk, pChunk->Core.Key, pChunk->cFree));
1382
1383 gmmR0UnlinkChunk(pChunk);
1384 pChunk->cFree = GMM_CHUNK_NUM_PAGES;
1385 gmmR0SelectSetAndLinkChunk(pGMM, pGVM, pChunk);
1386 }
1387 }
1388
1389 /*
1390 * Look for a mapping belonging to the terminating VM.
1391 */
1392 GMMR0CHUNKMTXSTATE MtxState;
1393 gmmR0ChunkMutexAcquire(&MtxState, pGMM, pChunk, GMMR0CHUNK_MTX_KEEP_GIANT);
1394 unsigned cMappings = pChunk->cMappingsX;
1395 for (unsigned i = 0; i < cMappings; i++)
1396 if (pChunk->paMappingsX[i].pGVM == pGVM)
1397 {
1398 gmmR0ChunkMutexDropGiant(&MtxState);
1399
1400 RTR0MEMOBJ hMemObj = pChunk->paMappingsX[i].hMapObj;
1401
1402 cMappings--;
1403 if (i < cMappings)
1404 pChunk->paMappingsX[i] = pChunk->paMappingsX[cMappings];
1405 pChunk->paMappingsX[cMappings].pGVM = NULL;
1406 pChunk->paMappingsX[cMappings].hMapObj = NIL_RTR0MEMOBJ;
1407 Assert(pChunk->cMappingsX - 1U == cMappings);
1408 pChunk->cMappingsX = cMappings;
1409
1410 int rc = RTR0MemObjFree(hMemObj, false /* fFreeMappings (NA) */);
1411 if (RT_FAILURE(rc))
1412 {
1413 SUPR0Printf("gmmR0CleanupVMScanChunk: %p/%#x: mapping #%x: RTRMemObjFree(%p,false) -> %d \n",
1414 pChunk, pChunk->Core.Key, i, hMemObj, rc);
1415 AssertRC(rc);
1416 }
1417
1418 gmmR0ChunkMutexRelease(&MtxState, pChunk);
1419 return true;
1420 }
1421
1422 gmmR0ChunkMutexRelease(&MtxState, pChunk);
1423 return false;
1424}
1425
1426
1427/**
1428 * The initial resource reservations.
1429 *
1430 * This will make memory reservations according to policy and priority. If there aren't
1431 * sufficient resources available to sustain the VM this function will fail and all
1432 * future allocations requests will fail as well.
1433 *
1434 * These are just the initial reservations made very early during the VM creation
1435 * process and will be adjusted later in the GMMR0UpdateReservation call after the
1436 * ring-3 init has completed.
1437 *
1438 * @returns VBox status code.
1439 * @retval VERR_GMM_MEMORY_RESERVATION_DECLINED
1440 * @retval VERR_GMM_
1441 *
1442 * @param pVM Pointer to the shared VM structure.
1443 * @param idCpu VCPU id
1444 * @param cBasePages The number of pages that may be allocated for the base RAM and ROMs.
1445 * This does not include MMIO2 and similar.
1446 * @param cShadowPages The number of pages that may be allocated for shadow paging structures.
1447 * @param cFixedPages The number of pages that may be allocated for fixed objects like the
1448 * hyper heap, MMIO2 and similar.
1449 * @param enmPolicy The OC policy to use on this VM.
1450 * @param enmPriority The priority in an out-of-memory situation.
1451 *
1452 * @thread The creator thread / EMT.
1453 */
1454GMMR0DECL(int) GMMR0InitialReservation(PVM pVM, VMCPUID idCpu, uint64_t cBasePages, uint32_t cShadowPages, uint32_t cFixedPages,
1455 GMMOCPOLICY enmPolicy, GMMPRIORITY enmPriority)
1456{
1457 LogFlow(("GMMR0InitialReservation: pVM=%p cBasePages=%#llx cShadowPages=%#x cFixedPages=%#x enmPolicy=%d enmPriority=%d\n",
1458 pVM, cBasePages, cShadowPages, cFixedPages, enmPolicy, enmPriority));
1459
1460 /*
1461 * Validate, get basics and take the semaphore.
1462 */
1463 PGMM pGMM;
1464 GMM_GET_VALID_INSTANCE(pGMM, VERR_GMM_INSTANCE);
1465 PGVM pGVM;
1466 int rc = GVMMR0ByVMAndEMT(pVM, idCpu, &pGVM);
1467 if (RT_FAILURE(rc))
1468 return rc;
1469
1470 AssertReturn(cBasePages, VERR_INVALID_PARAMETER);
1471 AssertReturn(cShadowPages, VERR_INVALID_PARAMETER);
1472 AssertReturn(cFixedPages, VERR_INVALID_PARAMETER);
1473 AssertReturn(enmPolicy > GMMOCPOLICY_INVALID && enmPolicy < GMMOCPOLICY_END, VERR_INVALID_PARAMETER);
1474 AssertReturn(enmPriority > GMMPRIORITY_INVALID && enmPriority < GMMPRIORITY_END, VERR_INVALID_PARAMETER);
1475
1476 gmmR0MutexAcquire(pGMM);
1477 if (GMM_CHECK_SANITY_UPON_ENTERING(pGMM))
1478 {
1479 if ( !pGVM->gmm.s.Reserved.cBasePages
1480 && !pGVM->gmm.s.Reserved.cFixedPages
1481 && !pGVM->gmm.s.Reserved.cShadowPages)
1482 {
1483 /*
1484 * Check if we can accommodate this.
1485 */
1486 /* ... later ... */
1487 if (RT_SUCCESS(rc))
1488 {
1489 /*
1490 * Update the records.
1491 */
1492 pGVM->gmm.s.Reserved.cBasePages = cBasePages;
1493 pGVM->gmm.s.Reserved.cFixedPages = cFixedPages;
1494 pGVM->gmm.s.Reserved.cShadowPages = cShadowPages;
1495 pGVM->gmm.s.enmPolicy = enmPolicy;
1496 pGVM->gmm.s.enmPriority = enmPriority;
1497 pGVM->gmm.s.fMayAllocate = true;
1498
1499 pGMM->cReservedPages += cBasePages + cFixedPages + cShadowPages;
1500 pGMM->cRegisteredVMs++;
1501 }
1502 }
1503 else
1504 rc = VERR_WRONG_ORDER;
1505 GMM_CHECK_SANITY_UPON_LEAVING(pGMM);
1506 }
1507 else
1508 rc = VERR_GMM_IS_NOT_SANE;
1509 gmmR0MutexRelease(pGMM);
1510 LogFlow(("GMMR0InitialReservation: returns %Rrc\n", rc));
1511 return rc;
1512}
1513
1514
1515/**
1516 * VMMR0 request wrapper for GMMR0InitialReservation.
1517 *
1518 * @returns see GMMR0InitialReservation.
1519 * @param pVM Pointer to the shared VM structure.
1520 * @param idCpu VCPU id
1521 * @param pReq The request packet.
1522 */
1523GMMR0DECL(int) GMMR0InitialReservationReq(PVM pVM, VMCPUID idCpu, PGMMINITIALRESERVATIONREQ pReq)
1524{
1525 /*
1526 * Validate input and pass it on.
1527 */
1528 AssertPtrReturn(pVM, VERR_INVALID_POINTER);
1529 AssertPtrReturn(pReq, VERR_INVALID_POINTER);
1530 AssertMsgReturn(pReq->Hdr.cbReq == sizeof(*pReq), ("%#x != %#x\n", pReq->Hdr.cbReq, sizeof(*pReq)), VERR_INVALID_PARAMETER);
1531
1532 return GMMR0InitialReservation(pVM, idCpu, pReq->cBasePages, pReq->cShadowPages, pReq->cFixedPages, pReq->enmPolicy, pReq->enmPriority);
1533}
1534
1535
1536/**
1537 * This updates the memory reservation with the additional MMIO2 and ROM pages.
1538 *
1539 * @returns VBox status code.
1540 * @retval VERR_GMM_MEMORY_RESERVATION_DECLINED
1541 *
1542 * @param pVM Pointer to the shared VM structure.
1543 * @param idCpu VCPU id
1544 * @param cBasePages The number of pages that may be allocated for the base RAM and ROMs.
1545 * This does not include MMIO2 and similar.
1546 * @param cShadowPages The number of pages that may be allocated for shadow paging structures.
1547 * @param cFixedPages The number of pages that may be allocated for fixed objects like the
1548 * hyper heap, MMIO2 and similar.
1549 *
1550 * @thread EMT.
1551 */
1552GMMR0DECL(int) GMMR0UpdateReservation(PVM pVM, VMCPUID idCpu, uint64_t cBasePages, uint32_t cShadowPages, uint32_t cFixedPages)
1553{
1554 LogFlow(("GMMR0UpdateReservation: pVM=%p cBasePages=%#llx cShadowPages=%#x cFixedPages=%#x\n",
1555 pVM, cBasePages, cShadowPages, cFixedPages));
1556
1557 /*
1558 * Validate, get basics and take the semaphore.
1559 */
1560 PGMM pGMM;
1561 GMM_GET_VALID_INSTANCE(pGMM, VERR_GMM_INSTANCE);
1562 PGVM pGVM;
1563 int rc = GVMMR0ByVMAndEMT(pVM, idCpu, &pGVM);
1564 if (RT_FAILURE(rc))
1565 return rc;
1566
1567 AssertReturn(cBasePages, VERR_INVALID_PARAMETER);
1568 AssertReturn(cShadowPages, VERR_INVALID_PARAMETER);
1569 AssertReturn(cFixedPages, VERR_INVALID_PARAMETER);
1570
1571 gmmR0MutexAcquire(pGMM);
1572 if (GMM_CHECK_SANITY_UPON_ENTERING(pGMM))
1573 {
1574 if ( pGVM->gmm.s.Reserved.cBasePages
1575 && pGVM->gmm.s.Reserved.cFixedPages
1576 && pGVM->gmm.s.Reserved.cShadowPages)
1577 {
1578 /*
1579 * Check if we can accommodate this.
1580 */
1581 /* ... later ... */
1582 if (RT_SUCCESS(rc))
1583 {
1584 /*
1585 * Update the records.
1586 */
1587 pGMM->cReservedPages -= pGVM->gmm.s.Reserved.cBasePages
1588 + pGVM->gmm.s.Reserved.cFixedPages
1589 + pGVM->gmm.s.Reserved.cShadowPages;
1590 pGMM->cReservedPages += cBasePages + cFixedPages + cShadowPages;
1591
1592 pGVM->gmm.s.Reserved.cBasePages = cBasePages;
1593 pGVM->gmm.s.Reserved.cFixedPages = cFixedPages;
1594 pGVM->gmm.s.Reserved.cShadowPages = cShadowPages;
1595 }
1596 }
1597 else
1598 rc = VERR_WRONG_ORDER;
1599 GMM_CHECK_SANITY_UPON_LEAVING(pGMM);
1600 }
1601 else
1602 rc = VERR_GMM_IS_NOT_SANE;
1603 gmmR0MutexRelease(pGMM);
1604 LogFlow(("GMMR0UpdateReservation: returns %Rrc\n", rc));
1605 return rc;
1606}
1607
1608
1609/**
1610 * VMMR0 request wrapper for GMMR0UpdateReservation.
1611 *
1612 * @returns see GMMR0UpdateReservation.
1613 * @param pVM Pointer to the shared VM structure.
1614 * @param idCpu VCPU id
1615 * @param pReq The request packet.
1616 */
1617GMMR0DECL(int) GMMR0UpdateReservationReq(PVM pVM, VMCPUID idCpu, PGMMUPDATERESERVATIONREQ pReq)
1618{
1619 /*
1620 * Validate input and pass it on.
1621 */
1622 AssertPtrReturn(pVM, VERR_INVALID_POINTER);
1623 AssertPtrReturn(pReq, VERR_INVALID_POINTER);
1624 AssertMsgReturn(pReq->Hdr.cbReq == sizeof(*pReq), ("%#x != %#x\n", pReq->Hdr.cbReq, sizeof(*pReq)), VERR_INVALID_PARAMETER);
1625
1626 return GMMR0UpdateReservation(pVM, idCpu, pReq->cBasePages, pReq->cShadowPages, pReq->cFixedPages);
1627}
1628
1629#ifdef GMMR0_WITH_SANITY_CHECK
1630
1631/**
1632 * Performs sanity checks on a free set.
1633 *
1634 * @returns Error count.
1635 *
1636 * @param pGMM Pointer to the GMM instance.
1637 * @param pSet Pointer to the set.
1638 * @param pszSetName The set name.
1639 * @param pszFunction The function from which it was called.
1640 * @param uLineNo The line number.
1641 */
1642static uint32_t gmmR0SanityCheckSet(PGMM pGMM, PGMMCHUNKFREESET pSet, const char *pszSetName,
1643 const char *pszFunction, unsigned uLineNo)
1644{
1645 uint32_t cErrors = 0;
1646
1647 /*
1648 * Count the free pages in all the chunks and match it against pSet->cFreePages.
1649 */
1650 uint32_t cPages = 0;
1651 for (unsigned i = 0; i < RT_ELEMENTS(pSet->apLists); i++)
1652 {
1653 for (PGMMCHUNK pCur = pSet->apLists[i]; pCur; pCur = pCur->pFreeNext)
1654 {
1655            /** @todo check that the chunk is hashed into the right set. */
1656 cPages += pCur->cFree;
1657 }
1658 }
1659 if (RT_UNLIKELY(cPages != pSet->cFreePages))
1660 {
1661 SUPR0Printf("GMM insanity: found %#x pages in the %s set, expected %#x. (%s, line %u)\n",
1662 cPages, pszSetName, pSet->cFreePages, pszFunction, uLineNo);
1663 cErrors++;
1664 }
1665
1666 return cErrors;
1667}
1668
1669
1670/**
1671 * Performs some sanity checks on the GMM while owning the lock.
1672 *
1673 * @returns Error count.
1674 *
1675 * @param pGMM Pointer to the GMM instance.
1676 * @param pszFunction The function from which it is called.
1677 * @param uLineNo The line number.
1678 */
1679static uint32_t gmmR0SanityCheck(PGMM pGMM, const char *pszFunction, unsigned uLineNo)
1680{
1681 uint32_t cErrors = 0;
1682
1683 cErrors += gmmR0SanityCheckSet(pGMM, &pGMM->PrivateX, "private", pszFunction, uLineNo);
1684 cErrors += gmmR0SanityCheckSet(pGMM, &pGMM->Shared, "shared", pszFunction, uLineNo);
1685 /** @todo add more sanity checks. */
1686
1687 return cErrors;
1688}
1689
1690#endif /* GMMR0_WITH_SANITY_CHECK */
1691
1692/**
1693 * Looks up a chunk in the tree and fills in the TLB entry for it.
1694 *
1695 * This is not expected to fail and will bitch if it does.
1696 *
1697 * @returns Pointer to the allocation chunk, NULL if not found.
1698 * @param pGMM Pointer to the GMM instance.
1699 * @param idChunk The ID of the chunk to find.
1700 * @param pTlbe Pointer to the TLB entry.
1701 */
1702static PGMMCHUNK gmmR0GetChunkSlow(PGMM pGMM, uint32_t idChunk, PGMMCHUNKTLBE pTlbe)
1703{
1704 PGMMCHUNK pChunk = (PGMMCHUNK)RTAvlU32Get(&pGMM->pChunks, idChunk);
1705 AssertMsgReturn(pChunk, ("Chunk %#x not found!\n", idChunk), NULL);
1706 pTlbe->idChunk = idChunk;
1707 pTlbe->pChunk = pChunk;
1708 return pChunk;
1709}
1710
1711
1712/**
1713 * Finds an allocation chunk.
1714 *
1715 * This is not expected to fail and will bitch if it does.
1716 *
1717 * @returns Pointer to the allocation chunk, NULL if not found.
1718 * @param pGMM Pointer to the GMM instance.
1719 * @param idChunk The ID of the chunk to find.
1720 */
1721DECLINLINE(PGMMCHUNK) gmmR0GetChunk(PGMM pGMM, uint32_t idChunk)
1722{
1723 /*
1724 * Do a TLB lookup, branch if not in the TLB.
1725 */
1726 PGMMCHUNKTLBE pTlbe = &pGMM->ChunkTLB.aEntries[GMM_CHUNKTLB_IDX(idChunk)];
1727 if ( pTlbe->idChunk != idChunk
1728 || !pTlbe->pChunk)
1729 return gmmR0GetChunkSlow(pGMM, idChunk, pTlbe);
1730 return pTlbe->pChunk;
1731}
1732
1733
1734/**
1735 * Finds a page.
1736 *
1737 * This is not expected to fail and will bitch if it does.
1738 *
1739 * @returns Pointer to the page, NULL if not found.
1740 * @param pGMM Pointer to the GMM instance.
1741 * @param idPage The ID of the page to find.
1742 */
1743DECLINLINE(PGMMPAGE) gmmR0GetPage(PGMM pGMM, uint32_t idPage)
1744{
1745 PGMMCHUNK pChunk = gmmR0GetChunk(pGMM, idPage >> GMM_CHUNKID_SHIFT);
1746 if (RT_LIKELY(pChunk))
1747 return &pChunk->aPages[idPage & GMM_PAGEID_IDX_MASK];
1748 return NULL;
1749}
1750
1751
1752/**
1753 * Gets the host physical address for a page given by its ID.
1754 *
1755 * @returns The host physical address or NIL_RTHCPHYS.
1756 * @param pGMM Pointer to the GMM instance.
1757 * @param idPage The ID of the page to find.
1758 */
1759DECLINLINE(RTHCPHYS) gmmR0GetPageHCPhys(PGMM pGMM, uint32_t idPage)
1760{
1761 PGMMCHUNK pChunk = gmmR0GetChunk(pGMM, idPage >> GMM_CHUNKID_SHIFT);
1762 if (RT_LIKELY(pChunk))
1763 return RTR0MemObjGetPagePhysAddr(pChunk->hMemObj, idPage & GMM_PAGEID_IDX_MASK);
1764 return NIL_RTHCPHYS;
1765}
1766
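/*
 * Example: resolving a page ID with the lookup helpers above.
 *
 * A minimal sketch (the GMM instance pointer and the page ID are assumed to
 * be valid locals) showing how a page ID yields both its tracking entry and
 * the backing host physical address:
 *
 *      PGMMPAGE pPage  = gmmR0GetPage(pGMM, idPage);
 *      RTHCPHYS HCPhys = gmmR0GetPageHCPhys(pGMM, idPage);
 *      if (   pPage
 *          && GMM_PAGE_IS_PRIVATE(pPage)
 *          && HCPhys != NIL_RTHCPHYS)
 *      {
 *          // idPage names a private page backed by the host page at HCPhys.
 *      }
 */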
1767
1768/**
1769 * Selects the appropriate free list given the number of free pages.
1770 *
1771 * @returns Free list index.
1772 * @param cFree The number of free pages in the chunk.
1773 */
1774DECLINLINE(unsigned) gmmR0SelectFreeSetList(unsigned cFree)
1775{
1776 unsigned iList = cFree >> GMM_CHUNK_FREE_SET_SHIFT;
1777 AssertMsg(iList < RT_SIZEOFMEMB(GMMCHUNKFREESET, apLists) / RT_SIZEOFMEMB(GMMCHUNKFREESET, apLists[0]),
1778 ("%d (%u)\n", iList, cFree));
1779 return iList;
1780}
1781
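/*
 * Example: when a chunk needs relinking.
 *
 * A minimal sketch (pChunk being any chunk currently in a free set); it only
 * relies on what the function above states, namely that chunks are bucketed
 * by cFree >> GMM_CHUNK_FREE_SET_SHIFT.  A chunk therefore only has to move
 * to another list when an allocation or free pushes its free count across a
 * bucket boundary:
 *
 *      unsigned const iOld = gmmR0SelectFreeSetList(pChunk->cFree);
 *      unsigned const iNew = gmmR0SelectFreeSetList(pChunk->cFree + 1);
 *      if (iOld != iNew)
 *      {
 *          // Crossing a bucket boundary: unlink and relink the chunk.
 *          // (This is the check gmmR0FreePageWorker performs further down.)
 *      }
 */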
1782
1783/**
1784 * Unlinks the chunk from the free list it's currently on (if any).
1785 *
1786 * @param pChunk The allocation chunk.
1787 */
1788DECLINLINE(void) gmmR0UnlinkChunk(PGMMCHUNK pChunk)
1789{
1790 PGMMCHUNKFREESET pSet = pChunk->pSet;
1791 if (RT_LIKELY(pSet))
1792 {
1793 pSet->cFreePages -= pChunk->cFree;
1794 pSet->idGeneration++;
1795
1796 PGMMCHUNK pPrev = pChunk->pFreePrev;
1797 PGMMCHUNK pNext = pChunk->pFreeNext;
1798 if (pPrev)
1799 pPrev->pFreeNext = pNext;
1800 else
1801 pSet->apLists[gmmR0SelectFreeSetList(pChunk->cFree)] = pNext;
1802 if (pNext)
1803 pNext->pFreePrev = pPrev;
1804
1805 pChunk->pSet = NULL;
1806 pChunk->pFreeNext = NULL;
1807 pChunk->pFreePrev = NULL;
1808 }
1809 else
1810 {
1811 Assert(!pChunk->pFreeNext);
1812 Assert(!pChunk->pFreePrev);
1813 Assert(!pChunk->cFree);
1814 }
1815}
1816
1817
1818/**
1819 * Links the chunk onto the appropriate free list in the specified free set.
1820 *
1821 * If the chunk has no free entries, it's not linked into any list.
1822 *
1823 * @param pChunk The allocation chunk.
1824 * @param pSet The free set.
1825 */
1826DECLINLINE(void) gmmR0LinkChunk(PGMMCHUNK pChunk, PGMMCHUNKFREESET pSet)
1827{
1828 Assert(!pChunk->pSet);
1829 Assert(!pChunk->pFreeNext);
1830 Assert(!pChunk->pFreePrev);
1831
1832 if (pChunk->cFree > 0)
1833 {
1834 pChunk->pSet = pSet;
1835 pChunk->pFreePrev = NULL;
1836 unsigned const iList = gmmR0SelectFreeSetList(pChunk->cFree);
1837 pChunk->pFreeNext = pSet->apLists[iList];
1838 if (pChunk->pFreeNext)
1839 pChunk->pFreeNext->pFreePrev = pChunk;
1840 pSet->apLists[iList] = pChunk;
1841
1842 pSet->cFreePages += pChunk->cFree;
1843 pSet->idGeneration++;
1844 }
1845}
1846
1847
1848/**
1849 * Selects the appropriate free set for the chunk and links it onto the
1850 * corresponding free list.
1851 *
1852 * If the chunk has no free entries, it's not linked into any list.
1853 *
 * @param   pGMM        Pointer to the GMM instance.
 * @param   pGVM        Pointer to the global VM structure.
 * @param   pChunk      The allocation chunk.
1854 */
1855DECLINLINE(void) gmmR0SelectSetAndLinkChunk(PGMM pGMM, PGVM pGVM, PGMMCHUNK pChunk)
1856{
1857 PGMMCHUNKFREESET pSet;
1858 if (pGMM->fBoundMemoryMode)
1859 pSet = &pGVM->gmm.s.Private;
1860 else if (pChunk->cShared)
1861 pSet = &pGMM->Shared;
1862 else
1863 pSet = &pGMM->PrivateX;
1864 gmmR0LinkChunk(pChunk, pSet);
1865}
1866
1867
1868/**
1869 * Frees a Chunk ID.
1870 *
1871 * @param pGMM Pointer to the GMM instance.
1872 * @param idChunk The Chunk ID to free.
1873 */
1874static void gmmR0FreeChunkId(PGMM pGMM, uint32_t idChunk)
1875{
1876 AssertReturnVoid(idChunk != NIL_GMM_CHUNKID);
1877 AssertMsg(ASMBitTest(&pGMM->bmChunkId[0], idChunk), ("%#x\n", idChunk));
1878 ASMAtomicBitClear(&pGMM->bmChunkId[0], idChunk);
1879}
1880
1881
1882/**
1883 * Allocates a new Chunk ID.
1884 *
1885 * @returns The Chunk ID.
1886 * @param pGMM Pointer to the GMM instance.
1887 */
1888static uint32_t gmmR0AllocateChunkId(PGMM pGMM)
1889{
1890 AssertCompile(!((GMM_CHUNKID_LAST + 1) & 31)); /* must be a multiple of 32 */
1891 AssertCompile(NIL_GMM_CHUNKID == 0);
1892
1893 /*
1894 * Try the next sequential one.
1895 */
1896 int32_t idChunk = ++pGMM->idChunkPrev;
1897#if 0 /** @todo enable this code */
1898 if ( idChunk <= GMM_CHUNKID_LAST
1899 && idChunk > NIL_GMM_CHUNKID
1900        && !ASMAtomicBitTestAndSet(&pGMM->bmChunkId[0], idChunk))
1901 return idChunk;
1902#endif
1903
1904 /*
1905 * Scan sequentially from the last one.
1906 */
1907 if ( (uint32_t)idChunk < GMM_CHUNKID_LAST
1908 && idChunk > NIL_GMM_CHUNKID)
1909 {
1910 idChunk = ASMBitNextClear(&pGMM->bmChunkId[0], GMM_CHUNKID_LAST + 1, idChunk);
1911 if (idChunk > NIL_GMM_CHUNKID)
1912 {
1913 AssertMsgReturn(!ASMAtomicBitTestAndSet(&pGMM->bmChunkId[0], idChunk), ("%#x\n", idChunk), NIL_GMM_CHUNKID);
1914 return pGMM->idChunkPrev = idChunk;
1915 }
1916 }
1917
1918 /*
1919 * Ok, scan from the start.
1920 * We're not racing anyone, so there is no need to expect failures or have restart loops.
1921 */
1922 idChunk = ASMBitFirstClear(&pGMM->bmChunkId[0], GMM_CHUNKID_LAST + 1);
1923    AssertMsgReturn(idChunk > NIL_GMM_CHUNKID, ("%#x\n", idChunk), NIL_GMM_CHUNKID);
1924 AssertMsgReturn(!ASMAtomicBitTestAndSet(&pGMM->bmChunkId[0], idChunk), ("%#x\n", idChunk), NIL_GMM_CHUNKID);
1925
1926 return pGMM->idChunkPrev = idChunk;
1927}
1928
1929
1930/**
1931 * Allocates one private page.
1932 *
1933 * Worker for gmmR0AllocatePages.
1934 *
1935 * @param pChunk The chunk to allocate it from.
1936 * @param hGVM The GVM handle of the VM requesting memory.
1937 * @param pPageDesc The page descriptor.
1938 */
1939static void gmmR0AllocatePage(PGMMCHUNK pChunk, uint32_t hGVM, PGMMPAGEDESC pPageDesc)
1940{
1941 /* update the chunk stats. */
1942 if (pChunk->hGVM == NIL_GVM_HANDLE)
1943 pChunk->hGVM = hGVM;
1944 Assert(pChunk->cFree);
1945 pChunk->cFree--;
1946 pChunk->cPrivate++;
1947
1948 /* unlink the first free page. */
1949 const uint32_t iPage = pChunk->iFreeHead;
1950 AssertReleaseMsg(iPage < RT_ELEMENTS(pChunk->aPages), ("%d\n", iPage));
1951 PGMMPAGE pPage = &pChunk->aPages[iPage];
1952 Assert(GMM_PAGE_IS_FREE(pPage));
1953 pChunk->iFreeHead = pPage->Free.iNext;
1954 Log3(("A pPage=%p iPage=%#x/%#x u2State=%d iFreeHead=%#x iNext=%#x\n",
1955 pPage, iPage, (pChunk->Core.Key << GMM_CHUNKID_SHIFT) | iPage,
1956 pPage->Common.u2State, pChunk->iFreeHead, pPage->Free.iNext));
1957
1958 /* make the page private. */
1959 pPage->u = 0;
1960 AssertCompile(GMM_PAGE_STATE_PRIVATE == 0);
1961 pPage->Private.hGVM = hGVM;
1962 AssertCompile(NIL_RTHCPHYS >= GMM_GCPHYS_LAST);
1963 AssertCompile(GMM_GCPHYS_UNSHAREABLE >= GMM_GCPHYS_LAST);
1964 if (pPageDesc->HCPhysGCPhys <= GMM_GCPHYS_LAST)
1965 pPage->Private.pfn = pPageDesc->HCPhysGCPhys >> PAGE_SHIFT;
1966 else
1967 pPage->Private.pfn = GMM_PAGE_PFN_UNSHAREABLE; /* unshareable / unassigned - same thing. */
1968
1969 /* update the page descriptor. */
1970 pPageDesc->HCPhysGCPhys = RTR0MemObjGetPagePhysAddr(pChunk->hMemObj, iPage);
1971 Assert(pPageDesc->HCPhysGCPhys != NIL_RTHCPHYS);
1972 pPageDesc->idPage = (pChunk->Core.Key << GMM_CHUNKID_SHIFT) | iPage;
1973 pPageDesc->idSharedPage = NIL_GMM_PAGEID;
1974}
1975
1976
1977/**
1978 * Picks the free pages from a chunk.
1979 *
1980 * @returns The new page descriptor table index.
1981 * @param   pChunk      The chunk.
1982 * @param   hGVM        The VM handle.
1984 * @param iPage The current page descriptor table index.
1985 * @param cPages The total number of pages to allocate.
1986 * @param   paPages     The page descriptor table (input + output).
1987 */
1988static uint32_t gmmR0AllocatePagesFromChunk(PGMMCHUNK pChunk, uint16_t const hGVM, uint32_t iPage, uint32_t cPages,
1989 PGMMPAGEDESC paPages)
1990{
1991 PGMMCHUNKFREESET pSet = pChunk->pSet; Assert(pSet);
1992 gmmR0UnlinkChunk(pChunk);
1993
1994 for (; pChunk->cFree && iPage < cPages; iPage++)
1995 gmmR0AllocatePage(pChunk, hGVM, &paPages[iPage]);
1996
1997 gmmR0LinkChunk(pChunk, pSet);
1998 return iPage;
1999}
2000
2001
2002/**
2003 * Registers a new chunk of memory.
2004 *
2005 * This is called by gmmR0AllocateChunkNew, GMMR0AllocateLargePage and GMMR0SeedChunk.
2006 *
2007 * @returns VBox status code. On success, the giant GMM lock will be held, the
2008 * caller must release it (ugly).
2009 * @param pGMM Pointer to the GMM instance.
2010 * @param pSet Pointer to the set.
2011 * @param MemObj The memory object for the chunk.
2012 * @param hGVM The affinity of the chunk. NIL_GVM_HANDLE for no
2013 * affinity.
2014 * @param fChunkFlags The chunk flags, GMM_CHUNK_FLAGS_XXX.
2015 * @param ppChunk Chunk address (out). Optional.
2016 *
2017 * @remarks The caller must not own the giant GMM mutex.
2018 * The giant GMM mutex will be acquired and returned acquired in
2019 * the success path. On failure, no locks will be held.
2020 */
2021static int gmmR0RegisterChunk(PGMM pGMM, PGMMCHUNKFREESET pSet, RTR0MEMOBJ MemObj, uint16_t hGVM, uint16_t fChunkFlags,
2022 PGMMCHUNK *ppChunk)
2023{
2024 Assert(pGMM->hMtxOwner != RTThreadNativeSelf());
2025 Assert(hGVM != NIL_GVM_HANDLE || pGMM->fBoundMemoryMode);
2026 Assert(fChunkFlags == 0 || fChunkFlags == GMM_CHUNK_FLAGS_LARGE_PAGE);
2027
2028 int rc;
2029 PGMMCHUNK pChunk = (PGMMCHUNK)RTMemAllocZ(sizeof(*pChunk));
2030 if (pChunk)
2031 {
2032 /*
2033 * Initialize it.
2034 */
2035 pChunk->hMemObj = MemObj;
2036 pChunk->cFree = GMM_CHUNK_NUM_PAGES;
2037 pChunk->hGVM = hGVM;
2038 /*pChunk->iFreeHead = 0;*/
2039 pChunk->idNumaNode = gmmR0GetCurrentNumaNodeId();
2040 pChunk->iChunkMtx = UINT8_MAX;
2041 pChunk->fFlags = fChunkFlags;
2042 for (unsigned iPage = 0; iPage < RT_ELEMENTS(pChunk->aPages) - 1; iPage++)
2043 {
2044 pChunk->aPages[iPage].Free.u2State = GMM_PAGE_STATE_FREE;
2045 pChunk->aPages[iPage].Free.iNext = iPage + 1;
2046 }
2047 pChunk->aPages[RT_ELEMENTS(pChunk->aPages) - 1].Free.u2State = GMM_PAGE_STATE_FREE;
2048 pChunk->aPages[RT_ELEMENTS(pChunk->aPages) - 1].Free.iNext = UINT16_MAX;
2049
2050 /*
2051 * Allocate a Chunk ID and insert it into the tree.
2052 * This has to be done behind the mutex of course.
2053 */
2054 rc = gmmR0MutexAcquire(pGMM);
2055 if (RT_SUCCESS(rc))
2056 {
2057 if (GMM_CHECK_SANITY_UPON_ENTERING(pGMM))
2058 {
2059 pChunk->Core.Key = gmmR0AllocateChunkId(pGMM);
2060 if ( pChunk->Core.Key != NIL_GMM_CHUNKID
2061 && pChunk->Core.Key <= GMM_CHUNKID_LAST
2062 && RTAvlU32Insert(&pGMM->pChunks, &pChunk->Core))
2063 {
2064 pGMM->cChunks++;
2065 RTListAppend(&pGMM->ChunkList, &pChunk->ListNode);
2066 gmmR0LinkChunk(pChunk, pSet);
2067 LogFlow(("gmmR0RegisterChunk: pChunk=%p id=%#x cChunks=%d\n", pChunk, pChunk->Core.Key, pGMM->cChunks));
2068
2069 if (ppChunk)
2070 *ppChunk = pChunk;
2071 GMM_CHECK_SANITY_UPON_LEAVING(pGMM);
2072 return VINF_SUCCESS;
2073 }
2074
2075 /* bail out */
2076 rc = VERR_GMM_CHUNK_INSERT;
2077 }
2078 else
2079 rc = VERR_GMM_IS_NOT_SANE;
2080 gmmR0MutexRelease(pGMM);
2081 }
2082
2083 RTMemFree(pChunk);
2084 }
2085 else
2086 rc = VERR_NO_MEMORY;
2087 return rc;
2088}
2089
2090
2091/**
2092 * Allocates a new chunk, immediately picks the requested pages from it, and adds
2093 * what's remaining to the specified free set.
2094 *
2095 * @note This will leave the giant mutex while allocating the new chunk!
2096 *
2097 * @returns VBox status code.
2098 * @param pGMM Pointer to the GMM instance data.
2099 * @param   pGVM        Pointer to the kernel-only VM instance data.
2100 * @param pSet Pointer to the free set.
2101 * @param cPages The number of pages requested.
2102 * @param paPages The page descriptor table (input + output).
2103 * @param piPage The pointer to the page descriptor table index
2104 * variable. This will be updated.
2105 */
2106static int gmmR0AllocateChunkNew(PGMM pGMM, PGVM pGVM, PGMMCHUNKFREESET pSet, uint32_t cPages,
2107 PGMMPAGEDESC paPages, uint32_t *piPage)
2108{
2109 gmmR0MutexRelease(pGMM);
2110
2111 RTR0MEMOBJ hMemObj;
2112 int rc = RTR0MemObjAllocPhysNC(&hMemObj, GMM_CHUNK_SIZE, NIL_RTHCPHYS);
2113 if (RT_SUCCESS(rc))
2114 {
2115/** @todo Duplicate gmmR0RegisterChunk here so we can avoid chaining up the
2116 * free pages first and then unchaining them right afterwards. Instead
2117 * do as much work as possible without holding the giant lock. */
2118 PGMMCHUNK pChunk;
2119 rc = gmmR0RegisterChunk(pGMM, pSet, hMemObj, pGVM->hSelf, 0 /*fChunkFlags*/, &pChunk);
2120 if (RT_SUCCESS(rc))
2121 {
2122 *piPage = gmmR0AllocatePagesFromChunk(pChunk, pGVM->hSelf, *piPage, cPages, paPages);
2123 return VINF_SUCCESS;
2124 }
2125
2126 /* bail out */
2127 RTR0MemObjFree(hMemObj, false /* fFreeMappings */);
2128 }
2129
2130 int rc2 = gmmR0MutexAcquire(pGMM);
2131 AssertRCReturn(rc2, RT_FAILURE(rc) ? rc : rc2);
2132 return rc;
2133
2134}
2135
2136
2137/**
2138 * As a last resort we'll pick any page we can get.
2139 *
2140 * @returns The new page descriptor table index.
2141 * @param pSet The set to pick from.
2142 * @param pGVM Pointer to the global VM structure.
2143 * @param iPage The current page descriptor table index.
2144 * @param cPages The total number of pages to allocate.
2145 * @param   paPages     The page descriptor table (input + output).
2146 */
2147static uint32_t gmmR0AllocatePagesIndiscriminately(PGMMCHUNKFREESET pSet, PGVM pGVM,
2148 uint32_t iPage, uint32_t cPages, PGMMPAGEDESC paPages)
2149{
2150 unsigned iList = RT_ELEMENTS(pSet->apLists);
2151 while (iList-- > 0)
2152 {
2153 PGMMCHUNK pChunk = pSet->apLists[iList];
2154 while (pChunk)
2155 {
2156 PGMMCHUNK pNext = pChunk->pFreeNext;
2157
2158 iPage = gmmR0AllocatePagesFromChunk(pChunk, pGVM->hSelf, iPage, cPages, paPages);
2159 if (iPage >= cPages)
2160 return iPage;
2161
2162 pChunk = pNext;
2163 }
2164 }
2165 return iPage;
2166}
2167
2168
2169/**
2170 * Pick pages from empty chunks on the same NUMA node.
2171 *
2172 * @returns The new page descriptor table index.
2173 * @param pSet The set to pick from.
2174 * @param pGVM Pointer to the global VM structure.
2175 * @param iPage The current page descriptor table index.
2176 * @param cPages The total number of pages to allocate.
2177 * @param   paPages     The page descriptor table (input + output).
2178 */
2179static uint32_t gmmR0AllocatePagesFromEmptyChunksOnSameNode(PGMMCHUNKFREESET pSet, PGVM pGVM,
2180 uint32_t iPage, uint32_t cPages, PGMMPAGEDESC paPages)
2181{
2182 PGMMCHUNK pChunk = pSet->apLists[GMM_CHUNK_FREE_SET_UNUSED_LIST];
2183 if (pChunk)
2184 {
2185 uint16_t const idNumaNode = gmmR0GetCurrentNumaNodeId();
2186 while (pChunk)
2187 {
2188 PGMMCHUNK pNext = pChunk->pFreeNext;
2189
2190 if (pChunk->idNumaNode == idNumaNode)
2191 {
2192 pChunk->hGVM = pGVM->hSelf;
2193 iPage = gmmR0AllocatePagesFromChunk(pChunk, pGVM->hSelf, iPage, cPages, paPages);
2194 if (iPage >= cPages)
2195 {
2196 pGVM->gmm.s.idLastChunkHint = pChunk->cFree ? pChunk->Core.Key : NIL_GMM_CHUNKID;
2197 return iPage;
2198 }
2199 }
2200
2201 pChunk = pNext;
2202 }
2203 }
2204 return iPage;
2205}
2206
2207
2208/**
2209 * Pick pages from non-empty chunks on the same NUMA node.
2210 *
2211 * @returns The new page descriptor table index.
2212 * @param pSet The set to pick from.
2213 * @param pGVM Pointer to the global VM structure.
2214 * @param iPage The current page descriptor table index.
2215 * @param cPages The total number of pages to allocate.
2216 * @param   paPages     The page descriptor table (input + output).
2217 */
2218static uint32_t gmmR0AllocatePagesFromSameNode(PGMMCHUNKFREESET pSet, PGVM pGVM,
2219 uint32_t iPage, uint32_t cPages, PGMMPAGEDESC paPages)
2220{
2221 /** @todo start by picking from chunks with about the right size first? */
2222 uint16_t const idNumaNode = gmmR0GetCurrentNumaNodeId();
2223 unsigned iList = GMM_CHUNK_FREE_SET_UNUSED_LIST;
2224 while (iList-- > 0)
2225 {
2226 PGMMCHUNK pChunk = pSet->apLists[iList];
2227 while (pChunk)
2228 {
2229 PGMMCHUNK pNext = pChunk->pFreeNext;
2230
2231 if (pChunk->idNumaNode == idNumaNode)
2232 {
2233 iPage = gmmR0AllocatePagesFromChunk(pChunk, pGVM->hSelf, iPage, cPages, paPages);
2234 if (iPage >= cPages)
2235 {
2236 pGVM->gmm.s.idLastChunkHint = pChunk->cFree ? pChunk->Core.Key : NIL_GMM_CHUNKID;
2237 return iPage;
2238 }
2239 }
2240
2241 pChunk = pNext;
2242 }
2243 }
2244 return iPage;
2245}
2246
2247
2248/**
2249 * Pick pages that are in chunks already associated with the VM.
2250 *
2251 * @returns The new page descriptor table index.
2252 * @param pGMM Pointer to the GMM instance data.
2253 * @param pGVM Pointer to the global VM structure.
2254 * @param pSet The set to pick from.
2255 * @param iPage The current page descriptor table index.
2256 * @param cPages The total number of pages to allocate.
2257 * @param   paPages     The page descriptor table (input + output).
2258 */
2259static uint32_t gmmR0AllocatePagesAssociatedWithVM(PGMM pGMM, PGVM pGVM, PGMMCHUNKFREESET pSet,
2260 uint32_t iPage, uint32_t cPages, PGMMPAGEDESC paPages)
2261{
2262 uint16_t const hGVM = pGVM->hSelf;
2263
2264 /* Hint. */
2265 if (pGVM->gmm.s.idLastChunkHint != NIL_GMM_CHUNKID)
2266 {
2267 PGMMCHUNK pChunk = gmmR0GetChunk(pGMM, pGVM->gmm.s.idLastChunkHint);
2268 if (pChunk && pChunk->cFree)
2269 {
2270 iPage = gmmR0AllocatePagesFromChunk(pChunk, hGVM, iPage, cPages, paPages);
2271 if (iPage >= cPages)
2272 return iPage;
2273 }
2274 }
2275
2276 /* Scan. */
2277 for (unsigned iList = 0; iList < RT_ELEMENTS(pSet->apLists); iList++)
2278 {
2279 PGMMCHUNK pChunk = pSet->apLists[iList];
2280 while (pChunk)
2281 {
2282 PGMMCHUNK pNext = pChunk->pFreeNext;
2283
2284 if (pChunk->hGVM == hGVM)
2285 {
2286 iPage = gmmR0AllocatePagesFromChunk(pChunk, hGVM, iPage, cPages, paPages);
2287 if (iPage >= cPages)
2288 {
2289 pGVM->gmm.s.idLastChunkHint = pChunk->cFree ? pChunk->Core.Key : NIL_GMM_CHUNKID;
2290 return iPage;
2291 }
2292 }
2293
2294 pChunk = pNext;
2295 }
2296 }
2297 return iPage;
2298}
2299
2300
2301
2302/**
2303 * Pick pages in bound memory mode.
2304 *
2305 * @returns The new page descriptor table index.
2306 * @param pGVM Pointer to the global VM structure.
2307 * @param iPage The current page descriptor table index.
2308 * @param cPages The total number of pages to allocate.
2309 * @param   paPages     The page descriptor table (input + output).
2310 */
2311static uint32_t gmmR0AllocatePagesInBoundMode(PGVM pGVM, uint32_t iPage, uint32_t cPages, PGMMPAGEDESC paPages)
2312{
2313 for (unsigned iList = 0; iList < RT_ELEMENTS(pGVM->gmm.s.Private.apLists); iList++)
2314 {
2315 PGMMCHUNK pChunk = pGVM->gmm.s.Private.apLists[iList];
2316 while (pChunk)
2317 {
2318 Assert(pChunk->hGVM == pGVM->hSelf);
2319 PGMMCHUNK pNext = pChunk->pFreeNext;
2320 iPage = gmmR0AllocatePagesFromChunk(pChunk, pGVM->hSelf, iPage, cPages, paPages);
2321 if (iPage >= cPages)
2322 return iPage;
2323 pChunk = pNext;
2324 }
2325 }
2326 return iPage;
2327}
2328
2329
2330/**
2331 * Checks if we should start picking pages from chunks of other VMs.
2332 *
2333 * @returns @c true if we should, @c false if we should first try to allocate more
2334 * chunks.
2335 */
2336static bool gmmR0ShouldAllocatePagesInOtherChunks(PGVM pGVM)
2337{
2338 /*
2339     * Don't allocate a new chunk if only a small remainder of the reservation
     * is left to allocate; prefer picking leftovers from other VMs' chunks instead.
2340 */
2341 uint64_t cPgReserved = pGVM->gmm.s.Reserved.cBasePages
2342 + pGVM->gmm.s.Reserved.cFixedPages
2343 - pGVM->gmm.s.cBalloonedPages
2344 /** @todo what about shared pages? */;
2345 uint64_t cPgAllocated = pGVM->gmm.s.Allocated.cBasePages
2346 + pGVM->gmm.s.Allocated.cFixedPages;
2347 uint64_t cPgDelta = cPgReserved - cPgAllocated;
2348 if (cPgDelta < GMM_CHUNK_NUM_PAGES * 4)
2349 return true;
2350 /** @todo make the threshold configurable, also test the code to see if
2351 * this ever kicks in (we might be reserving too much or smth). */
2352
2353 /*
2354 * Check how close we're to the max memory limit and how many fragments
2355 * there are?...
2356 */
2357 /** @todo. */
2358
2359 return false;
2360}
2361
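/*
 * Worked example for the threshold above, assuming 4 KB host pages so that a
 * 2 MB chunk holds 512 pages (GMM_CHUNK_NUM_PAGES): a VM with, say, 1536
 * unallocated pages left in its reservation gives cPgDelta = 1536, which is
 * below the 4 * 512 = 2048 page threshold, so the function returns true and
 * the caller prefers leftovers in other VMs' chunks over allocating a brand
 * new chunk for such a small remainder.
 */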
2362
2363/**
2364 * Common worker for GMMR0AllocateHandyPages and GMMR0AllocatePages.
2365 *
2366 * @returns VBox status code:
2367 * @retval VINF_SUCCESS on success.
2368 * @retval VERR_GMM_SEED_ME if seeding via GMMR0SeedChunk or
2369 * gmmR0AllocateMoreChunks is necessary.
2370 * @retval VERR_GMM_HIT_GLOBAL_LIMIT if we've exhausted the available pages.
2371 * @retval VERR_GMM_HIT_VM_ACCOUNT_LIMIT if we've hit the VM account limit,
2372 * that is we're trying to allocate more than we've reserved.
2373 *
2374 * @param pGMM Pointer to the GMM instance data.
2375 * @param pGVM Pointer to the shared VM structure.
2376 * @param cPages The number of pages to allocate.
2377 * @param paPages Pointer to the page descriptors.
2378 * See GMMPAGEDESC for details on what is expected on input.
2379 * @param enmAccount The account to charge.
2380 *
2381 * @remarks Caller must own the giant GMM lock.
2382 */
2383static int gmmR0AllocatePagesNew(PGMM pGMM, PGVM pGVM, uint32_t cPages, PGMMPAGEDESC paPages, GMMACCOUNT enmAccount)
2384{
2385 Assert(pGMM->hMtxOwner == RTThreadNativeSelf());
2386
2387 /*
2388 * Check allocation limits.
2389 */
2390 if (RT_UNLIKELY(pGMM->cAllocatedPages + cPages > pGMM->cMaxPages))
2391 return VERR_GMM_HIT_GLOBAL_LIMIT;
2392
2393 switch (enmAccount)
2394 {
2395 case GMMACCOUNT_BASE:
2396 if (RT_UNLIKELY( pGVM->gmm.s.Allocated.cBasePages + pGVM->gmm.s.cBalloonedPages + cPages
2397 > pGVM->gmm.s.Reserved.cBasePages))
2398 {
2399 Log(("gmmR0AllocatePages:Base: Reserved=%#llx Allocated+Ballooned+Requested=%#llx+%#llx+%#x!\n",
2400 pGVM->gmm.s.Reserved.cBasePages, pGVM->gmm.s.Allocated.cBasePages, pGVM->gmm.s.cBalloonedPages, cPages));
2401 return VERR_GMM_HIT_VM_ACCOUNT_LIMIT;
2402 }
2403 break;
2404 case GMMACCOUNT_SHADOW:
2405 if (RT_UNLIKELY(pGVM->gmm.s.Allocated.cShadowPages + cPages > pGVM->gmm.s.Reserved.cShadowPages))
2406 {
2407 Log(("gmmR0AllocatePages:Shadow: Reserved=%#x Allocated+Requested=%#x+%#x!\n",
2408 pGVM->gmm.s.Reserved.cShadowPages, pGVM->gmm.s.Allocated.cShadowPages, cPages));
2409 return VERR_GMM_HIT_VM_ACCOUNT_LIMIT;
2410 }
2411 break;
2412 case GMMACCOUNT_FIXED:
2413 if (RT_UNLIKELY(pGVM->gmm.s.Allocated.cFixedPages + cPages > pGVM->gmm.s.Reserved.cFixedPages))
2414 {
2415 Log(("gmmR0AllocatePages:Fixed: Reserved=%#x Allocated+Requested=%#x+%#x!\n",
2416 pGVM->gmm.s.Reserved.cFixedPages, pGVM->gmm.s.Allocated.cFixedPages, cPages));
2417 return VERR_GMM_HIT_VM_ACCOUNT_LIMIT;
2418 }
2419 break;
2420 default:
2421 AssertMsgFailedReturn(("enmAccount=%d\n", enmAccount), VERR_IPE_NOT_REACHED_DEFAULT_CASE);
2422 }
2423
2424 /*
2425 * If we're in legacy memory mode, it's easy to figure if we have
2426 * sufficient number of pages up-front.
2427 */
2428 if ( pGMM->fLegacyAllocationMode
2429 && pGVM->gmm.s.Private.cFreePages < cPages)
2430 {
2431 Assert(pGMM->fBoundMemoryMode);
2432 return VERR_GMM_SEED_ME;
2433 }
2434
2435 /*
2436 * Update the accounts before we proceed because we might be leaving the
2437 * protection of the global mutex and thus run the risk of permitting
2438 * too much memory to be allocated.
2439 */
2440 switch (enmAccount)
2441 {
2442 case GMMACCOUNT_BASE: pGVM->gmm.s.Allocated.cBasePages += cPages; break;
2443 case GMMACCOUNT_SHADOW: pGVM->gmm.s.Allocated.cShadowPages += cPages; break;
2444 case GMMACCOUNT_FIXED: pGVM->gmm.s.Allocated.cFixedPages += cPages; break;
2445 default: AssertMsgFailedReturn(("enmAccount=%d\n", enmAccount), VERR_IPE_NOT_REACHED_DEFAULT_CASE);
2446 }
2447 pGVM->gmm.s.cPrivatePages += cPages;
2448 pGMM->cAllocatedPages += cPages;
2449
2450 /*
2451 * Part two of it's-easy-in-legacy-memory-mode.
2452 */
2453 uint32_t iPage = 0;
2454 if (pGMM->fLegacyAllocationMode)
2455 {
2456 iPage = gmmR0AllocatePagesInBoundMode(pGVM, iPage, cPages, paPages);
2457 AssertReleaseReturn(iPage == cPages, VERR_GMM_ALLOC_PAGES_IPE);
2458 return VINF_SUCCESS;
2459 }
2460
2461 /*
2462 * Bound mode is also relatively straightforward.
2463 */
2464 int rc = VINF_SUCCESS;
2465 if (pGMM->fBoundMemoryMode)
2466 {
2467 iPage = gmmR0AllocatePagesInBoundMode(pGVM, iPage, cPages, paPages);
2468 if (iPage < cPages)
2469 do
2470 rc = gmmR0AllocateChunkNew(pGMM, pGVM, &pGVM->gmm.s.Private, cPages, paPages, &iPage);
2471 while (iPage < cPages && RT_SUCCESS(rc));
2472 }
2473 /*
2474     * Shared mode is trickier as we should try to achieve the same locality as
2475 * in bound mode, but smartly make use of non-full chunks allocated by
2476 * other VMs if we're low on memory.
2477 */
2478 else
2479 {
2480 /* Pick the most optimal pages first. */
2481 iPage = gmmR0AllocatePagesAssociatedWithVM(pGMM, pGVM, &pGMM->PrivateX, iPage, cPages, paPages);
2482 if (iPage < cPages)
2483 {
2484 /* Maybe we should try getting pages from chunks "belonging" to
2485 other VMs before allocating more chunks? */
2486 if (gmmR0ShouldAllocatePagesInOtherChunks(pGVM))
2487 iPage = gmmR0AllocatePagesFromSameNode(&pGMM->PrivateX, pGVM, iPage, cPages, paPages);
2488
2489 /* Allocate memory from empty chunks. */
2490 if (iPage < cPages)
2491 iPage = gmmR0AllocatePagesFromEmptyChunksOnSameNode(&pGMM->PrivateX, pGVM, iPage, cPages, paPages);
2492
2493 /* Grab empty shared chunks. */
2494 if (iPage < cPages)
2495 iPage = gmmR0AllocatePagesFromEmptyChunksOnSameNode(&pGMM->Shared, pGVM, iPage, cPages, paPages);
2496
2497 /*
2498 * Ok, try allocate new chunks.
2499 */
2500 if (iPage < cPages)
2501 {
2502 do
2503 rc = gmmR0AllocateChunkNew(pGMM, pGVM, &pGMM->PrivateX, cPages, paPages, &iPage);
2504 while (iPage < cPages && RT_SUCCESS(rc));
2505
2506 /* If the host is out of memory, take whatever we can get. */
2507 if ( rc == VERR_NO_MEMORY
2508 && pGMM->PrivateX.cFreePages + pGMM->Shared.cFreePages >= cPages - iPage)
2509 {
2510 iPage = gmmR0AllocatePagesIndiscriminately(&pGMM->PrivateX, pGVM, iPage, cPages, paPages);
2511 if (iPage < cPages)
2512 iPage = gmmR0AllocatePagesIndiscriminately(&pGMM->Shared, pGVM, iPage, cPages, paPages);
2513 AssertRelease(iPage == cPages);
2514 rc = VINF_SUCCESS;
2515 }
2516 }
2517 }
2518 }
2519
2520 /*
2521 * Clean up on failure. Since this is bound to be a low-memory condition
2522 * we will give back any empty chunks that might be hanging around.
2523 */
2524 if (RT_FAILURE(rc))
2525 {
2526 /* Update the statistics. */
2527 pGVM->gmm.s.cPrivatePages -= cPages;
2528 pGMM->cAllocatedPages -= cPages - iPage;
2529 switch (enmAccount)
2530 {
2531 case GMMACCOUNT_BASE: pGVM->gmm.s.Allocated.cBasePages -= cPages; break;
2532 case GMMACCOUNT_SHADOW: pGVM->gmm.s.Allocated.cShadowPages -= cPages; break;
2533 case GMMACCOUNT_FIXED: pGVM->gmm.s.Allocated.cFixedPages -= cPages; break;
2534 default: AssertMsgFailedReturn(("enmAccount=%d\n", enmAccount), VERR_IPE_NOT_REACHED_DEFAULT_CASE);
2535 }
2536
2537 /* Release the pages. */
2538 while (iPage-- > 0)
2539 {
2540 uint32_t idPage = paPages[iPage].idPage;
2541 PGMMPAGE pPage = gmmR0GetPage(pGMM, idPage);
2542 if (RT_LIKELY(pPage))
2543 {
2544 Assert(GMM_PAGE_IS_PRIVATE(pPage));
2545 Assert(pPage->Private.hGVM == pGVM->hSelf);
2546 gmmR0FreePrivatePage(pGMM, pGVM, idPage, pPage);
2547 }
2548 else
2549 AssertMsgFailed(("idPage=%#x\n", idPage));
2550
2551 paPages[iPage].idPage = NIL_GMM_PAGEID;
2552 paPages[iPage].idSharedPage = NIL_GMM_PAGEID;
2553 paPages[iPage].HCPhysGCPhys = NIL_RTHCPHYS;
2554 }
2555
2556 /* Free empty chunks. */
2557 /** @todo */
2558
2559 /* return the fail status on failure */
2560 return rc;
2561 }
2562 return VINF_SUCCESS;
2563}
2564
2565
2566/**
2567 * Updates the previous allocations and allocates more pages.
2568 *
2569 * The handy pages are always taken from the 'base' memory account.
2570 * The allocated pages are not cleared and will contain random garbage.
2571 *
2572 * @returns VBox status code:
2573 * @retval VINF_SUCCESS on success.
2574 * @retval VERR_NOT_OWNER if the caller is not an EMT.
2575 * @retval VERR_GMM_PAGE_NOT_FOUND if one of the pages to update wasn't found.
2576 * @retval VERR_GMM_PAGE_NOT_PRIVATE if one of the pages to update wasn't a
2577 * private page.
2578 * @retval VERR_GMM_PAGE_NOT_SHARED if one of the pages to update wasn't a
2579 * shared page.
2580 * @retval VERR_GMM_NOT_PAGE_OWNER if one of the pages to be updated wasn't
2581 * owned by the VM.
2582 * @retval VERR_GMM_SEED_ME if seeding via GMMR0SeedChunk is necessary.
2583 * @retval VERR_GMM_HIT_GLOBAL_LIMIT if we've exhausted the available pages.
2584 * @retval VERR_GMM_HIT_VM_ACCOUNT_LIMIT if we've hit the VM account limit,
2585 * that is we're trying to allocate more than we've reserved.
2586 *
2587 * @param pVM Pointer to the shared VM structure.
2588 * @param idCpu VCPU id
2589 * @param cPagesToUpdate The number of pages to update (starting from the head).
2590 * @param cPagesToAlloc The number of pages to allocate (starting from the head).
2591 * @param paPages The array of page descriptors.
2592 * See GMMPAGEDESC for details on what is expected on input.
2593 * @thread EMT.
2594 */
2595GMMR0DECL(int) GMMR0AllocateHandyPages(PVM pVM, VMCPUID idCpu, uint32_t cPagesToUpdate, uint32_t cPagesToAlloc, PGMMPAGEDESC paPages)
2596{
2597 LogFlow(("GMMR0AllocateHandyPages: pVM=%p cPagesToUpdate=%#x cPagesToAlloc=%#x paPages=%p\n",
2598 pVM, cPagesToUpdate, cPagesToAlloc, paPages));
2599
2600 /*
2601 * Validate, get basics and take the semaphore.
2602 * (This is a relatively busy path, so make predictions where possible.)
2603 */
2604 PGMM pGMM;
2605 GMM_GET_VALID_INSTANCE(pGMM, VERR_GMM_INSTANCE);
2606 PGVM pGVM;
2607 int rc = GVMMR0ByVMAndEMT(pVM, idCpu, &pGVM);
2608 if (RT_FAILURE(rc))
2609 return rc;
2610
2611 AssertPtrReturn(paPages, VERR_INVALID_PARAMETER);
2612 AssertMsgReturn( (cPagesToUpdate && cPagesToUpdate < 1024)
2613 || (cPagesToAlloc && cPagesToAlloc < 1024),
2614 ("cPagesToUpdate=%#x cPagesToAlloc=%#x\n", cPagesToUpdate, cPagesToAlloc),
2615 VERR_INVALID_PARAMETER);
2616
2617 unsigned iPage = 0;
2618 for (; iPage < cPagesToUpdate; iPage++)
2619 {
2620 AssertMsgReturn( ( paPages[iPage].HCPhysGCPhys <= GMM_GCPHYS_LAST
2621 && !(paPages[iPage].HCPhysGCPhys & PAGE_OFFSET_MASK))
2622 || paPages[iPage].HCPhysGCPhys == NIL_RTHCPHYS
2623 || paPages[iPage].HCPhysGCPhys == GMM_GCPHYS_UNSHAREABLE,
2624 ("#%#x: %RHp\n", iPage, paPages[iPage].HCPhysGCPhys),
2625 VERR_INVALID_PARAMETER);
2626 AssertMsgReturn( paPages[iPage].idPage <= GMM_PAGEID_LAST
2627 /*|| paPages[iPage].idPage == NIL_GMM_PAGEID*/,
2628 ("#%#x: %#x\n", iPage, paPages[iPage].idPage), VERR_INVALID_PARAMETER);
2629 AssertMsgReturn( paPages[iPage].idPage <= GMM_PAGEID_LAST
2630 /*|| paPages[iPage].idSharedPage == NIL_GMM_PAGEID*/,
2631 ("#%#x: %#x\n", iPage, paPages[iPage].idSharedPage), VERR_INVALID_PARAMETER);
2632 }
2633
2634 for (; iPage < cPagesToAlloc; iPage++)
2635 {
2636 AssertMsgReturn(paPages[iPage].HCPhysGCPhys == NIL_RTHCPHYS, ("#%#x: %RHp\n", iPage, paPages[iPage].HCPhysGCPhys), VERR_INVALID_PARAMETER);
2637 AssertMsgReturn(paPages[iPage].idPage == NIL_GMM_PAGEID, ("#%#x: %#x\n", iPage, paPages[iPage].idPage), VERR_INVALID_PARAMETER);
2638 AssertMsgReturn(paPages[iPage].idSharedPage == NIL_GMM_PAGEID, ("#%#x: %#x\n", iPage, paPages[iPage].idSharedPage), VERR_INVALID_PARAMETER);
2639 }
2640
2641 gmmR0MutexAcquire(pGMM);
2642 if (GMM_CHECK_SANITY_UPON_ENTERING(pGMM))
2643 {
2644 /* No allocations before the initial reservation has been made! */
2645 if (RT_LIKELY( pGVM->gmm.s.Reserved.cBasePages
2646 && pGVM->gmm.s.Reserved.cFixedPages
2647 && pGVM->gmm.s.Reserved.cShadowPages))
2648 {
2649 /*
2650 * Perform the updates.
2651 * Stop on the first error.
2652 */
2653 for (iPage = 0; iPage < cPagesToUpdate; iPage++)
2654 {
2655 if (paPages[iPage].idPage != NIL_GMM_PAGEID)
2656 {
2657 PGMMPAGE pPage = gmmR0GetPage(pGMM, paPages[iPage].idPage);
2658 if (RT_LIKELY(pPage))
2659 {
2660 if (RT_LIKELY(GMM_PAGE_IS_PRIVATE(pPage)))
2661 {
2662 if (RT_LIKELY(pPage->Private.hGVM == pGVM->hSelf))
2663 {
2664 AssertCompile(NIL_RTHCPHYS > GMM_GCPHYS_LAST && GMM_GCPHYS_UNSHAREABLE > GMM_GCPHYS_LAST);
2665 if (RT_LIKELY(paPages[iPage].HCPhysGCPhys <= GMM_GCPHYS_LAST))
2666 pPage->Private.pfn = paPages[iPage].HCPhysGCPhys >> PAGE_SHIFT;
2667 else if (paPages[iPage].HCPhysGCPhys == GMM_GCPHYS_UNSHAREABLE)
2668 pPage->Private.pfn = GMM_PAGE_PFN_UNSHAREABLE;
2669 /* else: NIL_RTHCPHYS nothing */
2670
2671 paPages[iPage].idPage = NIL_GMM_PAGEID;
2672 paPages[iPage].HCPhysGCPhys = NIL_RTHCPHYS;
2673 }
2674 else
2675 {
2676 Log(("GMMR0AllocateHandyPages: #%#x/%#x: Not owner! hGVM=%#x hSelf=%#x\n",
2677 iPage, paPages[iPage].idPage, pPage->Private.hGVM, pGVM->hSelf));
2678 rc = VERR_GMM_NOT_PAGE_OWNER;
2679 break;
2680 }
2681 }
2682 else
2683 {
2684 Log(("GMMR0AllocateHandyPages: #%#x/%#x: Not private! %.*Rhxs (type %d)\n", iPage, paPages[iPage].idPage, sizeof(*pPage), pPage, pPage->Common.u2State));
2685 rc = VERR_GMM_PAGE_NOT_PRIVATE;
2686 break;
2687 }
2688 }
2689 else
2690 {
2691 Log(("GMMR0AllocateHandyPages: #%#x/%#x: Not found! (private)\n", iPage, paPages[iPage].idPage));
2692 rc = VERR_GMM_PAGE_NOT_FOUND;
2693 break;
2694 }
2695 }
2696
2697 if (paPages[iPage].idSharedPage != NIL_GMM_PAGEID)
2698 {
2699 PGMMPAGE pPage = gmmR0GetPage(pGMM, paPages[iPage].idSharedPage);
2700 if (RT_LIKELY(pPage))
2701 {
2702 if (RT_LIKELY(GMM_PAGE_IS_SHARED(pPage)))
2703 {
2704 AssertCompile(NIL_RTHCPHYS > GMM_GCPHYS_LAST && GMM_GCPHYS_UNSHAREABLE > GMM_GCPHYS_LAST);
2705 Assert(pPage->Shared.cRefs);
2706 Assert(pGVM->gmm.s.cSharedPages);
2707 Assert(pGVM->gmm.s.Allocated.cBasePages);
2708
2709 Log(("GMMR0AllocateHandyPages: free shared page %x cRefs=%d\n", paPages[iPage].idSharedPage, pPage->Shared.cRefs));
2710 pGVM->gmm.s.cSharedPages--;
2711 pGVM->gmm.s.Allocated.cBasePages--;
2712 if (!--pPage->Shared.cRefs)
2713 gmmR0FreeSharedPage(pGMM, pGVM, paPages[iPage].idSharedPage, pPage);
2714 else
2715 {
2716 Assert(pGMM->cDuplicatePages);
2717 pGMM->cDuplicatePages--;
2718 }
2719
2720 paPages[iPage].idSharedPage = NIL_GMM_PAGEID;
2721 }
2722 else
2723 {
2724 Log(("GMMR0AllocateHandyPages: #%#x/%#x: Not shared!\n", iPage, paPages[iPage].idSharedPage));
2725 rc = VERR_GMM_PAGE_NOT_SHARED;
2726 break;
2727 }
2728 }
2729 else
2730 {
2731 Log(("GMMR0AllocateHandyPages: #%#x/%#x: Not found! (shared)\n", iPage, paPages[iPage].idSharedPage));
2732 rc = VERR_GMM_PAGE_NOT_FOUND;
2733 break;
2734 }
2735 }
2736 } /* for each page to update */
2737
2738 if (RT_SUCCESS(rc))
2739 {
2740#if defined(VBOX_STRICT) && 0 /** @todo re-test this later. Appeared to be a PGM init bug. */
2741 for (iPage = 0; iPage < cPagesToAlloc; iPage++)
2742 {
2743 Assert(paPages[iPage].HCPhysGCPhys == NIL_RTHCPHYS);
2744 Assert(paPages[iPage].idPage == NIL_GMM_PAGEID);
2745 Assert(paPages[iPage].idSharedPage == NIL_GMM_PAGEID);
2746 }
2747#endif
2748
2749 /*
2750 * Join paths with GMMR0AllocatePages for the allocation.
2751             * Note! gmmR0AllocatePagesNew may leave the protection of the mutex!
2752 */
2753 rc = gmmR0AllocatePagesNew(pGMM, pGVM, cPagesToAlloc, paPages, GMMACCOUNT_BASE);
2754 }
2755 }
2756 else
2757 rc = VERR_WRONG_ORDER;
2758 GMM_CHECK_SANITY_UPON_LEAVING(pGMM);
2759 }
2760 else
2761 rc = VERR_GMM_IS_NOT_SANE;
2762 gmmR0MutexRelease(pGMM);
2763 LogFlow(("GMMR0AllocateHandyPages: returns %Rrc\n", rc));
2764 return rc;
2765}
2766
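/*
 * Example: a pure allocation call into GMMR0AllocateHandyPages.
 *
 * A minimal sketch, assuming the caller only wants fresh pages (no updates);
 * the array size is illustrative.  Each descriptor must be initialized to the
 * NIL markers checked above, and on success idPage and HCPhysGCPhys of every
 * entry are filled in:
 *
 *      GMMPAGEDESC aPages[32];
 *      for (uint32_t i = 0; i < RT_ELEMENTS(aPages); i++)
 *      {
 *          aPages[i].HCPhysGCPhys = NIL_RTHCPHYS;
 *          aPages[i].idPage       = NIL_GMM_PAGEID;
 *          aPages[i].idSharedPage = NIL_GMM_PAGEID;
 *      }
 *      // cPagesToUpdate = 0, cPagesToAlloc = RT_ELEMENTS(aPages):
 *      int rc = GMMR0AllocateHandyPages(pVM, idCpu, 0, RT_ELEMENTS(aPages), &aPages[0]);
 */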
2767
2768/**
2769 * Allocate one or more pages.
2770 *
2771 * This is typically used for ROMs and MMIO2 (VRAM) during VM creation.
2772 * The allocated pages are not cleared and will contain random garbage.
2773 *
2774 * @returns VBox status code:
2775 * @retval VINF_SUCCESS on success.
2776 * @retval VERR_NOT_OWNER if the caller is not an EMT.
2777 * @retval VERR_GMM_SEED_ME if seeding via GMMR0SeedChunk is necessary.
2778 * @retval VERR_GMM_HIT_GLOBAL_LIMIT if we've exhausted the available pages.
2779 * @retval VERR_GMM_HIT_VM_ACCOUNT_LIMIT if we've hit the VM account limit,
2780 * that is we're trying to allocate more than we've reserved.
2781 *
2782 * @param pVM Pointer to the shared VM structure.
2783 * @param idCpu VCPU id
2784 * @param cPages The number of pages to allocate.
2785 * @param paPages Pointer to the page descriptors.
2786 * See GMMPAGEDESC for details on what is expected on input.
2787 * @param enmAccount The account to charge.
2788 *
2789 * @thread EMT.
2790 */
2791GMMR0DECL(int) GMMR0AllocatePages(PVM pVM, VMCPUID idCpu, uint32_t cPages, PGMMPAGEDESC paPages, GMMACCOUNT enmAccount)
2792{
2793 LogFlow(("GMMR0AllocatePages: pVM=%p cPages=%#x paPages=%p enmAccount=%d\n", pVM, cPages, paPages, enmAccount));
2794
2795 /*
2796 * Validate, get basics and take the semaphore.
2797 */
2798 PGMM pGMM;
2799 GMM_GET_VALID_INSTANCE(pGMM, VERR_GMM_INSTANCE);
2800 PGVM pGVM;
2801 int rc = GVMMR0ByVMAndEMT(pVM, idCpu, &pGVM);
2802 if (RT_FAILURE(rc))
2803 return rc;
2804
2805 AssertPtrReturn(paPages, VERR_INVALID_PARAMETER);
2806 AssertMsgReturn(enmAccount > GMMACCOUNT_INVALID && enmAccount < GMMACCOUNT_END, ("%d\n", enmAccount), VERR_INVALID_PARAMETER);
2807 AssertMsgReturn(cPages > 0 && cPages < RT_BIT(32 - PAGE_SHIFT), ("%#x\n", cPages), VERR_INVALID_PARAMETER);
2808
2809 for (unsigned iPage = 0; iPage < cPages; iPage++)
2810 {
2811 AssertMsgReturn( paPages[iPage].HCPhysGCPhys == NIL_RTHCPHYS
2812 || paPages[iPage].HCPhysGCPhys == GMM_GCPHYS_UNSHAREABLE
2813 || ( enmAccount == GMMACCOUNT_BASE
2814 && paPages[iPage].HCPhysGCPhys <= GMM_GCPHYS_LAST
2815 && !(paPages[iPage].HCPhysGCPhys & PAGE_OFFSET_MASK)),
2816 ("#%#x: %RHp enmAccount=%d\n", iPage, paPages[iPage].HCPhysGCPhys, enmAccount),
2817 VERR_INVALID_PARAMETER);
2818 AssertMsgReturn(paPages[iPage].idPage == NIL_GMM_PAGEID, ("#%#x: %#x\n", iPage, paPages[iPage].idPage), VERR_INVALID_PARAMETER);
2819 AssertMsgReturn(paPages[iPage].idSharedPage == NIL_GMM_PAGEID, ("#%#x: %#x\n", iPage, paPages[iPage].idSharedPage), VERR_INVALID_PARAMETER);
2820 }
2821
2822 gmmR0MutexAcquire(pGMM);
2823 if (GMM_CHECK_SANITY_UPON_ENTERING(pGMM))
2824 {
2825
2826 /* No allocations before the initial reservation has been made! */
2827 if (RT_LIKELY( pGVM->gmm.s.Reserved.cBasePages
2828 && pGVM->gmm.s.Reserved.cFixedPages
2829 && pGVM->gmm.s.Reserved.cShadowPages))
2830 rc = gmmR0AllocatePagesNew(pGMM, pGVM, cPages, paPages, enmAccount);
2831 else
2832 rc = VERR_WRONG_ORDER;
2833 GMM_CHECK_SANITY_UPON_LEAVING(pGMM);
2834 }
2835 else
2836 rc = VERR_GMM_IS_NOT_SANE;
2837 gmmR0MutexRelease(pGMM);
2838 LogFlow(("GMMR0AllocatePages: returns %Rrc\n", rc));
2839 return rc;
2840}
2841
2842
2843/**
2844 * VMMR0 request wrapper for GMMR0AllocatePages.
2845 *
2846 * @returns see GMMR0AllocatePages.
2847 * @param pVM Pointer to the shared VM structure.
2848 * @param idCpu VCPU id
2849 * @param pReq The request packet.
2850 */
2851GMMR0DECL(int) GMMR0AllocatePagesReq(PVM pVM, VMCPUID idCpu, PGMMALLOCATEPAGESREQ pReq)
2852{
2853 /*
2854 * Validate input and pass it on.
2855 */
2856 AssertPtrReturn(pVM, VERR_INVALID_POINTER);
2857 AssertPtrReturn(pReq, VERR_INVALID_POINTER);
2858 AssertMsgReturn(pReq->Hdr.cbReq >= RT_UOFFSETOF(GMMALLOCATEPAGESREQ, aPages[0]),
2859 ("%#x < %#x\n", pReq->Hdr.cbReq, RT_UOFFSETOF(GMMALLOCATEPAGESREQ, aPages[0])),
2860 VERR_INVALID_PARAMETER);
2861 AssertMsgReturn(pReq->Hdr.cbReq == RT_UOFFSETOF(GMMALLOCATEPAGESREQ, aPages[pReq->cPages]),
2862 ("%#x != %#x\n", pReq->Hdr.cbReq, RT_UOFFSETOF(GMMALLOCATEPAGESREQ, aPages[pReq->cPages])),
2863 VERR_INVALID_PARAMETER);
2864
2865 return GMMR0AllocatePages(pVM, idCpu, pReq->cPages, &pReq->aPages[0], pReq->enmAccount);
2866}
2867
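/*
 * Example: sizing and filling the request packet consumed by the wrapper
 * above.
 *
 * A minimal sketch; the page count and account are illustrative, and other
 * header fields required by the caller's VMMR0 request path (beyond cbReq,
 * which is what this wrapper validates) are not shown:
 *
 *      uint32_t const       cPages = 16;
 *      uint32_t const       cbReq  = RT_UOFFSETOF(GMMALLOCATEPAGESREQ, aPages[cPages]);
 *      PGMMALLOCATEPAGESREQ pReq   = (PGMMALLOCATEPAGESREQ)RTMemAllocZ(cbReq);
 *      pReq->Hdr.cbReq  = cbReq;
 *      pReq->cPages     = cPages;
 *      pReq->enmAccount = GMMACCOUNT_BASE;
 *      for (uint32_t i = 0; i < cPages; i++)
 *      {
 *          pReq->aPages[i].HCPhysGCPhys = NIL_RTHCPHYS;
 *          pReq->aPages[i].idPage       = NIL_GMM_PAGEID;
 *          pReq->aPages[i].idSharedPage = NIL_GMM_PAGEID;
 *      }
 */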
2868
2869/**
2870 * Allocate a large page to represent guest RAM.
2871 *
2872 * The allocated pages are not cleared and will contain random garbage.
2873 *
2874 * @returns VBox status code:
2875 * @retval VINF_SUCCESS on success.
2876 * @retval VERR_NOT_OWNER if the caller is not an EMT.
2877 * @retval VERR_GMM_SEED_ME if seeding via GMMR0SeedChunk is necessary.
2878 * @retval VERR_GMM_HIT_GLOBAL_LIMIT if we've exhausted the available pages.
2879 * @retval VERR_GMM_HIT_VM_ACCOUNT_LIMIT if we've hit the VM account limit,
2880 * that is we're trying to allocate more than we've reserved.
2882 * @param   pVM         Pointer to the shared VM structure.
2883 * @param   idCpu       VCPU id
2884 * @param   cbPage      Large page size, must be GMM_CHUNK_SIZE.
 * @param   pIdPage     Where to return the GMM page ID of the large page.
 * @param   pHCPhys     Where to return the host physical address of the large page.
2885 */
2886GMMR0DECL(int) GMMR0AllocateLargePage(PVM pVM, VMCPUID idCpu, uint32_t cbPage, uint32_t *pIdPage, RTHCPHYS *pHCPhys)
2887{
2888 LogFlow(("GMMR0AllocateLargePage: pVM=%p cbPage=%x\n", pVM, cbPage));
2889
2890 AssertReturn(cbPage == GMM_CHUNK_SIZE, VERR_INVALID_PARAMETER);
2891 AssertPtrReturn(pIdPage, VERR_INVALID_PARAMETER);
2892 AssertPtrReturn(pHCPhys, VERR_INVALID_PARAMETER);
2893
2894 /*
2895 * Validate, get basics and take the semaphore.
2896 */
2897 PGMM pGMM;
2898 GMM_GET_VALID_INSTANCE(pGMM, VERR_GMM_INSTANCE);
2899 PGVM pGVM;
2900 int rc = GVMMR0ByVMAndEMT(pVM, idCpu, &pGVM);
2901 if (RT_FAILURE(rc))
2902 return rc;
2903
2904 /* Not supported in legacy mode where we allocate the memory in ring 3 and lock it in ring 0. */
2905 if (pGMM->fLegacyAllocationMode)
2906 return VERR_NOT_SUPPORTED;
2907
2908 *pHCPhys = NIL_RTHCPHYS;
2909 *pIdPage = NIL_GMM_PAGEID;
2910
2911 gmmR0MutexAcquire(pGMM);
2912 if (GMM_CHECK_SANITY_UPON_ENTERING(pGMM))
2913 {
2914 const unsigned cPages = (GMM_CHUNK_SIZE >> PAGE_SHIFT);
2915 if (RT_UNLIKELY( pGVM->gmm.s.Allocated.cBasePages + pGVM->gmm.s.cBalloonedPages + cPages
2916 > pGVM->gmm.s.Reserved.cBasePages))
2917 {
2918 Log(("GMMR0AllocateLargePage: Reserved=%#llx Allocated+Requested=%#llx+%#x!\n",
2919 pGVM->gmm.s.Reserved.cBasePages, pGVM->gmm.s.Allocated.cBasePages, cPages));
2920 gmmR0MutexRelease(pGMM);
2921 return VERR_GMM_HIT_VM_ACCOUNT_LIMIT;
2922 }
2923
2924 /*
2925 * Allocate a new large page chunk.
2926 *
2927 * Note! We leave the giant GMM lock temporarily as the allocation might
2928 * take a long time. gmmR0RegisterChunk will retake it (ugly).
2929 */
2930 AssertCompile(GMM_CHUNK_SIZE == _2M);
2931 gmmR0MutexRelease(pGMM);
2932
2933 RTR0MEMOBJ hMemObj;
2934 rc = RTR0MemObjAllocPhysEx(&hMemObj, GMM_CHUNK_SIZE, NIL_RTHCPHYS, GMM_CHUNK_SIZE);
2935 if (RT_SUCCESS(rc))
2936 {
2937 PGMMCHUNKFREESET pSet = pGMM->fBoundMemoryMode ? &pGVM->gmm.s.Private : &pGMM->PrivateX;
2938 PGMMCHUNK pChunk;
2939 rc = gmmR0RegisterChunk(pGMM, pSet, hMemObj, pGVM->hSelf, GMM_CHUNK_FLAGS_LARGE_PAGE, &pChunk);
2940 if (RT_SUCCESS(rc))
2941 {
2942 /*
2943 * Allocate all the pages in the chunk.
2944 */
2945 /* Unlink the new chunk from the free list. */
2946 gmmR0UnlinkChunk(pChunk);
2947
2948 /** @todo rewrite this to skip the looping. */
2949 /* Allocate all pages. */
2950 GMMPAGEDESC PageDesc;
2951 gmmR0AllocatePage(pChunk, pGVM->hSelf, &PageDesc);
2952
2953 /* Return the first page as we'll use the whole chunk as one big page. */
2954 *pIdPage = PageDesc.idPage;
2955 *pHCPhys = PageDesc.HCPhysGCPhys;
2956
2957 for (unsigned i = 1; i < cPages; i++)
2958 gmmR0AllocatePage(pChunk, pGVM->hSelf, &PageDesc);
2959
2960 /* Update accounting. */
2961 pGVM->gmm.s.Allocated.cBasePages += cPages;
2962 pGVM->gmm.s.cPrivatePages += cPages;
2963 pGMM->cAllocatedPages += cPages;
2964
2965 gmmR0LinkChunk(pChunk, pSet);
2966 gmmR0MutexRelease(pGMM);
2967 }
2968 else
2969 RTR0MemObjFree(hMemObj, false /* fFreeMappings */);
2970 }
2971 }
2972 else
2973 {
2974 gmmR0MutexRelease(pGMM);
2975 rc = VERR_GMM_IS_NOT_SANE;
2976 }
2977
2978 LogFlow(("GMMR0AllocateLargePage: returns %Rrc\n", rc));
2979 return rc;
2980}
2981
2982
2983/**
2984 * Free a large page
2985 *
2986 * @returns VBox status code:
2987 * @param pVM Pointer to the shared VM structure.
2988 * @param idCpu VCPU id
2989 * @param idPage Large page id
2990 */
2991GMMR0DECL(int) GMMR0FreeLargePage(PVM pVM, VMCPUID idCpu, uint32_t idPage)
2992{
2993 LogFlow(("GMMR0FreeLargePage: pVM=%p idPage=%x\n", pVM, idPage));
2994
2995 /*
2996 * Validate, get basics and take the semaphore.
2997 */
2998 PGMM pGMM;
2999 GMM_GET_VALID_INSTANCE(pGMM, VERR_GMM_INSTANCE);
3000 PGVM pGVM;
3001 int rc = GVMMR0ByVMAndEMT(pVM, idCpu, &pGVM);
3002 if (RT_FAILURE(rc))
3003 return rc;
3004
3005 /* Not supported in legacy mode where we allocate the memory in ring 3 and lock it in ring 0. */
3006 if (pGMM->fLegacyAllocationMode)
3007 return VERR_NOT_SUPPORTED;
3008
3009 gmmR0MutexAcquire(pGMM);
3010 if (GMM_CHECK_SANITY_UPON_ENTERING(pGMM))
3011 {
3012 const unsigned cPages = (GMM_CHUNK_SIZE >> PAGE_SHIFT);
3013
3014 if (RT_UNLIKELY(pGVM->gmm.s.Allocated.cBasePages < cPages))
3015 {
3016 Log(("GMMR0FreeLargePage: allocated=%#llx cPages=%#x!\n", pGVM->gmm.s.Allocated.cBasePages, cPages));
3017 gmmR0MutexRelease(pGMM);
3018 return VERR_GMM_ATTEMPT_TO_FREE_TOO_MUCH;
3019 }
3020
3021 PGMMPAGE pPage = gmmR0GetPage(pGMM, idPage);
3022 if (RT_LIKELY( pPage
3023 && GMM_PAGE_IS_PRIVATE(pPage)))
3024 {
3025 PGMMCHUNK pChunk = gmmR0GetChunk(pGMM, idPage >> GMM_CHUNKID_SHIFT);
3026 Assert(pChunk);
3027 Assert(pChunk->cFree < GMM_CHUNK_NUM_PAGES);
3028 Assert(pChunk->cPrivate > 0);
3029
3030 /* Release the memory immediately. */
3031 gmmR0FreeChunk(pGMM, NULL, pChunk, false /*fRelaxedSem*/); /** @todo this can be relaxed too! */
3032
3033 /* Update accounting. */
3034 pGVM->gmm.s.Allocated.cBasePages -= cPages;
3035 pGVM->gmm.s.cPrivatePages -= cPages;
3036 pGMM->cAllocatedPages -= cPages;
3037 }
3038 else
3039 rc = VERR_GMM_PAGE_NOT_FOUND;
3040 }
3041 else
3042 rc = VERR_GMM_IS_NOT_SANE;
3043
3044 gmmR0MutexRelease(pGMM);
3045 LogFlow(("GMMR0FreeLargePage: returns %Rrc\n", rc));
3046 return rc;
3047}
3048
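/*
 * Example: pairing GMMR0AllocateLargePage with GMMR0FreeLargePage.
 *
 * A minimal sketch with error handling reduced to the status check; note that
 * cbPage must be exactly GMM_CHUNK_SIZE (2 MB) or the allocation fails with
 * VERR_INVALID_PARAMETER, and that neither call is available in legacy
 * allocation mode:
 *
 *      uint32_t idPage = NIL_GMM_PAGEID;
 *      RTHCPHYS HCPhys = NIL_RTHCPHYS;
 *      int rc = GMMR0AllocateLargePage(pVM, idCpu, GMM_CHUNK_SIZE, &idPage, &HCPhys);
 *      if (RT_SUCCESS(rc))
 *      {
 *          // ... hand the 2 MB page at HCPhys / idPage to the caller ...
 *          rc = GMMR0FreeLargePage(pVM, idCpu, idPage);
 *      }
 */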
3049
3050/**
3051 * VMMR0 request wrapper for GMMR0FreeLargePage.
3052 *
3053 * @returns see GMMR0FreeLargePage.
3054 * @param pVM Pointer to the shared VM structure.
3055 * @param idCpu VCPU id
3056 * @param pReq The request packet.
3057 */
3058GMMR0DECL(int) GMMR0FreeLargePageReq(PVM pVM, VMCPUID idCpu, PGMMFREELARGEPAGEREQ pReq)
3059{
3060 /*
3061 * Validate input and pass it on.
3062 */
3063 AssertPtrReturn(pVM, VERR_INVALID_POINTER);
3064 AssertPtrReturn(pReq, VERR_INVALID_POINTER);
3065    AssertMsgReturn(pReq->Hdr.cbReq == sizeof(GMMFREELARGEPAGEREQ),
3066                    ("%#x != %#x\n", pReq->Hdr.cbReq, sizeof(GMMFREELARGEPAGEREQ)),
3067 VERR_INVALID_PARAMETER);
3068
3069 return GMMR0FreeLargePage(pVM, idCpu, pReq->idPage);
3070}
3071
3072
3073/**
3074 * Frees a chunk, giving it back to the host OS.
3075 *
 * @returns @c true if the giant GMM lock was temporarily released while
 *          freeing the chunk (i.e. @a fRelaxedSem was set and the chunk was
 *          actually freed), @c false otherwise.
 *
3076 * @param pGMM Pointer to the GMM instance.
3077 * @param pGVM This is set when called from GMMR0CleanupVM so we can
3078 * unmap and free the chunk in one go.
3079 * @param pChunk The chunk to free.
3080 * @param fRelaxedSem Whether we can release the semaphore while doing the
3081 * freeing (@c true) or not.
3082 */
3083static bool gmmR0FreeChunk(PGMM pGMM, PGVM pGVM, PGMMCHUNK pChunk, bool fRelaxedSem)
3084{
3085 Assert(pChunk->Core.Key != NIL_GMM_CHUNKID);
3086
3087 GMMR0CHUNKMTXSTATE MtxState;
3088 gmmR0ChunkMutexAcquire(&MtxState, pGMM, pChunk, GMMR0CHUNK_MTX_KEEP_GIANT);
3089
3090 /*
3091     * Cleanup hack! Unmap the chunk from the caller's address space.
3092 * This shouldn't happen, so screw lock contention...
3093 */
3094 if ( pChunk->cMappingsX
3095 && !pGMM->fLegacyAllocationMode
3096 && pGVM)
3097 gmmR0UnmapChunkLocked(pGMM, pGVM, pChunk);
3098
3099 /*
3100 * If there are current mappings of the chunk, then request the
3101 * VMs to unmap them. Reposition the chunk in the free list so
3102 * it won't be a likely candidate for allocations.
3103 */
3104 if (pChunk->cMappingsX)
3105 {
3106 /** @todo R0 -> VM request */
3107 /* The chunk can be mapped by more than one VM if fBoundMemoryMode is false! */
3108        Log(("gmmR0FreeChunk: chunk still has %d mappings; don't free!\n", pChunk->cMappingsX));
3109 gmmR0ChunkMutexRelease(&MtxState, pChunk);
3110 return false;
3111 }
3112
3113
3114 /*
3115 * Save and trash the handle.
3116 */
3117 RTR0MEMOBJ const hMemObj = pChunk->hMemObj;
3118 pChunk->hMemObj = NIL_RTR0MEMOBJ;
3119
3120 /*
3121 * Unlink it from everywhere.
3122 */
3123 gmmR0UnlinkChunk(pChunk);
3124
3125 RTListNodeRemove(&pChunk->ListNode);
3126
3127 PAVLU32NODECORE pCore = RTAvlU32Remove(&pGMM->pChunks, pChunk->Core.Key);
3128 Assert(pCore == &pChunk->Core); NOREF(pCore);
3129
3130 PGMMCHUNKTLBE pTlbe = &pGMM->ChunkTLB.aEntries[GMM_CHUNKTLB_IDX(pChunk->Core.Key)];
3131 if (pTlbe->pChunk == pChunk)
3132 {
3133 pTlbe->idChunk = NIL_GMM_CHUNKID;
3134 pTlbe->pChunk = NULL;
3135 }
3136
3137 Assert(pGMM->cChunks > 0);
3138 pGMM->cChunks--;
3139
3140 /*
3141 * Free the Chunk ID before dropping the locks and freeing the rest.
3142 */
3143 gmmR0FreeChunkId(pGMM, pChunk->Core.Key);
3144 pChunk->Core.Key = NIL_GMM_CHUNKID;
3145
3146 pGMM->cFreedChunks++;
3147
3148 gmmR0ChunkMutexRelease(&MtxState, NULL);
3149 if (fRelaxedSem)
3150 gmmR0MutexRelease(pGMM);
3151
3152 RTMemFree(pChunk->paMappingsX);
3153 pChunk->paMappingsX = NULL;
3154
3155 RTMemFree(pChunk);
3156
3157 int rc = RTR0MemObjFree(hMemObj, false /* fFreeMappings */);
3158 AssertLogRelRC(rc);
3159
3160 if (fRelaxedSem)
3161 gmmR0MutexAcquire(pGMM);
3162 return fRelaxedSem;
3163}
3164
3165
3166/**
3167 * Free page worker.
3168 *
3169 * The caller does all the statistic decrementing, we do all the incrementing.
3170 *
3171 * @param pGMM Pointer to the GMM instance data.
3172 * @param pGVM Pointer to the GVM instance.
3173 * @param pChunk Pointer to the chunk this page belongs to.
3174 * @param idPage The Page ID.
3175 * @param pPage Pointer to the page.
3176 */
3177static void gmmR0FreePageWorker(PGMM pGMM, PGVM pGVM, PGMMCHUNK pChunk, uint32_t idPage, PGMMPAGE pPage)
3178{
3179 Log3(("F pPage=%p iPage=%#x/%#x u2State=%d iFreeHead=%#x\n",
3180 pPage, pPage - &pChunk->aPages[0], idPage, pPage->Common.u2State, pChunk->iFreeHead)); NOREF(idPage);
3181
3182 /*
3183 * Put the page on the free list.
3184 */
3185 pPage->u = 0;
3186 pPage->Free.u2State = GMM_PAGE_STATE_FREE;
3187 Assert(pChunk->iFreeHead < RT_ELEMENTS(pChunk->aPages) || pChunk->iFreeHead == UINT16_MAX);
3188 pPage->Free.iNext = pChunk->iFreeHead;
3189 pChunk->iFreeHead = pPage - &pChunk->aPages[0];
3190
3191 /*
3192 * Update statistics (the cShared/cPrivate stats are up to date already),
3193 * and relink the chunk if necessary.
3194 */
3195 unsigned const cFree = pChunk->cFree;
3196 if ( !cFree
3197 || gmmR0SelectFreeSetList(cFree) != gmmR0SelectFreeSetList(cFree + 1))
3198 {
3199 gmmR0UnlinkChunk(pChunk);
3200 pChunk->cFree++;
3201 gmmR0SelectSetAndLinkChunk(pGMM, pGVM, pChunk);
3202 }
3203 else
3204 {
3205 pChunk->cFree = cFree + 1;
3206 pChunk->pSet->cFreePages++;
3207 }
3208
3209 /*
3210 * If the chunk becomes empty, consider giving memory back to the host OS.
3211 *
3212 * The current strategy is to try give it back if there are other chunks
3213 * in this free list, meaning if there are at least 240 free pages in this
3214 * category. Note that since there are probably mappings of the chunk,
3215 * it won't be freed up instantly, which probably screws up this logic
3216 * a bit...
3217 */
3218 /** @todo Do this on the way out. */
3219 if (RT_UNLIKELY( pChunk->cFree == GMM_CHUNK_NUM_PAGES
3220 && pChunk->pFreeNext
3221 && pChunk->pFreePrev /** @todo this is probably misfiring, see reset... */
3222 && !pGMM->fLegacyAllocationMode))
3223 gmmR0FreeChunk(pGMM, NULL, pChunk, false);
3224
3225}
3226
3227
3228/**
3229 * Frees a shared page, the page is known to exist and be valid and such.
3230 *
3231 * @param pGMM Pointer to the GMM instance.
3232 * @param pGVM Pointer to the GVM instance.
3233 * @param idPage The Page ID
3234 * @param pPage The page structure.
3235 */
3236DECLINLINE(void) gmmR0FreeSharedPage(PGMM pGMM, PGVM pGVM, uint32_t idPage, PGMMPAGE pPage)
3237{
3238 PGMMCHUNK pChunk = gmmR0GetChunk(pGMM, idPage >> GMM_CHUNKID_SHIFT);
3239 Assert(pChunk);
3240 Assert(pChunk->cFree < GMM_CHUNK_NUM_PAGES);
3241 Assert(pChunk->cShared > 0);
3242 Assert(pGMM->cSharedPages > 0);
3243 Assert(pGMM->cAllocatedPages > 0);
3244 Assert(!pPage->Shared.cRefs);
3245
3246 pChunk->cShared--;
3247 pGMM->cAllocatedPages--;
3248 pGMM->cSharedPages--;
3249 gmmR0FreePageWorker(pGMM, pGVM, pChunk, idPage, pPage);
3250}
3251
3252
3253/**
3254 * Frees a private page, the page is known to exist and be valid and such.
3255 *
3256 * @param pGMM Pointer to the GMM instance.
3257 * @param pGVM Pointer to the GVM instance.
3258 * @param idPage The Page ID
3259 * @param pPage The page structure.
3260 */
3261DECLINLINE(void) gmmR0FreePrivatePage(PGMM pGMM, PGVM pGVM, uint32_t idPage, PGMMPAGE pPage)
3262{
3263 PGMMCHUNK pChunk = gmmR0GetChunk(pGMM, idPage >> GMM_CHUNKID_SHIFT);
3264 Assert(pChunk);
3265 Assert(pChunk->cFree < GMM_CHUNK_NUM_PAGES);
3266 Assert(pChunk->cPrivate > 0);
3267 Assert(pGMM->cAllocatedPages > 0);
3268
3269 pChunk->cPrivate--;
3270 pGMM->cAllocatedPages--;
3271 gmmR0FreePageWorker(pGMM, pGVM, pChunk, idPage, pPage);
3272}
3273
3274
3275/**
3276 * Common worker for GMMR0FreePages and GMMR0BalloonedPages.
3277 *
3278 * @returns VBox status code:
3279 * @retval xxx
3280 *
3281 * @param pGMM Pointer to the GMM instance data.
3282 * @param pGVM Pointer to the shared VM structure.
3283 * @param cPages The number of pages to free.
3284 * @param paPages Pointer to the page descriptors.
3285 * @param enmAccount The account this relates to.
3286 */
3287static int gmmR0FreePages(PGMM pGMM, PGVM pGVM, uint32_t cPages, PGMMFREEPAGEDESC paPages, GMMACCOUNT enmAccount)
3288{
3289 /*
3290 * Check that the request isn't impossible wrt to the account status.
3291 */
3292 switch (enmAccount)
3293 {
3294 case GMMACCOUNT_BASE:
3295 if (RT_UNLIKELY(pGVM->gmm.s.Allocated.cBasePages < cPages))
3296 {
3297 Log(("gmmR0FreePages: allocated=%#llx cPages=%#x!\n", pGVM->gmm.s.Allocated.cBasePages, cPages));
3298 return VERR_GMM_ATTEMPT_TO_FREE_TOO_MUCH;
3299 }
3300 break;
3301 case GMMACCOUNT_SHADOW:
3302 if (RT_UNLIKELY(pGVM->gmm.s.Allocated.cShadowPages < cPages))
3303 {
3304 Log(("gmmR0FreePages: allocated=%#llx cPages=%#x!\n", pGVM->gmm.s.Allocated.cShadowPages, cPages));
3305 return VERR_GMM_ATTEMPT_TO_FREE_TOO_MUCH;
3306 }
3307 break;
3308 case GMMACCOUNT_FIXED:
3309 if (RT_UNLIKELY(pGVM->gmm.s.Allocated.cFixedPages < cPages))
3310 {
3311 Log(("gmmR0FreePages: allocated=%#llx cPages=%#x!\n", pGVM->gmm.s.Allocated.cFixedPages, cPages));
3312 return VERR_GMM_ATTEMPT_TO_FREE_TOO_MUCH;
3313 }
3314 break;
3315 default:
3316 AssertMsgFailedReturn(("enmAccount=%d\n", enmAccount), VERR_IPE_NOT_REACHED_DEFAULT_CASE);
3317 }
3318
3319 /*
3320 * Walk the descriptors and free the pages.
3321 *
3322 * Statistics (except the account) are being updated as we go along,
3323 * unlike the alloc code. Also, stop on the first error.
3324 */
3325 int rc = VINF_SUCCESS;
3326 uint32_t iPage;
3327 for (iPage = 0; iPage < cPages; iPage++)
3328 {
3329 uint32_t idPage = paPages[iPage].idPage;
3330 PGMMPAGE pPage = gmmR0GetPage(pGMM, idPage);
3331 if (RT_LIKELY(pPage))
3332 {
3333 if (RT_LIKELY(GMM_PAGE_IS_PRIVATE(pPage)))
3334 {
3335 if (RT_LIKELY(pPage->Private.hGVM == pGVM->hSelf))
3336 {
3337 Assert(pGVM->gmm.s.cPrivatePages);
3338 pGVM->gmm.s.cPrivatePages--;
3339 gmmR0FreePrivatePage(pGMM, pGVM, idPage, pPage);
3340 }
3341 else
3342 {
3343                    Log(("gmmR0FreePages: #%#x/%#x: not owner! hGVM=%#x hSelf=%#x\n", iPage, idPage,
3344 pPage->Private.hGVM, pGVM->hSelf));
3345 rc = VERR_GMM_NOT_PAGE_OWNER;
3346 break;
3347 }
3348 }
3349 else if (RT_LIKELY(GMM_PAGE_IS_SHARED(pPage)))
3350 {
3351 Assert(pGVM->gmm.s.cSharedPages);
3352 pGVM->gmm.s.cSharedPages--;
3353 Assert(pPage->Shared.cRefs);
3354 if (!--pPage->Shared.cRefs)
3355 gmmR0FreeSharedPage(pGMM, pGVM, idPage, pPage);
3356 else
3357 {
3358 Assert(pGMM->cDuplicatePages);
3359 pGMM->cDuplicatePages--;
3360 }
3361 }
3362 else
3363 {
3364                Log(("gmmR0FreePages: #%#x/%#x: already free!\n", iPage, idPage));
3365 rc = VERR_GMM_PAGE_ALREADY_FREE;
3366 break;
3367 }
3368 }
3369 else
3370 {
3371            Log(("gmmR0FreePages: #%#x/%#x: not found!\n", iPage, idPage));
3372 rc = VERR_GMM_PAGE_NOT_FOUND;
3373 break;
3374 }
3375 paPages[iPage].idPage = NIL_GMM_PAGEID;
3376 }
3377
3378 /*
3379 * Update the account.
3380 */
3381 switch (enmAccount)
3382 {
3383 case GMMACCOUNT_BASE: pGVM->gmm.s.Allocated.cBasePages -= iPage; break;
3384 case GMMACCOUNT_SHADOW: pGVM->gmm.s.Allocated.cShadowPages -= iPage; break;
3385 case GMMACCOUNT_FIXED: pGVM->gmm.s.Allocated.cFixedPages -= iPage; break;
3386 default:
3387 AssertMsgFailedReturn(("enmAccount=%d\n", enmAccount), VERR_IPE_NOT_REACHED_DEFAULT_CASE);
3388 }
3389
3390 /*
3391 * Any threshold stuff to be done here?
3392 */
3393
3394 return rc;
3395}
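
/*
 * For reference, the lookups above rely on the page ID layout: the chunk ID
 * occupies the upper bits and the page index within the chunk the lower bits.
 * A minimal decomposition sketch (idPage is assumed to be a valid GMM page ID):
 *
 * @code
 *      uint32_t const idChunk = idPage >> GMM_CHUNKID_SHIFT;
 *      uint32_t const iPage   = idPage &  GMM_PAGEID_IDX_MASK;
 *      PGMMCHUNK      pChunk  = gmmR0GetChunk(pGMM, idChunk);
 *      PGMMPAGE       pPage   = pChunk ? &pChunk->aPages[iPage] : NULL;
 * @endcode
 */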
3396
3397
3398/**
3399 * Free one or more pages.
3400 *
3401 * This is typically used at reset time or power off.
3402 *
3403 * @returns VBox status code:
3404 * @retval VERR_GMM_ATTEMPT_TO_FREE_TOO_MUCH
 * @retval VERR_GMM_NOT_PAGE_OWNER
 * @retval VERR_GMM_PAGE_ALREADY_FREE
 * @retval VERR_GMM_PAGE_NOT_FOUND
 * @retval VERR_GMM_IS_NOT_SANE
3405 *
3406 * @param pVM Pointer to the shared VM structure.
3407 * @param idCpu VCPU id
3408 * @param cPages The number of pages to free.
3409 * @param paPages Pointer to the page descriptors containing the Page IDs for each page.
3410 * @param enmAccount The account this relates to.
3411 * @thread EMT.
3412 */
3413GMMR0DECL(int) GMMR0FreePages(PVM pVM, VMCPUID idCpu, uint32_t cPages, PGMMFREEPAGEDESC paPages, GMMACCOUNT enmAccount)
3414{
3415 LogFlow(("GMMR0FreePages: pVM=%p cPages=%#x paPages=%p enmAccount=%d\n", pVM, cPages, paPages, enmAccount));
3416
3417 /*
3418 * Validate input and get the basics.
3419 */
3420 PGMM pGMM;
3421 GMM_GET_VALID_INSTANCE(pGMM, VERR_GMM_INSTANCE);
3422 PGVM pGVM;
3423 int rc = GVMMR0ByVMAndEMT(pVM, idCpu, &pGVM);
3424 if (RT_FAILURE(rc))
3425 return rc;
3426
3427 AssertPtrReturn(paPages, VERR_INVALID_PARAMETER);
3428 AssertMsgReturn(enmAccount > GMMACCOUNT_INVALID && enmAccount < GMMACCOUNT_END, ("%d\n", enmAccount), VERR_INVALID_PARAMETER);
3429 AssertMsgReturn(cPages > 0 && cPages < RT_BIT(32 - PAGE_SHIFT), ("%#x\n", cPages), VERR_INVALID_PARAMETER);
3430
3431 for (unsigned iPage = 0; iPage < cPages; iPage++)
3432 AssertMsgReturn( paPages[iPage].idPage <= GMM_PAGEID_LAST
3433 /*|| paPages[iPage].idPage == NIL_GMM_PAGEID*/,
3434 ("#%#x: %#x\n", iPage, paPages[iPage].idPage), VERR_INVALID_PARAMETER);
3435
3436 /*
3437 * Take the semaphore and call the worker function.
3438 */
3439 gmmR0MutexAcquire(pGMM);
3440 if (GMM_CHECK_SANITY_UPON_ENTERING(pGMM))
3441 {
3442 rc = gmmR0FreePages(pGMM, pGVM, cPages, paPages, enmAccount);
3443 GMM_CHECK_SANITY_UPON_LEAVING(pGMM);
3444 }
3445 else
3446 rc = VERR_GMM_IS_NOT_SANE;
3447 gmmR0MutexRelease(pGMM);
3448 LogFlow(("GMMR0FreePages: returns %Rrc\n", rc));
3449 return rc;
3450}
3451
3452
3453/**
3454 * VMMR0 request wrapper for GMMR0FreePages.
3455 *
3456 * @returns see GMMR0FreePages.
3457 * @param pVM Pointer to the shared VM structure.
3458 * @param idCpu VCPU id
3459 * @param pReq The request packet.
3460 */
3461GMMR0DECL(int) GMMR0FreePagesReq(PVM pVM, VMCPUID idCpu, PGMMFREEPAGESREQ pReq)
3462{
3463 /*
3464 * Validate input and pass it on.
3465 */
3466 AssertPtrReturn(pVM, VERR_INVALID_POINTER);
3467 AssertPtrReturn(pReq, VERR_INVALID_POINTER);
3468 AssertMsgReturn(pReq->Hdr.cbReq >= RT_UOFFSETOF(GMMFREEPAGESREQ, aPages[0]),
3469 ("%#x < %#x\n", pReq->Hdr.cbReq, RT_UOFFSETOF(GMMFREEPAGESREQ, aPages[0])),
3470 VERR_INVALID_PARAMETER);
3471 AssertMsgReturn(pReq->Hdr.cbReq == RT_UOFFSETOF(GMMFREEPAGESREQ, aPages[pReq->cPages]),
3472 ("%#x != %#x\n", pReq->Hdr.cbReq, RT_UOFFSETOF(GMMFREEPAGESREQ, aPages[pReq->cPages])),
3473 VERR_INVALID_PARAMETER);
3474
3475 return GMMR0FreePages(pVM, idCpu, pReq->cPages, &pReq->aPages[0], pReq->enmAccount);
3476}
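
/*
 * A minimal request-construction sketch for the wrapper above, assuming a
 * ring-0 caller with a valid pVM/idCpu and two page IDs (idFirstPage,
 * idSecondPage) previously handed out by the GMM; any additional request
 * header fields and the usual VMMR0 dispatch path are not shown.
 *
 * @code
 *      uint32_t const   cPages = 2;
 *      uint32_t const   cbReq  = RT_UOFFSETOF(GMMFREEPAGESREQ, aPages[cPages]);
 *      PGMMFREEPAGESREQ pReq   = (PGMMFREEPAGESREQ)RTMemTmpAllocZ(cbReq);
 *      if (pReq)
 *      {
 *          pReq->Hdr.cbReq        = cbReq;
 *          pReq->enmAccount       = GMMACCOUNT_BASE;
 *          pReq->cPages           = cPages;
 *          pReq->aPages[0].idPage = idFirstPage;
 *          pReq->aPages[1].idPage = idSecondPage;
 *          int rc2 = GMMR0FreePagesReq(pVM, idCpu, pReq);
 *          RTMemTmpFree(pReq);
 *      }
 * @endcode
 */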
3477
3478
3479/**
3480 * Report back on a memory ballooning request.
3481 *
3482 * The request may or may not have been initiated by the GMM. If it was initiated
3483 * by the GMM it is important that this function is called even if no pages were
3484 * ballooned.
3485 *
3486 * @returns VBox status code:
3487 * @retval VERR_GMM_ATTEMPT_TO_FREE_TOO_MUCH
3488 * @retval VERR_GMM_ATTEMPT_TO_DEFLATE_TOO_MUCH
3489 * @retval VERR_GMM_OVERCOMMITTED_TRY_AGAIN_IN_A_BIT - reset condition
3490 * indicating that we won't necessarily have sufficient RAM to boot
3491 * the VM again and that it should pause until this changes (we'll try
3492 * to balloon some other VM). (For standard deflate we have little choice
3493 * but to hope the VM won't use the memory that was returned to it.)
3494 *
3495 * @param pVM Pointer to the shared VM structure.
3496 * @param idCpu VCPU id
3497 * @param enmAction Inflate/deflate/reset
3498 * @param cBalloonedPages The number of pages that were ballooned.
3499 *
3500 * @thread EMT.
3501 */
3502GMMR0DECL(int) GMMR0BalloonedPages(PVM pVM, VMCPUID idCpu, GMMBALLOONACTION enmAction, uint32_t cBalloonedPages)
3503{
3504 LogFlow(("GMMR0BalloonedPages: pVM=%p enmAction=%d cBalloonedPages=%#x\n",
3505 pVM, enmAction, cBalloonedPages));
3506
3507 AssertMsgReturn(cBalloonedPages < RT_BIT(32 - PAGE_SHIFT), ("%#x\n", cBalloonedPages), VERR_INVALID_PARAMETER);
3508
3509 /*
3510 * Validate input and get the basics.
3511 */
3512 PGMM pGMM;
3513 GMM_GET_VALID_INSTANCE(pGMM, VERR_GMM_INSTANCE);
3514 PGVM pGVM;
3515 int rc = GVMMR0ByVMAndEMT(pVM, idCpu, &pGVM);
3516 if (RT_FAILURE(rc))
3517 return rc;
3518
3519 /*
3520 * Take the semaphore and do some more validations.
3521 */
3522 gmmR0MutexAcquire(pGMM);
3523 if (GMM_CHECK_SANITY_UPON_ENTERING(pGMM))
3524 {
3525 switch (enmAction)
3526 {
3527 case GMMBALLOONACTION_INFLATE:
3528 {
3529 if (RT_LIKELY(pGVM->gmm.s.Allocated.cBasePages + pGVM->gmm.s.cBalloonedPages + cBalloonedPages <= pGVM->gmm.s.Reserved.cBasePages))
3530 {
3531 /*
3532 * Record the ballooned memory.
3533 */
3534 pGMM->cBalloonedPages += cBalloonedPages;
3535 if (pGVM->gmm.s.cReqBalloonedPages)
3536 {
3537                        /* Code path never taken. Might be interesting in the future to request ballooned memory from guests in low-memory conditions. */
3538 AssertFailed();
3539
3540 pGVM->gmm.s.cBalloonedPages += cBalloonedPages;
3541 pGVM->gmm.s.cReqActuallyBalloonedPages += cBalloonedPages;
3542 Log(("GMMR0BalloonedPages: +%#x - Global=%#llx / VM: Total=%#llx Req=%#llx Actual=%#llx (pending)\n", cBalloonedPages,
3543 pGMM->cBalloonedPages, pGVM->gmm.s.cBalloonedPages, pGVM->gmm.s.cReqBalloonedPages, pGVM->gmm.s.cReqActuallyBalloonedPages));
3544 }
3545 else
3546 {
3547 pGVM->gmm.s.cBalloonedPages += cBalloonedPages;
3548 Log(("GMMR0BalloonedPages: +%#x - Global=%#llx / VM: Total=%#llx (user)\n",
3549 cBalloonedPages, pGMM->cBalloonedPages, pGVM->gmm.s.cBalloonedPages));
3550 }
3551 }
3552 else
3553 {
3554 Log(("GMMR0BalloonedPages: cBasePages=%#llx Total=%#llx cBalloonedPages=%#llx Reserved=%#llx\n",
3555 pGVM->gmm.s.Allocated.cBasePages, pGVM->gmm.s.cBalloonedPages, cBalloonedPages, pGVM->gmm.s.Reserved.cBasePages));
3556 rc = VERR_GMM_ATTEMPT_TO_FREE_TOO_MUCH;
3557 }
3558 break;
3559 }
3560
3561 case GMMBALLOONACTION_DEFLATE:
3562 {
3563 /* Deflate. */
3564 if (pGVM->gmm.s.cBalloonedPages >= cBalloonedPages)
3565 {
3566 /*
3567 * Record the ballooned memory.
3568 */
3569 Assert(pGMM->cBalloonedPages >= cBalloonedPages);
3570 pGMM->cBalloonedPages -= cBalloonedPages;
3571 pGVM->gmm.s.cBalloonedPages -= cBalloonedPages;
3572 if (pGVM->gmm.s.cReqDeflatePages)
3573 {
3574                        AssertFailed(); /* This path is for later. */
3575 Log(("GMMR0BalloonedPages: -%#x - Global=%#llx / VM: Total=%#llx Req=%#llx\n",
3576 cBalloonedPages, pGMM->cBalloonedPages, pGVM->gmm.s.cBalloonedPages, pGVM->gmm.s.cReqDeflatePages));
3577
3578 /*
3579 * Anything we need to do here now when the request has been completed?
3580 */
3581 pGVM->gmm.s.cReqDeflatePages = 0;
3582 }
3583 else
3584 Log(("GMMR0BalloonedPages: -%#x - Global=%#llx / VM: Total=%#llx (user)\n",
3585 cBalloonedPages, pGMM->cBalloonedPages, pGVM->gmm.s.cBalloonedPages));
3586 }
3587 else
3588 {
3589 Log(("GMMR0BalloonedPages: Total=%#llx cBalloonedPages=%#llx\n", pGVM->gmm.s.cBalloonedPages, cBalloonedPages));
3590 rc = VERR_GMM_ATTEMPT_TO_DEFLATE_TOO_MUCH;
3591 }
3592 break;
3593 }
3594
3595 case GMMBALLOONACTION_RESET:
3596 {
3597 /* Reset to an empty balloon. */
3598 Assert(pGMM->cBalloonedPages >= pGVM->gmm.s.cBalloonedPages);
3599
3600 pGMM->cBalloonedPages -= pGVM->gmm.s.cBalloonedPages;
3601 pGVM->gmm.s.cBalloonedPages = 0;
3602 break;
3603 }
3604
3605 default:
3606 rc = VERR_INVALID_PARAMETER;
3607 break;
3608 }
3609 GMM_CHECK_SANITY_UPON_LEAVING(pGMM);
3610 }
3611 else
3612 rc = VERR_GMM_IS_NOT_SANE;
3613
3614 gmmR0MutexRelease(pGMM);
3615 LogFlow(("GMMR0BalloonedPages: returns %Rrc\n", rc));
3616 return rc;
3617}
3618
3619
3620/**
3621 * VMMR0 request wrapper for GMMR0BalloonedPages.
3622 *
3623 * @returns see GMMR0BalloonedPages.
3624 * @param pVM Pointer to the shared VM structure.
3625 * @param idCpu VCPU id
3626 * @param pReq The request packet.
3627 */
3628GMMR0DECL(int) GMMR0BalloonedPagesReq(PVM pVM, VMCPUID idCpu, PGMMBALLOONEDPAGESREQ pReq)
3629{
3630 /*
3631 * Validate input and pass it on.
3632 */
3633 AssertPtrReturn(pVM, VERR_INVALID_POINTER);
3634 AssertPtrReturn(pReq, VERR_INVALID_POINTER);
3635 AssertMsgReturn(pReq->Hdr.cbReq == sizeof(GMMBALLOONEDPAGESREQ),
3636                    ("%#x != %#x\n", pReq->Hdr.cbReq, sizeof(GMMBALLOONEDPAGESREQ)),
3637 VERR_INVALID_PARAMETER);
3638
3639 return GMMR0BalloonedPages(pVM, idCpu, pReq->enmAction, pReq->cBalloonedPages);
3640}
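
/*
 * A minimal inflate sketch for the wrapper above, assuming an EMT caller with
 * a valid pVM/idCpu and a page count reported by the guest balloon driver;
 * deflate and reset only differ in enmAction (GMMBALLOONACTION_DEFLATE /
 * GMMBALLOONACTION_RESET).  Additional request header fields are not shown.
 *
 * @code
 *      GMMBALLOONEDPAGESREQ Req;
 *      RT_ZERO(Req);
 *      Req.Hdr.cbReq       = sizeof(Req);
 *      Req.enmAction       = GMMBALLOONACTION_INFLATE;
 *      Req.cBalloonedPages = cPagesFromGuestBalloon;
 *      int rc2 = GMMR0BalloonedPagesReq(pVM, idCpu, &Req);
 * @endcode
 */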
3641
3642/**
3643 * Return memory statistics for the hypervisor
3644 *
3645 * @returns VBox status code:
3646 * @param pVM Pointer to the shared VM structure.
3647 * @param pReq The request packet.
3648 */
3649GMMR0DECL(int) GMMR0QueryHypervisorMemoryStatsReq(PVM pVM, PGMMMEMSTATSREQ pReq)
3650{
3651 /*
3652 * Validate input and pass it on.
3653 */
3654 AssertPtrReturn(pVM, VERR_INVALID_POINTER);
3655 AssertPtrReturn(pReq, VERR_INVALID_POINTER);
3656 AssertMsgReturn(pReq->Hdr.cbReq == sizeof(GMMMEMSTATSREQ),
3657                    ("%#x != %#x\n", pReq->Hdr.cbReq, sizeof(GMMMEMSTATSREQ)),
3658 VERR_INVALID_PARAMETER);
3659
3660 /*
3661 * Validate input and get the basics.
3662 */
3663 PGMM pGMM;
3664 GMM_GET_VALID_INSTANCE(pGMM, VERR_GMM_INSTANCE);
3665 pReq->cAllocPages = pGMM->cAllocatedPages;
3666    pReq->cFreePages      = (pGMM->cChunks << (GMM_CHUNK_SHIFT - PAGE_SHIFT)) - pGMM->cAllocatedPages;
3667 pReq->cBalloonedPages = pGMM->cBalloonedPages;
3668 pReq->cMaxPages = pGMM->cMaxPages;
3669 pReq->cSharedPages = pGMM->cDuplicatePages;
3670 GMM_CHECK_SANITY_UPON_LEAVING(pGMM);
3671
3672 return VINF_SUCCESS;
3673}
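
/*
 * The cFreePages value above is derived rather than stored: each chunk
 * contributes GMM_CHUNK_SIZE / PAGE_SIZE pages, so the total page pool is the
 * chunk count shifted by (GMM_CHUNK_SHIFT - PAGE_SHIFT) and whatever is not
 * allocated is free.  Spelled out:
 *
 * @code
 *      uint64_t const cPagesPerChunk = GMM_CHUNK_SIZE / PAGE_SIZE;  // == 1 << (GMM_CHUNK_SHIFT - PAGE_SHIFT)
 *      uint64_t const cTotalPages    = (uint64_t)pGMM->cChunks * cPagesPerChunk;
 *      uint64_t const cFreePages     = cTotalPages - pGMM->cAllocatedPages;
 * @endcode
 */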
3674
3675/**
3676 * Return memory statistics for the VM
3677 *
3678 * @returns VBox status code:
3679 * @param pVM Pointer to the shared VM structure.
3680 * @param idCpu VCPU id
3681 * @param pReq The request packet.
3682 */
3683GMMR0DECL(int) GMMR0QueryMemoryStatsReq(PVM pVM, VMCPUID idCpu, PGMMMEMSTATSREQ pReq)
3684{
3685 /*
3686 * Validate input and pass it on.
3687 */
3688 AssertPtrReturn(pVM, VERR_INVALID_POINTER);
3689 AssertPtrReturn(pReq, VERR_INVALID_POINTER);
3690 AssertMsgReturn(pReq->Hdr.cbReq == sizeof(GMMMEMSTATSREQ),
3691                    ("%#x != %#x\n", pReq->Hdr.cbReq, sizeof(GMMMEMSTATSREQ)),
3692 VERR_INVALID_PARAMETER);
3693
3694 /*
3695 * Validate input and get the basics.
3696 */
3697 PGMM pGMM;
3698 GMM_GET_VALID_INSTANCE(pGMM, VERR_GMM_INSTANCE);
3699 PGVM pGVM;
3700 int rc = GVMMR0ByVMAndEMT(pVM, idCpu, &pGVM);
3701 if (RT_FAILURE(rc))
3702 return rc;
3703
3704 /*
3705 * Take the semaphore and do some more validations.
3706 */
3707 gmmR0MutexAcquire(pGMM);
3708 if (GMM_CHECK_SANITY_UPON_ENTERING(pGMM))
3709 {
3710 pReq->cAllocPages = pGVM->gmm.s.Allocated.cBasePages;
3711 pReq->cBalloonedPages = pGVM->gmm.s.cBalloonedPages;
3712 pReq->cMaxPages = pGVM->gmm.s.Reserved.cBasePages;
3713 pReq->cFreePages = pReq->cMaxPages - pReq->cAllocPages;
3714 }
3715 else
3716 rc = VERR_GMM_IS_NOT_SANE;
3717
3718 gmmR0MutexRelease(pGMM);
3719    LogFlow(("GMMR0QueryMemoryStatsReq: returns %Rrc\n", rc));
3720 return rc;
3721}
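
/*
 * A minimal per-VM statistics query sketch for the function above, assuming an
 * EMT caller with a valid pVM/idCpu; additional request header fields are not
 * shown.
 *
 * @code
 *      GMMMEMSTATSREQ Req;
 *      RT_ZERO(Req);
 *      Req.Hdr.cbReq = sizeof(Req);
 *      int rc2 = GMMR0QueryMemoryStatsReq(pVM, idCpu, &Req);
 *      if (RT_SUCCESS(rc2))
 *      {
 *          // Req.cAllocPages, Req.cBalloonedPages, Req.cMaxPages and
 *          // Req.cFreePages now reflect this VM's base memory account.
 *      }
 * @endcode
 */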
3722
3723
3724/**
3725 * Worker for gmmR0UnmapChunk and gmmR0FreeChunk.
3726 *
3727 * Don't call this in legacy allocation mode!
3728 *
3729 * @returns VBox status code.
3730 * @param pGMM Pointer to the GMM instance data.
3731 * @param pGVM Pointer to the Global VM structure.
3732 * @param pChunk Pointer to the chunk to be unmapped.
3733 */
3734static int gmmR0UnmapChunkLocked(PGMM pGMM, PGVM pGVM, PGMMCHUNK pChunk)
3735{
3736 Assert(!pGMM->fLegacyAllocationMode);
3737
3738 /*
3739 * Find the mapping and try unmapping it.
3740 */
3741 uint32_t cMappings = pChunk->cMappingsX;
3742 for (uint32_t i = 0; i < cMappings; i++)
3743 {
3744 Assert(pChunk->paMappingsX[i].pGVM && pChunk->paMappingsX[i].hMapObj != NIL_RTR0MEMOBJ);
3745 if (pChunk->paMappingsX[i].pGVM == pGVM)
3746 {
3747 /* unmap */
3748 int rc = RTR0MemObjFree(pChunk->paMappingsX[i].hMapObj, false /* fFreeMappings (NA) */);
3749 if (RT_SUCCESS(rc))
3750 {
3751 /* update the record. */
3752 cMappings--;
3753 if (i < cMappings)
3754 pChunk->paMappingsX[i] = pChunk->paMappingsX[cMappings];
3755 pChunk->paMappingsX[cMappings].hMapObj = NIL_RTR0MEMOBJ;
3756 pChunk->paMappingsX[cMappings].pGVM = NULL;
3757 Assert(pChunk->cMappingsX - 1U == cMappings);
3758 pChunk->cMappingsX = cMappings;
3759 }
3760
3761 return rc;
3762 }
3763 }
3764
3765 Log(("gmmR0UnmapChunk: Chunk %#x is not mapped into pGVM=%p/%#x\n", pChunk->Core.Key, pGVM, pGVM->hSelf));
3766 return VERR_GMM_CHUNK_NOT_MAPPED;
3767}
3768
3769
3770/**
3771 * Unmaps a chunk previously mapped into the address space of the current process.
3772 *
3773 * @returns VBox status code.
3774 * @param pGMM Pointer to the GMM instance data.
3775 * @param pGVM Pointer to the Global VM structure.
3776 * @param pChunk Pointer to the chunk to be unmapped.
 * @param fRelaxedSem Whether we can release the semaphore while doing the
 * unmapping (@c true) or not.
3777 */
3778static int gmmR0UnmapChunk(PGMM pGMM, PGVM pGVM, PGMMCHUNK pChunk, bool fRelaxedSem)
3779{
3780 if (!pGMM->fLegacyAllocationMode)
3781 {
3782 /*
3783 * Lock the chunk and if possible leave the giant GMM lock.
3784 */
3785 GMMR0CHUNKMTXSTATE MtxState;
3786 int rc = gmmR0ChunkMutexAcquire(&MtxState, pGMM, pChunk,
3787 fRelaxedSem ? GMMR0CHUNK_MTX_RETAKE_GIANT : GMMR0CHUNK_MTX_KEEP_GIANT);
3788 if (RT_SUCCESS(rc))
3789 {
3790 rc = gmmR0UnmapChunkLocked(pGMM, pGVM, pChunk);
3791 gmmR0ChunkMutexRelease(&MtxState, pChunk);
3792 }
3793 return rc;
3794 }
3795
3796 if (pChunk->hGVM == pGVM->hSelf)
3797 return VINF_SUCCESS;
3798
3799 Log(("gmmR0UnmapChunk: Chunk %#x is not mapped into pGVM=%p/%#x (legacy)\n", pChunk->Core.Key, pGVM, pGVM->hSelf));
3800 return VERR_GMM_CHUNK_NOT_MAPPED;
3801}
3802
3803
3804/**
3805 * Worker for gmmR0MapChunk.
3806 *
3807 * @returns VBox status code.
3808 * @param pGMM Pointer to the GMM instance data.
3809 * @param pGVM Pointer to the Global VM structure.
3810 * @param pChunk Pointer to the chunk to be mapped.
3811 * @param ppvR3 Where to store the ring-3 address of the mapping.
3812 * In the VERR_GMM_CHUNK_ALREADY_MAPPED case, this will
3813 * contain the address of the existing mapping.
3814 */
3815static int gmmR0MapChunkLocked(PGMM pGMM, PGVM pGVM, PGMMCHUNK pChunk, PRTR3PTR ppvR3)
3816{
3817 /*
3818 * If we're in legacy mode this is simple.
3819 */
3820 if (pGMM->fLegacyAllocationMode)
3821 {
3822 if (pChunk->hGVM != pGVM->hSelf)
3823 {
3824            Log(("gmmR0MapChunk: chunk %#x is not owned by pGVM=%p/%#x! (legacy)\n", pChunk->Core.Key, pGVM, pGVM->hSelf));
3825 return VERR_GMM_CHUNK_NOT_FOUND;
3826 }
3827
3828 *ppvR3 = RTR0MemObjAddressR3(pChunk->hMemObj);
3829 return VINF_SUCCESS;
3830 }
3831
3832 /*
3833 * Check to see if the chunk is already mapped.
3834 */
3835 for (uint32_t i = 0; i < pChunk->cMappingsX; i++)
3836 {
3837 Assert(pChunk->paMappingsX[i].pGVM && pChunk->paMappingsX[i].hMapObj != NIL_RTR0MEMOBJ);
3838 if (pChunk->paMappingsX[i].pGVM == pGVM)
3839 {
3840 *ppvR3 = RTR0MemObjAddressR3(pChunk->paMappingsX[i].hMapObj);
3841 Log(("gmmR0MapChunk: chunk %#x is already mapped at %p!\n", pChunk->Core.Key, *ppvR3));
3842#ifdef VBOX_WITH_PAGE_SHARING
3843 /* The ring-3 chunk cache can be out of sync; don't fail. */
3844 return VINF_SUCCESS;
3845#else
3846 return VERR_GMM_CHUNK_ALREADY_MAPPED;
3847#endif
3848 }
3849 }
3850
3851 /*
3852 * Do the mapping.
3853 */
3854 RTR0MEMOBJ hMapObj;
3855 int rc = RTR0MemObjMapUser(&hMapObj, pChunk->hMemObj, (RTR3PTR)-1, 0, RTMEM_PROT_READ | RTMEM_PROT_WRITE, NIL_RTR0PROCESS);
3856 if (RT_SUCCESS(rc))
3857 {
3858 /* reallocate the array? assumes few users per chunk (usually one). */
3859 unsigned iMapping = pChunk->cMappingsX;
3860 if ( iMapping <= 3
3861 || (iMapping & 3) == 0)
3862 {
3863 unsigned cNewSize = iMapping <= 3
3864 ? iMapping + 1
3865 : iMapping + 4;
3866 Assert(cNewSize < 4 || RT_ALIGN_32(cNewSize, 4) == cNewSize);
3867 if (RT_UNLIKELY(cNewSize > UINT16_MAX))
3868 {
3869 rc = RTR0MemObjFree(hMapObj, false /* fFreeMappings (NA) */); AssertRC(rc);
3870 return VERR_GMM_TOO_MANY_CHUNK_MAPPINGS;
3871 }
3872
3873 void *pvMappings = RTMemRealloc(pChunk->paMappingsX, cNewSize * sizeof(pChunk->paMappingsX[0]));
3874 if (RT_UNLIKELY(!pvMappings))
3875 {
3876 rc = RTR0MemObjFree(hMapObj, false /* fFreeMappings (NA) */); AssertRC(rc);
3877 return VERR_NO_MEMORY;
3878 }
3879 pChunk->paMappingsX = (PGMMCHUNKMAP)pvMappings;
3880 }
3881
3882 /* insert new entry */
3883 pChunk->paMappingsX[iMapping].hMapObj = hMapObj;
3884 pChunk->paMappingsX[iMapping].pGVM = pGVM;
3885 Assert(pChunk->cMappingsX == iMapping);
3886 pChunk->cMappingsX = iMapping + 1;
3887
3888 *ppvR3 = RTR0MemObjAddressR3(hMapObj);
3889 }
3890
3891 return rc;
3892}
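
/*
 * Note on the reallocation policy above: the mapping array grows by one entry
 * while it holds at most three mappings and by four entries once the count is
 * a multiple of four, so the capacity sequence is 1, 2, 3, 4, 8, 12, ...
 * The growth decision in isolation:
 *
 * @code
 *      bool const     fMustGrow = iMapping <= 3 || (iMapping & 3) == 0;
 *      unsigned const cNewSize  = iMapping <= 3 ? iMapping + 1 : iMapping + 4;
 * @endcode
 */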
3893
3894
3895/**
3896 * Maps a chunk into the user address space of the current process.
3897 *
3898 * @returns VBox status code.
3899 * @param pGMM Pointer to the GMM instance data.
3900 * @param pGVM Pointer to the Global VM structure.
3901 * @param pChunk Pointer to the chunk to be mapped.
3902 * @param fRelaxedSem Whether we can release the semaphore while doing the
3903 * mapping (@c true) or not.
3904 * @param ppvR3 Where to store the ring-3 address of the mapping.
3905 * In the VERR_GMM_CHUNK_ALREADY_MAPPED case, this will
3906 * contain the address of the existing mapping.
3907 */
3908static int gmmR0MapChunk(PGMM pGMM, PGVM pGVM, PGMMCHUNK pChunk, bool fRelaxedSem, PRTR3PTR ppvR3)
3909{
3910 /*
3911 * Take the chunk lock and leave the giant GMM lock when possible, then
3912 * call the worker function.
3913 */
3914 GMMR0CHUNKMTXSTATE MtxState;
3915 int rc = gmmR0ChunkMutexAcquire(&MtxState, pGMM, pChunk,
3916 fRelaxedSem ? GMMR0CHUNK_MTX_RETAKE_GIANT : GMMR0CHUNK_MTX_KEEP_GIANT);
3917 if (RT_SUCCESS(rc))
3918 {
3919 rc = gmmR0MapChunkLocked(pGMM, pGVM, pChunk, ppvR3);
3920 gmmR0ChunkMutexRelease(&MtxState, pChunk);
3921 }
3922
3923 return rc;
3924}
3925
3926
3927
3928#if defined(VBOX_WITH_PAGE_SHARING) || (defined(VBOX_STRICT) && HC_ARCH_BITS == 64)
3929/**
3930 * Check if a chunk is mapped into the specified VM
3931 *
3932 * @returns true if the chunk is mapped into the VM, false if not.
3933 * @param pGMM Pointer to the GMM instance.
3934 * @param pGVM Pointer to the Global VM structure.
3935 * @param pChunk Pointer to the chunk to be mapped.
3936 * @param ppvR3 Where to store the ring-3 address of the mapping.
3937 */
3938static int gmmR0IsChunkMapped(PGMM pGMM, PGVM pGVM, PGMMCHUNK pChunk, PRTR3PTR ppvR3)
3939{
3940 GMMR0CHUNKMTXSTATE MtxState;
3941 gmmR0ChunkMutexAcquire(&MtxState, pGMM, pChunk, GMMR0CHUNK_MTX_KEEP_GIANT);
3942 for (uint32_t i = 0; i < pChunk->cMappingsX; i++)
3943 {
3944 Assert(pChunk->paMappingsX[i].pGVM && pChunk->paMappingsX[i].hMapObj != NIL_RTR0MEMOBJ);
3945 if (pChunk->paMappingsX[i].pGVM == pGVM)
3946 {
3947 *ppvR3 = RTR0MemObjAddressR3(pChunk->paMappingsX[i].hMapObj);
3948 gmmR0ChunkMutexRelease(&MtxState, pChunk);
3949 return true;
3950 }
3951 }
3952 *ppvR3 = NULL;
3953 gmmR0ChunkMutexRelease(&MtxState, pChunk);
3954 return false;
3955}
3956#endif /* VBOX_WITH_PAGE_SHARING || (VBOX_STRICT && 64-BIT) */
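
/*
 * The typical pattern for reaching a page's bytes combines the check above
 * with the page ID layout: the low bits of the page ID index into the chunk
 * mapping.  A minimal sketch, assuming pChunk and idPage belong together:
 *
 * @code
 *      uint8_t *pbChunk;
 *      if (gmmR0IsChunkMapped(pGMM, pGVM, pChunk, (PRTR3PTR)&pbChunk))
 *      {
 *          uint8_t *pbPage = pbChunk + ((idPage & GMM_PAGEID_IDX_MASK) << PAGE_SHIFT);
 *          // pbPage now points at the start of the page within the mapping.
 *      }
 * @endcode
 */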
3957
3958
3959/**
3960 * Map a chunk and/or unmap another chunk.
3961 *
3962 * The mapping and unmapping applies to the current process.
3963 *
3964 * This API does two things because it saves a kernel call per mapping
3965 * when the ring-3 mapping cache is full.
3966 *
3967 * @returns VBox status code.
3968 * @param pVM The VM.
3969 * @param idChunkMap The chunk to map. NIL_GMM_CHUNKID if nothing to map.
3970 * @param idChunkUnmap The chunk to unmap. NIL_GMM_CHUNKID if nothing to unmap.
3971 * @param ppvR3 Where to store the address of the mapped chunk. NULL is ok if nothing to map.
3972 * @thread EMT
3973 */
3974GMMR0DECL(int) GMMR0MapUnmapChunk(PVM pVM, uint32_t idChunkMap, uint32_t idChunkUnmap, PRTR3PTR ppvR3)
3975{
3976 LogFlow(("GMMR0MapUnmapChunk: pVM=%p idChunkMap=%#x idChunkUnmap=%#x ppvR3=%p\n",
3977 pVM, idChunkMap, idChunkUnmap, ppvR3));
3978
3979 /*
3980 * Validate input and get the basics.
3981 */
3982 PGMM pGMM;
3983 GMM_GET_VALID_INSTANCE(pGMM, VERR_GMM_INSTANCE);
3984 PGVM pGVM;
3985 int rc = GVMMR0ByVM(pVM, &pGVM);
3986 if (RT_FAILURE(rc))
3987 return rc;
3988
3989 AssertCompile(NIL_GMM_CHUNKID == 0);
3990 AssertMsgReturn(idChunkMap <= GMM_CHUNKID_LAST, ("%#x\n", idChunkMap), VERR_INVALID_PARAMETER);
3991 AssertMsgReturn(idChunkUnmap <= GMM_CHUNKID_LAST, ("%#x\n", idChunkUnmap), VERR_INVALID_PARAMETER);
3992
3993 if ( idChunkMap == NIL_GMM_CHUNKID
3994 && idChunkUnmap == NIL_GMM_CHUNKID)
3995 return VERR_INVALID_PARAMETER;
3996
3997 if (idChunkMap != NIL_GMM_CHUNKID)
3998 {
3999 AssertPtrReturn(ppvR3, VERR_INVALID_POINTER);
4000 *ppvR3 = NIL_RTR3PTR;
4001 }
4002
4003 /*
4004 * Take the semaphore and do the work.
4005 *
4006 * The unmapping is done last since it's easier to undo a mapping than
4007     * undoing an unmapping.  The ring-3 mapping cache cannot be so big
4008     * that it pushes the user virtual address space to within a chunk of
4009     * its limits, so no problem here.
4010 */
4011 gmmR0MutexAcquire(pGMM);
4012 if (GMM_CHECK_SANITY_UPON_ENTERING(pGMM))
4013 {
4014 PGMMCHUNK pMap = NULL;
4015        if (idChunkMap != NIL_GMM_CHUNKID)
4016 {
4017 pMap = gmmR0GetChunk(pGMM, idChunkMap);
4018 if (RT_LIKELY(pMap))
4019 rc = gmmR0MapChunk(pGMM, pGVM, pMap, true /*fRelaxedSem*/, ppvR3);
4020 else
4021 {
4022 Log(("GMMR0MapUnmapChunk: idChunkMap=%#x\n", idChunkMap));
4023 rc = VERR_GMM_CHUNK_NOT_FOUND;
4024 }
4025 }
4026/** @todo split this operation, the bail out might (theoretically) not be
4027 * entirely safe. */
4028
4029 if ( idChunkUnmap != NIL_GMM_CHUNKID
4030 && RT_SUCCESS(rc))
4031 {
4032 PGMMCHUNK pUnmap = gmmR0GetChunk(pGMM, idChunkUnmap);
4033 if (RT_LIKELY(pUnmap))
4034 rc = gmmR0UnmapChunk(pGMM, pGVM, pUnmap, true /*fRelaxedSem*/);
4035 else
4036 {
4037 Log(("GMMR0MapUnmapChunk: idChunkUnmap=%#x\n", idChunkUnmap));
4038 rc = VERR_GMM_CHUNK_NOT_FOUND;
4039 }
4040
4041 if (RT_FAILURE(rc) && pMap)
4042 gmmR0UnmapChunk(pGMM, pGVM, pMap, false /*fRelaxedSem*/);
4043 }
4044
4045 GMM_CHECK_SANITY_UPON_LEAVING(pGMM);
4046 }
4047 else
4048 rc = VERR_GMM_IS_NOT_SANE;
4049 gmmR0MutexRelease(pGMM);
4050
4051 LogFlow(("GMMR0MapUnmapChunk: returns %Rrc\n", rc));
4052 return rc;
4053}
4054
4055
4056/**
4057 * VMMR0 request wrapper for GMMR0MapUnmapChunk.
4058 *
4059 * @returns see GMMR0MapUnmapChunk.
4060 * @param pVM Pointer to the shared VM structure.
4061 * @param pReq The request packet.
4062 */
4063GMMR0DECL(int) GMMR0MapUnmapChunkReq(PVM pVM, PGMMMAPUNMAPCHUNKREQ pReq)
4064{
4065 /*
4066 * Validate input and pass it on.
4067 */
4068 AssertPtrReturn(pVM, VERR_INVALID_POINTER);
4069 AssertPtrReturn(pReq, VERR_INVALID_POINTER);
4070 AssertMsgReturn(pReq->Hdr.cbReq == sizeof(*pReq), ("%#x != %#x\n", pReq->Hdr.cbReq, sizeof(*pReq)), VERR_INVALID_PARAMETER);
4071
4072 return GMMR0MapUnmapChunk(pVM, pReq->idChunkMap, pReq->idChunkUnmap, &pReq->pvR3);
4073}
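
/*
 * A minimal combined map+unmap sketch for the wrapper above, assuming chunk
 * IDs obtained from earlier allocations; either ID may be NIL_GMM_CHUNKID when
 * only one of the two operations is wanted.  Additional request header fields
 * are not shown.
 *
 * @code
 *      GMMMAPUNMAPCHUNKREQ Req;
 *      RT_ZERO(Req);
 *      Req.Hdr.cbReq    = sizeof(Req);
 *      Req.idChunkMap   = idChunkToMap;     // assumed chunk ID to map
 *      Req.idChunkUnmap = idChunkToEvict;   // assumed chunk ID to unmap
 *      Req.pvR3         = NIL_RTR3PTR;
 *      int rc2 = GMMR0MapUnmapChunkReq(pVM, &Req);
 *      // On success (and if idChunkMap was given) Req.pvR3 holds the ring-3 address.
 * @endcode
 */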
4074
4075
4076/**
4077 * Legacy mode API for supplying pages.
4078 *
4079 * The specified user address points to an allocation-chunk-sized block that
4080 * will be locked down and used by the GMM when the VM asks for pages.
4081 *
4082 * @returns VBox status code.
4083 * @param pVM The VM.
4084 * @param idCpu VCPU id
4085 * @param pvR3 Pointer to the chunk size memory block to lock down.
4086 */
4087GMMR0DECL(int) GMMR0SeedChunk(PVM pVM, VMCPUID idCpu, RTR3PTR pvR3)
4088{
4089 /*
4090 * Validate input and get the basics.
4091 */
4092 PGMM pGMM;
4093 GMM_GET_VALID_INSTANCE(pGMM, VERR_GMM_INSTANCE);
4094 PGVM pGVM;
4095 int rc = GVMMR0ByVMAndEMT(pVM, idCpu, &pGVM);
4096 if (RT_FAILURE(rc))
4097 return rc;
4098
4099 AssertPtrReturn(pvR3, VERR_INVALID_POINTER);
4100 AssertReturn(!(PAGE_OFFSET_MASK & pvR3), VERR_INVALID_POINTER);
4101
4102 if (!pGMM->fLegacyAllocationMode)
4103 {
4104 Log(("GMMR0SeedChunk: not in legacy allocation mode!\n"));
4105 return VERR_NOT_SUPPORTED;
4106 }
4107
4108 /*
4109 * Lock the memory and add it as new chunk with our hGVM.
4110 * (The GMM locking is done inside gmmR0RegisterChunk.)
4111 */
4112 RTR0MEMOBJ MemObj;
4113 rc = RTR0MemObjLockUser(&MemObj, pvR3, GMM_CHUNK_SIZE, RTMEM_PROT_READ | RTMEM_PROT_WRITE, NIL_RTR0PROCESS);
4114 if (RT_SUCCESS(rc))
4115 {
4116 rc = gmmR0RegisterChunk(pGMM, &pGVM->gmm.s.Private, MemObj, pGVM->hSelf, 0 /*fChunkFlags*/, NULL);
4117 if (RT_SUCCESS(rc))
4118 gmmR0MutexRelease(pGMM);
4119 else
4120 RTR0MemObjFree(MemObj, false /* fFreeMappings */);
4121 }
4122
4123 LogFlow(("GMMR0SeedChunk: rc=%d (pvR3=%p)\n", rc, pvR3));
4124 return rc;
4125}
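
/*
 * A minimal sketch of what the seeding caller does, assuming legacy (locked
 * memory) mode and that the block lives in the VM process; the VMMR0 request
 * dispatch is not shown.  RTMemPageAlloc returns page-aligned memory, which
 * satisfies the alignment check above.
 *
 * @code
 *      void *pvSeed = RTMemPageAlloc(GMM_CHUNK_SIZE);
 *      if (pvSeed)
 *          int rc2 = GMMR0SeedChunk(pVM, idCpu, (RTR3PTR)pvSeed);
 * @endcode
 */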
4126
4127
4128typedef struct
4129{
4130 PAVLGCPTRNODECORE pNode;
4131 char *pszModuleName;
4132 char *pszVersion;
4133 VBOXOSFAMILY enmGuestOS;
4134} GMMFINDMODULEBYNAME, *PGMMFINDMODULEBYNAME;
4135
4136/**
4137 * Tree enumeration callback for finding identical modules by name and version
4138 */
4139DECLCALLBACK(int) gmmR0CheckForIdenticalModule(PAVLGCPTRNODECORE pNode, void *pvUser)
4140{
4141 PGMMFINDMODULEBYNAME pInfo = (PGMMFINDMODULEBYNAME)pvUser;
4142 PGMMSHAREDMODULE pModule = (PGMMSHAREDMODULE)pNode;
4143
4144 if ( pInfo
4145 && pInfo->enmGuestOS == pModule->enmGuestOS
4146 /** @todo replace with RTStrNCmp */
4147 && !strcmp(pModule->szName, pInfo->pszModuleName)
4148 && !strcmp(pModule->szVersion, pInfo->pszVersion))
4149 {
4150 pInfo->pNode = pNode;
4151 return 1; /* stop search */
4152 }
4153 return 0;
4154}
4155
4156
4157/**
4158 * Registers a new shared module for the VM
4159 *
4160 * @returns VBox status code.
4161 * @param pVM VM handle
4162 * @param idCpu VCPU id
4163 * @param enmGuestOS Guest OS type
4164 * @param pszModuleName Module name
4165 * @param pszVersion Module version
4166 * @param GCBaseAddr Module base address
4167 * @param cbModule Module size
4168 * @param cRegions Number of shared region descriptors
4169 * @param pRegions Shared region(s)
4170 */
4171GMMR0DECL(int) GMMR0RegisterSharedModule(PVM pVM, VMCPUID idCpu, VBOXOSFAMILY enmGuestOS, char *pszModuleName,
4172 char *pszVersion, RTGCPTR GCBaseAddr, uint32_t cbModule,
4173 unsigned cRegions, VMMDEVSHAREDREGIONDESC *pRegions)
4174{
4175#ifdef VBOX_WITH_PAGE_SHARING
4176 /*
4177 * Validate input and get the basics.
4178 */
4179 PGMM pGMM;
4180 GMM_GET_VALID_INSTANCE(pGMM, VERR_GMM_INSTANCE);
4181 PGVM pGVM;
4182 int rc = GVMMR0ByVMAndEMT(pVM, idCpu, &pGVM);
4183 if (RT_FAILURE(rc))
4184 return rc;
4185
4186 Log(("GMMR0RegisterSharedModule %s %s base %RGv size %x\n", pszModuleName, pszVersion, GCBaseAddr, cbModule));
4187
4188 /*
4189 * Take the semaphore and do some more validations.
4190 */
4191 gmmR0MutexAcquire(pGMM);
4192 if (GMM_CHECK_SANITY_UPON_ENTERING(pGMM))
4193 {
4194 bool fNewModule = false;
4195
4196 /* Check if this module is already locally registered. */
4197 PGMMSHAREDMODULEPERVM pRecVM = (PGMMSHAREDMODULEPERVM)RTAvlGCPtrGet(&pGVM->gmm.s.pSharedModuleTree, GCBaseAddr);
4198 if (!pRecVM)
4199 {
4200 pRecVM = (PGMMSHAREDMODULEPERVM)RTMemAllocZ(RT_OFFSETOF(GMMSHAREDMODULEPERVM, aRegions[cRegions]));
4201 if (!pRecVM)
4202 {
4203 AssertFailed();
4204 rc = VERR_NO_MEMORY;
4205 goto end;
4206 }
4207 pRecVM->Core.Key = GCBaseAddr;
4208 pRecVM->cRegions = cRegions;
4209
4210 /* Save the region data as they can differ between VMs (address space scrambling or simply different loading order) */
4211 for (unsigned i = 0; i < cRegions; i++)
4212 {
4213 pRecVM->aRegions[i].GCRegionAddr = pRegions[i].GCRegionAddr;
4214 pRecVM->aRegions[i].cbRegion = RT_ALIGN_T(pRegions[i].cbRegion, PAGE_SIZE, uint32_t);
4215 pRecVM->aRegions[i].u32Alignment = 0;
4216 pRecVM->aRegions[i].paHCPhysPageID = NULL; /* unused */
4217 }
4218
4219 bool fInsert = RTAvlGCPtrInsert(&pGVM->gmm.s.pSharedModuleTree, &pRecVM->Core);
4220 Assert(fInsert); NOREF(fInsert);
4221
4222 Log(("GMMR0RegisterSharedModule: new local module %s\n", pszModuleName));
4223 fNewModule = true;
4224 }
4225 else
4226 rc = VINF_PGM_SHARED_MODULE_ALREADY_REGISTERED;
4227
4228 /* Check if this module is already globally registered. */
4229 PGMMSHAREDMODULE pGlobalModule = (PGMMSHAREDMODULE)RTAvlGCPtrGet(&pGMM->pGlobalSharedModuleTree, GCBaseAddr);
4230 if ( !pGlobalModule
4231 && enmGuestOS == VBOXOSFAMILY_Windows64)
4232 {
4233 /* Two identical copies of e.g. Win7 x64 will typically not have a similar virtual address space layout for dlls or kernel modules.
4234 * Try to find identical binaries based on name and version.
4235 */
4236 GMMFINDMODULEBYNAME Info;
4237
4238 Info.pNode = NULL;
4239 Info.pszVersion = pszVersion;
4240 Info.pszModuleName = pszModuleName;
4241 Info.enmGuestOS = enmGuestOS;
4242
4243 Log(("Try to find identical module %s\n", pszModuleName));
4244 int ret = RTAvlGCPtrDoWithAll(&pGMM->pGlobalSharedModuleTree, true /* fFromLeft */, gmmR0CheckForIdenticalModule, &Info);
4245 if (ret == 1)
4246 {
4247 Assert(Info.pNode);
4248 pGlobalModule = (PGMMSHAREDMODULE)Info.pNode;
4249 Log(("Found identical module at %RGv\n", pGlobalModule->Core.Key));
4250 }
4251 }
4252
4253 if (!pGlobalModule)
4254 {
4255 Assert(fNewModule);
4256 Assert(!pRecVM->fCollision);
4257
4258 pGlobalModule = (PGMMSHAREDMODULE)RTMemAllocZ(RT_OFFSETOF(GMMSHAREDMODULE, aRegions[cRegions]));
4259 if (!pGlobalModule)
4260 {
4261 AssertFailed();
4262 rc = VERR_NO_MEMORY;
4263 goto end;
4264 }
4265
4266 pGlobalModule->Core.Key = GCBaseAddr;
4267 pGlobalModule->cbModule = cbModule;
4268 /* Input limit already safe; no need to check again. */
4269 /** @todo replace with RTStrCopy */
4270 strcpy(pGlobalModule->szName, pszModuleName);
4271 strcpy(pGlobalModule->szVersion, pszVersion);
4272
4273 pGlobalModule->enmGuestOS = enmGuestOS;
4274 pGlobalModule->cRegions = cRegions;
4275
4276 for (unsigned i = 0; i < cRegions; i++)
4277 {
4278 Log(("New region %d base=%RGv size %x\n", i, pRegions[i].GCRegionAddr, pRegions[i].cbRegion));
4279 pGlobalModule->aRegions[i].GCRegionAddr = pRegions[i].GCRegionAddr;
4280 pGlobalModule->aRegions[i].cbRegion = RT_ALIGN_T(pRegions[i].cbRegion, PAGE_SIZE, uint32_t);
4281 pGlobalModule->aRegions[i].u32Alignment = 0;
4282 pGlobalModule->aRegions[i].paHCPhysPageID = NULL; /* uninitialized. */
4283 }
4284
4285 /* Save reference. */
4286 pRecVM->pGlobalModule = pGlobalModule;
4287 pRecVM->fCollision = false;
4288 pGlobalModule->cUsers++;
4289 rc = VINF_SUCCESS;
4290
4291 bool fInsert = RTAvlGCPtrInsert(&pGMM->pGlobalSharedModuleTree, &pGlobalModule->Core);
4292 Assert(fInsert); NOREF(fInsert);
4293
4294 Log(("GMMR0RegisterSharedModule: new global module %s\n", pszModuleName));
4295 }
4296 else
4297 {
4298 Assert(pGlobalModule->cUsers > 0);
4299
4300 /* Make sure the name and version are identical. */
4301 /** @todo replace with RTStrNCmp */
4302 if ( !strcmp(pGlobalModule->szName, pszModuleName)
4303 && !strcmp(pGlobalModule->szVersion, pszVersion))
4304 {
4305 /* Save reference. */
4306 pRecVM->pGlobalModule = pGlobalModule;
4307 if ( fNewModule
4308 || pRecVM->fCollision == true) /* colliding module unregistered and new one registered since the last check */
4309 {
4310 pGlobalModule->cUsers++;
4311 Log(("GMMR0RegisterSharedModule: using existing module %s cUser=%d!\n", pszModuleName, pGlobalModule->cUsers));
4312 }
4313 pRecVM->fCollision = false;
4314 rc = VINF_SUCCESS;
4315 }
4316 else
4317 {
4318 Log(("GMMR0RegisterSharedModule: module %s collision!\n", pszModuleName));
4319 pRecVM->fCollision = true;
4320 rc = VINF_PGM_SHARED_MODULE_COLLISION;
4321 goto end;
4322 }
4323 }
4324
4325 GMM_CHECK_SANITY_UPON_LEAVING(pGMM);
4326 }
4327 else
4328 rc = VERR_GMM_IS_NOT_SANE;
4329
4330end:
4331 gmmR0MutexRelease(pGMM);
4332 return rc;
4333#else
4334
4335 NOREF(pVM); NOREF(idCpu); NOREF(enmGuestOS); NOREF(pszModuleName); NOREF(pszVersion);
4336 NOREF(GCBaseAddr); NOREF(cbModule); NOREF(cRegions); NOREF(pRegions);
4337 return VERR_NOT_IMPLEMENTED;
4338#endif
4339}
4340
4341
4342/**
4343 * VMMR0 request wrapper for GMMR0RegisterSharedModule.
4344 *
4345 * @returns see GMMR0RegisterSharedModule.
4346 * @param pVM Pointer to the shared VM structure.
4347 * @param idCpu VCPU id
4348 * @param pReq The request packet.
4349 */
4350GMMR0DECL(int) GMMR0RegisterSharedModuleReq(PVM pVM, VMCPUID idCpu, PGMMREGISTERSHAREDMODULEREQ pReq)
4351{
4352 /*
4353 * Validate input and pass it on.
4354 */
4355 AssertPtrReturn(pVM, VERR_INVALID_POINTER);
4356 AssertPtrReturn(pReq, VERR_INVALID_POINTER);
4357 AssertMsgReturn(pReq->Hdr.cbReq >= sizeof(*pReq) && pReq->Hdr.cbReq == RT_UOFFSETOF(GMMREGISTERSHAREDMODULEREQ, aRegions[pReq->cRegions]), ("%#x != %#x\n", pReq->Hdr.cbReq, sizeof(*pReq)), VERR_INVALID_PARAMETER);
4358
4359 /* Pass back return code in the request packet to preserve informational codes. (VMMR3CallR0 chokes on them) */
4360 pReq->rc = GMMR0RegisterSharedModule(pVM, idCpu, pReq->enmGuestOS, pReq->szName, pReq->szVersion, pReq->GCBaseAddr, pReq->cbModule, pReq->cRegions, pReq->aRegions);
4361 return VINF_SUCCESS;
4362}
4363
4364
4365/**
4366 * Unregisters a shared module for the VM
4367 *
4368 * @returns VBox status code.
4369 * @param pVM VM handle
4370 * @param idCpu VCPU id
4371 * @param pszModuleName Module name
4372 * @param pszVersion Module version
4373 * @param GCBaseAddr Module base address
4374 * @param cbModule Module size
4375 */
4376GMMR0DECL(int) GMMR0UnregisterSharedModule(PVM pVM, VMCPUID idCpu, char *pszModuleName, char *pszVersion,
4377 RTGCPTR GCBaseAddr, uint32_t cbModule)
4378{
4379#ifdef VBOX_WITH_PAGE_SHARING
4380 /*
4381 * Validate input and get the basics.
4382 */
4383 PGMM pGMM;
4384 GMM_GET_VALID_INSTANCE(pGMM, VERR_GMM_INSTANCE);
4385 PGVM pGVM;
4386 int rc = GVMMR0ByVMAndEMT(pVM, idCpu, &pGVM);
4387 if (RT_FAILURE(rc))
4388 return rc;
4389
4390 Log(("GMMR0UnregisterSharedModule %s %s base=%RGv size %x\n", pszModuleName, pszVersion, GCBaseAddr, cbModule));
4391
4392 /*
4393 * Take the semaphore and do some more validations.
4394 */
4395 gmmR0MutexAcquire(pGMM);
4396 if (GMM_CHECK_SANITY_UPON_ENTERING(pGMM))
4397 {
4398 PGMMSHAREDMODULEPERVM pRecVM = (PGMMSHAREDMODULEPERVM)RTAvlGCPtrGet(&pGVM->gmm.s.pSharedModuleTree, GCBaseAddr);
4399 if (pRecVM)
4400 {
4401 /* Remove reference to global shared module. */
4402 if (!pRecVM->fCollision)
4403 {
4404 PGMMSHAREDMODULE pRec = pRecVM->pGlobalModule;
4405 Assert(pRec);
4406
4407 if (pRec) /* paranoia */
4408 {
4409 Assert(pRec->cUsers);
4410 pRec->cUsers--;
4411 if (pRec->cUsers == 0)
4412 {
4413 /* Free the ranges, but leave the pages intact as there might still be references; they will be cleared by the COW mechanism. */
4414 for (unsigned i = 0; i < pRec->cRegions; i++)
4415 if (pRec->aRegions[i].paHCPhysPageID)
4416 RTMemFree(pRec->aRegions[i].paHCPhysPageID);
4417
4418 Assert(pRec->Core.Key == GCBaseAddr || pRec->enmGuestOS == VBOXOSFAMILY_Windows64);
4419 Assert(pRec->cRegions == pRecVM->cRegions);
4420#ifdef VBOX_STRICT
4421 for (unsigned i = 0; i < pRecVM->cRegions; i++)
4422 {
4423 Assert(pRecVM->aRegions[i].GCRegionAddr == pRec->aRegions[i].GCRegionAddr);
4424 Assert(pRecVM->aRegions[i].cbRegion == pRec->aRegions[i].cbRegion);
4425 }
4426#endif
4427
4428 /* Remove from the tree and free memory. */
4429 RTAvlGCPtrRemove(&pGMM->pGlobalSharedModuleTree, pRec->Core.Key);
4430 RTMemFree(pRec);
4431 }
4432 }
4433 else
4434 rc = VERR_PGM_SHARED_MODULE_REGISTRATION_INCONSISTENCY;
4435 }
4436 else
4437 Assert(!pRecVM->pGlobalModule);
4438
4439 /* Remove from the tree and free memory. */
4440 RTAvlGCPtrRemove(&pGVM->gmm.s.pSharedModuleTree, GCBaseAddr);
4441 RTMemFree(pRecVM);
4442 }
4443 else
4444 rc = VERR_PGM_SHARED_MODULE_NOT_FOUND;
4445
4446 GMM_CHECK_SANITY_UPON_LEAVING(pGMM);
4447 }
4448 else
4449 rc = VERR_GMM_IS_NOT_SANE;
4450
4451 gmmR0MutexRelease(pGMM);
4452 return rc;
4453#else
4454
4455 NOREF(pVM); NOREF(idCpu); NOREF(pszModuleName); NOREF(pszVersion); NOREF(GCBaseAddr); NOREF(cbModule);
4456 return VERR_NOT_IMPLEMENTED;
4457#endif
4458}
4459
4460
4461/**
4462 * VMMR0 request wrapper for GMMR0UnregisterSharedModule.
4463 *
4464 * @returns see GMMR0UnregisterSharedModule.
4465 * @param pVM Pointer to the shared VM structure.
4466 * @param idCpu VCPU id
4467 * @param pReq The request packet.
4468 */
4469GMMR0DECL(int) GMMR0UnregisterSharedModuleReq(PVM pVM, VMCPUID idCpu, PGMMUNREGISTERSHAREDMODULEREQ pReq)
4470{
4471 /*
4472 * Validate input and pass it on.
4473 */
4474 AssertPtrReturn(pVM, VERR_INVALID_POINTER);
4475 AssertPtrReturn(pReq, VERR_INVALID_POINTER);
4476 AssertMsgReturn(pReq->Hdr.cbReq == sizeof(*pReq), ("%#x != %#x\n", pReq->Hdr.cbReq, sizeof(*pReq)), VERR_INVALID_PARAMETER);
4477
4478 return GMMR0UnregisterSharedModule(pVM, idCpu, pReq->szName, pReq->szVersion, pReq->GCBaseAddr, pReq->cbModule);
4479}
4480
4481#ifdef VBOX_WITH_PAGE_SHARING
4482
4483/**
4484 * Increase the use count of a shared page, the page is known to exist and be valid and such.
4485 *
4486 * @param pGMM Pointer to the GMM instance.
4487 * @param pGVM Pointer to the GVM instance.
4488 * @param pPage The page structure.
4489 */
4490DECLINLINE(void) gmmR0UseSharedPage(PGMM pGMM, PGVM pGVM, PGMMPAGE pPage)
4491{
4492 Assert(pGMM->cSharedPages > 0);
4493 Assert(pGMM->cAllocatedPages > 0);
4494
4495 pGMM->cDuplicatePages++;
4496
4497 pPage->Shared.cRefs++;
4498 pGVM->gmm.s.cSharedPages++;
4499 pGVM->gmm.s.Allocated.cBasePages++;
4500}
4501
4502
4503/**
4504 * Converts a private page to a shared page, the page is known to exist and be valid and such.
4505 *
4506 * @param pGMM Pointer to the GMM instance.
4507 * @param pGVM Pointer to the GVM instance.
4508 * @param HCPhys Host physical address
4509 * @param idPage The Page ID
4510 * @param pPage The page structure.
4511 */
4512DECLINLINE(void) gmmR0ConvertToSharedPage(PGMM pGMM, PGVM pGVM, RTHCPHYS HCPhys, uint32_t idPage, PGMMPAGE pPage)
4513{
4514 PGMMCHUNK pChunk = gmmR0GetChunk(pGMM, idPage >> GMM_CHUNKID_SHIFT);
4515 Assert(pChunk);
4516 Assert(pChunk->cFree < GMM_CHUNK_NUM_PAGES);
4517 Assert(GMM_PAGE_IS_PRIVATE(pPage));
4518
4519 pChunk->cPrivate--;
4520 pChunk->cShared++;
4521
4522 pGMM->cSharedPages++;
4523
4524 pGVM->gmm.s.cSharedPages++;
4525 pGVM->gmm.s.cPrivatePages--;
4526
4527 /* Modify the page structure. */
4528 pPage->Shared.pfn = (uint32_t)(uint64_t)(HCPhys >> PAGE_SHIFT);
4529 pPage->Shared.cRefs = 1;
4530 pPage->Common.u2State = GMM_PAGE_STATE_SHARED;
4531}
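
/*
 * For reference, the Shared.pfn field set above stores the host page frame
 * number, so the host physical address can be reconstructed the same way the
 * code below does it:
 *
 * @code
 *      RTHCPHYS HCPhysShared = (RTHCPHYS)pPage->Shared.pfn << PAGE_SHIFT;
 * @endcode
 */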
4532
4533
4534/**
4535 * Checks a page of the specified shared module for changes.
4536 *
4537 * Performs the following tasks:
4538 * - If a shared page is new, then it changes the GMM page type to shared and
4539 * returns it in the pPageDesc descriptor.
4540 * - If a shared page already exists, then it checks if the VM page is
4541 * identical and if so frees the VM page and returns the shared page in
4542 * pPageDesc descriptor.
4543 *
4544 * @remarks ASSUMES the caller has acquired the GMM semaphore!!
4545 *
4546 * @returns VBox status code.
4548 * @param pGVM Pointer to the GVM instance data.
4549 * @param pModule Module description
4550 * @param idxRegion Region index
4551 * @param idxPage Page index
4552 * @param pPageDesc Page descriptor
4553 */
4554GMMR0DECL(int) GMMR0SharedModuleCheckPage(PGVM pGVM, PGMMSHAREDMODULE pModule, unsigned idxRegion, unsigned idxPage,
4555 PGMMSHAREDPAGEDESC pPageDesc)
4556{
4557 int rc = VINF_SUCCESS;
4558 PGMM pGMM;
4559 GMM_GET_VALID_INSTANCE(pGMM, VERR_GMM_INSTANCE);
4560 unsigned cPages = pModule->aRegions[idxRegion].cbRegion >> PAGE_SHIFT;
4561
4562 AssertReturn(idxRegion < pModule->cRegions, VERR_INVALID_PARAMETER);
4563 AssertReturn(idxPage < cPages, VERR_INVALID_PARAMETER);
4564
4565    LogFlow(("GMMR0SharedModuleCheckPage %s base %RGv region %d idxPage %d\n", pModule->szName, pModule->Core.Key, idxRegion, idxPage));
4566
4567 PGMMSHAREDREGIONDESC pGlobalRegion = &pModule->aRegions[idxRegion];
4568 if (!pGlobalRegion->paHCPhysPageID)
4569 {
4570 /* First time; create a page descriptor array. */
4571 Log(("Allocate page descriptor array for %d pages\n", cPages));
4572 pGlobalRegion->paHCPhysPageID = (uint32_t *)RTMemAlloc(cPages * sizeof(*pGlobalRegion->paHCPhysPageID));
4573 if (!pGlobalRegion->paHCPhysPageID)
4574 {
4575 AssertFailed();
4576 rc = VERR_NO_MEMORY;
4577 goto end;
4578 }
4579 /* Invalidate all descriptors. */
4580 for (unsigned i = 0; i < cPages; i++)
4581 pGlobalRegion->paHCPhysPageID[i] = NIL_GMM_PAGEID;
4582 }
4583
4584 /* We've seen this shared page for the first time? */
4585 if (pGlobalRegion->paHCPhysPageID[idxPage] == NIL_GMM_PAGEID)
4586 {
4587new_shared_page:
4588 Log(("New shared page guest %RGp host %RHp\n", pPageDesc->GCPhys, pPageDesc->HCPhys));
4589
4590 /* Easy case: just change the internal page type. */
4591 PGMMPAGE pPage = gmmR0GetPage(pGMM, pPageDesc->uHCPhysPageId);
4592 if (!pPage)
4593 {
4594 Log(("GMMR0SharedModuleCheckPage: Invalid idPage=%#x #1 (GCPhys=%RGp HCPhys=%RHp idxRegion=%#x idxPage=%#x)\n",
4595 pPageDesc->uHCPhysPageId, pPageDesc->GCPhys, pPageDesc->HCPhys, idxRegion, idxPage));
4596 AssertFailed();
4597 rc = VERR_PGM_PHYS_INVALID_PAGE_ID;
4598 goto end;
4599 }
4600
4601        AssertMsg(pPageDesc->GCPhys == (pPage->Private.pfn << 12), ("desc %RGp gmm %RGp\n", pPageDesc->GCPhys, (pPage->Private.pfn << 12)));
4602
4603 gmmR0ConvertToSharedPage(pGMM, pGVM, pPageDesc->HCPhys, pPageDesc->uHCPhysPageId, pPage);
4604
4605 /* Keep track of these references. */
4606 pGlobalRegion->paHCPhysPageID[idxPage] = pPageDesc->uHCPhysPageId;
4607 }
4608 else
4609 {
4610 uint8_t *pbLocalPage, *pbSharedPage;
4611 uint8_t *pbChunk;
4612 PGMMCHUNK pChunk;
4613
4614 Assert(pPageDesc->uHCPhysPageId != pGlobalRegion->paHCPhysPageID[idxPage]);
4615
4616 Log(("Replace existing page guest %RGp host %RHp id %x -> id %x\n", pPageDesc->GCPhys, pPageDesc->HCPhys, pPageDesc->uHCPhysPageId, pGlobalRegion->paHCPhysPageID[idxPage]));
4617
4618 /* Get the shared page source. */
4619 PGMMPAGE pPage = gmmR0GetPage(pGMM, pGlobalRegion->paHCPhysPageID[idxPage]);
4620 if (!pPage)
4621 {
4622 Log(("GMMR0SharedModuleCheckPage: Invalid idPage=%#x #2 (idxRegion=%#x idxPage=%#x)\n",
4623 pPageDesc->uHCPhysPageId, idxRegion, idxPage));
4624 AssertFailed();
4625 rc = VERR_PGM_PHYS_INVALID_PAGE_ID;
4626 goto end;
4627 }
4628 if (pPage->Common.u2State != GMM_PAGE_STATE_SHARED)
4629 {
4630 /* Page was freed at some point; invalidate this entry. */
4631 /** @todo this isn't really bullet proof. */
4632 Log(("Old shared page was freed -> create a new one\n"));
4633 pGlobalRegion->paHCPhysPageID[idxPage] = NIL_GMM_PAGEID;
4634 goto new_shared_page; /* ugly goto */
4635 }
4636
4637        Log(("Replace existing page: host %RHp -> %RHp\n", pPageDesc->HCPhys, ((uint64_t)pPage->Shared.pfn) << PAGE_SHIFT));
4638
4639 /* Calculate the virtual address of the local page. */
4640 pChunk = gmmR0GetChunk(pGMM, pPageDesc->uHCPhysPageId >> GMM_CHUNKID_SHIFT);
4641 if (pChunk)
4642 {
4643 if (!gmmR0IsChunkMapped(pGMM, pGVM, pChunk, (PRTR3PTR)&pbChunk))
4644 {
4645 Log(("GMMR0SharedModuleCheckPage: Invalid idPage=%#x #3\n", pPageDesc->uHCPhysPageId));
4646 AssertFailed();
4647 rc = VERR_PGM_PHYS_INVALID_PAGE_ID;
4648 goto end;
4649 }
4650 pbLocalPage = pbChunk + ((pPageDesc->uHCPhysPageId & GMM_PAGEID_IDX_MASK) << PAGE_SHIFT);
4651 }
4652 else
4653 {
4654 Log(("GMMR0SharedModuleCheckPage: Invalid idPage=%#x #4\n", pPageDesc->uHCPhysPageId));
4655 AssertFailed();
4656 rc = VERR_PGM_PHYS_INVALID_PAGE_ID;
4657 goto end;
4658 }
4659
4660 /* Calculate the virtual address of the shared page. */
4661 pChunk = gmmR0GetChunk(pGMM, pGlobalRegion->paHCPhysPageID[idxPage] >> GMM_CHUNKID_SHIFT);
4662 Assert(pChunk); /* can't fail as gmmR0GetPage succeeded. */
4663
4664 /* Get the virtual address of the physical page; map the chunk into the VM process if not already done. */
4665 if (!gmmR0IsChunkMapped(pGMM, pGVM, pChunk, (PRTR3PTR)&pbChunk))
4666 {
4667 Log(("Map chunk into process!\n"));
4668 rc = gmmR0MapChunk(pGMM, pGVM, pChunk, false /*fRelaxedSem*/, (PRTR3PTR)&pbChunk);
4669 if (rc != VINF_SUCCESS)
4670 {
4671 AssertRC(rc);
4672 goto end;
4673 }
4674 }
4675 pbSharedPage = pbChunk + ((pGlobalRegion->paHCPhysPageID[idxPage] & GMM_PAGEID_IDX_MASK) << PAGE_SHIFT);
4676
4677 /** @todo write ASMMemComparePage. */
4678 if (memcmp(pbSharedPage, pbLocalPage, PAGE_SIZE))
4679 {
4680 Log(("Unexpected differences found between local and shared page; skip\n"));
4681 /* Signal to the caller that this one hasn't changed. */
4682 pPageDesc->uHCPhysPageId = NIL_GMM_PAGEID;
4683 goto end;
4684 }
4685
4686 /* Free the old local page. */
4687 GMMFREEPAGEDESC PageDesc;
4688
4689 PageDesc.idPage = pPageDesc->uHCPhysPageId;
4690 rc = gmmR0FreePages(pGMM, pGVM, 1, &PageDesc, GMMACCOUNT_BASE);
4691 AssertRCReturn(rc, rc);
4692
4693 gmmR0UseSharedPage(pGMM, pGVM, pPage);
4694
4695 /* Pass along the new physical address & page id. */
4696 pPageDesc->HCPhys = ((uint64_t)pPage->Shared.pfn) << PAGE_SHIFT;
4697 pPageDesc->uHCPhysPageId = pGlobalRegion->paHCPhysPageID[idxPage];
4698 }
4699end:
4700 return rc;
4701}
4702
4703
4704/**
4705 * RTAvlGCPtrDestroy callback.
4706 *
4707 * @returns 0 or VERR_GMM_INSTANCE.
4708 * @param pNode The node to destroy.
4709 * @param pvGVM The GVM handle.
4710 */
4711static DECLCALLBACK(int) gmmR0CleanupSharedModule(PAVLGCPTRNODECORE pNode, void *pvGVM)
4712{
4713 PGMMSHAREDMODULEPERVM pRecVM = (PGMMSHAREDMODULEPERVM)pNode;
4714 NOREF(pvGVM);
4715
4716 Assert(pRecVM->pGlobalModule || pRecVM->fCollision);
4717 if (pRecVM->pGlobalModule)
4718 {
4719 PGMMSHAREDMODULE pRec = pRecVM->pGlobalModule;
4720 AssertPtr(pRec);
4721 Assert(pRec->cUsers);
4722
4723 Log(("gmmR0CleanupSharedModule: %s %s cUsers=%d\n", pRec->szName, pRec->szVersion, pRec->cUsers));
4724 pRec->cUsers--;
4725 if (pRec->cUsers == 0)
4726 {
4727 for (uint32_t i = 0; i < pRec->cRegions; i++)
4728 if (pRec->aRegions[i].paHCPhysPageID)
4729 RTMemFree(pRec->aRegions[i].paHCPhysPageID);
4730
4731 /* Remove from the tree and free memory. */
4732 PGMM pGMM;
4733 GMM_GET_VALID_INSTANCE(pGMM, VERR_GMM_INSTANCE);
4734 RTAvlGCPtrRemove(&pGMM->pGlobalSharedModuleTree, pRec->Core.Key);
4735 RTMemFree(pRec);
4736 }
4737 }
4738 RTMemFree(pRecVM);
4739 return 0;
4740}
4741
4742
4743/**
4744 * Used by GMMR0CleanupVM to clean up shared modules.
4745 *
4746 * This is called without taking the GMM lock so that it can be yielded as
4747 * needed here.
4748 *
4749 * @param pGMM The GMM handle.
4750 * @param pGVM The global VM handle.
4751 */
4752static void gmmR0SharedModuleCleanup(PGMM pGMM, PGVM pGVM)
4753{
4754 gmmR0MutexAcquire(pGMM);
4755 GMM_CHECK_SANITY_UPON_ENTERING(pGMM);
4756
4757 RTAvlGCPtrDestroy(&pGVM->gmm.s.pSharedModuleTree, gmmR0CleanupSharedModule, pGVM);
4758
4759 gmmR0MutexRelease(pGMM);
4760}
4761
4762#endif /* VBOX_WITH_PAGE_SHARING */
4763
4764/**
4765 * Removes all shared modules for the specified VM
4766 *
4767 * @returns VBox status code.
4768 * @param pVM VM handle
4769 * @param idCpu VCPU id
4770 */
4771GMMR0DECL(int) GMMR0ResetSharedModules(PVM pVM, VMCPUID idCpu)
4772{
4773#ifdef VBOX_WITH_PAGE_SHARING
4774 /*
4775 * Validate input and get the basics.
4776 */
4777 PGMM pGMM;
4778 GMM_GET_VALID_INSTANCE(pGMM, VERR_GMM_INSTANCE);
4779 PGVM pGVM;
4780 int rc = GVMMR0ByVMAndEMT(pVM, idCpu, &pGVM);
4781 if (RT_FAILURE(rc))
4782 return rc;
4783
4784 /*
4785 * Take the semaphore and do some more validations.
4786 */
4787 gmmR0MutexAcquire(pGMM);
4788 if (GMM_CHECK_SANITY_UPON_ENTERING(pGMM))
4789 {
4790 Log(("GMMR0ResetSharedModules\n"));
4791 RTAvlGCPtrDestroy(&pGVM->gmm.s.pSharedModuleTree, gmmR0CleanupSharedModule, pGVM);
4792
4793 rc = VINF_SUCCESS;
4794 GMM_CHECK_SANITY_UPON_LEAVING(pGMM);
4795 }
4796 else
4797 rc = VERR_GMM_IS_NOT_SANE;
4798
4799 gmmR0MutexRelease(pGMM);
4800 return rc;
4801#else
4802 NOREF(pVM); NOREF(idCpu);
4803 return VERR_NOT_IMPLEMENTED;
4804#endif
4805}
4806
4807#ifdef VBOX_WITH_PAGE_SHARING
4808
4809typedef struct
4810{
4811 PGVM pGVM;
4812 VMCPUID idCpu;
4813 int rc;
4814} GMMCHECKSHAREDMODULEINFO, *PGMMCHECKSHAREDMODULEINFO;
4815
4816/**
4817 * Tree enumeration callback for checking a shared module.
4818 */
4819DECLCALLBACK(int) gmmR0CheckSharedModule(PAVLGCPTRNODECORE pNode, void *pvUser)
4820{
4821 PGMMCHECKSHAREDMODULEINFO pInfo = (PGMMCHECKSHAREDMODULEINFO)pvUser;
4822 PGMMSHAREDMODULEPERVM pLocalModule = (PGMMSHAREDMODULEPERVM)pNode;
4823 PGMMSHAREDMODULE pGlobalModule = pLocalModule->pGlobalModule;
4824
4825 if ( !pLocalModule->fCollision
4826 && pGlobalModule)
4827 {
4828 Log(("gmmR0CheckSharedModule: check %s %s base=%RGv size=%x collision=%d\n", pGlobalModule->szName, pGlobalModule->szVersion, pGlobalModule->Core.Key, pGlobalModule->cbModule, pLocalModule->fCollision));
4829 pInfo->rc = PGMR0SharedModuleCheck(pInfo->pGVM->pVM, pInfo->pGVM, pInfo->idCpu, pGlobalModule, pLocalModule->cRegions, pLocalModule->aRegions);
4830 if (RT_FAILURE(pInfo->rc))
4831 return 1; /* stop enumeration. */
4832 }
4833 return 0;
4834}
4835
4836#endif /* VBOX_WITH_PAGE_SHARING */
4837#ifdef DEBUG_sandervl
4838
4839/**
4840 * Setup for a GMMR0CheckSharedModules call (to allow log flush jumps back to ring 3)
4841 *
4842 * @returns VBox status code.
4843 * @param pVM VM handle
4844 */
4845GMMR0DECL(int) GMMR0CheckSharedModulesStart(PVM pVM)
4846{
4847 /*
4848 * Validate input and get the basics.
4849 */
4850 PGMM pGMM;
4851 GMM_GET_VALID_INSTANCE(pGMM, VERR_GMM_INSTANCE);
4852
4853 /*
4854 * Take the semaphore and do some more validations.
4855 */
4856 gmmR0MutexAcquire(pGMM);
    int rc;
4857    if (!GMM_CHECK_SANITY_UPON_ENTERING(pGMM))
4858        rc = VERR_GMM_IS_NOT_SANE;
4859    else
4860        rc = VINF_SUCCESS;
4861
4862 return rc;
4863}
4864
4865/**
4866 * Clean up after a GMMR0CheckSharedModules call (to allow log flush jumps back to ring 3)
4867 *
4868 * @returns VBox status code.
4869 * @param pVM VM handle
4870 */
4871GMMR0DECL(int) GMMR0CheckSharedModulesEnd(PVM pVM)
4872{
4873 /*
4874 * Validate input and get the basics.
4875 */
4876 PGMM pGMM;
4877 GMM_GET_VALID_INSTANCE(pGMM, VERR_GMM_INSTANCE);
4878
4879 gmmR0MutexRelease(pGMM);
4880 return VINF_SUCCESS;
4881}
4882
4883#endif /* DEBUG_sandervl */
4884
4885/**
4886 * Check all shared modules for the specified VM
4887 *
4888 * @returns VBox status code.
4889 * @param pVM VM handle
4890 * @param pVCpu VMCPU handle
4891 */
4892GMMR0DECL(int) GMMR0CheckSharedModules(PVM pVM, PVMCPU pVCpu)
4893{
4894#ifdef VBOX_WITH_PAGE_SHARING
4895 /*
4896 * Validate input and get the basics.
4897 */
4898 PGMM pGMM;
4899 GMM_GET_VALID_INSTANCE(pGMM, VERR_GMM_INSTANCE);
4900 PGVM pGVM;
4901 int rc = GVMMR0ByVMAndEMT(pVM, pVCpu->idCpu, &pGVM);
4902 if (RT_FAILURE(rc))
4903 return rc;
4904
4905# ifndef DEBUG_sandervl
4906 /*
4907 * Take the semaphore and do some more validations.
4908 */
4909 gmmR0MutexAcquire(pGMM);
4910# endif
4911 if (GMM_CHECK_SANITY_UPON_ENTERING(pGMM))
4912 {
4913 GMMCHECKSHAREDMODULEINFO Info;
4914
4915 Log(("GMMR0CheckSharedModules\n"));
4916 Info.pGVM = pGVM;
4917 Info.idCpu = pVCpu->idCpu;
4918 Info.rc = VINF_SUCCESS;
4919
4920 RTAvlGCPtrDoWithAll(&pGVM->gmm.s.pSharedModuleTree, true /* fFromLeft */, gmmR0CheckSharedModule, &Info);
4921
4922 rc = Info.rc;
4923
4924 Log(("GMMR0CheckSharedModules done!\n"));
4925
4926 GMM_CHECK_SANITY_UPON_LEAVING(pGMM);
4927 }
4928 else
4929 rc = VERR_GMM_IS_NOT_SANE;
4930
4931# ifndef DEBUG_sandervl
4932 gmmR0MutexRelease(pGMM);
4933# endif
4934 return rc;
4935#else
4936 NOREF(pVM); NOREF(pVCpu);
4937 return VERR_NOT_IMPLEMENTED;
4938#endif
4939}
4940
4941#if defined(VBOX_STRICT) && HC_ARCH_BITS == 64
4942
4943typedef struct
4944{
4945 PGVM pGVM;
4946 PGMM pGMM;
4947 uint8_t *pSourcePage;
4948 bool fFoundDuplicate;
4949} GMMFINDDUPPAGEINFO, *PGMMFINDDUPPAGEINFO;
4950
4951/**
4952 * RTAvlU32DoWithAll callback.
4953 *
4954 * @returns 0 to continue the enumeration, non-zero (true) to stop it.
4955 * @param pNode The node to search.
4956 * @param pvInfo Pointer to the input parameters
4957 */
4958static DECLCALLBACK(int) gmmR0FindDupPageInChunk(PAVLU32NODECORE pNode, void *pvInfo)
4959{
4960 PGMMCHUNK pChunk = (PGMMCHUNK)pNode;
4961 PGMMFINDDUPPAGEINFO pInfo = (PGMMFINDDUPPAGEINFO)pvInfo;
4962 PGVM pGVM = pInfo->pGVM;
4963 PGMM pGMM = pInfo->pGMM;
4964 uint8_t *pbChunk;
4965
4966 /* Only take chunks not mapped into this VM process; not entirely correct. */
4967 if (!gmmR0IsChunkMapped(pGMM, pGVM, pChunk, (PRTR3PTR)&pbChunk))
4968 {
4969 int rc = gmmR0MapChunk(pGMM, pGVM, pChunk, false /*fRelaxedSem*/, (PRTR3PTR)&pbChunk);
4970 if (RT_SUCCESS(rc))
4971 {
4972 /*
4973 * Look for duplicate pages
4974 */
4975 unsigned iPage = (GMM_CHUNK_SIZE >> PAGE_SHIFT);
4976 while (iPage-- > 0)
4977 {
4978 if (GMM_PAGE_IS_PRIVATE(&pChunk->aPages[iPage]))
4979 {
4980 uint8_t *pbDestPage = pbChunk + (iPage << PAGE_SHIFT);
4981
4982 if (!memcmp(pInfo->pSourcePage, pbDestPage, PAGE_SIZE))
4983 {
4984 pInfo->fFoundDuplicate = true;
4985 break;
4986 }
4987 }
4988 }
4989 gmmR0UnmapChunk(pGMM, pGVM, pChunk, false /*fRelaxedSem*/);
4990 }
4991 }
4992 return pInfo->fFoundDuplicate; /* (stops search if true) */
4993}
4994
4995
4996/**
4997 * Find a duplicate of the specified page in other active VMs
4998 *
4999 * @returns VBox status code.
5000 * @param pVM VM handle
5001 * @param pReq Request packet
5002 */
5003GMMR0DECL(int) GMMR0FindDuplicatePageReq(PVM pVM, PGMMFINDDUPLICATEPAGEREQ pReq)
5004{
5005 /*
5006 * Validate input and pass it on.
5007 */
5008 AssertPtrReturn(pVM, VERR_INVALID_POINTER);
5009 AssertPtrReturn(pReq, VERR_INVALID_POINTER);
5010 AssertMsgReturn(pReq->Hdr.cbReq == sizeof(*pReq), ("%#x != %#x\n", pReq->Hdr.cbReq, sizeof(*pReq)), VERR_INVALID_PARAMETER);
5011
5012 PGMM pGMM;
5013 GMM_GET_VALID_INSTANCE(pGMM, VERR_GMM_INSTANCE);
5014
5015 PGVM pGVM;
5016 int rc = GVMMR0ByVM(pVM, &pGVM);
5017 if (RT_FAILURE(rc))
5018 return rc;
5019
5020 /*
5021 * Take the semaphore and do some more validations.
5022 */
5023 rc = gmmR0MutexAcquire(pGMM);
5024 if (GMM_CHECK_SANITY_UPON_ENTERING(pGMM))
5025 {
5026 uint8_t *pbChunk;
5027 PGMMCHUNK pChunk = gmmR0GetChunk(pGMM, pReq->idPage >> GMM_CHUNKID_SHIFT);
5028 if (pChunk)
5029 {
5030 if (gmmR0IsChunkMapped(pGMM, pGVM, pChunk, (PRTR3PTR)&pbChunk))
5031 {
5032 uint8_t *pbSourcePage = pbChunk + ((pReq->idPage & GMM_PAGEID_IDX_MASK) << PAGE_SHIFT);
5033 PGMMPAGE pPage = gmmR0GetPage(pGMM, pReq->idPage);
5034 if (pPage)
5035 {
5036 GMMFINDDUPPAGEINFO Info;
5037 Info.pGVM = pGVM;
5038 Info.pGMM = pGMM;
5039 Info.pSourcePage = pbSourcePage;
5040 Info.fFoundDuplicate = false;
5041 RTAvlU32DoWithAll(&pGMM->pChunks, true /* fFromLeft */, gmmR0FindDupPageInChunk, &Info);
5042
5043 pReq->fDuplicate = Info.fFoundDuplicate;
5044 }
5045 else
5046 {
5047 AssertFailed();
5048 rc = VERR_PGM_PHYS_INVALID_PAGE_ID;
5049 }
5050 }
5051 else
5052 AssertFailed();
5053 }
5054 else
5055 AssertFailed();
5056 }
5057 else
5058 rc = VERR_GMM_IS_NOT_SANE;
5059
5060 gmmR0MutexRelease(pGMM);
5061 return rc;
5062}
5063
5064#endif /* VBOX_STRICT && HC_ARCH_BITS == 64 */
5065