VirtualBox

source: vbox/trunk/src/VBox/VMM/VMMR0/GMMR0.cpp@92339

Last change on this file since 92339 was 92339, checked in by vboxsync, 3 years ago

VMM/GMM: Optimized GMMR0AllocateLargePage a little by making gmmR0RegisterChunk mark the chunk allocated, eliminating 512 calls to gmmR0AllocatePage. bugref:10093

1/* $Id: GMMR0.cpp 92339 2021-11-10 21:21:20Z vboxsync $ */
2/** @file
3 * GMM - Global Memory Manager.
4 */
5
6/*
7 * Copyright (C) 2007-2020 Oracle Corporation
8 *
9 * This file is part of VirtualBox Open Source Edition (OSE), as
10 * available from http://www.virtualbox.org. This file is free software;
11 * you can redistribute it and/or modify it under the terms of the GNU
12 * General Public License (GPL) as published by the Free Software
13 * Foundation, in version 2 as it comes in the "COPYING" file of the
14 * VirtualBox OSE distribution. VirtualBox OSE is distributed in the
15 * hope that it will be useful, but WITHOUT ANY WARRANTY of any kind.
16 */
17
18
19/** @page pg_gmm GMM - The Global Memory Manager
20 *
21 * As the name indicates, this component is responsible for global memory
22 * management. Currently only guest RAM is allocated from the GMM, but this
23 * may change to include shadow page tables and other bits later.
24 *
25 * Guest RAM is managed as individual pages, but allocated from the host OS
26 * in chunks for reasons of portability / efficiency. To minimize the memory
27 * footprint all tracking structures must be as small as possible without
28 * unnecessary performance penalties.
29 *
30 * The allocation chunks have a fixed size, defined at compile time
31 * by the #GMM_CHUNK_SIZE \#define.
32 *
33 * Each chunk is given a unique ID. Each page also has a unique ID. The
34 * relationship between the two IDs is:
35 * @code
36 * GMM_CHUNK_SHIFT = log2(GMM_CHUNK_SIZE / PAGE_SIZE);
37 * idPage = (idChunk << GMM_CHUNK_SHIFT) | iPage;
38 * @endcode
39 * Where iPage is the index of the page within the chunk. This ID scheme
40 * permits efficient chunk and page lookup, but it relies on the chunk size
41 * being set at compile time. The chunks are organized in an AVL tree with their
42 * IDs being the keys.
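 *
 * For illustration (a sketch only, not the exact helpers used further down),
 * the reverse lookup is simply the inverse split:
 * @code
 * idChunk = idPage >> GMM_CHUNK_SHIFT;
 * iPage   = idPage & (GMM_CHUNK_NUM_PAGES - 1);
 * pChunk  = (PGMMCHUNK)RTAvlU32Get(&pGMM->pChunks, idChunk); // AVL keyed on chunk IDs
 * pPage   = &pChunk->aPages[iPage];
 * @endcode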
43 *
44 * The physical address of each page in an allocation chunk is maintained by
45 * the #RTR0MEMOBJ and obtained using #RTR0MemObjGetPagePhysAddr. There is no
46 * need to duplicate this information (it would cost 8 bytes per page if we did).
47 *
48 * So what do we need to track per page? Most importantly we need to know
49 * which state the page is in:
50 * - Private - Allocated for (eventually) backing one particular VM page.
51 * - Shared - Read-only page that is used by one or more VMs and treated
52 * as COW by PGM.
53 * - Free - Not used by anyone.
54 *
55 * For the page replacement operations (sharing, defragmenting and freeing)
56 * to be somewhat efficient, private pages need to be associated with a
57 * particular page in a particular VM.
58 *
59 * Tracking the usage of shared pages is impractical and expensive, so we'll
60 * settle for a reference counting system instead.
61 *
62 * Free pages will be chained on LIFOs.
63 *
64 * On 64-bit systems we will use a 64-bit bitfield per page, while on 32-bit
65 * systems a 32-bit bitfield will have to suffice because of address space
66 * limitations. The #GMMPAGE structure shows the details.
67 *
68 *
69 * @section sec_gmm_alloc_strat Page Allocation Strategy
70 *
71 * The strategy for allocating pages has to take fragmentation and shared
72 * pages into account, or we may end up with 2000 chunks with only
73 * a few pages in each. Shared pages cannot easily be reallocated because
74 * of the inaccurate usage accounting (see above). Private pages can be
75 * reallocated by a defragmentation thread in the same manner that sharing
76 * is done.
77 *
78 * The first approach is to manage the free pages in two sets depending on
79 * whether they are mainly for the allocation of shared or private pages.
80 * In the initial implementation there will be almost no possibility for
81 * mixing shared and private pages in the same chunk (only if we're really
82 * stressed on memory), but when we implement forking of VMs and have to
83 * deal with lots of COW pages it'll start getting kind of interesting.
84 *
85 * The sets are lists of chunks with approximately the same number of
86 * free pages. Say the chunk size is 1MB, meaning 256 pages, and a set
87 * consists of 16 lists. So, the first list will contain the chunks with
88 * 1-7 free pages, the second covers 8-15, and so on. The chunks will be
89 * moved between the lists as pages are freed up or allocated.
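 *
 * A sketch of the bucketing described above (the variable names here are
 * illustrative only; the actual list-selection helper further down in this
 * file may use different bucket boundaries):
 * @code
 * iList = cFreePagesInChunk >> 3;     // 1-7 -> 0, 8-15 -> 1, ...
 * if (iList >= cLists)
 *     iList = cLists - 1;             // clamp everything else to the last list
 * @endcode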
90 *
91 *
92 * @section sec_gmm_costs Costs
93 *
94 * The per page cost in kernel space is 32 bits plus whatever RTR0MEMOBJ
95 * entails. In addition there is the chunk cost of approximately
96 * (sizeof(RTR0MEMOBJ) + sizeof(CHUNK)) / 2^CHUNK_SHIFT bytes per page.
97 *
98 * On Windows the per page #RTR0MEMOBJ cost is 32 bits on 32-bit Windows
99 * and 64 bits on 64-bit Windows (a PFN_NUMBER in the MDL). So, 64 bits per page.
100 * The cost on Linux is identical, but here it's because of sizeof(struct page *).
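 *
 * As a rough rule of thumb (assuming the current 2MB chunk size and 4KB
 * pages, i.e. 512 pages per chunk): every 512 bytes of per-chunk bookkeeping
 * adds just one extra byte to the per page cost.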
101 *
102 *
103 * @section sec_gmm_legacy Legacy Mode for Non-Tier-1 Platforms
104 *
105 * In legacy mode the page source is locked user pages and not
106 * #RTR0MemObjAllocPhysNC; this means that a page can only be allocated
107 * by the VM that locked it. We will make no attempt at implementing
108 * page sharing on these systems, just do enough to make it all work.
109 *
110 * @note With 6.1 really dropping 32-bit support, the legacy mode is obsoleted
111 * under the assumption that there is sufficient kernel virtual address
112 * space to map all of the guest memory allocations. So, we'll be using
113 * #RTR0MemObjAllocPage on some platforms as an alternative to
114 * #RTR0MemObjAllocPhysNC.
115 *
116 *
117 * @subsection sub_gmm_locking Serializing
118 *
119 * One simple fast mutex will be employed in the initial implementation, not
120 * two as mentioned in @ref sec_pgmPhys_Serializing.
121 *
122 * @see @ref sec_pgmPhys_Serializing
123 *
124 *
125 * @section sec_gmm_overcommit Memory Over-Commitment Management
126 *
127 * The GVM will have to do the system wide memory over-commitment
128 * management. My current ideas are:
129 * - Per VM over-commit policy that indicates how much to initially commit
130 * to it and what to do in an out-of-memory situation.
131 * - Prevent overtaxing the host.
132 *
133 * There are some challenges here; the main ones are configurability and
134 * security. Should we for instance permit anyone to request 100% memory
135 * commitment? Who should be allowed to do runtime adjustments of the
136 * config? And how do we prevent these settings from being lost when the last
137 * VM process exits? The solution is probably to have an optional root
138 * daemon that will keep VMMR0.r0 in memory and enable the security measures.
139 *
140 *
141 *
142 * @section sec_gmm_numa NUMA
143 *
144 * NUMA considerations will be designed and implemented a bit later.
145 *
146 * The preliminary guess is that we will have to try to allocate memory as
147 * close as possible to the CPUs the VM is executed on (EMT and additional CPU
148 * threads), which means it's mostly about allocation and sharing policies.
149 * Both the scheduler and the allocator interface will have to supply some NUMA
150 * info and we'll need a way to calculate access costs.
151 *
152 */
153
154
155/*********************************************************************************************************************************
156* Header Files *
157*********************************************************************************************************************************/
158#define LOG_GROUP LOG_GROUP_GMM
159#include <VBox/rawpci.h>
160#include <VBox/vmm/gmm.h>
161#include "GMMR0Internal.h"
162#include <VBox/vmm/vmcc.h>
163#include <VBox/vmm/pgm.h>
164#include <VBox/log.h>
165#include <VBox/param.h>
166#include <VBox/err.h>
167#include <VBox/VMMDev.h>
168#include <iprt/asm.h>
169#include <iprt/avl.h>
170#ifdef VBOX_STRICT
171# include <iprt/crc.h>
172#endif
173#include <iprt/critsect.h>
174#include <iprt/list.h>
175#include <iprt/mem.h>
176#include <iprt/memobj.h>
177#include <iprt/mp.h>
178#include <iprt/semaphore.h>
179#include <iprt/spinlock.h>
180#include <iprt/string.h>
181#include <iprt/time.h>
182
183/* This is 64-bit only code now. */
184#if HC_ARCH_BITS != 64 || ARCH_BITS != 64
185# error "This is 64-bit only code"
186#endif
187
188
189/*********************************************************************************************************************************
190* Defined Constants And Macros *
191*********************************************************************************************************************************/
192/** @def VBOX_USE_CRIT_SECT_FOR_GIANT
193 * Use a critical section instead of a fast mutex for the giant GMM lock.
194 *
195 * @remarks This is primarily a way of avoiding the deadlock checks in the
196 * Windows driver verifier. */
197#if defined(RT_OS_WINDOWS) || defined(RT_OS_DARWIN) || defined(DOXYGEN_RUNNING)
198# define VBOX_USE_CRIT_SECT_FOR_GIANT
199#endif
200
201
202/*********************************************************************************************************************************
203* Structures and Typedefs *
204*********************************************************************************************************************************/
205/** Pointer to set of free chunks. */
206typedef struct GMMCHUNKFREESET *PGMMCHUNKFREESET;
207
208/**
209 * The per-page tracking structure employed by the GMM.
210 *
211 * Because of the different layout on 32-bit and 64-bit hosts in earlier
212 * versions of the code, macros are used to get and set some of the data.
213 */
214typedef union GMMPAGE
215{
216 /** Unsigned integer view. */
217 uint64_t u;
218
219 /** The common view. */
220 struct GMMPAGECOMMON
221 {
222 uint32_t uStuff1 : 32;
223 uint32_t uStuff2 : 30;
224 /** The page state. */
225 uint32_t u2State : 2;
226 } Common;
227
228 /** The view of a private page. */
229 struct GMMPAGEPRIVATE
230 {
231 /** The guest page frame number. (Max addressable: 2 ^ 44 - 16) */
232 uint32_t pfn;
233 /** The GVM handle. (64K VMs) */
234 uint32_t hGVM : 16;
235 /** Reserved. */
236 uint32_t u16Reserved : 14;
237 /** The page state. */
238 uint32_t u2State : 2;
239 } Private;
240
241 /** The view of a shared page. */
242 struct GMMPAGESHARED
243 {
244 /** The host page frame number. (Max addressable: 2 ^ 44 - 16) */
245 uint32_t pfn;
246 /** The reference count (64K VMs). */
247 uint32_t cRefs : 16;
248 /** Used for debug checksumming. */
249 uint32_t u14Checksum : 14;
250 /** The page state. */
251 uint32_t u2State : 2;
252 } Shared;
253
254 /** The view of a free page. */
255 struct GMMPAGEFREE
256 {
257 /** The index of the next page in the free list. UINT16_MAX is NIL. */
258 uint16_t iNext;
259 /** Reserved. Checksum or something? */
260 uint16_t u16Reserved0;
261 /** Reserved. Checksum or something? */
262 uint32_t u30Reserved1 : 29;
263 /** Set if the page was zeroed. */
264 uint32_t fZeroed : 1;
265 /** The page state. */
266 uint32_t u2State : 2;
267 } Free;
268} GMMPAGE;
269AssertCompileSize(GMMPAGE, sizeof(RTHCUINTPTR));
270/** Pointer to a GMMPAGE. */
271typedef GMMPAGE *PGMMPAGE;
272
273
274/** @name The Page States.
275 * @{ */
276/** A private page. */
277#define GMM_PAGE_STATE_PRIVATE 0
278/** A shared page. */
279#define GMM_PAGE_STATE_SHARED 2
280/** A free page. */
281#define GMM_PAGE_STATE_FREE 3
282/** @} */
283
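/*
 * Illustration of how the GMMPAGE views are used (this mirrors the VM cleanup
 * code later in this file): turning a private page back into a free one and
 * pushing it onto the chunk's free LIFO.
 *
 *     pPage->u            = 0;                    // wipe the private view
 *     pPage->Free.u2State = GMM_PAGE_STATE_FREE;
 *     pPage->Free.fZeroed = false;
 *     pPage->Free.iNext   = pChunk->iFreeHead;    // link into the LIFO
 *     pChunk->iFreeHead   = iPage;
 */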
284
285/** @def GMM_PAGE_IS_PRIVATE
286 *
287 * @returns true if private, false if not.
288 * @param pPage The GMM page.
289 */
290#define GMM_PAGE_IS_PRIVATE(pPage) ( (pPage)->Common.u2State == GMM_PAGE_STATE_PRIVATE )
291
292/** @def GMM_PAGE_IS_SHARED
293 *
294 * @returns true if shared, false if not.
295 * @param pPage The GMM page.
296 */
297#define GMM_PAGE_IS_SHARED(pPage) ( (pPage)->Common.u2State == GMM_PAGE_STATE_SHARED )
298
299/** @def GMM_PAGE_IS_FREE
300 *
301 * @returns true if free, false if not.
302 * @param pPage The GMM page.
303 */
304#define GMM_PAGE_IS_FREE(pPage) ( (pPage)->Common.u2State == GMM_PAGE_STATE_FREE )
305
306/** @def GMM_PAGE_PFN_LAST
307 * The last valid guest pfn range.
308 * @remark Some of the values outside the range have special meaning,
309 * see GMM_PAGE_PFN_UNSHAREABLE.
310 */
311#define GMM_PAGE_PFN_LAST UINT32_C(0xfffffff0)
312AssertCompile(GMM_PAGE_PFN_LAST == (GMM_GCPHYS_LAST >> PAGE_SHIFT));
313
314/** @def GMM_PAGE_PFN_UNSHAREABLE
315 * Indicates that this page isn't used for normal guest memory and thus isn't shareable.
316 */
317#define GMM_PAGE_PFN_UNSHAREABLE UINT32_C(0xfffffff1)
318AssertCompile(GMM_PAGE_PFN_UNSHAREABLE == (GMM_GCPHYS_UNSHAREABLE >> PAGE_SHIFT));
319
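/*
 * Encoding sketch (illustrative only; the allocation code later in the file
 * covers more cases than shown here): storing the guest PFN, or the
 * unshareable marker, in a private page entry.
 *
 *     pPage->Private.pfn = GCPhys != GMM_GCPHYS_UNSHAREABLE
 *                        ? (uint32_t)(GCPhys >> PAGE_SHIFT)   // regular guest RAM page
 *                        : GMM_PAGE_PFN_UNSHAREABLE;          // not normal guest memory
 */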
320
321/**
322 * A GMM allocation chunk ring-3 mapping record.
323 *
324 * This should really be associated with a session and not a VM, but
325 * it's simpler to associate it with a VM and clean up when the VM object
326 * is destroyed.
327 */
328typedef struct GMMCHUNKMAP
329{
330 /** The mapping object. */
331 RTR0MEMOBJ hMapObj;
332 /** The VM owning the mapping. */
333 PGVM pGVM;
334} GMMCHUNKMAP;
335/** Pointer to a GMM allocation chunk mapping. */
336typedef struct GMMCHUNKMAP *PGMMCHUNKMAP;
337
338
339/**
340 * A GMM allocation chunk.
341 */
342typedef struct GMMCHUNK
343{
344 /** The AVL node core.
345 * The Key is the chunk ID. (Giant mtx.) */
346 AVLU32NODECORE Core;
347 /** The memory object.
348 * Either from RTR0MemObjAllocPhysNC or RTR0MemObjLockUser depending on
349 * what the host can dish up. (Chunk mtx protects mapping accesses
350 * and related frees.) */
351 RTR0MEMOBJ hMemObj;
352#ifndef VBOX_WITH_LINEAR_HOST_PHYS_MEM
353 /** Pointer to the kernel mapping. */
354 uint8_t *pbMapping;
355#endif
356 /** Pointer to the next chunk in the free list. (Giant mtx.) */
357 PGMMCHUNK pFreeNext;
358 /** Pointer to the previous chunk in the free list. (Giant mtx.) */
359 PGMMCHUNK pFreePrev;
360 /** Pointer to the free set this chunk belongs to. NULL for
361 * chunks with no free pages. (Giant mtx.) */
362 PGMMCHUNKFREESET pSet;
363 /** List node in the chunk list (GMM::ChunkList). (Giant mtx.) */
364 RTLISTNODE ListNode;
365 /** Pointer to an array of mappings. (Chunk mtx.) */
366 PGMMCHUNKMAP paMappingsX;
367 /** The number of mappings. (Chunk mtx.) */
368 uint16_t cMappingsX;
369 /** The mapping lock this chunk is using. UINT8_MAX if nobody is mapping
370 * or freeing anything. (Giant mtx.) */
371 uint8_t volatile iChunkMtx;
372 /** GMM_CHUNK_FLAGS_XXX. (Giant mtx.) */
373 uint8_t fFlags;
374 /** The head of the list of free pages. UINT16_MAX is the NIL value.
375 * (Giant mtx.) */
376 uint16_t iFreeHead;
377 /** The number of free pages. (Giant mtx.) */
378 uint16_t cFree;
379 /** The GVM handle of the VM that first allocated pages from this chunk, this
380 * is used as a preference when there are several chunks to choose from.
381 * When in bound memory mode this isn't a preference any longer. (Giant
382 * mtx.) */
383 uint16_t hGVM;
384 /** The ID of the NUMA node the memory mostly resides on. (Reserved for
385 * future use.) (Giant mtx.) */
386 uint16_t idNumaNode;
387 /** The number of private pages. (Giant mtx.) */
388 uint16_t cPrivate;
389 /** The number of shared pages. (Giant mtx.) */
390 uint16_t cShared;
391 /** The UID this chunk is associated with. */
392 RTUID uidOwner;
393 uint32_t u32Padding;
394 /** The pages. (Giant mtx.) */
395 GMMPAGE aPages[GMM_CHUNK_SIZE >> PAGE_SHIFT];
396} GMMCHUNK;
397
398/** Indicates that the NUMA properties of the memory are unknown. */
399#define GMM_CHUNK_NUMA_ID_UNKNOWN UINT16_C(0xfffe)
400
401/** @name GMM_CHUNK_FLAGS_XXX - chunk flags.
402 * @{ */
403/** Indicates that the chunk is a large page (2MB). */
404#define GMM_CHUNK_FLAGS_LARGE_PAGE UINT16_C(0x0001)
405/** @} */
406
407
408/**
409 * An allocation chunk TLB entry.
410 */
411typedef struct GMMCHUNKTLBE
412{
413 /** The chunk id. */
414 uint32_t idChunk;
415 /** Pointer to the chunk. */
416 PGMMCHUNK pChunk;
417} GMMCHUNKTLBE;
418/** Pointer to an allocation chunk TLB entry. */
419typedef GMMCHUNKTLBE *PGMMCHUNKTLBE;
420
421
422/** The number of entries in the allocation chunk TLB. */
423#define GMM_CHUNKTLB_ENTRIES 32
424/** Gets the TLB entry index for the given Chunk ID. */
425#define GMM_CHUNKTLB_IDX(idChunk) ( (idChunk) & (GMM_CHUNKTLB_ENTRIES - 1) )
426
427/**
428 * An allocation chunk TLB.
429 */
430typedef struct GMMCHUNKTLB
431{
432 /** The TLB entries. */
433 GMMCHUNKTLBE aEntries[GMM_CHUNKTLB_ENTRIES];
434} GMMCHUNKTLB;
435/** Pointer to an allocation chunk TLB. */
436typedef GMMCHUNKTLB *PGMMCHUNKTLB;
437
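/*
 * Intended TLB usage, as a sketch (the real lookup code elsewhere in this
 * file also takes hSpinLockTree and refreshes the entry on a miss):
 *
 *     PGMMCHUNKTLBE pTlbe  = &pGMM->ChunkTLB.aEntries[GMM_CHUNKTLB_IDX(idChunk)];
 *     PGMMCHUNK     pChunk = pTlbe->idChunk == idChunk
 *                          ? pTlbe->pChunk                                    // TLB hit
 *                          : (PGMMCHUNK)RTAvlU32Get(&pGMM->pChunks, idChunk); // miss -> AVL tree
 */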
438
439/**
440 * The GMM instance data.
441 */
442typedef struct GMM
443{
444 /** Magic / eye catcher. GMM_MAGIC */
445 uint32_t u32Magic;
446 /** The number of threads waiting on the mutex. */
447 uint32_t cMtxContenders;
448#ifdef VBOX_USE_CRIT_SECT_FOR_GIANT
449 /** The critical section protecting the GMM.
450 * More fine grained locking can be implemented later if necessary. */
451 RTCRITSECT GiantCritSect;
452#else
453 /** The fast mutex protecting the GMM.
454 * More fine grained locking can be implemented later if necessary. */
455 RTSEMFASTMUTEX hMtx;
456#endif
457#ifdef VBOX_STRICT
458 /** The current mutex owner. */
459 RTNATIVETHREAD hMtxOwner;
460#endif
461 /** Spinlock protecting the AVL tree.
462 * @todo Make this a read-write spinlock as we should allow concurrent
463 * lookups. */
464 RTSPINLOCK hSpinLockTree;
465 /** The chunk tree.
466 * Protected by hSpinLockTree. */
467 PAVLU32NODECORE pChunks;
468 /** Chunk freeing generation - incremented whenever a chunk is freed. Used
469 * for validating the per-VM chunk TLB entries. Valid range is 1 to 2^62
470 * (exclusive), though higher numbers may temporarily occur while
471 * invalidating the individual TLBs during wrap-around processing. */
472 uint64_t volatile idFreeGeneration;
473 /** The chunk TLB.
474 * Protected by hSpinLockTree. */
475 GMMCHUNKTLB ChunkTLB;
476 /** The private free set. */
477 GMMCHUNKFREESET PrivateX;
478 /** The shared free set. */
479 GMMCHUNKFREESET Shared;
480
481 /** Shared module tree (global).
482 * @todo separate trees for distinctly different guest OSes. */
483 PAVLLU32NODECORE pGlobalSharedModuleTree;
484 /** Sharable modules (count of nodes in pGlobalSharedModuleTree). */
485 uint32_t cShareableModules;
486
487 /** The chunk list. For simplifying the cleanup process and avoiding tree
488 * traversal. */
489 RTLISTANCHOR ChunkList;
490
491 /** The maximum number of pages we're allowed to allocate.
492 * @gcfgm{GMM/MaxPages,64-bit, Direct.}
493 * @gcfgm{GMM/PctPages,32-bit, Relative to the number of host pages.} */
494 uint64_t cMaxPages;
495 /** The number of pages that have been reserved.
496 * The deal is that cReservedPages - cOverCommittedPages <= cMaxPages. */
497 uint64_t cReservedPages;
498 /** The number of pages that we have over-committed in reservations. */
499 uint64_t cOverCommittedPages;
500 /** The number of actually allocated (committed if you like) pages. */
501 uint64_t cAllocatedPages;
502 /** The number of pages that are shared. A subset of cAllocatedPages. */
503 uint64_t cSharedPages;
504 /** The number of pages that are actually shared between VMs. */
505 uint64_t cDuplicatePages;
506 /** The number of pages that are shared that have been left behind by
507 * VMs not doing proper cleanups. */
508 uint64_t cLeftBehindSharedPages;
509 /** The number of allocation chunks.
510 * (The number of pages we've allocated from the host can be derived from this.) */
511 uint32_t cChunks;
512 /** The number of current ballooned pages. */
513 uint64_t cBalloonedPages;
514
515#ifdef VBOX_WITH_LINEAR_HOST_PHYS_MEM
516 /** Whether #RTR0MemObjAllocPhysNC works. */
517 bool fHasWorkingAllocPhysNC;
518#else
519 bool fPadding;
520#endif
521 /** The bound memory mode indicator.
522 * When set, the memory will be bound to a specific VM and never
523 * shared. This is always set if fLegacyAllocationMode is set.
524 * (Also determined at initialization time.) */
525 bool fBoundMemoryMode;
526 /** The number of registered VMs. */
527 uint16_t cRegisteredVMs;
528
529 /** The index of the next mutex to use. */
530 uint32_t iNextChunkMtx;
531 /** Chunk locks for reducing lock contention without having to allocate
532 * one lock per chunk. */
533 struct
534 {
535 /** The mutex */
536 RTSEMFASTMUTEX hMtx;
537 /** The number of threads currently using this mutex. */
538 uint32_t volatile cUsers;
539 } aChunkMtx[64];
540
541 /** The number of freed chunks ever. This is used as list generation to
542 * avoid restarting the cleanup scanning when the list wasn't modified. */
543 uint32_t volatile cFreedChunks;
544 /** The previously allocated Chunk ID.
545 * Used as a hint to avoid scanning the whole bitmap. */
546 uint32_t idChunkPrev;
547 /** Chunk ID allocation bitmap.
548 * Bits of allocated IDs are set, free ones are clear.
549 * The NIL id (0) is marked allocated. */
550 uint32_t bmChunkId[(GMM_CHUNKID_LAST + 1 + 31) / 32];
551} GMM;
552/** Pointer to the GMM instance. */
553typedef GMM *PGMM;
554
555/** The value of GMM::u32Magic (Katsuhiro Otomo). */
556#define GMM_MAGIC UINT32_C(0x19540414)
557
558
559/**
560 * GMM chunk mutex state.
561 *
562 * This is returned by gmmR0ChunkMutexAcquire and is used by the other
563 * gmmR0ChunkMutex* methods.
564 */
565typedef struct GMMR0CHUNKMTXSTATE
566{
567 PGMM pGMM;
568 /** The index of the chunk mutex. */
569 uint8_t iChunkMtx;
570 /** The relevant flags (GMMR0CHUNK_MTX_XXX). */
571 uint8_t fFlags;
572} GMMR0CHUNKMTXSTATE;
573/** Pointer to a chunk mutex state. */
574typedef GMMR0CHUNKMTXSTATE *PGMMR0CHUNKMTXSTATE;
575
576/** @name GMMR0CHUNK_MTX_XXX
577 * @{ */
578#define GMMR0CHUNK_MTX_INVALID UINT32_C(0)
579#define GMMR0CHUNK_MTX_KEEP_GIANT UINT32_C(1)
580#define GMMR0CHUNK_MTX_RETAKE_GIANT UINT32_C(2)
581#define GMMR0CHUNK_MTX_DROP_GIANT UINT32_C(3)
582#define GMMR0CHUNK_MTX_END UINT32_C(4)
583/** @} */
584
585
586/** The maximum number of shared modules per-vm. */
587#define GMM_MAX_SHARED_PER_VM_MODULES 2048
588/** The maximum number of shared modules GMM is allowed to track. */
589#define GMM_MAX_SHARED_GLOBAL_MODULES 16834
590
591
592/**
593 * Argument packet for gmmR0SharedModuleCleanup.
594 */
595typedef struct GMMR0SHMODPERVMDTORARGS
596{
597 PGVM pGVM;
598 PGMM pGMM;
599} GMMR0SHMODPERVMDTORARGS;
600
601/**
602 * Argument packet for gmmR0CheckSharedModule.
603 */
604typedef struct GMMCHECKSHAREDMODULEINFO
605{
606 PGVM pGVM;
607 VMCPUID idCpu;
608} GMMCHECKSHAREDMODULEINFO;
609
610
611/*********************************************************************************************************************************
612* Global Variables *
613*********************************************************************************************************************************/
614/** Pointer to the GMM instance data. */
615static PGMM g_pGMM = NULL;
616
617/** Macro for obtaining and validating the g_pGMM pointer.
618 *
619 * On failure it will return from the invoking function with the specified
620 * return value.
621 *
622 * @param pGMM The name of the pGMM variable.
623 * @param rc The return value on failure. Use VERR_GMM_INSTANCE for VBox
624 * status codes.
625 */
626#define GMM_GET_VALID_INSTANCE(pGMM, rc) \
627 do { \
628 (pGMM) = g_pGMM; \
629 AssertPtrReturn((pGMM), (rc)); \
630 AssertMsgReturn((pGMM)->u32Magic == GMM_MAGIC, ("%p - %#x\n", (pGMM), (pGMM)->u32Magic), (rc)); \
631 } while (0)
632
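/*
 * Typical usage sketch (GMMR0SomeOperation is a hypothetical name, for
 * illustration only; the real entry points below follow the same pattern):
 *
 *     GMMR0DECL(int) GMMR0SomeOperation(PGVM pGVM)
 *     {
 *         PGMM pGMM;
 *         GMM_GET_VALID_INSTANCE(pGMM, VERR_GMM_INSTANCE); // bails out here on a bad instance
 *         // ... do the actual work, typically under gmmR0MutexAcquire/gmmR0MutexRelease ...
 *         return VINF_SUCCESS;
 *     }
 */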
633/** Macro for obtaining and validating the g_pGMM pointer, void function
634 * variant.
635 *
636 * On failure it will return from the invoking function.
637 *
638 * @param pGMM The name of the pGMM variable.
639 */
640#define GMM_GET_VALID_INSTANCE_VOID(pGMM) \
641 do { \
642 (pGMM) = g_pGMM; \
643 AssertPtrReturnVoid((pGMM)); \
644 AssertMsgReturnVoid((pGMM)->u32Magic == GMM_MAGIC, ("%p - %#x\n", (pGMM), (pGMM)->u32Magic)); \
645 } while (0)
646
647
648/** @def GMM_CHECK_SANITY_UPON_ENTERING
649 * Checks the sanity of the GMM instance data before making changes.
650 *
651 * This macro is a stub by default and must be enabled manually in the code.
652 *
653 * @returns true if sane, false if not.
654 * @param pGMM The name of the pGMM variable.
655 */
656#if defined(VBOX_STRICT) && defined(GMMR0_WITH_SANITY_CHECK) && 0
657# define GMM_CHECK_SANITY_UPON_ENTERING(pGMM) (RT_LIKELY(gmmR0SanityCheck((pGMM), __PRETTY_FUNCTION__, __LINE__) == 0))
658#else
659# define GMM_CHECK_SANITY_UPON_ENTERING(pGMM) (true)
660#endif
661
662/** @def GMM_CHECK_SANITY_UPON_LEAVING
663 * Checks the sanity of the GMM instance data after making changes.
664 *
665 * This macro is a stub by default and must be enabled manually in the code.
666 *
667 * @returns true if sane, false if not.
668 * @param pGMM The name of the pGMM variable.
669 */
670#if defined(VBOX_STRICT) && defined(GMMR0_WITH_SANITY_CHECK) && 0
671# define GMM_CHECK_SANITY_UPON_LEAVING(pGMM) (gmmR0SanityCheck((pGMM), __PRETTY_FUNCTION__, __LINE__) == 0)
672#else
673# define GMM_CHECK_SANITY_UPON_LEAVING(pGMM) (true)
674#endif
675
676/** @def GMM_CHECK_SANITY_IN_LOOPS
677 * Checks the sanity of the GMM instance in the allocation loops.
678 *
679 * This macro is a stub by default and must be enabled manually in the code.
680 *
681 * @returns true if sane, false if not.
682 * @param pGMM The name of the pGMM variable.
683 */
684#if defined(VBOX_STRICT) && defined(GMMR0_WITH_SANITY_CHECK) && 0
685# define GMM_CHECK_SANITY_IN_LOOPS(pGMM) (gmmR0SanityCheck((pGMM), __PRETTY_FUNCTION__, __LINE__) == 0)
686#else
687# define GMM_CHECK_SANITY_IN_LOOPS(pGMM) (true)
688#endif
689
690
691/*********************************************************************************************************************************
692* Internal Functions *
693*********************************************************************************************************************************/
694static DECLCALLBACK(int) gmmR0TermDestroyChunk(PAVLU32NODECORE pNode, void *pvGMM);
695static bool gmmR0CleanupVMScanChunk(PGMM pGMM, PGVM pGVM, PGMMCHUNK pChunk);
696DECLINLINE(void) gmmR0UnlinkChunk(PGMMCHUNK pChunk);
697DECLINLINE(void) gmmR0LinkChunk(PGMMCHUNK pChunk, PGMMCHUNKFREESET pSet);
698DECLINLINE(void) gmmR0SelectSetAndLinkChunk(PGMM pGMM, PGVM pGVM, PGMMCHUNK pChunk);
699#ifdef GMMR0_WITH_SANITY_CHECK
700static uint32_t gmmR0SanityCheck(PGMM pGMM, const char *pszFunction, unsigned uLineNo);
701#endif
702static bool gmmR0FreeChunk(PGMM pGMM, PGVM pGVM, PGMMCHUNK pChunk, bool fRelaxedSem);
703DECLINLINE(void) gmmR0FreePrivatePage(PGMM pGMM, PGVM pGVM, uint32_t idPage, PGMMPAGE pPage);
704DECLINLINE(void) gmmR0FreeSharedPage(PGMM pGMM, PGVM pGVM, uint32_t idPage, PGMMPAGE pPage);
705static int gmmR0UnmapChunkLocked(PGMM pGMM, PGVM pGVM, PGMMCHUNK pChunk);
706#ifdef VBOX_WITH_PAGE_SHARING
707static void gmmR0SharedModuleCleanup(PGMM pGMM, PGVM pGVM);
708# ifdef VBOX_STRICT
709static uint32_t gmmR0StrictPageChecksum(PGMM pGMM, PGVM pGVM, uint32_t idPage);
710# endif
711#endif
712
713
714
715/**
716 * Initializes the GMM component.
717 *
718 * This is called when the VMMR0.r0 module is loaded and protected by the
719 * loader semaphore.
720 *
721 * @returns VBox status code.
722 */
723GMMR0DECL(int) GMMR0Init(void)
724{
725 LogFlow(("GMMInit:\n"));
726
727 /*
728 * Allocate the instance data and the locks.
729 */
730 PGMM pGMM = (PGMM)RTMemAllocZ(sizeof(*pGMM));
731 if (!pGMM)
732 return VERR_NO_MEMORY;
733
734 pGMM->u32Magic = GMM_MAGIC;
735 for (unsigned i = 0; i < RT_ELEMENTS(pGMM->ChunkTLB.aEntries); i++)
736 pGMM->ChunkTLB.aEntries[i].idChunk = NIL_GMM_CHUNKID;
737 RTListInit(&pGMM->ChunkList);
738 ASMBitSet(&pGMM->bmChunkId[0], NIL_GMM_CHUNKID);
739
740#ifdef VBOX_USE_CRIT_SECT_FOR_GIANT
741 int rc = RTCritSectInit(&pGMM->GiantCritSect);
742#else
743 int rc = RTSemFastMutexCreate(&pGMM->hMtx);
744#endif
745 if (RT_SUCCESS(rc))
746 {
747 unsigned iMtx;
748 for (iMtx = 0; iMtx < RT_ELEMENTS(pGMM->aChunkMtx); iMtx++)
749 {
750 rc = RTSemFastMutexCreate(&pGMM->aChunkMtx[iMtx].hMtx);
751 if (RT_FAILURE(rc))
752 break;
753 }
754 pGMM->hSpinLockTree = NIL_RTSPINLOCK;
755 if (RT_SUCCESS(rc))
756 rc = RTSpinlockCreate(&pGMM->hSpinLockTree, RTSPINLOCK_FLAGS_INTERRUPT_SAFE, "gmm-chunk-tree");
757 if (RT_SUCCESS(rc))
758 {
759 /*
760 * Figure out how we're going to allocate stuff (only applicable to
761 * host with linear physical memory mappings).
762 */
763 pGMM->fBoundMemoryMode = false;
764#ifdef VBOX_WITH_LINEAR_HOST_PHYS_MEM
765 pGMM->fHasWorkingAllocPhysNC = false;
766
767 RTR0MEMOBJ hMemObj;
768 rc = RTR0MemObjAllocPhysNC(&hMemObj, GMM_CHUNK_SIZE, NIL_RTHCPHYS);
769 if (RT_SUCCESS(rc))
770 {
771 rc = RTR0MemObjFree(hMemObj, true);
772 AssertRC(rc);
773 pGMM->fHasWorkingAllocPhysNC = true;
774 }
775 else if (rc != VERR_NOT_SUPPORTED)
776 SUPR0Printf("GMMR0Init: Warning! RTR0MemObjAllocPhysNC(, %u, NIL_RTHCPHYS) -> %d!\n", GMM_CHUNK_SIZE, rc);
777# endif
778
779 /*
780 * Query system page count and guess a reasonable cMaxPages value.
781 */
782 pGMM->cMaxPages = UINT32_MAX; /** @todo IPRT function for query ram size and such. */
783
784 /*
785 * The idFreeGeneration value should be set so we actually trigger the
786 * wrap-around invalidation handling during a typical test run.
787 */
788 pGMM->idFreeGeneration = UINT64_MAX / 4 - 128;
789
790 g_pGMM = pGMM;
791#ifdef VBOX_WITH_LINEAR_HOST_PHYS_MEM
792 LogFlow(("GMMInit: pGMM=%p fBoundMemoryMode=%RTbool fHasWorkingAllocPhysNC=%RTbool\n", pGMM, pGMM->fBoundMemoryMode, pGMM->fHasWorkingAllocPhysNC));
793#else
794 LogFlow(("GMMInit: pGMM=%p fBoundMemoryMode=%RTbool\n", pGMM, pGMM->fBoundMemoryMode));
795#endif
796 return VINF_SUCCESS;
797 }
798
799 /*
800 * Bail out.
801 */
802 RTSpinlockDestroy(pGMM->hSpinLockTree);
803 while (iMtx-- > 0)
804 RTSemFastMutexDestroy(pGMM->aChunkMtx[iMtx].hMtx);
805#ifdef VBOX_USE_CRIT_SECT_FOR_GIANT
806 RTCritSectDelete(&pGMM->GiantCritSect);
807#else
808 RTSemFastMutexDestroy(pGMM->hMtx);
809#endif
810 }
811
812 pGMM->u32Magic = 0;
813 RTMemFree(pGMM);
814 SUPR0Printf("GMMR0Init: failed! rc=%d\n", rc);
815 return rc;
816}
817
818
819/**
820 * Terminates the GMM component.
821 */
822GMMR0DECL(void) GMMR0Term(void)
823{
824 LogFlow(("GMMTerm:\n"));
825
826 /*
827 * Take care / be paranoid...
828 */
829 PGMM pGMM = g_pGMM;
830 if (!RT_VALID_PTR(pGMM))
831 return;
832 if (pGMM->u32Magic != GMM_MAGIC)
833 {
834 SUPR0Printf("GMMR0Term: u32Magic=%#x\n", pGMM->u32Magic);
835 return;
836 }
837
838 /*
839 * Undo what init did and free all the resources we've acquired.
840 */
841 /* Destroy the fundamentals. */
842 g_pGMM = NULL;
843 pGMM->u32Magic = ~GMM_MAGIC;
844#ifdef VBOX_USE_CRIT_SECT_FOR_GIANT
845 RTCritSectDelete(&pGMM->GiantCritSect);
846#else
847 RTSemFastMutexDestroy(pGMM->hMtx);
848 pGMM->hMtx = NIL_RTSEMFASTMUTEX;
849#endif
850 RTSpinlockDestroy(pGMM->hSpinLockTree);
851 pGMM->hSpinLockTree = NIL_RTSPINLOCK;
852
853 /* Free any chunks still hanging around. */
854 RTAvlU32Destroy(&pGMM->pChunks, gmmR0TermDestroyChunk, pGMM);
855
856 /* Destroy the chunk locks. */
857 for (unsigned iMtx = 0; iMtx < RT_ELEMENTS(pGMM->aChunkMtx); iMtx++)
858 {
859 Assert(pGMM->aChunkMtx[iMtx].cUsers == 0);
860 RTSemFastMutexDestroy(pGMM->aChunkMtx[iMtx].hMtx);
861 pGMM->aChunkMtx[iMtx].hMtx = NIL_RTSEMFASTMUTEX;
862 }
863
864 /* Finally the instance data itself. */
865 RTMemFree(pGMM);
866 LogFlow(("GMMTerm: done\n"));
867}
868
869
870/**
871 * RTAvlU32Destroy callback.
872 *
873 * @returns 0
874 * @param pNode The node to destroy.
875 * @param pvGMM The GMM handle.
876 */
877static DECLCALLBACK(int) gmmR0TermDestroyChunk(PAVLU32NODECORE pNode, void *pvGMM)
878{
879 PGMMCHUNK pChunk = (PGMMCHUNK)pNode;
880
881 if (pChunk->cFree != (GMM_CHUNK_SIZE >> PAGE_SHIFT))
882 SUPR0Printf("GMMR0Term: %RKv/%#x: cFree=%d cPrivate=%d cShared=%d cMappings=%d\n", pChunk,
883 pChunk->Core.Key, pChunk->cFree, pChunk->cPrivate, pChunk->cShared, pChunk->cMappingsX);
884
885 int rc = RTR0MemObjFree(pChunk->hMemObj, true /* fFreeMappings */);
886 if (RT_FAILURE(rc))
887 {
888 SUPR0Printf("GMMR0Term: %RKv/%#x: RTRMemObjFree(%RKv,true) -> %d (cMappings=%d)\n", pChunk,
889 pChunk->Core.Key, pChunk->hMemObj, rc, pChunk->cMappingsX);
890 AssertRC(rc);
891 }
892 pChunk->hMemObj = NIL_RTR0MEMOBJ;
893
894 RTMemFree(pChunk->paMappingsX);
895 pChunk->paMappingsX = NULL;
896
897 RTMemFree(pChunk);
898 NOREF(pvGMM);
899 return 0;
900}
901
902
903/**
904 * Initializes the per-VM data for the GMM.
905 *
906 * This is called from within the GVMM lock (from GVMMR0CreateVM)
907 * and should only initialize the data members so GMMR0CleanupVM
908 * can deal with them. We reserve no memory or anything here,
909 * that's done later in GMMR0InitVM.
910 *
911 * @param pGVM Pointer to the Global VM structure.
912 */
913GMMR0DECL(int) GMMR0InitPerVMData(PGVM pGVM)
914{
915 AssertCompile(RT_SIZEOFMEMB(GVM,gmm.s) <= RT_SIZEOFMEMB(GVM,gmm.padding));
916
917 pGVM->gmm.s.Stats.enmPolicy = GMMOCPOLICY_INVALID;
918 pGVM->gmm.s.Stats.enmPriority = GMMPRIORITY_INVALID;
919 pGVM->gmm.s.Stats.fMayAllocate = false;
920
921 pGVM->gmm.s.hChunkTlbSpinLock = NIL_RTSPINLOCK;
922 int rc = RTSpinlockCreate(&pGVM->gmm.s.hChunkTlbSpinLock, RTSPINLOCK_FLAGS_INTERRUPT_SAFE, "per-vm-chunk-tlb");
923 AssertRCReturn(rc, rc);
924
925 return VINF_SUCCESS;
926}
927
928
929/**
930 * Acquires the GMM giant lock.
931 *
932 * @returns Assert status code from RTSemFastMutexRequest.
933 * @param pGMM Pointer to the GMM instance.
934 */
935static int gmmR0MutexAcquire(PGMM pGMM)
936{
937 ASMAtomicIncU32(&pGMM->cMtxContenders);
938#ifdef VBOX_USE_CRIT_SECT_FOR_GIANT
939 int rc = RTCritSectEnter(&pGMM->GiantCritSect);
940#else
941 int rc = RTSemFastMutexRequest(pGMM->hMtx);
942#endif
943 ASMAtomicDecU32(&pGMM->cMtxContenders);
944 AssertRC(rc);
945#ifdef VBOX_STRICT
946 pGMM->hMtxOwner = RTThreadNativeSelf();
947#endif
948 return rc;
949}
950
951
952/**
953 * Releases the GMM giant lock.
954 *
955 * @returns Assert status code from RTSemFastMutexRelease.
956 * @param pGMM Pointer to the GMM instance.
957 */
958static int gmmR0MutexRelease(PGMM pGMM)
959{
960#ifdef VBOX_STRICT
961 pGMM->hMtxOwner = NIL_RTNATIVETHREAD;
962#endif
963#ifdef VBOX_USE_CRIT_SECT_FOR_GIANT
964 int rc = RTCritSectLeave(&pGMM->GiantCritSect);
965#else
966 int rc = RTSemFastMutexRelease(pGMM->hMtx);
967 AssertRC(rc);
968#endif
969 return rc;
970}
971
972
973/**
974 * Yields the GMM giant lock if there is contention and a certain minimum time
975 * has elapsed since we took it.
976 *
977 * @returns @c true if the mutex was yielded, @c false if not.
978 * @param pGMM Pointer to the GMM instance.
979 * @param puLockNanoTS Where the lock acquisition time stamp is kept
980 * (in/out).
981 */
982static bool gmmR0MutexYield(PGMM pGMM, uint64_t *puLockNanoTS)
983{
984 /*
985 * If nobody is contending the mutex, don't bother checking the time.
986 */
987 if (ASMAtomicReadU32(&pGMM->cMtxContenders) == 0)
988 return false;
989
990 /*
991 * Don't yield if we haven't executed for at least 2 milliseconds.
992 */
993 uint64_t uNanoNow = RTTimeSystemNanoTS();
994 if (uNanoNow - *puLockNanoTS < UINT32_C(2000000))
995 return false;
996
997 /*
998 * Yield the mutex.
999 */
1000#ifdef VBOX_STRICT
1001 pGMM->hMtxOwner = NIL_RTNATIVETHREAD;
1002#endif
1003 ASMAtomicIncU32(&pGMM->cMtxContenders);
1004#ifdef VBOX_USE_CRIT_SECT_FOR_GIANT
1005 int rc1 = RTCritSectLeave(&pGMM->GiantCritSect); AssertRC(rc1);
1006#else
1007 int rc1 = RTSemFastMutexRelease(pGMM->hMtx); AssertRC(rc1);
1008#endif
1009
1010 RTThreadYield();
1011
1012#ifdef VBOX_USE_CRIT_SECT_FOR_GIANT
1013 int rc2 = RTCritSectEnter(&pGMM->GiantCritSect); AssertRC(rc2);
1014#else
1015 int rc2 = RTSemFastMutexRequest(pGMM->hMtx); AssertRC(rc2);
1016#endif
1017 *puLockNanoTS = RTTimeSystemNanoTS();
1018 ASMAtomicDecU32(&pGMM->cMtxContenders);
1019#ifdef VBOX_STRICT
1020 pGMM->hMtxOwner = RTThreadNativeSelf();
1021#endif
1022
1023 return true;
1024}
1025
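/*
 * Usage sketch (illustrative only; GMMR0CleanupVM below shows the real
 * pattern, including the generation checks needed when the yield happened):
 *
 *     uint64_t uLockNanoTS = RTTimeSystemNanoTS();
 *     gmmR0MutexAcquire(pGMM);
 *     PGMMCHUNK pChunk;
 *     RTListForEach(&pGMM->ChunkList, pChunk, GMMCHUNK, ListNode)
 *     {
 *         // ... a bounded piece of work on pChunk ...
 *         if (gmmR0MutexYield(pGMM, &uLockNanoTS))
 *             break; // we dropped the lock; the list may have changed, so restart
 *     }
 *     gmmR0MutexRelease(pGMM);
 */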
1026
1027/**
1028 * Acquires a chunk lock.
1029 *
1030 * The caller must own the giant lock.
1031 *
1032 * @returns Assert status code from RTSemFastMutexRequest.
1033 * @param pMtxState The chunk mutex state info. (Avoids
1034 * passing the same flags and stuff around
1035 * for subsequent release and drop-giant
1036 * calls.)
1037 * @param pGMM Pointer to the GMM instance.
1038 * @param pChunk Pointer to the chunk.
1039 * @param fFlags Flags regarding the giant lock, GMMR0CHUNK_MTX_XXX.
1040 */
1041static int gmmR0ChunkMutexAcquire(PGMMR0CHUNKMTXSTATE pMtxState, PGMM pGMM, PGMMCHUNK pChunk, uint32_t fFlags)
1042{
1043 Assert(fFlags > GMMR0CHUNK_MTX_INVALID && fFlags < GMMR0CHUNK_MTX_END);
1044 Assert(pGMM->hMtxOwner == RTThreadNativeSelf());
1045
1046 pMtxState->pGMM = pGMM;
1047 pMtxState->fFlags = (uint8_t)fFlags;
1048
1049 /*
1050 * Get the lock index and reference the lock.
1051 */
1052 Assert(pGMM->hMtxOwner == RTThreadNativeSelf());
1053 uint32_t iChunkMtx = pChunk->iChunkMtx;
1054 if (iChunkMtx == UINT8_MAX)
1055 {
1056 iChunkMtx = pGMM->iNextChunkMtx++;
1057 iChunkMtx %= RT_ELEMENTS(pGMM->aChunkMtx);
1058
1059 /* Try get an unused one... */
1060 if (pGMM->aChunkMtx[iChunkMtx].cUsers)
1061 {
1062 iChunkMtx = pGMM->iNextChunkMtx++;
1063 iChunkMtx %= RT_ELEMENTS(pGMM->aChunkMtx);
1064 if (pGMM->aChunkMtx[iChunkMtx].cUsers)
1065 {
1066 iChunkMtx = pGMM->iNextChunkMtx++;
1067 iChunkMtx %= RT_ELEMENTS(pGMM->aChunkMtx);
1068 if (pGMM->aChunkMtx[iChunkMtx].cUsers)
1069 {
1070 iChunkMtx = pGMM->iNextChunkMtx++;
1071 iChunkMtx %= RT_ELEMENTS(pGMM->aChunkMtx);
1072 }
1073 }
1074 }
1075
1076 pChunk->iChunkMtx = iChunkMtx;
1077 }
1078 AssertCompile(RT_ELEMENTS(pGMM->aChunkMtx) < UINT8_MAX);
1079 pMtxState->iChunkMtx = (uint8_t)iChunkMtx;
1080 ASMAtomicIncU32(&pGMM->aChunkMtx[iChunkMtx].cUsers);
1081
1082 /*
1083 * Drop the giant?
1084 */
1085 if (fFlags != GMMR0CHUNK_MTX_KEEP_GIANT)
1086 {
1087 /** @todo GMM life cycle cleanup (we may race someone
1088 * destroying and cleaning up GMM)? */
1089 gmmR0MutexRelease(pGMM);
1090 }
1091
1092 /*
1093 * Take the chunk mutex.
1094 */
1095 int rc = RTSemFastMutexRequest(pGMM->aChunkMtx[iChunkMtx].hMtx);
1096 AssertRC(rc);
1097 return rc;
1098}
1099
1100
1101/**
1102 * Releases a chunk lock acquired by gmmR0ChunkMutexAcquire.
1103 *
1104 * @returns Assert status code from RTSemFastMutexRelease.
1105 * @param pMtxState Pointer to the chunk mutex state.
1106 * @param pChunk Pointer to the chunk if it's still
1107 * alive, NULL if it isn't. This is used to deassociate
1108 * the chunk from the mutex on the way out so a new one
1109 * can be selected next time, thus avoiding contented
1110 * mutexes.
1111 */
1112static int gmmR0ChunkMutexRelease(PGMMR0CHUNKMTXSTATE pMtxState, PGMMCHUNK pChunk)
1113{
1114 PGMM pGMM = pMtxState->pGMM;
1115
1116 /*
1117 * Release the chunk mutex and reacquire the giant if requested.
1118 */
1119 int rc = RTSemFastMutexRelease(pGMM->aChunkMtx[pMtxState->iChunkMtx].hMtx);
1120 AssertRC(rc);
1121 if (pMtxState->fFlags == GMMR0CHUNK_MTX_RETAKE_GIANT)
1122 rc = gmmR0MutexAcquire(pGMM);
1123 else
1124 Assert((pMtxState->fFlags != GMMR0CHUNK_MTX_DROP_GIANT) == (pGMM->hMtxOwner == RTThreadNativeSelf()));
1125
1126 /*
1127 * Drop the chunk mutex user reference and deassociate it from the chunk
1128 * when possible.
1129 */
1130 if ( ASMAtomicDecU32(&pGMM->aChunkMtx[pMtxState->iChunkMtx].cUsers) == 0
1131 && pChunk
1132 && RT_SUCCESS(rc) )
1133 {
1134 if (pMtxState->fFlags != GMMR0CHUNK_MTX_DROP_GIANT)
1135 pChunk->iChunkMtx = UINT8_MAX;
1136 else
1137 {
1138 rc = gmmR0MutexAcquire(pGMM);
1139 if (RT_SUCCESS(rc))
1140 {
1141 if (pGMM->aChunkMtx[pMtxState->iChunkMtx].cUsers == 0)
1142 pChunk->iChunkMtx = UINT8_MAX;
1143 rc = gmmR0MutexRelease(pGMM);
1144 }
1145 }
1146 }
1147
1148 pMtxState->pGMM = NULL;
1149 return rc;
1150}
1151
1152
1153/**
1154 * Drops the giant GMM lock we kept in gmmR0ChunkMutexAcquire while keeping the
1155 * chunk locked.
1156 *
1157 * This only works if gmmR0ChunkMutexAcquire was called with
1158 * GMMR0CHUNK_MTX_KEEP_GIANT. gmmR0ChunkMutexRelease will retake the giant
1159 * mutex, i.e. behave as if GMMR0CHUNK_MTX_RETAKE_GIANT was used.
1160 *
1161 * @returns VBox status code (assuming success is ok).
1162 * @param pMtxState Pointer to the chunk mutex state.
1163 */
1164static int gmmR0ChunkMutexDropGiant(PGMMR0CHUNKMTXSTATE pMtxState)
1165{
1166 AssertReturn(pMtxState->fFlags == GMMR0CHUNK_MTX_KEEP_GIANT, VERR_GMM_MTX_FLAGS);
1167 Assert(pMtxState->pGMM->hMtxOwner == RTThreadNativeSelf());
1168 pMtxState->fFlags = GMMR0CHUNK_MTX_RETAKE_GIANT;
1169 /** @todo GMM life cycle cleanup (we may race someone
1170 * destroying and cleaning up GMM)? */
1171 return gmmR0MutexRelease(pMtxState->pGMM);
1172}
1173
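/*
 * Chunk mutex life cycle sketch (illustrative; gmmR0CleanupVMScanChunk below
 * uses this exact sequence, with hMapObj standing in for whatever slow
 * resource is being torn down):
 *
 *     GMMR0CHUNKMTXSTATE MtxState;
 *     gmmR0ChunkMutexAcquire(&MtxState, pGMM, pChunk, GMMR0CHUNK_MTX_KEEP_GIANT);
 *     gmmR0ChunkMutexDropGiant(&MtxState);          // slow bit ahead, release the giant
 *     int rc = RTR0MemObjFree(hMapObj, false);      // e.g. tearing down a mapping
 *     gmmR0ChunkMutexRelease(&MtxState, pChunk);    // retakes the giant (RETAKE semantics)
 */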
1174
1175/**
1176 * For experimenting with NUMA affinity and such.
1177 *
1178 * @returns The current NUMA Node ID.
1179 */
1180static uint16_t gmmR0GetCurrentNumaNodeId(void)
1181{
1182#if 1
1183 return GMM_CHUNK_NUMA_ID_UNKNOWN;
1184#else
1185 return RTMpCpuId() / 16;
1186#endif
1187}
1188
1189
1190
1191/**
1192 * Cleans up when a VM is terminating.
1193 *
1194 * @param pGVM Pointer to the Global VM structure.
1195 */
1196GMMR0DECL(void) GMMR0CleanupVM(PGVM pGVM)
1197{
1198 LogFlow(("GMMR0CleanupVM: pGVM=%p:{.hSelf=%#x}\n", pGVM, pGVM->hSelf));
1199
1200 PGMM pGMM;
1201 GMM_GET_VALID_INSTANCE_VOID(pGMM);
1202
1203#ifdef VBOX_WITH_PAGE_SHARING
1204 /*
1205 * Clean up all registered shared modules first.
1206 */
1207 gmmR0SharedModuleCleanup(pGMM, pGVM);
1208#endif
1209
1210 gmmR0MutexAcquire(pGMM);
1211 uint64_t uLockNanoTS = RTTimeSystemNanoTS();
1212 GMM_CHECK_SANITY_UPON_ENTERING(pGMM);
1213
1214 /*
1215 * The policy is 'INVALID' until the initial reservation
1216 * request has been serviced.
1217 */
1218 if ( pGVM->gmm.s.Stats.enmPolicy > GMMOCPOLICY_INVALID
1219 && pGVM->gmm.s.Stats.enmPolicy < GMMOCPOLICY_END)
1220 {
1221 /*
1222 * If it's the last VM around, we can skip walking all the chunks looking
1223 * for the pages owned by this VM and instead flush the whole shebang.
1224 *
1225 * This takes care of the eventuality that a VM has left shared page
1226 * references behind (shouldn't happen of course, but you never know).
1227 */
1228 Assert(pGMM->cRegisteredVMs);
1229 pGMM->cRegisteredVMs--;
1230
1231 /*
1232 * Walk the entire pool looking for pages that belong to this VM
1233 * and leftover mappings. (This'll only catch private pages,
1234 * shared pages will be 'left behind'.)
1235 */
1236 /** @todo r=bird: This scanning+freeing could be optimized in bound mode! */
1237 uint64_t cPrivatePages = pGVM->gmm.s.Stats.cPrivatePages; /* save */
1238
1239 unsigned iCountDown = 64;
1240 bool fRedoFromStart;
1241 PGMMCHUNK pChunk;
1242 do
1243 {
1244 fRedoFromStart = false;
1245 RTListForEachReverse(&pGMM->ChunkList, pChunk, GMMCHUNK, ListNode)
1246 {
1247 uint32_t const cFreeChunksOld = pGMM->cFreedChunks;
1248 if ( ( !pGMM->fBoundMemoryMode
1249 || pChunk->hGVM == pGVM->hSelf)
1250 && gmmR0CleanupVMScanChunk(pGMM, pGVM, pChunk))
1251 {
1252 /* We left the giant mutex, so reset the yield counters. */
1253 uLockNanoTS = RTTimeSystemNanoTS();
1254 iCountDown = 64;
1255 }
1256 else
1257 {
1258 /* Didn't leave it, so do normal yielding. */
1259 if (!iCountDown)
1260 gmmR0MutexYield(pGMM, &uLockNanoTS);
1261 else
1262 iCountDown--;
1263 }
1264 if (pGMM->cFreedChunks != cFreeChunksOld)
1265 {
1266 fRedoFromStart = true;
1267 break;
1268 }
1269 }
1270 } while (fRedoFromStart);
1271
1272 if (pGVM->gmm.s.Stats.cPrivatePages)
1273 SUPR0Printf("GMMR0CleanupVM: hGVM=%#x has %#x private pages that cannot be found!\n", pGVM->hSelf, pGVM->gmm.s.Stats.cPrivatePages);
1274
1275 pGMM->cAllocatedPages -= cPrivatePages;
1276
1277 /*
1278 * Free empty chunks.
1279 */
1280 PGMMCHUNKFREESET pPrivateSet = pGMM->fBoundMemoryMode ? &pGVM->gmm.s.Private : &pGMM->PrivateX;
1281 do
1282 {
1283 fRedoFromStart = false;
1284 iCountDown = 10240;
1285 pChunk = pPrivateSet->apLists[GMM_CHUNK_FREE_SET_UNUSED_LIST];
1286 while (pChunk)
1287 {
1288 PGMMCHUNK pNext = pChunk->pFreeNext;
1289 Assert(pChunk->cFree == GMM_CHUNK_NUM_PAGES);
1290 if ( !pGMM->fBoundMemoryMode
1291 || pChunk->hGVM == pGVM->hSelf)
1292 {
1293 uint64_t const idGenerationOld = pPrivateSet->idGeneration;
1294 if (gmmR0FreeChunk(pGMM, pGVM, pChunk, true /*fRelaxedSem*/))
1295 {
1296 /* We've left the giant mutex, restart? (+1 for our unlink) */
1297 fRedoFromStart = pPrivateSet->idGeneration != idGenerationOld + 1;
1298 if (fRedoFromStart)
1299 break;
1300 uLockNanoTS = RTTimeSystemNanoTS();
1301 iCountDown = 10240;
1302 }
1303 }
1304
1305 /* Advance and maybe yield the lock. */
1306 pChunk = pNext;
1307 if (--iCountDown == 0)
1308 {
1309 uint64_t const idGenerationOld = pPrivateSet->idGeneration;
1310 fRedoFromStart = gmmR0MutexYield(pGMM, &uLockNanoTS)
1311 && pPrivateSet->idGeneration != idGenerationOld;
1312 if (fRedoFromStart)
1313 break;
1314 iCountDown = 10240;
1315 }
1316 }
1317 } while (fRedoFromStart);
1318
1319 /*
1320 * Account for shared pages that weren't freed.
1321 */
1322 if (pGVM->gmm.s.Stats.cSharedPages)
1323 {
1324 Assert(pGMM->cSharedPages >= pGVM->gmm.s.Stats.cSharedPages);
1325 SUPR0Printf("GMMR0CleanupVM: hGVM=%#x left %#x shared pages behind!\n", pGVM->hSelf, pGVM->gmm.s.Stats.cSharedPages);
1326 pGMM->cLeftBehindSharedPages += pGVM->gmm.s.Stats.cSharedPages;
1327 }
1328
1329 /*
1330 * Clean up balloon statistics in case the VM process crashed.
1331 */
1332 Assert(pGMM->cBalloonedPages >= pGVM->gmm.s.Stats.cBalloonedPages);
1333 pGMM->cBalloonedPages -= pGVM->gmm.s.Stats.cBalloonedPages;
1334
1335 /*
1336 * Update the over-commitment management statistics.
1337 */
1338 pGMM->cReservedPages -= pGVM->gmm.s.Stats.Reserved.cBasePages
1339 + pGVM->gmm.s.Stats.Reserved.cFixedPages
1340 + pGVM->gmm.s.Stats.Reserved.cShadowPages;
1341 switch (pGVM->gmm.s.Stats.enmPolicy)
1342 {
1343 case GMMOCPOLICY_NO_OC:
1344 break;
1345 default:
1346 /** @todo Update GMM->cOverCommittedPages */
1347 break;
1348 }
1349 }
1350
1351 /* zap the GVM data. */
1352 pGVM->gmm.s.Stats.enmPolicy = GMMOCPOLICY_INVALID;
1353 pGVM->gmm.s.Stats.enmPriority = GMMPRIORITY_INVALID;
1354 pGVM->gmm.s.Stats.fMayAllocate = false;
1355
1356 GMM_CHECK_SANITY_UPON_LEAVING(pGMM);
1357 gmmR0MutexRelease(pGMM);
1358
1359 /*
1360 * Destroy the spinlock.
1361 */
1362 RTSPINLOCK hSpinlock = NIL_RTSPINLOCK;
1363 ASMAtomicXchgHandle(&pGVM->gmm.s.hChunkTlbSpinLock, NIL_RTSPINLOCK, &hSpinlock);
1364 RTSpinlockDestroy(hSpinlock);
1365
1366 LogFlow(("GMMR0CleanupVM: returns\n"));
1367}
1368
1369
1370/**
1371 * Scan one chunk for private pages belonging to the specified VM.
1372 *
1373 * @note This function may drop the giant mutex!
1374 *
1375 * @returns @c true if we've temporarily dropped the giant mutex, @c false if
1376 * we didn't.
1377 * @param pGMM Pointer to the GMM instance.
1378 * @param pGVM The global VM handle.
1379 * @param pChunk The chunk to scan.
1380 */
1381static bool gmmR0CleanupVMScanChunk(PGMM pGMM, PGVM pGVM, PGMMCHUNK pChunk)
1382{
1383 Assert(!pGMM->fBoundMemoryMode || pChunk->hGVM == pGVM->hSelf);
1384
1385 /*
1386 * Look for pages belonging to the VM.
1387 * (Perform some internal checks while we're scanning.)
1388 */
1389#ifndef VBOX_STRICT
1390 if (pChunk->cFree != (GMM_CHUNK_SIZE >> PAGE_SHIFT))
1391#endif
1392 {
1393 unsigned cPrivate = 0;
1394 unsigned cShared = 0;
1395 unsigned cFree = 0;
1396
1397 gmmR0UnlinkChunk(pChunk); /* avoiding cFreePages updates. */
1398
1399 uint16_t hGVM = pGVM->hSelf;
1400 unsigned iPage = (GMM_CHUNK_SIZE >> PAGE_SHIFT);
1401 while (iPage-- > 0)
1402 if (GMM_PAGE_IS_PRIVATE(&pChunk->aPages[iPage]))
1403 {
1404 if (pChunk->aPages[iPage].Private.hGVM == hGVM)
1405 {
1406 /*
1407 * Free the page.
1408 *
1409 * The reason for not using gmmR0FreePrivatePage here is that we
1410 * must *not* cause the chunk to be freed from under us - we're in
1411 * an AVL tree walk here.
1412 */
1413 pChunk->aPages[iPage].u = 0;
1414 pChunk->aPages[iPage].Free.u2State = GMM_PAGE_STATE_FREE;
1415 pChunk->aPages[iPage].Free.fZeroed = false;
1416 pChunk->aPages[iPage].Free.iNext = pChunk->iFreeHead;
1417 pChunk->iFreeHead = iPage;
1418 pChunk->cPrivate--;
1419 pChunk->cFree++;
1420 pGVM->gmm.s.Stats.cPrivatePages--;
1421 cFree++;
1422 }
1423 else
1424 cPrivate++;
1425 }
1426 else if (GMM_PAGE_IS_FREE(&pChunk->aPages[iPage]))
1427 cFree++;
1428 else
1429 cShared++;
1430
1431 gmmR0SelectSetAndLinkChunk(pGMM, pGVM, pChunk);
1432
1433 /*
1434 * Did it add up?
1435 */
1436 if (RT_UNLIKELY( pChunk->cFree != cFree
1437 || pChunk->cPrivate != cPrivate
1438 || pChunk->cShared != cShared))
1439 {
1440 SUPR0Printf("gmmR0CleanupVMScanChunk: Chunk %RKv/%#x has bogus stats - free=%d/%d private=%d/%d shared=%d/%d\n",
1441 pChunk, pChunk->Core.Key, pChunk->cFree, cFree, pChunk->cPrivate, cPrivate, pChunk->cShared, cShared);
1442 pChunk->cFree = cFree;
1443 pChunk->cPrivate = cPrivate;
1444 pChunk->cShared = cShared;
1445 }
1446 }
1447
1448 /*
1449 * If not in bound memory mode, we should reset the hGVM field
1450 * if it has our handle in it.
1451 */
1452 if (pChunk->hGVM == pGVM->hSelf)
1453 {
1454 if (!g_pGMM->fBoundMemoryMode)
1455 pChunk->hGVM = NIL_GVM_HANDLE;
1456 else if (pChunk->cFree != GMM_CHUNK_NUM_PAGES)
1457 {
1458 SUPR0Printf("gmmR0CleanupVMScanChunk: %RKv/%#x: cFree=%#x - it should be 0 in bound mode!\n",
1459 pChunk, pChunk->Core.Key, pChunk->cFree);
1460 AssertMsgFailed(("%p/%#x: cFree=%#x - it should be 0 in bound mode!\n", pChunk, pChunk->Core.Key, pChunk->cFree));
1461
1462 gmmR0UnlinkChunk(pChunk);
1463 pChunk->cFree = GMM_CHUNK_NUM_PAGES;
1464 gmmR0SelectSetAndLinkChunk(pGMM, pGVM, pChunk);
1465 }
1466 }
1467
1468 /*
1469 * Look for a mapping belonging to the terminating VM.
1470 */
1471 GMMR0CHUNKMTXSTATE MtxState;
1472 gmmR0ChunkMutexAcquire(&MtxState, pGMM, pChunk, GMMR0CHUNK_MTX_KEEP_GIANT);
1473 unsigned cMappings = pChunk->cMappingsX;
1474 for (unsigned i = 0; i < cMappings; i++)
1475 if (pChunk->paMappingsX[i].pGVM == pGVM)
1476 {
1477 gmmR0ChunkMutexDropGiant(&MtxState);
1478
1479 RTR0MEMOBJ hMemObj = pChunk->paMappingsX[i].hMapObj;
1480
1481 cMappings--;
1482 if (i < cMappings)
1483 pChunk->paMappingsX[i] = pChunk->paMappingsX[cMappings];
1484 pChunk->paMappingsX[cMappings].pGVM = NULL;
1485 pChunk->paMappingsX[cMappings].hMapObj = NIL_RTR0MEMOBJ;
1486 Assert(pChunk->cMappingsX - 1U == cMappings);
1487 pChunk->cMappingsX = cMappings;
1488
1489 int rc = RTR0MemObjFree(hMemObj, false /* fFreeMappings (NA) */);
1490 if (RT_FAILURE(rc))
1491 {
1492 SUPR0Printf("gmmR0CleanupVMScanChunk: %RKv/%#x: mapping #%x: RTRMemObjFree(%RKv,false) -> %d \n",
1493 pChunk, pChunk->Core.Key, i, hMemObj, rc);
1494 AssertRC(rc);
1495 }
1496
1497 gmmR0ChunkMutexRelease(&MtxState, pChunk);
1498 return true;
1499 }
1500
1501 gmmR0ChunkMutexRelease(&MtxState, pChunk);
1502 return false;
1503}
1504
1505
1506/**
1507 * The initial resource reservations.
1508 *
1509 * This will make memory reservations according to policy and priority. If there aren't
1510 * sufficient resources available to sustain the VM this function will fail and all
1511 * future allocations requests will fail as well.
1512 *
1513 * These are just the initial reservations made very early during the VM creation
1514 * process and will be adjusted later in the GMMR0UpdateReservation call after the
1515 * ring-3 init has completed.
1516 *
1517 * @returns VBox status code.
1518 * @retval VERR_GMM_MEMORY_RESERVATION_DECLINED
1519 * @retval VERR_GMM_
1520 *
1521 * @param pGVM The global (ring-0) VM structure.
1522 * @param idCpu The VCPU id - must be zero.
1523 * @param cBasePages The number of pages that may be allocated for the base RAM and ROMs.
1524 * This does not include MMIO2 and similar.
1525 * @param cShadowPages The number of pages that may be allocated for shadow paging structures.
1526 * @param cFixedPages The number of pages that may be allocated for fixed objects like the
1527 * hyper heap, MMIO2 and similar.
1528 * @param enmPolicy The OC policy to use on this VM.
1529 * @param enmPriority The priority in an out-of-memory situation.
1530 *
1531 * @thread The creator thread / EMT(0).
1532 */
1533GMMR0DECL(int) GMMR0InitialReservation(PGVM pGVM, VMCPUID idCpu, uint64_t cBasePages, uint32_t cShadowPages,
1534 uint32_t cFixedPages, GMMOCPOLICY enmPolicy, GMMPRIORITY enmPriority)
1535{
1536 LogFlow(("GMMR0InitialReservation: pGVM=%p cBasePages=%#llx cShadowPages=%#x cFixedPages=%#x enmPolicy=%d enmPriority=%d\n",
1537 pGVM, cBasePages, cShadowPages, cFixedPages, enmPolicy, enmPriority));
1538
1539 /*
1540 * Validate, get basics and take the semaphore.
1541 */
1542 AssertReturn(idCpu == 0, VERR_INVALID_CPU_ID);
1543 PGMM pGMM;
1544 GMM_GET_VALID_INSTANCE(pGMM, VERR_GMM_INSTANCE);
1545 int rc = GVMMR0ValidateGVMandEMT(pGVM, idCpu);
1546 if (RT_FAILURE(rc))
1547 return rc;
1548
1549 AssertReturn(cBasePages, VERR_INVALID_PARAMETER);
1550 AssertReturn(cShadowPages, VERR_INVALID_PARAMETER);
1551 AssertReturn(cFixedPages, VERR_INVALID_PARAMETER);
1552 AssertReturn(enmPolicy > GMMOCPOLICY_INVALID && enmPolicy < GMMOCPOLICY_END, VERR_INVALID_PARAMETER);
1553 AssertReturn(enmPriority > GMMPRIORITY_INVALID && enmPriority < GMMPRIORITY_END, VERR_INVALID_PARAMETER);
1554
1555 gmmR0MutexAcquire(pGMM);
1556 if (GMM_CHECK_SANITY_UPON_ENTERING(pGMM))
1557 {
1558 if ( !pGVM->gmm.s.Stats.Reserved.cBasePages
1559 && !pGVM->gmm.s.Stats.Reserved.cFixedPages
1560 && !pGVM->gmm.s.Stats.Reserved.cShadowPages)
1561 {
1562 /*
1563 * Check if we can accommodate this.
1564 */
1565 /* ... later ... */
1566 if (RT_SUCCESS(rc))
1567 {
1568 /*
1569 * Update the records.
1570 */
1571 pGVM->gmm.s.Stats.Reserved.cBasePages = cBasePages;
1572 pGVM->gmm.s.Stats.Reserved.cFixedPages = cFixedPages;
1573 pGVM->gmm.s.Stats.Reserved.cShadowPages = cShadowPages;
1574 pGVM->gmm.s.Stats.enmPolicy = enmPolicy;
1575 pGVM->gmm.s.Stats.enmPriority = enmPriority;
1576 pGVM->gmm.s.Stats.fMayAllocate = true;
1577
1578 pGMM->cReservedPages += cBasePages + cFixedPages + cShadowPages;
1579 pGMM->cRegisteredVMs++;
1580 }
1581 }
1582 else
1583 rc = VERR_WRONG_ORDER;
1584 GMM_CHECK_SANITY_UPON_LEAVING(pGMM);
1585 }
1586 else
1587 rc = VERR_GMM_IS_NOT_SANE;
1588 gmmR0MutexRelease(pGMM);
1589 LogFlow(("GMMR0InitialReservation: returns %Rrc\n", rc));
1590 return rc;
1591}
1592
1593
1594/**
1595 * VMMR0 request wrapper for GMMR0InitialReservation.
1596 *
1597 * @returns see GMMR0InitialReservation.
1598 * @param pGVM The global (ring-0) VM structure.
1599 * @param idCpu The VCPU id.
1600 * @param pReq Pointer to the request packet.
1601 */
1602GMMR0DECL(int) GMMR0InitialReservationReq(PGVM pGVM, VMCPUID idCpu, PGMMINITIALRESERVATIONREQ pReq)
1603{
1604 /*
1605 * Validate input and pass it on.
1606 */
1607 AssertPtrReturn(pGVM, VERR_INVALID_POINTER);
1608 AssertPtrReturn(pReq, VERR_INVALID_POINTER);
1609 AssertMsgReturn(pReq->Hdr.cbReq == sizeof(*pReq), ("%#x != %#x\n", pReq->Hdr.cbReq, sizeof(*pReq)), VERR_INVALID_PARAMETER);
1610
1611 return GMMR0InitialReservation(pGVM, idCpu, pReq->cBasePages, pReq->cShadowPages,
1612 pReq->cFixedPages, pReq->enmPolicy, pReq->enmPriority);
1613}
1614
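/*
 * Ring-3 side sketch of issuing this request (illustrative only; the dispatch
 * helper, the operation code and the cGuestRamPages/... variables are
 * assumptions here, see the ring-3 GMM code for the real thing):
 *
 *     GMMINITIALRESERVATIONREQ Req;
 *     Req.Hdr.u32Magic = SUPVMMR0REQHDR_MAGIC;
 *     Req.Hdr.cbReq    = sizeof(Req);
 *     Req.cBasePages   = cGuestRamPages;       // base RAM + ROMs, not MMIO2
 *     Req.cShadowPages = cShadowPages;
 *     Req.cFixedPages  = cFixedPages;
 *     Req.enmPolicy    = GMMOCPOLICY_NO_OC;
 *     Req.enmPriority  = GMMPRIORITY_NORMAL;
 *     int rc = VMMR3CallR0(pVM, VMMR0_DO_GMM_INITIAL_RESERVATION, 0, &Req.Hdr);
 */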
1615
1616/**
1617 * This updates the memory reservation with the additional MMIO2 and ROM pages.
1618 *
1619 * @returns VBox status code.
1620 * @retval VERR_GMM_MEMORY_RESERVATION_DECLINED
1621 *
1622 * @param pGVM The global (ring-0) VM structure.
1623 * @param idCpu The VCPU id.
1624 * @param cBasePages The number of pages that may be allocated for the base RAM and ROMs.
1625 * This does not include MMIO2 and similar.
1626 * @param cShadowPages The number of pages that may be allocated for shadow paging structures.
1627 * @param cFixedPages The number of pages that may be allocated for fixed objects like the
1628 * hyper heap, MMIO2 and similar.
1629 *
1630 * @thread EMT(idCpu)
1631 */
1632GMMR0DECL(int) GMMR0UpdateReservation(PGVM pGVM, VMCPUID idCpu, uint64_t cBasePages,
1633 uint32_t cShadowPages, uint32_t cFixedPages)
1634{
1635 LogFlow(("GMMR0UpdateReservation: pGVM=%p cBasePages=%#llx cShadowPages=%#x cFixedPages=%#x\n",
1636 pGVM, cBasePages, cShadowPages, cFixedPages));
1637
1638 /*
1639 * Validate, get basics and take the semaphore.
1640 */
1641 PGMM pGMM;
1642 GMM_GET_VALID_INSTANCE(pGMM, VERR_GMM_INSTANCE);
1643 int rc = GVMMR0ValidateGVMandEMT(pGVM, idCpu);
1644 if (RT_FAILURE(rc))
1645 return rc;
1646
1647 AssertReturn(cBasePages, VERR_INVALID_PARAMETER);
1648 AssertReturn(cShadowPages, VERR_INVALID_PARAMETER);
1649 AssertReturn(cFixedPages, VERR_INVALID_PARAMETER);
1650
1651 gmmR0MutexAcquire(pGMM);
1652 if (GMM_CHECK_SANITY_UPON_ENTERING(pGMM))
1653 {
1654 if ( pGVM->gmm.s.Stats.Reserved.cBasePages
1655 && pGVM->gmm.s.Stats.Reserved.cFixedPages
1656 && pGVM->gmm.s.Stats.Reserved.cShadowPages)
1657 {
1658 /*
1659 * Check if we can accommodate this.
1660 */
1661 /* ... later ... */
1662 if (RT_SUCCESS(rc))
1663 {
1664 /*
1665 * Update the records.
1666 */
1667 pGMM->cReservedPages -= pGVM->gmm.s.Stats.Reserved.cBasePages
1668 + pGVM->gmm.s.Stats.Reserved.cFixedPages
1669 + pGVM->gmm.s.Stats.Reserved.cShadowPages;
1670 pGMM->cReservedPages += cBasePages + cFixedPages + cShadowPages;
1671
1672 pGVM->gmm.s.Stats.Reserved.cBasePages = cBasePages;
1673 pGVM->gmm.s.Stats.Reserved.cFixedPages = cFixedPages;
1674 pGVM->gmm.s.Stats.Reserved.cShadowPages = cShadowPages;
1675 }
1676 }
1677 else
1678 rc = VERR_WRONG_ORDER;
1679 GMM_CHECK_SANITY_UPON_LEAVING(pGMM);
1680 }
1681 else
1682 rc = VERR_GMM_IS_NOT_SANE;
1683 gmmR0MutexRelease(pGMM);
1684 LogFlow(("GMMR0UpdateReservation: returns %Rrc\n", rc));
1685 return rc;
1686}
1687
1688
1689/**
1690 * VMMR0 request wrapper for GMMR0UpdateReservation.
1691 *
1692 * @returns see GMMR0UpdateReservation.
1693 * @param pGVM The global (ring-0) VM structure.
1694 * @param idCpu The VCPU id.
1695 * @param pReq Pointer to the request packet.
1696 */
1697GMMR0DECL(int) GMMR0UpdateReservationReq(PGVM pGVM, VMCPUID idCpu, PGMMUPDATERESERVATIONREQ pReq)
1698{
1699 /*
1700 * Validate input and pass it on.
1701 */
1702 AssertPtrReturn(pReq, VERR_INVALID_POINTER);
1703 AssertMsgReturn(pReq->Hdr.cbReq == sizeof(*pReq), ("%#x != %#x\n", pReq->Hdr.cbReq, sizeof(*pReq)), VERR_INVALID_PARAMETER);
1704
1705 return GMMR0UpdateReservation(pGVM, idCpu, pReq->cBasePages, pReq->cShadowPages, pReq->cFixedPages);
1706}
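
/*
 * Minimal caller-side sketch (illustrative only, not part of this build): the
 * request is fixed size, so Hdr.cbReq is simply sizeof(*pReq). Header magic
 * setup and the actual ring-3 -> ring-0 dispatch are omitted, and the page
 * counts are hypothetical locals; note that all three counts must be non-zero,
 * as validated in GMMR0UpdateReservation above.
 *
 * @code
 *      GMMUPDATERESERVATIONREQ Req;
 *      Req.Hdr.cbReq    = sizeof(Req);
 *      Req.cBasePages   = cGuestRamPages + cRomPages;  // hypothetical counts
 *      Req.cShadowPages = cShadowPages;                // hypothetical count
 *      Req.cFixedPages  = cFixedPages;                 // hypothetical count
 *      // ... pass &Req to GMMR0UpdateReservationReq via the VMMR0 request path ...
 * @endcode
 */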
1707
1708#ifdef GMMR0_WITH_SANITY_CHECK
1709
1710/**
1711 * Performs sanity checks on a free set.
1712 *
1713 * @returns Error count.
1714 *
1715 * @param pGMM Pointer to the GMM instance.
1716 * @param pSet Pointer to the set.
1717 * @param pszSetName The set name.
1718 * @param pszFunction The function from which it was called.
1719 * @param uLineNo The line number.
1720 */
1721static uint32_t gmmR0SanityCheckSet(PGMM pGMM, PGMMCHUNKFREESET pSet, const char *pszSetName,
1722 const char *pszFunction, unsigned uLineNo)
1723{
1724 uint32_t cErrors = 0;
1725
1726 /*
1727 * Count the free pages in all the chunks and match it against pSet->cFreePages.
1728 */
1729 uint32_t cPages = 0;
1730 for (unsigned i = 0; i < RT_ELEMENTS(pSet->apLists); i++)
1731 {
1732 for (PGMMCHUNK pCur = pSet->apLists[i]; pCur; pCur = pCur->pFreeNext)
1733 {
1734 /** @todo check that the chunk is hashed into the right set. */
1735 cPages += pCur->cFree;
1736 }
1737 }
1738 if (RT_UNLIKELY(cPages != pSet->cFreePages))
1739 {
1740 SUPR0Printf("GMM insanity: found %#x pages in the %s set, expected %#x. (%s, line %u)\n",
1741 cPages, pszSetName, pSet->cFreePages, pszFunction, uLineNo);
1742 cErrors++;
1743 }
1744
1745 return cErrors;
1746}
1747
1748
1749/**
1750 * Performs some sanity checks on the GMM while owning the lock.
1751 *
1752 * @returns Error count.
1753 *
1754 * @param pGMM Pointer to the GMM instance.
1755 * @param pszFunction The function from which it is called.
1756 * @param uLineNo The line number.
1757 */
1758static uint32_t gmmR0SanityCheck(PGMM pGMM, const char *pszFunction, unsigned uLineNo)
1759{
1760 uint32_t cErrors = 0;
1761
1762 cErrors += gmmR0SanityCheckSet(pGMM, &pGMM->PrivateX, "private", pszFunction, uLineNo);
1763 cErrors += gmmR0SanityCheckSet(pGMM, &pGMM->Shared, "shared", pszFunction, uLineNo);
1764 /** @todo add more sanity checks. */
1765
1766 return cErrors;
1767}
1768
1769#endif /* GMMR0_WITH_SANITY_CHECK */
1770
1771/**
1772 * Looks up a chunk in the tree and fills in the TLB entry for it.
1773 *
1774 * This is not expected to fail and will bitch if it does.
1775 *
1776 * @returns Pointer to the allocation chunk, NULL if not found.
1777 * @param pGMM Pointer to the GMM instance.
1778 * @param idChunk The ID of the chunk to find.
1779 * @param pTlbe Pointer to the TLB entry.
1780 *
1781 * @note Caller owns spinlock.
1782 */
1783static PGMMCHUNK gmmR0GetChunkSlow(PGMM pGMM, uint32_t idChunk, PGMMCHUNKTLBE pTlbe)
1784{
1785 PGMMCHUNK pChunk = (PGMMCHUNK)RTAvlU32Get(&pGMM->pChunks, idChunk);
1786 AssertMsgReturn(pChunk, ("Chunk %#x not found!\n", idChunk), NULL);
1787 pTlbe->idChunk = idChunk;
1788 pTlbe->pChunk = pChunk;
1789 return pChunk;
1790}
1791
1792
1793/**
1794 * Finds an allocation chunk, spin-locked.
1795 *
1796 * This is not expected to fail and will bitch if it does.
1797 *
1798 * @returns Pointer to the allocation chunk, NULL if not found.
1799 * @param pGMM Pointer to the GMM instance.
1800 * @param idChunk The ID of the chunk to find.
1801 */
1802DECLINLINE(PGMMCHUNK) gmmR0GetChunkLocked(PGMM pGMM, uint32_t idChunk)
1803{
1804 /*
1805 * Do a TLB lookup, branch if not in the TLB.
1806 */
1807 PGMMCHUNKTLBE pTlbe = &pGMM->ChunkTLB.aEntries[GMM_CHUNKTLB_IDX(idChunk)];
1808 PGMMCHUNK pChunk = pTlbe->pChunk;
1809 if ( pChunk == NULL
1810 || pTlbe->idChunk != idChunk)
1811 pChunk = gmmR0GetChunkSlow(pGMM, idChunk, pTlbe);
1812 return pChunk;
1813}
1814
1815
1816/**
1817 * Finds an allocation chunk.
1818 *
1819 * This is not expected to fail and will bitch if it does.
1820 *
1821 * @returns Pointer to the allocation chunk, NULL if not found.
1822 * @param pGMM Pointer to the GMM instance.
1823 * @param idChunk The ID of the chunk to find.
1824 */
1825DECLINLINE(PGMMCHUNK) gmmR0GetChunk(PGMM pGMM, uint32_t idChunk)
1826{
1827 RTSpinlockAcquire(pGMM->hSpinLockTree);
1828 PGMMCHUNK pChunk = gmmR0GetChunkLocked(pGMM, idChunk);
1829 RTSpinlockRelease(pGMM->hSpinLockTree);
1830 return pChunk;
1831}
1832
1833
1834/**
1835 * Finds a page.
1836 *
1837 * This is not expected to fail and will bitch if it does.
1838 *
1839 * @returns Pointer to the page, NULL if not found.
1840 * @param pGMM Pointer to the GMM instance.
1841 * @param idPage The ID of the page to find.
1842 */
1843DECLINLINE(PGMMPAGE) gmmR0GetPage(PGMM pGMM, uint32_t idPage)
1844{
1845 PGMMCHUNK pChunk = gmmR0GetChunk(pGMM, idPage >> GMM_CHUNKID_SHIFT);
1846 if (RT_LIKELY(pChunk))
1847 return &pChunk->aPages[idPage & GMM_PAGEID_IDX_MASK];
1848 return NULL;
1849}
1850
1851
1852#if 0 /* unused */
1853/**
1854 * Gets the host physical address for a page given by its ID.
1855 *
1856 * @returns The host physical address or NIL_RTHCPHYS.
1857 * @param pGMM Pointer to the GMM instance.
1858 * @param idPage The ID of the page to find.
1859 */
1860DECLINLINE(RTHCPHYS) gmmR0GetPageHCPhys(PGMM pGMM, uint32_t idPage)
1861{
1862 PGMMCHUNK pChunk = gmmR0GetChunk(pGMM, idPage >> GMM_CHUNKID_SHIFT);
1863 if (RT_LIKELY(pChunk))
1864 return RTR0MemObjGetPagePhysAddr(pChunk->hMemObj, idPage & GMM_PAGEID_IDX_MASK);
1865 return NIL_RTHCPHYS;
1866}
1867#endif /* unused */
1868
1869
1870/**
1871 * Selects the appropriate free list given the number of free pages.
1872 *
1873 * @returns Free list index.
1874 * @param cFree The number of free pages in the chunk.
1875 */
1876DECLINLINE(unsigned) gmmR0SelectFreeSetList(unsigned cFree)
1877{
1878 unsigned iList = cFree >> GMM_CHUNK_FREE_SET_SHIFT;
1879 AssertMsg(iList < RT_SIZEOFMEMB(GMMCHUNKFREESET, apLists) / RT_SIZEOFMEMB(GMMCHUNKFREESET, apLists[0]),
1880 ("%d (%u)\n", iList, cFree));
1881 return iList;
1882}
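
/*
 * Illustrative sketch (not compiled): the list index is simply
 * cFree >> GMM_CHUNK_FREE_SET_SHIFT, so chunks with only a few free pages
 * cluster on the low-numbered lists, while a completely free chunk lands on
 * the highest index (presumably the GMM_CHUNK_FREE_SET_UNUSED_LIST slot that
 * the empty-chunk scan further down picks from).
 *
 * @code
 *      // Assuming GMM_CHUNK_FREE_SET_SHIFT >= 1:
 *      unsigned const iListFew  = gmmR0SelectFreeSetList(1);                   // -> 0
 *      unsigned const iListFull = gmmR0SelectFreeSetList(GMM_CHUNK_NUM_PAGES); // -> highest index
 * @endcode
 */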
1883
1884
1885/**
1886 * Unlinks the chunk from the free list it's currently on (if any).
1887 *
1888 * @param pChunk The allocation chunk.
1889 */
1890DECLINLINE(void) gmmR0UnlinkChunk(PGMMCHUNK pChunk)
1891{
1892 PGMMCHUNKFREESET pSet = pChunk->pSet;
1893 if (RT_LIKELY(pSet))
1894 {
1895 pSet->cFreePages -= pChunk->cFree;
1896 pSet->idGeneration++;
1897
1898 PGMMCHUNK pPrev = pChunk->pFreePrev;
1899 PGMMCHUNK pNext = pChunk->pFreeNext;
1900 if (pPrev)
1901 pPrev->pFreeNext = pNext;
1902 else
1903 pSet->apLists[gmmR0SelectFreeSetList(pChunk->cFree)] = pNext;
1904 if (pNext)
1905 pNext->pFreePrev = pPrev;
1906
1907 pChunk->pSet = NULL;
1908 pChunk->pFreeNext = NULL;
1909 pChunk->pFreePrev = NULL;
1910 }
1911 else
1912 {
1913 Assert(!pChunk->pFreeNext);
1914 Assert(!pChunk->pFreePrev);
1915 Assert(!pChunk->cFree);
1916 }
1917}
1918
1919
1920/**
1921 * Links the chunk onto the appropriate free list in the specified free set.
1922 *
1923 * If the chunk has no free entries, it's not linked into any list.
1924 *
1925 * @param pChunk The allocation chunk.
1926 * @param pSet The free set.
1927 */
1928DECLINLINE(void) gmmR0LinkChunk(PGMMCHUNK pChunk, PGMMCHUNKFREESET pSet)
1929{
1930 Assert(!pChunk->pSet);
1931 Assert(!pChunk->pFreeNext);
1932 Assert(!pChunk->pFreePrev);
1933
1934 if (pChunk->cFree > 0)
1935 {
1936 pChunk->pSet = pSet;
1937 pChunk->pFreePrev = NULL;
1938 unsigned const iList = gmmR0SelectFreeSetList(pChunk->cFree);
1939 pChunk->pFreeNext = pSet->apLists[iList];
1940 if (pChunk->pFreeNext)
1941 pChunk->pFreeNext->pFreePrev = pChunk;
1942 pSet->apLists[iList] = pChunk;
1943
1944 pSet->cFreePages += pChunk->cFree;
1945 pSet->idGeneration++;
1946 }
1947}
1948
1949
1950/**
1951 * Selects the appropriate free set for the chunk and links it onto the right free list there.
1952 *
1953 * If the chunk has no free entries, it's not linked into any list.
1954 *
1955 * @param pGMM Pointer to the GMM instance.
1956 * @param pGVM Pointer to the kernel-only VM instance data.
1957 * @param pChunk The allocation chunk.
1958 */
1959DECLINLINE(void) gmmR0SelectSetAndLinkChunk(PGMM pGMM, PGVM pGVM, PGMMCHUNK pChunk)
1960{
1961 PGMMCHUNKFREESET pSet;
1962 if (pGMM->fBoundMemoryMode)
1963 pSet = &pGVM->gmm.s.Private;
1964 else if (pChunk->cShared)
1965 pSet = &pGMM->Shared;
1966 else
1967 pSet = &pGMM->PrivateX;
1968 gmmR0LinkChunk(pChunk, pSet);
1969}
1970
1971
1972/**
1973 * Frees a Chunk ID.
1974 *
1975 * @param pGMM Pointer to the GMM instance.
1976 * @param idChunk The Chunk ID to free.
1977 */
1978static void gmmR0FreeChunkId(PGMM pGMM, uint32_t idChunk)
1979{
1980 AssertReturnVoid(idChunk != NIL_GMM_CHUNKID);
1981 AssertMsg(ASMBitTest(&pGMM->bmChunkId[0], idChunk), ("%#x\n", idChunk));
1982 ASMAtomicBitClear(&pGMM->bmChunkId[0], idChunk);
1983}
1984
1985
1986/**
1987 * Allocates a new Chunk ID.
1988 *
1989 * @returns The Chunk ID.
1990 * @param pGMM Pointer to the GMM instance.
1991 */
1992static uint32_t gmmR0AllocateChunkId(PGMM pGMM)
1993{
1994 AssertCompile(!((GMM_CHUNKID_LAST + 1) & 31)); /* must be a multiple of 32 */
1995 AssertCompile(NIL_GMM_CHUNKID == 0);
1996
1997 /*
1998 * Try the next sequential one.
1999 */
2000 int32_t idChunk = ++pGMM->idChunkPrev;
2001 if ( (uint32_t)idChunk <= GMM_CHUNKID_LAST
2002 && idChunk > NIL_GMM_CHUNKID
2003 && !ASMAtomicBitTestAndSet(&pGMM->bmChunkId[0], idChunk))
2004 return idChunk;
2005
2006 /*
2007 * Scan sequentially from the last one.
2008 */
2009 if ( (uint32_t)idChunk < GMM_CHUNKID_LAST
2010 && idChunk > NIL_GMM_CHUNKID)
2011 {
2012 idChunk = ASMBitNextClear(&pGMM->bmChunkId[0], GMM_CHUNKID_LAST + 1, idChunk - 1);
2013 if (idChunk > NIL_GMM_CHUNKID)
2014 {
2015 AssertMsgReturn(!ASMAtomicBitTestAndSet(&pGMM->bmChunkId[0], idChunk), ("%#x\n", idChunk), NIL_GMM_CHUNKID);
2016 return pGMM->idChunkPrev = idChunk;
2017 }
2018 }
2019
2020 /*
2021 * Ok, scan from the start.
2022 * We're not racing anyone, so there is no need to expect failures or have restart loops.
2023 */
2024 idChunk = ASMBitFirstClear(&pGMM->bmChunkId[0], GMM_CHUNKID_LAST + 1);
2025 AssertMsgReturn(idChunk > NIL_GMM_CHUNKID, ("%#x\n", idChunk), NIL_GMM_CHUNKID);
2026 AssertMsgReturn(!ASMAtomicBitTestAndSet(&pGMM->bmChunkId[0], idChunk), ("%#x\n", idChunk), NIL_GMM_CHUNKID);
2027
2028 return pGMM->idChunkPrev = idChunk;
2029}
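
/*
 * Illustrative pairing (sketch only): an ID handed out here must eventually
 * be returned with gmmR0FreeChunkId, otherwise its bit stays set in bmChunkId
 * and the ID is leaked for the lifetime of the GMM instance.
 *
 * @code
 *      uint32_t const idChunk = gmmR0AllocateChunkId(pGMM);
 *      if (idChunk != NIL_GMM_CHUNKID)
 *      {
 *          // ... normally used as the AVL key: pChunk->Core.Key = idChunk; ...
 *          gmmR0FreeChunkId(pGMM, idChunk);
 *      }
 * @endcode
 */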
2030
2031
2032/**
2033 * Allocates one private page.
2034 *
2035 * Worker for gmmR0AllocatePagesFromChunk.
2036 *
2037 * @param pChunk The chunk to allocate it from.
2038 * @param hGVM The GVM handle of the VM requesting memory.
2039 * @param pPageDesc The page descriptor.
2040 */
2041static void gmmR0AllocatePage(PGMMCHUNK pChunk, uint32_t hGVM, PGMMPAGEDESC pPageDesc)
2042{
2043 /* update the chunk stats. */
2044 if (pChunk->hGVM == NIL_GVM_HANDLE)
2045 pChunk->hGVM = hGVM;
2046 Assert(pChunk->cFree);
2047 pChunk->cFree--;
2048 pChunk->cPrivate++;
2049
2050 /* unlink the first free page. */
2051 const uint32_t iPage = pChunk->iFreeHead;
2052 AssertReleaseMsg(iPage < RT_ELEMENTS(pChunk->aPages), ("%d\n", iPage));
2053 PGMMPAGE pPage = &pChunk->aPages[iPage];
2054 Assert(GMM_PAGE_IS_FREE(pPage));
2055 pChunk->iFreeHead = pPage->Free.iNext;
2056 Log3(("A pPage=%p iPage=%#x/%#x u2State=%d iFreeHead=%#x iNext=%#x\n",
2057 pPage, iPage, (pChunk->Core.Key << GMM_CHUNKID_SHIFT) | iPage,
2058 pPage->Common.u2State, pChunk->iFreeHead, pPage->Free.iNext));
2059
2060 bool const fZeroed = pPage->Free.fZeroed;
2061
2062 /* make the page private. */
2063 pPage->u = 0;
2064 AssertCompile(GMM_PAGE_STATE_PRIVATE == 0);
2065 pPage->Private.hGVM = hGVM;
2066 AssertCompile(NIL_RTHCPHYS >= GMM_GCPHYS_LAST);
2067 AssertCompile(GMM_GCPHYS_UNSHAREABLE >= GMM_GCPHYS_LAST);
2068 if (pPageDesc->HCPhysGCPhys <= GMM_GCPHYS_LAST)
2069 pPage->Private.pfn = pPageDesc->HCPhysGCPhys >> PAGE_SHIFT;
2070 else
2071 pPage->Private.pfn = GMM_PAGE_PFN_UNSHAREABLE; /* unshareable / unassigned - same thing. */
2072
2073 /* update the page descriptor. */
2074 pPageDesc->idSharedPage = NIL_GMM_PAGEID;
2075 pPageDesc->idPage = (pChunk->Core.Key << GMM_CHUNKID_SHIFT) | iPage;
2076 RTHCPHYS const HCPhys = RTR0MemObjGetPagePhysAddr(pChunk->hMemObj, iPage);
2077 Assert(HCPhys != NIL_RTHCPHYS); Assert(HCPhys < NIL_GMMPAGEDESC_PHYS);
2078 pPageDesc->HCPhysGCPhys = HCPhys;
2079 pPageDesc->fZeroed = fZeroed;
2080}
2081
2082
2083/**
2084 * Picks the free pages from a chunk.
2085 *
2086 * @returns The new page descriptor table index.
2087 * @param pChunk The chunk.
2088 * @param hGVM The affinity of the chunk. NIL_GVM_HANDLE for no
2089 * affinity.
2090 * @param iPage The current page descriptor table index.
2091 * @param cPages The total number of pages to allocate.
2092 * @param paPages The page descriptor table (input + output).
2093 */
2094static uint32_t gmmR0AllocatePagesFromChunk(PGMMCHUNK pChunk, uint16_t const hGVM, uint32_t iPage, uint32_t cPages,
2095 PGMMPAGEDESC paPages)
2096{
2097 PGMMCHUNKFREESET pSet = pChunk->pSet; Assert(pSet);
2098 gmmR0UnlinkChunk(pChunk);
2099
2100 for (; pChunk->cFree && iPage < cPages; iPage++)
2101 gmmR0AllocatePage(pChunk, hGVM, &paPages[iPage]);
2102
2103 gmmR0LinkChunk(pChunk, pSet);
2104 return iPage;
2105}
2106
2107
2108/**
2109 * Registers a new chunk of memory.
2110 *
2111 * This is called by gmmR0AllocateChunkNew and GMMR0AllocateLargePage.
2112 *
2113 * In the GMMR0AllocateLargePage case the GMM_CHUNK_FLAGS_LARGE_PAGE flag is
2114 * set and the chunk will be registered as fully allocated to save time.
2115 *
2116 * @returns VBox status code. On success, the giant GMM lock will be held, the
2117 * caller must release it (ugly).
2118 * @param pGMM Pointer to the GMM instance.
2119 * @param pSet Pointer to the set.
2120 * @param hMemObj The memory object for the chunk.
2121 * @param hGVM The affinity of the chunk. NIL_GVM_HANDLE for no
2122 * affinity.
2123 * @param pSession The session of the VM identified by @a hGVM (used for chunk ownership).
2124 * @param fChunkFlags The chunk flags, GMM_CHUNK_FLAGS_XXX.
2125 * @param ppChunk Chunk address (out).
2126 *
2127 * @remarks The caller must not own the giant GMM mutex.
2128 * The giant GMM mutex will be acquired and returned acquired in
2129 * the success path. On failure, no locks will be held.
2130 */
2131static int gmmR0RegisterChunk(PGMM pGMM, PGMMCHUNKFREESET pSet, RTR0MEMOBJ hMemObj, uint16_t hGVM, PSUPDRVSESSION pSession,
2132 uint16_t fChunkFlags, PGMMCHUNK *ppChunk)
2133{
2134 Assert(pGMM->hMtxOwner != RTThreadNativeSelf());
2135 Assert(hGVM != NIL_GVM_HANDLE || pGMM->fBoundMemoryMode);
2136 Assert(fChunkFlags == 0 || fChunkFlags == GMM_CHUNK_FLAGS_LARGE_PAGE);
2137
2138#ifndef VBOX_WITH_LINEAR_HOST_PHYS_MEM
2139 /*
2140 * Get a ring-0 mapping of the object.
2141 */
2142 uint8_t *pbMapping = (uint8_t *)RTR0MemObjAddress(hMemObj);
2143 if (!pbMapping)
2144 {
2145 RTR0MEMOBJ hMapObj;
2146 int rc = RTR0MemObjMapKernel(&hMapObj, hMemObj, (void *)-1, 0, RTMEM_PROT_READ | RTMEM_PROT_WRITE);
2147 if (RT_SUCCESS(rc))
2148 pbMapping = (uint8_t *)RTR0MemObjAddress(hMapObj);
2149 else
2150 return rc;
2151 AssertPtr(pbMapping);
2152 }
2153#endif
2154
2155 /*
2156 * Allocate a chunk.
2157 */
2158 int rc;
2159 PGMMCHUNK pChunk = (PGMMCHUNK)RTMemAllocZ(sizeof(*pChunk));
2160 if (pChunk)
2161 {
2162 /*
2163 * Initialize it.
2164 */
2165 pChunk->hMemObj = hMemObj;
2166#ifndef VBOX_WITH_LINEAR_HOST_PHYS_MEM
2167 pChunk->pbMapping = pbMapping;
2168#endif
2169 pChunk->hGVM = hGVM;
2170 pChunk->idNumaNode = gmmR0GetCurrentNumaNodeId();
2171 pChunk->iChunkMtx = UINT8_MAX;
2172 pChunk->fFlags = fChunkFlags;
2173 pChunk->uidOwner = pSession ? SUPR0GetSessionUid(pSession) : NIL_RTUID;
2174 /*pChunk->cShared = 0; */
2175
2176 if (!(fChunkFlags & GMM_CHUNK_FLAGS_LARGE_PAGE))
2177 {
2178 /* Queue all pages on the free list. */
2179 pChunk->cFree = GMM_CHUNK_NUM_PAGES;
2180 /*pChunk->cPrivate = 0; */
2181 /*pChunk->iFreeHead = 0;*/
2182
2183 for (unsigned iPage = 0; iPage < RT_ELEMENTS(pChunk->aPages) - 1; iPage++)
2184 {
2185 pChunk->aPages[iPage].Free.u2State = GMM_PAGE_STATE_FREE;
2186 pChunk->aPages[iPage].Free.fZeroed = true;
2187 pChunk->aPages[iPage].Free.iNext = iPage + 1;
2188 }
2189 pChunk->aPages[RT_ELEMENTS(pChunk->aPages) - 1].Free.u2State = GMM_PAGE_STATE_FREE;
2190 pChunk->aPages[RT_ELEMENTS(pChunk->aPages) - 1].Free.fZeroed = true;
2191 pChunk->aPages[RT_ELEMENTS(pChunk->aPages) - 1].Free.iNext = UINT16_MAX;
2192 }
2193 else
2194 {
2195 /* Mark all pages as privately allocated (watered down gmmR0AllocatePage). */
2196 pChunk->cFree = 0;
2197 pChunk->cPrivate = GMM_CHUNK_NUM_PAGES;
2198 pChunk->iFreeHead = UINT16_MAX;
2199
2200 for (unsigned iPage = 0; iPage < RT_ELEMENTS(pChunk->aPages); iPage++)
2201 {
2202 pChunk->aPages[iPage].Private.pfn = GMM_PAGE_PFN_UNSHAREABLE;
2203 pChunk->aPages[iPage].Private.hGVM = hGVM;
2204 pChunk->aPages[iPage].Private.u2State = GMM_PAGE_STATE_PRIVATE;
2205 }
2206 }
2207
2208 /*
2209 * Zero the memory if it wasn't zeroed by the host already.
2210 * This simplifies keeping secret kernel bits from userland and brings
2211 * everyone to the same level wrt allocation zeroing.
2212 */
2213 rc = VINF_SUCCESS;
2214 if (!RTR0MemObjWasZeroInitialized(hMemObj))
2215 {
2216#ifdef VBOX_WITH_LINEAR_HOST_PHYS_MEM
2217 for (uint32_t iPage = 0; iPage < (GMM_CHUNK_SIZE >> PAGE_SHIFT); iPage++)
2218 {
2219 void *pvPage = NULL;
2220 rc = SUPR0HCPhysToVirt(RTR0MemObjGetPagePhysAddr(hMemObj, iPage), &pvPage);
2221 AssertRC(rc);
2222 if (RT_SUCCESS(rc))
2223 RT_BZERO(pvPage, PAGE_SIZE);
2224 else
2225 break;
2226 }
2227#else
2228 RT_BZERO(pbMapping, GMM_CHUNK_SIZE);
2229#endif
2230 }
2231 if (RT_SUCCESS(rc))
2232 {
2233 *ppChunk = pChunk;
2234
2235 /*
2236 * Allocate a Chunk ID and insert it into the tree.
2237 * This has to be done behind the mutex of course.
2238 */
2239 rc = gmmR0MutexAcquire(pGMM);
2240 if (RT_SUCCESS(rc))
2241 {
2242 if (GMM_CHECK_SANITY_UPON_ENTERING(pGMM))
2243 {
2244 pChunk->Core.Key = gmmR0AllocateChunkId(pGMM);
2245 if ( pChunk->Core.Key != NIL_GMM_CHUNKID
2246 && pChunk->Core.Key <= GMM_CHUNKID_LAST)
2247 {
2248 RTSpinlockAcquire(pGMM->hSpinLockTree);
2249 if (RTAvlU32Insert(&pGMM->pChunks, &pChunk->Core))
2250 {
2251 pGMM->cChunks++;
2252 RTListAppend(&pGMM->ChunkList, &pChunk->ListNode);
2253 RTSpinlockRelease(pGMM->hSpinLockTree);
2254
2255 gmmR0LinkChunk(pChunk, pSet);
2256
2257 LogFlow(("gmmR0RegisterChunk: pChunk=%p id=%#x cChunks=%d\n", pChunk, pChunk->Core.Key, pGMM->cChunks));
2258
2259 GMM_CHECK_SANITY_UPON_LEAVING(pGMM);
2260 return VINF_SUCCESS;
2261 }
2262 RTSpinlockRelease(pGMM->hSpinLockTree);
2263 }
2264
2265 /*
2266 * Bail out.
2267 */
2268 rc = VERR_GMM_CHUNK_INSERT;
2269 }
2270 else
2271 rc = VERR_GMM_IS_NOT_SANE;
2272 gmmR0MutexRelease(pGMM);
2273 }
2274
2275 *ppChunk = NULL;
2276 }
2277 RTMemFree(pChunk);
2278 }
2279 else
2280 rc = VERR_NO_MEMORY;
2281 return rc;
2282}
2283
2284
2285/**
2286 * Allocates a new chunk, immediately picks the requested pages from it, and adds
2287 * what's remaining to the specified free set.
2288 *
2289 * @note This will leave the giant mutex while allocating the new chunk!
2290 *
2291 * @returns VBox status code.
2292 * @param pGMM Pointer to the GMM instance data.
2293 * @param pGVM Pointer to the kernel-only VM instance data.
2294 * @param pSet Pointer to the free set.
2295 * @param cPages The number of pages requested.
2296 * @param paPages The page descriptor table (input + output).
2297 * @param piPage The pointer to the page descriptor table index variable.
2298 * This will be updated.
2299 */
2300static int gmmR0AllocateChunkNew(PGMM pGMM, PGVM pGVM, PGMMCHUNKFREESET pSet, uint32_t cPages,
2301 PGMMPAGEDESC paPages, uint32_t *piPage)
2302{
2303 gmmR0MutexRelease(pGMM);
2304
2305 RTR0MEMOBJ hMemObj;
2306 int rc;
2307#ifdef VBOX_WITH_LINEAR_HOST_PHYS_MEM
2308 if (pGMM->fHasWorkingAllocPhysNC)
2309 rc = RTR0MemObjAllocPhysNC(&hMemObj, GMM_CHUNK_SIZE, NIL_RTHCPHYS);
2310 else
2311#endif
2312 rc = RTR0MemObjAllocPage(&hMemObj, GMM_CHUNK_SIZE, false /*fExecutable*/);
2313 if (RT_SUCCESS(rc))
2314 {
2315 /** @todo Duplicate gmmR0RegisterChunk here so we can avoid chaining up the
2316 * free pages first and then unchaining them right afterwards. Instead
2317 * do as much work as possible without holding the giant lock. */
2318 PGMMCHUNK pChunk;
2319 rc = gmmR0RegisterChunk(pGMM, pSet, hMemObj, pGVM->hSelf, pGVM->pSession, 0 /*fChunkFlags*/, &pChunk);
2320 if (RT_SUCCESS(rc))
2321 {
2322 *piPage = gmmR0AllocatePagesFromChunk(pChunk, pGVM->hSelf, *piPage, cPages, paPages);
2323 return VINF_SUCCESS;
2324 }
2325
2326 /* bail out */
2327 RTR0MemObjFree(hMemObj, true /* fFreeMappings */);
2328 }
2329
2330 int rc2 = gmmR0MutexAcquire(pGMM);
2331 AssertRCReturn(rc2, RT_FAILURE(rc) ? rc : rc2);
2332 return rc;
2333
2334}
2335
2336
2337/**
2338 * As a last resort we'll pick any page we can get.
2339 *
2340 * @returns The new page descriptor table index.
2341 * @param pSet The set to pick from.
2342 * @param pGVM Pointer to the global VM structure.
2343 * @param uidSelf The UID of the caller.
2344 * @param iPage The current page descriptor table index.
2345 * @param cPages The total number of pages to allocate.
2346 * @param paPages The page descriptor table (input + output).
2347 */
2348static uint32_t gmmR0AllocatePagesIndiscriminately(PGMMCHUNKFREESET pSet, PGVM pGVM, RTUID uidSelf,
2349 uint32_t iPage, uint32_t cPages, PGMMPAGEDESC paPages)
2350{
2351 unsigned iList = RT_ELEMENTS(pSet->apLists);
2352 while (iList-- > 0)
2353 {
2354 PGMMCHUNK pChunk = pSet->apLists[iList];
2355 while (pChunk)
2356 {
2357 PGMMCHUNK pNext = pChunk->pFreeNext;
2358 if ( pChunk->uidOwner == uidSelf
2359 || ( pChunk->cMappingsX == 0
2360 && pChunk->cFree == (GMM_CHUNK_SIZE >> PAGE_SHIFT)))
2361 {
2362 iPage = gmmR0AllocatePagesFromChunk(pChunk, pGVM->hSelf, iPage, cPages, paPages);
2363 if (iPage >= cPages)
2364 return iPage;
2365 }
2366
2367 pChunk = pNext;
2368 }
2369 }
2370 return iPage;
2371}
2372
2373
2374/**
2375 * Pick pages from empty chunks on the same NUMA node.
2376 *
2377 * @returns The new page descriptor table index.
2378 * @param pSet The set to pick from.
2379 * @param pGVM Pointer to the global VM structure.
2380 * @param uidSelf The UID of the caller.
2381 * @param iPage The current page descriptor table index.
2382 * @param cPages The total number of pages to allocate.
2383 * @param paPages The page descriptor table (input + output).
2384 */
2385static uint32_t gmmR0AllocatePagesFromEmptyChunksOnSameNode(PGMMCHUNKFREESET pSet, PGVM pGVM, RTUID uidSelf,
2386 uint32_t iPage, uint32_t cPages, PGMMPAGEDESC paPages)
2387{
2388 PGMMCHUNK pChunk = pSet->apLists[GMM_CHUNK_FREE_SET_UNUSED_LIST];
2389 if (pChunk)
2390 {
2391 uint16_t const idNumaNode = gmmR0GetCurrentNumaNodeId();
2392 while (pChunk)
2393 {
2394 PGMMCHUNK pNext = pChunk->pFreeNext;
2395
2396 if ( pChunk->idNumaNode == idNumaNode
2397 && ( pChunk->uidOwner == uidSelf
2398 || pChunk->cMappingsX == 0))
2399 {
2400 pChunk->hGVM = pGVM->hSelf;
2401 pChunk->uidOwner = uidSelf;
2402 iPage = gmmR0AllocatePagesFromChunk(pChunk, pGVM->hSelf, iPage, cPages, paPages);
2403 if (iPage >= cPages)
2404 {
2405 pGVM->gmm.s.idLastChunkHint = pChunk->cFree ? pChunk->Core.Key : NIL_GMM_CHUNKID;
2406 return iPage;
2407 }
2408 }
2409
2410 pChunk = pNext;
2411 }
2412 }
2413 return iPage;
2414}
2415
2416
2417/**
2418 * Pick pages from non-empty chunks on the same NUMA node.
2419 *
2420 * @returns The new page descriptor table index.
2421 * @param pSet The set to pick from.
2422 * @param pGVM Pointer to the global VM structure.
2423 * @param uidSelf The UID of the caller.
2424 * @param iPage The current page descriptor table index.
2425 * @param cPages The total number of pages to allocate.
2426 * @param paPages The page descriptor table (input + output).
2427 */
2428static uint32_t gmmR0AllocatePagesFromSameNode(PGMMCHUNKFREESET pSet, PGVM pGVM, RTUID const uidSelf,
2429 uint32_t iPage, uint32_t cPages, PGMMPAGEDESC paPages)
2430{
2431 /** @todo start by picking from chunks with about the right size first? */
2432 uint16_t const idNumaNode = gmmR0GetCurrentNumaNodeId();
2433 unsigned iList = GMM_CHUNK_FREE_SET_UNUSED_LIST;
2434 while (iList-- > 0)
2435 {
2436 PGMMCHUNK pChunk = pSet->apLists[iList];
2437 while (pChunk)
2438 {
2439 PGMMCHUNK pNext = pChunk->pFreeNext;
2440
2441 if ( pChunk->idNumaNode == idNumaNode
2442 && pChunk->uidOwner == uidSelf)
2443 {
2444 iPage = gmmR0AllocatePagesFromChunk(pChunk, pGVM->hSelf, iPage, cPages, paPages);
2445 if (iPage >= cPages)
2446 {
2447 pGVM->gmm.s.idLastChunkHint = pChunk->cFree ? pChunk->Core.Key : NIL_GMM_CHUNKID;
2448 return iPage;
2449 }
2450 }
2451
2452 pChunk = pNext;
2453 }
2454 }
2455 return iPage;
2456}
2457
2458
2459/**
2460 * Pick pages that are in chunks already associated with the VM.
2461 *
2462 * @returns The new page descriptor table index.
2463 * @param pGMM Pointer to the GMM instance data.
2464 * @param pGVM Pointer to the global VM structure.
2465 * @param pSet The set to pick from.
2466 * @param iPage The current page descriptor table index.
2467 * @param cPages The total number of pages to allocate.
2468 * @param paPages The page descriptor table (input + output).
2469 */
2470static uint32_t gmmR0AllocatePagesAssociatedWithVM(PGMM pGMM, PGVM pGVM, PGMMCHUNKFREESET pSet,
2471 uint32_t iPage, uint32_t cPages, PGMMPAGEDESC paPages)
2472{
2473 uint16_t const hGVM = pGVM->hSelf;
2474
2475 /* Hint. */
2476 if (pGVM->gmm.s.idLastChunkHint != NIL_GMM_CHUNKID)
2477 {
2478 PGMMCHUNK pChunk = gmmR0GetChunk(pGMM, pGVM->gmm.s.idLastChunkHint);
2479 if (pChunk && pChunk->cFree)
2480 {
2481 iPage = gmmR0AllocatePagesFromChunk(pChunk, hGVM, iPage, cPages, paPages);
2482 if (iPage >= cPages)
2483 return iPage;
2484 }
2485 }
2486
2487 /* Scan. */
2488 for (unsigned iList = 0; iList < RT_ELEMENTS(pSet->apLists); iList++)
2489 {
2490 PGMMCHUNK pChunk = pSet->apLists[iList];
2491 while (pChunk)
2492 {
2493 PGMMCHUNK pNext = pChunk->pFreeNext;
2494
2495 if (pChunk->hGVM == hGVM)
2496 {
2497 iPage = gmmR0AllocatePagesFromChunk(pChunk, hGVM, iPage, cPages, paPages);
2498 if (iPage >= cPages)
2499 {
2500 pGVM->gmm.s.idLastChunkHint = pChunk->cFree ? pChunk->Core.Key : NIL_GMM_CHUNKID;
2501 return iPage;
2502 }
2503 }
2504
2505 pChunk = pNext;
2506 }
2507 }
2508 return iPage;
2509}
2510
2511
2512
2513/**
2514 * Pick pages in bound memory mode.
2515 *
2516 * @returns The new page descriptor table index.
2517 * @param pGVM Pointer to the global VM structure.
2518 * @param iPage The current page descriptor table index.
2519 * @param cPages The total number of pages to allocate.
2520 * @param paPages The page descriptor table (input + output).
2521 */
2522static uint32_t gmmR0AllocatePagesInBoundMode(PGVM pGVM, uint32_t iPage, uint32_t cPages, PGMMPAGEDESC paPages)
2523{
2524 for (unsigned iList = 0; iList < RT_ELEMENTS(pGVM->gmm.s.Private.apLists); iList++)
2525 {
2526 PGMMCHUNK pChunk = pGVM->gmm.s.Private.apLists[iList];
2527 while (pChunk)
2528 {
2529 Assert(pChunk->hGVM == pGVM->hSelf);
2530 PGMMCHUNK pNext = pChunk->pFreeNext;
2531 iPage = gmmR0AllocatePagesFromChunk(pChunk, pGVM->hSelf, iPage, cPages, paPages);
2532 if (iPage >= cPages)
2533 return iPage;
2534 pChunk = pNext;
2535 }
2536 }
2537 return iPage;
2538}
2539
2540
2541/**
2542 * Checks if we should start picking pages from chunks of other VMs because
2543 * we're getting close to the system memory or reserved limit.
2544 *
2545 * @returns @c true if we should, @c false if we should first try allocate more
2546 * chunks.
2547 */
2548static bool gmmR0ShouldAllocatePagesInOtherChunksBecauseOfLimits(PGVM pGVM)
2549{
2550 /*
2551 * Don't allocate a new chunk if we're getting close to the reservation limit.
2552 */
2553 uint64_t cPgReserved = pGVM->gmm.s.Stats.Reserved.cBasePages
2554 + pGVM->gmm.s.Stats.Reserved.cFixedPages
2555 - pGVM->gmm.s.Stats.cBalloonedPages
2556 /** @todo what about shared pages? */;
2557 uint64_t cPgAllocated = pGVM->gmm.s.Stats.Allocated.cBasePages
2558 + pGVM->gmm.s.Stats.Allocated.cFixedPages;
2559 uint64_t cPgDelta = cPgReserved - cPgAllocated;
2560 if (cPgDelta < GMM_CHUNK_NUM_PAGES * 4)
2561 return true;
2562 /** @todo make the threshold configurable, also test the code to see if
2563 * this ever kicks in (we might be reserving too much or smth). */
2564
2565 /*
2566 * Check how close we're to the max memory limit and how many fragments
2567 * there are?...
2568 */
2569 /** @todo */
2570
2571 return false;
2572}
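
/*
 * Worked example (assuming 2 MB chunks of 4 KiB pages, i.e.
 * GMM_CHUNK_NUM_PAGES = 512): the check above returns true once fewer than
 * 4 * 512 = 2048 reserved-but-unallocated pages (about 8 MB of headroom)
 * remain, at which point scavenging partially used chunks of other VMs is
 * preferred over claiming yet another chunk from the host.
 */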
2573
2574
2575/**
2576 * Checks if we should start picking pages from chunks of other VMs because
2577 * there are a lot of free pages around.
2578 *
2579 * @returns @c true if we should, @c false if we should first try allocate more
2580 * chunks.
2581 */
2582static bool gmmR0ShouldAllocatePagesInOtherChunksBecauseOfLotsFree(PGMM pGMM)
2583{
2584 /*
2585 * Setting the limit at 16 chunks (32 MB) at the moment.
2586 */
2587 if (pGMM->PrivateX.cFreePages >= GMM_CHUNK_NUM_PAGES * 16)
2588 return true;
2589 return false;
2590}
2591
2592
2593/**
2594 * Common worker for GMMR0AllocateHandyPages and GMMR0AllocatePages.
2595 *
2596 * @returns VBox status code:
2597 * @retval VINF_SUCCESS on success.
2598 * @retval VERR_GMM_HIT_GLOBAL_LIMIT if we've exhausted the available pages.
2599 * @retval VERR_GMM_HIT_VM_ACCOUNT_LIMIT if we've hit the VM account limit,
2600 * that is we're trying to allocate more than we've reserved.
2601 *
2602 * @param pGMM Pointer to the GMM instance data.
2603 * @param pGVM Pointer to the VM.
2604 * @param cPages The number of pages to allocate.
2605 * @param paPages Pointer to the page descriptors. See GMMPAGEDESC for
2606 * details on what is expected on input.
2607 * @param enmAccount The account to charge.
2608 *
2609 * @remarks Caller owns the giant GMM lock.
2610 */
2611static int gmmR0AllocatePagesNew(PGMM pGMM, PGVM pGVM, uint32_t cPages, PGMMPAGEDESC paPages, GMMACCOUNT enmAccount)
2612{
2613 Assert(pGMM->hMtxOwner == RTThreadNativeSelf());
2614
2615 /*
2616 * Check allocation limits.
2617 */
2618 if (RT_LIKELY(pGMM->cAllocatedPages + cPages <= pGMM->cMaxPages))
2619 { /* likely */ }
2620 else
2621 return VERR_GMM_HIT_GLOBAL_LIMIT;
2622
2623 switch (enmAccount)
2624 {
2625 case GMMACCOUNT_BASE:
2626 if (RT_LIKELY( pGVM->gmm.s.Stats.Allocated.cBasePages + pGVM->gmm.s.Stats.cBalloonedPages + cPages
2627 <= pGVM->gmm.s.Stats.Reserved.cBasePages))
2628 { /* likely */ }
2629 else
2630 {
2631 Log(("gmmR0AllocatePages:Base: Reserved=%#llx Allocated+Ballooned+Requested=%#llx+%#llx+%#x!\n",
2632 pGVM->gmm.s.Stats.Reserved.cBasePages, pGVM->gmm.s.Stats.Allocated.cBasePages,
2633 pGVM->gmm.s.Stats.cBalloonedPages, cPages));
2634 return VERR_GMM_HIT_VM_ACCOUNT_LIMIT;
2635 }
2636 break;
2637 case GMMACCOUNT_SHADOW:
2638 if (RT_LIKELY(pGVM->gmm.s.Stats.Allocated.cShadowPages + cPages <= pGVM->gmm.s.Stats.Reserved.cShadowPages))
2639 { /* likely */ }
2640 else
2641 {
2642 Log(("gmmR0AllocatePages:Shadow: Reserved=%#x Allocated+Requested=%#x+%#x!\n",
2643 pGVM->gmm.s.Stats.Reserved.cShadowPages, pGVM->gmm.s.Stats.Allocated.cShadowPages, cPages));
2644 return VERR_GMM_HIT_VM_ACCOUNT_LIMIT;
2645 }
2646 break;
2647 case GMMACCOUNT_FIXED:
2648 if (RT_LIKELY(pGVM->gmm.s.Stats.Allocated.cFixedPages + cPages <= pGVM->gmm.s.Stats.Reserved.cFixedPages))
2649 { /* likely */ }
2650 else
2651 {
2652 Log(("gmmR0AllocatePages:Fixed: Reserved=%#x Allocated+Requested=%#x+%#x!\n",
2653 pGVM->gmm.s.Stats.Reserved.cFixedPages, pGVM->gmm.s.Stats.Allocated.cFixedPages, cPages));
2654 return VERR_GMM_HIT_VM_ACCOUNT_LIMIT;
2655 }
2656 break;
2657 default:
2658 AssertMsgFailedReturn(("enmAccount=%d\n", enmAccount), VERR_IPE_NOT_REACHED_DEFAULT_CASE);
2659 }
2660
2661 /*
2662 * Update the accounts before we proceed because we might be leaving the
2663 * protection of the global mutex and thus run the risk of permitting
2664 * too much memory to be allocated.
2665 */
2666 switch (enmAccount)
2667 {
2668 case GMMACCOUNT_BASE: pGVM->gmm.s.Stats.Allocated.cBasePages += cPages; break;
2669 case GMMACCOUNT_SHADOW: pGVM->gmm.s.Stats.Allocated.cShadowPages += cPages; break;
2670 case GMMACCOUNT_FIXED: pGVM->gmm.s.Stats.Allocated.cFixedPages += cPages; break;
2671 default: AssertMsgFailedReturn(("enmAccount=%d\n", enmAccount), VERR_IPE_NOT_REACHED_DEFAULT_CASE);
2672 }
2673 pGVM->gmm.s.Stats.cPrivatePages += cPages;
2674 pGMM->cAllocatedPages += cPages;
2675
2676 /*
2677 * Bound mode is also relatively straightforward.
2678 */
2679 uint32_t iPage = 0;
2680 int rc = VINF_SUCCESS;
2681 if (pGMM->fBoundMemoryMode)
2682 {
2683 iPage = gmmR0AllocatePagesInBoundMode(pGVM, iPage, cPages, paPages);
2684 if (iPage < cPages)
2685 do
2686 rc = gmmR0AllocateChunkNew(pGMM, pGVM, &pGVM->gmm.s.Private, cPages, paPages, &iPage);
2687 while (iPage < cPages && RT_SUCCESS(rc));
2688 }
2689 /*
2690 * Shared mode is trickier as we should try to achieve the same locality as
2691 * in bound mode, but smartly make use of non-full chunks allocated by
2692 * other VMs if we're low on memory.
2693 */
2694 else
2695 {
2696 RTUID const uidSelf = SUPR0GetSessionUid(pGVM->pSession);
2697
2698 /* Pick the most optimal pages first. */
2699 iPage = gmmR0AllocatePagesAssociatedWithVM(pGMM, pGVM, &pGMM->PrivateX, iPage, cPages, paPages);
2700 if (iPage < cPages)
2701 {
2702 /* Maybe we should try getting pages from chunks "belonging" to
2703 other VMs before allocating more chunks? */
2704 bool fTriedOnSameAlready = false;
2705 if (gmmR0ShouldAllocatePagesInOtherChunksBecauseOfLimits(pGVM))
2706 {
2707 iPage = gmmR0AllocatePagesFromSameNode(&pGMM->PrivateX, pGVM, uidSelf, iPage, cPages, paPages);
2708 fTriedOnSameAlready = true;
2709 }
2710
2711 /* Allocate memory from empty chunks. */
2712 if (iPage < cPages)
2713 iPage = gmmR0AllocatePagesFromEmptyChunksOnSameNode(&pGMM->PrivateX, pGVM, uidSelf, iPage, cPages, paPages);
2714
2715 /* Grab empty shared chunks. */
2716 if (iPage < cPages)
2717 iPage = gmmR0AllocatePagesFromEmptyChunksOnSameNode(&pGMM->Shared, pGVM, uidSelf, iPage, cPages, paPages);
2718
2719 /* If there are a lot of free pages spread around, try not to waste
2720 system memory on more chunks. (Should trigger defragmentation.) */
2721 if ( !fTriedOnSameAlready
2722 && gmmR0ShouldAllocatePagesInOtherChunksBecauseOfLotsFree(pGMM))
2723 {
2724 iPage = gmmR0AllocatePagesFromSameNode(&pGMM->PrivateX, pGVM, uidSelf, iPage, cPages, paPages);
2725 if (iPage < cPages)
2726 iPage = gmmR0AllocatePagesIndiscriminately(&pGMM->PrivateX, pGVM, uidSelf, iPage, cPages, paPages);
2727 }
2728
2729 /*
2730 * Ok, try allocate new chunks.
2731 */
2732 if (iPage < cPages)
2733 {
2734 do
2735 rc = gmmR0AllocateChunkNew(pGMM, pGVM, &pGMM->PrivateX, cPages, paPages, &iPage);
2736 while (iPage < cPages && RT_SUCCESS(rc));
2737
2738#if 0 /* We cannot mix chunks with different UIDs. */
2739 /* If the host is out of memory, take whatever we can get. */
2740 if ( (rc == VERR_NO_MEMORY || rc == VERR_NO_PHYS_MEMORY)
2741 && pGMM->PrivateX.cFreePages + pGMM->Shared.cFreePages >= cPages - iPage)
2742 {
2743 iPage = gmmR0AllocatePagesIndiscriminately(&pGMM->PrivateX, pGVM, iPage, cPages, paPages);
2744 if (iPage < cPages)
2745 iPage = gmmR0AllocatePagesIndiscriminately(&pGMM->Shared, pGVM, iPage, cPages, paPages);
2746 AssertRelease(iPage == cPages);
2747 rc = VINF_SUCCESS;
2748 }
2749#endif
2750 }
2751 }
2752 }
2753
2754 /*
2755 * Clean up on failure. Since this is bound to be a low-memory condition
2756 * we will give back any empty chunks that might be hanging around.
2757 */
2758 if (RT_SUCCESS(rc))
2759 { /* likely */ }
2760 else
2761 {
2762 /* Update the statistics. */
2763 pGVM->gmm.s.Stats.cPrivatePages -= cPages;
2764 pGMM->cAllocatedPages -= cPages - iPage;
2765 switch (enmAccount)
2766 {
2767 case GMMACCOUNT_BASE: pGVM->gmm.s.Stats.Allocated.cBasePages -= cPages; break;
2768 case GMMACCOUNT_SHADOW: pGVM->gmm.s.Stats.Allocated.cShadowPages -= cPages; break;
2769 case GMMACCOUNT_FIXED: pGVM->gmm.s.Stats.Allocated.cFixedPages -= cPages; break;
2770 default: AssertMsgFailedReturn(("enmAccount=%d\n", enmAccount), VERR_IPE_NOT_REACHED_DEFAULT_CASE);
2771 }
2772
2773 /* Release the pages. */
2774 while (iPage-- > 0)
2775 {
2776 uint32_t idPage = paPages[iPage].idPage;
2777 PGMMPAGE pPage = gmmR0GetPage(pGMM, idPage);
2778 if (RT_LIKELY(pPage))
2779 {
2780 Assert(GMM_PAGE_IS_PRIVATE(pPage));
2781 Assert(pPage->Private.hGVM == pGVM->hSelf);
2782 gmmR0FreePrivatePage(pGMM, pGVM, idPage, pPage);
2783 }
2784 else
2785 AssertMsgFailed(("idPage=%#x\n", idPage));
2786
2787 paPages[iPage].idPage = NIL_GMM_PAGEID;
2788 paPages[iPage].idSharedPage = NIL_GMM_PAGEID;
2789 paPages[iPage].HCPhysGCPhys = NIL_GMMPAGEDESC_PHYS;
2790 paPages[iPage].fZeroed = false;
2791 }
2792
2793 /* Free empty chunks. */
2794 /** @todo */
2795
2796 /* return the fail status on failure */
2797 return rc;
2798 }
2799 return VINF_SUCCESS;
2800}
2801
2802
2803/**
2804 * Updates the previous allocations and allocates more pages.
2805 *
2806 * The handy pages are always taken from the 'base' memory account.
2807 * The allocated pages are not cleared and will contain random garbage.
2808 *
2809 * @returns VBox status code:
2810 * @retval VINF_SUCCESS on success.
2811 * @retval VERR_NOT_OWNER if the caller is not an EMT.
2812 * @retval VERR_GMM_PAGE_NOT_FOUND if one of the pages to update wasn't found.
2813 * @retval VERR_GMM_PAGE_NOT_PRIVATE if one of the pages to update wasn't a
2814 * private page.
2815 * @retval VERR_GMM_PAGE_NOT_SHARED if one of the pages to update wasn't a
2816 * shared page.
2817 * @retval VERR_GMM_NOT_PAGE_OWNER if one of the pages to be updated wasn't
2818 * owned by the VM.
2819 * @retval VERR_GMM_HIT_GLOBAL_LIMIT if we've exhausted the available pages.
2820 * @retval VERR_GMM_HIT_VM_ACCOUNT_LIMIT if we've hit the VM account limit,
2821 * that is we're trying to allocate more than we've reserved.
2822 *
2823 * @param pGVM The global (ring-0) VM structure.
2824 * @param idCpu The VCPU id.
2825 * @param cPagesToUpdate The number of pages to update (starting from the head).
2826 * @param cPagesToAlloc The number of pages to allocate (starting from the head).
2827 * @param paPages The array of page descriptors.
2828 * See GMMPAGEDESC for details on what is expected on input.
2829 * @thread EMT(idCpu)
2830 */
2831GMMR0DECL(int) GMMR0AllocateHandyPages(PGVM pGVM, VMCPUID idCpu, uint32_t cPagesToUpdate,
2832 uint32_t cPagesToAlloc, PGMMPAGEDESC paPages)
2833{
2834 LogFlow(("GMMR0AllocateHandyPages: pGVM=%p cPagesToUpdate=%#x cPagesToAlloc=%#x paPages=%p\n",
2835 pGVM, cPagesToUpdate, cPagesToAlloc, paPages));
2836
2837 /*
2838 * Validate, get basics and take the semaphore.
2839 * (This is a relatively busy path, so make predictions where possible.)
2840 */
2841 PGMM pGMM;
2842 GMM_GET_VALID_INSTANCE(pGMM, VERR_GMM_INSTANCE);
2843 int rc = GVMMR0ValidateGVMandEMT(pGVM, idCpu);
2844 if (RT_FAILURE(rc))
2845 return rc;
2846
2847 AssertPtrReturn(paPages, VERR_INVALID_PARAMETER);
2848 AssertMsgReturn( (cPagesToUpdate && cPagesToUpdate < 1024)
2849 || (cPagesToAlloc && cPagesToAlloc < 1024),
2850 ("cPagesToUpdate=%#x cPagesToAlloc=%#x\n", cPagesToUpdate, cPagesToAlloc),
2851 VERR_INVALID_PARAMETER);
2852
2853 unsigned iPage = 0;
2854 for (; iPage < cPagesToUpdate; iPage++)
2855 {
2856 AssertMsgReturn( ( paPages[iPage].HCPhysGCPhys <= GMM_GCPHYS_LAST
2857 && !(paPages[iPage].HCPhysGCPhys & PAGE_OFFSET_MASK))
2858 || paPages[iPage].HCPhysGCPhys == NIL_GMMPAGEDESC_PHYS
2859 || paPages[iPage].HCPhysGCPhys == GMM_GCPHYS_UNSHAREABLE,
2860 ("#%#x: %RHp\n", iPage, paPages[iPage].HCPhysGCPhys),
2861 VERR_INVALID_PARAMETER);
2862 /* ignore fZeroed here */
2863 AssertMsgReturn( paPages[iPage].idPage <= GMM_PAGEID_LAST
2864 /*|| paPages[iPage].idPage == NIL_GMM_PAGEID*/,
2865 ("#%#x: %#x\n", iPage, paPages[iPage].idPage), VERR_INVALID_PARAMETER);
2866 AssertMsgReturn( paPages[iPage].idSharedPage == NIL_GMM_PAGEID
2867 || paPages[iPage].idSharedPage <= GMM_PAGEID_LAST,
2868 ("#%#x: %#x\n", iPage, paPages[iPage].idSharedPage), VERR_INVALID_PARAMETER);
2869 }
2870
2871 for (; iPage < cPagesToAlloc; iPage++)
2872 {
2873 AssertMsgReturn(paPages[iPage].HCPhysGCPhys == NIL_GMMPAGEDESC_PHYS, ("#%#x: %RHp\n", iPage, paPages[iPage].HCPhysGCPhys), VERR_INVALID_PARAMETER);
2874 AssertMsgReturn(paPages[iPage].fZeroed == false, ("#%#x: %#x\n", iPage, paPages[iPage].fZeroed), VERR_INVALID_PARAMETER);
2875 AssertMsgReturn(paPages[iPage].idPage == NIL_GMM_PAGEID, ("#%#x: %#x\n", iPage, paPages[iPage].idPage), VERR_INVALID_PARAMETER);
2876 AssertMsgReturn(paPages[iPage].idSharedPage == NIL_GMM_PAGEID, ("#%#x: %#x\n", iPage, paPages[iPage].idSharedPage), VERR_INVALID_PARAMETER);
2877 }
2878
2879 gmmR0MutexAcquire(pGMM);
2880 if (GMM_CHECK_SANITY_UPON_ENTERING(pGMM))
2881 {
2882 /* No allocations before the initial reservation has been made! */
2883 if (RT_LIKELY( pGVM->gmm.s.Stats.Reserved.cBasePages
2884 && pGVM->gmm.s.Stats.Reserved.cFixedPages
2885 && pGVM->gmm.s.Stats.Reserved.cShadowPages))
2886 {
2887 /*
2888 * Perform the updates.
2889 * Stop on the first error.
2890 */
2891 for (iPage = 0; iPage < cPagesToUpdate; iPage++)
2892 {
2893 if (paPages[iPage].idPage != NIL_GMM_PAGEID)
2894 {
2895 PGMMPAGE pPage = gmmR0GetPage(pGMM, paPages[iPage].idPage);
2896 if (RT_LIKELY(pPage))
2897 {
2898 if (RT_LIKELY(GMM_PAGE_IS_PRIVATE(pPage)))
2899 {
2900 if (RT_LIKELY(pPage->Private.hGVM == pGVM->hSelf))
2901 {
2902 AssertCompile(NIL_RTHCPHYS > GMM_GCPHYS_LAST && GMM_GCPHYS_UNSHAREABLE > GMM_GCPHYS_LAST);
2903 if (RT_LIKELY(paPages[iPage].HCPhysGCPhys <= GMM_GCPHYS_LAST))
2904 pPage->Private.pfn = paPages[iPage].HCPhysGCPhys >> PAGE_SHIFT;
2905 else if (paPages[iPage].HCPhysGCPhys == GMM_GCPHYS_UNSHAREABLE)
2906 pPage->Private.pfn = GMM_PAGE_PFN_UNSHAREABLE;
2907 /* else: NIL_RTHCPHYS nothing */
2908
2909 paPages[iPage].idPage = NIL_GMM_PAGEID;
2910 paPages[iPage].HCPhysGCPhys = NIL_GMMPAGEDESC_PHYS;
2911 paPages[iPage].fZeroed = false;
2912 }
2913 else
2914 {
2915 Log(("GMMR0AllocateHandyPages: #%#x/%#x: Not owner! hGVM=%#x hSelf=%#x\n",
2916 iPage, paPages[iPage].idPage, pPage->Private.hGVM, pGVM->hSelf));
2917 rc = VERR_GMM_NOT_PAGE_OWNER;
2918 break;
2919 }
2920 }
2921 else
2922 {
2923 Log(("GMMR0AllocateHandyPages: #%#x/%#x: Not private! %.*Rhxs (type %d)\n", iPage, paPages[iPage].idPage, sizeof(*pPage), pPage, pPage->Common.u2State));
2924 rc = VERR_GMM_PAGE_NOT_PRIVATE;
2925 break;
2926 }
2927 }
2928 else
2929 {
2930 Log(("GMMR0AllocateHandyPages: #%#x/%#x: Not found! (private)\n", iPage, paPages[iPage].idPage));
2931 rc = VERR_GMM_PAGE_NOT_FOUND;
2932 break;
2933 }
2934 }
2935
2936 if (paPages[iPage].idSharedPage == NIL_GMM_PAGEID)
2937 { /* likely */ }
2938 else
2939 {
2940 PGMMPAGE pPage = gmmR0GetPage(pGMM, paPages[iPage].idSharedPage);
2941 if (RT_LIKELY(pPage))
2942 {
2943 if (RT_LIKELY(GMM_PAGE_IS_SHARED(pPage)))
2944 {
2945 AssertCompile(NIL_RTHCPHYS > GMM_GCPHYS_LAST && GMM_GCPHYS_UNSHAREABLE > GMM_GCPHYS_LAST);
2946 Assert(pPage->Shared.cRefs);
2947 Assert(pGVM->gmm.s.Stats.cSharedPages);
2948 Assert(pGVM->gmm.s.Stats.Allocated.cBasePages);
2949
2950 Log(("GMMR0AllocateHandyPages: free shared page %x cRefs=%d\n", paPages[iPage].idSharedPage, pPage->Shared.cRefs));
2951 pGVM->gmm.s.Stats.cSharedPages--;
2952 pGVM->gmm.s.Stats.Allocated.cBasePages--;
2953 if (!--pPage->Shared.cRefs)
2954 gmmR0FreeSharedPage(pGMM, pGVM, paPages[iPage].idSharedPage, pPage);
2955 else
2956 {
2957 Assert(pGMM->cDuplicatePages);
2958 pGMM->cDuplicatePages--;
2959 }
2960
2961 paPages[iPage].idSharedPage = NIL_GMM_PAGEID;
2962 }
2963 else
2964 {
2965 Log(("GMMR0AllocateHandyPages: #%#x/%#x: Not shared!\n", iPage, paPages[iPage].idSharedPage));
2966 rc = VERR_GMM_PAGE_NOT_SHARED;
2967 break;
2968 }
2969 }
2970 else
2971 {
2972 Log(("GMMR0AllocateHandyPages: #%#x/%#x: Not found! (shared)\n", iPage, paPages[iPage].idSharedPage));
2973 rc = VERR_GMM_PAGE_NOT_FOUND;
2974 break;
2975 }
2976 }
2977 } /* for each page to update */
2978
2979 if (RT_SUCCESS(rc) && cPagesToAlloc > 0)
2980 {
2981#ifdef VBOX_STRICT
2982 for (iPage = 0; iPage < cPagesToAlloc; iPage++)
2983 {
2984 Assert(paPages[iPage].HCPhysGCPhys == NIL_GMMPAGEDESC_PHYS);
2985 Assert(paPages[iPage].fZeroed == false);
2986 Assert(paPages[iPage].idPage == NIL_GMM_PAGEID);
2987 Assert(paPages[iPage].idSharedPage == NIL_GMM_PAGEID);
2988 }
2989#endif
2990
2991 /*
2992 * Join paths with GMMR0AllocatePages for the allocation.
2993 * Note! gmmR0AllocateMoreChunks may leave the protection of the mutex!
2994 */
2995 rc = gmmR0AllocatePagesNew(pGMM, pGVM, cPagesToAlloc, paPages, GMMACCOUNT_BASE);
2996 }
2997 }
2998 else
2999 rc = VERR_WRONG_ORDER;
3000 GMM_CHECK_SANITY_UPON_LEAVING(pGMM);
3001 }
3002 else
3003 rc = VERR_GMM_IS_NOT_SANE;
3004 gmmR0MutexRelease(pGMM);
3005 LogFlow(("GMMR0AllocateHandyPages: returns %Rrc\n", rc));
3006 return rc;
3007}
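
/*
 * Input sketch (illustrative, mirroring the to-allocate validation in
 * GMMR0AllocateHandyPages above): a descriptor in the to-allocate portion of
 * the array must come in completely "empty" and is filled in by the
 * allocation path (see gmmR0AllocatePage).
 *
 * @code
 *      GMMPAGEDESC Desc;
 *      Desc.HCPhysGCPhys = NIL_GMMPAGEDESC_PHYS;
 *      Desc.fZeroed      = false;
 *      Desc.idPage       = NIL_GMM_PAGEID;
 *      Desc.idSharedPage = NIL_GMM_PAGEID;
 * @endcode
 */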
3008
3009
3010/**
3011 * Allocate one or more pages.
3012 *
3013 * This is typically used for ROMs and MMIO2 (VRAM) during VM creation.
3014 * The allocated pages are not cleared and will contain random garbage.
3015 *
3016 * @returns VBox status code:
3017 * @retval VINF_SUCCESS on success.
3018 * @retval VERR_NOT_OWNER if the caller is not an EMT.
3019 * @retval VERR_GMM_HIT_GLOBAL_LIMIT if we've exhausted the available pages.
3020 * @retval VERR_GMM_HIT_VM_ACCOUNT_LIMIT if we've hit the VM account limit,
3021 * that is we're trying to allocate more than we've reserved.
3022 *
3023 * @param pGVM The global (ring-0) VM structure.
3024 * @param idCpu The VCPU id.
3025 * @param cPages The number of pages to allocate.
3026 * @param paPages Pointer to the page descriptors.
3027 * See GMMPAGEDESC for details on what is expected on
3028 * input.
3029 * @param enmAccount The account to charge.
3030 *
3031 * @thread EMT.
3032 */
3033GMMR0DECL(int) GMMR0AllocatePages(PGVM pGVM, VMCPUID idCpu, uint32_t cPages, PGMMPAGEDESC paPages, GMMACCOUNT enmAccount)
3034{
3035 LogFlow(("GMMR0AllocatePages: pGVM=%p cPages=%#x paPages=%p enmAccount=%d\n", pGVM, cPages, paPages, enmAccount));
3036
3037 /*
3038 * Validate, get basics and take the semaphore.
3039 */
3040 PGMM pGMM;
3041 GMM_GET_VALID_INSTANCE(pGMM, VERR_GMM_INSTANCE);
3042 int rc = GVMMR0ValidateGVMandEMT(pGVM, idCpu);
3043 if (RT_FAILURE(rc))
3044 return rc;
3045
3046 AssertPtrReturn(paPages, VERR_INVALID_PARAMETER);
3047 AssertMsgReturn(enmAccount > GMMACCOUNT_INVALID && enmAccount < GMMACCOUNT_END, ("%d\n", enmAccount), VERR_INVALID_PARAMETER);
3048 AssertMsgReturn(cPages > 0 && cPages < RT_BIT(32 - PAGE_SHIFT), ("%#x\n", cPages), VERR_INVALID_PARAMETER);
3049
3050 for (unsigned iPage = 0; iPage < cPages; iPage++)
3051 {
3052 AssertMsgReturn( paPages[iPage].HCPhysGCPhys == NIL_GMMPAGEDESC_PHYS
3053 || paPages[iPage].HCPhysGCPhys == GMM_GCPHYS_UNSHAREABLE
3054 || ( enmAccount == GMMACCOUNT_BASE
3055 && paPages[iPage].HCPhysGCPhys <= GMM_GCPHYS_LAST
3056 && !(paPages[iPage].HCPhysGCPhys & PAGE_OFFSET_MASK)),
3057 ("#%#x: %RHp enmAccount=%d\n", iPage, paPages[iPage].HCPhysGCPhys, enmAccount),
3058 VERR_INVALID_PARAMETER);
3059 AssertMsgReturn(paPages[iPage].fZeroed == false, ("#%#x: %#x\n", iPage, paPages[iPage].fZeroed), VERR_INVALID_PARAMETER);
3060 AssertMsgReturn(paPages[iPage].idPage == NIL_GMM_PAGEID, ("#%#x: %#x\n", iPage, paPages[iPage].idPage), VERR_INVALID_PARAMETER);
3061 AssertMsgReturn(paPages[iPage].idSharedPage == NIL_GMM_PAGEID, ("#%#x: %#x\n", iPage, paPages[iPage].idSharedPage), VERR_INVALID_PARAMETER);
3062 }
3063
3064 /*
3065 * Grab the giant mutex and get working.
3066 */
3067 gmmR0MutexAcquire(pGMM);
3068 if (GMM_CHECK_SANITY_UPON_ENTERING(pGMM))
3069 {
3070
3071 /* No allocations before the initial reservation has been made! */
3072 if (RT_LIKELY( pGVM->gmm.s.Stats.Reserved.cBasePages
3073 && pGVM->gmm.s.Stats.Reserved.cFixedPages
3074 && pGVM->gmm.s.Stats.Reserved.cShadowPages))
3075 rc = gmmR0AllocatePagesNew(pGMM, pGVM, cPages, paPages, enmAccount);
3076 else
3077 rc = VERR_WRONG_ORDER;
3078 GMM_CHECK_SANITY_UPON_LEAVING(pGMM);
3079 }
3080 else
3081 rc = VERR_GMM_IS_NOT_SANE;
3082 gmmR0MutexRelease(pGMM);
3083
3084 LogFlow(("GMMR0AllocatePages: returns %Rrc\n", rc));
3085 return rc;
3086}
3087
3088
3089/**
3090 * VMMR0 request wrapper for GMMR0AllocatePages.
3091 *
3092 * @returns see GMMR0AllocatePages.
3093 * @param pGVM The global (ring-0) VM structure.
3094 * @param idCpu The VCPU id.
3095 * @param pReq Pointer to the request packet.
3096 */
3097GMMR0DECL(int) GMMR0AllocatePagesReq(PGVM pGVM, VMCPUID idCpu, PGMMALLOCATEPAGESREQ pReq)
3098{
3099 /*
3100 * Validate input and pass it on.
3101 */
3102 AssertPtrReturn(pReq, VERR_INVALID_POINTER);
3103 AssertMsgReturn(pReq->Hdr.cbReq >= RT_UOFFSETOF(GMMALLOCATEPAGESREQ, aPages[0]),
3104 ("%#x < %#x\n", pReq->Hdr.cbReq, RT_UOFFSETOF(GMMALLOCATEPAGESREQ, aPages[0])),
3105 VERR_INVALID_PARAMETER);
3106 AssertMsgReturn(pReq->Hdr.cbReq == RT_UOFFSETOF_DYN(GMMALLOCATEPAGESREQ, aPages[pReq->cPages]),
3107 ("%#x != %#x\n", pReq->Hdr.cbReq, RT_UOFFSETOF_DYN(GMMALLOCATEPAGESREQ, aPages[pReq->cPages])),
3108 VERR_INVALID_PARAMETER);
3109
3110 return GMMR0AllocatePages(pGVM, idCpu, pReq->cPages, &pReq->aPages[0], pReq->enmAccount);
3111}
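
/*
 * Caller-side sketch (illustrative only; ring-3 plumbing and header magic are
 * omitted, and cPages is a hypothetical local): the request is variable sized,
 * so cbReq must be computed with RT_UOFFSETOF_DYN exactly as the validation
 * above expects.
 *
 * @code
 *      uint32_t const cbReq = RT_UOFFSETOF_DYN(GMMALLOCATEPAGESREQ, aPages[cPages]);
 *      PGMMALLOCATEPAGESREQ pReq = (PGMMALLOCATEPAGESREQ)RTMemAllocZ(cbReq);
 *      if (pReq)
 *      {
 *          pReq->Hdr.cbReq  = cbReq;
 *          pReq->cPages     = cPages;
 *          pReq->enmAccount = GMMACCOUNT_BASE;
 *          // ... fill pReq->aPages[0..cPages-1] as the checks in
 *          //     GMMR0AllocatePages require, then submit the request ...
 *      }
 * @endcode
 */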
3112
3113
3114/**
3115 * Allocate a large page to represent guest RAM.
3116 *
3117 * The allocated pages are zeroed upon return.
3118 *
3119 * @returns VBox status code:
3120 * @retval VINF_SUCCESS on success.
3121 * @retval VERR_NOT_OWNER if the caller is not an EMT.
3122 * @retval VERR_GMM_HIT_GLOBAL_LIMIT if we've exhausted the available pages.
3123 * @retval VERR_GMM_HIT_VM_ACCOUNT_LIMIT if we've hit the VM account limit,
3124 * that is we're trying to allocate more than we've reserved.
3125 * @retval VERR_TRY_AGAIN if the host is temporarily out of large pages.
3126 * @see GMMR0AllocatePages
3127 *
3128 * @param pGVM The global (ring-0) VM structure.
3129 * @param idCpu The VCPU id.
3130 * @param cbPage Large page size.
3131 * @param pIdPage Where to return the GMM page ID of the page.
3132 * @param pHCPhys Where to return the host physical address of the page.
3133 */
3134GMMR0DECL(int) GMMR0AllocateLargePage(PGVM pGVM, VMCPUID idCpu, uint32_t cbPage, uint32_t *pIdPage, RTHCPHYS *pHCPhys)
3135{
3136 LogFlow(("GMMR0AllocateLargePage: pGVM=%p cbPage=%x\n", pGVM, cbPage));
3137
3138 AssertPtrReturn(pIdPage, VERR_INVALID_PARAMETER);
3139 *pIdPage = NIL_GMM_PAGEID;
3140 AssertPtrReturn(pHCPhys, VERR_INVALID_PARAMETER);
3141 *pHCPhys = NIL_RTHCPHYS;
3142 AssertReturn(cbPage == GMM_CHUNK_SIZE, VERR_INVALID_PARAMETER);
3143
3144 /*
3145 * Validate GVM + idCpu, get basics and take the semaphore.
3146 */
3147 PGMM pGMM;
3148 GMM_GET_VALID_INSTANCE(pGMM, VERR_GMM_INSTANCE);
3149 int rc = GVMMR0ValidateGVMandEMT(pGVM, idCpu);
3150 if (RT_SUCCESS(rc))
3151 rc = gmmR0MutexAcquire(pGMM);
3152 if (RT_SUCCESS(rc))
3153 {
3154 if (GMM_CHECK_SANITY_UPON_ENTERING(pGMM))
3155 {
3156 /*
3157 * Check the quota.
3158 */
3159 /** @todo r=bird: Quota checking could be done w/o the giant mutex but using
3160 * a VM specific mutex... */
3161 if (RT_LIKELY( pGVM->gmm.s.Stats.Allocated.cBasePages + pGVM->gmm.s.Stats.cBalloonedPages + GMM_CHUNK_NUM_PAGES
3162 <= pGVM->gmm.s.Stats.Reserved.cBasePages))
3163 {
3164 /*
3165 * Allocate a new large page chunk.
3166 *
3167 * Note! We leave the giant GMM lock temporarily as the allocation might
3168 * take a long time. gmmR0RegisterChunk will retake it (ugly).
3169 */
3170 AssertCompile(GMM_CHUNK_SIZE == _2M);
3171 gmmR0MutexRelease(pGMM);
3172
3173 RTR0MEMOBJ hMemObj;
3174 rc = RTR0MemObjAllocLarge(&hMemObj, GMM_CHUNK_SIZE, GMM_CHUNK_SIZE, RTMEMOBJ_ALLOC_LARGE_F_FAST);
3175 if (RT_SUCCESS(rc))
3176 {
3177 *pHCPhys = RTR0MemObjGetPagePhysAddr(hMemObj, 0);
3178
3179 /*
3180 * Register the chunk as fully allocated.
3181 * Note! As mentioned above, this will return owning the mutex on success.
3182 */
3183 PGMMCHUNK pChunk = NULL;
3184 PGMMCHUNKFREESET const pSet = pGMM->fBoundMemoryMode ? &pGVM->gmm.s.Private : &pGMM->PrivateX;
3185 rc = gmmR0RegisterChunk(pGMM, pSet, hMemObj, pGVM->hSelf, pGVM->pSession, GMM_CHUNK_FLAGS_LARGE_PAGE, &pChunk);
3186 if (RT_SUCCESS(rc))
3187 {
3188 /*
3189 * The gmmR0RegisterChunk call already marked all pages allocated,
3190 * so we just have to fill in the return values and update stats now.
3191 */
3192 *pIdPage = pChunk->Core.Key << GMM_CHUNKID_SHIFT;
3193
3194 /* Update accounting. */
3195 pGVM->gmm.s.Stats.Allocated.cBasePages += GMM_CHUNK_NUM_PAGES;
3196 pGVM->gmm.s.Stats.cPrivatePages += GMM_CHUNK_NUM_PAGES;
3197 pGMM->cAllocatedPages += GMM_CHUNK_NUM_PAGES;
3198
3199 gmmR0LinkChunk(pChunk, pSet);
3200 gmmR0MutexRelease(pGMM);
3201
3202 LogFlow(("GMMR0AllocateLargePage: returns VINF_SUCCESS\n"));
3203 return VINF_SUCCESS;
3204 }
3205
3206 /*
3207 * Bail out.
3208 */
3209 RTR0MemObjFree(hMemObj, true /* fFreeMappings */);
3210 *pHCPhys = NIL_RTHCPHYS;
3211 }
3212 }
3213 else
3214 {
3215 Log(("GMMR0AllocateLargePage: Reserved=%#llx Allocated+Requested=%#llx+%#x!\n",
3216 pGVM->gmm.s.Stats.Reserved.cBasePages, pGVM->gmm.s.Stats.Allocated.cBasePages, GMM_CHUNK_NUM_PAGES));
3217 gmmR0MutexRelease(pGMM);
3218 rc = VERR_GMM_HIT_VM_ACCOUNT_LIMIT;
3219 }
3220 }
3221 else
3222 {
3223 gmmR0MutexRelease(pGMM);
3224 rc = VERR_GMM_IS_NOT_SANE;
3225 }
3226 }
3227
3228 LogFlow(("GMMR0AllocateLargePage: returns %Rrc\n", rc));
3229 return rc;
3230}
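
/*
 * Illustrative pairing (sketch only): a large page allocated above is given
 * back with GMMR0FreeLargePage (below) using the returned page ID, which
 * refers to the first page of the dedicated 2 MB chunk.
 *
 * @code
 *      uint32_t idPage = NIL_GMM_PAGEID;
 *      RTHCPHYS HCPhys = NIL_RTHCPHYS;
 *      int rc = GMMR0AllocateLargePage(pGVM, idCpu, GMM_CHUNK_SIZE, &idPage, &HCPhys);
 *      if (RT_SUCCESS(rc))
 *      {
 *          // ... hand the page to the caller / map it for the guest ...
 *          rc = GMMR0FreeLargePage(pGVM, idCpu, idPage);
 *      }
 * @endcode
 */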
3231
3232
3233/**
3234 * Free a large page.
3235 *
3236 * @returns VBox status code:
3237 * @param pGVM The global (ring-0) VM structure.
3238 * @param idCpu The VCPU id.
3239 * @param idPage The large page id.
3240 */
3241GMMR0DECL(int) GMMR0FreeLargePage(PGVM pGVM, VMCPUID idCpu, uint32_t idPage)
3242{
3243 LogFlow(("GMMR0FreeLargePage: pGVM=%p idPage=%x\n", pGVM, idPage));
3244
3245 /*
3246 * Validate, get basics and take the semaphore.
3247 */
3248 PGMM pGMM;
3249 GMM_GET_VALID_INSTANCE(pGMM, VERR_GMM_INSTANCE);
3250 int rc = GVMMR0ValidateGVMandEMT(pGVM, idCpu);
3251 if (RT_FAILURE(rc))
3252 return rc;
3253
3254 gmmR0MutexAcquire(pGMM);
3255 if (GMM_CHECK_SANITY_UPON_ENTERING(pGMM))
3256 {
3257 const unsigned cPages = (GMM_CHUNK_SIZE >> PAGE_SHIFT);
3258
3259 if (RT_UNLIKELY(pGVM->gmm.s.Stats.Allocated.cBasePages < cPages))
3260 {
3261 Log(("GMMR0FreeLargePage: allocated=%#llx cPages=%#x!\n", pGVM->gmm.s.Stats.Allocated.cBasePages, cPages));
3262 gmmR0MutexRelease(pGMM);
3263 return VERR_GMM_ATTEMPT_TO_FREE_TOO_MUCH;
3264 }
3265
3266 PGMMPAGE pPage = gmmR0GetPage(pGMM, idPage);
3267 if (RT_LIKELY( pPage
3268 && GMM_PAGE_IS_PRIVATE(pPage)))
3269 {
3270 PGMMCHUNK pChunk = gmmR0GetChunk(pGMM, idPage >> GMM_CHUNKID_SHIFT);
3271 Assert(pChunk);
3272 Assert(pChunk->cFree < GMM_CHUNK_NUM_PAGES);
3273 Assert(pChunk->cPrivate > 0);
3274
3275 /* Release the memory immediately. */
3276 gmmR0FreeChunk(pGMM, NULL, pChunk, false /*fRelaxedSem*/); /** @todo this can be relaxed too! */
3277
3278 /* Update accounting. */
3279 pGVM->gmm.s.Stats.Allocated.cBasePages -= cPages;
3280 pGVM->gmm.s.Stats.cPrivatePages -= cPages;
3281 pGMM->cAllocatedPages -= cPages;
3282 }
3283 else
3284 rc = VERR_GMM_PAGE_NOT_FOUND;
3285 }
3286 else
3287 rc = VERR_GMM_IS_NOT_SANE;
3288
3289 gmmR0MutexRelease(pGMM);
3290 LogFlow(("GMMR0FreeLargePage: returns %Rrc\n", rc));
3291 return rc;
3292}
3293
3294
3295/**
3296 * VMMR0 request wrapper for GMMR0FreeLargePage.
3297 *
3298 * @returns see GMMR0FreeLargePage.
3299 * @param pGVM The global (ring-0) VM structure.
3300 * @param idCpu The VCPU id.
3301 * @param pReq Pointer to the request packet.
3302 */
3303GMMR0DECL(int) GMMR0FreeLargePageReq(PGVM pGVM, VMCPUID idCpu, PGMMFREELARGEPAGEREQ pReq)
3304{
3305 /*
3306 * Validate input and pass it on.
3307 */
3308 AssertPtrReturn(pReq, VERR_INVALID_POINTER);
3309 AssertMsgReturn(pReq->Hdr.cbReq == sizeof(GMMFREELARGEPAGEREQ),
3310 ("%#x != %#x\n", pReq->Hdr.cbReq, sizeof(GMMFREELARGEPAGEREQ)),
3311 VERR_INVALID_PARAMETER);
3312
3313 return GMMR0FreeLargePage(pGVM, idCpu, pReq->idPage);
3314}
3315
3316
3317/**
3318 * @callback_method_impl{FNGVMMR0ENUMCALLBACK,
3319 * Used by gmmR0FreeChunkFlushPerVmTlbs().}
3320 */
3321static DECLCALLBACK(int) gmmR0InvalidatePerVmChunkTlbCallback(PGVM pGVM, void *pvUser)
3322{
3323 RT_NOREF(pvUser);
3324 if (pGVM->gmm.s.hChunkTlbSpinLock != NIL_RTSPINLOCK)
3325 {
3326 RTSpinlockAcquire(pGVM->gmm.s.hChunkTlbSpinLock);
3327 uintptr_t i = RT_ELEMENTS(pGVM->gmm.s.aChunkTlbEntries);
3328 while (i-- > 0)
3329 {
3330 pGVM->gmm.s.aChunkTlbEntries[i].idGeneration = UINT64_MAX;
3331 pGVM->gmm.s.aChunkTlbEntries[i].pChunk = NULL;
3332 }
3333 RTSpinlockRelease(pGVM->gmm.s.hChunkTlbSpinLock);
3334 }
3335 return VINF_SUCCESS;
3336}
3337
3338
3339/**
3340 * Called by gmmR0FreeChunk when we reach the threshold for wrapping around the
3341 * free generation ID value.
3342 *
3343 * This is done at 2^62 - 1, which allows us to drop all locks here, since
3344 * it will take well over 2^62 (more than 4 exa) additional calls to
3345 * gmmR0FreeChunk before a real wrap-around could occur. We do two
3346 * invalidation passes and reset the generation ID between them. This makes
3347 * sure there are no false positives.
3348 *
3349 * @param pGMM Pointer to the GMM instance.
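 *
 * A per-VM chunk TLB entry is only trusted when its recorded generation
 * matches the current free generation; a sketch of the check done by the
 * lookup code (see GMMR0PageIdToVirt):
 * @code
 *      if (   pTlbe->pChunk != NULL
 *          && pTlbe->idGeneration == ASMAtomicUoReadU64(&pGMM->idFreeGeneration)
 *          && pTlbe->pChunk->Core.Key == idChunk)
 *          // TLB hit
 * @endcode
 * Invalidating every entry (idGeneration = UINT64_MAX, pChunk = NULL) on both
 * sides of the reset therefore guarantees that nothing recorded against the
 * old counter can match the restarted one.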
3350 */
3351static void gmmR0FreeChunkFlushPerVmTlbs(PGMM pGMM)
3352{
3353 /*
3354 * First invalidation pass.
3355 */
3356 int rc = GVMMR0EnumVMs(gmmR0InvalidatePerVmChunkTlbCallback, NULL);
3357 AssertRCSuccess(rc);
3358
3359 /*
3360 * Reset the generation number.
3361 */
3362 RTSpinlockAcquire(pGMM->hSpinLockTree);
3363 ASMAtomicWriteU64(&pGMM->idFreeGeneration, 1);
3364 RTSpinlockRelease(pGMM->hSpinLockTree);
3365
3366 /*
3367 * Second invalidation pass.
3368 */
3369 rc = GVMMR0EnumVMs(gmmR0InvalidatePerVmChunkTlbCallback, NULL);
3370 AssertRCSuccess(rc);
3371}
3372
3373
3374/**
3375 * Frees a chunk, giving it back to the host OS.
3376 *
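 * @returns true if the chunk was freed with the giant GMM lock temporarily
 *          released and re-acquired (only possible when @a fRelaxedSem is
 *          true), false otherwise (chunk still mapped, or @a fRelaxedSem
 *          was false).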
3377 * @param pGMM Pointer to the GMM instance.
3378 * @param pGVM This is set when called from GMMR0CleanupVM so we can
3379 * unmap and free the chunk in one go.
3380 * @param pChunk The chunk to free.
3381 * @param fRelaxedSem Whether we can release the semaphore while doing the
3382 * freeing (@c true) or not.
3383 */
3384static bool gmmR0FreeChunk(PGMM pGMM, PGVM pGVM, PGMMCHUNK pChunk, bool fRelaxedSem)
3385{
3386 Assert(pChunk->Core.Key != NIL_GMM_CHUNKID);
3387
3388 GMMR0CHUNKMTXSTATE MtxState;
3389 gmmR0ChunkMutexAcquire(&MtxState, pGMM, pChunk, GMMR0CHUNK_MTX_KEEP_GIANT);
3390
3391 /*
3392 * Cleanup hack! Unmap the chunk from the caller's address space.
3393 * This shouldn't happen, so screw lock contention...
3394 */
3395 if (pChunk->cMappingsX && pGVM)
3396 gmmR0UnmapChunkLocked(pGMM, pGVM, pChunk);
3397
3398 /*
3399 * If there are current mappings of the chunk, then request the
3400 * VMs to unmap them. Reposition the chunk in the free list so
3401 * it won't be a likely candidate for allocations.
3402 */
3403 if (pChunk->cMappingsX)
3404 {
3405 /** @todo R0 -> VM request */
3406 /* The chunk can be mapped by more than one VM if fBoundMemoryMode is false! */
3407 Log(("gmmR0FreeChunk: chunk still has %d mappings; don't free!\n", pChunk->cMappingsX));
3408 gmmR0ChunkMutexRelease(&MtxState, pChunk);
3409 return false;
3410 }
3411
3412
3413 /*
3414 * Save and trash the handle.
3415 */
3416 RTR0MEMOBJ const hMemObj = pChunk->hMemObj;
3417 pChunk->hMemObj = NIL_RTR0MEMOBJ;
3418
3419 /*
3420 * Unlink it from everywhere.
3421 */
3422 gmmR0UnlinkChunk(pChunk);
3423
3424 RTSpinlockAcquire(pGMM->hSpinLockTree);
3425
3426 RTListNodeRemove(&pChunk->ListNode);
3427
3428 PAVLU32NODECORE pCore = RTAvlU32Remove(&pGMM->pChunks, pChunk->Core.Key);
3429 Assert(pCore == &pChunk->Core); NOREF(pCore);
3430
3431 PGMMCHUNKTLBE pTlbe = &pGMM->ChunkTLB.aEntries[GMM_CHUNKTLB_IDX(pChunk->Core.Key)];
3432 if (pTlbe->pChunk == pChunk)
3433 {
3434 pTlbe->idChunk = NIL_GMM_CHUNKID;
3435 pTlbe->pChunk = NULL;
3436 }
3437
3438 Assert(pGMM->cChunks > 0);
3439 pGMM->cChunks--;
3440
3441 uint64_t const idFreeGeneration = ASMAtomicIncU64(&pGMM->idFreeGeneration);
3442
3443 RTSpinlockRelease(pGMM->hSpinLockTree);
3444
3445 /*
3446 * Free the Chunk ID before dropping the locks and freeing the rest.
3447 */
3448 gmmR0FreeChunkId(pGMM, pChunk->Core.Key);
3449 pChunk->Core.Key = NIL_GMM_CHUNKID;
3450
3451 pGMM->cFreedChunks++;
3452
3453 gmmR0ChunkMutexRelease(&MtxState, NULL);
3454 if (fRelaxedSem)
3455 gmmR0MutexRelease(pGMM);
3456
3457 if (idFreeGeneration == UINT64_MAX / 4)
3458 gmmR0FreeChunkFlushPerVmTlbs(pGMM);
3459
3460 RTMemFree(pChunk->paMappingsX);
3461 pChunk->paMappingsX = NULL;
3462
3463 RTMemFree(pChunk);
3464
3465#ifndef VBOX_WITH_LINEAR_HOST_PHYS_MEM
3466 int rc = RTR0MemObjFree(hMemObj, true /* fFreeMappings */);
3467#else
3468 int rc = RTR0MemObjFree(hMemObj, false /* fFreeMappings */);
3469#endif
3470 AssertLogRelRC(rc);
3471
3472 if (fRelaxedSem)
3473 gmmR0MutexAcquire(pGMM);
3474 return fRelaxedSem;
3475}
3476
3477
3478/**
3479 * Free page worker.
3480 *
3481 * The caller does all the statistic decrementing, we do all the incrementing.
3482 *
3483 * @param pGMM Pointer to the GMM instance data.
3484 * @param pGVM Pointer to the GVM instance.
3485 * @param pChunk Pointer to the chunk this page belongs to.
3486 * @param idPage The Page ID.
3487 * @param pPage Pointer to the page.
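 *
 * A typical caller adjusts the relevant counters first and then hands the
 * page over; a sketch based on gmmR0FreePrivatePage below:
 * @code
 *      pChunk->cPrivate--;
 *      pGMM->cAllocatedPages--;
 *      gmmR0FreePageWorker(pGMM, pGVM, pChunk, idPage, pPage);
 * @endcode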
3488 */
3489static void gmmR0FreePageWorker(PGMM pGMM, PGVM pGVM, PGMMCHUNK pChunk, uint32_t idPage, PGMMPAGE pPage)
3490{
3491 Log3(("F pPage=%p iPage=%#x/%#x u2State=%d iFreeHead=%#x\n",
3492 pPage, pPage - &pChunk->aPages[0], idPage, pPage->Common.u2State, pChunk->iFreeHead)); NOREF(idPage);
3493
3494 /*
3495 * Put the page on the free list.
3496 */
3497 pPage->u = 0;
3498 pPage->Free.u2State = GMM_PAGE_STATE_FREE;
3499 pPage->Free.fZeroed = false;
3500 Assert(pChunk->iFreeHead < RT_ELEMENTS(pChunk->aPages) || pChunk->iFreeHead == UINT16_MAX);
3501 pPage->Free.iNext = pChunk->iFreeHead;
3502 pChunk->iFreeHead = pPage - &pChunk->aPages[0];
3503
3504 /*
3505 * Update statistics (the cShared/cPrivate stats are up to date already),
3506 * and relink the chunk if necessary.
3507 */
3508 unsigned const cFree = pChunk->cFree;
3509 if ( !cFree
3510 || gmmR0SelectFreeSetList(cFree) != gmmR0SelectFreeSetList(cFree + 1))
3511 {
3512 gmmR0UnlinkChunk(pChunk);
3513 pChunk->cFree++;
3514 gmmR0SelectSetAndLinkChunk(pGMM, pGVM, pChunk);
3515 }
3516 else
3517 {
3518 pChunk->cFree = cFree + 1;
3519 pChunk->pSet->cFreePages++;
3520 }
3521
3522 /*
3523 * If the chunk becomes empty, consider giving memory back to the host OS.
3524 *
3525 * The current strategy is to try give it back if there are other chunks
3526 * in this free list, meaning if there are at least 240 free pages in this
3527 * category. Note that since there are probably mappings of the chunk,
3528 * it won't be freed up instantly, which probably screws up this logic
3529 * a bit...
3530 */
3531 /** @todo Do this on the way out. */
3532 if (RT_LIKELY( pChunk->cFree != GMM_CHUNK_NUM_PAGES
3533 || pChunk->pFreeNext == NULL
3534 || pChunk->pFreePrev == NULL /** @todo this is probably misfiring, see reset... */))
3535 { /* likely */ }
3536 else
3537 gmmR0FreeChunk(pGMM, NULL, pChunk, false);
3538}
3539
3540
3541/**
3542 * Frees a shared page, the page is known to exist and be valid and such.
3543 *
3544 * @param pGMM Pointer to the GMM instance.
3545 * @param pGVM Pointer to the GVM instance.
3546 * @param idPage The page id.
3547 * @param pPage The page structure.
3548 */
3549DECLINLINE(void) gmmR0FreeSharedPage(PGMM pGMM, PGVM pGVM, uint32_t idPage, PGMMPAGE pPage)
3550{
3551 PGMMCHUNK pChunk = gmmR0GetChunk(pGMM, idPage >> GMM_CHUNKID_SHIFT);
3552 Assert(pChunk);
3553 Assert(pChunk->cFree < GMM_CHUNK_NUM_PAGES);
3554 Assert(pChunk->cShared > 0);
3555 Assert(pGMM->cSharedPages > 0);
3556 Assert(pGMM->cAllocatedPages > 0);
3557 Assert(!pPage->Shared.cRefs);
3558
3559 pChunk->cShared--;
3560 pGMM->cAllocatedPages--;
3561 pGMM->cSharedPages--;
3562 gmmR0FreePageWorker(pGMM, pGVM, pChunk, idPage, pPage);
3563}
3564
3565
3566/**
3567 * Frees a private page, the page is known to exist and be valid and such.
3568 *
3569 * @param pGMM Pointer to the GMM instance.
3570 * @param pGVM Pointer to the GVM instance.
3571 * @param idPage The page id.
3572 * @param pPage The page structure.
3573 */
3574DECLINLINE(void) gmmR0FreePrivatePage(PGMM pGMM, PGVM pGVM, uint32_t idPage, PGMMPAGE pPage)
3575{
3576 PGMMCHUNK pChunk = gmmR0GetChunk(pGMM, idPage >> GMM_CHUNKID_SHIFT);
3577 Assert(pChunk);
3578 Assert(pChunk->cFree < GMM_CHUNK_NUM_PAGES);
3579 Assert(pChunk->cPrivate > 0);
3580 Assert(pGMM->cAllocatedPages > 0);
3581
3582 pChunk->cPrivate--;
3583 pGMM->cAllocatedPages--;
3584 gmmR0FreePageWorker(pGMM, pGVM, pChunk, idPage, pPage);
3585}
3586
3587
3588/**
3589 * Common worker for GMMR0FreePages and GMMR0BalloonedPages.
3590 *
3591 * @returns VBox status code:
3592 * @retval xxx
3593 *
3594 * @param pGMM Pointer to the GMM instance data.
3595 * @param pGVM Pointer to the VM.
3596 * @param cPages The number of pages to free.
3597 * @param paPages Pointer to the page descriptors.
3598 * @param enmAccount The account this relates to.
3599 */
3600static int gmmR0FreePages(PGMM pGMM, PGVM pGVM, uint32_t cPages, PGMMFREEPAGEDESC paPages, GMMACCOUNT enmAccount)
3601{
3602 /*
3603 * Check that the request isn't impossible wrt to the account status.
3604 */
3605 switch (enmAccount)
3606 {
3607 case GMMACCOUNT_BASE:
3608 if (RT_UNLIKELY(pGVM->gmm.s.Stats.Allocated.cBasePages < cPages))
3609 {
3610 Log(("gmmR0FreePages: allocated=%#llx cPages=%#x!\n", pGVM->gmm.s.Stats.Allocated.cBasePages, cPages));
3611 return VERR_GMM_ATTEMPT_TO_FREE_TOO_MUCH;
3612 }
3613 break;
3614 case GMMACCOUNT_SHADOW:
3615 if (RT_UNLIKELY(pGVM->gmm.s.Stats.Allocated.cShadowPages < cPages))
3616 {
3617 Log(("gmmR0FreePages: allocated=%#llx cPages=%#x!\n", pGVM->gmm.s.Stats.Allocated.cShadowPages, cPages));
3618 return VERR_GMM_ATTEMPT_TO_FREE_TOO_MUCH;
3619 }
3620 break;
3621 case GMMACCOUNT_FIXED:
3622 if (RT_UNLIKELY(pGVM->gmm.s.Stats.Allocated.cFixedPages < cPages))
3623 {
3624 Log(("gmmR0FreePages: allocated=%#llx cPages=%#x!\n", pGVM->gmm.s.Stats.Allocated.cFixedPages, cPages));
3625 return VERR_GMM_ATTEMPT_TO_FREE_TOO_MUCH;
3626 }
3627 break;
3628 default:
3629 AssertMsgFailedReturn(("enmAccount=%d\n", enmAccount), VERR_IPE_NOT_REACHED_DEFAULT_CASE);
3630 }
3631
3632 /*
3633 * Walk the descriptors and free the pages.
3634 *
3635 * Statistics (except the account) are being updated as we go along,
3636 * unlike the alloc code. Also, stop on the first error.
3637 */
3638 int rc = VINF_SUCCESS;
3639 uint32_t iPage;
3640 for (iPage = 0; iPage < cPages; iPage++)
3641 {
3642 uint32_t idPage = paPages[iPage].idPage;
3643 PGMMPAGE pPage = gmmR0GetPage(pGMM, idPage);
3644 if (RT_LIKELY(pPage))
3645 {
3646 if (RT_LIKELY(GMM_PAGE_IS_PRIVATE(pPage)))
3647 {
3648 if (RT_LIKELY(pPage->Private.hGVM == pGVM->hSelf))
3649 {
3650 Assert(pGVM->gmm.s.Stats.cPrivatePages);
3651 pGVM->gmm.s.Stats.cPrivatePages--;
3652 gmmR0FreePrivatePage(pGMM, pGVM, idPage, pPage);
3653 }
3654 else
3655 {
3656 Log(("gmmR0FreePages: #%#x/%#x: not owner! hGVM=%#x hSelf=%#x\n", iPage, idPage,
3657 pPage->Private.hGVM, pGVM->hSelf));
3658 rc = VERR_GMM_NOT_PAGE_OWNER;
3659 break;
3660 }
3661 }
3662 else if (RT_LIKELY(GMM_PAGE_IS_SHARED(pPage)))
3663 {
3664 Assert(pGVM->gmm.s.Stats.cSharedPages);
3665 Assert(pPage->Shared.cRefs);
3666#if defined(VBOX_WITH_PAGE_SHARING) && defined(VBOX_STRICT)
3667 if (pPage->Shared.u14Checksum)
3668 {
3669 uint32_t uChecksum = gmmR0StrictPageChecksum(pGMM, pGVM, idPage);
3670 uChecksum &= UINT32_C(0x00003fff);
3671 AssertMsg(!uChecksum || uChecksum == pPage->Shared.u14Checksum,
3672 ("%#x vs %#x - idPage=%#x\n", uChecksum, pPage->Shared.u14Checksum, idPage));
3673 }
3674#endif
3675 pGVM->gmm.s.Stats.cSharedPages--;
3676 if (!--pPage->Shared.cRefs)
3677 gmmR0FreeSharedPage(pGMM, pGVM, idPage, pPage);
3678 else
3679 {
3680 Assert(pGMM->cDuplicatePages);
3681 pGMM->cDuplicatePages--;
3682 }
3683 }
3684 else
3685 {
3686 Log(("gmmR0FreePages: #%#x/%#x: already free!\n", iPage, idPage));
3687 rc = VERR_GMM_PAGE_ALREADY_FREE;
3688 break;
3689 }
3690 }
3691 else
3692 {
3693 Log(("gmmR0FreePages: #%#x/%#x: not found!\n", iPage, idPage));
3694 rc = VERR_GMM_PAGE_NOT_FOUND;
3695 break;
3696 }
3697 paPages[iPage].idPage = NIL_GMM_PAGEID;
3698 }
3699
3700 /*
3701 * Update the account.
3702 */
3703 switch (enmAccount)
3704 {
3705 case GMMACCOUNT_BASE: pGVM->gmm.s.Stats.Allocated.cBasePages -= iPage; break;
3706 case GMMACCOUNT_SHADOW: pGVM->gmm.s.Stats.Allocated.cShadowPages -= iPage; break;
3707 case GMMACCOUNT_FIXED: pGVM->gmm.s.Stats.Allocated.cFixedPages -= iPage; break;
3708 default:
3709 AssertMsgFailedReturn(("enmAccount=%d\n", enmAccount), VERR_IPE_NOT_REACHED_DEFAULT_CASE);
3710 }
3711
3712 /*
3713 * Any threshold stuff to be done here?
3714 */
3715
3716 return rc;
3717}
3718
3719
3720/**
3721 * Free one or more pages.
3722 *
3723 * This is typically used at reset time or power off.
3724 *
3725 * @returns VBox status code:
3726 * @retval xxx
3727 *
3728 * @param pGVM The global (ring-0) VM structure.
3729 * @param idCpu The VCPU id.
3730 * @param cPages The number of pages to allocate.
3731 * @param paPages Pointer to the page descriptors containing the page IDs
3732 * for each page.
3733 * @param enmAccount The account this relates to.
3734 * @thread EMT.
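 *
 * A minimal ring-0 usage sketch (the page IDs are illustrative values that
 * must come from a previous allocation):
 * @code
 *      GMMFREEPAGEDESC aPages[2];
 *      aPages[0].idPage = idFirstPage;
 *      aPages[1].idPage = idSecondPage;
 *      int rc = GMMR0FreePages(pGVM, idCpu, RT_ELEMENTS(aPages), &aPages[0], GMMACCOUNT_BASE);
 *      // on success each descriptor's idPage is set to NIL_GMM_PAGEID
 * @endcode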
3735 */
3736GMMR0DECL(int) GMMR0FreePages(PGVM pGVM, VMCPUID idCpu, uint32_t cPages, PGMMFREEPAGEDESC paPages, GMMACCOUNT enmAccount)
3737{
3738 LogFlow(("GMMR0FreePages: pGVM=%p cPages=%#x paPages=%p enmAccount=%d\n", pGVM, cPages, paPages, enmAccount));
3739
3740 /*
3741 * Validate input and get the basics.
3742 */
3743 PGMM pGMM;
3744 GMM_GET_VALID_INSTANCE(pGMM, VERR_GMM_INSTANCE);
3745 int rc = GVMMR0ValidateGVMandEMT(pGVM, idCpu);
3746 if (RT_FAILURE(rc))
3747 return rc;
3748
3749 AssertPtrReturn(paPages, VERR_INVALID_PARAMETER);
3750 AssertMsgReturn(enmAccount > GMMACCOUNT_INVALID && enmAccount < GMMACCOUNT_END, ("%d\n", enmAccount), VERR_INVALID_PARAMETER);
3751 AssertMsgReturn(cPages > 0 && cPages < RT_BIT(32 - PAGE_SHIFT), ("%#x\n", cPages), VERR_INVALID_PARAMETER);
3752
3753 for (unsigned iPage = 0; iPage < cPages; iPage++)
3754 AssertMsgReturn( paPages[iPage].idPage <= GMM_PAGEID_LAST
3755 /*|| paPages[iPage].idPage == NIL_GMM_PAGEID*/,
3756 ("#%#x: %#x\n", iPage, paPages[iPage].idPage), VERR_INVALID_PARAMETER);
3757
3758 /*
3759 * Take the semaphore and call the worker function.
3760 */
3761 gmmR0MutexAcquire(pGMM);
3762 if (GMM_CHECK_SANITY_UPON_ENTERING(pGMM))
3763 {
3764 rc = gmmR0FreePages(pGMM, pGVM, cPages, paPages, enmAccount);
3765 GMM_CHECK_SANITY_UPON_LEAVING(pGMM);
3766 }
3767 else
3768 rc = VERR_GMM_IS_NOT_SANE;
3769 gmmR0MutexRelease(pGMM);
3770 LogFlow(("GMMR0FreePages: returns %Rrc\n", rc));
3771 return rc;
3772}
3773
3774
3775/**
3776 * VMMR0 request wrapper for GMMR0FreePages.
3777 *
3778 * @returns see GMMR0FreePages.
3779 * @param pGVM The global (ring-0) VM structure.
3780 * @param idCpu The VCPU id.
3781 * @param pReq Pointer to the request packet.
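 *
 * Callers are expected to size the request to the number of page descriptors;
 * a sketch, assuming the usual SUPVMMR0REQHDR initialization conventions:
 * @code
 *      uint32_t const cbReq = RT_UOFFSETOF_DYN(GMMFREEPAGESREQ, aPages[cPages]);
 *      PGMMFREEPAGESREQ pReq = (PGMMFREEPAGESREQ)RTMemAllocZ(cbReq);
 *      pReq->Hdr.u32Magic = SUPVMMR0REQHDR_MAGIC;
 *      pReq->Hdr.cbReq    = cbReq;
 *      pReq->enmAccount   = GMMACCOUNT_BASE;
 *      pReq->cPages       = cPages;
 *      // fill pReq->aPages[0..cPages-1].idPage before passing the request on
 * @endcode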
3782 */
3783GMMR0DECL(int) GMMR0FreePagesReq(PGVM pGVM, VMCPUID idCpu, PGMMFREEPAGESREQ pReq)
3784{
3785 /*
3786 * Validate input and pass it on.
3787 */
3788 AssertPtrReturn(pReq, VERR_INVALID_POINTER);
3789 AssertMsgReturn(pReq->Hdr.cbReq >= RT_UOFFSETOF(GMMFREEPAGESREQ, aPages[0]),
3790 ("%#x < %#x\n", pReq->Hdr.cbReq, RT_UOFFSETOF(GMMFREEPAGESREQ, aPages[0])),
3791 VERR_INVALID_PARAMETER);
3792 AssertMsgReturn(pReq->Hdr.cbReq == RT_UOFFSETOF_DYN(GMMFREEPAGESREQ, aPages[pReq->cPages]),
3793 ("%#x != %#x\n", pReq->Hdr.cbReq, RT_UOFFSETOF_DYN(GMMFREEPAGESREQ, aPages[pReq->cPages])),
3794 VERR_INVALID_PARAMETER);
3795
3796 return GMMR0FreePages(pGVM, idCpu, pReq->cPages, &pReq->aPages[0], pReq->enmAccount);
3797}
3798
3799
3800/**
3801 * Report back on a memory ballooning request.
3802 *
3803 * The request may or may not have been initiated by the GMM. If it was initiated
3804 * by the GMM it is important that this function is called even if no pages were
3805 * ballooned.
3806 *
3807 * @returns VBox status code:
3808 * @retval VERR_GMM_ATTEMPT_TO_FREE_TOO_MUCH
3809 * @retval VERR_GMM_ATTEMPT_TO_DEFLATE_TOO_MUCH
3810 * @retval VERR_GMM_OVERCOMMITTED_TRY_AGAIN_IN_A_BIT - reset condition
3811 * indicating that we won't necessarily have sufficient RAM to boot
3812 * the VM again and that it should pause until this changes (we'll try
3813 * balloon some other VM). (For standard deflate we have little choice
3814 * but to hope the VM won't use the memory that was returned to it.)
3815 *
3816 * @param pGVM The global (ring-0) VM structure.
3817 * @param idCpu The VCPU id.
3818 * @param enmAction Inflate/deflate/reset.
3819 * @param cBalloonedPages The number of pages that was ballooned.
3820 *
3821 * @thread EMT(idCpu)
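 *
 * For GMMBALLOONACTION_INFLATE the request is only accepted while it still
 * fits the reservation (see the inflate case below):
 * @code
 *      pGVM->gmm.s.Stats.Allocated.cBasePages + pGVM->gmm.s.Stats.cBalloonedPages + cBalloonedPages
 *          <= pGVM->gmm.s.Stats.Reserved.cBasePages
 * @endcode
 * otherwise VERR_GMM_ATTEMPT_TO_FREE_TOO_MUCH is returned.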
3822 */
3823GMMR0DECL(int) GMMR0BalloonedPages(PGVM pGVM, VMCPUID idCpu, GMMBALLOONACTION enmAction, uint32_t cBalloonedPages)
3824{
3825 LogFlow(("GMMR0BalloonedPages: pGVM=%p enmAction=%d cBalloonedPages=%#x\n",
3826 pGVM, enmAction, cBalloonedPages));
3827
3828 AssertMsgReturn(cBalloonedPages < RT_BIT(32 - PAGE_SHIFT), ("%#x\n", cBalloonedPages), VERR_INVALID_PARAMETER);
3829
3830 /*
3831 * Validate input and get the basics.
3832 */
3833 PGMM pGMM;
3834 GMM_GET_VALID_INSTANCE(pGMM, VERR_GMM_INSTANCE);
3835 int rc = GVMMR0ValidateGVMandEMT(pGVM, idCpu);
3836 if (RT_FAILURE(rc))
3837 return rc;
3838
3839 /*
3840 * Take the semaphore and do some more validations.
3841 */
3842 gmmR0MutexAcquire(pGMM);
3843 if (GMM_CHECK_SANITY_UPON_ENTERING(pGMM))
3844 {
3845 switch (enmAction)
3846 {
3847 case GMMBALLOONACTION_INFLATE:
3848 {
3849 if (RT_LIKELY(pGVM->gmm.s.Stats.Allocated.cBasePages + pGVM->gmm.s.Stats.cBalloonedPages + cBalloonedPages
3850 <= pGVM->gmm.s.Stats.Reserved.cBasePages))
3851 {
3852 /*
3853 * Record the ballooned memory.
3854 */
3855 pGMM->cBalloonedPages += cBalloonedPages;
3856 if (pGVM->gmm.s.Stats.cReqBalloonedPages)
3857 {
3858 /* Codepath never taken. Might be interesting in the future to request ballooned memory from guests in low memory conditions... */
3859 AssertFailed();
3860
3861 pGVM->gmm.s.Stats.cBalloonedPages += cBalloonedPages;
3862 pGVM->gmm.s.Stats.cReqActuallyBalloonedPages += cBalloonedPages;
3863 Log(("GMMR0BalloonedPages: +%#x - Global=%#llx / VM: Total=%#llx Req=%#llx Actual=%#llx (pending)\n",
3864 cBalloonedPages, pGMM->cBalloonedPages, pGVM->gmm.s.Stats.cBalloonedPages,
3865 pGVM->gmm.s.Stats.cReqBalloonedPages, pGVM->gmm.s.Stats.cReqActuallyBalloonedPages));
3866 }
3867 else
3868 {
3869 pGVM->gmm.s.Stats.cBalloonedPages += cBalloonedPages;
3870 Log(("GMMR0BalloonedPages: +%#x - Global=%#llx / VM: Total=%#llx (user)\n",
3871 cBalloonedPages, pGMM->cBalloonedPages, pGVM->gmm.s.Stats.cBalloonedPages));
3872 }
3873 }
3874 else
3875 {
3876 Log(("GMMR0BalloonedPages: cBasePages=%#llx Total=%#llx cBalloonedPages=%#llx Reserved=%#llx\n",
3877 pGVM->gmm.s.Stats.Allocated.cBasePages, pGVM->gmm.s.Stats.cBalloonedPages, cBalloonedPages,
3878 pGVM->gmm.s.Stats.Reserved.cBasePages));
3879 rc = VERR_GMM_ATTEMPT_TO_FREE_TOO_MUCH;
3880 }
3881 break;
3882 }
3883
3884 case GMMBALLOONACTION_DEFLATE:
3885 {
3886 /* Deflate. */
3887 if (pGVM->gmm.s.Stats.cBalloonedPages >= cBalloonedPages)
3888 {
3889 /*
3890 * Record the ballooned memory.
3891 */
3892 Assert(pGMM->cBalloonedPages >= cBalloonedPages);
3893 pGMM->cBalloonedPages -= cBalloonedPages;
3894 pGVM->gmm.s.Stats.cBalloonedPages -= cBalloonedPages;
3895 if (pGVM->gmm.s.Stats.cReqDeflatePages)
3896 {
3897 AssertFailed(); /* This path is for later. */
3898 Log(("GMMR0BalloonedPages: -%#x - Global=%#llx / VM: Total=%#llx Req=%#llx\n",
3899 cBalloonedPages, pGMM->cBalloonedPages, pGVM->gmm.s.Stats.cBalloonedPages, pGVM->gmm.s.Stats.cReqDeflatePages));
3900
3901 /*
3902 * Anything we need to do here now when the request has been completed?
3903 */
3904 pGVM->gmm.s.Stats.cReqDeflatePages = 0;
3905 }
3906 else
3907 Log(("GMMR0BalloonedPages: -%#x - Global=%#llx / VM: Total=%#llx (user)\n",
3908 cBalloonedPages, pGMM->cBalloonedPages, pGVM->gmm.s.Stats.cBalloonedPages));
3909 }
3910 else
3911 {
3912 Log(("GMMR0BalloonedPages: Total=%#llx cBalloonedPages=%#llx\n", pGVM->gmm.s.Stats.cBalloonedPages, cBalloonedPages));
3913 rc = VERR_GMM_ATTEMPT_TO_DEFLATE_TOO_MUCH;
3914 }
3915 break;
3916 }
3917
3918 case GMMBALLOONACTION_RESET:
3919 {
3920 /* Reset to an empty balloon. */
3921 Assert(pGMM->cBalloonedPages >= pGVM->gmm.s.Stats.cBalloonedPages);
3922
3923 pGMM->cBalloonedPages -= pGVM->gmm.s.Stats.cBalloonedPages;
3924 pGVM->gmm.s.Stats.cBalloonedPages = 0;
3925 break;
3926 }
3927
3928 default:
3929 rc = VERR_INVALID_PARAMETER;
3930 break;
3931 }
3932 GMM_CHECK_SANITY_UPON_LEAVING(pGMM);
3933 }
3934 else
3935 rc = VERR_GMM_IS_NOT_SANE;
3936
3937 gmmR0MutexRelease(pGMM);
3938 LogFlow(("GMMR0BalloonedPages: returns %Rrc\n", rc));
3939 return rc;
3940}
3941
3942
3943/**
3944 * VMMR0 request wrapper for GMMR0BalloonedPages.
3945 *
3946 * @returns see GMMR0BalloonedPages.
3947 * @param pGVM The global (ring-0) VM structure.
3948 * @param idCpu The VCPU id.
3949 * @param pReq Pointer to the request packet.
3950 */
3951GMMR0DECL(int) GMMR0BalloonedPagesReq(PGVM pGVM, VMCPUID idCpu, PGMMBALLOONEDPAGESREQ pReq)
3952{
3953 /*
3954 * Validate input and pass it on.
3955 */
3956 AssertPtrReturn(pReq, VERR_INVALID_POINTER);
3957 AssertMsgReturn(pReq->Hdr.cbReq == sizeof(GMMBALLOONEDPAGESREQ),
3958 ("%#x < %#x\n", pReq->Hdr.cbReq, sizeof(GMMBALLOONEDPAGESREQ)),
3959 VERR_INVALID_PARAMETER);
3960
3961 return GMMR0BalloonedPages(pGVM, idCpu, pReq->enmAction, pReq->cBalloonedPages);
3962}
3963
3964
3965/**
3966 * Return memory statistics for the hypervisor
3967 *
3968 * @returns VBox status code.
3969 * @param pReq Pointer to the request packet.
3970 */
3971GMMR0DECL(int) GMMR0QueryHypervisorMemoryStatsReq(PGMMMEMSTATSREQ pReq)
3972{
3973 /*
3974 * Validate input and pass it on.
3975 */
3976 AssertPtrReturn(pReq, VERR_INVALID_POINTER);
3977 AssertMsgReturn(pReq->Hdr.cbReq == sizeof(GMMMEMSTATSREQ),
3978 ("%#x < %#x\n", pReq->Hdr.cbReq, sizeof(GMMMEMSTATSREQ)),
3979 VERR_INVALID_PARAMETER);
3980
3981 /*
3982 * Validate input and get the basics.
3983 */
3984 PGMM pGMM;
3985 GMM_GET_VALID_INSTANCE(pGMM, VERR_GMM_INSTANCE);
3986 pReq->cAllocPages = pGMM->cAllocatedPages;
3987 pReq->cFreePages = (pGMM->cChunks << (GMM_CHUNK_SHIFT - PAGE_SHIFT)) - pGMM->cAllocatedPages;
3988 pReq->cBalloonedPages = pGMM->cBalloonedPages;
3989 pReq->cMaxPages = pGMM->cMaxPages;
3990 pReq->cSharedPages = pGMM->cDuplicatePages;
3991 GMM_CHECK_SANITY_UPON_LEAVING(pGMM);
3992
3993 return VINF_SUCCESS;
3994}
3995
3996
3997/**
3998 * Return memory statistics for the VM
3999 *
4000 * @returns VBox status code.
4001 * @param pGVM The global (ring-0) VM structure.
4002 * @param idCpu Cpu id.
4003 * @param pReq Pointer to the request packet.
4004 *
4005 * @thread EMT(idCpu)
4006 */
4007GMMR0DECL(int) GMMR0QueryMemoryStatsReq(PGVM pGVM, VMCPUID idCpu, PGMMMEMSTATSREQ pReq)
4008{
4009 /*
4010 * Validate input and pass it on.
4011 */
4012 AssertPtrReturn(pReq, VERR_INVALID_POINTER);
4013 AssertMsgReturn(pReq->Hdr.cbReq == sizeof(GMMMEMSTATSREQ),
4014 ("%#x < %#x\n", pReq->Hdr.cbReq, sizeof(GMMMEMSTATSREQ)),
4015 VERR_INVALID_PARAMETER);
4016
4017 /*
4018 * Validate input and get the basics.
4019 */
4020 PGMM pGMM;
4021 GMM_GET_VALID_INSTANCE(pGMM, VERR_GMM_INSTANCE);
4022 int rc = GVMMR0ValidateGVMandEMT(pGVM, idCpu);
4023 if (RT_FAILURE(rc))
4024 return rc;
4025
4026 /*
4027 * Take the semaphore and do some more validations.
4028 */
4029 gmmR0MutexAcquire(pGMM);
4030 if (GMM_CHECK_SANITY_UPON_ENTERING(pGMM))
4031 {
4032 pReq->cAllocPages = pGVM->gmm.s.Stats.Allocated.cBasePages;
4033 pReq->cBalloonedPages = pGVM->gmm.s.Stats.cBalloonedPages;
4034 pReq->cMaxPages = pGVM->gmm.s.Stats.Reserved.cBasePages;
4035 pReq->cFreePages = pReq->cMaxPages - pReq->cAllocPages;
4036 }
4037 else
4038 rc = VERR_GMM_IS_NOT_SANE;
4039
4040 gmmR0MutexRelease(pGMM);
4041 LogFlow(("GMMR0QueryMemoryStatsReq: returns %Rrc\n", rc));
4042 return rc;
4043}
4044
4045
4046/**
4047 * Worker for gmmR0UnmapChunk and gmmr0FreeChunk.
4048 *
4049 * Don't call this in legacy allocation mode!
4050 *
4051 * @returns VBox status code.
4052 * @param pGMM Pointer to the GMM instance data.
4053 * @param pGVM Pointer to the Global VM structure.
4054 * @param pChunk Pointer to the chunk to be unmapped.
4055 */
4056static int gmmR0UnmapChunkLocked(PGMM pGMM, PGVM pGVM, PGMMCHUNK pChunk)
4057{
4058 RT_NOREF_PV(pGMM);
4059
4060 /*
4061 * Find the mapping and try unmapping it.
4062 */
4063 uint32_t cMappings = pChunk->cMappingsX;
4064 for (uint32_t i = 0; i < cMappings; i++)
4065 {
4066 Assert(pChunk->paMappingsX[i].pGVM && pChunk->paMappingsX[i].hMapObj != NIL_RTR0MEMOBJ);
4067 if (pChunk->paMappingsX[i].pGVM == pGVM)
4068 {
4069 /* unmap */
4070 int rc = RTR0MemObjFree(pChunk->paMappingsX[i].hMapObj, false /* fFreeMappings (NA) */);
4071 if (RT_SUCCESS(rc))
4072 {
4073 /* update the record. */
4074 cMappings--;
4075 if (i < cMappings)
4076 pChunk->paMappingsX[i] = pChunk->paMappingsX[cMappings];
4077 pChunk->paMappingsX[cMappings].hMapObj = NIL_RTR0MEMOBJ;
4078 pChunk->paMappingsX[cMappings].pGVM = NULL;
4079 Assert(pChunk->cMappingsX - 1U == cMappings);
4080 pChunk->cMappingsX = cMappings;
4081 }
4082
4083 return rc;
4084 }
4085 }
4086
4087 Log(("gmmR0UnmapChunk: Chunk %#x is not mapped into pGVM=%p/%#x\n", pChunk->Core.Key, pGVM, pGVM->hSelf));
4088 return VERR_GMM_CHUNK_NOT_MAPPED;
4089}
4090
4091
4092/**
4093 * Unmaps a chunk previously mapped into the address space of the current process.
4094 *
4095 * @returns VBox status code.
4096 * @param pGMM Pointer to the GMM instance data.
4097 * @param pGVM Pointer to the Global VM structure.
4098 * @param pChunk Pointer to the chunk to be unmapped.
4099 * @param fRelaxedSem Whether we can release the semaphore while doing the
4100 * mapping (@c true) or not.
4101 */
4102static int gmmR0UnmapChunk(PGMM pGMM, PGVM pGVM, PGMMCHUNK pChunk, bool fRelaxedSem)
4103{
4104 /*
4105 * Lock the chunk and if possible leave the giant GMM lock.
4106 */
4107 GMMR0CHUNKMTXSTATE MtxState;
4108 int rc = gmmR0ChunkMutexAcquire(&MtxState, pGMM, pChunk,
4109 fRelaxedSem ? GMMR0CHUNK_MTX_RETAKE_GIANT : GMMR0CHUNK_MTX_KEEP_GIANT);
4110 if (RT_SUCCESS(rc))
4111 {
4112 rc = gmmR0UnmapChunkLocked(pGMM, pGVM, pChunk);
4113 gmmR0ChunkMutexRelease(&MtxState, pChunk);
4114 }
4115 return rc;
4116}
4117
4118
4119/**
4120 * Worker for gmmR0MapChunk.
4121 *
4122 * @returns VBox status code.
4123 * @param pGMM Pointer to the GMM instance data.
4124 * @param pGVM Pointer to the Global VM structure.
4125 * @param pChunk Pointer to the chunk to be mapped.
4126 * @param ppvR3 Where to store the ring-3 address of the mapping.
4127 * In the VERR_GMM_CHUNK_ALREADY_MAPPED case, this will
4128 * contain the address of the existing mapping.
4129 */
4130static int gmmR0MapChunkLocked(PGMM pGMM, PGVM pGVM, PGMMCHUNK pChunk, PRTR3PTR ppvR3)
4131{
4132 RT_NOREF(pGMM);
4133
4134 /*
4135 * Check to see if the chunk is already mapped.
4136 */
4137 for (uint32_t i = 0; i < pChunk->cMappingsX; i++)
4138 {
4139 Assert(pChunk->paMappingsX[i].pGVM && pChunk->paMappingsX[i].hMapObj != NIL_RTR0MEMOBJ);
4140 if (pChunk->paMappingsX[i].pGVM == pGVM)
4141 {
4142 *ppvR3 = RTR0MemObjAddressR3(pChunk->paMappingsX[i].hMapObj);
4143 Log(("gmmR0MapChunk: chunk %#x is already mapped at %p!\n", pChunk->Core.Key, *ppvR3));
4144#ifdef VBOX_WITH_PAGE_SHARING
4145 /* The ring-3 chunk cache can be out of sync; don't fail. */
4146 return VINF_SUCCESS;
4147#else
4148 return VERR_GMM_CHUNK_ALREADY_MAPPED;
4149#endif
4150 }
4151 }
4152
4153 /*
4154 * Do the mapping.
4155 */
4156 RTR0MEMOBJ hMapObj;
4157 int rc = RTR0MemObjMapUser(&hMapObj, pChunk->hMemObj, (RTR3PTR)-1, 0, RTMEM_PROT_READ | RTMEM_PROT_WRITE, NIL_RTR0PROCESS);
4158 if (RT_SUCCESS(rc))
4159 {
4160 /* reallocate the array? assumes few users per chunk (usually one). */
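        /* Growth pattern: one element at a time up to a capacity of 4, then in
           steps of 4 (4 -> 8 -> 12 -> ...); a reallocation is only needed when
           the current count is <= 3 or a multiple of 4. */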
4161 unsigned iMapping = pChunk->cMappingsX;
4162 if ( iMapping <= 3
4163 || (iMapping & 3) == 0)
4164 {
4165 unsigned cNewSize = iMapping <= 3
4166 ? iMapping + 1
4167 : iMapping + 4;
4168 Assert(cNewSize < 4 || RT_ALIGN_32(cNewSize, 4) == cNewSize);
4169 if (RT_UNLIKELY(cNewSize > UINT16_MAX))
4170 {
4171 rc = RTR0MemObjFree(hMapObj, false /* fFreeMappings (NA) */); AssertRC(rc);
4172 return VERR_GMM_TOO_MANY_CHUNK_MAPPINGS;
4173 }
4174
4175 void *pvMappings = RTMemRealloc(pChunk->paMappingsX, cNewSize * sizeof(pChunk->paMappingsX[0]));
4176 if (RT_UNLIKELY(!pvMappings))
4177 {
4178 rc = RTR0MemObjFree(hMapObj, false /* fFreeMappings (NA) */); AssertRC(rc);
4179 return VERR_NO_MEMORY;
4180 }
4181 pChunk->paMappingsX = (PGMMCHUNKMAP)pvMappings;
4182 }
4183
4184 /* insert new entry */
4185 pChunk->paMappingsX[iMapping].hMapObj = hMapObj;
4186 pChunk->paMappingsX[iMapping].pGVM = pGVM;
4187 Assert(pChunk->cMappingsX == iMapping);
4188 pChunk->cMappingsX = iMapping + 1;
4189
4190 *ppvR3 = RTR0MemObjAddressR3(hMapObj);
4191 }
4192
4193 return rc;
4194}
4195
4196
4197/**
4198 * Maps a chunk into the user address space of the current process.
4199 *
4200 * @returns VBox status code.
4201 * @param pGMM Pointer to the GMM instance data.
4202 * @param pGVM Pointer to the Global VM structure.
4203 * @param pChunk Pointer to the chunk to be mapped.
4204 * @param fRelaxedSem Whether we can release the semaphore while doing the
4205 * mapping (@c true) or not.
4206 * @param ppvR3 Where to store the ring-3 address of the mapping.
4207 * In the VERR_GMM_CHUNK_ALREADY_MAPPED case, this will
4208 * contain the address of the existing mapping.
4209 */
4210static int gmmR0MapChunk(PGMM pGMM, PGVM pGVM, PGMMCHUNK pChunk, bool fRelaxedSem, PRTR3PTR ppvR3)
4211{
4212 /*
4213 * Take the chunk lock and leave the giant GMM lock when possible, then
4214 * call the worker function.
4215 */
4216 GMMR0CHUNKMTXSTATE MtxState;
4217 int rc = gmmR0ChunkMutexAcquire(&MtxState, pGMM, pChunk,
4218 fRelaxedSem ? GMMR0CHUNK_MTX_RETAKE_GIANT : GMMR0CHUNK_MTX_KEEP_GIANT);
4219 if (RT_SUCCESS(rc))
4220 {
4221 rc = gmmR0MapChunkLocked(pGMM, pGVM, pChunk, ppvR3);
4222 gmmR0ChunkMutexRelease(&MtxState, pChunk);
4223 }
4224
4225 return rc;
4226}
4227
4228
4229
4230#if defined(VBOX_WITH_PAGE_SHARING) || defined(VBOX_STRICT)
4231/**
4232 * Check if a chunk is mapped into the specified VM
4233 *
4234 * @returns mapped yes/no
4235 * @param pGMM Pointer to the GMM instance.
4236 * @param pGVM Pointer to the Global VM structure.
4237 * @param pChunk Pointer to the chunk to be mapped.
4238 * @param ppvR3 Where to store the ring-3 address of the mapping.
4239 */
4240static bool gmmR0IsChunkMapped(PGMM pGMM, PGVM pGVM, PGMMCHUNK pChunk, PRTR3PTR ppvR3)
4241{
4242 GMMR0CHUNKMTXSTATE MtxState;
4243 gmmR0ChunkMutexAcquire(&MtxState, pGMM, pChunk, GMMR0CHUNK_MTX_KEEP_GIANT);
4244 for (uint32_t i = 0; i < pChunk->cMappingsX; i++)
4245 {
4246 Assert(pChunk->paMappingsX[i].pGVM && pChunk->paMappingsX[i].hMapObj != NIL_RTR0MEMOBJ);
4247 if (pChunk->paMappingsX[i].pGVM == pGVM)
4248 {
4249 *ppvR3 = RTR0MemObjAddressR3(pChunk->paMappingsX[i].hMapObj);
4250 gmmR0ChunkMutexRelease(&MtxState, pChunk);
4251 return true;
4252 }
4253 }
4254 *ppvR3 = NULL;
4255 gmmR0ChunkMutexRelease(&MtxState, pChunk);
4256 return false;
4257}
4258#endif /* VBOX_WITH_PAGE_SHARING || VBOX_STRICT */
4259
4260
4261/**
4262 * Map a chunk and/or unmap another chunk.
4263 *
4264 * The mapping and unmapping applies to the current process.
4265 *
4266 * This API does two things because it saves a kernel call per mapping when
4267 * the ring-3 mapping cache is full.
4268 *
4269 * @returns VBox status code.
4270 * @param pGVM The global (ring-0) VM structure.
4271 * @param idChunkMap The chunk to map. NIL_GMM_CHUNKID if nothing to map.
4272 * @param idChunkUnmap The chunk to unmap. NIL_GMM_CHUNKID if nothing to unmap.
4273 * @param ppvR3 Where to store the address of the mapped chunk. NULL is ok if nothing to map.
4274 * @thread EMT ???
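 *
 * A minimal usage sketch, mapping one chunk while evicting another from the
 * ring-3 cache (the chunk IDs are illustrative):
 * @code
 *      RTR3PTR pvR3 = NIL_RTR3PTR;
 *      int rc = GMMR0MapUnmapChunk(pGVM, idChunkToMap, idChunkToEvict, &pvR3);
 *      // pass NIL_GMM_CHUNKID for either ID when only one of the operations is wanted
 * @endcode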
4275 */
4276GMMR0DECL(int) GMMR0MapUnmapChunk(PGVM pGVM, uint32_t idChunkMap, uint32_t idChunkUnmap, PRTR3PTR ppvR3)
4277{
4278 LogFlow(("GMMR0MapUnmapChunk: pGVM=%p idChunkMap=%#x idChunkUnmap=%#x ppvR3=%p\n",
4279 pGVM, idChunkMap, idChunkUnmap, ppvR3));
4280
4281 /*
4282 * Validate input and get the basics.
4283 */
4284 PGMM pGMM;
4285 GMM_GET_VALID_INSTANCE(pGMM, VERR_GMM_INSTANCE);
4286 int rc = GVMMR0ValidateGVM(pGVM);
4287 if (RT_FAILURE(rc))
4288 return rc;
4289
4290 AssertCompile(NIL_GMM_CHUNKID == 0);
4291 AssertMsgReturn(idChunkMap <= GMM_CHUNKID_LAST, ("%#x\n", idChunkMap), VERR_INVALID_PARAMETER);
4292 AssertMsgReturn(idChunkUnmap <= GMM_CHUNKID_LAST, ("%#x\n", idChunkUnmap), VERR_INVALID_PARAMETER);
4293
4294 if ( idChunkMap == NIL_GMM_CHUNKID
4295 && idChunkUnmap == NIL_GMM_CHUNKID)
4296 return VERR_INVALID_PARAMETER;
4297
4298 if (idChunkMap != NIL_GMM_CHUNKID)
4299 {
4300 AssertPtrReturn(ppvR3, VERR_INVALID_POINTER);
4301 *ppvR3 = NIL_RTR3PTR;
4302 }
4303
4304 /*
4305 * Take the semaphore and do the work.
4306 *
4307 * The unmapping is done last since it's easier to undo a mapping than
4308 * undoing an unmapping. The ring-3 mapping cache cannot be so big
4309 * that it pushes the user virtual address space to within a chunk of
4310 * its limits, so no problem here.
4311 */
4312 gmmR0MutexAcquire(pGMM);
4313 if (GMM_CHECK_SANITY_UPON_ENTERING(pGMM))
4314 {
4315 PGMMCHUNK pMap = NULL;
4316 if (idChunkMap != NIL_GMM_CHUNKID)
4317 {
4318 pMap = gmmR0GetChunk(pGMM, idChunkMap);
4319 if (RT_LIKELY(pMap))
4320 rc = gmmR0MapChunk(pGMM, pGVM, pMap, true /*fRelaxedSem*/, ppvR3);
4321 else
4322 {
4323 Log(("GMMR0MapUnmapChunk: idChunkMap=%#x\n", idChunkMap));
4324 rc = VERR_GMM_CHUNK_NOT_FOUND;
4325 }
4326 }
4327/** @todo split this operation, the bail out might (theoretically) not be
4328 * entirely safe. */
4329
4330 if ( idChunkUnmap != NIL_GMM_CHUNKID
4331 && RT_SUCCESS(rc))
4332 {
4333 PGMMCHUNK pUnmap = gmmR0GetChunk(pGMM, idChunkUnmap);
4334 if (RT_LIKELY(pUnmap))
4335 rc = gmmR0UnmapChunk(pGMM, pGVM, pUnmap, true /*fRelaxedSem*/);
4336 else
4337 {
4338 Log(("GMMR0MapUnmapChunk: idChunkUnmap=%#x\n", idChunkUnmap));
4339 rc = VERR_GMM_CHUNK_NOT_FOUND;
4340 }
4341
4342 if (RT_FAILURE(rc) && pMap)
4343 gmmR0UnmapChunk(pGMM, pGVM, pMap, false /*fRelaxedSem*/);
4344 }
4345
4346 GMM_CHECK_SANITY_UPON_LEAVING(pGMM);
4347 }
4348 else
4349 rc = VERR_GMM_IS_NOT_SANE;
4350 gmmR0MutexRelease(pGMM);
4351
4352 LogFlow(("GMMR0MapUnmapChunk: returns %Rrc\n", rc));
4353 return rc;
4354}
4355
4356
4357/**
4358 * VMMR0 request wrapper for GMMR0MapUnmapChunk.
4359 *
4360 * @returns see GMMR0MapUnmapChunk.
4361 * @param pGVM The global (ring-0) VM structure.
4362 * @param pReq Pointer to the request packet.
4363 */
4364GMMR0DECL(int) GMMR0MapUnmapChunkReq(PGVM pGVM, PGMMMAPUNMAPCHUNKREQ pReq)
4365{
4366 /*
4367 * Validate input and pass it on.
4368 */
4369 AssertPtrReturn(pReq, VERR_INVALID_POINTER);
4370 AssertMsgReturn(pReq->Hdr.cbReq == sizeof(*pReq), ("%#x != %#x\n", pReq->Hdr.cbReq, sizeof(*pReq)), VERR_INVALID_PARAMETER);
4371
4372 return GMMR0MapUnmapChunk(pGVM, pReq->idChunkMap, pReq->idChunkUnmap, &pReq->pvR3);
4373}
4374
4375
4376#ifndef VBOX_WITH_LINEAR_HOST_PHYS_MEM
4377/**
4378 * Gets the ring-0 virtual address for the given page.
4379 *
4380 * This is used by PGM when IEM and such wants to access guest RAM from ring-0.
4381 * One of the ASSUMPTIONS here is that the @a idPage is used by the VM and the
4382 * corresponding chunk will remain valid beyond the call (at least till the EMT
4383 * returns to ring-3).
4384 *
4385 * @returns VBox status code.
4386 * @param pGVM Pointer to the kernel-only VM instance data.
4387 * @param idPage The page ID.
4388 * @param ppv Where to store the address.
4389 * @thread EMT
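 *
 * The returned address is simply the ring-0 chunk mapping plus the page
 * offset, i.e. a sketch of what the lookup below computes:
 * @code
 *      pv = &pChunk->pbMapping[(idPage & GMM_PAGEID_IDX_MASK) << PAGE_SHIFT];
 * @endcode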
4390 */
4391GMMR0DECL(int) GMMR0PageIdToVirt(PGVM pGVM, uint32_t idPage, void **ppv)
4392{
4393 *ppv = NULL;
4394 PGMM pGMM;
4395 GMM_GET_VALID_INSTANCE(pGMM, VERR_GMM_INSTANCE);
4396
4397 uint32_t const idChunk = idPage >> GMM_CHUNKID_SHIFT;
4398
4399 /*
4400 * Start with the per-VM TLB.
4401 */
4402 RTSpinlockAcquire(pGVM->gmm.s.hChunkTlbSpinLock);
4403
4404 PGMMPERVMCHUNKTLBE pTlbe = &pGVM->gmm.s.aChunkTlbEntries[GMMPERVM_CHUNKTLB_IDX(idChunk)];
4405 PGMMCHUNK pChunk = pTlbe->pChunk;
4406 if ( pChunk != NULL
4407 && pTlbe->idGeneration == ASMAtomicUoReadU64(&pGMM->idFreeGeneration)
4408 && pChunk->Core.Key == idChunk)
4409 pGVM->R0Stats.gmm.cChunkTlbHits++; /* hopefully this is a likely outcome */
4410 else
4411 {
4412 pGVM->R0Stats.gmm.cChunkTlbMisses++;
4413
4414 /*
4415 * Look it up in the chunk tree.
4416 */
4417 RTSpinlockAcquire(pGMM->hSpinLockTree);
4418 pChunk = gmmR0GetChunkLocked(pGMM, idChunk);
4419 if (RT_LIKELY(pChunk))
4420 {
4421 pTlbe->idGeneration = pGMM->idFreeGeneration;
4422 RTSpinlockRelease(pGMM->hSpinLockTree);
4423 pTlbe->pChunk = pChunk;
4424 }
4425 else
4426 {
4427 RTSpinlockRelease(pGMM->hSpinLockTree);
4428 RTSpinlockRelease(pGVM->gmm.s.hChunkTlbSpinLock);
4429 AssertMsgFailed(("idPage=%#x\n", idPage));
4430 return VERR_GMM_PAGE_NOT_FOUND;
4431 }
4432 }
4433
4434 RTSpinlockRelease(pGVM->gmm.s.hChunkTlbSpinLock);
4435
4436 /*
4437 * Got a chunk, now validate the page ownership and calculate its address.
4438 */
4439 const GMMPAGE * const pPage = &pChunk->aPages[idPage & GMM_PAGEID_IDX_MASK];
4440 if (RT_LIKELY( ( GMM_PAGE_IS_PRIVATE(pPage)
4441 && pPage->Private.hGVM == pGVM->hSelf)
4442 || GMM_PAGE_IS_SHARED(pPage)))
4443 {
4444 AssertPtr(pChunk->pbMapping);
4445 *ppv = &pChunk->pbMapping[(idPage & GMM_PAGEID_IDX_MASK) << PAGE_SHIFT];
4446 return VINF_SUCCESS;
4447 }
4448 AssertMsgFailed(("idPage=%#x is-private=%RTbool Private.hGVM=%u pGVM->hGVM=%u\n",
4449 idPage, GMM_PAGE_IS_PRIVATE(pPage), pPage->Private.hGVM, pGVM->hSelf));
4450 return VERR_GMM_NOT_PAGE_OWNER;
4451}
4452#endif /* !VBOX_WITH_LINEAR_HOST_PHYS_MEM */
4453
4454#ifdef VBOX_WITH_PAGE_SHARING
4455
4456# ifdef VBOX_STRICT
4457/**
4458 * For checksumming shared pages in strict builds.
4459 *
4460 * The purpose is making sure that a page doesn't change.
4461 *
4462 * @returns Checksum, 0 on failure.
4463 * @param pGMM The GMM instance data.
4464 * @param pGVM Pointer to the kernel-only VM instance data.
4465 * @param idPage The page ID.
4466 */
4467static uint32_t gmmR0StrictPageChecksum(PGMM pGMM, PGVM pGVM, uint32_t idPage)
4468{
4469 PGMMCHUNK pChunk = gmmR0GetChunk(pGMM, idPage >> GMM_CHUNKID_SHIFT);
4470 AssertMsgReturn(pChunk, ("idPage=%#x\n", idPage), 0);
4471
4472 uint8_t *pbChunk;
4473 if (!gmmR0IsChunkMapped(pGMM, pGVM, pChunk, (PRTR3PTR)&pbChunk))
4474 return 0;
4475 uint8_t const *pbPage = pbChunk + ((idPage & GMM_PAGEID_IDX_MASK) << PAGE_SHIFT);
4476
4477 return RTCrc32(pbPage, PAGE_SIZE);
4478}
4479# endif /* VBOX_STRICT */
4480
4481
4482/**
4483 * Calculates the module hash value.
4484 *
4485 * @returns Hash value.
4486 * @param pszModuleName The module name.
4487 * @param pszVersion The module version string.
4488 */
4489static uint32_t gmmR0ShModCalcHash(const char *pszModuleName, const char *pszVersion)
4490{
4491 return RTStrHash1ExN(3, pszModuleName, RTSTR_MAX, "::", (size_t)2, pszVersion, RTSTR_MAX);
4492}
4493
4494
4495/**
4496 * Finds a global module.
4497 *
4498 * @returns Pointer to the global module on success, NULL if not found.
4499 * @param pGMM The GMM instance data.
4500 * @param uHash The hash as calculated by gmmR0ShModCalcHash.
4501 * @param cbModule The module size.
4502 * @param enmGuestOS The guest OS type.
4503 * @param cRegions The number of regions.
4504 * @param pszModuleName The module name.
4505 * @param pszVersion The module version.
4506 * @param paRegions The region descriptions.
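 *
 * Region comparison is done on page aligned values; e.g. for a hypothetical
 * region at GCRegionAddr 0x7fff1234 spanning 0x2000 bytes:
 * @code
 *      off = 0x7fff1234 & PAGE_OFFSET_MASK;        // 0x234
 *      cb  = RT_ALIGN_32(0x2000 + off, PAGE_SIZE); // 0x3000
 * @endcode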
4507 */
4508static PGMMSHAREDMODULE gmmR0ShModFindGlobal(PGMM pGMM, uint32_t uHash, uint32_t cbModule, VBOXOSFAMILY enmGuestOS,
4509 uint32_t cRegions, const char *pszModuleName, const char *pszVersion,
4510 struct VMMDEVSHAREDREGIONDESC const *paRegions)
4511{
4512 for (PGMMSHAREDMODULE pGblMod = (PGMMSHAREDMODULE)RTAvllU32Get(&pGMM->pGlobalSharedModuleTree, uHash);
4513 pGblMod;
4514 pGblMod = (PGMMSHAREDMODULE)pGblMod->Core.pList)
4515 {
4516 if (pGblMod->cbModule != cbModule)
4517 continue;
4518 if (pGblMod->enmGuestOS != enmGuestOS)
4519 continue;
4520 if (pGblMod->cRegions != cRegions)
4521 continue;
4522 if (strcmp(pGblMod->szName, pszModuleName))
4523 continue;
4524 if (strcmp(pGblMod->szVersion, pszVersion))
4525 continue;
4526
4527 uint32_t i;
4528 for (i = 0; i < cRegions; i++)
4529 {
4530 uint32_t off = paRegions[i].GCRegionAddr & PAGE_OFFSET_MASK;
4531 if (pGblMod->aRegions[i].off != off)
4532 break;
4533
4534 uint32_t cb = RT_ALIGN_32(paRegions[i].cbRegion + off, PAGE_SIZE);
4535 if (pGblMod->aRegions[i].cb != cb)
4536 break;
4537 }
4538
4539 if (i == cRegions)
4540 return pGblMod;
4541 }
4542
4543 return NULL;
4544}
4545
4546
4547/**
4548 * Creates a new global module.
4549 *
4550 * @returns VBox status code.
4551 * @param pGMM The GMM instance data.
4552 * @param uHash The hash as calculated by gmmR0ShModCalcHash.
4553 * @param cbModule The module size.
4554 * @param enmGuestOS The guest OS type.
4555 * @param cRegions The number of regions.
4556 * @param pszModuleName The module name.
4557 * @param pszVersion The module version.
4558 * @param paRegions The region descriptions.
4559 * @param ppGblMod Where to return the new module on success.
4560 */
4561static int gmmR0ShModNewGlobal(PGMM pGMM, uint32_t uHash, uint32_t cbModule, VBOXOSFAMILY enmGuestOS,
4562 uint32_t cRegions, const char *pszModuleName, const char *pszVersion,
4563 struct VMMDEVSHAREDREGIONDESC const *paRegions, PGMMSHAREDMODULE *ppGblMod)
4564{
4565 Log(("gmmR0ShModNewGlobal: %s %s size %#x os %u rgn %u\n", pszModuleName, pszVersion, cbModule, enmGuestOS, cRegions));
4566 if (pGMM->cShareableModules >= GMM_MAX_SHARED_GLOBAL_MODULES)
4567 {
4568 Log(("gmmR0ShModNewGlobal: Too many modules\n"));
4569 return VERR_GMM_TOO_MANY_GLOBAL_MODULES;
4570 }
4571
4572 PGMMSHAREDMODULE pGblMod = (PGMMSHAREDMODULE)RTMemAllocZ(RT_UOFFSETOF_DYN(GMMSHAREDMODULE, aRegions[cRegions]));
4573 if (!pGblMod)
4574 {
4575 Log(("gmmR0ShModNewGlobal: No memory\n"));
4576 return VERR_NO_MEMORY;
4577 }
4578
4579 pGblMod->Core.Key = uHash;
4580 pGblMod->cbModule = cbModule;
4581 pGblMod->cRegions = cRegions;
4582 pGblMod->cUsers = 1;
4583 pGblMod->enmGuestOS = enmGuestOS;
4584 strcpy(pGblMod->szName, pszModuleName);
4585 strcpy(pGblMod->szVersion, pszVersion);
4586
4587 for (uint32_t i = 0; i < cRegions; i++)
4588 {
4589 Log(("gmmR0ShModNewGlobal: rgn[%u]=%RGvLB%#x\n", i, paRegions[i].GCRegionAddr, paRegions[i].cbRegion));
4590 pGblMod->aRegions[i].off = paRegions[i].GCRegionAddr & PAGE_OFFSET_MASK;
4591 pGblMod->aRegions[i].cb = paRegions[i].cbRegion + pGblMod->aRegions[i].off;
4592 pGblMod->aRegions[i].cb = RT_ALIGN_32(pGblMod->aRegions[i].cb, PAGE_SIZE);
4593 pGblMod->aRegions[i].paidPages = NULL; /* allocated when needed. */
4594 }
4595
4596 bool fInsert = RTAvllU32Insert(&pGMM->pGlobalSharedModuleTree, &pGblMod->Core);
4597 Assert(fInsert); NOREF(fInsert);
4598 pGMM->cShareableModules++;
4599
4600 *ppGblMod = pGblMod;
4601 return VINF_SUCCESS;
4602}
4603
4604
4605/**
4606 * Deletes a global module which is no longer referenced by anyone.
4607 *
4608 * @param pGMM The GMM instance data.
4609 * @param pGblMod The module to delete.
4610 */
4611static void gmmR0ShModDeleteGlobal(PGMM pGMM, PGMMSHAREDMODULE pGblMod)
4612{
4613 Assert(pGblMod->cUsers == 0);
4614 Assert(pGMM->cShareableModules > 0 && pGMM->cShareableModules <= GMM_MAX_SHARED_GLOBAL_MODULES);
4615
4616 void *pvTest = RTAvllU32RemoveNode(&pGMM->pGlobalSharedModuleTree, &pGblMod->Core);
4617 Assert(pvTest == pGblMod); NOREF(pvTest);
4618 pGMM->cShareableModules--;
4619
4620 uint32_t i = pGblMod->cRegions;
4621 while (i-- > 0)
4622 {
4623 if (pGblMod->aRegions[i].paidPages)
4624 {
4625 /* We don't do anything to the pages as they are handled by the
4626 copy-on-write mechanism in PGM. */
4627 RTMemFree(pGblMod->aRegions[i].paidPages);
4628 pGblMod->aRegions[i].paidPages = NULL;
4629 }
4630 }
4631 RTMemFree(pGblMod);
4632}
4633
4634
4635static int gmmR0ShModNewPerVM(PGVM pGVM, RTGCPTR GCBaseAddr, uint32_t cRegions, const VMMDEVSHAREDREGIONDESC *paRegions,
4636 PGMMSHAREDMODULEPERVM *ppRecVM)
4637{
4638 if (pGVM->gmm.s.Stats.cShareableModules >= GMM_MAX_SHARED_PER_VM_MODULES)
4639 return VERR_GMM_TOO_MANY_PER_VM_MODULES;
4640
4641 PGMMSHAREDMODULEPERVM pRecVM;
4642 pRecVM = (PGMMSHAREDMODULEPERVM)RTMemAllocZ(RT_UOFFSETOF_DYN(GMMSHAREDMODULEPERVM, aRegionsGCPtrs[cRegions]));
4643 if (!pRecVM)
4644 return VERR_NO_MEMORY;
4645
4646 pRecVM->Core.Key = GCBaseAddr;
4647 for (uint32_t i = 0; i < cRegions; i++)
4648 pRecVM->aRegionsGCPtrs[i] = paRegions[i].GCRegionAddr;
4649
4650 bool fInsert = RTAvlGCPtrInsert(&pGVM->gmm.s.pSharedModuleTree, &pRecVM->Core);
4651 Assert(fInsert); NOREF(fInsert);
4652 pGVM->gmm.s.Stats.cShareableModules++;
4653
4654 *ppRecVM = pRecVM;
4655 return VINF_SUCCESS;
4656}
4657
4658
4659static void gmmR0ShModDeletePerVM(PGMM pGMM, PGVM pGVM, PGMMSHAREDMODULEPERVM pRecVM, bool fRemove)
4660{
4661 /*
4662 * Free the per-VM module.
4663 */
4664 PGMMSHAREDMODULE pGblMod = pRecVM->pGlobalModule;
4665 pRecVM->pGlobalModule = NULL;
4666
4667 if (fRemove)
4668 {
4669 void *pvTest = RTAvlGCPtrRemove(&pGVM->gmm.s.pSharedModuleTree, pRecVM->Core.Key);
4670 Assert(pvTest == &pRecVM->Core); NOREF(pvTest);
4671 }
4672
4673 RTMemFree(pRecVM);
4674
4675 /*
4676 * Release the global module.
4677 * (In the registration bailout case, it might not be.)
4678 */
4679 if (pGblMod)
4680 {
4681 Assert(pGblMod->cUsers > 0);
4682 pGblMod->cUsers--;
4683 if (pGblMod->cUsers == 0)
4684 gmmR0ShModDeleteGlobal(pGMM, pGblMod);
4685 }
4686}
4687
4688#endif /* VBOX_WITH_PAGE_SHARING */
4689
4690/**
4691 * Registers a new shared module for the VM.
4692 *
4693 * @returns VBox status code.
4694 * @param pGVM The global (ring-0) VM structure.
4695 * @param idCpu The VCPU id.
4696 * @param enmGuestOS The guest OS type.
4697 * @param pszModuleName The module name.
4698 * @param pszVersion The module version.
4699 * @param GCPtrModBase The module base address.
4700 * @param cbModule The module size.
4701 * @param cRegions The number of shared region descriptors.
4702 * @param paRegions Pointer to an array of shared region(s).
4703 * @thread EMT(idCpu)
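 *
 * A minimal single-region registration sketch (all values, including the OS
 * family, are illustrative):
 * @code
 *      char szName[]    = "mymodule.dll";
 *      char szVersion[] = "1.0.0.0";
 *      VMMDEVSHAREDREGIONDESC aRegions[1];
 *      aRegions[0].GCRegionAddr = GCPtrModBase;
 *      aRegions[0].cbRegion     = cbModule;
 *      int rc = GMMR0RegisterSharedModule(pGVM, idCpu, VBOXOSFAMILY_Windows64, szName, szVersion,
 *                                         GCPtrModBase, cbModule, RT_ELEMENTS(aRegions), &aRegions[0]);
 * @endcode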
4704 */
4705GMMR0DECL(int) GMMR0RegisterSharedModule(PGVM pGVM, VMCPUID idCpu, VBOXOSFAMILY enmGuestOS, char *pszModuleName,
4706 char *pszVersion, RTGCPTR GCPtrModBase, uint32_t cbModule,
4707 uint32_t cRegions, struct VMMDEVSHAREDREGIONDESC const *paRegions)
4708{
4709#ifdef VBOX_WITH_PAGE_SHARING
4710 /*
4711 * Validate input and get the basics.
4712 *
4713 * Note! Turns out the module size doesn't necessarily match the size of the
4714 * regions. (iTunes on XP)
4715 */
4716 PGMM pGMM;
4717 GMM_GET_VALID_INSTANCE(pGMM, VERR_GMM_INSTANCE);
4718 int rc = GVMMR0ValidateGVMandEMT(pGVM, idCpu);
4719 if (RT_FAILURE(rc))
4720 return rc;
4721
4722 if (RT_UNLIKELY(cRegions > VMMDEVSHAREDREGIONDESC_MAX))
4723 return VERR_GMM_TOO_MANY_REGIONS;
4724
4725 if (RT_UNLIKELY(cbModule == 0 || cbModule > _1G))
4726 return VERR_GMM_BAD_SHARED_MODULE_SIZE;
4727
4728 uint32_t cbTotal = 0;
4729 for (uint32_t i = 0; i < cRegions; i++)
4730 {
4731 if (RT_UNLIKELY(paRegions[i].cbRegion == 0 || paRegions[i].cbRegion > _1G))
4732 return VERR_GMM_SHARED_MODULE_BAD_REGIONS_SIZE;
4733
4734 cbTotal += paRegions[i].cbRegion;
4735 if (RT_UNLIKELY(cbTotal > _1G))
4736 return VERR_GMM_SHARED_MODULE_BAD_REGIONS_SIZE;
4737 }
4738
4739 AssertPtrReturn(pszModuleName, VERR_INVALID_POINTER);
4740 if (RT_UNLIKELY(!memchr(pszModuleName, '\0', GMM_SHARED_MODULE_MAX_NAME_STRING)))
4741 return VERR_GMM_MODULE_NAME_TOO_LONG;
4742
4743 AssertPtrReturn(pszVersion, VERR_INVALID_POINTER);
4744 if (RT_UNLIKELY(!memchr(pszVersion, '\0', GMM_SHARED_MODULE_MAX_VERSION_STRING)))
4745 return VERR_GMM_MODULE_NAME_TOO_LONG;
4746
4747 uint32_t const uHash = gmmR0ShModCalcHash(pszModuleName, pszVersion);
4748 Log(("GMMR0RegisterSharedModule %s %s base %RGv size %x hash %x\n", pszModuleName, pszVersion, GCPtrModBase, cbModule, uHash));
4749
4750 /*
4751 * Take the semaphore and do some more validations.
4752 */
4753 gmmR0MutexAcquire(pGMM);
4754 if (GMM_CHECK_SANITY_UPON_ENTERING(pGMM))
4755 {
4756 /*
4757 * Check if this module is already locally registered and register
4758 * it if it isn't. The base address is a unique module identifier
4759 * locally.
4760 */
4761 PGMMSHAREDMODULEPERVM pRecVM = (PGMMSHAREDMODULEPERVM)RTAvlGCPtrGet(&pGVM->gmm.s.pSharedModuleTree, GCPtrModBase);
4762 bool fNewModule = pRecVM == NULL;
4763 if (fNewModule)
4764 {
4765 rc = gmmR0ShModNewPerVM(pGVM, GCPtrModBase, cRegions, paRegions, &pRecVM);
4766 if (RT_SUCCESS(rc))
4767 {
4768 /*
4769 * Find a matching global module, register a new one if needed.
4770 */
4771 PGMMSHAREDMODULE pGblMod = gmmR0ShModFindGlobal(pGMM, uHash, cbModule, enmGuestOS, cRegions,
4772 pszModuleName, pszVersion, paRegions);
4773 if (!pGblMod)
4774 {
4775 Assert(fNewModule);
4776 rc = gmmR0ShModNewGlobal(pGMM, uHash, cbModule, enmGuestOS, cRegions,
4777 pszModuleName, pszVersion, paRegions, &pGblMod);
4778 if (RT_SUCCESS(rc))
4779 {
4780 pRecVM->pGlobalModule = pGblMod; /* (One reference returned by gmmR0ShModNewGlobal.) */
4781 Log(("GMMR0RegisterSharedModule: new module %s %s\n", pszModuleName, pszVersion));
4782 }
4783 else
4784 gmmR0ShModDeletePerVM(pGMM, pGVM, pRecVM, true /*fRemove*/);
4785 }
4786 else
4787 {
4788 Assert(pGblMod->cUsers > 0 && pGblMod->cUsers < UINT32_MAX / 2);
4789 pGblMod->cUsers++;
4790 pRecVM->pGlobalModule = pGblMod;
4791
4792 Log(("GMMR0RegisterSharedModule: new per vm module %s %s, gbl users %d\n", pszModuleName, pszVersion, pGblMod->cUsers));
4793 }
4794 }
4795 }
4796 else
4797 {
4798 /*
4799 * Attempt to re-register an existing module.
4800 */
4801 PGMMSHAREDMODULE pGblMod = gmmR0ShModFindGlobal(pGMM, uHash, cbModule, enmGuestOS, cRegions,
4802 pszModuleName, pszVersion, paRegions);
4803 if (pRecVM->pGlobalModule == pGblMod)
4804 {
4805 Log(("GMMR0RegisterSharedModule: already registered %s %s, gbl users %d\n", pszModuleName, pszVersion, pGblMod->cUsers));
4806 rc = VINF_GMM_SHARED_MODULE_ALREADY_REGISTERED;
4807 }
4808 else
4809 {
4810 /** @todo may have to unregister+register when this happens in case it's caused
4811 * by VBoxService crashing and being restarted... */
4812 Log(("GMMR0RegisterSharedModule: Address clash!\n"
4813 " incoming at %RGvLB%#x %s %s rgns %u\n"
4814 " existing at %RGvLB%#x %s %s rgns %u\n",
4815 GCPtrModBase, cbModule, pszModuleName, pszVersion, cRegions,
4816 pRecVM->Core.Key, pRecVM->pGlobalModule->cbModule, pRecVM->pGlobalModule->szName,
4817 pRecVM->pGlobalModule->szVersion, pRecVM->pGlobalModule->cRegions));
4818 rc = VERR_GMM_SHARED_MODULE_ADDRESS_CLASH;
4819 }
4820 }
4821 GMM_CHECK_SANITY_UPON_LEAVING(pGMM);
4822 }
4823 else
4824 rc = VERR_GMM_IS_NOT_SANE;
4825
4826 gmmR0MutexRelease(pGMM);
4827 return rc;
4828#else
4829
4830 NOREF(pGVM); NOREF(idCpu); NOREF(enmGuestOS); NOREF(pszModuleName); NOREF(pszVersion);
4831 NOREF(GCPtrModBase); NOREF(cbModule); NOREF(cRegions); NOREF(paRegions);
4832 return VERR_NOT_IMPLEMENTED;
4833#endif
4834}
4835
4836
4837/**
4838 * VMMR0 request wrapper for GMMR0RegisterSharedModule.
4839 *
4840 * @returns see GMMR0RegisterSharedModule.
4841 * @param pGVM The global (ring-0) VM structure.
4842 * @param idCpu The VCPU id.
4843 * @param pReq Pointer to the request packet.
4844 */
4845GMMR0DECL(int) GMMR0RegisterSharedModuleReq(PGVM pGVM, VMCPUID idCpu, PGMMREGISTERSHAREDMODULEREQ pReq)
4846{
4847 /*
4848 * Validate input and pass it on.
4849 */
4850 AssertPtrReturn(pReq, VERR_INVALID_POINTER);
4851 AssertMsgReturn( pReq->Hdr.cbReq >= sizeof(*pReq)
4852 && pReq->Hdr.cbReq == RT_UOFFSETOF_DYN(GMMREGISTERSHAREDMODULEREQ, aRegions[pReq->cRegions]),
4853 ("%#x != %#x\n", pReq->Hdr.cbReq, sizeof(*pReq)), VERR_INVALID_PARAMETER);
4854
4855 /* Pass back return code in the request packet to preserve informational codes. (VMMR3CallR0 chokes on them) */
4856 pReq->rc = GMMR0RegisterSharedModule(pGVM, idCpu, pReq->enmGuestOS, pReq->szName, pReq->szVersion,
4857 pReq->GCBaseAddr, pReq->cbModule, pReq->cRegions, pReq->aRegions);
4858 return VINF_SUCCESS;
4859}
4860
4861
4862/**
4863 * Unregisters a shared module for the VM
4864 *
4865 * @returns VBox status code.
4866 * @param pGVM The global (ring-0) VM structure.
4867 * @param idCpu The VCPU id.
4868 * @param pszModuleName The module name.
4869 * @param pszVersion The module version.
4870 * @param GCPtrModBase The module base address.
4871 * @param cbModule The module size.
4872 */
4873GMMR0DECL(int) GMMR0UnregisterSharedModule(PGVM pGVM, VMCPUID idCpu, char *pszModuleName, char *pszVersion,
4874 RTGCPTR GCPtrModBase, uint32_t cbModule)
4875{
4876#ifdef VBOX_WITH_PAGE_SHARING
4877 /*
4878 * Validate input and get the basics.
4879 */
4880 PGMM pGMM;
4881 GMM_GET_VALID_INSTANCE(pGMM, VERR_GMM_INSTANCE);
4882 int rc = GVMMR0ValidateGVMandEMT(pGVM, idCpu);
4883 if (RT_FAILURE(rc))
4884 return rc;
4885
4886 AssertPtrReturn(pszModuleName, VERR_INVALID_POINTER);
4887 AssertPtrReturn(pszVersion, VERR_INVALID_POINTER);
4888 if (RT_UNLIKELY(!memchr(pszModuleName, '\0', GMM_SHARED_MODULE_MAX_NAME_STRING)))
4889 return VERR_GMM_MODULE_NAME_TOO_LONG;
4890 if (RT_UNLIKELY(!memchr(pszVersion, '\0', GMM_SHARED_MODULE_MAX_VERSION_STRING)))
4891 return VERR_GMM_MODULE_NAME_TOO_LONG;
4892
4893 Log(("GMMR0UnregisterSharedModule %s %s base=%RGv size %x\n", pszModuleName, pszVersion, GCPtrModBase, cbModule));
4894
4895 /*
4896 * Take the semaphore and do some more validations.
4897 */
4898 gmmR0MutexAcquire(pGMM);
4899 if (GMM_CHECK_SANITY_UPON_ENTERING(pGMM))
4900 {
4901 /*
4902 * Locate and remove the specified module.
4903 */
4904 PGMMSHAREDMODULEPERVM pRecVM = (PGMMSHAREDMODULEPERVM)RTAvlGCPtrGet(&pGVM->gmm.s.pSharedModuleTree, GCPtrModBase);
4905 if (pRecVM)
4906 {
4907 /** @todo Do we need to do more validations here, like that the
4908 * name + version + cbModule matches? */
4909 NOREF(cbModule);
4910 Assert(pRecVM->pGlobalModule);
4911 gmmR0ShModDeletePerVM(pGMM, pGVM, pRecVM, true /*fRemove*/);
4912 }
4913 else
4914 rc = VERR_GMM_SHARED_MODULE_NOT_FOUND;
4915
4916 GMM_CHECK_SANITY_UPON_LEAVING(pGMM);
4917 }
4918 else
4919 rc = VERR_GMM_IS_NOT_SANE;
4920
4921 gmmR0MutexRelease(pGMM);
4922 return rc;
4923#else
4924
4925 NOREF(pGVM); NOREF(idCpu); NOREF(pszModuleName); NOREF(pszVersion); NOREF(GCPtrModBase); NOREF(cbModule);
4926 return VERR_NOT_IMPLEMENTED;
4927#endif
4928}
4929
4930
4931/**
4932 * VMMR0 request wrapper for GMMR0UnregisterSharedModule.
4933 *
4934 * @returns see GMMR0UnregisterSharedModule.
4935 * @param pGVM The global (ring-0) VM structure.
4936 * @param idCpu The VCPU id.
4937 * @param pReq Pointer to the request packet.
4938 */
4939GMMR0DECL(int) GMMR0UnregisterSharedModuleReq(PGVM pGVM, VMCPUID idCpu, PGMMUNREGISTERSHAREDMODULEREQ pReq)
4940{
4941 /*
4942 * Validate input and pass it on.
4943 */
4944 AssertPtrReturn(pReq, VERR_INVALID_POINTER);
4945 AssertMsgReturn(pReq->Hdr.cbReq == sizeof(*pReq), ("%#x != %#x\n", pReq->Hdr.cbReq, sizeof(*pReq)), VERR_INVALID_PARAMETER);
4946
4947 return GMMR0UnregisterSharedModule(pGVM, idCpu, pReq->szName, pReq->szVersion, pReq->GCBaseAddr, pReq->cbModule);
4948}
4949
4950#ifdef VBOX_WITH_PAGE_SHARING
4951
4952/**
4953 * Increase the use count of a shared page, the page is known to exist and be valid and such.
4954 *
4955 * @param pGMM Pointer to the GMM instance.
4956 * @param pGVM Pointer to the GVM instance.
4957 * @param pPage The page structure.
4958 */
4959DECLINLINE(void) gmmR0UseSharedPage(PGMM pGMM, PGVM pGVM, PGMMPAGE pPage)
4960{
4961 Assert(pGMM->cSharedPages > 0);
4962 Assert(pGMM->cAllocatedPages > 0);
4963
4964 pGMM->cDuplicatePages++;
4965
4966 pPage->Shared.cRefs++;
4967 pGVM->gmm.s.Stats.cSharedPages++;
4968 pGVM->gmm.s.Stats.Allocated.cBasePages++;
4969}
4970
4971
4972/**
4973 * Converts a private page to a shared page, the page is known to exist and be valid and such.
4974 *
4975 * @param pGMM Pointer to the GMM instance.
4976 * @param pGVM Pointer to the GVM instance.
4977 * @param HCPhys Host physical address
4978 * @param idPage The Page ID
4979 * @param pPage The page structure.
4980 * @param pPageDesc Shared page descriptor
4981 */
4982DECLINLINE(void) gmmR0ConvertToSharedPage(PGMM pGMM, PGVM pGVM, RTHCPHYS HCPhys, uint32_t idPage, PGMMPAGE pPage,
4983 PGMMSHAREDPAGEDESC pPageDesc)
4984{
4985 PGMMCHUNK pChunk = gmmR0GetChunk(pGMM, idPage >> GMM_CHUNKID_SHIFT);
4986 Assert(pChunk);
4987 Assert(pChunk->cFree < GMM_CHUNK_NUM_PAGES);
4988 Assert(GMM_PAGE_IS_PRIVATE(pPage));
4989
4990 pChunk->cPrivate--;
4991 pChunk->cShared++;
4992
4993 pGMM->cSharedPages++;
4994
4995 pGVM->gmm.s.Stats.cSharedPages++;
4996 pGVM->gmm.s.Stats.cPrivatePages--;
4997
4998 /* Modify the page structure. */
4999 pPage->Shared.pfn = (uint32_t)(uint64_t)(HCPhys >> PAGE_SHIFT);
5000 pPage->Shared.cRefs = 1;
5001#ifdef VBOX_STRICT
5002 pPageDesc->u32StrictChecksum = gmmR0StrictPageChecksum(pGMM, pGVM, idPage);
5003 pPage->Shared.u14Checksum = pPageDesc->u32StrictChecksum;
5004#else
5005 NOREF(pPageDesc);
5006 pPage->Shared.u14Checksum = 0;
5007#endif
5008 pPage->Shared.u2State = GMM_PAGE_STATE_SHARED;
5009}
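
/*
 * Illustrative sketch, not part of the original file: the page-ID decomposition
 * used by gmmR0ConvertToSharedPage and by the chunk-mapping code further down,
 * pulled out into two hypothetical helpers.  GMM_CHUNKID_SHIFT and
 * GMM_PAGEID_IDX_MASK are the constants the surrounding code already uses.
 */
DECLINLINE(uint32_t) gmmSketchPageIdToChunkId(uint32_t idPage)
{
    return idPage >> GMM_CHUNKID_SHIFT;     /* the chunk the page belongs to */
}

DECLINLINE(uint32_t) gmmSketchPageIdToPageIndex(uint32_t idPage)
{
    return idPage & GMM_PAGEID_IDX_MASK;    /* the page's index within that chunk */
}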
5010
5011
5012static int gmmR0SharedModuleCheckPageFirstTime(PGMM pGMM, PGVM pGVM, PGMMSHAREDMODULE pModule,
5013 unsigned idxRegion, unsigned idxPage,
5014 PGMMSHAREDPAGEDESC pPageDesc, PGMMSHAREDREGIONDESC pGlobalRegion)
5015{
5016 NOREF(pModule);
5017
5018 /* Easy case: just change the internal page type. */
5019 PGMMPAGE pPage = gmmR0GetPage(pGMM, pPageDesc->idPage);
5020 AssertMsgReturn(pPage, ("idPage=%#x (GCPhys=%RGp HCPhys=%RHp idxRegion=%#x idxPage=%#x) #1\n",
5021 pPageDesc->idPage, pPageDesc->GCPhys, pPageDesc->HCPhys, idxRegion, idxPage),
5022 VERR_PGM_PHYS_INVALID_PAGE_ID);
5023 NOREF(idxRegion);
5024
5025 AssertMsg(pPageDesc->GCPhys == (pPage->Private.pfn << 12), ("desc %RGp gmm %RGp\n", pPageDesc->GCPhys, (pPage->Private.pfn << 12)));
5026
5027 gmmR0ConvertToSharedPage(pGMM, pGVM, pPageDesc->HCPhys, pPageDesc->idPage, pPage, pPageDesc);
5028
5029 /* Keep track of these references. */
5030 pGlobalRegion->paidPages[idxPage] = pPageDesc->idPage;
5031
5032 return VINF_SUCCESS;
5033}
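
/*
 * Illustrative sketch, not part of the original file: the pfn <-> host physical
 * address conversions that gmmR0ConvertToSharedPage and the assertion above
 * rely on.  Helper names are hypothetical; the PAGE_SHIFT arithmetic matches
 * the surrounding code.
 */
DECLINLINE(uint32_t) gmmSketchHCPhysToPfn(RTHCPHYS HCPhys)
{
    return (uint32_t)(HCPhys >> PAGE_SHIFT);    /* frame number as stored in GMMPAGE */
}

DECLINLINE(RTHCPHYS) gmmSketchPfnToHCPhys(uint32_t pfn)
{
    return (RTHCPHYS)pfn << PAGE_SHIFT;         /* back to a host physical address */
}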
5034
5035/**
5036 * Checks the specified shared module range for changes.
5037 *
5038 * Performs the following tasks:
5039 * - If a shared page is new, then it changes the GMM page type to shared and
5040 * returns it in the pPageDesc descriptor.
5041 * - If a shared page already exists, then it checks if the VM page is
5042 * identical and if so frees the VM page and returns the shared page in
5043 * pPageDesc descriptor.
5044 *
5045 * @remarks ASSUMES the caller has acquired the GMM semaphore!!
5046 *
5047 * @returns VBox status code.
5048 * @param pGVM Pointer to the GVM instance data.
5049 * @param pModule Module description.
5050 * @param idxRegion Region index.
5051 * @param idxPage Page index.
5052 * @param pPageDesc Page descriptor.
5053 */
5054GMMR0DECL(int) GMMR0SharedModuleCheckPage(PGVM pGVM, PGMMSHAREDMODULE pModule, uint32_t idxRegion, uint32_t idxPage,
5055 PGMMSHAREDPAGEDESC pPageDesc)
5056{
5057 int rc;
5058 PGMM pGMM;
5059 GMM_GET_VALID_INSTANCE(pGMM, VERR_GMM_INSTANCE);
5060 pPageDesc->u32StrictChecksum = 0;
5061
5062 AssertMsgReturn(idxRegion < pModule->cRegions,
5063 ("idxRegion=%#x cRegions=%#x %s %s\n", idxRegion, pModule->cRegions, pModule->szName, pModule->szVersion),
5064 VERR_INVALID_PARAMETER);
5065
5066 uint32_t const cPages = pModule->aRegions[idxRegion].cb >> PAGE_SHIFT;
5067 AssertMsgReturn(idxPage < cPages,
5068 ("idxPage=%#x cPages=%#x %s %s\n", idxPage, cPages, pModule->szName, pModule->szVersion),
5069 VERR_INVALID_PARAMETER);
5070
5071 LogFlow(("GMMR0SharedModuleCheckPage %s base %RGv region %d idxPage %d\n", pModule->szName, pModule->Core.Key, idxRegion, idxPage));
5072
5073 /*
5074 * First time; create a page descriptor array.
5075 */
5076 PGMMSHAREDREGIONDESC pGlobalRegion = &pModule->aRegions[idxRegion];
5077 if (!pGlobalRegion->paidPages)
5078 {
5079 Log(("Allocate page descriptor array for %d pages\n", cPages));
5080 pGlobalRegion->paidPages = (uint32_t *)RTMemAlloc(cPages * sizeof(pGlobalRegion->paidPages[0]));
5081 AssertReturn(pGlobalRegion->paidPages, VERR_NO_MEMORY);
5082
5083 /* Invalidate all descriptors. */
5084 uint32_t i = cPages;
5085 while (i-- > 0)
5086 pGlobalRegion->paidPages[i] = NIL_GMM_PAGEID;
5087 }
5088
5089 /*
5090 * Is this the first time we've seen this shared page?
5091 */
5092 if (pGlobalRegion->paidPages[idxPage] == NIL_GMM_PAGEID)
5093 {
5094 Log(("New shared page guest %RGp host %RHp\n", pPageDesc->GCPhys, pPageDesc->HCPhys));
5095 return gmmR0SharedModuleCheckPageFirstTime(pGMM, pGVM, pModule, idxRegion, idxPage, pPageDesc, pGlobalRegion);
5096 }
5097
5098 /*
5099 * We've seen it before...
5100 */
5101 Log(("Replace existing page guest %RGp host %RHp id %#x -> id %#x\n",
5102 pPageDesc->GCPhys, pPageDesc->HCPhys, pPageDesc->idPage, pGlobalRegion->paidPages[idxPage]));
5103 Assert(pPageDesc->idPage != pGlobalRegion->paidPages[idxPage]);
5104
5105 /*
5106 * Get the shared page source.
5107 */
5108 PGMMPAGE pPage = gmmR0GetPage(pGMM, pGlobalRegion->paidPages[idxPage]);
5109 AssertMsgReturn(pPage, ("idPage=%#x (idxRegion=%#x idxPage=%#x) #2\n", pGlobalRegion->paidPages[idxPage], idxRegion, idxPage),
5110 VERR_PGM_PHYS_INVALID_PAGE_ID);
5111
5112 if (pPage->Common.u2State != GMM_PAGE_STATE_SHARED)
5113 {
5114 /*
5115 * Page was freed at some point; invalidate this entry.
5116 */
5117 /** @todo this isn't really bulletproof. */
5118 Log(("Old shared page was freed -> create a new one\n"));
5119 pGlobalRegion->paidPages[idxPage] = NIL_GMM_PAGEID;
5120 return gmmR0SharedModuleCheckPageFirstTime(pGMM, pGVM, pModule, idxRegion, idxPage, pPageDesc, pGlobalRegion);
5121 }
5122
5123 Log(("Replace existing page: host %RHp -> %RHp\n", pPageDesc->HCPhys, ((uint64_t)pPage->Shared.pfn) << PAGE_SHIFT));
5124
5125 /*
5126 * Calculate the virtual address of the local page.
5127 */
5128 PGMMCHUNK pChunk = gmmR0GetChunk(pGMM, pPageDesc->idPage >> GMM_CHUNKID_SHIFT);
5129 AssertMsgReturn(pChunk, ("idPage=%#x (idxRegion=%#x idxPage=%#x) #4\n", pPageDesc->idPage, idxRegion, idxPage),
5130 VERR_PGM_PHYS_INVALID_PAGE_ID);
5131
5132 uint8_t *pbChunk;
5133 AssertMsgReturn(gmmR0IsChunkMapped(pGMM, pGVM, pChunk, (PRTR3PTR)&pbChunk),
5134 ("idPage=%#x (idxRegion=%#x idxPage=%#x) #3\n", pPageDesc->idPage, idxRegion, idxPage),
5135 VERR_PGM_PHYS_INVALID_PAGE_ID);
5136 uint8_t *pbLocalPage = pbChunk + ((pPageDesc->idPage & GMM_PAGEID_IDX_MASK) << PAGE_SHIFT);
5137
5138 /*
5139 * Calculate the virtual address of the shared page.
5140 */
5141 pChunk = gmmR0GetChunk(pGMM, pGlobalRegion->paidPages[idxPage] >> GMM_CHUNKID_SHIFT);
5142 Assert(pChunk); /* can't fail as gmmR0GetPage succeeded. */
5143
5144 /*
5145 * Get the virtual address of the physical page; map the chunk into the VM
5146 * process if not already done.
5147 */
5148 if (!gmmR0IsChunkMapped(pGMM, pGVM, pChunk, (PRTR3PTR)&pbChunk))
5149 {
5150 Log(("Map chunk into process!\n"));
5151 rc = gmmR0MapChunk(pGMM, pGVM, pChunk, false /*fRelaxedSem*/, (PRTR3PTR)&pbChunk);
5152 AssertRCReturn(rc, rc);
5153 }
5154 uint8_t *pbSharedPage = pbChunk + ((pGlobalRegion->paidPages[idxPage] & GMM_PAGEID_IDX_MASK) << PAGE_SHIFT);
5155
5156#ifdef VBOX_STRICT
5157 pPageDesc->u32StrictChecksum = RTCrc32(pbSharedPage, PAGE_SIZE);
5158 uint32_t uChecksum = pPageDesc->u32StrictChecksum & UINT32_C(0x00003fff);
5159 AssertMsg(!uChecksum || uChecksum == pPage->Shared.u14Checksum || !pPage->Shared.u14Checksum,
5160 ("%#x vs %#x - idPage=%#x - %s %s\n", uChecksum, pPage->Shared.u14Checksum,
5161 pGlobalRegion->paidPages[idxPage], pModule->szName, pModule->szVersion));
5162#endif
5163
5164 /** @todo write ASMMemComparePage. */
5165 if (memcmp(pbSharedPage, pbLocalPage, PAGE_SIZE))
5166 {
5167 Log(("Unexpected differences found between local and shared page; skip\n"));
5168 /* Signal to the caller that this one hasn't changed. */
5169 pPageDesc->idPage = NIL_GMM_PAGEID;
5170 return VINF_SUCCESS;
5171 }
5172
5173 /*
5174 * Free the old local page.
5175 */
5176 GMMFREEPAGEDESC PageDesc;
5177 PageDesc.idPage = pPageDesc->idPage;
5178 rc = gmmR0FreePages(pGMM, pGVM, 1, &PageDesc, GMMACCOUNT_BASE);
5179 AssertRCReturn(rc, rc);
5180
5181 gmmR0UseSharedPage(pGMM, pGVM, pPage);
5182
5183 /*
5184 * Pass along the new physical address & page id.
5185 */
5186 pPageDesc->HCPhys = ((uint64_t)pPage->Shared.pfn) << PAGE_SHIFT;
5187 pPageDesc->idPage = pGlobalRegion->paidPages[idxPage];
5188
5189 return VINF_SUCCESS;
5190}
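
/*
 * Illustrative sketch, not part of the original file: the shape of a per-region
 * caller of GMMR0SharedModuleCheckPage above.  How the GCPhys/HCPhys/idPage of
 * each guest page is obtained is deliberately left to a hypothetical pfnFetchDesc
 * callback (in VirtualBox that information comes from PGM), and per the @remarks
 * above the GMM mutex must already be held by the caller.
 */
static int gmmSketchCheckModuleRegion(PGVM pGVM, PGMMSHAREDMODULE pModule, uint32_t idxRegion,
                                      int (*pfnFetchDesc)(void *pvUser, uint32_t idxPage, PGMMSHAREDPAGEDESC pPageDesc),
                                      void *pvUser)
{
    uint32_t const cPages = pModule->aRegions[idxRegion].cb >> PAGE_SHIFT;
    for (uint32_t idxPage = 0; idxPage < cPages; idxPage++)
    {
        GMMSHAREDPAGEDESC PageDesc;
        int rc = pfnFetchDesc(pvUser, idxPage, &PageDesc);  /* fills GCPhys, HCPhys and idPage */
        if (RT_FAILURE(rc))
            return rc;

        rc = GMMR0SharedModuleCheckPage(pGVM, pModule, idxRegion, idxPage, &PageDesc);
        if (RT_FAILURE(rc))
            return rc;

        if (PageDesc.idPage == NIL_GMM_PAGEID)
            continue;   /* The page differs from the shared copy and stays private. */

        /* Otherwise PageDesc.idPage/HCPhys now describe the shared page and the
           caller would remap its guest page to it. */
    }
    return VINF_SUCCESS;
}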
5191
5192
5193/**
5194 * RTAvlGCPtrDestroy callback.
5195 *
5196 * @returns VINF_SUCCESS.
5197 * @param pNode The node to destroy.
5198 * @param pvArgs Pointer to an argument packet.
5199 */
5200static DECLCALLBACK(int) gmmR0CleanupSharedModule(PAVLGCPTRNODECORE pNode, void *pvArgs)
5201{
5202 gmmR0ShModDeletePerVM(((GMMR0SHMODPERVMDTORARGS *)pvArgs)->pGMM,
5203 ((GMMR0SHMODPERVMDTORARGS *)pvArgs)->pGVM,
5204 (PGMMSHAREDMODULEPERVM)pNode,
5205 false /*fRemove*/);
5206 return VINF_SUCCESS;
5207}
5208
5209
5210/**
5211 * Used by GMMR0CleanupVM to clean up shared modules.
5212 *
5213 * This is called by GMMR0CleanupVM without the GMM lock held, so that the
5214 * lock can be acquired and yielded as needed here.
5215 *
5216 * @param pGMM The GMM handle.
5217 * @param pGVM The global VM handle.
5218 */
5219static void gmmR0SharedModuleCleanup(PGMM pGMM, PGVM pGVM)
5220{
5221 gmmR0MutexAcquire(pGMM);
5222 GMM_CHECK_SANITY_UPON_ENTERING(pGMM);
5223
5224 GMMR0SHMODPERVMDTORARGS Args;
5225 Args.pGVM = pGVM;
5226 Args.pGMM = pGMM;
5227 RTAvlGCPtrDestroy(&pGVM->gmm.s.pSharedModuleTree, gmmR0CleanupSharedModule, &Args);
5228
5229 AssertMsg(pGVM->gmm.s.Stats.cShareableModules == 0, ("%d\n", pGVM->gmm.s.Stats.cShareableModules));
5230 pGVM->gmm.s.Stats.cShareableModules = 0;
5231
5232 gmmR0MutexRelease(pGMM);
5233}
5234
5235#endif /* VBOX_WITH_PAGE_SHARING */
5236
5237/**
5238 * Removes all shared modules for the specified VM.
5239 *
5240 * @returns VBox status code.
5241 * @param pGVM The global (ring-0) VM structure.
5242 * @param idCpu The VCPU id.
5243 */
5244GMMR0DECL(int) GMMR0ResetSharedModules(PGVM pGVM, VMCPUID idCpu)
5245{
5246#ifdef VBOX_WITH_PAGE_SHARING
5247 /*
5248 * Validate input and get the basics.
5249 */
5250 PGMM pGMM;
5251 GMM_GET_VALID_INSTANCE(pGMM, VERR_GMM_INSTANCE);
5252 int rc = GVMMR0ValidateGVMandEMT(pGVM, idCpu);
5253 if (RT_FAILURE(rc))
5254 return rc;
5255
5256 /*
5257 * Take the semaphore and do some more validations.
5258 */
5259 gmmR0MutexAcquire(pGMM);
5260 if (GMM_CHECK_SANITY_UPON_ENTERING(pGMM))
5261 {
5262 Log(("GMMR0ResetSharedModules\n"));
5263 GMMR0SHMODPERVMDTORARGS Args;
5264 Args.pGVM = pGVM;
5265 Args.pGMM = pGMM;
5266 RTAvlGCPtrDestroy(&pGVM->gmm.s.pSharedModuleTree, gmmR0CleanupSharedModule, &Args);
5267 pGVM->gmm.s.Stats.cShareableModules = 0;
5268
5269 rc = VINF_SUCCESS;
5270 GMM_CHECK_SANITY_UPON_LEAVING(pGMM);
5271 }
5272 else
5273 rc = VERR_GMM_IS_NOT_SANE;
5274
5275 gmmR0MutexRelease(pGMM);
5276 return rc;
5277#else
5278 RT_NOREF(pGVM, idCpu);
5279 return VERR_NOT_IMPLEMENTED;
5280#endif
5281}
5282
5283#ifdef VBOX_WITH_PAGE_SHARING
5284
5285/**
5286 * Tree enumeration callback for checking a shared module.
5287 */
5288static DECLCALLBACK(int) gmmR0CheckSharedModule(PAVLGCPTRNODECORE pNode, void *pvUser)
5289{
5290 GMMCHECKSHAREDMODULEINFO *pArgs = (GMMCHECKSHAREDMODULEINFO*)pvUser;
5291 PGMMSHAREDMODULEPERVM pRecVM = (PGMMSHAREDMODULEPERVM)pNode;
5292 PGMMSHAREDMODULE pGblMod = pRecVM->pGlobalModule;
5293
5294 Log(("gmmR0CheckSharedModule: check %s %s base=%RGv size=%x\n",
5295 pGblMod->szName, pGblMod->szVersion, pGblMod->Core.Key, pGblMod->cbModule));
5296
5297 int rc = PGMR0SharedModuleCheck(pArgs->pGVM, pArgs->pGVM, pArgs->idCpu, pGblMod, pRecVM->aRegionsGCPtrs);
5298 if (RT_FAILURE(rc))
5299 return rc;
5300 return VINF_SUCCESS;
5301}
5302
5303#endif /* VBOX_WITH_PAGE_SHARING */
5304
5305/**
5306 * Checks all shared modules for the specified VM.
5307 *
5308 * @returns VBox status code.
5309 * @param pGVM The global (ring-0) VM structure.
5310 * @param idCpu The calling EMT number.
5311 * @thread EMT(idCpu)
5312 */
5313GMMR0DECL(int) GMMR0CheckSharedModules(PGVM pGVM, VMCPUID idCpu)
5314{
5315#ifdef VBOX_WITH_PAGE_SHARING
5316 /*
5317 * Validate input and get the basics.
5318 */
5319 PGMM pGMM;
5320 GMM_GET_VALID_INSTANCE(pGMM, VERR_GMM_INSTANCE);
5321 int rc = GVMMR0ValidateGVMandEMT(pGVM, idCpu);
5322 if (RT_FAILURE(rc))
5323 return rc;
5324
5325# ifndef DEBUG_sandervl
5326 /*
5327 * Take the semaphore and do some more validations.
5328 */
5329 gmmR0MutexAcquire(pGMM);
5330# endif
5331 if (GMM_CHECK_SANITY_UPON_ENTERING(pGMM))
5332 {
5333 /*
5334 * Walk the tree, checking each module.
5335 */
5336 Log(("GMMR0CheckSharedModules\n"));
5337
5338 GMMCHECKSHAREDMODULEINFO Args;
5339 Args.pGVM = pGVM;
5340 Args.idCpu = idCpu;
5341 rc = RTAvlGCPtrDoWithAll(&pGVM->gmm.s.pSharedModuleTree, true /* fFromLeft */, gmmR0CheckSharedModule, &Args);
5342
5343 Log(("GMMR0CheckSharedModules done (rc=%Rrc)!\n", rc));
5344 GMM_CHECK_SANITY_UPON_LEAVING(pGMM);
5345 }
5346 else
5347 rc = VERR_GMM_IS_NOT_SANE;
5348
5349# ifndef DEBUG_sandervl
5350 gmmR0MutexRelease(pGMM);
5351# endif
5352 return rc;
5353#else
5354 RT_NOREF(pGVM, idCpu);
5355 return VERR_NOT_IMPLEMENTED;
5356#endif
5357}
5358
5359#ifdef VBOX_STRICT
5360
5361/**
5362 * Worker for GMMR0FindDuplicatePageReq.
5363 *
5364 * @returns true if duplicate, false if not.
5365 */
5366static bool gmmR0FindDupPageInChunk(PGMM pGMM, PGVM pGVM, PGMMCHUNK pChunk, uint8_t const *pbSourcePage)
5367{
5368 bool fFoundDuplicate = false;
5369 /* Only take chunks that aren't already mapped into this VM process, i.e. (mostly) other VMs' chunks; not entirely correct. */
5370 uint8_t *pbChunk;
5371 if (!gmmR0IsChunkMapped(pGMM, pGVM, pChunk, (PRTR3PTR)&pbChunk))
5372 {
5373 int rc = gmmR0MapChunk(pGMM, pGVM, pChunk, false /*fRelaxedSem*/, (PRTR3PTR)&pbChunk);
5374 if (RT_SUCCESS(rc))
5375 {
5376 /*
5377 * Look for duplicate pages
5378 */
5379 uintptr_t iPage = (GMM_CHUNK_SIZE >> PAGE_SHIFT);
5380 while (iPage-- > 0)
5381 {
5382 if (GMM_PAGE_IS_PRIVATE(&pChunk->aPages[iPage]))
5383 {
5384 uint8_t *pbDestPage = pbChunk + (iPage << PAGE_SHIFT);
5385 if (!memcmp(pbSourcePage, pbDestPage, PAGE_SIZE))
5386 {
5387 fFoundDuplicate = true;
5388 break;
5389 }
5390 }
5391 }
5392 gmmR0UnmapChunk(pGMM, pGVM, pChunk, false /*fRelaxedSem*/);
5393 }
5394 }
5395 return fFoundDuplicate;
5396}
5397
5398
5399/**
5400 * Finds a duplicate of the specified page in other active VMs.
5401 *
5402 * @returns VBox status code.
5403 * @param pGVM The global (ring-0) VM structure.
5404 * @param pReq Pointer to the request packet.
5405 */
5406GMMR0DECL(int) GMMR0FindDuplicatePageReq(PGVM pGVM, PGMMFINDDUPLICATEPAGEREQ pReq)
5407{
5408 /*
5409 * Validate input and pass it on.
5410 */
5411 AssertPtrReturn(pReq, VERR_INVALID_POINTER);
5412 AssertMsgReturn(pReq->Hdr.cbReq == sizeof(*pReq), ("%#x != %#x\n", pReq->Hdr.cbReq, sizeof(*pReq)), VERR_INVALID_PARAMETER);
5413
5414 PGMM pGMM;
5415 GMM_GET_VALID_INSTANCE(pGMM, VERR_GMM_INSTANCE);
5416
5417 int rc = GVMMR0ValidateGVM(pGVM);
5418 if (RT_FAILURE(rc))
5419 return rc;
5420
5421 /*
5422 * Take the semaphore and do some more validations.
5423 */
5424 rc = gmmR0MutexAcquire(pGMM);
5425 if (GMM_CHECK_SANITY_UPON_ENTERING(pGMM))
5426 {
5427 uint8_t *pbChunk;
5428 PGMMCHUNK pChunk = gmmR0GetChunk(pGMM, pReq->idPage >> GMM_CHUNKID_SHIFT);
5429 if (pChunk)
5430 {
5431 if (gmmR0IsChunkMapped(pGMM, pGVM, pChunk, (PRTR3PTR)&pbChunk))
5432 {
5433 uint8_t *pbSourcePage = pbChunk + ((pReq->idPage & GMM_PAGEID_IDX_MASK) << PAGE_SHIFT);
5434 PGMMPAGE pPage = gmmR0GetPage(pGMM, pReq->idPage);
5435 if (pPage)
5436 {
5437 /*
5438 * Walk the chunks
5439 */
5440 pReq->fDuplicate = false;
5441 RTListForEach(&pGMM->ChunkList, pChunk, GMMCHUNK, ListNode)
5442 {
5443 if (gmmR0FindDupPageInChunk(pGMM, pGVM, pChunk, pbSourcePage))
5444 {
5445 pReq->fDuplicate = true;
5446 break;
5447 }
5448 }
5449 }
5450 else
5451 {
5452 AssertFailed();
5453 rc = VERR_PGM_PHYS_INVALID_PAGE_ID;
5454 }
5455 }
5456 else
5457 AssertFailed();
5458 }
5459 else
5460 AssertFailed();
5461 }
5462 else
5463 rc = VERR_GMM_IS_NOT_SANE;
5464
5465 gmmR0MutexRelease(pGMM);
5466 return rc;
5467}
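
/*
 * Illustrative sketch, not part of the original file: preparing the request
 * packet consumed by GMMR0FindDuplicatePageReq above.  idPage and fDuplicate
 * are the members the function actually reads and writes; the header magic
 * convention and the helper name are assumptions.
 */
static void gmmSketchInitFindDuplicatePageReq(PGMMFINDDUPLICATEPAGEREQ pReq, uint32_t idPage)
{
    RT_ZERO(*pReq);
    pReq->Hdr.u32Magic = SUPVMMR0REQHDR_MAGIC;  /* assumed request header convention */
    pReq->Hdr.cbReq    = sizeof(*pReq);         /* checked by the function above */
    pReq->idPage       = idPage;                /* the page to search for */
    pReq->fDuplicate   = false;                 /* set to true when a duplicate is found */
}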
5468
5469#endif /* VBOX_STRICT */
5470
5471
5472/**
5473 * Retrieves the GMM statistics visible to the caller.
5474 *
5475 * @returns VBox status code.
5476 *
5477 * @param pStats Where to put the statistics.
5478 * @param pSession The current session.
5479 * @param pGVM The GVM to obtain statistics for. Optional.
5480 */
5481GMMR0DECL(int) GMMR0QueryStatistics(PGMMSTATS pStats, PSUPDRVSESSION pSession, PGVM pGVM)
5482{
5483 LogFlow(("GMMR0QueryStatistics: pStats=%p pSession=%p pGVM=%p\n", pStats, pSession, pGVM));
5484
5485 /*
5486 * Validate input.
5487 */
5488 AssertPtrReturn(pSession, VERR_INVALID_POINTER);
5489 AssertPtrReturn(pStats, VERR_INVALID_POINTER);
5490 pStats->cMaxPages = 0; /* (crash before taking the mutex...) */
5491
5492 PGMM pGMM;
5493 GMM_GET_VALID_INSTANCE(pGMM, VERR_GMM_INSTANCE);
5494
5495 /*
5496 * Validate the VM handle, if not NULL, and lock the GMM.
5497 */
5498 int rc;
5499 if (pGVM)
5500 {
5501 rc = GVMMR0ValidateGVM(pGVM);
5502 if (RT_FAILURE(rc))
5503 return rc;
5504 }
5505
5506 rc = gmmR0MutexAcquire(pGMM);
5507 if (RT_FAILURE(rc))
5508 return rc;
5509
5510 /*
5511 * Copy out the GMM statistics.
5512 */
5513 pStats->cMaxPages = pGMM->cMaxPages;
5514 pStats->cReservedPages = pGMM->cReservedPages;
5515 pStats->cOverCommittedPages = pGMM->cOverCommittedPages;
5516 pStats->cAllocatedPages = pGMM->cAllocatedPages;
5517 pStats->cSharedPages = pGMM->cSharedPages;
5518 pStats->cDuplicatePages = pGMM->cDuplicatePages;
5519 pStats->cLeftBehindSharedPages = pGMM->cLeftBehindSharedPages;
5520 pStats->cBalloonedPages = pGMM->cBalloonedPages;
5521 pStats->cChunks = pGMM->cChunks;
5522 pStats->cFreedChunks = pGMM->cFreedChunks;
5523 pStats->cShareableModules = pGMM->cShareableModules;
5524 pStats->idFreeGeneration = pGMM->idFreeGeneration;
5525 RT_ZERO(pStats->au64Reserved);
5526
5527 /*
5528 * Copy out the VM statistics.
5529 */
5530 if (pGVM)
5531 pStats->VMStats = pGVM->gmm.s.Stats;
5532 else
5533 RT_ZERO(pStats->VMStats);
5534
5535 gmmR0MutexRelease(pGMM);
5536 return rc;
5537}
5538
5539
5540/**
5541 * VMMR0 request wrapper for GMMR0QueryStatistics.
5542 *
5543 * @returns see GMMR0QueryStatistics.
5544 * @param pGVM The global (ring-0) VM structure. Optional.
5545 * @param pReq Pointer to the request packet.
5546 */
5547GMMR0DECL(int) GMMR0QueryStatisticsReq(PGVM pGVM, PGMMQUERYSTATISTICSSREQ pReq)
5548{
5549 /*
5550 * Validate input and pass it on.
5551 */
5552 AssertPtrReturn(pReq, VERR_INVALID_POINTER);
5553 AssertMsgReturn(pReq->Hdr.cbReq == sizeof(*pReq), ("%#x != %#x\n", pReq->Hdr.cbReq, sizeof(*pReq)), VERR_INVALID_PARAMETER);
5554
5555 return GMMR0QueryStatistics(&pReq->Stats, pReq->pSession, pGVM);
5556}
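
/*
 * Illustrative sketch, not part of the original file: a consumer of the
 * statistics copied out by GMMR0QueryStatistics/GMMR0QueryStatisticsReq above.
 * Only fields the code above fills in are used; treating the difference of
 * cMaxPages and cAllocatedPages as rough allocation headroom is the editor's
 * interpretation, and the helper name is hypothetical.
 */
DECLINLINE(uint64_t) gmmSketchEstimateHeadroomPages(PCGMMSTATS pStats)
{
    if (pStats->cMaxPages >= pStats->cAllocatedPages)
        return pStats->cMaxPages - pStats->cAllocatedPages;
    return 0;   /* over-committed or counters raced; no headroom to report */
}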
5557
5558
5559/**
5560 * Resets the specified GMM statistics.
5561 *
5562 * @returns VBox status code.
5563 *
5564 * @param pStats Which statistics to reset; non-zero fields indicate
5565 * which ones to reset.
5566 * @param pSession The current session.
5567 * @param pGVM The GVM to reset statistics for. Optional.
5568 */
5569GMMR0DECL(int) GMMR0ResetStatistics(PCGMMSTATS pStats, PSUPDRVSESSION pSession, PGVM pGVM)
5570{
5571 NOREF(pStats); NOREF(pSession); NOREF(pGVM);
5572 /* Nothing can be reset at the moment. */
5573 return VINF_SUCCESS;
5574}
5575
5576
5577/**
5578 * VMMR0 request wrapper for GMMR0ResetStatistics.
5579 *
5580 * @returns see GMMR0ResetStatistics.
5581 * @param pGVM The global (ring-0) VM structure. Optional.
5582 * @param pReq Pointer to the request packet.
5583 */
5584GMMR0DECL(int) GMMR0ResetStatisticsReq(PGVM pGVM, PGMMRESETSTATISTICSSREQ pReq)
5585{
5586 /*
5587 * Validate input and pass it on.
5588 */
5589 AssertPtrReturn(pReq, VERR_INVALID_POINTER);
5590 AssertMsgReturn(pReq->Hdr.cbReq == sizeof(*pReq), ("%#x != %#x\n", pReq->Hdr.cbReq, sizeof(*pReq)), VERR_INVALID_PARAMETER);
5591
5592 return GMMR0ResetStatistics(&pReq->Stats, pReq->pSession, pGVM);
5593}
5594