VirtualBox

source: vbox/trunk/src/VBox/VMM/VMMR0/GMMR0.cpp@78927

Last change on this file since 78927 was 76553, checked in by vboxsync, 6 years ago

scm --update-copyright-year

  • Property svn:eol-style set to native
  • Property svn:keywords set to Id Revision
File size: 191.9 KB
1/* $Id: GMMR0.cpp 76553 2019-01-01 01:45:53Z vboxsync $ */
2/** @file
3 * GMM - Global Memory Manager.
4 */
5
6/*
7 * Copyright (C) 2007-2019 Oracle Corporation
8 *
9 * This file is part of VirtualBox Open Source Edition (OSE), as
10 * available from http://www.virtualbox.org. This file is free software;
11 * you can redistribute it and/or modify it under the terms of the GNU
12 * General Public License (GPL) as published by the Free Software
13 * Foundation, in version 2 as it comes in the "COPYING" file of the
14 * VirtualBox OSE distribution. VirtualBox OSE is distributed in the
15 * hope that it will be useful, but WITHOUT ANY WARRANTY of any kind.
16 */
17
18
19/** @page pg_gmm GMM - The Global Memory Manager
20 *
21 * As the name indicates, this component is responsible for global memory
22 * management. Currently only guest RAM is allocated from the GMM, but this
23 * may change to include shadow page tables and other bits later.
24 *
25 * Guest RAM is managed as individual pages, but allocated from the host OS
26 * in chunks for reasons of portability / efficiency. To minimize the memory
27 * footprint all tracking structures must be as small as possible without
28 * unnecessary performance penalties.
29 *
30 * The allocation chunks have a fixed size, defined at compile time by the
31 * #GMM_CHUNK_SIZE \#define.
32 *
33 * Each chunk is given a unique ID. Each page also has a unique ID. The
34 * relationship between the two IDs is:
35 * @code
36 * GMM_CHUNK_SHIFT = log2(GMM_CHUNK_SIZE / PAGE_SIZE);
37 * idPage = (idChunk << GMM_CHUNK_SHIFT) | iPage;
38 * @endcode
39 * Where iPage is the index of the page within the chunk. This ID scheme
40 * permits efficient chunk and page lookup, but it relies on the chunk size
41 * being set at compile time. The chunks are organized in an AVL tree with their
42 * IDs being the keys.
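 *
 * Going from a page ID back to its chunk and page index is just the reverse
 * (an illustrative sketch derived from the formula above):
 * @code
 * idChunk = idPage >> GMM_CHUNK_SHIFT;
 * iPage   = idPage & ((1 << GMM_CHUNK_SHIFT) - 1);
 * @endcode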
43 *
44 * The physical address of each page in an allocation chunk is maintained by
45 * the #RTR0MEMOBJ and obtained using #RTR0MemObjGetPagePhysAddr. There is no
46 * need to duplicate this information (it would cost 8 bytes per page if we did).
47 *
48 * So what do we need to track per page? Most importantly we need to know
49 * which state the page is in:
50 * - Private - Allocated for (eventually) backing one particular VM page.
51 * - Shared - Read-only page that is used by one or more VMs and treated
52 * as COW by PGM.
53 * - Free - Not used by anyone.
54 *
55 * For the page replacement operations (sharing, defragmenting and freeing)
56 * to be somewhat efficient, private pages need to be associated with a
57 * particular page in a particular VM.
58 *
59 * Tracking the usage of shared pages is impractical and expensive, so we'll
60 * settle for a reference counting system instead.
61 *
62 * Free pages will be chained on LIFOs.
63 *
64 * On 64-bit systems we will use a 64-bit bitfield per page, while on 32-bit
65 * systems a 32-bit bitfield will have to suffice because of address space
66 * limitations. The #GMMPAGE structure shows the details.
67 *
68 *
69 * @section sec_gmm_alloc_strat Page Allocation Strategy
70 *
71 * The strategy for allocating pages has to take fragmentation and shared
72 * pages into account, or we may end up with 2000 chunks with only
73 * a few pages in each. Shared pages cannot easily be reallocated because
74 * of the inaccurate usage accounting (see above). Private pages can be
75 * reallocated by a defragmentation thread in the same manner that sharing
76 * is done.
77 *
78 * The first approach is to manage the free pages in two sets depending on
79 * whether they are mainly for the allocation of shared or private pages.
80 * In the initial implementation there will be almost no possibility for
81 * mixing shared and private pages in the same chunk (only if we're really
82 * stressed on memory), but when we implement forking of VMs and have to
83 * deal with lots of COW pages it'll start getting kind of interesting.
84 *
85 * The sets are lists of chunks with approximately the same number of
86 * free pages. Say the chunk size is 1MB, meaning 256 pages, and a set
87 * consists of 16 lists. So, the first list will contain the chunks with
88 * 1-7 free pages, the second covers 8-15, and so on. The chunks will be
89 * moved between the lists as pages are freed up or allocated.
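 *
 * A rough sketch of how a chunk could be mapped to a list index (illustrative
 * only; cFree and apLists are real members, but the actual GMM selection logic
 * may differ):
 * @code
 * unsigned cPerList = GMM_CHUNK_NUM_PAGES / RT_ELEMENTS(pSet->apLists);
 * unsigned iList    = pChunk->cFree / cPerList;
 * if (iList >= RT_ELEMENTS(pSet->apLists))
 *     iList = RT_ELEMENTS(pSet->apLists) - 1;
 * @endcode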
90 *
91 *
92 * @section sec_gmm_costs Costs
93 *
94 * The per-page cost in kernel space is 32 bits plus whatever RTR0MEMOBJ
95 * entails. In addition there is the chunk cost of approximately
96 * (sizeof(RTR0MEMOBJ) + sizeof(CHUNK)) / 2^CHUNK_SHIFT bytes per page.
97 *
98 * On Windows the per-page #RTR0MEMOBJ cost is 32 bits on 32-bit Windows
99 * and 64 bits on 64-bit Windows (a PFN_NUMBER in the MDL). So, 64 bits per page.
100 * The cost on Linux is identical, but there it's because of sizeof(struct page *).
101 *
102 *
103 * @section sec_gmm_legacy Legacy Mode for Non-Tier-1 Platforms
104 *
105 * In legacy mode the page source is locked user pages rather than
106 * #RTR0MemObjAllocPhysNC, which means that a page can only be allocated
107 * by the VM that locked it. We will make no attempt at implementing
108 * page sharing on these systems; we'll just do enough to make it all work.
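 *
 * In other words, the chunk backing differs roughly like this (an illustrative
 * sketch only; pvR3 stands for a ring-3 mapping supplied by the VM, and the
 * real allocation code is not shown here):
 * @code
 * if (pGMM->fLegacyAllocationMode)
 *     rc = RTR0MemObjLockUser(&pChunk->hMemObj, pvR3, GMM_CHUNK_SIZE,
 *                             RTMEM_PROT_READ | RTMEM_PROT_WRITE, NIL_RTR0PROCESS);
 * else
 *     rc = RTR0MemObjAllocPhysNC(&pChunk->hMemObj, GMM_CHUNK_SIZE, NIL_RTHCPHYS);
 * @endcode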
109 *
110 *
111 * @subsection sub_gmm_locking Serializing
112 *
113 * One simple fast mutex will be employed in the initial implementation, not
114 * two as mentioned in @ref sec_pgmPhys_Serializing.
115 *
116 * @see @ref sec_pgmPhys_Serializing
117 *
118 *
119 * @section sec_gmm_overcommit Memory Over-Commitment Management
120 *
121 * The GVM will have to do the system-wide memory over-commitment
122 * management. My current ideas are:
123 * - A per-VM over-commitment policy that indicates how much to initially
124 * commit to it and what to do in an out-of-memory situation.
125 * - Prevent overtaxing the host.
126 *
127 * There are some challenges here; the main ones are configurability and
128 * security. Should we, for instance, permit anyone to request 100% memory
129 * commitment? Who should be allowed to make runtime adjustments of the
130 * config? And how do we prevent these settings from being lost when the last
131 * VM process exits? The solution is probably to have an optional root
132 * daemon that will keep VMMR0.r0 in memory and enable the security measures.
133 *
134 *
135 *
136 * @section sec_gmm_numa NUMA
137 *
138 * NUMA considerations will be designed and implemented a bit later.
139 *
140 * The preliminary guess is that we will have to try to allocate memory as
141 * close as possible to the CPUs the VM is executed on (EMT and additional CPU
142 * threads), which means it's mostly about allocation and sharing policies.
143 * Both the scheduler and the allocator interface will have to supply some NUMA
144 * info, and we'll need a way to calculate access costs.
145 *
146 */
147
148
149/*********************************************************************************************************************************
150* Header Files *
151*********************************************************************************************************************************/
152#define LOG_GROUP LOG_GROUP_GMM
153#include <VBox/rawpci.h>
154#include <VBox/vmm/vm.h>
155#include <VBox/vmm/gmm.h>
156#include "GMMR0Internal.h"
157#include <VBox/vmm/gvm.h>
158#include <VBox/vmm/pgm.h>
159#include <VBox/log.h>
160#include <VBox/param.h>
161#include <VBox/err.h>
162#include <VBox/VMMDev.h>
163#include <iprt/asm.h>
164#include <iprt/avl.h>
165#ifdef VBOX_STRICT
166# include <iprt/crc.h>
167#endif
168#include <iprt/critsect.h>
169#include <iprt/list.h>
170#include <iprt/mem.h>
171#include <iprt/memobj.h>
172#include <iprt/mp.h>
173#include <iprt/semaphore.h>
174#include <iprt/string.h>
175#include <iprt/time.h>
176
177
178/*********************************************************************************************************************************
179* Defined Constants And Macros *
180*********************************************************************************************************************************/
181/** @def VBOX_USE_CRIT_SECT_FOR_GIANT
182 * Use a critical section instead of a fast mutex for the giant GMM lock.
183 *
184 * @remarks This is primarily a way of avoiding the deadlock checks in the
185 * Windows driver verifier. */
186#if defined(RT_OS_WINDOWS) || defined(DOXYGEN_RUNNING)
187# define VBOX_USE_CRIT_SECT_FOR_GIANT
188#endif
189
190
191/*********************************************************************************************************************************
192* Structures and Typedefs *
193*********************************************************************************************************************************/
194/** Pointer to set of free chunks. */
195typedef struct GMMCHUNKFREESET *PGMMCHUNKFREESET;
196
197/**
198 * The per-page tracking structure employed by the GMM.
199 *
200 * On 32-bit hosts some trickery is necessary to compress all
201 * the information into 32 bits. When the fSharedFree member is set,
202 * the 30th bit decides whether it's a free page or not.
203 *
204 * Because of the different layout on 32-bit and 64-bit hosts, macros
205 * are used to get and set some of the data.
206 */
207typedef union GMMPAGE
208{
209#if HC_ARCH_BITS == 64
210 /** Unsigned integer view. */
211 uint64_t u;
212
213 /** The common view. */
214 struct GMMPAGECOMMON
215 {
216 uint32_t uStuff1 : 32;
217 uint32_t uStuff2 : 30;
218 /** The page state. */
219 uint32_t u2State : 2;
220 } Common;
221
222 /** The view of a private page. */
223 struct GMMPAGEPRIVATE
224 {
225 /** The guest page frame number. (Max addressable: 2 ^ 44 - 16) */
226 uint32_t pfn;
227 /** The GVM handle. (64K VMs) */
228 uint32_t hGVM : 16;
229 /** Reserved. */
230 uint32_t u16Reserved : 14;
231 /** The page state. */
232 uint32_t u2State : 2;
233 } Private;
234
235 /** The view of a shared page. */
236 struct GMMPAGESHARED
237 {
238 /** The host page frame number. (Max addressable: 2 ^ 44 - 16) */
239 uint32_t pfn;
240 /** The reference count (64K VMs). */
241 uint32_t cRefs : 16;
242 /** Used for debug checksumming. */
243 uint32_t u14Checksum : 14;
244 /** The page state. */
245 uint32_t u2State : 2;
246 } Shared;
247
248 /** The view of a free page. */
249 struct GMMPAGEFREE
250 {
251 /** The index of the next page in the free list. UINT16_MAX is NIL. */
252 uint16_t iNext;
253 /** Reserved. Checksum or something? */
254 uint16_t u16Reserved0;
255 /** Reserved. Checksum or something? */
256 uint32_t u30Reserved1 : 30;
257 /** The page state. */
258 uint32_t u2State : 2;
259 } Free;
260
261#else /* 32-bit */
262 /** Unsigned integer view. */
263 uint32_t u;
264
265 /** The common view. */
266 struct GMMPAGECOMMON
267 {
268 uint32_t uStuff : 30;
269 /** The page state. */
270 uint32_t u2State : 2;
271 } Common;
272
273 /** The view of a private page. */
274 struct GMMPAGEPRIVATE
275 {
276 /** The guest page frame number. (Max addressable: 2 ^ 36) */
277 uint32_t pfn : 24;
278 /** The GVM handle. (127 VMs) */
279 uint32_t hGVM : 7;
280 /** The top page state bit, MBZ. */
281 uint32_t fZero : 1;
282 } Private;
283
284 /** The view of a shared page. */
285 struct GMMPAGESHARED
286 {
287 /** The reference count. */
288 uint32_t cRefs : 30;
289 /** The page state. */
290 uint32_t u2State : 2;
291 } Shared;
292
293 /** The view of a free page. */
294 struct GMMPAGEFREE
295 {
296 /** The index of the next page in the free list. UINT16_MAX is NIL. */
297 uint32_t iNext : 16;
298 /** Reserved. Checksum or something? */
299 uint32_t u14Reserved : 14;
300 /** The page state. */
301 uint32_t u2State : 2;
302 } Free;
303#endif
304} GMMPAGE;
305AssertCompileSize(GMMPAGE, sizeof(RTHCUINTPTR));
306/** Pointer to a GMMPAGE. */
307typedef GMMPAGE *PGMMPAGE;
308
309
310/** @name The Page States.
311 * @{ */
312/** A private page. */
313#define GMM_PAGE_STATE_PRIVATE 0
314/** A private page - alternative value used on the 32-bit implementation.
315 * This will never be used on 64-bit hosts. */
316#define GMM_PAGE_STATE_PRIVATE_32 1
317/** A shared page. */
318#define GMM_PAGE_STATE_SHARED 2
319/** A free page. */
320#define GMM_PAGE_STATE_FREE 3
321/** @} */
322
323
324/** @def GMM_PAGE_IS_PRIVATE
325 *
326 * @returns true if private, false if not.
327 * @param pPage The GMM page.
328 */
329#if HC_ARCH_BITS == 64
330# define GMM_PAGE_IS_PRIVATE(pPage) ( (pPage)->Common.u2State == GMM_PAGE_STATE_PRIVATE )
331#else
332# define GMM_PAGE_IS_PRIVATE(pPage) ( (pPage)->Private.fZero == 0 )
333#endif
334
335/** @def GMM_PAGE_IS_SHARED
336 *
337 * @returns true if shared, false if not.
338 * @param pPage The GMM page.
339 */
340#define GMM_PAGE_IS_SHARED(pPage) ( (pPage)->Common.u2State == GMM_PAGE_STATE_SHARED )
341
342/** @def GMM_PAGE_IS_FREE
343 *
344 * @returns true if free, false if not.
345 * @param pPage The GMM page.
346 */
347#define GMM_PAGE_IS_FREE(pPage) ( (pPage)->Common.u2State == GMM_PAGE_STATE_FREE )
348
349/** @def GMM_PAGE_PFN_LAST
350 * The last valid guest pfn.
351 * @remark Some of the values outside the range have special meaning,
352 * see GMM_PAGE_PFN_UNSHAREABLE.
353 */
354#if HC_ARCH_BITS == 64
355# define GMM_PAGE_PFN_LAST UINT32_C(0xfffffff0)
356#else
357# define GMM_PAGE_PFN_LAST UINT32_C(0x00fffff0)
358#endif
359AssertCompile(GMM_PAGE_PFN_LAST == (GMM_GCPHYS_LAST >> PAGE_SHIFT));
360
361/** @def GMM_PAGE_PFN_UNSHAREABLE
362 * Indicates that this page isn't used for normal guest memory and thus isn't shareable.
363 */
364#if HC_ARCH_BITS == 64
365# define GMM_PAGE_PFN_UNSHAREABLE UINT32_C(0xfffffff1)
366#else
367# define GMM_PAGE_PFN_UNSHAREABLE UINT32_C(0x00fffff1)
368#endif
369AssertCompile(GMM_PAGE_PFN_UNSHAREABLE == (GMM_GCPHYS_UNSHAREABLE >> PAGE_SHIFT));
370
371
372/**
373 * A GMM allocation chunk ring-3 mapping record.
374 *
375 * This should really be associated with a session and not a VM, but
376 * it's simpler to associate it with a VM and clean up when the VM object
377 * is destroyed.
378 */
379typedef struct GMMCHUNKMAP
380{
381 /** The mapping object. */
382 RTR0MEMOBJ hMapObj;
383 /** The VM owning the mapping. */
384 PGVM pGVM;
385} GMMCHUNKMAP;
386/** Pointer to a GMM allocation chunk mapping. */
387typedef struct GMMCHUNKMAP *PGMMCHUNKMAP;
388
389
390/**
391 * A GMM allocation chunk.
392 */
393typedef struct GMMCHUNK
394{
395 /** The AVL node core.
396 * The Key is the chunk ID. (Giant mtx.) */
397 AVLU32NODECORE Core;
398 /** The memory object.
399 * Either from RTR0MemObjAllocPhysNC or RTR0MemObjLockUser depending on
400 * what the host can dish up. (Chunk mtx protects mapping accesses
401 * and related frees.) */
402 RTR0MEMOBJ hMemObj;
403 /** Pointer to the next chunk in the free list. (Giant mtx.) */
404 PGMMCHUNK pFreeNext;
405 /** Pointer to the previous chunk in the free list. (Giant mtx.) */
406 PGMMCHUNK pFreePrev;
407 /** Pointer to the free set this chunk belongs to. NULL for
408 * chunks with no free pages. (Giant mtx.) */
409 PGMMCHUNKFREESET pSet;
410 /** List node in the chunk list (GMM::ChunkList). (Giant mtx.) */
411 RTLISTNODE ListNode;
412 /** Pointer to an array of mappings. (Chunk mtx.) */
413 PGMMCHUNKMAP paMappingsX;
414 /** The number of mappings. (Chunk mtx.) */
415 uint16_t cMappingsX;
416 /** The index of the chunk mutex this chunk is using. UINT8_MAX if nobody is
417 * mapping or freeing anything. (Giant mtx.) */
418 uint8_t volatile iChunkMtx;
419 /** Flags field reserved for future use (like eliminating enmType).
420 * (Giant mtx.) */
421 uint8_t fFlags;
422 /** The head of the list of free pages. UINT16_MAX is the NIL value.
423 * (Giant mtx.) */
424 uint16_t iFreeHead;
425 /** The number of free pages. (Giant mtx.) */
426 uint16_t cFree;
427 /** The GVM handle of the VM that first allocated pages from this chunk; this
428 * is used as a preference when there are several chunks to choose from.
429 * When in bound memory mode this isn't a preference any longer. (Giant
430 * mtx.) */
431 uint16_t hGVM;
432 /** The ID of the NUMA node the memory mostly resides on. (Reserved for
433 * future use.) (Giant mtx.) */
434 uint16_t idNumaNode;
435 /** The number of private pages. (Giant mtx.) */
436 uint16_t cPrivate;
437 /** The number of shared pages. (Giant mtx.) */
438 uint16_t cShared;
439 /** The pages. (Giant mtx.) */
440 GMMPAGE aPages[GMM_CHUNK_SIZE >> PAGE_SHIFT];
441} GMMCHUNK;
442
443/** Indicates that the NUMA properties of the memory are unknown. */
444#define GMM_CHUNK_NUMA_ID_UNKNOWN UINT16_C(0xfffe)
445
446/** @name GMM_CHUNK_FLAGS_XXX - chunk flags.
447 * @{ */
448/** Indicates that the chunk is a large page (2MB). */
449#define GMM_CHUNK_FLAGS_LARGE_PAGE UINT16_C(0x0001)
450/** @} */
451
452
453/**
454 * An allocation chunk TLB entry.
455 */
456typedef struct GMMCHUNKTLBE
457{
458 /** The chunk id. */
459 uint32_t idChunk;
460 /** Pointer to the chunk. */
461 PGMMCHUNK pChunk;
462} GMMCHUNKTLBE;
463/** Pointer to an allocation chunk TLB entry. */
464typedef GMMCHUNKTLBE *PGMMCHUNKTLBE;
465
466
467/** The number of entries in the allocation chunk TLB. */
468#define GMM_CHUNKTLB_ENTRIES 32
469/** Gets the TLB entry index for the given Chunk ID. */
470#define GMM_CHUNKTLB_IDX(idChunk) ( (idChunk) & (GMM_CHUNKTLB_ENTRIES - 1) )
471
472/**
473 * An allocation chunk TLB.
474 */
475typedef struct GMMCHUNKTLB
476{
477 /** The TLB entries. */
478 GMMCHUNKTLBE aEntries[GMM_CHUNKTLB_ENTRIES];
479} GMMCHUNKTLB;
480/** Pointer to an allocation chunk TLB. */
481typedef GMMCHUNKTLB *PGMMCHUNKTLB;
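
/* Illustrative lookup sketch (not code from this file); a TLB miss would be
   resolved against the chunk AVL tree (GMM::pChunks) and the entry refreshed:
       PGMMCHUNKTLBE pTlbe  = &pGMM->ChunkTLB.aEntries[GMM_CHUNKTLB_IDX(idChunk)];
       PGMMCHUNK     pChunk = pTlbe->idChunk == idChunk ? pTlbe->pChunk : NULL;
*/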
482
483
484/**
485 * The GMM instance data.
486 */
487typedef struct GMM
488{
489 /** Magic / eye catcher. GMM_MAGIC */
490 uint32_t u32Magic;
491 /** The number of threads waiting on the mutex. */
492 uint32_t cMtxContenders;
493#ifdef VBOX_USE_CRIT_SECT_FOR_GIANT
494 /** The critical section protecting the GMM.
495 * More fine grained locking can be implemented later if necessary. */
496 RTCRITSECT GiantCritSect;
497#else
498 /** The fast mutex protecting the GMM.
499 * More fine grained locking can be implemented later if necessary. */
500 RTSEMFASTMUTEX hMtx;
501#endif
502#ifdef VBOX_STRICT
503 /** The current mutex owner. */
504 RTNATIVETHREAD hMtxOwner;
505#endif
506 /** The chunk tree. */
507 PAVLU32NODECORE pChunks;
508 /** The chunk TLB. */
509 GMMCHUNKTLB ChunkTLB;
510 /** The private free set. */
511 GMMCHUNKFREESET PrivateX;
512 /** The shared free set. */
513 GMMCHUNKFREESET Shared;
514
515 /** Shared module tree (global).
516 * @todo separate trees for distinctly different guest OSes. */
517 PAVLLU32NODECORE pGlobalSharedModuleTree;
518 /** Sharable modules (count of nodes in pGlobalSharedModuleTree). */
519 uint32_t cShareableModules;
520
521 /** The chunk list. For simplifying the cleanup process. */
522 RTLISTANCHOR ChunkList;
523
524 /** The maximum number of pages we're allowed to allocate.
525 * @gcfgm{GMM/MaxPages,64-bit, Direct.}
526 * @gcfgm{GMM/PctPages,32-bit, Relative to the number of host pages.} */
527 uint64_t cMaxPages;
528 /** The number of pages that have been reserved.
529 * The deal is that cReservedPages - cOverCommittedPages <= cMaxPages. */
530 uint64_t cReservedPages;
531 /** The number of pages that we have over-committed in reservations. */
532 uint64_t cOverCommittedPages;
533 /** The number of actually allocated (committed if you like) pages. */
534 uint64_t cAllocatedPages;
535 /** The number of pages that are shared. A subset of cAllocatedPages. */
536 uint64_t cSharedPages;
537 /** The number of pages that are actually shared between VMs. */
538 uint64_t cDuplicatePages;
539 /** The number of shared pages that have been left behind by
540 * VMs not doing proper cleanups. */
541 uint64_t cLeftBehindSharedPages;
542 /** The number of allocation chunks.
543 * (The number of pages we've allocated from the host can be derived from this.) */
544 uint32_t cChunks;
545 /** The number of current ballooned pages. */
546 uint64_t cBalloonedPages;
547
548 /** The legacy allocation mode indicator.
549 * This is determined at initialization time. */
550 bool fLegacyAllocationMode;
551 /** The bound memory mode indicator.
552 * When set, the memory will be bound to a specific VM and never
553 * shared. This is always set if fLegacyAllocationMode is set.
554 * (Also determined at initialization time.) */
555 bool fBoundMemoryMode;
556 /** The number of registered VMs. */
557 uint16_t cRegisteredVMs;
558
559 /** The number of chunks ever freed. This is used as a list generation
560 * number to avoid restarting the cleanup scan when the list wasn't modified. */
561 uint32_t cFreedChunks;
562 /** The previously allocated Chunk ID.
563 * Used as a hint to avoid scanning the whole bitmap. */
564 uint32_t idChunkPrev;
565 /** Chunk ID allocation bitmap.
566 * Bits of allocated IDs are set, free ones are clear.
567 * The NIL id (0) is marked allocated. */
568 uint32_t bmChunkId[(GMM_CHUNKID_LAST + 1 + 31) / 32];
569
570 /** The index of the next mutex to use. */
571 uint32_t iNextChunkMtx;
572 /** Chunk locks for reducing lock contention without having to allocate
573 * one lock per chunk. */
574 struct
575 {
576 /** The mutex */
577 RTSEMFASTMUTEX hMtx;
578 /** The number of threads currently using this mutex. */
579 uint32_t volatile cUsers;
580 } aChunkMtx[64];
581} GMM;
582/** Pointer to the GMM instance. */
583typedef GMM *PGMM;
584
585/** The value of GMM::u32Magic (Katsuhiro Otomo). */
586#define GMM_MAGIC UINT32_C(0x19540414)
587
588
589/**
590 * GMM chunk mutex state.
591 *
592 * This is filled in by gmmR0ChunkMutexAcquire and used by the other
593 * gmmR0ChunkMutex* methods.
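 *
 * Typical usage pattern (mirroring gmmR0CleanupVMScanChunk below):
 * @code
 * GMMR0CHUNKMTXSTATE MtxState;
 * gmmR0ChunkMutexAcquire(&MtxState, pGMM, pChunk, GMMR0CHUNK_MTX_KEEP_GIANT);
 * // ... work on the chunk ...
 * gmmR0ChunkMutexRelease(&MtxState, pChunk);
 * @endcode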
594 */
595typedef struct GMMR0CHUNKMTXSTATE
596{
597 PGMM pGMM;
598 /** The index of the chunk mutex. */
599 uint8_t iChunkMtx;
600 /** The relevant flags (GMMR0CHUNK_MTX_XXX). */
601 uint8_t fFlags;
602} GMMR0CHUNKMTXSTATE;
603/** Pointer to a chunk mutex state. */
604typedef GMMR0CHUNKMTXSTATE *PGMMR0CHUNKMTXSTATE;
605
606/** @name GMMR0CHUNK_MTX_XXX
607 * @{ */
608#define GMMR0CHUNK_MTX_INVALID UINT32_C(0)
609#define GMMR0CHUNK_MTX_KEEP_GIANT UINT32_C(1)
610#define GMMR0CHUNK_MTX_RETAKE_GIANT UINT32_C(2)
611#define GMMR0CHUNK_MTX_DROP_GIANT UINT32_C(3)
612#define GMMR0CHUNK_MTX_END UINT32_C(4)
613/** @} */
614
615
616/** The maximum number of shared modules per VM. */
617#define GMM_MAX_SHARED_PER_VM_MODULES 2048
618/** The maximum number of shared modules GMM is allowed to track. */
619#define GMM_MAX_SHARED_GLOBAL_MODULES 16834
620
621
622/**
623 * Argument packet for gmmR0SharedModuleCleanup.
624 */
625typedef struct GMMR0SHMODPERVMDTORARGS
626{
627 PGVM pGVM;
628 PGMM pGMM;
629} GMMR0SHMODPERVMDTORARGS;
630
631/**
632 * Argument packet for gmmR0CheckSharedModule.
633 */
634typedef struct GMMCHECKSHAREDMODULEINFO
635{
636 PGVM pGVM;
637 VMCPUID idCpu;
638} GMMCHECKSHAREDMODULEINFO;
639
640/**
641 * Argument packet for gmmR0FindDupPageInChunk by GMMR0FindDuplicatePage.
642 */
643typedef struct GMMFINDDUPPAGEINFO
644{
645 PGVM pGVM;
646 PGMM pGMM;
647 uint8_t *pSourcePage;
648 bool fFoundDuplicate;
649} GMMFINDDUPPAGEINFO;
650
651
652/*********************************************************************************************************************************
653* Global Variables *
654*********************************************************************************************************************************/
655/** Pointer to the GMM instance data. */
656static PGMM g_pGMM = NULL;
657
658/** Macro for obtaining and validating the g_pGMM pointer.
659 *
660 * On failure it will return from the invoking function with the specified
661 * return value.
662 *
663 * @param pGMM The name of the pGMM variable.
664 * @param rc The return value on failure. Use VERR_GMM_INSTANCE for VBox
665 * status codes.
666 */
667#define GMM_GET_VALID_INSTANCE(pGMM, rc) \
668 do { \
669 (pGMM) = g_pGMM; \
670 AssertPtrReturn((pGMM), (rc)); \
671 AssertMsgReturn((pGMM)->u32Magic == GMM_MAGIC, ("%p - %#x\n", (pGMM), (pGMM)->u32Magic), (rc)); \
672 } while (0)
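
/* Usage sketch (illustrative; matches the pattern of the real callers below,
   e.g. GMMR0InitialReservation):
       PGMM pGMM;
       GMM_GET_VALID_INSTANCE(pGMM, VERR_GMM_INSTANCE);
       // On failure the macro has already returned VERR_GMM_INSTANCE from the
       // invoking function; from here on pGMM points to a valid instance.
*/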
673
674/** Macro for obtaining and validating the g_pGMM pointer, void function
675 * variant.
676 *
677 * On failure it will return from the invoking function.
678 *
679 * @param pGMM The name of the pGMM variable.
680 */
681#define GMM_GET_VALID_INSTANCE_VOID(pGMM) \
682 do { \
683 (pGMM) = g_pGMM; \
684 AssertPtrReturnVoid((pGMM)); \
685 AssertMsgReturnVoid((pGMM)->u32Magic == GMM_MAGIC, ("%p - %#x\n", (pGMM), (pGMM)->u32Magic)); \
686 } while (0)
687
688
689/** @def GMM_CHECK_SANITY_UPON_ENTERING
690 * Checks the sanity of the GMM instance data before making changes.
691 *
692 * This macro is a stub by default and must be enabled manually in the code.
693 *
694 * @returns true if sane, false if not.
695 * @param pGMM The name of the pGMM variable.
696 */
697#if defined(VBOX_STRICT) && defined(GMMR0_WITH_SANITY_CHECK) && 0
698# define GMM_CHECK_SANITY_UPON_ENTERING(pGMM) (gmmR0SanityCheck((pGMM), __PRETTY_FUNCTION__, __LINE__) == 0)
699#else
700# define GMM_CHECK_SANITY_UPON_ENTERING(pGMM) (true)
701#endif
702
703/** @def GMM_CHECK_SANITY_UPON_LEAVING
704 * Checks the sanity of the GMM instance data after making changes.
705 *
706 * This macro is a stub by default and must be enabled manually in the code.
707 *
708 * @returns true if sane, false if not.
709 * @param pGMM The name of the pGMM variable.
710 */
711#if defined(VBOX_STRICT) && defined(GMMR0_WITH_SANITY_CHECK) && 0
712# define GMM_CHECK_SANITY_UPON_LEAVING(pGMM) (gmmR0SanityCheck((pGMM), __PRETTY_FUNCTION__, __LINE__) == 0)
713#else
714# define GMM_CHECK_SANITY_UPON_LEAVING(pGMM) (true)
715#endif
716
717/** @def GMM_CHECK_SANITY_IN_LOOPS
718 * Checks the sanity of the GMM instance in the allocation loops.
719 *
720 * This macro is a stub by default and must be enabled manually in the code.
721 *
722 * @returns true if sane, false if not.
723 * @param pGMM The name of the pGMM variable.
724 */
725#if defined(VBOX_STRICT) && defined(GMMR0_WITH_SANITY_CHECK) && 0
726# define GMM_CHECK_SANITY_IN_LOOPS(pGMM) (gmmR0SanityCheck((pGMM), __PRETTY_FUNCTION__, __LINE__) == 0)
727#else
728# define GMM_CHECK_SANITY_IN_LOOPS(pGMM) (true)
729#endif
730
731
732/*********************************************************************************************************************************
733* Internal Functions *
734*********************************************************************************************************************************/
735static DECLCALLBACK(int) gmmR0TermDestroyChunk(PAVLU32NODECORE pNode, void *pvGMM);
736static bool gmmR0CleanupVMScanChunk(PGMM pGMM, PGVM pGVM, PGMMCHUNK pChunk);
737DECLINLINE(void) gmmR0UnlinkChunk(PGMMCHUNK pChunk);
738DECLINLINE(void) gmmR0LinkChunk(PGMMCHUNK pChunk, PGMMCHUNKFREESET pSet);
739DECLINLINE(void) gmmR0SelectSetAndLinkChunk(PGMM pGMM, PGVM pGVM, PGMMCHUNK pChunk);
740#ifdef GMMR0_WITH_SANITY_CHECK
741static uint32_t gmmR0SanityCheck(PGMM pGMM, const char *pszFunction, unsigned uLineNo);
742#endif
743static bool gmmR0FreeChunk(PGMM pGMM, PGVM pGVM, PGMMCHUNK pChunk, bool fRelaxedSem);
744DECLINLINE(void) gmmR0FreePrivatePage(PGMM pGMM, PGVM pGVM, uint32_t idPage, PGMMPAGE pPage);
745DECLINLINE(void) gmmR0FreeSharedPage(PGMM pGMM, PGVM pGVM, uint32_t idPage, PGMMPAGE pPage);
746static int gmmR0UnmapChunkLocked(PGMM pGMM, PGVM pGVM, PGMMCHUNK pChunk);
747#ifdef VBOX_WITH_PAGE_SHARING
748static void gmmR0SharedModuleCleanup(PGMM pGMM, PGVM pGVM);
749# ifdef VBOX_STRICT
750static uint32_t gmmR0StrictPageChecksum(PGMM pGMM, PGVM pGVM, uint32_t idPage);
751# endif
752#endif
753
754
755
756/**
757 * Initializes the GMM component.
758 *
759 * This is called when the VMMR0.r0 module is loaded and protected by the
760 * loader semaphore.
761 *
762 * @returns VBox status code.
763 */
764GMMR0DECL(int) GMMR0Init(void)
765{
766 LogFlow(("GMMInit:\n"));
767
768 /*
769 * Allocate the instance data and the locks.
770 */
771 PGMM pGMM = (PGMM)RTMemAllocZ(sizeof(*pGMM));
772 if (!pGMM)
773 return VERR_NO_MEMORY;
774
775 pGMM->u32Magic = GMM_MAGIC;
776 for (unsigned i = 0; i < RT_ELEMENTS(pGMM->ChunkTLB.aEntries); i++)
777 pGMM->ChunkTLB.aEntries[i].idChunk = NIL_GMM_CHUNKID;
778 RTListInit(&pGMM->ChunkList);
779 ASMBitSet(&pGMM->bmChunkId[0], NIL_GMM_CHUNKID);
780
781#ifdef VBOX_USE_CRIT_SECT_FOR_GIANT
782 int rc = RTCritSectInit(&pGMM->GiantCritSect);
783#else
784 int rc = RTSemFastMutexCreate(&pGMM->hMtx);
785#endif
786 if (RT_SUCCESS(rc))
787 {
788 unsigned iMtx;
789 for (iMtx = 0; iMtx < RT_ELEMENTS(pGMM->aChunkMtx); iMtx++)
790 {
791 rc = RTSemFastMutexCreate(&pGMM->aChunkMtx[iMtx].hMtx);
792 if (RT_FAILURE(rc))
793 break;
794 }
795 if (RT_SUCCESS(rc))
796 {
797 /*
798 * Check and see if RTR0MemObjAllocPhysNC works.
799 */
800#if 0 /* later, see @bugref{3170}. */
801 RTR0MEMOBJ MemObj;
802 rc = RTR0MemObjAllocPhysNC(&MemObj, _64K, NIL_RTHCPHYS);
803 if (RT_SUCCESS(rc))
804 {
805 rc = RTR0MemObjFree(MemObj, true);
806 AssertRC(rc);
807 }
808 else if (rc == VERR_NOT_SUPPORTED)
809 pGMM->fLegacyAllocationMode = pGMM->fBoundMemoryMode = true;
810 else
811 SUPR0Printf("GMMR0Init: RTR0MemObjAllocPhysNC(,64K,Any) -> %d!\n", rc);
812#else
813# if defined(RT_OS_WINDOWS) || (defined(RT_OS_SOLARIS) && ARCH_BITS == 64) || defined(RT_OS_LINUX) || defined(RT_OS_FREEBSD)
814 pGMM->fLegacyAllocationMode = false;
815# if ARCH_BITS == 32
816 /* Don't reuse possibly partial chunks because of the virtual
817 address space limitation. */
818 pGMM->fBoundMemoryMode = true;
819# else
820 pGMM->fBoundMemoryMode = false;
821# endif
822# else
823 pGMM->fLegacyAllocationMode = true;
824 pGMM->fBoundMemoryMode = true;
825# endif
826#endif
827
828 /*
829 * Query system page count and guess a reasonable cMaxPages value.
830 */
831 pGMM->cMaxPages = UINT32_MAX; /** @todo IPRT function for query ram size and such. */
832
833 g_pGMM = pGMM;
834 LogFlow(("GMMInit: pGMM=%p fLegacyAllocationMode=%RTbool fBoundMemoryMode=%RTbool\n", pGMM, pGMM->fLegacyAllocationMode, pGMM->fBoundMemoryMode));
835 return VINF_SUCCESS;
836 }
837
838 /*
839 * Bail out.
840 */
841 while (iMtx-- > 0)
842 RTSemFastMutexDestroy(pGMM->aChunkMtx[iMtx].hMtx);
843#ifdef VBOX_USE_CRIT_SECT_FOR_GIANT
844 RTCritSectDelete(&pGMM->GiantCritSect);
845#else
846 RTSemFastMutexDestroy(pGMM->hMtx);
847#endif
848 }
849
850 pGMM->u32Magic = 0;
851 RTMemFree(pGMM);
852 SUPR0Printf("GMMR0Init: failed! rc=%d\n", rc);
853 return rc;
854}
855
856
857/**
858 * Terminates the GMM component.
859 */
860GMMR0DECL(void) GMMR0Term(void)
861{
862 LogFlow(("GMMTerm:\n"));
863
864 /*
865 * Take care / be paranoid...
866 */
867 PGMM pGMM = g_pGMM;
868 if (!VALID_PTR(pGMM))
869 return;
870 if (pGMM->u32Magic != GMM_MAGIC)
871 {
872 SUPR0Printf("GMMR0Term: u32Magic=%#x\n", pGMM->u32Magic);
873 return;
874 }
875
876 /*
877 * Undo what init did and free all the resources we've acquired.
878 */
879 /* Destroy the fundamentals. */
880 g_pGMM = NULL;
881 pGMM->u32Magic = ~GMM_MAGIC;
882#ifdef VBOX_USE_CRIT_SECT_FOR_GIANT
883 RTCritSectDelete(&pGMM->GiantCritSect);
884#else
885 RTSemFastMutexDestroy(pGMM->hMtx);
886 pGMM->hMtx = NIL_RTSEMFASTMUTEX;
887#endif
888
889 /* Free any chunks still hanging around. */
890 RTAvlU32Destroy(&pGMM->pChunks, gmmR0TermDestroyChunk, pGMM);
891
892 /* Destroy the chunk locks. */
893 for (unsigned iMtx = 0; iMtx < RT_ELEMENTS(pGMM->aChunkMtx); iMtx++)
894 {
895 Assert(pGMM->aChunkMtx[iMtx].cUsers == 0);
896 RTSemFastMutexDestroy(pGMM->aChunkMtx[iMtx].hMtx);
897 pGMM->aChunkMtx[iMtx].hMtx = NIL_RTSEMFASTMUTEX;
898 }
899
900 /* Finally the instance data itself. */
901 RTMemFree(pGMM);
902 LogFlow(("GMMTerm: done\n"));
903}
904
905
906/**
907 * RTAvlU32Destroy callback.
908 *
909 * @returns 0
910 * @param pNode The node to destroy.
911 * @param pvGMM The GMM handle.
912 */
913static DECLCALLBACK(int) gmmR0TermDestroyChunk(PAVLU32NODECORE pNode, void *pvGMM)
914{
915 PGMMCHUNK pChunk = (PGMMCHUNK)pNode;
916
917 if (pChunk->cFree != (GMM_CHUNK_SIZE >> PAGE_SHIFT))
918 SUPR0Printf("GMMR0Term: %RKv/%#x: cFree=%d cPrivate=%d cShared=%d cMappings=%d\n", pChunk,
919 pChunk->Core.Key, pChunk->cFree, pChunk->cPrivate, pChunk->cShared, pChunk->cMappingsX);
920
921 int rc = RTR0MemObjFree(pChunk->hMemObj, true /* fFreeMappings */);
922 if (RT_FAILURE(rc))
923 {
924 SUPR0Printf("GMMR0Term: %RKv/%#x: RTRMemObjFree(%RKv,true) -> %d (cMappings=%d)\n", pChunk,
925 pChunk->Core.Key, pChunk->hMemObj, rc, pChunk->cMappingsX);
926 AssertRC(rc);
927 }
928 pChunk->hMemObj = NIL_RTR0MEMOBJ;
929
930 RTMemFree(pChunk->paMappingsX);
931 pChunk->paMappingsX = NULL;
932
933 RTMemFree(pChunk);
934 NOREF(pvGMM);
935 return 0;
936}
937
938
939/**
940 * Initializes the per-VM data for the GMM.
941 *
942 * This is called from within the GVMM lock (from GVMMR0CreateVM)
943 * and should only initialize the data members so GMMR0CleanupVM
944 * can deal with them. We reserve no memory or anything here;
945 * that's done later in GMMR0InitVM.
946 *
947 * @param pGVM Pointer to the Global VM structure.
948 */
949GMMR0DECL(void) GMMR0InitPerVMData(PGVM pGVM)
950{
951 AssertCompile(RT_SIZEOFMEMB(GVM,gmm.s) <= RT_SIZEOFMEMB(GVM,gmm.padding));
952
953 pGVM->gmm.s.Stats.enmPolicy = GMMOCPOLICY_INVALID;
954 pGVM->gmm.s.Stats.enmPriority = GMMPRIORITY_INVALID;
955 pGVM->gmm.s.Stats.fMayAllocate = false;
956}
957
958
959/**
960 * Acquires the GMM giant lock.
961 *
962 * @returns Assert status code from RTSemFastMutexRequest.
963 * @param pGMM Pointer to the GMM instance.
964 */
965static int gmmR0MutexAcquire(PGMM pGMM)
966{
967 ASMAtomicIncU32(&pGMM->cMtxContenders);
968#ifdef VBOX_USE_CRIT_SECT_FOR_GIANT
969 int rc = RTCritSectEnter(&pGMM->GiantCritSect);
970#else
971 int rc = RTSemFastMutexRequest(pGMM->hMtx);
972#endif
973 ASMAtomicDecU32(&pGMM->cMtxContenders);
974 AssertRC(rc);
975#ifdef VBOX_STRICT
976 pGMM->hMtxOwner = RTThreadNativeSelf();
977#endif
978 return rc;
979}
980
981
982/**
983 * Releases the GMM giant lock.
984 *
985 * @returns Assert status code from RTSemFastMutexRelease.
986 * @param pGMM Pointer to the GMM instance.
987 */
988static int gmmR0MutexRelease(PGMM pGMM)
989{
990#ifdef VBOX_STRICT
991 pGMM->hMtxOwner = NIL_RTNATIVETHREAD;
992#endif
993#ifdef VBOX_USE_CRIT_SECT_FOR_GIANT
994 int rc = RTCritSectLeave(&pGMM->GiantCritSect);
995#else
996 int rc = RTSemFastMutexRelease(pGMM->hMtx);
997 AssertRC(rc);
998#endif
999 return rc;
1000}
1001
1002
1003/**
1004 * Yields the GMM giant lock if there is contention and a certain minimum time
1005 * has elapsed since we took it.
1006 *
1007 * @returns @c true if the mutex was yielded, @c false if not.
1008 * @param pGMM Pointer to the GMM instance.
1009 * @param puLockNanoTS Where the lock acquisition time stamp is kept
1010 * (in/out).
1011 */
1012static bool gmmR0MutexYield(PGMM pGMM, uint64_t *puLockNanoTS)
1013{
1014 /*
1015 * If nobody is contending the mutex, don't bother checking the time.
1016 */
1017 if (ASMAtomicReadU32(&pGMM->cMtxContenders) == 0)
1018 return false;
1019
1020 /*
1021 * Don't yield if we haven't executed for at least 2 milliseconds.
1022 */
1023 uint64_t uNanoNow = RTTimeSystemNanoTS();
1024 if (uNanoNow - *puLockNanoTS < UINT32_C(2000000))
1025 return false;
1026
1027 /*
1028 * Yield the mutex.
1029 */
1030#ifdef VBOX_STRICT
1031 pGMM->hMtxOwner = NIL_RTNATIVETHREAD;
1032#endif
1033 ASMAtomicIncU32(&pGMM->cMtxContenders);
1034#ifdef VBOX_USE_CRIT_SECT_FOR_GIANT
1035 int rc1 = RTCritSectLeave(&pGMM->GiantCritSect); AssertRC(rc1);
1036#else
1037 int rc1 = RTSemFastMutexRelease(pGMM->hMtx); AssertRC(rc1);
1038#endif
1039
1040 RTThreadYield();
1041
1042#ifdef VBOX_USE_CRIT_SECT_FOR_GIANT
1043 int rc2 = RTCritSectEnter(&pGMM->GiantCritSect); AssertRC(rc2);
1044#else
1045 int rc2 = RTSemFastMutexRequest(pGMM->hMtx); AssertRC(rc2);
1046#endif
1047 *puLockNanoTS = RTTimeSystemNanoTS();
1048 ASMAtomicDecU32(&pGMM->cMtxContenders);
1049#ifdef VBOX_STRICT
1050 pGMM->hMtxOwner = RTThreadNativeSelf();
1051#endif
1052
1053 return true;
1054}
1055
1056
1057/**
1058 * Acquires a chunk lock.
1059 *
1060 * The caller must own the giant lock.
1061 *
1062 * @returns Assert status code from RTSemFastMutexRequest.
1063 * @param pMtxState The chunk mutex state info. (Avoids
1064 * passing the same flags and stuff around
1065 * for subsequent release and drop-giant
1066 * calls.)
1067 * @param pGMM Pointer to the GMM instance.
1068 * @param pChunk Pointer to the chunk.
1069 * @param fFlags Flags regarding the giant lock, GMMR0CHUNK_MTX_XXX.
1070 */
1071static int gmmR0ChunkMutexAcquire(PGMMR0CHUNKMTXSTATE pMtxState, PGMM pGMM, PGMMCHUNK pChunk, uint32_t fFlags)
1072{
1073 Assert(fFlags > GMMR0CHUNK_MTX_INVALID && fFlags < GMMR0CHUNK_MTX_END);
1074 Assert(pGMM->hMtxOwner == RTThreadNativeSelf());
1075
1076 pMtxState->pGMM = pGMM;
1077 pMtxState->fFlags = (uint8_t)fFlags;
1078
1079 /*
1080 * Get the lock index and reference the lock.
1081 */
1082 Assert(pGMM->hMtxOwner == RTThreadNativeSelf());
1083 uint32_t iChunkMtx = pChunk->iChunkMtx;
1084 if (iChunkMtx == UINT8_MAX)
1085 {
1086 iChunkMtx = pGMM->iNextChunkMtx++;
1087 iChunkMtx %= RT_ELEMENTS(pGMM->aChunkMtx);
1088
1089 /* Try get an unused one... */
1090 if (pGMM->aChunkMtx[iChunkMtx].cUsers)
1091 {
1092 iChunkMtx = pGMM->iNextChunkMtx++;
1093 iChunkMtx %= RT_ELEMENTS(pGMM->aChunkMtx);
1094 if (pGMM->aChunkMtx[iChunkMtx].cUsers)
1095 {
1096 iChunkMtx = pGMM->iNextChunkMtx++;
1097 iChunkMtx %= RT_ELEMENTS(pGMM->aChunkMtx);
1098 if (pGMM->aChunkMtx[iChunkMtx].cUsers)
1099 {
1100 iChunkMtx = pGMM->iNextChunkMtx++;
1101 iChunkMtx %= RT_ELEMENTS(pGMM->aChunkMtx);
1102 }
1103 }
1104 }
1105
1106 pChunk->iChunkMtx = iChunkMtx;
1107 }
1108 AssertCompile(RT_ELEMENTS(pGMM->aChunkMtx) < UINT8_MAX);
1109 pMtxState->iChunkMtx = (uint8_t)iChunkMtx;
1110 ASMAtomicIncU32(&pGMM->aChunkMtx[iChunkMtx].cUsers);
1111
1112 /*
1113 * Drop the giant?
1114 */
1115 if (fFlags != GMMR0CHUNK_MTX_KEEP_GIANT)
1116 {
1117 /** @todo GMM life cycle cleanup (we may race someone
1118 * destroying and cleaning up GMM)? */
1119 gmmR0MutexRelease(pGMM);
1120 }
1121
1122 /*
1123 * Take the chunk mutex.
1124 */
1125 int rc = RTSemFastMutexRequest(pGMM->aChunkMtx[iChunkMtx].hMtx);
1126 AssertRC(rc);
1127 return rc;
1128}
1129
1130
1131/**
1132 * Releases a chunk lock and, if requested, reacquires the giant GMM lock.
1133 *
1134 * @returns Assert status code from RTSemFastMutexRequest.
1135 * @param pMtxState Pointer to the chunk mutex state.
1136 * @param pChunk Pointer to the chunk if it's still
1137 * alive, NULL if it isn't. This is used to deassociate
1138 * the chunk from the mutex on the way out so a new one
1139 * can be selected next time, thus avoiding contended
1140 * mutexes.
1141 */
1142static int gmmR0ChunkMutexRelease(PGMMR0CHUNKMTXSTATE pMtxState, PGMMCHUNK pChunk)
1143{
1144 PGMM pGMM = pMtxState->pGMM;
1145
1146 /*
1147 * Release the chunk mutex and reacquire the giant if requested.
1148 */
1149 int rc = RTSemFastMutexRelease(pGMM->aChunkMtx[pMtxState->iChunkMtx].hMtx);
1150 AssertRC(rc);
1151 if (pMtxState->fFlags == GMMR0CHUNK_MTX_RETAKE_GIANT)
1152 rc = gmmR0MutexAcquire(pGMM);
1153 else
1154 Assert((pMtxState->fFlags != GMMR0CHUNK_MTX_DROP_GIANT) == (pGMM->hMtxOwner == RTThreadNativeSelf()));
1155
1156 /*
1157 * Drop the chunk mutex user reference and deassociate it from the chunk
1158 * when possible.
1159 */
1160 if ( ASMAtomicDecU32(&pGMM->aChunkMtx[pMtxState->iChunkMtx].cUsers) == 0
1161 && pChunk
1162 && RT_SUCCESS(rc) )
1163 {
1164 if (pMtxState->fFlags != GMMR0CHUNK_MTX_DROP_GIANT)
1165 pChunk->iChunkMtx = UINT8_MAX;
1166 else
1167 {
1168 rc = gmmR0MutexAcquire(pGMM);
1169 if (RT_SUCCESS(rc))
1170 {
1171 if (pGMM->aChunkMtx[pMtxState->iChunkMtx].cUsers == 0)
1172 pChunk->iChunkMtx = UINT8_MAX;
1173 rc = gmmR0MutexRelease(pGMM);
1174 }
1175 }
1176 }
1177
1178 pMtxState->pGMM = NULL;
1179 return rc;
1180}
1181
1182
1183/**
1184 * Drops the giant GMM lock we kept in gmmR0ChunkMutexAcquire while keeping the
1185 * chunk locked.
1186 *
1187 * This only works if gmmR0ChunkMutexAcquire was called with
1188 * GMMR0CHUNK_MTX_KEEP_GIANT. gmmR0ChunkMutexRelease will retake the giant
1189 * mutex, i.e. behave as if GMMR0CHUNK_MTX_RETAKE_GIANT was used.
1190 *
1191 * @returns VBox status code (assuming success is ok).
1192 * @param pMtxState Pointer to the chunk mutex state.
1193 */
1194static int gmmR0ChunkMutexDropGiant(PGMMR0CHUNKMTXSTATE pMtxState)
1195{
1196 AssertReturn(pMtxState->fFlags == GMMR0CHUNK_MTX_KEEP_GIANT, VERR_GMM_MTX_FLAGS);
1197 Assert(pMtxState->pGMM->hMtxOwner == RTThreadNativeSelf());
1198 pMtxState->fFlags = GMMR0CHUNK_MTX_RETAKE_GIANT;
1199 /** @todo GMM life cycle cleanup (we may race someone
1200 * destroying and cleaning up GMM)? */
1201 return gmmR0MutexRelease(pMtxState->pGMM);
1202}
1203
1204
1205/**
1206 * For experimenting with NUMA affinity and such.
1207 *
1208 * @returns The current NUMA Node ID.
1209 */
1210static uint16_t gmmR0GetCurrentNumaNodeId(void)
1211{
1212#if 1
1213 return GMM_CHUNK_NUMA_ID_UNKNOWN;
1214#else
1215 return RTMpCpuId() / 16;
1216#endif
1217}
1218
1219
1220
1221/**
1222 * Cleans up when a VM is terminating.
1223 *
1224 * @param pGVM Pointer to the Global VM structure.
1225 */
1226GMMR0DECL(void) GMMR0CleanupVM(PGVM pGVM)
1227{
1228 LogFlow(("GMMR0CleanupVM: pGVM=%p:{.pVM=%p, .hSelf=%#x}\n", pGVM, pGVM->pVM, pGVM->hSelf));
1229
1230 PGMM pGMM;
1231 GMM_GET_VALID_INSTANCE_VOID(pGMM);
1232
1233#ifdef VBOX_WITH_PAGE_SHARING
1234 /*
1235 * Clean up all registered shared modules first.
1236 */
1237 gmmR0SharedModuleCleanup(pGMM, pGVM);
1238#endif
1239
1240 gmmR0MutexAcquire(pGMM);
1241 uint64_t uLockNanoTS = RTTimeSystemNanoTS();
1242 GMM_CHECK_SANITY_UPON_ENTERING(pGMM);
1243
1244 /*
1245 * The policy is 'INVALID' until the initial reservation
1246 * request has been serviced.
1247 */
1248 if ( pGVM->gmm.s.Stats.enmPolicy > GMMOCPOLICY_INVALID
1249 && pGVM->gmm.s.Stats.enmPolicy < GMMOCPOLICY_END)
1250 {
1251 /*
1252 * If it's the last VM around, we can skip walking all the chunks looking
1253 * for the pages owned by this VM and instead flush the whole shebang.
1254 *
1255 * This takes care of the eventuality that a VM has left shared page
1256 * references behind (shouldn't happen of course, but you never know).
1257 */
1258 Assert(pGMM->cRegisteredVMs);
1259 pGMM->cRegisteredVMs--;
1260
1261 /*
1262 * Walk the entire pool looking for pages that belong to this VM
1263 * and leftover mappings. (This'll only catch private pages,
1264 * shared pages will be 'left behind'.)
1265 */
1266 /** @todo r=bird: This scanning+freeing could be optimized in bound mode! */
1267 uint64_t cPrivatePages = pGVM->gmm.s.Stats.cPrivatePages; /* save */
1268
1269 unsigned iCountDown = 64;
1270 bool fRedoFromStart;
1271 PGMMCHUNK pChunk;
1272 do
1273 {
1274 fRedoFromStart = false;
1275 RTListForEachReverse(&pGMM->ChunkList, pChunk, GMMCHUNK, ListNode)
1276 {
1277 uint32_t const cFreeChunksOld = pGMM->cFreedChunks;
1278 if ( ( !pGMM->fBoundMemoryMode
1279 || pChunk->hGVM == pGVM->hSelf)
1280 && gmmR0CleanupVMScanChunk(pGMM, pGVM, pChunk))
1281 {
1282 /* We left the giant mutex, so reset the yield counters. */
1283 uLockNanoTS = RTTimeSystemNanoTS();
1284 iCountDown = 64;
1285 }
1286 else
1287 {
1288 /* Didn't leave it, so do normal yielding. */
1289 if (!iCountDown)
1290 gmmR0MutexYield(pGMM, &uLockNanoTS);
1291 else
1292 iCountDown--;
1293 }
1294 if (pGMM->cFreedChunks != cFreeChunksOld)
1295 {
1296 fRedoFromStart = true;
1297 break;
1298 }
1299 }
1300 } while (fRedoFromStart);
1301
1302 if (pGVM->gmm.s.Stats.cPrivatePages)
1303 SUPR0Printf("GMMR0CleanupVM: hGVM=%#x has %#x private pages that cannot be found!\n", pGVM->hSelf, pGVM->gmm.s.Stats.cPrivatePages);
1304
1305 pGMM->cAllocatedPages -= cPrivatePages;
1306
1307 /*
1308 * Free empty chunks.
1309 */
1310 PGMMCHUNKFREESET pPrivateSet = pGMM->fBoundMemoryMode ? &pGVM->gmm.s.Private : &pGMM->PrivateX;
1311 do
1312 {
1313 fRedoFromStart = false;
1314 iCountDown = 10240;
1315 pChunk = pPrivateSet->apLists[GMM_CHUNK_FREE_SET_UNUSED_LIST];
1316 while (pChunk)
1317 {
1318 PGMMCHUNK pNext = pChunk->pFreeNext;
1319 Assert(pChunk->cFree == GMM_CHUNK_NUM_PAGES);
1320 if ( !pGMM->fBoundMemoryMode
1321 || pChunk->hGVM == pGVM->hSelf)
1322 {
1323 uint64_t const idGenerationOld = pPrivateSet->idGeneration;
1324 if (gmmR0FreeChunk(pGMM, pGVM, pChunk, true /*fRelaxedSem*/))
1325 {
1326 /* We've left the giant mutex, restart? (+1 for our unlink) */
1327 fRedoFromStart = pPrivateSet->idGeneration != idGenerationOld + 1;
1328 if (fRedoFromStart)
1329 break;
1330 uLockNanoTS = RTTimeSystemNanoTS();
1331 iCountDown = 10240;
1332 }
1333 }
1334
1335 /* Advance and maybe yield the lock. */
1336 pChunk = pNext;
1337 if (--iCountDown == 0)
1338 {
1339 uint64_t const idGenerationOld = pPrivateSet->idGeneration;
1340 fRedoFromStart = gmmR0MutexYield(pGMM, &uLockNanoTS)
1341 && pPrivateSet->idGeneration != idGenerationOld;
1342 if (fRedoFromStart)
1343 break;
1344 iCountDown = 10240;
1345 }
1346 }
1347 } while (fRedoFromStart);
1348
1349 /*
1350 * Account for shared pages that weren't freed.
1351 */
1352 if (pGVM->gmm.s.Stats.cSharedPages)
1353 {
1354 Assert(pGMM->cSharedPages >= pGVM->gmm.s.Stats.cSharedPages);
1355 SUPR0Printf("GMMR0CleanupVM: hGVM=%#x left %#x shared pages behind!\n", pGVM->hSelf, pGVM->gmm.s.Stats.cSharedPages);
1356 pGMM->cLeftBehindSharedPages += pGVM->gmm.s.Stats.cSharedPages;
1357 }
1358
1359 /*
1360 * Clean up balloon statistics in case the VM process crashed.
1361 */
1362 Assert(pGMM->cBalloonedPages >= pGVM->gmm.s.Stats.cBalloonedPages);
1363 pGMM->cBalloonedPages -= pGVM->gmm.s.Stats.cBalloonedPages;
1364
1365 /*
1366 * Update the over-commitment management statistics.
1367 */
1368 pGMM->cReservedPages -= pGVM->gmm.s.Stats.Reserved.cBasePages
1369 + pGVM->gmm.s.Stats.Reserved.cFixedPages
1370 + pGVM->gmm.s.Stats.Reserved.cShadowPages;
1371 switch (pGVM->gmm.s.Stats.enmPolicy)
1372 {
1373 case GMMOCPOLICY_NO_OC:
1374 break;
1375 default:
1376 /** @todo Update GMM->cOverCommittedPages */
1377 break;
1378 }
1379 }
1380
1381 /* zap the GVM data. */
1382 pGVM->gmm.s.Stats.enmPolicy = GMMOCPOLICY_INVALID;
1383 pGVM->gmm.s.Stats.enmPriority = GMMPRIORITY_INVALID;
1384 pGVM->gmm.s.Stats.fMayAllocate = false;
1385
1386 GMM_CHECK_SANITY_UPON_LEAVING(pGMM);
1387 gmmR0MutexRelease(pGMM);
1388
1389 LogFlow(("GMMR0CleanupVM: returns\n"));
1390}
1391
1392
1393/**
1394 * Scan one chunk for private pages belonging to the specified VM.
1395 *
1396 * @note This function may drop the giant mutex!
1397 *
1398 * @returns @c true if we've temporarily dropped the giant mutex, @c false if
1399 * we didn't.
1400 * @param pGMM Pointer to the GMM instance.
1401 * @param pGVM The global VM handle.
1402 * @param pChunk The chunk to scan.
1403 */
1404static bool gmmR0CleanupVMScanChunk(PGMM pGMM, PGVM pGVM, PGMMCHUNK pChunk)
1405{
1406 Assert(!pGMM->fBoundMemoryMode || pChunk->hGVM == pGVM->hSelf);
1407
1408 /*
1409 * Look for pages belonging to the VM.
1410 * (Perform some internal checks while we're scanning.)
1411 */
1412#ifndef VBOX_STRICT
1413 if (pChunk->cFree != (GMM_CHUNK_SIZE >> PAGE_SHIFT))
1414#endif
1415 {
1416 unsigned cPrivate = 0;
1417 unsigned cShared = 0;
1418 unsigned cFree = 0;
1419
1420 gmmR0UnlinkChunk(pChunk); /* avoiding cFreePages updates. */
1421
1422 uint16_t hGVM = pGVM->hSelf;
1423 unsigned iPage = (GMM_CHUNK_SIZE >> PAGE_SHIFT);
1424 while (iPage-- > 0)
1425 if (GMM_PAGE_IS_PRIVATE(&pChunk->aPages[iPage]))
1426 {
1427 if (pChunk->aPages[iPage].Private.hGVM == hGVM)
1428 {
1429 /*
1430 * Free the page.
1431 *
1432 * The reason for not using gmmR0FreePrivatePage here is that we
1433 * must *not* cause the chunk to be freed from under us - we're in
1434 * an AVL tree walk here.
1435 */
1436 pChunk->aPages[iPage].u = 0;
1437 pChunk->aPages[iPage].Free.iNext = pChunk->iFreeHead;
1438 pChunk->aPages[iPage].Free.u2State = GMM_PAGE_STATE_FREE;
1439 pChunk->iFreeHead = iPage;
1440 pChunk->cPrivate--;
1441 pChunk->cFree++;
1442 pGVM->gmm.s.Stats.cPrivatePages--;
1443 cFree++;
1444 }
1445 else
1446 cPrivate++;
1447 }
1448 else if (GMM_PAGE_IS_FREE(&pChunk->aPages[iPage]))
1449 cFree++;
1450 else
1451 cShared++;
1452
1453 gmmR0SelectSetAndLinkChunk(pGMM, pGVM, pChunk);
1454
1455 /*
1456 * Did it add up?
1457 */
1458 if (RT_UNLIKELY( pChunk->cFree != cFree
1459 || pChunk->cPrivate != cPrivate
1460 || pChunk->cShared != cShared))
1461 {
1462 SUPR0Printf("gmmR0CleanupVMScanChunk: Chunk %RKv/%#x has bogus stats - free=%d/%d private=%d/%d shared=%d/%d\n",
1463 pChunk, pChunk->Core.Key, pChunk->cFree, cFree, pChunk->cPrivate, cPrivate, pChunk->cShared, cShared);
1464 pChunk->cFree = cFree;
1465 pChunk->cPrivate = cPrivate;
1466 pChunk->cShared = cShared;
1467 }
1468 }
1469
1470 /*
1471 * If not in bound memory mode, we should reset the hGVM field
1472 * if it has our handle in it.
1473 */
1474 if (pChunk->hGVM == pGVM->hSelf)
1475 {
1476 if (!g_pGMM->fBoundMemoryMode)
1477 pChunk->hGVM = NIL_GVM_HANDLE;
1478 else if (pChunk->cFree != GMM_CHUNK_NUM_PAGES)
1479 {
1480 SUPR0Printf("gmmR0CleanupVMScanChunk: %RKv/%#x: cFree=%#x - it should be 0 in bound mode!\n",
1481 pChunk, pChunk->Core.Key, pChunk->cFree);
1482 AssertMsgFailed(("%p/%#x: cFree=%#x - it should be 0 in bound mode!\n", pChunk, pChunk->Core.Key, pChunk->cFree));
1483
1484 gmmR0UnlinkChunk(pChunk);
1485 pChunk->cFree = GMM_CHUNK_NUM_PAGES;
1486 gmmR0SelectSetAndLinkChunk(pGMM, pGVM, pChunk);
1487 }
1488 }
1489
1490 /*
1491 * Look for a mapping belonging to the terminating VM.
1492 */
1493 GMMR0CHUNKMTXSTATE MtxState;
1494 gmmR0ChunkMutexAcquire(&MtxState, pGMM, pChunk, GMMR0CHUNK_MTX_KEEP_GIANT);
1495 unsigned cMappings = pChunk->cMappingsX;
1496 for (unsigned i = 0; i < cMappings; i++)
1497 if (pChunk->paMappingsX[i].pGVM == pGVM)
1498 {
1499 gmmR0ChunkMutexDropGiant(&MtxState);
1500
1501 RTR0MEMOBJ hMemObj = pChunk->paMappingsX[i].hMapObj;
1502
1503 cMappings--;
1504 if (i < cMappings)
1505 pChunk->paMappingsX[i] = pChunk->paMappingsX[cMappings];
1506 pChunk->paMappingsX[cMappings].pGVM = NULL;
1507 pChunk->paMappingsX[cMappings].hMapObj = NIL_RTR0MEMOBJ;
1508 Assert(pChunk->cMappingsX - 1U == cMappings);
1509 pChunk->cMappingsX = cMappings;
1510
1511 int rc = RTR0MemObjFree(hMemObj, false /* fFreeMappings (NA) */);
1512 if (RT_FAILURE(rc))
1513 {
1514 SUPR0Printf("gmmR0CleanupVMScanChunk: %RKv/%#x: mapping #%x: RTRMemObjFree(%RKv,false) -> %d \n",
1515 pChunk, pChunk->Core.Key, i, hMemObj, rc);
1516 AssertRC(rc);
1517 }
1518
1519 gmmR0ChunkMutexRelease(&MtxState, pChunk);
1520 return true;
1521 }
1522
1523 gmmR0ChunkMutexRelease(&MtxState, pChunk);
1524 return false;
1525}
1526
1527
1528/**
1529 * The initial resource reservations.
1530 *
1531 * This will make memory reservations according to policy and priority. If there aren't
1532 * sufficient resources available to sustain the VM this function will fail and all
1533 * future allocation requests will fail as well.
1534 *
1535 * These are just the initial reservations made very early during the VM creation
1536 * process and will be adjusted later in the GMMR0UpdateReservation call after the
1537 * ring-3 init has completed.
1538 *
1539 * @returns VBox status code.
1540 * @retval VERR_GMM_MEMORY_RESERVATION_DECLINED
1541 * @retval VERR_GMM_
1542 *
1543 * @param pGVM The global (ring-0) VM structure.
1544 * @param pVM The cross context VM structure.
1545 * @param idCpu The VCPU id - must be zero.
1546 * @param cBasePages The number of pages that may be allocated for the base RAM and ROMs.
1547 * This does not include MMIO2 and similar.
1548 * @param cShadowPages The number of pages that may be allocated for shadow paging structures.
1549 * @param cFixedPages The number of pages that may be allocated for fixed objects like the
1550 * hyper heap, MMIO2 and similar.
1551 * @param enmPolicy The OC policy to use on this VM.
1552 * @param enmPriority The priority in an out-of-memory situation.
1553 *
1554 * @thread The creator thread / EMT(0).
1555 */
1556GMMR0DECL(int) GMMR0InitialReservation(PGVM pGVM, PVM pVM, VMCPUID idCpu, uint64_t cBasePages, uint32_t cShadowPages,
1557 uint32_t cFixedPages, GMMOCPOLICY enmPolicy, GMMPRIORITY enmPriority)
1558{
1559 LogFlow(("GMMR0InitialReservation: pGVM=%p pVM=%p cBasePages=%#llx cShadowPages=%#x cFixedPages=%#x enmPolicy=%d enmPriority=%d\n",
1560 pGVM, pVM, cBasePages, cShadowPages, cFixedPages, enmPolicy, enmPriority));
1561
1562 /*
1563 * Validate, get basics and take the semaphore.
1564 */
1565 AssertReturn(idCpu == 0, VERR_INVALID_CPU_ID);
1566 PGMM pGMM;
1567 GMM_GET_VALID_INSTANCE(pGMM, VERR_GMM_INSTANCE);
1568 int rc = GVMMR0ValidateGVMandVMandEMT(pGVM, pVM, idCpu);
1569 if (RT_FAILURE(rc))
1570 return rc;
1571
1572 AssertReturn(cBasePages, VERR_INVALID_PARAMETER);
1573 AssertReturn(cShadowPages, VERR_INVALID_PARAMETER);
1574 AssertReturn(cFixedPages, VERR_INVALID_PARAMETER);
1575 AssertReturn(enmPolicy > GMMOCPOLICY_INVALID && enmPolicy < GMMOCPOLICY_END, VERR_INVALID_PARAMETER);
1576 AssertReturn(enmPriority > GMMPRIORITY_INVALID && enmPriority < GMMPRIORITY_END, VERR_INVALID_PARAMETER);
1577
1578 gmmR0MutexAcquire(pGMM);
1579 if (GMM_CHECK_SANITY_UPON_ENTERING(pGMM))
1580 {
1581 if ( !pGVM->gmm.s.Stats.Reserved.cBasePages
1582 && !pGVM->gmm.s.Stats.Reserved.cFixedPages
1583 && !pGVM->gmm.s.Stats.Reserved.cShadowPages)
1584 {
1585 /*
1586 * Check if we can accommodate this.
1587 */
1588 /* ... later ... */
1589 if (RT_SUCCESS(rc))
1590 {
1591 /*
1592 * Update the records.
1593 */
1594 pGVM->gmm.s.Stats.Reserved.cBasePages = cBasePages;
1595 pGVM->gmm.s.Stats.Reserved.cFixedPages = cFixedPages;
1596 pGVM->gmm.s.Stats.Reserved.cShadowPages = cShadowPages;
1597 pGVM->gmm.s.Stats.enmPolicy = enmPolicy;
1598 pGVM->gmm.s.Stats.enmPriority = enmPriority;
1599 pGVM->gmm.s.Stats.fMayAllocate = true;
1600
1601 pGMM->cReservedPages += cBasePages + cFixedPages + cShadowPages;
1602 pGMM->cRegisteredVMs++;
1603 }
1604 }
1605 else
1606 rc = VERR_WRONG_ORDER;
1607 GMM_CHECK_SANITY_UPON_LEAVING(pGMM);
1608 }
1609 else
1610 rc = VERR_GMM_IS_NOT_SANE;
1611 gmmR0MutexRelease(pGMM);
1612 LogFlow(("GMMR0InitialReservation: returns %Rrc\n", rc));
1613 return rc;
1614}
1615
1616
1617/**
1618 * VMMR0 request wrapper for GMMR0InitialReservation.
1619 *
1620 * @returns see GMMR0InitialReservation.
1621 * @param pGVM The global (ring-0) VM structure.
1622 * @param pVM The cross context VM structure.
1623 * @param idCpu The VCPU id.
1624 * @param pReq Pointer to the request packet.
1625 */
1626GMMR0DECL(int) GMMR0InitialReservationReq(PGVM pGVM, PVM pVM, VMCPUID idCpu, PGMMINITIALRESERVATIONREQ pReq)
1627{
1628 /*
1629 * Validate input and pass it on.
1630 */
1631 AssertPtrReturn(pVM, VERR_INVALID_POINTER);
1632 AssertPtrReturn(pReq, VERR_INVALID_POINTER);
1633 AssertMsgReturn(pReq->Hdr.cbReq == sizeof(*pReq), ("%#x != %#x\n", pReq->Hdr.cbReq, sizeof(*pReq)), VERR_INVALID_PARAMETER);
1634
1635 return GMMR0InitialReservation(pGVM, pVM, idCpu, pReq->cBasePages, pReq->cShadowPages,
1636 pReq->cFixedPages, pReq->enmPolicy, pReq->enmPriority);
1637}
1638
1639
1640/**
1641 * This updates the memory reservation with the additional MMIO2 and ROM pages.
1642 *
1643 * @returns VBox status code.
1644 * @retval VERR_GMM_MEMORY_RESERVATION_DECLINED
1645 *
1646 * @param pGVM The global (ring-0) VM structure.
1647 * @param pVM The cross context VM structure.
1648 * @param idCpu The VCPU id.
1649 * @param cBasePages The number of pages that may be allocated for the base RAM and ROMs.
1650 * This does not include MMIO2 and similar.
1651 * @param cShadowPages The number of pages that may be allocated for shadow paging structures.
1652 * @param cFixedPages The number of pages that may be allocated for fixed objects like the
1653 * hyper heap, MMIO2 and similar.
1654 *
1655 * @thread EMT(idCpu)
1656 */
1657GMMR0DECL(int) GMMR0UpdateReservation(PGVM pGVM, PVM pVM, VMCPUID idCpu, uint64_t cBasePages,
1658 uint32_t cShadowPages, uint32_t cFixedPages)
1659{
1660 LogFlow(("GMMR0UpdateReservation: pGVM=%p pVM=%p cBasePages=%#llx cShadowPages=%#x cFixedPages=%#x\n",
1661 pGVM, pVM, cBasePages, cShadowPages, cFixedPages));
1662
1663 /*
1664 * Validate, get basics and take the semaphore.
1665 */
1666 PGMM pGMM;
1667 GMM_GET_VALID_INSTANCE(pGMM, VERR_GMM_INSTANCE);
1668 int rc = GVMMR0ValidateGVMandVMandEMT(pGVM, pVM, idCpu);
1669 if (RT_FAILURE(rc))
1670 return rc;
1671
1672 AssertReturn(cBasePages, VERR_INVALID_PARAMETER);
1673 AssertReturn(cShadowPages, VERR_INVALID_PARAMETER);
1674 AssertReturn(cFixedPages, VERR_INVALID_PARAMETER);
1675
1676 gmmR0MutexAcquire(pGMM);
1677 if (GMM_CHECK_SANITY_UPON_ENTERING(pGMM))
1678 {
1679 if ( pGVM->gmm.s.Stats.Reserved.cBasePages
1680 && pGVM->gmm.s.Stats.Reserved.cFixedPages
1681 && pGVM->gmm.s.Stats.Reserved.cShadowPages)
1682 {
1683 /*
1684 * Check if we can accommodate this.
1685 */
1686 /* ... later ... */
1687 if (RT_SUCCESS(rc))
1688 {
1689 /*
1690 * Update the records.
1691 */
1692 pGMM->cReservedPages -= pGVM->gmm.s.Stats.Reserved.cBasePages
1693 + pGVM->gmm.s.Stats.Reserved.cFixedPages
1694 + pGVM->gmm.s.Stats.Reserved.cShadowPages;
1695 pGMM->cReservedPages += cBasePages + cFixedPages + cShadowPages;
1696
1697 pGVM->gmm.s.Stats.Reserved.cBasePages = cBasePages;
1698 pGVM->gmm.s.Stats.Reserved.cFixedPages = cFixedPages;
1699 pGVM->gmm.s.Stats.Reserved.cShadowPages = cShadowPages;
1700 }
1701 }
1702 else
1703 rc = VERR_WRONG_ORDER;
1704 GMM_CHECK_SANITY_UPON_LEAVING(pGMM);
1705 }
1706 else
1707 rc = VERR_GMM_IS_NOT_SANE;
1708 gmmR0MutexRelease(pGMM);
1709 LogFlow(("GMMR0UpdateReservation: returns %Rrc\n", rc));
1710 return rc;
1711}
1712
1713
1714/**
1715 * VMMR0 request wrapper for GMMR0UpdateReservation.
1716 *
1717 * @returns see GMMR0UpdateReservation.
1718 * @param pGVM The global (ring-0) VM structure.
1719 * @param pVM The cross context VM structure.
1720 * @param idCpu The VCPU id.
1721 * @param pReq Pointer to the request packet.
1722 */
1723GMMR0DECL(int) GMMR0UpdateReservationReq(PGVM pGVM, PVM pVM, VMCPUID idCpu, PGMMUPDATERESERVATIONREQ pReq)
1724{
1725 /*
1726 * Validate input and pass it on.
1727 */
1728 AssertPtrReturn(pVM, VERR_INVALID_POINTER);
1729 AssertPtrReturn(pReq, VERR_INVALID_POINTER);
1730 AssertMsgReturn(pReq->Hdr.cbReq == sizeof(*pReq), ("%#x != %#x\n", pReq->Hdr.cbReq, sizeof(*pReq)), VERR_INVALID_PARAMETER);
1731
1732 return GMMR0UpdateReservation(pGVM, pVM, idCpu, pReq->cBasePages, pReq->cShadowPages, pReq->cFixedPages);
1733}
1734
1735#ifdef GMMR0_WITH_SANITY_CHECK
1736
1737/**
1738 * Performs sanity checks on a free set.
1739 *
1740 * @returns Error count.
1741 *
1742 * @param pGMM Pointer to the GMM instance.
1743 * @param pSet Pointer to the set.
1744 * @param pszSetName The set name.
1745 * @param pszFunction The function from which it was called.
1746 * @param uLineNo The line number.
1747 */
1748static uint32_t gmmR0SanityCheckSet(PGMM pGMM, PGMMCHUNKFREESET pSet, const char *pszSetName,
1749 const char *pszFunction, unsigned uLineNo)
1750{
1751 uint32_t cErrors = 0;
1752
1753 /*
1754 * Count the free pages in all the chunks and match it against pSet->cFreePages.
1755 */
1756 uint32_t cPages = 0;
1757 for (unsigned i = 0; i < RT_ELEMENTS(pSet->apLists); i++)
1758 {
1759 for (PGMMCHUNK pCur = pSet->apLists[i]; pCur; pCur = pCur->pFreeNext)
1760 {
1761 /** @todo check that the chunk is hashed into the right set. */
1762 cPages += pCur->cFree;
1763 }
1764 }
1765 if (RT_UNLIKELY(cPages != pSet->cFreePages))
1766 {
1767 SUPR0Printf("GMM insanity: found %#x pages in the %s set, expected %#x. (%s, line %u)\n",
1768 cPages, pszSetName, pSet->cFreePages, pszFunction, uLineNo);
1769 cErrors++;
1770 }
1771
1772 return cErrors;
1773}
1774
1775
1776/**
1777 * Performs some sanity checks on the GMM while owning the lock.
1778 *
1779 * @returns Error count.
1780 *
1781 * @param pGMM Pointer to the GMM instance.
1782 * @param pszFunction The function from which it is called.
1783 * @param uLineNo The line number.
1784 */
1785static uint32_t gmmR0SanityCheck(PGMM pGMM, const char *pszFunction, unsigned uLineNo)
1786{
1787 uint32_t cErrors = 0;
1788
1789 cErrors += gmmR0SanityCheckSet(pGMM, &pGMM->PrivateX, "private", pszFunction, uLineNo);
1790 cErrors += gmmR0SanityCheckSet(pGMM, &pGMM->Shared, "shared", pszFunction, uLineNo);
1791 /** @todo add more sanity checks. */
1792
1793 return cErrors;
1794}
1795
1796#endif /* GMMR0_WITH_SANITY_CHECK */
1797
1798/**
1799 * Looks up a chunk in the tree and fills in the TLB entry for it.
1800 *
1801 * This is not expected to fail and will bitch if it does.
1802 *
1803 * @returns Pointer to the allocation chunk, NULL if not found.
1804 * @param pGMM Pointer to the GMM instance.
1805 * @param idChunk The ID of the chunk to find.
1806 * @param pTlbe Pointer to the TLB entry.
1807 */
1808static PGMMCHUNK gmmR0GetChunkSlow(PGMM pGMM, uint32_t idChunk, PGMMCHUNKTLBE pTlbe)
1809{
1810 PGMMCHUNK pChunk = (PGMMCHUNK)RTAvlU32Get(&pGMM->pChunks, idChunk);
1811 AssertMsgReturn(pChunk, ("Chunk %#x not found!\n", idChunk), NULL);
1812 pTlbe->idChunk = idChunk;
1813 pTlbe->pChunk = pChunk;
1814 return pChunk;
1815}
1816
1817
1818/**
1819 * Finds an allocation chunk.
1820 *
1821 * This is not expected to fail and will bitch if it does.
1822 *
1823 * @returns Pointer to the allocation chunk, NULL if not found.
1824 * @param pGMM Pointer to the GMM instance.
1825 * @param idChunk The ID of the chunk to find.
1826 */
1827DECLINLINE(PGMMCHUNK) gmmR0GetChunk(PGMM pGMM, uint32_t idChunk)
1828{
1829 /*
1830 * Do a TLB lookup, branch if not in the TLB.
1831 */
1832 PGMMCHUNKTLBE pTlbe = &pGMM->ChunkTLB.aEntries[GMM_CHUNKTLB_IDX(idChunk)];
1833 if ( pTlbe->idChunk != idChunk
1834 || !pTlbe->pChunk)
1835 return gmmR0GetChunkSlow(pGMM, idChunk, pTlbe);
1836 return pTlbe->pChunk;
1837}
1838
1839
1840/**
1841 * Finds a page.
1842 *
1843 * This is not expected to fail and will bitch if it does.
1844 *
1845 * @returns Pointer to the page, NULL if not found.
1846 * @param pGMM Pointer to the GMM instance.
1847 * @param idPage The ID of the page to find.
1848 */
1849DECLINLINE(PGMMPAGE) gmmR0GetPage(PGMM pGMM, uint32_t idPage)
1850{
1851 PGMMCHUNK pChunk = gmmR0GetChunk(pGMM, idPage >> GMM_CHUNKID_SHIFT);
1852 if (RT_LIKELY(pChunk))
1853 return &pChunk->aPages[idPage & GMM_PAGEID_IDX_MASK];
1854 return NULL;
1855}
1856
1857
1858#if 0 /* unused */
1859/**
1860 * Gets the host physical address for a page given by its ID.
1861 *
1862 * @returns The host physical address or NIL_RTHCPHYS.
1863 * @param pGMM Pointer to the GMM instance.
1864 * @param idPage The ID of the page to find.
1865 */
1866DECLINLINE(RTHCPHYS) gmmR0GetPageHCPhys(PGMM pGMM, uint32_t idPage)
1867{
1868 PGMMCHUNK pChunk = gmmR0GetChunk(pGMM, idPage >> GMM_CHUNKID_SHIFT);
1869 if (RT_LIKELY(pChunk))
1870 return RTR0MemObjGetPagePhysAddr(pChunk->hMemObj, idPage & GMM_PAGEID_IDX_MASK);
1871 return NIL_RTHCPHYS;
1872}
1873#endif /* unused */
1874
1875
1876/**
1877 * Selects the appropriate free list given the number of free pages.
1878 *
1879 * @returns Free list index.
1880 * @param cFree The number of free pages in the chunk.
1881 */
1882DECLINLINE(unsigned) gmmR0SelectFreeSetList(unsigned cFree)
1883{
1884 unsigned iList = cFree >> GMM_CHUNK_FREE_SET_SHIFT;
1885 AssertMsg(iList < RT_SIZEOFMEMB(GMMCHUNKFREESET, apLists) / RT_SIZEOFMEMB(GMMCHUNKFREESET, apLists[0]),
1886 ("%d (%u)\n", iList, cFree));
1887 return iList;
1888}
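
/* A small worked illustration of the bucketing done by gmmR0SelectFreeSetList,
 * assuming for illustration that GMM_CHUNK_FREE_SET_SHIFT is 4, i.e. buckets
 * of 16 free pages each:
 *
 * @code
 *  cFree =   3  ->  iList =   3 >> 4 =  0   (nearly full chunk)
 *  cFree =  37  ->  iList =  37 >> 4 =  2
 *  cFree = 500  ->  iList = 500 >> 4 = 31   (nearly empty chunk)
 * @endcode
 */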
1889
1890
1891/**
1892 * Unlinks the chunk from the free list it's currently on (if any).
1893 *
1894 * @param pChunk The allocation chunk.
1895 */
1896DECLINLINE(void) gmmR0UnlinkChunk(PGMMCHUNK pChunk)
1897{
1898 PGMMCHUNKFREESET pSet = pChunk->pSet;
1899 if (RT_LIKELY(pSet))
1900 {
1901 pSet->cFreePages -= pChunk->cFree;
1902 pSet->idGeneration++;
1903
1904 PGMMCHUNK pPrev = pChunk->pFreePrev;
1905 PGMMCHUNK pNext = pChunk->pFreeNext;
1906 if (pPrev)
1907 pPrev->pFreeNext = pNext;
1908 else
1909 pSet->apLists[gmmR0SelectFreeSetList(pChunk->cFree)] = pNext;
1910 if (pNext)
1911 pNext->pFreePrev = pPrev;
1912
1913 pChunk->pSet = NULL;
1914 pChunk->pFreeNext = NULL;
1915 pChunk->pFreePrev = NULL;
1916 }
1917 else
1918 {
1919 Assert(!pChunk->pFreeNext);
1920 Assert(!pChunk->pFreePrev);
1921 Assert(!pChunk->cFree);
1922 }
1923}
1924
1925
1926/**
1927 * Links the chunk onto the appropriate free list in the specified free set.
1928 *
1929 * If the chunk has no free entries, it's not linked into any list.
1930 *
1931 * @param pChunk The allocation chunk.
1932 * @param pSet The free set.
1933 */
1934DECLINLINE(void) gmmR0LinkChunk(PGMMCHUNK pChunk, PGMMCHUNKFREESET pSet)
1935{
1936 Assert(!pChunk->pSet);
1937 Assert(!pChunk->pFreeNext);
1938 Assert(!pChunk->pFreePrev);
1939
1940 if (pChunk->cFree > 0)
1941 {
1942 pChunk->pSet = pSet;
1943 pChunk->pFreePrev = NULL;
1944 unsigned const iList = gmmR0SelectFreeSetList(pChunk->cFree);
1945 pChunk->pFreeNext = pSet->apLists[iList];
1946 if (pChunk->pFreeNext)
1947 pChunk->pFreeNext->pFreePrev = pChunk;
1948 pSet->apLists[iList] = pChunk;
1949
1950 pSet->cFreePages += pChunk->cFree;
1951 pSet->idGeneration++;
1952 }
1953}
1954
1955
1956/**
1957 * Selects the appropriate free set for the chunk and links it onto the corresponding free list.
1958 *
1959 * If the chunk has no free entries, it's not linked into any list.
1960 *
1961 * @param pGMM Pointer to the GMM instance.
1962 * @param pGVM Pointer to the kernel-only VM instance data.
1963 * @param pChunk The allocation chunk.
1964 */
1965DECLINLINE(void) gmmR0SelectSetAndLinkChunk(PGMM pGMM, PGVM pGVM, PGMMCHUNK pChunk)
1966{
1967 PGMMCHUNKFREESET pSet;
1968 if (pGMM->fBoundMemoryMode)
1969 pSet = &pGVM->gmm.s.Private;
1970 else if (pChunk->cShared)
1971 pSet = &pGMM->Shared;
1972 else
1973 pSet = &pGMM->PrivateX;
1974 gmmR0LinkChunk(pChunk, pSet);
1975}
1976
1977
1978/**
1979 * Frees a Chunk ID.
1980 *
1981 * @param pGMM Pointer to the GMM instance.
1982 * @param idChunk The Chunk ID to free.
1983 */
1984static void gmmR0FreeChunkId(PGMM pGMM, uint32_t idChunk)
1985{
1986 AssertReturnVoid(idChunk != NIL_GMM_CHUNKID);
1987 AssertMsg(ASMBitTest(&pGMM->bmChunkId[0], idChunk), ("%#x\n", idChunk));
1988 ASMAtomicBitClear(&pGMM->bmChunkId[0], idChunk);
1989}
1990
1991
1992/**
1993 * Allocates a new Chunk ID.
1994 *
1995 * @returns The Chunk ID.
1996 * @param pGMM Pointer to the GMM instance.
1997 */
1998static uint32_t gmmR0AllocateChunkId(PGMM pGMM)
1999{
2000 AssertCompile(!((GMM_CHUNKID_LAST + 1) & 31)); /* must be a multiple of 32 */
2001 AssertCompile(NIL_GMM_CHUNKID == 0);
2002
2003 /*
2004 * Try the next sequential one.
2005 */
2006 int32_t idChunk = ++pGMM->idChunkPrev;
2007#if 0 /** @todo enable this code */
2008 if ( idChunk <= GMM_CHUNKID_LAST
2009 && idChunk > NIL_GMM_CHUNKID
2010         && !ASMAtomicBitTestAndSet(&pGMM->bmChunkId[0], idChunk))
2011 return idChunk;
2012#endif
2013
2014 /*
2015 * Scan sequentially from the last one.
2016 */
2017 if ( (uint32_t)idChunk < GMM_CHUNKID_LAST
2018 && idChunk > NIL_GMM_CHUNKID)
2019 {
2020 idChunk = ASMBitNextClear(&pGMM->bmChunkId[0], GMM_CHUNKID_LAST + 1, idChunk - 1);
2021 if (idChunk > NIL_GMM_CHUNKID)
2022 {
2023 AssertMsgReturn(!ASMAtomicBitTestAndSet(&pGMM->bmChunkId[0], idChunk), ("%#x\n", idChunk), NIL_GMM_CHUNKID);
2024 return pGMM->idChunkPrev = idChunk;
2025 }
2026 }
2027
2028 /*
2029 * Ok, scan from the start.
2030 * We're not racing anyone, so there is no need to expect failures or have restart loops.
2031 */
2032 idChunk = ASMBitFirstClear(&pGMM->bmChunkId[0], GMM_CHUNKID_LAST + 1);
2033    AssertMsgReturn(idChunk > NIL_GMM_CHUNKID, ("%#x\n", idChunk), NIL_GMM_CHUNKID);
2034 AssertMsgReturn(!ASMAtomicBitTestAndSet(&pGMM->bmChunkId[0], idChunk), ("%#x\n", idChunk), NIL_GMM_CHUNKID);
2035
2036 return pGMM->idChunkPrev = idChunk;
2037}
2038
2039
2040/**
2041 * Allocates one private page.
2042 *
2043 * Worker for gmmR0AllocatePages.
2044 *
2045 * @param pChunk The chunk to allocate it from.
2046 * @param hGVM The GVM handle of the VM requesting memory.
2047 * @param pPageDesc The page descriptor.
2048 */
2049static void gmmR0AllocatePage(PGMMCHUNK pChunk, uint32_t hGVM, PGMMPAGEDESC pPageDesc)
2050{
2051 /* update the chunk stats. */
2052 if (pChunk->hGVM == NIL_GVM_HANDLE)
2053 pChunk->hGVM = hGVM;
2054 Assert(pChunk->cFree);
2055 pChunk->cFree--;
2056 pChunk->cPrivate++;
2057
2058 /* unlink the first free page. */
2059 const uint32_t iPage = pChunk->iFreeHead;
2060 AssertReleaseMsg(iPage < RT_ELEMENTS(pChunk->aPages), ("%d\n", iPage));
2061 PGMMPAGE pPage = &pChunk->aPages[iPage];
2062 Assert(GMM_PAGE_IS_FREE(pPage));
2063 pChunk->iFreeHead = pPage->Free.iNext;
2064 Log3(("A pPage=%p iPage=%#x/%#x u2State=%d iFreeHead=%#x iNext=%#x\n",
2065 pPage, iPage, (pChunk->Core.Key << GMM_CHUNKID_SHIFT) | iPage,
2066 pPage->Common.u2State, pChunk->iFreeHead, pPage->Free.iNext));
2067
2068 /* make the page private. */
2069 pPage->u = 0;
2070 AssertCompile(GMM_PAGE_STATE_PRIVATE == 0);
2071 pPage->Private.hGVM = hGVM;
2072 AssertCompile(NIL_RTHCPHYS >= GMM_GCPHYS_LAST);
2073 AssertCompile(GMM_GCPHYS_UNSHAREABLE >= GMM_GCPHYS_LAST);
2074 if (pPageDesc->HCPhysGCPhys <= GMM_GCPHYS_LAST)
2075 pPage->Private.pfn = pPageDesc->HCPhysGCPhys >> PAGE_SHIFT;
2076 else
2077 pPage->Private.pfn = GMM_PAGE_PFN_UNSHAREABLE; /* unshareable / unassigned - same thing. */
2078
2079 /* update the page descriptor. */
2080 pPageDesc->HCPhysGCPhys = RTR0MemObjGetPagePhysAddr(pChunk->hMemObj, iPage);
2081 Assert(pPageDesc->HCPhysGCPhys != NIL_RTHCPHYS);
2082 pPageDesc->idPage = (pChunk->Core.Key << GMM_CHUNKID_SHIFT) | iPage;
2083 pPageDesc->idSharedPage = NIL_GMM_PAGEID;
2084}
2085
2086
2087/**
2088 * Picks the free pages from a chunk.
2089 *
2090 * @returns The new page descriptor table index.
2091 * @param pChunk The chunk.
2092 * @param hGVM The affinity of the chunk. NIL_GVM_HANDLE for no
2093 * affinity.
2094 * @param iPage The current page descriptor table index.
2095 * @param cPages The total number of pages to allocate.
2096 * @param paPages The page descriptor table (input + output).
2097 */
2098static uint32_t gmmR0AllocatePagesFromChunk(PGMMCHUNK pChunk, uint16_t const hGVM, uint32_t iPage, uint32_t cPages,
2099 PGMMPAGEDESC paPages)
2100{
2101 PGMMCHUNKFREESET pSet = pChunk->pSet; Assert(pSet);
2102 gmmR0UnlinkChunk(pChunk);
2103
2104 for (; pChunk->cFree && iPage < cPages; iPage++)
2105 gmmR0AllocatePage(pChunk, hGVM, &paPages[iPage]);
2106
2107 gmmR0LinkChunk(pChunk, pSet);
2108 return iPage;
2109}
2110
2111
2112/**
2113 * Registers a new chunk of memory.
2114 *
2115 * This is called by both gmmR0AllocateOneChunk and GMMR0SeedChunk.
2116 *
2117 * @returns VBox status code. On success, the giant GMM lock will be held, the
2118 * caller must release it (ugly).
2119 * @param pGMM Pointer to the GMM instance.
2120 * @param pSet Pointer to the set.
2121 * @param MemObj The memory object for the chunk.
2122 * @param hGVM The affinity of the chunk. NIL_GVM_HANDLE for no
2123 * affinity.
2124 * @param fChunkFlags The chunk flags, GMM_CHUNK_FLAGS_XXX.
2125 * @param ppChunk Chunk address (out). Optional.
2126 *
2127 * @remarks The caller must not own the giant GMM mutex.
2128 * The giant GMM mutex will be acquired and returned acquired in
2129 * the success path. On failure, no locks will be held.
2130 */
2131static int gmmR0RegisterChunk(PGMM pGMM, PGMMCHUNKFREESET pSet, RTR0MEMOBJ MemObj, uint16_t hGVM, uint16_t fChunkFlags,
2132 PGMMCHUNK *ppChunk)
2133{
2134 Assert(pGMM->hMtxOwner != RTThreadNativeSelf());
2135 Assert(hGVM != NIL_GVM_HANDLE || pGMM->fBoundMemoryMode);
2136 Assert(fChunkFlags == 0 || fChunkFlags == GMM_CHUNK_FLAGS_LARGE_PAGE);
2137
2138 int rc;
2139 PGMMCHUNK pChunk = (PGMMCHUNK)RTMemAllocZ(sizeof(*pChunk));
2140 if (pChunk)
2141 {
2142 /*
2143 * Initialize it.
2144 */
2145 pChunk->hMemObj = MemObj;
2146 pChunk->cFree = GMM_CHUNK_NUM_PAGES;
2147 pChunk->hGVM = hGVM;
2148 /*pChunk->iFreeHead = 0;*/
2149 pChunk->idNumaNode = gmmR0GetCurrentNumaNodeId();
2150 pChunk->iChunkMtx = UINT8_MAX;
2151 pChunk->fFlags = fChunkFlags;
2152 for (unsigned iPage = 0; iPage < RT_ELEMENTS(pChunk->aPages) - 1; iPage++)
2153 {
2154 pChunk->aPages[iPage].Free.u2State = GMM_PAGE_STATE_FREE;
2155 pChunk->aPages[iPage].Free.iNext = iPage + 1;
2156 }
2157 pChunk->aPages[RT_ELEMENTS(pChunk->aPages) - 1].Free.u2State = GMM_PAGE_STATE_FREE;
2158 pChunk->aPages[RT_ELEMENTS(pChunk->aPages) - 1].Free.iNext = UINT16_MAX;
2159
2160 /*
2161 * Allocate a Chunk ID and insert it into the tree.
2162 * This has to be done behind the mutex of course.
2163 */
2164 rc = gmmR0MutexAcquire(pGMM);
2165 if (RT_SUCCESS(rc))
2166 {
2167 if (GMM_CHECK_SANITY_UPON_ENTERING(pGMM))
2168 {
2169 pChunk->Core.Key = gmmR0AllocateChunkId(pGMM);
2170 if ( pChunk->Core.Key != NIL_GMM_CHUNKID
2171 && pChunk->Core.Key <= GMM_CHUNKID_LAST
2172 && RTAvlU32Insert(&pGMM->pChunks, &pChunk->Core))
2173 {
2174 pGMM->cChunks++;
2175 RTListAppend(&pGMM->ChunkList, &pChunk->ListNode);
2176 gmmR0LinkChunk(pChunk, pSet);
2177 LogFlow(("gmmR0RegisterChunk: pChunk=%p id=%#x cChunks=%d\n", pChunk, pChunk->Core.Key, pGMM->cChunks));
2178
2179 if (ppChunk)
2180 *ppChunk = pChunk;
2181 GMM_CHECK_SANITY_UPON_LEAVING(pGMM);
2182 return VINF_SUCCESS;
2183 }
2184
2185 /* bail out */
2186 rc = VERR_GMM_CHUNK_INSERT;
2187 }
2188 else
2189 rc = VERR_GMM_IS_NOT_SANE;
2190 gmmR0MutexRelease(pGMM);
2191 }
2192
2193 RTMemFree(pChunk);
2194 }
2195 else
2196 rc = VERR_NO_MEMORY;
2197 return rc;
2198}
2199
2200
2201/**
2202 * Allocates a new chunk, immediately picks the requested pages from it, and adds
2203 * what's remaining to the specified free set.
2204 *
2205 * @note This will leave the giant mutex while allocating the new chunk!
2206 *
2207 * @returns VBox status code.
2208 * @param pGMM Pointer to the GMM instance data.
2209 * @param pGVM Pointer to the kernel-only VM instance data.
2210 * @param pSet Pointer to the free set.
2211 * @param cPages The number of pages requested.
2212 * @param paPages The page descriptor table (input + output).
2213 * @param piPage The pointer to the page descriptor table index variable.
2214 * This will be updated.
2215 */
2216static int gmmR0AllocateChunkNew(PGMM pGMM, PGVM pGVM, PGMMCHUNKFREESET pSet, uint32_t cPages,
2217 PGMMPAGEDESC paPages, uint32_t *piPage)
2218{
2219 gmmR0MutexRelease(pGMM);
2220
2221 RTR0MEMOBJ hMemObj;
2222 int rc = RTR0MemObjAllocPhysNC(&hMemObj, GMM_CHUNK_SIZE, NIL_RTHCPHYS);
2223 if (RT_SUCCESS(rc))
2224 {
2225/** @todo Duplicate gmmR0RegisterChunk here so we can avoid chaining up the
2226 * free pages first and then unchaining them right afterwards. Instead
2227 * do as much work as possible without holding the giant lock. */
2228 PGMMCHUNK pChunk;
2229 rc = gmmR0RegisterChunk(pGMM, pSet, hMemObj, pGVM->hSelf, 0 /*fChunkFlags*/, &pChunk);
2230 if (RT_SUCCESS(rc))
2231 {
2232 *piPage = gmmR0AllocatePagesFromChunk(pChunk, pGVM->hSelf, *piPage, cPages, paPages);
2233 return VINF_SUCCESS;
2234 }
2235
2236 /* bail out */
2237 RTR0MemObjFree(hMemObj, false /* fFreeMappings */);
2238 }
2239
2240 int rc2 = gmmR0MutexAcquire(pGMM);
2241 AssertRCReturn(rc2, RT_FAILURE(rc) ? rc : rc2);
2242 return rc;
2243
2244}
2245
2246
2247/**
2248 * As a last resort we'll pick any page we can get.
2249 *
2250 * @returns The new page descriptor table index.
2251 * @param pSet The set to pick from.
2252 * @param pGVM Pointer to the global VM structure.
2253 * @param iPage The current page descriptor table index.
2254 * @param cPages The total number of pages to allocate.
2255 * @param paPages The page descriptor table (input + output).
2256 */
2257static uint32_t gmmR0AllocatePagesIndiscriminately(PGMMCHUNKFREESET pSet, PGVM pGVM,
2258 uint32_t iPage, uint32_t cPages, PGMMPAGEDESC paPages)
2259{
2260 unsigned iList = RT_ELEMENTS(pSet->apLists);
2261 while (iList-- > 0)
2262 {
2263 PGMMCHUNK pChunk = pSet->apLists[iList];
2264 while (pChunk)
2265 {
2266 PGMMCHUNK pNext = pChunk->pFreeNext;
2267
2268 iPage = gmmR0AllocatePagesFromChunk(pChunk, pGVM->hSelf, iPage, cPages, paPages);
2269 if (iPage >= cPages)
2270 return iPage;
2271
2272 pChunk = pNext;
2273 }
2274 }
2275 return iPage;
2276}
2277
2278
2279/**
2280 * Pick pages from empty chunks on the same NUMA node.
2281 *
2282 * @returns The new page descriptor table index.
2283 * @param pSet The set to pick from.
2284 * @param pGVM Pointer to the global VM structure.
2285 * @param iPage The current page descriptor table index.
2286 * @param cPages The total number of pages to allocate.
2287 * @param paPages The page descriptor table (input + output).
2288 */
2289static uint32_t gmmR0AllocatePagesFromEmptyChunksOnSameNode(PGMMCHUNKFREESET pSet, PGVM pGVM,
2290 uint32_t iPage, uint32_t cPages, PGMMPAGEDESC paPages)
2291{
2292 PGMMCHUNK pChunk = pSet->apLists[GMM_CHUNK_FREE_SET_UNUSED_LIST];
2293 if (pChunk)
2294 {
2295 uint16_t const idNumaNode = gmmR0GetCurrentNumaNodeId();
2296 while (pChunk)
2297 {
2298 PGMMCHUNK pNext = pChunk->pFreeNext;
2299
2300 if (pChunk->idNumaNode == idNumaNode)
2301 {
2302 pChunk->hGVM = pGVM->hSelf;
2303 iPage = gmmR0AllocatePagesFromChunk(pChunk, pGVM->hSelf, iPage, cPages, paPages);
2304 if (iPage >= cPages)
2305 {
2306 pGVM->gmm.s.idLastChunkHint = pChunk->cFree ? pChunk->Core.Key : NIL_GMM_CHUNKID;
2307 return iPage;
2308 }
2309 }
2310
2311 pChunk = pNext;
2312 }
2313 }
2314 return iPage;
2315}
2316
2317
2318/**
2319 * Pick pages from non-empty chunks on the same NUMA node.
2320 *
2321 * @returns The new page descriptor table index.
2322 * @param pSet The set to pick from.
2323 * @param pGVM Pointer to the global VM structure.
2324 * @param iPage The current page descriptor table index.
2325 * @param cPages The total number of pages to allocate.
2326 * @param paPages The page descriptor table (input + output).
2327 */
2328static uint32_t gmmR0AllocatePagesFromSameNode(PGMMCHUNKFREESET pSet, PGVM pGVM,
2329 uint32_t iPage, uint32_t cPages, PGMMPAGEDESC paPages)
2330{
2331 /** @todo start by picking from chunks with about the right size first? */
2332 uint16_t const idNumaNode = gmmR0GetCurrentNumaNodeId();
2333 unsigned iList = GMM_CHUNK_FREE_SET_UNUSED_LIST;
2334 while (iList-- > 0)
2335 {
2336 PGMMCHUNK pChunk = pSet->apLists[iList];
2337 while (pChunk)
2338 {
2339 PGMMCHUNK pNext = pChunk->pFreeNext;
2340
2341 if (pChunk->idNumaNode == idNumaNode)
2342 {
2343 iPage = gmmR0AllocatePagesFromChunk(pChunk, pGVM->hSelf, iPage, cPages, paPages);
2344 if (iPage >= cPages)
2345 {
2346 pGVM->gmm.s.idLastChunkHint = pChunk->cFree ? pChunk->Core.Key : NIL_GMM_CHUNKID;
2347 return iPage;
2348 }
2349 }
2350
2351 pChunk = pNext;
2352 }
2353 }
2354 return iPage;
2355}
2356
2357
2358/**
2359 * Pick pages that are in chunks already associated with the VM.
2360 *
2361 * @returns The new page descriptor table index.
2362 * @param pGMM Pointer to the GMM instance data.
2363 * @param pGVM Pointer to the global VM structure.
2364 * @param pSet The set to pick from.
2365 * @param iPage The current page descriptor table index.
2366 * @param cPages The total number of pages to allocate.
2367 * @param paPages The page descriptor table (input + output).
2368 */
2369static uint32_t gmmR0AllocatePagesAssociatedWithVM(PGMM pGMM, PGVM pGVM, PGMMCHUNKFREESET pSet,
2370 uint32_t iPage, uint32_t cPages, PGMMPAGEDESC paPages)
2371{
2372 uint16_t const hGVM = pGVM->hSelf;
2373
2374 /* Hint. */
2375 if (pGVM->gmm.s.idLastChunkHint != NIL_GMM_CHUNKID)
2376 {
2377 PGMMCHUNK pChunk = gmmR0GetChunk(pGMM, pGVM->gmm.s.idLastChunkHint);
2378 if (pChunk && pChunk->cFree)
2379 {
2380 iPage = gmmR0AllocatePagesFromChunk(pChunk, hGVM, iPage, cPages, paPages);
2381 if (iPage >= cPages)
2382 return iPage;
2383 }
2384 }
2385
2386 /* Scan. */
2387 for (unsigned iList = 0; iList < RT_ELEMENTS(pSet->apLists); iList++)
2388 {
2389 PGMMCHUNK pChunk = pSet->apLists[iList];
2390 while (pChunk)
2391 {
2392 PGMMCHUNK pNext = pChunk->pFreeNext;
2393
2394 if (pChunk->hGVM == hGVM)
2395 {
2396 iPage = gmmR0AllocatePagesFromChunk(pChunk, hGVM, iPage, cPages, paPages);
2397 if (iPage >= cPages)
2398 {
2399 pGVM->gmm.s.idLastChunkHint = pChunk->cFree ? pChunk->Core.Key : NIL_GMM_CHUNKID;
2400 return iPage;
2401 }
2402 }
2403
2404 pChunk = pNext;
2405 }
2406 }
2407 return iPage;
2408}
2409
2410
2411
2412/**
2413 * Pick pages in bound memory mode.
2414 *
2415 * @returns The new page descriptor table index.
2416 * @param pGVM Pointer to the global VM structure.
2417 * @param iPage The current page descriptor table index.
2418 * @param cPages The total number of pages to allocate.
2419 * @param paPages The page descriptor table (input + output).
2420 */
2421static uint32_t gmmR0AllocatePagesInBoundMode(PGVM pGVM, uint32_t iPage, uint32_t cPages, PGMMPAGEDESC paPages)
2422{
2423 for (unsigned iList = 0; iList < RT_ELEMENTS(pGVM->gmm.s.Private.apLists); iList++)
2424 {
2425 PGMMCHUNK pChunk = pGVM->gmm.s.Private.apLists[iList];
2426 while (pChunk)
2427 {
2428 Assert(pChunk->hGVM == pGVM->hSelf);
2429 PGMMCHUNK pNext = pChunk->pFreeNext;
2430 iPage = gmmR0AllocatePagesFromChunk(pChunk, pGVM->hSelf, iPage, cPages, paPages);
2431 if (iPage >= cPages)
2432 return iPage;
2433 pChunk = pNext;
2434 }
2435 }
2436 return iPage;
2437}
2438
2439
2440/**
2441 * Checks if we should start picking pages from chunks of other VMs because
2442 * we're getting close to the system memory or reserved limit.
2443 *
2444 * @returns @c true if we should, @c false if we should first try to allocate more
2445 * chunks.
2446 */
2447static bool gmmR0ShouldAllocatePagesInOtherChunksBecauseOfLimits(PGVM pGVM)
2448{
2449 /*
2450      * Don't allocate a new chunk if we're already close to the reserved page limit.
2451 */
2452 uint64_t cPgReserved = pGVM->gmm.s.Stats.Reserved.cBasePages
2453 + pGVM->gmm.s.Stats.Reserved.cFixedPages
2454 - pGVM->gmm.s.Stats.cBalloonedPages
2455 /** @todo what about shared pages? */;
2456 uint64_t cPgAllocated = pGVM->gmm.s.Stats.Allocated.cBasePages
2457 + pGVM->gmm.s.Stats.Allocated.cFixedPages;
2458 uint64_t cPgDelta = cPgReserved - cPgAllocated;
2459 if (cPgDelta < GMM_CHUNK_NUM_PAGES * 4)
2460 return true;
2461 /** @todo make the threshold configurable, also test the code to see if
2462      * this ever kicks in (we might be reserving too much or something). */
2463
2464 /*
2465      * Check how close we are to the max memory limit and how many fragments
2466      * there are...
2467 */
2468 /** @todo. */
2469
2470 return false;
2471}
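
/* Rough numbers behind the headroom check above, assuming 2 MB chunks (see the
 * AssertCompile on GMM_CHUNK_SIZE further down) and 4 KB pages:
 *
 * @code
 *  GMM_CHUNK_NUM_PAGES     = 2 MB / 4 KB = 512 pages per chunk
 *  GMM_CHUNK_NUM_PAGES * 4 = 2048 pages  = 8 MB of reservation headroom
 * @endcode
 *
 * So once less than roughly four chunks' worth of pages remain unallocated in
 * the reservation, we prefer reusing existing chunks over allocating new ones.
 */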
2472
2473
2474/**
2475 * Checks if we should start picking pages from chunks of other VMs because
2476 * there are a lot of free pages around.
2477 *
2478 * @returns @c true if we should, @c false if we should first try to allocate more
2479 * chunks.
2480 */
2481static bool gmmR0ShouldAllocatePagesInOtherChunksBecauseOfLotsFree(PGMM pGMM)
2482{
2483 /*
2484 * Setting the limit at 16 chunks (32 MB) at the moment.
2485 */
2486 if (pGMM->PrivateX.cFreePages >= GMM_CHUNK_NUM_PAGES * 16)
2487 return true;
2488 return false;
2489}
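
/* The "16 chunks (32 MB)" figure above follows from the same assumptions
 * (2 MB chunks, 4 KB pages):
 *
 * @code
 *  GMM_CHUNK_NUM_PAGES * 16 = 512 * 16 = 8192 pages = 8192 * 4 KB = 32 MB
 * @endcode
 */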
2490
2491
2492/**
2493 * Common worker for GMMR0AllocateHandyPages and GMMR0AllocatePages.
2494 *
2495 * @returns VBox status code:
2496 * @retval VINF_SUCCESS on success.
2497 * @retval VERR_GMM_SEED_ME if seeding via GMMR0SeedChunk or
2498 * gmmR0AllocateMoreChunks is necessary.
2499 * @retval VERR_GMM_HIT_GLOBAL_LIMIT if we've exhausted the available pages.
2500 * @retval VERR_GMM_HIT_VM_ACCOUNT_LIMIT if we've hit the VM account limit,
2501 * that is we're trying to allocate more than we've reserved.
2502 *
2503 * @param pGMM Pointer to the GMM instance data.
2504 * @param pGVM Pointer to the VM.
2505 * @param cPages The number of pages to allocate.
2506 * @param paPages Pointer to the page descriptors. See GMMPAGEDESC for
2507 * details on what is expected on input.
2508 * @param enmAccount The account to charge.
2509 *
2510 * @remarks The caller must own the giant GMM lock.
2511 */
2512static int gmmR0AllocatePagesNew(PGMM pGMM, PGVM pGVM, uint32_t cPages, PGMMPAGEDESC paPages, GMMACCOUNT enmAccount)
2513{
2514 Assert(pGMM->hMtxOwner == RTThreadNativeSelf());
2515
2516 /*
2517 * Check allocation limits.
2518 */
2519 if (RT_UNLIKELY(pGMM->cAllocatedPages + cPages > pGMM->cMaxPages))
2520 return VERR_GMM_HIT_GLOBAL_LIMIT;
2521
2522 switch (enmAccount)
2523 {
2524 case GMMACCOUNT_BASE:
2525 if (RT_UNLIKELY( pGVM->gmm.s.Stats.Allocated.cBasePages + pGVM->gmm.s.Stats.cBalloonedPages + cPages
2526 > pGVM->gmm.s.Stats.Reserved.cBasePages))
2527 {
2528 Log(("gmmR0AllocatePages:Base: Reserved=%#llx Allocated+Ballooned+Requested=%#llx+%#llx+%#x!\n",
2529 pGVM->gmm.s.Stats.Reserved.cBasePages, pGVM->gmm.s.Stats.Allocated.cBasePages,
2530 pGVM->gmm.s.Stats.cBalloonedPages, cPages));
2531 return VERR_GMM_HIT_VM_ACCOUNT_LIMIT;
2532 }
2533 break;
2534 case GMMACCOUNT_SHADOW:
2535 if (RT_UNLIKELY(pGVM->gmm.s.Stats.Allocated.cShadowPages + cPages > pGVM->gmm.s.Stats.Reserved.cShadowPages))
2536 {
2537 Log(("gmmR0AllocatePages:Shadow: Reserved=%#x Allocated+Requested=%#x+%#x!\n",
2538 pGVM->gmm.s.Stats.Reserved.cShadowPages, pGVM->gmm.s.Stats.Allocated.cShadowPages, cPages));
2539 return VERR_GMM_HIT_VM_ACCOUNT_LIMIT;
2540 }
2541 break;
2542 case GMMACCOUNT_FIXED:
2543 if (RT_UNLIKELY(pGVM->gmm.s.Stats.Allocated.cFixedPages + cPages > pGVM->gmm.s.Stats.Reserved.cFixedPages))
2544 {
2545 Log(("gmmR0AllocatePages:Fixed: Reserved=%#x Allocated+Requested=%#x+%#x!\n",
2546 pGVM->gmm.s.Stats.Reserved.cFixedPages, pGVM->gmm.s.Stats.Allocated.cFixedPages, cPages));
2547 return VERR_GMM_HIT_VM_ACCOUNT_LIMIT;
2548 }
2549 break;
2550 default:
2551 AssertMsgFailedReturn(("enmAccount=%d\n", enmAccount), VERR_IPE_NOT_REACHED_DEFAULT_CASE);
2552 }
2553
2554 /*
2555      * If we're in legacy memory mode, it's easy to figure out up front whether
2556      * we have a sufficient number of pages.
2557 */
2558 if ( pGMM->fLegacyAllocationMode
2559 && pGVM->gmm.s.Private.cFreePages < cPages)
2560 {
2561 Assert(pGMM->fBoundMemoryMode);
2562 return VERR_GMM_SEED_ME;
2563 }
2564
2565 /*
2566 * Update the accounts before we proceed because we might be leaving the
2567 * protection of the global mutex and thus run the risk of permitting
2568 * too much memory to be allocated.
2569 */
2570 switch (enmAccount)
2571 {
2572 case GMMACCOUNT_BASE: pGVM->gmm.s.Stats.Allocated.cBasePages += cPages; break;
2573 case GMMACCOUNT_SHADOW: pGVM->gmm.s.Stats.Allocated.cShadowPages += cPages; break;
2574 case GMMACCOUNT_FIXED: pGVM->gmm.s.Stats.Allocated.cFixedPages += cPages; break;
2575 default: AssertMsgFailedReturn(("enmAccount=%d\n", enmAccount), VERR_IPE_NOT_REACHED_DEFAULT_CASE);
2576 }
2577 pGVM->gmm.s.Stats.cPrivatePages += cPages;
2578 pGMM->cAllocatedPages += cPages;
2579
2580 /*
2581 * Part two of it's-easy-in-legacy-memory-mode.
2582 */
2583 uint32_t iPage = 0;
2584 if (pGMM->fLegacyAllocationMode)
2585 {
2586 iPage = gmmR0AllocatePagesInBoundMode(pGVM, iPage, cPages, paPages);
2587 AssertReleaseReturn(iPage == cPages, VERR_GMM_ALLOC_PAGES_IPE);
2588 return VINF_SUCCESS;
2589 }
2590
2591 /*
2592 * Bound mode is also relatively straightforward.
2593 */
2594 int rc = VINF_SUCCESS;
2595 if (pGMM->fBoundMemoryMode)
2596 {
2597 iPage = gmmR0AllocatePagesInBoundMode(pGVM, iPage, cPages, paPages);
2598 if (iPage < cPages)
2599 do
2600 rc = gmmR0AllocateChunkNew(pGMM, pGVM, &pGVM->gmm.s.Private, cPages, paPages, &iPage);
2601 while (iPage < cPages && RT_SUCCESS(rc));
2602 }
2603 /*
2604      * Shared mode is trickier as we should try to achieve the same locality as
2605 * in bound mode, but smartly make use of non-full chunks allocated by
2606 * other VMs if we're low on memory.
2607 */
2608 else
2609 {
2610 /* Pick the most optimal pages first. */
2611 iPage = gmmR0AllocatePagesAssociatedWithVM(pGMM, pGVM, &pGMM->PrivateX, iPage, cPages, paPages);
2612 if (iPage < cPages)
2613 {
2614 /* Maybe we should try getting pages from chunks "belonging" to
2615 other VMs before allocating more chunks? */
2616 bool fTriedOnSameAlready = false;
2617 if (gmmR0ShouldAllocatePagesInOtherChunksBecauseOfLimits(pGVM))
2618 {
2619 iPage = gmmR0AllocatePagesFromSameNode(&pGMM->PrivateX, pGVM, iPage, cPages, paPages);
2620 fTriedOnSameAlready = true;
2621 }
2622
2623 /* Allocate memory from empty chunks. */
2624 if (iPage < cPages)
2625 iPage = gmmR0AllocatePagesFromEmptyChunksOnSameNode(&pGMM->PrivateX, pGVM, iPage, cPages, paPages);
2626
2627 /* Grab empty shared chunks. */
2628 if (iPage < cPages)
2629 iPage = gmmR0AllocatePagesFromEmptyChunksOnSameNode(&pGMM->Shared, pGVM, iPage, cPages, paPages);
2630
2631            /* If there are a lot of free pages spread around, try not to waste
2632 system memory on more chunks. (Should trigger defragmentation.) */
2633 if ( !fTriedOnSameAlready
2634 && gmmR0ShouldAllocatePagesInOtherChunksBecauseOfLotsFree(pGMM))
2635 {
2636 iPage = gmmR0AllocatePagesFromSameNode(&pGMM->PrivateX, pGVM, iPage, cPages, paPages);
2637 if (iPage < cPages)
2638 iPage = gmmR0AllocatePagesIndiscriminately(&pGMM->PrivateX, pGVM, iPage, cPages, paPages);
2639 }
2640
2641 /*
2642 * Ok, try allocate new chunks.
2643 */
2644 if (iPage < cPages)
2645 {
2646 do
2647 rc = gmmR0AllocateChunkNew(pGMM, pGVM, &pGMM->PrivateX, cPages, paPages, &iPage);
2648 while (iPage < cPages && RT_SUCCESS(rc));
2649
2650 /* If the host is out of memory, take whatever we can get. */
2651 if ( (rc == VERR_NO_MEMORY || rc == VERR_NO_PHYS_MEMORY)
2652 && pGMM->PrivateX.cFreePages + pGMM->Shared.cFreePages >= cPages - iPage)
2653 {
2654 iPage = gmmR0AllocatePagesIndiscriminately(&pGMM->PrivateX, pGVM, iPage, cPages, paPages);
2655 if (iPage < cPages)
2656 iPage = gmmR0AllocatePagesIndiscriminately(&pGMM->Shared, pGVM, iPage, cPages, paPages);
2657 AssertRelease(iPage == cPages);
2658 rc = VINF_SUCCESS;
2659 }
2660 }
2661 }
2662 }
2663
2664 /*
2665 * Clean up on failure. Since this is bound to be a low-memory condition
2666 * we will give back any empty chunks that might be hanging around.
2667 */
2668 if (RT_FAILURE(rc))
2669 {
2670 /* Update the statistics. */
2671 pGVM->gmm.s.Stats.cPrivatePages -= cPages;
2672 pGMM->cAllocatedPages -= cPages - iPage;
2673 switch (enmAccount)
2674 {
2675 case GMMACCOUNT_BASE: pGVM->gmm.s.Stats.Allocated.cBasePages -= cPages; break;
2676 case GMMACCOUNT_SHADOW: pGVM->gmm.s.Stats.Allocated.cShadowPages -= cPages; break;
2677 case GMMACCOUNT_FIXED: pGVM->gmm.s.Stats.Allocated.cFixedPages -= cPages; break;
2678 default: AssertMsgFailedReturn(("enmAccount=%d\n", enmAccount), VERR_IPE_NOT_REACHED_DEFAULT_CASE);
2679 }
2680
2681 /* Release the pages. */
2682 while (iPage-- > 0)
2683 {
2684 uint32_t idPage = paPages[iPage].idPage;
2685 PGMMPAGE pPage = gmmR0GetPage(pGMM, idPage);
2686 if (RT_LIKELY(pPage))
2687 {
2688 Assert(GMM_PAGE_IS_PRIVATE(pPage));
2689 Assert(pPage->Private.hGVM == pGVM->hSelf);
2690 gmmR0FreePrivatePage(pGMM, pGVM, idPage, pPage);
2691 }
2692 else
2693 AssertMsgFailed(("idPage=%#x\n", idPage));
2694
2695 paPages[iPage].idPage = NIL_GMM_PAGEID;
2696 paPages[iPage].idSharedPage = NIL_GMM_PAGEID;
2697 paPages[iPage].HCPhysGCPhys = NIL_RTHCPHYS;
2698 }
2699
2700 /* Free empty chunks. */
2701 /** @todo */
2702
2703 /* return the fail status on failure */
2704 return rc;
2705 }
2706 return VINF_SUCCESS;
2707}
2708
2709
2710/**
2711 * Updates the previous allocations and allocates more pages.
2712 *
2713 * The handy pages are always taken from the 'base' memory account.
2714 * The allocated pages are not cleared and will contain random garbage.
2715 *
2716 * @returns VBox status code:
2717 * @retval VINF_SUCCESS on success.
2718 * @retval VERR_NOT_OWNER if the caller is not an EMT.
2719 * @retval VERR_GMM_PAGE_NOT_FOUND if one of the pages to update wasn't found.
2720 * @retval VERR_GMM_PAGE_NOT_PRIVATE if one of the pages to update wasn't a
2721 * private page.
2722 * @retval VERR_GMM_PAGE_NOT_SHARED if one of the pages to update wasn't a
2723 * shared page.
2724 * @retval VERR_GMM_NOT_PAGE_OWNER if one of the pages to be updated wasn't
2725 * owned by the VM.
2726 * @retval VERR_GMM_SEED_ME if seeding via GMMR0SeedChunk is necessary.
2727 * @retval VERR_GMM_HIT_GLOBAL_LIMIT if we've exhausted the available pages.
2728 * @retval VERR_GMM_HIT_VM_ACCOUNT_LIMIT if we've hit the VM account limit,
2729 * that is we're trying to allocate more than we've reserved.
2730 *
2731 * @param pGVM The global (ring-0) VM structure.
2732 * @param pVM The cross context VM structure.
2733 * @param idCpu The VCPU id.
2734 * @param cPagesToUpdate The number of pages to update (starting from the head).
2735 * @param cPagesToAlloc The number of pages to allocate (starting from the head).
2736 * @param paPages The array of page descriptors.
2737 * See GMMPAGEDESC for details on what is expected on input.
2738 * @thread EMT(idCpu)
2739 */
2740GMMR0DECL(int) GMMR0AllocateHandyPages(PGVM pGVM, PVM pVM, VMCPUID idCpu, uint32_t cPagesToUpdate,
2741 uint32_t cPagesToAlloc, PGMMPAGEDESC paPages)
2742{
2743 LogFlow(("GMMR0AllocateHandyPages: pGVM=%p pVM=%p cPagesToUpdate=%#x cPagesToAlloc=%#x paPages=%p\n",
2744 pGVM, pVM, cPagesToUpdate, cPagesToAlloc, paPages));
2745
2746 /*
2747 * Validate, get basics and take the semaphore.
2748 * (This is a relatively busy path, so make predictions where possible.)
2749 */
2750 PGMM pGMM;
2751 GMM_GET_VALID_INSTANCE(pGMM, VERR_GMM_INSTANCE);
2752 int rc = GVMMR0ValidateGVMandVMandEMT(pGVM, pVM, idCpu);
2753 if (RT_FAILURE(rc))
2754 return rc;
2755
2756 AssertPtrReturn(paPages, VERR_INVALID_PARAMETER);
2757 AssertMsgReturn( (cPagesToUpdate && cPagesToUpdate < 1024)
2758 || (cPagesToAlloc && cPagesToAlloc < 1024),
2759 ("cPagesToUpdate=%#x cPagesToAlloc=%#x\n", cPagesToUpdate, cPagesToAlloc),
2760 VERR_INVALID_PARAMETER);
2761
2762 unsigned iPage = 0;
2763 for (; iPage < cPagesToUpdate; iPage++)
2764 {
2765 AssertMsgReturn( ( paPages[iPage].HCPhysGCPhys <= GMM_GCPHYS_LAST
2766 && !(paPages[iPage].HCPhysGCPhys & PAGE_OFFSET_MASK))
2767 || paPages[iPage].HCPhysGCPhys == NIL_RTHCPHYS
2768 || paPages[iPage].HCPhysGCPhys == GMM_GCPHYS_UNSHAREABLE,
2769 ("#%#x: %RHp\n", iPage, paPages[iPage].HCPhysGCPhys),
2770 VERR_INVALID_PARAMETER);
2771 AssertMsgReturn( paPages[iPage].idPage <= GMM_PAGEID_LAST
2772 /*|| paPages[iPage].idPage == NIL_GMM_PAGEID*/,
2773 ("#%#x: %#x\n", iPage, paPages[iPage].idPage), VERR_INVALID_PARAMETER);
2774        AssertMsgReturn(    paPages[iPage].idSharedPage <= GMM_PAGEID_LAST
2775 /*|| paPages[iPage].idSharedPage == NIL_GMM_PAGEID*/,
2776 ("#%#x: %#x\n", iPage, paPages[iPage].idSharedPage), VERR_INVALID_PARAMETER);
2777 }
2778
2779 for (; iPage < cPagesToAlloc; iPage++)
2780 {
2781 AssertMsgReturn(paPages[iPage].HCPhysGCPhys == NIL_RTHCPHYS, ("#%#x: %RHp\n", iPage, paPages[iPage].HCPhysGCPhys), VERR_INVALID_PARAMETER);
2782 AssertMsgReturn(paPages[iPage].idPage == NIL_GMM_PAGEID, ("#%#x: %#x\n", iPage, paPages[iPage].idPage), VERR_INVALID_PARAMETER);
2783 AssertMsgReturn(paPages[iPage].idSharedPage == NIL_GMM_PAGEID, ("#%#x: %#x\n", iPage, paPages[iPage].idSharedPage), VERR_INVALID_PARAMETER);
2784 }
2785
2786 gmmR0MutexAcquire(pGMM);
2787 if (GMM_CHECK_SANITY_UPON_ENTERING(pGMM))
2788 {
2789 /* No allocations before the initial reservation has been made! */
2790 if (RT_LIKELY( pGVM->gmm.s.Stats.Reserved.cBasePages
2791 && pGVM->gmm.s.Stats.Reserved.cFixedPages
2792 && pGVM->gmm.s.Stats.Reserved.cShadowPages))
2793 {
2794 /*
2795 * Perform the updates.
2796 * Stop on the first error.
2797 */
2798 for (iPage = 0; iPage < cPagesToUpdate; iPage++)
2799 {
2800 if (paPages[iPage].idPage != NIL_GMM_PAGEID)
2801 {
2802 PGMMPAGE pPage = gmmR0GetPage(pGMM, paPages[iPage].idPage);
2803 if (RT_LIKELY(pPage))
2804 {
2805 if (RT_LIKELY(GMM_PAGE_IS_PRIVATE(pPage)))
2806 {
2807 if (RT_LIKELY(pPage->Private.hGVM == pGVM->hSelf))
2808 {
2809 AssertCompile(NIL_RTHCPHYS > GMM_GCPHYS_LAST && GMM_GCPHYS_UNSHAREABLE > GMM_GCPHYS_LAST);
2810 if (RT_LIKELY(paPages[iPage].HCPhysGCPhys <= GMM_GCPHYS_LAST))
2811 pPage->Private.pfn = paPages[iPage].HCPhysGCPhys >> PAGE_SHIFT;
2812 else if (paPages[iPage].HCPhysGCPhys == GMM_GCPHYS_UNSHAREABLE)
2813 pPage->Private.pfn = GMM_PAGE_PFN_UNSHAREABLE;
2814 /* else: NIL_RTHCPHYS nothing */
2815
2816 paPages[iPage].idPage = NIL_GMM_PAGEID;
2817 paPages[iPage].HCPhysGCPhys = NIL_RTHCPHYS;
2818 }
2819 else
2820 {
2821 Log(("GMMR0AllocateHandyPages: #%#x/%#x: Not owner! hGVM=%#x hSelf=%#x\n",
2822 iPage, paPages[iPage].idPage, pPage->Private.hGVM, pGVM->hSelf));
2823 rc = VERR_GMM_NOT_PAGE_OWNER;
2824 break;
2825 }
2826 }
2827 else
2828 {
2829 Log(("GMMR0AllocateHandyPages: #%#x/%#x: Not private! %.*Rhxs (type %d)\n", iPage, paPages[iPage].idPage, sizeof(*pPage), pPage, pPage->Common.u2State));
2830 rc = VERR_GMM_PAGE_NOT_PRIVATE;
2831 break;
2832 }
2833 }
2834 else
2835 {
2836 Log(("GMMR0AllocateHandyPages: #%#x/%#x: Not found! (private)\n", iPage, paPages[iPage].idPage));
2837 rc = VERR_GMM_PAGE_NOT_FOUND;
2838 break;
2839 }
2840 }
2841
2842 if (paPages[iPage].idSharedPage != NIL_GMM_PAGEID)
2843 {
2844 PGMMPAGE pPage = gmmR0GetPage(pGMM, paPages[iPage].idSharedPage);
2845 if (RT_LIKELY(pPage))
2846 {
2847 if (RT_LIKELY(GMM_PAGE_IS_SHARED(pPage)))
2848 {
2849 AssertCompile(NIL_RTHCPHYS > GMM_GCPHYS_LAST && GMM_GCPHYS_UNSHAREABLE > GMM_GCPHYS_LAST);
2850 Assert(pPage->Shared.cRefs);
2851 Assert(pGVM->gmm.s.Stats.cSharedPages);
2852 Assert(pGVM->gmm.s.Stats.Allocated.cBasePages);
2853
2854 Log(("GMMR0AllocateHandyPages: free shared page %x cRefs=%d\n", paPages[iPage].idSharedPage, pPage->Shared.cRefs));
2855 pGVM->gmm.s.Stats.cSharedPages--;
2856 pGVM->gmm.s.Stats.Allocated.cBasePages--;
2857 if (!--pPage->Shared.cRefs)
2858 gmmR0FreeSharedPage(pGMM, pGVM, paPages[iPage].idSharedPage, pPage);
2859 else
2860 {
2861 Assert(pGMM->cDuplicatePages);
2862 pGMM->cDuplicatePages--;
2863 }
2864
2865 paPages[iPage].idSharedPage = NIL_GMM_PAGEID;
2866 }
2867 else
2868 {
2869 Log(("GMMR0AllocateHandyPages: #%#x/%#x: Not shared!\n", iPage, paPages[iPage].idSharedPage));
2870 rc = VERR_GMM_PAGE_NOT_SHARED;
2871 break;
2872 }
2873 }
2874 else
2875 {
2876 Log(("GMMR0AllocateHandyPages: #%#x/%#x: Not found! (shared)\n", iPage, paPages[iPage].idSharedPage));
2877 rc = VERR_GMM_PAGE_NOT_FOUND;
2878 break;
2879 }
2880 }
2881 } /* for each page to update */
2882
2883 if (RT_SUCCESS(rc) && cPagesToAlloc > 0)
2884 {
2885#if defined(VBOX_STRICT) && 0 /** @todo re-test this later. Appeared to be a PGM init bug. */
2886 for (iPage = 0; iPage < cPagesToAlloc; iPage++)
2887 {
2888 Assert(paPages[iPage].HCPhysGCPhys == NIL_RTHCPHYS);
2889 Assert(paPages[iPage].idPage == NIL_GMM_PAGEID);
2890 Assert(paPages[iPage].idSharedPage == NIL_GMM_PAGEID);
2891 }
2892#endif
2893
2894 /*
2895 * Join paths with GMMR0AllocatePages for the allocation.
2896 * Note! gmmR0AllocateMoreChunks may leave the protection of the mutex!
2897 */
2898 rc = gmmR0AllocatePagesNew(pGMM, pGVM, cPagesToAlloc, paPages, GMMACCOUNT_BASE);
2899 }
2900 }
2901 else
2902 rc = VERR_WRONG_ORDER;
2903 GMM_CHECK_SANITY_UPON_LEAVING(pGMM);
2904 }
2905 else
2906 rc = VERR_GMM_IS_NOT_SANE;
2907 gmmR0MutexRelease(pGMM);
2908 LogFlow(("GMMR0AllocateHandyPages: returns %Rrc\n", rc));
2909 return rc;
2910}
2911
2912
2913/**
2914 * Allocate one or more pages.
2915 *
2916 * This is typically used for ROMs and MMIO2 (VRAM) during VM creation.
2917 * The allocated pages are not cleared and will contain random garbage.
2918 *
2919 * @returns VBox status code:
2920 * @retval VINF_SUCCESS on success.
2921 * @retval VERR_NOT_OWNER if the caller is not an EMT.
2922 * @retval VERR_GMM_SEED_ME if seeding via GMMR0SeedChunk is necessary.
2923 * @retval VERR_GMM_HIT_GLOBAL_LIMIT if we've exhausted the available pages.
2924 * @retval VERR_GMM_HIT_VM_ACCOUNT_LIMIT if we've hit the VM account limit,
2925 * that is we're trying to allocate more than we've reserved.
2926 *
2927 * @param pGVM The global (ring-0) VM structure.
2928 * @param pVM The cross context VM structure.
2929 * @param idCpu The VCPU id.
2930 * @param cPages The number of pages to allocate.
2931 * @param paPages Pointer to the page descriptors.
2932 * See GMMPAGEDESC for details on what is expected on
2933 * input.
2934 * @param enmAccount The account to charge.
2935 *
2936 * @thread EMT.
2937 */
2938GMMR0DECL(int) GMMR0AllocatePages(PGVM pGVM, PVM pVM, VMCPUID idCpu, uint32_t cPages, PGMMPAGEDESC paPages, GMMACCOUNT enmAccount)
2939{
2940 LogFlow(("GMMR0AllocatePages: pGVM=%p pVM=%p cPages=%#x paPages=%p enmAccount=%d\n", pGVM, pVM, cPages, paPages, enmAccount));
2941
2942 /*
2943 * Validate, get basics and take the semaphore.
2944 */
2945 PGMM pGMM;
2946 GMM_GET_VALID_INSTANCE(pGMM, VERR_GMM_INSTANCE);
2947 int rc = GVMMR0ValidateGVMandVMandEMT(pGVM, pVM, idCpu);
2948 if (RT_FAILURE(rc))
2949 return rc;
2950
2951 AssertPtrReturn(paPages, VERR_INVALID_PARAMETER);
2952 AssertMsgReturn(enmAccount > GMMACCOUNT_INVALID && enmAccount < GMMACCOUNT_END, ("%d\n", enmAccount), VERR_INVALID_PARAMETER);
2953 AssertMsgReturn(cPages > 0 && cPages < RT_BIT(32 - PAGE_SHIFT), ("%#x\n", cPages), VERR_INVALID_PARAMETER);
2954
2955 for (unsigned iPage = 0; iPage < cPages; iPage++)
2956 {
2957 AssertMsgReturn( paPages[iPage].HCPhysGCPhys == NIL_RTHCPHYS
2958 || paPages[iPage].HCPhysGCPhys == GMM_GCPHYS_UNSHAREABLE
2959 || ( enmAccount == GMMACCOUNT_BASE
2960 && paPages[iPage].HCPhysGCPhys <= GMM_GCPHYS_LAST
2961 && !(paPages[iPage].HCPhysGCPhys & PAGE_OFFSET_MASK)),
2962 ("#%#x: %RHp enmAccount=%d\n", iPage, paPages[iPage].HCPhysGCPhys, enmAccount),
2963 VERR_INVALID_PARAMETER);
2964 AssertMsgReturn(paPages[iPage].idPage == NIL_GMM_PAGEID, ("#%#x: %#x\n", iPage, paPages[iPage].idPage), VERR_INVALID_PARAMETER);
2965 AssertMsgReturn(paPages[iPage].idSharedPage == NIL_GMM_PAGEID, ("#%#x: %#x\n", iPage, paPages[iPage].idSharedPage), VERR_INVALID_PARAMETER);
2966 }
2967
2968 gmmR0MutexAcquire(pGMM);
2969 if (GMM_CHECK_SANITY_UPON_ENTERING(pGMM))
2970 {
2971
2972 /* No allocations before the initial reservation has been made! */
2973 if (RT_LIKELY( pGVM->gmm.s.Stats.Reserved.cBasePages
2974 && pGVM->gmm.s.Stats.Reserved.cFixedPages
2975 && pGVM->gmm.s.Stats.Reserved.cShadowPages))
2976 rc = gmmR0AllocatePagesNew(pGMM, pGVM, cPages, paPages, enmAccount);
2977 else
2978 rc = VERR_WRONG_ORDER;
2979 GMM_CHECK_SANITY_UPON_LEAVING(pGMM);
2980 }
2981 else
2982 rc = VERR_GMM_IS_NOT_SANE;
2983 gmmR0MutexRelease(pGMM);
2984 LogFlow(("GMMR0AllocatePages: returns %Rrc\n", rc));
2985 return rc;
2986}
2987
2988
2989/**
2990 * VMMR0 request wrapper for GMMR0AllocatePages.
2991 *
2992 * @returns see GMMR0AllocatePages.
2993 * @param pGVM The global (ring-0) VM structure.
2994 * @param pVM The cross context VM structure.
2995 * @param idCpu The VCPU id.
2996 * @param pReq Pointer to the request packet.
2997 */
2998GMMR0DECL(int) GMMR0AllocatePagesReq(PGVM pGVM, PVM pVM, VMCPUID idCpu, PGMMALLOCATEPAGESREQ pReq)
2999{
3000 /*
3001 * Validate input and pass it on.
3002 */
3003 AssertPtrReturn(pReq, VERR_INVALID_POINTER);
3004 AssertMsgReturn(pReq->Hdr.cbReq >= RT_UOFFSETOF(GMMALLOCATEPAGESREQ, aPages[0]),
3005 ("%#x < %#x\n", pReq->Hdr.cbReq, RT_UOFFSETOF(GMMALLOCATEPAGESREQ, aPages[0])),
3006 VERR_INVALID_PARAMETER);
3007 AssertMsgReturn(pReq->Hdr.cbReq == RT_UOFFSETOF_DYN(GMMALLOCATEPAGESREQ, aPages[pReq->cPages]),
3008 ("%#x != %#x\n", pReq->Hdr.cbReq, RT_UOFFSETOF_DYN(GMMALLOCATEPAGESREQ, aPages[pReq->cPages])),
3009 VERR_INVALID_PARAMETER);
3010
3011 return GMMR0AllocatePages(pGVM, pVM, idCpu, pReq->cPages, &pReq->aPages[0], pReq->enmAccount);
3012}
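
/* A minimal sketch of preparing a request for the wrapper above. The
 * descriptor initialization mirrors what GMMR0AllocatePages expects for pages
 * without an assigned guest address; the page count, the base account and the
 * ring-0 RTMemAllocZ allocation are illustrative only.
 *
 * @code
 *  uint32_t const       cPages = 32;
 *  size_t const         cbReq  = RT_UOFFSETOF_DYN(GMMALLOCATEPAGESREQ, aPages[cPages]);
 *  PGMMALLOCATEPAGESREQ pReq   = (PGMMALLOCATEPAGESREQ)RTMemAllocZ(cbReq);
 *  pReq->Hdr.cbReq  = (uint32_t)cbReq;        // rest of the header init elided
 *  pReq->cPages     = cPages;
 *  pReq->enmAccount = GMMACCOUNT_BASE;
 *  for (uint32_t i = 0; i < cPages; i++)
 *  {
 *      pReq->aPages[i].HCPhysGCPhys = NIL_RTHCPHYS;
 *      pReq->aPages[i].idPage       = NIL_GMM_PAGEID;
 *      pReq->aPages[i].idSharedPage = NIL_GMM_PAGEID;
 *  }
 *  int rc = GMMR0AllocatePagesReq(pGVM, pVM, idCpu, pReq);
 * @endcode
 */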
3013
3014
3015/**
3016 * Allocate a large page to represent guest RAM.
3017 *
3018 * The allocated pages are not cleared and will contain random garbage.
3019 *
3020 * @returns VBox status code:
3021 * @retval VINF_SUCCESS on success.
3022 * @retval VERR_NOT_OWNER if the caller is not an EMT.
3023 * @retval VERR_GMM_SEED_ME if seeding via GMMR0SeedChunk is necessary.
3024 * @retval VERR_GMM_HIT_GLOBAL_LIMIT if we've exhausted the available pages.
3025 * @retval VERR_GMM_HIT_VM_ACCOUNT_LIMIT if we've hit the VM account limit,
3026 * that is we're trying to allocate more than we've reserved.
3027 * @returns see GMMR0AllocatePages.
3028 *
3029 * @param pGVM The global (ring-0) VM structure.
3030 * @param pVM The cross context VM structure.
3031 * @param idCpu The VCPU id.
3032 * @param cbPage Large page size.
3033 * @param pIdPage Where to return the GMM page ID of the page.
3034 * @param pHCPhys Where to return the host physical address of the page.
3035 */
3036GMMR0DECL(int) GMMR0AllocateLargePage(PGVM pGVM, PVM pVM, VMCPUID idCpu, uint32_t cbPage, uint32_t *pIdPage, RTHCPHYS *pHCPhys)
3037{
3038 LogFlow(("GMMR0AllocateLargePage: pGVM=%p pVM=%p cbPage=%x\n", pGVM, pVM, cbPage));
3039
3040 AssertReturn(cbPage == GMM_CHUNK_SIZE, VERR_INVALID_PARAMETER);
3041 AssertPtrReturn(pIdPage, VERR_INVALID_PARAMETER);
3042 AssertPtrReturn(pHCPhys, VERR_INVALID_PARAMETER);
3043
3044 /*
3045 * Validate, get basics and take the semaphore.
3046 */
3047 PGMM pGMM;
3048 GMM_GET_VALID_INSTANCE(pGMM, VERR_GMM_INSTANCE);
3049 int rc = GVMMR0ValidateGVMandVMandEMT(pGVM, pVM, idCpu);
3050 if (RT_FAILURE(rc))
3051 return rc;
3052
3053 /* Not supported in legacy mode where we allocate the memory in ring 3 and lock it in ring 0. */
3054 if (pGMM->fLegacyAllocationMode)
3055 return VERR_NOT_SUPPORTED;
3056
3057 *pHCPhys = NIL_RTHCPHYS;
3058 *pIdPage = NIL_GMM_PAGEID;
3059
3060 gmmR0MutexAcquire(pGMM);
3061 if (GMM_CHECK_SANITY_UPON_ENTERING(pGMM))
3062 {
3063 const unsigned cPages = (GMM_CHUNK_SIZE >> PAGE_SHIFT);
3064 if (RT_UNLIKELY( pGVM->gmm.s.Stats.Allocated.cBasePages + pGVM->gmm.s.Stats.cBalloonedPages + cPages
3065 > pGVM->gmm.s.Stats.Reserved.cBasePages))
3066 {
3067 Log(("GMMR0AllocateLargePage: Reserved=%#llx Allocated+Requested=%#llx+%#x!\n",
3068 pGVM->gmm.s.Stats.Reserved.cBasePages, pGVM->gmm.s.Stats.Allocated.cBasePages, cPages));
3069 gmmR0MutexRelease(pGMM);
3070 return VERR_GMM_HIT_VM_ACCOUNT_LIMIT;
3071 }
3072
3073 /*
3074 * Allocate a new large page chunk.
3075 *
3076 * Note! We leave the giant GMM lock temporarily as the allocation might
3077 * take a long time. gmmR0RegisterChunk will retake it (ugly).
3078 */
3079 AssertCompile(GMM_CHUNK_SIZE == _2M);
3080 gmmR0MutexRelease(pGMM);
3081
3082 RTR0MEMOBJ hMemObj;
3083 rc = RTR0MemObjAllocPhysEx(&hMemObj, GMM_CHUNK_SIZE, NIL_RTHCPHYS, GMM_CHUNK_SIZE);
3084 if (RT_SUCCESS(rc))
3085 {
3086 PGMMCHUNKFREESET pSet = pGMM->fBoundMemoryMode ? &pGVM->gmm.s.Private : &pGMM->PrivateX;
3087 PGMMCHUNK pChunk;
3088 rc = gmmR0RegisterChunk(pGMM, pSet, hMemObj, pGVM->hSelf, GMM_CHUNK_FLAGS_LARGE_PAGE, &pChunk);
3089 if (RT_SUCCESS(rc))
3090 {
3091 /*
3092 * Allocate all the pages in the chunk.
3093 */
3094 /* Unlink the new chunk from the free list. */
3095 gmmR0UnlinkChunk(pChunk);
3096
3097 /** @todo rewrite this to skip the looping. */
3098 /* Allocate all pages. */
3099 GMMPAGEDESC PageDesc;
3100 gmmR0AllocatePage(pChunk, pGVM->hSelf, &PageDesc);
3101
3102 /* Return the first page as we'll use the whole chunk as one big page. */
3103 *pIdPage = PageDesc.idPage;
3104 *pHCPhys = PageDesc.HCPhysGCPhys;
3105
3106 for (unsigned i = 1; i < cPages; i++)
3107 gmmR0AllocatePage(pChunk, pGVM->hSelf, &PageDesc);
3108
3109 /* Update accounting. */
3110 pGVM->gmm.s.Stats.Allocated.cBasePages += cPages;
3111 pGVM->gmm.s.Stats.cPrivatePages += cPages;
3112 pGMM->cAllocatedPages += cPages;
3113
3114 gmmR0LinkChunk(pChunk, pSet);
3115 gmmR0MutexRelease(pGMM);
3116 }
3117 else
3118 RTR0MemObjFree(hMemObj, false /* fFreeMappings */);
3119 }
3120 }
3121 else
3122 {
3123 gmmR0MutexRelease(pGMM);
3124 rc = VERR_GMM_IS_NOT_SANE;
3125 }
3126
3127 LogFlow(("GMMR0AllocateLargePage: returns %Rrc\n", rc));
3128 return rc;
3129}
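
/* Illustrative call sequence for the large page API: allocate one 2 MB page
 * and give it back by its page ID. Mapping it into the guest is left out and
 * the surrounding variables are assumed to be the usual EMT context.
 *
 * @code
 *  uint32_t idPage = NIL_GMM_PAGEID;
 *  RTHCPHYS HCPhys = NIL_RTHCPHYS;
 *  int rc = GMMR0AllocateLargePage(pGVM, pVM, idCpu, GMM_CHUNK_SIZE, &idPage, &HCPhys);
 *  if (RT_SUCCESS(rc))
 *  {
 *      // ... hand idPage/HCPhys over for guest mapping ...
 *      rc = GMMR0FreeLargePage(pGVM, pVM, idCpu, idPage);
 *  }
 * @endcode
 */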
3130
3131
3132/**
3133 * Free a large page.
3134 *
3135 * @returns VBox status code:
3136 * @param pGVM The global (ring-0) VM structure.
3137 * @param pVM The cross context VM structure.
3138 * @param idCpu The VCPU id.
3139 * @param idPage The large page id.
3140 */
3141GMMR0DECL(int) GMMR0FreeLargePage(PGVM pGVM, PVM pVM, VMCPUID idCpu, uint32_t idPage)
3142{
3143 LogFlow(("GMMR0FreeLargePage: pGVM=%p pVM=%p idPage=%x\n", pGVM, pVM, idPage));
3144
3145 /*
3146 * Validate, get basics and take the semaphore.
3147 */
3148 PGMM pGMM;
3149 GMM_GET_VALID_INSTANCE(pGMM, VERR_GMM_INSTANCE);
3150 int rc = GVMMR0ValidateGVMandVMandEMT(pGVM, pVM, idCpu);
3151 if (RT_FAILURE(rc))
3152 return rc;
3153
3154 /* Not supported in legacy mode where we allocate the memory in ring 3 and lock it in ring 0. */
3155 if (pGMM->fLegacyAllocationMode)
3156 return VERR_NOT_SUPPORTED;
3157
3158 gmmR0MutexAcquire(pGMM);
3159 if (GMM_CHECK_SANITY_UPON_ENTERING(pGMM))
3160 {
3161 const unsigned cPages = (GMM_CHUNK_SIZE >> PAGE_SHIFT);
3162
3163 if (RT_UNLIKELY(pGVM->gmm.s.Stats.Allocated.cBasePages < cPages))
3164 {
3165 Log(("GMMR0FreeLargePage: allocated=%#llx cPages=%#x!\n", pGVM->gmm.s.Stats.Allocated.cBasePages, cPages));
3166 gmmR0MutexRelease(pGMM);
3167 return VERR_GMM_ATTEMPT_TO_FREE_TOO_MUCH;
3168 }
3169
3170 PGMMPAGE pPage = gmmR0GetPage(pGMM, idPage);
3171 if (RT_LIKELY( pPage
3172 && GMM_PAGE_IS_PRIVATE(pPage)))
3173 {
3174 PGMMCHUNK pChunk = gmmR0GetChunk(pGMM, idPage >> GMM_CHUNKID_SHIFT);
3175 Assert(pChunk);
3176 Assert(pChunk->cFree < GMM_CHUNK_NUM_PAGES);
3177 Assert(pChunk->cPrivate > 0);
3178
3179 /* Release the memory immediately. */
3180 gmmR0FreeChunk(pGMM, NULL, pChunk, false /*fRelaxedSem*/); /** @todo this can be relaxed too! */
3181
3182 /* Update accounting. */
3183 pGVM->gmm.s.Stats.Allocated.cBasePages -= cPages;
3184 pGVM->gmm.s.Stats.cPrivatePages -= cPages;
3185 pGMM->cAllocatedPages -= cPages;
3186 }
3187 else
3188 rc = VERR_GMM_PAGE_NOT_FOUND;
3189 }
3190 else
3191 rc = VERR_GMM_IS_NOT_SANE;
3192
3193 gmmR0MutexRelease(pGMM);
3194 LogFlow(("GMMR0FreeLargePage: returns %Rrc\n", rc));
3195 return rc;
3196}
3197
3198
3199/**
3200 * VMMR0 request wrapper for GMMR0FreeLargePage.
3201 *
3202 * @returns see GMMR0FreeLargePage.
3203 * @param pGVM The global (ring-0) VM structure.
3204 * @param pVM The cross context VM structure.
3205 * @param idCpu The VCPU id.
3206 * @param pReq Pointer to the request packet.
3207 */
3208GMMR0DECL(int) GMMR0FreeLargePageReq(PGVM pGVM, PVM pVM, VMCPUID idCpu, PGMMFREELARGEPAGEREQ pReq)
3209{
3210 /*
3211 * Validate input and pass it on.
3212 */
3213 AssertPtrReturn(pReq, VERR_INVALID_POINTER);
3214    AssertMsgReturn(pReq->Hdr.cbReq == sizeof(GMMFREELARGEPAGEREQ),
3215                    ("%#x != %#x\n", pReq->Hdr.cbReq, sizeof(GMMFREELARGEPAGEREQ)),
3216 VERR_INVALID_PARAMETER);
3217
3218 return GMMR0FreeLargePage(pGVM, pVM, idCpu, pReq->idPage);
3219}
3220
3221
3222/**
3223 * Frees a chunk, giving it back to the host OS.
3224 *
3225 * @param pGMM Pointer to the GMM instance.
3226 * @param pGVM This is set when called from GMMR0CleanupVM so we can
3227 * unmap and free the chunk in one go.
3228 * @param pChunk The chunk to free.
3229 * @param fRelaxedSem Whether we can release the semaphore while doing the
3230 * freeing (@c true) or not.
3231 */
3232static bool gmmR0FreeChunk(PGMM pGMM, PGVM pGVM, PGMMCHUNK pChunk, bool fRelaxedSem)
3233{
3234 Assert(pChunk->Core.Key != NIL_GMM_CHUNKID);
3235
3236 GMMR0CHUNKMTXSTATE MtxState;
3237 gmmR0ChunkMutexAcquire(&MtxState, pGMM, pChunk, GMMR0CHUNK_MTX_KEEP_GIANT);
3238
3239 /*
3240      * Cleanup hack! Unmap the chunk from the caller's address space.
3241 * This shouldn't happen, so screw lock contention...
3242 */
3243 if ( pChunk->cMappingsX
3244 && !pGMM->fLegacyAllocationMode
3245 && pGVM)
3246 gmmR0UnmapChunkLocked(pGMM, pGVM, pChunk);
3247
3248 /*
3249 * If there are current mappings of the chunk, then request the
3250 * VMs to unmap them. Reposition the chunk in the free list so
3251 * it won't be a likely candidate for allocations.
3252 */
3253 if (pChunk->cMappingsX)
3254 {
3255 /** @todo R0 -> VM request */
3256 /* The chunk can be mapped by more than one VM if fBoundMemoryMode is false! */
3257 Log(("gmmR0FreeChunk: chunk still has %d mappings; don't free!\n", pChunk->cMappingsX));
3258 gmmR0ChunkMutexRelease(&MtxState, pChunk);
3259 return false;
3260 }
3261
3262
3263 /*
3264 * Save and trash the handle.
3265 */
3266 RTR0MEMOBJ const hMemObj = pChunk->hMemObj;
3267 pChunk->hMemObj = NIL_RTR0MEMOBJ;
3268
3269 /*
3270 * Unlink it from everywhere.
3271 */
3272 gmmR0UnlinkChunk(pChunk);
3273
3274 RTListNodeRemove(&pChunk->ListNode);
3275
3276 PAVLU32NODECORE pCore = RTAvlU32Remove(&pGMM->pChunks, pChunk->Core.Key);
3277 Assert(pCore == &pChunk->Core); NOREF(pCore);
3278
3279 PGMMCHUNKTLBE pTlbe = &pGMM->ChunkTLB.aEntries[GMM_CHUNKTLB_IDX(pChunk->Core.Key)];
3280 if (pTlbe->pChunk == pChunk)
3281 {
3282 pTlbe->idChunk = NIL_GMM_CHUNKID;
3283 pTlbe->pChunk = NULL;
3284 }
3285
3286 Assert(pGMM->cChunks > 0);
3287 pGMM->cChunks--;
3288
3289 /*
3290 * Free the Chunk ID before dropping the locks and freeing the rest.
3291 */
3292 gmmR0FreeChunkId(pGMM, pChunk->Core.Key);
3293 pChunk->Core.Key = NIL_GMM_CHUNKID;
3294
3295 pGMM->cFreedChunks++;
3296
3297 gmmR0ChunkMutexRelease(&MtxState, NULL);
3298 if (fRelaxedSem)
3299 gmmR0MutexRelease(pGMM);
3300
3301 RTMemFree(pChunk->paMappingsX);
3302 pChunk->paMappingsX = NULL;
3303
3304 RTMemFree(pChunk);
3305
3306 int rc = RTR0MemObjFree(hMemObj, false /* fFreeMappings */);
3307 AssertLogRelRC(rc);
3308
3309 if (fRelaxedSem)
3310 gmmR0MutexAcquire(pGMM);
3311 return fRelaxedSem;
3312}
3313
3314
3315/**
3316 * Free page worker.
3317 *
3318 * The caller does all the statistic decrementing, we do all the incrementing.
3319 *
3320 * @param pGMM Pointer to the GMM instance data.
3321 * @param pGVM Pointer to the GVM instance.
3322 * @param pChunk Pointer to the chunk this page belongs to.
3323 * @param idPage The Page ID.
3324 * @param pPage Pointer to the page.
3325 */
3326static void gmmR0FreePageWorker(PGMM pGMM, PGVM pGVM, PGMMCHUNK pChunk, uint32_t idPage, PGMMPAGE pPage)
3327{
3328 Log3(("F pPage=%p iPage=%#x/%#x u2State=%d iFreeHead=%#x\n",
3329 pPage, pPage - &pChunk->aPages[0], idPage, pPage->Common.u2State, pChunk->iFreeHead)); NOREF(idPage);
3330
3331 /*
3332 * Put the page on the free list.
3333 */
3334 pPage->u = 0;
3335 pPage->Free.u2State = GMM_PAGE_STATE_FREE;
3336 Assert(pChunk->iFreeHead < RT_ELEMENTS(pChunk->aPages) || pChunk->iFreeHead == UINT16_MAX);
3337 pPage->Free.iNext = pChunk->iFreeHead;
3338 pChunk->iFreeHead = pPage - &pChunk->aPages[0];
3339
3340 /*
3341 * Update statistics (the cShared/cPrivate stats are up to date already),
3342 * and relink the chunk if necessary.
3343 */
3344 unsigned const cFree = pChunk->cFree;
3345 if ( !cFree
3346 || gmmR0SelectFreeSetList(cFree) != gmmR0SelectFreeSetList(cFree + 1))
3347 {
3348 gmmR0UnlinkChunk(pChunk);
3349 pChunk->cFree++;
3350 gmmR0SelectSetAndLinkChunk(pGMM, pGVM, pChunk);
3351 }
3352 else
3353 {
3354 pChunk->cFree = cFree + 1;
3355 pChunk->pSet->cFreePages++;
3356 }
3357
3358 /*
3359 * If the chunk becomes empty, consider giving memory back to the host OS.
3360 *
3361 * The current strategy is to try to give it back if there are other chunks
3362 * in this free list, meaning if there are at least 240 free pages in this
3363 * category. Note that since there are probably mappings of the chunk,
3364 * it won't be freed up instantly, which probably screws up this logic
3365 * a bit...
3366 */
3367 /** @todo Do this on the way out. */
3368 if (RT_UNLIKELY( pChunk->cFree == GMM_CHUNK_NUM_PAGES
3369 && pChunk->pFreeNext
3370 && pChunk->pFreePrev /** @todo this is probably misfiring, see reset... */
3371 && !pGMM->fLegacyAllocationMode))
3372 gmmR0FreeChunk(pGMM, NULL, pChunk, false);
3373
3374}
3375
3376
3377/**
3378 * Frees a shared page, the page is known to exist and be valid and such.
3379 *
3380 * @param pGMM Pointer to the GMM instance.
3381 * @param pGVM Pointer to the GVM instance.
3382 * @param idPage The page id.
3383 * @param pPage The page structure.
3384 */
3385DECLINLINE(void) gmmR0FreeSharedPage(PGMM pGMM, PGVM pGVM, uint32_t idPage, PGMMPAGE pPage)
3386{
3387 PGMMCHUNK pChunk = gmmR0GetChunk(pGMM, idPage >> GMM_CHUNKID_SHIFT);
3388 Assert(pChunk);
3389 Assert(pChunk->cFree < GMM_CHUNK_NUM_PAGES);
3390 Assert(pChunk->cShared > 0);
3391 Assert(pGMM->cSharedPages > 0);
3392 Assert(pGMM->cAllocatedPages > 0);
3393 Assert(!pPage->Shared.cRefs);
3394
3395 pChunk->cShared--;
3396 pGMM->cAllocatedPages--;
3397 pGMM->cSharedPages--;
3398 gmmR0FreePageWorker(pGMM, pGVM, pChunk, idPage, pPage);
3399}
3400
3401
3402/**
3403 * Frees a private page, the page is known to exist and be valid and such.
3404 *
3405 * @param pGMM Pointer to the GMM instance.
3406 * @param pGVM Pointer to the GVM instance.
3407 * @param idPage The page id.
3408 * @param pPage The page structure.
3409 */
3410DECLINLINE(void) gmmR0FreePrivatePage(PGMM pGMM, PGVM pGVM, uint32_t idPage, PGMMPAGE pPage)
3411{
3412 PGMMCHUNK pChunk = gmmR0GetChunk(pGMM, idPage >> GMM_CHUNKID_SHIFT);
3413 Assert(pChunk);
3414 Assert(pChunk->cFree < GMM_CHUNK_NUM_PAGES);
3415 Assert(pChunk->cPrivate > 0);
3416 Assert(pGMM->cAllocatedPages > 0);
3417
3418 pChunk->cPrivate--;
3419 pGMM->cAllocatedPages--;
3420 gmmR0FreePageWorker(pGMM, pGVM, pChunk, idPage, pPage);
3421}
3422
3423
3424/**
3425 * Common worker for GMMR0FreePages and GMMR0BalloonedPages.
3426 *
3427 * @returns VBox status code:
3428 * @retval xxx
3429 *
3430 * @param pGMM Pointer to the GMM instance data.
3431 * @param pGVM Pointer to the VM.
3432 * @param cPages The number of pages to free.
3433 * @param paPages Pointer to the page descriptors.
3434 * @param enmAccount The account this relates to.
3435 */
3436static int gmmR0FreePages(PGMM pGMM, PGVM pGVM, uint32_t cPages, PGMMFREEPAGEDESC paPages, GMMACCOUNT enmAccount)
3437{
3438 /*
3439     * Check that the request isn't impossible wrt the account status.
3440 */
3441 switch (enmAccount)
3442 {
3443 case GMMACCOUNT_BASE:
3444 if (RT_UNLIKELY(pGVM->gmm.s.Stats.Allocated.cBasePages < cPages))
3445 {
3446 Log(("gmmR0FreePages: allocated=%#llx cPages=%#x!\n", pGVM->gmm.s.Stats.Allocated.cBasePages, cPages));
3447 return VERR_GMM_ATTEMPT_TO_FREE_TOO_MUCH;
3448 }
3449 break;
3450 case GMMACCOUNT_SHADOW:
3451 if (RT_UNLIKELY(pGVM->gmm.s.Stats.Allocated.cShadowPages < cPages))
3452 {
3453 Log(("gmmR0FreePages: allocated=%#llx cPages=%#x!\n", pGVM->gmm.s.Stats.Allocated.cShadowPages, cPages));
3454 return VERR_GMM_ATTEMPT_TO_FREE_TOO_MUCH;
3455 }
3456 break;
3457 case GMMACCOUNT_FIXED:
3458 if (RT_UNLIKELY(pGVM->gmm.s.Stats.Allocated.cFixedPages < cPages))
3459 {
3460 Log(("gmmR0FreePages: allocated=%#llx cPages=%#x!\n", pGVM->gmm.s.Stats.Allocated.cFixedPages, cPages));
3461 return VERR_GMM_ATTEMPT_TO_FREE_TOO_MUCH;
3462 }
3463 break;
3464 default:
3465 AssertMsgFailedReturn(("enmAccount=%d\n", enmAccount), VERR_IPE_NOT_REACHED_DEFAULT_CASE);
3466 }
3467
3468 /*
3469 * Walk the descriptors and free the pages.
3470 *
3471 * Statistics (except the account) are being updated as we go along,
3472 * unlike the alloc code. Also, stop on the first error.
3473 */
3474 int rc = VINF_SUCCESS;
3475 uint32_t iPage;
3476 for (iPage = 0; iPage < cPages; iPage++)
3477 {
3478 uint32_t idPage = paPages[iPage].idPage;
3479 PGMMPAGE pPage = gmmR0GetPage(pGMM, idPage);
3480 if (RT_LIKELY(pPage))
3481 {
3482 if (RT_LIKELY(GMM_PAGE_IS_PRIVATE(pPage)))
3483 {
3484 if (RT_LIKELY(pPage->Private.hGVM == pGVM->hSelf))
3485 {
3486 Assert(pGVM->gmm.s.Stats.cPrivatePages);
3487 pGVM->gmm.s.Stats.cPrivatePages--;
3488 gmmR0FreePrivatePage(pGMM, pGVM, idPage, pPage);
3489 }
3490 else
3491 {
3492                    Log(("gmmR0FreePages: #%#x/%#x: not owner! hGVM=%#x hSelf=%#x\n", iPage, idPage,
3493 pPage->Private.hGVM, pGVM->hSelf));
3494 rc = VERR_GMM_NOT_PAGE_OWNER;
3495 break;
3496 }
3497 }
3498 else if (RT_LIKELY(GMM_PAGE_IS_SHARED(pPage)))
3499 {
3500 Assert(pGVM->gmm.s.Stats.cSharedPages);
3501 Assert(pPage->Shared.cRefs);
3502#if defined(VBOX_WITH_PAGE_SHARING) && defined(VBOX_STRICT) && HC_ARCH_BITS == 64
3503 if (pPage->Shared.u14Checksum)
3504 {
3505 uint32_t uChecksum = gmmR0StrictPageChecksum(pGMM, pGVM, idPage);
3506 uChecksum &= UINT32_C(0x00003fff);
3507 AssertMsg(!uChecksum || uChecksum == pPage->Shared.u14Checksum,
3508 ("%#x vs %#x - idPage=%#x\n", uChecksum, pPage->Shared.u14Checksum, idPage));
3509 }
3510#endif
3511 pGVM->gmm.s.Stats.cSharedPages--;
3512 if (!--pPage->Shared.cRefs)
3513 gmmR0FreeSharedPage(pGMM, pGVM, idPage, pPage);
3514 else
3515 {
3516 Assert(pGMM->cDuplicatePages);
3517 pGMM->cDuplicatePages--;
3518 }
3519 }
3520 else
3521 {
3522                Log(("gmmR0FreePages: #%#x/%#x: already free!\n", iPage, idPage));
3523 rc = VERR_GMM_PAGE_ALREADY_FREE;
3524 break;
3525 }
3526 }
3527 else
3528 {
3529            Log(("gmmR0FreePages: #%#x/%#x: not found!\n", iPage, idPage));
3530 rc = VERR_GMM_PAGE_NOT_FOUND;
3531 break;
3532 }
3533 paPages[iPage].idPage = NIL_GMM_PAGEID;
3534 }
3535
3536 /*
3537 * Update the account.
3538 */
3539 switch (enmAccount)
3540 {
3541 case GMMACCOUNT_BASE: pGVM->gmm.s.Stats.Allocated.cBasePages -= iPage; break;
3542 case GMMACCOUNT_SHADOW: pGVM->gmm.s.Stats.Allocated.cShadowPages -= iPage; break;
3543 case GMMACCOUNT_FIXED: pGVM->gmm.s.Stats.Allocated.cFixedPages -= iPage; break;
3544 default:
3545 AssertMsgFailedReturn(("enmAccount=%d\n", enmAccount), VERR_IPE_NOT_REACHED_DEFAULT_CASE);
3546 }
3547
3548 /*
3549 * Any threshold stuff to be done here?
3550 */
3551
3552 return rc;
3553}
3554
3555
3556/**
3557 * Free one or more pages.
3558 *
3559 * This is typically used at reset time or power off.
3560 *
3561 * @returns VBox status code:
3562 * @retval xxx
3563 *
3564 * @param pGVM The global (ring-0) VM structure.
3565 * @param pVM The cross context VM structure.
3566 * @param idCpu The VCPU id.
3567 * @param cPages        The number of pages to free.
3568 * @param paPages Pointer to the page descriptors containing the page IDs
3569 * for each page.
3570 * @param enmAccount The account this relates to.
3571 * @thread EMT.
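 *
 * A minimal calling sketch, for illustration only (aDescs, idPageA and
 * idPageB are hypothetical names, not taken from any real caller):
 * @code
 *      GMMFREEPAGEDESC aDescs[2];
 *      aDescs[0].idPage = idPageA;   /* page IDs obtained from a prior allocation (hypothetical) */
 *      aDescs[1].idPage = idPageB;
 *      int rc = GMMR0FreePages(pGVM, pVM, idCpu, RT_ELEMENTS(aDescs), &aDescs[0], GMMACCOUNT_BASE);
 * @endcode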
3572 */
3573GMMR0DECL(int) GMMR0FreePages(PGVM pGVM, PVM pVM, VMCPUID idCpu, uint32_t cPages, PGMMFREEPAGEDESC paPages, GMMACCOUNT enmAccount)
3574{
3575 LogFlow(("GMMR0FreePages: pGVM=%p pVM=%p cPages=%#x paPages=%p enmAccount=%d\n", pGVM, pVM, cPages, paPages, enmAccount));
3576
3577 /*
3578 * Validate input and get the basics.
3579 */
3580 PGMM pGMM;
3581 GMM_GET_VALID_INSTANCE(pGMM, VERR_GMM_INSTANCE);
3582 int rc = GVMMR0ValidateGVMandVMandEMT(pGVM, pVM, idCpu);
3583 if (RT_FAILURE(rc))
3584 return rc;
3585
3586 AssertPtrReturn(paPages, VERR_INVALID_PARAMETER);
3587 AssertMsgReturn(enmAccount > GMMACCOUNT_INVALID && enmAccount < GMMACCOUNT_END, ("%d\n", enmAccount), VERR_INVALID_PARAMETER);
3588 AssertMsgReturn(cPages > 0 && cPages < RT_BIT(32 - PAGE_SHIFT), ("%#x\n", cPages), VERR_INVALID_PARAMETER);
3589
3590 for (unsigned iPage = 0; iPage < cPages; iPage++)
3591 AssertMsgReturn( paPages[iPage].idPage <= GMM_PAGEID_LAST
3592 /*|| paPages[iPage].idPage == NIL_GMM_PAGEID*/,
3593 ("#%#x: %#x\n", iPage, paPages[iPage].idPage), VERR_INVALID_PARAMETER);
3594
3595 /*
3596 * Take the semaphore and call the worker function.
3597 */
3598 gmmR0MutexAcquire(pGMM);
3599 if (GMM_CHECK_SANITY_UPON_ENTERING(pGMM))
3600 {
3601 rc = gmmR0FreePages(pGMM, pGVM, cPages, paPages, enmAccount);
3602 GMM_CHECK_SANITY_UPON_LEAVING(pGMM);
3603 }
3604 else
3605 rc = VERR_GMM_IS_NOT_SANE;
3606 gmmR0MutexRelease(pGMM);
3607 LogFlow(("GMMR0FreePages: returns %Rrc\n", rc));
3608 return rc;
3609}
3610
3611
3612/**
3613 * VMMR0 request wrapper for GMMR0FreePages.
3614 *
3615 * @returns see GMMR0FreePages.
3616 * @param pGVM The global (ring-0) VM structure.
3617 * @param pVM The cross context VM structure.
3618 * @param idCpu The VCPU id.
3619 * @param pReq Pointer to the request packet.
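 *
 * The request is variable sized. A caller building one by hand would size it
 * with RT_UOFFSETOF_DYN, roughly as sketched below (illustration only; the
 * request header setup beyond cbReq and the paidPages source array are
 * assumptions):
 * @code
 *      pReq->Hdr.cbReq  = RT_UOFFSETOF_DYN(GMMFREEPAGESREQ, aPages[cPages]);
 *      pReq->cPages     = cPages;
 *      pReq->enmAccount = GMMACCOUNT_BASE;
 *      for (uint32_t i = 0; i < cPages; i++)
 *          pReq->aPages[i].idPage = paidPages[i]; /* hypothetical source array of page IDs */
 * @endcode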
3620 */
3621GMMR0DECL(int) GMMR0FreePagesReq(PGVM pGVM, PVM pVM, VMCPUID idCpu, PGMMFREEPAGESREQ pReq)
3622{
3623 /*
3624 * Validate input and pass it on.
3625 */
3626 AssertPtrReturn(pReq, VERR_INVALID_POINTER);
3627 AssertMsgReturn(pReq->Hdr.cbReq >= RT_UOFFSETOF(GMMFREEPAGESREQ, aPages[0]),
3628 ("%#x < %#x\n", pReq->Hdr.cbReq, RT_UOFFSETOF(GMMFREEPAGESREQ, aPages[0])),
3629 VERR_INVALID_PARAMETER);
3630 AssertMsgReturn(pReq->Hdr.cbReq == RT_UOFFSETOF_DYN(GMMFREEPAGESREQ, aPages[pReq->cPages]),
3631 ("%#x != %#x\n", pReq->Hdr.cbReq, RT_UOFFSETOF_DYN(GMMFREEPAGESREQ, aPages[pReq->cPages])),
3632 VERR_INVALID_PARAMETER);
3633
3634 return GMMR0FreePages(pGVM, pVM, idCpu, pReq->cPages, &pReq->aPages[0], pReq->enmAccount);
3635}
3636
3637
3638/**
3639 * Report back on a memory ballooning request.
3640 *
3641 * The request may or may not have been initiated by the GMM. If it was initiated
3642 * by the GMM it is important that this function is called even if no pages were
3643 * ballooned.
3644 *
3645 * @returns VBox status code:
3646 * @retval VERR_GMM_ATTEMPT_TO_FREE_TOO_MUCH
3647 * @retval VERR_GMM_ATTEMPT_TO_DEFLATE_TOO_MUCH
3648 * @retval VERR_GMM_OVERCOMMITTED_TRY_AGAIN_IN_A_BIT - reset condition
3649 * indicating that we won't necessarily have sufficient RAM to boot
3650 * the VM again and that it should pause until this changes (we'll try
3651 * balloon some other VM). (For standard deflate we have little choice
3652 * but to hope the VM won't use the memory that was returned to it.)
3653 *
3654 * @param pGVM The global (ring-0) VM structure.
3655 * @param pVM The cross context VM structure.
3656 * @param idCpu The VCPU id.
3657 * @param enmAction Inflate/deflate/reset.
3658 * @param cBalloonedPages The number of pages that was ballooned.
3659 *
3660 * @thread EMT(idCpu)
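 *
 * Illustrative example (the page count is an arbitrary assumption): reporting
 * that the guest balloon grew by 256 pages would look like
 * @code
 *      rc = GMMR0BalloonedPages(pGVM, pVM, idCpu, GMMBALLOONACTION_INFLATE, 256);
 * @endcode
 * A full reset at VM reset time uses GMMBALLOONACTION_RESET; the count is not
 * used on that path.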
3661 */
3662GMMR0DECL(int) GMMR0BalloonedPages(PGVM pGVM, PVM pVM, VMCPUID idCpu, GMMBALLOONACTION enmAction, uint32_t cBalloonedPages)
3663{
3664 LogFlow(("GMMR0BalloonedPages: pGVM=%p pVM=%p enmAction=%d cBalloonedPages=%#x\n",
3665 pGVM, pVM, enmAction, cBalloonedPages));
3666
3667 AssertMsgReturn(cBalloonedPages < RT_BIT(32 - PAGE_SHIFT), ("%#x\n", cBalloonedPages), VERR_INVALID_PARAMETER);
3668
3669 /*
3670 * Validate input and get the basics.
3671 */
3672 PGMM pGMM;
3673 GMM_GET_VALID_INSTANCE(pGMM, VERR_GMM_INSTANCE);
3674 int rc = GVMMR0ValidateGVMandVMandEMT(pGVM, pVM, idCpu);
3675 if (RT_FAILURE(rc))
3676 return rc;
3677
3678 /*
3679 * Take the semaphore and do some more validations.
3680 */
3681 gmmR0MutexAcquire(pGMM);
3682 if (GMM_CHECK_SANITY_UPON_ENTERING(pGMM))
3683 {
3684 switch (enmAction)
3685 {
3686 case GMMBALLOONACTION_INFLATE:
3687 {
3688 if (RT_LIKELY(pGVM->gmm.s.Stats.Allocated.cBasePages + pGVM->gmm.s.Stats.cBalloonedPages + cBalloonedPages
3689 <= pGVM->gmm.s.Stats.Reserved.cBasePages))
3690 {
3691 /*
3692 * Record the ballooned memory.
3693 */
3694 pGMM->cBalloonedPages += cBalloonedPages;
3695 if (pGVM->gmm.s.Stats.cReqBalloonedPages)
3696 {
3697                        /* Code path never taken. Might be interesting in the future to request ballooned memory from guests in low memory conditions. */
3698 AssertFailed();
3699
3700 pGVM->gmm.s.Stats.cBalloonedPages += cBalloonedPages;
3701 pGVM->gmm.s.Stats.cReqActuallyBalloonedPages += cBalloonedPages;
3702 Log(("GMMR0BalloonedPages: +%#x - Global=%#llx / VM: Total=%#llx Req=%#llx Actual=%#llx (pending)\n",
3703 cBalloonedPages, pGMM->cBalloonedPages, pGVM->gmm.s.Stats.cBalloonedPages,
3704 pGVM->gmm.s.Stats.cReqBalloonedPages, pGVM->gmm.s.Stats.cReqActuallyBalloonedPages));
3705 }
3706 else
3707 {
3708 pGVM->gmm.s.Stats.cBalloonedPages += cBalloonedPages;
3709 Log(("GMMR0BalloonedPages: +%#x - Global=%#llx / VM: Total=%#llx (user)\n",
3710 cBalloonedPages, pGMM->cBalloonedPages, pGVM->gmm.s.Stats.cBalloonedPages));
3711 }
3712 }
3713 else
3714 {
3715 Log(("GMMR0BalloonedPages: cBasePages=%#llx Total=%#llx cBalloonedPages=%#llx Reserved=%#llx\n",
3716 pGVM->gmm.s.Stats.Allocated.cBasePages, pGVM->gmm.s.Stats.cBalloonedPages, cBalloonedPages,
3717 pGVM->gmm.s.Stats.Reserved.cBasePages));
3718 rc = VERR_GMM_ATTEMPT_TO_FREE_TOO_MUCH;
3719 }
3720 break;
3721 }
3722
3723 case GMMBALLOONACTION_DEFLATE:
3724 {
3725 /* Deflate. */
3726 if (pGVM->gmm.s.Stats.cBalloonedPages >= cBalloonedPages)
3727 {
3728 /*
3729 * Record the ballooned memory.
3730 */
3731 Assert(pGMM->cBalloonedPages >= cBalloonedPages);
3732 pGMM->cBalloonedPages -= cBalloonedPages;
3733 pGVM->gmm.s.Stats.cBalloonedPages -= cBalloonedPages;
3734 if (pGVM->gmm.s.Stats.cReqDeflatePages)
3735 {
3736                        AssertFailed(); /* This path is for later. */
3737 Log(("GMMR0BalloonedPages: -%#x - Global=%#llx / VM: Total=%#llx Req=%#llx\n",
3738 cBalloonedPages, pGMM->cBalloonedPages, pGVM->gmm.s.Stats.cBalloonedPages, pGVM->gmm.s.Stats.cReqDeflatePages));
3739
3740 /*
3741 * Anything we need to do here now when the request has been completed?
3742 */
3743 pGVM->gmm.s.Stats.cReqDeflatePages = 0;
3744 }
3745 else
3746 Log(("GMMR0BalloonedPages: -%#x - Global=%#llx / VM: Total=%#llx (user)\n",
3747 cBalloonedPages, pGMM->cBalloonedPages, pGVM->gmm.s.Stats.cBalloonedPages));
3748 }
3749 else
3750 {
3751 Log(("GMMR0BalloonedPages: Total=%#llx cBalloonedPages=%#llx\n", pGVM->gmm.s.Stats.cBalloonedPages, cBalloonedPages));
3752 rc = VERR_GMM_ATTEMPT_TO_DEFLATE_TOO_MUCH;
3753 }
3754 break;
3755 }
3756
3757 case GMMBALLOONACTION_RESET:
3758 {
3759 /* Reset to an empty balloon. */
3760 Assert(pGMM->cBalloonedPages >= pGVM->gmm.s.Stats.cBalloonedPages);
3761
3762 pGMM->cBalloonedPages -= pGVM->gmm.s.Stats.cBalloonedPages;
3763 pGVM->gmm.s.Stats.cBalloonedPages = 0;
3764 break;
3765 }
3766
3767 default:
3768 rc = VERR_INVALID_PARAMETER;
3769 break;
3770 }
3771 GMM_CHECK_SANITY_UPON_LEAVING(pGMM);
3772 }
3773 else
3774 rc = VERR_GMM_IS_NOT_SANE;
3775
3776 gmmR0MutexRelease(pGMM);
3777 LogFlow(("GMMR0BalloonedPages: returns %Rrc\n", rc));
3778 return rc;
3779}
3780
3781
3782/**
3783 * VMMR0 request wrapper for GMMR0BalloonedPages.
3784 *
3785 * @returns see GMMR0BalloonedPages.
3786 * @param pGVM The global (ring-0) VM structure.
3787 * @param pVM The cross context VM structure.
3788 * @param idCpu The VCPU id.
3789 * @param pReq Pointer to the request packet.
3790 */
3791GMMR0DECL(int) GMMR0BalloonedPagesReq(PGVM pGVM, PVM pVM, VMCPUID idCpu, PGMMBALLOONEDPAGESREQ pReq)
3792{
3793 /*
3794 * Validate input and pass it on.
3795 */
3796 AssertPtrReturn(pReq, VERR_INVALID_POINTER);
3797 AssertMsgReturn(pReq->Hdr.cbReq == sizeof(GMMBALLOONEDPAGESREQ),
3798                    ("%#x != %#x\n", pReq->Hdr.cbReq, sizeof(GMMBALLOONEDPAGESREQ)),
3799 VERR_INVALID_PARAMETER);
3800
3801 return GMMR0BalloonedPages(pGVM, pVM, idCpu, pReq->enmAction, pReq->cBalloonedPages);
3802}
3803
3804
3805/**
3806 * Return memory statistics for the hypervisor.
3807 *
3808 * @returns VBox status code.
3809 * @param pReq Pointer to the request packet.
3810 */
3811GMMR0DECL(int) GMMR0QueryHypervisorMemoryStatsReq(PGMMMEMSTATSREQ pReq)
3812{
3813 /*
3814 * Validate input and pass it on.
3815 */
3816 AssertPtrReturn(pReq, VERR_INVALID_POINTER);
3817 AssertMsgReturn(pReq->Hdr.cbReq == sizeof(GMMMEMSTATSREQ),
3818                    ("%#x != %#x\n", pReq->Hdr.cbReq, sizeof(GMMMEMSTATSREQ)),
3819 VERR_INVALID_PARAMETER);
3820
3821 /*
3822 * Validate input and get the basics.
3823 */
3824 PGMM pGMM;
3825 GMM_GET_VALID_INSTANCE(pGMM, VERR_GMM_INSTANCE);
3826 pReq->cAllocPages = pGMM->cAllocatedPages;
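    /* Free pages = total capacity of all chunks (cChunks x pages-per-chunk) minus what is currently allocated. */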
3827    pReq->cFreePages      = (pGMM->cChunks << (GMM_CHUNK_SHIFT - PAGE_SHIFT)) - pGMM->cAllocatedPages;
3828 pReq->cBalloonedPages = pGMM->cBalloonedPages;
3829 pReq->cMaxPages = pGMM->cMaxPages;
3830 pReq->cSharedPages = pGMM->cDuplicatePages;
3831 GMM_CHECK_SANITY_UPON_LEAVING(pGMM);
3832
3833 return VINF_SUCCESS;
3834}
3835
3836
3837/**
3838 * Return memory statistics for the VM.
3839 *
3840 * @returns VBox status code.
3841 * @param pGVM The global (ring-0) VM structure.
3842 * @param pVM The cross context VM structure.
3843 * @param idCpu Cpu id.
3844 * @param pReq Pointer to the request packet.
3845 *
3846 * @thread EMT(idCpu)
3847 */
3848GMMR0DECL(int) GMMR0QueryMemoryStatsReq(PGVM pGVM, PVM pVM, VMCPUID idCpu, PGMMMEMSTATSREQ pReq)
3849{
3850 /*
3851 * Validate input and pass it on.
3852 */
3853 AssertPtrReturn(pReq, VERR_INVALID_POINTER);
3854 AssertMsgReturn(pReq->Hdr.cbReq == sizeof(GMMMEMSTATSREQ),
3855                    ("%#x != %#x\n", pReq->Hdr.cbReq, sizeof(GMMMEMSTATSREQ)),
3856 VERR_INVALID_PARAMETER);
3857
3858 /*
3859 * Validate input and get the basics.
3860 */
3861 PGMM pGMM;
3862 GMM_GET_VALID_INSTANCE(pGMM, VERR_GMM_INSTANCE);
3863 int rc = GVMMR0ValidateGVMandVMandEMT(pGVM, pVM, idCpu);
3864 if (RT_FAILURE(rc))
3865 return rc;
3866
3867 /*
3868 * Take the semaphore and do some more validations.
3869 */
3870 gmmR0MutexAcquire(pGMM);
3871 if (GMM_CHECK_SANITY_UPON_ENTERING(pGMM))
3872 {
3873 pReq->cAllocPages = pGVM->gmm.s.Stats.Allocated.cBasePages;
3874 pReq->cBalloonedPages = pGVM->gmm.s.Stats.cBalloonedPages;
3875 pReq->cMaxPages = pGVM->gmm.s.Stats.Reserved.cBasePages;
3876 pReq->cFreePages = pReq->cMaxPages - pReq->cAllocPages;
3877 }
3878 else
3879 rc = VERR_GMM_IS_NOT_SANE;
3880
3881 gmmR0MutexRelease(pGMM);
3882    LogFlow(("GMMR0QueryMemoryStatsReq: returns %Rrc\n", rc));
3883 return rc;
3884}
3885
3886
3887/**
3888 * Worker for gmmR0UnmapChunk and gmmr0FreeChunk.
3889 *
3890 * Don't call this in legacy allocation mode!
3891 *
3892 * @returns VBox status code.
3893 * @param pGMM Pointer to the GMM instance data.
3894 * @param pGVM Pointer to the Global VM structure.
3895 * @param pChunk Pointer to the chunk to be unmapped.
3896 */
3897static int gmmR0UnmapChunkLocked(PGMM pGMM, PGVM pGVM, PGMMCHUNK pChunk)
3898{
3899 Assert(!pGMM->fLegacyAllocationMode); NOREF(pGMM);
3900
3901 /*
3902 * Find the mapping and try unmapping it.
3903 */
3904 uint32_t cMappings = pChunk->cMappingsX;
3905 for (uint32_t i = 0; i < cMappings; i++)
3906 {
3907 Assert(pChunk->paMappingsX[i].pGVM && pChunk->paMappingsX[i].hMapObj != NIL_RTR0MEMOBJ);
3908 if (pChunk->paMappingsX[i].pGVM == pGVM)
3909 {
3910 /* unmap */
3911 int rc = RTR0MemObjFree(pChunk->paMappingsX[i].hMapObj, false /* fFreeMappings (NA) */);
3912 if (RT_SUCCESS(rc))
3913 {
3914 /* update the record. */
3915 cMappings--;
3916 if (i < cMappings)
3917 pChunk->paMappingsX[i] = pChunk->paMappingsX[cMappings];
3918 pChunk->paMappingsX[cMappings].hMapObj = NIL_RTR0MEMOBJ;
3919 pChunk->paMappingsX[cMappings].pGVM = NULL;
3920 Assert(pChunk->cMappingsX - 1U == cMappings);
3921 pChunk->cMappingsX = cMappings;
3922 }
3923
3924 return rc;
3925 }
3926 }
3927
3928 Log(("gmmR0UnmapChunk: Chunk %#x is not mapped into pGVM=%p/%#x\n", pChunk->Core.Key, pGVM, pGVM->hSelf));
3929 return VERR_GMM_CHUNK_NOT_MAPPED;
3930}
3931
3932
3933/**
3934 * Unmaps a chunk previously mapped into the address space of the current process.
3935 *
3936 * @returns VBox status code.
3937 * @param pGMM Pointer to the GMM instance data.
3938 * @param pGVM Pointer to the Global VM structure.
3939 * @param pChunk Pointer to the chunk to be unmapped.
3940 * @param fRelaxedSem Whether we can release the semaphore while doing the
3941 * mapping (@c true) or not.
3942 */
3943static int gmmR0UnmapChunk(PGMM pGMM, PGVM pGVM, PGMMCHUNK pChunk, bool fRelaxedSem)
3944{
3945 if (!pGMM->fLegacyAllocationMode)
3946 {
3947 /*
3948 * Lock the chunk and if possible leave the giant GMM lock.
3949 */
3950 GMMR0CHUNKMTXSTATE MtxState;
3951 int rc = gmmR0ChunkMutexAcquire(&MtxState, pGMM, pChunk,
3952 fRelaxedSem ? GMMR0CHUNK_MTX_RETAKE_GIANT : GMMR0CHUNK_MTX_KEEP_GIANT);
3953 if (RT_SUCCESS(rc))
3954 {
3955 rc = gmmR0UnmapChunkLocked(pGMM, pGVM, pChunk);
3956 gmmR0ChunkMutexRelease(&MtxState, pChunk);
3957 }
3958 return rc;
3959 }
3960
3961 if (pChunk->hGVM == pGVM->hSelf)
3962 return VINF_SUCCESS;
3963
3964 Log(("gmmR0UnmapChunk: Chunk %#x is not mapped into pGVM=%p/%#x (legacy)\n", pChunk->Core.Key, pGVM, pGVM->hSelf));
3965 return VERR_GMM_CHUNK_NOT_MAPPED;
3966}
3967
3968
3969/**
3970 * Worker for gmmR0MapChunk.
3971 *
3972 * @returns VBox status code.
3973 * @param pGMM Pointer to the GMM instance data.
3974 * @param pGVM Pointer to the Global VM structure.
3975 * @param pChunk Pointer to the chunk to be mapped.
3976 * @param ppvR3 Where to store the ring-3 address of the mapping.
3977 *                      In the VERR_GMM_CHUNK_ALREADY_MAPPED case, this will
3978 * contain the address of the existing mapping.
3979 */
3980static int gmmR0MapChunkLocked(PGMM pGMM, PGVM pGVM, PGMMCHUNK pChunk, PRTR3PTR ppvR3)
3981{
3982 /*
3983 * If we're in legacy mode this is simple.
3984 */
3985 if (pGMM->fLegacyAllocationMode)
3986 {
3987 if (pChunk->hGVM != pGVM->hSelf)
3988 {
3989 Log(("gmmR0MapChunk: chunk %#x is already mapped at %p!\n", pChunk->Core.Key, *ppvR3));
3990 return VERR_GMM_CHUNK_NOT_FOUND;
3991 }
3992
3993 *ppvR3 = RTR0MemObjAddressR3(pChunk->hMemObj);
3994 return VINF_SUCCESS;
3995 }
3996
3997 /*
3998 * Check to see if the chunk is already mapped.
3999 */
4000 for (uint32_t i = 0; i < pChunk->cMappingsX; i++)
4001 {
4002 Assert(pChunk->paMappingsX[i].pGVM && pChunk->paMappingsX[i].hMapObj != NIL_RTR0MEMOBJ);
4003 if (pChunk->paMappingsX[i].pGVM == pGVM)
4004 {
4005 *ppvR3 = RTR0MemObjAddressR3(pChunk->paMappingsX[i].hMapObj);
4006 Log(("gmmR0MapChunk: chunk %#x is already mapped at %p!\n", pChunk->Core.Key, *ppvR3));
4007#ifdef VBOX_WITH_PAGE_SHARING
4008 /* The ring-3 chunk cache can be out of sync; don't fail. */
4009 return VINF_SUCCESS;
4010#else
4011 return VERR_GMM_CHUNK_ALREADY_MAPPED;
4012#endif
4013 }
4014 }
4015
4016 /*
4017 * Do the mapping.
4018 */
4019 RTR0MEMOBJ hMapObj;
4020 int rc = RTR0MemObjMapUser(&hMapObj, pChunk->hMemObj, (RTR3PTR)-1, 0, RTMEM_PROT_READ | RTMEM_PROT_WRITE, NIL_RTR0PROCESS);
4021 if (RT_SUCCESS(rc))
4022 {
4023 /* reallocate the array? assumes few users per chunk (usually one). */
4024 unsigned iMapping = pChunk->cMappingsX;
4025 if ( iMapping <= 3
4026 || (iMapping & 3) == 0)
4027 {
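            /* Grow the mapping array one entry at a time while small (0..3 entries), then in batches of four. */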
4028 unsigned cNewSize = iMapping <= 3
4029 ? iMapping + 1
4030 : iMapping + 4;
4031 Assert(cNewSize < 4 || RT_ALIGN_32(cNewSize, 4) == cNewSize);
4032 if (RT_UNLIKELY(cNewSize > UINT16_MAX))
4033 {
4034 rc = RTR0MemObjFree(hMapObj, false /* fFreeMappings (NA) */); AssertRC(rc);
4035 return VERR_GMM_TOO_MANY_CHUNK_MAPPINGS;
4036 }
4037
4038 void *pvMappings = RTMemRealloc(pChunk->paMappingsX, cNewSize * sizeof(pChunk->paMappingsX[0]));
4039 if (RT_UNLIKELY(!pvMappings))
4040 {
4041 rc = RTR0MemObjFree(hMapObj, false /* fFreeMappings (NA) */); AssertRC(rc);
4042 return VERR_NO_MEMORY;
4043 }
4044 pChunk->paMappingsX = (PGMMCHUNKMAP)pvMappings;
4045 }
4046
4047 /* insert new entry */
4048 pChunk->paMappingsX[iMapping].hMapObj = hMapObj;
4049 pChunk->paMappingsX[iMapping].pGVM = pGVM;
4050 Assert(pChunk->cMappingsX == iMapping);
4051 pChunk->cMappingsX = iMapping + 1;
4052
4053 *ppvR3 = RTR0MemObjAddressR3(hMapObj);
4054 }
4055
4056 return rc;
4057}
4058
4059
4060/**
4061 * Maps a chunk into the user address space of the current process.
4062 *
4063 * @returns VBox status code.
4064 * @param pGMM Pointer to the GMM instance data.
4065 * @param pGVM Pointer to the Global VM structure.
4066 * @param pChunk Pointer to the chunk to be mapped.
4067 * @param fRelaxedSem Whether we can release the semaphore while doing the
4068 * mapping (@c true) or not.
4069 * @param ppvR3 Where to store the ring-3 address of the mapping.
4070 *                      In the VERR_GMM_CHUNK_ALREADY_MAPPED case, this will
4071 * contain the address of the existing mapping.
4072 */
4073static int gmmR0MapChunk(PGMM pGMM, PGVM pGVM, PGMMCHUNK pChunk, bool fRelaxedSem, PRTR3PTR ppvR3)
4074{
4075 /*
4076 * Take the chunk lock and leave the giant GMM lock when possible, then
4077 * call the worker function.
4078 */
4079 GMMR0CHUNKMTXSTATE MtxState;
4080 int rc = gmmR0ChunkMutexAcquire(&MtxState, pGMM, pChunk,
4081 fRelaxedSem ? GMMR0CHUNK_MTX_RETAKE_GIANT : GMMR0CHUNK_MTX_KEEP_GIANT);
4082 if (RT_SUCCESS(rc))
4083 {
4084 rc = gmmR0MapChunkLocked(pGMM, pGVM, pChunk, ppvR3);
4085 gmmR0ChunkMutexRelease(&MtxState, pChunk);
4086 }
4087
4088 return rc;
4089}
4090
4091
4092
4093#if defined(VBOX_WITH_PAGE_SHARING) || (defined(VBOX_STRICT) && HC_ARCH_BITS == 64)
4094/**
4095 * Check if a chunk is mapped into the specified VM.
4096 *
4097 * @returns mapped yes/no
4098 * @param pGMM Pointer to the GMM instance.
4099 * @param pGVM Pointer to the Global VM structure.
4100 * @param pChunk    Pointer to the chunk to check.
4101 * @param ppvR3 Where to store the ring-3 address of the mapping.
4102 */
4103static bool gmmR0IsChunkMapped(PGMM pGMM, PGVM pGVM, PGMMCHUNK pChunk, PRTR3PTR ppvR3)
4104{
4105 GMMR0CHUNKMTXSTATE MtxState;
4106 gmmR0ChunkMutexAcquire(&MtxState, pGMM, pChunk, GMMR0CHUNK_MTX_KEEP_GIANT);
4107 for (uint32_t i = 0; i < pChunk->cMappingsX; i++)
4108 {
4109 Assert(pChunk->paMappingsX[i].pGVM && pChunk->paMappingsX[i].hMapObj != NIL_RTR0MEMOBJ);
4110 if (pChunk->paMappingsX[i].pGVM == pGVM)
4111 {
4112 *ppvR3 = RTR0MemObjAddressR3(pChunk->paMappingsX[i].hMapObj);
4113 gmmR0ChunkMutexRelease(&MtxState, pChunk);
4114 return true;
4115 }
4116 }
4117 *ppvR3 = NULL;
4118 gmmR0ChunkMutexRelease(&MtxState, pChunk);
4119 return false;
4120}
4121#endif /* VBOX_WITH_PAGE_SHARING || (VBOX_STRICT && 64-BIT) */
4122
4123
4124/**
4125 * Map a chunk and/or unmap another chunk.
4126 *
4127 * The mapping and unmapping applies to the current process.
4128 *
4129 * This API does two things because it saves a kernel call per mapping
4130 * when the ring-3 mapping cache is full.
4131 *
4132 * @returns VBox status code.
4133 * @param pGVM The global (ring-0) VM structure.
4134 * @param pVM The cross context VM structure.
4135 * @param idChunkMap The chunk to map. NIL_GMM_CHUNKID if nothing to map.
4136 * @param idChunkUnmap The chunk to unmap. NIL_GMM_CHUNKID if nothing to unmap.
4137 * @param ppvR3 Where to store the address of the mapped chunk. NULL is ok if nothing to map.
4138 * @thread EMT ???
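 *
 * Illustrative call sketch (idChunkToMap and idChunkToUnmap are hypothetical
 * chunk IDs tracked by the ring-3 mapping cache):
 * @code
 *      RTR3PTR pvR3 = NIL_RTR3PTR;
 *      rc = GMMR0MapUnmapChunk(pGVM, pVM, idChunkToMap, idChunkToUnmap, &pvR3);
 * @endcode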
4139 */
4140GMMR0DECL(int) GMMR0MapUnmapChunk(PGVM pGVM, PVM pVM, uint32_t idChunkMap, uint32_t idChunkUnmap, PRTR3PTR ppvR3)
4141{
4142 LogFlow(("GMMR0MapUnmapChunk: pGVM=%p pVM=%p idChunkMap=%#x idChunkUnmap=%#x ppvR3=%p\n",
4143 pGVM, pVM, idChunkMap, idChunkUnmap, ppvR3));
4144
4145 /*
4146 * Validate input and get the basics.
4147 */
4148 PGMM pGMM;
4149 GMM_GET_VALID_INSTANCE(pGMM, VERR_GMM_INSTANCE);
4150 int rc = GVMMR0ValidateGVMandVM(pGVM, pVM);
4151 if (RT_FAILURE(rc))
4152 return rc;
4153
4154 AssertCompile(NIL_GMM_CHUNKID == 0);
4155 AssertMsgReturn(idChunkMap <= GMM_CHUNKID_LAST, ("%#x\n", idChunkMap), VERR_INVALID_PARAMETER);
4156 AssertMsgReturn(idChunkUnmap <= GMM_CHUNKID_LAST, ("%#x\n", idChunkUnmap), VERR_INVALID_PARAMETER);
4157
4158 if ( idChunkMap == NIL_GMM_CHUNKID
4159 && idChunkUnmap == NIL_GMM_CHUNKID)
4160 return VERR_INVALID_PARAMETER;
4161
4162 if (idChunkMap != NIL_GMM_CHUNKID)
4163 {
4164 AssertPtrReturn(ppvR3, VERR_INVALID_POINTER);
4165 *ppvR3 = NIL_RTR3PTR;
4166 }
4167
4168 /*
4169 * Take the semaphore and do the work.
4170 *
4171 * The unmapping is done last since it's easier to undo a mapping than
4172     * undoing an unmapping. The ring-3 mapping cache cannot be so big
4173     * that it pushes the user virtual address space to within a chunk of
4174     * its limits, so no problem here.
4175 */
4176 gmmR0MutexAcquire(pGMM);
4177 if (GMM_CHECK_SANITY_UPON_ENTERING(pGMM))
4178 {
4179 PGMMCHUNK pMap = NULL;
4180 if (idChunkMap != NIL_GVM_HANDLE)
4181 {
4182 pMap = gmmR0GetChunk(pGMM, idChunkMap);
4183 if (RT_LIKELY(pMap))
4184 rc = gmmR0MapChunk(pGMM, pGVM, pMap, true /*fRelaxedSem*/, ppvR3);
4185 else
4186 {
4187 Log(("GMMR0MapUnmapChunk: idChunkMap=%#x\n", idChunkMap));
4188 rc = VERR_GMM_CHUNK_NOT_FOUND;
4189 }
4190 }
4191/** @todo split this operation, the bail out might (theoretically) not be
4192 * entirely safe. */
4193
4194 if ( idChunkUnmap != NIL_GMM_CHUNKID
4195 && RT_SUCCESS(rc))
4196 {
4197 PGMMCHUNK pUnmap = gmmR0GetChunk(pGMM, idChunkUnmap);
4198 if (RT_LIKELY(pUnmap))
4199 rc = gmmR0UnmapChunk(pGMM, pGVM, pUnmap, true /*fRelaxedSem*/);
4200 else
4201 {
4202 Log(("GMMR0MapUnmapChunk: idChunkUnmap=%#x\n", idChunkUnmap));
4203 rc = VERR_GMM_CHUNK_NOT_FOUND;
4204 }
4205
4206 if (RT_FAILURE(rc) && pMap)
4207 gmmR0UnmapChunk(pGMM, pGVM, pMap, false /*fRelaxedSem*/);
4208 }
4209
4210 GMM_CHECK_SANITY_UPON_LEAVING(pGMM);
4211 }
4212 else
4213 rc = VERR_GMM_IS_NOT_SANE;
4214 gmmR0MutexRelease(pGMM);
4215
4216 LogFlow(("GMMR0MapUnmapChunk: returns %Rrc\n", rc));
4217 return rc;
4218}
4219
4220
4221/**
4222 * VMMR0 request wrapper for GMMR0MapUnmapChunk.
4223 *
4224 * @returns see GMMR0MapUnmapChunk.
4225 * @param pGVM The global (ring-0) VM structure.
4226 * @param pVM The cross context VM structure.
4227 * @param pReq Pointer to the request packet.
4228 */
4229GMMR0DECL(int) GMMR0MapUnmapChunkReq(PGVM pGVM, PVM pVM, PGMMMAPUNMAPCHUNKREQ pReq)
4230{
4231 /*
4232 * Validate input and pass it on.
4233 */
4234 AssertPtrReturn(pReq, VERR_INVALID_POINTER);
4235 AssertMsgReturn(pReq->Hdr.cbReq == sizeof(*pReq), ("%#x != %#x\n", pReq->Hdr.cbReq, sizeof(*pReq)), VERR_INVALID_PARAMETER);
4236
4237 return GMMR0MapUnmapChunk(pGVM, pVM, pReq->idChunkMap, pReq->idChunkUnmap, &pReq->pvR3);
4238}
4239
4240
4241/**
4242 * Legacy mode API for supplying pages.
4243 *
4244 * The specified user address points to an allocation chunk sized block that
4245 * will be locked down and used by the GMM when the GM asks for pages.
4246 *
4247 * @returns VBox status code.
4248 * @param pGVM The global (ring-0) VM structure.
4249 * @param pVM The cross context VM structure.
4250 * @param idCpu The VCPU id.
4251 * @param pvR3 Pointer to the chunk size memory block to lock down.
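 *
 * @note    The block at pvR3 must be page aligned and GMM_CHUNK_SIZE bytes
 *          large; the call fails with VERR_NOT_SUPPORTED unless the GMM is in
 *          legacy allocation mode.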
4252 */
4253GMMR0DECL(int) GMMR0SeedChunk(PGVM pGVM, PVM pVM, VMCPUID idCpu, RTR3PTR pvR3)
4254{
4255 /*
4256 * Validate input and get the basics.
4257 */
4258 PGMM pGMM;
4259 GMM_GET_VALID_INSTANCE(pGMM, VERR_GMM_INSTANCE);
4260 int rc = GVMMR0ValidateGVMandVMandEMT(pGVM, pVM, idCpu);
4261 if (RT_FAILURE(rc))
4262 return rc;
4263
4264 AssertPtrReturn(pvR3, VERR_INVALID_POINTER);
4265 AssertReturn(!(PAGE_OFFSET_MASK & pvR3), VERR_INVALID_POINTER);
4266
4267 if (!pGMM->fLegacyAllocationMode)
4268 {
4269 Log(("GMMR0SeedChunk: not in legacy allocation mode!\n"));
4270 return VERR_NOT_SUPPORTED;
4271 }
4272
4273 /*
4274 * Lock the memory and add it as new chunk with our hGVM.
4275 * (The GMM locking is done inside gmmR0RegisterChunk.)
4276 */
4277 RTR0MEMOBJ MemObj;
4278 rc = RTR0MemObjLockUser(&MemObj, pvR3, GMM_CHUNK_SIZE, RTMEM_PROT_READ | RTMEM_PROT_WRITE, NIL_RTR0PROCESS);
4279 if (RT_SUCCESS(rc))
4280 {
4281 rc = gmmR0RegisterChunk(pGMM, &pGVM->gmm.s.Private, MemObj, pGVM->hSelf, 0 /*fChunkFlags*/, NULL);
4282 if (RT_SUCCESS(rc))
4283 gmmR0MutexRelease(pGMM);
4284 else
4285 RTR0MemObjFree(MemObj, false /* fFreeMappings */);
4286 }
4287
4288 LogFlow(("GMMR0SeedChunk: rc=%d (pvR3=%p)\n", rc, pvR3));
4289 return rc;
4290}
4291
4292#ifdef VBOX_WITH_PAGE_SHARING
4293
4294# ifdef VBOX_STRICT
4295/**
4296 * For checksumming shared pages in strict builds.
4297 *
4298 * The purpose is making sure that a page doesn't change.
4299 *
4300 * @returns Checksum, 0 on failure.
4301 * @param pGMM The GMM instance data.
4302 * @param pGVM      Pointer to the kernel-only VM instance data.
4303 * @param idPage The page ID.
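 *
 * @note    Callers comparing the result against the page's Shared.u14Checksum
 *          field only keep the low 14 bits of this CRC-32 (see the 0x3fff
 *          masking at the call sites).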
4304 */
4305static uint32_t gmmR0StrictPageChecksum(PGMM pGMM, PGVM pGVM, uint32_t idPage)
4306{
4307 PGMMCHUNK pChunk = gmmR0GetChunk(pGMM, idPage >> GMM_CHUNKID_SHIFT);
4308 AssertMsgReturn(pChunk, ("idPage=%#x\n", idPage), 0);
4309
4310 uint8_t *pbChunk;
4311 if (!gmmR0IsChunkMapped(pGMM, pGVM, pChunk, (PRTR3PTR)&pbChunk))
4312 return 0;
4313 uint8_t const *pbPage = pbChunk + ((idPage & GMM_PAGEID_IDX_MASK) << PAGE_SHIFT);
4314
4315 return RTCrc32(pbPage, PAGE_SIZE);
4316}
4317# endif /* VBOX_STRICT */
4318
4319
4320/**
4321 * Calculates the module hash value.
4322 *
4323 * @returns Hash value.
4324 * @param pszModuleName The module name.
4325 * @param pszVersion The module version string.
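 *
 * @note    The module name and version are hashed together with a "::"
 *          separator, so the pair identifies the module.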
4326 */
4327static uint32_t gmmR0ShModCalcHash(const char *pszModuleName, const char *pszVersion)
4328{
4329 return RTStrHash1ExN(3, pszModuleName, RTSTR_MAX, "::", (size_t)2, pszVersion, RTSTR_MAX);
4330}
4331
4332
4333/**
4334 * Finds a global module.
4335 *
4336 * @returns Pointer to the global module on success, NULL if not found.
4337 * @param pGMM The GMM instance data.
4338 * @param uHash The hash as calculated by gmmR0ShModCalcHash.
4339 * @param cbModule The module size.
4340 * @param enmGuestOS The guest OS type.
4341 * @param cRegions The number of regions.
4342 * @param pszModuleName The module name.
4343 * @param pszVersion The module version.
4344 * @param paRegions The region descriptions.
4345 */
4346static PGMMSHAREDMODULE gmmR0ShModFindGlobal(PGMM pGMM, uint32_t uHash, uint32_t cbModule, VBOXOSFAMILY enmGuestOS,
4347 uint32_t cRegions, const char *pszModuleName, const char *pszVersion,
4348 struct VMMDEVSHAREDREGIONDESC const *paRegions)
4349{
4350 for (PGMMSHAREDMODULE pGblMod = (PGMMSHAREDMODULE)RTAvllU32Get(&pGMM->pGlobalSharedModuleTree, uHash);
4351 pGblMod;
4352 pGblMod = (PGMMSHAREDMODULE)pGblMod->Core.pList)
4353 {
4354 if (pGblMod->cbModule != cbModule)
4355 continue;
4356 if (pGblMod->enmGuestOS != enmGuestOS)
4357 continue;
4358 if (pGblMod->cRegions != cRegions)
4359 continue;
4360 if (strcmp(pGblMod->szName, pszModuleName))
4361 continue;
4362 if (strcmp(pGblMod->szVersion, pszVersion))
4363 continue;
4364
4365 uint32_t i;
4366 for (i = 0; i < cRegions; i++)
4367 {
4368 uint32_t off = paRegions[i].GCRegionAddr & PAGE_OFFSET_MASK;
4369 if (pGblMod->aRegions[i].off != off)
4370 break;
4371
4372 uint32_t cb = RT_ALIGN_32(paRegions[i].cbRegion + off, PAGE_SIZE);
4373 if (pGblMod->aRegions[i].cb != cb)
4374 break;
4375 }
4376
4377 if (i == cRegions)
4378 return pGblMod;
4379 }
4380
4381 return NULL;
4382}
4383
4384
4385/**
4386 * Creates a new global module.
4387 *
4388 * @returns VBox status code.
4389 * @param pGMM The GMM instance data.
4390 * @param uHash The hash as calculated by gmmR0ShModCalcHash.
4391 * @param cbModule The module size.
4392 * @param enmGuestOS The guest OS type.
4393 * @param cRegions The number of regions.
4394 * @param pszModuleName The module name.
4395 * @param pszVersion The module version.
4396 * @param paRegions The region descriptions.
4397 * @param ppGblMod Where to return the new module on success.
4398 */
4399static int gmmR0ShModNewGlobal(PGMM pGMM, uint32_t uHash, uint32_t cbModule, VBOXOSFAMILY enmGuestOS,
4400 uint32_t cRegions, const char *pszModuleName, const char *pszVersion,
4401 struct VMMDEVSHAREDREGIONDESC const *paRegions, PGMMSHAREDMODULE *ppGblMod)
4402{
4403 Log(("gmmR0ShModNewGlobal: %s %s size %#x os %u rgn %u\n", pszModuleName, pszVersion, cbModule, enmGuestOS, cRegions));
4404 if (pGMM->cShareableModules >= GMM_MAX_SHARED_GLOBAL_MODULES)
4405 {
4406 Log(("gmmR0ShModNewGlobal: Too many modules\n"));
4407 return VERR_GMM_TOO_MANY_GLOBAL_MODULES;
4408 }
4409
4410 PGMMSHAREDMODULE pGblMod = (PGMMSHAREDMODULE)RTMemAllocZ(RT_UOFFSETOF_DYN(GMMSHAREDMODULE, aRegions[cRegions]));
4411 if (!pGblMod)
4412 {
4413 Log(("gmmR0ShModNewGlobal: No memory\n"));
4414 return VERR_NO_MEMORY;
4415 }
4416
4417 pGblMod->Core.Key = uHash;
4418 pGblMod->cbModule = cbModule;
4419 pGblMod->cRegions = cRegions;
4420 pGblMod->cUsers = 1;
4421 pGblMod->enmGuestOS = enmGuestOS;
4422 strcpy(pGblMod->szName, pszModuleName);
4423 strcpy(pGblMod->szVersion, pszVersion);
4424
4425 for (uint32_t i = 0; i < cRegions; i++)
4426 {
4427 Log(("gmmR0ShModNewGlobal: rgn[%u]=%RGvLB%#x\n", i, paRegions[i].GCRegionAddr, paRegions[i].cbRegion));
4428 pGblMod->aRegions[i].off = paRegions[i].GCRegionAddr & PAGE_OFFSET_MASK;
4429 pGblMod->aRegions[i].cb = paRegions[i].cbRegion + pGblMod->aRegions[i].off;
4430 pGblMod->aRegions[i].cb = RT_ALIGN_32(pGblMod->aRegions[i].cb, PAGE_SIZE);
4431 pGblMod->aRegions[i].paidPages = NULL; /* allocated when needed. */
4432 }
4433
4434 bool fInsert = RTAvllU32Insert(&pGMM->pGlobalSharedModuleTree, &pGblMod->Core);
4435 Assert(fInsert); NOREF(fInsert);
4436 pGMM->cShareableModules++;
4437
4438 *ppGblMod = pGblMod;
4439 return VINF_SUCCESS;
4440}
4441
4442
4443/**
4444 * Deletes a global module which is no longer referenced by anyone.
4445 *
4446 * @param pGMM The GMM instance data.
4447 * @param pGblMod The module to delete.
4448 */
4449static void gmmR0ShModDeleteGlobal(PGMM pGMM, PGMMSHAREDMODULE pGblMod)
4450{
4451 Assert(pGblMod->cUsers == 0);
4452 Assert(pGMM->cShareableModules > 0 && pGMM->cShareableModules <= GMM_MAX_SHARED_GLOBAL_MODULES);
4453
4454 void *pvTest = RTAvllU32RemoveNode(&pGMM->pGlobalSharedModuleTree, &pGblMod->Core);
4455 Assert(pvTest == pGblMod); NOREF(pvTest);
4456 pGMM->cShareableModules--;
4457
4458 uint32_t i = pGblMod->cRegions;
4459 while (i-- > 0)
4460 {
4461 if (pGblMod->aRegions[i].paidPages)
4462 {
4463            /* We don't do anything to the pages as they are handled by the
4464 copy-on-write mechanism in PGM. */
4465 RTMemFree(pGblMod->aRegions[i].paidPages);
4466 pGblMod->aRegions[i].paidPages = NULL;
4467 }
4468 }
4469 RTMemFree(pGblMod);
4470}
4471
4472
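/**
 * Creates and inserts a per-VM shared module record, keyed on the module base
 * address; counterpart to gmmR0ShModDeletePerVM.
 *
 * @returns VBox status code.
 * @param   pGVM        Pointer to the GVM instance data.
 * @param   GCBaseAddr  The module base address (used as the tree key).
 * @param   cRegions    The number of regions.
 * @param   paRegions   The region descriptions.
 * @param   ppRecVM     Where to return the new record on success.
 */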
4473static int gmmR0ShModNewPerVM(PGVM pGVM, RTGCPTR GCBaseAddr, uint32_t cRegions, const VMMDEVSHAREDREGIONDESC *paRegions,
4474 PGMMSHAREDMODULEPERVM *ppRecVM)
4475{
4476 if (pGVM->gmm.s.Stats.cShareableModules >= GMM_MAX_SHARED_PER_VM_MODULES)
4477 return VERR_GMM_TOO_MANY_PER_VM_MODULES;
4478
4479 PGMMSHAREDMODULEPERVM pRecVM;
4480 pRecVM = (PGMMSHAREDMODULEPERVM)RTMemAllocZ(RT_UOFFSETOF_DYN(GMMSHAREDMODULEPERVM, aRegionsGCPtrs[cRegions]));
4481 if (!pRecVM)
4482 return VERR_NO_MEMORY;
4483
4484 pRecVM->Core.Key = GCBaseAddr;
4485 for (uint32_t i = 0; i < cRegions; i++)
4486 pRecVM->aRegionsGCPtrs[i] = paRegions[i].GCRegionAddr;
4487
4488 bool fInsert = RTAvlGCPtrInsert(&pGVM->gmm.s.pSharedModuleTree, &pRecVM->Core);
4489 Assert(fInsert); NOREF(fInsert);
4490 pGVM->gmm.s.Stats.cShareableModules++;
4491
4492 *ppRecVM = pRecVM;
4493 return VINF_SUCCESS;
4494}
4495
4496
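/**
 * Deletes a per-VM shared module record and releases its reference to the
 * global module, deleting that too once its user count reaches zero.
 *
 * @param   pGMM        Pointer to the GMM instance data.
 * @param   pGVM        Pointer to the GVM instance data.
 * @param   pRecVM      The per-VM record to delete.
 * @param   fRemove     Whether to remove the record from the VM's shared
 *                      module tree first.
 */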
4497static void gmmR0ShModDeletePerVM(PGMM pGMM, PGVM pGVM, PGMMSHAREDMODULEPERVM pRecVM, bool fRemove)
4498{
4499 /*
4500 * Free the per-VM module.
4501 */
4502 PGMMSHAREDMODULE pGblMod = pRecVM->pGlobalModule;
4503 pRecVM->pGlobalModule = NULL;
4504
4505 if (fRemove)
4506 {
4507 void *pvTest = RTAvlGCPtrRemove(&pGVM->gmm.s.pSharedModuleTree, pRecVM->Core.Key);
4508 Assert(pvTest == &pRecVM->Core); NOREF(pvTest);
4509 }
4510
4511 RTMemFree(pRecVM);
4512
4513 /*
4514 * Release the global module.
4515 * (In the registration bailout case, it might not be.)
4516 */
4517 if (pGblMod)
4518 {
4519 Assert(pGblMod->cUsers > 0);
4520 pGblMod->cUsers--;
4521 if (pGblMod->cUsers == 0)
4522 gmmR0ShModDeleteGlobal(pGMM, pGblMod);
4523 }
4524}
4525
4526#endif /* VBOX_WITH_PAGE_SHARING */
4527
4528/**
4529 * Registers a new shared module for the VM.
4530 *
4531 * @returns VBox status code.
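 * @retval  VINF_GMM_SHARED_MODULE_ALREADY_REGISTERED if this module is already
 *          registered at the given base address.
 * @retval  VERR_GMM_SHARED_MODULE_ADDRESS_CLASH if a different module is
 *          already registered at the given base address.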
4532 * @param pGVM The global (ring-0) VM structure.
4533 * @param pVM The cross context VM structure.
4534 * @param idCpu The VCPU id.
4535 * @param enmGuestOS The guest OS type.
4536 * @param pszModuleName The module name.
4537 * @param pszVersion The module version.
4538 * @param GCPtrModBase The module base address.
4539 * @param cbModule The module size.
4540 * @param cRegions      The number of shared region descriptors.
4541 * @param paRegions Pointer to an array of shared region(s).
4542 * @thread EMT(idCpu)
4543 */
4544GMMR0DECL(int) GMMR0RegisterSharedModule(PGVM pGVM, PVM pVM, VMCPUID idCpu, VBOXOSFAMILY enmGuestOS, char *pszModuleName,
4545 char *pszVersion, RTGCPTR GCPtrModBase, uint32_t cbModule,
4546 uint32_t cRegions, struct VMMDEVSHAREDREGIONDESC const *paRegions)
4547{
4548#ifdef VBOX_WITH_PAGE_SHARING
4549 /*
4550 * Validate input and get the basics.
4551 *
4552     * Note! Turns out the module size doesn't necessarily match the size of the
4553 * regions. (iTunes on XP)
4554 */
4555 PGMM pGMM;
4556 GMM_GET_VALID_INSTANCE(pGMM, VERR_GMM_INSTANCE);
4557 int rc = GVMMR0ValidateGVMandVMandEMT(pGVM, pVM, idCpu);
4558 if (RT_FAILURE(rc))
4559 return rc;
4560
4561 if (RT_UNLIKELY(cRegions > VMMDEVSHAREDREGIONDESC_MAX))
4562 return VERR_GMM_TOO_MANY_REGIONS;
4563
4564 if (RT_UNLIKELY(cbModule == 0 || cbModule > _1G))
4565 return VERR_GMM_BAD_SHARED_MODULE_SIZE;
4566
4567 uint32_t cbTotal = 0;
4568 for (uint32_t i = 0; i < cRegions; i++)
4569 {
4570 if (RT_UNLIKELY(paRegions[i].cbRegion == 0 || paRegions[i].cbRegion > _1G))
4571 return VERR_GMM_SHARED_MODULE_BAD_REGIONS_SIZE;
4572
4573 cbTotal += paRegions[i].cbRegion;
4574 if (RT_UNLIKELY(cbTotal > _1G))
4575 return VERR_GMM_SHARED_MODULE_BAD_REGIONS_SIZE;
4576 }
4577
4578 AssertPtrReturn(pszModuleName, VERR_INVALID_POINTER);
4579 if (RT_UNLIKELY(!memchr(pszModuleName, '\0', GMM_SHARED_MODULE_MAX_NAME_STRING)))
4580 return VERR_GMM_MODULE_NAME_TOO_LONG;
4581
4582 AssertPtrReturn(pszVersion, VERR_INVALID_POINTER);
4583 if (RT_UNLIKELY(!memchr(pszVersion, '\0', GMM_SHARED_MODULE_MAX_VERSION_STRING)))
4584 return VERR_GMM_MODULE_NAME_TOO_LONG;
4585
4586 uint32_t const uHash = gmmR0ShModCalcHash(pszModuleName, pszVersion);
4587 Log(("GMMR0RegisterSharedModule %s %s base %RGv size %x hash %x\n", pszModuleName, pszVersion, GCPtrModBase, cbModule, uHash));
4588
4589 /*
4590 * Take the semaphore and do some more validations.
4591 */
4592 gmmR0MutexAcquire(pGMM);
4593 if (GMM_CHECK_SANITY_UPON_ENTERING(pGMM))
4594 {
4595 /*
4596 * Check if this module is already locally registered and register
4597 * it if it isn't. The base address is a unique module identifier
4598 * locally.
4599 */
4600 PGMMSHAREDMODULEPERVM pRecVM = (PGMMSHAREDMODULEPERVM)RTAvlGCPtrGet(&pGVM->gmm.s.pSharedModuleTree, GCPtrModBase);
4601 bool fNewModule = pRecVM == NULL;
4602 if (fNewModule)
4603 {
4604 rc = gmmR0ShModNewPerVM(pGVM, GCPtrModBase, cRegions, paRegions, &pRecVM);
4605 if (RT_SUCCESS(rc))
4606 {
4607 /*
4608 * Find a matching global module, register a new one if needed.
4609 */
4610 PGMMSHAREDMODULE pGblMod = gmmR0ShModFindGlobal(pGMM, uHash, cbModule, enmGuestOS, cRegions,
4611 pszModuleName, pszVersion, paRegions);
4612 if (!pGblMod)
4613 {
4614 Assert(fNewModule);
4615 rc = gmmR0ShModNewGlobal(pGMM, uHash, cbModule, enmGuestOS, cRegions,
4616 pszModuleName, pszVersion, paRegions, &pGblMod);
4617 if (RT_SUCCESS(rc))
4618 {
4619                        pRecVM->pGlobalModule = pGblMod; /* (One reference returned by gmmR0ShModNewGlobal.) */
4620 Log(("GMMR0RegisterSharedModule: new module %s %s\n", pszModuleName, pszVersion));
4621 }
4622 else
4623 gmmR0ShModDeletePerVM(pGMM, pGVM, pRecVM, true /*fRemove*/);
4624 }
4625 else
4626 {
4627 Assert(pGblMod->cUsers > 0 && pGblMod->cUsers < UINT32_MAX / 2);
4628 pGblMod->cUsers++;
4629 pRecVM->pGlobalModule = pGblMod;
4630
4631 Log(("GMMR0RegisterSharedModule: new per vm module %s %s, gbl users %d\n", pszModuleName, pszVersion, pGblMod->cUsers));
4632 }
4633 }
4634 }
4635 else
4636 {
4637 /*
4638 * Attempt to re-register an existing module.
4639 */
4640 PGMMSHAREDMODULE pGblMod = gmmR0ShModFindGlobal(pGMM, uHash, cbModule, enmGuestOS, cRegions,
4641 pszModuleName, pszVersion, paRegions);
4642 if (pRecVM->pGlobalModule == pGblMod)
4643 {
4644 Log(("GMMR0RegisterSharedModule: already registered %s %s, gbl users %d\n", pszModuleName, pszVersion, pGblMod->cUsers));
4645 rc = VINF_GMM_SHARED_MODULE_ALREADY_REGISTERED;
4646 }
4647 else
4648 {
4649 /** @todo may have to unregister+register when this happens in case it's caused
4650 * by VBoxService crashing and being restarted... */
4651 Log(("GMMR0RegisterSharedModule: Address clash!\n"
4652 " incoming at %RGvLB%#x %s %s rgns %u\n"
4653 " existing at %RGvLB%#x %s %s rgns %u\n",
4654 GCPtrModBase, cbModule, pszModuleName, pszVersion, cRegions,
4655 pRecVM->Core.Key, pRecVM->pGlobalModule->cbModule, pRecVM->pGlobalModule->szName,
4656 pRecVM->pGlobalModule->szVersion, pRecVM->pGlobalModule->cRegions));
4657 rc = VERR_GMM_SHARED_MODULE_ADDRESS_CLASH;
4658 }
4659 }
4660 GMM_CHECK_SANITY_UPON_LEAVING(pGMM);
4661 }
4662 else
4663 rc = VERR_GMM_IS_NOT_SANE;
4664
4665 gmmR0MutexRelease(pGMM);
4666 return rc;
4667#else
4668
4669 NOREF(pGVM); NOREF(pVM); NOREF(idCpu); NOREF(enmGuestOS); NOREF(pszModuleName); NOREF(pszVersion);
4670 NOREF(GCPtrModBase); NOREF(cbModule); NOREF(cRegions); NOREF(paRegions);
4671 return VERR_NOT_IMPLEMENTED;
4672#endif
4673}
4674
4675
4676/**
4677 * VMMR0 request wrapper for GMMR0RegisterSharedModule.
4678 *
4679 * @returns see GMMR0RegisterSharedModule.
4680 * @param pGVM The global (ring-0) VM structure.
4681 * @param pVM The cross context VM structure.
4682 * @param idCpu The VCPU id.
4683 * @param pReq Pointer to the request packet.
4684 */
4685GMMR0DECL(int) GMMR0RegisterSharedModuleReq(PGVM pGVM, PVM pVM, VMCPUID idCpu, PGMMREGISTERSHAREDMODULEREQ pReq)
4686{
4687 /*
4688 * Validate input and pass it on.
4689 */
4690 AssertPtrReturn(pReq, VERR_INVALID_POINTER);
4691 AssertMsgReturn( pReq->Hdr.cbReq >= sizeof(*pReq)
4692 && pReq->Hdr.cbReq == RT_UOFFSETOF_DYN(GMMREGISTERSHAREDMODULEREQ, aRegions[pReq->cRegions]),
4693 ("%#x != %#x\n", pReq->Hdr.cbReq, sizeof(*pReq)), VERR_INVALID_PARAMETER);
4694
4695 /* Pass back return code in the request packet to preserve informational codes. (VMMR3CallR0 chokes on them) */
4696 pReq->rc = GMMR0RegisterSharedModule(pGVM, pVM, idCpu, pReq->enmGuestOS, pReq->szName, pReq->szVersion,
4697 pReq->GCBaseAddr, pReq->cbModule, pReq->cRegions, pReq->aRegions);
4698 return VINF_SUCCESS;
4699}
4700
4701
4702/**
4703 * Unregisters a shared module for the VM.
4704 *
4705 * @returns VBox status code.
4706 * @param pGVM The global (ring-0) VM structure.
4707 * @param pVM The cross context VM structure.
4708 * @param idCpu The VCPU id.
4709 * @param pszModuleName The module name.
4710 * @param pszVersion The module version.
4711 * @param GCPtrModBase The module base address.
4712 * @param cbModule The module size.
4713 */
4714GMMR0DECL(int) GMMR0UnregisterSharedModule(PGVM pGVM, PVM pVM, VMCPUID idCpu, char *pszModuleName, char *pszVersion,
4715 RTGCPTR GCPtrModBase, uint32_t cbModule)
4716{
4717#ifdef VBOX_WITH_PAGE_SHARING
4718 /*
4719 * Validate input and get the basics.
4720 */
4721 PGMM pGMM;
4722 GMM_GET_VALID_INSTANCE(pGMM, VERR_GMM_INSTANCE);
4723 int rc = GVMMR0ValidateGVMandVMandEMT(pGVM, pVM, idCpu);
4724 if (RT_FAILURE(rc))
4725 return rc;
4726
4727 AssertPtrReturn(pszModuleName, VERR_INVALID_POINTER);
4728 AssertPtrReturn(pszVersion, VERR_INVALID_POINTER);
4729 if (RT_UNLIKELY(!memchr(pszModuleName, '\0', GMM_SHARED_MODULE_MAX_NAME_STRING)))
4730 return VERR_GMM_MODULE_NAME_TOO_LONG;
4731 if (RT_UNLIKELY(!memchr(pszVersion, '\0', GMM_SHARED_MODULE_MAX_VERSION_STRING)))
4732 return VERR_GMM_MODULE_NAME_TOO_LONG;
4733
4734 Log(("GMMR0UnregisterSharedModule %s %s base=%RGv size %x\n", pszModuleName, pszVersion, GCPtrModBase, cbModule));
4735
4736 /*
4737 * Take the semaphore and do some more validations.
4738 */
4739 gmmR0MutexAcquire(pGMM);
4740 if (GMM_CHECK_SANITY_UPON_ENTERING(pGMM))
4741 {
4742 /*
4743 * Locate and remove the specified module.
4744 */
4745 PGMMSHAREDMODULEPERVM pRecVM = (PGMMSHAREDMODULEPERVM)RTAvlGCPtrGet(&pGVM->gmm.s.pSharedModuleTree, GCPtrModBase);
4746 if (pRecVM)
4747 {
4748 /** @todo Do we need to do more validations here, like that the
4749 * name + version + cbModule matches? */
4750 NOREF(cbModule);
4751 Assert(pRecVM->pGlobalModule);
4752 gmmR0ShModDeletePerVM(pGMM, pGVM, pRecVM, true /*fRemove*/);
4753 }
4754 else
4755 rc = VERR_GMM_SHARED_MODULE_NOT_FOUND;
4756
4757 GMM_CHECK_SANITY_UPON_LEAVING(pGMM);
4758 }
4759 else
4760 rc = VERR_GMM_IS_NOT_SANE;
4761
4762 gmmR0MutexRelease(pGMM);
4763 return rc;
4764#else
4765
4766 NOREF(pGVM); NOREF(pVM); NOREF(idCpu); NOREF(pszModuleName); NOREF(pszVersion); NOREF(GCPtrModBase); NOREF(cbModule);
4767 return VERR_NOT_IMPLEMENTED;
4768#endif
4769}
4770
4771
4772/**
4773 * VMMR0 request wrapper for GMMR0UnregisterSharedModule.
4774 *
4775 * @returns see GMMR0UnregisterSharedModule.
4776 * @param pGVM The global (ring-0) VM structure.
4777 * @param pVM The cross context VM structure.
4778 * @param idCpu The VCPU id.
4779 * @param pReq Pointer to the request packet.
4780 */
4781GMMR0DECL(int) GMMR0UnregisterSharedModuleReq(PGVM pGVM, PVM pVM, VMCPUID idCpu, PGMMUNREGISTERSHAREDMODULEREQ pReq)
4782{
4783 /*
4784 * Validate input and pass it on.
4785 */
4786 AssertPtrReturn(pReq, VERR_INVALID_POINTER);
4787 AssertMsgReturn(pReq->Hdr.cbReq == sizeof(*pReq), ("%#x != %#x\n", pReq->Hdr.cbReq, sizeof(*pReq)), VERR_INVALID_PARAMETER);
4788
4789 return GMMR0UnregisterSharedModule(pGVM, pVM, idCpu, pReq->szName, pReq->szVersion, pReq->GCBaseAddr, pReq->cbModule);
4790}
4791
4792#ifdef VBOX_WITH_PAGE_SHARING
4793
4794/**
4795 * Increase the use count of a shared page, the page is known to exist and be valid and such.
4796 *
4797 * @param pGMM Pointer to the GMM instance.
4798 * @param pGVM Pointer to the GVM instance.
4799 * @param pPage The page structure.
4800 */
4801DECLINLINE(void) gmmR0UseSharedPage(PGMM pGMM, PGVM pGVM, PGMMPAGE pPage)
4802{
4803 Assert(pGMM->cSharedPages > 0);
4804 Assert(pGMM->cAllocatedPages > 0);
4805
4806 pGMM->cDuplicatePages++;
4807
4808 pPage->Shared.cRefs++;
4809 pGVM->gmm.s.Stats.cSharedPages++;
4810 pGVM->gmm.s.Stats.Allocated.cBasePages++;
4811}
4812
4813
4814/**
4815 * Converts a private page to a shared page, the page is known to exist and be valid and such.
4816 *
4817 * @param pGMM Pointer to the GMM instance.
4818 * @param pGVM Pointer to the GVM instance.
4819 * @param HCPhys Host physical address
4820 * @param idPage The Page ID
4821 * @param pPage The page structure.
4822 * @param pPageDesc Shared page descriptor
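 *
 * @note    The reference count of the now shared page starts at 1; in strict
 *          builds a page checksum is computed and recorded both in pPageDesc
 *          and (truncated) in the page structure.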
4823 */
4824DECLINLINE(void) gmmR0ConvertToSharedPage(PGMM pGMM, PGVM pGVM, RTHCPHYS HCPhys, uint32_t idPage, PGMMPAGE pPage,
4825 PGMMSHAREDPAGEDESC pPageDesc)
4826{
4827 PGMMCHUNK pChunk = gmmR0GetChunk(pGMM, idPage >> GMM_CHUNKID_SHIFT);
4828 Assert(pChunk);
4829 Assert(pChunk->cFree < GMM_CHUNK_NUM_PAGES);
4830 Assert(GMM_PAGE_IS_PRIVATE(pPage));
4831
4832 pChunk->cPrivate--;
4833 pChunk->cShared++;
4834
4835 pGMM->cSharedPages++;
4836
4837 pGVM->gmm.s.Stats.cSharedPages++;
4838 pGVM->gmm.s.Stats.cPrivatePages--;
4839
4840 /* Modify the page structure. */
4841 pPage->Shared.pfn = (uint32_t)(uint64_t)(HCPhys >> PAGE_SHIFT);
4842 pPage->Shared.cRefs = 1;
4843#ifdef VBOX_STRICT
4844 pPageDesc->u32StrictChecksum = gmmR0StrictPageChecksum(pGMM, pGVM, idPage);
4845 pPage->Shared.u14Checksum = pPageDesc->u32StrictChecksum;
4846#else
4847 NOREF(pPageDesc);
4848 pPage->Shared.u14Checksum = 0;
4849#endif
4850 pPage->Shared.u2State = GMM_PAGE_STATE_SHARED;
4851}
4852
4853
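/**
 * Worker for GMMR0SharedModuleCheckPage for pages seen for the first time:
 * converts the VM's private page into a shared page and records the page ID
 * in the global region's page array.
 *
 * @returns VBox status code.
 */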
4854static int gmmR0SharedModuleCheckPageFirstTime(PGMM pGMM, PGVM pGVM, PGMMSHAREDMODULE pModule,
4855 unsigned idxRegion, unsigned idxPage,
4856 PGMMSHAREDPAGEDESC pPageDesc, PGMMSHAREDREGIONDESC pGlobalRegion)
4857{
4858 NOREF(pModule);
4859
4860 /* Easy case: just change the internal page type. */
4861 PGMMPAGE pPage = gmmR0GetPage(pGMM, pPageDesc->idPage);
4862 AssertMsgReturn(pPage, ("idPage=%#x (GCPhys=%RGp HCPhys=%RHp idxRegion=%#x idxPage=%#x) #1\n",
4863 pPageDesc->idPage, pPageDesc->GCPhys, pPageDesc->HCPhys, idxRegion, idxPage),
4864 VERR_PGM_PHYS_INVALID_PAGE_ID);
4865 NOREF(idxRegion);
4866
4867 AssertMsg(pPageDesc->GCPhys == (pPage->Private.pfn << 12), ("desc %RGp gmm %RGp\n", pPageDesc->HCPhys, (pPage->Private.pfn << 12)));
4868
4869 gmmR0ConvertToSharedPage(pGMM, pGVM, pPageDesc->HCPhys, pPageDesc->idPage, pPage, pPageDesc);
4870
4871 /* Keep track of these references. */
4872 pGlobalRegion->paidPages[idxPage] = pPageDesc->idPage;
4873
4874 return VINF_SUCCESS;
4875}
4876
4877/**
4878 * Checks the specified shared module range for changes.
4879 *
4880 * Performs the following tasks:
4881 * - If a shared page is new, then it changes the GMM page type to shared and
4882 * returns it in the pPageDesc descriptor.
4883 * - If a shared page already exists, then it checks if the VM page is
4884 * identical and if so frees the VM page and returns the shared page in
4885 * pPageDesc descriptor.
4886 *
4887 * @remarks ASSUMES the caller has acquired the GMM semaphore!!
4888 *
4889 * @returns VBox status code.
4890 * @param pGVM Pointer to the GVM instance data.
4891 * @param pModule Module description
4892 * @param idxRegion Region index
4893 * @param idxPage Page index
4894 * @param pPageDesc Page descriptor
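 *
 * @note    On success the descriptor is updated: if the page was replaced by
 *          an existing shared copy, pPageDesc->idPage and pPageDesc->HCPhys
 *          refer to that shared page; if the local and shared copies turned
 *          out to differ, pPageDesc->idPage is set to NIL_GMM_PAGEID to signal
 *          that nothing changed.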
4895 */
4896GMMR0DECL(int) GMMR0SharedModuleCheckPage(PGVM pGVM, PGMMSHAREDMODULE pModule, uint32_t idxRegion, uint32_t idxPage,
4897 PGMMSHAREDPAGEDESC pPageDesc)
4898{
4899 int rc;
4900 PGMM pGMM;
4901 GMM_GET_VALID_INSTANCE(pGMM, VERR_GMM_INSTANCE);
4902 pPageDesc->u32StrictChecksum = 0;
4903
4904 AssertMsgReturn(idxRegion < pModule->cRegions,
4905 ("idxRegion=%#x cRegions=%#x %s %s\n", idxRegion, pModule->cRegions, pModule->szName, pModule->szVersion),
4906 VERR_INVALID_PARAMETER);
4907
4908 uint32_t const cPages = pModule->aRegions[idxRegion].cb >> PAGE_SHIFT;
4909 AssertMsgReturn(idxPage < cPages,
4910                    ("idxPage=%#x cPages=%#x %s %s\n", idxPage, cPages, pModule->szName, pModule->szVersion),
4911 VERR_INVALID_PARAMETER);
4912
4913    LogFlow(("GMMR0SharedModuleCheckPage %s base %RGv region %d idxPage %d\n", pModule->szName, pModule->Core.Key, idxRegion, idxPage));
4914
4915 /*
4916 * First time; create a page descriptor array.
4917 */
4918 PGMMSHAREDREGIONDESC pGlobalRegion = &pModule->aRegions[idxRegion];
4919 if (!pGlobalRegion->paidPages)
4920 {
4921 Log(("Allocate page descriptor array for %d pages\n", cPages));
4922 pGlobalRegion->paidPages = (uint32_t *)RTMemAlloc(cPages * sizeof(pGlobalRegion->paidPages[0]));
4923 AssertReturn(pGlobalRegion->paidPages, VERR_NO_MEMORY);
4924
4925 /* Invalidate all descriptors. */
4926 uint32_t i = cPages;
4927 while (i-- > 0)
4928 pGlobalRegion->paidPages[i] = NIL_GMM_PAGEID;
4929 }
4930
4931 /*
4932 * We've seen this shared page for the first time?
4933 */
4934 if (pGlobalRegion->paidPages[idxPage] == NIL_GMM_PAGEID)
4935 {
4936 Log(("New shared page guest %RGp host %RHp\n", pPageDesc->GCPhys, pPageDesc->HCPhys));
4937 return gmmR0SharedModuleCheckPageFirstTime(pGMM, pGVM, pModule, idxRegion, idxPage, pPageDesc, pGlobalRegion);
4938 }
4939
4940 /*
4941 * We've seen it before...
4942 */
4943 Log(("Replace existing page guest %RGp host %RHp id %#x -> id %#x\n",
4944 pPageDesc->GCPhys, pPageDesc->HCPhys, pPageDesc->idPage, pGlobalRegion->paidPages[idxPage]));
4945 Assert(pPageDesc->idPage != pGlobalRegion->paidPages[idxPage]);
4946
4947 /*
4948 * Get the shared page source.
4949 */
4950 PGMMPAGE pPage = gmmR0GetPage(pGMM, pGlobalRegion->paidPages[idxPage]);
4951 AssertMsgReturn(pPage, ("idPage=%#x (idxRegion=%#x idxPage=%#x) #2\n", pPageDesc->idPage, idxRegion, idxPage),
4952 VERR_PGM_PHYS_INVALID_PAGE_ID);
4953
4954 if (pPage->Common.u2State != GMM_PAGE_STATE_SHARED)
4955 {
4956 /*
4957 * Page was freed at some point; invalidate this entry.
4958 */
4959 /** @todo this isn't really bullet proof. */
4960 Log(("Old shared page was freed -> create a new one\n"));
4961 pGlobalRegion->paidPages[idxPage] = NIL_GMM_PAGEID;
4962 return gmmR0SharedModuleCheckPageFirstTime(pGMM, pGVM, pModule, idxRegion, idxPage, pPageDesc, pGlobalRegion);
4963 }
4964
4965 Log(("Replace existing page: host %RHp -> %RHp\n", pPageDesc->HCPhys, ((uint64_t)pPage->Shared.pfn) << PAGE_SHIFT));
4966
4967 /*
4968 * Calculate the virtual address of the local page.
4969 */
4970 PGMMCHUNK pChunk = gmmR0GetChunk(pGMM, pPageDesc->idPage >> GMM_CHUNKID_SHIFT);
4971 AssertMsgReturn(pChunk, ("idPage=%#x (idxRegion=%#x idxPage=%#x) #4\n", pPageDesc->idPage, idxRegion, idxPage),
4972 VERR_PGM_PHYS_INVALID_PAGE_ID);
4973
4974 uint8_t *pbChunk;
4975 AssertMsgReturn(gmmR0IsChunkMapped(pGMM, pGVM, pChunk, (PRTR3PTR)&pbChunk),
4976 ("idPage=%#x (idxRegion=%#x idxPage=%#x) #3\n", pPageDesc->idPage, idxRegion, idxPage),
4977 VERR_PGM_PHYS_INVALID_PAGE_ID);
4978 uint8_t *pbLocalPage = pbChunk + ((pPageDesc->idPage & GMM_PAGEID_IDX_MASK) << PAGE_SHIFT);
4979
4980 /*
4981 * Calculate the virtual address of the shared page.
4982 */
4983 pChunk = gmmR0GetChunk(pGMM, pGlobalRegion->paidPages[idxPage] >> GMM_CHUNKID_SHIFT);
4984 Assert(pChunk); /* can't fail as gmmR0GetPage succeeded. */
4985
4986 /*
4987 * Get the virtual address of the physical page; map the chunk into the VM
4988 * process if not already done.
4989 */
4990 if (!gmmR0IsChunkMapped(pGMM, pGVM, pChunk, (PRTR3PTR)&pbChunk))
4991 {
4992 Log(("Map chunk into process!\n"));
4993 rc = gmmR0MapChunk(pGMM, pGVM, pChunk, false /*fRelaxedSem*/, (PRTR3PTR)&pbChunk);
4994 AssertRCReturn(rc, rc);
4995 }
4996 uint8_t *pbSharedPage = pbChunk + ((pGlobalRegion->paidPages[idxPage] & GMM_PAGEID_IDX_MASK) << PAGE_SHIFT);
4997
4998#ifdef VBOX_STRICT
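    /* Note: the shared page structure only stores a 14-bit checksum
       (Shared.u14Checksum), so just the low 14 bits of the CRC-32 can be
       cross-checked against it below. */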
4999 pPageDesc->u32StrictChecksum = RTCrc32(pbSharedPage, PAGE_SIZE);
5000 uint32_t uChecksum = pPageDesc->u32StrictChecksum & UINT32_C(0x00003fff);
5001 AssertMsg(!uChecksum || uChecksum == pPage->Shared.u14Checksum || !pPage->Shared.u14Checksum,
5002 ("%#x vs %#x - idPage=%#x - %s %s\n", uChecksum, pPage->Shared.u14Checksum,
5003 pGlobalRegion->paidPages[idxPage], pModule->szName, pModule->szVersion));
5004#endif
5005
5006 /** @todo write ASMMemComparePage. */
5007 if (memcmp(pbSharedPage, pbLocalPage, PAGE_SIZE))
5008 {
5009 Log(("Unexpected differences found between local and shared page; skip\n"));
5010 /* Signal to the caller that this one hasn't changed. */
5011 pPageDesc->idPage = NIL_GMM_PAGEID;
5012 return VINF_SUCCESS;
5013 }
5014
5015 /*
5016 * Free the old local page.
5017 */
5018 GMMFREEPAGEDESC PageDesc;
5019 PageDesc.idPage = pPageDesc->idPage;
5020 rc = gmmR0FreePages(pGMM, pGVM, 1, &PageDesc, GMMACCOUNT_BASE);
5021 AssertRCReturn(rc, rc);
5022
5023 gmmR0UseSharedPage(pGMM, pGVM, pPage);
5024
5025 /*
5026 * Pass along the new physical address & page id.
5027 */
5028 pPageDesc->HCPhys = ((uint64_t)pPage->Shared.pfn) << PAGE_SHIFT;
5029 pPageDesc->idPage = pGlobalRegion->paidPages[idxPage];
5030
5031 return VINF_SUCCESS;
5032}
5033
5034
5035/**
5036 * RTAvlGCPtrDestroy callback.
5037 *
5038 * @returns VINF_SUCCESS.
5039 * @param pNode The node to destroy.
5040 * @param pvArgs Pointer to an argument packet.
5041 */
5042static DECLCALLBACK(int) gmmR0CleanupSharedModule(PAVLGCPTRNODECORE pNode, void *pvArgs)
5043{
5044 gmmR0ShModDeletePerVM(((GMMR0SHMODPERVMDTORARGS *)pvArgs)->pGMM,
5045 ((GMMR0SHMODPERVMDTORARGS *)pvArgs)->pGVM,
5046 (PGMMSHAREDMODULEPERVM)pNode,
5047 false /*fRemove*/);
5048 return VINF_SUCCESS;
5049}
5050
5051
5052/**
5053 * Used by GMMR0CleanupVM to clean up shared modules.
5054 *
5055 * This is called without the caller holding the GMM lock, so that the lock
5056 * can be acquired and yielded here as needed.
5057 *
5058 * @param pGMM The GMM handle.
5059 * @param pGVM The global VM handle.
5060 */
5061static void gmmR0SharedModuleCleanup(PGMM pGMM, PGVM pGVM)
5062{
5063 gmmR0MutexAcquire(pGMM);
5064 GMM_CHECK_SANITY_UPON_ENTERING(pGMM);
5065
5066 GMMR0SHMODPERVMDTORARGS Args;
5067 Args.pGVM = pGVM;
5068 Args.pGMM = pGMM;
5069 RTAvlGCPtrDestroy(&pGVM->gmm.s.pSharedModuleTree, gmmR0CleanupSharedModule, &Args);
5070
5071 AssertMsg(pGVM->gmm.s.Stats.cShareableModules == 0, ("%d\n", pGVM->gmm.s.Stats.cShareableModules));
5072 pGVM->gmm.s.Stats.cShareableModules = 0;
5073
5074 gmmR0MutexRelease(pGMM);
5075}
5076
5077#endif /* VBOX_WITH_PAGE_SHARING */
5078
5079/**
5080 * Removes all shared modules for the specified VM.
5081 *
5082 * @returns VBox status code.
5083 * @param pGVM The global (ring-0) VM structure.
5084 * @param pVM The cross context VM structure.
5085 * @param idCpu The VCPU id.
5086 */
5087GMMR0DECL(int) GMMR0ResetSharedModules(PGVM pGVM, PVM pVM, VMCPUID idCpu)
5088{
5089#ifdef VBOX_WITH_PAGE_SHARING
5090 /*
5091 * Validate input and get the basics.
5092 */
5093 PGMM pGMM;
5094 GMM_GET_VALID_INSTANCE(pGMM, VERR_GMM_INSTANCE);
5095 int rc = GVMMR0ValidateGVMandVMandEMT(pGVM, pVM, idCpu);
5096 if (RT_FAILURE(rc))
5097 return rc;
5098
5099 /*
5100 * Take the semaphore and do some more validations.
5101 */
5102 gmmR0MutexAcquire(pGMM);
5103 if (GMM_CHECK_SANITY_UPON_ENTERING(pGMM))
5104 {
5105 Log(("GMMR0ResetSharedModules\n"));
5106 GMMR0SHMODPERVMDTORARGS Args;
5107 Args.pGVM = pGVM;
5108 Args.pGMM = pGMM;
5109 RTAvlGCPtrDestroy(&pGVM->gmm.s.pSharedModuleTree, gmmR0CleanupSharedModule, &Args);
5110 pGVM->gmm.s.Stats.cShareableModules = 0;
5111
5112 rc = VINF_SUCCESS;
5113 GMM_CHECK_SANITY_UPON_LEAVING(pGMM);
5114 }
5115 else
5116 rc = VERR_GMM_IS_NOT_SANE;
5117
5118 gmmR0MutexRelease(pGMM);
5119 return rc;
5120#else
5121 RT_NOREF(pGVM, pVM, idCpu);
5122 return VERR_NOT_IMPLEMENTED;
5123#endif
5124}
5125
5126#ifdef VBOX_WITH_PAGE_SHARING
5127
5128/**
5129 * Tree enumeration callback for checking a shared module.
5130 */
5131static DECLCALLBACK(int) gmmR0CheckSharedModule(PAVLGCPTRNODECORE pNode, void *pvUser)
5132{
5133 GMMCHECKSHAREDMODULEINFO *pArgs = (GMMCHECKSHAREDMODULEINFO*)pvUser;
5134 PGMMSHAREDMODULEPERVM pRecVM = (PGMMSHAREDMODULEPERVM)pNode;
5135 PGMMSHAREDMODULE pGblMod = pRecVM->pGlobalModule;
5136
5137 Log(("gmmR0CheckSharedModule: check %s %s base=%RGv size=%x\n",
5138 pGblMod->szName, pGblMod->szVersion, pGblMod->Core.Key, pGblMod->cbModule));
5139
5140 int rc = PGMR0SharedModuleCheck(pArgs->pGVM->pVM, pArgs->pGVM, pArgs->idCpu, pGblMod, pRecVM->aRegionsGCPtrs);
5141 if (RT_FAILURE(rc))
5142 return rc;
5143 return VINF_SUCCESS;
5144}
5145
5146#endif /* VBOX_WITH_PAGE_SHARING */
5147
5148/**
5149 * Check all shared modules for the specified VM.
5150 *
5151 * @returns VBox status code.
5152 * @param pGVM The global (ring-0) VM structure.
5153 * @param pVM The cross context VM structure.
5154 * @param idCpu The calling EMT number.
5155 * @thread EMT(idCpu)
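 *
 * @remarks Illustrative only: a hedged sketch of the call pattern, assuming the
 *          calling EMT matches idCpu as required above.
 * @code
 *     int rc = GMMR0CheckSharedModules(pGVM, pVM, idCpu);
 *     AssertLogRelRC(rc);
 * @endcode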
5156 */
5157GMMR0DECL(int) GMMR0CheckSharedModules(PGVM pGVM, PVM pVM, VMCPUID idCpu)
5158{
5159#ifdef VBOX_WITH_PAGE_SHARING
5160 /*
5161 * Validate input and get the basics.
5162 */
5163 PGMM pGMM;
5164 GMM_GET_VALID_INSTANCE(pGMM, VERR_GMM_INSTANCE);
5165 int rc = GVMMR0ValidateGVMandVMandEMT(pGVM, pVM, idCpu);
5166 if (RT_FAILURE(rc))
5167 return rc;
5168
5169# ifndef DEBUG_sandervl
5170 /*
5171 * Take the semaphore and do some more validations.
5172 */
5173 gmmR0MutexAcquire(pGMM);
5174# endif
5175 if (GMM_CHECK_SANITY_UPON_ENTERING(pGMM))
5176 {
5177 /*
5178 * Walk the tree, checking each module.
5179 */
5180 Log(("GMMR0CheckSharedModules\n"));
5181
5182 GMMCHECKSHAREDMODULEINFO Args;
5183 Args.pGVM = pGVM;
5184 Args.idCpu = idCpu;
5185 rc = RTAvlGCPtrDoWithAll(&pGVM->gmm.s.pSharedModuleTree, true /* fFromLeft */, gmmR0CheckSharedModule, &Args);
5186
5187 Log(("GMMR0CheckSharedModules done (rc=%Rrc)!\n", rc));
5188 GMM_CHECK_SANITY_UPON_LEAVING(pGMM);
5189 }
5190 else
5191 rc = VERR_GMM_IS_NOT_SANE;
5192
5193# ifndef DEBUG_sandervl
5194 gmmR0MutexRelease(pGMM);
5195# endif
5196 return rc;
5197#else
5198 RT_NOREF(pGVM, pVM, idCpu);
5199 return VERR_NOT_IMPLEMENTED;
5200#endif
5201}
5202
5203#if defined(VBOX_STRICT) && HC_ARCH_BITS == 64
5204
5205/**
5206 * RTAvlU32DoWithAll callback.
5207 *
5208 * @returns 0 to continue the enumeration, true (non-zero) to stop it once a duplicate has been found.
5209 * @param pNode The node to search.
5210 * @param pvUser Pointer to the input argument packet.
5211 */
5212static DECLCALLBACK(int) gmmR0FindDupPageInChunk(PAVLU32NODECORE pNode, void *pvUser)
5213{
5214 PGMMCHUNK pChunk = (PGMMCHUNK)pNode;
5215 GMMFINDDUPPAGEINFO *pArgs = (GMMFINDDUPPAGEINFO *)pvUser;
5216 PGVM pGVM = pArgs->pGVM;
5217 PGMM pGMM = pArgs->pGMM;
5218 uint8_t *pbChunk;
5219
5220 /* Only take chunks not mapped into this VM process; not entirely correct. */
5221 if (!gmmR0IsChunkMapped(pGMM, pGVM, pChunk, (PRTR3PTR)&pbChunk))
5222 {
5223 int rc = gmmR0MapChunk(pGMM, pGVM, pChunk, false /*fRelaxedSem*/, (PRTR3PTR)&pbChunk);
5224 if (RT_SUCCESS(rc))
5225 {
5226 /*
5227 * Look for duplicate pages
5228 */
5229 unsigned iPage = (GMM_CHUNK_SIZE >> PAGE_SHIFT);
5230 while (iPage-- > 0)
5231 {
5232 if (GMM_PAGE_IS_PRIVATE(&pChunk->aPages[iPage]))
5233 {
5234 uint8_t *pbDestPage = pbChunk + (iPage << PAGE_SHIFT);
5235
5236 if (!memcmp(pArgs->pSourcePage, pbDestPage, PAGE_SIZE))
5237 {
5238 pArgs->fFoundDuplicate = true;
5239 break;
5240 }
5241 }
5242 }
5243 gmmR0UnmapChunk(pGMM, pGVM, pChunk, false /*fRelaxedSem*/);
5244 }
5245 }
5246 return pArgs->fFoundDuplicate; /* (stops search if true) */
5247}
5248
5249
5250/**
5251 * Finds a duplicate of the specified page in other active VMs.
5252 *
5253 * @returns VBox status code.
5254 * @param pGVM The global (ring-0) VM structure.
5255 * @param pVM The cross context VM structure.
5256 * @param pReq Pointer to the request packet.
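 *
 * @remarks Illustrative only: a minimal sketch of building the request; any
 *          request header initialization beyond cbReq follows the usual VMMR0
 *          request conventions and is assumed here.
 * @code
 *     GMMFINDDUPLICATEPAGEREQ Req;
 *     Req.Hdr.cbReq  = sizeof(Req);
 *     Req.idPage     = idPage;       // page to search for
 *     Req.fDuplicate = false;
 *     rc = GMMR0FindDuplicatePageReq(pGVM, pVM, &Req);
 *     if (RT_SUCCESS(rc) && Req.fDuplicate)
 *     {
 *         // An identical private page exists in a chunk not mapped into this VM process.
 *     }
 * @endcode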
5257 */
5258GMMR0DECL(int) GMMR0FindDuplicatePageReq(PGVM pGVM, PVM pVM, PGMMFINDDUPLICATEPAGEREQ pReq)
5259{
5260 /*
5261 * Validate input and pass it on.
5262 */
5263 AssertPtrReturn(pReq, VERR_INVALID_POINTER);
5264 AssertMsgReturn(pReq->Hdr.cbReq == sizeof(*pReq), ("%#x != %#x\n", pReq->Hdr.cbReq, sizeof(*pReq)), VERR_INVALID_PARAMETER);
5265
5266 PGMM pGMM;
5267 GMM_GET_VALID_INSTANCE(pGMM, VERR_GMM_INSTANCE);
5268
5269 int rc = GVMMR0ValidateGVMandVM(pGVM, pVM);
5270 if (RT_FAILURE(rc))
5271 return rc;
5272
5273 /*
5274 * Take the semaphore and do some more validations.
5275 */
5276 rc = gmmR0MutexAcquire(pGMM);
5277 if (GMM_CHECK_SANITY_UPON_ENTERING(pGMM))
5278 {
5279 uint8_t *pbChunk;
5280 PGMMCHUNK pChunk = gmmR0GetChunk(pGMM, pReq->idPage >> GMM_CHUNKID_SHIFT);
5281 if (pChunk)
5282 {
5283 if (gmmR0IsChunkMapped(pGMM, pGVM, pChunk, (PRTR3PTR)&pbChunk))
5284 {
5285 uint8_t *pbSourcePage = pbChunk + ((pReq->idPage & GMM_PAGEID_IDX_MASK) << PAGE_SHIFT);
5286 PGMMPAGE pPage = gmmR0GetPage(pGMM, pReq->idPage);
5287 if (pPage)
5288 {
5289 GMMFINDDUPPAGEINFO Args;
5290 Args.pGVM = pGVM;
5291 Args.pGMM = pGMM;
5292 Args.pSourcePage = pbSourcePage;
5293 Args.fFoundDuplicate = false;
5294 RTAvlU32DoWithAll(&pGMM->pChunks, true /* fFromLeft */, gmmR0FindDupPageInChunk, &Args);
5295
5296 pReq->fDuplicate = Args.fFoundDuplicate;
5297 }
5298 else
5299 {
5300 AssertFailed();
5301 rc = VERR_PGM_PHYS_INVALID_PAGE_ID;
5302 }
5303 }
5304 else
5305 AssertFailed();
5306 }
5307 else
5308 AssertFailed();
5309 }
5310 else
5311 rc = VERR_GMM_IS_NOT_SANE;
5312
5313 gmmR0MutexRelease(pGMM);
5314 return rc;
5315}
5316
5317#endif /* VBOX_STRICT && HC_ARCH_BITS == 64 */
5318
5319
5320/**
5321 * Retrieves the GMM statistics visible to the caller.
5322 *
5323 * @returns VBox status code.
5324 *
5325 * @param pStats Where to put the statistics.
5326 * @param pSession The current session.
5327 * @param pGVM The GVM to obtain statistics for. Optional.
5328 * @param pVM The VM structure corresponding to @a pGVM.
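 *
 * @remarks Illustrative only: a minimal ring-0 usage sketch querying the global
 *          statistics without per-VM data, assuming a valid pSession.
 * @code
 *     GMMSTATS Stats;
 *     int rc = GMMR0QueryStatistics(&Stats, pSession, NULL, NULL); // NULL pGVM/pVM: global stats only
 *     if (RT_SUCCESS(rc))
 *     {
 *         // Stats.cAllocatedPages, Stats.cSharedPages, etc. now hold the global
 *         // counters; Stats.VMStats is zeroed because pGVM is NULL.
 *     }
 * @endcode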
5329 */
5330GMMR0DECL(int) GMMR0QueryStatistics(PGMMSTATS pStats, PSUPDRVSESSION pSession, PGVM pGVM, PVM pVM)
5331{
5332 LogFlow(("GMMR0QueryStatistics: pStats=%p pSession=%p pGVM=%p pVM=%p\n", pStats, pSession, pGVM, pVM));
5333
5334 /*
5335 * Validate input.
5336 */
5337 AssertPtrReturn(pSession, VERR_INVALID_POINTER);
5338 AssertPtrReturn(pStats, VERR_INVALID_POINTER);
5339 pStats->cMaxPages = 0; /* (crash before taking the mutex...) */
5340
5341 PGMM pGMM;
5342 GMM_GET_VALID_INSTANCE(pGMM, VERR_GMM_INSTANCE);
5343
5344 /*
5345 * Validate the VM handle, if not NULL, and lock the GMM.
5346 */
5347 int rc;
5348 if (pGVM)
5349 {
5350 rc = GVMMR0ValidateGVMandVM(pGVM, pVM);
5351 if (RT_FAILURE(rc))
5352 return rc;
5353 }
5354
5355 rc = gmmR0MutexAcquire(pGMM);
5356 if (RT_FAILURE(rc))
5357 return rc;
5358
5359 /*
5360 * Copy out the GMM statistics.
5361 */
5362 pStats->cMaxPages = pGMM->cMaxPages;
5363 pStats->cReservedPages = pGMM->cReservedPages;
5364 pStats->cOverCommittedPages = pGMM->cOverCommittedPages;
5365 pStats->cAllocatedPages = pGMM->cAllocatedPages;
5366 pStats->cSharedPages = pGMM->cSharedPages;
5367 pStats->cDuplicatePages = pGMM->cDuplicatePages;
5368 pStats->cLeftBehindSharedPages = pGMM->cLeftBehindSharedPages;
5369 pStats->cBalloonedPages = pGMM->cBalloonedPages;
5370 pStats->cChunks = pGMM->cChunks;
5371 pStats->cFreedChunks = pGMM->cFreedChunks;
5372 pStats->cShareableModules = pGMM->cShareableModules;
5373 RT_ZERO(pStats->au64Reserved);
5374
5375 /*
5376 * Copy out the VM statistics.
5377 */
5378 if (pGVM)
5379 pStats->VMStats = pGVM->gmm.s.Stats;
5380 else
5381 RT_ZERO(pStats->VMStats);
5382
5383 gmmR0MutexRelease(pGMM);
5384 return rc;
5385}
5386
5387
5388/**
5389 * VMMR0 request wrapper for GMMR0QueryStatistics.
5390 *
5391 * @returns see GMMR0QueryStatistics.
5392 * @param pGVM The global (ring-0) VM structure. Optional.
5393 * @param pVM The cross context VM structure. Optional.
5394 * @param pReq Pointer to the request packet.
5395 */
5396GMMR0DECL(int) GMMR0QueryStatisticsReq(PGVM pGVM, PVM pVM, PGMMQUERYSTATISTICSSREQ pReq)
5397{
5398 /*
5399 * Validate input and pass it on.
5400 */
5401 AssertPtrReturn(pReq, VERR_INVALID_POINTER);
5402 AssertMsgReturn(pReq->Hdr.cbReq == sizeof(*pReq), ("%#x != %#x\n", pReq->Hdr.cbReq, sizeof(*pReq)), VERR_INVALID_PARAMETER);
5403
5404 return GMMR0QueryStatistics(&pReq->Stats, pReq->pSession, pGVM, pVM);
5405}
5406
5407
5408/**
5409 * Resets the specified GMM statistics.
5410 *
5411 * @returns VBox status code.
5412 *
5413 * @param pStats Which statistics to reset; non-zero fields indicate
5414 * the ones to reset.
5415 * @param pSession The current session.
5416 * @param pGVM The GVM to reset statistics for. Optional.
5417 * @param pVM The VM structure corresponding to @a pGVM.
5418 */
5419GMMR0DECL(int) GMMR0ResetStatistics(PCGMMSTATS pStats, PSUPDRVSESSION pSession, PGVM pGVM, PVM pVM)
5420{
5421 NOREF(pStats); NOREF(pSession); NOREF(pVM); NOREF(pGVM);
5422 /* Currently there is nothing to reset. */
5423 return VINF_SUCCESS;
5424}
5425
5426
5427/**
5428 * VMMR0 request wrapper for GMMR0ResetStatistics.
5429 *
5430 * @returns see GMMR0ResetStatistics.
5431 * @param pGVM The global (ring-0) VM structure. Optional.
5432 * @param pVM The cross context VM structure. Optional.
5433 * @param pReq Pointer to the request packet.
5434 */
5435GMMR0DECL(int) GMMR0ResetStatisticsReq(PGVM pGVM, PVM pVM, PGMMRESETSTATISTICSSREQ pReq)
5436{
5437 /*
5438 * Validate input and pass it on.
5439 */
5440 AssertPtrReturn(pReq, VERR_INVALID_POINTER);
5441 AssertMsgReturn(pReq->Hdr.cbReq == sizeof(*pReq), ("%#x != %#x\n", pReq->Hdr.cbReq, sizeof(*pReq)), VERR_INVALID_PARAMETER);
5442
5443 return GMMR0ResetStatistics(&pReq->Stats, pReq->pSession, pGVM, pVM);
5444}
5445