VirtualBox

source: vbox/trunk/src/VBox/VMM/VMMR0/GMMR0.cpp@ 91044

Last change on this file since 91044 was 91014, checked in by vboxsync, 3 years ago

VMM: Made VBOX_WITH_RAM_IN_KERNEL non-optional, removing all the tests for it. bugref:9627

  • Property svn:eol-style set to native
  • Property svn:keywords set to Id Revision
File size: 200.6 KB
1/* $Id: GMMR0.cpp 91014 2021-08-31 01:03:39Z vboxsync $ */
2/** @file
3 * GMM - Global Memory Manager.
4 */
5
6/*
7 * Copyright (C) 2007-2020 Oracle Corporation
8 *
9 * This file is part of VirtualBox Open Source Edition (OSE), as
10 * available from http://www.virtualbox.org. This file is free software;
11 * you can redistribute it and/or modify it under the terms of the GNU
12 * General Public License (GPL) as published by the Free Software
13 * Foundation, in version 2 as it comes in the "COPYING" file of the
14 * VirtualBox OSE distribution. VirtualBox OSE is distributed in the
15 * hope that it will be useful, but WITHOUT ANY WARRANTY of any kind.
16 */
17
18
19/** @page pg_gmm GMM - The Global Memory Manager
20 *
21 * As the name indicates, this component is responsible for global memory
22 * management. Currently only guest RAM is allocated from the GMM, but this
23 * may change to include shadow page tables and other bits later.
24 *
25 * Guest RAM is managed as individual pages, but allocated from the host OS
26 * in chunks for reasons of portability / efficiency. To minimize the memory
27 * footprint all tracking structure must be as small as possible without
28 * unnecessary performance penalties.
29 *
30 * The allocation chunks have a fixed size, defined at compile time
31 * by the #GMM_CHUNK_SIZE \#define.
32 *
33 * Each chunk is given a unique ID. Each page also has a unique ID. The
34 * relationship between the two IDs is:
35 * @code
36 * GMM_CHUNK_SHIFT = log2(GMM_CHUNK_SIZE / PAGE_SIZE);
37 * idPage = (idChunk << GMM_CHUNK_SHIFT) | iPage;
38 * @endcode
39 * Where iPage is the index of the page within the chunk. This ID scheme
40 * permits efficient chunk and page lookup, but it relies on the chunk size
41 * to be set at compile time. The chunks are organized in an AVL tree with their
42 * IDs being the keys.
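 *
 * As a purely illustrative sketch (derived from the formulas above, not code
 * from this file), going from a page ID back to its chunk and page index
 * simply reverses the encoding:
 * @code
 * idChunk = idPage >> GMM_CHUNK_SHIFT;
 * iPage   = idPage & ((1 << GMM_CHUNK_SHIFT) - 1);
 * @endcode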
43 *
44 * The physical address of each page in an allocation chunk is maintained by
45 * the #RTR0MEMOBJ and obtained using #RTR0MemObjGetPagePhysAddr. There is no
46 * need to duplicate this information (it would cost 8 bytes per page if we did).
47 *
48 * So what do we need to track per page? Most importantly we need to know
49 * which state the page is in:
50 * - Private - Allocated for (eventually) backing one particular VM page.
51 * - Shared - Readonly page that is used by one or more VMs and treated
52 * as COW by PGM.
53 * - Free - Not used by anyone.
54 *
55 * For the page replacement operations (sharing, defragmenting and freeing)
56 * to be somewhat efficient, private pages need to be associated with a
57 * particular page in a particular VM.
58 *
59 * Tracking the usage of shared pages is impractical and expensive, so we'll
60 * settle for a reference counting system instead.
61 *
62 * Free pages will be chained on LIFOs.
63 *
64 * On 64-bit systems we will use a 64-bit bitfield per page, while on 32-bit
65 * systems a 32-bit bitfield will have to suffice because of address space
66 * limitations. The #GMMPAGE structure shows the details.
67 *
68 *
69 * @section sec_gmm_alloc_strat Page Allocation Strategy
70 *
71 * The strategy for allocating pages has to take fragmentation and shared
72 * pages into account, or we may end up with 2000 chunks with only
73 * a few pages in each. Shared pages cannot easily be reallocated because
74 * of the inaccurate usage accounting (see above). Private pages can be
75 * reallocated by a defragmentation thread in the same manner that sharing
76 * is done.
77 *
78 * The first approach is to manage the free pages in two sets depending on
79 * whether they are mainly for the allocation of shared or private pages.
80 * In the initial implementation there will be almost no possibility for
81 * mixing shared and private pages in the same chunk (only if we're really
82 * stressed on memory), but when we implement forking of VMs and have to
83 * deal with lots of COW pages it'll start getting kind of interesting.
84 *
85 * The sets are lists of chunks with approximately the same number of
86 * free pages. Say the chunk size is 1MB, meaning 256 pages, and a set
87 * consists of 16 lists. So, the first list will contain the chunks with
88 * 1-16 free pages, the second covers 17-32, and so on. The chunks will be
89 * moved between the lists as pages are freed up or allocated.
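 *
 * As a sketch of that bucketing (illustrative only; cFree and cPagesPerChunk
 * are just placeholder names here, and the real list selection also keeps a
 * separate list for completely unused chunks):
 * @code
 * // cFree = free pages in the chunk, cPagesPerChunk = pages per chunk.
 * iList = (cFree - 1) * 16 / cPagesPerChunk;   // yields 0..15 for cFree = 1..cPagesPerChunk
 * @endcode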
90 *
91 *
92 * @section sec_gmm_costs Costs
93 *
94 * The per-page cost in kernel space is one GMMPAGE entry (32 or 64 bits) plus
95 * whatever RTR0MEMOBJ entails. In addition there is the chunk cost of roughly
96 * (sizeof(RTR0MEMOBJ) + sizeof(CHUNK)) / 2^CHUNK_SHIFT bytes per page.
97 *
98 * On Windows the per-page #RTR0MEMOBJ cost is 32 bits on 32-bit Windows
99 * and 64 bits on 64-bit Windows (a PFN_NUMBER in the MDL). So, 64 bits per page.
100 * The cost on Linux is identical, but here it's because of sizeof(struct page *).
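 *
 * As a rough worked example (illustrative assumptions: 2MB chunks, 4KB pages,
 * a 64-bit host):
 * @code
 * // per-page overhead ~= sizeof(GMMPAGE) + per-page memobj cost + amortized chunk overhead
 * //                   ~= 8 + 8 + (memobj + chunk headers) / 512
 * //                   ~= somewhere around 16-20 bytes per 4KB guest page, i.e. well under 1%.
 * @endcode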
101 *
102 *
103 * @section sec_gmm_legacy Legacy Mode for Non-Tier-1 Platforms
104 *
105 * In legacy mode the page source is locked user pages and not
106 * #RTR0MemObjAllocPhysNC, this means that a page can only be allocated
107 * by the VM that locked it. We will make no attempt at implementing
108 * page sharing on these systems, just do enough to make it all work.
109 *
110 * @note With 6.1 really dropping 32-bit support, the legacy mode is obsolete
111 * under the assumption that there is sufficient kernel virtual address
112 * space to map all of the guest memory allocations. So, we'll be using
113 * #RTR0MemObjAllocPage on some platforms as an alternative to
114 * #RTR0MemObjAllocPhysNC.
115 *
116 *
117 * @subsection sub_gmm_locking Serializing
118 *
119 * One simple fast mutex will be employed in the initial implementation, not
120 * two as mentioned in @ref sec_pgmPhys_Serializing.
121 *
122 * @see @ref sec_pgmPhys_Serializing
123 *
124 *
125 * @section sec_gmm_overcommit Memory Over-Commitment Management
126 *
127 * The GVM will have to do the system-wide memory over-commitment
128 * management. My current ideas are:
129 * - Per VM oc policy that indicates how much to initially commit
130 * to it and what to do in a out-of-memory situation.
131 * - Prevent overtaxing the host.
132 *
133 * There are some challenges here, the main ones are configurability and
134 * security. Should we for instance permit anyone to request 100% memory
135 * commitment? Who should be allowed to do runtime adjustments of the
136 * config, and how do we prevent these settings from being lost when the last
137 * VM process exits? The solution is probably to have an optional root
138 * daemon that will keep VMMR0.r0 in memory and enable the security measures.
139 *
140 *
141 *
142 * @section sec_gmm_numa NUMA
143 *
144 * NUMA considerations will be designed and implemented a bit later.
145 *
146 * The preliminary guess is that we will have to try to allocate memory as
147 * close as possible to the CPUs the VM is executed on (EMT and additional CPU
148 * threads), which means it's mostly about allocation and sharing policies.
149 * Both the scheduler and the allocator interface will have to supply some NUMA
150 * info, and we'll need a way to calculate access costs.
151 *
152 */
153
154
155/*********************************************************************************************************************************
156* Header Files *
157*********************************************************************************************************************************/
158#define LOG_GROUP LOG_GROUP_GMM
159#include <VBox/rawpci.h>
160#include <VBox/vmm/gmm.h>
161#include "GMMR0Internal.h"
162#include <VBox/vmm/vmcc.h>
163#include <VBox/vmm/pgm.h>
164#include <VBox/log.h>
165#include <VBox/param.h>
166#include <VBox/err.h>
167#include <VBox/VMMDev.h>
168#include <iprt/asm.h>
169#include <iprt/avl.h>
170#ifdef VBOX_STRICT
171# include <iprt/crc.h>
172#endif
173#include <iprt/critsect.h>
174#include <iprt/list.h>
175#include <iprt/mem.h>
176#include <iprt/memobj.h>
177#include <iprt/mp.h>
178#include <iprt/semaphore.h>
179#include <iprt/spinlock.h>
180#include <iprt/string.h>
181#include <iprt/time.h>
182
183
184/*********************************************************************************************************************************
185* Defined Constants And Macros *
186*********************************************************************************************************************************/
187/** @def VBOX_USE_CRIT_SECT_FOR_GIANT
188 * Use a critical section instead of a fast mutex for the giant GMM lock.
189 *
190 * @remarks This is primarily a way of avoiding the deadlock checks in the
191 * Windows driver verifier. */
192#if defined(RT_OS_WINDOWS) || defined(RT_OS_DARWIN) || defined(DOXYGEN_RUNNING)
193# define VBOX_USE_CRIT_SECT_FOR_GIANT
194#endif
195
196#if defined(VBOX_WITH_LINEAR_HOST_PHYS_MEM) && !defined(RT_OS_DARWIN)
197/** Enable the legacy mode code (will be dropped soon). */
198# define GMM_WITH_LEGACY_MODE
199#endif
200
201
202/*********************************************************************************************************************************
203* Structures and Typedefs *
204*********************************************************************************************************************************/
205/** Pointer to set of free chunks. */
206typedef struct GMMCHUNKFREESET *PGMMCHUNKFREESET;
207
208/**
209 * The per-page tracking structure employed by the GMM.
210 *
211 * On 32-bit hosts some trickery is necessary to compress all
212 * the information into 32 bits. When the fSharedFree member is set,
213 * the 30th bit decides whether it's a free page or not.
214 *
215 * Because of the different layout on 32-bit and 64-bit hosts, macros
216 * are used to get and set some of the data.
217 */
218typedef union GMMPAGE
219{
220#if HC_ARCH_BITS == 64
221 /** Unsigned integer view. */
222 uint64_t u;
223
224 /** The common view. */
225 struct GMMPAGECOMMON
226 {
227 uint32_t uStuff1 : 32;
228 uint32_t uStuff2 : 30;
229 /** The page state. */
230 uint32_t u2State : 2;
231 } Common;
232
233 /** The view of a private page. */
234 struct GMMPAGEPRIVATE
235 {
236 /** The guest page frame number. (Max addressable: 2 ^ 44 - 16) */
237 uint32_t pfn;
238 /** The GVM handle. (64K VMs) */
239 uint32_t hGVM : 16;
240 /** Reserved. */
241 uint32_t u16Reserved : 14;
242 /** The page state. */
243 uint32_t u2State : 2;
244 } Private;
245
246 /** The view of a shared page. */
247 struct GMMPAGESHARED
248 {
249 /** The host page frame number. (Max addressable: 2 ^ 44 - 16) */
250 uint32_t pfn;
251 /** The reference count (64K VMs). */
252 uint32_t cRefs : 16;
253 /** Used for debug checksumming. */
254 uint32_t u14Checksum : 14;
255 /** The page state. */
256 uint32_t u2State : 2;
257 } Shared;
258
259 /** The view of a free page. */
260 struct GMMPAGEFREE
261 {
262 /** The index of the next page in the free list. UINT16_MAX is NIL. */
263 uint16_t iNext;
264 /** Reserved. Checksum or something? */
265 uint16_t u16Reserved0;
266 /** Reserved. Checksum or something? */
267 uint32_t u30Reserved1 : 30;
268 /** The page state. */
269 uint32_t u2State : 2;
270 } Free;
271
272#else /* 32-bit */
273 /** Unsigned integer view. */
274 uint32_t u;
275
276 /** The common view. */
277 struct GMMPAGECOMMON
278 {
279 uint32_t uStuff : 30;
280 /** The page state. */
281 uint32_t u2State : 2;
282 } Common;
283
284 /** The view of a private page. */
285 struct GMMPAGEPRIVATE
286 {
287 /** The guest page frame number. (Max addressable: 2 ^ 36) */
288 uint32_t pfn : 24;
289 /** The GVM handle. (127 VMs) */
290 uint32_t hGVM : 7;
291 /** The top page state bit, MBZ. */
292 uint32_t fZero : 1;
293 } Private;
294
295 /** The view of a shared page. */
296 struct GMMPAGESHARED
297 {
298 /** The reference count. */
299 uint32_t cRefs : 30;
300 /** The page state. */
301 uint32_t u2State : 2;
302 } Shared;
303
304 /** The view of a free page. */
305 struct GMMPAGEFREE
306 {
307 /** The index of the next page in the free list. UINT16_MAX is NIL. */
308 uint32_t iNext : 16;
309 /** Reserved. Checksum or something? */
310 uint32_t u14Reserved : 14;
311 /** The page state. */
312 uint32_t u2State : 2;
313 } Free;
314#endif
315} GMMPAGE;
316AssertCompileSize(GMMPAGE, sizeof(RTHCUINTPTR));
317/** Pointer to a GMMPAGE. */
318typedef GMMPAGE *PGMMPAGE;
319
320
321/** @name The Page States.
322 * @{ */
323/** A private page. */
324#define GMM_PAGE_STATE_PRIVATE 0
325/** A private page - alternative value used on the 32-bit implementation.
326 * This will never be used on 64-bit hosts. */
327#define GMM_PAGE_STATE_PRIVATE_32 1
328/** A shared page. */
329#define GMM_PAGE_STATE_SHARED 2
330/** A free page. */
331#define GMM_PAGE_STATE_FREE 3
332/** @} */
333
334
335/** @def GMM_PAGE_IS_PRIVATE
336 *
337 * @returns true if private, false if not.
338 * @param pPage The GMM page.
339 */
340#if HC_ARCH_BITS == 64
341# define GMM_PAGE_IS_PRIVATE(pPage) ( (pPage)->Common.u2State == GMM_PAGE_STATE_PRIVATE )
342#else
343# define GMM_PAGE_IS_PRIVATE(pPage) ( (pPage)->Private.fZero == 0 )
344#endif
345
346/** @def GMM_PAGE_IS_SHARED
347 *
348 * @returns true if shared, false if not.
349 * @param pPage The GMM page.
350 */
351#define GMM_PAGE_IS_SHARED(pPage) ( (pPage)->Common.u2State == GMM_PAGE_STATE_SHARED )
352
353/** @def GMM_PAGE_IS_FREE
354 *
355 * @returns true if free, false if not.
356 * @param pPage The GMM page.
357 */
358#define GMM_PAGE_IS_FREE(pPage) ( (pPage)->Common.u2State == GMM_PAGE_STATE_FREE )
359
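/* Usage sketch for the state predicates above (illustrative only, not part of
 * the GMM code):
 * @code
 *      const char *pszState = GMM_PAGE_IS_PRIVATE(pPage) ? "private"
 *                           : GMM_PAGE_IS_SHARED(pPage)  ? "shared"
 *                           : GMM_PAGE_IS_FREE(pPage)    ? "free" : "bogus";
 * @endcode
 */
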
360/** @def GMM_PAGE_PFN_LAST
361 * The last valid guest pfn.
362 * @remark Some of the values outside the valid range have special meaning,
363 * see GMM_PAGE_PFN_UNSHAREABLE.
364 */
365#if HC_ARCH_BITS == 64
366# define GMM_PAGE_PFN_LAST UINT32_C(0xfffffff0)
367#else
368# define GMM_PAGE_PFN_LAST UINT32_C(0x00fffff0)
369#endif
370AssertCompile(GMM_PAGE_PFN_LAST == (GMM_GCPHYS_LAST >> PAGE_SHIFT));
371
372/** @def GMM_PAGE_PFN_UNSHAREABLE
373 * Indicates that this page isn't used for normal guest memory and thus isn't shareable.
374 */
375#if HC_ARCH_BITS == 64
376# define GMM_PAGE_PFN_UNSHAREABLE UINT32_C(0xfffffff1)
377#else
378# define GMM_PAGE_PFN_UNSHAREABLE UINT32_C(0x00fffff1)
379#endif
380AssertCompile(GMM_PAGE_PFN_UNSHAREABLE == (GMM_GCPHYS_UNSHAREABLE >> PAGE_SHIFT));
381
382
383/**
384 * A GMM allocation chunk ring-3 mapping record.
385 *
386 * This should really be associated with a session and not a VM, but
387 * it's simpler to associate it with a VM and clean up when the VM object
388 * is destroyed.
389 */
390typedef struct GMMCHUNKMAP
391{
392 /** The mapping object. */
393 RTR0MEMOBJ hMapObj;
394 /** The VM owning the mapping. */
395 PGVM pGVM;
396} GMMCHUNKMAP;
397/** Pointer to a GMM allocation chunk mapping. */
398typedef struct GMMCHUNKMAP *PGMMCHUNKMAP;
399
400
401/**
402 * A GMM allocation chunk.
403 */
404typedef struct GMMCHUNK
405{
406 /** The AVL node core.
407 * The Key is the chunk ID. (Giant mtx.) */
408 AVLU32NODECORE Core;
409 /** The memory object.
410 * Either from RTR0MemObjAllocPhysNC or RTR0MemObjLockUser depending on
411 * what the host can dish up. (Chunk mtx protects mapping accesses
412 * and related frees.) */
413 RTR0MEMOBJ hMemObj;
414#ifndef VBOX_WITH_LINEAR_HOST_PHYS_MEM
415 /** Pointer to the kernel mapping. */
416 uint8_t *pbMapping;
417#endif
418 /** Pointer to the next chunk in the free list. (Giant mtx.) */
419 PGMMCHUNK pFreeNext;
420 /** Pointer to the previous chunk in the free list. (Giant mtx.) */
421 PGMMCHUNK pFreePrev;
422 /** Pointer to the free set this chunk belongs to. NULL for
423 * chunks with no free pages. (Giant mtx.) */
424 PGMMCHUNKFREESET pSet;
425 /** List node in the chunk list (GMM::ChunkList). (Giant mtx.) */
426 RTLISTNODE ListNode;
427 /** Pointer to an array of mappings. (Chunk mtx.) */
428 PGMMCHUNKMAP paMappingsX;
429 /** The number of mappings. (Chunk mtx.) */
430 uint16_t cMappingsX;
431 * The mapping lock this chunk is using. UINT8_MAX if nobody is
432 * mapping or freeing anything. (Giant mtx.) */
433 uint8_t volatile iChunkMtx;
434 /** GMM_CHUNK_FLAGS_XXX. (Giant mtx.) */
435 uint8_t fFlags;
436 /** The head of the list of free pages. UINT16_MAX is the NIL value.
437 * (Giant mtx.) */
438 uint16_t iFreeHead;
439 /** The number of free pages. (Giant mtx.) */
440 uint16_t cFree;
441 * The GVM handle of the VM that first allocated pages from this chunk; this
442 * is used as a preference when there are several chunks to choose from.
443 * When in bound memory mode this isn't a preference any longer. (Giant
444 * mtx.) */
445 uint16_t hGVM;
446 /** The ID of the NUMA node the memory mostly resides on. (Reserved for
447 * future use.) (Giant mtx.) */
448 uint16_t idNumaNode;
449 /** The number of private pages. (Giant mtx.) */
450 uint16_t cPrivate;
451 /** The number of shared pages. (Giant mtx.) */
452 uint16_t cShared;
453 /** The pages. (Giant mtx.) */
454 GMMPAGE aPages[GMM_CHUNK_SIZE >> PAGE_SHIFT];
455} GMMCHUNK;
456
457/** Indicates that the NUMA properties of the memory are unknown. */
458#define GMM_CHUNK_NUMA_ID_UNKNOWN UINT16_C(0xfffe)
459
460/** @name GMM_CHUNK_FLAGS_XXX - chunk flags.
461 * @{ */
462/** Indicates that the chunk is a large page (2MB). */
463#define GMM_CHUNK_FLAGS_LARGE_PAGE UINT16_C(0x0001)
464#ifdef GMM_WITH_LEGACY_MODE
465/** Indicates that the chunk was locked rather than allocated directly. */
466# define GMM_CHUNK_FLAGS_SEEDED UINT16_C(0x0002)
467#endif
468/** @} */
469
470
471/**
472 * An allocation chunk TLB entry.
473 */
474typedef struct GMMCHUNKTLBE
475{
476 /** The chunk id. */
477 uint32_t idChunk;
478 /** Pointer to the chunk. */
479 PGMMCHUNK pChunk;
480} GMMCHUNKTLBE;
481/** Pointer to an allocation chunk TLB entry. */
482typedef GMMCHUNKTLBE *PGMMCHUNKTLBE;
483
484
485/** The number of entries in the allocation chunk TLB. */
486#define GMM_CHUNKTLB_ENTRIES 32
487/** Gets the TLB entry index for the given Chunk ID. */
488#define GMM_CHUNKTLB_IDX(idChunk) ( (idChunk) & (GMM_CHUNKTLB_ENTRIES - 1) )
489
490/**
491 * An allocation chunk TLB.
492 */
493typedef struct GMMCHUNKTLB
494{
495 /** The TLB entries. */
496 GMMCHUNKTLBE aEntries[GMM_CHUNKTLB_ENTRIES];
497} GMMCHUNKTLB;
498/** Pointer to an allocation chunk TLB. */
499typedef GMMCHUNKTLB *PGMMCHUNKTLB;
500
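/* Illustrative TLB lookup sketch (an assumption about the intended use, not
 * code from this file; the real lookup also falls back to the AVL tree that is
 * protected by hSpinLockTree):
 * @code
 *      PGMMCHUNKTLBE pTlbe  = &pGMM->ChunkTLB.aEntries[GMM_CHUNKTLB_IDX(idChunk)];
 *      PGMMCHUNK     pChunk = pTlbe->idChunk == idChunk ? pTlbe->pChunk : NULL;
 *      // NULL means a TLB miss: look idChunk up in pGMM->pChunks instead.
 * @endcode
 */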
501
502/**
503 * The GMM instance data.
504 */
505typedef struct GMM
506{
507 /** Magic / eye catcher. GMM_MAGIC */
508 uint32_t u32Magic;
509 /** The number of threads waiting on the mutex. */
510 uint32_t cMtxContenders;
511#ifdef VBOX_USE_CRIT_SECT_FOR_GIANT
512 /** The critical section protecting the GMM.
513 * More fine grained locking can be implemented later if necessary. */
514 RTCRITSECT GiantCritSect;
515#else
516 /** The fast mutex protecting the GMM.
517 * More fine grained locking can be implemented later if necessary. */
518 RTSEMFASTMUTEX hMtx;
519#endif
520#ifdef VBOX_STRICT
521 /** The current mutex owner. */
522 RTNATIVETHREAD hMtxOwner;
523#endif
524 /** Spinlock protecting the AVL tree.
525 * @todo Make this a read-write spinlock as we should allow concurrent
526 * lookups. */
527 RTSPINLOCK hSpinLockTree;
528 /** The chunk tree.
529 * Protected by hSpinLockTree. */
530 PAVLU32NODECORE pChunks;
531 /** Chunk freeing generation - incremented whenever a chunk is freed. Used
532 * for validating the per-VM chunk TLB entries. Valid range is 1 to 2^62
533 * (exclusive), though higher numbers may temporarily occur while
534 * invalidating the individual TLBs during wrap-around processing. */
535 uint64_t volatile idFreeGeneration;
536 /** The chunk TLB.
537 * Protected by hSpinLockTree. */
538 GMMCHUNKTLB ChunkTLB;
539 /** The private free set. */
540 GMMCHUNKFREESET PrivateX;
541 /** The shared free set. */
542 GMMCHUNKFREESET Shared;
543
544 /** Shared module tree (global).
545 * @todo separate trees for distinctly different guest OSes. */
546 PAVLLU32NODECORE pGlobalSharedModuleTree;
547 /** Sharable modules (count of nodes in pGlobalSharedModuleTree). */
548 uint32_t cShareableModules;
549
550 /** The chunk list. For simplifying the cleanup process and avoiding tree
551 * traversal. */
552 RTLISTANCHOR ChunkList;
553
554 /** The maximum number of pages we're allowed to allocate.
555 * @gcfgm{GMM/MaxPages,64-bit, Direct.}
556 * @gcfgm{GMM/PctPages,32-bit, Relative to the number of host pages.} */
557 uint64_t cMaxPages;
558 /** The number of pages that have been reserved.
559 * The deal is that cReservedPages - cOverCommittedPages <= cMaxPages. */
560 uint64_t cReservedPages;
561 /** The number of pages that we have over-committed in reservations. */
562 uint64_t cOverCommittedPages;
563 /** The number of actually allocated (committed if you like) pages. */
564 uint64_t cAllocatedPages;
565 /** The number of pages that are shared. A subset of cAllocatedPages. */
566 uint64_t cSharedPages;
567 /** The number of pages that are actually shared between VMs. */
568 uint64_t cDuplicatePages;
569 /** The number of shared pages that have been left behind by
570 * VMs not doing proper cleanups. */
571 uint64_t cLeftBehindSharedPages;
572 /** The number of allocation chunks.
573 * (The number of pages we've allocated from the host can be derived from this.) */
574 uint32_t cChunks;
575 /** The number of current ballooned pages. */
576 uint64_t cBalloonedPages;
577
578#ifndef GMM_WITH_LEGACY_MODE
579# ifdef VBOX_WITH_LINEAR_HOST_PHYS_MEM
580 /** Whether #RTR0MemObjAllocPhysNC works. */
581 bool fHasWorkingAllocPhysNC;
582# else
583 bool fPadding;
584# endif
585#else
586 /** The legacy allocation mode indicator.
587 * This is determined at initialization time. */
588 bool fLegacyAllocationMode;
589#endif
590 /** The bound memory mode indicator.
591 * When set, the memory will be bound to a specific VM and never
592 * shared. This is always set if fLegacyAllocationMode is set.
593 * (Also determined at initialization time.) */
594 bool fBoundMemoryMode;
595 /** The number of registered VMs. */
596 uint16_t cRegisteredVMs;
597
598 /** The number of freed chunks ever. This is used as a list generation to
599 * avoid restarting the cleanup scanning when the list wasn't modified. */
600 uint32_t cFreedChunks;
601 /** The previously allocated Chunk ID.
602 * Used as a hint to avoid scanning the whole bitmap. */
603 uint32_t idChunkPrev;
604 /** Chunk ID allocation bitmap.
605 * Bits of allocated IDs are set, free ones are clear.
606 * The NIL id (0) is marked allocated. */
607 uint32_t bmChunkId[(GMM_CHUNKID_LAST + 1 + 31) / 32];
608
609 /** The index of the next mutex to use. */
610 uint32_t iNextChunkMtx;
611 /** Chunk locks for reducing lock contention without having to allocate
612 * one lock per chunk. */
613 struct
614 {
615 /** The mutex */
616 RTSEMFASTMUTEX hMtx;
617 /** The number of threads currently using this mutex. */
618 uint32_t volatile cUsers;
619 } aChunkMtx[64];
620} GMM;
621/** Pointer to the GMM instance. */
622typedef GMM *PGMM;
623
624/** The value of GMM::u32Magic (Katsuhiro Otomo). */
625#define GMM_MAGIC UINT32_C(0x19540414)
626
627
628/**
629 * GMM chunk mutex state.
630 *
631 * This is returned by gmmR0ChunkMutexAcquire and is used by the other
632 * gmmR0ChunkMutex* methods.
633 */
634typedef struct GMMR0CHUNKMTXSTATE
635{
636 PGMM pGMM;
637 /** The index of the chunk mutex. */
638 uint8_t iChunkMtx;
639 /** The relevant flags (GMMR0CHUNK_MTX_XXX). */
640 uint8_t fFlags;
641} GMMR0CHUNKMTXSTATE;
642/** Pointer to a chunk mutex state. */
643typedef GMMR0CHUNKMTXSTATE *PGMMR0CHUNKMTXSTATE;
644
645/** @name GMMR0CHUNK_MTX_XXX
646 * @{ */
647#define GMMR0CHUNK_MTX_INVALID UINT32_C(0)
648#define GMMR0CHUNK_MTX_KEEP_GIANT UINT32_C(1)
649#define GMMR0CHUNK_MTX_RETAKE_GIANT UINT32_C(2)
650#define GMMR0CHUNK_MTX_DROP_GIANT UINT32_C(3)
651#define GMMR0CHUNK_MTX_END UINT32_C(4)
652/** @} */
653
654
655/** The maximum number of shared modules per VM. */
656#define GMM_MAX_SHARED_PER_VM_MODULES 2048
657/** The maximum number of shared modules GMM is allowed to track. */
658#define GMM_MAX_SHARED_GLOBAL_MODULES 16834
659
660
661/**
662 * Argument packet for gmmR0SharedModuleCleanup.
663 */
664typedef struct GMMR0SHMODPERVMDTORARGS
665{
666 PGVM pGVM;
667 PGMM pGMM;
668} GMMR0SHMODPERVMDTORARGS;
669
670/**
671 * Argument packet for gmmR0CheckSharedModule.
672 */
673typedef struct GMMCHECKSHAREDMODULEINFO
674{
675 PGVM pGVM;
676 VMCPUID idCpu;
677} GMMCHECKSHAREDMODULEINFO;
678
679
680/*********************************************************************************************************************************
681* Global Variables *
682*********************************************************************************************************************************/
683/** Pointer to the GMM instance data. */
684static PGMM g_pGMM = NULL;
685
686/** Macro for obtaining and validating the g_pGMM pointer.
687 *
688 * On failure it will return from the invoking function with the specified
689 * return value.
690 *
691 * @param pGMM The name of the pGMM variable.
692 * @param rc The return value on failure. Use VERR_GMM_INSTANCE for VBox
693 * status codes.
694 */
695#define GMM_GET_VALID_INSTANCE(pGMM, rc) \
696 do { \
697 (pGMM) = g_pGMM; \
698 AssertPtrReturn((pGMM), (rc)); \
699 AssertMsgReturn((pGMM)->u32Magic == GMM_MAGIC, ("%p - %#x\n", (pGMM), (pGMM)->u32Magic), (rc)); \
700 } while (0)
701
702/** Macro for obtaining and validating the g_pGMM pointer, void function
703 * variant.
704 *
705 * On failure it will return from the invoking function.
706 *
707 * @param pGMM The name of the pGMM variable.
708 */
709#define GMM_GET_VALID_INSTANCE_VOID(pGMM) \
710 do { \
711 (pGMM) = g_pGMM; \
712 AssertPtrReturnVoid((pGMM)); \
713 AssertMsgReturnVoid((pGMM)->u32Magic == GMM_MAGIC, ("%p - %#x\n", (pGMM), (pGMM)->u32Magic)); \
714 } while (0)
715
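/* Usage sketch for the macros above (hypothetical function, for illustration
 * only; it merely mirrors the pattern used by the real entry points below):
 * @code
 *  GMMR0DECL(int) gmmR0SketchQueryAllocated(uint64_t *pcAllocatedPages)
 *  {
 *      PGMM pGMM;
 *      GMM_GET_VALID_INSTANCE(pGMM, VERR_GMM_INSTANCE);
 *      *pcAllocatedPages = pGMM->cAllocatedPages;
 *      return VINF_SUCCESS;
 *  }
 * @endcode
 */
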
716
717/** @def GMM_CHECK_SANITY_UPON_ENTERING
718 * Checks the sanity of the GMM instance data before making changes.
719 *
720 * This macro is a stub by default and must be enabled manually in the code.
721 *
722 * @returns true if sane, false if not.
723 * @param pGMM The name of the pGMM variable.
724 */
725#if defined(VBOX_STRICT) && defined(GMMR0_WITH_SANITY_CHECK) && 0
726# define GMM_CHECK_SANITY_UPON_ENTERING(pGMM) (gmmR0SanityCheck((pGMM), __PRETTY_FUNCTION__, __LINE__) == 0)
727#else
728# define GMM_CHECK_SANITY_UPON_ENTERING(pGMM) (true)
729#endif
730
731/** @def GMM_CHECK_SANITY_UPON_LEAVING
732 * Checks the sanity of the GMM instance data after making changes.
733 *
734 * This macro is a stub by default and must be enabled manually in the code.
735 *
736 * @returns true if sane, false if not.
737 * @param pGMM The name of the pGMM variable.
738 */
739#if defined(VBOX_STRICT) && defined(GMMR0_WITH_SANITY_CHECK) && 0
740# define GMM_CHECK_SANITY_UPON_LEAVING(pGMM) (gmmR0SanityCheck((pGMM), __PRETTY_FUNCTION__, __LINE__) == 0)
741#else
742# define GMM_CHECK_SANITY_UPON_LEAVING(pGMM) (true)
743#endif
744
745/** @def GMM_CHECK_SANITY_IN_LOOPS
746 * Checks the sanity of the GMM instance in the allocation loops.
747 *
748 * This macro is a stub by default and must be enabled manually in the code.
749 *
750 * @returns true if sane, false if not.
751 * @param pGMM The name of the pGMM variable.
752 */
753#if defined(VBOX_STRICT) && defined(GMMR0_WITH_SANITY_CHECK) && 0
754# define GMM_CHECK_SANITY_IN_LOOPS(pGMM) (gmmR0SanityCheck((pGMM), __PRETTY_FUNCTION__, __LINE__) == 0)
755#else
756# define GMM_CHECK_SANITY_IN_LOOPS(pGMM) (true)
757#endif
758
759
760/*********************************************************************************************************************************
761* Internal Functions *
762*********************************************************************************************************************************/
763static DECLCALLBACK(int) gmmR0TermDestroyChunk(PAVLU32NODECORE pNode, void *pvGMM);
764static bool gmmR0CleanupVMScanChunk(PGMM pGMM, PGVM pGVM, PGMMCHUNK pChunk);
765DECLINLINE(void) gmmR0UnlinkChunk(PGMMCHUNK pChunk);
766DECLINLINE(void) gmmR0LinkChunk(PGMMCHUNK pChunk, PGMMCHUNKFREESET pSet);
767DECLINLINE(void) gmmR0SelectSetAndLinkChunk(PGMM pGMM, PGVM pGVM, PGMMCHUNK pChunk);
768#ifdef GMMR0_WITH_SANITY_CHECK
769static uint32_t gmmR0SanityCheck(PGMM pGMM, const char *pszFunction, unsigned uLineNo);
770#endif
771static bool gmmR0FreeChunk(PGMM pGMM, PGVM pGVM, PGMMCHUNK pChunk, bool fRelaxedSem);
772DECLINLINE(void) gmmR0FreePrivatePage(PGMM pGMM, PGVM pGVM, uint32_t idPage, PGMMPAGE pPage);
773DECLINLINE(void) gmmR0FreeSharedPage(PGMM pGMM, PGVM pGVM, uint32_t idPage, PGMMPAGE pPage);
774static int gmmR0UnmapChunkLocked(PGMM pGMM, PGVM pGVM, PGMMCHUNK pChunk);
775#ifdef VBOX_WITH_PAGE_SHARING
776static void gmmR0SharedModuleCleanup(PGMM pGMM, PGVM pGVM);
777# ifdef VBOX_STRICT
778static uint32_t gmmR0StrictPageChecksum(PGMM pGMM, PGVM pGVM, uint32_t idPage);
779# endif
780#endif
781
782
783
784/**
785 * Initializes the GMM component.
786 *
787 * This is called when the VMMR0.r0 module is loaded and protected by the
788 * loader semaphore.
789 *
790 * @returns VBox status code.
791 */
792GMMR0DECL(int) GMMR0Init(void)
793{
794 LogFlow(("GMMInit:\n"));
795
796 /*
797 * Allocate the instance data and the locks.
798 */
799 PGMM pGMM = (PGMM)RTMemAllocZ(sizeof(*pGMM));
800 if (!pGMM)
801 return VERR_NO_MEMORY;
802
803 pGMM->u32Magic = GMM_MAGIC;
804 for (unsigned i = 0; i < RT_ELEMENTS(pGMM->ChunkTLB.aEntries); i++)
805 pGMM->ChunkTLB.aEntries[i].idChunk = NIL_GMM_CHUNKID;
806 RTListInit(&pGMM->ChunkList);
807 ASMBitSet(&pGMM->bmChunkId[0], NIL_GMM_CHUNKID);
808
809#ifdef VBOX_USE_CRIT_SECT_FOR_GIANT
810 int rc = RTCritSectInit(&pGMM->GiantCritSect);
811#else
812 int rc = RTSemFastMutexCreate(&pGMM->hMtx);
813#endif
814 if (RT_SUCCESS(rc))
815 {
816 unsigned iMtx;
817 for (iMtx = 0; iMtx < RT_ELEMENTS(pGMM->aChunkMtx); iMtx++)
818 {
819 rc = RTSemFastMutexCreate(&pGMM->aChunkMtx[iMtx].hMtx);
820 if (RT_FAILURE(rc))
821 break;
822 }
823 pGMM->hSpinLockTree = NIL_RTSPINLOCK;
824 if (RT_SUCCESS(rc))
825 rc = RTSpinlockCreate(&pGMM->hSpinLockTree, RTSPINLOCK_FLAGS_INTERRUPT_SAFE, "gmm-chunk-tree");
826 if (RT_SUCCESS(rc))
827 {
828#ifndef GMM_WITH_LEGACY_MODE
829 /*
830 * Figure out how we're going to allocate stuff (only applicable to
831 * hosts with linear physical memory mappings).
832 */
833 pGMM->fBoundMemoryMode = false;
834# ifdef VBOX_WITH_LINEAR_HOST_PHYS_MEM
835 pGMM->fHasWorkingAllocPhysNC = false;
836
837 RTR0MEMOBJ hMemObj;
838 rc = RTR0MemObjAllocPhysNC(&hMemObj, GMM_CHUNK_SIZE, NIL_RTHCPHYS);
839 if (RT_SUCCESS(rc))
840 {
841 rc = RTR0MemObjFree(hMemObj, true);
842 AssertRC(rc);
843 pGMM->fHasWorkingAllocPhysNC = true;
844 }
845 else if (rc != VERR_NOT_SUPPORTED)
846 SUPR0Printf("GMMR0Init: Warning! RTR0MemObjAllocPhysNC(, %u, NIL_RTHCPHYS) -> %d!\n", GMM_CHUNK_SIZE, rc);
847# endif
848#else /* GMM_WITH_LEGACY_MODE */
849 /*
850 * Check and see if RTR0MemObjAllocPhysNC works.
851 */
852# if 0 /* later, see @bugref{3170}. */
853 RTR0MEMOBJ MemObj;
854 rc = RTR0MemObjAllocPhysNC(&MemObj, _64K, NIL_RTHCPHYS);
855 if (RT_SUCCESS(rc))
856 {
857 rc = RTR0MemObjFree(MemObj, true);
858 AssertRC(rc);
859 }
860 else if (rc == VERR_NOT_SUPPORTED)
861 pGMM->fLegacyAllocationMode = pGMM->fBoundMemoryMode = true;
862 else
863 SUPR0Printf("GMMR0Init: RTR0MemObjAllocPhysNC(,64K,Any) -> %d!\n", rc);
864# else
865# if defined(RT_OS_WINDOWS) || (defined(RT_OS_SOLARIS) && ARCH_BITS == 64) || defined(RT_OS_LINUX) || defined(RT_OS_FREEBSD)
866 pGMM->fLegacyAllocationMode = false;
867# if ARCH_BITS == 32
868 /* Don't reuse possibly partial chunks because of the virtual
869 address space limitation. */
870 pGMM->fBoundMemoryMode = true;
871# else
872 pGMM->fBoundMemoryMode = false;
873# endif
874# else
875 pGMM->fLegacyAllocationMode = true;
876 pGMM->fBoundMemoryMode = true;
877# endif
878# endif
879#endif /* GMM_WITH_LEGACY_MODE */
880
881 /*
882 * Query system page count and guess a reasonable cMaxPages value.
883 */
884 pGMM->cMaxPages = UINT32_MAX; /** @todo IPRT function for query ram size and such. */
885
886 /*
887 * The idFreeGeneration value should be set so we actually trigger the
888 * wrap-around invalidation handling during a typical test run.
889 */
890 pGMM->idFreeGeneration = UINT64_MAX / 4 - 128;
891
892 g_pGMM = pGMM;
893#ifdef GMM_WITH_LEGACY_MODE
894 LogFlow(("GMMInit: pGMM=%p fLegacyAllocationMode=%RTbool fBoundMemoryMode=%RTbool\n", pGMM, pGMM->fLegacyAllocationMode, pGMM->fBoundMemoryMode));
895#elif defined(VBOX_WITH_LINEAR_HOST_PHYS_MEM)
896 LogFlow(("GMMInit: pGMM=%p fBoundMemoryMode=%RTbool fHasWorkingAllocPhysNC=%RTbool\n", pGMM, pGMM->fBoundMemoryMode, pGMM->fHasWorkingAllocPhysNC));
897#else
898 LogFlow(("GMMInit: pGMM=%p fBoundMemoryMode=%RTbool\n", pGMM, pGMM->fBoundMemoryMode));
899#endif
900 return VINF_SUCCESS;
901 }
902
903 /*
904 * Bail out.
905 */
906 RTSpinlockDestroy(pGMM->hSpinLockTree);
907 while (iMtx-- > 0)
908 RTSemFastMutexDestroy(pGMM->aChunkMtx[iMtx].hMtx);
909#ifdef VBOX_USE_CRIT_SECT_FOR_GIANT
910 RTCritSectDelete(&pGMM->GiantCritSect);
911#else
912 RTSemFastMutexDestroy(pGMM->hMtx);
913#endif
914 }
915
916 pGMM->u32Magic = 0;
917 RTMemFree(pGMM);
918 SUPR0Printf("GMMR0Init: failed! rc=%d\n", rc);
919 return rc;
920}
921
922
923/**
924 * Terminates the GMM component.
925 */
926GMMR0DECL(void) GMMR0Term(void)
927{
928 LogFlow(("GMMTerm:\n"));
929
930 /*
931 * Take care / be paranoid...
932 */
933 PGMM pGMM = g_pGMM;
934 if (!RT_VALID_PTR(pGMM))
935 return;
936 if (pGMM->u32Magic != GMM_MAGIC)
937 {
938 SUPR0Printf("GMMR0Term: u32Magic=%#x\n", pGMM->u32Magic);
939 return;
940 }
941
942 /*
943 * Undo what init did and free all the resources we've acquired.
944 */
945 /* Destroy the fundamentals. */
946 g_pGMM = NULL;
947 pGMM->u32Magic = ~GMM_MAGIC;
948#ifdef VBOX_USE_CRIT_SECT_FOR_GIANT
949 RTCritSectDelete(&pGMM->GiantCritSect);
950#else
951 RTSemFastMutexDestroy(pGMM->hMtx);
952 pGMM->hMtx = NIL_RTSEMFASTMUTEX;
953#endif
954 RTSpinlockDestroy(pGMM->hSpinLockTree);
955 pGMM->hSpinLockTree = NIL_RTSPINLOCK;
956
957 /* Free any chunks still hanging around. */
958 RTAvlU32Destroy(&pGMM->pChunks, gmmR0TermDestroyChunk, pGMM);
959
960 /* Destroy the chunk locks. */
961 for (unsigned iMtx = 0; iMtx < RT_ELEMENTS(pGMM->aChunkMtx); iMtx++)
962 {
963 Assert(pGMM->aChunkMtx[iMtx].cUsers == 0);
964 RTSemFastMutexDestroy(pGMM->aChunkMtx[iMtx].hMtx);
965 pGMM->aChunkMtx[iMtx].hMtx = NIL_RTSEMFASTMUTEX;
966 }
967
968 /* Finally the instance data itself. */
969 RTMemFree(pGMM);
970 LogFlow(("GMMTerm: done\n"));
971}
972
973
974/**
975 * RTAvlU32Destroy callback.
976 *
977 * @returns 0
978 * @param pNode The node to destroy.
979 * @param pvGMM The GMM handle.
980 */
981static DECLCALLBACK(int) gmmR0TermDestroyChunk(PAVLU32NODECORE pNode, void *pvGMM)
982{
983 PGMMCHUNK pChunk = (PGMMCHUNK)pNode;
984
985 if (pChunk->cFree != (GMM_CHUNK_SIZE >> PAGE_SHIFT))
986 SUPR0Printf("GMMR0Term: %RKv/%#x: cFree=%d cPrivate=%d cShared=%d cMappings=%d\n", pChunk,
987 pChunk->Core.Key, pChunk->cFree, pChunk->cPrivate, pChunk->cShared, pChunk->cMappingsX);
988
989 int rc = RTR0MemObjFree(pChunk->hMemObj, true /* fFreeMappings */);
990 if (RT_FAILURE(rc))
991 {
992 SUPR0Printf("GMMR0Term: %RKv/%#x: RTRMemObjFree(%RKv,true) -> %d (cMappings=%d)\n", pChunk,
993 pChunk->Core.Key, pChunk->hMemObj, rc, pChunk->cMappingsX);
994 AssertRC(rc);
995 }
996 pChunk->hMemObj = NIL_RTR0MEMOBJ;
997
998 RTMemFree(pChunk->paMappingsX);
999 pChunk->paMappingsX = NULL;
1000
1001 RTMemFree(pChunk);
1002 NOREF(pvGMM);
1003 return 0;
1004}
1005
1006
1007/**
1008 * Initializes the per-VM data for the GMM.
1009 *
1010 * This is called from within the GVMM lock (from GVMMR0CreateVM)
1011 * and should only initialize the data members so GMMR0CleanupVM
1012 * can deal with them. We reserve no memory or anything here,
1013 * that's done later in GMMR0InitVM.
1014 *
1015 * @param pGVM Pointer to the Global VM structure.
1016 */
1017GMMR0DECL(int) GMMR0InitPerVMData(PGVM pGVM)
1018{
1019 AssertCompile(RT_SIZEOFMEMB(GVM,gmm.s) <= RT_SIZEOFMEMB(GVM,gmm.padding));
1020
1021 pGVM->gmm.s.Stats.enmPolicy = GMMOCPOLICY_INVALID;
1022 pGVM->gmm.s.Stats.enmPriority = GMMPRIORITY_INVALID;
1023 pGVM->gmm.s.Stats.fMayAllocate = false;
1024
1025 pGVM->gmm.s.hChunkTlbSpinLock = NIL_RTSPINLOCK;
1026 int rc = RTSpinlockCreate(&pGVM->gmm.s.hChunkTlbSpinLock, RTSPINLOCK_FLAGS_INTERRUPT_SAFE, "per-vm-chunk-tlb");
1027 AssertRCReturn(rc, rc);
1028
1029 return VINF_SUCCESS;
1030}
1031
1032
1033/**
1034 * Acquires the GMM giant lock.
1035 *
1036 * @returns Assert status code from RTSemFastMutexRequest.
1037 * @param pGMM Pointer to the GMM instance.
1038 */
1039static int gmmR0MutexAcquire(PGMM pGMM)
1040{
1041 ASMAtomicIncU32(&pGMM->cMtxContenders);
1042#ifdef VBOX_USE_CRIT_SECT_FOR_GIANT
1043 int rc = RTCritSectEnter(&pGMM->GiantCritSect);
1044#else
1045 int rc = RTSemFastMutexRequest(pGMM->hMtx);
1046#endif
1047 ASMAtomicDecU32(&pGMM->cMtxContenders);
1048 AssertRC(rc);
1049#ifdef VBOX_STRICT
1050 pGMM->hMtxOwner = RTThreadNativeSelf();
1051#endif
1052 return rc;
1053}
1054
1055
1056/**
1057 * Releases the GMM giant lock.
1058 *
1059 * @returns Assert status code from RTSemFastMutexRelease.
1060 * @param pGMM Pointer to the GMM instance.
1061 */
1062static int gmmR0MutexRelease(PGMM pGMM)
1063{
1064#ifdef VBOX_STRICT
1065 pGMM->hMtxOwner = NIL_RTNATIVETHREAD;
1066#endif
1067#ifdef VBOX_USE_CRIT_SECT_FOR_GIANT
1068 int rc = RTCritSectLeave(&pGMM->GiantCritSect);
1069#else
1070 int rc = RTSemFastMutexRelease(pGMM->hMtx);
1071 AssertRC(rc);
1072#endif
1073 return rc;
1074}
1075
1076
1077/**
1078 * Yields the GMM giant lock if there is contention and a certain minimum time
1079 * has elapsed since we took it.
1080 *
1081 * @returns @c true if the mutex was yielded, @c false if not.
1082 * @param pGMM Pointer to the GMM instance.
1083 * @param puLockNanoTS Where the lock acquisition time stamp is kept
1084 * (in/out).
1085 */
1086static bool gmmR0MutexYield(PGMM pGMM, uint64_t *puLockNanoTS)
1087{
1088 /*
1089 * If nobody is contending the mutex, don't bother checking the time.
1090 */
1091 if (ASMAtomicReadU32(&pGMM->cMtxContenders) == 0)
1092 return false;
1093
1094 /*
1095 * Don't yield if we haven't executed for at least 2 milliseconds.
1096 */
1097 uint64_t uNanoNow = RTTimeSystemNanoTS();
1098 if (uNanoNow - *puLockNanoTS < UINT32_C(2000000))
1099 return false;
1100
1101 /*
1102 * Yield the mutex.
1103 */
1104#ifdef VBOX_STRICT
1105 pGMM->hMtxOwner = NIL_RTNATIVETHREAD;
1106#endif
1107 ASMAtomicIncU32(&pGMM->cMtxContenders);
1108#ifdef VBOX_USE_CRIT_SECT_FOR_GIANT
1109 int rc1 = RTCritSectLeave(&pGMM->GiantCritSect); AssertRC(rc1);
1110#else
1111 int rc1 = RTSemFastMutexRelease(pGMM->hMtx); AssertRC(rc1);
1112#endif
1113
1114 RTThreadYield();
1115
1116#ifdef VBOX_USE_CRIT_SECT_FOR_GIANT
1117 int rc2 = RTCritSectEnter(&pGMM->GiantCritSect); AssertRC(rc2);
1118#else
1119 int rc2 = RTSemFastMutexRequest(pGMM->hMtx); AssertRC(rc2);
1120#endif
1121 *puLockNanoTS = RTTimeSystemNanoTS();
1122 ASMAtomicDecU32(&pGMM->cMtxContenders);
1123#ifdef VBOX_STRICT
1124 pGMM->hMtxOwner = RTThreadNativeSelf();
1125#endif
1126
1127 return true;
1128}
1129
1130
1131/**
1132 * Acquires a chunk lock.
1133 *
1134 * The caller must own the giant lock.
1135 *
1136 * @returns Assert status code from RTSemFastMutexRequest.
1137 * @param pMtxState The chunk mutex state info. (Avoids
1138 * passing the same flags and stuff around
1139 * for subsequent release and drop-giant
1140 * calls.)
1141 * @param pGMM Pointer to the GMM instance.
1142 * @param pChunk Pointer to the chunk.
1143 * @param fFlags Flags regarding the giant lock, GMMR0CHUNK_MTX_XXX.
1144 */
1145static int gmmR0ChunkMutexAcquire(PGMMR0CHUNKMTXSTATE pMtxState, PGMM pGMM, PGMMCHUNK pChunk, uint32_t fFlags)
1146{
1147 Assert(fFlags > GMMR0CHUNK_MTX_INVALID && fFlags < GMMR0CHUNK_MTX_END);
1148 Assert(pGMM->hMtxOwner == RTThreadNativeSelf());
1149
1150 pMtxState->pGMM = pGMM;
1151 pMtxState->fFlags = (uint8_t)fFlags;
1152
1153 /*
1154 * Get the lock index and reference the lock.
1155 */
1156 Assert(pGMM->hMtxOwner == RTThreadNativeSelf());
1157 uint32_t iChunkMtx = pChunk->iChunkMtx;
1158 if (iChunkMtx == UINT8_MAX)
1159 {
1160 iChunkMtx = pGMM->iNextChunkMtx++;
1161 iChunkMtx %= RT_ELEMENTS(pGMM->aChunkMtx);
1162
1163 /* Try get an unused one... */
1164 if (pGMM->aChunkMtx[iChunkMtx].cUsers)
1165 {
1166 iChunkMtx = pGMM->iNextChunkMtx++;
1167 iChunkMtx %= RT_ELEMENTS(pGMM->aChunkMtx);
1168 if (pGMM->aChunkMtx[iChunkMtx].cUsers)
1169 {
1170 iChunkMtx = pGMM->iNextChunkMtx++;
1171 iChunkMtx %= RT_ELEMENTS(pGMM->aChunkMtx);
1172 if (pGMM->aChunkMtx[iChunkMtx].cUsers)
1173 {
1174 iChunkMtx = pGMM->iNextChunkMtx++;
1175 iChunkMtx %= RT_ELEMENTS(pGMM->aChunkMtx);
1176 }
1177 }
1178 }
1179
1180 pChunk->iChunkMtx = iChunkMtx;
1181 }
1182 AssertCompile(RT_ELEMENTS(pGMM->aChunkMtx) < UINT8_MAX);
1183 pMtxState->iChunkMtx = (uint8_t)iChunkMtx;
1184 ASMAtomicIncU32(&pGMM->aChunkMtx[iChunkMtx].cUsers);
1185
1186 /*
1187 * Drop the giant?
1188 */
1189 if (fFlags != GMMR0CHUNK_MTX_KEEP_GIANT)
1190 {
1191 /** @todo GMM life cycle cleanup (we may race someone
1192 * destroying and cleaning up GMM)? */
1193 gmmR0MutexRelease(pGMM);
1194 }
1195
1196 /*
1197 * Take the chunk mutex.
1198 */
1199 int rc = RTSemFastMutexRequest(pGMM->aChunkMtx[iChunkMtx].hMtx);
1200 AssertRC(rc);
1201 return rc;
1202}
1203
1204
1205/**
1206 * Releases a chunk mutex acquired by gmmR0ChunkMutexAcquire.
1207 *
1208 * @returns Assert status code from RTSemFastMutexRelease.
1209 * @param pMtxState Pointer to the chunk mutex state.
1210 * @param pChunk Pointer to the chunk if it's still
1211 * alive, NULL if it isn't. This is used to deassociate
1212 * the chunk from the mutex on the way out so a new one
1213 * can be selected next time, thus avoiding contended
1214 * mutexes.
1215 */
1216static int gmmR0ChunkMutexRelease(PGMMR0CHUNKMTXSTATE pMtxState, PGMMCHUNK pChunk)
1217{
1218 PGMM pGMM = pMtxState->pGMM;
1219
1220 /*
1221 * Release the chunk mutex and reacquire the giant if requested.
1222 */
1223 int rc = RTSemFastMutexRelease(pGMM->aChunkMtx[pMtxState->iChunkMtx].hMtx);
1224 AssertRC(rc);
1225 if (pMtxState->fFlags == GMMR0CHUNK_MTX_RETAKE_GIANT)
1226 rc = gmmR0MutexAcquire(pGMM);
1227 else
1228 Assert((pMtxState->fFlags != GMMR0CHUNK_MTX_DROP_GIANT) == (pGMM->hMtxOwner == RTThreadNativeSelf()));
1229
1230 /*
1231 * Drop the chunk mutex user reference and deassociate it from the chunk
1232 * when possible.
1233 */
1234 if ( ASMAtomicDecU32(&pGMM->aChunkMtx[pMtxState->iChunkMtx].cUsers) == 0
1235 && pChunk
1236 && RT_SUCCESS(rc) )
1237 {
1238 if (pMtxState->fFlags != GMMR0CHUNK_MTX_DROP_GIANT)
1239 pChunk->iChunkMtx = UINT8_MAX;
1240 else
1241 {
1242 rc = gmmR0MutexAcquire(pGMM);
1243 if (RT_SUCCESS(rc))
1244 {
1245 if (pGMM->aChunkMtx[pMtxState->iChunkMtx].cUsers == 0)
1246 pChunk->iChunkMtx = UINT8_MAX;
1247 rc = gmmR0MutexRelease(pGMM);
1248 }
1249 }
1250 }
1251
1252 pMtxState->pGMM = NULL;
1253 return rc;
1254}
1255
1256
1257/**
1258 * Drops the giant GMM lock we kept in gmmR0ChunkMutexAcquire while keeping the
1259 * chunk locked.
1260 *
1261 * This only works if gmmR0ChunkMutexAcquire was called with
1262 * GMMR0CHUNK_MTX_KEEP_GIANT. gmmR0ChunkMutexRelease will retake the giant
1263 * mutex, i.e. behave as if GMMR0CHUNK_MTX_RETAKE_GIANT was used.
1264 *
1265 * @returns VBox status code (assuming success is ok).
1266 * @param pMtxState Pointer to the chunk mutex state.
1267 */
1268static int gmmR0ChunkMutexDropGiant(PGMMR0CHUNKMTXSTATE pMtxState)
1269{
1270 AssertReturn(pMtxState->fFlags == GMMR0CHUNK_MTX_KEEP_GIANT, VERR_GMM_MTX_FLAGS);
1271 Assert(pMtxState->pGMM->hMtxOwner == RTThreadNativeSelf());
1272 pMtxState->fFlags = GMMR0CHUNK_MTX_RETAKE_GIANT;
1273 /** @todo GMM life cycle cleanup (we may race someone
1274 * destroying and cleaning up GMM)? */
1275 return gmmR0MutexRelease(pMtxState->pGMM);
1276}
1277
1278
1279/**
1280 * For experimenting with NUMA affinity and such.
1281 *
1282 * @returns The current NUMA Node ID.
1283 */
1284static uint16_t gmmR0GetCurrentNumaNodeId(void)
1285{
1286#if 1
1287 return GMM_CHUNK_NUMA_ID_UNKNOWN;
1288#else
1289 return RTMpCpuId() / 16;
1290#endif
1291}
1292
1293
1294
1295/**
1296 * Cleans up when a VM is terminating.
1297 *
1298 * @param pGVM Pointer to the Global VM structure.
1299 */
1300GMMR0DECL(void) GMMR0CleanupVM(PGVM pGVM)
1301{
1302 LogFlow(("GMMR0CleanupVM: pGVM=%p:{.hSelf=%#x}\n", pGVM, pGVM->hSelf));
1303
1304 PGMM pGMM;
1305 GMM_GET_VALID_INSTANCE_VOID(pGMM);
1306
1307#ifdef VBOX_WITH_PAGE_SHARING
1308 /*
1309 * Clean up all registered shared modules first.
1310 */
1311 gmmR0SharedModuleCleanup(pGMM, pGVM);
1312#endif
1313
1314 gmmR0MutexAcquire(pGMM);
1315 uint64_t uLockNanoTS = RTTimeSystemNanoTS();
1316 GMM_CHECK_SANITY_UPON_ENTERING(pGMM);
1317
1318 /*
1319 * The policy is 'INVALID' until the initial reservation
1320 * request has been serviced.
1321 */
1322 if ( pGVM->gmm.s.Stats.enmPolicy > GMMOCPOLICY_INVALID
1323 && pGVM->gmm.s.Stats.enmPolicy < GMMOCPOLICY_END)
1324 {
1325 /*
1326 * If it's the last VM around, we can skip walking all the chunks looking
1327 * for the pages owned by this VM and instead flush the whole shebang.
1328 *
1329 * This takes care of the eventuality that a VM has left shared page
1330 * references behind (shouldn't happen of course, but you never know).
1331 */
1332 Assert(pGMM->cRegisteredVMs);
1333 pGMM->cRegisteredVMs--;
1334
1335 /*
1336 * Walk the entire pool looking for pages that belong to this VM
1337 * and leftover mappings. (This'll only catch private pages,
1338 * shared pages will be 'left behind'.)
1339 */
1340 /** @todo r=bird: This scanning+freeing could be optimized in bound mode! */
1341 uint64_t cPrivatePages = pGVM->gmm.s.Stats.cPrivatePages; /* save */
1342
1343 unsigned iCountDown = 64;
1344 bool fRedoFromStart;
1345 PGMMCHUNK pChunk;
1346 do
1347 {
1348 fRedoFromStart = false;
1349 RTListForEachReverse(&pGMM->ChunkList, pChunk, GMMCHUNK, ListNode)
1350 {
1351 uint32_t const cFreeChunksOld = pGMM->cFreedChunks;
1352 if ( ( !pGMM->fBoundMemoryMode
1353 || pChunk->hGVM == pGVM->hSelf)
1354 && gmmR0CleanupVMScanChunk(pGMM, pGVM, pChunk))
1355 {
1356 /* We left the giant mutex, so reset the yield counters. */
1357 uLockNanoTS = RTTimeSystemNanoTS();
1358 iCountDown = 64;
1359 }
1360 else
1361 {
1362 /* Didn't leave it, so do normal yielding. */
1363 if (!iCountDown)
1364 gmmR0MutexYield(pGMM, &uLockNanoTS);
1365 else
1366 iCountDown--;
1367 }
1368 if (pGMM->cFreedChunks != cFreeChunksOld)
1369 {
1370 fRedoFromStart = true;
1371 break;
1372 }
1373 }
1374 } while (fRedoFromStart);
1375
1376 if (pGVM->gmm.s.Stats.cPrivatePages)
1377 SUPR0Printf("GMMR0CleanupVM: hGVM=%#x has %#x private pages that cannot be found!\n", pGVM->hSelf, pGVM->gmm.s.Stats.cPrivatePages);
1378
1379 pGMM->cAllocatedPages -= cPrivatePages;
1380
1381 /*
1382 * Free empty chunks.
1383 */
1384 PGMMCHUNKFREESET pPrivateSet = pGMM->fBoundMemoryMode ? &pGVM->gmm.s.Private : &pGMM->PrivateX;
1385 do
1386 {
1387 fRedoFromStart = false;
1388 iCountDown = 10240;
1389 pChunk = pPrivateSet->apLists[GMM_CHUNK_FREE_SET_UNUSED_LIST];
1390 while (pChunk)
1391 {
1392 PGMMCHUNK pNext = pChunk->pFreeNext;
1393 Assert(pChunk->cFree == GMM_CHUNK_NUM_PAGES);
1394 if ( !pGMM->fBoundMemoryMode
1395 || pChunk->hGVM == pGVM->hSelf)
1396 {
1397 uint64_t const idGenerationOld = pPrivateSet->idGeneration;
1398 if (gmmR0FreeChunk(pGMM, pGVM, pChunk, true /*fRelaxedSem*/))
1399 {
1400 /* We've left the giant mutex, restart? (+1 for our unlink) */
1401 fRedoFromStart = pPrivateSet->idGeneration != idGenerationOld + 1;
1402 if (fRedoFromStart)
1403 break;
1404 uLockNanoTS = RTTimeSystemNanoTS();
1405 iCountDown = 10240;
1406 }
1407 }
1408
1409 /* Advance and maybe yield the lock. */
1410 pChunk = pNext;
1411 if (--iCountDown == 0)
1412 {
1413 uint64_t const idGenerationOld = pPrivateSet->idGeneration;
1414 fRedoFromStart = gmmR0MutexYield(pGMM, &uLockNanoTS)
1415 && pPrivateSet->idGeneration != idGenerationOld;
1416 if (fRedoFromStart)
1417 break;
1418 iCountDown = 10240;
1419 }
1420 }
1421 } while (fRedoFromStart);
1422
1423 /*
1424 * Account for shared pages that weren't freed.
1425 */
1426 if (pGVM->gmm.s.Stats.cSharedPages)
1427 {
1428 Assert(pGMM->cSharedPages >= pGVM->gmm.s.Stats.cSharedPages);
1429 SUPR0Printf("GMMR0CleanupVM: hGVM=%#x left %#x shared pages behind!\n", pGVM->hSelf, pGVM->gmm.s.Stats.cSharedPages);
1430 pGMM->cLeftBehindSharedPages += pGVM->gmm.s.Stats.cSharedPages;
1431 }
1432
1433 /*
1434 * Clean up balloon statistics in case the VM process crashed.
1435 */
1436 Assert(pGMM->cBalloonedPages >= pGVM->gmm.s.Stats.cBalloonedPages);
1437 pGMM->cBalloonedPages -= pGVM->gmm.s.Stats.cBalloonedPages;
1438
1439 /*
1440 * Update the over-commitment management statistics.
1441 */
1442 pGMM->cReservedPages -= pGVM->gmm.s.Stats.Reserved.cBasePages
1443 + pGVM->gmm.s.Stats.Reserved.cFixedPages
1444 + pGVM->gmm.s.Stats.Reserved.cShadowPages;
1445 switch (pGVM->gmm.s.Stats.enmPolicy)
1446 {
1447 case GMMOCPOLICY_NO_OC:
1448 break;
1449 default:
1450 /** @todo Update GMM->cOverCommittedPages */
1451 break;
1452 }
1453 }
1454
1455 /* zap the GVM data. */
1456 pGVM->gmm.s.Stats.enmPolicy = GMMOCPOLICY_INVALID;
1457 pGVM->gmm.s.Stats.enmPriority = GMMPRIORITY_INVALID;
1458 pGVM->gmm.s.Stats.fMayAllocate = false;
1459
1460 GMM_CHECK_SANITY_UPON_LEAVING(pGMM);
1461 gmmR0MutexRelease(pGMM);
1462
1463 /*
1464 * Destroy the spinlock.
1465 */
1466 RTSPINLOCK hSpinlock = NIL_RTSPINLOCK;
1467 ASMAtomicXchgHandle(&pGVM->gmm.s.hChunkTlbSpinLock, NIL_RTSPINLOCK, &hSpinlock);
1468 RTSpinlockDestroy(hSpinlock);
1469
1470 LogFlow(("GMMR0CleanupVM: returns\n"));
1471}
1472
1473
1474/**
1475 * Scan one chunk for private pages belonging to the specified VM.
1476 *
1477 * @note This function may drop the giant mutex!
1478 *
1479 * @returns @c true if we've temporarily dropped the giant mutex, @c false if
1480 * we didn't.
1481 * @param pGMM Pointer to the GMM instance.
1482 * @param pGVM The global VM handle.
1483 * @param pChunk The chunk to scan.
1484 */
1485static bool gmmR0CleanupVMScanChunk(PGMM pGMM, PGVM pGVM, PGMMCHUNK pChunk)
1486{
1487 Assert(!pGMM->fBoundMemoryMode || pChunk->hGVM == pGVM->hSelf);
1488
1489 /*
1490 * Look for pages belonging to the VM.
1491 * (Perform some internal checks while we're scanning.)
1492 */
1493#ifndef VBOX_STRICT
1494 if (pChunk->cFree != (GMM_CHUNK_SIZE >> PAGE_SHIFT))
1495#endif
1496 {
1497 unsigned cPrivate = 0;
1498 unsigned cShared = 0;
1499 unsigned cFree = 0;
1500
1501 gmmR0UnlinkChunk(pChunk); /* avoiding cFreePages updates. */
1502
1503 uint16_t hGVM = pGVM->hSelf;
1504 unsigned iPage = (GMM_CHUNK_SIZE >> PAGE_SHIFT);
1505 while (iPage-- > 0)
1506 if (GMM_PAGE_IS_PRIVATE(&pChunk->aPages[iPage]))
1507 {
1508 if (pChunk->aPages[iPage].Private.hGVM == hGVM)
1509 {
1510 /*
1511 * Free the page.
1512 *
1513 * The reason for not using gmmR0FreePrivatePage here is that we
1514 * must *not* cause the chunk to be freed from under us - we're in
1515 * an AVL tree walk here.
1516 */
1517 pChunk->aPages[iPage].u = 0;
1518 pChunk->aPages[iPage].Free.iNext = pChunk->iFreeHead;
1519 pChunk->aPages[iPage].Free.u2State = GMM_PAGE_STATE_FREE;
1520 pChunk->iFreeHead = iPage;
1521 pChunk->cPrivate--;
1522 pChunk->cFree++;
1523 pGVM->gmm.s.Stats.cPrivatePages--;
1524 cFree++;
1525 }
1526 else
1527 cPrivate++;
1528 }
1529 else if (GMM_PAGE_IS_FREE(&pChunk->aPages[iPage]))
1530 cFree++;
1531 else
1532 cShared++;
1533
1534 gmmR0SelectSetAndLinkChunk(pGMM, pGVM, pChunk);
1535
1536 /*
1537 * Did it add up?
1538 */
1539 if (RT_UNLIKELY( pChunk->cFree != cFree
1540 || pChunk->cPrivate != cPrivate
1541 || pChunk->cShared != cShared))
1542 {
1543 SUPR0Printf("gmmR0CleanupVMScanChunk: Chunk %RKv/%#x has bogus stats - free=%d/%d private=%d/%d shared=%d/%d\n",
1544 pChunk, pChunk->Core.Key, pChunk->cFree, cFree, pChunk->cPrivate, cPrivate, pChunk->cShared, cShared);
1545 pChunk->cFree = cFree;
1546 pChunk->cPrivate = cPrivate;
1547 pChunk->cShared = cShared;
1548 }
1549 }
1550
1551 /*
1552 * If not in bound memory mode, we should reset the hGVM field
1553 * if it has our handle in it.
1554 */
1555 if (pChunk->hGVM == pGVM->hSelf)
1556 {
1557 if (!g_pGMM->fBoundMemoryMode)
1558 pChunk->hGVM = NIL_GVM_HANDLE;
1559 else if (pChunk->cFree != GMM_CHUNK_NUM_PAGES)
1560 {
1561 SUPR0Printf("gmmR0CleanupVMScanChunk: %RKv/%#x: cFree=%#x - it should be 0 in bound mode!\n",
1562 pChunk, pChunk->Core.Key, pChunk->cFree);
1563 AssertMsgFailed(("%p/%#x: cFree=%#x - it should be 0 in bound mode!\n", pChunk, pChunk->Core.Key, pChunk->cFree));
1564
1565 gmmR0UnlinkChunk(pChunk);
1566 pChunk->cFree = GMM_CHUNK_NUM_PAGES;
1567 gmmR0SelectSetAndLinkChunk(pGMM, pGVM, pChunk);
1568 }
1569 }
1570
1571 /*
1572 * Look for a mapping belonging to the terminating VM.
1573 */
1574 GMMR0CHUNKMTXSTATE MtxState;
1575 gmmR0ChunkMutexAcquire(&MtxState, pGMM, pChunk, GMMR0CHUNK_MTX_KEEP_GIANT);
1576 unsigned cMappings = pChunk->cMappingsX;
1577 for (unsigned i = 0; i < cMappings; i++)
1578 if (pChunk->paMappingsX[i].pGVM == pGVM)
1579 {
1580 gmmR0ChunkMutexDropGiant(&MtxState);
1581
1582 RTR0MEMOBJ hMemObj = pChunk->paMappingsX[i].hMapObj;
1583
1584 cMappings--;
1585 if (i < cMappings)
1586 pChunk->paMappingsX[i] = pChunk->paMappingsX[cMappings];
1587 pChunk->paMappingsX[cMappings].pGVM = NULL;
1588 pChunk->paMappingsX[cMappings].hMapObj = NIL_RTR0MEMOBJ;
1589 Assert(pChunk->cMappingsX - 1U == cMappings);
1590 pChunk->cMappingsX = cMappings;
1591
1592 int rc = RTR0MemObjFree(hMemObj, false /* fFreeMappings (NA) */);
1593 if (RT_FAILURE(rc))
1594 {
1595 SUPR0Printf("gmmR0CleanupVMScanChunk: %RKv/%#x: mapping #%x: RTRMemObjFree(%RKv,false) -> %d \n",
1596 pChunk, pChunk->Core.Key, i, hMemObj, rc);
1597 AssertRC(rc);
1598 }
1599
1600 gmmR0ChunkMutexRelease(&MtxState, pChunk);
1601 return true;
1602 }
1603
1604 gmmR0ChunkMutexRelease(&MtxState, pChunk);
1605 return false;
1606}
1607
1608
1609/**
1610 * The initial resource reservations.
1611 *
1612 * This will make memory reservations according to policy and priority. If there aren't
1613 * sufficient resources available to sustain the VM this function will fail and all
1614 * future allocation requests will fail as well.
1615 *
1616 * These are just the initial reservations made very early during the VM creation
1617 * process and will be adjusted later in the GMMR0UpdateReservation call after the
1618 * ring-3 init has completed.
1619 *
1620 * @returns VBox status code.
1621 * @retval VERR_GMM_MEMORY_RESERVATION_DECLINED
1622 * @retval VERR_GMM_
1623 *
1624 * @param pGVM The global (ring-0) VM structure.
1625 * @param idCpu The VCPU id - must be zero.
1626 * @param cBasePages The number of pages that may be allocated for the base RAM and ROMs.
1627 * This does not include MMIO2 and similar.
1628 * @param cShadowPages The number of pages that may be allocated for shadow paging structures.
1629 * @param cFixedPages The number of pages that may be allocated for fixed objects like the
1630 * hyper heap, MMIO2 and similar.
1631 * @param enmPolicy The OC policy to use on this VM.
1632 * @param enmPriority The priority in an out-of-memory situation.
1633 *
1634 * @thread The creator thread / EMT(0).
1635 */
1636GMMR0DECL(int) GMMR0InitialReservation(PGVM pGVM, VMCPUID idCpu, uint64_t cBasePages, uint32_t cShadowPages,
1637 uint32_t cFixedPages, GMMOCPOLICY enmPolicy, GMMPRIORITY enmPriority)
1638{
1639 LogFlow(("GMMR0InitialReservation: pGVM=%p cBasePages=%#llx cShadowPages=%#x cFixedPages=%#x enmPolicy=%d enmPriority=%d\n",
1640 pGVM, cBasePages, cShadowPages, cFixedPages, enmPolicy, enmPriority));
1641
1642 /*
1643 * Validate, get basics and take the semaphore.
1644 */
1645 AssertReturn(idCpu == 0, VERR_INVALID_CPU_ID);
1646 PGMM pGMM;
1647 GMM_GET_VALID_INSTANCE(pGMM, VERR_GMM_INSTANCE);
1648 int rc = GVMMR0ValidateGVMandEMT(pGVM, idCpu);
1649 if (RT_FAILURE(rc))
1650 return rc;
1651
1652 AssertReturn(cBasePages, VERR_INVALID_PARAMETER);
1653 AssertReturn(cShadowPages, VERR_INVALID_PARAMETER);
1654 AssertReturn(cFixedPages, VERR_INVALID_PARAMETER);
1655 AssertReturn(enmPolicy > GMMOCPOLICY_INVALID && enmPolicy < GMMOCPOLICY_END, VERR_INVALID_PARAMETER);
1656 AssertReturn(enmPriority > GMMPRIORITY_INVALID && enmPriority < GMMPRIORITY_END, VERR_INVALID_PARAMETER);
1657
1658 gmmR0MutexAcquire(pGMM);
1659 if (GMM_CHECK_SANITY_UPON_ENTERING(pGMM))
1660 {
1661 if ( !pGVM->gmm.s.Stats.Reserved.cBasePages
1662 && !pGVM->gmm.s.Stats.Reserved.cFixedPages
1663 && !pGVM->gmm.s.Stats.Reserved.cShadowPages)
1664 {
1665 /*
1666 * Check if we can accommodate this.
1667 */
1668 /* ... later ... */
1669 if (RT_SUCCESS(rc))
1670 {
1671 /*
1672 * Update the records.
1673 */
1674 pGVM->gmm.s.Stats.Reserved.cBasePages = cBasePages;
1675 pGVM->gmm.s.Stats.Reserved.cFixedPages = cFixedPages;
1676 pGVM->gmm.s.Stats.Reserved.cShadowPages = cShadowPages;
1677 pGVM->gmm.s.Stats.enmPolicy = enmPolicy;
1678 pGVM->gmm.s.Stats.enmPriority = enmPriority;
1679 pGVM->gmm.s.Stats.fMayAllocate = true;
1680
1681 pGMM->cReservedPages += cBasePages + cFixedPages + cShadowPages;
1682 pGMM->cRegisteredVMs++;
1683 }
1684 }
1685 else
1686 rc = VERR_WRONG_ORDER;
1687 GMM_CHECK_SANITY_UPON_LEAVING(pGMM);
1688 }
1689 else
1690 rc = VERR_GMM_IS_NOT_SANE;
1691 gmmR0MutexRelease(pGMM);
1692 LogFlow(("GMMR0InitialReservation: returns %Rrc\n", rc));
1693 return rc;
1694}
1695
1696
1697/**
1698 * VMMR0 request wrapper for GMMR0InitialReservation.
1699 *
1700 * @returns see GMMR0InitialReservation.
1701 * @param pGVM The global (ring-0) VM structure.
1702 * @param idCpu The VCPU id.
1703 * @param pReq Pointer to the request packet.
1704 */
1705GMMR0DECL(int) GMMR0InitialReservationReq(PGVM pGVM, VMCPUID idCpu, PGMMINITIALRESERVATIONREQ pReq)
1706{
1707 /*
1708 * Validate input and pass it on.
1709 */
1710 AssertPtrReturn(pGVM, VERR_INVALID_POINTER);
1711 AssertPtrReturn(pReq, VERR_INVALID_POINTER);
1712 AssertMsgReturn(pReq->Hdr.cbReq == sizeof(*pReq), ("%#x != %#x\n", pReq->Hdr.cbReq, sizeof(*pReq)), VERR_INVALID_PARAMETER);
1713
1714 return GMMR0InitialReservation(pGVM, idCpu, pReq->cBasePages, pReq->cShadowPages,
1715 pReq->cFixedPages, pReq->enmPolicy, pReq->enmPriority);
1716}
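
/*
 * A minimal ring-3 usage sketch (illustrative only): the request normally arrives
 * through the VMMR0 request path rather than by calling this wrapper directly, and
 * the policy/priority enum members, the page counts and the cbGuestRam variable
 * shown below are assumptions for the sake of the example.
 *
 *     GMMINITIALRESERVATIONREQ Req;
 *     Req.Hdr.cbReq    = sizeof(Req);                 (other header fields omitted)
 *     Req.cBasePages   = cbGuestRam >> PAGE_SHIFT;
 *     Req.cShadowPages = 1024;
 *     Req.cFixedPages  = 4096;
 *     Req.enmPolicy    = GMMOCPOLICY_NO_OC;
 *     Req.enmPriority  = GMMPRIORITY_NORMAL;
 *     int rc = GMMR0InitialReservationReq(pGVM, 0, &Req);    (idCpu must be 0)
 */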
1717
1718
1719/**
1720 * This updates the memory reservation with the additional MMIO2 and ROM pages.
1721 *
1722 * @returns VBox status code.
1723 * @retval VERR_GMM_MEMORY_RESERVATION_DECLINED
1724 *
1725 * @param pGVM The global (ring-0) VM structure.
1726 * @param idCpu The VCPU id.
1727 * @param cBasePages The number of pages that may be allocated for the base RAM and ROMs.
1728 * This does not include MMIO2 and similar.
1729 * @param cShadowPages The number of pages that may be allocated for shadow paging structures.
1730 * @param cFixedPages The number of pages that may be allocated for fixed objects like the
1731 * hyper heap, MMIO2 and similar.
1732 *
1733 * @thread EMT(idCpu)
1734 */
1735GMMR0DECL(int) GMMR0UpdateReservation(PGVM pGVM, VMCPUID idCpu, uint64_t cBasePages,
1736 uint32_t cShadowPages, uint32_t cFixedPages)
1737{
1738 LogFlow(("GMMR0UpdateReservation: pGVM=%p cBasePages=%#llx cShadowPages=%#x cFixedPages=%#x\n",
1739 pGVM, cBasePages, cShadowPages, cFixedPages));
1740
1741 /*
1742 * Validate, get basics and take the semaphore.
1743 */
1744 PGMM pGMM;
1745 GMM_GET_VALID_INSTANCE(pGMM, VERR_GMM_INSTANCE);
1746 int rc = GVMMR0ValidateGVMandEMT(pGVM, idCpu);
1747 if (RT_FAILURE(rc))
1748 return rc;
1749
1750 AssertReturn(cBasePages, VERR_INVALID_PARAMETER);
1751 AssertReturn(cShadowPages, VERR_INVALID_PARAMETER);
1752 AssertReturn(cFixedPages, VERR_INVALID_PARAMETER);
1753
1754 gmmR0MutexAcquire(pGMM);
1755 if (GMM_CHECK_SANITY_UPON_ENTERING(pGMM))
1756 {
1757 if ( pGVM->gmm.s.Stats.Reserved.cBasePages
1758 && pGVM->gmm.s.Stats.Reserved.cFixedPages
1759 && pGVM->gmm.s.Stats.Reserved.cShadowPages)
1760 {
1761 /*
1762 * Check if we can accommodate this.
1763 */
1764 /* ... later ... */
1765 if (RT_SUCCESS(rc))
1766 {
1767 /*
1768 * Update the records.
1769 */
1770 pGMM->cReservedPages -= pGVM->gmm.s.Stats.Reserved.cBasePages
1771 + pGVM->gmm.s.Stats.Reserved.cFixedPages
1772 + pGVM->gmm.s.Stats.Reserved.cShadowPages;
1773 pGMM->cReservedPages += cBasePages + cFixedPages + cShadowPages;
1774
1775 pGVM->gmm.s.Stats.Reserved.cBasePages = cBasePages;
1776 pGVM->gmm.s.Stats.Reserved.cFixedPages = cFixedPages;
1777 pGVM->gmm.s.Stats.Reserved.cShadowPages = cShadowPages;
1778 }
1779 }
1780 else
1781 rc = VERR_WRONG_ORDER;
1782 GMM_CHECK_SANITY_UPON_LEAVING(pGMM);
1783 }
1784 else
1785 rc = VERR_GMM_IS_NOT_SANE;
1786 gmmR0MutexRelease(pGMM);
1787 LogFlow(("GMMR0UpdateReservation: returns %Rrc\n", rc));
1788 return rc;
1789}
1790
1791
1792/**
1793 * VMMR0 request wrapper for GMMR0UpdateReservation.
1794 *
1795 * @returns see GMMR0UpdateReservation.
1796 * @param pGVM The global (ring-0) VM structure.
1797 * @param idCpu The VCPU id.
1798 * @param pReq Pointer to the request packet.
1799 */
1800GMMR0DECL(int) GMMR0UpdateReservationReq(PGVM pGVM, VMCPUID idCpu, PGMMUPDATERESERVATIONREQ pReq)
1801{
1802 /*
1803 * Validate input and pass it on.
1804 */
1805 AssertPtrReturn(pReq, VERR_INVALID_POINTER);
1806 AssertMsgReturn(pReq->Hdr.cbReq == sizeof(*pReq), ("%#x != %#x\n", pReq->Hdr.cbReq, sizeof(*pReq)), VERR_INVALID_PARAMETER);
1807
1808 return GMMR0UpdateReservation(pGVM, idCpu, pReq->cBasePages, pReq->cShadowPages, pReq->cFixedPages);
1809}
1810
1811#ifdef GMMR0_WITH_SANITY_CHECK
1812
1813/**
1814 * Performs sanity checks on a free set.
1815 *
1816 * @returns Error count.
1817 *
1818 * @param pGMM Pointer to the GMM instance.
1819 * @param pSet Pointer to the set.
1820 * @param pszSetName The set name.
1821 * @param pszFunction The function from which it was called.
1822 * @param   uLineNo     The line number.
1823 */
1824static uint32_t gmmR0SanityCheckSet(PGMM pGMM, PGMMCHUNKFREESET pSet, const char *pszSetName,
1825 const char *pszFunction, unsigned uLineNo)
1826{
1827 uint32_t cErrors = 0;
1828
1829 /*
1830 * Count the free pages in all the chunks and match it against pSet->cFreePages.
1831 */
1832 uint32_t cPages = 0;
1833 for (unsigned i = 0; i < RT_ELEMENTS(pSet->apLists); i++)
1834 {
1835 for (PGMMCHUNK pCur = pSet->apLists[i]; pCur; pCur = pCur->pFreeNext)
1836 {
1837            /** @todo check that the chunk is hashed into the right set. */
1838 cPages += pCur->cFree;
1839 }
1840 }
1841 if (RT_UNLIKELY(cPages != pSet->cFreePages))
1842 {
1843 SUPR0Printf("GMM insanity: found %#x pages in the %s set, expected %#x. (%s, line %u)\n",
1844 cPages, pszSetName, pSet->cFreePages, pszFunction, uLineNo);
1845 cErrors++;
1846 }
1847
1848 return cErrors;
1849}
1850
1851
1852/**
1853 * Performs some sanity checks on the GMM while owning the lock.
1854 *
1855 * @returns Error count.
1856 *
1857 * @param pGMM Pointer to the GMM instance.
1858 * @param pszFunction The function from which it is called.
1859 * @param uLineNo The line number.
1860 */
1861static uint32_t gmmR0SanityCheck(PGMM pGMM, const char *pszFunction, unsigned uLineNo)
1862{
1863 uint32_t cErrors = 0;
1864
1865 cErrors += gmmR0SanityCheckSet(pGMM, &pGMM->PrivateX, "private", pszFunction, uLineNo);
1866 cErrors += gmmR0SanityCheckSet(pGMM, &pGMM->Shared, "shared", pszFunction, uLineNo);
1867 /** @todo add more sanity checks. */
1868
1869 return cErrors;
1870}
1871
1872#endif /* GMMR0_WITH_SANITY_CHECK */
1873
1874/**
1875 * Looks up a chunk in the tree and fills in the TLB entry for it.
1876 *
1877 * This is not expected to fail and will bitch if it does.
1878 *
1879 * @returns Pointer to the allocation chunk, NULL if not found.
1880 * @param pGMM Pointer to the GMM instance.
1881 * @param idChunk The ID of the chunk to find.
1882 * @param pTlbe Pointer to the TLB entry.
1883 *
1884 * @note Caller owns spinlock.
1885 */
1886static PGMMCHUNK gmmR0GetChunkSlow(PGMM pGMM, uint32_t idChunk, PGMMCHUNKTLBE pTlbe)
1887{
1888 PGMMCHUNK pChunk = (PGMMCHUNK)RTAvlU32Get(&pGMM->pChunks, idChunk);
1889 AssertMsgReturn(pChunk, ("Chunk %#x not found!\n", idChunk), NULL);
1890 pTlbe->idChunk = idChunk;
1891 pTlbe->pChunk = pChunk;
1892 return pChunk;
1893}
1894
1895
1896/**
1897 * Finds an allocation chunk, spin-locked.
1898 *
1899 * This is not expected to fail and will bitch if it does.
1900 *
1901 * @returns Pointer to the allocation chunk, NULL if not found.
1902 * @param pGMM Pointer to the GMM instance.
1903 * @param idChunk The ID of the chunk to find.
1904 */
1905DECLINLINE(PGMMCHUNK) gmmR0GetChunkLocked(PGMM pGMM, uint32_t idChunk)
1906{
1907 /*
1908 * Do a TLB lookup, branch if not in the TLB.
1909 */
1910 PGMMCHUNKTLBE pTlbe = &pGMM->ChunkTLB.aEntries[GMM_CHUNKTLB_IDX(idChunk)];
1911 PGMMCHUNK pChunk = pTlbe->pChunk;
1912 if ( pChunk == NULL
1913 || pTlbe->idChunk != idChunk)
1914 pChunk = gmmR0GetChunkSlow(pGMM, idChunk, pTlbe);
1915 return pChunk;
1916}
1917
1918
1919/**
1920 * Finds an allocation chunk.
1921 *
1922 * This is not expected to fail and will bitch if it does.
1923 *
1924 * @returns Pointer to the allocation chunk, NULL if not found.
1925 * @param pGMM Pointer to the GMM instance.
1926 * @param idChunk The ID of the chunk to find.
1927 */
1928DECLINLINE(PGMMCHUNK) gmmR0GetChunk(PGMM pGMM, uint32_t idChunk)
1929{
1930 RTSpinlockAcquire(pGMM->hSpinLockTree);
1931 PGMMCHUNK pChunk = gmmR0GetChunkLocked(pGMM, idChunk);
1932 RTSpinlockRelease(pGMM->hSpinLockTree);
1933 return pChunk;
1934}
1935
1936
1937/**
1938 * Finds a page.
1939 *
1940 * This is not expected to fail and will bitch if it does.
1941 *
1942 * @returns Pointer to the page, NULL if not found.
1943 * @param pGMM Pointer to the GMM instance.
1944 * @param idPage The ID of the page to find.
1945 */
1946DECLINLINE(PGMMPAGE) gmmR0GetPage(PGMM pGMM, uint32_t idPage)
1947{
1948 PGMMCHUNK pChunk = gmmR0GetChunk(pGMM, idPage >> GMM_CHUNKID_SHIFT);
1949 if (RT_LIKELY(pChunk))
1950 return &pChunk->aPages[idPage & GMM_PAGEID_IDX_MASK];
1951 return NULL;
1952}
1953
1954
1955#if 0 /* unused */
1956/**
1957 * Gets the host physical address for a page given by its ID.
1958 *
1959 * @returns The host physical address or NIL_RTHCPHYS.
1960 * @param pGMM Pointer to the GMM instance.
1961 * @param idPage The ID of the page to find.
1962 */
1963DECLINLINE(RTHCPHYS) gmmR0GetPageHCPhys(PGMM pGMM, uint32_t idPage)
1964{
1965 PGMMCHUNK pChunk = gmmR0GetChunk(pGMM, idPage >> GMM_CHUNKID_SHIFT);
1966 if (RT_LIKELY(pChunk))
1967 return RTR0MemObjGetPagePhysAddr(pChunk->hMemObj, idPage & GMM_PAGEID_IDX_MASK);
1968 return NIL_RTHCPHYS;
1969}
1970#endif /* unused */
1971
1972
1973/**
1974 * Selects the appropriate free list given the number of free pages.
1975 *
1976 * @returns Free list index.
1977 * @param cFree The number of free pages in the chunk.
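 *
 * @note  Chunks are binned by how many free pages they contain, in buckets of
 *        1 << GMM_CHUNK_FREE_SET_SHIFT pages, which lets the allocation
 *        strategies further down quickly find nearly exhausted or completely
 *        unused chunks.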
1978 */
1979DECLINLINE(unsigned) gmmR0SelectFreeSetList(unsigned cFree)
1980{
1981 unsigned iList = cFree >> GMM_CHUNK_FREE_SET_SHIFT;
1982 AssertMsg(iList < RT_SIZEOFMEMB(GMMCHUNKFREESET, apLists) / RT_SIZEOFMEMB(GMMCHUNKFREESET, apLists[0]),
1983 ("%d (%u)\n", iList, cFree));
1984 return iList;
1985}
1986
1987
1988/**
1989 * Unlinks the chunk from the free list it's currently on (if any).
1990 *
1991 * @param pChunk The allocation chunk.
1992 */
1993DECLINLINE(void) gmmR0UnlinkChunk(PGMMCHUNK pChunk)
1994{
1995 PGMMCHUNKFREESET pSet = pChunk->pSet;
1996 if (RT_LIKELY(pSet))
1997 {
1998 pSet->cFreePages -= pChunk->cFree;
1999 pSet->idGeneration++;
2000
2001 PGMMCHUNK pPrev = pChunk->pFreePrev;
2002 PGMMCHUNK pNext = pChunk->pFreeNext;
2003 if (pPrev)
2004 pPrev->pFreeNext = pNext;
2005 else
2006 pSet->apLists[gmmR0SelectFreeSetList(pChunk->cFree)] = pNext;
2007 if (pNext)
2008 pNext->pFreePrev = pPrev;
2009
2010 pChunk->pSet = NULL;
2011 pChunk->pFreeNext = NULL;
2012 pChunk->pFreePrev = NULL;
2013 }
2014 else
2015 {
2016 Assert(!pChunk->pFreeNext);
2017 Assert(!pChunk->pFreePrev);
2018 Assert(!pChunk->cFree);
2019 }
2020}
2021
2022
2023/**
2024 * Links the chunk onto the appropriate free list in the specified free set.
2025 *
2026 * If no free entries, it's not linked into any list.
2027 *
2028 * @param pChunk The allocation chunk.
2029 * @param pSet The free set.
2030 */
2031DECLINLINE(void) gmmR0LinkChunk(PGMMCHUNK pChunk, PGMMCHUNKFREESET pSet)
2032{
2033 Assert(!pChunk->pSet);
2034 Assert(!pChunk->pFreeNext);
2035 Assert(!pChunk->pFreePrev);
2036
2037 if (pChunk->cFree > 0)
2038 {
2039 pChunk->pSet = pSet;
2040 pChunk->pFreePrev = NULL;
2041 unsigned const iList = gmmR0SelectFreeSetList(pChunk->cFree);
2042 pChunk->pFreeNext = pSet->apLists[iList];
2043 if (pChunk->pFreeNext)
2044 pChunk->pFreeNext->pFreePrev = pChunk;
2045 pSet->apLists[iList] = pChunk;
2046
2047 pSet->cFreePages += pChunk->cFree;
2048 pSet->idGeneration++;
2049 }
2050}
2051
2052
2053/**
2054 * Selects the appropriate free set for the chunk and links it onto that set's free list.
2055 *
2056 * If no free entries, it's not linked into any list.
2057 *
2058 * @param pGMM Pointer to the GMM instance.
2059 * @param   pGVM        Pointer to the kernel-only VM instance data.
2060 * @param pChunk The allocation chunk.
2061 */
2062DECLINLINE(void) gmmR0SelectSetAndLinkChunk(PGMM pGMM, PGVM pGVM, PGMMCHUNK pChunk)
2063{
2064 PGMMCHUNKFREESET pSet;
2065 if (pGMM->fBoundMemoryMode)
2066 pSet = &pGVM->gmm.s.Private;
2067 else if (pChunk->cShared)
2068 pSet = &pGMM->Shared;
2069 else
2070 pSet = &pGMM->PrivateX;
2071 gmmR0LinkChunk(pChunk, pSet);
2072}
2073
2074
2075/**
2076 * Frees a Chunk ID.
2077 *
2078 * @param pGMM Pointer to the GMM instance.
2079 * @param idChunk The Chunk ID to free.
2080 */
2081static void gmmR0FreeChunkId(PGMM pGMM, uint32_t idChunk)
2082{
2083 AssertReturnVoid(idChunk != NIL_GMM_CHUNKID);
2084 AssertMsg(ASMBitTest(&pGMM->bmChunkId[0], idChunk), ("%#x\n", idChunk));
2085 ASMAtomicBitClear(&pGMM->bmChunkId[0], idChunk);
2086}
2087
2088
2089/**
2090 * Allocates a new Chunk ID.
2091 *
2092 * @returns The Chunk ID.
2093 * @param pGMM Pointer to the GMM instance.
2094 */
2095static uint32_t gmmR0AllocateChunkId(PGMM pGMM)
2096{
2097 AssertCompile(!((GMM_CHUNKID_LAST + 1) & 31)); /* must be a multiple of 32 */
2098 AssertCompile(NIL_GMM_CHUNKID == 0);
2099
2100 /*
2101 * Try the next sequential one.
2102 */
2103 int32_t idChunk = ++pGMM->idChunkPrev;
2104#if 0 /** @todo enable this code */
2105 if ( idChunk <= GMM_CHUNKID_LAST
2106 && idChunk > NIL_GMM_CHUNKID
2107        && !ASMAtomicBitTestAndSet(&pGMM->bmChunkId[0], idChunk))
2108 return idChunk;
2109#endif
2110
2111 /*
2112 * Scan sequentially from the last one.
2113 */
2114 if ( (uint32_t)idChunk < GMM_CHUNKID_LAST
2115 && idChunk > NIL_GMM_CHUNKID)
2116 {
2117 idChunk = ASMBitNextClear(&pGMM->bmChunkId[0], GMM_CHUNKID_LAST + 1, idChunk - 1);
2118 if (idChunk > NIL_GMM_CHUNKID)
2119 {
2120 AssertMsgReturn(!ASMAtomicBitTestAndSet(&pGMM->bmChunkId[0], idChunk), ("%#x\n", idChunk), NIL_GMM_CHUNKID);
2121 return pGMM->idChunkPrev = idChunk;
2122 }
2123 }
2124
2125 /*
2126 * Ok, scan from the start.
2127 * We're not racing anyone, so there is no need to expect failures or have restart loops.
2128 */
2129 idChunk = ASMBitFirstClear(&pGMM->bmChunkId[0], GMM_CHUNKID_LAST + 1);
2130    AssertMsgReturn(idChunk > NIL_GMM_CHUNKID, ("%#x\n", idChunk), NIL_GMM_CHUNKID);
2131 AssertMsgReturn(!ASMAtomicBitTestAndSet(&pGMM->bmChunkId[0], idChunk), ("%#x\n", idChunk), NIL_GMM_CHUNKID);
2132
2133 return pGMM->idChunkPrev = idChunk;
2134}
2135
2136
2137/**
2138 * Allocates one private page.
2139 *
2140 * Worker for gmmR0AllocatePages.
2141 *
2142 * @param pChunk The chunk to allocate it from.
2143 * @param hGVM The GVM handle of the VM requesting memory.
2144 * @param pPageDesc The page descriptor.
2145 */
2146static void gmmR0AllocatePage(PGMMCHUNK pChunk, uint32_t hGVM, PGMMPAGEDESC pPageDesc)
2147{
2148 /* update the chunk stats. */
2149 if (pChunk->hGVM == NIL_GVM_HANDLE)
2150 pChunk->hGVM = hGVM;
2151 Assert(pChunk->cFree);
2152 pChunk->cFree--;
2153 pChunk->cPrivate++;
2154
2155 /* unlink the first free page. */
2156 const uint32_t iPage = pChunk->iFreeHead;
2157 AssertReleaseMsg(iPage < RT_ELEMENTS(pChunk->aPages), ("%d\n", iPage));
2158 PGMMPAGE pPage = &pChunk->aPages[iPage];
2159 Assert(GMM_PAGE_IS_FREE(pPage));
2160 pChunk->iFreeHead = pPage->Free.iNext;
2161 Log3(("A pPage=%p iPage=%#x/%#x u2State=%d iFreeHead=%#x iNext=%#x\n",
2162 pPage, iPage, (pChunk->Core.Key << GMM_CHUNKID_SHIFT) | iPage,
2163 pPage->Common.u2State, pChunk->iFreeHead, pPage->Free.iNext));
2164
2165 /* make the page private. */
2166 pPage->u = 0;
2167 AssertCompile(GMM_PAGE_STATE_PRIVATE == 0);
2168 pPage->Private.hGVM = hGVM;
2169 AssertCompile(NIL_RTHCPHYS >= GMM_GCPHYS_LAST);
2170 AssertCompile(GMM_GCPHYS_UNSHAREABLE >= GMM_GCPHYS_LAST);
2171 if (pPageDesc->HCPhysGCPhys <= GMM_GCPHYS_LAST)
2172 pPage->Private.pfn = pPageDesc->HCPhysGCPhys >> PAGE_SHIFT;
2173 else
2174 pPage->Private.pfn = GMM_PAGE_PFN_UNSHAREABLE; /* unshareable / unassigned - same thing. */
2175
2176 /* update the page descriptor. */
2177 pPageDesc->HCPhysGCPhys = RTR0MemObjGetPagePhysAddr(pChunk->hMemObj, iPage);
2178 Assert(pPageDesc->HCPhysGCPhys != NIL_RTHCPHYS);
2179 pPageDesc->idPage = (pChunk->Core.Key << GMM_CHUNKID_SHIFT) | iPage;
2180 pPageDesc->idSharedPage = NIL_GMM_PAGEID;
2181}
2182
2183
2184/**
2185 * Picks the free pages from a chunk.
2186 *
2187 * @returns The new page descriptor table index.
2188 * @param pChunk The chunk.
2189 * @param hGVM The affinity of the chunk. NIL_GVM_HANDLE for no
2190 * affinity.
2191 * @param iPage The current page descriptor table index.
2192 * @param cPages The total number of pages to allocate.
2193 * @param   paPages     The page descriptor table (input + output).
2194 */
2195static uint32_t gmmR0AllocatePagesFromChunk(PGMMCHUNK pChunk, uint16_t const hGVM, uint32_t iPage, uint32_t cPages,
2196 PGMMPAGEDESC paPages)
2197{
2198 PGMMCHUNKFREESET pSet = pChunk->pSet; Assert(pSet);
2199 gmmR0UnlinkChunk(pChunk);
2200
2201 for (; pChunk->cFree && iPage < cPages; iPage++)
2202 gmmR0AllocatePage(pChunk, hGVM, &paPages[iPage]);
2203
2204 gmmR0LinkChunk(pChunk, pSet);
2205 return iPage;
2206}
2207
2208
2209/**
2210 * Registers a new chunk of memory.
2211 *
2212 * This is called by both gmmR0AllocateOneChunk and GMMR0SeedChunk.
2213 *
2214 * @returns VBox status code. On success, the giant GMM lock will be held, the
2215 * caller must release it (ugly).
2216 * @param pGMM Pointer to the GMM instance.
2217 * @param pSet Pointer to the set.
2218 * @param hMemObj The memory object for the chunk.
2219 * @param hGVM The affinity of the chunk. NIL_GVM_HANDLE for no
2220 * affinity.
2221 * @param fChunkFlags The chunk flags, GMM_CHUNK_FLAGS_XXX.
2222 * @param ppChunk Chunk address (out). Optional.
2223 *
2224 * @remarks The caller must not own the giant GMM mutex.
2225 * The giant GMM mutex will be acquired and returned acquired in
2226 * the success path. On failure, no locks will be held.
2227 */
2228static int gmmR0RegisterChunk(PGMM pGMM, PGMMCHUNKFREESET pSet, RTR0MEMOBJ hMemObj, uint16_t hGVM, uint16_t fChunkFlags,
2229 PGMMCHUNK *ppChunk)
2230{
2231 Assert(pGMM->hMtxOwner != RTThreadNativeSelf());
2232 Assert(hGVM != NIL_GVM_HANDLE || pGMM->fBoundMemoryMode);
2233#ifdef GMM_WITH_LEGACY_MODE
2234 Assert(fChunkFlags == 0 || fChunkFlags == GMM_CHUNK_FLAGS_LARGE_PAGE || fChunkFlags == GMM_CHUNK_FLAGS_SEEDED);
2235#else
2236 Assert(fChunkFlags == 0 || fChunkFlags == GMM_CHUNK_FLAGS_LARGE_PAGE);
2237#endif
2238
2239#ifndef VBOX_WITH_LINEAR_HOST_PHYS_MEM
2240 /*
2241 * Get a ring-0 mapping of the object.
2242 */
2243# ifdef GMM_WITH_LEGACY_MODE
2244 uint8_t *pbMapping = !(fChunkFlags & GMM_CHUNK_FLAGS_SEEDED) ? (uint8_t *)RTR0MemObjAddress(hMemObj) : NULL;
2245# else
2246 uint8_t *pbMapping = (uint8_t *)RTR0MemObjAddress(hMemObj);
2247# endif
2248 if (!pbMapping)
2249 {
2250 RTR0MEMOBJ hMapObj;
2251 int rc = RTR0MemObjMapKernel(&hMapObj, hMemObj, (void *)-1, 0, RTMEM_PROT_READ | RTMEM_PROT_WRITE);
2252 if (RT_SUCCESS(rc))
2253 pbMapping = (uint8_t *)RTR0MemObjAddress(hMapObj);
2254 else
2255 return rc;
2256 AssertPtr(pbMapping);
2257 }
2258#endif
2259
2260 /*
2261 * Allocate a chunk.
2262 */
2263 int rc;
2264 PGMMCHUNK pChunk = (PGMMCHUNK)RTMemAllocZ(sizeof(*pChunk));
2265 if (pChunk)
2266 {
2267 /*
2268 * Initialize it.
2269 */
2270 pChunk->hMemObj = hMemObj;
2271#ifndef VBOX_WITH_LINEAR_HOST_PHYS_MEM
2272 pChunk->pbMapping = pbMapping;
2273#endif
2274 pChunk->cFree = GMM_CHUNK_NUM_PAGES;
2275 pChunk->hGVM = hGVM;
2276 /*pChunk->iFreeHead = 0;*/
2277 pChunk->idNumaNode = gmmR0GetCurrentNumaNodeId();
2278 pChunk->iChunkMtx = UINT8_MAX;
2279 pChunk->fFlags = fChunkFlags;
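        /* Chain all pages into the chunk's free list: page 0 -> 1 -> ... -> last,
           terminated by UINT16_MAX (iFreeHead is already 0 thanks to RTMemAllocZ). */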
2280 for (unsigned iPage = 0; iPage < RT_ELEMENTS(pChunk->aPages) - 1; iPage++)
2281 {
2282 pChunk->aPages[iPage].Free.u2State = GMM_PAGE_STATE_FREE;
2283 pChunk->aPages[iPage].Free.iNext = iPage + 1;
2284 }
2285 pChunk->aPages[RT_ELEMENTS(pChunk->aPages) - 1].Free.u2State = GMM_PAGE_STATE_FREE;
2286 pChunk->aPages[RT_ELEMENTS(pChunk->aPages) - 1].Free.iNext = UINT16_MAX;
2287
2288 /*
2289 * Allocate a Chunk ID and insert it into the tree.
2290 * This has to be done behind the mutex of course.
2291 */
2292 rc = gmmR0MutexAcquire(pGMM);
2293 if (RT_SUCCESS(rc))
2294 {
2295 if (GMM_CHECK_SANITY_UPON_ENTERING(pGMM))
2296 {
2297 pChunk->Core.Key = gmmR0AllocateChunkId(pGMM);
2298 if ( pChunk->Core.Key != NIL_GMM_CHUNKID
2299 && pChunk->Core.Key <= GMM_CHUNKID_LAST)
2300 {
2301 RTSpinlockAcquire(pGMM->hSpinLockTree);
2302 if (RTAvlU32Insert(&pGMM->pChunks, &pChunk->Core))
2303 {
2304 pGMM->cChunks++;
2305 RTListAppend(&pGMM->ChunkList, &pChunk->ListNode);
2306 RTSpinlockRelease(pGMM->hSpinLockTree);
2307
2308 gmmR0LinkChunk(pChunk, pSet);
2309
2310 LogFlow(("gmmR0RegisterChunk: pChunk=%p id=%#x cChunks=%d\n", pChunk, pChunk->Core.Key, pGMM->cChunks));
2311
2312 if (ppChunk)
2313 *ppChunk = pChunk;
2314 GMM_CHECK_SANITY_UPON_LEAVING(pGMM);
2315 return VINF_SUCCESS;
2316 }
2317 RTSpinlockRelease(pGMM->hSpinLockTree);
2318 }
2319
2320 /* bail out */
2321 rc = VERR_GMM_CHUNK_INSERT;
2322 }
2323 else
2324 rc = VERR_GMM_IS_NOT_SANE;
2325 gmmR0MutexRelease(pGMM);
2326 }
2327
2328 RTMemFree(pChunk);
2329 }
2330 else
2331 rc = VERR_NO_MEMORY;
2332 return rc;
2333}
2334
2335
2336/**
2337 * Allocates a new chunk, immediately picks the requested pages from it, and adds
2338 * what's remaining to the specified free set.
2339 *
2340 * @note This will leave the giant mutex while allocating the new chunk!
2341 *
2342 * @returns VBox status code.
2343 * @param pGMM Pointer to the GMM instance data.
2344 * @param   pGVM        Pointer to the kernel-only VM instance data.
2345 * @param pSet Pointer to the free set.
2346 * @param cPages The number of pages requested.
2347 * @param paPages The page descriptor table (input + output).
2348 * @param piPage The pointer to the page descriptor table index variable.
2349 * This will be updated.
2350 */
2351static int gmmR0AllocateChunkNew(PGMM pGMM, PGVM pGVM, PGMMCHUNKFREESET pSet, uint32_t cPages,
2352 PGMMPAGEDESC paPages, uint32_t *piPage)
2353{
2354 gmmR0MutexRelease(pGMM);
2355
2356 RTR0MEMOBJ hMemObj;
2357#ifndef GMM_WITH_LEGACY_MODE
2358 int rc;
2359# ifdef VBOX_WITH_LINEAR_HOST_PHYS_MEM
2360 if (pGMM->fHasWorkingAllocPhysNC)
2361 rc = RTR0MemObjAllocPhysNC(&hMemObj, GMM_CHUNK_SIZE, NIL_RTHCPHYS);
2362 else
2363# endif
2364 rc = RTR0MemObjAllocPage(&hMemObj, GMM_CHUNK_SIZE, false /*fExecutable*/);
2365#else
2366 int rc = RTR0MemObjAllocPhysNC(&hMemObj, GMM_CHUNK_SIZE, NIL_RTHCPHYS);
2367#endif
2368 if (RT_SUCCESS(rc))
2369 {
2370 /** @todo Duplicate gmmR0RegisterChunk here so we can avoid chaining up the
2371 * free pages first and then unchaining them right afterwards. Instead
2372 * do as much work as possible without holding the giant lock. */
2373 PGMMCHUNK pChunk;
2374 rc = gmmR0RegisterChunk(pGMM, pSet, hMemObj, pGVM->hSelf, 0 /*fChunkFlags*/, &pChunk);
2375 if (RT_SUCCESS(rc))
2376 {
2377 *piPage = gmmR0AllocatePagesFromChunk(pChunk, pGVM->hSelf, *piPage, cPages, paPages);
2378 return VINF_SUCCESS;
2379 }
2380
2381 /* bail out */
2382 RTR0MemObjFree(hMemObj, true /* fFreeMappings */);
2383 }
2384
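    /* Reacquire the giant lock before returning, since the caller expects to hold
       it either way; if reacquisition fails, prefer reporting the original
       allocation error (if any) over the lock error. */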
2385 int rc2 = gmmR0MutexAcquire(pGMM);
2386 AssertRCReturn(rc2, RT_FAILURE(rc) ? rc : rc2);
2387 return rc;
2388
2389}
2390
2391
2392/**
2393 * As a last resort we'll pick any page we can get.
2394 *
2395 * @returns The new page descriptor table index.
2396 * @param pSet The set to pick from.
2397 * @param pGVM Pointer to the global VM structure.
2398 * @param iPage The current page descriptor table index.
2399 * @param cPages The total number of pages to allocate.
2400 * @param   paPages     The page descriptor table (input + output).
2401 */
2402static uint32_t gmmR0AllocatePagesIndiscriminately(PGMMCHUNKFREESET pSet, PGVM pGVM,
2403 uint32_t iPage, uint32_t cPages, PGMMPAGEDESC paPages)
2404{
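    /* Walk the free lists from the most-free end (including completely unused
       chunks) downwards, taking pages from every chunk we come across. */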
2405 unsigned iList = RT_ELEMENTS(pSet->apLists);
2406 while (iList-- > 0)
2407 {
2408 PGMMCHUNK pChunk = pSet->apLists[iList];
2409 while (pChunk)
2410 {
2411 PGMMCHUNK pNext = pChunk->pFreeNext;
2412
2413 iPage = gmmR0AllocatePagesFromChunk(pChunk, pGVM->hSelf, iPage, cPages, paPages);
2414 if (iPage >= cPages)
2415 return iPage;
2416
2417 pChunk = pNext;
2418 }
2419 }
2420 return iPage;
2421}
2422
2423
2424/**
2425 * Pick pages from empty chunks on the same NUMA node.
2426 *
2427 * @returns The new page descriptor table index.
2428 * @param pSet The set to pick from.
2429 * @param pGVM Pointer to the global VM structure.
2430 * @param iPage The current page descriptor table index.
2431 * @param cPages The total number of pages to allocate.
2432 * @param   paPages     The page descriptor table (input + output).
2433 */
2434static uint32_t gmmR0AllocatePagesFromEmptyChunksOnSameNode(PGMMCHUNKFREESET pSet, PGVM pGVM,
2435 uint32_t iPage, uint32_t cPages, PGMMPAGEDESC paPages)
2436{
2437 PGMMCHUNK pChunk = pSet->apLists[GMM_CHUNK_FREE_SET_UNUSED_LIST];
2438 if (pChunk)
2439 {
2440 uint16_t const idNumaNode = gmmR0GetCurrentNumaNodeId();
2441 while (pChunk)
2442 {
2443 PGMMCHUNK pNext = pChunk->pFreeNext;
2444
2445 if (pChunk->idNumaNode == idNumaNode)
2446 {
2447 pChunk->hGVM = pGVM->hSelf;
2448 iPage = gmmR0AllocatePagesFromChunk(pChunk, pGVM->hSelf, iPage, cPages, paPages);
2449 if (iPage >= cPages)
2450 {
2451 pGVM->gmm.s.idLastChunkHint = pChunk->cFree ? pChunk->Core.Key : NIL_GMM_CHUNKID;
2452 return iPage;
2453 }
2454 }
2455
2456 pChunk = pNext;
2457 }
2458 }
2459 return iPage;
2460}
2461
2462
2463/**
2464 * Pick pages from non-empty chunks on the same NUMA node.
2465 *
2466 * @returns The new page descriptor table index.
2467 * @param pSet The set to pick from.
2468 * @param pGVM Pointer to the global VM structure.
2469 * @param iPage The current page descriptor table index.
2470 * @param cPages The total number of pages to allocate.
2471 * @param   paPages     The page descriptor table (input + output).
2472 */
2473static uint32_t gmmR0AllocatePagesFromSameNode(PGMMCHUNKFREESET pSet, PGVM pGVM,
2474 uint32_t iPage, uint32_t cPages, PGMMPAGEDESC paPages)
2475{
2476 /** @todo start by picking from chunks with about the right size first? */
2477 uint16_t const idNumaNode = gmmR0GetCurrentNumaNodeId();
2478 unsigned iList = GMM_CHUNK_FREE_SET_UNUSED_LIST;
2479 while (iList-- > 0)
2480 {
2481 PGMMCHUNK pChunk = pSet->apLists[iList];
2482 while (pChunk)
2483 {
2484 PGMMCHUNK pNext = pChunk->pFreeNext;
2485
2486 if (pChunk->idNumaNode == idNumaNode)
2487 {
2488 iPage = gmmR0AllocatePagesFromChunk(pChunk, pGVM->hSelf, iPage, cPages, paPages);
2489 if (iPage >= cPages)
2490 {
2491 pGVM->gmm.s.idLastChunkHint = pChunk->cFree ? pChunk->Core.Key : NIL_GMM_CHUNKID;
2492 return iPage;
2493 }
2494 }
2495
2496 pChunk = pNext;
2497 }
2498 }
2499 return iPage;
2500}
2501
2502
2503/**
2504 * Pick pages that are in chunks already associated with the VM.
2505 *
2506 * @returns The new page descriptor table index.
2507 * @param pGMM Pointer to the GMM instance data.
2508 * @param pGVM Pointer to the global VM structure.
2509 * @param pSet The set to pick from.
2510 * @param iPage The current page descriptor table index.
2511 * @param cPages The total number of pages to allocate.
2512 * @param   paPages     The page descriptor table (input + output).
2513 */
2514static uint32_t gmmR0AllocatePagesAssociatedWithVM(PGMM pGMM, PGVM pGVM, PGMMCHUNKFREESET pSet,
2515 uint32_t iPage, uint32_t cPages, PGMMPAGEDESC paPages)
2516{
2517 uint16_t const hGVM = pGVM->hSelf;
2518
2519 /* Hint. */
2520 if (pGVM->gmm.s.idLastChunkHint != NIL_GMM_CHUNKID)
2521 {
2522 PGMMCHUNK pChunk = gmmR0GetChunk(pGMM, pGVM->gmm.s.idLastChunkHint);
2523 if (pChunk && pChunk->cFree)
2524 {
2525 iPage = gmmR0AllocatePagesFromChunk(pChunk, hGVM, iPage, cPages, paPages);
2526 if (iPage >= cPages)
2527 return iPage;
2528 }
2529 }
2530
2531 /* Scan. */
2532 for (unsigned iList = 0; iList < RT_ELEMENTS(pSet->apLists); iList++)
2533 {
2534 PGMMCHUNK pChunk = pSet->apLists[iList];
2535 while (pChunk)
2536 {
2537 PGMMCHUNK pNext = pChunk->pFreeNext;
2538
2539 if (pChunk->hGVM == hGVM)
2540 {
2541 iPage = gmmR0AllocatePagesFromChunk(pChunk, hGVM, iPage, cPages, paPages);
2542 if (iPage >= cPages)
2543 {
2544 pGVM->gmm.s.idLastChunkHint = pChunk->cFree ? pChunk->Core.Key : NIL_GMM_CHUNKID;
2545 return iPage;
2546 }
2547 }
2548
2549 pChunk = pNext;
2550 }
2551 }
2552 return iPage;
2553}
2554
2555
2556
2557/**
2558 * Pick pages in bound memory mode.
2559 *
2560 * @returns The new page descriptor table index.
2561 * @param pGVM Pointer to the global VM structure.
2562 * @param iPage The current page descriptor table index.
2563 * @param cPages The total number of pages to allocate.
2564 * @param   paPages     The page descriptor table (input + output).
2565 */
2566static uint32_t gmmR0AllocatePagesInBoundMode(PGVM pGVM, uint32_t iPage, uint32_t cPages, PGMMPAGEDESC paPages)
2567{
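    /* In bound memory mode each VM has its own private free set, so only this
       VM's own chunks are ever considered here. */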
2568 for (unsigned iList = 0; iList < RT_ELEMENTS(pGVM->gmm.s.Private.apLists); iList++)
2569 {
2570 PGMMCHUNK pChunk = pGVM->gmm.s.Private.apLists[iList];
2571 while (pChunk)
2572 {
2573 Assert(pChunk->hGVM == pGVM->hSelf);
2574 PGMMCHUNK pNext = pChunk->pFreeNext;
2575 iPage = gmmR0AllocatePagesFromChunk(pChunk, pGVM->hSelf, iPage, cPages, paPages);
2576 if (iPage >= cPages)
2577 return iPage;
2578 pChunk = pNext;
2579 }
2580 }
2581 return iPage;
2582}
2583
2584
2585/**
2586 * Checks if we should start picking pages from chunks of other VMs because
2587 * we're getting close to the system memory or reserved limit.
2588 *
2589 * @returns @c true if we should, @c false if we should first try to allocate more
2590 * chunks.
2591 */
2592static bool gmmR0ShouldAllocatePagesInOtherChunksBecauseOfLimits(PGVM pGVM)
2593{
2594 /*
2595     * Don't allocate a new chunk if we're almost out of reserved pages anyway.
2596 */
2597 uint64_t cPgReserved = pGVM->gmm.s.Stats.Reserved.cBasePages
2598 + pGVM->gmm.s.Stats.Reserved.cFixedPages
2599 - pGVM->gmm.s.Stats.cBalloonedPages
2600 /** @todo what about shared pages? */;
2601 uint64_t cPgAllocated = pGVM->gmm.s.Stats.Allocated.cBasePages
2602 + pGVM->gmm.s.Stats.Allocated.cFixedPages;
2603 uint64_t cPgDelta = cPgReserved - cPgAllocated;
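    /* The threshold below is four chunks worth of pages, 4 * GMM_CHUNK_NUM_PAGES,
       which is 2048 pages or 8 MB with the current 2 MB chunk and 4 KiB page size.
       With less headroom than that left of the reservation, prefer borrowing from
       other VMs' chunks over allocating brand new ones. */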
2604 if (cPgDelta < GMM_CHUNK_NUM_PAGES * 4)
2605 return true;
2606 /** @todo make the threshold configurable, also test the code to see if
2607     *  this ever kicks in (we might be reserving too much or something). */
2608
2609 /*
2610     * Check how close we are to the max memory limit and how many fragments
2611     * there are...
2612 */
2613 /** @todo */
2614
2615 return false;
2616}
2617
2618
2619/**
2620 * Checks if we should start picking pages from chunks of other VMs because
2621 * there are a lot of free pages around.
2622 *
2623 * @returns @c true if we should, @c false if we should first try to allocate more
2624 * chunks.
2625 */
2626static bool gmmR0ShouldAllocatePagesInOtherChunksBecauseOfLotsFree(PGMM pGMM)
2627{
2628 /*
2629 * Setting the limit at 16 chunks (32 MB) at the moment.
2630 */
2631 if (pGMM->PrivateX.cFreePages >= GMM_CHUNK_NUM_PAGES * 16)
2632 return true;
2633 return false;
2634}
2635
2636
2637/**
2638 * Common worker for GMMR0AllocateHandyPages and GMMR0AllocatePages.
2639 *
2640 * @returns VBox status code:
2641 * @retval VINF_SUCCESS on success.
2642 * @retval VERR_GMM_SEED_ME if seeding via GMMR0SeedChunk or
2643 * gmmR0AllocateMoreChunks is necessary.
2644 * @retval VERR_GMM_HIT_GLOBAL_LIMIT if we've exhausted the available pages.
2645 * @retval VERR_GMM_HIT_VM_ACCOUNT_LIMIT if we've hit the VM account limit,
2646 * that is we're trying to allocate more than we've reserved.
2647 *
2648 * @param pGMM Pointer to the GMM instance data.
2649 * @param pGVM Pointer to the VM.
2650 * @param cPages The number of pages to allocate.
2651 * @param paPages Pointer to the page descriptors. See GMMPAGEDESC for
2652 * details on what is expected on input.
2653 * @param enmAccount The account to charge.
2654 *
2655 * @remarks Caller owns the giant GMM lock.
2656 */
2657static int gmmR0AllocatePagesNew(PGMM pGMM, PGVM pGVM, uint32_t cPages, PGMMPAGEDESC paPages, GMMACCOUNT enmAccount)
2658{
2659 Assert(pGMM->hMtxOwner == RTThreadNativeSelf());
2660
2661 /*
2662 * Check allocation limits.
2663 */
2664 if (RT_UNLIKELY(pGMM->cAllocatedPages + cPages > pGMM->cMaxPages))
2665 return VERR_GMM_HIT_GLOBAL_LIMIT;
2666
2667 switch (enmAccount)
2668 {
2669 case GMMACCOUNT_BASE:
2670 if (RT_UNLIKELY( pGVM->gmm.s.Stats.Allocated.cBasePages + pGVM->gmm.s.Stats.cBalloonedPages + cPages
2671 > pGVM->gmm.s.Stats.Reserved.cBasePages))
2672 {
2673 Log(("gmmR0AllocatePages:Base: Reserved=%#llx Allocated+Ballooned+Requested=%#llx+%#llx+%#x!\n",
2674 pGVM->gmm.s.Stats.Reserved.cBasePages, pGVM->gmm.s.Stats.Allocated.cBasePages,
2675 pGVM->gmm.s.Stats.cBalloonedPages, cPages));
2676 return VERR_GMM_HIT_VM_ACCOUNT_LIMIT;
2677 }
2678 break;
2679 case GMMACCOUNT_SHADOW:
2680 if (RT_UNLIKELY(pGVM->gmm.s.Stats.Allocated.cShadowPages + cPages > pGVM->gmm.s.Stats.Reserved.cShadowPages))
2681 {
2682 Log(("gmmR0AllocatePages:Shadow: Reserved=%#x Allocated+Requested=%#x+%#x!\n",
2683 pGVM->gmm.s.Stats.Reserved.cShadowPages, pGVM->gmm.s.Stats.Allocated.cShadowPages, cPages));
2684 return VERR_GMM_HIT_VM_ACCOUNT_LIMIT;
2685 }
2686 break;
2687 case GMMACCOUNT_FIXED:
2688 if (RT_UNLIKELY(pGVM->gmm.s.Stats.Allocated.cFixedPages + cPages > pGVM->gmm.s.Stats.Reserved.cFixedPages))
2689 {
2690 Log(("gmmR0AllocatePages:Fixed: Reserved=%#x Allocated+Requested=%#x+%#x!\n",
2691 pGVM->gmm.s.Stats.Reserved.cFixedPages, pGVM->gmm.s.Stats.Allocated.cFixedPages, cPages));
2692 return VERR_GMM_HIT_VM_ACCOUNT_LIMIT;
2693 }
2694 break;
2695 default:
2696 AssertMsgFailedReturn(("enmAccount=%d\n", enmAccount), VERR_IPE_NOT_REACHED_DEFAULT_CASE);
2697 }
2698
2699#ifdef GMM_WITH_LEGACY_MODE
2700 /*
2701     * If we're in legacy memory mode, it's easy to figure out if we have
2702     * a sufficient number of pages up-front.
2703 */
2704 if ( pGMM->fLegacyAllocationMode
2705 && pGVM->gmm.s.Private.cFreePages < cPages)
2706 {
2707 Assert(pGMM->fBoundMemoryMode);
2708 return VERR_GMM_SEED_ME;
2709 }
2710#endif
2711
2712 /*
2713 * Update the accounts before we proceed because we might be leaving the
2714 * protection of the global mutex and thus run the risk of permitting
2715 * too much memory to be allocated.
2716 */
2717 switch (enmAccount)
2718 {
2719 case GMMACCOUNT_BASE: pGVM->gmm.s.Stats.Allocated.cBasePages += cPages; break;
2720 case GMMACCOUNT_SHADOW: pGVM->gmm.s.Stats.Allocated.cShadowPages += cPages; break;
2721 case GMMACCOUNT_FIXED: pGVM->gmm.s.Stats.Allocated.cFixedPages += cPages; break;
2722 default: AssertMsgFailedReturn(("enmAccount=%d\n", enmAccount), VERR_IPE_NOT_REACHED_DEFAULT_CASE);
2723 }
2724 pGVM->gmm.s.Stats.cPrivatePages += cPages;
2725 pGMM->cAllocatedPages += cPages;
2726
2727#ifdef GMM_WITH_LEGACY_MODE
2728 /*
2729 * Part two of it's-easy-in-legacy-memory-mode.
2730 */
2731 if (pGMM->fLegacyAllocationMode)
2732 {
2733 uint32_t iPage = gmmR0AllocatePagesInBoundMode(pGVM, 0, cPages, paPages);
2734 AssertReleaseReturn(iPage == cPages, VERR_GMM_ALLOC_PAGES_IPE);
2735 return VINF_SUCCESS;
2736 }
2737#endif
2738
2739 /*
2740 * Bound mode is also relatively straightforward.
2741 */
2742 uint32_t iPage = 0;
2743 int rc = VINF_SUCCESS;
2744 if (pGMM->fBoundMemoryMode)
2745 {
2746 iPage = gmmR0AllocatePagesInBoundMode(pGVM, iPage, cPages, paPages);
2747 if (iPage < cPages)
2748 do
2749 rc = gmmR0AllocateChunkNew(pGMM, pGVM, &pGVM->gmm.s.Private, cPages, paPages, &iPage);
2750 while (iPage < cPages && RT_SUCCESS(rc));
2751 }
2752 /*
2753     * Shared mode is trickier as we should try to achieve the same locality as
2754 * in bound mode, but smartly make use of non-full chunks allocated by
2755 * other VMs if we're low on memory.
2756 */
2757 else
2758 {
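        /* The fallback order below is roughly:
             1. chunks already associated with this VM (best locality),
             2. when close to the reservation limit: used chunks on the same NUMA node,
             3. empty private chunks on the same NUMA node,
             4. empty shared chunks on the same NUMA node,
             5. when plenty of pages are free overall: same-node chunks, then any private chunk,
             6. otherwise new chunks, taking whatever is left anywhere if the host runs dry. */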
2759 /* Pick the most optimal pages first. */
2760 iPage = gmmR0AllocatePagesAssociatedWithVM(pGMM, pGVM, &pGMM->PrivateX, iPage, cPages, paPages);
2761 if (iPage < cPages)
2762 {
2763 /* Maybe we should try getting pages from chunks "belonging" to
2764 other VMs before allocating more chunks? */
2765 bool fTriedOnSameAlready = false;
2766 if (gmmR0ShouldAllocatePagesInOtherChunksBecauseOfLimits(pGVM))
2767 {
2768 iPage = gmmR0AllocatePagesFromSameNode(&pGMM->PrivateX, pGVM, iPage, cPages, paPages);
2769 fTriedOnSameAlready = true;
2770 }
2771
2772 /* Allocate memory from empty chunks. */
2773 if (iPage < cPages)
2774 iPage = gmmR0AllocatePagesFromEmptyChunksOnSameNode(&pGMM->PrivateX, pGVM, iPage, cPages, paPages);
2775
2776 /* Grab empty shared chunks. */
2777 if (iPage < cPages)
2778 iPage = gmmR0AllocatePagesFromEmptyChunksOnSameNode(&pGMM->Shared, pGVM, iPage, cPages, paPages);
2779
2780            /* If there are a lot of free pages spread around, try not to waste
2781 system memory on more chunks. (Should trigger defragmentation.) */
2782 if ( !fTriedOnSameAlready
2783 && gmmR0ShouldAllocatePagesInOtherChunksBecauseOfLotsFree(pGMM))
2784 {
2785 iPage = gmmR0AllocatePagesFromSameNode(&pGMM->PrivateX, pGVM, iPage, cPages, paPages);
2786 if (iPage < cPages)
2787 iPage = gmmR0AllocatePagesIndiscriminately(&pGMM->PrivateX, pGVM, iPage, cPages, paPages);
2788 }
2789
2790 /*
2791 * Ok, try allocate new chunks.
2792 */
2793 if (iPage < cPages)
2794 {
2795 do
2796 rc = gmmR0AllocateChunkNew(pGMM, pGVM, &pGMM->PrivateX, cPages, paPages, &iPage);
2797 while (iPage < cPages && RT_SUCCESS(rc));
2798
2799 /* If the host is out of memory, take whatever we can get. */
2800 if ( (rc == VERR_NO_MEMORY || rc == VERR_NO_PHYS_MEMORY)
2801 && pGMM->PrivateX.cFreePages + pGMM->Shared.cFreePages >= cPages - iPage)
2802 {
2803 iPage = gmmR0AllocatePagesIndiscriminately(&pGMM->PrivateX, pGVM, iPage, cPages, paPages);
2804 if (iPage < cPages)
2805 iPage = gmmR0AllocatePagesIndiscriminately(&pGMM->Shared, pGVM, iPage, cPages, paPages);
2806 AssertRelease(iPage == cPages);
2807 rc = VINF_SUCCESS;
2808 }
2809 }
2810 }
2811 }
2812
2813 /*
2814 * Clean up on failure. Since this is bound to be a low-memory condition
2815 * we will give back any empty chunks that might be hanging around.
2816 */
2817 if (RT_FAILURE(rc))
2818 {
2819 /* Update the statistics. */
2820 pGVM->gmm.s.Stats.cPrivatePages -= cPages;
2821 pGMM->cAllocatedPages -= cPages - iPage;
2822 switch (enmAccount)
2823 {
2824 case GMMACCOUNT_BASE: pGVM->gmm.s.Stats.Allocated.cBasePages -= cPages; break;
2825 case GMMACCOUNT_SHADOW: pGVM->gmm.s.Stats.Allocated.cShadowPages -= cPages; break;
2826 case GMMACCOUNT_FIXED: pGVM->gmm.s.Stats.Allocated.cFixedPages -= cPages; break;
2827 default: AssertMsgFailedReturn(("enmAccount=%d\n", enmAccount), VERR_IPE_NOT_REACHED_DEFAULT_CASE);
2828 }
2829
2830 /* Release the pages. */
2831 while (iPage-- > 0)
2832 {
2833 uint32_t idPage = paPages[iPage].idPage;
2834 PGMMPAGE pPage = gmmR0GetPage(pGMM, idPage);
2835 if (RT_LIKELY(pPage))
2836 {
2837 Assert(GMM_PAGE_IS_PRIVATE(pPage));
2838 Assert(pPage->Private.hGVM == pGVM->hSelf);
2839 gmmR0FreePrivatePage(pGMM, pGVM, idPage, pPage);
2840 }
2841 else
2842 AssertMsgFailed(("idPage=%#x\n", idPage));
2843
2844 paPages[iPage].idPage = NIL_GMM_PAGEID;
2845 paPages[iPage].idSharedPage = NIL_GMM_PAGEID;
2846 paPages[iPage].HCPhysGCPhys = NIL_RTHCPHYS;
2847 }
2848
2849 /* Free empty chunks. */
2850 /** @todo */
2851
2852 /* return the fail status on failure */
2853 return rc;
2854 }
2855 return VINF_SUCCESS;
2856}
2857
2858
2859/**
2860 * Updates the previous allocations and allocates more pages.
2861 *
2862 * The handy pages are always taken from the 'base' memory account.
2863 * The allocated pages are not cleared and will contain random garbage.
2864 *
2865 * @returns VBox status code:
2866 * @retval VINF_SUCCESS on success.
2867 * @retval VERR_NOT_OWNER if the caller is not an EMT.
2868 * @retval VERR_GMM_PAGE_NOT_FOUND if one of the pages to update wasn't found.
2869 * @retval VERR_GMM_PAGE_NOT_PRIVATE if one of the pages to update wasn't a
2870 * private page.
2871 * @retval VERR_GMM_PAGE_NOT_SHARED if one of the pages to update wasn't a
2872 * shared page.
2873 * @retval VERR_GMM_NOT_PAGE_OWNER if one of the pages to be updated wasn't
2874 * owned by the VM.
2875 * @retval VERR_GMM_SEED_ME if seeding via GMMR0SeedChunk is necessary.
2876 * @retval VERR_GMM_HIT_GLOBAL_LIMIT if we've exhausted the available pages.
2877 * @retval VERR_GMM_HIT_VM_ACCOUNT_LIMIT if we've hit the VM account limit,
2878 * that is we're trying to allocate more than we've reserved.
2879 *
2880 * @param pGVM The global (ring-0) VM structure.
2881 * @param idCpu The VCPU id.
2882 * @param cPagesToUpdate The number of pages to update (starting from the head).
2883 * @param cPagesToAlloc The number of pages to allocate (starting from the head).
2884 * @param paPages The array of page descriptors.
2885 * See GMMPAGEDESC for details on what is expected on input.
2886 * @thread EMT(idCpu)
2887 */
2888GMMR0DECL(int) GMMR0AllocateHandyPages(PGVM pGVM, VMCPUID idCpu, uint32_t cPagesToUpdate,
2889 uint32_t cPagesToAlloc, PGMMPAGEDESC paPages)
2890{
2891 LogFlow(("GMMR0AllocateHandyPages: pGVM=%p cPagesToUpdate=%#x cPagesToAlloc=%#x paPages=%p\n",
2892 pGVM, cPagesToUpdate, cPagesToAlloc, paPages));
2893
2894 /*
2895 * Validate, get basics and take the semaphore.
2896 * (This is a relatively busy path, so make predictions where possible.)
2897 */
2898 PGMM pGMM;
2899 GMM_GET_VALID_INSTANCE(pGMM, VERR_GMM_INSTANCE);
2900 int rc = GVMMR0ValidateGVMandEMT(pGVM, idCpu);
2901 if (RT_FAILURE(rc))
2902 return rc;
2903
2904 AssertPtrReturn(paPages, VERR_INVALID_PARAMETER);
2905 AssertMsgReturn( (cPagesToUpdate && cPagesToUpdate < 1024)
2906 || (cPagesToAlloc && cPagesToAlloc < 1024),
2907 ("cPagesToUpdate=%#x cPagesToAlloc=%#x\n", cPagesToUpdate, cPagesToAlloc),
2908 VERR_INVALID_PARAMETER);
2909
2910 unsigned iPage = 0;
2911 for (; iPage < cPagesToUpdate; iPage++)
2912 {
2913 AssertMsgReturn( ( paPages[iPage].HCPhysGCPhys <= GMM_GCPHYS_LAST
2914 && !(paPages[iPage].HCPhysGCPhys & PAGE_OFFSET_MASK))
2915 || paPages[iPage].HCPhysGCPhys == NIL_RTHCPHYS
2916 || paPages[iPage].HCPhysGCPhys == GMM_GCPHYS_UNSHAREABLE,
2917 ("#%#x: %RHp\n", iPage, paPages[iPage].HCPhysGCPhys),
2918 VERR_INVALID_PARAMETER);
2919 AssertMsgReturn( paPages[iPage].idPage <= GMM_PAGEID_LAST
2920 /*|| paPages[iPage].idPage == NIL_GMM_PAGEID*/,
2921 ("#%#x: %#x\n", iPage, paPages[iPage].idPage), VERR_INVALID_PARAMETER);
2922        AssertMsgReturn(    paPages[iPage].idSharedPage <= GMM_PAGEID_LAST
2923 /*|| paPages[iPage].idSharedPage == NIL_GMM_PAGEID*/,
2924 ("#%#x: %#x\n", iPage, paPages[iPage].idSharedPage), VERR_INVALID_PARAMETER);
2925 }
2926
2927 for (; iPage < cPagesToAlloc; iPage++)
2928 {
2929 AssertMsgReturn(paPages[iPage].HCPhysGCPhys == NIL_RTHCPHYS, ("#%#x: %RHp\n", iPage, paPages[iPage].HCPhysGCPhys), VERR_INVALID_PARAMETER);
2930 AssertMsgReturn(paPages[iPage].idPage == NIL_GMM_PAGEID, ("#%#x: %#x\n", iPage, paPages[iPage].idPage), VERR_INVALID_PARAMETER);
2931 AssertMsgReturn(paPages[iPage].idSharedPage == NIL_GMM_PAGEID, ("#%#x: %#x\n", iPage, paPages[iPage].idSharedPage), VERR_INVALID_PARAMETER);
2932 }
2933
2934 gmmR0MutexAcquire(pGMM);
2935 if (GMM_CHECK_SANITY_UPON_ENTERING(pGMM))
2936 {
2937 /* No allocations before the initial reservation has been made! */
2938 if (RT_LIKELY( pGVM->gmm.s.Stats.Reserved.cBasePages
2939 && pGVM->gmm.s.Stats.Reserved.cFixedPages
2940 && pGVM->gmm.s.Stats.Reserved.cShadowPages))
2941 {
2942 /*
2943 * Perform the updates.
2944 * Stop on the first error.
2945 */
2946 for (iPage = 0; iPage < cPagesToUpdate; iPage++)
2947 {
2948 if (paPages[iPage].idPage != NIL_GMM_PAGEID)
2949 {
2950 PGMMPAGE pPage = gmmR0GetPage(pGMM, paPages[iPage].idPage);
2951 if (RT_LIKELY(pPage))
2952 {
2953 if (RT_LIKELY(GMM_PAGE_IS_PRIVATE(pPage)))
2954 {
2955 if (RT_LIKELY(pPage->Private.hGVM == pGVM->hSelf))
2956 {
2957 AssertCompile(NIL_RTHCPHYS > GMM_GCPHYS_LAST && GMM_GCPHYS_UNSHAREABLE > GMM_GCPHYS_LAST);
2958 if (RT_LIKELY(paPages[iPage].HCPhysGCPhys <= GMM_GCPHYS_LAST))
2959 pPage->Private.pfn = paPages[iPage].HCPhysGCPhys >> PAGE_SHIFT;
2960 else if (paPages[iPage].HCPhysGCPhys == GMM_GCPHYS_UNSHAREABLE)
2961 pPage->Private.pfn = GMM_PAGE_PFN_UNSHAREABLE;
2962 /* else: NIL_RTHCPHYS nothing */
2963
2964 paPages[iPage].idPage = NIL_GMM_PAGEID;
2965 paPages[iPage].HCPhysGCPhys = NIL_RTHCPHYS;
2966 }
2967 else
2968 {
2969 Log(("GMMR0AllocateHandyPages: #%#x/%#x: Not owner! hGVM=%#x hSelf=%#x\n",
2970 iPage, paPages[iPage].idPage, pPage->Private.hGVM, pGVM->hSelf));
2971 rc = VERR_GMM_NOT_PAGE_OWNER;
2972 break;
2973 }
2974 }
2975 else
2976 {
2977 Log(("GMMR0AllocateHandyPages: #%#x/%#x: Not private! %.*Rhxs (type %d)\n", iPage, paPages[iPage].idPage, sizeof(*pPage), pPage, pPage->Common.u2State));
2978 rc = VERR_GMM_PAGE_NOT_PRIVATE;
2979 break;
2980 }
2981 }
2982 else
2983 {
2984 Log(("GMMR0AllocateHandyPages: #%#x/%#x: Not found! (private)\n", iPage, paPages[iPage].idPage));
2985 rc = VERR_GMM_PAGE_NOT_FOUND;
2986 break;
2987 }
2988 }
2989
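                    /* This guest page was backed by a shared page that is now being
                       replaced; drop this VM's reference and free the shared page
                       once the last reference is gone. */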
2990 if (paPages[iPage].idSharedPage != NIL_GMM_PAGEID)
2991 {
2992 PGMMPAGE pPage = gmmR0GetPage(pGMM, paPages[iPage].idSharedPage);
2993 if (RT_LIKELY(pPage))
2994 {
2995 if (RT_LIKELY(GMM_PAGE_IS_SHARED(pPage)))
2996 {
2997 AssertCompile(NIL_RTHCPHYS > GMM_GCPHYS_LAST && GMM_GCPHYS_UNSHAREABLE > GMM_GCPHYS_LAST);
2998 Assert(pPage->Shared.cRefs);
2999 Assert(pGVM->gmm.s.Stats.cSharedPages);
3000 Assert(pGVM->gmm.s.Stats.Allocated.cBasePages);
3001
3002 Log(("GMMR0AllocateHandyPages: free shared page %x cRefs=%d\n", paPages[iPage].idSharedPage, pPage->Shared.cRefs));
3003 pGVM->gmm.s.Stats.cSharedPages--;
3004 pGVM->gmm.s.Stats.Allocated.cBasePages--;
3005 if (!--pPage->Shared.cRefs)
3006 gmmR0FreeSharedPage(pGMM, pGVM, paPages[iPage].idSharedPage, pPage);
3007 else
3008 {
3009 Assert(pGMM->cDuplicatePages);
3010 pGMM->cDuplicatePages--;
3011 }
3012
3013 paPages[iPage].idSharedPage = NIL_GMM_PAGEID;
3014 }
3015 else
3016 {
3017 Log(("GMMR0AllocateHandyPages: #%#x/%#x: Not shared!\n", iPage, paPages[iPage].idSharedPage));
3018 rc = VERR_GMM_PAGE_NOT_SHARED;
3019 break;
3020 }
3021 }
3022 else
3023 {
3024 Log(("GMMR0AllocateHandyPages: #%#x/%#x: Not found! (shared)\n", iPage, paPages[iPage].idSharedPage));
3025 rc = VERR_GMM_PAGE_NOT_FOUND;
3026 break;
3027 }
3028 }
3029 } /* for each page to update */
3030
3031 if (RT_SUCCESS(rc) && cPagesToAlloc > 0)
3032 {
3033#if defined(VBOX_STRICT) && 0 /** @todo re-test this later. Appeared to be a PGM init bug. */
3034 for (iPage = 0; iPage < cPagesToAlloc; iPage++)
3035 {
3036 Assert(paPages[iPage].HCPhysGCPhys == NIL_RTHCPHYS);
3037 Assert(paPages[iPage].idPage == NIL_GMM_PAGEID);
3038 Assert(paPages[iPage].idSharedPage == NIL_GMM_PAGEID);
3039 }
3040#endif
3041
3042 /*
3043 * Join paths with GMMR0AllocatePages for the allocation.
3044 * Note! gmmR0AllocateMoreChunks may leave the protection of the mutex!
3045 */
3046 rc = gmmR0AllocatePagesNew(pGMM, pGVM, cPagesToAlloc, paPages, GMMACCOUNT_BASE);
3047 }
3048 }
3049 else
3050 rc = VERR_WRONG_ORDER;
3051 GMM_CHECK_SANITY_UPON_LEAVING(pGMM);
3052 }
3053 else
3054 rc = VERR_GMM_IS_NOT_SANE;
3055 gmmR0MutexRelease(pGMM);
3056 LogFlow(("GMMR0AllocateHandyPages: returns %Rrc\n", rc));
3057 return rc;
3058}
3059
3060
3061/**
3062 * Allocate one or more pages.
3063 *
3064 * This is typically used for ROMs and MMIO2 (VRAM) during VM creation.
3065 * The allocated pages are not cleared and will contain random garbage.
3066 *
3067 * @returns VBox status code:
3068 * @retval VINF_SUCCESS on success.
3069 * @retval VERR_NOT_OWNER if the caller is not an EMT.
3070 * @retval VERR_GMM_SEED_ME if seeding via GMMR0SeedChunk is necessary.
3071 * @retval VERR_GMM_HIT_GLOBAL_LIMIT if we've exhausted the available pages.
3072 * @retval VERR_GMM_HIT_VM_ACCOUNT_LIMIT if we've hit the VM account limit,
3073 * that is we're trying to allocate more than we've reserved.
3074 *
3075 * @param pGVM The global (ring-0) VM structure.
3076 * @param idCpu The VCPU id.
3077 * @param cPages The number of pages to allocate.
3078 * @param paPages Pointer to the page descriptors.
3079 * See GMMPAGEDESC for details on what is expected on
3080 * input.
3081 * @param enmAccount The account to charge.
3082 *
3083 * @thread EMT.
3084 */
3085GMMR0DECL(int) GMMR0AllocatePages(PGVM pGVM, VMCPUID idCpu, uint32_t cPages, PGMMPAGEDESC paPages, GMMACCOUNT enmAccount)
3086{
3087 LogFlow(("GMMR0AllocatePages: pGVM=%p cPages=%#x paPages=%p enmAccount=%d\n", pGVM, cPages, paPages, enmAccount));
3088
3089 /*
3090 * Validate, get basics and take the semaphore.
3091 */
3092 PGMM pGMM;
3093 GMM_GET_VALID_INSTANCE(pGMM, VERR_GMM_INSTANCE);
3094 int rc = GVMMR0ValidateGVMandEMT(pGVM, idCpu);
3095 if (RT_FAILURE(rc))
3096 return rc;
3097
3098 AssertPtrReturn(paPages, VERR_INVALID_PARAMETER);
3099 AssertMsgReturn(enmAccount > GMMACCOUNT_INVALID && enmAccount < GMMACCOUNT_END, ("%d\n", enmAccount), VERR_INVALID_PARAMETER);
3100 AssertMsgReturn(cPages > 0 && cPages < RT_BIT(32 - PAGE_SHIFT), ("%#x\n", cPages), VERR_INVALID_PARAMETER);
3101
3102 for (unsigned iPage = 0; iPage < cPages; iPage++)
3103 {
3104 AssertMsgReturn( paPages[iPage].HCPhysGCPhys == NIL_RTHCPHYS
3105 || paPages[iPage].HCPhysGCPhys == GMM_GCPHYS_UNSHAREABLE
3106 || ( enmAccount == GMMACCOUNT_BASE
3107 && paPages[iPage].HCPhysGCPhys <= GMM_GCPHYS_LAST
3108 && !(paPages[iPage].HCPhysGCPhys & PAGE_OFFSET_MASK)),
3109 ("#%#x: %RHp enmAccount=%d\n", iPage, paPages[iPage].HCPhysGCPhys, enmAccount),
3110 VERR_INVALID_PARAMETER);
3111 AssertMsgReturn(paPages[iPage].idPage == NIL_GMM_PAGEID, ("#%#x: %#x\n", iPage, paPages[iPage].idPage), VERR_INVALID_PARAMETER);
3112 AssertMsgReturn(paPages[iPage].idSharedPage == NIL_GMM_PAGEID, ("#%#x: %#x\n", iPage, paPages[iPage].idSharedPage), VERR_INVALID_PARAMETER);
3113 }
3114
3115 gmmR0MutexAcquire(pGMM);
3116 if (GMM_CHECK_SANITY_UPON_ENTERING(pGMM))
3117 {
3118
3119 /* No allocations before the initial reservation has been made! */
3120 if (RT_LIKELY( pGVM->gmm.s.Stats.Reserved.cBasePages
3121 && pGVM->gmm.s.Stats.Reserved.cFixedPages
3122 && pGVM->gmm.s.Stats.Reserved.cShadowPages))
3123 rc = gmmR0AllocatePagesNew(pGMM, pGVM, cPages, paPages, enmAccount);
3124 else
3125 rc = VERR_WRONG_ORDER;
3126 GMM_CHECK_SANITY_UPON_LEAVING(pGMM);
3127 }
3128 else
3129 rc = VERR_GMM_IS_NOT_SANE;
3130 gmmR0MutexRelease(pGMM);
3131 LogFlow(("GMMR0AllocatePages: returns %Rrc\n", rc));
3132 return rc;
3133}
3134
3135
3136/**
3137 * VMMR0 request wrapper for GMMR0AllocatePages.
3138 *
3139 * @returns see GMMR0AllocatePages.
3140 * @param pGVM The global (ring-0) VM structure.
3141 * @param idCpu The VCPU id.
3142 * @param pReq Pointer to the request packet.
3143 */
3144GMMR0DECL(int) GMMR0AllocatePagesReq(PGVM pGVM, VMCPUID idCpu, PGMMALLOCATEPAGESREQ pReq)
3145{
3146 /*
3147 * Validate input and pass it on.
3148 */
3149 AssertPtrReturn(pReq, VERR_INVALID_POINTER);
3150 AssertMsgReturn(pReq->Hdr.cbReq >= RT_UOFFSETOF(GMMALLOCATEPAGESREQ, aPages[0]),
3151 ("%#x < %#x\n", pReq->Hdr.cbReq, RT_UOFFSETOF(GMMALLOCATEPAGESREQ, aPages[0])),
3152 VERR_INVALID_PARAMETER);
3153 AssertMsgReturn(pReq->Hdr.cbReq == RT_UOFFSETOF_DYN(GMMALLOCATEPAGESREQ, aPages[pReq->cPages]),
3154 ("%#x != %#x\n", pReq->Hdr.cbReq, RT_UOFFSETOF_DYN(GMMALLOCATEPAGESREQ, aPages[pReq->cPages])),
3155 VERR_INVALID_PARAMETER);
3156
3157 return GMMR0AllocatePages(pGVM, idCpu, pReq->cPages, &pReq->aPages[0], pReq->enmAccount);
3158}
3159
3160
3161/**
3162 * Allocate a large page to represent guest RAM.
3163 *
3164 * The allocated pages are not cleared and will contain random garbage.
3165 *
3166 * @returns VBox status code:
3167 * @retval VINF_SUCCESS on success.
3168 * @retval VERR_NOT_OWNER if the caller is not an EMT.
3169 * @retval VERR_GMM_SEED_ME if seeding via GMMR0SeedChunk is necessary.
3170 * @retval VERR_GMM_HIT_GLOBAL_LIMIT if we've exhausted the available pages.
3171 * @retval VERR_GMM_HIT_VM_ACCOUNT_LIMIT if we've hit the VM account limit,
3172 * that is we're trying to allocate more than we've reserved.
3173 * @returns see GMMR0AllocatePages.
3174 *
3175 * @param pGVM The global (ring-0) VM structure.
3176 * @param idCpu The VCPU id.
3177 * @param cbPage Large page size.
3178 * @param pIdPage Where to return the GMM page ID of the page.
3179 * @param pHCPhys Where to return the host physical address of the page.
3180 */
3181GMMR0DECL(int) GMMR0AllocateLargePage(PGVM pGVM, VMCPUID idCpu, uint32_t cbPage, uint32_t *pIdPage, RTHCPHYS *pHCPhys)
3182{
3183 LogFlow(("GMMR0AllocateLargePage: pGVM=%p cbPage=%x\n", pGVM, cbPage));
3184
3185 AssertReturn(cbPage == GMM_CHUNK_SIZE, VERR_INVALID_PARAMETER);
3186 AssertPtrReturn(pIdPage, VERR_INVALID_PARAMETER);
3187 AssertPtrReturn(pHCPhys, VERR_INVALID_PARAMETER);
3188
3189 /*
3190 * Validate, get basics and take the semaphore.
3191 */
3192 PGMM pGMM;
3193 GMM_GET_VALID_INSTANCE(pGMM, VERR_GMM_INSTANCE);
3194 int rc = GVMMR0ValidateGVMandEMT(pGVM, idCpu);
3195 if (RT_FAILURE(rc))
3196 return rc;
3197
3198#ifdef GMM_WITH_LEGACY_MODE
3199 // /* Not supported in legacy mode where we allocate the memory in ring 3 and lock it in ring 0. */
3200 // if (pGMM->fLegacyAllocationMode)
3201 // return VERR_NOT_SUPPORTED;
3202#endif
3203
3204 *pHCPhys = NIL_RTHCPHYS;
3205 *pIdPage = NIL_GMM_PAGEID;
3206
3207 gmmR0MutexAcquire(pGMM);
3208 if (GMM_CHECK_SANITY_UPON_ENTERING(pGMM))
3209 {
3210 const unsigned cPages = (GMM_CHUNK_SIZE >> PAGE_SHIFT);
3211 if (RT_UNLIKELY( pGVM->gmm.s.Stats.Allocated.cBasePages + pGVM->gmm.s.Stats.cBalloonedPages + cPages
3212 > pGVM->gmm.s.Stats.Reserved.cBasePages))
3213 {
3214 Log(("GMMR0AllocateLargePage: Reserved=%#llx Allocated+Requested=%#llx+%#x!\n",
3215 pGVM->gmm.s.Stats.Reserved.cBasePages, pGVM->gmm.s.Stats.Allocated.cBasePages, cPages));
3216 gmmR0MutexRelease(pGMM);
3217 return VERR_GMM_HIT_VM_ACCOUNT_LIMIT;
3218 }
3219
3220 /*
3221 * Allocate a new large page chunk.
3222 *
3223 * Note! We leave the giant GMM lock temporarily as the allocation might
3224 * take a long time. gmmR0RegisterChunk will retake it (ugly).
3225 */
3226 AssertCompile(GMM_CHUNK_SIZE == _2M);
3227 gmmR0MutexRelease(pGMM);
3228
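        /* Ask for a physically contiguous GMM_CHUNK_SIZE (2 MB) block aligned on a
           2 MB boundary so it can be used as a single large page; NIL_RTHCPHYS puts
           no upper limit on the physical address. */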
3229 RTR0MEMOBJ hMemObj;
3230 rc = RTR0MemObjAllocPhysEx(&hMemObj, GMM_CHUNK_SIZE, NIL_RTHCPHYS, GMM_CHUNK_SIZE);
3231 if (RT_SUCCESS(rc))
3232 {
3233 PGMMCHUNKFREESET pSet = pGMM->fBoundMemoryMode ? &pGVM->gmm.s.Private : &pGMM->PrivateX;
3234 PGMMCHUNK pChunk;
3235 rc = gmmR0RegisterChunk(pGMM, pSet, hMemObj, pGVM->hSelf, GMM_CHUNK_FLAGS_LARGE_PAGE, &pChunk);
3236 if (RT_SUCCESS(rc))
3237 {
3238 /*
3239 * Allocate all the pages in the chunk.
3240 */
3241 /* Unlink the new chunk from the free list. */
3242 gmmR0UnlinkChunk(pChunk);
3243
3244 /** @todo rewrite this to skip the looping. */
3245 /* Allocate all pages. */
3246 GMMPAGEDESC PageDesc;
3247 gmmR0AllocatePage(pChunk, pGVM->hSelf, &PageDesc);
3248
3249 /* Return the first page as we'll use the whole chunk as one big page. */
3250 *pIdPage = PageDesc.idPage;
3251 *pHCPhys = PageDesc.HCPhysGCPhys;
3252
3253 for (unsigned i = 1; i < cPages; i++)
3254 gmmR0AllocatePage(pChunk, pGVM->hSelf, &PageDesc);
3255
3256 /* Update accounting. */
3257 pGVM->gmm.s.Stats.Allocated.cBasePages += cPages;
3258 pGVM->gmm.s.Stats.cPrivatePages += cPages;
3259 pGMM->cAllocatedPages += cPages;
3260
3261 gmmR0LinkChunk(pChunk, pSet);
3262 gmmR0MutexRelease(pGMM);
3263 LogFlow(("GMMR0AllocateLargePage: returns VINF_SUCCESS\n"));
3264 return VINF_SUCCESS;
3265 }
3266 RTR0MemObjFree(hMemObj, true /* fFreeMappings */);
3267 }
3268 }
3269 else
3270 {
3271 gmmR0MutexRelease(pGMM);
3272 rc = VERR_GMM_IS_NOT_SANE;
3273 }
3274
3275 LogFlow(("GMMR0AllocateLargePage: returns %Rrc\n", rc));
3276 return rc;
3277}
3278
3279
3280/**
3281 * Free a large page.
3282 *
3283 * @returns VBox status code:
3284 * @param pGVM The global (ring-0) VM structure.
3285 * @param idCpu The VCPU id.
3286 * @param idPage The large page id.
3287 */
3288GMMR0DECL(int) GMMR0FreeLargePage(PGVM pGVM, VMCPUID idCpu, uint32_t idPage)
3289{
3290 LogFlow(("GMMR0FreeLargePage: pGVM=%p idPage=%x\n", pGVM, idPage));
3291
3292 /*
3293 * Validate, get basics and take the semaphore.
3294 */
3295 PGMM pGMM;
3296 GMM_GET_VALID_INSTANCE(pGMM, VERR_GMM_INSTANCE);
3297 int rc = GVMMR0ValidateGVMandEMT(pGVM, idCpu);
3298 if (RT_FAILURE(rc))
3299 return rc;
3300
3301#ifdef GMM_WITH_LEGACY_MODE
3302 // /* Not supported in legacy mode where we allocate the memory in ring 3 and lock it in ring 0. */
3303 // if (pGMM->fLegacyAllocationMode)
3304 // return VERR_NOT_SUPPORTED;
3305#endif
3306
3307 gmmR0MutexAcquire(pGMM);
3308 if (GMM_CHECK_SANITY_UPON_ENTERING(pGMM))
3309 {
3310 const unsigned cPages = (GMM_CHUNK_SIZE >> PAGE_SHIFT);
3311
3312 if (RT_UNLIKELY(pGVM->gmm.s.Stats.Allocated.cBasePages < cPages))
3313 {
3314 Log(("GMMR0FreeLargePage: allocated=%#llx cPages=%#x!\n", pGVM->gmm.s.Stats.Allocated.cBasePages, cPages));
3315 gmmR0MutexRelease(pGMM);
3316 return VERR_GMM_ATTEMPT_TO_FREE_TOO_MUCH;
3317 }
3318
3319 PGMMPAGE pPage = gmmR0GetPage(pGMM, idPage);
3320 if (RT_LIKELY( pPage
3321 && GMM_PAGE_IS_PRIVATE(pPage)))
3322 {
3323 PGMMCHUNK pChunk = gmmR0GetChunk(pGMM, idPage >> GMM_CHUNKID_SHIFT);
3324 Assert(pChunk);
3325 Assert(pChunk->cFree < GMM_CHUNK_NUM_PAGES);
3326 Assert(pChunk->cPrivate > 0);
3327
3328 /* Release the memory immediately. */
3329 gmmR0FreeChunk(pGMM, NULL, pChunk, false /*fRelaxedSem*/); /** @todo this can be relaxed too! */
3330
3331 /* Update accounting. */
3332 pGVM->gmm.s.Stats.Allocated.cBasePages -= cPages;
3333 pGVM->gmm.s.Stats.cPrivatePages -= cPages;
3334 pGMM->cAllocatedPages -= cPages;
3335 }
3336 else
3337 rc = VERR_GMM_PAGE_NOT_FOUND;
3338 }
3339 else
3340 rc = VERR_GMM_IS_NOT_SANE;
3341
3342 gmmR0MutexRelease(pGMM);
3343 LogFlow(("GMMR0FreeLargePage: returns %Rrc\n", rc));
3344 return rc;
3345}
3346
3347
3348/**
3349 * VMMR0 request wrapper for GMMR0FreeLargePage.
3350 *
3351 * @returns see GMMR0FreeLargePage.
3352 * @param pGVM The global (ring-0) VM structure.
3353 * @param idCpu The VCPU id.
3354 * @param pReq Pointer to the request packet.
3355 */
3356GMMR0DECL(int) GMMR0FreeLargePageReq(PGVM pGVM, VMCPUID idCpu, PGMMFREELARGEPAGEREQ pReq)
3357{
3358 /*
3359 * Validate input and pass it on.
3360 */
3361 AssertPtrReturn(pReq, VERR_INVALID_POINTER);
3362 AssertMsgReturn(pReq->Hdr.cbReq == sizeof(GMMFREELARGEPAGEREQ),
3363 ("%#x != %#x\n", pReq->Hdr.cbReq, sizeof(GMMFREELARGEPAGEREQ)),
3364 VERR_INVALID_PARAMETER);
3365
3366 return GMMR0FreeLargePage(pGVM, idCpu, pReq->idPage);
3367}
3368
3369
3370/**
3371 * @callback_method_impl{FNGVMMR0ENUMCALLBACK,
3372 * Used by gmmR0FreeChunkFlushPerVmTlbs().}
3373 */
3374static DECLCALLBACK(int) gmmR0InvalidatePerVmChunkTlbCallback(PGVM pGVM, void *pvUser)
3375{
3376 RT_NOREF(pvUser);
3377 if (pGVM->gmm.s.hChunkTlbSpinLock != NIL_RTSPINLOCK)
3378 {
3379 RTSpinlockAcquire(pGVM->gmm.s.hChunkTlbSpinLock);
3380 uintptr_t i = RT_ELEMENTS(pGVM->gmm.s.aChunkTlbEntries);
3381 while (i-- > 0)
3382 {
3383 pGVM->gmm.s.aChunkTlbEntries[i].idGeneration = UINT64_MAX;
3384 pGVM->gmm.s.aChunkTlbEntries[i].pChunk = NULL;
3385 }
3386 RTSpinlockRelease(pGVM->gmm.s.hChunkTlbSpinLock);
3387 }
3388 return VINF_SUCCESS;
3389}
3390
3391
3392/**
3393 * Called by gmmR0FreeChunk when we reach the threshold for wrapping around the
3394 * free generation ID value.
3395 *
3396 * This is done at 2^62 - 1, which allows us to drop all locks: it will
3397 * take roughly 4.6 exa (4 611 686 018 427 387 903) calls to
3398 * gmmR0FreeChunk before a real wrap-around can occur. We do two
3399 * invalidation passes and reset the generation ID between them. This will
3400 * make sure there are no false positives.
3401 *
3402 * @param pGMM Pointer to the GMM instance.
3403 */
3404static void gmmR0FreeChunkFlushPerVmTlbs(PGMM pGMM)
3405{
3406 /*
3407 * First invalidation pass.
3408 */
3409 int rc = GVMMR0EnumVMs(gmmR0InvalidatePerVmChunkTlbCallback, NULL);
3410 AssertRCSuccess(rc);
3411
3412 /*
3413 * Reset the generation number.
3414 */
3415 RTSpinlockAcquire(pGMM->hSpinLockTree);
3416 ASMAtomicWriteU64(&pGMM->idFreeGeneration, 1);
3417 RTSpinlockRelease(pGMM->hSpinLockTree);
3418
3419 /*
3420 * Second invalidation pass.
3421 */
3422 rc = GVMMR0EnumVMs(gmmR0InvalidatePerVmChunkTlbCallback, NULL);
3423 AssertRCSuccess(rc);
3424}
3425
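/*
 * Illustrative sketch of the per-VM chunk TLB validity test these passes
 * protect: a cached entry is only trusted when its generation matches the
 * current free generation, e.g. the lookup in GMMR0PageIdToVirt does roughly:
 *
 * @code
 *  if (   pTlbe->pChunk != NULL
 *      && pTlbe->idGeneration == ASMAtomicUoReadU64(&pGMM->idFreeGeneration)
 *      && pTlbe->pChunk->Core.Key == idChunk)
 *      // TLB hit, otherwise fall back to the chunk tree
 * @endcode
 *
 * The first pass wipes all cached entries, the reset then restarts the
 * counter at 1, and the second pass catches any entry that was refilled with
 * a pre-reset (very large) generation value in the meantime, so no stale
 * entry can compare equal again.
 */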
3426
3427/**
3428 * Frees a chunk, giving it back to the host OS.
3429 *
3430 * @param pGMM Pointer to the GMM instance.
3431 * @param pGVM This is set when called from GMMR0CleanupVM so we can
3432 * unmap and free the chunk in one go.
3433 * @param pChunk The chunk to free.
3434 * @param fRelaxedSem Whether we can release the semaphore while doing the
3435 * freeing (@c true) or not.
3436 */
3437static bool gmmR0FreeChunk(PGMM pGMM, PGVM pGVM, PGMMCHUNK pChunk, bool fRelaxedSem)
3438{
3439 Assert(pChunk->Core.Key != NIL_GMM_CHUNKID);
3440
3441 GMMR0CHUNKMTXSTATE MtxState;
3442 gmmR0ChunkMutexAcquire(&MtxState, pGMM, pChunk, GMMR0CHUNK_MTX_KEEP_GIANT);
3443
3444 /*
3445 * Cleanup hack! Unmap the chunk from the caller's address space.
3446 * This shouldn't happen, so screw lock contention...
3447 */
3448 if ( pChunk->cMappingsX
3449#ifdef GMM_WITH_LEGACY_MODE
3450 && (!pGMM->fLegacyAllocationMode || (pChunk->fFlags & GMM_CHUNK_FLAGS_LARGE_PAGE))
3451#endif
3452 && pGVM)
3453 gmmR0UnmapChunkLocked(pGMM, pGVM, pChunk);
3454
3455 /*
3456 * If there are current mappings of the chunk, then request the
3457 * VMs to unmap them. Reposition the chunk in the free list so
3458 * it won't be a likely candidate for allocations.
3459 */
3460 if (pChunk->cMappingsX)
3461 {
3462 /** @todo R0 -> VM request */
3463 /* The chunk can be mapped by more than one VM if fBoundMemoryMode is false! */
3464 Log(("gmmR0FreeChunk: chunk still has %d mappings; don't free!\n", pChunk->cMappingsX));
3465 gmmR0ChunkMutexRelease(&MtxState, pChunk);
3466 return false;
3467 }
3468
3469
3470 /*
3471 * Save and trash the handle.
3472 */
3473 RTR0MEMOBJ const hMemObj = pChunk->hMemObj;
3474 pChunk->hMemObj = NIL_RTR0MEMOBJ;
3475
3476 /*
3477 * Unlink it from everywhere.
3478 */
3479 gmmR0UnlinkChunk(pChunk);
3480
3481 RTSpinlockAcquire(pGMM->hSpinLockTree);
3482
3483 RTListNodeRemove(&pChunk->ListNode);
3484
3485 PAVLU32NODECORE pCore = RTAvlU32Remove(&pGMM->pChunks, pChunk->Core.Key);
3486 Assert(pCore == &pChunk->Core); NOREF(pCore);
3487
3488 PGMMCHUNKTLBE pTlbe = &pGMM->ChunkTLB.aEntries[GMM_CHUNKTLB_IDX(pChunk->Core.Key)];
3489 if (pTlbe->pChunk == pChunk)
3490 {
3491 pTlbe->idChunk = NIL_GMM_CHUNKID;
3492 pTlbe->pChunk = NULL;
3493 }
3494
3495 Assert(pGMM->cChunks > 0);
3496 pGMM->cChunks--;
3497
3498 uint64_t const idFreeGeneration = ASMAtomicIncU64(&pGMM->idFreeGeneration);
3499
3500 RTSpinlockRelease(pGMM->hSpinLockTree);
3501
3502 /*
3503 * Free the Chunk ID before dropping the locks and freeing the rest.
3504 */
3505 gmmR0FreeChunkId(pGMM, pChunk->Core.Key);
3506 pChunk->Core.Key = NIL_GMM_CHUNKID;
3507
3508 pGMM->cFreedChunks++;
3509
3510 gmmR0ChunkMutexRelease(&MtxState, NULL);
3511 if (fRelaxedSem)
3512 gmmR0MutexRelease(pGMM);
3513
3514 if (idFreeGeneration == UINT64_MAX / 4)
3515 gmmR0FreeChunkFlushPerVmTlbs(pGMM);
3516
3517 RTMemFree(pChunk->paMappingsX);
3518 pChunk->paMappingsX = NULL;
3519
3520 RTMemFree(pChunk);
3521
3522#ifndef VBOX_WITH_LINEAR_HOST_PHYS_MEM
3523 int rc = RTR0MemObjFree(hMemObj, true /* fFreeMappings */);
3524#else
3525 int rc = RTR0MemObjFree(hMemObj, false /* fFreeMappings */);
3526#endif
3527 AssertLogRelRC(rc);
3528
3529 if (fRelaxedSem)
3530 gmmR0MutexAcquire(pGMM);
3531 return fRelaxedSem;
3532}
3533
3534
3535/**
3536 * Free page worker.
3537 *
3538 * The caller does all the statistic decrementing; we do all the incrementing.
3539 *
3540 * @param pGMM Pointer to the GMM instance data.
3541 * @param pGVM Pointer to the GVM instance.
3542 * @param pChunk Pointer to the chunk this page belongs to.
3543 * @param idPage The Page ID.
3544 * @param pPage Pointer to the page.
3545 */
3546static void gmmR0FreePageWorker(PGMM pGMM, PGVM pGVM, PGMMCHUNK pChunk, uint32_t idPage, PGMMPAGE pPage)
3547{
3548 Log3(("F pPage=%p iPage=%#x/%#x u2State=%d iFreeHead=%#x\n",
3549 pPage, pPage - &pChunk->aPages[0], idPage, pPage->Common.u2State, pChunk->iFreeHead)); NOREF(idPage);
3550
3551 /*
3552 * Put the page on the free list.
3553 */
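 /* (LIFO push sketch: the freed page records the previous free-list head in
     Free.iNext and then becomes the new head itself.) */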
3554 pPage->u = 0;
3555 pPage->Free.u2State = GMM_PAGE_STATE_FREE;
3556 Assert(pChunk->iFreeHead < RT_ELEMENTS(pChunk->aPages) || pChunk->iFreeHead == UINT16_MAX);
3557 pPage->Free.iNext = pChunk->iFreeHead;
3558 pChunk->iFreeHead = pPage - &pChunk->aPages[0];
3559
3560 /*
3561 * Update statistics (the cShared/cPrivate stats are up to date already),
3562 * and relink the chunk if necessary.
3563 */
3564 unsigned const cFree = pChunk->cFree;
3565 if ( !cFree
3566 || gmmR0SelectFreeSetList(cFree) != gmmR0SelectFreeSetList(cFree + 1))
3567 {
3568 gmmR0UnlinkChunk(pChunk);
3569 pChunk->cFree++;
3570 gmmR0SelectSetAndLinkChunk(pGMM, pGVM, pChunk);
3571 }
3572 else
3573 {
3574 pChunk->cFree = cFree + 1;
3575 pChunk->pSet->cFreePages++;
3576 }
3577
3578 /*
3579 * If the chunk becomes empty, consider giving memory back to the host OS.
3580 *
3581 * The current strategy is to try to give it back if there are other chunks
3582 * in this free list, meaning if there are at least 240 free pages in this
3583 * category. Note that since there are probably mappings of the chunk,
3584 * it won't be freed up instantly, which probably screws up this logic
3585 * a bit...
3586 */
3587 /** @todo Do this on the way out. */
3588 if (RT_LIKELY( pChunk->cFree != GMM_CHUNK_NUM_PAGES
3589 || pChunk->pFreeNext == NULL
3590 || pChunk->pFreePrev == NULL /** @todo this is probably misfiring, see reset... */))
3591 { /* likely */ }
3592#ifdef GMM_WITH_LEGACY_MODE
3593 else if (RT_LIKELY(pGMM->fLegacyAllocationMode && !(pChunk->fFlags & GMM_CHUNK_FLAGS_LARGE_PAGE)))
3594 { /* likely */ }
3595#endif
3596 else
3597 gmmR0FreeChunk(pGMM, NULL, pChunk, false);
3598
3599}
3600
3601
3602/**
3603 * Frees a shared page, the page is known to exist and be valid and such.
3604 *
3605 * @param pGMM Pointer to the GMM instance.
3606 * @param pGVM Pointer to the GVM instance.
3607 * @param idPage The page id.
3608 * @param pPage The page structure.
3609 */
3610DECLINLINE(void) gmmR0FreeSharedPage(PGMM pGMM, PGVM pGVM, uint32_t idPage, PGMMPAGE pPage)
3611{
3612 PGMMCHUNK pChunk = gmmR0GetChunk(pGMM, idPage >> GMM_CHUNKID_SHIFT);
3613 Assert(pChunk);
3614 Assert(pChunk->cFree < GMM_CHUNK_NUM_PAGES);
3615 Assert(pChunk->cShared > 0);
3616 Assert(pGMM->cSharedPages > 0);
3617 Assert(pGMM->cAllocatedPages > 0);
3618 Assert(!pPage->Shared.cRefs);
3619
3620 pChunk->cShared--;
3621 pGMM->cAllocatedPages--;
3622 pGMM->cSharedPages--;
3623 gmmR0FreePageWorker(pGMM, pGVM, pChunk, idPage, pPage);
3624}
3625
3626
3627/**
3628 * Frees a private page, the page is known to exist and be valid and such.
3629 *
3630 * @param pGMM Pointer to the GMM instance.
3631 * @param pGVM Pointer to the GVM instance.
3632 * @param idPage The page id.
3633 * @param pPage The page structure.
3634 */
3635DECLINLINE(void) gmmR0FreePrivatePage(PGMM pGMM, PGVM pGVM, uint32_t idPage, PGMMPAGE pPage)
3636{
3637 PGMMCHUNK pChunk = gmmR0GetChunk(pGMM, idPage >> GMM_CHUNKID_SHIFT);
3638 Assert(pChunk);
3639 Assert(pChunk->cFree < GMM_CHUNK_NUM_PAGES);
3640 Assert(pChunk->cPrivate > 0);
3641 Assert(pGMM->cAllocatedPages > 0);
3642
3643 pChunk->cPrivate--;
3644 pGMM->cAllocatedPages--;
3645 gmmR0FreePageWorker(pGMM, pGVM, pChunk, idPage, pPage);
3646}
3647
3648
3649/**
3650 * Common worker for GMMR0FreePages and GMMR0BalloonedPages.
3651 *
3652 * @returns VBox status code:
3653 * @retval xxx
3654 *
3655 * @param pGMM Pointer to the GMM instance data.
3656 * @param pGVM Pointer to the VM.
3657 * @param cPages The number of pages to free.
3658 * @param paPages Pointer to the page descriptors.
3659 * @param enmAccount The account this relates to.
3660 */
3661static int gmmR0FreePages(PGMM pGMM, PGVM pGVM, uint32_t cPages, PGMMFREEPAGEDESC paPages, GMMACCOUNT enmAccount)
3662{
3663 /*
3664 * Check that the request isn't impossible wrt to the account status.
3665 */
3666 switch (enmAccount)
3667 {
3668 case GMMACCOUNT_BASE:
3669 if (RT_UNLIKELY(pGVM->gmm.s.Stats.Allocated.cBasePages < cPages))
3670 {
3671 Log(("gmmR0FreePages: allocated=%#llx cPages=%#x!\n", pGVM->gmm.s.Stats.Allocated.cBasePages, cPages));
3672 return VERR_GMM_ATTEMPT_TO_FREE_TOO_MUCH;
3673 }
3674 break;
3675 case GMMACCOUNT_SHADOW:
3676 if (RT_UNLIKELY(pGVM->gmm.s.Stats.Allocated.cShadowPages < cPages))
3677 {
3678 Log(("gmmR0FreePages: allocated=%#llx cPages=%#x!\n", pGVM->gmm.s.Stats.Allocated.cShadowPages, cPages));
3679 return VERR_GMM_ATTEMPT_TO_FREE_TOO_MUCH;
3680 }
3681 break;
3682 case GMMACCOUNT_FIXED:
3683 if (RT_UNLIKELY(pGVM->gmm.s.Stats.Allocated.cFixedPages < cPages))
3684 {
3685 Log(("gmmR0FreePages: allocated=%#llx cPages=%#x!\n", pGVM->gmm.s.Stats.Allocated.cFixedPages, cPages));
3686 return VERR_GMM_ATTEMPT_TO_FREE_TOO_MUCH;
3687 }
3688 break;
3689 default:
3690 AssertMsgFailedReturn(("enmAccount=%d\n", enmAccount), VERR_IPE_NOT_REACHED_DEFAULT_CASE);
3691 }
3692
3693 /*
3694 * Walk the descriptors and free the pages.
3695 *
3696 * Statistics (except the account) are being updated as we go along,
3697 * unlike the alloc code. Also, stop on the first error.
3698 */
3699 int rc = VINF_SUCCESS;
3700 uint32_t iPage;
3701 for (iPage = 0; iPage < cPages; iPage++)
3702 {
3703 uint32_t idPage = paPages[iPage].idPage;
3704 PGMMPAGE pPage = gmmR0GetPage(pGMM, idPage);
3705 if (RT_LIKELY(pPage))
3706 {
3707 if (RT_LIKELY(GMM_PAGE_IS_PRIVATE(pPage)))
3708 {
3709 if (RT_LIKELY(pPage->Private.hGVM == pGVM->hSelf))
3710 {
3711 Assert(pGVM->gmm.s.Stats.cPrivatePages);
3712 pGVM->gmm.s.Stats.cPrivatePages--;
3713 gmmR0FreePrivatePage(pGMM, pGVM, idPage, pPage);
3714 }
3715 else
3716 {
3717 Log(("gmmR0FreePages: #%#x/%#x: not owner! hGVM=%#x hSelf=%#x\n", iPage, idPage,
3718 pPage->Private.hGVM, pGVM->hSelf));
3719 rc = VERR_GMM_NOT_PAGE_OWNER;
3720 break;
3721 }
3722 }
3723 else if (RT_LIKELY(GMM_PAGE_IS_SHARED(pPage)))
3724 {
3725 Assert(pGVM->gmm.s.Stats.cSharedPages);
3726 Assert(pPage->Shared.cRefs);
3727#if defined(VBOX_WITH_PAGE_SHARING) && defined(VBOX_STRICT) && HC_ARCH_BITS == 64
3728 if (pPage->Shared.u14Checksum)
3729 {
3730 uint32_t uChecksum = gmmR0StrictPageChecksum(pGMM, pGVM, idPage);
3731 uChecksum &= UINT32_C(0x00003fff);
3732 AssertMsg(!uChecksum || uChecksum == pPage->Shared.u14Checksum,
3733 ("%#x vs %#x - idPage=%#x\n", uChecksum, pPage->Shared.u14Checksum, idPage));
3734 }
3735#endif
3736 pGVM->gmm.s.Stats.cSharedPages--;
3737 if (!--pPage->Shared.cRefs)
3738 gmmR0FreeSharedPage(pGMM, pGVM, idPage, pPage);
3739 else
3740 {
3741 Assert(pGMM->cDuplicatePages);
3742 pGMM->cDuplicatePages--;
3743 }
3744 }
3745 else
3746 {
3747 Log(("gmmR0FreePages: #%#x/%#x: already free!\n", iPage, idPage));
3748 rc = VERR_GMM_PAGE_ALREADY_FREE;
3749 break;
3750 }
3751 }
3752 else
3753 {
3754 Log(("gmmR0FreePages: #%#x/%#x: not found!\n", iPage, idPage));
3755 rc = VERR_GMM_PAGE_NOT_FOUND;
3756 break;
3757 }
3758 paPages[iPage].idPage = NIL_GMM_PAGEID;
3759 }
3760
3761 /*
3762 * Update the account.
3763 */
3764 switch (enmAccount)
3765 {
3766 case GMMACCOUNT_BASE: pGVM->gmm.s.Stats.Allocated.cBasePages -= iPage; break;
3767 case GMMACCOUNT_SHADOW: pGVM->gmm.s.Stats.Allocated.cShadowPages -= iPage; break;
3768 case GMMACCOUNT_FIXED: pGVM->gmm.s.Stats.Allocated.cFixedPages -= iPage; break;
3769 default:
3770 AssertMsgFailedReturn(("enmAccount=%d\n", enmAccount), VERR_IPE_NOT_REACHED_DEFAULT_CASE);
3771 }
3772
3773 /*
3774 * Any threshold stuff to be done here?
3775 */
3776
3777 return rc;
3778}
3779
3780
3781/**
3782 * Free one or more pages.
3783 *
3784 * This is typically used at reset time or power off.
3785 *
3786 * @returns VBox status code:
3787 * @retval xxx
3788 *
3789 * @param pGVM The global (ring-0) VM structure.
3790 * @param idCpu The VCPU id.
3791 * @param cPages The number of pages to free.
3792 * @param paPages Pointer to the page descriptors containing the page IDs
3793 * for each page.
3794 * @param enmAccount The account this relates to.
3795 * @thread EMT.
3796 */
3797GMMR0DECL(int) GMMR0FreePages(PGVM pGVM, VMCPUID idCpu, uint32_t cPages, PGMMFREEPAGEDESC paPages, GMMACCOUNT enmAccount)
3798{
3799 LogFlow(("GMMR0FreePages: pGVM=%p cPages=%#x paPages=%p enmAccount=%d\n", pGVM, cPages, paPages, enmAccount));
3800
3801 /*
3802 * Validate input and get the basics.
3803 */
3804 PGMM pGMM;
3805 GMM_GET_VALID_INSTANCE(pGMM, VERR_GMM_INSTANCE);
3806 int rc = GVMMR0ValidateGVMandEMT(pGVM, idCpu);
3807 if (RT_FAILURE(rc))
3808 return rc;
3809
3810 AssertPtrReturn(paPages, VERR_INVALID_PARAMETER);
3811 AssertMsgReturn(enmAccount > GMMACCOUNT_INVALID && enmAccount < GMMACCOUNT_END, ("%d\n", enmAccount), VERR_INVALID_PARAMETER);
3812 AssertMsgReturn(cPages > 0 && cPages < RT_BIT(32 - PAGE_SHIFT), ("%#x\n", cPages), VERR_INVALID_PARAMETER);
3813
3814 for (unsigned iPage = 0; iPage < cPages; iPage++)
3815 AssertMsgReturn( paPages[iPage].idPage <= GMM_PAGEID_LAST
3816 /*|| paPages[iPage].idPage == NIL_GMM_PAGEID*/,
3817 ("#%#x: %#x\n", iPage, paPages[iPage].idPage), VERR_INVALID_PARAMETER);
3818
3819 /*
3820 * Take the semaphore and call the worker function.
3821 */
3822 gmmR0MutexAcquire(pGMM);
3823 if (GMM_CHECK_SANITY_UPON_ENTERING(pGMM))
3824 {
3825 rc = gmmR0FreePages(pGMM, pGVM, cPages, paPages, enmAccount);
3826 GMM_CHECK_SANITY_UPON_LEAVING(pGMM);
3827 }
3828 else
3829 rc = VERR_GMM_IS_NOT_SANE;
3830 gmmR0MutexRelease(pGMM);
3831 LogFlow(("GMMR0FreePages: returns %Rrc\n", rc));
3832 return rc;
3833}
3834
3835
3836/**
3837 * VMMR0 request wrapper for GMMR0FreePages.
3838 *
3839 * @returns see GMMR0FreePages.
3840 * @param pGVM The global (ring-0) VM structure.
3841 * @param idCpu The VCPU id.
3842 * @param pReq Pointer to the request packet.
3843 */
3844GMMR0DECL(int) GMMR0FreePagesReq(PGVM pGVM, VMCPUID idCpu, PGMMFREEPAGESREQ pReq)
3845{
3846 /*
3847 * Validate input and pass it on.
3848 */
3849 AssertPtrReturn(pReq, VERR_INVALID_POINTER);
3850 AssertMsgReturn(pReq->Hdr.cbReq >= RT_UOFFSETOF(GMMFREEPAGESREQ, aPages[0]),
3851 ("%#x < %#x\n", pReq->Hdr.cbReq, RT_UOFFSETOF(GMMFREEPAGESREQ, aPages[0])),
3852 VERR_INVALID_PARAMETER);
3853 AssertMsgReturn(pReq->Hdr.cbReq == RT_UOFFSETOF_DYN(GMMFREEPAGESREQ, aPages[pReq->cPages]),
3854 ("%#x != %#x\n", pReq->Hdr.cbReq, RT_UOFFSETOF_DYN(GMMFREEPAGESREQ, aPages[pReq->cPages])),
3855 VERR_INVALID_PARAMETER);
3856
3857 return GMMR0FreePages(pGVM, idCpu, pReq->cPages, &pReq->aPages[0], pReq->enmAccount);
3858}
3859
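/*
 * Illustrative request sketch (not taken from the actual callers; header
 * initialization beyond cbReq, e.g. the request magic, is elided and the
 * page ID variables are hypothetical): freeing two base-account pages via
 * the wrapper above.
 *
 * @code
 *  uint32_t const   cPages = 2;
 *  uint32_t const   cbReq  = RT_UOFFSETOF_DYN(GMMFREEPAGESREQ, aPages[cPages]);
 *  PGMMFREEPAGESREQ pReq   = (PGMMFREEPAGESREQ)RTMemAllocZ(cbReq);
 *  pReq->Hdr.cbReq         = cbReq;
 *  pReq->enmAccount        = GMMACCOUNT_BASE;
 *  pReq->cPages            = cPages;
 *  pReq->aPages[0].idPage  = idFirstPage;   // IDs previously handed out by the allocation APIs
 *  pReq->aPages[1].idPage  = idSecondPage;
 *  int rc = GMMR0FreePagesReq(pGVM, idCpu, pReq);
 * @endcode
 */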
3860
3861/**
3862 * Report back on a memory ballooning request.
3863 *
3864 * The request may or may not have been initiated by the GMM. If it was initiated
3865 * by the GMM it is important that this function is called even if no pages were
3866 * ballooned.
3867 *
3868 * @returns VBox status code:
3869 * @retval VERR_GMM_ATTEMPT_TO_FREE_TOO_MUCH
3870 * @retval VERR_GMM_ATTEMPT_TO_DEFLATE_TOO_MUCH
3871 * @retval VERR_GMM_OVERCOMMITTED_TRY_AGAIN_IN_A_BIT - reset condition
3872 * indicating that we won't necessarily have sufficient RAM to boot
3873 * the VM again and that it should pause until this changes (we'll try to
3874 * balloon some other VM). (For standard deflate we have little choice
3875 * but to hope the VM won't use the memory that was returned to it.)
3876 *
3877 * @param pGVM The global (ring-0) VM structure.
3878 * @param idCpu The VCPU id.
3879 * @param enmAction Inflate/deflate/reset.
3880 * @param cBalloonedPages The number of pages that were ballooned.
3881 *
3882 * @thread EMT(idCpu)
3883 */
3884GMMR0DECL(int) GMMR0BalloonedPages(PGVM pGVM, VMCPUID idCpu, GMMBALLOONACTION enmAction, uint32_t cBalloonedPages)
3885{
3886 LogFlow(("GMMR0BalloonedPages: pGVM=%p enmAction=%d cBalloonedPages=%#x\n",
3887 pGVM, enmAction, cBalloonedPages));
3888
3889 AssertMsgReturn(cBalloonedPages < RT_BIT(32 - PAGE_SHIFT), ("%#x\n", cBalloonedPages), VERR_INVALID_PARAMETER);
3890
3891 /*
3892 * Validate input and get the basics.
3893 */
3894 PGMM pGMM;
3895 GMM_GET_VALID_INSTANCE(pGMM, VERR_GMM_INSTANCE);
3896 int rc = GVMMR0ValidateGVMandEMT(pGVM, idCpu);
3897 if (RT_FAILURE(rc))
3898 return rc;
3899
3900 /*
3901 * Take the semaphore and do some more validations.
3902 */
3903 gmmR0MutexAcquire(pGMM);
3904 if (GMM_CHECK_SANITY_UPON_ENTERING(pGMM))
3905 {
3906 switch (enmAction)
3907 {
3908 case GMMBALLOONACTION_INFLATE:
3909 {
3910 if (RT_LIKELY(pGVM->gmm.s.Stats.Allocated.cBasePages + pGVM->gmm.s.Stats.cBalloonedPages + cBalloonedPages
3911 <= pGVM->gmm.s.Stats.Reserved.cBasePages))
3912 {
3913 /*
3914 * Record the ballooned memory.
3915 */
3916 pGMM->cBalloonedPages += cBalloonedPages;
3917 if (pGVM->gmm.s.Stats.cReqBalloonedPages)
3918 {
3919 /* Codepath never taken. Might be interesting in the future to request ballooned memory from guests in low-memory conditions. */
3920 AssertFailed();
3921
3922 pGVM->gmm.s.Stats.cBalloonedPages += cBalloonedPages;
3923 pGVM->gmm.s.Stats.cReqActuallyBalloonedPages += cBalloonedPages;
3924 Log(("GMMR0BalloonedPages: +%#x - Global=%#llx / VM: Total=%#llx Req=%#llx Actual=%#llx (pending)\n",
3925 cBalloonedPages, pGMM->cBalloonedPages, pGVM->gmm.s.Stats.cBalloonedPages,
3926 pGVM->gmm.s.Stats.cReqBalloonedPages, pGVM->gmm.s.Stats.cReqActuallyBalloonedPages));
3927 }
3928 else
3929 {
3930 pGVM->gmm.s.Stats.cBalloonedPages += cBalloonedPages;
3931 Log(("GMMR0BalloonedPages: +%#x - Global=%#llx / VM: Total=%#llx (user)\n",
3932 cBalloonedPages, pGMM->cBalloonedPages, pGVM->gmm.s.Stats.cBalloonedPages));
3933 }
3934 }
3935 else
3936 {
3937 Log(("GMMR0BalloonedPages: cBasePages=%#llx Total=%#llx cBalloonedPages=%#llx Reserved=%#llx\n",
3938 pGVM->gmm.s.Stats.Allocated.cBasePages, pGVM->gmm.s.Stats.cBalloonedPages, cBalloonedPages,
3939 pGVM->gmm.s.Stats.Reserved.cBasePages));
3940 rc = VERR_GMM_ATTEMPT_TO_FREE_TOO_MUCH;
3941 }
3942 break;
3943 }
3944
3945 case GMMBALLOONACTION_DEFLATE:
3946 {
3947 /* Deflate. */
3948 if (pGVM->gmm.s.Stats.cBalloonedPages >= cBalloonedPages)
3949 {
3950 /*
3951 * Record the ballooned memory.
3952 */
3953 Assert(pGMM->cBalloonedPages >= cBalloonedPages);
3954 pGMM->cBalloonedPages -= cBalloonedPages;
3955 pGVM->gmm.s.Stats.cBalloonedPages -= cBalloonedPages;
3956 if (pGVM->gmm.s.Stats.cReqDeflatePages)
3957 {
3958 AssertFailed(); /* This path is for later. */
3959 Log(("GMMR0BalloonedPages: -%#x - Global=%#llx / VM: Total=%#llx Req=%#llx\n",
3960 cBalloonedPages, pGMM->cBalloonedPages, pGVM->gmm.s.Stats.cBalloonedPages, pGVM->gmm.s.Stats.cReqDeflatePages));
3961
3962 /*
3963 * Anything we need to do here now when the request has been completed?
3964 */
3965 pGVM->gmm.s.Stats.cReqDeflatePages = 0;
3966 }
3967 else
3968 Log(("GMMR0BalloonedPages: -%#x - Global=%#llx / VM: Total=%#llx (user)\n",
3969 cBalloonedPages, pGMM->cBalloonedPages, pGVM->gmm.s.Stats.cBalloonedPages));
3970 }
3971 else
3972 {
3973 Log(("GMMR0BalloonedPages: Total=%#llx cBalloonedPages=%#llx\n", pGVM->gmm.s.Stats.cBalloonedPages, cBalloonedPages));
3974 rc = VERR_GMM_ATTEMPT_TO_DEFLATE_TOO_MUCH;
3975 }
3976 break;
3977 }
3978
3979 case GMMBALLOONACTION_RESET:
3980 {
3981 /* Reset to an empty balloon. */
3982 Assert(pGMM->cBalloonedPages >= pGVM->gmm.s.Stats.cBalloonedPages);
3983
3984 pGMM->cBalloonedPages -= pGVM->gmm.s.Stats.cBalloonedPages;
3985 pGVM->gmm.s.Stats.cBalloonedPages = 0;
3986 break;
3987 }
3988
3989 default:
3990 rc = VERR_INVALID_PARAMETER;
3991 break;
3992 }
3993 GMM_CHECK_SANITY_UPON_LEAVING(pGMM);
3994 }
3995 else
3996 rc = VERR_GMM_IS_NOT_SANE;
3997
3998 gmmR0MutexRelease(pGMM);
3999 LogFlow(("GMMR0BalloonedPages: returns %Rrc\n", rc));
4000 return rc;
4001}
4002
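/*
 * Worked inflate example for the account check above (hypothetical numbers):
 * with Reserved.cBasePages = 0x40000 (1 GiB worth of 4 KiB pages),
 * Allocated.cBasePages = 0x30000 and cBalloonedPages = 0x8000 already
 * recorded, an INFLATE request for another 0x9000 pages is rejected because
 * 0x30000 + 0x8000 + 0x9000 = 0x41000 exceeds 0x40000, while a request for
 * 0x8000 pages would still be accepted.
 */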
4003
4004/**
4005 * VMMR0 request wrapper for GMMR0BalloonedPages.
4006 *
4007 * @returns see GMMR0BalloonedPages.
4008 * @param pGVM The global (ring-0) VM structure.
4009 * @param idCpu The VCPU id.
4010 * @param pReq Pointer to the request packet.
4011 */
4012GMMR0DECL(int) GMMR0BalloonedPagesReq(PGVM pGVM, VMCPUID idCpu, PGMMBALLOONEDPAGESREQ pReq)
4013{
4014 /*
4015 * Validate input and pass it on.
4016 */
4017 AssertPtrReturn(pReq, VERR_INVALID_POINTER);
4018 AssertMsgReturn(pReq->Hdr.cbReq == sizeof(GMMBALLOONEDPAGESREQ),
4019 ("%#x != %#x\n", pReq->Hdr.cbReq, sizeof(GMMBALLOONEDPAGESREQ)),
4020 VERR_INVALID_PARAMETER);
4021
4022 return GMMR0BalloonedPages(pGVM, idCpu, pReq->enmAction, pReq->cBalloonedPages);
4023}
4024
4025
4026/**
4027 * Return memory statistics for the hypervisor
4028 *
4029 * @returns VBox status code.
4030 * @param pReq Pointer to the request packet.
4031 */
4032GMMR0DECL(int) GMMR0QueryHypervisorMemoryStatsReq(PGMMMEMSTATSREQ pReq)
4033{
4034 /*
4035 * Validate input and pass it on.
4036 */
4037 AssertPtrReturn(pReq, VERR_INVALID_POINTER);
4038 AssertMsgReturn(pReq->Hdr.cbReq == sizeof(GMMMEMSTATSREQ),
4039 ("%#x != %#x\n", pReq->Hdr.cbReq, sizeof(GMMMEMSTATSREQ)),
4040 VERR_INVALID_PARAMETER);
4041
4042 /*
4043 * Validate input and get the basics.
4044 */
4045 PGMM pGMM;
4046 GMM_GET_VALID_INSTANCE(pGMM, VERR_GMM_INSTANCE);
4047 pReq->cAllocPages = pGMM->cAllocatedPages;
4048 pReq->cFreePages = (pGMM->cChunks << (GMM_CHUNK_SHIFT - PAGE_SHIFT)) - pGMM->cAllocatedPages;
4049 pReq->cBalloonedPages = pGMM->cBalloonedPages;
4050 pReq->cMaxPages = pGMM->cMaxPages;
4051 pReq->cSharedPages = pGMM->cDuplicatePages;
4052 GMM_CHECK_SANITY_UPON_LEAVING(pGMM);
4053
4054 return VINF_SUCCESS;
4055}
4056
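/*
 * Worked example for the cFreePages computation above (hypothetical counts,
 * assuming the usual 2 MiB chunks and 4 KiB pages, i.e. 512 pages per chunk):
 * with cChunks = 100 and cAllocatedPages = 40000 the reported value is
 * (100 << 9) - 40000 = 51200 - 40000 = 11200 free pages.
 */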
4057
4058/**
4059 * Return memory statistics for the VM
4060 *
4061 * @returns VBox status code.
4062 * @param pGVM The global (ring-0) VM structure.
4063 * @param idCpu Cpu id.
4064 * @param pReq Pointer to the request packet.
4065 *
4066 * @thread EMT(idCpu)
4067 */
4068GMMR0DECL(int) GMMR0QueryMemoryStatsReq(PGVM pGVM, VMCPUID idCpu, PGMMMEMSTATSREQ pReq)
4069{
4070 /*
4071 * Validate input and pass it on.
4072 */
4073 AssertPtrReturn(pReq, VERR_INVALID_POINTER);
4074 AssertMsgReturn(pReq->Hdr.cbReq == sizeof(GMMMEMSTATSREQ),
4075 ("%#x != %#x\n", pReq->Hdr.cbReq, sizeof(GMMMEMSTATSREQ)),
4076 VERR_INVALID_PARAMETER);
4077
4078 /*
4079 * Validate input and get the basics.
4080 */
4081 PGMM pGMM;
4082 GMM_GET_VALID_INSTANCE(pGMM, VERR_GMM_INSTANCE);
4083 int rc = GVMMR0ValidateGVMandEMT(pGVM, idCpu);
4084 if (RT_FAILURE(rc))
4085 return rc;
4086
4087 /*
4088 * Take the semaphore and do some more validations.
4089 */
4090 gmmR0MutexAcquire(pGMM);
4091 if (GMM_CHECK_SANITY_UPON_ENTERING(pGMM))
4092 {
4093 pReq->cAllocPages = pGVM->gmm.s.Stats.Allocated.cBasePages;
4094 pReq->cBalloonedPages = pGVM->gmm.s.Stats.cBalloonedPages;
4095 pReq->cMaxPages = pGVM->gmm.s.Stats.Reserved.cBasePages;
4096 pReq->cFreePages = pReq->cMaxPages - pReq->cAllocPages;
4097 }
4098 else
4099 rc = VERR_GMM_IS_NOT_SANE;
4100
4101 gmmR0MutexRelease(pGMM);
4102 LogFlow(("GMMR0QueryMemoryStatsReq: returns %Rrc\n", rc));
4103 return rc;
4104}
4105
4106
4107/**
4108 * Worker for gmmR0UnmapChunk and gmmR0FreeChunk.
4109 *
4110 * Don't call this in legacy allocation mode!
4111 *
4112 * @returns VBox status code.
4113 * @param pGMM Pointer to the GMM instance data.
4114 * @param pGVM Pointer to the Global VM structure.
4115 * @param pChunk Pointer to the chunk to be unmapped.
4116 */
4117static int gmmR0UnmapChunkLocked(PGMM pGMM, PGVM pGVM, PGMMCHUNK pChunk)
4118{
4119 RT_NOREF_PV(pGMM);
4120#ifdef GMM_WITH_LEGACY_MODE
4121 Assert(!pGMM->fLegacyAllocationMode || (pChunk->fFlags & GMM_CHUNK_FLAGS_LARGE_PAGE));
4122#endif
4123
4124 /*
4125 * Find the mapping and try unmapping it.
4126 */
4127 uint32_t cMappings = pChunk->cMappingsX;
4128 for (uint32_t i = 0; i < cMappings; i++)
4129 {
4130 Assert(pChunk->paMappingsX[i].pGVM && pChunk->paMappingsX[i].hMapObj != NIL_RTR0MEMOBJ);
4131 if (pChunk->paMappingsX[i].pGVM == pGVM)
4132 {
4133 /* unmap */
4134 int rc = RTR0MemObjFree(pChunk->paMappingsX[i].hMapObj, false /* fFreeMappings (NA) */);
4135 if (RT_SUCCESS(rc))
4136 {
4137 /* update the record. */
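 /* (The removal below swaps the last array entry into the vacated slot so
     the mapping array stays dense; entry order is not preserved.) */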
4138 cMappings--;
4139 if (i < cMappings)
4140 pChunk->paMappingsX[i] = pChunk->paMappingsX[cMappings];
4141 pChunk->paMappingsX[cMappings].hMapObj = NIL_RTR0MEMOBJ;
4142 pChunk->paMappingsX[cMappings].pGVM = NULL;
4143 Assert(pChunk->cMappingsX - 1U == cMappings);
4144 pChunk->cMappingsX = cMappings;
4145 }
4146
4147 return rc;
4148 }
4149 }
4150
4151 Log(("gmmR0UnmapChunk: Chunk %#x is not mapped into pGVM=%p/%#x\n", pChunk->Core.Key, pGVM, pGVM->hSelf));
4152 return VERR_GMM_CHUNK_NOT_MAPPED;
4153}
4154
4155
4156/**
4157 * Unmaps a chunk previously mapped into the address space of the current process.
4158 *
4159 * @returns VBox status code.
4160 * @param pGMM Pointer to the GMM instance data.
4161 * @param pGVM Pointer to the Global VM structure.
4162 * @param pChunk Pointer to the chunk to be unmapped.
4163 * @param fRelaxedSem Whether we can release the semaphore while doing the
4164 * mapping (@c true) or not.
4165 */
4166static int gmmR0UnmapChunk(PGMM pGMM, PGVM pGVM, PGMMCHUNK pChunk, bool fRelaxedSem)
4167{
4168#ifdef GMM_WITH_LEGACY_MODE
4169 if (!pGMM->fLegacyAllocationMode || (pChunk->fFlags & GMM_CHUNK_FLAGS_LARGE_PAGE))
4170 {
4171#endif
4172 /*
4173 * Lock the chunk and if possible leave the giant GMM lock.
4174 */
4175 GMMR0CHUNKMTXSTATE MtxState;
4176 int rc = gmmR0ChunkMutexAcquire(&MtxState, pGMM, pChunk,
4177 fRelaxedSem ? GMMR0CHUNK_MTX_RETAKE_GIANT : GMMR0CHUNK_MTX_KEEP_GIANT);
4178 if (RT_SUCCESS(rc))
4179 {
4180 rc = gmmR0UnmapChunkLocked(pGMM, pGVM, pChunk);
4181 gmmR0ChunkMutexRelease(&MtxState, pChunk);
4182 }
4183 return rc;
4184#ifdef GMM_WITH_LEGACY_MODE
4185 }
4186
4187 if (pChunk->hGVM == pGVM->hSelf)
4188 return VINF_SUCCESS;
4189
4190 Log(("gmmR0UnmapChunk: Chunk %#x is not mapped into pGVM=%p/%#x (legacy)\n", pChunk->Core.Key, pGVM, pGVM->hSelf));
4191 return VERR_GMM_CHUNK_NOT_MAPPED;
4192#endif
4193}
4194
4195
4196/**
4197 * Worker for gmmR0MapChunk.
4198 *
4199 * @returns VBox status code.
4200 * @param pGMM Pointer to the GMM instance data.
4201 * @param pGVM Pointer to the Global VM structure.
4202 * @param pChunk Pointer to the chunk to be mapped.
4203 * @param ppvR3 Where to store the ring-3 address of the mapping.
4204 * In the VERR_GMM_CHUNK_ALREADY_MAPPED case, this will
4205 * contain the address of the existing mapping.
4206 */
4207static int gmmR0MapChunkLocked(PGMM pGMM, PGVM pGVM, PGMMCHUNK pChunk, PRTR3PTR ppvR3)
4208{
4209#ifdef GMM_WITH_LEGACY_MODE
4210 /*
4211 * If we're in legacy mode this is simple.
4212 */
4213 if (pGMM->fLegacyAllocationMode && !(pChunk->fFlags & GMM_CHUNK_FLAGS_LARGE_PAGE))
4214 {
4215 if (pChunk->hGVM != pGVM->hSelf)
4216 {
4217 Log(("gmmR0MapChunk: chunk %#x is already mapped at %p!\n", pChunk->Core.Key, *ppvR3));
4218 return VERR_GMM_CHUNK_NOT_FOUND;
4219 }
4220
4221 *ppvR3 = RTR0MemObjAddressR3(pChunk->hMemObj);
4222 return VINF_SUCCESS;
4223 }
4224#else
4225 RT_NOREF(pGMM);
4226#endif
4227
4228 /*
4229 * Check to see if the chunk is already mapped.
4230 */
4231 for (uint32_t i = 0; i < pChunk->cMappingsX; i++)
4232 {
4233 Assert(pChunk->paMappingsX[i].pGVM && pChunk->paMappingsX[i].hMapObj != NIL_RTR0MEMOBJ);
4234 if (pChunk->paMappingsX[i].pGVM == pGVM)
4235 {
4236 *ppvR3 = RTR0MemObjAddressR3(pChunk->paMappingsX[i].hMapObj);
4237 Log(("gmmR0MapChunk: chunk %#x is already mapped at %p!\n", pChunk->Core.Key, *ppvR3));
4238#ifdef VBOX_WITH_PAGE_SHARING
4239 /* The ring-3 chunk cache can be out of sync; don't fail. */
4240 return VINF_SUCCESS;
4241#else
4242 return VERR_GMM_CHUNK_ALREADY_MAPPED;
4243#endif
4244 }
4245 }
4246
4247 /*
4248 * Do the mapping.
4249 */
4250 RTR0MEMOBJ hMapObj;
4251 int rc = RTR0MemObjMapUser(&hMapObj, pChunk->hMemObj, (RTR3PTR)-1, 0, RTMEM_PROT_READ | RTMEM_PROT_WRITE, NIL_RTR0PROCESS);
4252 if (RT_SUCCESS(rc))
4253 {
4254 /* reallocate the array? assumes few users per chunk (usually one). */
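 /* (Growth sketch: the capacity goes 1, 2, 3, 4 and then grows in steps of
     four (8, 12, ...), so the common single-mapping case never over-allocates.) */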
4255 unsigned iMapping = pChunk->cMappingsX;
4256 if ( iMapping <= 3
4257 || (iMapping & 3) == 0)
4258 {
4259 unsigned cNewSize = iMapping <= 3
4260 ? iMapping + 1
4261 : iMapping + 4;
4262 Assert(cNewSize < 4 || RT_ALIGN_32(cNewSize, 4) == cNewSize);
4263 if (RT_UNLIKELY(cNewSize > UINT16_MAX))
4264 {
4265 rc = RTR0MemObjFree(hMapObj, false /* fFreeMappings (NA) */); AssertRC(rc);
4266 return VERR_GMM_TOO_MANY_CHUNK_MAPPINGS;
4267 }
4268
4269 void *pvMappings = RTMemRealloc(pChunk->paMappingsX, cNewSize * sizeof(pChunk->paMappingsX[0]));
4270 if (RT_UNLIKELY(!pvMappings))
4271 {
4272 rc = RTR0MemObjFree(hMapObj, false /* fFreeMappings (NA) */); AssertRC(rc);
4273 return VERR_NO_MEMORY;
4274 }
4275 pChunk->paMappingsX = (PGMMCHUNKMAP)pvMappings;
4276 }
4277
4278 /* insert new entry */
4279 pChunk->paMappingsX[iMapping].hMapObj = hMapObj;
4280 pChunk->paMappingsX[iMapping].pGVM = pGVM;
4281 Assert(pChunk->cMappingsX == iMapping);
4282 pChunk->cMappingsX = iMapping + 1;
4283
4284 *ppvR3 = RTR0MemObjAddressR3(hMapObj);
4285 }
4286
4287 return rc;
4288}
4289
4290
4291/**
4292 * Maps a chunk into the user address space of the current process.
4293 *
4294 * @returns VBox status code.
4295 * @param pGMM Pointer to the GMM instance data.
4296 * @param pGVM Pointer to the Global VM structure.
4297 * @param pChunk Pointer to the chunk to be mapped.
4298 * @param fRelaxedSem Whether we can release the semaphore while doing the
4299 * mapping (@c true) or not.
4300 * @param ppvR3 Where to store the ring-3 address of the mapping.
4301 * In the VERR_GMM_CHUNK_ALREADY_MAPPED case, this will
4302 * contain the address of the existing mapping.
4303 */
4304static int gmmR0MapChunk(PGMM pGMM, PGVM pGVM, PGMMCHUNK pChunk, bool fRelaxedSem, PRTR3PTR ppvR3)
4305{
4306 /*
4307 * Take the chunk lock and leave the giant GMM lock when possible, then
4308 * call the worker function.
4309 */
4310 GMMR0CHUNKMTXSTATE MtxState;
4311 int rc = gmmR0ChunkMutexAcquire(&MtxState, pGMM, pChunk,
4312 fRelaxedSem ? GMMR0CHUNK_MTX_RETAKE_GIANT : GMMR0CHUNK_MTX_KEEP_GIANT);
4313 if (RT_SUCCESS(rc))
4314 {
4315 rc = gmmR0MapChunkLocked(pGMM, pGVM, pChunk, ppvR3);
4316 gmmR0ChunkMutexRelease(&MtxState, pChunk);
4317 }
4318
4319 return rc;
4320}
4321
4322
4323
4324#if defined(VBOX_WITH_PAGE_SHARING) || (defined(VBOX_STRICT) && HC_ARCH_BITS == 64)
4325/**
4326 * Check if a chunk is mapped into the specified VM
4327 *
4328 * @returns mapped yes/no
4329 * @param pGMM Pointer to the GMM instance.
4330 * @param pGVM Pointer to the Global VM structure.
4331 * @param pChunk Pointer to the chunk to be mapped.
4332 * @param ppvR3 Where to store the ring-3 address of the mapping.
4333 */
4334static bool gmmR0IsChunkMapped(PGMM pGMM, PGVM pGVM, PGMMCHUNK pChunk, PRTR3PTR ppvR3)
4335{
4336 GMMR0CHUNKMTXSTATE MtxState;
4337 gmmR0ChunkMutexAcquire(&MtxState, pGMM, pChunk, GMMR0CHUNK_MTX_KEEP_GIANT);
4338 for (uint32_t i = 0; i < pChunk->cMappingsX; i++)
4339 {
4340 Assert(pChunk->paMappingsX[i].pGVM && pChunk->paMappingsX[i].hMapObj != NIL_RTR0MEMOBJ);
4341 if (pChunk->paMappingsX[i].pGVM == pGVM)
4342 {
4343 *ppvR3 = RTR0MemObjAddressR3(pChunk->paMappingsX[i].hMapObj);
4344 gmmR0ChunkMutexRelease(&MtxState, pChunk);
4345 return true;
4346 }
4347 }
4348 *ppvR3 = NULL;
4349 gmmR0ChunkMutexRelease(&MtxState, pChunk);
4350 return false;
4351}
4352#endif /* VBOX_WITH_PAGE_SHARING || (VBOX_STRICT && 64-BIT) */
4353
4354
4355/**
4356 * Map a chunk and/or unmap another chunk.
4357 *
4358 * The mapping and unmapping applies to the current process.
4359 *
4360 * This API does two things because it saves a kernel call per mapping when
4361 * the ring-3 mapping cache is full.
4362 *
4363 * @returns VBox status code.
4364 * @param pGVM The global (ring-0) VM structure.
4365 * @param idChunkMap The chunk to map. NIL_GMM_CHUNKID if nothing to map.
4366 * @param idChunkUnmap The chunk to unmap. NIL_GMM_CHUNKID if nothing to unmap.
4367 * @param ppvR3 Where to store the address of the mapped chunk. NULL is ok if nothing to map.
4368 * @thread EMT ???
4369 */
4370GMMR0DECL(int) GMMR0MapUnmapChunk(PGVM pGVM, uint32_t idChunkMap, uint32_t idChunkUnmap, PRTR3PTR ppvR3)
4371{
4372 LogFlow(("GMMR0MapUnmapChunk: pGVM=%p idChunkMap=%#x idChunkUnmap=%#x ppvR3=%p\n",
4373 pGVM, idChunkMap, idChunkUnmap, ppvR3));
4374
4375 /*
4376 * Validate input and get the basics.
4377 */
4378 PGMM pGMM;
4379 GMM_GET_VALID_INSTANCE(pGMM, VERR_GMM_INSTANCE);
4380 int rc = GVMMR0ValidateGVM(pGVM);
4381 if (RT_FAILURE(rc))
4382 return rc;
4383
4384 AssertCompile(NIL_GMM_CHUNKID == 0);
4385 AssertMsgReturn(idChunkMap <= GMM_CHUNKID_LAST, ("%#x\n", idChunkMap), VERR_INVALID_PARAMETER);
4386 AssertMsgReturn(idChunkUnmap <= GMM_CHUNKID_LAST, ("%#x\n", idChunkUnmap), VERR_INVALID_PARAMETER);
4387
4388 if ( idChunkMap == NIL_GMM_CHUNKID
4389 && idChunkUnmap == NIL_GMM_CHUNKID)
4390 return VERR_INVALID_PARAMETER;
4391
4392 if (idChunkMap != NIL_GMM_CHUNKID)
4393 {
4394 AssertPtrReturn(ppvR3, VERR_INVALID_POINTER);
4395 *ppvR3 = NIL_RTR3PTR;
4396 }
4397
4398 /*
4399 * Take the semaphore and do the work.
4400 *
4401 * The unmapping is done last since it's easier to undo a mapping than to
4402 * undo an unmapping. The ring-3 mapping cache cannot be so big that it
4403 * pushes the user virtual address space to within a chunk of its limits,
4404 * so there is no problem here.
4405 */
4406 gmmR0MutexAcquire(pGMM);
4407 if (GMM_CHECK_SANITY_UPON_ENTERING(pGMM))
4408 {
4409 PGMMCHUNK pMap = NULL;
4410 if (idChunkMap != NIL_GMM_CHUNKID)
4411 {
4412 pMap = gmmR0GetChunk(pGMM, idChunkMap);
4413 if (RT_LIKELY(pMap))
4414 rc = gmmR0MapChunk(pGMM, pGVM, pMap, true /*fRelaxedSem*/, ppvR3);
4415 else
4416 {
4417 Log(("GMMR0MapUnmapChunk: idChunkMap=%#x\n", idChunkMap));
4418 rc = VERR_GMM_CHUNK_NOT_FOUND;
4419 }
4420 }
4421/** @todo split this operation, the bail out might (theoretically) not be
4422 * entirely safe. */
4423
4424 if ( idChunkUnmap != NIL_GMM_CHUNKID
4425 && RT_SUCCESS(rc))
4426 {
4427 PGMMCHUNK pUnmap = gmmR0GetChunk(pGMM, idChunkUnmap);
4428 if (RT_LIKELY(pUnmap))
4429 rc = gmmR0UnmapChunk(pGMM, pGVM, pUnmap, true /*fRelaxedSem*/);
4430 else
4431 {
4432 Log(("GMMR0MapUnmapChunk: idChunkUnmap=%#x\n", idChunkUnmap));
4433 rc = VERR_GMM_CHUNK_NOT_FOUND;
4434 }
4435
4436 if (RT_FAILURE(rc) && pMap)
4437 gmmR0UnmapChunk(pGMM, pGVM, pMap, false /*fRelaxedSem*/);
4438 }
4439
4440 GMM_CHECK_SANITY_UPON_LEAVING(pGMM);
4441 }
4442 else
4443 rc = VERR_GMM_IS_NOT_SANE;
4444 gmmR0MutexRelease(pGMM);
4445
4446 LogFlow(("GMMR0MapUnmapChunk: returns %Rrc\n", rc));
4447 return rc;
4448}
4449
4450
4451/**
4452 * VMMR0 request wrapper for GMMR0MapUnmapChunk.
4453 *
4454 * @returns see GMMR0MapUnmapChunk.
4455 * @param pGVM The global (ring-0) VM structure.
4456 * @param pReq Pointer to the request packet.
4457 */
4458GMMR0DECL(int) GMMR0MapUnmapChunkReq(PGVM pGVM, PGMMMAPUNMAPCHUNKREQ pReq)
4459{
4460 /*
4461 * Validate input and pass it on.
4462 */
4463 AssertPtrReturn(pReq, VERR_INVALID_POINTER);
4464 AssertMsgReturn(pReq->Hdr.cbReq == sizeof(*pReq), ("%#x != %#x\n", pReq->Hdr.cbReq, sizeof(*pReq)), VERR_INVALID_PARAMETER);
4465
4466 return GMMR0MapUnmapChunk(pGVM, pReq->idChunkMap, pReq->idChunkUnmap, &pReq->pvR3);
4467}
4468
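/*
 * Illustrative request sketch (hypothetical chunk ID variables; header
 * initialization beyond cbReq is elided): mapping one chunk and unmapping
 * another in a single call, as the ring-3 mapping cache does when it is full.
 *
 * @code
 *  GMMMAPUNMAPCHUNKREQ Req;
 *  Req.Hdr.cbReq    = sizeof(Req);
 *  Req.idChunkMap   = idChunkToMap;    // NIL_GMM_CHUNKID if nothing to map
 *  Req.idChunkUnmap = idChunkToEvict;  // NIL_GMM_CHUNKID if nothing to unmap
 *  Req.pvR3         = NIL_RTR3PTR;
 *  int rc = GMMR0MapUnmapChunkReq(pGVM, &Req);
 *  // on success Req.pvR3 holds the ring-3 address of the newly mapped chunk
 * @endcode
 */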
4469
4470/**
4471 * Legacy mode API for supplying pages.
4472 *
4473 * The specified user address points to an allocation-chunk-sized block that
4474 * will be locked down and used by the GMM when the GM asks for pages.
4475 *
4476 * @returns VBox status code.
4477 * @param pGVM The global (ring-0) VM structure.
4478 * @param idCpu The VCPU id.
4479 * @param pvR3 Pointer to the chunk size memory block to lock down.
4480 */
4481GMMR0DECL(int) GMMR0SeedChunk(PGVM pGVM, VMCPUID idCpu, RTR3PTR pvR3)
4482{
4483#ifdef GMM_WITH_LEGACY_MODE
4484 /*
4485 * Validate input and get the basics.
4486 */
4487 PGMM pGMM;
4488 GMM_GET_VALID_INSTANCE(pGMM, VERR_GMM_INSTANCE);
4489 int rc = GVMMR0ValidateGVMandEMT(pGVM, idCpu);
4490 if (RT_FAILURE(rc))
4491 return rc;
4492
4493 AssertPtrReturn(pvR3, VERR_INVALID_POINTER);
4494 AssertReturn(!(PAGE_OFFSET_MASK & pvR3), VERR_INVALID_POINTER);
4495
4496 if (!pGMM->fLegacyAllocationMode)
4497 {
4498 Log(("GMMR0SeedChunk: not in legacy allocation mode!\n"));
4499 return VERR_NOT_SUPPORTED;
4500 }
4501
4502 /*
4503 * Lock the memory and add it as new chunk with our hGVM.
4504 * (The GMM locking is done inside gmmR0RegisterChunk.)
4505 */
4506 RTR0MEMOBJ hMemObj;
4507 rc = RTR0MemObjLockUser(&hMemObj, pvR3, GMM_CHUNK_SIZE, RTMEM_PROT_READ | RTMEM_PROT_WRITE, NIL_RTR0PROCESS);
4508 if (RT_SUCCESS(rc))
4509 {
4510 rc = gmmR0RegisterChunk(pGMM, &pGVM->gmm.s.Private, hMemObj, pGVM->hSelf, GMM_CHUNK_FLAGS_SEEDED, NULL);
4511 if (RT_SUCCESS(rc))
4512 gmmR0MutexRelease(pGMM);
4513 else
4514 RTR0MemObjFree(hMemObj, true /* fFreeMappings */);
4515 }
4516
4517 LogFlow(("GMMR0SeedChunk: rc=%d (pvR3=%p)\n", rc, pvR3));
4518 return rc;
4519#else
4520 RT_NOREF(pGVM, idCpu, pvR3);
4521 return VERR_NOT_SUPPORTED;
4522#endif
4523}
4524
4525
4526#ifndef VBOX_WITH_LINEAR_HOST_PHYS_MEM
4527/**
4528 * Gets the ring-0 virtual address for the given page.
4529 *
4530 * This is used by PGM when IEM and such wants to access guest RAM from ring-0.
4531 * One of the ASSUMPTIONS here is that the @a idPage is used by the VM and the
4532 * corresponding chunk will remain valid beyond the call (at least till the EMT
4533 * returns to ring-3).
4534 *
4535 * @returns VBox status code.
4536 * @param pGVM Pointer to the kernel-only VM instance data.
4537 * @param idPage The page ID.
4538 * @param ppv Where to store the address.
4539 * @thread EMT
4540 */
4541GMMR0DECL(int) GMMR0PageIdToVirt(PGVM pGVM, uint32_t idPage, void **ppv)
4542{
4543 *ppv = NULL;
4544 PGMM pGMM;
4545 GMM_GET_VALID_INSTANCE(pGMM, VERR_GMM_INSTANCE);
4546
4547 uint32_t const idChunk = idPage >> GMM_CHUNKID_SHIFT;
4548
4549 /*
4550 * Start with the per-VM TLB.
4551 */
4552 RTSpinlockAcquire(pGVM->gmm.s.hChunkTlbSpinLock);
4553
4554 PGMMPERVMCHUNKTLBE pTlbe = &pGVM->gmm.s.aChunkTlbEntries[GMMPERVM_CHUNKTLB_IDX(idChunk)];
4555 PGMMCHUNK pChunk = pTlbe->pChunk;
4556 if ( pChunk != NULL
4557 && pTlbe->idGeneration == ASMAtomicUoReadU64(&pGMM->idFreeGeneration)
4558 && pChunk->Core.Key == idChunk)
4559 pGVM->R0Stats.gmm.cChunkTlbHits++; /* hopefully this is a likely outcome */
4560 else
4561 {
4562 pGVM->R0Stats.gmm.cChunkTlbMisses++;
4563
4564 /*
4565 * Look it up in the chunk tree.
4566 */
4567 RTSpinlockAcquire(pGMM->hSpinLockTree);
4568 pChunk = gmmR0GetChunkLocked(pGMM, idChunk);
4569 if (RT_LIKELY(pChunk))
4570 {
4571 pTlbe->idGeneration = pGMM->idFreeGeneration;
4572 RTSpinlockRelease(pGMM->hSpinLockTree);
4573 pTlbe->pChunk = pChunk;
4574 }
4575 else
4576 {
4577 RTSpinlockRelease(pGMM->hSpinLockTree);
4578 RTSpinlockRelease(pGVM->gmm.s.hChunkTlbSpinLock);
4579 AssertMsgFailed(("idPage=%#x\n", idPage));
4580 return VERR_GMM_PAGE_NOT_FOUND;
4581 }
4582 }
4583
4584 RTSpinlockRelease(pGVM->gmm.s.hChunkTlbSpinLock);
4585
4586 /*
4587 * Got a chunk, now validate the page ownership and calculate its address.
4588 */
4589 const GMMPAGE * const pPage = &pChunk->aPages[idPage & GMM_PAGEID_IDX_MASK];
4590 if (RT_LIKELY( ( GMM_PAGE_IS_PRIVATE(pPage)
4591 && pPage->Private.hGVM == pGVM->hSelf)
4592 || GMM_PAGE_IS_SHARED(pPage)))
4593 {
4594 AssertPtr(pChunk->pbMapping);
4595 *ppv = &pChunk->pbMapping[(idPage & GMM_PAGEID_IDX_MASK) << PAGE_SHIFT];
4596 return VINF_SUCCESS;
4597 }
4598 AssertMsgFailed(("idPage=%#x is-private=%RTbool Private.hGVM=%u pGVM->hGVM=%u\n",
4599 idPage, GMM_PAGE_IS_PRIVATE(pPage), pPage->Private.hGVM, pGVM->hSelf));
4600 return VERR_GMM_NOT_PAGE_OWNER;
4601}
4602#endif /* !VBOX_WITH_LINEAR_HOST_PHYS_MEM */
4603
4604#ifdef VBOX_WITH_PAGE_SHARING
4605
4606# ifdef VBOX_STRICT
4607/**
4608 * For checksumming shared pages in strict builds.
4609 *
4610 * The purpose is making sure that a page doesn't change.
4611 *
4612 * @returns Checksum, 0 on failure.
4613 * @param pGMM The GMM instance data.
4614 * @param pGVM Pointer to the kernel-only VM instance data.
4615 * @param idPage The page ID.
4616 */
4617static uint32_t gmmR0StrictPageChecksum(PGMM pGMM, PGVM pGVM, uint32_t idPage)
4618{
4619 PGMMCHUNK pChunk = gmmR0GetChunk(pGMM, idPage >> GMM_CHUNKID_SHIFT);
4620 AssertMsgReturn(pChunk, ("idPage=%#x\n", idPage), 0);
4621
4622 uint8_t *pbChunk;
4623 if (!gmmR0IsChunkMapped(pGMM, pGVM, pChunk, (PRTR3PTR)&pbChunk))
4624 return 0;
4625 uint8_t const *pbPage = pbChunk + ((idPage & GMM_PAGEID_IDX_MASK) << PAGE_SHIFT);
4626
4627 return RTCrc32(pbPage, PAGE_SIZE);
4628}
4629# endif /* VBOX_STRICT */
4630
4631
4632/**
4633 * Calculates the module hash value.
4634 *
4635 * @returns Hash value.
4636 * @param pszModuleName The module name.
4637 * @param pszVersion The module version string.
4638 */
4639static uint32_t gmmR0ShModCalcHash(const char *pszModuleName, const char *pszVersion)
4640{
4641 return RTStrHash1ExN(3, pszModuleName, RTSTR_MAX, "::", (size_t)2, pszVersion, RTSTR_MAX);
4642}
4643
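/*
 * Example (hypothetical module): for pszModuleName = "ntdll.dll" and
 * pszVersion = "5.1.2600.5512" the three parts hashed above are the name,
 * the literal "::" separator and the version, i.e. effectively the hash of
 * the combined string "ntdll.dll::5.1.2600.5512".
 */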
4644
4645/**
4646 * Finds a global module.
4647 *
4648 * @returns Pointer to the global module on success, NULL if not found.
4649 * @param pGMM The GMM instance data.
4650 * @param uHash The hash as calculated by gmmR0ShModCalcHash.
4651 * @param cbModule The module size.
4652 * @param enmGuestOS The guest OS type.
4653 * @param cRegions The number of regions.
4654 * @param pszModuleName The module name.
4655 * @param pszVersion The module version.
4656 * @param paRegions The region descriptions.
4657 */
4658static PGMMSHAREDMODULE gmmR0ShModFindGlobal(PGMM pGMM, uint32_t uHash, uint32_t cbModule, VBOXOSFAMILY enmGuestOS,
4659 uint32_t cRegions, const char *pszModuleName, const char *pszVersion,
4660 struct VMMDEVSHAREDREGIONDESC const *paRegions)
4661{
4662 for (PGMMSHAREDMODULE pGblMod = (PGMMSHAREDMODULE)RTAvllU32Get(&pGMM->pGlobalSharedModuleTree, uHash);
4663 pGblMod;
4664 pGblMod = (PGMMSHAREDMODULE)pGblMod->Core.pList)
4665 {
4666 if (pGblMod->cbModule != cbModule)
4667 continue;
4668 if (pGblMod->enmGuestOS != enmGuestOS)
4669 continue;
4670 if (pGblMod->cRegions != cRegions)
4671 continue;
4672 if (strcmp(pGblMod->szName, pszModuleName))
4673 continue;
4674 if (strcmp(pGblMod->szVersion, pszVersion))
4675 continue;
4676
4677 uint32_t i;
4678 for (i = 0; i < cRegions; i++)
4679 {
4680 uint32_t off = paRegions[i].GCRegionAddr & PAGE_OFFSET_MASK;
4681 if (pGblMod->aRegions[i].off != off)
4682 break;
4683
4684 uint32_t cb = RT_ALIGN_32(paRegions[i].cbRegion + off, PAGE_SIZE);
4685 if (pGblMod->aRegions[i].cb != cb)
4686 break;
4687 }
4688
4689 if (i == cRegions)
4690 return pGblMod;
4691 }
4692
4693 return NULL;
4694}
4695
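/*
 * Worked example of the region normalization used above and when creating a
 * global module (hypothetical numbers, 4 KiB pages): GCRegionAddr = 0x00401234
 * and cbRegion = 0x5100 give off = 0x234 and cb = RT_ALIGN_32(0x5100 + 0x234,
 * PAGE_SIZE) = 0x6000, so two modules only match when both the sub-page
 * offset and the page-rounded size of every region are identical.
 */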
4696
4697/**
4698 * Creates a new global module.
4699 *
4700 * @returns VBox status code.
4701 * @param pGMM The GMM instance data.
4702 * @param uHash The hash as calculated by gmmR0ShModCalcHash.
4703 * @param cbModule The module size.
4704 * @param enmGuestOS The guest OS type.
4705 * @param cRegions The number of regions.
4706 * @param pszModuleName The module name.
4707 * @param pszVersion The module version.
4708 * @param paRegions The region descriptions.
4709 * @param ppGblMod Where to return the new module on success.
4710 */
4711static int gmmR0ShModNewGlobal(PGMM pGMM, uint32_t uHash, uint32_t cbModule, VBOXOSFAMILY enmGuestOS,
4712 uint32_t cRegions, const char *pszModuleName, const char *pszVersion,
4713 struct VMMDEVSHAREDREGIONDESC const *paRegions, PGMMSHAREDMODULE *ppGblMod)
4714{
4715 Log(("gmmR0ShModNewGlobal: %s %s size %#x os %u rgn %u\n", pszModuleName, pszVersion, cbModule, enmGuestOS, cRegions));
4716 if (pGMM->cShareableModules >= GMM_MAX_SHARED_GLOBAL_MODULES)
4717 {
4718 Log(("gmmR0ShModNewGlobal: Too many modules\n"));
4719 return VERR_GMM_TOO_MANY_GLOBAL_MODULES;
4720 }
4721
4722 PGMMSHAREDMODULE pGblMod = (PGMMSHAREDMODULE)RTMemAllocZ(RT_UOFFSETOF_DYN(GMMSHAREDMODULE, aRegions[cRegions]));
4723 if (!pGblMod)
4724 {
4725 Log(("gmmR0ShModNewGlobal: No memory\n"));
4726 return VERR_NO_MEMORY;
4727 }
4728
4729 pGblMod->Core.Key = uHash;
4730 pGblMod->cbModule = cbModule;
4731 pGblMod->cRegions = cRegions;
4732 pGblMod->cUsers = 1;
4733 pGblMod->enmGuestOS = enmGuestOS;
4734 strcpy(pGblMod->szName, pszModuleName);
4735 strcpy(pGblMod->szVersion, pszVersion);
4736
4737 for (uint32_t i = 0; i < cRegions; i++)
4738 {
4739 Log(("gmmR0ShModNewGlobal: rgn[%u]=%RGvLB%#x\n", i, paRegions[i].GCRegionAddr, paRegions[i].cbRegion));
4740 pGblMod->aRegions[i].off = paRegions[i].GCRegionAddr & PAGE_OFFSET_MASK;
4741 pGblMod->aRegions[i].cb = paRegions[i].cbRegion + pGblMod->aRegions[i].off;
4742 pGblMod->aRegions[i].cb = RT_ALIGN_32(pGblMod->aRegions[i].cb, PAGE_SIZE);
4743 pGblMod->aRegions[i].paidPages = NULL; /* allocated when needed. */
4744 }
4745
4746 bool fInsert = RTAvllU32Insert(&pGMM->pGlobalSharedModuleTree, &pGblMod->Core);
4747 Assert(fInsert); NOREF(fInsert);
4748 pGMM->cShareableModules++;
4749
4750 *ppGblMod = pGblMod;
4751 return VINF_SUCCESS;
4752}
4753
4754
4755/**
4756 * Deletes a global module which is no longer referenced by anyone.
4757 *
4758 * @param pGMM The GMM instance data.
4759 * @param pGblMod The module to delete.
4760 */
4761static void gmmR0ShModDeleteGlobal(PGMM pGMM, PGMMSHAREDMODULE pGblMod)
4762{
4763 Assert(pGblMod->cUsers == 0);
4764 Assert(pGMM->cShareableModules > 0 && pGMM->cShareableModules <= GMM_MAX_SHARED_GLOBAL_MODULES);
4765
4766 void *pvTest = RTAvllU32RemoveNode(&pGMM->pGlobalSharedModuleTree, &pGblMod->Core);
4767 Assert(pvTest == pGblMod); NOREF(pvTest);
4768 pGMM->cShareableModules--;
4769
4770 uint32_t i = pGblMod->cRegions;
4771 while (i-- > 0)
4772 {
4773 if (pGblMod->aRegions[i].paidPages)
4774 {
4775 /* We don't do anything to the pages as they are handled by the
4776 copy-on-write mechanism in PGM. */
4777 RTMemFree(pGblMod->aRegions[i].paidPages);
4778 pGblMod->aRegions[i].paidPages = NULL;
4779 }
4780 }
4781 RTMemFree(pGblMod);
4782}
4783
4784
4785static int gmmR0ShModNewPerVM(PGVM pGVM, RTGCPTR GCBaseAddr, uint32_t cRegions, const VMMDEVSHAREDREGIONDESC *paRegions,
4786 PGMMSHAREDMODULEPERVM *ppRecVM)
4787{
4788 if (pGVM->gmm.s.Stats.cShareableModules >= GMM_MAX_SHARED_PER_VM_MODULES)
4789 return VERR_GMM_TOO_MANY_PER_VM_MODULES;
4790
4791 PGMMSHAREDMODULEPERVM pRecVM;
4792 pRecVM = (PGMMSHAREDMODULEPERVM)RTMemAllocZ(RT_UOFFSETOF_DYN(GMMSHAREDMODULEPERVM, aRegionsGCPtrs[cRegions]));
4793 if (!pRecVM)
4794 return VERR_NO_MEMORY;
4795
4796 pRecVM->Core.Key = GCBaseAddr;
4797 for (uint32_t i = 0; i < cRegions; i++)
4798 pRecVM->aRegionsGCPtrs[i] = paRegions[i].GCRegionAddr;
4799
4800 bool fInsert = RTAvlGCPtrInsert(&pGVM->gmm.s.pSharedModuleTree, &pRecVM->Core);
4801 Assert(fInsert); NOREF(fInsert);
4802 pGVM->gmm.s.Stats.cShareableModules++;
4803
4804 *ppRecVM = pRecVM;
4805 return VINF_SUCCESS;
4806}
4807
4808
4809static void gmmR0ShModDeletePerVM(PGMM pGMM, PGVM pGVM, PGMMSHAREDMODULEPERVM pRecVM, bool fRemove)
4810{
4811 /*
4812 * Free the per-VM module.
4813 */
4814 PGMMSHAREDMODULE pGblMod = pRecVM->pGlobalModule;
4815 pRecVM->pGlobalModule = NULL;
4816
4817 if (fRemove)
4818 {
4819 void *pvTest = RTAvlGCPtrRemove(&pGVM->gmm.s.pSharedModuleTree, pRecVM->Core.Key);
4820 Assert(pvTest == &pRecVM->Core); NOREF(pvTest);
4821 }
4822
4823 RTMemFree(pRecVM);
4824
4825 /*
4826 * Release the global module.
4827 * (In the registration bailout case, it might not be.)
4828 */
4829 if (pGblMod)
4830 {
4831 Assert(pGblMod->cUsers > 0);
4832 pGblMod->cUsers--;
4833 if (pGblMod->cUsers == 0)
4834 gmmR0ShModDeleteGlobal(pGMM, pGblMod);
4835 }
4836}
4837
4838#endif /* VBOX_WITH_PAGE_SHARING */
4839
4840/**
4841 * Registers a new shared module for the VM.
4842 *
4843 * @returns VBox status code.
4844 * @param pGVM The global (ring-0) VM structure.
4845 * @param idCpu The VCPU id.
4846 * @param enmGuestOS The guest OS type.
4847 * @param pszModuleName The module name.
4848 * @param pszVersion The module version.
4849 * @param GCPtrModBase The module base address.
4850 * @param cbModule The module size.
4851 * @param cRegions The number of shared region descriptors.
4852 * @param paRegions Pointer to an array of shared region(s).
4853 * @thread EMT(idCpu)
4854 */
4855GMMR0DECL(int) GMMR0RegisterSharedModule(PGVM pGVM, VMCPUID idCpu, VBOXOSFAMILY enmGuestOS, char *pszModuleName,
4856 char *pszVersion, RTGCPTR GCPtrModBase, uint32_t cbModule,
4857 uint32_t cRegions, struct VMMDEVSHAREDREGIONDESC const *paRegions)
4858{
4859#ifdef VBOX_WITH_PAGE_SHARING
4860 /*
4861 * Validate input and get the basics.
4862 *
4863 * Note! Turns out the module size does not necessarily match the size of the
4864 * regions. (iTunes on XP)
4865 */
4866 PGMM pGMM;
4867 GMM_GET_VALID_INSTANCE(pGMM, VERR_GMM_INSTANCE);
4868 int rc = GVMMR0ValidateGVMandEMT(pGVM, idCpu);
4869 if (RT_FAILURE(rc))
4870 return rc;
4871
4872 if (RT_UNLIKELY(cRegions > VMMDEVSHAREDREGIONDESC_MAX))
4873 return VERR_GMM_TOO_MANY_REGIONS;
4874
4875 if (RT_UNLIKELY(cbModule == 0 || cbModule > _1G))
4876 return VERR_GMM_BAD_SHARED_MODULE_SIZE;
4877
4878 uint32_t cbTotal = 0;
4879 for (uint32_t i = 0; i < cRegions; i++)
4880 {
4881 if (RT_UNLIKELY(paRegions[i].cbRegion == 0 || paRegions[i].cbRegion > _1G))
4882 return VERR_GMM_SHARED_MODULE_BAD_REGIONS_SIZE;
4883
4884 cbTotal += paRegions[i].cbRegion;
4885 if (RT_UNLIKELY(cbTotal > _1G))
4886 return VERR_GMM_SHARED_MODULE_BAD_REGIONS_SIZE;
4887 }
4888
4889 AssertPtrReturn(pszModuleName, VERR_INVALID_POINTER);
4890 if (RT_UNLIKELY(!memchr(pszModuleName, '\0', GMM_SHARED_MODULE_MAX_NAME_STRING)))
4891 return VERR_GMM_MODULE_NAME_TOO_LONG;
4892
4893 AssertPtrReturn(pszVersion, VERR_INVALID_POINTER);
4894 if (RT_UNLIKELY(!memchr(pszVersion, '\0', GMM_SHARED_MODULE_MAX_VERSION_STRING)))
4895 return VERR_GMM_MODULE_NAME_TOO_LONG;
4896
4897 uint32_t const uHash = gmmR0ShModCalcHash(pszModuleName, pszVersion);
4898 Log(("GMMR0RegisterSharedModule %s %s base %RGv size %x hash %x\n", pszModuleName, pszVersion, GCPtrModBase, cbModule, uHash));
4899
4900 /*
4901 * Take the semaphore and do some more validations.
4902 */
4903 gmmR0MutexAcquire(pGMM);
4904 if (GMM_CHECK_SANITY_UPON_ENTERING(pGMM))
4905 {
4906 /*
4907 * Check if this module is already locally registered and register
4908 * it if it isn't. The base address is a unique module identifier
4909 * locally.
4910 */
4911 PGMMSHAREDMODULEPERVM pRecVM = (PGMMSHAREDMODULEPERVM)RTAvlGCPtrGet(&pGVM->gmm.s.pSharedModuleTree, GCPtrModBase);
4912 bool fNewModule = pRecVM == NULL;
4913 if (fNewModule)
4914 {
4915 rc = gmmR0ShModNewPerVM(pGVM, GCPtrModBase, cRegions, paRegions, &pRecVM);
4916 if (RT_SUCCESS(rc))
4917 {
4918 /*
4919 * Find a matching global module, register a new one if needed.
4920 */
4921 PGMMSHAREDMODULE pGblMod = gmmR0ShModFindGlobal(pGMM, uHash, cbModule, enmGuestOS, cRegions,
4922 pszModuleName, pszVersion, paRegions);
4923 if (!pGblMod)
4924 {
4925 Assert(fNewModule);
4926 rc = gmmR0ShModNewGlobal(pGMM, uHash, cbModule, enmGuestOS, cRegions,
4927 pszModuleName, pszVersion, paRegions, &pGblMod);
4928 if (RT_SUCCESS(rc))
4929 {
4930 pRecVM->pGlobalModule = pGblMod; /* (One reference returned by gmmR0ShModNewGlobal.) */
4931 Log(("GMMR0RegisterSharedModule: new module %s %s\n", pszModuleName, pszVersion));
4932 }
4933 else
4934 gmmR0ShModDeletePerVM(pGMM, pGVM, pRecVM, true /*fRemove*/);
4935 }
4936 else
4937 {
4938 Assert(pGblMod->cUsers > 0 && pGblMod->cUsers < UINT32_MAX / 2);
4939 pGblMod->cUsers++;
4940 pRecVM->pGlobalModule = pGblMod;
4941
4942 Log(("GMMR0RegisterSharedModule: new per vm module %s %s, gbl users %d\n", pszModuleName, pszVersion, pGblMod->cUsers));
4943 }
4944 }
4945 }
4946 else
4947 {
4948 /*
4949 * Attempt to re-register an existing module.
4950 */
4951 PGMMSHAREDMODULE pGblMod = gmmR0ShModFindGlobal(pGMM, uHash, cbModule, enmGuestOS, cRegions,
4952 pszModuleName, pszVersion, paRegions);
4953 if (pRecVM->pGlobalModule == pGblMod)
4954 {
4955 Log(("GMMR0RegisterSharedModule: already registered %s %s, gbl users %d\n", pszModuleName, pszVersion, pGblMod->cUsers));
4956 rc = VINF_GMM_SHARED_MODULE_ALREADY_REGISTERED;
4957 }
4958 else
4959 {
4960 /** @todo may have to unregister+register when this happens in case it's caused
4961 * by VBoxService crashing and being restarted... */
4962 Log(("GMMR0RegisterSharedModule: Address clash!\n"
4963 " incoming at %RGvLB%#x %s %s rgns %u\n"
4964 " existing at %RGvLB%#x %s %s rgns %u\n",
4965 GCPtrModBase, cbModule, pszModuleName, pszVersion, cRegions,
4966 pRecVM->Core.Key, pRecVM->pGlobalModule->cbModule, pRecVM->pGlobalModule->szName,
4967 pRecVM->pGlobalModule->szVersion, pRecVM->pGlobalModule->cRegions));
4968 rc = VERR_GMM_SHARED_MODULE_ADDRESS_CLASH;
4969 }
4970 }
4971 GMM_CHECK_SANITY_UPON_LEAVING(pGMM);
4972 }
4973 else
4974 rc = VERR_GMM_IS_NOT_SANE;
4975
4976 gmmR0MutexRelease(pGMM);
4977 return rc;
4978#else
4979
4980 NOREF(pGVM); NOREF(idCpu); NOREF(enmGuestOS); NOREF(pszModuleName); NOREF(pszVersion);
4981 NOREF(GCPtrModBase); NOREF(cbModule); NOREF(cRegions); NOREF(paRegions);
4982 return VERR_NOT_IMPLEMENTED;
4983#endif
4984}
4985
4986
4987/**
4988 * VMMR0 request wrapper for GMMR0RegisterSharedModule.
4989 *
4990 * @returns see GMMR0RegisterSharedModule.
4991 * @param pGVM The global (ring-0) VM structure.
4992 * @param idCpu The VCPU id.
4993 * @param pReq Pointer to the request packet.
4994 */
4995GMMR0DECL(int) GMMR0RegisterSharedModuleReq(PGVM pGVM, VMCPUID idCpu, PGMMREGISTERSHAREDMODULEREQ pReq)
4996{
4997 /*
4998 * Validate input and pass it on.
4999 */
5000 AssertPtrReturn(pReq, VERR_INVALID_POINTER);
5001 AssertMsgReturn( pReq->Hdr.cbReq >= sizeof(*pReq)
5002 && pReq->Hdr.cbReq == RT_UOFFSETOF_DYN(GMMREGISTERSHAREDMODULEREQ, aRegions[pReq->cRegions]),
5003 ("%#x != %#x\n", pReq->Hdr.cbReq, sizeof(*pReq)), VERR_INVALID_PARAMETER);
5004
5005 /* Pass back return code in the request packet to preserve informational codes. (VMMR3CallR0 chokes on them) */
5006 pReq->rc = GMMR0RegisterSharedModule(pGVM, idCpu, pReq->enmGuestOS, pReq->szName, pReq->szVersion,
5007 pReq->GCBaseAddr, pReq->cbModule, pReq->cRegions, pReq->aRegions);
5008 return VINF_SUCCESS;
5009}
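
/*
 * Illustrative ring-3 call sequence (a minimal sketch, not lifted from the actual
 * guest-library / PGM code; the VMMR3CallR0 signature, the
 * VMMR0_DO_GMM_REGISTER_SHARED_MODULE operation name and the SUPVMMR0REQHDR_MAGIC
 * initialisation are assumptions made for illustration):
 *
 * @code
 *  uint32_t const cbReq = RT_UOFFSETOF_DYN(GMMREGISTERSHAREDMODULEREQ, aRegions[cRegions]);
 *  PGMMREGISTERSHAREDMODULEREQ pReq = (PGMMREGISTERSHAREDMODULEREQ)RTMemAllocZ(cbReq);
 *  pReq->Hdr.u32Magic = SUPVMMR0REQHDR_MAGIC;
 *  pReq->Hdr.cbReq    = cbReq;
 *  pReq->enmGuestOS   = enmGuestOS;
 *  pReq->GCBaseAddr   = GCPtrModBase;
 *  pReq->cbModule     = cbModule;
 *  pReq->cRegions     = cRegions;
 *  RTStrCopy(pReq->szName, sizeof(pReq->szName), pszModuleName);
 *  RTStrCopy(pReq->szVersion, sizeof(pReq->szVersion), pszVersion);
 *  for (uint32_t i = 0; i < cRegions; i++)
 *      pReq->aRegions[i] = paRegions[i];
 *
 *  int rc = VMMR3CallR0(pVM, VMMR0_DO_GMM_REGISTER_SHARED_MODULE, 0, &pReq->Hdr);
 *  if (RT_SUCCESS(rc))
 *      rc = pReq->rc;  // informational codes such as VINF_GMM_SHARED_MODULE_ALREADY_REGISTERED land here
 *  RTMemFree(pReq);
 * @endcode
 */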
5010
5011
5012/**
5013 * Unregisters a shared module for the VM
5014 *
5015 * @returns VBox status code.
5016 * @param pGVM The global (ring-0) VM structure.
5017 * @param idCpu The VCPU id.
5018 * @param pszModuleName The module name.
5019 * @param pszVersion The module version.
5020 * @param GCPtrModBase The module base address.
5021 * @param cbModule The module size.
5022 */
5023GMMR0DECL(int) GMMR0UnregisterSharedModule(PGVM pGVM, VMCPUID idCpu, char *pszModuleName, char *pszVersion,
5024 RTGCPTR GCPtrModBase, uint32_t cbModule)
5025{
5026#ifdef VBOX_WITH_PAGE_SHARING
5027 /*
5028 * Validate input and get the basics.
5029 */
5030 PGMM pGMM;
5031 GMM_GET_VALID_INSTANCE(pGMM, VERR_GMM_INSTANCE);
5032 int rc = GVMMR0ValidateGVMandEMT(pGVM, idCpu);
5033 if (RT_FAILURE(rc))
5034 return rc;
5035
5036 AssertPtrReturn(pszModuleName, VERR_INVALID_POINTER);
5037 AssertPtrReturn(pszVersion, VERR_INVALID_POINTER);
5038 if (RT_UNLIKELY(!memchr(pszModuleName, '\0', GMM_SHARED_MODULE_MAX_NAME_STRING)))
5039 return VERR_GMM_MODULE_NAME_TOO_LONG;
5040 if (RT_UNLIKELY(!memchr(pszVersion, '\0', GMM_SHARED_MODULE_MAX_VERSION_STRING)))
5041 return VERR_GMM_MODULE_NAME_TOO_LONG;
5042
5043 Log(("GMMR0UnregisterSharedModule %s %s base=%RGv size %x\n", pszModuleName, pszVersion, GCPtrModBase, cbModule));
5044
5045 /*
5046 * Take the semaphore and do some more validations.
5047 */
5048 gmmR0MutexAcquire(pGMM);
5049 if (GMM_CHECK_SANITY_UPON_ENTERING(pGMM))
5050 {
5051 /*
5052 * Locate and remove the specified module.
5053 */
5054 PGMMSHAREDMODULEPERVM pRecVM = (PGMMSHAREDMODULEPERVM)RTAvlGCPtrGet(&pGVM->gmm.s.pSharedModuleTree, GCPtrModBase);
5055 if (pRecVM)
5056 {
5057 /** @todo Do we need to do more validations here, like that the
5058 * name + version + cbModule matches? */
5059 NOREF(cbModule);
5060 Assert(pRecVM->pGlobalModule);
5061 gmmR0ShModDeletePerVM(pGMM, pGVM, pRecVM, true /*fRemove*/);
5062 }
5063 else
5064 rc = VERR_GMM_SHARED_MODULE_NOT_FOUND;
5065
5066 GMM_CHECK_SANITY_UPON_LEAVING(pGMM);
5067 }
5068 else
5069 rc = VERR_GMM_IS_NOT_SANE;
5070
5071 gmmR0MutexRelease(pGMM);
5072 return rc;
5073#else
5074
5075 NOREF(pGVM); NOREF(idCpu); NOREF(pszModuleName); NOREF(pszVersion); NOREF(GCPtrModBase); NOREF(cbModule);
5076 return VERR_NOT_IMPLEMENTED;
5077#endif
5078}
5079
5080
5081/**
5082 * VMMR0 request wrapper for GMMR0UnregisterSharedModule.
5083 *
5084 * @returns see GMMR0UnregisterSharedModule.
5085 * @param pGVM The global (ring-0) VM structure.
5086 * @param idCpu The VCPU id.
5087 * @param pReq Pointer to the request packet.
5088 */
5089GMMR0DECL(int) GMMR0UnregisterSharedModuleReq(PGVM pGVM, VMCPUID idCpu, PGMMUNREGISTERSHAREDMODULEREQ pReq)
5090{
5091 /*
5092 * Validate input and pass it on.
5093 */
5094 AssertPtrReturn(pReq, VERR_INVALID_POINTER);
5095 AssertMsgReturn(pReq->Hdr.cbReq == sizeof(*pReq), ("%#x != %#x\n", pReq->Hdr.cbReq, sizeof(*pReq)), VERR_INVALID_PARAMETER);
5096
5097 return GMMR0UnregisterSharedModule(pGVM, idCpu, pReq->szName, pReq->szVersion, pReq->GCBaseAddr, pReq->cbModule);
5098}
5099
5100#ifdef VBOX_WITH_PAGE_SHARING
5101
5102/**
5103 * Increases the use count of a shared page which is known to exist and be valid.
5104 *
5105 * @param pGMM Pointer to the GMM instance.
5106 * @param pGVM Pointer to the GVM instance.
5107 * @param pPage The page structure.
5108 */
5109DECLINLINE(void) gmmR0UseSharedPage(PGMM pGMM, PGVM pGVM, PGMMPAGE pPage)
5110{
5111 Assert(pGMM->cSharedPages > 0);
5112 Assert(pGMM->cAllocatedPages > 0);
5113
5114 pGMM->cDuplicatePages++;
5115
5116 pPage->Shared.cRefs++;
5117 pGVM->gmm.s.Stats.cSharedPages++;
5118 pGVM->gmm.s.Stats.Allocated.cBasePages++;
5119}
5120
5121
5122/**
5123 * Converts a private page to a shared page; the page is known to exist and be valid.
5124 *
5125 * @param pGMM Pointer to the GMM instance.
5126 * @param pGVM Pointer to the GVM instance.
5127 * @param HCPhys Host physical address
5128 * @param idPage The Page ID
5129 * @param pPage The page structure.
5130 * @param pPageDesc Shared page descriptor
5131 */
5132DECLINLINE(void) gmmR0ConvertToSharedPage(PGMM pGMM, PGVM pGVM, RTHCPHYS HCPhys, uint32_t idPage, PGMMPAGE pPage,
5133 PGMMSHAREDPAGEDESC pPageDesc)
5134{
5135 PGMMCHUNK pChunk = gmmR0GetChunk(pGMM, idPage >> GMM_CHUNKID_SHIFT);
5136 Assert(pChunk);
5137 Assert(pChunk->cFree < GMM_CHUNK_NUM_PAGES);
5138 Assert(GMM_PAGE_IS_PRIVATE(pPage));
5139
5140 pChunk->cPrivate--;
5141 pChunk->cShared++;
5142
5143 pGMM->cSharedPages++;
5144
5145 pGVM->gmm.s.Stats.cSharedPages++;
5146 pGVM->gmm.s.Stats.cPrivatePages--;
5147
5148 /* Modify the page structure. */
5149 pPage->Shared.pfn = (uint32_t)(uint64_t)(HCPhys >> PAGE_SHIFT);
5150 pPage->Shared.cRefs = 1;
5151#ifdef VBOX_STRICT
5152 pPageDesc->u32StrictChecksum = gmmR0StrictPageChecksum(pGMM, pGVM, idPage);
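/* Only the low 14 bits of the checksum fit in the bitfield below; GMMR0SharedModuleCheckPage masks with 0x3fff before comparing. */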
5153 pPage->Shared.u14Checksum = pPageDesc->u32StrictChecksum;
5154#else
5155 NOREF(pPageDesc);
5156 pPage->Shared.u14Checksum = 0;
5157#endif
5158 pPage->Shared.u2State = GMM_PAGE_STATE_SHARED;
5159}
5160
5161
5162static int gmmR0SharedModuleCheckPageFirstTime(PGMM pGMM, PGVM pGVM, PGMMSHAREDMODULE pModule,
5163 unsigned idxRegion, unsigned idxPage,
5164 PGMMSHAREDPAGEDESC pPageDesc, PGMMSHAREDREGIONDESC pGlobalRegion)
5165{
5166 NOREF(pModule);
5167
5168 /* Easy case: just change the internal page type. */
5169 PGMMPAGE pPage = gmmR0GetPage(pGMM, pPageDesc->idPage);
5170 AssertMsgReturn(pPage, ("idPage=%#x (GCPhys=%RGp HCPhys=%RHp idxRegion=%#x idxPage=%#x) #1\n",
5171 pPageDesc->idPage, pPageDesc->GCPhys, pPageDesc->HCPhys, idxRegion, idxPage),
5172 VERR_PGM_PHYS_INVALID_PAGE_ID);
5173 NOREF(idxRegion);
5174
5175 AssertMsg(pPageDesc->GCPhys == (pPage->Private.pfn << 12), ("desc %RGp gmm %RGp\n", pPageDesc->GCPhys, (pPage->Private.pfn << 12)));
5176
5177 gmmR0ConvertToSharedPage(pGMM, pGVM, pPageDesc->HCPhys, pPageDesc->idPage, pPage, pPageDesc);
5178
5179 /* Keep track of these references. */
5180 pGlobalRegion->paidPages[idxPage] = pPageDesc->idPage;
5181
5182 return VINF_SUCCESS;
5183}
5184
5185/**
5186 * Checks the specified shared module range for changes.
5187 *
5188 * Performs the following tasks:
5189 * - If a shared page is new, then it changes the GMM page type to shared and
5190 * returns it in the pPageDesc descriptor.
5191 * - If a shared page already exists, then it checks if the VM page is
5192 * identical and if so frees the VM page and returns the shared page in the
5193 * pPageDesc descriptor (see the illustrative caller sketch after the function body).
5194 *
5195 * @remarks ASSUMES the caller has acquired the GMM semaphore!!
5196 *
5197 * @returns VBox status code.
5198 * @param pGVM Pointer to the GVM instance data.
5199 * @param pModule Module description
5200 * @param idxRegion Region index
5201 * @param idxPage Page index
5202 * @param pPageDesc Page descriptor
5203 */
5204GMMR0DECL(int) GMMR0SharedModuleCheckPage(PGVM pGVM, PGMMSHAREDMODULE pModule, uint32_t idxRegion, uint32_t idxPage,
5205 PGMMSHAREDPAGEDESC pPageDesc)
5206{
5207 int rc;
5208 PGMM pGMM;
5209 GMM_GET_VALID_INSTANCE(pGMM, VERR_GMM_INSTANCE);
5210 pPageDesc->u32StrictChecksum = 0;
5211
5212 AssertMsgReturn(idxRegion < pModule->cRegions,
5213 ("idxRegion=%#x cRegions=%#x %s %s\n", idxRegion, pModule->cRegions, pModule->szName, pModule->szVersion),
5214 VERR_INVALID_PARAMETER);
5215
5216 uint32_t const cPages = pModule->aRegions[idxRegion].cb >> PAGE_SHIFT;
5217 AssertMsgReturn(idxPage < cPages,
5218 ("idxRegion=%#x cRegions=%#x %s %s\n", idxRegion, pModule->cRegions, pModule->szName, pModule->szVersion),
5219 VERR_INVALID_PARAMETER);
5220
5221 LogFlow(("GMMR0SharedModuleCheckPage %s base %RGv region %d idxPage %d\n", pModule->szName, pModule->Core.Key, idxRegion, idxPage));
5222
5223 /*
5224 * First time; create a page descriptor array.
5225 */
5226 PGMMSHAREDREGIONDESC pGlobalRegion = &pModule->aRegions[idxRegion];
5227 if (!pGlobalRegion->paidPages)
5228 {
5229 Log(("Allocate page descriptor array for %d pages\n", cPages));
5230 pGlobalRegion->paidPages = (uint32_t *)RTMemAlloc(cPages * sizeof(pGlobalRegion->paidPages[0]));
5231 AssertReturn(pGlobalRegion->paidPages, VERR_NO_MEMORY);
5232
5233 /* Invalidate all descriptors. */
5234 uint32_t i = cPages;
5235 while (i-- > 0)
5236 pGlobalRegion->paidPages[i] = NIL_GMM_PAGEID;
5237 }
5238
5239 /*
5240 * We've seen this shared page for the first time?
5241 */
5242 if (pGlobalRegion->paidPages[idxPage] == NIL_GMM_PAGEID)
5243 {
5244 Log(("New shared page guest %RGp host %RHp\n", pPageDesc->GCPhys, pPageDesc->HCPhys));
5245 return gmmR0SharedModuleCheckPageFirstTime(pGMM, pGVM, pModule, idxRegion, idxPage, pPageDesc, pGlobalRegion);
5246 }
5247
5248 /*
5249 * We've seen it before...
5250 */
5251 Log(("Replace existing page guest %RGp host %RHp id %#x -> id %#x\n",
5252 pPageDesc->GCPhys, pPageDesc->HCPhys, pPageDesc->idPage, pGlobalRegion->paidPages[idxPage]));
5253 Assert(pPageDesc->idPage != pGlobalRegion->paidPages[idxPage]);
5254
5255 /*
5256 * Get the shared page source.
5257 */
5258 PGMMPAGE pPage = gmmR0GetPage(pGMM, pGlobalRegion->paidPages[idxPage]);
5259 AssertMsgReturn(pPage, ("idPage=%#x (idxRegion=%#x idxPage=%#x) #2\n", pPageDesc->idPage, idxRegion, idxPage),
5260 VERR_PGM_PHYS_INVALID_PAGE_ID);
5261
5262 if (pPage->Common.u2State != GMM_PAGE_STATE_SHARED)
5263 {
5264 /*
5265 * Page was freed at some point; invalidate this entry.
5266 */
5267 /** @todo this isn't really bullet proof. */
5268 Log(("Old shared page was freed -> create a new one\n"));
5269 pGlobalRegion->paidPages[idxPage] = NIL_GMM_PAGEID;
5270 return gmmR0SharedModuleCheckPageFirstTime(pGMM, pGVM, pModule, idxRegion, idxPage, pPageDesc, pGlobalRegion);
5271 }
5272
5273 Log(("Replace existing page guest host %RHp -> %RHp\n", pPageDesc->HCPhys, ((uint64_t)pPage->Shared.pfn) << PAGE_SHIFT));
5274
5275 /*
5276 * Calculate the virtual address of the local page.
5277 */
5278 PGMMCHUNK pChunk = gmmR0GetChunk(pGMM, pPageDesc->idPage >> GMM_CHUNKID_SHIFT);
5279 AssertMsgReturn(pChunk, ("idPage=%#x (idxRegion=%#x idxPage=%#x) #4\n", pPageDesc->idPage, idxRegion, idxPage),
5280 VERR_PGM_PHYS_INVALID_PAGE_ID);
5281
5282 uint8_t *pbChunk;
5283 AssertMsgReturn(gmmR0IsChunkMapped(pGMM, pGVM, pChunk, (PRTR3PTR)&pbChunk),
5284 ("idPage=%#x (idxRegion=%#x idxPage=%#x) #3\n", pPageDesc->idPage, idxRegion, idxPage),
5285 VERR_PGM_PHYS_INVALID_PAGE_ID);
5286 uint8_t *pbLocalPage = pbChunk + ((pPageDesc->idPage & GMM_PAGEID_IDX_MASK) << PAGE_SHIFT);
5287
5288 /*
5289 * Calculate the virtual address of the shared page.
5290 */
5291 pChunk = gmmR0GetChunk(pGMM, pGlobalRegion->paidPages[idxPage] >> GMM_CHUNKID_SHIFT);
5292 Assert(pChunk); /* can't fail as gmmR0GetPage succeeded. */
5293
5294 /*
5295 * Get the virtual address of the physical page; map the chunk into the VM
5296 * process if not already done.
5297 */
5298 if (!gmmR0IsChunkMapped(pGMM, pGVM, pChunk, (PRTR3PTR)&pbChunk))
5299 {
5300 Log(("Map chunk into process!\n"));
5301 rc = gmmR0MapChunk(pGMM, pGVM, pChunk, false /*fRelaxedSem*/, (PRTR3PTR)&pbChunk);
5302 AssertRCReturn(rc, rc);
5303 }
5304 uint8_t *pbSharedPage = pbChunk + ((pGlobalRegion->paidPages[idxPage] & GMM_PAGEID_IDX_MASK) << PAGE_SHIFT);
5305
5306#ifdef VBOX_STRICT
5307 pPageDesc->u32StrictChecksum = RTCrc32(pbSharedPage, PAGE_SIZE);
5308 uint32_t uChecksum = pPageDesc->u32StrictChecksum & UINT32_C(0x00003fff);
5309 AssertMsg(!uChecksum || uChecksum == pPage->Shared.u14Checksum || !pPage->Shared.u14Checksum,
5310 ("%#x vs %#x - idPage=%#x - %s %s\n", uChecksum, pPage->Shared.u14Checksum,
5311 pGlobalRegion->paidPages[idxPage], pModule->szName, pModule->szVersion));
5312#endif
5313
5314 /** @todo write ASMMemComparePage. */
5315 if (memcmp(pbSharedPage, pbLocalPage, PAGE_SIZE))
5316 {
5317 Log(("Unexpected differences found between local and shared page; skip\n"));
5318 /* Signal to the caller that this one hasn't changed. */
5319 pPageDesc->idPage = NIL_GMM_PAGEID;
5320 return VINF_SUCCESS;
5321 }
5322
5323 /*
5324 * Free the old local page.
5325 */
5326 GMMFREEPAGEDESC PageDesc;
5327 PageDesc.idPage = pPageDesc->idPage;
5328 rc = gmmR0FreePages(pGMM, pGVM, 1, &PageDesc, GMMACCOUNT_BASE);
5329 AssertRCReturn(rc, rc);
5330
5331 gmmR0UseSharedPage(pGMM, pGVM, pPage);
5332
5333 /*
5334 * Pass along the new physical address & page id.
5335 */
5336 pPageDesc->HCPhys = ((uint64_t)pPage->Shared.pfn) << PAGE_SHIFT;
5337 pPageDesc->idPage = pGlobalRegion->paidPages[idxPage];
5338
5339 return VINF_SUCCESS;
5340}
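
/*
 * Illustrative per-page driver loop for the function above (a minimal sketch of
 * what a caller such as PGMR0SharedModuleCheck is assumed to do; the
 * pgmR0HypotheticalUpdateSharedMapping helper and the way GCPhysPage, HCPhysPage
 * and idPage are obtained are placeholders, not real PGM APIs):
 *
 * @code
 *  GMMSHAREDPAGEDESC PageDesc;
 *  PageDesc.GCPhys = GCPhysPage;   // guest physical address of the candidate page
 *  PageDesc.HCPhys = HCPhysPage;   // current host physical backing
 *  PageDesc.idPage = idPage;       // current (private) GMM page id
 *  int rc = GMMR0SharedModuleCheckPage(pGVM, pGblMod, idxRegion, idxPage, &PageDesc);
 *  if (RT_FAILURE(rc))
 *      return rc;
 *  if (PageDesc.idPage != NIL_GMM_PAGEID)
 *  {
 *      // The descriptor now identifies the shared page (HCPhys/idPage change when an
 *      // existing shared copy is reused), so PGM must update its own tracking here.
 *      rc = pgmR0HypotheticalUpdateSharedMapping(pGVM, GCPhysPage, PageDesc.HCPhys, PageDesc.idPage);
 *  }
 *  // else: the contents differed and the page was left private (nothing changed).
 * @endcode
 */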
5341
5342
5343/**
5344 * RTAvlGCPtrDestroy callback.
5345 *
5346 * @returns 0 or VERR_GMM_INSTANCE.
5347 * @param pNode The node to destroy.
5348 * @param pvArgs Pointer to an argument packet.
5349 */
5350static DECLCALLBACK(int) gmmR0CleanupSharedModule(PAVLGCPTRNODECORE pNode, void *pvArgs)
5351{
5352 gmmR0ShModDeletePerVM(((GMMR0SHMODPERVMDTORARGS *)pvArgs)->pGMM,
5353 ((GMMR0SHMODPERVMDTORARGS *)pvArgs)->pGVM,
5354 (PGMMSHAREDMODULEPERVM)pNode,
5355 false /*fRemove*/);
5356 return VINF_SUCCESS;
5357}
5358
5359
5360/**
5361 * Used by GMMR0CleanupVM to clean up shared modules.
5362 *
5363 * The caller must not hold the GMM lock; it is acquired here so that it
5364 * can be yielded as needed.
5365 *
5366 * @param pGMM The GMM handle.
5367 * @param pGVM The global VM handle.
5368 */
5369static void gmmR0SharedModuleCleanup(PGMM pGMM, PGVM pGVM)
5370{
5371 gmmR0MutexAcquire(pGMM);
5372 GMM_CHECK_SANITY_UPON_ENTERING(pGMM);
5373
5374 GMMR0SHMODPERVMDTORARGS Args;
5375 Args.pGVM = pGVM;
5376 Args.pGMM = pGMM;
5377 RTAvlGCPtrDestroy(&pGVM->gmm.s.pSharedModuleTree, gmmR0CleanupSharedModule, &Args);
5378
5379 AssertMsg(pGVM->gmm.s.Stats.cShareableModules == 0, ("%d\n", pGVM->gmm.s.Stats.cShareableModules));
5380 pGVM->gmm.s.Stats.cShareableModules = 0;
5381
5382 gmmR0MutexRelease(pGMM);
5383}
5384
5385#endif /* VBOX_WITH_PAGE_SHARING */
5386
5387/**
5388 * Removes all shared modules for the specified VM
5389 *
5390 * @returns VBox status code.
5391 * @param pGVM The global (ring-0) VM structure.
5392 * @param idCpu The VCPU id.
5393 */
5394GMMR0DECL(int) GMMR0ResetSharedModules(PGVM pGVM, VMCPUID idCpu)
5395{
5396#ifdef VBOX_WITH_PAGE_SHARING
5397 /*
5398 * Validate input and get the basics.
5399 */
5400 PGMM pGMM;
5401 GMM_GET_VALID_INSTANCE(pGMM, VERR_GMM_INSTANCE);
5402 int rc = GVMMR0ValidateGVMandEMT(pGVM, idCpu);
5403 if (RT_FAILURE(rc))
5404 return rc;
5405
5406 /*
5407 * Take the semaphore and do some more validations.
5408 */
5409 gmmR0MutexAcquire(pGMM);
5410 if (GMM_CHECK_SANITY_UPON_ENTERING(pGMM))
5411 {
5412 Log(("GMMR0ResetSharedModules\n"));
5413 GMMR0SHMODPERVMDTORARGS Args;
5414 Args.pGVM = pGVM;
5415 Args.pGMM = pGMM;
5416 RTAvlGCPtrDestroy(&pGVM->gmm.s.pSharedModuleTree, gmmR0CleanupSharedModule, &Args);
5417 pGVM->gmm.s.Stats.cShareableModules = 0;
5418
5419 rc = VINF_SUCCESS;
5420 GMM_CHECK_SANITY_UPON_LEAVING(pGMM);
5421 }
5422 else
5423 rc = VERR_GMM_IS_NOT_SANE;
5424
5425 gmmR0MutexRelease(pGMM);
5426 return rc;
5427#else
5428 RT_NOREF(pGVM, idCpu);
5429 return VERR_NOT_IMPLEMENTED;
5430#endif
5431}
5432
5433#ifdef VBOX_WITH_PAGE_SHARING
5434
5435/**
5436 * Tree enumeration callback for checking a shared module.
5437 */
5438static DECLCALLBACK(int) gmmR0CheckSharedModule(PAVLGCPTRNODECORE pNode, void *pvUser)
5439{
5440 GMMCHECKSHAREDMODULEINFO *pArgs = (GMMCHECKSHAREDMODULEINFO*)pvUser;
5441 PGMMSHAREDMODULEPERVM pRecVM = (PGMMSHAREDMODULEPERVM)pNode;
5442 PGMMSHAREDMODULE pGblMod = pRecVM->pGlobalModule;
5443
5444 Log(("gmmR0CheckSharedModule: check %s %s base=%RGv size=%x\n",
5445 pGblMod->szName, pGblMod->szVersion, pGblMod->Core.Key, pGblMod->cbModule));
5446
5447 int rc = PGMR0SharedModuleCheck(pArgs->pGVM, pArgs->pGVM, pArgs->idCpu, pGblMod, pRecVM->aRegionsGCPtrs);
5448 if (RT_FAILURE(rc))
5449 return rc;
5450 return VINF_SUCCESS;
5451}
5452
5453#endif /* VBOX_WITH_PAGE_SHARING */
5454
5455/**
5456 * Check all shared modules for the specified VM.
5457 *
5458 * @returns VBox status code.
5459 * @param pGVM The global (ring-0) VM structure.
5460 * @param idCpu The calling EMT number.
5461 * @thread EMT(idCpu)
5462 */
5463GMMR0DECL(int) GMMR0CheckSharedModules(PGVM pGVM, VMCPUID idCpu)
5464{
5465#ifdef VBOX_WITH_PAGE_SHARING
5466 /*
5467 * Validate input and get the basics.
5468 */
5469 PGMM pGMM;
5470 GMM_GET_VALID_INSTANCE(pGMM, VERR_GMM_INSTANCE);
5471 int rc = GVMMR0ValidateGVMandEMT(pGVM, idCpu);
5472 if (RT_FAILURE(rc))
5473 return rc;
5474
5475# ifndef DEBUG_sandervl
5476 /*
5477 * Take the semaphore and do some more validations.
5478 */
5479 gmmR0MutexAcquire(pGMM);
5480# endif
5481 if (GMM_CHECK_SANITY_UPON_ENTERING(pGMM))
5482 {
5483 /*
5484 * Walk the tree, checking each module.
5485 */
5486 Log(("GMMR0CheckSharedModules\n"));
5487
5488 GMMCHECKSHAREDMODULEINFO Args;
5489 Args.pGVM = pGVM;
5490 Args.idCpu = idCpu;
5491 rc = RTAvlGCPtrDoWithAll(&pGVM->gmm.s.pSharedModuleTree, true /* fFromLeft */, gmmR0CheckSharedModule, &Args);
5492
5493 Log(("GMMR0CheckSharedModules done (rc=%Rrc)!\n", rc));
5494 GMM_CHECK_SANITY_UPON_LEAVING(pGMM);
5495 }
5496 else
5497 rc = VERR_GMM_IS_NOT_SANE;
5498
5499# ifndef DEBUG_sandervl
5500 gmmR0MutexRelease(pGMM);
5501# endif
5502 return rc;
5503#else
5504 RT_NOREF(pGVM, idCpu);
5505 return VERR_NOT_IMPLEMENTED;
5506#endif
5507}
5508
5509#if defined(VBOX_STRICT) && HC_ARCH_BITS == 64
5510
5511/**
5512 * Worker for GMMR0FindDuplicatePageReq.
5513 *
5514 * @returns true if duplicate, false if not.
5515 */
5516static bool gmmR0FindDupPageInChunk(PGMM pGMM, PGVM pGVM, PGMMCHUNK pChunk, uint8_t const *pbSourcePage)
5517{
5518 bool fFoundDuplicate = false;
5519 /* Only scan chunks that aren't already mapped into this VM process; already mapped chunks (including the source page's own) are skipped, so this isn't entirely correct. */
5520 uint8_t *pbChunk;
5521 if (!gmmR0IsChunkMapped(pGMM, pGVM, pChunk, (PRTR3PTR)&pbChunk))
5522 {
5523 int rc = gmmR0MapChunk(pGMM, pGVM, pChunk, false /*fRelaxedSem*/, (PRTR3PTR)&pbChunk);
5524 if (RT_SUCCESS(rc))
5525 {
5526 /*
5527 * Look for duplicate pages
5528 */
5529 uintptr_t iPage = (GMM_CHUNK_SIZE >> PAGE_SHIFT);
5530 while (iPage-- > 0)
5531 {
5532 if (GMM_PAGE_IS_PRIVATE(&pChunk->aPages[iPage]))
5533 {
5534 uint8_t *pbDestPage = pbChunk + (iPage << PAGE_SHIFT);
5535 if (!memcmp(pbSourcePage, pbDestPage, PAGE_SIZE))
5536 {
5537 fFoundDuplicate = true;
5538 break;
5539 }
5540 }
5541 }
5542 gmmR0UnmapChunk(pGMM, pGVM, pChunk, false /*fRelaxedSem*/);
5543 }
5544 }
5545 return fFoundDuplicate;
5546}
5547
5548
5549/**
5550 * Finds a duplicate of the specified page in other active VMs.
5551 *
5552 * @returns VBox status code.
5553 * @param pGVM The global (ring-0) VM structure.
5554 * @param pReq Pointer to the request packet.
5555 */
5556GMMR0DECL(int) GMMR0FindDuplicatePageReq(PGVM pGVM, PGMMFINDDUPLICATEPAGEREQ pReq)
5557{
5558 /*
5559 * Validate input and pass it on.
5560 */
5561 AssertPtrReturn(pReq, VERR_INVALID_POINTER);
5562 AssertMsgReturn(pReq->Hdr.cbReq == sizeof(*pReq), ("%#x != %#x\n", pReq->Hdr.cbReq, sizeof(*pReq)), VERR_INVALID_PARAMETER);
5563
5564 PGMM pGMM;
5565 GMM_GET_VALID_INSTANCE(pGMM, VERR_GMM_INSTANCE);
5566
5567 int rc = GVMMR0ValidateGVM(pGVM);
5568 if (RT_FAILURE(rc))
5569 return rc;
5570
5571 /*
5572 * Take the semaphore and do some more validations.
5573 */
5574 rc = gmmR0MutexAcquire(pGMM);
5575 if (GMM_CHECK_SANITY_UPON_ENTERING(pGMM))
5576 {
5577 uint8_t *pbChunk;
5578 PGMMCHUNK pChunk = gmmR0GetChunk(pGMM, pReq->idPage >> GMM_CHUNKID_SHIFT);
5579 if (pChunk)
5580 {
5581 if (gmmR0IsChunkMapped(pGMM, pGVM, pChunk, (PRTR3PTR)&pbChunk))
5582 {
5583 uint8_t *pbSourcePage = pbChunk + ((pReq->idPage & GMM_PAGEID_IDX_MASK) << PAGE_SHIFT);
5584 PGMMPAGE pPage = gmmR0GetPage(pGMM, pReq->idPage);
5585 if (pPage)
5586 {
5587 /*
5588 * Walk the chunks
5589 */
5590 pReq->fDuplicate = false;
5591 RTListForEach(&pGMM->ChunkList, pChunk, GMMCHUNK, ListNode)
5592 {
5593 if (gmmR0FindDupPageInChunk(pGMM, pGVM, pChunk, pbSourcePage))
5594 {
5595 pReq->fDuplicate = true;
5596 break;
5597 }
5598 }
5599 }
5600 else
5601 {
5602 AssertFailed();
5603 rc = VERR_PGM_PHYS_INVALID_PAGE_ID;
5604 }
5605 }
5606 else
5607 AssertFailed();
5608 }
5609 else
5610 AssertFailed();
5611 }
5612 else
5613 rc = VERR_GMM_IS_NOT_SANE;
5614
5615 gmmR0MutexRelease(pGMM);
5616 return rc;
5617}
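
/*
 * Illustrative strict-build usage from ring-3 (a minimal sketch; the VMMR3CallR0
 * signature and the VMMR0_DO_GMM_FIND_DUPLICATE_PAGE operation name are assumptions):
 *
 * @code
 *  GMMFINDDUPLICATEPAGEREQ Req;
 *  Req.Hdr.u32Magic = SUPVMMR0REQHDR_MAGIC;
 *  Req.Hdr.cbReq    = sizeof(Req);
 *  Req.idPage       = idPage;
 *  Req.fDuplicate   = false;
 *  int rc = VMMR3CallR0(pVM, VMMR0_DO_GMM_FIND_DUPLICATE_PAGE, 0, &Req.Hdr);
 *  if (RT_SUCCESS(rc) && Req.fDuplicate)
 *      LogRel(("Page %#x has an identical private page in another (unmapped) chunk\n", idPage));
 * @endcode
 */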
5618
5619#endif /* VBOX_STRICT && HC_ARCH_BITS == 64 */
5620
5621
5622/**
5623 * Retrieves the GMM statistics visible to the caller.
5624 *
5625 * @returns VBox status code.
5626 *
5627 * @param pStats Where to put the statistics.
5628 * @param pSession The current session.
5629 * @param pGVM The GVM to obtain statistics for. Optional.
5630 */
5631GMMR0DECL(int) GMMR0QueryStatistics(PGMMSTATS pStats, PSUPDRVSESSION pSession, PGVM pGVM)
5632{
5633 LogFlow(("GMMR0QueryStatistics: pStats=%p pSession=%p pGVM=%p\n", pStats, pSession, pGVM));
5634
5635 /*
5636 * Validate input.
5637 */
5638 AssertPtrReturn(pSession, VERR_INVALID_POINTER);
5639 AssertPtrReturn(pStats, VERR_INVALID_POINTER);
5640 pStats->cMaxPages = 0; /* (Touch the structure before taking the mutex so a bad pointer crashes early.) */
5641
5642 PGMM pGMM;
5643 GMM_GET_VALID_INSTANCE(pGMM, VERR_GMM_INSTANCE);
5644
5645 /*
5646 * Validate the VM handle, if not NULL, and lock the GMM.
5647 */
5648 int rc;
5649 if (pGVM)
5650 {
5651 rc = GVMMR0ValidateGVM(pGVM);
5652 if (RT_FAILURE(rc))
5653 return rc;
5654 }
5655
5656 rc = gmmR0MutexAcquire(pGMM);
5657 if (RT_FAILURE(rc))
5658 return rc;
5659
5660 /*
5661 * Copy out the GMM statistics.
5662 */
5663 pStats->cMaxPages = pGMM->cMaxPages;
5664 pStats->cReservedPages = pGMM->cReservedPages;
5665 pStats->cOverCommittedPages = pGMM->cOverCommittedPages;
5666 pStats->cAllocatedPages = pGMM->cAllocatedPages;
5667 pStats->cSharedPages = pGMM->cSharedPages;
5668 pStats->cDuplicatePages = pGMM->cDuplicatePages;
5669 pStats->cLeftBehindSharedPages = pGMM->cLeftBehindSharedPages;
5670 pStats->cBalloonedPages = pGMM->cBalloonedPages;
5671 pStats->cChunks = pGMM->cChunks;
5672 pStats->cFreedChunks = pGMM->cFreedChunks;
5673 pStats->cShareableModules = pGMM->cShareableModules;
5674 pStats->idFreeGeneration = pGMM->idFreeGeneration;
5675 RT_ZERO(pStats->au64Reserved);
5676
5677 /*
5678 * Copy out the VM statistics.
5679 */
5680 if (pGVM)
5681 pStats->VMStats = pGVM->gmm.s.Stats;
5682 else
5683 RT_ZERO(pStats->VMStats);
5684
5685 gmmR0MutexRelease(pGMM);
5686 return rc;
5687}
5688
5689
5690/**
5691 * VMMR0 request wrapper for GMMR0QueryStatistics.
5692 *
5693 * @returns see GMMR0QueryStatistics.
5694 * @param pGVM The global (ring-0) VM structure. Optional.
5695 * @param pReq Pointer to the request packet.
5696 */
5697GMMR0DECL(int) GMMR0QueryStatisticsReq(PGVM pGVM, PGMMQUERYSTATISTICSSREQ pReq)
5698{
5699 /*
5700 * Validate input and pass it on.
5701 */
5702 AssertPtrReturn(pReq, VERR_INVALID_POINTER);
5703 AssertMsgReturn(pReq->Hdr.cbReq == sizeof(*pReq), ("%#x != %#x\n", pReq->Hdr.cbReq, sizeof(*pReq)), VERR_INVALID_PARAMETER);
5704
5705 return GMMR0QueryStatistics(&pReq->Stats, pReq->pSession, pGVM);
5706}
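
/*
 * Illustrative ring-3 statistics query (a minimal sketch; the VMMR3CallR0 signature,
 * the VMMR0_DO_GMM_QUERY_STATISTICS operation name and the use of pVM->pSession are
 * assumptions made for illustration):
 *
 * @code
 *  GMMQUERYSTATISTICSSREQ Req;
 *  RT_ZERO(Req);
 *  Req.Hdr.u32Magic = SUPVMMR0REQHDR_MAGIC;
 *  Req.Hdr.cbReq    = sizeof(Req);
 *  Req.pSession     = pVM->pSession;
 *  int rc = VMMR3CallR0(pVM, VMMR0_DO_GMM_QUERY_STATISTICS, 0, &Req.Hdr);
 *  if (RT_SUCCESS(rc))
 *      LogRel(("GMM: %RU64 of %RU64 pages allocated, %RU64 shared\n",
 *              Req.Stats.cAllocatedPages, Req.Stats.cMaxPages, Req.Stats.cSharedPages));
 * @endcode
 */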
5707
5708
5709/**
5710 * Resets the specified GMM statistics.
5711 *
5712 * @returns VBox status code.
5713 *
5714 * @param pStats Which statistics to reset; non-zero fields indicate
5715 * the ones to reset.
5716 * @param pSession The current session.
5717 * @param pGVM The GVM to reset statistics for. Optional.
5718 */
5719GMMR0DECL(int) GMMR0ResetStatistics(PCGMMSTATS pStats, PSUPDRVSESSION pSession, PGVM pGVM)
5720{
5721 NOREF(pStats); NOREF(pSession); NOREF(pGVM);
5722 /* Nothing to reset at the moment. */
5723 return VINF_SUCCESS;
5724}
5725
5726
5727/**
5728 * VMMR0 request wrapper for GMMR0ResetStatistics.
5729 *
5730 * @returns see GMMR0ResetStatistics.
5731 * @param pGVM The global (ring-0) VM structure. Optional.
5732 * @param pReq Pointer to the request packet.
5733 */
5734GMMR0DECL(int) GMMR0ResetStatisticsReq(PGVM pGVM, PGMMRESETSTATISTICSSREQ pReq)
5735{
5736 /*
5737 * Validate input and pass it on.
5738 */
5739 AssertPtrReturn(pReq, VERR_INVALID_POINTER);
5740 AssertMsgReturn(pReq->Hdr.cbReq == sizeof(*pReq), ("%#x != %#x\n", pReq->Hdr.cbReq, sizeof(*pReq)), VERR_INVALID_PARAMETER);
5741
5742 return GMMR0ResetStatistics(&pReq->Stats, pReq->pSession, pGVM);
5743}
5744