VirtualBox

source: vbox/trunk/src/VBox/VMM/VMMR0/GMMR0.cpp@92339

Last change on this file since 92339 was 92339, checked in by vboxsync, 3 years ago

VMM/GMM: Optimized GMMR0AllocateLargePage a little by making gmmR0RegisterChunk mark the chunk allocated, eliminating 512 calls to gmmR0AllocatePage. bugref:10093

1/* $Id: GMMR0.cpp 92339 2021-11-10 21:21:20Z vboxsync $ */
2/** @file
3 * GMM - Global Memory Manager.
4 */
5
6/*
7 * Copyright (C) 2007-2020 Oracle Corporation
8 *
9 * This file is part of VirtualBox Open Source Edition (OSE), as
10 * available from http://www.virtualbox.org. This file is free software;
11 * you can redistribute it and/or modify it under the terms of the GNU
12 * General Public License (GPL) as published by the Free Software
13 * Foundation, in version 2 as it comes in the "COPYING" file of the
14 * VirtualBox OSE distribution. VirtualBox OSE is distributed in the
15 * hope that it will be useful, but WITHOUT ANY WARRANTY of any kind.
16 */
17
18
19/** @page pg_gmm GMM - The Global Memory Manager
20 *
21 * As the name indicates, this component is responsible for global memory
22 * management. Currently only guest RAM is allocated from the GMM, but this
23 * may change to include shadow page tables and other bits later.
24 *
25 * Guest RAM is managed as individual pages, but allocated from the host OS
26 * in chunks for reasons of portability / efficiency. To minimize the memory
27 * footprint all tracking structures must be as small as possible without
28 * unnecessary performance penalties.
29 *
30 * The allocation chunks have a fixed size, defined at compile time
31 * by the #GMM_CHUNK_SIZE \#define.
32 *
33 * Each chunk is given a unique ID. Each page also has a unique ID. The
34 * relationship between the two IDs is:
35 * @code
36 * GMM_CHUNK_SHIFT = log2(GMM_CHUNK_SIZE / PAGE_SIZE);
37 * idPage = (idChunk << GMM_CHUNK_SHIFT) | iPage;
38 * @endcode
39 * Where iPage is the index of the page within the chunk. This ID scheme
40 * permits efficient chunk and page lookup, but it relies on the chunk size
41 * being set at compile time. The chunks are organized in an AVL tree with their
42 * IDs being the keys.
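 *
 * For illustration (a sketch only, not the exact helpers used further down),
 * the reverse lookup is simply the inverse split:
 * @code
 * idChunk = idPage >> GMM_CHUNK_SHIFT;
 * iPage   = idPage & (GMM_CHUNK_NUM_PAGES - 1);
 * pChunk  = (PGMMCHUNK)RTAvlU32Get(&pGMM->pChunks, idChunk); // AVL keyed on chunk IDs
 * pPage   = &pChunk->aPages[iPage];
 * @endcode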
43 *
44 * The physical address of each page in an allocation chunk is maintained by
45 * the #RTR0MEMOBJ and obtained using #RTR0MemObjGetPagePhysAddr. There is no
46 * need to duplicate this information (it would cost 8 bytes per page if we did).
47 *
48 * So what do we need to track per page? Most importantly we need to know
49 * which state the page is in:
50 * - Private - Allocated for (eventually) backing one particular VM page.
51 * - Shared - Read-only page that is used by one or more VMs and treated
52 * as COW by PGM.
53 * - Free - Not used by anyone.
54 *
55 * For the page replacement operations (sharing, defragmenting and freeing)
56 * to be somewhat efficient, private pages need to be associated with a
57 * particular page in a particular VM.
58 *
59 * Tracking the usage of shared pages is impractical and expensive, so we'll
60 * settle for a reference counting system instead.
61 *
62 * Free pages will be chained on LIFOs.
63 *
64 * On 64-bit systems we will use a 64-bit bitfield per page, while on 32-bit
65 * systems a 32-bit bitfield will have to suffice because of address space
66 * limitations. The #GMMPAGE structure shows the details.
67 *
68 *
69 * @section sec_gmm_alloc_strat Page Allocation Strategy
70 *
71 * The strategy for allocating pages has to take fragmentation and shared
72 * pages into account, or we may end up with 2000 chunks with only
73 * a few pages in each. Shared pages cannot easily be reallocated because
74 * of the inaccurate usage accounting (see above). Private pages can be
75 * reallocated by a defragmentation thread in the same manner that sharing
76 * is done.
77 *
78 * The first approach is to manage the free pages in two sets depending on
79 * whether they are mainly for the allocation of shared or private pages.
80 * In the initial implementation there will be almost no possibility for
81 * mixing shared and private pages in the same chunk (only if we're really
82 * stressed on memory), but when we implement forking of VMs and have to
83 * deal with lots of COW pages it'll start getting kind of interesting.
84 *
85 * The sets are lists of chunks with approximately the same number of
86 * free pages. Say the chunk size is 1MB, meaning 256 pages, and a set
87 * consists of 16 lists. So, the first list will contain the chunks with
88 * 1-7 free pages, the second covers 8-15, and so on. The chunks will be
89 * moved between the lists as pages are freed up or allocated.
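 *
 * A sketch of the bucketing described above (the variable names here are
 * illustrative only; the actual list-selection helper further down in this
 * file may use different bucket boundaries):
 * @code
 * iList = cFreePagesInChunk >> 3;     // 1-7 -> 0, 8-15 -> 1, ...
 * if (iList >= cLists)
 *     iList = cLists - 1;             // clamp everything else to the last list
 * @endcode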
90 *
91 *
92 * @section sec_gmm_costs Costs
93 *
94 * The per page cost in kernel space is 32 bits plus whatever RTR0MEMOBJ
95 * entails. In addition there is the chunk cost of approximately
96 * (sizeof(RTR0MEMOBJ) + sizeof(CHUNK)) / 2^CHUNK_SHIFT bytes per page.
97 *
98 * On Windows the per page #RTR0MEMOBJ cost is 32 bits on 32-bit Windows
99 * and 64 bits on 64-bit Windows (a PFN_NUMBER in the MDL). So, 64 bits per page.
100 * The cost on Linux is identical, but here it's because of sizeof(struct page *).
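 *
 * As a rough rule of thumb (assuming the current 2MB chunk size and 4KB
 * pages, i.e. 512 pages per chunk): every 512 bytes of per-chunk bookkeeping
 * adds just one extra byte to the per page cost.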
101 *
102 *
103 * @section sec_gmm_legacy Legacy Mode for Non-Tier-1 Platforms
104 *
105 * In legacy mode the page source is locked user pages and not
106 * #RTR0MemObjAllocPhysNC; this means that a page can only be allocated
107 * by the VM that locked it. We will make no attempt at implementing
108 * page sharing on these systems, just do enough to make it all work.
109 *
110 * @note With 6.1 really dropping 32-bit support, the legacy mode is obsoleted
111 * under the assumption that there is sufficient kernel virtual address
112 * space to map all of the guest memory allocations. So, we'll be using
113 * #RTR0MemObjAllocPage on some platforms as an alternative to
114 * #RTR0MemObjAllocPhysNC.
115 *
116 *
117 * @subsection sub_gmm_locking Serializing
118 *
119 * One simple fast mutex will be employed in the initial implementation, not
120 * two as mentioned in @ref sec_pgmPhys_Serializing.
121 *
122 * @see @ref sec_pgmPhys_Serializing
123 *
124 *
125 * @section sec_gmm_overcommit Memory Over-Commitment Management
126 *
127 * The GVM will have to do the system wide memory over-commitment
128 * management. My current ideas are:
129 * - Per VM over-commit policy that indicates how much to initially commit
130 * to it and what to do in an out-of-memory situation.
131 * - Prevent overtaxing the host.
132 *
133 * There are some challenges here; the main ones are configurability and
134 * security. Should we for instance permit anyone to request 100% memory
135 * commitment? Who should be allowed to do runtime adjustments of the
136 * config? And how do we prevent these settings from being lost when the last
137 * VM process exits? The solution is probably to have an optional root
138 * daemon that will keep VMMR0.r0 in memory and enable the security measures.
139 *
140 *
141 *
142 * @section sec_gmm_numa NUMA
143 *
144 * NUMA considerations will be designed and implemented a bit later.
145 *
146 * The preliminary guess is that we will have to try to allocate memory as
147 * close as possible to the CPUs the VM is executed on (EMT and additional CPU
148 * threads), which means it's mostly about allocation and sharing policies.
149 * Both the scheduler and the allocator interface will have to supply some NUMA
150 * info and we'll need a way to calculate access costs.
151 *
152 */
153
154
155/*********************************************************************************************************************************
156* Header Files *
157*********************************************************************************************************************************/
158#define LOG_GROUP LOG_GROUP_GMM
159#include <VBox/rawpci.h>
160#include <VBox/vmm/gmm.h>
161#include "GMMR0Internal.h"
162#include <VBox/vmm/vmcc.h>
163#include <VBox/vmm/pgm.h>
164#include <VBox/log.h>
165#include <VBox/param.h>
166#include <VBox/err.h>
167#include <VBox/VMMDev.h>
168#include <iprt/asm.h>
169#include <iprt/avl.h>
170#ifdef VBOX_STRICT
171# include <iprt/crc.h>
172#endif
173#include <iprt/critsect.h>
174#include <iprt/list.h>
175#include <iprt/mem.h>
176#include <iprt/memobj.h>
177#include <iprt/mp.h>
178#include <iprt/semaphore.h>
179#include <iprt/spinlock.h>
180#include <iprt/string.h>
181#include <iprt/time.h>
182
183/* This is 64-bit only code now. */
184#if HC_ARCH_BITS != 64 || ARCH_BITS != 64
185# error "This is 64-bit only code"
186#endif
187
188
189/*********************************************************************************************************************************
190* Defined Constants And Macros *
191*********************************************************************************************************************************/
192/** @def VBOX_USE_CRIT_SECT_FOR_GIANT
193 * Use a critical section instead of a fast mutex for the giant GMM lock.
194 *
195 * @remarks This is primarily a way of avoiding the deadlock checks in the
196 * Windows driver verifier. */
197#if defined(RT_OS_WINDOWS) || defined(RT_OS_DARWIN) || defined(DOXYGEN_RUNNING)
198# define VBOX_USE_CRIT_SECT_FOR_GIANT
199#endif
200
201
202/*********************************************************************************************************************************
203* Structures and Typedefs *
204*********************************************************************************************************************************/
205/** Pointer to set of free chunks. */
206typedef struct GMMCHUNKFREESET *PGMMCHUNKFREESET;
207
208/**
209 * The per-page tracking structure employed by the GMM.
210 *
211 * Because of the different layout on 32-bit and 64-bit hosts in earlier
212 * versions of the code, macros are used to get and set some of the data.
213 */
214typedef union GMMPAGE
215{
216 /** Unsigned integer view. */
217 uint64_t u;
218
219 /** The common view. */
220 struct GMMPAGECOMMON
221 {
222 uint32_t uStuff1 : 32;
223 uint32_t uStuff2 : 30;
224 /** The page state. */
225 uint32_t u2State : 2;
226 } Common;
227
228 /** The view of a private page. */
229 struct GMMPAGEPRIVATE
230 {
231 /** The guest page frame number. (Max addressable: 2 ^ 44 - 16) */
232 uint32_t pfn;
233 /** The GVM handle. (64K VMs) */
234 uint32_t hGVM : 16;
235 /** Reserved. */
236 uint32_t u16Reserved : 14;
237 /** The page state. */
238 uint32_t u2State : 2;
239 } Private;
240
241 /** The view of a shared page. */
242 struct GMMPAGESHARED
243 {
244 /** The host page frame number. (Max addressable: 2 ^ 44 - 16) */
245 uint32_t pfn;
246 /** The reference count (64K VMs). */
247 uint32_t cRefs : 16;
248 /** Used for debug checksumming. */
249 uint32_t u14Checksum : 14;
250 /** The page state. */
251 uint32_t u2State : 2;
252 } Shared;
253
254 /** The view of a free page. */
255 struct GMMPAGEFREE
256 {
257 /** The index of the next page in the free list. UINT16_MAX is NIL. */
258 uint16_t iNext;
259 /** Reserved. Checksum or something? */
260 uint16_t u16Reserved0;
261 /** Reserved. Checksum or something? */
262 uint32_t u30Reserved1 : 29;
263 /** Set if the page was zeroed. */
264 uint32_t fZeroed : 1;
265 /** The page state. */
266 uint32_t u2State : 2;
267 } Free;
268} GMMPAGE;
269AssertCompileSize(GMMPAGE, sizeof(RTHCUINTPTR));
270/** Pointer to a GMMPAGE. */
271typedef GMMPAGE *PGMMPAGE;
272
273
274/** @name The Page States.
275 * @{ */
276/** A private page. */
277#define GMM_PAGE_STATE_PRIVATE 0
278/** A shared page. */
279#define GMM_PAGE_STATE_SHARED 2
280/** A free page. */
281#define GMM_PAGE_STATE_FREE 3
282/** @} */
283
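/*
 * Illustration of how the GMMPAGE views are used (this mirrors the VM cleanup
 * code later in this file): turning a private page back into a free one and
 * pushing it onto the chunk's free LIFO.
 *
 *     pPage->u            = 0;                    // wipe the private view
 *     pPage->Free.u2State = GMM_PAGE_STATE_FREE;
 *     pPage->Free.fZeroed = false;
 *     pPage->Free.iNext   = pChunk->iFreeHead;    // link into the LIFO
 *     pChunk->iFreeHead   = iPage;
 */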
284
285/** @def GMM_PAGE_IS_PRIVATE
286 *
287 * @returns true if private, false if not.
288 * @param pPage The GMM page.
289 */
290#define GMM_PAGE_IS_PRIVATE(pPage) ( (pPage)->Common.u2State == GMM_PAGE_STATE_PRIVATE )
291
292/** @def GMM_PAGE_IS_SHARED
293 *
294 * @returns true if shared, false if not.
295 * @param pPage The GMM page.
296 */
297#define GMM_PAGE_IS_SHARED(pPage) ( (pPage)->Common.u2State == GMM_PAGE_STATE_SHARED )
298
299/** @def GMM_PAGE_IS_FREE
300 *
301 * @returns true if free, false if not.
302 * @param pPage The GMM page.
303 */
304#define GMM_PAGE_IS_FREE(pPage) ( (pPage)->Common.u2State == GMM_PAGE_STATE_FREE )
305
306/** @def GMM_PAGE_PFN_LAST
307 * The last valid guest pfn range.
308 * @remark Some of the values outside the range have special meaning,
309 * see GMM_PAGE_PFN_UNSHAREABLE.
310 */
311#define GMM_PAGE_PFN_LAST UINT32_C(0xfffffff0)
312AssertCompile(GMM_PAGE_PFN_LAST == (GMM_GCPHYS_LAST >> PAGE_SHIFT));
313
314/** @def GMM_PAGE_PFN_UNSHAREABLE
315 * Indicates that this page isn't used for normal guest memory and thus isn't shareable.
316 */
317#define GMM_PAGE_PFN_UNSHAREABLE UINT32_C(0xfffffff1)
318AssertCompile(GMM_PAGE_PFN_UNSHAREABLE == (GMM_GCPHYS_UNSHAREABLE >> PAGE_SHIFT));
319
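/*
 * Encoding sketch (illustrative only; the allocation code later in the file
 * covers more cases than shown here): storing the guest PFN, or the
 * unshareable marker, in a private page entry.
 *
 *     pPage->Private.pfn = GCPhys != GMM_GCPHYS_UNSHAREABLE
 *                        ? (uint32_t)(GCPhys >> PAGE_SHIFT)   // regular guest RAM page
 *                        : GMM_PAGE_PFN_UNSHAREABLE;          // not normal guest memory
 */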
320
321/**
322 * A GMM allocation chunk ring-3 mapping record.
323 *
324 * This should really be associated with a session and not a VM, but
325 * it's simpler to associate it with a VM and clean up when the VM object
326 * is destroyed.
327 */
328typedef struct GMMCHUNKMAP
329{
330 /** The mapping object. */
331 RTR0MEMOBJ hMapObj;
332 /** The VM owning the mapping. */
333 PGVM pGVM;
334} GMMCHUNKMAP;
335/** Pointer to a GMM allocation chunk mapping. */
336typedef struct GMMCHUNKMAP *PGMMCHUNKMAP;
337
338
339/**
340 * A GMM allocation chunk.
341 */
342typedef struct GMMCHUNK
343{
344 /** The AVL node core.
345 * The Key is the chunk ID. (Giant mtx.) */
346 AVLU32NODECORE Core;
347 /** The memory object.
348 * Either from RTR0MemObjAllocPhysNC or RTR0MemObjLockUser depending on
349 * what the host can dish up. (Chunk mtx protects mapping accesses
350 * and related frees.) */
351 RTR0MEMOBJ hMemObj;
352#ifndef VBOX_WITH_LINEAR_HOST_PHYS_MEM
353 /** Pointer to the kernel mapping. */
354 uint8_t *pbMapping;
355#endif
356 /** Pointer to the next chunk in the free list. (Giant mtx.) */
357 PGMMCHUNK pFreeNext;
358 /** Pointer to the previous chunk in the free list. (Giant mtx.) */
359 PGMMCHUNK pFreePrev;
360 /** Pointer to the free set this chunk belongs to. NULL for
361 * chunks with no free pages. (Giant mtx.) */
362 PGMMCHUNKFREESET pSet;
363 /** List node in the chunk list (GMM::ChunkList). (Giant mtx.) */
364 RTLISTNODE ListNode;
365 /** Pointer to an array of mappings. (Chunk mtx.) */
366 PGMMCHUNKMAP paMappingsX;
367 /** The number of mappings. (Chunk mtx.) */
368 uint16_t cMappingsX;
369 /** The mapping lock this chunk is using. UINT8_MAX if nobody is mapping
370 * or freeing anything. (Giant mtx.) */
371 uint8_t volatile iChunkMtx;
372 /** GMM_CHUNK_FLAGS_XXX. (Giant mtx.) */
373 uint8_t fFlags;
374 /** The head of the list of free pages. UINT16_MAX is the NIL value.
375 * (Giant mtx.) */
376 uint16_t iFreeHead;
377 /** The number of free pages. (Giant mtx.) */
378 uint16_t cFree;
379 /** The GVM handle of the VM that first allocated pages from this chunk, this
380 * is used as a preference when there are several chunks to choose from.
381 * When in bound memory mode this isn't a preference any longer. (Giant
382 * mtx.) */
383 uint16_t hGVM;
384 /** The ID of the NUMA node the memory mostly resides on. (Reserved for
385 * future use.) (Giant mtx.) */
386 uint16_t idNumaNode;
387 /** The number of private pages. (Giant mtx.) */
388 uint16_t cPrivate;
389 /** The number of shared pages. (Giant mtx.) */
390 uint16_t cShared;
391 /** The UID this chunk is associated with. */
392 RTUID uidOwner;
393 uint32_t u32Padding;
394 /** The pages. (Giant mtx.) */
395 GMMPAGE aPages[GMM_CHUNK_SIZE >> PAGE_SHIFT];
396} GMMCHUNK;
397
398/** Indicates that the NUMA properties of the memory are unknown. */
399#define GMM_CHUNK_NUMA_ID_UNKNOWN UINT16_C(0xfffe)
400
401/** @name GMM_CHUNK_FLAGS_XXX - chunk flags.
402 * @{ */
403/** Indicates that the chunk is a large page (2MB). */
404#define GMM_CHUNK_FLAGS_LARGE_PAGE UINT16_C(0x0001)
405/** @} */
406
407
408/**
409 * An allocation chunk TLB entry.
410 */
411typedef struct GMMCHUNKTLBE
412{
413 /** The chunk id. */
414 uint32_t idChunk;
415 /** Pointer to the chunk. */
416 PGMMCHUNK pChunk;
417} GMMCHUNKTLBE;
418/** Pointer to an allocation chunk TLB entry. */
419typedef GMMCHUNKTLBE *PGMMCHUNKTLBE;
420
421
422/** The number of entries in the allocation chunk TLB. */
423#define GMM_CHUNKTLB_ENTRIES 32
424/** Gets the TLB entry index for the given Chunk ID. */
425#define GMM_CHUNKTLB_IDX(idChunk) ( (idChunk) & (GMM_CHUNKTLB_ENTRIES - 1) )
426
427/**
428 * An allocation chunk TLB.
429 */
430typedef struct GMMCHUNKTLB
431{
432 /** The TLB entries. */
433 GMMCHUNKTLBE aEntries[GMM_CHUNKTLB_ENTRIES];
434} GMMCHUNKTLB;
435/** Pointer to an allocation chunk TLB. */
436typedef GMMCHUNKTLB *PGMMCHUNKTLB;
437
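/*
 * Intended TLB usage, as a sketch (the real lookup code elsewhere in this
 * file also takes hSpinLockTree and refreshes the entry on a miss):
 *
 *     PGMMCHUNKTLBE pTlbe  = &pGMM->ChunkTLB.aEntries[GMM_CHUNKTLB_IDX(idChunk)];
 *     PGMMCHUNK     pChunk = pTlbe->idChunk == idChunk
 *                          ? pTlbe->pChunk                                    // TLB hit
 *                          : (PGMMCHUNK)RTAvlU32Get(&pGMM->pChunks, idChunk); // miss -> AVL tree
 */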
438
439/**
440 * The GMM instance data.
441 */
442typedef struct GMM
443{
444 /** Magic / eye catcher. GMM_MAGIC */
445 uint32_t u32Magic;
446 /** The number of threads waiting on the mutex. */
447 uint32_t cMtxContenders;
448#ifdef VBOX_USE_CRIT_SECT_FOR_GIANT
449 /** The critical section protecting the GMM.
450 * More fine grained locking can be implemented later if necessary. */
451 RTCRITSECT GiantCritSect;
452#else
453 /** The fast mutex protecting the GMM.
454 * More fine grained locking can be implemented later if necessary. */
455 RTSEMFASTMUTEX hMtx;
456#endif
457#ifdef VBOX_STRICT
458 /** The current mutex owner. */
459 RTNATIVETHREAD hMtxOwner;
460#endif
461 /** Spinlock protecting the AVL tree.
462 * @todo Make this a read-write spinlock as we should allow concurrent
463 * lookups. */
464 RTSPINLOCK hSpinLockTree;
465 /** The chunk tree.
466 * Protected by hSpinLockTree. */
467 PAVLU32NODECORE pChunks;
468 /** Chunk freeing generation - incremented whenever a chunk is freed. Used
469 * for validating the per-VM chunk TLB entries. Valid range is 1 to 2^62
470 * (exclusive), though higher numbers may temporarily occur while
471 * invalidating the individual TLBs during wrap-around processing. */
472 uint64_t volatile idFreeGeneration;
473 /** The chunk TLB.
474 * Protected by hSpinLockTree. */
475 GMMCHUNKTLB ChunkTLB;
476 /** The private free set. */
477 GMMCHUNKFREESET PrivateX;
478 /** The shared free set. */
479 GMMCHUNKFREESET Shared;
480
481 /** Shared module tree (global).
482 * @todo separate trees for distinctly different guest OSes. */
483 PAVLLU32NODECORE pGlobalSharedModuleTree;
484 /** Sharable modules (count of nodes in pGlobalSharedModuleTree). */
485 uint32_t cShareableModules;
486
487 /** The chunk list. For simplifying the cleanup process and avoiding tree
488 * traversal. */
489 RTLISTANCHOR ChunkList;
490
491 /** The maximum number of pages we're allowed to allocate.
492 * @gcfgm{GMM/MaxPages,64-bit, Direct.}
493 * @gcfgm{GMM/PctPages,32-bit, Relative to the number of host pages.} */
494 uint64_t cMaxPages;
495 /** The number of pages that have been reserved.
496 * The deal is that cReservedPages - cOverCommittedPages <= cMaxPages. */
497 uint64_t cReservedPages;
498 /** The number of pages that we have over-committed in reservations. */
499 uint64_t cOverCommittedPages;
500 /** The number of actually allocated (committed if you like) pages. */
501 uint64_t cAllocatedPages;
502 /** The number of pages that are shared. A subset of cAllocatedPages. */
503 uint64_t cSharedPages;
504 /** The number of pages that are actually shared between VMs. */
505 uint64_t cDuplicatePages;
506 /** The number of pages that are shared that have been left behind by
507 * VMs not doing proper cleanups. */
508 uint64_t cLeftBehindSharedPages;
509 /** The number of allocation chunks.
510 * (The number of pages we've allocated from the host can be derived from this.) */
511 uint32_t cChunks;
512 /** The number of current ballooned pages. */
513 uint64_t cBalloonedPages;
514
515#ifdef VBOX_WITH_LINEAR_HOST_PHYS_MEM
516 /** Whether #RTR0MemObjAllocPhysNC works. */
517 bool fHasWorkingAllocPhysNC;
518#else
519 bool fPadding;
520#endif
521 /** The bound memory mode indicator.
522 * When set, the memory will be bound to a specific VM and never
523 * shared. This is always set if fLegacyAllocationMode is set.
524 * (Also determined at initialization time.) */
525 bool fBoundMemoryMode;
526 /** The number of registered VMs. */
527 uint16_t cRegisteredVMs;
528
529 /** The index of the next mutex to use. */
530 uint32_t iNextChunkMtx;
531 /** Chunk locks for reducing lock contention without having to allocate
532 * one lock per chunk. */
533 struct
534 {
535 /** The mutex */
536 RTSEMFASTMUTEX hMtx;
537 /** The number of threads currently using this mutex. */
538 uint32_t volatile cUsers;
539 } aChunkMtx[64];
540
541 /** The number of freed chunks ever. This is used as list generation to
542 * avoid restarting the cleanup scanning when the list wasn't modified. */
543 uint32_t volatile cFreedChunks;
544 /** The previously allocated Chunk ID.
545 * Used as a hint to avoid scanning the whole bitmap. */
546 uint32_t idChunkPrev;
547 /** Chunk ID allocation bitmap.
548 * Bits of allocated IDs are set, free ones are clear.
549 * The NIL id (0) is marked allocated. */
550 uint32_t bmChunkId[(GMM_CHUNKID_LAST + 1 + 31) / 32];
551} GMM;
552/** Pointer to the GMM instance. */
553typedef GMM *PGMM;
554
555/** The value of GMM::u32Magic (Katsuhiro Otomo). */
556#define GMM_MAGIC UINT32_C(0x19540414)
557
558
559/**
560 * GMM chunk mutex state.
561 *
562 * This is returned by gmmR0ChunkMutexAcquire and is used by the other
563 * gmmR0ChunkMutex* methods.
564 */
565typedef struct GMMR0CHUNKMTXSTATE
566{
567 PGMM pGMM;
568 /** The index of the chunk mutex. */
569 uint8_t iChunkMtx;
570 /** The relevant flags (GMMR0CHUNK_MTX_XXX). */
571 uint8_t fFlags;
572} GMMR0CHUNKMTXSTATE;
573/** Pointer to a chunk mutex state. */
574typedef GMMR0CHUNKMTXSTATE *PGMMR0CHUNKMTXSTATE;
575
576/** @name GMMR0CHUNK_MTX_XXX
577 * @{ */
578#define GMMR0CHUNK_MTX_INVALID UINT32_C(0)
579#define GMMR0CHUNK_MTX_KEEP_GIANT UINT32_C(1)
580#define GMMR0CHUNK_MTX_RETAKE_GIANT UINT32_C(2)
581#define GMMR0CHUNK_MTX_DROP_GIANT UINT32_C(3)
582#define GMMR0CHUNK_MTX_END UINT32_C(4)
583/** @} */
584
585
586/** The maximum number of shared modules per-vm. */
587#define GMM_MAX_SHARED_PER_VM_MODULES 2048
588/** The maximum number of shared modules GMM is allowed to track. */
589#define GMM_MAX_SHARED_GLOBAL_MODULES 16834
590
591
592/**
593 * Argument packet for gmmR0SharedModuleCleanup.
594 */
595typedef struct GMMR0SHMODPERVMDTORARGS
596{
597 PGVM pGVM;
598 PGMM pGMM;
599} GMMR0SHMODPERVMDTORARGS;
600
601/**
602 * Argument packet for gmmR0CheckSharedModule.
603 */
604typedef struct GMMCHECKSHAREDMODULEINFO
605{
606 PGVM pGVM;
607 VMCPUID idCpu;
608} GMMCHECKSHAREDMODULEINFO;
609
610
611/*********************************************************************************************************************************
612* Global Variables *
613*********************************************************************************************************************************/
614/** Pointer to the GMM instance data. */
615static PGMM g_pGMM = NULL;
616
617/** Macro for obtaining and validating the g_pGMM pointer.
618 *
619 * On failure it will return from the invoking function with the specified
620 * return value.
621 *
622 * @param pGMM The name of the pGMM variable.
623 * @param rc The return value on failure. Use VERR_GMM_INSTANCE for VBox
624 * status codes.
625 */
626#define GMM_GET_VALID_INSTANCE(pGMM, rc) \
627 do { \
628 (pGMM) = g_pGMM; \
629 AssertPtrReturn((pGMM), (rc)); \
630 AssertMsgReturn((pGMM)->u32Magic == GMM_MAGIC, ("%p - %#x\n", (pGMM), (pGMM)->u32Magic), (rc)); \
631 } while (0)
632
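/*
 * Typical usage sketch (GMMR0SomeOperation is a hypothetical name, for
 * illustration only; the real entry points below follow the same pattern):
 *
 *     GMMR0DECL(int) GMMR0SomeOperation(PGVM pGVM)
 *     {
 *         PGMM pGMM;
 *         GMM_GET_VALID_INSTANCE(pGMM, VERR_GMM_INSTANCE); // bails out here on a bad instance
 *         // ... do the actual work, typically under gmmR0MutexAcquire/gmmR0MutexRelease ...
 *         return VINF_SUCCESS;
 *     }
 */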
633/** Macro for obtaining and validating the g_pGMM pointer, void function
634 * variant.
635 *
636 * On failure it will return from the invoking function.
637 *
638 * @param pGMM The name of the pGMM variable.
639 */
640#define GMM_GET_VALID_INSTANCE_VOID(pGMM) \
641 do { \
642 (pGMM) = g_pGMM; \
643 AssertPtrReturnVoid((pGMM)); \
644 AssertMsgReturnVoid((pGMM)->u32Magic == GMM_MAGIC, ("%p - %#x\n", (pGMM), (pGMM)->u32Magic)); \
645 } while (0)
646
647
648/** @def GMM_CHECK_SANITY_UPON_ENTERING
649 * Checks the sanity of the GMM instance data before making changes.
650 *
651 * This macro is a stub by default and must be enabled manually in the code.
652 *
653 * @returns true if sane, false if not.
654 * @param pGMM The name of the pGMM variable.
655 */
656#if defined(VBOX_STRICT) && defined(GMMR0_WITH_SANITY_CHECK) && 0
657# define GMM_CHECK_SANITY_UPON_ENTERING(pGMM) (RT_LIKELY(gmmR0SanityCheck((pGMM), __PRETTY_FUNCTION__, __LINE__) == 0))
658#else
659# define GMM_CHECK_SANITY_UPON_ENTERING(pGMM) (true)
660#endif
661
662/** @def GMM_CHECK_SANITY_UPON_LEAVING
663 * Checks the sanity of the GMM instance data after making changes.
664 *
665 * This macro is a stub by default and must be enabled manually in the code.
666 *
667 * @returns true if sane, false if not.
668 * @param pGMM The name of the pGMM variable.
669 */
670#if defined(VBOX_STRICT) && defined(GMMR0_WITH_SANITY_CHECK) && 0
671# define GMM_CHECK_SANITY_UPON_LEAVING(pGMM) (gmmR0SanityCheck((pGMM), __PRETTY_FUNCTION__, __LINE__) == 0)
672#else
673# define GMM_CHECK_SANITY_UPON_LEAVING(pGMM) (true)
674#endif
675
676/** @def GMM_CHECK_SANITY_IN_LOOPS
677 * Checks the sanity of the GMM instance in the allocation loops.
678 *
679 * This macro is a stub by default and must be enabled manually in the code.
680 *
681 * @returns true if sane, false if not.
682 * @param pGMM The name of the pGMM variable.
683 */
684#if defined(VBOX_STRICT) && defined(GMMR0_WITH_SANITY_CHECK) && 0
685# define GMM_CHECK_SANITY_IN_LOOPS(pGMM) (gmmR0SanityCheck((pGMM), __PRETTY_FUNCTION__, __LINE__) == 0)
686#else
687# define GMM_CHECK_SANITY_IN_LOOPS(pGMM) (true)
688#endif
689
690
691/*********************************************************************************************************************************
692* Internal Functions *
693*********************************************************************************************************************************/
694static DECLCALLBACK(int) gmmR0TermDestroyChunk(PAVLU32NODECORE pNode, void *pvGMM);
695static bool gmmR0CleanupVMScanChunk(PGMM pGMM, PGVM pGVM, PGMMCHUNK pChunk);
696DECLINLINE(void) gmmR0UnlinkChunk(PGMMCHUNK pChunk);
697DECLINLINE(void) gmmR0LinkChunk(PGMMCHUNK pChunk, PGMMCHUNKFREESET pSet);
698DECLINLINE(void) gmmR0SelectSetAndLinkChunk(PGMM pGMM, PGVM pGVM, PGMMCHUNK pChunk);
699#ifdef GMMR0_WITH_SANITY_CHECK
700static uint32_t gmmR0SanityCheck(PGMM pGMM, const char *pszFunction, unsigned uLineNo);
701#endif
702static bool gmmR0FreeChunk(PGMM pGMM, PGVM pGVM, PGMMCHUNK pChunk, bool fRelaxedSem);
703DECLINLINE(void) gmmR0FreePrivatePage(PGMM pGMM, PGVM pGVM, uint32_t idPage, PGMMPAGE pPage);
704DECLINLINE(void) gmmR0FreeSharedPage(PGMM pGMM, PGVM pGVM, uint32_t idPage, PGMMPAGE pPage);
705static int gmmR0UnmapChunkLocked(PGMM pGMM, PGVM pGVM, PGMMCHUNK pChunk);
706#ifdef VBOX_WITH_PAGE_SHARING
707static void gmmR0SharedModuleCleanup(PGMM pGMM, PGVM pGVM);
708# ifdef VBOX_STRICT
709static uint32_t gmmR0StrictPageChecksum(PGMM pGMM, PGVM pGVM, uint32_t idPage);
710# endif
711#endif
712
713
714
715/**
716 * Initializes the GMM component.
717 *
718 * This is called when the VMMR0.r0 module is loaded and protected by the
719 * loader semaphore.
720 *
721 * @returns VBox status code.
722 */
723GMMR0DECL(int) GMMR0Init(void)
724{
725 LogFlow(("GMMInit:\n"));
726
727 /*
728 * Allocate the instance data and the locks.
729 */
730 PGMM pGMM = (PGMM)RTMemAllocZ(sizeof(*pGMM));
731 if (!pGMM)
732 return VERR_NO_MEMORY;
733
734 pGMM->u32Magic = GMM_MAGIC;
735 for (unsigned i = 0; i < RT_ELEMENTS(pGMM->ChunkTLB.aEntries); i++)
736 pGMM->ChunkTLB.aEntries[i].idChunk = NIL_GMM_CHUNKID;
737 RTListInit(&pGMM->ChunkList);
738 ASMBitSet(&pGMM->bmChunkId[0], NIL_GMM_CHUNKID);
739
740#ifdef VBOX_USE_CRIT_SECT_FOR_GIANT
741 int rc = RTCritSectInit(&pGMM->GiantCritSect);
742#else
743 int rc = RTSemFastMutexCreate(&pGMM->hMtx);
744#endif
745 if (RT_SUCCESS(rc))
746 {
747 unsigned iMtx;
748 for (iMtx = 0; iMtx < RT_ELEMENTS(pGMM->aChunkMtx); iMtx++)
749 {
750 rc = RTSemFastMutexCreate(&pGMM->aChunkMtx[iMtx].hMtx);
751 if (RT_FAILURE(rc))
752 break;
753 }
754 pGMM->hSpinLockTree = NIL_RTSPINLOCK;
755 if (RT_SUCCESS(rc))
756 rc = RTSpinlockCreate(&pGMM->hSpinLockTree, RTSPINLOCK_FLAGS_INTERRUPT_SAFE, "gmm-chunk-tree");
757 if (RT_SUCCESS(rc))
758 {
759 /*
760 * Figure out how we're going to allocate stuff (only applicable to
761 * host with linear physical memory mappings).
762 */
763 pGMM->fBoundMemoryMode = false;
764#ifdef VBOX_WITH_LINEAR_HOST_PHYS_MEM
765 pGMM->fHasWorkingAllocPhysNC = false;
766
767 RTR0MEMOBJ hMemObj;
768 rc = RTR0MemObjAllocPhysNC(&hMemObj, GMM_CHUNK_SIZE, NIL_RTHCPHYS);
769 if (RT_SUCCESS(rc))
770 {
771 rc = RTR0MemObjFree(hMemObj, true);
772 AssertRC(rc);
773 pGMM->fHasWorkingAllocPhysNC = true;
774 }
775 else if (rc != VERR_NOT_SUPPORTED)
776 SUPR0Printf("GMMR0Init: Warning! RTR0MemObjAllocPhysNC(, %u, NIL_RTHCPHYS) -> %d!\n", GMM_CHUNK_SIZE, rc);
777# endif
778
779 /*
780 * Query system page count and guess a reasonable cMaxPages value.
781 */
782 pGMM->cMaxPages = UINT32_MAX; /** @todo IPRT function for query ram size and such. */
783
784 /*
785 * The idFreeGeneration value should be set so we actually trigger the
786 * wrap-around invalidation handling during a typical test run.
787 */
788 pGMM->idFreeGeneration = UINT64_MAX / 4 - 128;
789
790 g_pGMM = pGMM;
791#ifdef VBOX_WITH_LINEAR_HOST_PHYS_MEM
792 LogFlow(("GMMInit: pGMM=%p fBoundMemoryMode=%RTbool fHasWorkingAllocPhysNC=%RTbool\n", pGMM, pGMM->fBoundMemoryMode, pGMM->fHasWorkingAllocPhysNC));
793#else
794 LogFlow(("GMMInit: pGMM=%p fBoundMemoryMode=%RTbool\n", pGMM, pGMM->fBoundMemoryMode));
795#endif
796 return VINF_SUCCESS;
797 }
798
799 /*
800 * Bail out.
801 */
802 RTSpinlockDestroy(pGMM->hSpinLockTree);
803 while (iMtx-- > 0)
804 RTSemFastMutexDestroy(pGMM->aChunkMtx[iMtx].hMtx);
805#ifdef VBOX_USE_CRIT_SECT_FOR_GIANT
806 RTCritSectDelete(&pGMM->GiantCritSect);
807#else
808 RTSemFastMutexDestroy(pGMM->hMtx);
809#endif
810 }
811
812 pGMM->u32Magic = 0;
813 RTMemFree(pGMM);
814 SUPR0Printf("GMMR0Init: failed! rc=%d\n", rc);
815 return rc;
816}
817
818
819/**
820 * Terminates the GMM component.
821 */
822GMMR0DECL(void) GMMR0Term(void)
823{
824 LogFlow(("GMMTerm:\n"));
825
826 /*
827 * Take care / be paranoid...
828 */
829 PGMM pGMM = g_pGMM;
830 if (!RT_VALID_PTR(pGMM))
831 return;
832 if (pGMM->u32Magic != GMM_MAGIC)
833 {
834 SUPR0Printf("GMMR0Term: u32Magic=%#x\n", pGMM->u32Magic);
835 return;
836 }
837
838 /*
839 * Undo what init did and free all the resources we've acquired.
840 */
841 /* Destroy the fundamentals. */
842 g_pGMM = NULL;
843 pGMM->u32Magic = ~GMM_MAGIC;
844#ifdef VBOX_USE_CRIT_SECT_FOR_GIANT
845 RTCritSectDelete(&pGMM->GiantCritSect);
846#else
847 RTSemFastMutexDestroy(pGMM->hMtx);
848 pGMM->hMtx = NIL_RTSEMFASTMUTEX;
849#endif
850 RTSpinlockDestroy(pGMM->hSpinLockTree);
851 pGMM->hSpinLockTree = NIL_RTSPINLOCK;
852
853 /* Free any chunks still hanging around. */
854 RTAvlU32Destroy(&pGMM->pChunks, gmmR0TermDestroyChunk, pGMM);
855
856 /* Destroy the chunk locks. */
857 for (unsigned iMtx = 0; iMtx < RT_ELEMENTS(pGMM->aChunkMtx); iMtx++)
858 {
859 Assert(pGMM->aChunkMtx[iMtx].cUsers == 0);
860 RTSemFastMutexDestroy(pGMM->aChunkMtx[iMtx].hMtx);
861 pGMM->aChunkMtx[iMtx].hMtx = NIL_RTSEMFASTMUTEX;
862 }
863
864 /* Finally the instance data itself. */
865 RTMemFree(pGMM);
866 LogFlow(("GMMTerm: done\n"));
867}
868
869
870/**
871 * RTAvlU32Destroy callback.
872 *
873 * @returns 0
874 * @param pNode The node to destroy.
875 * @param pvGMM The GMM handle.
876 */
877static DECLCALLBACK(int) gmmR0TermDestroyChunk(PAVLU32NODECORE pNode, void *pvGMM)
878{
879 PGMMCHUNK pChunk = (PGMMCHUNK)pNode;
880
881 if (pChunk->cFree != (GMM_CHUNK_SIZE >> PAGE_SHIFT))
882 SUPR0Printf("GMMR0Term: %RKv/%#x: cFree=%d cPrivate=%d cShared=%d cMappings=%d\n", pChunk,
883 pChunk->Core.Key, pChunk->cFree, pChunk->cPrivate, pChunk->cShared, pChunk->cMappingsX);
884
885 int rc = RTR0MemObjFree(pChunk->hMemObj, true /* fFreeMappings */);
886 if (RT_FAILURE(rc))
887 {
888 SUPR0Printf("GMMR0Term: %RKv/%#x: RTRMemObjFree(%RKv,true) -> %d (cMappings=%d)\n", pChunk,
889 pChunk->Core.Key, pChunk->hMemObj, rc, pChunk->cMappingsX);
890 AssertRC(rc);
891 }
892 pChunk->hMemObj = NIL_RTR0MEMOBJ;
893
894 RTMemFree(pChunk->paMappingsX);
895 pChunk->paMappingsX = NULL;
896
897 RTMemFree(pChunk);
898 NOREF(pvGMM);
899 return 0;
900}
901
902
903/**
904 * Initializes the per-VM data for the GMM.
905 *
906 * This is called from within the GVMM lock (from GVMMR0CreateVM)
907 * and should only initialize the data members so GMMR0CleanupVM
908 * can deal with them. We reserve no memory or anything here,
909 * that's done later in GMMR0InitVM.
910 *
911 * @param pGVM Pointer to the Global VM structure.
912 */
913GMMR0DECL(int) GMMR0InitPerVMData(PGVM pGVM)
914{
915 AssertCompile(RT_SIZEOFMEMB(GVM,gmm.s) <= RT_SIZEOFMEMB(GVM,gmm.padding));
916
917 pGVM->gmm.s.Stats.enmPolicy = GMMOCPOLICY_INVALID;
918 pGVM->gmm.s.Stats.enmPriority = GMMPRIORITY_INVALID;
919 pGVM->gmm.s.Stats.fMayAllocate = false;
920
921 pGVM->gmm.s.hChunkTlbSpinLock = NIL_RTSPINLOCK;
922 int rc = RTSpinlockCreate(&pGVM->gmm.s.hChunkTlbSpinLock, RTSPINLOCK_FLAGS_INTERRUPT_SAFE, "per-vm-chunk-tlb");
923 AssertRCReturn(rc, rc);
924
925 return VINF_SUCCESS;
926}
927
928
929/**
930 * Acquires the GMM giant lock.
931 *
932 * @returns Assert status code from RTSemFastMutexRequest.
933 * @param pGMM Pointer to the GMM instance.
934 */
935static int gmmR0MutexAcquire(PGMM pGMM)
936{
937 ASMAtomicIncU32(&pGMM->cMtxContenders);
938#ifdef VBOX_USE_CRIT_SECT_FOR_GIANT
939 int rc = RTCritSectEnter(&pGMM->GiantCritSect);
940#else
941 int rc = RTSemFastMutexRequest(pGMM->hMtx);
942#endif
943 ASMAtomicDecU32(&pGMM->cMtxContenders);
944 AssertRC(rc);
945#ifdef VBOX_STRICT
946 pGMM->hMtxOwner = RTThreadNativeSelf();
947#endif
948 return rc;
949}
950
951
952/**
953 * Releases the GMM giant lock.
954 *
955 * @returns Assert status code from RTSemFastMutexRelease.
956 * @param pGMM Pointer to the GMM instance.
957 */
958static int gmmR0MutexRelease(PGMM pGMM)
959{
960#ifdef VBOX_STRICT
961 pGMM->hMtxOwner = NIL_RTNATIVETHREAD;
962#endif
963#ifdef VBOX_USE_CRIT_SECT_FOR_GIANT
964 int rc = RTCritSectLeave(&pGMM->GiantCritSect);
965#else
966 int rc = RTSemFastMutexRelease(pGMM->hMtx);
967 AssertRC(rc);
968#endif
969 return rc;
970}
971
972
973/**
974 * Yields the GMM giant lock if there is contention and a certain minimum time
975 * has elapsed since we took it.
976 *
977 * @returns @c true if the mutex was yielded, @c false if not.
978 * @param pGMM Pointer to the GMM instance.
979 * @param puLockNanoTS Where the lock acquisition time stamp is kept
980 * (in/out).
981 */
982static bool gmmR0MutexYield(PGMM pGMM, uint64_t *puLockNanoTS)
983{
984 /*
985 * If nobody is contending the mutex, don't bother checking the time.
986 */
987 if (ASMAtomicReadU32(&pGMM->cMtxContenders) == 0)
988 return false;
989
990 /*
991 * Don't yield if we haven't executed for at least 2 milliseconds.
992 */
993 uint64_t uNanoNow = RTTimeSystemNanoTS();
994 if (uNanoNow - *puLockNanoTS < UINT32_C(2000000))
995 return false;
996
997 /*
998 * Yield the mutex.
999 */
1000#ifdef VBOX_STRICT
1001 pGMM->hMtxOwner = NIL_RTNATIVETHREAD;
1002#endif
1003 ASMAtomicIncU32(&pGMM->cMtxContenders);
1004#ifdef VBOX_USE_CRIT_SECT_FOR_GIANT
1005 int rc1 = RTCritSectLeave(&pGMM->GiantCritSect); AssertRC(rc1);
1006#else
1007 int rc1 = RTSemFastMutexRelease(pGMM->hMtx); AssertRC(rc1);
1008#endif
1009
1010 RTThreadYield();
1011
1012#ifdef VBOX_USE_CRIT_SECT_FOR_GIANT
1013 int rc2 = RTCritSectEnter(&pGMM->GiantCritSect); AssertRC(rc2);
1014#else
1015 int rc2 = RTSemFastMutexRequest(pGMM->hMtx); AssertRC(rc2);
1016#endif
1017 *puLockNanoTS = RTTimeSystemNanoTS();
1018 ASMAtomicDecU32(&pGMM->cMtxContenders);
1019#ifdef VBOX_STRICT
1020 pGMM->hMtxOwner = RTThreadNativeSelf();
1021#endif
1022
1023 return true;
1024}
1025
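/*
 * Usage sketch (illustrative only; GMMR0CleanupVM below shows the real
 * pattern, including the generation checks needed when the yield happened):
 *
 *     uint64_t uLockNanoTS = RTTimeSystemNanoTS();
 *     gmmR0MutexAcquire(pGMM);
 *     PGMMCHUNK pChunk;
 *     RTListForEach(&pGMM->ChunkList, pChunk, GMMCHUNK, ListNode)
 *     {
 *         // ... a bounded piece of work on pChunk ...
 *         if (gmmR0MutexYield(pGMM, &uLockNanoTS))
 *             break; // we dropped the lock; the list may have changed, so restart
 *     }
 *     gmmR0MutexRelease(pGMM);
 */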
1026
1027/**
1028 * Acquires a chunk lock.
1029 *
1030 * The caller must own the giant lock.
1031 *
1032 * @returns Assert status code from RTSemFastMutexRequest.
1033 * @param pMtxState The chunk mutex state info. (Avoids
1034 * passing the same flags and stuff around
1035 * for subsequent release and drop-giant
1036 * calls.)
1037 * @param pGMM Pointer to the GMM instance.
1038 * @param pChunk Pointer to the chunk.
1039 * @param fFlags Flags regarding the giant lock, GMMR0CHUNK_MTX_XXX.
1040 */
1041static int gmmR0ChunkMutexAcquire(PGMMR0CHUNKMTXSTATE pMtxState, PGMM pGMM, PGMMCHUNK pChunk, uint32_t fFlags)
1042{
1043 Assert(fFlags > GMMR0CHUNK_MTX_INVALID && fFlags < GMMR0CHUNK_MTX_END);
1044 Assert(pGMM->hMtxOwner == RTThreadNativeSelf());
1045
1046 pMtxState->pGMM = pGMM;
1047 pMtxState->fFlags = (uint8_t)fFlags;
1048
1049 /*
1050 * Get the lock index and reference the lock.
1051 */
1052 Assert(pGMM->hMtxOwner == RTThreadNativeSelf());
1053 uint32_t iChunkMtx = pChunk->iChunkMtx;
1054 if (iChunkMtx == UINT8_MAX)
1055 {
1056 iChunkMtx = pGMM->iNextChunkMtx++;
1057 iChunkMtx %= RT_ELEMENTS(pGMM->aChunkMtx);
1058
1059 /* Try get an unused one... */
1060 if (pGMM->aChunkMtx[iChunkMtx].cUsers)
1061 {
1062 iChunkMtx = pGMM->iNextChunkMtx++;
1063 iChunkMtx %= RT_ELEMENTS(pGMM->aChunkMtx);
1064 if (pGMM->aChunkMtx[iChunkMtx].cUsers)
1065 {
1066 iChunkMtx = pGMM->iNextChunkMtx++;
1067 iChunkMtx %= RT_ELEMENTS(pGMM->aChunkMtx);
1068 if (pGMM->aChunkMtx[iChunkMtx].cUsers)
1069 {
1070 iChunkMtx = pGMM->iNextChunkMtx++;
1071 iChunkMtx %= RT_ELEMENTS(pGMM->aChunkMtx);
1072 }
1073 }
1074 }
1075
1076 pChunk->iChunkMtx = iChunkMtx;
1077 }
1078 AssertCompile(RT_ELEMENTS(pGMM->aChunkMtx) < UINT8_MAX);
1079 pMtxState->iChunkMtx = (uint8_t)iChunkMtx;
1080 ASMAtomicIncU32(&pGMM->aChunkMtx[iChunkMtx].cUsers);
1081
1082 /*
1083 * Drop the giant?
1084 */
1085 if (fFlags != GMMR0CHUNK_MTX_KEEP_GIANT)
1086 {
1087 /** @todo GMM life cycle cleanup (we may race someone
1088 * destroying and cleaning up GMM)? */
1089 gmmR0MutexRelease(pGMM);
1090 }
1091
1092 /*
1093 * Take the chunk mutex.
1094 */
1095 int rc = RTSemFastMutexRequest(pGMM->aChunkMtx[iChunkMtx].hMtx);
1096 AssertRC(rc);
1097 return rc;
1098}
1099
1100
1101/**
1102 * Releases a chunk lock acquired by gmmR0ChunkMutexAcquire.
1103 *
1104 * @returns Assert status code from RTSemFastMutexRelease.
1105 * @param pMtxState Pointer to the chunk mutex state.
1106 * @param pChunk Pointer to the chunk if it's still
1107 * alive, NULL if it isn't. This is used to deassociate
1108 * the chunk from the mutex on the way out so a new one
1109 * can be selected next time, thus avoiding contented
1110 * mutexes.
1111 */
1112static int gmmR0ChunkMutexRelease(PGMMR0CHUNKMTXSTATE pMtxState, PGMMCHUNK pChunk)
1113{
1114 PGMM pGMM = pMtxState->pGMM;
1115
1116 /*
1117 * Release the chunk mutex and reacquire the giant if requested.
1118 */
1119 int rc = RTSemFastMutexRelease(pGMM->aChunkMtx[pMtxState->iChunkMtx].hMtx);
1120 AssertRC(rc);
1121 if (pMtxState->fFlags == GMMR0CHUNK_MTX_RETAKE_GIANT)
1122 rc = gmmR0MutexAcquire(pGMM);
1123 else
1124 Assert((pMtxState->fFlags != GMMR0CHUNK_MTX_DROP_GIANT) == (pGMM->hMtxOwner == RTThreadNativeSelf()));
1125
1126 /*
1127 * Drop the chunk mutex user reference and deassociate it from the chunk
1128 * when possible.
1129 */
1130 if ( ASMAtomicDecU32(&pGMM->aChunkMtx[pMtxState->iChunkMtx].cUsers) == 0
1131 && pChunk
1132 && RT_SUCCESS(rc) )
1133 {
1134 if (pMtxState->fFlags != GMMR0CHUNK_MTX_DROP_GIANT)
1135 pChunk->iChunkMtx = UINT8_MAX;
1136 else
1137 {
1138 rc = gmmR0MutexAcquire(pGMM);
1139 if (RT_SUCCESS(rc))
1140 {
1141 if (pGMM->aChunkMtx[pMtxState->iChunkMtx].cUsers == 0)
1142 pChunk->iChunkMtx = UINT8_MAX;
1143 rc = gmmR0MutexRelease(pGMM);
1144 }
1145 }
1146 }
1147
1148 pMtxState->pGMM = NULL;
1149 return rc;
1150}
1151
1152
1153/**
1154 * Drops the giant GMM lock we kept in gmmR0ChunkMutexAcquire while keeping the
1155 * chunk locked.
1156 *
1157 * This only works if gmmR0ChunkMutexAcquire was called with
1158 * GMMR0CHUNK_MTX_KEEP_GIANT. gmmR0ChunkMutexRelease will retake the giant
1159 * mutex, i.e. behave as if GMMR0CHUNK_MTX_RETAKE_GIANT was used.
1160 *
1161 * @returns VBox status code (assuming success is ok).
1162 * @param pMtxState Pointer to the chunk mutex state.
1163 */
1164static int gmmR0ChunkMutexDropGiant(PGMMR0CHUNKMTXSTATE pMtxState)
1165{
1166 AssertReturn(pMtxState->fFlags == GMMR0CHUNK_MTX_KEEP_GIANT, VERR_GMM_MTX_FLAGS);
1167 Assert(pMtxState->pGMM->hMtxOwner == RTThreadNativeSelf());
1168 pMtxState->fFlags = GMMR0CHUNK_MTX_RETAKE_GIANT;
1169 /** @todo GMM life cycle cleanup (we may race someone
1170 * destroying and cleaning up GMM)? */
1171 return gmmR0MutexRelease(pMtxState->pGMM);
1172}
1173
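/*
 * Chunk mutex life cycle sketch (illustrative; gmmR0CleanupVMScanChunk below
 * uses this exact sequence, with hMapObj standing in for whatever slow
 * resource is being torn down):
 *
 *     GMMR0CHUNKMTXSTATE MtxState;
 *     gmmR0ChunkMutexAcquire(&MtxState, pGMM, pChunk, GMMR0CHUNK_MTX_KEEP_GIANT);
 *     gmmR0ChunkMutexDropGiant(&MtxState);          // slow bit ahead, release the giant
 *     int rc = RTR0MemObjFree(hMapObj, false);      // e.g. tearing down a mapping
 *     gmmR0ChunkMutexRelease(&MtxState, pChunk);    // retakes the giant (RETAKE semantics)
 */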
1174
1175/**
1176 * For experimenting with NUMA affinity and such.
1177 *
1178 * @returns The current NUMA Node ID.
1179 */
1180static uint16_t gmmR0GetCurrentNumaNodeId(void)
1181{
1182#if 1
1183 return GMM_CHUNK_NUMA_ID_UNKNOWN;
1184#else
1185 return RTMpCpuId() / 16;
1186#endif
1187}
1188
1189
1190
1191/**
1192 * Cleans up when a VM is terminating.
1193 *
1194 * @param pGVM Pointer to the Global VM structure.
1195 */
1196GMMR0DECL(void) GMMR0CleanupVM(PGVM pGVM)
1197{
1198 LogFlow(("GMMR0CleanupVM: pGVM=%p:{.hSelf=%#x}\n", pGVM, pGVM->hSelf));
1199
1200 PGMM pGMM;
1201 GMM_GET_VALID_INSTANCE_VOID(pGMM);
1202
1203#ifdef VBOX_WITH_PAGE_SHARING
1204 /*
1205 * Clean up all registered shared modules first.
1206 */
1207 gmmR0SharedModuleCleanup(pGMM, pGVM);
1208#endif
1209
1210 gmmR0MutexAcquire(pGMM);
1211 uint64_t uLockNanoTS = RTTimeSystemNanoTS();
1212 GMM_CHECK_SANITY_UPON_ENTERING(pGMM);
1213
1214 /*
1215 * The policy is 'INVALID' until the initial reservation
1216 * request has been serviced.
1217 */
1218 if ( pGVM->gmm.s.Stats.enmPolicy > GMMOCPOLICY_INVALID
1219 && pGVM->gmm.s.Stats.enmPolicy < GMMOCPOLICY_END)
1220 {
1221 /*
1222 * If it's the last VM around, we can skip walking all the chunks looking
1223 * for the pages owned by this VM and instead flush the whole shebang.
1224 *
1225 * This takes care of the eventuality that a VM has left shared page
1226 * references behind (shouldn't happen of course, but you never know).
1227 */
1228 Assert(pGMM->cRegisteredVMs);
1229 pGMM->cRegisteredVMs--;
1230
1231 /*
1232 * Walk the entire pool looking for pages that belong to this VM
1233 * and leftover mappings. (This'll only catch private pages,
1234 * shared pages will be 'left behind'.)
1235 */
1236 /** @todo r=bird: This scanning+freeing could be optimized in bound mode! */
1237 uint64_t cPrivatePages = pGVM->gmm.s.Stats.cPrivatePages; /* save */
1238
1239 unsigned iCountDown = 64;
1240 bool fRedoFromStart;
1241 PGMMCHUNK pChunk;
1242 do
1243 {
1244 fRedoFromStart = false;
1245 RTListForEachReverse(&pGMM->ChunkList, pChunk, GMMCHUNK, ListNode)
1246 {
1247 uint32_t const cFreeChunksOld = pGMM->cFreedChunks;
1248 if ( ( !pGMM->fBoundMemoryMode
1249 || pChunk->hGVM == pGVM->hSelf)
1250 && gmmR0CleanupVMScanChunk(pGMM, pGVM, pChunk))
1251 {
1252 /* We left the giant mutex, so reset the yield counters. */
1253 uLockNanoTS = RTTimeSystemNanoTS();
1254 iCountDown = 64;
1255 }
1256 else
1257 {
1258 /* Didn't leave it, so do normal yielding. */
1259 if (!iCountDown)
1260 gmmR0MutexYield(pGMM, &uLockNanoTS);
1261 else
1262 iCountDown--;
1263 }
1264 if (pGMM->cFreedChunks != cFreeChunksOld)
1265 {
1266 fRedoFromStart = true;
1267 break;
1268 }
1269 }
1270 } while (fRedoFromStart);
1271
1272 if (pGVM->gmm.s.Stats.cPrivatePages)
1273 SUPR0Printf("GMMR0CleanupVM: hGVM=%#x has %#x private pages that cannot be found!\n", pGVM->hSelf, pGVM->gmm.s.Stats.cPrivatePages);
1274
1275 pGMM->cAllocatedPages -= cPrivatePages;
1276
1277 /*
1278 * Free empty chunks.
1279 */
1280 PGMMCHUNKFREESET pPrivateSet = pGMM->fBoundMemoryMode ? &pGVM->gmm.s.Private : &pGMM->PrivateX;
1281 do
1282 {
1283 fRedoFromStart = false;
1284 iCountDown = 10240;
1285 pChunk = pPrivateSet->apLists[GMM_CHUNK_FREE_SET_UNUSED_LIST];
1286 while (pChunk)
1287 {
1288 PGMMCHUNK pNext = pChunk->pFreeNext;
1289 Assert(pChunk->cFree == GMM_CHUNK_NUM_PAGES);
1290 if ( !pGMM->fBoundMemoryMode
1291 || pChunk->hGVM == pGVM->hSelf)
1292 {
1293 uint64_t const idGenerationOld = pPrivateSet->idGeneration;
1294 if (gmmR0FreeChunk(pGMM, pGVM, pChunk, true /*fRelaxedSem*/))
1295 {
1296 /* We've left the giant mutex, restart? (+1 for our unlink) */
1297 fRedoFromStart = pPrivateSet->idGeneration != idGenerationOld + 1;
1298 if (fRedoFromStart)
1299 break;
1300 uLockNanoTS = RTTimeSystemNanoTS();
1301 iCountDown = 10240;
1302 }
1303 }
1304
1305 /* Advance and maybe yield the lock. */
1306 pChunk = pNext;
1307 if (--iCountDown == 0)
1308 {
1309 uint64_t const idGenerationOld = pPrivateSet->idGeneration;
1310 fRedoFromStart = gmmR0MutexYield(pGMM, &uLockNanoTS)
1311 && pPrivateSet->idGeneration != idGenerationOld;
1312 if (fRedoFromStart)
1313 break;
1314 iCountDown = 10240;
1315 }
1316 }
1317 } while (fRedoFromStart);
1318
1319 /*
1320 * Account for shared pages that weren't freed.
1321 */
1322 if (pGVM->gmm.s.Stats.cSharedPages)
1323 {
1324 Assert(pGMM->cSharedPages >= pGVM->gmm.s.Stats.cSharedPages);
1325 SUPR0Printf("GMMR0CleanupVM: hGVM=%#x left %#x shared pages behind!\n", pGVM->hSelf, pGVM->gmm.s.Stats.cSharedPages);
1326 pGMM->cLeftBehindSharedPages += pGVM->gmm.s.Stats.cSharedPages;
1327 }
1328
1329 /*
1330 * Clean up balloon statistics in case the VM process crashed.
1331 */
1332 Assert(pGMM->cBalloonedPages >= pGVM->gmm.s.Stats.cBalloonedPages);
1333 pGMM->cBalloonedPages -= pGVM->gmm.s.Stats.cBalloonedPages;
1334
1335 /*
1336 * Update the over-commitment management statistics.
1337 */
1338 pGMM->cReservedPages -= pGVM->gmm.s.Stats.Reserved.cBasePages
1339 + pGVM->gmm.s.Stats.Reserved.cFixedPages
1340 + pGVM->gmm.s.Stats.Reserved.cShadowPages;
1341 switch (pGVM->gmm.s.Stats.enmPolicy)
1342 {
1343 case GMMOCPOLICY_NO_OC:
1344 break;
1345 default:
1346 /** @todo Update GMM->cOverCommittedPages */
1347 break;
1348 }
1349 }
1350
1351 /* zap the GVM data. */
1352 pGVM->gmm.s.Stats.enmPolicy = GMMOCPOLICY_INVALID;
1353 pGVM->gmm.s.Stats.enmPriority = GMMPRIORITY_INVALID;
1354 pGVM->gmm.s.Stats.fMayAllocate = false;
1355
1356 GMM_CHECK_SANITY_UPON_LEAVING(pGMM);
1357 gmmR0MutexRelease(pGMM);
1358
1359 /*
1360 * Destroy the spinlock.
1361 */
1362 RTSPINLOCK hSpinlock = NIL_RTSPINLOCK;
1363 ASMAtomicXchgHandle(&pGVM->gmm.s.hChunkTlbSpinLock, NIL_RTSPINLOCK, &hSpinlock);
1364 RTSpinlockDestroy(hSpinlock);
1365
1366 LogFlow(("GMMR0CleanupVM: returns\n"));
1367}
1368
1369
1370/**
1371 * Scan one chunk for private pages belonging to the specified VM.
1372 *
1373 * @note This function may drop the giant mutex!
1374 *
1375 * @returns @c true if we've temporarily dropped the giant mutex, @c false if
1376 * we didn't.
1377 * @param pGMM Pointer to the GMM instance.
1378 * @param pGVM The global VM handle.
1379 * @param pChunk The chunk to scan.
1380 */
1381static bool gmmR0CleanupVMScanChunk(PGMM pGMM, PGVM pGVM, PGMMCHUNK pChunk)
1382{
1383 Assert(!pGMM->fBoundMemoryMode || pChunk->hGVM == pGVM->hSelf);
1384
1385 /*
1386 * Look for pages belonging to the VM.
1387 * (Perform some internal checks while we're scanning.)
1388 */
1389#ifndef VBOX_STRICT
1390 if (pChunk->cFree != (GMM_CHUNK_SIZE >> PAGE_SHIFT))
1391#endif
1392 {
1393 unsigned cPrivate = 0;
1394 unsigned cShared = 0;
1395 unsigned cFree = 0;
1396
1397 gmmR0UnlinkChunk(pChunk); /* avoiding cFreePages updates. */
1398
1399 uint16_t hGVM = pGVM->hSelf;
1400 unsigned iPage = (GMM_CHUNK_SIZE >> PAGE_SHIFT);
1401 while (iPage-- > 0)
1402 if (GMM_PAGE_IS_PRIVATE(&pChunk->aPages[iPage]))
1403 {
1404 if (pChunk->aPages[iPage].Private.hGVM == hGVM)
1405 {
1406 /*
1407 * Free the page.
1408 *
1409 * The reason for not using gmmR0FreePrivatePage here is that we
1410 * must *not* cause the chunk to be freed from under us - we're in
1411 * an AVL tree walk here.
1412 */
1413 pChunk->aPages[iPage].u = 0;
1414 pChunk->aPages[iPage].Free.u2State = GMM_PAGE_STATE_FREE;
1415 pChunk->aPages[iPage].Free.fZeroed = false;
1416 pChunk->aPages[iPage].Free.iNext = pChunk->iFreeHead;
1417 pChunk->iFreeHead = iPage;
1418 pChunk->cPrivate--;
1419 pChunk->cFree++;
1420 pGVM->gmm.s.Stats.cPrivatePages--;
1421 cFree++;
1422 }
1423 else
1424 cPrivate++;
1425 }
1426 else if (GMM_PAGE_IS_FREE(&pChunk->aPages[iPage]))
1427 cFree++;
1428 else
1429 cShared++;
1430
1431 gmmR0SelectSetAndLinkChunk(pGMM, pGVM, pChunk);
1432
1433 /*
1434 * Did it add up?
1435 */
1436 if (RT_UNLIKELY( pChunk->cFree != cFree
1437 || pChunk->cPrivate != cPrivate
1438 || pChunk->cShared != cShared))
1439 {
1440 SUPR0Printf("gmmR0CleanupVMScanChunk: Chunk %RKv/%#x has bogus stats - free=%d/%d private=%d/%d shared=%d/%d\n",
1441 pChunk, pChunk->Core.Key, pChunk->cFree, cFree, pChunk->cPrivate, cPrivate, pChunk->cShared, cShared);
1442 pChunk->cFree = cFree;
1443 pChunk->cPrivate = cPrivate;
1444 pChunk->cShared = cShared;
1445 }
1446 }
1447
1448 /*
1449 * If not in bound memory mode, we should reset the hGVM field
1450 * if it has our handle in it.
1451 */
1452 if (pChunk->hGVM == pGVM->hSelf)
1453 {
1454 if (!g_pGMM->fBoundMemoryMode)
1455 pChunk->hGVM = NIL_GVM_HANDLE;
1456 else if (pChunk->cFree != GMM_CHUNK_NUM_PAGES)
1457 {
1458 SUPR0Printf("gmmR0CleanupVMScanChunk: %RKv/%#x: cFree=%#x - it should be 0 in bound mode!\n",
1459 pChunk, pChunk->Core.Key, pChunk->cFree);
1460 AssertMsgFailed(("%p/%#x: cFree=%#x - it should be 0 in bound mode!\n", pChunk, pChunk->Core.Key, pChunk->cFree));
1461
1462 gmmR0UnlinkChunk(pChunk);
1463 pChunk->cFree = GMM_CHUNK_NUM_PAGES;
1464 gmmR0SelectSetAndLinkChunk(pGMM, pGVM, pChunk);
1465 }
1466 }
1467
1468 /*
1469 * Look for a mapping belonging to the terminating VM.
1470 */
1471 GMMR0CHUNKMTXSTATE MtxState;
1472 gmmR0ChunkMutexAcquire(&MtxState, pGMM, pChunk, GMMR0CHUNK_MTX_KEEP_GIANT);
1473 unsigned cMappings = pChunk->cMappingsX;
1474 for (unsigned i = 0; i < cMappings; i++)
1475 if (pChunk->paMappingsX[i].pGVM == pGVM)
1476 {
1477 gmmR0ChunkMutexDropGiant(&MtxState);
1478
1479 RTR0MEMOBJ hMemObj = pChunk->paMappingsX[i].hMapObj;
1480
1481 cMappings--;
1482 if (i < cMappings)
1483 pChunk->paMappingsX[i] = pChunk->paMappingsX[cMappings];
1484 pChunk->paMappingsX[cMappings].pGVM = NULL;
1485 pChunk->paMappingsX[cMappings].hMapObj = NIL_RTR0MEMOBJ;
1486 Assert(pChunk->cMappingsX - 1U == cMappings);
1487 pChunk->cMappingsX = cMappings;
1488
1489 int rc = RTR0MemObjFree(hMemObj, false /* fFreeMappings (NA) */);
1490 if (RT_FAILURE(rc))
1491 {
1492 SUPR0Printf("gmmR0CleanupVMScanChunk: %RKv/%#x: mapping #%x: RTRMemObjFree(%RKv,false) -> %d \n",
1493 pChunk, pChunk->Core.Key, i, hMemObj, rc);
1494 AssertRC(rc);
1495 }
1496
1497 gmmR0ChunkMutexRelease(&MtxState, pChunk);
1498 return true;
1499 }
1500
1501 gmmR0ChunkMutexRelease(&MtxState, pChunk);
1502 return false;
1503}
1504
1505
1506/**
1507 * The initial resource reservations.
1508 *
1509 * This will make memory reservations according to policy and priority. If there aren't
1510 * sufficient resources available to sustain the VM this function will fail and all
1511 * future allocations requests will fail as well.
1512 *
1513 * These are just the initial reservations made very early during the VM creation
1514 * process and will be adjusted later in the GMMR0UpdateReservation call after the
1515 * ring-3 init has completed.
1516 *
1517 * @returns VBox status code.
1518 * @retval VERR_GMM_MEMORY_RESERVATION_DECLINED
1519 * @retval VERR_GMM_
1520 *
1521 * @param pGVM The global (ring-0) VM structure.
1522 * @param idCpu The VCPU id - must be zero.
1523 * @param cBasePages The number of pages that may be allocated for the base RAM and ROMs.
1524 * This does not include MMIO2 and similar.
1525 * @param cShadowPages The number of pages that may be allocated for shadow paging structures.
1526 * @param cFixedPages The number of pages that may be allocated for fixed objects like the
1527 * hyper heap, MMIO2 and similar.
1528 * @param enmPolicy The OC policy to use on this VM.
1529 * @param enmPriority The priority in an out-of-memory situation.
1530 *
1531 * @thread The creator thread / EMT(0).
1532 */
1533GMMR0DECL(int) GMMR0InitialReservation(PGVM pGVM, VMCPUID idCpu, uint64_t cBasePages, uint32_t cShadowPages,
1534 uint32_t cFixedPages, GMMOCPOLICY enmPolicy, GMMPRIORITY enmPriority)
1535{
1536 LogFlow(("GMMR0InitialReservation: pGVM=%p cBasePages=%#llx cShadowPages=%#x cFixedPages=%#x enmPolicy=%d enmPriority=%d\n",
1537 pGVM, cBasePages, cShadowPages, cFixedPages, enmPolicy, enmPriority));
1538
1539 /*
1540 * Validate, get basics and take the semaphore.
1541 */
1542 AssertReturn(idCpu == 0, VERR_INVALID_CPU_ID);
1543 PGMM pGMM;
1544 GMM_GET_VALID_INSTANCE(pGMM, VERR_GMM_INSTANCE);
1545 int rc = GVMMR0ValidateGVMandEMT(pGVM, idCpu);
1546 if (RT_FAILURE(rc))
1547 return rc;
1548
1549 AssertReturn(cBasePages, VERR_INVALID_PARAMETER);
1550 AssertReturn(cShadowPages, VERR_INVALID_PARAMETER);
1551 AssertReturn(cFixedPages, VERR_INVALID_PARAMETER);
1552 AssertReturn(enmPolicy > GMMOCPOLICY_INVALID && enmPolicy < GMMOCPOLICY_END, VERR_INVALID_PARAMETER);
1553 AssertReturn(enmPriority > GMMPRIORITY_INVALID && enmPriority < GMMPRIORITY_END, VERR_INVALID_PARAMETER);
1554
1555 gmmR0MutexAcquire(pGMM);
1556 if (GMM_CHECK_SANITY_UPON_ENTERING(pGMM))
1557 {
1558 if ( !pGVM->gmm.s.Stats.Reserved.cBasePages
1559 && !pGVM->gmm.s.Stats.Reserved.cFixedPages
1560 && !pGVM->gmm.s.Stats.Reserved.cShadowPages)
1561 {
1562 /*
1563 * Check if we can accommodate this.
1564 */
1565 /* ... later ... */
1566 if (RT_SUCCESS(rc))
1567 {
1568 /*
1569 * Update the records.
1570 */
1571 pGVM->gmm.s.Stats.Reserved.cBasePages = cBasePages;
1572 pGVM->gmm.s.Stats.Reserved.cFixedPages = cFixedPages;
1573 pGVM->gmm.s.Stats.Reserved.cShadowPages = cShadowPages;
1574 pGVM->gmm.s.Stats.enmPolicy = enmPolicy;
1575 pGVM->gmm.s.Stats.enmPriority = enmPriority;
1576 pGVM->gmm.s.Stats.fMayAllocate = true;
1577
1578 pGMM->cReservedPages += cBasePages + cFixedPages + cShadowPages;
1579 pGMM->cRegisteredVMs++;
1580 }
1581 }
1582 else
1583 rc = VERR_WRONG_ORDER;
1584 GMM_CHECK_SANITY_UPON_LEAVING(pGMM);
1585 }
1586 else
1587 rc = VERR_GMM_IS_NOT_SANE;
1588 gmmR0MutexRelease(pGMM);
1589 LogFlow(("GMMR0InitialReservation: returns %Rrc\n", rc));
1590 return rc;
1591}
1592
1593
1594/**
1595 * VMMR0 request wrapper for GMMR0InitialReservation.
1596 *
1597 * @returns see GMMR0InitialReservation.
1598 * @param pGVM The global (ring-0) VM structure.
1599 * @param idCpu The VCPU id.
1600 * @param pReq Pointer to the request packet.
1601 */
1602GMMR0DECL(int) GMMR0InitialReservationReq(PGVM pGVM, VMCPUID idCpu, PGMMINITIALRESERVATIONREQ pReq)
1603{
1604 /*
1605 * Validate input and pass it on.
1606 */
1607 AssertPtrReturn(pGVM, VERR_INVALID_POINTER);
1608 AssertPtrReturn(pReq, VERR_INVALID_POINTER);
1609 AssertMsgReturn(pReq->Hdr.cbReq == sizeof(*pReq), ("%#x != %#x\n", pReq->Hdr.cbReq, sizeof(*pReq)), VERR_INVALID_PARAMETER);
1610
1611 return GMMR0InitialReservation(pGVM, idCpu, pReq->cBasePages, pReq->cShadowPages,
1612 pReq->cFixedPages, pReq->enmPolicy, pReq->enmPriority);
1613}
1614
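/*
 * Ring-3 side sketch of issuing this request (illustrative only; the dispatch
 * helper, the operation code and the cGuestRamPages/... variables are
 * assumptions here, see the ring-3 GMM code for the real thing):
 *
 *     GMMINITIALRESERVATIONREQ Req;
 *     Req.Hdr.u32Magic = SUPVMMR0REQHDR_MAGIC;
 *     Req.Hdr.cbReq    = sizeof(Req);
 *     Req.cBasePages   = cGuestRamPages;       // base RAM + ROMs, not MMIO2
 *     Req.cShadowPages = cShadowPages;
 *     Req.cFixedPages  = cFixedPages;
 *     Req.enmPolicy    = GMMOCPOLICY_NO_OC;
 *     Req.enmPriority  = GMMPRIORITY_NORMAL;
 *     int rc = VMMR3CallR0(pVM, VMMR0_DO_GMM_INITIAL_RESERVATION, 0, &Req.Hdr);
 */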
1615
1616/**
1617 * This updates the memory reservation with the additional MMIO2 and ROM pages.
1618 *
1619 * @returns VBox status code.
1620 * @retval VERR_GMM_MEMORY_RESERVATION_DECLINED
1621 *
1622 * @param pGVM The global (ring-0) VM structure.
1623 * @param idCpu The VCPU id.
1624 * @param cBasePages The number of pages that may be allocated for the base RAM and ROMs.
1625 * This does not include MMIO2 and similar.
1626 * @param cShadowPages The number of pages that may be allocated for shadow paging structures.
1627 * @param cFixedPages The number of pages that may be allocated for fixed objects like the
1628 * hyper heap, MMIO2 and similar.
1629 *
1630 * @thread EMT(idCpu)
1631 */
1632GMMR0DECL(int) GMMR0UpdateReservation(PGVM pGVM, VMCPUID idCpu, uint64_t cBasePages,
1633 uint32_t cShadowPages, uint32_t cFixedPages)
1634{
1635 LogFlow(("GMMR0UpdateReservation: pGVM=%p cBasePages=%#llx cShadowPages=%#x cFixedPages=%#x\n",
1636 pGVM, cBasePages, cShadowPages, cFixedPages));
1637
1638 /*
1639 * Validate, get basics and take the semaphore.
1640 */
1641 PGMM pGMM;
1642 GMM_GET_VALID_INSTANCE(pGMM, VERR_GMM_INSTANCE);
1643 int rc = GVMMR0ValidateGVMandEMT(pGVM, idCpu);
1644 if (RT_FAILURE(rc))
1645 return rc;
1646
1647 AssertReturn(cBasePages, VERR_INVALID_PARAMETER);
1648 AssertReturn(cShadowPages, VERR_INVALID_PARAMETER);
1649 AssertReturn(cFixedPages, VERR_INVALID_PARAMETER);
1650
1651 gmmR0MutexAcquire(pGMM);
1652 if (GMM_CHECK_SANITY_UPON_ENTERING(pGMM))
1653 {
1654 if ( pGVM->gmm.s.Stats.Reserved.cBasePages
1655 && pGVM->gmm.s.Stats.Reserved.cFixedPages
1656 && pGVM->gmm.s.Stats.Reserved.cShadowPages)
1657 {
1658 /*
1659 * Check if we can accommodate this.
1660 */
1661 /* ... later ... */
1662 if (RT_SUCCESS(rc))
1663 {
1664 /*
1665 * Update the records.
1666 */
1667 pGMM->cReservedPages -= pGVM->gmm.s.Stats.Reserved.cBasePages
1668 + pGVM->gmm.s.Stats.Reserved.cFixedPages
1669 + pGVM->gmm.s.Stats.Reserved.cShadowPages;
1670 pGMM->cReservedPages += cBasePages + cFixedPages + cShadowPages;
1671
1672 pGVM->gmm.s.Stats.Reserved.cBasePages = cBasePages;
1673 pGVM->gmm.s.Stats.Reserved.cFixedPages = cFixedPages;
1674 pGVM->gmm.s.Stats.Reserved.cShadowPages = cShadowPages;
1675 }
1676 }
1677 else
1678 rc = VERR_WRONG_ORDER;
1679 GMM_CHECK_SANITY_UPON_LEAVING(pGMM);
1680 }
1681 else
1682 rc = VERR_GMM_IS_NOT_SANE;
1683 gmmR0MutexRelease(pGMM);
1684 LogFlow(("GMMR0UpdateReservation: returns %Rrc\n", rc));
1685 return rc;
1686}
1687
1688
1689/**
1690 * VMMR0 request wrapper for GMMR0UpdateReservation.
1691 *
1692 * @returns see GMMR0UpdateReservation.
1693 * @param pGVM The global (ring-0) VM structure.
1694 * @param idCpu The VCPU id.
1695 * @param pReq Pointer to the request packet.
1696 */
1697GMMR0DECL(int) GMMR0UpdateReservationReq(PGVM pGVM, VMCPUID idCpu, PGMMUPDATERESERVATIONREQ pReq)
1698{
1699 /*
1700 * Validate input and pass it on.
1701 */
1702 AssertPtrReturn(pReq, VERR_INVALID_POINTER);
1703 AssertMsgReturn(pReq->Hdr.cbReq == sizeof(*pReq), ("%#x != %#x\n", pReq->Hdr.cbReq, sizeof(*pReq)), VERR_INVALID_PARAMETER);
1704
1705 return GMMR0UpdateReservation(pGVM, idCpu, pReq->cBasePages, pReq->cShadowPages, pReq->cFixedPages);
1706}
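
/*
 * Minimal caller-side sketch (illustrative only, not part of this build): the
 * request is fixed size, so Hdr.cbReq is simply sizeof(*pReq). Header magic
 * setup and the actual ring-3 -> ring-0 dispatch are omitted, and the page
 * counts are hypothetical locals; note that all three counts must be non-zero,
 * as validated in GMMR0UpdateReservation above.
 *
 * @code
 *      GMMUPDATERESERVATIONREQ Req;
 *      Req.Hdr.cbReq    = sizeof(Req);
 *      Req.cBasePages   = cGuestRamPages + cRomPages;  // hypothetical counts
 *      Req.cShadowPages = cShadowPages;                // hypothetical count
 *      Req.cFixedPages  = cFixedPages;                 // hypothetical count
 *      // ... pass &Req to GMMR0UpdateReservationReq via the VMMR0 request path ...
 * @endcode
 */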
1707
1708#ifdef GMMR0_WITH_SANITY_CHECK
1709
1710/**
1711 * Performs sanity checks on a free set.
1712 *
1713 * @returns Error count.
1714 *
1715 * @param pGMM Pointer to the GMM instance.
1716 * @param pSet Pointer to the set.
1717 * @param pszSetName The set name.
1718 * @param pszFunction The function from which it was called.
1719 * @param uLineNo The line number.
1720 */
1721static uint32_t gmmR0SanityCheckSet(PGMM pGMM, PGMMCHUNKFREESET pSet, const char *pszSetName,
1722 const char *pszFunction, unsigned uLineNo)
1723{
1724 uint32_t cErrors = 0;
1725
1726 /*
1727 * Count the free pages in all the chunks and match it against pSet->cFreePages.
1728 */
1729 uint32_t cPages = 0;
1730 for (unsigned i = 0; i < RT_ELEMENTS(pSet->apLists); i++)
1731 {
1732 for (PGMMCHUNK pCur = pSet->apLists[i]; pCur; pCur = pCur->pFreeNext)
1733 {
1734 /** @todo check that the chunk is hashed into the right set. */
1735 cPages += pCur->cFree;
1736 }
1737 }
1738 if (RT_UNLIKELY(cPages != pSet->cFreePages))
1739 {
1740 SUPR0Printf("GMM insanity: found %#x pages in the %s set, expected %#x. (%s, line %u)\n",
1741 cPages, pszSetName, pSet->cFreePages, pszFunction, uLineNo);
1742 cErrors++;
1743 }
1744
1745 return cErrors;
1746}
1747
1748
1749/**
1750 * Performs some sanity checks on the GMM while owning the lock.
1751 *
1752 * @returns Error count.
1753 *
1754 * @param pGMM Pointer to the GMM instance.
1755 * @param pszFunction The function from which it is called.
1756 * @param uLineNo The line number.
1757 */
1758static uint32_t gmmR0SanityCheck(PGMM pGMM, const char *pszFunction, unsigned uLineNo)
1759{
1760 uint32_t cErrors = 0;
1761
1762 cErrors += gmmR0SanityCheckSet(pGMM, &pGMM->PrivateX, "private", pszFunction, uLineNo);
1763 cErrors += gmmR0SanityCheckSet(pGMM, &pGMM->Shared, "shared", pszFunction, uLineNo);
1764 /** @todo add more sanity checks. */
1765
1766 return cErrors;
1767}
1768
1769#endif /* GMMR0_WITH_SANITY_CHECK */
1770
1771/**
1772 * Looks up a chunk in the tree and fills in the TLB entry for it.
1773 *
1774 * This is not expected to fail and will bitch if it does.
1775 *
1776 * @returns Pointer to the allocation chunk, NULL if not found.
1777 * @param pGMM Pointer to the GMM instance.
1778 * @param idChunk The ID of the chunk to find.
1779 * @param pTlbe Pointer to the TLB entry.
1780 *
1781 * @note Caller owns spinlock.
1782 */
1783static PGMMCHUNK gmmR0GetChunkSlow(PGMM pGMM, uint32_t idChunk, PGMMCHUNKTLBE pTlbe)
1784{
1785 PGMMCHUNK pChunk = (PGMMCHUNK)RTAvlU32Get(&pGMM->pChunks, idChunk);
1786 AssertMsgReturn(pChunk, ("Chunk %#x not found!\n", idChunk), NULL);
1787 pTlbe->idChunk = idChunk;
1788 pTlbe->pChunk = pChunk;
1789 return pChunk;
1790}
1791
1792
1793/**
1794 * Finds an allocation chunk, spin-locked.
1795 *
1796 * This is not expected to fail and will bitch if it does.
1797 *
1798 * @returns Pointer to the allocation chunk, NULL if not found.
1799 * @param pGMM Pointer to the GMM instance.
1800 * @param idChunk The ID of the chunk to find.
1801 */
1802DECLINLINE(PGMMCHUNK) gmmR0GetChunkLocked(PGMM pGMM, uint32_t idChunk)
1803{
1804 /*
1805 * Do a TLB lookup, branch if not in the TLB.
1806 */
1807 PGMMCHUNKTLBE pTlbe = &pGMM->ChunkTLB.aEntries[GMM_CHUNKTLB_IDX(idChunk)];
1808 PGMMCHUNK pChunk = pTlbe->pChunk;
1809 if ( pChunk == NULL
1810 || pTlbe->idChunk != idChunk)
1811 pChunk = gmmR0GetChunkSlow(pGMM, idChunk, pTlbe);
1812 return pChunk;
1813}
1814
1815
1816/**
1817 * Finds an allocation chunk.
1818 *
1819 * This is not expected to fail and will bitch if it does.
1820 *
1821 * @returns Pointer to the allocation chunk, NULL if not found.
1822 * @param pGMM Pointer to the GMM instance.
1823 * @param idChunk The ID of the chunk to find.
1824 */
1825DECLINLINE(PGMMCHUNK) gmmR0GetChunk(PGMM pGMM, uint32_t idChunk)
1826{
1827 RTSpinlockAcquire(pGMM->hSpinLockTree);
1828 PGMMCHUNK pChunk = gmmR0GetChunkLocked(pGMM, idChunk);
1829 RTSpinlockRelease(pGMM->hSpinLockTree);
1830 return pChunk;
1831}
1832
1833
1834/**
1835 * Finds a page.
1836 *
1837 * This is not expected to fail and will bitch if it does.
1838 *
1839 * @returns Pointer to the page, NULL if not found.
1840 * @param pGMM Pointer to the GMM instance.
1841 * @param idPage The ID of the page to find.
1842 */
1843DECLINLINE(PGMMPAGE) gmmR0GetPage(PGMM pGMM, uint32_t idPage)
1844{
1845 PGMMCHUNK pChunk = gmmR0GetChunk(pGMM, idPage >> GMM_CHUNKID_SHIFT);
1846 if (RT_LIKELY(pChunk))
1847 return &pChunk->aPages[idPage & GMM_PAGEID_IDX_MASK];
1848 return NULL;
1849}
1850
1851
1852#if 0 /* unused */
1853/**
1854 * Gets the host physical address for a page given by its ID.
1855 *
1856 * @returns The host physical address or NIL_RTHCPHYS.
1857 * @param pGMM Pointer to the GMM instance.
1858 * @param idPage The ID of the page to find.
1859 */
1860DECLINLINE(RTHCPHYS) gmmR0GetPageHCPhys(PGMM pGMM, uint32_t idPage)
1861{
1862 PGMMCHUNK pChunk = gmmR0GetChunk(pGMM, idPage >> GMM_CHUNKID_SHIFT);
1863 if (RT_LIKELY(pChunk))
1864 return RTR0MemObjGetPagePhysAddr(pChunk->hMemObj, idPage & GMM_PAGEID_IDX_MASK);
1865 return NIL_RTHCPHYS;
1866}
1867#endif /* unused */
1868
1869
1870/**
1871 * Selects the appropriate free list given the number of free pages.
1872 *
1873 * @returns Free list index.
1874 * @param cFree The number of free pages in the chunk.
1875 */
1876DECLINLINE(unsigned) gmmR0SelectFreeSetList(unsigned cFree)
1877{
1878 unsigned iList = cFree >> GMM_CHUNK_FREE_SET_SHIFT;
1879 AssertMsg(iList < RT_SIZEOFMEMB(GMMCHUNKFREESET, apLists) / RT_SIZEOFMEMB(GMMCHUNKFREESET, apLists[0]),
1880 ("%d (%u)\n", iList, cFree));
1881 return iList;
1882}
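
/*
 * Illustrative sketch (not compiled): the list index is simply
 * cFree >> GMM_CHUNK_FREE_SET_SHIFT, so chunks with only a few free pages
 * cluster on the low-numbered lists, while a completely free chunk lands on
 * the highest index (presumably the GMM_CHUNK_FREE_SET_UNUSED_LIST slot that
 * the empty-chunk scan further down picks from).
 *
 * @code
 *      // Assuming GMM_CHUNK_FREE_SET_SHIFT >= 1:
 *      unsigned const iListFew  = gmmR0SelectFreeSetList(1);                   // -> 0
 *      unsigned const iListFull = gmmR0SelectFreeSetList(GMM_CHUNK_NUM_PAGES); // -> highest index
 * @endcode
 */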
1883
1884
1885/**
1886 * Unlinks the chunk from the free list it's currently on (if any).
1887 *
1888 * @param pChunk The allocation chunk.
1889 */
1890DECLINLINE(void) gmmR0UnlinkChunk(PGMMCHUNK pChunk)
1891{
1892 PGMMCHUNKFREESET pSet = pChunk->pSet;
1893 if (RT_LIKELY(pSet))
1894 {
1895 pSet->cFreePages -= pChunk->cFree;
1896 pSet->idGeneration++;
1897
1898 PGMMCHUNK pPrev = pChunk->pFreePrev;
1899 PGMMCHUNK pNext = pChunk->pFreeNext;
1900 if (pPrev)
1901 pPrev->pFreeNext = pNext;
1902 else
1903 pSet->apLists[gmmR0SelectFreeSetList(pChunk->cFree)] = pNext;
1904 if (pNext)
1905 pNext->pFreePrev = pPrev;
1906
1907 pChunk->pSet = NULL;
1908 pChunk->pFreeNext = NULL;
1909 pChunk->pFreePrev = NULL;
1910 }
1911 else
1912 {
1913 Assert(!pChunk->pFreeNext);
1914 Assert(!pChunk->pFreePrev);
1915 Assert(!pChunk->cFree);
1916 }
1917}
1918
1919
1920/**
1921 * Links the chunk onto the appropriate free list in the specified free set.
1922 *
1923 * If the chunk has no free entries, it's not linked into any list.
1924 *
1925 * @param pChunk The allocation chunk.
1926 * @param pSet The free set.
1927 */
1928DECLINLINE(void) gmmR0LinkChunk(PGMMCHUNK pChunk, PGMMCHUNKFREESET pSet)
1929{
1930 Assert(!pChunk->pSet);
1931 Assert(!pChunk->pFreeNext);
1932 Assert(!pChunk->pFreePrev);
1933
1934 if (pChunk->cFree > 0)
1935 {
1936 pChunk->pSet = pSet;
1937 pChunk->pFreePrev = NULL;
1938 unsigned const iList = gmmR0SelectFreeSetList(pChunk->cFree);
1939 pChunk->pFreeNext = pSet->apLists[iList];
1940 if (pChunk->pFreeNext)
1941 pChunk->pFreeNext->pFreePrev = pChunk;
1942 pSet->apLists[iList] = pChunk;
1943
1944 pSet->cFreePages += pChunk->cFree;
1945 pSet->idGeneration++;
1946 }
1947}
1948
1949
1950/**
1951 * Selects the appropriate free set for the chunk and links it onto the right free list there.
1952 *
1953 * If the chunk has no free entries, it's not linked into any list.
1954 *
1955 * @param pGMM Pointer to the GMM instance.
1956 * @param pGVM Pointer to the kernel-only VM instance data.
1957 * @param pChunk The allocation chunk.
1958 */
1959DECLINLINE(void) gmmR0SelectSetAndLinkChunk(PGMM pGMM, PGVM pGVM, PGMMCHUNK pChunk)
1960{
1961 PGMMCHUNKFREESET pSet;
1962 if (pGMM->fBoundMemoryMode)
1963 pSet = &pGVM->gmm.s.Private;
1964 else if (pChunk->cShared)
1965 pSet = &pGMM->Shared;
1966 else
1967 pSet = &pGMM->PrivateX;
1968 gmmR0LinkChunk(pChunk, pSet);
1969}
1970
1971
1972/**
1973 * Frees a Chunk ID.
1974 *
1975 * @param pGMM Pointer to the GMM instance.
1976 * @param idChunk The Chunk ID to free.
1977 */
1978static void gmmR0FreeChunkId(PGMM pGMM, uint32_t idChunk)
1979{
1980 AssertReturnVoid(idChunk != NIL_GMM_CHUNKID);
1981 AssertMsg(ASMBitTest(&pGMM->bmChunkId[0], idChunk), ("%#x\n", idChunk));
1982 ASMAtomicBitClear(&pGMM->bmChunkId[0], idChunk);
1983}
1984
1985
1986/**
1987 * Allocates a new Chunk ID.
1988 *
1989 * @returns The Chunk ID.
1990 * @param pGMM Pointer to the GMM instance.
1991 */
1992static uint32_t gmmR0AllocateChunkId(PGMM pGMM)
1993{
1994 AssertCompile(!((GMM_CHUNKID_LAST + 1) & 31)); /* must be a multiple of 32 */
1995 AssertCompile(NIL_GMM_CHUNKID == 0);
1996
1997 /*
1998 * Try the next sequential one.
1999 */
2000 int32_t idChunk = ++pGMM->idChunkPrev;
2001 if ( (uint32_t)idChunk <= GMM_CHUNKID_LAST
2002 && idChunk > NIL_GMM_CHUNKID
2003 && !ASMAtomicBitTestAndSet(&pGMM->bmChunkId[0], idChunk))
2004 return idChunk;
2005
2006 /*
2007 * Scan sequentially from the last one.
2008 */
2009 if ( (uint32_t)idChunk < GMM_CHUNKID_LAST
2010 && idChunk > NIL_GMM_CHUNKID)
2011 {
2012 idChunk = ASMBitNextClear(&pGMM->bmChunkId[0], GMM_CHUNKID_LAST + 1, idChunk - 1);
2013 if (idChunk > NIL_GMM_CHUNKID)
2014 {
2015 AssertMsgReturn(!ASMAtomicBitTestAndSet(&pGMM->bmChunkId[0], idChunk), ("%#x\n", idChunk), NIL_GMM_CHUNKID);
2016 return pGMM->idChunkPrev = idChunk;
2017 }
2018 }
2019
2020 /*
2021 * Ok, scan from the start.
2022 * We're not racing anyone, so there is no need to expect failures or have restart loops.
2023 */
2024 idChunk = ASMBitFirstClear(&pGMM->bmChunkId[0], GMM_CHUNKID_LAST + 1);
2025 AssertMsgReturn(idChunk > NIL_GMM_CHUNKID, ("%#x\n", idChunk), NIL_GMM_CHUNKID);
2026 AssertMsgReturn(!ASMAtomicBitTestAndSet(&pGMM->bmChunkId[0], idChunk), ("%#x\n", idChunk), NIL_GMM_CHUNKID);
2027
2028 return pGMM->idChunkPrev = idChunk;
2029}
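
/*
 * Illustrative pairing (sketch only): an ID handed out here must eventually
 * be returned with gmmR0FreeChunkId, otherwise its bit stays set in bmChunkId
 * and the ID is leaked for the lifetime of the GMM instance.
 *
 * @code
 *      uint32_t const idChunk = gmmR0AllocateChunkId(pGMM);
 *      if (idChunk != NIL_GMM_CHUNKID)
 *      {
 *          // ... normally used as the AVL key: pChunk->Core.Key = idChunk; ...
 *          gmmR0FreeChunkId(pGMM, idChunk);
 *      }
 * @endcode
 */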
2030
2031
2032/**
2033 * Allocates one private page.
2034 *
2035 * Worker for gmmR0AllocatePagesFromChunk.
2036 *
2037 * @param pChunk The chunk to allocate it from.
2038 * @param hGVM The GVM handle of the VM requesting memory.
2039 * @param pPageDesc The page descriptor.
2040 */
2041static void gmmR0AllocatePage(PGMMCHUNK pChunk, uint32_t hGVM, PGMMPAGEDESC pPageDesc)
2042{
2043 /* update the chunk stats. */
2044 if (pChunk->hGVM == NIL_GVM_HANDLE)
2045 pChunk->hGVM = hGVM;
2046 Assert(pChunk->cFree);
2047 pChunk->cFree--;
2048 pChunk->cPrivate++;
2049
2050 /* unlink the first free page. */
2051 const uint32_t iPage = pChunk->iFreeHead;
2052 AssertReleaseMsg(iPage < RT_ELEMENTS(pChunk->aPages), ("%d\n", iPage));
2053 PGMMPAGE pPage = &pChunk->aPages[iPage];
2054 Assert(GMM_PAGE_IS_FREE(pPage));
2055 pChunk->iFreeHead = pPage->Free.iNext;
2056 Log3(("A pPage=%p iPage=%#x/%#x u2State=%d iFreeHead=%#x iNext=%#x\n",
2057 pPage, iPage, (pChunk->Core.Key << GMM_CHUNKID_SHIFT) | iPage,
2058 pPage->Common.u2State, pChunk->iFreeHead, pPage->Free.iNext));
2059
2060 bool const fZeroed = pPage->Free.fZeroed;
2061
2062 /* make the page private. */
2063 pPage->u = 0;
2064 AssertCompile(GMM_PAGE_STATE_PRIVATE == 0);
2065 pPage->Private.hGVM = hGVM;
2066 AssertCompile(NIL_RTHCPHYS >= GMM_GCPHYS_LAST);
2067 AssertCompile(GMM_GCPHYS_UNSHAREABLE >= GMM_GCPHYS_LAST);
2068 if (pPageDesc->HCPhysGCPhys <= GMM_GCPHYS_LAST)
2069 pPage->Private.pfn = pPageDesc->HCPhysGCPhys >> PAGE_SHIFT;
2070 else
2071 pPage->Private.pfn = GMM_PAGE_PFN_UNSHAREABLE; /* unshareable / unassigned - same thing. */
2072
2073 /* update the page descriptor. */
2074 pPageDesc->idSharedPage = NIL_GMM_PAGEID;
2075 pPageDesc->idPage = (pChunk->Core.Key << GMM_CHUNKID_SHIFT) | iPage;
2076 RTHCPHYS const HCPhys = RTR0MemObjGetPagePhysAddr(pChunk->hMemObj, iPage);
2077 Assert(HCPhys != NIL_RTHCPHYS); Assert(HCPhys < NIL_GMMPAGEDESC_PHYS);
2078 pPageDesc->HCPhysGCPhys = HCPhys;
2079 pPageDesc->fZeroed = fZeroed;
2080}
2081
2082
2083/**
2084 * Picks the free pages from a chunk.
2085 *
2086 * @returns The new page descriptor table index.
2087 * @param pChunk The chunk.
2088 * @param hGVM The affinity of the chunk. NIL_GVM_HANDLE for no
2089 * affinity.
2090 * @param iPage The current page descriptor table index.
2091 * @param cPages The total number of pages to allocate.
2092 * @param paPages The page descriptor table (input + output).
2093 */
2094static uint32_t gmmR0AllocatePagesFromChunk(PGMMCHUNK pChunk, uint16_t const hGVM, uint32_t iPage, uint32_t cPages,
2095 PGMMPAGEDESC paPages)
2096{
2097 PGMMCHUNKFREESET pSet = pChunk->pSet; Assert(pSet);
2098 gmmR0UnlinkChunk(pChunk);
2099
2100 for (; pChunk->cFree && iPage < cPages; iPage++)
2101 gmmR0AllocatePage(pChunk, hGVM, &paPages[iPage]);
2102
2103 gmmR0LinkChunk(pChunk, pSet);
2104 return iPage;
2105}
2106
2107
2108/**
2109 * Registers a new chunk of memory.
2110 *
2111 * This is called by gmmR0AllocateChunkNew and GMMR0AllocateLargePage.
2112 *
2113 * In the GMMR0AllocateLargePage case the GMM_CHUNK_FLAGS_LARGE_PAGE flag is
2114 * set and the chunk will be registered as fully allocated to save time.
2115 *
2116 * @returns VBox status code. On success, the giant GMM lock will be held, the
2117 * caller must release it (ugly).
2118 * @param pGMM Pointer to the GMM instance.
2119 * @param pSet Pointer to the set.
2120 * @param hMemObj The memory object for the chunk.
2121 * @param hGVM The affinity of the chunk. NIL_GVM_HANDLE for no
2122 * affinity.
2123 * @param pSession The session of the VM identified by @a hGVM (used for chunk ownership).
2124 * @param fChunkFlags The chunk flags, GMM_CHUNK_FLAGS_XXX.
2125 * @param ppChunk Chunk address (out).
2126 *
2127 * @remarks The caller must not own the giant GMM mutex.
2128 * The giant GMM mutex will be acquired and returned acquired in
2129 * the success path. On failure, no locks will be held.
2130 */
2131static int gmmR0RegisterChunk(PGMM pGMM, PGMMCHUNKFREESET pSet, RTR0MEMOBJ hMemObj, uint16_t hGVM, PSUPDRVSESSION pSession,
2132 uint16_t fChunkFlags, PGMMCHUNK *ppChunk)
2133{
2134 Assert(pGMM->hMtxOwner != RTThreadNativeSelf());
2135 Assert(hGVM != NIL_GVM_HANDLE || pGMM->fBoundMemoryMode);
2136 Assert(fChunkFlags == 0 || fChunkFlags == GMM_CHUNK_FLAGS_LARGE_PAGE);
2137
2138#ifndef VBOX_WITH_LINEAR_HOST_PHYS_MEM
2139 /*
2140 * Get a ring-0 mapping of the object.
2141 */
2142 uint8_t *pbMapping = (uint8_t *)RTR0MemObjAddress(hMemObj);
2143 if (!pbMapping)
2144 {
2145 RTR0MEMOBJ hMapObj;
2146 int rc = RTR0MemObjMapKernel(&hMapObj, hMemObj, (void *)-1, 0, RTMEM_PROT_READ | RTMEM_PROT_WRITE);
2147 if (RT_SUCCESS(rc))
2148 pbMapping = (uint8_t *)RTR0MemObjAddress(hMapObj);
2149 else
2150 return rc;
2151 AssertPtr(pbMapping);
2152 }
2153#endif
2154
2155 /*
2156 * Allocate a chunk.
2157 */
2158 int rc;
2159 PGMMCHUNK pChunk = (PGMMCHUNK)RTMemAllocZ(sizeof(*pChunk));
2160 if (pChunk)
2161 {
2162 /*
2163 * Initialize it.
2164 */
2165 pChunk->hMemObj = hMemObj;
2166#ifndef VBOX_WITH_LINEAR_HOST_PHYS_MEM
2167 pChunk->pbMapping = pbMapping;
2168#endif
2169 pChunk->hGVM = hGVM;
2170 pChunk->idNumaNode = gmmR0GetCurrentNumaNodeId();
2171 pChunk->iChunkMtx = UINT8_MAX;
2172 pChunk->fFlags = fChunkFlags;
2173 pChunk->uidOwner = pSession ? SUPR0GetSessionUid(pSession) : NIL_RTUID;
2174 /*pChunk->cShared = 0; */
2175
2176 if (!(fChunkFlags & GMM_CHUNK_FLAGS_LARGE_PAGE))
2177 {
2178 /* Queue all pages on the free list. */
2179 pChunk->cFree = GMM_CHUNK_NUM_PAGES;
2180 /*pChunk->cPrivate = 0; */
2181 /*pChunk->iFreeHead = 0;*/
2182
2183 for (unsigned iPage = 0; iPage < RT_ELEMENTS(pChunk->aPages) - 1; iPage++)
2184 {
2185 pChunk->aPages[iPage].Free.u2State = GMM_PAGE_STATE_FREE;
2186 pChunk->aPages[iPage].Free.fZeroed = true;
2187 pChunk->aPages[iPage].Free.iNext = iPage + 1;
2188 }
2189 pChunk->aPages[RT_ELEMENTS(pChunk->aPages) - 1].Free.u2State = GMM_PAGE_STATE_FREE;
2190 pChunk->aPages[RT_ELEMENTS(pChunk->aPages) - 1].Free.fZeroed = true;
2191 pChunk->aPages[RT_ELEMENTS(pChunk->aPages) - 1].Free.iNext = UINT16_MAX;
2192 }
2193 else
2194 {
2195 /* Mark all pages as privately allocated (watered down gmmR0AllocatePage). */
2196 pChunk->cFree = 0;
2197 pChunk->cPrivate = GMM_CHUNK_NUM_PAGES;
2198 pChunk->iFreeHead = UINT16_MAX;
2199
2200 for (unsigned iPage = 0; iPage < RT_ELEMENTS(pChunk->aPages); iPage++)
2201 {
2202 pChunk->aPages[iPage].Private.pfn = GMM_PAGE_PFN_UNSHAREABLE;
2203 pChunk->aPages[iPage].Private.hGVM = hGVM;
2204 pChunk->aPages[iPage].Private.u2State = GMM_PAGE_STATE_PRIVATE;
2205 }
2206 }
2207
2208 /*
2209 * Zero the memory if it wasn't zeroed by the host already.
2210 * This simplifies keeping secret kernel bits from userland and brings
2211 * everyone to the same level wrt allocation zeroing.
2212 */
2213 rc = VINF_SUCCESS;
2214 if (!RTR0MemObjWasZeroInitialized(hMemObj))
2215 {
2216#ifdef VBOX_WITH_LINEAR_HOST_PHYS_MEM
2217 for (uint32_t iPage = 0; iPage < (GMM_CHUNK_SIZE >> PAGE_SHIFT); iPage++)
2218 {
2219 void *pvPage = NULL;
2220 rc = SUPR0HCPhysToVirt(RTR0MemObjGetPagePhysAddr(hMemObj, iPage), &pvPage);
2221 AssertRC(rc);
2222 if (RT_SUCCESS(rc))
2223 RT_BZERO(pvPage, PAGE_SIZE);
2224 else
2225 break;
2226 }
2227#else
2228 RT_BZERO(pbMapping, GMM_CHUNK_SIZE);
2229#endif
2230 }
2231 if (RT_SUCCESS(rc))
2232 {
2233 *ppChunk = pChunk;
2234
2235 /*
2236 * Allocate a Chunk ID and insert it into the tree.
2237 * This has to be done behind the mutex of course.
2238 */
2239 rc = gmmR0MutexAcquire(pGMM);
2240 if (RT_SUCCESS(rc))
2241 {
2242 if (GMM_CHECK_SANITY_UPON_ENTERING(pGMM))
2243 {
2244 pChunk->Core.Key = gmmR0AllocateChunkId(pGMM);
2245 if ( pChunk->Core.Key != NIL_GMM_CHUNKID
2246 && pChunk->Core.Key <= GMM_CHUNKID_LAST)
2247 {
2248 RTSpinlockAcquire(pGMM->hSpinLockTree);
2249 if (RTAvlU32Insert(&pGMM->pChunks, &pChunk->Core))
2250 {
2251 pGMM->cChunks++;
2252 RTListAppend(&pGMM->ChunkList, &pChunk->ListNode);
2253 RTSpinlockRelease(pGMM->hSpinLockTree);
2254
2255 gmmR0LinkChunk(pChunk, pSet);
2256
2257 LogFlow(("gmmR0RegisterChunk: pChunk=%p id=%#x cChunks=%d\n", pChunk, pChunk->Core.Key, pGMM->cChunks));
2258
2259 GMM_CHECK_SANITY_UPON_LEAVING(pGMM);
2260 return VINF_SUCCESS;
2261 }
2262 RTSpinlockRelease(pGMM->hSpinLockTree);
2263 }
2264
2265 /*
2266 * Bail out.
2267 */
2268 rc = VERR_GMM_CHUNK_INSERT;
2269 }
2270 else
2271 rc = VERR_GMM_IS_NOT_SANE;
2272 gmmR0MutexRelease(pGMM);
2273 }
2274
2275 *ppChunk = NULL;
2276 }
2277 RTMemFree(pChunk);
2278 }
2279 else
2280 rc = VERR_NO_MEMORY;
2281 return rc;
2282}
2283
2284
2285/**
2286 * Allocates a new chunk, immediately picks the requested pages from it, and adds
2287 * what's remaining to the specified free set.
2288 *
2289 * @note This will leave the giant mutex while allocating the new chunk!
2290 *
2291 * @returns VBox status code.
2292 * @param pGMM Pointer to the GMM instance data.
2293 * @param pGVM Pointer to the kernel-only VM instance data.
2294 * @param pSet Pointer to the free set.
2295 * @param cPages The number of pages requested.
2296 * @param paPages The page descriptor table (input + output).
2297 * @param piPage The pointer to the page descriptor table index variable.
2298 * This will be updated.
2299 */
2300static int gmmR0AllocateChunkNew(PGMM pGMM, PGVM pGVM, PGMMCHUNKFREESET pSet, uint32_t cPages,
2301 PGMMPAGEDESC paPages, uint32_t *piPage)
2302{
2303 gmmR0MutexRelease(pGMM);
2304
2305 RTR0MEMOBJ hMemObj;
2306 int rc;
2307#ifdef VBOX_WITH_LINEAR_HOST_PHYS_MEM
2308 if (pGMM->fHasWorkingAllocPhysNC)
2309 rc = RTR0MemObjAllocPhysNC(&hMemObj, GMM_CHUNK_SIZE, NIL_RTHCPHYS);
2310 else
2311#endif
2312 rc = RTR0MemObjAllocPage(&hMemObj, GMM_CHUNK_SIZE, false /*fExecutable*/);
2313 if (RT_SUCCESS(rc))
2314 {
2315 /** @todo Duplicate gmmR0RegisterChunk here so we can avoid chaining up the
2316 * free pages first and then unchaining them right afterwards. Instead
2317 * do as much work as possible without holding the giant lock. */
2318 PGMMCHUNK pChunk;
2319 rc = gmmR0RegisterChunk(pGMM, pSet, hMemObj, pGVM->hSelf, pGVM->pSession, 0 /*fChunkFlags*/, &pChunk);
2320 if (RT_SUCCESS(rc))
2321 {
2322 *piPage = gmmR0AllocatePagesFromChunk(pChunk, pGVM->hSelf, *piPage, cPages, paPages);
2323 return VINF_SUCCESS;
2324 }
2325
2326 /* bail out */
2327 RTR0MemObjFree(hMemObj, true /* fFreeMappings */);
2328 }
2329
2330 int rc2 = gmmR0MutexAcquire(pGMM);
2331 AssertRCReturn(rc2, RT_FAILURE(rc) ? rc : rc2);
2332 return rc;
2333
2334}
2335
2336
2337/**
2338 * As a last resort we'll pick any page we can get.
2339 *
2340 * @returns The new page descriptor table index.
2341 * @param pSet The set to pick from.
2342 * @param pGVM Pointer to the global VM structure.
2343 * @param uidSelf The UID of the caller.
2344 * @param iPage The current page descriptor table index.
2345 * @param cPages The total number of pages to allocate.
2346 * @param paPages The page descriptor table (input + output).
2347 */
2348static uint32_t gmmR0AllocatePagesIndiscriminately(PGMMCHUNKFREESET pSet, PGVM pGVM, RTUID uidSelf,
2349 uint32_t iPage, uint32_t cPages, PGMMPAGEDESC paPages)
2350{
2351 unsigned iList = RT_ELEMENTS(pSet->apLists);
2352 while (iList-- > 0)
2353 {
2354 PGMMCHUNK pChunk = pSet->apLists[iList];
2355 while (pChunk)
2356 {
2357 PGMMCHUNK pNext = pChunk->pFreeNext;
2358 if ( pChunk->uidOwner == uidSelf
2359 || ( pChunk->cMappingsX == 0
2360 && pChunk->cFree == (GMM_CHUNK_SIZE >> PAGE_SHIFT)))
2361 {
2362 iPage = gmmR0AllocatePagesFromChunk(pChunk, pGVM->hSelf, iPage, cPages, paPages);
2363 if (iPage >= cPages)
2364 return iPage;
2365 }
2366
2367 pChunk = pNext;
2368 }
2369 }
2370 return iPage;
2371}
2372
2373
2374/**
2375 * Pick pages from empty chunks on the same NUMA node.
2376 *
2377 * @returns The new page descriptor table index.
2378 * @param pSet The set to pick from.
2379 * @param pGVM Pointer to the global VM structure.
2380 * @param uidSelf The UID of the caller.
2381 * @param iPage The current page descriptor table index.
2382 * @param cPages The total number of pages to allocate.
2383 * @param paPages The page descriptor table (input + output).
2384 */
2385static uint32_t gmmR0AllocatePagesFromEmptyChunksOnSameNode(PGMMCHUNKFREESET pSet, PGVM pGVM, RTUID uidSelf,
2386 uint32_t iPage, uint32_t cPages, PGMMPAGEDESC paPages)
2387{
2388 PGMMCHUNK pChunk = pSet->apLists[GMM_CHUNK_FREE_SET_UNUSED_LIST];
2389 if (pChunk)
2390 {
2391 uint16_t const idNumaNode = gmmR0GetCurrentNumaNodeId();
2392 while (pChunk)
2393 {
2394 PGMMCHUNK pNext = pChunk->pFreeNext;
2395
2396 if ( pChunk->idNumaNode == idNumaNode
2397 && ( pChunk->uidOwner == uidSelf
2398 || pChunk->cMappingsX == 0))
2399 {
2400 pChunk->hGVM = pGVM->hSelf;
2401 pChunk->uidOwner = uidSelf;
2402 iPage = gmmR0AllocatePagesFromChunk(pChunk, pGVM->hSelf, iPage, cPages, paPages);
2403 if (iPage >= cPages)
2404 {
2405 pGVM->gmm.s.idLastChunkHint = pChunk->cFree ? pChunk->Core.Key : NIL_GMM_CHUNKID;
2406 return iPage;
2407 }
2408 }
2409
2410 pChunk = pNext;
2411 }
2412 }
2413 return iPage;
2414}
2415
2416
2417/**
2418 * Pick pages from non-empty chunks on the same NUMA node.
2419 *
2420 * @returns The new page descriptor table index.
2421 * @param pSet The set to pick from.
2422 * @param pGVM Pointer to the global VM structure.
2423 * @param uidSelf The UID of the caller.
2424 * @param iPage The current page descriptor table index.
2425 * @param cPages The total number of pages to allocate.
2426 * @param paPages The page descriptor table (input + output).
2427 */
2428static uint32_t gmmR0AllocatePagesFromSameNode(PGMMCHUNKFREESET pSet, PGVM pGVM, RTUID const uidSelf,
2429 uint32_t iPage, uint32_t cPages, PGMMPAGEDESC paPages)
2430{
2431 /** @todo start by picking from chunks with about the right size first? */
2432 uint16_t const idNumaNode = gmmR0GetCurrentNumaNodeId();
2433 unsigned iList = GMM_CHUNK_FREE_SET_UNUSED_LIST;
2434 while (iList-- > 0)
2435 {
2436 PGMMCHUNK pChunk = pSet->apLists[iList];
2437 while (pChunk)
2438 {
2439 PGMMCHUNK pNext = pChunk->pFreeNext;
2440
2441 if ( pChunk->idNumaNode == idNumaNode
2442 && pChunk->uidOwner == uidSelf)
2443 {
2444 iPage = gmmR0AllocatePagesFromChunk(pChunk, pGVM->hSelf, iPage, cPages, paPages);
2445 if (iPage >= cPages)
2446 {
2447 pGVM->gmm.s.idLastChunkHint = pChunk->cFree ? pChunk->Core.Key : NIL_GMM_CHUNKID;
2448 return iPage;
2449 }
2450 }
2451
2452 pChunk = pNext;
2453 }
2454 }
2455 return iPage;
2456}
2457
2458
2459/**
2460 * Pick pages that are in chunks already associated with the VM.
2461 *
2462 * @returns The new page descriptor table index.
2463 * @param pGMM Pointer to the GMM instance data.
2464 * @param pGVM Pointer to the global VM structure.
2465 * @param pSet The set to pick from.
2466 * @param iPage The current page descriptor table index.
2467 * @param cPages The total number of pages to allocate.
2468 * @param paPages The page descriptor table (input + output).
2469 */
2470static uint32_t gmmR0AllocatePagesAssociatedWithVM(PGMM pGMM, PGVM pGVM, PGMMCHUNKFREESET pSet,
2471 uint32_t iPage, uint32_t cPages, PGMMPAGEDESC paPages)
2472{
2473 uint16_t const hGVM = pGVM->hSelf;
2474
2475 /* Hint. */
2476 if (pGVM->gmm.s.idLastChunkHint != NIL_GMM_CHUNKID)
2477 {
2478 PGMMCHUNK pChunk = gmmR0GetChunk(pGMM, pGVM->gmm.s.idLastChunkHint);
2479 if (pChunk && pChunk->cFree)
2480 {
2481 iPage = gmmR0AllocatePagesFromChunk(pChunk, hGVM, iPage, cPages, paPages);
2482 if (iPage >= cPages)
2483 return iPage;
2484 }
2485 }
2486
2487 /* Scan. */
2488 for (unsigned iList = 0; iList < RT_ELEMENTS(pSet->apLists); iList++)
2489 {
2490 PGMMCHUNK pChunk = pSet->apLists[iList];
2491 while (pChunk)
2492 {
2493 PGMMCHUNK pNext = pChunk->pFreeNext;
2494
2495 if (pChunk->hGVM == hGVM)
2496 {
2497 iPage = gmmR0AllocatePagesFromChunk(pChunk, hGVM, iPage, cPages, paPages);
2498 if (iPage >= cPages)
2499 {
2500 pGVM->gmm.s.idLastChunkHint = pChunk->cFree ? pChunk->Core.Key : NIL_GMM_CHUNKID;
2501 return iPage;
2502 }
2503 }
2504
2505 pChunk = pNext;
2506 }
2507 }
2508 return iPage;
2509}
2510
2511
2512
2513/**
2514 * Pick pages in bound memory mode.
2515 *
2516 * @returns The new page descriptor table index.
2517 * @param pGVM Pointer to the global VM structure.
2518 * @param iPage The current page descriptor table index.
2519 * @param cPages The total number of pages to allocate.
2520 * @param paPages The page descriptor table (input + output).
2521 */
2522static uint32_t gmmR0AllocatePagesInBoundMode(PGVM pGVM, uint32_t iPage, uint32_t cPages, PGMMPAGEDESC paPages)
2523{
2524 for (unsigned iList = 0; iList < RT_ELEMENTS(pGVM->gmm.s.Private.apLists); iList++)
2525 {
2526 PGMMCHUNK pChunk = pGVM->gmm.s.Private.apLists[iList];
2527 while (pChunk)
2528 {
2529 Assert(pChunk->hGVM == pGVM->hSelf);
2530 PGMMCHUNK pNext = pChunk->pFreeNext;
2531 iPage = gmmR0AllocatePagesFromChunk(pChunk, pGVM->hSelf, iPage, cPages, paPages);
2532 if (iPage >= cPages)
2533 return iPage;
2534 pChunk = pNext;
2535 }
2536 }
2537 return iPage;
2538}
2539
2540
2541/**
2542 * Checks if we should start picking pages from chunks of other VMs because
2543 * we're getting close to the system memory or reserved limit.
2544 *
2545 * @returns @c true if we should, @c false if we should first try allocate more
2546 * chunks.
2547 */
2548static bool gmmR0ShouldAllocatePagesInOtherChunksBecauseOfLimits(PGVM pGVM)
2549{
2550 /*
2551 * Don't allocate a new chunk if we're getting close to the reservation limit.
2552 */
2553 uint64_t cPgReserved = pGVM->gmm.s.Stats.Reserved.cBasePages
2554 + pGVM->gmm.s.Stats.Reserved.cFixedPages
2555 - pGVM->gmm.s.Stats.cBalloonedPages
2556 /** @todo what about shared pages? */;
2557 uint64_t cPgAllocated = pGVM->gmm.s.Stats.Allocated.cBasePages
2558 + pGVM->gmm.s.Stats.Allocated.cFixedPages;
2559 uint64_t cPgDelta = cPgReserved - cPgAllocated;
2560 if (cPgDelta < GMM_CHUNK_NUM_PAGES * 4)
2561 return true;
2562 /** @todo make the threshold configurable, also test the code to see if
2563 * this ever kicks in (we might be reserving too much or smth). */
2564
2565 /*
2566 * Check how close we're to the max memory limit and how many fragments
2567 * there are?...
2568 */
2569 /** @todo */
2570
2571 return false;
2572}
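
/*
 * Worked example (assuming 2 MB chunks of 4 KiB pages, i.e.
 * GMM_CHUNK_NUM_PAGES = 512): the check above returns true once fewer than
 * 4 * 512 = 2048 reserved-but-unallocated pages (about 8 MB of headroom)
 * remain, at which point scavenging partially used chunks of other VMs is
 * preferred over claiming yet another chunk from the host.
 */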
2573
2574
2575/**
2576 * Checks if we should start picking pages from chunks of other VMs because
2577 * there are a lot of free pages around.
2578 *
2579 * @returns @c true if we should, @c false if we should first try allocate more
2580 * chunks.
2581 */
2582static bool gmmR0ShouldAllocatePagesInOtherChunksBecauseOfLotsFree(PGMM pGMM)
2583{
2584 /*
2585 * Setting the limit at 16 chunks (32 MB) at the moment.
2586 */
2587 if (pGMM->PrivateX.cFreePages >= GMM_CHUNK_NUM_PAGES * 16)
2588 return true;
2589 return false;
2590}
2591
2592
2593/**
2594 * Common worker for GMMR0AllocateHandyPages and GMMR0AllocatePages.
2595 *
2596 * @returns VBox status code:
2597 * @retval VINF_SUCCESS on success.
2598 * @retval VERR_GMM_HIT_GLOBAL_LIMIT if we've exhausted the available pages.
2599 * @retval VERR_GMM_HIT_VM_ACCOUNT_LIMIT if we've hit the VM account limit,
2600 * that is we're trying to allocate more than we've reserved.
2601 *
2602 * @param pGMM Pointer to the GMM instance data.
2603 * @param pGVM Pointer to the VM.
2604 * @param cPages The number of pages to allocate.
2605 * @param paPages Pointer to the page descriptors. See GMMPAGEDESC for
2606 * details on what is expected on input.
2607 * @param enmAccount The account to charge.
2608 *
2609 * @remarks Caller owns the giant GMM lock.
2610 */
2611static int gmmR0AllocatePagesNew(PGMM pGMM, PGVM pGVM, uint32_t cPages, PGMMPAGEDESC paPages, GMMACCOUNT enmAccount)
2612{
2613 Assert(pGMM->hMtxOwner == RTThreadNativeSelf());
2614
2615 /*
2616 * Check allocation limits.
2617 */
2618 if (RT_LIKELY(pGMM->cAllocatedPages + cPages <= pGMM->cMaxPages))
2619 { /* likely */ }
2620 else
2621 return VERR_GMM_HIT_GLOBAL_LIMIT;
2622
2623 switch (enmAccount)
2624 {
2625 case GMMACCOUNT_BASE:
2626 if (RT_LIKELY( pGVM->gmm.s.Stats.Allocated.cBasePages + pGVM->gmm.s.Stats.cBalloonedPages + cPages
2627 <= pGVM->gmm.s.Stats.Reserved.cBasePages))
2628 { /* likely */ }
2629 else
2630 {
2631 Log(("gmmR0AllocatePages:Base: Reserved=%#llx Allocated+Ballooned+Requested=%#llx+%#llx+%#x!\n",
2632 pGVM->gmm.s.Stats.Reserved.cBasePages, pGVM->gmm.s.Stats.Allocated.cBasePages,
2633 pGVM->gmm.s.Stats.cBalloonedPages, cPages));
2634 return VERR_GMM_HIT_VM_ACCOUNT_LIMIT;
2635 }
2636 break;
2637 case GMMACCOUNT_SHADOW:
2638 if (RT_LIKELY(pGVM->gmm.s.Stats.Allocated.cShadowPages + cPages <= pGVM->gmm.s.Stats.Reserved.cShadowPages))
2639 { /* likely */ }
2640 else
2641 {
2642 Log(("gmmR0AllocatePages:Shadow: Reserved=%#x Allocated+Requested=%#x+%#x!\n",
2643 pGVM->gmm.s.Stats.Reserved.cShadowPages, pGVM->gmm.s.Stats.Allocated.cShadowPages, cPages));
2644 return VERR_GMM_HIT_VM_ACCOUNT_LIMIT;
2645 }
2646 break;
2647 case GMMACCOUNT_FIXED:
2648 if (RT_LIKELY(pGVM->gmm.s.Stats.Allocated.cFixedPages + cPages <= pGVM->gmm.s.Stats.Reserved.cFixedPages))
2649 { /* likely */ }
2650 else
2651 {
2652 Log(("gmmR0AllocatePages:Fixed: Reserved=%#x Allocated+Requested=%#x+%#x!\n",
2653 pGVM->gmm.s.Stats.Reserved.cFixedPages, pGVM->gmm.s.Stats.Allocated.cFixedPages, cPages));
2654 return VERR_GMM_HIT_VM_ACCOUNT_LIMIT;
2655 }
2656 break;
2657 default:
2658 AssertMsgFailedReturn(("enmAccount=%d\n", enmAccount), VERR_IPE_NOT_REACHED_DEFAULT_CASE);
2659 }
2660
2661 /*
2662 * Update the accounts before we proceed because we might be leaving the
2663 * protection of the global mutex and thus run the risk of permitting
2664 * too much memory to be allocated.
2665 */
2666 switch (enmAccount)
2667 {
2668 case GMMACCOUNT_BASE: pGVM->gmm.s.Stats.Allocated.cBasePages += cPages; break;
2669 case GMMACCOUNT_SHADOW: pGVM->gmm.s.Stats.Allocated.cShadowPages += cPages; break;
2670 case GMMACCOUNT_FIXED: pGVM->gmm.s.Stats.Allocated.cFixedPages += cPages; break;
2671 default: AssertMsgFailedReturn(("enmAccount=%d\n", enmAccount), VERR_IPE_NOT_REACHED_DEFAULT_CASE);
2672 }
2673 pGVM->gmm.s.Stats.cPrivatePages += cPages;
2674 pGMM->cAllocatedPages += cPages;
2675
2676 /*
2677 * Bound mode is also relatively straightforward.
2678 */
2679 uint32_t iPage = 0;
2680 int rc = VINF_SUCCESS;
2681 if (pGMM->fBoundMemoryMode)
2682 {
2683 iPage = gmmR0AllocatePagesInBoundMode(pGVM, iPage, cPages, paPages);
2684 if (iPage < cPages)
2685 do
2686 rc = gmmR0AllocateChunkNew(pGMM, pGVM, &pGVM->gmm.s.Private, cPages, paPages, &iPage);
2687 while (iPage < cPages && RT_SUCCESS(rc));
2688 }
2689 /*
2690 * Shared mode is trickier as we should try to achieve the same locality as
2691 * in bound mode, but smartly make use of non-full chunks allocated by
2692 * other VMs if we're low on memory.
2693 */
2694 else
2695 {
2696 RTUID const uidSelf = SUPR0GetSessionUid(pGVM->pSession);
2697
2698 /* Pick the most optimal pages first. */
2699 iPage = gmmR0AllocatePagesAssociatedWithVM(pGMM, pGVM, &pGMM->PrivateX, iPage, cPages, paPages);
2700 if (iPage < cPages)
2701 {
2702 /* Maybe we should try getting pages from chunks "belonging" to
2703 other VMs before allocating more chunks? */
2704 bool fTriedOnSameAlready = false;
2705 if (gmmR0ShouldAllocatePagesInOtherChunksBecauseOfLimits(pGVM))
2706 {
2707 iPage = gmmR0AllocatePagesFromSameNode(&pGMM->PrivateX, pGVM, uidSelf, iPage, cPages, paPages);
2708 fTriedOnSameAlready = true;
2709 }
2710
2711 /* Allocate memory from empty chunks. */
2712 if (iPage < cPages)
2713 iPage = gmmR0AllocatePagesFromEmptyChunksOnSameNode(&pGMM->PrivateX, pGVM, uidSelf, iPage, cPages, paPages);
2714
2715 /* Grab empty shared chunks. */
2716 if (iPage < cPages)
2717 iPage = gmmR0AllocatePagesFromEmptyChunksOnSameNode(&pGMM->Shared, pGVM, uidSelf, iPage, cPages, paPages);
2718
2719 /* If there are a lot of free pages spread around, try not to waste
2720 system memory on more chunks. (Should trigger defragmentation.) */
2721 if ( !fTriedOnSameAlready
2722 && gmmR0ShouldAllocatePagesInOtherChunksBecauseOfLotsFree(pGMM))
2723 {
2724 iPage = gmmR0AllocatePagesFromSameNode(&pGMM->PrivateX, pGVM, uidSelf, iPage, cPages, paPages);
2725 if (iPage < cPages)
2726 iPage = gmmR0AllocatePagesIndiscriminately(&pGMM->PrivateX, pGVM, uidSelf, iPage, cPages, paPages);
2727 }
2728
2729 /*
2730 * Ok, try allocate new chunks.
2731 */
2732 if (iPage < cPages)
2733 {
2734 do
2735 rc = gmmR0AllocateChunkNew(pGMM, pGVM, &pGMM->PrivateX, cPages, paPages, &iPage);
2736 while (iPage < cPages && RT_SUCCESS(rc));
2737
2738#if 0 /* We cannot mix chunks with different UIDs. */
2739 /* If the host is out of memory, take whatever we can get. */
2740 if ( (rc == VERR_NO_MEMORY || rc == VERR_NO_PHYS_MEMORY)
2741 && pGMM->PrivateX.cFreePages + pGMM->Shared.cFreePages >= cPages - iPage)
2742 {
2743 iPage = gmmR0AllocatePagesIndiscriminately(&pGMM->PrivateX, pGVM, iPage, cPages, paPages);
2744 if (iPage < cPages)
2745 iPage = gmmR0AllocatePagesIndiscriminately(&pGMM->Shared, pGVM, iPage, cPages, paPages);
2746 AssertRelease(iPage == cPages);
2747 rc = VINF_SUCCESS;
2748 }
2749#endif
2750 }
2751 }
2752 }
2753
2754 /*
2755 * Clean up on failure. Since this is bound to be a low-memory condition
2756 * we will give back any empty chunks that might be hanging around.
2757 */
2758 if (RT_SUCCESS(rc))
2759 { /* likely */ }
2760 else
2761 {
2762 /* Update the statistics. */
2763 pGVM->gmm.s.Stats.cPrivatePages -= cPages;
2764 pGMM->cAllocatedPages -= cPages - iPage;
2765 switch (enmAccount)
2766 {
2767 case GMMACCOUNT_BASE: pGVM->gmm.s.Stats.Allocated.cBasePages -= cPages; break;
2768 case GMMACCOUNT_SHADOW: pGVM->gmm.s.Stats.Allocated.cShadowPages -= cPages; break;
2769 case GMMACCOUNT_FIXED: pGVM->gmm.s.Stats.Allocated.cFixedPages -= cPages; break;
2770 default: AssertMsgFailedReturn(("enmAccount=%d\n", enmAccount), VERR_IPE_NOT_REACHED_DEFAULT_CASE);
2771 }
2772
2773 /* Release the pages. */
2774 while (iPage-- > 0)
2775 {
2776 uint32_t idPage = paPages[iPage].idPage;
2777 PGMMPAGE pPage = gmmR0GetPage(pGMM, idPage);
2778 if (RT_LIKELY(pPage))
2779 {
2780 Assert(GMM_PAGE_IS_PRIVATE(pPage));
2781 Assert(pPage->Private.hGVM == pGVM->hSelf);
2782 gmmR0FreePrivatePage(pGMM, pGVM, idPage, pPage);
2783 }
2784 else
2785 AssertMsgFailed(("idPage=%#x\n", idPage));
2786
2787 paPages[iPage].idPage = NIL_GMM_PAGEID;
2788 paPages[iPage].idSharedPage = NIL_GMM_PAGEID;
2789 paPages[iPage].HCPhysGCPhys = NIL_GMMPAGEDESC_PHYS;
2790 paPages[iPage].fZeroed = false;
2791 }
2792
2793 /* Free empty chunks. */
2794 /** @todo */
2795
2796 /* return the fail status on failure */
2797 return rc;
2798 }
2799 return VINF_SUCCESS;
2800}
2801
2802
2803/**
2804 * Updates the previous allocations and allocates more pages.
2805 *
2806 * The handy pages are always taken from the 'base' memory account.
2807 * The allocated pages are not cleared and will contain random garbage.
2808 *
2809 * @returns VBox status code:
2810 * @retval VINF_SUCCESS on success.
2811 * @retval VERR_NOT_OWNER if the caller is not an EMT.
2812 * @retval VERR_GMM_PAGE_NOT_FOUND if one of the pages to update wasn't found.
2813 * @retval VERR_GMM_PAGE_NOT_PRIVATE if one of the pages to update wasn't a
2814 * private page.
2815 * @retval VERR_GMM_PAGE_NOT_SHARED if one of the pages to update wasn't a
2816 * shared page.
2817 * @retval VERR_GMM_NOT_PAGE_OWNER if one of the pages to be updated wasn't
2818 * owned by the VM.
2819 * @retval VERR_GMM_HIT_GLOBAL_LIMIT if we've exhausted the available pages.
2820 * @retval VERR_GMM_HIT_VM_ACCOUNT_LIMIT if we've hit the VM account limit,
2821 * that is we're trying to allocate more than we've reserved.
2822 *
2823 * @param pGVM The global (ring-0) VM structure.
2824 * @param idCpu The VCPU id.
2825 * @param cPagesToUpdate The number of pages to update (starting from the head).
2826 * @param cPagesToAlloc The number of pages to allocate (starting from the head).
2827 * @param paPages The array of page descriptors.
2828 * See GMMPAGEDESC for details on what is expected on input.
2829 * @thread EMT(idCpu)
2830 */
2831GMMR0DECL(int) GMMR0AllocateHandyPages(PGVM pGVM, VMCPUID idCpu, uint32_t cPagesToUpdate,
2832 uint32_t cPagesToAlloc, PGMMPAGEDESC paPages)
2833{
2834 LogFlow(("GMMR0AllocateHandyPages: pGVM=%p cPagesToUpdate=%#x cPagesToAlloc=%#x paPages=%p\n",
2835 pGVM, cPagesToUpdate, cPagesToAlloc, paPages));
2836
2837 /*
2838 * Validate, get basics and take the semaphore.
2839 * (This is a relatively busy path, so make predictions where possible.)
2840 */
2841 PGMM pGMM;
2842 GMM_GET_VALID_INSTANCE(pGMM, VERR_GMM_INSTANCE);
2843 int rc = GVMMR0ValidateGVMandEMT(pGVM, idCpu);
2844 if (RT_FAILURE(rc))
2845 return rc;
2846
2847 AssertPtrReturn(paPages, VERR_INVALID_PARAMETER);
2848 AssertMsgReturn( (cPagesToUpdate && cPagesToUpdate < 1024)
2849 || (cPagesToAlloc && cPagesToAlloc < 1024),
2850 ("cPagesToUpdate=%#x cPagesToAlloc=%#x\n", cPagesToUpdate, cPagesToAlloc),
2851 VERR_INVALID_PARAMETER);
2852
2853 unsigned iPage = 0;
2854 for (; iPage < cPagesToUpdate; iPage++)
2855 {
2856 AssertMsgReturn( ( paPages[iPage].HCPhysGCPhys <= GMM_GCPHYS_LAST
2857 && !(paPages[iPage].HCPhysGCPhys & PAGE_OFFSET_MASK))
2858 || paPages[iPage].HCPhysGCPhys == NIL_GMMPAGEDESC_PHYS
2859 || paPages[iPage].HCPhysGCPhys == GMM_GCPHYS_UNSHAREABLE,
2860 ("#%#x: %RHp\n", iPage, paPages[iPage].HCPhysGCPhys),
2861 VERR_INVALID_PARAMETER);
2862 /* ignore fZeroed here */
2863 AssertMsgReturn( paPages[iPage].idPage <= GMM_PAGEID_LAST
2864 /*|| paPages[iPage].idPage == NIL_GMM_PAGEID*/,
2865 ("#%#x: %#x\n", iPage, paPages[iPage].idPage), VERR_INVALID_PARAMETER);
2866 AssertMsgReturn( paPages[iPage].idSharedPage == NIL_GMM_PAGEID
2867 || paPages[iPage].idSharedPage <= GMM_PAGEID_LAST,
2868 ("#%#x: %#x\n", iPage, paPages[iPage].idSharedPage), VERR_INVALID_PARAMETER);
2869 }
2870
2871 for (; iPage < cPagesToAlloc; iPage++)
2872 {
2873 AssertMsgReturn(paPages[iPage].HCPhysGCPhys == NIL_GMMPAGEDESC_PHYS, ("#%#x: %RHp\n", iPage, paPages[iPage].HCPhysGCPhys), VERR_INVALID_PARAMETER);
2874 AssertMsgReturn(paPages[iPage].fZeroed == false, ("#%#x: %#x\n", iPage, paPages[iPage].fZeroed), VERR_INVALID_PARAMETER);
2875 AssertMsgReturn(paPages[iPage].idPage == NIL_GMM_PAGEID, ("#%#x: %#x\n", iPage, paPages[iPage].idPage), VERR_INVALID_PARAMETER);
2876 AssertMsgReturn(paPages[iPage].idSharedPage == NIL_GMM_PAGEID, ("#%#x: %#x\n", iPage, paPages[iPage].idSharedPage), VERR_INVALID_PARAMETER);
2877 }
2878
2879 gmmR0MutexAcquire(pGMM);
2880 if (GMM_CHECK_SANITY_UPON_ENTERING(pGMM))
2881 {
2882 /* No allocations before the initial reservation has been made! */
2883 if (RT_LIKELY( pGVM->gmm.s.Stats.Reserved.cBasePages
2884 && pGVM->gmm.s.Stats.Reserved.cFixedPages
2885 && pGVM->gmm.s.Stats.Reserved.cShadowPages))
2886 {
2887 /*
2888 * Perform the updates.
2889 * Stop on the first error.
2890 */
2891 for (iPage = 0; iPage < cPagesToUpdate; iPage++)
2892 {
2893 if (paPages[iPage].idPage != NIL_GMM_PAGEID)
2894 {
2895 PGMMPAGE pPage = gmmR0GetPage(pGMM, paPages[iPage].idPage);
2896 if (RT_LIKELY(pPage))
2897 {
2898 if (RT_LIKELY(GMM_PAGE_IS_PRIVATE(pPage)))
2899 {
2900 if (RT_LIKELY(pPage->Private.hGVM == pGVM->hSelf))
2901 {
2902 AssertCompile(NIL_RTHCPHYS > GMM_GCPHYS_LAST && GMM_GCPHYS_UNSHAREABLE > GMM_GCPHYS_LAST);
2903 if (RT_LIKELY(paPages[iPage].HCPhysGCPhys <= GMM_GCPHYS_LAST))
2904 pPage->Private.pfn = paPages[iPage].HCPhysGCPhys >> PAGE_SHIFT;
2905 else if (paPages[iPage].HCPhysGCPhys == GMM_GCPHYS_UNSHAREABLE)
2906 pPage->Private.pfn = GMM_PAGE_PFN_UNSHAREABLE;
2907 /* else: NIL_RTHCPHYS nothing */
2908
2909 paPages[iPage].idPage = NIL_GMM_PAGEID;
2910 paPages[iPage].HCPhysGCPhys = NIL_GMMPAGEDESC_PHYS;
2911 paPages[iPage].fZeroed = false;
2912 }
2913 else
2914 {
2915 Log(("GMMR0AllocateHandyPages: #%#x/%#x: Not owner! hGVM=%#x hSelf=%#x\n",
2916 iPage, paPages[iPage].idPage, pPage->Private.hGVM, pGVM->hSelf));
2917 rc = VERR_GMM_NOT_PAGE_OWNER;
2918 break;
2919 }
2920 }
2921 else
2922 {
2923 Log(("GMMR0AllocateHandyPages: #%#x/%#x: Not private! %.*Rhxs (type %d)\n", iPage, paPages[iPage].idPage, sizeof(*pPage), pPage, pPage->Common.u2State));
2924 rc = VERR_GMM_PAGE_NOT_PRIVATE;
2925 break;
2926 }
2927 }
2928 else
2929 {
2930 Log(("GMMR0AllocateHandyPages: #%#x/%#x: Not found! (private)\n", iPage, paPages[iPage].idPage));
2931 rc = VERR_GMM_PAGE_NOT_FOUND;
2932 break;
2933 }
2934 }
2935
2936 if (paPages[iPage].idSharedPage == NIL_GMM_PAGEID)
2937 { /* likely */ }
2938 else
2939 {
2940 PGMMPAGE pPage = gmmR0GetPage(pGMM, paPages[iPage].idSharedPage);
2941 if (RT_LIKELY(pPage))
2942 {
2943 if (RT_LIKELY(GMM_PAGE_IS_SHARED(pPage)))
2944 {
2945 AssertCompile(NIL_RTHCPHYS > GMM_GCPHYS_LAST && GMM_GCPHYS_UNSHAREABLE > GMM_GCPHYS_LAST);
2946 Assert(pPage->Shared.cRefs);
2947 Assert(pGVM->gmm.s.Stats.cSharedPages);
2948 Assert(pGVM->gmm.s.Stats.Allocated.cBasePages);
2949
2950 Log(("GMMR0AllocateHandyPages: free shared page %x cRefs=%d\n", paPages[iPage].idSharedPage, pPage->Shared.cRefs));
2951 pGVM->gmm.s.Stats.cSharedPages--;
2952 pGVM->gmm.s.Stats.Allocated.cBasePages--;
2953 if (!--pPage->Shared.cRefs)
2954 gmmR0FreeSharedPage(pGMM, pGVM, paPages[iPage].idSharedPage, pPage);
2955 else
2956 {
2957 Assert(pGMM->cDuplicatePages);
2958 pGMM->cDuplicatePages--;
2959 }
2960
2961 paPages[iPage].idSharedPage = NIL_GMM_PAGEID;
2962 }
2963 else
2964 {
2965 Log(("GMMR0AllocateHandyPages: #%#x/%#x: Not shared!\n", iPage, paPages[iPage].idSharedPage));
2966 rc = VERR_GMM_PAGE_NOT_SHARED;
2967 break;
2968 }
2969 }
2970 else
2971 {
2972 Log(("GMMR0AllocateHandyPages: #%#x/%#x: Not found! (shared)\n", iPage, paPages[iPage].idSharedPage));
2973 rc = VERR_GMM_PAGE_NOT_FOUND;
2974 break;
2975 }
2976 }
2977 } /* for each page to update */
2978
2979 if (RT_SUCCESS(rc) && cPagesToAlloc > 0)
2980 {
2981#ifdef VBOX_STRICT
2982 for (iPage = 0; iPage < cPagesToAlloc; iPage++)
2983 {
2984 Assert(paPages[iPage].HCPhysGCPhys == NIL_GMMPAGEDESC_PHYS);
2985 Assert(paPages[iPage].fZeroed == false);
2986 Assert(paPages[iPage].idPage == NIL_GMM_PAGEID);
2987 Assert(paPages[iPage].idSharedPage == NIL_GMM_PAGEID);
2988 }
2989#endif
2990
2991 /*
2992 * Join paths with GMMR0AllocatePages for the allocation.
2993 * Note! gmmR0AllocateMoreChunks may leave the protection of the mutex!
2994 */
2995 rc = gmmR0AllocatePagesNew(pGMM, pGVM, cPagesToAlloc, paPages, GMMACCOUNT_BASE);
2996 }
2997 }
2998 else
2999 rc = VERR_WRONG_ORDER;
3000 GMM_CHECK_SANITY_UPON_LEAVING(pGMM);
3001 }
3002 else
3003 rc = VERR_GMM_IS_NOT_SANE;
3004 gmmR0MutexRelease(pGMM);
3005 LogFlow(("GMMR0AllocateHandyPages: returns %Rrc\n", rc));
3006 return rc;
3007}
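
/*
 * Input sketch (illustrative, mirroring the to-allocate validation in
 * GMMR0AllocateHandyPages above): a descriptor in the to-allocate portion of
 * the array must come in completely "empty" and is filled in by the
 * allocation path (see gmmR0AllocatePage).
 *
 * @code
 *      GMMPAGEDESC Desc;
 *      Desc.HCPhysGCPhys = NIL_GMMPAGEDESC_PHYS;
 *      Desc.fZeroed      = false;
 *      Desc.idPage       = NIL_GMM_PAGEID;
 *      Desc.idSharedPage = NIL_GMM_PAGEID;
 * @endcode
 */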
3008
3009
3010/**
3011 * Allocate one or more pages.
3012 *
3013 * This is typically used for ROMs and MMIO2 (VRAM) during VM creation.
3014 * The allocated pages are not cleared and will contain random garbage.
3015 *
3016 * @returns VBox status code:
3017 * @retval VINF_SUCCESS on success.
3018 * @retval VERR_NOT_OWNER if the caller is not an EMT.
3019 * @retval VERR_GMM_HIT_GLOBAL_LIMIT if we've exhausted the available pages.
3020 * @retval VERR_GMM_HIT_VM_ACCOUNT_LIMIT if we've hit the VM account limit,
3021 * that is we're trying to allocate more than we've reserved.
3022 *
3023 * @param pGVM The global (ring-0) VM structure.
3024 * @param idCpu The VCPU id.
3025 * @param cPages The number of pages to allocate.
3026 * @param paPages Pointer to the page descriptors.
3027 * See GMMPAGEDESC for details on what is expected on
3028 * input.
3029 * @param enmAccount The account to charge.
3030 *
3031 * @thread EMT.
3032 */
3033GMMR0DECL(int) GMMR0AllocatePages(PGVM pGVM, VMCPUID idCpu, uint32_t cPages, PGMMPAGEDESC paPages, GMMACCOUNT enmAccount)
3034{
3035 LogFlow(("GMMR0AllocatePages: pGVM=%p cPages=%#x paPages=%p enmAccount=%d\n", pGVM, cPages, paPages, enmAccount));
3036
3037 /*
3038 * Validate, get basics and take the semaphore.
3039 */
3040 PGMM pGMM;
3041 GMM_GET_VALID_INSTANCE(pGMM, VERR_GMM_INSTANCE);
3042 int rc = GVMMR0ValidateGVMandEMT(pGVM, idCpu);
3043 if (RT_FAILURE(rc))
3044 return rc;
3045
3046 AssertPtrReturn(paPages, VERR_INVALID_PARAMETER);
3047 AssertMsgReturn(enmAccount > GMMACCOUNT_INVALID && enmAccount < GMMACCOUNT_END, ("%d\n", enmAccount), VERR_INVALID_PARAMETER);
3048 AssertMsgReturn(cPages > 0 && cPages < RT_BIT(32 - PAGE_SHIFT), ("%#x\n", cPages), VERR_INVALID_PARAMETER);
3049
3050 for (unsigned iPage = 0; iPage < cPages; iPage++)
3051 {
3052 AssertMsgReturn( paPages[iPage].HCPhysGCPhys == NIL_GMMPAGEDESC_PHYS
3053 || paPages[iPage].HCPhysGCPhys == GMM_GCPHYS_UNSHAREABLE
3054 || ( enmAccount == GMMACCOUNT_BASE
3055 && paPages[iPage].HCPhysGCPhys <= GMM_GCPHYS_LAST
3056 && !(paPages[iPage].HCPhysGCPhys & PAGE_OFFSET_MASK)),
3057 ("#%#x: %RHp enmAccount=%d\n", iPage, paPages[iPage].HCPhysGCPhys, enmAccount),
3058 VERR_INVALID_PARAMETER);
3059 AssertMsgReturn(paPages[iPage].fZeroed == false, ("#%#x: %#x\n", iPage, paPages[iPage].fZeroed), VERR_INVALID_PARAMETER);
3060 AssertMsgReturn(paPages[iPage].idPage == NIL_GMM_PAGEID, ("#%#x: %#x\n", iPage, paPages[iPage].idPage), VERR_INVALID_PARAMETER);
3061 AssertMsgReturn(paPages[iPage].idSharedPage == NIL_GMM_PAGEID, ("#%#x: %#x\n", iPage, paPages[iPage].idSharedPage), VERR_INVALID_PARAMETER);
3062 }
3063
3064 /*
3065 * Grab the giant mutex and get working.
3066 */
3067 gmmR0MutexAcquire(pGMM);
3068 if (GMM_CHECK_SANITY_UPON_ENTERING(pGMM))
3069 {
3070
3071 /* No allocations before the initial reservation has been made! */
3072 if (RT_LIKELY( pGVM->gmm.s.Stats.Reserved.cBasePages
3073 && pGVM->gmm.s.Stats.Reserved.cFixedPages
3074 && pGVM->gmm.s.Stats.Reserved.cShadowPages))
3075 rc = gmmR0AllocatePagesNew(pGMM, pGVM, cPages, paPages, enmAccount);
3076 else
3077 rc = VERR_WRONG_ORDER;
3078 GMM_CHECK_SANITY_UPON_LEAVING(pGMM);
3079 }
3080 else
3081 rc = VERR_GMM_IS_NOT_SANE;
3082 gmmR0MutexRelease(pGMM);
3083
3084 LogFlow(("GMMR0AllocatePages: returns %Rrc\n", rc));
3085 return rc;
3086}
3087
3088
3089/**
3090 * VMMR0 request wrapper for GMMR0AllocatePages.
3091 *
3092 * @returns see GMMR0AllocatePages.
3093 * @param pGVM The global (ring-0) VM structure.
3094 * @param idCpu The VCPU id.
3095 * @param pReq Pointer to the request packet.
3096 */
3097GMMR0DECL(int) GMMR0AllocatePagesReq(PGVM pGVM, VMCPUID idCpu, PGMMALLOCATEPAGESREQ pReq)
3098{
3099 /*
3100 * Validate input and pass it on.
3101 */
3102 AssertPtrReturn(pReq, VERR_INVALID_POINTER);
3103 AssertMsgReturn(pReq->Hdr.cbReq >= RT_UOFFSETOF(GMMALLOCATEPAGESREQ, aPages[0]),
3104 ("%#x < %#x\n", pReq->Hdr.cbReq, RT_UOFFSETOF(GMMALLOCATEPAGESREQ, aPages[0])),
3105 VERR_INVALID_PARAMETER);
3106 AssertMsgReturn(pReq->Hdr.cbReq == RT_UOFFSETOF_DYN(GMMALLOCATEPAGESREQ, aPages[pReq->cPages]),
3107 ("%#x != %#x\n", pReq->Hdr.cbReq, RT_UOFFSETOF_DYN(GMMALLOCATEPAGESREQ, aPages[pReq->cPages])),
3108 VERR_INVALID_PARAMETER);
3109
3110 return GMMR0AllocatePages(pGVM, idCpu, pReq->cPages, &pReq->aPages[0], pReq->enmAccount);
3111}
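
/*
 * Caller-side sketch (illustrative only; ring-3 plumbing and header magic are
 * omitted, and cPages is a hypothetical local): the request is variable sized,
 * so cbReq must be computed with RT_UOFFSETOF_DYN exactly as the validation
 * above expects.
 *
 * @code
 *      uint32_t const cbReq = RT_UOFFSETOF_DYN(GMMALLOCATEPAGESREQ, aPages[cPages]);
 *      PGMMALLOCATEPAGESREQ pReq = (PGMMALLOCATEPAGESREQ)RTMemAllocZ(cbReq);
 *      if (pReq)
 *      {
 *          pReq->Hdr.cbReq  = cbReq;
 *          pReq->cPages     = cPages;
 *          pReq->enmAccount = GMMACCOUNT_BASE;
 *          // ... fill pReq->aPages[0..cPages-1] as the checks in
 *          //     GMMR0AllocatePages require, then submit the request ...
 *      }
 * @endcode
 */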
3112
3113
3114/**
3115 * Allocate a large page to represent guest RAM.
3116 *
3117 * The allocated pages are zeroed upon return.
3118 *
3119 * @returns VBox status code:
3120 * @retval VINF_SUCCESS on success.
3121 * @retval VERR_NOT_OWNER if the caller is not an EMT.
3122 * @retval VERR_GMM_HIT_GLOBAL_LIMIT if we've exhausted the available pages.
3123 * @retval VERR_GMM_HIT_VM_ACCOUNT_LIMIT if we've hit the VM account limit,
3124 * that is we're trying to allocate more than we've reserved.
3125 * @retval VERR_TRY_AGAIN if the host is temporarily out of large pages.
3126 * @see GMMR0AllocatePages
3127 *
3128 * @param pGVM The global (ring-0) VM structure.
3129 * @param idCpu The VCPU id.
3130 * @param cbPage Large page size.
3131 * @param pIdPage Where to return the GMM page ID of the page.
3132 * @param pHCPhys Where to return the host physical address of the page.
3133 */
3134GMMR0DECL(int) GMMR0AllocateLargePage(PGVM pGVM, VMCPUID idCpu, uint32_t cbPage, uint32_t *pIdPage, RTHCPHYS *pHCPhys)
3135{
3136 LogFlow(("GMMR0AllocateLargePage: pGVM=%p cbPage=%x\n", pGVM, cbPage));
3137
3138 AssertPtrReturn(pIdPage, VERR_INVALID_PARAMETER);
3139 *pIdPage = NIL_GMM_PAGEID;
3140 AssertPtrReturn(pHCPhys, VERR_INVALID_PARAMETER);
3141 *pHCPhys = NIL_RTHCPHYS;
3142 AssertReturn(cbPage == GMM_CHUNK_SIZE, VERR_INVALID_PARAMETER);
3143
3144 /*
3145 * Validate GVM + idCpu, get basics and take the semaphore.
3146 */
3147 PGMM pGMM;
3148 GMM_GET_VALID_INSTANCE(pGMM, VERR_GMM_INSTANCE);
3149 int rc = GVMMR0ValidateGVMandEMT(pGVM, idCpu);
3150 if (RT_SUCCESS(rc))
3151 rc = gmmR0MutexAcquire(pGMM);
3152 if (RT_SUCCESS(rc))
3153 {
3154 if (GMM_CHECK_SANITY_UPON_ENTERING(pGMM))
3155 {
3156 /*
3157 * Check the quota.
3158 */
3159 /** @todo r=bird: Quota checking could be done w/o the giant mutex but using
3160 * a VM specific mutex... */
3161 if (RT_LIKELY( pGVM->gmm.s.Stats.Allocated.cBasePages + pGVM->gmm.s.Stats.cBalloonedPages + GMM_CHUNK_NUM_PAGES
3162 <= pGVM->gmm.s.Stats.Reserved.cBasePages))
3163 {
3164 /*
3165 * Allocate a new large page chunk.
3166 *
3167 * Note! We leave the giant GMM lock temporarily as the allocation might
3168 * take a long time. gmmR0RegisterChunk will retake it (ugly).
3169 */
3170 AssertCompile(GMM_CHUNK_SIZE == _2M);
3171 gmmR0MutexRelease(pGMM);
3172
3173 RTR0MEMOBJ hMemObj;
3174 rc = RTR0MemObjAllocLarge(&hMemObj, GMM_CHUNK_SIZE, GMM_CHUNK_SIZE, RTMEMOBJ_ALLOC_LARGE_F_FAST);
3175 if (RT_SUCCESS(rc))
3176 {
3177 *pHCPhys = RTR0MemObjGetPagePhysAddr(hMemObj, 0);
3178
3179 /*
3180 * Register the chunk as fully allocated.
3181 * Note! As mentioned above, this will return owning the mutex on success.
3182 */
3183 PGMMCHUNK pChunk = NULL;
3184 PGMMCHUNKFREESET const pSet = pGMM->fBoundMemoryMode ? &pGVM->gmm.s.Private : &pGMM->PrivateX;
3185 rc = gmmR0RegisterChunk(pGMM, pSet, hMemObj, pGVM->hSelf, pGVM->pSession, GMM_CHUNK_FLAGS_LARGE_PAGE, &pChunk);
3186 if (RT_SUCCESS(rc))
3187 {
3188 /*
3189 * The gmmR0RegisterChunk call already marked all pages allocated,
3190 * so we just have to fill in the return values and update stats now.
3191 */
3192 *pIdPage = pChunk->Core.Key << GMM_CHUNKID_SHIFT;
3193
3194 /* Update accounting. */
3195 pGVM->gmm.s.Stats.Allocated.cBasePages += GMM_CHUNK_NUM_PAGES;
3196 pGVM->gmm.s.Stats.cPrivatePages += GMM_CHUNK_NUM_PAGES;
3197 pGMM->cAllocatedPages += GMM_CHUNK_NUM_PAGES;
3198
3199 gmmR0LinkChunk(pChunk, pSet);
3200 gmmR0MutexRelease(pGMM);
3201
3202 LogFlow(("GMMR0AllocateLargePage: returns VINF_SUCCESS\n"));
3203 return VINF_SUCCESS;
3204 }
3205
3206 /*
3207 * Bail out.
3208 */
3209 RTR0MemObjFree(hMemObj, true /* fFreeMappings */);
3210 *pHCPhys = NIL_RTHCPHYS;
3211 }
3212 }
3213 else
3214 {
3215 Log(("GMMR0AllocateLargePage: Reserved=%#llx Allocated+Requested=%#llx+%#x!\n",
3216 pGVM->gmm.s.Stats.Reserved.cBasePages, pGVM->gmm.s.Stats.Allocated.cBasePages, GMM_CHUNK_NUM_PAGES));
3217 gmmR0MutexRelease(pGMM);
3218 rc = VERR_GMM_HIT_VM_ACCOUNT_LIMIT;
3219 }
3220 }
3221 else
3222 {
3223 gmmR0MutexRelease(pGMM);
3224 rc = VERR_GMM_IS_NOT_SANE;
3225 }
3226 }
3227
3228 LogFlow(("GMMR0AllocateLargePage: returns %Rrc\n", rc));
3229 return rc;
3230}
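
/*
 * Illustrative pairing (sketch only): a large page allocated above is given
 * back with GMMR0FreeLargePage (below) using the returned page ID, which
 * refers to the first page of the dedicated 2 MB chunk.
 *
 * @code
 *      uint32_t idPage = NIL_GMM_PAGEID;
 *      RTHCPHYS HCPhys = NIL_RTHCPHYS;
 *      int rc = GMMR0AllocateLargePage(pGVM, idCpu, GMM_CHUNK_SIZE, &idPage, &HCPhys);
 *      if (RT_SUCCESS(rc))
 *      {
 *          // ... hand the page to the caller / map it for the guest ...
 *          rc = GMMR0FreeLargePage(pGVM, idCpu, idPage);
 *      }
 * @endcode
 */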
3231
3232
3233/**
3234 * Free a large page.
3235 *
3236 * @returns VBox status code:
3237 * @param pGVM The global (ring-0) VM structure.
3238 * @param idCpu The VCPU id.
3239 * @param idPage The large page id.
3240 */
3241GMMR0DECL(int) GMMR0FreeLargePage(PGVM pGVM, VMCPUID idCpu, uint32_t idPage)
3242{
3243 LogFlow(("GMMR0FreeLargePage: pGVM=%p idPage=%x\n", pGVM, idPage));
3244
3245 /*
3246 * Validate, get basics and take the semaphore.
3247 */
3248 PGMM pGMM;
3249 GMM_GET_VALID_INSTANCE(pGMM, VERR_GMM_INSTANCE);
3250 int rc = GVMMR0ValidateGVMandEMT(pGVM, idCpu);
3251 if (RT_FAILURE(rc))
3252 return rc;
3253
3254 gmmR0MutexAcquire(pGMM);
3255 if (GMM_CHECK_SANITY_UPON_ENTERING(pGMM))
3256 {
3257 const unsigned cPages = (GMM_CHUNK_SIZE >> PAGE_SHIFT);
3258
3259 if (RT_UNLIKELY(pGVM->gmm.s.Stats.Allocated.cBasePages < cPages))
3260 {
3261 Log(("GMMR0FreeLargePage: allocated=%#llx cPages=%#x!\n", pGVM->gmm.s.Stats.Allocated.cBasePages, cPages));
3262 gmmR0MutexRelease(pGMM);
3263 return VERR_GMM_ATTEMPT_TO_FREE_TOO_MUCH;
3264 }
3265
3266 PGMMPAGE pPage = gmmR0GetPage(pGMM, idPage);
3267 if (RT_LIKELY( pPage
3268 && GMM_PAGE_IS_PRIVATE(pPage)))
3269 {
3270 PGMMCHUNK pChunk = gmmR0GetChunk(pGMM, idPage >> GMM_CHUNKID_SHIFT);
3271 Assert(pChunk);
3272 Assert(pChunk->cFree < GMM_CHUNK_NUM_PAGES);
3273 Assert(pChunk->cPrivate > 0);
3274
3275 /* Release the memory immediately. */
3276 gmmR0FreeChunk(pGMM, NULL, pChunk, false /*fRelaxedSem*/); /** @todo this can be relaxed too! */
3277
3278 /* Update accounting. */
3279 pGVM->gmm.s.Stats.Allocated.cBasePages -= cPages;
3280 pGVM->gmm.s.Stats.cPrivatePages -= cPages;
3281 pGMM->cAllocatedPages -= cPages;
3282 }
3283 else
3284 rc = VERR_GMM_PAGE_NOT_FOUND;
3285 }
3286 else
3287 rc = VERR_GMM_IS_NOT_SANE;
3288
3289 gmmR0MutexRelease(pGMM);
3290 LogFlow(("GMMR0FreeLargePage: returns %Rrc\n", rc));
3291 return rc;
3292}
3293
3294
3295/**
3296 * VMMR0 request wrapper for GMMR0FreeLargePage.
3297 *
3298 * @returns see GMMR0FreeLargePage.
3299 * @param pGVM The global (ring-0) VM structure.
3300 * @param idCpu The VCPU id.
3301 * @param pReq Pointer to the request packet.
3302 */
3303GMMR0DECL(int) GMMR0FreeLargePageReq(PGVM pGVM, VMCPUID idCpu, PGMMFREELARGEPAGEREQ pReq)
3304{
3305 /*
3306 * Validate input and pass it on.
3307 */
3308 AssertPtrReturn(pReq, VERR_INVALID_POINTER);
3309 AssertMsgReturn(pReq->Hdr.cbReq == sizeof(GMMFREELARGEPAGEREQ),
3310 ("%#x != %#x\n", pReq->Hdr.cbReq, sizeof(GMMFREELARGEPAGEREQ)),
3311 VERR_INVALID_PARAMETER);
3312
3313 return GMMR0FreeLargePage(pGVM, idCpu, pReq->idPage);
3314}
3315
3316
3317/**
3318 * @callback_method_impl{FNGVMMR0ENUMCALLBACK,
3319 * Used by gmmR0FreeChunkFlushPerVmTlbs().}
3320 */
3321static DECLCALLBACK(int) gmmR0InvalidatePerVmChunkTlbCallback(PGVM pGVM, void *pvUser)
3322{
3323 RT_NOREF(pvUser);
3324 if (pGVM->gmm.s.hChunkTlbSpinLock != NIL_RTSPINLOCK)
3325 {
3326 RTSpinlockAcquire(pGVM->gmm.s.hChunkTlbSpinLock);
3327 uintptr_t i = RT_ELEMENTS(pGVM->gmm.s.aChunkTlbEntries);
3328 while (i-- > 0)
3329 {
3330 pGVM->gmm.s.aChunkTlbEntries[i].idGeneration = UINT64_MAX;
3331 pGVM->gmm.s.aChunkTlbEntries[i].pChunk = NULL;
3332 }
3333 RTSpinlockRelease(pGVM->gmm.s.hChunkTlbSpinLock);
3334 }
3335 return VINF_SUCCESS;
3336}
3337
3338
3339/**
3340 * Called by gmmR0FreeChunk when we reach the threshold for wrapping around the
3341 * free generation ID value.
3342 *
3343 * This is done at 2^62 - 1, which allows us to drop all locks here, since
3344 * it will take well over 2^62 (more than 4 exa) additional calls to
3345 * gmmR0FreeChunk before a real wrap-around could occur. We do two
3346 * invalidation passes and reset the generation ID between them. This makes
3347 * sure there are no false positives.
3348 *
3349 * @param pGMM Pointer to the GMM instance.
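 *
 * A per-VM chunk TLB entry is only trusted when its recorded generation
 * matches the current free generation; a sketch of the check done by the
 * lookup code (see GMMR0PageIdToVirt):
 * @code
 *      if (   pTlbe->pChunk != NULL
 *          && pTlbe->idGeneration == ASMAtomicUoReadU64(&pGMM->idFreeGeneration)
 *          && pTlbe->pChunk->Core.Key == idChunk)
 *          // TLB hit
 * @endcode
 * Invalidating every entry (idGeneration = UINT64_MAX, pChunk = NULL) on both
 * sides of the reset therefore guarantees that nothing recorded against the
 * old counter can match the restarted one.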
3350 */
3351static void gmmR0FreeChunkFlushPerVmTlbs(PGMM pGMM)
3352{
3353 /*
3354 * First invalidation pass.
3355 */
3356 int rc = GVMMR0EnumVMs(gmmR0InvalidatePerVmChunkTlbCallback, NULL);
3357 AssertRCSuccess(rc);
3358
3359 /*
3360 * Reset the generation number.
3361 */
3362 RTSpinlockAcquire(pGMM->hSpinLockTree);
3363 ASMAtomicWriteU64(&pGMM->idFreeGeneration, 1);
3364 RTSpinlockRelease(pGMM->hSpinLockTree);
3365
3366 /*
3367 * Second invalidation pass.
3368 */
3369 rc = GVMMR0EnumVMs(gmmR0InvalidatePerVmChunkTlbCallback, NULL);
3370 AssertRCSuccess(rc);
3371}
3372
3373
3374/**
3375 * Frees a chunk, giving it back to the host OS.
3376 *
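 * @returns true if the chunk was freed with the giant GMM lock temporarily
 *          released and re-acquired (only possible when @a fRelaxedSem is
 *          true), false otherwise (chunk still mapped, or @a fRelaxedSem
 *          was false).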
3377 * @param pGMM Pointer to the GMM instance.
3378 * @param pGVM This is set when called from GMMR0CleanupVM so we can
3379 * unmap and free the chunk in one go.
3380 * @param pChunk The chunk to free.
3381 * @param fRelaxedSem Whether we can release the semaphore while doing the
3382 * freeing (@c true) or not.
3383 */
3384static bool gmmR0FreeChunk(PGMM pGMM, PGVM pGVM, PGMMCHUNK pChunk, bool fRelaxedSem)
3385{
3386 Assert(pChunk->Core.Key != NIL_GMM_CHUNKID);
3387
3388 GMMR0CHUNKMTXSTATE MtxState;
3389 gmmR0ChunkMutexAcquire(&MtxState, pGMM, pChunk, GMMR0CHUNK_MTX_KEEP_GIANT);
3390
3391 /*
3392 * Cleanup hack! Unmap the chunk from the caller's address space.
3393 * This shouldn't happen, so screw lock contention...
3394 */
3395 if (pChunk->cMappingsX && pGVM)
3396 gmmR0UnmapChunkLocked(pGMM, pGVM, pChunk);
3397
3398 /*
3399 * If there are current mappings of the chunk, then request the
3400 * VMs to unmap them. Reposition the chunk in the free list so
3401 * it won't be a likely candidate for allocations.
3402 */
3403 if (pChunk->cMappingsX)
3404 {
3405 /** @todo R0 -> VM request */
3406 /* The chunk can be mapped by more than one VM if fBoundMemoryMode is false! */
3407 Log(("gmmR0FreeChunk: chunk still has %d mappings; don't free!\n", pChunk->cMappingsX));
3408 gmmR0ChunkMutexRelease(&MtxState, pChunk);
3409 return false;
3410 }
3411
3412
3413 /*
3414 * Save and trash the handle.
3415 */
3416 RTR0MEMOBJ const hMemObj = pChunk->hMemObj;
3417 pChunk->hMemObj = NIL_RTR0MEMOBJ;
3418
3419 /*
3420 * Unlink it from everywhere.
3421 */
3422 gmmR0UnlinkChunk(pChunk);
3423
3424 RTSpinlockAcquire(pGMM->hSpinLockTree);
3425
3426 RTListNodeRemove(&pChunk->ListNode);
3427
3428 PAVLU32NODECORE pCore = RTAvlU32Remove(&pGMM->pChunks, pChunk->Core.Key);
3429 Assert(pCore == &pChunk->Core); NOREF(pCore);
3430
3431 PGMMCHUNKTLBE pTlbe = &pGMM->ChunkTLB.aEntries[GMM_CHUNKTLB_IDX(pChunk->Core.Key)];
3432 if (pTlbe->pChunk == pChunk)
3433 {
3434 pTlbe->idChunk = NIL_GMM_CHUNKID;
3435 pTlbe->pChunk = NULL;
3436 }
3437
3438 Assert(pGMM->cChunks > 0);
3439 pGMM->cChunks--;
3440
3441 uint64_t const idFreeGeneration = ASMAtomicIncU64(&pGMM->idFreeGeneration);
3442
3443 RTSpinlockRelease(pGMM->hSpinLockTree);
3444
3445 /*
3446 * Free the Chunk ID before dropping the locks and freeing the rest.
3447 */
3448 gmmR0FreeChunkId(pGMM, pChunk->Core.Key);
3449 pChunk->Core.Key = NIL_GMM_CHUNKID;
3450
3451 pGMM->cFreedChunks++;
3452
3453 gmmR0ChunkMutexRelease(&MtxState, NULL);
3454 if (fRelaxedSem)
3455 gmmR0MutexRelease(pGMM);
3456
3457 if (idFreeGeneration == UINT64_MAX / 4)
3458 gmmR0FreeChunkFlushPerVmTlbs(pGMM);
3459
3460 RTMemFree(pChunk->paMappingsX);
3461 pChunk->paMappingsX = NULL;
3462
3463 RTMemFree(pChunk);
3464
3465#ifndef VBOX_WITH_LINEAR_HOST_PHYS_MEM
3466 int rc = RTR0MemObjFree(hMemObj, true /* fFreeMappings */);
3467#else
3468 int rc = RTR0MemObjFree(hMemObj, false /* fFreeMappings */);
3469#endif
3470 AssertLogRelRC(rc);
3471
3472 if (fRelaxedSem)
3473 gmmR0MutexAcquire(pGMM);
3474 return fRelaxedSem;
3475}
3476
3477
3478/**
3479 * Free page worker.
3480 *
3481 * The caller does all the statistic decrementing, we do all the incrementing.
3482 *
3483 * @param pGMM Pointer to the GMM instance data.
3484 * @param pGVM Pointer to the GVM instance.
3485 * @param pChunk Pointer to the chunk this page belongs to.
3486 * @param idPage The Page ID.
3487 * @param pPage Pointer to the page.
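 *
 * A typical caller adjusts the relevant counters first and then hands the
 * page over; a sketch based on gmmR0FreePrivatePage below:
 * @code
 *      pChunk->cPrivate--;
 *      pGMM->cAllocatedPages--;
 *      gmmR0FreePageWorker(pGMM, pGVM, pChunk, idPage, pPage);
 * @endcode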
3488 */
3489static void gmmR0FreePageWorker(PGMM pGMM, PGVM pGVM, PGMMCHUNK pChunk, uint32_t idPage, PGMMPAGE pPage)
3490{
3491 Log3(("F pPage=%p iPage=%#x/%#x u2State=%d iFreeHead=%#x\n",
3492 pPage, pPage - &pChunk->aPages[0], idPage, pPage->Common.u2State, pChunk->iFreeHead)); NOREF(idPage);
3493
3494 /*
3495 * Put the page on the free list.
3496 */
3497 pPage->u = 0;
3498 pPage->Free.u2State = GMM_PAGE_STATE_FREE;
3499 pPage->Free.fZeroed = false;
3500 Assert(pChunk->iFreeHead < RT_ELEMENTS(pChunk->aPages) || pChunk->iFreeHead == UINT16_MAX);
3501 pPage->Free.iNext = pChunk->iFreeHead;
3502 pChunk->iFreeHead = pPage - &pChunk->aPages[0];
3503
3504 /*
3505 * Update statistics (the cShared/cPrivate stats are up to date already),
3506 * and relink the chunk if necessary.
3507 */
3508 unsigned const cFree = pChunk->cFree;
3509 if ( !cFree
3510 || gmmR0SelectFreeSetList(cFree) != gmmR0SelectFreeSetList(cFree + 1))
3511 {
3512 gmmR0UnlinkChunk(pChunk);
3513 pChunk->cFree++;
3514 gmmR0SelectSetAndLinkChunk(pGMM, pGVM, pChunk);
3515 }
3516 else
3517 {
3518 pChunk->cFree = cFree + 1;
3519 pChunk->pSet->cFreePages++;
3520 }
3521
3522 /*
3523 * If the chunk becomes empty, consider giving memory back to the host OS.
3524 *
3525 * The current strategy is to try give it back if there are other chunks
3526 * in this free list, meaning if there are at least 240 free pages in this
3527 * category. Note that since there are probably mappings of the chunk,
3528 * it won't be freed up instantly, which probably screws up this logic
3529 * a bit...
3530 */
3531 /** @todo Do this on the way out. */
3532 if (RT_LIKELY( pChunk->cFree != GMM_CHUNK_NUM_PAGES
3533 || pChunk->pFreeNext == NULL
3534 || pChunk->pFreePrev == NULL /** @todo this is probably misfiring, see reset... */))
3535 { /* likely */ }
3536 else
3537 gmmR0FreeChunk(pGMM, NULL, pChunk, false);
3538}
3539
3540
3541/**
3542 * Frees a shared page, the page is known to exist and be valid and such.
3543 *
3544 * @param pGMM Pointer to the GMM instance.
3545 * @param pGVM Pointer to the GVM instance.
3546 * @param idPage The page id.
3547 * @param pPage The page structure.
3548 */
3549DECLINLINE(void) gmmR0FreeSharedPage(PGMM pGMM, PGVM pGVM, uint32_t idPage, PGMMPAGE pPage)
3550{
3551 PGMMCHUNK pChunk = gmmR0GetChunk(pGMM, idPage >> GMM_CHUNKID_SHIFT);
3552 Assert(pChunk);
3553 Assert(pChunk->cFree < GMM_CHUNK_NUM_PAGES);
3554 Assert(pChunk->cShared > 0);
3555 Assert(pGMM->cSharedPages > 0);
3556 Assert(pGMM->cAllocatedPages > 0);
3557 Assert(!pPage->Shared.cRefs);
3558
3559 pChunk->cShared--;
3560 pGMM->cAllocatedPages--;
3561 pGMM->cSharedPages--;
3562 gmmR0FreePageWorker(pGMM, pGVM, pChunk, idPage, pPage);
3563}
3564
3565
3566/**
3567 * Frees a private page, the page is known to exist and be valid and such.
3568 *
3569 * @param pGMM Pointer to the GMM instance.
3570 * @param pGVM Pointer to the GVM instance.
3571 * @param idPage The page id.
3572 * @param pPage The page structure.
3573 */
3574DECLINLINE(void) gmmR0FreePrivatePage(PGMM pGMM, PGVM pGVM, uint32_t idPage, PGMMPAGE pPage)
3575{
3576 PGMMCHUNK pChunk = gmmR0GetChunk(pGMM, idPage >> GMM_CHUNKID_SHIFT);
3577 Assert(pChunk);
3578 Assert(pChunk->cFree < GMM_CHUNK_NUM_PAGES);
3579 Assert(pChunk->cPrivate > 0);
3580 Assert(pGMM->cAllocatedPages > 0);
3581
3582 pChunk->cPrivate--;
3583 pGMM->cAllocatedPages--;
3584 gmmR0FreePageWorker(pGMM, pGVM, pChunk, idPage, pPage);
3585}
3586
3587
3588/**
3589 * Common worker for GMMR0FreePages and GMMR0BalloonedPages.
3590 *
3591 * @returns VBox status code:
3592 * @retval xxx
3593 *
3594 * @param pGMM Pointer to the GMM instance data.
3595 * @param pGVM Pointer to the VM.
3596 * @param cPages The number of pages to free.
3597 * @param paPages Pointer to the page descriptors.
3598 * @param enmAccount The account this relates to.
3599 */
3600static int gmmR0FreePages(PGMM pGMM, PGVM pGVM, uint32_t cPages, PGMMFREEPAGEDESC paPages, GMMACCOUNT enmAccount)
3601{
3602 /*
3603 * Check that the request isn't impossible wrt to the account status.
3604 */
3605 switch (enmAccount)
3606 {
3607 case GMMACCOUNT_BASE:
3608 if (RT_UNLIKELY(pGVM->gmm.s.Stats.Allocated.cBasePages < cPages))
3609 {
3610 Log(("gmmR0FreePages: allocated=%#llx cPages=%#x!\n", pGVM->gmm.s.Stats.Allocated.cBasePages, cPages));
3611 return VERR_GMM_ATTEMPT_TO_FREE_TOO_MUCH;
3612 }
3613 break;
3614 case GMMACCOUNT_SHADOW:
3615 if (RT_UNLIKELY(pGVM->gmm.s.Stats.Allocated.cShadowPages < cPages))
3616 {
3617 Log(("gmmR0FreePages: allocated=%#llx cPages=%#x!\n", pGVM->gmm.s.Stats.Allocated.cShadowPages, cPages));
3618 return VERR_GMM_ATTEMPT_TO_FREE_TOO_MUCH;
3619 }
3620 break;
3621 case GMMACCOUNT_FIXED:
3622 if (RT_UNLIKELY(pGVM->gmm.s.Stats.Allocated.cFixedPages < cPages))
3623 {
3624 Log(("gmmR0FreePages: allocated=%#llx cPages=%#x!\n", pGVM->gmm.s.Stats.Allocated.cFixedPages, cPages));
3625 return VERR_GMM_ATTEMPT_TO_FREE_TOO_MUCH;
3626 }
3627 break;
3628 default:
3629 AssertMsgFailedReturn(("enmAccount=%d\n", enmAccount), VERR_IPE_NOT_REACHED_DEFAULT_CASE);
3630 }
3631
3632 /*
3633 * Walk the descriptors and free the pages.
3634 *
3635 * Statistics (except the account) are being updated as we go along,
3636 * unlike the alloc code. Also, stop on the first error.
3637 */
3638 int rc = VINF_SUCCESS;
3639 uint32_t iPage;
3640 for (iPage = 0; iPage < cPages; iPage++)
3641 {
3642 uint32_t idPage = paPages[iPage].idPage;
3643 PGMMPAGE pPage = gmmR0GetPage(pGMM, idPage);
3644 if (RT_LIKELY(pPage))
3645 {
3646 if (RT_LIKELY(GMM_PAGE_IS_PRIVATE(pPage)))
3647 {
3648 if (RT_LIKELY(pPage->Private.hGVM == pGVM->hSelf))
3649 {
3650 Assert(pGVM->gmm.s.Stats.cPrivatePages);
3651 pGVM->gmm.s.Stats.cPrivatePages--;
3652 gmmR0FreePrivatePage(pGMM, pGVM, idPage, pPage);
3653 }
3654 else
3655 {
3656 Log(("gmmR0FreePages: #%#x/%#x: not owner! hGVM=%#x hSelf=%#x\n", iPage, idPage,
3657 pPage->Private.hGVM, pGVM->hSelf));
3658 rc = VERR_GMM_NOT_PAGE_OWNER;
3659 break;
3660 }
3661 }
3662 else if (RT_LIKELY(GMM_PAGE_IS_SHARED(pPage)))
3663 {
3664 Assert(pGVM->gmm.s.Stats.cSharedPages);
3665 Assert(pPage->Shared.cRefs);
3666#if defined(VBOX_WITH_PAGE_SHARING) && defined(VBOX_STRICT)
3667 if (pPage->Shared.u14Checksum)
3668 {
3669 uint32_t uChecksum = gmmR0StrictPageChecksum(pGMM, pGVM, idPage);
3670 uChecksum &= UINT32_C(0x00003fff);
3671 AssertMsg(!uChecksum || uChecksum == pPage->Shared.u14Checksum,
3672 ("%#x vs %#x - idPage=%#x\n", uChecksum, pPage->Shared.u14Checksum, idPage));
3673 }
3674#endif
3675 pGVM->gmm.s.Stats.cSharedPages--;
3676 if (!--pPage->Shared.cRefs)
3677 gmmR0FreeSharedPage(pGMM, pGVM, idPage, pPage);
3678 else
3679 {
3680 Assert(pGMM->cDuplicatePages);
3681 pGMM->cDuplicatePages--;
3682 }
3683 }
3684 else
3685 {
3686 Log(("gmmR0FreePages: #%#x/%#x: already free!\n", iPage, idPage));
3687 rc = VERR_GMM_PAGE_ALREADY_FREE;
3688 break;
3689 }
3690 }
3691 else
3692 {
3693 Log(("gmmR0FreePages: #%#x/%#x: not found!\n", iPage, idPage));
3694 rc = VERR_GMM_PAGE_NOT_FOUND;
3695 break;
3696 }
3697 paPages[iPage].idPage = NIL_GMM_PAGEID;
3698 }
3699
3700 /*
3701 * Update the account.
3702 */
3703 switch (enmAccount)
3704 {
3705 case GMMACCOUNT_BASE: pGVM->gmm.s.Stats.Allocated.cBasePages -= iPage; break;
3706 case GMMACCOUNT_SHADOW: pGVM->gmm.s.Stats.Allocated.cShadowPages -= iPage; break;
3707 case GMMACCOUNT_FIXED: pGVM->gmm.s.Stats.Allocated.cFixedPages -= iPage; break;
3708 default:
3709 AssertMsgFailedReturn(("enmAccount=%d\n", enmAccount), VERR_IPE_NOT_REACHED_DEFAULT_CASE);
3710 }
3711
3712 /*
3713 * Any threshold stuff to be done here?
3714 */
3715
3716 return rc;
3717}
3718
3719
3720/**
3721 * Free one or more pages.
3722 *
3723 * This is typically used at reset time or power off.
3724 *
3725 * @returns VBox status code:
3726 * @retval xxx
3727 *
3728 * @param pGVM The global (ring-0) VM structure.
3729 * @param idCpu The VCPU id.
3730 * @param cPages The number of pages to allocate.
3731 * @param paPages Pointer to the page descriptors containing the page IDs
3732 * for each page.
3733 * @param enmAccount The account this relates to.
3734 * @thread EMT.
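 *
 * A minimal ring-0 usage sketch (the page IDs are illustrative values that
 * must come from a previous allocation):
 * @code
 *      GMMFREEPAGEDESC aPages[2];
 *      aPages[0].idPage = idFirstPage;
 *      aPages[1].idPage = idSecondPage;
 *      int rc = GMMR0FreePages(pGVM, idCpu, RT_ELEMENTS(aPages), &aPages[0], GMMACCOUNT_BASE);
 *      // on success each descriptor's idPage is set to NIL_GMM_PAGEID
 * @endcode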
3735 */
3736GMMR0DECL(int) GMMR0FreePages(PGVM pGVM, VMCPUID idCpu, uint32_t cPages, PGMMFREEPAGEDESC paPages, GMMACCOUNT enmAccount)
3737{
3738 LogFlow(("GMMR0FreePages: pGVM=%p cPages=%#x paPages=%p enmAccount=%d\n", pGVM, cPages, paPages, enmAccount));
3739
3740 /*
3741 * Validate input and get the basics.
3742 */
3743 PGMM pGMM;
3744 GMM_GET_VALID_INSTANCE(pGMM, VERR_GMM_INSTANCE);
3745 int rc = GVMMR0ValidateGVMandEMT(pGVM, idCpu);
3746 if (RT_FAILURE(rc))
3747 return rc;
3748
3749 AssertPtrReturn(paPages, VERR_INVALID_PARAMETER);
3750 AssertMsgReturn(enmAccount > GMMACCOUNT_INVALID && enmAccount < GMMACCOUNT_END, ("%d\n", enmAccount), VERR_INVALID_PARAMETER);
3751 AssertMsgReturn(cPages > 0 && cPages < RT_BIT(32 - PAGE_SHIFT), ("%#x\n", cPages), VERR_INVALID_PARAMETER);
3752
3753 for (unsigned iPage = 0; iPage < cPages; iPage++)
3754 AssertMsgReturn( paPages[iPage].idPage <= GMM_PAGEID_LAST
3755 /*|| paPages[iPage].idPage == NIL_GMM_PAGEID*/,
3756 ("#%#x: %#x\n", iPage, paPages[iPage].idPage), VERR_INVALID_PARAMETER);
3757
3758 /*
3759 * Take the semaphore and call the worker function.
3760 */
3761 gmmR0MutexAcquire(pGMM);
3762 if (GMM_CHECK_SANITY_UPON_ENTERING(pGMM))
3763 {
3764 rc = gmmR0FreePages(pGMM, pGVM, cPages, paPages, enmAccount);
3765 GMM_CHECK_SANITY_UPON_LEAVING(pGMM);
3766 }
3767 else
3768 rc = VERR_GMM_IS_NOT_SANE;
3769 gmmR0MutexRelease(pGMM);
3770 LogFlow(("GMMR0FreePages: returns %Rrc\n", rc));
3771 return rc;
3772}
3773
3774
3775/**
3776 * VMMR0 request wrapper for GMMR0FreePages.
3777 *
3778 * @returns see GMMR0FreePages.
3779 * @param pGVM The global (ring-0) VM structure.
3780 * @param idCpu The VCPU id.
3781 * @param pReq Pointer to the request packet.
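 *
 * Callers are expected to size the request to the number of page descriptors;
 * a sketch, assuming the usual SUPVMMR0REQHDR initialization conventions:
 * @code
 *      uint32_t const cbReq = RT_UOFFSETOF_DYN(GMMFREEPAGESREQ, aPages[cPages]);
 *      PGMMFREEPAGESREQ pReq = (PGMMFREEPAGESREQ)RTMemAllocZ(cbReq);
 *      pReq->Hdr.u32Magic = SUPVMMR0REQHDR_MAGIC;
 *      pReq->Hdr.cbReq    = cbReq;
 *      pReq->enmAccount   = GMMACCOUNT_BASE;
 *      pReq->cPages       = cPages;
 *      // fill pReq->aPages[0..cPages-1].idPage before passing the request on
 * @endcode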
3782 */
3783GMMR0DECL(int) GMMR0FreePagesReq(PGVM pGVM, VMCPUID idCpu, PGMMFREEPAGESREQ pReq)
3784{
3785 /*
3786 * Validate input and pass it on.
3787 */
3788 AssertPtrReturn(pReq, VERR_INVALID_POINTER);
3789 AssertMsgReturn(pReq->Hdr.cbReq >= RT_UOFFSETOF(GMMFREEPAGESREQ, aPages[0]),
3790 ("%#x < %#x\n", pReq->Hdr.cbReq, RT_UOFFSETOF(GMMFREEPAGESREQ, aPages[0])),
3791 VERR_INVALID_PARAMETER);
3792 AssertMsgReturn(pReq->Hdr.cbReq == RT_UOFFSETOF_DYN(GMMFREEPAGESREQ, aPages[pReq->cPages]),
3793 ("%#x != %#x\n", pReq->Hdr.cbReq, RT_UOFFSETOF_DYN(GMMFREEPAGESREQ, aPages[pReq->cPages])),
3794 VERR_INVALID_PARAMETER);
3795
3796 return GMMR0FreePages(pGVM, idCpu, pReq->cPages, &pReq->aPages[0], pReq->enmAccount);
3797}
3798
3799
3800/**
3801 * Report back on a memory ballooning request.
3802 *
3803 * The request may or may not have been initiated by the GMM. If it was initiated
3804 * by the GMM it is important that this function is called even if no pages were
3805 * ballooned.
3806 *
3807 * @returns VBox status code:
3808 * @retval VERR_GMM_ATTEMPT_TO_FREE_TOO_MUCH
3809 * @retval VERR_GMM_ATTEMPT_TO_DEFLATE_TOO_MUCH
3810 * @retval VERR_GMM_OVERCOMMITTED_TRY_AGAIN_IN_A_BIT - reset condition
3811 * indicating that we won't necessarily have sufficient RAM to boot
3812 * the VM again and that it should pause until this changes (we'll try
3813 * balloon some other VM). (For standard deflate we have little choice
3814 * but to hope the VM won't use the memory that was returned to it.)
3815 *
3816 * @param pGVM The global (ring-0) VM structure.
3817 * @param idCpu The VCPU id.
3818 * @param enmAction Inflate/deflate/reset.
3819 * @param cBalloonedPages The number of pages that was ballooned.
3820 *
3821 * @thread EMT(idCpu)
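 *
 * For GMMBALLOONACTION_INFLATE the request is only accepted while it still
 * fits the reservation (see the inflate case below):
 * @code
 *      pGVM->gmm.s.Stats.Allocated.cBasePages + pGVM->gmm.s.Stats.cBalloonedPages + cBalloonedPages
 *          <= pGVM->gmm.s.Stats.Reserved.cBasePages
 * @endcode
 * otherwise VERR_GMM_ATTEMPT_TO_FREE_TOO_MUCH is returned.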
3822 */
3823GMMR0DECL(int) GMMR0BalloonedPages(PGVM pGVM, VMCPUID idCpu, GMMBALLOONACTION enmAction, uint32_t cBalloonedPages)
3824{
3825 LogFlow(("GMMR0BalloonedPages: pGVM=%p enmAction=%d cBalloonedPages=%#x\n",
3826 pGVM, enmAction, cBalloonedPages));
3827
3828 AssertMsgReturn(cBalloonedPages < RT_BIT(32 - PAGE_SHIFT), ("%#x\n", cBalloonedPages), VERR_INVALID_PARAMETER);
3829
3830 /*
3831 * Validate input and get the basics.
3832 */
3833 PGMM pGMM;
3834 GMM_GET_VALID_INSTANCE(pGMM, VERR_GMM_INSTANCE);
3835 int rc = GVMMR0ValidateGVMandEMT(pGVM, idCpu);
3836 if (RT_FAILURE(rc))
3837 return rc;
3838
3839 /*
3840 * Take the semaphore and do some more validations.
3841 */
3842 gmmR0MutexAcquire(pGMM);
3843 if (GMM_CHECK_SANITY_UPON_ENTERING(pGMM))
3844 {
3845 switch (enmAction)
3846 {
3847 case GMMBALLOONACTION_INFLATE:
3848 {
3849 if (RT_LIKELY(pGVM->gmm.s.Stats.Allocated.cBasePages + pGVM->gmm.s.Stats.cBalloonedPages + cBalloonedPages
3850 <= pGVM->gmm.s.Stats.Reserved.cBasePages))
3851 {
3852 /*
3853 * Record the ballooned memory.
3854 */
3855 pGMM->cBalloonedPages += cBalloonedPages;
3856 if (pGVM->gmm.s.Stats.cReqBalloonedPages)
3857 {
3858 /* Codepath never taken. Might be interesting in the future to request ballooned memory from guests in low memory conditions... */
3859 AssertFailed();
3860
3861 pGVM->gmm.s.Stats.cBalloonedPages += cBalloonedPages;
3862 pGVM->gmm.s.Stats.cReqActuallyBalloonedPages += cBalloonedPages;
3863 Log(("GMMR0BalloonedPages: +%#x - Global=%#llx / VM: Total=%#llx Req=%#llx Actual=%#llx (pending)\n",
3864 cBalloonedPages, pGMM->cBalloonedPages, pGVM->gmm.s.Stats.cBalloonedPages,
3865 pGVM->gmm.s.Stats.cReqBalloonedPages, pGVM->gmm.s.Stats.cReqActuallyBalloonedPages));
3866 }
3867 else
3868 {
3869 pGVM->gmm.s.Stats.cBalloonedPages += cBalloonedPages;
3870 Log(("GMMR0BalloonedPages: +%#x - Global=%#llx / VM: Total=%#llx (user)\n",
3871 cBalloonedPages, pGMM->cBalloonedPages, pGVM->gmm.s.Stats.cBalloonedPages));
3872 }
3873 }
3874 else
3875 {
3876 Log(("GMMR0BalloonedPages: cBasePages=%#llx Total=%#llx cBalloonedPages=%#llx Reserved=%#llx\n",
3877 pGVM->gmm.s.Stats.Allocated.cBasePages, pGVM->gmm.s.Stats.cBalloonedPages, cBalloonedPages,
3878 pGVM->gmm.s.Stats.Reserved.cBasePages));
3879 rc = VERR_GMM_ATTEMPT_TO_FREE_TOO_MUCH;
3880 }
3881 break;
3882 }
3883
3884 case GMMBALLOONACTION_DEFLATE:
3885 {
3886 /* Deflate. */
3887 if (pGVM->gmm.s.Stats.cBalloonedPages >= cBalloonedPages)
3888 {
3889 /*
3890 * Record the ballooned memory.
3891 */
3892 Assert(pGMM->cBalloonedPages >= cBalloonedPages);
3893 pGMM->cBalloonedPages -= cBalloonedPages;
3894 pGVM->gmm.s.Stats.cBalloonedPages -= cBalloonedPages;
3895 if (pGVM->gmm.s.Stats.cReqDeflatePages)
3896 {
3897 AssertFailed(); /* This path is for later. */
3898 Log(("GMMR0BalloonedPages: -%#x - Global=%#llx / VM: Total=%#llx Req=%#llx\n",
3899 cBalloonedPages, pGMM->cBalloonedPages, pGVM->gmm.s.Stats.cBalloonedPages, pGVM->gmm.s.Stats.cReqDeflatePages));
3900
3901 /*
3902 * Anything we need to do here now when the request has been completed?
3903 */
3904 pGVM->gmm.s.Stats.cReqDeflatePages = 0;
3905 }
3906 else
3907 Log(("GMMR0BalloonedPages: -%#x - Global=%#llx / VM: Total=%#llx (user)\n",
3908 cBalloonedPages, pGMM->cBalloonedPages, pGVM->gmm.s.Stats.cBalloonedPages));
3909 }
3910 else
3911 {
3912 Log(("GMMR0BalloonedPages: Total=%#llx cBalloonedPages=%#llx\n", pGVM->gmm.s.Stats.cBalloonedPages, cBalloonedPages));
3913 rc = VERR_GMM_ATTEMPT_TO_DEFLATE_TOO_MUCH;
3914 }
3915 break;
3916 }
3917
3918 case GMMBALLOONACTION_RESET:
3919 {
3920 /* Reset to an empty balloon. */
3921 Assert(pGMM->cBalloonedPages >= pGVM->gmm.s.Stats.cBalloonedPages);
3922
3923 pGMM->cBalloonedPages -= pGVM->gmm.s.Stats.cBalloonedPages;
3924 pGVM->gmm.s.Stats.cBalloonedPages = 0;
3925 break;
3926 }
3927
3928 default:
3929 rc = VERR_INVALID_PARAMETER;
3930 break;
3931 }
3932 GMM_CHECK_SANITY_UPON_LEAVING(pGMM);
3933 }
3934 else
3935 rc = VERR_GMM_IS_NOT_SANE;
3936
3937 gmmR0MutexRelease(pGMM);
3938 LogFlow(("GMMR0BalloonedPages: returns %Rrc\n", rc));
3939 return rc;
3940}
3941
3942
3943/**
3944 * VMMR0 request wrapper for GMMR0BalloonedPages.
3945 *
3946 * @returns see GMMR0BalloonedPages.
3947 * @param pGVM The global (ring-0) VM structure.
3948 * @param idCpu The VCPU id.
3949 * @param pReq Pointer to the request packet.
3950 */
3951GMMR0DECL(int) GMMR0BalloonedPagesReq(PGVM pGVM, VMCPUID idCpu, PGMMBALLOONEDPAGESREQ pReq)
3952{
3953 /*
3954 * Validate input and pass it on.
3955 */
3956 AssertPtrReturn(pReq, VERR_INVALID_POINTER);
3957 AssertMsgReturn(pReq->Hdr.cbReq == sizeof(GMMBALLOONEDPAGESREQ),
3958 ("%#x < %#x\n", pReq->Hdr.cbReq, sizeof(GMMBALLOONEDPAGESREQ)),
3959 VERR_INVALID_PARAMETER);
3960
3961 return GMMR0BalloonedPages(pGVM, idCpu, pReq->enmAction, pReq->cBalloonedPages);
3962}
3963
3964
3965/**
3966 * Return memory statistics for the hypervisor
3967 *
3968 * @returns VBox status code.
3969 * @param pReq Pointer to the request packet.
3970 */
3971GMMR0DECL(int) GMMR0QueryHypervisorMemoryStatsReq(PGMMMEMSTATSREQ pReq)
3972{
3973 /*
3974 * Validate input and pass it on.
3975 */
3976 AssertPtrReturn(pReq, VERR_INVALID_POINTER);
3977 AssertMsgReturn(pReq->Hdr.cbReq == sizeof(GMMMEMSTATSREQ),
3978 ("%#x < %#x\n", pReq->Hdr.cbReq, sizeof(GMMMEMSTATSREQ)),
3979 VERR_INVALID_PARAMETER);
3980
3981 /*
3982 * Validate input and get the basics.
3983 */
3984 PGMM pGMM;
3985 GMM_GET_VALID_INSTANCE(pGMM, VERR_GMM_INSTANCE);
3986 pReq->cAllocPages = pGMM->cAllocatedPages;
3987 pReq->cFreePages = (pGMM->cChunks << (GMM_CHUNK_SHIFT - PAGE_SHIFT)) - pGMM->cAllocatedPages;
3988 pReq->cBalloonedPages = pGMM->cBalloonedPages;
3989 pReq->cMaxPages = pGMM->cMaxPages;
3990 pReq->cSharedPages = pGMM->cDuplicatePages;
3991 GMM_CHECK_SANITY_UPON_LEAVING(pGMM);
3992
3993 return VINF_SUCCESS;
3994}
3995
3996
3997/**
3998 * Return memory statistics for the VM
3999 *
4000 * @returns VBox status code.
4001 * @param pGVM The global (ring-0) VM structure.
4002 * @param idCpu Cpu id.
4003 * @param pReq Pointer to the request packet.
4004 *
4005 * @thread EMT(idCpu)
4006 */
4007GMMR0DECL(int) GMMR0QueryMemoryStatsReq(PGVM pGVM, VMCPUID idCpu, PGMMMEMSTATSREQ pReq)
4008{
4009 /*
4010 * Validate input and pass it on.
4011 */
4012 AssertPtrReturn(pReq, VERR_INVALID_POINTER);
4013 AssertMsgReturn(pReq->Hdr.cbReq == sizeof(GMMMEMSTATSREQ),
4014 ("%#x < %#x\n", pReq->Hdr.cbReq, sizeof(GMMMEMSTATSREQ)),
4015 VERR_INVALID_PARAMETER);
4016
4017 /*
4018 * Validate input and get the basics.
4019 */
4020 PGMM pGMM;
4021 GMM_GET_VALID_INSTANCE(pGMM, VERR_GMM_INSTANCE);
4022 int rc = GVMMR0ValidateGVMandEMT(pGVM, idCpu);
4023 if (RT_FAILURE(rc))
4024 return rc;
4025
4026 /*
4027 * Take the semaphore and do some more validations.
4028 */
4029 gmmR0MutexAcquire(pGMM);
4030 if (GMM_CHECK_SANITY_UPON_ENTERING(pGMM))
4031 {
4032 pReq->cAllocPages = pGVM->gmm.s.Stats.Allocated.cBasePages;
4033 pReq->cBalloonedPages = pGVM->gmm.s.Stats.cBalloonedPages;
4034 pReq->cMaxPages = pGVM->gmm.s.Stats.Reserved.cBasePages;
4035 pReq->cFreePages = pReq->cMaxPages - pReq->cAllocPages;
4036 }
4037 else
4038 rc = VERR_GMM_IS_NOT_SANE;
4039
4040 gmmR0MutexRelease(pGMM);
4041 LogFlow(("GMMR0QueryMemoryStatsReq: returns %Rrc\n", rc));
4042 return rc;
4043}
4044
4045
4046/**
4047 * Worker for gmmR0UnmapChunk and gmmr0FreeChunk.
4048 *
4049 * Don't call this in legacy allocation mode!
4050 *
4051 * @returns VBox status code.
4052 * @param pGMM Pointer to the GMM instance data.
4053 * @param pGVM Pointer to the Global VM structure.
4054 * @param pChunk Pointer to the chunk to be unmapped.
4055 */
4056static int gmmR0UnmapChunkLocked(PGMM pGMM, PGVM pGVM, PGMMCHUNK pChunk)
4057{
4058 RT_NOREF_PV(pGMM);
4059
4060 /*
4061 * Find the mapping and try unmapping it.
4062 */
4063 uint32_t cMappings = pChunk->cMappingsX;
4064 for (uint32_t i = 0; i < cMappings; i++)
4065 {
4066 Assert(pChunk->paMappingsX[i].pGVM && pChunk->paMappingsX[i].hMapObj != NIL_RTR0MEMOBJ);
4067 if (pChunk->paMappingsX[i].pGVM == pGVM)
4068 {
4069 /* unmap */
4070 int rc = RTR0MemObjFree(pChunk->paMappingsX[i].hMapObj, false /* fFreeMappings (NA) */);
4071 if (RT_SUCCESS(rc))
4072 {
4073 /* update the record. */
4074 cMappings--;
4075 if (i < cMappings)
4076 pChunk->paMappingsX[i] = pChunk->paMappingsX[cMappings];
4077 pChunk->paMappingsX[cMappings].hMapObj = NIL_RTR0MEMOBJ;
4078 pChunk->paMappingsX[cMappings].pGVM = NULL;
4079 Assert(pChunk->cMappingsX - 1U == cMappings);
4080 pChunk->cMappingsX = cMappings;
4081 }
4082
4083 return rc;
4084 }
4085 }
4086
4087 Log(("gmmR0UnmapChunk: Chunk %#x is not mapped into pGVM=%p/%#x\n", pChunk->Core.Key, pGVM, pGVM->hSelf));
4088 return VERR_GMM_CHUNK_NOT_MAPPED;
4089}
4090
4091
4092/**
4093 * Unmaps a chunk previously mapped into the address space of the current process.
4094 *
4095 * @returns VBox status code.
4096 * @param pGMM Pointer to the GMM instance data.
4097 * @param pGVM Pointer to the Global VM structure.
4098 * @param pChunk Pointer to the chunk to be unmapped.
4099 * @param fRelaxedSem Whether we can release the semaphore while doing the
4100 * mapping (@c true) or not.
4101 */
4102static int gmmR0UnmapChunk(PGMM pGMM, PGVM pGVM, PGMMCHUNK pChunk, bool fRelaxedSem)
4103{
4104 /*
4105 * Lock the chunk and if possible leave the giant GMM lock.
4106 */
4107 GMMR0CHUNKMTXSTATE MtxState;
4108 int rc = gmmR0ChunkMutexAcquire(&MtxState, pGMM, pChunk,
4109 fRelaxedSem ? GMMR0CHUNK_MTX_RETAKE_GIANT : GMMR0CHUNK_MTX_KEEP_GIANT);
4110 if (RT_SUCCESS(rc))
4111 {
4112 rc = gmmR0UnmapChunkLocked(pGMM, pGVM, pChunk);
4113 gmmR0ChunkMutexRelease(&MtxState, pChunk);
4114 }
4115 return rc;
4116}
4117
4118
4119/**
4120 * Worker for gmmR0MapChunk.
4121 *
4122 * @returns VBox status code.
4123 * @param pGMM Pointer to the GMM instance data.
4124 * @param pGVM Pointer to the Global VM structure.
4125 * @param pChunk Pointer to the chunk to be mapped.
4126 * @param ppvR3 Where to store the ring-3 address of the mapping.
4127 * In the VERR_GMM_CHUNK_ALREADY_MAPPED case, this will
4128 * contain the address of the existing mapping.
4129 */
4130static int gmmR0MapChunkLocked(PGMM pGMM, PGVM pGVM, PGMMCHUNK pChunk, PRTR3PTR ppvR3)
4131{
4132 RT_NOREF(pGMM);
4133
4134 /*
4135 * Check to see if the chunk is already mapped.
4136 */
4137 for (uint32_t i = 0; i < pChunk->cMappingsX; i++)
4138 {
4139 Assert(pChunk->paMappingsX[i].pGVM && pChunk->paMappingsX[i].hMapObj != NIL_RTR0MEMOBJ);
4140 if (pChunk->paMappingsX[i].pGVM == pGVM)
4141 {
4142 *ppvR3 = RTR0MemObjAddressR3(pChunk->paMappingsX[i].hMapObj);
4143 Log(("gmmR0MapChunk: chunk %#x is already mapped at %p!\n", pChunk->Core.Key, *ppvR3));
4144#ifdef VBOX_WITH_PAGE_SHARING
4145 /* The ring-3 chunk cache can be out of sync; don't fail. */
4146 return VINF_SUCCESS;
4147#else
4148 return VERR_GMM_CHUNK_ALREADY_MAPPED;
4149#endif
4150 }
4151 }
4152
4153 /*
4154 * Do the mapping.
4155 */
4156 RTR0MEMOBJ hMapObj;
4157 int rc = RTR0MemObjMapUser(&hMapObj, pChunk->hMemObj, (RTR3PTR)-1, 0, RTMEM_PROT_READ | RTMEM_PROT_WRITE, NIL_RTR0PROCESS);
4158 if (RT_SUCCESS(rc))
4159 {
4160 /* reallocate the array? assumes few users per chunk (usually one). */
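        /* Growth pattern: one element at a time up to a capacity of 4, then in
           steps of 4 (4 -> 8 -> 12 -> ...); a reallocation is only needed when
           the current count is <= 3 or a multiple of 4. */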
4161 unsigned iMapping = pChunk->cMappingsX;
4162 if ( iMapping <= 3
4163 || (iMapping & 3) == 0)
4164 {
4165 unsigned cNewSize = iMapping <= 3
4166 ? iMapping + 1
4167 : iMapping + 4;
4168 Assert(cNewSize < 4 || RT_ALIGN_32(cNewSize, 4) == cNewSize);
4169 if (RT_UNLIKELY(cNewSize > UINT16_MAX))
4170 {
4171 rc = RTR0MemObjFree(hMapObj, false /* fFreeMappings (NA) */); AssertRC(rc);
4172 return VERR_GMM_TOO_MANY_CHUNK_MAPPINGS;
4173 }
4174
4175 void *pvMappings = RTMemRealloc(pChunk->paMappingsX, cNewSize * sizeof(pChunk->paMappingsX[0]));
4176 if (RT_UNLIKELY(!pvMappings))
4177 {
4178 rc = RTR0MemObjFree(hMapObj, false /* fFreeMappings (NA) */); AssertRC(rc);
4179 return VERR_NO_MEMORY;
4180 }
4181 pChunk->paMappingsX = (PGMMCHUNKMAP)pvMappings;
4182 }
4183
4184 /* insert new entry */
4185 pChunk->paMappingsX[iMapping].hMapObj = hMapObj;
4186 pChunk->paMappingsX[iMapping].pGVM = pGVM;
4187 Assert(pChunk->cMappingsX == iMapping);
4188 pChunk->cMappingsX = iMapping + 1;
4189
4190 *ppvR3 = RTR0MemObjAddressR3(hMapObj);
4191 }
4192
4193 return rc;
4194}
4195
4196
4197/**
4198 * Maps a chunk into the user address space of the current process.
4199 *
4200 * @returns VBox status code.
4201 * @param pGMM Pointer to the GMM instance data.
4202 * @param pGVM Pointer to the Global VM structure.
4203 * @param pChunk Pointer to the chunk to be mapped.
4204 * @param fRelaxedSem Whether we can release the semaphore while doing the
4205 * mapping (@c true) or not.
4206 * @param ppvR3 Where to store the ring-3 address of the mapping.
4207 * In the VERR_GMM_CHUNK_ALREADY_MAPPED case, this will
4208 * contain the address of the existing mapping.
4209 */
4210static int gmmR0MapChunk(PGMM pGMM, PGVM pGVM, PGMMCHUNK pChunk, bool fRelaxedSem, PRTR3PTR ppvR3)
4211{
4212 /*
4213 * Take the chunk lock and leave the giant GMM lock when possible, then
4214 * call the worker function.
4215 */
4216 GMMR0CHUNKMTXSTATE MtxState;
4217 int rc = gmmR0ChunkMutexAcquire(&MtxState, pGMM, pChunk,
4218 fRelaxedSem ? GMMR0CHUNK_MTX_RETAKE_GIANT : GMMR0CHUNK_MTX_KEEP_GIANT);
4219 if (RT_SUCCESS(rc))
4220 {
4221 rc = gmmR0MapChunkLocked(pGMM, pGVM, pChunk, ppvR3);
4222 gmmR0ChunkMutexRelease(&MtxState, pChunk);
4223 }
4224
4225 return rc;
4226}
4227
4228
4229
4230#if defined(VBOX_WITH_PAGE_SHARING) || defined(VBOX_STRICT)
4231/**
4232 * Check if a chunk is mapped into the specified VM
4233 *
4234 * @returns mapped yes/no
4235 * @param pGMM Pointer to the GMM instance.
4236 * @param pGVM Pointer to the Global VM structure.
4237 * @param pChunk Pointer to the chunk to be mapped.
4238 * @param ppvR3 Where to store the ring-3 address of the mapping.
4239 */
4240static bool gmmR0IsChunkMapped(PGMM pGMM, PGVM pGVM, PGMMCHUNK pChunk, PRTR3PTR ppvR3)
4241{
4242 GMMR0CHUNKMTXSTATE MtxState;
4243 gmmR0ChunkMutexAcquire(&MtxState, pGMM, pChunk, GMMR0CHUNK_MTX_KEEP_GIANT);
4244 for (uint32_t i = 0; i < pChunk->cMappingsX; i++)
4245 {
4246 Assert(pChunk->paMappingsX[i].pGVM && pChunk->paMappingsX[i].hMapObj != NIL_RTR0MEMOBJ);
4247 if (pChunk->paMappingsX[i].pGVM == pGVM)
4248 {
4249 *ppvR3 = RTR0MemObjAddressR3(pChunk->paMappingsX[i].hMapObj);
4250 gmmR0ChunkMutexRelease(&MtxState, pChunk);
4251 return true;
4252 }
4253 }
4254 *ppvR3 = NULL;
4255 gmmR0ChunkMutexRelease(&MtxState, pChunk);
4256 return false;
4257}
4258#endif /* VBOX_WITH_PAGE_SHARING || VBOX_STRICT */
4259
4260
4261/**
4262 * Map a chunk and/or unmap another chunk.
4263 *
4264 * The mapping and unmapping applies to the current process.
4265 *
4266 * This API does two things because it saves a kernel call per mapping when
4267 * the ring-3 mapping cache is full.
4268 *
4269 * @returns VBox status code.
4270 * @param pGVM The global (ring-0) VM structure.
4271 * @param idChunkMap The chunk to map. NIL_GMM_CHUNKID if nothing to map.
4272 * @param idChunkUnmap The chunk to unmap. NIL_GMM_CHUNKID if nothing to unmap.
4273 * @param ppvR3 Where to store the address of the mapped chunk. NULL is ok if nothing to map.
4274 * @thread EMT ???
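 *
 * A minimal usage sketch, mapping one chunk while evicting another from the
 * ring-3 cache (the chunk IDs are illustrative):
 * @code
 *      RTR3PTR pvR3 = NIL_RTR3PTR;
 *      int rc = GMMR0MapUnmapChunk(pGVM, idChunkToMap, idChunkToEvict, &pvR3);
 *      // pass NIL_GMM_CHUNKID for either ID when only one of the operations is wanted
 * @endcode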
4275 */
4276GMMR0DECL(int) GMMR0MapUnmapChunk(PGVM pGVM, uint32_t idChunkMap, uint32_t idChunkUnmap, PRTR3PTR ppvR3)
4277{
4278 LogFlow(("GMMR0MapUnmapChunk: pGVM=%p idChunkMap=%#x idChunkUnmap=%#x ppvR3=%p\n",
4279 pGVM, idChunkMap, idChunkUnmap, ppvR3));
4280
4281 /*
4282 * Validate input and get the basics.
4283 */
4284 PGMM pGMM;
4285 GMM_GET_VALID_INSTANCE(pGMM, VERR_GMM_INSTANCE);
4286 int rc = GVMMR0ValidateGVM(pGVM);
4287 if (RT_FAILURE(rc))
4288 return rc;
4289
4290 AssertCompile(NIL_GMM_CHUNKID == 0);
4291 AssertMsgReturn(idChunkMap <= GMM_CHUNKID_LAST, ("%#x\n", idChunkMap), VERR_INVALID_PARAMETER);
4292 AssertMsgReturn(idChunkUnmap <= GMM_CHUNKID_LAST, ("%#x\n", idChunkUnmap), VERR_INVALID_PARAMETER);
4293
4294 if ( idChunkMap == NIL_GMM_CHUNKID
4295 && idChunkUnmap == NIL_GMM_CHUNKID)
4296 return VERR_INVALID_PARAMETER;
4297
4298 if (idChunkMap != NIL_GMM_CHUNKID)
4299 {
4300 AssertPtrReturn(ppvR3, VERR_INVALID_POINTER);
4301 *ppvR3 = NIL_RTR3PTR;
4302 }
4303
4304 /*
4305 * Take the semaphore and do the work.
4306 *
4307 * The unmapping is done last since it's easier to undo a mapping than
4308 * undoing an unmapping. The ring-3 mapping cache cannot be so big
4309 * that it pushes the user virtual address space to within a chunk of
4310 * its limits, so no problem here.
4311 */
4312 gmmR0MutexAcquire(pGMM);
4313 if (GMM_CHECK_SANITY_UPON_ENTERING(pGMM))
4314 {
4315 PGMMCHUNK pMap = NULL;
4316 if (idChunkMap != NIL_GMM_CHUNKID)
4317 {
4318 pMap = gmmR0GetChunk(pGMM, idChunkMap);
4319 if (RT_LIKELY(pMap))
4320 rc = gmmR0MapChunk(pGMM, pGVM, pMap, true /*fRelaxedSem*/, ppvR3);
4321 else
4322 {
4323 Log(("GMMR0MapUnmapChunk: idChunkMap=%#x\n", idChunkMap));
4324 rc = VERR_GMM_CHUNK_NOT_FOUND;
4325 }
4326 }
4327/** @todo split this operation, the bail out might (theoretically) not be
4328 * entirely safe. */
4329
4330 if ( idChunkUnmap != NIL_GMM_CHUNKID
4331 && RT_SUCCESS(rc))
4332 {
4333 PGMMCHUNK pUnmap = gmmR0GetChunk(pGMM, idChunkUnmap);
4334 if (RT_LIKELY(pUnmap))
4335 rc = gmmR0UnmapChunk(pGMM, pGVM, pUnmap, true /*fRelaxedSem*/);
4336 else
4337 {
4338 Log(("GMMR0MapUnmapChunk: idChunkUnmap=%#x\n", idChunkUnmap));
4339 rc = VERR_GMM_CHUNK_NOT_FOUND;
4340 }
4341
4342 if (RT_FAILURE(rc) && pMap)
4343 gmmR0UnmapChunk(pGMM, pGVM, pMap, false /*fRelaxedSem*/);
4344 }
4345
4346 GMM_CHECK_SANITY_UPON_LEAVING(pGMM);
4347 }
4348 else
4349 rc = VERR_GMM_IS_NOT_SANE;
4350 gmmR0MutexRelease(pGMM);
4351
4352 LogFlow(("GMMR0MapUnmapChunk: returns %Rrc\n", rc));
4353 return rc;
4354}
4355
4356
4357/**
4358 * VMMR0 request wrapper for GMMR0MapUnmapChunk.
4359 *
4360 * @returns see GMMR0MapUnmapChunk.
4361 * @param pGVM The global (ring-0) VM structure.
4362 * @param pReq Pointer to the request packet.
4363 */
4364GMMR0DECL(int) GMMR0MapUnmapChunkReq(PGVM pGVM, PGMMMAPUNMAPCHUNKREQ pReq)
4365{
4366 /*
4367 * Validate input and pass it on.
4368 */
4369 AssertPtrReturn(pReq, VERR_INVALID_POINTER);
4370 AssertMsgReturn(pReq->Hdr.cbReq == sizeof(*pReq), ("%#x != %#x\n", pReq->Hdr.cbReq, sizeof(*pReq)), VERR_INVALID_PARAMETER);
4371
4372 return GMMR0MapUnmapChunk(pGVM, pReq->idChunkMap, pReq->idChunkUnmap, &pReq->pvR3);
4373}
4374
4375
4376#ifndef VBOX_WITH_LINEAR_HOST_PHYS_MEM
4377/**
4378 * Gets the ring-0 virtual address for the given page.
4379 *
4380 * This is used by PGM when IEM and such wants to access guest RAM from ring-0.
4381 * One of the ASSUMPTIONS here is that the @a idPage is used by the VM and the
4382 * corresponding chunk will remain valid beyond the call (at least till the EMT
4383 * returns to ring-3).
4384 *
4385 * @returns VBox status code.
4386 * @param pGVM Pointer to the kernel-only VM instance data.
4387 * @param idPage The page ID.
4388 * @param ppv Where to store the address.
4389 * @thread EMT
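 *
 * The returned address is simply the ring-0 chunk mapping plus the page
 * offset, i.e. a sketch of what the lookup below computes:
 * @code
 *      pv = &pChunk->pbMapping[(idPage & GMM_PAGEID_IDX_MASK) << PAGE_SHIFT];
 * @endcode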
4390 */
4391GMMR0DECL(int) GMMR0PageIdToVirt(PGVM pGVM, uint32_t idPage, void **ppv)
4392{
4393 *ppv = NULL;
4394 PGMM pGMM;
4395 GMM_GET_VALID_INSTANCE(pGMM, VERR_GMM_INSTANCE);
4396
4397 uint32_t const idChunk = idPage >> GMM_CHUNKID_SHIFT;
4398
4399 /*
4400 * Start with the per-VM TLB.
4401 */
4402 RTSpinlockAcquire(pGVM->gmm.s.hChunkTlbSpinLock);
4403
4404 PGMMPERVMCHUNKTLBE pTlbe = &pGVM->gmm.s.aChunkTlbEntries[GMMPERVM_CHUNKTLB_IDX(idChunk)];
4405 PGMMCHUNK pChunk = pTlbe->pChunk;
4406 if ( pChunk != NULL
4407 && pTlbe->idGeneration == ASMAtomicUoReadU64(&pGMM->idFreeGeneration)
4408 && pChunk->Core.Key == idChunk)
4409 pGVM->R0Stats.gmm.cChunkTlbHits++; /* hopefully this is a likely outcome */
4410 else
4411 {
4412 pGVM->R0Stats.gmm.cChunkTlbMisses++;
4413
4414 /*
4415 * Look it up in the chunk tree.
4416 */
4417 RTSpinlockAcquire(pGMM->hSpinLockTree);
4418 pChunk = gmmR0GetChunkLocked(pGMM, idChunk);
4419 if (RT_LIKELY(pChunk))
4420 {
4421 pTlbe->idGeneration = pGMM->idFreeGeneration;
4422 RTSpinlockRelease(pGMM->hSpinLockTree);
4423 pTlbe->pChunk = pChunk;
4424 }
4425 else
4426 {
4427 RTSpinlockRelease(pGMM->hSpinLockTree);
4428 RTSpinlockRelease(pGVM->gmm.s.hChunkTlbSpinLock);
4429 AssertMsgFailed(("idPage=%#x\n", idPage));
4430 return VERR_GMM_PAGE_NOT_FOUND;
4431 }
4432 }
4433
4434 RTSpinlockRelease(pGVM->gmm.s.hChunkTlbSpinLock);
4435
4436 /*
4437 * Got a chunk, now validate the page ownership and calculate its address.
4438 */
4439 const GMMPAGE * const pPage = &pChunk->aPages[idPage & GMM_PAGEID_IDX_MASK];
4440 if (RT_LIKELY( ( GMM_PAGE_IS_PRIVATE(pPage)
4441 && pPage->Private.hGVM == pGVM->hSelf)
4442 || GMM_PAGE_IS_SHARED(pPage)))
4443 {
4444 AssertPtr(pChunk->pbMapping);
4445 *ppv = &pChunk->pbMapping[(idPage & GMM_PAGEID_IDX_MASK) << PAGE_SHIFT];
4446 return VINF_SUCCESS;
4447 }
4448 AssertMsgFailed(("idPage=%#x is-private=%RTbool Private.hGVM=%u pGVM->hGVM=%u\n",
4449 idPage, GMM_PAGE_IS_PRIVATE(pPage), pPage->Private.hGVM, pGVM->hSelf));
4450 return VERR_GMM_NOT_PAGE_OWNER;
4451}
4452#endif /* !VBOX_WITH_LINEAR_HOST_PHYS_MEM */
4453
4454#ifdef VBOX_WITH_PAGE_SHARING
4455
4456# ifdef VBOX_STRICT
4457/**
4458 * For checksumming shared pages in strict builds.
4459 *
4460 * The purpose is making sure that a page doesn't change.
4461 *
4462 * @returns Checksum, 0 on failure.
4463 * @param pGMM The GMM instance data.
4464 * @param pGVM Pointer to the kernel-only VM instance data.
4465 * @param idPage The page ID.
4466 */
4467static uint32_t gmmR0StrictPageChecksum(PGMM pGMM, PGVM pGVM, uint32_t idPage)
4468{
4469 PGMMCHUNK pChunk = gmmR0GetChunk(pGMM, idPage >> GMM_CHUNKID_SHIFT);
4470 AssertMsgReturn(pChunk, ("idPage=%#x\n", idPage), 0);
4471
4472 uint8_t *pbChunk;
4473 if (!gmmR0IsChunkMapped(pGMM, pGVM, pChunk, (PRTR3PTR)&pbChunk))
4474 return 0;
4475 uint8_t const *pbPage = pbChunk + ((idPage & GMM_PAGEID_IDX_MASK) << PAGE_SHIFT);
4476
4477 return RTCrc32(pbPage, PAGE_SIZE);
4478}
4479# endif /* VBOX_STRICT */
4480
4481
4482/**
4483 * Calculates the module hash value.
4484 *
4485 * @returns Hash value.
4486 * @param pszModuleName The module name.
4487 * @param pszVersion The module version string.
4488 */
4489static uint32_t gmmR0ShModCalcHash(const char *pszModuleName, const char *pszVersion)
4490{
4491 return RTStrHash1ExN(3, pszModuleName, RTSTR_MAX, "::", (size_t)2, pszVersion, RTSTR_MAX);
4492}
4493
4494
4495/**
4496 * Finds a global module.
4497 *
4498 * @returns Pointer to the global module on success, NULL if not found.
4499 * @param pGMM The GMM instance data.
4500 * @param uHash The hash as calculated by gmmR0ShModCalcHash.
4501 * @param cbModule The module size.
4502 * @param enmGuestOS The guest OS type.
4503 * @param cRegions The number of regions.
4504 * @param pszModuleName The module name.
4505 * @param pszVersion The module version.
4506 * @param paRegions The region descriptions.
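 *
 * Region comparison is done on page aligned values; e.g. for a hypothetical
 * region at GCRegionAddr 0x7fff1234 spanning 0x2000 bytes:
 * @code
 *      off = 0x7fff1234 & PAGE_OFFSET_MASK;        // 0x234
 *      cb  = RT_ALIGN_32(0x2000 + off, PAGE_SIZE); // 0x3000
 * @endcode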
4507 */
4508static PGMMSHAREDMODULE gmmR0ShModFindGlobal(PGMM pGMM, uint32_t uHash, uint32_t cbModule, VBOXOSFAMILY enmGuestOS,
4509 uint32_t cRegions, const char *pszModuleName, const char *pszVersion,
4510 struct VMMDEVSHAREDREGIONDESC const *paRegions)
4511{
4512 for (PGMMSHAREDMODULE pGblMod = (PGMMSHAREDMODULE)RTAvllU32Get(&pGMM->pGlobalSharedModuleTree, uHash);
4513 pGblMod;
4514 pGblMod = (PGMMSHAREDMODULE)pGblMod->Core.pList)
4515 {
4516 if (pGblMod->cbModule != cbModule)
4517 continue;
4518 if (pGblMod->enmGuestOS != enmGuestOS)
4519 continue;
4520 if (pGblMod->cRegions != cRegions)
4521 continue;
4522 if (strcmp(pGblMod->szName, pszModuleName))
4523 continue;
4524 if (strcmp(pGblMod->szVersion, pszVersion))
4525 continue;
4526
4527 uint32_t i;
4528 for (i = 0; i < cRegions; i++)
4529 {
4530 uint32_t off = paRegions[i].GCRegionAddr & PAGE_OFFSET_MASK;
4531 if (pGblMod->aRegions[i].off != off)
4532 break;
4533
4534 uint32_t cb = RT_ALIGN_32(paRegions[i].cbRegion + off, PAGE_SIZE);
4535 if (pGblMod->aRegions[i].cb != cb)
4536 break;
4537 }
4538
4539 if (i == cRegions)
4540 return pGblMod;
4541 }
4542
4543 return NULL;
4544}
4545
4546
4547/**
4548 * Creates a new global module.
4549 *
4550 * @returns VBox status code.
4551 * @param pGMM The GMM instance data.
4552 * @param uHash The hash as calculated by gmmR0ShModCalcHash.
4553 * @param cbModule The module size.
4554 * @param enmGuestOS The guest OS type.
4555 * @param cRegions The number of regions.
4556 * @param pszModuleName The module name.
4557 * @param pszVersion The module version.
4558 * @param paRegions The region descriptions.
4559 * @param ppGblMod Where to return the new module on success.
4560 */
4561static int gmmR0ShModNewGlobal(PGMM pGMM, uint32_t uHash, uint32_t cbModule, VBOXOSFAMILY enmGuestOS,
4562 uint32_t cRegions, const char *pszModuleName, const char *pszVersion,
4563 struct VMMDEVSHAREDREGIONDESC const *paRegions, PGMMSHAREDMODULE *ppGblMod)
4564{
4565 Log(("gmmR0ShModNewGlobal: %s %s size %#x os %u rgn %u\n", pszModuleName, pszVersion, cbModule, enmGuestOS, cRegions));
4566 if (pGMM->cShareableModules >= GMM_MAX_SHARED_GLOBAL_MODULES)
4567 {
4568 Log(("gmmR0ShModNewGlobal: Too many modules\n"));
4569 return VERR_GMM_TOO_MANY_GLOBAL_MODULES;
4570 }
4571
4572 PGMMSHAREDMODULE pGblMod = (PGMMSHAREDMODULE)RTMemAllocZ(RT_UOFFSETOF_DYN(GMMSHAREDMODULE, aRegions[cRegions]));
4573 if (!pGblMod)
4574 {
4575 Log(("gmmR0ShModNewGlobal: No memory\n"));
4576 return VERR_NO_MEMORY;
4577 }
4578
4579 pGblMod->Core.Key = uHash;
4580 pGblMod->cbModule = cbModule;
4581 pGblMod->cRegions = cRegions;
4582 pGblMod->cUsers = 1;
4583 pGblMod->enmGuestOS = enmGuestOS;
4584 strcpy(pGblMod->szName, pszModuleName);
4585 strcpy(pGblMod->szVersion, pszVersion);
4586
4587 for (uint32_t i = 0; i < cRegions; i++)
4588 {
4589 Log(("gmmR0ShModNewGlobal: rgn[%u]=%RGvLB%#x\n", i, paRegions[i].GCRegionAddr, paRegions[i].cbRegion));
4590 pGblMod->aRegions[i].off = paRegions[i].GCRegionAddr & PAGE_OFFSET_MASK;
4591 pGblMod->aRegions[i].cb = paRegions[i].cbRegion + pGblMod->aRegions[i].off;
4592 pGblMod->aRegions[i].cb = RT_ALIGN_32(pGblMod->aRegions[i].cb, PAGE_SIZE);
4593 pGblMod->aRegions[i].paidPages = NULL; /* allocated when needed. */
4594 }
4595
4596 bool fInsert = RTAvllU32Insert(&pGMM->pGlobalSharedModuleTree, &pGblMod->Core);
4597 Assert(fInsert); NOREF(fInsert);
4598 pGMM->cShareableModules++;
4599
4600 *ppGblMod = pGblMod;
4601 return VINF_SUCCESS;
4602}
4603
4604
4605/**
4606 * Deletes a global module which is no longer referenced by anyone.
4607 *
4608 * @param pGMM The GMM instance data.
4609 * @param pGblMod The module to delete.
4610 */
4611static void gmmR0ShModDeleteGlobal(PGMM pGMM, PGMMSHAREDMODULE pGblMod)
4612{
4613 Assert(pGblMod->cUsers == 0);
4614 Assert(pGMM->cShareableModules > 0 && pGMM->cShareableModules <= GMM_MAX_SHARED_GLOBAL_MODULES);
4615
4616 void *pvTest = RTAvllU32RemoveNode(&pGMM->pGlobalSharedModuleTree, &pGblMod->Core);
4617 Assert(pvTest == pGblMod); NOREF(pvTest);
4618 pGMM->cShareableModules--;
4619
4620 uint32_t i = pGblMod->cRegions;
4621 while (i-- > 0)
4622 {
4623 if (pGblMod->aRegions[i].paidPages)
4624 {
4625 /* We don't do anything to the pages as they are handled by the
4626 copy-on-write mechanism in PGM. */
4627 RTMemFree(pGblMod->aRegions[i].paidPages);
4628 pGblMod->aRegions[i].paidPages = NULL;
4629 }
4630 }
4631 RTMemFree(pGblMod);
4632}
4633
4634
4635static int gmmR0ShModNewPerVM(PGVM pGVM, RTGCPTR GCBaseAddr, uint32_t cRegions, const VMMDEVSHAREDREGIONDESC *paRegions,
4636 PGMMSHAREDMODULEPERVM *ppRecVM)
4637{
4638 if (pGVM->gmm.s.Stats.cShareableModules >= GMM_MAX_SHARED_PER_VM_MODULES)
4639 return VERR_GMM_TOO_MANY_PER_VM_MODULES;
4640
4641 PGMMSHAREDMODULEPERVM pRecVM;
4642 pRecVM = (PGMMSHAREDMODULEPERVM)RTMemAllocZ(RT_UOFFSETOF_DYN(GMMSHAREDMODULEPERVM, aRegionsGCPtrs[cRegions]));
4643 if (!pRecVM)
4644 return VERR_NO_MEMORY;
4645
4646 pRecVM->Core.Key = GCBaseAddr;
4647 for (uint32_t i = 0; i < cRegions; i++)
4648 pRecVM->aRegionsGCPtrs[i] = paRegions[i].GCRegionAddr;
4649
4650 bool fInsert = RTAvlGCPtrInsert(&pGVM->gmm.s.pSharedModuleTree, &pRecVM->Core);
4651 Assert(fInsert); NOREF(fInsert);
4652 pGVM->gmm.s.Stats.cShareableModules++;
4653
4654 *ppRecVM = pRecVM;
4655 return VINF_SUCCESS;
4656}
4657
4658
4659static void gmmR0ShModDeletePerVM(PGMM pGMM, PGVM pGVM, PGMMSHAREDMODULEPERVM pRecVM, bool fRemove)
4660{
4661 /*
4662 * Free the per-VM module.
4663 */
4664 PGMMSHAREDMODULE pGblMod = pRecVM->pGlobalModule;
4665 pRecVM->pGlobalModule = NULL;
4666
4667 if (fRemove)
4668 {
4669 void *pvTest = RTAvlGCPtrRemove(&pGVM->gmm.s.pSharedModuleTree, pRecVM->Core.Key);
4670 Assert(pvTest == &pRecVM->Core); NOREF(pvTest);
4671 }
4672
4673 RTMemFree(pRecVM);
4674
4675 /*
4676 * Release the global module.
4677 * (In the registration bailout case, it might not be.)
4678 */
4679 if (pGblMod)
4680 {
4681 Assert(pGblMod->cUsers > 0);
4682 pGblMod->cUsers--;
4683 if (pGblMod->cUsers == 0)
4684 gmmR0ShModDeleteGlobal(pGMM, pGblMod);
4685 }
4686}
4687
4688#endif /* VBOX_WITH_PAGE_SHARING */
4689
4690/**
4691 * Registers a new shared module for the VM.
4692 *
4693 * @returns VBox status code.
4694 * @param pGVM The global (ring-0) VM structure.
4695 * @param idCpu The VCPU id.
4696 * @param enmGuestOS The guest OS type.
4697 * @param pszModuleName The module name.
4698 * @param pszVersion The module version.
4699 * @param GCPtrModBase The module base address.
4700 * @param cbModule The module size.
4701 * @param cRegions The number of shared region descriptors.
4702 * @param paRegions Pointer to an array of shared region(s).
4703 * @thread EMT(idCpu)
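 *
 * A minimal single-region registration sketch (all values, including the OS
 * family, are illustrative):
 * @code
 *      char szName[]    = "mymodule.dll";
 *      char szVersion[] = "1.0.0.0";
 *      VMMDEVSHAREDREGIONDESC aRegions[1];
 *      aRegions[0].GCRegionAddr = GCPtrModBase;
 *      aRegions[0].cbRegion     = cbModule;
 *      int rc = GMMR0RegisterSharedModule(pGVM, idCpu, VBOXOSFAMILY_Windows64, szName, szVersion,
 *                                         GCPtrModBase, cbModule, RT_ELEMENTS(aRegions), &aRegions[0]);
 * @endcode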
4704 */
4705GMMR0DECL(int) GMMR0RegisterSharedModule(PGVM pGVM, VMCPUID idCpu, VBOXOSFAMILY enmGuestOS, char *pszModuleName,
4706 char *pszVersion, RTGCPTR GCPtrModBase, uint32_t cbModule,
4707 uint32_t cRegions, struct VMMDEVSHAREDREGIONDESC const *paRegions)
4708{
4709#ifdef VBOX_WITH_PAGE_SHARING
4710 /*
4711 * Validate input and get the basics.
4712 *
4713 * Note! Turns out the module size doesn't necessarily match the size of the
4714 * regions. (iTunes on XP)
4715 */
4716 PGMM pGMM;
4717 GMM_GET_VALID_INSTANCE(pGMM, VERR_GMM_INSTANCE);
4718 int rc = GVMMR0ValidateGVMandEMT(pGVM, idCpu);
4719 if (RT_FAILURE(rc))
4720 return rc;
4721
4722 if (RT_UNLIKELY(cRegions > VMMDEVSHAREDREGIONDESC_MAX))
4723 return VERR_GMM_TOO_MANY_REGIONS;
4724
4725 if (RT_UNLIKELY(cbModule == 0 || cbModule > _1G))
4726 return VERR_GMM_BAD_SHARED_MODULE_SIZE;
4727
4728 uint32_t cbTotal = 0;
4729 for (uint32_t i = 0; i < cRegions; i++)
4730 {
4731 if (RT_UNLIKELY(paRegions[i].cbRegion == 0 || paRegions[i].cbRegion > _1G))
4732 return VERR_GMM_SHARED_MODULE_BAD_REGIONS_SIZE;
4733
4734 cbTotal += paRegions[i].cbRegion;
4735 if (RT_UNLIKELY(cbTotal > _1G))
4736 return VERR_GMM_SHARED_MODULE_BAD_REGIONS_SIZE;
4737 }
4738
4739 AssertPtrReturn(pszModuleName, VERR_INVALID_POINTER);
4740 if (RT_UNLIKELY(!memchr(pszModuleName, '\0', GMM_SHARED_MODULE_MAX_NAME_STRING)))
4741 return VERR_GMM_MODULE_NAME_TOO_LONG;
4742
4743 AssertPtrReturn(pszVersion, VERR_INVALID_POINTER);
4744 if (RT_UNLIKELY(!memchr(pszVersion, '\0', GMM_SHARED_MODULE_MAX_VERSION_STRING)))
4745 return VERR_GMM_MODULE_NAME_TOO_LONG;
4746
4747 uint32_t const uHash = gmmR0ShModCalcHash(pszModuleName, pszVersion);
4748 Log(("GMMR0RegisterSharedModule %s %s base %RGv size %x hash %x\n", pszModuleName, pszVersion, GCPtrModBase, cbModule, uHash));
4749
4750 /*
4751 * Take the semaphore and do some more validations.
4752 */
4753 gmmR0MutexAcquire(pGMM);
4754 if (GMM_CHECK_SANITY_UPON_ENTERING(pGMM))
4755 {
4756 /*
4757 * Check if this module is already locally registered and register
4758 * it if it isn't. The base address is a unique module identifier
4759 * locally.
4760 */
4761 PGMMSHAREDMODULEPERVM pRecVM = (PGMMSHAREDMODULEPERVM)RTAvlGCPtrGet(&pGVM->gmm.s.pSharedModuleTree, GCPtrModBase);
4762 bool fNewModule = pRecVM == NULL;
4763 if (fNewModule)
4764 {
4765 rc = gmmR0ShModNewPerVM(pGVM, GCPtrModBase, cRegions, paRegions, &pRecVM);
4766 if (RT_SUCCESS(rc))
4767 {
4768 /*
4769 * Find a matching global module, register a new one if needed.
4770 */
4771 PGMMSHAREDMODULE pGblMod = gmmR0ShModFindGlobal(pGMM, uHash, cbModule, enmGuestOS, cRegions,
4772 pszModuleName, pszVersion, paRegions);
4773 if (!pGblMod)
4774 {
4775 Assert(fNewModule);
4776 rc = gmmR0ShModNewGlobal(pGMM, uHash, cbModule, enmGuestOS, cRegions,
4777 pszModuleName, pszVersion, paRegions, &pGblMod);
4778 if (RT_SUCCESS(rc))
4779 {
4780 pRecVM->pGlobalModule = pGblMod; /* (One reference returned by gmmR0ShModNewGlobal.) */
4781 Log(("GMMR0RegisterSharedModule: new module %s %s\n", pszModuleName, pszVersion));
4782 }
4783 else
4784 gmmR0ShModDeletePerVM(pGMM, pGVM, pRecVM, true /*fRemove*/);
4785 }
4786 else
4787 {
4788 Assert(pGblMod->cUsers > 0 && pGblMod->cUsers < UINT32_MAX / 2);
4789 pGblMod->cUsers++;
4790 pRecVM->pGlobalModule = pGblMod;
4791
4792 Log(("GMMR0RegisterSharedModule: new per vm module %s %s, gbl users %d\n", pszModuleName, pszVersion, pGblMod->cUsers));
4793 }
4794 }
4795 }
4796 else
4797 {
4798 /*
4799 * Attempt to re-register an existing module.
4800 */
4801 PGMMSHAREDMODULE pGblMod = gmmR0ShModFindGlobal(pGMM, uHash, cbModule, enmGuestOS, cRegions,
4802 pszModuleName, pszVersion, paRegions);
4803 if (pRecVM->pGlobalModule == pGblMod)
4804 {
4805 Log(("GMMR0RegisterSharedModule: already registered %s %s, gbl users %d\n", pszModuleName, pszVersion, pGblMod->cUsers));
4806 rc = VINF_GMM_SHARED_MODULE_ALREADY_REGISTERED;
4807 }
4808 else
4809 {
4810 /** @todo may have to unregister+register when this happens in case it's caused
4811 * by VBoxService crashing and being restarted... */
4812 Log(("GMMR0RegisterSharedModule: Address clash!\n"
4813 " incoming at %RGvLB%#x %s %s rgns %u\n"
4814 " existing at %RGvLB%#x %s %s rgns %u\n",
4815 GCPtrModBase, cbModule, pszModuleName, pszVersion, cRegions,
4816 pRecVM->Core.Key, pRecVM->pGlobalModule->cbModule, pRecVM->pGlobalModule->szName,
4817 pRecVM->pGlobalModule->szVersion, pRecVM->pGlobalModule->cRegions));
4818 rc = VERR_GMM_SHARED_MODULE_ADDRESS_CLASH;
4819 }
4820 }
4821 GMM_CHECK_SANITY_UPON_LEAVING(pGMM);
4822 }
4823 else
4824 rc = VERR_GMM_IS_NOT_SANE;
4825
4826 gmmR0MutexRelease(pGMM);
4827 return rc;
4828#else
4829
4830 NOREF(pGVM); NOREF(idCpu); NOREF(enmGuestOS); NOREF(pszModuleName); NOREF(pszVersion);
4831 NOREF(GCPtrModBase); NOREF(cbModule); NOREF(cRegions); NOREF(paRegions);
4832 return VERR_NOT_IMPLEMENTED;
4833#endif
4834}
4835
4836
4837/**
4838 * VMMR0 request wrapper for GMMR0RegisterSharedModule.
4839 *
4840 * @returns see GMMR0RegisterSharedModule.
4841 * @param pGVM The global (ring-0) VM structure.
4842 * @param idCpu The VCPU id.
4843 * @param pReq Pointer to the request packet.
4844 */
4845GMMR0DECL(int) GMMR0RegisterSharedModuleReq(PGVM pGVM, VMCPUID idCpu, PGMMREGISTERSHAREDMODULEREQ pReq)
4846{
4847 /*
4848 * Validate input and pass it on.
4849 */
4850 AssertPtrReturn(pReq, VERR_INVALID_POINTER);
4851 AssertMsgReturn( pReq->Hdr.cbReq >= sizeof(*pReq)
4852 && pReq->Hdr.cbReq == RT_UOFFSETOF_DYN(GMMREGISTERSHAREDMODULEREQ, aRegions[pReq->cRegions]),
4853 ("%#x != %#x\n", pReq->Hdr.cbReq, sizeof(*pReq)), VERR_INVALID_PARAMETER);
4854
4855 /* Pass back return code in the request packet to preserve informational codes. (VMMR3CallR0 chokes on them) */
4856 pReq->rc = GMMR0RegisterSharedModule(pGVM, idCpu, pReq->enmGuestOS, pReq->szName, pReq->szVersion,
4857 pReq->GCBaseAddr, pReq->cbModule, pReq->cRegions, pReq->aRegions);
4858 return VINF_SUCCESS;
4859}
4860
4861
4862/**
4863 * Unregisters a shared module for the VM
4864 *
4865 * @returns VBox status code.
4866 * @param pGVM The global (ring-0) VM structure.
4867 * @param idCpu The VCPU id.
4868 * @param pszModuleName The module name.
4869 * @param pszVersion The module version.
4870 * @param GCPtrModBase The module base address.
4871 * @param cbModule The module size.
4872 */
4873GMMR0DECL(int) GMMR0UnregisterSharedModule(PGVM pGVM, VMCPUID idCpu, char *pszModuleName, char *pszVersion,
4874 RTGCPTR GCPtrModBase, uint32_t cbModule)
4875{
4876#ifdef VBOX_WITH_PAGE_SHARING
4877 /*
4878 * Validate input and get the basics.
4879 */
4880 PGMM pGMM;
4881 GMM_GET_VALID_INSTANCE(pGMM, VERR_GMM_INSTANCE);
4882 int rc = GVMMR0ValidateGVMandEMT(pGVM, idCpu);
4883 if (RT_FAILURE(rc))
4884 return rc;
4885
4886 AssertPtrReturn(pszModuleName, VERR_INVALID_POINTER);
4887 AssertPtrReturn(pszVersion, VERR_INVALID_POINTER);
4888 if (RT_UNLIKELY(!memchr(pszModuleName, '\0', GMM_SHARED_MODULE_MAX_NAME_STRING)))
4889 return VERR_GMM_MODULE_NAME_TOO_LONG;
4890 if (RT_UNLIKELY(!memchr(pszVersion, '\0', GMM_SHARED_MODULE_MAX_VERSION_STRING)))
4891 return VERR_GMM_MODULE_NAME_TOO_LONG;
4892
4893 Log(("GMMR0UnregisterSharedModule %s %s base=%RGv size %x\n", pszModuleName, pszVersion, GCPtrModBase, cbModule));
4894
4895 /*
4896 * Take the semaphore and do some more validations.
4897 */
4898 gmmR0MutexAcquire(pGMM);
4899 if (GMM_CHECK_SANITY_UPON_ENTERING(pGMM))
4900 {
4901 /*
4902 * Locate and remove the specified module.
4903 */
4904 PGMMSHAREDMODULEPERVM pRecVM = (PGMMSHAREDMODULEPERVM)RTAvlGCPtrGet(&pGVM->gmm.s.pSharedModuleTree, GCPtrModBase);
4905 if (pRecVM)
4906 {
4907 /** @todo Do we need to do more validations here, like that the
4908 * name + version + cbModule matches? */
4909 NOREF(cbModule);
4910 Assert(pRecVM->pGlobalModule);
4911 gmmR0ShModDeletePerVM(pGMM, pGVM, pRecVM, true /*fRemove*/);
4912 }
4913 else
4914 rc = VERR_GMM_SHARED_MODULE_NOT_FOUND;
4915
4916 GMM_CHECK_SANITY_UPON_LEAVING(pGMM);
4917 }
4918 else
4919 rc = VERR_GMM_IS_NOT_SANE;
4920
4921 gmmR0MutexRelease(pGMM);
4922 return rc;
4923#else
4924
4925 NOREF(pGVM); NOREF(idCpu); NOREF(pszModuleName); NOREF(pszVersion); NOREF(GCPtrModBase); NOREF(cbModule);
4926 return VERR_NOT_IMPLEMENTED;
4927#endif
4928}
4929
4930
4931/**
4932 * VMMR0 request wrapper for GMMR0UnregisterSharedModule.
4933 *
4934 * @returns see GMMR0UnregisterSharedModule.
4935 * @param pGVM The global (ring-0) VM structure.
4936 * @param idCpu The VCPU id.
4937 * @param pReq Pointer to the request packet.
4938 */
4939GMMR0DECL(int) GMMR0UnregisterSharedModuleReq(PGVM pGVM, VMCPUID idCpu, PGMMUNREGISTERSHAREDMODULEREQ pReq)
4940{
4941 /*
4942 * Validate input and pass it on.
4943 */
4944 AssertPtrReturn(pReq, VERR_INVALID_POINTER);
4945 AssertMsgReturn(pReq->Hdr.cbReq == sizeof(*pReq), ("%#x != %#x\n", pReq->Hdr.cbReq, sizeof(*pReq)), VERR_INVALID_PARAMETER);
4946
4947 return GMMR0UnregisterSharedModule(pGVM, idCpu, pReq->szName, pReq->szVersion, pReq->GCBaseAddr, pReq->cbModule);
4948}
4949
4950#ifdef VBOX_WITH_PAGE_SHARING
4951
4952/**
4953 * Increase the use count of a shared page, the page is known to exist and be valid and such.
4954 *
4955 * @param pGMM Pointer to the GMM instance.
4956 * @param pGVM Pointer to the GVM instance.
4957 * @param pPage The page structure.
4958 */
4959DECLINLINE(void) gmmR0UseSharedPage(PGMM pGMM, PGVM pGVM, PGMMPAGE pPage)
4960{
4961 Assert(pGMM->cSharedPages > 0);
4962 Assert(pGMM->cAllocatedPages > 0);
4963
4964 pGMM->cDuplicatePages++;
4965
4966 pPage->Shared.cRefs++;
4967 pGVM->gmm.s.Stats.cSharedPages++;
4968 pGVM->gmm.s.Stats.Allocated.cBasePages++;
4969}
4970
4971
4972/**
4973 * Converts a private page to a shared page, the page is known to exist and be valid and such.
4974 *
4975 * @param pGMM Pointer to the GMM instance.
4976 * @param pGVM Pointer to the GVM instance.
4977 * @param HCPhys Host physical address
4978 * @param idPage The Page ID
4979 * @param pPage The page structure.
4980 * @param pPageDesc Shared page descriptor
4981 */
4982DECLINLINE(void) gmmR0ConvertToSharedPage(PGMM pGMM, PGVM pGVM, RTHCPHYS HCPhys, uint32_t idPage, PGMMPAGE pPage,
4983 PGMMSHAREDPAGEDESC pPageDesc)
4984{
4985 PGMMCHUNK pChunk = gmmR0GetChunk(pGMM, idPage >> GMM_CHUNKID_SHIFT);
4986 Assert(pChunk);
4987 Assert(pChunk->cFree < GMM_CHUNK_NUM_PAGES);
4988 Assert(GMM_PAGE_IS_PRIVATE(pPage));
4989
4990 pChunk->cPrivate--;
4991 pChunk->cShared++;
4992
4993 pGMM->cSharedPages++;
4994
4995 pGVM->gmm.s.Stats.cSharedPages++;
4996 pGVM->gmm.s.Stats.cPrivatePages--;
4997
4998 /* Modify the page structure. */
4999 pPage->Shared.pfn = (uint32_t)(uint64_t)(HCPhys >> PAGE_SHIFT);
5000 pPage->Shared.cRefs = 1;
5001#ifdef VBOX_STRICT
5002 pPageDesc->u32StrictChecksum = gmmR0StrictPageChecksum(pGMM, pGVM, idPage);
5003 pPage->Shared.u14Checksum = pPageDesc->u32StrictChecksum;
5004#else
5005 NOREF(pPageDesc);
5006 pPage->Shared.u14Checksum = 0;
5007#endif
5008 pPage->Shared.u2State = GMM_PAGE_STATE_SHARED;
5009}
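
/*
 * Illustrative sketch, not part of the original file: the page-ID decomposition
 * used by gmmR0ConvertToSharedPage and by the chunk-mapping code further down,
 * pulled out into two hypothetical helpers.  GMM_CHUNKID_SHIFT and
 * GMM_PAGEID_IDX_MASK are the constants the surrounding code already uses.
 */
DECLINLINE(uint32_t) gmmSketchPageIdToChunkId(uint32_t idPage)
{
    return idPage >> GMM_CHUNKID_SHIFT;     /* the chunk the page belongs to */
}

DECLINLINE(uint32_t) gmmSketchPageIdToPageIndex(uint32_t idPage)
{
    return idPage & GMM_PAGEID_IDX_MASK;    /* the page's index within that chunk */
}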
5010
5011
5012static int gmmR0SharedModuleCheckPageFirstTime(PGMM pGMM, PGVM pGVM, PGMMSHAREDMODULE pModule,
5013 unsigned idxRegion, unsigned idxPage,
5014 PGMMSHAREDPAGEDESC pPageDesc, PGMMSHAREDREGIONDESC pGlobalRegion)
5015{
5016 NOREF(pModule);
5017
5018 /* Easy case: just change the internal page type. */
5019 PGMMPAGE pPage = gmmR0GetPage(pGMM, pPageDesc->idPage);
5020 AssertMsgReturn(pPage, ("idPage=%#x (GCPhys=%RGp HCPhys=%RHp idxRegion=%#x idxPage=%#x) #1\n",
5021 pPageDesc->idPage, pPageDesc->GCPhys, pPageDesc->HCPhys, idxRegion, idxPage),
5022 VERR_PGM_PHYS_INVALID_PAGE_ID);
5023 NOREF(idxRegion);
5024
5025 AssertMsg(pPageDesc->GCPhys == (pPage->Private.pfn << 12), ("desc %RGp gmm %RGp\n", pPageDesc->GCPhys, (pPage->Private.pfn << 12)));
5026
5027 gmmR0ConvertToSharedPage(pGMM, pGVM, pPageDesc->HCPhys, pPageDesc->idPage, pPage, pPageDesc);
5028
5029 /* Keep track of these references. */
5030 pGlobalRegion->paidPages[idxPage] = pPageDesc->idPage;
5031
5032 return VINF_SUCCESS;
5033}
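
/*
 * Illustrative sketch, not part of the original file: the pfn <-> host physical
 * address conversions that gmmR0ConvertToSharedPage and the assertion above
 * rely on.  Helper names are hypothetical; the PAGE_SHIFT arithmetic matches
 * the surrounding code.
 */
DECLINLINE(uint32_t) gmmSketchHCPhysToPfn(RTHCPHYS HCPhys)
{
    return (uint32_t)(HCPhys >> PAGE_SHIFT);    /* frame number as stored in GMMPAGE */
}

DECLINLINE(RTHCPHYS) gmmSketchPfnToHCPhys(uint32_t pfn)
{
    return (RTHCPHYS)pfn << PAGE_SHIFT;         /* back to a host physical address */
}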
5034
5035/**
5036 * Checks the specified shared module range for changes.
5037 *
5038 * Performs the following tasks:
5039 * - If a shared page is new, then it changes the GMM page type to shared and
5040 * returns it in the pPageDesc descriptor.
5041 * - If a shared page already exists, then it checks if the VM page is
5042 * identical and if so frees the VM page and returns the shared page in
5043 * pPageDesc descriptor.
5044 *
5045 * @remarks ASSUMES the caller has acquired the GMM semaphore!!
5046 *
5047 * @returns VBox status code.
5048 * @param pGVM Pointer to the GVM instance data.
5049 * @param pModule Module description.
5050 * @param idxRegion Region index.
5051 * @param idxPage Page index.
5052 * @param pPageDesc Page descriptor.
5053 */
5054GMMR0DECL(int) GMMR0SharedModuleCheckPage(PGVM pGVM, PGMMSHAREDMODULE pModule, uint32_t idxRegion, uint32_t idxPage,
5055 PGMMSHAREDPAGEDESC pPageDesc)
5056{
5057 int rc;
5058 PGMM pGMM;
5059 GMM_GET_VALID_INSTANCE(pGMM, VERR_GMM_INSTANCE);
5060 pPageDesc->u32StrictChecksum = 0;
5061
5062 AssertMsgReturn(idxRegion < pModule->cRegions,
5063 ("idxRegion=%#x cRegions=%#x %s %s\n", idxRegion, pModule->cRegions, pModule->szName, pModule->szVersion),
5064 VERR_INVALID_PARAMETER);
5065
5066 uint32_t const cPages = pModule->aRegions[idxRegion].cb >> PAGE_SHIFT;
5067 AssertMsgReturn(idxPage < cPages,
5068 ("idxPage=%#x cPages=%#x %s %s\n", idxPage, cPages, pModule->szName, pModule->szVersion),
5069 VERR_INVALID_PARAMETER);
5070
5071 LogFlow(("GMMR0SharedModuleCheckPage %s base %RGv region %d idxPage %d\n", pModule->szName, pModule->Core.Key, idxRegion, idxPage));
5072
5073 /*
5074 * First time; create a page descriptor array.
5075 */
5076 PGMMSHAREDREGIONDESC pGlobalRegion = &pModule->aRegions[idxRegion];
5077 if (!pGlobalRegion->paidPages)
5078 {
5079 Log(("Allocate page descriptor array for %d pages\n", cPages));
5080 pGlobalRegion->paidPages = (uint32_t *)RTMemAlloc(cPages * sizeof(pGlobalRegion->paidPages[0]));
5081 AssertReturn(pGlobalRegion->paidPages, VERR_NO_MEMORY);
5082
5083 /* Invalidate all descriptors. */
5084 uint32_t i = cPages;
5085 while (i-- > 0)
5086 pGlobalRegion->paidPages[i] = NIL_GMM_PAGEID;
5087 }
5088
5089 /*
5090 * Is this the first time we've seen this shared page?
5091 */
5092 if (pGlobalRegion->paidPages[idxPage] == NIL_GMM_PAGEID)
5093 {
5094 Log(("New shared page guest %RGp host %RHp\n", pPageDesc->GCPhys, pPageDesc->HCPhys));
5095 return gmmR0SharedModuleCheckPageFirstTime(pGMM, pGVM, pModule, idxRegion, idxPage, pPageDesc, pGlobalRegion);
5096 }
5097
5098 /*
5099 * We've seen it before...
5100 */
5101 Log(("Replace existing page guest %RGp host %RHp id %#x -> id %#x\n",
5102 pPageDesc->GCPhys, pPageDesc->HCPhys, pPageDesc->idPage, pGlobalRegion->paidPages[idxPage]));
5103 Assert(pPageDesc->idPage != pGlobalRegion->paidPages[idxPage]);
5104
5105 /*
5106 * Get the shared page source.
5107 */
5108 PGMMPAGE pPage = gmmR0GetPage(pGMM, pGlobalRegion->paidPages[idxPage]);
5109 AssertMsgReturn(pPage, ("idPage=%#x (idxRegion=%#x idxPage=%#x) #2\n", pGlobalRegion->paidPages[idxPage], idxRegion, idxPage),
5110 VERR_PGM_PHYS_INVALID_PAGE_ID);
5111
5112 if (pPage->Common.u2State != GMM_PAGE_STATE_SHARED)
5113 {
5114 /*
5115 * Page was freed at some point; invalidate this entry.
5116 */
5117 /** @todo this isn't really bulletproof. */
5118 Log(("Old shared page was freed -> create a new one\n"));
5119 pGlobalRegion->paidPages[idxPage] = NIL_GMM_PAGEID;
5120 return gmmR0SharedModuleCheckPageFirstTime(pGMM, pGVM, pModule, idxRegion, idxPage, pPageDesc, pGlobalRegion);
5121 }
5122
5123 Log(("Replace existing page: host %RHp -> %RHp\n", pPageDesc->HCPhys, ((uint64_t)pPage->Shared.pfn) << PAGE_SHIFT));
5124
5125 /*
5126 * Calculate the virtual address of the local page.
5127 */
5128 PGMMCHUNK pChunk = gmmR0GetChunk(pGMM, pPageDesc->idPage >> GMM_CHUNKID_SHIFT);
5129 AssertMsgReturn(pChunk, ("idPage=%#x (idxRegion=%#x idxPage=%#x) #4\n", pPageDesc->idPage, idxRegion, idxPage),
5130 VERR_PGM_PHYS_INVALID_PAGE_ID);
5131
5132 uint8_t *pbChunk;
5133 AssertMsgReturn(gmmR0IsChunkMapped(pGMM, pGVM, pChunk, (PRTR3PTR)&pbChunk),
5134 ("idPage=%#x (idxRegion=%#x idxPage=%#x) #3\n", pPageDesc->idPage, idxRegion, idxPage),
5135 VERR_PGM_PHYS_INVALID_PAGE_ID);
5136 uint8_t *pbLocalPage = pbChunk + ((pPageDesc->idPage & GMM_PAGEID_IDX_MASK) << PAGE_SHIFT);
5137
5138 /*
5139 * Calculate the virtual address of the shared page.
5140 */
5141 pChunk = gmmR0GetChunk(pGMM, pGlobalRegion->paidPages[idxPage] >> GMM_CHUNKID_SHIFT);
5142 Assert(pChunk); /* can't fail as gmmR0GetPage succeeded. */
5143
5144 /*
5145 * Get the virtual address of the physical page; map the chunk into the VM
5146 * process if not already done.
5147 */
5148 if (!gmmR0IsChunkMapped(pGMM, pGVM, pChunk, (PRTR3PTR)&pbChunk))
5149 {
5150 Log(("Map chunk into process!\n"));
5151 rc = gmmR0MapChunk(pGMM, pGVM, pChunk, false /*fRelaxedSem*/, (PRTR3PTR)&pbChunk);
5152 AssertRCReturn(rc, rc);
5153 }
5154 uint8_t *pbSharedPage = pbChunk + ((pGlobalRegion->paidPages[idxPage] & GMM_PAGEID_IDX_MASK) << PAGE_SHIFT);
5155
5156#ifdef VBOX_STRICT
5157 pPageDesc->u32StrictChecksum = RTCrc32(pbSharedPage, PAGE_SIZE);
5158 uint32_t uChecksum = pPageDesc->u32StrictChecksum & UINT32_C(0x00003fff);
5159 AssertMsg(!uChecksum || uChecksum == pPage->Shared.u14Checksum || !pPage->Shared.u14Checksum,
5160 ("%#x vs %#x - idPage=%#x - %s %s\n", uChecksum, pPage->Shared.u14Checksum,
5161 pGlobalRegion->paidPages[idxPage], pModule->szName, pModule->szVersion));
5162#endif
5163
5164 /** @todo write ASMMemComparePage. */
5165 if (memcmp(pbSharedPage, pbLocalPage, PAGE_SIZE))
5166 {
5167 Log(("Unexpected differences found between local and shared page; skip\n"));
5168 /* Signal to the caller that this one hasn't changed. */
5169 pPageDesc->idPage = NIL_GMM_PAGEID;
5170 return VINF_SUCCESS;
5171 }
5172
5173 /*
5174 * Free the old local page.
5175 */
5176 GMMFREEPAGEDESC PageDesc;
5177 PageDesc.idPage = pPageDesc->idPage;
5178 rc = gmmR0FreePages(pGMM, pGVM, 1, &PageDesc, GMMACCOUNT_BASE);
5179 AssertRCReturn(rc, rc);
5180
5181 gmmR0UseSharedPage(pGMM, pGVM, pPage);
5182
5183 /*
5184 * Pass along the new physical address & page id.
5185 */
5186 pPageDesc->HCPhys = ((uint64_t)pPage->Shared.pfn) << PAGE_SHIFT;
5187 pPageDesc->idPage = pGlobalRegion->paidPages[idxPage];
5188
5189 return VINF_SUCCESS;
5190}
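
/*
 * Illustrative sketch, not part of the original file: the shape of a per-region
 * caller of GMMR0SharedModuleCheckPage above.  How the GCPhys/HCPhys/idPage of
 * each guest page is obtained is deliberately left to a hypothetical pfnFetchDesc
 * callback (in VirtualBox that information comes from PGM), and per the @remarks
 * above the GMM mutex must already be held by the caller.
 */
static int gmmSketchCheckModuleRegion(PGVM pGVM, PGMMSHAREDMODULE pModule, uint32_t idxRegion,
                                      int (*pfnFetchDesc)(void *pvUser, uint32_t idxPage, PGMMSHAREDPAGEDESC pPageDesc),
                                      void *pvUser)
{
    uint32_t const cPages = pModule->aRegions[idxRegion].cb >> PAGE_SHIFT;
    for (uint32_t idxPage = 0; idxPage < cPages; idxPage++)
    {
        GMMSHAREDPAGEDESC PageDesc;
        int rc = pfnFetchDesc(pvUser, idxPage, &PageDesc);  /* fills GCPhys, HCPhys and idPage */
        if (RT_FAILURE(rc))
            return rc;

        rc = GMMR0SharedModuleCheckPage(pGVM, pModule, idxRegion, idxPage, &PageDesc);
        if (RT_FAILURE(rc))
            return rc;

        if (PageDesc.idPage == NIL_GMM_PAGEID)
            continue;   /* The page differs from the shared copy and stays private. */

        /* Otherwise PageDesc.idPage/HCPhys now describe the shared page and the
           caller would remap its guest page to it. */
    }
    return VINF_SUCCESS;
}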
5191
5192
5193/**
5194 * RTAvlGCPtrDestroy callback.
5195 *
5196 * @returns VINF_SUCCESS.
5197 * @param pNode The node to destroy.
5198 * @param pvArgs Pointer to an argument packet.
5199 */
5200static DECLCALLBACK(int) gmmR0CleanupSharedModule(PAVLGCPTRNODECORE pNode, void *pvArgs)
5201{
5202 gmmR0ShModDeletePerVM(((GMMR0SHMODPERVMDTORARGS *)pvArgs)->pGMM,
5203 ((GMMR0SHMODPERVMDTORARGS *)pvArgs)->pGVM,
5204 (PGMMSHAREDMODULEPERVM)pNode,
5205 false /*fRemove*/);
5206 return VINF_SUCCESS;
5207}
5208
5209
5210/**
5211 * Used by GMMR0CleanupVM to clean up shared modules.
5212 *
5213 * This is called by GMMR0CleanupVM without the GMM lock held, so that the
5214 * lock can be acquired and yielded as needed here.
5215 *
5216 * @param pGMM The GMM handle.
5217 * @param pGVM The global VM handle.
5218 */
5219static void gmmR0SharedModuleCleanup(PGMM pGMM, PGVM pGVM)
5220{
5221 gmmR0MutexAcquire(pGMM);
5222 GMM_CHECK_SANITY_UPON_ENTERING(pGMM);
5223
5224 GMMR0SHMODPERVMDTORARGS Args;
5225 Args.pGVM = pGVM;
5226 Args.pGMM = pGMM;
5227 RTAvlGCPtrDestroy(&pGVM->gmm.s.pSharedModuleTree, gmmR0CleanupSharedModule, &Args);
5228
5229 AssertMsg(pGVM->gmm.s.Stats.cShareableModules == 0, ("%d\n", pGVM->gmm.s.Stats.cShareableModules));
5230 pGVM->gmm.s.Stats.cShareableModules = 0;
5231
5232 gmmR0MutexRelease(pGMM);
5233}
5234
5235#endif /* VBOX_WITH_PAGE_SHARING */
5236
5237/**
5238 * Removes all shared modules for the specified VM.
5239 *
5240 * @returns VBox status code.
5241 * @param pGVM The global (ring-0) VM structure.
5242 * @param idCpu The VCPU id.
5243 */
5244GMMR0DECL(int) GMMR0ResetSharedModules(PGVM pGVM, VMCPUID idCpu)
5245{
5246#ifdef VBOX_WITH_PAGE_SHARING
5247 /*
5248 * Validate input and get the basics.
5249 */
5250 PGMM pGMM;
5251 GMM_GET_VALID_INSTANCE(pGMM, VERR_GMM_INSTANCE);
5252 int rc = GVMMR0ValidateGVMandEMT(pGVM, idCpu);
5253 if (RT_FAILURE(rc))
5254 return rc;
5255
5256 /*
5257 * Take the semaphore and do some more validations.
5258 */
5259 gmmR0MutexAcquire(pGMM);
5260 if (GMM_CHECK_SANITY_UPON_ENTERING(pGMM))
5261 {
5262 Log(("GMMR0ResetSharedModules\n"));
5263 GMMR0SHMODPERVMDTORARGS Args;
5264 Args.pGVM = pGVM;
5265 Args.pGMM = pGMM;
5266 RTAvlGCPtrDestroy(&pGVM->gmm.s.pSharedModuleTree, gmmR0CleanupSharedModule, &Args);
5267 pGVM->gmm.s.Stats.cShareableModules = 0;
5268
5269 rc = VINF_SUCCESS;
5270 GMM_CHECK_SANITY_UPON_LEAVING(pGMM);
5271 }
5272 else
5273 rc = VERR_GMM_IS_NOT_SANE;
5274
5275 gmmR0MutexRelease(pGMM);
5276 return rc;
5277#else
5278 RT_NOREF(pGVM, idCpu);
5279 return VERR_NOT_IMPLEMENTED;
5280#endif
5281}
5282
5283#ifdef VBOX_WITH_PAGE_SHARING
5284
5285/**
5286 * Tree enumeration callback for checking a shared module.
5287 */
5288static DECLCALLBACK(int) gmmR0CheckSharedModule(PAVLGCPTRNODECORE pNode, void *pvUser)
5289{
5290 GMMCHECKSHAREDMODULEINFO *pArgs = (GMMCHECKSHAREDMODULEINFO*)pvUser;
5291 PGMMSHAREDMODULEPERVM pRecVM = (PGMMSHAREDMODULEPERVM)pNode;
5292 PGMMSHAREDMODULE pGblMod = pRecVM->pGlobalModule;
5293
5294 Log(("gmmR0CheckSharedModule: check %s %s base=%RGv size=%x\n",
5295 pGblMod->szName, pGblMod->szVersion, pGblMod->Core.Key, pGblMod->cbModule));
5296
5297 int rc = PGMR0SharedModuleCheck(pArgs->pGVM, pArgs->pGVM, pArgs->idCpu, pGblMod, pRecVM->aRegionsGCPtrs);
5298 if (RT_FAILURE(rc))
5299 return rc;
5300 return VINF_SUCCESS;
5301}
5302
5303#endif /* VBOX_WITH_PAGE_SHARING */
5304
5305/**
5306 * Checks all shared modules for the specified VM.
5307 *
5308 * @returns VBox status code.
5309 * @param pGVM The global (ring-0) VM structure.
5310 * @param idCpu The calling EMT number.
5311 * @thread EMT(idCpu)
5312 */
5313GMMR0DECL(int) GMMR0CheckSharedModules(PGVM pGVM, VMCPUID idCpu)
5314{
5315#ifdef VBOX_WITH_PAGE_SHARING
5316 /*
5317 * Validate input and get the basics.
5318 */
5319 PGMM pGMM;
5320 GMM_GET_VALID_INSTANCE(pGMM, VERR_GMM_INSTANCE);
5321 int rc = GVMMR0ValidateGVMandEMT(pGVM, idCpu);
5322 if (RT_FAILURE(rc))
5323 return rc;
5324
5325# ifndef DEBUG_sandervl
5326 /*
5327 * Take the semaphore and do some more validations.
5328 */
5329 gmmR0MutexAcquire(pGMM);
5330# endif
5331 if (GMM_CHECK_SANITY_UPON_ENTERING(pGMM))
5332 {
5333 /*
5334 * Walk the tree, checking each module.
5335 */
5336 Log(("GMMR0CheckSharedModules\n"));
5337
5338 GMMCHECKSHAREDMODULEINFO Args;
5339 Args.pGVM = pGVM;
5340 Args.idCpu = idCpu;
5341 rc = RTAvlGCPtrDoWithAll(&pGVM->gmm.s.pSharedModuleTree, true /* fFromLeft */, gmmR0CheckSharedModule, &Args);
5342
5343 Log(("GMMR0CheckSharedModules done (rc=%Rrc)!\n", rc));
5344 GMM_CHECK_SANITY_UPON_LEAVING(pGMM);
5345 }
5346 else
5347 rc = VERR_GMM_IS_NOT_SANE;
5348
5349# ifndef DEBUG_sandervl
5350 gmmR0MutexRelease(pGMM);
5351# endif
5352 return rc;
5353#else
5354 RT_NOREF(pGVM, idCpu);
5355 return VERR_NOT_IMPLEMENTED;
5356#endif
5357}
5358
5359#ifdef VBOX_STRICT
5360
5361/**
5362 * Worker for GMMR0FindDuplicatePageReq.
5363 *
5364 * @returns true if duplicate, false if not.
5365 */
5366static bool gmmR0FindDupPageInChunk(PGMM pGMM, PGVM pGVM, PGMMCHUNK pChunk, uint8_t const *pbSourcePage)
5367{
5368 bool fFoundDuplicate = false;
5369 /* Only take chunks that aren't already mapped into this VM process, i.e. (mostly) other VMs' chunks; not entirely correct. */
5370 uint8_t *pbChunk;
5371 if (!gmmR0IsChunkMapped(pGMM, pGVM, pChunk, (PRTR3PTR)&pbChunk))
5372 {
5373 int rc = gmmR0MapChunk(pGMM, pGVM, pChunk, false /*fRelaxedSem*/, (PRTR3PTR)&pbChunk);
5374 if (RT_SUCCESS(rc))
5375 {
5376 /*
5377 * Look for duplicate pages
5378 */
5379 uintptr_t iPage = (GMM_CHUNK_SIZE >> PAGE_SHIFT);
5380 while (iPage-- > 0)
5381 {
5382 if (GMM_PAGE_IS_PRIVATE(&pChunk->aPages[iPage]))
5383 {
5384 uint8_t *pbDestPage = pbChunk + (iPage << PAGE_SHIFT);
5385 if (!memcmp(pbSourcePage, pbDestPage, PAGE_SIZE))
5386 {
5387 fFoundDuplicate = true;
5388 break;
5389 }
5390 }
5391 }
5392 gmmR0UnmapChunk(pGMM, pGVM, pChunk, false /*fRelaxedSem*/);
5393 }
5394 }
5395 return fFoundDuplicate;
5396}
5397
5398
5399/**
5400 * Finds a duplicate of the specified page in other active VMs.
5401 *
5402 * @returns VBox status code.
5403 * @param pGVM The global (ring-0) VM structure.
5404 * @param pReq Pointer to the request packet.
5405 */
5406GMMR0DECL(int) GMMR0FindDuplicatePageReq(PGVM pGVM, PGMMFINDDUPLICATEPAGEREQ pReq)
5407{
5408 /*
5409 * Validate input and pass it on.
5410 */
5411 AssertPtrReturn(pReq, VERR_INVALID_POINTER);
5412 AssertMsgReturn(pReq->Hdr.cbReq == sizeof(*pReq), ("%#x != %#x\n", pReq->Hdr.cbReq, sizeof(*pReq)), VERR_INVALID_PARAMETER);
5413
5414 PGMM pGMM;
5415 GMM_GET_VALID_INSTANCE(pGMM, VERR_GMM_INSTANCE);
5416
5417 int rc = GVMMR0ValidateGVM(pGVM);
5418 if (RT_FAILURE(rc))
5419 return rc;
5420
5421 /*
5422 * Take the semaphore and do some more validations.
5423 */
5424 rc = gmmR0MutexAcquire(pGMM);
5425 if (GMM_CHECK_SANITY_UPON_ENTERING(pGMM))
5426 {
5427 uint8_t *pbChunk;
5428 PGMMCHUNK pChunk = gmmR0GetChunk(pGMM, pReq->idPage >> GMM_CHUNKID_SHIFT);
5429 if (pChunk)
5430 {
5431 if (gmmR0IsChunkMapped(pGMM, pGVM, pChunk, (PRTR3PTR)&pbChunk))
5432 {
5433 uint8_t *pbSourcePage = pbChunk + ((pReq->idPage & GMM_PAGEID_IDX_MASK) << PAGE_SHIFT);
5434 PGMMPAGE pPage = gmmR0GetPage(pGMM, pReq->idPage);
5435 if (pPage)
5436 {
5437 /*
5438 * Walk the chunks
5439 */
5440 pReq->fDuplicate = false;
5441 RTListForEach(&pGMM->ChunkList, pChunk, GMMCHUNK, ListNode)
5442 {
5443 if (gmmR0FindDupPageInChunk(pGMM, pGVM, pChunk, pbSourcePage))
5444 {
5445 pReq->fDuplicate = true;
5446 break;
5447 }
5448 }
5449 }
5450 else
5451 {
5452 AssertFailed();
5453 rc = VERR_PGM_PHYS_INVALID_PAGE_ID;
5454 }
5455 }
5456 else
5457 AssertFailed();
5458 }
5459 else
5460 AssertFailed();
5461 }
5462 else
5463 rc = VERR_GMM_IS_NOT_SANE;
5464
5465 gmmR0MutexRelease(pGMM);
5466 return rc;
5467}
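
/*
 * Illustrative sketch, not part of the original file: preparing the request
 * packet consumed by GMMR0FindDuplicatePageReq above.  idPage and fDuplicate
 * are the members the function actually reads and writes; the header magic
 * convention and the helper name are assumptions.
 */
static void gmmSketchInitFindDuplicatePageReq(PGMMFINDDUPLICATEPAGEREQ pReq, uint32_t idPage)
{
    RT_ZERO(*pReq);
    pReq->Hdr.u32Magic = SUPVMMR0REQHDR_MAGIC;  /* assumed request header convention */
    pReq->Hdr.cbReq    = sizeof(*pReq);         /* checked by the function above */
    pReq->idPage       = idPage;                /* the page to search for */
    pReq->fDuplicate   = false;                 /* set to true when a duplicate is found */
}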
5468
5469#endif /* VBOX_STRICT */
5470
5471
5472/**
5473 * Retrieves the GMM statistics visible to the caller.
5474 *
5475 * @returns VBox status code.
5476 *
5477 * @param pStats Where to put the statistics.
5478 * @param pSession The current session.
5479 * @param pGVM The GVM to obtain statistics for. Optional.
5480 */
5481GMMR0DECL(int) GMMR0QueryStatistics(PGMMSTATS pStats, PSUPDRVSESSION pSession, PGVM pGVM)
5482{
5483 LogFlow(("GMMR0QueryStatistics: pStats=%p pSession=%p pGVM=%p\n", pStats, pSession, pGVM));
5484
5485 /*
5486 * Validate input.
5487 */
5488 AssertPtrReturn(pSession, VERR_INVALID_POINTER);
5489 AssertPtrReturn(pStats, VERR_INVALID_POINTER);
5490 pStats->cMaxPages = 0; /* (crash before taking the mutex...) */
5491
5492 PGMM pGMM;
5493 GMM_GET_VALID_INSTANCE(pGMM, VERR_GMM_INSTANCE);
5494
5495 /*
5496 * Validate the VM handle, if not NULL, and lock the GMM.
5497 */
5498 int rc;
5499 if (pGVM)
5500 {
5501 rc = GVMMR0ValidateGVM(pGVM);
5502 if (RT_FAILURE(rc))
5503 return rc;
5504 }
5505
5506 rc = gmmR0MutexAcquire(pGMM);
5507 if (RT_FAILURE(rc))
5508 return rc;
5509
5510 /*
5511 * Copy out the GMM statistics.
5512 */
5513 pStats->cMaxPages = pGMM->cMaxPages;
5514 pStats->cReservedPages = pGMM->cReservedPages;
5515 pStats->cOverCommittedPages = pGMM->cOverCommittedPages;
5516 pStats->cAllocatedPages = pGMM->cAllocatedPages;
5517 pStats->cSharedPages = pGMM->cSharedPages;
5518 pStats->cDuplicatePages = pGMM->cDuplicatePages;
5519 pStats->cLeftBehindSharedPages = pGMM->cLeftBehindSharedPages;
5520 pStats->cBalloonedPages = pGMM->cBalloonedPages;
5521 pStats->cChunks = pGMM->cChunks;
5522 pStats->cFreedChunks = pGMM->cFreedChunks;
5523 pStats->cShareableModules = pGMM->cShareableModules;
5524 pStats->idFreeGeneration = pGMM->idFreeGeneration;
5525 RT_ZERO(pStats->au64Reserved);
5526
5527 /*
5528 * Copy out the VM statistics.
5529 */
5530 if (pGVM)
5531 pStats->VMStats = pGVM->gmm.s.Stats;
5532 else
5533 RT_ZERO(pStats->VMStats);
5534
5535 gmmR0MutexRelease(pGMM);
5536 return rc;
5537}
5538
5539
5540/**
5541 * VMMR0 request wrapper for GMMR0QueryStatistics.
5542 *
5543 * @returns see GMMR0QueryStatistics.
5544 * @param pGVM The global (ring-0) VM structure. Optional.
5545 * @param pReq Pointer to the request packet.
5546 */
5547GMMR0DECL(int) GMMR0QueryStatisticsReq(PGVM pGVM, PGMMQUERYSTATISTICSSREQ pReq)
5548{
5549 /*
5550 * Validate input and pass it on.
5551 */
5552 AssertPtrReturn(pReq, VERR_INVALID_POINTER);
5553 AssertMsgReturn(pReq->Hdr.cbReq == sizeof(*pReq), ("%#x != %#x\n", pReq->Hdr.cbReq, sizeof(*pReq)), VERR_INVALID_PARAMETER);
5554
5555 return GMMR0QueryStatistics(&pReq->Stats, pReq->pSession, pGVM);
5556}
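
/*
 * Illustrative sketch, not part of the original file: a consumer of the
 * statistics copied out by GMMR0QueryStatistics/GMMR0QueryStatisticsReq above.
 * Only fields the code above fills in are used; treating the difference of
 * cMaxPages and cAllocatedPages as rough allocation headroom is the editor's
 * interpretation, and the helper name is hypothetical.
 */
DECLINLINE(uint64_t) gmmSketchEstimateHeadroomPages(PCGMMSTATS pStats)
{
    if (pStats->cMaxPages >= pStats->cAllocatedPages)
        return pStats->cMaxPages - pStats->cAllocatedPages;
    return 0;   /* over-committed or counters raced; no headroom to report */
}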
5557
5558
5559/**
5560 * Resets the specified GMM statistics.
5561 *
5562 * @returns VBox status code.
5563 *
5564 * @param pStats Which statistics to reset; non-zero fields indicate
5565 * which ones to reset.
5566 * @param pSession The current session.
5567 * @param pGVM The GVM to reset statistics for. Optional.
5568 */
5569GMMR0DECL(int) GMMR0ResetStatistics(PCGMMSTATS pStats, PSUPDRVSESSION pSession, PGVM pGVM)
5570{
5571 NOREF(pStats); NOREF(pSession); NOREF(pGVM);
5572 /* Nothing can be reset at the moment. */
5573 return VINF_SUCCESS;
5574}
5575
5576
5577/**
5578 * VMMR0 request wrapper for GMMR0ResetStatistics.
5579 *
5580 * @returns see GMMR0ResetStatistics.
5581 * @param pGVM The global (ring-0) VM structure. Optional.
5582 * @param pReq Pointer to the request packet.
5583 */
5584GMMR0DECL(int) GMMR0ResetStatisticsReq(PGVM pGVM, PGMMRESETSTATISTICSSREQ pReq)
5585{
5586 /*
5587 * Validate input and pass it on.
5588 */
5589 AssertPtrReturn(pReq, VERR_INVALID_POINTER);
5590 AssertMsgReturn(pReq->Hdr.cbReq == sizeof(*pReq), ("%#x != %#x\n", pReq->Hdr.cbReq, sizeof(*pReq)), VERR_INVALID_PARAMETER);
5591
5592 return GMMR0ResetStatistics(&pReq->Stats, pReq->pSession, pGVM);
5593}
5594