GMMR0.cpp@ 82821

Last change on this file since 82821 was 82591, checked in by vboxsync, 5 years ago
VMM: Changing how we access guest RAM when in ring-0 (disabled). bugref:9627
Property svn:eol-style set to `native` Property svn:keywords set to `Id Revision`
File size: 191.8 KB

1	/* $Id: GMMR0.cpp 82591 2019-12-16 17:55:40Z vboxsync $ */
2	/** @file
3	* GMM - Global Memory Manager.
4	*/
5
6	/*
7	* Copyright (C) 2007-2019 Oracle Corporation
8	*
9	* This file is part of VirtualBox Open Source Edition (OSE), as
10	* available from http://www.virtualbox.org. This file is free software;
11	* you can redistribute it and/or modify it under the terms of the GNU
12	* General Public License (GPL) as published by the Free Software
13	* Foundation, in version 2 as it comes in the "COPYING" file of the
14	* VirtualBox OSE distribution. VirtualBox OSE is distributed in the
15	* hope that it will be useful, but WITHOUT ANY WARRANTY of any kind.
16	*/
17
18
19	/** @page pg_gmm GMM - The Global Memory Manager
20	*
21	* As the name indicates, this component is responsible for global memory
22	* management. Currently only guest RAM is allocated from the GMM, but this
23	* may change to include shadow page tables and other bits later.
24	*
25	* Guest RAM is managed as individual pages, but allocated from the host OS
26	* in chunks for reasons of portability / efficiency. To minimize the memory
27	* footprint all tracking structures must be as small as possible without
28	* unnecessary performance penalties.
29	*
30	* The allocation chunks have a fixed size, defined at compile time
31	* by the #GMM_CHUNK_SIZE \#define.
32	*
33	* Each chunk is given a unique ID. Each page also has a unique ID. The
34	* relationship between the two IDs is:
35	* @code
36	* GMM_CHUNK_SHIFT = log2(GMM_CHUNK_SIZE / PAGE_SIZE);
37	* idPage = (idChunk << GMM_CHUNK_SHIFT) | iPage;
38	* @endcode
39	* Where iPage is the index of the page within the chunk. This ID scheme
40	* permits efficient chunk and page lookup, but it relies on the chunk size
41	* being set at compile time. The chunks are organized in an AVL tree with their
42	* IDs being the keys.
43	*
44	* The physical address of each page in an allocation chunk is maintained by
45	* the #RTR0MEMOBJ and obtained using #RTR0MemObjGetPagePhysAddr. There is no
46	* need to duplicate this information (it would cost 8 bytes per page if we did).
47	*
48	* So what do we need to track per page? Most importantly we need to know
49	* which state the page is in:
50	* - Private - Allocated for (eventually) backing one particular VM page.
51	* - Shared - Readonly page that is used by one or more VMs and treated
52	* as COW by PGM.
53	* - Free - Not used by anyone.
54	*
55	* For the page replacement operations (sharing, defragmenting and freeing)
56	* to be somewhat efficient, private pages need to be associated with a
57	* particular page in a particular VM.
58	*
59	* Tracking the usage of shared pages is impractical and expensive, so we'll
60	* settle for a reference counting system instead.
61	*
62	* Free pages will be chained on LIFOs.
63	*
64	* On 64-bit systems we will use a 64-bit bitfield per page, while on 32-bit
65	* systems a 32-bit bitfield will have to suffice because of address space
66	* limitations. The #GMMPAGE structure shows the details.
67	*
68	*
69	* @section sec_gmm_alloc_strat Page Allocation Strategy
70	*
71	* The strategy for allocating pages has to take fragmentation and shared
72	* pages into account, or we may end up with 2000 chunks with only
73	* a few pages in each. Shared pages cannot easily be reallocated because
74	* of the inaccurate usage accounting (see above). Private pages can be
75	* reallocated by a defragmentation thread in the same manner that sharing
76	* is done.
77	*
78	* The first approach is to manage the free pages in two sets depending on
79	* whether they are mainly for the allocation of shared or private pages.
80	* In the initial implementation there will be almost no possibility for
81	* mixing shared and private pages in the same chunk (only if we're really
82	* stressed on memory), but when we implement forking of VMs and have to
83	* deal with lots of COW pages it'll start getting kind of interesting.
84	*
85	* The sets are lists of chunks with approximately the same number of
86	* free pages. Say the chunk size is 1MB, meaning 256 pages, and a set
87	* consists of 16 lists. So, the first list will contain the chunks with
88	* 1-15 free pages, the second covers 16-31, and so on. The chunks will be
89	* moved between the lists as pages are freed up or allocated.
90	*
91	*
92	* @section sec_gmm_costs Costs
93	*
94	* The per page cost in kernel space is 32-bit plus whatever RTR0MEMOBJ
95	* entails. In addition there is the chunk cost of approximately
96	* (sizeof(RTR0MEMOBJ) + sizeof(CHUNK)) / 2^CHUNK_SHIFT bytes per page.
97	*
98	* On Windows the per page #RTR0MEMOBJ cost is 32-bit on 32-bit Windows
99	* and 64-bit on 64-bit Windows (a PFN_NUMBER in the MDL). So, 64-bit per page.
100	* The cost on Linux is identical, but here it's because of sizeof(struct page *).
101	*
102	*
103	* @section sec_gmm_legacy Legacy Mode for Non-Tier-1 Platforms
104	*
105	* In legacy mode the page source is locked user pages and not
106	* #RTR0MemObjAllocPhysNC; this means that a page can only be allocated
107	* by the VM that locked it. We will make no attempt at implementing
108	* page sharing on these systems, just do enough to make it all work.
109	*
110	*
111	* @subsection sub_gmm_locking Serializing
112	*
113	* One simple fast mutex will be employed in the initial implementation, not
114	* two as mentioned in @ref sec_pgmPhys_Serializing.
115	*
116	* @see @ref sec_pgmPhys_Serializing
117	*
118	*
119	* @section sec_gmm_overcommit Memory Over-Commitment Management
120	*
121	* The GVM will have to do the system wide memory over-commitment
122	* management. My current ideas are:
123	* - Per VM oc policy that indicates how much to initially commit
124	* to it and what to do in an out-of-memory situation.
125	* - Prevent overtaxing the host.
126	*
127	* There are some challenges here, the main ones are configurability and
128	* security. Should we for instance permit anyone to request 100% memory
129	* commitment? Who should be allowed to do runtime adjustments of the
130	* config? And how to prevent these settings from being lost when the last
131	* VM process exits? The solution is probably to have an optional root
132	* daemon that will keep VMMR0.r0 in memory and enable the security measures.
133	*
134	*
135	*
136	* @section sec_gmm_numa NUMA
137	*
138	* NUMA considerations will be designed and implemented a bit later.
139	*
140	* The preliminary guess is that we will have to try to allocate memory as
141	* close as possible to the CPUs the VM is executed on (EMT and additional CPU
142	* threads). Which means it's mostly about allocation and sharing policies.
143	* Both the scheduler and allocator interface will have to supply some NUMA info
144	* and we'll need to have a way to calc access costs.
145	*
146	*/
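/*
 * A minimal sketch of the page ID scheme described in the overview above,
 * not the actual GMM code. The 4 KiB page size and the 1 MiB chunk size are
 * assumptions picked for illustration (the real values come from PAGE_SIZE
 * and GMM_CHUNK_SIZE), and all helper names are hypothetical.
 */

```cpp
#include <cassert>
#include <cstdint>

// Assumed values for illustration only; the real constants live in the GMM headers.
constexpr uint32_t kPageShift   = 12;                       // 4 KiB pages
constexpr uint32_t kChunkBits   = 20;                       // 1 MiB chunks
constexpr uint32_t kChunkShift  = kChunkBits - kPageShift;  // log2(chunk size / page size) = 8
constexpr uint32_t kPageIdxMask = (UINT32_C(1) << kChunkShift) - 1;

// Combine a chunk ID and a page index within the chunk into a page ID.
constexpr uint32_t makePageId(uint32_t idChunk, uint32_t iPage)
{
    return (idChunk << kChunkShift) | (iPage & kPageIdxMask);
}

// Recover the chunk ID (the AVL tree key) from a page ID.
constexpr uint32_t chunkIdFromPageId(uint32_t idPage)
{
    return idPage >> kChunkShift;
}

// Recover the index of the page within its chunk from a page ID.
constexpr uint32_t pageIndexFromPageId(uint32_t idPage)
{
    return idPage & kPageIdxMask;
}
```

/*
 * Because the shift is a compile-time constant, both lookups reduce to a
 * shift and a mask; this is why the scheme depends on a fixed chunk size.
 */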
147
148
149	/*********************************************************************************************************************************
150	* Header Files *
151	*********************************************************************************************************************************/
152	#define LOG_GROUP LOG_GROUP_GMM
153	#include <VBox/rawpci.h>
154	#include <VBox/vmm/gmm.h>
155	#include "GMMR0Internal.h"
156	#include <VBox/vmm/vmcc.h>
157	#include <VBox/vmm/pgm.h>
158	#include <VBox/log.h>
159	#include <VBox/param.h>
160	#include <VBox/err.h>
161	#include <VBox/VMMDev.h>
162	#include <iprt/asm.h>
163	#include <iprt/avl.h>
164	#ifdef VBOX_STRICT
165	# include <iprt/crc.h>
166	#endif
167	#include <iprt/critsect.h>
168	#include <iprt/list.h>
169	#include <iprt/mem.h>
170	#include <iprt/memobj.h>
171	#include <iprt/mp.h>
172	#include <iprt/semaphore.h>
173	#include <iprt/string.h>
174	#include <iprt/time.h>
175
176
177	/*********************************************************************************************************************************
178	* Defined Constants And Macros *
179	*********************************************************************************************************************************/
180	/** @def VBOX_USE_CRIT_SECT_FOR_GIANT
181	* Use a critical section instead of a fast mutex for the giant GMM lock.
182	*
183	* @remarks This is primarily a way of avoiding the deadlock checks in the
184	* windows driver verifier. */
185	#if defined(RT_OS_WINDOWS) || defined(DOXYGEN_RUNNING)
186	# define VBOX_USE_CRIT_SECT_FOR_GIANT
187	#endif
188
189
190	/*********************************************************************************************************************************
191	* Structures and Typedefs *
192	*********************************************************************************************************************************/
193	/** Pointer to set of free chunks. */
194	typedef struct GMMCHUNKFREESET *PGMMCHUNKFREESET;
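/*
 * A hedged sketch of how a chunk could be mapped to one of the free set lists
 * from the page allocation strategy section. It assumes the 1 MB / 256 page /
 * 16 list example from that section, so each list spans 16 free-page counts,
 * and it assumes a completely free chunk is clamped into the last list. The
 * constants and the function name are hypothetical, not the real GMM ones.
 */

```cpp
#include <cassert>

// Assumed example values taken from the overview text, not from the GMM headers.
constexpr unsigned kPagesPerChunk = 256;
constexpr unsigned kFreeSetLists  = 16;
constexpr unsigned kPagesPerList  = kPagesPerChunk / kFreeSetLists;  // 16

// Returns the list index for a chunk with cFree free pages (1..kPagesPerChunk).
// Chunks without free pages are not linked into any list, so 0 is not a valid input.
constexpr unsigned freeSetListIndex(unsigned cFree)
{
    return cFree / kPagesPerList < kFreeSetLists
         ? cFree / kPagesPerList
         : kFreeSetLists - 1;
}
```

/*
 * A chunk moves between lists only when an allocation or free changes its
 * index, so most page operations leave the list linkage untouched.
 */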
195
196	/**
197	* The per-page tracking structure employed by the GMM.
198	*
199	* On 32-bit hosts some trickery is necessary to compress all
200	* the information into 32-bits. When the fSharedFree member is set,
201	* the 30th bit decides whether it's a free page or not.
202	*
203	* Because of the different layout on 32-bit and 64-bit hosts, macros
204	* are used to get and set some of the data.
205	*/
206	typedef union GMMPAGE
207	{
208	#if HC_ARCH_BITS == 64
209	/** Unsigned integer view. */
210	uint64_t u;
211
212	/** The common view. */
213	struct GMMPAGECOMMON
214	{
215	uint32_t uStuff1 : 32;
216	uint32_t uStuff2 : 30;
217	/** The page state. */
218	uint32_t u2State : 2;
219	} Common;
220
221	/** The view of a private page. */
222	struct GMMPAGEPRIVATE
223	{
224	/** The guest page frame number. (Max addressable: 2 ^ 44 - 16) */
225	uint32_t pfn;
226	/** The GVM handle. (64K VMs) */
227	uint32_t hGVM : 16;
228	/** Reserved. */
229	uint32_t u16Reserved : 14;
230	/** The page state. */
231	uint32_t u2State : 2;
232	} Private;
233
234	/** The view of a shared page. */
235	struct GMMPAGESHARED
236	{
237	/** The host page frame number. (Max addressable: 2 ^ 44 - 16) */
238	uint32_t pfn;
239	/** The reference count (64K VMs). */
240	uint32_t cRefs : 16;
241	/** Used for debug checksumming. */
242	uint32_t u14Checksum : 14;
243	/** The page state. */
244	uint32_t u2State : 2;
245	} Shared;
246
247	/** The view of a free page. */
248	struct GMMPAGEFREE
249	{
250	/** The index of the next page in the free list. UINT16_MAX is NIL. */
251	uint16_t iNext;
252	/** Reserved. Checksum or something? */
253	uint16_t u16Reserved0;
254	/** Reserved. Checksum or something? */
255	uint32_t u30Reserved1 : 30;
256	/** The page state. */
257	uint32_t u2State : 2;
258	} Free;
259
260	#else /* 32-bit */
261	/** Unsigned integer view. */
262	uint32_t u;
263
264	/** The common view. */
265	struct GMMPAGECOMMON
266	{
267	uint32_t uStuff : 30;
268	/** The page state. */
269	uint32_t u2State : 2;
270	} Common;
271
272	/** The view of a private page. */
273	struct GMMPAGEPRIVATE
274	{
275	/** The guest page frame number. (Max addressable: 2 ^ 36) */
276	uint32_t pfn : 24;
277	/** The GVM handle. (127 VMs) */
278	uint32_t hGVM : 7;
279	/** The top page state bit, MBZ. */
280	uint32_t fZero : 1;
281	} Private;
282
283	/** The view of a shared page. */
284	struct GMMPAGESHARED
285	{
286	/** The reference count. */
287	uint32_t cRefs : 30;
288	/** The page state. */
289	uint32_t u2State : 2;
290	} Shared;
291
292	/** The view of a free page. */
293	struct GMMPAGEFREE
294	{
295	/** The index of the next page in the free list. UINT16_MAX is NIL. */
296	uint32_t iNext : 16;
297	/** Reserved. Checksum or something? */
298	uint32_t u14Reserved : 14;
299	/** The page state. */
300	uint32_t u2State : 2;
301	} Free;
302	#endif
303	} GMMPAGE;
304	AssertCompileSize(GMMPAGE, sizeof(RTHCUINTPTR));
305	/** Pointer to a GMMPAGE. */
306	typedef GMMPAGE *PGMMPAGE;
307
308
309	/** @name The Page States.
310	* @{ */
311	/** A private page. */
312	#define GMM_PAGE_STATE_PRIVATE 0
313	/** A private page - alternative value used on the 32-bit implementation.
314	* This will never be used on 64-bit hosts. */
315	#define GMM_PAGE_STATE_PRIVATE_32 1
316	/** A shared page. */
317	#define GMM_PAGE_STATE_SHARED 2
318	/** A free page. */
319	#define GMM_PAGE_STATE_FREE 3
320	/** @} */
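/*
 * A minimal sketch of the 64-bit GMMPAGE packing using explicit shifts and
 * masks instead of bit-fields. Bit-field layout is implementation-defined, so
 * the bit positions below (pfn in bits 0-31, hGVM or cRefs in bits 32-47,
 * state in bits 62-63) are an assumption that matches the declaration order
 * on common little-endian compilers. The helper names are hypothetical.
 */

```cpp
#include <cassert>
#include <cstdint>

// Values mirror GMM_PAGE_STATE_PRIVATE, GMM_PAGE_STATE_SHARED and GMM_PAGE_STATE_FREE.
constexpr uint64_t kStatePrivate = 0;
constexpr uint64_t kStateShared  = 2;
constexpr uint64_t kStateFree    = 3;

constexpr unsigned kHandleShift = 32;  // assumed position of hGVM / cRefs
constexpr unsigned kStateShift  = 62;  // assumed position of u2State

// Pack the private page view: guest PFN plus owning VM handle.
constexpr uint64_t packPrivate(uint32_t pfn, uint16_t hGVM)
{
    return (kStatePrivate << kStateShift) | (uint64_t(hGVM) << kHandleShift) | pfn;
}

// Pack the shared page view: host PFN plus reference count.
constexpr uint64_t packShared(uint32_t pfn, uint16_t cRefs)
{
    return (kStateShared << kStateShift) | (uint64_t(cRefs) << kHandleShift) | pfn;
}

// Accessors that work on the plain integer view.
constexpr unsigned pageState(uint64_t u)         { return unsigned(u >> kStateShift); }
constexpr uint32_t pagePfn(uint64_t u)           { return uint32_t(u & UINT32_C(0xffffffff)); }
constexpr uint16_t pageHandleOrRefs(uint64_t u)  { return uint16_t((u >> kHandleShift) & 0xffff); }
```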
321
322
323	/** @def GMM_PAGE_IS_PRIVATE
324	*
325	* @returns true if private, false if not.
326	* @param pPage The GMM page.
327	*/
328	#if HC_ARCH_BITS == 64
329	# define GMM_PAGE_IS_PRIVATE(pPage) ( (pPage)->Common.u2State == GMM_PAGE_STATE_PRIVATE )
330	#else
331	# define GMM_PAGE_IS_PRIVATE(pPage) ( (pPage)->Private.fZero == 0 )
332	#endif
333
334	/** @def GMM_PAGE_IS_SHARED
335	*
336	* @returns true if shared, false if not.
337	* @param pPage The GMM page.
338	*/
339	#define GMM_PAGE_IS_SHARED(pPage) ( (pPage)->Common.u2State == GMM_PAGE_STATE_SHARED )
340
341	/** @def GMM_PAGE_IS_FREE
342	*
343	* @returns true if free, false if not.
344	* @param pPage The GMM page.
345	*/
346	#define GMM_PAGE_IS_FREE(pPage) ( (pPage)->Common.u2State == GMM_PAGE_STATE_FREE )
347
348	/** @def GMM_PAGE_PFN_LAST
349	* The last valid guest pfn range.
350	* @remark Some of the values outside the range have special meaning,
351	* see GMM_PAGE_PFN_UNSHAREABLE.
352	*/
353	#if HC_ARCH_BITS == 64
354	# define GMM_PAGE_PFN_LAST UINT32_C(0xfffffff0)
355	#else
356	# define GMM_PAGE_PFN_LAST UINT32_C(0x00fffff0)
357	#endif
358	AssertCompile(GMM_PAGE_PFN_LAST == (GMM_GCPHYS_LAST >> PAGE_SHIFT));
359
360	/** @def GMM_PAGE_PFN_UNSHAREABLE
361	* Indicates that this page isn't used for normal guest memory and thus isn't shareable.
362	*/
363	#if HC_ARCH_BITS == 64
364	# define GMM_PAGE_PFN_UNSHAREABLE UINT32_C(0xfffffff1)
365	#else
366	# define GMM_PAGE_PFN_UNSHAREABLE UINT32_C(0x00fffff1)
367	#endif
368	AssertCompile(GMM_PAGE_PFN_UNSHAREABLE == (GMM_GCPHYS_UNSHAREABLE >> PAGE_SHIFT));
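/*
 * A small sketch of how the PFN limits above can be used, assuming 4 KiB
 * pages (PAGE_SHIFT == 12) and the 64-bit host values. The function names
 * are hypothetical and only illustrate the range check.
 */

```cpp
#include <cassert>
#include <cstdint>

constexpr unsigned kPageShift      = 12;                    // assumed 4 KiB pages
constexpr uint32_t kPfnLast        = UINT32_C(0xfffffff0);  // GMM_PAGE_PFN_LAST (64-bit hosts)
constexpr uint32_t kPfnUnshareable = UINT32_C(0xfffffff1);  // GMM_PAGE_PFN_UNSHAREABLE (64-bit hosts)

// Returns the PFN for a guest physical address, or -1 if the address lies
// beyond the last PFN that a private page entry can hold.
constexpr int64_t gcPhysToPfn(uint64_t GCPhys)
{
    return (GCPhys >> kPageShift) <= kPfnLast ? int64_t(GCPhys >> kPageShift) : -1;
}

// True if a stored PFN denotes ordinary guest memory rather than a special marker.
constexpr bool isNormalGuestPfn(uint32_t pfn)
{
    return pfn <= kPfnLast;
}
```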
369
370
371	/**
372	* A GMM allocation chunk ring-3 mapping record.
373	*
374	* This should really be associated with a session and not a VM, but
375	* it's simpler to associate it with a VM and clean up when the VM object
376	* is destroyed.
377	*/
378	typedef struct GMMCHUNKMAP
379	{
380	/** The mapping object. */
381	RTR0MEMOBJ hMapObj;
382	/** The VM owning the mapping. */
383	PGVM pGVM;
384	} GMMCHUNKMAP;
385	/** Pointer to a GMM allocation chunk mapping. */
386	typedef struct GMMCHUNKMAP *PGMMCHUNKMAP;
387
388
389	/**
390	* A GMM allocation chunk.
391	*/
392	typedef struct GMMCHUNK
393	{
394	/** The AVL node core.
395	* The Key is the chunk ID. (Giant mtx.) */
396	AVLU32NODECORE Core;
397	/** The memory object.
398	* Either from RTR0MemObjAllocPhysNC or RTR0MemObjLockUser depending on
399	* what the host can dish up. (Chunk mtx protects mapping accesses
400	* and related frees.) */
401	RTR0MEMOBJ hMemObj;
402	#if defined(VBOX_WITH_RAM_IN_KERNEL) && !defined(VBOX_WITH_LINEAR_HOST_PHYS_MEM)
403	/** Pointer to the kernel mapping. */
404	uint8_t *pbMapping;
405	#endif
406	/** Pointer to the next chunk in the free list. (Giant mtx.) */
407	PGMMCHUNK pFreeNext;
408	/** Pointer to the previous chunk in the free list. (Giant mtx.) */
409	PGMMCHUNK pFreePrev;
410	/** Pointer to the free set this chunk belongs to. NULL for
411	* chunks with no free pages. (Giant mtx.) */
412	PGMMCHUNKFREESET pSet;
413	/** List node in the chunk list (GMM::ChunkList). (Giant mtx.) */
414	RTLISTNODE ListNode;
415	/** Pointer to an array of mappings. (Chunk mtx.) */
416	PGMMCHUNKMAP paMappingsX;
417	/** The number of mappings. (Chunk mtx.) */
418	uint16_t cMappingsX;
419	/** The mapping lock this chunk is using. UINT8_MAX if nobody is
420	* mapping or freeing anything. (Giant mtx.) */
421	uint8_t volatile iChunkMtx;
422	/** Flags field reserved for future use (like eliminating enmType).
423	* (Giant mtx.) */
424	uint8_t fFlags;
425	/** The head of the list of free pages. UINT16_MAX is the NIL value.
426	* (Giant mtx.) */
427	uint16_t iFreeHead;
428	/** The number of free pages. (Giant mtx.) */
429	uint16_t cFree;
430	/** The GVM handle of the VM that first allocated pages from this chunk, this
431	* is used as a preference when there are several chunks to choose from.
432	* When in bound memory mode this isn't a preference any longer. (Giant
433	* mtx.) */
434	uint16_t hGVM;
435	/** The ID of the NUMA node the memory mostly resides on. (Reserved for
436	* future use.) (Giant mtx.) */
437	uint16_t idNumaNode;
438	/** The number of private pages. (Giant mtx.) */
439	uint16_t cPrivate;
440	/** The number of shared pages. (Giant mtx.) */
441	uint16_t cShared;
442	/** The pages. (Giant mtx.) */
443	GMMPAGE aPages[GMM_CHUNK_SIZE >> PAGE_SHIFT];
444	} GMMCHUNK;
445
446	/** Indicates that the NUMA properties of the memory are unknown. */
447	#define GMM_CHUNK_NUMA_ID_UNKNOWN UINT16_C(0xfffe)
448
449	/** @name GMM_CHUNK_FLAGS_XXX - chunk flags.
450	* @{ */
451	/** Indicates that the chunk is a large page (2MB). */
452	#define GMM_CHUNK_FLAGS_LARGE_PAGE UINT16_C(0x0001)
453	/** @} */
454
455
456	/**
457	* An allocation chunk TLB entry.
458	*/
459	typedef struct GMMCHUNKTLBE
460	{
461	/** The chunk id. */
462	uint32_t idChunk;
463	/** Pointer to the chunk. */
464	PGMMCHUNK pChunk;
465	} GMMCHUNKTLBE;
466	/** Pointer to an allocation chunk TLB entry. */
467	typedef GMMCHUNKTLBE *PGMMCHUNKTLBE;
468
469
470	/** The number of entries in the allocation chunk TLB. */
471	#define GMM_CHUNKTLB_ENTRIES 32
472	/** Gets the TLB entry index for the given Chunk ID. */
473	#define GMM_CHUNKTLB_IDX(idChunk) ( (idChunk) & (GMM_CHUNKTLB_ENTRIES - 1) )
474
475	/**
476	* An allocation chunk TLB.
477	*/
478	typedef struct GMMCHUNKTLB
479	{
480	/** The TLB entries. */
481	GMMCHUNKTLBE aEntries[GMM_CHUNKTLB_ENTRIES];
482	} GMMCHUNKTLB;
483	/** Pointer to an allocation chunk TLB. */
484	typedef GMMCHUNKTLB *PGMMCHUNKTLB;
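/*
 * A hedged sketch of a direct-mapped chunk TLB in front of the chunk tree.
 * std::map stands in for the AVL tree, and the lookup policy (fill the slot
 * on a miss, overwrite whatever was there) is an assumption; the real lookup
 * code is not part of this listing. All type and function names here are
 * hypothetical.
 */

```cpp
#include <cassert>
#include <cstdint>
#include <map>

constexpr uint32_t kNilChunkId = 0;   // the NIL chunk ID is 0
constexpr uint32_t kTlbEntries = 32;  // must be a power of two for the index mask

struct Chunk { uint32_t idChunk; };

struct TlbEntry
{
    uint32_t idChunk = kNilChunkId;
    Chunk   *pChunk  = nullptr;
};

struct ChunkLookup
{
    std::map<uint32_t, Chunk> tree;          // stand-in for the AVL tree keyed by chunk ID
    TlbEntry                  aEntries[kTlbEntries];
    unsigned                  cTreeLookups = 0;

    Chunk *get(uint32_t idChunk)
    {
        // Same indexing as GMM_CHUNKTLB_IDX: the low bits of the ID pick the slot.
        TlbEntry &entry = aEntries[idChunk & (kTlbEntries - 1)];
        if (entry.idChunk == idChunk && entry.pChunk)
            return entry.pChunk;             // TLB hit, no tree walk

        ++cTreeLookups;                      // TLB miss, fall back to the tree
        auto it = tree.find(idChunk);
        if (it == tree.end())
            return nullptr;
        entry.idChunk = idChunk;             // evict the previous occupant of the slot
        entry.pChunk  = &it->second;
        return entry.pChunk;
    }
};

// Chunks 1 and 33 share slot 1, so alternating between them causes misses.
inline unsigned demoChunkTlb()
{
    ChunkLookup lookup;
    lookup.tree[1]  = Chunk{1};
    lookup.tree[33] = Chunk{33};
    lookup.get(1);   // miss
    lookup.get(1);   // hit
    lookup.get(33);  // miss, evicts chunk 1
    lookup.get(1);   // miss again
    return lookup.cTreeLookups;
}
```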
485
486
487	/**
488	* The GMM instance data.
489	*/
490	typedef struct GMM
491	{
492	/** Magic / eye catcher. GMM_MAGIC */
493	uint32_t u32Magic;
494	/** The number of threads waiting on the mutex. */
495	uint32_t cMtxContenders;
496	#ifdef VBOX_USE_CRIT_SECT_FOR_GIANT
497	/** The critical section protecting the GMM.
498	* More fine grained locking can be implemented later if necessary. */
499	RTCRITSECT GiantCritSect;
500	#else
501	/** The fast mutex protecting the GMM.
502	* More fine grained locking can be implemented later if necessary. */
503	RTSEMFASTMUTEX hMtx;
504	#endif
505	#ifdef VBOX_STRICT
506	/** The current mutex owner. */
507	RTNATIVETHREAD hMtxOwner;
508	#endif
509	/** The chunk tree. */
510	PAVLU32NODECORE pChunks;
511	/** The chunk TLB. */
512	GMMCHUNKTLB ChunkTLB;
513	/** The private free set. */
514	GMMCHUNKFREESET PrivateX;
515	/** The shared free set. */
516	GMMCHUNKFREESET Shared;
517
518	/** Shared module tree (global).
519	* @todo separate trees for distinctly different guest OSes. */
520	PAVLLU32NODECORE pGlobalSharedModuleTree;
521	/** Sharable modules (count of nodes in pGlobalSharedModuleTree). */
522	uint32_t cShareableModules;
523
524	/** The chunk list. For simplifying the cleanup process. */
525	RTLISTANCHOR ChunkList;
526
527	/** The maximum number of pages we're allowed to allocate.
528	* @gcfgm{GMM/MaxPages,64-bit, Direct.}
529	* @gcfgm{GMM/PctPages,32-bit, Relative to the number of host pages.} */
530	uint64_t cMaxPages;
531	/** The number of pages that have been reserved.
532	* The deal is that cReservedPages - cOverCommittedPages <= cMaxPages. */
533	uint64_t cReservedPages;
534	/** The number of pages that we have over-committed in reservations. */
535	uint64_t cOverCommittedPages;
536	/** The number of actually allocated (committed if you like) pages. */
537	uint64_t cAllocatedPages;
538	/** The number of pages that are shared. A subset of cAllocatedPages. */
539	uint64_t cSharedPages;
540	/** The number of pages that are actually shared between VMs. */
541	uint64_t cDuplicatePages;
542	/** The number of pages that are shared that have been left behind by
543	* VMs not doing proper cleanups. */
544	uint64_t cLeftBehindSharedPages;
545	/** The number of allocation chunks.
546	* (The number of pages we've allocated from the host can be derived from this.) */
547	uint32_t cChunks;
548	/** The number of current ballooned pages. */
549	uint64_t cBalloonedPages;
550
551	/** The legacy allocation mode indicator.
552	* This is determined at initialization time. */
553	bool fLegacyAllocationMode;
554	/** The bound memory mode indicator.
555	* When set, the memory will be bound to a specific VM and never
556	* shared. This is always set if fLegacyAllocationMode is set.
557	* (Also determined at initialization time.) */
558	bool fBoundMemoryMode;
559	/** The number of registered VMs. */
560	uint16_t cRegisteredVMs;
561
562	/** The number of freed chunks ever. This is used as a list generation to
563	* avoid restarting the cleanup scanning when the list wasn't modified. */
564	uint32_t cFreedChunks;
565	/** The previous allocated Chunk ID.
566	* Used as a hint to avoid scanning the whole bitmap. */
567	uint32_t idChunkPrev;
568	/** Chunk ID allocation bitmap.
569	* Bits of allocated IDs are set, free ones are clear.
570	* The NIL id (0) is marked allocated. */
571	uint32_t bmChunkId[(GMM_CHUNKID_LAST + 1 + 31) / 32];
572
573	/** The index of the next mutex to use. */
574	uint32_t iNextChunkMtx;
575	/** Chunk locks for reducing lock contention without having to allocate
576	* one lock per chunk. */
577	struct
578	{
579	/** The mutex */
580	RTSEMFASTMUTEX hMtx;
581	/** The number of threads currently using this mutex. */
582	uint32_t volatile cUsers;
583	} aChunkMtx[64];
584	} GMM;
585	/** Pointer to the GMM instance. */
586	typedef GMM *PGMM;
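/*
 * A minimal sketch of chunk ID allocation from a bitmap with a "previous ID"
 * hint, in the spirit of GMM::bmChunkId and GMM::idChunkPrev. The wrap-around
 * scan and the tiny ID limit are assumptions made for illustration; the real
 * allocator and GMM_CHUNKID_LAST are not shown in this listing. The type and
 * function names are hypothetical.
 */

```cpp
#include <cassert>
#include <cstdint>

constexpr uint32_t kNilChunkId  = 0;    // NIL id, permanently marked allocated
constexpr uint32_t kChunkIdLast = 255;  // assumed small limit for the sketch

struct ChunkIdAllocator
{
    uint32_t bm[(kChunkIdLast + 1 + 31) / 32] = {};  // set bit = allocated ID
    uint32_t idPrev = kNilChunkId;                   // hint: last ID handed out

    ChunkIdAllocator() { setBit(kNilChunkId); }

    bool testBit(uint32_t id) const { return ((bm[id / 32] >> (id % 32)) & 1u) != 0; }
    void setBit(uint32_t id)        { bm[id / 32] |=  (UINT32_C(1) << (id % 32)); }
    void clearBit(uint32_t id)      { bm[id / 32] &= ~(UINT32_C(1) << (id % 32)); }

    // Scans every ID once, starting just after the hint; returns NIL when exhausted.
    uint32_t allocate()
    {
        for (uint32_t i = 1; i <= kChunkIdLast + 1; i++)
        {
            uint32_t id = (idPrev + i) % (kChunkIdLast + 1);
            if (!testBit(id))
            {
                setBit(id);
                idPrev = id;
                return id;
            }
        }
        return kNilChunkId;
    }

    void release(uint32_t id)
    {
        if (id != kNilChunkId)
            clearBit(id);
    }
};

// The hint makes the allocator prefer fresh IDs over a recently released one.
inline uint32_t demoChunkIds()
{
    ChunkIdAllocator ids;
    uint32_t idFirst = ids.allocate();  // 1
    ids.allocate();                     // 2
    ids.release(idFirst);
    return ids.allocate();              // continues after the hint
}
```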
587
588	/** The value of GMM::u32Magic (Katsuhiro Otomo). */
589	#define GMM_MAGIC UINT32_C(0x19540414)
590
591
592	/**
593	* GMM chunk mutex state.
594	*
595	* This is returned by gmmR0ChunkMutexAcquire and is used by the other
596	* gmmR0ChunkMutex* methods.
597	*/
598	typedef struct GMMR0CHUNKMTXSTATE
599	{
600	PGMM pGMM;
601	/** The index of the chunk mutex. */
602	uint8_t iChunkMtx;
603	/** The relevant flags (GMMR0CHUNK_MTX_XXX). */
604	uint8_t fFlags;
605	} GMMR0CHUNKMTXSTATE;
606	/** Pointer to a chunk mutex state. */
607	typedef GMMR0CHUNKMTXSTATE *PGMMR0CHUNKMTXSTATE;
608
609	/** @name GMMR0CHUNK_MTX_XXX
610	* @{ */
611	#define GMMR0CHUNK_MTX_INVALID UINT32_C(0)
612	#define GMMR0CHUNK_MTX_KEEP_GIANT UINT32_C(1)
613	#define GMMR0CHUNK_MTX_RETAKE_GIANT UINT32_C(2)
614	#define GMMR0CHUNK_MTX_DROP_GIANT UINT32_C(3)
615	#define GMMR0CHUNK_MTX_END UINT32_C(4)
616	/** @} */
617
618
619	/** The maximum number of shared modules per-vm. */
620	#define GMM_MAX_SHARED_PER_VM_MODULES 2048
621	/** The maximum number of shared modules GMM is allowed to track. */
622	#define GMM_MAX_SHARED_GLOBAL_MODULES 16834
623
624
625	/**
626	* Argument packet for gmmR0SharedModuleCleanup.
627	*/
628	typedef struct GMMR0SHMODPERVMDTORARGS
629	{
630	PGVM pGVM;
631	PGMM pGMM;
632	} GMMR0SHMODPERVMDTORARGS;
633
634	/**
635	* Argument packet for gmmR0CheckSharedModule.
636	*/
637	typedef struct GMMCHECKSHAREDMODULEINFO
638	{
639	PGVM pGVM;
640	VMCPUID idCpu;
641	} GMMCHECKSHAREDMODULEINFO;
642
643	/**
644	* Argument packet for gmmR0FindDupPageInChunk by GMMR0FindDuplicatePage.
645	*/
646	typedef struct GMMFINDDUPPAGEINFO
647	{
648	PGVM pGVM;
649	PGMM pGMM;
650	uint8_t *pSourcePage;
651	bool fFoundDuplicate;
652	} GMMFINDDUPPAGEINFO;
653
654
655	/*********************************************************************************************************************************
656	* Global Variables *
657	*********************************************************************************************************************************/
658	/** Pointer to the GMM instance data. */
659	static PGMM g_pGMM = NULL;
660
661	/** Macro for obtaining and validating the g_pGMM pointer.
662	*
663	* On failure it will return from the invoking function with the specified
664	* return value.
665	*
666	* @param pGMM The name of the pGMM variable.
667	* @param rc The return value on failure. Use VERR_GMM_INSTANCE for VBox
668	* status codes.
669	*/
670	#define GMM_GET_VALID_INSTANCE(pGMM, rc) \
671	do { \
672	(pGMM) = g_pGMM; \
673	AssertPtrReturn((pGMM), (rc)); \
674	AssertMsgReturn((pGMM)->u32Magic == GMM_MAGIC, ("%p - %#x\n", (pGMM), (pGMM)->u32Magic), (rc)); \
675	} while (0)
676
677	/** Macro for obtaining and validating the g_pGMM pointer, void function
678	* variant.
679	*
680	* On failure it will return from the invoking function.
681	*
682	* @param pGMM The name of the pGMM variable.
683	*/
684	#define GMM_GET_VALID_INSTANCE_VOID(pGMM) \
685	do { \
686	(pGMM) = g_pGMM; \
687	AssertPtrReturnVoid((pGMM)); \
688	AssertMsgReturnVoid((pGMM)->u32Magic == GMM_MAGIC, ("%p - %#x\n", (pGMM), (pGMM)->u32Magic)); \
689	} while (0)
690
691
692	/** @def GMM_CHECK_SANITY_UPON_ENTERING
693	* Checks the sanity of the GMM instance data before making changes.
694	*
695	* This macro is a stub by default and must be enabled manually in the code.
696	*
697	* @returns true if sane, false if not.
698	* @param pGMM The name of the pGMM variable.
699	*/
700	#if defined(VBOX_STRICT) && defined(GMMR0_WITH_SANITY_CHECK) && 0
701	# define GMM_CHECK_SANITY_UPON_ENTERING(pGMM) (gmmR0SanityCheck((pGMM), __PRETTY_FUNCTION__, __LINE__) == 0)
702	#else
703	# define GMM_CHECK_SANITY_UPON_ENTERING(pGMM) (true)
704	#endif
705
706	/** @def GMM_CHECK_SANITY_UPON_LEAVING
707	* Checks the sanity of the GMM instance data after making changes.
708	*
709	* This macro is a stub by default and must be enabled manually in the code.
710	*
711	* @returns true if sane, false if not.
712	* @param pGMM The name of the pGMM variable.
713	*/
714	#if defined(VBOX_STRICT) && defined(GMMR0_WITH_SANITY_CHECK) && 0
715	# define GMM_CHECK_SANITY_UPON_LEAVING(pGMM) (gmmR0SanityCheck((pGMM), __PRETTY_FUNCTION__, __LINE__) == 0)
716	#else
717	# define GMM_CHECK_SANITY_UPON_LEAVING(pGMM) (true)
718	#endif
719
720	/** @def GMM_CHECK_SANITY_IN_LOOPS
721	* Checks the sanity of the GMM instance in the allocation loops.
722	*
723	* This macro is a stub by default and must be enabled manually in the code.
724	*
725	* @returns true if sane, false if not.
726	* @param pGMM The name of the pGMM variable.
727	*/
728	#if defined(VBOX_STRICT) && defined(GMMR0_WITH_SANITY_CHECK) && 0
729	# define GMM_CHECK_SANITY_IN_LOOPS(pGMM) (gmmR0SanityCheck((pGMM), __PRETTY_FUNCTION__, __LINE__) == 0)
730	#else
731	# define GMM_CHECK_SANITY_IN_LOOPS(pGMM) (true)
732	#endif
733
734
735	/*********************************************************************************************************************************
736	* Internal Functions *
737	*********************************************************************************************************************************/
738	static DECLCALLBACK(int) gmmR0TermDestroyChunk(PAVLU32NODECORE pNode, void *pvGMM);
739	static bool gmmR0CleanupVMScanChunk(PGMM pGMM, PGVM pGVM, PGMMCHUNK pChunk);
740	DECLINLINE(void) gmmR0UnlinkChunk(PGMMCHUNK pChunk);
741	DECLINLINE(void) gmmR0LinkChunk(PGMMCHUNK pChunk, PGMMCHUNKFREESET pSet);
742	DECLINLINE(void) gmmR0SelectSetAndLinkChunk(PGMM pGMM, PGVM pGVM, PGMMCHUNK pChunk);
743	#ifdef GMMR0_WITH_SANITY_CHECK
744	static uint32_t gmmR0SanityCheck(PGMM pGMM, const char *pszFunction, unsigned uLineNo);
745	#endif
746	static bool gmmR0FreeChunk(PGMM pGMM, PGVM pGVM, PGMMCHUNK pChunk, bool fRelaxedSem);
747	DECLINLINE(void) gmmR0FreePrivatePage(PGMM pGMM, PGVM pGVM, uint32_t idPage, PGMMPAGE pPage);
748	DECLINLINE(void) gmmR0FreeSharedPage(PGMM pGMM, PGVM pGVM, uint32_t idPage, PGMMPAGE pPage);
749	static int gmmR0UnmapChunkLocked(PGMM pGMM, PGVM pGVM, PGMMCHUNK pChunk);
750	#ifdef VBOX_WITH_PAGE_SHARING
751	static void gmmR0SharedModuleCleanup(PGMM pGMM, PGVM pGVM);
752	# ifdef VBOX_STRICT
753	static uint32_t gmmR0StrictPageChecksum(PGMM pGMM, PGVM pGVM, uint32_t idPage);
754	# endif
755	#endif
756
757
758
759	/**
760	* Initializes the GMM component.
761	*
762	* This is called when the VMMR0.r0 module is loaded and protected by the
763	* loader semaphore.
764	*
765	* @returns VBox status code.
766	*/
767	GMMR0DECL(int) GMMR0Init(void)
768	{
769	LogFlow(("GMMInit:\n"));
770
771	/*
772	* Allocate the instance data and the locks.
773	*/
774	PGMM pGMM = (PGMM)RTMemAllocZ(sizeof(*pGMM));
775	if (!pGMM)
776	return VERR_NO_MEMORY;
777
778	pGMM->u32Magic = GMM_MAGIC;
779	for (unsigned i = 0; i < RT_ELEMENTS(pGMM->ChunkTLB.aEntries); i++)
780	pGMM->ChunkTLB.aEntries[i].idChunk = NIL_GMM_CHUNKID;
781	RTListInit(&pGMM->ChunkList);
782	ASMBitSet(&pGMM->bmChunkId[0], NIL_GMM_CHUNKID);
783
784	#ifdef VBOX_USE_CRIT_SECT_FOR_GIANT
785	int rc = RTCritSectInit(&pGMM->GiantCritSect);
786	#else
787	int rc = RTSemFastMutexCreate(&pGMM->hMtx);
788	#endif
789	if (RT_SUCCESS(rc))
790	{
791	unsigned iMtx;
792	for (iMtx = 0; iMtx < RT_ELEMENTS(pGMM->aChunkMtx); iMtx++)
793	{
794	rc = RTSemFastMutexCreate(&pGMM->aChunkMtx[iMtx].hMtx);
795	if (RT_FAILURE(rc))
796	break;
797	}
798	if (RT_SUCCESS(rc))
799	{
800	/*
801	* Check and see if RTR0MemObjAllocPhysNC works.
802	*/
803	#if 0 /* later, see @bugref{3170}. */
804	RTR0MEMOBJ MemObj;
805	rc = RTR0MemObjAllocPhysNC(&MemObj, _64K, NIL_RTHCPHYS);
806	if (RT_SUCCESS(rc))
807	{
808	rc = RTR0MemObjFree(MemObj, true);
809	AssertRC(rc);
810	}
811	else if (rc == VERR_NOT_SUPPORTED)
812	pGMM->fLegacyAllocationMode = pGMM->fBoundMemoryMode = true;
813	else
814	SUPR0Printf("GMMR0Init: RTR0MemObjAllocPhysNC(,64K,Any) -> %d!\n", rc);
815	#else
816	# if defined(RT_OS_WINDOWS) || (defined(RT_OS_SOLARIS) && ARCH_BITS == 64) || defined(RT_OS_LINUX) || defined(RT_OS_FREEBSD)
817	pGMM->fLegacyAllocationMode = false;
818	# if ARCH_BITS == 32
819	/* Don't reuse possibly partial chunks because of the virtual
820	address space limitation. */
821	pGMM->fBoundMemoryMode = true;
822	# else
823	pGMM->fBoundMemoryMode = false;
824	# endif
825	# else
826	pGMM->fLegacyAllocationMode = true;
827	pGMM->fBoundMemoryMode = true;
828	# endif
829	#endif
830
831	/*
832	* Query system page count and guess a reasonable cMaxPages value.
833	*/
834	pGMM->cMaxPages = UINT32_MAX; /** @todo IPRT function for query ram size and such. */
835
836	g_pGMM = pGMM;
837	LogFlow(("GMMInit: pGMM=%p fLegacyAllocationMode=%RTbool fBoundMemoryMode=%RTbool\n", pGMM, pGMM->fLegacyAllocationMode, pGMM->fBoundMemoryMode));
838	return VINF_SUCCESS;
839	}
840
841	/*
842	* Bail out.
843	*/
844	while (iMtx-- > 0)
845	RTSemFastMutexDestroy(pGMM->aChunkMtx[iMtx].hMtx);
846	#ifdef VBOX_USE_CRIT_SECT_FOR_GIANT
847	RTCritSectDelete(&pGMM->GiantCritSect);
848	#else
849	RTSemFastMutexDestroy(pGMM->hMtx);
850	#endif
851	}
852
853	pGMM->u32Magic = 0;
854	RTMemFree(pGMM);
855	SUPR0Printf("GMMR0Init: failed! rc=%d\n", rc);
856	return rc;
857	}
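/*
 * A sketch of the create-then-roll-back pattern that GMMR0Init uses for the
 * chunk mutex array: on a failure at index i, only the i objects created so
 * far are destroyed, in reverse order. FakeMutex and the helper functions are
 * hypothetical stand-ins for illustration, not IPRT APIs.
 */

```cpp
#include <cassert>
#include <cstddef>

struct FakeMutex { bool fCreated = false; };

// Stand-in for a creation call; fFail simulates an error, cLive counts live objects.
inline bool fakeMutexCreate(FakeMutex &mtx, bool fFail, int &cLive)
{
    if (fFail)
        return false;
    mtx.fCreated = true;
    ++cLive;
    return true;
}

inline void fakeMutexDestroy(FakeMutex &mtx, int &cLive)
{
    mtx.fCreated = false;
    --cLive;
}

// Creates cMtx mutexes; if creation fails at iFailAt, undoes the earlier ones.
inline bool createAll(FakeMutex *paMtx, std::size_t cMtx, std::size_t iFailAt, int &cLive)
{
    std::size_t i;
    for (i = 0; i < cMtx; i++)
        if (!fakeMutexCreate(paMtx[i], i == iFailAt, cLive))
            break;
    if (i == cMtx)
        return true;

    // Bail out: i is the index that failed, so entries [0, i) need destroying.
    while (i-- > 0)
        fakeMutexDestroy(paMtx[i], cLive);
    return false;
}

// Returns how many fake mutexes are still alive after an init attempt.
inline int liveAfterInit(std::size_t iFailAt)
{
    FakeMutex aMtx[8];
    int cLive = 0;
    createAll(aMtx, 8, iFailAt, cLive);
    return cLive;
}
```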
858
859
860	/**
861	* Terminates the GMM component.
862	*/
863	GMMR0DECL(void) GMMR0Term(void)
864	{
865	LogFlow(("GMMTerm:\n"));
866
867	/*
868	* Take care / be paranoid...
869	*/
870	PGMM pGMM = g_pGMM;
871	if (!VALID_PTR(pGMM))
872	return;
873	if (pGMM->u32Magic != GMM_MAGIC)
874	{
875	SUPR0Printf("GMMR0Term: u32Magic=%#x\n", pGMM->u32Magic);
876	return;
877	}
878
879	/*
880	* Undo what init did and free all the resources we've acquired.
881	*/
882	/* Destroy the fundamentals. */
883	g_pGMM = NULL;
884	pGMM->u32Magic = ~GMM_MAGIC;
885	#ifdef VBOX_USE_CRIT_SECT_FOR_GIANT
886	RTCritSectDelete(&pGMM->GiantCritSect);
887	#else
888	RTSemFastMutexDestroy(pGMM->hMtx);
889	pGMM->hMtx = NIL_RTSEMFASTMUTEX;
890	#endif
891
892	/* Free any chunks still hanging around. */
893	RTAvlU32Destroy(&pGMM->pChunks, gmmR0TermDestroyChunk, pGMM);
894
895	/* Destroy the chunk locks. */
896	for (unsigned iMtx = 0; iMtx < RT_ELEMENTS(pGMM->aChunkMtx); iMtx++)
897	{
898	Assert(pGMM->aChunkMtx[iMtx].cUsers == 0);
899	RTSemFastMutexDestroy(pGMM->aChunkMtx[iMtx].hMtx);
900	pGMM->aChunkMtx[iMtx].hMtx = NIL_RTSEMFASTMUTEX;
901	}
902
903	/* Finally the instance data itself. */
904	RTMemFree(pGMM);
905	LogFlow(("GMMTerm: done\n"));
906	}
907
908
909	/**
910	* RTAvlU32Destroy callback.
911	*
912	* @returns 0
913	* @param pNode The node to destroy.
914	* @param pvGMM The GMM handle.
915	*/
916	static DECLCALLBACK(int) gmmR0TermDestroyChunk(PAVLU32NODECORE pNode, void *pvGMM)
917	{
918	PGMMCHUNK pChunk = (PGMMCHUNK)pNode;
919
920	if (pChunk->cFree != (GMM_CHUNK_SIZE >> PAGE_SHIFT))
921	SUPR0Printf("GMMR0Term: %RKv/%#x: cFree=%d cPrivate=%d cShared=%d cMappings=%d\n", pChunk,
922	pChunk->Core.Key, pChunk->cFree, pChunk->cPrivate, pChunk->cShared, pChunk->cMappingsX);
923
924	int rc = RTR0MemObjFree(pChunk->hMemObj, true /* fFreeMappings */);
925	if (RT_FAILURE(rc))
926	{
927	SUPR0Printf("GMMR0Term: %RKv/%#x: RTR0MemObjFree(%RKv,true) -> %d (cMappings=%d)\n", pChunk,
928	pChunk->Core.Key, pChunk->hMemObj, rc, pChunk->cMappingsX);
929	AssertRC(rc);
930	}
931	pChunk->hMemObj = NIL_RTR0MEMOBJ;
932
933	RTMemFree(pChunk->paMappingsX);
934	pChunk->paMappingsX = NULL;
935
936	RTMemFree(pChunk);
937	NOREF(pvGMM);
938	return 0;
939	}
940
941
942	/**
943	* Initializes the per-VM data for the GMM.
944	*
945	* This is called from within the GVMM lock (from GVMMR0CreateVM)
946	* and should only initialize the data members so GMMR0CleanupVM
947	* can deal with them. We reserve no memory or anything here,
948	* that's done later in GMMR0InitVM.
949	*
950	* @param pGVM Pointer to the Global VM structure.
951	*/
952	GMMR0DECL(void) GMMR0InitPerVMData(PGVM pGVM)
953	{
954	AssertCompile(RT_SIZEOFMEMB(GVM,gmm.s) <= RT_SIZEOFMEMB(GVM,gmm.padding));
955
956	pGVM->gmm.s.Stats.enmPolicy = GMMOCPOLICY_INVALID;
957	pGVM->gmm.s.Stats.enmPriority = GMMPRIORITY_INVALID;
958	pGVM->gmm.s.Stats.fMayAllocate = false;
959	}
960
961
962	/**
963	* Acquires the GMM giant lock.
964	*
965	* @returns Assert status code from RTSemFastMutexRequest.
966	* @param pGMM Pointer to the GMM instance.
967	*/
968	static int gmmR0MutexAcquire(PGMM pGMM)
969	{
970	ASMAtomicIncU32(&pGMM->cMtxContenders);
971	#ifdef VBOX_USE_CRIT_SECT_FOR_GIANT
972	int rc = RTCritSectEnter(&pGMM->GiantCritSect);
973	#else
974	int rc = RTSemFastMutexRequest(pGMM->hMtx);
975	#endif
976	ASMAtomicDecU32(&pGMM->cMtxContenders);
977	AssertRC(rc);
978	#ifdef VBOX_STRICT
979	pGMM->hMtxOwner = RTThreadNativeSelf();
980	#endif
981	return rc;
982	}
983
984
985	/**
986	* Releases the GMM giant lock.
987	*
988	* @returns Assert status code from RTSemFastMutexRelease.
989	* @param pGMM Pointer to the GMM instance.
990	*/
991	static int gmmR0MutexRelease(PGMM pGMM)
992	{
993	#ifdef VBOX_STRICT
994	pGMM->hMtxOwner = NIL_RTNATIVETHREAD;
995	#endif
996	#ifdef VBOX_USE_CRIT_SECT_FOR_GIANT
997	int rc = RTCritSectLeave(&pGMM->GiantCritSect);
998	#else
999	int rc = RTSemFastMutexRelease(pGMM->hMtx);
1000	AssertRC(rc);
1001	#endif
1002	return rc;
1003	}
1004
1005
1006	/**
1007	* Yields the GMM giant lock if there is contention and a certain minimum time
1008	* has elapsed since we took it.
1009	*
1010	* @returns @c true if the mutex was yielded, @c false if not.
1011	* @param pGMM Pointer to the GMM instance.
1012	* @param puLockNanoTS Where the lock acquisition time stamp is kept
1013	* (in/out).
1014	*/
1015	static bool gmmR0MutexYield(PGMM pGMM, uint64_t *puLockNanoTS)
1016	{
1017	/*
1018	* If nobody is contending the mutex, don't bother checking the time.
1019	*/
1020	if (ASMAtomicReadU32(&pGMM->cMtxContenders) == 0)
1021	return false;
1022
1023	/*
1024	* Don't yield if we haven't executed for at least 2 milliseconds.
1025	*/
1026	uint64_t uNanoNow = RTTimeSystemNanoTS();
1027	if (uNanoNow - *puLockNanoTS < UINT32_C(2000000))
1028	return false;
1029
1030	/*
1031	* Yield the mutex.
1032	*/
1033	#ifdef VBOX_STRICT
1034	pGMM->hMtxOwner = NIL_RTNATIVETHREAD;
1035	#endif
1036	ASMAtomicIncU32(&pGMM->cMtxContenders);
1037	#ifdef VBOX_USE_CRIT_SECT_FOR_GIANT
1038	int rc1 = RTCritSectLeave(&pGMM->GiantCritSect); AssertRC(rc1);
1039	#else
1040	int rc1 = RTSemFastMutexRelease(pGMM->hMtx); AssertRC(rc1);
1041	#endif
1042
1043	RTThreadYield();
1044
1045	#ifdef VBOX_USE_CRIT_SECT_FOR_GIANT
1046	int rc2 = RTCritSectEnter(&pGMM->GiantCritSect); AssertRC(rc2);
1047	#else
1048	int rc2 = RTSemFastMutexRequest(pGMM->hMtx); AssertRC(rc2);
1049	#endif
1050	*puLockNanoTS = RTTimeSystemNanoTS();
1051	ASMAtomicDecU32(&pGMM->cMtxContenders);
1052	#ifdef VBOX_STRICT
1053	pGMM->hMtxOwner = RTThreadNativeSelf();
1054	#endif
1055
1056	return true;
1057	}
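
The yield logic above can be reduced to a small portable sketch. This is not the IPRT implementation: std::mutex, std::atomic and std::chrono stand in for RTSemFastMutex, ASMAtomic* and RTTimeSystemNanoTS, and the names GiantLock, giantAcquire, giantRelease and giantYield are hypothetical. It only illustrates the idea of yielding a long-held lock when somebody is waiting and a minimum hold time has passed.

```cpp
#include <atomic>
#include <cassert>
#include <chrono>
#include <cstdint>
#include <mutex>
#include <thread>

// Hypothetical stand-in for the GMM instance: one big lock plus a contender count.
struct GiantLock
{
    std::mutex            mtx;
    std::atomic<uint32_t> cContenders{0};
};

using Clock = std::chrono::steady_clock;

// Acquire the lock, advertising ourselves as a contender while we wait.
inline void giantAcquire(GiantLock &lock)
{
    lock.cContenders.fetch_add(1);
    lock.mtx.lock();
    lock.cContenders.fetch_sub(1);
}

inline void giantRelease(GiantLock &lock)
{
    lock.mtx.unlock();
}

// Caller must hold lock.mtx.  Returns true if the lock was dropped and retaken,
// in which case tsLocked is reset to the new acquisition time.
inline bool giantYield(GiantLock &lock, Clock::time_point &tsLocked,
                       std::chrono::nanoseconds minHold = std::chrono::milliseconds(2))
{
    assert(minHold.count() >= 0);
    if (lock.cContenders.load() == 0)
        return false;                   // nobody is waiting, skip the clock read
    if (Clock::now() - tsLocked < minHold)
        return false;                   // held too briefly to be worth yielding

    lock.cContenders.fetch_add(1);      // we are about to wait for the lock again
    lock.mtx.unlock();
    std::this_thread::yield();
    lock.mtx.lock();
    tsLocked = Clock::now();
    lock.cContenders.fetch_sub(1);
    return true;
}
```

A caller that gets true back has to assume that any state protected by the lock may have changed while it was released.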
1058
1059
1060	/**
1061	* Acquires a chunk lock.
1062	*
1063	* The caller must own the giant lock.
1064	*
1065	* @returns Assert status code from RTSemFastMutexRequest.
1066	* @param pMtxState The chunk mutex state info. (Avoids
1067	* passing the same flags and stuff around
1068	* for subsequent release and drop-giant
1069	* calls.)
1070	* @param pGMM Pointer to the GMM instance.
1071	* @param pChunk Pointer to the chunk.
1072	* @param fFlags Flags regarding the giant lock, GMMR0CHUNK_MTX_XXX.
1073	*/
1074	static int gmmR0ChunkMutexAcquire(PGMMR0CHUNKMTXSTATE pMtxState, PGMM pGMM, PGMMCHUNK pChunk, uint32_t fFlags)
1075	{
1076	Assert(fFlags > GMMR0CHUNK_MTX_INVALID && fFlags < GMMR0CHUNK_MTX_END);
1077	Assert(pGMM->hMtxOwner == RTThreadNativeSelf());
1078
1079	pMtxState->pGMM = pGMM;
1080	pMtxState->fFlags = (uint8_t)fFlags;
1081
1082	/*
1083	* Get the lock index and reference the lock.
1084	*/
1085	Assert(pGMM->hMtxOwner == RTThreadNativeSelf());
1086	uint32_t iChunkMtx = pChunk->iChunkMtx;
1087	if (iChunkMtx == UINT8_MAX)
1088	{
1089	iChunkMtx = pGMM->iNextChunkMtx++;
1090	iChunkMtx %= RT_ELEMENTS(pGMM->aChunkMtx);
1091
1092	/* Try to get an unused one... */
1093	if (pGMM->aChunkMtx[iChunkMtx].cUsers)
1094	{
1095	iChunkMtx = pGMM->iNextChunkMtx++;
1096	iChunkMtx %= RT_ELEMENTS(pGMM->aChunkMtx);
1097	if (pGMM->aChunkMtx[iChunkMtx].cUsers)
1098	{
1099	iChunkMtx = pGMM->iNextChunkMtx++;
1100	iChunkMtx %= RT_ELEMENTS(pGMM->aChunkMtx);
1101	if (pGMM->aChunkMtx[iChunkMtx].cUsers)
1102	{
1103	iChunkMtx = pGMM->iNextChunkMtx++;
1104	iChunkMtx %= RT_ELEMENTS(pGMM->aChunkMtx);
1105	}
1106	}
1107	}
1108
1109	pChunk->iChunkMtx = iChunkMtx;
1110	}
1111	AssertCompile(RT_ELEMENTS(pGMM->aChunkMtx) < UINT8_MAX);
1112	pMtxState->iChunkMtx = (uint8_t)iChunkMtx;
1113	ASMAtomicIncU32(&pGMM->aChunkMtx[iChunkMtx].cUsers);
1114
1115	/*
1116	* Drop the giant?
1117	*/
1118	if (fFlags != GMMR0CHUNK_MTX_KEEP_GIANT)
1119	{
1120	/** @todo GMM life cycle cleanup (we may race someone
1121	* destroying and cleaning up GMM)? */
1122	gmmR0MutexRelease(pGMM);
1123	}
1124
1125	/*
1126	* Take the chunk mutex.
1127	*/
1128	int rc = RTSemFastMutexRequest(pGMM->aChunkMtx[iChunkMtx].hMtx);
1129	AssertRC(rc);
1130	return rc;
1131	}
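
The nested ifs that pick a chunk mutex amount to a bounded round-robin probe: up to four slots are tried, and the fourth is taken even if it is busy. A minimal sketch of that selection follows, with the locking itself left out. MutexPool, poolPickSlot and the pool size of 64 are assumptions for illustration, not values taken from this file.

```cpp
#include <array>
#include <cassert>
#include <cstddef>
#include <cstdint>

constexpr std::size_t kPoolSize  = 64;         // assumed pool size
constexpr uint8_t     kNoSlot    = UINT8_MAX;  // "no mutex associated yet"
constexpr unsigned    kMaxProbes = 4;          // four attempts, as in the unrolled code

struct MutexPool
{
    std::array<uint32_t, kPoolSize> acUsers{};  // user count per pool slot
    uint32_t                        iNext = 0;  // round-robin cursor
};

// Returns the slot for a chunk, picking a new one when the chunk has none.
// The caller is assumed to hold the giant lock, so plain accesses are used.
inline uint8_t poolPickSlot(MutexPool &pool, uint8_t &iChunkSlot)
{
    static_assert(kPoolSize < UINT8_MAX, "slot index must stay below the sentinel");
    if (iChunkSlot == kNoSlot)
    {
        std::size_t iSlot = 0;
        for (unsigned iProbe = 0; iProbe < kMaxProbes; iProbe++)
        {
            iSlot = pool.iNext++ % kPoolSize;
            if (pool.acUsers[iSlot] == 0)
                break;                  // found an unused slot
        }
        // If every probe hit a busy slot we settle for the last one; sharing is allowed.
        iChunkSlot = static_cast<uint8_t>(iSlot);
    }
    assert(iChunkSlot < kPoolSize);
    pool.acUsers[iChunkSlot]++;
    return iChunkSlot;
}
```

Bounding the probe count keeps the selection cheap while the giant lock is held, at the price of occasionally sharing a mutex between unrelated chunks.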
1132
1133
1134	/**
1135	* Releases a chunk lock acquired by gmmR0ChunkMutexAcquire.
1136	*
1137	* @returns Assert status code from RTSemFastMutexRelease.
1138	* @param pMtxState Pointer to the chunk mutex state.
1139	* @param pChunk Pointer to the chunk if it's still
1140	* alive, NULL if it isn't. This is used to deassociate
1141	* the chunk from the mutex on the way out so a new one
1142	* can be selected next time, thus avoiding contended
1143	* mutexes.
1144	*/
1145	static int gmmR0ChunkMutexRelease(PGMMR0CHUNKMTXSTATE pMtxState, PGMMCHUNK pChunk)
1146	{
1147	PGMM pGMM = pMtxState->pGMM;
1148
1149	/*
1150	* Release the chunk mutex and reacquire the giant if requested.
1151	*/
1152	int rc = RTSemFastMutexRelease(pGMM->aChunkMtx[pMtxState->iChunkMtx].hMtx);
1153	AssertRC(rc);
1154	if (pMtxState->fFlags == GMMR0CHUNK_MTX_RETAKE_GIANT)
1155	rc = gmmR0MutexAcquire(pGMM);
1156	else
1157	Assert((pMtxState->fFlags != GMMR0CHUNK_MTX_DROP_GIANT) == (pGMM->hMtxOwner == RTThreadNativeSelf()));
1158
1159	/*
1160	* Drop the chunk mutex user reference and deassociate it from the chunk
1161	* when possible.
1162	*/
1163	if ( ASMAtomicDecU32(&pGMM->aChunkMtx[pMtxState->iChunkMtx].cUsers) == 0
1164	&& pChunk
1165	&& RT_SUCCESS(rc) )
1166	{
1167	if (pMtxState->fFlags != GMMR0CHUNK_MTX_DROP_GIANT)
1168	pChunk->iChunkMtx = UINT8_MAX;
1169	else
1170	{
1171	rc = gmmR0MutexAcquire(pGMM);
1172	if (RT_SUCCESS(rc))
1173	{
1174	if (pGMM->aChunkMtx[pMtxState->iChunkMtx].cUsers == 0)
1175	pChunk->iChunkMtx = UINT8_MAX;
1176	rc = gmmR0MutexRelease(pGMM);
1177	}
1178	}
1179	}
1180
1181	pMtxState->pGMM = NULL;
1182	return rc;
1183	}
1184
1185
1186	/**
1187	* Drops the giant GMM lock we kept in gmmR0ChunkMutexAcquire while keeping the
1188	* chunk locked.
1189	*
1190	* This only works if gmmR0ChunkMutexAcquire was called with
1191	* GMMR0CHUNK_MTX_KEEP_GIANT. gmmR0ChunkMutexRelease will retake the giant
1192	* mutex, i.e. behave as if GMMR0CHUNK_MTX_RETAKE_GIANT was used.
1193	*
1194	* @returns VBox status code (assuming success is ok).
1195	* @param pMtxState Pointer to the chunk mutex state.
1196	*/
1197	static int gmmR0ChunkMutexDropGiant(PGMMR0CHUNKMTXSTATE pMtxState)
1198	{
1199	AssertReturn(pMtxState->fFlags == GMMR0CHUNK_MTX_KEEP_GIANT, VERR_GMM_MTX_FLAGS);
1200	Assert(pMtxState->pGMM->hMtxOwner == RTThreadNativeSelf());
1201	pMtxState->fFlags = GMMR0CHUNK_MTX_RETAKE_GIANT;
1202	/** @todo GMM life cycle cleanup (we may race someone
1203	* destroying and cleaning up GMM)? */
1204	return gmmR0MutexRelease(pMtxState->pGMM);
1205	}
1206
1207
1208	/**
1209	* For experimenting with NUMA affinity and such.
1210	*
1211	* @returns The current NUMA Node ID.
1212	*/
1213	static uint16_t gmmR0GetCurrentNumaNodeId(void)
1214	{
1215	#if 1
1216	return GMM_CHUNK_NUMA_ID_UNKNOWN;
1217	#else
1218	return RTMpCpuId() / 16;
1219	#endif
1220	}
1221
1222
1223
1224	/**
1225	* Cleans up when a VM is terminating.
1226	*
1227	* @param pGVM Pointer to the Global VM structure.
1228	*/
1229	GMMR0DECL(void) GMMR0CleanupVM(PGVM pGVM)
1230	{
1231	LogFlow(("GMMR0CleanupVM: pGVM=%p:{.hSelf=%#x}\n", pGVM, pGVM->hSelf));
1232
1233	PGMM pGMM;
1234	GMM_GET_VALID_INSTANCE_VOID(pGMM);
1235
1236	#ifdef VBOX_WITH_PAGE_SHARING
1237	/*
1238	* Clean up all registered shared modules first.
1239	*/
1240	gmmR0SharedModuleCleanup(pGMM, pGVM);
1241	#endif
1242
1243	gmmR0MutexAcquire(pGMM);
1244	uint64_t uLockNanoTS = RTTimeSystemNanoTS();
1245	GMM_CHECK_SANITY_UPON_ENTERING(pGMM);
1246
1247	/*
1248	* The policy is 'INVALID' until the initial reservation
1249	* request has been serviced.
1250	*/
1251	if ( pGVM->gmm.s.Stats.enmPolicy > GMMOCPOLICY_INVALID
1252	&& pGVM->gmm.s.Stats.enmPolicy < GMMOCPOLICY_END)
1253	{
1254	/*
1255	* If it's the last VM around, we can skip walking all the chunks looking
1256	* for the pages owned by this VM and instead flush the whole shebang.
1257	*
1258	* This takes care of the eventuality that a VM has left shared page
1259	* references behind (shouldn't happen of course, but you never know).
1260	*/
1261	Assert(pGMM->cRegisteredVMs);
1262	pGMM->cRegisteredVMs--;
1263
1264	/*
1265	* Walk the entire pool looking for pages that belong to this VM
1266	* and leftover mappings. (This'll only catch private pages,
1267	* shared pages will be 'left behind'.)
1268	*/
1269	/** @todo r=bird: This scanning+freeing could be optimized in bound mode! */
1270	uint64_t cPrivatePages = pGVM->gmm.s.Stats.cPrivatePages; /* save */
1271
1272	unsigned iCountDown = 64;
1273	bool fRedoFromStart;
1274	PGMMCHUNK pChunk;
1275	do
1276	{
1277	fRedoFromStart = false;
1278	RTListForEachReverse(&pGMM->ChunkList, pChunk, GMMCHUNK, ListNode)
1279	{
1280	uint32_t const cFreeChunksOld = pGMM->cFreedChunks;
1281	if ( ( !pGMM->fBoundMemoryMode
1282	|| pChunk->hGVM == pGVM->hSelf)
1283	&& gmmR0CleanupVMScanChunk(pGMM, pGVM, pChunk))
1284	{
1285	/* We left the giant mutex, so reset the yield counters. */
1286	uLockNanoTS = RTTimeSystemNanoTS();
1287	iCountDown = 64;
1288	}
1289	else
1290	{
1291	/* Didn't leave it, so do normal yielding. */
1292	if (!iCountDown)
1293	gmmR0MutexYield(pGMM, &uLockNanoTS);
1294	else
1295	iCountDown--;
1296	}
1297	if (pGMM->cFreedChunks != cFreeChunksOld)
1298	{
1299	fRedoFromStart = true;
1300	break;
1301	}
1302	}
1303	} while (fRedoFromStart);
1304
1305	if (pGVM->gmm.s.Stats.cPrivatePages)
1306	SUPR0Printf("GMMR0CleanupVM: hGVM=%#x has %#x private pages that cannot be found!\n", pGVM->hSelf, pGVM->gmm.s.Stats.cPrivatePages);
1307
1308	pGMM->cAllocatedPages -= cPrivatePages;
1309
1310	/*
1311	* Free empty chunks.
1312	*/
1313	PGMMCHUNKFREESET pPrivateSet = pGMM->fBoundMemoryMode ? &pGVM->gmm.s.Private : &pGMM->PrivateX;
1314	do
1315	{
1316	fRedoFromStart = false;
1317	iCountDown = 10240;
1318	pChunk = pPrivateSet->apLists[GMM_CHUNK_FREE_SET_UNUSED_LIST];
1319	while (pChunk)
1320	{
1321	PGMMCHUNK pNext = pChunk->pFreeNext;
1322	Assert(pChunk->cFree == GMM_CHUNK_NUM_PAGES);
1323	if ( !pGMM->fBoundMemoryMode
1324	|| pChunk->hGVM == pGVM->hSelf)
1325	{
1326	uint64_t const idGenerationOld = pPrivateSet->idGeneration;
1327	if (gmmR0FreeChunk(pGMM, pGVM, pChunk, true /*fRelaxedSem*/))
1328	{
1329	/* We've left the giant mutex, restart? (+1 for our unlink) */
1330	fRedoFromStart = pPrivateSet->idGeneration != idGenerationOld + 1;
1331	if (fRedoFromStart)
1332	break;
1333	uLockNanoTS = RTTimeSystemNanoTS();
1334	iCountDown = 10240;
1335	}
1336	}
1337
1338	/* Advance and maybe yield the lock. */
1339	pChunk = pNext;
1340	if (--iCountDown == 0)
1341	{
1342	uint64_t const idGenerationOld = pPrivateSet->idGeneration;
1343	fRedoFromStart = gmmR0MutexYield(pGMM, &uLockNanoTS)
1344	&& pPrivateSet->idGeneration != idGenerationOld;
1345	if (fRedoFromStart)
1346	break;
1347	iCountDown = 10240;
1348	}
1349	}
1350	} while (fRedoFromStart);
1351
1352	/*
1353	* Account for shared pages that weren't freed.
1354	*/
1355	if (pGVM->gmm.s.Stats.cSharedPages)
1356	{
1357	Assert(pGMM->cSharedPages >= pGVM->gmm.s.Stats.cSharedPages);
1358	SUPR0Printf("GMMR0CleanupVM: hGVM=%#x left %#x shared pages behind!\n", pGVM->hSelf, pGVM->gmm.s.Stats.cSharedPages);
1359	pGMM->cLeftBehindSharedPages += pGVM->gmm.s.Stats.cSharedPages;
1360	}
1361
1362	/*
1363	* Clean up balloon statistics in case the VM process crashed.
1364	*/
1365	Assert(pGMM->cBalloonedPages >= pGVM->gmm.s.Stats.cBalloonedPages);
1366	pGMM->cBalloonedPages -= pGVM->gmm.s.Stats.cBalloonedPages;
1367
1368	/*
1369	* Update the over-commitment management statistics.
1370	*/
1371	pGMM->cReservedPages -= pGVM->gmm.s.Stats.Reserved.cBasePages
1372	+ pGVM->gmm.s.Stats.Reserved.cFixedPages
1373	+ pGVM->gmm.s.Stats.Reserved.cShadowPages;
1374	switch (pGVM->gmm.s.Stats.enmPolicy)
1375	{
1376	case GMMOCPOLICY_NO_OC:
1377	break;
1378	default:
1379	/** @todo Update GMM->cOverCommittedPages */
1380	break;
1381	}
1382	}
1383
1384	/* zap the GVM data. */
1385	pGVM->gmm.s.Stats.enmPolicy = GMMOCPOLICY_INVALID;
1386	pGVM->gmm.s.Stats.enmPriority = GMMPRIORITY_INVALID;
1387	pGVM->gmm.s.Stats.fMayAllocate = false;
1388
1389	GMM_CHECK_SANITY_UPON_LEAVING(pGMM);
1390	gmmR0MutexRelease(pGMM);
1391
1392	LogFlow(("GMMR0CleanupVM: returns\n"));
1393	}
1394
1395
1396	/**
1397	* Scan one chunk for private pages belonging to the specified VM.
1398	*
1399	* @note This function may drop the giant mutex!
1400	*
1401	* @returns @c true if we've temporarily dropped the giant mutex, @c false if
1402	* we didn't.
1403	* @param pGMM Pointer to the GMM instance.
1404	* @param pGVM The global VM handle.
1405	* @param pChunk The chunk to scan.
1406	*/
1407	static bool gmmR0CleanupVMScanChunk(PGMM pGMM, PGVM pGVM, PGMMCHUNK pChunk)
1408	{
1409	Assert(!pGMM->fBoundMemoryMode || pChunk->hGVM == pGVM->hSelf);
1410
1411	/*
1412	* Look for pages belonging to the VM.
1413	* (Perform some internal checks while we're scanning.)
1414	*/
1415	#ifndef VBOX_STRICT
1416	if (pChunk->cFree != (GMM_CHUNK_SIZE >> PAGE_SHIFT))
1417	#endif
1418	{
1419	unsigned cPrivate = 0;
1420	unsigned cShared = 0;
1421	unsigned cFree = 0;
1422
1423	gmmR0UnlinkChunk(pChunk); /* avoiding cFreePages updates. */
1424
1425	uint16_t hGVM = pGVM->hSelf;
1426	unsigned iPage = (GMM_CHUNK_SIZE >> PAGE_SHIFT);
1427	while (iPage-- > 0)
1428	if (GMM_PAGE_IS_PRIVATE(&pChunk->aPages[iPage]))
1429	{
1430	if (pChunk->aPages[iPage].Private.hGVM == hGVM)
1431	{
1432	/*
1433	* Free the page.
1434	*
1435	* The reason for not using gmmR0FreePrivatePage here is that we
1436	* must not cause the chunk to be freed from under us - we're in
1437	* an AVL tree walk here.
1438	*/
1439	pChunk->aPages[iPage].u = 0;
1440	pChunk->aPages[iPage].Free.iNext = pChunk->iFreeHead;
1441	pChunk->aPages[iPage].Free.u2State = GMM_PAGE_STATE_FREE;
1442	pChunk->iFreeHead = iPage;
1443	pChunk->cPrivate--;
1444	pChunk->cFree++;
1445	pGVM->gmm.s.Stats.cPrivatePages--;
1446	cFree++;
1447	}
1448	else
1449	cPrivate++;
1450	}
1451	else if (GMM_PAGE_IS_FREE(&pChunk->aPages[iPage]))
1452	cFree++;
1453	else
1454	cShared++;
1455
1456	gmmR0SelectSetAndLinkChunk(pGMM, pGVM, pChunk);
1457
1458	/*
1459	* Did it add up?
1460	*/
1461	if (RT_UNLIKELY( pChunk->cFree != cFree
1462	|| pChunk->cPrivate != cPrivate
1463	|| pChunk->cShared != cShared))
1464	{
1465	SUPR0Printf("gmmR0CleanupVMScanChunk: Chunk %RKv/%#x has bogus stats - free=%d/%d private=%d/%d shared=%d/%d\n",
1466	pChunk, pChunk->Core.Key, pChunk->cFree, cFree, pChunk->cPrivate, cPrivate, pChunk->cShared, cShared);
1467	pChunk->cFree = cFree;
1468	pChunk->cPrivate = cPrivate;
1469	pChunk->cShared = cShared;
1470	}
1471	}
1472
1473	/*
1474	* If not in bound memory mode, we should reset the hGVM field
1475	* if it has our handle in it.
1476	*/
1477	if (pChunk->hGVM == pGVM->hSelf)
1478	{
1479	if (!g_pGMM->fBoundMemoryMode)
1480	pChunk->hGVM = NIL_GVM_HANDLE;
1481	else if (pChunk->cFree != GMM_CHUNK_NUM_PAGES)
1482	{
1483	SUPR0Printf("gmmR0CleanupVMScanChunk: %RKv/%#x: cFree=%#x - it should be 0 in bound mode!\n",
1484	pChunk, pChunk->Core.Key, pChunk->cFree);
1485	AssertMsgFailed(("%p/%#x: cFree=%#x - it should be 0 in bound mode!\n", pChunk, pChunk->Core.Key, pChunk->cFree));
1486
1487	gmmR0UnlinkChunk(pChunk);
1488	pChunk->cFree = GMM_CHUNK_NUM_PAGES;
1489	gmmR0SelectSetAndLinkChunk(pGMM, pGVM, pChunk);
1490	}
1491	}
1492
1493	/*
1494	* Look for a mapping belonging to the terminating VM.
1495	*/
1496	GMMR0CHUNKMTXSTATE MtxState;
1497	gmmR0ChunkMutexAcquire(&MtxState, pGMM, pChunk, GMMR0CHUNK_MTX_KEEP_GIANT);
1498	unsigned cMappings = pChunk->cMappingsX;
1499	for (unsigned i = 0; i < cMappings; i++)
1500	if (pChunk->paMappingsX[i].pGVM == pGVM)
1501	{
1502	gmmR0ChunkMutexDropGiant(&MtxState);
1503
1504	RTR0MEMOBJ hMemObj = pChunk->paMappingsX[i].hMapObj;
1505
1506	cMappings--;
1507	if (i < cMappings)
1508	pChunk->paMappingsX[i] = pChunk->paMappingsX[cMappings];
1509	pChunk->paMappingsX[cMappings].pGVM = NULL;
1510	pChunk->paMappingsX[cMappings].hMapObj = NIL_RTR0MEMOBJ;
1511	Assert(pChunk->cMappingsX - 1U == cMappings);
1512	pChunk->cMappingsX = cMappings;
1513
1514	int rc = RTR0MemObjFree(hMemObj, false /* fFreeMappings (NA) */);
1515	if (RT_FAILURE(rc))
1516	{
1517	SUPR0Printf("gmmR0CleanupVMScanChunk: %RKv/%#x: mapping #%x: RTR0MemObjFree(%RKv,false) -> %d \n",
1518	pChunk, pChunk->Core.Key, i, hMemObj, rc);
1519	AssertRC(rc);
1520	}
1521
1522	gmmR0ChunkMutexRelease(&MtxState, pChunk);
1523	return true;
1524	}
1525
1526	gmmR0ChunkMutexRelease(&MtxState, pChunk);
1527	return false;
1528	}
1529
1530
1531	/**
1532	* The initial resource reservations.
1533	*
1534	* This will make memory reservations according to policy and priority. If there aren't
1535	* sufficient resources available to sustain the VM this function will fail and all
1536	* future allocation requests will fail as well.
1537	*
1538	* These are just the initial reservations made very very early during the VM creation
1539	* process and will be adjusted later in the GMMR0UpdateReservation call after the
1540	* ring-3 init has completed.
1541	*
1542	* @returns VBox status code.
1543	* @retval VERR_GMM_MEMORY_RESERVATION_DECLINED
1544	* @retval VERR_GMM_
1545	*
1546	* @param pGVM The global (ring-0) VM structure.
1547	* @param idCpu The VCPU id - must be zero.
1548	* @param cBasePages The number of pages that may be allocated for the base RAM and ROMs.
1549	* This does not include MMIO2 and similar.
1550	* @param cShadowPages The number of pages that may be allocated for shadow paging structures.
1551	* @param cFixedPages The number of pages that may be allocated for fixed objects like the
1552	* hyper heap, MMIO2 and similar.
1553	* @param enmPolicy The OC policy to use on this VM.
1554	* @param enmPriority The priority in an out-of-memory situation.
1555	*
1556	* @thread The creator thread / EMT(0).
1557	*/
1558	GMMR0DECL(int) GMMR0InitialReservation(PGVM pGVM, VMCPUID idCpu, uint64_t cBasePages, uint32_t cShadowPages,
1559	uint32_t cFixedPages, GMMOCPOLICY enmPolicy, GMMPRIORITY enmPriority)
1560	{
1561	LogFlow(("GMMR0InitialReservation: pGVM=%p cBasePages=%#llx cShadowPages=%#x cFixedPages=%#x enmPolicy=%d enmPriority=%d\n",
1562	pGVM, cBasePages, cShadowPages, cFixedPages, enmPolicy, enmPriority));
1563
1564	/*
1565	* Validate, get basics and take the semaphore.
1566	*/
1567	AssertReturn(idCpu == 0, VERR_INVALID_CPU_ID);
1568	PGMM pGMM;
1569	GMM_GET_VALID_INSTANCE(pGMM, VERR_GMM_INSTANCE);
1570	int rc = GVMMR0ValidateGVMandEMT(pGVM, idCpu);
1571	if (RT_FAILURE(rc))
1572	return rc;
1573
1574	AssertReturn(cBasePages, VERR_INVALID_PARAMETER);
1575	AssertReturn(cShadowPages, VERR_INVALID_PARAMETER);
1576	AssertReturn(cFixedPages, VERR_INVALID_PARAMETER);
1577	AssertReturn(enmPolicy > GMMOCPOLICY_INVALID && enmPolicy < GMMOCPOLICY_END, VERR_INVALID_PARAMETER);
1578	AssertReturn(enmPriority > GMMPRIORITY_INVALID && enmPriority < GMMPRIORITY_END, VERR_INVALID_PARAMETER);
1579
1580	gmmR0MutexAcquire(pGMM);
1581	if (GMM_CHECK_SANITY_UPON_ENTERING(pGMM))
1582	{
1583	if ( !pGVM->gmm.s.Stats.Reserved.cBasePages
1584	&& !pGVM->gmm.s.Stats.Reserved.cFixedPages
1585	&& !pGVM->gmm.s.Stats.Reserved.cShadowPages)
1586	{
1587	/*
1588	* Check if we can accommodate this.
1589	*/
1590	/* ... later ... */
1591	if (RT_SUCCESS(rc))
1592	{
1593	/*
1594	* Update the records.
1595	*/
1596	pGVM->gmm.s.Stats.Reserved.cBasePages = cBasePages;
1597	pGVM->gmm.s.Stats.Reserved.cFixedPages = cFixedPages;
1598	pGVM->gmm.s.Stats.Reserved.cShadowPages = cShadowPages;
1599	pGVM->gmm.s.Stats.enmPolicy = enmPolicy;
1600	pGVM->gmm.s.Stats.enmPriority = enmPriority;
1601	pGVM->gmm.s.Stats.fMayAllocate = true;
1602
1603	pGMM->cReservedPages += cBasePages + cFixedPages + cShadowPages;
1604	pGMM->cRegisteredVMs++;
1605	}
1606	}
1607	else
1608	rc = VERR_WRONG_ORDER;
1609	GMM_CHECK_SANITY_UPON_LEAVING(pGMM);
1610	}
1611	else
1612	rc = VERR_GMM_IS_NOT_SANE;
1613	gmmR0MutexRelease(pGMM);
1614	LogFlow(("GMMR0InitialReservation: returns %Rrc\n", rc));
1615	return rc;
1616	}
1617
1618
1619	/**
1620	* VMMR0 request wrapper for GMMR0InitialReservation.
1621	*
1622	* @returns see GMMR0InitialReservation.
1623	* @param pGVM The global (ring-0) VM structure.
1624	* @param idCpu The VCPU id.
1625	* @param pReq Pointer to the request packet.
1626	*/
1627	GMMR0DECL(int) GMMR0InitialReservationReq(PGVM pGVM, VMCPUID idCpu, PGMMINITIALRESERVATIONREQ pReq)
1628	{
1629	/*
1630	* Validate input and pass it on.
1631	*/
1632	AssertPtrReturn(pGVM, VERR_INVALID_POINTER);
1633	AssertPtrReturn(pReq, VERR_INVALID_POINTER);
1634	AssertMsgReturn(pReq->Hdr.cbReq == sizeof(*pReq), ("%#x != %#x\n", pReq->Hdr.cbReq, sizeof(*pReq)), VERR_INVALID_PARAMETER);
1635
1636	return GMMR0InitialReservation(pGVM, idCpu, pReq->cBasePages, pReq->cShadowPages,
1637	pReq->cFixedPages, pReq->enmPolicy, pReq->enmPriority);
1638	}
1639
1640
1641	/**
1642	* This updates the memory reservation with the additional MMIO2 and ROM pages.
1643	*
1644	* @returns VBox status code.
1645	* @retval VERR_GMM_MEMORY_RESERVATION_DECLINED
1646	*
1647	* @param pGVM The global (ring-0) VM structure.
1648	* @param idCpu The VCPU id.
1649	* @param cBasePages The number of pages that may be allocated for the base RAM and ROMs.
1650	* This does not include MMIO2 and similar.
1651	* @param cShadowPages The number of pages that may be allocated for shadow paging structures.
1652	* @param cFixedPages The number of pages that may be allocated for fixed objects like the
1653	* hyper heap, MMIO2 and similar.
1654	*
1655	* @thread EMT(idCpu)
1656	*/
1657	GMMR0DECL(int) GMMR0UpdateReservation(PGVM pGVM, VMCPUID idCpu, uint64_t cBasePages,
1658	uint32_t cShadowPages, uint32_t cFixedPages)
1659	{
1660	LogFlow(("GMMR0UpdateReservation: pGVM=%p cBasePages=%#llx cShadowPages=%#x cFixedPages=%#x\n",
1661	pGVM, cBasePages, cShadowPages, cFixedPages));
1662
1663	/*
1664	* Validate, get basics and take the semaphore.
1665	*/
1666	PGMM pGMM;
1667	GMM_GET_VALID_INSTANCE(pGMM, VERR_GMM_INSTANCE);
1668	int rc = GVMMR0ValidateGVMandEMT(pGVM, idCpu);
1669	if (RT_FAILURE(rc))
1670	return rc;
1671
1672	AssertReturn(cBasePages, VERR_INVALID_PARAMETER);
1673	AssertReturn(cShadowPages, VERR_INVALID_PARAMETER);
1674	AssertReturn(cFixedPages, VERR_INVALID_PARAMETER);
1675
1676	gmmR0MutexAcquire(pGMM);
1677	if (GMM_CHECK_SANITY_UPON_ENTERING(pGMM))
1678	{
1679	if ( pGVM->gmm.s.Stats.Reserved.cBasePages
1680	&& pGVM->gmm.s.Stats.Reserved.cFixedPages
1681	&& pGVM->gmm.s.Stats.Reserved.cShadowPages)
1682	{
1683	/*
1684	* Check if we can accommodate this.
1685	*/
1686	/* ... later ... */
1687	if (RT_SUCCESS(rc))
1688	{
1689	/*
1690	* Update the records.
1691	*/
1692	pGMM->cReservedPages -= pGVM->gmm.s.Stats.Reserved.cBasePages
1693	+ pGVM->gmm.s.Stats.Reserved.cFixedPages
1694	+ pGVM->gmm.s.Stats.Reserved.cShadowPages;
1695	pGMM->cReservedPages += cBasePages + cFixedPages + cShadowPages;
1696
1697	pGVM->gmm.s.Stats.Reserved.cBasePages = cBasePages;
1698	pGVM->gmm.s.Stats.Reserved.cFixedPages = cFixedPages;
1699	pGVM->gmm.s.Stats.Reserved.cShadowPages = cShadowPages;
1700	}
1701	}
1702	else
1703	rc = VERR_WRONG_ORDER;
1704	GMM_CHECK_SANITY_UPON_LEAVING(pGMM);
1705	}
1706	else
1707	rc = VERR_GMM_IS_NOT_SANE;
1708	gmmR0MutexRelease(pGMM);
1709	LogFlow(("GMMR0UpdateReservation: returns %Rrc\n", rc));
1710	return rc;
1711	}
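
The two reservation calls share one accounting rule: the global total always equals the sum of the per-VM reservations, the initial call is only accepted while the VM has nothing reserved, and the update call only after that. The sketch below restates the rule without locking or sanity checks. Status, Reservation, Global and both function names are hypothetical, and the "check if we can accommodate this" step is left out, as it is in the code above.

```cpp
#include <cassert>
#include <cstdint>

enum class Status { Ok, InvalidParameter, WrongOrder };

struct Reservation
{
    uint64_t cBasePages;
    uint32_t cFixedPages;
    uint32_t cShadowPages;

    uint64_t total() const    { return cBasePages + cFixedPages + cShadowPages; }
    bool     allSet() const   { return cBasePages && cFixedPages && cShadowPages; }
    bool     allClear() const { return !cBasePages && !cFixedPages && !cShadowPages; }
};

struct Global
{
    uint64_t cReservedPages;
    uint32_t cRegisteredVMs;
};

// First reservation for a VM: only valid while nothing has been reserved yet.
inline Status initialReservation(Global &g, Reservation &vm, const Reservation &req)
{
    if (!req.allSet())
        return Status::InvalidParameter;
    if (!vm.allClear())
        return Status::WrongOrder;
    vm = req;
    g.cReservedPages += req.total();
    g.cRegisteredVMs++;
    return Status::Ok;
}

// Later adjustment: swap the old per-VM total for the new one in the global sum.
inline Status updateReservation(Global &g, Reservation &vm, const Reservation &req)
{
    if (!req.allSet())
        return Status::InvalidParameter;
    if (!vm.allSet())
        return Status::WrongOrder;
    assert(g.cReservedPages >= vm.total());
    g.cReservedPages -= vm.total();
    g.cReservedPages += req.total();
    vm = req;
    return Status::Ok;
}
```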
1712
1713
1714	/**
1715	* VMMR0 request wrapper for GMMR0UpdateReservation.
1716	*
1717	* @returns see GMMR0UpdateReservation.
1718	* @param pGVM The global (ring-0) VM structure.
1719	* @param idCpu The VCPU id.
1720	* @param pReq Pointer to the request packet.
1721	*/
1722	GMMR0DECL(int) GMMR0UpdateReservationReq(PGVM pGVM, VMCPUID idCpu, PGMMUPDATERESERVATIONREQ pReq)
1723	{
1724	/*
1725	* Validate input and pass it on.
1726	*/
1727	AssertPtrReturn(pReq, VERR_INVALID_POINTER);
1728	AssertMsgReturn(pReq->Hdr.cbReq == sizeof(*pReq), ("%#x != %#x\n", pReq->Hdr.cbReq, sizeof(*pReq)), VERR_INVALID_PARAMETER);
1729
1730	return GMMR0UpdateReservation(pGVM, idCpu, pReq->cBasePages, pReq->cShadowPages, pReq->cFixedPages);
1731	}
1732
1733	#ifdef GMMR0_WITH_SANITY_CHECK
1734
1735	/**
1736	* Performs sanity checks on a free set.
1737	*
1738	* @returns Error count.
1739	*
1740	* @param pGMM Pointer to the GMM instance.
1741	* @param pSet Pointer to the set.
1742	* @param pszSetName The set name.
1743	* @param pszFunction The function from which it was called.
1744	* @param uLineNo The line number.
1745	*/
1746	static uint32_t gmmR0SanityCheckSet(PGMM pGMM, PGMMCHUNKFREESET pSet, const char *pszSetName,
1747	const char *pszFunction, unsigned uLineNo)
1748	{
1749	uint32_t cErrors = 0;
1750
1751	/*
1752	* Count the free pages in all the chunks and match it against pSet->cFreePages.
1753	*/
1754	uint32_t cPages = 0;
1755	for (unsigned i = 0; i < RT_ELEMENTS(pSet->apLists); i++)
1756	{
1757	for (PGMMCHUNK pCur = pSet->apLists[i]; pCur; pCur = pCur->pFreeNext)
1758	{
1759	/** @todo check that the chunk is hashed into the right set. */
1760	cPages += pCur->cFree;
1761	}
1762	}
1763	if (RT_UNLIKELY(cPages != pSet->cFreePages))
1764	{
1765	SUPR0Printf("GMM insanity: found %#x pages in the %s set, expected %#x. (%s, line %u)\n",
1766	cPages, pszSetName, pSet->cFreePages, pszFunction, uLineNo);
1767	cErrors++;
1768	}
1769
1770	return cErrors;
1771	}
1772
1773
1774	/**
1775	* Performs some sanity checks on the GMM while owning the lock.
1776	*
1777	* @returns Error count.
1778	*
1779	* @param pGMM Pointer to the GMM instance.
1780	* @param pszFunction The function from which it is called.
1781	* @param uLineNo The line number.
1782	*/
1783	static uint32_t gmmR0SanityCheck(PGMM pGMM, const char *pszFunction, unsigned uLineNo)
1784	{
1785	uint32_t cErrors = 0;
1786
1787	cErrors += gmmR0SanityCheckSet(pGMM, &pGMM->PrivateX, "private", pszFunction, uLineNo);
1788	cErrors += gmmR0SanityCheckSet(pGMM, &pGMM->Shared, "shared", pszFunction, uLineNo);
1789	/** @todo add more sanity checks. */
1790
1791	return cErrors;
1792	}
1793
1794	#endif /* GMMR0_WITH_SANITY_CHECK */
1795
1796	/**
1797	* Looks up a chunk in the tree and fill in the TLB entry for it.
1798	*
1799	* This is not expected to fail and will bitch if it does.
1800	*
1801	* @returns Pointer to the allocation chunk, NULL if not found.
1802	* @param pGMM Pointer to the GMM instance.
1803	* @param idChunk The ID of the chunk to find.
1804	* @param pTlbe Pointer to the TLB entry.
1805	*/
1806	static PGMMCHUNK gmmR0GetChunkSlow(PGMM pGMM, uint32_t idChunk, PGMMCHUNKTLBE pTlbe)
1807	{
1808	PGMMCHUNK pChunk = (PGMMCHUNK)RTAvlU32Get(&pGMM->pChunks, idChunk);
1809	AssertMsgReturn(pChunk, ("Chunk %#x not found!\n", idChunk), NULL);
1810	pTlbe->idChunk = idChunk;
1811	pTlbe->pChunk = pChunk;
1812	return pChunk;
1813	}
1814
1815
1816	/**
1817	* Finds an allocation chunk.
1818	*
1819	* This is not expected to fail and will bitch if it does.
1820	*
1821	* @returns Pointer to the allocation chunk, NULL if not found.
1822	* @param pGMM Pointer to the GMM instance.
1823	* @param idChunk The ID of the chunk to find.
1824	*/
1825	DECLINLINE(PGMMCHUNK) gmmR0GetChunk(PGMM pGMM, uint32_t idChunk)
1826	{
1827	/*
1828	* Do a TLB lookup, branch if not in the TLB.
1829	*/
1830	PGMMCHUNKTLBE pTlbe = &pGMM->ChunkTLB.aEntries[GMM_CHUNKTLB_IDX(idChunk)];
1831	if ( pTlbe->idChunk != idChunk
1832	|| !pTlbe->pChunk)
1833	return gmmR0GetChunkSlow(pGMM, idChunk, pTlbe);
1834	return pTlbe->pChunk;
1835	}
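
The fast/slow lookup pair is a direct-mapped cache in front of a tree. The sketch below shows the pattern with std::map standing in for the AVL tree. It assumes the cache index is simply the low bits of the chunk ID, which may differ from what GMM_CHUNKTLB_IDX does. ChunkLookup, TlbEntry and the size of 128 entries are hypothetical.

```cpp
#include <array>
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <map>

constexpr std::size_t kTlbEntries = 128;    // assumed size
static_assert((kTlbEntries & (kTlbEntries - 1)) == 0, "index masking needs a power of two");

struct Chunk    { uint32_t idChunk; };
struct TlbEntry { uint32_t idChunk = 0; Chunk *pChunk = nullptr; };

struct ChunkLookup
{
    std::map<uint32_t, Chunk *>       tree;             // stand-in for the AVL tree
    std::array<TlbEntry, kTlbEntries> aEntries{};       // direct-mapped cache
    unsigned                          cSlowLookups = 0; // instrumentation for the sketch

    // Slow path: consult the tree and refill the cache entry on a hit.
    Chunk *lookupSlow(uint32_t idChunk, TlbEntry &entry)
    {
        cSlowLookups++;
        auto it = tree.find(idChunk);
        if (it == tree.end())
            return nullptr;             // the entry is left untouched on a miss
        assert(it->second != nullptr);
        entry.idChunk = idChunk;
        entry.pChunk  = it->second;
        return it->second;
    }

    // Fast path: one array access and a compare.
    Chunk *lookup(uint32_t idChunk)
    {
        TlbEntry &entry = aEntries[idChunk & (kTlbEntries - 1)];
        if (entry.idChunk != idChunk || !entry.pChunk)
            return lookupSlow(idChunk, entry);
        return entry.pChunk;
    }
};
```

Any code that removes a chunk from the tree must also invalidate the matching cache entry, otherwise the fast path would hand out a stale pointer.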
1836
1837
1838	/**
1839	* Finds a page.
1840	*
1841	* This is not expected to fail and will bitch if it does.
1842	*
1843	* @returns Pointer to the page, NULL if not found.
1844	* @param pGMM Pointer to the GMM instance.
1845	* @param idPage The ID of the page to find.
1846	*/
1847	DECLINLINE(PGMMPAGE) gmmR0GetPage(PGMM pGMM, uint32_t idPage)
1848	{
1849	PGMMCHUNK pChunk = gmmR0GetChunk(pGMM, idPage >> GMM_CHUNKID_SHIFT);
1850	if (RT_LIKELY(pChunk))
1851	return &pChunk->aPages[idPage & GMM_PAGEID_IDX_MASK];
1852	return NULL;
1853	}
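
The page lookup relies on a page ID being a chunk ID and a page index packed into one 32-bit value. A worked sketch of the packing and unpacking follows, assuming 2 MiB chunks of 4 KiB pages. These sizes and all names in the block are illustrative assumptions; the real values come from the GMM_CHUNK_SIZE and PAGE_SIZE definitions.

```cpp
#include <cassert>
#include <cstdint>

// Assumed geometry for illustration only: 2 MiB chunks of 4 KiB pages.
constexpr uint32_t kPageShift     = 12;
constexpr uint32_t kChunkSize     = UINT32_C(1) << 21;
constexpr uint32_t kPagesPerChunk = kChunkSize >> kPageShift;   // 512 pages
constexpr uint32_t kChunkIdShift  = 9;                          // log2(kPagesPerChunk)
constexpr uint32_t kPageIdxMask   = kPagesPerChunk - 1;

static_assert((UINT32_C(1) << kChunkIdShift) == kPagesPerChunk,
              "the shift must match the chunk geometry");

// Pack a chunk ID and a page index into one page ID.
inline uint32_t makePageId(uint32_t idChunk, uint32_t iPage)
{
    assert(iPage < kPagesPerChunk);
    return (idChunk << kChunkIdShift) | iPage;
}

// Unpack the two halves again.
constexpr uint32_t chunkIdOf(uint32_t idPage)   { return idPage >> kChunkIdShift; }
constexpr uint32_t pageIndexOf(uint32_t idPage) { return idPage & kPageIdxMask; }
```

With these assumed sizes, page 17 of chunk 3 gets the ID 3 * 512 + 17 = 1553, and both halves are recovered with one shift and one mask.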
1854
1855
1856	#if 0 /* unused */
1857	/**
1858	* Gets the host physical address for a page given by its ID.
1859	*
1860	* @returns The host physical address or NIL_RTHCPHYS.
1861	* @param pGMM Pointer to the GMM instance.
1862	* @param idPage The ID of the page to find.
1863	*/
1864	DECLINLINE(RTHCPHYS) gmmR0GetPageHCPhys(PGMM pGMM, uint32_t idPage)
1865	{
1866	PGMMCHUNK pChunk = gmmR0GetChunk(pGMM, idPage >> GMM_CHUNKID_SHIFT);
1867	if (RT_LIKELY(pChunk))
1868	return RTR0MemObjGetPagePhysAddr(pChunk->hMemObj, idPage & GMM_PAGEID_IDX_MASK);
1869	return NIL_RTHCPHYS;
1870	}
1871	#endif /* unused */
1872
1873
1874	/**
1875	* Selects the appropriate free list given the number of free pages.
1876	*
1877	* @returns Free list index.
1878	* @param cFree The number of free pages in the chunk.
1879	*/
1880	DECLINLINE(unsigned) gmmR0SelectFreeSetList(unsigned cFree)
1881	{
1882	unsigned iList = cFree >> GMM_CHUNK_FREE_SET_SHIFT;
1883	AssertMsg(iList < RT_SIZEOFMEMB(GMMCHUNKFREESET, apLists) / RT_SIZEOFMEMB(GMMCHUNKFREESET, apLists[0]),
1884	("%d (%u)\n", iList, cFree));
1885	return iList;
1886	}
1887
1888
1889	/**
1890	* Unlinks the chunk from the free list it's currently on (if any).
1891	*
1892	* @param pChunk The allocation chunk.
1893	*/
1894	DECLINLINE(void) gmmR0UnlinkChunk(PGMMCHUNK pChunk)
1895	{
1896	PGMMCHUNKFREESET pSet = pChunk->pSet;
1897	if (RT_LIKELY(pSet))
1898	{
1899	pSet->cFreePages -= pChunk->cFree;
1900	pSet->idGeneration++;
1901
1902	PGMMCHUNK pPrev = pChunk->pFreePrev;
1903	PGMMCHUNK pNext = pChunk->pFreeNext;
1904	if (pPrev)
1905	pPrev->pFreeNext = pNext;
1906	else
1907	pSet->apLists[gmmR0SelectFreeSetList(pChunk->cFree)] = pNext;
1908	if (pNext)
1909	pNext->pFreePrev = pPrev;
1910
1911	pChunk->pSet = NULL;
1912	pChunk->pFreeNext = NULL;
1913	pChunk->pFreePrev = NULL;
1914	}
1915	else
1916	{
1917	Assert(!pChunk->pFreeNext);
1918	Assert(!pChunk->pFreePrev);
1919	Assert(!pChunk->cFree);
1920	}
1921	}
1922
1923
1924	/**
1925	* Links the chunk onto the appropriate free list in the specified free set.
1926	*
1927	* If no free entries, it's not linked into any list.
1928	*
1929	* @param pChunk The allocation chunk.
1930	* @param pSet The free set.
1931	*/
1932	DECLINLINE(void) gmmR0LinkChunk(PGMMCHUNK pChunk, PGMMCHUNKFREESET pSet)
1933	{
1934	Assert(!pChunk->pSet);
1935	Assert(!pChunk->pFreeNext);
1936	Assert(!pChunk->pFreePrev);
1937
1938	if (pChunk->cFree > 0)
1939	{
1940	pChunk->pSet = pSet;
1941	pChunk->pFreePrev = NULL;
1942	unsigned const iList = gmmR0SelectFreeSetList(pChunk->cFree);
1943	pChunk->pFreeNext = pSet->apLists[iList];
1944	if (pChunk->pFreeNext)
1945	pChunk->pFreeNext->pFreePrev = pChunk;
1946	pSet->apLists[iList] = pChunk;
1947
1948	pSet->cFreePages += pChunk->cFree;
1949	pSet->idGeneration++;
1950	}
1951	}
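The unlink/link pair above keeps each chunk on a doubly linked list chosen by its free-page count, so a chunk has to be unlinked before `cFree` changes and relinked afterwards. The following is a simplified standalone sketch of that pattern. The types `Chunk` and `FreeSet`, the list count and the shift value are hypothetical stand-ins, not the real `GMMCHUNK` / `GMMCHUNKFREESET` layout, and the generation counter is left out.

```cpp
#include <cassert>
#include <cstdint>

// Hypothetical sizes, chosen for illustration only.
constexpr unsigned kPagesPerChunk = 512;
constexpr unsigned kFreeSetShift  = 4;
constexpr unsigned kNumLists      = (kPagesPerChunk >> kFreeSetShift) + 1;

struct FreeSet;

struct Chunk
{
    unsigned cFree = 0;
    FreeSet *pSet  = nullptr;
    Chunk   *pPrev = nullptr;
    Chunk   *pNext = nullptr;
};

struct FreeSet
{
    uint64_t cFreePages = 0;
    Chunk   *apLists[kNumLists] = {};
};

// Pick the list (bucket) for a given number of free pages.
inline unsigned selectList(unsigned cFree)
{
    return cFree >> kFreeSetShift;
}

// Push the chunk onto the head of the list matching its free count.
inline void linkChunk(Chunk &chunk, FreeSet &set)
{
    if (chunk.cFree == 0)
        return;                              // full chunks are on no list
    unsigned const iList = selectList(chunk.cFree);
    chunk.pSet  = &set;
    chunk.pPrev = nullptr;
    chunk.pNext = set.apLists[iList];
    if (chunk.pNext)
        chunk.pNext->pPrev = &chunk;
    set.apLists[iList] = &chunk;
    set.cFreePages += chunk.cFree;
}

// Remove the chunk from its current list; must run before cFree is modified.
inline void unlinkChunk(Chunk &chunk)
{
    FreeSet *pSet = chunk.pSet;
    if (!pSet)
        return;
    pSet->cFreePages -= chunk.cFree;
    if (chunk.pPrev)
        chunk.pPrev->pNext = chunk.pNext;
    else
        pSet->apLists[selectList(chunk.cFree)] = chunk.pNext;
    if (chunk.pNext)
        chunk.pNext->pPrev = chunk.pPrev;
    chunk.pSet  = nullptr;
    chunk.pPrev = nullptr;
    chunk.pNext = nullptr;
}
```

In this sketch a chunk that goes from 512 to 12 free pages moves from the last list to list 0, which is why the allocation path brackets its page loop with an unlink and a relink.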
1952
1953
1954	/**
1955	* Selects the appropriate free set and links the chunk onto the matching free list in it.
1956	*
1957	* If no free entries, it's not linked into any list.
1958	*
1959	* @param pGMM Pointer to the GMM instance.
1960	* @param pGVM Pointer to the kernel-only VM instance data.
1961	* @param pChunk The allocation chunk.
1962	*/
1963	DECLINLINE(void) gmmR0SelectSetAndLinkChunk(PGMM pGMM, PGVM pGVM, PGMMCHUNK pChunk)
1964	{
1965	PGMMCHUNKFREESET pSet;
1966	if (pGMM->fBoundMemoryMode)
1967	pSet = &pGVM->gmm.s.Private;
1968	else if (pChunk->cShared)
1969	pSet = &pGMM->Shared;
1970	else
1971	pSet = &pGMM->PrivateX;
1972	gmmR0LinkChunk(pChunk, pSet);
1973	}
1974
1975
1976	/**
1977	* Frees a Chunk ID.
1978	*
1979	* @param pGMM Pointer to the GMM instance.
1980	* @param idChunk The Chunk ID to free.
1981	*/
1982	static void gmmR0FreeChunkId(PGMM pGMM, uint32_t idChunk)
1983	{
1984	AssertReturnVoid(idChunk != NIL_GMM_CHUNKID);
1985	AssertMsg(ASMBitTest(&pGMM->bmChunkId[0], idChunk), ("%#x\n", idChunk));
1986	ASMAtomicBitClear(&pGMM->bmChunkId[0], idChunk);
1987	}
1988
1989
1990	/**
1991	* Allocates a new Chunk ID.
1992	*
1993	* @returns The Chunk ID.
1994	* @param pGMM Pointer to the GMM instance.
1995	*/
1996	static uint32_t gmmR0AllocateChunkId(PGMM pGMM)
1997	{
1998	AssertCompile(!((GMM_CHUNKID_LAST + 1) & 31)); /* must be a multiple of 32 */
1999	AssertCompile(NIL_GMM_CHUNKID == 0);
2000
2001	/*
2002	* Try the next sequential one.
2003	*/
2004	int32_t idChunk = ++pGMM->idChunkPrev;
2005	#if 0 /** @todo enable this code */
2006	if ( idChunk <= GMM_CHUNKID_LAST
2007	&& idChunk > NIL_GMM_CHUNKID
2008	&& !ASMAtomicBitTestAndSet(&pGMM->bmChunkId[0], idChunk))
2009	return idChunk;
2010	#endif
2011
2012	/*
2013	* Scan sequentially from the last one.
2014	*/
2015	if ( (uint32_t)idChunk < GMM_CHUNKID_LAST
2016	&& idChunk > NIL_GMM_CHUNKID)
2017	{
2018	idChunk = ASMBitNextClear(&pGMM->bmChunkId[0], GMM_CHUNKID_LAST + 1, idChunk - 1);
2019	if (idChunk > NIL_GMM_CHUNKID)
2020	{
2021	AssertMsgReturn(!ASMAtomicBitTestAndSet(&pGMM->bmChunkId[0], idChunk), ("%#x\n", idChunk), NIL_GMM_CHUNKID);
2022	return pGMM->idChunkPrev = idChunk;
2023	}
2024	}
2025
2026	/*
2027	* Ok, scan from the start.
2028	* We're not racing anyone, so there is no need to expect failures or have restart loops.
2029	*/
2030	idChunk = ASMBitFirstClear(&pGMM->bmChunkId[0], GMM_CHUNKID_LAST + 1);
2031	AssertMsgReturn(idChunk > NIL_GMM_CHUNKID, ("%#x\n", idChunk), NIL_GMM_CHUNKID);
2032	AssertMsgReturn(!ASMAtomicBitTestAndSet(&pGMM->bmChunkId[0], idChunk), ("%#x\n", idChunk), NIL_GMM_CHUNKID);
2033
2034	return pGMM->idChunkPrev = idChunk;
2035	}
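The ID allocator above scans a bitmap forward from the previously issued ID and falls back to a scan from the start. A small standalone sketch of that strategy is shown below. It uses `std::bitset` instead of the IPRT bit operations, a deliberately tiny ID range, and assumes bit 0 is pre-set so the NIL value is never handed out; `IdAllocator` and its members are hypothetical names.

```cpp
#include <bitset>
#include <cassert>
#include <cstdint>

constexpr uint32_t kNilId  = 0;   // mirrors the NIL_GMM_CHUNKID == 0 compile-time check
constexpr uint32_t kLastId = 63;  // hypothetical; tiny so the wrap-around is easy to follow

struct IdAllocator
{
    std::bitset<kLastId + 1> used;
    uint32_t idPrev = kNilId;

    // Assumption: the NIL slot is marked as taken up front.
    IdAllocator() { used.set(kNilId); }

    uint32_t allocate()
    {
        // First pass: from the ID after the previous one up to the last ID.
        for (uint32_t id = idPrev + 1; id <= kLastId; id++)
            if (!used.test(id))
            {
                used.set(id);
                return idPrev = id;
            }
        // Second pass: wrap around and scan from the start.
        for (uint32_t id = kNilId + 1; id <= idPrev && id <= kLastId; id++)
            if (!used.test(id))
            {
                used.set(id);
                return idPrev = id;
            }
        return kNilId;  // every ID is in use
    }

    void release(uint32_t id)
    {
        assert(id != kNilId && used.test(id));
        used.reset(id);
    }
};
```

Scanning forward first means a freed ID is not reissued immediately; it is only picked up again once the scan wraps around.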
2036
2037
2038	/**
2039	* Allocates one private page.
2040	*
2041	* Worker for gmmR0AllocatePages.
2042	*
2043	* @param pChunk The chunk to allocate it from.
2044	* @param hGVM The GVM handle of the VM requesting memory.
2045	* @param pPageDesc The page descriptor.
2046	*/
2047	static void gmmR0AllocatePage(PGMMCHUNK pChunk, uint32_t hGVM, PGMMPAGEDESC pPageDesc)
2048	{
2049	/* update the chunk stats. */
2050	if (pChunk->hGVM == NIL_GVM_HANDLE)
2051	pChunk->hGVM = hGVM;
2052	Assert(pChunk->cFree);
2053	pChunk->cFree--;
2054	pChunk->cPrivate++;
2055
2056	/* unlink the first free page. */
2057	const uint32_t iPage = pChunk->iFreeHead;
2058	AssertReleaseMsg(iPage < RT_ELEMENTS(pChunk->aPages), ("%d\n", iPage));
2059	PGMMPAGE pPage = &pChunk->aPages[iPage];
2060	Assert(GMM_PAGE_IS_FREE(pPage));
2061	pChunk->iFreeHead = pPage->Free.iNext;
2062	Log3(("A pPage=%p iPage=%#x/%#x u2State=%d iFreeHead=%#x iNext=%#x\n",
2063	pPage, iPage, (pChunk->Core.Key << GMM_CHUNKID_SHIFT) | iPage,
2064	pPage->Common.u2State, pChunk->iFreeHead, pPage->Free.iNext));
2065
2066	/* make the page private. */
2067	pPage->u = 0;
2068	AssertCompile(GMM_PAGE_STATE_PRIVATE == 0);
2069	pPage->Private.hGVM = hGVM;
2070	AssertCompile(NIL_RTHCPHYS >= GMM_GCPHYS_LAST);
2071	AssertCompile(GMM_GCPHYS_UNSHAREABLE >= GMM_GCPHYS_LAST);
2072	if (pPageDesc->HCPhysGCPhys <= GMM_GCPHYS_LAST)
2073	pPage->Private.pfn = pPageDesc->HCPhysGCPhys >> PAGE_SHIFT;
2074	else
2075	pPage->Private.pfn = GMM_PAGE_PFN_UNSHAREABLE; /* unshareable / unassigned - same thing. */
2076
2077	/* update the page descriptor. */
2078	pPageDesc->HCPhysGCPhys = RTR0MemObjGetPagePhysAddr(pChunk->hMemObj, iPage);
2079	Assert(pPageDesc->HCPhysGCPhys != NIL_RTHCPHYS);
2080	pPageDesc->idPage = (pChunk->Core.Key << GMM_CHUNKID_SHIFT) | iPage;
2081	pPageDesc->idSharedPage = NIL_GMM_PAGEID;
2082	}
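The page allocation above pops the head of a per-chunk free list that is threaded through the page array by index rather than by pointer. The sketch below isolates that idea. `PageFreeList`, its eight-page size and the `release` method are assumptions for illustration (the real free path lives elsewhere in the file and is not shown in this section); only the `UINT16_MAX` end marker is taken from the source.

```cpp
#include <array>
#include <cassert>
#include <cstdint>

constexpr uint16_t kNoPage = UINT16_MAX;  // end-of-list marker, as used in the source
constexpr unsigned kPages  = 8;           // hypothetical; tiny for illustration

struct PageFreeList
{
    std::array<uint16_t, kPages> aiNext{};
    uint16_t iFreeHead = 0;
    unsigned cFree     = kPages;

    // Chain every page to its successor; the last page terminates the list.
    PageFreeList()
    {
        for (unsigned i = 0; i + 1 < kPages; i++)
            aiNext[i] = static_cast<uint16_t>(i + 1);
        aiNext[kPages - 1] = kNoPage;
    }

    // Pop the first free page; returns kNoPage when nothing is free.
    uint16_t allocate()
    {
        if (iFreeHead == kNoPage)
            return kNoPage;
        uint16_t const iPage = iFreeHead;
        iFreeHead = aiNext[iPage];
        cFree--;
        return iPage;
    }

    // Push a page back onto the head of the list (assumed counterpart).
    void release(uint16_t iPage)
    {
        aiNext[iPage] = iFreeHead;
        iFreeHead = iPage;
        cFree++;
    }
};
```

Storing 16-bit indices instead of pointers keeps the per-page tracking data small, which matches the stated goal of minimal tracking structures.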
2083
2084
2085	/**
2086	* Picks the free pages from a chunk.
2087	*
2088	* @returns The new page descriptor table index.
2089	* @param pChunk The chunk.
2090	* @param hGVM The affinity of the chunk. NIL_GVM_HANDLE for no
2091	* affinity.
2092	* @param iPage The current page descriptor table index.
2093	* @param cPages The total number of pages to allocate.
2094	* @param paPages The page descriptor table (input + output).
2095	*/
2096	static uint32_t gmmR0AllocatePagesFromChunk(PGMMCHUNK pChunk, uint16_t const hGVM, uint32_t iPage, uint32_t cPages,
2097	PGMMPAGEDESC paPages)
2098	{
2099	PGMMCHUNKFREESET pSet = pChunk->pSet; Assert(pSet);
2100	gmmR0UnlinkChunk(pChunk);
2101
2102	for (; pChunk->cFree && iPage < cPages; iPage++)
2103	gmmR0AllocatePage(pChunk, hGVM, &paPages[iPage]);
2104
2105	gmmR0LinkChunk(pChunk, pSet);
2106	return iPage;
2107	}
2108
2109
2110	/**
2111	* Registers a new chunk of memory.
2112	*
2113	* This is called by both gmmR0AllocateOneChunk and GMMR0SeedChunk.
2114	*
2115	* @returns VBox status code. On success, the giant GMM lock will be held, the
2116	* caller must release it (ugly).
2117	* @param pGMM Pointer to the GMM instance.
2118	* @param pSet Pointer to the set.
2119	* @param hMemObj The memory object for the chunk.
2120	* @param hGVM The affinity of the chunk. NIL_GVM_HANDLE for no
2121	* affinity.
2122	* @param fChunkFlags The chunk flags, GMM_CHUNK_FLAGS_XXX.
2123	* @param ppChunk Chunk address (out). Optional.
2124	*
2125	* @remarks The caller must not own the giant GMM mutex.
2126	* The giant GMM mutex will be acquired and returned acquired in
2127	* the success path. On failure, no locks will be held.
2128	*/
2129	static int gmmR0RegisterChunk(PGMM pGMM, PGMMCHUNKFREESET pSet, RTR0MEMOBJ hMemObj, uint16_t hGVM, uint16_t fChunkFlags,
2130	PGMMCHUNK *ppChunk)
2131	{
2132	Assert(pGMM->hMtxOwner != RTThreadNativeSelf());
2133	Assert(hGVM != NIL_GVM_HANDLE || pGMM->fBoundMemoryMode);
2134	Assert(fChunkFlags == 0 || fChunkFlags == GMM_CHUNK_FLAGS_LARGE_PAGE);
2135
2136	#if defined(VBOX_WITH_RAM_IN_KERNEL) && !defined(VBOX_WITH_LINEAR_HOST_PHYS_MEM)
2137	/*
2138	* Get a ring-0 mapping of the object.
2139	*/
2140	uint8_t *pbMapping = (uint8_t *)RTR0MemObjAddress(hMemObj);
2141	if (!pbMapping)
2142	{
2143	RTR0MEMOBJ hMapObj;
2144	int rc = RTR0MemObjMapKernel(&hMapObj, hMemObj, (void *)-1, 0, RTMEM_PROT_READ | RTMEM_PROT_WRITE);
2145	if (RT_SUCCESS(rc))
2146	pbMapping = (uint8_t *)RTR0MemObjAddress(hMapObj);
2147	else
2148	return rc;
2149	AssertPtr(pbMapping);
2150	}
2151	#endif
2152
2153	/*
2154	* Allocate a chunk.
2155	*/
2156	int rc;
2157	PGMMCHUNK pChunk = (PGMMCHUNK)RTMemAllocZ(sizeof(*pChunk));
2158	if (pChunk)
2159	{
2160	/*
2161	* Initialize it.
2162	*/
2163	pChunk->hMemObj = hMemObj;
2164	#if defined(VBOX_WITH_RAM_IN_KERNEL) && !defined(VBOX_WITH_LINEAR_HOST_PHYS_MEM)
2165	pChunk->pbMapping = pbMapping;
2166	#endif
2167	pChunk->cFree = GMM_CHUNK_NUM_PAGES;
2168	pChunk->hGVM = hGVM;
2169	/*pChunk->iFreeHead = 0;*/
2170	pChunk->idNumaNode = gmmR0GetCurrentNumaNodeId();
2171	pChunk->iChunkMtx = UINT8_MAX;
2172	pChunk->fFlags = fChunkFlags;
2173	for (unsigned iPage = 0; iPage < RT_ELEMENTS(pChunk->aPages) - 1; iPage++)
2174	{
2175	pChunk->aPages[iPage].Free.u2State = GMM_PAGE_STATE_FREE;
2176	pChunk->aPages[iPage].Free.iNext = iPage + 1;
2177	}
2178	pChunk->aPages[RT_ELEMENTS(pChunk->aPages) - 1].Free.u2State = GMM_PAGE_STATE_FREE;
2179	pChunk->aPages[RT_ELEMENTS(pChunk->aPages) - 1].Free.iNext = UINT16_MAX;
2180
2181	/*
2182	* Allocate a Chunk ID and insert it into the tree.
2183	* This has to be done behind the mutex of course.
2184	*/
2185	rc = gmmR0MutexAcquire(pGMM);
2186	if (RT_SUCCESS(rc))
2187	{
2188	if (GMM_CHECK_SANITY_UPON_ENTERING(pGMM))
2189	{
2190	pChunk->Core.Key = gmmR0AllocateChunkId(pGMM);
2191	if ( pChunk->Core.Key != NIL_GMM_CHUNKID
2192	&& pChunk->Core.Key <= GMM_CHUNKID_LAST
2193	&& RTAvlU32Insert(&pGMM->pChunks, &pChunk->Core))
2194	{
2195	pGMM->cChunks++;
2196	RTListAppend(&pGMM->ChunkList, &pChunk->ListNode);
2197	gmmR0LinkChunk(pChunk, pSet);
2198	LogFlow(("gmmR0RegisterChunk: pChunk=%p id=%#x cChunks=%d\n", pChunk, pChunk->Core.Key, pGMM->cChunks));
2199
2200	if (ppChunk)
2201	*ppChunk = pChunk;
2202	GMM_CHECK_SANITY_UPON_LEAVING(pGMM);
2203	return VINF_SUCCESS;
2204	}
2205
2206	/* bail out */
2207	rc = VERR_GMM_CHUNK_INSERT;
2208	}
2209	else
2210	rc = VERR_GMM_IS_NOT_SANE;
2211	gmmR0MutexRelease(pGMM);
2212	}
2213
2214	RTMemFree(pChunk);
2215	}
2216	else
2217	rc = VERR_NO_MEMORY;
2218	return rc;
2219	}
2220
2221
2222	/**
2223	* Allocates a new chunk, immediately picks the requested pages from it, and adds
2224	* what remains to the specified free set.
2225	*
2226	* @note This will leave the giant mutex while allocating the new chunk!
2227	*
2228	* @returns VBox status code.
2229	* @param pGMM Pointer to the GMM instance data.
2230	* @param pGVM Pointer to the kernel-only VM instance data.
2231	* @param pSet Pointer to the free set.
2232	* @param cPages The number of pages requested.
2233	* @param paPages The page descriptor table (input + output).
2234	* @param piPage The pointer to the page descriptor table index variable.
2235	* This will be updated.
2236	*/
2237	static int gmmR0AllocateChunkNew(PGMM pGMM, PGVM pGVM, PGMMCHUNKFREESET pSet, uint32_t cPages,
2238	PGMMPAGEDESC paPages, uint32_t *piPage)
2239	{
2240	gmmR0MutexRelease(pGMM);
2241
2242	RTR0MEMOBJ hMemObj;
2243	int rc = RTR0MemObjAllocPhysNC(&hMemObj, GMM_CHUNK_SIZE, NIL_RTHCPHYS);
2244	if (RT_SUCCESS(rc))
2245	{
2246	/** @todo Duplicate gmmR0RegisterChunk here so we can avoid chaining up the
2247	* free pages first and then unchaining them right afterwards. Instead
2248	* do as much work as possible without holding the giant lock. */
2249	PGMMCHUNK pChunk;
2250	rc = gmmR0RegisterChunk(pGMM, pSet, hMemObj, pGVM->hSelf, 0 /*fChunkFlags*/, &pChunk);
2251	if (RT_SUCCESS(rc))
2252	{
2253	*piPage = gmmR0AllocatePagesFromChunk(pChunk, pGVM->hSelf, *piPage, cPages, paPages);
2254	return VINF_SUCCESS;
2255	}
2256
2257	/* bail out */
2258	RTR0MemObjFree(hMemObj, true /* fFreeMappings */);
2259	}
2260
2261	int rc2 = gmmR0MutexAcquire(pGMM);
2262	AssertRCReturn(rc2, RT_FAILURE(rc) ? rc : rc2);
2263	return rc;
2264
2265	}
2266
2267
2268	/**
2269	* As a last resort we'll pick any page we can get.
2270	*
2271	* @returns The new page descriptor table index.
2272	* @param pSet The set to pick from.
2273	* @param pGVM Pointer to the global VM structure.
2274	* @param iPage The current page descriptor table index.
2275	* @param cPages The total number of pages to allocate.
2276	* @param paPages The page descriptor table (input + output).
2277	*/
2278	static uint32_t gmmR0AllocatePagesIndiscriminately(PGMMCHUNKFREESET pSet, PGVM pGVM,
2279	uint32_t iPage, uint32_t cPages, PGMMPAGEDESC paPages)
2280	{
2281	unsigned iList = RT_ELEMENTS(pSet->apLists);
2282	while (iList-- > 0)
2283	{
2284	PGMMCHUNK pChunk = pSet->apLists[iList];
2285	while (pChunk)
2286	{
2287	PGMMCHUNK pNext = pChunk->pFreeNext;
2288
2289	iPage = gmmR0AllocatePagesFromChunk(pChunk, pGVM->hSelf, iPage, cPages, paPages);
2290	if (iPage >= cPages)
2291	return iPage;
2292
2293	pChunk = pNext;
2294	}
2295	}
2296	return iPage;
2297	}
2298
2299
2300	/**
2301	* Pick pages from empty chunks on the same NUMA node.
2302	*
2303	* @returns The new page descriptor table index.
2304	* @param pSet The set to pick from.
2305	* @param pGVM Pointer to the global VM structure.
2306	* @param iPage The current page descriptor table index.
2307	* @param cPages The total number of pages to allocate.
2308	* @param paPages The page descriptor table (input + output).
2309	*/
2310	static uint32_t gmmR0AllocatePagesFromEmptyChunksOnSameNode(PGMMCHUNKFREESET pSet, PGVM pGVM,
2311	uint32_t iPage, uint32_t cPages, PGMMPAGEDESC paPages)
2312	{
2313	PGMMCHUNK pChunk = pSet->apLists[GMM_CHUNK_FREE_SET_UNUSED_LIST];
2314	if (pChunk)
2315	{
2316	uint16_t const idNumaNode = gmmR0GetCurrentNumaNodeId();
2317	while (pChunk)
2318	{
2319	PGMMCHUNK pNext = pChunk->pFreeNext;
2320
2321	if (pChunk->idNumaNode == idNumaNode)
2322	{
2323	pChunk->hGVM = pGVM->hSelf;
2324	iPage = gmmR0AllocatePagesFromChunk(pChunk, pGVM->hSelf, iPage, cPages, paPages);
2325	if (iPage >= cPages)
2326	{
2327	pGVM->gmm.s.idLastChunkHint = pChunk->cFree ? pChunk->Core.Key : NIL_GMM_CHUNKID;
2328	return iPage;
2329	}
2330	}
2331
2332	pChunk = pNext;
2333	}
2334	}
2335	return iPage;
2336	}
2337
2338
2339	/**
2340	* Pick pages from non-empty chunks on the same NUMA node.
2341	*
2342	* @returns The new page descriptor table index.
2343	* @param pSet The set to pick from.
2344	* @param pGVM Pointer to the global VM structure.
2345	* @param iPage The current page descriptor table index.
2346	* @param cPages The total number of pages to allocate.
2347	* @param paPages The page descriptor table (input + output).
2348	*/
2349	static uint32_t gmmR0AllocatePagesFromSameNode(PGMMCHUNKFREESET pSet, PGVM pGVM,
2350	uint32_t iPage, uint32_t cPages, PGMMPAGEDESC paPages)
2351	{
2352	/** @todo start by picking from chunks with about the right size first? */
2353	uint16_t const idNumaNode = gmmR0GetCurrentNumaNodeId();
2354	unsigned iList = GMM_CHUNK_FREE_SET_UNUSED_LIST;
2355	while (iList-- > 0)
2356	{
2357	PGMMCHUNK pChunk = pSet->apLists[iList];
2358	while (pChunk)
2359	{
2360	PGMMCHUNK pNext = pChunk->pFreeNext;
2361
2362	if (pChunk->idNumaNode == idNumaNode)
2363	{
2364	iPage = gmmR0AllocatePagesFromChunk(pChunk, pGVM->hSelf, iPage, cPages, paPages);
2365	if (iPage >= cPages)
2366	{
2367	pGVM->gmm.s.idLastChunkHint = pChunk->cFree ? pChunk->Core.Key : NIL_GMM_CHUNKID;
2368	return iPage;
2369	}
2370	}
2371
2372	pChunk = pNext;
2373	}
2374	}
2375	return iPage;
2376	}
2377
2378
2379	/**
2380	* Pick pages that are in chunks already associated with the VM.
2381	*
2382	* @returns The new page descriptor table index.
2383	* @param pGMM Pointer to the GMM instance data.
2384	* @param pGVM Pointer to the global VM structure.
2385	* @param pSet The set to pick from.
2386	* @param iPage The current page descriptor table index.
2387	* @param cPages The total number of pages to allocate.
2388	* @param paPages The page descriptor table (input + output).
2389	*/
2390	static uint32_t gmmR0AllocatePagesAssociatedWithVM(PGMM pGMM, PGVM pGVM, PGMMCHUNKFREESET pSet,
2391	uint32_t iPage, uint32_t cPages, PGMMPAGEDESC paPages)
2392	{
2393	uint16_t const hGVM = pGVM->hSelf;
2394
2395	/* Hint. */
2396	if (pGVM->gmm.s.idLastChunkHint != NIL_GMM_CHUNKID)
2397	{
2398	PGMMCHUNK pChunk = gmmR0GetChunk(pGMM, pGVM->gmm.s.idLastChunkHint);
2399	if (pChunk && pChunk->cFree)
2400	{
2401	iPage = gmmR0AllocatePagesFromChunk(pChunk, hGVM, iPage, cPages, paPages);
2402	if (iPage >= cPages)
2403	return iPage;
2404	}
2405	}
2406
2407	/* Scan. */
2408	for (unsigned iList = 0; iList < RT_ELEMENTS(pSet->apLists); iList++)
2409	{
2410	PGMMCHUNK pChunk = pSet->apLists[iList];
2411	while (pChunk)
2412	{
2413	PGMMCHUNK pNext = pChunk->pFreeNext;
2414
2415	if (pChunk->hGVM == hGVM)
2416	{
2417	iPage = gmmR0AllocatePagesFromChunk(pChunk, hGVM, iPage, cPages, paPages);
2418	if (iPage >= cPages)
2419	{
2420	pGVM->gmm.s.idLastChunkHint = pChunk->cFree ? pChunk->Core.Key : NIL_GMM_CHUNKID;
2421	return iPage;
2422	}
2423	}
2424
2425	pChunk = pNext;
2426	}
2427	}
2428	return iPage;
2429	}
2430
2431
2432
2433	/**
2434	* Pick pages in bound memory mode.
2435	*
2436	* @returns The new page descriptor table index.
2437	* @param pGVM Pointer to the global VM structure.
2438	* @param iPage The current page descriptor table index.
2439	* @param cPages The total number of pages to allocate.
2440	* @param paPages The page descriptor table (input + output).
2441	*/
2442	static uint32_t gmmR0AllocatePagesInBoundMode(PGVM pGVM, uint32_t iPage, uint32_t cPages, PGMMPAGEDESC paPages)
2443	{
2444	for (unsigned iList = 0; iList < RT_ELEMENTS(pGVM->gmm.s.Private.apLists); iList++)
2445	{
2446	PGMMCHUNK pChunk = pGVM->gmm.s.Private.apLists[iList];
2447	while (pChunk)
2448	{
2449	Assert(pChunk->hGVM == pGVM->hSelf);
2450	PGMMCHUNK pNext = pChunk->pFreeNext;
2451	iPage = gmmR0AllocatePagesFromChunk(pChunk, pGVM->hSelf, iPage, cPages, paPages);
2452	if (iPage >= cPages)
2453	return iPage;
2454	pChunk = pNext;
2455	}
2456	}
2457	return iPage;
2458	}
2459
2460
2461	/**
2462	* Checks if we should start picking pages from chunks of other VMs because
2463	* we're getting close to the system memory or reserved limit.
2464	*
2465	* @returns @c true if we should, @c false if we should first try to allocate more
2466	* chunks.
2467	*/
2468	static bool gmmR0ShouldAllocatePagesInOtherChunksBecauseOfLimits(PGVM pGVM)
2469	{
2470	/*
2471	* Don't allocate a new chunk if we're close to the reserved limit.
2472	*/
2473	uint64_t cPgReserved = pGVM->gmm.s.Stats.Reserved.cBasePages
2474	+ pGVM->gmm.s.Stats.Reserved.cFixedPages
2475	- pGVM->gmm.s.Stats.cBalloonedPages
2476	/** @todo what about shared pages? */;
2477	uint64_t cPgAllocated = pGVM->gmm.s.Stats.Allocated.cBasePages
2478	+ pGVM->gmm.s.Stats.Allocated.cFixedPages;
2479	uint64_t cPgDelta = cPgReserved - cPgAllocated;
2480	if (cPgDelta < GMM_CHUNK_NUM_PAGES * 4)
2481	return true;
2482	/** @todo make the threshold configurable, also test the code to see if
2483	* this ever kicks in (we might be reserving too much or smth). */
2484
2485	/*
2486	* Check how close we're to the max memory limit and how many fragments
2487	* there are?...
2488	*/
2489	/** @todo */
2490
2491	return false;
2492	}
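The limit check above boils down to one unsigned subtraction compared against a headroom of four chunks. The sketch below reproduces just that arithmetic so the threshold is easy to experiment with. `isNearReservation` and the 512-pages-per-chunk figure are assumptions for illustration, not the real statistics fields or `GMM_CHUNK_NUM_PAGES`.

```cpp
#include <cassert>
#include <cstdint>

constexpr uint64_t kPagesPerChunk = 512;                // assumed: 2 MB chunks of 4 KB pages
constexpr uint64_t kHeadroomPages = kPagesPerChunk * 4; // same factor as the check above

// True when fewer than kHeadroomPages remain between what is reserved
// (minus ballooned pages) and what is already allocated.
constexpr bool isNearReservation(uint64_t cReserved, uint64_t cBallooned, uint64_t cAllocated)
{
    // Unsigned arithmetic: if cAllocated exceeds the reservation the
    // difference wraps to a very large value and this returns false.
    return cReserved - cBallooned - cAllocated < kHeadroomPages;
}
```

One property worth keeping in mind with this formulation is the wrap-around case noted in the comment: an over-allocated account does not count as "near the limit" in this sketch.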
2493
2494
2495	/**
2496	* Checks if we should start picking pages from chunks of other VMs because
2497	* there are a lot of free pages around.
2498	*
2499	* @returns @c true if we should, @c false if we should first try to allocate more
2500	* chunks.
2501	*/
2502	static bool gmmR0ShouldAllocatePagesInOtherChunksBecauseOfLotsFree(PGMM pGMM)
2503	{
2504	/*
2505	* Setting the limit at 16 chunks (32 MB) at the moment.
2506	*/
2507	if (pGMM->PrivateX.cFreePages >= GMM_CHUNK_NUM_PAGES * 16)
2508	return true;
2509	return false;
2510	}
2511
2512
2513	/**
2514	* Common worker for GMMR0AllocateHandyPages and GMMR0AllocatePages.
2515	*
2516	* @returns VBox status code:
2517	* @retval VINF_SUCCESS on success.
2518	* @retval VERR_GMM_SEED_ME if seeding via GMMR0SeedChunk or
2519	* gmmR0AllocateMoreChunks is necessary.
2520	* @retval VERR_GMM_HIT_GLOBAL_LIMIT if we've exhausted the available pages.
2521	* @retval VERR_GMM_HIT_VM_ACCOUNT_LIMIT if we've hit the VM account limit,
2522	* that is we're trying to allocate more than we've reserved.
2523	*
2524	* @param pGMM Pointer to the GMM instance data.
2525	* @param pGVM Pointer to the VM.
2526	* @param cPages The number of pages to allocate.
2527	* @param paPages Pointer to the page descriptors. See GMMPAGEDESC for
2528	* details on what is expected on input.
2529	* @param enmAccount The account to charge.
2530	*
2531	* @remarks The caller must own the giant GMM lock. It may be released and re-acquired while new chunks are allocated.
2532	*/
2533	static int gmmR0AllocatePagesNew(PGMM pGMM, PGVM pGVM, uint32_t cPages, PGMMPAGEDESC paPages, GMMACCOUNT enmAccount)
2534	{
2535	Assert(pGMM->hMtxOwner == RTThreadNativeSelf());
2536
2537	/*
2538	* Check allocation limits.
2539	*/
2540	if (RT_UNLIKELY(pGMM->cAllocatedPages + cPages > pGMM->cMaxPages))
2541	return VERR_GMM_HIT_GLOBAL_LIMIT;
2542
2543	switch (enmAccount)
2544	{
2545	case GMMACCOUNT_BASE:
2546	if (RT_UNLIKELY( pGVM->gmm.s.Stats.Allocated.cBasePages + pGVM->gmm.s.Stats.cBalloonedPages + cPages
2547	> pGVM->gmm.s.Stats.Reserved.cBasePages))
2548	{
2549	Log(("gmmR0AllocatePages:Base: Reserved=%#llx Allocated+Ballooned+Requested=%#llx+%#llx+%#x!\n",
2550	pGVM->gmm.s.Stats.Reserved.cBasePages, pGVM->gmm.s.Stats.Allocated.cBasePages,
2551	pGVM->gmm.s.Stats.cBalloonedPages, cPages));
2552	return VERR_GMM_HIT_VM_ACCOUNT_LIMIT;
2553	}
2554	break;
2555	case GMMACCOUNT_SHADOW:
2556	if (RT_UNLIKELY(pGVM->gmm.s.Stats.Allocated.cShadowPages + cPages > pGVM->gmm.s.Stats.Reserved.cShadowPages))
2557	{
2558	Log(("gmmR0AllocatePages:Shadow: Reserved=%#x Allocated+Requested=%#x+%#x!\n",
2559	pGVM->gmm.s.Stats.Reserved.cShadowPages, pGVM->gmm.s.Stats.Allocated.cShadowPages, cPages));
2560	return VERR_GMM_HIT_VM_ACCOUNT_LIMIT;
2561	}
2562	break;
2563	case GMMACCOUNT_FIXED:
2564	if (RT_UNLIKELY(pGVM->gmm.s.Stats.Allocated.cFixedPages + cPages > pGVM->gmm.s.Stats.Reserved.cFixedPages))
2565	{
2566	Log(("gmmR0AllocatePages:Fixed: Reserved=%#x Allocated+Requested=%#x+%#x!\n",
2567	pGVM->gmm.s.Stats.Reserved.cFixedPages, pGVM->gmm.s.Stats.Allocated.cFixedPages, cPages));
2568	return VERR_GMM_HIT_VM_ACCOUNT_LIMIT;
2569	}
2570	break;
2571	default:
2572	AssertMsgFailedReturn(("enmAccount=%d\n", enmAccount), VERR_IPE_NOT_REACHED_DEFAULT_CASE);
2573	}
2574
2575	/*
2576	* If we're in legacy memory mode, it's easy to figure out up-front whether
2577	* we have a sufficient number of pages.
2578	*/
2579	if ( pGMM->fLegacyAllocationMode
2580	&& pGVM->gmm.s.Private.cFreePages < cPages)
2581	{
2582	Assert(pGMM->fBoundMemoryMode);
2583	return VERR_GMM_SEED_ME;
2584	}
2585
2586	/*
2587	* Update the accounts before we proceed because we might be leaving the
2588	* protection of the global mutex and thus run the risk of permitting
2589	* too much memory to be allocated.
2590	*/
2591	switch (enmAccount)
2592	{
2593	case GMMACCOUNT_BASE: pGVM->gmm.s.Stats.Allocated.cBasePages += cPages; break;
2594	case GMMACCOUNT_SHADOW: pGVM->gmm.s.Stats.Allocated.cShadowPages += cPages; break;
2595	case GMMACCOUNT_FIXED: pGVM->gmm.s.Stats.Allocated.cFixedPages += cPages; break;
2596	default: AssertMsgFailedReturn(("enmAccount=%d\n", enmAccount), VERR_IPE_NOT_REACHED_DEFAULT_CASE);
2597	}
2598	pGVM->gmm.s.Stats.cPrivatePages += cPages;
2599	pGMM->cAllocatedPages += cPages;
2600
2601	/*
2602	* Part two of it's-easy-in-legacy-memory-mode.
2603	*/
2604	uint32_t iPage = 0;
2605	if (pGMM->fLegacyAllocationMode)
2606	{
2607	iPage = gmmR0AllocatePagesInBoundMode(pGVM, iPage, cPages, paPages);
2608	AssertReleaseReturn(iPage == cPages, VERR_GMM_ALLOC_PAGES_IPE);
2609	return VINF_SUCCESS;
2610	}
2611
2612	/*
2613	* Bound mode is also relatively straightforward.
2614	*/
2615	int rc = VINF_SUCCESS;
2616	if (pGMM->fBoundMemoryMode)
2617	{
2618	iPage = gmmR0AllocatePagesInBoundMode(pGVM, iPage, cPages, paPages);
2619	if (iPage < cPages)
2620	do
2621	rc = gmmR0AllocateChunkNew(pGMM, pGVM, &pGVM->gmm.s.Private, cPages, paPages, &iPage);
2622	while (iPage < cPages && RT_SUCCESS(rc));
2623	}
2624	/*
2625	* Shared mode is trickier as we should try to achieve the same locality as
2626	* in bound mode, but smartly make use of non-full chunks allocated by
2627	* other VMs if we're low on memory.
2628	*/
2629	else
2630	{
2631	/* Pick the most optimal pages first. */
2632	iPage = gmmR0AllocatePagesAssociatedWithVM(pGMM, pGVM, &pGMM->PrivateX, iPage, cPages, paPages);
2633	if (iPage < cPages)
2634	{
2635	/* Maybe we should try getting pages from chunks "belonging" to
2636	other VMs before allocating more chunks? */
2637	bool fTriedOnSameAlready = false;
2638	if (gmmR0ShouldAllocatePagesInOtherChunksBecauseOfLimits(pGVM))
2639	{
2640	iPage = gmmR0AllocatePagesFromSameNode(&pGMM->PrivateX, pGVM, iPage, cPages, paPages);
2641	fTriedOnSameAlready = true;
2642	}
2643
2644	/* Allocate memory from empty chunks. */
2645	if (iPage < cPages)
2646	iPage = gmmR0AllocatePagesFromEmptyChunksOnSameNode(&pGMM->PrivateX, pGVM, iPage, cPages, paPages);
2647
2648	/* Grab empty shared chunks. */
2649	if (iPage < cPages)
2650	iPage = gmmR0AllocatePagesFromEmptyChunksOnSameNode(&pGMM->Shared, pGVM, iPage, cPages, paPages);
2651
2652	/* If there are a lot of free pages spread around, try not to waste
2653	system memory on more chunks. (Should trigger defragmentation.) */
2654	if ( !fTriedOnSameAlready
2655	&& gmmR0ShouldAllocatePagesInOtherChunksBecauseOfLotsFree(pGMM))
2656	{
2657	iPage = gmmR0AllocatePagesFromSameNode(&pGMM->PrivateX, pGVM, iPage, cPages, paPages);
2658	if (iPage < cPages)
2659	iPage = gmmR0AllocatePagesIndiscriminately(&pGMM->PrivateX, pGVM, iPage, cPages, paPages);
2660	}
2661
2662	/*
2663	* Ok, try to allocate new chunks.
2664	*/
2665	if (iPage < cPages)
2666	{
2667	do
2668	rc = gmmR0AllocateChunkNew(pGMM, pGVM, &pGMM->PrivateX, cPages, paPages, &iPage);
2669	while (iPage < cPages && RT_SUCCESS(rc));
2670
2671	/* If the host is out of memory, take whatever we can get. */
2672	if ( (rc == VERR_NO_MEMORY || rc == VERR_NO_PHYS_MEMORY)
2673	&& pGMM->PrivateX.cFreePages + pGMM->Shared.cFreePages >= cPages - iPage)
2674	{
2675	iPage = gmmR0AllocatePagesIndiscriminately(&pGMM->PrivateX, pGVM, iPage, cPages, paPages);
2676	if (iPage < cPages)
2677	iPage = gmmR0AllocatePagesIndiscriminately(&pGMM->Shared, pGVM, iPage, cPages, paPages);
2678	AssertRelease(iPage == cPages);
2679	rc = VINF_SUCCESS;
2680	}
2681	}
2682	}
2683	}
2684
2685	/*
2686	* Clean up on failure. Since this is bound to be a low-memory condition
2687	* we will give back any empty chunks that might be hanging around.
2688	*/
2689	if (RT_FAILURE(rc))
2690	{
2691	/* Update the statistics. */
2692	pGVM->gmm.s.Stats.cPrivatePages -= cPages;
2693	pGMM->cAllocatedPages -= cPages - iPage;
2694	switch (enmAccount)
2695	{
2696	case GMMACCOUNT_BASE: pGVM->gmm.s.Stats.Allocated.cBasePages -= cPages; break;
2697	case GMMACCOUNT_SHADOW: pGVM->gmm.s.Stats.Allocated.cShadowPages -= cPages; break;
2698	case GMMACCOUNT_FIXED: pGVM->gmm.s.Stats.Allocated.cFixedPages -= cPages; break;
2699	default: AssertMsgFailedReturn(("enmAccount=%d\n", enmAccount), VERR_IPE_NOT_REACHED_DEFAULT_CASE);
2700	}
2701
2702	/* Release the pages. */
2703	while (iPage-- > 0)
2704	{
2705	uint32_t idPage = paPages[iPage].idPage;
2706	PGMMPAGE pPage = gmmR0GetPage(pGMM, idPage);
2707	if (RT_LIKELY(pPage))
2708	{
2709	Assert(GMM_PAGE_IS_PRIVATE(pPage));
2710	Assert(pPage->Private.hGVM == pGVM->hSelf);
2711	gmmR0FreePrivatePage(pGMM, pGVM, idPage, pPage);
2712	}
2713	else
2714	AssertMsgFailed(("idPage=%#x\n", idPage));
2715
2716	paPages[iPage].idPage = NIL_GMM_PAGEID;
2717	paPages[iPage].idSharedPage = NIL_GMM_PAGEID;
2718	paPages[iPage].HCPhysGCPhys = NIL_RTHCPHYS;
2719	}
2720
2721	/* Free empty chunks. */
2722	/** @todo */
2723
2724	/* return the fail status on failure */
2725	return rc;
2726	}
2727	return VINF_SUCCESS;
2728	}
2729
2730
2731	/**
2732	* Updates the previous allocations and allocates more pages.
2733	*
2734	* The handy pages are always taken from the 'base' memory account.
2735	* The allocated pages are not cleared and will contain random garbage.
2736	*
2737	* @returns VBox status code:
2738	* @retval VINF_SUCCESS on success.
2739	* @retval VERR_NOT_OWNER if the caller is not an EMT.
2740	* @retval VERR_GMM_PAGE_NOT_FOUND if one of the pages to update wasn't found.
2741	* @retval VERR_GMM_PAGE_NOT_PRIVATE if one of the pages to update wasn't a
2742	* private page.
2743	* @retval VERR_GMM_PAGE_NOT_SHARED if one of the pages to update wasn't a
2744	* shared page.
2745	* @retval VERR_GMM_NOT_PAGE_OWNER if one of the pages to be updated wasn't
2746	* owned by the VM.
2747	* @retval VERR_GMM_SEED_ME if seeding via GMMR0SeedChunk is necessary.
2748	* @retval VERR_GMM_HIT_GLOBAL_LIMIT if we've exhausted the available pages.
2749	* @retval VERR_GMM_HIT_VM_ACCOUNT_LIMIT if we've hit the VM account limit,
2750	* that is we're trying to allocate more than we've reserved.
2751	*
2752	* @param pGVM The global (ring-0) VM structure.
2753	* @param idCpu The VCPU id.
2754	* @param cPagesToUpdate The number of pages to update (starting from the head).
2755	* @param cPagesToAlloc The number of pages to allocate (starting from the head).
2756	* @param paPages The array of page descriptors.
2757	* See GMMPAGEDESC for details on what is expected on input.
2758	* @thread EMT(idCpu)
2759	*/
2760	GMMR0DECL(int) GMMR0AllocateHandyPages(PGVM pGVM, VMCPUID idCpu, uint32_t cPagesToUpdate,
2761	uint32_t cPagesToAlloc, PGMMPAGEDESC paPages)
2762	{
2763	LogFlow(("GMMR0AllocateHandyPages: pGVM=%p cPagesToUpdate=%#x cPagesToAlloc=%#x paPages=%p\n",
2764	pGVM, cPagesToUpdate, cPagesToAlloc, paPages));
2765
2766	/*
2767	* Validate, get basics and take the semaphore.
2768	* (This is a relatively busy path, so make predictions where possible.)
2769	*/
2770	PGMM pGMM;
2771	GMM_GET_VALID_INSTANCE(pGMM, VERR_GMM_INSTANCE);
2772	int rc = GVMMR0ValidateGVMandEMT(pGVM, idCpu);
2773	if (RT_FAILURE(rc))
2774	return rc;
2775
2776	AssertPtrReturn(paPages, VERR_INVALID_PARAMETER);
2777	AssertMsgReturn( (cPagesToUpdate && cPagesToUpdate < 1024)
2778	|| (cPagesToAlloc && cPagesToAlloc < 1024),
2779	("cPagesToUpdate=%#x cPagesToAlloc=%#x\n", cPagesToUpdate, cPagesToAlloc),
2780	VERR_INVALID_PARAMETER);
2781
2782	unsigned iPage = 0;
2783	for (; iPage < cPagesToUpdate; iPage++)
2784	{
2785	AssertMsgReturn( ( paPages[iPage].HCPhysGCPhys <= GMM_GCPHYS_LAST
2786	&& !(paPages[iPage].HCPhysGCPhys & PAGE_OFFSET_MASK))
2787	|| paPages[iPage].HCPhysGCPhys == NIL_RTHCPHYS
2788	|| paPages[iPage].HCPhysGCPhys == GMM_GCPHYS_UNSHAREABLE,
2789	("#%#x: %RHp\n", iPage, paPages[iPage].HCPhysGCPhys),
2790	VERR_INVALID_PARAMETER);
2791	AssertMsgReturn( paPages[iPage].idPage <= GMM_PAGEID_LAST
2792	/*|| paPages[iPage].idPage == NIL_GMM_PAGEID*/,
2793	("#%#x: %#x\n", iPage, paPages[iPage].idPage), VERR_INVALID_PARAMETER);
2794	AssertMsgReturn( paPages[iPage].idSharedPage <= GMM_PAGEID_LAST
2795	/*|| paPages[iPage].idSharedPage == NIL_GMM_PAGEID*/,
2796	("#%#x: %#x\n", iPage, paPages[iPage].idSharedPage), VERR_INVALID_PARAMETER);
2797	}
2798
2799	for (; iPage < cPagesToAlloc; iPage++)
2800	{
2801	AssertMsgReturn(paPages[iPage].HCPhysGCPhys == NIL_RTHCPHYS, ("#%#x: %RHp\n", iPage, paPages[iPage].HCPhysGCPhys), VERR_INVALID_PARAMETER);
2802	AssertMsgReturn(paPages[iPage].idPage == NIL_GMM_PAGEID, ("#%#x: %#x\n", iPage, paPages[iPage].idPage), VERR_INVALID_PARAMETER);
2803	AssertMsgReturn(paPages[iPage].idSharedPage == NIL_GMM_PAGEID, ("#%#x: %#x\n", iPage, paPages[iPage].idSharedPage), VERR_INVALID_PARAMETER);
2804	}
2805
2806	gmmR0MutexAcquire(pGMM);
2807	if (GMM_CHECK_SANITY_UPON_ENTERING(pGMM))
2808	{
2809	/* No allocations before the initial reservation has been made! */
2810	if (RT_LIKELY( pGVM->gmm.s.Stats.Reserved.cBasePages
2811	&& pGVM->gmm.s.Stats.Reserved.cFixedPages
2812	&& pGVM->gmm.s.Stats.Reserved.cShadowPages))
2813	{
2814	/*
2815	* Perform the updates.
2816	* Stop on the first error.
2817	*/
2818	for (iPage = 0; iPage < cPagesToUpdate; iPage++)
2819	{
2820	if (paPages[iPage].idPage != NIL_GMM_PAGEID)
2821	{
2822	PGMMPAGE pPage = gmmR0GetPage(pGMM, paPages[iPage].idPage);
2823	if (RT_LIKELY(pPage))
2824	{
2825	if (RT_LIKELY(GMM_PAGE_IS_PRIVATE(pPage)))
2826	{
2827	if (RT_LIKELY(pPage->Private.hGVM == pGVM->hSelf))
2828	{
2829	AssertCompile(NIL_RTHCPHYS > GMM_GCPHYS_LAST && GMM_GCPHYS_UNSHAREABLE > GMM_GCPHYS_LAST);
2830	if (RT_LIKELY(paPages[iPage].HCPhysGCPhys <= GMM_GCPHYS_LAST))
2831	pPage->Private.pfn = paPages[iPage].HCPhysGCPhys >> PAGE_SHIFT;
2832	else if (paPages[iPage].HCPhysGCPhys == GMM_GCPHYS_UNSHAREABLE)
2833	pPage->Private.pfn = GMM_PAGE_PFN_UNSHAREABLE;
2834	/* else: NIL_RTHCPHYS nothing */
2835
2836	paPages[iPage].idPage = NIL_GMM_PAGEID;
2837	paPages[iPage].HCPhysGCPhys = NIL_RTHCPHYS;
2838	}
2839	else
2840	{
2841	Log(("GMMR0AllocateHandyPages: #%#x/%#x: Not owner! hGVM=%#x hSelf=%#x\n",
2842	iPage, paPages[iPage].idPage, pPage->Private.hGVM, pGVM->hSelf));
2843	rc = VERR_GMM_NOT_PAGE_OWNER;
2844	break;
2845	}
2846	}
2847	else
2848	{
2849	Log(("GMMR0AllocateHandyPages: #%#x/%#x: Not private! %.*Rhxs (type %d)\n", iPage, paPages[iPage].idPage, sizeof(*pPage), pPage, pPage->Common.u2State));
2850	rc = VERR_GMM_PAGE_NOT_PRIVATE;
2851	break;
2852	}
2853	}
2854	else
2855	{
2856	Log(("GMMR0AllocateHandyPages: #%#x/%#x: Not found! (private)\n", iPage, paPages[iPage].idPage));
2857	rc = VERR_GMM_PAGE_NOT_FOUND;
2858	break;
2859	}
2860	}
2861
2862	if (paPages[iPage].idSharedPage != NIL_GMM_PAGEID)
2863	{
2864	PGMMPAGE pPage = gmmR0GetPage(pGMM, paPages[iPage].idSharedPage);
2865	if (RT_LIKELY(pPage))
2866	{
2867	if (RT_LIKELY(GMM_PAGE_IS_SHARED(pPage)))
2868	{
2869	AssertCompile(NIL_RTHCPHYS > GMM_GCPHYS_LAST && GMM_GCPHYS_UNSHAREABLE > GMM_GCPHYS_LAST);
2870	Assert(pPage->Shared.cRefs);
2871	Assert(pGVM->gmm.s.Stats.cSharedPages);
2872	Assert(pGVM->gmm.s.Stats.Allocated.cBasePages);
2873
2874	Log(("GMMR0AllocateHandyPages: free shared page %x cRefs=%d\n", paPages[iPage].idSharedPage, pPage->Shared.cRefs));
2875	pGVM->gmm.s.Stats.cSharedPages--;
2876	pGVM->gmm.s.Stats.Allocated.cBasePages--;
2877	if (!--pPage->Shared.cRefs)
2878	gmmR0FreeSharedPage(pGMM, pGVM, paPages[iPage].idSharedPage, pPage);
2879	else
2880	{
2881	Assert(pGMM->cDuplicatePages);
2882	pGMM->cDuplicatePages--;
2883	}
2884
2885	paPages[iPage].idSharedPage = NIL_GMM_PAGEID;
2886	}
2887	else
2888	{
2889	Log(("GMMR0AllocateHandyPages: #%#x/%#x: Not shared!\n", iPage, paPages[iPage].idSharedPage));
2890	rc = VERR_GMM_PAGE_NOT_SHARED;
2891	break;
2892	}
2893	}
2894	else
2895	{
2896	Log(("GMMR0AllocateHandyPages: #%#x/%#x: Not found! (shared)\n", iPage, paPages[iPage].idSharedPage));
2897	rc = VERR_GMM_PAGE_NOT_FOUND;
2898	break;
2899	}
2900	}
2901	} /* for each page to update */
2902
2903	if (RT_SUCCESS(rc) && cPagesToAlloc > 0)
2904	{
2905	#if defined(VBOX_STRICT) && 0 /** @todo re-test this later. Appeared to be a PGM init bug. */
2906	for (iPage = 0; iPage < cPagesToAlloc; iPage++)
2907	{
2908	Assert(paPages[iPage].HCPhysGCPhys == NIL_RTHCPHYS);
2909	Assert(paPages[iPage].idPage == NIL_GMM_PAGEID);
2910	Assert(paPages[iPage].idSharedPage == NIL_GMM_PAGEID);
2911	}
2912	#endif
2913
2914	/*
2915	* Join paths with GMMR0AllocatePages for the allocation.
2916	* Note! gmmR0AllocateMoreChunks may leave the protection of the mutex!
2917	*/
2918	rc = gmmR0AllocatePagesNew(pGMM, pGVM, cPagesToAlloc, paPages, GMMACCOUNT_BASE);
2919	}
2920	}
2921	else
2922	rc = VERR_WRONG_ORDER;
2923	GMM_CHECK_SANITY_UPON_LEAVING(pGMM);
2924	}
2925	else
2926	rc = VERR_GMM_IS_NOT_SANE;
2927	gmmR0MutexRelease(pGMM);
2928	LogFlow(("GMMR0AllocateHandyPages: returns %Rrc\n", rc));
2929	return rc;
2930	}
2931
2932
2933	/**
2934	* Allocate one or more pages.
2935	*
2936	* This is typically used for ROMs and MMIO2 (VRAM) during VM creation.
2937	* The allocated pages are not cleared and will contain random garbage.
2938	*
2939	* @returns VBox status code:
2940	* @retval VINF_SUCCESS on success.
2941	* @retval VERR_NOT_OWNER if the caller is not an EMT.
2942	* @retval VERR_GMM_SEED_ME if seeding via GMMR0SeedChunk is necessary.
2943	* @retval VERR_GMM_HIT_GLOBAL_LIMIT if we've exhausted the available pages.
2944	* @retval VERR_GMM_HIT_VM_ACCOUNT_LIMIT if we've hit the VM account limit,
2945	* that is we're trying to allocate more than we've reserved.
2946	*
2947	* @param pGVM The global (ring-0) VM structure.
2948	* @param idCpu The VCPU id.
2949	* @param cPages The number of pages to allocate.
2950	* @param paPages Pointer to the page descriptors.
2951	* See GMMPAGEDESC for details on what is expected on
2952	* input.
2953	* @param enmAccount The account to charge.
2954	*
2955	* @thread EMT.
2956	*/
2957	GMMR0DECL(int) GMMR0AllocatePages(PGVM pGVM, VMCPUID idCpu, uint32_t cPages, PGMMPAGEDESC paPages, GMMACCOUNT enmAccount)
2958	{
2959	LogFlow(("GMMR0AllocatePages: pGVM=%p cPages=%#x paPages=%p enmAccount=%d\n", pGVM, cPages, paPages, enmAccount));
2960
2961	/*
2962	* Validate, get basics and take the semaphore.
2963	*/
2964	PGMM pGMM;
2965	GMM_GET_VALID_INSTANCE(pGMM, VERR_GMM_INSTANCE);
2966	int rc = GVMMR0ValidateGVMandEMT(pGVM, idCpu);
2967	if (RT_FAILURE(rc))
2968	return rc;
2969
2970	AssertPtrReturn(paPages, VERR_INVALID_PARAMETER);
2971	AssertMsgReturn(enmAccount > GMMACCOUNT_INVALID && enmAccount < GMMACCOUNT_END, ("%d\n", enmAccount), VERR_INVALID_PARAMETER);
2972	AssertMsgReturn(cPages > 0 && cPages < RT_BIT(32 - PAGE_SHIFT), ("%#x\n", cPages), VERR_INVALID_PARAMETER);
2973
2974	for (unsigned iPage = 0; iPage < cPages; iPage++)
2975	{
2976	AssertMsgReturn( paPages[iPage].HCPhysGCPhys == NIL_RTHCPHYS
2977	|| paPages[iPage].HCPhysGCPhys == GMM_GCPHYS_UNSHAREABLE
2978	|| ( enmAccount == GMMACCOUNT_BASE
2979	&& paPages[iPage].HCPhysGCPhys <= GMM_GCPHYS_LAST
2980	&& !(paPages[iPage].HCPhysGCPhys & PAGE_OFFSET_MASK)),
2981	("#%#x: %RHp enmAccount=%d\n", iPage, paPages[iPage].HCPhysGCPhys, enmAccount),
2982	VERR_INVALID_PARAMETER);
2983	AssertMsgReturn(paPages[iPage].idPage == NIL_GMM_PAGEID, ("#%#x: %#x\n", iPage, paPages[iPage].idPage), VERR_INVALID_PARAMETER);
2984	AssertMsgReturn(paPages[iPage].idSharedPage == NIL_GMM_PAGEID, ("#%#x: %#x\n", iPage, paPages[iPage].idSharedPage), VERR_INVALID_PARAMETER);
2985	}
2986
2987	gmmR0MutexAcquire(pGMM);
2988	if (GMM_CHECK_SANITY_UPON_ENTERING(pGMM))
2989	{
2990
2991	/* No allocations before the initial reservation has been made! */
2992	if (RT_LIKELY( pGVM->gmm.s.Stats.Reserved.cBasePages
2993	&& pGVM->gmm.s.Stats.Reserved.cFixedPages
2994	&& pGVM->gmm.s.Stats.Reserved.cShadowPages))
2995	rc = gmmR0AllocatePagesNew(pGMM, pGVM, cPages, paPages, enmAccount);
2996	else
2997	rc = VERR_WRONG_ORDER;
2998	GMM_CHECK_SANITY_UPON_LEAVING(pGMM);
2999	}
3000	else
3001	rc = VERR_GMM_IS_NOT_SANE;
3002	gmmR0MutexRelease(pGMM);
3003	LogFlow(("GMMR0AllocatePages: returns %Rrc\n", rc));
3004	return rc;
3005	}
3006
3007
3008	/**
3009	* VMMR0 request wrapper for GMMR0AllocatePages.
3010	*
3011	* @returns see GMMR0AllocatePages.
3012	* @param pGVM The global (ring-0) VM structure.
3013	* @param idCpu The VCPU id.
3014	* @param pReq Pointer to the request packet.
3015	*/
3016	GMMR0DECL(int) GMMR0AllocatePagesReq(PGVM pGVM, VMCPUID idCpu, PGMMALLOCATEPAGESREQ pReq)
3017	{
3018	/*
3019	* Validate input and pass it on.
3020	*/
3021	AssertPtrReturn(pReq, VERR_INVALID_POINTER);
3022	AssertMsgReturn(pReq->Hdr.cbReq >= RT_UOFFSETOF(GMMALLOCATEPAGESREQ, aPages[0]),
3023	("%#x < %#x\n", pReq->Hdr.cbReq, RT_UOFFSETOF(GMMALLOCATEPAGESREQ, aPages[0])),
3024	VERR_INVALID_PARAMETER);
3025	AssertMsgReturn(pReq->Hdr.cbReq == RT_UOFFSETOF_DYN(GMMALLOCATEPAGESREQ, aPages[pReq->cPages]),
3026	("%#x != %#x\n", pReq->Hdr.cbReq, RT_UOFFSETOF_DYN(GMMALLOCATEPAGESREQ, aPages[pReq->cPages])),
3027	VERR_INVALID_PARAMETER);
3028
3029	return GMMR0AllocatePages(pGVM, idCpu, pReq->cPages, &pReq->aPages[0], pReq->enmAccount);
3030	}
3031
3032
3033	/**
3034	* Allocate a large page to represent guest RAM.
3035	*
3036	* The allocated pages are not cleared and will contain random garbage.
3037	*
3038	* @returns VBox status code:
3039	* @retval VINF_SUCCESS on success.
3040	* @retval VERR_NOT_OWNER if the caller is not an EMT.
3041	* @retval VERR_GMM_SEED_ME if seeding via GMMR0SeedChunk is necessary.
3042	* @retval VERR_GMM_HIT_GLOBAL_LIMIT if we've exhausted the available pages.
3043	* @retval VERR_GMM_HIT_VM_ACCOUNT_LIMIT if we've hit the VM account limit,
3044	* that is we're trying to allocate more than we've reserved.
3045	* @returns see GMMR0AllocatePages.
3046	*
3047	* @param pGVM The global (ring-0) VM structure.
3048	* @param idCpu The VCPU id.
3049	* @param cbPage Large page size.
3050	* @param pIdPage Where to return the GMM page ID of the page.
3051	* @param pHCPhys Where to return the host physical address of the page.
3052	*/
3053	GMMR0DECL(int) GMMR0AllocateLargePage(PGVM pGVM, VMCPUID idCpu, uint32_t cbPage, uint32_t *pIdPage, RTHCPHYS *pHCPhys)
3054	{
3055	LogFlow(("GMMR0AllocateLargePage: pGVM=%p cbPage=%x\n", pGVM, cbPage));
3056
3057	AssertReturn(cbPage == GMM_CHUNK_SIZE, VERR_INVALID_PARAMETER);
3058	AssertPtrReturn(pIdPage, VERR_INVALID_PARAMETER);
3059	AssertPtrReturn(pHCPhys, VERR_INVALID_PARAMETER);
3060
3061	/*
3062	* Validate, get basics and take the semaphore.
3063	*/
3064	PGMM pGMM;
3065	GMM_GET_VALID_INSTANCE(pGMM, VERR_GMM_INSTANCE);
3066	int rc = GVMMR0ValidateGVMandEMT(pGVM, idCpu);
3067	if (RT_FAILURE(rc))
3068	return rc;
3069
3070	/* Not supported in legacy mode where we allocate the memory in ring 3 and lock it in ring 0. */
3071	if (pGMM->fLegacyAllocationMode)
3072	return VERR_NOT_SUPPORTED;
3073
3074	*pHCPhys = NIL_RTHCPHYS;
3075	*pIdPage = NIL_GMM_PAGEID;
3076
3077	gmmR0MutexAcquire(pGMM);
3078	if (GMM_CHECK_SANITY_UPON_ENTERING(pGMM))
3079	{
3080	const unsigned cPages = (GMM_CHUNK_SIZE >> PAGE_SHIFT);
3081	if (RT_UNLIKELY( pGVM->gmm.s.Stats.Allocated.cBasePages + pGVM->gmm.s.Stats.cBalloonedPages + cPages
3082	> pGVM->gmm.s.Stats.Reserved.cBasePages))
3083	{
3084	Log(("GMMR0AllocateLargePage: Reserved=%#llx Allocated+Requested=%#llx+%#x!\n",
3085	pGVM->gmm.s.Stats.Reserved.cBasePages, pGVM->gmm.s.Stats.Allocated.cBasePages, cPages));
3086	gmmR0MutexRelease(pGMM);
3087	return VERR_GMM_HIT_VM_ACCOUNT_LIMIT;
3088	}
3089
3090	/*
3091	* Allocate a new large page chunk.
3092	*
3093	* Note! We leave the giant GMM lock temporarily as the allocation might
3094	* take a long time. gmmR0RegisterChunk will retake it (ugly).
3095	*/
3096	AssertCompile(GMM_CHUNK_SIZE == _2M);
3097	gmmR0MutexRelease(pGMM);
3098
3099	RTR0MEMOBJ hMemObj;
3100	rc = RTR0MemObjAllocPhysEx(&hMemObj, GMM_CHUNK_SIZE, NIL_RTHCPHYS, GMM_CHUNK_SIZE);
3101	if (RT_SUCCESS(rc))
3102	{
3103	PGMMCHUNKFREESET pSet = pGMM->fBoundMemoryMode ? &pGVM->gmm.s.Private : &pGMM->PrivateX;
3104	PGMMCHUNK pChunk;
3105	rc = gmmR0RegisterChunk(pGMM, pSet, hMemObj, pGVM->hSelf, GMM_CHUNK_FLAGS_LARGE_PAGE, &pChunk);
3106	if (RT_SUCCESS(rc))
3107	{
3108	/*
3109	* Allocate all the pages in the chunk.
3110	*/
3111	/* Unlink the new chunk from the free list. */
3112	gmmR0UnlinkChunk(pChunk);
3113
3114	/** @todo rewrite this to skip the looping. */
3115	/* Allocate all pages. */
3116	GMMPAGEDESC PageDesc;
3117	gmmR0AllocatePage(pChunk, pGVM->hSelf, &PageDesc);
3118
3119	/* Return the first page as we'll use the whole chunk as one big page. */
3120	*pIdPage = PageDesc.idPage;
3121	*pHCPhys = PageDesc.HCPhysGCPhys;
3122
3123	for (unsigned i = 1; i < cPages; i++)
3124	gmmR0AllocatePage(pChunk, pGVM->hSelf, &PageDesc);
3125
3126	/* Update accounting. */
3127	pGVM->gmm.s.Stats.Allocated.cBasePages += cPages;
3128	pGVM->gmm.s.Stats.cPrivatePages += cPages;
3129	pGMM->cAllocatedPages += cPages;
3130
3131	gmmR0LinkChunk(pChunk, pSet);
3132	gmmR0MutexRelease(pGMM);
3133	LogFlow(("GMMR0AllocateLargePage: returns VINF_SUCCESS\n"));
3134	return VINF_SUCCESS;
3135	}
3136	RTR0MemObjFree(hMemObj, true /* fFreeMappings */);
3137	}
3138	}
3139	else
3140	{
3141	gmmR0MutexRelease(pGMM);
3142	rc = VERR_GMM_IS_NOT_SANE;
3143	}
3144
3145	LogFlow(("GMMR0AllocateLargePage: returns %Rrc\n", rc));
3146	return rc;
3147	}
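The large-page path above treats one whole chunk as a single big page and returns the ID of its first small page. The following is a minimal sketch of the arithmetic involved, not code from GMMR0.cpp. It assumes 4 KiB host pages and the 2 MiB chunk size asserted in the listing, and the helper names (`makePageId`, `chunkIdOf`, `pageIndexOf`, `fitsReservation`) are hypothetical.

```cpp
#include <cstdint>

// Assumed values, for illustration only: 4 KiB host pages (PAGE_SHIFT = 12)
// and the 2 MiB chunk size that the listing asserts via AssertCompile.
constexpr uint32_t kPageShift     = 12;
constexpr uint32_t kChunkSize     = 2u * 1024u * 1024u;
constexpr uint32_t kPagesPerChunk = kChunkSize >> kPageShift;   // 512 small pages
constexpr uint32_t kChunkIdShift  = 9;                          // log2(kPagesPerChunk)

// Hypothetical helper mirroring "idPage = (idChunk << shift) | iPage".
constexpr uint32_t makePageId(uint32_t idChunk, uint32_t iPage)
{
    return (idChunk << kChunkIdShift) | iPage;
}

// Recover the chunk ID, as the listing does with idPage >> GMM_CHUNKID_SHIFT.
constexpr uint32_t chunkIdOf(uint32_t idPage)
{
    return idPage >> kChunkIdShift;
}

// Recover the index of the page within its chunk.
constexpr uint32_t pageIndexOf(uint32_t idPage)
{
    return idPage & (kPagesPerChunk - 1);
}

// Mirrors the account check made before the chunk is allocated: the request
// is refused when allocated + ballooned + requested exceeds the reservation.
constexpr bool fitsReservation(uint64_t cAllocated, uint64_t cBallooned, uint64_t cReserved)
{
    return cAllocated + cBallooned + kPagesPerChunk <= cReserved;
}
```

Under these assumptions a large page always costs 512 base pages against the VM's reservation, which is why the check is made once up front rather than per page.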
3148
3149
3150	/**
3151	* Free a large page.
3152	*
3153	* @returns VBox status code:
3154	* @param pGVM The global (ring-0) VM structure.
3155	* @param idCpu The VCPU id.
3156	* @param idPage The large page id.
3157	*/
3158	GMMR0DECL(int) GMMR0FreeLargePage(PGVM pGVM, VMCPUID idCpu, uint32_t idPage)
3159	{
3160	LogFlow(("GMMR0FreeLargePage: pGVM=%p idPage=%x\n", pGVM, idPage));
3161
3162	/*
3163	* Validate, get basics and take the semaphore.
3164	*/
3165	PGMM pGMM;
3166	GMM_GET_VALID_INSTANCE(pGMM, VERR_GMM_INSTANCE);
3167	int rc = GVMMR0ValidateGVMandEMT(pGVM, idCpu);
3168	if (RT_FAILURE(rc))
3169	return rc;
3170
3171	/* Not supported in legacy mode where we allocate the memory in ring 3 and lock it in ring 0. */
3172	if (pGMM->fLegacyAllocationMode)
3173	return VERR_NOT_SUPPORTED;
3174
3175	gmmR0MutexAcquire(pGMM);
3176	if (GMM_CHECK_SANITY_UPON_ENTERING(pGMM))
3177	{
3178	const unsigned cPages = (GMM_CHUNK_SIZE >> PAGE_SHIFT);
3179
3180	if (RT_UNLIKELY(pGVM->gmm.s.Stats.Allocated.cBasePages < cPages))
3181	{
3182	Log(("GMMR0FreeLargePage: allocated=%#llx cPages=%#x!\n", pGVM->gmm.s.Stats.Allocated.cBasePages, cPages));
3183	gmmR0MutexRelease(pGMM);
3184	return VERR_GMM_ATTEMPT_TO_FREE_TOO_MUCH;
3185	}
3186
3187	PGMMPAGE pPage = gmmR0GetPage(pGMM, idPage);
3188	if (RT_LIKELY( pPage
3189	&& GMM_PAGE_IS_PRIVATE(pPage)))
3190	{
3191	PGMMCHUNK pChunk = gmmR0GetChunk(pGMM, idPage >> GMM_CHUNKID_SHIFT);
3192	Assert(pChunk);
3193	Assert(pChunk->cFree < GMM_CHUNK_NUM_PAGES);
3194	Assert(pChunk->cPrivate > 0);
3195
3196	/* Release the memory immediately. */
3197	gmmR0FreeChunk(pGMM, NULL, pChunk, false /*fRelaxedSem*/); /** @todo this can be relaxed too! */
3198
3199	/* Update accounting. */
3200	pGVM->gmm.s.Stats.Allocated.cBasePages -= cPages;
3201	pGVM->gmm.s.Stats.cPrivatePages -= cPages;
3202	pGMM->cAllocatedPages -= cPages;
3203	}
3204	else
3205	rc = VERR_GMM_PAGE_NOT_FOUND;
3206	}
3207	else
3208	rc = VERR_GMM_IS_NOT_SANE;
3209
3210	gmmR0MutexRelease(pGMM);
3211	LogFlow(("GMMR0FreeLargePage: returns %Rrc\n", rc));
3212	return rc;
3213	}
3214
3215
3216	/**
3217	* VMMR0 request wrapper for GMMR0FreeLargePage.
3218	*
3219	* @returns see GMMR0FreeLargePage.
3220	* @param pGVM The global (ring-0) VM structure.
3221	* @param idCpu The VCPU id.
3222	* @param pReq Pointer to the request packet.
3223	*/
3224	GMMR0DECL(int) GMMR0FreeLargePageReq(PGVM pGVM, VMCPUID idCpu, PGMMFREELARGEPAGEREQ pReq)
3225	{
3226	/*
3227	* Validate input and pass it on.
3228	*/
3229	AssertPtrReturn(pReq, VERR_INVALID_POINTER);
3230	AssertMsgReturn(pReq->Hdr.cbReq == sizeof(GMMFREELARGEPAGEREQ),
3231	("%#x != %#x\n", pReq->Hdr.cbReq, sizeof(GMMFREELARGEPAGEREQ)),
3232	VERR_INVALID_PARAMETER);
3233
3234	return GMMR0FreeLargePage(pGVM, idCpu, pReq->idPage);
3235	}
3236
3237
3238	/**
3239	* Frees a chunk, giving it back to the host OS.
3240	*
3241	* @param pGMM Pointer to the GMM instance.
3242	* @param pGVM This is set when called from GMMR0CleanupVM so we can
3243	* unmap and free the chunk in one go.
3244	* @param pChunk The chunk to free.
3245	* @param fRelaxedSem Whether we can release the semaphore while doing the
3246	* freeing (@c true) or not.
3247	*/
3248	static bool gmmR0FreeChunk(PGMM pGMM, PGVM pGVM, PGMMCHUNK pChunk, bool fRelaxedSem)
3249	{
3250	Assert(pChunk->Core.Key != NIL_GMM_CHUNKID);
3251
3252	GMMR0CHUNKMTXSTATE MtxState;
3253	gmmR0ChunkMutexAcquire(&MtxState, pGMM, pChunk, GMMR0CHUNK_MTX_KEEP_GIANT);
3254
3255	/*
3256	* Cleanup hack! Unmap the chunk from the caller's address space.
3257	* This shouldn't happen, so screw lock contention...
3258	*/
3259	if ( pChunk->cMappingsX
3260	&& !pGMM->fLegacyAllocationMode
3261	&& pGVM)
3262	gmmR0UnmapChunkLocked(pGMM, pGVM, pChunk);
3263
3264	/*
3265	* If there are current mappings of the chunk, then request the
3266	* VMs to unmap them. Reposition the chunk in the free list so
3267	* it won't be a likely candidate for allocations.
3268	*/
3269	if (pChunk->cMappingsX)
3270	{
3271	/** @todo R0 -> VM request */
3272	/* The chunk can be mapped by more than one VM if fBoundMemoryMode is false! */
3273	Log(("gmmR0FreeChunk: chunk still has %d mappings; don't free!\n", pChunk->cMappingsX));
3274	gmmR0ChunkMutexRelease(&MtxState, pChunk);
3275	return false;
3276	}
3277
3278
3279	/*
3280	* Save and trash the handle.
3281	*/
3282	RTR0MEMOBJ const hMemObj = pChunk->hMemObj;
3283	pChunk->hMemObj = NIL_RTR0MEMOBJ;
3284
3285	/*
3286	* Unlink it from everywhere.
3287	*/
3288	gmmR0UnlinkChunk(pChunk);
3289
3290	RTListNodeRemove(&pChunk->ListNode);
3291
3292	PAVLU32NODECORE pCore = RTAvlU32Remove(&pGMM->pChunks, pChunk->Core.Key);
3293	Assert(pCore == &pChunk->Core); NOREF(pCore);
3294
3295	PGMMCHUNKTLBE pTlbe = &pGMM->ChunkTLB.aEntries[GMM_CHUNKTLB_IDX(pChunk->Core.Key)];
3296	if (pTlbe->pChunk == pChunk)
3297	{
3298	pTlbe->idChunk = NIL_GMM_CHUNKID;
3299	pTlbe->pChunk = NULL;
3300	}
3301
3302	Assert(pGMM->cChunks > 0);
3303	pGMM->cChunks--;
3304
3305	/*
3306	* Free the Chunk ID before dropping the locks and freeing the rest.
3307	*/
3308	gmmR0FreeChunkId(pGMM, pChunk->Core.Key);
3309	pChunk->Core.Key = NIL_GMM_CHUNKID;
3310
3311	pGMM->cFreedChunks++;
3312
3313	gmmR0ChunkMutexRelease(&MtxState, NULL);
3314	if (fRelaxedSem)
3315	gmmR0MutexRelease(pGMM);
3316
3317	RTMemFree(pChunk->paMappingsX);
3318	pChunk->paMappingsX = NULL;
3319
3320	RTMemFree(pChunk);
3321
3322	#if defined(VBOX_WITH_RAM_IN_KERNEL) && !defined(VBOX_WITH_LINEAR_HOST_PHYS_MEM)
3323	int rc = RTR0MemObjFree(hMemObj, true /* fFreeMappings */);
3324	#else
3325	int rc = RTR0MemObjFree(hMemObj, false /* fFreeMappings */);
3326	#endif
3327	AssertLogRelRC(rc);
3328
3329	if (fRelaxedSem)
3330	gmmR0MutexAcquire(pGMM);
3331	return fRelaxedSem;
3332	}
3333
3334
3335	/**
3336	* Free page worker.
3337	*
3338	* The caller does all the statistic decrementing, we do all the incrementing.
3339	*
3340	* @param pGMM Pointer to the GMM instance data.
3341	* @param pGVM Pointer to the GVM instance.
3342	* @param pChunk Pointer to the chunk this page belongs to.
3343	* @param idPage The Page ID.
3344	* @param pPage Pointer to the page.
3345	*/
3346	static void gmmR0FreePageWorker(PGMM pGMM, PGVM pGVM, PGMMCHUNK pChunk, uint32_t idPage, PGMMPAGE pPage)
3347	{
3348	Log3(("F pPage=%p iPage=%#x/%#x u2State=%d iFreeHead=%#x\n",
3349	pPage, pPage - &pChunk->aPages[0], idPage, pPage->Common.u2State, pChunk->iFreeHead)); NOREF(idPage);
3350
3351	/*
3352	* Put the page on the free list.
3353	*/
3354	pPage->u = 0;
3355	pPage->Free.u2State = GMM_PAGE_STATE_FREE;
3356	Assert(pChunk->iFreeHead < RT_ELEMENTS(pChunk->aPages) || pChunk->iFreeHead == UINT16_MAX);
3357	pPage->Free.iNext = pChunk->iFreeHead;
3358	pChunk->iFreeHead = pPage - &pChunk->aPages[0];
3359
3360	/*
3361	* Update statistics (the cShared/cPrivate stats are up to date already),
3362	* and relink the chunk if necessary.
3363	*/
3364	unsigned const cFree = pChunk->cFree;
3365	if ( !cFree
3366	|| gmmR0SelectFreeSetList(cFree) != gmmR0SelectFreeSetList(cFree + 1))
3367	{
3368	gmmR0UnlinkChunk(pChunk);
3369	pChunk->cFree++;
3370	gmmR0SelectSetAndLinkChunk(pGMM, pGVM, pChunk);
3371	}
3372	else
3373	{
3374	pChunk->cFree = cFree + 1;
3375	pChunk->pSet->cFreePages++;
3376	}
3377
3378	/*
3379	* If the chunk becomes empty, consider giving memory back to the host OS.
3380	*
3381	* The current strategy is to try to give it back if there are other chunks
3382	* in this free list, meaning if there are at least 240 free pages in this
3383	* category. Note that since there are probably mappings of the chunk,
3384	* it won't be freed up instantly, which probably screws up this logic
3385	* a bit...
3386	*/
3387	/** @todo Do this on the way out. */
3388	if (RT_UNLIKELY( pChunk->cFree == GMM_CHUNK_NUM_PAGES
3389	&& pChunk->pFreeNext
3390	&& pChunk->pFreePrev /** @todo this is probably misfiring, see reset... */
3391	&& !pGMM->fLegacyAllocationMode))
3392	gmmR0FreeChunk(pGMM, NULL, pChunk, false);
3393
3394	}
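The worker above pushes the freed page onto the chunk's index-based free list and bumps the free count. The following is a minimal sketch of that push, assuming a toy chunk of four pages. `ToyChunk`, `freePage`, `isFullyFree` and `demo` are hypothetical names and do not reflect the real GMMCHUNK layout.

```cpp
#include <cstdint>

constexpr uint16_t kNilIndex         = UINT16_MAX; // end-of-list marker, as in the Assert above
constexpr unsigned kToyPagesPerChunk = 4;          // tiny chunk, for illustration only

// Hypothetical, simplified chunk: one "next free" index per page.
struct ToyChunk
{
    uint16_t aNext[kToyPagesPerChunk];
    uint16_t iFreeHead;
    unsigned cFree;
};

// Push a page onto the head of the free list (LIFO), like the worker does.
constexpr void freePage(ToyChunk &chunk, uint16_t iPage)
{
    chunk.aNext[iPage] = chunk.iFreeHead;
    chunk.iFreeHead    = iPage;
    chunk.cFree++;
}

// A chunk becomes a candidate for return to the host once every page is free.
constexpr bool isFullyFree(const ToyChunk &chunk)
{
    return chunk.cFree == kToyPagesPerChunk;
}

// Free page 2, then page 0, starting from a chunk with no free pages.
constexpr ToyChunk demo()
{
    ToyChunk chunk{{kNilIndex, kNilIndex, kNilIndex, kNilIndex}, kNilIndex, 0};
    freePage(chunk, 2);
    freePage(chunk, 0);
    return chunk;
}
```

Storing indices instead of pointers keeps the per-page tracking small, in line with the file's stated goal of minimal tracking structures.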
3395
3396
3397	/**
3398	* Frees a shared page, the page is known to exist and be valid and such.
3399	*
3400	* @param pGMM Pointer to the GMM instance.
3401	* @param pGVM Pointer to the GVM instance.
3402	* @param idPage The page id.
3403	* @param pPage The page structure.
3404	*/
3405	DECLINLINE(void) gmmR0FreeSharedPage(PGMM pGMM, PGVM pGVM, uint32_t idPage, PGMMPAGE pPage)
3406	{
3407	PGMMCHUNK pChunk = gmmR0GetChunk(pGMM, idPage >> GMM_CHUNKID_SHIFT);
3408	Assert(pChunk);
3409	Assert(pChunk->cFree < GMM_CHUNK_NUM_PAGES);
3410	Assert(pChunk->cShared > 0);
3411	Assert(pGMM->cSharedPages > 0);
3412	Assert(pGMM->cAllocatedPages > 0);
3413	Assert(!pPage->Shared.cRefs);
3414
3415	pChunk->cShared--;
3416	pGMM->cAllocatedPages--;
3417	pGMM->cSharedPages--;
3418	gmmR0FreePageWorker(pGMM, pGVM, pChunk, idPage, pPage);
3419	}
3420
3421
3422	/**
3423	* Frees a private page, the page is known to exist and be valid and such.
3424	*
3425	* @param pGMM Pointer to the GMM instance.
3426	* @param pGVM Pointer to the GVM instance.
3427	* @param idPage The page id.
3428	* @param pPage The page structure.
3429	*/
3430	DECLINLINE(void) gmmR0FreePrivatePage(PGMM pGMM, PGVM pGVM, uint32_t idPage, PGMMPAGE pPage)
3431	{
3432	PGMMCHUNK pChunk = gmmR0GetChunk(pGMM, idPage >> GMM_CHUNKID_SHIFT);
3433	Assert(pChunk);
3434	Assert(pChunk->cFree < GMM_CHUNK_NUM_PAGES);
3435	Assert(pChunk->cPrivate > 0);
3436	Assert(pGMM->cAllocatedPages > 0);
3437
3438	pChunk->cPrivate--;
3439	pGMM->cAllocatedPages--;
3440	gmmR0FreePageWorker(pGMM, pGVM, pChunk, idPage, pPage);
3441	}
3442
3443
3444	/**
3445	* Common worker for GMMR0FreePages and GMMR0BalloonedPages.
3446	*
3447	* @returns VBox status code:
3448	* @retval xxx
3449	*
3450	* @param pGMM Pointer to the GMM instance data.
3451	* @param pGVM Pointer to the VM.
3452	* @param cPages The number of pages to free.
3453	* @param paPages Pointer to the page descriptors.
3454	* @param enmAccount The account this relates to.
3455	*/
3456	static int gmmR0FreePages(PGMM pGMM, PGVM pGVM, uint32_t cPages, PGMMFREEPAGEDESC paPages, GMMACCOUNT enmAccount)
3457	{
3458	/*
3459	* Check that the request isn't impossible with respect to the account status.
3460	*/
3461	switch (enmAccount)
3462	{
3463	case GMMACCOUNT_BASE:
3464	if (RT_UNLIKELY(pGVM->gmm.s.Stats.Allocated.cBasePages < cPages))
3465	{
3466	Log(("gmmR0FreePages: allocated=%#llx cPages=%#x!\n", pGVM->gmm.s.Stats.Allocated.cBasePages, cPages));
3467	return VERR_GMM_ATTEMPT_TO_FREE_TOO_MUCH;
3468	}
3469	break;
3470	case GMMACCOUNT_SHADOW:
3471	if (RT_UNLIKELY(pGVM->gmm.s.Stats.Allocated.cShadowPages < cPages))
3472	{
3473	Log(("gmmR0FreePages: allocated=%#llx cPages=%#x!\n", pGVM->gmm.s.Stats.Allocated.cShadowPages, cPages));
3474	return VERR_GMM_ATTEMPT_TO_FREE_TOO_MUCH;
3475	}
3476	break;
3477	case GMMACCOUNT_FIXED:
3478	if (RT_UNLIKELY(pGVM->gmm.s.Stats.Allocated.cFixedPages < cPages))
3479	{
3480	Log(("gmmR0FreePages: allocated=%#llx cPages=%#x!\n", pGVM->gmm.s.Stats.Allocated.cFixedPages, cPages));
3481	return VERR_GMM_ATTEMPT_TO_FREE_TOO_MUCH;
3482	}
3483	break;
3484	default:
3485	AssertMsgFailedReturn(("enmAccount=%d\n", enmAccount), VERR_IPE_NOT_REACHED_DEFAULT_CASE);
3486	}
3487
3488	/*
3489	* Walk the descriptors and free the pages.
3490	*
3491	* Statistics (except the account) are being updated as we go along,
3492	* unlike the alloc code. Also, stop on the first error.
3493	*/
3494	int rc = VINF_SUCCESS;
3495	uint32_t iPage;
3496	for (iPage = 0; iPage < cPages; iPage++)
3497	{
3498	uint32_t idPage = paPages[iPage].idPage;
3499	PGMMPAGE pPage = gmmR0GetPage(pGMM, idPage);
3500	if (RT_LIKELY(pPage))
3501	{
3502	if (RT_LIKELY(GMM_PAGE_IS_PRIVATE(pPage)))
3503	{
3504	if (RT_LIKELY(pPage->Private.hGVM == pGVM->hSelf))
3505	{
3506	Assert(pGVM->gmm.s.Stats.cPrivatePages);
3507	pGVM->gmm.s.Stats.cPrivatePages--;
3508	gmmR0FreePrivatePage(pGMM, pGVM, idPage, pPage);
3509	}
3510	else
3511	{
3512	Log(("gmmR0FreePages: #%#x/%#x: not owner! hGVM=%#x hSelf=%#x\n", iPage, idPage,
3513	pPage->Private.hGVM, pGVM->hSelf));
3514	rc = VERR_GMM_NOT_PAGE_OWNER;
3515	break;
3516	}
3517	}
3518	else if (RT_LIKELY(GMM_PAGE_IS_SHARED(pPage)))
3519	{
3520	Assert(pGVM->gmm.s.Stats.cSharedPages);
3521	Assert(pPage->Shared.cRefs);
3522	#if defined(VBOX_WITH_PAGE_SHARING) && defined(VBOX_STRICT) && HC_ARCH_BITS == 64
3523	if (pPage->Shared.u14Checksum)
3524	{
3525	uint32_t uChecksum = gmmR0StrictPageChecksum(pGMM, pGVM, idPage);
3526	uChecksum &= UINT32_C(0x00003fff);
3527	AssertMsg(!uChecksum || uChecksum == pPage->Shared.u14Checksum,
3528	("%#x vs %#x - idPage=%#x\n", uChecksum, pPage->Shared.u14Checksum, idPage));
3529	}
3530	#endif
3531	pGVM->gmm.s.Stats.cSharedPages--;
3532	if (!--pPage->Shared.cRefs)
3533	gmmR0FreeSharedPage(pGMM, pGVM, idPage, pPage);
3534	else
3535	{
3536	Assert(pGMM->cDuplicatePages);
3537	pGMM->cDuplicatePages--;
3538	}
3539	}
3540	else
3541	{
3542	Log(("gmmR0FreePages: #%#x/%#x: already free!\n", iPage, idPage));
3543	rc = VERR_GMM_PAGE_ALREADY_FREE;
3544	break;
3545	}
3546	}
3547	else
3548	{
3549	Log(("gmmR0FreePages: #%#x/%#x: not found!\n", iPage, idPage));
3550	rc = VERR_GMM_PAGE_NOT_FOUND;
3551	break;
3552	}
3553	paPages[iPage].idPage = NIL_GMM_PAGEID;
3554	}
3555
3556	/*
3557	* Update the account.
3558	*/
3559	switch (enmAccount)
3560	{
3561	case GMMACCOUNT_BASE: pGVM->gmm.s.Stats.Allocated.cBasePages -= iPage; break;
3562	case GMMACCOUNT_SHADOW: pGVM->gmm.s.Stats.Allocated.cShadowPages -= iPage; break;
3563	case GMMACCOUNT_FIXED: pGVM->gmm.s.Stats.Allocated.cFixedPages -= iPage; break;
3564	default:
3565	AssertMsgFailedReturn(("enmAccount=%d\n", enmAccount), VERR_IPE_NOT_REACHED_DEFAULT_CASE);
3566	}
3567
3568	/*
3569	* Any threshold stuff to be done here?
3570	*/
3571
3572	return rc;
3573	}
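The worker above stops at the first bad descriptor but still charges the account for the pages it did free, because the loop index doubles as the count of freed pages. The following is a minimal sketch of that pattern under simplified assumptions. The status constants, `FreeResult`, `freePages` and the boolean validity array are hypothetical stand-ins for the real page lookups.

```cpp
#include <cstdint>

// Hypothetical status codes standing in for the VERR_GMM_* values.
constexpr int kOk          = 0;
constexpr int kErrTooMuch  = -1;
constexpr int kErrNotFound = -2;

struct FreeResult
{
    int      rc;              // first error hit, or kOk
    uint32_t cFreed;          // pages freed before stopping
    uint64_t cAllocatedAfter; // account balance after the update
};

// paValid[i] stands in for "the page lookup succeeded and the page could be freed".
constexpr FreeResult freePages(uint64_t cAllocated, const bool *paValid, uint32_t cPages)
{
    // Refuse requests that would free more than the account holds.
    if (cAllocated < cPages)
        return {kErrTooMuch, 0, cAllocated};

    int      rc    = kOk;
    uint32_t iPage = 0;
    for (; iPage < cPages; iPage++)
    {
        if (!paValid[iPage])
        {
            rc = kErrNotFound;  // stop on the first error
            break;
        }
    }

    // iPage equals the number of pages freed, even on the error path.
    return {rc, iPage, cAllocated - iPage};
}

// Sample input: the third descriptor is bad.
constexpr bool kDemoValid[] = {true, true, false, true};
```

With the sample input, a four-page request against a balance of 10 frees two pages, reports the error, and leaves a balance of 8.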
3574
3575
3576	/**
3577	* Free one or more pages.
3578	*
3579	* This is typically used at reset time or power off.
3580	*
3581	* @returns VBox status code:
3582	* @retval xxx
3583	*
3584	* @param pGVM The global (ring-0) VM structure.
3585	* @param idCpu The VCPU id.
3586	* @param cPages The number of pages to free.
3587	* @param paPages Pointer to the page descriptors containing the page IDs
3588	* for each page.
3589	* @param enmAccount The account this relates to.
3590	* @thread EMT.
3591	*/
3592	GMMR0DECL(int) GMMR0FreePages(PGVM pGVM, VMCPUID idCpu, uint32_t cPages, PGMMFREEPAGEDESC paPages, GMMACCOUNT enmAccount)
3593	{
3594	LogFlow(("GMMR0FreePages: pGVM=%p cPages=%#x paPages=%p enmAccount=%d\n", pGVM, cPages, paPages, enmAccount));
3595
3596	/*
3597	* Validate input and get the basics.
3598	*/
3599	PGMM pGMM;
3600	GMM_GET_VALID_INSTANCE(pGMM, VERR_GMM_INSTANCE);
3601	int rc = GVMMR0ValidateGVMandEMT(pGVM, idCpu);
3602	if (RT_FAILURE(rc))
3603	return rc;
3604
3605	AssertPtrReturn(paPages, VERR_INVALID_PARAMETER);
3606	AssertMsgReturn(enmAccount > GMMACCOUNT_INVALID && enmAccount < GMMACCOUNT_END, ("%d\n", enmAccount), VERR_INVALID_PARAMETER);
3607	AssertMsgReturn(cPages > 0 && cPages < RT_BIT(32 - PAGE_SHIFT), ("%#x\n", cPages), VERR_INVALID_PARAMETER);
3608
3609	for (unsigned iPage = 0; iPage < cPages; iPage++)
3610	AssertMsgReturn( paPages[iPage].idPage <= GMM_PAGEID_LAST
3611	/*|| paPages[iPage].idPage == NIL_GMM_PAGEID*/,
3612	("#%#x: %#x\n", iPage, paPages[iPage].idPage), VERR_INVALID_PARAMETER);
3613
3614	/*
3615	* Take the semaphore and call the worker function.
3616	*/
3617	gmmR0MutexAcquire(pGMM);
3618	if (GMM_CHECK_SANITY_UPON_ENTERING(pGMM))
3619	{
3620	rc = gmmR0FreePages(pGMM, pGVM, cPages, paPages, enmAccount);
3621	GMM_CHECK_SANITY_UPON_LEAVING(pGMM);
3622	}
3623	else
3624	rc = VERR_GMM_IS_NOT_SANE;
3625	gmmR0MutexRelease(pGMM);
3626	LogFlow(("GMMR0FreePages: returns %Rrc\n", rc));
3627	return rc;
3628	}
3629
3630
3631	/**
3632	* VMMR0 request wrapper for GMMR0FreePages.
3633	*
3634	* @returns see GMMR0FreePages.
3635	* @param pGVM The global (ring-0) VM structure.
3636	* @param idCpu The VCPU id.
3637	* @param pReq Pointer to the request packet.
3638	*/
3639	GMMR0DECL(int) GMMR0FreePagesReq(PGVM pGVM, VMCPUID idCpu, PGMMFREEPAGESREQ pReq)
3640	{
3641	/*
3642	* Validate input and pass it on.
3643	*/
3644	AssertPtrReturn(pReq, VERR_INVALID_POINTER);
3645	AssertMsgReturn(pReq->Hdr.cbReq >= RT_UOFFSETOF(GMMFREEPAGESREQ, aPages[0]),
3646	("%#x < %#x\n", pReq->Hdr.cbReq, RT_UOFFSETOF(GMMFREEPAGESREQ, aPages[0])),
3647	VERR_INVALID_PARAMETER);
3648	AssertMsgReturn(pReq->Hdr.cbReq == RT_UOFFSETOF_DYN(GMMFREEPAGESREQ, aPages[pReq->cPages]),
3649	("%#x != %#x\n", pReq->Hdr.cbReq, RT_UOFFSETOF_DYN(GMMFREEPAGESREQ, aPages[pReq->cPages])),
3650	VERR_INVALID_PARAMETER);
3651
3652	return GMMR0FreePages(pGVM, idCpu, pReq->cPages, &pReq->aPages[0], pReq->enmAccount);
3653	}
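The request wrapper above validates a variable-length packet twice: the size must cover the fixed part, and it must exactly match the size implied by `cPages`. The following is a minimal sketch of that check using standard `offsetof` rather than the IPRT macros. The `ToyHdr`, `ToyDesc` and `ToyReq` layouts are assumptions made for illustration and are not the real request structures.

```cpp
#include <cstddef>
#include <cstdint>

// Hypothetical, simplified request layout (all names are stand-ins).
struct ToyHdr  { uint32_t u32Magic; uint32_t cbReq; };
struct ToyDesc { uint32_t idPage; };
struct ToyReq
{
    ToyHdr   Hdr;
    uint32_t enmAccount;
    uint32_t cPages;
    ToyDesc  aPages[1];   // variable-length tail, declared with one element
};

// Size a well-formed request must have for a given descriptor count.
constexpr size_t expectedSize(uint32_t cPages)
{
    return offsetof(ToyReq, aPages) + size_t(cPages) * sizeof(ToyDesc);
}

// Both checks from the wrapper: the fixed part is present, and the total
// size agrees exactly with the descriptor count claimed in the packet.
constexpr bool isSizeValid(uint32_t cbReq, uint32_t cPages)
{
    return cbReq >= offsetof(ToyReq, aPages)
        && cbReq == expectedSize(cPages);
}
```

Checking the fixed part first means `cPages` is only trusted once the header is known to be large enough to contain it.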
3654
3655
3656	/**
3657	* Report back on a memory ballooning request.
3658	*
3659	* The request may or may not have been initiated by the GMM. If it was initiated
3660	* by the GMM it is important that this function is called even if no pages were
3661	* ballooned.
3662	*
3663	* @returns VBox status code:
3664	* @retval VERR_GMM_ATTEMPT_TO_FREE_TOO_MUCH
3665	* @retval VERR_GMM_ATTEMPT_TO_DEFLATE_TOO_MUCH
3666	* @retval VERR_GMM_OVERCOMMITTED_TRY_AGAIN_IN_A_BIT - reset condition
3667	* indicating that we won't necessarily have sufficient RAM to boot
3668	* the VM again and that it should pause until this changes (we'll try to
3669	* balloon some other VM). (For standard deflate we have little choice
3670	* but to hope the VM won't use the memory that was returned to it.)
3671	*
3672	* @param pGVM The global (ring-0) VM structure.
3673	* @param idCpu The VCPU id.
3674	* @param enmAction Inflate/deflate/reset.
3675	* @param cBalloonedPages The number of pages that were ballooned.
3676	*
3677	* @thread EMT(idCpu)
3678	*/
3679	GMMR0DECL(int) GMMR0BalloonedPages(PGVM pGVM, VMCPUID idCpu, GMMBALLOONACTION enmAction, uint32_t cBalloonedPages)
3680	{
3681	LogFlow(("GMMR0BalloonedPages: pGVM=%p enmAction=%d cBalloonedPages=%#x\n",
3682	pGVM, enmAction, cBalloonedPages));
3683
3684	AssertMsgReturn(cBalloonedPages < RT_BIT(32 - PAGE_SHIFT), ("%#x\n", cBalloonedPages), VERR_INVALID_PARAMETER);
3685
3686	/*
3687	* Validate input and get the basics.
3688	*/
3689	PGMM pGMM;
3690	GMM_GET_VALID_INSTANCE(pGMM, VERR_GMM_INSTANCE);
3691	int rc = GVMMR0ValidateGVMandEMT(pGVM, idCpu);
3692	if (RT_FAILURE(rc))
3693	return rc;
3694
3695	/*
3696	* Take the semaphore and do some more validations.
3697	*/
3698	gmmR0MutexAcquire(pGMM);
3699	if (GMM_CHECK_SANITY_UPON_ENTERING(pGMM))
3700	{
3701	switch (enmAction)
3702	{
3703	case GMMBALLOONACTION_INFLATE:
3704	{
3705	if (RT_LIKELY(pGVM->gmm.s.Stats.Allocated.cBasePages + pGVM->gmm.s.Stats.cBalloonedPages + cBalloonedPages
3706	<= pGVM->gmm.s.Stats.Reserved.cBasePages))
3707	{
3708	/*
3709	* Record the ballooned memory.
3710	*/
3711	pGMM->cBalloonedPages += cBalloonedPages;
3712	if (pGVM->gmm.s.Stats.cReqBalloonedPages)
3713	{
3714	/* Codepath never taken. Might be interesting in the future to request ballooned memory from guests in low memory conditions.. */
3715	AssertFailed();
3716
3717	pGVM->gmm.s.Stats.cBalloonedPages += cBalloonedPages;
3718	pGVM->gmm.s.Stats.cReqActuallyBalloonedPages += cBalloonedPages;
3719	Log(("GMMR0BalloonedPages: +%#x - Global=%#llx / VM: Total=%#llx Req=%#llx Actual=%#llx (pending)\n",
3720	cBalloonedPages, pGMM->cBalloonedPages, pGVM->gmm.s.Stats.cBalloonedPages,
3721	pGVM->gmm.s.Stats.cReqBalloonedPages, pGVM->gmm.s.Stats.cReqActuallyBalloonedPages));
3722	}
3723	else
3724	{
3725	pGVM->gmm.s.Stats.cBalloonedPages += cBalloonedPages;
3726	Log(("GMMR0BalloonedPages: +%#x - Global=%#llx / VM: Total=%#llx (user)\n",
3727	cBalloonedPages, pGMM->cBalloonedPages, pGVM->gmm.s.Stats.cBalloonedPages));
3728	}
3729	}
3730	else
3731	{
3732	Log(("GMMR0BalloonedPages: cBasePages=%#llx Total=%#llx cBalloonedPages=%#x Reserved=%#llx\n",
3733	pGVM->gmm.s.Stats.Allocated.cBasePages, pGVM->gmm.s.Stats.cBalloonedPages, cBalloonedPages,
3734	pGVM->gmm.s.Stats.Reserved.cBasePages));
3735	rc = VERR_GMM_ATTEMPT_TO_FREE_TOO_MUCH;
3736	}
3737	break;
3738	}
3739
3740	case GMMBALLOONACTION_DEFLATE:
3741	{
3742	/* Deflate. */
3743	if (pGVM->gmm.s.Stats.cBalloonedPages >= cBalloonedPages)
3744	{
3745	/*
3746	* Record the ballooned memory.
3747	*/
3748	Assert(pGMM->cBalloonedPages >= cBalloonedPages);
3749	pGMM->cBalloonedPages -= cBalloonedPages;
3750	pGVM->gmm.s.Stats.cBalloonedPages -= cBalloonedPages;
3751	if (pGVM->gmm.s.Stats.cReqDeflatePages)
3752	{
3753	AssertFailed(); /* This path is for later. */
3754	Log(("GMMR0BalloonedPages: -%#x - Global=%#llx / VM: Total=%#llx Req=%#llx\n",
3755	cBalloonedPages, pGMM->cBalloonedPages, pGVM->gmm.s.Stats.cBalloonedPages, pGVM->gmm.s.Stats.cReqDeflatePages));
3756
3757	/*
3758	* Anything we need to do here now when the request has been completed?
3759	*/
3760	pGVM->gmm.s.Stats.cReqDeflatePages = 0;
3761	}
3762	else
3763	Log(("GMMR0BalloonedPages: -%#x - Global=%#llx / VM: Total=%#llx (user)\n",
3764	cBalloonedPages, pGMM->cBalloonedPages, pGVM->gmm.s.Stats.cBalloonedPages));
3765	}
3766	else
3767	{
3768	Log(("GMMR0BalloonedPages: Total=%#llx cBalloonedPages=%#llx\n", pGVM->gmm.s.Stats.cBalloonedPages, cBalloonedPages));
3769	rc = VERR_GMM_ATTEMPT_TO_DEFLATE_TOO_MUCH;
3770	}
3771	break;
3772	}
3773
3774	case GMMBALLOONACTION_RESET:
3775	{
3776	/* Reset to an empty balloon. */
3777	Assert(pGMM->cBalloonedPages >= pGVM->gmm.s.Stats.cBalloonedPages);
3778
3779	pGMM->cBalloonedPages -= pGVM->gmm.s.Stats.cBalloonedPages;
3780	pGVM->gmm.s.Stats.cBalloonedPages = 0;
3781	break;
3782	}
3783
3784	default:
3785	rc = VERR_INVALID_PARAMETER;
3786	break;
3787	}
3788	GMM_CHECK_SANITY_UPON_LEAVING(pGMM);
3789	}
3790	else
3791	rc = VERR_GMM_IS_NOT_SANE;
3792
3793	gmmR0MutexRelease(pGMM);
3794	LogFlow(("GMMR0BalloonedPages: returns %Rrc\n", rc));
3795	return rc;
3796	}
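/*
 * Editor's sketch (not part of GMMR0.cpp): the inflate/deflate bookkeeping of
 * GMMR0BalloonedPages above, reduced to its two range checks. The struct and
 * function names are hypothetical stand-ins for pGVM->gmm.s.Stats and
 * pGMM->cBalloonedPages; locking, logging and the request-tracking branches
 * are deliberately left out.
 *
```cpp
#include <cassert>
#include <cstdint>

// Hypothetical stand-in for the per-VM counters read by GMMR0BalloonedPages.
struct SketchVmStats
{
    uint64_t cAllocatedBasePages;
    uint64_t cReservedBasePages;
    uint64_t cBalloonedPages;
};

// Inflate is accepted only while allocated + ballooned + request stays
// within the reservation; both the global and the per-VM counter grow.
static bool sketchInflate(SketchVmStats &Stats, uint64_t &cGlobalBallooned, uint32_t cPages)
{
    if (Stats.cAllocatedBasePages + Stats.cBalloonedPages + cPages > Stats.cReservedBasePages)
        return false;
    cGlobalBallooned      += cPages;
    Stats.cBalloonedPages += cPages;
    return true;
}

// Deflate is accepted only if the VM has at least that many ballooned pages.
static bool sketchDeflate(SketchVmStats &Stats, uint64_t &cGlobalBallooned, uint32_t cPages)
{
    if (Stats.cBalloonedPages < cPages)
        return false;
    cGlobalBallooned      -= cPages;
    Stats.cBalloonedPages -= cPages;
    return true;
}
```
 *
 * In the real function the failing cases map to
 * VERR_GMM_ATTEMPT_TO_FREE_TOO_MUCH and VERR_GMM_ATTEMPT_TO_DEFLATE_TOO_MUCH.
 */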
3797
3798
3799	/**
3800	* VMMR0 request wrapper for GMMR0BalloonedPages.
3801	*
3802	* @returns see GMMR0BalloonedPages.
3803	* @param pGVM The global (ring-0) VM structure.
3804	* @param idCpu The VCPU id.
3805	* @param pReq Pointer to the request packet.
3806	*/
3807	GMMR0DECL(int) GMMR0BalloonedPagesReq(PGVM pGVM, VMCPUID idCpu, PGMMBALLOONEDPAGESREQ pReq)
3808	{
3809	/*
3810	* Validate input and pass it on.
3811	*/
3812	AssertPtrReturn(pReq, VERR_INVALID_POINTER);
3813	AssertMsgReturn(pReq->Hdr.cbReq == sizeof(GMMBALLOONEDPAGESREQ),
3814	("%#x != %#x\n", pReq->Hdr.cbReq, sizeof(GMMBALLOONEDPAGESREQ)),
3815	VERR_INVALID_PARAMETER);
3816
3817	return GMMR0BalloonedPages(pGVM, idCpu, pReq->enmAction, pReq->cBalloonedPages);
3818	}
3819
3820
3821	/**
3822	* Return memory statistics for the hypervisor
3823	*
3824	* @returns VBox status code.
3825	* @param pReq Pointer to the request packet.
3826	*/
3827	GMMR0DECL(int) GMMR0QueryHypervisorMemoryStatsReq(PGMMMEMSTATSREQ pReq)
3828	{
3829	/*
3830	* Validate input and pass it on.
3831	*/
3832	AssertPtrReturn(pReq, VERR_INVALID_POINTER);
3833	AssertMsgReturn(pReq->Hdr.cbReq == sizeof(GMMMEMSTATSREQ),
3834	("%#x != %#x\n", pReq->Hdr.cbReq, sizeof(GMMMEMSTATSREQ)),
3835	VERR_INVALID_PARAMETER);
3836
3837	/*
3838	* Validate input and get the basics.
3839	*/
3840	PGMM pGMM;
3841	GMM_GET_VALID_INSTANCE(pGMM, VERR_GMM_INSTANCE);
3842	pReq->cAllocPages = pGMM->cAllocatedPages;
3843	pReq->cFreePages = (pGMM->cChunks << (GMM_CHUNK_SHIFT- PAGE_SHIFT)) - pGMM->cAllocatedPages;
3844	pReq->cBalloonedPages = pGMM->cBalloonedPages;
3845	pReq->cMaxPages = pGMM->cMaxPages;
3846	pReq->cSharedPages = pGMM->cDuplicatePages;
3847	GMM_CHECK_SANITY_UPON_LEAVING(pGMM);
3848
3849	return VINF_SUCCESS;
3850	}
3851
3852
3853	/**
3854	* Return memory statistics for the VM
3855	*
3856	* @returns VBox status code.
3857	* @param pGVM The global (ring-0) VM structure.
3858	* @param idCpu Cpu id.
3859	* @param pReq Pointer to the request packet.
3860	*
3861	* @thread EMT(idCpu)
3862	*/
3863	GMMR0DECL(int) GMMR0QueryMemoryStatsReq(PGVM pGVM, VMCPUID idCpu, PGMMMEMSTATSREQ pReq)
3864	{
3865	/*
3866	* Validate input and pass it on.
3867	*/
3868	AssertPtrReturn(pReq, VERR_INVALID_POINTER);
3869	AssertMsgReturn(pReq->Hdr.cbReq == sizeof(GMMMEMSTATSREQ),
3870	("%#x != %#x\n", pReq->Hdr.cbReq, sizeof(GMMMEMSTATSREQ)),
3871	VERR_INVALID_PARAMETER);
3872
3873	/*
3874	* Validate input and get the basics.
3875	*/
3876	PGMM pGMM;
3877	GMM_GET_VALID_INSTANCE(pGMM, VERR_GMM_INSTANCE);
3878	int rc = GVMMR0ValidateGVMandEMT(pGVM, idCpu);
3879	if (RT_FAILURE(rc))
3880	return rc;
3881
3882	/*
3883	* Take the semaphore and do some more validations.
3884	*/
3885	gmmR0MutexAcquire(pGMM);
3886	if (GMM_CHECK_SANITY_UPON_ENTERING(pGMM))
3887	{
3888	pReq->cAllocPages = pGVM->gmm.s.Stats.Allocated.cBasePages;
3889	pReq->cBalloonedPages = pGVM->gmm.s.Stats.cBalloonedPages;
3890	pReq->cMaxPages = pGVM->gmm.s.Stats.Reserved.cBasePages;
3891	pReq->cFreePages = pReq->cMaxPages - pReq->cAllocPages;
3892	}
3893	else
3894	rc = VERR_GMM_IS_NOT_SANE;
3895
3896	gmmR0MutexRelease(pGMM);
3897	LogFlow(("GMMR0QueryMemoryStatsReq: returns %Rrc\n", rc));
3898	return rc;
3899	}
3900
3901
3902	/**
3903	* Worker for gmmR0UnmapChunk and gmmR0FreeChunk.
3904	*
3905	* Don't call this in legacy allocation mode!
3906	*
3907	* @returns VBox status code.
3908	* @param pGMM Pointer to the GMM instance data.
3909	* @param pGVM Pointer to the Global VM structure.
3910	* @param pChunk Pointer to the chunk to be unmapped.
3911	*/
3912	static int gmmR0UnmapChunkLocked(PGMM pGMM, PGVM pGVM, PGMMCHUNK pChunk)
3913	{
3914	Assert(!pGMM->fLegacyAllocationMode); NOREF(pGMM);
3915
3916	/*
3917	* Find the mapping and try unmapping it.
3918	*/
3919	uint32_t cMappings = pChunk->cMappingsX;
3920	for (uint32_t i = 0; i < cMappings; i++)
3921	{
3922	Assert(pChunk->paMappingsX[i].pGVM && pChunk->paMappingsX[i].hMapObj != NIL_RTR0MEMOBJ);
3923	if (pChunk->paMappingsX[i].pGVM == pGVM)
3924	{
3925	/* unmap */
3926	int rc = RTR0MemObjFree(pChunk->paMappingsX[i].hMapObj, false /* fFreeMappings (NA) */);
3927	if (RT_SUCCESS(rc))
3928	{
3929	/* update the record. */
3930	cMappings--;
3931	if (i < cMappings)
3932	pChunk->paMappingsX[i] = pChunk->paMappingsX[cMappings];
3933	pChunk->paMappingsX[cMappings].hMapObj = NIL_RTR0MEMOBJ;
3934	pChunk->paMappingsX[cMappings].pGVM = NULL;
3935	Assert(pChunk->cMappingsX - 1U == cMappings);
3936	pChunk->cMappingsX = cMappings;
3937	}
3938
3939	return rc;
3940	}
3941	}
3942
3943	Log(("gmmR0UnmapChunk: Chunk %#x is not mapped into pGVM=%p/%#x\n", pChunk->Core.Key, pGVM, pGVM->hSelf));
3944	return VERR_GMM_CHUNK_NOT_MAPPED;
3945	}
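/*
 * Editor's sketch (not part of GMMR0.cpp): the "move the last entry into the
 * hole" removal used by gmmR0UnmapChunkLocked above. SketchMapping and
 * sketchRemoveMapping are hypothetical names, and std::vector stands in for
 * the paMappingsX/cMappingsX pair; the order of the remaining entries is not
 * preserved, which is fine because the array is only ever searched linearly.
 *
```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Hypothetical stand-in for a chunk mapping record (owner VM + mapping handle).
struct SketchMapping
{
    int idVm;
    int hMapObj;
};

// Removes the entry owned by idVm. Returns true if one was found.
static bool sketchRemoveMapping(std::vector<SketchMapping> &aMappings, int idVm)
{
    for (size_t i = 0; i < aMappings.size(); i++)
    {
        if (aMappings[i].idVm == idVm)
        {
            // Overwrite the hole with the last entry unless the hole is the last entry.
            if (i + 1 < aMappings.size())
                aMappings[i] = aMappings.back();
            aMappings.pop_back();
            return true;
        }
    }
    return false;
}
```
 *
 * The not-found case corresponds to VERR_GMM_CHUNK_NOT_MAPPED in the real code.
 */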
3946
3947
3948	/**
3949	* Unmaps a chunk previously mapped into the address space of the current process.
3950	*
3951	* @returns VBox status code.
3952	* @param pGMM Pointer to the GMM instance data.
3953	* @param pGVM Pointer to the Global VM structure.
3954	* @param pChunk Pointer to the chunk to be unmapped.
3955	* @param fRelaxedSem Whether we can release the semaphore while doing the
3956	* unmapping (@c true) or not.
3957	*/
3958	static int gmmR0UnmapChunk(PGMM pGMM, PGVM pGVM, PGMMCHUNK pChunk, bool fRelaxedSem)
3959	{
3960	if (!pGMM->fLegacyAllocationMode)
3961	{
3962	/*
3963	* Lock the chunk and if possible leave the giant GMM lock.
3964	*/
3965	GMMR0CHUNKMTXSTATE MtxState;
3966	int rc = gmmR0ChunkMutexAcquire(&MtxState, pGMM, pChunk,
3967	fRelaxedSem ? GMMR0CHUNK_MTX_RETAKE_GIANT : GMMR0CHUNK_MTX_KEEP_GIANT);
3968	if (RT_SUCCESS(rc))
3969	{
3970	rc = gmmR0UnmapChunkLocked(pGMM, pGVM, pChunk);
3971	gmmR0ChunkMutexRelease(&MtxState, pChunk);
3972	}
3973	return rc;
3974	}
3975
3976	if (pChunk->hGVM == pGVM->hSelf)
3977	return VINF_SUCCESS;
3978
3979	Log(("gmmR0UnmapChunk: Chunk %#x is not mapped into pGVM=%p/%#x (legacy)\n", pChunk->Core.Key, pGVM, pGVM->hSelf));
3980	return VERR_GMM_CHUNK_NOT_MAPPED;
3981	}
3982
3983
3984	/**
3985	* Worker for gmmR0MapChunk.
3986	*
3987	* @returns VBox status code.
3988	* @param pGMM Pointer to the GMM instance data.
3989	* @param pGVM Pointer to the Global VM structure.
3990	* @param pChunk Pointer to the chunk to be mapped.
3991	* @param ppvR3 Where to store the ring-3 address of the mapping.
3992	* In the VERR_GMM_CHUNK_ALREADY_MAPPED case, this will
3993	* contain the address of the existing mapping.
3994	*/
3995	static int gmmR0MapChunkLocked(PGMM pGMM, PGVM pGVM, PGMMCHUNK pChunk, PRTR3PTR ppvR3)
3996	{
3997	/*
3998	* If we're in legacy mode this is simple.
3999	*/
4000	if (pGMM->fLegacyAllocationMode)
4001	{
4002	if (pChunk->hGVM != pGVM->hSelf)
4003	{
4004	Log(("gmmR0MapChunk: chunk %#x is not owned by pGVM=%p/%#x (legacy)!\n", pChunk->Core.Key, pGVM, pGVM->hSelf));
4005	return VERR_GMM_CHUNK_NOT_FOUND;
4006	}
4007
4008	*ppvR3 = RTR0MemObjAddressR3(pChunk->hMemObj);
4009	return VINF_SUCCESS;
4010	}
4011
4012	/*
4013	* Check to see if the chunk is already mapped.
4014	*/
4015	for (uint32_t i = 0; i < pChunk->cMappingsX; i++)
4016	{
4017	Assert(pChunk->paMappingsX[i].pGVM && pChunk->paMappingsX[i].hMapObj != NIL_RTR0MEMOBJ);
4018	if (pChunk->paMappingsX[i].pGVM == pGVM)
4019	{
4020	*ppvR3 = RTR0MemObjAddressR3(pChunk->paMappingsX[i].hMapObj);
4021	Log(("gmmR0MapChunk: chunk %#x is already mapped at %p!\n", pChunk->Core.Key, *ppvR3));
4022	#ifdef VBOX_WITH_PAGE_SHARING
4023	/* The ring-3 chunk cache can be out of sync; don't fail. */
4024	return VINF_SUCCESS;
4025	#else
4026	return VERR_GMM_CHUNK_ALREADY_MAPPED;
4027	#endif
4028	}
4029	}
4030
4031	/*
4032	* Do the mapping.
4033	*/
4034	RTR0MEMOBJ hMapObj;
4035	int rc = RTR0MemObjMapUser(&hMapObj, pChunk->hMemObj, (RTR3PTR)-1, 0, RTMEM_PROT_READ | RTMEM_PROT_WRITE, NIL_RTR0PROCESS);
4036	if (RT_SUCCESS(rc))
4037	{
4038	/* reallocate the array? assumes few users per chunk (usually one). */
4039	unsigned iMapping = pChunk->cMappingsX;
4040	if ( iMapping <= 3
4041	|| (iMapping & 3) == 0)
4042	{
4043	unsigned cNewSize = iMapping <= 3
4044	? iMapping + 1
4045	: iMapping + 4;
4046	Assert(cNewSize < 4 || RT_ALIGN_32(cNewSize, 4) == cNewSize);
4047	if (RT_UNLIKELY(cNewSize > UINT16_MAX))
4048	{
4049	rc = RTR0MemObjFree(hMapObj, false /* fFreeMappings (NA) */); AssertRC(rc);
4050	return VERR_GMM_TOO_MANY_CHUNK_MAPPINGS;
4051	}
4052
4053	void *pvMappings = RTMemRealloc(pChunk->paMappingsX, cNewSize * sizeof(pChunk->paMappingsX[0]));
4054	if (RT_UNLIKELY(!pvMappings))
4055	{
4056	rc = RTR0MemObjFree(hMapObj, false /* fFreeMappings (NA) */); AssertRC(rc);
4057	return VERR_NO_MEMORY;
4058	}
4059	pChunk->paMappingsX = (PGMMCHUNKMAP)pvMappings;
4060	}
4061
4062	/* insert new entry */
4063	pChunk->paMappingsX[iMapping].hMapObj = hMapObj;
4064	pChunk->paMappingsX[iMapping].pGVM = pGVM;
4065	Assert(pChunk->cMappingsX == iMapping);
4066	pChunk->cMappingsX = iMapping + 1;
4067
4068	*ppvR3 = RTR0MemObjAddressR3(hMapObj);
4069	}
4070
4071	return rc;
4072	}
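/*
 * Editor's sketch (not part of GMMR0.cpp): the array growth policy used by
 * gmmR0MapChunkLocked above, isolated as a standalone function. The name
 * sketchNewMappingCapacity is hypothetical; only the arithmetic is taken from
 * the code above.
 *
```cpp
#include <cassert>
#include <cstdint>

// Returns the capacity to reallocate to before inserting entry number
// cMappings, or 0 when the current allocation still has room.
// Up to four entries the array grows one entry at a time; after that it
// grows four entries at a time, and only when cMappings is a multiple of 4.
static uint32_t sketchNewMappingCapacity(uint32_t cMappings)
{
    if (cMappings <= 3)
        return cMappings + 1;
    if ((cMappings & 3) == 0)
        return cMappings + 4;
    return 0;
}
```
 *
 * This keeps the common case (one user per chunk) at a single-entry
 * allocation while bounding the number of reallocations for heavily shared
 * chunks.
 */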
4073
4074
4075	/**
4076	* Maps a chunk into the user address space of the current process.
4077	*
4078	* @returns VBox status code.
4079	* @param pGMM Pointer to the GMM instance data.
4080	* @param pGVM Pointer to the Global VM structure.
4081	* @param pChunk Pointer to the chunk to be mapped.
4082	* @param fRelaxedSem Whether we can release the semaphore while doing the
4083	* mapping (@c true) or not.
4084	* @param ppvR3 Where to store the ring-3 address of the mapping.
4085	* In the VERR_GMM_CHUNK_ALREADY_MAPPED case, this will
4086	* contain the address of the existing mapping.
4087	*/
4088	static int gmmR0MapChunk(PGMM pGMM, PGVM pGVM, PGMMCHUNK pChunk, bool fRelaxedSem, PRTR3PTR ppvR3)
4089	{
4090	/*
4091	* Take the chunk lock and leave the giant GMM lock when possible, then
4092	* call the worker function.
4093	*/
4094	GMMR0CHUNKMTXSTATE MtxState;
4095	int rc = gmmR0ChunkMutexAcquire(&MtxState, pGMM, pChunk,
4096	fRelaxedSem ? GMMR0CHUNK_MTX_RETAKE_GIANT : GMMR0CHUNK_MTX_KEEP_GIANT);
4097	if (RT_SUCCESS(rc))
4098	{
4099	rc = gmmR0MapChunkLocked(pGMM, pGVM, pChunk, ppvR3);
4100	gmmR0ChunkMutexRelease(&MtxState, pChunk);
4101	}
4102
4103	return rc;
4104	}
4105
4106
4107
4108	#if defined(VBOX_WITH_PAGE_SHARING) || (defined(VBOX_STRICT) && HC_ARCH_BITS == 64)
4109	/**
4110	* Check if a chunk is mapped into the specified VM
4111	*
4112	* @returns mapped yes/no
4113	* @param pGMM Pointer to the GMM instance.
4114	* @param pGVM Pointer to the Global VM structure.
4115	* @param pChunk Pointer to the chunk to be mapped.
4116	* @param ppvR3 Where to store the ring-3 address of the mapping.
4117	*/
4118	static bool gmmR0IsChunkMapped(PGMM pGMM, PGVM pGVM, PGMMCHUNK pChunk, PRTR3PTR ppvR3)
4119	{
4120	GMMR0CHUNKMTXSTATE MtxState;
4121	gmmR0ChunkMutexAcquire(&MtxState, pGMM, pChunk, GMMR0CHUNK_MTX_KEEP_GIANT);
4122	for (uint32_t i = 0; i < pChunk->cMappingsX; i++)
4123	{
4124	Assert(pChunk->paMappingsX[i].pGVM && pChunk->paMappingsX[i].hMapObj != NIL_RTR0MEMOBJ);
4125	if (pChunk->paMappingsX[i].pGVM == pGVM)
4126	{
4127	*ppvR3 = RTR0MemObjAddressR3(pChunk->paMappingsX[i].hMapObj);
4128	gmmR0ChunkMutexRelease(&MtxState, pChunk);
4129	return true;
4130	}
4131	}
4132	*ppvR3 = NULL;
4133	gmmR0ChunkMutexRelease(&MtxState, pChunk);
4134	return false;
4135	}
4136	#endif /* VBOX_WITH_PAGE_SHARING || (VBOX_STRICT && 64-BIT) */
4137
4138
4139	/**
4140	* Map a chunk and/or unmap another chunk.
4141	*
4142	* The mapping and unmapping applies to the current process.
4143	*
4144	* This API does two things because it saves a kernel call per mapping
4145	* when the ring-3 mapping cache is full.
4146	*
4147	* @returns VBox status code.
4148	* @param pGVM The global (ring-0) VM structure.
4149	* @param idChunkMap The chunk to map. NIL_GMM_CHUNKID if nothing to map.
4150	* @param idChunkUnmap The chunk to unmap. NIL_GMM_CHUNKID if nothing to unmap.
4151	* @param ppvR3 Where to store the address of the mapped chunk. NULL is ok if nothing to map.
4152	* @thread EMT ???
4153	*/
4154	GMMR0DECL(int) GMMR0MapUnmapChunk(PGVM pGVM, uint32_t idChunkMap, uint32_t idChunkUnmap, PRTR3PTR ppvR3)
4155	{
4156	LogFlow(("GMMR0MapUnmapChunk: pGVM=%p idChunkMap=%#x idChunkUnmap=%#x ppvR3=%p\n",
4157	pGVM, idChunkMap, idChunkUnmap, ppvR3));
4158
4159	/*
4160	* Validate input and get the basics.
4161	*/
4162	PGMM pGMM;
4163	GMM_GET_VALID_INSTANCE(pGMM, VERR_GMM_INSTANCE);
4164	int rc = GVMMR0ValidateGVM(pGVM);
4165	if (RT_FAILURE(rc))
4166	return rc;
4167
4168	AssertCompile(NIL_GMM_CHUNKID == 0);
4169	AssertMsgReturn(idChunkMap <= GMM_CHUNKID_LAST, ("%#x\n", idChunkMap), VERR_INVALID_PARAMETER);
4170	AssertMsgReturn(idChunkUnmap <= GMM_CHUNKID_LAST, ("%#x\n", idChunkUnmap), VERR_INVALID_PARAMETER);
4171
4172	if ( idChunkMap == NIL_GMM_CHUNKID
4173	&& idChunkUnmap == NIL_GMM_CHUNKID)
4174	return VERR_INVALID_PARAMETER;
4175
4176	if (idChunkMap != NIL_GMM_CHUNKID)
4177	{
4178	AssertPtrReturn(ppvR3, VERR_INVALID_POINTER);
4179	*ppvR3 = NIL_RTR3PTR;
4180	}
4181
4182	/*
4183	* Take the semaphore and do the work.
4184	*
4185	* The unmapping is done last since it's easier to undo a mapping than
4186	* undoing an unmapping. The ring-3 mapping cache cannot be so big
4187	* that it pushes the user virtual address space to within a chunk of
4188	* its limits, so, no problem here.
4189	*/
4190	gmmR0MutexAcquire(pGMM);
4191	if (GMM_CHECK_SANITY_UPON_ENTERING(pGMM))
4192	{
4193	PGMMCHUNK pMap = NULL;
4194	if (idChunkMap != NIL_GMM_CHUNKID)
4195	{
4196	pMap = gmmR0GetChunk(pGMM, idChunkMap);
4197	if (RT_LIKELY(pMap))
4198	rc = gmmR0MapChunk(pGMM, pGVM, pMap, true /*fRelaxedSem*/, ppvR3);
4199	else
4200	{
4201	Log(("GMMR0MapUnmapChunk: idChunkMap=%#x\n", idChunkMap));
4202	rc = VERR_GMM_CHUNK_NOT_FOUND;
4203	}
4204	}
4205	/** @todo split this operation, the bail out might (theoretically) not be
4206	* entirely safe. */
4207
4208	if ( idChunkUnmap != NIL_GMM_CHUNKID
4209	&& RT_SUCCESS(rc))
4210	{
4211	PGMMCHUNK pUnmap = gmmR0GetChunk(pGMM, idChunkUnmap);
4212	if (RT_LIKELY(pUnmap))
4213	rc = gmmR0UnmapChunk(pGMM, pGVM, pUnmap, true /*fRelaxedSem*/);
4214	else
4215	{
4216	Log(("GMMR0MapUnmapChunk: idChunkUnmap=%#x\n", idChunkUnmap));
4217	rc = VERR_GMM_CHUNK_NOT_FOUND;
4218	}
4219
4220	if (RT_FAILURE(rc) && pMap)
4221	gmmR0UnmapChunk(pGMM, pGVM, pMap, false /*fRelaxedSem*/);
4222	}
4223
4224	GMM_CHECK_SANITY_UPON_LEAVING(pGMM);
4225	}
4226	else
4227	rc = VERR_GMM_IS_NOT_SANE;
4228	gmmR0MutexRelease(pGMM);
4229
4230	LogFlow(("GMMR0MapUnmapChunk: returns %Rrc\n", rc));
4231	return rc;
4232	}
4233
4234
4235	/**
4236	* VMMR0 request wrapper for GMMR0MapUnmapChunk.
4237	*
4238	* @returns see GMMR0MapUnmapChunk.
4239	* @param pGVM The global (ring-0) VM structure.
4240	* @param pReq Pointer to the request packet.
4241	*/
4242	GMMR0DECL(int) GMMR0MapUnmapChunkReq(PGVM pGVM, PGMMMAPUNMAPCHUNKREQ pReq)
4243	{
4244	/*
4245	* Validate input and pass it on.
4246	*/
4247	AssertPtrReturn(pReq, VERR_INVALID_POINTER);
4248	AssertMsgReturn(pReq->Hdr.cbReq == sizeof(*pReq), ("%#x != %#x\n", pReq->Hdr.cbReq, sizeof(*pReq)), VERR_INVALID_PARAMETER);
4249
4250	return GMMR0MapUnmapChunk(pGVM, pReq->idChunkMap, pReq->idChunkUnmap, &pReq->pvR3);
4251	}
4252
4253
4254	/**
4255	* Legacy mode API for supplying pages.
4256	*
4257	* The specified user address points to an allocation chunk sized block that
4258	* will be locked down and used by the GMM when the GM asks for pages.
4259	*
4260	* @returns VBox status code.
4261	* @param pGVM The global (ring-0) VM structure.
4262	* @param idCpu The VCPU id.
4263	* @param pvR3 Pointer to the chunk size memory block to lock down.
4264	*/
4265	GMMR0DECL(int) GMMR0SeedChunk(PGVM pGVM, VMCPUID idCpu, RTR3PTR pvR3)
4266	{
4267	/*
4268	* Validate input and get the basics.
4269	*/
4270	PGMM pGMM;
4271	GMM_GET_VALID_INSTANCE(pGMM, VERR_GMM_INSTANCE);
4272	int rc = GVMMR0ValidateGVMandEMT(pGVM, idCpu);
4273	if (RT_FAILURE(rc))
4274	return rc;
4275
4276	AssertPtrReturn(pvR3, VERR_INVALID_POINTER);
4277	AssertReturn(!(PAGE_OFFSET_MASK & pvR3), VERR_INVALID_POINTER);
4278
4279	if (!pGMM->fLegacyAllocationMode)
4280	{
4281	Log(("GMMR0SeedChunk: not in legacy allocation mode!\n"));
4282	return VERR_NOT_SUPPORTED;
4283	}
4284
4285	/*
4286	* Lock the memory and add it as new chunk with our hGVM.
4287	* (The GMM locking is done inside gmmR0RegisterChunk.)
4288	*/
4289	RTR0MEMOBJ hMemObj;
4290	rc = RTR0MemObjLockUser(&hMemObj, pvR3, GMM_CHUNK_SIZE, RTMEM_PROT_READ | RTMEM_PROT_WRITE, NIL_RTR0PROCESS);
4291	if (RT_SUCCESS(rc))
4292	{
4293	rc = gmmR0RegisterChunk(pGMM, &pGVM->gmm.s.Private, hMemObj, pGVM->hSelf, 0 /*fChunkFlags*/, NULL);
4294	if (RT_SUCCESS(rc))
4295	gmmR0MutexRelease(pGMM);
4296	else
4297	RTR0MemObjFree(hMemObj, true /* fFreeMappings */);
4298	}
4299
4300	LogFlow(("GMMR0SeedChunk: rc=%d (pvR3=%p)\n", rc, pvR3));
4301	return rc;
4302	}
4303
4304	#if defined(VBOX_WITH_RAM_IN_KERNEL) && !defined(VBOX_WITH_LINEAR_HOST_PHYS_MEM)
4305
4306	/**
4307	* Gets the ring-0 virtual address for the given page.
4308	*
4309	* @returns VBox status code.
4310	* @param pGVM Pointer to the kernel-only VM instance data.
4311	* @param idPage The page ID.
4312	* @param ppv Where to store the address.
4313	* @thread EMT
4314	*/
4315	GMMR0DECL(int) GMMR0PageIdToVirt(PGVM pGVM, uint32_t idPage, void **ppv)
4316	{
4317	*ppv = NULL;
4318	PGMM pGMM;
4319	GMM_GET_VALID_INSTANCE(pGMM, VERR_GMM_INSTANCE);
4320	gmmR0MutexAcquire(pGMM); /** @todo shared access */
4321
4322	int rc;
4323	PGMMCHUNK pChunk = gmmR0GetChunk(pGMM, idPage >> GMM_CHUNKID_SHIFT);
4324	if (pChunk)
4325	{
4326	const GMMPAGE *pPage = &pChunk->aPages[idPage & GMM_PAGEID_IDX_MASK];
4327	if (RT_LIKELY( ( GMM_PAGE_IS_PRIVATE(pPage)
4328	&& pPage->Private.hGVM == pGVM->hSelf)
4329	|| GMM_PAGE_IS_SHARED(pPage)))
4330	{
4331	AssertPtr(pChunk->pbMapping);
4332	*ppv = &pChunk->pbMapping[(idPage & GMM_PAGEID_IDX_MASK) << PAGE_SHIFT];
4333	rc = VINF_SUCCESS;
4334	}
4335	else
4336	rc = VERR_GMM_NOT_PAGE_OWNER;
4337	}
4338	else
4339	rc = VERR_GMM_PAGE_NOT_FOUND;
4340
4341	gmmR0MutexRelease(pGMM);
4342	return rc;
4343	}
4344
4345	#endif
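/*
 * Editor's sketch (not part of GMMR0.cpp): how GMMR0PageIdToVirt above splits
 * a page ID into a chunk ID and a byte offset into the chunk mapping. The
 * constants are assumptions chosen for illustration (4 KiB pages, 512 pages
 * per chunk); the real values come from PAGE_SHIFT, GMM_CHUNKID_SHIFT and
 * GMM_PAGEID_IDX_MASK. All names starting with "sketch"/"kSketch" are
 * hypothetical.
 *
```cpp
#include <cassert>
#include <cstdint>

// Assumed values for illustration only.
static const uint32_t kSketchPageShift    = 12; // 4 KiB pages
static const uint32_t kSketchChunkIdShift = 9;  // 512 pages per chunk
static const uint32_t kSketchPageIdxMask  = (UINT32_C(1) << kSketchChunkIdShift) - 1;

struct SketchPageLocation
{
    uint32_t idChunk;    // which chunk the page lives in
    uint32_t offInChunk; // byte offset of the page inside that chunk
};

// Upper bits select the chunk, lower bits the page index within it.
static SketchPageLocation sketchLocatePage(uint32_t idPage)
{
    SketchPageLocation Loc;
    Loc.idChunk    = idPage >> kSketchChunkIdShift;
    Loc.offInChunk = (idPage & kSketchPageIdxMask) << kSketchPageShift;
    return Loc;
}
```
 *
 * The real function additionally checks page ownership before handing out
 * the address.
 */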
4346
4347	#ifdef VBOX_WITH_PAGE_SHARING
4348
4349	# ifdef VBOX_STRICT
4350	/**
4351	* For checksumming shared pages in strict builds.
4352	*
4353	* The purpose is making sure that a page doesn't change.
4354	*
4355	* @returns Checksum, 0 on failure.
4356	* @param pGMM The GMM instance data.
4357	* @param pGVM Pointer to the kernel-only VM instance data.
4358	* @param idPage The page ID.
4359	*/
4360	static uint32_t gmmR0StrictPageChecksum(PGMM pGMM, PGVM pGVM, uint32_t idPage)
4361	{
4362	PGMMCHUNK pChunk = gmmR0GetChunk(pGMM, idPage >> GMM_CHUNKID_SHIFT);
4363	AssertMsgReturn(pChunk, ("idPage=%#x\n", idPage), 0);
4364
4365	uint8_t *pbChunk;
4366	if (!gmmR0IsChunkMapped(pGMM, pGVM, pChunk, (PRTR3PTR)&pbChunk))
4367	return 0;
4368	uint8_t const *pbPage = pbChunk + ((idPage & GMM_PAGEID_IDX_MASK) << PAGE_SHIFT);
4369
4370	return RTCrc32(pbPage, PAGE_SIZE);
4371	}
4372	# endif /* VBOX_STRICT */
4373
4374
4375	/**
4376	* Calculates the module hash value.
4377	*
4378	* @returns Hash value.
4379	* @param pszModuleName The module name.
4380	* @param pszVersion The module version string.
4381	*/
4382	static uint32_t gmmR0ShModCalcHash(const char *pszModuleName, const char *pszVersion)
4383	{
4384	return RTStrHash1ExN(3, pszModuleName, RTSTR_MAX, "::", (size_t)2, pszVersion, RTSTR_MAX);
4385	}
4386
4387
4388	/**
4389	* Finds a global module.
4390	*
4391	* @returns Pointer to the global module on success, NULL if not found.
4392	* @param pGMM The GMM instance data.
4393	* @param uHash The hash as calculated by gmmR0ShModCalcHash.
4394	* @param cbModule The module size.
4395	* @param enmGuestOS The guest OS type.
4396	* @param cRegions The number of regions.
4397	* @param pszModuleName The module name.
4398	* @param pszVersion The module version.
4399	* @param paRegions The region descriptions.
4400	*/
4401	static PGMMSHAREDMODULE gmmR0ShModFindGlobal(PGMM pGMM, uint32_t uHash, uint32_t cbModule, VBOXOSFAMILY enmGuestOS,
4402	uint32_t cRegions, const char *pszModuleName, const char *pszVersion,
4403	struct VMMDEVSHAREDREGIONDESC const *paRegions)
4404	{
4405	for (PGMMSHAREDMODULE pGblMod = (PGMMSHAREDMODULE)RTAvllU32Get(&pGMM->pGlobalSharedModuleTree, uHash);
4406	pGblMod;
4407	pGblMod = (PGMMSHAREDMODULE)pGblMod->Core.pList)
4408	{
4409	if (pGblMod->cbModule != cbModule)
4410	continue;
4411	if (pGblMod->enmGuestOS != enmGuestOS)
4412	continue;
4413	if (pGblMod->cRegions != cRegions)
4414	continue;
4415	if (strcmp(pGblMod->szName, pszModuleName))
4416	continue;
4417	if (strcmp(pGblMod->szVersion, pszVersion))
4418	continue;
4419
4420	uint32_t i;
4421	for (i = 0; i < cRegions; i++)
4422	{
4423	uint32_t off = paRegions[i].GCRegionAddr & PAGE_OFFSET_MASK;
4424	if (pGblMod->aRegions[i].off != off)
4425	break;
4426
4427	uint32_t cb = RT_ALIGN_32(paRegions[i].cbRegion + off, PAGE_SIZE);
4428	if (pGblMod->aRegions[i].cb != cb)
4429	break;
4430	}
4431
4432	if (i == cRegions)
4433	return pGblMod;
4434	}
4435
4436	return NULL;
4437	}
4438
4439
4440	/**
4441	* Creates a new global module.
4442	*
4443	* @returns VBox status code.
4444	* @param pGMM The GMM instance data.
4445	* @param uHash The hash as calculated by gmmR0ShModCalcHash.
4446	* @param cbModule The module size.
4447	* @param enmGuestOS The guest OS type.
4448	* @param cRegions The number of regions.
4449	* @param pszModuleName The module name.
4450	* @param pszVersion The module version.
4451	* @param paRegions The region descriptions.
4452	* @param ppGblMod Where to return the new module on success.
4453	*/
4454	static int gmmR0ShModNewGlobal(PGMM pGMM, uint32_t uHash, uint32_t cbModule, VBOXOSFAMILY enmGuestOS,
4455	uint32_t cRegions, const char *pszModuleName, const char *pszVersion,
4456	struct VMMDEVSHAREDREGIONDESC const *paRegions, PGMMSHAREDMODULE *ppGblMod)
4457	{
4458	Log(("gmmR0ShModNewGlobal: %s %s size %#x os %u rgn %u\n", pszModuleName, pszVersion, cbModule, enmGuestOS, cRegions));
4459	if (pGMM->cShareableModules >= GMM_MAX_SHARED_GLOBAL_MODULES)
4460	{
4461	Log(("gmmR0ShModNewGlobal: Too many modules\n"));
4462	return VERR_GMM_TOO_MANY_GLOBAL_MODULES;
4463	}
4464
4465	PGMMSHAREDMODULE pGblMod = (PGMMSHAREDMODULE)RTMemAllocZ(RT_UOFFSETOF_DYN(GMMSHAREDMODULE, aRegions[cRegions]));
4466	if (!pGblMod)
4467	{
4468	Log(("gmmR0ShModNewGlobal: No memory\n"));
4469	return VERR_NO_MEMORY;
4470	}
4471
4472	pGblMod->Core.Key = uHash;
4473	pGblMod->cbModule = cbModule;
4474	pGblMod->cRegions = cRegions;
4475	pGblMod->cUsers = 1;
4476	pGblMod->enmGuestOS = enmGuestOS;
4477	strcpy(pGblMod->szName, pszModuleName);
4478	strcpy(pGblMod->szVersion, pszVersion);
4479
4480	for (uint32_t i = 0; i < cRegions; i++)
4481	{
4482	Log(("gmmR0ShModNewGlobal: rgn[%u]=%RGvLB%#x\n", i, paRegions[i].GCRegionAddr, paRegions[i].cbRegion));
4483	pGblMod->aRegions[i].off = paRegions[i].GCRegionAddr & PAGE_OFFSET_MASK;
4484	pGblMod->aRegions[i].cb = paRegions[i].cbRegion + pGblMod->aRegions[i].off;
4485	pGblMod->aRegions[i].cb = RT_ALIGN_32(pGblMod->aRegions[i].cb, PAGE_SIZE);
4486	pGblMod->aRegions[i].paidPages = NULL; /* allocated when needed. */
4487	}
4488
4489	bool fInsert = RTAvllU32Insert(&pGMM->pGlobalSharedModuleTree, &pGblMod->Core);
4490	Assert(fInsert); NOREF(fInsert);
4491	pGMM->cShareableModules++;
4492
4493	*ppGblMod = pGblMod;
4494	return VINF_SUCCESS;
4495	}
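/*
 * Editor's sketch (not part of GMMR0.cpp): the region normalization that both
 * gmmR0ShModFindGlobal and gmmR0ShModNewGlobal above apply before comparing
 * or storing a region. A 4 KiB page size is assumed; SketchRegion and
 * sketchNormalizeRegion are hypothetical names.
 *
```cpp
#include <cassert>
#include <cstdint>

static const uint32_t kSketchPageSize = 4096; // assumed page size

struct SketchRegion
{
    uint32_t off; // offset of the region start within its first page
    uint32_t cb;  // region size counted from the page start, rounded up to whole pages
};

// Keeps only the in-page offset of the guest address and page-aligns the size,
// so two guests loading the same module at different bases compare equal.
static SketchRegion sketchNormalizeRegion(uint64_t GCRegionAddr, uint32_t cbRegion)
{
    SketchRegion Rgn;
    Rgn.off = (uint32_t)(GCRegionAddr & (kSketchPageSize - 1));
    Rgn.cb  = (cbRegion + Rgn.off + kSketchPageSize - 1) & ~(kSketchPageSize - 1);
    return Rgn;
}
```
 *
 * Because the guest base address is dropped, a global module record can be
 * matched regardless of where each VM loaded the module.
 */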
4496
4497
4498	/**
4499	* Deletes a global module which is no longer referenced by anyone.
4500	*
4501	* @param pGMM The GMM instance data.
4502	* @param pGblMod The module to delete.
4503	*/
4504	static void gmmR0ShModDeleteGlobal(PGMM pGMM, PGMMSHAREDMODULE pGblMod)
4505	{
4506	Assert(pGblMod->cUsers == 0);
4507	Assert(pGMM->cShareableModules > 0 && pGMM->cShareableModules <= GMM_MAX_SHARED_GLOBAL_MODULES);
4508
4509	void *pvTest = RTAvllU32RemoveNode(&pGMM->pGlobalSharedModuleTree, &pGblMod->Core);
4510	Assert(pvTest == pGblMod); NOREF(pvTest);
4511	pGMM->cShareableModules--;
4512
4513	uint32_t i = pGblMod->cRegions;
4514	while (i-- > 0)
4515	{
4516	if (pGblMod->aRegions[i].paidPages)
4517	{
4518	/* We don't do anything to the pages as they are handled by the
4519	copy-on-write mechanism in PGM. */
4520	RTMemFree(pGblMod->aRegions[i].paidPages);
4521	pGblMod->aRegions[i].paidPages = NULL;
4522	}
4523	}
4524	RTMemFree(pGblMod);
4525	}
4526
4527
4528	static int gmmR0ShModNewPerVM(PGVM pGVM, RTGCPTR GCBaseAddr, uint32_t cRegions, const VMMDEVSHAREDREGIONDESC *paRegions,
4529	PGMMSHAREDMODULEPERVM *ppRecVM)
4530	{
4531	if (pGVM->gmm.s.Stats.cShareableModules >= GMM_MAX_SHARED_PER_VM_MODULES)
4532	return VERR_GMM_TOO_MANY_PER_VM_MODULES;
4533
4534	PGMMSHAREDMODULEPERVM pRecVM;
4535	pRecVM = (PGMMSHAREDMODULEPERVM)RTMemAllocZ(RT_UOFFSETOF_DYN(GMMSHAREDMODULEPERVM, aRegionsGCPtrs[cRegions]));
4536	if (!pRecVM)
4537	return VERR_NO_MEMORY;
4538
4539	pRecVM->Core.Key = GCBaseAddr;
4540	for (uint32_t i = 0; i < cRegions; i++)
4541	pRecVM->aRegionsGCPtrs[i] = paRegions[i].GCRegionAddr;
4542
4543	bool fInsert = RTAvlGCPtrInsert(&pGVM->gmm.s.pSharedModuleTree, &pRecVM->Core);
4544	Assert(fInsert); NOREF(fInsert);
4545	pGVM->gmm.s.Stats.cShareableModules++;
4546
4547	*ppRecVM = pRecVM;
4548	return VINF_SUCCESS;
4549	}
4550
4551
4552	static void gmmR0ShModDeletePerVM(PGMM pGMM, PGVM pGVM, PGMMSHAREDMODULEPERVM pRecVM, bool fRemove)
4553	{
4554	/*
4555	* Free the per-VM module.
4556	*/
4557	PGMMSHAREDMODULE pGblMod = pRecVM->pGlobalModule;
4558	pRecVM->pGlobalModule = NULL;
4559
4560	if (fRemove)
4561	{
4562	void *pvTest = RTAvlGCPtrRemove(&pGVM->gmm.s.pSharedModuleTree, pRecVM->Core.Key);
4563	Assert(pvTest == &pRecVM->Core); NOREF(pvTest);
4564	}
4565
4566	RTMemFree(pRecVM);
4567
4568	/*
4569	* Release the global module.
4570	* (In the registration bailout case, it might not be.)
4571	*/
4572	if (pGblMod)
4573	{
4574	Assert(pGblMod->cUsers > 0);
4575	pGblMod->cUsers--;
4576	if (pGblMod->cUsers == 0)
4577	gmmR0ShModDeleteGlobal(pGMM, pGblMod);
4578	}
4579	}
4580
4581	#endif /* VBOX_WITH_PAGE_SHARING */
4582
4583	/**
4584	* Registers a new shared module for the VM.
4585	*
4586	* @returns VBox status code.
4587	* @param pGVM The global (ring-0) VM structure.
4588	* @param idCpu The VCPU id.
4589	* @param enmGuestOS The guest OS type.
4590	* @param pszModuleName The module name.
4591	* @param pszVersion The module version.
4592	* @param GCPtrModBase The module base address.
4593	* @param cbModule The module size.
4594	* @param cRegions The number of shared region descriptors.
4595	* @param paRegions Pointer to an array of shared region(s).
4596	* @thread EMT(idCpu)
4597	*/
4598	GMMR0DECL(int) GMMR0RegisterSharedModule(PGVM pGVM, VMCPUID idCpu, VBOXOSFAMILY enmGuestOS, char *pszModuleName,
4599	char *pszVersion, RTGCPTR GCPtrModBase, uint32_t cbModule,
4600	uint32_t cRegions, struct VMMDEVSHAREDREGIONDESC const *paRegions)
4601	{
4602	#ifdef VBOX_WITH_PAGE_SHARING
4603	/*
4604	* Validate input and get the basics.
4605	*
4606	* Note! Turns out the module size does not necessarily match the size of the
4607	* regions. (iTunes on XP)
4608	*/
4609	PGMM pGMM;
4610	GMM_GET_VALID_INSTANCE(pGMM, VERR_GMM_INSTANCE);
4611	int rc = GVMMR0ValidateGVMandEMT(pGVM, idCpu);
4612	if (RT_FAILURE(rc))
4613	return rc;
4614
4615	if (RT_UNLIKELY(cRegions > VMMDEVSHAREDREGIONDESC_MAX))
4616	return VERR_GMM_TOO_MANY_REGIONS;
4617
4618	if (RT_UNLIKELY(cbModule == 0 || cbModule > _1G))
4619	return VERR_GMM_BAD_SHARED_MODULE_SIZE;
4620
4621	uint32_t cbTotal = 0;
4622	for (uint32_t i = 0; i < cRegions; i++)
4623	{
4624	if (RT_UNLIKELY(paRegions[i].cbRegion == 0 || paRegions[i].cbRegion > _1G))
4625	return VERR_GMM_SHARED_MODULE_BAD_REGIONS_SIZE;
4626
4627	cbTotal += paRegions[i].cbRegion;
4628	if (RT_UNLIKELY(cbTotal > _1G))
4629	return VERR_GMM_SHARED_MODULE_BAD_REGIONS_SIZE;
4630	}
4631
4632	AssertPtrReturn(pszModuleName, VERR_INVALID_POINTER);
4633	if (RT_UNLIKELY(!memchr(pszModuleName, '\0', GMM_SHARED_MODULE_MAX_NAME_STRING)))
4634	return VERR_GMM_MODULE_NAME_TOO_LONG;
4635
4636	AssertPtrReturn(pszVersion, VERR_INVALID_POINTER);
4637	if (RT_UNLIKELY(!memchr(pszVersion, '\0', GMM_SHARED_MODULE_MAX_VERSION_STRING)))
4638	return VERR_GMM_MODULE_NAME_TOO_LONG;
4639
4640	uint32_t const uHash = gmmR0ShModCalcHash(pszModuleName, pszVersion);
4641	Log(("GMMR0RegisterSharedModule %s %s base %RGv size %x hash %x\n", pszModuleName, pszVersion, GCPtrModBase, cbModule, uHash));
4642
4643	/*
4644	* Take the semaphore and do some more validations.
4645	*/
4646	gmmR0MutexAcquire(pGMM);
4647	if (GMM_CHECK_SANITY_UPON_ENTERING(pGMM))
4648	{
4649	/*
4650	* Check if this module is already locally registered and register
4651	* it if it isn't. The base address is a unique module identifier
4652	* locally.
4653	*/
4654	PGMMSHAREDMODULEPERVM pRecVM = (PGMMSHAREDMODULEPERVM)RTAvlGCPtrGet(&pGVM->gmm.s.pSharedModuleTree, GCPtrModBase);
4655	bool fNewModule = pRecVM == NULL;
4656	if (fNewModule)
4657	{
4658	rc = gmmR0ShModNewPerVM(pGVM, GCPtrModBase, cRegions, paRegions, &pRecVM);
4659	if (RT_SUCCESS(rc))
4660	{
4661	/*
4662	* Find a matching global module, register a new one if needed.
4663	*/
4664	PGMMSHAREDMODULE pGblMod = gmmR0ShModFindGlobal(pGMM, uHash, cbModule, enmGuestOS, cRegions,
4665	pszModuleName, pszVersion, paRegions);
4666	if (!pGblMod)
4667	{
4668	Assert(fNewModule);
4669	rc = gmmR0ShModNewGlobal(pGMM, uHash, cbModule, enmGuestOS, cRegions,
4670	pszModuleName, pszVersion, paRegions, &pGblMod);
4671	if (RT_SUCCESS(rc))
4672	{
4673	pRecVM->pGlobalModule = pGblMod; /* (One reference returned by gmmR0ShModNewGlobal.) */
4674	Log(("GMMR0RegisterSharedModule: new module %s %s\n", pszModuleName, pszVersion));
4675	}
4676	else
4677	gmmR0ShModDeletePerVM(pGMM, pGVM, pRecVM, true /*fRemove*/);
4678	}
4679	else
4680	{
4681	Assert(pGblMod->cUsers > 0 && pGblMod->cUsers < UINT32_MAX / 2);
4682	pGblMod->cUsers++;
4683	pRecVM->pGlobalModule = pGblMod;
4684
4685	Log(("GMMR0RegisterSharedModule: new per vm module %s %s, gbl users %d\n", pszModuleName, pszVersion, pGblMod->cUsers));
4686	}
4687	}
4688	}
4689	else
4690	{
4691	/*
4692	* Attempt to re-register an existing module.
4693	*/
4694	PGMMSHAREDMODULE pGblMod = gmmR0ShModFindGlobal(pGMM, uHash, cbModule, enmGuestOS, cRegions,
4695	pszModuleName, pszVersion, paRegions);
4696	if (pRecVM->pGlobalModule == pGblMod)
4697	{
4698	Log(("GMMR0RegisterSharedModule: already registered %s %s, gbl users %d\n", pszModuleName, pszVersion, pGblMod->cUsers));
4699	rc = VINF_GMM_SHARED_MODULE_ALREADY_REGISTERED;
4700	}
4701	else
4702	{
4703	/** @todo may have to unregister+register when this happens in case it's caused
4704	* by VBoxService crashing and being restarted... */
4705	Log(("GMMR0RegisterSharedModule: Address clash!\n"
4706	" incoming at %RGvLB%#x %s %s rgns %u\n"
4707	" existing at %RGvLB%#x %s %s rgns %u\n",
4708	GCPtrModBase, cbModule, pszModuleName, pszVersion, cRegions,
4709	pRecVM->Core.Key, pRecVM->pGlobalModule->cbModule, pRecVM->pGlobalModule->szName,
4710	pRecVM->pGlobalModule->szVersion, pRecVM->pGlobalModule->cRegions));
4711	rc = VERR_GMM_SHARED_MODULE_ADDRESS_CLASH;
4712	}
4713	}
4714	GMM_CHECK_SANITY_UPON_LEAVING(pGMM);
4715	}
4716	else
4717	rc = VERR_GMM_IS_NOT_SANE;
4718
4719	gmmR0MutexRelease(pGMM);
4720	return rc;
4721	#else
4722
4723	NOREF(pGVM); NOREF(idCpu); NOREF(enmGuestOS); NOREF(pszModuleName); NOREF(pszVersion);
4724	NOREF(GCPtrModBase); NOREF(cbModule); NOREF(cRegions); NOREF(paRegions);
4725	return VERR_NOT_IMPLEMENTED;
4726	#endif
4727	}
4728
4729
4730	/**
4731	* VMMR0 request wrapper for GMMR0RegisterSharedModule.
4732	*
4733	* @returns see GMMR0RegisterSharedModule.
4734	* @param pGVM The global (ring-0) VM structure.
4735	* @param idCpu The VCPU id.
4736	* @param pReq Pointer to the request packet.
4737	*/
4738	GMMR0DECL(int) GMMR0RegisterSharedModuleReq(PGVM pGVM, VMCPUID idCpu, PGMMREGISTERSHAREDMODULEREQ pReq)
4739	{
4740	/*
4741	* Validate input and pass it on.
4742	*/
4743	AssertPtrReturn(pReq, VERR_INVALID_POINTER);
4744	AssertMsgReturn( pReq->Hdr.cbReq >= sizeof(*pReq)
4745	&& pReq->Hdr.cbReq == RT_UOFFSETOF_DYN(GMMREGISTERSHAREDMODULEREQ, aRegions[pReq->cRegions]),
4746	("%#x != %#x\n", pReq->Hdr.cbReq, sizeof(*pReq)), VERR_INVALID_PARAMETER);
4747
4748	/* Pass back return code in the request packet to preserve informational codes. (VMMR3CallR0 chokes on them) */
4749	pReq->rc = GMMR0RegisterSharedModule(pGVM, idCpu, pReq->enmGuestOS, pReq->szName, pReq->szVersion,
4750	pReq->GCBaseAddr, pReq->cbModule, pReq->cRegions, pReq->aRegions);
4751	return VINF_SUCCESS;
4752	}
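
The size check in the wrapper above accepts a packet only when its claimed size equals the offset of the first region entry plus cRegions entries. Below is a minimal sketch of that arithmetic, assuming hypothetical stand-in types (SketchRegion, SketchRegisterReq) and plain offsetof instead of the real request structures and RT_UOFFSETOF_DYN:

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>

// Hypothetical stand-ins for the real request types; field names are illustrative only.
struct SketchRegion
{
    uint64_t GCRegionAddr;
    uint32_t cbRegion;
    uint32_t u32Alignment;
};

struct SketchRegisterReq
{
    uint32_t     cbReq;        // Total packet size claimed by the caller.
    uint32_t     cRegions;     // Number of valid entries in aRegions.
    SketchRegion aRegions[1];  // Variable-length tail; the type itself holds one entry.
};

// Size that a well-formed packet carrying cRegions entries must have.
static size_t sketchExpectedSize(uint32_t cRegions)
{
    return offsetof(SketchRegisterReq, aRegions) + size_t(cRegions) * sizeof(SketchRegion);
}

// Mirrors the two-part check: at least the fixed structure, and exactly the dynamic size.
static bool sketchIsValidReqSize(const SketchRegisterReq &Req)
{
    return Req.cbReq >= sizeof(SketchRegisterReq)
        && Req.cbReq == sketchExpectedSize(Req.cRegions);
}
```

In this sketch a packet that claims zero regions is rejected by the first half of the check, because its dynamic size is smaller than the fixed structure.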
4753
4754
4755	/**
4756	* Unregisters a shared module for the VM
4757	*
4758	* @returns VBox status code.
4759	* @param pGVM The global (ring-0) VM structure.
4760	* @param idCpu The VCPU id.
4761	* @param pszModuleName The module name.
4762	* @param pszVersion The module version.
4763	* @param GCPtrModBase The module base address.
4764	* @param cbModule The module size.
4765	*/
4766	GMMR0DECL(int) GMMR0UnregisterSharedModule(PGVM pGVM, VMCPUID idCpu, char *pszModuleName, char *pszVersion,
4767	RTGCPTR GCPtrModBase, uint32_t cbModule)
4768	{
4769	#ifdef VBOX_WITH_PAGE_SHARING
4770	/*
4771	* Validate input and get the basics.
4772	*/
4773	PGMM pGMM;
4774	GMM_GET_VALID_INSTANCE(pGMM, VERR_GMM_INSTANCE);
4775	int rc = GVMMR0ValidateGVMandEMT(pGVM, idCpu);
4776	if (RT_FAILURE(rc))
4777	return rc;
4778
4779	AssertPtrReturn(pszModuleName, VERR_INVALID_POINTER);
4780	AssertPtrReturn(pszVersion, VERR_INVALID_POINTER);
4781	if (RT_UNLIKELY(!memchr(pszModuleName, '\0', GMM_SHARED_MODULE_MAX_NAME_STRING)))
4782	return VERR_GMM_MODULE_NAME_TOO_LONG;
4783	if (RT_UNLIKELY(!memchr(pszVersion, '\0', GMM_SHARED_MODULE_MAX_VERSION_STRING)))
4784	return VERR_GMM_MODULE_NAME_TOO_LONG;
4785
4786	Log(("GMMR0UnregisterSharedModule %s %s base=%RGv size %x\n", pszModuleName, pszVersion, GCPtrModBase, cbModule));
4787
4788	/*
4789	* Take the semaphore and do some more validations.
4790	*/
4791	gmmR0MutexAcquire(pGMM);
4792	if (GMM_CHECK_SANITY_UPON_ENTERING(pGMM))
4793	{
4794	/*
4795	* Locate and remove the specified module.
4796	*/
4797	PGMMSHAREDMODULEPERVM pRecVM = (PGMMSHAREDMODULEPERVM)RTAvlGCPtrGet(&pGVM->gmm.s.pSharedModuleTree, GCPtrModBase);
4798	if (pRecVM)
4799	{
4800	/** @todo Do we need to do more validations here, like that the
4801	* name + version + cbModule matches? */
4802	NOREF(cbModule);
4803	Assert(pRecVM->pGlobalModule);
4804	gmmR0ShModDeletePerVM(pGMM, pGVM, pRecVM, true /*fRemove*/);
4805	}
4806	else
4807	rc = VERR_GMM_SHARED_MODULE_NOT_FOUND;
4808
4809	GMM_CHECK_SANITY_UPON_LEAVING(pGMM);
4810	}
4811	else
4812	rc = VERR_GMM_IS_NOT_SANE;
4813
4814	gmmR0MutexRelease(pGMM);
4815	return rc;
4816	#else
4817
4818	NOREF(pGVM); NOREF(idCpu); NOREF(pszModuleName); NOREF(pszVersion); NOREF(GCPtrModBase); NOREF(cbModule);
4819	return VERR_NOT_IMPLEMENTED;
4820	#endif
4821	}
4822
4823
4824	/**
4825	* VMMR0 request wrapper for GMMR0UnregisterSharedModule.
4826	*
4827	* @returns see GMMR0UnregisterSharedModule.
4828	* @param pGVM The global (ring-0) VM structure.
4829	* @param idCpu The VCPU id.
4830	* @param pReq Pointer to the request packet.
4831	*/
4832	GMMR0DECL(int) GMMR0UnregisterSharedModuleReq(PGVM pGVM, VMCPUID idCpu, PGMMUNREGISTERSHAREDMODULEREQ pReq)
4833	{
4834	/*
4835	* Validate input and pass it on.
4836	*/
4837	AssertPtrReturn(pReq, VERR_INVALID_POINTER);
4838	AssertMsgReturn(pReq->Hdr.cbReq == sizeof(*pReq), ("%#x != %#x\n", pReq->Hdr.cbReq, sizeof(*pReq)), VERR_INVALID_PARAMETER);
4839
4840	return GMMR0UnregisterSharedModule(pGVM, idCpu, pReq->szName, pReq->szVersion, pReq->GCBaseAddr, pReq->cbModule);
4841	}
4842
4843	#ifdef VBOX_WITH_PAGE_SHARING
4844
4845	/**
4846	* Increase the use count of a shared page, the page is known to exist and be valid and such.
4847	*
4848	* @param pGMM Pointer to the GMM instance.
4849	* @param pGVM Pointer to the GVM instance.
4850	* @param pPage The page structure.
4851	*/
4852	DECLINLINE(void) gmmR0UseSharedPage(PGMM pGMM, PGVM pGVM, PGMMPAGE pPage)
4853	{
4854	Assert(pGMM->cSharedPages > 0);
4855	Assert(pGMM->cAllocatedPages > 0);
4856
4857	pGMM->cDuplicatePages++;
4858
4859	pPage->Shared.cRefs++;
4860	pGVM->gmm.s.Stats.cSharedPages++;
4861	pGVM->gmm.s.Stats.Allocated.cBasePages++;
4862	}
4863
4864
4865	/**
4866	* Converts a private page to a shared page, the page is known to exist and be valid and such.
4867	*
4868	* @param pGMM Pointer to the GMM instance.
4869	* @param pGVM Pointer to the GVM instance.
4870	* @param HCPhys Host physical address
4871	* @param idPage The Page ID
4872	* @param pPage The page structure.
4873	* @param pPageDesc Shared page descriptor
4874	*/
4875	DECLINLINE(void) gmmR0ConvertToSharedPage(PGMM pGMM, PGVM pGVM, RTHCPHYS HCPhys, uint32_t idPage, PGMMPAGE pPage,
4876	PGMMSHAREDPAGEDESC pPageDesc)
4877	{
4878	PGMMCHUNK pChunk = gmmR0GetChunk(pGMM, idPage >> GMM_CHUNKID_SHIFT);
4879	Assert(pChunk);
4880	Assert(pChunk->cFree < GMM_CHUNK_NUM_PAGES);
4881	Assert(GMM_PAGE_IS_PRIVATE(pPage));
4882
4883	pChunk->cPrivate--;
4884	pChunk->cShared++;
4885
4886	pGMM->cSharedPages++;
4887
4888	pGVM->gmm.s.Stats.cSharedPages++;
4889	pGVM->gmm.s.Stats.cPrivatePages--;
4890
4891	/* Modify the page structure. */
4892	pPage->Shared.pfn = (uint32_t)(uint64_t)(HCPhys >> PAGE_SHIFT);
4893	pPage->Shared.cRefs = 1;
4894	#ifdef VBOX_STRICT
4895	pPageDesc->u32StrictChecksum = gmmR0StrictPageChecksum(pGMM, pGVM, idPage);
4896	pPage->Shared.u14Checksum = pPageDesc->u32StrictChecksum;
4897	#else
4898	NOREF(pPageDesc);
4899	pPage->Shared.u14Checksum = 0;
4900	#endif
4901	pPage->Shared.u2State = GMM_PAGE_STATE_SHARED;
4902	}
4903
4904
4905	static int gmmR0SharedModuleCheckPageFirstTime(PGMM pGMM, PGVM pGVM, PGMMSHAREDMODULE pModule,
4906	unsigned idxRegion, unsigned idxPage,
4907	PGMMSHAREDPAGEDESC pPageDesc, PGMMSHAREDREGIONDESC pGlobalRegion)
4908	{
4909	NOREF(pModule);
4910
4911	/* Easy case: just change the internal page type. */
4912	PGMMPAGE pPage = gmmR0GetPage(pGMM, pPageDesc->idPage);
4913	AssertMsgReturn(pPage, ("idPage=%#x (GCPhys=%RGp HCPhys=%RHp idxRegion=%#x idxPage=%#x) #1\n",
4914	pPageDesc->idPage, pPageDesc->GCPhys, pPageDesc->HCPhys, idxRegion, idxPage),
4915	VERR_PGM_PHYS_INVALID_PAGE_ID);
4916	NOREF(idxRegion);
4917
4918	AssertMsg(pPageDesc->GCPhys == (pPage->Private.pfn << 12), ("desc %RGp gmm %RGp\n", pPageDesc->HCPhys, (pPage->Private.pfn << 12)));
4919
4920	gmmR0ConvertToSharedPage(pGMM, pGVM, pPageDesc->HCPhys, pPageDesc->idPage, pPage, pPageDesc);
4921
4922	/* Keep track of these references. */
4923	pGlobalRegion->paidPages[idxPage] = pPageDesc->idPage;
4924
4925	return VINF_SUCCESS;
4926	}
4927
4928	/**
4929	* Checks specified shared module range for changes
4930	*
4931	* Performs the following tasks:
4932	* - If a shared page is new, then it changes the GMM page type to shared and
4933	* returns it in the pPageDesc descriptor.
4934	* - If a shared page already exists, then it checks if the VM page is
4935	* identical and if so frees the VM page and returns the shared page in
4936	* pPageDesc descriptor.
4937	*
4938	* @remarks ASSUMES the caller has acquired the GMM semaphore!!
4939	*
4940	* @returns VBox status code.
4941	* @param pGVM Pointer to the GVM instance data.
4942	* @param pModule Module description
4943	* @param idxRegion Region index
4944	* @param idxPage Page index
4945	* @param pPageDesc Page descriptor
4946	*/
4947	GMMR0DECL(int) GMMR0SharedModuleCheckPage(PGVM pGVM, PGMMSHAREDMODULE pModule, uint32_t idxRegion, uint32_t idxPage,
4948	PGMMSHAREDPAGEDESC pPageDesc)
4949	{
4950	int rc;
4951	PGMM pGMM;
4952	GMM_GET_VALID_INSTANCE(pGMM, VERR_GMM_INSTANCE);
4953	pPageDesc->u32StrictChecksum = 0;
4954
4955	AssertMsgReturn(idxRegion < pModule->cRegions,
4956	("idxRegion=%#x cRegions=%#x %s %s\n", idxRegion, pModule->cRegions, pModule->szName, pModule->szVersion),
4957	VERR_INVALID_PARAMETER);
4958
4959	uint32_t const cPages = pModule->aRegions[idxRegion].cb >> PAGE_SHIFT;
4960	AssertMsgReturn(idxPage < cPages,
4961	("idxRegion=%#x cRegions=%#x %s %s\n", idxRegion, pModule->cRegions, pModule->szName, pModule->szVersion),
4962	VERR_INVALID_PARAMETER);
4963
4964	LogFlow(("GMMR0SharedModuleCheckRange %s base %RGv region %d idxPage %d\n", pModule->szName, pModule->Core.Key, idxRegion, idxPage));
4965
4966	/*
4967	* First time; create a page descriptor array.
4968	*/
4969	PGMMSHAREDREGIONDESC pGlobalRegion = &pModule->aRegions[idxRegion];
4970	if (!pGlobalRegion->paidPages)
4971	{
4972	Log(("Allocate page descriptor array for %d pages\n", cPages));
4973	pGlobalRegion->paidPages = (uint32_t *)RTMemAlloc(cPages * sizeof(pGlobalRegion->paidPages[0]));
4974	AssertReturn(pGlobalRegion->paidPages, VERR_NO_MEMORY);
4975
4976	/* Invalidate all descriptors. */
4977	uint32_t i = cPages;
4978	while (i-- > 0)
4979	pGlobalRegion->paidPages[i] = NIL_GMM_PAGEID;
4980	}
4981
4982	/*
4983	* We've seen this shared page for the first time?
4984	*/
4985	if (pGlobalRegion->paidPages[idxPage] == NIL_GMM_PAGEID)
4986	{
4987	Log(("New shared page guest %RGp host %RHp\n", pPageDesc->GCPhys, pPageDesc->HCPhys));
4988	return gmmR0SharedModuleCheckPageFirstTime(pGMM, pGVM, pModule, idxRegion, idxPage, pPageDesc, pGlobalRegion);
4989	}
4990
4991	/*
4992	* We've seen it before...
4993	*/
4994	Log(("Replace existing page guest %RGp host %RHp id %#x -> id %#x\n",
4995	pPageDesc->GCPhys, pPageDesc->HCPhys, pPageDesc->idPage, pGlobalRegion->paidPages[idxPage]));
4996	Assert(pPageDesc->idPage != pGlobalRegion->paidPages[idxPage]);
4997
4998	/*
4999	* Get the shared page source.
5000	*/
5001	PGMMPAGE pPage = gmmR0GetPage(pGMM, pGlobalRegion->paidPages[idxPage]);
5002	AssertMsgReturn(pPage, ("idPage=%#x (idxRegion=%#x idxPage=%#x) #2\n", pPageDesc->idPage, idxRegion, idxPage),
5003	VERR_PGM_PHYS_INVALID_PAGE_ID);
5004
5005	if (pPage->Common.u2State != GMM_PAGE_STATE_SHARED)
5006	{
5007	/*
5008	* Page was freed at some point; invalidate this entry.
5009	*/
5010	/** @todo this isn't really bullet proof. */
5011	Log(("Old shared page was freed -> create a new one\n"));
5012	pGlobalRegion->paidPages[idxPage] = NIL_GMM_PAGEID;
5013	return gmmR0SharedModuleCheckPageFirstTime(pGMM, pGVM, pModule, idxRegion, idxPage, pPageDesc, pGlobalRegion);
5014	}
5015
5016	Log(("Replace existing page guest host %RHp -> %RHp\n", pPageDesc->HCPhys, ((uint64_t)pPage->Shared.pfn) << PAGE_SHIFT));
5017
5018	/*
5019	* Calculate the virtual address of the local page.
5020	*/
5021	PGMMCHUNK pChunk = gmmR0GetChunk(pGMM, pPageDesc->idPage >> GMM_CHUNKID_SHIFT);
5022	AssertMsgReturn(pChunk, ("idPage=%#x (idxRegion=%#x idxPage=%#x) #4\n", pPageDesc->idPage, idxRegion, idxPage),
5023	VERR_PGM_PHYS_INVALID_PAGE_ID);
5024
5025	uint8_t *pbChunk;
5026	AssertMsgReturn(gmmR0IsChunkMapped(pGMM, pGVM, pChunk, (PRTR3PTR)&pbChunk),
5027	("idPage=%#x (idxRegion=%#x idxPage=%#x) #3\n", pPageDesc->idPage, idxRegion, idxPage),
5028	VERR_PGM_PHYS_INVALID_PAGE_ID);
5029	uint8_t *pbLocalPage = pbChunk + ((pPageDesc->idPage & GMM_PAGEID_IDX_MASK) << PAGE_SHIFT);
5030
5031	/*
5032	* Calculate the virtual address of the shared page.
5033	*/
5034	pChunk = gmmR0GetChunk(pGMM, pGlobalRegion->paidPages[idxPage] >> GMM_CHUNKID_SHIFT);
5035	Assert(pChunk); /* can't fail as gmmR0GetPage succeeded. */
5036
5037	/*
5038	* Get the virtual address of the physical page; map the chunk into the VM
5039	* process if not already done.
5040	*/
5041	if (!gmmR0IsChunkMapped(pGMM, pGVM, pChunk, (PRTR3PTR)&pbChunk))
5042	{
5043	Log(("Map chunk into process!\n"));
5044	rc = gmmR0MapChunk(pGMM, pGVM, pChunk, false /*fRelaxedSem*/, (PRTR3PTR)&pbChunk);
5045	AssertRCReturn(rc, rc);
5046	}
5047	uint8_t *pbSharedPage = pbChunk + ((pGlobalRegion->paidPages[idxPage] & GMM_PAGEID_IDX_MASK) << PAGE_SHIFT);
5048
5049	#ifdef VBOX_STRICT
5050	pPageDesc->u32StrictChecksum = RTCrc32(pbSharedPage, PAGE_SIZE);
5051	uint32_t uChecksum = pPageDesc->u32StrictChecksum & UINT32_C(0x00003fff);
5052	AssertMsg(!uChecksum || uChecksum == pPage->Shared.u14Checksum || !pPage->Shared.u14Checksum,
5053	("%#x vs %#x - idPage=%#x - %s %s\n", uChecksum, pPage->Shared.u14Checksum,
5054	pGlobalRegion->paidPages[idxPage], pModule->szName, pModule->szVersion));
5055	#endif
5056
5057	/** @todo write ASMMemComparePage. */
5058	if (memcmp(pbSharedPage, pbLocalPage, PAGE_SIZE))
5059	{
5060	Log(("Unexpected differences found between local and shared page; skip\n"));
5061	/* Signal to the caller that this one hasn't changed. */
5062	pPageDesc->idPage = NIL_GMM_PAGEID;
5063	return VINF_SUCCESS;
5064	}
5065
5066	/*
5067	* Free the old local page.
5068	*/
5069	GMMFREEPAGEDESC PageDesc;
5070	PageDesc.idPage = pPageDesc->idPage;
5071	rc = gmmR0FreePages(pGMM, pGVM, 1, &PageDesc, GMMACCOUNT_BASE);
5072	AssertRCReturn(rc, rc);
5073
5074	gmmR0UseSharedPage(pGMM, pGVM, pPage);
5075
5076	/*
5077	* Pass along the new physical address & page id.
5078	*/
5079	pPageDesc->HCPhys = ((uint64_t)pPage->Shared.pfn) << PAGE_SHIFT;
5080	pPageDesc->idPage = pGlobalRegion->paidPages[idxPage];
5081
5082	return VINF_SUCCESS;
5083	}
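
The decision flow of GMMR0SharedModuleCheckPage can be modelled in isolation. The following is only a sketch under stated assumptions: SketchRegionState, sketchCheckPage, kNilPageId and kPageSize are hypothetical names, the page contents are passed in directly instead of being mapped from chunks, and the "old shared page was freed" path and all accounting are left out.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <cstring>
#include <vector>

constexpr uint32_t kNilPageId = 0;     // Assumption: stands in for NIL_GMM_PAGEID.
constexpr size_t   kPageSize  = 4096;  // Assumption: 4 KiB pages.

enum class ShareResult { FirstSeen, Replaced, Differs };

// Per-region table of shared page IDs, created on first use.
struct SketchRegionState { std::vector<uint32_t> aidPages; };

// idPage is in/out: on input the VM's private page, on output the page to use
// (or kNilPageId when the contents differ and nothing was changed).
static ShareResult sketchCheckPage(SketchRegionState &Region, size_t cPages, size_t idxPage,
                                   uint32_t &idPage, const uint8_t *pbLocal, const uint8_t *pbShared)
{
    if (Region.aidPages.empty())
        Region.aidPages.assign(cPages, kNilPageId);   // Invalidate all descriptors.

    if (Region.aidPages[idxPage] == kNilPageId)
    {
        Region.aidPages[idxPage] = idPage;            // First sighting: this page becomes the shared one.
        return ShareResult::FirstSeen;
    }

    if (std::memcmp(pbShared, pbLocal, kPageSize) != 0)
    {
        idPage = kNilPageId;                          // Contents differ: signal "unchanged" to the caller.
        return ShareResult::Differs;
    }

    idPage = Region.aidPages[idxPage];                // Identical: the caller switches to the shared page.
    return ShareResult::Replaced;
}
```

In this model the first caller for a given page index keeps its own page ID, a second caller with identical contents receives the first caller's ID, and a caller with different contents receives kNilPageId.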
5084
5085
5086	/**
5087	* RTAvlGCPtrDestroy callback.
5088	*
5089	* @returns 0 or VERR_GMM_INSTANCE.
5090	* @param pNode The node to destroy.
5091	* @param pvArgs Pointer to an argument packet.
5092	*/
5093	static DECLCALLBACK(int) gmmR0CleanupSharedModule(PAVLGCPTRNODECORE pNode, void *pvArgs)
5094	{
5095	gmmR0ShModDeletePerVM(((GMMR0SHMODPERVMDTORARGS *)pvArgs)->pGMM,
5096	((GMMR0SHMODPERVMDTORARGS *)pvArgs)->pGVM,
5097	(PGMMSHAREDMODULEPERVM)pNode,
5098	false /*fRemove*/);
5099	return VINF_SUCCESS;
5100	}
5101
5102
5103	/**
5104	* Used by GMMR0CleanupVM to clean up shared modules.
5105	*
5106	* This is called without taking the GMM lock so that it can be yielded as
5107	* needed here.
5108	*
5109	* @param pGMM The GMM handle.
5110	* @param pGVM The global VM handle.
5111	*/
5112	static void gmmR0SharedModuleCleanup(PGMM pGMM, PGVM pGVM)
5113	{
5114	gmmR0MutexAcquire(pGMM);
5115	GMM_CHECK_SANITY_UPON_ENTERING(pGMM);
5116
5117	GMMR0SHMODPERVMDTORARGS Args;
5118	Args.pGVM = pGVM;
5119	Args.pGMM = pGMM;
5120	RTAvlGCPtrDestroy(&pGVM->gmm.s.pSharedModuleTree, gmmR0CleanupSharedModule, &Args);
5121
5122	AssertMsg(pGVM->gmm.s.Stats.cShareableModules == 0, ("%d\n", pGVM->gmm.s.Stats.cShareableModules));
5123	pGVM->gmm.s.Stats.cShareableModules = 0;
5124
5125	gmmR0MutexRelease(pGMM);
5126	}
5127
5128	#endif /* VBOX_WITH_PAGE_SHARING */
5129
5130	/**
5131	* Removes all shared modules for the specified VM
5132	*
5133	* @returns VBox status code.
5134	* @param pGVM The global (ring-0) VM structure.
5135	* @param idCpu The VCPU id.
5136	*/
5137	GMMR0DECL(int) GMMR0ResetSharedModules(PGVM pGVM, VMCPUID idCpu)
5138	{
5139	#ifdef VBOX_WITH_PAGE_SHARING
5140	/*
5141	* Validate input and get the basics.
5142	*/
5143	PGMM pGMM;
5144	GMM_GET_VALID_INSTANCE(pGMM, VERR_GMM_INSTANCE);
5145	int rc = GVMMR0ValidateGVMandEMT(pGVM, idCpu);
5146	if (RT_FAILURE(rc))
5147	return rc;
5148
5149	/*
5150	* Take the semaphore and do some more validations.
5151	*/
5152	gmmR0MutexAcquire(pGMM);
5153	if (GMM_CHECK_SANITY_UPON_ENTERING(pGMM))
5154	{
5155	Log(("GMMR0ResetSharedModules\n"));
5156	GMMR0SHMODPERVMDTORARGS Args;
5157	Args.pGVM = pGVM;
5158	Args.pGMM = pGMM;
5159	RTAvlGCPtrDestroy(&pGVM->gmm.s.pSharedModuleTree, gmmR0CleanupSharedModule, &Args);
5160	pGVM->gmm.s.Stats.cShareableModules = 0;
5161
5162	rc = VINF_SUCCESS;
5163	GMM_CHECK_SANITY_UPON_LEAVING(pGMM);
5164	}
5165	else
5166	rc = VERR_GMM_IS_NOT_SANE;
5167
5168	gmmR0MutexRelease(pGMM);
5169	return rc;
5170	#else
5171	RT_NOREF(pGVM, idCpu);
5172	return VERR_NOT_IMPLEMENTED;
5173	#endif
5174	}
5175
5176	#ifdef VBOX_WITH_PAGE_SHARING
5177
5178	/**
5179	* Tree enumeration callback for checking a shared module.
5180	*/
5181	static DECLCALLBACK(int) gmmR0CheckSharedModule(PAVLGCPTRNODECORE pNode, void *pvUser)
5182	{
5183	GMMCHECKSHAREDMODULEINFO *pArgs = (GMMCHECKSHAREDMODULEINFO *)pvUser;
5184	PGMMSHAREDMODULEPERVM pRecVM = (PGMMSHAREDMODULEPERVM)pNode;
5185	PGMMSHAREDMODULE pGblMod = pRecVM->pGlobalModule;
5186
5187	Log(("gmmR0CheckSharedModule: check %s %s base=%RGv size=%x\n",
5188	pGblMod->szName, pGblMod->szVersion, pGblMod->Core.Key, pGblMod->cbModule));
5189
5190	int rc = PGMR0SharedModuleCheck(pArgs->pGVM, pArgs->pGVM, pArgs->idCpu, pGblMod, pRecVM->aRegionsGCPtrs);
5191	if (RT_FAILURE(rc))
5192	return rc;
5193	return VINF_SUCCESS;
5194	}
5195
5196	#endif /* VBOX_WITH_PAGE_SHARING */
5197
5198	/**
5199	* Check all shared modules for the specified VM.
5200	*
5201	* @returns VBox status code.
5202	* @param pGVM The global (ring-0) VM structure.
5203	* @param idCpu The calling EMT number.
5204	* @thread EMT(idCpu)
5205	*/
5206	GMMR0DECL(int) GMMR0CheckSharedModules(PGVM pGVM, VMCPUID idCpu)
5207	{
5208	#ifdef VBOX_WITH_PAGE_SHARING
5209	/*
5210	* Validate input and get the basics.
5211	*/
5212	PGMM pGMM;
5213	GMM_GET_VALID_INSTANCE(pGMM, VERR_GMM_INSTANCE);
5214	int rc = GVMMR0ValidateGVMandEMT(pGVM, idCpu);
5215	if (RT_FAILURE(rc))
5216	return rc;
5217
5218	# ifndef DEBUG_sandervl
5219	/*
5220	* Take the semaphore and do some more validations.
5221	*/
5222	gmmR0MutexAcquire(pGMM);
5223	# endif
5224	if (GMM_CHECK_SANITY_UPON_ENTERING(pGMM))
5225	{
5226	/*
5227	* Walk the tree, checking each module.
5228	*/
5229	Log(("GMMR0CheckSharedModules\n"));
5230
5231	GMMCHECKSHAREDMODULEINFO Args;
5232	Args.pGVM = pGVM;
5233	Args.idCpu = idCpu;
5234	rc = RTAvlGCPtrDoWithAll(&pGVM->gmm.s.pSharedModuleTree, true /* fFromLeft */, gmmR0CheckSharedModule, &Args);
5235
5236	Log(("GMMR0CheckSharedModules done (rc=%Rrc)!\n", rc));
5237	GMM_CHECK_SANITY_UPON_LEAVING(pGMM);
5238	}
5239	else
5240	rc = VERR_GMM_IS_NOT_SANE;
5241
5242	# ifndef DEBUG_sandervl
5243	gmmR0MutexRelease(pGMM);
5244	# endif
5245	return rc;
5246	#else
5247	RT_NOREF(pGVM, idCpu);
5248	return VERR_NOT_IMPLEMENTED;
5249	#endif
5250	}
5251
5252	#if defined(VBOX_STRICT) && HC_ARCH_BITS == 64
5253
5254	/**
5255	* RTAvlU32DoWithAll callback.
5256	*
5257	* @returns 0
5258	* @param pNode The node to search.
5259	* @param pvUser Pointer to the input argument packet.
5260	*/
5261	static DECLCALLBACK(int) gmmR0FindDupPageInChunk(PAVLU32NODECORE pNode, void *pvUser)
5262	{
5263	PGMMCHUNK pChunk = (PGMMCHUNK)pNode;
5264	GMMFINDDUPPAGEINFO *pArgs = (GMMFINDDUPPAGEINFO *)pvUser;
5265	PGVM pGVM = pArgs->pGVM;
5266	PGMM pGMM = pArgs->pGMM;
5267	uint8_t *pbChunk;
5268
5269	/* Only take chunks not mapped into this VM process; not entirely correct. */
5270	if (!gmmR0IsChunkMapped(pGMM, pGVM, pChunk, (PRTR3PTR)&pbChunk))
5271	{
5272	int rc = gmmR0MapChunk(pGMM, pGVM, pChunk, false /*fRelaxedSem*/, (PRTR3PTR)&pbChunk);
5273	if (RT_SUCCESS(rc))
5274	{
5275	/*
5276	* Look for duplicate pages
5277	*/
5278	unsigned iPage = (GMM_CHUNK_SIZE >> PAGE_SHIFT);
5279	while (iPage-- > 0)
5280	{
5281	if (GMM_PAGE_IS_PRIVATE(&pChunk->aPages[iPage]))
5282	{
5283	uint8_t *pbDestPage = pbChunk + (iPage << PAGE_SHIFT);
5284
5285	if (!memcmp(pArgs->pSourcePage, pbDestPage, PAGE_SIZE))
5286	{
5287	pArgs->fFoundDuplicate = true;
5288	break;
5289	}
5290	}
5291	}
5292	gmmR0UnmapChunk(pGMM, pGVM, pChunk, false /*fRelaxedSem*/);
5293	}
5294	}
5295	return pArgs->fFoundDuplicate; /* (stops search if true) */
5296	}
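
The callback above stops the enumeration by returning a non-zero value once a duplicate is found. Here is a minimal sketch of that early-stop pattern, using a plain vector walk (sketchDoWithAll) as a hypothetical stand-in for the AVL tree enumeration and assuming 4 KiB pages; none of these names exist in the real code.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <cstring>
#include <vector>

constexpr size_t kPageSize = 4096;  // Assumption: 4 KiB pages.

struct SketchChunk { std::vector<uint8_t> abPages; };  // cPages * kPageSize bytes.

struct SketchFindDupArgs
{
    const uint8_t *pbSourcePage;    // Page to look for.
    bool           fFoundDuplicate; // Result.
    unsigned       cChunksVisited;  // Only here to make the early stop observable.
};

typedef int (*PFNSKETCHCALLBACK)(SketchChunk &Chunk, void *pvUser);

// Calls pfn for each chunk; the first non-zero return value ends the walk.
static int sketchDoWithAll(std::vector<SketchChunk> &Chunks, PFNSKETCHCALLBACK pfn, void *pvUser)
{
    for (SketchChunk &Chunk : Chunks)
    {
        int rc = pfn(Chunk, pvUser);
        if (rc != 0)
            return rc;
    }
    return 0;
}

// Scans the chunk's pages from last to first, comparing each with the source page.
static int sketchFindDupInChunk(SketchChunk &Chunk, void *pvUser)
{
    SketchFindDupArgs *pArgs = static_cast<SketchFindDupArgs *>(pvUser);
    pArgs->cChunksVisited++;

    size_t iPage = Chunk.abPages.size() / kPageSize;
    while (iPage-- > 0)
        if (!std::memcmp(pArgs->pbSourcePage, &Chunk.abPages[iPage * kPageSize], kPageSize))
        {
            pArgs->fFoundDuplicate = true;
            break;
        }
    return pArgs->fFoundDuplicate; // Non-zero stops the walk.
}
```

With three chunks and a match in the second one, the third chunk is never visited.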
5297
5298
5299	/**
5300	* Find a duplicate of the specified page in other active VMs
5301	*
5302	* @returns VBox status code.
5303	* @param pGVM The global (ring-0) VM structure.
5304	* @param pReq Pointer to the request packet.
5305	*/
5306	GMMR0DECL(int) GMMR0FindDuplicatePageReq(PGVM pGVM, PGMMFINDDUPLICATEPAGEREQ pReq)
5307	{
5308	/*
5309	* Validate input and pass it on.
5310	*/
5311	AssertPtrReturn(pReq, VERR_INVALID_POINTER);
5312	AssertMsgReturn(pReq->Hdr.cbReq == sizeof(*pReq), ("%#x != %#x\n", pReq->Hdr.cbReq, sizeof(*pReq)), VERR_INVALID_PARAMETER);
5313
5314	PGMM pGMM;
5315	GMM_GET_VALID_INSTANCE(pGMM, VERR_GMM_INSTANCE);
5316
5317	int rc = GVMMR0ValidateGVM(pGVM);
5318	if (RT_FAILURE(rc))
5319	return rc;
5320
5321	/*
5322	* Take the semaphore and do some more validations.
5323	*/
5324	rc = gmmR0MutexAcquire(pGMM);
5325	if (GMM_CHECK_SANITY_UPON_ENTERING(pGMM))
5326	{
5327	uint8_t *pbChunk;
5328	PGMMCHUNK pChunk = gmmR0GetChunk(pGMM, pReq->idPage >> GMM_CHUNKID_SHIFT);
5329	if (pChunk)
5330	{
5331	if (gmmR0IsChunkMapped(pGMM, pGVM, pChunk, (PRTR3PTR)&pbChunk))
5332	{
5333	uint8_t *pbSourcePage = pbChunk + ((pReq->idPage & GMM_PAGEID_IDX_MASK) << PAGE_SHIFT);
5334	PGMMPAGE pPage = gmmR0GetPage(pGMM, pReq->idPage);
5335	if (pPage)
5336	{
5337	GMMFINDDUPPAGEINFO Args;
5338	Args.pGVM = pGVM;
5339	Args.pGMM = pGMM;
5340	Args.pSourcePage = pbSourcePage;
5341	Args.fFoundDuplicate = false;
5342	RTAvlU32DoWithAll(&pGMM->pChunks, true /* fFromLeft */, gmmR0FindDupPageInChunk, &Args);
5343
5344	pReq->fDuplicate = Args.fFoundDuplicate;
5345	}
5346	else
5347	{
5348	AssertFailed();
5349	rc = VERR_PGM_PHYS_INVALID_PAGE_ID;
5350	}
5351	}
5352	else
5353	AssertFailed();
5354	}
5355	else
5356	AssertFailed();
5357	}
5358	else
5359	rc = VERR_GMM_IS_NOT_SANE;
5360
5361	gmmR0MutexRelease(pGMM);
5362	return rc;
5363	}
5364
5365	#endif /* VBOX_STRICT && HC_ARCH_BITS == 64 */
5366
5367
5368	/**
5369	* Retrieves the GMM statistics visible to the caller.
5370	*
5371	* @returns VBox status code.
5372	*
5373	* @param pStats Where to put the statistics.
5374	* @param pSession The current session.
5375	* @param pGVM The GVM to obtain statistics for. Optional.
5376	*/
5377	GMMR0DECL(int) GMMR0QueryStatistics(PGMMSTATS pStats, PSUPDRVSESSION pSession, PGVM pGVM)
5378	{
5379	LogFlow(("GMMR0QueryStatistics: pStats=%p pSession=%p pGVM=%p\n", pStats, pSession, pGVM));
5380
5381	/*
5382	* Validate input.
5383	*/
5384	AssertPtrReturn(pSession, VERR_INVALID_POINTER);
5385	AssertPtrReturn(pStats, VERR_INVALID_POINTER);
5386	pStats->cMaxPages = 0; /* (crash before taking the mutex...) */
5387
5388	PGMM pGMM;
5389	GMM_GET_VALID_INSTANCE(pGMM, VERR_GMM_INSTANCE);
5390
5391	/*
5392	* Validate the VM handle, if not NULL, and lock the GMM.
5393	*/
5394	int rc;
5395	if (pGVM)
5396	{
5397	rc = GVMMR0ValidateGVM(pGVM);
5398	if (RT_FAILURE(rc))
5399	return rc;
5400	}
5401
5402	rc = gmmR0MutexAcquire(pGMM);
5403	if (RT_FAILURE(rc))
5404	return rc;
5405
5406	/*
5407	* Copy out the GMM statistics.
5408	*/
5409	pStats->cMaxPages = pGMM->cMaxPages;
5410	pStats->cReservedPages = pGMM->cReservedPages;
5411	pStats->cOverCommittedPages = pGMM->cOverCommittedPages;
5412	pStats->cAllocatedPages = pGMM->cAllocatedPages;
5413	pStats->cSharedPages = pGMM->cSharedPages;
5414	pStats->cDuplicatePages = pGMM->cDuplicatePages;
5415	pStats->cLeftBehindSharedPages = pGMM->cLeftBehindSharedPages;
5416	pStats->cBalloonedPages = pGMM->cBalloonedPages;
5417	pStats->cChunks = pGMM->cChunks;
5418	pStats->cFreedChunks = pGMM->cFreedChunks;
5419	pStats->cShareableModules = pGMM->cShareableModules;
5420	RT_ZERO(pStats->au64Reserved);
5421
5422	/*
5423	* Copy out the VM statistics.
5424	*/
5425	if (pGVM)
5426	pStats->VMStats = pGVM->gmm.s.Stats;
5427	else
5428	RT_ZERO(pStats->VMStats);
5429
5430	gmmR0MutexRelease(pGMM);
5431	return rc;
5432	}
5433
5434
5435	/**
5436	* VMMR0 request wrapper for GMMR0QueryStatistics.
5437	*
5438	* @returns see GMMR0QueryStatistics.
5439	* @param pGVM The global (ring-0) VM structure. Optional.
5440	* @param pReq Pointer to the request packet.
5441	*/
5442	GMMR0DECL(int) GMMR0QueryStatisticsReq(PGVM pGVM, PGMMQUERYSTATISTICSSREQ pReq)
5443	{
5444	/*
5445	* Validate input and pass it on.
5446	*/
5447	AssertPtrReturn(pReq, VERR_INVALID_POINTER);
5448	AssertMsgReturn(pReq->Hdr.cbReq == sizeof(*pReq), ("%#x != %#x\n", pReq->Hdr.cbReq, sizeof(*pReq)), VERR_INVALID_PARAMETER);
5449
5450	return GMMR0QueryStatistics(&pReq->Stats, pReq->pSession, pGVM);
5451	}
5452
5453
5454	/**
5455	* Resets the specified GMM statistics.
5456	*
5457	* @returns VBox status code.
5458	*
5459	* @param pStats Which statistics to reset, that is, non-zero fields
5460	* indicates which to reset.
5461	* @param pSession The current session.
5462	* @param pGVM The GVM to reset statistics for. Optional.
5463	*/
5464	GMMR0DECL(int) GMMR0ResetStatistics(PCGMMSTATS pStats, PSUPDRVSESSION pSession, PGVM pGVM)
5465	{
5466	NOREF(pStats); NOREF(pSession); NOREF(pGVM);
5467	/* Currently there is nothing we can reset. */
5468	return VINF_SUCCESS;
5469	}
5470
5471
5472	/**
5473	* VMMR0 request wrapper for GMMR0ResetStatistics.
5474	*
5475	* @returns see GMMR0ResetStatistics.
5476	* @param pGVM The global (ring-0) VM structure. Optional.
5477	* @param pReq Pointer to the request packet.
5478	*/
5479	GMMR0DECL(int) GMMR0ResetStatisticsReq(PGVM pGVM, PGMMRESETSTATISTICSSREQ pReq)
5480	{
5481	/*
5482	* Validate input and pass it on.
5483	*/
5484	AssertPtrReturn(pReq, VERR_INVALID_POINTER);
5485	AssertMsgReturn(pReq->Hdr.cbReq == sizeof(*pReq), ("%#x != %#x\n", pReq->Hdr.cbReq, sizeof(*pReq)), VERR_INVALID_PARAMETER);
5486
5487	return GMMR0ResetStatistics(&pReq->Stats, pReq->pSession, pGVM);
5488	}
5489
