VirtualBox

source: vbox/trunk/src/VBox/VMM/VMMR0/GMMR0.cpp@ 43361

Last change on this file since 43361 was 43235, checked in by vboxsync, 12 years ago

GMMR0.cpp: Fixed bug in GMMR0CleanupVM/gmmR0CleanupVMScanChunk affecting bound mode on all 32-bit hosts + 64-bit darwin. The problem was caused by unnecessary scanning of chunks bound to other VMs and accidentally relinking them into the set of the VM about to die. Once the GVM structure was finally fried, almost all pChunk->pSet members would point to the dead VM's GVM::gmm.s.Private member.

Also fixed a missing redo-from-start when someone else freed a chunk while we were scanning the list. This is expected to occur only rarely, but it should be reproducible when many VMs are doing cleanups at the same time in unbound mode.

  • Property svn:eol-style set to native
  • Property svn:keywords set to Id Revision
File size: 187.1 KB
1/* $Id: GMMR0.cpp 43235 2012-09-06 23:53:40Z vboxsync $ */
2/** @file
3 * GMM - Global Memory Manager.
4 */
5
6/*
7 * Copyright (C) 2007-2012 Oracle Corporation
8 *
9 * This file is part of VirtualBox Open Source Edition (OSE), as
10 * available from http://www.virtualbox.org. This file is free software;
11 * you can redistribute it and/or modify it under the terms of the GNU
12 * General Public License (GPL) as published by the Free Software
13 * Foundation, in version 2 as it comes in the "COPYING" file of the
14 * VirtualBox OSE distribution. VirtualBox OSE is distributed in the
15 * hope that it will be useful, but WITHOUT ANY WARRANTY of any kind.
16 */
17
18
19/** @page pg_gmm GMM - The Global Memory Manager
20 *
21 * As the name indicates, this component is responsible for global memory
22 * management. Currently only guest RAM is allocated from the GMM, but this
23 * may change to include shadow page tables and other bits later.
24 *
25 * Guest RAM is managed as individual pages, but allocated from the host OS
26 * in chunks for reasons of portability / efficiency. To minimize the memory
27 * footprint all tracking structures must be as small as possible without
28 * unnecessary performance penalties.
29 *
30 * The allocation chunks have a fixed size, defined at compile time
31 * by the #GMM_CHUNK_SIZE \#define.
32 *
33 * Each chunk is given a unique ID. Each page also has a unique ID. The
34 * relationship between the two IDs is:
35 * @code
36 * GMM_CHUNK_SHIFT = log2(GMM_CHUNK_SIZE / PAGE_SIZE);
37 * idPage = (idChunk << GMM_CHUNK_SHIFT) | iPage;
38 * @endcode
39 * Where iPage is the index of the page within the chunk. This ID scheme
40 * permits efficient chunk and page lookup, but it relies on the chunk size
41 * to be set at compile time. The chunks are organized in an AVL tree with their
42 * IDs being the keys.
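 *
 * To illustrate (a sketch derived from the formula above, not code taken
 * from elsewhere in this file), the reverse decomposition of a page ID is:
 * @code
 * idChunk = idPage >> GMM_CHUNK_SHIFT;
 * iPage   = idPage & ((GMM_CHUNK_SIZE / PAGE_SIZE) - 1);
 * @endcode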
43 *
44 * The physical address of each page in an allocation chunk is maintained by
45 * the #RTR0MEMOBJ and obtained using #RTR0MemObjGetPagePhysAddr. There is no
46 * need to duplicate this information (it would cost 8 bytes per page if we did).
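 *
 * For example (an illustrative sketch using the IPRT API named above):
 * @code
 * RTHCPHYS HCPhys = RTR0MemObjGetPagePhysAddr(pChunk->hMemObj, iPage);
 * @endcode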
47 *
48 * So what do we need to track per page? Most importantly we need to know
49 * which state the page is in:
50 * - Private - Allocated for (eventually) backing one particular VM page.
51 * - Shared - Readonly page that is used by one or more VMs and treated
52 * as COW by PGM.
53 * - Free - Not used by anyone.
54 *
55 * For the page replacement operations (sharing, defragmenting and freeing)
56 * to be somewhat efficient, private pages need to be associated with a
57 * particular page in a particular VM.
58 *
59 * Tracking the usage of shared pages is impractical and expensive, so we'll
60 * settle for a reference counting system instead.
61 *
62 * Free pages will be chained on LIFOs
63 *
64 * On 64-bit systems we will use a 64-bit bitfield per page, while on 32-bit
65 * systems a 32-bit bitfield will have to suffice because of address space
66 * limitations. The #GMMPAGE structure shows the details.
67 *
68 *
69 * @section sec_gmm_alloc_strat Page Allocation Strategy
70 *
71 * The strategy for allocating pages has to take fragmentation and shared
72 * pages into account, or we may end up with 2000 chunks with only
73 * a few pages in each. Shared pages cannot easily be reallocated because
74 * of the inaccurate usage accounting (see above). Private pages can be
75 * reallocated by a defragmentation thread in the same manner that sharing
76 * is done.
77 *
78 * The first approach is to manage the free pages in two sets depending on
79 * whether they are mainly for the allocation of shared or private pages.
80 * In the initial implementation there will be almost no possibility for
81 * mixing shared and private pages in the same chunk (only if we're really
82 * stressed on memory), but when we implement forking of VMs and have to
83 * deal with lots of COW pages it'll start getting kind of interesting.
84 *
85 * The sets are lists of chunks with approximately the same number of
86 * free pages. Say the chunk size is 1MB, meaning 256 pages, and a set
87 * consists of 16 lists. So, the first list will contain the chunks with
88 * 1-7 free pages, the second covers 8-15, and so on. The chunks will be
89 * moved between the lists as pages are freed up or allocated.
90 *
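 * As a hypothetical sketch (the divisor of 8 follows from the example ranges
 * above and is not necessarily the constant the code actually uses):
 * @code
 * iList = RT_MIN(pChunk->cFree / 8, cLists - 1);
 * @endcode
 *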
91 *
92 * @section sec_gmm_costs Costs
93 *
94 * The per page cost in kernel space is 32 bits plus whatever RTR0MEMOBJ
95 * entails. In addition there is the chunk cost of approximately
96 * (sizeof(RTR0MEMOBJ) + sizeof(CHUNK)) / 2^CHUNK_SHIFT bytes per page.
97 *
98 * On Windows the per page #RTR0MEMOBJ cost is 32 bits on 32-bit Windows
99 * and 64 bits on 64-bit Windows (a PFN_NUMBER in the MDL). So, 64 bits per page.
100 * The cost on Linux is identical, but here it's because of sizeof(struct page *).
101 *
102 *
103 * @section sec_gmm_legacy Legacy Mode for Non-Tier-1 Platforms
104 *
105 * In legacy mode the page source is locked user pages and not
106 * #RTR0MemObjAllocPhysNC, which means that a page can only be allocated
107 * by the VM that locked it. We will make no attempt at implementing
108 * page sharing on these systems, just do enough to make it all work.
109 *
110 *
111 * @subsection sub_gmm_locking Serializing
112 *
113 * One simple fast mutex will be employed in the initial implementation, not
114 * two as mentioned in @ref subsec_pgmPhys_Serializing.
115 *
116 * @see @ref subsec_pgmPhys_Serializing
117 *
118 *
119 * @section sec_gmm_overcommit Memory Over-Commitment Management
120 *
121 * The GVM will have to do the system wide memory over-commitment
122 * management. My current ideas are:
123 * - Per VM oc policy that indicates how much to initially commit
124 * to it and what to do in an out-of-memory situation.
125 * - Prevent overtaxing the host.
126 *
127 * There are some challenges here, the main ones are configurability and
128 * security. Should we for instance permit anyone to request 100% memory
129 * commitment? Who should be allowed to do runtime adjustments of the
130 * config? And how do we prevent these settings from being lost when the last
131 * VM process exits? The solution is probably to have an optional root
132 * daemon that will keep VMMR0.r0 in memory and enable the security measures.
133 *
134 *
135 *
136 * @section sec_gmm_numa NUMA
137 *
138 * NUMA considerations will be designed and implemented a bit later.
139 *
140 * The preliminary guess is that we will have to try to allocate memory as
141 * close as possible to the CPUs the VM is executed on (EMT and additional CPU
142 * threads). This means it's mostly about allocation and sharing policies.
143 * Both the scheduler and the allocator interface will have to supply some NUMA info
144 * and we'll need a way to calculate access costs.
145 *
146 */
147
148
149/*******************************************************************************
150* Header Files *
151*******************************************************************************/
152#define LOG_GROUP LOG_GROUP_GMM
153#include <VBox/rawpci.h>
154#include <VBox/vmm/vm.h>
155#include <VBox/vmm/gmm.h>
156#include "GMMR0Internal.h"
157#include <VBox/vmm/gvm.h>
158#include <VBox/vmm/pgm.h>
159#include <VBox/log.h>
160#include <VBox/param.h>
161#include <VBox/err.h>
162#include <iprt/asm.h>
163#include <iprt/avl.h>
164#ifdef VBOX_STRICT
165# include <iprt/crc.h>
166#endif
167#include <iprt/list.h>
168#include <iprt/mem.h>
169#include <iprt/memobj.h>
170#include <iprt/mp.h>
171#include <iprt/semaphore.h>
172#include <iprt/string.h>
173#include <iprt/time.h>
174
175
176/*******************************************************************************
177* Structures and Typedefs *
178*******************************************************************************/
179/** Pointer to set of free chunks. */
180typedef struct GMMCHUNKFREESET *PGMMCHUNKFREESET;
181
182/**
183 * The per-page tracking structure employed by the GMM.
184 *
185 * On 32-bit hosts some trickery is necessary to compress all
186 * the information into 32 bits. When the fSharedFree member is set,
187 * the 30th bit decides whether it's a free page or not.
188 *
189 * Because of the different layout on 32-bit and 64-bit hosts, macros
190 * are used to get and set some of the data.
191 */
192typedef union GMMPAGE
193{
194#if HC_ARCH_BITS == 64
195 /** Unsigned integer view. */
196 uint64_t u;
197
198 /** The common view. */
199 struct GMMPAGECOMMON
200 {
201 uint32_t uStuff1 : 32;
202 uint32_t uStuff2 : 30;
203 /** The page state. */
204 uint32_t u2State : 2;
205 } Common;
206
207 /** The view of a private page. */
208 struct GMMPAGEPRIVATE
209 {
210 /** The guest page frame number. (Max addressable: 2 ^ 44 - 16) */
211 uint32_t pfn;
212 /** The GVM handle. (64K VMs) */
213 uint32_t hGVM : 16;
214 /** Reserved. */
215 uint32_t u16Reserved : 14;
216 /** The page state. */
217 uint32_t u2State : 2;
218 } Private;
219
220 /** The view of a shared page. */
221 struct GMMPAGESHARED
222 {
223 /** The host page frame number. (Max addressable: 2 ^ 44 - 16) */
224 uint32_t pfn;
225 /** The reference count (64K VMs). */
226 uint32_t cRefs : 16;
227 /** Used for debug checksumming. */
228 uint32_t u14Checksum : 14;
229 /** The page state. */
230 uint32_t u2State : 2;
231 } Shared;
232
233 /** The view of a free page. */
234 struct GMMPAGEFREE
235 {
236 /** The index of the next page in the free list. UINT16_MAX is NIL. */
237 uint16_t iNext;
238 /** Reserved. Checksum or something? */
239 uint16_t u16Reserved0;
240 /** Reserved. Checksum or something? */
241 uint32_t u30Reserved1 : 30;
242 /** The page state. */
243 uint32_t u2State : 2;
244 } Free;
245
246#else /* 32-bit */
247 /** Unsigned integer view. */
248 uint32_t u;
249
250 /** The common view. */
251 struct GMMPAGECOMMON
252 {
253 uint32_t uStuff : 30;
254 /** The page state. */
255 uint32_t u2State : 2;
256 } Common;
257
258 /** The view of a private page. */
259 struct GMMPAGEPRIVATE
260 {
261 /** The guest page frame number. (Max addressable: 2 ^ 36) */
262 uint32_t pfn : 24;
263 /** The GVM handle. (127 VMs) */
264 uint32_t hGVM : 7;
265 /** The top page state bit, MBZ. */
266 uint32_t fZero : 1;
267 } Private;
268
269 /** The view of a shared page. */
270 struct GMMPAGESHARED
271 {
272 /** The reference count. */
273 uint32_t cRefs : 30;
274 /** The page state. */
275 uint32_t u2State : 2;
276 } Shared;
277
278 /** The view of a free page. */
279 struct GMMPAGEFREE
280 {
281 /** The index of the next page in the free list. UINT16_MAX is NIL. */
282 uint32_t iNext : 16;
283 /** Reserved. Checksum or something? */
284 uint32_t u14Reserved : 14;
285 /** The page state. */
286 uint32_t u2State : 2;
287 } Free;
288#endif
289} GMMPAGE;
290AssertCompileSize(GMMPAGE, sizeof(RTHCUINTPTR));
291/** Pointer to a GMMPAGE. */
292typedef GMMPAGE *PGMMPAGE;
293
294
295/** @name The Page States.
296 * @{ */
297/** A private page. */
298#define GMM_PAGE_STATE_PRIVATE 0
299/** A private page - alternative value used on the 32-bit implementation.
300 * This will never be used on 64-bit hosts. */
301#define GMM_PAGE_STATE_PRIVATE_32 1
302/** A shared page. */
303#define GMM_PAGE_STATE_SHARED 2
304/** A free page. */
305#define GMM_PAGE_STATE_FREE 3
306/** @} */
307
308
309/** @def GMM_PAGE_IS_PRIVATE
310 *
311 * @returns true if private, false if not.
312 * @param pPage The GMM page.
313 */
314#if HC_ARCH_BITS == 64
315# define GMM_PAGE_IS_PRIVATE(pPage) ( (pPage)->Common.u2State == GMM_PAGE_STATE_PRIVATE )
316#else
317# define GMM_PAGE_IS_PRIVATE(pPage) ( (pPage)->Private.fZero == 0 )
318#endif
319
320/** @def GMM_PAGE_IS_SHARED
321 *
322 * @returns true if shared, false if not.
323 * @param pPage The GMM page.
324 */
325#define GMM_PAGE_IS_SHARED(pPage) ( (pPage)->Common.u2State == GMM_PAGE_STATE_SHARED )
326
327/** @def GMM_PAGE_IS_FREE
328 *
329 * @returns true if free, false if not.
330 * @param pPage The GMM page.
331 */
332#define GMM_PAGE_IS_FREE(pPage) ( (pPage)->Common.u2State == GMM_PAGE_STATE_FREE )
333
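/*
 * Usage sketch (illustrative only, not part of the original file): classifying
 * a page entry with the state macros above.  The helper name is hypothetical.
 */
#if 0
static bool gmmR0ExamplePageIsOwnedBy(PGMMPAGE pPage, uint16_t hGVM)
{
    /* Only private pages are owned by a single VM; shared and free pages are not. */
    return GMM_PAGE_IS_PRIVATE(pPage)
        && pPage->Private.hGVM == hGVM;
}
#endif
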
334/** @def GMM_PAGE_PFN_LAST
335 * The last valid guest pfn range.
336 * @remark Some of the values outside the range have special meaning,
337 * see GMM_PAGE_PFN_UNSHAREABLE.
338 */
339#if HC_ARCH_BITS == 64
340# define GMM_PAGE_PFN_LAST UINT32_C(0xfffffff0)
341#else
342# define GMM_PAGE_PFN_LAST UINT32_C(0x00fffff0)
343#endif
344AssertCompile(GMM_PAGE_PFN_LAST == (GMM_GCPHYS_LAST >> PAGE_SHIFT));
345
346/** @def GMM_PAGE_PFN_UNSHAREABLE
347 * Indicates that this page isn't used for normal guest memory and thus isn't shareable.
348 */
349#if HC_ARCH_BITS == 64
350# define GMM_PAGE_PFN_UNSHAREABLE UINT32_C(0xfffffff1)
351#else
352# define GMM_PAGE_PFN_UNSHAREABLE UINT32_C(0x00fffff1)
353#endif
354AssertCompile(GMM_PAGE_PFN_UNSHAREABLE == (GMM_GCPHYS_UNSHAREABLE >> PAGE_SHIFT));
355
356
357/**
358 * A GMM allocation chunk ring-3 mapping record.
359 *
360 * This should really be associated with a session and not a VM, but
361 * it's simpler to associate it with a VM and clean up when the VM object
362 * is destroyed.
363 */
364typedef struct GMMCHUNKMAP
365{
366 /** The mapping object. */
367 RTR0MEMOBJ hMapObj;
368 /** The VM owning the mapping. */
369 PGVM pGVM;
370} GMMCHUNKMAP;
371/** Pointer to a GMM allocation chunk mapping. */
372typedef struct GMMCHUNKMAP *PGMMCHUNKMAP;
373
374
375/**
376 * A GMM allocation chunk.
377 */
378typedef struct GMMCHUNK
379{
380 /** The AVL node core.
381 * The Key is the chunk ID. (Giant mtx.) */
382 AVLU32NODECORE Core;
383 /** The memory object.
384 * Either from RTR0MemObjAllocPhysNC or RTR0MemObjLockUser depending on
385 * what the host can dish up. (Chunk mtx protects mapping accesses
386 * and related frees.) */
387 RTR0MEMOBJ hMemObj;
388 /** Pointer to the next chunk in the free list. (Giant mtx.) */
389 PGMMCHUNK pFreeNext;
390 /** Pointer to the previous chunk in the free list. (Giant mtx.) */
391 PGMMCHUNK pFreePrev;
392 /** Pointer to the free set this chunk belongs to. NULL for
393 * chunks with no free pages. (Giant mtx.) */
394 PGMMCHUNKFREESET pSet;
395 /** List node in the chunk list (GMM::ChunkList). (Giant mtx.) */
396 RTLISTNODE ListNode;
397 /** Pointer to an array of mappings. (Chunk mtx.) */
398 PGMMCHUNKMAP paMappingsX;
399 /** The number of mappings. (Chunk mtx.) */
400 uint16_t cMappingsX;
401 /** The mapping lock this chunk is using. UINT8_MAX if nobody is
402 * mapping or freeing anything. (Giant mtx.) */
403 uint8_t volatile iChunkMtx;
404 /** Flags field reserved for future use (like eliminating enmType).
405 * (Giant mtx.) */
406 uint8_t fFlags;
407 /** The head of the list of free pages. UINT16_MAX is the NIL value.
408 * (Giant mtx.) */
409 uint16_t iFreeHead;
410 /** The number of free pages. (Giant mtx.) */
411 uint16_t cFree;
412 /** The GVM handle of the VM that first allocated pages from this chunk; this
413 * is used as a preference when there are several chunks to choose from.
414 * When in bound memory mode this isn't a preference any longer. (Giant
415 * mtx.) */
416 uint16_t hGVM;
417 /** The ID of the NUMA node the memory mostly resides on. (Reserved for
418 * future use.) (Giant mtx.) */
419 uint16_t idNumaNode;
420 /** The number of private pages. (Giant mtx.) */
421 uint16_t cPrivate;
422 /** The number of shared pages. (Giant mtx.) */
423 uint16_t cShared;
424 /** The pages. (Giant mtx.) */
425 GMMPAGE aPages[GMM_CHUNK_SIZE >> PAGE_SHIFT];
426} GMMCHUNK;
427
428/** Indicates that the NUMA properties of the memory are unknown. */
429#define GMM_CHUNK_NUMA_ID_UNKNOWN UINT16_C(0xfffe)
430
431/** @name GMM_CHUNK_FLAGS_XXX - chunk flags.
432 * @{ */
433/** Indicates that the chunk is a large page (2MB). */
434#define GMM_CHUNK_FLAGS_LARGE_PAGE UINT16_C(0x0001)
435/** @} */
436
437
438/**
439 * An allocation chunk TLB entry.
440 */
441typedef struct GMMCHUNKTLBE
442{
443 /** The chunk id. */
444 uint32_t idChunk;
445 /** Pointer to the chunk. */
446 PGMMCHUNK pChunk;
447} GMMCHUNKTLBE;
448/** Pointer to an allocation chunk TLB entry. */
449typedef GMMCHUNKTLBE *PGMMCHUNKTLBE;
450
451
452/** The number of entries in the allocation chunk TLB. */
453#define GMM_CHUNKTLB_ENTRIES 32
454/** Gets the TLB entry index for the given Chunk ID. */
455#define GMM_CHUNKTLB_IDX(idChunk) ( (idChunk) & (GMM_CHUNKTLB_ENTRIES - 1) )
456
457/**
458 * An allocation chunk TLB.
459 */
460typedef struct GMMCHUNKTLB
461{
462 /** The TLB entries. */
463 GMMCHUNKTLBE aEntries[GMM_CHUNKTLB_ENTRIES];
464} GMMCHUNKTLB;
465/** Pointer to an allocation chunk TLB. */
466typedef GMMCHUNKTLB *PGMMCHUNKTLB;
467
468
469/**
470 * The GMM instance data.
471 */
472typedef struct GMM
473{
474 /** Magic / eye catcher. GMM_MAGIC */
475 uint32_t u32Magic;
476 /** The number of threads waiting on the mutex. */
477 uint32_t cMtxContenders;
478 /** The fast mutex protecting the GMM.
479 * More fine grained locking can be implemented later if necessary. */
480 RTSEMFASTMUTEX hMtx;
481#ifdef VBOX_STRICT
482 /** The current mutex owner. */
483 RTNATIVETHREAD hMtxOwner;
484#endif
485 /** The chunk tree. */
486 PAVLU32NODECORE pChunks;
487 /** The chunk TLB. */
488 GMMCHUNKTLB ChunkTLB;
489 /** The private free set. */
490 GMMCHUNKFREESET PrivateX;
491 /** The shared free set. */
492 GMMCHUNKFREESET Shared;
493
494 /** Shared module tree (global).
495 * @todo separate trees for distinctly different guest OSes. */
496 PAVLLU32NODECORE pGlobalSharedModuleTree;
497 /** Sharable modules (count of nodes in pGlobalSharedModuleTree). */
498 uint32_t cShareableModules;
499
500 /** The chunk list. For simplifying the cleanup process. */
501 RTLISTANCHOR ChunkList;
502
503 /** The maximum number of pages we're allowed to allocate.
504 * @gcfgm 64-bit GMM/MaxPages Direct.
505 * @gcfgm 32-bit GMM/PctPages Relative to the number of host pages. */
506 uint64_t cMaxPages;
507 /** The number of pages that have been reserved.
508 * The deal is that cReservedPages - cOverCommittedPages <= cMaxPages. */
509 uint64_t cReservedPages;
510 /** The number of pages that we have over-committed in reservations. */
511 uint64_t cOverCommittedPages;
512 /** The number of actually allocated (committed if you like) pages. */
513 uint64_t cAllocatedPages;
514 /** The number of pages that are shared. A subset of cAllocatedPages. */
515 uint64_t cSharedPages;
516 /** The number of pages that are actually shared between VMs. */
517 uint64_t cDuplicatePages;
518 /** The number of shared pages that have been left behind by
519 * VMs not doing proper cleanups. */
520 uint64_t cLeftBehindSharedPages;
521 /** The number of allocation chunks.
522 * (The number of pages we've allocated from the host can be derived from this.) */
523 uint32_t cChunks;
524 /** The number of current ballooned pages. */
525 uint64_t cBalloonedPages;
526
527 /** The legacy allocation mode indicator.
528 * This is determined at initialization time. */
529 bool fLegacyAllocationMode;
530 /** The bound memory mode indicator.
531 * When set, the memory will be bound to a specific VM and never
532 * shared. This is always set if fLegacyAllocationMode is set.
533 * (Also determined at initialization time.) */
534 bool fBoundMemoryMode;
535 /** The number of registered VMs. */
536 uint16_t cRegisteredVMs;
537
538 /** The number of freed chunks ever. This is used as a list generation to
539 * avoid restarting the cleanup scanning when the list wasn't modified. */
540 uint32_t cFreedChunks;
541 /** The previously allocated chunk ID.
542 * Used as a hint to avoid scanning the whole bitmap. */
543 uint32_t idChunkPrev;
544 /** Chunk ID allocation bitmap.
545 * Bits of allocated IDs are set, free ones are clear.
546 * The NIL id (0) is marked allocated. */
547 uint32_t bmChunkId[(GMM_CHUNKID_LAST + 1 + 31) / 32];
548
549 /** The index of the next mutex to use. */
550 uint32_t iNextChunkMtx;
551 /** Chunk locks for reducing lock contention without having to allocate
552 * one lock per chunk. */
553 struct
554 {
555 /** The mutex */
556 RTSEMFASTMUTEX hMtx;
557 /** The number of threads currently using this mutex. */
558 uint32_t volatile cUsers;
559 } aChunkMtx[64];
560} GMM;
561/** Pointer to the GMM instance. */
562typedef GMM *PGMM;
563
564/** The value of GMM::u32Magic (Katsuhiro Otomo). */
565#define GMM_MAGIC UINT32_C(0x19540414)
566
567
568/**
569 * GMM chunk mutex state.
570 *
571 * This is returned by gmmR0ChunkMutexAcquire and is used by the other
572 * gmmR0ChunkMutex* methods.
573 */
574typedef struct GMMR0CHUNKMTXSTATE
575{
576 PGMM pGMM;
577 /** The index of the chunk mutex. */
578 uint8_t iChunkMtx;
579 /** The relevant flags (GMMR0CHUNK_MTX_XXX). */
580 uint8_t fFlags;
581} GMMR0CHUNKMTXSTATE;
582/** Pointer to a chunk mutex state. */
583typedef GMMR0CHUNKMTXSTATE *PGMMR0CHUNKMTXSTATE;
584
585/** @name GMMR0CHUNK_MTX_XXX
586 * @{ */
587#define GMMR0CHUNK_MTX_INVALID UINT32_C(0)
588#define GMMR0CHUNK_MTX_KEEP_GIANT UINT32_C(1)
589#define GMMR0CHUNK_MTX_RETAKE_GIANT UINT32_C(2)
590#define GMMR0CHUNK_MTX_DROP_GIANT UINT32_C(3)
591#define GMMR0CHUNK_MTX_END UINT32_C(4)
592/** @} */
593
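/*
 * Usage sketch (illustrative only, not part of the original file): the typical
 * pattern for the GMMR0CHUNK_MTX_XXX flags, mirroring how the chunk mutex is
 * used by the cleanup code further down in this file.
 */
#if 0
static void gmmR0ExampleChunkMtxPattern(PGMM pGMM, PGMMCHUNK pChunk)
{
    GMMR0CHUNKMTXSTATE MtxState;
    gmmR0ChunkMutexAcquire(&MtxState, pGMM, pChunk, GMMR0CHUNK_MTX_KEEP_GIANT);
    /* ... access the chunk mapping members here ... */
    gmmR0ChunkMutexRelease(&MtxState, pChunk);
}
#endif
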
594
595/** The maximum number of shared modules per VM. */
596#define GMM_MAX_SHARED_PER_VM_MODULES 2048
597/** The maximum number of shared modules GMM is allowed to track. */
598#define GMM_MAX_SHARED_GLOBAL_MODULES 16834
599
600
601/**
602 * Argument packet for gmmR0SharedModuleCleanup.
603 */
604typedef struct GMMR0SHMODPERVMDTORARGS
605{
606 PGVM pGVM;
607 PGMM pGMM;
608} GMMR0SHMODPERVMDTORARGS;
609
610/**
611 * Argument packet for gmmR0CheckSharedModule.
612 */
613typedef struct GMMCHECKSHAREDMODULEINFO
614{
615 PGVM pGVM;
616 VMCPUID idCpu;
617} GMMCHECKSHAREDMODULEINFO;
618
619/**
620 * Argument packet for gmmR0FindDupPageInChunk by GMMR0FindDuplicatePage.
621 */
622typedef struct GMMFINDDUPPAGEINFO
623{
624 PGVM pGVM;
625 PGMM pGMM;
626 uint8_t *pSourcePage;
627 bool fFoundDuplicate;
628} GMMFINDDUPPAGEINFO;
629
630
631/*******************************************************************************
632* Global Variables *
633*******************************************************************************/
634/** Pointer to the GMM instance data. */
635static PGMM g_pGMM = NULL;
636
637/** Macro for obtaining and validating the g_pGMM pointer.
638 *
639 * On failure it will return from the invoking function with the specified
640 * return value.
641 *
642 * @param pGMM The name of the pGMM variable.
643 * @param rc The return value on failure. Use VERR_GMM_INSTANCE for VBox
644 * status codes.
645 */
646#define GMM_GET_VALID_INSTANCE(pGMM, rc) \
647 do { \
648 (pGMM) = g_pGMM; \
649 AssertPtrReturn((pGMM), (rc)); \
650 AssertMsgReturn((pGMM)->u32Magic == GMM_MAGIC, ("%p - %#x\n", (pGMM), (pGMM)->u32Magic), (rc)); \
651 } while (0)
652
653/** Macro for obtaining and validating the g_pGMM pointer, void function
654 * variant.
655 *
656 * On failure it will return from the invoking function.
657 *
658 * @param pGMM The name of the pGMM variable.
659 */
660#define GMM_GET_VALID_INSTANCE_VOID(pGMM) \
661 do { \
662 (pGMM) = g_pGMM; \
663 AssertPtrReturnVoid((pGMM)); \
664 AssertMsgReturnVoid((pGMM)->u32Magic == GMM_MAGIC, ("%p - %#x\n", (pGMM), (pGMM)->u32Magic)); \
665 } while (0)
666
667
668/** @def GMM_CHECK_SANITY_UPON_ENTERING
669 * Checks the sanity of the GMM instance data before making changes.
670 *
671 * This macro is a stub by default and must be enabled manually in the code.
672 *
673 * @returns true if sane, false if not.
674 * @param pGMM The name of the pGMM variable.
675 */
676#if defined(VBOX_STRICT) && defined(GMMR0_WITH_SANITY_CHECK) && 0
677# define GMM_CHECK_SANITY_UPON_ENTERING(pGMM) (gmmR0SanityCheck((pGMM), __PRETTY_FUNCTION__, __LINE__) == 0)
678#else
679# define GMM_CHECK_SANITY_UPON_ENTERING(pGMM) (true)
680#endif
681
682/** @def GMM_CHECK_SANITY_UPON_LEAVING
683 * Checks the sanity of the GMM instance data after making changes.
684 *
685 * This macro is a stub by default and must be enabled manually in the code.
686 *
687 * @returns true if sane, false if not.
688 * @param pGMM The name of the pGMM variable.
689 */
690#if defined(VBOX_STRICT) && defined(GMMR0_WITH_SANITY_CHECK) && 0
691# define GMM_CHECK_SANITY_UPON_LEAVING(pGMM) (gmmR0SanityCheck((pGMM), __PRETTY_FUNCTION__, __LINE__) == 0)
692#else
693# define GMM_CHECK_SANITY_UPON_LEAVING(pGMM) (true)
694#endif
695
696/** @def GMM_CHECK_SANITY_IN_LOOPS
697 * Checks the sanity of the GMM instance in the allocation loops.
698 *
699 * This macro is a stub by default and must be enabled manually in the code.
700 *
701 * @returns true if sane, false if not.
702 * @param pGMM The name of the pGMM variable.
703 */
704#if defined(VBOX_STRICT) && defined(GMMR0_WITH_SANITY_CHECK) && 0
705# define GMM_CHECK_SANITY_IN_LOOPS(pGMM) (gmmR0SanityCheck((pGMM), __PRETTY_FUNCTION__, __LINE__) == 0)
706#else
707# define GMM_CHECK_SANITY_IN_LOOPS(pGMM) (true)
708#endif
709
710
711/*******************************************************************************
712* Internal Functions *
713*******************************************************************************/
714static DECLCALLBACK(int) gmmR0TermDestroyChunk(PAVLU32NODECORE pNode, void *pvGMM);
715static bool gmmR0CleanupVMScanChunk(PGMM pGMM, PGVM pGVM, PGMMCHUNK pChunk);
716DECLINLINE(void) gmmR0UnlinkChunk(PGMMCHUNK pChunk);
717DECLINLINE(void) gmmR0LinkChunk(PGMMCHUNK pChunk, PGMMCHUNKFREESET pSet);
718DECLINLINE(void) gmmR0SelectSetAndLinkChunk(PGMM pGMM, PGVM pGVM, PGMMCHUNK pChunk);
719#ifdef GMMR0_WITH_SANITY_CHECK
720static uint32_t gmmR0SanityCheck(PGMM pGMM, const char *pszFunction, unsigned uLineNo);
721#endif
722static bool gmmR0FreeChunk(PGMM pGMM, PGVM pGVM, PGMMCHUNK pChunk, bool fRelaxedSem);
723DECLINLINE(void) gmmR0FreePrivatePage(PGMM pGMM, PGVM pGVM, uint32_t idPage, PGMMPAGE pPage);
724DECLINLINE(void) gmmR0FreeSharedPage(PGMM pGMM, PGVM pGVM, uint32_t idPage, PGMMPAGE pPage);
725static int gmmR0UnmapChunkLocked(PGMM pGMM, PGVM pGVM, PGMMCHUNK pChunk);
726#ifdef VBOX_WITH_PAGE_SHARING
727static void gmmR0SharedModuleCleanup(PGMM pGMM, PGVM pGVM);
728# ifdef VBOX_STRICT
729static uint32_t gmmR0StrictPageChecksum(PGMM pGMM, PGVM pGVM, uint32_t idPage);
730# endif
731#endif
732
733
734
735/**
736 * Initializes the GMM component.
737 *
738 * This is called when the VMMR0.r0 module is loaded and protected by the
739 * loader semaphore.
740 *
741 * @returns VBox status code.
742 */
743GMMR0DECL(int) GMMR0Init(void)
744{
745 LogFlow(("GMMInit:\n"));
746
747 /*
748 * Allocate the instance data and the locks.
749 */
750 PGMM pGMM = (PGMM)RTMemAllocZ(sizeof(*pGMM));
751 if (!pGMM)
752 return VERR_NO_MEMORY;
753
754 pGMM->u32Magic = GMM_MAGIC;
755 for (unsigned i = 0; i < RT_ELEMENTS(pGMM->ChunkTLB.aEntries); i++)
756 pGMM->ChunkTLB.aEntries[i].idChunk = NIL_GMM_CHUNKID;
757 RTListInit(&pGMM->ChunkList);
758 ASMBitSet(&pGMM->bmChunkId[0], NIL_GMM_CHUNKID);
759
760 int rc = RTSemFastMutexCreate(&pGMM->hMtx);
761 if (RT_SUCCESS(rc))
762 {
763 unsigned iMtx;
764 for (iMtx = 0; iMtx < RT_ELEMENTS(pGMM->aChunkMtx); iMtx++)
765 {
766 rc = RTSemFastMutexCreate(&pGMM->aChunkMtx[iMtx].hMtx);
767 if (RT_FAILURE(rc))
768 break;
769 }
770 if (RT_SUCCESS(rc))
771 {
772 /*
773 * Check and see if RTR0MemObjAllocPhysNC works.
774 */
775#if 0 /* later, see @bugref{3170}. */
776 RTR0MEMOBJ MemObj;
777 rc = RTR0MemObjAllocPhysNC(&MemObj, _64K, NIL_RTHCPHYS);
778 if (RT_SUCCESS(rc))
779 {
780 rc = RTR0MemObjFree(MemObj, true);
781 AssertRC(rc);
782 }
783 else if (rc == VERR_NOT_SUPPORTED)
784 pGMM->fLegacyAllocationMode = pGMM->fBoundMemoryMode = true;
785 else
786 SUPR0Printf("GMMR0Init: RTR0MemObjAllocPhysNC(,64K,Any) -> %d!\n", rc);
787#else
788# if defined(RT_OS_WINDOWS) || (defined(RT_OS_SOLARIS) && ARCH_BITS == 64) || defined(RT_OS_LINUX) || defined(RT_OS_FREEBSD)
789 pGMM->fLegacyAllocationMode = false;
790# if ARCH_BITS == 32
791 /* Don't reuse possibly partial chunks because of the virtual
792 address space limitation. */
793 pGMM->fBoundMemoryMode = true;
794# else
795 pGMM->fBoundMemoryMode = false;
796# endif
797# else
798 pGMM->fLegacyAllocationMode = true;
799 pGMM->fBoundMemoryMode = true;
800# endif
801#endif
802
803 /*
804 * Query system page count and guess a reasonable cMaxPages value.
805 */
806 pGMM->cMaxPages = UINT32_MAX; /** @todo IPRT function for query ram size and such. */
807
808 g_pGMM = pGMM;
809 LogFlow(("GMMInit: pGMM=%p fLegacyAllocationMode=%RTbool fBoundMemoryMode=%RTbool\n", pGMM, pGMM->fLegacyAllocationMode, pGMM->fBoundMemoryMode));
810 return VINF_SUCCESS;
811 }
812
813 /*
814 * Bail out.
815 */
816 while (iMtx-- > 0)
817 RTSemFastMutexDestroy(pGMM->aChunkMtx[iMtx].hMtx);
818 RTSemFastMutexDestroy(pGMM->hMtx);
819 }
820
821 pGMM->u32Magic = 0;
822 RTMemFree(pGMM);
823 SUPR0Printf("GMMR0Init: failed! rc=%d\n", rc);
824 return rc;
825}
826
827
828/**
829 * Terminates the GMM component.
830 */
831GMMR0DECL(void) GMMR0Term(void)
832{
833 LogFlow(("GMMTerm:\n"));
834
835 /*
836 * Take care / be paranoid...
837 */
838 PGMM pGMM = g_pGMM;
839 if (!VALID_PTR(pGMM))
840 return;
841 if (pGMM->u32Magic != GMM_MAGIC)
842 {
843 SUPR0Printf("GMMR0Term: u32Magic=%#x\n", pGMM->u32Magic);
844 return;
845 }
846
847 /*
848 * Undo what init did and free all the resources we've acquired.
849 */
850 /* Destroy the fundamentals. */
851 g_pGMM = NULL;
852 pGMM->u32Magic = ~GMM_MAGIC;
853 RTSemFastMutexDestroy(pGMM->hMtx);
854 pGMM->hMtx = NIL_RTSEMFASTMUTEX;
855
856 /* Free any chunks still hanging around. */
857 RTAvlU32Destroy(&pGMM->pChunks, gmmR0TermDestroyChunk, pGMM);
858
859 /* Destroy the chunk locks. */
860 for (unsigned iMtx = 0; iMtx < RT_ELEMENTS(pGMM->aChunkMtx); iMtx++)
861 {
862 Assert(pGMM->aChunkMtx[iMtx].cUsers == 0);
863 RTSemFastMutexDestroy(pGMM->aChunkMtx[iMtx].hMtx);
864 pGMM->aChunkMtx[iMtx].hMtx = NIL_RTSEMFASTMUTEX;
865 }
866
867 /* Finally the instance data itself. */
868 RTMemFree(pGMM);
869 LogFlow(("GMMTerm: done\n"));
870}
871
872
873/**
874 * RTAvlU32Destroy callback.
875 *
876 * @returns 0
877 * @param pNode The node to destroy.
878 * @param pvGMM The GMM handle.
879 */
880static DECLCALLBACK(int) gmmR0TermDestroyChunk(PAVLU32NODECORE pNode, void *pvGMM)
881{
882 PGMMCHUNK pChunk = (PGMMCHUNK)pNode;
883
884 if (pChunk->cFree != (GMM_CHUNK_SIZE >> PAGE_SHIFT))
885 SUPR0Printf("GMMR0Term: %p/%#x: cFree=%d cPrivate=%d cShared=%d cMappings=%d\n", pChunk,
886 pChunk->Core.Key, pChunk->cFree, pChunk->cPrivate, pChunk->cShared, pChunk->cMappingsX);
887
888 int rc = RTR0MemObjFree(pChunk->hMemObj, true /* fFreeMappings */);
889 if (RT_FAILURE(rc))
890 {
891 SUPR0Printf("GMMR0Term: %p/%#x: RTRMemObjFree(%p,true) -> %d (cMappings=%d)\n", pChunk,
892 pChunk->Core.Key, pChunk->hMemObj, rc, pChunk->cMappingsX);
893 AssertRC(rc);
894 }
895 pChunk->hMemObj = NIL_RTR0MEMOBJ;
896
897 RTMemFree(pChunk->paMappingsX);
898 pChunk->paMappingsX = NULL;
899
900 RTMemFree(pChunk);
901 NOREF(pvGMM);
902 return 0;
903}
904
905
906/**
907 * Initializes the per-VM data for the GMM.
908 *
909 * This is called from within the GVMM lock (from GVMMR0CreateVM)
910 * and should only initialize the data members so GMMR0CleanupVM
911 * can deal with them. We reserve no memory or anything here;
912 * that's done later in GMMR0InitVM.
913 *
914 * @param pGVM Pointer to the Global VM structure.
915 */
916GMMR0DECL(void) GMMR0InitPerVMData(PGVM pGVM)
917{
918 AssertCompile(RT_SIZEOFMEMB(GVM,gmm.s) <= RT_SIZEOFMEMB(GVM,gmm.padding));
919
920 pGVM->gmm.s.Stats.enmPolicy = GMMOCPOLICY_INVALID;
921 pGVM->gmm.s.Stats.enmPriority = GMMPRIORITY_INVALID;
922 pGVM->gmm.s.Stats.fMayAllocate = false;
923}
924
925
926/**
927 * Acquires the GMM giant lock.
928 *
929 * @returns Assert status code from RTSemFastMutexRequest.
930 * @param pGMM Pointer to the GMM instance.
931 */
932static int gmmR0MutexAcquire(PGMM pGMM)
933{
934 ASMAtomicIncU32(&pGMM->cMtxContenders);
935 int rc = RTSemFastMutexRequest(pGMM->hMtx);
936 ASMAtomicDecU32(&pGMM->cMtxContenders);
937 AssertRC(rc);
938#ifdef VBOX_STRICT
939 pGMM->hMtxOwner = RTThreadNativeSelf();
940#endif
941 return rc;
942}
943
944
945/**
946 * Releases the GMM giant lock.
947 *
948 * @returns Assert status code from RTSemFastMutexRelease.
949 * @param pGMM Pointer to the GMM instance.
950 */
951static int gmmR0MutexRelease(PGMM pGMM)
952{
953#ifdef VBOX_STRICT
954 pGMM->hMtxOwner = NIL_RTNATIVETHREAD;
955#endif
956 int rc = RTSemFastMutexRelease(pGMM->hMtx);
957 AssertRC(rc);
958 return rc;
959}
960
961
962/**
963 * Yields the GMM giant lock if there is contention and a certain minimum time
964 * has elapsed since we took it.
965 *
966 * @returns @c true if the mutex was yielded, @c false if not.
967 * @param pGMM Pointer to the GMM instance.
968 * @param puLockNanoTS Where the lock acquisition time stamp is kept
969 * (in/out).
970 */
971static bool gmmR0MutexYield(PGMM pGMM, uint64_t *puLockNanoTS)
972{
973 /*
974 * If nobody is contending the mutex, don't bother checking the time.
975 */
976 if (ASMAtomicReadU32(&pGMM->cMtxContenders) == 0)
977 return false;
978
979 /*
980 * Don't yield if we haven't executed for at least 2 milliseconds.
981 */
982 uint64_t uNanoNow = RTTimeSystemNanoTS();
983 if (uNanoNow - *puLockNanoTS < UINT32_C(2000000))
984 return false;
985
986 /*
987 * Yield the mutex.
988 */
989#ifdef VBOX_STRICT
990 pGMM->hMtxOwner = NIL_RTNATIVETHREAD;
991#endif
992 ASMAtomicIncU32(&pGMM->cMtxContenders);
993 int rc1 = RTSemFastMutexRelease(pGMM->hMtx); AssertRC(rc1);
994
995 RTThreadYield();
996
997 int rc2 = RTSemFastMutexRequest(pGMM->hMtx); AssertRC(rc2);
998 *puLockNanoTS = RTTimeSystemNanoTS();
999 ASMAtomicDecU32(&pGMM->cMtxContenders);
1000#ifdef VBOX_STRICT
1001 pGMM->hMtxOwner = RTThreadNativeSelf();
1002#endif
1003
1004 return true;
1005}
1006
1007
1008/**
1009 * Acquires a chunk lock.
1010 *
1011 * The caller must own the giant lock.
1012 *
1013 * @returns Assert status code from RTSemFastMutexRequest.
1014 * @param pMtxState The chunk mutex state info. (Avoids
1015 * passing the same flags and stuff around
1016 * for subsequent release and drop-giant
1017 * calls.)
1018 * @param pGMM Pointer to the GMM instance.
1019 * @param pChunk Pointer to the chunk.
1020 * @param fFlags Flags regarding the giant lock, GMMR0CHUNK_MTX_XXX.
1021 */
1022static int gmmR0ChunkMutexAcquire(PGMMR0CHUNKMTXSTATE pMtxState, PGMM pGMM, PGMMCHUNK pChunk, uint32_t fFlags)
1023{
1024 Assert(fFlags > GMMR0CHUNK_MTX_INVALID && fFlags < GMMR0CHUNK_MTX_END);
1025 Assert(pGMM->hMtxOwner == RTThreadNativeSelf());
1026
1027 pMtxState->pGMM = pGMM;
1028 pMtxState->fFlags = (uint8_t)fFlags;
1029
1030 /*
1031 * Get the lock index and reference the lock.
1032 */
1033 Assert(pGMM->hMtxOwner == RTThreadNativeSelf());
1034 uint32_t iChunkMtx = pChunk->iChunkMtx;
1035 if (iChunkMtx == UINT8_MAX)
1036 {
1037 iChunkMtx = pGMM->iNextChunkMtx++;
1038 iChunkMtx %= RT_ELEMENTS(pGMM->aChunkMtx);
1039
1040 /* Try get an unused one... */
1041 if (pGMM->aChunkMtx[iChunkMtx].cUsers)
1042 {
1043 iChunkMtx = pGMM->iNextChunkMtx++;
1044 iChunkMtx %= RT_ELEMENTS(pGMM->aChunkMtx);
1045 if (pGMM->aChunkMtx[iChunkMtx].cUsers)
1046 {
1047 iChunkMtx = pGMM->iNextChunkMtx++;
1048 iChunkMtx %= RT_ELEMENTS(pGMM->aChunkMtx);
1049 if (pGMM->aChunkMtx[iChunkMtx].cUsers)
1050 {
1051 iChunkMtx = pGMM->iNextChunkMtx++;
1052 iChunkMtx %= RT_ELEMENTS(pGMM->aChunkMtx);
1053 }
1054 }
1055 }
1056
1057 pChunk->iChunkMtx = iChunkMtx;
1058 }
1059 AssertCompile(RT_ELEMENTS(pGMM->aChunkMtx) < UINT8_MAX);
1060 pMtxState->iChunkMtx = (uint8_t)iChunkMtx;
1061 ASMAtomicIncU32(&pGMM->aChunkMtx[iChunkMtx].cUsers);
1062
1063 /*
1064 * Drop the giant?
1065 */
1066 if (fFlags != GMMR0CHUNK_MTX_KEEP_GIANT)
1067 {
1068 /** @todo GMM life cycle cleanup (we may race someone
1069 * destroying and cleaning up GMM)? */
1070 gmmR0MutexRelease(pGMM);
1071 }
1072
1073 /*
1074 * Take the chunk mutex.
1075 */
1076 int rc = RTSemFastMutexRequest(pGMM->aChunkMtx[iChunkMtx].hMtx);
1077 AssertRC(rc);
1078 return rc;
1079}
1080
1081
1082/**
1083 * Releases the chunk mutex acquired by gmmR0ChunkMutexAcquire, retaking
1084 * the giant GMM lock if that was requested.
1085 *
1086 * @returns Assert status code from RTSemFastMutexRelease.
1087 * @param pMtxState The chunk mutex state.
1088 * @param pChunk Pointer to the chunk if it's still
1089 * alive, NULL if it isn't. This is used to deassociate
1090 * the chunk from the mutex on the way out so a new
1091 * one can be selected next time, thus avoiding contended mutexes.
1092 */
1093static int gmmR0ChunkMutexRelease(PGMMR0CHUNKMTXSTATE pMtxState, PGMMCHUNK pChunk)
1094{
1095 PGMM pGMM = pMtxState->pGMM;
1096
1097 /*
1098 * Release the chunk mutex and reacquire the giant if requested.
1099 */
1100 int rc = RTSemFastMutexRelease(pGMM->aChunkMtx[pMtxState->iChunkMtx].hMtx);
1101 AssertRC(rc);
1102 if (pMtxState->fFlags == GMMR0CHUNK_MTX_RETAKE_GIANT)
1103 rc = gmmR0MutexAcquire(pGMM);
1104 else
1105 Assert((pMtxState->fFlags != GMMR0CHUNK_MTX_DROP_GIANT) == (pGMM->hMtxOwner == RTThreadNativeSelf()));
1106
1107 /*
1108 * Drop the chunk mutex user reference and deassociate it from the chunk
1109 * when possible.
1110 */
1111 if ( ASMAtomicDecU32(&pGMM->aChunkMtx[pMtxState->iChunkMtx].cUsers) == 0
1112 && pChunk
1113 && RT_SUCCESS(rc) )
1114 {
1115 if (pMtxState->fFlags != GMMR0CHUNK_MTX_DROP_GIANT)
1116 pChunk->iChunkMtx = UINT8_MAX;
1117 else
1118 {
1119 rc = gmmR0MutexAcquire(pGMM);
1120 if (RT_SUCCESS(rc))
1121 {
1122 if (pGMM->aChunkMtx[pMtxState->iChunkMtx].cUsers == 0)
1123 pChunk->iChunkMtx = UINT8_MAX;
1124 rc = gmmR0MutexRelease(pGMM);
1125 }
1126 }
1127 }
1128
1129 pMtxState->pGMM = NULL;
1130 return rc;
1131}
1132
1133
1134/**
1135 * Drops the giant GMM lock we kept in gmmR0ChunkMutexAcquire while keeping the
1136 * chunk locked.
1137 *
1138 * This only works if gmmR0ChunkMutexAcquire was called with
1139 * GMMR0CHUNK_MTX_KEEP_GIANT. gmmR0ChunkMutexRelease will retake the giant
1140 * mutex, i.e. behave as if GMMR0CHUNK_MTX_RETAKE_GIANT was used.
1141 *
1142 * @returns VBox status code (assuming success is ok).
1143 * @param pMtxState Pointer to the chunk mutex state.
1144 */
1145static int gmmR0ChunkMutexDropGiant(PGMMR0CHUNKMTXSTATE pMtxState)
1146{
1147 AssertReturn(pMtxState->fFlags == GMMR0CHUNK_MTX_KEEP_GIANT, VERR_GMM_MTX_FLAGS);
1148 Assert(pMtxState->pGMM->hMtxOwner == RTThreadNativeSelf());
1149 pMtxState->fFlags = GMMR0CHUNK_MTX_RETAKE_GIANT;
1150 /** @todo GMM life cycle cleanup (we may race someone
1151 * destroying and cleaning up GMM)? */
1152 return gmmR0MutexRelease(pMtxState->pGMM);
1153}
1154
1155
1156/**
1157 * For experimenting with NUMA affinity and such.
1158 *
1159 * @returns The current NUMA Node ID.
1160 */
1161static uint16_t gmmR0GetCurrentNumaNodeId(void)
1162{
1163#if 1
1164 return GMM_CHUNK_NUMA_ID_UNKNOWN;
1165#else
1166 return RTMpCpuId() / 16;
1167#endif
1168}
1169
1170
1171
1172/**
1173 * Cleans up when a VM is terminating.
1174 *
1175 * @param pGVM Pointer to the Global VM structure.
1176 */
1177GMMR0DECL(void) GMMR0CleanupVM(PGVM pGVM)
1178{
1179 LogFlow(("GMMR0CleanupVM: pGVM=%p:{.pVM=%p, .hSelf=%#x}\n", pGVM, pGVM->pVM, pGVM->hSelf));
1180
1181 PGMM pGMM;
1182 GMM_GET_VALID_INSTANCE_VOID(pGMM);
1183
1184#ifdef VBOX_WITH_PAGE_SHARING
1185 /*
1186 * Clean up all registered shared modules first.
1187 */
1188 gmmR0SharedModuleCleanup(pGMM, pGVM);
1189#endif
1190
1191 gmmR0MutexAcquire(pGMM);
1192 uint64_t uLockNanoTS = RTTimeSystemNanoTS();
1193 GMM_CHECK_SANITY_UPON_ENTERING(pGMM);
1194
1195 /*
1196 * The policy is 'INVALID' until the initial reservation
1197 * request has been serviced.
1198 */
1199 if ( pGVM->gmm.s.Stats.enmPolicy > GMMOCPOLICY_INVALID
1200 && pGVM->gmm.s.Stats.enmPolicy < GMMOCPOLICY_END)
1201 {
1202 /*
1203 * If it's the last VM around, we can skip walking all the chunks looking
1204 * for the pages owned by this VM and instead flush the whole shebang.
1205 *
1206 * This takes care of the eventuality that a VM has left shared page
1207 * references behind (shouldn't happen of course, but you never know).
1208 */
1209 Assert(pGMM->cRegisteredVMs);
1210 pGMM->cRegisteredVMs--;
1211
1212 /*
1213 * Walk the entire pool looking for pages that belong to this VM
1214 * and leftover mappings. (This'll only catch private pages,
1215 * shared pages will be 'left behind'.)
1216 */
1217 /** @todo r=bird: This scanning+freeing could be optimized in bound mode! */
1218 uint64_t cPrivatePages = pGVM->gmm.s.Stats.cPrivatePages; /* save */
1219
1220 unsigned iCountDown = 64;
1221 bool fRedoFromStart;
1222 PGMMCHUNK pChunk;
1223 do
1224 {
1225 fRedoFromStart = false;
1226 RTListForEachReverse(&pGMM->ChunkList, pChunk, GMMCHUNK, ListNode)
1227 {
1228 uint32_t const cFreeChunksOld = pGMM->cFreedChunks;
1229 if ( ( !pGMM->fBoundMemoryMode
1230 || pChunk->hGVM == pGVM->hSelf)
1231 && gmmR0CleanupVMScanChunk(pGMM, pGVM, pChunk))
1232 {
1233 /* We left the giant mutex, so reset the yield counters. */
1234 uLockNanoTS = RTTimeSystemNanoTS();
1235 iCountDown = 64;
1236 }
1237 else
1238 {
1239 /* Didn't leave it, so do normal yielding. */
1240 if (!iCountDown)
1241 gmmR0MutexYield(pGMM, &uLockNanoTS);
1242 else
1243 iCountDown--;
1244 }
1245 if (pGMM->cFreedChunks != cFreeChunksOld)
1246 {
1247 fRedoFromStart = true;
1248 break;
1249 }
1250 }
1251 } while (fRedoFromStart);
1252
1253 if (pGVM->gmm.s.Stats.cPrivatePages)
1254 SUPR0Printf("GMMR0CleanupVM: hGVM=%#x has %#x private pages that cannot be found!\n", pGVM->hSelf, pGVM->gmm.s.Stats.cPrivatePages);
1255
1256 pGMM->cAllocatedPages -= cPrivatePages;
1257
1258 /*
1259 * Free empty chunks.
1260 */
1261 PGMMCHUNKFREESET pPrivateSet = pGMM->fBoundMemoryMode ? &pGVM->gmm.s.Private : &pGMM->PrivateX;
1262 do
1263 {
1264 fRedoFromStart = false;
1265 iCountDown = 10240;
1266 pChunk = pPrivateSet->apLists[GMM_CHUNK_FREE_SET_UNUSED_LIST];
1267 while (pChunk)
1268 {
1269 PGMMCHUNK pNext = pChunk->pFreeNext;
1270 Assert(pChunk->cFree == GMM_CHUNK_NUM_PAGES);
1271 if ( !pGMM->fBoundMemoryMode
1272 || pChunk->hGVM == pGVM->hSelf)
1273 {
1274 uint64_t const idGenerationOld = pPrivateSet->idGeneration;
1275 if (gmmR0FreeChunk(pGMM, pGVM, pChunk, true /*fRelaxedSem*/))
1276 {
1277 /* We've left the giant mutex, restart? (+1 for our unlink) */
1278 fRedoFromStart = pPrivateSet->idGeneration != idGenerationOld + 1;
1279 if (fRedoFromStart)
1280 break;
1281 uLockNanoTS = RTTimeSystemNanoTS();
1282 iCountDown = 10240;
1283 }
1284 }
1285
1286 /* Advance and maybe yield the lock. */
1287 pChunk = pNext;
1288 if (--iCountDown == 0)
1289 {
1290 uint64_t const idGenerationOld = pPrivateSet->idGeneration;
1291 fRedoFromStart = gmmR0MutexYield(pGMM, &uLockNanoTS)
1292 && pPrivateSet->idGeneration != idGenerationOld;
1293 if (fRedoFromStart)
1294 break;
1295 iCountDown = 10240;
1296 }
1297 }
1298 } while (fRedoFromStart);
1299
1300 /*
1301 * Account for shared pages that weren't freed.
1302 */
1303 if (pGVM->gmm.s.Stats.cSharedPages)
1304 {
1305 Assert(pGMM->cSharedPages >= pGVM->gmm.s.Stats.cSharedPages);
1306 SUPR0Printf("GMMR0CleanupVM: hGVM=%#x left %#x shared pages behind!\n", pGVM->hSelf, pGVM->gmm.s.Stats.cSharedPages);
1307 pGMM->cLeftBehindSharedPages += pGVM->gmm.s.Stats.cSharedPages;
1308 }
1309
1310 /*
1311 * Clean up balloon statistics in case the VM process crashed.
1312 */
1313 Assert(pGMM->cBalloonedPages >= pGVM->gmm.s.Stats.cBalloonedPages);
1314 pGMM->cBalloonedPages -= pGVM->gmm.s.Stats.cBalloonedPages;
1315
1316 /*
1317 * Update the over-commitment management statistics.
1318 */
1319 pGMM->cReservedPages -= pGVM->gmm.s.Stats.Reserved.cBasePages
1320 + pGVM->gmm.s.Stats.Reserved.cFixedPages
1321 + pGVM->gmm.s.Stats.Reserved.cShadowPages;
1322 switch (pGVM->gmm.s.Stats.enmPolicy)
1323 {
1324 case GMMOCPOLICY_NO_OC:
1325 break;
1326 default:
1327 /** @todo Update GMM->cOverCommittedPages */
1328 break;
1329 }
1330 }
1331
1332 /* zap the GVM data. */
1333 pGVM->gmm.s.Stats.enmPolicy = GMMOCPOLICY_INVALID;
1334 pGVM->gmm.s.Stats.enmPriority = GMMPRIORITY_INVALID;
1335 pGVM->gmm.s.Stats.fMayAllocate = false;
1336
1337 GMM_CHECK_SANITY_UPON_LEAVING(pGMM);
1338 gmmR0MutexRelease(pGMM);
1339
1340 LogFlow(("GMMR0CleanupVM: returns\n"));
1341}
1342
1343
1344/**
1345 * Scan one chunk for private pages belonging to the specified VM.
1346 *
1347 * @note This function may drop the giant mutex!
1348 *
1349 * @returns @c true if we've temporarily dropped the giant mutex, @c false if
1350 * we didn't.
1351 * @param pGMM Pointer to the GMM instance.
1352 * @param pGVM The global VM handle.
1353 * @param pChunk The chunk to scan.
1354 */
1355static bool gmmR0CleanupVMScanChunk(PGMM pGMM, PGVM pGVM, PGMMCHUNK pChunk)
1356{
1357 Assert(!pGMM->fBoundMemoryMode || pChunk->hGVM == pGVM->hSelf);
1358
1359 /*
1360 * Look for pages belonging to the VM.
1361 * (Perform some internal checks while we're scanning.)
1362 */
1363#ifndef VBOX_STRICT
1364 if (pChunk->cFree != (GMM_CHUNK_SIZE >> PAGE_SHIFT))
1365#endif
1366 {
1367 unsigned cPrivate = 0;
1368 unsigned cShared = 0;
1369 unsigned cFree = 0;
1370
1371 gmmR0UnlinkChunk(pChunk); /* avoiding cFreePages updates. */
1372
1373 uint16_t hGVM = pGVM->hSelf;
1374 unsigned iPage = (GMM_CHUNK_SIZE >> PAGE_SHIFT);
1375 while (iPage-- > 0)
1376 if (GMM_PAGE_IS_PRIVATE(&pChunk->aPages[iPage]))
1377 {
1378 if (pChunk->aPages[iPage].Private.hGVM == hGVM)
1379 {
1380 /*
1381 * Free the page.
1382 *
1383 * The reason for not using gmmR0FreePrivatePage here is that we
1384 * must *not* cause the chunk to be freed from under us - we're in
1385 * an AVL tree walk here.
1386 */
1387 pChunk->aPages[iPage].u = 0;
1388 pChunk->aPages[iPage].Free.iNext = pChunk->iFreeHead;
1389 pChunk->aPages[iPage].Free.u2State = GMM_PAGE_STATE_FREE;
1390 pChunk->iFreeHead = iPage;
1391 pChunk->cPrivate--;
1392 pChunk->cFree++;
1393 pGVM->gmm.s.Stats.cPrivatePages--;
1394 cFree++;
1395 }
1396 else
1397 cPrivate++;
1398 }
1399 else if (GMM_PAGE_IS_FREE(&pChunk->aPages[iPage]))
1400 cFree++;
1401 else
1402 cShared++;
1403
1404 gmmR0SelectSetAndLinkChunk(pGMM, pGVM, pChunk);
1405
1406 /*
1407 * Did it add up?
1408 */
1409 if (RT_UNLIKELY( pChunk->cFree != cFree
1410 || pChunk->cPrivate != cPrivate
1411 || pChunk->cShared != cShared))
1412 {
1413 SUPR0Printf("gmmR0CleanupVMScanChunk: Chunk %p/%#x has bogus stats - free=%d/%d private=%d/%d shared=%d/%d\n",
1414 pChunk, pChunk->Core.Key, pChunk->cFree, cFree, pChunk->cPrivate, cPrivate, pChunk->cShared, cShared);
1415 pChunk->cFree = cFree;
1416 pChunk->cPrivate = cPrivate;
1417 pChunk->cShared = cShared;
1418 }
1419 }
1420
1421 /*
1422 * If not in bound memory mode, we should reset the hGVM field
1423 * if it has our handle in it.
1424 */
1425 if (pChunk->hGVM == pGVM->hSelf)
1426 {
1427 if (!g_pGMM->fBoundMemoryMode)
1428 pChunk->hGVM = NIL_GVM_HANDLE;
1429 else if (pChunk->cFree != GMM_CHUNK_NUM_PAGES)
1430 {
1431 SUPR0Printf("gmmR0CleanupVMScanChunk: %p/%#x: cFree=%#x - it should be 0 in bound mode!\n",
1432 pChunk, pChunk->Core.Key, pChunk->cFree);
1433 AssertMsgFailed(("%p/%#x: cFree=%#x - it should be 0 in bound mode!\n", pChunk, pChunk->Core.Key, pChunk->cFree));
1434
1435 gmmR0UnlinkChunk(pChunk);
1436 pChunk->cFree = GMM_CHUNK_NUM_PAGES;
1437 gmmR0SelectSetAndLinkChunk(pGMM, pGVM, pChunk);
1438 }
1439 }
1440
1441 /*
1442 * Look for a mapping belonging to the terminating VM.
1443 */
1444 GMMR0CHUNKMTXSTATE MtxState;
1445 gmmR0ChunkMutexAcquire(&MtxState, pGMM, pChunk, GMMR0CHUNK_MTX_KEEP_GIANT);
1446 unsigned cMappings = pChunk->cMappingsX;
1447 for (unsigned i = 0; i < cMappings; i++)
1448 if (pChunk->paMappingsX[i].pGVM == pGVM)
1449 {
1450 gmmR0ChunkMutexDropGiant(&MtxState);
1451
1452 RTR0MEMOBJ hMemObj = pChunk->paMappingsX[i].hMapObj;
1453
1454 cMappings--;
1455 if (i < cMappings)
1456 pChunk->paMappingsX[i] = pChunk->paMappingsX[cMappings];
1457 pChunk->paMappingsX[cMappings].pGVM = NULL;
1458 pChunk->paMappingsX[cMappings].hMapObj = NIL_RTR0MEMOBJ;
1459 Assert(pChunk->cMappingsX - 1U == cMappings);
1460 pChunk->cMappingsX = cMappings;
1461
1462 int rc = RTR0MemObjFree(hMemObj, false /* fFreeMappings (NA) */);
1463 if (RT_FAILURE(rc))
1464 {
1465 SUPR0Printf("gmmR0CleanupVMScanChunk: %p/%#x: mapping #%x: RTRMemObjFree(%p,false) -> %d \n",
1466 pChunk, pChunk->Core.Key, i, hMemObj, rc);
1467 AssertRC(rc);
1468 }
1469
1470 gmmR0ChunkMutexRelease(&MtxState, pChunk);
1471 return true;
1472 }
1473
1474 gmmR0ChunkMutexRelease(&MtxState, pChunk);
1475 return false;
1476}
1477
1478
1479/**
1480 * The initial resource reservations.
1481 *
1482 * This will make memory reservations according to policy and priority. If there aren't
1483 * sufficient resources available to sustain the VM this function will fail and all
1484 * future allocation requests will fail as well.
1485 *
1486 * These are just the initial reservations made very very early during the VM creation
1487 * process and will be adjusted later in the GMMR0UpdateReservation call after the
1488 * ring-3 init has completed.
1489 *
1490 * @returns VBox status code.
1491 * @retval VERR_GMM_MEMORY_RESERVATION_DECLINED
1492 * @retval VERR_GMM_
1493 *
1494 * @param pVM Pointer to the VM.
1495 * @param idCpu The VCPU id.
1496 * @param cBasePages The number of pages that may be allocated for the base RAM and ROMs.
1497 * This does not include MMIO2 and similar.
1498 * @param cShadowPages The number of pages that may be allocated for shadow paging structures.
1499 * @param cFixedPages The number of pages that may be allocated for fixed objects like the
1500 * hyper heap, MMIO2 and similar.
1501 * @param enmPolicy The OC policy to use on this VM.
1502 * @param enmPriority The priority in an out-of-memory situation.
1503 *
1504 * @thread The creator thread / EMT.
1505 */
1506GMMR0DECL(int) GMMR0InitialReservation(PVM pVM, VMCPUID idCpu, uint64_t cBasePages, uint32_t cShadowPages, uint32_t cFixedPages,
1507 GMMOCPOLICY enmPolicy, GMMPRIORITY enmPriority)
1508{
1509 LogFlow(("GMMR0InitialReservation: pVM=%p cBasePages=%#llx cShadowPages=%#x cFixedPages=%#x enmPolicy=%d enmPriority=%d\n",
1510 pVM, cBasePages, cShadowPages, cFixedPages, enmPolicy, enmPriority));
1511
1512 /*
1513 * Validate, get basics and take the semaphore.
1514 */
1515 PGMM pGMM;
1516 GMM_GET_VALID_INSTANCE(pGMM, VERR_GMM_INSTANCE);
1517 PGVM pGVM;
1518 int rc = GVMMR0ByVMAndEMT(pVM, idCpu, &pGVM);
1519 if (RT_FAILURE(rc))
1520 return rc;
1521
1522 AssertReturn(cBasePages, VERR_INVALID_PARAMETER);
1523 AssertReturn(cShadowPages, VERR_INVALID_PARAMETER);
1524 AssertReturn(cFixedPages, VERR_INVALID_PARAMETER);
1525 AssertReturn(enmPolicy > GMMOCPOLICY_INVALID && enmPolicy < GMMOCPOLICY_END, VERR_INVALID_PARAMETER);
1526 AssertReturn(enmPriority > GMMPRIORITY_INVALID && enmPriority < GMMPRIORITY_END, VERR_INVALID_PARAMETER);
1527
1528 gmmR0MutexAcquire(pGMM);
1529 if (GMM_CHECK_SANITY_UPON_ENTERING(pGMM))
1530 {
1531 if ( !pGVM->gmm.s.Stats.Reserved.cBasePages
1532 && !pGVM->gmm.s.Stats.Reserved.cFixedPages
1533 && !pGVM->gmm.s.Stats.Reserved.cShadowPages)
1534 {
1535 /*
1536 * Check if we can accommodate this.
1537 */
1538 /* ... later ... */
1539 if (RT_SUCCESS(rc))
1540 {
1541 /*
1542 * Update the records.
1543 */
1544 pGVM->gmm.s.Stats.Reserved.cBasePages = cBasePages;
1545 pGVM->gmm.s.Stats.Reserved.cFixedPages = cFixedPages;
1546 pGVM->gmm.s.Stats.Reserved.cShadowPages = cShadowPages;
1547 pGVM->gmm.s.Stats.enmPolicy = enmPolicy;
1548 pGVM->gmm.s.Stats.enmPriority = enmPriority;
1549 pGVM->gmm.s.Stats.fMayAllocate = true;
1550
1551 pGMM->cReservedPages += cBasePages + cFixedPages + cShadowPages;
1552 pGMM->cRegisteredVMs++;
1553 }
1554 }
1555 else
1556 rc = VERR_WRONG_ORDER;
1557 GMM_CHECK_SANITY_UPON_LEAVING(pGMM);
1558 }
1559 else
1560 rc = VERR_GMM_IS_NOT_SANE;
1561 gmmR0MutexRelease(pGMM);
1562 LogFlow(("GMMR0InitialReservation: returns %Rrc\n", rc));
1563 return rc;
1564}
1565
1566
1567/**
1568 * VMMR0 request wrapper for GMMR0InitialReservation.
1569 *
1570 * @returns see GMMR0InitialReservation.
1571 * @param pVM Pointer to the VM.
1572 * @param idCpu The VCPU id.
1573 * @param pReq Pointer to the request packet.
1574 */
1575GMMR0DECL(int) GMMR0InitialReservationReq(PVM pVM, VMCPUID idCpu, PGMMINITIALRESERVATIONREQ pReq)
1576{
1577 /*
1578 * Validate input and pass it on.
1579 */
1580 AssertPtrReturn(pVM, VERR_INVALID_POINTER);
1581 AssertPtrReturn(pReq, VERR_INVALID_POINTER);
1582 AssertMsgReturn(pReq->Hdr.cbReq == sizeof(*pReq), ("%#x != %#x\n", pReq->Hdr.cbReq, sizeof(*pReq)), VERR_INVALID_PARAMETER);
1583
1584 return GMMR0InitialReservation(pVM, idCpu, pReq->cBasePages, pReq->cShadowPages, pReq->cFixedPages, pReq->enmPolicy, pReq->enmPriority);
1585}
1586
1587
1588/**
1589 * This updates the memory reservation with the additional MMIO2 and ROM pages.
1590 *
1591 * @returns VBox status code.
1592 * @retval VERR_GMM_MEMORY_RESERVATION_DECLINED
1593 *
1594 * @param pVM Pointer to the VM.
1595 * @param idCpu The VCPU id.
1596 * @param cBasePages The number of pages that may be allocated for the base RAM and ROMs.
1597 * This does not include MMIO2 and similar.
1598 * @param cShadowPages The number of pages that may be allocated for shadow paging structures.
1599 * @param cFixedPages The number of pages that may be allocated for fixed objects like the
1600 * hyper heap, MMIO2 and similar.
1601 *
1602 * @thread EMT.
1603 */
1604GMMR0DECL(int) GMMR0UpdateReservation(PVM pVM, VMCPUID idCpu, uint64_t cBasePages, uint32_t cShadowPages, uint32_t cFixedPages)
1605{
1606 LogFlow(("GMMR0UpdateReservation: pVM=%p cBasePages=%#llx cShadowPages=%#x cFixedPages=%#x\n",
1607 pVM, cBasePages, cShadowPages, cFixedPages));
1608
1609 /*
1610 * Validate, get basics and take the semaphore.
1611 */
1612 PGMM pGMM;
1613 GMM_GET_VALID_INSTANCE(pGMM, VERR_GMM_INSTANCE);
1614 PGVM pGVM;
1615 int rc = GVMMR0ByVMAndEMT(pVM, idCpu, &pGVM);
1616 if (RT_FAILURE(rc))
1617 return rc;
1618
1619 AssertReturn(cBasePages, VERR_INVALID_PARAMETER);
1620 AssertReturn(cShadowPages, VERR_INVALID_PARAMETER);
1621 AssertReturn(cFixedPages, VERR_INVALID_PARAMETER);
1622
1623 gmmR0MutexAcquire(pGMM);
1624 if (GMM_CHECK_SANITY_UPON_ENTERING(pGMM))
1625 {
1626 if ( pGVM->gmm.s.Stats.Reserved.cBasePages
1627 && pGVM->gmm.s.Stats.Reserved.cFixedPages
1628 && pGVM->gmm.s.Stats.Reserved.cShadowPages)
1629 {
1630 /*
1631 * Check if we can accommodate this.
1632 */
1633 /* ... later ... */
1634 if (RT_SUCCESS(rc))
1635 {
1636 /*
1637 * Update the records.
1638 */
1639 pGMM->cReservedPages -= pGVM->gmm.s.Stats.Reserved.cBasePages
1640 + pGVM->gmm.s.Stats.Reserved.cFixedPages
1641 + pGVM->gmm.s.Stats.Reserved.cShadowPages;
1642 pGMM->cReservedPages += cBasePages + cFixedPages + cShadowPages;
1643
1644 pGVM->gmm.s.Stats.Reserved.cBasePages = cBasePages;
1645 pGVM->gmm.s.Stats.Reserved.cFixedPages = cFixedPages;
1646 pGVM->gmm.s.Stats.Reserved.cShadowPages = cShadowPages;
1647 }
1648 }
1649 else
1650 rc = VERR_WRONG_ORDER;
1651 GMM_CHECK_SANITY_UPON_LEAVING(pGMM);
1652 }
1653 else
1654 rc = VERR_GMM_IS_NOT_SANE;
1655 gmmR0MutexRelease(pGMM);
1656 LogFlow(("GMMR0UpdateReservation: returns %Rrc\n", rc));
1657 return rc;
1658}
1659
1660
1661/**
1662 * VMMR0 request wrapper for GMMR0UpdateReservation.
1663 *
1664 * @returns see GMMR0UpdateReservation.
1665 * @param pVM Pointer to the VM.
1666 * @param idCpu The VCPU id.
1667 * @param pReq Pointer to the request packet.
1668 */
1669GMMR0DECL(int) GMMR0UpdateReservationReq(PVM pVM, VMCPUID idCpu, PGMMUPDATERESERVATIONREQ pReq)
1670{
1671 /*
1672 * Validate input and pass it on.
1673 */
1674 AssertPtrReturn(pVM, VERR_INVALID_POINTER);
1675 AssertPtrReturn(pReq, VERR_INVALID_POINTER);
1676 AssertMsgReturn(pReq->Hdr.cbReq == sizeof(*pReq), ("%#x != %#x\n", pReq->Hdr.cbReq, sizeof(*pReq)), VERR_INVALID_PARAMETER);
1677
1678 return GMMR0UpdateReservation(pVM, idCpu, pReq->cBasePages, pReq->cShadowPages, pReq->cFixedPages);
1679}
1680
1681#ifdef GMMR0_WITH_SANITY_CHECK
1682
1683/**
1684 * Performs sanity checks on a free set.
1685 *
1686 * @returns Error count.
1687 *
1688 * @param pGMM Pointer to the GMM instance.
1689 * @param pSet Pointer to the set.
1690 * @param pszSetName The set name.
1691 * @param pszFunction The function from which it was called.
1692 * @param uLineNo The line number.
1693 */
1694static uint32_t gmmR0SanityCheckSet(PGMM pGMM, PGMMCHUNKFREESET pSet, const char *pszSetName,
1695 const char *pszFunction, unsigned uLineNo)
1696{
1697 uint32_t cErrors = 0;
1698
1699 /*
1700 * Count the free pages in all the chunks and match it against pSet->cFreePages.
1701 */
1702 uint32_t cPages = 0;
1703 for (unsigned i = 0; i < RT_ELEMENTS(pSet->apLists); i++)
1704 {
1705 for (PGMMCHUNK pCur = pSet->apLists[i]; pCur; pCur = pCur->pFreeNext)
1706 {
1707 /** @todo check that the chunk is hashed into the right set. */
1708 cPages += pCur->cFree;
1709 }
1710 }
1711 if (RT_UNLIKELY(cPages != pSet->cFreePages))
1712 {
1713 SUPR0Printf("GMM insanity: found %#x pages in the %s set, expected %#x. (%s, line %u)\n",
1714 cPages, pszSetName, pSet->cFreePages, pszFunction, uLineNo);
1715 cErrors++;
1716 }
1717
1718 return cErrors;
1719}
1720
1721
1722/**
1723 * Performs some sanity checks on the GMM while owning lock.
1724 *
1725 * @returns Error count.
1726 *
1727 * @param pGMM Pointer to the GMM instance.
1728 * @param pszFunction The function from which it is called.
1729 * @param uLineNo The line number.
1730 */
1731static uint32_t gmmR0SanityCheck(PGMM pGMM, const char *pszFunction, unsigned uLineNo)
1732{
1733 uint32_t cErrors = 0;
1734
1735 cErrors += gmmR0SanityCheckSet(pGMM, &pGMM->PrivateX, "private", pszFunction, uLineNo);
1736 cErrors += gmmR0SanityCheckSet(pGMM, &pGMM->Shared, "shared", pszFunction, uLineNo);
1737 /** @todo add more sanity checks. */
1738
1739 return cErrors;
1740}
1741
1742#endif /* GMMR0_WITH_SANITY_CHECK */
1743
1744/**
1745 * Looks up a chunk in the tree and fills in the TLB entry for it.
1746 *
1747 * This is not expected to fail and will bitch if it does.
1748 *
1749 * @returns Pointer to the allocation chunk, NULL if not found.
1750 * @param pGMM Pointer to the GMM instance.
1751 * @param idChunk The ID of the chunk to find.
1752 * @param pTlbe Pointer to the TLB entry.
1753 */
1754static PGMMCHUNK gmmR0GetChunkSlow(PGMM pGMM, uint32_t idChunk, PGMMCHUNKTLBE pTlbe)
1755{
1756 PGMMCHUNK pChunk = (PGMMCHUNK)RTAvlU32Get(&pGMM->pChunks, idChunk);
1757 AssertMsgReturn(pChunk, ("Chunk %#x not found!\n", idChunk), NULL);
1758 pTlbe->idChunk = idChunk;
1759 pTlbe->pChunk = pChunk;
1760 return pChunk;
1761}
1762
1763
1764/**
1765 * Finds an allocation chunk.
1766 *
1767 * This is not expected to fail and will bitch if it does.
1768 *
1769 * @returns Pointer to the allocation chunk, NULL if not found.
1770 * @param pGMM Pointer to the GMM instance.
1771 * @param idChunk The ID of the chunk to find.
1772 */
1773DECLINLINE(PGMMCHUNK) gmmR0GetChunk(PGMM pGMM, uint32_t idChunk)
1774{
1775 /*
1776 * Do a TLB lookup, branch if not in the TLB.
1777 */
1778 PGMMCHUNKTLBE pTlbe = &pGMM->ChunkTLB.aEntries[GMM_CHUNKTLB_IDX(idChunk)];
1779 if ( pTlbe->idChunk != idChunk
1780 || !pTlbe->pChunk)
1781 return gmmR0GetChunkSlow(pGMM, idChunk, pTlbe);
1782 return pTlbe->pChunk;
1783}
1784
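/*
 * Note (descriptive comment only): the chunk TLB consulted above is a small cache in
 * front of the AVL tree.  On a hit the chunk pointer comes straight from the TLB entry;
 * on a miss or a stale entry gmmR0GetChunkSlow falls back to RTAvlU32Get and refreshes
 * the entry, so repeated lookups of the same chunk stay cheap.
 */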
1785
1786/**
1787 * Finds a page.
1788 *
1789 * This is not expected to fail and will bitch if it does.
1790 *
1791 * @returns Pointer to the page, NULL if not found.
1792 * @param pGMM Pointer to the GMM instance.
1793 * @param idPage The ID of the page to find.
1794 */
1795DECLINLINE(PGMMPAGE) gmmR0GetPage(PGMM pGMM, uint32_t idPage)
1796{
1797 PGMMCHUNK pChunk = gmmR0GetChunk(pGMM, idPage >> GMM_CHUNKID_SHIFT);
1798 if (RT_LIKELY(pChunk))
1799 return &pChunk->aPages[idPage & GMM_PAGEID_IDX_MASK];
1800 return NULL;
1801}
1802
1803
1804/**
1805 * Gets the host physical address for a page given by its ID.
1806 *
1807 * @returns The host physical address or NIL_RTHCPHYS.
1808 * @param pGMM Pointer to the GMM instance.
1809 * @param idPage The ID of the page to find.
1810 */
1811DECLINLINE(RTHCPHYS) gmmR0GetPageHCPhys(PGMM pGMM, uint32_t idPage)
1812{
1813 PGMMCHUNK pChunk = gmmR0GetChunk(pGMM, idPage >> GMM_CHUNKID_SHIFT);
1814 if (RT_LIKELY(pChunk))
1815 return RTR0MemObjGetPagePhysAddr(pChunk->hMemObj, idPage & GMM_PAGEID_IDX_MASK);
1816 return NIL_RTHCPHYS;
1817}
1818
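/*
 * Illustrative sketch (comment only, not part of the build): how a page ID splits into
 * the chunk ID used for the tree/TLB lookup and the index into pChunk->aPages, as done
 * by gmmR0GetPage and gmmR0GetPageHCPhys above.  The concrete numbers assume 2MB chunks
 * of 4KB pages, i.e. 512 pages per chunk and a 9-bit shift, and are illustration only.
 *
 * @code
 *      uint32_t const idPage  = 0x12345;
 *      uint32_t const idChunk = idPage >> GMM_CHUNKID_SHIFT;   // 0x12345 >> 9    = 0x91
 *      uint32_t const iPage   = idPage & GMM_PAGEID_IDX_MASK;  // 0x12345 & 0x1ff = 0x145
 * @endcode
 */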
1819
1820/**
1821 * Selects the appropriate free list given the number of free pages.
1822 *
1823 * @returns Free list index.
1824 * @param cFree The number of free pages in the chunk.
1825 */
1826DECLINLINE(unsigned) gmmR0SelectFreeSetList(unsigned cFree)
1827{
1828 unsigned iList = cFree >> GMM_CHUNK_FREE_SET_SHIFT;
1829 AssertMsg(iList < RT_SIZEOFMEMB(GMMCHUNKFREESET, apLists) / RT_SIZEOFMEMB(GMMCHUNKFREESET, apLists[0]),
1830 ("%d (%u)\n", iList, cFree));
1831 return iList;
1832}
1833
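/*
 * Worked example (comment only) for the list selection above: a chunk lands in bucket
 * cFree >> GMM_CHUNK_FREE_SET_SHIFT, so chunks with a similar number of free pages share
 * a list and completely unused chunks end up in the last list (see
 * GMM_CHUNK_FREE_SET_UNUSED_LIST used further down).  The shift value of 4 is purely
 * illustrative.
 *
 * @code
 *      // assuming GMM_CHUNK_FREE_SET_SHIFT == 4:
 *      //   cFree =   7  ->  iList = 0
 *      //   cFree =  37  ->  iList = 2
 *      //   cFree = 511  ->  iList = 31
 * @endcode
 */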
1834
1835/**
1836 * Unlinks the chunk from the free list it's currently on (if any).
1837 *
1838 * @param pChunk The allocation chunk.
1839 */
1840DECLINLINE(void) gmmR0UnlinkChunk(PGMMCHUNK pChunk)
1841{
1842 PGMMCHUNKFREESET pSet = pChunk->pSet;
1843 if (RT_LIKELY(pSet))
1844 {
1845 pSet->cFreePages -= pChunk->cFree;
1846 pSet->idGeneration++;
1847
1848 PGMMCHUNK pPrev = pChunk->pFreePrev;
1849 PGMMCHUNK pNext = pChunk->pFreeNext;
1850 if (pPrev)
1851 pPrev->pFreeNext = pNext;
1852 else
1853 pSet->apLists[gmmR0SelectFreeSetList(pChunk->cFree)] = pNext;
1854 if (pNext)
1855 pNext->pFreePrev = pPrev;
1856
1857 pChunk->pSet = NULL;
1858 pChunk->pFreeNext = NULL;
1859 pChunk->pFreePrev = NULL;
1860 }
1861 else
1862 {
1863 Assert(!pChunk->pFreeNext);
1864 Assert(!pChunk->pFreePrev);
1865 Assert(!pChunk->cFree);
1866 }
1867}
1868
1869
1870/**
1871 * Links the chunk onto the appropriate free list in the specified free set.
1872 *
1873 * If no free entries, it's not linked into any list.
1874 *
1875 * @param pChunk The allocation chunk.
1876 * @param pSet The free set.
1877 */
1878DECLINLINE(void) gmmR0LinkChunk(PGMMCHUNK pChunk, PGMMCHUNKFREESET pSet)
1879{
1880 Assert(!pChunk->pSet);
1881 Assert(!pChunk->pFreeNext);
1882 Assert(!pChunk->pFreePrev);
1883
1884 if (pChunk->cFree > 0)
1885 {
1886 pChunk->pSet = pSet;
1887 pChunk->pFreePrev = NULL;
1888 unsigned const iList = gmmR0SelectFreeSetList(pChunk->cFree);
1889 pChunk->pFreeNext = pSet->apLists[iList];
1890 if (pChunk->pFreeNext)
1891 pChunk->pFreeNext->pFreePrev = pChunk;
1892 pSet->apLists[iList] = pChunk;
1893
1894 pSet->cFreePages += pChunk->cFree;
1895 pSet->idGeneration++;
1896 }
1897}
1898
1899
1900/**
1901 * Selects the appropriate free set for the chunk and links it onto a free list there:
1902 * the per-VM private set in bound mode, otherwise the global shared or private set
1903 * depending on whether the chunk holds shared pages.  If the chunk has no free
1904 * entries, it's not linked into any list.
1905 * @param pChunk The allocation chunk.
1906 */
1907DECLINLINE(void) gmmR0SelectSetAndLinkChunk(PGMM pGMM, PGVM pGVM, PGMMCHUNK pChunk)
1908{
1909 PGMMCHUNKFREESET pSet;
1910 if (pGMM->fBoundMemoryMode)
1911 pSet = &pGVM->gmm.s.Private;
1912 else if (pChunk->cShared)
1913 pSet = &pGMM->Shared;
1914 else
1915 pSet = &pGMM->PrivateX;
1916 gmmR0LinkChunk(pChunk, pSet);
1917}
1918
1919
1920/**
1921 * Frees a Chunk ID.
1922 *
1923 * @param pGMM Pointer to the GMM instance.
1924 * @param idChunk The Chunk ID to free.
1925 */
1926static void gmmR0FreeChunkId(PGMM pGMM, uint32_t idChunk)
1927{
1928 AssertReturnVoid(idChunk != NIL_GMM_CHUNKID);
1929 AssertMsg(ASMBitTest(&pGMM->bmChunkId[0], idChunk), ("%#x\n", idChunk));
1930 ASMAtomicBitClear(&pGMM->bmChunkId[0], idChunk);
1931}
1932
1933
1934/**
1935 * Allocates a new Chunk ID.
1936 *
1937 * @returns The Chunk ID.
1938 * @param pGMM Pointer to the GMM instance.
1939 */
1940static uint32_t gmmR0AllocateChunkId(PGMM pGMM)
1941{
1942 AssertCompile(!((GMM_CHUNKID_LAST + 1) & 31)); /* must be a multiple of 32 */
1943 AssertCompile(NIL_GMM_CHUNKID == 0);
1944
1945 /*
1946 * Try the next sequential one.
1947 */
1948 int32_t idChunk = ++pGMM->idChunkPrev;
1949#if 0 /** @todo enable this code */
1950 if ( idChunk <= GMM_CHUNKID_LAST
1951 && idChunk > NIL_GMM_CHUNKID
1952 && !ASMAtomicBitTestAndSet(&pGMM->bmChunkId[0], idChunk))
1953 return idChunk;
1954#endif
1955
1956 /*
1957 * Scan sequentially from the last one.
1958 */
1959 if ( (uint32_t)idChunk < GMM_CHUNKID_LAST
1960 && idChunk > NIL_GMM_CHUNKID)
1961 {
1962 idChunk = ASMBitNextClear(&pGMM->bmChunkId[0], GMM_CHUNKID_LAST + 1, idChunk);
1963 if (idChunk > NIL_GMM_CHUNKID)
1964 {
1965 AssertMsgReturn(!ASMAtomicBitTestAndSet(&pGMM->bmChunkId[0], idChunk), ("%#x\n", idChunk), NIL_GMM_CHUNKID);
1966 return pGMM->idChunkPrev = idChunk;
1967 }
1968 }
1969
1970 /*
1971 * Ok, scan from the start.
1972 * We're not racing anyone, so there is no need to expect failures or have restart loops.
1973 */
1974 idChunk = ASMBitFirstClear(&pGMM->bmChunkId[0], GMM_CHUNKID_LAST + 1);
1975 AssertMsgReturn(idChunk > NIL_GMM_CHUNKID, ("%#x\n", idChunk), NIL_GMM_CHUNKID);
1976 AssertMsgReturn(!ASMAtomicBitTestAndSet(&pGMM->bmChunkId[0], idChunk), ("%#x\n", idChunk), NIL_GMM_CHUNKID);
1977
1978 return pGMM->idChunkPrev = idChunk;
1979}
1980
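/*
 * Minimal standalone model (comment only, not part of the build) of the bitmap based ID
 * allocation above: find the first clear bit, set it and hand out its index.  This is a
 * simplified, non-atomic stand-in for what ASMBitFirstClear / ASMBitNextClear and
 * ASMAtomicBitTestAndSet accomplish here; the names and sizes below are hypothetical.
 *
 * @code
 *      #include <stdint.h>
 *
 *      #define MY_ID_LAST 1023                          // hypothetical; (MY_ID_LAST + 1) is a multiple of 32
 *      static uint32_t g_bmIds[(MY_ID_LAST + 1) / 32];  // one bit per ID, clear = free, ID 0 = NIL
 *
 *      static int32_t myAllocId(void)
 *      {
 *          for (uint32_t i = 1; i <= MY_ID_LAST; i++)
 *              if (!(g_bmIds[i / 32] & (UINT32_C(1) << (i % 32))))
 *              {
 *                  g_bmIds[i / 32] |= UINT32_C(1) << (i % 32);
 *                  return (int32_t)i;
 *              }
 *          return -1;                                   // out of IDs
 *      }
 * @endcode
 */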
1981
1982/**
1983 * Allocates one private page.
1984 *
1985 * Worker for gmmR0AllocatePages.
1986 *
1987 * @param pChunk The chunk to allocate it from.
1988 * @param hGVM The GVM handle of the VM requesting memory.
1989 * @param pPageDesc The page descriptor.
1990 */
1991static void gmmR0AllocatePage(PGMMCHUNK pChunk, uint32_t hGVM, PGMMPAGEDESC pPageDesc)
1992{
1993 /* update the chunk stats. */
1994 if (pChunk->hGVM == NIL_GVM_HANDLE)
1995 pChunk->hGVM = hGVM;
1996 Assert(pChunk->cFree);
1997 pChunk->cFree--;
1998 pChunk->cPrivate++;
1999
2000 /* unlink the first free page. */
2001 const uint32_t iPage = pChunk->iFreeHead;
2002 AssertReleaseMsg(iPage < RT_ELEMENTS(pChunk->aPages), ("%d\n", iPage));
2003 PGMMPAGE pPage = &pChunk->aPages[iPage];
2004 Assert(GMM_PAGE_IS_FREE(pPage));
2005 pChunk->iFreeHead = pPage->Free.iNext;
2006 Log3(("A pPage=%p iPage=%#x/%#x u2State=%d iFreeHead=%#x iNext=%#x\n",
2007 pPage, iPage, (pChunk->Core.Key << GMM_CHUNKID_SHIFT) | iPage,
2008 pPage->Common.u2State, pChunk->iFreeHead, pPage->Free.iNext));
2009
2010 /* make the page private. */
2011 pPage->u = 0;
2012 AssertCompile(GMM_PAGE_STATE_PRIVATE == 0);
2013 pPage->Private.hGVM = hGVM;
2014 AssertCompile(NIL_RTHCPHYS >= GMM_GCPHYS_LAST);
2015 AssertCompile(GMM_GCPHYS_UNSHAREABLE >= GMM_GCPHYS_LAST);
2016 if (pPageDesc->HCPhysGCPhys <= GMM_GCPHYS_LAST)
2017 pPage->Private.pfn = pPageDesc->HCPhysGCPhys >> PAGE_SHIFT;
2018 else
2019 pPage->Private.pfn = GMM_PAGE_PFN_UNSHAREABLE; /* unshareable / unassigned - same thing. */
2020
2021 /* update the page descriptor. */
2022 pPageDesc->HCPhysGCPhys = RTR0MemObjGetPagePhysAddr(pChunk->hMemObj, iPage);
2023 Assert(pPageDesc->HCPhysGCPhys != NIL_RTHCPHYS);
2024 pPageDesc->idPage = (pChunk->Core.Key << GMM_CHUNKID_SHIFT) | iPage;
2025 pPageDesc->idSharedPage = NIL_GMM_PAGEID;
2026}
2027
2028
2029/**
2030 * Picks the free pages from a chunk.
2031 *
2032 * @returns The new page descriptor table index.
2033 * @param pChunk The chunk to pick pages from.
2034 * @param hGVM The global VM handle of the VM requesting memory.
2035 * @param iPage The current page descriptor table index.
2036 * @param cPages The total number of pages to allocate.
2037 * @param paPages The page descriptor table (input + output).
2038 * See GMMPAGEDESC for details on what is expected on input.
2039 */
2040static uint32_t gmmR0AllocatePagesFromChunk(PGMMCHUNK pChunk, uint16_t const hGVM, uint32_t iPage, uint32_t cPages,
2041 PGMMPAGEDESC paPages)
2042{
2043 PGMMCHUNKFREESET pSet = pChunk->pSet; Assert(pSet);
2044 gmmR0UnlinkChunk(pChunk);
2045
2046 for (; pChunk->cFree && iPage < cPages; iPage++)
2047 gmmR0AllocatePage(pChunk, hGVM, &paPages[iPage]);
2048
2049 gmmR0LinkChunk(pChunk, pSet);
2050 return iPage;
2051}
2052
2053
2054/**
2055 * Registers a new chunk of memory.
2056 *
2057 * This is called by both gmmR0AllocateOneChunk and GMMR0SeedChunk.
2058 *
2059 * @returns VBox status code. On success, the giant GMM lock will be held, the
2060 * caller must release it (ugly).
2061 * @param pGMM Pointer to the GMM instance.
2062 * @param pSet Pointer to the set.
2063 * @param MemObj The memory object for the chunk.
2064 * @param hGVM The affinity of the chunk. NIL_GVM_HANDLE for no
2065 * affinity.
2066 * @param fChunkFlags The chunk flags, GMM_CHUNK_FLAGS_XXX.
2067 * @param ppChunk Chunk address (out). Optional.
2068 *
2069 * @remarks The caller must not own the giant GMM mutex.
2070 * The giant GMM mutex will be acquired and returned acquired in
2071 * the success path. On failure, no locks will be held.
2072 */
2073static int gmmR0RegisterChunk(PGMM pGMM, PGMMCHUNKFREESET pSet, RTR0MEMOBJ MemObj, uint16_t hGVM, uint16_t fChunkFlags,
2074 PGMMCHUNK *ppChunk)
2075{
2076 Assert(pGMM->hMtxOwner != RTThreadNativeSelf());
2077 Assert(hGVM != NIL_GVM_HANDLE || pGMM->fBoundMemoryMode);
2078 Assert(fChunkFlags == 0 || fChunkFlags == GMM_CHUNK_FLAGS_LARGE_PAGE);
2079
2080 int rc;
2081 PGMMCHUNK pChunk = (PGMMCHUNK)RTMemAllocZ(sizeof(*pChunk));
2082 if (pChunk)
2083 {
2084 /*
2085 * Initialize it.
2086 */
2087 pChunk->hMemObj = MemObj;
2088 pChunk->cFree = GMM_CHUNK_NUM_PAGES;
2089 pChunk->hGVM = hGVM;
2090 /*pChunk->iFreeHead = 0;*/
2091 pChunk->idNumaNode = gmmR0GetCurrentNumaNodeId();
2092 pChunk->iChunkMtx = UINT8_MAX;
2093 pChunk->fFlags = fChunkFlags;
2094 for (unsigned iPage = 0; iPage < RT_ELEMENTS(pChunk->aPages) - 1; iPage++)
2095 {
2096 pChunk->aPages[iPage].Free.u2State = GMM_PAGE_STATE_FREE;
2097 pChunk->aPages[iPage].Free.iNext = iPage + 1;
2098 }
2099 pChunk->aPages[RT_ELEMENTS(pChunk->aPages) - 1].Free.u2State = GMM_PAGE_STATE_FREE;
2100 pChunk->aPages[RT_ELEMENTS(pChunk->aPages) - 1].Free.iNext = UINT16_MAX;
2101
2102 /*
2103 * Allocate a Chunk ID and insert it into the tree.
2104 * This has to be done behind the mutex of course.
2105 */
2106 rc = gmmR0MutexAcquire(pGMM);
2107 if (RT_SUCCESS(rc))
2108 {
2109 if (GMM_CHECK_SANITY_UPON_ENTERING(pGMM))
2110 {
2111 pChunk->Core.Key = gmmR0AllocateChunkId(pGMM);
2112 if ( pChunk->Core.Key != NIL_GMM_CHUNKID
2113 && pChunk->Core.Key <= GMM_CHUNKID_LAST
2114 && RTAvlU32Insert(&pGMM->pChunks, &pChunk->Core))
2115 {
2116 pGMM->cChunks++;
2117 RTListAppend(&pGMM->ChunkList, &pChunk->ListNode);
2118 gmmR0LinkChunk(pChunk, pSet);
2119 LogFlow(("gmmR0RegisterChunk: pChunk=%p id=%#x cChunks=%d\n", pChunk, pChunk->Core.Key, pGMM->cChunks));
2120
2121 if (ppChunk)
2122 *ppChunk = pChunk;
2123 GMM_CHECK_SANITY_UPON_LEAVING(pGMM);
2124 return VINF_SUCCESS;
2125 }
2126
2127 /* bail out */
2128 rc = VERR_GMM_CHUNK_INSERT;
2129 }
2130 else
2131 rc = VERR_GMM_IS_NOT_SANE;
2132 gmmR0MutexRelease(pGMM);
2133 }
2134
2135 RTMemFree(pChunk);
2136 }
2137 else
2138 rc = VERR_NO_MEMORY;
2139 return rc;
2140}
2141
2142
2143/**
2144 * Allocates a new chunk, immediately picks the requested pages from it, and
2145 * adds what's remaining to the specified free set.
2146 *
2147 * @note This will leave the giant mutex while allocating the new chunk!
2148 *
2149 * @returns VBox status code.
2150 * @param pGMM Pointer to the GMM instance data.
2151 * @param pGVM Pointer to the kernel-only VM instance data.
2152 * @param pSet Pointer to the free set.
2153 * @param cPages The number of pages requested.
2154 * @param paPages The page descriptor table (input + output).
2155 * @param piPage The pointer to the page descriptor table index
2156 * variable. This will be updated.
2157 */
2158static int gmmR0AllocateChunkNew(PGMM pGMM, PGVM pGVM, PGMMCHUNKFREESET pSet, uint32_t cPages,
2159 PGMMPAGEDESC paPages, uint32_t *piPage)
2160{
2161 gmmR0MutexRelease(pGMM);
2162
2163 RTR0MEMOBJ hMemObj;
2164 int rc = RTR0MemObjAllocPhysNC(&hMemObj, GMM_CHUNK_SIZE, NIL_RTHCPHYS);
2165 if (RT_SUCCESS(rc))
2166 {
2167/** @todo Duplicate gmmR0RegisterChunk here so we can avoid chaining up the
2168 * free pages first and then unchaining them right afterwards. Instead
2169 * do as much work as possible without holding the giant lock. */
2170 PGMMCHUNK pChunk;
2171 rc = gmmR0RegisterChunk(pGMM, pSet, hMemObj, pGVM->hSelf, 0 /*fChunkFlags*/, &pChunk);
2172 if (RT_SUCCESS(rc))
2173 {
2174 *piPage = gmmR0AllocatePagesFromChunk(pChunk, pGVM->hSelf, *piPage, cPages, paPages);
2175 return VINF_SUCCESS;
2176 }
2177
2178 /* bail out */
2179 RTR0MemObjFree(hMemObj, false /* fFreeMappings */);
2180 }
2181
2182 int rc2 = gmmR0MutexAcquire(pGMM);
2183 AssertRCReturn(rc2, RT_FAILURE(rc) ? rc : rc2);
2184 return rc;
2185
2186}
2187
2188
2189/**
2190 * As a last resort we'll pick any page we can get.
2191 *
2192 * @returns The new page descriptor table index.
2193 * @param pSet The set to pick from.
2194 * @param pGVM Pointer to the global VM structure.
2195 * @param iPage The current page descriptor table index.
2196 * @param cPages The total number of pages to allocate.
2197 * @param paPages The page descriptor table (input + output).
2198 */
2199static uint32_t gmmR0AllocatePagesIndiscriminately(PGMMCHUNKFREESET pSet, PGVM pGVM,
2200 uint32_t iPage, uint32_t cPages, PGMMPAGEDESC paPages)
2201{
2202 unsigned iList = RT_ELEMENTS(pSet->apLists);
2203 while (iList-- > 0)
2204 {
2205 PGMMCHUNK pChunk = pSet->apLists[iList];
2206 while (pChunk)
2207 {
2208 PGMMCHUNK pNext = pChunk->pFreeNext;
2209
2210 iPage = gmmR0AllocatePagesFromChunk(pChunk, pGVM->hSelf, iPage, cPages, paPages);
2211 if (iPage >= cPages)
2212 return iPage;
2213
2214 pChunk = pNext;
2215 }
2216 }
2217 return iPage;
2218}
2219
2220
2221/**
2222 * Pick pages from empty chunks on the same NUMA node.
2223 *
2224 * @returns The new page descriptor table index.
2225 * @param pSet The set to pick from.
2226 * @param pGVM Pointer to the global VM structure.
2227 * @param iPage The current page descriptor table index.
2228 * @param cPages The total number of pages to allocate.
2229 * @param paPages The page descriptor table (input + output).
2230 */
2231static uint32_t gmmR0AllocatePagesFromEmptyChunksOnSameNode(PGMMCHUNKFREESET pSet, PGVM pGVM,
2232 uint32_t iPage, uint32_t cPages, PGMMPAGEDESC paPages)
2233{
2234 PGMMCHUNK pChunk = pSet->apLists[GMM_CHUNK_FREE_SET_UNUSED_LIST];
2235 if (pChunk)
2236 {
2237 uint16_t const idNumaNode = gmmR0GetCurrentNumaNodeId();
2238 while (pChunk)
2239 {
2240 PGMMCHUNK pNext = pChunk->pFreeNext;
2241
2242 if (pChunk->idNumaNode == idNumaNode)
2243 {
2244 pChunk->hGVM = pGVM->hSelf;
2245 iPage = gmmR0AllocatePagesFromChunk(pChunk, pGVM->hSelf, iPage, cPages, paPages);
2246 if (iPage >= cPages)
2247 {
2248 pGVM->gmm.s.idLastChunkHint = pChunk->cFree ? pChunk->Core.Key : NIL_GMM_CHUNKID;
2249 return iPage;
2250 }
2251 }
2252
2253 pChunk = pNext;
2254 }
2255 }
2256 return iPage;
2257}
2258
2259
2260/**
2261 * Pick pages from non-empty chunks on the same NUMA node.
2262 *
2263 * @returns The new page descriptor table index.
2264 * @param pSet The set to pick from.
2265 * @param pGVM Pointer to the global VM structure.
2266 * @param iPage The current page descriptor table index.
2267 * @param cPages The total number of pages to allocate.
2268 * @param paPages The page descriptor table (input + output).
2269 */
2270static uint32_t gmmR0AllocatePagesFromSameNode(PGMMCHUNKFREESET pSet, PGVM pGVM,
2271 uint32_t iPage, uint32_t cPages, PGMMPAGEDESC paPages)
2272{
2273 /** @todo start by picking from chunks with about the right size first? */
2274 uint16_t const idNumaNode = gmmR0GetCurrentNumaNodeId();
2275 unsigned iList = GMM_CHUNK_FREE_SET_UNUSED_LIST;
2276 while (iList-- > 0)
2277 {
2278 PGMMCHUNK pChunk = pSet->apLists[iList];
2279 while (pChunk)
2280 {
2281 PGMMCHUNK pNext = pChunk->pFreeNext;
2282
2283 if (pChunk->idNumaNode == idNumaNode)
2284 {
2285 iPage = gmmR0AllocatePagesFromChunk(pChunk, pGVM->hSelf, iPage, cPages, paPages);
2286 if (iPage >= cPages)
2287 {
2288 pGVM->gmm.s.idLastChunkHint = pChunk->cFree ? pChunk->Core.Key : NIL_GMM_CHUNKID;
2289 return iPage;
2290 }
2291 }
2292
2293 pChunk = pNext;
2294 }
2295 }
2296 return iPage;
2297}
2298
2299
2300/**
2301 * Pick pages that are in chunks already associated with the VM.
2302 *
2303 * @returns The new page descriptor table index.
2304 * @param pGMM Pointer to the GMM instance data.
2305 * @param pGVM Pointer to the global VM structure.
2306 * @param pSet The set to pick from.
2307 * @param iPage The current page descriptor table index.
2308 * @param cPages The total number of pages to allocate.
2309 * @param paPages The page descriptor table (input + output).
2310 */
2311static uint32_t gmmR0AllocatePagesAssociatedWithVM(PGMM pGMM, PGVM pGVM, PGMMCHUNKFREESET pSet,
2312 uint32_t iPage, uint32_t cPages, PGMMPAGEDESC paPages)
2313{
2314 uint16_t const hGVM = pGVM->hSelf;
2315
2316 /* Hint. */
2317 if (pGVM->gmm.s.idLastChunkHint != NIL_GMM_CHUNKID)
2318 {
2319 PGMMCHUNK pChunk = gmmR0GetChunk(pGMM, pGVM->gmm.s.idLastChunkHint);
2320 if (pChunk && pChunk->cFree)
2321 {
2322 iPage = gmmR0AllocatePagesFromChunk(pChunk, hGVM, iPage, cPages, paPages);
2323 if (iPage >= cPages)
2324 return iPage;
2325 }
2326 }
2327
2328 /* Scan. */
2329 for (unsigned iList = 0; iList < RT_ELEMENTS(pSet->apLists); iList++)
2330 {
2331 PGMMCHUNK pChunk = pSet->apLists[iList];
2332 while (pChunk)
2333 {
2334 PGMMCHUNK pNext = pChunk->pFreeNext;
2335
2336 if (pChunk->hGVM == hGVM)
2337 {
2338 iPage = gmmR0AllocatePagesFromChunk(pChunk, hGVM, iPage, cPages, paPages);
2339 if (iPage >= cPages)
2340 {
2341 pGVM->gmm.s.idLastChunkHint = pChunk->cFree ? pChunk->Core.Key : NIL_GMM_CHUNKID;
2342 return iPage;
2343 }
2344 }
2345
2346 pChunk = pNext;
2347 }
2348 }
2349 return iPage;
2350}
2351
2352
2353
2354/**
2355 * Pick pages in bound memory mode.
2356 *
2357 * @returns The new page descriptor table index.
2358 * @param pGVM Pointer to the global VM structure.
2359 * @param iPage The current page descriptor table index.
2360 * @param cPages The total number of pages to allocate.
2361 * @param paPages The page descriptor table (input + output).
2362 */
2363static uint32_t gmmR0AllocatePagesInBoundMode(PGVM pGVM, uint32_t iPage, uint32_t cPages, PGMMPAGEDESC paPages)
2364{
2365 for (unsigned iList = 0; iList < RT_ELEMENTS(pGVM->gmm.s.Private.apLists); iList++)
2366 {
2367 PGMMCHUNK pChunk = pGVM->gmm.s.Private.apLists[iList];
2368 while (pChunk)
2369 {
2370 Assert(pChunk->hGVM == pGVM->hSelf);
2371 PGMMCHUNK pNext = pChunk->pFreeNext;
2372 iPage = gmmR0AllocatePagesFromChunk(pChunk, pGVM->hSelf, iPage, cPages, paPages);
2373 if (iPage >= cPages)
2374 return iPage;
2375 pChunk = pNext;
2376 }
2377 }
2378 return iPage;
2379}
2380
2381
2382/**
2383 * Checks if we should start picking pages from chunks of other VMs.
2384 *
2385 * @returns @c true if we should, @c false if we should first try to allocate more
2386 * chunks.
2387 */
2388static bool gmmR0ShouldAllocatePagesInOtherChunks(PGVM pGVM)
2389{
2390 /*
2391 * Don't allocate a new chunk if we're getting close to the reservation limit; prefer picking pages from chunks other VMs have already allocated.
2392 */
2393 uint64_t cPgReserved = pGVM->gmm.s.Stats.Reserved.cBasePages
2394 + pGVM->gmm.s.Stats.Reserved.cFixedPages
2395 - pGVM->gmm.s.Stats.cBalloonedPages
2396 /** @todo what about shared pages? */;
2397 uint64_t cPgAllocated = pGVM->gmm.s.Stats.Allocated.cBasePages
2398 + pGVM->gmm.s.Stats.Allocated.cFixedPages;
2399 uint64_t cPgDelta = cPgReserved - cPgAllocated;
2400 if (cPgDelta < GMM_CHUNK_NUM_PAGES * 4)
2401 return true;
2402 /** @todo make the threshold configurable, also test the code to see if
2403 * this ever kicks in (we might be reserving too much or something). */
2404
2405 /*
2406 * Check how close we're to the max memory limit and how many fragments
2407 * there are?...
2408 */
2409 /** @todo. */
2410
2411 return false;
2412}
2413
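/*
 * Worked example (comment only) for the threshold above, assuming 2MB chunks of 4KB
 * pages (GMM_CHUNK_NUM_PAGES == 512); the figures are purely illustrative.
 *
 * @code
 *      // headroom  = (Reserved.cBasePages + Reserved.cFixedPages - cBalloonedPages)
 *      //           - (Allocated.cBasePages + Allocated.cFixedPages)
 *      // threshold = GMM_CHUNK_NUM_PAGES * 4 = 512 * 4 = 2048 pages (about 8MB)
 *      //
 *      // headroom <  2048  ->  true:  pick from chunks other VMs already allocated
 *      // headroom >= 2048  ->  false: keep allocating fresh chunks for locality
 * @endcode
 */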
2414
2415/**
2416 * Common worker for GMMR0AllocateHandyPages and GMMR0AllocatePages.
2417 *
2418 * @returns VBox status code:
2419 * @retval VINF_SUCCESS on success.
2420 * @retval VERR_GMM_SEED_ME if seeding via GMMR0SeedChunk or
2421 * gmmR0AllocateMoreChunks is necessary.
2422 * @retval VERR_GMM_HIT_GLOBAL_LIMIT if we've exhausted the available pages.
2423 * @retval VERR_GMM_HIT_VM_ACCOUNT_LIMIT if we've hit the VM account limit,
2424 * that is we're trying to allocate more than we've reserved.
2425 *
2426 * @param pGMM Pointer to the GMM instance data.
2427 * @param pGVM Pointer to the VM.
2428 * @param cPages The number of pages to allocate.
2429 * @param paPages Pointer to the page descriptors.
2430 * See GMMPAGEDESC for details on what is expected on input.
2431 * @param enmAccount The account to charge.
2432 *
2433 * @remarks The caller must own the giant GMM lock.
2434 */
2435static int gmmR0AllocatePagesNew(PGMM pGMM, PGVM pGVM, uint32_t cPages, PGMMPAGEDESC paPages, GMMACCOUNT enmAccount)
2436{
2437 Assert(pGMM->hMtxOwner == RTThreadNativeSelf());
2438
2439 /*
2440 * Check allocation limits.
2441 */
2442 if (RT_UNLIKELY(pGMM->cAllocatedPages + cPages > pGMM->cMaxPages))
2443 return VERR_GMM_HIT_GLOBAL_LIMIT;
2444
2445 switch (enmAccount)
2446 {
2447 case GMMACCOUNT_BASE:
2448 if (RT_UNLIKELY( pGVM->gmm.s.Stats.Allocated.cBasePages + pGVM->gmm.s.Stats.cBalloonedPages + cPages
2449 > pGVM->gmm.s.Stats.Reserved.cBasePages))
2450 {
2451 Log(("gmmR0AllocatePages:Base: Reserved=%#llx Allocated+Ballooned+Requested=%#llx+%#llx+%#x!\n",
2452 pGVM->gmm.s.Stats.Reserved.cBasePages, pGVM->gmm.s.Stats.Allocated.cBasePages,
2453 pGVM->gmm.s.Stats.cBalloonedPages, cPages));
2454 return VERR_GMM_HIT_VM_ACCOUNT_LIMIT;
2455 }
2456 break;
2457 case GMMACCOUNT_SHADOW:
2458 if (RT_UNLIKELY(pGVM->gmm.s.Stats.Allocated.cShadowPages + cPages > pGVM->gmm.s.Stats.Reserved.cShadowPages))
2459 {
2460 Log(("gmmR0AllocatePages:Shadow: Reserved=%#x Allocated+Requested=%#x+%#x!\n",
2461 pGVM->gmm.s.Stats.Reserved.cShadowPages, pGVM->gmm.s.Stats.Allocated.cShadowPages, cPages));
2462 return VERR_GMM_HIT_VM_ACCOUNT_LIMIT;
2463 }
2464 break;
2465 case GMMACCOUNT_FIXED:
2466 if (RT_UNLIKELY(pGVM->gmm.s.Stats.Allocated.cFixedPages + cPages > pGVM->gmm.s.Stats.Reserved.cFixedPages))
2467 {
2468 Log(("gmmR0AllocatePages:Fixed: Reserved=%#x Allocated+Requested=%#x+%#x!\n",
2469 pGVM->gmm.s.Stats.Reserved.cFixedPages, pGVM->gmm.s.Stats.Allocated.cFixedPages, cPages));
2470 return VERR_GMM_HIT_VM_ACCOUNT_LIMIT;
2471 }
2472 break;
2473 default:
2474 AssertMsgFailedReturn(("enmAccount=%d\n", enmAccount), VERR_IPE_NOT_REACHED_DEFAULT_CASE);
2475 }
2476
2477 /*
2478 * If we're in legacy memory mode, it's easy to figure out up-front whether we
2479 * have a sufficient number of pages.
2480 */
2481 if ( pGMM->fLegacyAllocationMode
2482 && pGVM->gmm.s.Private.cFreePages < cPages)
2483 {
2484 Assert(pGMM->fBoundMemoryMode);
2485 return VERR_GMM_SEED_ME;
2486 }
2487
2488 /*
2489 * Update the accounts before we proceed because we might be leaving the
2490 * protection of the global mutex and thus run the risk of permitting
2491 * too much memory to be allocated.
2492 */
2493 switch (enmAccount)
2494 {
2495 case GMMACCOUNT_BASE: pGVM->gmm.s.Stats.Allocated.cBasePages += cPages; break;
2496 case GMMACCOUNT_SHADOW: pGVM->gmm.s.Stats.Allocated.cShadowPages += cPages; break;
2497 case GMMACCOUNT_FIXED: pGVM->gmm.s.Stats.Allocated.cFixedPages += cPages; break;
2498 default: AssertMsgFailedReturn(("enmAccount=%d\n", enmAccount), VERR_IPE_NOT_REACHED_DEFAULT_CASE);
2499 }
2500 pGVM->gmm.s.Stats.cPrivatePages += cPages;
2501 pGMM->cAllocatedPages += cPages;
2502
2503 /*
2504 * Part two of it's-easy-in-legacy-memory-mode.
2505 */
2506 uint32_t iPage = 0;
2507 if (pGMM->fLegacyAllocationMode)
2508 {
2509 iPage = gmmR0AllocatePagesInBoundMode(pGVM, iPage, cPages, paPages);
2510 AssertReleaseReturn(iPage == cPages, VERR_GMM_ALLOC_PAGES_IPE);
2511 return VINF_SUCCESS;
2512 }
2513
2514 /*
2515 * Bound mode is also relatively straightforward.
2516 */
2517 int rc = VINF_SUCCESS;
2518 if (pGMM->fBoundMemoryMode)
2519 {
2520 iPage = gmmR0AllocatePagesInBoundMode(pGVM, iPage, cPages, paPages);
2521 if (iPage < cPages)
2522 do
2523 rc = gmmR0AllocateChunkNew(pGMM, pGVM, &pGVM->gmm.s.Private, cPages, paPages, &iPage);
2524 while (iPage < cPages && RT_SUCCESS(rc));
2525 }
2526 /*
2527 * Shared mode is trickier as we should try to achieve the same locality as
2528 * in bound mode, but smartly make use of non-full chunks allocated by
2529 * other VMs if we're low on memory.
2530 */
2531 else
2532 {
2533 /* Pick the most optimal pages first. */
2534 iPage = gmmR0AllocatePagesAssociatedWithVM(pGMM, pGVM, &pGMM->PrivateX, iPage, cPages, paPages);
2535 if (iPage < cPages)
2536 {
2537 /* Maybe we should try getting pages from chunks "belonging" to
2538 other VMs before allocating more chunks? */
2539 if (gmmR0ShouldAllocatePagesInOtherChunks(pGVM))
2540 iPage = gmmR0AllocatePagesFromSameNode(&pGMM->PrivateX, pGVM, iPage, cPages, paPages);
2541
2542 /* Allocate memory from empty chunks. */
2543 if (iPage < cPages)
2544 iPage = gmmR0AllocatePagesFromEmptyChunksOnSameNode(&pGMM->PrivateX, pGVM, iPage, cPages, paPages);
2545
2546 /* Grab empty shared chunks. */
2547 if (iPage < cPages)
2548 iPage = gmmR0AllocatePagesFromEmptyChunksOnSameNode(&pGMM->Shared, pGVM, iPage, cPages, paPages);
2549
2550 /*
2551 * Ok, try allocate new chunks.
2552 */
2553 if (iPage < cPages)
2554 {
2555 do
2556 rc = gmmR0AllocateChunkNew(pGMM, pGVM, &pGMM->PrivateX, cPages, paPages, &iPage);
2557 while (iPage < cPages && RT_SUCCESS(rc));
2558
2559 /* If the host is out of memory, take whatever we can get. */
2560 if ( (rc == VERR_NO_MEMORY || rc == VERR_NO_PHYS_MEMORY)
2561 && pGMM->PrivateX.cFreePages + pGMM->Shared.cFreePages >= cPages - iPage)
2562 {
2563 iPage = gmmR0AllocatePagesIndiscriminately(&pGMM->PrivateX, pGVM, iPage, cPages, paPages);
2564 if (iPage < cPages)
2565 iPage = gmmR0AllocatePagesIndiscriminately(&pGMM->Shared, pGVM, iPage, cPages, paPages);
2566 AssertRelease(iPage == cPages);
2567 rc = VINF_SUCCESS;
2568 }
2569 }
2570 }
2571 }
2572
2573 /*
2574 * Clean up on failure. Since this is bound to be a low-memory condition
2575 * we will give back any empty chunks that might be hanging around.
2576 */
2577 if (RT_FAILURE(rc))
2578 {
2579 /* Update the statistics. */
2580 pGVM->gmm.s.Stats.cPrivatePages -= cPages;
2581 pGMM->cAllocatedPages -= cPages - iPage;
2582 switch (enmAccount)
2583 {
2584 case GMMACCOUNT_BASE: pGVM->gmm.s.Stats.Allocated.cBasePages -= cPages; break;
2585 case GMMACCOUNT_SHADOW: pGVM->gmm.s.Stats.Allocated.cShadowPages -= cPages; break;
2586 case GMMACCOUNT_FIXED: pGVM->gmm.s.Stats.Allocated.cFixedPages -= cPages; break;
2587 default: AssertMsgFailedReturn(("enmAccount=%d\n", enmAccount), VERR_IPE_NOT_REACHED_DEFAULT_CASE);
2588 }
2589
2590 /* Release the pages. */
2591 while (iPage-- > 0)
2592 {
2593 uint32_t idPage = paPages[iPage].idPage;
2594 PGMMPAGE pPage = gmmR0GetPage(pGMM, idPage);
2595 if (RT_LIKELY(pPage))
2596 {
2597 Assert(GMM_PAGE_IS_PRIVATE(pPage));
2598 Assert(pPage->Private.hGVM == pGVM->hSelf);
2599 gmmR0FreePrivatePage(pGMM, pGVM, idPage, pPage);
2600 }
2601 else
2602 AssertMsgFailed(("idPage=%#x\n", idPage));
2603
2604 paPages[iPage].idPage = NIL_GMM_PAGEID;
2605 paPages[iPage].idSharedPage = NIL_GMM_PAGEID;
2606 paPages[iPage].HCPhysGCPhys = NIL_RTHCPHYS;
2607 }
2608
2609 /* Free empty chunks. */
2610 /** @todo */
2611
2612 /* return the fail status on failure */
2613 return rc;
2614 }
2615 return VINF_SUCCESS;
2616}
2617
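/*
 * Summary (descriptive comment only) of the fallback order implemented above for the
 * non-legacy, non-bound case:
 *
 *      1. Chunks already associated with this VM, trying the last-chunk hint first.
 *      2. If the reservation headroom is small (gmmR0ShouldAllocatePagesInOtherChunks),
 *         chunks with free pages on the same NUMA node, regardless of owner.
 *      3. Completely empty chunks on the same NUMA node, first from the private set,
 *         then from the shared set.
 *      4. Newly allocated chunks (gmmR0AllocateChunkNew, which drops the giant mutex
 *         while talking to the host).
 *      5. As a last resort, when the host reports out-of-memory but enough free pages
 *         are still tracked, any chunk in either set (indiscriminate allocation).
 */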
2618
2619/**
2620 * Updates the previous allocations and allocates more pages.
2621 *
2622 * The handy pages are always taken from the 'base' memory account.
2623 * The allocated pages are not cleared and will contain random garbage.
2624 *
2625 * @returns VBox status code:
2626 * @retval VINF_SUCCESS on success.
2627 * @retval VERR_NOT_OWNER if the caller is not an EMT.
2628 * @retval VERR_GMM_PAGE_NOT_FOUND if one of the pages to update wasn't found.
2629 * @retval VERR_GMM_PAGE_NOT_PRIVATE if one of the pages to update wasn't a
2630 * private page.
2631 * @retval VERR_GMM_PAGE_NOT_SHARED if one of the pages to update wasn't a
2632 * shared page.
2633 * @retval VERR_GMM_NOT_PAGE_OWNER if one of the pages to be updated wasn't
2634 * owned by the VM.
2635 * @retval VERR_GMM_SEED_ME if seeding via GMMR0SeedChunk is necessary.
2636 * @retval VERR_GMM_HIT_GLOBAL_LIMIT if we've exhausted the available pages.
2637 * @retval VERR_GMM_HIT_VM_ACCOUNT_LIMIT if we've hit the VM account limit,
2638 * that is we're trying to allocate more than we've reserved.
2639 *
2640 * @param pVM Pointer to the VM.
2641 * @param idCpu The VCPU id.
2642 * @param cPagesToUpdate The number of pages to update (starting from the head).
2643 * @param cPagesToAlloc The number of pages to allocate (starting from the head).
2644 * @param paPages The array of page descriptors.
2645 * See GMMPAGEDESC for details on what is expected on input.
2646 * @thread EMT.
2647 */
2648GMMR0DECL(int) GMMR0AllocateHandyPages(PVM pVM, VMCPUID idCpu, uint32_t cPagesToUpdate, uint32_t cPagesToAlloc, PGMMPAGEDESC paPages)
2649{
2650 LogFlow(("GMMR0AllocateHandyPages: pVM=%p cPagesToUpdate=%#x cPagesToAlloc=%#x paPages=%p\n",
2651 pVM, cPagesToUpdate, cPagesToAlloc, paPages));
2652
2653 /*
2654 * Validate, get basics and take the semaphore.
2655 * (This is a relatively busy path, so make predictions where possible.)
2656 */
2657 PGMM pGMM;
2658 GMM_GET_VALID_INSTANCE(pGMM, VERR_GMM_INSTANCE);
2659 PGVM pGVM;
2660 int rc = GVMMR0ByVMAndEMT(pVM, idCpu, &pGVM);
2661 if (RT_FAILURE(rc))
2662 return rc;
2663
2664 AssertPtrReturn(paPages, VERR_INVALID_PARAMETER);
2665 AssertMsgReturn( (cPagesToUpdate && cPagesToUpdate < 1024)
2666 || (cPagesToAlloc && cPagesToAlloc < 1024),
2667 ("cPagesToUpdate=%#x cPagesToAlloc=%#x\n", cPagesToUpdate, cPagesToAlloc),
2668 VERR_INVALID_PARAMETER);
2669
2670 unsigned iPage = 0;
2671 for (; iPage < cPagesToUpdate; iPage++)
2672 {
2673 AssertMsgReturn( ( paPages[iPage].HCPhysGCPhys <= GMM_GCPHYS_LAST
2674 && !(paPages[iPage].HCPhysGCPhys & PAGE_OFFSET_MASK))
2675 || paPages[iPage].HCPhysGCPhys == NIL_RTHCPHYS
2676 || paPages[iPage].HCPhysGCPhys == GMM_GCPHYS_UNSHAREABLE,
2677 ("#%#x: %RHp\n", iPage, paPages[iPage].HCPhysGCPhys),
2678 VERR_INVALID_PARAMETER);
2679 AssertMsgReturn( paPages[iPage].idPage <= GMM_PAGEID_LAST
2680 /*|| paPages[iPage].idPage == NIL_GMM_PAGEID*/,
2681 ("#%#x: %#x\n", iPage, paPages[iPage].idPage), VERR_INVALID_PARAMETER);
2682 AssertMsgReturn( paPages[iPage].idSharedPage <= GMM_PAGEID_LAST
2683 /*|| paPages[iPage].idSharedPage == NIL_GMM_PAGEID*/,
2684 ("#%#x: %#x\n", iPage, paPages[iPage].idSharedPage), VERR_INVALID_PARAMETER);
2685 }
2686
2687 for (; iPage < cPagesToAlloc; iPage++)
2688 {
2689 AssertMsgReturn(paPages[iPage].HCPhysGCPhys == NIL_RTHCPHYS, ("#%#x: %RHp\n", iPage, paPages[iPage].HCPhysGCPhys), VERR_INVALID_PARAMETER);
2690 AssertMsgReturn(paPages[iPage].idPage == NIL_GMM_PAGEID, ("#%#x: %#x\n", iPage, paPages[iPage].idPage), VERR_INVALID_PARAMETER);
2691 AssertMsgReturn(paPages[iPage].idSharedPage == NIL_GMM_PAGEID, ("#%#x: %#x\n", iPage, paPages[iPage].idSharedPage), VERR_INVALID_PARAMETER);
2692 }
2693
2694 gmmR0MutexAcquire(pGMM);
2695 if (GMM_CHECK_SANITY_UPON_ENTERING(pGMM))
2696 {
2697 /* No allocations before the initial reservation has been made! */
2698 if (RT_LIKELY( pGVM->gmm.s.Stats.Reserved.cBasePages
2699 && pGVM->gmm.s.Stats.Reserved.cFixedPages
2700 && pGVM->gmm.s.Stats.Reserved.cShadowPages))
2701 {
2702 /*
2703 * Perform the updates.
2704 * Stop on the first error.
2705 */
2706 for (iPage = 0; iPage < cPagesToUpdate; iPage++)
2707 {
2708 if (paPages[iPage].idPage != NIL_GMM_PAGEID)
2709 {
2710 PGMMPAGE pPage = gmmR0GetPage(pGMM, paPages[iPage].idPage);
2711 if (RT_LIKELY(pPage))
2712 {
2713 if (RT_LIKELY(GMM_PAGE_IS_PRIVATE(pPage)))
2714 {
2715 if (RT_LIKELY(pPage->Private.hGVM == pGVM->hSelf))
2716 {
2717 AssertCompile(NIL_RTHCPHYS > GMM_GCPHYS_LAST && GMM_GCPHYS_UNSHAREABLE > GMM_GCPHYS_LAST);
2718 if (RT_LIKELY(paPages[iPage].HCPhysGCPhys <= GMM_GCPHYS_LAST))
2719 pPage->Private.pfn = paPages[iPage].HCPhysGCPhys >> PAGE_SHIFT;
2720 else if (paPages[iPage].HCPhysGCPhys == GMM_GCPHYS_UNSHAREABLE)
2721 pPage->Private.pfn = GMM_PAGE_PFN_UNSHAREABLE;
2722 /* else: NIL_RTHCPHYS nothing */
2723
2724 paPages[iPage].idPage = NIL_GMM_PAGEID;
2725 paPages[iPage].HCPhysGCPhys = NIL_RTHCPHYS;
2726 }
2727 else
2728 {
2729 Log(("GMMR0AllocateHandyPages: #%#x/%#x: Not owner! hGVM=%#x hSelf=%#x\n",
2730 iPage, paPages[iPage].idPage, pPage->Private.hGVM, pGVM->hSelf));
2731 rc = VERR_GMM_NOT_PAGE_OWNER;
2732 break;
2733 }
2734 }
2735 else
2736 {
2737 Log(("GMMR0AllocateHandyPages: #%#x/%#x: Not private! %.*Rhxs (type %d)\n", iPage, paPages[iPage].idPage, sizeof(*pPage), pPage, pPage->Common.u2State));
2738 rc = VERR_GMM_PAGE_NOT_PRIVATE;
2739 break;
2740 }
2741 }
2742 else
2743 {
2744 Log(("GMMR0AllocateHandyPages: #%#x/%#x: Not found! (private)\n", iPage, paPages[iPage].idPage));
2745 rc = VERR_GMM_PAGE_NOT_FOUND;
2746 break;
2747 }
2748 }
2749
2750 if (paPages[iPage].idSharedPage != NIL_GMM_PAGEID)
2751 {
2752 PGMMPAGE pPage = gmmR0GetPage(pGMM, paPages[iPage].idSharedPage);
2753 if (RT_LIKELY(pPage))
2754 {
2755 if (RT_LIKELY(GMM_PAGE_IS_SHARED(pPage)))
2756 {
2757 AssertCompile(NIL_RTHCPHYS > GMM_GCPHYS_LAST && GMM_GCPHYS_UNSHAREABLE > GMM_GCPHYS_LAST);
2758 Assert(pPage->Shared.cRefs);
2759 Assert(pGVM->gmm.s.Stats.cSharedPages);
2760 Assert(pGVM->gmm.s.Stats.Allocated.cBasePages);
2761
2762 Log(("GMMR0AllocateHandyPages: free shared page %x cRefs=%d\n", paPages[iPage].idSharedPage, pPage->Shared.cRefs));
2763 pGVM->gmm.s.Stats.cSharedPages--;
2764 pGVM->gmm.s.Stats.Allocated.cBasePages--;
2765 if (!--pPage->Shared.cRefs)
2766 gmmR0FreeSharedPage(pGMM, pGVM, paPages[iPage].idSharedPage, pPage);
2767 else
2768 {
2769 Assert(pGMM->cDuplicatePages);
2770 pGMM->cDuplicatePages--;
2771 }
2772
2773 paPages[iPage].idSharedPage = NIL_GMM_PAGEID;
2774 }
2775 else
2776 {
2777 Log(("GMMR0AllocateHandyPages: #%#x/%#x: Not shared!\n", iPage, paPages[iPage].idSharedPage));
2778 rc = VERR_GMM_PAGE_NOT_SHARED;
2779 break;
2780 }
2781 }
2782 else
2783 {
2784 Log(("GMMR0AllocateHandyPages: #%#x/%#x: Not found! (shared)\n", iPage, paPages[iPage].idSharedPage));
2785 rc = VERR_GMM_PAGE_NOT_FOUND;
2786 break;
2787 }
2788 }
2789 } /* for each page to update */
2790
2791 if (RT_SUCCESS(rc) && cPagesToAlloc > 0)
2792 {
2793#if defined(VBOX_STRICT) && 0 /** @todo re-test this later. Appeared to be a PGM init bug. */
2794 for (iPage = 0; iPage < cPagesToAlloc; iPage++)
2795 {
2796 Assert(paPages[iPage].HCPhysGCPhys == NIL_RTHCPHYS);
2797 Assert(paPages[iPage].idPage == NIL_GMM_PAGEID);
2798 Assert(paPages[iPage].idSharedPage == NIL_GMM_PAGEID);
2799 }
2800#endif
2801
2802 /*
2803 * Join paths with GMMR0AllocatePages for the allocation.
2804 * Note! gmmR0AllocatePagesNew may leave the protection of the mutex!
2805 */
2806 rc = gmmR0AllocatePagesNew(pGMM, pGVM, cPagesToAlloc, paPages, GMMACCOUNT_BASE);
2807 }
2808 }
2809 else
2810 rc = VERR_WRONG_ORDER;
2811 GMM_CHECK_SANITY_UPON_LEAVING(pGMM);
2812 }
2813 else
2814 rc = VERR_GMM_IS_NOT_SANE;
2815 gmmR0MutexRelease(pGMM);
2816 LogFlow(("GMMR0AllocateHandyPages: returns %Rrc\n", rc));
2817 return rc;
2818}
2819
2820
2821/**
2822 * Allocate one or more pages.
2823 *
2824 * This is typically used for ROMs and MMIO2 (VRAM) during VM creation.
2825 * The allocated pages are not cleared and will contain random garbage.
2826 *
2827 * @returns VBox status code:
2828 * @retval VINF_SUCCESS on success.
2829 * @retval VERR_NOT_OWNER if the caller is not an EMT.
2830 * @retval VERR_GMM_SEED_ME if seeding via GMMR0SeedChunk is necessary.
2831 * @retval VERR_GMM_HIT_GLOBAL_LIMIT if we've exhausted the available pages.
2832 * @retval VERR_GMM_HIT_VM_ACCOUNT_LIMIT if we've hit the VM account limit,
2833 * that is we're trying to allocate more than we've reserved.
2834 *
2835 * @param pVM Pointer to the VM.
2836 * @param idCpu The VCPU id.
2837 * @param cPages The number of pages to allocate.
2838 * @param paPages Pointer to the page descriptors.
2839 * See GMMPAGEDESC for details on what is expected on input.
2840 * @param enmAccount The account to charge.
2841 *
2842 * @thread EMT.
2843 */
2844GMMR0DECL(int) GMMR0AllocatePages(PVM pVM, VMCPUID idCpu, uint32_t cPages, PGMMPAGEDESC paPages, GMMACCOUNT enmAccount)
2845{
2846 LogFlow(("GMMR0AllocatePages: pVM=%p cPages=%#x paPages=%p enmAccount=%d\n", pVM, cPages, paPages, enmAccount));
2847
2848 /*
2849 * Validate, get basics and take the semaphore.
2850 */
2851 PGMM pGMM;
2852 GMM_GET_VALID_INSTANCE(pGMM, VERR_GMM_INSTANCE);
2853 PGVM pGVM;
2854 int rc = GVMMR0ByVMAndEMT(pVM, idCpu, &pGVM);
2855 if (RT_FAILURE(rc))
2856 return rc;
2857
2858 AssertPtrReturn(paPages, VERR_INVALID_PARAMETER);
2859 AssertMsgReturn(enmAccount > GMMACCOUNT_INVALID && enmAccount < GMMACCOUNT_END, ("%d\n", enmAccount), VERR_INVALID_PARAMETER);
2860 AssertMsgReturn(cPages > 0 && cPages < RT_BIT(32 - PAGE_SHIFT), ("%#x\n", cPages), VERR_INVALID_PARAMETER);
2861
2862 for (unsigned iPage = 0; iPage < cPages; iPage++)
2863 {
2864 AssertMsgReturn( paPages[iPage].HCPhysGCPhys == NIL_RTHCPHYS
2865 || paPages[iPage].HCPhysGCPhys == GMM_GCPHYS_UNSHAREABLE
2866 || ( enmAccount == GMMACCOUNT_BASE
2867 && paPages[iPage].HCPhysGCPhys <= GMM_GCPHYS_LAST
2868 && !(paPages[iPage].HCPhysGCPhys & PAGE_OFFSET_MASK)),
2869 ("#%#x: %RHp enmAccount=%d\n", iPage, paPages[iPage].HCPhysGCPhys, enmAccount),
2870 VERR_INVALID_PARAMETER);
2871 AssertMsgReturn(paPages[iPage].idPage == NIL_GMM_PAGEID, ("#%#x: %#x\n", iPage, paPages[iPage].idPage), VERR_INVALID_PARAMETER);
2872 AssertMsgReturn(paPages[iPage].idSharedPage == NIL_GMM_PAGEID, ("#%#x: %#x\n", iPage, paPages[iPage].idSharedPage), VERR_INVALID_PARAMETER);
2873 }
2874
2875 gmmR0MutexAcquire(pGMM);
2876 if (GMM_CHECK_SANITY_UPON_ENTERING(pGMM))
2877 {
2878
2879 /* No allocations before the initial reservation has been made! */
2880 if (RT_LIKELY( pGVM->gmm.s.Stats.Reserved.cBasePages
2881 && pGVM->gmm.s.Stats.Reserved.cFixedPages
2882 && pGVM->gmm.s.Stats.Reserved.cShadowPages))
2883 rc = gmmR0AllocatePagesNew(pGMM, pGVM, cPages, paPages, enmAccount);
2884 else
2885 rc = VERR_WRONG_ORDER;
2886 GMM_CHECK_SANITY_UPON_LEAVING(pGMM);
2887 }
2888 else
2889 rc = VERR_GMM_IS_NOT_SANE;
2890 gmmR0MutexRelease(pGMM);
2891 LogFlow(("GMMR0AllocatePages: returns %Rrc\n", rc));
2892 return rc;
2893}
2894
2895
2896/**
2897 * VMMR0 request wrapper for GMMR0AllocatePages.
2898 *
2899 * @returns see GMMR0AllocatePages.
2900 * @param pVM Pointer to the VM.
2901 * @param idCpu The VCPU id.
2902 * @param pReq Pointer to the request packet.
2903 */
2904GMMR0DECL(int) GMMR0AllocatePagesReq(PVM pVM, VMCPUID idCpu, PGMMALLOCATEPAGESREQ pReq)
2905{
2906 /*
2907 * Validate input and pass it on.
2908 */
2909 AssertPtrReturn(pVM, VERR_INVALID_POINTER);
2910 AssertPtrReturn(pReq, VERR_INVALID_POINTER);
2911 AssertMsgReturn(pReq->Hdr.cbReq >= RT_UOFFSETOF(GMMALLOCATEPAGESREQ, aPages[0]),
2912 ("%#x < %#x\n", pReq->Hdr.cbReq, RT_UOFFSETOF(GMMALLOCATEPAGESREQ, aPages[0])),
2913 VERR_INVALID_PARAMETER);
2914 AssertMsgReturn(pReq->Hdr.cbReq == RT_UOFFSETOF(GMMALLOCATEPAGESREQ, aPages[pReq->cPages]),
2915 ("%#x != %#x\n", pReq->Hdr.cbReq, RT_UOFFSETOF(GMMALLOCATEPAGESREQ, aPages[pReq->cPages])),
2916 VERR_INVALID_PARAMETER);
2917
2918 return GMMR0AllocatePages(pVM, idCpu, pReq->cPages, &pReq->aPages[0], pReq->enmAccount);
2919}
2920
2921
2922/**
2923 * Allocate a large page to represent guest RAM.
2924 *
2925 * The allocated pages are not cleared and will contain random garbage.
2926 *
2927 * @returns VBox status code:
2928 * @retval VINF_SUCCESS on success.
2929 * @retval VERR_NOT_OWNER if the caller is not an EMT.
2930 * @retval VERR_GMM_SEED_ME if seeding via GMMR0SeedChunk is necessary.
2931 * @retval VERR_GMM_HIT_GLOBAL_LIMIT if we've exhausted the available pages.
2932 * @retval VERR_GMM_HIT_VM_ACCOUNT_LIMIT if we've hit the VM account limit,
2933 * that is we're trying to allocate more than we've reserved.
2934 * @returns see GMMR0AllocatePages.
2935 * @param pVM Pointer to the VM.
2936 * @param idCpu The VCPU id.
2937 * @param cbPage Large page size.
2938 */
2939GMMR0DECL(int) GMMR0AllocateLargePage(PVM pVM, VMCPUID idCpu, uint32_t cbPage, uint32_t *pIdPage, RTHCPHYS *pHCPhys)
2940{
2941 LogFlow(("GMMR0AllocateLargePage: pVM=%p cbPage=%x\n", pVM, cbPage));
2942
2943 AssertReturn(cbPage == GMM_CHUNK_SIZE, VERR_INVALID_PARAMETER);
2944 AssertPtrReturn(pIdPage, VERR_INVALID_PARAMETER);
2945 AssertPtrReturn(pHCPhys, VERR_INVALID_PARAMETER);
2946
2947 /*
2948 * Validate, get basics and take the semaphore.
2949 */
2950 PGMM pGMM;
2951 GMM_GET_VALID_INSTANCE(pGMM, VERR_GMM_INSTANCE);
2952 PGVM pGVM;
2953 int rc = GVMMR0ByVMAndEMT(pVM, idCpu, &pGVM);
2954 if (RT_FAILURE(rc))
2955 return rc;
2956
2957 /* Not supported in legacy mode where we allocate the memory in ring 3 and lock it in ring 0. */
2958 if (pGMM->fLegacyAllocationMode)
2959 return VERR_NOT_SUPPORTED;
2960
2961 *pHCPhys = NIL_RTHCPHYS;
2962 *pIdPage = NIL_GMM_PAGEID;
2963
2964 gmmR0MutexAcquire(pGMM);
2965 if (GMM_CHECK_SANITY_UPON_ENTERING(pGMM))
2966 {
2967 const unsigned cPages = (GMM_CHUNK_SIZE >> PAGE_SHIFT);
2968 if (RT_UNLIKELY( pGVM->gmm.s.Stats.Allocated.cBasePages + pGVM->gmm.s.Stats.cBalloonedPages + cPages
2969 > pGVM->gmm.s.Stats.Reserved.cBasePages))
2970 {
2971 Log(("GMMR0AllocateLargePage: Reserved=%#llx Allocated+Requested=%#llx+%#x!\n",
2972 pGVM->gmm.s.Stats.Reserved.cBasePages, pGVM->gmm.s.Stats.Allocated.cBasePages, cPages));
2973 gmmR0MutexRelease(pGMM);
2974 return VERR_GMM_HIT_VM_ACCOUNT_LIMIT;
2975 }
2976
2977 /*
2978 * Allocate a new large page chunk.
2979 *
2980 * Note! We leave the giant GMM lock temporarily as the allocation might
2981 * take a long time. gmmR0RegisterChunk will retake it (ugly).
2982 */
2983 AssertCompile(GMM_CHUNK_SIZE == _2M);
2984 gmmR0MutexRelease(pGMM);
2985
2986 RTR0MEMOBJ hMemObj;
2987 rc = RTR0MemObjAllocPhysEx(&hMemObj, GMM_CHUNK_SIZE, NIL_RTHCPHYS, GMM_CHUNK_SIZE);
2988 if (RT_SUCCESS(rc))
2989 {
2990 PGMMCHUNKFREESET pSet = pGMM->fBoundMemoryMode ? &pGVM->gmm.s.Private : &pGMM->PrivateX;
2991 PGMMCHUNK pChunk;
2992 rc = gmmR0RegisterChunk(pGMM, pSet, hMemObj, pGVM->hSelf, GMM_CHUNK_FLAGS_LARGE_PAGE, &pChunk);
2993 if (RT_SUCCESS(rc))
2994 {
2995 /*
2996 * Allocate all the pages in the chunk.
2997 */
2998 /* Unlink the new chunk from the free list. */
2999 gmmR0UnlinkChunk(pChunk);
3000
3001 /** @todo rewrite this to skip the looping. */
3002 /* Allocate all pages. */
3003 GMMPAGEDESC PageDesc;
3004 gmmR0AllocatePage(pChunk, pGVM->hSelf, &PageDesc);
3005
3006 /* Return the first page as we'll use the whole chunk as one big page. */
3007 *pIdPage = PageDesc.idPage;
3008 *pHCPhys = PageDesc.HCPhysGCPhys;
3009
3010 for (unsigned i = 1; i < cPages; i++)
3011 gmmR0AllocatePage(pChunk, pGVM->hSelf, &PageDesc);
3012
3013 /* Update accounting. */
3014 pGVM->gmm.s.Stats.Allocated.cBasePages += cPages;
3015 pGVM->gmm.s.Stats.cPrivatePages += cPages;
3016 pGMM->cAllocatedPages += cPages;
3017
3018 gmmR0LinkChunk(pChunk, pSet);
3019 gmmR0MutexRelease(pGMM);
3020 }
3021 else
3022 RTR0MemObjFree(hMemObj, false /* fFreeMappings */);
3023 }
3024 }
3025 else
3026 {
3027 gmmR0MutexRelease(pGMM);
3028 rc = VERR_GMM_IS_NOT_SANE;
3029 }
3030
3031 LogFlow(("GMMR0AllocateLargePage: returns %Rrc\n", rc));
3032 return rc;
3033}
3034
3035
3036/**
3037 * Free a large page.
3038 *
3039 * @returns VBox status code:
3040 * @param pVM Pointer to the VM.
3041 * @param idCpu The VCPU id.
3042 * @param idPage The large page id.
3043 */
3044GMMR0DECL(int) GMMR0FreeLargePage(PVM pVM, VMCPUID idCpu, uint32_t idPage)
3045{
3046 LogFlow(("GMMR0FreeLargePage: pVM=%p idPage=%x\n", pVM, idPage));
3047
3048 /*
3049 * Validate, get basics and take the semaphore.
3050 */
3051 PGMM pGMM;
3052 GMM_GET_VALID_INSTANCE(pGMM, VERR_GMM_INSTANCE);
3053 PGVM pGVM;
3054 int rc = GVMMR0ByVMAndEMT(pVM, idCpu, &pGVM);
3055 if (RT_FAILURE(rc))
3056 return rc;
3057
3058 /* Not supported in legacy mode where we allocate the memory in ring 3 and lock it in ring 0. */
3059 if (pGMM->fLegacyAllocationMode)
3060 return VERR_NOT_SUPPORTED;
3061
3062 gmmR0MutexAcquire(pGMM);
3063 if (GMM_CHECK_SANITY_UPON_ENTERING(pGMM))
3064 {
3065 const unsigned cPages = (GMM_CHUNK_SIZE >> PAGE_SHIFT);
3066
3067 if (RT_UNLIKELY(pGVM->gmm.s.Stats.Allocated.cBasePages < cPages))
3068 {
3069 Log(("GMMR0FreeLargePage: allocated=%#llx cPages=%#x!\n", pGVM->gmm.s.Stats.Allocated.cBasePages, cPages));
3070 gmmR0MutexRelease(pGMM);
3071 return VERR_GMM_ATTEMPT_TO_FREE_TOO_MUCH;
3072 }
3073
3074 PGMMPAGE pPage = gmmR0GetPage(pGMM, idPage);
3075 if (RT_LIKELY( pPage
3076 && GMM_PAGE_IS_PRIVATE(pPage)))
3077 {
3078 PGMMCHUNK pChunk = gmmR0GetChunk(pGMM, idPage >> GMM_CHUNKID_SHIFT);
3079 Assert(pChunk);
3080 Assert(pChunk->cFree < GMM_CHUNK_NUM_PAGES);
3081 Assert(pChunk->cPrivate > 0);
3082
3083 /* Release the memory immediately. */
3084 gmmR0FreeChunk(pGMM, NULL, pChunk, false /*fRelaxedSem*/); /** @todo this can be relaxed too! */
3085
3086 /* Update accounting. */
3087 pGVM->gmm.s.Stats.Allocated.cBasePages -= cPages;
3088 pGVM->gmm.s.Stats.cPrivatePages -= cPages;
3089 pGMM->cAllocatedPages -= cPages;
3090 }
3091 else
3092 rc = VERR_GMM_PAGE_NOT_FOUND;
3093 }
3094 else
3095 rc = VERR_GMM_IS_NOT_SANE;
3096
3097 gmmR0MutexRelease(pGMM);
3098 LogFlow(("GMMR0FreeLargePage: returns %Rrc\n", rc));
3099 return rc;
3100}
3101
3102
3103/**
3104 * VMMR0 request wrapper for GMMR0FreeLargePage.
3105 *
3106 * @returns see GMMR0FreeLargePage.
3107 * @param pVM Pointer to the VM.
3108 * @param idCpu The VCPU id.
3109 * @param pReq Pointer to the request packet.
3110 */
3111GMMR0DECL(int) GMMR0FreeLargePageReq(PVM pVM, VMCPUID idCpu, PGMMFREELARGEPAGEREQ pReq)
3112{
3113 /*
3114 * Validate input and pass it on.
3115 */
3116 AssertPtrReturn(pVM, VERR_INVALID_POINTER);
3117 AssertPtrReturn(pReq, VERR_INVALID_POINTER);
3118 AssertMsgReturn(pReq->Hdr.cbReq == sizeof(GMMFREELARGEPAGEREQ),
3119 ("%#x != %#x\n", pReq->Hdr.cbReq, sizeof(GMMFREELARGEPAGEREQ)),
3120 VERR_INVALID_PARAMETER);
3121
3122 return GMMR0FreeLargePage(pVM, idCpu, pReq->idPage);
3123}
3124
3125
3126/**
3127 * Frees a chunk, giving it back to the host OS.
3128 *
3129 * @param pGMM Pointer to the GMM instance.
3130 * @param pGVM This is set when called from GMMR0CleanupVM so we can
3131 * unmap and free the chunk in one go.
3132 * @param pChunk The chunk to free.
3133 * @param fRelaxedSem Whether we can release the semaphore while doing the
3134 * freeing (@c true) or not.
3135 */
3136static bool gmmR0FreeChunk(PGMM pGMM, PGVM pGVM, PGMMCHUNK pChunk, bool fRelaxedSem)
3137{
3138 Assert(pChunk->Core.Key != NIL_GMM_CHUNKID);
3139
3140 GMMR0CHUNKMTXSTATE MtxState;
3141 gmmR0ChunkMutexAcquire(&MtxState, pGMM, pChunk, GMMR0CHUNK_MTX_KEEP_GIANT);
3142
3143 /*
3144 * Cleanup hack! Unmap the chunk from the caller's address space.
3145 * This shouldn't happen, so screw lock contention...
3146 */
3147 if ( pChunk->cMappingsX
3148 && !pGMM->fLegacyAllocationMode
3149 && pGVM)
3150 gmmR0UnmapChunkLocked(pGMM, pGVM, pChunk);
3151
3152 /*
3153 * If there are current mappings of the chunk, then request the
3154 * VMs to unmap them. Reposition the chunk in the free list so
3155 * it won't be a likely candidate for allocations.
3156 */
3157 if (pChunk->cMappingsX)
3158 {
3159 /** @todo R0 -> VM request */
3160 /* The chunk can be mapped by more than one VM if fBoundMemoryMode is false! */
3161 Log(("gmmR0FreeChunk: chunk still has %d mappings; don't free!\n", pChunk->cMappingsX));
3162 gmmR0ChunkMutexRelease(&MtxState, pChunk);
3163 return false;
3164 }
3165
3166
3167 /*
3168 * Save and trash the handle.
3169 */
3170 RTR0MEMOBJ const hMemObj = pChunk->hMemObj;
3171 pChunk->hMemObj = NIL_RTR0MEMOBJ;
3172
3173 /*
3174 * Unlink it from everywhere.
3175 */
3176 gmmR0UnlinkChunk(pChunk);
3177
3178 RTListNodeRemove(&pChunk->ListNode);
3179
3180 PAVLU32NODECORE pCore = RTAvlU32Remove(&pGMM->pChunks, pChunk->Core.Key);
3181 Assert(pCore == &pChunk->Core); NOREF(pCore);
3182
3183 PGMMCHUNKTLBE pTlbe = &pGMM->ChunkTLB.aEntries[GMM_CHUNKTLB_IDX(pChunk->Core.Key)];
3184 if (pTlbe->pChunk == pChunk)
3185 {
3186 pTlbe->idChunk = NIL_GMM_CHUNKID;
3187 pTlbe->pChunk = NULL;
3188 }
3189
3190 Assert(pGMM->cChunks > 0);
3191 pGMM->cChunks--;
3192
3193 /*
3194 * Free the Chunk ID before dropping the locks and freeing the rest.
3195 */
3196 gmmR0FreeChunkId(pGMM, pChunk->Core.Key);
3197 pChunk->Core.Key = NIL_GMM_CHUNKID;
3198
3199 pGMM->cFreedChunks++;
3200
3201 gmmR0ChunkMutexRelease(&MtxState, NULL);
3202 if (fRelaxedSem)
3203 gmmR0MutexRelease(pGMM);
3204
3205 RTMemFree(pChunk->paMappingsX);
3206 pChunk->paMappingsX = NULL;
3207
3208 RTMemFree(pChunk);
3209
3210 int rc = RTR0MemObjFree(hMemObj, false /* fFreeMappings */);
3211 AssertLogRelRC(rc);
3212
3213 if (fRelaxedSem)
3214 gmmR0MutexAcquire(pGMM);
3215 return fRelaxedSem;
3216}
3217
3218
3219/**
3220 * Free page worker.
3221 *
3222 * The caller does all the statistic decrementing; we do all the incrementing.
3223 *
3224 * @param pGMM Pointer to the GMM instance data.
3225 * @param pGVM Pointer to the GVM instance.
3226 * @param pChunk Pointer to the chunk this page belongs to.
3227 * @param idPage The Page ID.
3228 * @param pPage Pointer to the page.
3229 */
3230static void gmmR0FreePageWorker(PGMM pGMM, PGVM pGVM, PGMMCHUNK pChunk, uint32_t idPage, PGMMPAGE pPage)
3231{
3232 Log3(("F pPage=%p iPage=%#x/%#x u2State=%d iFreeHead=%#x\n",
3233 pPage, pPage - &pChunk->aPages[0], idPage, pPage->Common.u2State, pChunk->iFreeHead)); NOREF(idPage);
3234
3235 /*
3236 * Put the page on the free list.
3237 */
3238 pPage->u = 0;
3239 pPage->Free.u2State = GMM_PAGE_STATE_FREE;
3240 Assert(pChunk->iFreeHead < RT_ELEMENTS(pChunk->aPages) || pChunk->iFreeHead == UINT16_MAX);
3241 pPage->Free.iNext = pChunk->iFreeHead;
3242 pChunk->iFreeHead = pPage - &pChunk->aPages[0];
3243
3244 /*
3245 * Update statistics (the cShared/cPrivate stats are up to date already),
3246 * and relink the chunk if necessary.
3247 */
3248 unsigned const cFree = pChunk->cFree;
3249 if ( !cFree
3250 || gmmR0SelectFreeSetList(cFree) != gmmR0SelectFreeSetList(cFree + 1))
3251 {
3252 gmmR0UnlinkChunk(pChunk);
3253 pChunk->cFree++;
3254 gmmR0SelectSetAndLinkChunk(pGMM, pGVM, pChunk);
3255 }
3256 else
3257 {
3258 pChunk->cFree = cFree + 1;
3259 pChunk->pSet->cFreePages++;
3260 }
3261
3262 /*
3263 * If the chunk becomes empty, consider giving memory back to the host OS.
3264 *
3265 * The current strategy is to try give it back if there are other chunks
3266 * The current strategy is to try to give it back if there are other chunks
3267 * category. Note that since there are probably mappings of the chunk,
3268 * it won't be freed up instantly, which probably screws up this logic
3269 * a bit...
3270 */
3271 /** @todo Do this on the way out. */
3272 if (RT_UNLIKELY( pChunk->cFree == GMM_CHUNK_NUM_PAGES
3273 && pChunk->pFreeNext
3274 && pChunk->pFreePrev /** @todo this is probably misfiring, see reset... */
3275 && !pGMM->fLegacyAllocationMode))
3276 gmmR0FreeChunk(pGMM, NULL, pChunk, false);
3277
3278}
3279
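/*
 * Worked example (comment only) for the relink condition above: the chunk is unlinked
 * and relinked only when it wasn't linked at all or when the higher free count lands it
 * in a different free list bucket; otherwise just the counters are bumped.  The shift
 * value of 4 is purely illustrative.
 *
 * @code
 *      // assuming GMM_CHUNK_FREE_SET_SHIFT == 4:
 *      //   cFree 14 -> 15 :  14 >> 4 == 0 == 15 >> 4       -> bump pChunk->cFree and pSet->cFreePages
 *      //   cFree 15 -> 16 :  15 >> 4 == 0 != 16 >> 4 == 1  -> unlink, then relink in the next bucket
 *      //   cFree  0 ->  1 :  chunk was on no list          -> gmmR0SelectSetAndLinkChunk links it
 * @endcode
 */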
3280
3281/**
3282 * Frees a shared page, the page is known to exist and be valid and such.
3283 *
3284 * @param pGMM Pointer to the GMM instance.
3285 * @param pGVM Pointer to the GVM instance.
3286 * @param idPage The page id.
3287 * @param pPage The page structure.
3288 */
3289DECLINLINE(void) gmmR0FreeSharedPage(PGMM pGMM, PGVM pGVM, uint32_t idPage, PGMMPAGE pPage)
3290{
3291 PGMMCHUNK pChunk = gmmR0GetChunk(pGMM, idPage >> GMM_CHUNKID_SHIFT);
3292 Assert(pChunk);
3293 Assert(pChunk->cFree < GMM_CHUNK_NUM_PAGES);
3294 Assert(pChunk->cShared > 0);
3295 Assert(pGMM->cSharedPages > 0);
3296 Assert(pGMM->cAllocatedPages > 0);
3297 Assert(!pPage->Shared.cRefs);
3298
3299 pChunk->cShared--;
3300 pGMM->cAllocatedPages--;
3301 pGMM->cSharedPages--;
3302 gmmR0FreePageWorker(pGMM, pGVM, pChunk, idPage, pPage);
3303}
3304
3305
3306/**
3307 * Frees a private page, the page is known to exist and be valid and such.
3308 *
3309 * @param pGMM Pointer to the GMM instance.
3310 * @param pGVM Pointer to the GVM instance.
3311 * @param idPage The page id.
3312 * @param pPage The page structure.
3313 */
3314DECLINLINE(void) gmmR0FreePrivatePage(PGMM pGMM, PGVM pGVM, uint32_t idPage, PGMMPAGE pPage)
3315{
3316 PGMMCHUNK pChunk = gmmR0GetChunk(pGMM, idPage >> GMM_CHUNKID_SHIFT);
3317 Assert(pChunk);
3318 Assert(pChunk->cFree < GMM_CHUNK_NUM_PAGES);
3319 Assert(pChunk->cPrivate > 0);
3320 Assert(pGMM->cAllocatedPages > 0);
3321
3322 pChunk->cPrivate--;
3323 pGMM->cAllocatedPages--;
3324 gmmR0FreePageWorker(pGMM, pGVM, pChunk, idPage, pPage);
3325}
3326
3327
3328/**
3329 * Common worker for GMMR0FreePages and GMMR0BalloonedPages.
3330 *
3331 * @returns VBox status code:
3332 * @retval xxx
3333 *
3334 * @param pGMM Pointer to the GMM instance data.
3335 * @param pGVM Pointer to the VM.
3336 * @param cPages The number of pages to free.
3337 * @param paPages Pointer to the page descriptors.
3338 * @param enmAccount The account this relates to.
3339 */
3340static int gmmR0FreePages(PGMM pGMM, PGVM pGVM, uint32_t cPages, PGMMFREEPAGEDESC paPages, GMMACCOUNT enmAccount)
3341{
3342 /*
3343 * Check that the request isn't impossible wrt to the account status.
3344 */
3345 switch (enmAccount)
3346 {
3347 case GMMACCOUNT_BASE:
3348 if (RT_UNLIKELY(pGVM->gmm.s.Stats.Allocated.cBasePages < cPages))
3349 {
3350 Log(("gmmR0FreePages: allocated=%#llx cPages=%#x!\n", pGVM->gmm.s.Stats.Allocated.cBasePages, cPages));
3351 return VERR_GMM_ATTEMPT_TO_FREE_TOO_MUCH;
3352 }
3353 break;
3354 case GMMACCOUNT_SHADOW:
3355 if (RT_UNLIKELY(pGVM->gmm.s.Stats.Allocated.cShadowPages < cPages))
3356 {
3357 Log(("gmmR0FreePages: allocated=%#llx cPages=%#x!\n", pGVM->gmm.s.Stats.Allocated.cShadowPages, cPages));
3358 return VERR_GMM_ATTEMPT_TO_FREE_TOO_MUCH;
3359 }
3360 break;
3361 case GMMACCOUNT_FIXED:
3362 if (RT_UNLIKELY(pGVM->gmm.s.Stats.Allocated.cFixedPages < cPages))
3363 {
3364 Log(("gmmR0FreePages: allocated=%#llx cPages=%#x!\n", pGVM->gmm.s.Stats.Allocated.cFixedPages, cPages));
3365 return VERR_GMM_ATTEMPT_TO_FREE_TOO_MUCH;
3366 }
3367 break;
3368 default:
3369 AssertMsgFailedReturn(("enmAccount=%d\n", enmAccount), VERR_IPE_NOT_REACHED_DEFAULT_CASE);
3370 }
3371
3372 /*
3373 * Walk the descriptors and free the pages.
3374 *
3375 * Statistics (except the account) are being updated as we go along,
3376 * unlike the alloc code. Also, stop on the first error.
3377 */
3378 int rc = VINF_SUCCESS;
3379 uint32_t iPage;
3380 for (iPage = 0; iPage < cPages; iPage++)
3381 {
3382 uint32_t idPage = paPages[iPage].idPage;
3383 PGMMPAGE pPage = gmmR0GetPage(pGMM, idPage);
3384 if (RT_LIKELY(pPage))
3385 {
3386 if (RT_LIKELY(GMM_PAGE_IS_PRIVATE(pPage)))
3387 {
3388 if (RT_LIKELY(pPage->Private.hGVM == pGVM->hSelf))
3389 {
3390 Assert(pGVM->gmm.s.Stats.cPrivatePages);
3391 pGVM->gmm.s.Stats.cPrivatePages--;
3392 gmmR0FreePrivatePage(pGMM, pGVM, idPage, pPage);
3393 }
3394 else
3395 {
3396                     Log(("gmmR0FreePages: #%#x/%#x: not owner! hGVM=%#x hSelf=%#x\n", iPage, idPage,
3397 pPage->Private.hGVM, pGVM->hSelf));
3398 rc = VERR_GMM_NOT_PAGE_OWNER;
3399 break;
3400 }
3401 }
3402 else if (RT_LIKELY(GMM_PAGE_IS_SHARED(pPage)))
3403 {
3404 Assert(pGVM->gmm.s.Stats.cSharedPages);
3405 Assert(pPage->Shared.cRefs);
3406#if defined(VBOX_WITH_PAGE_SHARING) && defined(VBOX_STRICT) && HC_ARCH_BITS == 64
3407 if (pPage->Shared.u14Checksum)
3408 {
3409 uint32_t uChecksum = gmmR0StrictPageChecksum(pGMM, pGVM, idPage);
3410 uChecksum &= UINT32_C(0x00003fff);
3411 AssertMsg(!uChecksum || uChecksum == pPage->Shared.u14Checksum,
3412 ("%#x vs %#x - idPage=%#x\n", uChecksum, pPage->Shared.u14Checksum, idPage));
3413 }
3414#endif
3415 pGVM->gmm.s.Stats.cSharedPages--;
3416 if (!--pPage->Shared.cRefs)
3417 gmmR0FreeSharedPage(pGMM, pGVM, idPage, pPage);
3418 else
3419 {
3420 Assert(pGMM->cDuplicatePages);
3421 pGMM->cDuplicatePages--;
3422 }
3423 }
3424 else
3425 {
3426                 Log(("gmmR0FreePages: #%#x/%#x: already free!\n", iPage, idPage));
3427 rc = VERR_GMM_PAGE_ALREADY_FREE;
3428 break;
3429 }
3430 }
3431 else
3432 {
3433             Log(("gmmR0FreePages: #%#x/%#x: not found!\n", iPage, idPage));
3434 rc = VERR_GMM_PAGE_NOT_FOUND;
3435 break;
3436 }
3437 paPages[iPage].idPage = NIL_GMM_PAGEID;
3438 }
3439
3440 /*
3441 * Update the account.
3442 */
3443 switch (enmAccount)
3444 {
3445 case GMMACCOUNT_BASE: pGVM->gmm.s.Stats.Allocated.cBasePages -= iPage; break;
3446 case GMMACCOUNT_SHADOW: pGVM->gmm.s.Stats.Allocated.cShadowPages -= iPage; break;
3447 case GMMACCOUNT_FIXED: pGVM->gmm.s.Stats.Allocated.cFixedPages -= iPage; break;
3448 default:
3449 AssertMsgFailedReturn(("enmAccount=%d\n", enmAccount), VERR_IPE_NOT_REACHED_DEFAULT_CASE);
3450 }
3451
3452 /*
3453 * Any threshold stuff to be done here?
3454 */
3455
3456 return rc;
3457}
3458
3459
3460/**
3461 * Free one or more pages.
3462 *
3463 * This is typically used at reset time or power off.
3464 *
3465 * @returns VBox status code:
3466 * @retval xxx
3467 *
3468 * @param pVM Pointer to the VM.
3469 * @param idCpu The VCPU id.
3470  * @param   cPages      The number of pages to free.
3471 * @param paPages Pointer to the page descriptors containing the Page IDs for each page.
3472 * @param enmAccount The account this relates to.
3473 * @thread EMT.
3474 */
3475GMMR0DECL(int) GMMR0FreePages(PVM pVM, VMCPUID idCpu, uint32_t cPages, PGMMFREEPAGEDESC paPages, GMMACCOUNT enmAccount)
3476{
3477 LogFlow(("GMMR0FreePages: pVM=%p cPages=%#x paPages=%p enmAccount=%d\n", pVM, cPages, paPages, enmAccount));
3478
3479 /*
3480 * Validate input and get the basics.
3481 */
3482 PGMM pGMM;
3483 GMM_GET_VALID_INSTANCE(pGMM, VERR_GMM_INSTANCE);
3484 PGVM pGVM;
3485 int rc = GVMMR0ByVMAndEMT(pVM, idCpu, &pGVM);
3486 if (RT_FAILURE(rc))
3487 return rc;
3488
3489 AssertPtrReturn(paPages, VERR_INVALID_PARAMETER);
3490 AssertMsgReturn(enmAccount > GMMACCOUNT_INVALID && enmAccount < GMMACCOUNT_END, ("%d\n", enmAccount), VERR_INVALID_PARAMETER);
3491 AssertMsgReturn(cPages > 0 && cPages < RT_BIT(32 - PAGE_SHIFT), ("%#x\n", cPages), VERR_INVALID_PARAMETER);
3492
3493 for (unsigned iPage = 0; iPage < cPages; iPage++)
3494 AssertMsgReturn( paPages[iPage].idPage <= GMM_PAGEID_LAST
3495 /*|| paPages[iPage].idPage == NIL_GMM_PAGEID*/,
3496 ("#%#x: %#x\n", iPage, paPages[iPage].idPage), VERR_INVALID_PARAMETER);
3497
3498 /*
3499 * Take the semaphore and call the worker function.
3500 */
3501 gmmR0MutexAcquire(pGMM);
3502 if (GMM_CHECK_SANITY_UPON_ENTERING(pGMM))
3503 {
3504 rc = gmmR0FreePages(pGMM, pGVM, cPages, paPages, enmAccount);
3505 GMM_CHECK_SANITY_UPON_LEAVING(pGMM);
3506 }
3507 else
3508 rc = VERR_GMM_IS_NOT_SANE;
3509 gmmR0MutexRelease(pGMM);
3510 LogFlow(("GMMR0FreePages: returns %Rrc\n", rc));
3511 return rc;
3512}
3513
3514
3515/**
3516 * VMMR0 request wrapper for GMMR0FreePages.
3517 *
3518 * @returns see GMMR0FreePages.
3519 * @param pVM Pointer to the VM.
3520 * @param idCpu The VCPU id.
3521 * @param pReq Pointer to the request packet.
3522 */
3523GMMR0DECL(int) GMMR0FreePagesReq(PVM pVM, VMCPUID idCpu, PGMMFREEPAGESREQ pReq)
3524{
3525 /*
3526 * Validate input and pass it on.
3527 */
3528 AssertPtrReturn(pVM, VERR_INVALID_POINTER);
3529 AssertPtrReturn(pReq, VERR_INVALID_POINTER);
3530 AssertMsgReturn(pReq->Hdr.cbReq >= RT_UOFFSETOF(GMMFREEPAGESREQ, aPages[0]),
3531 ("%#x < %#x\n", pReq->Hdr.cbReq, RT_UOFFSETOF(GMMFREEPAGESREQ, aPages[0])),
3532 VERR_INVALID_PARAMETER);
3533 AssertMsgReturn(pReq->Hdr.cbReq == RT_UOFFSETOF(GMMFREEPAGESREQ, aPages[pReq->cPages]),
3534 ("%#x != %#x\n", pReq->Hdr.cbReq, RT_UOFFSETOF(GMMFREEPAGESREQ, aPages[pReq->cPages])),
3535 VERR_INVALID_PARAMETER);
3536
3537 return GMMR0FreePages(pVM, idCpu, pReq->cPages, &pReq->aPages[0], pReq->enmAccount);
3538}
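/* Illustrative ring-3 usage sketch -- not part of the original file. It shows how a
   caller might fill a GMMFREEPAGESREQ before handing it to GMMR0FreePagesReq; the
   variable names (cPagesToFree, paidPages) are hypothetical, only the fields checked
   above (Hdr.cbReq, cPages, enmAccount, aPages[].idPage) are taken from this code.
   @code
        uint32_t cbReq = RT_UOFFSETOF(GMMFREEPAGESREQ, aPages[cPagesToFree]);
        PGMMFREEPAGESREQ pReq = (PGMMFREEPAGESREQ)RTMemAllocZ(cbReq);
        pReq->Hdr.cbReq  = cbReq;                    /* must equal RT_UOFFSETOF(..., aPages[cPages]), see the asserts above */
        pReq->enmAccount = GMMACCOUNT_BASE;
        pReq->cPages     = cPagesToFree;
        for (uint32_t i = 0; i < cPagesToFree; i++)
            pReq->aPages[i].idPage = paidPages[i];   /* page IDs previously handed out by the allocation path */
        /* The request is then dispatched to ring-0 (request header setup omitted)
           and ends up in GMMR0FreePagesReq above. */
   @endcode */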
3539
3540
3541/**
3542 * Report back on a memory ballooning request.
3543 *
3544 * The request may or may not have been initiated by the GMM. If it was initiated
3545 * by the GMM it is important that this function is called even if no pages were
3546 * ballooned.
3547 *
3548 * @returns VBox status code:
3549 * @retval VERR_GMM_ATTEMPT_TO_FREE_TOO_MUCH
3550 * @retval VERR_GMM_ATTEMPT_TO_DEFLATE_TOO_MUCH
3551 * @retval VERR_GMM_OVERCOMMITTED_TRY_AGAIN_IN_A_BIT - reset condition
3552 * indicating that we won't necessarily have sufficient RAM to boot
3553 * the VM again and that it should pause until this changes (we'll try
3554 *          the VM again and that it should pause until this changes (we'll try to
3555 * but to hope the VM won't use the memory that was returned to it.)
3556 *
3557 * @param pVM Pointer to the VM.
3558 * @param idCpu The VCPU id.
3559 * @param enmAction Inflate/deflate/reset.
3560  * @param   cBalloonedPages The number of pages that were ballooned.
3561 *
3562 * @thread EMT.
3563 */
3564GMMR0DECL(int) GMMR0BalloonedPages(PVM pVM, VMCPUID idCpu, GMMBALLOONACTION enmAction, uint32_t cBalloonedPages)
3565{
3566 LogFlow(("GMMR0BalloonedPages: pVM=%p enmAction=%d cBalloonedPages=%#x\n",
3567 pVM, enmAction, cBalloonedPages));
3568
3569 AssertMsgReturn(cBalloonedPages < RT_BIT(32 - PAGE_SHIFT), ("%#x\n", cBalloonedPages), VERR_INVALID_PARAMETER);
3570
3571 /*
3572 * Validate input and get the basics.
3573 */
3574 PGMM pGMM;
3575 GMM_GET_VALID_INSTANCE(pGMM, VERR_GMM_INSTANCE);
3576 PGVM pGVM;
3577 int rc = GVMMR0ByVMAndEMT(pVM, idCpu, &pGVM);
3578 if (RT_FAILURE(rc))
3579 return rc;
3580
3581 /*
3582 * Take the semaphore and do some more validations.
3583 */
3584 gmmR0MutexAcquire(pGMM);
3585 if (GMM_CHECK_SANITY_UPON_ENTERING(pGMM))
3586 {
3587 switch (enmAction)
3588 {
3589 case GMMBALLOONACTION_INFLATE:
3590 {
3591 if (RT_LIKELY(pGVM->gmm.s.Stats.Allocated.cBasePages + pGVM->gmm.s.Stats.cBalloonedPages + cBalloonedPages
3592 <= pGVM->gmm.s.Stats.Reserved.cBasePages))
3593 {
3594 /*
3595 * Record the ballooned memory.
3596 */
3597 pGMM->cBalloonedPages += cBalloonedPages;
3598 if (pGVM->gmm.s.Stats.cReqBalloonedPages)
3599 {
3600                     /* Codepath never taken. Might be interesting in the future to request ballooned memory from guests in low memory conditions. */
3601 AssertFailed();
3602
3603 pGVM->gmm.s.Stats.cBalloonedPages += cBalloonedPages;
3604 pGVM->gmm.s.Stats.cReqActuallyBalloonedPages += cBalloonedPages;
3605 Log(("GMMR0BalloonedPages: +%#x - Global=%#llx / VM: Total=%#llx Req=%#llx Actual=%#llx (pending)\n",
3606 cBalloonedPages, pGMM->cBalloonedPages, pGVM->gmm.s.Stats.cBalloonedPages,
3607 pGVM->gmm.s.Stats.cReqBalloonedPages, pGVM->gmm.s.Stats.cReqActuallyBalloonedPages));
3608 }
3609 else
3610 {
3611 pGVM->gmm.s.Stats.cBalloonedPages += cBalloonedPages;
3612 Log(("GMMR0BalloonedPages: +%#x - Global=%#llx / VM: Total=%#llx (user)\n",
3613 cBalloonedPages, pGMM->cBalloonedPages, pGVM->gmm.s.Stats.cBalloonedPages));
3614 }
3615 }
3616 else
3617 {
3618 Log(("GMMR0BalloonedPages: cBasePages=%#llx Total=%#llx cBalloonedPages=%#llx Reserved=%#llx\n",
3619 pGVM->gmm.s.Stats.Allocated.cBasePages, pGVM->gmm.s.Stats.cBalloonedPages, cBalloonedPages,
3620 pGVM->gmm.s.Stats.Reserved.cBasePages));
3621 rc = VERR_GMM_ATTEMPT_TO_FREE_TOO_MUCH;
3622 }
3623 break;
3624 }
3625
3626 case GMMBALLOONACTION_DEFLATE:
3627 {
3628 /* Deflate. */
3629 if (pGVM->gmm.s.Stats.cBalloonedPages >= cBalloonedPages)
3630 {
3631 /*
3632 * Record the ballooned memory.
3633 */
3634 Assert(pGMM->cBalloonedPages >= cBalloonedPages);
3635 pGMM->cBalloonedPages -= cBalloonedPages;
3636 pGVM->gmm.s.Stats.cBalloonedPages -= cBalloonedPages;
3637 if (pGVM->gmm.s.Stats.cReqDeflatePages)
3638 {
3639                         AssertFailed(); /* This path is for later. */
3640 Log(("GMMR0BalloonedPages: -%#x - Global=%#llx / VM: Total=%#llx Req=%#llx\n",
3641 cBalloonedPages, pGMM->cBalloonedPages, pGVM->gmm.s.Stats.cBalloonedPages, pGVM->gmm.s.Stats.cReqDeflatePages));
3642
3643 /*
3644 * Anything we need to do here now when the request has been completed?
3645 */
3646 pGVM->gmm.s.Stats.cReqDeflatePages = 0;
3647 }
3648 else
3649 Log(("GMMR0BalloonedPages: -%#x - Global=%#llx / VM: Total=%#llx (user)\n",
3650 cBalloonedPages, pGMM->cBalloonedPages, pGVM->gmm.s.Stats.cBalloonedPages));
3651 }
3652 else
3653 {
3654 Log(("GMMR0BalloonedPages: Total=%#llx cBalloonedPages=%#llx\n", pGVM->gmm.s.Stats.cBalloonedPages, cBalloonedPages));
3655 rc = VERR_GMM_ATTEMPT_TO_DEFLATE_TOO_MUCH;
3656 }
3657 break;
3658 }
3659
3660 case GMMBALLOONACTION_RESET:
3661 {
3662 /* Reset to an empty balloon. */
3663 Assert(pGMM->cBalloonedPages >= pGVM->gmm.s.Stats.cBalloonedPages);
3664
3665 pGMM->cBalloonedPages -= pGVM->gmm.s.Stats.cBalloonedPages;
3666 pGVM->gmm.s.Stats.cBalloonedPages = 0;
3667 break;
3668 }
3669
3670 default:
3671 rc = VERR_INVALID_PARAMETER;
3672 break;
3673 }
3674 GMM_CHECK_SANITY_UPON_LEAVING(pGMM);
3675 }
3676 else
3677 rc = VERR_GMM_IS_NOT_SANE;
3678
3679 gmmR0MutexRelease(pGMM);
3680 LogFlow(("GMMR0BalloonedPages: returns %Rrc\n", rc));
3681 return rc;
3682}
3683
3684
3685/**
3686 * VMMR0 request wrapper for GMMR0BalloonedPages.
3687 *
3688 * @returns see GMMR0BalloonedPages.
3689 * @param pVM Pointer to the VM.
3690 * @param idCpu The VCPU id.
3691 * @param pReq Pointer to the request packet.
3692 */
3693GMMR0DECL(int) GMMR0BalloonedPagesReq(PVM pVM, VMCPUID idCpu, PGMMBALLOONEDPAGESREQ pReq)
3694{
3695 /*
3696 * Validate input and pass it on.
3697 */
3698 AssertPtrReturn(pVM, VERR_INVALID_POINTER);
3699 AssertPtrReturn(pReq, VERR_INVALID_POINTER);
3700 AssertMsgReturn(pReq->Hdr.cbReq == sizeof(GMMBALLOONEDPAGESREQ),
3701                     ("%#x != %#x\n", pReq->Hdr.cbReq, sizeof(GMMBALLOONEDPAGESREQ)),
3702 VERR_INVALID_PARAMETER);
3703
3704 return GMMR0BalloonedPages(pVM, idCpu, pReq->enmAction, pReq->cBalloonedPages);
3705}
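/* Illustrative usage sketch -- not part of the original file. A balloon driver
   report for an inflate of cPages pages; cPages is hypothetical, the fields and
   the GMMBALLOONACTION_INFLATE action are the ones handled above.
   @code
        GMMBALLOONEDPAGESREQ Req;
        Req.Hdr.cbReq       = sizeof(Req);              /* must match exactly, see the assertion above */
        Req.enmAction       = GMMBALLOONACTION_INFLATE;
        Req.cBalloonedPages = cPages;                   /* pages the guest balloon just pinned */
        /* Dispatched to ring-0 this becomes GMMR0BalloonedPages(pVM, idCpu, GMMBALLOONACTION_INFLATE, cPages);
           a GMMBALLOONACTION_DEFLATE report works the same way with the count being released. */
   @endcode */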
3706
3707/**
3708 * Return memory statistics for the hypervisor
3709 *
3710 * @returns VBox status code:
3711 * @param pVM Pointer to the VM.
3712 * @param pReq Pointer to the request packet.
3713 */
3714GMMR0DECL(int) GMMR0QueryHypervisorMemoryStatsReq(PVM pVM, PGMMMEMSTATSREQ pReq)
3715{
3716 /*
3717 * Validate input and pass it on.
3718 */
3719 AssertPtrReturn(pVM, VERR_INVALID_POINTER);
3720 AssertPtrReturn(pReq, VERR_INVALID_POINTER);
3721 AssertMsgReturn(pReq->Hdr.cbReq == sizeof(GMMMEMSTATSREQ),
3722                     ("%#x != %#x\n", pReq->Hdr.cbReq, sizeof(GMMMEMSTATSREQ)),
3723 VERR_INVALID_PARAMETER);
3724
3725 /*
3726 * Validate input and get the basics.
3727 */
3728 PGMM pGMM;
3729 GMM_GET_VALID_INSTANCE(pGMM, VERR_GMM_INSTANCE);
3730 pReq->cAllocPages = pGMM->cAllocatedPages;
3731     pReq->cFreePages = (pGMM->cChunks << (GMM_CHUNK_SHIFT - PAGE_SHIFT)) - pGMM->cAllocatedPages;
3732 pReq->cBalloonedPages = pGMM->cBalloonedPages;
3733 pReq->cMaxPages = pGMM->cMaxPages;
3734 pReq->cSharedPages = pGMM->cDuplicatePages;
3735 GMM_CHECK_SANITY_UPON_LEAVING(pGMM);
3736
3737 return VINF_SUCCESS;
3738}
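/* Worked example for the cFreePages calculation above -- not part of the original
   file and assuming the usual 2 MB chunks with 4 KB pages: GMM_CHUNK_SHIFT - PAGE_SHIFT
   would be 21 - 12 = 9, so every chunk accounts for 512 pages and
   cFreePages = (cChunks << 9) - cAllocatedPages, i.e. all chunk-backed pages that are
   not currently handed out to any VM. */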
3739
3740/**
3741 * Return memory statistics for the VM
3742 *
3743 * @returns VBox status code:
3744 * @param pVM Pointer to the VM.
3745  * @param   idCpu       The VCPU id.
3746 * @param pReq Pointer to the request packet.
3747 */
3748GMMR0DECL(int) GMMR0QueryMemoryStatsReq(PVM pVM, VMCPUID idCpu, PGMMMEMSTATSREQ pReq)
3749{
3750 /*
3751 * Validate input and pass it on.
3752 */
3753 AssertPtrReturn(pVM, VERR_INVALID_POINTER);
3754 AssertPtrReturn(pReq, VERR_INVALID_POINTER);
3755 AssertMsgReturn(pReq->Hdr.cbReq == sizeof(GMMMEMSTATSREQ),
3756                     ("%#x != %#x\n", pReq->Hdr.cbReq, sizeof(GMMMEMSTATSREQ)),
3757 VERR_INVALID_PARAMETER);
3758
3759 /*
3760 * Validate input and get the basics.
3761 */
3762 PGMM pGMM;
3763 GMM_GET_VALID_INSTANCE(pGMM, VERR_GMM_INSTANCE);
3764 PGVM pGVM;
3765 int rc = GVMMR0ByVMAndEMT(pVM, idCpu, &pGVM);
3766 if (RT_FAILURE(rc))
3767 return rc;
3768
3769 /*
3770 * Take the semaphore and do some more validations.
3771 */
3772 gmmR0MutexAcquire(pGMM);
3773 if (GMM_CHECK_SANITY_UPON_ENTERING(pGMM))
3774 {
3775 pReq->cAllocPages = pGVM->gmm.s.Stats.Allocated.cBasePages;
3776 pReq->cBalloonedPages = pGVM->gmm.s.Stats.cBalloonedPages;
3777 pReq->cMaxPages = pGVM->gmm.s.Stats.Reserved.cBasePages;
3778 pReq->cFreePages = pReq->cMaxPages - pReq->cAllocPages;
3779 }
3780 else
3781 rc = VERR_GMM_IS_NOT_SANE;
3782
3783 gmmR0MutexRelease(pGMM);
3784     LogFlow(("GMMR0QueryMemoryStatsReq: returns %Rrc\n", rc));
3785 return rc;
3786}
3787
3788
3789/**
3790 * Worker for gmmR0UnmapChunk and gmmR0FreeChunk.
3791 *
3792 * Don't call this in legacy allocation mode!
3793 *
3794 * @returns VBox status code.
3795 * @param pGMM Pointer to the GMM instance data.
3796 * @param pGVM Pointer to the Global VM structure.
3797 * @param pChunk Pointer to the chunk to be unmapped.
3798 */
3799static int gmmR0UnmapChunkLocked(PGMM pGMM, PGVM pGVM, PGMMCHUNK pChunk)
3800{
3801 Assert(!pGMM->fLegacyAllocationMode);
3802
3803 /*
3804 * Find the mapping and try unmapping it.
3805 */
3806 uint32_t cMappings = pChunk->cMappingsX;
3807 for (uint32_t i = 0; i < cMappings; i++)
3808 {
3809 Assert(pChunk->paMappingsX[i].pGVM && pChunk->paMappingsX[i].hMapObj != NIL_RTR0MEMOBJ);
3810 if (pChunk->paMappingsX[i].pGVM == pGVM)
3811 {
3812 /* unmap */
3813 int rc = RTR0MemObjFree(pChunk->paMappingsX[i].hMapObj, false /* fFreeMappings (NA) */);
3814 if (RT_SUCCESS(rc))
3815 {
3816 /* update the record. */
3817 cMappings--;
3818 if (i < cMappings)
3819 pChunk->paMappingsX[i] = pChunk->paMappingsX[cMappings];
3820 pChunk->paMappingsX[cMappings].hMapObj = NIL_RTR0MEMOBJ;
3821 pChunk->paMappingsX[cMappings].pGVM = NULL;
3822 Assert(pChunk->cMappingsX - 1U == cMappings);
3823 pChunk->cMappingsX = cMappings;
3824 }
3825
3826 return rc;
3827 }
3828 }
3829
3830 Log(("gmmR0UnmapChunk: Chunk %#x is not mapped into pGVM=%p/%#x\n", pChunk->Core.Key, pGVM, pGVM->hSelf));
3831 return VERR_GMM_CHUNK_NOT_MAPPED;
3832}
3833
3834
3835/**
3836 * Unmaps a chunk previously mapped into the address space of the current process.
3837 *
3838 * @returns VBox status code.
3839 * @param pGMM Pointer to the GMM instance data.
3840 * @param pGVM Pointer to the Global VM structure.
3841 * @param pChunk Pointer to the chunk to be unmapped.
3842 */
3843static int gmmR0UnmapChunk(PGMM pGMM, PGVM pGVM, PGMMCHUNK pChunk, bool fRelaxedSem)
3844{
3845 if (!pGMM->fLegacyAllocationMode)
3846 {
3847 /*
3848 * Lock the chunk and if possible leave the giant GMM lock.
3849 */
3850 GMMR0CHUNKMTXSTATE MtxState;
3851 int rc = gmmR0ChunkMutexAcquire(&MtxState, pGMM, pChunk,
3852 fRelaxedSem ? GMMR0CHUNK_MTX_RETAKE_GIANT : GMMR0CHUNK_MTX_KEEP_GIANT);
3853 if (RT_SUCCESS(rc))
3854 {
3855 rc = gmmR0UnmapChunkLocked(pGMM, pGVM, pChunk);
3856 gmmR0ChunkMutexRelease(&MtxState, pChunk);
3857 }
3858 return rc;
3859 }
3860
3861 if (pChunk->hGVM == pGVM->hSelf)
3862 return VINF_SUCCESS;
3863
3864 Log(("gmmR0UnmapChunk: Chunk %#x is not mapped into pGVM=%p/%#x (legacy)\n", pChunk->Core.Key, pGVM, pGVM->hSelf));
3865 return VERR_GMM_CHUNK_NOT_MAPPED;
3866}
3867
3868
3869/**
3870 * Worker for gmmR0MapChunk.
3871 *
3872 * @returns VBox status code.
3873 * @param pGMM Pointer to the GMM instance data.
3874 * @param pGVM Pointer to the Global VM structure.
3875 * @param pChunk Pointer to the chunk to be mapped.
3876 * @param ppvR3 Where to store the ring-3 address of the mapping.
3877 *                          In the VERR_GMM_CHUNK_ALREADY_MAPPED case, this will
3878 * contain the address of the existing mapping.
3879 */
3880static int gmmR0MapChunkLocked(PGMM pGMM, PGVM pGVM, PGMMCHUNK pChunk, PRTR3PTR ppvR3)
3881{
3882 /*
3883 * If we're in legacy mode this is simple.
3884 */
3885 if (pGMM->fLegacyAllocationMode)
3886 {
3887 if (pChunk->hGVM != pGVM->hSelf)
3888 {
3889 Log(("gmmR0MapChunk: chunk %#x is already mapped at %p!\n", pChunk->Core.Key, *ppvR3));
3890 return VERR_GMM_CHUNK_NOT_FOUND;
3891 }
3892
3893 *ppvR3 = RTR0MemObjAddressR3(pChunk->hMemObj);
3894 return VINF_SUCCESS;
3895 }
3896
3897 /*
3898 * Check to see if the chunk is already mapped.
3899 */
3900 for (uint32_t i = 0; i < pChunk->cMappingsX; i++)
3901 {
3902 Assert(pChunk->paMappingsX[i].pGVM && pChunk->paMappingsX[i].hMapObj != NIL_RTR0MEMOBJ);
3903 if (pChunk->paMappingsX[i].pGVM == pGVM)
3904 {
3905 *ppvR3 = RTR0MemObjAddressR3(pChunk->paMappingsX[i].hMapObj);
3906 Log(("gmmR0MapChunk: chunk %#x is already mapped at %p!\n", pChunk->Core.Key, *ppvR3));
3907#ifdef VBOX_WITH_PAGE_SHARING
3908 /* The ring-3 chunk cache can be out of sync; don't fail. */
3909 return VINF_SUCCESS;
3910#else
3911 return VERR_GMM_CHUNK_ALREADY_MAPPED;
3912#endif
3913 }
3914 }
3915
3916 /*
3917 * Do the mapping.
3918 */
3919 RTR0MEMOBJ hMapObj;
3920 int rc = RTR0MemObjMapUser(&hMapObj, pChunk->hMemObj, (RTR3PTR)-1, 0, RTMEM_PROT_READ | RTMEM_PROT_WRITE, NIL_RTR0PROCESS);
3921 if (RT_SUCCESS(rc))
3922 {
3923 /* reallocate the array? assumes few users per chunk (usually one). */
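        /* Growth pattern: the array is sized 1, 2, 3, 4 entries for the first few
           mappings and is then grown in steps of four (8, 12, 16, ...) whenever the
           mapping count reaches a multiple of four. */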
3924 unsigned iMapping = pChunk->cMappingsX;
3925 if ( iMapping <= 3
3926 || (iMapping & 3) == 0)
3927 {
3928 unsigned cNewSize = iMapping <= 3
3929 ? iMapping + 1
3930 : iMapping + 4;
3931 Assert(cNewSize < 4 || RT_ALIGN_32(cNewSize, 4) == cNewSize);
3932 if (RT_UNLIKELY(cNewSize > UINT16_MAX))
3933 {
3934 rc = RTR0MemObjFree(hMapObj, false /* fFreeMappings (NA) */); AssertRC(rc);
3935 return VERR_GMM_TOO_MANY_CHUNK_MAPPINGS;
3936 }
3937
3938 void *pvMappings = RTMemRealloc(pChunk->paMappingsX, cNewSize * sizeof(pChunk->paMappingsX[0]));
3939 if (RT_UNLIKELY(!pvMappings))
3940 {
3941 rc = RTR0MemObjFree(hMapObj, false /* fFreeMappings (NA) */); AssertRC(rc);
3942 return VERR_NO_MEMORY;
3943 }
3944 pChunk->paMappingsX = (PGMMCHUNKMAP)pvMappings;
3945 }
3946
3947 /* insert new entry */
3948 pChunk->paMappingsX[iMapping].hMapObj = hMapObj;
3949 pChunk->paMappingsX[iMapping].pGVM = pGVM;
3950 Assert(pChunk->cMappingsX == iMapping);
3951 pChunk->cMappingsX = iMapping + 1;
3952
3953 *ppvR3 = RTR0MemObjAddressR3(hMapObj);
3954 }
3955
3956 return rc;
3957}
3958
3959
3960/**
3961 * Maps a chunk into the user address space of the current process.
3962 *
3963 * @returns VBox status code.
3964 * @param pGMM Pointer to the GMM instance data.
3965 * @param pGVM Pointer to the Global VM structure.
3966 * @param pChunk Pointer to the chunk to be mapped.
3967 * @param fRelaxedSem Whether we can release the semaphore while doing the
3968 * mapping (@c true) or not.
3969 * @param ppvR3 Where to store the ring-3 address of the mapping.
3970 *                          In the VERR_GMM_CHUNK_ALREADY_MAPPED case, this will
3971 * contain the address of the existing mapping.
3972 */
3973static int gmmR0MapChunk(PGMM pGMM, PGVM pGVM, PGMMCHUNK pChunk, bool fRelaxedSem, PRTR3PTR ppvR3)
3974{
3975 /*
3976 * Take the chunk lock and leave the giant GMM lock when possible, then
3977 * call the worker function.
3978 */
3979 GMMR0CHUNKMTXSTATE MtxState;
3980 int rc = gmmR0ChunkMutexAcquire(&MtxState, pGMM, pChunk,
3981 fRelaxedSem ? GMMR0CHUNK_MTX_RETAKE_GIANT : GMMR0CHUNK_MTX_KEEP_GIANT);
3982 if (RT_SUCCESS(rc))
3983 {
3984 rc = gmmR0MapChunkLocked(pGMM, pGVM, pChunk, ppvR3);
3985 gmmR0ChunkMutexRelease(&MtxState, pChunk);
3986 }
3987
3988 return rc;
3989}
3990
3991
3992
3993#if defined(VBOX_WITH_PAGE_SHARING) || (defined(VBOX_STRICT) && HC_ARCH_BITS == 64)
3994/**
3995 * Check if a chunk is mapped into the specified VM
3996 *
3997 * @returns mapped yes/no
3998 * @param pGMM Pointer to the GMM instance.
3999 * @param pGVM Pointer to the Global VM structure.
4000 * @param pChunk Pointer to the chunk to be mapped.
4001 * @param ppvR3 Where to store the ring-3 address of the mapping.
4002 */
4003static bool gmmR0IsChunkMapped(PGMM pGMM, PGVM pGVM, PGMMCHUNK pChunk, PRTR3PTR ppvR3)
4004{
4005 GMMR0CHUNKMTXSTATE MtxState;
4006 gmmR0ChunkMutexAcquire(&MtxState, pGMM, pChunk, GMMR0CHUNK_MTX_KEEP_GIANT);
4007 for (uint32_t i = 0; i < pChunk->cMappingsX; i++)
4008 {
4009 Assert(pChunk->paMappingsX[i].pGVM && pChunk->paMappingsX[i].hMapObj != NIL_RTR0MEMOBJ);
4010 if (pChunk->paMappingsX[i].pGVM == pGVM)
4011 {
4012 *ppvR3 = RTR0MemObjAddressR3(pChunk->paMappingsX[i].hMapObj);
4013 gmmR0ChunkMutexRelease(&MtxState, pChunk);
4014 return true;
4015 }
4016 }
4017 *ppvR3 = NULL;
4018 gmmR0ChunkMutexRelease(&MtxState, pChunk);
4019 return false;
4020}
4021#endif /* VBOX_WITH_PAGE_SHARING || (VBOX_STRICT && 64-BIT) */
4022
4023
4024/**
4025 * Map a chunk and/or unmap another chunk.
4026 *
4027  * The mapping and unmapping apply to the current process.
4028 *
4029  * This API does two things because it saves a kernel call per mapping
4030 * when the ring-3 mapping cache is full.
4031 *
4032 * @returns VBox status code.
4033 * @param pVM The VM.
4034 * @param idChunkMap The chunk to map. NIL_GMM_CHUNKID if nothing to map.
4035 * @param idChunkUnmap The chunk to unmap. NIL_GMM_CHUNKID if nothing to unmap.
4036 * @param ppvR3 Where to store the address of the mapped chunk. NULL is ok if nothing to map.
4037 * @thread EMT
4038 */
4039GMMR0DECL(int) GMMR0MapUnmapChunk(PVM pVM, uint32_t idChunkMap, uint32_t idChunkUnmap, PRTR3PTR ppvR3)
4040{
4041 LogFlow(("GMMR0MapUnmapChunk: pVM=%p idChunkMap=%#x idChunkUnmap=%#x ppvR3=%p\n",
4042 pVM, idChunkMap, idChunkUnmap, ppvR3));
4043
4044 /*
4045 * Validate input and get the basics.
4046 */
4047 PGMM pGMM;
4048 GMM_GET_VALID_INSTANCE(pGMM, VERR_GMM_INSTANCE);
4049 PGVM pGVM;
4050 int rc = GVMMR0ByVM(pVM, &pGVM);
4051 if (RT_FAILURE(rc))
4052 return rc;
4053
4054 AssertCompile(NIL_GMM_CHUNKID == 0);
4055 AssertMsgReturn(idChunkMap <= GMM_CHUNKID_LAST, ("%#x\n", idChunkMap), VERR_INVALID_PARAMETER);
4056 AssertMsgReturn(idChunkUnmap <= GMM_CHUNKID_LAST, ("%#x\n", idChunkUnmap), VERR_INVALID_PARAMETER);
4057
4058 if ( idChunkMap == NIL_GMM_CHUNKID
4059 && idChunkUnmap == NIL_GMM_CHUNKID)
4060 return VERR_INVALID_PARAMETER;
4061
4062 if (idChunkMap != NIL_GMM_CHUNKID)
4063 {
4064 AssertPtrReturn(ppvR3, VERR_INVALID_POINTER);
4065 *ppvR3 = NIL_RTR3PTR;
4066 }
4067
4068 /*
4069 * Take the semaphore and do the work.
4070 *
4071 * The unmapping is done last since it's easier to undo a mapping than
4072      * to undo an unmapping. The ring-3 mapping cache cannot be so big
4073      * that it pushes the user virtual address space to within a chunk of
4074      * its limits, so no problem here.
4075 */
4076 gmmR0MutexAcquire(pGMM);
4077 if (GMM_CHECK_SANITY_UPON_ENTERING(pGMM))
4078 {
4079 PGMMCHUNK pMap = NULL;
4080         if (idChunkMap != NIL_GMM_CHUNKID)
4081 {
4082 pMap = gmmR0GetChunk(pGMM, idChunkMap);
4083 if (RT_LIKELY(pMap))
4084 rc = gmmR0MapChunk(pGMM, pGVM, pMap, true /*fRelaxedSem*/, ppvR3);
4085 else
4086 {
4087 Log(("GMMR0MapUnmapChunk: idChunkMap=%#x\n", idChunkMap));
4088 rc = VERR_GMM_CHUNK_NOT_FOUND;
4089 }
4090 }
4091/** @todo split this operation, the bail out might (theoretically) not be
4092 * entirely safe. */
4093
4094 if ( idChunkUnmap != NIL_GMM_CHUNKID
4095 && RT_SUCCESS(rc))
4096 {
4097 PGMMCHUNK pUnmap = gmmR0GetChunk(pGMM, idChunkUnmap);
4098 if (RT_LIKELY(pUnmap))
4099 rc = gmmR0UnmapChunk(pGMM, pGVM, pUnmap, true /*fRelaxedSem*/);
4100 else
4101 {
4102 Log(("GMMR0MapUnmapChunk: idChunkUnmap=%#x\n", idChunkUnmap));
4103 rc = VERR_GMM_CHUNK_NOT_FOUND;
4104 }
4105
4106 if (RT_FAILURE(rc) && pMap)
4107 gmmR0UnmapChunk(pGMM, pGVM, pMap, false /*fRelaxedSem*/);
4108 }
4109
4110 GMM_CHECK_SANITY_UPON_LEAVING(pGMM);
4111 }
4112 else
4113 rc = VERR_GMM_IS_NOT_SANE;
4114 gmmR0MutexRelease(pGMM);
4115
4116 LogFlow(("GMMR0MapUnmapChunk: returns %Rrc\n", rc));
4117 return rc;
4118}
4119
4120
4121/**
4122 * VMMR0 request wrapper for GMMR0MapUnmapChunk.
4123 *
4124 * @returns see GMMR0MapUnmapChunk.
4125 * @param pVM Pointer to the VM.
4126 * @param pReq Pointer to the request packet.
4127 */
4128GMMR0DECL(int) GMMR0MapUnmapChunkReq(PVM pVM, PGMMMAPUNMAPCHUNKREQ pReq)
4129{
4130 /*
4131 * Validate input and pass it on.
4132 */
4133 AssertPtrReturn(pVM, VERR_INVALID_POINTER);
4134 AssertPtrReturn(pReq, VERR_INVALID_POINTER);
4135 AssertMsgReturn(pReq->Hdr.cbReq == sizeof(*pReq), ("%#x != %#x\n", pReq->Hdr.cbReq, sizeof(*pReq)), VERR_INVALID_PARAMETER);
4136
4137 return GMMR0MapUnmapChunk(pVM, pReq->idChunkMap, pReq->idChunkUnmap, &pReq->pvR3);
4138}
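/* Illustrative usage sketch -- not part of the original file. A combined map+unmap
   request of the kind issued when the ring-3 mapping cache evicts one chunk to make
   room for another; idChunkNeeded and idChunkEvicted are hypothetical names, the
   fields are the ones validated and forwarded above.
   @code
        GMMMAPUNMAPCHUNKREQ Req;
        Req.Hdr.cbReq    = sizeof(Req);
        Req.idChunkMap   = idChunkNeeded;   /* chunk to map, or NIL_GMM_CHUNKID for unmap-only */
        Req.idChunkUnmap = idChunkEvicted;  /* chunk to drop, or NIL_GMM_CHUNKID for map-only */
        Req.pvR3         = NIL_RTR3PTR;     /* receives the ring-3 address of the newly mapped chunk */
        /* On success Req.pvR3 points at the GMM_CHUNK_SIZE sized mapping of idChunkNeeded. */
   @endcode */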
4139
4140
4141/**
4142 * Legacy mode API for supplying pages.
4143 *
4144  * The specified user address points to an allocation chunk sized block that
4145  * will be locked down and used by the GMM when the VM asks for pages.
4146 *
4147 * @returns VBox status code.
4148 * @param pVM Pointer to the VM.
4149 * @param idCpu The VCPU id.
4150 * @param pvR3 Pointer to the chunk size memory block to lock down.
4151 */
4152GMMR0DECL(int) GMMR0SeedChunk(PVM pVM, VMCPUID idCpu, RTR3PTR pvR3)
4153{
4154 /*
4155 * Validate input and get the basics.
4156 */
4157 PGMM pGMM;
4158 GMM_GET_VALID_INSTANCE(pGMM, VERR_GMM_INSTANCE);
4159 PGVM pGVM;
4160 int rc = GVMMR0ByVMAndEMT(pVM, idCpu, &pGVM);
4161 if (RT_FAILURE(rc))
4162 return rc;
4163
4164 AssertPtrReturn(pvR3, VERR_INVALID_POINTER);
4165 AssertReturn(!(PAGE_OFFSET_MASK & pvR3), VERR_INVALID_POINTER);
4166
4167 if (!pGMM->fLegacyAllocationMode)
4168 {
4169 Log(("GMMR0SeedChunk: not in legacy allocation mode!\n"));
4170 return VERR_NOT_SUPPORTED;
4171 }
4172
4173 /*
4174 * Lock the memory and add it as new chunk with our hGVM.
4175 * (The GMM locking is done inside gmmR0RegisterChunk.)
4176 */
4177 RTR0MEMOBJ MemObj;
4178 rc = RTR0MemObjLockUser(&MemObj, pvR3, GMM_CHUNK_SIZE, RTMEM_PROT_READ | RTMEM_PROT_WRITE, NIL_RTR0PROCESS);
4179 if (RT_SUCCESS(rc))
4180 {
4181 rc = gmmR0RegisterChunk(pGMM, &pGVM->gmm.s.Private, MemObj, pGVM->hSelf, 0 /*fChunkFlags*/, NULL);
4182 if (RT_SUCCESS(rc))
4183 gmmR0MutexRelease(pGMM);
4184 else
4185 RTR0MemObjFree(MemObj, false /* fFreeMappings */);
4186 }
4187
4188 LogFlow(("GMMR0SeedChunk: rc=%d (pvR3=%p)\n", rc, pvR3));
4189 return rc;
4190}
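/* Conceptual usage sketch -- not part of the original file. In legacy allocation
   mode ring-3 pre-allocates a page-aligned, chunk-sized block and seeds it before
   requesting pages; pvSeed is hypothetical and the real call travels through the
   VMMR0 request interface rather than being a direct function call.
   @code
        void *pvSeed = RTMemPageAllocZ(GMM_CHUNK_SIZE);   /* page aligned, exactly one chunk */
        if (pvSeed)
            rc = GMMR0SeedChunk(pVM, idCpu, (RTR3PTR)(uintptr_t)pvSeed);
        /* Ring-0 locks the block down and registers it as a chunk bound to this VM. */
   @endcode */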
4191
4192#ifdef VBOX_WITH_PAGE_SHARING
4193
4194# ifdef VBOX_STRICT
4195/**
4196 * For checksumming shared pages in strict builds.
4197 *
4198 * The purpose is to make sure that a page doesn't change.
4199 *
4200 * @returns Checksum, 0 on failure.
4201 * @param   pGMM        The GMM instance data.
4202 * @param idPage The page ID.
4203 */
4204static uint32_t gmmR0StrictPageChecksum(PGMM pGMM, PGVM pGVM, uint32_t idPage)
4205{
4206 PGMMCHUNK pChunk = gmmR0GetChunk(pGMM, idPage >> GMM_CHUNKID_SHIFT);
4207 AssertMsgReturn(pChunk, ("idPage=%#x\n", idPage), 0);
4208
4209 uint8_t *pbChunk;
4210 if (!gmmR0IsChunkMapped(pGMM, pGVM, pChunk, (PRTR3PTR)&pbChunk))
4211 return 0;
4212 uint8_t const *pbPage = pbChunk + ((idPage & GMM_PAGEID_IDX_MASK) << PAGE_SHIFT);
4213
4214 return RTCrc32(pbPage, PAGE_SIZE);
4215}
4216# endif /* VBOX_STRICT */
4217
4218
4219/**
4220 * Calculates the module hash value.
4221 *
4222 * @returns Hash value.
4223 * @param pszModuleName The module name.
4224 * @param pszVersion The module version string.
4225 */
4226static uint32_t gmmR0ShModCalcHash(const char *pszModuleName, const char *pszVersion)
4227{
4228 return RTStrHash1ExN(3, pszModuleName, RTSTR_MAX, "::", (size_t)2, pszVersion, RTSTR_MAX);
4229}
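/* Example -- not part of the original file: for a module "kernel32.dll" with version
   "5.1.2600.5512" the three parts above are hashed as the single string
   "kernel32.dll::5.1.2600.5512", so name and version together select the hash bucket. */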
4230
4231
4232/**
4233 * Finds a global module.
4234 *
4235 * @returns Pointer to the global module on success, NULL if not found.
4236 * @param pGMM The GMM instance data.
4237 * @param uHash The hash as calculated by gmmR0ShModCalcHash.
4238 * @param cbModule The module size.
4239 * @param enmGuestOS The guest OS type.
4240 * @param pszModuleName The module name.
4241 * @param pszVersion The module version.
4242 */
4243static PGMMSHAREDMODULE gmmR0ShModFindGlobal(PGMM pGMM, uint32_t uHash, uint32_t cbModule, VBOXOSFAMILY enmGuestOS,
4244 uint32_t cRegions, const char *pszModuleName, const char *pszVersion,
4245 struct VMMDEVSHAREDREGIONDESC const *paRegions)
4246{
4247 for (PGMMSHAREDMODULE pGblMod = (PGMMSHAREDMODULE)RTAvllU32Get(&pGMM->pGlobalSharedModuleTree, uHash);
4248 pGblMod;
4249 pGblMod = (PGMMSHAREDMODULE)pGblMod->Core.pList)
4250 {
4251 if (pGblMod->cbModule != cbModule)
4252 continue;
4253 if (pGblMod->enmGuestOS != enmGuestOS)
4254 continue;
4255 if (pGblMod->cRegions != cRegions)
4256 continue;
4257 if (strcmp(pGblMod->szName, pszModuleName))
4258 continue;
4259 if (strcmp(pGblMod->szVersion, pszVersion))
4260 continue;
4261
4262 uint32_t i;
4263 for (i = 0; i < cRegions; i++)
4264 {
4265 uint32_t off = paRegions[i].GCRegionAddr & PAGE_OFFSET_MASK;
4266 if (pGblMod->aRegions[i].off != off)
4267 break;
4268
4269 uint32_t cb = RT_ALIGN_32(paRegions[i].cbRegion + off, PAGE_SIZE);
4270 if (pGblMod->aRegions[i].cb != cb)
4271 break;
4272 }
4273
4274 if (i == cRegions)
4275 return pGblMod;
4276 }
4277
4278 return NULL;
4279}
4280
4281
4282/**
4283 * Creates a new global module.
4284 *
4285 * @returns VBox status code.
4286 * @param pGMM The GMM instance data.
4287 * @param uHash The hash as calculated by gmmR0ShModCalcHash.
4288 * @param cbModule The module size.
4289 * @param enmGuestOS The guest OS type.
4290 * @param cRegions The number of regions.
4291 * @param pszModuleName The module name.
4292 * @param pszVersion The module version.
4293 * @param paRegions The region descriptions.
4294 * @param ppGblMod Where to return the new module on success.
4295 */
4296static int gmmR0ShModNewGlobal(PGMM pGMM, uint32_t uHash, uint32_t cbModule, VBOXOSFAMILY enmGuestOS,
4297 uint32_t cRegions, const char *pszModuleName, const char *pszVersion,
4298 struct VMMDEVSHAREDREGIONDESC const *paRegions, PGMMSHAREDMODULE *ppGblMod)
4299{
4300 Log(("gmmR0ShModNewGlobal: %s %s size %#x os %u rgn %u\n", pszModuleName, pszVersion, cbModule, cRegions));
4301 if (pGMM->cShareableModules >= GMM_MAX_SHARED_GLOBAL_MODULES)
4302 {
4303 Log(("gmmR0ShModNewGlobal: Too many modules\n"));
4304 return VERR_GMM_TOO_MANY_GLOBAL_MODULES;
4305 }
4306
4307 PGMMSHAREDMODULE pGblMod = (PGMMSHAREDMODULE)RTMemAllocZ(RT_OFFSETOF(GMMSHAREDMODULE, aRegions[cRegions]));
4308 if (!pGblMod)
4309 {
4310 Log(("gmmR0ShModNewGlobal: No memory\n"));
4311 return VERR_NO_MEMORY;
4312 }
4313
4314 pGblMod->Core.Key = uHash;
4315 pGblMod->cbModule = cbModule;
4316 pGblMod->cRegions = cRegions;
4317 pGblMod->cUsers = 1;
4318 pGblMod->enmGuestOS = enmGuestOS;
4319 strcpy(pGblMod->szName, pszModuleName);
4320 strcpy(pGblMod->szVersion, pszVersion);
4321
4322 for (uint32_t i = 0; i < cRegions; i++)
4323 {
4324 Log(("gmmR0ShModNewGlobal: rgn[%u]=%RGvLB%#x\n", i, paRegions[i].GCRegionAddr, paRegions[i].cbRegion));
4325 pGblMod->aRegions[i].off = paRegions[i].GCRegionAddr & PAGE_OFFSET_MASK;
4326 pGblMod->aRegions[i].cb = paRegions[i].cbRegion + pGblMod->aRegions[i].off;
4327 pGblMod->aRegions[i].cb = RT_ALIGN_32(pGblMod->aRegions[i].cb, PAGE_SIZE);
4328 pGblMod->aRegions[i].paidPages = NULL; /* allocated when needed. */
4329 }
4330
4331 bool fInsert = RTAvllU32Insert(&pGMM->pGlobalSharedModuleTree, &pGblMod->Core);
4332 Assert(fInsert); NOREF(fInsert);
4333 pGMM->cShareableModules++;
4334
4335 *ppGblMod = pGblMod;
4336 return VINF_SUCCESS;
4337}
4338
4339
4340/**
4341 * Deletes a global module which is no longer referenced by anyone.
4342 *
4343 * @param pGMM The GMM instance data.
4344 * @param pGblMod The module to delete.
4345 */
4346static void gmmR0ShModDeleteGlobal(PGMM pGMM, PGMMSHAREDMODULE pGblMod)
4347{
4348 Assert(pGblMod->cUsers == 0);
4349 Assert(pGMM->cShareableModules > 0 && pGMM->cShareableModules <= GMM_MAX_SHARED_GLOBAL_MODULES);
4350
4351 void *pvTest = RTAvllU32RemoveNode(&pGMM->pGlobalSharedModuleTree, &pGblMod->Core);
4352 Assert(pvTest == pGblMod); NOREF(pvTest);
4353 pGMM->cShareableModules--;
4354
4355 uint32_t i = pGblMod->cRegions;
4356 while (i-- > 0)
4357 {
4358 if (pGblMod->aRegions[i].paidPages)
4359 {
4360             /* We don't do anything to the pages as they are handled by the
4361 copy-on-write mechanism in PGM. */
4362 RTMemFree(pGblMod->aRegions[i].paidPages);
4363 pGblMod->aRegions[i].paidPages = NULL;
4364 }
4365 }
4366 RTMemFree(pGblMod);
4367}
4368
4369
4370static int gmmR0ShModNewPerVM(PGVM pGVM, RTGCPTR GCBaseAddr, uint32_t cRegions, const VMMDEVSHAREDREGIONDESC *paRegions,
4371 PGMMSHAREDMODULEPERVM *ppRecVM)
4372{
4373 if (pGVM->gmm.s.Stats.cShareableModules >= GMM_MAX_SHARED_PER_VM_MODULES)
4374 return VERR_GMM_TOO_MANY_PER_VM_MODULES;
4375
4376 PGMMSHAREDMODULEPERVM pRecVM;
4377 pRecVM = (PGMMSHAREDMODULEPERVM)RTMemAllocZ(RT_OFFSETOF(GMMSHAREDMODULEPERVM, aRegionsGCPtrs[cRegions]));
4378 if (!pRecVM)
4379 return VERR_NO_MEMORY;
4380
4381 pRecVM->Core.Key = GCBaseAddr;
4382 for (uint32_t i = 0; i < cRegions; i++)
4383 pRecVM->aRegionsGCPtrs[i] = paRegions[i].GCRegionAddr;
4384
4385 bool fInsert = RTAvlGCPtrInsert(&pGVM->gmm.s.pSharedModuleTree, &pRecVM->Core);
4386 Assert(fInsert); NOREF(fInsert);
4387 pGVM->gmm.s.Stats.cShareableModules++;
4388
4389 *ppRecVM = pRecVM;
4390 return VINF_SUCCESS;
4391}
4392
4393
4394static void gmmR0ShModDeletePerVM(PGMM pGMM, PGVM pGVM, PGMMSHAREDMODULEPERVM pRecVM, bool fRemove)
4395{
4396 /*
4397 * Free the per-VM module.
4398 */
4399 PGMMSHAREDMODULE pGblMod = pRecVM->pGlobalModule;
4400 pRecVM->pGlobalModule = NULL;
4401
4402 if (fRemove)
4403 {
4404 void *pvTest = RTAvlGCPtrRemove(&pGVM->gmm.s.pSharedModuleTree, pRecVM->Core.Key);
4405 Assert(pvTest == &pRecVM->Core);
4406 }
4407
4408 RTMemFree(pRecVM);
4409
4410 /*
4411 * Release the global module.
4412 * (In the registration bailout case, it might not be.)
4413 */
4414 if (pGblMod)
4415 {
4416 Assert(pGblMod->cUsers > 0);
4417 pGblMod->cUsers--;
4418 if (pGblMod->cUsers == 0)
4419 gmmR0ShModDeleteGlobal(pGMM, pGblMod);
4420 }
4421}
4422
4423#endif /* VBOX_WITH_PAGE_SHARING */
4424
4425/**
4426 * Registers a new shared module for the VM.
4427 *
4428 * @returns VBox status code.
4429 * @param pVM Pointer to the VM.
4430 * @param idCpu The VCPU id.
4431 * @param enmGuestOS The guest OS type.
4432 * @param pszModuleName The module name.
4433 * @param pszVersion The module version.
4434 * @param GCPtrModBase The module base address.
4435 * @param cbModule The module size.
4436  * @param   cRegions        The number of shared region descriptors.
4437 * @param paRegions Pointer to an array of shared region(s).
4438 */
4439GMMR0DECL(int) GMMR0RegisterSharedModule(PVM pVM, VMCPUID idCpu, VBOXOSFAMILY enmGuestOS, char *pszModuleName,
4440 char *pszVersion, RTGCPTR GCPtrModBase, uint32_t cbModule,
4441 uint32_t cRegions, struct VMMDEVSHAREDREGIONDESC const *paRegions)
4442{
4443#ifdef VBOX_WITH_PAGE_SHARING
4444 /*
4445 * Validate input and get the basics.
4446 *
4447      * Note! Turns out the module size does not necessarily match the size of the
4448 * regions. (iTunes on XP)
4449 */
4450 PGMM pGMM;
4451 GMM_GET_VALID_INSTANCE(pGMM, VERR_GMM_INSTANCE);
4452 PGVM pGVM;
4453 int rc = GVMMR0ByVMAndEMT(pVM, idCpu, &pGVM);
4454 if (RT_FAILURE(rc))
4455 return rc;
4456
4457 if (RT_UNLIKELY(cRegions > VMMDEVSHAREDREGIONDESC_MAX))
4458 return VERR_GMM_TOO_MANY_REGIONS;
4459
4460 if (RT_UNLIKELY(cbModule == 0 || cbModule > _1G))
4461 return VERR_GMM_BAD_SHARED_MODULE_SIZE;
4462
4463 uint32_t cbTotal = 0;
4464 for (uint32_t i = 0; i < cRegions; i++)
4465 {
4466 if (RT_UNLIKELY(paRegions[i].cbRegion == 0 || paRegions[i].cbRegion > _1G))
4467 return VERR_GMM_SHARED_MODULE_BAD_REGIONS_SIZE;
4468
4469 cbTotal += paRegions[i].cbRegion;
4470 if (RT_UNLIKELY(cbTotal > _1G))
4471 return VERR_GMM_SHARED_MODULE_BAD_REGIONS_SIZE;
4472 }
4473
4474 AssertPtrReturn(pszModuleName, VERR_INVALID_POINTER);
4475 if (RT_UNLIKELY(!memchr(pszModuleName, '\0', GMM_SHARED_MODULE_MAX_NAME_STRING)))
4476 return VERR_GMM_MODULE_NAME_TOO_LONG;
4477
4478 AssertPtrReturn(pszVersion, VERR_INVALID_POINTER);
4479 if (RT_UNLIKELY(!memchr(pszVersion, '\0', GMM_SHARED_MODULE_MAX_VERSION_STRING)))
4480 return VERR_GMM_MODULE_NAME_TOO_LONG;
4481
4482 uint32_t const uHash = gmmR0ShModCalcHash(pszModuleName, pszVersion);
4483 Log(("GMMR0RegisterSharedModule %s %s base %RGv size %x hash %x\n", pszModuleName, pszVersion, GCPtrModBase, cbModule, uHash));
4484
4485 /*
4486 * Take the semaphore and do some more validations.
4487 */
4488 gmmR0MutexAcquire(pGMM);
4489 if (GMM_CHECK_SANITY_UPON_ENTERING(pGMM))
4490 {
4491 /*
4492 * Check if this module is already locally registered and register
4493 * it if it isn't. The base address is a unique module identifier
4494 * locally.
4495 */
4496 PGMMSHAREDMODULEPERVM pRecVM = (PGMMSHAREDMODULEPERVM)RTAvlGCPtrGet(&pGVM->gmm.s.pSharedModuleTree, GCPtrModBase);
4497 bool fNewModule = pRecVM == NULL;
4498 if (fNewModule)
4499 {
4500 rc = gmmR0ShModNewPerVM(pGVM, GCPtrModBase, cRegions, paRegions, &pRecVM);
4501 if (RT_SUCCESS(rc))
4502 {
4503 /*
4504 * Find a matching global module, register a new one if needed.
4505 */
4506 PGMMSHAREDMODULE pGblMod = gmmR0ShModFindGlobal(pGMM, uHash, cbModule, enmGuestOS, cRegions,
4507 pszModuleName, pszVersion, paRegions);
4508 if (!pGblMod)
4509 {
4510 Assert(fNewModule);
4511 rc = gmmR0ShModNewGlobal(pGMM, uHash, cbModule, enmGuestOS, cRegions,
4512 pszModuleName, pszVersion, paRegions, &pGblMod);
4513 if (RT_SUCCESS(rc))
4514 {
4515                             pRecVM->pGlobalModule = pGblMod; /* (One reference returned by gmmR0ShModNewGlobal.) */
4516 Log(("GMMR0RegisterSharedModule: new module %s %s\n", pszModuleName, pszVersion));
4517 }
4518 else
4519 gmmR0ShModDeletePerVM(pGMM, pGVM, pRecVM, true /*fRemove*/);
4520 }
4521 else
4522 {
4523 Assert(pGblMod->cUsers > 0 && pGblMod->cUsers < UINT32_MAX / 2);
4524 pGblMod->cUsers++;
4525 pRecVM->pGlobalModule = pGblMod;
4526
4527 Log(("GMMR0RegisterSharedModule: new per vm module %s %s, gbl users %d\n", pszModuleName, pszVersion, pGblMod->cUsers));
4528 }
4529 }
4530 }
4531 else
4532 {
4533 /*
4534 * Attempt to re-register an existing module.
4535 */
4536 PGMMSHAREDMODULE pGblMod = gmmR0ShModFindGlobal(pGMM, uHash, cbModule, enmGuestOS, cRegions,
4537 pszModuleName, pszVersion, paRegions);
4538 if (pRecVM->pGlobalModule == pGblMod)
4539 {
4540 Log(("GMMR0RegisterSharedModule: already registered %s %s, gbl users %d\n", pszModuleName, pszVersion, pGblMod->cUsers));
4541 rc = VINF_GMM_SHARED_MODULE_ALREADY_REGISTERED;
4542 }
4543 else
4544 {
4545 /** @todo may have to unregister+register when this happens in case it's caused
4546 * by VBoxService crashing and being restarted... */
4547 Log(("GMMR0RegisterSharedModule: Address clash!\n"
4548 " incoming at %RGvLB%#x %s %s rgns %u\n"
4549 " existing at %RGvLB%#x %s %s rgns %u\n",
4550 GCPtrModBase, cbModule, pszModuleName, pszVersion, cRegions,
4551 pRecVM->Core.Key, pRecVM->pGlobalModule->cbModule, pRecVM->pGlobalModule->szName,
4552 pRecVM->pGlobalModule->szVersion, pRecVM->pGlobalModule->cRegions));
4553 rc = VERR_GMM_SHARED_MODULE_ADDRESS_CLASH;
4554 }
4555 }
4556 GMM_CHECK_SANITY_UPON_LEAVING(pGMM);
4557 }
4558 else
4559 rc = VERR_GMM_IS_NOT_SANE;
4560
4561 gmmR0MutexRelease(pGMM);
4562 return rc;
4563#else
4564
4565 NOREF(pVM); NOREF(idCpu); NOREF(enmGuestOS); NOREF(pszModuleName); NOREF(pszVersion);
4566 NOREF(GCPtrModBase); NOREF(cbModule); NOREF(cRegions); NOREF(paRegions);
4567 return VERR_NOT_IMPLEMENTED;
4568#endif
4569}
4570
4571
4572/**
4573 * VMMR0 request wrapper for GMMR0RegisterSharedModule.
4574 *
4575 * @returns see GMMR0RegisterSharedModule.
4576 * @param pVM Pointer to the VM.
4577 * @param idCpu The VCPU id.
4578 * @param pReq Pointer to the request packet.
4579 */
4580GMMR0DECL(int) GMMR0RegisterSharedModuleReq(PVM pVM, VMCPUID idCpu, PGMMREGISTERSHAREDMODULEREQ pReq)
4581{
4582 /*
4583 * Validate input and pass it on.
4584 */
4585 AssertPtrReturn(pVM, VERR_INVALID_POINTER);
4586 AssertPtrReturn(pReq, VERR_INVALID_POINTER);
4587 AssertMsgReturn(pReq->Hdr.cbReq >= sizeof(*pReq) && pReq->Hdr.cbReq == RT_UOFFSETOF(GMMREGISTERSHAREDMODULEREQ, aRegions[pReq->cRegions]), ("%#x != %#x\n", pReq->Hdr.cbReq, sizeof(*pReq)), VERR_INVALID_PARAMETER);
4588
4589 /* Pass back return code in the request packet to preserve informational codes. (VMMR3CallR0 chokes on them) */
4590 pReq->rc = GMMR0RegisterSharedModule(pVM, idCpu, pReq->enmGuestOS, pReq->szName, pReq->szVersion,
4591 pReq->GCBaseAddr, pReq->cbModule, pReq->cRegions, pReq->aRegions);
4592 return VINF_SUCCESS;
4593}
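/* Illustrative usage sketch -- not part of the original file. Minimal setup of a
   GMMREGISTERSHAREDMODULEREQ for a module with a single shareable region; the names,
   addresses and the guest OS family value are hypothetical, the fields are the ones
   validated and forwarded above (including the rc member used for the result).
   @code
        uint32_t cbReq = RT_UOFFSETOF(GMMREGISTERSHAREDMODULEREQ, aRegions[1]);
        PGMMREGISTERSHAREDMODULEREQ pReq = (PGMMREGISTERSHAREDMODULEREQ)RTMemAllocZ(cbReq);
        pReq->Hdr.cbReq                = cbReq;
        pReq->enmGuestOS               = enmGuestOSFamily;  /* some VBOXOSFAMILY value reported by the guest */
        pReq->GCBaseAddr               = GCPtrModBase;      /* module base address in the guest */
        pReq->cbModule                 = cbModule;
        pReq->cRegions                 = 1;
        pReq->aRegions[0].GCRegionAddr = GCPtrText;         /* e.g. the read-only text section */
        pReq->aRegions[0].cbRegion     = cbText;
        strcpy(pReq->szName,    "ntdll.dll");               /* NUL terminated, length-checked above */
        strcpy(pReq->szVersion, "6.1.7600.16385");
        /* Dispatched to ring-0 this lands in GMMR0RegisterSharedModuleReq; the status,
           including informational ones, comes back in pReq->rc. */
   @endcode */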
4594
4595
4596/**
4597 * Unregisters a shared module for the VM
4598 *
4599 * @returns VBox status code.
4600 * @param pVM Pointer to the VM.
4601 * @param idCpu The VCPU id.
4602 * @param pszModuleName The module name.
4603 * @param pszVersion The module version.
4604 * @param GCPtrModBase The module base address.
4605 * @param cbModule The module size.
4606 */
4607GMMR0DECL(int) GMMR0UnregisterSharedModule(PVM pVM, VMCPUID idCpu, char *pszModuleName, char *pszVersion,
4608 RTGCPTR GCPtrModBase, uint32_t cbModule)
4609{
4610#ifdef VBOX_WITH_PAGE_SHARING
4611 /*
4612 * Validate input and get the basics.
4613 */
4614 PGMM pGMM;
4615 GMM_GET_VALID_INSTANCE(pGMM, VERR_GMM_INSTANCE);
4616 PGVM pGVM;
4617 int rc = GVMMR0ByVMAndEMT(pVM, idCpu, &pGVM);
4618 if (RT_FAILURE(rc))
4619 return rc;
4620
4621 AssertPtrReturn(pszModuleName, VERR_INVALID_POINTER);
4622 AssertPtrReturn(pszVersion, VERR_INVALID_POINTER);
4623 if (RT_UNLIKELY(!memchr(pszModuleName, '\0', GMM_SHARED_MODULE_MAX_NAME_STRING)))
4624 return VERR_GMM_MODULE_NAME_TOO_LONG;
4625 if (RT_UNLIKELY(!memchr(pszVersion, '\0', GMM_SHARED_MODULE_MAX_VERSION_STRING)))
4626 return VERR_GMM_MODULE_NAME_TOO_LONG;
4627
4628 Log(("GMMR0UnregisterSharedModule %s %s base=%RGv size %x\n", pszModuleName, pszVersion, GCPtrModBase, cbModule));
4629
4630 /*
4631 * Take the semaphore and do some more validations.
4632 */
4633 gmmR0MutexAcquire(pGMM);
4634 if (GMM_CHECK_SANITY_UPON_ENTERING(pGMM))
4635 {
4636 /*
4637 * Locate and remove the specified module.
4638 */
4639 PGMMSHAREDMODULEPERVM pRecVM = (PGMMSHAREDMODULEPERVM)RTAvlGCPtrGet(&pGVM->gmm.s.pSharedModuleTree, GCPtrModBase);
4640 if (pRecVM)
4641 {
4642 /** @todo Do we need to do more validations here, like that the
4643 * name + version + cbModule matches? */
4644 Assert(pRecVM->pGlobalModule);
4645 gmmR0ShModDeletePerVM(pGMM, pGVM, pRecVM, true /*fRemove*/);
4646 }
4647 else
4648 rc = VERR_GMM_SHARED_MODULE_NOT_FOUND;
4649
4650 GMM_CHECK_SANITY_UPON_LEAVING(pGMM);
4651 }
4652 else
4653 rc = VERR_GMM_IS_NOT_SANE;
4654
4655 gmmR0MutexRelease(pGMM);
4656 return rc;
4657#else
4658
4659 NOREF(pVM); NOREF(idCpu); NOREF(pszModuleName); NOREF(pszVersion); NOREF(GCPtrModBase); NOREF(cbModule);
4660 return VERR_NOT_IMPLEMENTED;
4661#endif
4662}
4663
4664
4665/**
4666 * VMMR0 request wrapper for GMMR0UnregisterSharedModule.
4667 *
4668 * @returns see GMMR0UnregisterSharedModule.
4669 * @param pVM Pointer to the VM.
4670 * @param idCpu The VCPU id.
4671 * @param pReq Pointer to the request packet.
4672 */
4673GMMR0DECL(int) GMMR0UnregisterSharedModuleReq(PVM pVM, VMCPUID idCpu, PGMMUNREGISTERSHAREDMODULEREQ pReq)
4674{
4675 /*
4676 * Validate input and pass it on.
4677 */
4678 AssertPtrReturn(pVM, VERR_INVALID_POINTER);
4679 AssertPtrReturn(pReq, VERR_INVALID_POINTER);
4680 AssertMsgReturn(pReq->Hdr.cbReq == sizeof(*pReq), ("%#x != %#x\n", pReq->Hdr.cbReq, sizeof(*pReq)), VERR_INVALID_PARAMETER);
4681
4682 return GMMR0UnregisterSharedModule(pVM, idCpu, pReq->szName, pReq->szVersion, pReq->GCBaseAddr, pReq->cbModule);
4683}
4684
4685#ifdef VBOX_WITH_PAGE_SHARING
4686
4687/**
4688 * Increase the use count of a shared page, the page is known to exist and be valid and such.
4689 *
4690 * @param pGMM Pointer to the GMM instance.
4691 * @param pGVM Pointer to the GVM instance.
4692 * @param pPage The page structure.
4693 */
4694DECLINLINE(void) gmmR0UseSharedPage(PGMM pGMM, PGVM pGVM, PGMMPAGE pPage)
4695{
4696 Assert(pGMM->cSharedPages > 0);
4697 Assert(pGMM->cAllocatedPages > 0);
4698
4699 pGMM->cDuplicatePages++;
4700
4701 pPage->Shared.cRefs++;
4702 pGVM->gmm.s.Stats.cSharedPages++;
4703 pGVM->gmm.s.Stats.Allocated.cBasePages++;
4704}
4705
4706
4707/**
4708 * Converts a private page to a shared page, the page is known to exist and be valid and such.
4709 *
4710 * @param pGMM Pointer to the GMM instance.
4711 * @param pGVM Pointer to the GVM instance.
4712 * @param HCPhys Host physical address
4713 * @param idPage The Page ID
4714 * @param pPage The page structure.
4715 */
4716DECLINLINE(void) gmmR0ConvertToSharedPage(PGMM pGMM, PGVM pGVM, RTHCPHYS HCPhys, uint32_t idPage, PGMMPAGE pPage,
4717 PGMMSHAREDPAGEDESC pPageDesc)
4718{
4719 PGMMCHUNK pChunk = gmmR0GetChunk(pGMM, idPage >> GMM_CHUNKID_SHIFT);
4720 Assert(pChunk);
4721 Assert(pChunk->cFree < GMM_CHUNK_NUM_PAGES);
4722 Assert(GMM_PAGE_IS_PRIVATE(pPage));
4723
4724 pChunk->cPrivate--;
4725 pChunk->cShared++;
4726
4727 pGMM->cSharedPages++;
4728
4729 pGVM->gmm.s.Stats.cSharedPages++;
4730 pGVM->gmm.s.Stats.cPrivatePages--;
4731
4732 /* Modify the page structure. */
4733 pPage->Shared.pfn = (uint32_t)(uint64_t)(HCPhys >> PAGE_SHIFT);
4734 pPage->Shared.cRefs = 1;
4735#ifdef VBOX_STRICT
4736 pPageDesc->u32StrictChecksum = gmmR0StrictPageChecksum(pGMM, pGVM, idPage);
4737 pPage->Shared.u14Checksum = pPageDesc->u32StrictChecksum;
4738#else
4739 pPage->Shared.u14Checksum = 0;
4740#endif
4741 pPage->Shared.u2State = GMM_PAGE_STATE_SHARED;
4742}
4743
4744
4745static int gmmR0SharedModuleCheckPageFirstTime(PGMM pGMM, PGVM pGVM, PGMMSHAREDMODULE pModule,
4746 unsigned idxRegion, unsigned idxPage,
4747 PGMMSHAREDPAGEDESC pPageDesc, PGMMSHAREDREGIONDESC pGlobalRegion)
4748{
4749 /* Easy case: just change the internal page type. */
4750 PGMMPAGE pPage = gmmR0GetPage(pGMM, pPageDesc->idPage);
4751 AssertMsgReturn(pPage, ("idPage=%#x (GCPhys=%RGp HCPhys=%RHp idxRegion=%#x idxPage=%#x) #1\n",
4752 pPageDesc->idPage, pPageDesc->GCPhys, pPageDesc->HCPhys, idxRegion, idxPage),
4753 VERR_PGM_PHYS_INVALID_PAGE_ID);
4754
4755     AssertMsg(pPageDesc->GCPhys == (pPage->Private.pfn << 12), ("desc %RGp gmm %RGp\n", pPageDesc->GCPhys, (pPage->Private.pfn << 12)));
4756
4757 gmmR0ConvertToSharedPage(pGMM, pGVM, pPageDesc->HCPhys, pPageDesc->idPage, pPage, pPageDesc);
4758
4759 /* Keep track of these references. */
4760 pGlobalRegion->paidPages[idxPage] = pPageDesc->idPage;
4761
4762 return VINF_SUCCESS;
4763}
4764
4765/**
4766  * Checks the specified shared module range for changes
4767 *
4768 * Performs the following tasks:
4769 * - If a shared page is new, then it changes the GMM page type to shared and
4770 * returns it in the pPageDesc descriptor.
4771 * - If a shared page already exists, then it checks if the VM page is
4772 * identical and if so frees the VM page and returns the shared page in
4773 * pPageDesc descriptor.
4774 *
4775 * @remarks ASSUMES the caller has acquired the GMM semaphore!!
4776 *
4777 * @returns VBox status code.
4778 * @param pGMM Pointer to the GMM instance data.
4779 * @param pGVM Pointer to the GVM instance data.
4780 * @param pModule Module description
4781 * @param idxRegion Region index
4782 * @param idxPage Page index
4783  * @param   pPageDesc       Page descriptor
4784 */
4785GMMR0DECL(int) GMMR0SharedModuleCheckPage(PGVM pGVM, PGMMSHAREDMODULE pModule, uint32_t idxRegion, uint32_t idxPage,
4786 PGMMSHAREDPAGEDESC pPageDesc)
4787{
4788 int rc;
4789 PGMM pGMM;
4790 GMM_GET_VALID_INSTANCE(pGMM, VERR_GMM_INSTANCE);
4791 pPageDesc->u32StrictChecksum = 0;
4792
4793 AssertMsgReturn(idxRegion < pModule->cRegions,
4794 ("idxRegion=%#x cRegions=%#x %s %s\n", idxRegion, pModule->cRegions, pModule->szName, pModule->szVersion),
4795 VERR_INVALID_PARAMETER);
4796
4797 uint32_t const cPages = pModule->aRegions[idxRegion].cb >> PAGE_SHIFT;
4798 AssertMsgReturn(idxPage < cPages,
4799                     ("idxPage=%#x cPages=%#x %s %s\n", idxPage, cPages, pModule->szName, pModule->szVersion),
4800 VERR_INVALID_PARAMETER);
4801
4802 LogFlow(("GMMR0SharedModuleCheckRange %s base %RGv region %d idxPage %d\n", pModule->szName, pModule->Core.Key, idxRegion, idxPage));
4803
4804 /*
4805 * First time; create a page descriptor array.
4806 */
4807 PGMMSHAREDREGIONDESC pGlobalRegion = &pModule->aRegions[idxRegion];
4808 if (!pGlobalRegion->paidPages)
4809 {
4810 Log(("Allocate page descriptor array for %d pages\n", cPages));
4811 pGlobalRegion->paidPages = (uint32_t *)RTMemAlloc(cPages * sizeof(pGlobalRegion->paidPages[0]));
4812 AssertReturn(pGlobalRegion->paidPages, VERR_NO_MEMORY);
4813
4814 /* Invalidate all descriptors. */
4815 uint32_t i = cPages;
4816 while (i-- > 0)
4817 pGlobalRegion->paidPages[i] = NIL_GMM_PAGEID;
4818 }
4819
4820 /*
4821 * We've seen this shared page for the first time?
4822 */
4823 if (pGlobalRegion->paidPages[idxPage] == NIL_GMM_PAGEID)
4824 {
4825 Log(("New shared page guest %RGp host %RHp\n", pPageDesc->GCPhys, pPageDesc->HCPhys));
4826 return gmmR0SharedModuleCheckPageFirstTime(pGMM, pGVM, pModule, idxRegion, idxPage, pPageDesc, pGlobalRegion);
4827 }
4828
4829 /*
4830 * We've seen it before...
4831 */
4832 Log(("Replace existing page guest %RGp host %RHp id %#x -> id %#x\n",
4833 pPageDesc->GCPhys, pPageDesc->HCPhys, pPageDesc->idPage, pGlobalRegion->paidPages[idxPage]));
4834 Assert(pPageDesc->idPage != pGlobalRegion->paidPages[idxPage]);
4835
4836 /*
4837 * Get the shared page source.
4838 */
4839 PGMMPAGE pPage = gmmR0GetPage(pGMM, pGlobalRegion->paidPages[idxPage]);
4840 AssertMsgReturn(pPage, ("idPage=%#x (idxRegion=%#x idxPage=%#x) #2\n", pPageDesc->idPage, idxRegion, idxPage),
4841 VERR_PGM_PHYS_INVALID_PAGE_ID);
4842
4843 if (pPage->Common.u2State != GMM_PAGE_STATE_SHARED)
4844 {
4845 /*
4846 * Page was freed at some point; invalidate this entry.
4847 */
4848 /** @todo this isn't really bullet proof. */
4849 Log(("Old shared page was freed -> create a new one\n"));
4850 pGlobalRegion->paidPages[idxPage] = NIL_GMM_PAGEID;
4851 return gmmR0SharedModuleCheckPageFirstTime(pGMM, pGVM, pModule, idxRegion, idxPage, pPageDesc, pGlobalRegion);
4852 }
4853
4854     Log(("Replace existing page: host %RHp -> %RHp\n", pPageDesc->HCPhys, ((uint64_t)pPage->Shared.pfn) << PAGE_SHIFT));
4855
4856 /*
4857 * Calculate the virtual address of the local page.
4858 */
4859 PGMMCHUNK pChunk = gmmR0GetChunk(pGMM, pPageDesc->idPage >> GMM_CHUNKID_SHIFT);
4860 AssertMsgReturn(pChunk, ("idPage=%#x (idxRegion=%#x idxPage=%#x) #4\n", pPageDesc->idPage, idxRegion, idxPage),
4861 VERR_PGM_PHYS_INVALID_PAGE_ID);
4862
4863 uint8_t *pbChunk;
4864 AssertMsgReturn(gmmR0IsChunkMapped(pGMM, pGVM, pChunk, (PRTR3PTR)&pbChunk),
4865 ("idPage=%#x (idxRegion=%#x idxPage=%#x) #3\n", pPageDesc->idPage, idxRegion, idxPage),
4866 VERR_PGM_PHYS_INVALID_PAGE_ID);
4867 uint8_t *pbLocalPage = pbChunk + ((pPageDesc->idPage & GMM_PAGEID_IDX_MASK) << PAGE_SHIFT);
4868
4869 /*
4870 * Calculate the virtual address of the shared page.
4871 */
4872 pChunk = gmmR0GetChunk(pGMM, pGlobalRegion->paidPages[idxPage] >> GMM_CHUNKID_SHIFT);
4873 Assert(pChunk); /* can't fail as gmmR0GetPage succeeded. */
4874
4875 /*
4876 * Get the virtual address of the physical page; map the chunk into the VM
4877 * process if not already done.
4878 */
4879 if (!gmmR0IsChunkMapped(pGMM, pGVM, pChunk, (PRTR3PTR)&pbChunk))
4880 {
4881 Log(("Map chunk into process!\n"));
4882 rc = gmmR0MapChunk(pGMM, pGVM, pChunk, false /*fRelaxedSem*/, (PRTR3PTR)&pbChunk);
4883 AssertRCReturn(rc, rc);
4884 }
4885 uint8_t *pbSharedPage = pbChunk + ((pGlobalRegion->paidPages[idxPage] & GMM_PAGEID_IDX_MASK) << PAGE_SHIFT);
4886
4887#ifdef VBOX_STRICT
4888 pPageDesc->u32StrictChecksum = RTCrc32(pbSharedPage, PAGE_SIZE);
4889 uint32_t uChecksum = pPageDesc->u32StrictChecksum & UINT32_C(0x00003fff);
4890 AssertMsg(!uChecksum || uChecksum == pPage->Shared.u14Checksum || !pPage->Shared.u14Checksum,
4891               ("%#x vs %#x - idPage=%#x - %s %s\n", uChecksum, pPage->Shared.u14Checksum,
4892 pGlobalRegion->paidPages[idxPage], pModule->szName, pModule->szVersion));
4893#endif
4894
4895 /** @todo write ASMMemComparePage. */
4896 if (memcmp(pbSharedPage, pbLocalPage, PAGE_SIZE))
4897 {
4898 Log(("Unexpected differences found between local and shared page; skip\n"));
4899 /* Signal to the caller that this one hasn't changed. */
4900 pPageDesc->idPage = NIL_GMM_PAGEID;
4901 return VINF_SUCCESS;
4902 }
4903
4904 /*
4905 * Free the old local page.
4906 */
4907 GMMFREEPAGEDESC PageDesc;
4908 PageDesc.idPage = pPageDesc->idPage;
4909 rc = gmmR0FreePages(pGMM, pGVM, 1, &PageDesc, GMMACCOUNT_BASE);
4910 AssertRCReturn(rc, rc);
4911
4912 gmmR0UseSharedPage(pGMM, pGVM, pPage);
4913
4914 /*
4915 * Pass along the new physical address & page id.
4916 */
4917 pPageDesc->HCPhys = ((uint64_t)pPage->Shared.pfn) << PAGE_SHIFT;
4918 pPageDesc->idPage = pGlobalRegion->paidPages[idxPage];
4919
4920 return VINF_SUCCESS;
4921}
4922
4923
4924/**
4925 * RTAvlGCPtrDestroy callback.
4926 *
4927 * @returns 0 or VERR_GMM_INSTANCE.
4928 * @param pNode The node to destroy.
4929 * @param pvArgs Pointer to an argument packet.
4930 */
4931static DECLCALLBACK(int) gmmR0CleanupSharedModule(PAVLGCPTRNODECORE pNode, void *pvArgs)
4932{
4933 gmmR0ShModDeletePerVM(((GMMR0SHMODPERVMDTORARGS *)pvArgs)->pGMM,
4934 ((GMMR0SHMODPERVMDTORARGS *)pvArgs)->pGVM,
4935 (PGMMSHAREDMODULEPERVM)pNode,
4936 false /*fRemove*/);
4937 return VINF_SUCCESS;
4938}
4939
4940
4941/**
4942 * Used by GMMR0CleanupVM to clean up shared modules.
4943 *
4944 * This is called without taking the GMM lock so that it can be yielded as
4945 * needed here.
4946 *
4947 * @param pGMM The GMM handle.
4948 * @param pGVM The global VM handle.
4949 */
4950static void gmmR0SharedModuleCleanup(PGMM pGMM, PGVM pGVM)
4951{
4952 gmmR0MutexAcquire(pGMM);
4953 GMM_CHECK_SANITY_UPON_ENTERING(pGMM);
4954
4955 GMMR0SHMODPERVMDTORARGS Args;
4956 Args.pGVM = pGVM;
4957 Args.pGMM = pGMM;
4958 RTAvlGCPtrDestroy(&pGVM->gmm.s.pSharedModuleTree, gmmR0CleanupSharedModule, &Args);
4959
4960 AssertMsg(pGVM->gmm.s.Stats.cShareableModules == 0, ("%d\n", pGVM->gmm.s.Stats.cShareableModules));
4961 pGVM->gmm.s.Stats.cShareableModules = 0;
4962
4963 gmmR0MutexRelease(pGMM);
4964}
4965
4966#endif /* VBOX_WITH_PAGE_SHARING */
4967
4968/**
4969 * Removes all shared modules for the specified VM
4970 *
4971 * @returns VBox status code.
4972 * @param pVM Pointer to the VM.
4973 * @param idCpu The VCPU id.
4974 */
4975GMMR0DECL(int) GMMR0ResetSharedModules(PVM pVM, VMCPUID idCpu)
4976{
4977#ifdef VBOX_WITH_PAGE_SHARING
4978 /*
4979 * Validate input and get the basics.
4980 */
4981 PGMM pGMM;
4982 GMM_GET_VALID_INSTANCE(pGMM, VERR_GMM_INSTANCE);
4983 PGVM pGVM;
4984 int rc = GVMMR0ByVMAndEMT(pVM, idCpu, &pGVM);
4985 if (RT_FAILURE(rc))
4986 return rc;
4987
4988 /*
4989 * Take the semaphore and do some more validations.
4990 */
4991 gmmR0MutexAcquire(pGMM);
4992 if (GMM_CHECK_SANITY_UPON_ENTERING(pGMM))
4993 {
4994 Log(("GMMR0ResetSharedModules\n"));
4995 GMMR0SHMODPERVMDTORARGS Args;
4996 Args.pGVM = pGVM;
4997 Args.pGMM = pGMM;
4998 RTAvlGCPtrDestroy(&pGVM->gmm.s.pSharedModuleTree, gmmR0CleanupSharedModule, &Args);
4999 pGVM->gmm.s.Stats.cShareableModules = 0;
5000
5001 rc = VINF_SUCCESS;
5002 GMM_CHECK_SANITY_UPON_LEAVING(pGMM);
5003 }
5004 else
5005 rc = VERR_GMM_IS_NOT_SANE;
5006
5007 gmmR0MutexRelease(pGMM);
5008 return rc;
5009#else
5010 NOREF(pVM); NOREF(idCpu);
5011 return VERR_NOT_IMPLEMENTED;
5012#endif
5013}
5014
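/*
 * Illustrative sketch, not part of the original file: because the lookup in
 * GMMR0ResetSharedModules goes through GVMMR0ByVMAndEMT, the call is expected
 * to be made on the EMT that owns idCpu. The wrapper name below is made up
 * for illustration only.
 *
 * @code
 *     static int exampleResetSharedModulesOnCurrentEmt(PVM pVM, PVMCPU pVCpu)
 *     {
 *         // pVCpu must belong to the calling EMT for GVMMR0ByVMAndEMT to succeed.
 *         return GMMR0ResetSharedModules(pVM, pVCpu->idCpu);
 *     }
 * @endcode
 */
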
5015#ifdef VBOX_WITH_PAGE_SHARING
5016
5017/**
5018 * Tree enumeration callback for checking a shared module.
5019 */
5020static DECLCALLBACK(int) gmmR0CheckSharedModule(PAVLGCPTRNODECORE pNode, void *pvUser)
5021{
5022 GMMCHECKSHAREDMODULEINFO *pArgs = (GMMCHECKSHAREDMODULEINFO*)pvUser;
5023 PGMMSHAREDMODULEPERVM pRecVM = (PGMMSHAREDMODULEPERVM)pNode;
5024 PGMMSHAREDMODULE pGblMod = pRecVM->pGlobalModule;
5025
5026 Log(("gmmR0CheckSharedModule: check %s %s base=%RGv size=%x\n",
5027 pGblMod->szName, pGblMod->szVersion, pGblMod->Core.Key, pGblMod->cbModule));
5028
5029 int rc = PGMR0SharedModuleCheck(pArgs->pGVM->pVM, pArgs->pGVM, pArgs->idCpu, pGblMod, pRecVM->aRegionsGCPtrs);
5030 if (RT_FAILURE(rc))
5031 return rc;
5032 return VINF_SUCCESS;
5033}
5034
5035#endif /* VBOX_WITH_PAGE_SHARING */
5036#ifdef DEBUG_sandervl
5037
5038/**
5039 * Setup for a GMMR0CheckSharedModules call (so that log flushes can jump back to ring 3).
5040 *
5041 * @returns VBox status code.
5042 * @param pVM Pointer to the VM.
5043 */
5044GMMR0DECL(int) GMMR0CheckSharedModulesStart(PVM pVM)
5045{
5046 /*
5047 * Validate input and get the basics.
5048 */
5049 PGMM pGMM;
5050 GMM_GET_VALID_INSTANCE(pGMM, VERR_GMM_INSTANCE);
5051 NOREF(pVM);
5052 
5053 /*
5054 * Take the semaphore and do some more validations.
5055 */
5056 gmmR0MutexAcquire(pGMM);
5057 int rc = VINF_SUCCESS;
5058 if (!GMM_CHECK_SANITY_UPON_ENTERING(pGMM))
5059 rc = VERR_GMM_IS_NOT_SANE;
5060
5061 return rc;
5062}
5063
5064/**
5065 * Clean up after a GMMR0CheckSharedModules call (so that log flushes can jump back to ring 3).
5066 *
5067 * @returns VBox status code.
5068 * @param pVM Pointer to the VM.
5069 */
5070GMMR0DECL(int) GMMR0CheckSharedModulesEnd(PVM pVM)
5071{
5072 /*
5073 * Validate input and get the basics.
5074 */
5075 PGMM pGMM;
5076 GMM_GET_VALID_INSTANCE(pGMM, VERR_GMM_INSTANCE);
5077 NOREF(pVM);
5078 gmmR0MutexRelease(pGMM);
5079 return VINF_SUCCESS;
5080}
5081
5082#endif /* DEBUG_sandervl */
5083
5084/**
5085 * Check all shared modules for the specified VM.
5086 *
5087 * @returns VBox status code.
5088 * @param pVM Pointer to the VM.
5089 * @param pVCpu Pointer to the VMCPU.
5090 */
5091GMMR0DECL(int) GMMR0CheckSharedModules(PVM pVM, PVMCPU pVCpu)
5092{
5093#ifdef VBOX_WITH_PAGE_SHARING
5094 /*
5095 * Validate input and get the basics.
5096 */
5097 PGMM pGMM;
5098 GMM_GET_VALID_INSTANCE(pGMM, VERR_GMM_INSTANCE);
5099 PGVM pGVM;
5100 int rc = GVMMR0ByVMAndEMT(pVM, pVCpu->idCpu, &pGVM);
5101 if (RT_FAILURE(rc))
5102 return rc;
5103
5104# ifndef DEBUG_sandervl
5105 /*
5106 * Take the semaphore and do some more validations.
5107 */
5108 gmmR0MutexAcquire(pGMM);
5109# endif
5110 if (GMM_CHECK_SANITY_UPON_ENTERING(pGMM))
5111 {
5112 /*
5113 * Walk the tree, checking each module.
5114 */
5115 Log(("GMMR0CheckSharedModules\n"));
5116
5117 GMMCHECKSHAREDMODULEINFO Args;
5118 Args.pGVM = pGVM;
5119 Args.idCpu = pVCpu->idCpu;
5120 rc = RTAvlGCPtrDoWithAll(&pGVM->gmm.s.pSharedModuleTree, true /* fFromLeft */, gmmR0CheckSharedModule, &Args);
5121
5122 Log(("GMMR0CheckSharedModules done!\n"));
5123 GMM_CHECK_SANITY_UPON_LEAVING(pGMM);
5124 }
5125 else
5126 rc = VERR_GMM_IS_NOT_SANE;
5127
5128# ifndef DEBUG_sandervl
5129 gmmR0MutexRelease(pGMM);
5130# endif
5131 return rc;
5132#else
5133 NOREF(pVM); NOREF(pVCpu);
5134 return VERR_NOT_IMPLEMENTED;
5135#endif
5136}
5137
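/*
 * Illustrative sketch, not part of the original file: in a DEBUG_sandervl
 * build the GMM mutex is taken by GMMR0CheckSharedModulesStart and released
 * by GMMR0CheckSharedModulesEnd, while GMMR0CheckSharedModules itself skips
 * its own locking, so that assertions and log flushes can jump back to ring-3
 * in between. The wrapper name and the simplified error handling below are
 * assumptions for illustration only.
 *
 * @code
 *     static int exampleCheckSharedModules(PVM pVM, PVMCPU pVCpu)
 *     {
 *         int rc = GMMR0CheckSharedModulesStart(pVM);   // takes the GMM mutex
 *         if (RT_SUCCESS(rc))
 *             rc = GMMR0CheckSharedModules(pVM, pVCpu); // no locking in DEBUG_sandervl builds
 *         GMMR0CheckSharedModulesEnd(pVM);              // releases the GMM mutex
 *         return rc;
 *     }
 * @endcode
 */
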
5138#if defined(VBOX_STRICT) && HC_ARCH_BITS == 64
5139
5140/**
5141 * RTAvlU32DoWithAll callback.
5142 *
5143 * @returns 0 to continue enumerating, true to stop once a duplicate is found.
5144 * @param pNode The node to search.
5145 * @param pvUser Pointer to the input argument packet.
5146 */
5147static DECLCALLBACK(int) gmmR0FindDupPageInChunk(PAVLU32NODECORE pNode, void *pvUser)
5148{
5149 PGMMCHUNK pChunk = (PGMMCHUNK)pNode;
5150 GMMFINDDUPPAGEINFO *pArgs = (GMMFINDDUPPAGEINFO *)pvUser;
5151 PGVM pGVM = pArgs->pGVM;
5152 PGMM pGMM = pArgs->pGMM;
5153 uint8_t *pbChunk;
5154
5155 /* Only take chunks not mapped into this VM process; not entirely correct. */
5156 if (!gmmR0IsChunkMapped(pGMM, pGVM, pChunk, (PRTR3PTR)&pbChunk))
5157 {
5158 int rc = gmmR0MapChunk(pGMM, pGVM, pChunk, false /*fRelaxedSem*/, (PRTR3PTR)&pbChunk);
5159 if (RT_SUCCESS(rc))
5160 {
5161 /*
5162 * Look for duplicate pages
5163 */
5164 unsigned iPage = (GMM_CHUNK_SIZE >> PAGE_SHIFT);
5165 while (iPage-- > 0)
5166 {
5167 if (GMM_PAGE_IS_PRIVATE(&pChunk->aPages[iPage]))
5168 {
5169 uint8_t *pbDestPage = pbChunk + (iPage << PAGE_SHIFT);
5170
5171 if (!memcmp(pArgs->pSourcePage, pbDestPage, PAGE_SIZE))
5172 {
5173 pArgs->fFoundDuplicate = true;
5174 break;
5175 }
5176 }
5177 }
5178 gmmR0UnmapChunk(pGMM, pGVM, pChunk, false /*fRelaxedSem*/);
5179 }
5180 }
5181 return pArgs->fFoundDuplicate; /* (stops search if true) */
5182}
5183
5184
5185/**
5186 * Finds a duplicate of the specified page in other active VMs.
5187 *
5188 * @returns VBox status code.
5189 * @param pVM Pointer to the VM.
5190 * @param pReq Pointer to the request packet.
5191 */
5192GMMR0DECL(int) GMMR0FindDuplicatePageReq(PVM pVM, PGMMFINDDUPLICATEPAGEREQ pReq)
5193{
5194 /*
5195 * Validate input and pass it on.
5196 */
5197 AssertPtrReturn(pVM, VERR_INVALID_POINTER);
5198 AssertPtrReturn(pReq, VERR_INVALID_POINTER);
5199 AssertMsgReturn(pReq->Hdr.cbReq == sizeof(*pReq), ("%#x != %#x\n", pReq->Hdr.cbReq, sizeof(*pReq)), VERR_INVALID_PARAMETER);
5200
5201 PGMM pGMM;
5202 GMM_GET_VALID_INSTANCE(pGMM, VERR_GMM_INSTANCE);
5203
5204 PGVM pGVM;
5205 int rc = GVMMR0ByVM(pVM, &pGVM);
5206 if (RT_FAILURE(rc))
5207 return rc;
5208
5209 /*
5210 * Take the semaphore and do some more validations.
5211 */
5212 rc = gmmR0MutexAcquire(pGMM);
5213 if (GMM_CHECK_SANITY_UPON_ENTERING(pGMM))
5214 {
5215 uint8_t *pbChunk;
5216 PGMMCHUNK pChunk = gmmR0GetChunk(pGMM, pReq->idPage >> GMM_CHUNKID_SHIFT);
5217 if (pChunk)
5218 {
5219 if (gmmR0IsChunkMapped(pGMM, pGVM, pChunk, (PRTR3PTR)&pbChunk))
5220 {
5221 uint8_t *pbSourcePage = pbChunk + ((pReq->idPage & GMM_PAGEID_IDX_MASK) << PAGE_SHIFT);
5222 PGMMPAGE pPage = gmmR0GetPage(pGMM, pReq->idPage);
5223 if (pPage)
5224 {
5225 GMMFINDDUPPAGEINFO Args;
5226 Args.pGVM = pGVM;
5227 Args.pGMM = pGMM;
5228 Args.pSourcePage = pbSourcePage;
5229 Args.fFoundDuplicate = false;
5230 RTAvlU32DoWithAll(&pGMM->pChunks, true /* fFromLeft */, gmmR0FindDupPageInChunk, &Args);
5231
5232 pReq->fDuplicate = Args.fFoundDuplicate;
5233 }
5234 else
5235 {
5236 AssertFailed();
5237 rc = VERR_PGM_PHYS_INVALID_PAGE_ID;
5238 }
5239 }
5240 else
5241 AssertFailed();
5242 }
5243 else
5244 AssertFailed();
5245 }
5246 else
5247 rc = VERR_GMM_IS_NOT_SANE;
5248
5249 gmmR0MutexRelease(pGMM);
5250 return rc;
5251}
5252
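/*
 * Illustrative sketch, not part of the original file: how a strict 64-bit
 * build might ask whether a private page it owns has a duplicate in another
 * VM. Only the request fields referenced above (Hdr.cbReq, idPage,
 * fDuplicate) are filled in; the wrapper name is made up for illustration.
 *
 * @code
 *     static bool examplePageHasDuplicate(PVM pVM, uint32_t idPage)
 *     {
 *         GMMFINDDUPLICATEPAGEREQ Req;
 *         RT_ZERO(Req);
 *         Req.Hdr.cbReq  = sizeof(Req);
 *         Req.idPage     = idPage;
 *         Req.fDuplicate = false;
 *         int rc = GMMR0FindDuplicatePageReq(pVM, &Req);
 *         return RT_SUCCESS(rc) && Req.fDuplicate;
 *     }
 * @endcode
 */
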
5253#endif /* VBOX_STRICT && HC_ARCH_BITS == 64 */
5254
5255
5256/**
5257 * Retrieves the GMM statistics visible to the caller.
5258 *
5259 * @returns VBox status code.
5260 *
5261 * @param pStats Where to put the statistics.
5262 * @param pSession The current session.
5263 * @param pVM Pointer to the VM to obtain statistics for. Optional.
5264 */
5265GMMR0DECL(int) GMMR0QueryStatistics(PGMMSTATS pStats, PSUPDRVSESSION pSession, PVM pVM)
5266{
5267 LogFlow(("GVMMR0QueryStatistics: pStats=%p pSession=%p pVM=%p\n", pStats, pSession, pVM));
5268
5269 /*
5270 * Validate input.
5271 */
5272 AssertPtrReturn(pSession, VERR_INVALID_POINTER);
5273 AssertPtrReturn(pStats, VERR_INVALID_POINTER);
5274 pStats->cMaxPages = 0; /* (crash before taking the mutex...) */
5275
5276 PGMM pGMM;
5277 GMM_GET_VALID_INSTANCE(pGMM, VERR_GMM_INSTANCE);
5278
5279 /*
5280 * Resolve the VM handle, if not NULL, and lock the GMM.
5281 */
5282 int rc;
5283 PGVM pGVM;
5284 if (pVM)
5285 {
5286 rc = GVMMR0ByVM(pVM, &pGVM);
5287 if (RT_FAILURE(rc))
5288 return rc;
5289 }
5290 else
5291 pGVM = NULL;
5292
5293 rc = gmmR0MutexAcquire(pGMM);
5294 if (RT_FAILURE(rc))
5295 return rc;
5296
5297 /*
5298 * Copy out the GMM statistics.
5299 */
5300 pStats->cMaxPages = pGMM->cMaxPages;
5301 pStats->cReservedPages = pGMM->cReservedPages;
5302 pStats->cOverCommittedPages = pGMM->cOverCommittedPages;
5303 pStats->cAllocatedPages = pGMM->cAllocatedPages;
5304 pStats->cSharedPages = pGMM->cSharedPages;
5305 pStats->cDuplicatePages = pGMM->cDuplicatePages;
5306 pStats->cLeftBehindSharedPages = pGMM->cLeftBehindSharedPages;
5307 pStats->cBalloonedPages = pGMM->cBalloonedPages;
5308 pStats->cChunks = pGMM->cChunks;
5309 pStats->cFreedChunks = pGMM->cFreedChunks;
5310 pStats->cShareableModules = pGMM->cShareableModules;
5311 RT_ZERO(pStats->au64Reserved);
5312
5313 /*
5314 * Copy out the VM statistics.
5315 */
5316 if (pGVM)
5317 pStats->VMStats = pGVM->gmm.s.Stats;
5318 else
5319 RT_ZERO(pStats->VMStats);
5320
5321 gmmR0MutexRelease(pGMM);
5322 return rc;
5323}
5324
5325
5326/**
5327 * VMMR0 request wrapper for GMMR0QueryStatistics.
5328 *
5329 * @returns see GMMR0QueryStatistics.
5330 * @param pVM Pointer to the VM. Optional.
5331 * @param pReq Pointer to the request packet.
5332 */
5333GMMR0DECL(int) GMMR0QueryStatisticsReq(PVM pVM, PGMMQUERYSTATISTICSSREQ pReq)
5334{
5335 /*
5336 * Validate input and pass it on.
5337 */
5338 AssertPtrReturn(pReq, VERR_INVALID_POINTER);
5339 AssertMsgReturn(pReq->Hdr.cbReq == sizeof(*pReq), ("%#x != %#x\n", pReq->Hdr.cbReq, sizeof(*pReq)), VERR_INVALID_PARAMETER);
5340
5341 return GMMR0QueryStatistics(&pReq->Stats, pReq->pSession, pVM);
5342}
5343
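/*
 * Illustrative sketch, not part of the original file: filling in a
 * GMMQUERYSTATISTICSSREQ and reading back a few of the global counters
 * copied out by GMMR0QueryStatistics. Only the request fields referenced
 * above (Hdr.cbReq, pSession, Stats) are touched; the wrapper name is made
 * up for illustration and %RU64 assumes 64-bit counters.
 *
 * @code
 *     static int exampleLogGmmStats(PVM pVM, PSUPDRVSESSION pSession)
 *     {
 *         GMMQUERYSTATISTICSSREQ Req;
 *         RT_ZERO(Req);
 *         Req.Hdr.cbReq = sizeof(Req);
 *         Req.pSession  = pSession;
 *         int rc = GMMR0QueryStatisticsReq(pVM, &Req);
 *         if (RT_SUCCESS(rc))
 *             LogFlow(("GMM: cAllocatedPages=%RU64 cMaxPages=%RU64 cSharedPages=%RU64\n",
 *                      Req.Stats.cAllocatedPages, Req.Stats.cMaxPages, Req.Stats.cSharedPages));
 *         return rc;
 *     }
 * @endcode
 */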
5344
5345/**
5346 * Resets the specified GMM statistics.
5347 *
5348 * @returns VBox status code.
5349 *
5350 * @param pStats Which statistics to reset; non-zero fields indicate the
5351 * statistics to be reset.
5352 * @param pSession The current session.
5353 * @param pVM The VM to reset statistics for. Optional.
5354 */
5355GMMR0DECL(int) GMMR0ResetStatistics(PCGMMSTATS pStats, PSUPDRVSESSION pSession, PVM pVM)
5356{
5357 NOREF(pStats); NOREF(pSession); NOREF(pVM); /* Nothing to reset at the moment. */
5358 return VINF_SUCCESS;
5359}
5360
5361
5362/**
5363 * VMMR0 request wrapper for GMMR0ResetStatistics.
5364 *
5365 * @returns see GMMR0ResetStatistics.
5366 * @param pVM Pointer to the VM. Optional.
5367 * @param pReq Pointer to the request packet.
5368 */
5369GMMR0DECL(int) GMMR0ResetStatisticsReq(PVM pVM, PGMMRESETSTATISTICSSREQ pReq)
5370{
5371 /*
5372 * Validate input and pass it on.
5373 */
5374 AssertPtrReturn(pReq, VERR_INVALID_POINTER);
5375 AssertMsgReturn(pReq->Hdr.cbReq == sizeof(*pReq), ("%#x != %#x\n", pReq->Hdr.cbReq, sizeof(*pReq)), VERR_INVALID_PARAMETER);
5376
5377 return GMMR0ResetStatistics(&pReq->Stats, pReq->pSession, pVM);
5378}
5379