GMMR0.cpp@ 40026

Last change on this file since 40026 was 40006, checked in by vboxsync, 13 years ago
build fix.

1	/* $Id: GMMR0.cpp 40006 2012-02-06 10:49:51Z vboxsync $ */
2	/** @file
3	* GMM - Global Memory Manager.
4	*/
5
6	/*
7	* Copyright (C) 2007-2011 Oracle Corporation
8	*
9	* This file is part of VirtualBox Open Source Edition (OSE), as
10	* available from http://www.virtualbox.org. This file is free software;
11	* you can redistribute it and/or modify it under the terms of the GNU
12	* General Public License (GPL) as published by the Free Software
13	* Foundation, in version 2 as it comes in the "COPYING" file of the
14	* VirtualBox OSE distribution. VirtualBox OSE is distributed in the
15	* hope that it will be useful, but WITHOUT ANY WARRANTY of any kind.
16	*/
17
18
19	/** @page pg_gmm GMM - The Global Memory Manager
20	*
21	* As the name indicates, this component is responsible for global memory
22	* management. Currently only guest RAM is allocated from the GMM, but this
23	* may change to include shadow page tables and other bits later.
24	*
25	* Guest RAM is managed as individual pages, but allocated from the host OS
26	* in chunks for reasons of portability / efficiency. To minimize the memory
27	* footprint all tracking structure must be as small as possible without
28	* unnecessary performance penalties.
29	*
30	* The allocation chunks have a fixed size, which is defined at compile time
31	* by the #GMM_CHUNK_SIZE \#define.
32	*
33	* Each chunk is given a unique ID. Each page also has a unique ID. The
34	* relationship between the two IDs is:
35	* @code
36	* GMM_CHUNK_SHIFT = log2(GMM_CHUNK_SIZE / PAGE_SIZE);
37	* idPage = (idChunk << GMM_CHUNK_SHIFT) | iPage;
38	* @endcode
39	* Where iPage is the index of the page within the chunk. This ID scheme
40	* permits efficient chunk and page lookup, but it relies on the chunk size
41	* being set at compile time. The chunks are organized in an AVL tree with their
42	* IDs being the keys.
43	*
44	* The physical address of each page in an allocation chunk is maintained by
45	* the #RTR0MEMOBJ and obtained using #RTR0MemObjGetPagePhysAddr. There is no
46	* need to duplicate this information (it would cost 8 bytes per page if we did).
47	*
48	* So what do we need to track per page? Most importantly we need to know
49	* which state the page is in:
50	* - Private - Allocated for (eventually) backing one particular VM page.
51	* - Shared - Readonly page that is used by one or more VMs and treated
52	* as COW by PGM.
53	* - Free - Not used by anyone.
54	*
55	* For the page replacement operations (sharing, defragmenting and freeing)
56	* to be somewhat efficient, private pages need to be associated with a
57	* particular page in a particular VM.
58	*
59	* Tracking the usage of shared pages is impractical and expensive, so we'll
60	* settle for a reference counting system instead.
61	*
62	* Free pages will be chained on LIFOs.
63	*
64	* On 64-bit systems we will use a 64-bit bitfield per page, while on 32-bit
65	* systems a 32-bit bitfield will have to suffice because of address space
66	* limitations. The #GMMPAGE structure shows the details.
67	*
68	*
69	* @section sec_gmm_alloc_strat Page Allocation Strategy
70	*
71	* The strategy for allocating pages has to take fragmentation and shared
72	* pages into account, or we may end up with 2000 chunks with only
73	* a few pages in each. Shared pages cannot easily be reallocated because
74	* of the inaccurate usage accounting (see above). Private pages can be
75	* reallocated by a defragmentation thread in the same manner that sharing
76	* is done.
77	*
78	* The first approach is to manage the free pages in two sets depending on
79	* whether they are mainly for the allocation of shared or private pages.
80	* In the initial implementation there will be almost no possibility for
81	* mixing shared and private pages in the same chunk (only if we're really
82	* stressed on memory), but when we implement forking of VMs and have to
83	* deal with lots of COW pages it'll start getting kind of interesting.
84	*
85	* The sets are lists of chunks with approximately the same number of
86	* free pages. Say the chunk size is 1MB, meaning 256 pages, and a set
87	* consists of 16 lists. So, the first list will contain the chunks with
88	* 1-16 free pages, the second covers 17-32, and so on. The chunks will be
89	* moved between the lists as pages are freed up or allocated.
90	*
91	*
92	* @section sec_gmm_costs Costs
93	*
94	* The per page cost in kernel space is 32-bit plus whatever RTR0MEMOBJ
95	* entails. In addition there is the chunk cost of approximately
96	* (sizeof(RTR0MEMOBJ) + sizeof(CHUNK)) / 2^CHUNK_SHIFT bytes per page.
97	*
98	* On Windows the per page #RTR0MEMOBJ cost is 32-bit on 32-bit Windows
99	* and 64-bit on 64-bit Windows (a PFN_NUMBER in the MDL). So, 64-bit per page.
100	* The cost on Linux is identical, but here it's because of sizeof(struct page *).
101	*
102	*
103	* @section sec_gmm_legacy Legacy Mode for Non-Tier-1 Platforms
104	*
105	* In legacy mode the page source is locked user pages and not
106	* #RTR0MemObjAllocPhysNC; this means that a page can only be allocated
107	* by the VM that locked it. We will make no attempt at implementing
108	* page sharing on these systems, just do enough to make it all work.
109	*
110	*
111	* @subsection sub_gmm_locking Serializing
112	*
113	* One simple fast mutex will be employed in the initial implementation, not
114	* two as mentioned in @ref subsec_pgmPhys_Serializing.
115	*
116	* @see @ref subsec_pgmPhys_Serializing
117	*
118	*
119	* @section sec_gmm_overcommit Memory Over-Commitment Management
120	*
121	* The GVM will have to do the system wide memory over-commitment
122	* management. My current ideas are:
123	* - Per VM oc policy that indicates how much to initially commit
124	* to it and what to do in an out-of-memory situation.
125	* - Prevent overtaxing the host.
126	*
127	* There are some challenges here, the main ones are configurability and
128	* security. Should we for instance permit anyone to request 100% memory
129	* commitment? Who should be allowed to do runtime adjustments of the
130	* config? And how to prevent these settings from being lost when the last
131	* VM process exits? The solution is probably to have an optional root
132	* daemon that will keep VMMR0.r0 in memory and enable the security measures.
133	*
134	*
135	*
136	* @section sec_gmm_numa NUMA
137	*
138	* NUMA considerations will be designed and implemented a bit later.
139	*
140	* The preliminary guess is that we will have to try to allocate memory as
141	* close as possible to the CPUs the VM is executed on (EMT and additional CPU
142	* threads). Which means it's mostly about allocation and sharing policies.
143	* Both the scheduler and allocator interface will have to supply some NUMA info
144	* and we'll need to have a way to calc access costs.
145	*
146	*/
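
The chunk/page ID relationship from the page comment above can be sketched as follows. This is only an illustration, not the GMM code: the `EX_` constants and `ex` helpers are hypothetical, and the sketch assumes 2 MB chunks with 4 KB pages, whereas the real values come from GMM_CHUNK_SIZE and PAGE_SIZE at compile time.

```cpp
#include <cassert>
#include <cstdint>

// Hypothetical values for illustration only; the real ones are compile-time
// constants (GMM_CHUNK_SIZE, PAGE_SIZE) that are not shown in this listing.
constexpr uint32_t EX_PAGE_SHIFT  = 12;                          // 4 KB pages (assumed)
constexpr uint32_t EX_CHUNK_SIZE  = 2u * 1024u * 1024u;          // 2 MB chunks (assumed)
constexpr uint32_t EX_CHUNK_SHIFT = 9;                           // log2(chunk size / page size)
constexpr uint32_t EX_PAGE_MASK   = (1u << EX_CHUNK_SHIFT) - 1u; // low bits hold iPage

static_assert((EX_CHUNK_SIZE >> EX_PAGE_SHIFT) == (1u << EX_CHUNK_SHIFT),
              "chunk shift must match the number of pages per chunk");

// idPage = (idChunk << GMM_CHUNK_SHIFT) | iPage
constexpr uint32_t exMakePageId(uint32_t idChunk, uint32_t iPage)
{
    return (idChunk << EX_CHUNK_SHIFT) | (iPage & EX_PAGE_MASK);
}

// The reverse direction: a shift recovers the chunk ID, a mask the page index.
constexpr uint32_t exChunkIdFromPageId(uint32_t idPage)   { return idPage >> EX_CHUNK_SHIFT; }
constexpr uint32_t exPageIndexFromPageId(uint32_t idPage) { return idPage & EX_PAGE_MASK; }
```

Because the shift is a compile-time constant, both directions reduce to a single shift or mask, which is why the scheme depends on a fixed chunk size.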
147
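The free-set lists described in the allocation strategy section can be pictured with a small bucketing helper. This is a sketch that assumes the 256-page chunk and 16 lists from the example in the comment, with evenly sized buckets; `exFreeListIndex` is a hypothetical name and the mapping the GMM code actually uses may differ.

```cpp
#include <cassert>
#include <cstdint>

constexpr uint32_t EX_PAGES_PER_CHUNK = 256; // 1 MB chunk with 4 KB pages (from the example)
constexpr uint32_t EX_LISTS_PER_SET   = 16;
constexpr uint32_t EX_PAGES_PER_LIST  = EX_PAGES_PER_CHUNK / EX_LISTS_PER_SET; // 16

// Maps a chunk's free page count (1..256) to a list index (0..15).
// A chunk with no free pages belongs to no list, so cFree == 0 is invalid here.
inline uint32_t exFreeListIndex(uint32_t cFree)
{
    assert(cFree >= 1 && cFree <= EX_PAGES_PER_CHUNK);
    return (cFree - 1) / EX_PAGES_PER_LIST;
}
```

When a page is allocated or freed, the chunk only has to be relinked if its new free count maps to a different index.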
148
149	/*******************************************************************************
150	* Header Files *
151	*******************************************************************************/
152	#define LOG_GROUP LOG_GROUP_GMM
153	#include <VBox/rawpci.h>
154	#include <VBox/vmm/vm.h>
155	#include <VBox/vmm/gmm.h>
156	#include "GMMR0Internal.h"
157	#include <VBox/vmm/gvm.h>
158	#include <VBox/vmm/pgm.h>
159	#include <VBox/log.h>
160	#include <VBox/param.h>
161	#include <VBox/err.h>
162	#include <iprt/asm.h>
163	#include <iprt/avl.h>
164	#include <iprt/list.h>
165	#include <iprt/mem.h>
166	#include <iprt/memobj.h>
167	#include <iprt/mp.h>
168	#include <iprt/semaphore.h>
169	#include <iprt/string.h>
170	#include <iprt/time.h>
171
172
173	/*******************************************************************************
174	* Structures and Typedefs *
175	*******************************************************************************/
176	/** Pointer to set of free chunks. */
177	typedef struct GMMCHUNKFREESET *PGMMCHUNKFREESET;
178
179	/**
180	* The per-page tracking structure employed by the GMM.
181	*
182	* On 32-bit hosts some trickery is necessary to compress all
183	* the information into 32-bits. When the fSharedFree member is set,
184	* the 30th bit decides whether it's a free page or not.
185	*
186	* Because of the different layout on 32-bit and 64-bit hosts, macros
187	* are used to get and set some of the data.
188	*/
189	typedef union GMMPAGE
190	{
191	#if HC_ARCH_BITS == 64
192	/** Unsigned integer view. */
193	uint64_t u;
194
195	/** The common view. */
196	struct GMMPAGECOMMON
197	{
198	uint32_t uStuff1 : 32;
199	uint32_t uStuff2 : 30;
200	/** The page state. */
201	uint32_t u2State : 2;
202	} Common;
203
204	/** The view of a private page. */
205	struct GMMPAGEPRIVATE
206	{
207	/** The guest page frame number. (Max addressable: 2 ^ 44 - 16) */
208	uint32_t pfn;
209	/** The GVM handle. (64K VMs) */
210	uint32_t hGVM : 16;
211	/** Reserved. */
212	uint32_t u16Reserved : 14;
213	/** The page state. */
214	uint32_t u2State : 2;
215	} Private;
216
217	/** The view of a shared page. */
218	struct GMMPAGESHARED
219	{
220	/** The host page frame number. (Max addressable: 2 ^ 44 - 16) */
221	uint32_t pfn;
222	/** The reference count (64K VMs). */
223	uint32_t cRefs : 16;
224	/** Reserved. Checksum or something? Two hGVMs for forking? */
225	uint32_t u14Reserved : 14;
226	/** The page state. */
227	uint32_t u2State : 2;
228	} Shared;
229
230	/** The view of a free page. */
231	struct GMMPAGEFREE
232	{
233	/** The index of the next page in the free list. UINT16_MAX is NIL. */
234	uint16_t iNext;
235	/** Reserved. Checksum or something? */
236	uint16_t u16Reserved0;
237	/** Reserved. Checksum or something? */
238	uint32_t u30Reserved1 : 30;
239	/** The page state. */
240	uint32_t u2State : 2;
241	} Free;
242
243	#else /* 32-bit */
244	/** Unsigned integer view. */
245	uint32_t u;
246
247	/** The common view. */
248	struct GMMPAGECOMMON
249	{
250	uint32_t uStuff : 30;
251	/** The page state. */
252	uint32_t u2State : 2;
253	} Common;
254
255	/** The view of a private page. */
256	struct GMMPAGEPRIVATE
257	{
258	/** The guest page frame number. (Max addressable: 2 ^ 36) */
259	uint32_t pfn : 24;
260	/** The GVM handle. (127 VMs) */
261	uint32_t hGVM : 7;
262	/** The top page state bit, MBZ. */
263	uint32_t fZero : 1;
264	} Private;
265
266	/** The view of a shared page. */
267	struct GMMPAGESHARED
268	{
269	/** The reference count. */
270	uint32_t cRefs : 30;
271	/** The page state. */
272	uint32_t u2State : 2;
273	} Shared;
274
275	/** The view of a free page. */
276	struct GMMPAGEFREE
277	{
278	/** The index of the next page in the free list. UINT16_MAX is NIL. */
279	uint32_t iNext : 16;
280	/** Reserved. Checksum or something? */
281	uint32_t u14Reserved : 14;
282	/** The page state. */
283	uint32_t u2State : 2;
284	} Free;
285	#endif
286	} GMMPAGE;
287	AssertCompileSize(GMMPAGE, sizeof(RTHCUINTPTR));
288	/** Pointer to a GMMPAGE. */
289	typedef GMMPAGE *PGMMPAGE;
290
291
292	/** @name The Page States.
293	* @{ */
294	/** A private page. */
295	#define GMM_PAGE_STATE_PRIVATE 0
296	/** A private page - alternative value used on the 32-bit implementation.
297	* This will never be used on 64-bit hosts. */
298	#define GMM_PAGE_STATE_PRIVATE_32 1
299	/** A shared page. */
300	#define GMM_PAGE_STATE_SHARED 2
301	/** A free page. */
302	#define GMM_PAGE_STATE_FREE 3
303	/** @} */
304
305
306	/** @def GMM_PAGE_IS_PRIVATE
307	*
308	* @returns true if private, false if not.
309	* @param pPage The GMM page.
310	*/
311	#if HC_ARCH_BITS == 64
312	# define GMM_PAGE_IS_PRIVATE(pPage) ( (pPage)->Common.u2State == GMM_PAGE_STATE_PRIVATE )
313	#else
314	# define GMM_PAGE_IS_PRIVATE(pPage) ( (pPage)->Private.fZero == 0 )
315	#endif
316
317	/** @def GMM_PAGE_IS_SHARED
318	*
319	* @returns true if shared, false if not.
320	* @param pPage The GMM page.
321	*/
322	#define GMM_PAGE_IS_SHARED(pPage) ( (pPage)->Common.u2State == GMM_PAGE_STATE_SHARED )
323
324	/** @def GMM_PAGE_IS_FREE
325	*
326	* @returns true if free, false if not.
327	* @param pPage The GMM page.
328	*/
329	#define GMM_PAGE_IS_FREE(pPage) ( (pPage)->Common.u2State == GMM_PAGE_STATE_FREE )
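
The 64-bit views above can also be illustrated with explicit shifts and masks, which avoids depending on how a compiler orders bitfields. This is a sketch under assumptions: the `ex` helpers are hypothetical, and the bit positions assume bitfields are allocated from the least significant bit upward (common on x86 compilers), which would put the 2-bit state in bits 62-63.

```cpp
#include <cassert>
#include <cstdint>

// Page state values as defined in the listing.
constexpr uint64_t EX_STATE_PRIVATE = 0;
constexpr uint64_t EX_STATE_FREE    = 3;

// Assumed bit positions within the 64-bit word (see the note above).
constexpr unsigned EX_HGVM_SHIFT  = 32; // 16-bit GVM handle sits above the 32-bit pfn
constexpr unsigned EX_STATE_SHIFT = 62; // 2-bit state occupies the top bits

// Private page: guest pfn, owning VM handle, state PRIVATE.
inline uint64_t exMakePrivate(uint32_t pfn, uint16_t hGVM)
{
    return (uint64_t)pfn
         | ((uint64_t)hGVM << EX_HGVM_SHIFT)
         | (EX_STATE_PRIVATE << EX_STATE_SHIFT);
}

// Free page: index of the next free page in the chunk, state FREE.
inline uint64_t exMakeFree(uint16_t iNext)
{
    return (uint64_t)iNext | (EX_STATE_FREE << EX_STATE_SHIFT);
}

inline uint64_t exState(uint64_t u) { return u >> EX_STATE_SHIFT; }
inline uint32_t exPfn(uint64_t u)   { return (uint32_t)(u & 0xffffffffu); }
inline uint16_t exHGVM(uint64_t u)  { return (uint16_t)((u >> EX_HGVM_SHIFT) & 0xffffu); }
```

Keeping the state in the same bits for every view is what lets a single check on the common view classify any page.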
330
331	/** @def GMM_PAGE_PFN_LAST
332	* The last valid guest pfn.
333	* @remark Some of the values outside the range have special meaning,
334	* see GMM_PAGE_PFN_UNSHAREABLE.
335	*/
336	#if HC_ARCH_BITS == 64
337	# define GMM_PAGE_PFN_LAST UINT32_C(0xfffffff0)
338	#else
339	# define GMM_PAGE_PFN_LAST UINT32_C(0x00fffff0)
340	#endif
341	AssertCompile(GMM_PAGE_PFN_LAST == (GMM_GCPHYS_LAST >> PAGE_SHIFT));
342
343	/** @def GMM_PAGE_PFN_UNSHAREABLE
344	* Indicates that this page isn't used for normal guest memory and thus isn't shareable.
345	*/
346	#if HC_ARCH_BITS == 64
347	# define GMM_PAGE_PFN_UNSHAREABLE UINT32_C(0xfffffff1)
348	#else
349	# define GMM_PAGE_PFN_UNSHAREABLE UINT32_C(0x00fffff1)
350	#endif
351	AssertCompile(GMM_PAGE_PFN_UNSHAREABLE == (GMM_GCPHYS_UNSHAREABLE >> PAGE_SHIFT));
352
353
354	/**
355	* A GMM allocation chunk ring-3 mapping record.
356	*
357	* This should really be associated with a session and not a VM, but
358	* it's simpler to associate it with a VM and clean up when the VM object
359	* is destroyed.
360	*/
361	typedef struct GMMCHUNKMAP
362	{
363	/** The mapping object. */
364	RTR0MEMOBJ hMapObj;
365	/** The VM owning the mapping. */
366	PGVM pGVM;
367	} GMMCHUNKMAP;
368	/** Pointer to a GMM allocation chunk mapping. */
369	typedef struct GMMCHUNKMAP *PGMMCHUNKMAP;
370
371
372	/**
373	* A GMM allocation chunk.
374	*/
375	typedef struct GMMCHUNK
376	{
377	/** The AVL node core.
378	* The Key is the chunk ID. (Giant mtx.) */
379	AVLU32NODECORE Core;
380	/** The memory object.
381	* Either from RTR0MemObjAllocPhysNC or RTR0MemObjLockUser depending on
382	* what the host can dish up. (Chunk mtx protects mapping accesses
383	* and related frees.) */
384	RTR0MEMOBJ hMemObj;
385	/** Pointer to the next chunk in the free list. (Giant mtx.) */
386	PGMMCHUNK pFreeNext;
387	/** Pointer to the previous chunk in the free list. (Giant mtx.) */
388	PGMMCHUNK pFreePrev;
389	/** Pointer to the free set this chunk belongs to. NULL for
390	* chunks with no free pages. (Giant mtx.) */
391	PGMMCHUNKFREESET pSet;
392	/** List node in the chunk list (GMM::ChunkList). (Giant mtx.) */
393	RTLISTNODE ListNode;
394	/** Pointer to an array of mappings. (Chunk mtx.) */
395	PGMMCHUNKMAP paMappingsX;
396	/** The number of mappings. (Chunk mtx.) */
397	uint16_t cMappingsX;
398	/** The mapping lock this chunk is using. UINT8_MAX if nobody is
399	* mapping or freeing anything. (Giant mtx.) */
400	uint8_t volatile iChunkMtx;
401	/** Flags field reserved for future use (like eliminating enmType).
402	* (Giant mtx.) */
403	uint8_t fFlags;
404	/** The head of the list of free pages. UINT16_MAX is the NIL value.
405	* (Giant mtx.) */
406	uint16_t iFreeHead;
407	/** The number of free pages. (Giant mtx.) */
408	uint16_t cFree;
409	/** The GVM handle of the VM that first allocated pages from this chunk, this
410	* is used as a preference when there are several chunks to choose from.
411	* When in bound memory mode this isn't a preference any longer. (Giant
412	* mtx.) */
413	uint16_t hGVM;
414	/** The ID of the NUMA node the memory mostly resides on. (Reserved for
415	* future use.) (Giant mtx.) */
416	uint16_t idNumaNode;
417	/** The number of private pages. (Giant mtx.) */
418	uint16_t cPrivate;
419	/** The number of shared pages. (Giant mtx.) */
420	uint16_t cShared;
421	/** The pages. (Giant mtx.) */
422	GMMPAGE aPages[GMM_CHUNK_SIZE >> PAGE_SHIFT];
423	} GMMCHUNK;
424
425	/** Indicates that the NUMA properties of the memory are unknown. */
426	#define GMM_CHUNK_NUMA_ID_UNKNOWN UINT16_C(0xfffe)
427
428	/** @name GMM_CHUNK_FLAGS_XXX - chunk flags.
429	* @{ */
430	/** Indicates that the chunk is a large page (2MB). */
431	#define GMM_CHUNK_FLAGS_LARGE_PAGE UINT16_C(0x0001)
432	/** @} */
433
434
435	/**
436	* An allocation chunk TLB entry.
437	*/
438	typedef struct GMMCHUNKTLBE
439	{
440	/** The chunk id. */
441	uint32_t idChunk;
442	/** Pointer to the chunk. */
443	PGMMCHUNK pChunk;
444	} GMMCHUNKTLBE;
445	/** Pointer to an allocation chunk TLB entry. */
446	typedef GMMCHUNKTLBE *PGMMCHUNKTLBE;
447
448
449	/** The number of entries in the allocation chunk TLB. */
450	#define GMM_CHUNKTLB_ENTRIES 32
451	/** Gets the TLB entry index for the given Chunk ID. */
452	#define GMM_CHUNKTLB_IDX(idChunk) ( (idChunk) & (GMM_CHUNKTLB_ENTRIES - 1) )
453
454	/**
455	* An allocation chunk TLB.
456	*/
457	typedef struct GMMCHUNKTLB
458	{
459	/** The TLB entries. */
460	GMMCHUNKTLBE aEntries[GMM_CHUNKTLB_ENTRIES];
461	} GMMCHUNKTLB;
462	/** Pointer to an allocation chunk TLB. */
463	typedef GMMCHUNKTLB *PGMMCHUNKTLB;
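
A direct-mapped lookup cache of this shape might be used as sketched below. This is an assumption-laden illustration rather than the GMM lookup code: `ExChunkDb` and its members are hypothetical, and a `std::map` stands in for the AVL tree of chunks.

```cpp
#include <cassert>
#include <cstdint>
#include <map>

constexpr uint32_t EX_TLB_ENTRIES = 32; // must be a power of two for the mask below
constexpr uint32_t EX_NIL_CHUNKID = 0;  // the listing treats ID 0 as the NIL id

struct ExChunk { uint32_t idChunk; };

struct ExChunkTlbEntry
{
    uint32_t idChunk = EX_NIL_CHUNKID;
    ExChunk *pChunk  = nullptr;
};

struct ExChunkDb
{
    std::map<uint32_t, ExChunk *> tree;                 // stand-in for the AVL tree
    ExChunkTlbEntry               aTlb[EX_TLB_ENTRIES]; // direct-mapped cache
    unsigned                      cTreeLookups = 0;     // counts TLB misses

    ExChunk *lookup(uint32_t idChunk)
    {
        ExChunkTlbEntry &rEntry = aTlb[idChunk & (EX_TLB_ENTRIES - 1)];
        if (rEntry.idChunk == idChunk && rEntry.pChunk)
            return rEntry.pChunk;                       // TLB hit

        ++cTreeLookups;                                 // TLB miss: consult the tree
        auto it = tree.find(idChunk);
        if (it == tree.end())
            return nullptr;
        rEntry.idChunk = idChunk;                       // replace whatever was in the slot
        rEntry.pChunk  = it->second;
        return it->second;
    }
};
```

Two chunk IDs that differ by a multiple of 32 share a slot and evict each other, which is the usual trade-off of a direct-mapped cache.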
464
465
466	/**
467	* The GMM instance data.
468	*/
469	typedef struct GMM
470	{
471	/** Magic / eye catcher. GMM_MAGIC */
472	uint32_t u32Magic;
473	/** The number of threads waiting on the mutex. */
474	uint32_t cMtxContenders;
475	/** The fast mutex protecting the GMM.
476	* More fine grained locking can be implemented later if necessary. */
477	RTSEMFASTMUTEX hMtx;
478	#ifdef VBOX_STRICT
479	/** The current mutex owner. */
480	RTNATIVETHREAD hMtxOwner;
481	#endif
482	/** The chunk tree. */
483	PAVLU32NODECORE pChunks;
484	/** The chunk TLB. */
485	GMMCHUNKTLB ChunkTLB;
486	/** The private free set. */
487	GMMCHUNKFREESET PrivateX;
488	/** The shared free set. */
489	GMMCHUNKFREESET Shared;
490
491	/** Shared module tree (global). */
492	/** @todo separate trees for distinctly different guest OSes. */
493	PAVLGCPTRNODECORE pGlobalSharedModuleTree;
494	/** Sharable modules (count of nodes in pGlobalSharedModuleTree). */
495	uint32_t cShareableModules;
496
497	/** The chunk list. For simplifying the cleanup process. */
498	RTLISTANCHOR ChunkList;
499
500	/** The maximum number of pages we're allowed to allocate.
501	* @gcfgm 64-bit GMM/MaxPages Direct.
502	* @gcfgm 32-bit GMM/PctPages Relative to the number of host pages. */
503	uint64_t cMaxPages;
504	/** The number of pages that have been reserved.
505	* The deal is that cReservedPages - cOverCommittedPages <= cMaxPages. */
506	uint64_t cReservedPages;
507	/** The number of pages that we have over-committed in reservations. */
508	uint64_t cOverCommittedPages;
509	/** The number of actually allocated (committed if you like) pages. */
510	uint64_t cAllocatedPages;
511	/** The number of pages that are shared. A subset of cAllocatedPages. */
512	uint64_t cSharedPages;
513	/** The number of pages that are actually shared between VMs. */
514	uint64_t cDuplicatePages;
515	/** The number of shared pages that have been left behind by
516	* VMs not doing proper cleanups. */
517	uint64_t cLeftBehindSharedPages;
518	/** The number of allocation chunks.
519	* (The number of pages we've allocated from the host can be derived from this.) */
520	uint32_t cChunks;
521	/** The number of currently ballooned pages. */
522	uint64_t cBalloonedPages;
523
524	/** The legacy allocation mode indicator.
525	* This is determined at initialization time. */
526	bool fLegacyAllocationMode;
527	/** The bound memory mode indicator.
528	* When set, the memory will be bound to a specific VM and never
529	* shared. This is always set if fLegacyAllocationMode is set.
530	* (Also determined at initialization time.) */
531	bool fBoundMemoryMode;
532	/** The number of registered VMs. */
533	uint16_t cRegisteredVMs;
534
535	/** The number of freed chunks ever. This is used as a list generation to
536	* avoid restarting the cleanup scanning when the list wasn't modified. */
537	uint32_t cFreedChunks;
538	/** The previously allocated Chunk ID.
539	* Used as a hint to avoid scanning the whole bitmap. */
540	uint32_t idChunkPrev;
541	/** Chunk ID allocation bitmap.
542	* Bits of allocated IDs are set, free ones are clear.
543	* The NIL id (0) is marked allocated. */
544	uint32_t bmChunkId[(GMM_CHUNKID_LAST + 1 + 31) / 32];
545
546	/** The index of the next mutex to use. */
547	uint32_t iNextChunkMtx;
548	/** Chunk locks for reducing lock contention without having to allocate
549	* one lock per chunk. */
550	struct
551	{
552	/** The mutex */
553	RTSEMFASTMUTEX hMtx;
554	/** The number of threads currently using this mutex. */
555	uint32_t volatile cUsers;
556	} aChunkMtx[64];
557	} GMM;
558	/** Pointer to the GMM instance. */
559	typedef GMM *PGMM;
560
561	/** The value of GMM::u32Magic (Katsuhiro Otomo). */
562	#define GMM_MAGIC UINT32_C(0x19540414)
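
The chunk ID bitmap and the previous-ID hint documented in the structure above could work together roughly as follows. This is a minimal sketch with hypothetical names and a made-up ID range (the real GMM_CHUNKID_LAST is not shown in this listing); it uses a plain linear scan, which is not necessarily how the GMM code searches the bitmap.

```cpp
#include <cassert>
#include <cstdint>

constexpr uint32_t EX_CHUNKID_LAST = 1023; // hypothetical upper bound
constexpr uint32_t EX_NIL_CHUNKID  = 0;

struct ExChunkIdAllocator
{
    uint32_t bm[(EX_CHUNKID_LAST + 1 + 31) / 32] = {}; // set bit = allocated ID
    uint32_t idPrev = EX_NIL_CHUNKID;                   // hint: last ID handed out

    ExChunkIdAllocator() { setBit(EX_NIL_CHUNKID); }    // the NIL id is never handed out

    bool testBit(uint32_t id) const { return (bm[id / 32] >> (id % 32)) & 1u; }
    void setBit(uint32_t id)        { bm[id / 32] |=  (1u << (id % 32)); }
    void clearBit(uint32_t id)      { bm[id / 32] &= ~(1u << (id % 32)); }

    // Returns a free ID, or EX_NIL_CHUNKID if every ID is taken.
    uint32_t alloc()
    {
        // Start right after the previous allocation and wrap around once.
        for (uint32_t i = 1; i <= EX_CHUNKID_LAST + 1; i++)
        {
            uint32_t id = (idPrev + i) % (EX_CHUNKID_LAST + 1);
            if (!testBit(id))
            {
                setBit(id);
                idPrev = id;
                return id;
            }
        }
        return EX_NIL_CHUNKID;
    }

    void release(uint32_t id)
    {
        assert(id != EX_NIL_CHUNKID && testBit(id));
        clearBit(id);
    }
};
```

Marking the NIL id as allocated up front means the search loop never needs a special case for it.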
563
564
565	/**
566	* GMM chunk mutex state.
567	*
568	* This is returned by gmmR0ChunkMutexAcquire and is used by the other
569	* gmmR0ChunkMutex* methods.
570	*/
571	typedef struct GMMR0CHUNKMTXSTATE
572	{
573	PGMM pGMM;
574	/** The index of the chunk mutex. */
575	uint8_t iChunkMtx;
576	/** The relevant flags (GMMR0CHUNK_MTX_XXX). */
577	uint8_t fFlags;
578	} GMMR0CHUNKMTXSTATE;
579	/** Pointer to a chunk mutex state. */
580	typedef GMMR0CHUNKMTXSTATE *PGMMR0CHUNKMTXSTATE;
581
582	/** @name GMMR0CHUNK_MTX_XXX
583	* @{ */
584	#define GMMR0CHUNK_MTX_INVALID UINT32_C(0)
585	#define GMMR0CHUNK_MTX_KEEP_GIANT UINT32_C(1)
586	#define GMMR0CHUNK_MTX_RETAKE_GIANT UINT32_C(2)
587	#define GMMR0CHUNK_MTX_DROP_GIANT UINT32_C(3)
588	#define GMMR0CHUNK_MTX_END UINT32_C(4)
589	/** @} */
590
591
592	/*******************************************************************************
593	* Global Variables *
594	*******************************************************************************/
595	/** Pointer to the GMM instance data. */
596	static PGMM g_pGMM = NULL;
597
598	/** Macro for obtaining and validating the g_pGMM pointer.
599	*
600	* On failure it will return from the invoking function with the specified
601	* return value.
602	*
603	* @param pGMM The name of the pGMM variable.
604	* @param rc The return value on failure. Use VERR_GMM_INSTANCE for VBox
605	* status codes.
606	*/
607	#define GMM_GET_VALID_INSTANCE(pGMM, rc) \
608	do { \
609	(pGMM) = g_pGMM; \
610	AssertPtrReturn((pGMM), (rc)); \
611	AssertMsgReturn((pGMM)->u32Magic == GMM_MAGIC, ("%p - %#x\n", (pGMM), (pGMM)->u32Magic), (rc)); \
612	} while (0)
613
614	/** Macro for obtaining and validating the g_pGMM pointer, void function
615	* variant.
616	*
617	* On failure it will return from the invoking function.
618	*
619	* @param pGMM The name of the pGMM variable.
620	*/
621	#define GMM_GET_VALID_INSTANCE_VOID(pGMM) \
622	do { \
623	(pGMM) = g_pGMM; \
624	AssertPtrReturnVoid((pGMM)); \
625	AssertMsgReturnVoid((pGMM)->u32Magic == GMM_MAGIC, ("%p - %#x\n", (pGMM), (pGMM)->u32Magic)); \
626	} while (0)
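
The validation macros above rely on a `do { ... } while (0)` wrapper whose body returns from the invoking function. The pattern can be sketched in plain C++ as follows; the `Ex`/`EX_` names are hypothetical and an ordinary `if` stands in for the IPRT assertion macros.

```cpp
#include <cassert>
#include <cstdint>

constexpr uint32_t EX_MAGIC        = 0x19540414u;
constexpr int      EX_OK           = 0;
constexpr int      EX_ERR_INSTANCE = -1; // hypothetical stand-in for VERR_GMM_INSTANCE

struct ExInstance { uint32_t u32Magic; };
static ExInstance *g_pExInstance = nullptr;

// Loads the global pointer into pInst and returns rc from the *calling*
// function if the pointer is null or the magic does not match.
#define EX_GET_VALID_INSTANCE(pInst, rc) \
    do { \
        (pInst) = g_pExInstance; \
        if (!(pInst) || (pInst)->u32Magic != EX_MAGIC) \
            return (rc); \
    } while (0)

static int exDoWork()
{
    ExInstance *pInst;
    EX_GET_VALID_INSTANCE(pInst, EX_ERR_INSTANCE);
    // From here on pInst is known to point at a live instance.
    return EX_OK;
}
```

The `do { ... } while (0)` wrapper makes the macro behave like a single statement, so it stays safe inside an unbraced `if`/`else`.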
627
628
629	/** @def GMM_CHECK_SANITY_UPON_ENTERING
630	* Checks the sanity of the GMM instance data before making changes.
631	*
632	* This macro is a stub by default and must be enabled manually in the code.
633	*
634	* @returns true if sane, false if not.
635	* @param pGMM The name of the pGMM variable.
636	*/
637	#if defined(VBOX_STRICT) && defined(GMMR0_WITH_SANITY_CHECK) && 0
638	# define GMM_CHECK_SANITY_UPON_ENTERING(pGMM) (gmmR0SanityCheck((pGMM), __PRETTY_FUNCTION__, __LINE__) == 0)
639	#else
640	# define GMM_CHECK_SANITY_UPON_ENTERING(pGMM) (true)
641	#endif
642
643	/** @def GMM_CHECK_SANITY_UPON_LEAVING
644	* Checks the sanity of the GMM instance data after making changes.
645	*
646	* This macro is a stub by default and must be enabled manually in the code.
647	*
648	* @returns true if sane, false if not.
649	* @param pGMM The name of the pGMM variable.
650	*/
651	#if defined(VBOX_STRICT) && defined(GMMR0_WITH_SANITY_CHECK) && 0
652	# define GMM_CHECK_SANITY_UPON_LEAVING(pGMM) (gmmR0SanityCheck((pGMM), __PRETTY_FUNCTION__, __LINE__) == 0)
653	#else
654	# define GMM_CHECK_SANITY_UPON_LEAVING(pGMM) (true)
655	#endif
656
657	/** @def GMM_CHECK_SANITY_IN_LOOPS
658	* Checks the sanity of the GMM instance in the allocation loops.
659	*
660	* This macro is a stub by default and must be enabled manually in the code.
661	*
662	* @returns true if sane, false if not.
663	* @param pGMM The name of the pGMM variable.
664	*/
665	#if defined(VBOX_STRICT) && defined(GMMR0_WITH_SANITY_CHECK) && 0
666	# define GMM_CHECK_SANITY_IN_LOOPS(pGMM) (gmmR0SanityCheck((pGMM), __PRETTY_FUNCTION__, __LINE__) == 0)
667	#else
668	# define GMM_CHECK_SANITY_IN_LOOPS(pGMM) (true)
669	#endif
670
671
672	/*******************************************************************************
673	* Internal Functions *
674	*******************************************************************************/
675	static DECLCALLBACK(int) gmmR0TermDestroyChunk(PAVLU32NODECORE pNode, void *pvGMM);
676	static bool gmmR0CleanupVMScanChunk(PGMM pGMM, PGVM pGVM, PGMMCHUNK pChunk);
677	DECLINLINE(void) gmmR0UnlinkChunk(PGMMCHUNK pChunk);
678	DECLINLINE(void) gmmR0LinkChunk(PGMMCHUNK pChunk, PGMMCHUNKFREESET pSet);
679	DECLINLINE(void) gmmR0SelectSetAndLinkChunk(PGMM pGMM, PGVM pGVM, PGMMCHUNK pChunk);
680	#ifdef GMMR0_WITH_SANITY_CHECK
681	static uint32_t gmmR0SanityCheck(PGMM pGMM, const char *pszFunction, unsigned uLineNo);
682	#endif
683	static bool gmmR0FreeChunk(PGMM pGMM, PGVM pGVM, PGMMCHUNK pChunk, bool fRelaxedSem);
684	DECLINLINE(void) gmmR0FreePrivatePage(PGMM pGMM, PGVM pGVM, uint32_t idPage, PGMMPAGE pPage);
685	DECLINLINE(void) gmmR0FreeSharedPage(PGMM pGMM, PGVM pGVM, uint32_t idPage, PGMMPAGE pPage);
686	static int gmmR0UnmapChunkLocked(PGMM pGMM, PGVM pGVM, PGMMCHUNK pChunk);
687	#ifdef VBOX_WITH_PAGE_SHARING
688	static void gmmR0SharedModuleCleanup(PGMM pGMM, PGVM pGVM);
689	#endif
690
691
692
693	/**
694	* Initializes the GMM component.
695	*
696	* This is called when the VMMR0.r0 module is loaded and protected by the
697	* loader semaphore.
698	*
699	* @returns VBox status code.
700	*/
701	GMMR0DECL(int) GMMR0Init(void)
702	{
703	LogFlow(("GMMInit:\n"));
704
705	/*
706	* Allocate the instance data and the locks.
707	*/
708	PGMM pGMM = (PGMM)RTMemAllocZ(sizeof(*pGMM));
709	if (!pGMM)
710	return VERR_NO_MEMORY;
711
712	pGMM->u32Magic = GMM_MAGIC;
713	for (unsigned i = 0; i < RT_ELEMENTS(pGMM->ChunkTLB.aEntries); i++)
714	pGMM->ChunkTLB.aEntries[i].idChunk = NIL_GMM_CHUNKID;
715	RTListInit(&pGMM->ChunkList);
716	ASMBitSet(&pGMM->bmChunkId[0], NIL_GMM_CHUNKID);
717
718	int rc = RTSemFastMutexCreate(&pGMM->hMtx);
719	if (RT_SUCCESS(rc))
720	{
721	unsigned iMtx;
722	for (iMtx = 0; iMtx < RT_ELEMENTS(pGMM->aChunkMtx); iMtx++)
723	{
724	rc = RTSemFastMutexCreate(&pGMM->aChunkMtx[iMtx].hMtx);
725	if (RT_FAILURE(rc))
726	break;
727	}
728	if (RT_SUCCESS(rc))
729	{
730	/*
731	* Check and see if RTR0MemObjAllocPhysNC works.
732	*/
733	#if 0 /* later, see #3170. */
734	RTR0MEMOBJ MemObj;
735	rc = RTR0MemObjAllocPhysNC(&MemObj, _64K, NIL_RTHCPHYS);
736	if (RT_SUCCESS(rc))
737	{
738	rc = RTR0MemObjFree(MemObj, true);
739	AssertRC(rc);
740	}
741	else if (rc == VERR_NOT_SUPPORTED)
742	pGMM->fLegacyAllocationMode = pGMM->fBoundMemoryMode = true;
743	else
744	SUPR0Printf("GMMR0Init: RTR0MemObjAllocPhysNC(,64K,Any) -> %d!\n", rc);
745	#else
746	# if defined(RT_OS_WINDOWS) || (defined(RT_OS_SOLARIS) && ARCH_BITS == 64) || defined(RT_OS_LINUX) || defined(RT_OS_FREEBSD)
747	pGMM->fLegacyAllocationMode = false;
748	# if ARCH_BITS == 32
749	/* Don't reuse possibly partial chunks because of the virtual
750	address space limitation. */
751	pGMM->fBoundMemoryMode = true;
752	# else
753	pGMM->fBoundMemoryMode = false;
754	# endif
755	# else
756	pGMM->fLegacyAllocationMode = true;
757	pGMM->fBoundMemoryMode = true;
758	# endif
759	#endif
760
761	/*
762	* Query system page count and guess a reasonable cMaxPages value.
763	*/
764	pGMM->cMaxPages = UINT32_MAX; /** @todo IPRT function for query ram size and such. */
765
766	g_pGMM = pGMM;
767	LogFlow(("GMMInit: pGMM=%p fLegacyAllocationMode=%RTbool fBoundMemoryMode=%RTbool\n", pGMM, pGMM->fLegacyAllocationMode, pGMM->fBoundMemoryMode));
768	return VINF_SUCCESS;
769	}
770
771	/*
772	* Bail out.
773	*/
774	while (iMtx-- > 0)
775	RTSemFastMutexDestroy(pGMM->aChunkMtx[iMtx].hMtx);
776	RTSemFastMutexDestroy(pGMM->hMtx);
777	}
778
779	pGMM->u32Magic = 0;
780	RTMemFree(pGMM);
781	SUPR0Printf("GMMR0Init: failed! rc=%d\n", rc);
782	return rc;
783	}
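
The bail-out path in GMMR0Init destroys only the chunk mutexes that were successfully created, walking back from the index where creation failed. The same idiom is sketched below in isolation; `ExLock`, `exLockCreate` and the failure-injection parameter are hypothetical and exist only to make the rollback observable.

```cpp
#include <cassert>
#include <cstddef>

constexpr size_t EX_NUM_LOCKS = 4;

struct ExLock { bool fCreated = false; };

static int g_cExDestroyed = 0; // counts rollback work, for illustration

// Hypothetical creation routine that can be told to fail at a given index.
static int exLockCreate(ExLock *pLock, size_t idx, size_t idxFail)
{
    if (idx == idxFail)
        return -1;
    pLock->fCreated = true;
    return 0;
}

static void exLockDestroy(ExLock *pLock)
{
    pLock->fCreated = false;
    g_cExDestroyed++;
}

// Creates all locks; on failure destroys only the ones already created.
static int exInitLocks(ExLock (&aLocks)[EX_NUM_LOCKS], size_t idxFail)
{
    int    rc = 0;
    size_t i;
    for (i = 0; i < EX_NUM_LOCKS; i++)
    {
        rc = exLockCreate(&aLocks[i], i, idxFail);
        if (rc != 0)
            break;
    }
    if (rc == 0)
        return 0;

    while (i-- > 0)                 // i starts at the index of the failed create
        exLockDestroy(&aLocks[i]);
    return rc;
}
```

The post-decrement in `while (i-- > 0)` skips the entry whose creation failed and stops cleanly at index 0 even though `i` is unsigned.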
784
785
786	/**
787	* Terminates the GMM component.
788	*/
789	GMMR0DECL(void) GMMR0Term(void)
790	{
791	LogFlow(("GMMTerm:\n"));
792
793	/*
794	* Take care / be paranoid...
795	*/
796	PGMM pGMM = g_pGMM;
797	if (!VALID_PTR(pGMM))
798	return;
799	if (pGMM->u32Magic != GMM_MAGIC)
800	{
801	SUPR0Printf("GMMR0Term: u32Magic=%#x\n", pGMM->u32Magic);
802	return;
803	}
804
805	/*
806	* Undo what init did and free all the resources we've acquired.
807	*/
808	/* Destroy the fundamentals. */
809	g_pGMM = NULL;
810	pGMM->u32Magic = ~GMM_MAGIC;
811	RTSemFastMutexDestroy(pGMM->hMtx);
812	pGMM->hMtx = NIL_RTSEMFASTMUTEX;
813
814	/* Free any chunks still hanging around. */
815	RTAvlU32Destroy(&pGMM->pChunks, gmmR0TermDestroyChunk, pGMM);
816
817	/* Destroy the chunk locks. */
818	for (unsigned iMtx = 0; iMtx < RT_ELEMENTS(pGMM->aChunkMtx); iMtx++)
819	{
820	Assert(pGMM->aChunkMtx[iMtx].cUsers == 0);
821	RTSemFastMutexDestroy(pGMM->aChunkMtx[iMtx].hMtx);
822	pGMM->aChunkMtx[iMtx].hMtx = NIL_RTSEMFASTMUTEX;
823	}
824
825	/* Finally the instance data itself. */
826	RTMemFree(pGMM);
827	LogFlow(("GMMTerm: done\n"));
828	}
829
830
831	/**
832	* RTAvlU32Destroy callback.
833	*
834	* @returns 0
835	* @param pNode The node to destroy.
836	* @param pvGMM The GMM handle.
837	*/
838	static DECLCALLBACK(int) gmmR0TermDestroyChunk(PAVLU32NODECORE pNode, void *pvGMM)
839	{
840	PGMMCHUNK pChunk = (PGMMCHUNK)pNode;
841
842	if (pChunk->cFree != (GMM_CHUNK_SIZE >> PAGE_SHIFT))
843	SUPR0Printf("GMMR0Term: %p/%#x: cFree=%d cPrivate=%d cShared=%d cMappings=%d\n", pChunk,
844	pChunk->Core.Key, pChunk->cFree, pChunk->cPrivate, pChunk->cShared, pChunk->cMappingsX);
845
846	int rc = RTR0MemObjFree(pChunk->hMemObj, true /* fFreeMappings */);
847	if (RT_FAILURE(rc))
848	{
849	SUPR0Printf("GMMR0Term: %p/%#x: RTR0MemObjFree(%p,true) -> %d (cMappings=%d)\n", pChunk,
850	pChunk->Core.Key, pChunk->hMemObj, rc, pChunk->cMappingsX);
851	AssertRC(rc);
852	}
853	pChunk->hMemObj = NIL_RTR0MEMOBJ;
854
855	RTMemFree(pChunk->paMappingsX);
856	pChunk->paMappingsX = NULL;
857
858	RTMemFree(pChunk);
859	NOREF(pvGMM);
860	return 0;
861	}
862
863
864	/**
865	* Initializes the per-VM data for the GMM.
866	*
867	* This is called from within the GVMM lock (from GVMMR0CreateVM)
868	* and should only initialize the data members so GMMR0CleanupVM
869	* can deal with them. We reserve no memory or anything here,
870	* that's done later in GMMR0InitVM.
871	*
872	* @param pGVM Pointer to the Global VM structure.
873	*/
874	GMMR0DECL(void) GMMR0InitPerVMData(PGVM pGVM)
875	{
876	AssertCompile(RT_SIZEOFMEMB(GVM,gmm.s) <= RT_SIZEOFMEMB(GVM,gmm.padding));
877
878	pGVM->gmm.s.Stats.enmPolicy = GMMOCPOLICY_INVALID;
879	pGVM->gmm.s.Stats.enmPriority = GMMPRIORITY_INVALID;
880	pGVM->gmm.s.Stats.fMayAllocate = false;
881	}
882
883
884	/**
885	* Acquires the GMM giant lock.
886	*
887	* @returns Assert status code from RTSemFastMutexRequest.
888	* @param pGMM Pointer to the GMM instance.
889	*/
890	static int gmmR0MutexAcquire(PGMM pGMM)
891	{
892	ASMAtomicIncU32(&pGMM->cMtxContenders);
893	int rc = RTSemFastMutexRequest(pGMM->hMtx);
894	ASMAtomicDecU32(&pGMM->cMtxContenders);
895	AssertRC(rc);
896	#ifdef VBOX_STRICT
897	pGMM->hMtxOwner = RTThreadNativeSelf();
898	#endif
899	return rc;
900	}
901
902
903	/**
904	* Releases the GMM giant lock.
905	*
906	* @returns Assert status code from RTSemFastMutexRelease.
907	* @param pGMM Pointer to the GMM instance.
908	*/
909	static int gmmR0MutexRelease(PGMM pGMM)
910	{
911	#ifdef VBOX_STRICT
912	pGMM->hMtxOwner = NIL_RTNATIVETHREAD;
913	#endif
914	int rc = RTSemFastMutexRelease(pGMM->hMtx);
915	AssertRC(rc);
916	return rc;
917	}
918
919
920	/**
921	* Yields the GMM giant lock if there is contention and a certain minimum time
922	* has elapsed since we took it.
923	*
924	* @returns @c true if the mutex was yielded, @c false if not.
925	* @param pGMM Pointer to the GMM instance.
926	* @param puLockNanoTS Where the lock acquisition time stamp is kept
927	* (in/out).
928	*/
929	static bool gmmR0MutexYield(PGMM pGMM, uint64_t *puLockNanoTS)
930	{
931	/*
932	* If nobody is contending the mutex, don't bother checking the time.
933	*/
934	if (ASMAtomicReadU32(&pGMM->cMtxContenders) == 0)
935	return false;
936
937	/*
938	* Don't yield if we haven't executed for at least 2 milliseconds.
939	*/
940	uint64_t uNanoNow = RTTimeSystemNanoTS();
941	if (uNanoNow - *puLockNanoTS < UINT32_C(2000000))
942	return false;
943
944	/*
945	* Yield the mutex.
946	*/
947	#ifdef VBOX_STRICT
948	pGMM->hMtxOwner = NIL_RTNATIVETHREAD;
949	#endif
950	ASMAtomicIncU32(&pGMM->cMtxContenders);
951	int rc1 = RTSemFastMutexRelease(pGMM->hMtx); AssertRC(rc1);
952
953	RTThreadYield();
954
955	int rc2 = RTSemFastMutexRequest(pGMM->hMtx); AssertRC(rc2);
956	*puLockNanoTS = RTTimeSystemNanoTS();
957	ASMAtomicDecU32(&pGMM->cMtxContenders);
958	#ifdef VBOX_STRICT
959	pGMM->hMtxOwner = RTThreadNativeSelf();
960	#endif
961
962	return true;
963	}
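
The yield decision in gmmR0MutexYield can be read in isolation from the locking calls. The following is a minimal sketch of just that decision, assuming a hypothetical `YieldState` struct and `shouldYield` helper that are not part of GMMR0.cpp; only the contender check and the 2 millisecond threshold are taken from the function above.

```cpp
#include <cassert>
#include <cstdint>

// Hypothetical stand-in for the two values gmmR0MutexYield inspects.
struct YieldState
{
    uint32_t cContenders;   // threads currently waiting for the giant lock
    uint64_t uLockNanoTS;   // when the current owner took the lock, in ns
};

// Minimum hold time before yielding: 2 milliseconds, as in the code above.
static const uint64_t kMinHoldNs = UINT64_C(2000000);

// Returns true when the lock owner should release and retake the lock.
static bool shouldYield(const YieldState &state, uint64_t uNanoNow)
{
    assert(uNanoNow >= state.uLockNanoTS);

    // Nobody is waiting, so there is no point in checking the time.
    if (state.cContenders == 0)
        return false;

    // Somebody is waiting; yield only once the minimum hold time has passed.
    return uNanoNow - state.uLockNanoTS >= kMinHoldNs;
}
```

A caller would refresh `uLockNanoTS` after retaking the lock, which is what the in/out `puLockNanoTS` parameter does in the real function.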
964
965
966	/**
967	* Acquires a chunk lock.
968	*
969	* The caller must own the giant lock.
970	*
971	* @returns Assert status code from RTSemFastMutexRequest.
972	* @param pMtxState The chunk mutex state info. (Avoids
973	* passing the same flags and stuff around
974	* for subsequent release and drop-giant
975	* calls.)
976	* @param pGMM Pointer to the GMM instance.
977	* @param pChunk Pointer to the chunk.
978	* @param fFlags Flags regarding the giant lock, GMMR0CHUNK_MTX_XXX.
979	*/
980	static int gmmR0ChunkMutexAcquire(PGMMR0CHUNKMTXSTATE pMtxState, PGMM pGMM, PGMMCHUNK pChunk, uint32_t fFlags)
981	{
982	Assert(fFlags > GMMR0CHUNK_MTX_INVALID && fFlags < GMMR0CHUNK_MTX_END);
983	Assert(pGMM->hMtxOwner == RTThreadNativeSelf());
984
985	pMtxState->pGMM = pGMM;
986	pMtxState->fFlags = (uint8_t)fFlags;
987
988	/*
989	* Get the lock index and reference the lock.
990	*/
991	Assert(pGMM->hMtxOwner == RTThreadNativeSelf());
992	uint32_t iChunkMtx = pChunk->iChunkMtx;
993	if (iChunkMtx == UINT8_MAX)
994	{
995	iChunkMtx = pGMM->iNextChunkMtx++;
996	iChunkMtx %= RT_ELEMENTS(pGMM->aChunkMtx);
997
998	/* Try get an unused one... */
999	if (pGMM->aChunkMtx[iChunkMtx].cUsers)
1000	{
1001	iChunkMtx = pGMM->iNextChunkMtx++;
1002	iChunkMtx %= RT_ELEMENTS(pGMM->aChunkMtx);
1003	if (pGMM->aChunkMtx[iChunkMtx].cUsers)
1004	{
1005	iChunkMtx = pGMM->iNextChunkMtx++;
1006	iChunkMtx %= RT_ELEMENTS(pGMM->aChunkMtx);
1007	if (pGMM->aChunkMtx[iChunkMtx].cUsers)
1008	{
1009	iChunkMtx = pGMM->iNextChunkMtx++;
1010	iChunkMtx %= RT_ELEMENTS(pGMM->aChunkMtx);
1011	}
1012	}
1013	}
1014
1015	pChunk->iChunkMtx = iChunkMtx;
1016	}
1017	AssertCompile(RT_ELEMENTS(pGMM->aChunkMtx) < UINT8_MAX);
1018	pMtxState->iChunkMtx = (uint8_t)iChunkMtx;
1019	ASMAtomicIncU32(&pGMM->aChunkMtx[iChunkMtx].cUsers);
1020
1021	/*
1022	* Drop the giant?
1023	*/
1024	if (fFlags != GMMR0CHUNK_MTX_KEEP_GIANT)
1025	{
1026	/** @todo GMM life cycle cleanup (we may race someone
1027	* destroying and cleaning up GMM)? */
1028	gmmR0MutexRelease(pGMM);
1029	}
1030
1031	/*
1032	* Take the chunk mutex.
1033	*/
1034	int rc = RTSemFastMutexRequest(pGMM->aChunkMtx[iChunkMtx].hMtx);
1035	AssertRC(rc);
1036	return rc;
1037	}
1038
1039
1040	/**
1041	* Releases the chunk mutex acquired by gmmR0ChunkMutexAcquire.
1042	*
1043	* @returns Assert status code from RTSemFastMutexRelease or, when the giant
1044	* lock is retaken, from gmmR0MutexAcquire.
1045	* @param pMtxState Pointer to the chunk mutex state.
1046	* @param pChunk Pointer to the chunk if it's still
1047	* alive, NULL if it isn't. This is used to deassociate
1048	* the chunk from the mutex on the way out so a new one
1049	* can be selected next time, thus avoiding contended mutexes.
1050	*/
1051	static int gmmR0ChunkMutexRelease(PGMMR0CHUNKMTXSTATE pMtxState, PGMMCHUNK pChunk)
1052	{
1053	PGMM pGMM = pMtxState->pGMM;
1054
1055	/*
1056	* Release the chunk mutex and reacquire the giant if requested.
1057	*/
1058	int rc = RTSemFastMutexRelease(pGMM->aChunkMtx[pMtxState->iChunkMtx].hMtx);
1059	AssertRC(rc);
1060	if (pMtxState->fFlags == GMMR0CHUNK_MTX_RETAKE_GIANT)
1061	rc = gmmR0MutexAcquire(pGMM);
1062	else
1063	Assert((pMtxState->fFlags != GMMR0CHUNK_MTX_DROP_GIANT) == (pGMM->hMtxOwner == RTThreadNativeSelf()));
1064
1065	/*
1066	* Drop the chunk mutex user reference and deassociate it from the chunk
1067	* when possible.
1068	*/
1069	if ( ASMAtomicDecU32(&pGMM->aChunkMtx[pMtxState->iChunkMtx].cUsers) == 0
1070	&& pChunk
1071	&& RT_SUCCESS(rc) )
1072	{
1073	if (pMtxState->fFlags != GMMR0CHUNK_MTX_DROP_GIANT)
1074	pChunk->iChunkMtx = UINT8_MAX;
1075	else
1076	{
1077	rc = gmmR0MutexAcquire(pGMM);
1078	if (RT_SUCCESS(rc))
1079	{
1080	if (pGMM->aChunkMtx[pMtxState->iChunkMtx].cUsers == 0)
1081	pChunk->iChunkMtx = UINT8_MAX;
1082	rc = gmmR0MutexRelease(pGMM);
1083	}
1084	}
1085	}
1086
1087	pMtxState->pGMM = NULL;
1088	return rc;
1089	}
1090
1091
1092	/**
1093	* Drops the giant GMM lock we kept in gmmR0ChunkMutexAcquire while keeping the
1094	* chunk locked.
1095	*
1096	* This only works if gmmR0ChunkMutexAcquire was called with
1097	* GMMR0CHUNK_MTX_KEEP_GIANT. gmmR0ChunkMutexRelease will retake the giant
1098	* mutex, i.e. behave as if GMMR0CHUNK_MTX_RETAKE_GIANT was used.
1099	*
1100	* @returns VBox status code (assuming success is ok).
1101	* @param pMtxState Pointer to the chunk mutex state.
1102	*/
1103	static int gmmR0ChunkMutexDropGiant(PGMMR0CHUNKMTXSTATE pMtxState)
1104	{
1105	AssertReturn(pMtxState->fFlags == GMMR0CHUNK_MTX_KEEP_GIANT, VERR_GMM_MTX_FLAGS);
1106	Assert(pMtxState->pGMM->hMtxOwner == RTThreadNativeSelf());
1107	pMtxState->fFlags = GMMR0CHUNK_MTX_RETAKE_GIANT;
1108	/** @todo GMM life cycle cleanup (we may race someone
1109	* destroying and cleaning up GMM)? */
1110	return gmmR0MutexRelease(pMtxState->pGMM);
1111	}
1112
1113
1114	/**
1115	* For experimenting with NUMA affinity and such.
1116	*
1117	* @returns The current NUMA Node ID.
1118	*/
1119	static uint16_t gmmR0GetCurrentNumaNodeId(void)
1120	{
1121	#if 1
1122	return GMM_CHUNK_NUMA_ID_UNKNOWN;
1123	#else
1124	return RTMpCpuId() / 16;
1125	#endif
1126	}
1127
1128
1129
1130	/**
1131	* Cleans up when a VM is terminating.
1132	*
1133	* @param pGVM Pointer to the Global VM structure.
1134	*/
1135	GMMR0DECL(void) GMMR0CleanupVM(PGVM pGVM)
1136	{
1137	LogFlow(("GMMR0CleanupVM: pGVM=%p:{.pVM=%p, .hSelf=%#x}\n", pGVM, pGVM->pVM, pGVM->hSelf));
1138
1139	PGMM pGMM;
1140	GMM_GET_VALID_INSTANCE_VOID(pGMM);
1141
1142	#ifdef VBOX_WITH_PAGE_SHARING
1143	/*
1144	* Clean up all registered shared modules first.
1145	*/
1146	gmmR0SharedModuleCleanup(pGMM, pGVM);
1147	#endif
1148
1149	gmmR0MutexAcquire(pGMM);
1150	uint64_t uLockNanoTS = RTTimeSystemNanoTS();
1151	GMM_CHECK_SANITY_UPON_ENTERING(pGMM);
1152
1153	/*
1154	* The policy is 'INVALID' until the initial reservation
1155	* request has been serviced.
1156	*/
1157	if ( pGVM->gmm.s.Stats.enmPolicy > GMMOCPOLICY_INVALID
1158	&& pGVM->gmm.s.Stats.enmPolicy < GMMOCPOLICY_END)
1159	{
1160	/*
1161	* If it's the last VM around, we can skip walking all the chunks looking
1162	* for the pages owned by this VM and instead flush the whole shebang.
1163	*
1164	* This takes care of the eventuality that a VM has left shared page
1165	* references behind (shouldn't happen of course, but you never know).
1166	*/
1167	Assert(pGMM->cRegisteredVMs);
1168	pGMM->cRegisteredVMs--;
1169
1170	/*
1171	* Walk the entire pool looking for pages that belong to this VM
1172	* and leftover mappings. (This'll only catch private pages,
1173	* shared pages will be 'left behind'.)
1174	*/
1175	uint64_t cPrivatePages = pGVM->gmm.s.Stats.cPrivatePages; /* save */
1176
1177	unsigned iCountDown = 64;
1178	bool fRedoFromStart;
1179	PGMMCHUNK pChunk;
1180	do
1181	{
1182	fRedoFromStart = false;
1183	RTListForEachReverse(&pGMM->ChunkList, pChunk, GMMCHUNK, ListNode)
1184	{
1185	uint32_t const cFreeChunksOld = pGMM->cFreedChunks;
1186	if (gmmR0CleanupVMScanChunk(pGMM, pGVM, pChunk))
1187	{
1188	/* We left the giant mutex, so reset the yield counters. */
1189	uLockNanoTS = RTTimeSystemNanoTS();
1190	iCountDown = 64;
1191	}
1192	else
1193	{
1194	/* Didn't leave it, so do normal yielding. */
1195	if (!iCountDown)
1196	gmmR0MutexYield(pGMM, &uLockNanoTS);
1197	else
1198	iCountDown--;
1199	}
1200	fRedoFromStart = pGMM->cFreedChunks != cFreeChunksOld; /* a chunk was freed, the list changed: rescan */
1201	if (fRedoFromStart) break;
1202	}
1203	} while (fRedoFromStart);
1204
1205	if (pGVM->gmm.s.Stats.cPrivatePages)
1206	SUPR0Printf("GMMR0CleanupVM: hGVM=%#x has %#x private pages that cannot be found!\n", pGVM->hSelf, pGVM->gmm.s.Stats.cPrivatePages);
1207
1208	pGMM->cAllocatedPages -= cPrivatePages;
1209
1210	/*
1211	* Free empty chunks.
1212	*/
1213	PGMMCHUNKFREESET pPrivateSet = pGMM->fBoundMemoryMode ? &pGVM->gmm.s.Private : &pGMM->PrivateX;
1214	do
1215	{
1216	fRedoFromStart = false;
1217	iCountDown = 10240;
1218	pChunk = pPrivateSet->apLists[GMM_CHUNK_FREE_SET_UNUSED_LIST];
1219	while (pChunk)
1220	{
1221	PGMMCHUNK pNext = pChunk->pFreeNext;
1222	Assert(pChunk->cFree == GMM_CHUNK_NUM_PAGES);
1223	if ( !pGMM->fBoundMemoryMode
1224	|| pChunk->hGVM == pGVM->hSelf)
1225	{
1226	uint64_t const idGenerationOld = pPrivateSet->idGeneration;
1227	if (gmmR0FreeChunk(pGMM, pGVM, pChunk, true /*fRelaxedSem*/))
1228	{
1229	/* We've left the giant mutex, restart? (+1 for our unlink) */
1230	fRedoFromStart = pPrivateSet->idGeneration != idGenerationOld + 1;
1231	if (fRedoFromStart)
1232	break;
1233	uLockNanoTS = RTTimeSystemNanoTS();
1234	iCountDown = 10240;
1235	}
1236	}
1237
1238	/* Advance and maybe yield the lock. */
1239	pChunk = pNext;
1240	if (--iCountDown == 0)
1241	{
1242	uint64_t const idGenerationOld = pPrivateSet->idGeneration;
1243	fRedoFromStart = gmmR0MutexYield(pGMM, &uLockNanoTS)
1244	&& pPrivateSet->idGeneration != idGenerationOld;
1245	if (fRedoFromStart)
1246	break;
1247	iCountDown = 10240;
1248	}
1249	}
1250	} while (fRedoFromStart);
1251
1252	/*
1253	* Account for shared pages that weren't freed.
1254	*/
1255	if (pGVM->gmm.s.Stats.cSharedPages)
1256	{
1257	Assert(pGMM->cSharedPages >= pGVM->gmm.s.Stats.cSharedPages);
1258	SUPR0Printf("GMMR0CleanupVM: hGVM=%#x left %#x shared pages behind!\n", pGVM->hSelf, pGVM->gmm.s.Stats.cSharedPages);
1259	pGMM->cLeftBehindSharedPages += pGVM->gmm.s.Stats.cSharedPages;
1260	}
1261
1262	/*
1263	* Clean up balloon statistics in case the VM process crashed.
1264	*/
1265	Assert(pGMM->cBalloonedPages >= pGVM->gmm.s.Stats.cBalloonedPages);
1266	pGMM->cBalloonedPages -= pGVM->gmm.s.Stats.cBalloonedPages;
1267
1268	/*
1269	* Update the over-commitment management statistics.
1270	*/
1271	pGMM->cReservedPages -= pGVM->gmm.s.Stats.Reserved.cBasePages
1272	+ pGVM->gmm.s.Stats.Reserved.cFixedPages
1273	+ pGVM->gmm.s.Stats.Reserved.cShadowPages;
1274	switch (pGVM->gmm.s.Stats.enmPolicy)
1275	{
1276	case GMMOCPOLICY_NO_OC:
1277	break;
1278	default:
1279	/** @todo Update GMM->cOverCommittedPages */
1280	break;
1281	}
1282	}
1283
1284	/* zap the GVM data. */
1285	pGVM->gmm.s.Stats.enmPolicy = GMMOCPOLICY_INVALID;
1286	pGVM->gmm.s.Stats.enmPriority = GMMPRIORITY_INVALID;
1287	pGVM->gmm.s.Stats.fMayAllocate = false;
1288
1289	GMM_CHECK_SANITY_UPON_LEAVING(pGMM);
1290	gmmR0MutexRelease(pGMM);
1291
1292	LogFlow(("GMMR0CleanupVM: returns\n"));
1293	}
1294
1295
1296	/**
1297	* Scan one chunk for private pages belonging to the specified VM.
1298	*
1299	* @note This function may drop the giant mutex!
1300	*
1301	* @returns @c true if we've temporarily dropped the giant mutex, @c false if
1302	* we didn't.
1303	* @param pGMM Pointer to the GMM instance.
1304	* @param pGVM The global VM handle.
1305	* @param pChunk The chunk to scan.
1306	*/
1307	static bool gmmR0CleanupVMScanChunk(PGMM pGMM, PGVM pGVM, PGMMCHUNK pChunk)
1308	{
1309	/*
1310	* Look for pages belonging to the VM.
1311	* (Perform some internal checks while we're scanning.)
1312	*/
1313	#ifndef VBOX_STRICT
1314	if (pChunk->cFree != (GMM_CHUNK_SIZE >> PAGE_SHIFT))
1315	#endif
1316	{
1317	unsigned cPrivate = 0;
1318	unsigned cShared = 0;
1319	unsigned cFree = 0;
1320
1321	gmmR0UnlinkChunk(pChunk); /* avoiding cFreePages updates. */
1322
1323	uint16_t hGVM = pGVM->hSelf;
1324	unsigned iPage = (GMM_CHUNK_SIZE >> PAGE_SHIFT);
1325	while (iPage-- > 0)
1326	if (GMM_PAGE_IS_PRIVATE(&pChunk->aPages[iPage]))
1327	{
1328	if (pChunk->aPages[iPage].Private.hGVM == hGVM)
1329	{
1330	/*
1331	* Free the page.
1332	*
1333	* The reason for not using gmmR0FreePrivatePage here is that we
1334	* must not cause the chunk to be freed from under us - the caller
1335	* is in the middle of walking the chunk list.
1336	*/
1337	pChunk->aPages[iPage].u = 0;
1338	pChunk->aPages[iPage].Free.iNext = pChunk->iFreeHead;
1339	pChunk->aPages[iPage].Free.u2State = GMM_PAGE_STATE_FREE;
1340	pChunk->iFreeHead = iPage;
1341	pChunk->cPrivate--;
1342	pChunk->cFree++;
1343	pGVM->gmm.s.Stats.cPrivatePages--;
1344	cFree++;
1345	}
1346	else
1347	cPrivate++;
1348	}
1349	else if (GMM_PAGE_IS_FREE(&pChunk->aPages[iPage]))
1350	cFree++;
1351	else
1352	cShared++;
1353
1354	gmmR0SelectSetAndLinkChunk(pGMM, pGVM, pChunk);
1355
1356	/*
1357	* Did it add up?
1358	*/
1359	if (RT_UNLIKELY( pChunk->cFree != cFree
1360	|| pChunk->cPrivate != cPrivate
1361	|| pChunk->cShared != cShared))
1362	{
1363	SUPR0Printf("gmmR0CleanupVMScanChunk: Chunk %p/%#x has bogus stats - free=%d/%d private=%d/%d shared=%d/%d\n",
1364	pChunk, pChunk->Core.Key, pChunk->cFree, cFree, pChunk->cPrivate, cPrivate, pChunk->cShared, cShared);
1365	pChunk->cFree = cFree;
1366	pChunk->cPrivate = cPrivate;
1367	pChunk->cShared = cShared;
1368	}
1369	}
1370
1371	/*
1372	* If not in bound memory mode, we should reset the hGVM field
1373	* if it has our handle in it.
1374	*/
1375	if (pChunk->hGVM == pGVM->hSelf)
1376	{
1377	if (!g_pGMM->fBoundMemoryMode)
1378	pChunk->hGVM = NIL_GVM_HANDLE;
1379	else if (pChunk->cFree != GMM_CHUNK_NUM_PAGES)
1380	{
1381	SUPR0Printf("gmmR0CleanupVMScanChunk: %p/%#x: cFree=%#x - it should be 0 in bound mode!\n",
1382	pChunk, pChunk->Core.Key, pChunk->cFree);
1383	AssertMsgFailed(("%p/%#x: cFree=%#x - it should be 0 in bound mode!\n", pChunk, pChunk->Core.Key, pChunk->cFree));
1384
1385	gmmR0UnlinkChunk(pChunk);
1386	pChunk->cFree = GMM_CHUNK_NUM_PAGES;
1387	gmmR0SelectSetAndLinkChunk(pGMM, pGVM, pChunk);
1388	}
1389	}
1390
1391	/*
1392	* Look for a mapping belonging to the terminating VM.
1393	*/
1394	GMMR0CHUNKMTXSTATE MtxState;
1395	gmmR0ChunkMutexAcquire(&MtxState, pGMM, pChunk, GMMR0CHUNK_MTX_KEEP_GIANT);
1396	unsigned cMappings = pChunk->cMappingsX;
1397	for (unsigned i = 0; i < cMappings; i++)
1398	if (pChunk->paMappingsX[i].pGVM == pGVM)
1399	{
1400	gmmR0ChunkMutexDropGiant(&MtxState);
1401
1402	RTR0MEMOBJ hMemObj = pChunk->paMappingsX[i].hMapObj;
1403
1404	cMappings--;
1405	if (i < cMappings)
1406	pChunk->paMappingsX[i] = pChunk->paMappingsX[cMappings];
1407	pChunk->paMappingsX[cMappings].pGVM = NULL;
1408	pChunk->paMappingsX[cMappings].hMapObj = NIL_RTR0MEMOBJ;
1409	Assert(pChunk->cMappingsX - 1U == cMappings);
1410	pChunk->cMappingsX = cMappings;
1411
1412	int rc = RTR0MemObjFree(hMemObj, false /* fFreeMappings (NA) */);
1413	if (RT_FAILURE(rc))
1414	{
1415	SUPR0Printf("gmmR0CleanupVMScanChunk: %p/%#x: mapping #%x: RTR0MemObjFree(%p,false) -> %d \n",
1416	pChunk, pChunk->Core.Key, i, hMemObj, rc);
1417	AssertRC(rc);
1418	}
1419
1420	gmmR0ChunkMutexRelease(&MtxState, pChunk);
1421	return true;
1422	}
1423
1424	gmmR0ChunkMutexRelease(&MtxState, pChunk);
1425	return false;
1426	}
1427
1428
1429	/**
1430	* The initial resource reservations.
1431	*
1432	* This will make memory reservations according to policy and priority. If there aren't
1433	* sufficient resources available to sustain the VM, this function will fail and all
1434	* future allocation requests will fail as well.
1435	*
1436	* These are just the initial reservations made very early during the VM creation
1437	* process and will be adjusted later in the GMMR0UpdateReservation call after the
1438	* ring-3 init has completed.
1439	*
1440	* @returns VBox status code.
1441	* @retval VERR_GMM_MEMORY_RESERVATION_DECLINED
1442	* @retval VERR_GMM_
1443	*
1444	* @param pVM Pointer to the shared VM structure.
1445	* @param idCpu VCPU id
1446	* @param cBasePages The number of pages that may be allocated for the base RAM and ROMs.
1447	* This does not include MMIO2 and similar.
1448	* @param cShadowPages The number of pages that may be allocated for shadow paging structures.
1449	* @param cFixedPages The number of pages that may be allocated for fixed objects like the
1450	* hyper heap, MMIO2 and similar.
1451	* @param enmPolicy The OC policy to use on this VM.
1452	* @param enmPriority The priority in an out-of-memory situation.
1453	*
1454	* @thread The creator thread / EMT.
1455	*/
1456	GMMR0DECL(int) GMMR0InitialReservation(PVM pVM, VMCPUID idCpu, uint64_t cBasePages, uint32_t cShadowPages, uint32_t cFixedPages,
1457	GMMOCPOLICY enmPolicy, GMMPRIORITY enmPriority)
1458	{
1459	LogFlow(("GMMR0InitialReservation: pVM=%p cBasePages=%#llx cShadowPages=%#x cFixedPages=%#x enmPolicy=%d enmPriority=%d\n",
1460	pVM, cBasePages, cShadowPages, cFixedPages, enmPolicy, enmPriority));
1461
1462	/*
1463	* Validate, get basics and take the semaphore.
1464	*/
1465	PGMM pGMM;
1466	GMM_GET_VALID_INSTANCE(pGMM, VERR_GMM_INSTANCE);
1467	PGVM pGVM;
1468	int rc = GVMMR0ByVMAndEMT(pVM, idCpu, &pGVM);
1469	if (RT_FAILURE(rc))
1470	return rc;
1471
1472	AssertReturn(cBasePages, VERR_INVALID_PARAMETER);
1473	AssertReturn(cShadowPages, VERR_INVALID_PARAMETER);
1474	AssertReturn(cFixedPages, VERR_INVALID_PARAMETER);
1475	AssertReturn(enmPolicy > GMMOCPOLICY_INVALID && enmPolicy < GMMOCPOLICY_END, VERR_INVALID_PARAMETER);
1476	AssertReturn(enmPriority > GMMPRIORITY_INVALID && enmPriority < GMMPRIORITY_END, VERR_INVALID_PARAMETER);
1477
1478	gmmR0MutexAcquire(pGMM);
1479	if (GMM_CHECK_SANITY_UPON_ENTERING(pGMM))
1480	{
1481	if ( !pGVM->gmm.s.Stats.Reserved.cBasePages
1482	&& !pGVM->gmm.s.Stats.Reserved.cFixedPages
1483	&& !pGVM->gmm.s.Stats.Reserved.cShadowPages)
1484	{
1485	/*
1486	* Check if we can accommodate this.
1487	*/
1488	/* ... later ... */
1489	if (RT_SUCCESS(rc))
1490	{
1491	/*
1492	* Update the records.
1493	*/
1494	pGVM->gmm.s.Stats.Reserved.cBasePages = cBasePages;
1495	pGVM->gmm.s.Stats.Reserved.cFixedPages = cFixedPages;
1496	pGVM->gmm.s.Stats.Reserved.cShadowPages = cShadowPages;
1497	pGVM->gmm.s.Stats.enmPolicy = enmPolicy;
1498	pGVM->gmm.s.Stats.enmPriority = enmPriority;
1499	pGVM->gmm.s.Stats.fMayAllocate = true;
1500
1501	pGMM->cReservedPages += cBasePages + cFixedPages + cShadowPages;
1502	pGMM->cRegisteredVMs++;
1503	}
1504	}
1505	else
1506	rc = VERR_WRONG_ORDER;
1507	GMM_CHECK_SANITY_UPON_LEAVING(pGMM);
1508	}
1509	else
1510	rc = VERR_GMM_IS_NOT_SANE;
1511	gmmR0MutexRelease(pGMM);
1512	LogFlow(("GMMR0InitialReservation: returns %Rrc\n", rc));
1513	return rc;
1514	}
1515
1516
1517	/**
1518	* VMMR0 request wrapper for GMMR0InitialReservation.
1519	*
1520	* @returns see GMMR0InitialReservation.
1521	* @param pVM Pointer to the shared VM structure.
1522	* @param idCpu VCPU id
1523	* @param pReq The request packet.
1524	*/
1525	GMMR0DECL(int) GMMR0InitialReservationReq(PVM pVM, VMCPUID idCpu, PGMMINITIALRESERVATIONREQ pReq)
1526	{
1527	/*
1528	* Validate input and pass it on.
1529	*/
1530	AssertPtrReturn(pVM, VERR_INVALID_POINTER);
1531	AssertPtrReturn(pReq, VERR_INVALID_POINTER);
1532	AssertMsgReturn(pReq->Hdr.cbReq == sizeof(*pReq), ("%#x != %#x\n", pReq->Hdr.cbReq, sizeof(*pReq)), VERR_INVALID_PARAMETER);
1533
1534	return GMMR0InitialReservation(pVM, idCpu, pReq->cBasePages, pReq->cShadowPages, pReq->cFixedPages, pReq->enmPolicy, pReq->enmPriority);
1535	}
1536
1537
1538	/**
1539	* This updates the memory reservation with the additional MMIO2 and ROM pages.
1540	*
1541	* @returns VBox status code.
1542	* @retval VERR_GMM_MEMORY_RESERVATION_DECLINED
1543	*
1544	* @param pVM Pointer to the shared VM structure.
1545	* @param idCpu VCPU id
1546	* @param cBasePages The number of pages that may be allocated for the base RAM and ROMs.
1547	* This does not include MMIO2 and similar.
1548	* @param cShadowPages The number of pages that may be allocated for shadow paging structures.
1549	* @param cFixedPages The number of pages that may be allocated for fixed objects like the
1550	* hyper heap, MMIO2 and similar.
1551	*
1552	* @thread EMT.
1553	*/
1554	GMMR0DECL(int) GMMR0UpdateReservation(PVM pVM, VMCPUID idCpu, uint64_t cBasePages, uint32_t cShadowPages, uint32_t cFixedPages)
1555	{
1556	LogFlow(("GMMR0UpdateReservation: pVM=%p cBasePages=%#llx cShadowPages=%#x cFixedPages=%#x\n",
1557	pVM, cBasePages, cShadowPages, cFixedPages));
1558
1559	/*
1560	* Validate, get basics and take the semaphore.
1561	*/
1562	PGMM pGMM;
1563	GMM_GET_VALID_INSTANCE(pGMM, VERR_GMM_INSTANCE);
1564	PGVM pGVM;
1565	int rc = GVMMR0ByVMAndEMT(pVM, idCpu, &pGVM);
1566	if (RT_FAILURE(rc))
1567	return rc;
1568
1569	AssertReturn(cBasePages, VERR_INVALID_PARAMETER);
1570	AssertReturn(cShadowPages, VERR_INVALID_PARAMETER);
1571	AssertReturn(cFixedPages, VERR_INVALID_PARAMETER);
1572
1573	gmmR0MutexAcquire(pGMM);
1574	if (GMM_CHECK_SANITY_UPON_ENTERING(pGMM))
1575	{
1576	if ( pGVM->gmm.s.Stats.Reserved.cBasePages
1577	&& pGVM->gmm.s.Stats.Reserved.cFixedPages
1578	&& pGVM->gmm.s.Stats.Reserved.cShadowPages)
1579	{
1580	/*
1581	* Check if we can accommodate this.
1582	*/
1583	/* ... later ... */
1584	if (RT_SUCCESS(rc))
1585	{
1586	/*
1587	* Update the records.
1588	*/
1589	pGMM->cReservedPages -= pGVM->gmm.s.Stats.Reserved.cBasePages
1590	+ pGVM->gmm.s.Stats.Reserved.cFixedPages
1591	+ pGVM->gmm.s.Stats.Reserved.cShadowPages;
1592	pGMM->cReservedPages += cBasePages + cFixedPages + cShadowPages;
1593
1594	pGVM->gmm.s.Stats.Reserved.cBasePages = cBasePages;
1595	pGVM->gmm.s.Stats.Reserved.cFixedPages = cFixedPages;
1596	pGVM->gmm.s.Stats.Reserved.cShadowPages = cShadowPages;
1597	}
1598	}
1599	else
1600	rc = VERR_WRONG_ORDER;
1601	GMM_CHECK_SANITY_UPON_LEAVING(pGMM);
1602	}
1603	else
1604	rc = VERR_GMM_IS_NOT_SANE;
1605	gmmR0MutexRelease(pGMM);
1606	LogFlow(("GMMR0UpdateReservation: returns %Rrc\n", rc));
1607	return rc;
1608	}
1609
1610
1611	/**
1612	* VMMR0 request wrapper for GMMR0UpdateReservation.
1613	*
1614	* @returns see GMMR0UpdateReservation.
1615	* @param pVM Pointer to the shared VM structure.
1616	* @param idCpu VCPU id
1617	* @param pReq The request packet.
1618	*/
1619	GMMR0DECL(int) GMMR0UpdateReservationReq(PVM pVM, VMCPUID idCpu, PGMMUPDATERESERVATIONREQ pReq)
1620	{
1621	/*
1622	* Validate input and pass it on.
1623	*/
1624	AssertPtrReturn(pVM, VERR_INVALID_POINTER);
1625	AssertPtrReturn(pReq, VERR_INVALID_POINTER);
1626	AssertMsgReturn(pReq->Hdr.cbReq == sizeof(*pReq), ("%#x != %#x\n", pReq->Hdr.cbReq, sizeof(*pReq)), VERR_INVALID_PARAMETER);
1627
1628	return GMMR0UpdateReservation(pVM, idCpu, pReq->cBasePages, pReq->cShadowPages, pReq->cFixedPages);
1629	}
1630
1631	#ifdef GMMR0_WITH_SANITY_CHECK
1632
1633	/**
1634	* Performs sanity checks on a free set.
1635	*
1636	* @returns Error count.
1637	*
1638	* @param pGMM Pointer to the GMM instance.
1639	* @param pSet Pointer to the set.
1640	* @param pszSetName The set name.
1641	* @param pszFunction The function from which it was called.
1642	* @param uLineNo The line number.
1643	*/
1644	static uint32_t gmmR0SanityCheckSet(PGMM pGMM, PGMMCHUNKFREESET pSet, const char *pszSetName,
1645	const char *pszFunction, unsigned uLineNo)
1646	{
1647	uint32_t cErrors = 0;
1648
1649	/*
1650	* Count the free pages in all the chunks and match it against pSet->cFreePages.
1651	*/
1652	uint32_t cPages = 0;
1653	for (unsigned i = 0; i < RT_ELEMENTS(pSet->apLists); i++)
1654	{
1655	for (PGMMCHUNK pCur = pSet->apLists[i]; pCur; pCur = pCur->pFreeNext)
1656	{
1657	/** @todo check that the chunk is hashed into the right set. */
1658	cPages += pCur->cFree;
1659	}
1660	}
1661	if (RT_UNLIKELY(cPages != pSet->cFreePages))
1662	{
1663	SUPR0Printf("GMM insanity: found %#x pages in the %s set, expected %#x. (%s, line %u)\n",
1664	cPages, pszSetName, pSet->cFreePages, pszFunction, uLineNo);
1665	cErrors++;
1666	}
1667
1668	return cErrors;
1669	}
1670
1671
1672	/**
1673	* Performs some sanity checks on the GMM while owning the lock.
1674	*
1675	* @returns Error count.
1676	*
1677	* @param pGMM Pointer to the GMM instance.
1678	* @param pszFunction The function from which it is called.
1679	* @param uLineNo The line number.
1680	*/
1681	static uint32_t gmmR0SanityCheck(PGMM pGMM, const char *pszFunction, unsigned uLineNo)
1682	{
1683	uint32_t cErrors = 0;
1684
1685	cErrors += gmmR0SanityCheckSet(pGMM, &pGMM->PrivateX, "private", pszFunction, uLineNo);
1686	cErrors += gmmR0SanityCheckSet(pGMM, &pGMM->Shared, "shared", pszFunction, uLineNo);
1687	/** @todo add more sanity checks. */
1688
1689	return cErrors;
1690	}
1691
1692	#endif /* GMMR0_WITH_SANITY_CHECK */
1693
1694	/**
1695	* Looks up a chunk in the tree and fills in the TLB entry for it.
1696	*
1697	* This is not expected to fail and will bitch if it does.
1698	*
1699	* @returns Pointer to the allocation chunk, NULL if not found.
1700	* @param pGMM Pointer to the GMM instance.
1701	* @param idChunk The ID of the chunk to find.
1702	* @param pTlbe Pointer to the TLB entry.
1703	*/
1704	static PGMMCHUNK gmmR0GetChunkSlow(PGMM pGMM, uint32_t idChunk, PGMMCHUNKTLBE pTlbe)
1705	{
1706	PGMMCHUNK pChunk = (PGMMCHUNK)RTAvlU32Get(&pGMM->pChunks, idChunk);
1707	AssertMsgReturn(pChunk, ("Chunk %#x not found!\n", idChunk), NULL);
1708	pTlbe->idChunk = idChunk;
1709	pTlbe->pChunk = pChunk;
1710	return pChunk;
1711	}
1712
1713
1714	/**
1715	* Finds an allocation chunk.
1716	*
1717	* This is not expected to fail and will bitch if it does.
1718	*
1719	* @returns Pointer to the allocation chunk, NULL if not found.
1720	* @param pGMM Pointer to the GMM instance.
1721	* @param idChunk The ID of the chunk to find.
1722	*/
1723	DECLINLINE(PGMMCHUNK) gmmR0GetChunk(PGMM pGMM, uint32_t idChunk)
1724	{
1725	/*
1726	* Do a TLB lookup, branch if not in the TLB.
1727	*/
1728	PGMMCHUNKTLBE pTlbe = &pGMM->ChunkTLB.aEntries[GMM_CHUNKTLB_IDX(idChunk)];
1729	if ( pTlbe->idChunk != idChunk
1730	|| !pTlbe->pChunk)
1731	return gmmR0GetChunkSlow(pGMM, idChunk, pTlbe);
1732	return pTlbe->pChunk;
1733	}
1734
1735
1736	/**
1737	* Finds a page.
1738	*
1739	* This is not expected to fail and will bitch if it does.
1740	*
1741	* @returns Pointer to the page, NULL if not found.
1742	* @param pGMM Pointer to the GMM instance.
1743	* @param idPage The ID of the page to find.
1744	*/
1745	DECLINLINE(PGMMPAGE) gmmR0GetPage(PGMM pGMM, uint32_t idPage)
1746	{
1747	PGMMCHUNK pChunk = gmmR0GetChunk(pGMM, idPage >> GMM_CHUNKID_SHIFT);
1748	if (RT_LIKELY(pChunk))
1749	return &pChunk->aPages[idPage & GMM_PAGEID_IDX_MASK];
1750	return NULL;
1751	}
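
The lookup in gmmR0GetPage relies on the page ID packing the chunk ID in the upper bits and the page index in the lower bits. The following is a minimal sketch of that packing, assuming a shift of 9 (2 MB chunks with 4 KB pages); the constants and helper names are hypothetical stand-ins for GMM_CHUNKID_SHIFT and GMM_PAGEID_IDX_MASK, whose real values are not shown in this part of the file.

```cpp
#include <cassert>
#include <cstdint>

// Assumed values: 512 pages per chunk, so 9 bits of page index.
static const uint32_t kChunkIdShift = 9;
static const uint32_t kPageIdxMask  = (UINT32_C(1) << kChunkIdShift) - 1;

// Combines a chunk ID and a page index into a page ID.
static uint32_t makePageId(uint32_t idChunk, uint32_t iPage)
{
    assert(iPage <= kPageIdxMask);
    return (idChunk << kChunkIdShift) | iPage;
}

// Recovers the chunk ID, as gmmR0GetPage does before the chunk lookup.
static uint32_t chunkIdOf(uint32_t idPage)
{
    return idPage >> kChunkIdShift;
}

// Recovers the index into the chunk's page array.
static uint32_t pageIndexOf(uint32_t idPage)
{
    return idPage & kPageIdxMask;
}
```

Because both operations are a single shift or mask, no table is needed to go from a page ID to its chunk, which is why the scheme depends on the chunk size being fixed at compile time.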
1752
1753
1754	/**
1755	* Gets the host physical address for a page given by it's ID.
1756	*
1757	* @returns The host physical address or NIL_RTHCPHYS.
1758	* @param pGMM Pointer to the GMM instance.
1759	* @param idPage The ID of the page to find.
1760	*/
1761	DECLINLINE(RTHCPHYS) gmmR0GetPageHCPhys(PGMM pGMM, uint32_t idPage)
1762	{
1763	PGMMCHUNK pChunk = gmmR0GetChunk(pGMM, idPage >> GMM_CHUNKID_SHIFT);
1764	if (RT_LIKELY(pChunk))
1765	return RTR0MemObjGetPagePhysAddr(pChunk->hMemObj, idPage & GMM_PAGEID_IDX_MASK);
1766	return NIL_RTHCPHYS;
1767	}
1768
1769
1770	/**
1771	* Selects the appropriate free list given the number of free pages.
1772	*
1773	* @returns Free list index.
1774	* @param cFree The number of free pages in the chunk.
1775	*/
1776	DECLINLINE(unsigned) gmmR0SelectFreeSetList(unsigned cFree)
1777	{
1778	unsigned iList = cFree >> GMM_CHUNK_FREE_SET_SHIFT;
1779	AssertMsg(iList < RT_SIZEOFMEMB(GMMCHUNKFREESET, apLists) / RT_SIZEOFMEMB(GMMCHUNKFREESET, apLists[0]),
1780	("%d (%u)\n", iList, cFree));
1781	return iList;
1782	}
1783
1784
1785	/**
1786	* Unlinks the chunk from the free list it's currently on (if any).
1787	*
1788	* @param pChunk The allocation chunk.
1789	*/
1790	DECLINLINE(void) gmmR0UnlinkChunk(PGMMCHUNK pChunk)
1791	{
1792	PGMMCHUNKFREESET pSet = pChunk->pSet;
1793	if (RT_LIKELY(pSet))
1794	{
1795	pSet->cFreePages -= pChunk->cFree;
1796	pSet->idGeneration++;
1797
1798	PGMMCHUNK pPrev = pChunk->pFreePrev;
1799	PGMMCHUNK pNext = pChunk->pFreeNext;
1800	if (pPrev)
1801	pPrev->pFreeNext = pNext;
1802	else
1803	pSet->apLists[gmmR0SelectFreeSetList(pChunk->cFree)] = pNext;
1804	if (pNext)
1805	pNext->pFreePrev = pPrev;
1806
1807	pChunk->pSet = NULL;
1808	pChunk->pFreeNext = NULL;
1809	pChunk->pFreePrev = NULL;
1810	}
1811	else
1812	{
1813	Assert(!pChunk->pFreeNext);
1814	Assert(!pChunk->pFreePrev);
1815	Assert(!pChunk->cFree);
1816	}
1817	}
1818
1819
1820	/**
1821	* Links the chunk onto the appropriate free list in the specified free set.
1822	*
1823	* If no free entries, it's not linked into any list.
1824	*
1825	* @param pChunk The allocation chunk.
1826	* @param pSet The free set.
1827	*/
1828	DECLINLINE(void) gmmR0LinkChunk(PGMMCHUNK pChunk, PGMMCHUNKFREESET pSet)
1829	{
1830	Assert(!pChunk->pSet);
1831	Assert(!pChunk->pFreeNext);
1832	Assert(!pChunk->pFreePrev);
1833
1834	if (pChunk->cFree > 0)
1835	{
1836	pChunk->pSet = pSet;
1837	pChunk->pFreePrev = NULL;
1838	unsigned const iList = gmmR0SelectFreeSetList(pChunk->cFree);
1839	pChunk->pFreeNext = pSet->apLists[iList];
1840	if (pChunk->pFreeNext)
1841	pChunk->pFreeNext->pFreePrev = pChunk;
1842	pSet->apLists[iList] = pChunk;
1843
1844	pSet->cFreePages += pChunk->cFree;
1845	pSet->idGeneration++;
1846	}
1847	}
1848
1849
1850	/**
1851	* Selects the free set for the chunk and links it onto the appropriate list.
1852	*
1853	* If the chunk has no free pages, it's not linked into any list.
1854	* @param pGMM Pointer to the GMM instance.
1855	* @param pGVM Pointer to the Global VM structure.
1856	* @param pChunk The allocation chunk.
1857	DECLINLINE(void) gmmR0SelectSetAndLinkChunk(PGMM pGMM, PGVM pGVM, PGMMCHUNK pChunk)
1858	{
1859	PGMMCHUNKFREESET pSet;
1860	if (pGMM->fBoundMemoryMode)
1861	pSet = &pGVM->gmm.s.Private;
1862	else if (pChunk->cShared)
1863	pSet = &pGMM->Shared;
1864	else
1865	pSet = &pGMM->PrivateX;
1866	gmmR0LinkChunk(pChunk, pSet);
1867	}
1868
1869
1870	/**
1871	* Frees a Chunk ID.
1872	*
1873	* @param pGMM Pointer to the GMM instance.
1874	* @param idChunk The Chunk ID to free.
1875	*/
1876	static void gmmR0FreeChunkId(PGMM pGMM, uint32_t idChunk)
1877	{
1878	AssertReturnVoid(idChunk != NIL_GMM_CHUNKID);
1879	AssertMsg(ASMBitTest(&pGMM->bmChunkId[0], idChunk), ("%#x\n", idChunk));
1880	ASMAtomicBitClear(&pGMM->bmChunkId[0], idChunk);
1881	}
1882
1883
1884	/**
1885	* Allocates a new Chunk ID.
1886	*
1887	* @returns The Chunk ID.
1888	* @param pGMM Pointer to the GMM instance.
1889	*/
1890	static uint32_t gmmR0AllocateChunkId(PGMM pGMM)
1891	{
1892	AssertCompile(!((GMM_CHUNKID_LAST + 1) & 31)); /* must be a multiple of 32 */
1893	AssertCompile(NIL_GMM_CHUNKID == 0);
1894
1895	/*
1896	* Try the next sequential one.
1897	*/
1898	int32_t idChunk = ++pGMM->idChunkPrev;
1899	#if 0 /** @todo enable this code */
1900	if ( idChunk <= GMM_CHUNKID_LAST
1901	&& idChunk > NIL_GMM_CHUNKID
1902	&& !ASMAtomicBitTestAndSet(&pGMM->bmChunkId[0], idChunk))
1903	return idChunk;
1904	#endif
1905
1906	/*
1907	* Scan sequentially from the last one.
1908	*/
1909	if ( (uint32_t)idChunk < GMM_CHUNKID_LAST
1910	&& idChunk > NIL_GMM_CHUNKID)
1911	{
1912	idChunk = ASMBitNextClear(&pGMM->bmChunkId[0], GMM_CHUNKID_LAST + 1, idChunk);
1913	if (idChunk > NIL_GMM_CHUNKID)
1914	{
1915	AssertMsgReturn(!ASMAtomicBitTestAndSet(&pGMM->bmChunkId[0], idChunk), ("%#x\n", idChunk), NIL_GMM_CHUNKID);
1916	return pGMM->idChunkPrev = idChunk;
1917	}
1918	}
1919
1920	/*
1921	* Ok, scan from the start.
1922	* We're not racing anyone, so there is no need to expect failures or have restart loops.
1923	*/
1924	idChunk = ASMBitFirstClear(&pGMM->bmChunkId[0], GMM_CHUNKID_LAST + 1);
1925	AssertMsgReturn(idChunk > NIL_GMM_CHUNKID, ("%#x\n", idChunk), NIL_GMM_CHUNKID);
1926	AssertMsgReturn(!ASMAtomicBitTestAndSet(&pGMM->bmChunkId[0], idChunk), ("%#x\n", idChunk), NIL_GMM_CHUNKID);
1927
1928	return pGMM->idChunkPrev = idChunk;
1929	}
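
The ID allocation strategy above (continue after the previously allocated ID, then fall back to a scan from the start) can be sketched in isolation. This is only an illustrative model, not the GMM code: `ChunkIdAllocator`, `kLastId` and `kNilId` are hypothetical names, a `std::bitset` stands in for `bmChunkId`, and the sketch assumes the NIL bit is pre-set so that ID 0 is never handed out.

```cpp
#include <bitset>
#include <cassert>
#include <cstdint>

// Hypothetical stand-ins for GMM_CHUNKID_LAST and NIL_GMM_CHUNKID.
constexpr std::uint32_t kLastId = 63;
constexpr std::uint32_t kNilId  = 0;

struct ChunkIdAllocator
{
    std::bitset<kLastId + 1> used;          // bit i set => ID i is taken
    std::uint32_t            prev = kNilId; // last ID handed out

    // Assumption of this sketch: the NIL ID is marked used up front.
    ChunkIdAllocator() { used.set(kNilId); }

    // Returns the first clear bit in [from, kLastId], or kNilId if none.
    std::uint32_t firstClearFrom(std::uint32_t from) const
    {
        for (std::uint32_t i = from; i <= kLastId; i++)
            if (!used.test(i))
                return i;
        return kNilId;
    }

    std::uint32_t allocate()
    {
        // Pass 1: scan upwards from just past the previous ID.
        std::uint32_t id = firstClearFrom(prev + 1);
        // Pass 2: nothing free above, so scan from the start.
        if (id == kNilId)
            id = firstClearFrom(kNilId + 1);
        if (id == kNilId)
            return kNilId;                  // bitmap exhausted
        used.set(id);
        prev = id;
        return id;
    }

    void release(std::uint32_t id)
    {
        assert(id != kNilId && id <= kLastId && used.test(id));
        used.reset(id);
    }
};
```

Continuing from the previous ID avoids rescanning the densely used low end of the bitmap on every call; released IDs are only reused once the scan wraps around.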
1930
1931
1932	/**
1933	* Allocates one private page.
1934	*
1935	* Worker for gmmR0AllocatePages.
1936	*
1937	* @param pChunk The chunk to allocate it from.
1938	* @param hGVM The GVM handle of the VM requesting memory.
1939	* @param pPageDesc The page descriptor.
1940	*/
1941	static void gmmR0AllocatePage(PGMMCHUNK pChunk, uint32_t hGVM, PGMMPAGEDESC pPageDesc)
1942	{
1943	/* update the chunk stats. */
1944	if (pChunk->hGVM == NIL_GVM_HANDLE)
1945	pChunk->hGVM = hGVM;
1946	Assert(pChunk->cFree);
1947	pChunk->cFree--;
1948	pChunk->cPrivate++;
1949
1950	/* unlink the first free page. */
1951	const uint32_t iPage = pChunk->iFreeHead;
1952	AssertReleaseMsg(iPage < RT_ELEMENTS(pChunk->aPages), ("%d\n", iPage));
1953	PGMMPAGE pPage = &pChunk->aPages[iPage];
1954	Assert(GMM_PAGE_IS_FREE(pPage));
1955	pChunk->iFreeHead = pPage->Free.iNext;
1956	Log3(("A pPage=%p iPage=%#x/%#x u2State=%d iFreeHead=%#x iNext=%#x\n",
1957	pPage, iPage, (pChunk->Core.Key << GMM_CHUNKID_SHIFT) | iPage,
1958	pPage->Common.u2State, pChunk->iFreeHead, pPage->Free.iNext));
1959
1960	/* make the page private. */
1961	pPage->u = 0;
1962	AssertCompile(GMM_PAGE_STATE_PRIVATE == 0);
1963	pPage->Private.hGVM = hGVM;
1964	AssertCompile(NIL_RTHCPHYS >= GMM_GCPHYS_LAST);
1965	AssertCompile(GMM_GCPHYS_UNSHAREABLE >= GMM_GCPHYS_LAST);
1966	if (pPageDesc->HCPhysGCPhys <= GMM_GCPHYS_LAST)
1967	pPage->Private.pfn = pPageDesc->HCPhysGCPhys >> PAGE_SHIFT;
1968	else
1969	pPage->Private.pfn = GMM_PAGE_PFN_UNSHAREABLE; /* unshareable / unassigned - same thing. */
1970
1971	/* update the page descriptor. */
1972	pPageDesc->HCPhysGCPhys = RTR0MemObjGetPagePhysAddr(pChunk->hMemObj, iPage);
1973	Assert(pPageDesc->HCPhysGCPhys != NIL_RTHCPHYS);
1974	pPageDesc->idPage = (pChunk->Core.Key << GMM_CHUNKID_SHIFT) | iPage;
1975	pPageDesc->idSharedPage = NIL_GMM_PAGEID;
1976	}
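
The per-chunk free list and the page ID composition used above can be modelled with a few lines of standalone code. This is a simplified sketch under stated assumptions: `MiniChunk`, `kPagesPerChunk`, `kChunkIdShift` and `kEndOfList` are hypothetical names with toy values, and a plain array of next-indices stands in for the `Free.iNext` field of the page entries.

```cpp
#include <array>
#include <cassert>
#include <cstdint>

// Hypothetical toy sizes; the real chunk geometry is defined elsewhere.
constexpr unsigned      kPagesPerChunk = 8;
constexpr unsigned      kChunkIdShift  = 3;          // log2(kPagesPerChunk)
constexpr std::uint16_t kEndOfList     = UINT16_MAX; // free list terminator

struct MiniChunk
{
    std::uint32_t id;                                  // chunk ID (the AVL key)
    std::uint16_t freeHead = 0;                        // index of first free page
    std::uint16_t cFree    = kPagesPerChunk;           // number of free pages
    std::array<std::uint16_t, kPagesPerChunk> next{};  // next free page index

    explicit MiniChunk(std::uint32_t idChunk) : id(idChunk)
    {
        // Chain all pages: 0 -> 1 -> ... -> last -> end of list.
        for (unsigned i = 0; i + 1 < kPagesPerChunk; i++)
            next[i] = static_cast<std::uint16_t>(i + 1);
        next[kPagesPerChunk - 1] = kEndOfList;
    }

    // Unlinks the first free page and returns its global page ID.
    std::uint32_t allocPage()
    {
        assert(cFree > 0 && freeHead < kPagesPerChunk);
        const std::uint16_t iPage = freeHead;
        freeHead = next[iPage];
        cFree--;
        return (id << kChunkIdShift) | iPage;
    }
};
```

Because the page index occupies the low bits of the ID, the owning chunk can be recovered with a shift and the page index with a mask, without any extra lookup table.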
1977
1978
1979	/**
1980	* Picks the free pages from a chunk.
1981	*
1982	* @returns The new page descriptor table index.
1983	* @param pChunk The chunk.
1984	* @param hGVM The VM handle.
1985	* @param iPage The current page descriptor table index.
1986	* @param cPages The total number of pages to allocate.
1987	* @param paPages The page descriptor table (input + output).
1989	*/
1990	static uint32_t gmmR0AllocatePagesFromChunk(PGMMCHUNK pChunk, uint16_t const hGVM, uint32_t iPage, uint32_t cPages,
1991	PGMMPAGEDESC paPages)
1992	{
1993	PGMMCHUNKFREESET pSet = pChunk->pSet; Assert(pSet);
1994	gmmR0UnlinkChunk(pChunk);
1995
1996	for (; pChunk->cFree && iPage < cPages; iPage++)
1997	gmmR0AllocatePage(pChunk, hGVM, &paPages[iPage]);
1998
1999	gmmR0LinkChunk(pChunk, pSet);
2000	return iPage;
2001	}
2002
2003
2004	/**
2005	* Registers a new chunk of memory.
2006	*
2007	* This is called by both gmmR0AllocateOneChunk and GMMR0SeedChunk.
2008	*
2009	* @returns VBox status code. On success, the giant GMM lock will be held, the
2010	* caller must release it (ugly).
2011	* @param pGMM Pointer to the GMM instance.
2012	* @param pSet Pointer to the set.
2013	* @param MemObj The memory object for the chunk.
2014	* @param hGVM The affinity of the chunk. NIL_GVM_HANDLE for no
2015	* affinity.
2016	* @param fChunkFlags The chunk flags, GMM_CHUNK_FLAGS_XXX.
2017	* @param ppChunk Chunk address (out). Optional.
2018	*
2019	* @remarks The caller must not own the giant GMM mutex.
2020	* The giant GMM mutex will be acquired and returned acquired in
2021	* the success path. On failure, no locks will be held.
2022	*/
2023	static int gmmR0RegisterChunk(PGMM pGMM, PGMMCHUNKFREESET pSet, RTR0MEMOBJ MemObj, uint16_t hGVM, uint16_t fChunkFlags,
2024	PGMMCHUNK *ppChunk)
2025	{
2026	Assert(pGMM->hMtxOwner != RTThreadNativeSelf());
2027	Assert(hGVM != NIL_GVM_HANDLE || pGMM->fBoundMemoryMode);
2028	Assert(fChunkFlags == 0 || fChunkFlags == GMM_CHUNK_FLAGS_LARGE_PAGE);
2029
2030	int rc;
2031	PGMMCHUNK pChunk = (PGMMCHUNK)RTMemAllocZ(sizeof(*pChunk));
2032	if (pChunk)
2033	{
2034	/*
2035	* Initialize it.
2036	*/
2037	pChunk->hMemObj = MemObj;
2038	pChunk->cFree = GMM_CHUNK_NUM_PAGES;
2039	pChunk->hGVM = hGVM;
2040	/*pChunk->iFreeHead = 0;*/
2041	pChunk->idNumaNode = gmmR0GetCurrentNumaNodeId();
2042	pChunk->iChunkMtx = UINT8_MAX;
2043	pChunk->fFlags = fChunkFlags;
2044	for (unsigned iPage = 0; iPage < RT_ELEMENTS(pChunk->aPages) - 1; iPage++)
2045	{
2046	pChunk->aPages[iPage].Free.u2State = GMM_PAGE_STATE_FREE;
2047	pChunk->aPages[iPage].Free.iNext = iPage + 1;
2048	}
2049	pChunk->aPages[RT_ELEMENTS(pChunk->aPages) - 1].Free.u2State = GMM_PAGE_STATE_FREE;
2050	pChunk->aPages[RT_ELEMENTS(pChunk->aPages) - 1].Free.iNext = UINT16_MAX;
2051
2052	/*
2053	* Allocate a Chunk ID and insert it into the tree.
2054	* This has to be done behind the mutex of course.
2055	*/
2056	rc = gmmR0MutexAcquire(pGMM);
2057	if (RT_SUCCESS(rc))
2058	{
2059	if (GMM_CHECK_SANITY_UPON_ENTERING(pGMM))
2060	{
2061	pChunk->Core.Key = gmmR0AllocateChunkId(pGMM);
2062	if ( pChunk->Core.Key != NIL_GMM_CHUNKID
2063	&& pChunk->Core.Key <= GMM_CHUNKID_LAST
2064	&& RTAvlU32Insert(&pGMM->pChunks, &pChunk->Core))
2065	{
2066	pGMM->cChunks++;
2067	RTListAppend(&pGMM->ChunkList, &pChunk->ListNode);
2068	gmmR0LinkChunk(pChunk, pSet);
2069	LogFlow(("gmmR0RegisterChunk: pChunk=%p id=%#x cChunks=%d\n", pChunk, pChunk->Core.Key, pGMM->cChunks));
2070
2071	if (ppChunk)
2072	*ppChunk = pChunk;
2073	GMM_CHECK_SANITY_UPON_LEAVING(pGMM);
2074	return VINF_SUCCESS;
2075	}
2076
2077	/* bail out */
2078	rc = VERR_GMM_CHUNK_INSERT;
2079	}
2080	else
2081	rc = VERR_GMM_IS_NOT_SANE;
2082	gmmR0MutexRelease(pGMM);
2083	}
2084
2085	RTMemFree(pChunk);
2086	}
2087	else
2088	rc = VERR_NO_MEMORY;
2089	return rc;
2090	}
2091
2092
2093	/**
2094	* Allocates a new chunk, immediately picks the requested pages from it, and adds
2095	* what's remaining to the specified free set.
2096	*
2097	* @note This will leave the giant mutex while allocating the new chunk!
2098	*
2099	* @returns VBox status code.
2100	* @param pGMM Pointer to the GMM instance data.
2101	* @param pGVM Pointer to the kernel-only VM instance data.
2102	* @param pSet Pointer to the free set.
2103	* @param cPages The number of pages requested.
2104	* @param paPages The page descriptor table (input + output).
2105	* @param piPage The pointer to the page descriptor table index
2106	* variable. This will be updated.
2107	*/
2108	static int gmmR0AllocateChunkNew(PGMM pGMM, PGVM pGVM, PGMMCHUNKFREESET pSet, uint32_t cPages,
2109	PGMMPAGEDESC paPages, uint32_t *piPage)
2110	{
2111	gmmR0MutexRelease(pGMM);
2112
2113	RTR0MEMOBJ hMemObj;
2114	int rc = RTR0MemObjAllocPhysNC(&hMemObj, GMM_CHUNK_SIZE, NIL_RTHCPHYS);
2115	if (RT_SUCCESS(rc))
2116	{
2117	/** @todo Duplicate gmmR0RegisterChunk here so we can avoid chaining up the
2118	* free pages first and then unchaining them right afterwards. Instead
2119	* do as much work as possible without holding the giant lock. */
2120	PGMMCHUNK pChunk;
2121	rc = gmmR0RegisterChunk(pGMM, pSet, hMemObj, pGVM->hSelf, 0 /*fChunkFlags*/, &pChunk);
2122	if (RT_SUCCESS(rc))
2123	{
2124	*piPage = gmmR0AllocatePagesFromChunk(pChunk, pGVM->hSelf, *piPage, cPages, paPages);
2125	return VINF_SUCCESS;
2126	}
2127
2128	/* bail out */
2129	RTR0MemObjFree(hMemObj, false /* fFreeMappings */);
2130	}
2131
2132	int rc2 = gmmR0MutexAcquire(pGMM);
2133	AssertRCReturn(rc2, RT_FAILURE(rc) ? rc : rc2);
2134	return rc;
2135
2136	}
2137
2138
2139	/**
2140	* As a last resort we'll pick any page we can get.
2141	*
2142	* @returns The new page descriptor table index.
2143	* @param pSet The set to pick from.
2144	* @param pGVM Pointer to the global VM structure.
2145	* @param iPage The current page descriptor table index.
2146	* @param cPages The total number of pages to allocate.
2147	* @param paPages The page descriptor table (input + output).
2148	*/
2149	static uint32_t gmmR0AllocatePagesIndiscriminately(PGMMCHUNKFREESET pSet, PGVM pGVM,
2150	uint32_t iPage, uint32_t cPages, PGMMPAGEDESC paPages)
2151	{
2152	unsigned iList = RT_ELEMENTS(pSet->apLists);
2153	while (iList-- > 0)
2154	{
2155	PGMMCHUNK pChunk = pSet->apLists[iList];
2156	while (pChunk)
2157	{
2158	PGMMCHUNK pNext = pChunk->pFreeNext;
2159
2160	iPage = gmmR0AllocatePagesFromChunk(pChunk, pGVM->hSelf, iPage, cPages, paPages);
2161	if (iPage >= cPages)
2162	return iPage;
2163
2164	pChunk = pNext;
2165	}
2166	}
2167	return iPage;
2168	}
2169
2170
2171	/**
2172	* Pick pages from empty chunks on the same NUMA node.
2173	*
2174	* @returns The new page descriptor table index.
2175	* @param pSet The set to pick from.
2176	* @param pGVM Pointer to the global VM structure.
2177	* @param iPage The current page descriptor table index.
2178	* @param cPages The total number of pages to allocate.
2179	* @param paPages The page descriptor table (input + output).
2180	*/
2181	static uint32_t gmmR0AllocatePagesFromEmptyChunksOnSameNode(PGMMCHUNKFREESET pSet, PGVM pGVM,
2182	uint32_t iPage, uint32_t cPages, PGMMPAGEDESC paPages)
2183	{
2184	PGMMCHUNK pChunk = pSet->apLists[GMM_CHUNK_FREE_SET_UNUSED_LIST];
2185	if (pChunk)
2186	{
2187	uint16_t const idNumaNode = gmmR0GetCurrentNumaNodeId();
2188	while (pChunk)
2189	{
2190	PGMMCHUNK pNext = pChunk->pFreeNext;
2191
2192	if (pChunk->idNumaNode == idNumaNode)
2193	{
2194	pChunk->hGVM = pGVM->hSelf;
2195	iPage = gmmR0AllocatePagesFromChunk(pChunk, pGVM->hSelf, iPage, cPages, paPages);
2196	if (iPage >= cPages)
2197	{
2198	pGVM->gmm.s.idLastChunkHint = pChunk->cFree ? pChunk->Core.Key : NIL_GMM_CHUNKID;
2199	return iPage;
2200	}
2201	}
2202
2203	pChunk = pNext;
2204	}
2205	}
2206	return iPage;
2207	}
2208
2209
2210	/**
2211	* Pick pages from non-empty chunks on the same NUMA node.
2212	*
2213	* @returns The new page descriptor table index.
2214	* @param pSet The set to pick from.
2215	* @param pGVM Pointer to the global VM structure.
2216	* @param iPage The current page descriptor table index.
2217	* @param cPages The total number of pages to allocate.
2218	* @param paPages The page descriptor table (input + output).
2219	*/
2220	static uint32_t gmmR0AllocatePagesFromSameNode(PGMMCHUNKFREESET pSet, PGVM pGVM,
2221	uint32_t iPage, uint32_t cPages, PGMMPAGEDESC paPages)
2222	{
2223	/** @todo start by picking from chunks with about the right size first? */
2224	uint16_t const idNumaNode = gmmR0GetCurrentNumaNodeId();
2225	unsigned iList = GMM_CHUNK_FREE_SET_UNUSED_LIST;
2226	while (iList-- > 0)
2227	{
2228	PGMMCHUNK pChunk = pSet->apLists[iList];
2229	while (pChunk)
2230	{
2231	PGMMCHUNK pNext = pChunk->pFreeNext;
2232
2233	if (pChunk->idNumaNode == idNumaNode)
2234	{
2235	iPage = gmmR0AllocatePagesFromChunk(pChunk, pGVM->hSelf, iPage, cPages, paPages);
2236	if (iPage >= cPages)
2237	{
2238	pGVM->gmm.s.idLastChunkHint = pChunk->cFree ? pChunk->Core.Key : NIL_GMM_CHUNKID;
2239	return iPage;
2240	}
2241	}
2242
2243	pChunk = pNext;
2244	}
2245	}
2246	return iPage;
2247	}
2248
2249
2250	/**
2251	* Pick pages that are in chunks already associated with the VM.
2252	*
2253	* @returns The new page descriptor table index.
2254	* @param pGMM Pointer to the GMM instance data.
2255	* @param pGVM Pointer to the global VM structure.
2256	* @param pSet The set to pick from.
2257	* @param iPage The current page descriptor table index.
2258	* @param cPages The total number of pages to allocate.
2259	* @param paPages The page descriptor table (input + output).
2260	*/
2261	static uint32_t gmmR0AllocatePagesAssociatedWithVM(PGMM pGMM, PGVM pGVM, PGMMCHUNKFREESET pSet,
2262	uint32_t iPage, uint32_t cPages, PGMMPAGEDESC paPages)
2263	{
2264	uint16_t const hGVM = pGVM->hSelf;
2265
2266	/* Hint. */
2267	if (pGVM->gmm.s.idLastChunkHint != NIL_GMM_CHUNKID)
2268	{
2269	PGMMCHUNK pChunk = gmmR0GetChunk(pGMM, pGVM->gmm.s.idLastChunkHint);
2270	if (pChunk && pChunk->cFree)
2271	{
2272	iPage = gmmR0AllocatePagesFromChunk(pChunk, hGVM, iPage, cPages, paPages);
2273	if (iPage >= cPages)
2274	return iPage;
2275	}
2276	}
2277
2278	/* Scan. */
2279	for (unsigned iList = 0; iList < RT_ELEMENTS(pSet->apLists); iList++)
2280	{
2281	PGMMCHUNK pChunk = pSet->apLists[iList];
2282	while (pChunk)
2283	{
2284	PGMMCHUNK pNext = pChunk->pFreeNext;
2285
2286	if (pChunk->hGVM == hGVM)
2287	{
2288	iPage = gmmR0AllocatePagesFromChunk(pChunk, hGVM, iPage, cPages, paPages);
2289	if (iPage >= cPages)
2290	{
2291	pGVM->gmm.s.idLastChunkHint = pChunk->cFree ? pChunk->Core.Key : NIL_GMM_CHUNKID;
2292	return iPage;
2293	}
2294	}
2295
2296	pChunk = pNext;
2297	}
2298	}
2299	return iPage;
2300	}
2301
2302
2303
2304	/**
2305	* Pick pages in bound memory mode.
2306	*
2307	* @returns The new page descriptor table index.
2308	* @param pGVM Pointer to the global VM structure.
2309	* @param iPage The current page descriptor table index.
2310	* @param cPages The total number of pages to allocate.
2311	* @param paPages The page descriptor table (input + output).
2312	*/
2313	static uint32_t gmmR0AllocatePagesInBoundMode(PGVM pGVM, uint32_t iPage, uint32_t cPages, PGMMPAGEDESC paPages)
2314	{
2315	for (unsigned iList = 0; iList < RT_ELEMENTS(pGVM->gmm.s.Private.apLists); iList++)
2316	{
2317	PGMMCHUNK pChunk = pGVM->gmm.s.Private.apLists[iList];
2318	while (pChunk)
2319	{
2320	Assert(pChunk->hGVM == pGVM->hSelf);
2321	PGMMCHUNK pNext = pChunk->pFreeNext;
2322	iPage = gmmR0AllocatePagesFromChunk(pChunk, pGVM->hSelf, iPage, cPages, paPages);
2323	if (iPage >= cPages)
2324	return iPage;
2325	pChunk = pNext;
2326	}
2327	}
2328	return iPage;
2329	}
2330
2331
2332	/**
2333	* Checks if we should start picking pages from chunks of other VMs.
2334	*
2335	* @returns @c true if we should, @c false if we should first try to allocate more
2336	* chunks.
2337	*/
2338	static bool gmmR0ShouldAllocatePagesInOtherChunks(PGVM pGVM)
2339	{
2340	/*
2341	* Don't allocate a new chunk if we're within four chunks' worth of pages of the reservation.
2342	*/
2343	uint64_t cPgReserved = pGVM->gmm.s.Stats.Reserved.cBasePages
2344	+ pGVM->gmm.s.Stats.Reserved.cFixedPages
2345	- pGVM->gmm.s.Stats.cBalloonedPages
2346	/** @todo what about shared pages? */;
2347	uint64_t cPgAllocated = pGVM->gmm.s.Stats.Allocated.cBasePages
2348	+ pGVM->gmm.s.Stats.Allocated.cFixedPages;
2349	uint64_t cPgDelta = cPgReserved - cPgAllocated;
2350	if (cPgDelta < GMM_CHUNK_NUM_PAGES * 4)
2351	return true;
2352	/** @todo make the threshold configurable, also test the code to see if
2353	* this ever kicks in (we might be reserving too much or smth). */
2354
2355	/*
2356	* Check how close we are to the max memory limit and how many fragments
2357	* there are?...
2358	*/
2359	/** @todo. */
2360
2361	return false;
2362	}
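
The threshold check above reduces to a small piece of arithmetic, shown here as a standalone sketch. `VmStats`, `shouldUseOtherChunks` and `kPagesPerChunk` are hypothetical names for illustration, and the chunk size of 512 pages is an assumed value rather than one taken from this file. The precondition assert is an addition of the sketch, not something the original checks.

```cpp
#include <cassert>
#include <cstdint>

// Assumed value for illustration only.
constexpr std::uint64_t kPagesPerChunk = 512;

// Hypothetical flattened view of the per-VM statistics used by the check.
struct VmStats
{
    std::uint64_t reservedBase;
    std::uint64_t reservedFixed;
    std::uint64_t ballooned;
    std::uint64_t allocatedBase;
    std::uint64_t allocatedFixed;
};

// Returns true when the VM has less than four chunks' worth of pages
// left to allocate, i.e. when grabbing a whole new chunk looks wasteful.
bool shouldUseOtherChunks(const VmStats &s)
{
    // Sketch-only precondition: ballooned pages never exceed reservations.
    assert(s.ballooned <= s.reservedBase + s.reservedFixed);

    const std::uint64_t reserved  = s.reservedBase + s.reservedFixed - s.ballooned;
    const std::uint64_t allocated = s.allocatedBase + s.allocatedFixed;
    // Unsigned subtraction: if allocated exceeds reserved the delta wraps
    // to a huge value and the function answers false.
    const std::uint64_t delta = reserved - allocated;
    return delta < kPagesPerChunk * 4;
}
```

Note the wrap-around case: with unsigned arithmetic an over-allocated VM is treated like one with plenty of headroom, which may or may not be the intended behaviour.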
2363
2364
2365	/**
2366	* Common worker for GMMR0AllocateHandyPages and GMMR0AllocatePages.
2367	*
2368	* @returns VBox status code:
2369	* @retval VINF_SUCCESS on success.
2370	* @retval VERR_GMM_SEED_ME if seeding via GMMR0SeedChunk or
2371	* gmmR0AllocateMoreChunks is necessary.
2372	* @retval VERR_GMM_HIT_GLOBAL_LIMIT if we've exhausted the available pages.
2373	* @retval VERR_GMM_HIT_VM_ACCOUNT_LIMIT if we've hit the VM account limit,
2374	* that is we're trying to allocate more than we've reserved.
2375	*
2376	* @param pGMM Pointer to the GMM instance data.
2377	* @param pGVM Pointer to the global VM structure.
2378	* @param cPages The number of pages to allocate.
2379	* @param paPages Pointer to the page descriptors.
2380	* See GMMPAGEDESC for details on what is expected on input.
2381	* @param enmAccount The account to charge.
2382	*
2383	* @remarks The caller must own the giant GMM lock; it may be dropped temporarily while new chunks are allocated.
2384	*/
2385	static int gmmR0AllocatePagesNew(PGMM pGMM, PGVM pGVM, uint32_t cPages, PGMMPAGEDESC paPages, GMMACCOUNT enmAccount)
2386	{
2387	Assert(pGMM->hMtxOwner == RTThreadNativeSelf());
2388
2389	/*
2390	* Check allocation limits.
2391	*/
2392	if (RT_UNLIKELY(pGMM->cAllocatedPages + cPages > pGMM->cMaxPages))
2393	return VERR_GMM_HIT_GLOBAL_LIMIT;
2394
2395	switch (enmAccount)
2396	{
2397	case GMMACCOUNT_BASE:
2398	if (RT_UNLIKELY( pGVM->gmm.s.Stats.Allocated.cBasePages + pGVM->gmm.s.Stats.cBalloonedPages + cPages
2399	> pGVM->gmm.s.Stats.Reserved.cBasePages))
2400	{
2401	Log(("gmmR0AllocatePages:Base: Reserved=%#llx Allocated+Ballooned+Requested=%#llx+%#llx+%#x!\n",
2402	pGVM->gmm.s.Stats.Reserved.cBasePages, pGVM->gmm.s.Stats.Allocated.cBasePages,
2403	pGVM->gmm.s.Stats.cBalloonedPages, cPages));
2404	return VERR_GMM_HIT_VM_ACCOUNT_LIMIT;
2405	}
2406	break;
2407	case GMMACCOUNT_SHADOW:
2408	if (RT_UNLIKELY(pGVM->gmm.s.Stats.Allocated.cShadowPages + cPages > pGVM->gmm.s.Stats.Reserved.cShadowPages))
2409	{
2410	Log(("gmmR0AllocatePages:Shadow: Reserved=%#x Allocated+Requested=%#x+%#x!\n",
2411	pGVM->gmm.s.Stats.Reserved.cShadowPages, pGVM->gmm.s.Stats.Allocated.cShadowPages, cPages));
2412	return VERR_GMM_HIT_VM_ACCOUNT_LIMIT;
2413	}
2414	break;
2415	case GMMACCOUNT_FIXED:
2416	if (RT_UNLIKELY(pGVM->gmm.s.Stats.Allocated.cFixedPages + cPages > pGVM->gmm.s.Stats.Reserved.cFixedPages))
2417	{
2418	Log(("gmmR0AllocatePages:Fixed: Reserved=%#x Allocated+Requested=%#x+%#x!\n",
2419	pGVM->gmm.s.Stats.Reserved.cFixedPages, pGVM->gmm.s.Stats.Allocated.cFixedPages, cPages));
2420	return VERR_GMM_HIT_VM_ACCOUNT_LIMIT;
2421	}
2422	break;
2423	default:
2424	AssertMsgFailedReturn(("enmAccount=%d\n", enmAccount), VERR_IPE_NOT_REACHED_DEFAULT_CASE);
2425	}
2426
2427	/*
2428	* If we're in legacy memory mode, it's easy to figure out up front
2429	* whether we have a sufficient number of pages.
2430	*/
2431	if ( pGMM->fLegacyAllocationMode
2432	&& pGVM->gmm.s.Private.cFreePages < cPages)
2433	{
2434	Assert(pGMM->fBoundMemoryMode);
2435	return VERR_GMM_SEED_ME;
2436	}
2437
2438	/*
2439	* Update the accounts before we proceed because we might be leaving the
2440	* protection of the global mutex and thus run the risk of permitting
2441	* too much memory to be allocated.
2442	*/
2443	switch (enmAccount)
2444	{
2445	case GMMACCOUNT_BASE: pGVM->gmm.s.Stats.Allocated.cBasePages += cPages; break;
2446	case GMMACCOUNT_SHADOW: pGVM->gmm.s.Stats.Allocated.cShadowPages += cPages; break;
2447	case GMMACCOUNT_FIXED: pGVM->gmm.s.Stats.Allocated.cFixedPages += cPages; break;
2448	default: AssertMsgFailedReturn(("enmAccount=%d\n", enmAccount), VERR_IPE_NOT_REACHED_DEFAULT_CASE);
2449	}
2450	pGVM->gmm.s.Stats.cPrivatePages += cPages;
2451	pGMM->cAllocatedPages += cPages;
2452
2453	/*
2454	* Part two of it's-easy-in-legacy-memory-mode.
2455	*/
2456	uint32_t iPage = 0;
2457	if (pGMM->fLegacyAllocationMode)
2458	{
2459	iPage = gmmR0AllocatePagesInBoundMode(pGVM, iPage, cPages, paPages);
2460	AssertReleaseReturn(iPage == cPages, VERR_GMM_ALLOC_PAGES_IPE);
2461	return VINF_SUCCESS;
2462	}
2463
2464	/*
2465	* Bound mode is also relatively straightforward.
2466	*/
2467	int rc = VINF_SUCCESS;
2468	if (pGMM->fBoundMemoryMode)
2469	{
2470	iPage = gmmR0AllocatePagesInBoundMode(pGVM, iPage, cPages, paPages);
2471	if (iPage < cPages)
2472	do
2473	rc = gmmR0AllocateChunkNew(pGMM, pGVM, &pGVM->gmm.s.Private, cPages, paPages, &iPage);
2474	while (iPage < cPages && RT_SUCCESS(rc));
2475	}
2476	/*
2477	* Shared mode is trickier as we should try to achieve the same locality as
2478	* in bound mode, but smartly make use of non-full chunks allocated by
2479	* other VMs if we're low on memory.
2480	*/
2481	else
2482	{
2483	/* Pick the most optimal pages first. */
2484	iPage = gmmR0AllocatePagesAssociatedWithVM(pGMM, pGVM, &pGMM->PrivateX, iPage, cPages, paPages);
2485	if (iPage < cPages)
2486	{
2487	/* Maybe we should try getting pages from chunks "belonging" to
2488	other VMs before allocating more chunks? */
2489	if (gmmR0ShouldAllocatePagesInOtherChunks(pGVM))
2490	iPage = gmmR0AllocatePagesFromSameNode(&pGMM->PrivateX, pGVM, iPage, cPages, paPages);
2491
2492	/* Allocate memory from empty chunks. */
2493	if (iPage < cPages)
2494	iPage = gmmR0AllocatePagesFromEmptyChunksOnSameNode(&pGMM->PrivateX, pGVM, iPage, cPages, paPages);
2495
2496	/* Grab empty shared chunks. */
2497	if (iPage < cPages)
2498	iPage = gmmR0AllocatePagesFromEmptyChunksOnSameNode(&pGMM->Shared, pGVM, iPage, cPages, paPages);
2499
2500	/*
2501	* Ok, try to allocate new chunks.
2502	*/
2503	if (iPage < cPages)
2504	{
2505	do
2506	rc = gmmR0AllocateChunkNew(pGMM, pGVM, &pGMM->PrivateX, cPages, paPages, &iPage);
2507	while (iPage < cPages && RT_SUCCESS(rc));
2508
2509	/* If the host is out of memory, take whatever we can get. */
2510	if ( (rc == VERR_NO_MEMORY || rc == VERR_NO_PHYS_MEMORY)
2511	&& pGMM->PrivateX.cFreePages + pGMM->Shared.cFreePages >= cPages - iPage)
2512	{
2513	iPage = gmmR0AllocatePagesIndiscriminately(&pGMM->PrivateX, pGVM, iPage, cPages, paPages);
2514	if (iPage < cPages)
2515	iPage = gmmR0AllocatePagesIndiscriminately(&pGMM->Shared, pGVM, iPage, cPages, paPages);
2516	AssertRelease(iPage == cPages);
2517	rc = VINF_SUCCESS;
2518	}
2519	}
2520	}
2521	}
2522
2523	/*
2524	* Clean up on failure. Since this is bound to be a low-memory condition
2525	* we will give back any empty chunks that might be hanging around.
2526	*/
2527	if (RT_FAILURE(rc))
2528	{
2529	/* Update the statistics. */
2530	pGVM->gmm.s.Stats.cPrivatePages -= cPages;
2531	pGMM->cAllocatedPages -= cPages - iPage;
2532	switch (enmAccount)
2533	{
2534	case GMMACCOUNT_BASE: pGVM->gmm.s.Stats.Allocated.cBasePages -= cPages; break;
2535	case GMMACCOUNT_SHADOW: pGVM->gmm.s.Stats.Allocated.cShadowPages -= cPages; break;
2536	case GMMACCOUNT_FIXED: pGVM->gmm.s.Stats.Allocated.cFixedPages -= cPages; break;
2537	default: AssertMsgFailedReturn(("enmAccount=%d\n", enmAccount), VERR_IPE_NOT_REACHED_DEFAULT_CASE);
2538	}
2539
2540	/* Release the pages. */
2541	while (iPage-- > 0)
2542	{
2543	uint32_t idPage = paPages[iPage].idPage;
2544	PGMMPAGE pPage = gmmR0GetPage(pGMM, idPage);
2545	if (RT_LIKELY(pPage))
2546	{
2547	Assert(GMM_PAGE_IS_PRIVATE(pPage));
2548	Assert(pPage->Private.hGVM == pGVM->hSelf);
2549	gmmR0FreePrivatePage(pGMM, pGVM, idPage, pPage);
2550	}
2551	else
2552	AssertMsgFailed(("idPage=%#x\n", idPage));
2553
2554	paPages[iPage].idPage = NIL_GMM_PAGEID;
2555	paPages[iPage].idSharedPage = NIL_GMM_PAGEID;
2556	paPages[iPage].HCPhysGCPhys = NIL_RTHCPHYS;
2557	}
2558
2559	/* Free empty chunks. */
2560	/** @todo */
2561
2562	/* return the fail status on failure */
2563	return rc;
2564	}
2565	return VINF_SUCCESS;
2566	}
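
The shared-mode path above follows a common pattern: a list of page sources is tried in order of preference, each one advancing the descriptor table index as far as it can, until the request is satisfied. A minimal sketch of that pattern follows; `PageSource`, `allocateTiered` and `makeLimitedSource` are hypothetical helpers, and a simple page counter stands in for real chunk lists.

```cpp
#include <algorithm>
#include <cassert>
#include <cstdint>
#include <functional>
#include <vector>

// A page source receives the current index and the total request and
// returns the new index (mirrors the iPage/cPages convention above).
using PageSource = std::function<std::uint32_t(std::uint32_t iPage, std::uint32_t cPages)>;

// Tries each tier in order until cPages pages have been handed out.
// Returns the final index; a value below cPages means a shortfall.
std::uint32_t allocateTiered(const std::vector<PageSource> &tiers, std::uint32_t cPages)
{
    std::uint32_t iPage = 0;
    for (const PageSource &tier : tiers)
    {
        if (iPage >= cPages)
            break;
        iPage = tier(iPage, cPages);
        assert(iPage <= cPages);
    }
    return iPage;
}

// Hypothetical source that can supply at most cAvailable pages in total.
PageSource makeLimitedSource(std::uint32_t cAvailable)
{
    return [cAvailable](std::uint32_t iPage, std::uint32_t cPages) mutable
    {
        const std::uint32_t cTake = std::min(cAvailable, cPages - iPage);
        cAvailable -= cTake;
        return iPage + cTake;
    };
}
```

Passing the running index through every tier keeps each worker independent of the others, which is what allows the real code to reorder or skip tiers with simple `if (iPage < cPages)` guards.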
2567
2568
2569	/**
2570	* Updates the previous allocations and allocates more pages.
2571	*
2572	* The handy pages are always taken from the 'base' memory account.
2573	* The allocated pages are not cleared and will contain random garbage.
2574	*
2575	* @returns VBox status code:
2576	* @retval VINF_SUCCESS on success.
2577	* @retval VERR_NOT_OWNER if the caller is not an EMT.
2578	* @retval VERR_GMM_PAGE_NOT_FOUND if one of the pages to update wasn't found.
2579	* @retval VERR_GMM_PAGE_NOT_PRIVATE if one of the pages to update wasn't a
2580	* private page.
2581	* @retval VERR_GMM_PAGE_NOT_SHARED if one of the pages to update wasn't a
2582	* shared page.
2583	* @retval VERR_GMM_NOT_PAGE_OWNER if one of the pages to be updated wasn't
2584	* owned by the VM.
2585	* @retval VERR_GMM_SEED_ME if seeding via GMMR0SeedChunk is necessary.
2586	* @retval VERR_GMM_HIT_GLOBAL_LIMIT if we've exhausted the available pages.
2587	* @retval VERR_GMM_HIT_VM_ACCOUNT_LIMIT if we've hit the VM account limit,
2588	* that is we're trying to allocate more than we've reserved.
2589	*
2590	* @param pVM Pointer to the shared VM structure.
2591	* @param idCpu VCPU id
2592	* @param cPagesToUpdate The number of pages to update (starting from the head).
2593	* @param cPagesToAlloc The number of pages to allocate (starting from the head).
2594	* @param paPages The array of page descriptors.
2595	* See GMMPAGEDESC for details on what is expected on input.
2596	* @thread EMT.
2597	*/
2598	GMMR0DECL(int) GMMR0AllocateHandyPages(PVM pVM, VMCPUID idCpu, uint32_t cPagesToUpdate, uint32_t cPagesToAlloc, PGMMPAGEDESC paPages)
2599	{
2600	LogFlow(("GMMR0AllocateHandyPages: pVM=%p cPagesToUpdate=%#x cPagesToAlloc=%#x paPages=%p\n",
2601	pVM, cPagesToUpdate, cPagesToAlloc, paPages));
2602
2603	/*
2604	* Validate, get basics and take the semaphore.
2605	* (This is a relatively busy path, so make predictions where possible.)
2606	*/
2607	PGMM pGMM;
2608	GMM_GET_VALID_INSTANCE(pGMM, VERR_GMM_INSTANCE);
2609	PGVM pGVM;
2610	int rc = GVMMR0ByVMAndEMT(pVM, idCpu, &pGVM);
2611	if (RT_FAILURE(rc))
2612	return rc;
2613
2614	AssertPtrReturn(paPages, VERR_INVALID_PARAMETER);
2615	AssertMsgReturn( (cPagesToUpdate && cPagesToUpdate < 1024)
2616	|| (cPagesToAlloc && cPagesToAlloc < 1024),
2617	("cPagesToUpdate=%#x cPagesToAlloc=%#x\n", cPagesToUpdate, cPagesToAlloc),
2618	VERR_INVALID_PARAMETER);
2619
2620	unsigned iPage = 0;
2621	for (; iPage < cPagesToUpdate; iPage++)
2622	{
2623	AssertMsgReturn( ( paPages[iPage].HCPhysGCPhys <= GMM_GCPHYS_LAST
2624	&& !(paPages[iPage].HCPhysGCPhys & PAGE_OFFSET_MASK))
2625	|| paPages[iPage].HCPhysGCPhys == NIL_RTHCPHYS
2626	|| paPages[iPage].HCPhysGCPhys == GMM_GCPHYS_UNSHAREABLE,
2627	("#%#x: %RHp\n", iPage, paPages[iPage].HCPhysGCPhys),
2628	VERR_INVALID_PARAMETER);
2629	AssertMsgReturn( paPages[iPage].idPage <= GMM_PAGEID_LAST
2630	/*|| paPages[iPage].idPage == NIL_GMM_PAGEID*/,
2631	("#%#x: %#x\n", iPage, paPages[iPage].idPage), VERR_INVALID_PARAMETER);
2632	AssertMsgReturn( paPages[iPage].idSharedPage <= GMM_PAGEID_LAST
2633	/*|| paPages[iPage].idSharedPage == NIL_GMM_PAGEID*/,
2634	("#%#x: %#x\n", iPage, paPages[iPage].idSharedPage), VERR_INVALID_PARAMETER);
2635	}
2636
2637	for (; iPage < cPagesToAlloc; iPage++)
2638	{
2639	AssertMsgReturn(paPages[iPage].HCPhysGCPhys == NIL_RTHCPHYS, ("#%#x: %RHp\n", iPage, paPages[iPage].HCPhysGCPhys), VERR_INVALID_PARAMETER);
2640	AssertMsgReturn(paPages[iPage].idPage == NIL_GMM_PAGEID, ("#%#x: %#x\n", iPage, paPages[iPage].idPage), VERR_INVALID_PARAMETER);
2641	AssertMsgReturn(paPages[iPage].idSharedPage == NIL_GMM_PAGEID, ("#%#x: %#x\n", iPage, paPages[iPage].idSharedPage), VERR_INVALID_PARAMETER);
2642	}
2643
2644	gmmR0MutexAcquire(pGMM);
2645	if (GMM_CHECK_SANITY_UPON_ENTERING(pGMM))
2646	{
2647	/* No allocations before the initial reservation has been made! */
2648	if (RT_LIKELY( pGVM->gmm.s.Stats.Reserved.cBasePages
2649	&& pGVM->gmm.s.Stats.Reserved.cFixedPages
2650	&& pGVM->gmm.s.Stats.Reserved.cShadowPages))
2651	{
2652	/*
2653	* Perform the updates.
2654	* Stop on the first error.
2655	*/
2656	for (iPage = 0; iPage < cPagesToUpdate; iPage++)
2657	{
2658	if (paPages[iPage].idPage != NIL_GMM_PAGEID)
2659	{
2660	PGMMPAGE pPage = gmmR0GetPage(pGMM, paPages[iPage].idPage);
2661	if (RT_LIKELY(pPage))
2662	{
2663	if (RT_LIKELY(GMM_PAGE_IS_PRIVATE(pPage)))
2664	{
2665	if (RT_LIKELY(pPage->Private.hGVM == pGVM->hSelf))
2666	{
2667	AssertCompile(NIL_RTHCPHYS > GMM_GCPHYS_LAST && GMM_GCPHYS_UNSHAREABLE > GMM_GCPHYS_LAST);
2668	if (RT_LIKELY(paPages[iPage].HCPhysGCPhys <= GMM_GCPHYS_LAST))
2669	pPage->Private.pfn = paPages[iPage].HCPhysGCPhys >> PAGE_SHIFT;
2670	else if (paPages[iPage].HCPhysGCPhys == GMM_GCPHYS_UNSHAREABLE)
2671	pPage->Private.pfn = GMM_PAGE_PFN_UNSHAREABLE;
2672	/* else: NIL_RTHCPHYS nothing */
2673
2674	paPages[iPage].idPage = NIL_GMM_PAGEID;
2675	paPages[iPage].HCPhysGCPhys = NIL_RTHCPHYS;
2676	}
2677	else
2678	{
2679	Log(("GMMR0AllocateHandyPages: #%#x/%#x: Not owner! hGVM=%#x hSelf=%#x\n",
2680	iPage, paPages[iPage].idPage, pPage->Private.hGVM, pGVM->hSelf));
2681	rc = VERR_GMM_NOT_PAGE_OWNER;
2682	break;
2683	}
2684	}
2685	else
2686	{
2687	Log(("GMMR0AllocateHandyPages: #%#x/%#x: Not private! %.Rhxs (type %d)\n", iPage, paPages[iPage].idPage, sizeof(pPage), pPage, pPage->Common.u2State));
2688	rc = VERR_GMM_PAGE_NOT_PRIVATE;
2689	break;
2690	}
2691	}
2692	else
2693	{
2694	Log(("GMMR0AllocateHandyPages: #%#x/%#x: Not found! (private)\n", iPage, paPages[iPage].idPage));
2695	rc = VERR_GMM_PAGE_NOT_FOUND;
2696	break;
2697	}
2698	}
2699
2700	if (paPages[iPage].idSharedPage != NIL_GMM_PAGEID)
2701	{
2702	PGMMPAGE pPage = gmmR0GetPage(pGMM, paPages[iPage].idSharedPage);
2703	if (RT_LIKELY(pPage))
2704	{
2705	if (RT_LIKELY(GMM_PAGE_IS_SHARED(pPage)))
2706	{
2707	AssertCompile(NIL_RTHCPHYS > GMM_GCPHYS_LAST && GMM_GCPHYS_UNSHAREABLE > GMM_GCPHYS_LAST);
2708	Assert(pPage->Shared.cRefs);
2709	Assert(pGVM->gmm.s.Stats.cSharedPages);
2710	Assert(pGVM->gmm.s.Stats.Allocated.cBasePages);
2711
2712	Log(("GMMR0AllocateHandyPages: free shared page %x cRefs=%d\n", paPages[iPage].idSharedPage, pPage->Shared.cRefs));
2713	pGVM->gmm.s.Stats.cSharedPages--;
2714	pGVM->gmm.s.Stats.Allocated.cBasePages--;
2715	if (!--pPage->Shared.cRefs)
2716	gmmR0FreeSharedPage(pGMM, pGVM, paPages[iPage].idSharedPage, pPage);
2717	else
2718	{
2719	Assert(pGMM->cDuplicatePages);
2720	pGMM->cDuplicatePages--;
2721	}
2722
2723	paPages[iPage].idSharedPage = NIL_GMM_PAGEID;
2724	}
2725	else
2726	{
2727	Log(("GMMR0AllocateHandyPages: #%#x/%#x: Not shared!\n", iPage, paPages[iPage].idSharedPage));
2728	rc = VERR_GMM_PAGE_NOT_SHARED;
2729	break;
2730	}
2731	}
2732	else
2733	{
2734	Log(("GMMR0AllocateHandyPages: #%#x/%#x: Not found! (shared)\n", iPage, paPages[iPage].idSharedPage));
2735	rc = VERR_GMM_PAGE_NOT_FOUND;
2736	break;
2737	}
2738	}
2739	} /* for each page to update */
2740
2741	if (RT_SUCCESS(rc))
2742	{
2743	#if defined(VBOX_STRICT) && 0 /** @todo re-test this later. Appeared to be a PGM init bug. */
2744	for (iPage = 0; iPage < cPagesToAlloc; iPage++)
2745	{
2746	Assert(paPages[iPage].HCPhysGCPhys == NIL_RTHCPHYS);
2747	Assert(paPages[iPage].idPage == NIL_GMM_PAGEID);
2748	Assert(paPages[iPage].idSharedPage == NIL_GMM_PAGEID);
2749	}
2750	#endif
2751
2752	/*
2753	* Join paths with GMMR0AllocatePages for the allocation.
2754	* Note! gmmR0AllocatePagesNew may leave the protection of the mutex!
2755	*/
2756	rc = gmmR0AllocatePagesNew(pGMM, pGVM, cPagesToAlloc, paPages, GMMACCOUNT_BASE);
2757	}
2758	}
2759	else
2760	rc = VERR_WRONG_ORDER;
2761	GMM_CHECK_SANITY_UPON_LEAVING(pGMM);
2762	}
2763	else
2764	rc = VERR_GMM_IS_NOT_SANE;
2765	gmmR0MutexRelease(pGMM);
2766	LogFlow(("GMMR0AllocateHandyPages: returns %Rrc\n", rc));
2767	return rc;
2768	}
2769
2770
2771	/**
2772	* Allocate one or more pages.
2773	*
2774	* This is typically used for ROMs and MMIO2 (VRAM) during VM creation.
2775	* The allocated pages are not cleared and will contain random garbage.
2776	*
2777	* @returns VBox status code:
2778	* @retval VINF_SUCCESS on success.
2779	* @retval VERR_NOT_OWNER if the caller is not an EMT.
2780	* @retval VERR_GMM_SEED_ME if seeding via GMMR0SeedChunk is necessary.
2781	* @retval VERR_GMM_HIT_GLOBAL_LIMIT if we've exhausted the available pages.
2782	* @retval VERR_GMM_HIT_VM_ACCOUNT_LIMIT if we've hit the VM account limit,
2783	* that is we're trying to allocate more than we've reserved.
2784	*
2785	* @param pVM Pointer to the shared VM structure.
2786	* @param idCpu VCPU id
2787	* @param cPages The number of pages to allocate.
2788	* @param paPages Pointer to the page descriptors.
2789	* See GMMPAGEDESC for details on what is expected on input.
2790	* @param enmAccount The account to charge.
2791	*
2792	* @thread EMT.
2793	*/
2794	GMMR0DECL(int) GMMR0AllocatePages(PVM pVM, VMCPUID idCpu, uint32_t cPages, PGMMPAGEDESC paPages, GMMACCOUNT enmAccount)
2795	{
2796	LogFlow(("GMMR0AllocatePages: pVM=%p cPages=%#x paPages=%p enmAccount=%d\n", pVM, cPages, paPages, enmAccount));
2797
2798	/*
2799	* Validate, get basics and take the semaphore.
2800	*/
2801	PGMM pGMM;
2802	GMM_GET_VALID_INSTANCE(pGMM, VERR_GMM_INSTANCE);
2803	PGVM pGVM;
2804	int rc = GVMMR0ByVMAndEMT(pVM, idCpu, &pGVM);
2805	if (RT_FAILURE(rc))
2806	return rc;
2807
2808	AssertPtrReturn(paPages, VERR_INVALID_PARAMETER);
2809	AssertMsgReturn(enmAccount > GMMACCOUNT_INVALID && enmAccount < GMMACCOUNT_END, ("%d\n", enmAccount), VERR_INVALID_PARAMETER);
2810	AssertMsgReturn(cPages > 0 && cPages < RT_BIT(32 - PAGE_SHIFT), ("%#x\n", cPages), VERR_INVALID_PARAMETER);
2811
2812	for (unsigned iPage = 0; iPage < cPages; iPage++)
2813	{
2814	AssertMsgReturn( paPages[iPage].HCPhysGCPhys == NIL_RTHCPHYS
2815	|| paPages[iPage].HCPhysGCPhys == GMM_GCPHYS_UNSHAREABLE
2816	|| ( enmAccount == GMMACCOUNT_BASE
2817	&& paPages[iPage].HCPhysGCPhys <= GMM_GCPHYS_LAST
2818	&& !(paPages[iPage].HCPhysGCPhys & PAGE_OFFSET_MASK)),
2819	("#%#x: %RHp enmAccount=%d\n", iPage, paPages[iPage].HCPhysGCPhys, enmAccount),
2820	VERR_INVALID_PARAMETER);
2821	AssertMsgReturn(paPages[iPage].idPage == NIL_GMM_PAGEID, ("#%#x: %#x\n", iPage, paPages[iPage].idPage), VERR_INVALID_PARAMETER);
2822	AssertMsgReturn(paPages[iPage].idSharedPage == NIL_GMM_PAGEID, ("#%#x: %#x\n", iPage, paPages[iPage].idSharedPage), VERR_INVALID_PARAMETER);
2823	}
2824
2825	gmmR0MutexAcquire(pGMM);
2826	if (GMM_CHECK_SANITY_UPON_ENTERING(pGMM))
2827	{
2828
2829	/* No allocations before the initial reservation has been made! */
2830	if (RT_LIKELY( pGVM->gmm.s.Stats.Reserved.cBasePages
2831	&& pGVM->gmm.s.Stats.Reserved.cFixedPages
2832	&& pGVM->gmm.s.Stats.Reserved.cShadowPages))
2833	rc = gmmR0AllocatePagesNew(pGMM, pGVM, cPages, paPages, enmAccount);
2834	else
2835	rc = VERR_WRONG_ORDER;
2836	GMM_CHECK_SANITY_UPON_LEAVING(pGMM);
2837	}
2838	else
2839	rc = VERR_GMM_IS_NOT_SANE;
2840	gmmR0MutexRelease(pGMM);
2841	LogFlow(("GMMR0AllocatePages: returns %Rrc\n", rc));
2842	return rc;
2843	}
2844
2845
2846	/**
2847	* VMMR0 request wrapper for GMMR0AllocatePages.
2848	*
2849	* @returns see GMMR0AllocatePages.
2850	* @param pVM Pointer to the shared VM structure.
2851	* @param idCpu VCPU id
2852	* @param pReq The request packet.
2853	*/
2854	GMMR0DECL(int) GMMR0AllocatePagesReq(PVM pVM, VMCPUID idCpu, PGMMALLOCATEPAGESREQ pReq)
2855	{
2856	/*
2857	* Validate input and pass it on.
2858	*/
2859	AssertPtrReturn(pVM, VERR_INVALID_POINTER);
2860	AssertPtrReturn(pReq, VERR_INVALID_POINTER);
2861	AssertMsgReturn(pReq->Hdr.cbReq >= RT_UOFFSETOF(GMMALLOCATEPAGESREQ, aPages[0]),
2862	("%#x < %#x\n", pReq->Hdr.cbReq, RT_UOFFSETOF(GMMALLOCATEPAGESREQ, aPages[0])),
2863	VERR_INVALID_PARAMETER);
2864	AssertMsgReturn(pReq->Hdr.cbReq == RT_UOFFSETOF(GMMALLOCATEPAGESREQ, aPages[pReq->cPages]),
2865	("%#x != %#x\n", pReq->Hdr.cbReq, RT_UOFFSETOF(GMMALLOCATEPAGESREQ, aPages[pReq->cPages])),
2866	VERR_INVALID_PARAMETER);
2867
2868	return GMMR0AllocatePages(pVM, idCpu, pReq->cPages, &pReq->aPages[0], pReq->enmAccount);
2869	}
2870
2871
2872	/**
2873	* Allocate a large page to represent guest RAM
2874	*
2875	* The allocated pages are not cleared and will contain random garbage.
2876	*
2877	* @returns VBox status code:
2878	* @retval VINF_SUCCESS on success.
2879	* @retval VERR_NOT_OWNER if the caller is not an EMT.
2880	* @retval VERR_GMM_SEED_ME if seeding via GMMR0SeedChunk is necessary.
2881	* @retval VERR_GMM_HIT_GLOBAL_LIMIT if we've exhausted the available pages.
2882	* @retval VERR_GMM_HIT_VM_ACCOUNT_LIMIT if we've hit the VM account limit,
2883	* that is we're trying to allocate more than we've reserved.
2884	* @returns see GMMR0AllocatePages.
2885	* @param pVM Pointer to the shared VM structure.
2886	* @param idCpu VCPU id
2887	* @param cbPage Large page size
2888	*/
2889	GMMR0DECL(int) GMMR0AllocateLargePage(PVM pVM, VMCPUID idCpu, uint32_t cbPage, uint32_t *pIdPage, RTHCPHYS *pHCPhys)
2890	{
2891	LogFlow(("GMMR0AllocateLargePage: pVM=%p cbPage=%x\n", pVM, cbPage));
2892
2893	AssertReturn(cbPage == GMM_CHUNK_SIZE, VERR_INVALID_PARAMETER);
2894	AssertPtrReturn(pIdPage, VERR_INVALID_PARAMETER);
2895	AssertPtrReturn(pHCPhys, VERR_INVALID_PARAMETER);
2896
2897	/*
2898	* Validate, get basics and take the semaphore.
2899	*/
2900	PGMM pGMM;
2901	GMM_GET_VALID_INSTANCE(pGMM, VERR_GMM_INSTANCE);
2902	PGVM pGVM;
2903	int rc = GVMMR0ByVMAndEMT(pVM, idCpu, &pGVM);
2904	if (RT_FAILURE(rc))
2905	return rc;
2906
2907	/* Not supported in legacy mode where we allocate the memory in ring 3 and lock it in ring 0. */
2908	if (pGMM->fLegacyAllocationMode)
2909	return VERR_NOT_SUPPORTED;
2910
2911	*pHCPhys = NIL_RTHCPHYS;
2912	*pIdPage = NIL_GMM_PAGEID;
2913
2914	gmmR0MutexAcquire(pGMM);
2915	if (GMM_CHECK_SANITY_UPON_ENTERING(pGMM))
2916	{
2917	const unsigned cPages = (GMM_CHUNK_SIZE >> PAGE_SHIFT);
2918	if (RT_UNLIKELY( pGVM->gmm.s.Stats.Allocated.cBasePages + pGVM->gmm.s.Stats.cBalloonedPages + cPages
2919	> pGVM->gmm.s.Stats.Reserved.cBasePages))
2920	{
2921	Log(("GMMR0AllocateLargePage: Reserved=%#llx Allocated+Requested=%#llx+%#x!\n",
2922	pGVM->gmm.s.Stats.Reserved.cBasePages, pGVM->gmm.s.Stats.Allocated.cBasePages, cPages));
2923	gmmR0MutexRelease(pGMM);
2924	return VERR_GMM_HIT_VM_ACCOUNT_LIMIT;
2925	}
2926
2927	/*
2928	* Allocate a new large page chunk.
2929	*
2930	* Note! We leave the giant GMM lock temporarily as the allocation might
2931	* take a long time. gmmR0RegisterChunk will retake it (ugly).
2932	*/
2933	AssertCompile(GMM_CHUNK_SIZE == _2M);
2934	gmmR0MutexRelease(pGMM);
2935
2936	RTR0MEMOBJ hMemObj;
2937	rc = RTR0MemObjAllocPhysEx(&hMemObj, GMM_CHUNK_SIZE, NIL_RTHCPHYS, GMM_CHUNK_SIZE);
2938	if (RT_SUCCESS(rc))
2939	{
2940	PGMMCHUNKFREESET pSet = pGMM->fBoundMemoryMode ? &pGVM->gmm.s.Private : &pGMM->PrivateX;
2941	PGMMCHUNK pChunk;
2942	rc = gmmR0RegisterChunk(pGMM, pSet, hMemObj, pGVM->hSelf, GMM_CHUNK_FLAGS_LARGE_PAGE, &pChunk);
2943	if (RT_SUCCESS(rc))
2944	{
2945	/*
2946	* Allocate all the pages in the chunk.
2947	*/
2948	/* Unlink the new chunk from the free list. */
2949	gmmR0UnlinkChunk(pChunk);
2950
2951	/** @todo rewrite this to skip the looping. */
2952	/* Allocate all pages. */
2953	GMMPAGEDESC PageDesc;
2954	gmmR0AllocatePage(pChunk, pGVM->hSelf, &PageDesc);
2955
2956	/* Return the first page as we'll use the whole chunk as one big page. */
2957	*pIdPage = PageDesc.idPage;
2958	*pHCPhys = PageDesc.HCPhysGCPhys;
2959
2960	for (unsigned i = 1; i < cPages; i++)
2961	gmmR0AllocatePage(pChunk, pGVM->hSelf, &PageDesc);
2962
2963	/* Update accounting. */
2964	pGVM->gmm.s.Stats.Allocated.cBasePages += cPages;
2965	pGVM->gmm.s.Stats.cPrivatePages += cPages;
2966	pGMM->cAllocatedPages += cPages;
2967
2968	gmmR0LinkChunk(pChunk, pSet);
2969	gmmR0MutexRelease(pGMM);
2970	}
2971	else
2972	RTR0MemObjFree(hMemObj, false /* fFreeMappings */);
2973	}
2974	}
2975	else
2976	{
2977	gmmR0MutexRelease(pGMM);
2978	rc = VERR_GMM_IS_NOT_SANE;
2979	}
2980
2981	LogFlow(("GMMR0AllocateLargePage: returns %Rrc\n", rc));
2982	return rc;
2983	}
2984
2985
2986	/**
2987	* Free a large page
2988	*
2989	* @returns VBox status code:
2990	* @param pVM Pointer to the shared VM structure.
2991	* @param idCpu VCPU id
2992	* @param idPage Large page id
2993	*/
2994	GMMR0DECL(int) GMMR0FreeLargePage(PVM pVM, VMCPUID idCpu, uint32_t idPage)
2995	{
2996	LogFlow(("GMMR0FreeLargePage: pVM=%p idPage=%x\n", pVM, idPage));
2997
2998	/*
2999	* Validate, get basics and take the semaphore.
3000	*/
3001	PGMM pGMM;
3002	GMM_GET_VALID_INSTANCE(pGMM, VERR_GMM_INSTANCE);
3003	PGVM pGVM;
3004	int rc = GVMMR0ByVMAndEMT(pVM, idCpu, &pGVM);
3005	if (RT_FAILURE(rc))
3006	return rc;
3007
3008	/* Not supported in legacy mode where we allocate the memory in ring 3 and lock it in ring 0. */
3009	if (pGMM->fLegacyAllocationMode)
3010	return VERR_NOT_SUPPORTED;
3011
3012	gmmR0MutexAcquire(pGMM);
3013	if (GMM_CHECK_SANITY_UPON_ENTERING(pGMM))
3014	{
3015	const unsigned cPages = (GMM_CHUNK_SIZE >> PAGE_SHIFT);
3016
3017	if (RT_UNLIKELY(pGVM->gmm.s.Stats.Allocated.cBasePages < cPages))
3018	{
3019	Log(("GMMR0FreeLargePage: allocated=%#llx cPages=%#x!\n", pGVM->gmm.s.Stats.Allocated.cBasePages, cPages));
3020	gmmR0MutexRelease(pGMM);
3021	return VERR_GMM_ATTEMPT_TO_FREE_TOO_MUCH;
3022	}
3023
3024	PGMMPAGE pPage = gmmR0GetPage(pGMM, idPage);
3025	if (RT_LIKELY( pPage
3026	&& GMM_PAGE_IS_PRIVATE(pPage)))
3027	{
3028	PGMMCHUNK pChunk = gmmR0GetChunk(pGMM, idPage >> GMM_CHUNKID_SHIFT);
3029	Assert(pChunk);
3030	Assert(pChunk->cFree < GMM_CHUNK_NUM_PAGES);
3031	Assert(pChunk->cPrivate > 0);
3032
3033	/* Release the memory immediately. */
3034	gmmR0FreeChunk(pGMM, NULL, pChunk, false /*fRelaxedSem*/); /** @todo this can be relaxed too! */
3035
3036	/* Update accounting. */
3037	pGVM->gmm.s.Stats.Allocated.cBasePages -= cPages;
3038	pGVM->gmm.s.Stats.cPrivatePages -= cPages;
3039	pGMM->cAllocatedPages -= cPages;
3040	}
3041	else
3042	rc = VERR_GMM_PAGE_NOT_FOUND;
3043	}
3044	else
3045	rc = VERR_GMM_IS_NOT_SANE;
3046
3047	gmmR0MutexRelease(pGMM);
3048	LogFlow(("GMMR0FreeLargePage: returns %Rrc\n", rc));
3049	return rc;
3050	}
3051
3052
3053	/**
3054	* VMMR0 request wrapper for GMMR0FreeLargePage.
3055	*
3056	* @returns see GMMR0FreeLargePage.
3057	* @param pVM Pointer to the shared VM structure.
3058	* @param idCpu VCPU id
3059	* @param pReq The request packet.
3060	*/
3061	GMMR0DECL(int) GMMR0FreeLargePageReq(PVM pVM, VMCPUID idCpu, PGMMFREELARGEPAGEREQ pReq)
3062	{
3063	/*
3064	* Validate input and pass it on.
3065	*/
3066	AssertPtrReturn(pVM, VERR_INVALID_POINTER);
3067	AssertPtrReturn(pReq, VERR_INVALID_POINTER);
3068	AssertMsgReturn(pReq->Hdr.cbReq == sizeof(GMMFREELARGEPAGEREQ),
3069	("%#x != %#x\n", pReq->Hdr.cbReq, sizeof(GMMFREELARGEPAGEREQ)),
3070	VERR_INVALID_PARAMETER);
3071
3072	return GMMR0FreeLargePage(pVM, idCpu, pReq->idPage);
3073	}
3074
3075
3076	/**
3077	* Frees a chunk, giving it back to the host OS.
3078	*
3079	* @param pGMM Pointer to the GMM instance.
3080	* @param pGVM This is set when called from GMMR0CleanupVM so we can
3081	* unmap and free the chunk in one go.
3082	* @param pChunk The chunk to free.
3083	* @param fRelaxedSem Whether we can release the semaphore while doing the
3084	* freeing (@c true) or not.
3085	*/
3086	static bool gmmR0FreeChunk(PGMM pGMM, PGVM pGVM, PGMMCHUNK pChunk, bool fRelaxedSem)
3087	{
3088	Assert(pChunk->Core.Key != NIL_GMM_CHUNKID);
3089
3090	GMMR0CHUNKMTXSTATE MtxState;
3091	gmmR0ChunkMutexAcquire(&MtxState, pGMM, pChunk, GMMR0CHUNK_MTX_KEEP_GIANT);
3092
3093	/*
3094	* Cleanup hack! Unmap the chunk from the caller's address space.
3095	* This shouldn't happen, so screw lock contention...
3096	*/
3097	if ( pChunk->cMappingsX
3098	&& !pGMM->fLegacyAllocationMode
3099	&& pGVM)
3100	gmmR0UnmapChunkLocked(pGMM, pGVM, pChunk);
3101
3102	/*
3103	* If there are current mappings of the chunk, then request the
3104	* VMs to unmap them. Reposition the chunk in the free list so
3105	* it won't be a likely candidate for allocations.
3106	*/
3107	if (pChunk->cMappingsX)
3108	{
3109	/** @todo R0 -> VM request */
3110	/* The chunk can be mapped by more than one VM if fBoundMemoryMode is false! */
3111	Log(("gmmR0FreeChunk: chunk still has %d mappings; don't free!\n", pChunk->cMappingsX));
3112	gmmR0ChunkMutexRelease(&MtxState, pChunk);
3113	return false;
3114	}
3115
3116
3117	/*
3118	* Save and trash the handle.
3119	*/
3120	RTR0MEMOBJ const hMemObj = pChunk->hMemObj;
3121	pChunk->hMemObj = NIL_RTR0MEMOBJ;
3122
3123	/*
3124	* Unlink it from everywhere.
3125	*/
3126	gmmR0UnlinkChunk(pChunk);
3127
3128	RTListNodeRemove(&pChunk->ListNode);
3129
3130	PAVLU32NODECORE pCore = RTAvlU32Remove(&pGMM->pChunks, pChunk->Core.Key);
3131	Assert(pCore == &pChunk->Core); NOREF(pCore);
3132
3133	PGMMCHUNKTLBE pTlbe = &pGMM->ChunkTLB.aEntries[GMM_CHUNKTLB_IDX(pChunk->Core.Key)];
3134	if (pTlbe->pChunk == pChunk)
3135	{
3136	pTlbe->idChunk = NIL_GMM_CHUNKID;
3137	pTlbe->pChunk = NULL;
3138	}
3139
3140	Assert(pGMM->cChunks > 0);
3141	pGMM->cChunks--;
3142
3143	/*
3144	* Free the Chunk ID before dropping the locks and freeing the rest.
3145	*/
3146	gmmR0FreeChunkId(pGMM, pChunk->Core.Key);
3147	pChunk->Core.Key = NIL_GMM_CHUNKID;
3148
3149	pGMM->cFreedChunks++;
3150
3151	gmmR0ChunkMutexRelease(&MtxState, NULL);
3152	if (fRelaxedSem)
3153	gmmR0MutexRelease(pGMM);
3154
3155	RTMemFree(pChunk->paMappingsX);
3156	pChunk->paMappingsX = NULL;
3157
3158	RTMemFree(pChunk);
3159
3160	int rc = RTR0MemObjFree(hMemObj, false /* fFreeMappings */);
3161	AssertLogRelRC(rc);
3162
3163	if (fRelaxedSem)
3164	gmmR0MutexAcquire(pGMM);
3165	return fRelaxedSem;
3166	}
3167
3168
3169	/**
3170	* Free page worker.
3171	*
3172	* The caller does all the statistic decrementing, we do all the incrementing.
3173	*
3174	* @param pGMM Pointer to the GMM instance data.
3175	* @param pGVM Pointer to the GVM instance.
3176	* @param pChunk Pointer to the chunk this page belongs to.
3177	* @param idPage The Page ID.
3178	* @param pPage Pointer to the page.
3179	*/
3180	static void gmmR0FreePageWorker(PGMM pGMM, PGVM pGVM, PGMMCHUNK pChunk, uint32_t idPage, PGMMPAGE pPage)
3181	{
3182	Log3(("F pPage=%p iPage=%#x/%#x u2State=%d iFreeHead=%#x\n",
3183	pPage, pPage - &pChunk->aPages[0], idPage, pPage->Common.u2State, pChunk->iFreeHead)); NOREF(idPage);
3184
3185	/*
3186	* Put the page on the free list.
3187	*/
3188	pPage->u = 0;
3189	pPage->Free.u2State = GMM_PAGE_STATE_FREE;
3190	Assert(pChunk->iFreeHead < RT_ELEMENTS(pChunk->aPages) || pChunk->iFreeHead == UINT16_MAX);
3191	pPage->Free.iNext = pChunk->iFreeHead;
3192	pChunk->iFreeHead = pPage - &pChunk->aPages[0];
3193
3194	/*
3195	* Update statistics (the cShared/cPrivate stats are up to date already),
3196	* and relink the chunk if necessary.
3197	*/
3198	unsigned const cFree = pChunk->cFree;
3199	if ( !cFree
3200	|| gmmR0SelectFreeSetList(cFree) != gmmR0SelectFreeSetList(cFree + 1))
3201	{
3202	gmmR0UnlinkChunk(pChunk);
3203	pChunk->cFree++;
3204	gmmR0SelectSetAndLinkChunk(pGMM, pGVM, pChunk);
3205	}
3206	else
3207	{
3208	pChunk->cFree = cFree + 1;
3209	pChunk->pSet->cFreePages++;
3210	}
3211
3212	/*
3213	* If the chunk becomes empty, consider giving memory back to the host OS.
3214	*
3215	* The current strategy is to try to give it back if there are other chunks
3216	* in this free list, meaning if there are at least 240 free pages in this
3217	* category. Note that since there are probably mappings of the chunk,
3218	* it won't be freed up instantly, which probably screws up this logic
3219	* a bit...
3220	*/
3221	/** @todo Do this on the way out. */
3222	if (RT_UNLIKELY( pChunk->cFree == GMM_CHUNK_NUM_PAGES
3223	&& pChunk->pFreeNext
3224	&& pChunk->pFreePrev /** @todo this is probably misfiring, see reset... */
3225	&& !pGMM->fLegacyAllocationMode))
3226	gmmR0FreeChunk(pGMM, NULL, pChunk, false);
3227
3228	}
3229
3230
3231	/**
3232	* Frees a shared page, the page is known to exist and be valid and such.
3233	*
3234	* @param pGMM Pointer to the GMM instance.
3235	* @param pGVM Pointer to the GVM instance.
3236	* @param idPage The Page ID
3237	* @param pPage The page structure.
3238	*/
3239	DECLINLINE(void) gmmR0FreeSharedPage(PGMM pGMM, PGVM pGVM, uint32_t idPage, PGMMPAGE pPage)
3240	{
3241	PGMMCHUNK pChunk = gmmR0GetChunk(pGMM, idPage >> GMM_CHUNKID_SHIFT);
3242	Assert(pChunk);
3243	Assert(pChunk->cFree < GMM_CHUNK_NUM_PAGES);
3244	Assert(pChunk->cShared > 0);
3245	Assert(pGMM->cSharedPages > 0);
3246	Assert(pGMM->cAllocatedPages > 0);
3247	Assert(!pPage->Shared.cRefs);
3248
3249	pChunk->cShared--;
3250	pGMM->cAllocatedPages--;
3251	pGMM->cSharedPages--;
3252	gmmR0FreePageWorker(pGMM, pGVM, pChunk, idPage, pPage);
3253	}
3254
3255
3256	/**
3257	* Frees a private page, the page is known to exist and be valid and such.
3258	*
3259	* @param pGMM Pointer to the GMM instance.
3260	* @param pGVM Pointer to the GVM instance.
3261	* @param idPage The Page ID
3262	* @param pPage The page structure.
3263	*/
3264	DECLINLINE(void) gmmR0FreePrivatePage(PGMM pGMM, PGVM pGVM, uint32_t idPage, PGMMPAGE pPage)
3265	{
3266	PGMMCHUNK pChunk = gmmR0GetChunk(pGMM, idPage >> GMM_CHUNKID_SHIFT);
3267	Assert(pChunk);
3268	Assert(pChunk->cFree < GMM_CHUNK_NUM_PAGES);
3269	Assert(pChunk->cPrivate > 0);
3270	Assert(pGMM->cAllocatedPages > 0);
3271
3272	pChunk->cPrivate--;
3273	pGMM->cAllocatedPages--;
3274	gmmR0FreePageWorker(pGMM, pGVM, pChunk, idPage, pPage);
3275	}
3276
3277
3278	/**
3279	* Common worker for GMMR0FreePages and GMMR0BalloonedPages.
3280	*
3281	* @returns VBox status code:
3282	* @retval xxx
3283	*
3284	* @param pGMM Pointer to the GMM instance data.
3285	* @param pGVM Pointer to the shared VM structure.
3286	* @param cPages The number of pages to free.
3287	* @param paPages Pointer to the page descriptors.
3288	* @param enmAccount The account this relates to.
3289	*/
3290	static int gmmR0FreePages(PGMM pGMM, PGVM pGVM, uint32_t cPages, PGMMFREEPAGEDESC paPages, GMMACCOUNT enmAccount)
3291	{
3292	/*
3293	* Check that the request isn't impossible with respect to the account status.
3294	*/
3295	switch (enmAccount)
3296	{
3297	case GMMACCOUNT_BASE:
3298	if (RT_UNLIKELY(pGVM->gmm.s.Stats.Allocated.cBasePages < cPages))
3299	{
3300	Log(("gmmR0FreePages: allocated=%#llx cPages=%#x!\n", pGVM->gmm.s.Stats.Allocated.cBasePages, cPages));
3301	return VERR_GMM_ATTEMPT_TO_FREE_TOO_MUCH;
3302	}
3303	break;
3304	case GMMACCOUNT_SHADOW:
3305	if (RT_UNLIKELY(pGVM->gmm.s.Stats.Allocated.cShadowPages < cPages))
3306	{
3307	Log(("gmmR0FreePages: allocated=%#llx cPages=%#x!\n", pGVM->gmm.s.Stats.Allocated.cShadowPages, cPages));
3308	return VERR_GMM_ATTEMPT_TO_FREE_TOO_MUCH;
3309	}
3310	break;
3311	case GMMACCOUNT_FIXED:
3312	if (RT_UNLIKELY(pGVM->gmm.s.Stats.Allocated.cFixedPages < cPages))
3313	{
3314	Log(("gmmR0FreePages: allocated=%#llx cPages=%#x!\n", pGVM->gmm.s.Stats.Allocated.cFixedPages, cPages));
3315	return VERR_GMM_ATTEMPT_TO_FREE_TOO_MUCH;
3316	}
3317	break;
3318	default:
3319	AssertMsgFailedReturn(("enmAccount=%d\n", enmAccount), VERR_IPE_NOT_REACHED_DEFAULT_CASE);
3320	}
3321
3322	/*
3323	* Walk the descriptors and free the pages.
3324	*
3325	* Statistics (except the account) are being updated as we go along,
3326	* unlike the alloc code. Also, stop on the first error.
3327	*/
3328	int rc = VINF_SUCCESS;
3329	uint32_t iPage;
3330	for (iPage = 0; iPage < cPages; iPage++)
3331	{
3332	uint32_t idPage = paPages[iPage].idPage;
3333	PGMMPAGE pPage = gmmR0GetPage(pGMM, idPage);
3334	if (RT_LIKELY(pPage))
3335	{
3336	if (RT_LIKELY(GMM_PAGE_IS_PRIVATE(pPage)))
3337	{
3338	if (RT_LIKELY(pPage->Private.hGVM == pGVM->hSelf))
3339	{
3340	Assert(pGVM->gmm.s.Stats.cPrivatePages);
3341	pGVM->gmm.s.Stats.cPrivatePages--;
3342	gmmR0FreePrivatePage(pGMM, pGVM, idPage, pPage);
3343	}
3344	else
3345	{
3346	Log(("gmmR0FreePages: #%#x/%#x: not owner! hGVM=%#x hSelf=%#x\n", iPage, idPage,
3347	pPage->Private.hGVM, pGVM->hSelf));
3348	rc = VERR_GMM_NOT_PAGE_OWNER;
3349	break;
3350	}
3351	}
3352	else if (RT_LIKELY(GMM_PAGE_IS_SHARED(pPage)))
3353	{
3354	Assert(pGVM->gmm.s.Stats.cSharedPages);
3355	pGVM->gmm.s.Stats.cSharedPages--;
3356	Assert(pPage->Shared.cRefs);
3357	if (!--pPage->Shared.cRefs)
3358	gmmR0FreeSharedPage(pGMM, pGVM, idPage, pPage);
3359	else
3360	{
3361	Assert(pGMM->cDuplicatePages);
3362	pGMM->cDuplicatePages--;
3363	}
3364	}
3365	else
3366	{
3367	Log(("gmmR0FreePages: #%#x/%#x: already free!\n", iPage, idPage));
3368	rc = VERR_GMM_PAGE_ALREADY_FREE;
3369	break;
3370	}
3371	}
3372	else
3373	{
3374	Log(("gmmR0FreePages: #%#x/%#x: not found!\n", iPage, idPage));
3375	rc = VERR_GMM_PAGE_NOT_FOUND;
3376	break;
3377	}
3378	paPages[iPage].idPage = NIL_GMM_PAGEID;
3379	}
3380
3381	/*
3382	* Update the account.
3383	*/
3384	switch (enmAccount)
3385	{
3386	case GMMACCOUNT_BASE: pGVM->gmm.s.Stats.Allocated.cBasePages -= iPage; break;
3387	case GMMACCOUNT_SHADOW: pGVM->gmm.s.Stats.Allocated.cShadowPages -= iPage; break;
3388	case GMMACCOUNT_FIXED: pGVM->gmm.s.Stats.Allocated.cFixedPages -= iPage; break;
3389	default:
3390	AssertMsgFailedReturn(("enmAccount=%d\n", enmAccount), VERR_IPE_NOT_REACHED_DEFAULT_CASE);
3391	}
3392
3393	/*
3394	* Any threshold stuff to be done here?
3395	*/
3396
3397	return rc;
3398	}
3399
3400
3401	/**
3402	* Free one or more pages.
3403	*
3404	* This is typically used at reset time or power off.
3405	*
3406	* @returns VBox status code:
3407	* @retval xxx
3408	*
3409	* @param pVM Pointer to the shared VM structure.
3410	* @param idCpu VCPU id
3411	* @param cPages The number of pages to free.
3412	* @param paPages Pointer to the page descriptors containing the Page IDs for each page.
3413	* @param enmAccount The account this relates to.
3414	* @thread EMT.
3415	*/
3416	GMMR0DECL(int) GMMR0FreePages(PVM pVM, VMCPUID idCpu, uint32_t cPages, PGMMFREEPAGEDESC paPages, GMMACCOUNT enmAccount)
3417	{
3418	LogFlow(("GMMR0FreePages: pVM=%p cPages=%#x paPages=%p enmAccount=%d\n", pVM, cPages, paPages, enmAccount));
3419
3420	/*
3421	* Validate input and get the basics.
3422	*/
3423	PGMM pGMM;
3424	GMM_GET_VALID_INSTANCE(pGMM, VERR_GMM_INSTANCE);
3425	PGVM pGVM;
3426	int rc = GVMMR0ByVMAndEMT(pVM, idCpu, &pGVM);
3427	if (RT_FAILURE(rc))
3428	return rc;
3429
3430	AssertPtrReturn(paPages, VERR_INVALID_PARAMETER);
3431	AssertMsgReturn(enmAccount > GMMACCOUNT_INVALID && enmAccount < GMMACCOUNT_END, ("%d\n", enmAccount), VERR_INVALID_PARAMETER);
3432	AssertMsgReturn(cPages > 0 && cPages < RT_BIT(32 - PAGE_SHIFT), ("%#x\n", cPages), VERR_INVALID_PARAMETER);
3433
3434	for (unsigned iPage = 0; iPage < cPages; iPage++)
3435	AssertMsgReturn( paPages[iPage].idPage <= GMM_PAGEID_LAST
3436	/*|| paPages[iPage].idPage == NIL_GMM_PAGEID*/,
3437	("#%#x: %#x\n", iPage, paPages[iPage].idPage), VERR_INVALID_PARAMETER);
3438
3439	/*
3440	* Take the semaphore and call the worker function.
3441	*/
3442	gmmR0MutexAcquire(pGMM);
3443	if (GMM_CHECK_SANITY_UPON_ENTERING(pGMM))
3444	{
3445	rc = gmmR0FreePages(pGMM, pGVM, cPages, paPages, enmAccount);
3446	GMM_CHECK_SANITY_UPON_LEAVING(pGMM);
3447	}
3448	else
3449	rc = VERR_GMM_IS_NOT_SANE;
3450	gmmR0MutexRelease(pGMM);
3451	LogFlow(("GMMR0FreePages: returns %Rrc\n", rc));
3452	return rc;
3453	}
3454
3455
3456	/**
3457	* VMMR0 request wrapper for GMMR0FreePages.
3458	*
3459	* @returns see GMMR0FreePages.
3460	* @param pVM Pointer to the shared VM structure.
3461	* @param idCpu VCPU id
3462	* @param pReq The request packet.
3463	*/
3464	GMMR0DECL(int) GMMR0FreePagesReq(PVM pVM, VMCPUID idCpu, PGMMFREEPAGESREQ pReq)
3465	{
3466	/*
3467	* Validate input and pass it on.
3468	*/
3469	AssertPtrReturn(pVM, VERR_INVALID_POINTER);
3470	AssertPtrReturn(pReq, VERR_INVALID_POINTER);
3471	AssertMsgReturn(pReq->Hdr.cbReq >= RT_UOFFSETOF(GMMFREEPAGESREQ, aPages[0]),
3472	("%#x < %#x\n", pReq->Hdr.cbReq, RT_UOFFSETOF(GMMFREEPAGESREQ, aPages[0])),
3473	VERR_INVALID_PARAMETER);
3474	AssertMsgReturn(pReq->Hdr.cbReq == RT_UOFFSETOF(GMMFREEPAGESREQ, aPages[pReq->cPages]),
3475	("%#x != %#x\n", pReq->Hdr.cbReq, RT_UOFFSETOF(GMMFREEPAGESREQ, aPages[pReq->cPages])),
3476	VERR_INVALID_PARAMETER);
3477
3478	return GMMR0FreePages(pVM, idCpu, pReq->cPages, &pReq->aPages[0], pReq->enmAccount);
3479	}
3480
3481
3482	/**
3483	* Report back on a memory ballooning request.
3484	*
3485	* The request may or may not have been initiated by the GMM. If it was initiated
3486	* by the GMM it is important that this function is called even if no pages were
3487	* ballooned.
3488	*
3489	* @returns VBox status code:
3490	* @retval VERR_GMM_ATTEMPT_TO_FREE_TOO_MUCH
3491	* @retval VERR_GMM_ATTEMPT_TO_DEFLATE_TOO_MUCH
3492	* @retval VERR_GMM_OVERCOMMITTED_TRY_AGAIN_IN_A_BIT - reset condition
3493	* indicating that we won't necessarily have sufficient RAM to boot
3494	* the VM again and that it should pause until this changes (we'll try to
3495	* balloon some other VM). (For standard deflate we have little choice
3496	* but to hope the VM won't use the memory that was returned to it.)
3497	*
3498	* @param pVM Pointer to the shared VM structure.
3499	* @param idCpu VCPU id
3500	* @param enmAction Inflate/deflate/reset
3501	* @param cBalloonedPages The number of pages that was ballooned.
3502	*
3503	* @thread EMT.
3504	*/
3505	GMMR0DECL(int) GMMR0BalloonedPages(PVM pVM, VMCPUID idCpu, GMMBALLOONACTION enmAction, uint32_t cBalloonedPages)
3506	{
3507	LogFlow(("GMMR0BalloonedPages: pVM=%p enmAction=%d cBalloonedPages=%#x\n",
3508	pVM, enmAction, cBalloonedPages));
3509
3510	AssertMsgReturn(cBalloonedPages < RT_BIT(32 - PAGE_SHIFT), ("%#x\n", cBalloonedPages), VERR_INVALID_PARAMETER);
3511
3512	/*
3513	* Validate input and get the basics.
3514	*/
3515	PGMM pGMM;
3516	GMM_GET_VALID_INSTANCE(pGMM, VERR_GMM_INSTANCE);
3517	PGVM pGVM;
3518	int rc = GVMMR0ByVMAndEMT(pVM, idCpu, &pGVM);
3519	if (RT_FAILURE(rc))
3520	return rc;
3521
3522	/*
3523	* Take the semaphore and do some more validations.
3524	*/
3525	gmmR0MutexAcquire(pGMM);
3526	if (GMM_CHECK_SANITY_UPON_ENTERING(pGMM))
3527	{
3528	switch (enmAction)
3529	{
3530	case GMMBALLOONACTION_INFLATE:
3531	{
3532	if (RT_LIKELY(pGVM->gmm.s.Stats.Allocated.cBasePages + pGVM->gmm.s.Stats.cBalloonedPages + cBalloonedPages
3533	<= pGVM->gmm.s.Stats.Reserved.cBasePages))
3534	{
3535	/*
3536	* Record the ballooned memory.
3537	*/
3538	pGMM->cBalloonedPages += cBalloonedPages;
3539	if (pGVM->gmm.s.Stats.cReqBalloonedPages)
3540	{
3541	/* Codepath never taken. Might be interesting in the future to request ballooned memory from guests in low memory conditions. */
3542	AssertFailed();
3543
3544	pGVM->gmm.s.Stats.cBalloonedPages += cBalloonedPages;
3545	pGVM->gmm.s.Stats.cReqActuallyBalloonedPages += cBalloonedPages;
3546	Log(("GMMR0BalloonedPages: +%#x - Global=%#llx / VM: Total=%#llx Req=%#llx Actual=%#llx (pending)\n",
3547	cBalloonedPages, pGMM->cBalloonedPages, pGVM->gmm.s.Stats.cBalloonedPages,
3548	pGVM->gmm.s.Stats.cReqBalloonedPages, pGVM->gmm.s.Stats.cReqActuallyBalloonedPages));
3549	}
3550	else
3551	{
3552	pGVM->gmm.s.Stats.cBalloonedPages += cBalloonedPages;
3553	Log(("GMMR0BalloonedPages: +%#x - Global=%#llx / VM: Total=%#llx (user)\n",
3554	cBalloonedPages, pGMM->cBalloonedPages, pGVM->gmm.s.Stats.cBalloonedPages));
3555	}
3556	}
3557	else
3558	{
3559	Log(("GMMR0BalloonedPages: cBasePages=%#llx Total=%#llx cBalloonedPages=%#x Reserved=%#llx\n",
3560	pGVM->gmm.s.Stats.Allocated.cBasePages, pGVM->gmm.s.Stats.cBalloonedPages, cBalloonedPages,
3561	pGVM->gmm.s.Stats.Reserved.cBasePages));
3562	rc = VERR_GMM_ATTEMPT_TO_FREE_TOO_MUCH;
3563	}
3564	break;
3565	}
3566
3567	case GMMBALLOONACTION_DEFLATE:
3568	{
3569	/* Deflate. */
3570	if (pGVM->gmm.s.Stats.cBalloonedPages >= cBalloonedPages)
3571	{
3572	/*
3573	* Record the ballooned memory.
3574	*/
3575	Assert(pGMM->cBalloonedPages >= cBalloonedPages);
3576	pGMM->cBalloonedPages -= cBalloonedPages;
3577	pGVM->gmm.s.Stats.cBalloonedPages -= cBalloonedPages;
3578	if (pGVM->gmm.s.Stats.cReqDeflatePages)
3579	{
3580	AssertFailed(); /* This path is for later. */
3581	Log(("GMMR0BalloonedPages: -%#x - Global=%#llx / VM: Total=%#llx Req=%#llx\n",
3582	cBalloonedPages, pGMM->cBalloonedPages, pGVM->gmm.s.Stats.cBalloonedPages, pGVM->gmm.s.Stats.cReqDeflatePages));
3583
3584	/*
3585	* Anything we need to do here now when the request has been completed?
3586	*/
3587	pGVM->gmm.s.Stats.cReqDeflatePages = 0;
3588	}
3589	else
3590	Log(("GMMR0BalloonedPages: -%#x - Global=%#llx / VM: Total=%#llx (user)\n",
3591	cBalloonedPages, pGMM->cBalloonedPages, pGVM->gmm.s.Stats.cBalloonedPages));
3592	}
3593	else
3594	{
3595	Log(("GMMR0BalloonedPages: Total=%#llx cBalloonedPages=%#x\n", pGVM->gmm.s.Stats.cBalloonedPages, cBalloonedPages));
3596	rc = VERR_GMM_ATTEMPT_TO_DEFLATE_TOO_MUCH;
3597	}
3598	break;
3599	}
3600
3601	case GMMBALLOONACTION_RESET:
3602	{
3603	/* Reset to an empty balloon. */
3604	Assert(pGMM->cBalloonedPages >= pGVM->gmm.s.Stats.cBalloonedPages);
3605
3606	pGMM->cBalloonedPages -= pGVM->gmm.s.Stats.cBalloonedPages;
3607	pGVM->gmm.s.Stats.cBalloonedPages = 0;
3608	break;
3609	}
3610
3611	default:
3612	rc = VERR_INVALID_PARAMETER;
3613	break;
3614	}
3615	GMM_CHECK_SANITY_UPON_LEAVING(pGMM);
3616	}
3617	else
3618	rc = VERR_GMM_IS_NOT_SANE;
3619
3620	gmmR0MutexRelease(pGMM);
3621	LogFlow(("GMMR0BalloonedPages: returns %Rrc\n", rc));
3622	return rc;
3623	}
3624
3625
3626	/**
3627	* VMMR0 request wrapper for GMMR0BalloonedPages.
3628	*
3629	* @returns see GMMR0BalloonedPages.
3630	* @param pVM Pointer to the shared VM structure.
3631	* @param idCpu VCPU id
3632	* @param pReq The request packet.
3633	*/
3634	GMMR0DECL(int) GMMR0BalloonedPagesReq(PVM pVM, VMCPUID idCpu, PGMMBALLOONEDPAGESREQ pReq)
3635	{
3636	/*
3637	* Validate input and pass it on.
3638	*/
3639	AssertPtrReturn(pVM, VERR_INVALID_POINTER);
3640	AssertPtrReturn(pReq, VERR_INVALID_POINTER);
3641	AssertMsgReturn(pReq->Hdr.cbReq == sizeof(GMMBALLOONEDPAGESREQ),
3642	("%#x != %#x\n", pReq->Hdr.cbReq, sizeof(GMMBALLOONEDPAGESREQ)),
3643	VERR_INVALID_PARAMETER);
3644
3645	return GMMR0BalloonedPages(pVM, idCpu, pReq->enmAction, pReq->cBalloonedPages);
3646	}
3647
3648	/**
3649	* Return memory statistics for the hypervisor
3650	*
3651	* @returns VBox status code:
3652	* @param pVM Pointer to the shared VM structure.
3653	* @param pReq The request packet.
3654	*/
3655	GMMR0DECL(int) GMMR0QueryHypervisorMemoryStatsReq(PVM pVM, PGMMMEMSTATSREQ pReq)
3656	{
3657	/*
3658	* Validate input and pass it on.
3659	*/
3660	AssertPtrReturn(pVM, VERR_INVALID_POINTER);
3661	AssertPtrReturn(pReq, VERR_INVALID_POINTER);
3662	AssertMsgReturn(pReq->Hdr.cbReq == sizeof(GMMMEMSTATSREQ),
3663	("%#x != %#x\n", pReq->Hdr.cbReq, sizeof(GMMMEMSTATSREQ)),
3664	VERR_INVALID_PARAMETER);
3665
3666	/*
3667	* Validate input and get the basics.
3668	*/
3669	PGMM pGMM;
3670	GMM_GET_VALID_INSTANCE(pGMM, VERR_GMM_INSTANCE);
3671	pReq->cAllocPages = pGMM->cAllocatedPages;
3672	pReq->cFreePages = (pGMM->cChunks << (GMM_CHUNK_SHIFT - PAGE_SHIFT)) - pGMM->cAllocatedPages;
3673	pReq->cBalloonedPages = pGMM->cBalloonedPages;
3674	pReq->cMaxPages = pGMM->cMaxPages;
3675	pReq->cSharedPages = pGMM->cDuplicatePages;
3676	GMM_CHECK_SANITY_UPON_LEAVING(pGMM);
3677
3678	return VINF_SUCCESS;
3679	}
3680
3681	/**
3682	* Return memory statistics for the VM
3683	*
3684	* @returns VBox status code:
3685	* @param pVM Pointer to the shared VM structure.
3686	* @param idCpu Cpu id.
3687	* @param pReq The request packet.
3688	*/
3689	GMMR0DECL(int) GMMR0QueryMemoryStatsReq(PVM pVM, VMCPUID idCpu, PGMMMEMSTATSREQ pReq)
3690	{
3691	/*
3692	* Validate input and pass it on.
3693	*/
3694	AssertPtrReturn(pVM, VERR_INVALID_POINTER);
3695	AssertPtrReturn(pReq, VERR_INVALID_POINTER);
3696	AssertMsgReturn(pReq->Hdr.cbReq == sizeof(GMMMEMSTATSREQ),
3697	("%#x != %#x\n", pReq->Hdr.cbReq, sizeof(GMMMEMSTATSREQ)),
3698	VERR_INVALID_PARAMETER);
3699
3700	/*
3701	* Validate input and get the basics.
3702	*/
3703	PGMM pGMM;
3704	GMM_GET_VALID_INSTANCE(pGMM, VERR_GMM_INSTANCE);
3705	PGVM pGVM;
3706	int rc = GVMMR0ByVMAndEMT(pVM, idCpu, &pGVM);
3707	if (RT_FAILURE(rc))
3708	return rc;
3709
3710	/*
3711	* Take the semaphore and do some more validations.
3712	*/
3713	gmmR0MutexAcquire(pGMM);
3714	if (GMM_CHECK_SANITY_UPON_ENTERING(pGMM))
3715	{
3716	pReq->cAllocPages = pGVM->gmm.s.Stats.Allocated.cBasePages;
3717	pReq->cBalloonedPages = pGVM->gmm.s.Stats.cBalloonedPages;
3718	pReq->cMaxPages = pGVM->gmm.s.Stats.Reserved.cBasePages;
3719	pReq->cFreePages = pReq->cMaxPages - pReq->cAllocPages;
3720	}
3721	else
3722	rc = VERR_GMM_IS_NOT_SANE;
3723
3724	gmmR0MutexRelease(pGMM);
3725	LogFlow(("GMMR0QueryMemoryStatsReq: returns %Rrc\n", rc));
3726	return rc;
3727	}
3728
3729
3730	/**
3731	* Worker for gmmR0UnmapChunk and gmmR0FreeChunk.
3732	*
3733	* Don't call this in legacy allocation mode!
3734	*
3735	* @returns VBox status code.
3736	* @param pGMM Pointer to the GMM instance data.
3737	* @param pGVM Pointer to the Global VM structure.
3738	* @param pChunk Pointer to the chunk to be unmapped.
3739	*/
3740	static int gmmR0UnmapChunkLocked(PGMM pGMM, PGVM pGVM, PGMMCHUNK pChunk)
3741	{
3742	Assert(!pGMM->fLegacyAllocationMode);
3743
3744	/*
3745	* Find the mapping and try unmapping it.
3746	*/
3747	uint32_t cMappings = pChunk->cMappingsX;
3748	for (uint32_t i = 0; i < cMappings; i++)
3749	{
3750	Assert(pChunk->paMappingsX[i].pGVM && pChunk->paMappingsX[i].hMapObj != NIL_RTR0MEMOBJ);
3751	if (pChunk->paMappingsX[i].pGVM == pGVM)
3752	{
3753	/* unmap */
3754	int rc = RTR0MemObjFree(pChunk->paMappingsX[i].hMapObj, false /* fFreeMappings (NA) */);
3755	if (RT_SUCCESS(rc))
3756	{
3757	/* update the record. */
3758	cMappings--;
3759	if (i < cMappings)
3760	pChunk->paMappingsX[i] = pChunk->paMappingsX[cMappings];
3761	pChunk->paMappingsX[cMappings].hMapObj = NIL_RTR0MEMOBJ;
3762	pChunk->paMappingsX[cMappings].pGVM = NULL;
3763	Assert(pChunk->cMappingsX - 1U == cMappings);
3764	pChunk->cMappingsX = cMappings;
3765	}
3766
3767	return rc;
3768	}
3769	}
3770
3771	Log(("gmmR0UnmapChunk: Chunk %#x is not mapped into pGVM=%p/%#x\n", pChunk->Core.Key, pGVM, pGVM->hSelf));
3772	return VERR_GMM_CHUNK_NOT_MAPPED;
3773	}
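
/*
 * Illustration only, not part of GMMR0.cpp: the record update above removes an
 * entry from an unordered array by moving the last entry into the freed slot
 * and shrinking the count. A minimal standalone sketch of that idea follows.
 * The names Mapping and removeMappingUnordered are hypothetical, and a
 * std::vector stands in for the paMappingsX array.
 */

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <vector>

// Hypothetical stand-in for one chunk mapping record (not the real GMMCHUNKMAP).
struct Mapping
{
    int      owner;   // stands in for pGVM
    uint32_t handle;  // stands in for hMapObj
};

// Removes the first entry owned by `owner` by overwriting it with the last
// entry and shrinking the array by one. Entry order is not preserved.
// Returns true if an entry was removed, false if no entry matched.
bool removeMappingUnordered(std::vector<Mapping> &mappings, int owner)
{
    for (std::size_t i = 0; i < mappings.size(); i++)
    {
        if (mappings[i].owner == owner)
        {
            std::size_t const last = mappings.size() - 1;
            if (i < last)
                mappings[i] = mappings[last];
            mappings.pop_back();
            return true;
        }
    }
    return false;
}
```

/*
 * After the linear search the removal itself is constant time. The cost is
 * that entry order changes, which seems acceptable when every lookup scans the
 * whole array anyway.
 */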
3774
3775
3776	/**
3777	* Unmaps a chunk previously mapped into the address space of the current process.
3778	*
3779	* @returns VBox status code.
3780	* @param pGMM Pointer to the GMM instance data.
3781	* @param pGVM Pointer to the Global VM structure.
3782	* @param pChunk Pointer to the chunk to be unmapped.
3783	*/
3784	static int gmmR0UnmapChunk(PGMM pGMM, PGVM pGVM, PGMMCHUNK pChunk, bool fRelaxedSem)
3785	{
3786	if (!pGMM->fLegacyAllocationMode)
3787	{
3788	/*
3789	* Lock the chunk and if possible leave the giant GMM lock.
3790	*/
3791	GMMR0CHUNKMTXSTATE MtxState;
3792	int rc = gmmR0ChunkMutexAcquire(&MtxState, pGMM, pChunk,
3793	fRelaxedSem ? GMMR0CHUNK_MTX_RETAKE_GIANT : GMMR0CHUNK_MTX_KEEP_GIANT);
3794	if (RT_SUCCESS(rc))
3795	{
3796	rc = gmmR0UnmapChunkLocked(pGMM, pGVM, pChunk);
3797	gmmR0ChunkMutexRelease(&MtxState, pChunk);
3798	}
3799	return rc;
3800	}
3801
3802	if (pChunk->hGVM == pGVM->hSelf)
3803	return VINF_SUCCESS;
3804
3805	Log(("gmmR0UnmapChunk: Chunk %#x is not mapped into pGVM=%p/%#x (legacy)\n", pChunk->Core.Key, pGVM, pGVM->hSelf));
3806	return VERR_GMM_CHUNK_NOT_MAPPED;
3807	}
3808
3809
3810	/**
3811	* Worker for gmmR0MapChunk.
3812	*
3813	* @returns VBox status code.
3814	* @param pGMM Pointer to the GMM instance data.
3815	* @param pGVM Pointer to the Global VM structure.
3816	* @param pChunk Pointer to the chunk to be mapped.
3817	* @param ppvR3 Where to store the ring-3 address of the mapping.
3818	* In the VERR_GMM_CHUNK_ALREADY_MAPPED case, this will
3819	* contain the address of the existing mapping.
3820	*/
3821	static int gmmR0MapChunkLocked(PGMM pGMM, PGVM pGVM, PGMMCHUNK pChunk, PRTR3PTR ppvR3)
3822	{
3823	/*
3824	* If we're in legacy mode this is simple.
3825	*/
3826	if (pGMM->fLegacyAllocationMode)
3827	{
3828	if (pChunk->hGVM != pGVM->hSelf)
3829	{
3830	Log(("gmmR0MapChunk: chunk %#x is already mapped at %p!\n", pChunk->Core.Key, *ppvR3));
3831	return VERR_GMM_CHUNK_NOT_FOUND;
3832	}
3833
3834	*ppvR3 = RTR0MemObjAddressR3(pChunk->hMemObj);
3835	return VINF_SUCCESS;
3836	}
3837
3838	/*
3839	* Check to see if the chunk is already mapped.
3840	*/
3841	for (uint32_t i = 0; i < pChunk->cMappingsX; i++)
3842	{
3843	Assert(pChunk->paMappingsX[i].pGVM && pChunk->paMappingsX[i].hMapObj != NIL_RTR0MEMOBJ);
3844	if (pChunk->paMappingsX[i].pGVM == pGVM)
3845	{
3846	*ppvR3 = RTR0MemObjAddressR3(pChunk->paMappingsX[i].hMapObj);
3847	Log(("gmmR0MapChunk: chunk %#x is already mapped at %p!\n", pChunk->Core.Key, *ppvR3));
3848	#ifdef VBOX_WITH_PAGE_SHARING
3849	/* The ring-3 chunk cache can be out of sync; don't fail. */
3850	return VINF_SUCCESS;
3851	#else
3852	return VERR_GMM_CHUNK_ALREADY_MAPPED;
3853	#endif
3854	}
3855	}
3856
3857	/*
3858	* Do the mapping.
3859	*/
3860	RTR0MEMOBJ hMapObj;
3861	int rc = RTR0MemObjMapUser(&hMapObj, pChunk->hMemObj, (RTR3PTR)-1, 0, RTMEM_PROT_READ | RTMEM_PROT_WRITE, NIL_RTR0PROCESS);
3862	if (RT_SUCCESS(rc))
3863	{
3864	/* reallocate the array? assumes few users per chunk (usually one). */
3865	unsigned iMapping = pChunk->cMappingsX;
3866	if ( iMapping <= 3
3867	|| (iMapping & 3) == 0)
3868	{
3869	unsigned cNewSize = iMapping <= 3
3870	? iMapping + 1
3871	: iMapping + 4;
3872	Assert(cNewSize < 4 || RT_ALIGN_32(cNewSize, 4) == cNewSize);
3873	if (RT_UNLIKELY(cNewSize > UINT16_MAX))
3874	{
3875	rc = RTR0MemObjFree(hMapObj, false /* fFreeMappings (NA) */); AssertRC(rc);
3876	return VERR_GMM_TOO_MANY_CHUNK_MAPPINGS;
3877	}
3878
3879	void *pvMappings = RTMemRealloc(pChunk->paMappingsX, cNewSize * sizeof(pChunk->paMappingsX[0]));
3880	if (RT_UNLIKELY(!pvMappings))
3881	{
3882	rc = RTR0MemObjFree(hMapObj, false /* fFreeMappings (NA) */); AssertRC(rc);
3883	return VERR_NO_MEMORY;
3884	}
3885	pChunk->paMappingsX = (PGMMCHUNKMAP)pvMappings;
3886	}
3887
3888	/* insert new entry */
3889	pChunk->paMappingsX[iMapping].hMapObj = hMapObj;
3890	pChunk->paMappingsX[iMapping].pGVM = pGVM;
3891	Assert(pChunk->cMappingsX == iMapping);
3892	pChunk->cMappingsX = iMapping + 1;
3893
3894	*ppvR3 = RTR0MemObjAddressR3(hMapObj);
3895	}
3896
3897	return rc;
3898	}
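
/*
 * Illustration only, not part of GMMR0.cpp: the reallocation test in
 * gmmR0MapChunkLocked grows the mapping array one entry at a time for the
 * first four entries and in steps of four after that. The standalone sketch
 * below restates that policy as a pure function. The name
 * newCapacityForInsert and the "0 means no reallocation" convention are
 * assumptions made for this example.
 */

```cpp
#include <cassert>

// Given the current number of entries, returns the capacity the array must be
// grown to before inserting one more entry, or 0 if the existing capacity is
// already sufficient (hypothetical convention for this sketch).
unsigned newCapacityForInsert(unsigned count)
{
    if (count <= 3)
        return count + 1;       // small arrays grow by exactly one entry
    if ((count & 3) == 0)
        return count + 4;       // at a multiple of four, add four more slots
    return 0;                   // room left from the previous step of four
}
```

/*
 * With this policy a chunk that has a single user, which the source comment
 * says is the usual case, never holds more than one slot.
 */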
3899
3900
3901	/**
3902	* Maps a chunk into the user address space of the current process.
3903	*
3904	* @returns VBox status code.
3905	* @param pGMM Pointer to the GMM instance data.
3906	* @param pGVM Pointer to the Global VM structure.
3907	* @param pChunk Pointer to the chunk to be mapped.
3908	* @param fRelaxedSem Whether we can release the semaphore while doing the
3909	* mapping (@c true) or not.
3910	* @param ppvR3 Where to store the ring-3 address of the mapping.
3911	* In the VERR_GMM_CHUNK_ALREADY_MAPPED case, this will
3912	* contain the address of the existing mapping.
3913	*/
3914	static int gmmR0MapChunk(PGMM pGMM, PGVM pGVM, PGMMCHUNK pChunk, bool fRelaxedSem, PRTR3PTR ppvR3)
3915	{
3916	/*
3917	* Take the chunk lock and leave the giant GMM lock when possible, then
3918	* call the worker function.
3919	*/
3920	GMMR0CHUNKMTXSTATE MtxState;
3921	int rc = gmmR0ChunkMutexAcquire(&MtxState, pGMM, pChunk,
3922	fRelaxedSem ? GMMR0CHUNK_MTX_RETAKE_GIANT : GMMR0CHUNK_MTX_KEEP_GIANT);
3923	if (RT_SUCCESS(rc))
3924	{
3925	rc = gmmR0MapChunkLocked(pGMM, pGVM, pChunk, ppvR3);
3926	gmmR0ChunkMutexRelease(&MtxState, pChunk);
3927	}
3928
3929	return rc;
3930	}
3931
3932
3933
3934	#if defined(VBOX_WITH_PAGE_SHARING) || (defined(VBOX_STRICT) && HC_ARCH_BITS == 64)
3935	/**
3936	* Check if a chunk is mapped into the specified VM
3937	*
3938	* @returns true if the chunk is mapped into the VM, false if not.
3939	* @param pGMM Pointer to the GMM instance.
3940	* @param pGVM Pointer to the Global VM structure.
3941	* @param pChunk Pointer to the chunk to be mapped.
3942	* @param ppvR3 Where to store the ring-3 address of the mapping.
3943	*/
3944	static int gmmR0IsChunkMapped(PGMM pGMM, PGVM pGVM, PGMMCHUNK pChunk, PRTR3PTR ppvR3)
3945	{
3946	GMMR0CHUNKMTXSTATE MtxState;
3947	gmmR0ChunkMutexAcquire(&MtxState, pGMM, pChunk, GMMR0CHUNK_MTX_KEEP_GIANT);
3948	for (uint32_t i = 0; i < pChunk->cMappingsX; i++)
3949	{
3950	Assert(pChunk->paMappingsX[i].pGVM && pChunk->paMappingsX[i].hMapObj != NIL_RTR0MEMOBJ);
3951	if (pChunk->paMappingsX[i].pGVM == pGVM)
3952	{
3953	*ppvR3 = RTR0MemObjAddressR3(pChunk->paMappingsX[i].hMapObj);
3954	gmmR0ChunkMutexRelease(&MtxState, pChunk);
3955	return true;
3956	}
3957	}
3958	*ppvR3 = NULL;
3959	gmmR0ChunkMutexRelease(&MtxState, pChunk);
3960	return false;
3961	}
3962	#endif /* VBOX_WITH_PAGE_SHARING || (VBOX_STRICT && 64-BIT) */
3963
3964
3965	/**
3966	* Map a chunk and/or unmap another chunk.
3967	*
3968	* The mapping and unmapping applies to the current process.
3969	*
3970	* This API does two things because it saves a kernel call per mapping
3971	* when the ring-3 mapping cache is full.
3972	*
3973	* @returns VBox status code.
3974	* @param pVM The VM.
3975	* @param idChunkMap The chunk to map. NIL_GMM_CHUNKID if nothing to map.
3976	* @param idChunkUnmap The chunk to unmap. NIL_GMM_CHUNKID if nothing to unmap.
3977	* @param ppvR3 Where to store the address of the mapped chunk. NULL is ok if nothing to map.
3978	* @thread EMT
3979	*/
3980	GMMR0DECL(int) GMMR0MapUnmapChunk(PVM pVM, uint32_t idChunkMap, uint32_t idChunkUnmap, PRTR3PTR ppvR3)
3981	{
3982	LogFlow(("GMMR0MapUnmapChunk: pVM=%p idChunkMap=%#x idChunkUnmap=%#x ppvR3=%p\n",
3983	pVM, idChunkMap, idChunkUnmap, ppvR3));
3984
3985	/*
3986	* Validate input and get the basics.
3987	*/
3988	PGMM pGMM;
3989	GMM_GET_VALID_INSTANCE(pGMM, VERR_GMM_INSTANCE);
3990	PGVM pGVM;
3991	int rc = GVMMR0ByVM(pVM, &pGVM);
3992	if (RT_FAILURE(rc))
3993	return rc;
3994
3995	AssertCompile(NIL_GMM_CHUNKID == 0);
3996	AssertMsgReturn(idChunkMap <= GMM_CHUNKID_LAST, ("%#x\n", idChunkMap), VERR_INVALID_PARAMETER);
3997	AssertMsgReturn(idChunkUnmap <= GMM_CHUNKID_LAST, ("%#x\n", idChunkUnmap), VERR_INVALID_PARAMETER);
3998
3999	if ( idChunkMap == NIL_GMM_CHUNKID
4000	&& idChunkUnmap == NIL_GMM_CHUNKID)
4001	return VERR_INVALID_PARAMETER;
4002
4003	if (idChunkMap != NIL_GMM_CHUNKID)
4004	{
4005	AssertPtrReturn(ppvR3, VERR_INVALID_POINTER);
4006	*ppvR3 = NIL_RTR3PTR;
4007	}
4008
4009	/*
4010	* Take the semaphore and do the work.
4011	*
4012	* The unmapping is done last since it's easier to undo a mapping than
4013	* undoing an unmapping. The ring-3 mapping cache cannot be so big
4014	* that it pushes the user virtual address space to within a chunk of
4015	* its limits, so no problem here.
4016	*/
4017	gmmR0MutexAcquire(pGMM);
4018	if (GMM_CHECK_SANITY_UPON_ENTERING(pGMM))
4019	{
4020	PGMMCHUNK pMap = NULL;
4021	if (idChunkMap != NIL_GMM_CHUNKID)
4022	{
4023	pMap = gmmR0GetChunk(pGMM, idChunkMap);
4024	if (RT_LIKELY(pMap))
4025	rc = gmmR0MapChunk(pGMM, pGVM, pMap, true /*fRelaxedSem*/, ppvR3);
4026	else
4027	{
4028	Log(("GMMR0MapUnmapChunk: idChunkMap=%#x\n", idChunkMap));
4029	rc = VERR_GMM_CHUNK_NOT_FOUND;
4030	}
4031	}
4032	/** @todo split this operation, the bail out might (theoretically) not be
4033	* entirely safe. */
4034
4035	if ( idChunkUnmap != NIL_GMM_CHUNKID
4036	&& RT_SUCCESS(rc))
4037	{
4038	PGMMCHUNK pUnmap = gmmR0GetChunk(pGMM, idChunkUnmap);
4039	if (RT_LIKELY(pUnmap))
4040	rc = gmmR0UnmapChunk(pGMM, pGVM, pUnmap, true /*fRelaxedSem*/);
4041	else
4042	{
4043	Log(("GMMR0MapUnmapChunk: idChunkUnmap=%#x\n", idChunkUnmap));
4044	rc = VERR_GMM_CHUNK_NOT_FOUND;
4045	}
4046
4047	if (RT_FAILURE(rc) && pMap)
4048	gmmR0UnmapChunk(pGMM, pGVM, pMap, false /*fRelaxedSem*/);
4049	}
4050
4051	GMM_CHECK_SANITY_UPON_LEAVING(pGMM);
4052	}
4053	else
4054	rc = VERR_GMM_IS_NOT_SANE;
4055	gmmR0MutexRelease(pGMM);
4056
4057	LogFlow(("GMMR0MapUnmapChunk: returns %Rrc\n", rc));
4058	return rc;
4059	}
4060
4061
4062	/**
4063	* VMMR0 request wrapper for GMMR0MapUnmapChunk.
4064	*
4065	* @returns see GMMR0MapUnmapChunk.
4066	* @param pVM Pointer to the shared VM structure.
4067	* @param pReq The request packet.
4068	*/
4069	GMMR0DECL(int) GMMR0MapUnmapChunkReq(PVM pVM, PGMMMAPUNMAPCHUNKREQ pReq)
4070	{
4071	/*
4072	* Validate input and pass it on.
4073	*/
4074	AssertPtrReturn(pVM, VERR_INVALID_POINTER);
4075	AssertPtrReturn(pReq, VERR_INVALID_POINTER);
4076	AssertMsgReturn(pReq->Hdr.cbReq == sizeof(*pReq), ("%#x != %#x\n", pReq->Hdr.cbReq, sizeof(*pReq)), VERR_INVALID_PARAMETER);
4077
4078	return GMMR0MapUnmapChunk(pVM, pReq->idChunkMap, pReq->idChunkUnmap, &pReq->pvR3);
4079	}
4080
4081
4082	/**
4083	* Legacy mode API for supplying pages.
4084	*
4085	* The specified user address points to an allocation-chunk-sized block that
4086	* will be locked down and used by the GMM when the VM asks for pages.
4087	*
4088	* @returns VBox status code.
4089	* @param pVM The VM.
4090	* @param idCpu VCPU id
4091	* @param pvR3 Pointer to the chunk size memory block to lock down.
4092	*/
4093	GMMR0DECL(int) GMMR0SeedChunk(PVM pVM, VMCPUID idCpu, RTR3PTR pvR3)
4094	{
4095	/*
4096	* Validate input and get the basics.
4097	*/
4098	PGMM pGMM;
4099	GMM_GET_VALID_INSTANCE(pGMM, VERR_GMM_INSTANCE);
4100	PGVM pGVM;
4101	int rc = GVMMR0ByVMAndEMT(pVM, idCpu, &pGVM);
4102	if (RT_FAILURE(rc))
4103	return rc;
4104
4105	AssertPtrReturn(pvR3, VERR_INVALID_POINTER);
4106	AssertReturn(!(PAGE_OFFSET_MASK & pvR3), VERR_INVALID_POINTER);
4107
4108	if (!pGMM->fLegacyAllocationMode)
4109	{
4110	Log(("GMMR0SeedChunk: not in legacy allocation mode!\n"));
4111	return VERR_NOT_SUPPORTED;
4112	}
4113
4114	/*
4115	* Lock the memory and add it as new chunk with our hGVM.
4116	* (The GMM locking is done inside gmmR0RegisterChunk.)
4117	*/
4118	RTR0MEMOBJ MemObj;
4119	rc = RTR0MemObjLockUser(&MemObj, pvR3, GMM_CHUNK_SIZE, RTMEM_PROT_READ | RTMEM_PROT_WRITE, NIL_RTR0PROCESS);
4120	if (RT_SUCCESS(rc))
4121	{
4122	rc = gmmR0RegisterChunk(pGMM, &pGVM->gmm.s.Private, MemObj, pGVM->hSelf, 0 /*fChunkFlags*/, NULL);
4123	if (RT_SUCCESS(rc))
4124	gmmR0MutexRelease(pGMM);
4125	else
4126	RTR0MemObjFree(MemObj, false /* fFreeMappings */);
4127	}
4128
4129	LogFlow(("GMMR0SeedChunk: rc=%d (pvR3=%p)\n", rc, pvR3));
4130	return rc;
4131	}
4132
4133
4134	typedef struct
4135	{
4136	PAVLGCPTRNODECORE pNode;
4137	char *pszModuleName;
4138	char *pszVersion;
4139	VBOXOSFAMILY enmGuestOS;
4140	} GMMFINDMODULEBYNAME, *PGMMFINDMODULEBYNAME;
4141
4142	/**
4143	* Tree enumeration callback for finding identical modules by name and version
4144	*/
4145	DECLCALLBACK(int) gmmR0CheckForIdenticalModule(PAVLGCPTRNODECORE pNode, void *pvUser)
4146	{
4147	PGMMFINDMODULEBYNAME pInfo = (PGMMFINDMODULEBYNAME)pvUser;
4148	PGMMSHAREDMODULE pModule = (PGMMSHAREDMODULE)pNode;
4149
4150	if ( pInfo
4151	&& pInfo->enmGuestOS == pModule->enmGuestOS
4152	/** @todo replace with RTStrNCmp */
4153	&& !strcmp(pModule->szName, pInfo->pszModuleName)
4154	&& !strcmp(pModule->szVersion, pInfo->pszVersion))
4155	{
4156	pInfo->pNode = pNode;
4157	return 1; /* stop search */
4158	}
4159	return 0;
4160	}
4161
4162
4163	/**
4164	* Registers a new shared module for the VM
4165	*
4166	* @returns VBox status code.
4167	* @param pVM VM handle
4168	* @param idCpu VCPU id
4169	* @param enmGuestOS Guest OS type
4170	* @param pszModuleName Module name
4171	* @param pszVersion Module version
4172	* @param GCBaseAddr Module base address
4173	* @param cbModule Module size
4174	* @param cRegions Number of shared region descriptors
4175	* @param pRegions Shared region(s)
4176	*/
4177	GMMR0DECL(int) GMMR0RegisterSharedModule(PVM pVM, VMCPUID idCpu, VBOXOSFAMILY enmGuestOS, char *pszModuleName,
4178	char *pszVersion, RTGCPTR GCBaseAddr, uint32_t cbModule,
4179	uint32_t cRegions, VMMDEVSHAREDREGIONDESC *pRegions)
4180	{
4181	#ifdef VBOX_WITH_PAGE_SHARING
4182	/*
4183	* Validate input and get the basics.
4184	*/
4185	PGMM pGMM;
4186	GMM_GET_VALID_INSTANCE(pGMM, VERR_GMM_INSTANCE);
4187	PGVM pGVM;
4188	int rc = GVMMR0ByVMAndEMT(pVM, idCpu, &pGVM);
4189	if (RT_FAILURE(rc))
4190	return rc;
4191
4192	AssertPtrReturn(pszModuleName, VERR_INVALID_POINTER);
4193	AssertPtrReturn(pszVersion, VERR_INVALID_POINTER);
4194
4195
4196	Log(("GMMR0RegisterSharedModule %s %s base %RGv size %x\n", pszModuleName, pszVersion, GCBaseAddr, cbModule));
4197
4198	/*
4199	* Take the semaphore and do some more validations.
4200	*/
4201	gmmR0MutexAcquire(pGMM);
4202	if (GMM_CHECK_SANITY_UPON_ENTERING(pGMM))
4203	{
4204	bool fNewModule = false;
4205
4206	/* Check if this module is already locally registered. */
4207	PGMMSHAREDMODULEPERVM pRecVM = (PGMMSHAREDMODULEPERVM)RTAvlGCPtrGet(&pGVM->gmm.s.pSharedModuleTree, GCBaseAddr);
4208	if (!pRecVM)
4209	{
4210	pRecVM = (PGMMSHAREDMODULEPERVM)RTMemAllocZ(RT_OFFSETOF(GMMSHAREDMODULEPERVM, aRegions[cRegions]));
4211	if (!pRecVM)
4212	{
4213	AssertFailed();
4214	rc = VERR_NO_MEMORY;
4215	goto end;
4216	}
4217	pRecVM->Core.Key = GCBaseAddr;
4218	pRecVM->cRegions = cRegions;
4219
4220	/* Save the region data as they can differ between VMs (address space scrambling or simply different loading order) */
4221	for (unsigned i = 0; i < cRegions; i++)
4222	{
4223	pRecVM->aRegions[i].GCRegionAddr = pRegions[i].GCRegionAddr;
4224	pRecVM->aRegions[i].cbRegion = RT_ALIGN_T(pRegions[i].cbRegion, PAGE_SIZE, uint32_t);
4225	pRecVM->aRegions[i].u32Alignment = 0;
4226	pRecVM->aRegions[i].paidPages = NULL; /* unused */
4227	}
4228
4229	bool fInsert = RTAvlGCPtrInsert(&pGVM->gmm.s.pSharedModuleTree, &pRecVM->Core);
4230	Assert(fInsert); NOREF(fInsert);
4231	pGVM->gmm.s.Stats.cShareableModules++;
4232
4233	Log(("GMMR0RegisterSharedModule: new local module %s\n", pszModuleName));
4234	fNewModule = true;
4235	}
4236	else
4237	rc = VINF_PGM_SHARED_MODULE_ALREADY_REGISTERED;
4238
4239	/* Check if this module is already globally registered. */
4240	PGMMSHAREDMODULE pGlobalModule = (PGMMSHAREDMODULE)RTAvlGCPtrGet(&pGMM->pGlobalSharedModuleTree, GCBaseAddr);
4241	if ( !pGlobalModule
4242	&& enmGuestOS == VBOXOSFAMILY_Windows64)
4243	{
4244	/*
4245	* Two identical copies of e.g. Win7 x64 will typically not have a
4246	* similar virtual address space layout for dlls or kernel modules.
4247	* Try to find identical binaries based on name and version.
4248	*/
4249	GMMFINDMODULEBYNAME Info;
4250
4251	Info.pNode = NULL;
4252	Info.pszVersion = pszVersion;
4253	Info.pszModuleName = pszModuleName;
4254	Info.enmGuestOS = enmGuestOS;
4255
4256	Log(("Try to find identical module %s\n", pszModuleName));
4257	int ret = RTAvlGCPtrDoWithAll(&pGMM->pGlobalSharedModuleTree, true /* fFromLeft */, gmmR0CheckForIdenticalModule, &Info);
4258	if (ret == 1)
4259	{
4260	Assert(Info.pNode);
4261	pGlobalModule = (PGMMSHAREDMODULE)Info.pNode;
4262	Log(("Found identical module at %RGv\n", pGlobalModule->Core.Key));
4263	}
4264	}
4265
4266	if (!pGlobalModule)
4267	{
4268	Assert(fNewModule);
4269	Assert(!pRecVM->fCollision);
4270
4271	pGlobalModule = (PGMMSHAREDMODULE)RTMemAllocZ(RT_OFFSETOF(GMMSHAREDMODULE, aRegions[cRegions]));
4272	if (!pGlobalModule)
4273	{
4274	AssertFailed();
4275	rc = VERR_NO_MEMORY;
4276	goto end;
4277	}
4278
4279	pGlobalModule->Core.Key = GCBaseAddr;
4280	pGlobalModule->cbModule = cbModule;
4281	/* Input limit already safe; no need to check again. */
4282	/** @todo replace with RTStrCopy */
4283	strcpy(pGlobalModule->szName, pszModuleName);
4284	strcpy(pGlobalModule->szVersion, pszVersion);
4285
4286	pGlobalModule->enmGuestOS = enmGuestOS;
4287	pGlobalModule->cRegions = cRegions;
4288
4289	for (unsigned i = 0; i < cRegions; i++)
4290	{
4291	Log(("New region %d base=%RGv size %x\n", i, pRegions[i].GCRegionAddr, pRegions[i].cbRegion));
4292	pGlobalModule->aRegions[i].GCRegionAddr = pRegions[i].GCRegionAddr;
4293	pGlobalModule->aRegions[i].cbRegion = RT_ALIGN_T(pRegions[i].cbRegion, PAGE_SIZE, uint32_t);
4294	pGlobalModule->aRegions[i].u32Alignment = 0;
4295	pGlobalModule->aRegions[i].paidPages = NULL; /* uninitialized. */
4296	}
4297
4298	/* Save reference. */
4299	pRecVM->pGlobalModule = pGlobalModule;
4300	pRecVM->fCollision = false;
4301	pGlobalModule->cUsers++;
4302	rc = VINF_SUCCESS;
4303
4304	bool fInsert = RTAvlGCPtrInsert(&pGMM->pGlobalSharedModuleTree, &pGlobalModule->Core);
4305	Assert(fInsert); NOREF(fInsert);
4306	pGMM->cShareableModules++;
4307
4308	Log(("GMMR0RegisterSharedModule: new global module %s\n", pszModuleName));
4309	}
4310	else
4311	{
4312	Assert(pGlobalModule->cUsers > 0);
4313
4314	/* Make sure the name and version are identical. */
4315	/** @todo replace with RTStrNCmp */
4316	/** @todo need to check region count or it will blow up in
4317	* GMMR0SharedModuleCheckPage / PGMR0SharedModuleCheck.
4318	*
4319	* More interestingly though, is why we don't do the W7/64 thing for ALL
4320	* modules and just dispense with the address collision. */
4321	if ( !strcmp(pGlobalModule->szName, pszModuleName)
4322	&& !strcmp(pGlobalModule->szVersion, pszVersion))
4323	{
4324	/* Save reference. */
4325	pRecVM->pGlobalModule = pGlobalModule;
4326	if ( fNewModule
4327	|| pRecVM->fCollision == true) /* colliding module unregistered and new one registered since the last check */
4328	{
4329	pGlobalModule->cUsers++;
4330	Log(("GMMR0RegisterSharedModule: using existing module %s cUser=%d!\n", pszModuleName, pGlobalModule->cUsers));
4331	}
4332	pRecVM->fCollision = false;
4333	rc = VINF_SUCCESS;
4334	}
4335	else
4336	{
4337	Log(("GMMR0RegisterSharedModule: module %s collision!\n", pszModuleName));
4338	pRecVM->fCollision = true;
4339	rc = VINF_PGM_SHARED_MODULE_COLLISION;
4340	goto end;
4341	}
4342	}
4343
4344	GMM_CHECK_SANITY_UPON_LEAVING(pGMM);
4345	}
4346	else
4347	rc = VERR_GMM_IS_NOT_SANE;
4348
4349	end:
4350	gmmR0MutexRelease(pGMM);
4351	return rc;
4352	#else
4353
4354	NOREF(pVM); NOREF(idCpu); NOREF(enmGuestOS); NOREF(pszModuleName); NOREF(pszVersion);
4355	NOREF(GCBaseAddr); NOREF(cbModule); NOREF(cRegions); NOREF(pRegions);
4356	return VERR_NOT_IMPLEMENTED;
4357	#endif
4358	}
4359
4360
4361	/**
4362	* VMMR0 request wrapper for GMMR0RegisterSharedModule.
4363	*
4364	* @returns see GMMR0RegisterSharedModule.
4365	* @param pVM Pointer to the shared VM structure.
4366	* @param idCpu VCPU id
4367	* @param pReq The request packet.
4368	*/
4369	GMMR0DECL(int) GMMR0RegisterSharedModuleReq(PVM pVM, VMCPUID idCpu, PGMMREGISTERSHAREDMODULEREQ pReq)
4370	{
4371	/*
4372	* Validate input and pass it on.
4373	*/
4374	AssertPtrReturn(pVM, VERR_INVALID_POINTER);
4375	AssertPtrReturn(pReq, VERR_INVALID_POINTER);
4376	AssertMsgReturn(pReq->Hdr.cbReq >= sizeof(*pReq) && pReq->Hdr.cbReq == RT_UOFFSETOF(GMMREGISTERSHAREDMODULEREQ, aRegions[pReq->cRegions]), ("%#x != %#x\n", pReq->Hdr.cbReq, sizeof(*pReq)), VERR_INVALID_PARAMETER);
4377
4378	/* Pass back return code in the request packet to preserve informational codes. (VMMR3CallR0 chokes on them) */
4379	pReq->rc = GMMR0RegisterSharedModule(pVM, idCpu, pReq->enmGuestOS, pReq->szName, pReq->szVersion, pReq->GCBaseAddr, pReq->cbModule, pReq->cRegions, pReq->aRegions);
4380	return VINF_SUCCESS;
4381	}
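
/*
 * Illustration only, not part of GMMR0.cpp: the request wrapper above accepts
 * a packet only if its declared size equals the offset of the trailing region
 * array plus one array element per declared region. The standalone sketch
 * below shows that check on a simplified packet. ExampleRegion, ExampleReq and
 * isRequestSizeValid are hypothetical names, and the layout is an assumption,
 * not the real GMMREGISTERSHAREDMODULEREQ.
 */

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>

// Hypothetical region descriptor (simplified).
struct ExampleRegion
{
    uint64_t addr;
    uint32_t cb;
    uint32_t pad;
};

// Hypothetical variable-size request: the header is followed by cRegions
// descriptors, of which only the first is declared in the struct.
struct ExampleReq
{
    uint32_t      cbReq;      // total packet size claimed by the caller
    uint32_t      cRegions;   // number of trailing descriptors claimed
    ExampleRegion aRegions[1];
};

// Returns true if the claimed size is at least the fixed struct size and
// matches the size implied by the claimed region count exactly.
bool isRequestSizeValid(const ExampleReq &req)
{
    std::size_t const cbExpected = offsetof(ExampleReq, aRegions)
                                 + static_cast<std::size_t>(req.cRegions) * sizeof(ExampleRegion);
    return req.cbReq >= sizeof(ExampleReq)
        && req.cbReq == cbExpected;
}
```

/*
 * Checking the size against the element count before touching the array keeps
 * the callee from reading past the end of a packet whose cRegions field
 * overstates what was actually supplied.
 */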
4382
4383
4384	/**
4385	* Unregisters a shared module for the VM
4386	*
4387	* @returns VBox status code.
4388	* @param pVM VM handle
4389	* @param idCpu VCPU id
4390	* @param pszModuleName Module name
4391	* @param pszVersion Module version
4392	* @param GCBaseAddr Module base address
4393	* @param cbModule Module size
4394	*/
4395	GMMR0DECL(int) GMMR0UnregisterSharedModule(PVM pVM, VMCPUID idCpu, char *pszModuleName, char *pszVersion,
4396	RTGCPTR GCBaseAddr, uint32_t cbModule)
4397	{
4398	#ifdef VBOX_WITH_PAGE_SHARING
4399	/*
4400	* Validate input and get the basics.
4401	*/
4402	PGMM pGMM;
4403	GMM_GET_VALID_INSTANCE(pGMM, VERR_GMM_INSTANCE);
4404	PGVM pGVM;
4405	int rc = GVMMR0ByVMAndEMT(pVM, idCpu, &pGVM);
4406	if (RT_FAILURE(rc))
4407	return rc;
4408
4409	Log(("GMMR0UnregisterSharedModule %s %s base=%RGv size %x\n", pszModuleName, pszVersion, GCBaseAddr, cbModule));
4410
4411	/*
4412	* Take the semaphore and do some more validations.
4413	*/
4414	gmmR0MutexAcquire(pGMM);
4415	if (GMM_CHECK_SANITY_UPON_ENTERING(pGMM))
4416	{
4417	PGMMSHAREDMODULEPERVM pRecVM = (PGMMSHAREDMODULEPERVM)RTAvlGCPtrGet(&pGVM->gmm.s.pSharedModuleTree, GCBaseAddr);
4418	if (pRecVM)
4419	{
4420	/* Remove reference to global shared module. */
4421	if (!pRecVM->fCollision)
4422	{
4423	PGMMSHAREDMODULE pRec = pRecVM->pGlobalModule;
4424	Assert(pRec);
4425
4426	if (pRec) /* paranoia */
4427	{
4428	Assert(pRec->cUsers);
4429	pRec->cUsers--;
4430	if (pRec->cUsers == 0)
4431	{
4432	/* Free the ranges, but leave the pages intact as there
4433	might still be references; they will be cleared by
4434	the COW mechanism. */
4435	for (uint32_t i = 0; i < pRec->cRegions; i++)
4436	{
4437	RTMemFree(pRec->aRegions[i].paidPages);
4438	pRec->aRegions[i].paidPages = NULL;
4439	}
4440
4441	Assert(pRec->Core.Key == GCBaseAddr || pRec->enmGuestOS == VBOXOSFAMILY_Windows64);
4442	Assert(pRec->cRegions == pRecVM->cRegions);
4443	#ifdef VBOX_STRICT
4444	for (uint32_t i = 0; i < pRecVM->cRegions; i++)
4445	{
4446	Assert(pRecVM->aRegions[i].GCRegionAddr == pRec->aRegions[i].GCRegionAddr);
4447	Assert(pRecVM->aRegions[i].cbRegion == pRec->aRegions[i].cbRegion);
4448	}
4449	#endif
4450
4451	/* Remove from the tree and free memory. */
4452	RTAvlGCPtrRemove(&pGMM->pGlobalSharedModuleTree, pRec->Core.Key);
4453	pGMM->cShareableModules--;
4454	RTMemFree(pRec);
4455	}
4456	}
4457	else
4458	rc = VERR_PGM_SHARED_MODULE_REGISTRATION_INCONSISTENCY;
4459	}
4460	else
4461	Assert(!pRecVM->pGlobalModule);
4462
4463	/* Remove from the tree and free memory. */
4464	RTAvlGCPtrRemove(&pGVM->gmm.s.pSharedModuleTree, GCBaseAddr);
4465	pGVM->gmm.s.Stats.cShareableModules--;
4466	RTMemFree(pRecVM);
4467	}
4468	else
4469	rc = VERR_PGM_SHARED_MODULE_NOT_FOUND;
4470
4471	GMM_CHECK_SANITY_UPON_LEAVING(pGMM);
4472	}
4473	else
4474	rc = VERR_GMM_IS_NOT_SANE;
4475
4476	gmmR0MutexRelease(pGMM);
4477	return rc;
4478	#else
4479
4480	NOREF(pVM); NOREF(idCpu); NOREF(pszModuleName); NOREF(pszVersion); NOREF(GCBaseAddr); NOREF(cbModule);
4481	return VERR_NOT_IMPLEMENTED;
4482	#endif
4483	}
4484
4485
4486	/**
4487	* VMMR0 request wrapper for GMMR0UnregisterSharedModule.
4488	*
4489	* @returns see GMMR0UnregisterSharedModule.
4490	* @param pVM Pointer to the shared VM structure.
4491	* @param idCpu VCPU id
4492	* @param pReq The request packet.
4493	*/
4494	GMMR0DECL(int) GMMR0UnregisterSharedModuleReq(PVM pVM, VMCPUID idCpu, PGMMUNREGISTERSHAREDMODULEREQ pReq)
4495	{
4496	/*
4497	* Validate input and pass it on.
4498	*/
4499	AssertPtrReturn(pVM, VERR_INVALID_POINTER);
4500	AssertPtrReturn(pReq, VERR_INVALID_POINTER);
4501	AssertMsgReturn(pReq->Hdr.cbReq == sizeof(*pReq), ("%#x != %#x\n", pReq->Hdr.cbReq, sizeof(*pReq)), VERR_INVALID_PARAMETER);
4502
4503	return GMMR0UnregisterSharedModule(pVM, idCpu, pReq->szName, pReq->szVersion, pReq->GCBaseAddr, pReq->cbModule);
4504	}
4505
4506	#ifdef VBOX_WITH_PAGE_SHARING
4507
4508	/**
4509	* Increases the use count of a shared page. The page is known to exist and be valid.
4510	*
4511	* @param pGMM Pointer to the GMM instance.
4512	* @param pGVM Pointer to the GVM instance.
4513	* @param pPage The page structure.
4514	*/
4515	DECLINLINE(void) gmmR0UseSharedPage(PGMM pGMM, PGVM pGVM, PGMMPAGE pPage)
4516	{
4517	Assert(pGMM->cSharedPages > 0);
4518	Assert(pGMM->cAllocatedPages > 0);
4519
4520	pGMM->cDuplicatePages++;
4521
4522	pPage->Shared.cRefs++;
4523	pGVM->gmm.s.Stats.cSharedPages++;
4524	pGVM->gmm.s.Stats.Allocated.cBasePages++;
4525	}
4526
4527
4528	/**
4529	* Converts a private page to a shared page. The page is known to exist and be valid.
4530	*
4531	* @param pGMM Pointer to the GMM instance.
4532	* @param pGVM Pointer to the GVM instance.
4533	* @param HCPhys Host physical address
4534	* @param idPage The Page ID
4535	* @param pPage The page structure.
4536	*/
4537	DECLINLINE(void) gmmR0ConvertToSharedPage(PGMM pGMM, PGVM pGVM, RTHCPHYS HCPhys, uint32_t idPage, PGMMPAGE pPage)
4538	{
4539	PGMMCHUNK pChunk = gmmR0GetChunk(pGMM, idPage >> GMM_CHUNKID_SHIFT);
4540	Assert(pChunk);
4541	Assert(pChunk->cFree < GMM_CHUNK_NUM_PAGES);
4542	Assert(GMM_PAGE_IS_PRIVATE(pPage));
4543
4544	pChunk->cPrivate--;
4545	pChunk->cShared++;
4546
4547	pGMM->cSharedPages++;
4548
4549	pGVM->gmm.s.Stats.cSharedPages++;
4550	pGVM->gmm.s.Stats.cPrivatePages--;
4551
4552	/* Modify the page structure. */
4553	pPage->Shared.pfn = (uint32_t)(uint64_t)(HCPhys >> PAGE_SHIFT);
4554	pPage->Shared.cRefs = 1;
4555	pPage->Common.u2State = GMM_PAGE_STATE_SHARED;
4556	}
4557
4558
4559	static int gmmR0SharedModuleCheckPageFirstTime(PGMM pGMM, PGVM pGVM, PGMMSHAREDMODULE pModule,
4560	unsigned idxRegion, unsigned idxPage,
4561	PGMMSHAREDPAGEDESC pPageDesc, PGMMSHAREDREGIONDESC pGlobalRegion)
4562	{
4563	/* Easy case: just change the internal page type. */
4564	PGMMPAGE pPage = gmmR0GetPage(pGMM, pPageDesc->idPage);
4565	AssertMsgReturn(pPage, ("idPage=%#x (GCPhys=%RGp HCPhys=%RHp idxRegion=%#x idxPage=%#x) #1\n",
4566	pPageDesc->idPage, pPageDesc->GCPhys, pPageDesc->HCPhys, idxRegion, idxPage),
4567	VERR_PGM_PHYS_INVALID_PAGE_ID);
4568
4569	AssertMsg(pPageDesc->GCPhys == (pPage->Private.pfn << 12), ("desc %RGp gmm %RGp\n", pPageDesc->HCPhys, (pPage->Private.pfn << 12)));
4570
4571	gmmR0ConvertToSharedPage(pGMM, pGVM, pPageDesc->HCPhys, pPageDesc->idPage, pPage);
4572
4573	/* Keep track of these references. */
4574	pGlobalRegion->paidPages[idxPage] = pPageDesc->idPage;
4575
4576	return VINF_SUCCESS;
4577	}
4578
4579	/**
4580	* Checks specified shared module range for changes
4581	*
4582	* Performs the following tasks:
4583	* - If a shared page is new, then it changes the GMM page type to shared and
4584	* returns it in the pPageDesc descriptor.
4585	* - If a shared page already exists, then it checks if the VM page is
4586	* identical and if so frees the VM page and returns the shared page in
4587	* pPageDesc descriptor.
4588	*
4589	* @remarks ASSUMES the caller has acquired the GMM semaphore!!
4590	*
4591	* @returns VBox status code.
4592	* @param pGMM Pointer to the GMM instance data.
4593	* @param pGVM Pointer to the GVM instance data.
4594	* @param pModule Module description
4595	* @param idxRegion Region index
4596	* @param idxPage Page index
4597	* @param pPageDesc Page descriptor
4598	*/
4599	GMMR0DECL(int) GMMR0SharedModuleCheckPage(PGVM pGVM, PGMMSHAREDMODULE pModule, unsigned idxRegion, unsigned idxPage,
4600	PGMMSHAREDPAGEDESC pPageDesc)
4601	{
4602	int rc;
4603	PGMM pGMM;
4604	GMM_GET_VALID_INSTANCE(pGMM, VERR_GMM_INSTANCE);
4605
4606	AssertMsgReturn(idxRegion < pModule->cRegions,
4607	("idxRegion=%#x cRegions=%#x %s %s\n", idxRegion, pModule->cRegions, pModule->szName, pModule->szVersion),
4608	VERR_INVALID_PARAMETER);
4609
4610	unsigned const cPages = pModule->aRegions[idxRegion].cbRegion >> PAGE_SHIFT;
4611	AssertMsgReturn(idxPage < cPages,
4612	("idxRegion=%#x cRegions=%#x %s %s\n", idxRegion, pModule->cRegions, pModule->szName, pModule->szVersion),
4613	VERR_INVALID_PARAMETER);
4614
4615	LogFlow(("GMMR0SharedModuleCheckPage %s base %RGv region %d idxPage %d\n", pModule->szName, pModule->Core.Key, idxRegion, idxPage));
4616
4617	/*
4618	* First time; create a page descriptor array.
4619	*/
4620	PGMMSHAREDREGIONDESC pGlobalRegion = &pModule->aRegions[idxRegion];
4621	if (!pGlobalRegion->paidPages)
4622	{
4623	Log(("Allocate page descriptor array for %d pages\n", cPages));
4624	pGlobalRegion->paidPages = (uint32_t *)RTMemAlloc(cPages * sizeof(pGlobalRegion->paidPages[0]));
4625	AssertReturn(pGlobalRegion->paidPages, VERR_NO_MEMORY);
4626
4627	/* Invalidate all descriptors. */
4628	for (unsigned i = 0; i < cPages; i++)
4629	pGlobalRegion->paidPages[i] = NIL_GMM_PAGEID;
4630	}
4631
4632	/*
4633	* We've seen this shared page for the first time?
4634	*/
4635	if (pGlobalRegion->paidPages[idxPage] == NIL_GMM_PAGEID)
4636	{
4637	Log(("New shared page guest %RGp host %RHp\n", pPageDesc->GCPhys, pPageDesc->HCPhys));
4638	return gmmR0SharedModuleCheckPageFirstTime(pGMM, pGVM, pModule, idxRegion, idxPage, pPageDesc, pGlobalRegion);
4639	}
4640
4641	/*
4642	* We've seen it before...
4643	*/
4644	Log(("Replace existing page guest %RGp host %RHp id %#x -> id %#x\n",
4645	pPageDesc->GCPhys, pPageDesc->HCPhys, pPageDesc->idPage, pGlobalRegion->paidPages[idxPage]));
4646	Assert(pPageDesc->idPage != pGlobalRegion->paidPages[idxPage]);
4647
4648	/*
4649	* Get the shared page source.
4650	*/
4651	PGMMPAGE pPage = gmmR0GetPage(pGMM, pGlobalRegion->paidPages[idxPage]);
4652	AssertMsgReturn(pPage, ("idPage=%#x (idxRegion=%#x idxPage=%#x) #2\n", pPageDesc->idPage, idxRegion, idxPage),
4653	VERR_PGM_PHYS_INVALID_PAGE_ID);
4654
4655	if (pPage->Common.u2State != GMM_PAGE_STATE_SHARED)
4656	{
4657	/*
4658	* Page was freed at some point; invalidate this entry.
4659	*/
4660	/** @todo this isn't really bullet proof. */
4661	Log(("Old shared page was freed -> create a new one\n"));
4662	pGlobalRegion->paidPages[idxPage] = NIL_GMM_PAGEID;
4663	return gmmR0SharedModuleCheckPageFirstTime(pGMM, pGVM, pModule, idxRegion, idxPage, pPageDesc, pGlobalRegion);
4664	}
4665
4666	Log(("Replace existing page guest host %RHp -> %RHp\n", pPageDesc->HCPhys, ((uint64_t)pPage->Shared.pfn) << PAGE_SHIFT));
4667
4668	/*
4669	* Calculate the virtual address of the local page.
4670	*/
4671	PGMMCHUNK pChunk = gmmR0GetChunk(pGMM, pPageDesc->idPage >> GMM_CHUNKID_SHIFT);
4672	AssertMsgReturn(pChunk, ("idPage=%#x (idxRegion=%#x idxPage=%#x) #4\n", pPageDesc->idPage, idxRegion, idxPage),
4673	VERR_PGM_PHYS_INVALID_PAGE_ID);
4674
4675	uint8_t *pbChunk;
4676	AssertMsgReturn(gmmR0IsChunkMapped(pGMM, pGVM, pChunk, (PRTR3PTR)&pbChunk),
4677	("idPage=%#x (idxRegion=%#x idxPage=%#x) #3\n", pPageDesc->idPage, idxRegion, idxPage),
4678	VERR_PGM_PHYS_INVALID_PAGE_ID);
4679	uint8_t *pbLocalPage = pbChunk + ((pPageDesc->idPage & GMM_PAGEID_IDX_MASK) << PAGE_SHIFT);
4680
4681	/*
4682	* Calculate the virtual address of the shared page.
4683	*/
4684	pChunk = gmmR0GetChunk(pGMM, pGlobalRegion->paidPages[idxPage] >> GMM_CHUNKID_SHIFT);
4685	Assert(pChunk); /* can't fail as gmmR0GetPage succeeded. */
4686
4687	/*
4688	* Get the virtual address of the physical page; map the chunk into the VM
4689	* process if not already done.
4690	*/
4691	if (!gmmR0IsChunkMapped(pGMM, pGVM, pChunk, (PRTR3PTR)&pbChunk))
4692	{
4693	Log(("Map chunk into process!\n"));
4694	rc = gmmR0MapChunk(pGMM, pGVM, pChunk, false /*fRelaxedSem*/, (PRTR3PTR)&pbChunk);
4695	AssertRCReturn(rc, rc);
4696	}
4697	uint8_t *pbSharedPage = pbChunk + ((pGlobalRegion->paidPages[idxPage] & GMM_PAGEID_IDX_MASK) << PAGE_SHIFT);
4698
4699	/** @todo write ASMMemComparePage. */
4700	if (memcmp(pbSharedPage, pbLocalPage, PAGE_SIZE))
4701	{
4702	Log(("Unexpected differences found between local and shared page; skip\n"));
4703	/* Signal to the caller that this one hasn't changed. */
4704	pPageDesc->idPage = NIL_GMM_PAGEID;
4705	return VINF_SUCCESS;
4706	}
4707
4708	/*
4709	* Free the old local page.
4710	*/
4711	GMMFREEPAGEDESC PageDesc;
4712	PageDesc.idPage = pPageDesc->idPage;
4713	rc = gmmR0FreePages(pGMM, pGVM, 1, &PageDesc, GMMACCOUNT_BASE);
4714	AssertRCReturn(rc, rc);
4715
4716	gmmR0UseSharedPage(pGMM, pGVM, pPage);
4717
4718	/*
4719	* Pass along the new physical address & page id.
4720	*/
4721	pPageDesc->HCPhys = ((uint64_t)pPage->Shared.pfn) << PAGE_SHIFT;
4722	pPageDesc->idPage = pGlobalRegion->paidPages[idxPage];
4723
4724	return VINF_SUCCESS;
4725	}
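
The page-ID arithmetic used in the function above (`idPage >> GMM_CHUNKID_SHIFT` to find the chunk, `idPage & GMM_PAGEID_IDX_MASK` then `<< PAGE_SHIFT` to find the byte offset inside the chunk mapping) can be sketched in isolation. This is a minimal sketch, not the GMM implementation: the constant values (4 KiB pages, 2 MiB chunks) are assumptions for illustration, and the `k`-prefixed names and helper functions are hypothetical.

```cpp
#include <cassert>
#include <cstdint>

// Assumed values for illustration only; not taken from the GMM headers.
constexpr uint32_t kPageShift    = 12;                       // 4 KiB pages
constexpr uint32_t kChunkShift   = 21;                       // 2 MiB chunks
constexpr uint32_t kChunkIdShift = kChunkShift - kPageShift; // bits for the page index
constexpr uint32_t kPageIdxMask  = (1u << kChunkIdShift) - 1;

// Compose a page ID from a chunk ID and a page index within the chunk.
constexpr uint32_t makePageId(uint32_t idChunk, uint32_t iPage)
{
    return (idChunk << kChunkIdShift) | (iPage & kPageIdxMask);
}

// Chunk that owns the page.
constexpr uint32_t chunkIdOf(uint32_t idPage)
{
    return idPage >> kChunkIdShift;
}

// Index of the page within its chunk.
constexpr uint32_t pageIndexOf(uint32_t idPage)
{
    return idPage & kPageIdxMask;
}

// Byte offset of the page from the start of its chunk mapping.
constexpr uint64_t pageOffsetInChunk(uint32_t idPage)
{
    return static_cast<uint64_t>(pageIndexOf(idPage)) << kPageShift;
}
```

Given a chunk base pointer `pbChunk`, the page address in this sketch would be `pbChunk + pageOffsetInChunk(idPage)`, which mirrors how `pbLocalPage` and `pbSharedPage` are derived above.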
4726
4727
4728	/**
4729	* RTAvlGCPtrDestroy callback.
4730	*
4731	* @returns 0 or VERR_GMM_INSTANCE.
4732	* @param pNode The node to destroy.
4733	* @param pvGVM The GVM handle.
4734	*/
4735	static DECLCALLBACK(int) gmmR0CleanupSharedModule(PAVLGCPTRNODECORE pNode, void *pvGVM)
4736	{
4737	PGMMSHAREDMODULEPERVM pRecVM = (PGMMSHAREDMODULEPERVM)pNode;
4738	NOREF(pvGVM);
4739
4740	Assert(pRecVM->pGlobalModule || pRecVM->fCollision);
4741	if (pRecVM->pGlobalModule)
4742	{
4743	PGMMSHAREDMODULE pRec = pRecVM->pGlobalModule;
4744	AssertPtr(pRec);
4745	Assert(pRec->cUsers);
4746
4747	Log(("gmmR0CleanupSharedModule: %s %s cUsers=%d\n", pRec->szName, pRec->szVersion, pRec->cUsers));
4748	pRec->cUsers--;
4749	if (pRec->cUsers == 0)
4750	{
4751	for (uint32_t i = 0; i < pRec->cRegions; i++)
4752	{
4753	RTMemFree(pRec->aRegions[i].paidPages);
4754	pRec->aRegions[i].paidPages = NULL;
4755	}
4756
4757	/* Remove from the tree and free memory. */
4758	PGMM pGMM;
4759	GMM_GET_VALID_INSTANCE(pGMM, VERR_GMM_INSTANCE);
4760	RTAvlGCPtrRemove(&pGMM->pGlobalSharedModuleTree, pRec->Core.Key);
4761	pGMM->cShareableModules--;
4762	RTMemFree(pRec);
4763	}
4764	}
4765	RTMemFree(pRecVM);
4766	return 0;
4767	}
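
The cleanup callback above follows a reference-counting pattern: each per-VM record drops one user from the global module, and the global record is removed from the tree and freed only when the last user goes away. A minimal sketch of that pattern, assuming a `std::map` as a stand-in for the AVL tree; `GlobalModule`, `GlobalRegistry` and `releaseModule` are hypothetical names, not GMM APIs.

```cpp
#include <cassert>
#include <cstdint>
#include <map>
#include <memory>
#include <string>

// Hypothetical stand-in for a global shared-module record.
struct GlobalModule
{
    std::string name;
    uint32_t    cUsers = 0; // number of VMs currently referencing the module
};

// Hypothetical registry keyed by module base address (stand-in for the AVL tree).
using GlobalRegistry = std::map<uint64_t, std::unique_ptr<GlobalModule>>;

// Drop one reference; erase the global record when the last user is gone.
// Returns true if the global record was destroyed by this call.
inline bool releaseModule(GlobalRegistry &registry, uint64_t key)
{
    auto it = registry.find(key);
    if (it == registry.end())
        return false;
    assert(it->second->cUsers > 0);
    if (--it->second->cUsers != 0)
        return false;
    registry.erase(it); // unique_ptr frees the record
    return true;
}
```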
4768
4769
4770	/**
4771	* Used by GMMR0CleanupVM to clean up shared modules.
4772	*
4773	* This is called without taking the GMM lock so that it can be yielded as
4774	* needed here.
4775	*
4776	* @param pGMM The GMM handle.
4777	* @param pGVM The global VM handle.
4778	*/
4779	static void gmmR0SharedModuleCleanup(PGMM pGMM, PGVM pGVM)
4780	{
4781	gmmR0MutexAcquire(pGMM);
4782	GMM_CHECK_SANITY_UPON_ENTERING(pGMM);
4783
4784	RTAvlGCPtrDestroy(&pGVM->gmm.s.pSharedModuleTree, gmmR0CleanupSharedModule, pGVM);
4785	pGVM->gmm.s.Stats.cShareableModules = 0;
4786
4787	gmmR0MutexRelease(pGMM);
4788	}
4789
4790	#endif /* VBOX_WITH_PAGE_SHARING */
4791
4792	/**
4793	* Removes all shared modules for the specified VM
4794	*
4795	* @returns VBox status code.
4796	* @param pVM VM handle
4797	* @param idCpu VCPU id
4798	*/
4799	GMMR0DECL(int) GMMR0ResetSharedModules(PVM pVM, VMCPUID idCpu)
4800	{
4801	#ifdef VBOX_WITH_PAGE_SHARING
4802	/*
4803	* Validate input and get the basics.
4804	*/
4805	PGMM pGMM;
4806	GMM_GET_VALID_INSTANCE(pGMM, VERR_GMM_INSTANCE);
4807	PGVM pGVM;
4808	int rc = GVMMR0ByVMAndEMT(pVM, idCpu, &pGVM);
4809	if (RT_FAILURE(rc))
4810	return rc;
4811
4812	/*
4813	* Take the semaphore and do some more validations.
4814	*/
4815	gmmR0MutexAcquire(pGMM);
4816	if (GMM_CHECK_SANITY_UPON_ENTERING(pGMM))
4817	{
4818	Log(("GMMR0ResetSharedModules\n"));
4819	RTAvlGCPtrDestroy(&pGVM->gmm.s.pSharedModuleTree, gmmR0CleanupSharedModule, pGVM);
4820	pGVM->gmm.s.Stats.cShareableModules = 0;
4821
4822	rc = VINF_SUCCESS;
4823	GMM_CHECK_SANITY_UPON_LEAVING(pGMM);
4824	}
4825	else
4826	rc = VERR_GMM_IS_NOT_SANE;
4827
4828	gmmR0MutexRelease(pGMM);
4829	return rc;
4830	#else
4831	NOREF(pVM); NOREF(idCpu);
4832	return VERR_NOT_IMPLEMENTED;
4833	#endif
4834	}
4835
4836	#ifdef VBOX_WITH_PAGE_SHARING
4837
4838	typedef struct
4839	{
4840	PGVM pGVM;
4841	VMCPUID idCpu;
4842	int rc;
4843	} GMMCHECKSHAREDMODULEINFO, *PGMMCHECKSHAREDMODULEINFO;
4844
4845	/**
4846	* Tree enumeration callback for checking a shared module.
4847	*/
4848	static DECLCALLBACK(int) gmmR0CheckSharedModule(PAVLGCPTRNODECORE pNode, void *pvUser)
4849	{
4850	PGMMCHECKSHAREDMODULEINFO pInfo = (PGMMCHECKSHAREDMODULEINFO)pvUser;
4851	PGMMSHAREDMODULEPERVM pLocalModule = (PGMMSHAREDMODULEPERVM)pNode;
4852	PGMMSHAREDMODULE pGlobalModule = pLocalModule->pGlobalModule;
4853
4854	if ( !pLocalModule->fCollision
4855	&& pGlobalModule)
4856	{
4857	Log(("gmmR0CheckSharedModule: check %s %s base=%RGv size=%x collision=%d\n", pGlobalModule->szName, pGlobalModule->szVersion, pGlobalModule->Core.Key, pGlobalModule->cbModule, pLocalModule->fCollision));
4858	pInfo->rc = PGMR0SharedModuleCheck(pInfo->pGVM->pVM, pInfo->pGVM, pInfo->idCpu, pGlobalModule,
4859	pLocalModule->cRegions, pLocalModule->aRegions);
4860	if (RT_FAILURE(pInfo->rc))
4861	return VINF_CALLBACK_RETURN;
4862	}
4863	return VINF_SUCCESS;
4864	}
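
The callback above records the failing status in the user structure and returns a non-zero value to stop the tree enumeration early. A minimal sketch of that contract, assuming a `std::map` in place of the AVL tree; `doWithAll`, `checkNode`, `CheckInfo` and the status constants are hypothetical names used only for illustration.

```cpp
#include <cassert>
#include <map>

// Hypothetical status values for the sketch.
constexpr int kOk = 0;
constexpr int kStopEnumeration = 1;

struct CheckInfo
{
    int rc = kOk;     // first failure seen, if any
    int cVisited = 0; // number of nodes visited
};

using NodeCallback = int (*)(int nodeValue, void *pvUser);

// Visit nodes in key order; stop as soon as the callback returns non-zero.
inline int doWithAll(const std::map<int, int> &tree, NodeCallback pfn, void *pvUser)
{
    for (const auto &entry : tree)
    {
        int rc = pfn(entry.second, pvUser);
        if (rc != 0)
            return rc;
    }
    return 0;
}

// Treat negative node values as failures: record the status and stop.
inline int checkNode(int nodeValue, void *pvUser)
{
    CheckInfo *pInfo = static_cast<CheckInfo *>(pvUser);
    pInfo->cVisited++;
    if (nodeValue < 0)
    {
        pInfo->rc = nodeValue;
        return kStopEnumeration;
    }
    return kOk;
}
```

Keeping the real status in the user structure lets the caller distinguish "enumeration stopped" from the reason it stopped, which is what `Info.rc` is used for in `GMMR0CheckSharedModules`.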
4865
4866	#endif /* VBOX_WITH_PAGE_SHARING */
4867	#ifdef DEBUG_sandervl
4868
4869	/**
4870	* Setup for a GMMR0CheckSharedModules call (to allow log flush jumps back to ring 3)
4871	*
4872	* @returns VBox status code.
4873	* @param pVM VM handle
4874	*/
4875	GMMR0DECL(int) GMMR0CheckSharedModulesStart(PVM pVM)
4876	{
4877	/*
4878	* Validate input and get the basics.
4879	*/
4880	PGMM pGMM;
4881	GMM_GET_VALID_INSTANCE(pGMM, VERR_GMM_INSTANCE);
4882
4883	/*
4884	* Take the semaphore and do some more validations.
4885	*/
4886	gmmR0MutexAcquire(pGMM);
4887	int rc;
4888	if (!GMM_CHECK_SANITY_UPON_ENTERING(pGMM))
4889	rc = VERR_GMM_IS_NOT_SANE;
4890	else
4891	rc = VINF_SUCCESS;
4892	return rc;
4893	}
4894
4895	/**
4896	* Clean up after a GMMR0CheckSharedModules call (to allow log flush jumps back to ring 3)
4897	*
4898	* @returns VBox status code.
4899	* @param pVM VM handle
4900	*/
4901	GMMR0DECL(int) GMMR0CheckSharedModulesEnd(PVM pVM)
4902	{
4903	/*
4904	* Validate input and get the basics.
4905	*/
4906	PGMM pGMM;
4907	GMM_GET_VALID_INSTANCE(pGMM, VERR_GMM_INSTANCE);
4908
4909	gmmR0MutexRelease(pGMM);
4910	return VINF_SUCCESS;
4911	}
4912
4913	#endif /* DEBUG_sandervl */
4914
4915	/**
4916	* Check all shared modules for the specified VM
4917	*
4918	* @returns VBox status code.
4919	* @param pVM VM handle
4920	* @param pVCpu VMCPU handle
4921	*/
4922	GMMR0DECL(int) GMMR0CheckSharedModules(PVM pVM, PVMCPU pVCpu)
4923	{
4924	#ifdef VBOX_WITH_PAGE_SHARING
4925	/*
4926	* Validate input and get the basics.
4927	*/
4928	PGMM pGMM;
4929	GMM_GET_VALID_INSTANCE(pGMM, VERR_GMM_INSTANCE);
4930	PGVM pGVM;
4931	int rc = GVMMR0ByVMAndEMT(pVM, pVCpu->idCpu, &pGVM);
4932	if (RT_FAILURE(rc))
4933	return rc;
4934
4935	# ifndef DEBUG_sandervl
4936	/*
4937	* Take the semaphore and do some more validations.
4938	*/
4939	gmmR0MutexAcquire(pGMM);
4940	# endif
4941	if (GMM_CHECK_SANITY_UPON_ENTERING(pGMM))
4942	{
4943	GMMCHECKSHAREDMODULEINFO Info;
4944
4945	Log(("GMMR0CheckSharedModules\n"));
4946	Info.pGVM = pGVM;
4947	Info.idCpu = pVCpu->idCpu;
4948	Info.rc = VINF_SUCCESS;
4949
4950	RTAvlGCPtrDoWithAll(&pGVM->gmm.s.pSharedModuleTree, true /* fFromLeft */, gmmR0CheckSharedModule, &Info);
4951
4952	rc = Info.rc;
4953
4954	Log(("GMMR0CheckSharedModules done!\n"));
4955
4956	GMM_CHECK_SANITY_UPON_LEAVING(pGMM);
4957	}
4958	else
4959	rc = VERR_GMM_IS_NOT_SANE;
4960
4961	# ifndef DEBUG_sandervl
4962	gmmR0MutexRelease(pGMM);
4963	# endif
4964	return rc;
4965	#else
4966	NOREF(pVM); NOREF(pVCpu);
4967	return VERR_NOT_IMPLEMENTED;
4968	#endif
4969	}
4970
4971	#if defined(VBOX_STRICT) && HC_ARCH_BITS == 64
4972
4973	typedef struct
4974	{
4975	PGVM pGVM;
4976	PGMM pGMM;
4977	uint8_t *pSourcePage;
4978	bool fFoundDuplicate;
4979	} GMMFINDDUPPAGEINFO, *PGMMFINDDUPPAGEINFO;
4980
4981	/**
4982	* RTAvlU32DoWithAll callback.
4983	*
4984	* @returns 0
4985	* @param pNode The node to search.
4986	* @param pvInfo Pointer to the input parameters
4987	*/
4988	static DECLCALLBACK(int) gmmR0FindDupPageInChunk(PAVLU32NODECORE pNode, void *pvInfo)
4989	{
4990	PGMMCHUNK pChunk = (PGMMCHUNK)pNode;
4991	PGMMFINDDUPPAGEINFO pInfo = (PGMMFINDDUPPAGEINFO)pvInfo;
4992	PGVM pGVM = pInfo->pGVM;
4993	PGMM pGMM = pInfo->pGMM;
4994	uint8_t *pbChunk;
4995
4996	/* Only take chunks not mapped into this VM process; not entirely correct. */
4997	if (!gmmR0IsChunkMapped(pGMM, pGVM, pChunk, (PRTR3PTR)&pbChunk))
4998	{
4999	int rc = gmmR0MapChunk(pGMM, pGVM, pChunk, false /*fRelaxedSem*/, (PRTR3PTR)&pbChunk);
5000	if (RT_SUCCESS(rc))
5001	{
5002	/*
5003	* Look for duplicate pages
5004	*/
5005	unsigned iPage = (GMM_CHUNK_SIZE >> PAGE_SHIFT);
5006	while (iPage-- > 0)
5007	{
5008	if (GMM_PAGE_IS_PRIVATE(&pChunk->aPages[iPage]))
5009	{
5010	uint8_t *pbDestPage = pbChunk + (iPage << PAGE_SHIFT);
5011
5012	if (!memcmp(pInfo->pSourcePage, pbDestPage, PAGE_SIZE))
5013	{
5014	pInfo->fFoundDuplicate = true;
5015	break;
5016	}
5017	}
5018	}
5019	gmmR0UnmapChunk(pGMM, pGVM, pChunk, false /*fRelaxedSem*/);
5020	}
5021	}
5022	return pInfo->fFoundDuplicate; /* (stops search if true) */
5023	}
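
The duplicate search above boils down to comparing one source page against every candidate page of a mapped chunk with `memcmp`. A minimal sketch of that scan, assuming a 4 KiB page size and a plain byte buffer in place of a mapped chunk; unlike the callback above, it scans upwards and does not filter on page state. `findDuplicatePage` and `kPageSize` are hypothetical names.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <cstring>
#include <vector>

constexpr std::size_t kPageSize = 4096; // assumed page size

// Returns the index of the first page in `chunk` that is byte-identical to
// `source`, or -1 if there is none. `chunk.size()` is expected to be a
// multiple of kPageSize; `source` must point to kPageSize readable bytes.
inline long findDuplicatePage(const std::vector<uint8_t> &chunk, const uint8_t *source)
{
    const std::size_t cPages = chunk.size() / kPageSize;
    for (std::size_t iPage = 0; iPage < cPages; iPage++)
        if (std::memcmp(source, chunk.data() + iPage * kPageSize, kPageSize) == 0)
            return static_cast<long>(iPage);
    return -1;
}
```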
5024
5025
5026	/**
5027	* Find a duplicate of the specified page in other active VMs
5028	*
5029	* @returns VBox status code.
5030	* @param pVM VM handle
5031	* @param pReq Request packet
5032	*/
5033	GMMR0DECL(int) GMMR0FindDuplicatePageReq(PVM pVM, PGMMFINDDUPLICATEPAGEREQ pReq)
5034	{
5035	/*
5036	* Validate input and pass it on.
5037	*/
5038	AssertPtrReturn(pVM, VERR_INVALID_POINTER);
5039	AssertPtrReturn(pReq, VERR_INVALID_POINTER);
5040	AssertMsgReturn(pReq->Hdr.cbReq == sizeof(*pReq), ("%#x != %#x\n", pReq->Hdr.cbReq, sizeof(*pReq)), VERR_INVALID_PARAMETER);
5041
5042	PGMM pGMM;
5043	GMM_GET_VALID_INSTANCE(pGMM, VERR_GMM_INSTANCE);
5044
5045	PGVM pGVM;
5046	int rc = GVMMR0ByVM(pVM, &pGVM);
5047	if (RT_FAILURE(rc))
5048	return rc;
5049
5050	/*
5051	* Take the semaphore and do some more validations.
5052	*/
5053	rc = gmmR0MutexAcquire(pGMM);
5054	if (GMM_CHECK_SANITY_UPON_ENTERING(pGMM))
5055	{
5056	uint8_t *pbChunk;
5057	PGMMCHUNK pChunk = gmmR0GetChunk(pGMM, pReq->idPage >> GMM_CHUNKID_SHIFT);
5058	if (pChunk)
5059	{
5060	if (gmmR0IsChunkMapped(pGMM, pGVM, pChunk, (PRTR3PTR)&pbChunk))
5061	{
5062	uint8_t *pbSourcePage = pbChunk + ((pReq->idPage & GMM_PAGEID_IDX_MASK) << PAGE_SHIFT);
5063	PGMMPAGE pPage = gmmR0GetPage(pGMM, pReq->idPage);
5064	if (pPage)
5065	{
5066	GMMFINDDUPPAGEINFO Info;
5067	Info.pGVM = pGVM;
5068	Info.pGMM = pGMM;
5069	Info.pSourcePage = pbSourcePage;
5070	Info.fFoundDuplicate = false;
5071	RTAvlU32DoWithAll(&pGMM->pChunks, true /* fFromLeft */, gmmR0FindDupPageInChunk, &Info);
5072
5073	pReq->fDuplicate = Info.fFoundDuplicate;
5074	}
5075	else
5076	{
5077	AssertFailed();
5078	rc = VERR_PGM_PHYS_INVALID_PAGE_ID;
5079	}
5080	}
5081	else
5082	AssertFailed();
5083	}
5084	else
5085	AssertFailed();
5086	}
5087	else
5088	rc = VERR_GMM_IS_NOT_SANE;
5089
5090	gmmR0MutexRelease(pGMM);
5091	return rc;
5092	}
5093
5094	#endif /* VBOX_STRICT && HC_ARCH_BITS == 64 */
5095
5096
5097	/**
5098	* Retrieves the GMM statistics visible to the caller.
5099	*
5100	* @returns VBox status code.
5101	*
5102	* @param pStats Where to put the statistics.
5103	* @param pSession The current session.
5104	* @param pVM The VM to obtain statistics for. Optional.
5105	*/
5106	GMMR0DECL(int) GMMR0QueryStatistics(PGMMSTATS pStats, PSUPDRVSESSION pSession, PVM pVM)
5107	{
5108	LogFlow(("GMMR0QueryStatistics: pStats=%p pSession=%p pVM=%p\n", pStats, pSession, pVM));
5109
5110	/*
5111	* Validate input.
5112	*/
5113	AssertPtrReturn(pSession, VERR_INVALID_POINTER);
5114	AssertPtrReturn(pStats, VERR_INVALID_POINTER);
5115	pStats->cMaxPages = 0; /* (crash before taking the mutex...) */
5116
5117	PGMM pGMM;
5118	GMM_GET_VALID_INSTANCE(pGMM, VERR_GMM_INSTANCE);
5119
5120	/*
5121	* Resolve the VM handle, if not NULL, and lock the GMM.
5122	*/
5123	int rc;
5124	PGVM pGVM;
5125	if (pVM)
5126	{
5127	rc = GVMMR0ByVM(pVM, &pGVM);
5128	if (RT_FAILURE(rc))
5129	return rc;
5130	}
5131	else
5132	pGVM = NULL;
5133
5134	rc = gmmR0MutexAcquire(pGMM);
5135	if (RT_FAILURE(rc))
5136	return rc;
5137
5138	/*
5139	* Copy out the GMM statistics.
5140	*/
5141	pStats->cMaxPages = pGMM->cMaxPages;
5142	pStats->cReservedPages = pGMM->cReservedPages;
5143	pStats->cOverCommittedPages = pGMM->cOverCommittedPages;
5144	pStats->cAllocatedPages = pGMM->cAllocatedPages;
5145	pStats->cSharedPages = pGMM->cSharedPages;
5146	pStats->cDuplicatePages = pGMM->cDuplicatePages;
5147	pStats->cLeftBehindSharedPages = pGMM->cLeftBehindSharedPages;
5148	pStats->cBalloonedPages = pGMM->cBalloonedPages;
5149	pStats->cChunks = pGMM->cChunks;
5150	pStats->cFreedChunks = pGMM->cFreedChunks;
5151	pStats->cShareableModules = pGMM->cShareableModules;
5152	RT_ZERO(pStats->au64Reserved);
5153
5154	/*
5155	* Copy out the VM statistics.
5156	*/
5157	if (pGVM)
5158	pStats->VMStats = pGVM->gmm.s.Stats;
5159	else
5160	RT_ZERO(pStats->VMStats);
5161
5162	gmmR0MutexRelease(pGMM);
5163	return rc;
5164	}
5165
5166
5167	/**
5168	* VMMR0 request wrapper for GMMR0QueryStatistics.
5169	*
5170	* @returns see GMMR0QueryStatistics.
5171	* @param pVM Pointer to the shared VM structure. Optional.
5172	* @param pReq The request packet.
5173	*/
5174	GMMR0DECL(int) GMMR0QueryStatisticsReq(PVM pVM, PGMMQUERYSTATISTICSSREQ pReq)
5175	{
5176	/*
5177	* Validate input and pass it on.
5178	*/
5179	AssertPtrReturn(pReq, VERR_INVALID_POINTER);
5180	AssertMsgReturn(pReq->Hdr.cbReq == sizeof(*pReq), ("%#x != %#x\n", pReq->Hdr.cbReq, sizeof(*pReq)), VERR_INVALID_PARAMETER);
5181
5182	return GMMR0QueryStatistics(&pReq->Stats, pReq->pSession, pVM);
5183	}
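
The request wrappers in this file validate the packet before passing it on: the pointer must be valid and the size stored in the header must equal the size of the request structure itself, not the size of a pointer to it. A minimal sketch of that check, assuming a simplified request layout; `SketchReqHdr`, `SketchStatsReq`, `validateStatsReq` and the status constants are hypothetical and do not match the real GMM structures.

```cpp
#include <cassert>
#include <cstdint>

// Hypothetical request layout; the real GMM request structures differ.
struct SketchReqHdr
{
    uint32_t cbReq; // size of the whole request, filled in by the caller
};

struct SketchStatsReq
{
    SketchReqHdr Hdr;
    uint64_t     au64Payload[4];
};

// Hypothetical status codes for the sketch.
constexpr int kSketchOk = 0;
constexpr int kSketchInvalidPointer = -1;
constexpr int kSketchInvalidParameter = -2;

// Reject null requests and requests whose header size does not match the
// structure size. Note the dereference in sizeof(*pReq).
inline int validateStatsReq(const SketchStatsReq *pReq)
{
    if (!pReq)
        return kSketchInvalidPointer;
    if (pReq->Hdr.cbReq != sizeof(*pReq))
        return kSketchInvalidParameter;
    return kSketchOk;
}
```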
5184
5185
5186	/**
5187	* Resets the specified GMM statistics.
5188	*
5189	* @returns VBox status code.
5190	*
5191	* @param pStats Which statistics to reset, that is, non-zero fields
5192	* indicate which to reset.
5193	* @param pSession The current session.
5194	* @param pVM The VM to reset statistics for. Optional.
5195	*/
5196	GMMR0DECL(int) GMMR0ResetStatistics(PCGMMSTATS pStats, PSUPDRVSESSION pSession, PVM pVM)
5197	{
5198	/* Nothing can be reset at the moment. */
5199	return VINF_SUCCESS;
5200	}
5201
5202
5203	/**
5204	* VMMR0 request wrapper for GMMR0ResetStatistics.
5205	*
5206	* @returns see GMMR0ResetStatistics.
5207	* @param pVM Pointer to the shared VM structure. Optional.
5208	* @param pReq The request packet.
5209	*/
5210	GMMR0DECL(int) GMMR0ResetStatisticsReq(PVM pVM, PGMMRESETSTATISTICSSREQ pReq)
5211	{
5212	/*
5213	* Validate input and pass it on.
5214	*/
5215	AssertPtrReturn(pReq, VERR_INVALID_POINTER);
5216	AssertMsgReturn(pReq->Hdr.cbReq == sizeof(*pReq), ("%#x != %#x\n", pReq->Hdr.cbReq, sizeof(*pReq)), VERR_INVALID_PARAMETER);
5217
5218	return GMMR0ResetStatistics(&pReq->Stats, pReq->pSession, pVM);
5219	}
5220


