VirtualBox

source: vbox/trunk/src/VBox/VMM/VMMR0/GMMR0.cpp@ 39436

Last change on this file since 39436 was 39402, checked in by vboxsync, 13 years ago

VMM: don't use generic IPE status codes, use specific ones. Part 1.

1/* $Id: GMMR0.cpp 39402 2011-11-23 16:25:04Z vboxsync $ */
2/** @file
3 * GMM - Global Memory Manager.
4 */
5
6/*
7 * Copyright (C) 2007-2011 Oracle Corporation
8 *
9 * This file is part of VirtualBox Open Source Edition (OSE), as
10 * available from http://www.virtualbox.org. This file is free software;
11 * you can redistribute it and/or modify it under the terms of the GNU
12 * General Public License (GPL) as published by the Free Software
13 * Foundation, in version 2 as it comes in the "COPYING" file of the
14 * VirtualBox OSE distribution. VirtualBox OSE is distributed in the
15 * hope that it will be useful, but WITHOUT ANY WARRANTY of any kind.
16 */
17
18
19/** @page pg_gmm GMM - The Global Memory Manager
20 *
21 * As the name indicates, this component is responsible for global memory
22 * management. Currently only guest RAM is allocated from the GMM, but this
23 * may change to include shadow page tables and other bits later.
24 *
25 * Guest RAM is managed as individual pages, but allocated from the host OS
26 * in chunks for reasons of portability / efficiency. To minimize the memory
27 * footprint all tracking structures must be as small as possible without
28 * unnecessary performance penalties.
29 *
30 * The allocation chunks have a fixed size, defined at compile time by
31 * the #GMM_CHUNK_SIZE \#define.
32 *
33 * Each chunk is given a unique ID. Each page also has a unique ID. The
34 * relationship between the two IDs is:
35 * @code
36 * GMM_CHUNK_SHIFT = log2(GMM_CHUNK_SIZE / PAGE_SIZE);
37 * idPage = (idChunk << GMM_CHUNK_SHIFT) | iPage;
38 * @endcode
39 * Where iPage is the index of the page within the chunk. This ID scheme
40 * permits efficient chunk and page lookup, but it relies on the chunk size
41 * being set at compile time. The chunks are organized in an AVL tree with their
42 * IDs being the keys.
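 *
 * The inverse mapping is just as cheap; a sketch of how the lookup code
 * further down (gmmR0GetChunk, gmmR0GetPage) takes a page ID apart, using the
 * GMM_CHUNKID_SHIFT / GMM_PAGEID_IDX_MASK constants that correspond to the
 * GMM_CHUNK_SHIFT described above:
 * @code
 * idChunk = idPage >> GMM_CHUNKID_SHIFT;    // chunk lookup key (AVL tree / TLB)
 * iPage   = idPage &  GMM_PAGEID_IDX_MASK;  // index into GMMCHUNK::aPages
 * pPage   = &pChunk->aPages[iPage];
 * @endcode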
43 *
44 * The physical address of each page in an allocation chunk is maintained by
45 * the #RTR0MEMOBJ and obtained using #RTR0MemObjGetPagePhysAddr. There is no
46 * need to duplicate this information (it would cost 8 bytes per page if we did).
47 *
48 * So what do we need to track per page? Most importantly we need to know
49 * which state the page is in:
50 * - Private - Allocated for (eventually) backing one particular VM page.
51 * - Shared - Readonly page that is used by one or more VMs and treated
52 * as COW by PGM.
53 * - Free - Not used by anyone.
54 *
55 * For the page replacement operations (sharing, defragmenting and freeing)
56 * to be somewhat efficient, private pages need to be associated with a
57 * particular page in a particular VM.
58 *
59 * Tracking the usage of shared pages is impractical and expensive, so we'll
60 * settle for a reference counting system instead.
61 *
62 * Free pages will be chained on LIFOs.
63 *
64 * On 64-bit systems we will use a 64-bit bitfield per page, while on 32-bit
65 * systems a 32-bit bitfield will have to suffice because of address space
66 * limitations. The #GMMPAGE structure shows the details.
67 *
68 *
69 * @section sec_gmm_alloc_strat Page Allocation Strategy
70 *
71 * The strategy for allocating pages has to take fragmentation and shared
72 * pages into account, or we may end up with 2000 chunks with only
73 * a few pages in each. Shared pages cannot easily be reallocated because
74 * of the inaccurate usage accounting (see above). Private pages can be
75 * reallocated by a defragmentation thread in the same manner that sharing
76 * is done.
77 *
78 * The first approach is to manage the free pages in two sets depending on
79 * whether they are mainly for the allocation of shared or private pages.
80 * In the initial implementation there will be almost no possibility for
81 * mixing shared and private pages in the same chunk (only if we're really
82 * stressed on memory), but when we implement forking of VMs and have to
83 * deal with lots of COW pages it'll start getting kind of interesting.
84 *
85 * The sets are lists of chunks with approximately the same number of
86 * free pages. Say the chunk size is 1MB, meaning 256 pages, and a set
87 * consists of 16 lists. So, the first list will contain the chunks with
88 * 1-15 free pages, the second covers 16-31, and so on. The chunks will be
89 * moved between the lists as pages are freed up or allocated.
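 *
 * A sketch of the list selection (the real thing is gmmR0SelectFreeSetList
 * further down; GMM_CHUNK_FREE_SET_SHIFT is assumed to be the log2 of the
 * per-list granularity):
 * @code
 * iList = cFree >> GMM_CHUNK_FREE_SET_SHIFT;  // chunks with similar cFree share a list
 * pChunk->pFreeNext = pSet->apLists[iList];   // link onto that list's head
 * @endcode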
90 *
91 *
92 * @section sec_gmm_costs Costs
93 *
94 * The per page cost in kernel space is 32 bits plus whatever RTR0MEMOBJ
95 * entails. In addition there is the chunk cost of approximately
96 * (sizeof(RTR0MEMOBJ) + sizeof(CHUNK)) / 2^CHUNK_SHIFT bytes per page.
97 *
98 * On Windows the per page #RTR0MEMOBJ cost is 32 bits on 32-bit Windows and
99 * 64 bits on 64-bit Windows (a PFN_NUMBER in the MDL), i.e. 64 bits per page.
100 * The cost on Linux is identical, but there it's because of sizeof(struct page *).
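 *
 * A rough worked example (illustrative figures, assuming the 2MB chunk size
 * used here and a 64-bit host where a GMMPAGE is 8 bytes):
 * @code
 * pages per chunk    = 2MB / 4KB                         = 512
 * sizeof(GMMCHUNK)  ~= 512 * 8 bytes + header           ~= 4KB
 * chunk cost/page   ~= (sizeof(RTR0MEMOBJ) + 4KB) / 512 ~= 8+ bytes
 * @endcode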
101 *
102 *
103 * @section sec_gmm_legacy Legacy Mode for Non-Tier-1 Platforms
104 *
105 * In legacy mode the page source is locked user pages rather than
106 * #RTR0MemObjAllocPhysNC, which means that a page can only be allocated
107 * by the VM that locked it. We will make no attempt at implementing
108 * page sharing on these systems, just do enough to make it all work.
109 *
110 *
111 * @subsection sub_gmm_locking Serializing
112 *
113 * One simple fast mutex will be employed in the initial implementation, not
114 * two as mentioned in @ref subsec_pgmPhys_Serializing.
115 *
116 * @see @ref subsec_pgmPhys_Serializing
117 *
118 *
119 * @section sec_gmm_overcommit Memory Over-Commitment Management
120 *
121 * The GVM will have to do the system-wide memory over-commitment
122 * management. My current ideas are:
123 * - Per-VM over-commitment policy that indicates how much to initially
124 * commit to it and what to do in an out-of-memory situation.
125 * - Prevent overtaxing the host.
126 *
127 * There are some challenges here; the main ones are configurability and
128 * security. Should we for instance permit anyone to request 100% memory
129 * commitment? Who should be allowed to do runtime adjustments of the
130 * config? And how do we prevent these settings from being lost when the last
131 * VM process exits? The solution is probably to have an optional root
132 * daemon that will keep VMMR0.r0 in memory and enable the security measures.
133 *
134 *
135 *
136 * @section sec_gmm_numa NUMA
137 *
138 * NUMA considerations will be designed and implemented a bit later.
139 *
140 * The preliminary guess is that we will have to try to allocate memory as
141 * close as possible to the CPUs the VM is executed on (EMT and additional CPU
142 * threads), which means it's mostly about allocation and sharing policies.
143 * Both the scheduler and the allocator interface will have to supply some NUMA
144 * info, and we'll need a way to calculate access costs.
145 *
146 */
147
148
149/*******************************************************************************
150* Header Files *
151*******************************************************************************/
152#define LOG_GROUP LOG_GROUP_GMM
153#include <VBox/rawpci.h>
154#include <VBox/vmm/vm.h>
155#include <VBox/vmm/gmm.h>
156#include "GMMR0Internal.h"
157#include <VBox/vmm/gvm.h>
158#include <VBox/vmm/pgm.h>
159#include <VBox/log.h>
160#include <VBox/param.h>
161#include <VBox/err.h>
162#include <iprt/asm.h>
163#include <iprt/avl.h>
164#include <iprt/list.h>
165#include <iprt/mem.h>
166#include <iprt/memobj.h>
167#include <iprt/mp.h>
168#include <iprt/semaphore.h>
169#include <iprt/string.h>
170#include <iprt/time.h>
171
172
173/*******************************************************************************
174* Structures and Typedefs *
175*******************************************************************************/
176/** Pointer to set of free chunks. */
177typedef struct GMMCHUNKFREESET *PGMMCHUNKFREESET;
178
179/**
180 * The per-page tracking structure employed by the GMM.
181 *
182 * On 32-bit hosts some trickery is necessary to compress all
183 * the information into 32 bits. When the fSharedFree member is set,
184 * the 30th bit decides whether it's a free page or not.
185 *
186 * Because of the different layout on 32-bit and 64-bit hosts, macros
187 * are used to get and set some of the data.
188 */
189typedef union GMMPAGE
190{
191#if HC_ARCH_BITS == 64
192 /** Unsigned integer view. */
193 uint64_t u;
194
195 /** The common view. */
196 struct GMMPAGECOMMON
197 {
198 uint32_t uStuff1 : 32;
199 uint32_t uStuff2 : 30;
200 /** The page state. */
201 uint32_t u2State : 2;
202 } Common;
203
204 /** The view of a private page. */
205 struct GMMPAGEPRIVATE
206 {
207 /** The guest page frame number. (Max addressable: 2 ^ 44 - 16) */
208 uint32_t pfn;
209 /** The GVM handle. (64K VMs) */
210 uint32_t hGVM : 16;
211 /** Reserved. */
212 uint32_t u16Reserved : 14;
213 /** The page state. */
214 uint32_t u2State : 2;
215 } Private;
216
217 /** The view of a shared page. */
218 struct GMMPAGESHARED
219 {
220 /** The host page frame number. (Max addressable: 2 ^ 44 - 16) */
221 uint32_t pfn;
222 /** The reference count (64K VMs). */
223 uint32_t cRefs : 16;
224 /** Reserved. Checksum or something? Two hGVMs for forking? */
225 uint32_t u14Reserved : 14;
226 /** The page state. */
227 uint32_t u2State : 2;
228 } Shared;
229
230 /** The view of a free page. */
231 struct GMMPAGEFREE
232 {
233 /** The index of the next page in the free list. UINT16_MAX is NIL. */
234 uint16_t iNext;
235 /** Reserved. Checksum or something? */
236 uint16_t u16Reserved0;
237 /** Reserved. Checksum or something? */
238 uint32_t u30Reserved1 : 30;
239 /** The page state. */
240 uint32_t u2State : 2;
241 } Free;
242
243#else /* 32-bit */
244 /** Unsigned integer view. */
245 uint32_t u;
246
247 /** The common view. */
248 struct GMMPAGECOMMON
249 {
250 uint32_t uStuff : 30;
251 /** The page state. */
252 uint32_t u2State : 2;
253 } Common;
254
255 /** The view of a private page. */
256 struct GMMPAGEPRIVATE
257 {
258 /** The guest page frame number. (Max addressable: 2 ^ 36) */
259 uint32_t pfn : 24;
260 /** The GVM handle. (127 VMs) */
261 uint32_t hGVM : 7;
262 /** The top page state bit, MBZ. */
263 uint32_t fZero : 1;
264 } Private;
265
266 /** The view of a shared page. */
267 struct GMMPAGESHARED
268 {
269 /** The reference count. */
270 uint32_t cRefs : 30;
271 /** The page state. */
272 uint32_t u2State : 2;
273 } Shared;
274
275 /** The view of a free page. */
276 struct GMMPAGEFREE
277 {
278 /** The index of the next page in the free list. UINT16_MAX is NIL. */
279 uint32_t iNext : 16;
280 /** Reserved. Checksum or something? */
281 uint32_t u14Reserved : 14;
282 /** The page state. */
283 uint32_t u2State : 2;
284 } Free;
285#endif
286} GMMPAGE;
287AssertCompileSize(GMMPAGE, sizeof(RTHCUINTPTR));
288/** Pointer to a GMMPAGE. */
289typedef GMMPAGE *PGMMPAGE;
290
291
292/** @name The Page States.
293 * @{ */
294/** A private page. */
295#define GMM_PAGE_STATE_PRIVATE 0
296/** A private page - alternative value used on the 32-bit implementation.
297 * This will never be used on 64-bit hosts. */
298#define GMM_PAGE_STATE_PRIVATE_32 1
299/** A shared page. */
300#define GMM_PAGE_STATE_SHARED 2
301/** A free page. */
302#define GMM_PAGE_STATE_FREE 3
303/** @} */
304
305
306/** @def GMM_PAGE_IS_PRIVATE
307 *
308 * @returns true if private, false if not.
309 * @param pPage The GMM page.
310 */
311#if HC_ARCH_BITS == 64
312# define GMM_PAGE_IS_PRIVATE(pPage) ( (pPage)->Common.u2State == GMM_PAGE_STATE_PRIVATE )
313#else
314# define GMM_PAGE_IS_PRIVATE(pPage) ( (pPage)->Private.fZero == 0 )
315#endif
316
317/** @def GMM_PAGE_IS_SHARED
318 *
319 * @returns true if shared, false if not.
320 * @param pPage The GMM page.
321 */
322#define GMM_PAGE_IS_SHARED(pPage) ( (pPage)->Common.u2State == GMM_PAGE_STATE_SHARED )
323
324/** @def GMM_PAGE_IS_FREE
325 *
326 * @returns true if free, false if not.
327 * @param pPage The GMM page.
328 */
329#define GMM_PAGE_IS_FREE(pPage) ( (pPage)->Common.u2State == GMM_PAGE_STATE_FREE )
330
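/*
 * Usage sketch (illustrative only): classifying a page found in a chunk,
 * similar to what gmmR0CleanupVMScanChunk does further down; pChunk, iPage
 * and the counters are assumed to be in scope.
 *
 *     PGMMPAGE pPage = &pChunk->aPages[iPage];
 *     if (GMM_PAGE_IS_PRIVATE(pPage))
 *         cPrivate++;
 *     else if (GMM_PAGE_IS_SHARED(pPage))
 *         cShared++;
 *     else
 *     {
 *         Assert(GMM_PAGE_IS_FREE(pPage));
 *         cFree++;
 *     }
 */
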
331/** @def GMM_PAGE_PFN_LAST
332 * The last valid guest pfn range.
333 * @remark Some of the values outside the range have special meaning,
334 * see GMM_PAGE_PFN_UNSHAREABLE.
335 */
336#if HC_ARCH_BITS == 64
337# define GMM_PAGE_PFN_LAST UINT32_C(0xfffffff0)
338#else
339# define GMM_PAGE_PFN_LAST UINT32_C(0x00fffff0)
340#endif
341AssertCompile(GMM_PAGE_PFN_LAST == (GMM_GCPHYS_LAST >> PAGE_SHIFT));
342
343/** @def GMM_PAGE_PFN_UNSHAREABLE
344 * Indicates that this page isn't used for normal guest memory and thus isn't shareable.
345 */
346#if HC_ARCH_BITS == 64
347# define GMM_PAGE_PFN_UNSHAREABLE UINT32_C(0xfffffff1)
348#else
349# define GMM_PAGE_PFN_UNSHAREABLE UINT32_C(0x00fffff1)
350#endif
351AssertCompile(GMM_PAGE_PFN_UNSHAREABLE == (GMM_GCPHYS_UNSHAREABLE >> PAGE_SHIFT));
352
353
354/**
355 * A GMM allocation chunk ring-3 mapping record.
356 *
357 * This should really be associated with a session and not a VM, but
358 * it's simpler to associate it with a VM and clean up when the VM object
359 * is destroyed.
360 */
361typedef struct GMMCHUNKMAP
362{
363 /** The mapping object. */
364 RTR0MEMOBJ hMapObj;
365 /** The VM owning the mapping. */
366 PGVM pGVM;
367} GMMCHUNKMAP;
368/** Pointer to a GMM allocation chunk mapping. */
369typedef struct GMMCHUNKMAP *PGMMCHUNKMAP;
370
371
372/**
373 * A GMM allocation chunk.
374 */
375typedef struct GMMCHUNK
376{
377 /** The AVL node core.
378 * The Key is the chunk ID. (Giant mtx.) */
379 AVLU32NODECORE Core;
380 /** The memory object.
381 * Either from RTR0MemObjAllocPhysNC or RTR0MemObjLockUser depending on
382 * what the host can dish up. (Chunk mtx protects mapping accesses
383 * and related frees.) */
384 RTR0MEMOBJ hMemObj;
385 /** Pointer to the next chunk in the free list. (Giant mtx.) */
386 PGMMCHUNK pFreeNext;
387 /** Pointer to the previous chunk in the free list. (Giant mtx.) */
388 PGMMCHUNK pFreePrev;
389 /** Pointer to the free set this chunk belongs to. NULL for
390 * chunks with no free pages. (Giant mtx.) */
391 PGMMCHUNKFREESET pSet;
392 /** List node in the chunk list (GMM::ChunkList). (Giant mtx.) */
393 RTLISTNODE ListNode;
394 /** Pointer to an array of mappings. (Chunk mtx.) */
395 PGMMCHUNKMAP paMappingsX;
396 /** The number of mappings. (Chunk mtx.) */
397 uint16_t cMappingsX;
398 * The mapping lock this chunk is using. UINT8_MAX if nobody is
399 * mapping or freeing anything. (Giant mtx.) */
400 uint8_t volatile iChunkMtx;
401 /** Flags field reserved for future use (like eliminating enmType).
402 * (Giant mtx.) */
403 uint8_t fFlags;
404 /** The head of the list of free pages. UINT16_MAX is the NIL value.
405 * (Giant mtx.) */
406 uint16_t iFreeHead;
407 /** The number of free pages. (Giant mtx.) */
408 uint16_t cFree;
409 /** The GVM handle of the VM that first allocated pages from this chunk, this
410 * is used as a preference when there are several chunks to choose from.
411 * When in bound memory mode this isn't a preference any longer. (Giant
412 * mtx.) */
413 uint16_t hGVM;
414 /** The ID of the NUMA node the memory mostly resides on. (Reserved for
415 * future use.) (Giant mtx.) */
416 uint16_t idNumaNode;
417 /** The number of private pages. (Giant mtx.) */
418 uint16_t cPrivate;
419 /** The number of shared pages. (Giant mtx.) */
420 uint16_t cShared;
421 /** The pages. (Giant mtx.) */
422 GMMPAGE aPages[GMM_CHUNK_SIZE >> PAGE_SHIFT];
423} GMMCHUNK;
424
425/** Indicates that the NUMA properties of the memory are unknown. */
426#define GMM_CHUNK_NUMA_ID_UNKNOWN UINT16_C(0xfffe)
427
428/** @name GMM_CHUNK_FLAGS_XXX - chunk flags.
429 * @{ */
430/** Indicates that the chunk is a large page (2MB). */
431#define GMM_CHUNK_FLAGS_LARGE_PAGE UINT16_C(0x0001)
432/** @} */
433
434
435/**
436 * An allocation chunk TLB entry.
437 */
438typedef struct GMMCHUNKTLBE
439{
440 /** The chunk id. */
441 uint32_t idChunk;
442 /** Pointer to the chunk. */
443 PGMMCHUNK pChunk;
444} GMMCHUNKTLBE;
445/** Pointer to an allocation chunk TLB entry. */
446typedef GMMCHUNKTLBE *PGMMCHUNKTLBE;
447
448
449/** The number of entries in the allocation chunk TLB. */
450#define GMM_CHUNKTLB_ENTRIES 32
451/** Gets the TLB entry index for the given Chunk ID. */
452#define GMM_CHUNKTLB_IDX(idChunk) ( (idChunk) & (GMM_CHUNKTLB_ENTRIES - 1) )
453
454/**
455 * An allocation chunk TLB.
456 */
457typedef struct GMMCHUNKTLB
458{
459 /** The TLB entries. */
460 GMMCHUNKTLBE aEntries[GMM_CHUNKTLB_ENTRIES];
461} GMMCHUNKTLB;
462/** Pointer to an allocation chunk TLB. */
463typedef GMMCHUNKTLB *PGMMCHUNKTLB;
464
465
466/**
467 * The GMM instance data.
468 */
469typedef struct GMM
470{
471 /** Magic / eye catcher. GMM_MAGIC */
472 uint32_t u32Magic;
473 /** The number of threads waiting on the mutex. */
474 uint32_t cMtxContenders;
475 /** The fast mutex protecting the GMM.
476 * More fine grained locking can be implemented later if necessary. */
477 RTSEMFASTMUTEX hMtx;
478#ifdef VBOX_STRICT
479 /** The current mutex owner. */
480 RTNATIVETHREAD hMtxOwner;
481#endif
482 /** The chunk tree. */
483 PAVLU32NODECORE pChunks;
484 /** The chunk TLB. */
485 GMMCHUNKTLB ChunkTLB;
486 /** The private free set. */
487 GMMCHUNKFREESET PrivateX;
488 /** The shared free set. */
489 GMMCHUNKFREESET Shared;
490
491 /** Shared module tree (global). */
492 /** @todo separate trees for distinctly different guest OSes. */
493 PAVLGCPTRNODECORE pGlobalSharedModuleTree;
494
495 /** The chunk list. For simplifying the cleanup process. */
496 RTLISTNODE ChunkList;
497
498 /** The maximum number of pages we're allowed to allocate.
499 * @gcfgm 64-bit GMM/MaxPages Direct.
500 * @gcfgm 32-bit GMM/PctPages Relative to the number of host pages. */
501 uint64_t cMaxPages;
502    /** The number of pages that have been reserved.
503 * The deal is that cReservedPages - cOverCommittedPages <= cMaxPages. */
504 uint64_t cReservedPages;
505 /** The number of pages that we have over-committed in reservations. */
506 uint64_t cOverCommittedPages;
507 /** The number of actually allocated (committed if you like) pages. */
508 uint64_t cAllocatedPages;
509 /** The number of pages that are shared. A subset of cAllocatedPages. */
510 uint64_t cSharedPages;
511 /** The number of pages that are actually shared between VMs. */
512 uint64_t cDuplicatePages;
513    /** The number of shared pages that have been left behind by
514 * VMs not doing proper cleanups. */
515 uint64_t cLeftBehindSharedPages;
516 /** The number of allocation chunks.
517 * (The number of pages we've allocated from the host can be derived from this.) */
518 uint32_t cChunks;
519 /** The number of current ballooned pages. */
520 uint64_t cBalloonedPages;
521
522 /** The legacy allocation mode indicator.
523 * This is determined at initialization time. */
524 bool fLegacyAllocationMode;
525 /** The bound memory mode indicator.
526 * When set, the memory will be bound to a specific VM and never
527 * shared. This is always set if fLegacyAllocationMode is set.
528 * (Also determined at initialization time.) */
529 bool fBoundMemoryMode;
530 /** The number of registered VMs. */
531 uint16_t cRegisteredVMs;
532
533    /** The number of freed chunks ever. This is used as a list generation to
534 * avoid restarting the cleanup scanning when the list wasn't modified. */
535 uint32_t cFreedChunks;
536    /** The previously allocated Chunk ID.
537 * Used as a hint to avoid scanning the whole bitmap. */
538 uint32_t idChunkPrev;
539 /** Chunk ID allocation bitmap.
540 * Bits of allocated IDs are set, free ones are clear.
541 * The NIL id (0) is marked allocated. */
542 uint32_t bmChunkId[(GMM_CHUNKID_LAST + 1 + 31) / 32];
543
544 /** The index of the next mutex to use. */
545 uint32_t iNextChunkMtx;
546 /** Chunk locks for reducing lock contention without having to allocate
547 * one lock per chunk. */
548 struct
549 {
550 /** The mutex */
551 RTSEMFASTMUTEX hMtx;
552 /** The number of threads currently using this mutex. */
553 uint32_t volatile cUsers;
554 } aChunkMtx[64];
555} GMM;
556/** Pointer to the GMM instance. */
557typedef GMM *PGMM;
558
559/** The value of GMM::u32Magic (Katsuhiro Otomo). */
560#define GMM_MAGIC UINT32_C(0x19540414)
561
562
563/**
564 * GMM chunk mutex state.
565 *
566 * This is returned by gmmR0ChunkMutexAcquire and is used by the other
567 * gmmR0ChunkMutex* methods.
568 */
569typedef struct GMMR0CHUNKMTXSTATE
570{
571 PGMM pGMM;
572 /** The index of the chunk mutex. */
573 uint8_t iChunkMtx;
574 /** The relevant flags (GMMR0CHUNK_MTX_XXX). */
575 uint8_t fFlags;
576} GMMR0CHUNKMTXSTATE;
577/** Pointer to a chunk mutex state. */
578typedef GMMR0CHUNKMTXSTATE *PGMMR0CHUNKMTXSTATE;
579
580/** @name GMMR0CHUNK_MTX_XXX
581 * @{ */
582#define GMMR0CHUNK_MTX_INVALID UINT32_C(0)
583#define GMMR0CHUNK_MTX_KEEP_GIANT UINT32_C(1)
584#define GMMR0CHUNK_MTX_RETAKE_GIANT UINT32_C(2)
585#define GMMR0CHUNK_MTX_DROP_GIANT UINT32_C(3)
586#define GMMR0CHUNK_MTX_END UINT32_C(4)
587/** @} */
588
589
590/*******************************************************************************
591* Global Variables *
592*******************************************************************************/
593/** Pointer to the GMM instance data. */
594static PGMM g_pGMM = NULL;
595
596/** Macro for obtaining and validating the g_pGMM pointer.
597 *
598 * On failure it will return from the invoking function with the specified
599 * return value.
600 *
601 * @param pGMM The name of the pGMM variable.
602 * @param rc The return value on failure. Use VERR_GMM_INSTANCE for VBox
603 * status codes.
604 */
605#define GMM_GET_VALID_INSTANCE(pGMM, rc) \
606 do { \
607 (pGMM) = g_pGMM; \
608 AssertPtrReturn((pGMM), (rc)); \
609 AssertMsgReturn((pGMM)->u32Magic == GMM_MAGIC, ("%p - %#x\n", (pGMM), (pGMM)->u32Magic), (rc)); \
610 } while (0)
611
612/** Macro for obtaining and validating the g_pGMM pointer, void function
613 * variant.
614 *
615 * On failure it will return from the invoking function.
616 *
617 * @param pGMM The name of the pGMM variable.
618 */
619#define GMM_GET_VALID_INSTANCE_VOID(pGMM) \
620 do { \
621 (pGMM) = g_pGMM; \
622 AssertPtrReturnVoid((pGMM)); \
623 AssertMsgReturnVoid((pGMM)->u32Magic == GMM_MAGIC, ("%p - %#x\n", (pGMM), (pGMM)->u32Magic)); \
624 } while (0)
625
626
627/** @def GMM_CHECK_SANITY_UPON_ENTERING
628 * Checks the sanity of the GMM instance data before making changes.
629 *
630 * This macro is a stub by default and must be enabled manually in the code.
631 *
632 * @returns true if sane, false if not.
633 * @param pGMM The name of the pGMM variable.
634 */
635#if defined(VBOX_STRICT) && defined(GMMR0_WITH_SANITY_CHECK) && 0
636# define GMM_CHECK_SANITY_UPON_ENTERING(pGMM) (gmmR0SanityCheck((pGMM), __PRETTY_FUNCTION__, __LINE__) == 0)
637#else
638# define GMM_CHECK_SANITY_UPON_ENTERING(pGMM) (true)
639#endif
640
641/** @def GMM_CHECK_SANITY_UPON_LEAVING
642 * Checks the sanity of the GMM instance data after making changes.
643 *
644 * This macro is a stub by default and must be enabled manually in the code.
645 *
646 * @returns true if sane, false if not.
647 * @param pGMM The name of the pGMM variable.
648 */
649#if defined(VBOX_STRICT) && defined(GMMR0_WITH_SANITY_CHECK) && 0
650# define GMM_CHECK_SANITY_UPON_LEAVING(pGMM) (gmmR0SanityCheck((pGMM), __PRETTY_FUNCTION__, __LINE__) == 0)
651#else
652# define GMM_CHECK_SANITY_UPON_LEAVING(pGMM) (true)
653#endif
654
655/** @def GMM_CHECK_SANITY_IN_LOOPS
656 * Checks the sanity of the GMM instance in the allocation loops.
657 *
658 * This macro is a stub by default and must be enabled manually in the code.
659 *
660 * @returns true if sane, false if not.
661 * @param pGMM The name of the pGMM variable.
662 */
663#if defined(VBOX_STRICT) && defined(GMMR0_WITH_SANITY_CHECK) && 0
664# define GMM_CHECK_SANITY_IN_LOOPS(pGMM) (gmmR0SanityCheck((pGMM), __PRETTY_FUNCTION__, __LINE__) == 0)
665#else
666# define GMM_CHECK_SANITY_IN_LOOPS(pGMM) (true)
667#endif
668
669
670/*******************************************************************************
671* Internal Functions *
672*******************************************************************************/
673static DECLCALLBACK(int) gmmR0TermDestroyChunk(PAVLU32NODECORE pNode, void *pvGMM);
674static bool gmmR0CleanupVMScanChunk(PGMM pGMM, PGVM pGVM, PGMMCHUNK pChunk);
675DECLINLINE(void) gmmR0UnlinkChunk(PGMMCHUNK pChunk);
676DECLINLINE(void) gmmR0LinkChunk(PGMMCHUNK pChunk, PGMMCHUNKFREESET pSet);
677DECLINLINE(void) gmmR0SelectSetAndLinkChunk(PGMM pGMM, PGVM pGVM, PGMMCHUNK pChunk);
678#ifdef GMMR0_WITH_SANITY_CHECK
679static uint32_t gmmR0SanityCheck(PGMM pGMM, const char *pszFunction, unsigned uLineNo);
680#endif
681static bool gmmR0FreeChunk(PGMM pGMM, PGVM pGVM, PGMMCHUNK pChunk, bool fRelaxedSem);
682DECLINLINE(void) gmmR0FreePrivatePage(PGMM pGMM, PGVM pGVM, uint32_t idPage, PGMMPAGE pPage);
683DECLINLINE(void) gmmR0FreeSharedPage(PGMM pGMM, PGVM pGVM, uint32_t idPage, PGMMPAGE pPage);
684static int gmmR0UnmapChunkLocked(PGMM pGMM, PGVM pGVM, PGMMCHUNK pChunk);
685#ifdef VBOX_WITH_PAGE_SHARING
686static void gmmR0SharedModuleCleanup(PGMM pGMM, PGVM pGVM);
687#endif
688
689
690
691/**
692 * Initializes the GMM component.
693 *
694 * This is called when the VMMR0.r0 module is loaded and protected by the
695 * loader semaphore.
696 *
697 * @returns VBox status code.
698 */
699GMMR0DECL(int) GMMR0Init(void)
700{
701 LogFlow(("GMMInit:\n"));
702
703 /*
704 * Allocate the instance data and the locks.
705 */
706 PGMM pGMM = (PGMM)RTMemAllocZ(sizeof(*pGMM));
707 if (!pGMM)
708 return VERR_NO_MEMORY;
709
710 pGMM->u32Magic = GMM_MAGIC;
711 for (unsigned i = 0; i < RT_ELEMENTS(pGMM->ChunkTLB.aEntries); i++)
712 pGMM->ChunkTLB.aEntries[i].idChunk = NIL_GMM_CHUNKID;
713 RTListInit(&pGMM->ChunkList);
714 ASMBitSet(&pGMM->bmChunkId[0], NIL_GMM_CHUNKID);
715
716 int rc = RTSemFastMutexCreate(&pGMM->hMtx);
717 if (RT_SUCCESS(rc))
718 {
719 unsigned iMtx;
720 for (iMtx = 0; iMtx < RT_ELEMENTS(pGMM->aChunkMtx); iMtx++)
721 {
722 rc = RTSemFastMutexCreate(&pGMM->aChunkMtx[iMtx].hMtx);
723 if (RT_FAILURE(rc))
724 break;
725 }
726 if (RT_SUCCESS(rc))
727 {
728 /*
729 * Check and see if RTR0MemObjAllocPhysNC works.
730 */
731#if 0 /* later, see #3170. */
732 RTR0MEMOBJ MemObj;
733 rc = RTR0MemObjAllocPhysNC(&MemObj, _64K, NIL_RTHCPHYS);
734 if (RT_SUCCESS(rc))
735 {
736 rc = RTR0MemObjFree(MemObj, true);
737 AssertRC(rc);
738 }
739 else if (rc == VERR_NOT_SUPPORTED)
740 pGMM->fLegacyAllocationMode = pGMM->fBoundMemoryMode = true;
741 else
742 SUPR0Printf("GMMR0Init: RTR0MemObjAllocPhysNC(,64K,Any) -> %d!\n", rc);
743#else
744# if defined(RT_OS_WINDOWS) || (defined(RT_OS_SOLARIS) && ARCH_BITS == 64) || defined(RT_OS_LINUX) || defined(RT_OS_FREEBSD)
745 pGMM->fLegacyAllocationMode = false;
746# if ARCH_BITS == 32
747 /* Don't reuse possibly partial chunks because of the virtual
748 address space limitation. */
749 pGMM->fBoundMemoryMode = true;
750# else
751 pGMM->fBoundMemoryMode = false;
752# endif
753# else
754 pGMM->fLegacyAllocationMode = true;
755 pGMM->fBoundMemoryMode = true;
756# endif
757#endif
758
759 /*
760 * Query system page count and guess a reasonable cMaxPages value.
761 */
762 pGMM->cMaxPages = UINT32_MAX; /** @todo IPRT function for query ram size and such. */
763
764 g_pGMM = pGMM;
765 LogFlow(("GMMInit: pGMM=%p fLegacyAllocationMode=%RTbool fBoundMemoryMode=%RTbool\n", pGMM, pGMM->fLegacyAllocationMode, pGMM->fBoundMemoryMode));
766 return VINF_SUCCESS;
767 }
768
769 /*
770 * Bail out.
771 */
772 while (iMtx-- > 0)
773 RTSemFastMutexDestroy(pGMM->aChunkMtx[iMtx].hMtx);
774 RTSemFastMutexDestroy(pGMM->hMtx);
775 }
776
777 pGMM->u32Magic = 0;
778 RTMemFree(pGMM);
779 SUPR0Printf("GMMR0Init: failed! rc=%d\n", rc);
780 return rc;
781}
782
783
784/**
785 * Terminates the GMM component.
786 */
787GMMR0DECL(void) GMMR0Term(void)
788{
789 LogFlow(("GMMTerm:\n"));
790
791 /*
792 * Take care / be paranoid...
793 */
794 PGMM pGMM = g_pGMM;
795 if (!VALID_PTR(pGMM))
796 return;
797 if (pGMM->u32Magic != GMM_MAGIC)
798 {
799 SUPR0Printf("GMMR0Term: u32Magic=%#x\n", pGMM->u32Magic);
800 return;
801 }
802
803 /*
804 * Undo what init did and free all the resources we've acquired.
805 */
806 /* Destroy the fundamentals. */
807 g_pGMM = NULL;
808 pGMM->u32Magic = ~GMM_MAGIC;
809 RTSemFastMutexDestroy(pGMM->hMtx);
810 pGMM->hMtx = NIL_RTSEMFASTMUTEX;
811
812 /* Free any chunks still hanging around. */
813 RTAvlU32Destroy(&pGMM->pChunks, gmmR0TermDestroyChunk, pGMM);
814
815 /* Destroy the chunk locks. */
816 for (unsigned iMtx = 0; iMtx < RT_ELEMENTS(pGMM->aChunkMtx); iMtx++)
817 {
818 Assert(pGMM->aChunkMtx[iMtx].cUsers == 0);
819 RTSemFastMutexDestroy(pGMM->aChunkMtx[iMtx].hMtx);
820 pGMM->aChunkMtx[iMtx].hMtx = NIL_RTSEMFASTMUTEX;
821 }
822
823 /* Finally the instance data itself. */
824 RTMemFree(pGMM);
825 LogFlow(("GMMTerm: done\n"));
826}
827
828
829/**
830 * RTAvlU32Destroy callback.
831 *
832 * @returns 0
833 * @param pNode The node to destroy.
834 * @param pvGMM The GMM handle.
835 */
836static DECLCALLBACK(int) gmmR0TermDestroyChunk(PAVLU32NODECORE pNode, void *pvGMM)
837{
838 PGMMCHUNK pChunk = (PGMMCHUNK)pNode;
839
840 if (pChunk->cFree != (GMM_CHUNK_SIZE >> PAGE_SHIFT))
841 SUPR0Printf("GMMR0Term: %p/%#x: cFree=%d cPrivate=%d cShared=%d cMappings=%d\n", pChunk,
842 pChunk->Core.Key, pChunk->cFree, pChunk->cPrivate, pChunk->cShared, pChunk->cMappingsX);
843
844 int rc = RTR0MemObjFree(pChunk->hMemObj, true /* fFreeMappings */);
845 if (RT_FAILURE(rc))
846 {
847 SUPR0Printf("GMMR0Term: %p/%#x: RTRMemObjFree(%p,true) -> %d (cMappings=%d)\n", pChunk,
848 pChunk->Core.Key, pChunk->hMemObj, rc, pChunk->cMappingsX);
849 AssertRC(rc);
850 }
851 pChunk->hMemObj = NIL_RTR0MEMOBJ;
852
853 RTMemFree(pChunk->paMappingsX);
854 pChunk->paMappingsX = NULL;
855
856 RTMemFree(pChunk);
857 NOREF(pvGMM);
858 return 0;
859}
860
861
862/**
863 * Initializes the per-VM data for the GMM.
864 *
865 * This is called from within the GVMM lock (from GVMMR0CreateVM)
866 * and should only initialize the data members so GMMR0CleanupVM
867 * can deal with them. We reserve no memory or anything here;
868 * that's done later in GMMR0InitVM.
869 *
870 * @param pGVM Pointer to the Global VM structure.
871 */
872GMMR0DECL(void) GMMR0InitPerVMData(PGVM pGVM)
873{
874 AssertCompile(RT_SIZEOFMEMB(GVM,gmm.s) <= RT_SIZEOFMEMB(GVM,gmm.padding));
875
876 pGVM->gmm.s.enmPolicy = GMMOCPOLICY_INVALID;
877 pGVM->gmm.s.enmPriority = GMMPRIORITY_INVALID;
878 pGVM->gmm.s.fMayAllocate = false;
879}
880
881
882/**
883 * Acquires the GMM giant lock.
884 *
885 * @returns Assert status code from RTSemFastMutexRequest.
886 * @param pGMM Pointer to the GMM instance.
887 */
888static int gmmR0MutexAcquire(PGMM pGMM)
889{
890 ASMAtomicIncU32(&pGMM->cMtxContenders);
891 int rc = RTSemFastMutexRequest(pGMM->hMtx);
892 ASMAtomicDecU32(&pGMM->cMtxContenders);
893 AssertRC(rc);
894#ifdef VBOX_STRICT
895 pGMM->hMtxOwner = RTThreadNativeSelf();
896#endif
897 return rc;
898}
899
900
901/**
902 * Releases the GMM giant lock.
903 *
904 * @returns Assert status code from RTSemFastMutexRelease.
905 * @param pGMM Pointer to the GMM instance.
906 */
907static int gmmR0MutexRelease(PGMM pGMM)
908{
909#ifdef VBOX_STRICT
910 pGMM->hMtxOwner = NIL_RTNATIVETHREAD;
911#endif
912 int rc = RTSemFastMutexRelease(pGMM->hMtx);
913 AssertRC(rc);
914 return rc;
915}
916
917
918/**
919 * Yields the GMM giant lock if there is contention and a certain minimum time
920 * has elapsed since we took it.
921 *
922 * @returns @c true if the mutex was yielded, @c false if not.
923 * @param pGMM Pointer to the GMM instance.
924 * @param puLockNanoTS Where the lock acquisition time stamp is kept
925 * (in/out).
926 */
927static bool gmmR0MutexYield(PGMM pGMM, uint64_t *puLockNanoTS)
928{
929 /*
930 * If nobody is contending the mutex, don't bother checking the time.
931 */
932 if (ASMAtomicReadU32(&pGMM->cMtxContenders) == 0)
933 return false;
934
935 /*
936 * Don't yield if we haven't executed for at least 2 milliseconds.
937 */
938 uint64_t uNanoNow = RTTimeSystemNanoTS();
939 if (uNanoNow - *puLockNanoTS < UINT32_C(2000000))
940 return false;
941
942 /*
943 * Yield the mutex.
944 */
945#ifdef VBOX_STRICT
946 pGMM->hMtxOwner = NIL_RTNATIVETHREAD;
947#endif
948 ASMAtomicIncU32(&pGMM->cMtxContenders);
949 int rc1 = RTSemFastMutexRelease(pGMM->hMtx); AssertRC(rc1);
950
951 RTThreadYield();
952
953 int rc2 = RTSemFastMutexRequest(pGMM->hMtx); AssertRC(rc2);
954 *puLockNanoTS = RTTimeSystemNanoTS();
955 ASMAtomicDecU32(&pGMM->cMtxContenders);
956#ifdef VBOX_STRICT
957 pGMM->hMtxOwner = RTThreadNativeSelf();
958#endif
959
960 return true;
961}
962
963
964/**
965 * Acquires a chunk lock.
966 *
967 * The caller must own the giant lock.
968 *
969 * @returns Assert status code from RTSemFastMutexRequest.
970 * @param pMtxState The chunk mutex state info. (Avoids
971 * passing the same flags and stuff around
972 * for subsequent release and drop-giant
973 * calls.)
974 * @param pGMM Pointer to the GMM instance.
975 * @param pChunk Pointer to the chunk.
976 * @param fFlags Flags regarding the giant lock, GMMR0CHUNK_MTX_XXX.
977 */
978static int gmmR0ChunkMutexAcquire(PGMMR0CHUNKMTXSTATE pMtxState, PGMM pGMM, PGMMCHUNK pChunk, uint32_t fFlags)
979{
980 Assert(fFlags > GMMR0CHUNK_MTX_INVALID && fFlags < GMMR0CHUNK_MTX_END);
981 Assert(pGMM->hMtxOwner == RTThreadNativeSelf());
982
983 pMtxState->pGMM = pGMM;
984 pMtxState->fFlags = (uint8_t)fFlags;
985
986 /*
987 * Get the lock index and reference the lock.
988 */
989 Assert(pGMM->hMtxOwner == RTThreadNativeSelf());
990 uint32_t iChunkMtx = pChunk->iChunkMtx;
991 if (iChunkMtx == UINT8_MAX)
992 {
993 iChunkMtx = pGMM->iNextChunkMtx++;
994 iChunkMtx %= RT_ELEMENTS(pGMM->aChunkMtx);
995
996 /* Try get an unused one... */
997 if (pGMM->aChunkMtx[iChunkMtx].cUsers)
998 {
999 iChunkMtx = pGMM->iNextChunkMtx++;
1000 iChunkMtx %= RT_ELEMENTS(pGMM->aChunkMtx);
1001 if (pGMM->aChunkMtx[iChunkMtx].cUsers)
1002 {
1003 iChunkMtx = pGMM->iNextChunkMtx++;
1004 iChunkMtx %= RT_ELEMENTS(pGMM->aChunkMtx);
1005 if (pGMM->aChunkMtx[iChunkMtx].cUsers)
1006 {
1007 iChunkMtx = pGMM->iNextChunkMtx++;
1008 iChunkMtx %= RT_ELEMENTS(pGMM->aChunkMtx);
1009 }
1010 }
1011 }
1012
1013 pChunk->iChunkMtx = iChunkMtx;
1014 }
1015 AssertCompile(RT_ELEMENTS(pGMM->aChunkMtx) < UINT8_MAX);
1016 pMtxState->iChunkMtx = (uint8_t)iChunkMtx;
1017 ASMAtomicIncU32(&pGMM->aChunkMtx[iChunkMtx].cUsers);
1018
1019 /*
1020 * Drop the giant?
1021 */
1022 if (fFlags != GMMR0CHUNK_MTX_KEEP_GIANT)
1023 {
1024 /** @todo GMM life cycle cleanup (we may race someone
1025 * destroying and cleaning up GMM)? */
1026 gmmR0MutexRelease(pGMM);
1027 }
1028
1029 /*
1030 * Take the chunk mutex.
1031 */
1032 int rc = RTSemFastMutexRequest(pGMM->aChunkMtx[iChunkMtx].hMtx);
1033 AssertRC(rc);
1034 return rc;
1035}
1036
1037
1038/**
1039 * Releases a chunk mutex acquired by gmmR0ChunkMutexAcquire.
1040 *
1041 * @returns Assert status code from RTSemFastMutexRelease.
1042 * @param pMtxState The chunk mutex state.
1043 * @param pChunk Pointer to the chunk if it's still
1044 * alive, NULL if it isn't. This is used to deassociate
1045 * the chunk from the mutex on the way out so a new one
1046 * can be selected next time, thus avoiding contented
1047 * mutexes.
1048 */
1049static int gmmR0ChunkMutexRelease(PGMMR0CHUNKMTXSTATE pMtxState, PGMMCHUNK pChunk)
1050{
1051 PGMM pGMM = pMtxState->pGMM;
1052
1053 /*
1054 * Release the chunk mutex and reacquire the giant if requested.
1055 */
1056 int rc = RTSemFastMutexRelease(pGMM->aChunkMtx[pMtxState->iChunkMtx].hMtx);
1057 AssertRC(rc);
1058 if (pMtxState->fFlags == GMMR0CHUNK_MTX_RETAKE_GIANT)
1059 rc = gmmR0MutexAcquire(pGMM);
1060 else
1061 Assert((pMtxState->fFlags != GMMR0CHUNK_MTX_DROP_GIANT) == (pGMM->hMtxOwner == RTThreadNativeSelf()));
1062
1063 /*
1064 * Drop the chunk mutex user reference and deassociate it from the chunk
1065 * when possible.
1066 */
1067 if ( ASMAtomicDecU32(&pGMM->aChunkMtx[pMtxState->iChunkMtx].cUsers) == 0
1068 && pChunk
1069 && RT_SUCCESS(rc) )
1070 {
1071 if (pMtxState->fFlags != GMMR0CHUNK_MTX_DROP_GIANT)
1072 pChunk->iChunkMtx = UINT8_MAX;
1073 else
1074 {
1075 rc = gmmR0MutexAcquire(pGMM);
1076 if (RT_SUCCESS(rc))
1077 {
1078 if (pGMM->aChunkMtx[pMtxState->iChunkMtx].cUsers == 0)
1079 pChunk->iChunkMtx = UINT8_MAX;
1080 rc = gmmR0MutexRelease(pGMM);
1081 }
1082 }
1083 }
1084
1085 pMtxState->pGMM = NULL;
1086 return rc;
1087}
1088
1089
1090/**
1091 * Drops the giant GMM lock we kept in gmmR0ChunkMutexAcquire while keeping the
1092 * chunk locked.
1093 *
1094 * This only works if gmmR0ChunkMutexAcquire was called with
1095 * GMMR0CHUNK_MTX_KEEP_GIANT. gmmR0ChunkMutexRelease will retake the giant
1096 * mutex, i.e. behave as if GMMR0CHUNK_MTX_RETAKE_GIANT was used.
1097 *
1098 * @returns VBox status code (assuming success is ok).
1099 * @param pMtxState Pointer to the chunk mutex state.
1100 */
1101static int gmmR0ChunkMutexDropGiant(PGMMR0CHUNKMTXSTATE pMtxState)
1102{
1103 AssertReturn(pMtxState->fFlags == GMMR0CHUNK_MTX_KEEP_GIANT, VERR_GMM_MTX_FLAGS);
1104 Assert(pMtxState->pGMM->hMtxOwner == RTThreadNativeSelf());
1105 pMtxState->fFlags = GMMR0CHUNK_MTX_RETAKE_GIANT;
1106 /** @todo GMM life cycle cleanup (we may race someone
1107 * destroying and cleaning up GMM)? */
1108 return gmmR0MutexRelease(pMtxState->pGMM);
1109}
1110
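/*
 * Typical locking pattern (informal sketch; gmmR0CleanupVMScanChunk below does
 * essentially this): take the chunk mutex while holding the giant lock, drop
 * the giant for the slow work, and let the release path retake it.
 *
 *     GMMR0CHUNKMTXSTATE MtxState;
 *     gmmR0ChunkMutexAcquire(&MtxState, pGMM, pChunk, GMMR0CHUNK_MTX_KEEP_GIANT);
 *     ...
 *     gmmR0ChunkMutexDropGiant(&MtxState);        // only the chunk mutex is held now
 *     ...                                         // e.g. RTR0MemObjFree of a mapping
 *     gmmR0ChunkMutexRelease(&MtxState, pChunk);  // retakes the giant (RETAKE_GIANT)
 */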
1111
1112/**
1113 * For experimenting with NUMA affinity and such.
1114 *
1115 * @returns The current NUMA Node ID.
1116 */
1117static uint16_t gmmR0GetCurrentNumaNodeId(void)
1118{
1119#if 1
1120 return GMM_CHUNK_NUMA_ID_UNKNOWN;
1121#else
1122 return RTMpCpuId() / 16;
1123#endif
1124}
1125
1126
1127
1128/**
1129 * Cleans up when a VM is terminating.
1130 *
1131 * @param pGVM Pointer to the Global VM structure.
1132 */
1133GMMR0DECL(void) GMMR0CleanupVM(PGVM pGVM)
1134{
1135 LogFlow(("GMMR0CleanupVM: pGVM=%p:{.pVM=%p, .hSelf=%#x}\n", pGVM, pGVM->pVM, pGVM->hSelf));
1136
1137 PGMM pGMM;
1138 GMM_GET_VALID_INSTANCE_VOID(pGMM);
1139
1140#ifdef VBOX_WITH_PAGE_SHARING
1141 /*
1142 * Clean up all registered shared modules first.
1143 */
1144 gmmR0SharedModuleCleanup(pGMM, pGVM);
1145#endif
1146
1147 gmmR0MutexAcquire(pGMM);
1148 uint64_t uLockNanoTS = RTTimeSystemNanoTS();
1149 GMM_CHECK_SANITY_UPON_ENTERING(pGMM);
1150
1151 /*
1152 * The policy is 'INVALID' until the initial reservation
1153 * request has been serviced.
1154 */
1155 if ( pGVM->gmm.s.enmPolicy > GMMOCPOLICY_INVALID
1156 && pGVM->gmm.s.enmPolicy < GMMOCPOLICY_END)
1157 {
1158 /*
1159 * If it's the last VM around, we can skip walking all the chunks looking
1160 * for the pages owned by this VM and instead flush the whole shebang.
1161 *
1162 * This takes care of the eventuality that a VM has left shared page
1163 * references behind (shouldn't happen of course, but you never know).
1164 */
1165 Assert(pGMM->cRegisteredVMs);
1166 pGMM->cRegisteredVMs--;
1167
1168 /*
1169 * Walk the entire pool looking for pages that belong to this VM
1170 * and leftover mappings. (This'll only catch private pages,
1171 * shared pages will be 'left behind'.)
1172 */
1173 uint64_t cPrivatePages = pGVM->gmm.s.cPrivatePages; /* save */
1174
1175 unsigned iCountDown = 64;
1176 bool fRedoFromStart;
1177 PGMMCHUNK pChunk;
1178 do
1179 {
1180 fRedoFromStart = false;
1181 RTListForEachReverse(&pGMM->ChunkList, pChunk, GMMCHUNK, ListNode)
1182 {
1183 uint32_t const cFreeChunksOld = pGMM->cFreedChunks;
1184 if (gmmR0CleanupVMScanChunk(pGMM, pGVM, pChunk))
1185 {
1186 /* We left the giant mutex, so reset the yield counters. */
1187 uLockNanoTS = RTTimeSystemNanoTS();
1188 iCountDown = 64;
1189 }
1190 else
1191 {
1192 /* Didn't leave it, so do normal yielding. */
1193 if (!iCountDown)
1194 gmmR0MutexYield(pGMM, &uLockNanoTS);
1195 else
1196 iCountDown--;
1197 }
1198 if (pGMM->cFreedChunks != cFreeChunksOld)
1199 break;
1200 }
1201 } while (fRedoFromStart);
1202
1203 if (pGVM->gmm.s.cPrivatePages)
1204 SUPR0Printf("GMMR0CleanupVM: hGVM=%#x has %#x private pages that cannot be found!\n", pGVM->hSelf, pGVM->gmm.s.cPrivatePages);
1205
1206 pGMM->cAllocatedPages -= cPrivatePages;
1207
1208 /*
1209 * Free empty chunks.
1210 */
1211 PGMMCHUNKFREESET pPrivateSet = pGMM->fBoundMemoryMode ? &pGVM->gmm.s.Private : &pGMM->PrivateX;
1212 do
1213 {
1214 fRedoFromStart = false;
1215 iCountDown = 10240;
1216 pChunk = pPrivateSet->apLists[GMM_CHUNK_FREE_SET_UNUSED_LIST];
1217 while (pChunk)
1218 {
1219 PGMMCHUNK pNext = pChunk->pFreeNext;
1220 Assert(pChunk->cFree == GMM_CHUNK_NUM_PAGES);
1221 if ( !pGMM->fBoundMemoryMode
1222 || pChunk->hGVM == pGVM->hSelf)
1223 {
1224 uint64_t const idGenerationOld = pPrivateSet->idGeneration;
1225 if (gmmR0FreeChunk(pGMM, pGVM, pChunk, true /*fRelaxedSem*/))
1226 {
1227 /* We've left the giant mutex, restart? (+1 for our unlink) */
1228 fRedoFromStart = pPrivateSet->idGeneration != idGenerationOld + 1;
1229 if (fRedoFromStart)
1230 break;
1231 uLockNanoTS = RTTimeSystemNanoTS();
1232 iCountDown = 10240;
1233 }
1234 }
1235
1236 /* Advance and maybe yield the lock. */
1237 pChunk = pNext;
1238 if (--iCountDown == 0)
1239 {
1240 uint64_t const idGenerationOld = pPrivateSet->idGeneration;
1241 fRedoFromStart = gmmR0MutexYield(pGMM, &uLockNanoTS)
1242 && pPrivateSet->idGeneration != idGenerationOld;
1243 if (fRedoFromStart)
1244 break;
1245 iCountDown = 10240;
1246 }
1247 }
1248 } while (fRedoFromStart);
1249
1250 /*
1251 * Account for shared pages that weren't freed.
1252 */
1253 if (pGVM->gmm.s.cSharedPages)
1254 {
1255 Assert(pGMM->cSharedPages >= pGVM->gmm.s.cSharedPages);
1256 SUPR0Printf("GMMR0CleanupVM: hGVM=%#x left %#x shared pages behind!\n", pGVM->hSelf, pGVM->gmm.s.cSharedPages);
1257 pGMM->cLeftBehindSharedPages += pGVM->gmm.s.cSharedPages;
1258 }
1259
1260 /*
1261 * Clean up balloon statistics in case the VM process crashed.
1262 */
1263 Assert(pGMM->cBalloonedPages >= pGVM->gmm.s.cBalloonedPages);
1264 pGMM->cBalloonedPages -= pGVM->gmm.s.cBalloonedPages;
1265
1266 /*
1267 * Update the over-commitment management statistics.
1268 */
1269 pGMM->cReservedPages -= pGVM->gmm.s.Reserved.cBasePages
1270 + pGVM->gmm.s.Reserved.cFixedPages
1271 + pGVM->gmm.s.Reserved.cShadowPages;
1272 switch (pGVM->gmm.s.enmPolicy)
1273 {
1274 case GMMOCPOLICY_NO_OC:
1275 break;
1276 default:
1277 /** @todo Update GMM->cOverCommittedPages */
1278 break;
1279 }
1280 }
1281
1282 /* zap the GVM data. */
1283 pGVM->gmm.s.enmPolicy = GMMOCPOLICY_INVALID;
1284 pGVM->gmm.s.enmPriority = GMMPRIORITY_INVALID;
1285 pGVM->gmm.s.fMayAllocate = false;
1286
1287 GMM_CHECK_SANITY_UPON_LEAVING(pGMM);
1288 gmmR0MutexRelease(pGMM);
1289
1290 LogFlow(("GMMR0CleanupVM: returns\n"));
1291}
1292
1293
1294/**
1295 * Scan one chunk for private pages belonging to the specified VM.
1296 *
1297 * @note This function may drop the giant mutex!
1298 *
1299 * @returns @c true if we've temporarily dropped the giant mutex, @c false if
1300 * we didn't.
1301 * @param pGMM Pointer to the GMM instance.
1302 * @param pGVM The global VM handle.
1303 * @param pChunk The chunk to scan.
1304 */
1305static bool gmmR0CleanupVMScanChunk(PGMM pGMM, PGVM pGVM, PGMMCHUNK pChunk)
1306{
1307 /*
1308 * Look for pages belonging to the VM.
1309 * (Perform some internal checks while we're scanning.)
1310 */
1311#ifndef VBOX_STRICT
1312 if (pChunk->cFree != (GMM_CHUNK_SIZE >> PAGE_SHIFT))
1313#endif
1314 {
1315 unsigned cPrivate = 0;
1316 unsigned cShared = 0;
1317 unsigned cFree = 0;
1318
1319 gmmR0UnlinkChunk(pChunk); /* avoiding cFreePages updates. */
1320
1321 uint16_t hGVM = pGVM->hSelf;
1322 unsigned iPage = (GMM_CHUNK_SIZE >> PAGE_SHIFT);
1323 while (iPage-- > 0)
1324 if (GMM_PAGE_IS_PRIVATE(&pChunk->aPages[iPage]))
1325 {
1326 if (pChunk->aPages[iPage].Private.hGVM == hGVM)
1327 {
1328 /*
1329 * Free the page.
1330 *
1331 * The reason for not using gmmR0FreePrivatePage here is that we
1332 * must *not* cause the chunk to be freed from under us - we're in
1333 * an AVL tree walk here.
1334 */
1335 pChunk->aPages[iPage].u = 0;
1336 pChunk->aPages[iPage].Free.iNext = pChunk->iFreeHead;
1337 pChunk->aPages[iPage].Free.u2State = GMM_PAGE_STATE_FREE;
1338 pChunk->iFreeHead = iPage;
1339 pChunk->cPrivate--;
1340 pChunk->cFree++;
1341 pGVM->gmm.s.cPrivatePages--;
1342 cFree++;
1343 }
1344 else
1345 cPrivate++;
1346 }
1347 else if (GMM_PAGE_IS_FREE(&pChunk->aPages[iPage]))
1348 cFree++;
1349 else
1350 cShared++;
1351
1352 gmmR0SelectSetAndLinkChunk(pGMM, pGVM, pChunk);
1353
1354 /*
1355 * Did it add up?
1356 */
1357 if (RT_UNLIKELY( pChunk->cFree != cFree
1358 || pChunk->cPrivate != cPrivate
1359 || pChunk->cShared != cShared))
1360 {
1361 SUPR0Printf("gmmR0CleanupVMScanChunk: Chunk %p/%#x has bogus stats - free=%d/%d private=%d/%d shared=%d/%d\n",
1362                     pChunk, pChunk->Core.Key, pChunk->cFree, cFree, pChunk->cPrivate, cPrivate, pChunk->cShared, cShared);
1363 pChunk->cFree = cFree;
1364 pChunk->cPrivate = cPrivate;
1365 pChunk->cShared = cShared;
1366 }
1367 }
1368
1369 /*
1370 * If not in bound memory mode, we should reset the hGVM field
1371 * if it has our handle in it.
1372 */
1373 if (pChunk->hGVM == pGVM->hSelf)
1374 {
1375 if (!g_pGMM->fBoundMemoryMode)
1376 pChunk->hGVM = NIL_GVM_HANDLE;
1377 else if (pChunk->cFree != GMM_CHUNK_NUM_PAGES)
1378 {
1379 SUPR0Printf("gmmR0CleanupVMScanChunk: %p/%#x: cFree=%#x - it should be 0 in bound mode!\n",
1380 pChunk, pChunk->Core.Key, pChunk->cFree);
1381 AssertMsgFailed(("%p/%#x: cFree=%#x - it should be 0 in bound mode!\n", pChunk, pChunk->Core.Key, pChunk->cFree));
1382
1383 gmmR0UnlinkChunk(pChunk);
1384 pChunk->cFree = GMM_CHUNK_NUM_PAGES;
1385 gmmR0SelectSetAndLinkChunk(pGMM, pGVM, pChunk);
1386 }
1387 }
1388
1389 /*
1390 * Look for a mapping belonging to the terminating VM.
1391 */
1392 GMMR0CHUNKMTXSTATE MtxState;
1393 gmmR0ChunkMutexAcquire(&MtxState, pGMM, pChunk, GMMR0CHUNK_MTX_KEEP_GIANT);
1394 unsigned cMappings = pChunk->cMappingsX;
1395 for (unsigned i = 0; i < cMappings; i++)
1396 if (pChunk->paMappingsX[i].pGVM == pGVM)
1397 {
1398 gmmR0ChunkMutexDropGiant(&MtxState);
1399
1400 RTR0MEMOBJ hMemObj = pChunk->paMappingsX[i].hMapObj;
1401
1402 cMappings--;
1403 if (i < cMappings)
1404 pChunk->paMappingsX[i] = pChunk->paMappingsX[cMappings];
1405 pChunk->paMappingsX[cMappings].pGVM = NULL;
1406 pChunk->paMappingsX[cMappings].hMapObj = NIL_RTR0MEMOBJ;
1407 Assert(pChunk->cMappingsX - 1U == cMappings);
1408 pChunk->cMappingsX = cMappings;
1409
1410 int rc = RTR0MemObjFree(hMemObj, false /* fFreeMappings (NA) */);
1411 if (RT_FAILURE(rc))
1412 {
1413 SUPR0Printf("gmmR0CleanupVMScanChunk: %p/%#x: mapping #%x: RTRMemObjFree(%p,false) -> %d \n",
1414 pChunk, pChunk->Core.Key, i, hMemObj, rc);
1415 AssertRC(rc);
1416 }
1417
1418 gmmR0ChunkMutexRelease(&MtxState, pChunk);
1419 return true;
1420 }
1421
1422 gmmR0ChunkMutexRelease(&MtxState, pChunk);
1423 return false;
1424}
1425
1426
1427/**
1428 * The initial resource reservations.
1429 *
1430 * This will make memory reservations according to policy and priority. If there aren't
1431 * sufficient resources available to sustain the VM this function will fail and all
1432 * future allocations requests will fail as well.
1433 *
1434 * These are just the initial reservations made very early during the VM creation
1435 * process and will be adjusted later in the GMMR0UpdateReservation call after the
1436 * ring-3 init has completed.
1437 *
1438 * @returns VBox status code.
1439 * @retval VERR_GMM_MEMORY_RESERVATION_DECLINED
1440 * @retval VERR_GMM_
1441 *
1442 * @param pVM Pointer to the shared VM structure.
1443 * @param idCpu VCPU id
1444 * @param cBasePages The number of pages that may be allocated for the base RAM and ROMs.
1445 * This does not include MMIO2 and similar.
1446 * @param cShadowPages The number of pages that may be allocated for shadow paging structures.
1447 * @param cFixedPages The number of pages that may be allocated for fixed objects like the
1448 * hyper heap, MMIO2 and similar.
1449 * @param enmPolicy The OC policy to use on this VM.
1450 * @param enmPriority The priority in an out-of-memory situation.
1451 *
1452 * @thread The creator thread / EMT.
1453 */
1454GMMR0DECL(int) GMMR0InitialReservation(PVM pVM, VMCPUID idCpu, uint64_t cBasePages, uint32_t cShadowPages, uint32_t cFixedPages,
1455 GMMOCPOLICY enmPolicy, GMMPRIORITY enmPriority)
1456{
1457 LogFlow(("GMMR0InitialReservation: pVM=%p cBasePages=%#llx cShadowPages=%#x cFixedPages=%#x enmPolicy=%d enmPriority=%d\n",
1458 pVM, cBasePages, cShadowPages, cFixedPages, enmPolicy, enmPriority));
1459
1460 /*
1461 * Validate, get basics and take the semaphore.
1462 */
1463 PGMM pGMM;
1464 GMM_GET_VALID_INSTANCE(pGMM, VERR_GMM_INSTANCE);
1465 PGVM pGVM;
1466 int rc = GVMMR0ByVMAndEMT(pVM, idCpu, &pGVM);
1467 if (RT_FAILURE(rc))
1468 return rc;
1469
1470 AssertReturn(cBasePages, VERR_INVALID_PARAMETER);
1471 AssertReturn(cShadowPages, VERR_INVALID_PARAMETER);
1472 AssertReturn(cFixedPages, VERR_INVALID_PARAMETER);
1473 AssertReturn(enmPolicy > GMMOCPOLICY_INVALID && enmPolicy < GMMOCPOLICY_END, VERR_INVALID_PARAMETER);
1474 AssertReturn(enmPriority > GMMPRIORITY_INVALID && enmPriority < GMMPRIORITY_END, VERR_INVALID_PARAMETER);
1475
1476 gmmR0MutexAcquire(pGMM);
1477 if (GMM_CHECK_SANITY_UPON_ENTERING(pGMM))
1478 {
1479 if ( !pGVM->gmm.s.Reserved.cBasePages
1480 && !pGVM->gmm.s.Reserved.cFixedPages
1481 && !pGVM->gmm.s.Reserved.cShadowPages)
1482 {
1483 /*
1484 * Check if we can accommodate this.
1485 */
1486 /* ... later ... */
1487 if (RT_SUCCESS(rc))
1488 {
1489 /*
1490 * Update the records.
1491 */
1492 pGVM->gmm.s.Reserved.cBasePages = cBasePages;
1493 pGVM->gmm.s.Reserved.cFixedPages = cFixedPages;
1494 pGVM->gmm.s.Reserved.cShadowPages = cShadowPages;
1495 pGVM->gmm.s.enmPolicy = enmPolicy;
1496 pGVM->gmm.s.enmPriority = enmPriority;
1497 pGVM->gmm.s.fMayAllocate = true;
1498
1499 pGMM->cReservedPages += cBasePages + cFixedPages + cShadowPages;
1500 pGMM->cRegisteredVMs++;
1501 }
1502 }
1503 else
1504 rc = VERR_WRONG_ORDER;
1505 GMM_CHECK_SANITY_UPON_LEAVING(pGMM);
1506 }
1507 else
1508 rc = VERR_GMM_IS_NOT_SANE;
1509 gmmR0MutexRelease(pGMM);
1510 LogFlow(("GMMR0InitialReservation: returns %Rrc\n", rc));
1511 return rc;
1512}
1513
1514
1515/**
1516 * VMMR0 request wrapper for GMMR0InitialReservation.
1517 *
1518 * @returns see GMMR0InitialReservation.
1519 * @param pVM Pointer to the shared VM structure.
1520 * @param idCpu VCPU id
1521 * @param pReq The request packet.
1522 */
1523GMMR0DECL(int) GMMR0InitialReservationReq(PVM pVM, VMCPUID idCpu, PGMMINITIALRESERVATIONREQ pReq)
1524{
1525 /*
1526 * Validate input and pass it on.
1527 */
1528 AssertPtrReturn(pVM, VERR_INVALID_POINTER);
1529 AssertPtrReturn(pReq, VERR_INVALID_POINTER);
1530 AssertMsgReturn(pReq->Hdr.cbReq == sizeof(*pReq), ("%#x != %#x\n", pReq->Hdr.cbReq, sizeof(*pReq)), VERR_INVALID_PARAMETER);
1531
1532 return GMMR0InitialReservation(pVM, idCpu, pReq->cBasePages, pReq->cShadowPages, pReq->cFixedPages, pReq->enmPolicy, pReq->enmPriority);
1533}
1534
1535
1536/**
1537 * This updates the memory reservation with the additional MMIO2 and ROM pages.
1538 *
1539 * @returns VBox status code.
1540 * @retval VERR_GMM_MEMORY_RESERVATION_DECLINED
1541 *
1542 * @param pVM Pointer to the shared VM structure.
1543 * @param idCpu VCPU id
1544 * @param cBasePages The number of pages that may be allocated for the base RAM and ROMs.
1545 * This does not include MMIO2 and similar.
1546 * @param cShadowPages The number of pages that may be allocated for shadow paging structures.
1547 * @param cFixedPages The number of pages that may be allocated for fixed objects like the
1548 * hyper heap, MMIO2 and similar.
1549 *
1550 * @thread EMT.
1551 */
1552GMMR0DECL(int) GMMR0UpdateReservation(PVM pVM, VMCPUID idCpu, uint64_t cBasePages, uint32_t cShadowPages, uint32_t cFixedPages)
1553{
1554 LogFlow(("GMMR0UpdateReservation: pVM=%p cBasePages=%#llx cShadowPages=%#x cFixedPages=%#x\n",
1555 pVM, cBasePages, cShadowPages, cFixedPages));
1556
1557 /*
1558 * Validate, get basics and take the semaphore.
1559 */
1560 PGMM pGMM;
1561 GMM_GET_VALID_INSTANCE(pGMM, VERR_GMM_INSTANCE);
1562 PGVM pGVM;
1563 int rc = GVMMR0ByVMAndEMT(pVM, idCpu, &pGVM);
1564 if (RT_FAILURE(rc))
1565 return rc;
1566
1567 AssertReturn(cBasePages, VERR_INVALID_PARAMETER);
1568 AssertReturn(cShadowPages, VERR_INVALID_PARAMETER);
1569 AssertReturn(cFixedPages, VERR_INVALID_PARAMETER);
1570
1571 gmmR0MutexAcquire(pGMM);
1572 if (GMM_CHECK_SANITY_UPON_ENTERING(pGMM))
1573 {
1574 if ( pGVM->gmm.s.Reserved.cBasePages
1575 && pGVM->gmm.s.Reserved.cFixedPages
1576 && pGVM->gmm.s.Reserved.cShadowPages)
1577 {
1578 /*
1579 * Check if we can accommodate this.
1580 */
1581 /* ... later ... */
1582 if (RT_SUCCESS(rc))
1583 {
1584 /*
1585 * Update the records.
1586 */
1587 pGMM->cReservedPages -= pGVM->gmm.s.Reserved.cBasePages
1588 + pGVM->gmm.s.Reserved.cFixedPages
1589 + pGVM->gmm.s.Reserved.cShadowPages;
1590 pGMM->cReservedPages += cBasePages + cFixedPages + cShadowPages;
1591
1592 pGVM->gmm.s.Reserved.cBasePages = cBasePages;
1593 pGVM->gmm.s.Reserved.cFixedPages = cFixedPages;
1594 pGVM->gmm.s.Reserved.cShadowPages = cShadowPages;
1595 }
1596 }
1597 else
1598 rc = VERR_WRONG_ORDER;
1599 GMM_CHECK_SANITY_UPON_LEAVING(pGMM);
1600 }
1601 else
1602 rc = VERR_GMM_IS_NOT_SANE;
1603 gmmR0MutexRelease(pGMM);
1604 LogFlow(("GMMR0UpdateReservation: returns %Rrc\n", rc));
1605 return rc;
1606}
1607
1608
1609/**
1610 * VMMR0 request wrapper for GMMR0UpdateReservation.
1611 *
1612 * @returns see GMMR0UpdateReservation.
1613 * @param pVM Pointer to the shared VM structure.
1614 * @param idCpu VCPU id
1615 * @param pReq The request packet.
1616 */
1617GMMR0DECL(int) GMMR0UpdateReservationReq(PVM pVM, VMCPUID idCpu, PGMMUPDATERESERVATIONREQ pReq)
1618{
1619 /*
1620 * Validate input and pass it on.
1621 */
1622 AssertPtrReturn(pVM, VERR_INVALID_POINTER);
1623 AssertPtrReturn(pReq, VERR_INVALID_POINTER);
1624 AssertMsgReturn(pReq->Hdr.cbReq == sizeof(*pReq), ("%#x != %#x\n", pReq->Hdr.cbReq, sizeof(*pReq)), VERR_INVALID_PARAMETER);
1625
1626 return GMMR0UpdateReservation(pVM, idCpu, pReq->cBasePages, pReq->cShadowPages, pReq->cFixedPages);
1627}
1628
1629#ifdef GMMR0_WITH_SANITY_CHECK
1630
1631/**
1632 * Performs sanity checks on a free set.
1633 *
1634 * @returns Error count.
1635 *
1636 * @param pGMM Pointer to the GMM instance.
1637 * @param pSet Pointer to the set.
1638 * @param pszSetName The set name.
1639 * @param pszFunction The function from which it was called.
1640 * @param uLineNo The line number.
1641 */
1642static uint32_t gmmR0SanityCheckSet(PGMM pGMM, PGMMCHUNKFREESET pSet, const char *pszSetName,
1643 const char *pszFunction, unsigned uLineNo)
1644{
1645 uint32_t cErrors = 0;
1646
1647 /*
1648 * Count the free pages in all the chunks and match it against pSet->cFreePages.
1649 */
1650 uint32_t cPages = 0;
1651 for (unsigned i = 0; i < RT_ELEMENTS(pSet->apLists); i++)
1652 {
1653 for (PGMMCHUNK pCur = pSet->apLists[i]; pCur; pCur = pCur->pFreeNext)
1654 {
1655            /** @todo check that the chunk is hashed into the right set. */
1656 cPages += pCur->cFree;
1657 }
1658 }
1659 if (RT_UNLIKELY(cPages != pSet->cFreePages))
1660 {
1661 SUPR0Printf("GMM insanity: found %#x pages in the %s set, expected %#x. (%s, line %u)\n",
1662 cPages, pszSetName, pSet->cFreePages, pszFunction, uLineNo);
1663 cErrors++;
1664 }
1665
1666 return cErrors;
1667}
1668
1669
1670/**
1671 * Performs some sanity checks on the GMM while owning the lock.
1672 *
1673 * @returns Error count.
1674 *
1675 * @param pGMM Pointer to the GMM instance.
1676 * @param pszFunction The function from which it is called.
1677 * @param uLineNo The line number.
1678 */
1679static uint32_t gmmR0SanityCheck(PGMM pGMM, const char *pszFunction, unsigned uLineNo)
1680{
1681 uint32_t cErrors = 0;
1682
1683 cErrors += gmmR0SanityCheckSet(pGMM, &pGMM->PrivateX, "private", pszFunction, uLineNo);
1684 cErrors += gmmR0SanityCheckSet(pGMM, &pGMM->Shared, "shared", pszFunction, uLineNo);
1685 /** @todo add more sanity checks. */
1686
1687 return cErrors;
1688}
1689
1690#endif /* GMMR0_WITH_SANITY_CHECK */
1691
1692/**
1693 * Looks up a chunk in the tree and fills in the TLB entry for it.
1694 *
1695 * This is not expected to fail and will bitch if it does.
1696 *
1697 * @returns Pointer to the allocation chunk, NULL if not found.
1698 * @param pGMM Pointer to the GMM instance.
1699 * @param idChunk The ID of the chunk to find.
1700 * @param pTlbe Pointer to the TLB entry.
1701 */
1702static PGMMCHUNK gmmR0GetChunkSlow(PGMM pGMM, uint32_t idChunk, PGMMCHUNKTLBE pTlbe)
1703{
1704 PGMMCHUNK pChunk = (PGMMCHUNK)RTAvlU32Get(&pGMM->pChunks, idChunk);
1705 AssertMsgReturn(pChunk, ("Chunk %#x not found!\n", idChunk), NULL);
1706 pTlbe->idChunk = idChunk;
1707 pTlbe->pChunk = pChunk;
1708 return pChunk;
1709}
1710
1711
1712/**
1713 * Finds an allocation chunk.
1714 *
1715 * This is not expected to fail and will bitch if it does.
1716 *
1717 * @returns Pointer to the allocation chunk, NULL if not found.
1718 * @param pGMM Pointer to the GMM instance.
1719 * @param idChunk The ID of the chunk to find.
1720 */
1721DECLINLINE(PGMMCHUNK) gmmR0GetChunk(PGMM pGMM, uint32_t idChunk)
1722{
1723 /*
1724 * Do a TLB lookup, branch if not in the TLB.
1725 */
1726 PGMMCHUNKTLBE pTlbe = &pGMM->ChunkTLB.aEntries[GMM_CHUNKTLB_IDX(idChunk)];
1727 if ( pTlbe->idChunk != idChunk
1728 || !pTlbe->pChunk)
1729 return gmmR0GetChunkSlow(pGMM, idChunk, pTlbe);
1730 return pTlbe->pChunk;
1731}
1732
1733
1734/**
1735 * Finds a page.
1736 *
1737 * This is not expected to fail and will bitch if it does.
1738 *
1739 * @returns Pointer to the page, NULL if not found.
1740 * @param pGMM Pointer to the GMM instance.
1741 * @param idPage The ID of the page to find.
1742 */
1743DECLINLINE(PGMMPAGE) gmmR0GetPage(PGMM pGMM, uint32_t idPage)
1744{
1745 PGMMCHUNK pChunk = gmmR0GetChunk(pGMM, idPage >> GMM_CHUNKID_SHIFT);
1746 if (RT_LIKELY(pChunk))
1747 return &pChunk->aPages[idPage & GMM_PAGEID_IDX_MASK];
1748 return NULL;
1749}
1750
1751
1752/**
1753 * Gets the host physical address for a page given by its ID.
1754 *
1755 * @returns The host physical address or NIL_RTHCPHYS.
1756 * @param pGMM Pointer to the GMM instance.
1757 * @param idPage The ID of the page to find.
1758 */
1759DECLINLINE(RTHCPHYS) gmmR0GetPageHCPhys(PGMM pGMM, uint32_t idPage)
1760{
1761 PGMMCHUNK pChunk = gmmR0GetChunk(pGMM, idPage >> GMM_CHUNKID_SHIFT);
1762 if (RT_LIKELY(pChunk))
1763 return RTR0MemObjGetPagePhysAddr(pChunk->hMemObj, idPage & GMM_PAGEID_IDX_MASK);
1764 return NIL_RTHCPHYS;
1765}
1766
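/*
 * Example: resolving a page ID with the lookup helpers above.
 *
 * A minimal sketch (the GMM instance pointer and the page ID are assumed to
 * be valid locals) showing how a page ID yields both its tracking entry and
 * the backing host physical address:
 *
 *      PGMMPAGE pPage  = gmmR0GetPage(pGMM, idPage);
 *      RTHCPHYS HCPhys = gmmR0GetPageHCPhys(pGMM, idPage);
 *      if (   pPage
 *          && GMM_PAGE_IS_PRIVATE(pPage)
 *          && HCPhys != NIL_RTHCPHYS)
 *      {
 *          // idPage names a private page backed by the host page at HCPhys.
 *      }
 */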
1767
1768/**
1769 * Selects the appropriate free list given the number of free pages.
1770 *
1771 * @returns Free list index.
1772 * @param cFree The number of free pages in the chunk.
1773 */
1774DECLINLINE(unsigned) gmmR0SelectFreeSetList(unsigned cFree)
1775{
1776 unsigned iList = cFree >> GMM_CHUNK_FREE_SET_SHIFT;
1777 AssertMsg(iList < RT_SIZEOFMEMB(GMMCHUNKFREESET, apLists) / RT_SIZEOFMEMB(GMMCHUNKFREESET, apLists[0]),
1778 ("%d (%u)\n", iList, cFree));
1779 return iList;
1780}
1781
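/*
 * Example: when a chunk needs relinking.
 *
 * A minimal sketch (pChunk being any chunk currently in a free set); it only
 * relies on what the function above states, namely that chunks are bucketed
 * by cFree >> GMM_CHUNK_FREE_SET_SHIFT.  A chunk therefore only has to move
 * to another list when an allocation or free pushes its free count across a
 * bucket boundary:
 *
 *      unsigned const iOld = gmmR0SelectFreeSetList(pChunk->cFree);
 *      unsigned const iNew = gmmR0SelectFreeSetList(pChunk->cFree + 1);
 *      if (iOld != iNew)
 *      {
 *          // Crossing a bucket boundary: unlink and relink the chunk.
 *          // (This is the check gmmR0FreePageWorker performs further down.)
 *      }
 */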
1782
1783/**
1784 * Unlinks the chunk from the free list it's currently on (if any).
1785 *
1786 * @param pChunk The allocation chunk.
1787 */
1788DECLINLINE(void) gmmR0UnlinkChunk(PGMMCHUNK pChunk)
1789{
1790 PGMMCHUNKFREESET pSet = pChunk->pSet;
1791 if (RT_LIKELY(pSet))
1792 {
1793 pSet->cFreePages -= pChunk->cFree;
1794 pSet->idGeneration++;
1795
1796 PGMMCHUNK pPrev = pChunk->pFreePrev;
1797 PGMMCHUNK pNext = pChunk->pFreeNext;
1798 if (pPrev)
1799 pPrev->pFreeNext = pNext;
1800 else
1801 pSet->apLists[gmmR0SelectFreeSetList(pChunk->cFree)] = pNext;
1802 if (pNext)
1803 pNext->pFreePrev = pPrev;
1804
1805 pChunk->pSet = NULL;
1806 pChunk->pFreeNext = NULL;
1807 pChunk->pFreePrev = NULL;
1808 }
1809 else
1810 {
1811 Assert(!pChunk->pFreeNext);
1812 Assert(!pChunk->pFreePrev);
1813 Assert(!pChunk->cFree);
1814 }
1815}
1816
1817
1818/**
1819 * Links the chunk onto the appropriate free list in the specified free set.
1820 *
1821 * If the chunk has no free entries, it's not linked into any list.
1822 *
1823 * @param pChunk The allocation chunk.
1824 * @param pSet The free set.
1825 */
1826DECLINLINE(void) gmmR0LinkChunk(PGMMCHUNK pChunk, PGMMCHUNKFREESET pSet)
1827{
1828 Assert(!pChunk->pSet);
1829 Assert(!pChunk->pFreeNext);
1830 Assert(!pChunk->pFreePrev);
1831
1832 if (pChunk->cFree > 0)
1833 {
1834 pChunk->pSet = pSet;
1835 pChunk->pFreePrev = NULL;
1836 unsigned const iList = gmmR0SelectFreeSetList(pChunk->cFree);
1837 pChunk->pFreeNext = pSet->apLists[iList];
1838 if (pChunk->pFreeNext)
1839 pChunk->pFreeNext->pFreePrev = pChunk;
1840 pSet->apLists[iList] = pChunk;
1841
1842 pSet->cFreePages += pChunk->cFree;
1843 pSet->idGeneration++;
1844 }
1845}
1846
1847
1848/**
1849 * Selects the appropriate free set for the chunk and links it onto the
1850 * corresponding free list.
1851 *
1852 * If the chunk has no free entries, it's not linked into any list.
1853 *
 * @param   pGMM        Pointer to the GMM instance.
 * @param   pGVM        Pointer to the global VM structure.
 * @param   pChunk      The allocation chunk.
1854 */
1855DECLINLINE(void) gmmR0SelectSetAndLinkChunk(PGMM pGMM, PGVM pGVM, PGMMCHUNK pChunk)
1856{
1857 PGMMCHUNKFREESET pSet;
1858 if (pGMM->fBoundMemoryMode)
1859 pSet = &pGVM->gmm.s.Private;
1860 else if (pChunk->cShared)
1861 pSet = &pGMM->Shared;
1862 else
1863 pSet = &pGMM->PrivateX;
1864 gmmR0LinkChunk(pChunk, pSet);
1865}
1866
1867
1868/**
1869 * Frees a Chunk ID.
1870 *
1871 * @param pGMM Pointer to the GMM instance.
1872 * @param idChunk The Chunk ID to free.
1873 */
1874static void gmmR0FreeChunkId(PGMM pGMM, uint32_t idChunk)
1875{
1876 AssertReturnVoid(idChunk != NIL_GMM_CHUNKID);
1877 AssertMsg(ASMBitTest(&pGMM->bmChunkId[0], idChunk), ("%#x\n", idChunk));
1878 ASMAtomicBitClear(&pGMM->bmChunkId[0], idChunk);
1879}
1880
1881
1882/**
1883 * Allocates a new Chunk ID.
1884 *
1885 * @returns The Chunk ID.
1886 * @param pGMM Pointer to the GMM instance.
1887 */
1888static uint32_t gmmR0AllocateChunkId(PGMM pGMM)
1889{
1890 AssertCompile(!((GMM_CHUNKID_LAST + 1) & 31)); /* must be a multiple of 32 */
1891 AssertCompile(NIL_GMM_CHUNKID == 0);
1892
1893 /*
1894 * Try the next sequential one.
1895 */
1896 int32_t idChunk = ++pGMM->idChunkPrev;
1897#if 0 /** @todo enable this code */
1898 if ( idChunk <= GMM_CHUNKID_LAST
1899 && idChunk > NIL_GMM_CHUNKID
1900        && !ASMAtomicBitTestAndSet(&pGMM->bmChunkId[0], idChunk))
1901 return idChunk;
1902#endif
1903
1904 /*
1905 * Scan sequentially from the last one.
1906 */
1907 if ( (uint32_t)idChunk < GMM_CHUNKID_LAST
1908 && idChunk > NIL_GMM_CHUNKID)
1909 {
1910 idChunk = ASMBitNextClear(&pGMM->bmChunkId[0], GMM_CHUNKID_LAST + 1, idChunk);
1911 if (idChunk > NIL_GMM_CHUNKID)
1912 {
1913 AssertMsgReturn(!ASMAtomicBitTestAndSet(&pGMM->bmChunkId[0], idChunk), ("%#x\n", idChunk), NIL_GMM_CHUNKID);
1914 return pGMM->idChunkPrev = idChunk;
1915 }
1916 }
1917
1918 /*
1919 * Ok, scan from the start.
1920 * We're not racing anyone, so there is no need to expect failures or have restart loops.
1921 */
1922 idChunk = ASMBitFirstClear(&pGMM->bmChunkId[0], GMM_CHUNKID_LAST + 1);
1923    AssertMsgReturn(idChunk > NIL_GMM_CHUNKID, ("%#x\n", idChunk), NIL_GMM_CHUNKID);
1924 AssertMsgReturn(!ASMAtomicBitTestAndSet(&pGMM->bmChunkId[0], idChunk), ("%#x\n", idChunk), NIL_GMM_CHUNKID);
1925
1926 return pGMM->idChunkPrev = idChunk;
1927}
1928
1929
1930/**
1931 * Allocates one private page.
1932 *
1933 * Worker for gmmR0AllocatePages.
1934 *
1935 * @param pChunk The chunk to allocate it from.
1936 * @param hGVM The GVM handle of the VM requesting memory.
1937 * @param pPageDesc The page descriptor.
1938 */
1939static void gmmR0AllocatePage(PGMMCHUNK pChunk, uint32_t hGVM, PGMMPAGEDESC pPageDesc)
1940{
1941 /* update the chunk stats. */
1942 if (pChunk->hGVM == NIL_GVM_HANDLE)
1943 pChunk->hGVM = hGVM;
1944 Assert(pChunk->cFree);
1945 pChunk->cFree--;
1946 pChunk->cPrivate++;
1947
1948 /* unlink the first free page. */
1949 const uint32_t iPage = pChunk->iFreeHead;
1950 AssertReleaseMsg(iPage < RT_ELEMENTS(pChunk->aPages), ("%d\n", iPage));
1951 PGMMPAGE pPage = &pChunk->aPages[iPage];
1952 Assert(GMM_PAGE_IS_FREE(pPage));
1953 pChunk->iFreeHead = pPage->Free.iNext;
1954 Log3(("A pPage=%p iPage=%#x/%#x u2State=%d iFreeHead=%#x iNext=%#x\n",
1955 pPage, iPage, (pChunk->Core.Key << GMM_CHUNKID_SHIFT) | iPage,
1956 pPage->Common.u2State, pChunk->iFreeHead, pPage->Free.iNext));
1957
1958 /* make the page private. */
1959 pPage->u = 0;
1960 AssertCompile(GMM_PAGE_STATE_PRIVATE == 0);
1961 pPage->Private.hGVM = hGVM;
1962 AssertCompile(NIL_RTHCPHYS >= GMM_GCPHYS_LAST);
1963 AssertCompile(GMM_GCPHYS_UNSHAREABLE >= GMM_GCPHYS_LAST);
1964 if (pPageDesc->HCPhysGCPhys <= GMM_GCPHYS_LAST)
1965 pPage->Private.pfn = pPageDesc->HCPhysGCPhys >> PAGE_SHIFT;
1966 else
1967 pPage->Private.pfn = GMM_PAGE_PFN_UNSHAREABLE; /* unshareable / unassigned - same thing. */
1968
1969 /* update the page descriptor. */
1970 pPageDesc->HCPhysGCPhys = RTR0MemObjGetPagePhysAddr(pChunk->hMemObj, iPage);
1971 Assert(pPageDesc->HCPhysGCPhys != NIL_RTHCPHYS);
1972 pPageDesc->idPage = (pChunk->Core.Key << GMM_CHUNKID_SHIFT) | iPage;
1973 pPageDesc->idSharedPage = NIL_GMM_PAGEID;
1974}
1975
1976
1977/**
1978 * Picks the free pages from a chunk.
1979 *
1980 * @returns The new page descriptor table index.
1981 * @param   pChunk      The chunk.
1982 * @param   hGVM        The VM handle.
1984 * @param iPage The current page descriptor table index.
1985 * @param cPages The total number of pages to allocate.
1986 * @param   paPages     The page descriptor table (input + output).
1987 */
1988static uint32_t gmmR0AllocatePagesFromChunk(PGMMCHUNK pChunk, uint16_t const hGVM, uint32_t iPage, uint32_t cPages,
1989 PGMMPAGEDESC paPages)
1990{
1991 PGMMCHUNKFREESET pSet = pChunk->pSet; Assert(pSet);
1992 gmmR0UnlinkChunk(pChunk);
1993
1994 for (; pChunk->cFree && iPage < cPages; iPage++)
1995 gmmR0AllocatePage(pChunk, hGVM, &paPages[iPage]);
1996
1997 gmmR0LinkChunk(pChunk, pSet);
1998 return iPage;
1999}
2000
2001
2002/**
2003 * Registers a new chunk of memory.
2004 *
2005 * This is called by gmmR0AllocateChunkNew, GMMR0AllocateLargePage and GMMR0SeedChunk.
2006 *
2007 * @returns VBox status code. On success, the giant GMM lock will be held, the
2008 * caller must release it (ugly).
2009 * @param pGMM Pointer to the GMM instance.
2010 * @param pSet Pointer to the set.
2011 * @param MemObj The memory object for the chunk.
2012 * @param hGVM The affinity of the chunk. NIL_GVM_HANDLE for no
2013 * affinity.
2014 * @param fChunkFlags The chunk flags, GMM_CHUNK_FLAGS_XXX.
2015 * @param ppChunk Chunk address (out). Optional.
2016 *
2017 * @remarks The caller must not own the giant GMM mutex.
2018 * The giant GMM mutex will be acquired and returned acquired in
2019 * the success path. On failure, no locks will be held.
2020 */
2021static int gmmR0RegisterChunk(PGMM pGMM, PGMMCHUNKFREESET pSet, RTR0MEMOBJ MemObj, uint16_t hGVM, uint16_t fChunkFlags,
2022 PGMMCHUNK *ppChunk)
2023{
2024 Assert(pGMM->hMtxOwner != RTThreadNativeSelf());
2025 Assert(hGVM != NIL_GVM_HANDLE || pGMM->fBoundMemoryMode);
2026 Assert(fChunkFlags == 0 || fChunkFlags == GMM_CHUNK_FLAGS_LARGE_PAGE);
2027
2028 int rc;
2029 PGMMCHUNK pChunk = (PGMMCHUNK)RTMemAllocZ(sizeof(*pChunk));
2030 if (pChunk)
2031 {
2032 /*
2033 * Initialize it.
2034 */
2035 pChunk->hMemObj = MemObj;
2036 pChunk->cFree = GMM_CHUNK_NUM_PAGES;
2037 pChunk->hGVM = hGVM;
2038 /*pChunk->iFreeHead = 0;*/
2039 pChunk->idNumaNode = gmmR0GetCurrentNumaNodeId();
2040 pChunk->iChunkMtx = UINT8_MAX;
2041 pChunk->fFlags = fChunkFlags;
2042 for (unsigned iPage = 0; iPage < RT_ELEMENTS(pChunk->aPages) - 1; iPage++)
2043 {
2044 pChunk->aPages[iPage].Free.u2State = GMM_PAGE_STATE_FREE;
2045 pChunk->aPages[iPage].Free.iNext = iPage + 1;
2046 }
2047 pChunk->aPages[RT_ELEMENTS(pChunk->aPages) - 1].Free.u2State = GMM_PAGE_STATE_FREE;
2048 pChunk->aPages[RT_ELEMENTS(pChunk->aPages) - 1].Free.iNext = UINT16_MAX;
2049
2050 /*
2051 * Allocate a Chunk ID and insert it into the tree.
2052 * This has to be done behind the mutex of course.
2053 */
2054 rc = gmmR0MutexAcquire(pGMM);
2055 if (RT_SUCCESS(rc))
2056 {
2057 if (GMM_CHECK_SANITY_UPON_ENTERING(pGMM))
2058 {
2059 pChunk->Core.Key = gmmR0AllocateChunkId(pGMM);
2060 if ( pChunk->Core.Key != NIL_GMM_CHUNKID
2061 && pChunk->Core.Key <= GMM_CHUNKID_LAST
2062 && RTAvlU32Insert(&pGMM->pChunks, &pChunk->Core))
2063 {
2064 pGMM->cChunks++;
2065 RTListAppend(&pGMM->ChunkList, &pChunk->ListNode);
2066 gmmR0LinkChunk(pChunk, pSet);
2067 LogFlow(("gmmR0RegisterChunk: pChunk=%p id=%#x cChunks=%d\n", pChunk, pChunk->Core.Key, pGMM->cChunks));
2068
2069 if (ppChunk)
2070 *ppChunk = pChunk;
2071 GMM_CHECK_SANITY_UPON_LEAVING(pGMM);
2072 return VINF_SUCCESS;
2073 }
2074
2075 /* bail out */
2076 rc = VERR_GMM_CHUNK_INSERT;
2077 }
2078 else
2079 rc = VERR_GMM_IS_NOT_SANE;
2080 gmmR0MutexRelease(pGMM);
2081 }
2082
2083 RTMemFree(pChunk);
2084 }
2085 else
2086 rc = VERR_NO_MEMORY;
2087 return rc;
2088}
2089
2090
2091/**
2092 * Allocates a new chunk, immediately picks the requested pages from it, and adds
2093 * what's remaining to the specified free set.
2094 *
2095 * @note This will leave the giant mutex while allocating the new chunk!
2096 *
2097 * @returns VBox status code.
2098 * @param pGMM Pointer to the GMM instance data.
2099 * @param   pGVM        Pointer to the kernel-only VM instance data.
2100 * @param pSet Pointer to the free set.
2101 * @param cPages The number of pages requested.
2102 * @param paPages The page descriptor table (input + output).
2103 * @param piPage The pointer to the page descriptor table index
2104 * variable. This will be updated.
2105 */
2106static int gmmR0AllocateChunkNew(PGMM pGMM, PGVM pGVM, PGMMCHUNKFREESET pSet, uint32_t cPages,
2107 PGMMPAGEDESC paPages, uint32_t *piPage)
2108{
2109 gmmR0MutexRelease(pGMM);
2110
2111 RTR0MEMOBJ hMemObj;
2112 int rc = RTR0MemObjAllocPhysNC(&hMemObj, GMM_CHUNK_SIZE, NIL_RTHCPHYS);
2113 if (RT_SUCCESS(rc))
2114 {
2115/** @todo Duplicate gmmR0RegisterChunk here so we can avoid chaining up the
2116 * free pages first and then unchaining them right afterwards. Instead
2117 * do as much work as possible without holding the giant lock. */
2118 PGMMCHUNK pChunk;
2119 rc = gmmR0RegisterChunk(pGMM, pSet, hMemObj, pGVM->hSelf, 0 /*fChunkFlags*/, &pChunk);
2120 if (RT_SUCCESS(rc))
2121 {
2122 *piPage = gmmR0AllocatePagesFromChunk(pChunk, pGVM->hSelf, *piPage, cPages, paPages);
2123 return VINF_SUCCESS;
2124 }
2125
2126 /* bail out */
2127 RTR0MemObjFree(hMemObj, false /* fFreeMappings */);
2128 }
2129
2130 int rc2 = gmmR0MutexAcquire(pGMM);
2131 AssertRCReturn(rc2, RT_FAILURE(rc) ? rc : rc2);
2132 return rc;
2133
2134}
2135
2136
2137/**
2138 * As a last resort we'll pick any page we can get.
2139 *
2140 * @returns The new page descriptor table index.
2141 * @param pSet The set to pick from.
2142 * @param pGVM Pointer to the global VM structure.
2143 * @param iPage The current page descriptor table index.
2144 * @param cPages The total number of pages to allocate.
2145 * @param   paPages     The page descriptor table (input + output).
2146 */
2147static uint32_t gmmR0AllocatePagesIndiscriminately(PGMMCHUNKFREESET pSet, PGVM pGVM,
2148 uint32_t iPage, uint32_t cPages, PGMMPAGEDESC paPages)
2149{
2150 unsigned iList = RT_ELEMENTS(pSet->apLists);
2151 while (iList-- > 0)
2152 {
2153 PGMMCHUNK pChunk = pSet->apLists[iList];
2154 while (pChunk)
2155 {
2156 PGMMCHUNK pNext = pChunk->pFreeNext;
2157
2158 iPage = gmmR0AllocatePagesFromChunk(pChunk, pGVM->hSelf, iPage, cPages, paPages);
2159 if (iPage >= cPages)
2160 return iPage;
2161
2162 pChunk = pNext;
2163 }
2164 }
2165 return iPage;
2166}
2167
2168
2169/**
2170 * Pick pages from empty chunks on the same NUMA node.
2171 *
2172 * @returns The new page descriptor table index.
2173 * @param pSet The set to pick from.
2174 * @param pGVM Pointer to the global VM structure.
2175 * @param iPage The current page descriptor table index.
2176 * @param cPages The total number of pages to allocate.
2177 * @param   paPages     The page descriptor table (input + output).
2178 */
2179static uint32_t gmmR0AllocatePagesFromEmptyChunksOnSameNode(PGMMCHUNKFREESET pSet, PGVM pGVM,
2180 uint32_t iPage, uint32_t cPages, PGMMPAGEDESC paPages)
2181{
2182 PGMMCHUNK pChunk = pSet->apLists[GMM_CHUNK_FREE_SET_UNUSED_LIST];
2183 if (pChunk)
2184 {
2185 uint16_t const idNumaNode = gmmR0GetCurrentNumaNodeId();
2186 while (pChunk)
2187 {
2188 PGMMCHUNK pNext = pChunk->pFreeNext;
2189
2190 if (pChunk->idNumaNode == idNumaNode)
2191 {
2192 pChunk->hGVM = pGVM->hSelf;
2193 iPage = gmmR0AllocatePagesFromChunk(pChunk, pGVM->hSelf, iPage, cPages, paPages);
2194 if (iPage >= cPages)
2195 {
2196 pGVM->gmm.s.idLastChunkHint = pChunk->cFree ? pChunk->Core.Key : NIL_GMM_CHUNKID;
2197 return iPage;
2198 }
2199 }
2200
2201 pChunk = pNext;
2202 }
2203 }
2204 return iPage;
2205}
2206
2207
2208/**
2209 * Pick pages from non-empty chunks on the same NUMA node.
2210 *
2211 * @returns The new page descriptor table index.
2212 * @param pSet The set to pick from.
2213 * @param pGVM Pointer to the global VM structure.
2214 * @param iPage The current page descriptor table index.
2215 * @param cPages The total number of pages to allocate.
2216 * @param   paPages     The page descriptor table (input + output).
2217 */
2218static uint32_t gmmR0AllocatePagesFromSameNode(PGMMCHUNKFREESET pSet, PGVM pGVM,
2219 uint32_t iPage, uint32_t cPages, PGMMPAGEDESC paPages)
2220{
2221 /** @todo start by picking from chunks with about the right size first? */
2222 uint16_t const idNumaNode = gmmR0GetCurrentNumaNodeId();
2223 unsigned iList = GMM_CHUNK_FREE_SET_UNUSED_LIST;
2224 while (iList-- > 0)
2225 {
2226 PGMMCHUNK pChunk = pSet->apLists[iList];
2227 while (pChunk)
2228 {
2229 PGMMCHUNK pNext = pChunk->pFreeNext;
2230
2231 if (pChunk->idNumaNode == idNumaNode)
2232 {
2233 iPage = gmmR0AllocatePagesFromChunk(pChunk, pGVM->hSelf, iPage, cPages, paPages);
2234 if (iPage >= cPages)
2235 {
2236 pGVM->gmm.s.idLastChunkHint = pChunk->cFree ? pChunk->Core.Key : NIL_GMM_CHUNKID;
2237 return iPage;
2238 }
2239 }
2240
2241 pChunk = pNext;
2242 }
2243 }
2244 return iPage;
2245}
2246
2247
2248/**
2249 * Pick pages that are in chunks already associated with the VM.
2250 *
2251 * @returns The new page descriptor table index.
2252 * @param pGMM Pointer to the GMM instance data.
2253 * @param pGVM Pointer to the global VM structure.
2254 * @param pSet The set to pick from.
2255 * @param iPage The current page descriptor table index.
2256 * @param cPages The total number of pages to allocate.
2257 * @param   paPages     The page descriptor table (input + output).
2258 */
2259static uint32_t gmmR0AllocatePagesAssociatedWithVM(PGMM pGMM, PGVM pGVM, PGMMCHUNKFREESET pSet,
2260 uint32_t iPage, uint32_t cPages, PGMMPAGEDESC paPages)
2261{
2262 uint16_t const hGVM = pGVM->hSelf;
2263
2264 /* Hint. */
2265 if (pGVM->gmm.s.idLastChunkHint != NIL_GMM_CHUNKID)
2266 {
2267 PGMMCHUNK pChunk = gmmR0GetChunk(pGMM, pGVM->gmm.s.idLastChunkHint);
2268 if (pChunk && pChunk->cFree)
2269 {
2270 iPage = gmmR0AllocatePagesFromChunk(pChunk, hGVM, iPage, cPages, paPages);
2271 if (iPage >= cPages)
2272 return iPage;
2273 }
2274 }
2275
2276 /* Scan. */
2277 for (unsigned iList = 0; iList < RT_ELEMENTS(pSet->apLists); iList++)
2278 {
2279 PGMMCHUNK pChunk = pSet->apLists[iList];
2280 while (pChunk)
2281 {
2282 PGMMCHUNK pNext = pChunk->pFreeNext;
2283
2284 if (pChunk->hGVM == hGVM)
2285 {
2286 iPage = gmmR0AllocatePagesFromChunk(pChunk, hGVM, iPage, cPages, paPages);
2287 if (iPage >= cPages)
2288 {
2289 pGVM->gmm.s.idLastChunkHint = pChunk->cFree ? pChunk->Core.Key : NIL_GMM_CHUNKID;
2290 return iPage;
2291 }
2292 }
2293
2294 pChunk = pNext;
2295 }
2296 }
2297 return iPage;
2298}
2299
2300
2301
2302/**
2303 * Pick pages in bound memory mode.
2304 *
2305 * @returns The new page descriptor table index.
2306 * @param pGVM Pointer to the global VM structure.
2307 * @param iPage The current page descriptor table index.
2308 * @param cPages The total number of pages to allocate.
2309 * @param   paPages     The page descriptor table (input + output).
2310 */
2311static uint32_t gmmR0AllocatePagesInBoundMode(PGVM pGVM, uint32_t iPage, uint32_t cPages, PGMMPAGEDESC paPages)
2312{
2313 for (unsigned iList = 0; iList < RT_ELEMENTS(pGVM->gmm.s.Private.apLists); iList++)
2314 {
2315 PGMMCHUNK pChunk = pGVM->gmm.s.Private.apLists[iList];
2316 while (pChunk)
2317 {
2318 Assert(pChunk->hGVM == pGVM->hSelf);
2319 PGMMCHUNK pNext = pChunk->pFreeNext;
2320 iPage = gmmR0AllocatePagesFromChunk(pChunk, pGVM->hSelf, iPage, cPages, paPages);
2321 if (iPage >= cPages)
2322 return iPage;
2323 pChunk = pNext;
2324 }
2325 }
2326 return iPage;
2327}
2328
2329
2330/**
2331 * Checks if we should start picking pages from chunks of other VMs.
2332 *
2333 * @returns @c true if we should, @c false if we should first try to allocate more
2334 * chunks.
2335 */
2336static bool gmmR0ShouldAllocatePagesInOtherChunks(PGVM pGVM)
2337{
2338 /*
2339     * Don't allocate a new chunk if only a small remainder of the reservation
     * is left to allocate; prefer picking leftovers from other VMs' chunks instead.
2340 */
2341 uint64_t cPgReserved = pGVM->gmm.s.Reserved.cBasePages
2342 + pGVM->gmm.s.Reserved.cFixedPages
2343 - pGVM->gmm.s.cBalloonedPages
2344 /** @todo what about shared pages? */;
2345 uint64_t cPgAllocated = pGVM->gmm.s.Allocated.cBasePages
2346 + pGVM->gmm.s.Allocated.cFixedPages;
2347 uint64_t cPgDelta = cPgReserved - cPgAllocated;
2348 if (cPgDelta < GMM_CHUNK_NUM_PAGES * 4)
2349 return true;
2350 /** @todo make the threshold configurable, also test the code to see if
2351 * this ever kicks in (we might be reserving too much or smth). */
2352
2353 /*
2354 * Check how close we're to the max memory limit and how many fragments
2355 * there are?...
2356 */
2357 /** @todo. */
2358
2359 return false;
2360}
2361
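/*
 * Worked example for the threshold above, assuming 4 KB host pages so that a
 * 2 MB chunk holds 512 pages (GMM_CHUNK_NUM_PAGES): a VM with, say, 1536
 * unallocated pages left in its reservation gives cPgDelta = 1536, which is
 * below the 4 * 512 = 2048 page threshold, so the function returns true and
 * the caller prefers leftovers in other VMs' chunks over allocating a brand
 * new chunk for such a small remainder.
 */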
2362
2363/**
2364 * Common worker for GMMR0AllocateHandyPages and GMMR0AllocatePages.
2365 *
2366 * @returns VBox status code:
2367 * @retval VINF_SUCCESS on success.
2368 * @retval VERR_GMM_SEED_ME if seeding via GMMR0SeedChunk or
2369 * gmmR0AllocateMoreChunks is necessary.
2370 * @retval VERR_GMM_HIT_GLOBAL_LIMIT if we've exhausted the available pages.
2371 * @retval VERR_GMM_HIT_VM_ACCOUNT_LIMIT if we've hit the VM account limit,
2372 * that is we're trying to allocate more than we've reserved.
2373 *
2374 * @param pGMM Pointer to the GMM instance data.
2375 * @param pGVM Pointer to the shared VM structure.
2376 * @param cPages The number of pages to allocate.
2377 * @param paPages Pointer to the page descriptors.
2378 * See GMMPAGEDESC for details on what is expected on input.
2379 * @param enmAccount The account to charge.
2380 *
2381 * @remarks Caller must own the giant GMM lock.
2382 */
2383static int gmmR0AllocatePagesNew(PGMM pGMM, PGVM pGVM, uint32_t cPages, PGMMPAGEDESC paPages, GMMACCOUNT enmAccount)
2384{
2385 Assert(pGMM->hMtxOwner == RTThreadNativeSelf());
2386
2387 /*
2388 * Check allocation limits.
2389 */
2390 if (RT_UNLIKELY(pGMM->cAllocatedPages + cPages > pGMM->cMaxPages))
2391 return VERR_GMM_HIT_GLOBAL_LIMIT;
2392
2393 switch (enmAccount)
2394 {
2395 case GMMACCOUNT_BASE:
2396 if (RT_UNLIKELY( pGVM->gmm.s.Allocated.cBasePages + pGVM->gmm.s.cBalloonedPages + cPages
2397 > pGVM->gmm.s.Reserved.cBasePages))
2398 {
2399 Log(("gmmR0AllocatePages:Base: Reserved=%#llx Allocated+Ballooned+Requested=%#llx+%#llx+%#x!\n",
2400 pGVM->gmm.s.Reserved.cBasePages, pGVM->gmm.s.Allocated.cBasePages, pGVM->gmm.s.cBalloonedPages, cPages));
2401 return VERR_GMM_HIT_VM_ACCOUNT_LIMIT;
2402 }
2403 break;
2404 case GMMACCOUNT_SHADOW:
2405 if (RT_UNLIKELY(pGVM->gmm.s.Allocated.cShadowPages + cPages > pGVM->gmm.s.Reserved.cShadowPages))
2406 {
2407 Log(("gmmR0AllocatePages:Shadow: Reserved=%#x Allocated+Requested=%#x+%#x!\n",
2408 pGVM->gmm.s.Reserved.cShadowPages, pGVM->gmm.s.Allocated.cShadowPages, cPages));
2409 return VERR_GMM_HIT_VM_ACCOUNT_LIMIT;
2410 }
2411 break;
2412 case GMMACCOUNT_FIXED:
2413 if (RT_UNLIKELY(pGVM->gmm.s.Allocated.cFixedPages + cPages > pGVM->gmm.s.Reserved.cFixedPages))
2414 {
2415 Log(("gmmR0AllocatePages:Fixed: Reserved=%#x Allocated+Requested=%#x+%#x!\n",
2416 pGVM->gmm.s.Reserved.cFixedPages, pGVM->gmm.s.Allocated.cFixedPages, cPages));
2417 return VERR_GMM_HIT_VM_ACCOUNT_LIMIT;
2418 }
2419 break;
2420 default:
2421 AssertMsgFailedReturn(("enmAccount=%d\n", enmAccount), VERR_IPE_NOT_REACHED_DEFAULT_CASE);
2422 }
2423
2424 /*
2425 * If we're in legacy memory mode, it's easy to figure if we have
2426 * sufficient number of pages up-front.
2427 */
2428 if ( pGMM->fLegacyAllocationMode
2429 && pGVM->gmm.s.Private.cFreePages < cPages)
2430 {
2431 Assert(pGMM->fBoundMemoryMode);
2432 return VERR_GMM_SEED_ME;
2433 }
2434
2435 /*
2436 * Update the accounts before we proceed because we might be leaving the
2437 * protection of the global mutex and thus run the risk of permitting
2438 * too much memory to be allocated.
2439 */
2440 switch (enmAccount)
2441 {
2442 case GMMACCOUNT_BASE: pGVM->gmm.s.Allocated.cBasePages += cPages; break;
2443 case GMMACCOUNT_SHADOW: pGVM->gmm.s.Allocated.cShadowPages += cPages; break;
2444 case GMMACCOUNT_FIXED: pGVM->gmm.s.Allocated.cFixedPages += cPages; break;
2445 default: AssertMsgFailedReturn(("enmAccount=%d\n", enmAccount), VERR_IPE_NOT_REACHED_DEFAULT_CASE);
2446 }
2447 pGVM->gmm.s.cPrivatePages += cPages;
2448 pGMM->cAllocatedPages += cPages;
2449
2450 /*
2451 * Part two of it's-easy-in-legacy-memory-mode.
2452 */
2453 uint32_t iPage = 0;
2454 if (pGMM->fLegacyAllocationMode)
2455 {
2456 iPage = gmmR0AllocatePagesInBoundMode(pGVM, iPage, cPages, paPages);
2457 AssertReleaseReturn(iPage == cPages, VERR_GMM_ALLOC_PAGES_IPE);
2458 return VINF_SUCCESS;
2459 }
2460
2461 /*
2462 * Bound mode is also relatively straightforward.
2463 */
2464 int rc = VINF_SUCCESS;
2465 if (pGMM->fBoundMemoryMode)
2466 {
2467 iPage = gmmR0AllocatePagesInBoundMode(pGVM, iPage, cPages, paPages);
2468 if (iPage < cPages)
2469 do
2470 rc = gmmR0AllocateChunkNew(pGMM, pGVM, &pGVM->gmm.s.Private, cPages, paPages, &iPage);
2471 while (iPage < cPages && RT_SUCCESS(rc));
2472 }
2473 /*
2474     * Shared mode is trickier as we should try to achieve the same locality as
2475 * in bound mode, but smartly make use of non-full chunks allocated by
2476 * other VMs if we're low on memory.
2477 */
2478 else
2479 {
2480 /* Pick the most optimal pages first. */
2481 iPage = gmmR0AllocatePagesAssociatedWithVM(pGMM, pGVM, &pGMM->PrivateX, iPage, cPages, paPages);
2482 if (iPage < cPages)
2483 {
2484 /* Maybe we should try getting pages from chunks "belonging" to
2485 other VMs before allocating more chunks? */
2486 if (gmmR0ShouldAllocatePagesInOtherChunks(pGVM))
2487 iPage = gmmR0AllocatePagesFromSameNode(&pGMM->PrivateX, pGVM, iPage, cPages, paPages);
2488
2489 /* Allocate memory from empty chunks. */
2490 if (iPage < cPages)
2491 iPage = gmmR0AllocatePagesFromEmptyChunksOnSameNode(&pGMM->PrivateX, pGVM, iPage, cPages, paPages);
2492
2493 /* Grab empty shared chunks. */
2494 if (iPage < cPages)
2495 iPage = gmmR0AllocatePagesFromEmptyChunksOnSameNode(&pGMM->Shared, pGVM, iPage, cPages, paPages);
2496
2497 /*
2498 * Ok, try allocate new chunks.
2499 */
2500 if (iPage < cPages)
2501 {
2502 do
2503 rc = gmmR0AllocateChunkNew(pGMM, pGVM, &pGMM->PrivateX, cPages, paPages, &iPage);
2504 while (iPage < cPages && RT_SUCCESS(rc));
2505
2506 /* If the host is out of memory, take whatever we can get. */
2507 if ( rc == VERR_NO_MEMORY
2508 && pGMM->PrivateX.cFreePages + pGMM->Shared.cFreePages >= cPages - iPage)
2509 {
2510 iPage = gmmR0AllocatePagesIndiscriminately(&pGMM->PrivateX, pGVM, iPage, cPages, paPages);
2511 if (iPage < cPages)
2512 iPage = gmmR0AllocatePagesIndiscriminately(&pGMM->Shared, pGVM, iPage, cPages, paPages);
2513 AssertRelease(iPage == cPages);
2514 rc = VINF_SUCCESS;
2515 }
2516 }
2517 }
2518 }
2519
2520 /*
2521 * Clean up on failure. Since this is bound to be a low-memory condition
2522 * we will give back any empty chunks that might be hanging around.
2523 */
2524 if (RT_FAILURE(rc))
2525 {
2526 /* Update the statistics. */
2527 pGVM->gmm.s.cPrivatePages -= cPages;
2528 pGMM->cAllocatedPages -= cPages - iPage;
2529 switch (enmAccount)
2530 {
2531 case GMMACCOUNT_BASE: pGVM->gmm.s.Allocated.cBasePages -= cPages; break;
2532 case GMMACCOUNT_SHADOW: pGVM->gmm.s.Allocated.cShadowPages -= cPages; break;
2533 case GMMACCOUNT_FIXED: pGVM->gmm.s.Allocated.cFixedPages -= cPages; break;
2534 default: AssertMsgFailedReturn(("enmAccount=%d\n", enmAccount), VERR_IPE_NOT_REACHED_DEFAULT_CASE);
2535 }
2536
2537 /* Release the pages. */
2538 while (iPage-- > 0)
2539 {
2540 uint32_t idPage = paPages[iPage].idPage;
2541 PGMMPAGE pPage = gmmR0GetPage(pGMM, idPage);
2542 if (RT_LIKELY(pPage))
2543 {
2544 Assert(GMM_PAGE_IS_PRIVATE(pPage));
2545 Assert(pPage->Private.hGVM == pGVM->hSelf);
2546 gmmR0FreePrivatePage(pGMM, pGVM, idPage, pPage);
2547 }
2548 else
2549 AssertMsgFailed(("idPage=%#x\n", idPage));
2550
2551 paPages[iPage].idPage = NIL_GMM_PAGEID;
2552 paPages[iPage].idSharedPage = NIL_GMM_PAGEID;
2553 paPages[iPage].HCPhysGCPhys = NIL_RTHCPHYS;
2554 }
2555
2556 /* Free empty chunks. */
2557 /** @todo */
2558
2559 /* return the fail status on failure */
2560 return rc;
2561 }
2562 return VINF_SUCCESS;
2563}
2564
2565
2566/**
2567 * Updates the previous allocations and allocates more pages.
2568 *
2569 * The handy pages are always taken from the 'base' memory account.
2570 * The allocated pages are not cleared and will contain random garbage.
2571 *
2572 * @returns VBox status code:
2573 * @retval VINF_SUCCESS on success.
2574 * @retval VERR_NOT_OWNER if the caller is not an EMT.
2575 * @retval VERR_GMM_PAGE_NOT_FOUND if one of the pages to update wasn't found.
2576 * @retval VERR_GMM_PAGE_NOT_PRIVATE if one of the pages to update wasn't a
2577 * private page.
2578 * @retval VERR_GMM_PAGE_NOT_SHARED if one of the pages to update wasn't a
2579 * shared page.
2580 * @retval VERR_GMM_NOT_PAGE_OWNER if one of the pages to be updated wasn't
2581 * owned by the VM.
2582 * @retval VERR_GMM_SEED_ME if seeding via GMMR0SeedChunk is necessary.
2583 * @retval VERR_GMM_HIT_GLOBAL_LIMIT if we've exhausted the available pages.
2584 * @retval VERR_GMM_HIT_VM_ACCOUNT_LIMIT if we've hit the VM account limit,
2585 * that is we're trying to allocate more than we've reserved.
2586 *
2587 * @param pVM Pointer to the shared VM structure.
2588 * @param idCpu VCPU id
2589 * @param cPagesToUpdate The number of pages to update (starting from the head).
2590 * @param cPagesToAlloc The number of pages to allocate (starting from the head).
2591 * @param paPages The array of page descriptors.
2592 * See GMMPAGEDESC for details on what is expected on input.
2593 * @thread EMT.
2594 */
2595GMMR0DECL(int) GMMR0AllocateHandyPages(PVM pVM, VMCPUID idCpu, uint32_t cPagesToUpdate, uint32_t cPagesToAlloc, PGMMPAGEDESC paPages)
2596{
2597 LogFlow(("GMMR0AllocateHandyPages: pVM=%p cPagesToUpdate=%#x cPagesToAlloc=%#x paPages=%p\n",
2598 pVM, cPagesToUpdate, cPagesToAlloc, paPages));
2599
2600 /*
2601 * Validate, get basics and take the semaphore.
2602 * (This is a relatively busy path, so make predictions where possible.)
2603 */
2604 PGMM pGMM;
2605 GMM_GET_VALID_INSTANCE(pGMM, VERR_GMM_INSTANCE);
2606 PGVM pGVM;
2607 int rc = GVMMR0ByVMAndEMT(pVM, idCpu, &pGVM);
2608 if (RT_FAILURE(rc))
2609 return rc;
2610
2611 AssertPtrReturn(paPages, VERR_INVALID_PARAMETER);
2612 AssertMsgReturn( (cPagesToUpdate && cPagesToUpdate < 1024)
2613 || (cPagesToAlloc && cPagesToAlloc < 1024),
2614 ("cPagesToUpdate=%#x cPagesToAlloc=%#x\n", cPagesToUpdate, cPagesToAlloc),
2615 VERR_INVALID_PARAMETER);
2616
2617 unsigned iPage = 0;
2618 for (; iPage < cPagesToUpdate; iPage++)
2619 {
2620 AssertMsgReturn( ( paPages[iPage].HCPhysGCPhys <= GMM_GCPHYS_LAST
2621 && !(paPages[iPage].HCPhysGCPhys & PAGE_OFFSET_MASK))
2622 || paPages[iPage].HCPhysGCPhys == NIL_RTHCPHYS
2623 || paPages[iPage].HCPhysGCPhys == GMM_GCPHYS_UNSHAREABLE,
2624 ("#%#x: %RHp\n", iPage, paPages[iPage].HCPhysGCPhys),
2625 VERR_INVALID_PARAMETER);
2626 AssertMsgReturn( paPages[iPage].idPage <= GMM_PAGEID_LAST
2627 /*|| paPages[iPage].idPage == NIL_GMM_PAGEID*/,
2628 ("#%#x: %#x\n", iPage, paPages[iPage].idPage), VERR_INVALID_PARAMETER);
2629 AssertMsgReturn( paPages[iPage].idPage <= GMM_PAGEID_LAST
2630 /*|| paPages[iPage].idSharedPage == NIL_GMM_PAGEID*/,
2631 ("#%#x: %#x\n", iPage, paPages[iPage].idSharedPage), VERR_INVALID_PARAMETER);
2632 }
2633
2634 for (; iPage < cPagesToAlloc; iPage++)
2635 {
2636 AssertMsgReturn(paPages[iPage].HCPhysGCPhys == NIL_RTHCPHYS, ("#%#x: %RHp\n", iPage, paPages[iPage].HCPhysGCPhys), VERR_INVALID_PARAMETER);
2637 AssertMsgReturn(paPages[iPage].idPage == NIL_GMM_PAGEID, ("#%#x: %#x\n", iPage, paPages[iPage].idPage), VERR_INVALID_PARAMETER);
2638 AssertMsgReturn(paPages[iPage].idSharedPage == NIL_GMM_PAGEID, ("#%#x: %#x\n", iPage, paPages[iPage].idSharedPage), VERR_INVALID_PARAMETER);
2639 }
2640
2641 gmmR0MutexAcquire(pGMM);
2642 if (GMM_CHECK_SANITY_UPON_ENTERING(pGMM))
2643 {
2644 /* No allocations before the initial reservation has been made! */
2645 if (RT_LIKELY( pGVM->gmm.s.Reserved.cBasePages
2646 && pGVM->gmm.s.Reserved.cFixedPages
2647 && pGVM->gmm.s.Reserved.cShadowPages))
2648 {
2649 /*
2650 * Perform the updates.
2651 * Stop on the first error.
2652 */
2653 for (iPage = 0; iPage < cPagesToUpdate; iPage++)
2654 {
2655 if (paPages[iPage].idPage != NIL_GMM_PAGEID)
2656 {
2657 PGMMPAGE pPage = gmmR0GetPage(pGMM, paPages[iPage].idPage);
2658 if (RT_LIKELY(pPage))
2659 {
2660 if (RT_LIKELY(GMM_PAGE_IS_PRIVATE(pPage)))
2661 {
2662 if (RT_LIKELY(pPage->Private.hGVM == pGVM->hSelf))
2663 {
2664 AssertCompile(NIL_RTHCPHYS > GMM_GCPHYS_LAST && GMM_GCPHYS_UNSHAREABLE > GMM_GCPHYS_LAST);
2665 if (RT_LIKELY(paPages[iPage].HCPhysGCPhys <= GMM_GCPHYS_LAST))
2666 pPage->Private.pfn = paPages[iPage].HCPhysGCPhys >> PAGE_SHIFT;
2667 else if (paPages[iPage].HCPhysGCPhys == GMM_GCPHYS_UNSHAREABLE)
2668 pPage->Private.pfn = GMM_PAGE_PFN_UNSHAREABLE;
2669 /* else: NIL_RTHCPHYS nothing */
2670
2671 paPages[iPage].idPage = NIL_GMM_PAGEID;
2672 paPages[iPage].HCPhysGCPhys = NIL_RTHCPHYS;
2673 }
2674 else
2675 {
2676 Log(("GMMR0AllocateHandyPages: #%#x/%#x: Not owner! hGVM=%#x hSelf=%#x\n",
2677 iPage, paPages[iPage].idPage, pPage->Private.hGVM, pGVM->hSelf));
2678 rc = VERR_GMM_NOT_PAGE_OWNER;
2679 break;
2680 }
2681 }
2682 else
2683 {
2684 Log(("GMMR0AllocateHandyPages: #%#x/%#x: Not private! %.*Rhxs (type %d)\n", iPage, paPages[iPage].idPage, sizeof(*pPage), pPage, pPage->Common.u2State));
2685 rc = VERR_GMM_PAGE_NOT_PRIVATE;
2686 break;
2687 }
2688 }
2689 else
2690 {
2691 Log(("GMMR0AllocateHandyPages: #%#x/%#x: Not found! (private)\n", iPage, paPages[iPage].idPage));
2692 rc = VERR_GMM_PAGE_NOT_FOUND;
2693 break;
2694 }
2695 }
2696
2697 if (paPages[iPage].idSharedPage != NIL_GMM_PAGEID)
2698 {
2699 PGMMPAGE pPage = gmmR0GetPage(pGMM, paPages[iPage].idSharedPage);
2700 if (RT_LIKELY(pPage))
2701 {
2702 if (RT_LIKELY(GMM_PAGE_IS_SHARED(pPage)))
2703 {
2704 AssertCompile(NIL_RTHCPHYS > GMM_GCPHYS_LAST && GMM_GCPHYS_UNSHAREABLE > GMM_GCPHYS_LAST);
2705 Assert(pPage->Shared.cRefs);
2706 Assert(pGVM->gmm.s.cSharedPages);
2707 Assert(pGVM->gmm.s.Allocated.cBasePages);
2708
2709 Log(("GMMR0AllocateHandyPages: free shared page %x cRefs=%d\n", paPages[iPage].idSharedPage, pPage->Shared.cRefs));
2710 pGVM->gmm.s.cSharedPages--;
2711 pGVM->gmm.s.Allocated.cBasePages--;
2712 if (!--pPage->Shared.cRefs)
2713 gmmR0FreeSharedPage(pGMM, pGVM, paPages[iPage].idSharedPage, pPage);
2714 else
2715 {
2716 Assert(pGMM->cDuplicatePages);
2717 pGMM->cDuplicatePages--;
2718 }
2719
2720 paPages[iPage].idSharedPage = NIL_GMM_PAGEID;
2721 }
2722 else
2723 {
2724 Log(("GMMR0AllocateHandyPages: #%#x/%#x: Not shared!\n", iPage, paPages[iPage].idSharedPage));
2725 rc = VERR_GMM_PAGE_NOT_SHARED;
2726 break;
2727 }
2728 }
2729 else
2730 {
2731 Log(("GMMR0AllocateHandyPages: #%#x/%#x: Not found! (shared)\n", iPage, paPages[iPage].idSharedPage));
2732 rc = VERR_GMM_PAGE_NOT_FOUND;
2733 break;
2734 }
2735 }
2736 } /* for each page to update */
2737
2738 if (RT_SUCCESS(rc))
2739 {
2740#if defined(VBOX_STRICT) && 0 /** @todo re-test this later. Appeared to be a PGM init bug. */
2741 for (iPage = 0; iPage < cPagesToAlloc; iPage++)
2742 {
2743 Assert(paPages[iPage].HCPhysGCPhys == NIL_RTHCPHYS);
2744 Assert(paPages[iPage].idPage == NIL_GMM_PAGEID);
2745 Assert(paPages[iPage].idSharedPage == NIL_GMM_PAGEID);
2746 }
2747#endif
2748
2749 /*
2750 * Join paths with GMMR0AllocatePages for the allocation.
2751             * Note! gmmR0AllocatePagesNew may leave the protection of the mutex!
2752 */
2753 rc = gmmR0AllocatePagesNew(pGMM, pGVM, cPagesToAlloc, paPages, GMMACCOUNT_BASE);
2754 }
2755 }
2756 else
2757 rc = VERR_WRONG_ORDER;
2758 GMM_CHECK_SANITY_UPON_LEAVING(pGMM);
2759 }
2760 else
2761 rc = VERR_GMM_IS_NOT_SANE;
2762 gmmR0MutexRelease(pGMM);
2763 LogFlow(("GMMR0AllocateHandyPages: returns %Rrc\n", rc));
2764 return rc;
2765}
2766
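/*
 * Example: a pure allocation call into GMMR0AllocateHandyPages.
 *
 * A minimal sketch, assuming the caller only wants fresh pages (no updates);
 * the array size is illustrative.  Each descriptor must be initialized to the
 * NIL markers checked above, and on success idPage and HCPhysGCPhys of every
 * entry are filled in:
 *
 *      GMMPAGEDESC aPages[32];
 *      for (uint32_t i = 0; i < RT_ELEMENTS(aPages); i++)
 *      {
 *          aPages[i].HCPhysGCPhys = NIL_RTHCPHYS;
 *          aPages[i].idPage       = NIL_GMM_PAGEID;
 *          aPages[i].idSharedPage = NIL_GMM_PAGEID;
 *      }
 *      // cPagesToUpdate = 0, cPagesToAlloc = RT_ELEMENTS(aPages):
 *      int rc = GMMR0AllocateHandyPages(pVM, idCpu, 0, RT_ELEMENTS(aPages), &aPages[0]);
 */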
2767
2768/**
2769 * Allocate one or more pages.
2770 *
2771 * This is typically used for ROMs and MMIO2 (VRAM) during VM creation.
2772 * The allocated pages are not cleared and will contain random garbage.
2773 *
2774 * @returns VBox status code:
2775 * @retval VINF_SUCCESS on success.
2776 * @retval VERR_NOT_OWNER if the caller is not an EMT.
2777 * @retval VERR_GMM_SEED_ME if seeding via GMMR0SeedChunk is necessary.
2778 * @retval VERR_GMM_HIT_GLOBAL_LIMIT if we've exhausted the available pages.
2779 * @retval VERR_GMM_HIT_VM_ACCOUNT_LIMIT if we've hit the VM account limit,
2780 * that is we're trying to allocate more than we've reserved.
2781 *
2782 * @param pVM Pointer to the shared VM structure.
2783 * @param idCpu VCPU id
2784 * @param cPages The number of pages to allocate.
2785 * @param paPages Pointer to the page descriptors.
2786 * See GMMPAGEDESC for details on what is expected on input.
2787 * @param enmAccount The account to charge.
2788 *
2789 * @thread EMT.
2790 */
2791GMMR0DECL(int) GMMR0AllocatePages(PVM pVM, VMCPUID idCpu, uint32_t cPages, PGMMPAGEDESC paPages, GMMACCOUNT enmAccount)
2792{
2793 LogFlow(("GMMR0AllocatePages: pVM=%p cPages=%#x paPages=%p enmAccount=%d\n", pVM, cPages, paPages, enmAccount));
2794
2795 /*
2796 * Validate, get basics and take the semaphore.
2797 */
2798 PGMM pGMM;
2799 GMM_GET_VALID_INSTANCE(pGMM, VERR_GMM_INSTANCE);
2800 PGVM pGVM;
2801 int rc = GVMMR0ByVMAndEMT(pVM, idCpu, &pGVM);
2802 if (RT_FAILURE(rc))
2803 return rc;
2804
2805 AssertPtrReturn(paPages, VERR_INVALID_PARAMETER);
2806 AssertMsgReturn(enmAccount > GMMACCOUNT_INVALID && enmAccount < GMMACCOUNT_END, ("%d\n", enmAccount), VERR_INVALID_PARAMETER);
2807 AssertMsgReturn(cPages > 0 && cPages < RT_BIT(32 - PAGE_SHIFT), ("%#x\n", cPages), VERR_INVALID_PARAMETER);
2808
2809 for (unsigned iPage = 0; iPage < cPages; iPage++)
2810 {
2811 AssertMsgReturn( paPages[iPage].HCPhysGCPhys == NIL_RTHCPHYS
2812 || paPages[iPage].HCPhysGCPhys == GMM_GCPHYS_UNSHAREABLE
2813 || ( enmAccount == GMMACCOUNT_BASE
2814 && paPages[iPage].HCPhysGCPhys <= GMM_GCPHYS_LAST
2815 && !(paPages[iPage].HCPhysGCPhys & PAGE_OFFSET_MASK)),
2816 ("#%#x: %RHp enmAccount=%d\n", iPage, paPages[iPage].HCPhysGCPhys, enmAccount),
2817 VERR_INVALID_PARAMETER);
2818 AssertMsgReturn(paPages[iPage].idPage == NIL_GMM_PAGEID, ("#%#x: %#x\n", iPage, paPages[iPage].idPage), VERR_INVALID_PARAMETER);
2819 AssertMsgReturn(paPages[iPage].idSharedPage == NIL_GMM_PAGEID, ("#%#x: %#x\n", iPage, paPages[iPage].idSharedPage), VERR_INVALID_PARAMETER);
2820 }
2821
2822 gmmR0MutexAcquire(pGMM);
2823 if (GMM_CHECK_SANITY_UPON_ENTERING(pGMM))
2824 {
2825
2826 /* No allocations before the initial reservation has been made! */
2827 if (RT_LIKELY( pGVM->gmm.s.Reserved.cBasePages
2828 && pGVM->gmm.s.Reserved.cFixedPages
2829 && pGVM->gmm.s.Reserved.cShadowPages))
2830 rc = gmmR0AllocatePagesNew(pGMM, pGVM, cPages, paPages, enmAccount);
2831 else
2832 rc = VERR_WRONG_ORDER;
2833 GMM_CHECK_SANITY_UPON_LEAVING(pGMM);
2834 }
2835 else
2836 rc = VERR_GMM_IS_NOT_SANE;
2837 gmmR0MutexRelease(pGMM);
2838 LogFlow(("GMMR0AllocatePages: returns %Rrc\n", rc));
2839 return rc;
2840}
2841
2842
2843/**
2844 * VMMR0 request wrapper for GMMR0AllocatePages.
2845 *
2846 * @returns see GMMR0AllocatePages.
2847 * @param pVM Pointer to the shared VM structure.
2848 * @param idCpu VCPU id
2849 * @param pReq The request packet.
2850 */
2851GMMR0DECL(int) GMMR0AllocatePagesReq(PVM pVM, VMCPUID idCpu, PGMMALLOCATEPAGESREQ pReq)
2852{
2853 /*
2854 * Validate input and pass it on.
2855 */
2856 AssertPtrReturn(pVM, VERR_INVALID_POINTER);
2857 AssertPtrReturn(pReq, VERR_INVALID_POINTER);
2858 AssertMsgReturn(pReq->Hdr.cbReq >= RT_UOFFSETOF(GMMALLOCATEPAGESREQ, aPages[0]),
2859 ("%#x < %#x\n", pReq->Hdr.cbReq, RT_UOFFSETOF(GMMALLOCATEPAGESREQ, aPages[0])),
2860 VERR_INVALID_PARAMETER);
2861 AssertMsgReturn(pReq->Hdr.cbReq == RT_UOFFSETOF(GMMALLOCATEPAGESREQ, aPages[pReq->cPages]),
2862 ("%#x != %#x\n", pReq->Hdr.cbReq, RT_UOFFSETOF(GMMALLOCATEPAGESREQ, aPages[pReq->cPages])),
2863 VERR_INVALID_PARAMETER);
2864
2865 return GMMR0AllocatePages(pVM, idCpu, pReq->cPages, &pReq->aPages[0], pReq->enmAccount);
2866}
2867
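/*
 * Example: sizing and filling the request packet consumed by the wrapper
 * above.
 *
 * A minimal sketch; the page count and account are illustrative, and other
 * header fields required by the caller's VMMR0 request path (beyond cbReq,
 * which is what this wrapper validates) are not shown:
 *
 *      uint32_t const       cPages = 16;
 *      uint32_t const       cbReq  = RT_UOFFSETOF(GMMALLOCATEPAGESREQ, aPages[cPages]);
 *      PGMMALLOCATEPAGESREQ pReq   = (PGMMALLOCATEPAGESREQ)RTMemAllocZ(cbReq);
 *      pReq->Hdr.cbReq  = cbReq;
 *      pReq->cPages     = cPages;
 *      pReq->enmAccount = GMMACCOUNT_BASE;
 *      for (uint32_t i = 0; i < cPages; i++)
 *      {
 *          pReq->aPages[i].HCPhysGCPhys = NIL_RTHCPHYS;
 *          pReq->aPages[i].idPage       = NIL_GMM_PAGEID;
 *          pReq->aPages[i].idSharedPage = NIL_GMM_PAGEID;
 *      }
 */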
2868
2869/**
2870 * Allocate a large page to represent guest RAM.
2871 *
2872 * The allocated pages are not cleared and will contain random garbage.
2873 *
2874 * @returns VBox status code:
2875 * @retval VINF_SUCCESS on success.
2876 * @retval VERR_NOT_OWNER if the caller is not an EMT.
2877 * @retval VERR_GMM_SEED_ME if seeding via GMMR0SeedChunk is necessary.
2878 * @retval VERR_GMM_HIT_GLOBAL_LIMIT if we've exhausted the available pages.
2879 * @retval VERR_GMM_HIT_VM_ACCOUNT_LIMIT if we've hit the VM account limit,
2880 * that is we're trying to allocate more than we've reserved.
2882 * @param   pVM         Pointer to the shared VM structure.
2883 * @param   idCpu       VCPU id
2884 * @param   cbPage      Large page size, must be GMM_CHUNK_SIZE.
 * @param   pIdPage     Where to return the GMM page ID of the large page.
 * @param   pHCPhys     Where to return the host physical address of the large page.
2885 */
2886GMMR0DECL(int) GMMR0AllocateLargePage(PVM pVM, VMCPUID idCpu, uint32_t cbPage, uint32_t *pIdPage, RTHCPHYS *pHCPhys)
2887{
2888 LogFlow(("GMMR0AllocateLargePage: pVM=%p cbPage=%x\n", pVM, cbPage));
2889
2890 AssertReturn(cbPage == GMM_CHUNK_SIZE, VERR_INVALID_PARAMETER);
2891 AssertPtrReturn(pIdPage, VERR_INVALID_PARAMETER);
2892 AssertPtrReturn(pHCPhys, VERR_INVALID_PARAMETER);
2893
2894 /*
2895 * Validate, get basics and take the semaphore.
2896 */
2897 PGMM pGMM;
2898 GMM_GET_VALID_INSTANCE(pGMM, VERR_GMM_INSTANCE);
2899 PGVM pGVM;
2900 int rc = GVMMR0ByVMAndEMT(pVM, idCpu, &pGVM);
2901 if (RT_FAILURE(rc))
2902 return rc;
2903
2904 /* Not supported in legacy mode where we allocate the memory in ring 3 and lock it in ring 0. */
2905 if (pGMM->fLegacyAllocationMode)
2906 return VERR_NOT_SUPPORTED;
2907
2908 *pHCPhys = NIL_RTHCPHYS;
2909 *pIdPage = NIL_GMM_PAGEID;
2910
2911 gmmR0MutexAcquire(pGMM);
2912 if (GMM_CHECK_SANITY_UPON_ENTERING(pGMM))
2913 {
2914 const unsigned cPages = (GMM_CHUNK_SIZE >> PAGE_SHIFT);
2915 if (RT_UNLIKELY( pGVM->gmm.s.Allocated.cBasePages + pGVM->gmm.s.cBalloonedPages + cPages
2916 > pGVM->gmm.s.Reserved.cBasePages))
2917 {
2918 Log(("GMMR0AllocateLargePage: Reserved=%#llx Allocated+Requested=%#llx+%#x!\n",
2919 pGVM->gmm.s.Reserved.cBasePages, pGVM->gmm.s.Allocated.cBasePages, cPages));
2920 gmmR0MutexRelease(pGMM);
2921 return VERR_GMM_HIT_VM_ACCOUNT_LIMIT;
2922 }
2923
2924 /*
2925 * Allocate a new large page chunk.
2926 *
2927 * Note! We leave the giant GMM lock temporarily as the allocation might
2928 * take a long time. gmmR0RegisterChunk will retake it (ugly).
2929 */
2930 AssertCompile(GMM_CHUNK_SIZE == _2M);
2931 gmmR0MutexRelease(pGMM);
2932
2933 RTR0MEMOBJ hMemObj;
2934 rc = RTR0MemObjAllocPhysEx(&hMemObj, GMM_CHUNK_SIZE, NIL_RTHCPHYS, GMM_CHUNK_SIZE);
2935 if (RT_SUCCESS(rc))
2936 {
2937 PGMMCHUNKFREESET pSet = pGMM->fBoundMemoryMode ? &pGVM->gmm.s.Private : &pGMM->PrivateX;
2938 PGMMCHUNK pChunk;
2939 rc = gmmR0RegisterChunk(pGMM, pSet, hMemObj, pGVM->hSelf, GMM_CHUNK_FLAGS_LARGE_PAGE, &pChunk);
2940 if (RT_SUCCESS(rc))
2941 {
2942 /*
2943 * Allocate all the pages in the chunk.
2944 */
2945 /* Unlink the new chunk from the free list. */
2946 gmmR0UnlinkChunk(pChunk);
2947
2948 /** @todo rewrite this to skip the looping. */
2949 /* Allocate all pages. */
2950 GMMPAGEDESC PageDesc;
2951 gmmR0AllocatePage(pChunk, pGVM->hSelf, &PageDesc);
2952
2953 /* Return the first page as we'll use the whole chunk as one big page. */
2954 *pIdPage = PageDesc.idPage;
2955 *pHCPhys = PageDesc.HCPhysGCPhys;
2956
2957 for (unsigned i = 1; i < cPages; i++)
2958 gmmR0AllocatePage(pChunk, pGVM->hSelf, &PageDesc);
2959
2960 /* Update accounting. */
2961 pGVM->gmm.s.Allocated.cBasePages += cPages;
2962 pGVM->gmm.s.cPrivatePages += cPages;
2963 pGMM->cAllocatedPages += cPages;
2964
2965 gmmR0LinkChunk(pChunk, pSet);
2966 gmmR0MutexRelease(pGMM);
2967 }
2968 else
2969 RTR0MemObjFree(hMemObj, false /* fFreeMappings */);
2970 }
2971 }
2972 else
2973 {
2974 gmmR0MutexRelease(pGMM);
2975 rc = VERR_GMM_IS_NOT_SANE;
2976 }
2977
2978 LogFlow(("GMMR0AllocateLargePage: returns %Rrc\n", rc));
2979 return rc;
2980}
2981
2982
2983/**
2984 * Free a large page
2985 *
2986 * @returns VBox status code:
2987 * @param pVM Pointer to the shared VM structure.
2988 * @param idCpu VCPU id
2989 * @param idPage Large page id
2990 */
2991GMMR0DECL(int) GMMR0FreeLargePage(PVM pVM, VMCPUID idCpu, uint32_t idPage)
2992{
2993 LogFlow(("GMMR0FreeLargePage: pVM=%p idPage=%x\n", pVM, idPage));
2994
2995 /*
2996 * Validate, get basics and take the semaphore.
2997 */
2998 PGMM pGMM;
2999 GMM_GET_VALID_INSTANCE(pGMM, VERR_GMM_INSTANCE);
3000 PGVM pGVM;
3001 int rc = GVMMR0ByVMAndEMT(pVM, idCpu, &pGVM);
3002 if (RT_FAILURE(rc))
3003 return rc;
3004
3005 /* Not supported in legacy mode where we allocate the memory in ring 3 and lock it in ring 0. */
3006 if (pGMM->fLegacyAllocationMode)
3007 return VERR_NOT_SUPPORTED;
3008
3009 gmmR0MutexAcquire(pGMM);
3010 if (GMM_CHECK_SANITY_UPON_ENTERING(pGMM))
3011 {
3012 const unsigned cPages = (GMM_CHUNK_SIZE >> PAGE_SHIFT);
3013
3014 if (RT_UNLIKELY(pGVM->gmm.s.Allocated.cBasePages < cPages))
3015 {
3016 Log(("GMMR0FreeLargePage: allocated=%#llx cPages=%#x!\n", pGVM->gmm.s.Allocated.cBasePages, cPages));
3017 gmmR0MutexRelease(pGMM);
3018 return VERR_GMM_ATTEMPT_TO_FREE_TOO_MUCH;
3019 }
3020
3021 PGMMPAGE pPage = gmmR0GetPage(pGMM, idPage);
3022 if (RT_LIKELY( pPage
3023 && GMM_PAGE_IS_PRIVATE(pPage)))
3024 {
3025 PGMMCHUNK pChunk = gmmR0GetChunk(pGMM, idPage >> GMM_CHUNKID_SHIFT);
3026 Assert(pChunk);
3027 Assert(pChunk->cFree < GMM_CHUNK_NUM_PAGES);
3028 Assert(pChunk->cPrivate > 0);
3029
3030 /* Release the memory immediately. */
3031 gmmR0FreeChunk(pGMM, NULL, pChunk, false /*fRelaxedSem*/); /** @todo this can be relaxed too! */
3032
3033 /* Update accounting. */
3034 pGVM->gmm.s.Allocated.cBasePages -= cPages;
3035 pGVM->gmm.s.cPrivatePages -= cPages;
3036 pGMM->cAllocatedPages -= cPages;
3037 }
3038 else
3039 rc = VERR_GMM_PAGE_NOT_FOUND;
3040 }
3041 else
3042 rc = VERR_GMM_IS_NOT_SANE;
3043
3044 gmmR0MutexRelease(pGMM);
3045 LogFlow(("GMMR0FreeLargePage: returns %Rrc\n", rc));
3046 return rc;
3047}
3048
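/*
 * Example: pairing GMMR0AllocateLargePage with GMMR0FreeLargePage.
 *
 * A minimal sketch with error handling reduced to the status check; note that
 * cbPage must be exactly GMM_CHUNK_SIZE (2 MB) or the allocation fails with
 * VERR_INVALID_PARAMETER, and that neither call is available in legacy
 * allocation mode:
 *
 *      uint32_t idPage = NIL_GMM_PAGEID;
 *      RTHCPHYS HCPhys = NIL_RTHCPHYS;
 *      int rc = GMMR0AllocateLargePage(pVM, idCpu, GMM_CHUNK_SIZE, &idPage, &HCPhys);
 *      if (RT_SUCCESS(rc))
 *      {
 *          // ... hand the 2 MB page at HCPhys / idPage to the caller ...
 *          rc = GMMR0FreeLargePage(pVM, idCpu, idPage);
 *      }
 */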
3049
3050/**
3051 * VMMR0 request wrapper for GMMR0FreeLargePage.
3052 *
3053 * @returns see GMMR0FreeLargePage.
3054 * @param pVM Pointer to the shared VM structure.
3055 * @param idCpu VCPU id
3056 * @param pReq The request packet.
3057 */
3058GMMR0DECL(int) GMMR0FreeLargePageReq(PVM pVM, VMCPUID idCpu, PGMMFREELARGEPAGEREQ pReq)
3059{
3060 /*
3061 * Validate input and pass it on.
3062 */
3063 AssertPtrReturn(pVM, VERR_INVALID_POINTER);
3064 AssertPtrReturn(pReq, VERR_INVALID_POINTER);
3065    AssertMsgReturn(pReq->Hdr.cbReq == sizeof(GMMFREELARGEPAGEREQ),
3066                    ("%#x != %#x\n", pReq->Hdr.cbReq, sizeof(GMMFREELARGEPAGEREQ)),
3067 VERR_INVALID_PARAMETER);
3068
3069 return GMMR0FreeLargePage(pVM, idCpu, pReq->idPage);
3070}
3071
3072
3073/**
3074 * Frees a chunk, giving it back to the host OS.
3075 *
 * @returns @c true if the giant GMM lock was temporarily released while
 *          freeing the chunk (i.e. @a fRelaxedSem was set and the chunk was
 *          actually freed), @c false otherwise.
 *
3076 * @param pGMM Pointer to the GMM instance.
3077 * @param pGVM This is set when called from GMMR0CleanupVM so we can
3078 * unmap and free the chunk in one go.
3079 * @param pChunk The chunk to free.
3080 * @param fRelaxedSem Whether we can release the semaphore while doing the
3081 * freeing (@c true) or not.
3082 */
3083static bool gmmR0FreeChunk(PGMM pGMM, PGVM pGVM, PGMMCHUNK pChunk, bool fRelaxedSem)
3084{
3085 Assert(pChunk->Core.Key != NIL_GMM_CHUNKID);
3086
3087 GMMR0CHUNKMTXSTATE MtxState;
3088 gmmR0ChunkMutexAcquire(&MtxState, pGMM, pChunk, GMMR0CHUNK_MTX_KEEP_GIANT);
3089
3090 /*
3091     * Cleanup hack! Unmap the chunk from the caller's address space.
3092 * This shouldn't happen, so screw lock contention...
3093 */
3094 if ( pChunk->cMappingsX
3095 && !pGMM->fLegacyAllocationMode
3096 && pGVM)
3097 gmmR0UnmapChunkLocked(pGMM, pGVM, pChunk);
3098
3099 /*
3100 * If there are current mappings of the chunk, then request the
3101 * VMs to unmap them. Reposition the chunk in the free list so
3102 * it won't be a likely candidate for allocations.
3103 */
3104 if (pChunk->cMappingsX)
3105 {
3106 /** @todo R0 -> VM request */
3107 /* The chunk can be mapped by more than one VM if fBoundMemoryMode is false! */
3108        Log(("gmmR0FreeChunk: chunk still has %d mappings; don't free!\n", pChunk->cMappingsX));
3109 gmmR0ChunkMutexRelease(&MtxState, pChunk);
3110 return false;
3111 }
3112
3113
3114 /*
3115 * Save and trash the handle.
3116 */
3117 RTR0MEMOBJ const hMemObj = pChunk->hMemObj;
3118 pChunk->hMemObj = NIL_RTR0MEMOBJ;
3119
3120 /*
3121 * Unlink it from everywhere.
3122 */
3123 gmmR0UnlinkChunk(pChunk);
3124
3125 RTListNodeRemove(&pChunk->ListNode);
3126
3127 PAVLU32NODECORE pCore = RTAvlU32Remove(&pGMM->pChunks, pChunk->Core.Key);
3128 Assert(pCore == &pChunk->Core); NOREF(pCore);
3129
3130 PGMMCHUNKTLBE pTlbe = &pGMM->ChunkTLB.aEntries[GMM_CHUNKTLB_IDX(pChunk->Core.Key)];
3131 if (pTlbe->pChunk == pChunk)
3132 {
3133 pTlbe->idChunk = NIL_GMM_CHUNKID;
3134 pTlbe->pChunk = NULL;
3135 }
3136
3137 Assert(pGMM->cChunks > 0);
3138 pGMM->cChunks--;
3139
3140 /*
3141 * Free the Chunk ID before dropping the locks and freeing the rest.
3142 */
3143 gmmR0FreeChunkId(pGMM, pChunk->Core.Key);
3144 pChunk->Core.Key = NIL_GMM_CHUNKID;
3145
3146 pGMM->cFreedChunks++;
3147
3148 gmmR0ChunkMutexRelease(&MtxState, NULL);
3149 if (fRelaxedSem)
3150 gmmR0MutexRelease(pGMM);
3151
3152 RTMemFree(pChunk->paMappingsX);
3153 pChunk->paMappingsX = NULL;
3154
3155 RTMemFree(pChunk);
3156
3157 int rc = RTR0MemObjFree(hMemObj, false /* fFreeMappings */);
3158 AssertLogRelRC(rc);
3159
3160 if (fRelaxedSem)
3161 gmmR0MutexAcquire(pGMM);
3162 return fRelaxedSem;
3163}
3164
3165
3166/**
3167 * Free page worker.
3168 *
3169 * The caller does all the statistic decrementing, we do all the incrementing.
3170 *
3171 * @param pGMM Pointer to the GMM instance data.
3172 * @param pGVM Pointer to the GVM instance.
3173 * @param pChunk Pointer to the chunk this page belongs to.
3174 * @param idPage The Page ID.
3175 * @param pPage Pointer to the page.
3176 */
3177static void gmmR0FreePageWorker(PGMM pGMM, PGVM pGVM, PGMMCHUNK pChunk, uint32_t idPage, PGMMPAGE pPage)
3178{
3179 Log3(("F pPage=%p iPage=%#x/%#x u2State=%d iFreeHead=%#x\n",
3180 pPage, pPage - &pChunk->aPages[0], idPage, pPage->Common.u2State, pChunk->iFreeHead)); NOREF(idPage);
3181
3182 /*
3183 * Put the page on the free list.
3184 */
3185 pPage->u = 0;
3186 pPage->Free.u2State = GMM_PAGE_STATE_FREE;
3187 Assert(pChunk->iFreeHead < RT_ELEMENTS(pChunk->aPages) || pChunk->iFreeHead == UINT16_MAX);
3188 pPage->Free.iNext = pChunk->iFreeHead;
3189 pChunk->iFreeHead = pPage - &pChunk->aPages[0];
3190
3191 /*
3192 * Update statistics (the cShared/cPrivate stats are up to date already),
3193 * and relink the chunk if necessary.
3194 */
3195 unsigned const cFree = pChunk->cFree;
3196 if ( !cFree
3197 || gmmR0SelectFreeSetList(cFree) != gmmR0SelectFreeSetList(cFree + 1))
3198 {
3199 gmmR0UnlinkChunk(pChunk);
3200 pChunk->cFree++;
3201 gmmR0SelectSetAndLinkChunk(pGMM, pGVM, pChunk);
3202 }
3203 else
3204 {
3205 pChunk->cFree = cFree + 1;
3206 pChunk->pSet->cFreePages++;
3207 }
3208
3209 /*
3210 * If the chunk becomes empty, consider giving memory back to the host OS.
3211 *
3212 * The current strategy is to try give it back if there are other chunks
3213 * in this free list, meaning if there are at least 240 free pages in this
3214 * category. Note that since there are probably mappings of the chunk,
3215 * it won't be freed up instantly, which probably screws up this logic
3216 * a bit...
3217 */
3218 /** @todo Do this on the way out. */
3219 if (RT_UNLIKELY( pChunk->cFree == GMM_CHUNK_NUM_PAGES
3220 && pChunk->pFreeNext
3221 && pChunk->pFreePrev /** @todo this is probably misfiring, see reset... */
3222 && !pGMM->fLegacyAllocationMode))
3223 gmmR0FreeChunk(pGMM, NULL, pChunk, false);
3224
3225}
3226
3227
3228/**
3229 * Frees a shared page, the page is known to exist and be valid and such.
3230 *
3231 * @param pGMM Pointer to the GMM instance.
3232 * @param pGVM Pointer to the GVM instance.
3233 * @param idPage The Page ID
3234 * @param pPage The page structure.
3235 */
3236DECLINLINE(void) gmmR0FreeSharedPage(PGMM pGMM, PGVM pGVM, uint32_t idPage, PGMMPAGE pPage)
3237{
3238 PGMMCHUNK pChunk = gmmR0GetChunk(pGMM, idPage >> GMM_CHUNKID_SHIFT);
3239 Assert(pChunk);
3240 Assert(pChunk->cFree < GMM_CHUNK_NUM_PAGES);
3241 Assert(pChunk->cShared > 0);
3242 Assert(pGMM->cSharedPages > 0);
3243 Assert(pGMM->cAllocatedPages > 0);
3244 Assert(!pPage->Shared.cRefs);
3245
3246 pChunk->cShared--;
3247 pGMM->cAllocatedPages--;
3248 pGMM->cSharedPages--;
3249 gmmR0FreePageWorker(pGMM, pGVM, pChunk, idPage, pPage);
3250}
3251
3252
3253/**
3254 * Frees a private page, the page is known to exist and be valid and such.
3255 *
3256 * @param pGMM Pointer to the GMM instance.
3257 * @param pGVM Pointer to the GVM instance.
3258 * @param idPage The Page ID
3259 * @param pPage The page structure.
3260 */
3261DECLINLINE(void) gmmR0FreePrivatePage(PGMM pGMM, PGVM pGVM, uint32_t idPage, PGMMPAGE pPage)
3262{
3263 PGMMCHUNK pChunk = gmmR0GetChunk(pGMM, idPage >> GMM_CHUNKID_SHIFT);
3264 Assert(pChunk);
3265 Assert(pChunk->cFree < GMM_CHUNK_NUM_PAGES);
3266 Assert(pChunk->cPrivate > 0);
3267 Assert(pGMM->cAllocatedPages > 0);
3268
3269 pChunk->cPrivate--;
3270 pGMM->cAllocatedPages--;
3271 gmmR0FreePageWorker(pGMM, pGVM, pChunk, idPage, pPage);
3272}
3273
3274
3275/**
3276 * Common worker for GMMR0FreePages and GMMR0BalloonedPages.
3277 *
3278 * @returns VBox status code:
3279 * @retval xxx
3280 *
3281 * @param pGMM Pointer to the GMM instance data.
3282 * @param pGVM Pointer to the shared VM structure.
3283 * @param cPages The number of pages to free.
3284 * @param paPages Pointer to the page descriptors.
3285 * @param enmAccount The account this relates to.
3286 */
3287static int gmmR0FreePages(PGMM pGMM, PGVM pGVM, uint32_t cPages, PGMMFREEPAGEDESC paPages, GMMACCOUNT enmAccount)
3288{
3289 /*
3290 * Check that the request isn't impossible wrt to the account status.
3291 */
3292 switch (enmAccount)
3293 {
3294 case GMMACCOUNT_BASE:
3295 if (RT_UNLIKELY(pGVM->gmm.s.Allocated.cBasePages < cPages))
3296 {
3297 Log(("gmmR0FreePages: allocated=%#llx cPages=%#x!\n", pGVM->gmm.s.Allocated.cBasePages, cPages));
3298 return VERR_GMM_ATTEMPT_TO_FREE_TOO_MUCH;
3299 }
3300 break;
3301 case GMMACCOUNT_SHADOW:
3302 if (RT_UNLIKELY(pGVM->gmm.s.Allocated.cShadowPages < cPages))
3303 {
3304 Log(("gmmR0FreePages: allocated=%#llx cPages=%#x!\n", pGVM->gmm.s.Allocated.cShadowPages, cPages));
3305 return VERR_GMM_ATTEMPT_TO_FREE_TOO_MUCH;
3306 }
3307 break;
3308 case GMMACCOUNT_FIXED:
3309 if (RT_UNLIKELY(pGVM->gmm.s.Allocated.cFixedPages < cPages))
3310 {
3311 Log(("gmmR0FreePages: allocated=%#llx cPages=%#x!\n", pGVM->gmm.s.Allocated.cFixedPages, cPages));
3312 return VERR_GMM_ATTEMPT_TO_FREE_TOO_MUCH;
3313 }
3314 break;
3315 default:
3316 AssertMsgFailedReturn(("enmAccount=%d\n", enmAccount), VERR_IPE_NOT_REACHED_DEFAULT_CASE);
3317 }
3318
3319 /*
3320 * Walk the descriptors and free the pages.
3321 *
3322 * Statistics (except the account) are being updated as we go along,
3323 * unlike the alloc code. Also, stop on the first error.
3324 */
3325 int rc = VINF_SUCCESS;
3326 uint32_t iPage;
3327 for (iPage = 0; iPage < cPages; iPage++)
3328 {
3329 uint32_t idPage = paPages[iPage].idPage;
3330 PGMMPAGE pPage = gmmR0GetPage(pGMM, idPage);
3331 if (RT_LIKELY(pPage))
3332 {
3333 if (RT_LIKELY(GMM_PAGE_IS_PRIVATE(pPage)))
3334 {
3335 if (RT_LIKELY(pPage->Private.hGVM == pGVM->hSelf))
3336 {
3337 Assert(pGVM->gmm.s.cPrivatePages);
3338 pGVM->gmm.s.cPrivatePages--;
3339 gmmR0FreePrivatePage(pGMM, pGVM, idPage, pPage);
3340 }
3341 else
3342 {
3343                    Log(("gmmR0FreePages: #%#x/%#x: not owner! hGVM=%#x hSelf=%#x\n", iPage, idPage,
3344 pPage->Private.hGVM, pGVM->hSelf));
3345 rc = VERR_GMM_NOT_PAGE_OWNER;
3346 break;
3347 }
3348 }
3349 else if (RT_LIKELY(GMM_PAGE_IS_SHARED(pPage)))
3350 {
3351 Assert(pGVM->gmm.s.cSharedPages);
3352 pGVM->gmm.s.cSharedPages--;
3353 Assert(pPage->Shared.cRefs);
3354 if (!--pPage->Shared.cRefs)
3355 gmmR0FreeSharedPage(pGMM, pGVM, idPage, pPage);
3356 else
3357 {
3358 Assert(pGMM->cDuplicatePages);
3359 pGMM->cDuplicatePages--;
3360 }
3361 }
3362 else
3363 {
3364                Log(("gmmR0FreePages: #%#x/%#x: already free!\n", iPage, idPage));
3365 rc = VERR_GMM_PAGE_ALREADY_FREE;
3366 break;
3367 }
3368 }
3369 else
3370 {
3371            Log(("gmmR0FreePages: #%#x/%#x: not found!\n", iPage, idPage));
3372 rc = VERR_GMM_PAGE_NOT_FOUND;
3373 break;
3374 }
3375 paPages[iPage].idPage = NIL_GMM_PAGEID;
3376 }
3377
3378 /*
3379 * Update the account.
3380 */
3381 switch (enmAccount)
3382 {
3383 case GMMACCOUNT_BASE: pGVM->gmm.s.Allocated.cBasePages -= iPage; break;
3384 case GMMACCOUNT_SHADOW: pGVM->gmm.s.Allocated.cShadowPages -= iPage; break;
3385 case GMMACCOUNT_FIXED: pGVM->gmm.s.Allocated.cFixedPages -= iPage; break;
3386 default:
3387 AssertMsgFailedReturn(("enmAccount=%d\n", enmAccount), VERR_IPE_NOT_REACHED_DEFAULT_CASE);
3388 }
3389
3390 /*
3391 * Any threshold stuff to be done here?
3392 */
3393
3394 return rc;
3395}
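
/*
 * For reference, the lookups above rely on the page ID layout: the chunk ID
 * occupies the upper bits and the page index within the chunk the lower bits.
 * A minimal decomposition sketch (idPage is assumed to be a valid GMM page ID):
 *
 * @code
 *      uint32_t const idChunk = idPage >> GMM_CHUNKID_SHIFT;
 *      uint32_t const iPage   = idPage &  GMM_PAGEID_IDX_MASK;
 *      PGMMCHUNK      pChunk  = gmmR0GetChunk(pGMM, idChunk);
 *      PGMMPAGE       pPage   = pChunk ? &pChunk->aPages[iPage] : NULL;
 * @endcode
 */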
3396
3397
3398/**
3399 * Free one or more pages.
3400 *
3401 * This is typically used at reset time or power off.
3402 *
3403 * @returns VBox status code:
3404 * @retval VERR_GMM_ATTEMPT_TO_FREE_TOO_MUCH
 * @retval VERR_GMM_NOT_PAGE_OWNER
 * @retval VERR_GMM_PAGE_ALREADY_FREE
 * @retval VERR_GMM_PAGE_NOT_FOUND
 * @retval VERR_GMM_IS_NOT_SANE
3405 *
3406 * @param pVM Pointer to the shared VM structure.
3407 * @param idCpu VCPU id
3408 * @param cPages The number of pages to free.
3409 * @param paPages Pointer to the page descriptors containing the Page IDs for each page.
3410 * @param enmAccount The account this relates to.
3411 * @thread EMT.
3412 */
3413GMMR0DECL(int) GMMR0FreePages(PVM pVM, VMCPUID idCpu, uint32_t cPages, PGMMFREEPAGEDESC paPages, GMMACCOUNT enmAccount)
3414{
3415 LogFlow(("GMMR0FreePages: pVM=%p cPages=%#x paPages=%p enmAccount=%d\n", pVM, cPages, paPages, enmAccount));
3416
3417 /*
3418 * Validate input and get the basics.
3419 */
3420 PGMM pGMM;
3421 GMM_GET_VALID_INSTANCE(pGMM, VERR_GMM_INSTANCE);
3422 PGVM pGVM;
3423 int rc = GVMMR0ByVMAndEMT(pVM, idCpu, &pGVM);
3424 if (RT_FAILURE(rc))
3425 return rc;
3426
3427 AssertPtrReturn(paPages, VERR_INVALID_PARAMETER);
3428 AssertMsgReturn(enmAccount > GMMACCOUNT_INVALID && enmAccount < GMMACCOUNT_END, ("%d\n", enmAccount), VERR_INVALID_PARAMETER);
3429 AssertMsgReturn(cPages > 0 && cPages < RT_BIT(32 - PAGE_SHIFT), ("%#x\n", cPages), VERR_INVALID_PARAMETER);
3430
3431 for (unsigned iPage = 0; iPage < cPages; iPage++)
3432 AssertMsgReturn( paPages[iPage].idPage <= GMM_PAGEID_LAST
3433 /*|| paPages[iPage].idPage == NIL_GMM_PAGEID*/,
3434 ("#%#x: %#x\n", iPage, paPages[iPage].idPage), VERR_INVALID_PARAMETER);
3435
3436 /*
3437 * Take the semaphore and call the worker function.
3438 */
3439 gmmR0MutexAcquire(pGMM);
3440 if (GMM_CHECK_SANITY_UPON_ENTERING(pGMM))
3441 {
3442 rc = gmmR0FreePages(pGMM, pGVM, cPages, paPages, enmAccount);
3443 GMM_CHECK_SANITY_UPON_LEAVING(pGMM);
3444 }
3445 else
3446 rc = VERR_GMM_IS_NOT_SANE;
3447 gmmR0MutexRelease(pGMM);
3448 LogFlow(("GMMR0FreePages: returns %Rrc\n", rc));
3449 return rc;
3450}
3451
3452
3453/**
3454 * VMMR0 request wrapper for GMMR0FreePages.
3455 *
3456 * @returns see GMMR0FreePages.
3457 * @param pVM Pointer to the shared VM structure.
3458 * @param idCpu VCPU id
3459 * @param pReq The request packet.
3460 */
3461GMMR0DECL(int) GMMR0FreePagesReq(PVM pVM, VMCPUID idCpu, PGMMFREEPAGESREQ pReq)
3462{
3463 /*
3464 * Validate input and pass it on.
3465 */
3466 AssertPtrReturn(pVM, VERR_INVALID_POINTER);
3467 AssertPtrReturn(pReq, VERR_INVALID_POINTER);
3468 AssertMsgReturn(pReq->Hdr.cbReq >= RT_UOFFSETOF(GMMFREEPAGESREQ, aPages[0]),
3469 ("%#x < %#x\n", pReq->Hdr.cbReq, RT_UOFFSETOF(GMMFREEPAGESREQ, aPages[0])),
3470 VERR_INVALID_PARAMETER);
3471 AssertMsgReturn(pReq->Hdr.cbReq == RT_UOFFSETOF(GMMFREEPAGESREQ, aPages[pReq->cPages]),
3472 ("%#x != %#x\n", pReq->Hdr.cbReq, RT_UOFFSETOF(GMMFREEPAGESREQ, aPages[pReq->cPages])),
3473 VERR_INVALID_PARAMETER);
3474
3475 return GMMR0FreePages(pVM, idCpu, pReq->cPages, &pReq->aPages[0], pReq->enmAccount);
3476}
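
/*
 * A minimal request-construction sketch for the wrapper above, assuming a
 * ring-0 caller with a valid pVM/idCpu and two page IDs (idFirstPage,
 * idSecondPage) previously handed out by the GMM; any additional request
 * header fields and the usual VMMR0 dispatch path are not shown.
 *
 * @code
 *      uint32_t const   cPages = 2;
 *      uint32_t const   cbReq  = RT_UOFFSETOF(GMMFREEPAGESREQ, aPages[cPages]);
 *      PGMMFREEPAGESREQ pReq   = (PGMMFREEPAGESREQ)RTMemTmpAllocZ(cbReq);
 *      if (pReq)
 *      {
 *          pReq->Hdr.cbReq        = cbReq;
 *          pReq->enmAccount       = GMMACCOUNT_BASE;
 *          pReq->cPages           = cPages;
 *          pReq->aPages[0].idPage = idFirstPage;
 *          pReq->aPages[1].idPage = idSecondPage;
 *          int rc2 = GMMR0FreePagesReq(pVM, idCpu, pReq);
 *          RTMemTmpFree(pReq);
 *      }
 * @endcode
 */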
3477
3478
3479/**
3480 * Report back on a memory ballooning request.
3481 *
3482 * The request may or may not have been initiated by the GMM. If it was initiated
3483 * by the GMM it is important that this function is called even if no pages were
3484 * ballooned.
3485 *
3486 * @returns VBox status code:
3487 * @retval VERR_GMM_ATTEMPT_TO_FREE_TOO_MUCH
3488 * @retval VERR_GMM_ATTEMPT_TO_DEFLATE_TOO_MUCH
3489 * @retval VERR_GMM_OVERCOMMITTED_TRY_AGAIN_IN_A_BIT - reset condition
3490 * indicating that we won't necessarily have sufficient RAM to boot
3491 * the VM again and that it should pause until this changes (we'll try
3492 * to balloon some other VM). (For standard deflate we have little choice
3493 * but to hope the VM won't use the memory that was returned to it.)
3494 *
3495 * @param pVM Pointer to the shared VM structure.
3496 * @param idCpu VCPU id
3497 * @param enmAction Inflate/deflate/reset
3498 * @param cBalloonedPages The number of pages that were ballooned.
3499 *
3500 * @thread EMT.
3501 */
3502GMMR0DECL(int) GMMR0BalloonedPages(PVM pVM, VMCPUID idCpu, GMMBALLOONACTION enmAction, uint32_t cBalloonedPages)
3503{
3504 LogFlow(("GMMR0BalloonedPages: pVM=%p enmAction=%d cBalloonedPages=%#x\n",
3505 pVM, enmAction, cBalloonedPages));
3506
3507 AssertMsgReturn(cBalloonedPages < RT_BIT(32 - PAGE_SHIFT), ("%#x\n", cBalloonedPages), VERR_INVALID_PARAMETER);
3508
3509 /*
3510 * Validate input and get the basics.
3511 */
3512 PGMM pGMM;
3513 GMM_GET_VALID_INSTANCE(pGMM, VERR_GMM_INSTANCE);
3514 PGVM pGVM;
3515 int rc = GVMMR0ByVMAndEMT(pVM, idCpu, &pGVM);
3516 if (RT_FAILURE(rc))
3517 return rc;
3518
3519 /*
3520 * Take the semaphore and do some more validations.
3521 */
3522 gmmR0MutexAcquire(pGMM);
3523 if (GMM_CHECK_SANITY_UPON_ENTERING(pGMM))
3524 {
3525 switch (enmAction)
3526 {
3527 case GMMBALLOONACTION_INFLATE:
3528 {
3529 if (RT_LIKELY(pGVM->gmm.s.Allocated.cBasePages + pGVM->gmm.s.cBalloonedPages + cBalloonedPages <= pGVM->gmm.s.Reserved.cBasePages))
3530 {
3531 /*
3532 * Record the ballooned memory.
3533 */
3534 pGMM->cBalloonedPages += cBalloonedPages;
3535 if (pGVM->gmm.s.cReqBalloonedPages)
3536 {
3537                        /* Code path never taken. Might be interesting in the future to request ballooned memory from guests in low-memory conditions. */
3538 AssertFailed();
3539
3540 pGVM->gmm.s.cBalloonedPages += cBalloonedPages;
3541 pGVM->gmm.s.cReqActuallyBalloonedPages += cBalloonedPages;
3542 Log(("GMMR0BalloonedPages: +%#x - Global=%#llx / VM: Total=%#llx Req=%#llx Actual=%#llx (pending)\n", cBalloonedPages,
3543 pGMM->cBalloonedPages, pGVM->gmm.s.cBalloonedPages, pGVM->gmm.s.cReqBalloonedPages, pGVM->gmm.s.cReqActuallyBalloonedPages));
3544 }
3545 else
3546 {
3547 pGVM->gmm.s.cBalloonedPages += cBalloonedPages;
3548 Log(("GMMR0BalloonedPages: +%#x - Global=%#llx / VM: Total=%#llx (user)\n",
3549 cBalloonedPages, pGMM->cBalloonedPages, pGVM->gmm.s.cBalloonedPages));
3550 }
3551 }
3552 else
3553 {
3554 Log(("GMMR0BalloonedPages: cBasePages=%#llx Total=%#llx cBalloonedPages=%#llx Reserved=%#llx\n",
3555 pGVM->gmm.s.Allocated.cBasePages, pGVM->gmm.s.cBalloonedPages, cBalloonedPages, pGVM->gmm.s.Reserved.cBasePages));
3556 rc = VERR_GMM_ATTEMPT_TO_FREE_TOO_MUCH;
3557 }
3558 break;
3559 }
3560
3561 case GMMBALLOONACTION_DEFLATE:
3562 {
3563 /* Deflate. */
3564 if (pGVM->gmm.s.cBalloonedPages >= cBalloonedPages)
3565 {
3566 /*
3567 * Record the ballooned memory.
3568 */
3569 Assert(pGMM->cBalloonedPages >= cBalloonedPages);
3570 pGMM->cBalloonedPages -= cBalloonedPages;
3571 pGVM->gmm.s.cBalloonedPages -= cBalloonedPages;
3572 if (pGVM->gmm.s.cReqDeflatePages)
3573 {
3574                        AssertFailed(); /* This path is for later. */
3575 Log(("GMMR0BalloonedPages: -%#x - Global=%#llx / VM: Total=%#llx Req=%#llx\n",
3576 cBalloonedPages, pGMM->cBalloonedPages, pGVM->gmm.s.cBalloonedPages, pGVM->gmm.s.cReqDeflatePages));
3577
3578 /*
3579 * Anything we need to do here now when the request has been completed?
3580 */
3581 pGVM->gmm.s.cReqDeflatePages = 0;
3582 }
3583 else
3584 Log(("GMMR0BalloonedPages: -%#x - Global=%#llx / VM: Total=%#llx (user)\n",
3585 cBalloonedPages, pGMM->cBalloonedPages, pGVM->gmm.s.cBalloonedPages));
3586 }
3587 else
3588 {
3589 Log(("GMMR0BalloonedPages: Total=%#llx cBalloonedPages=%#llx\n", pGVM->gmm.s.cBalloonedPages, cBalloonedPages));
3590 rc = VERR_GMM_ATTEMPT_TO_DEFLATE_TOO_MUCH;
3591 }
3592 break;
3593 }
3594
3595 case GMMBALLOONACTION_RESET:
3596 {
3597 /* Reset to an empty balloon. */
3598 Assert(pGMM->cBalloonedPages >= pGVM->gmm.s.cBalloonedPages);
3599
3600 pGMM->cBalloonedPages -= pGVM->gmm.s.cBalloonedPages;
3601 pGVM->gmm.s.cBalloonedPages = 0;
3602 break;
3603 }
3604
3605 default:
3606 rc = VERR_INVALID_PARAMETER;
3607 break;
3608 }
3609 GMM_CHECK_SANITY_UPON_LEAVING(pGMM);
3610 }
3611 else
3612 rc = VERR_GMM_IS_NOT_SANE;
3613
3614 gmmR0MutexRelease(pGMM);
3615 LogFlow(("GMMR0BalloonedPages: returns %Rrc\n", rc));
3616 return rc;
3617}
3618
3619
3620/**
3621 * VMMR0 request wrapper for GMMR0BalloonedPages.
3622 *
3623 * @returns see GMMR0BalloonedPages.
3624 * @param pVM Pointer to the shared VM structure.
3625 * @param idCpu VCPU id
3626 * @param pReq The request packet.
3627 */
3628GMMR0DECL(int) GMMR0BalloonedPagesReq(PVM pVM, VMCPUID idCpu, PGMMBALLOONEDPAGESREQ pReq)
3629{
3630 /*
3631 * Validate input and pass it on.
3632 */
3633 AssertPtrReturn(pVM, VERR_INVALID_POINTER);
3634 AssertPtrReturn(pReq, VERR_INVALID_POINTER);
3635 AssertMsgReturn(pReq->Hdr.cbReq == sizeof(GMMBALLOONEDPAGESREQ),
3636                    ("%#x != %#x\n", pReq->Hdr.cbReq, sizeof(GMMBALLOONEDPAGESREQ)),
3637 VERR_INVALID_PARAMETER);
3638
3639 return GMMR0BalloonedPages(pVM, idCpu, pReq->enmAction, pReq->cBalloonedPages);
3640}
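
/*
 * A minimal inflate sketch for the wrapper above, assuming an EMT caller with
 * a valid pVM/idCpu and a page count reported by the guest balloon driver;
 * deflate and reset only differ in enmAction (GMMBALLOONACTION_DEFLATE /
 * GMMBALLOONACTION_RESET).  Additional request header fields are not shown.
 *
 * @code
 *      GMMBALLOONEDPAGESREQ Req;
 *      RT_ZERO(Req);
 *      Req.Hdr.cbReq       = sizeof(Req);
 *      Req.enmAction       = GMMBALLOONACTION_INFLATE;
 *      Req.cBalloonedPages = cPagesFromGuestBalloon;
 *      int rc2 = GMMR0BalloonedPagesReq(pVM, idCpu, &Req);
 * @endcode
 */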
3641
3642/**
3643 * Return memory statistics for the hypervisor
3644 *
3645 * @returns VBox status code:
3646 * @param pVM Pointer to the shared VM structure.
3647 * @param pReq The request packet.
3648 */
3649GMMR0DECL(int) GMMR0QueryHypervisorMemoryStatsReq(PVM pVM, PGMMMEMSTATSREQ pReq)
3650{
3651 /*
3652 * Validate input and pass it on.
3653 */
3654 AssertPtrReturn(pVM, VERR_INVALID_POINTER);
3655 AssertPtrReturn(pReq, VERR_INVALID_POINTER);
3656 AssertMsgReturn(pReq->Hdr.cbReq == sizeof(GMMMEMSTATSREQ),
3657                    ("%#x != %#x\n", pReq->Hdr.cbReq, sizeof(GMMMEMSTATSREQ)),
3658 VERR_INVALID_PARAMETER);
3659
3660 /*
3661 * Validate input and get the basics.
3662 */
3663 PGMM pGMM;
3664 GMM_GET_VALID_INSTANCE(pGMM, VERR_GMM_INSTANCE);
3665 pReq->cAllocPages = pGMM->cAllocatedPages;
3666    pReq->cFreePages      = (pGMM->cChunks << (GMM_CHUNK_SHIFT - PAGE_SHIFT)) - pGMM->cAllocatedPages;
3667 pReq->cBalloonedPages = pGMM->cBalloonedPages;
3668 pReq->cMaxPages = pGMM->cMaxPages;
3669 pReq->cSharedPages = pGMM->cDuplicatePages;
3670 GMM_CHECK_SANITY_UPON_LEAVING(pGMM);
3671
3672 return VINF_SUCCESS;
3673}
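
/*
 * The cFreePages value above is derived rather than stored: each chunk
 * contributes GMM_CHUNK_SIZE / PAGE_SIZE pages, so the total page pool is the
 * chunk count shifted by (GMM_CHUNK_SHIFT - PAGE_SHIFT) and whatever is not
 * allocated is free.  Spelled out:
 *
 * @code
 *      uint64_t const cPagesPerChunk = GMM_CHUNK_SIZE / PAGE_SIZE;  // == 1 << (GMM_CHUNK_SHIFT - PAGE_SHIFT)
 *      uint64_t const cTotalPages    = (uint64_t)pGMM->cChunks * cPagesPerChunk;
 *      uint64_t const cFreePages     = cTotalPages - pGMM->cAllocatedPages;
 * @endcode
 */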
3674
3675/**
3676 * Return memory statistics for the VM
3677 *
3678 * @returns VBox status code:
3679 * @param pVM Pointer to the shared VM structure.
3680 * @param idCpu VCPU id
3681 * @param pReq The request packet.
3682 */
3683GMMR0DECL(int) GMMR0QueryMemoryStatsReq(PVM pVM, VMCPUID idCpu, PGMMMEMSTATSREQ pReq)
3684{
3685 /*
3686 * Validate input and pass it on.
3687 */
3688 AssertPtrReturn(pVM, VERR_INVALID_POINTER);
3689 AssertPtrReturn(pReq, VERR_INVALID_POINTER);
3690 AssertMsgReturn(pReq->Hdr.cbReq == sizeof(GMMMEMSTATSREQ),
3691                    ("%#x != %#x\n", pReq->Hdr.cbReq, sizeof(GMMMEMSTATSREQ)),
3692 VERR_INVALID_PARAMETER);
3693
3694 /*
3695 * Validate input and get the basics.
3696 */
3697 PGMM pGMM;
3698 GMM_GET_VALID_INSTANCE(pGMM, VERR_GMM_INSTANCE);
3699 PGVM pGVM;
3700 int rc = GVMMR0ByVMAndEMT(pVM, idCpu, &pGVM);
3701 if (RT_FAILURE(rc))
3702 return rc;
3703
3704 /*
3705 * Take the semaphore and do some more validations.
3706 */
3707 gmmR0MutexAcquire(pGMM);
3708 if (GMM_CHECK_SANITY_UPON_ENTERING(pGMM))
3709 {
3710 pReq->cAllocPages = pGVM->gmm.s.Allocated.cBasePages;
3711 pReq->cBalloonedPages = pGVM->gmm.s.cBalloonedPages;
3712 pReq->cMaxPages = pGVM->gmm.s.Reserved.cBasePages;
3713 pReq->cFreePages = pReq->cMaxPages - pReq->cAllocPages;
3714 }
3715 else
3716 rc = VERR_GMM_IS_NOT_SANE;
3717
3718 gmmR0MutexRelease(pGMM);
3719    LogFlow(("GMMR0QueryMemoryStatsReq: returns %Rrc\n", rc));
3720 return rc;
3721}
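
/*
 * A minimal per-VM statistics query sketch for the function above, assuming an
 * EMT caller with a valid pVM/idCpu; additional request header fields are not
 * shown.
 *
 * @code
 *      GMMMEMSTATSREQ Req;
 *      RT_ZERO(Req);
 *      Req.Hdr.cbReq = sizeof(Req);
 *      int rc2 = GMMR0QueryMemoryStatsReq(pVM, idCpu, &Req);
 *      if (RT_SUCCESS(rc2))
 *      {
 *          // Req.cAllocPages, Req.cBalloonedPages, Req.cMaxPages and
 *          // Req.cFreePages now reflect this VM's base memory account.
 *      }
 * @endcode
 */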
3722
3723
3724/**
3725 * Worker for gmmR0UnmapChunk and gmmR0FreeChunk.
3726 *
3727 * Don't call this in legacy allocation mode!
3728 *
3729 * @returns VBox status code.
3730 * @param pGMM Pointer to the GMM instance data.
3731 * @param pGVM Pointer to the Global VM structure.
3732 * @param pChunk Pointer to the chunk to be unmapped.
3733 */
3734static int gmmR0UnmapChunkLocked(PGMM pGMM, PGVM pGVM, PGMMCHUNK pChunk)
3735{
3736 Assert(!pGMM->fLegacyAllocationMode);
3737
3738 /*
3739 * Find the mapping and try unmapping it.
3740 */
3741 uint32_t cMappings = pChunk->cMappingsX;
3742 for (uint32_t i = 0; i < cMappings; i++)
3743 {
3744 Assert(pChunk->paMappingsX[i].pGVM && pChunk->paMappingsX[i].hMapObj != NIL_RTR0MEMOBJ);
3745 if (pChunk->paMappingsX[i].pGVM == pGVM)
3746 {
3747 /* unmap */
3748 int rc = RTR0MemObjFree(pChunk->paMappingsX[i].hMapObj, false /* fFreeMappings (NA) */);
3749 if (RT_SUCCESS(rc))
3750 {
3751 /* update the record. */
3752 cMappings--;
3753 if (i < cMappings)
3754 pChunk->paMappingsX[i] = pChunk->paMappingsX[cMappings];
3755 pChunk->paMappingsX[cMappings].hMapObj = NIL_RTR0MEMOBJ;
3756 pChunk->paMappingsX[cMappings].pGVM = NULL;
3757 Assert(pChunk->cMappingsX - 1U == cMappings);
3758 pChunk->cMappingsX = cMappings;
3759 }
3760
3761 return rc;
3762 }
3763 }
3764
3765 Log(("gmmR0UnmapChunk: Chunk %#x is not mapped into pGVM=%p/%#x\n", pChunk->Core.Key, pGVM, pGVM->hSelf));
3766 return VERR_GMM_CHUNK_NOT_MAPPED;
3767}
3768
3769
3770/**
3771 * Unmaps a chunk previously mapped into the address space of the current process.
3772 *
3773 * @returns VBox status code.
3774 * @param pGMM Pointer to the GMM instance data.
3775 * @param pGVM Pointer to the Global VM structure.
3776 * @param pChunk Pointer to the chunk to be unmapped.
 * @param fRelaxedSem Whether we can release the semaphore while doing the
 * unmapping (@c true) or not.
3777 */
3778static int gmmR0UnmapChunk(PGMM pGMM, PGVM pGVM, PGMMCHUNK pChunk, bool fRelaxedSem)
3779{
3780 if (!pGMM->fLegacyAllocationMode)
3781 {
3782 /*
3783 * Lock the chunk and if possible leave the giant GMM lock.
3784 */
3785 GMMR0CHUNKMTXSTATE MtxState;
3786 int rc = gmmR0ChunkMutexAcquire(&MtxState, pGMM, pChunk,
3787 fRelaxedSem ? GMMR0CHUNK_MTX_RETAKE_GIANT : GMMR0CHUNK_MTX_KEEP_GIANT);
3788 if (RT_SUCCESS(rc))
3789 {
3790 rc = gmmR0UnmapChunkLocked(pGMM, pGVM, pChunk);
3791 gmmR0ChunkMutexRelease(&MtxState, pChunk);
3792 }
3793 return rc;
3794 }
3795
3796 if (pChunk->hGVM == pGVM->hSelf)
3797 return VINF_SUCCESS;
3798
3799 Log(("gmmR0UnmapChunk: Chunk %#x is not mapped into pGVM=%p/%#x (legacy)\n", pChunk->Core.Key, pGVM, pGVM->hSelf));
3800 return VERR_GMM_CHUNK_NOT_MAPPED;
3801}
3802
3803
3804/**
3805 * Worker for gmmR0MapChunk.
3806 *
3807 * @returns VBox status code.
3808 * @param pGMM Pointer to the GMM instance data.
3809 * @param pGVM Pointer to the Global VM structure.
3810 * @param pChunk Pointer to the chunk to be mapped.
3811 * @param ppvR3 Where to store the ring-3 address of the mapping.
3812 * In the VERR_GMM_CHUNK_ALREADY_MAPPED case, this will
3813 * contain the address of the existing mapping.
3814 */
3815static int gmmR0MapChunkLocked(PGMM pGMM, PGVM pGVM, PGMMCHUNK pChunk, PRTR3PTR ppvR3)
3816{
3817 /*
3818 * If we're in legacy mode this is simple.
3819 */
3820 if (pGMM->fLegacyAllocationMode)
3821 {
3822 if (pChunk->hGVM != pGVM->hSelf)
3823 {
3824            Log(("gmmR0MapChunk: chunk %#x is not owned by pGVM=%p/%#x! (legacy)\n", pChunk->Core.Key, pGVM, pGVM->hSelf));
3825 return VERR_GMM_CHUNK_NOT_FOUND;
3826 }
3827
3828 *ppvR3 = RTR0MemObjAddressR3(pChunk->hMemObj);
3829 return VINF_SUCCESS;
3830 }
3831
3832 /*
3833 * Check to see if the chunk is already mapped.
3834 */
3835 for (uint32_t i = 0; i < pChunk->cMappingsX; i++)
3836 {
3837 Assert(pChunk->paMappingsX[i].pGVM && pChunk->paMappingsX[i].hMapObj != NIL_RTR0MEMOBJ);
3838 if (pChunk->paMappingsX[i].pGVM == pGVM)
3839 {
3840 *ppvR3 = RTR0MemObjAddressR3(pChunk->paMappingsX[i].hMapObj);
3841 Log(("gmmR0MapChunk: chunk %#x is already mapped at %p!\n", pChunk->Core.Key, *ppvR3));
3842#ifdef VBOX_WITH_PAGE_SHARING
3843 /* The ring-3 chunk cache can be out of sync; don't fail. */
3844 return VINF_SUCCESS;
3845#else
3846 return VERR_GMM_CHUNK_ALREADY_MAPPED;
3847#endif
3848 }
3849 }
3850
3851 /*
3852 * Do the mapping.
3853 */
3854 RTR0MEMOBJ hMapObj;
3855 int rc = RTR0MemObjMapUser(&hMapObj, pChunk->hMemObj, (RTR3PTR)-1, 0, RTMEM_PROT_READ | RTMEM_PROT_WRITE, NIL_RTR0PROCESS);
3856 if (RT_SUCCESS(rc))
3857 {
3858 /* reallocate the array? assumes few users per chunk (usually one). */
3859 unsigned iMapping = pChunk->cMappingsX;
3860 if ( iMapping <= 3
3861 || (iMapping & 3) == 0)
3862 {
3863 unsigned cNewSize = iMapping <= 3
3864 ? iMapping + 1
3865 : iMapping + 4;
3866 Assert(cNewSize < 4 || RT_ALIGN_32(cNewSize, 4) == cNewSize);
3867 if (RT_UNLIKELY(cNewSize > UINT16_MAX))
3868 {
3869 rc = RTR0MemObjFree(hMapObj, false /* fFreeMappings (NA) */); AssertRC(rc);
3870 return VERR_GMM_TOO_MANY_CHUNK_MAPPINGS;
3871 }
3872
3873 void *pvMappings = RTMemRealloc(pChunk->paMappingsX, cNewSize * sizeof(pChunk->paMappingsX[0]));
3874 if (RT_UNLIKELY(!pvMappings))
3875 {
3876 rc = RTR0MemObjFree(hMapObj, false /* fFreeMappings (NA) */); AssertRC(rc);
3877 return VERR_NO_MEMORY;
3878 }
3879 pChunk->paMappingsX = (PGMMCHUNKMAP)pvMappings;
3880 }
3881
3882 /* insert new entry */
3883 pChunk->paMappingsX[iMapping].hMapObj = hMapObj;
3884 pChunk->paMappingsX[iMapping].pGVM = pGVM;
3885 Assert(pChunk->cMappingsX == iMapping);
3886 pChunk->cMappingsX = iMapping + 1;
3887
3888 *ppvR3 = RTR0MemObjAddressR3(hMapObj);
3889 }
3890
3891 return rc;
3892}
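
/*
 * Note on the reallocation policy above: the mapping array grows by one entry
 * while it holds at most three mappings and by four entries once the count is
 * a multiple of four, so the capacity sequence is 1, 2, 3, 4, 8, 12, ...
 * The growth decision in isolation:
 *
 * @code
 *      bool const     fMustGrow = iMapping <= 3 || (iMapping & 3) == 0;
 *      unsigned const cNewSize  = iMapping <= 3 ? iMapping + 1 : iMapping + 4;
 * @endcode
 */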
3893
3894
3895/**
3896 * Maps a chunk into the user address space of the current process.
3897 *
3898 * @returns VBox status code.
3899 * @param pGMM Pointer to the GMM instance data.
3900 * @param pGVM Pointer to the Global VM structure.
3901 * @param pChunk Pointer to the chunk to be mapped.
3902 * @param fRelaxedSem Whether we can release the semaphore while doing the
3903 * mapping (@c true) or not.
3904 * @param ppvR3 Where to store the ring-3 address of the mapping.
3905 * In the VERR_GMM_CHUNK_ALREADY_MAPPED case, this will
3906 * contain the address of the existing mapping.
3907 */
3908static int gmmR0MapChunk(PGMM pGMM, PGVM pGVM, PGMMCHUNK pChunk, bool fRelaxedSem, PRTR3PTR ppvR3)
3909{
3910 /*
3911 * Take the chunk lock and leave the giant GMM lock when possible, then
3912 * call the worker function.
3913 */
3914 GMMR0CHUNKMTXSTATE MtxState;
3915 int rc = gmmR0ChunkMutexAcquire(&MtxState, pGMM, pChunk,
3916 fRelaxedSem ? GMMR0CHUNK_MTX_RETAKE_GIANT : GMMR0CHUNK_MTX_KEEP_GIANT);
3917 if (RT_SUCCESS(rc))
3918 {
3919 rc = gmmR0MapChunkLocked(pGMM, pGVM, pChunk, ppvR3);
3920 gmmR0ChunkMutexRelease(&MtxState, pChunk);
3921 }
3922
3923 return rc;
3924}
3925
3926
3927
3928#if defined(VBOX_WITH_PAGE_SHARING) || (defined(VBOX_STRICT) && HC_ARCH_BITS == 64)
3929/**
3930 * Check if a chunk is mapped into the specified VM
3931 *
3932 * @returns true if the chunk is mapped into the VM, false if not.
3933 * @param pGMM Pointer to the GMM instance.
3934 * @param pGVM Pointer to the Global VM structure.
3935 * @param pChunk Pointer to the chunk to be mapped.
3936 * @param ppvR3 Where to store the ring-3 address of the mapping.
3937 */
3938static int gmmR0IsChunkMapped(PGMM pGMM, PGVM pGVM, PGMMCHUNK pChunk, PRTR3PTR ppvR3)
3939{
3940 GMMR0CHUNKMTXSTATE MtxState;
3941 gmmR0ChunkMutexAcquire(&MtxState, pGMM, pChunk, GMMR0CHUNK_MTX_KEEP_GIANT);
3942 for (uint32_t i = 0; i < pChunk->cMappingsX; i++)
3943 {
3944 Assert(pChunk->paMappingsX[i].pGVM && pChunk->paMappingsX[i].hMapObj != NIL_RTR0MEMOBJ);
3945 if (pChunk->paMappingsX[i].pGVM == pGVM)
3946 {
3947 *ppvR3 = RTR0MemObjAddressR3(pChunk->paMappingsX[i].hMapObj);
3948 gmmR0ChunkMutexRelease(&MtxState, pChunk);
3949 return true;
3950 }
3951 }
3952 *ppvR3 = NULL;
3953 gmmR0ChunkMutexRelease(&MtxState, pChunk);
3954 return false;
3955}
3956#endif /* VBOX_WITH_PAGE_SHARING || (VBOX_STRICT && 64-BIT) */
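
/*
 * The typical pattern for reaching a page's bytes combines the check above
 * with the page ID layout: the low bits of the page ID index into the chunk
 * mapping.  A minimal sketch, assuming pChunk and idPage belong together:
 *
 * @code
 *      uint8_t *pbChunk;
 *      if (gmmR0IsChunkMapped(pGMM, pGVM, pChunk, (PRTR3PTR)&pbChunk))
 *      {
 *          uint8_t *pbPage = pbChunk + ((idPage & GMM_PAGEID_IDX_MASK) << PAGE_SHIFT);
 *          // pbPage now points at the start of the page within the mapping.
 *      }
 * @endcode
 */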
3957
3958
3959/**
3960 * Map a chunk and/or unmap another chunk.
3961 *
3962 * The mapping and unmapping applies to the current process.
3963 *
3964 * This API does two things because it saves a kernel call per mapping
3965 * when the ring-3 mapping cache is full.
3966 *
3967 * @returns VBox status code.
3968 * @param pVM The VM.
3969 * @param idChunkMap The chunk to map. NIL_GMM_CHUNKID if nothing to map.
3970 * @param idChunkUnmap The chunk to unmap. NIL_GMM_CHUNKID if nothing to unmap.
3971 * @param ppvR3 Where to store the address of the mapped chunk. NULL is ok if nothing to map.
3972 * @thread EMT
3973 */
3974GMMR0DECL(int) GMMR0MapUnmapChunk(PVM pVM, uint32_t idChunkMap, uint32_t idChunkUnmap, PRTR3PTR ppvR3)
3975{
3976 LogFlow(("GMMR0MapUnmapChunk: pVM=%p idChunkMap=%#x idChunkUnmap=%#x ppvR3=%p\n",
3977 pVM, idChunkMap, idChunkUnmap, ppvR3));
3978
3979 /*
3980 * Validate input and get the basics.
3981 */
3982 PGMM pGMM;
3983 GMM_GET_VALID_INSTANCE(pGMM, VERR_GMM_INSTANCE);
3984 PGVM pGVM;
3985 int rc = GVMMR0ByVM(pVM, &pGVM);
3986 if (RT_FAILURE(rc))
3987 return rc;
3988
3989 AssertCompile(NIL_GMM_CHUNKID == 0);
3990 AssertMsgReturn(idChunkMap <= GMM_CHUNKID_LAST, ("%#x\n", idChunkMap), VERR_INVALID_PARAMETER);
3991 AssertMsgReturn(idChunkUnmap <= GMM_CHUNKID_LAST, ("%#x\n", idChunkUnmap), VERR_INVALID_PARAMETER);
3992
3993 if ( idChunkMap == NIL_GMM_CHUNKID
3994 && idChunkUnmap == NIL_GMM_CHUNKID)
3995 return VERR_INVALID_PARAMETER;
3996
3997 if (idChunkMap != NIL_GMM_CHUNKID)
3998 {
3999 AssertPtrReturn(ppvR3, VERR_INVALID_POINTER);
4000 *ppvR3 = NIL_RTR3PTR;
4001 }
4002
4003 /*
4004 * Take the semaphore and do the work.
4005 *
4006 * The unmapping is done last since it's easier to undo a mapping than
4007     * undoing an unmapping.  The ring-3 mapping cache cannot be so big
4008     * that it pushes the user virtual address space to within a chunk of
4009     * its limits, so no problem here.
4010 */
4011 gmmR0MutexAcquire(pGMM);
4012 if (GMM_CHECK_SANITY_UPON_ENTERING(pGMM))
4013 {
4014 PGMMCHUNK pMap = NULL;
4015        if (idChunkMap != NIL_GMM_CHUNKID)
4016 {
4017 pMap = gmmR0GetChunk(pGMM, idChunkMap);
4018 if (RT_LIKELY(pMap))
4019 rc = gmmR0MapChunk(pGMM, pGVM, pMap, true /*fRelaxedSem*/, ppvR3);
4020 else
4021 {
4022 Log(("GMMR0MapUnmapChunk: idChunkMap=%#x\n", idChunkMap));
4023 rc = VERR_GMM_CHUNK_NOT_FOUND;
4024 }
4025 }
4026/** @todo split this operation, the bail out might (theoretically) not be
4027 * entirely safe. */
4028
4029 if ( idChunkUnmap != NIL_GMM_CHUNKID
4030 && RT_SUCCESS(rc))
4031 {
4032 PGMMCHUNK pUnmap = gmmR0GetChunk(pGMM, idChunkUnmap);
4033 if (RT_LIKELY(pUnmap))
4034 rc = gmmR0UnmapChunk(pGMM, pGVM, pUnmap, true /*fRelaxedSem*/);
4035 else
4036 {
4037 Log(("GMMR0MapUnmapChunk: idChunkUnmap=%#x\n", idChunkUnmap));
4038 rc = VERR_GMM_CHUNK_NOT_FOUND;
4039 }
4040
4041 if (RT_FAILURE(rc) && pMap)
4042 gmmR0UnmapChunk(pGMM, pGVM, pMap, false /*fRelaxedSem*/);
4043 }
4044
4045 GMM_CHECK_SANITY_UPON_LEAVING(pGMM);
4046 }
4047 else
4048 rc = VERR_GMM_IS_NOT_SANE;
4049 gmmR0MutexRelease(pGMM);
4050
4051 LogFlow(("GMMR0MapUnmapChunk: returns %Rrc\n", rc));
4052 return rc;
4053}
4054
4055
4056/**
4057 * VMMR0 request wrapper for GMMR0MapUnmapChunk.
4058 *
4059 * @returns see GMMR0MapUnmapChunk.
4060 * @param pVM Pointer to the shared VM structure.
4061 * @param pReq The request packet.
4062 */
4063GMMR0DECL(int) GMMR0MapUnmapChunkReq(PVM pVM, PGMMMAPUNMAPCHUNKREQ pReq)
4064{
4065 /*
4066 * Validate input and pass it on.
4067 */
4068 AssertPtrReturn(pVM, VERR_INVALID_POINTER);
4069 AssertPtrReturn(pReq, VERR_INVALID_POINTER);
4070 AssertMsgReturn(pReq->Hdr.cbReq == sizeof(*pReq), ("%#x != %#x\n", pReq->Hdr.cbReq, sizeof(*pReq)), VERR_INVALID_PARAMETER);
4071
4072 return GMMR0MapUnmapChunk(pVM, pReq->idChunkMap, pReq->idChunkUnmap, &pReq->pvR3);
4073}
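
/*
 * A minimal combined map+unmap sketch for the wrapper above, assuming chunk
 * IDs obtained from earlier allocations; either ID may be NIL_GMM_CHUNKID when
 * only one of the two operations is wanted.  Additional request header fields
 * are not shown.
 *
 * @code
 *      GMMMAPUNMAPCHUNKREQ Req;
 *      RT_ZERO(Req);
 *      Req.Hdr.cbReq    = sizeof(Req);
 *      Req.idChunkMap   = idChunkToMap;     // assumed chunk ID to map
 *      Req.idChunkUnmap = idChunkToEvict;   // assumed chunk ID to unmap
 *      Req.pvR3         = NIL_RTR3PTR;
 *      int rc2 = GMMR0MapUnmapChunkReq(pVM, &Req);
 *      // On success (and if idChunkMap was given) Req.pvR3 holds the ring-3 address.
 * @endcode
 */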
4074
4075
4076/**
4077 * Legacy mode API for supplying pages.
4078 *
4079 * The specified user address points to an allocation-chunk-sized block that
4080 * will be locked down and used by the GMM when the VM asks for pages.
4081 *
4082 * @returns VBox status code.
4083 * @param pVM The VM.
4084 * @param idCpu VCPU id
4085 * @param pvR3 Pointer to the chunk size memory block to lock down.
4086 */
4087GMMR0DECL(int) GMMR0SeedChunk(PVM pVM, VMCPUID idCpu, RTR3PTR pvR3)
4088{
4089 /*
4090 * Validate input and get the basics.
4091 */
4092 PGMM pGMM;
4093 GMM_GET_VALID_INSTANCE(pGMM, VERR_GMM_INSTANCE);
4094 PGVM pGVM;
4095 int rc = GVMMR0ByVMAndEMT(pVM, idCpu, &pGVM);
4096 if (RT_FAILURE(rc))
4097 return rc;
4098
4099 AssertPtrReturn(pvR3, VERR_INVALID_POINTER);
4100 AssertReturn(!(PAGE_OFFSET_MASK & pvR3), VERR_INVALID_POINTER);
4101
4102 if (!pGMM->fLegacyAllocationMode)
4103 {
4104 Log(("GMMR0SeedChunk: not in legacy allocation mode!\n"));
4105 return VERR_NOT_SUPPORTED;
4106 }
4107
4108 /*
4109 * Lock the memory and add it as new chunk with our hGVM.
4110 * (The GMM locking is done inside gmmR0RegisterChunk.)
4111 */
4112 RTR0MEMOBJ MemObj;
4113 rc = RTR0MemObjLockUser(&MemObj, pvR3, GMM_CHUNK_SIZE, RTMEM_PROT_READ | RTMEM_PROT_WRITE, NIL_RTR0PROCESS);
4114 if (RT_SUCCESS(rc))
4115 {
4116 rc = gmmR0RegisterChunk(pGMM, &pGVM->gmm.s.Private, MemObj, pGVM->hSelf, 0 /*fChunkFlags*/, NULL);
4117 if (RT_SUCCESS(rc))
4118 gmmR0MutexRelease(pGMM);
4119 else
4120 RTR0MemObjFree(MemObj, false /* fFreeMappings */);
4121 }
4122
4123 LogFlow(("GMMR0SeedChunk: rc=%d (pvR3=%p)\n", rc, pvR3));
4124 return rc;
4125}
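
/*
 * A minimal sketch of what the seeding caller does, assuming legacy (locked
 * memory) mode and that the block lives in the VM process; the VMMR0 request
 * dispatch is not shown.  RTMemPageAlloc returns page-aligned memory, which
 * satisfies the alignment check above.
 *
 * @code
 *      void *pvSeed = RTMemPageAlloc(GMM_CHUNK_SIZE);
 *      if (pvSeed)
 *          int rc2 = GMMR0SeedChunk(pVM, idCpu, (RTR3PTR)pvSeed);
 * @endcode
 */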
4126
4127
4128typedef struct
4129{
4130 PAVLGCPTRNODECORE pNode;
4131 char *pszModuleName;
4132 char *pszVersion;
4133 VBOXOSFAMILY enmGuestOS;
4134} GMMFINDMODULEBYNAME, *PGMMFINDMODULEBYNAME;
4135
4136/**
4137 * Tree enumeration callback for finding identical modules by name and version
4138 */
4139DECLCALLBACK(int) gmmR0CheckForIdenticalModule(PAVLGCPTRNODECORE pNode, void *pvUser)
4140{
4141 PGMMFINDMODULEBYNAME pInfo = (PGMMFINDMODULEBYNAME)pvUser;
4142 PGMMSHAREDMODULE pModule = (PGMMSHAREDMODULE)pNode;
4143
4144 if ( pInfo
4145 && pInfo->enmGuestOS == pModule->enmGuestOS
4146 /** @todo replace with RTStrNCmp */
4147 && !strcmp(pModule->szName, pInfo->pszModuleName)
4148 && !strcmp(pModule->szVersion, pInfo->pszVersion))
4149 {
4150 pInfo->pNode = pNode;
4151 return 1; /* stop search */
4152 }
4153 return 0;
4154}
4155
4156
4157/**
4158 * Registers a new shared module for the VM
4159 *
4160 * @returns VBox status code.
4161 * @param pVM VM handle
4162 * @param idCpu VCPU id
4163 * @param enmGuestOS Guest OS type
4164 * @param pszModuleName Module name
4165 * @param pszVersion Module version
4166 * @param GCBaseAddr Module base address
4167 * @param cbModule Module size
4168 * @param cRegions Number of shared region descriptors
4169 * @param pRegions Shared region(s)
4170 */
4171GMMR0DECL(int) GMMR0RegisterSharedModule(PVM pVM, VMCPUID idCpu, VBOXOSFAMILY enmGuestOS, char *pszModuleName,
4172 char *pszVersion, RTGCPTR GCBaseAddr, uint32_t cbModule,
4173 unsigned cRegions, VMMDEVSHAREDREGIONDESC *pRegions)
4174{
4175#ifdef VBOX_WITH_PAGE_SHARING
4176 /*
4177 * Validate input and get the basics.
4178 */
4179 PGMM pGMM;
4180 GMM_GET_VALID_INSTANCE(pGMM, VERR_GMM_INSTANCE);
4181 PGVM pGVM;
4182 int rc = GVMMR0ByVMAndEMT(pVM, idCpu, &pGVM);
4183 if (RT_FAILURE(rc))
4184 return rc;
4185
4186 Log(("GMMR0RegisterSharedModule %s %s base %RGv size %x\n", pszModuleName, pszVersion, GCBaseAddr, cbModule));
4187
4188 /*
4189 * Take the semaphore and do some more validations.
4190 */
4191 gmmR0MutexAcquire(pGMM);
4192 if (GMM_CHECK_SANITY_UPON_ENTERING(pGMM))
4193 {
4194 bool fNewModule = false;
4195
4196 /* Check if this module is already locally registered. */
4197 PGMMSHAREDMODULEPERVM pRecVM = (PGMMSHAREDMODULEPERVM)RTAvlGCPtrGet(&pGVM->gmm.s.pSharedModuleTree, GCBaseAddr);
4198 if (!pRecVM)
4199 {
4200 pRecVM = (PGMMSHAREDMODULEPERVM)RTMemAllocZ(RT_OFFSETOF(GMMSHAREDMODULEPERVM, aRegions[cRegions]));
4201 if (!pRecVM)
4202 {
4203 AssertFailed();
4204 rc = VERR_NO_MEMORY;
4205 goto end;
4206 }
4207 pRecVM->Core.Key = GCBaseAddr;
4208 pRecVM->cRegions = cRegions;
4209
4210 /* Save the region data as they can differ between VMs (address space scrambling or simply different loading order) */
4211 for (unsigned i = 0; i < cRegions; i++)
4212 {
4213 pRecVM->aRegions[i].GCRegionAddr = pRegions[i].GCRegionAddr;
4214 pRecVM->aRegions[i].cbRegion = RT_ALIGN_T(pRegions[i].cbRegion, PAGE_SIZE, uint32_t);
4215 pRecVM->aRegions[i].u32Alignment = 0;
4216 pRecVM->aRegions[i].paHCPhysPageID = NULL; /* unused */
4217 }
4218
4219 bool fInsert = RTAvlGCPtrInsert(&pGVM->gmm.s.pSharedModuleTree, &pRecVM->Core);
4220 Assert(fInsert); NOREF(fInsert);
4221
4222 Log(("GMMR0RegisterSharedModule: new local module %s\n", pszModuleName));
4223 fNewModule = true;
4224 }
4225 else
4226 rc = VINF_PGM_SHARED_MODULE_ALREADY_REGISTERED;
4227
4228 /* Check if this module is already globally registered. */
4229 PGMMSHAREDMODULE pGlobalModule = (PGMMSHAREDMODULE)RTAvlGCPtrGet(&pGMM->pGlobalSharedModuleTree, GCBaseAddr);
4230 if ( !pGlobalModule
4231 && enmGuestOS == VBOXOSFAMILY_Windows64)
4232 {
4233 /* Two identical copies of e.g. Win7 x64 will typically not have a similar virtual address space layout for dlls or kernel modules.
4234 * Try to find identical binaries based on name and version.
4235 */
4236 GMMFINDMODULEBYNAME Info;
4237
4238 Info.pNode = NULL;
4239 Info.pszVersion = pszVersion;
4240 Info.pszModuleName = pszModuleName;
4241 Info.enmGuestOS = enmGuestOS;
4242
4243 Log(("Try to find identical module %s\n", pszModuleName));
4244 int ret = RTAvlGCPtrDoWithAll(&pGMM->pGlobalSharedModuleTree, true /* fFromLeft */, gmmR0CheckForIdenticalModule, &Info);
4245 if (ret == 1)
4246 {
4247 Assert(Info.pNode);
4248 pGlobalModule = (PGMMSHAREDMODULE)Info.pNode;
4249 Log(("Found identical module at %RGv\n", pGlobalModule->Core.Key));
4250 }
4251 }
4252
4253 if (!pGlobalModule)
4254 {
4255 Assert(fNewModule);
4256 Assert(!pRecVM->fCollision);
4257
4258 pGlobalModule = (PGMMSHAREDMODULE)RTMemAllocZ(RT_OFFSETOF(GMMSHAREDMODULE, aRegions[cRegions]));
4259 if (!pGlobalModule)
4260 {
4261 AssertFailed();
4262 rc = VERR_NO_MEMORY;
4263 goto end;
4264 }
4265
4266 pGlobalModule->Core.Key = GCBaseAddr;
4267 pGlobalModule->cbModule = cbModule;
4268 /* Input limit already safe; no need to check again. */
4269 /** @todo replace with RTStrCopy */
4270 strcpy(pGlobalModule->szName, pszModuleName);
4271 strcpy(pGlobalModule->szVersion, pszVersion);
4272
4273 pGlobalModule->enmGuestOS = enmGuestOS;
4274 pGlobalModule->cRegions = cRegions;
4275
4276 for (unsigned i = 0; i < cRegions; i++)
4277 {
4278 Log(("New region %d base=%RGv size %x\n", i, pRegions[i].GCRegionAddr, pRegions[i].cbRegion));
4279 pGlobalModule->aRegions[i].GCRegionAddr = pRegions[i].GCRegionAddr;
4280 pGlobalModule->aRegions[i].cbRegion = RT_ALIGN_T(pRegions[i].cbRegion, PAGE_SIZE, uint32_t);
4281 pGlobalModule->aRegions[i].u32Alignment = 0;
4282 pGlobalModule->aRegions[i].paHCPhysPageID = NULL; /* uninitialized. */
4283 }
4284
4285 /* Save reference. */
4286 pRecVM->pGlobalModule = pGlobalModule;
4287 pRecVM->fCollision = false;
4288 pGlobalModule->cUsers++;
4289 rc = VINF_SUCCESS;
4290
4291 bool fInsert = RTAvlGCPtrInsert(&pGMM->pGlobalSharedModuleTree, &pGlobalModule->Core);
4292 Assert(fInsert); NOREF(fInsert);
4293
4294 Log(("GMMR0RegisterSharedModule: new global module %s\n", pszModuleName));
4295 }
4296 else
4297 {
4298 Assert(pGlobalModule->cUsers > 0);
4299
4300 /* Make sure the name and version are identical. */
4301 /** @todo replace with RTStrNCmp */
4302 if ( !strcmp(pGlobalModule->szName, pszModuleName)
4303 && !strcmp(pGlobalModule->szVersion, pszVersion))
4304 {
4305 /* Save reference. */
4306 pRecVM->pGlobalModule = pGlobalModule;
4307 if ( fNewModule
4308 || pRecVM->fCollision == true) /* colliding module unregistered and new one registered since the last check */
4309 {
4310 pGlobalModule->cUsers++;
4311 Log(("GMMR0RegisterSharedModule: using existing module %s cUser=%d!\n", pszModuleName, pGlobalModule->cUsers));
4312 }
4313 pRecVM->fCollision = false;
4314 rc = VINF_SUCCESS;
4315 }
4316 else
4317 {
4318 Log(("GMMR0RegisterSharedModule: module %s collision!\n", pszModuleName));
4319 pRecVM->fCollision = true;
4320 rc = VINF_PGM_SHARED_MODULE_COLLISION;
4321 goto end;
4322 }
4323 }
4324
4325 GMM_CHECK_SANITY_UPON_LEAVING(pGMM);
4326 }
4327 else
4328 rc = VERR_GMM_IS_NOT_SANE;
4329
4330end:
4331 gmmR0MutexRelease(pGMM);
4332 return rc;
4333#else
4334
4335 NOREF(pVM); NOREF(idCpu); NOREF(enmGuestOS); NOREF(pszModuleName); NOREF(pszVersion);
4336 NOREF(GCBaseAddr); NOREF(cbModule); NOREF(cRegions); NOREF(pRegions);
4337 return VERR_NOT_IMPLEMENTED;
4338#endif
4339}
4340
4341
4342/**
4343 * VMMR0 request wrapper for GMMR0RegisterSharedModule.
4344 *
4345 * @returns see GMMR0RegisterSharedModule.
4346 * @param pVM Pointer to the shared VM structure.
4347 * @param idCpu VCPU id
4348 * @param pReq The request packet.
4349 */
4350GMMR0DECL(int) GMMR0RegisterSharedModuleReq(PVM pVM, VMCPUID idCpu, PGMMREGISTERSHAREDMODULEREQ pReq)
4351{
4352 /*
4353 * Validate input and pass it on.
4354 */
4355 AssertPtrReturn(pVM, VERR_INVALID_POINTER);
4356 AssertPtrReturn(pReq, VERR_INVALID_POINTER);
4357 AssertMsgReturn(pReq->Hdr.cbReq >= sizeof(*pReq) && pReq->Hdr.cbReq == RT_UOFFSETOF(GMMREGISTERSHAREDMODULEREQ, aRegions[pReq->cRegions]), ("%#x != %#x\n", pReq->Hdr.cbReq, sizeof(*pReq)), VERR_INVALID_PARAMETER);
4358
4359 /* Pass back return code in the request packet to preserve informational codes. (VMMR3CallR0 chokes on them) */
4360 pReq->rc = GMMR0RegisterSharedModule(pVM, idCpu, pReq->enmGuestOS, pReq->szName, pReq->szVersion, pReq->GCBaseAddr, pReq->cbModule, pReq->cRegions, pReq->aRegions);
4361 return VINF_SUCCESS;
4362}
4363
4364
4365/**
4366 * Unregisters a shared module for the VM
4367 *
4368 * @returns VBox status code.
4369 * @param pVM VM handle
4370 * @param idCpu VCPU id
4371 * @param pszModuleName Module name
4372 * @param pszVersion Module version
4373 * @param GCBaseAddr Module base address
4374 * @param cbModule Module size
4375 */
4376GMMR0DECL(int) GMMR0UnregisterSharedModule(PVM pVM, VMCPUID idCpu, char *pszModuleName, char *pszVersion,
4377 RTGCPTR GCBaseAddr, uint32_t cbModule)
4378{
4379#ifdef VBOX_WITH_PAGE_SHARING
4380 /*
4381 * Validate input and get the basics.
4382 */
4383 PGMM pGMM;
4384 GMM_GET_VALID_INSTANCE(pGMM, VERR_GMM_INSTANCE);
4385 PGVM pGVM;
4386 int rc = GVMMR0ByVMAndEMT(pVM, idCpu, &pGVM);
4387 if (RT_FAILURE(rc))
4388 return rc;
4389
4390 Log(("GMMR0UnregisterSharedModule %s %s base=%RGv size %x\n", pszModuleName, pszVersion, GCBaseAddr, cbModule));
4391
4392 /*
4393 * Take the semaphore and do some more validations.
4394 */
4395 gmmR0MutexAcquire(pGMM);
4396 if (GMM_CHECK_SANITY_UPON_ENTERING(pGMM))
4397 {
4398 PGMMSHAREDMODULEPERVM pRecVM = (PGMMSHAREDMODULEPERVM)RTAvlGCPtrGet(&pGVM->gmm.s.pSharedModuleTree, GCBaseAddr);
4399 if (pRecVM)
4400 {
4401 /* Remove reference to global shared module. */
4402 if (!pRecVM->fCollision)
4403 {
4404 PGMMSHAREDMODULE pRec = pRecVM->pGlobalModule;
4405 Assert(pRec);
4406
4407 if (pRec) /* paranoia */
4408 {
4409 Assert(pRec->cUsers);
4410 pRec->cUsers--;
4411 if (pRec->cUsers == 0)
4412 {
4413 /* Free the ranges, but leave the pages intact as there might still be references; they will be cleared by the COW mechanism. */
4414 for (unsigned i = 0; i < pRec->cRegions; i++)
4415 if (pRec->aRegions[i].paHCPhysPageID)
4416 RTMemFree(pRec->aRegions[i].paHCPhysPageID);
4417
4418 Assert(pRec->Core.Key == GCBaseAddr || pRec->enmGuestOS == VBOXOSFAMILY_Windows64);
4419 Assert(pRec->cRegions == pRecVM->cRegions);
4420#ifdef VBOX_STRICT
4421 for (unsigned i = 0; i < pRecVM->cRegions; i++)
4422 {
4423 Assert(pRecVM->aRegions[i].GCRegionAddr == pRec->aRegions[i].GCRegionAddr);
4424 Assert(pRecVM->aRegions[i].cbRegion == pRec->aRegions[i].cbRegion);
4425 }
4426#endif
4427
4428 /* Remove from the tree and free memory. */
4429 RTAvlGCPtrRemove(&pGMM->pGlobalSharedModuleTree, pRec->Core.Key);
4430 RTMemFree(pRec);
4431 }
4432 }
4433 else
4434 rc = VERR_PGM_SHARED_MODULE_REGISTRATION_INCONSISTENCY;
4435 }
4436 else
4437 Assert(!pRecVM->pGlobalModule);
4438
4439 /* Remove from the tree and free memory. */
4440 RTAvlGCPtrRemove(&pGVM->gmm.s.pSharedModuleTree, GCBaseAddr);
4441 RTMemFree(pRecVM);
4442 }
4443 else
4444 rc = VERR_PGM_SHARED_MODULE_NOT_FOUND;
4445
4446 GMM_CHECK_SANITY_UPON_LEAVING(pGMM);
4447 }
4448 else
4449 rc = VERR_GMM_IS_NOT_SANE;
4450
4451 gmmR0MutexRelease(pGMM);
4452 return rc;
4453#else
4454
4455 NOREF(pVM); NOREF(idCpu); NOREF(pszModuleName); NOREF(pszVersion); NOREF(GCBaseAddr); NOREF(cbModule);
4456 return VERR_NOT_IMPLEMENTED;
4457#endif
4458}
4459
4460
4461/**
4462 * VMMR0 request wrapper for GMMR0UnregisterSharedModule.
4463 *
4464 * @returns see GMMR0UnregisterSharedModule.
4465 * @param pVM Pointer to the shared VM structure.
4466 * @param idCpu VCPU id
4467 * @param pReq The request packet.
4468 */
4469GMMR0DECL(int) GMMR0UnregisterSharedModuleReq(PVM pVM, VMCPUID idCpu, PGMMUNREGISTERSHAREDMODULEREQ pReq)
4470{
4471 /*
4472 * Validate input and pass it on.
4473 */
4474 AssertPtrReturn(pVM, VERR_INVALID_POINTER);
4475 AssertPtrReturn(pReq, VERR_INVALID_POINTER);
4476 AssertMsgReturn(pReq->Hdr.cbReq == sizeof(*pReq), ("%#x != %#x\n", pReq->Hdr.cbReq, sizeof(*pReq)), VERR_INVALID_PARAMETER);
4477
4478 return GMMR0UnregisterSharedModule(pVM, idCpu, pReq->szName, pReq->szVersion, pReq->GCBaseAddr, pReq->cbModule);
4479}
4480
4481#ifdef VBOX_WITH_PAGE_SHARING
4482
4483/**
4484 * Increase the use count of a shared page, the page is known to exist and be valid and such.
4485 *
4486 * @param pGMM Pointer to the GMM instance.
4487 * @param pGVM Pointer to the GVM instance.
4488 * @param pPage The page structure.
4489 */
4490DECLINLINE(void) gmmR0UseSharedPage(PGMM pGMM, PGVM pGVM, PGMMPAGE pPage)
4491{
4492 Assert(pGMM->cSharedPages > 0);
4493 Assert(pGMM->cAllocatedPages > 0);
4494
4495 pGMM->cDuplicatePages++;
4496
4497 pPage->Shared.cRefs++;
4498 pGVM->gmm.s.cSharedPages++;
4499 pGVM->gmm.s.Allocated.cBasePages++;
4500}
4501
4502
4503/**
4504 * Converts a private page to a shared page, the page is known to exist and be valid and such.
4505 *
4506 * @param pGMM Pointer to the GMM instance.
4507 * @param pGVM Pointer to the GVM instance.
4508 * @param HCPhys Host physical address
4509 * @param idPage The Page ID
4510 * @param pPage The page structure.
4511 */
4512DECLINLINE(void) gmmR0ConvertToSharedPage(PGMM pGMM, PGVM pGVM, RTHCPHYS HCPhys, uint32_t idPage, PGMMPAGE pPage)
4513{
4514 PGMMCHUNK pChunk = gmmR0GetChunk(pGMM, idPage >> GMM_CHUNKID_SHIFT);
4515 Assert(pChunk);
4516 Assert(pChunk->cFree < GMM_CHUNK_NUM_PAGES);
4517 Assert(GMM_PAGE_IS_PRIVATE(pPage));
4518
4519 pChunk->cPrivate--;
4520 pChunk->cShared++;
4521
4522 pGMM->cSharedPages++;
4523
4524 pGVM->gmm.s.cSharedPages++;
4525 pGVM->gmm.s.cPrivatePages--;
4526
4527 /* Modify the page structure. */
4528 pPage->Shared.pfn = (uint32_t)(uint64_t)(HCPhys >> PAGE_SHIFT);
4529 pPage->Shared.cRefs = 1;
4530 pPage->Common.u2State = GMM_PAGE_STATE_SHARED;
4531}
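
/*
 * For reference, the Shared.pfn field set above stores the host page frame
 * number, so the host physical address can be reconstructed the same way the
 * code below does it:
 *
 * @code
 *      RTHCPHYS HCPhysShared = (RTHCPHYS)pPage->Shared.pfn << PAGE_SHIFT;
 * @endcode
 */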
4532
4533
4534/**
4535 * Checks a page of the specified shared module for changes.
4536 *
4537 * Performs the following tasks:
4538 * - If a shared page is new, then it changes the GMM page type to shared and
4539 * returns it in the pPageDesc descriptor.
4540 * - If a shared page already exists, then it checks if the VM page is
4541 * identical and if so frees the VM page and returns the shared page in
4542 * pPageDesc descriptor.
4543 *
4544 * @remarks ASSUMES the caller has acquired the GMM semaphore!!
4545 *
4546 * @returns VBox status code.
4548 * @param pGVM Pointer to the GVM instance data.
4549 * @param pModule Module description
4550 * @param idxRegion Region index
4551 * @param idxPage Page index
4552 * @param pPageDesc Page descriptor
4553 */
4554GMMR0DECL(int) GMMR0SharedModuleCheckPage(PGVM pGVM, PGMMSHAREDMODULE pModule, unsigned idxRegion, unsigned idxPage,
4555 PGMMSHAREDPAGEDESC pPageDesc)
4556{
4557 int rc = VINF_SUCCESS;
4558 PGMM pGMM;
4559 GMM_GET_VALID_INSTANCE(pGMM, VERR_GMM_INSTANCE);
4560 unsigned cPages = pModule->aRegions[idxRegion].cbRegion >> PAGE_SHIFT;
4561
4562 AssertReturn(idxRegion < pModule->cRegions, VERR_INVALID_PARAMETER);
4563 AssertReturn(idxPage < cPages, VERR_INVALID_PARAMETER);
4564
4565    LogFlow(("GMMR0SharedModuleCheckPage %s base %RGv region %d idxPage %d\n", pModule->szName, pModule->Core.Key, idxRegion, idxPage));
4566
4567 PGMMSHAREDREGIONDESC pGlobalRegion = &pModule->aRegions[idxRegion];
4568 if (!pGlobalRegion->paHCPhysPageID)
4569 {
4570 /* First time; create a page descriptor array. */
4571 Log(("Allocate page descriptor array for %d pages\n", cPages));
4572 pGlobalRegion->paHCPhysPageID = (uint32_t *)RTMemAlloc(cPages * sizeof(*pGlobalRegion->paHCPhysPageID));
4573 if (!pGlobalRegion->paHCPhysPageID)
4574 {
4575 AssertFailed();
4576 rc = VERR_NO_MEMORY;
4577 goto end;
4578 }
4579 /* Invalidate all descriptors. */
4580 for (unsigned i = 0; i < cPages; i++)
4581 pGlobalRegion->paHCPhysPageID[i] = NIL_GMM_PAGEID;
4582 }
4583
4584 /* We've seen this shared page for the first time? */
4585 if (pGlobalRegion->paHCPhysPageID[idxPage] == NIL_GMM_PAGEID)
4586 {
4587new_shared_page:
4588 Log(("New shared page guest %RGp host %RHp\n", pPageDesc->GCPhys, pPageDesc->HCPhys));
4589
4590 /* Easy case: just change the internal page type. */
4591 PGMMPAGE pPage = gmmR0GetPage(pGMM, pPageDesc->uHCPhysPageId);
4592 if (!pPage)
4593 {
4594 Log(("GMMR0SharedModuleCheckPage: Invalid idPage=%#x #1 (GCPhys=%RGp HCPhys=%RHp idxRegion=%#x idxPage=%#x)\n",
4595 pPageDesc->uHCPhysPageId, pPageDesc->GCPhys, pPageDesc->HCPhys, idxRegion, idxPage));
4596 AssertFailed();
4597 rc = VERR_PGM_PHYS_INVALID_PAGE_ID;
4598 goto end;
4599 }
4600
4601        AssertMsg(pPageDesc->GCPhys == (pPage->Private.pfn << 12), ("desc %RGp gmm %RGp\n", pPageDesc->GCPhys, (pPage->Private.pfn << 12)));
4602
4603 gmmR0ConvertToSharedPage(pGMM, pGVM, pPageDesc->HCPhys, pPageDesc->uHCPhysPageId, pPage);
4604
4605 /* Keep track of these references. */
4606 pGlobalRegion->paHCPhysPageID[idxPage] = pPageDesc->uHCPhysPageId;
4607 }
4608 else
4609 {
4610 uint8_t *pbLocalPage, *pbSharedPage;
4611 uint8_t *pbChunk;
4612 PGMMCHUNK pChunk;
4613
4614 Assert(pPageDesc->uHCPhysPageId != pGlobalRegion->paHCPhysPageID[idxPage]);
4615
4616 Log(("Replace existing page guest %RGp host %RHp id %x -> id %x\n", pPageDesc->GCPhys, pPageDesc->HCPhys, pPageDesc->uHCPhysPageId, pGlobalRegion->paHCPhysPageID[idxPage]));
4617
4618 /* Get the shared page source. */
4619 PGMMPAGE pPage = gmmR0GetPage(pGMM, pGlobalRegion->paHCPhysPageID[idxPage]);
4620 if (!pPage)
4621 {
4622 Log(("GMMR0SharedModuleCheckPage: Invalid idPage=%#x #2 (idxRegion=%#x idxPage=%#x)\n",
4623 pPageDesc->uHCPhysPageId, idxRegion, idxPage));
4624 AssertFailed();
4625 rc = VERR_PGM_PHYS_INVALID_PAGE_ID;
4626 goto end;
4627 }
4628 if (pPage->Common.u2State != GMM_PAGE_STATE_SHARED)
4629 {
4630 /* Page was freed at some point; invalidate this entry. */
4631 /** @todo this isn't really bullet proof. */
4632 Log(("Old shared page was freed -> create a new one\n"));
4633 pGlobalRegion->paHCPhysPageID[idxPage] = NIL_GMM_PAGEID;
4634 goto new_shared_page; /* ugly goto */
4635 }
4636
4637        Log(("Replace existing page: host %RHp -> %RHp\n", pPageDesc->HCPhys, ((uint64_t)pPage->Shared.pfn) << PAGE_SHIFT));
4638
4639 /* Calculate the virtual address of the local page. */
4640 pChunk = gmmR0GetChunk(pGMM, pPageDesc->uHCPhysPageId >> GMM_CHUNKID_SHIFT);
4641 if (pChunk)
4642 {
4643 if (!gmmR0IsChunkMapped(pGMM, pGVM, pChunk, (PRTR3PTR)&pbChunk))
4644 {
4645 Log(("GMMR0SharedModuleCheckPage: Invalid idPage=%#x #3\n", pPageDesc->uHCPhysPageId));
4646 AssertFailed();
4647 rc = VERR_PGM_PHYS_INVALID_PAGE_ID;
4648 goto end;
4649 }
4650 pbLocalPage = pbChunk + ((pPageDesc->uHCPhysPageId & GMM_PAGEID_IDX_MASK) << PAGE_SHIFT);
4651 }
4652 else
4653 {
4654 Log(("GMMR0SharedModuleCheckPage: Invalid idPage=%#x #4\n", pPageDesc->uHCPhysPageId));
4655 AssertFailed();
4656 rc = VERR_PGM_PHYS_INVALID_PAGE_ID;
4657 goto end;
4658 }
4659
4660 /* Calculate the virtual address of the shared page. */
4661 pChunk = gmmR0GetChunk(pGMM, pGlobalRegion->paHCPhysPageID[idxPage] >> GMM_CHUNKID_SHIFT);
4662 Assert(pChunk); /* can't fail as gmmR0GetPage succeeded. */
4663
4664 /* Get the virtual address of the physical page; map the chunk into the VM process if not already done. */
4665 if (!gmmR0IsChunkMapped(pGMM, pGVM, pChunk, (PRTR3PTR)&pbChunk))
4666 {
4667 Log(("Map chunk into process!\n"));
4668 rc = gmmR0MapChunk(pGMM, pGVM, pChunk, false /*fRelaxedSem*/, (PRTR3PTR)&pbChunk);
4669 if (rc != VINF_SUCCESS)
4670 {
4671 AssertRC(rc);
4672 goto end;
4673 }
4674 }
4675 pbSharedPage = pbChunk + ((pGlobalRegion->paHCPhysPageID[idxPage] & GMM_PAGEID_IDX_MASK) << PAGE_SHIFT);
4676
4677 /** @todo write ASMMemComparePage. */
4678 if (memcmp(pbSharedPage, pbLocalPage, PAGE_SIZE))
4679 {
4680 Log(("Unexpected differences found between local and shared page; skip\n"));
4681 /* Signal to the caller that this one hasn't changed. */
4682 pPageDesc->uHCPhysPageId = NIL_GMM_PAGEID;
4683 goto end;
4684 }
4685
4686 /* Free the old local page. */
4687 GMMFREEPAGEDESC PageDesc;
4688
4689 PageDesc.idPage = pPageDesc->uHCPhysPageId;
4690 rc = gmmR0FreePages(pGMM, pGVM, 1, &PageDesc, GMMACCOUNT_BASE);
4691 AssertRCReturn(rc, rc);
4692
4693 gmmR0UseSharedPage(pGMM, pGVM, pPage);
4694
4695 /* Pass along the new physical address & page id. */
4696 pPageDesc->HCPhys = ((uint64_t)pPage->Shared.pfn) << PAGE_SHIFT;
4697 pPageDesc->uHCPhysPageId = pGlobalRegion->paHCPhysPageID[idxPage];
4698 }
4699end:
4700 return rc;
4701}
4702
4703
4704/**
4705 * RTAvlGCPtrDestroy callback.
4706 *
4707 * @returns 0 or VERR_GMM_INSTANCE.
4708 * @param pNode The node to destroy.
4709 * @param pvGVM The GVM handle.
4710 */
4711static DECLCALLBACK(int) gmmR0CleanupSharedModule(PAVLGCPTRNODECORE pNode, void *pvGVM)
4712{
4713 PGMMSHAREDMODULEPERVM pRecVM = (PGMMSHAREDMODULEPERVM)pNode;
4714 NOREF(pvGVM);
4715
4716 Assert(pRecVM->pGlobalModule || pRecVM->fCollision);
4717 if (pRecVM->pGlobalModule)
4718 {
4719 PGMMSHAREDMODULE pRec = pRecVM->pGlobalModule;
4720 AssertPtr(pRec);
4721 Assert(pRec->cUsers);
4722
4723 Log(("gmmR0CleanupSharedModule: %s %s cUsers=%d\n", pRec->szName, pRec->szVersion, pRec->cUsers));
4724 pRec->cUsers--;
4725 if (pRec->cUsers == 0)
4726 {
4727 for (uint32_t i = 0; i < pRec->cRegions; i++)
4728 if (pRec->aRegions[i].paHCPhysPageID)
4729 RTMemFree(pRec->aRegions[i].paHCPhysPageID);
4730
4731 /* Remove from the tree and free memory. */
4732 PGMM pGMM;
4733 GMM_GET_VALID_INSTANCE(pGMM, VERR_GMM_INSTANCE);
4734 RTAvlGCPtrRemove(&pGMM->pGlobalSharedModuleTree, pRec->Core.Key);
4735 RTMemFree(pRec);
4736 }
4737 }
4738 RTMemFree(pRecVM);
4739 return 0;
4740}
4741
4742
4743/**
4744 * Used by GMMR0CleanupVM to clean up shared modules.
4745 *
4746 * This is called without taking the GMM lock so that it can be yielded as
4747 * needed here.
4748 *
4749 * @param pGMM The GMM handle.
4750 * @param pGVM The global VM handle.
4751 */
4752static void gmmR0SharedModuleCleanup(PGMM pGMM, PGVM pGVM)
4753{
4754 gmmR0MutexAcquire(pGMM);
4755 GMM_CHECK_SANITY_UPON_ENTERING(pGMM);
4756
4757 RTAvlGCPtrDestroy(&pGVM->gmm.s.pSharedModuleTree, gmmR0CleanupSharedModule, pGVM);
4758
4759 gmmR0MutexRelease(pGMM);
4760}
4761
4762#endif /* VBOX_WITH_PAGE_SHARING */
4763
4764/**
4765 * Removes all shared modules for the specified VM
4766 *
4767 * @returns VBox status code.
4768 * @param pVM VM handle
4769 * @param idCpu VCPU id
4770 */
4771GMMR0DECL(int) GMMR0ResetSharedModules(PVM pVM, VMCPUID idCpu)
4772{
4773#ifdef VBOX_WITH_PAGE_SHARING
4774 /*
4775 * Validate input and get the basics.
4776 */
4777 PGMM pGMM;
4778 GMM_GET_VALID_INSTANCE(pGMM, VERR_GMM_INSTANCE);
4779 PGVM pGVM;
4780 int rc = GVMMR0ByVMAndEMT(pVM, idCpu, &pGVM);
4781 if (RT_FAILURE(rc))
4782 return rc;
4783
4784 /*
4785 * Take the semaphore and do some more validations.
4786 */
4787 gmmR0MutexAcquire(pGMM);
4788 if (GMM_CHECK_SANITY_UPON_ENTERING(pGMM))
4789 {
4790 Log(("GMMR0ResetSharedModules\n"));
4791 RTAvlGCPtrDestroy(&pGVM->gmm.s.pSharedModuleTree, gmmR0CleanupSharedModule, pGVM);
4792
4793 rc = VINF_SUCCESS;
4794 GMM_CHECK_SANITY_UPON_LEAVING(pGMM);
4795 }
4796 else
4797 rc = VERR_GMM_IS_NOT_SANE;
4798
4799 gmmR0MutexRelease(pGMM);
4800 return rc;
4801#else
4802 NOREF(pVM); NOREF(idCpu);
4803 return VERR_NOT_IMPLEMENTED;
4804#endif
4805}
4806
4807#ifdef VBOX_WITH_PAGE_SHARING
4808
4809typedef struct
4810{
4811 PGVM pGVM;
4812 VMCPUID idCpu;
4813 int rc;
4814} GMMCHECKSHAREDMODULEINFO, *PGMMCHECKSHAREDMODULEINFO;
4815
4816/**
4817 * Tree enumeration callback for checking a shared module.
4818 */
4819DECLCALLBACK(int) gmmR0CheckSharedModule(PAVLGCPTRNODECORE pNode, void *pvUser)
4820{
4821 PGMMCHECKSHAREDMODULEINFO pInfo = (PGMMCHECKSHAREDMODULEINFO)pvUser;
4822 PGMMSHAREDMODULEPERVM pLocalModule = (PGMMSHAREDMODULEPERVM)pNode;
4823 PGMMSHAREDMODULE pGlobalModule = pLocalModule->pGlobalModule;
4824
4825 if ( !pLocalModule->fCollision
4826 && pGlobalModule)
4827 {
4828 Log(("gmmR0CheckSharedModule: check %s %s base=%RGv size=%x collision=%d\n", pGlobalModule->szName, pGlobalModule->szVersion, pGlobalModule->Core.Key, pGlobalModule->cbModule, pLocalModule->fCollision));
4829 pInfo->rc = PGMR0SharedModuleCheck(pInfo->pGVM->pVM, pInfo->pGVM, pInfo->idCpu, pGlobalModule, pLocalModule->cRegions, pLocalModule->aRegions);
4830 if (RT_FAILURE(pInfo->rc))
4831 return 1; /* stop enumeration. */
4832 }
4833 return 0;
4834}
4835
4836#endif /* VBOX_WITH_PAGE_SHARING */
4837#ifdef DEBUG_sandervl
4838
4839/**
4840 * Setup for a GMMR0CheckSharedModules call (to allow log flush jumps back to ring 3)
4841 *
4842 * @returns VBox status code.
4843 * @param pVM VM handle
4844 */
4845GMMR0DECL(int) GMMR0CheckSharedModulesStart(PVM pVM)
4846{
4847 /*
4848 * Validate input and get the basics.
4849 */
4850 PGMM pGMM;
4851 GMM_GET_VALID_INSTANCE(pGMM, VERR_GMM_INSTANCE);
4852
4853 /*
4854 * Take the semaphore and do some more validations.
4855 */
4856 gmmR0MutexAcquire(pGMM);
    int rc;
4857    if (!GMM_CHECK_SANITY_UPON_ENTERING(pGMM))
4858        rc = VERR_GMM_IS_NOT_SANE;
4859    else
4860        rc = VINF_SUCCESS;
4861
4862 return rc;
4863}
4864
4865/**
4866 * Clean up after a GMMR0CheckSharedModules call (to allow log flush jumps back to ring 3)
4867 *
4868 * @returns VBox status code.
4869 * @param pVM VM handle
4870 */
4871GMMR0DECL(int) GMMR0CheckSharedModulesEnd(PVM pVM)
4872{
4873 /*
4874 * Validate input and get the basics.
4875 */
4876 PGMM pGMM;
4877 GMM_GET_VALID_INSTANCE(pGMM, VERR_GMM_INSTANCE);
4878
4879 gmmR0MutexRelease(pGMM);
4880 return VINF_SUCCESS;
4881}
4882
4883#endif /* DEBUG_sandervl */
4884
4885/**
4886 * Check all shared modules for the specified VM
4887 *
4888 * @returns VBox status code.
4889 * @param pVM VM handle
4890 * @param pVCpu VMCPU handle
4891 */
4892GMMR0DECL(int) GMMR0CheckSharedModules(PVM pVM, PVMCPU pVCpu)
4893{
4894#ifdef VBOX_WITH_PAGE_SHARING
4895 /*
4896 * Validate input and get the basics.
4897 */
4898 PGMM pGMM;
4899 GMM_GET_VALID_INSTANCE(pGMM, VERR_GMM_INSTANCE);
4900 PGVM pGVM;
4901 int rc = GVMMR0ByVMAndEMT(pVM, pVCpu->idCpu, &pGVM);
4902 if (RT_FAILURE(rc))
4903 return rc;
4904
4905# ifndef DEBUG_sandervl
4906 /*
4907 * Take the semaphore and do some more validations.
4908 */
4909 gmmR0MutexAcquire(pGMM);
4910# endif
4911 if (GMM_CHECK_SANITY_UPON_ENTERING(pGMM))
4912 {
4913 GMMCHECKSHAREDMODULEINFO Info;
4914
4915 Log(("GMMR0CheckSharedModules\n"));
4916 Info.pGVM = pGVM;
4917 Info.idCpu = pVCpu->idCpu;
4918 Info.rc = VINF_SUCCESS;
4919
4920 RTAvlGCPtrDoWithAll(&pGVM->gmm.s.pSharedModuleTree, true /* fFromLeft */, gmmR0CheckSharedModule, &Info);
4921
4922 rc = Info.rc;
4923
4924 Log(("GMMR0CheckSharedModules done!\n"));
4925
4926 GMM_CHECK_SANITY_UPON_LEAVING(pGMM);
4927 }
4928 else
4929 rc = VERR_GMM_IS_NOT_SANE;
4930
4931# ifndef DEBUG_sandervl
4932 gmmR0MutexRelease(pGMM);
4933# endif
4934 return rc;
4935#else
4936 NOREF(pVM); NOREF(pVCpu);
4937 return VERR_NOT_IMPLEMENTED;
4938#endif
4939}
4940
4941#if defined(VBOX_STRICT) && HC_ARCH_BITS == 64
4942
4943typedef struct
4944{
4945 PGVM pGVM;
4946 PGMM pGMM;
4947 uint8_t *pSourcePage;
4948 bool fFoundDuplicate;
4949} GMMFINDDUPPAGEINFO, *PGMMFINDDUPPAGEINFO;
4950
4951/**
4952 * RTAvlU32DoWithAll callback.
4953 *
4954 * @returns 0 to continue the enumeration, non-zero (true) to stop it.
4955 * @param pNode The node to search.
4956 * @param pvInfo Pointer to the input parameters
4957 */
4958static DECLCALLBACK(int) gmmR0FindDupPageInChunk(PAVLU32NODECORE pNode, void *pvInfo)
4959{
4960 PGMMCHUNK pChunk = (PGMMCHUNK)pNode;
4961 PGMMFINDDUPPAGEINFO pInfo = (PGMMFINDDUPPAGEINFO)pvInfo;
4962 PGVM pGVM = pInfo->pGVM;
4963 PGMM pGMM = pInfo->pGMM;
4964 uint8_t *pbChunk;
4965
4966 /* Only take chunks not mapped into this VM process; not entirely correct. */
4967 if (!gmmR0IsChunkMapped(pGMM, pGVM, pChunk, (PRTR3PTR)&pbChunk))
4968 {
4969 int rc = gmmR0MapChunk(pGMM, pGVM, pChunk, false /*fRelaxedSem*/, (PRTR3PTR)&pbChunk);
4970 if (RT_SUCCESS(rc))
4971 {
4972 /*
4973 * Look for duplicate pages
4974 */
4975 unsigned iPage = (GMM_CHUNK_SIZE >> PAGE_SHIFT);
4976 while (iPage-- > 0)
4977 {
4978 if (GMM_PAGE_IS_PRIVATE(&pChunk->aPages[iPage]))
4979 {
4980 uint8_t *pbDestPage = pbChunk + (iPage << PAGE_SHIFT);
4981
4982 if (!memcmp(pInfo->pSourcePage, pbDestPage, PAGE_SIZE))
4983 {
4984 pInfo->fFoundDuplicate = true;
4985 break;
4986 }
4987 }
4988 }
4989 gmmR0UnmapChunk(pGMM, pGVM, pChunk, false /*fRelaxedSem*/);
4990 }
4991 }
4992 return pInfo->fFoundDuplicate; /* (stops search if true) */
4993}
4994
4995
4996/**
4997 * Find a duplicate of the specified page in other active VMs
4998 *
4999 * @returns VBox status code.
5000 * @param pVM VM handle
5001 * @param pReq Request packet
5002 */
5003GMMR0DECL(int) GMMR0FindDuplicatePageReq(PVM pVM, PGMMFINDDUPLICATEPAGEREQ pReq)
5004{
5005 /*
5006 * Validate input and pass it on.
5007 */
5008 AssertPtrReturn(pVM, VERR_INVALID_POINTER);
5009 AssertPtrReturn(pReq, VERR_INVALID_POINTER);
5010 AssertMsgReturn(pReq->Hdr.cbReq == sizeof(*pReq), ("%#x != %#x\n", pReq->Hdr.cbReq, sizeof(*pReq)), VERR_INVALID_PARAMETER);
5011
5012 PGMM pGMM;
5013 GMM_GET_VALID_INSTANCE(pGMM, VERR_GMM_INSTANCE);
5014
5015 PGVM pGVM;
5016 int rc = GVMMR0ByVM(pVM, &pGVM);
5017 if (RT_FAILURE(rc))
5018 return rc;
5019
5020 /*
5021 * Take the semaphore and do some more validations.
5022 */
5023 rc = gmmR0MutexAcquire(pGMM);
5024 if (GMM_CHECK_SANITY_UPON_ENTERING(pGMM))
5025 {
5026 uint8_t *pbChunk;
5027 PGMMCHUNK pChunk = gmmR0GetChunk(pGMM, pReq->idPage >> GMM_CHUNKID_SHIFT);
5028 if (pChunk)
5029 {
5030 if (gmmR0IsChunkMapped(pGMM, pGVM, pChunk, (PRTR3PTR)&pbChunk))
5031 {
5032 uint8_t *pbSourcePage = pbChunk + ((pReq->idPage & GMM_PAGEID_IDX_MASK) << PAGE_SHIFT);
5033 PGMMPAGE pPage = gmmR0GetPage(pGMM, pReq->idPage);
5034 if (pPage)
5035 {
5036 GMMFINDDUPPAGEINFO Info;
5037 Info.pGVM = pGVM;
5038 Info.pGMM = pGMM;
5039 Info.pSourcePage = pbSourcePage;
5040 Info.fFoundDuplicate = false;
5041 RTAvlU32DoWithAll(&pGMM->pChunks, true /* fFromLeft */, gmmR0FindDupPageInChunk, &Info);
5042
5043 pReq->fDuplicate = Info.fFoundDuplicate;
5044 }
5045 else
5046 {
5047 AssertFailed();
5048 rc = VERR_PGM_PHYS_INVALID_PAGE_ID;
5049 }
5050 }
5051 else
5052 AssertFailed();
5053 }
5054 else
5055 AssertFailed();
5056 }
5057 else
5058 rc = VERR_GMM_IS_NOT_SANE;
5059
5060 gmmR0MutexRelease(pGMM);
5061 return rc;
5062}
5063
5064#endif /* VBOX_STRICT && HC_ARCH_BITS == 64 */
5065