A forum for reverse engineering, OS internals and malware analysis 

Forum for discussion about kernel-mode development.
 #19741  by myid
 Sun Jun 23, 2013 11:50 am
Hi, everyone.

There is a ring3 address, for example, the address of ShellAboutW: 0x000007fe`fe7191b8

I used the VirtualProtect function to set the memory protection to PAGE_EXECUTE_READWRITE.

I use the "!pte" command in WinDbg to get the PXE/PPE/PDE/PTE, but why is the value of its PTE zero?
lkd> !pte 000007fe`fe7191b8
VA 000007fefe7191b8
PXE @ FFFFF6FB7DBED078     PPE at FFFFF6FB7DA0FFD8    PDE at FFFFF6FB41FFBF98    PTE at FFFFF683FF7F38C8
contains 008000010D27B867  contains 06E00000CA34D867  contains 17F00000C5B5A867  contains 0000000000000000
pfn 10d27b    ---DA--UWEV  pfn ca34d     ---DA--UWEV  pfn c5b5a     ---DA--UWEV
 #19752  by myid
 Sun Jun 23, 2013 7:25 pm
I can use WINDBG to get some information:
lkd> !vtop ce3be000 000007FEFE4291B8
Amd64VtoP: Virt 000007fe`fe4291b8, pagedir ce3be000
Amd64VtoP: PML4E ce3be078
Amd64VtoP: PDPE cebbbfd8
Amd64VtoP: PDE c535ff90
Amd64VtoP: PTE c46c4148
Amd64VtoP: Mapped phys dceeb1b8
Virtual address 7fefe4291b8 translates to physical address dceeb1b8.
And I can use !db and !eb to read/write physical addresses directly.

My questions:
1. How do I calculate the physical address of the PTE for ring3 memory (process memory) from the virtual address and the DirBase value?
2. How can I read/write a physical address directly from code?
 #19764  by feryno
 Mon Jun 24, 2013 1:08 pm
Hi,
the answer to your second question is shorter:
you can't access physical memory directly, you must map it somewhere first:
MmMapIoSpace

the answer to your first question is a long story

Paging (virtual memory translation) is described in CPU manuals http://www.amd.com http://www.intel.com

The x64 version of ms win runs in long mode (AMD name) = IA-32e mode (Intel name), so CR0.PE=1, CR0.PG=1, CR4.PAE=1, EFER.LME=1, EFER.LMA=1

Page translation in long mode uses 2, 3 or 4 levels of translation tables:
2 levels (1 GB pages) are maybe not yet used by ms win; I have seen only a few CPUs with this feature, e.g. AMD Bulldozer
3 levels (2 MB pages) are used sometimes, e.g. about 4-6 pages of 2 MB map ntoskrnl.exe, hal.dll and maybe a third kernel file I can't recall right now (or maybe only those 2); also, a contiguous run of a few 2 MB pages can be mapped with MmMapIoSpace when the alignment-related input parameters are multiples of 2 MB
4 levels (4 kB pages) are used for almost everything running under ms win x64

When the CPU translates a virtual address (VA) into a physical address (PA), it starts from CR3 and the VA.
CR3 is specific to each process, so several processes may use the same VA without collision (they end up at different PAs because their CR3 values differ). The same PA may be mapped into several processes (or into every process) at an identical VA (shared memory, the kernel, etc.). The same PA may also be mapped into several processes at different VAs.

The VA must be in canonical form (bits 63-47 must be identical, i.e. all 0 or all 1).
CR3 contains a pointer to the base of the translation tables (the base is a PA). This base is the PML4; on x64 the PML4 contains 512 entries of 1 qword (8 bytes) each, which gives 4 kB (1 page). The PML4 must be 4 kB aligned, because only bits 51-12 of CR3 are used as the pointer to the PML4; some of the low bits (bit 3 = PWT, bit 4 = PCD) select how the PML4 itself is cached.

so the first formula:
PML4 = CR3 and 000FFFFFFFFFF000h
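A minimal C sketch of this first step, assuming CR4.PCIDE=0 (with PCIDs enabled, bits 11-0 of CR3 hold the PCID instead). The helper names are illustrative only; the sample value ce3be000 is the DirBase from the !vtop output earlier in the thread.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Bits 51-12 of CR3 hold the physical base of the PML4. */
#define CR3_PML4_MASK 0x000FFFFFFFFFF000ULL
#define CR3_PWT (1ULL << 3) /* write-through for accesses to the PML4 */
#define CR3_PCD (1ULL << 4) /* cache disable for accesses to the PML4 */

/* Hypothetical helper: PML4 = CR3 and 000FFFFFFFFFF000h */
static inline uint64_t cr3_to_pml4(uint64_t cr3)
{
    uint64_t pml4 = cr3 & CR3_PML4_MASK;
    assert((pml4 & 0xFFF) == 0); /* the mask guarantees 4 kB alignment */
    return pml4;
}

static inline bool cr3_pwt(uint64_t cr3) { return (cr3 & CR3_PWT) != 0; }
static inline bool cr3_pcd(uint64_t cr3) { return (cr3 & CR3_PCD) != 0; }
```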

The PML4 is a PA, and ms win x64 always has it mapped at FFFFF6FB7DBED000 (VA)

As the CPU continues the page walk, it reads 1 qword from the PML4; this qword is called the PML4E (PML4 Entry):
PML4E = qword [PML4 + ((VA shr 39) and 1FFh) * 8]
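The same formula in C, as a sketch with hypothetical helper names. Plugging in the DirBase and VA from the !vtop output earlier in the thread reproduces its "PML4E ce3be078" line: (000007FEFE4291B8 shr 39) and 1FFh = Fh, and Fh * 8 = 78h.

```c
#include <assert.h>
#include <stdint.h>

#define TABLE_BASE_MASK 0x000FFFFFFFFFF000ULL

/* VA bits 47-39 select one of the 512 PML4 entries. */
static inline unsigned pml4_index(uint64_t va)
{
    return (unsigned)((va >> 39) & 0x1FF);
}

/* Physical address the CPU reads the PML4E from:
   (CR3 and 000FFFFFFFFFF000h) + index * 8 */
static inline uint64_t pml4e_phys_addr(uint64_t cr3, uint64_t va)
{
    return (cr3 & TABLE_BASE_MASK) + (uint64_t)pml4_index(va) * 8;
}
```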

Now CPU got PML4E.
If bit 0 of the PML4E is 1 (this bit is called Present, P), then bits 51-12 of the PML4E contain a pointer to the next level of the translation; the base of that next level is called the PDP, and it is again a PA (not a VA). If bit 0 of the PML4E is 0, the CPU does not continue the translation and raises a page fault (#PF). The OS catches the #PF in its exception handler and may then decide to map the missing page or reload it from the swap file, terminate the ring3 application, deliver the exception to a debugger, or bugcheck (BSOD)...

Bit 0 of PML4E is 1 so go on.

PDP = PML4E and 000FFFFFFFFFF000h
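The mask matters because bits 63-52 of an entry are not part of the address: bit 63 is the execute-disable (NX) bit when EFER.NXE=1, and bits 62-52 are normally ignored by the CPU and available to the OS. A small sketch (hypothetical helper names) that applies the mask to the three non-zero entries from the !pte output in the first post recovers exactly the pfn values WinDbg printed; the fourth entry, the PTE, is 0, so its P bit is clear.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

#define ENTRY_ADDR_MASK 0x000FFFFFFFFFF000ULL /* bits 51-12 */
#define ENTRY_P         (1ULL << 0)           /* Present */
#define ENTRY_NX        (1ULL << 63)          /* execute-disable (EFER.NXE=1) */

static inline bool entry_present(uint64_t e) { return (e & ENTRY_P) != 0; }
static inline bool entry_no_execute(uint64_t e) { return (e & ENTRY_NX) != 0; }

/* Physical base of the next-level table (or of the page itself for a 4 kB PTE). */
static inline uint64_t entry_base(uint64_t e) { return e & ENTRY_ADDR_MASK; }

/* Page frame number, as WinDbg's !pte prints it ("pfn ..."). */
static inline uint64_t entry_pfn(uint64_t e) { return entry_base(e) >> 12; }
```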

The PDP points to a table similar to the one at the previous level (512 entries of 1 qword each, occupying one 4 kB page)

Now the CPU continues the page walk and extracts the PDPE (PDP Entry) with a formula like:
PDPE = qword [PDP + ((VA shr 30) and 1FFh) * 8]

If bit 0 of the PDPE is 0, the CPU can't continue the translation (page fault, as before).
If it is 1, the CPU knows the entry contains a valid pointer and continues.
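Because the next table base comes from the entry itself, each step can be written as a function of the previous entry. A sketch for this step with hypothetical names: the !vtop output earlier in the thread shows "PDPE cebbbfd8", and for the VA used there the index is (000007FEFE4291B8 shr 30) and 1FFh = 1FBh, so the offset is 1FBh * 8 = FD8h. The PML4E value in the test is hypothetical; only its base cebbb000 is implied by that output.

```c
#include <assert.h>
#include <stdint.h>

#define TABLE_BASE_MASK 0x000FFFFFFFFFF000ULL

/* VA bits 38-30 select one of the 512 PDP entries. */
static inline unsigned pdp_index(uint64_t va)
{
    return (unsigned)((va >> 30) & 0x1FF);
}

/* PDPE address = (PML4E and 000FFFFFFFFFF000h) + index * 8.
   Only meaningful for a present PML4E (bit 0 set). */
static inline uint64_t pdpe_phys_addr(uint64_t pml4e, uint64_t va)
{
    assert(pml4e & 1);
    return (pml4e & TABLE_BASE_MASK) + (uint64_t)pdp_index(va) * 8;
}
```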

Newer CPUs (Bulldozer certainly, maybe some Xeons?) may finish the translation here if the PDPE.PS bit (bit 7 of the PDPE) is set to 1 (2-level translation mapping a 1 GB page, useful for some data servers, databases, ...).
For 1 GB paging, the rest of the VA (bits 29-0) is the offset within the 1 GB page:
PA = (PDPE and 000FFFFFC0000000h) + (VA and 3FFFFFFFh)
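A sketch of the 1 GB case (hypothetical names and entry values). The entry, not the table base, supplies the page address, and the mask 000FFFFFC0000000h also drops bit 12, which in a large-page entry is the PAT bit rather than an address bit.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

#define ENTRY_P       (1ULL << 0)
#define ENTRY_PS      (1ULL << 7)              /* page size */
#define PAGE_1GB_MASK 0x000FFFFFC0000000ULL    /* bits 51-30 */

static inline bool pdpe_maps_1gb(uint64_t pdpe)
{
    return (pdpe & ENTRY_P) && (pdpe & ENTRY_PS);
}

/* PA = (PDPE and 000FFFFFC0000000h) + (VA and 3FFFFFFFh) */
static inline uint64_t pa_from_1gb_pdpe(uint64_t pdpe, uint64_t va)
{
    assert(pdpe_maps_1gb(pdpe));
    return (pdpe & PAGE_1GB_MASK) + (va & 0x3FFFFFFFULL);
}
```

For example, a hypothetical PDPE of 0000000140000083h (P, RW, PS, base 5 GB) maps VA 000007FEFE4291B8 to 17E4291B8h.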

OK, PDPE.P=1 and PDPE.PS=0, so the CPU continues the page walk at the next level of paging tables.

PD = PDPE and 000FFFFFFFFFF000h
PDE = qword [PD + ((VA shr 21) and 1FFh) * 8]
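The corresponding sketch for the PD step (hypothetical names; the PDPE value in the test is made up, with its base c535f000 taken from the !vtop output). For the VA used there, (000007FEFE4291B8 shr 21) and 1FFh = 1F2h, and 1F2h * 8 = F90h, matching "PDE c535ff90".

```c
#include <assert.h>
#include <stdint.h>

#define TABLE_BASE_MASK 0x000FFFFFFFFFF000ULL

/* VA bits 29-21 select one of the 512 PD entries. */
static inline unsigned pd_index(uint64_t va)
{
    return (unsigned)((va >> 21) & 0x1FF);
}

/* PDE address = (PDPE and 000FFFFFFFFFF000h) + index * 8.
   Valid only for a present PDPE that does not map a 1 GB page (PS=0). */
static inline uint64_t pde_phys_addr(uint64_t pdpe, uint64_t va)
{
    assert((pdpe & 1) && !(pdpe & (1ULL << 7)));
    return (pdpe & TABLE_BASE_MASK) + (uint64_t)pd_index(va) * 8;
}
```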

if PDE.P=0 then page fault, the CPU can't complete the translation
if PDE.P=1 the entry contains a valid pointer
now check bit 7 (PS) again
if PDE.PS=1 then this is a 3-level translation using 2 MB pages; bits 20-0 of the VA are the offset within the 2 MB page, PA = (PDE and 000FFFFFFFE00000h) + (VA and 1FFFFFh)
if PDE.PS=0 then the CPU continues to the last (4th) level of translation; the CPU knows the base of the last-level table (PT), and bits 20-12 of the VA are the index into that table (entry = qword [base + index*8])
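A sketch of the 2 MB branch (hypothetical names and values): the entry supplies bits 51-21 of the PA and the VA supplies the low 21 bits.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

#define ENTRY_P       (1ULL << 0)
#define ENTRY_PS      (1ULL << 7)
#define PAGE_2MB_MASK 0x000FFFFFFFE00000ULL /* bits 51-21 */

static inline bool pde_maps_2mb(uint64_t pde)
{
    return (pde & ENTRY_P) && (pde & ENTRY_PS);
}

/* PA = (PDE and 000FFFFFFFE00000h) + (VA and 1FFFFFh) */
static inline uint64_t pa_from_2mb_pde(uint64_t pde, uint64_t va)
{
    assert(pde_maps_2mb(pde));
    return (pde & PAGE_2MB_MASK) + (va & 0x1FFFFFULL);
}
```

For example, a hypothetical PDE of 0000000002800083h maps the hypothetical kernel VA FFFFF80002812345 to PA 2812345h.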

PT = PDE and 000FFFFFFFFFF000h
PTE = qword [PT + ((VA shr 12) and 1FFh) * 8]
if PTE.P=0 then pagefault (CPU can't finish virtual memory translation)
if PTE.P=1 then CPU finished paging walk
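The last table step in the same style (hypothetical names; the PDE value in the test is made up, with its base c46c4000 taken from the !vtop output). For the VA used there, (000007FEFE4291B8 shr 12) and 1FFh = 29h, and 29h * 8 = 148h, matching "PTE c46c4148".

```c
#include <assert.h>
#include <stdint.h>

#define TABLE_BASE_MASK 0x000FFFFFFFFFF000ULL

/* VA bits 20-12 select one of the 512 PT entries. */
static inline unsigned pt_index(uint64_t va)
{
    return (unsigned)((va >> 12) & 0x1FF);
}

/* PTE address = (PDE and 000FFFFFFFFFF000h) + index * 8.
   Valid only for a present PDE that does not map a 2 MB page (PS=0). */
static inline uint64_t pte_phys_addr(uint64_t pde, uint64_t va)
{
    assert((pde & 1) && !(pde & (1ULL << 7)));
    return (pde & TABLE_BASE_MASK) + (uint64_t)pt_index(va) * 8;
}
```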

PA = (PTE and 000FFFFFFFFFF000h) + (VA and FFFh)
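All steps together: a sketch of a complete walk in C, run against a tiny simulated "physical memory" (toy_ram is a hypothetical stand-in; a real kernel reads table entries through a VA mapping). It omits the canonical-address check and all permission bits, and returns false where the CPU would raise #PF.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

#define ADDR_MASK     0x000FFFFFFFFFF000ULL
#define PAGE_1GB_MASK 0x000FFFFFC0000000ULL
#define PAGE_2MB_MASK 0x000FFFFFFFE00000ULL
#define ENTRY_P       (1ULL << 0)
#define ENTRY_PS      (1ULL << 7)

/* Toy physical memory: TOY_FRAMES frames of 512 qwords, frame n at PA n * 4096. */
#define TOY_FRAMES 8
static uint64_t toy_ram[TOY_FRAMES][512];

static uint64_t read_phys_qword(uint64_t pa)
{
    uint64_t frame = pa >> 12;
    assert(frame < TOY_FRAMES && (pa & 7) == 0);
    return toy_ram[frame][(pa & 0xFFF) / 8];
}

/* Walks PML4 -> PDP -> PD -> PT. Returns false where the CPU would raise #PF. */
static bool translate(uint64_t cr3, uint64_t va, uint64_t *pa)
{
    static const unsigned shift[4] = { 39, 30, 21, 12 }; /* PML4, PDP, PD, PT */
    uint64_t base = cr3 & ADDR_MASK;

    for (int level = 0; level < 4; level++) {
        uint64_t entry = read_phys_qword(base + ((va >> shift[level]) & 0x1FF) * 8);
        if (!(entry & ENTRY_P))
            return false;                        /* P=0: page fault */
        if (level == 1 && (entry & ENTRY_PS)) {  /* PDPE.PS=1: 1 GB page */
            *pa = (entry & PAGE_1GB_MASK) + (va & 0x3FFFFFFFULL);
            return true;
        }
        if (level == 2 && (entry & ENTRY_PS)) {  /* PDE.PS=1: 2 MB page */
            *pa = (entry & PAGE_2MB_MASK) + (va & 0x1FFFFFULL);
            return true;
        }
        base = entry & ADDR_MASK;  /* next table, or the 4 kB page after the PTE */
    }
    *pa = base + (va & 0xFFF);
    return true;
}
```

A PTE that is still 0 (as in the first post) makes translate() return false, just as the real CPU would fault on that access.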



Here is an example of what the CPU calculates while parsing the paging tables, assuming bit 0 (the P bit) of every entry at all 4 levels is 1:

HalPerformEndOfInterrupt:
mov dword ptr ds:0FFFFFFFFFFFE00B0h, 0 ; hexadecimal opcode C7 04 25 B0 00 FE FF 00 00 00 00
the APIC is mapped as uncacheable using 4kB 4-level pages translation

CPU knows current CR3 and VA=FFFFFFFFFFFE00B0
PML4 = CR3 and 000FFFFFFFFFF000h
PML4E = qword [PML4 + ((FFFFFFFFFFFE00B0 shr 39) and 1FFh) * 8] = qword [PML4 + FF8h]
PDP = PML4E and 000FFFFFFFFFF000h
PDPE = qword [PDP + ((FFFFFFFFFFFE00B0 shr 30) and 1FFh) * 8] = qword [PDP + FF8h]
PD = PDPE and 000FFFFFFFFFF000h
PDE = qword [PD + ((FFFFFFFFFFFE00B0 shr 21) and 1FFh) * 8] = qword [PD + FF8h]
PT = PDE and 000FFFFFFFFFF000h
PTE = qword [PT + ((FFFFFFFFFFFE00B0 shr 12) and 1FFh) * 8] = qword [PT + F00h]
PA = (PTE and 000FFFFFFFFFF000h) + (FFFFFFFFFFFE00B0 and FFFh) = (PTE and 000FFFFFFFFFF000h) + 0B0h
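The offsets in this example can be checked mechanically. A sketch with hypothetical helper names that also decodes the displacement from the opcode bytes: the disp32 is stored little-endian (B0 00 FE FF = FFFE00B0h) and sign-extended to 64 bits, which is how 0FFFFFFFFFFFE00B0h arises.

```c
#include <assert.h>
#include <stdint.h>

/* Byte offset of the entry used at one level: ((VA shr shift) and 1FFh) * 8 */
static inline unsigned entry_offset(uint64_t va, unsigned shift)
{
    return (unsigned)(((va >> shift) & 0x1FF) * 8);
}

/* Offset inside a 4 kB page: VA bits 11-0. */
static inline unsigned page_offset(uint64_t va)
{
    return (unsigned)(va & 0xFFF);
}

/* Sign-extends a little-endian disp32 (as in C7 04 25 <disp32> <imm32>) to 64 bits. */
static inline uint64_t sign_extend_disp32(const uint8_t b[4])
{
    uint64_t d = (uint64_t)b[0] | ((uint64_t)b[1] << 8) |
                 ((uint64_t)b[2] << 16) | ((uint64_t)b[3] << 24);
    if (d & 0x80000000ULL)
        d |= 0xFFFFFFFF00000000ULL;
    return d;
}
```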


As another example, you should look into:
MmIsAddressValid
There was a small mistake in its validation of the canonical address form; Landy Wang fixed it, and the correct version appeared in Windows Server 2012 x64 / Windows 8 x64. The mistake persists in Windows Server 2008 R2 SP1 x64 = Windows 7 SP1 x64, where SAR RAX,30h should be SAR RAX,2Fh.
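To illustrate why the shift count matters, here is a sketch of a canonical check in C. It assumes the check has the form "shift right arithmetically, then accept 0 or -1"; the real MmIsAddressValid code is not reproduced here. Shifting by 47 (2Fh) compares all of bits 63-47, while shifting by 48 (30h) ignores bit 47, so an address such as 0000800000000000h slips through. The arithmetic shift is emulated with an unsigned shift and explicit all-ones patterns to avoid relying on implementation-defined signed shifts.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Correct check (like SAR RAX,2Fh): bits 63-47 are all 0 or all 1. */
static inline bool is_canonical(uint64_t va)
{
    uint64_t top = va >> 47;          /* 17 bits: 63-47 */
    return top == 0 || top == 0x1FFFF;
}

/* Flawed variant (like SAR RAX,30h): only bits 63-48 are compared. */
static inline bool is_canonical_sar30h(uint64_t va)
{
    uint64_t top = va >> 48;          /* 16 bits: 63-48 */
    return top == 0 || top == 0xFFFF;
}
```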

Please remember that the OS needs to access PAs (and can do so only via VAs), so it has all the tables mapped at fixed VAs:

ms win x64:
#define PXE_BASE 0xFFFFF6FB7DBED000UI64
#define PXE_SELFMAP 0xFFFFF6FB7DBEDF68UI64
#define PPE_BASE 0xFFFFF6FB7DA00000UI64
#define PDE_BASE 0xFFFFF6FB40000000UI64
#define PTE_BASE 0xFFFFF68000000000UI64

#define PXE_TOP 0xFFFFF6FB7DBEDFFFUI64
#define PPE_TOP 0xFFFFF6FB7DBFFFFFUI64
#define PDE_TOP 0xFFFFF6FB7FFFFFFFUI64
#define PTE_TOP 0xFFFFF6FFFFFFFFFFUI64
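These bases come from a self-referencing PML4 entry at index 1EDh (PXE_SELFMAP = PXE_BASE + 1EDh * 8), so the entry for any VA is found by shifting the VA and adding the base. A sketch in C with hypothetical helper names; the constants are the fixed values listed above, which apply to the Windows versions discussed here (later Windows 10 releases randomize these bases). For the ShellAboutW address from the first post it reproduces the four addresses printed by !pte, including the PTE at FFFFF683FF7F38C8 that contains 0.

```c
#include <assert.h>
#include <stdint.h>

/* Fixed self-map bases quoted above (ms win x64 of that era). */
#define PXE_BASE 0xFFFFF6FB7DBED000ULL
#define PPE_BASE 0xFFFFF6FB7DA00000ULL
#define PDE_BASE 0xFFFFF6FB40000000ULL
#define PTE_BASE 0xFFFFF68000000000ULL
#define VA_MASK  0x0000FFFFFFFFFFFFULL /* drop the sign-extension bits 63-48 */

/* VA of the PTE/PDE/PPE/PXE that maps va; each level up covers 9 more VA bits. */
static inline uint64_t pte_va(uint64_t va) { return PTE_BASE + ((va & VA_MASK) >> 12) * 8; }
static inline uint64_t pde_va(uint64_t va) { return PDE_BASE + ((va & VA_MASK) >> 21) * 8; }
static inline uint64_t ppe_va(uint64_t va) { return PPE_BASE + ((va & VA_MASK) >> 30) * 8; }
static inline uint64_t pxe_va(uint64_t va) { return PXE_BASE + ((va & VA_MASK) >> 39) * 8; }
```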


There is a high probability that I made some mistakes in the formulas and calculations above (your question is simple, but the answer is complicated and maybe cannot be simplified any further), so certainly consult the CPU manuals.
 #19774  by myid
 Mon Jun 24, 2013 6:54 pm
Thank you very much!