A forum for reverse engineering, OS internals and malware analysis 

Forum for discussion about user-mode development.
 #27414  by evelyette
 Mon Dec 14, 2015 8:20 pm
Hello,

I've stumbled upon a DLL that exports a data structure: the dnsapi.dll module exports the DnsGlobals structure. I'm creating a proxy DLL named dnsapi.dll that forwards function calls to the original DLL, dnsapi_.dll (renamed from dnsapi.dll in system32), and I'm a little worried about how to approach this. The problem is that some functions in other DLLs access the DnsGlobals structure directly, reading and writing its values. Since I'm using a proxy DLL, those reads and writes go to my proxy DLL and not to the original DLL. The external DLLs would therefore change the structure in the proxy dnsapi.dll and not in the original dnsapi_.dll, which could result in an inconsistency: an external DLL changes the structure in the proxy DLL and then calls a function that gets forwarded to the original DLL, which never sees the change because it has its own copy of the structure.

I could solve the problem the following ways:

1. I could create my own structure in the proxy DLL, but the structure could be read from or written to directly by other DLLs (since it's exported). Besides read/write operations (DATA XREFs in IDA Pro are labeled 'r' and 'w'), there are also address XREFs of type 'o', which take a pointer to the structure. Therefore, not only the exported structure itself but also all dependent structures are affected.

2. While loading the original DLL dnsapi_.dll from the DllMain of the proxy dnsapi.dll, I could get the address of the DnsGlobals structure (with GetProcAddress) and copy the already initialized structure from the original DLL (it is initialized because the original's DllMain ran when we called LoadLibrary). Done this way, when a call passes through the proxy DLL to the original DLL, its functions change the DnsGlobals structure in the original DLL but not the one in the proxy DLL. To solve that, we would need to copy the changes from the original DnsGlobals structure to the proxy's DnsGlobals structure every time a function changes them, to keep the two in sync.

3. While loading the original DLL dnsapi_.dll from the DllMain of the proxy dnsapi.dll, I could get the address of the DnsGlobals structure and replace the address exported by the proxy DLL with the address of the original DnsGlobals structure. I don't see any downsides with this, so I'm thinking this is the way to go.

Any ideas or comments on my thinking, especially on number 3, which I think is the way to go with exported data structures?
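
The difference between options 2 and 3 can be modeled outside of Windows. The following is a minimal sketch using Python's ctypes, not loader code: the field names and values of DnsGlobals are hypothetical (the real layout is not given in this thread), and "original" merely stands in for the structure inside dnsapi_.dll.

```python
import ctypes

class DnsGlobals(ctypes.Structure):
    # Hypothetical layout, for illustration only.
    _fields_ = [("flags", ctypes.c_uint32),
                ("query_timeout", ctypes.c_uint32),
                ("cache_size", ctypes.c_uint32)]

# Stands in for the structure that lives in the original DLL.
original = DnsGlobals(1, 15, 1000)

# Option 2: the proxy keeps its own copy, taken once at load time.
proxy_copy = DnsGlobals.from_buffer_copy(bytes(original))
proxy_copy.cache_size = 2000   # an external DLL writes through the proxy's export
copy_in_sync = (original.cache_size == proxy_copy.cache_size)

# Option 3: the proxy's export resolves to the original's address.
proxy_alias = DnsGlobals.from_address(ctypes.addressof(original))
proxy_alias.cache_size = 3000  # the same kind of write now lands in the original
alias_in_sync = (original.cache_size == proxy_alias.cache_size)

print(copy_in_sync, alias_in_sync)
```

With a copy, every write on either side has to be mirrored by hand; with an alias there is only one structure, which is why option 3 looks attractive as long as importers can actually be made to see the original's address.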
 #27420  by Brock
 Tue Dec 15, 2015 6:41 pm
#3 is the cleanest and most logical method for this in my opinion. I'd shoot for that and see how it pans out
 #27422  by evelyette
 Tue Dec 15, 2015 8:07 pm
Hello,

Now that I've researched it a little more, the problem with #3 is the following: when an external module is loaded, the loader walks its import table and looks up the imported symbols in other DLLs. When it reaches the DnsGlobals data export from dnsapi.dll, it takes the RVA from the export table and adds the base address of that same proxy DLL to obtain the actual address of the data structure.

The problem is that once the external DLL loads dnsapi.dll and gets the address of the structure, it will load that address into ecx and then read the value at offset +8 from the structure (as seen in the picture below).
[Attachment: test.png]
So basically the DLL will reference some memory in dnsapi.dll; there is no way to change that, since the code is not going through a pointer to the data structure (which would be a better, workable solution) but is using the data structure directly. The [ecx+0x8] access is still in our proxy DLL, because the address of the structure is held in ecx; redirecting would work if the original mov instruction looked something like this: [[ecx]+0x8].

Therefore, I see two possible solutions:

1. Overwrite the proxy DLL's RVA in memory, from a 16-bit to a 32-bit value, so that (after the base address of the proxy DLL is added) it points exactly to the structure in the original DLL dnsapi_.dll. The problem is that I'm not sure whether the loader, while resolving the imports of the third-party DLL, will actually take this into account so that everything works as it should. The other problem is whether this will also work on 64-bit Windows with a 64-bit address space: will the technique still work, and how large is each item in the array in 64-bit processes? If it can store 64-bit addresses, then it's OK, I guess.

2. Hook LoadLibrary in kernel32.dll to scan the import table of every DLL and identify whether DnsGlobals is being used; in that case the library is first loaded by completing the original LoadLibrary, after which its imported address is changed to point to the original DnsGlobals structure. The other problem is that the external DLL dnsrslvr.dll is what loads the proxy dnsapi.dll, which in turn loads the original dnsapi_.dll, so by the time the proxy's DllMain is called, the LoadLibrary of dnsrslvr.dll is already executing and we won't catch the loading of this library as we want to.

Any ideas how to simplify this?

Btw: do you have any ideas about the advantages/disadvantages of changing the original DLL by using trampolines vs. a DLL proxy? It seems to me that the DLL proxy method requires a lot of additional steps that are not evident at first when thinking about the theoretical solution. I'm seriously thinking about changing the original DLL instead, but I can't determine whether I'll stumble upon similar problems.
 #27423  by Brock
 Wed Dec 16, 2015 12:17 am
evelyette wrote:Btw: do you have any ideas about the advantages/disadvantages about changing the original DLL by using trampolines vs. DLL proxy: it seems to me that the method using the DLL proxy requires a lot of additional steps that are not evident at first when thinking about theoretical solution. I'm seriously thinking about changing the original DLL instead, but I can't determine if I'll also stumble upon the somehow similar problems.
Hooking would be much simpler and a lot less work, but this depends on the method you use. The main disadvantages are that if the same API is already hooked, or gets hooked after your hook is installed, your hook could be removed or the process could crash. Other code might not check whether the function is already hooked and overwrite your modifications. It could also look like malware, especially given that this is DNS functionality you are interested in hooking. Inline hooking is the preferred method because it will catch all calls to the API of interest. With IAT hooking you have to patch ALL loaded modules of the process for the particular function you wish to hook, and it will not catch dynamic calls (GetProcAddress). Some developers will hook the EAT on top of patching the IAT as a backup hooking method, but inline hooking would definitely be the method most choose for tasks like this. Most Windows API functions have been built to be hot-patchable since XP SP2; you'll see a do-nothing instruction such as MOV EDI, EDI at the function prologue.
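
As a rough illustration of the hot-patch prologue mentioned above, here is a sketch that applies the classic 32-bit x86 hot-patch layout to a flat byte buffer. It assumes the function starts with MOV EDI, EDI (8B FF) and is preceded by five padding bytes (NOP or INT3); the buffer, offsets and function names are made up for the example, and nothing here touches a real process.

```python
import struct

def hotpatch(image: bytearray, func_off: int, hook_off: int) -> None:
    """Redirect the function at func_off to hook_off inside a flat x86 image."""
    pad = image[func_off - 5:func_off]
    has_prologue = image[func_off:func_off + 2] == b"\x8b\xff"  # MOV EDI, EDI
    if not has_prologue or any(b not in (0x90, 0xCC) for b in pad):
        raise ValueError("function is not hot-patchable")
    # Step 1: long jump in the padding. E9 rel32 is relative to the end of the
    # instruction, which is exactly func_off.
    image[func_off - 5:func_off] = b"\xe9" + struct.pack("<i", hook_off - func_off)
    # Step 2: swap the 2-byte MOV EDI, EDI for a short jump back to the padding.
    # EB F9 jumps -7 from the end of the short jump (func_off + 2).
    image[func_off:func_off + 2] = b"\xeb\xf9"

# Hypothetical image: 5 NOPs, a tiny function, 5 INT3s, then a "hook" (a lone RET).
image = bytearray(b"\x90" * 5 + b"\x8b\xff\x55\x8b\xec\xc3" + b"\xcc" * 5 + b"\xc3")
hotpatch(image, func_off=5, hook_off=16)
print(image[:7].hex())
```

The two-byte write is the only change inside the function body, which is what makes the prologue convenient for patching at run time; a real inline hook would also need a trampoline to call the original code, which this sketch leaves out.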
 #27424  by evelyette
 Wed Dec 16, 2015 1:22 am
Hi,

Thank you for your answer; it was useful, but I still have some questions. First, would you be kind enough to comment on the two options below, if you have anything to add? The first is effectively EAT hooking, where the RVAs in the export table are changed in memory only (without writing anything to the hard drive), right? The second probably cannot be solved easily, since our DllMain would have to be called beforehand?
evelyette wrote:
1. Overwrite the proxy DLL's RVA from 16-bit to 32-bit address in memory, which would (after adding the base address of the proxy dll), point exactly to the structure in original DLL dnsapi_.dll. The problem is that I'm not sure whether the loader while resolving the imports of the third-party DLL, will actually take this into account, so everything works as it should; the other problem is whether this will also work on 64-bit windows with 64-bit address space (will this technique still work and how large is every item in the array in 64-bit processes - if it can store 64-bit addresses, then it's ok I guess)?

2. Hook LoadLibrary in the kernel32.dll to scan over the imports structure of every DLL in order to identify whether DnsGlobals is being used, in which case the library is first loaded by completing the original LoadLibrary after which it's imported address is changed to point to the original DnsGlobals structure. The other problem is that external DLL dnsrslvr.dll actually loads the DLL proxy dnsapi.dll, which itself loads the original DLL dnsapi_.dll, so at the time when DllMain of the proxy is called, the dnsrslvr.dll LoadLibrary is already being executed, so we won't catch the loading of this library as we want to.
Brock wrote:
Hooking would be much simpler and a lot less work but this depends on the method you use. The main disadvantages are that if the same API is already hooked or it's hooked after your hook is installed it could be removed or a crashed process could also be the end result. Other code might not check to see if the function is hooked first and overwrite your modifications. It could also look like malware, especially given the case that this is Dns functionality you are interested in hooking. Inline hooking is the preferred method because it will catch all calls to the API of interest. With IAT hooking you have to patch ALL loaded modules of the process for that particular function you wish to hook and it will not catch dynamic calls (GetProcAddress). Some developers will hook the EAT on top of patching the IAT as a backup hooking method but inline hooking would definitely be the preferred method most choose for tasks like this. Most Windows API functions were written to be hot patchable since XP SP2, you'll see useless/junk instructions such as MOV EDI, EDI at the function prologue.
The main problem is that I'm dealing with processes that can be alive for a really short period of time, so I need to inject my own code into the process really fast. One solution is a DLL proxy (which has quite a few implementation difficulties); the other is DLL patching (which has the problems you outlined above).

Inline hooking cannot be used, since it is runtime hooking: I would have to detect when a process starts in order to inject a DLL into it fast enough for the hook to take effect. Even if the process is alive for a longer time, DNS requests can be made between process start and the moment the DLL is injected, and we would miss them.

Therefore, in order to detect invocation of certain DLL exported functions, we have to either make the changes permanent (written to the hard drive) or hook them every time a process starts, and do it fast enough (a more difficult problem, so the first option is preferable). This is why I suggested changing the exported RVAs in memory together with using the DLL proxy. However, I'm not sure whether this will work the way I described in 1., so I would be really grateful if you could comment on that and highlight any possible problems.

Thank you
 #27427  by Brock
 Wed Dec 16, 2015 2:50 am
Yeah, you definitely don't want to miss any calls to DNS APIs residing in DnsXxX DLLs (cache resolver included). If you want to inject a DLL at a very early stage of process creation (Main EXE -> ntdll.dll -> verifier.dll -> your DLL) and have your DLL not miss statically linked modules, you can use an Application Verifier custom provider DLL paired with hooking. It's explained well here

http://www.kernelmode.info/forum/viewto ... =15&t=3418

and more information can be seen here

http://blogs.msdn.com/b/reiley/archive/ ... ifier.aspx

You will not need your proxy DLL any longer or have to worry about "beating" processes at creation time. It solves the DLL injection at an early stage for you and only requires admin rights for writing to the IFEO registry key and dropping your provider DLL in \System32. It might be worth some investigation. My DLL injection package does everything from a kernel driver, but I'm aware you want this done completely in user mode, so this should do it for you.
 #27428  by evelyette
 Wed Dec 16, 2015 3:22 am
Hi,

Thank you for providing this; it seems like a viable option. I'll check it out and report back how it turned out. Btw: just out of curiosity, I would like to verify that the proxy DLL could be used together with overwriting the EAT. From what I'm aware of, there are two methods of dealing with the EAT. The first is patching the DLL on the hard drive, where we change the actual DLL by using detour patching; since this is permanently written into the DLL stored on the hard drive it would work, but we've established this isn't a good solution.

The second option is EAT hooking, where the EAT is hooked at runtime: once the proxy DLL (dnsapi.dll) has been loaded, the RVAs in AddressOfFunctions must be overwritten. Normally, the RVAs contain 16-bit offsets into the current module, so the loader adds the module's base address to the offset to obtain the full address of the function. But suppose the 16-bit value is overwritten with a 32-bit value, calculated by taking the actual address of the function in the original DLL (dnsapi_.dll) and subtracting the base address of the proxy DLL module. The loader adds that base back automatically, so we end up with the actual address of the function in the original DLL, and this should work. Is this right?

Also, do RVAs on 64-bit Windows hold 32-bit offsets, and are the array fields 64 bits wide?

I'm asking this out of curiosity, since it's very interesting.
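
The arithmetic asked about above can be written down as a small model. This is only a sketch of the calculation, not of the real loader: the addresses are invented, and the two behaviors marked as assumptions in the comments (32-bit wraparound, zero-extension of the stored value on 64-bit) should be checked against the actual system. One fact that does not depend on them is that AddressOfFunctions entries are 32-bit RVAs in both PE32 and PE32+ images, not 16-bit ones.

```python
# Sketch only: models the arithmetic, not the real loader or GetProcAddress.
DWORD_MAX = 0xFFFFFFFF  # AddressOfFunctions entries are 32-bit RVAs

def redirect_rva(target_va: int, module_base: int, bits: int) -> int:
    """Return the RVA to store so that module_base + RVA resolves to target_va."""
    if bits == 32:
        # Assumption: 32-bit pointer arithmetic wraps, so any target is reachable.
        return (target_va - module_base) & DWORD_MAX
    # Assumption: on 64-bit the DWORD is zero-extended before the add,
    # so the target must lie in [module_base, module_base + 4 GiB).
    delta = target_va - module_base
    if not 0 <= delta <= DWORD_MAX:
        raise OverflowError("target is out of RVA range for this module")
    return delta

def resolve(module_base: int, rva: int, bits: int) -> int:
    """What a resolver would compute from the stored RVA (hypothetical model)."""
    return (module_base + rva) & ((1 << bits) - 1)

# Hypothetical addresses: proxy base first, DnsGlobals in dnsapi_.dll as target.
rva32 = redirect_rva(0x76D51234, 0x10000000, bits=32)
rva64 = redirect_rva(0x7FFA23456780, 0x7FFA10000000, bits=64)
print(hex(rva32), hex(rva64))
```

Under these assumptions the 32-bit case always has a solution, while the 64-bit case only works when the original DLL happens to be mapped above the proxy and less than 4 GiB away from its base.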
 #27431  by evelyette
 Wed Dec 16, 2015 11:16 am
Brock wrote:Yeah you definitely don't want to miss any calls to Dns APIs residing in DnsXxX DLLs (cache resolver included). If you want to inject a DLL at a very early stage of process creation (Main EXE -> ntdll.dll -> verifier.dll -> your DLL) and have your DLL not miss statically linked modules you can use an Application Verifier Custom Provider DLL paired with hooking. It's explained well here

http://www.kernelmode.info/forum/viewto ... =15&t=3418

and more information can be seen here

http://blogs.msdn.com/b/reiley/archive/ ... ifier.aspx

You will not need your proxy DLL any longer or have to worry about "beating" processes at creation time. It solves the DLL injection at an early stage for you and only requires admin rights for writing to the IFEO registry key and dropping your provider DLL in \System32. Might be worth some investigation for you. My DLL injection package does everything from a kernel driver but I'm aware you want this done completely in usermode, this should do it for you then.
Hi, I've taken a look at this approach, and it seems it uses the IFEO registry keys, which is not what I need; I could also have used IFEO directly myself without the verifier (although the verifier allows you to hook the process earlier in the startup).

The problem is that the program name is not known and can be anything, so this will only work for the processes whose names I know in advance, but not for all user-land processes on the system.

Therefore I'm back to the proxy DLL option, because I don't see any other way of doing this (except going back to detecting process creation from user mode).

So a few comments/tips on my previous post would be great, so I can verify my thoughts.
 #27435  by Brock
 Wed Dec 16, 2015 8:18 pm
evelyette wrote:I would like to verify that the proxy DLL could be used together with overwriting the EAT
Absolutely
evelyette wrote:the first one is patching the DLL on hard drive
Bad idea, especially for DLLs in the System32\SysWow64 folder. Things like SFC won't like this, either. We generally refer to permanent on-disk file changes as "cold" patching. If we modify the memory of a function (e.g. API hooking at run time) we're "hot" patching: the file's on-disk representation is never touched in any way (which is good), and modifications are applied only in memory.

A little about IAT patching/hooking: this works fine on x86 and x64 because the address being replaced is absolute (the actual virtual/linear address of a function, with the base already added to the relative offset). If you look at, say, ntimage.h you'll see that under the // import area there is IMAGE_THUNK_DATA32 and a 64-bit version of the structure, which lets the Function member be sizeof(PVOID) accordingly. On x86 this is sizeof(PVOID) = sizeof(ULONG) and on x64 it's sizeof(PVOID) = sizeof(ULONGLONG).

Now, with the EAT this is not the case. Instead of the replacement function address being absolute it's relative, as you already know, which means that anything that does not fit within the confines of the DWORD in AddressOfFunctions will be truncated. So, if you have a function whose address is, say, 0x7FFFFFAFFFE, it will be truncated to fit within the RVA DWORD. Remember, the image base is added to this value during GetProcAddress lookups (the first parameter is the HMODULE, which is simply added to it), so there is no reason why MS would need these RVAs to be larger than a DWORD field, as they are relative to the module whose export table is being searched.

If you pursue EAT patching in memory on x64, you'll want to check the address of your function, and if it's too large to fit you could potentially set your DLL's image base with /BASE to something small and generally random, or you could allocate virtual memory with a NULL base when your DLL is loaded and copy your function there. That is more or less the opposite of what MEM_TOP_DOWN does (where the returned VA is the highest possible allocation address, the complete opposite of what you want, since you want this value low). FWIW, I personally wouldn't use a combination of IAT/EAT hooking when you could write directly to the original module's code with inline patching.
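
To make the size difference concrete, here is a short sketch that packs the example address from the post above into a DWORD-sized slot and a pointer-sized slot. It uses only Python's struct module; the variable names are illustrative and the slots are plain byte strings, not real PE tables.

```python
import struct

address = 0x7FFFFFAFFFE  # the example address from the post above

# An IAT thunk is pointer-sized: 4 bytes in a PE32 image, 8 bytes in PE32+.
iat_slot_x64 = struct.pack("<Q", address)
# An EAT entry is a DWORD in both formats, so only the low 32 bits survive.
eat_slot = struct.pack("<I", address & 0xFFFFFFFF)

roundtrip_iat = struct.unpack("<Q", iat_slot_x64)[0]
roundtrip_eat = struct.unpack("<I", eat_slot)[0]

print(len(iat_slot_x64), hex(roundtrip_iat))
print(len(eat_slot), hex(roundtrip_eat))
```

The 8-byte slot round-trips the full address, while the 4-byte slot comes back as 0xfffafffe, which is why an in-memory EAT patch on x64 has to be range-checked before it is written.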
 #27438  by evelyette
 Wed Dec 16, 2015 10:06 pm
Brock wrote:
Bad idea, especially DLLs in the System32\SysWow64 folder. Things like SFC won't like this, either. We generally refer to permanent on-disk file changes as "cold"patching. If we modify memory of a function (e.g: API hooking at run-time) we're "hot"patching and the file's disk representation is never touched in any way (which is good), only in memory are modifications applied.
I agree, but so far this is the best approach, because it can cover dnsapi.dll in all processes, even ones that are alive for a really short time. Btw: what is SFC?
Brock wrote:
A little about IAT patching/hooking, this works fine on x86 and x64 because the address being replaced is absolute (the actual virtual/linear address of a function with the base added to the relative offset). If you look at say ntimage.h you'll see that under the // import area there is IMAGE_THUNK_DATA32 and a 64 version of the structure that allows the Function member of the structure to be sizeof(PVOID) accordingly. On x86 this is sizeof(PVOID) = sizeof(ULONG) and on x64 it's sizeof(PVOID) = sizeof(ULONGLONG). Now, with EAT this is not the case, instead of a function address (during replacement) being absolute it's relative as you already know, this means that anything that does not fit within the confines of the DWORD in AddressOfFunctions will be truncated. So, if you have a function whose address is say 0x7FFFFFAFFFE it will be truncated to fit within the RVA DWORD. Remember, the imagebase is added to this value during GetProcAddress lookups, the first parameter is the HMODULE which it simply just adds to it, so there is no reason why MS would need to have these RVAs larger than a DWORD field as they are relative to the module whose export table is being searched. If you pursued EAT patching in memory on x64 you'll want to check the address of your function and if it's too large to fit you could potentially set your DLLs imagebase with /BASE to something small and generally random or you could allocate virtual memory with a NULL base when your DLL is loaded and copy your function there, it's more or less the opposite of what MEM_TOP_DOWN does (where the returned VA is the highest possible allocation address and complete opposite of what you would want, since you want this value low).
This is true. Inside the proxy DLL, we can also go to the end of the .text section, reserve an array of 64-bit entries there and put a "jmp origfunc" stub in each; the EAT can then be overwritten to point to these stubs. When a structure is exported, as in my case, this still doesn't help, because data cannot be reached through a jump stub, so redirecting it is only possible if the addresses are close together.
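
A sketch of what such a stub table could look like on x64 follows. It assumes the common 14-byte absolute jump encoding (FF 25 00 00 00 00, i.e. jmp qword ptr [rip+0], followed by the 8-byte target); the helper names, the table RVA and the target addresses are all hypothetical, and the code only builds bytes rather than writing into a module.

```python
import struct

STUB_SIZE = 14  # 6-byte jmp [rip+0] plus the 8-byte absolute target

def abs_jmp_stub_x64(target_va: int) -> bytes:
    """Encode 'jmp qword ptr [rip+0]' followed by the absolute target address."""
    return b"\xff\x25\x00\x00\x00\x00" + struct.pack("<Q", target_va)

def build_stub_table(table_rva: int, targets: list) -> tuple:
    """Return (table bytes, per-stub RVAs that could be written into the EAT)."""
    blob = b"".join(abs_jmp_stub_x64(t) for t in targets)
    rvas = [table_rva + i * STUB_SIZE for i in range(len(targets))]
    return blob, rvas

# Hypothetical addresses of two functions in the original dnsapi_.dll.
targets = [0x7FFA23451000, 0x7FFA23452340]
blob, rvas = build_stub_table(table_rva=0x5000, targets=targets)
print(len(blob), [hex(r) for r in rvas])
```

Because the stubs live inside the proxy module, their RVAs always fit in the DWORD-sized EAT entries no matter where the original DLL is mapped. This only helps for exported functions, which is the limitation noted above for an exported structure such as DnsGlobals.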
Brock wrote:FWIW, I personally wouldn't use a combination of IAT/EAT hooking when you could write directly to the original module's code with inline patching.
I'm not sure how this would work exactly, since I have to hook the dnsapi.dll functions in ALL processes, so I can't rely on IFEO, which hooks only certain processes. It would be great if you could provide a possible scenario of how to inject the code into the process in the first place. The inline hooking itself is not a problem and is a better solution in any case, but I can't use it, since I have to hook ALL short-lived processes on the system. If you can provide a scenario where this is taken into account, I will gladly use this technique.

I've now implemented the hooking of the EAT, which works, but whenever I run "nslookup google.com" in a terminal I get the following crash. It happens after the call to FreeLibrary returns successfully, so execution is already outside of the proxy DLL. Does anybody have any tips on what may be wrong, to point me in the right direction?
ntdll!KiFastSystemCallRet:
77d170b4 c3 ret
0:000> k
ChildEBP RetAddr
000df8dc 77d168d4 ntdll!KiFastSystemCallRet
000df8e0 77d2e1a7 ntdll!ZwTerminateProcess+0xc
000df8fc 76282164 ntdll!RtlExitUserProcess+0x85
WARNING: Stack unwind information not available. Following frames may be wrong.
000df910 77a636dc kernel32!ExitProcess+0x15
000df91c 77a63372 msvcrt!exit+0x32
000df954 77a636bb msvcrt!dup+0x2a9
000df968 00e86a13 msvcrt!exit+0x11
000df9e0 00e8cb14 nslookup+0x6a13
000dfa24 76283c45 nslookup+0xcb14
000dfa30 77d337f5 kernel32!BaseThreadInitThunk+0x12
000dfa70 77d337c8 ntdll!__RtlUserThreadStart+0x70
000dfa88 00000000 ntdll!_RtlUserThreadStart+0x1b
I've had HKEY_LOCAL_MACHINE\SOFTWARE\Microsoft\Windows NT\CurrentVersion\AeDebug enabled, but why did the debugger break on an exception if one didn't actually happen? If I delete the AeDebug key, nslookup doesn't report any problems anymore.
Last edited by evelyette on Wed Dec 16, 2015 10:31 pm, edited 1 time in total.