 #1611  by Evilcry
 Mon Jul 19, 2010 5:36 am
Hi,

Lately I have been working heavily on Microsoft Compound Files, the format used by MS Office files such as:

Doc
Xls
Ppt


In this blog post I will not cover the file format in detail, because I am writing a complete research paper on the subject, along with CFScreener (a Compound File Format forensic tool).

Compound Files are a FAT-based file system placed inside a single file.

The equivalent of a classical directory is called a Storage.
The equivalent of a classical file is called a Stream.


What I am going to show you is a Python script that I put together a few days ago, which does the following:

Given a malicious executable (.exe) and a clean .doc, it builds a final infected .doc file.

Image

At offset 0 we have 0xD0CF11E0A1B11AE1, the signature that identifies a Compound File. The variable compressedfile contains the skeleton of the malicious .doc. Let's see the code involved:

Image
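As a quick aside, the signature check can be sketched in a few lines of Python. This is only an illustration: the names CFBF_SIGNATURE and is_compound_file are mine and do not come from the script in the screenshot. The eight signature bytes are D0 CF 11 E0 A1 B1 1A E1.

```python
# Hypothetical helper, not part of the original script.
CFBF_SIGNATURE = bytes.fromhex("D0CF11E0A1B11AE1")


def is_compound_file(data: bytes) -> bool:
    """Return True if the buffer starts with the Compound File signature."""
    return data[:8] == CFBF_SIGNATURE
```

The function works on a buffer, so it can be fed the first bytes of a file on disk or a block carved out of a memory dump.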

The malicious executable needs to be shorter than 10240 bytes. Next we can observe a byte-by-byte transfer from the malicious executable into the skeleton of the malicious doc, starting at offset 0x4c00 of the evilbuffer (which is the compressedfile variable); each byte of the executable is XORed as it is written.

Once the first transfer is done, the clean doc to infect is opened. Let's see the code:

Image

The clean doc cannot be larger than 1411072 bytes. The mechanism is similar to the previous one: this time the clean .doc is transferred into the evil one starting from offset 0x7400, XORed with the key 0x25. Finally the evilbuffer is dumped into a new file, which is the malicious doc.
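Since the real code is only visible in the screenshots, here is a rough sketch of the two transfers as I read them from the description. It is not the original script: the function and constant names are hypothetical, the skeleton is just an opaque buffer here, and I am assuming the executable uses the same key 0x25 as the document (which is consistent with the 'MZ' check at the end of this post).

```python
# Illustrative layout sketch; names are hypothetical, not from the original.
EXE_OFFSET = 0x4C00      # where the XORed executable is written
DOC_OFFSET = 0x7400      # where the XORed clean document is written
MAX_EXE_SIZE = 10240     # the executable must be shorter than this
MAX_DOC_SIZE = 1411072   # the clean document cannot be larger than this
XOR_KEY = 0x25           # assumed to be the same key for both regions


def xor_bytes(data: bytes, key: int) -> bytes:
    """XOR every byte of data with a single-byte key."""
    return bytes(b ^ key for b in data)


def embed_regions(skeleton: bytes, exe: bytes, clean_doc: bytes) -> bytes:
    """Copy both inputs, XORed, into a copy of the skeleton buffer."""
    if len(exe) >= MAX_EXE_SIZE:
        raise ValueError("executable too large")
    if len(clean_doc) > MAX_DOC_SIZE:
        raise ValueError("document too large")
    buf = bytearray(skeleton)
    needed = DOC_OFFSET + len(clean_doc)
    if len(buf) < needed:
        # Grow the buffer with zero bytes if the skeleton is too short.
        buf.extend(bytes(needed - len(buf)))
    buf[EXE_OFFSET:EXE_OFFSET + len(exe)] = xor_bytes(exe, XOR_KEY)
    buf[DOC_OFFSET:DOC_OFFSET + len(clean_doc)] = xor_bytes(clean_doc, XOR_KEY)
    return bytes(buf)
```

Note that 0x4C00 + 10240 = 0x7400, so the size limit on the executable is exactly what keeps the first region from running into the second one.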

The XOR operation, especially on the executable, is performed to defeat basic inspection and tools that search for the MZ pattern.
The shellcode applies the matching XOR to decrypt the malicious executable.
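From the analyst's side, a single-byte XOR is easy to undo by brute force. The sketch below tries every key, looks for an encoded 'MZ', and confirms the hit by following e_lfanew (offset 0x3C of the DOS header) to an encoded 'PE\0\0'. The function name is hypothetical, and scanning only at 0x200-aligned offsets is my assumption; pass align=1 to scan every offset.

```python
import struct


def find_xored_executables(data: bytes, align: int = 0x200):
    """Return (offset, key) pairs where a single-byte-XORed PE file starts."""
    hits = []
    last = max(len(data) - 0x3F, 0)  # need a full 0x40-byte DOS header
    for key in range(1, 256):
        magic = bytes((0x4D ^ key, 0x5A ^ key))  # 'MZ' under this key
        for offset in range(0, last, align):
            if data[offset:offset + 2] != magic:
                continue
            header = bytes(b ^ key for b in data[offset:offset + 0x40])
            (e_lfanew,) = struct.unpack_from("<I", header, 0x3C)
            pe_at = offset + e_lfanew
            pe_sig = bytes(b ^ key for b in data[pe_at:pe_at + 4])
            if pe_sig == b"PE\x00\x00":
                hits.append((offset, key))
    return hits
```

Checking the PE signature as well keeps the false positives down, since two matching bytes alone turn up by chance in any large file.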

Now we can perform a differential analysis between the clean doc and the malicious one to highlight the variations, which in the end can provide evidence of infection.
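If you do not have a hex editor with a compare view at hand, the same differential idea can be sketched in Python. This is a minimal helper of my own (the name diff_ranges is hypothetical) that reports the byte ranges where two buffers differ; it compares only the common length.

```python
def diff_ranges(a: bytes, b: bytes):
    """Return half-open (start, end) ranges where the two buffers differ."""
    ranges = []
    start = None
    common = min(len(a), len(b))
    for i in range(common):
        if a[i] != b[i]:
            if start is None:
                start = i          # a new differing run begins here
        elif start is not None:
            ranges.append((start, i))
            start = None
    if start is not None:
        ranges.append((start, common))
    return ranges
```

For the header comparison you would pass the first 512 bytes of each file, for example diff_ranges(clean[:512], evil[:512]), where clean and evil are the raw contents of the two documents.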

This is the view of CFBF Header of the clean doc:

Image

Here is the header of out.doc:

Image

The changes are shown in red; the most important one is the Directory Sector value, which is set to 0x33.

DirSect1 is the first SECT in the directory chain, so the next core structure that we have to inspect is the DirectoryEntries.

This structure manages Storages and Streams, and this is where the malicious executable is placed.
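To turn that value into a file offset you can read it straight from the header. The sketch below relies on the field offsets of the published Compound File header layout (sector shift at 0x1E, first directory sector at 0x30); the function name is mine. Sector n starts at (n + 1) * sector_size because the header occupies the first sector-sized slot.

```python
import struct

CFBF_SIGNATURE = bytes.fromhex("D0CF11E0A1B11AE1")


def first_directory_sector(header: bytes):
    """Return (sector number, sector size, file offset) of the directory."""
    if header[:8] != CFBF_SIGNATURE:
        raise ValueError("not a Compound File header")
    (sector_shift,) = struct.unpack_from("<H", header, 0x1E)
    (dir_sect,) = struct.unpack_from("<I", header, 0x30)
    sector_size = 1 << sector_shift
    offset = (dir_sect + 1) * sector_size
    return dir_sect, sector_size, offset
```

Assuming the usual 512-byte sectors, a directory sector value of 0x33 works out to file offset (0x33 + 1) * 0x200 = 0x6800.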

Again, let's use the differential approach to see what has changed:

Image

Here are the malicious directory entries:

Image

The last two directory entries present two interesting aspects: entry 6 is an ObjectPool, and the second one is marked as Orphaned.

The ObjectPool directory is a special Storage where OLE objects are stored; each OLE object is placed in a substorage of ObjectPool with a random name. Some COM and OLE components lack proper input filtering, and the resulting vulnerabilities allow an escalation of privileges, leading to the compromise of systems running MS Office applications.

This special storage holds, for example, embedded objects such as xls or other Office files.

The next step, as should be clear, is to inspect the content of ObjectPool. In this case we cannot use common document viewing tools that enumerate storages and streams: our malicious file has some malformations in its structure, and these lead common tools to errors because they make massive use of the IStorage and IStream interfaces. We need a tool that performs a "raw", direct inspection of the Compound File. The best choice is LibForensics, a library for developing digital forensics applications. Currently it is developed in pure Python.

http://libforensics.com/

Among its demo tools there is olels.py, which can enumerate storages and streams:


Command Line : out1.doc
r/r 3 \x05SummaryInformation
r/r 2 WordDocument
r/r 5 \x01CompObj
r/r 1 1Table
-/- 6 ObjectPool
r/r 4 \x05DocumentSummaryInformation
v/v 8 $Header
v/v 9 $DIFAT
v/v 10 $FAT
v/v 11 $MiniFAT

There is not much information about ObjectPool because, as you can see, there is a "-" before the ID (6). Let's carve out as much information as possible anyway: now we will use olestat.py, which shows detailed information on storages and streams.


Command Line : out1.doc 6
Stream id: 6
Name size: 5632
Name: ObjectPool
Type: Invalid / Unknown (0x0)
Color: Black (0x1)
Left stream id: 4294967040
Right stream id: STREAM_ID_NONE (0xFFFFFFFF)
Child stream id: STREAM_ID_NONE (0xFFFFFFFF)
CLSID: 000000ff-0000-0000-0000-000000000000
State: 0x0
Size: 0
Timestamps:
Stream creation time: 14409739024056848384
Stream modification time: 14409739024056848385
Mini stream sector(s):
1


Here we have some truly interesting information. As you can see, the Timestamps field is not zeroed. Given the static nature of the file, in other words the fact that the file is stored in a static block of data, this constitutes:

Evidence of a malicious file: Stream Creation Time and Stream Modification Time can be used to identify the threat.

Further investigation reveals that the malicious executable is placed in slack space. Here is a short clarification taken directly from the paper:

Since we are in front of a classical FAT FS implementation, we can also hypothesize the presence of slack space, as in every real file system. This also implies the probable presence of hidden information in unallocated and slack spaces. Obviously, unallocated data is not taken into consideration by the file interpreter, also because it does not contain valuable information for the ordinary user; but, as should be clear, an examiner is interested in extracting hidden information as well, so we need a specific procedure to carve out the blocks of unallocated data directly (manually or with an automated tool).

Image
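One way to automate part of that carving is to walk the FAT and list the sectors that are marked free but still hold data. The sketch below is a simplification under stated assumptions: it only follows the 109 DIFAT entries stored in the header (so it ignores extra DIFAT sectors of large files), it does not look at slack inside allocated sectors, and the function name is mine.

```python
import struct

FREESECT = 0xFFFFFFFF
MAX_REGSECT = 0xFFFFFFFA


def free_sectors_with_data(data: bytes):
    """Return (sector, file offset) for free sectors that are not all zero."""
    (sector_shift,) = struct.unpack_from("<H", data, 0x1E)
    size = 1 << sector_shift

    def sector(n: int) -> bytes:
        start = (n + 1) * size  # the header takes the first slot
        return data[start:start + size]

    # The header carries the first 109 DIFAT entries at offset 0x4C.
    fat = []
    for fat_sect in struct.unpack_from("<109I", data, 0x4C):
        if fat_sect > MAX_REGSECT:
            continue  # unused DIFAT slot
        raw = sector(fat_sect)
        count = len(raw) // 4
        fat.extend(struct.unpack(f"<{count}I", raw[:count * 4]))

    total = len(data) // size - 1  # whole sectors after the header
    hits = []
    for n in range(min(total, len(fat))):
        if fat[n] == FREESECT and any(sector(n)):
            hits.append((n, (n + 1) * size))
    return hits
```

Each reported offset is a candidate block to dump and inspect by hand; a free sector full of non-zero bytes in a freshly saved document deserves a second look.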

In our case this slack space is located right after the Orphaned DirectoryEntry:

Image

Then, at offset 0x4C00:

Image

You can verify the presence of the malicious file: indeed, the bytes 0x68 0x7F XORed one by one with 0x25 give 0x4D 0x5A, that is 'MZ'.

This short blog post shows the basic approach to the analysis of Microsoft Compound File formats. This method can be used for:
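The same check, extended to carve the whole region, fits in a couple of lines. The offset, maximum size and key below are the values discussed in this post; the function name is hypothetical.

```python
def carve_xored(data: bytes, offset: int = 0x4C00,
                size: int = 10240, key: int = 0x25) -> bytes:
    """Return the region at offset, decoded with a single-byte XOR key."""
    return bytes(b ^ key for b in data[offset:offset + size])


# The two bytes seen in the hex view decode to the DOS magic.
decoded_magic = bytes(b ^ 0x25 for b in bytes([0x68, 0x7F]))
```

Keep in mind that any zero padding after the real end of the executable decodes to a run of 0x25 bytes, so the carved block should be trimmed according to the sizes in the PE headers.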

Malware Analysis
Forensics Investigation

More details will be covered in my future paper.

See you in the next post.
Giuseppe 'Evilcry' Bonfa