A forum for reverse engineering, OS internals and malware analysis 

Forum for discussion about user-mode development.
 #156  by Dreg
 Mon Mar 15, 2010 9:17 am
X86IME from x86pfxlab: this is my favorite engine. It is an x86 and x86_64 (32/64-bit) disassembler/assembler written by my friend Pluf.

The engine is built around an intermediate object called x86im_instr_object. With this object you can generate instructions and view the disassembly either the way an LDE (length disassembler engine) would, or directly in Intel syntax:
Code: Select all
typedef struct _x86im_instr_object                      // x86 decoded/generated instruction:
{
    unsigned long mode;                                 // mode: 32/64bits
    unsigned long flags;                                // instr flags
    unsigned long id;                                   // instr id
    unsigned long grp;                                  // instr grp & subgrp
    unsigned long mnm;                                  // instr mnemonic
    unsigned long len;                                  // total instr length
    unsigned char def_opsz;                             // default operand size: 1/2/4/8
    unsigned char def_adsz;                             // default address size: 16bit = 2 | 32bit = 4 | 64bit = 8
    unsigned char opcode[3];                            // instr opcodes: up to 3
    unsigned char opcode_count;                         // instr opcode count
    unsigned short prefix;                              // instr prefixes ( mask )
    unsigned char prefix_values[4];                     // prefixes
    unsigned char prefix_count;                         // instr prefix count
    unsigned long prefix_order;                         // instr prefix order
    unsigned char rexp;                                 // REX prefix
    unsigned char somimp;                               // mandatory prefix: SOMI instr only: 0x66|0xF2|0xF3
    unsigned char n3did;                                // 3dnow instr id
    unsigned char seg;                                  // implicit segment register used by mem operands:
    unsigned char w_bit;                                // wide bit value: 0/1 - if IF_WBIT
    unsigned char s_bit;                                // sign-extend bit value: 0/1 - if IF_SBIT
    unsigned char d_bit;                                // direction bit value: 0/1 - if IF_DBIT
    unsigned char gg_fld;                               // granularity field value: 0-2 ( mmx ) - if IF_GGFLD
    unsigned char tttn_fld;                             // condition test field value: if IF_TTTN
    unsigned short selector;                            // explicit segment selector used by CALL/JMP far: IF_SEL
    unsigned long imm_size;                             // imm size: 0 | (1/2/4/8)
    unsigned long long imm;                             // imm value: 64bit max value ( if imm_size != 0 )
    unsigned long disp_size;                            // disp size: 0 | (1/2/4/8)
    unsigned long long disp;                            // disp value: 64bit max value ( if disp_size != 0 )
    unsigned char mem_flags;                            // mem flags: src/dst/..
    unsigned short mem_am;                              // addressing mode
    unsigned short mem_size;                            // operand size ( xxx ptr )
    unsigned char mem_base;                             // base reg : grp+id
    unsigned char mem_index;                            // index reg: grp+id
    unsigned char mem_scale;                            // scale reg: grp+id
    unsigned char modrm;                                // modrm byte value & fields: if IF_MODRM
    unsigned char sib;                                  // sib byte value & fields: if IF_SIB
    unsigned long rop[4];                               // imp/exp reg op array
    unsigned char rop_count;                            // imp/exp reg op count
    unsigned int status;
    void *data;
} x86im_instr_object;
To disassemble an instruction you pass three parameters: the x86im_instr_object, the mode (X86IM_IO_MODE_32BIT or 64BIT), and data, the buffer that holds the instruction.
Code: Select all
int __stdcall x86im_dec( __inout x86im_instr_object *io,
                          __in unsigned long mode,
                          __in unsigned char *data )
Example: disassembling a POP EAX instruction:
Code: Select all
x86im_instr_object io;
unsigned char d[] = "\x58"; /* opcode of POP EAX; matches the unsigned char * parameter */
x86im_dec( &io,
            X86IM_IO_MODE_32BIT,
            d );
The Intel-syntax string is available through io.data.

Generating an instruction takes two steps. First, build a valid instruction object from the instruction code and its reg/mem/disp/imm operands:
Code: Select all
int __stdcall x86im_gen( __inout x86im_instr_object *io,
                          __in unsigned long options,
                          __in unsigned long code,
                          __in unsigned long reg,
                          __in unsigned long mem,
                          __in unsigned long long disp,
                          __in unsigned long long imm )
Example of the generation of a POP EAX instruction:
Code: Select all
x86im_instr_object io;
x86im_gen( &io,
            X86IM_IO_MODE_32BIT|X86IM_GEN_OAT_NPO_D,
            X86IM_GEN_CODE_POP_RG1,
            X86IM_IO_ROP_ID_EAX, 0, 0, 0 );
The headers contain many useful macros, such as X86IM_GEN_CODE_POP_RG1, or X86IM_IO_IS_GPI_ADC(x), which performs the check ( ( (x)->id & 0xFFF0 ) == 0x0060 ). With these macros the code stays intuitive and you do not need hard-coded values with lots of comments... IMHO, of course.
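To show how a mask-style check like the one quoted above behaves, here is a minimal sketch. It does not use the x86im headers: instr_sketch, SKETCH_IS_GPI_ADC and is_adc_family are hypothetical names made up for this example, and the reading that the low four bits of id select a variant while the upper bits select the instruction family is an assumption drawn only from the 0xFFF0 mask.

```c
#include <stdint.h>

/* Hypothetical stand-in for the decoded-instruction object.
 * Only the id field is modelled; this is NOT the x86im structure. */
typedef struct {
    unsigned long id;
} instr_sketch;

/* Same shape as the check quoted in the post: drop the low nibble,
 * then compare the remaining bits against the family value 0x0060. */
#define SKETCH_IS_GPI_ADC(x) ((((x)->id) & 0xFFF0UL) == 0x0060UL)

/* Function wrapper so the check can also be used where a macro is awkward. */
static int is_adc_family(const instr_sketch *io)
{
    return SKETCH_IS_GPI_ADC(io) ? 1 : 0;
}
```

With this sketch, any id from 0x0060 to 0x006F passes the check and 0x0070 does not, which is why one macro can cover every variant of an instruction without spelling out each value.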

The next step is to encode the instruction with the x86im_enc interface:
Code: Select all
int __stdcall x86im_enc( __inout x86im_instr_object *io,
                          __out unsigned char *data )
This function writes the actual instruction bytes into the data buffer. To get the raw bytes of the POP EAX instruction that x86im_gen produced in io:
Code: Select all
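/* io is the object that x86im_gen filled in above; do not redeclare it */
unsigned char data[1];      /* POP EAX encodes to a single byte */
x86im_enc( &io, data );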
Now you can dump the raw instruction bytes stored in data wherever you need them.
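As a rough illustration of what ends up in the buffer, the one-byte form of POP r32 is simply 0x58 plus the register number, so POP EAX is 58. The sketch below is independent of x86im: encode_pop_r32, hex_dump and the REG_* constants are hypothetical helpers invented for this example, not part of the engine.

```c
#include <stdint.h>
#include <stdio.h>

/* Hypothetical register numbers for this sketch (hardware encoding order). */
enum { REG_EAX = 0, REG_ECX, REG_EDX, REG_EBX, REG_ESP, REG_EBP, REG_ESI, REG_EDI };

/* Encodes the one-byte form of POP r32 (0x58 + register number).
 * Returns the number of bytes written, or 0 if reg is out of range. */
static size_t encode_pop_r32(unsigned reg, uint8_t *out)
{
    if (reg > 7)
        return 0;
    out[0] = (uint8_t)(0x58 + reg);
    return 1;
}

/* Formats len bytes as space-separated upper-case hex into text.
 * Stops early if text is too small; text is always NUL-terminated. */
static void hex_dump(const uint8_t *bytes, size_t len, char *text, size_t text_len)
{
    size_t pos = 0;
    if (text_len == 0)
        return;
    text[0] = '\0';
    for (size_t i = 0; i < len && pos + 3 < text_len; i++) {
        pos += (size_t)snprintf(text + pos, text_len - pos,
                                i ? " %02X" : "%02X", (unsigned)bytes[i]);
    }
}
```

Encoding REG_EAX and passing the result to hex_dump gives the string "58", the same byte used in the disassembly example earlier in the post.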

This engine can also generate the same instruction in its redundant encodings. Take the ADD instruction, for example:

Raw instruction: 03 C3
INTEL representation: ADD EAX, EBX
Mod:11, reg:000 and r/m:011

The same instruction is also encoded by the raw bytes 01 D8, with Mod:11, reg:011 and r/m:000.

You can generate any of these redundant encodings through the macros, without hard-coded values.
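To see why 03 C3 and 01 D8 are the same instruction, it helps to split the ModRM byte into its mod/reg/r/m fields and apply the direction implied by the opcode (03 is ADD r32, r/m32; 01 is ADD r/m32, r32). The following is a minimal standalone sketch, not x86im code; split_modrm and format_add_rr are hypothetical helper names, and it only handles the register-to-register case (Mod:11).

```c
#include <stdint.h>
#include <stdio.h>

/* The three ModRM fields: mod (bits 7-6), reg (bits 5-3), r/m (bits 2-0). */
typedef struct {
    uint8_t mod;
    uint8_t reg;
    uint8_t rm;
} modrm_fields;

static modrm_fields split_modrm(uint8_t modrm)
{
    modrm_fields f;
    f.mod = (uint8_t)((modrm >> 6) & 0x3);
    f.reg = (uint8_t)((modrm >> 3) & 0x7);
    f.rm  = (uint8_t)(modrm & 0x7);
    return f;
}

static const char *const reg32_names[8] = {
    "EAX", "ECX", "EDX", "EBX", "ESP", "EBP", "ESI", "EDI"
};

/* Formats a two-byte register-to-register ADD (opcode 01 or 03, mod == 11).
 * Returns 0 on success, -1 for anything this sketch does not handle. */
static int format_add_rr(const uint8_t insn[2], char *out, size_t out_len)
{
    modrm_fields f = split_modrm(insn[1]);
    if (f.mod != 3)
        return -1;
    if (insn[0] == 0x03) {          /* ADD r32, r/m32: destination is reg */
        snprintf(out, out_len, "ADD %s, %s", reg32_names[f.reg], reg32_names[f.rm]);
    } else if (insn[0] == 0x01) {   /* ADD r/m32, r32: destination is r/m */
        snprintf(out, out_len, "ADD %s, %s", reg32_names[f.rm], reg32_names[f.reg]);
    } else {
        return -1;
    }
    return 0;
}
```

Feeding both byte pairs through format_add_rr produces the text "ADD EAX, EBX" each time: the opcode's direction bit swaps which field is the destination, and the swapped reg and r/m values cancel that out.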

Download X86IME v1.0: http://sites.google.com/site/x86pfxlab/projects
Patch to compile in UNIX by nibble: http://nibble.develsec.org/get/x86im-1.0b.tar.gz
Last edited by GamingMasteR on Thu Oct 14, 2010 5:43 pm, edited 1 time in total. Reason: Added [code] tag
 #472  by j00ru
 Sat Mar 27, 2010 1:45 pm
When it comes to disassembly engines, I would recommend two more projects.

1. Udis86
Project website: http://udis86.sourceforge.net/
Description: A tiny engine that aims to be as simple and convenient for the programmer as possible. I personally use it whenever I need to quickly add a disassembly routine to a tool. From the project site:
udis86 is an easy-to-use minimalistic disassembler library (libudis86) for the x86 and x86-64 class of instruction set architectures. The primary intent of the design and development of udis86 is to aid software development projects that entail binary code analysis.
2. diStorm64
Project website: http://ragestorm.net/distorm/
Description: A more professional engine, supporting numerous processor extensions (MMX, SSE, SSE2, SSE3, SSSE3, SSE4, 3DNow!). From the project site:
diStorm is a binary stream disassembler. It's capable of disassembling 80x86 instructions in 64 bits (AMD64, X86-64) and both in 16 and 32 bits. In addition, it disassembles FPU, MMX, SSE, SSE2, SSE3, SSSE3, SSE4, 3DNow! (w/ extensions), new x86-64 instruction sets, VMX, and AMD's SVM! diStorm was written to decode quickly every instruction as accurately as possible. Robust decoding, while taking special care for valid or unused prefixes, is what makes this disassembler powerful, especially for research. Another benefit that might come in handy is that the module was written as multi-threaded, which means you could disassemble several streams or more simultaneously.
Have fun! ;>
 #2882  by cvndgf
 Mon Sep 27, 2010 8:46 am
x86im bug:
When disassembling the "call offset" instruction, imm_size is set to 0 and the imm field contains an invalid value.
Sorry for my english, thx.
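For reference, a near CALL with a 32-bit relative offset is the byte E8 followed by a 4-byte little-endian signed offset, and the target is the address of the next instruction plus that offset. The sketch below shows the values one would typically expect a decoder to report for it. It is a standalone illustration under those assumptions, not x86im code or a fix for it: call_rel32_info and decode_call_rel32 are hypothetical names, and whether x86im is meant to store the offset in imm or elsewhere is not confirmed here.

```c
#include <stddef.h>
#include <stdint.h>

/* What this sketch reports for a CALL rel32 (names are hypothetical). */
typedef struct {
    size_t   len;       /* total instruction length in bytes      */
    unsigned imm_size;  /* size of the relative offset in bytes   */
    int32_t  rel;       /* sign-extended relative offset          */
    uint32_t target;    /* address + len + rel, modulo 2^32       */
} call_rel32_info;

/* Decodes E8 rel32 found at 'code' (32-bit mode, default operand size).
 * 'address' is the virtual address of the E8 byte.
 * Returns 0 on success, -1 if the bytes are not a complete CALL rel32. */
static int decode_call_rel32(const uint8_t *code, size_t avail,
                             uint32_t address, call_rel32_info *out)
{
    uint32_t raw;

    if (avail < 5 || code[0] != 0xE8)
        return -1;

    /* Assemble the little-endian 32-bit offset byte by byte. */
    raw = (uint32_t)code[1]
        | ((uint32_t)code[2] << 8)
        | ((uint32_t)code[3] << 16)
        | ((uint32_t)code[4] << 24);

    out->len      = 5;
    out->imm_size = 4;
    out->rel      = (int32_t)raw;
    out->target   = address + 5u + raw;   /* unsigned wrap-around matches EIP */
    return 0;
}
```

For the bytes E8 FB FF FF FF at address 0x401000 this reports a length of 5, an offset size of 4, a relative offset of -5 and a target of 0x401000, that is, a call to itself.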
 #3044  by frank_boldewin
 Thu Oct 14, 2010 4:13 pm
I usually use CADT v1.1 from MS-REM.

Sample usage from my simple DisView tool, which is part of OfficeMalScanner.
Code: Select all
// DisView (c) 2009 by Frank Boldewin / http://www.reconstructer.org
// Just a simple CADT based disasm viewer
// Latest build with Microsoft Visual Studio 2008:
// cl /c /nologo DisView.cpp
// link /dynamicbase /nxcompat DisView.obj

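// The CRT overload macros start with an underscore and only take effect
// when defined before the first CRT header is included.
#define _CRT_SECURE_CPP_OVERLOAD_STANDARD_NAMES 1
#define _CRT_SECURE_CPP_OVERLOAD_STANDARD_NAMES_MEMORY 1

#include <windows.h>
#include <stdio.h>
#include <stdlib.h>   // exit, malloc, strtoul
#include <string.h>
#include "cadtlib.h"

#pragma comment(lib,"cadt.lib")
#pragma strict_gs_check(on)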

LPBYTE lpGlobalBuffer = NULL;
DWORD  GlobalBufferSize = 0;

void Usage()
{
    printf("\nUsage:\n------\n");
    printf("DisView <file> <offset to start>\n");
    printf("\nExample:\n");
    printf("\tDisView evil.ppt 0x10e4c\n");
    exit(-1);
}

void Disasm(PVOID addr, ULONG naddr)
{
  char dBuff[1024];
  PVOID cPtr = 0;
  ULONG Len = 0;
  UINT i = 0;
  UINT j = 0;
  TDisCommand Command;
  TInstruction Instr;
  TMnemonicOptios Options;

  cPtr = addr;
  Options.RealtiveOffsets = TRUE;
  Options.AddAddresPart   = TRUE;
  Options.AlternativeAddres = DWORD(naddr);
  Options.AddHexDump = TRUE;
  Options.MnemonicAlign = 35;

  printf("\n");

  for(i=0;i<48;i++)
  {
    memset(&Instr, 0, sizeof(TInstruction));
    memset(&Command, 0, sizeof(TDisCommand));
    Len = InstrDecode(cPtr, &Instr, FALSE);
    InstrDasm(&Instr, &Command, FALSE);
    MakeMnemonic(dBuff, &Command, &Options);
    printf("%s\n",dBuff);
    cPtr = (PVOID)((ULONG_PTR)cPtr + Len);   // ULONG_PTR avoids truncating the pointer on 64-bit builds
    j = j + Len;
    Options.AlternativeAddres = DWORD(naddr)+j;
  }

  printf("--------------------------------------------------------------------------\n\n");

  return;
}

int main(int argc, char *argv[])
{
  HANDLE hFile = INVALID_HANDLE_VALUE;
  HANDLE hFileMappingObject = NULL;
  LPBYTE lpBaseAddr = NULL;
  LPBYTE lpImageBuffer = NULL;
  DWORD  dwSize = 0;
  unsigned long addr;
  char *addrval;


  unsigned int FuncNamesArraySize = 15;
  HANDLE hConsole = GetStdHandle( STD_OUTPUT_HANDLE );

  if (argc!=3)
    Usage();
  
  if ((hFile = CreateFile(argv[1],
                          GENERIC_READ,
                          FILE_SHARE_READ,
                          NULL,
                          OPEN_EXISTING,
                          FILE_ATTRIBUTE_NORMAL,
                          NULL)) == INVALID_HANDLE_VALUE)
  {
    printf ("\nCannot open file %s\n",argv[1]);
    exit(-2);
  }

  if (((dwSize = GetFileSize(hFile,
                             NULL)) == INVALID_FILE_SIZE))
  {
    printf ("\nUnable to retrieve filesize for %s. RC=%d\n",argv[1],GetLastError());
    exit(-3);
  }
  
  printf("Filesize is %lu (0x%x) Bytes\n",dwSize,dwSize);

  if((hFileMappingObject = CreateFileMapping(hFile,
                                             NULL,
                                             PAGE_READONLY,
                                             0,
                                             0,
                                             NULL)) == NULL)
  {
    printf("\nError creating a filemapping. RC=%d\n",GetLastError());
    exit(-4);
  }

  if ((lpBaseAddr = (LPBYTE)MapViewOfFile(hFileMappingObject,
                                          FILE_MAP_READ,
                                          0,
                                          0,
                                          0)) == NULL)
  {
    printf("\nError while mapping view of file. RC=%d\n",GetLastError());
    exit(-5);
  }

  lpGlobalBuffer = (LPBYTE)LocalAlloc(LMEM_FIXED | LMEM_ZEROINIT, dwSize+1);
  if (lpGlobalBuffer)
  {
      GlobalBufferSize = dwSize+1;
      CopyMemory(lpGlobalBuffer, lpBaseAddr, dwSize);
      UnmapViewOfFile(lpBaseAddr);
      CloseHandle(hFileMappingObject);
      CloseHandle(hFile);
      addrval = (char*)malloc(strlen(argv[2])+1);
      memset(addrval,0,strlen(argv[2])+1);
      strncpy(addrval,argv[2],strlen(argv[2]));
      if ((addrval[0]=='0') && ((addrval[1]=='x')||(addrval[1]=='X'))) addrval+=2;
  
      if(strlen(addrval) > 8)
      {
        printf("\nHexvalue of offset exeeds limit!\n");
        exit(-6);
      }
      else
      {
        addr = strtoul (addrval, NULL, 16);
        if(addr >= dwSize)   // offset dwSize is already one byte past the end of the file
        {
          printf("\nSorry, the given address is out of the filerange!\n");
          exit(-8);
        }
        Disasm(lpGlobalBuffer + addr,addr);
      }
      LocalFree(lpGlobalBuffer);
  }
  else
  {
      printf("LocalAlloc() Error - rc= %08x\n", GetLastError());
      exit(-7);
  }

  exit(0);
}
Last edited by GamingMasteR on Thu Oct 14, 2010 4:24 pm, edited 1 time in total. Reason: Added [code] tag
 #3046  by GamingMasteR
 Thu Oct 14, 2010 4:31 pm
I tried to port the CADT source code to C, but something went wrong.
I'm not that good at Pascal; maybe someone will make it work some day :)
 #4315  by EP_X0FF
 Fri Jan 07, 2011 6:10 am
The original Ms-Rem CADT seems to have a bug in the GetSibStr function: it tries to StrCpy a nil pointer into the output buffer, resulting in an access violation.
 #5460  by EreTIk
 Mon Mar 14, 2011 4:59 pm
HDE64C: a small disassembler engine intended for parsing x86-64 code
Author: Patkov, Veacheslav

Features:
[+] support General-Purpose, FPU, MMX, SSE-SSE3, 3DNow! instructions
[+] high-speed and small size
[+] operating system independent code
[-] version is final; support for new instructions/features has been dropped

Author's notes/Download: http://vx.netlux.org/vx.php?id=eh04