[Scummvm-cvs-logs] SF.net SVN: scummvm:[50767] tools/branches/gsoc2010-decompiler/decompiler

pidgeot at users.sourceforge.net
Fri Jul 9 20:45:33 CEST 2010


Revision: 50767
          http://scummvm.svn.sourceforge.net/scummvm/?rev=50767&view=rev
Author:   pidgeot
Date:     2010-07-09 18:45:32 +0000 (Fri, 09 Jul 2010)

Log Message:
-----------
Add support for storing opcode number as part of Instruction
Make SimpleDisassembler automatically update this field
Add opcode number to tests
Add documentation about _opcode field

Modified Paths:
--------------
    tools/branches/gsoc2010-decompiler/decompiler/doc/disassembler.tex
    tools/branches/gsoc2010-decompiler/decompiler/instruction.h
    tools/branches/gsoc2010-decompiler/decompiler/simple_disassembler.h
    tools/branches/gsoc2010-decompiler/decompiler/test/disassembler_test.h

Modified: tools/branches/gsoc2010-decompiler/decompiler/doc/disassembler.tex
===================================================================
--- tools/branches/gsoc2010-decompiler/decompiler/doc/disassembler.tex	2010-07-09 18:35:46 UTC (rev 50766)
+++ tools/branches/gsoc2010-decompiler/decompiler/doc/disassembler.tex	2010-07-09 18:45:32 UTC (rev 50767)
@@ -19,6 +19,7 @@
 
 Each member of this struct has a specific purpose:
 \begin{itemize}
+\item \verb+_opcode+ stores the numeric opcode of the instruction. The decompiler itself does not use this field; it is provided for your reference during later stages of the decompilation process. The field is declared as a 32-bit unsigned integer, so if your opcodes are longer than 4 bytes, you will need to decide which 4 bytes to store in it.
 \item \verb+_address+ stores the absolute memory address where this instruction would be loaded into memory.
 \item \verb+_stackChange+ stores the net change in stack size caused by executing this instruction; for example, if the instruction pushes a byte onto the stack, this should be set to 1. This is used to determine where each statement ends. The count can be in any unit you wish (bytes, words, bits), as long as the same unit is used for all instructions. This means that if your stack only works with 16-bit elements, pushing an 8-bit value and pushing a 16-bit value should have the same net effect on the stack.
 \item \verb+_name+ contains the name of the instruction. You will use this during code generation.
@@ -134,6 +135,8 @@
 \end{lstlisting}
 \end{C++}
 
+The \verb+OPCODE+ macro automatically stores the full opcode in the \verb+_opcode+ field of the generated \verb+Instruction+.
+
 \subsubsection{Parameter reading}
 \verb+PUSH+, \verb+PUSH2+ and \verb+PRINT+ all take parameters as part of the instruction. To read these, you must specify them as part of the parameter string, using one character per parameter. The types understood by default are specified in Table~\vref{tbl:paramtypes}.
 
@@ -223,7 +226,7 @@
 \end{lstlisting}
 \end{C++}
 
-Subopcodes can be nested if the instruction set requires it.
+Subopcodes can be nested if the instruction set requires it. For subopcodes, the \verb+_opcode+ field stores the bytes in the order they appear in the file, most significant byte first; for example, the HALT instruction would have the opcode value 0xFF00. If an opcode is longer than 4 bytes, only its last 4 bytes are stored.
 
 \subsubsection{Advanced opcode handling}
 If you have one or two opcodes that don't quite fit into the framework provided, you can define your own specialized handling for these opcodes.
@@ -238,4 +241,6 @@
 \end{lstlisting}
 \end{C++}
 
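+\verb+OPCODE_BASE+ automatically keeps track of the current opcode value: each use shifts the bytes read so far left by 8 bits and adds \verb+val+ to the local variable \verb+full_opcode+. Read \verb+full_opcode+ inside your handler to get the full opcode.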
+
 For your convenience, a few additional macros are available: \verb+ADD_INST+, which adds an empty instruction to the vector; \verb+LAST_INST+, which retrieves the last instruction in the vector; and \verb+INC_ADDR+, a shorthand for incrementing the address variable by 1. Note that you should \emph{not} increment the address for the opcode itself; this is handled by the other macros.
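The byte ordering and truncation rules documented above follow directly from the update `full_opcode = (full_opcode << 8) + val` that this commit adds to `OPCODE_BASE` in simple_disassembler.h. The minimal sketch below reproduces only that arithmetic. `accumulateOpcode` is a hypothetical helper written for illustration, not part of the decompiler, and the byte sequences are examples (0xFF 0x00 matches the HALT value from the documentation, 0x6B 0x99 the `cursorCmd_Image` value checked in the tests).

```cpp
#include <cassert>
#include <cstdint>
#include <vector>

// Hypothetical helper mirroring the update OPCODE_BASE(val) performs for each
// opcode or subopcode byte: shift the previous value left by one byte and add
// the new byte. The accumulator is a 32-bit unsigned integer, so bits shifted
// past bit 31 are discarded and only the last four bytes survive.
uint32_t accumulateOpcode(const std::vector<uint8_t> &bytes) {
	uint32_t full_opcode = 0; // START_OPCODES resets this for every instruction
	for (uint8_t b : bytes)
		full_opcode = (full_opcode << 8) + b;
	return full_opcode;
}
```

For instance, `accumulateOpcode({0x01, 0x02, 0x03, 0x04, 0x05})` yields 0x02030405. One consequence of this encoding is that leading zero bytes vanish: a subopcode 0x05 under a primary opcode 0x00 produces the same `_opcode` value as a single-byte opcode 0x05, so the field alone does not always identify how many bytes were read.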

Modified: tools/branches/gsoc2010-decompiler/decompiler/instruction.h
===================================================================
--- tools/branches/gsoc2010-decompiler/decompiler/instruction.h	2010-07-09 18:35:46 UTC (rev 50766)
+++ tools/branches/gsoc2010-decompiler/decompiler/instruction.h	2010-07-09 18:45:32 UTC (rev 50767)
@@ -97,6 +97,7 @@
  * Structure for representing an instruction.
  */
 struct Instruction {
+	uint32 _opcode;                 ///< The instruction opcode.
 	uint32 _address;                ///< The instruction address.
 	int16 _stackChange;             ///< How much this instruction changes the stack pointer by.
 	std::string _name;              ///< The instruction name (opcode name).

Modified: tools/branches/gsoc2010-decompiler/decompiler/simple_disassembler.h
===================================================================
--- tools/branches/gsoc2010-decompiler/decompiler/simple_disassembler.h	2010-07-09 18:35:46 UTC (rev 50766)
+++ tools/branches/gsoc2010-decompiler/decompiler/simple_disassembler.h	2010-07-09 18:45:32 UTC (rev 50767)
@@ -56,6 +56,7 @@
 #define START_OPCODES \
 	_address = _addressBase; \
 	while (_f.pos() != (int)_f.size()) { \
+		uint32 full_opcode = 0; \
 		uint8 opcode = _f.readByte(); \
 		switch (opcode) {
 #define END_OPCODES \
@@ -65,12 +66,16 @@
 		INC_ADDR; \
 	}
 
-#define OPCODE_BASE(val) case val:
+#define OPCODE_BASE(val) \
+	case val: \
+		full_opcode = (full_opcode << 8) + val;
+
 #define OPCODE_END break;
 
 #define OPCODE(val, name, category, stackChange, params) \
 	OPCODE_BASE(val)\
 		ADD_INST; \
+		LAST_INST._opcode = full_opcode; \
 		LAST_INST._address = _address; \
 		LAST_INST._stackChange = stackChange; \
 		LAST_INST._name = std::string(name); \

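To see how these macros cooperate for nested subopcodes, here is a hand-expanded sketch of roughly the control flow they generate. It is not the literal macro expansion: `Instruction` is a reduced stand-in with only three fields, input comes from a `std::vector<uint8_t>` instead of the `_f` file object, parameter reading and `_stackChange` are omitted, and the three-instruction set (0x01 PUSH, 0xFF 0x00 HALT, 0xFF 0xFF FOO) is assumed, chosen only to reproduce the 0xFF00 and 0xFFFF values mentioned in the documentation and tests. `disassembleSketch` is a hypothetical name.

```cpp
#include <cassert>
#include <cstdint>
#include <stdexcept>
#include <string>
#include <vector>

// Hypothetical, reduced stand-in for the decompiler's Instruction struct.
struct Instruction {
	uint32_t _opcode;
	uint32_t _address;
	std::string _name;
};

// Sketch of the loop that START_OPCODES / OPCODE_BASE / OPCODE roughly produce
// for a made-up instruction set with one level of subopcodes.
std::vector<Instruction> disassembleSketch(const std::vector<uint8_t> &code) {
	std::vector<Instruction> insts;
	size_t pos = 0;
	while (pos < code.size()) {
		const uint32_t address = static_cast<uint32_t>(pos); // address base 0
		uint32_t full_opcode = 0;                 // reset for every instruction
		const uint8_t opcode = code[pos++];
		switch (opcode) {
		case 0x01:
			full_opcode = (full_opcode << 8) + 0x01; // OPCODE_BASE(0x01)
			insts.push_back({full_opcode, address, "PUSH"});
			break;
		case 0xFF: {
			full_opcode = (full_opcode << 8) + 0xFF; // outer OPCODE_BASE(0xFF)
			if (pos >= code.size())
				throw std::runtime_error("truncated subopcode");
			const uint8_t sub = code[pos++];
			switch (sub) {
			case 0x00:
				full_opcode = (full_opcode << 8) + 0x00; // inner OPCODE_BASE(0x00)
				insts.push_back({full_opcode, address, "HALT"});
				break;
			case 0xFF:
				full_opcode = (full_opcode << 8) + 0xFF; // inner OPCODE_BASE(0xFF)
				insts.push_back({full_opcode, address, "FOO"});
				break;
			default:
				throw std::runtime_error("unknown subopcode");
			}
			break;
		}
		default:
			throw std::runtime_error("unknown opcode");
		}
	}
	return insts;
}
```

Feeding it the bytes 01 FF 00 FF FF yields PUSH at address 0 with `_opcode` 0x01, HALT at address 1 with 0xFF00, and FOO at address 3 with 0xFFFF. Declaring `full_opcode` inside the loop body, as START_OPCODES does, is what keeps one instruction's bytes from leaking into the next instruction's opcode value.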
Modified: tools/branches/gsoc2010-decompiler/decompiler/test/disassembler_test.h
===================================================================
--- tools/branches/gsoc2010-decompiler/decompiler/test/disassembler_test.h	2010-07-09 18:35:46 UTC (rev 50766)
+++ tools/branches/gsoc2010-decompiler/decompiler/test/disassembler_test.h	2010-07-09 18:45:32 UTC (rev 50767)
@@ -34,6 +34,7 @@
 			p.open("decompiler/test/hanoi20.pasb");
 			std::vector<Instruction> insts = p.disassemble();
 			TS_ASSERT(insts[0]._address == 0);
+			TS_ASSERT(insts[0]._opcode == 0x00);
 			TS_ASSERT(insts[0]._name == "PUSH");
 			TS_ASSERT(insts[0]._stackChange == 0);
 			TS_ASSERT(insts[0]._params[0]._type == kInt);
@@ -53,6 +54,7 @@
 			s.open("decompiler/test/subopcode_test.bin");
 			std::vector<Instruction> insts = s.disassemble();
 			TS_ASSERT(insts[0]._name == "FOO");
+			TS_ASSERT(insts[0]._opcode == 0xFFFF);
 		} catch (...) {
 			TS_ASSERT(false);
 		}
@@ -79,17 +81,47 @@
 			s.open("decompiler/test/script-15.dmp");
 			std::vector<Instruction> insts = s.disassemble();
 			TS_ASSERT(insts.size() == 11);
-			TS_ASSERT(insts[0]._address == 0 && insts[0]._name == "pushWordVar" && insts[0]._params[0].getUnsigned() == 16384);
-			TS_ASSERT(insts[1]._address == 3 && insts[1]._name == "writeWordVar" && insts[1]._params[0].getUnsigned() == 197);
-			TS_ASSERT(insts[2]._address == 6 && insts[2]._name == "pushWord" && insts[2]._params[0].getSigned() == 0);
-			TS_ASSERT(insts[3]._address == 9 && insts[3]._name == "pushWord" && insts[3]._params[0].getSigned() == 11);
-			TS_ASSERT(insts[4]._address == 12 && insts[4]._name == "pushWord" && insts[4]._params[0].getSigned() == 0);
-			TS_ASSERT(insts[5]._address == 15 && insts[5]._name == "startScript");
-			TS_ASSERT(insts[6]._address == 16 && insts[6]._name == "pushWord" && insts[6]._params[0].getSigned() == 0);
-			TS_ASSERT(insts[7]._address == 19 && insts[7]._name == "pushWord" && insts[7]._params[0].getSigned() == 14);
-			TS_ASSERT(insts[8]._address == 22 && insts[8]._name == "pushWord" && insts[8]._params[0].getSigned() == 0);
-			TS_ASSERT(insts[9]._address == 25 && insts[9]._name == "startScript");
-			TS_ASSERT(insts[10]._address == 26 && insts[10]._name == "stopObjectCodeB");
+			TS_ASSERT(insts[0]._address == 0);
+			TS_ASSERT(insts[0]._opcode == 0x03);
+			TS_ASSERT(insts[0]._name == "pushWordVar");
+			TS_ASSERT(insts[0]._params[0].getUnsigned() == 16384);
+			TS_ASSERT(insts[1]._address == 3);
+			TS_ASSERT(insts[1]._opcode == 0x43);
+			TS_ASSERT(insts[1]._name == "writeWordVar");
+			TS_ASSERT(insts[1]._params[0].getUnsigned() == 197);
+			TS_ASSERT(insts[2]._address == 6);
+			TS_ASSERT(insts[2]._opcode == 0x01);
+			TS_ASSERT(insts[2]._name == "pushWord");
+			TS_ASSERT(insts[2]._params[0].getSigned() == 0);
+			TS_ASSERT(insts[3]._address == 9);
+			TS_ASSERT(insts[3]._opcode == 0x01);
+			TS_ASSERT(insts[3]._name == "pushWord");
+			TS_ASSERT(insts[3]._params[0].getSigned() == 11);
+			TS_ASSERT(insts[4]._address == 12);
+			TS_ASSERT(insts[4]._opcode == 0x01);
+			TS_ASSERT(insts[4]._name == "pushWord");
+			TS_ASSERT(insts[4]._params[0].getSigned() == 0);
+			TS_ASSERT(insts[5]._address == 15);
+			TS_ASSERT(insts[5]._opcode == 0x5E);
+			TS_ASSERT(insts[5]._name == "startScript");
+			TS_ASSERT(insts[6]._address == 16);
+			TS_ASSERT(insts[6]._opcode == 0x01);
+			TS_ASSERT(insts[6]._name == "pushWord");
+			TS_ASSERT(insts[6]._params[0].getSigned() == 0);
+			TS_ASSERT(insts[7]._address == 19);
+			TS_ASSERT(insts[7]._opcode == 0x01);
+			TS_ASSERT(insts[7]._name == "pushWord");
+			TS_ASSERT(insts[7]._params[0].getSigned() == 14);
+			TS_ASSERT(insts[8]._address == 22);
+			TS_ASSERT(insts[8]._opcode == 0x01);
+			TS_ASSERT(insts[8]._name == "pushWord");
+			TS_ASSERT(insts[8]._params[0].getSigned() == 0);
+			TS_ASSERT(insts[9]._address == 25);
+			TS_ASSERT(insts[9]._opcode == 0x5E);
+			TS_ASSERT(insts[9]._name == "startScript");
+			TS_ASSERT(insts[10]._address == 26);
+			TS_ASSERT(insts[10]._opcode == 0x66);
+			TS_ASSERT(insts[10]._name == "stopObjectCodeB");
 		} catch (...) {
 			TS_ASSERT(false);
 		}
@@ -103,11 +135,25 @@
 			s.open("decompiler/test/script-31.dmp");
 			std::vector<Instruction> insts = s.disassemble();
 			TS_ASSERT(insts.size() == 5);
-			TS_ASSERT(insts[0]._address == 0 && insts[0]._name == "pushWord" && insts[0]._params[0].getSigned() == 0);
-			TS_ASSERT(insts[1]._address == 3 && insts[1]._name == "writeWordVar" && insts[1]._params[0].getUnsigned() == 180);
-			TS_ASSERT(insts[2]._address == 6 && insts[2]._name == "pushWord" && insts[2]._params[0].getSigned() == 0);
-			TS_ASSERT(insts[3]._address == 9 && insts[3]._name == "writeWordVar" && insts[3]._params[0].getUnsigned() == 181);
-			TS_ASSERT(insts[4]._address == 12 && insts[4]._name == "stopObjectCodeB");
+			TS_ASSERT(insts[0]._address == 0);
+			TS_ASSERT(insts[0]._opcode == 0x01);
+			TS_ASSERT(insts[0]._name == "pushWord");
+			TS_ASSERT(insts[0]._params[0].getSigned() == 0);
+			TS_ASSERT(insts[1]._address == 3);
+			TS_ASSERT(insts[1]._opcode == 0x43);
+			TS_ASSERT(insts[1]._name == "writeWordVar");
+			TS_ASSERT(insts[1]._params[0].getUnsigned() == 180);
+			TS_ASSERT(insts[2]._address == 6);
+			TS_ASSERT(insts[2]._opcode == 0x01);
+			TS_ASSERT(insts[2]._name == "pushWord");
+			TS_ASSERT(insts[2]._params[0].getSigned() == 0);
+			TS_ASSERT(insts[3]._address == 9);
+			TS_ASSERT(insts[3]._opcode == 0x43);
+			TS_ASSERT(insts[3]._name == "writeWordVar");
+			TS_ASSERT(insts[3]._params[0].getUnsigned() == 181);
+			TS_ASSERT(insts[4]._address == 12);
+			TS_ASSERT(insts[4]._opcode == 0x66);
+			TS_ASSERT(insts[4]._name == "stopObjectCodeB");
 		} catch (...) {
 			TS_ASSERT(false);
 		}
@@ -121,16 +167,44 @@
 			s.open("decompiler/test/script-33.dmp");
 			std::vector<Instruction> insts = s.disassemble();
 			TS_ASSERT(insts.size() == 10);
-			TS_ASSERT(insts[0]._address == 0 && insts[0]._name == "pushWord" && insts[0]._params[0].getSigned() == 0);
-			TS_ASSERT(insts[1]._address == 3 && insts[1]._name == "writeWordVar" && insts[1]._params[0].getUnsigned() == 71);
-			TS_ASSERT(insts[2]._address == 6 && insts[2]._name == "pushWordVar" && insts[2]._params[0].getUnsigned() == 177);
-			TS_ASSERT(insts[3]._address == 9 && insts[3]._name == "writeWordVar" && insts[3]._params[0].getUnsigned() == 173);
-			TS_ASSERT(insts[4]._address == 12 && insts[4]._name == "pushWord" && insts[4]._params[0].getSigned() == 874);
-			TS_ASSERT(insts[5]._address == 15 && insts[5]._name == "writeWordVar" && insts[5]._params[0].getUnsigned() == 177);
-			TS_ASSERT(insts[6]._address == 18 && insts[6]._name == "pushWordVar" && insts[6]._params[0].getUnsigned() == 177);
-			TS_ASSERT(insts[7]._address == 21 && insts[7]._name == "pushWord" && insts[7]._params[0].getSigned() == 93);
-			TS_ASSERT(insts[8]._address == 24 && insts[8]._name == "cursorCmd_Image");
-			TS_ASSERT(insts[9]._address == 26 && insts[9]._name == "stopObjectCodeB");
+			TS_ASSERT(insts[0]._address == 0);
+			TS_ASSERT(insts[0]._opcode == 0x01);
+			TS_ASSERT(insts[0]._name == "pushWord");
+			TS_ASSERT(insts[0]._params[0].getSigned() == 0);
+			TS_ASSERT(insts[1]._address == 3);
+			TS_ASSERT(insts[1]._opcode == 0x43);
+			TS_ASSERT(insts[1]._name == "writeWordVar");
+			TS_ASSERT(insts[1]._params[0].getUnsigned() == 71);
+			TS_ASSERT(insts[2]._address == 6);
+			TS_ASSERT(insts[2]._opcode == 0x03);
+			TS_ASSERT(insts[2]._name == "pushWordVar");
+			TS_ASSERT(insts[2]._params[0].getUnsigned() == 177);
+			TS_ASSERT(insts[3]._address == 9);
+			TS_ASSERT(insts[3]._opcode == 0x43);
+			TS_ASSERT(insts[3]._name == "writeWordVar");
+			TS_ASSERT(insts[3]._params[0].getUnsigned() == 173);
+			TS_ASSERT(insts[4]._address == 12);
+			TS_ASSERT(insts[4]._opcode == 0x01);
+			TS_ASSERT(insts[4]._name == "pushWord");
+			TS_ASSERT(insts[4]._params[0].getSigned() == 874);
+			TS_ASSERT(insts[5]._address == 15);
+			TS_ASSERT(insts[5]._opcode == 0x43);
+			TS_ASSERT(insts[5]._name == "writeWordVar");
+			TS_ASSERT(insts[5]._params[0].getUnsigned() == 177);
+			TS_ASSERT(insts[6]._address == 18);
+			TS_ASSERT(insts[6]._opcode == 0x03);
+			TS_ASSERT(insts[6]._name == "pushWordVar");
+			TS_ASSERT(insts[6]._params[0].getUnsigned() == 177);
+			TS_ASSERT(insts[7]._address == 21);
+			TS_ASSERT(insts[7]._opcode == 0x01);
+			TS_ASSERT(insts[7]._name == "pushWord");
+			TS_ASSERT(insts[7]._params[0].getSigned() == 93);
+			TS_ASSERT(insts[8]._address == 24);
+			TS_ASSERT(insts[8]._opcode == 0x6B99);
+			TS_ASSERT(insts[8]._name == "cursorCmd_Image");
+			TS_ASSERT(insts[9]._address == 26);
+			TS_ASSERT(insts[9]._opcode == 0x66);
+			TS_ASSERT(insts[9]._name == "stopObjectCodeB");
 		} catch (...) {
 			TS_ASSERT(false);
 		}

