[Scummvm-cvs-logs] SF.net SVN: scummvm:[51889] tools/branches/gsoc2010-decompiler/decompiler/ doc

Sun Aug 8 02:52:37 CEST 2010

Revision: 51889
          http://scummvm.svn.sourceforge.net/scummvm/?rev=51889&view=rev
Author:   pidgeot
Date:     2010-08-08 00:52:37 +0000 (Sun, 08 Aug 2010)

Log Message:
-----------
DECOMPILER: Update LaTeX documentation

Modified Paths:
--------------
    tools/branches/gsoc2010-decompiler/decompiler/doc/cfg.tex
    tools/branches/gsoc2010-decompiler/decompiler/doc/codegen.tex
    tools/branches/gsoc2010-decompiler/decompiler/doc/disassembler.tex
    tools/branches/gsoc2010-decompiler/decompiler/doc/doc.tex
    tools/branches/gsoc2010-decompiler/decompiler/doc/overview.tex

Added Paths:
-----------
    tools/branches/gsoc2010-decompiler/decompiler/doc/engine.tex

Modified: tools/branches/gsoc2010-decompiler/decompiler/doc/cfg.tex
===================================================================

--- tools/branches/gsoc2010-decompiler/decompiler/doc/cfg.tex	2010-08-08 00:52:30 UTC (rev 51888)
+++ tools/branches/gsoc2010-decompiler/decompiler/doc/cfg.tex	2010-08-08 00:52:37 UTC (rev 51889)
@@ -5,24 +5,45 @@
 \begin{itemize}
 \item Create a graph with one instruction per vertex, and edges going from instructions to their possible successors
 \item Do a depth-first search to determine the expected stack level at each vertex
+\item Optionally detect functions
 \item Merge vertices to form groups
 \item Perform analysis on vertices
 \end{itemize}
 
-The first step is handled in the constructor, while the two next steps are handled by the \verb+createGroups()+ method. The last step is handled by the \verb+analyze+ method.
+Calls to in-script functions are not represented with edges in the graph. This is done to keep functions separate from one another, so if your engine uses a jump as part of calling functions, you need to make sure you've given that particular jump the type \verb+kCall+.
 
+The first step is handled in the constructor, while the next three steps are handled by the \verb+createGroups()+ method. The last step is handled by the \verb+analyze+ method.
+
+\subsection{Function detection}
+\label{sec:autofunc}
+Prior to grouping, the control flow can be used to detect new functions. This detection is automatically activated if the Engine method \verb+detectMoreFuncs+ returns true.
+
+When function detection is enabled, unreachable blocks of code will be treated as functions, unless the presumed entry point is located within the range of another function. The end point of the function will then be the last instruction reachable from the entry point.
+
+Functions detected this way will be given the name \verb+auto_+. You can use this as a prefix to the actual name to signify that the function may not actually be a function, or you can ignore it and just replace it with the name you'd normally use.
+
+You can also have the function detection determine the end point of an existing function. To do so, \verb+_endIt+ must be the same as \verb+_startIt+ for that function. In this case, only the end point will be changed within the function; the name will stay the same.
+
+Note that the control flow analysis has no way of determining an appropriate name, number of arguments, return value, or metadata. You'll have to fill that in yourself using \verb+postCFG+ in the engine.
+
+If this step is not enabled, and no functions have been defined before the control flow analysis is started, a single function covering the entire script will still be added. This is done to avoid having a special case in the code, and it will not affect the output of your script in any way.
+
 \subsection{Group generation}
 Groups are initially formed according to these rules:
 \begin{itemize}
-\item If the next instruction is a jump, end the group here.
+\item If the next instruction is a jump or a return, end the group here.
 \item If the next instruction has multiple predecessors, end the group here.
 \item If the current instruction brings the stack to a lower level than the start of the current group, make the new level the expected stack level (to support clean-up after control structures).
-\item If the current instruction brings the stack level to the same as the start of the current group, end the group here.
+\item If the current instruction brings the stack level to the same as the start of the current group, and at least one instruction with a stack effect $\geq 0$ exists in the group, end the group here.
 \end{itemize}
 
+Unreachable code will not be processed in the group generation, but will remain as individual instructions.
+
 Prior to applying these rules, a depth-first search is performed to calculate the expected stack level at each instruction. If multiple paths are found to the same instruction, a warning will be output if the expected stack level from each path differs.
 
 \subsection{Short-circuit detection}
+\emph{NOTE: This feature is currently disabled.}
+
 As part of the group generation, the decompiler can combine multiple, consecutive groups if it detects them as being part of a single condition check that is merely split up due to short-circuiting.
 
 The rules used to detect if two consecutive groups $A$ and $B$ are part of the same short-circuited check are as follows:
@@ -38,7 +59,8 @@
 \item \verb+while+
 \item \verb+break+
 \item \verb+continue+
-\item \verb+if-else+
+\item \verb+if+
+\item \verb+else+
 \end{itemize}
 
 Each of these five constructs is marked using a \verb+GroupType+ member of the \verb+Group+ type, while \verb+else+ blocks are flagged using two booleans, \verb+_startElse+ and \verb+_endElse+. If \verb+_startElse+ is true, then an \verb+else+ block starts with this group, and should be output prior to the code in this group. If \verb+_endElse+ is true, an \verb+else+ block ends with this group, and the end should be output after the code in this group.
@@ -59,13 +81,16 @@
 \paragraph{Continue detection}
 Unconditional jump to a \verb+while+ or \verb+do-while+ condition, unless it is targeting a \verb+while+ condition which jumps to the next sequential group (in which case it is merely the end of the \verb+while+-loop). Just as with \verb+break+s, the jump is verified to go to the appropriate loop.
 
-\paragraph{If-else detection}
-All remaining conditional jumps are flagged as \verb+if+s. If the instruction immediately before the target of the conditional jump is an unconditional jump, but not a \verb+break+ or a \verb+continue+, an \verb+else+ is detected to begin at the jump target of the conditional jump in the \verb+if+. The \verb+else+ is set to end at the group immediately before the target of the unconditional jump.
+\paragraph{If detection}
+All unflagged conditional jumps are flagged as \verb+if+s.
 
+\paragraph{Else detection}
+All \verb+if+s are processed to see if they may have an \verb+else+ attached. If the jump target of an \verb+if+ is immediately preceded by an unconditional jump which is neither a \verb+break+ nor a \verb+continue+, and that jump goes to later in the code, this signifies a possible \verb+else+ block. The block starts with the jump target of the \verb+if+ condition and ends with the group immediately before the target of that unconditional jump. To avoid false positives, this block is then validated to not cross block boundaries. If the check passes, the \verb+else+ block data is added to the graph.
+
 \subsection{Graph output}
-The graph can be output in DOT format by using the -g switch. In the graph, arrows on edges will be hollow if the edge is a jump, and filled if the edge represents the usual sequential order.
+The graph can be output in DOT format by using the \verb+-g+ switch. In the graph, arrows on edges will be hollow if the edge is a jump, and filled if the edge represents the usual sequential order.
 
 \subsection{Limitations}
-Currently, only unconditional jumps are supported for \verb+break+ and \verb+continue+; however, for code of the form \verb+if (...) break;+ or \verb+if (...) continue;+, it is an obvious optimization to use the \verb+break+/\verb+continue+ jump as the conditional jump for the \verb+if+ condition check. Since \verb+if+s are found last, it should be possible to simply check unmarked conditional jumps as well and see if they meet the other criteria for a \verb+break+/\verb+continue+, although there might be some false positives for an if that stretches to the end of the loop it is placed in.
+Currently, only unconditional jumps are supported for \verb+break+ and \verb+continue+; however, for code of the form \verb+if (...) break;+ or \verb+if (...) continue;+, it is a fairly straightforward optimization to use the \verb+break+/\verb+continue+ jump as the conditional jump for the \verb+if+ condition check. Since \verb+if+s are found last, it should be possible to simply check unmarked conditional jumps as well and see if they meet the other criteria for a \verb+break+/\verb+continue+, although there might be some false positives for an \verb+if+ that stretches to the end of the loop it is placed in.
 
 It is currently assumed that all conditional jumps in \verb+if+ condition checks go to a later place in the code. If optimized continue statements are used in a while (as described above), this will cause the analysis to be incorrect.
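
Editorial sketch (not part of the commit): the depth-first stack-level pass described under "Group generation" could look roughly like the C++ below. Everything here is a simplified assumption for illustration: Inst, StackLevels and computeStackLevels are hypothetical names, successors are plain indices rather than graph edges, and the recorded level is assumed to be the level before the instruction executes. The real implementation works on the decompiler's graph and prints a warning instead of counting mismatches.

```cpp
#include <cassert>
#include <cstddef>
#include <stack>
#include <utility>
#include <vector>

// Hypothetical, simplified instruction: net stack effect plus the indices
// of the instructions that may execute next.
struct Inst {
	int stackChange;
	std::vector<std::size_t> successors;
};

// Result of the pass (hypothetical type).
struct StackLevels {
	std::vector<int> level;     // expected level before each instruction
	std::vector<bool> reached;  // false for unreachable instructions
	int mismatches;             // paths arriving with a different level
	StackLevels() : mismatches(0) {}
};

// Iterative depth-first search from the entry point. Each instruction is
// expanded once; later visits only compare the incoming stack level.
StackLevels computeStackLevels(const std::vector<Inst> &insts, std::size_t entry) {
	StackLevels result;
	result.level.assign(insts.size(), 0);
	result.reached.assign(insts.size(), false);

	std::stack<std::pair<std::size_t, int> > pending;
	if (entry < insts.size())
		pending.push(std::make_pair(entry, 0));

	while (!pending.empty()) {
		const std::size_t idx = pending.top().first;
		const int levelBefore = pending.top().second;
		pending.pop();

		if (result.reached[idx]) {
			// A second path reached this instruction: the levels should agree.
			if (result.level[idx] != levelBefore)
				++result.mismatches;
			continue;
		}
		result.reached[idx] = true;
		result.level[idx] = levelBefore;

		const int levelAfter = levelBefore + insts[idx].stackChange;
		for (std::size_t i = 0; i < insts[idx].successors.size(); ++i) {
			const std::size_t next = insts[idx].successors[i];
			if (next < insts.size())
				pending.push(std::make_pair(next, levelAfter));
		}
	}
	return result;
}
```

Instructions that are never reached keep reached set to false, which mirrors the statement above that unreachable code is left out of group generation.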

Modified: tools/branches/gsoc2010-decompiler/decompiler/doc/codegen.tex
===================================================================
--- tools/branches/gsoc2010-decompiler/decompiler/doc/codegen.tex	2010-08-08 00:52:30 UTC (rev 51888)
+++ tools/branches/gsoc2010-decompiler/decompiler/doc/codegen.tex	2010-08-08 00:52:37 UTC (rev 51889)
@@ -9,6 +9,15 @@
 \item Next, the groups are iterated over sequentially, and the generated code is output.
 \end{itemize}
 
+This process is repeated for each function.
+
+\subsection{Function signature}
+For each function in the script, the \verb+constructFuncSignature+ method is called. By default, this will return the empty string, and this will cause the code to be output ``freely'', i.e. without anything surrounding it. If a non-empty string is returned, a \verb+}+ will be added after all of the code in the function.
+
+If your engine uses methods, you will want to override this method to output your own signature.
+
+\emph{Note:} You must currently include a \verb+{+ at the end of your signature.
+
 \subsection{Group processing}
 During processing of a group, the instructions in the group are processed one at a time. Certain kinds of instructions can be handled by generic code, while others must be handled by engine-specific code in the \verb+processInst+ method of your subclass.
 
@@ -19,6 +28,10 @@
 
 To manipulate the stack, use the \verb+push+ and \verb+pop+ methods to push or pop stack entries. Unlike the STL stack, \verb+pop+ returns the value being popped from the stack, so you don't have to first get the top element and then pop it afterwards, but you can still call the \verb+peek+ method if you just want to look at the topmost element without removing it. Additionally, it has an \verb+empty+ method to check if the stack is empty.
 
+Some engines require you to look further down the stack than just the topmost element. You can use the \verb+peekPos+ method to retrieve an element at an arbitrary position in the stack. This method takes an integer containing the number of stack entries to skip, i.e. passing the value 0 will give you the topmost element, while passing the value 2 will give you the third value on the stack.
+
+\emph{Note:} \verb+peekPos+ accesses the underlying STL container (\verb+std::deque+) using the \verb+at+ function, which will throw an exception if the stack does not contain enough elements.
+
 When working with entries, you should use the \verb+EntryPtr+ type. This wraps the entry in a \verb+boost::intrusive_ptr+ to free the associated memory when it is no longer referenced.
 
 Some stack entries contain references to an arbitrary number of stack entries. This is handled using an STL \verb+deque+, typedef'ed as \verb+EntryList+.
@@ -26,15 +39,16 @@
 Stack entries can be categorized into 9 different types:
 
 \paragraph{Integers (IntEntry)}
-Integers can use up to 32-bits, and be signed or unsigned. When creating an integer, you must specify its value and whether or not it is signed.
+Integers can use up to 32 bits, and be signed or unsigned. When creating an integer, you must specify its value and whether or not it is signed. The entry also provides methods to extract the value and its signedness, which may be of use in some situations.
+
 \paragraph{Variables (VarEntry)}
-Variables are stored as a simple string. Subclasses must implement their own logic to determine a suitable variable name when given a reference.
+Variables are stored as a simple string. Subclasses of \verb+CodeGenerator+ must implement their own logic to determine a suitable variable name when given a reference.
 
 \paragraph{Binary operations (BinaryOpEntry)}
 Binary operations store the two stack entries used as operands, and a string containing the operator. Parentheses are automatically added around all binary operations to preserve the proper evaluation order.
 
 \paragraph{Unary operations (UnaryOpEntry)}
-Just like binary operations, except on a single operand is stored.
+Just like binary operations, except only a single operand is stored. Note: Currently, the operator will always be output on the left side of the operand.
 
 \paragraph{Duplicated entries (DupEntry)}
 Stores an index to distinguish between multiple duplicated entries. This index is automatically assigned and determined when calling the \verb+dup+ function to duplicate a stack entry.
@@ -43,7 +57,7 @@
 Array entries are stored as a simple string containing the name of the array, and an EntryList of stack entries used as the indices, with the first element in the EntryList being output as the first index.
 
 \paragraph{Strings (StringEntry)}
-A string is stored as... well, a string. You have to supply your own quotes.
+A string is stored as... well, a string. You have to supply your own quotes if necessary.
 
 \paragraph{Lists (ListEntry)}
 A list is stored using an EntryList to contain the stack entries in the list. Elements are output left-to-right, such that the first element in the EntryList will be output as the first element in the list.
@@ -51,7 +65,7 @@
 \paragraph{Function calls (CallEntry)}
 Function calls have the same underlying storage types as an array entry, but the output is formatted like a function call instead of an array access.
 
-Each entry type knows how to output itself to an \verb+std::ostream+ supplied as a paraemter to the \verb+print+ function, and the common base class \verb+StackEntry+ also overloads the \verb+<<+ operator so any stack entry can be streamed directly to an output stream using that function.
+Each entry type knows how to output itself to an \verb+std::ostream+ supplied as a parameter to the \verb+print+ function, and the common base class \verb+StackEntry+ also overloads the \verb+<<+ operator so any stack entry can be streamed directly to an output stream using that function.
 
 \subsection{Outputting code}
 When processing certain kinds of instructions, you will probably want to create a line of code as part of the output. To do that, call \verb+addOutputLine+ with a string containing the code you wish to output as an argument. This will then be associated with the group being processed.
@@ -82,6 +96,11 @@
 \paragraph{kJump and kJumpRel}
 If the current group has been detected as a break or a continue, a break or continue statement is output. Otherwise, the jump is analyzed and output unless it is a jump back to the condition of a while-loop that ends there, or it is determined that the jump is unnecessary due to an else block following immediately after.
 
+\paragraph{kReturn}
+This simply adds a line \verb+return;+ to the output.
+
+\emph{Note:} The default handling does not currently allow specifying a return value as part of the statement, as in \verb+return 0;+.
+
 \paragraph{kSpecial}
 The metadata is treated similar to parameter specifications in \verb+SimpleDisassembler+ (see Section~\vref{sec:simpledisasm}). If the specification string starts with the character \verb+r+, this signifies that the call returns a value, and processing starts at the next character.
 For each character in the metadata string, \verb+processSpecialMetadata+ is called with the instruction being processed, and the current metadata character to be handled. The default implementation only understands the character \verb+p+, which pops an argument from the stack and adds it to the argument list.
@@ -92,11 +111,13 @@
 Due to the conflict with the specification of a return value, it is recommended that you do not adopt \verb+r+ as a metadata character.
 
 \paragraph{Other types}
-No default handling exists for types other than those mentioned above. These instructions will be sent to the \verb+processInst+ method of your subclass, where you must handle them appropriately. This includes types like kLoad and kStore.
+No default handling exists for types other than those mentioned above. These instructions will be sent to the \verb+processInst+ method of your subclass, where you must handle them appropriately. This includes types like \verb+kLoad+ and \verb+kStore+.
 
+Note that this also includes \verb+kCall+. Although many engines might want to handle this in a manner similar to kSpecial opcodes, this is left to the engine-specific code so they can fully make sense of the metadata they choose to add to the function.
+
 \subsection{Order of arguments}
 \label{sec:argOrder}
-The generic handling of binary operators (kBinaryOp, kComparison) and magic functions (kSpecial) can be configured to display their arguments using FIFO or LIFO - respecitvely, the first and the last entry to be pushed onto the stack is used as the first (leftmost) argument. This is set as part of the constructor for the \verb+CodeGenerator+ class, using the enumeration values \verb+kFIFO+ and \verb+kLIFO+.
+The generic handling of binary operators (kBinaryOp, kComparison) and magic functions (kSpecial) can be configured to display their arguments using FIFO or LIFO - respectively, the first and the last entry to be pushed onto the stack is used as the first (leftmost) argument. This is set as part of the constructor for the \verb+CodeGenerator+ class, using the enumeration values \verb+kFIFO+ and \verb+kLIFO+.
 
 To provide an example, consider the following sequence of instructions:
 
@@ -110,5 +131,6 @@
 
 This can mean two different things, either \verb+a - b+ or \verb+b - a+, depending on the order in which the operands should be evaluated. For the former, choose FIFO ordering, for the latter, choose LIFO.
 
-For arguments to function calls, the same principle applies. You can use the \verb+addArg+ method to add an argument to the call currently being processed, using the chosen ordering. In general, you might not know which value is more correct; unless you have reason to believe otherwise, you should simply use the same ordering as for binary operators.
+For arguments to function calls, the same principle applies. You can use the \verb+addArg+ method to add an argument to the call currently being processed, using the chosen ordering.
 
+In general, you might not know which ordering is more correct for function arguments; unless you have reason to believe otherwise, simply use the same ordering as for binary operators.
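
Editorial sketch (not part of the commit): the effect of the FIFO and LIFO argument orderings on a binary operator can be illustrated with the standalone C++ below. The enumeration values kFIFO and kLIFO are named in the text; everything else (ArgOrder, combineBinaryOp, a stack of plain strings with the top at the back) is a hypothetical simplification, not the CodeGenerator API.

```cpp
#include <cassert>
#include <deque>
#include <stdexcept>
#include <string>

// Ordering of operands; the value names follow the documentation, the
// enum name itself is an assumption.
enum ArgOrder { kFIFO, kLIFO };

// Pops two operands and pushes the combined expression. The stack is
// modelled as a deque of strings whose back is the top of the stack.
std::string combineBinaryOp(std::deque<std::string> &stack,
                            const std::string &op, ArgOrder order) {
	if (stack.size() < 2)
		throw std::out_of_range("binary operator needs two operands");

	const std::string pushedLast = stack.back();
	stack.pop_back();
	const std::string pushedFirst = stack.back();
	stack.pop_back();

	// FIFO: the first entry pushed becomes the leftmost operand.
	// LIFO: the last entry pushed becomes the leftmost operand.
	const std::string &lhs = (order == kFIFO) ? pushedFirst : pushedLast;
	const std::string &rhs = (order == kFIFO) ? pushedLast : pushedFirst;

	// Parentheses are always added to preserve evaluation order.
	const std::string result = "(" + lhs + " " + op + " " + rhs + ")";
	stack.push_back(result);
	return result;
}
```

With a pushed before b, the FIFO ordering yields (a - b) and the LIFO ordering yields (b - a), matching the two readings described above.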

Modified: tools/branches/gsoc2010-decompiler/decompiler/doc/disassembler.tex
===================================================================
--- tools/branches/gsoc2010-decompiler/decompiler/doc/disassembler.tex	2010-08-08 00:52:30 UTC (rev 51888)
+++ tools/branches/gsoc2010-decompiler/decompiler/doc/disassembler.tex	2010-08-08 00:52:37 UTC (rev 51889)
@@ -23,7 +23,7 @@
 \item \verb+_opcode+ is used to store the numeric opcode associated with the instruction. This is not used by the decompiler itself, but is for your reference during later parts of the decompilation process. Note that this field is declared as a 32-bit integer; if you need more than 4 bytes for your opcodes, you'll need to figure out which bytes you want to store if you want to use this field.
 \item \verb+_address+ stores the absolute memory address where this instruction would be loaded into memory.
 \item \verb+_stackChange+ stores the net change of executing this instruction - for example, if the instruction pushes a byte on to the stack, this should be set to 1. This is used to determine when each statement ends. The count can be in any unit you wish - bytes, words, bits - as long as the same unit is used for all instructions. This means that if your stack only works with 16-bit elements, pushing an 8-bit value and pushing a 16-bit value should have the same net effect on the stack.
-\item \verb+_name+ contains the name of the instruction. You will use this during code generation.
+\item \verb+_name+ contains the name of the instruction. This is mainly for use during code generation.
 \item \verb+_type+ represents the type of instruction. See Section~\vref{sec:insttype} for details.
 \item \verb+_params+ contains the parameters given to the instruction - for example, if you have the instruction \verb+PUSH 1+, there would be one parameter, with the value of 1. See Section~\vref{sec:parameter} for details on the Parameter type.
 \item \verb+_codeGenData+ stores metadata to be used during code generation. For details, see Section~\vref{sec:codegen}.
@@ -85,7 +85,7 @@
 
 \verb+doDisassemble+ is the method used to perform the actual disassembly, so this method must be implemented by all disassemblers.
 
-\verb+disassemble+ simply calls the \verb+doDisassemble+ method to perform the disassembly (if necessary), and returns \verb+_insts+ to the calling methtod.
+\verb+disassemble+ simply calls the \verb+doDisassemble+ method to perform the disassembly (if necessary), and returns \verb+_insts+ to the calling method.
 
 Finally, \verb+dumpDisassembly+ is used to output the instructions in a human-readable format to a file or stdout, performing a disassembly first if required, and then calls \verb+doDumpDisassembly+ to perform the actual output. A default implementation is provided for \verb+doDumpDisassembly+, but you can override it if the standard output format is not suitable for your particular engine.
 
@@ -231,6 +231,8 @@
 
 Subopcodes can be nested if the instruction set requires it. For subopcodes, the \verb+_opcode+ field stores the bytes in the order they appear in the file - i.e., the HALT instruction would have the opcode value 0xFF00. If the opcodes are longer than 4 bytes, only the last 4 bytes will be stored.
 
+If all opcodes in a group of subopcodes share a prefix, you can use the \verb+START_SUBOPCODE_WITH_PREFIX+ macro instead of \verb+START_SUBOPCODE+. This macro takes an additional string parameter containing the full prefix to use for the opcodes associated with this subopcode. The prefix is not propagated if you nest subopcodes; only the nearest prefix is used.
+
 \subsubsection{Code generation metadata}
 For each opcode, you will need to replicate its semantics during code generation. To assist you in generalizing your code, you can use the \verb+OPCODE_MD+ macro to add metadata to the instruction, which is then available during code generation.
 
@@ -261,6 +263,6 @@
 \end{lstlisting}
 \end{C++}
 
-\verb+OPCODE_BASE+ automatically keeps track of the current opcode value. You can access \verb+full_opcode+ to get the current full opcode. Alternatively, you can use the \verb+OPCODE_BODY+ macro to use the standard behavior for opcodes, and then follow that with your own implementation. The \verb+OPCODE_BODY+ macro takes the same arguments as the \verb+OPCODE_MD+ macro.
+\verb+OPCODE_BASE+ automatically keeps track of the current opcode value. You can access \verb+full_opcode+ to get the current full opcode. Alternatively, you can use the \verb+OPCODE_BODY+ macro to use the standard behavior for opcodes, and then follow that with the additional code you want. The \verb+OPCODE_BODY+ macro takes the same arguments as the \verb+OPCODE_MD+ macro.
 
 For your convenience, a few additional macros are available: \verb+ADD_INST+, which adds an empty instruction to the vector, and \verb+LAST_INST+ which retrieves the last instruction in the vector. Additionally, you can use \verb+INC_ADDR+ as a shorthand for incrementing the address variable by 1, but note that you should \emph{not} increment the address for the opcode itself - this is handled by the other macros.
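
Editorial sketch (not part of the commit): the way the 32-bit opcode field ends up holding only the last 4 bytes of a long subopcode chain can be reproduced with a simple shift-and-or. The function packOpcode below is hypothetical and only illustrates the arithmetic; the actual disassembler macros may compute the value differently.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <vector>

// Packs opcode bytes, in file order, into a 32-bit value. Each new byte
// shifts the earlier ones up by 8 bits, so once more than 4 bytes have
// been seen the oldest bytes fall off the top.
std::uint32_t packOpcode(const std::vector<std::uint8_t> &bytes) {
	std::uint32_t full = 0;
	for (std::size_t i = 0; i < bytes.size(); ++i)
		full = (full << 8) | bytes[i];
	return full;
}
```

For the HALT example in the text, the bytes 0xFF 0x00 pack to 0xFF00; a five-byte sequence 01 02 03 04 05 packs to 0x02030405, dropping the first byte.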

Modified: tools/branches/gsoc2010-decompiler/decompiler/doc/doc.tex
===================================================================
--- tools/branches/gsoc2010-decompiler/decompiler/doc/doc.tex	2010-08-08 00:52:30 UTC (rev 51888)
+++ tools/branches/gsoc2010-decompiler/decompiler/doc/doc.tex	2010-08-08 00:52:37 UTC (rev 51889)
@@ -9,6 +9,7 @@
 
 \newpage
 \input{overview}
+\input{engine}
 \input{disassembler}
 \input{cfg}
 \input{codegen}

Added: tools/branches/gsoc2010-decompiler/decompiler/doc/engine.tex
===================================================================
--- tools/branches/gsoc2010-decompiler/decompiler/doc/engine.tex	                        (rev 0)
+++ tools/branches/gsoc2010-decompiler/decompiler/doc/engine.tex	2010-08-08 00:52:37 UTC (rev 51889)
@@ -0,0 +1,69 @@
+\section{Engine}
+The \verb+Engine+ class represents a single engine. It works as a factory for the engine-specific classes required for each step of the process.
+
+As a minimum, engines must provide a disassembler and a code generator. All other steps are optional, but you can implement them for additional processing.
+
+If you need to store metadata about the script, you can add the necessary fields to your engine class and store the information there, as the same instance will be used throughout the decompilation process.
+
+\subsection{Adding a new engine}
+In order to make the decompiler use the code you write to decompile code for some engine, it must be registered in the program. To do so, use the \verb+ENGINE+ macro defined in \verb+decompiler.cpp+, and add your own use of the macro near the existing registrations.
+
+This macro takes 3 parameters: the engine ID, a description of the engine, and the name of the \verb+Engine+ subclass used to create the classes used for the various steps of the process. The ID is entered by the user to signify the engine where the script originates from, and the description is a descriptive text which will be shown when the user requests a list of the supported engines.
+
+The methods you need to implement are:
+\begin{itemize}
+\item \verb+getDisassembler+, which takes a reference to the instruction vector to use for storage and creates a disassembler object and returns it.
+\item \verb+getCodeGenerator+, which takes a reference to the \verb+std::ostream+ to output the code to and creates a code generator object and returns it.
+\item \verb+getDestAddress+, which takes a const iterator to a jump instruction as a parameter and returns the address the instruction will jump to if the jump is taken. Unless you do differently in your engine-specific code, this function will only receive jumps as input, so if you can take a shortcut based on that, you're allowed to.
+\end{itemize}
+
+Additional methods you can override are:
+\begin{itemize}
+\item \verb+supportsCodeFlow+ and \verb+supportsCodeGen+, which can be used to stop the decompiler from going any further after disassembly or code flow analysis, respectively. This is helpful when working on a brand new engine, so you can take one step at a time without having to remember to use the right command-line switch. If you do not override these methods, the decompiler will go through all steps.
+\item \verb+detectMoreFuncs+ allows you to tell the control flow analysis to automatically detect functions based on reachability. See Section~\vref{sec:autofunc} for details. By default, this is turned off; engines must opt in to this feature.
+\item \verb+postCFG+, which is a post-processing step called after control flow analysis. If you override \verb+detectMoreFuncs+ to return true, you must also override this function to process any newly found functions. A default implementation which does nothing is already provided in case you don't need to do any post-processing.
+\end{itemize}
+
+\subsection{Game information}
+For some engines, it's not enough to know the engine; some instructions may differ in behavior between talkie or non-talkie versions, while others may require different behavior for different platforms.
+
+The \verb+Engine+ class contains a field \verb+_isTalkie+ which is set to true if the user passed in the \verb+-t+ switch on the command line. You can check this flag in your engine-specific code if necessary.
+
+In the interest of user friendliness, if the necessary data exists directly in the script file itself, you should use that instead of requiring additional switches to be passed.
+
+Note that, at the time of writing, there is no field containing platform information; this must be handled by implementing another engine which passes in relevant information to engine-specific classes.
+
+\subsection{Functions}
+Some engines allow multiple functions in a single script file. Each function must be analyzed separately, but in order to do that, it is of course necessary to know where the functions start and end, and when it's time to actually generate some code, you'll want to know a bit about the function as well.
+
+This information is stored in the engine, as a \verb+std::map+ of \verb+Function+s, in the field \verb+_functions+.
+
+\begin{C++}
+\begin{lstlisting}
+struct Function {
+public:
+	ConstInstIterator _startIt;
+	ConstInstIterator _endIt;
+	std::string _name;
+	GraphVertex _v;
+	uint32 _args;
+	bool _retVal;
+	std::string _metadata;
+};
+\end{lstlisting}
+\end{C++}
+
+Each member of this struct has a specific purpose:
+\begin{itemize}
+\item \verb+_startIt+ is a const iterator pointing to the first instruction in the function, i.e. the entry point.
+\item \verb+_endIt+ is a const iterator pointing to the instruction immediately after the last instruction, similar to \verb+end()+ on STL collections.
+\item \verb+_name+ is the name of the function.
+\item \verb+_v+ is the GraphVertex containing the entry point. This will be automatically assigned in the control flow analysis.
+\item \verb+_args+ is the number of arguments for the function. This is present as a convenience; you generally won't know the names of the arguments in the function, so you can store the number of them here and use this during code generation to generate a method signature containing some default parameter names.
+\item \verb+_retVal+ should be true if your method returns a value, and false if it does not. Again, this is for your own convenience, to make it easier to handle calls.
+\item \verb+_metadata+ contains metadata about the function, so you know how to handle the arguments when the function is being called. It's up to you how you want to use this.
+\end{itemize}
+
+When you add a function to the map, you must use the address of the first instruction as the key.
+
+Sometimes, you don't know where all of the functions begin or end. In that case, the control flow analysis can analyze the code for you and automatically detect missing functions or unknown end points. See Section~\vref{sec:autofunc} for details.
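
Editorial sketch (not part of the commit): registering functions in a map keyed by the address of the first instruction, including one whose end point is left for automatic detection, might look like the standalone C++ below. The Instruction stand-in, FuncMap, addFunction and endIsUnknown are hypothetical names for illustration; the Function struct is reduced from the listing above (the graph vertex member is omitted) and the real engine code may differ.

```cpp
#include <cassert>
#include <cstdint>
#include <map>
#include <string>
#include <vector>

// Simplified stand-in for the decompiler's instruction type.
struct Instruction {
	std::uint32_t _address;
};
typedef std::vector<Instruction>::const_iterator ConstInstIterator;

// Reduced version of the Function struct shown in the documentation.
struct Function {
	ConstInstIterator _startIt;
	ConstInstIterator _endIt;
	std::string _name;
	std::uint32_t _args;
	bool _retVal;
	std::string _metadata;
};

typedef std::map<std::uint32_t, Function> FuncMap;

// Adds a function using the address of its first instruction as the key.
// Passing end == start marks the end point as unknown. The start iterator
// must point at a valid instruction.
void addFunction(FuncMap &functions, ConstInstIterator start, ConstInstIterator end,
                 const std::string &name, std::uint32_t args, bool retVal) {
	Function f;
	f._startIt = start;
	f._endIt = end;
	f._name = name;
	f._args = args;
	f._retVal = retVal;
	functions[start->_address] = f;
}

// True if the end point still has to be determined by function detection.
bool endIsUnknown(const Function &f) {
	return f._startIt == f._endIt;
}
```

A function added with identical start and end iterators is one whose end point the control flow analysis is expected to fill in, as described in the function detection section.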


Property changes on: tools/branches/gsoc2010-decompiler/decompiler/doc/engine.tex
___________________________________________________________________
Added: svn:mime-type
   + text/plain
Added: svn:keywords
   + Date Rev Author URL Id
Added: svn:eol-style
   + native

Modified: tools/branches/gsoc2010-decompiler/decompiler/doc/overview.tex
===================================================================
--- tools/branches/gsoc2010-decompiler/decompiler/doc/overview.tex	2010-08-08 00:52:30 UTC (rev 51888)
+++ tools/branches/gsoc2010-decompiler/decompiler/doc/overview.tex	2010-08-08 00:52:37 UTC (rev 51889)
@@ -9,17 +9,23 @@
 
 Of these steps, the code flow analysis is engine-independent, while disassembly and code generation require engine-specific code.
 
-\subsection{Limitations}
-The decompiler is targeted for stack-based instruction sets, and may contain assumptions to that effect. If you want to add an engine which does not use a stack-based instruction set, parts of this documentation may not apply, and additional work to the generic parts may be necessary.
+\subsection{Reading guide}
+Names used in code are written in a \verb+monospaced typewriter font+.
 
-\subsection{The Engine class}
-The \verb+Engine+ class represent a single engine. It works as a factory for the engine-specific classes required for each step of the process.
+Actual code snippets have basic syntax highlighting on a light gray background, and lines are numbered, like below:
 
-As a minimum, engines must be provide a disassembler and a code generator. All other steps are optional, but you can implement them for additional processing.
+\begin{C++}
+\begin{lstlisting}
+#include <stdio.h>
 
-If you need to store metadata about the script, you can add the necessary fields to your engine class and store the information there, as the same instance will be used throughout the decompilation process.
+int main(int argc, char **argv) {
+	printf("Hello world!");
+	return 0;
+}
+\end{lstlisting}
+\end{C++}
 
-\subsection{Adding a new engine}
-In order to make the decompiler use the code you write to decompile code for some engine, it must be registered in the program. To do so, use the \verb+ENGINE+ macro defined in \verb+decompiler.cpp+, and add your own use of the macro near the existing registrations.
+In this document, the terms \emph{control flow analysis} and \emph{code flow analysis} are used interchangeably.
 
-This macro takes 3 parameters: the engine ID, a description of the engine, and the name of the \verb+Engine+ subclass used to create the classes used for the various steps of the process. The ID is entered by the user to signify the engine where the script originates from, and the description is a descriptive text which will be shown when the user requests a list of the supported engines.
+\subsection{Limitations}
+The decompiler is targeted for stack-based instruction sets, and may contain assumptions to that effect. If you want to add an engine which does not use a stack-based instruction set, parts of this document may not apply directly, and additional work to the generic parts may be necessary.

