Andrea Flexeder. Interprocedural Analysis of Low-Level Code. PhD thesis, Institut für Informatik, Technische Universität München June 2011.

Static analysis of machine code is employed for reverse engineering, automatic detection of low-level errors such as memory violations, malware detection, and many other application areas. Only at the level of executables can all errors introduced by programmers or even by compilers be identified. Analysis of machine code comes at a price: high-level language features such as local variables and procedures are no longer visible and need to be recovered. In particular, it is necessary to first reconstruct the control flow graph (CFG). This thesis tackles these challenges by presenting a sound static analysis to executables expressed using the abstract interpretation framework. Our aim is to present reasonably fast and sound analyses that scale well for industrial-sized applications. The two key contributions are as follows: First, we introduce a fully automatic and sound interprocedural analysis framework for executables. To this end, we argue for an analysis that intertwines disassembling and abstract interpretation-based analysis to provide a sound overapproximation of the CFG and additionally handles procedure calls precisely. In order to handle indirect jumps it is essential to reason about data. Hence, the location and size of variables in memory has to be inferred. Therefore we propose an analysis of differences and equalities between register contents which allows inference of potential local and global variables. In order to discharge certain assumptions made during control flow reconstruction we add an additional side-effect analysis that reasons about the modifying potential of procedures. Second, we present two novel domains: the domain of fast linear two-variable equalities and simplices, a special case of convex polyhedra. While the former domain allows a precise analysis of non-optimised assembly by inferring equalities between registers and memory locations, the latter infers precise information about memory accesses by providing linear inequality relations between loop iteration variables and memory accesses. We have implemented these analyses and experimentally evaluated that our techniques scale well for industrial-sized executables and provide very good results concerning control flow reconstruction, inference of local variables, alignment information for improving the worst-case execution time (WCET) estimation.

Download: PDF Reference: Bibtex