Reverse Engineering book

Reverse Engineering: Purposes, Methodologies, Tools, and Law

by Andrew Schulman

The following are notes for a forthcoming book. Contact the author for more information.

The book will include the use of reverse engineering as a fact-gathering tool in litigation, when the operation, composition, or design (whether as-intended or as-built) of a system is at issue, or relevant to something else at issue.

Also see the articles “Reverse engineering as a fact-investigation tool in software patent litigation,” “Hiding in plain sight: Using reverse engineering to uncover (or help show absence of) software patent infringement,” and “Open to inspection: Using reverse engineering to uncover software prior art,” and the detailed outline for a chapter on “Pre-filing investigation and reverse engineering” in yet another forthcoming book, on source-code examination for litigation.

The forthcoming book will include detailed coverage of hardware reverse engineering (based on the work of GreyB, such as “How we used electrical signal analysis” to detect smartphone processes), and of reverse engineering for non-litigation purposes.

The outline as currently planned (with major sections on simple vs. static vs. dynamic examination) puts more emphasis on specific tools than is consistent with the book’s goal of stressing what one is trying to accomplish with reverse engineering (what types of questions it can answer), and de-emphasizing how to use this or that NiftyTool with this or that specific version of a target product.

There will soon be material on reverse engineering AI systems. For now, see some discussion of this in sessions with ChatGPT; walking neural networks with ChatGPT (including Claude Sonnet 3.7 reverse engineering EXE binary files); and using AI chatbots to summarize disassembled/decompiled code. See also Chris Olah’s Mechanistic Interpretability article.

Summary outline

Part One: An overview of reverse engineering purposes, methodologies, tools, and law

Introduction, with several newsworthy examples of reverse engineering
Benefits of the outsider’s perspective (vs. that of the product’s creator; “straight from the horse’s mouth” is not always the best source of information, and often not the only source)
Defining reverse engineering: what it is, and is not
1. “Working backwards” vs. “working from the bottom-up” definitions of reverse engineering
2. Initial definition of software reverse engineering:
  1. Inspecting software, without source code or documentation, for purposes such as:
  2. interoperability, security, testing, improving documentation, programmer understanding, porting, re-engineering legacy code, intellectual property litigation, competitive intelligence, regulatory compliance, public policy, and generally:
  3. bringing the benefits of open source to otherwise-closed systems.
3. Refined definition of software reverse engineering:
  1. We’re going to inspect or carefully read software
  2. (usually someone else’s, and usually commercial closed-source products/apps/services),
  3. largely without source code or documentation,
  4. taking low-level details learned through engineering techniques, and
  5. turning those lower-level details into some higher-level model or actionable description of the code,
  6. likely different from the original code or design,
  7. for purposes such as … [see above]
  8. bringing the benefits of open source to otherwise-closed systems.
How reverse engineering relates to other means for learning about technology, and to forensics
How to avoid reverse engineering by diligently mining public sources for material “hiding in plain sight”
Reverse engineering methodology and heuristics (including methods/heuristics for source-code examination)
Why reverse engineering?: Purposes and goals
Legal and ethical questions
The law of trade secrets, copyright, DMCA (anti-circumvention), and contracts, and how they impact reverse engineering
Types of reverse engineering, and important distinctions (as-built vs. as-designed; dynamic vs. static analysis; “behavioral” vs. code-based analysis; etc.)
Reverse engineering tools, and general tool concepts
Teardowns and composition analysis: Using components and modularity in reverse engineering
Acquiring the target: The sometimes-surprisingly-difficult task of obtaining the product or process to be examined (including legal and ethical issues with “straw purchases”)
Formulating narrow technical questions that can be answered with reverse engineering

Part Two: Simple software reverse engineering: Treating code as data

Software reverse engineering as an example of reverse engineering generally
Code is also data: “Unstructured” or format-agnostic inspection
Hex dumpers and editors
Text inside binaries: strings
“Magic numbers,” signatures, and scanning
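
As a rough illustration of the format-agnostic inspection listed in Part Two, the following Python sketch treats a binary as plain bytes: it dumps them in hex, pulls out runs of printable text, and checks the leading bytes against a few well-known magic numbers. It is a minimal sketch, not the book's method. The function names, the small signature table, and the sample bytes (including the library and function names embedded in them) are assumptions made up for this example.

```python
import re

# Small, assumed subset of well-known file signatures (not exhaustive).
MAGIC_NUMBERS = {
    b"MZ": "DOS/Windows executable (PE)",
    b"\x7fELF": "ELF executable",
    b"PK\x03\x04": "ZIP container (also JAR, APK)",
    b"\xca\xfe\xba\xbe": "Java class file or Mach-O universal binary",
}


def identify(data: bytes) -> str:
    """Guess a file type from its leading bytes."""
    for magic, name in MAGIC_NUMBERS.items():
        if data.startswith(magic):
            return name
    return "unknown"


def extract_strings(data: bytes, min_len: int = 4) -> list[str]:
    """Return runs of printable ASCII at least min_len bytes long."""
    pattern = rb"[\x20-\x7e]{%d,}" % min_len
    return [m.decode("ascii") for m in re.findall(pattern, data)]


def hex_dump(data: bytes, width: int = 16) -> str:
    """Format bytes as offset, hex columns, and a printable-text column."""
    lines = []
    for offset in range(0, len(data), width):
        chunk = data[offset:offset + width]
        hex_part = " ".join(f"{b:02x}" for b in chunk)
        text_part = "".join(chr(b) if 0x20 <= b <= 0x7E else "." for b in chunk)
        lines.append(f"{offset:08x}  {hex_part:<{width * 3 - 1}}  {text_part}")
    return "\n".join(lines)


# Hypothetical sample: an ELF-style header followed by two embedded names.
sample = (
    b"\x7fELF\x02\x01\x01\x00" + b"\x00" * 8
    + b"libcrypto.so.3\x00\x01\x02GetVersion\x00"
)

if __name__ == "__main__":
    print(identify(sample))
    print(extract_strings(sample))
    print(hex_dump(sample))
```

Even this unstructured pass can help answer narrow questions, such as whether a product appears to reference a particular library or function name, before any format-specific tool or disassembler is brought in.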

Part Three: Simple software reverse engineering with format-specific tools

“Structured” inspection: executable file formats
Using dynamic-linking and shared-library import and export headers
Mapping inter-module dependencies
Using debug symbol files and library files
Inspecting menus, dialogs, and other resources
Inspecting Apple OSX and iOS binaries
Inspecting .NET, Android, and ELF binaries

Part Four: Using the output of simple reverse-engineering tools

Reverse engineering is a tool for answering questions, not an end in itself
Using the command line (CLI), and tools with plain-text output
Correlating reverse engineering with public information (and with non-public documents such as company internal emails accessed during the discovery phase of litigation)
Scripting to answer specific questions
Repositories and “Big Code”: Building databases, and the importance of continuity
Moving to static and dynamic reverse engineering; legal implications of simple reverse engineering

Part Five: Static reverse engineering with disassemblers

Introduction to static reverse engineering: disassembly and decompilation
“Use the Source, Luke” (UTSL): Source code or near-source code may already be available
How reverse engineering relates to source-code analysis [see Spinellis, Code Reading]
Producing a disassembly listing
Navigating a disassembly listing: calls and jumps
Navigating an Apple OSX/iOS Objective-C disassembly listing
ARM, other processors, and special languages
Scripting to extract information from disassembly listings [see ancient example of NiceDbg]
Understanding and improving a disassembly listing
Using symbols, strings, “magic numbers” and signatures to identify code, including library code and compiled open source
Recognizing basic C/C++ constructions in assembly language
Code/data separation, data structures, and tables
Function pointers, jump tables, on-event handlers, and hooks

Part Six: Static reverse engineering with decompilers

Introduction to decompilation with Java and Android
Decompiling .NET (COM/OCX/OLE) code
Decompiling with NSA Ghidra and IDA Pro
Code obfuscation and de-obfuscation, including Java and JS deobfuscators
Using source-code tools with decompilation listings
Moving from simple and static, to dynamic reverse engineering

Part Seven: Dynamic reverse engineering with monitoring tools

Introduction to dynamic reverse engineering, and contrast to static reverse engineering
Network monitoring (“packet sniffing”)
Web monitoring with Fiddler, including AJAX client/server traffic
Encrypted web traffic (HTTPS), and mobile devices (iOS & Android)
Wireshark, pcap, and non-web protocols
Inferring server operation from client/server communications
Operating-system monitoring and logging tools
Walking live OS data structures
Monitoring application programming interface (API) usage
Mobile OS logging: Android, iOS, and Bluetooth
Event hooking
Memory inspection/forensics
Module removal and replacement: shimming, code injection, and other intrusive/active methods

Part Eight: Dynamic reverse engineering with debuggers

How using a debugger for reverse engineering differs from normal developer debugging
Web-browser debuggers and the document object model (DOM)
OS-level debuggers: breakpoints and intrusive testing
Back-tracing: “How did I get here?”
Debugging for Apple OSX/iOS and Android
Combining static and dynamic reverse engineering methods

Part Nine: Hardware reverse engineering [tentative outline; this section to be written by GreyB]

Introduction to hardware reverse engineering: how it resembles and differs from software reverse engineering
Microscopy and spectrometry tools: SEM/TEM, EDX, XPS, AFM, TOF, dynamic SIMS
Other tools: signal generators and oscilloscopes
Product teardown: Identifying internal boards, components, and ICs
Material categorization and composition
Thin-film layer categorization: electrical and magnetic properties
Chip-level circuit analysis
IC signal analysis
Chip-level code analysis: HDLs

Part Ten: Next steps in reverse engineering

Security and RE
Static & dynamic inspection to find security holes
Static inspection of known malware
Malware detection methods
Overcoming encryption and obfuscation; legal issues
Examining software from embedded devices (firmware)
Reverse engineering as a tool for litigation-related investigation
Project management: Time/budget to reverse engineer
Possible futures for reverse engineering:
1. the AI “black box” and “algorithmic transparency”;
2. reverse engineering machine learning (ML) models;
3. visualization;
4. inferring social-media algorithms;
5. supply-chain traceability & transparency

Appendices

Glossary
Summary of key points about reverse engineering
Common reverse-engineering errors
Bibliography

Software Litigation Consulting
