Hiding in Plain Sight: Using Reverse Engineering to Uncover (or Help Show Absence of) Software Patent Infringement

By Andrew Schulman

Imagine a building site where some event has occurred, and imagine some litigation about the event. Both sides are staring at the blueprints, but oddly, no one has visited the building site. Of course, blueprints present a lot of information not apparent from a building or jobsite. But since “as built” construction diverges considerably from original blueprints, and more fundamentally since a building and its blueprints are two different things, it’s hard to imagine construction litigation without investigating the actual jobsite.

Much software litigation can be analogized to a construction dispute where only the blueprints are consulted, without looking at the building.

In patent infringement cases involving computer software, in-depth factual investigation often awaits the opponent’s production during discovery of its closely-held source code. And once the source code is produced, investigation remains focused there – even if the product itself could be purchased on eBay for $50.

Loosely analogous to the blueprint for a building, source code holds the plans for a software-based product – but it is not the product itself. A software product – such as Microsoft Windows, a web site Flash animation, the “firmware” inside a Canon camera, or a PlayStation game – is generally distributed as object code (“open source” products are an important exception). Object code contains instructions to a computer (such as a PC, or the microprocessor located inside a digital camera). The instructions tell the computer how to behave like an operating system, word processor, game, or whatever other functionality the software product provides.

Computer programmers create these instructions, but usually indirectly: they write source code (which looks like a combination of pidgin English and mathematics), which is then translated (“compiled”) into object code, which is then packaged onto a CD or internet-download file to become the product sold to customers.

It therefore makes some sense that software litigation focuses on source code. After all, why bother with a translation – one that’s only immediately readable by a computer – when you can get the original written by humans?

But that’s just the problem: until a certain stage in the litigation, you can’t get the other side’s source code. It is tightly held by the product’s owner. This presents what is sometimes viewed as a chicken-and-egg problem in software patent litigation.

Under FRCP Rule 11, an attorney must have a reasonable basis for a patent infringement complaint. The reasonable basis has two prongs: claim interpretation, and comparing the accused device or method with those claims.[1] Once a software patent case is rolling, the comparison would be presented in a claims table using the opponent’s source code, produced in discovery. But discovery requires a case, which requires a complaint, which in turn requires, as just noted, some reasonable basis.

Apart from Rule 11, Local Patent Rules generally require that patent infringement plaintiffs provide, early in the case, detailed infringement contentions, describing — before discovery — where each limitation of at least one asserted claim is located within each accused product or process. Courts have interpreted this requirement to in turn require “reverse engineering or its equivalent” (see e.g. Network Caching v. Novell, ND Cal., 2002).

Thus, the reasonable basis for a software patent infringement claim, pre-filing and therefore pre-discovery, must generally be based on something besides the opponent’s tightly-held source code.

On what, then? The client’s mere say-so is not a reasonable basis.[2] Nor, often, are the alleged infringer’s marketing materials or product documentation. View Engineering v. Robotic Vision Systems, 208 F.3d 981 (Fed. Cir., 2000) (infringement claim inadequate when based merely on opponent’s advertising and claims made to customers).[3] Methods useful in investigating infringement of process and method patents, such as using the opponent’s regulatory filings, speaking with its customers and suppliers, and looking at recent price changes and capital improvements,[4] may have limited applicability to software.

The solution to this chicken-and-egg problem, then, at least until you get the source code, is to have an expert carefully examine the actual software product. In other words, reverse engineering (RE). RE usually appears in an IP context as a problem to be solved (do intermediate copies made during RE violate copyright?; can a shrinkwrap license agreement enforceably prohibit RE?). Here, however, RE is not a problem but a tool to be used in answering a factual question: does D’s product practice this claim of P’s invention?

The legal definition of reverse engineering is “starting with the known product and working backward to divine the process which aided in its development or manufacture.” Kewanee Oil v. Bicron, 416 US 470 (1974). In truth, software RE is more a working upwards than backwards. Rather than trying to reconstruct the original source code, the goal is to learn about a product’s “as built” design, from lower-level details of the product. It is an inductive process somewhat similar to the common law’s inference of blackletter law from myriad fact-specific case holdings. Generally the goal of software RE is not to duplicate the original product, but rather to interoperate with it or learn its flaws.

US trade secret law views RE as a proper means of learning a trade secret, if the product itself was acquired by fair and honest means. Uniform Trade Secrets Act (1985) § 1 (Comment). (Acquisition by fair and honest means may however be a big or at least medium-sized “if.” For instance, what about reverse engineering software from a widespread “beta” test, in which recipients personally signed a no-reverse-engineering clause but about which, on the other hand, the software vendor has been conducting widespread “leaks” to the press? These facts come from the trade-secrets portion of an early software patent case, Stac v. Microsoft.) [4B]

At the same time, most software end-user license agreements (EULAs) explicitly forbid reverse engineering. For example, the Microsoft Windows EULA states: “LIMITATIONS ON REVERSE ENGINEERING, DECOMPILATION, AND DISASSEMBLY. You may not reverse engineer, decompile, or disassemble the Software, except and only to the extent that such activity is expressly permitted by applicable law notwithstanding this limitation.”[5]

However, in the context of mass-market products with shrinkwrap or clickwrap licenses, such restrictions are generally viewed as unenforceable when there is a legitimate reason for RE and when there is no other means of access to otherwise-unprotected elements (Sega v. Accolade, 977 F.2d 1510 (9th Cir., 1993)) — and, as seen in the “except” portion of the Microsoft clause, this often is reflected in the EULA itself. (Software in a non-mass-market context may present a different story.)

Even given an otherwise-enforceable software license prohibition on RE, using the fruits of RE in litigation would likely fall under a fair-use exception for judicial proceedings. While not all courts view the use of copyrighted works in legal proceedings as “inherently” fair use (Shell v. DeVries, 2007 WL 4269047 (10th Cir.)), some do, so long as judicial use is not the work’s “intrinsic purpose” (Jartech v. Clancy, 666 F.2d 403 (9th Cir., 1982)), and fair use would in any case normally be found under the standard four-factor test (e.g., Shell v. DeVries, in which D failed to pay P’s $5,000-per-page fee for printing from a web site).

In fact, Rule 11 has been interpreted in some circumstances to require reverse engineering of a product as part of a pre-filing investigation.[6] Judin v. U.S. and Hewlett-Packard (110 F.3d 780 (Fed. Cir., 1997)) (attorney sanctions for reliance upon inventor of micro-optical imaging patent, without attempt to inspect Postal Service bar-code scanners), Antonious v. Spalding and Evenflo (275 F.3d 1066 (Fed. Cir., 2002)) (failure to cut open and examine all relevant golf clubs), Bender v. Maxim Integrated Products (2010 WL 2991257 (N.D. Cal.)) (plaintiff correct that reverse engineering not an absolute requirement, but here RE may be only method yielding sufficiently specific infringement contentions).[7]

At the same time, there is a persistent belief that object code is unintelligible, even by experts, or that software RE is infeasible. Apparently for this reason, some courts allow what would otherwise be inadequate claim contentions while awaiting source code. N.Y.U. v. E.Piphany (2006 WL 559573 (S.D.N.Y., 2006)) (characterizing need for sufficiently detailed preliminary infringement contentions, without source code, as a “Catch 22”, and requiring defendant to provide source code so plaintiff can be more specific about claims, citing American Video Graphics v. Electronic Arts (359 F.Supp.2d 558 (E.D. Tex., 2005))).

Yet, apart from cases where the object code itself is unavailable (hidden, for example, on the opponent’s network server), it is generally possible to learn about infringement from software RE.

In particular, there is much immediately-intelligible textual information inside software products, hiding in plain sight.

Extracting Information from Software Products

One method for extracting information from software is so simple that it should probably be considered plain visual inspection rather than reverse engineering. The information is what programmers call strings – sequences of human-readable text – and the software utility used to extract this information is also called strings.

Strings include not only the text of menus and dialog boxes, but also error messages, internal diagnostics, self-tests, and “debugging” information that has been left in the product. Remarkably, even a vendor that tightly guards its source code as the “crown jewels” will often send out its products with debugging information that contains significant source-code fragments.[8] These strings often contain names of functions implemented and used by the product.

These strings can be seen without running the program. What is required is looking at the object-code files that make up the product as if they were documents, little different from your own word-processing documents. For example, Microsoft Word is composed of files with names such as winword.exe and mso.dll. These files become instructions only when a computer interprets their contents as instructions. Until run (for example, by clicking on an icon), a program file is really just a file, and it can be examined like a file.

To pick a random example, the tiny wireless modem I am using right now contains embedded software (“firmware”). There is a firmware update for the modem, freely available on the internet. Examining this firmware with the strings program reveals literally thousands of lines of text referring to the technology used in the modem software.

While admittedly inscrutable-looking, this text is no more inscrutable than the source code would be. For instance, the string “GLMSSecURIReplaceFunc” likely represents a function providing secure replacement of a Uniform Resource Identifier (such as an HTTP web address) in a Group List Management Server. A code fragment left in the product as part of a self-test, “ptr_arpi_cb -> arp_instance > LAN_LLE_MIN”, refers to the Address Resolution Protocol and Long Link Emulation. To emphasize the point, this source-code fragment is located inside the vendor’s update files publicly available on the internet.
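The extraction step described above can be sketched in a few lines of Python. This is a minimal illustration of the idea behind the strings utility, not that utility's actual implementation; the function names and the sample bytes are made up for this example (the sample reuses two names quoted in this article). It only finds runs of plain ASCII, so text stored in wider character encodings would be missed.

```python
import re


def extract_strings(data: bytes, min_len: int = 4) -> list[str]:
    """Return every run of printable ASCII that is at least min_len bytes long."""
    pattern = re.compile(rb"[\x20-\x7e]{%d,}" % min_len)
    return [m.group().decode("ascii") for m in pattern.finditer(data)]


def strings_in_file(path: str, min_len: int = 4) -> list[str]:
    """Read a program file as ordinary data (without running it) and list its strings."""
    with open(path, "rb") as f:
        return extract_strings(f.read(), min_len)


if __name__ == "__main__":
    # Hypothetical bytes standing in for a firmware file: text surrounded by binary data.
    sample = b"\x00\x01GLMSSecURIReplaceFunc\x00\xff\x02ab\x00gpssrch_dispatcher.c\x00"
    for s in extract_strings(sample):
        print(s)
```

The minimum length of four characters is an arbitrary choice here; a shorter minimum yields more noise, a longer one may drop short but meaningful names.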

Just as attorneys have learned that documents often contain hidden metadata (such as remnants of previous drafts), similarly the text hidden (barely) in software is a fruitful area for exploration. Software metadata includes what other software the product depends upon, or the names of services it uses or provides.

Such services are called application programming interfaces (APIs). APIs are exported to enable one separable component of a product, or third-party software, to use (import) some functionality provided by the product. This functionality is often invoked using its name, which in turn is often contained in a program file’s metadata as an import/export table. Thus, both the component providing the functionality, and other components using the functionality, will contain the name. These APIs often reflect low-level functionality which happens to correspond to elements or steps in a patent claim. APIs are often documented (if not by the vendor, then by third parties),[9] so knowing that product A uses an API from product B can, together with product B’s documentation, tell you something about product A, without product A’s source code.
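Because an API name tends to appear both in the component that provides it and in the components that use it, even a crude search can suggest which files are worth a closer look. The following is a rough sketch only: it searches raw file contents for names rather than parsing a real import/export table, so a hit may have some other explanation. The file names and API names are hypothetical.

```python
def files_mentioning(api_names, files):
    """Map each API name to the sorted labels of the files whose bytes contain it.

    files is a dict of {label: file contents as bytes}.
    """
    result = {}
    for name in api_names:
        needle = name.encode("ascii")
        result[name] = sorted(label for label, data in files.items() if needle in data)
    return result


if __name__ == "__main__":
    # Hypothetical contents: one component exporting two functions, one importing one.
    files = {
        "provider.dll": b"\x00CompressBlock\x00DecompressBlock\x00",
        "client.exe": b"\x00CompressBlock\x00provider.dll\x00",
    }
    for name, labels in files_mentioning(["CompressBlock", "DecompressBlock"], files).items():
        print(name, labels)
```

An expert would normally confirm such a hit against the file's actual import/export metadata and the API's documentation before relying on it.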

After strings and metadata, only slightly less immediately readable inside software are what programmers sometimes call “magic numbers.” For example, the hexadecimal number 5A827999 is a constant used in the Secure Hash Algorithm (SHA-1), so finding it in a product suggests use of that algorithm. Such numbers thus can act as, or as part of, fingerprints or signatures. Using a utility that scans files for such signatures would help infer that a product uses certain algorithms. There are also “Big Code” approaches to binary code which may be useful in mining large amounts of code for patterns.[9B]
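A signature scan of this kind might look like the following sketch, which searches a file's bytes for the four 32-bit round constants of SHA-1 in both byte orders. The function name and sample data are invented for illustration. A hit is suggestive rather than conclusive: some of these constants are shared with other algorithms (MD4, for instance, also uses 5A827999), and any four bytes can occur by coincidence in a large file.

```python
import struct

# The four round constants defined for SHA-1.
SHA1_ROUND_CONSTANTS = (0x5A827999, 0x6ED9EBA1, 0x8F1BBCDC, 0xCA62C1D6)


def find_constants(data: bytes, constants=SHA1_ROUND_CONSTANTS):
    """Return {constant: sorted offsets} for 32-bit constants found in either byte order."""
    hits = {}
    for value in constants:
        offsets = set()
        for fmt in ("<I", ">I"):  # little-endian, then big-endian
            needle = struct.pack(fmt, value)
            start = data.find(needle)
            while start != -1:
                offsets.add(start)
                start = data.find(needle, start + 1)
        if offsets:
            hits[value] = sorted(offsets)
    return hits


if __name__ == "__main__":
    # Hypothetical bytes: two constants embedded among unrelated data.
    sample = (b"\x90\x90" + struct.pack("<I", 0x5A827999)
              + b"\x00" + struct.pack(">I", 0x6ED9EBA1))
    for value, offsets in find_constants(sample).items():
        print(f"{value:08X} at offsets {offsets}")
```

Finding several constants of the same algorithm close together is a stronger indication than finding one in isolation.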

The accompanying box, “It’s Not Just Ones and Zeroes,” summarizes the types of readily-ascertainable information available in software products.

The best-known types of software RE tools are disassemblers and decompilers. These tools translate object code into something readable by humans. A disassembler displays object code in a readable assembly language. This would, for example, allow an expert to trace back from an error or warning message to the code that triggers it.[10] A decompiler attempts to recover some semblance of the original source code. While generally no more feasible than translating English text into Russian, and then attempting to recover the original English from the Russian translation, decompilation is possible for some programming languages such as Java or Flash animation (SWF), and environments such as Microsoft’s .NET.

The techniques described above are all forms of reading the software product in one way or another as a piece of text. These techniques can be classified as static RE. Another set of methods, dynamic RE, actively runs the accused product under the control of another program, which tests or tracks the product’s behavior. Dynamic RE tools include debuggers, network monitors (packet sniffers) and file monitors, and API trace utilities. See the accompanying box, “Software Reverse Engineering Tools.”

Using Reverse Engineering with Source Code

So is there any reason to get the source code? Yes, even if only to confirm and prove one’s RE findings with the opponent’s own documents. In addition, the object code inside a product is in one important sense incomplete. Just as some information in blueprints may not be reflected in the finished building, comments (notes that programmers put into the source code, explaining for example what a piece of code is intended to do, or why it is implemented in a certain way) are an important source-code component which is lost during compilation, and so cannot be recovered from object code. Comments are a key reason to seek the opponent’s source code in discovery.

If comments cannot be retrieved from the product, conversely is there anything in the product that cannot be read in the source code? Perhaps surprisingly, yes. Because not all of the source code is necessarily used to produce the accused product, and because not all code is necessarily executed, one cannot assume from its mere presence in source code that a given element or step is actually carried out by the accused product.[11] At the end of the day, it is the finished product, not the source code, which will or won’t be subject to an injunction or damages.

That the source and the product are different (though related) beasts also means that reverse engineering the product can help you get the source code in discovery. Like deleted text in a word-processing document, a software product may without the vendor’s awareness contain not only fragments of source code, but filenames or even full pathnames of source code.

For example, the wireless modem firmware update noted earlier contains filenames such as “gpssrch_dispatcher.c”, and pathnames such as “pistachio/kernel/src/generic/kmemory.cc” and “drivers\boot\boot_shared_progressive_boot_block.c”. Some of these correspond to open-source software used in the modem; knowing this, at least this portion of the source code can be accessed, without a discovery request to the modem’s vendor. (Of course, the vendor’s modifications, additions, and deletions to the open-source code are a different story.)
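Source filenames and pathnames like these can be pulled out of a product file with a simple pattern match. The sketch below assumes C and C++ file extensions only (.c, .cc, .cpp, .h), which is an illustrative choice; other languages would need other extensions. The function name is invented, and the sample bytes are made up around the names quoted above.

```python
import re

# A run of path characters ending in a C/C++ source extension.
SOURCE_PATH = re.compile(rb"[A-Za-z0-9_./\\-]+\.(?:cpp|cc|c|h)(?![A-Za-z0-9_])")


def source_paths(data: bytes) -> list[str]:
    """Return the distinct source file names or paths found in data, sorted."""
    return sorted({m.group().decode("ascii") for m in SOURCE_PATH.finditer(data)})


if __name__ == "__main__":
    # Hypothetical bytes standing in for a firmware file.
    sample = (b"\x00pistachio/kernel/src/generic/kmemory.cc\x00junk\x01"
              b"drivers\\boot\\boot_shared_progressive_boot_block.c\x00"
              b"gpssrch_dispatcher.c\x00settings.config\x00")
    for path in source_paths(sample):
        print(path)
```

Directory names recovered this way can also hint at third-party or open-source components, which can then be looked up independently.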

Furthermore, you can always frame better discovery requests if you know the proper nouns to invoke (“ask for it by name”). Most software products are not monoliths. You are likely especially interested in source code for specific modules, whose names and versions you or your expert may learn from the product. Software products also surprisingly often contain developer names, which can be used to propound alarmingly-pinpointed discovery requests (“how did they know that?!”).

Finally, once you do get the source code, there are two ways in which having already inspected the final product will help. Your expert will have a much better idea what to look for in the source. And, if the product contains filenames or pathnames, as products often do, you can cross-check the produced source code for completeness. Amazingly, many software vendors do not possess all the source code for their products.[12] For this and other reasons, source code produced in discovery may be incomplete. This can sometimes be determined by checking cross-references within the source code, but filenames or pathnames from the product also help reveal that something is missing.
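The completeness cross-check can be sketched as a comparison between the source names found in the product and the files actually produced. The sketch below compares file names only, ignoring directories and letter case, on the assumption that the directory layout of a production rarely matches the vendor's build machine; the function name and the produced-file list are hypothetical.

```python
import ntpath


def missing_from_production(names_in_product, produced_paths):
    """Return product-referenced source names with no same-named file in the production."""
    def base(path):
        # ntpath splits on both "/" and "\", so either path style works.
        return ntpath.basename(path).lower()

    produced = {base(p) for p in produced_paths}
    return sorted(n for n in names_in_product if base(n) not in produced)


if __name__ == "__main__":
    names_in_product = [
        "pistachio/kernel/src/generic/kmemory.cc",
        "drivers\\boot\\boot_shared_progressive_boot_block.c",
        "gpssrch_dispatcher.c",
    ]
    # Hypothetical listing of files received in discovery.
    produced = [
        "src/gps/gpssrch_dispatcher.c",
        "src/boot/boot_shared_progressive_boot_block.c",
    ]
    for name in missing_from_production(names_in_product, produced):
        print("not produced:", name)
```

A name on the resulting list is a lead, not proof of an incomplete production: the file may belong to open-source or third-party code, or may have been renamed.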

Using Reverse Engineering to Locate Prior Art

While this article has focused on pre-filing investigation of software patent infringement, RE can also be used at other stages in the patent lifecycle. In particular, the simpler RE methods noted above may be used to show prior use or knowledge of otherwise-unpublished software inventions. It is the simpler methods such as “strings” which are likely applicable here, rather than disassembly, because prior use or knowledge is anticipatory only if it is publicly disclosed and enabling.

It is an interesting question whether a publicly-available software product, holding within it an undocumented invention which could later be disassembled, is anticipatory, because information only accessible upon disassembly may be viewed as not publicly disclosed. Strings plainly visible inside an object file placed on the internet are far more likely to constitute prior use or knowledge. This follows naturally from the basic point of this article, which is that software products often contain readily-accessible useful information – including fragments of source code which the vendor has wittingly or unwittingly made public.

[An earlier version of this article appeared in Intellectual Property Today, November 2010]

It’s not just ones and zeroes: Some types of strings, metadata, and magic numbers found in software products

  • Menus and dialog box text (“resources”)
  • Error and warning messages
  • Self-tests (“assertions”)
  • Diagnostics and logging information
  • Debugging information, including source-code path/filenames
  • Embedded scripting code, e.g. HTML, SQL
  • Filenames and URLs of data files
  • Application Programming Interface (API) and module import/export tables
  • Registry or configuration setting keys and values
  • Magic numbers characteristic of standard algorithms
  • Globally Unique IDs (GUIDs, UUIDs)

Software reverse engineering tools

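Static reverse engineering:

  • String extraction utilities (strings)
  • Signature-scanning utilities (magic numbers)
  • Disassemblers
  • Decompilers

Dynamic reverse engineering:

  • Debuggers
  • Network monitors (packet sniffers)
  • File monitors
  • API trace utilities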

Notes

1 See David Hricik, Patent Ethics: Litigation (2010), 99-101 (Application of Claims to Accused Product or Process), John Skenyon, “Investigation Needed Before Bringing Suit,” Patent Litigation ed. Laurence Pretty (PLI, 2004), ch. 2.

2 See Joseph Hosteny, “The Winnable Contingent Fee Case,” Intellectual Property Today, July 2010, 12-13 (“… the inventor who says someone infringes because that’s the only way the infringement can occur…. Inventors do not always appreciate that there may be more than one way to achieve the goals of an invention.”)

3 But see Network Caching v. Novell (2003 WL 21699799 (N.D. Cal., 2003)) (contentions based on marketing materials and other publicly available product documentation not exemplary, but sufficient).

4 See Jeffrey Lewis and Art Cody, “Unscrambling the Egg: Pre-Suit Infringement Investigations of Process and Method Patents,” 84 J. Pat. & Trademark Off. Soc’y 5 (2002).

[4B] See Schulman, “LA Law” [article on Stac v. Microsoft], Dr. Dobb’s Journal (May 1994), http://www.drdobbs.com/undocumented-corner/184409244 .

5 http://www.microsoft.com/windowsxp/eula/home.mspx (June 1, 2004)

6 For possible tension between DMCA anti-circumvention on the one hand, and Rule 11 RE requirement on the other, see Jeffrey Sullivan and Thomas Morrow, “Practicing Reverse Engineering in an Era of Growing Constraints under the Digital Millennium Copyright Act and Other Provisions,” 14 Alb. L.J. Sci. & Tech. 1, 38-48 (2003).

7 But see Intamin Ltd. v. Magnetar Technologies (483 F.3d 1328 (Fed. Cir., 2007)) (no sanctions for failure to obtain and cut open metal casing in roller-coaster braking system; distinguished from ease of obtaining sample in Judin).

8 Matt Pietrek, “Remove Fatty Deposits from Your Applications Using Our 32-bit Liposuction Tools,” Microsoft Systems Journal, Oct. 1996 (http://www.microsoft.com/msj/archive/s572.aspx).

9 See e.g. Andrew Schulman et al., Undocumented DOS, 2nd ed. (1993); Schulman et al., Undocumented Windows: A Programmer’s Guide to Reserved Microsoft Windows API Functions (1992); Geoff Chappell, notes on Windows kernel, Win32, shell, Internet Explorer, at http://www.geoffchappell.com.

[9B] See e.g. David & Yahav, “Tracelet-Based Code Search in Executables,” http://www.cs.technion.ac.il/~yahave/papers/pldi14-tracelets.pdf, and generally papers at “PRIME: Programming with Millions of Examples” (http://www.cs.technion.ac.il/~yahave/prime/index.html), “Statistical Similarity of Binaries” (http://binsim.com/), and “Learning from Big Code” (http://learnbigcode.github.io/challenges/notthereyet/).

10 See Andrew Schulman, “Examining the Windows AARD Detection Code,” Dr. Dobb’s Journal (Sept. 1993), http://www.drdobbs.com/184409070.

11 See Lee Hollaar, “Requesting and Examining Computer Source Code,” 4 Expert Evidence Report (BNA) 238-241 (May 10, 2004).

12 In private antitrust litigation, Microsoft produced internal company documents complaining that its Windows source code was overly decentralized (“Windows as you know contains many pieces of functionality from different groups around the company. Regardless of product, good engineering practice would require us to be able to do a fresh build of a product at any time using the same tools. Unfortunately, we cannot do this with Windows today…. We need all the source code for Windows being built out of one place with one consistent set of tools. It is actually amazing how we have not done this for so long…. We need to be able to build what we ship long after we RTM [release to manufacturing]…. There are legal obligations regarding our ability here”). Skype in litigation has also noted lack of source code, as has Toshiba (Juniper Networks v. Toshiba).