Frequently-asked questions regarding source-code examination/review for litigation, followed by FAQs regarding software reverse engineering for litigation:
WITH WHICH PROGRAMMING LANGUAGES DO YOU WORK?
We’ve worked extensively with source-code productions in the following languages: C, C++, Java, JavaScript, Objective-C, C#, PHP, Python, Intel 32- and 64-bit assembly language, and other languages.
Frequently, source code for a given product or service is written in multiple languages, including custom, in-house, or proprietary languages, mini-languages, and templates; it is important not to overlook these in a source-code examination.
WHAT TOOLS DO YOU USE FOR SOURCE-CODE EXAMINATION?
We use both industry-standard tools (e.g. SciTools Understand, dtSearch, grep, findstr, PowerGrep, diff, WinMerge) and custom tools (scripts).
The ability to write custom tools is frequently important in source-code examination, as computers hosting the disputed source code are generally locked down (e.g. without internet or USB access) under a protective order (PO), often without adequate examination tools having been agreed to in the PO. The examiner must therefore be able to craft tools on the spot, using scripting facilities present on the computer, such as awk on Apple computers, or PowerShell and WSH on Windows computers. At the same time, use of industry-standard tools wherever possible is important to meet potential Daubert methodology challenges.
During source-code examinations, we make extensive use of scripting and the operating system’s “command line” to semi-automate portions of the examination. The scripting and automation facilities we use are generally available even on the locked-down computers provided under source-code protective orders. Surprisingly, our ability to quickly write programs to help examine large amounts of source code appears to be rare among source-code examiners, who generally depend more on as-provided GUI tools than we do.
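As a rough illustration of the kind of small on-the-spot tool described above, here is a minimal sketch of a keyword/synonym search over a source tree. It assumes a Python interpreter happens to be present on the review computer (the same logic could be written in awk or PowerShell); the function name `search_tree` and the default extension list are our own illustrative choices, not part of any standard tool.

```python
import os
import re

def search_tree(root, terms, extensions=(".c", ".cpp", ".h", ".java", ".py")):
    """Yield (path, line_number, line) for each line containing any term.

    Assumptions (hypothetical, for illustration): terms are matched as
    case-insensitive literal substrings, and only files with the listed
    extensions are read.
    """
    pattern = re.compile("|".join(re.escape(t) for t in terms), re.IGNORECASE)
    for dirpath, _dirnames, filenames in os.walk(root):
        for name in sorted(filenames):
            if not name.lower().endswith(extensions):
                continue
            path = os.path.join(dirpath, name)
            # errors="replace" keeps the scan going on files with odd encodings
            with open(path, "r", encoding="utf-8", errors="replace") as handle:
                for number, line in enumerate(handle, start=1):
                    if pattern.search(line):
                        yield path, number, line.rstrip("\n")
```

A call such as `search_tree("production", ["compress", "deflate"])` would produce one hit per matching line, which can then be sorted or grouped by file. Hits of this kind are only a starting point for the tracing and close reading described below.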
WHAT PROCESS OR METHODOLOGY SHOULD BE USED FOR SOURCE-CODE EXAMINATION?
While each source-code examination is different, a general approach can be described, including work to be done before and after the examination. Before the source-code examination starts:
- If consultants/experts are brought on sufficiently early, they should help craft discovery requests.
- Before starting source-code examination, ideally consultant/expert should have already gleaned as much information as possible from non-source materials, including any publicly-accessible product or service; see reverse engineering FAQs below and material on pre-filing investigation.
- Frame narrow technical questions, to be answered at least in part by examining the source code.
At the source code examination itself:
- Consultant/expert must pay careful attention to limits on note-taking, printing, etc. set by protective order (PO); generally no other electronics allowed in same room as source-code computer.
- Do initial inventory of source-code production (products, version numbers, dates, platforms, programming languages); count directories, files, and lines of code.
- Test completeness/responsiveness of source-code production, using both internal and external tests.
- Initial selection of relevant portions of source-code production (e.g. specific versions within damages period).
- Initial search for keywords/synonyms/terminology (including e.g. terms from claim construction in patent litigation).
- Possibly perform software component analysis (SCA; this will not always be feasible under PO restrictions, as SCA may require comparison of source code with an external repository of code “signatures”).
- Select candidate source-code files, functions, structures, classes, etc. for closer inspection.
- Widened search, with additional terminology or criteria (dates, developer names, API usage, etc.) suggested by initial reading of candidate files (using scripting facilities, this can become a form of TAR [technology assisted review] applied to source code).
- Performing comparisons:
- in software patent litigation, comparing code to the elements/steps of patent claims, interpreted using claim construction;
- in copyright or trade-secret litigation, the examination will typically require comparison between two different source-code repositories; this comparison should account for both literal and non-literal similarities. In copyright litigation the examiner should ask whether any similarities reflect actual copying, and whether similarities are protectable expression; in trade-secrets litigation, whether any similarities fall within the definition of a trade secret.
- in a contracts dispute, source code will often be compared to a requirements specification, or to a published (or de facto) industry standard.
- Tracing (i.e., code examination NOT determined by keywords) to and from selected files/functions, including determination of code’s role within overall program; using top-down as well as bottom-up approaches to code examination; following code paths from entry points; tracing function pointers, callbacks, UI handlers.
- Close reading of code, e.g. to determine whether names, terminology, comments, etc. accurately reflect code operation.
- Determine discrepancies/differences between multiple versions of files/functions.
- Note taking and printing (or requests for the producing party to print), within limits set by PO.
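The initial inventory step listed above (counting directories, files, and lines of code) can be sketched as follows. This is a minimal sketch under stated assumptions: it groups files by filename extension as a rough proxy for programming language, and it counts raw lines, including blank lines and comments. The function name `inventory` is hypothetical.

```python
import os
from collections import Counter

def inventory(root):
    """Return (directory_count, files_by_ext, lines_by_ext) for a source tree.

    Assumption: extension is used as a stand-in for language, and every
    physical line is counted (no attempt to skip blanks or comments).
    """
    directory_count = 0
    files_by_ext = Counter()
    lines_by_ext = Counter()
    for dirpath, dirnames, filenames in os.walk(root):
        directory_count += len(dirnames)
        for name in filenames:
            ext = os.path.splitext(name)[1].lower() or "(none)"
            files_by_ext[ext] += 1
            # Binary mode avoids decoding errors; iteration splits on newlines
            with open(os.path.join(dirpath, name), "rb") as handle:
                lines_by_ext[ext] += sum(1 for _ in handle)
    return directory_count, files_by_ext, lines_by_ext
```

Unexpected extensions in the result are often the first sign of the in-house languages, mini-languages, or templates that should not be overlooked.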
After the source code examination:
- Reporting/memos as requested by client, including e.g. claims charts for patent litigation.
- Possibly amend/supplement discovery requests, as a result of missing code, or of newly-revealed products/versions; this may include third-party subpoenas.
- Possibly schedule return source-code visits.
- Careful handling of printouts, under PO.
See additional notes on the before/during/after source-code examination process in outline for book on source-code exam.
CAN SOURCE CODE REVIEW BE COMBINED WITH OTHER WAYS OF EXAMINING SOFTWARE?
Yes; in addition to the often-required use of reverse engineering e.g. as part of a reasonable pre-filing investigation (see FAQs below), non-source techniques include:
- Comparison/correlation of source code with products/services (e.g. “latent code” issues; creating source-code maps from binary/object code, as preparation for discovery).
- Comparison and correlation of internal documents from discovery (emails, requirements specifications, etc.) with resulting technical practices.
- Identifying specific code referenced in deposition transcript or from informal interviews with developers or managers.
- Open-source examination: comparison/correlation of open source with products/services, and with proprietary source code, including vendor’s modifications to open source.
- Correlation of source code with public information, such as application programming interface (API) documentation, software development kit (SDK) sample code, header files, open source, and so on.
- Comparison of source code (or decompiled/disassembled software) with a “Big Code” repository of software signatures, as part of software component analysis (SCA).
HOW DOES A LITIGATION-DRIVEN SOURCE-CODE REVIEW DIFFER FROM HAVING ANY COMPETENT PROGRAMMER SEARCH AND READ THE CODE?
The methodology followed in a litigation-driven source-code examination is based on widely-accepted, industry-standard practices for source-code examination as used in non-litigation contexts such as code maintenance (improvements and bug fixes), walk-throughs, and security auditing (see software inspection methodologies in §§ 8.6.5.3, 8.6.5.4, and 8.6.5.5 of chapter 8 on experts in source-code book detailed outline).
However, a source-code exam for litigation requires a focus on answering narrowly-framed technical questions that arise from the legal issues (raised by the elements of a cause of action [COA] such as copyright infringement or breach of contract, or case-specifically by e.g. the elements/steps of a patent claim). A source-code examination generally cannot be a “holistic” attempt to understand the code as a whole. An outside consultant/expert’s perspective is generally more credible to the court than that of an insider who is more intimately familiar with the code. An outsider can also sometimes see important things missed by insiders (as shown by the large numbers of software security bugs found by third-party penetration testers and “bounty hunters,” and by outside researchers). At the same time, the outside consultant can often work productively with a client’s in-house developers.
A litigation-driven source code examination typically involves comparison to a greater extent than does non-litigation source code use (though the “diff” utility is heavily used as part of version control to determine differences between older and new versions of source code). Most clearly in copyright and trade-secrets litigation, a source code examination will include comparison between plaintiff’s and defendant’s source code. Code-to-code comparisons are rare in patent litigation, but source code is compared against the text (as properly interpreted under claim construction) of the patent claims asserted to have been infringed. In a contracts dispute, source code will often be compared to a requirements specification and/or to industry standards, often more carefully than might occur as a normal part of software development.
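To illustrate how a code-to-code comparison can look past renamed identifiers, here is a minimal sketch of a tokenized comparison. It is only an illustration, not a substitute for a full literal and non-literal analysis: the keyword set is a small assumed subset of C-like keywords, comments and string literals are not handled, and the names `normalize` and `similarity` are hypothetical. It relies on `difflib.SequenceMatcher` from the Python standard library.

```python
import re
from difflib import SequenceMatcher

# Identifiers/keywords, integer runs, or any single non-space character
TOKEN = re.compile(r"[A-Za-z_]\w*|\d+|\S")
# Assumption: a deliberately tiny keyword list, for illustration only
KEYWORDS = {"if", "else", "for", "while", "return", "int", "void", "char"}

def normalize(source):
    """Replace identifiers with ID and numbers with NUM; keep the rest."""
    tokens = []
    for tok in TOKEN.findall(source):
        if tok in KEYWORDS:
            tokens.append(tok)
        elif tok[0].isalpha() or tok[0] == "_":
            tokens.append("ID")
        elif tok[0].isdigit():
            tokens.append("NUM")
        else:
            tokens.append(tok)
    return tokens

def similarity(a, b):
    """Return a 0.0-1.0 ratio of matching normalized tokens."""
    matcher = SequenceMatcher(None, normalize(a), normalize(b), autojunk=False)
    return matcher.ratio()
```

Two functions that differ only in the names chosen for functions and variables normalize to the same token sequence, so a plain line-by-line diff would report them as different while this comparison reports them as matching. Any such score still has to be interpreted by the examiner, for example to rule out constrained or commonly-sourced code.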
Source code examination for litigation also differs from any simple searching and reading of the code. As one example, a proper source-code exam will be alert to the possibility of code which is unused in the finished product (“dead” or “latent code”); determining code usage or non-usage may require careful tracing (distinct from searching) through the source code, comparison of the source code with the binary/object code in the finished product, or even dynamic testing of the “live” product. Further, whereas a simple keyword search might end up treating the source code as a disconnected collection of lines of code, a true source-code examination will rely, for example, on:
- the relationship between caller and callee functions;
- child/parent class relationships;
- aliasing of names used for functions, parameters, variables, and data structures;
- actual operation of code (in contrast to relying entirely on the accuracy of names or comments in the code);
- awareness of implicit function calls (e.g. constructor/destructor), and implied function parameters;
- classes and modules;
- directory and filename path information;
- version control information and other metadata;
- careful attention to function pointers, vtables, operator/function overloading, hooks, on-event handlers/subscribers, run-time dynamic linking, and externally-generated events; and
- tracing data flow.
FOR WHICH LEGAL ISSUES CAN SOURCE CODE BE IMPORTANT?
- Patent litigation:
- determining infringement and non-infringement of software/internet/business-method/device patents, by comparing source code to the elements/steps of asserted patent claims (as interpreted under claim construction), looking for presence of all elements/steps (infringement), or their equivalents; or absence of at least one (non-infringement);
- examination guided by possible multiple alternative constructions, if pre-Markman;
- to a somewhat lesser extent (because prior art must have been publicly accessible at the relevant time, and source code may have been confidential, albeit associated with a publicly-accessible product/service), helping determine anticipation and obviousness;
- patent owner’s own earlier public use or sale, for statutory on-sale and public-use bars;
- patent owner’s own practicing/working of patent, for technical prong of ITC domestic-industry test (and as possibly relevant for injunctive relief);
- design around and non-infringing alternatives (NIA);
- correspondence of code to standards asserted to infringe patent claims (SEP, FRAND, SSO).
- Copyright:
- separating protected from non-protected code;
- analyzing copying as a separate issue from similarity (determine whether similar code represents independent creation, and/or if matching code derives from common source);
- determining proper level of generality for determination of non-literal similarity;
- both literal and non-literal copying;
- abstraction/filtration/comparison;
- substantial similarity in code structure or sequence;
- tokenized and structural testing to assess non-literal copying;
- derivative works;
- translation between different programming languages;
- presence of errors e.g. misspellings or “salting”;
- presence of “constrained” or mutually-borrowed code (including e.g. open source);
- showing access to underlying work, for copying determination (though access may also be inferred from near-identicality not explained by constraints);
- assessment of overlap percentages;
- assessment of overlapping code’s role within overall product/service, for technical portion of damages calculation.
- Trade secrets:
- determining whether assertedly-taken code is, or contains, trade secrets (including TS compilation);
- misappropriation through both literal and structural overlap;
- showing use or absence of reasonable security precautions (RSP, e.g., encryption, obfuscation, passwords, authentication);
- assessing importance of secrecy to the value of overlapping code (including whether a secret was at the relevant time nonetheless reasonably ascertainable using proper means, including reverse engineering);
- determining whether overlapping code is covered by confidentiality agreements or NDAs (including as modified by state law).
- Antitrust, competition law:
- deliberate incompatibilities;
- fake warning/error messages;
- differential access to undocumented APIs;
- standards misuse;
- determination of importance of withheld technical information (“essential facilities” assertions);
- correlation of code with internal documents, to determine if internal plans “came to fruition”;
- assessing “integration” vs. loose-coupling of bundled components, for technical portion of tying analysis.
- Products liability (including e.g. software-based medical devices):
- acceptable vs. unacceptable software defects;
- is defect within license exclusion?
- Contracts disputes:
- was agreed-upon code delivered on schedule?;
- is delivered code reasonably fit for intended use?;
- software-failure analysis;
- comparison of delivered code to industry standards, both published and de facto (inferred from customary trade practices).
HOW DOES SOURCE-CODE EXAMINATION DIFFER FROM OTHER FORMS OF E-DISCOVERY?
See article comparing and contrasting source-code exam with e-discovery.
WHAT IMPACT CAN A PROTECTIVE ORDER (PO) HAVE ON SOURCE-CODE EXAMINATION?
For the impact on the source-code examination methodology that can be followed, on the expert report, on note-taking and printing, and other potential effects of the PO, see article on source code POs, and chapter on source-code examination environment in forthcoming book on source-code exam.
In a copyright infringement or trade-secrets misappropriation case, it is especially important that the two source-code productions to be compared be provided on the same computer; those negotiating POs sometimes overlook this.
For further information on source-code examination for litigation, see outline for forthcoming book.
To discuss source-code examination services from SoftwareLitigationConsulting.com, email undoc@sonic.net or call us (707-495-5240).
For sample projects that have been performed for clients, relating to both source-code examination and software reverse engineering, see list.
Frequently-asked questions (or frequently-held assumptions) about using reverse engineering as a litigation tool in software/internet patent cases:
“WHERE ELSE WOULD YOU LOOK, TO DETERMINE INFRINGEMENT, BESIDES THE SOURCE CODE?”
Anyone who has litigated software/internet patents has probably heard (or perhaps even thought) something like the following:
- “There’s only so much we can do about factual investigation, until we get the other side’s source code.”
- “Where else would you look, to determine infringement, besides the source code?”
- “Source code is basically the only place to find out if a software patent claim is infringed, or anticipated by, a software/internet product/service.”
- “They’re going to have to show us their source code in discovery, so why bother reverse engineering their product?”
In twenty years of working as a consulting technical expert in software and internet patent litigation, I’ve frequently heard sentiments such as these.
While source code is very important here (see source-code FAQs above), at the same time the source code is NOT the same as the product or service that’s out on the market — i.e., the thing that actually generates revenue, that is accused of patent infringement, or that can be used as anticipatory prior art.
The other side’s source code is usually proprietary, tightly held as the “crown jewels,” and only available to your experts or consultants (apart from open source) under increasingly-stringent protective orders. In contrast, the other side’s products or services may exist in millions of readily-available copies.
And not merely readily available, but also readily examined by an expert, using tools and methods such as text extraction (“strings”), network monitoring, object/class listings, and (for certain types of code) decompilation.
These methods of software examination are sometimes called “reverse engineering.”
In litigation involving physical devices, both sides may employ what are sometimes called “teardown labs.” This “teardown” process is also applicable to software: even the simplest product is made up of numerous parts, components, or modules (whose names are often available right inside the product).
WHY BOTHER WITH ANYTHING EXCEPT THE SOURCE CODE? AND WHAT’S IN IT FOR DEFENDANTS?
But since the source code is eventually going to be made available during discovery, why bother hiring SoftwareLitigationConsulting.com to do an in-depth examination of the product itself, rather than wait for the source code?
If you represent the plaintiff, the reason is fairly clear: courts generally interpret Rule 11 and the Local Patent Rules to require “reverse engineering or its equivalent” for pre-discovery contentions. Courts have rejected even “preliminary” infringement contentions based entirely on marketing materials, web site screen-shots, or technical manuals. The courts generally want litigants in these cases to look at primary sources. Possibly forthcoming legislation (Rep. Bob Goodlatte’s “Innovation Act” patent litigation reform bill, passed several years ago as HR 3309, with somewhat parallel legislation before the Senate judiciary committee) further emphasizes the need for rigorous pre-filing investigation by plaintiffs.
If you represent a defendant, the same rules apply for your initial invalidity contentions. Further, when it comes to showing on-sale or public use, you’re better off pointing to the actual product or service that was on the market, than to the source code which was likely not public at the relevant time. Recall too that anticipatory prior art must have been capable at the time of teaching the person having ordinary skill in the art (PHOSITA), so when using earlier software or services as prior art, again you’re better off if you can point to something in an actual product which was on the market, rather than pointing to tightly-held source code. And if prior art was created by a third party, you might not even get access to that third party’s source code.
As for your own software, surely there’s no reason to reverse engineer this, since you already have the source code? However, it’s remarkable how often outsiders, looking only at the finished product, can see something more likely to be missed when only viewing the source code. The news is filled with reports of outsiders finding “security vulnerabilities” that are unknown to the company or team which developed a piece of software. A similar phenomenon is at work here in litigation-based code examination: an outsider’s perspective on your own code is often useful.
For both parties, waiting for the source code makes the source-code examination, when it does occur, more time-consuming and expensive than it need be. A litigant could often have learned a lot beforehand about a product or service’s internal operation – including, frequently, the layout of its source code, including full path/filenames – just from careful inspection of the product readily available on the market. When the source code later becomes available, it’s better if your expert or examiner beforehand knows, in detail, what to expect. (This also helps determine whether any source code is missing from the production.)
Both parties will also eventually need to compare the source code with the actual product or service, to determine the role played, in the actual revenue-generating good, of claim elements/steps identified in the source code. Sometimes source code doesn’t make its way into the final product, or is never executed (“latent code”), or is only executed under rare circumstances (which fact may impact damages calculations).
IS THE KIND OF INFORMATION I NEED REALLY PUBLICLY ACCESSIBLE?
Sure, you’re saying, I already knew the source code isn’t identical to the product. But isn’t the product just “object code” or “binary code,” just a collection of 1s and 0s that are of little use as evidence? And, okay, even if it also contains some text, is that text going to be relevant to the specific limitations of the patent claims?
You might be surprised at how much information can generally be derived, without source code, directly from software/internet products and services, and how much of this information is at the right level of granularity to do the element-by-element or step-by-step analysis needed to make at least a preliminary showing of infringement, non-infringement, or some aspects of invalidity.
Software and internet products and services, when an expert examiner digs into them, often contain readable text with the names of functions, modules, data structures, certain types of variables, and even source-code filenames (that’s right, often you can know the names of the other side’s source-code files, before ever seeing them). Here are some randomly-selected examples of fragments extracted – NOT from source code, but from the binary/object code – in products on the market:
- “Unable to send trigger for metric 0x%x with device configuration 0x%x because AWDServerConnection::getInstance or getServerFacade returned NULL.”
- “/SourceCache/iTunesStore_Sim/iTunesStore-661.4.2/Daemon/FairPlayDecryptFileOperation.m”
- “- (id)imageForKey:(id)arg1 generateImageWithBlockIfNecessary:(id)arg2;”
- “What is going on!? The directory containing user generated vibration store file was created successfully, but somehow it still doesn’t exist.”
- “void __thiscall CConnection::CalculateHashForSessionHT(unsigned long)”
Granted, these look to many readers like turkey hash. But no more so than source code. And you can see that this information – coming right out of publicly-accessible iPhone and Windows products – uses the same type of terminology that patent claims tend to use: sending triggers for metrics, get server facades, generating images with blocks, user-generated vibration store files, and so on. The line that ends with “.m” is the pathname for an Apple source-code file for iTunes, written in Objective-C.
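The text extraction that yields fragments like these can be sketched in a few lines. This is a minimal sketch in the spirit of the “strings” approach, not a reimplementation of any particular utility: it finds runs of printable ASCII only (wide-character strings are ignored), and the minimum length, the extension list, and the names `extract_strings` and `source_paths` are our own illustrative assumptions.

```python
import re

def extract_strings(data, min_length=6):
    """Return printable-ASCII runs of at least min_length bytes, as text."""
    pattern = re.compile(rb"[\x20-\x7e]{%d,}" % min_length)
    return [match.decode("ascii") for match in pattern.findall(data)]

# Assumption: a small, hypothetical list of source-file extensions
SOURCE_PATH = re.compile(r"[\w./\\:-]+\.(?:c|cc|cpp|h|hpp|m|mm|java)\b")

def source_paths(strings):
    """Return the sorted set of substrings that look like source-file paths."""
    found = set()
    for text in strings:
        found.update(SOURCE_PATH.findall(text))
    return sorted(found)
```

Run against the bytes of a product file (for example, `extract_strings(open(path, "rb").read())`), the first function returns readable fragments and the second picks out those resembling source-code pathnames, such as the “.m” pathname shown above.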
What about firmware, or software located behind a firewall, or software matching a process/method claim? Isn’t all that pretty much inaccessible, without the source code?
In some cases, yes. But as a general rule, no. Rather than blithely assume that the information is entirely within the possession, custody, and control of the software vendor, we start with the assumption that the information is out there, on public display and in “plain view” – if you know how to look. While sometimes it does turn out that the code really is locked down behind a firewall or thoroughly encrypted and buried inside a physical device, even then, one can learn a lot about server processes, for example, through “dynamic reverse engineering” such as the network monitoring mentioned above; firmware is often available, apart from the target device, in the form of firmware update files provided on the internet, and these files can be reverse engineered.
IS IT PERMISSIBLE TO REVERSE ENGINEER SOFTWARE AS PART OF A LITIGATION-RELATED INVESTIGATION?
Many courts explicitly require “reverse engineering or its equivalent” as the hallmark of a reasonable pre-filing investigation. See for example the frequently-cited Network Caching v. Novell (ND Cal., 2002): “FRCP 11 requires that a plaintiff compare an accused product to its patents on a claim by claim, element by element basis for at least one of each defendant’s products. While the court is reluctant to hold that in all cases such a comparison requires reverse engineering of the defendant’s products, the court finds that reverse engineering or its equivalent is required.”
This likely trumps any clickwrap license language, especially as no-reverse-engineering clauses frequently carry their own caveats. For example, Microsoft’s standard end-user license agreement (EULA) states: “You may not reverse engineer, decompile, or disassemble the Software, except and only to the extent that such activity is expressly permitted by applicable law notwithstanding this limitation” (emphasis added).
Further, most of the tools and methods used in pre-filing investigation of software products and services, while fitting the broadest technical definition of “reverse engineering,” likely do not fit the legal definition as employed in license agreements, which often are more narrowly concerned with the ostensible ability to recreate the original source code via decompilation and disassembly. Pre-filing investigation of software patent infringement more typically employs string extraction, header information, and network monitoring. Decompilation of Java code is sometimes employed. Disassembly is only rarely used during pre-filing investigation.
There’s a much more complete discussion of this issue (the need for reverse engineering as part of a reasonable pre-filing investigation, as it interacts with license agreement language regarding reverse engineering) in my IP Today article on using reverse engineering to uncover patent infringement, and two-part New Matter article (part 2) on using reverse engineering to uncover prior art.
ISN’T SOURCE CODE EXAMINATION IMPORTANT FOR SOFTWARE PATENT LITIGATION?
Absolutely. Source code has sometimes even been called the “best evidence” of software patent infringement or non-infringement. We’ve been doing source-code examinations for about twenty years, starting with the Stac v. Microsoft case in the 1990s, with code written in C/C++, Java, Objective-C, JavaScript, PHP, and other programming languages. We’ve worked with enormous source-code productions, sometimes exceeding one million files for over 100 different (but interrelated) products. Again, see source-code FAQs above.
SoftwareLitigationConsulting’s use of reverse engineering enhances our approach to source-code examination. By examining the product early on, we often know exact module/file names and function names, before ever stepping into the source-code examination room. We can often create “maps” of the source code, from the binary/object product on the market. This can help focus source-code requests. It also helps detect when a source-code production is incomplete.
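One way a “map” derived from the product can help detect an incomplete production is sketched below. This is a minimal sketch under stated assumptions: the list of expected source paths is taken to have been recovered earlier from the binary/object product, matching is by filename only (case-insensitive), and the names `production_basenames` and `missing_from_production` are hypothetical.

```python
import os

def production_basenames(root):
    """Return the lower-cased filenames found anywhere under root."""
    names = set()
    for _dirpath, _dirnames, filenames in os.walk(root):
        names.update(name.lower() for name in filenames)
    return names

def missing_from_production(expected_paths, root):
    """Return expected paths whose filename does not appear under root.

    Assumption: only the final path component is compared, since build
    paths seen in a binary rarely match the layout of a production.
    """
    produced = production_basenames(root)
    missing = []
    for path in expected_paths:
        # Accept both Unix-style and Windows-style separators
        basename = path.replace("\\", "/").rsplit("/", 1)[-1].lower()
        if basename not in produced:
            missing.append(path)
    return sorted(missing)
```

A filename reported as missing is a lead rather than a conclusion: the file may have been renamed, may belong to a different version, or may be third-party code outside the scope of the production.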
See provisional table of contents for forthcoming book on source-code examination.
WHAT CAN SOFTWARE LITIGATION CONSULTING DO TO HELP?
Email undoc@sonic.net or call us (707-495-5240) to discuss how software reverse engineering can be used as a tool in your case, before source code becomes available in discovery, and/or to provide necessary context to a source-code examination.