Frequently-asked questions regarding source-code examination/review for litigation, followed by FAQs regarding software reverse engineering for litigation:
WITH WHICH PROGRAMMING LANGUAGES DO YOU WORK?
Frequently source code for a given product or service is written in multiple languages, including custom, in-house or proprietary languages, mini languages, and templates; it is important not to overlook these in a source-code examination.
WHAT TOOLS DO YOU USE FOR SOURCE-CODE EXAMINATION?
Both industry-standard tools (e.g. SciTools Understand; dtSearch; grep; findstr; PowerGrep; diff; WinMerge; etc.) and custom tools (scripts).
The ability to write custom tools is frequently important in source-code examination. Computers hosting the disputed source code are generally locked down (e.g. without internet access) under a protective order (PO), often without adequate examination tools having been agreed to in the PO. The examiner must therefore be able to craft tools on the spot, using scripting facilities present on the computer, such as awk on Apple computers and Visual Basic (VB) on Windows computers. At the same time, use of industry-standard tools wherever possible is important to meet potential Daubert methodology challenges.
During source-code examinations, we make extensive use of scripting and the operating system’s “command line” to semi-automate portions of the examination. The scripting and automation facilities we use are available even on the locked-down computers generally used under source-code protective orders. Surprisingly, our ability to quickly write programs to help examine large amounts of source code appears to be rare among source-code examiners, who generally depend more on as-provided GUI tools than we do.
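As a rough illustration of the kind of on-the-spot tool described above, here is a minimal sketch of a keyword search across a source tree. It assumes a Python interpreter happens to be present on the examination computer, which will not always be true; the same logic could be written in awk, VB, or whatever scripting facility the machine offers. The function name `search_tree` and its parameters are hypothetical, chosen only for this example.

```python
import os
import re


def search_tree(root, terms, extensions=None):
    """Yield (path, line_number, line) for every line matching any term.

    root, terms, and extensions are illustrative parameters, not part of
    any standard tool.
    """
    # Case-insensitive alternation of the literal search terms.
    pattern = re.compile("|".join(re.escape(t) for t in terms), re.IGNORECASE)
    for dirpath, _dirnames, filenames in os.walk(root):
        for name in sorted(filenames):
            # Optionally restrict the scan to given file extensions.
            if extensions and os.path.splitext(name)[1].lower() not in extensions:
                continue
            path = os.path.join(dirpath, name)
            # errors="replace" keeps the scan going on files with odd encodings.
            with open(path, encoding="utf-8", errors="replace") as handle:
                for number, line in enumerate(handle, start=1):
                    if pattern.search(line):
                        yield path, number, line.rstrip("\n")
```

A call such as `search_tree("production", ["trigger", "metric"], {".c", ".h"})` would list every matching line with its file and line number, which is a starting point for note-taking within the limits of the PO.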
WHAT PROCESS OR METHODOLOGY SHOULD BE USED FOR SOURCE-CODE EXAMINATION?
While each source-code examination is different in some ways, a general approach before the source-code examination starts is:
- If consultants/experts are brought on sufficiently early, they should help craft discovery requests
- Before starting source-code examination, ideally consultant/expert should have already gleaned as much information as possible from non-source materials, including any publicly-accessible product or service; see reverse engineering FAQs below and material on pre-filing investigation
At the source code examination itself:
- Consultant/expert must pay careful attention to limits on note-taking, printing, etc. set by protective order (PO); generally no other electronics allowed in same room as source-code computer
- Do initial inventory of source-code production (products, version numbers, dates, platforms, programming languages); count directories, files, and lines of code
- Test completeness/responsiveness of source-code production, using both internal and external tests
- Initial selection of relevant portions of source-code production (e.g. specific versions within damages period)
- Initial search for keywords/synonyms/terminology (including e.g. terms from claim construction in patent litigation)
- Select candidate source-code files, functions, structures, classes, etc. for closer inspection
- Widened search, with additional terminology suggested by initial reading of candidate files
- Tracing (i.e., code examination NOT determined by keywords) to and from selected files/functions, including determination of code’s role within overall program
- Close reading of code, e.g. to determine whether names, terminology, comments, etc. accurately reflect code operation
- Note taking and printing, within limits set by PO
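The initial inventory step in the list above (counting directories, files, and lines of code) can be sketched as follows. This is only an illustration, assuming Python is available on the source-code computer; the function name `inventory` is hypothetical, and grouping by file extension is just one rough proxy for programming language.

```python
import os
from collections import Counter


def inventory(root):
    """Return (directory_count, files_per_extension, lines_per_extension)."""
    directories = 0
    files = Counter()
    lines = Counter()
    for dirpath, dirnames, filenames in os.walk(root):
        directories += len(dirnames)
        for name in filenames:
            # Files without an extension (e.g. Makefile) are grouped together.
            ext = os.path.splitext(name)[1].lower() or "(none)"
            files[ext] += 1
            # Binary mode avoids decoding errors; lines are split on newline bytes.
            with open(os.path.join(dirpath, name), "rb") as handle:
                lines[ext] += sum(1 for _ in handle)
    return directories, files, lines
```

Unexpected extensions in the result are often the first hint of the in-house languages, mini languages, and templates that should not be overlooked.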
After the source code examination:
- Reporting/memos as requested by client, including e.g. claims charts for patent litigation
- Possibly amend/supplement discovery requests, as a result of missing code, or of newly-revealed products/versions
- Possibly schedule return source-code visits
See additional notes on the before/during/after source-code examination process in outline for book on source-code exam.
CAN SOURCE CODE REVIEW BE COMBINED WITH OTHER WAYS OF EXAMINING SOFTWARE?
Yes; in addition to the often-required use of reverse engineering e.g. as part of a reasonable pre-filing investigation (see FAQs below), non-source techniques include:
- Comparison/correlation of source code with products/services (e.g. “latent code” issues; creating source-code maps from binary/object code, as preparation for discovery)
- Comparison and correlation of internal documents from discovery (emails, etc.) with resulting technical practices
- Open-source examination: comparison/correlation of open source with products/services, and with proprietary source code, including vendor’s modifications to open source
- Correlation of source code with public information, such as application programming interface (API) documentation, software development kit (SDK) sample code, header files, open source, and so on.
HOW DOES A LITIGATION-DRIVEN SOURCE-CODE REVIEW DIFFER FROM HAVING ANY COMPETENT PROGRAMMER SEARCH AND READ THE CODE?
The methodology followed in a litigation-driven source-code examination is based on widely-accepted, industry-standard practices for source-code examination as used in non-litigation contexts such as code maintenance (improvements and bug fixes), walk-throughs, and security auditing.
However, a source-code exam for litigation requires a focus on answering narrowly-framed technical questions that arise from the legal issues. A source-code examination generally cannot be a “holistic” attempt to understand the code as a whole. An outside consultant/expert’s perspective is generally more credible to the court than, and may uncover important information missed by, an insider who is more intimately familiar with the code. At the same time, the outside consultant can often work productively with a client’s in-house developers.
Source code examination for litigation also differs from any simple searching and reading of the code. As one example, a proper source-code exam will be alert to the possibility of code which is unused in the finished product (“dead” or “latent code”); determining code usage or non-usage may require careful tracing (distinct from searching) through the source code, comparison of the source code with the binary/object code in the finished product, or even dynamic testing of the “live” product. Further, whereas a simple keyword search might end up treating the source code as a disconnected collection of lines of code, a true source-code examination will rely, for example, on:
- the relationship between caller and callee functions;
- child/parent class relationships;
- aliasing of names used for functions, parameters, variables, and data structures;
- actual operation of code (in contrast to relying entirely on the accuracy of names or comments in the code);
- awareness of implicit function calls (e.g. constructor/destructor), and implied function parameters;
- classes and modules;
- directory and filename path information;
- version control information and other metadata;
- careful attention to function pointers, vtables, operator/function overloading, hooks, on-event handlers/subscribers, run-time dynamic linking, and externally-generated events; and
- tracing data flow.
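To make the caller/callee point above concrete, here is a minimal sketch that builds a direct-call map. It is an assumption-laden illustration: it handles only Python source (via the standard `ast` module), whereas real productions span many languages, and the names `call_map` and `callers_of` are hypothetical. It records only calls that are visible by name; calls made through function pointers, hooks, event handlers, or run-time dynamic linking would not appear, which is why such mechanisms need separate attention.

```python
import ast
from collections import defaultdict


def call_map(source):
    """Map each function name to the set of names it calls directly."""
    calls = defaultdict(set)
    for node in ast.walk(ast.parse(source)):
        if isinstance(node, (ast.FunctionDef, ast.AsyncFunctionDef)):
            # Note: calls inside nested functions are also attributed
            # to the enclosing function.
            for inner in ast.walk(node):
                if isinstance(inner, ast.Call):
                    func = inner.func
                    if isinstance(func, ast.Name):         # plain call: f(x)
                        calls[node.name].add(func.id)
                    elif isinstance(func, ast.Attribute):  # method call: obj.f(x)
                        calls[node.name].add(func.attr)
    return calls


def callers_of(calls, target):
    """Return the sorted names of functions that call target directly."""
    return sorted(caller for caller, callees in calls.items() if target in callees)


# Hypothetical sample code, used only to exercise the sketch.
SAMPLE = '''
def hash_session(data):
    return len(data)

def connect(data):
    return hash_session(data)

def main():
    connect("abc")
'''
```

Running `callers_of(call_map(SAMPLE), "hash_session")` traces from a function of interest back to its callers, the kind of tracing that a keyword search alone does not provide.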
FOR WHICH LEGAL ISSUES CAN SOURCE CODE BE IMPORTANT?
- Patent litigation:
infringement and non-infringement of software/internet/business-method/device patents;
on-sale and public-use bars;
practicing/working of patent, including technical prong of ITC domestic-industry test;
to a somewhat lesser extent (because prior art must have been publicly accessible at the relevant time, and source code may have been confidential, albeit associated with a publicly-accessible product/service), anticipation and obviousness;
examination guided by claim construction (including multiple alternative constructions if pre-Markman).
- Copyright litigation:
both literal and non-literal copying;
substantial similarity in code structure or sequence;
tokenized and structural testing to assess non-literal copying;
translation between different programming languages;
presence of errors e.g. misspellings or “salting”;
presence of “constrained” or mutually-borrowed code (including e.g. open source);
showing access to underlying work;
assessment of overlap percentages;
assessment of overlapping code’s role within overall product/service.
- Trade secrets:
misappropriation through both literal and structural overlap;
showing use of reasonable security precautions (encryption, obfuscation);
assessment of importance of secrecy to the value of overlapping code;
determination of whether overlapping code falls within confidentiality or NDA agreements (including as modified by state law).
- Antitrust, competition law:
fake warning/error messages;
differential access to undocumented APIs;
determination of importance of withheld technical information (“essential facilities” assertions);
correlation of code with internal documents, to determine if internal plans “came to fruition.”
- Products liability (including e.g. software-based medical devices):
acceptable vs. unacceptable software defects;
is defect within license exclusion?
- Contracts disputes:
was agreed-upon code delivered on schedule?;
is delivered code reasonably fit for intended use?;
customary trade practices.
HOW DOES SOURCE-CODE EXAMINATION DIFFER FROM OTHER FORMS OF E-DISCOVERY?
WHAT IMPACT CAN A PROTECTIVE ORDER (PO) HAVE ON SOURCE-CODE EXAMINATION?
For the impact on the source-code examination methodology that can be followed, on the expert report, on note-taking and printing, and other potential effects of the PO, see chapter on protective orders and chapter on source-code examination environment in forthcoming book on source-code exam.
In a copyright infringement case, it is especially important that the two source-code productions to be compared be provided on the same computer; those negotiating POs sometimes overlook this.
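The tokenized testing for non-literal copying mentioned in the copyright items earlier can be sketched roughly as follows. This is a simplified illustration, not a description of any particular commercial comparison tool: the tokenizer is a crude regular expression, the `KEYWORDS` set is a small hypothetical sample for C-like code, and comments and string literals are not treated specially. Identifiers and numbers are replaced with placeholders, so that renamed variables do not hide an otherwise identical structure.

```python
import difflib
import re

# Identifiers, integer literals, or any single non-space character.
TOKEN = re.compile(r"[A-Za-z_]\w*|\d+|\S")
# Small illustrative keyword set; a real tool would use the full language grammar.
KEYWORDS = {"if", "else", "for", "while", "return", "int", "void", "char"}


def normalize(code):
    """Reduce code to a token stream with identifiers and numbers abstracted."""
    tokens = []
    for tok in TOKEN.findall(code):
        if tok in KEYWORDS:
            tokens.append(tok)
        elif tok[0].isalpha() or tok[0] == "_":
            tokens.append("ID")
        elif tok[0].isdigit():
            tokens.append("NUM")
        else:
            tokens.append(tok)
    return tokens


def similarity(code_a, code_b):
    """Return a 0..1 similarity ratio between the normalized token streams."""
    matcher = difflib.SequenceMatcher(
        None, normalize(code_a), normalize(code_b), autojunk=False
    )
    return matcher.ratio()
```

Two loops that differ only in variable names normalize to the same token stream and score 1.0. A score like this is only a pointer to code that deserves close reading; it says nothing by itself about constrained or mutually-borrowed code.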
For further information on source-code examination for litigation, see outline for forthcoming book.
To discuss source-code examination services from SoftwareLitigationConsulting.com, email email@example.com or call us (707-495-5240).
For sample projects that have been performed for clients, relating to both source-code examination and software reverse engineering, see list.
Frequently-asked questions (or frequently-held assumptions) about using reverse engineering as a litigation tool in software/internet patent cases:
“WHERE ELSE WOULD YOU LOOK, TO DETERMINE INFRINGEMENT, BESIDES THE SOURCE CODE?”
Anyone who has litigated software/internet patents, has probably heard (or even thought) something like the following:
- “There’s only so much we can do about factual investigation, until we get the other side’s source code.”
- “Where else would you look, to determine infringement, besides the source code?”
- “Source code is basically the only place to find out if a patent claim is infringed or anticipated by a software/internet product/service.”
- “They’re going to have to show us their source code, so why bother reverse engineering their product?”
In twenty years of working as a consulting technical expert in software and internet patent litigation, I’ve frequently heard sentiments such as these.
While source code is very important here (see source-code FAQs above), at the same time the source code is NOT the same as the product or service that’s out on the market — i.e., the thing that actually generates revenue, that is accused of patent infringement, or that can be used as anticipatory prior art.
The other side’s source code is usually proprietary, tightly held as the “crown jewels,” and only available to your experts or consultants (apart from open source) under increasingly-stringent protective orders. In contrast, the other side’s products or services may exist in millions of readily-available copies.
And not merely readily available, but also readily examined by an expert, using tools and methods such as text extraction (“strings”), network monitoring, object/class listings, and (for certain types of code) decompilation.
These methods of software examination are sometimes called “reverse engineering.”
In litigation involving physical devices, both sides may employ what are sometimes called “teardown labs.” This “teardown” process is also applicable to software: even the simplest product is made up of numerous parts, components, or modules (whose names are often available right inside the product).
WHY BOTHER WITH ANYTHING EXCEPT THE SOURCE CODE? AND WHAT’S IN IT FOR DEFENDANTS?
But since the source code is eventually going to be made available during discovery, why bother hiring SoftwareLitigationConsulting.com to do an in-depth examination of the product itself, rather than wait for the source code?
If you represent the plaintiff, the reason is fairly clear: courts generally interpret Rule 11 and the Local Patent Rules to require “reverse engineering or its equivalent” for pre-discovery contentions. Courts have rejected even “preliminary” infringement contentions based entirely on marketing materials, web site screen-shots, or technical manuals. The courts generally want litigants in these cases to look at primary sources. Possibly forthcoming legislation (Rep. Bob Goodlatte’s “Innovation Act” patent litigation reform bill, passed several years ago as HR 3309, with somewhat parallel legislation before the Senate judiciary committee) further emphasizes the need for rigorous pre-filing investigation by plaintiffs.
If you represent a defendant, the same rules apply for your initial invalidity contentions. Further, when it comes to showing on-sale or public use, you’re better off pointing to the actual product or service that was on the market, than to the source code which was likely not public at the relevant time. Recall too that anticipatory prior art must have been capable at the time of teaching the person having ordinary skill in the art (PHOSITA), so when using earlier software or services as prior art, again you’re better off if you can point to something in an actual product which was on the market, rather than pointing to tightly-held source code. And if prior art was created by a third party, you might not even get access to that third party’s source code.
As for your own software, surely there’s no reason to reverse engineer this, since you already have the source code? However, it’s remarkable how often outsiders, looking only at the finished product, can see something more likely to be missed when only viewing the source code. The news is filled with reports of outsiders finding “security vulnerabilities” that are unknown to the company or team which developed a piece of software. A similar phenomenon is at work here in litigation-based code examination: an outsider’s perspective on your own code is often useful.
For both parties, waiting for the source code makes the source-code examination, when it does occur, more time-consuming and expensive than it need be. A litigant could often have learned a lot beforehand about a product or service’s internal operation – including, frequently, the layout of its source code, including full path/filenames – just from careful inspection of the product readily available on the market. When the source code later becomes available, it’s better if your expert or examiner beforehand knows, in detail, what to expect. (This also helps determine whether any source code is missing from the production.)
Both parties will also eventually need to compare the source code with the actual product or service, to determine the role played, in the actual revenue-generating good, of claim elements/steps identified in the source code. Sometimes source code doesn’t make its way into the final product, or is never executed (“latent code”), or is only executed under rare circumstances (which fact may impact damages calculations).
IS THE KIND OF INFORMATION I NEED REALLY ACCESSIBLE?
Sure, you’re saying, I already knew the source code isn’t identical to the product. But isn’t the product just “object code” or “binary code,” just a collection of 1s and 0s that are of little use as evidence? And, okay, even if it also contains some text, is that text going to be relevant to the specific limitations of the patent claims?
You might be surprised at how much information can generally be derived, without source code, directly from software/internet products and services, and how much of this information is at the right level of granularity to do the element-by-element or step-by-step analysis needed to make at least a preliminary showing of infringement, non-infringement, or some aspects of invalidity.
Software and internet products and services, when an expert examiner digs into them, often contain readable text with the names of functions, modules, data structures, certain types of variables, and even source-code filenames (that’s right, often you can know the names of the other side’s source-code files, before ever seeing them). Here are some randomly-selected examples of fragments extracted (NOT from source code, but from the binary/object code) from products on the market:
- “Unable to send trigger for metric 0x%x with device configuration 0x%x because AWDServerConnection::getInstance or getServerFacade returned NULL.”
- “- (id)imageForKey:(id)arg1 generateImageWithBlockIfNecessary:(id)arg2;”
- “What is going on!? The directory containing user generated vibration store file was created successfully, but somehow it still doesn’t exist.”
- “void __thiscall CConnection::CalculateHashForSessionHT(unsigned long)”
Granted, these look to many readers like turkey hash. But no more so than source code. And you can see that this information – coming right out of publicly-accessible iPhone and Windows products – uses the same type of terminology that patent claims tend to use: sending triggers for metrics, get server facades, generating images with blocks, user-generated vibration store files, and so on. The line that ends with “.m” is the pathname for an Apple source-code file for iTunes, written in Objective-C.
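Fragments like those above are typically found with text extraction of the “strings” kind. A minimal sketch of the idea, assuming Python is available, is shown below. The function name `extract_strings` and the default minimum length are hypothetical choices for this example, and the sketch looks only for runs of printable ASCII; text stored in other encodings (such as the 16-bit strings common in Windows binaries) would need an additional pass.

```python
import re


def extract_strings(data, min_length=6):
    """Return runs of printable ASCII at least min_length bytes long."""
    # Bytes 0x20 through 0x7e are the printable ASCII characters.
    pattern = re.compile(rb"[\x20-\x7e]{%d,}" % min_length)
    return [match.decode("ascii") for match in pattern.findall(data)]
```

Applied to the bytes of a product file, e.g. `extract_strings(open(path, "rb").read())` with `path` pointing at the binary in question, this yields the readable names and messages embedded in the shipped code.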
What about firmware, or software located behind a firewall, or software matching a process/method claim? Isn’t all that pretty much inaccessible, without the source code?
In some cases, yes. But as a general rule, no. Rather than blithely assume that the information is entirely within the possession, custody, and control of the software vendor, we start with the assumption that the information is out there, on public display and in “plain view” – if you know how to look. While sometimes it does turn out that the code really is locked down behind a firewall or thoroughly encrypted and buried inside a physical device, even then, one can learn a lot about server processes, for example, through the “dynamic reverse engineering” mentioned above; firmware is often available, apart from the target device, in the form of firmware update files provided on the internet, and these files can be reverse engineered.
IS IT PERMISSIBLE TO REVERSE ENGINEER SOFTWARE AS PART OF A LITIGATION-RELATED INVESTIGATION?
Many courts explicitly require “reverse engineering or its equivalent” as the hallmark of a reasonable pre-filing investigation. See for example the frequently-cited Network Caching v. Novell (ND Cal., 2002): “FRCP 11 requires that a plaintiff compare an accused product to its patents on a claim by claim, element by element basis for at least one of each defendant’s products. While the court is reluctant to hold that in all cases such a comparison requires reverse engineering of the defendant’s products, the court finds that reverse engineering or its equivalent is required.”
This likely trumps any clickwrap license language, especially as no-reverse-engineering clauses frequently carry their own caveats. For example, Microsoft’s standard end-user license agreement (EULA): “You may not reverse engineer, decompile, or disassemble the Software, except and only to the extent that such activity is expressly permitted by applicable law notwithstanding this limitation” (emphasis added).
Further, most of the tools and methods used in pre-filing investigation of software products and services, while fitting the broadest technical definition of “reverse engineering,” likely do not fit the legal definition as employed in license agreements, which often are more narrowly concerned with the ostensible ability to recreate the original source code via decompilation and disassembly. Pre-filing investigation of software patent infringement more typically employs string extraction, header information, and network monitoring. Decompilation of Java code is sometimes employed. Disassembly is only rarely used during pre-filing investigation.
There’s a much more complete discussion of this issue (the need for reverse engineering as part of a reasonable pre-filing investigation, as it interacts with license agreement language regarding reverse engineering) in my IP Today article on using reverse engineering to uncover patent infringement, and two-part New Matter article (part 2) on using reverse engineering to uncover prior art.
ISN’T SOURCE CODE EXAMINATION IMPORTANT FOR SOFTWARE PATENT LITIGATION?
SoftwareLitigationConsulting’s use of reverse engineering enhances our approach to source-code examination. By examining the product early on, we often know exact module/file names and function names, before ever stepping into the source-code examination room. We can often create “maps” of the source code, from the binary/object product on the market. This can help focus source-code requests. It also helps detect when a source-code production is incomplete.
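As a rough sketch of how source-file names found in a binary can be checked against a source-code production, consider the following. Everything here is illustrative: the function names, the short list of file extensions, and the decision to compare base filenames case-insensitively are all assumptions made for the example, and a name that appears in the binary but not in the production is a question to raise, not proof that code was withheld.

```python
import re

# Path-like runs ending in a few common source extensions (illustrative list).
SOURCE_NAME = re.compile(rb"[\w./\\:-]+\.(?:cpp|mm|hpp|cc|c|h|m)\b", re.IGNORECASE)


def source_names_in_binary(data):
    """Return the set of lowercased base filenames that look like source files."""
    names = set()
    for match in SOURCE_NAME.findall(data):
        text = match.decode("ascii").replace("\\", "/")
        names.add(text.rsplit("/", 1)[-1].lower())
    return names


def missing_from_production(binary_names, produced_paths):
    """Return names seen in the binary that have no same-named produced file."""
    produced = {
        path.replace("\\", "/").rsplit("/", 1)[-1].lower()
        for path in produced_paths
    }
    return sorted(binary_names - produced)
```

Given the bytes of the product and a listing of the produced files, `missing_from_production(source_names_in_binary(data), produced_paths)` gives a candidate list of files to ask about in a supplemental discovery request.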
WHAT CAN SOFTWARE LITIGATION CONSULTING DO TO HELP?
Email firstname.lastname@example.org or call us (707-495-5240) to discuss how software reverse engineering can be used as a tool in your case, before source code becomes available in discovery, and/or to provide necessary context to a source-code examination.