Source Code & Software Patents: A Guide to Software & Internet Patent Litigation for Attorneys & Experts
by Andrew Schulman (http://www.SoftwareLitigationConsulting.com)
Detailed outline for forthcoming book
Chapter 15: The source-code examination environment
15.1 Examining your own client’s code vs. the other side’s code
- Examination of your own client’s code can be done using normal non-litigation procedures for looking at source code
- It may, however, be best to examine your own code the same way the other side will, using the exact same production given to the other side
- Outsider’s view vs. “straight from the horse’s mouth”: the owner may have code their engineers know nothing about, but which an outsider would see immediately [give examples where it’s important for defendant to have their own code reverse engineered, in the same way that plaintiff should have reverse engineered as part of a reasonable pre-filing investigation]
- Examination of opponent’s code will likely take place under protective order (chapter 11)
15.2 Examining the other side’s code under a protective order (PO)
- See chapter 11 on typical PO restrictions on the source-code exam
- The agreed-upon PO will have a large impact on the source-code examination environment
- It is crucial that the examiner understand the PO restrictions before starting the source-code examination
- These restrictions may interfere with the examiner’s normal non-litigation practice (cf. Daubert)
- Yet even while operating within PO restrictions, the examiner must still conduct a reasonable examination of the source code
15.3 Typical PO restrictions affecting the source-code examination
- The source-code machine is generally disconnected from the internet, so tools must be provided by the producing party
- Tools commonly available as part of the Windows and Apple OS X operating systems (e.g., findstr on Windows; grep and awk on OS X)
- Ability to create ad hoc tools using the command line and scripting on Windows (VBScript; PowerShell) and Apple OS X (shell scripts; awk)
- Tools commonly provided as part of POs (Understand source-code browser; dtSearch; WinMerge; Microsoft Visual Studio; Xcode; etc.)
- See full list of typical PO restrictions in chapter 11
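As an illustration of the kind of ad hoc tool an examiner might build on the source-code machine, here is a minimal sketch. It assumes the producing party has supplied a Python interpreter, which many POs will not provide; the same logic can be reproduced with PowerShell, awk, or dir and findstr. The script writes a flat, text-searchable listing of every file under a source root (similar in spirit to dir /s /b on Windows or find . -type f on OS X) to a writable location, then reports files whose names contain terms taken from the patent claims. The function names list_files, write_listing, and match_filenames are hypothetical.

```python
"""Hypothetical ad hoc tool: flat file listing plus claim-term filename matches.

Assumes a Python interpreter is available on the source-code machine.
"""
import os
import sys


def list_files(root):
    """Return every file path under root, sorted, one entry per file."""
    paths = []
    for dirpath, _dirnames, filenames in os.walk(root):
        for name in filenames:
            paths.append(os.path.join(dirpath, name))
    return sorted(paths)


def write_listing(paths, out_path):
    """Write the listing to a writable drive (the source tree is read-only).

    backslashreplace keeps the write from failing on undecodable
    (e.g., non-Western) filenames.
    """
    with open(out_path, "w", encoding="utf-8", errors="backslashreplace") as out:
        for p in paths:
            out.write(p + "\n")


def match_filenames(paths, terms):
    """Map each claim term to the paths whose file name contains it (case-insensitive)."""
    hits = {t: [] for t in terms}
    for p in paths:
        base = os.path.basename(p).lower()
        for t in terms:
            if t.lower() in base:
                hits[t].append(p)
    return hits


if __name__ == "__main__" and len(sys.argv) >= 3:
    # Usage: python listfiles.py <source_root> <output_file> term1 term2 ...
    root, out_file, *claim_terms = sys.argv[1:]
    all_paths = list_files(root)
    write_listing(all_paths, out_file)
    for term, found in match_filenames(all_paths, claim_terms).items():
        print(f"{term}: {len(found)} file(s)")
        for p in found:
            print("   ", p)
```

Filename matches are only a starting point. Claim terms rarely appear verbatim in file names, so a miss here says little about whether the corresponding element is absent.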
15.4 Going to the source-code exam
- What to bring & not bring with you (e.g., bring printed copies of the PO and patents to the source-code exam, since laptops etc. will likely be excluded from the source-code room)
- When to go: see chapter 14 on “diligence,” scheduling, and return visits
15.5 What to expect in the source-code production
- Directory structure (may reflect multiple products, versions, platforms, and/or components/projects/teams within product)
- Version control systems (Perforce, etc.)
- Part of produced source code may be inside archives (e.g. ZIP or TAR files within source-code tree)
- Likely many duplicated files, or slightly-different versions of the same file, even as some other crucial files may be entirely missing (see chapter 16 on incomplete productions)
- Often, a huge quantity of code only some of which directly relates to the products at issue
- Often, a single source-code “tree” is used to build multiple products, multiple versions, possibly for multiple platforms (e.g. Windows v. Mac v. iPhone v. Android); the code will contain #ifdefs or similar conditional compilation directives
- The code should be produced in searchable plain-text form (plain text, possibly inside an integrated development environment [IDE] or version-control system), but sometimes even parties that previously agreed to produce in “native format” will produce code as PDF files (likely not understanding what “native format” means)
- Encryption (TrueCrypt, etc.)
- Read-only directories with at least one writeable drive (e.g. for script output; the source-code examiner will not only read code, but will likely also produce some output, at the very least an index; see chapter 17 on indexing)
- No internet connection, no external connections such as USB ports (see PO restrictions)
- Tools (see above)
- Possibly output from other examiners using the same source-code machine (is this protected work product, which others should not examine, or is it “fair game”?)
- Some source files may have non-standard extensions (e.g., *.tab for C files used in-house to generate tables that are then incorporated into other source code); don’t assume standard extensions such as *.c, *.cpp, *.js, *.java; see CoffeeScript and Handlebars *.coffee and *.hbs example below
- Source files used to automatically generate other source (e.g., “mktab”)
- Source files which have been automatically generated (e.g., “wizard” output)
- Non-Western filenames
- Build scripts, makefiles, batch/command/shell files
- Setup/config/install scripts for commercial product
- Data files containing code (e.g. SQL stored procedures)
- Code stored within wizard/studio
- Non-source documents (diagrams, specs, bug reports, emails, etc.):
- Does their inclusion within the source-code production change their status to that of source code?;
- Conversely, when source code is included within a non-source production (as often happens), does it lose its “source-codeness”?;
- Benefits and disadvantages to each side of whether a given piece of evidence is characterized as “source code”: these relate not only to discovery, but also to the document’s evidentiary status; a statement found in a file regarded as “source code” may more directly show the presence or absence of a feature in the product than a similar statement found in a non-source document; see ch.xxx.
- Binary object or library files
- “Dead code”, unused code, generic library code
- Header files, some not used in the relevant project
- Copies of shipping products (including older or future versions of which the requesting party was not previously aware)
- Source code for previous or newer versions of which the requesting party was not previously aware
- Source code for related hardware (Verilog, VHDL, etc.)
- Code written in languages with which the examiner was not previously familiar (perhaps e.g. CoffeeScript *.coffee and Handlebars *.hbs files, when the examiner was expecting to find, and only searching for, JavaScript *.js files); it is important to consider this before complaining of missing code
- Source code for one language embedded in another (e.g. LINQ; concatenated JavaScript strings)
- Code using an ad hoc or in-house proprietary programming language or quasi-language (e.g. LISP-like “Parentheses” at Lotus)
- Code using an in-house proprietary framework
- Within a given source-code file:
- developer names, initials, email addresses;
- dates (often dates within files will differ from file-system dates; see ch.xxx on file dates);
- copyright notices including for third parties;
- references to other docs (e.g. internal wikis);
- references to bug report ID numbers which prompted code changes;
- comments (if the code doesn’t contain comments, these have likely been “redacted”; see ch.xxx)
- The contents of the source-code production may themselves be at issue (e.g., did the source-code owner adhere to its preservation-hold obligations?)
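Several of the items above (archives inside the source tree, duplicated files, non-standard extensions such as *.tab, *.coffee, or *.hbs) can be surveyed mechanically before any reading begins. The following is a minimal sketch, again assuming Python is available on the source-code machine. The function names inventory and sha256_of are hypothetical. Archives are detected by content using the standard-library zipfile.is_zipfile and tarfile.is_tarfile checks, so the sketch will also flag ZIP-based containers such as .jar or .docx files. Only byte-identical duplicates are grouped. Slightly different versions of the same file need a comparison tool such as WinMerge.

```python
"""Hypothetical production inventory: extension counts, archives, exact duplicates."""
import hashlib
import os
import sys
import tarfile
import zipfile
from collections import Counter, defaultdict


def sha256_of(path, chunk=1 << 20):
    """Hash a file in chunks so very large files do not exhaust memory."""
    h = hashlib.sha256()
    with open(path, "rb") as f:
        for block in iter(lambda: f.read(chunk), b""):
            h.update(block)
    return h.hexdigest()


def inventory(root):
    """Walk root; return (extension counts, archive paths, groups of identical files)."""
    ext_counts = Counter()
    archives = []
    by_hash = defaultdict(list)
    for dirpath, _dirs, files in os.walk(root):
        for name in files:
            path = os.path.join(dirpath, name)
            # Unusual extensions (*.tab, *.coffee, *.hbs) show up in this tally.
            ext = os.path.splitext(name)[1].lower() or "(none)"
            ext_counts[ext] += 1
            # Detect archives by content rather than by extension.
            if zipfile.is_zipfile(path) or tarfile.is_tarfile(path):
                archives.append(path)
            by_hash[sha256_of(path)].append(path)
    dupes = sorted(sorted(g) for g in by_hash.values() if len(g) > 1)
    return ext_counts, sorted(archives), dupes


if __name__ == "__main__" and len(sys.argv) == 2:
    counts, archives, dupes = inventory(sys.argv[1])
    for ext, n in counts.most_common():
        print(f"{n:8d}  {ext}")
    print(f"{len(archives)} archive(s):", *archives, sep="\n  ")
    print(f"{len(dupes)} group(s) of identical files")
```

The extension tally is also a quick check before complaining of missing code. A large count of an unexpected extension often means the code is present in a language the examiner was not searching for.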
15.6 Orientation: Getting the “lay of the land”
- For the examiner’s protection, ensure that internet and USB access are disabled, source-code files are read-only, etc.
- Check that the proper tools have been provided and that the command line is accessible; check for the presence of a writable drive for temp files, etc.
- Determine whether source-code is encrypted, compressed within archives, etc.
- Determine the size of the source-code production; begin to prioritize inspection
- Determine whether files are within a version-control system; if so, find command-line access to version control
- If feasible, create a text-searchable directory listing of the entire disk (e.g., in Windows, dir /s /b c:\source, redirected to a file on the writable drive)
- Determine which products/versions/platforms/components are present within the source-code production
- Before looking inside files, look in the directory listing for filenames containing names of elements/steps from the patent claims
- Start locating filenames and function names known from earlier product reverse engineering (see chapter 6)
- Begin determination of production completeness (see chapter 16)
- Initial “eyeballing” of promising-looking files (contrast later “close reading”: see chapter 20)
- Generate index of source files (see chapter 17)
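When a single source tree builds several products or platforms, the conditional-compilation directives (#if, #ifdef, #ifndef, #elif) are one quick way to see which configurations the code distinguishes. The following is a minimal sketch, assuming Python is available and the code is C-family. It tallies the identifiers that appear in those directives. Compiler-predefined macros such as _WIN32, __APPLE__, and __ANDROID__ point to platforms. In-house macros, such as the hypothetical LITE_EDITION in the comment below, may point to products or editions. The function name conditional_macros and the extension list are assumptions. The tally shows which configurations exist in the source, not which were actually built. That depends on build scripts and makefiles, which may define the macros themselves.

```python
"""Hypothetical tally of macros used in C-family conditional compilation."""
import os
import re
import sys
from collections import Counter

# Matches lines such as "#ifdef _WIN32" or
# "#elif defined(__APPLE__) && !defined(LITE_EDITION)".
COND_RE = re.compile(r"^\s*#\s*(?:if|ifdef|ifndef|elif)\b(.*)$")
# \b keeps numeric literals such as 0x10L from yielding a bogus "x10L".
IDENT_RE = re.compile(r"\b[A-Za-z_][A-Za-z0-9_]*")
NOT_MACROS = {"defined"}  # operator, not a macro name
C_LIKE = {".c", ".h", ".cpp", ".cc", ".cxx", ".hpp", ".m", ".mm"}


def conditional_macros(root, exts=C_LIKE):
    """Count each identifier appearing in #if/#ifdef/#ifndef/#elif lines under root."""
    counts = Counter()
    for dirpath, _dirs, files in os.walk(root):
        for name in files:
            if os.path.splitext(name)[1].lower() not in exts:
                continue
            path = os.path.join(dirpath, name)
            with open(path, encoding="utf-8", errors="replace") as f:
                for line in f:
                    m = COND_RE.match(line)
                    if not m:
                        continue
                    # Drop trailing // comments and same-line /* */ comments.
                    expr = re.sub(r"/\*.*?\*/", " ", m.group(1)).split("//")[0]
                    for ident in IDENT_RE.findall(expr):
                        if ident not in NOT_MACROS:
                            counts[ident] += 1
    return counts


if __name__ == "__main__" and len(sys.argv) == 2:
    for macro, n in conditional_macros(sys.argv[1]).most_common(50):
        print(f"{n:8d}  {macro}")
```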
15.7 General considerations for the source-code examination
- Don’t aim for holistic “understanding” of the entire source-code tree; the goal is to locate (or determine absence or questionable presence of) elements & steps corresponding to limitations of patent claims (see chapter 13)
- Viewing source code in the context of version control (Perforce, etc.) vs. viewing at the OS file level
- Impact on the examiner of the location, time, form & manner of source-code production (see PO impact in chapter 11)
- Estimating schedule, hours, budget needed (or desirable) for exam
- Prioritizing, triage
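For triage, one simple heuristic is to rank files by how many distinct claim terms they mention, breaking ties by total hits. The following is a minimal sketch, assuming Python is available and the examiner has drawn up a list of search terms for the claim limitations. The function name rank_files is hypothetical. It performs plain case-insensitive substring matching, so it will miss synonyms and in-house names for the same concept. A high rank means only "read this file early", not that a claim element is present; that determination still requires close reading.

```python
"""Hypothetical triage: rank files by distinct claim terms mentioned, then by total hits."""
import os
import re
import sys


def rank_files(root, terms, exts=None):
    """Return (distinct_terms, total_hits, path) tuples for files mentioning any term."""
    patterns = {t: re.compile(re.escape(t), re.IGNORECASE) for t in terms}
    results = []
    for dirpath, _dirs, files in os.walk(root):
        for name in files:
            # Optionally restrict to a set of extensions, e.g. {".c", ".h"}.
            if exts is not None and os.path.splitext(name)[1].lower() not in exts:
                continue
            path = os.path.join(dirpath, name)
            try:
                with open(path, encoding="utf-8", errors="replace") as f:
                    text = f.read()
            except OSError:
                continue  # unreadable file: skip rather than abort the whole run
            counts = {t: len(p.findall(text)) for t, p in patterns.items()}
            distinct = sum(1 for n in counts.values() if n)
            if distinct:
                results.append((distinct, sum(counts.values()), path))
    # Most distinct terms first, then most total hits, then path for stable order.
    results.sort(key=lambda r: (-r[0], -r[1], r[2]))
    return results


if __name__ == "__main__" and len(sys.argv) >= 3:
    # Usage: python rank.py <source_root> term1 term2 ...
    for distinct, total, path in rank_files(sys.argv[1], sys.argv[2:])[:100]:
        print(f"{distinct:3d} terms {total:6d} hits  {path}")
```

Ranking by distinct terms rather than raw hit count keeps a file that mentions one term hundreds of times (for example, a generic library header) from crowding out a file that touches several claim limitations at once.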