Source code ch.01: Why the big deal about source code?

Source Code & Software Patents: A Guide to Software & Internet Patent Litigation for Attorneys & Experts
by Andrew Schulman (http://www.SoftwareLitigationConsulting.com)
Detailed outline & rough draft for forthcoming book

Chapter 1:  What’s so special (or not so special) about source code?

Company A owns one or more US patents, whose claims include “a method for generating postage evidencing information,” or in layman’s terms, one particular way of printing postage stamps. The method’s steps comprise generating an error correction code for the destination address and a “digital token” employing said ECC. In response to the award of this patent, perhaps the press will report “company A thinks it invented the postage stamp,”[1] online posters will scoff that error-correcting codes have been well known since the 1950s,[2] and congressmen will fulminate (rightly or wrongly) about “a patent system run amok.”

A’s patents also claim an apparatus or device for performing the method; a system for doing so; and a computer-readable medium having computer code which embodies the method.[3]

That the patent claims comprise elements or steps called “digital tokens” and “error correction code” (ECC), along with the presence of claims to a computer-readable medium containing computer programs (so-called “Beauregard” claims), marks this as a “software patent.” A software patent often appears in contexts that appear removed from what is customarily thought of as software. In addition to the mundane example of postage stamps, software patents often relate to biotechnology, to physical devices such as cameras or telephones or door locks or steel furnaces, and to methods of conducting business.[4]

Now, competing Company B has an online service for printing US postage stamps. B can print the stamps and then mail them to the consumer. B also sells a product, which can be downloaded or purchased as a CD ROM, with Windows and Mac desktop programs which employ a postal scale, a printer, and an online connection to B’s internet servers, to generate USPS-approved postal barcodes on the consumer’s printer.

A, following a reasonable investigation of publicly-accessible information on B’s products — such as having an associate attorney, who was once a computer programmer, use B’s service and product, inspect the generated postal barcodes for the presence of an ECC incorporating information from the destination address, possibly use a packet sniffer to reverse engineer portions of B’s product,[5] and search B’s web site for any detailed information[6] (see chapter 6 on pre-filing investigation) — has filed a complaint against B under 35 USC 271 for making, using, and selling A’s patented invention. The complaint is soon followed by a table of preliminary infringement contentions (PICs) based on the associate’s investigation, linking each element or step, of each asserted claim, to what A asserts is the corresponding location in B’s technology; see chapter 7 on PICs.

A now has its foot in the door to gather B’s non-public information about the inner working of B’s products. A must gather evidence that B makes, sells, uses, and/or imports A’s method, apparatus, system, and/or medium.

B in turn must gather evidence that what it does is not covered by A’s patent claims, and that in any case A’s patent is invalid because, for example, another company C (or B itself) sold, made, and/or disclosed the same device or method in some publicly-accessible manner before A filed for its patent, or because A’s own R&D team did so more than a year before A’s filing. See chapter 4 on the role of source code in anticipation, obviousness, and the on-sale and public-use bars. B will also likely assert subject-matter invalidity under 35 USC 101 and Alice v. CLS Bank.

To show whether B infringes A’s patent, A’s best evidence (so to speak) is going to be found in B’s “source code.” To rebut A’s particularized assertions of infringement,[7] B similarly will look to its own source code. And important evidence (though not necessarily the best)[8] for whether A’s patent was anticipated by prior art may be found in C’s or B’s source code. Evidence that A waited too long to file its patent application may also be found in A’s source code. (Source code is unlikely to help B’s subject-matter invalidity argument, though if A has source code which tangibly embodies the patented invention, this could help it think through how to show that the patent claims address a concrete, tangible subject matter.)

Generally, source code has an importance in modern patent litigation larger than one might expect from the rough analogy often made between source code and blueprints (see xxx below). For example, the ND Cal. model protective order (PO) devotes a separate section to source code. As another example of how source code, rightly or wrongly, is given special treatment, see the cases discussed in chapter 7 in which the defendant’s (D’s) use of source code to build allegedly infringing products is viewed by courts as presenting the plaintiff (P) with a “Catch-22” or “chicken-and-egg” problem apparently not present when D instead uses blueprints, schematics, diagrams, or written specifications. For that matter, contrast the number of published opinions containing the term “source code” with the number containing the terms “blueprint” or “schematics” [or chemical formula?].

What is source code?

While the definition of “source code” can be disputed in different legal contexts (First Amendment, munitions exports, tax purposes),[9] and even within software patent litigation,[10] for present purposes we can accept the following definition from Reiffin v. Microsoft: “a computer program written in a high level human readable language” such as the C++ or Java programming languages. This source code is compiled into “object code”, which is machine language “required for the program’s execution by a computer.”[11] Given the speed of today’s computers, this compilation can occur “on the fly,” such that source code, or something closely resembling it, sometimes (even in products with otherwise proprietary non-“open” code) is included in the product delivered to consumers, or used directly to provide a service over the internet, and then translated into a form more amenable to machine execution.

As an aside, to be discussed in more detail at xxx below, “code”  here does not mean “code” in the sense of encryption, keys, passwords, and so on. Attorneys sometimes make this mistake, thinking that what they are seeking in discovery is some sort of magic key: “They won’t give us Their Codes.” Source code is called “code” for an entirely different reason, to be discussed in detail below, but for now note that the type of code we’re concerned with here contains instructions to direct the operation of a machine (the copyright law definition at 17 USC 101 is helpful here: a “computer program” is a set of statements or instructions to be used directly or indirectly in a computer in order to bring about a certain result).

These instructions are in the form of numbers such as “123” for ADD or “126” for DIVIDE, and the machine “knows” that 123 means to perform addition because that agreed-upon meaning is built into the machine. The machine, in other words, has a built-in “codebook” telling it to interpret 123 (in certain contexts) as an instruction to add. See xxx below, and xxx which will further note that a number like 123 could mean four or five different things, in a single program, with its meaning (what it’s “code” for) depending on the particular context within the program.
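The “codebook” idea can be sketched with a toy machine in C. This is an illustration only: the numbers 123 (ADD) and 126 (DIVIDE) come from the text above, while the two-number instruction layout, the HALT number 0, and the name run_toy_program are invented here and do not correspond to any real processor.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical opcodes: 123 and 126 follow the text; 0 (HALT) is invented. */
enum { OP_HALT = 0, OP_ADD = 123, OP_DIVIDE = 126 };

/*
 * Runs a toy program laid out as [opcode, operand, opcode, operand, ...].
 * Each instruction applies its operand to a running accumulator.
 * A number in an opcode slot is looked up in the "codebook" (the switch);
 * the same number in an operand slot is treated as plain data.
 */
int run_toy_program(const int *program, size_t length, int accumulator) {
    size_t pc = 0;  /* position of the next opcode */
    while (pc + 1 < length) {
        int opcode = program[pc];
        int operand = program[pc + 1];
        switch (opcode) {
        case OP_ADD:
            accumulator += operand;
            break;
        case OP_DIVIDE:
            assert(operand != 0);  /* toy machine: no divide-by-zero handling */
            accumulator /= operand;
            break;
        case OP_HALT:
        default:
            return accumulator;  /* stop on HALT or on an unknown number */
        }
        pc += 2;
    }
    return accumulator;
}
```

Given the six numbers 123, 123, 126, 2, 0, 0 and a starting accumulator of 1, the first 123 is read as the instruction ADD and the second 123 as the amount to add; the machine then divides by 2 and halts with 62. The same number thus plays two roles in one short program, depending only on where it sits.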

In other words, getting back to the story of A v. B, employees of B have written computer programs in the form of source code, just as they might write emails, or a diagram of how B’s product should work, or a memo. What they have written must follow the dictates of a programming language, and will  have various constraints imposed by the nature of the problem to be solved, but can otherwise be almost infinitely expressive. Source code is text; it is a form of document.[12]

But just as some speech is legally operative (the words themselves carry out an action such as agreeing to a contract),[13] so B’s source code is more than a mere document, in that, in the Reiffin court’s words, it can be automatically “compiled” (or mechanically translated) into a different form which a machine can directly execute. There is a direct mapping from source code on the one hand, to running a machine on the other.[14]

Computer source code can be thought of as an odd form of blueprint which is also a raw material. Source code holds the actual (as-built)[15] design for a software product, and at the same time is directly employed in the construction of that product, such that significant fragments of source code may appear in the final product. This will surprise those who have been told that the final product “is just ones and zeroes”; see below on the “Myth of 1s and 0s.” The relationship between a software product and its source code is thus quite different from the relationship between, say, a mechanical device and the blueprints for that device. As to why this makes a difference to patent litigators, see xxx below.

Matching (“reading on”) patent claims to source code

Discovery is now underway in the case of A v. B; B has produced its relevant source code to A’s attorneys and experts, under a protective order (PO; see chapter 9 on source-code discovery, chapter 11 on POs, and chapter 15 on the effect of POs on the source-code examination environment). A’s expert has found in B’s source code the text (written in a programming language such as C++, Java, and/or PHP) for a function named “GenerateStamps”, which, if executed (or invoked or “called”), would in turn call a function named “TokenGen”, which in turn passes data named “Addressee” to a function named “MkECC” (or, as discussed below at “What’s in a name?,” it might be named “ReedSolomonCode::encode” or “HammingCode”, with no explicit use of the term ECC); something like:

boolean GenerateStamps(byte *data) {
     return TokenGen(data);
}
boolean TokenGen(byte *data) {
     return MkECC(data->Addressee);
     // return ReedSolomonCode::encode(data->Addressee);
}

Does this source code, created by B, infringe A’s claim to a method for generating postage-evidencing information comprising an ECC for the destination address employed by a digital token? Naturally, the answer to that depends first and foremost on the construction of the claim limitations: what does the patent mean by destination address, digital token, and ECC? [16]

But it also depends on the facts of the source code. Presumably A’s expert will point out that “gen” and “mk” are common programmer jargon for “generate” or “make.” And the expert should further be prepared  to testify that the names, while suggestive (and perhaps even crucial  for locating the code in the first place), are not necessary for his analysis, because he carefully checked that  the code implementing MkECC  does in fact do what its name  suggests.
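What “checking that the code does what its name suggests” might look like can be sketched in C. Everything here is an assumption made for illustration: B’s actual MkECC is not shown in the text, and where the patent specification mentions Reed Solomon or BCH codes, a much smaller Hamming(7,4) code is swapped in because it fits in a few lines. The names DecodeECC and CorrectsAllSingleBitErrors are invented. The point is that an examiner can test behavior (does the output really let a damaged value be repaired?) rather than rely on the name.

```c
#include <assert.h>
#include <stdint.h>

/* Returns bit number pos (1..7) of a 7-bit codeword; position 1 is the lowest bit. */
static unsigned bit_at(uint8_t word, unsigned pos) {
    return (word >> (pos - 1)) & 1u;
}

/* Hypothetical MkECC: encodes a 4-bit value as a Hamming(7,4) codeword. */
uint8_t MkECC(uint8_t nibble) {
    assert(nibble < 16);
    unsigned d1 = nibble & 1u, d2 = (nibble >> 1) & 1u;
    unsigned d3 = (nibble >> 2) & 1u, d4 = (nibble >> 3) & 1u;
    unsigned p1 = d1 ^ d2 ^ d4;   /* parity over data at positions 3, 5, 7 */
    unsigned p2 = d1 ^ d3 ^ d4;   /* parity over data at positions 3, 6, 7 */
    unsigned p3 = d2 ^ d3 ^ d4;   /* parity over data at positions 5, 6, 7 */
    /* Positions 1..7 hold p1 p2 d1 p3 d2 d3 d4. */
    return (uint8_t)(p1 | (p2 << 1) | (d1 << 2) | (p3 << 3) |
                     (d2 << 4) | (d3 << 5) | (d4 << 6));
}

/* Hypothetical decoder: repairs at most one flipped bit, returns the 4-bit value. */
uint8_t DecodeECC(uint8_t word) {
    unsigned s1 = bit_at(word, 1) ^ bit_at(word, 3) ^ bit_at(word, 5) ^ bit_at(word, 7);
    unsigned s2 = bit_at(word, 2) ^ bit_at(word, 3) ^ bit_at(word, 6) ^ bit_at(word, 7);
    unsigned s3 = bit_at(word, 4) ^ bit_at(word, 5) ^ bit_at(word, 6) ^ bit_at(word, 7);
    unsigned bad = s1 | (s2 << 1) | (s3 << 2);   /* 0 means no error detected */
    if (bad != 0) {
        word = (uint8_t)(word ^ (1u << (bad - 1)));  /* flip the damaged bit back */
    }
    return (uint8_t)(bit_at(word, 3) | (bit_at(word, 5) << 1) |
                     (bit_at(word, 6) << 2) | (bit_at(word, 7) << 3));
}

/* Behavioral check: returns 1 if every single-bit error in every codeword is corrected. */
int CorrectsAllSingleBitErrors(void) {
    for (unsigned value = 0; value < 16; value++) {
        uint8_t clean = MkECC((uint8_t)value);
        if (DecodeECC(clean) != value) return 0;
        for (unsigned pos = 0; pos < 7; pos++) {
            uint8_t damaged = (uint8_t)(clean ^ (1u << pos));
            if (DecodeECC(damaged) != value) return 0;
        }
    }
    return 1;
}
```

If the function were renamed “f17” the check would pass just the same, which is the sense in which the name is helpful for finding the code but not necessary to the analysis.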

Source code vs. product/service: “Where’s the money?”

Yet, even were the claims to unambiguously read on the source code, how does A know that these particular functions in B’s source code are actually used in B’s revenue-generating product? True, B has produced its source code in discovery, presumably in response to A’s request for the source code to B’s commercial product, but if A v. B is a typical case, B has produced tens of thousands of source-code files (likely both over-producing and under-producing; see chapter 9 on discovery), from which A’s expert has extracted small portions:

  • B’s source code may contain a dozen different functions for generating stamps, only some of which are currently used in the product; perhaps those that would be most helpful to A’s case actually have less immediately-obvious names than the ones A’s expert has so far found.
  • Conversely, some of B’s source code is likely “dead code,” perhaps used as scaffolding in the construction of a product, but not moved into the product itself, or no longer used by (though still included in) the product; B’s dead code might include some of A’s favorite “smoking gun” infringement examples.[17]
  • Some may correspond to B’s previous or forthcoming products of which A was previously unaware.
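The “dead code” problem in the list above can be made concrete with a short C sketch. All names here (PrintStamp, GenerateStampsPlain, GenerateStampsWithECC, GenerateStampsLegacy, the LEGACY_STAMPS build flag) are hypothetical and are not taken from any real product; the function bodies are placeholders that merely report which path ran.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical tags, so a caller can tell which generator actually ran. */
enum StampMethod { METHOD_PLAIN = 1, METHOD_ECC = 2, METHOD_LEGACY = 3 };

/* Live code: the only generator the product's entry point calls. */
enum StampMethod GenerateStampsPlain(const char *address) {
    assert(address != NULL);
    return METHOD_PLAIN;
}

/* Dead code, first kind: compiled along with everything else, but nothing
   calls it. A text search for "ECC" would land here first. */
enum StampMethod GenerateStampsWithECC(const char *address) {
    assert(address != NULL);
    return METHOD_ECC;
}

#ifdef LEGACY_STAMPS
/* Dead code, second kind: present in the source tree, but skipped by the
   compiler unless LEGACY_STAMPS is defined at build time. */
enum StampMethod GenerateStampsLegacy(const char *address) {
    assert(address != NULL);
    return METHOD_LEGACY;
}
#endif

/* Entry point of the hypothetical shipped product. */
enum StampMethod PrintStamp(const char *address) {
    return GenerateStampsPlain(address);
}
```

An examiner who searched only for “ECC” would find GenerateStampsWithECC and might stop there; tracing callers from the entry point shows that, in this sketch, the product never reaches it. Whether the legacy version exists in the product at all depends on a build setting that is not visible in this file.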

While some source-code information may delight A, and lead to an amended complaint, once A starts pointing with particularity to these specific source-code locations, it is giving B a very clear target to hit, and a very clear non-infringement strategy to accompany its invalidity arguments: since B’s burden is not to disprove infringement generally, but only to rebut A’s specific assertions,[18] B is now positioned to try through its expert’s examination of B’s  own source code to show that the code pinpointed by A:

  • Despite what its name suggests to A, does not do the same thing as A’s properly-construed claim, nor its equivalent; B will assert that A’s experts, who after all are far less familiar than B with B’s own code, are perhaps misreading or overlooking some aspect of B’s code;[19]
  • Is not part of the product; it was an experiment, or was once part of the product, and (if B wants to go this far) has long since been replaced with a non-infringing alternative; [20]
  • Is part of the product, but is never used (but see the “latent code” cases); [21]
  • Is part of the product and is used, but is rarely used, and is typically bypassed (which, unless de minimis, won’t affect infringement, which like virginity is a binary state, but which will definitely impact damages; see chapter 30);
  • Is regularly used in the product, but typically fails in some way and is then followed by execution of a non-infringing alternative;
  • Is so flawed in its implementation of X that it does not actually do what it’s supposed to do, and hence doesn’t match the patent which is implicitly claiming an X which is reasonably though not completely error-free;[22]
  • Is so slow or so piggish in memory usage or such a source of tech-support complaints that its use is actually costing B;
  • and so on. [23]

Recall the definition of source code as human-readable text (a document). Just as a company email proposing some action may be excellent evidence of the action, but can sometimes be reasonably interpreted in different ways and is not the same as proof that the action was carried out,[24] likewise source code is generally not the same as the actual revenue-generating product delivered to consumers;[25]  see chapter 2.

(This is the first of several lists in this chapter which illustrate issue-spotting in code. Just as a good lawyer, or even a 1L law student, should be able from a few words to identify a series of factual and legal issues, likewise a software examiner involved in litigation should be able to see possible sources of contention in software.)

This partial list of issues shows that, at least for a “selling” complaint (as opposed to a complaint about D’s making or using the patented invention), source code is likely “mere” evidence of infringement, rather than infringement itself, because if B is like most companies likely to be targets of a patent infringement complaint, B holds its source code as proprietary “crown jewels.” What it sells, and largely what it uses, is the product or service or device or medium built from the source code, not the source code itself. [26] [This paragraph is too confusing. And what about “product by process” for method claims?]

Why inspect source code?

So, if source code is not the revenue-generating product/service, and if B’s source code is proprietary (not even available to A at time of A’s complaint),[27] why then is source code the “best” or at least preferable form of evidence for infringement of a software patent? Why not look at the actual product and be done with it?

For one thing, because the federal courts have said so. Numerous cases recite how the plaintiff (P) won’t have the defendant’s (D) source code until after P files its complaint, and that P must therefore base its complaint on something other than D’s source code, yet still satisfy the requirements of a reasonable pre-filing investigation.[28] While the courts state that relying on D’s marketing literature is usually insufficient, and that P often needs to “reverse engineer” D’s product or service,[29] the courts only view this as adequate for initial or preliminary infringement contentions (PICs; see chapter 7). Having been given access to D’s source code, P and its experts are expected to diligently examine the source code and to file non-preliminary infringement contentions (ICs) with “pinpoint” citations to where (and in some venues, how)[30] D’s source code embodies P’s invention;[31] see chapter 26 on claims tables.

The reason courts view reverse engineering the actual product or service as necessary and sufficient for PICs, but generally[32] prefer citing the closely-held source-code documents for non-preliminary ICs, is not only the simple fact that D’s source code is unavailable to P at the time of the PIC, thus presenting potential plaintiffs with what would otherwise be a chicken-and-egg problem.[33] More important, reverse-engineered listings from the actual product or service are simply not as readable, or as amenable to agreement among experts, as the source code – even though it is more likely the product rather than the source code which infringes, at least in a way likely to generate substantial damages.

An analogy: think of reverse engineering (RE) the actual product as concrete  research into what a company is doing, and think of the source code as the set of internal company emails and memos which directly led to what the company is doing. In an ideal world, non-documentary proof that a corporation actually does X should logically be more probative, more convincing as evidence (leaving aside issues of intent) that X was actually done, than a mere email from the CEO saying “let’s do X.” However, the non-documentary proof is harder to understand, more open to challenge, and doesn’t have the impact, color, or “smell” of the CEO’s statement. Few litigants would eschew the colorful email evidence in favor of, for example, a  statistical proof.

Likewise, while RE logically should be more convincing evidence of what the product actually does, than source code which is one step removed from the actual product;[34] and while source code on the one hand is far from being immediately comprehensible[35] and machine code on the other hand far from being a closed black box;[36] and with the following other heavily-footnoted caveats, nonetheless source code:

  • Is “straight from the horse’s mouth” — A and B have produced these files in discovery, in response to requests for their relevant source code; in contrast, the product of expert RE would require significantly more authentication;
  • Contains information not found in the product itself — in addition to “comments” (free form text programmers write to explain, sometimes accurately, how the code works, or why a method was chosen),[37] source code will often contain names for functions and data; these names are often (though definitely not always) boiled away during compilation into machine code[38]
  • Uses the party’s own terminology for the accused technology, such as “MkECC and TokenGen” in the hypothetical example of A v. B — it is much harder for D to argue that “MkECC” does not make an ECC than to say the same of the corresponding anonymous set of mathematical operations which A’s expert asserts generate an ECC;[39]
  • Should, depending on how it has been produced, be in the form of text files which are full-text searchable; [40]
  • Is often easier to read than RE information extracted from the product;[41]
  • Is likely less subject to expert disagreement.

Given the numerous footnoted caveats to the above list, it is hardly surprising that source code alone is rarely sufficient.[42] If the claims are compared with the source code, then in most cases both sides should also want to compare the source code with the accused product or service. “Should,” because an expert may have to convince a budget- and simplicity-conscious client that source code, while important and perhaps even necessary, is an insufficient basis for the expert to confidently state, to a reasonable degree of engineering certainty, that B does or does not infringe A’s patent. While in some circumstances one truly does not care, [43] usually a revenue-generating product/service and/or some public disclosure is the thing. What’s accused is the product, not the source code.

Also  consider third-party C’s invalidating prior art, or A’s statutorily-barring product. Proprietary source code itself often won’t be the best evidence of whether a particular portion of that source code was incorporated in, or executed by, or visible within, something that was publicly accessible to the PHOSITA at the relevant time. [Confusing paragraph]

To beat the “best evidence” horse: while the rule (BER) in FRE 1001 to 1008 states a preference for the original of any writing whose contents (as in a contract) are legally significant, here we are saying that the preferred evidence for what a product or service does is not the product or service itself, but rather a document (source code) which is both merely “about” the product or service, and at the same time a necessary antecedent to, and direct cause of, that product or service. Source code can be analogized as both a blueprint and a raw material; see xxx.

Symbols & synonyms: What’s in a name?

Earlier, A’s expert found a function named “MkECC” in B’s source code. It was lucky for A’s expert that B happened to use “ECC” in the name of a function to do ECC. Given the variety of ECCs, there is a wealth of methods and names B could have chosen. For example, in a standard piece of open-source C++ code, a method which does ECC encoding is named “ReedSolomonCode::encode”,[44] with no mention of “error correction code” or “ECC”.

How would A’s expert have instead located relevant code under the name “ReedSolomonCode::encode”? In this case, the patent specification helpfully notes, “An error correction code is generated for the selected address data using, for example, Reed Solomon or BCH algorithms”. While not limiting the claim to Reed Solomon or BCH encoding, the explicit mention of these examples helps construct a set of proper nouns  for which the examiner can search. Other synonyms might be “RS” (for Reed Solomon), Hamming, or Golay. Such potential synonyms for searching also frequently appear in dependent claims (“The method of claim 1, in which the error correction code is a Hamming, BCH, or Reed Solomon code”). (It is perhaps obvious, but just to be clear, “searching” here refers to various ways of examining source code or software products, not searching patents — except to the extent they may contain source-code or pseudocode fragments — nor searching for non-source prior art.)

So long as A’s expert understands that other types of ECCs may also be read on by the independent claim, and so long as B understands that the term “ECC” may conceivably have a more limited meaning in A’s patent than at first appears (excluding, for instance, convolutional codes, because the only examples the spec provided were all block codes; see ch.xxx on “means plus function” and functional language), then we are still in the realm of literal infringement, or its rebuttal (an ECC claim element does or does not read on code named “RSEncode” or “MkBCH”), and no one is yet talking about the doctrine of equivalents.

But it still sounds as if we are depending on names. And if so, why might “ReedSolomonCode::encode” help establish literal infringement of a claim which recites use of an ECC, without resort to the doctrine of equivalents? One short answer is that a name in code is just a convenient “symbol” to represent an address (location in memory); see the example at xxx below of code in which all function and variable names have been replaced, without changing the code in a way that would take it outside literal infringement. Consider too that were we not talking here about a software patent, or if software patents were not made up of the same “stuff” (i.e., words) as source code, the issue of non-verbatim literal patent infringement would not even arise. A physical device will often literally infringe a patent claim, even though the physical device clearly does not use any words, much less the same words as the patent claim.

Some small source code examples

“Alice was beginning to get very tired of sitting by her sister on the bank, and having nothing to do: once or twice she had peeped into the book her sister was reading, but it had no pictures or conversations in it, ‘and what is the use of a book,’ thought Alice, ‘without pictures or conversations?'”
— Lewis Carroll, Alice’s Adventures in Wonderland

It would seem to go without saying that a book on code or software will present the reader with some examples of what code looks like. Yet one important book, Lawrence Lessig’s justly famous Code, treats this as unimportant, and several casebooks on software and internet law provide only the most perfunctory display of code, apparently more to mystify law-student readers than to help them understand if there is even such a thing as “software law” or “cyberlaw.”

As the first of several source-code examples in this book, a small example below will help illustrate what we’ve been discussing, which is  claim/code matching. The source code below is much simpler than ECC. It merely adds up a series of numbers and produces their total. The code is written in the C programming language, and implements a function named “sum”, which is passed parameters named “start” and “end.” If invoked with the parameters 1 and 4, for example, the “sum” function produces 10, which is the sum of 1 + 2 + 3 + 4. In case this sounds too trivial, note that Uniloc v. Microsoft in part hinged on whether a function performs addition.[45]

int sum(int start, int end) {
         int i, total;
         for (i=start, total=0; i<=end; i++) {
                  total += i;
                  if (total < i) {
                           error_message("overflow in sum at %d\n", i);
                           return -1;
                  }
         }
         return total;
}

This code will be explained below at xxx. An attorney reading this book should  not be expected to understand this code, any more than he or she might be expected to understand a CAT scan or x-ray shown in a book on medical law, or a land survey shown in a book on boundary disputes in real property. But lawyers should be able to understand what their experts say about such documents; should understand for example that “start” and “end” are “parameters” to a “function” named “sum”, that “i” and “total” are “variables” of the type “int” (integer, i.e. whole numbers); and most important should be able to recognize the factual issues and legal issues that they raise. By the end of this lengthy chapter, the attorney reader will see that such unpromising-looking, eyes-glazing-over, “boring” material is as much a site for disputes as the photograph of the site  of a car crash (or perhaps even as the car-crash site itself).

For now, however, simply try to view the code above as one would any old text, albeit with the same few words repeated over and over, some mathematical symbols, curly braces, and odd-looking indenting. The text (think of it for now as a company’s inter-office memorandum on how the company will perform summation) really just states one rather obvious way to do summation, which is to start at the beginning of a set of consecutive numbers, and add each number to a total (which total one has, of course, remembered to initialize to zero before starting). It also states that, if ever the total winds up less than the number you just added, something has gone wrong. The code refers to this something as “overflow,” and the possibility of its occurrence is a useful reminder that software is not abstract, but deals with tangible devices called computers which have definite limits. Here, if “int” is 32 bits wide (as it is on most current systems), the signed total would overflow just past 2 billion (2,147,483,647); an unsigned 32-bit total would instead top out at around 4 billion. This of course has implications for subject-matter patent eligibility under Alice v. CLS Bank; see xxx.
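To see where that limit actually bites, here is a minimal C sketch, under stated assumptions, that finds the first number which can no longer be added to a 32-bit signed total. The function name FirstOverflowAt is invented for illustration, and the sketch is not a correction of B’s code. One design difference is deliberate: in C, overflowing a signed integer is undefined behavior, so a test made after the addition (as in the example above) is not guaranteed to fire under every compiler; the sketch therefore tests before adding.

```c
#include <assert.h>
#include <stdint.h>

/*
 * Adds start..end into a 32-bit signed total, checking BEFORE each addition.
 * Returns the first number that would push the total past INT32_MAX,
 * or 0 if the whole range fits. Assumes start >= 1.
 */
int32_t FirstOverflowAt(int32_t start, int32_t end) {
    assert(start >= 1 && end < INT32_MAX);
    int32_t total = 0;
    for (int32_t i = start; i <= end; i++) {
        if (i > INT32_MAX - total) {
            return i;  /* total + i would not fit in 32 signed bits */
        }
        total += i;
    }
    return 0;
}
```

Summing upward from 1, the total after 65,535 is 2,147,450,880, which leaves room for only 32,767 more; the next number, 65,536, is the one that does not fit. So a function that looks as if it could add “any” range of numbers in fact stops working well short of a range of a hundred thousand.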

Now, getting back to the code example, if one were to take this text, and perform a global search and replace, changing “sum” to “func” (a generic term for any function), “start” to “x”, and so on, the source code would now look like the following:

int func(int x, int y) {
         int i, z;
         for (i=x, z=0; i<=y; i++) {
                  z += i;
                  if (z < i) {
                           error_message("overflow in func at %d\n", i);
                           return -1;
                  }
         }
         return z;
}

This really is the same exact code as in the previous example in that, when compiled, both versions would yield the same “object code” in a product which consumers use. The names sum, start, end and total are conveniences for the programmers. Important conveniences, but with no effect on the machine code – and with no effect on whether a given claim reads on code which includes this function.

Actually, the object code would likely not be exactly identical: if the error message is of the type which is included in the commercial product and not restricted to in-house “debugging” versions, then because the function’s original name “sum” was also replaced in the error message, this part of the product will be changed too. The presence of text for error messages, by the way, is one reason why “object code” in software products is almost never “just ones and zeroes” despite what is generally taught in law-school courses on software law; this is not a minor quibble, and actually makes an important difference in software litigation, and is discussed in detail below at xxx.

But ignoring any commercial products that might be built from this source code, and getting back to the source code itself: A’s expert would not find the code named “func” in a search for “sum”, “summation”, “addition”, or what have you. A instead would have to find this as part of a “top down” approach to the code, as discussed later at xxx.

Once found, it would be more difficult for A’s expert to show that “func” performs summation, than would be true for the function named “sum”. A’s expert would observe that the function is returning “z”, whose value is added to each time through the loop. The “overflow” error message is also a useful indicator (ch.xxx will show that error checking and error messages are extremely useful in searching for potentially relevant code, and in determining what the code does). Here, z is an “int” or integer, which only holds e.g. 32 or 64 bits. A 32-bit number can take about 4 billion distinct values: zero up to about 4 billion if unsigned or, for a signed type such as “int”, roughly minus 2 billion up to plus 2 billion. The programmer was concerned that z might become large enough to wrap around (overflow). [Okay, but how do these facts help A’s expert show that func() is generating a total?]
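One way an examiner might supplement a reading of the anonymously named code is to run it against an independent calculation, as in the C sketch below. This is an illustration under assumptions, not a prescribed method: error_message is a stand-in written here because the routine the example calls is not shown in the text, and reference_total and MatchesSummation are invented names. Only small ranges of positive numbers are exercised, so the overflow branch is never reached.

```c
#include <assert.h>
#include <stdarg.h>
#include <stdio.h>

/* Stand-in for the error_message routine the example calls; the real one is not shown. */
static void error_message(const char *format, ...) {
    va_list args;
    va_start(args, format);
    vfprintf(stderr, format, args);
    va_end(args);
}

/* The renamed function, reproduced from the text. */
int func(int x, int y) {
    int i, z;
    for (i = x, z = 0; i <= y; i++) {
        z += i;
        if (z < i) {
            error_message("overflow in func at %d\n", i);
            return -1;
        }
    }
    return z;
}

/* Independent reference: adds the same range in a wider integer type. */
static long long reference_total(int x, int y) {
    long long total = 0;
    for (long long n = x; n <= y; n++) {
        total += n;
    }
    return total;
}

/* Returns 1 if func agrees with the reference for every range inside [1, limit]. */
int MatchesSummation(int limit) {
    assert(limit >= 1 && limit <= 1000);
    for (int x = 1; x <= limit; x++) {
        for (int y = x; y <= limit; y++) {
            if (func(x, y) != reference_total(x, y)) {
                return 0;
            }
        }
    }
    return 1;
}
```

Agreement on every tested range does not prove what the programmer intended, but it gives the expert something sturdier than the name to point to: for these inputs, “func” returns exactly the total of the numbers from x through y.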

In partial answer to the question posed, “What’s in a name?,” we can see that it is very helpful when using the other side’s source code, if that source code employs names for functions, variables, and data structures which line up with claim language, or with reasonable synonyms of the claim language. While this is obvious for P fishing through D’s source code for evidence of D infringing P’s patents, D will likewise hope for (but not rely on) the presence of such names when e.g. using P’s source code to show that it was selling the invention more than a year before filing. Names in code may be somewhat arbitrary, and may be boiled away when compiling the product delivered to consumers, but are important, often even crucial, in software litigation. [Make sure reader understands that “names” here refers, not to names of programmers who worked on the code, but rather to the names of functions, variables, and data structures in the code.]

Aware of how useful names are for finding and demonstrating what the code is doing, some litigants will sanitize code before producing it to the other side in discovery, removing comments and having a programmer run an obfuscation program to scrub function and variable names. Don’t do this. Apart from likely constituting spoliation (see chapter 10), it is unnecessary: you can simply rely on your client’s programmers already having been sufficiently inscrutable (and even, occasionally as job security, downright obfuscatory) in the ordinary course of business. Alternatively, B’s experts may in litigation deny that B’s name “x” really means “x” (see e.g. case xxx on the word “predicting”), though such a denial, even if true, may sound unconvincing.

Different implementations of the same component

To return to the example of the summation function in B’s code. At some point, a programmer at B noticed a problem: the further apart start and end are, the longer it takes to produce the result. This is not surprising since the function walks through each number adding it to total.

There are better ways to implement summation of numbers from x to y. Below is another method, discovered in the 18th century by a small schoolchild named Carl Gauss, whose teacher tried to busy the children (picture Mrs. Krabappel: “okay, children, we’ll be very quiet today because teacher has a hangover”) with adding up 1 through 100. Little Gauss immediately saw that the answer was 5050: rather than dutifully march through the numbers 1, 2, 3 and so on (as the code above does), he could pair up 1 with 99 to get 100, 2 with 98 to get another 100, and so on; he could figure out the number of pairs there would be, without counting them one by one, plus anything left over.[46]

The code below uses this idea, first with useful naming, then without. Also note that the programmer at B did not delete the previous versions (sum and f) from the source code; he simply added some more. In many modern programming languages which support function overloading and overriding (see xxx), the new versions might even have the same names as the old.

int sum2(int start, int end) {
   int tot = end * (end + 1) / 2;
   return (start==1) ? tot : (tot - sum2(1, start-1));
}
int f2(int x, int y) {
   int z = y * (y + 1) / 2;
   return (x==1) ? z : (z - f2(1, x-1)); // recursive
}

A few observations:

  • We now have four different functions which apparently all have the same purpose or role (“function” in F/W/R analysis) and which given the same input yield the same result (output), but with different implementations (way)
  • Actually, the two examples immediately above won’t yield the same output in the case of overflow
  • Perhaps there are other error conditions in which the different implementations do not yield the same result: what if the second number (labelled “end” or “y”) is smaller than the first one (labelled “start” or “x”)?
  • The two “Gaussian” examples immediately above do not contain a loop, yet they apparently also do summation
  • The function named “f2” calls itself (recursion), as does “sum2”, though only f2’s comment says so
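
The observations above about matching and diverging outputs can be checked mechanically. The following is a minimal sketch under stated assumptions: sum_loop is an assumed loop-based stand-in for the earlier “sum” (it is not reproduced from B’s code), sum_gauss repeats the logic of sum2, and same_result is a hypothetical helper.

```c
/* Assumed loop-based version, standing in for the earlier "sum". */
int sum_loop(int start, int end) {
    int total = 0;
    for (int i = start; i <= end; i++)
        total += i;
    return total;
}

/* Closed-form version, same logic as sum2. */
int sum_gauss(int start, int end) {
    int tot = end * (end + 1) / 2;
    return (start == 1) ? tot : (tot - sum_gauss(1, start - 1));
}

/* Returns 1 if both implementations give the same result, else 0. */
int same_result(int start, int end) {
    return sum_loop(start, end) == sum_gauss(start, end);
}
```

For ordinary inputs such as (1, 100) or (5, 10) the two agree. For the reversed range (5, 3) they do not: the loop version never enters its loop and returns 0, while the closed-form version returns -4. Whether that divergence matters is exactly the kind of question the parties may end up disputing.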

Disputing a few lines of code

Each of the two tiny examples above raises a host of potential issues for A and B to dispute. As touched on earlier, code is a location for disputes. This is one reason why there is “software law” or “cyberlaw,” and perhaps not merely law applied to software or the internet (but see xxx on the “law of the horse” dispute between Lessig and Easterbrook). A software expert needs the ability to do “issue-spotting” in code, in somewhat the same way that an attorney does issue-spotting. If it sounds unrealistic to ask a non-attorney expert to learn the art of issue-spotting without at least the first year of law school, then this is all the more reason why attorneys working on software and internet litigation need to be able to spot the issues raised by code, even if they couldn’t write code and can barely read it. The “where’s the money?” section above gave one set of examples raised by the distinction between a given piece of source code on the one hand, and the commercial product built from it on the other. Other types of issues might be raised entirely within the four corners of the source code, without reference to the product. Some of the disputes sound a bit desperate, but some are genuine:

  • It might or might not make a difference in a given case, but which version is the program using? – apart from looking at this code, the experts need to examine the code which calls it (and the code which calls that, often up many layers, before determining that e.g. sum2 is not the function you’re looking for). [MORE: function overriding, overloading]
  • P’s expert finds that f2 is being used in D’s product, and that the function named “sum” is not. P’s expert tries to show that f2 really will yield a number which is the sum of the numbers from start to end. But is yielding that number, without addition, really summation? How can P’s expert show that? Imagine the functions here were more complicated than mere summation.
  • Alternatively, P’s expert finds that the nicely-named “sum2” is being used in D’s product. Does he need to explain what that “2” is all about, how it’s just the second, improved, version of “sum”, and not something which sums only two numbers?
  • Does P’s expert need to show that the odd-looking code in f2 does the same thing as the code in “sum”, just faster and in a different way? We have two different ways of reaching the same result: should we use the doctrine of equivalents’ function/way/result test here, or isn’t it still literal infringement if the claim refers to “summation” and the function generates a sum? [MORE: does absence of a loop mean not doing sum? Maybe reaching the same result as sum, but is that same thing? See case where judge realized that expert was merely saying that func was capable of x, not that it does x.]
  • P’s claims refer to “means for providing a summation value.” The parties agree this is a means-plus-function claim under 112(6). P’s specification discloses a loop or iteration. The f2 function doesn’t use a loop or iteration. Neither does the sum2 function – but D itself put “sum” in its name; should D be held to its own statement that “sum2” generates a sum? Picture DA Jim Trotter in My Cousin Vinny: “its own O-fficial statement, ladies and gentlemen of the jury!”. (Programmer estoppel?)
  • D’s expert notes that D’s sum2 and f2 are not doing any error checking. If start is greater than end, the code will produce something other than the sum of numbers between start and end. “What kind of sum function is that?,” D asks rhetorically. And there’s no overflow test as in the earlier code. And even there, the function could produce -1 as the sum of two sufficiently large numbers. Can D’s code really be said to do summation if it possibly does so with errors? (Yes, some of these arguments sound a bit desperate.) Can P simply note that the function does do summation, for at least the vast majority of the inputs it receives in the real world? [MORE: Can P fall back on insubstantial differences? Would a PHOSITA regard a buggy sum function as insubstantially different from a sum function? Would discounting bugs render all software patents inoperable? See Spinellis, Code Reading, #179 on flawed code as a spec for corresponding intended implementation.]

Purpose vs. structure

In A’s examination of B’s source code, almost identical code to that inside “sum” is found in another function, called “MkDigest”. Its purpose is apparently to create some sort of “digest” or “summary” from a set of numbers. It looks like a (very) poor man’s MD5 (see xxx). Whatever its purpose, its structure is just like that of the “sum” function, except instead of “start” and “end” parameters designating the beginning and end of a set of numbers to sum, it instead expects one parameter containing an array (which the programmer has named “arr”) of numbers (not necessarily consecutive, as in sum) and the length of this array:

int MkDigest(int *arr, int len) { // summary
   int i, total, *p;
   for (i=0, p=arr, total=0; i<len; i++, p++) {
      total += *p;
      if (total < *p)
         return -1;
   }
   return total;
}

Despite its apparently quite different purpose from the examples shown earlier, is this MkDigest() also a sum function? Do we care more about something’s purpose (function), or what it does (structure)? Would we care less if the term “summation” appeared in the preamble of a claim rather than its body? Preambles are generally not a source of limitations (see ch.xxx), in part because the patent system is focused more on what something does (its structure) rather than on what it will be used for (its “function,” not to be confused with source-code functions like sum() and MkDigest()). Would we care more if “summation” appeared in a “means-plus-function” limitation, such that any infringing (or invalidating?) code would need to match, not merely the functional language of the claim, but also the “means” (implementation) set forth in the specification? The author realizes he hasn’t explained this point very well, and so will defer to a much clearer learned treatise, Saturday Night Live, Season 1, Episode 9 (1975):

“[open on suburban kitchen, Wife and Husband arguing]
Wife: New Shimmer is a floor wax!
Husband: No, new Shimmer is a dessert topping!
Wife: It’s a floor wax!
Husband: It’s a dessert topping! …
Spokesman: Hey, hey, hey, calm down, you two. New Shimmer is both a floor wax and a dessert topping! Here, I’ll spray some on your mop.. [ sprays Shimmer onto mop ] ..and some on your butterscotch pudding. [sprays Shimmer onto pudding]
[Husband eats while Wife mops]
Husband: Mmmmm, tastes terrific!
Wife: And just look at that shine!  But will it last?
Spokesman: Hey, outlasts every other leading floor wax, 2 to 1. It’s durable, and it’s scuff-resistant.
Husband: And it’s delicious!
Spokesman: Sure is! Perks up anything from an ice cream sundae to a pumpkin pie!
Wife: Made from an exclusive non-yellowing formula.”

What patent litigation focuses on, in other words, is that exclusive non-yellowing formula, not what someone does with it. (Though oddly, the US PTO’s patent classification system is more focused on uses than on implementations, see xxx.)

So what? The importance of the “structure vs. function” distinction, and the preference for structure (implementation) over function (purpose), will be discussed several times in this book, including at xxx below, which in addition to covering functional claim language under 112(6), will also sort out the difference between structure/function in patent law and structure/function in programming. For now, simply hold onto the point that in source-code examination in software patent litigation, the focus will usually be on what something does and not what purpose someone might put it to.
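
To see how closely the structure of MkDigest tracks summation, whatever its intended purpose, one can feed it a run of consecutive numbers. The following is a minimal sketch: MkDigest is repeated from above so the example stands alone, while digest_of_range is a hypothetical helper that handles at most 128 numbers and is only meant for positive inputs.

```c
/* MkDigest as shown earlier in this section. */
int MkDigest(int *arr, int len) {
    int i, total, *p;
    for (i = 0, p = arr, total = 0; i < len; i++, p++) {
        total += *p;
        if (total < *p)
            return -1;
    }
    return total;
}

/* Hypothetical helper: fills an array with start..end (at most 128
   values) and passes it to MkDigest. */
int digest_of_range(int start, int end) {
    int arr[128];
    int len = 0;
    for (int i = start; i <= end && len < 128; i++)
        arr[len++] = i;
    return MkDigest(arr, len);
}
```

Given the numbers 1 through 100, this “digest” is 5050, the same value the various sum functions produce. The name and comment point to one purpose; the structure is that of a sum.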

Programmer terms, jargon, abbreviations, and conventions

As yet another variation on adding a series of consecutive numbers, the following simply multiplies the number of numbers by the average number:

int sum3(int start, int end) {
   int num = (end - start) + 1 ;
   int ave = (end + start) >> 1 ;
   if ((num & 1) == 0)  {
      num >>= 1;
      ave += (ave + 1);
   }
   return num * ave;
}

If P’s expert were searching the code for “average”, he might miss “ave” here. This is a typical programmer abbreviation: “procedure” becomes proc (and “function” becomes func), except that proc might also mean “process”. We’ve already seen “mk” for “make” or “generate”; even creation may appear as “creat” without the e. Some abbreviations are part of the programming language itself, such as “int” for integer, or part of a library or interface that the programmer is using, such as the Windows API (famous even to lawyers from the Microsoft antitrust litigation), which includes e.g. DefDlgProcEx and such vowel-deprived names as “hWndCtl”, “SNDMSG”, and “lpVtbl”. Even already-curiously-named constructions such as the important “thunk”[47] may be rendered as e.g. thk.

Modern code does tend to be more verbose and self-explanatory, sometimes to the point of annoyance of the code reader, though very helpful for the code searcher. The programming term “shim” might be more descriptively rendered as “interceptor” or “redirector” (though also as the incoherent “shiv”).[48]

The code above is using the C shift right >> operator to divide by 2. An expert might need to explain that there is division here. The “int” means this is integer arithmetic, so odd numbers will not evenly divide by 2. This particular function accounts for that, but another program doing integer division might be rounding down – possibly room for argument about what the code does. (Though D will want to be careful: if it says that its f isn’t doing X, because f merely approximates X or because D’s f has a rare bug which D now triumphantly and almost-proudly points to, D had better be sure this doesn’t mess up D’s invalidity argument that third-party 3P’s code anticipated P’s claims – because perhaps 3P’s code has the same or similar imperfections.)
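
The equivalence between shifting and dividing, and the rounding that comes with integer arithmetic, can be shown in a few lines. This is a minimal sketch with hypothetical function names; it is limited to non-negative values, because in C the result of right-shifting a negative signed integer is implementation-defined.

```c
/* For non-negative ints, shifting right by one bit gives the same
   result as integer division by 2: any remainder is discarded. */
int half_by_shift(int x)    { return x >> 1; }
int half_by_division(int x) { return x / 2; }

/* What halving throws away: 1 for odd inputs, 0 for even inputs. */
int lost_by_halving(int x) {
    return x - 2 * (x >> 1);
}
```

Both half_by_shift(7) and half_by_division(7) yield 3, not 3.5, and lost_by_halving(7) is 1. A function such as sum3 has to compensate for that discarded remainder; a function that does not is “rounding down,” which is where the room for argument arises.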

[Give another example (besides >>=1 means integer divide by 2) of a programming convention, where the code simply doesn’t look like what it does; see e.g. non-intuitive-looking examples of square root (“sqrt”) in Henry Warren’s Hacker’s Delight (http://www.hackersdelight.org/).]
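
One candidate for such an example, offered as a sketch rather than as a settled choice: a well-known bit-manipulation idiom tests whether a number is a power of two, without any loop, division, or logarithm. The function name is hypothetical.

```c
/* Returns 1 if x is a power of two (1, 2, 4, 8, ...), else 0.
   A power of two has exactly one bit set. Subtracting 1 clears that
   bit and sets every lower bit, so x & (x - 1) is 0 only in that case.
   Zero is excluded explicitly. */
int is_power_of_two(unsigned int x) {
    return x != 0 && (x & (x - 1)) == 0;
}
```

A searcher looking for “power”, “exponent”, or “log” would find nothing here, and nothing on the face of the expression announces what it tests.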

Enough with A v. B and their dispute over postal bar codes, ECCs, and summation. Let’s start back over and look in more detail at the issues raised by source code. [Better give reader a reason why we’re starting over: it made sense to start with a concrete scenario, and that raised lots of issues, but now want to look at the issues in a more logical order, and in more detail.]

What source code is, redux: Crown jewel, blueprint, piano roll, or just another doc?

  • Definition of source code for purposes of discovery: source code is a document, “just another doc,” but a different type of doc
  • Definition of source code elsewhere in patent litigation
  • Definitions of source code in other areas of law, including copyright, import/export, First Amendment, munitions (!), etc.
  • Why the patent litigator should be familiar with different definitions of “source code”
  • Copyright definition: a “computer program” is a set of statements or instructions to be used directly or indirectly in a computer in order to bring about a certain result (17 USC 101)
    • Statements or instructions — low-level e.g. ADD, MOV, CMP, JMP; mid-level e.g. sqrt, fread, DefWindowProc; high-level e.g. earlier examples from postage-stamp code
    • Instructions to a machine
    • Directly or indirectly — directly, in machine code, or indirectly from source code written in a higher-level programming language which is compiled to machine code
    • To bring about a certain result — actually bringing about that result; operative words; a type of blueprint which actually brings the structure into fruition
  • Telling a tangible machine how to bring about a certain result, in a way which actually (directly or indirectly) can bring about that result
    • What type of “machine” or “language” are we talking about here?
    • Compiling, virtual machines, bytecode, interpreters, eval(), executing code “on the fly”, JIT, “engines” (e.g. for regular expressions), etc.
    • Equivalence in computer science of machines & languages [perhaps note that almost everything else here is really software engineering (SE) rather than computer science (CS)]
    • “Functional” programming languages, e.g. Haskell; high-level languages e.g. SQL in which programmer can (up to a point) treat implementation/way as black box; role of “wishful thinking” in programming (“gee, if only there were a function that did x…” as a tool for creating code to bring about a concrete, tangible result)
    • Tangibility & limits, e.g. 32-bits => max 4 billion; integer approximation
    • Implications for 101 subject-matter eligibility under Alice
  • Some attributes of source code, including from assorted legal definitions
    • Is comprehensible to appropriately-trained persons; there is a specific field of expertise
    • Is generally not capable of direct execution on a machine
    • Generally has a direct relationship, via literal mechanical translation by a “compiler,” to code which can be directly executed on a machine (but compiler reordering, optimizing)
    • May include related notes, design documents, etc.
    • Can constitute a component of the final product (but see Microsoft v. AT&T and Eolas cases on golden master disks)
    • Has “speech” aspects under First Amendment law (DeCSS code: “if you can put it on a t-shirt, it’s speech”)
    • Can possibly constitute a “munition” subject to export control (DeCSS silly, but Stuxnet)
  • Some attributes of source code, distinguishing it from other materials in e-discovery (ESI)
    • Structured; parts interrelate to each other; can detect if something missing; is read in a structured way (“tracing”); has dependencies
    • Plain text, though possibly inside version control or development environment
    • etc.; see chapter 9 on discovery
  • Some more-or-less useful analogies
    • Blueprint — how source code is similar to, and different from, a blueprint for a building or a device
      • Blueprints are used in construction or manufacturing to create buildings or goods, but the blueprints do not themselves “bring about a certain result” in the same direct way that source code can be said to bring about a certain result
      • Yes, source code generally needs to be compiled before it can direct a machine’s operation; but the translation of source code into machine code is (apart from the important exception of optimizing compilers) literal
      • Blueprints are not directly translated into instructions to erect a building (except possibly for the partial exception of BIM)
      • Nor does a complete building contain portions of the blueprint (but see the Google[x] spin-off Flux as a possible example of how working blueprints might go “into” a building, helping control its operation)
      • Blueprints often diverge from the final product, in ways in which source code does not diverge from a software product (there is such a divergence between source code and software products; see chapter 2; but it’s very different from the blueprint/building divergence; is there an equivalent in software to the need in construction for “as built” drawings?)
    • Recipe: see “The Sachertorte Algorithm”; Silvaco trade secrets case re: baking & recipe
    • “Piano roll” in copyright cases — “piano roll blues” (also a good example of how insufficiently-planned analogies can blow up)
    • Code as law/rule (Lessig)
    • Code as proto patent claim (xxx on turning code into claims)
    • Source code as “crown jewels” (yet frequently misplaced; see chapter 10)
  • Operative words, “speech acts” (Austin, Searle)
  • Patent cases disputing claim construction of claim terms such as “instruction,” “code,” “assemble,” etc.: of course, construction of “instruction” in one patent may not be relevant to another patent, even from the same patent holder (but see claim construction thesaurus), but useful at least as background

What source code is not

  • Sometimes 01010101 (sorry, 00101010) is presented as an example of what source code is not (Gomulkiewicz software law casebook), but sometimes as if it were source code
  • A general indifference to getting it right, so long as it looks sufficiently geeky; in this book, we’re going to get it sufficiently right that it will assist lawyers involved in software litigation
  • Do Google image search for “computer code stock photo”; see collection of “Source Code in TV and Film” (alright, most of it looks like actual code)
  • Show green letters on black background: “This is not code” (with apologies to Magritte)
  • “The Codes”: source code is not something like a key or password or decryption system (though it can be used for these)
  • So why is it called “code”? The machine has a repertoire of instructions it can carry out; these instructions are specified with numbers (e.g. 123 means ADD, 124 means SUBTRACT, etc.); the machine receives a number and looks it up in its “codebook” to see what it should do (see machine.c example code at xxx)
  • Source code is generally not simply what you see when you “View Source” in a web browser (this is often the least interesting part)
  • Source code is generally not what you get in e-discovery; contrast code vs. data
  • Conversely, source code is generally not the One True fount of information, of which the commercial product on the market is a mere “manifestation” (see chapter 7 on the reason some courts give for allowing bare-bones PICs when P expects to later receive source code during discovery)
  • Source code “aura” generally: either treated with same semi-mystified respect as 01010101 in the movies, or conversely, seen as radically more readable and accessible than 01010101
  • Source code is not a completely-new sui generis thing for which brand new laws (or no laws) are required; see “Cyberlaw” debates e.g. between Lessig and Easterbrook, and consider whether analogies, such as swap-meet for online forum, work well enough [perhaps note long history of instructions to machines, “control codes,” etc.]
  • Source code is not necessarily more understandable than object code: contrast inscrutable code (e.g. expert’s source code reprinted in Novartis v. Ben Venue Labs; C obfuscation contest) with sometimes very-readable disassemblies or Java decompilation [xxx; see ch.3 on open source (some mythology)]
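
The “codebook” idea in the list above (numbers standing for instructions) can be sketched as a toy machine. This is a hypothetical illustration, not the machine.c example referred to above: the opcode values follow the “123 means ADD” example, and the struct and function names are assumptions.

```c
#include <stddef.h>

/* Hypothetical instruction codes, following the "123 means ADD" example. */
enum { OP_ADD = 123, OP_SUBTRACT = 124 };

/* One instruction: a numeric code plus the operand it applies to. */
struct instr {
    int code;
    int operand;
};

/* Runs a program of n instructions against an accumulator that starts
   at acc, and returns the final accumulator. Unknown codes are skipped. */
int run_machine(const struct instr *prog, size_t n, int acc) {
    for (size_t i = 0; i < n; i++) {
        switch (prog[i].code) {          /* the "codebook" lookup */
        case OP_ADD:      acc += prog[i].operand; break;
        case OP_SUBTRACT: acc -= prog[i].operand; break;
        default:          break;
        }
    }
    return acc;
}
```

The program {123, 5}, {123, 10}, {124, 3}, run from an accumulator of 0, produces 12. To the machine the program is nothing but numbers; names like OP_ADD exist only for the human reader.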

What difference does the nature of source code make to a litigator?

  • When code is viewed as just another thing to get in e-discovery, similar to emails, you often end up with a source-code production that is unhelpful to your expert: PDF files vs. “native format”; what “native format” means for code; how the structure of source code dictates how it should appear in discovery (chapter 9) and in expert investigation (chapter 15)
  • Conversely, when source code is viewed as all-important, overshadowing the commercial product at issue, you end up with deficient complaints and preliminary infringement contentions, and likely with a poor understanding of damages (which are generally applied to the commercial product, not to the source code; see chapter 30)
  • How and why source code should be treated differently from “ESI”: see article on source code & e-discovery
  • Knowing what you’re asking for in discovery
  • Understanding what you’re about to agree to in protective order
  • Actually challenging the other side’s expert, in a genuine (not merely “pot-shot”) manner
  • Understanding what needs to be shown, other than the source code (see “How source code relates to software products” below)
  • Code as a site for disputes: “issue spotting”
  • Analogies, and (with apologies to Birdman) the unsurprising virtues of attorney technical ignorance (goal here is not for attorney to sound like a computer programmer)

How source code relates to software products & services

  • Generally indirect, but literal relationship; see xxx above on compiling source code into object code
  • The myth of ones and zeroes; see e.g. Robert Gomulkiewicz casebook on software law; contrast presence of text strings (error messages, debug statements, names of external symbols) in most object code
  • If source code is the “blueprint” for the product, it’s an odd type of blueprint which:
    • (a) directly leads to creation of the building (though perhaps some BIM works this way); and
    • (b) is used as a raw material for the building, in that a substantial portion of source code ends up in the final product
  • If source code is a recipe, and the product is a “baked” good, then … (see Silvaco trade secrets case)
  • Fundamentally, what attorneys and courts often miss is the fact that large chunks of the source code end up in, or directly reflected in, most commercial software products/services; this assists reverse engineering
  • While comments are widely known not to move from source code into finished product, comments are not the only (or even most important) source of readable text in source code:
    • Functions and data usually have names which will often (not always) end up in the product
    • Menu and dialog items, error messages, etc. to be shown to the user necessarily will be in the product
    • Often debug information (logging, assertions, debug symbols) will be included in commercial products
  • “Products” which are services, and application of method & system claims
  • What about the internet, client-server code, protocols, signals, and packets?
    • Generally client sends, server receives; but “push”
    • Really not much different than non-network function calling, except different “address spaces”: remote procedure call (RPC), marshalling, de-marshalling
    • Packets may contain “instruction” acted on by code; see xxx on event-driven programming; not quite the same as “instruction” in non-network context
    • Protocols, packet formats
    • Receiving and executing code “on the fly”
    • Code/data hybrids e.g. via JSON, DWR, etc.
    • To assist reading of network-related source code, also often need run-time analysis with packet sniffer (e.g. Wireshark, Fiddler; see chapter 6) or logging (e.g. Android logcat, iOS console)
    • Even if client code is accessible via browser developer tools, generally need server code written e.g. in PHP, ASP, JSP; see chapter 9 on discovery
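
Returning to the point made earlier in this list, that names and messages from source code survive into the shipped product: the readable text inside a binary file can be located with very little code. The following is a minimal sketch of the idea behind the Unix “strings” utility, not a reimplementation of it; the function name and its parameters are hypothetical.

```c
#include <ctype.h>
#include <stddef.h>

/* Counts runs of at least min_len consecutive printable characters in
   a buffer of arbitrary bytes. */
size_t count_text_runs(const unsigned char *buf, size_t n, size_t min_len) {
    size_t runs = 0, current = 0;
    for (size_t i = 0; i < n; i++) {
        if (isprint(buf[i])) {
            current++;                 /* still inside readable text */
        } else {
            if (current >= min_len)
                runs++;                /* a long-enough run just ended */
            current = 0;
        }
    }
    if (current >= min_len)
        runs++;                        /* run that reaches the end of the buffer */
    return runs;
}
```

Given a buffer in which the text “overflow”, “sum”, and “MkDigest” is separated by non-text bytes, a minimum length of 4 finds two runs (“sum” is too short), and a minimum of 3 finds all three.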

How source code relates to the user interface (GUI) that users see

  • JavaScript (JS) calculator example: visible UI is produced by HTML, and linked to JS code
  • First show example without error checking, so reader sees how simple code eventually becomes “cruddy”, how error checking and boundary conditions come to take over from what started off as the main event
  • Show screenshots, these are like the building; code which produces this is sort-of like the blueprint
  • But the source code here also controls in real-time how the “building” interacts with its users, and blueprints don’t really do this
  • Of course, in this example, the source code really does control, because JS is an interpreted rather than a compiled language
  • But even in a compiled language like C++ or bytecode-compiled language like Java, there is a direct relationship between the source code on the one hand and the machine language which controls app behavior on the other hand
  • “View Source” in browser: why it’s usually only a small part of the source code (e.g. Google calculator)
  • Event-driven programming: ON <event> DO <operation> ; registering function pointers, “hooks”, “subscribers”
  • Reading such code is less linear than with conventional code; see xxx; even conventional code is read differently from most other text
  • Disconnected anonymous functions: implications for searching code; need to also trace code (see chapter 19); depends on run-time events & data; see also xxx on code-reading “gotchas”, e.g. C++ implicit destructors, structured exception handling (SEH)
  • Minor tweak to the calculator will even let it run code which did not exist at the time the calculator was built: JS eval() is an extreme example of run-time code, but even compiled languages allow for it (after all, even JS itself has been written in some programming language)
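
The “ON <event> DO <operation>” pattern from the list above can be sketched with registered function pointers. The chapter’s own example is a JavaScript calculator; this sketch uses C, the language of the earlier examples, and every name in it is hypothetical.

```c
#define MAX_EVENTS 8

typedef int (*handler_fn)(int);          /* a handler takes and returns an int */

static handler_fn handlers[MAX_EVENTS];  /* event id -> registered handler */

/* ON <event> DO <operation>: remember which function handles the event.
   Returns 0 on success, -1 if the event id is out of range. */
int register_handler(int event, handler_fn fn) {
    if (event < 0 || event >= MAX_EVENTS)
        return -1;
    handlers[event] = fn;
    return 0;
}

/* Called when an event arrives at run time. Returns -1 if nothing was
   registered for that event. */
int dispatch(int event, int arg) {
    if (event < 0 || event >= MAX_EVENTS || handlers[event] == 0)
        return -1;
    return handlers[event](arg);
}

/* Two operations that no other code here calls by name. */
int double_it(int x) { return x * 2; }
int negate_it(int x) { return -x; }
```

After register_handler(0, double_it), the call dispatch(0, 21) returns 42. Nothing in dispatch mentions double_it, so a text search for callers of double_it finds only the registration. Which function actually runs depends on which events arrive, and that is a run-time question.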

What types of information are available in source code?

  • If the product is the thing, then why bother looking at source code at all?; see xxx above
  • Names for modules, functions, variables, data structures, classes etc.
    • “Symbols” for addresses
    • Arbitrary (programmer silliness e.g. “folks” and “chilluns” for parent/child; programmer jargon, abbreviations, etc.)
    • Yet consistent within product (and for compatibility, often consistent with external constraints such as standards)
    • Used to identify locations within code (the “where”/location required in PICs and ICs is often better supplied using a function name than with mere line numbers in a given source-code file)
    • Important for searching code, and yielding at least initial/candidate relevant areas
  • Comments: no guarantee of accuracy but “programmer estoppel”
  • Dates: OS file system dates, version control dates, dates within source-code file
  • “Dead code”, “latent code”, etc.
  • Semi-automatic authentication; “straight from the horse’s mouth”
  • What does source code “look like”?; see xxx
  • What is typically produced during source-code discovery?; see xxx
  • Software products often use more than one programming language; e.g. PHP and C++ for server code, and combination of JavaScript, Java, and Flash for client
  • Products often employ “little languages” (regular expressions, printf formats) or ad hoc mini languages
  • Answering who/what/when/where/why/how questions with source code; see chapter 4
  • Locating infringement, anticipation, obviousness, on sale, public use, etc. in source code; again, see chapter 4
  • Some code-reading “gotchas”: e.g. function overloading, namespace, scope, registered function pointers, vtables; C++ implicit destructors; see chapter 20
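
One way code can be present in a source-code production yet absent from the product (the “dead code” item above) is conditional compilation. The following is a minimal sketch; the flag USE_FAST_SUM and the function total_range are hypothetical.

```c
/* Hypothetical build flag. It would normally be defined on the compiler
   command line (e.g. -DUSE_FAST_SUM) or in a build file, not in this
   source file. Only one of the two versions below is compiled. */
#ifdef USE_FAST_SUM
int total_range(int start, int end) {
    /* closed-form version: how many numbers, times their average */
    return (end - start + 1) * (start + end) / 2;
}
#else
int total_range(int start, int end) {
    /* loop version */
    int total = 0;
    for (int i = start; i <= end; i++)
        total += i;
    return total;
}
#endif
```

Both versions sit in the same file and both turn up in a text search, but the source file alone does not say which one shipped. That answer lies in the build settings, which may not be part of the initial production.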

What types of information are not available in source code?

  • Information on how the commercial product was built from the source code is often not considered part of the source code itself, and hence is often unavailable at least during an initial source code visit; see xxx on build files, makefiles, scripts, #ifdef conditional compilation within source-code and command lines which enable or disable compilation of portions of the code
  • Version, date, authorship information is often missing, or is unreliable, or is difficult to pin down; see xxx
  • Run-time behavior vs. static code:
    • Modern code is often a disconnected set of functions which, rather than call each other in a linear fashion, are instead registered to handle different events; such events may not be known until the program runs, and may depend on e.g. the context of network packets received from other computers; see example xxx
    • Some code which runs may only be presented to the application while it runs, and may even have not been written at the time the application was built; see example xxx
    • Examples where dynamic data (only available at run-time) plays a greater role in final result than does static code (available at compile-time); dramatic example: bifurcation diagram and p = r * p * (1-p) behavior depends largely on value of r
  • Static analysis of product vs. static analysis of source code
  • Naming and comments in code are not guaranteed to be accurate
  • Just as a name in code, which differs from the verbatim wording of a claim limitation, may still literally embody or carry out that limitation, conversely even a name in code which matches verbatim to the wording of a claim limitation is not necessarily an instance of that claim limitation; see chapter 21 on reading claims onto code
  • Elements of source code produced in discovery do not necessarily tell you what’s in the accused product: “dead code”, “latent code”, etc.
  • Source code often doesn’t “look like” what it does; the function/role of some code may be difficult to glean from the implementation in the source code (extreme example is C obfuscation contest, but even real-world code)
  • Source code produced in discovery generally does not include standard libraries, APIs, etc. used by the program
    • Sometimes a party’s own complete source code does not include details on a crucial component, because the party uses the component as a “black box” without knowing about its implementation
    • This is often fine, and generally the expert need not drill down into well-known interfaces (e.g. in C the “sqrt” function does double-precision floating-point square root and “sqrtf” does single-precision, and we usually don’t care HOW these functions do what they do, so long as their functionality is documented somewhere accessible)
    • However, sometimes knowing how a standard “black box” component is implemented may be crucial to showing presence or absence of a patent claim limitation
    • This may require third-party discovery (see chapter 9), or reverse engineering of a truly “black box” component (for which, e.g. source code is no longer available because the original vendor is out of business)
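
The point above about dynamic data outweighing static code can be sketched with the p = r * p * (1-p) formula from the bifurcation-diagram bullet. This is only a minimal illustration: the function name logisticTail and its parameters are assumptions made here, not part of any example in this book. The one line of arithmetic is identical in every run; only the run-time value of r differs.

```javascript
// Iterate p = r * p * (1 - p), discard the first `skip` values,
// then return the next `keep` values (rounded to 4 places for readability).
// logisticTail is a hypothetical helper name used only for this sketch.
function logisticTail(r, p0, skip, keep) {
  let p = p0;
  const out = [];
  for (let i = 0; i < skip + keep; i++) {
    p = r * p * (1 - p);               // the entire "static" algorithm
    if (i >= skip) out.push(Number(p.toFixed(4)));
  }
  return out;
}

console.log(logisticTail(2.0, 0.2, 1000, 4)); // settles on a single value, 0.5
console.log(logisticTail(3.2, 0.2, 1000, 4)); // alternates between two values
console.log(logisticTail(3.9, 0.2, 1000, 4)); // wanders with no obvious short pattern
```

Reading the source alone shows one multiplication formula; which of these three behaviors the product actually exhibits can only be determined from the value of r supplied when the program runs.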

How source code relates to patent claims

  • Source code and patent claims are made up of the same stuff: words; contrast other types of infringing technologies (physical devices, chemicals)
  • That source code and patent claims both employ words may lead to an unwarranted expectation that relevant source code will employ the same words, terminology, or phrases as the patent claims at issue
  • Words in code are both arbitrary and self-consistent (see xxx above)
  • Why words in code, even when not literally identical to words in patent claims, may still literally infringe (rather than only infringing under doctrine of equivalents)
  • Some software patent claims have been written directly from source code (see xxx); but usually generalized, not a “picture patent”
  • Source code is of course likely to contain programmer jargon (thunk, shim, mk, gen, etc.)
  • Using dependent claims to find potential “synonyms” for limitations in independent claims, and then locating those “synonyms” in source code
  • Source code matching a patent claim, or claim limitation, will generally be surrounded by error checking, tweaks, improvements, exception handling, handling odd “boundary” conditions, etc., such that the patent-relevant core may be somewhat obscured; this reflects the 90/10 rule (10% of the code handles 90% of the situations which arise; 90% of the code is used to handle what happens perhaps 10% of the time) [cite article sort-of claiming that 95% of code is “fluff”] ; modern code partially addresses this with structured exception handling (SEH) but this leads to what is referred to below at xxx as “disconnected code”
  • Code to do “x” may not look much like “x”; see e.g. square root, pi, and prime-number sieve examples at xxx
  • Why the presence of source code matching a claim only partially shows, and does not guarantee, that an infringer is “using” a method: “latent code”; capability
  • Claim type (apparatus/device, method, system, medium) set forth in claim preamble
  • How source code relates to apparatus vs. method vs. system vs. medium claim types; e.g. method must be used (can a method be “sold” or “imported”?), and source code alone, while showing that a method has been implemented (which matches “making” type of infringement), likely won’t show whether a method is actually invoked (though one can see whether a function will be called, if the calling code is itself invoked)
  • Claim construction & code interpretation: sometimes difficult to say whether a dispute is legal (claim construction) or factual (code interpretation)
  • How functional language in a claim (e.g. “means for” limitations), together with the corresponding structure disclosed in the specification, relates to implementation in source code
  • “Function” and “data structure” in source code, vs. “function” and “structure” in patent law; see “Some important distinctions” below
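
As one possible illustration of the bullet above that code to do “x” may not look much like “x”, here is a minimal sketch of a prime-number sieve. The function name primesUpTo is an assumption chosen for this sketch. Note that the code finds primes without ever dividing or testing divisibility; a reader searching for “divide”, “remainder”, or “is prime” would find nothing.

```javascript
// Sieve of Eratosthenes: returns all primes <= n.
// primesUpTo is a hypothetical name used only for this sketch.
function primesUpTo(n) {
  if (n < 2) return [];
  const crossedOut = new Array(n + 1).fill(false);
  const primes = [];
  for (let i = 2; i <= n; i++) {
    if (crossedOut[i]) continue;       // already marked as a multiple of something smaller
    primes.push(i);                    // never crossed out, so it is prime
    for (let j = i * i; j <= n; j += i) {
      crossedOut[j] = true;            // mark multiples by repeated addition only
    }
  }
  return primes;
}

console.log(primesUpTo(30)); // [2, 3, 5, 7, 11, 13, 17, 19, 23, 29]
```

If a hypothetical claim limitation recited “determining whether a number is divisible only by itself and one”, this code could plausibly be argued to carry out that determination even though no division appears anywhere in it; whether it does so literally would be a matter for claim construction and code interpretation.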

Some important distinctions

  • Code vs. data (but code is also just another form of data; why a patent litigation attorney might care)
  • Function vs. structure/implementation, including purposes set forth in preambles, and functional language under 112(6)
  • Don’t think that functions in code are only relevant to method claims, or that data structures in code are only relevant to apparatus/device claims
  • Static vs. dynamic (run-time) examination of code; has some relationship to apparatus vs. method claims
  • Making vs. using vs. selling
  • Sending/receiving/client/push vs. writing/reading/server/pull
  • Device/apparatus vs. method vs. system vs. medium
  • Types of claims (device, method, system, medium) & types of infringement (making, using, selling, offering, importing)
  • Types vs. instances
  • Variable (type, name, location, etc.) vs. contents of variable; e.g. difficult for non-programmers to grok “arr[x] = y” (the distinction between arr[x] as a “mailslot” and y as its contents)
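
The “arr[x] = y” distinction in the last bullet can be walked through in a few lines. This is a minimal sketch with made-up values; the variable names before and after are assumptions added here for illustration.

```javascript
const arr = [10, 20, 30]; // three "mailslots", numbered 0, 1, 2
const x = 1;              // x says WHICH slot (a location), not what is in it
const y = 99;             // y is the value to be stored (the contents)

const before = arr[x];    // reading: the current contents of slot 1
arr[x] = y;               // writing: slot 1 now holds a copy of y's value
const after = arr[x];

console.log(before, after, arr); // 20 99 [ 10, 99, 30 ]
```

After the assignment, x is still 1 and y is still 99; only the contents of the slot named by x have changed. A claim limitation directed to “storing a value at a location” would, on this reading, map to the slot arr[x] and the value y as two separate things.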

Code as a site for disputes

  • Disputing that source code accurately reflects commercial product: see bulleted list at “Source code vs. product/service: ‘Where’s the money?’” above
  • See bulleted list at “Disputing a few lines of code” above [possibly move this section there]
  • See “Turning mountains into molehills” in ch.xxx
  • The case of the goat and the cabbages (there are no cabbages; they’re not your cabbages; they weren’t eaten; they weren’t eaten by a goat; it wasn’t my goat; my goat was insane)
  • “Issue-spotting” in code
  • Disputing code interpretation is different from, though related to, claim construction: implications of factual vs. legal dispute (jury vs. judge fact finding)

Miscellaneous points about source code

  • It’s all a matter of representation: software might use the same number e.g. 38 to represent entirely different things, in different contexts in the same program: an instruction code (opcode); an ASCII character code; an address of data; an address of code (the target for a goto or function call); etc.; see primegap example below
  • Code to do “x” often does not look anything like “x”: see xxx above
  • Layers in software:
    • How far does the code examiner need to drill down?
    • E.g. given a call to a function named “sqrt,” what are the (rare) circumstances in which the examiner will need to verify that (and how) the function computes square roots?
    • Drilling-down will be more important when the function is not a well-known standard such as “sqrt” whose implementation can generally be treated as a black box
    • Layers are often associated with “crud”, overly-layered software (requiring calling down through a dozen layers to get to the code which actually “does something”): either written that way by an over-generalizing programmer, or evolved over time to take on “crud” as later programmers were afraid to touch earlier working code, and instead added new layers to it
  • Relation of code to standards (both de jure and de facto standards, “undocumented APIs”)
  • “Low-level” programming languages (e.g. C, ASM) offer more control than high-level languages (e.g. JS), but generally require more code; the high/low distinction is somewhat arbitrary (as browser becomes more important as a display engine and as a basis for apps, JS is becoming almost a new type of ASM)
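
The “matter of representation” bullet above can be sketched as follows. This is not the primegap example; it is a separate minimal illustration, and the simulated memory array and the opcode table are invented for this sketch (no real machine is implied to use 38 as an ADD instruction).

```javascript
const n = 38; // one number, four different meanings depending on context

// 1. As a quantity, used in ordinary arithmetic.
const asQuantity = n + 4;

// 2. As an ASCII character code.
const asCharacter = String.fromCharCode(n);

// 3. As an address of data (here, an index into a small simulated memory).
const memory = new Array(64).fill(0);
memory[n] = 7;
const asAddress = memory[n];

// 4. As an instruction code on a HYPOTHETICAL machine (invented mapping).
const opcodes = { 38: "ADD" };
const asOpcode = opcodes[n];

console.log(asQuantity, asCharacter, asAddress, asOpcode); // 42 & 7 ADD
```

Nothing in the number itself says which meaning applies; that comes entirely from the code that uses it, which is why a code examiner generally needs the surrounding context before asserting what a given value “is”.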

Code examples

  • Reminder of examples given above, e.g. JavaScript calculator
  • Expert’s inscrutable code reprinted in Novartis v. Ben Venue Labs, 271 F.3d 1043 (CAFC, 2001)
  • DeCSS gallery (http://www.cs.cmu.edu/~dst/DeCSS/Gallery/), including tiny working CSS descrambler
  • In contrast to the inscrutable-looking code above, Rob Pike’s one-page regular expression engine, from Oram & Wilson’s Beautiful Code; lawyers may recognize “regular expressions” from LEXIS searching
  • The “world in a line of code” phenomenon: see 10 PRINT book, bifurcation diagram in JS, etc.
  • Searching for text: an example of the software-development thought process
  • Code to manipulate patent data, including claims; including data scraping from uspto.gov to show simple web code
  • Code to index & search text (such as source code) in Visual Basic, ndx & find in awk, and mkndx & find in C; these are useful programs employed in chapter 17 and chapter 18; walk through parts of code; three different implementations of basically the same components; the C code implements the associative array taken for granted as a built-in type in the awk code, and referred to as a “Dictionary” (and thereafter taken for granted) in the VB code
  • JavaScript calculator examples, showing how code relates to GUI, and also showing event-driven nature of modern code; anonymous functions; how would examiner find anonymous functions? by tracing rather than searching
  • Low-level machine-language example: prime-number gap in pseudo machine code, with C source code for assembler and for machine to run code
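
The bullet above on event-driven code and anonymous functions can be sketched without a browser. This is not the JavaScript calculator example itself; it is a minimal stand-in, and the names handlers, trace, on, and emit are assumptions invented here. It shows why an examiner would likely find such functions by tracing at run time rather than by searching for a name.

```javascript
const handlers = {}; // event name -> list of registered functions
const trace = [];    // run-time record of which handlers actually ran
let display = "";    // stands in for a calculator's display

// Register a function to be called later, when the named event occurs.
function on(eventName, fn) {
  if (!handlers[eventName]) handlers[eventName] = [];
  handlers[eventName].push(fn);
}

// Deliver an event: call every function registered for it, logging each call.
function emit(eventName, payload) {
  for (const fn of handlers[eventName] || []) {
    trace.push(eventName + " -> " + (fn.name || "(anonymous)"));
    fn(payload);
  }
}

// Neither handler has a name that a text search of the source could find.
on("digit", function (d) { display += d; });
on("clear", () => { display = ""; });

emit("digit", "3");
emit("digit", "8");
console.log(display); // "38"
console.log(trace);   // two entries, both "digit -> (anonymous)"
```

Nothing in the source calls the digit handler directly; it runs only because an event arrived. Here the “clear” handler was registered but never ran, which is the kind of fact that the trace shows and a static reading of the code does not.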

=====

[1] [see press reports re: Amazon one-click]
[2] [true enough; Hamming’s work was in 1950; Golay’s half-page paper “Notes on Digital Coding” from 1949; and the Reed/Solomon paper from 1960; as one author puts it (http://www.beartronics.com/rscode.html), “Error correcting codes are marvelous jewels of mathematics and algorithms, providing an almost supernatural ability to recover good data from a corrupted channel.” A nice online demo, in which the user can scribble on a picture and have an ECC detect the scribble and remove it, is at http://www2.mat.dtu.dk/info/experiencing/err_corr_codes/. ]
[3] [This hypothetical example is loosely based on US 6,175,827 and 6,671,813]
[4] [controversy over how to define software patent, which makes problematic any proposals to treat these differently from non-software patents.]
[5] [IP Today article; information gleaned from RE *is* public info; note RE legality in this context]
[6] [such information might e.g. be in an “SDK” which B provides to third-party software developers, enabling them to integrate their products with B’s]
[7] [note that D not obligated to generally disprove infringement, but only P’s particular assertions; except possibly if assert non-infringement as affirmative defense?]
[8] [prior art must be publicly accessible; and source code is usually proprietary, though see chapter 3 on open source]
[9] [cases: Corley; Bernstein; etc.]
[10] [Eolas v. MS; claim construction on a patent involving source code; whether party has satisfied discovery request for source code, etc.]
[11] [Reiffin v. MS, 214 F.3d 1342 (CAFC, 2000) at xxx]
[12] [copyright; though binary code is also text/doc]
[13] [legally operative words, speech acts, Austin, Searle, “how to do things with words”; make sure reader understands why this may matter to them in patent litigation]
[14] [Assuming the source code does not contain errors of the type which prevent compilation. These “compile-time” errors are distinct from bugs, which are “run-time” errors which are problems or omissions in the code’s logic.]
[15] [re: “as-built” vs. original design]
[16] [The patent spec helpfully provides some non-limiting examples of what it means by ECC; as for initially-vague sounding “digital token”, it states e.g., “The postage value for a mail piece may be encrypted together with other data to generate a digital token. A digital token is encrypted information that authenticates and enables verification of the integrity of the information imprinted on a mail piece including postage values.” But make sure this not only about prior art; re: addressee info: “United State Postal Service eleven digit destination point delivery code (DPDC) or its equivalents as addressee information”]
[17] [if A’s complaint includes B’s “making” a patented device (not method), then even “dead code” may infringe; see latent code cases e.g. Finjan]
[18] [cite that B not burden to disprove any infringement]
[19] [e.g. how experts could disagree about what function does]
[20] [B burden to show non-infringement generally shifts here, if going to say that what B does do is non-infringing]
[21] [cite Finjan etc. latent code cases]
[22] [case where court notes that X need not be perfect, therefore buggy X can match claim for X; also buggy X is equivalent to X, in that PHOSITA would view differences as insubstantial?; Spinellis #179 on flawed code as spec for corresponding intended implementation]
[23] [goat and cabbages]
[24] [“fruition”; head of company?]
[25] [open source; JS]
[26] [even if B employs interpretive prog lang such as JavaScript, PHP, generally must show that what sent to consumers (compressed/obfuscated JS) or used on server (PHP) matches the source code.]
[27] [chicken/egg cases]
[28] [Rule 11]
[29] [note that RE is legit tool especially in this context of court requirement]
[30] [cases requiring “how” as well as “where”]
[31] [note a few cases in which didn’t need source code, or insufficient]
[32] [exceptions]
[33] [chicken & egg]
[34] [maybe cite Geoff Chappell]
[35] [courts on suitably trained; contrast open source “eyeballs”; IEEE “reverse engineering” mostly re: source code]
[36] [not always; e.g. log files in chronological order]
[37] [not always accurate; e.g. out of date]
[38] [0/1 myth; debug, assertions, etc.; RE product to prepare for source code exam]
[39] [though again, the “MkECC” name may well find its way into the product itself, along with the corresponding code]
[40] [though binary files comprising product also contain text which can be used to create searchable text; CodeClaim]
[41] [see ch. xxx; decompile Java, .NET, SWF, etc.]
[42] [original motivation for this book was a client’s rhetorical question “if you have access to their source code, what else could you possibly need?”, implying that source holds all the answers to any technical questions arising in software litigation]
[43] [“selling” code which does not run (latent code); using source code as necessary scaffolding to get to final product]
[44] [http://nctunsntrelay.googlecode.com/svn/trunk/module/wimax/phy/fec/rs/rs_code.cc ; see also http://code.google.com/p/6615return-to-dust/source/browse/trunk/RS/rs.c ; the following discussion uses these names from the code: encode_data(), LFSR, NPAR, build_codeword; must be preceded by initialize_ecc, which calls init_galois_tables, compute_genpoly]
[45] [Uniloc v. Microsoft, 632 F.3d 1292 (CAFC, 2011); was it reasonable for expert to state that MD5 performs addition?]
[46] [“How to Be a Little Gauss”, http://www.jimloy.com/algebra/gauss.htm ]
[47] [see e.g. “substituting each static control link in the applet with a thunk DLL” in US 6,275,938 or “wherein a thunk component replaces the at least one address and links the context component to the shim component” in US 7,392,527]
[48] [http://paulirish.com/2011/the-history-of-the-html5-shiv/ ]
