Open to Inspection: Using Reverse Engineering to Uncover Software Prior Art, Part 2
By Andrew Schulman
[An earlier version of this article appeared in New Matter (California State Bar IP Law Section), Fall 2011.]
See Part 1
- How reliable, particularly as to date, is evidence of prior art which is later uncovered through RE?
- If a feature of an earlier public product is only uncovered later, via RE, can the feature reasonably be characterized for 102(a) or 102(b) purposes as prior use or knowledge, on sale, or a printed publication? Must one ordinarily skilled in the art have been able to (readily?) uncover the feature at the time?
- If a feature can as a technical matter be uncovered with RE, but the product was accompanied by no-RE licensing language, is such language by itself sufficient to make the product’s internals confidential, and thus not prior art?
- Is a database of strings extracted or generated from software products an infringing derivative work, or are the fragments so used subject under the merger doctrine to at most “thin” copyright protection? At the very least, is a database of strings from code a fair use, akin to Google’s indexing or caching of text from web sites?
Getting Information from Software Products
The techniques noted in Part 1 extract some text from software products or infer text from numbers readily found inside the product. The extracted text includes error messages, including internal error messages known as “assertions” (often containing source-code fragments); debugging information left in the product; names of APIs (application programming interfaces) and protocols used by the product; pathnames of source-code files used to build the product; and embedded scripts. “Generated” text (text imputed to the product, based on other text or on numeric data in the product) includes names of services, protocols, and algorithms.
These techniques rely on extremely-simple algorithms such as “strings,” and might not even be considered as RE by many programmers, nor fit within either the standard legal definition of RE or an “end-user license agreement” no-RE clause. Even some fervent opponents of software RE have restricted their opposition to the more sophisticated techniques of “disassembly” or “decompilation” of code.
Decompilation attempts reconstruction of the original source code, and is feasible for virtual-machine languages such as Java, Microsoft .NET code, and Flash ActionScript (found in the SWF files ubiquitous on the web). Even then, decompilation can no more return to the original source code than one could retrieve a cow from a collection of hamburgers. When source code is “compiled” to form a product delivered to customers, much of the source code – particularly comments made by programmers in the source code – is unused.
With “native” machine code, to run directly for example on an Intel processor, disassembly rather than decompilation is employed. Disassembly does not attempt to recover the original source code, but instead displays machine code in a more readable form, and labels areas of the code based on cross-references found within the code. While neither disassembly nor even decompilation can retrieve the original source code from a software product, these techniques nonetheless can yield information relevant to a prior-art search.
In a patent infringement case, an expert may be asked to disassemble or decompile code, particularly in advance of the opponent’s production of source code in discovery. Disassembly and decompilation typically yield too much material. In one case a medium-size Flash SWF file used on a web site was found to be built from almost 400 ActionScript source-code files, totaling over 50,000 lines of code. The expert’s analysis can be partially automated by reducing the file into a set of searchable “signatures,” corresponding to each subroutine in code. Such signatures can also be based on unusual code sequences or on higher-level machine instructions whose names may be indicative of the elements or steps of an invention.
Dynamic Reverse Engineering
In the techniques described above and in Part 1, a software product is treated as a text to be examined without running the product, often without even installing it. One is reading a text, perhaps with some translation into a more readily-understandable language. These various ways of reading another’s software product such as strings, metadata extraction, and disassembly are collectively referred to as static reverse engineering.
Another set of techniques, dynamic reverse engineering, involves running the software program of interest, under the control of a second program, which tests or tracks the first product’s behavior. Dynamic RE tools include debuggers, network monitors (packet sniffers), file monitors, and API trace utilities. Dynamic RE appears particularly appropriate for method/process claims, but also has application to apparatus claims.
Many public web sites – whose front-end code is visible in the browser’s “View Source” window, explored with tools available for web developers, or (in the case of Flash SWF, .NET, or Java programs) decompilable – have large server back ends (e.g., ASPs), even whose object code is inaccessible, at least until discovery. The traffic between the client and the server can be monitored with a tool such as Fiddler or Wireshark, and various “black box” techniques can help draw inferences about server code. Note, however, if the code is outside public view, it is likely not prior art in the first place. Whereas object code is a “non-informing public use” (see below), server code is closer to a secret use.
Copyright Implications of Dynamic RE
From a copyright perspective, dynamic RE is sometimes viewed more favorably than static RE, because dynamic RE is merely observing what a product does, rather than looking inside (and in the course of looking, possibly temporarily copying bits of) the product. But because dynamic RE requires installing and running a software product, it is far more likely than static RE to require the motions of assent to a “clickwrap” license.
Dynamic RE carried out in a later period also presents the problem of reliably reconstructing the software environment, including network components, as it existed during the prior-art period. On the whole, static RE is better suited to uncovering prior art than is dynamic RE, which however is well suited to investigating claims of patent infringement.
Comparing Strings in Code with Limitations in Claims
Once searchable text is extracted or generated from software products and placed into a prior-art database, the elements or steps of patent claims can then be checked against the text, much as would text from any document. The software prior-art database will likely contain few complete sentences, and many small text fragments. Patent searching generally involves combinations of words or short phrases, corresponding to the elements or steps of a patent claim, together with likely synonyms. Such initial searches to produce “leads” for closer inspection are not a radical departure from what patent searchers already do.
As an example, take a patent claim with the following limitations: media channel, predefined episodes, subscribing, downloading updates, checking available space, and deleting earlier episodes; the patent claims priority to applications filed in Nov. 2003.
A search was conducted across a limited set of Microsoft products for combinations of text including the following terms: episode, channel, subscribe, update, and download. Results were found in a Windows 98 program named tvx.exe and in two modules it references, named commmstv.dll and msepg.ocx. These 1998 files refer to “Microsoft TV Services” and “WebTV for Windows”. The tvx.exe file contains embedded SQL source code for manipulating a SQL database of television data. Further searching among Microsoft products, including a 1998 Microsoft “Platform SDK” (software development kit), yielded publically-available source-code files related to a TV project from the early 1990s named “Microsoft Guide”, an EPG (Electronic Program Guide) with support for episodes, channels, subscriptions, and so on.
The patent in question appears not to refer to Microsoft explicitly, nor to cite patents assigned to Microsoft, so this may be a fruitful area for closer inspection. While hardly conclusive, this small example illustrates the basic approach of searching through product code to find potential prior art.
Even having found all the elements or steps together in a single prior-art reference, the searcher must determine if they fit together in an appropriate way. A patent claim is not a parts list or aggregation of components. For example, a relational database with a built-in debugger does not anticipate a claim to a debugging system with a relational-database back-end.
The Software Lexicon
How likely is it that software products contain patent-relevant strings, so that patent claims can be used to guide a search of text extracted from software products?
Claims may be broader than what an inventor has implemented, reflecting the (under-acknowledged but genuine) creative role of patent attorneys in uncovering the actual invention in its broadest sense, or narrower, so as to avoid prior art. Searching non-patent literature (NPL) often requires fleshing out claims with concrete examples, preferably proper nouns, from dependent claims or the patent specification. Method claims in particular tend to be more abstract, and more difficult to line up with NPL.
The language used by programmers often differs from that contained in software patents. Where a patent might refer to a copying a command or a packet, text extracted from a product may refer to cpy, cmd, or pkt. Programmers may use jocular terms, such as kill, crash, barf, and magic, whereas a patent might refer to deletion, failure, error message, and opaque numeric value. Code has even more misspellings (especially dyslexias) than patent documents.
At the same time, software patents include the same odd terminology as software itself. Terms resembling ordinary words but with a special software meaning, such as thread, spool, hash, salt, and thunk, appear in software patents as well as in code, with (subject to claim construction) the same meaning. The titles of even entire classes of patents include terms programmers use, including VxD, Markov, and prefetch.
Patent specifications often contain code fragments useful to narrow searches for what might otherwise be mind-numbingly broad claim terminology. For example, while a step involving “an intercept software component technology” was difficult to align with what appears in code, the patent specification referred to “vnodeops”, a code reference associated with intercepting file operations.
Between the ability of a patentee to be “their own lexicographer,” the preference for intrinsic over extrinsic (e.g. dictionary) evidence of claim meaning, the inability of language to fully capture an invention’s boundaries (Festo), and a tendency of computer programmers to use non-standard naming in their code, any software prior-art searching system requires an extensive lexicon, including synonyms, abbreviations, and class context. Frequently-occurring terms such as ROM, CPU, and DVD, are discounted.
Accordingly, the software lexicon presents difficult problems, but given the importance of software, these problems are worth tackling.
How Many References?
Use of software as prior art presents an issue distinct from printed documents: what constitutes a single reference in the software context? It is black letter law that “A claim is anticipated only if each and every element as set forth in the claim is found, either expressly or inherently described, in a single prior art reference.” How does one determine the boundaries of a single reference in a software product?
A single software product such as Windows may include hundreds or even thousands of different modules, each a separate file on the disk. Is everything in Windows part of a single reference? This does not sound right, at least when applied to parts of Windows not typically used together. But neither should each module be regarded as a separate reference, such that, for anticipation, every limitation of a patent claim must be found within a single module. The better view is that a single software reference includes a given module, plus those the module relies upon. A single binary/object file, as delivered to customers, or put into use in a process/service, is a sensible division-point for a software “reference” — arguably more so than a single source-file code.
If it proves overly complicated to form a single prior-art reference from what are arguably different pieces of code – albeit comprising a single product, or used together to bring about a single result – the analysis can shift from § 102 anticipation (where a single reference is required for lack of novelty) to § 103 obviousness (where multiple references may be combined to demonstrate the requisite motivation and thus obviousness). The POSITA (person of ordinary skill in the art) may have implicit motivation to combine references, particularly ones already having a real-world interrelation.
Dating and Reliability of Software References
Whenever code is used in litigation, it is crucial to accurately date the code. File date/timestamps found on manufactured CD-ROMs present few problems; likewise for file date/timestamps found on non-writeable floppy diskettes, or when a original-shrinkwrap copy of the software is producable. However, vendor-provided dates can be changed, and even some vendor-provided dates are incorrect. Thus, a software prior-art database will collect as many dates as possible for each code file: creation date; date of last access and of last write; any © dates embedded in the file; compiler/linker dates built into the file; and dates on the physical media and packaging.
Reviews and advertisements in trade journals may relate back to a product date. For more recent software, online resources can provide the date of a code file’s first appearance “in the wild.”
When considered as a printed publication, software prior art is an electronic publication. Greater concern over the reliability of electronic publications appears in Europe than in the US, but even the EPO views dating rather than contents as the primary reliability concern. Public accessibility of material posted on the internet is a separate concern.
Finding Old Code
Existing software-related prior art NPL (non-patent literature) libraries are based, not on the software products themselves, but on descriptions of software. Further, this descriptive material is predominantly sourced from academic rather than trade publications, and from abstracts rather than full text. To the extent that trade publications have been used, the source code that often accompanies articles in publications such as the Microsoft Systems Journal, Dr. Dobb’s Journal, or MacTech, is missing, as generally are the all-important advertisements. There are several useful “defensive publication” archives, including one specially focused on software prior art, but again, these rely on printed publications several steps removed from the software itself.
Given the incentive to smoke out prior art, where can litigants find older software products?
Several code preservation societies and computer museums have large collections of software products. But with one notable exception, these generally are indexed at best by product name, not by filenames within the product, much less by the textual content of each file. General libraries, whose catalogs have in turn been cataloged at WorldCat, surprisingly often have software titles, though presumably not for rental. At one point, the Library of Congress had a Machine Readable Collections Reading Room, which was closed.
Products which were sold to the public can often be found on eBay, in the backs of old computer books available on Amazon.com, and in stores such as Weird Stuff Warehouse in Sunnyvale, CA. The basements and closets of experts working on a case are natural sources of relevant prior art, with caveats regarding dating, reliability, and chain of custody.
In addition to its “Wayback Machine” (a source of dated object code if one already knows the filename of interest), the Internet Archive has a “Software Archive” project, the impetus for the software-archiving exemption to the DMCA anti-circumvention clause. Unfortunately, while the archive itself apparently contains a large amount of older Apple, Atari, and Commodore products, for now the public collection is dominated by the TUCOWS and CD BBS archives, already available elsewhere.
One issue faced by software archivists is uncertainty as to what this material is good for; in some cases, old code is preserved solely so that old data can still be accessed. These archives have not been designed with patent prior-art searching in mind; indexing does not include sufficient detail beyond product name.
The US government’s National Software Reference Library (NSRL), part of the National Institute for Science and Technology (NIST), is a collection of approximately 21,000 different software products from about 1,400 vendors, whose contents have been processed to obtain digital signatures uniquely identifying the files in the software packages. NIST distributes these signatures largely for use by law-enforcement investigators as indicators of what to ignore when inspecting computer systems seized in criminal cases. These lists of filenames associated with products may help in some prior-art situations, but the NSRL’s goal of filtering out commercial code is almost the exact opposite of what software prior-art searchers need.
If Nobody Saw it at the Time, Is it Still “Prior” Art? (Answer: Of Course It Is; the Question is Instead Whether the PHOSITA Could Have Seen It at the Time, Possibly via Reverse Engineering)
Vendor A puts feature X in its product P, sold to the public, but only some employees of A know about X. This occurs at time T1. Some time later, at T2, vendor B receives a patent on X, and B sues C for patent infringement. Now C, at time T3, hires an expert, who reverse engineers P and uncovers X. Is P prior art which invalidates B’s patent? In other words, if a feature of an earlier public product is only uncovered later, via RE, can the feature reasonably be characterized as prior art which may invalidate a later patent?
It is black-letter law that prior art is public; a drawing made on the underside of a tablecloth in someone’s kitchen, unknown at the time to the tractor-pulling art, is not prior art. As befits black-letter law, there are important exceptions of “secret prior art.” 
Yet it happens all the time: at T3, C uncovers a remarkably obscure document dating from time T1, and this is prior art, so long as it’s not an internal memo of B (or of A, for that matter).
Between a secret on the one hand, and ready availability on the other, there are only degrees of convenience or obscurity. Prior art status depends, not on whether the public actually accessed a reference at the time, but whether the relevant public could have. The prior art need not have been inspected, so long as it was open to inspection.
Thus, the question becomes, if a feature is only uncovered later using RE, could it have been uncovered earlier? Must one ordinarily skilled in the art have been able to (readily?) uncover the feature at the time? Alternatively, if no one saw it at the time, is that perhaps suggestive, circumstantial evidence that it couldn’t be seen at the time?
Note that this question need not be asked of the patentee’s own pre-patent code, because even secret use by the inventor is subject to the 102(b) on-sale bar.
Further, the question becomes irrelevant, at least under a 102(g) concealment analysis, if a third party’s earlier inventive activity “benefitted” the public. For example, a golf ball whose covering is made of a secret material is prior art, even if the secret to the golf-ball covering only becomes public afterwards, because the golf ball itself (and certainly its exterior) is public. In contrast, a hidden process used to produce a public product – such as a machine to make quilts, where the quilts are sold, but the machine is not, where the inventive workings of the machine cannot be discerned from the quilts — is not prior art.
There is an important distinction between a hidden inventive process to produce a public product on the one hand (not prior art), and a public product with an invention hidden inside it on the other (prior art). In the quilting case, the public can enjoy the fruits or output of the invention, but has no ability to access the invention itself, because the invention and its output are separable, and the invention can’t be discerned from the output. In the golf ball case, the invention itself is what’s in public, and the public merely failed to observe at the time.
Where do software products fit here? Some are like the secret quilting machine used to produce public quilts. In particular, in client/server products, code runs on the server, with only its output delivered to clients. Even object code on the server is inaccessible, except to the extent that clients can understand the server code via dynamic RE (see above). Software held back from the public, whether in source or object form, is generally not prior art.
Much object-code software, however, is disseminated to the public, and what the public could learn through RE is prior art, just as the “secret ingredient” of a golf ball is prior art, if the golf ball itself is in sufficiently plain view. As noted in the golf-ball case, “even though there may be no explicit disclosure of the inventive concept, when the article itself is freely accessible to the public at large, it is fair to presume that its secret will be uncovered by potential competitors long before the time when a patent would have expired”.
This area comprising the undocumented portions of public products is known, in the 102(g) context, as “non-informing public use.” The concept dates back to the 19th century cases when, for instance, an arrangement of bolts in a safe was held to be prior art, even though “[t]hey were, it is true, hidden from view after the safes were completed, and it required a destruction of the safe to bring them into view. But this was no concealment of them or use of them in secret. They had no more concealment than was inseparable from any legitimate use of them.”
But is it really correct that a well-hidden component of a public product, perhaps requiring destructive testing of the product, is not a secret but merely a non-informing public use, at least until the secret is actually found out? Having noted that secrets are generally not prior art, aren’t the internals of even very public software products, such as Microsoft Windows, regarded as trade secrets? If vendor C learns of the secret at time T3, wasn’t it still a trade secret at T1? Isn’t it a secret until someone actually learns it, not merely when they could have conceivably done so?
A secret, after all, is never more than a matter of delay or timing. That X was uncovered, even easily, at time T3, does not make it non-secret at T1. The Epstein case teaches that the date for software prior art can relate back from a later abstract, so perhaps the same relation-back applies to some inherent feature of the code, merely waiting to be examined. But the abstracts in Epstein may relate to more superficial aspects of a product than what one learns from even the simplest RE. How would learning a secret at T3 relate back to a finding of non-secrecy at T1?
This question — whether learning a secret at time T3 “relates back” to a finding of non-secrecy at time T1 — implicates much larger questions: trade secret/patent interaction, the prior-user defense, and the basic disclosure vs. invention policy conflict, in which the patent system should reward one who discloses an invention, over one who keeps it secret, but that patents should be granted for inventions, not what has already been done.
Trade-secret law provides a simple answer. Most enactments of the Uniform Trade Secrets Act (UTSA) include as part of the definition of a trade secret that the information’s value derives from “not being readily ascertainable by proper means” (and RE is generally seen as a proper means). Information’s status as a trade secret depends, not on whether it has been actually ascertained, but on whether it was readily ascertainable, by proper means.
So, the question becomes whether information ascertained at T3 could “readily” have been ascertained earlier, at T1. Could a POSITA have been able, before the invention or critical date, to readily uncover the information, without (to analogize from a somewhat different area) undue experimentation and fresh inventive activity? That depends:
- Strings found verbatim in object code at T3 were inherently present at T1, plainly visible if anyone had decided to view the object code in a word processor. Readily ascertainable.
- Text from external documentation and imputed to object code, based on strings or numbers found in the code, are readily ascertainable if the documentation existed at the time, and the POSITA would have known to look there. Given the possibility of false positives (e.g., somehow the numbers 243FF6A88 and 85A308D3 are present in the code for reasons unrelated to the Blowfish algorithm), there is room for debate on inherency.
- Information available at T3 through disassembly or decompilation, if the disassembler or decompiler were available at T1, is readily ascertainable at T1, at least so long as understanding the listings is within the POSITA’s skills, which it may not always be.
- Information from dynamic RE, such as a network monitor, is unlikely to become an issue in the first place because of the difficulty of reliably reconstructing, at time T3, the T1 computing environment. But if an expert at T3 can show that he has reproduced the T1 environment, then almost by definition, the POSITA at T1 would have been able to do so.
At some point, it is sufficiently difficult to glean information through RE at T3, indicating it was undetectable at T1. If the requisite RE could not have been achieved at the time (including for legal reasons; see next section), the technology is effectively the same as a secret machine locked away from public view. Information only gleaned through heroic RE efforts, for example where the time and cost to RE exceeds a trade secret owner’s reasonable expense to prevent RE, is not “readily ascertainable.” Some RE is sufficiently creative, uncovering features of a product unknown even to the vendor, that it might be viewed as making a fresh inventive discovery rather than merely finding prior art. Even here, there is no requirement that prior art be known to its owner.
If prior-art software is being characterized as a printed publication – which is quite reasonable given the prior art status of electronic publications — then it must at the time have enabled the POSITA to practice the invention, which a working product in the market would presumably do.
Further, to be prior art, a reference must have been available or accessible, not only physically, but logically. In the college theses cases, it was insufficient that a thesis reside on a library shelf; the reference must be cataloged or indexed so that the POSITA at the time could have found it – not merely have found it if he already knew with hindsight exactly where to look, but actually found it if he didn’t already know its location. At the same time, only the title need have been cataloged, not the full text contents. The analogy to software is that a competitor, knowing of a public product, would have been able to examine the object code.
To summarize, something normally hidden may nonetheless be public. Of course; every student in an IP survey course knows this from the 19th century corset case, in which even part of an undergarment was in public use because it outside the inventor’s control.
Is “Shrinkwrapped” Code Really Out in Public?
Software products generally come with “shrinkwrap license agreement” or end-user license agreement (EULA) language restricting reverse engineering. Because software is a form of text, restricting RE in effect restricts reading the text, at least too closely. Indeed, some vendors recognize this, and not only attempt to restrict the usual triumvirate of “reverse engineering, disassembly, and decompilation,” but assert the user will not “analyze” the product.
How do these no-RE clauses relate to the previous section’s conclusion that technology uncovered later through RE can constitute prior art? The immediate concern is not whether such language prohibits reverse engineering for the purpose of uncovering prior art (see next section). Rather, the concern goes back to cases such as Gillman v. Stern, where Judge Learned Hand noted how the earlier machine “was always kept as strictly secret as was possible, consistently with its exploitation. In general, everybody was carefully kept out of Haas’ shop where the four machines were used.” Similarly in a well-known case on oil prospecting as prior art, the court emphasized the lack of any deliberate concealment. The question is whether a software vendor’s placement of “you may not analyze” language on the product is sufficiently deliberate action to make the product’s internals secret, such that the product’s internals are not prior art.
With one exception, such language is widely regarded as unenforceable, often even carrying indicia of its possible unenforceability: “you may not analyze, reverse engineer, decompile, disassemble the Software or seek to obtain a source code of the Software in any way, except for the scope stipulated by applicable law.” As for the applicable law, as noted earlier, California’s Uniform Trade Secrets Act explicitly states reverse engineering by itself is not improper means of learning a trade secret. The important exception is non-public software, such as a limited “beta” test for which the user must provide a signature. Even here, simple methods such as string extraction may not even constitute “reverse engineering” as the license language defines this term.
But does the general non-enforceability of such clauses really dispose of the question whether the mere presence of the clause renders information hidden inside object code a secret, at least until someone decides to ignore the clause? Doesn’t the presence of a no-RE clause make information buried in otherwise-public software that much less accessible or available to the POSITA, akin to an internal document, which is not a printed publication “no matter how many copies are distributed”?
However, while a company’s proprietary source code is an internal document, information buried inside object code available in the marketplace, with no-RE shrinkwrap, is something entirely different: a widely-available public document which is marked “Confidential.” If it can be had by anyone for the asking, by paying some amount, from whom is it being kept confidential?
Akin to Classified Technology?
The software scenario – activity designated as confidential but carried out in the marketplace – can be analyzed under cases relating to the prior-art status of government-classified technology.
But second, the holder himself must do something to keep the material confidential. This is akin to reasonable security precautions (RSP) in trade-secret law. For example, documents in a library on a military base, with security and subject to the librarian’s approval, were nonetheless printed publications because they were distributed to commercial companies and private individuals without restriction. In general, information is publicly accessible if “interested members of the relevant public could obtain [it] if they wanted to,” even for a fee.
A recent application to software is Ex parte ePlus, in which the BPAI considered ePlus’s argument that a user manual was proprietary: “Most software and the manuals that come with such software would contain restrictions on copying and further distribution, but that would not rise to the level of those items being considered confidential disclosures … [which] would require restrictions on who could purchase or otherwise obtain the specific documentation… . The requirement of purchasing software to obtain a manual goes to its cost and not to its accessibility.”
The BPAI distinguished IMX v. Lendingtree, in which the user manual for mortgage-related software was found not to be a printed publication, because in IMX there was no independent evidence that the manual (stamped “Highly Confidential”) was publicly accessible, whereas in ePlus the examiner “clearly show[ed] that the software packages had been sold”.
An analogy may be made to the experimental use exception to public use under patent (or trade secret) law, in which the inventor’s control over the invention is the key factor in determining whether use of the invention in public is truly experimental, or a public use. Just as a statement of experimentation, without control, would not cancel public use, likewise a statement of confidentiality, without a restriction on dissemination, does not create confidentiality. Secrecy has an objective as well as a subjective component, and a vendor cannot cast a spell of secrecy over object code present on perhaps millions of machines (including open-access internet cafes) by mere ipse dixit or by a no-RE clause in a EULA (End User License Agreement).
Collecting Prior Art as Copyright Violation?
Another potential concern is whether a database built from text extracted from software products would constitute a copyright-infringing derivative work.
First, the database contains extracts from code, not the entire code; it is not as a collection of discrete pieces of software, but a concordance of words and phrases, linked back to information on the file in which the word or phrase is found. In many cases, the same string is located across files from multiple vendors; in the case of some vendors, few of their strings will be unique.
Because the database contains fragments from code, most of these fragments served some functional purpose in the product from which they were extracted, and would thus be analyzed under the copyright merger doctrine. In Lotus v. Borland, even an extensive set of 469 strings comprising a menu hierarchy was found to be functional, akin to the labels on VCR buttons. The prior-art software database discussed here is, in large part, collecting such labels, such as names of API calls, error messages, debug text, and menu and dialog items. Copyright protection would be “thin,” largely confined to non-constrained portions of the text, such as unused “dead code.”
Alternatively, analysis would follow fair use in library preservation, and of caching by search engines such as Google. Given that the database’s purpose is patent related, and can also be used to find software copyright infringement and open-source violations, litigation privilege would be a further defense.
A Database of Prior-Art Software
The author has used preliminary versions of this database of prior-art software in several cases, and a group of software engineers including the author is building a more extensive database. The code is out there; it’s prior art; it can be indexed and searched.
 In addition to examples given in part 1, note the similarity between older and recent concerns, e.g. in 1995 “Useful examples of prior are are most likely buried within application programs or system software, and known only to the authors in any detail” (Bernard Galler, Software and Intellectual Property Protection 32 (1995)) and in a 2009 online discussion of i4i’s metacode-mapping patent (US Patent 5,787,449, recently the subject of i4i v. Microsoft, 564 US ___ (2011)), “it can be impossible to do prior art search for software. There were numerous word processing systems in the 70s and 80s that took [this] … the companies that created these systems are gone and none of the systems [are] running. There was no Web so the only articles are in old magazines. Companies didn’t publish the details of their implementations…” (comment at http://broadcast.oreilly.com, Aug. 15, 2009).
 As discussed in detail later, while a third party’s truly confidential or secret (as opposed to merely hidden or obscure) technology is not prior art, the patentee’s own secret activities may raise the on-sale bar.
 An API is a service provided by one piece of software and used by another. APIs generally have names, or numbers for which there are well-known names. These APIs are typically documented by the vendor, but there is also a large category of “undocumented” APIs. In Microsoft’s case, some of these were the subject of an antitrust consent decree. See part 1, and Geoff Chappell, Missing Settlement Functions, at http://www.geoffchappell.com/viewer.htm?doc=studies/windows/shell/missing.htm.
 The “strings” algorithm counts consecutive sequences of printable characters; when the end is found of a sequence longer than some minimum such as 4 or 5 characters, the sequence is printed, the count reset to zero, and the search for printable characters resumed.
 “…reverse engineering, that is, starting with the known product and working backward to divine the process which aided in its development or manufacture.” Kewanee Oil v. Bicron, 416 US 470, 476 (1974), citing a 1902-3 case, though the earliest judicial use of the term “reverse engineering” appears to be Koehring v. Etnyre & Co., 254 F. Supp. 334, 339 (N.D. Ill. 1966); see also Thompson Ramo Woolridge Inc. v. US, 175 Ct. Cl. 527, 532 (Ct. Cl. 1966).
 Jonatahan Band and Masanobu Katoh, Interfaces On Trial 2.0, 138-9 (2011; available online at http://mitpress.mit.edu/band) (IBM stating in 1990 that, among RE techniques, only decompilation is “regarded as objectionable”). IBM had claimed these techniques assist software pirates, before itself feeling compelled to reverse engineer Microsoft Windows. Ibid, 56 (IBM reverse engineering Windows), 138-139 (earlier IBM demonstration of decompilation facilitating piracy).
 E.g. Clive Turvey, Win32 disassembler in dumppe at http://www.tbcnet.com/~clive/vcomwinp.html#DUMPPE; IDA Pro disassembler at http://www.hex-rays.com/idapro/; SWF Flash decompiler at http://www.sothink.com/product/flashdecompiler/; and .NET decompiler at http://www.reflector.net/.
 This point is emphasized in a classic article, Andrew Johnson-Laird, Software Reverse Engineering in the Real World, 19 U. Dayton L. Rev. 843 (1994); today, the point would perhaps rather be how much information (though not the original source code) can be gleaned from widely-available software products.
 Andrew Schulman, Finding Binary Clones with Opstrings and Function Digests, Dr. Dobb’s Journal, July 2005 (Part I), August 2005 (Part II), and Sept. 2005 (Part III).
 Andrew Schulman, Notes on Similarities between QTVHDW.DLL (Apple QuickTime for Windows) and DCISVGA.DRV (Microsoft/Intel Video for Windows, Feb. 22, 1995, at http://www.sonic.net/~undoc/apple_ms.txt (sequence of unusual API calls used to show identicality between code from two different vendors).
 E.g. Fiddler web debugger at http://www.fiddler2.com; Wireshark (formerly Ethereal) at http://www.wireshark.org/; and Mark Russinovich and Bryce Cogswell’s SysInternals monitoring tools, including RegMon, FileMon, PortMon, and ProcMon, now distributed by Microsoft at http://technet.microsoft.com/en-us/sysinternals.
 But see the SABRE airline reservation system in Lockwood v. American Airlines, 107 F.3d 1563 (CAFC, 1997).
 Andrew Schulman, Hiding in Plain Sight: Using Reverse Engineering to Uncover Patent Infringement, Intellectual Property Today (Nov. 2010).
 See Gregory Stobbs, Software Patents (2010 revision) § 5.10 (Software Prior Art Searching)
 This example was selected, more or less at random, from US Patent 7,568,213, which has been reported as a “patent for podcasting”; see http://w2.eff.org/patent/wanted/volomedia/EFF_volomedia_prior_art.pdf.
 See Robert Faber, Mechanics of Patent Claim Drafting (2010 revision) § 3.22 (Tying the Elements Together); NetMoney v. Verisign, 545 F.3d 1359 (CAFC, 2008); Therasense v. Becton Dickinson, 593 F.3d 1289 (CAFC, 2010).
 US Patent 7,117,483
 Ronald Slusky, Invention and Analysis and Claiming: A Patent Lawyer’s Guide (2007).
 A uspto.gov search for ACLM/thunk returns 25 hits.
 US Patent 6,606,685
 See Interactive Gift v. Compuserve, 231 F.3d 859 (CAFC, 2000), in which the term “point of sale” (POS) location in US Patent 4,528,643 referred, not merely to cash registers and the like, but even to a home computer. The venerable “own lexicographer” phrase appears e.g. in Chicago Steel Foundry v. Burnside Steel Foundry, 132 F.2d 812 (7th Cir., 1943) (“and we add, his own grammarian”).
 Vitronics v. Conceptronic, 90 F.3d 1576 (CAFC, 1996).
 Verdegaal Bros. v. Union Oil Co. of California, 814 F.2d 628 (Fed. Cir. 1987), cited in MPEP § 2131.
 Though some software vendors claim their products are “tightly coupled” or “integrated,” which if true (or by estoppel) would suggest all modules constitute a single reference.
 Of the thousands of items found in the “kitchen sink” that is Windows, few are used all the time, and the others tend to form sub-groups of modules used together; this was the basis for a Microsoft effort known as “MinWin” (minimal Windows) and for Windows Embedded; other software exhibits similar locality of reference.
 But MPEP § 608.1(p) expressly states, of incorporation of reference by a patent, “An incorporation by reference by hyperlink or other form of browser executable code is not permitted.” Trying to find a single reference from multiple interrelated modules also raises the issue of whether module reliance is determined statically or dynamically; static reliance (i.e., a “hard” link from A to B, found in a module import/export table) can yield a bright-line determination, whereas dynamic reliance (in some amount of testing e.g. with a “dependency walker,” it was found that A invokes B) depends on the particular configurations tested. These issues resemble those in the determination of which modules Microsoft was obligated to document under its consent decrees with US and European antitrust/competition authorities.
 KSR v. Teleflex, 550 US 398 (2007).
 Older MS-DOS floppy-disk software is often dated Jan. 1, 1980, which is default DOS filedate, but which also predates MS-DOS 1.0 by two years.
 In re Epstein, 32 F.3d 1559 (CAFC, 1994) (software products in public use or on sale, more than one year before filing date, based on descriptions of products in abstracts appearing after critical date).
 See MPEP § 2128 (Electronic Publications as Prior Art); EPO, T 1134/06 (Internet Citations), 16 Jan. 2007 (lengthy discussion of prior-art status of http://web.archive.org/web/20000620174023/http://www.boersenspiel.de); David Rogers, Documents on the Internet as Prior Art, 2 J. IP Law & Practice 354 (2007).
 SRI v. Internet Security Systems, 85 USPQ.2d 1489 (CAFC, 2008) (paper posted to un-secured, but also un-indexed, FTP site not prior art).
 However, there are searchable archives of open source code, including Black Duck’s http://www.koders.com (3.4 billion lines of open source code; now 21 billion lines at http://code.ohloh.net); http://codase.com (250 million lines of code); the now-defunct Google Code Search; and the Netlib/GAMS (Guide to Available Mathematical Software) repository at http://www.netlib.org/bib/gams.html and http://gams.nist.gov/.
 As noted in Stobbs, Software Patents § 5.09, CD-ROMs of such material is available, including the Microsoft Developer Network (MSDN; http://msdn.microsoft.com/en-us/subscriptions), the Dr. Dobb’s Developer Library DVD (http://store.ddj.com), and the MacTech DVD (http://www.mactech.com/cd/). However, these CDs and DVDs as shipped are generally not immediately searchable, as contents are often compressed and archived.
 Software Patent Institute at http://spi.org. Besides defensive publications, SPI’s collection also includes “computer manuals, older textbooks, journal articles, conference proceedings, computer science theses, and other such materials which may contain valuable prior art.” For reliability concerns about such material see E. Robert Yoches and Terry Callaghan, The Next Battle: New Forms of Software Prior Art, 2 U. Balt. Intell. Prop. L.J. 115 (1994), but the article predates In re Klopfenstein, 380 F.3d 1345 (CAFC, 2004) (slides mounted on poster boards for 2.5 days at conference are prior art, even though no copies disseminated).
 See Doron Swade, Preserving Software in an Object-Oriented Culture, in Edward Higgs ed., History of Electronic Artefacts 195-206 (1998), Software Preservation Group (http://www.softwarepreservation.org/); Computer History Museum, Mountain View, CA (e.g. dBase II software for Osborne 1 cataloged at http://www.computerhistory.org/collections/accession/102626801); software collection of Munich Computer Museum (http://www.computermuseum-muenchen.de/software.html); Charles Babbage Institute (http://www.cbi.umn.edu); and product-specific archives such as Dan Bricklin’s Visicalc site (http://www.bricklin.com/visicalc.htm).
 A notable exception is the superb PDP-10 software archive at http://pdp-10.trailing-edge.com/ in which files have been extracted from tape images, each file given its own web page, and contents even of executable files included on the page, enabling a Google search for strings (e.g. “Complex terms used in a relational other than EQ or NE”), leading to e.g. an old FORTRAN compiler containing that string (http://pdp-10.trailing-edge.com/k20v7b/01/field-image/fortra.exe.html). Another is Frank van Gilluwe’s FaultWire File Analysis Lookup at http://www.faultwire.com/solutions/file_solutions.php, indexed by filename, e.g. http://www.faultwire.com/file_detail/shimgvw.dll*1356.html, and containing some executable strings (e.g. “GDI+ file thumbnail extractor”).
 E.g., code at issue in Judge Jackson’s Findings of Fact in US v. Microsoft, 84 F.Supp.2d 9 (DDC, 1999) ¶¶ 90-92 (“Withholding Crucial Technical Information”) (Nov. 1999) can be found, with Aug. 1995 file date/timestamp, tucked away inside a ZIP archive file at http://replay.waybackmachine.org/19990209175113/http://www.microsoft.com/WIN32DEV/APIEXT/ RAS1244B.ZIP. Also see discussion of the Wayback Machine in EPO, T 1134/06 (Internet Citations), 16 Jan. 2007.
 See Internet Archive Gets DMCA Exemption to Help Archive Vintage Software, http://www.archive.org/about/dmca.php; Software Archive at http://www.archive.org/details/software; description of Classic Software Preservation Project (CLASP) and copyright issues at http://web.archive.org/web/20100722072350/http://www.archive.org/details/clasp; Sam Williams, Prowling the Ruins of Ancient Software, Salon, July 30, 2003, at http://www.salon.com/technology/feature/2003/07/30/software_archaeology.
 See http://www.archive.org/details/cdbbsarchive and http://www.archive.org/details/tucows; an especially useful archive of material from CDs of material from older computer “bulletin board” systems (BBS) is http://cd.textfiles.com/ (“The Past on Plastic”; “Who knew that the companies looking for a quick buck through the late 1980’s and early 1990’s with ‘Shovelware’ CDs would become the unwitting archivists of the BBS age?”).
 See Elizabeth Kaplan, A Response to “Preserving Software: Why and How,” 1 Iterations: An Interdisciplinary Journal of Software History 2-3 (Sept. 13, 2002), at http://www.cbi.umn.edu/iterations/kaplan.html.
 For example, the phrase “low order memory fragmenter” appears in source code accompanying an article in PC Magazine, March 28, 1995. The source code file was named 1MBFort.c, and can still be found compressed inside an archive file named 1mbfort.zip. If one already knows the 1mbfort.zip filename, it is easy to find the file. But if one had only the phrase “low order memory fragmenter,” how would one find the file in which it appeared (and thereby the date when it was public), without “deep” content indexing? (Or an explicit mention, as at http://oreilly.com/centers/windows/feat/softram/sr-start.html.)
 NSRL’s NSRLProd.txt and NSRLMfg.txt indicate a much larger collection than does http://web.archive.org/web/20100527095702/http://www.nsrl.nist.gov/index/prodname.index.txt.
 See the graphic of Sherlock Holmes throwing away files in http://www.nsrl.nist.gov/Documents/NSRL-CFS-April-2009.pdf.
 However, one NSRL project includes “Smart Unpacking” of files; see Benjamin Long, Smart Unpacking: Understanding Files Through Patterns (2009) at http://www.nsrl.nist.gov/Documents/aafs2009/blong-aafs2009.pdf); extracting individual code files from archives is crucial for large-scale use of software products as prior art.
 Nat’l Tractor Pullers v. Watkins, 205 USPQ 892 (N.Ill., 1980), though this relates largely to the need for contemporaneous corroboration of later oral testimony regarding earlier non-doc prior art (see also Woodland Trust v. Flowertree Nursery, 148 F.3d 1368 (Fed. Cir., 1968)).
 See so-called “secret prior art” (eventually printed patent applications, prior art as of the application date) in 35 USC 102 (e); Hazeltine v. Brenner, 382 US 252 (1965); and the inventor’s own secret prior activity, noted below.
 MPEP § 2128.01 (Level of public accessibility required) III (Internal documents intended to be confidential are not “printed publications”) (“Documents and items only distributed internally within an organization which are intended to remain confidential are not ‘printed publications’ no matter how many copies are distributed”, citing e.g. Garret v. US, 422 F.2d 874 (Ct. Cl., 1970)). Confidential source code is a perfect example, but publicly-sold object code produced from the source code, and any source fragments which find their way into the object code, is not.
 MPEP § 2128 (Examiner need not prove anyone actually looked at the document).
 Auld v. Chroma Graphics, 714 F.2d 1144 (CAFC, 1983) (method patentee’s secret method used to prepare samples offered for sale before critical date).
 Dunlop v. Ram Golf, 524 F.2d 33, 37 (7th Cir., 1975), discussed below.
 Gillman v. Stern, 114 F.2d 28 (2nd Cir., 1940) (L. Hand; secret method not discernible from public product); see also Gore v. Garlock, 721 F.2d 1540 (CAFC, 1983) (public could not have learned crucial parameters even from observing the machine, nominally held in confidence, much less from the output of the machine); Metallizing Engineering v. Kenyon Bearing & Auto Parts, 153 F.2d 516 (2nd Cir., 1946) (L. Hand, citing numerous cases of secret machines producing public output).
 But see the SABRE airline reservation system in Lockwood v. American Airlines, 107 F.3d 1563 (CAFC, 1997).
 Dunlop v. Ram Golf, 524 F.2d 33, 37 (7th Cir., 1975) (“In this case, for example, it is not unreasonable to assume that competing manufacturers of golf balls in search of a tough new material to be used as a cover, might make inquiries of Wagner’s Surlyn supplier that would soon reveal his secret ingredient”); a supplier inquiry is analogous to (and may even form part of) RE. See also Kewanee v. Bicron, 416 US 470, 491 (1974) (“If the invention, though still a trade secret, is put into public use, the competition is alerted to the existence of the inventor’s solution to the problem and may be encouraged to make an extra effort to independently find the solution thus known to be possible”). But note “still a trade secret”; see below.
 Philip Burke, The “Non-Informing Public Use” Concept and its Application to Patent-Trade Secret Conflicts, 45 Alb. L. Rev. 1060 (1980-81); Ami Patel, Advocating a Totality of the Circumstances Test to Analyze a Non-Informing Use of an Invention, 48 Wayne L. Rev. 1287 (2002-03).
 Hall v. Macneal, 107 US 90 (1883).
 Except for what are called “subjective secrets,” exemplified by “I’m thinking of a number between 1 and 1000.” See Arvin Quist, Security Classification of Information, Vol. 2, Principles for Classification of Information (1993), Ch. 2, Major Types of Classified Information – Subjective and Objective Secrets, at http://www.fas.org/sgp/library/quist2/chap_2.html. Even as to objective secrets, so-called “security through obscurity” works better in practice than many purists admit; see Collberg et al., Surreptitious Software: Obfuscation, Watermarking, and Tamperproofing for Software Protection (2009).
 MPEP § 2112 (Requirements for rejection based on inherency; burden of proof), in particular, § 2112 (II) (Inherent feature need not be recognized at the time of the invention).
 Kewanee Oil v. Bicron, 416 US 470 (1974); F. Andrew Ubel, Who’s On First? – The Trade Secret Prior User or a Subsequent Patentee, 76 J. Pat & Trademark Off. Soc’y 401 (1994); Leslie Hill, Prior User Defense: The Road to Hell is Paved with Good and Bad Intentions, 10 Fed. Cir. B.J. 513 (2000-01).
 Compare Brenner v. Manson, 383 US 519 (1966) (no patent for invention without disclosure of utility; “a patent is not a hunting license”) with Oak Industries v. Zenith Electronics, 726 F.Supp. 1525 (N.D. Ill., 1989) (possibility of patent even when likely that invention was already known to industry).
 Uniform Trade Secrets Act § 1 (4); California omits this phrase from its UTSA adoption Cal. Civ. Code § 3426, but ready ascertainability is a defense, and may affect damages.
 California’s UTSA explicitly notes that “Reverse engineering or independent derivation alone shall not be considered improper means.” California UTSA, Cal. Civ. Code § 3426.
 See Fig. 1 in Part 1, New Matter, Summer 2011 at 29.
 Compare In re Robertson, 169 F.3d 743 (CAFC, 1999) (inherency “may not be established by probabilities or possibilities. The mere fact that a certain thing may result from a given set of circumstances is not sufficient”) with Schering Corp. v. Geneva Pharm., 339 F.3d 1373 (CAFC, 2003) (“inherent anticipation does not require that a person of ordinary skill in the art at the time would have recognized the inherent disclosure”).
 E.g., Geoff Chappell’s unraveling of Microsoft’s deliberately-obfuscated AARD code; see Schulman, Examining the Windows AARD Detection Code, Dr. Dobb’s Journal, Sept. 1993; and Chappell letter to editor, Dr. Dobb’s Journal, Jan. 1994.
 Application of Glenn T. Seaborg, 328 F.2d 996 (CCPA, 1964) (“the claimed product, if it was produced in the Fermi process, was produced in such minuscule amounts and under such conditions that its presence was undetectable”).
 See duPont v. Christopher, 431 F.2d 1012 (5th Cir., 1970) (trade secret not lost when DuPont failed to take unreasonable security precautions to prevent aerial photography of chemical plant under construction). As one data point on RE costs, see Honeynet Project, Reverse Challenge Results, Time/Cost Analysis (2002) at http://old.honeynet.org/reverse/results/#cost.
 One RE organization’s definition of RE even emphasizes this: “Reverse engineering is the process of generating new information about software…” (http://www.program-transformation.org/Transform/UVicReverseEngineeringTutorial). When a third-party analyst discovers a security bug in another’s product, it’s debatable that this is the creation of new information, because the bug already existed, though (hopefully) unknown to the vendor. Whether RE can create as well as discover facts is related to the copyright issue of “created facts,” including academic response to the naïve view of facts in Feist v. Rural Telephone, 499 US 340 (1991), and “numeric expression” in cases such as CDN v. Kapes, 197 F.3d 1256 (1999) and NYMEX v. Intercontinental Exchange, 497 F.3d 109 (2nd Cir., 2007). Justin Hughes, Created Facts and their Awkward Place in Copyright Law, in Brauneis ed., Intellectual Property Protection of Fact-Based Works 186 (2009).
 E.g., Abbott Labs v. Geneva Pharmaceuticals, 182 F.3d 1315 (Fed. Cir., 1999) (earlier third-party sales of a product no one, not even the seller, knew at the time happened to include the later-to-be-disputed compound; only years later, in litigation, was it revealed the compound had in fact been on sale before plaintiff’s patent application; even the seller of a product need not have recognized the significance of its product, for that product to constitute prior art, if the product is later found to inherently possess all claim limitations).
 MPEP § 2128 (“Printed Publications” as Prior Art; Electronic Publications as Prior Art); electronic content can qualify as a “printed publication” because this term of art requires neither printing nor publication in the traditional sense, requiring rather dissemination and availability. In re Wyer, 655 F.2d 221, 226 (CCPA, 1981); Joseph Root, Rules of Patent Drafting (2011), Ch. 13 (Printed Publication).
 In re Donohue, 766 F.2d 531 (CAFC, 1985) (mere naming of claimed compounds would be insufficient to constitute enabling disclosure, but article in J. Chem. Soc’y sufficiently describes invention because POSITA could have combined with his own knowledge to make claimed invention); MPEP § 2121.01 (Use of prior art in rejections where operability Is in question).
 Compare In re Hall, 781 F.2d 897 (Fed. Cir., 1986) (doctoral thesis, indexed in library catalog, is prior art) with In re Cronyn, 890 F.2d 1158 (Fed. Cir., 1989) (doctoral thesis, only indexed by student name in shoebox in dept. library, is not prior art).
 Egbert v. Lippmann, 104 US 333 (1881), concerning the patentee’s own public use; for a third-party analogy, see Baxter Int’l v. COBE Laboratories, 88 F.3d 1054 (CAFC, 1996) (Newman J dissent: “Heretofore, § 102(e) was the only source of so-called ‘secret prior art’”; majority allows “unknown, private laboratory work to create a new bar to patentability”).
 E.g., “Except to the extent expressly permitted by the laws of the jurisdiction where you are located, you may not analyze, decompile, disassemble or otherwise reverse engineer the Software.” http://www.teac.co.jp/dspd/download/driver_agreement.html. Compare with “If you do not agree with these terms and conditions, then you may not read … information on this Site.” http://www.caseta.com/legal.html.
 114 F.2d 28 at 30.
 Rosaire v. Baroid Sales, 218 F.2d 72, 74 (5th Cir., 1955) (“The work was performed in the field under ordinary conditions without any deliberate attempt at concealment or effort to exclude the public and without any instructions of secrecy to the employees performing the work”).
 See Mark Flagel and Steven Weiner, Existing Programs Could Be Considered Prior Art; Publicly Available Software with a Ban on Reverse-Engineering May Invalidate Future Registrations, 17 National Law Journal C32 (May 8, 1995); Flagel and Weiner, Trade Secret Software As Prior Art: Litigation Strategies, 11 Computer Lawyer 8 (Dec. 1994); George Gates, Trade Secret Software: Is It Prior Art?, 6 Computer Lawyer 11 (Aug. 1989).
 See e.g. Vault v. Quaid, 847 F.2d 255 (5th Cir., 1988), but note shrinkwrap license enforceability in other contexts, e.g. ProCD v. Zeidenberg, 86 F.3d 1447 (7th Cir., 1996); see also Jonathan Brand and Masanobu Katoh, Interfaces on Trial 2.0 (2011); Pamela Samuelson and Suzanne Scotchmer, The Law and Economics of Reverse Engineering, 111 Yale L.J. 1575 (2001-2).
 http://mousecloner.com/eula.html; similarly, Microsoft: “You may not reverse engineer, decompile, or disassemble the Software, except and only to the extent that such activity is expressly permitted by applicable law notwithstanding this limitation.” http://www.microsoft.com/windowsxp/eula/home.mspx.
 Cal. Civ. Code § 3426.1 (a) (“Reverse engineering or independent derivation alone shall not be considered improper means”); the “alone” is important, and assumes one has legal access to the item being reverse engineered.
 See e.g. Microsoft’s assertion of trade-secret misappropriation against Stac Electronics, in counter-claim to Stac patent infringement suit, based on Stac’s reverse engineering of preload interface during a beta test of MS-DOS 6.0, for which Stac signed (Dr. Dobb’s Journal, May 1994; http://drdobbs.com/184409244?queryText=stac). If the beta itself is widespread, though (some Microsoft beta tests are larger and more public than many product releases from other companies), or if the vendor engages in extensive publicity for the beta, it may look more like an early product release than a product test intended to be maintained in confidence.
 MPEP § 2128.01.
 Del Mar Engineering v. US, 524 F.2d 1178 (CAFC, 1975); Lockheed v. US, 553 F.2d 69 (Ct. Cl., 1977)
 Of course, once a secret is disclosed, it is no longer a secret, even if the disclosure was improper. Religious Technology Center v. Lerma, 908 F.Supp. 1362 (DC, ED Virg., 1995); in particular, the 102(b) bar still operates under misdisclosure: Evans Cooling v. GM, 125 F.3d 1448 (CAFC, 1997).
 Siemens-Elema AB v. Puritan-Bennett Corp., 1989 U.S. Dist. LEXIS 16609, 5-7 (S.D. Cal., 1989).
 Constant v. AMD, , 848 F.2d 1560 (CAFC, 1988). But see National Semiconductor v. Linear Technology, 703 F. Supp. 845 (N.D. Cal., 1988) (widespread “grabbing of papers” at conference and showing them to competing companies, contradicting an implied confidentiality policy did not constitute publication).
 Ex parte ePlus, BPAI Appeal 2010-007804, Reexamination Control 90/008,104, Patent 6,023,683 (May 2011) at http://des.uspto.gov/Foia/ReterivePdf?system=BPAI&flNm=fd2010007804-05-18-2011-1.
 405 F.Supp.2d 479 (D. Del., 2005).
 Lough v. Brunswick, 86 F.3d 1113 (CAFC, 1996) (extent of control inventor maintained over testing is “critically important”).
 Thus, there is not a copy of the software as such, except in some cases a temporary copy created during software analysis, and likely existing for less than the 1.2 seconds noted in Cartoon Network v. CSC, 536 F.3d 121 (2nd Cir., 2008); but see the older MAI v. Peak, 991 F.2d 511 (9th Cir., 1993) (loading program into RAM is making a copy; congressionally overruled for computer repair in 17 USC 117).
 Nichols v. Universal Pictures, 45 F.2d 119 (2nd Cir., 1930) (L. Hand), applied to software in Computer Associates v. Altai, 982 F.2d 693 (2nd Cir., 1993) (elements dictated by efficiency, dictated by external factors, and taken from public domain; the court states that its abstractions test resembles RE).
 Lotus v. Borland, 49 F.3d 807 (1st Cir., 1995).
 Field v. Google, 412 F. Supp. 2d 1106 (D. Nev. 2006); Parker v. Google, 422 F. Supp. 2d 492 (E.D. Pa. 2006) (characterizing Google as indiscriminately collecting material). A potential difference is that Google’s raw material (with the important exception of Google Books) is already online, with an implied license for indexing, if the material is linked-to by another page (i.e., part of the “web”), and not excluded by a robots.txt file. eBay v. Bidder’s Edge, 100 F.Supp.2d 1058 (ND Cal., 2000). It might be argued that there is less fair use for unpublished works (Harper & Row v. Nation, 471 US 539 (1975)), but software products are published, albeit written in an obscure language.
 A software patent owner suspecting infringement “may have no choice but to reverse engineer the rival’s software in order to gain the evidence it needs to file suit.” Julie Cohen and Mark Lemley, Patent Scope and Innovation in the Software Industry, 89 Cal. L. Rev. 1, 28 (2001). The same point has been made regarding impact of DMCA anti-circumvention on litigant’s ability to research their cases. Jeffrey Sullivan and Thomas Morrow, Practicing Reverse Engineering in an Era of Growing Constraints under the Digital Millennium Copyright Act and Other Provisions, 14 Alb. L.J. Sci. & Tech. 1, 6 (2003).