Source code ch.17: Indexing

Chapter 17: Indexing source code

17.1 Why source code should generally be indexed before searching

  • The volume of source code typically produced in litigation is huge; without a prior index, every search must rescan every file, which is too slow at that scale
  • Indexing can create new information in its own right: counts and frequencies of occurrence
  • “Searching by counting”
  • “Searching by sorting”
  • Tokenization
  • Normalization (e.g. replacing all names, i.e. identifiers, with XXX so that code can be compared independently of naming)
  • Identifying key unique terms and phrases in code (similar to Amazon SIPs, “statistically improbable phrases”)
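The tokenization, normalization, counting and improbable-phrase items above can be illustrated with a small Python sketch. It assumes a deliberately simplified tokenizer (identifiers, integer literals, single punctuation characters) and a hypothetical, incomplete keyword list; the names tokenize, normalize, term_counts and improbable_phrases are illustrative, not an existing tool. The phrase scorer is a crude frequency-ratio stand-in, not Amazon's actual SIP method.

```python
import re
from collections import Counter

# Simplified tokenizer (an assumption, not a real C/Java lexer):
# identifiers, integer literals, or any other single non-space character.
TOKEN_RE = re.compile(r"[A-Za-z_]\w*|\d+|\S")
IDENT_RE = re.compile(r"[A-Za-z_]\w*")

# Hypothetical, deliberately small keyword list; a real tool would use the
# full keyword set of the language under examination.
KEYWORDS = {"if", "else", "for", "while", "return", "int", "char", "void", "struct"}


def tokenize(source: str) -> list[str]:
    """Split source text into a flat list of tokens."""
    return TOKEN_RE.findall(source)


def normalize(tokens: list[str]) -> list[str]:
    """Replace every identifier that is not a keyword with XXX."""
    return ["XXX" if IDENT_RE.fullmatch(t) and t not in KEYWORDS else t
            for t in tokens]


def term_counts(tokens: list[str]) -> Counter:
    """Frequency of occurrence of each token."""
    return Counter(tokens)


def improbable_phrases(target, reference, n=2, top=5):
    """Rank n-grams that are frequent in target but rare in reference.

    Score = relative frequency in target divided by add-one-smoothed
    relative frequency in reference (a stand-in, not Amazon's formula).
    """
    def ngrams(toks):
        return Counter(tuple(toks[i:i + n]) for i in range(len(toks) - n + 1))

    tgt, ref = ngrams(target), ngrams(reference)
    tgt_total = sum(tgt.values()) or 1
    ref_total = sum(ref.values()) or 1
    score = {g: (c / tgt_total) / ((ref[g] + 1) / ref_total)
             for g, c in tgt.items()}
    # Highest score first; ties broken alphabetically for stable output.
    return sorted(score, key=lambda g: (-score[g], g))[:top]


if __name__ == "__main__":
    a = "int total = count + 1; return total;"
    b = "int sum = n + 1; return sum;"
    # Different names, same structure once identifiers become XXX.
    print(normalize(tokenize(a)) == normalize(tokenize(b)))  # True
    print(term_counts(tokenize(a))["total"])  # 2
```

Normalizing before comparison means two files that differ only in variable names produce identical token streams; the counts and n-gram scores come for free once the tokens have been extracted.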

17.2 Indexing with standard tools

17.3 Creating an index using scripting languages commonly available on locked-down source code machines
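As one possible approach, assuming Python happens to be installed on the review machine, its standard library alone (os.walk, re, collections) is enough to build a simple inverted index. The sketch below is hypothetical: the script name make_index.py, the extension filter and the identifier-only token pattern are assumptions. It maps every identifier to the files and line numbers where it occurs, then writes a flat, sorted text file that ordinary tools such as grep can search.

```python
import os
import re
import sys
from collections import defaultdict

# Index identifiers only (an assumption; punctuation is ignored here).
TOKEN_RE = re.compile(r"[A-Za-z_]\w*")
# Hypothetical extension filter; adjust to the languages actually produced.
SOURCE_EXTS = {".c", ".h", ".cpp", ".java", ".py", ".js"}


def build_index(root):
    """Map each identifier to a sorted list of (path, line_number) hits."""
    index = defaultdict(set)
    for dirpath, _dirnames, filenames in os.walk(root):
        for name in filenames:
            if os.path.splitext(name)[1].lower() not in SOURCE_EXTS:
                continue
            path = os.path.join(dirpath, name)
            # errors="replace" keeps oddly encoded files from aborting the run
            with open(path, encoding="utf-8", errors="replace") as f:
                for lineno, line in enumerate(f, start=1):
                    for tok in TOKEN_RE.findall(line):
                        index[tok].add((path, lineno))
    return {tok: sorted(hits) for tok, hits in index.items()}


def write_index(index, out):
    """Write 'token<TAB>lines<TAB>path:line' records, sorted by token.

    'lines' is the number of distinct lines containing the token.
    """
    for tok in sorted(index):
        hits = index[tok]
        for path, lineno in hits:
            out.write(f"{tok}\t{len(hits)}\t{path}:{lineno}\n")


if __name__ == "__main__":
    if len(sys.argv) == 2:
        write_index(build_index(sys.argv[1]), sys.stdout)
    else:
        sys.stderr.write("usage: make_index.py SOURCE_DIR > index.txt\n")
```

Writing the index as a plain sorted text file, rather than a database, keeps it readable with whatever tools the locked-down machine offers, and the per-token line count doubles as a crude frequency measure.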
