Software Litigation Consulting

Andrew Schulman
Consulting Technical Expert & Attorney

Software Litigation Consulting (Andrew Schulman; undoc@sonic.net) specializes in source code examination, software reverse engineering, pre-filing investigation of software/internet patent infringement, the preparation of claim charts (infringement, non-infringement, invalidity, and validity contentions), and research in software prior art. Currently working with Claude to build a CodeExam tool, and using AI chatbots to finish work on a Patent Litigation book.

Too busy working with Claude (Opus 4.5 and 4.6) since Jan. 22, 2026, to write up the experience of developing a large CodeExam tool with this shockingly good collaborator, or to write about the CodeExam tool itself. For now, I’m just sharing some of the chat transcripts (sometimes with a claude.ai/share URL, other times a txt file), most recent first. I’ll soon describe CodeExam, and provide some annotations to the Claude chat transcripts below, including a look at Claude’s “chain of thought” and tool use:

  • “CodeExam GUI design with HTMX or XMLUI” (in progress)
  • “CodeExam Node.js port development roadmap” (txt)
    • Indexing of source-code tree from zip/tar/gz/etc. archive (7z to come).
    • Indexing of binary executable files (exe, dll, sys, so, pyd, etc.), with strings (and demangled C++ function signatures) turned into pseudo-functions in .op files.
    • Class::method disambiguation for callees; nice test_disambiguation.js covering about a dozen scenarios; a few issues still TODO.
    • --multisect-search (and therefore the related --multisect-analyze, --claim-search, and --claim-analyze) now looks for classes in which substantially all terms appear; so the preference is now for hits within a single function, then hits across multiple methods within a single class (possibly distributed over multiple files), then hits across multiple functions in the same file, and finally hits across multiple files in the same folder.
    • --call-inventory command to list all calls, including those to an external source not in the codebase (RTL, API, missing code, etc.); basically a call-target symbol table; joins --entry-points and --gaps (which lists code implemented without a caller in the codebase).
    • Continued testing on massive codebases
  • “Vocabulary-guided patent claim term extraction” (txt)
    • Using the codebase’s vocabulary (from --discover-vocabulary) to guide extraction of terms from patent claim text.
    • Testing extraction of terms with a local LLM (Qwen Coder), and use of --vocab-tight for local LLMs.
    • New --follows-calls and --deep commands to have --extract show not only the named function, but also its callees.
    • New --comments-only option to the --extract command, to show only full-line comments from a function; works with --follows-calls to generate nice trees of major source-code comments; all followed by a caveat: “Tip: Comments can lie! Verify against actual code logic.”
  • “CodeExam Node.js phase 8a complete, phase 8b planning” (txt)
    • Continuing port of code from Python to Node.js
    • Finish --multisect-search command (looks for the intersection of all search terms in the narrowest location: first within a function, then across multiple functions in the same file/class, then in multiple files in the same folder; includes NOT terms and regex).
    • Finish --multisect-analyze command (takes the top 2 functions returned from multisect-search, and passes them to an LLM for analysis; for air-gapped source-code analysis under a PO, can use a local LLM; right now, using Qwen-Coder: qwen2.5-coder-7b-instruct-q4_k_m.gguf).
    • Finish --claim-analyze command (takes patent claim text, passes it to an LLM to extract terms [both tight and broad, with synonyms], then does --multisect-analyze with the tight and broad terms to get an LLM analysis of the top 2 functions from multisect-search; the LLM analysis is done in the context of the patent claim text and the extracted tight/broad terms).
    • End-to-end pipeline now working with a local LLM: patent claim text -> local LLM extracts terms -> multisect search locates the two functions best matching the terms -> pass functions + original claim text to local LLM -> get back an analysis.
    • Comparing analyses from local DeepSeek-Coder vs. local Qwen-Coder vs. Claude.
    • Qwen-Coder is current favorite local LLM model for code, over DeepSeek-Coder and CodeLlama.
    • Nice discussion of “the wonder of local LLMs” (how an essentially static 4GB file of floating-point numbers can take in English + code, and output an analysis of the code in English; more surprising than the fact that an online LLM like Claude, with massive infrastructure behind it, can do this; discussion of node-llama-cpp).
    • Both PY and JS code include lengthy English-language prompts for LLMs.
    • Initial JS port of function masking from PY (right now only removing comments from code before passing it to the LLM; the full PY implementation masks almost all names/symbols, to force the LLM analysis to rely on the code, not lean on comments/naming).
    • Miscellaneous fixes re: --discover-vocabulary, --list-functions, --extract-function
    • Discussion of using the --discover-vocabulary feature to extract terms from the claim similar to the source code’s nomenclature.
  • “Wrapping up multisect-search phase 5 port” (txt)
    • Porting the --multisect-search feature from PY to JS (identify functions, files, or folders matching substantially all of a list of semicolon-delimited terms, including synonyms for terms, and NOT terms)
    • Add --in filter to search-related commands, e.g. search for pivot --in excel
    • “Term selectivity” in multisect search, compared to TF-IDF in --discover-vocabulary
    • Show correct function/file/class context for search hits
    • Modify search to look in path/filename as well as file contents
  • “Node.js port continuation and feature implementation summary” (txt)
    • Porting dupe and de-duplication features from PY to JS
    • Four levels of dupe testing: verbatim (SHA1-hashed) files; verbatim (SHA1-hashed) function bodies; “structurally-identical” function bodies (using the Abstract Syntax Tree [AST], without operands, as the structure); and “near-dupes” (functions with identical filename/size, but whose function-body SHA1 differs)
    • --struct-diff and --struct-diff-all to show changes across structurally-identical but not verbatim-identical functions, and across near-dupes
    • Improved class::method detection, including inferring the class from calls to a method, even without a declaration (works for . as well as the :: separator)
    • Improved language-specific regexes (but will be moving to tree-sitter, which already worked well in PY)
  • Converting Python code to Node.js
    • Claude creates 9-phase strategy for Node.js port (noting existing JSON-based persistence allows direct compatibility between Python-built indexes and the new Node.js tool).
    • Delivery of Phase 1 and 2 of the JS port (core index building, literal/inverted search, call-tree, and file-map functionality, maintaining zero external npm dependencies).
    • Overcoming Node.js hard size limits (bypassing 536MB string and 2GB buffer limits by building a chunked `FileScanner` and streaming JSON parser to load massive 5GB+ Chromium source-code indexes).
    • Prioritizing Phase 6 Interactive REPL (keeping multi-gigabyte indexes in memory across multiple slash commands like `/hotspots` and `/classes` to avoid long load times).
    • Fixing V8 heap out-of-memory errors (keeping the 2.9GB inverted index on disk to stream on-demand, introducing lazy parsing and result caching for instant subsequent queries).
    • Fixing regex patterns for C++ parsing (capturing Chromium’s multi-line method signatures in .cc files where return types or arguments wrap across lines).
    • Planning for future features (--discover-vocabulary for sweet-spot token filtering to find domain jargon, and loading zip/tar archive files).
  • “Code exam claim search parameter issue” (txt)
    • Discussion of ChromaDB semantic search limitations (explaining why standard natural language embedding models perform poorly on source code without intermediate AI summarization).
    • Rebuilding ce_analyze.py (restoring the full 2,000-line pipeline from analyze_bridge.py to bring back the automated extraction and analysis features).
    • Successful end-to-end pipeline validation (confirming the full flow of passing claim text, extracting terms via LLM, running multisect search, and having the LLM accurately analyze the resulting functions).
    • Clarification of --claim-analyze design intent (resolving the architectural difference between the original “manual” workflow of passing a known function name to the LLM, versus the fully-automated pipeline that searches for code at least in the same ballpark as the patent claim text, auto-picks the top hit, and analyzes it)
  • Refactoring code search into modular architecture
  • Collaborative software source code investigation
    • Claude and I use the CodeExam tool to answer a question about how Moltbook verifies that its users are bots, not humans; we end up at Twitter/X OAuth
  • call-tree and file-map implementation
  • Connecting extract and analyze pipeline with command variants
  • Air-gapped tool demo index and search-analyzer integration
  • Updating TODO master file incrementally
  • Code exam, part 4
  • “Air-gapped source code analysis tool development” (part 3)
  • Code exam, part 2
  • Code exam, part 1b
  • Offline code analysis with local LLM and RAG

AI chatbot front-end to this web site built with Google NotebookLM (NBLM):

AI “Chatbook” (using Google NotebookLM) for Patent Litigation book by Andrew Schulman

PatClaim.ai (forthcoming; some areas being implemented and/or investigated include):

PatLitig.ai (forthcoming; some areas being implemented and/or investigated include):

RE/AI: Reverse engineering, Source code examination, and AI:

  • See ai-reverse-engineering-source-code-examination/
  • Two major topics here:
    • Reverse engineering/source code examination, using AI-based tools,
    • Changes to standard reverse engineering and source-code examination methodologies, when examining AI-based software
  • The name RevEng.ai is already taken (“Reverse Engineering meets Artificial Intelligence / AI powered Binary Analysis platform for Reverse Engineering and Malware Analysis”)

Services: source code examination & software reverse engineering

For contact information, see About Us or send email.

Somewhat-recent CV available here; a confidential list of past cases and clients is available under a non-disclosure agreement.

Sample source-code examination and reverse engineering cases and projects here.

LinkedIn profile here

Partial collection at academia.edu of drafts and (mostly older) articles