UnifiedDiff

Produce a unified diff of two strings.

Algorithm

Diffs are computed using the classic dynamic-programming LCS (longest common subsequence) approach:

  1. Build an (m+1) × (n+1) DP table over the two line arrays.
  2. Backtrack through the table to produce a sequence of Keep, Remove, and Add edits. The backtracking is iterative (tail-recursive with an accumulator) to avoid stack overflows on large files.
  3. Group nearby changes into hunks, each padded with up to three lines of context. Change groups whose context windows overlap are merged into a single hunk.
  4. Render each hunk with a @@ -a,b +c,d @@ header.

Time complexity is O(mn) in the size of the two inputs; space complexity is the same for the DP table.

Limitations

The DP table itself allocates O(mn) memory, so comparing very large files (thousands of lines each) will be slow. For most source files and configuration files encountered in practice this is not an issue.

Functions

unifiedDiffStrings : String -> String -> String

Compute a unified diff between two strings. Each string is split on newlines and compared line by line. Returns the diff as a String, or an empty String if the inputs are identical.

The output follows the standard unified diff format:

    @@ -1,4 +1,4 @@
     context line
    -removed line
    +added line
     context line
unifiedDiffArrays : Array String -> Array String -> String

Compute a unified diff between two Arrays of Strings, where each string is a single line of text. Returns the diff as a String, or an empty String if the inputs are identical.

The output follows the standard unified diff format:

    @@ -1,4 +1,4 @@
     context line
    -removed line
    +added line
     context line