The COOPY Toolbox: The COOPY Toolbox: Applying highlighter diffs using sspatch

The COOPY toolbox can report the difference between tables in a number of formats, one of which is called the "hilite" format.

If someone sends you such a report, what can you do with it?

How to recognize a highlighter diff

A good clue that you are looking at a highlighter diff is if you see annotations like "+++", "---", or "->" in the first column of one or more tables in a spreadsheet. Here is such a table:

@@	bridge	designer	length
	Brooklyn	J. A. Roebling	1595
+++	Manhattan	G. Lindenthal	1470
->	Williamsburg	D. Duck->L. L. Buck	1600
	Queensborough	Palmer & Hornbostel	1182
	Triborough	O. H. Ammann	1380,383
...	...	...	...
	Throgs Neck	O. H. Ammann	1800
	George Washington	O. H. Ammann	3500
---	Spamspan	S. Spamington	10000

A highlighter diff shows the differences between two versions of a spreadsheet. It says what should be done to the first version of the spreadsheet to produce the second - by adding rows, removing rows, changing cell values, etc. The diff does not generally show all of the original spreadsheet, just parts that have changed and their immediate surroundings. See: Specification of the highlighter diff format.

Applying all changes from a highlighter diff

Highlighter diffs can be applied to a variety of database and spreadsheet formats using the sspatch program available in the COOPY toolbox. Suppose we have received the highlighter diff as a file "diff.xls". We can do any of:

sspatch --inplace my_table.xlsx diff.xls
sspatch --inplace my_table.ods diff.xls
sspatch --inplace my_table.gnumeric diff.xls
sspatch --inplace my_table.csv diff.xls
sspatch --inplace my_table.sqlite diff.xls
sspatch --inplace dbi:mysql:my_database:username=root diff.xls
...

See sspatch documentation for options.

Applying just some changes from a highlighter diff

Delete any rows from the diff that you don't want (or set the first column to empty). Then apply as normal.

Converting to another format

A highlighter diff can be converted to other formats, using ssrediff. Here's an SQL translation of the above example:

INSERT INTO sheet (bridge, designer, length) VALUES ('Manhattan', 'G. Lindenthal', '1470');
UPDATE sheet SET designer='L. L. Buck' WHERE bridge='Williamsburg' AND designer='D. Duck' AND length='1600';
DELETE FROM sheet WHERE bridge='Spamspan' AND designer='S. Spamington' AND length='10000';

(produced with ssrediff --format sql diff.xls).

Here's a translation to TDIFF, the preferred format for the COOPY toolbox:


@@@ sheet

@ |bridge=|designer=|length=|
* |Brooklyn|J. A. Roebling|1595|
+ |Manhattan|G. Lindenthal|1470|
= |Williamsburg|D. Duck->L. L. Buck|1600|
- |Spamspan|S. Spamington|10000|

(produced with ssrediff --format tdiff diff.xls).

Here's a translation to "ROWOPS" format, a simple tabular representation of changes this can be useful if you don't have to worry about NULLs.

name	op	bridge0	designer0	length0	bridge1	designer1	length1
sheet	context	Brooklyn	J. A. Roebling	1595
sheet	insert				Manhattan	G. Lindenthal	1470
sheet	update	Williamsburg	D. Duck	1600		L. L. Buck
sheet	delete	Spamspan	S. Spamington	10000

(produced with ssrediff --format ops diff.xls).

There is a more elaborate tabular representation (--format csv) that deals with NULLs, schema changes, etc. but is less easy to read.

Accessing diffs programmatically

The programs in the COOPY toolbox are a thin wrapper around a library written in C++ that can be accessed from other languages. See the top level BUILD.txt file of the source code. Currently Python and Ruby are supported. For example, this code in python:

import coopy

class TableDiff(coopy.Patcher):
    
    def changeRow(self,change):
        print("Row change of type '%s'"%change.modeString())
        for c in change.cond.keys():
            print("* condition: %s = %s"%(c,change.cond[c].toString()))
        for v in change.val.keys():
            print("* value: %s -> %s"%(v,change.val[v].toString()))
        print("")
        return True

flags = coopy.CompareFlags()
differ = TableDiff()
parser = coopy.PatchParser(differ,"diff.xls",flags)
parser.applyPatch()

Will produce:

Row change of type 'context'
* condition: bridge = Brooklyn
* condition: designer = J. A. Roebling
* condition: length = 1595

Row change of type 'insert'
* value: bridge -> Manhattan
* value: designer -> G. Lindenthal
* value: length -> 1470

Row change of type 'update'
* condition: bridge = Williamsburg
* condition: designer = D. Duck
* condition: length = 1600
* value: designer -> L. L. Buck

Row change of type 'delete'
* condition: bridge = Spamspan
* condition: designer = S. Spamington
* condition: length = 10000

The equivalent code in Ruby is:

require 'coopy'

class TableDiff < Coopy::Patcher
  
  def changeRow(change)
    puts "Row change of type '%s'"%change.modeString()
    for c in change.cond.keys()
      puts "* condition: %s = %s"%[c,change.cond[c].toString()]
    end
    for c in change.val.keys()
      puts "* value: %s -> %s"%[c,change.val[c].toString()]
    end
    puts ""
    true
  end
  
end

flags = Coopy::CompareFlags.new()
differ = TableDiff.new()
parser = Coopy::PatchParser.new(differ,"diff.xls",flags)
parser.applyPatch()

The same code can read TDIFF format diff files.