New in version 2.1.
Timing: The basic Ratcliff-Obershelp algorithm is cubic time in the worst case and quadratic time in the expected case. SequenceMatcher is quadratic time for the worst case and has expected-case behavior dependent in a complicated way on how many elements the sequences have in common; best case time is linear.
Each line of a Differ delta begins with a two-letter code:
| Code | Meaning | 
|---|---|
| '- ' | line unique to sequence 1 | 
| '+ ' | line unique to sequence 2 | 
| '  ' | line common to both sequences | 
| '? ' | line not present in either input sequence | 
Lines beginning with '? ' attempt to guide the eye to
  intraline differences, and were not present in either input
  sequence. These lines can be confusing if the sequences contain tab
  characters.
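As a minimal sketch of how the two-letter codes appear in practice, the following compares two short lists of lines and collects the code at the start of each delta line. The sample data and the variable names (`old`, `new`, `codes`) are illustrative assumptions. No '? ' lines are produced here because no changed line has a near-match on the other side.

```python
from difflib import Differ

# Illustrative input: 'beta' is removed and 'delta' is added.
old = ['alpha\n', 'beta\n', 'gamma\n']
new = ['alpha\n', 'gamma\n', 'delta\n']

delta = list(Differ().compare(old, new))

# The first two characters of every delta line are its code.
codes = [line[:2] for line in delta]
# codes == ['  ', '- ', '  ', '+ ']

# The remainder of each line is the original text.
texts = [line[2:] for line in delta]
# texts == ['alpha\n', 'beta\n', 'gamma\n', 'delta\n']
```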
This class can be used to create an HTML table (or a complete HTML file containing the table) showing a side by side, line by line comparison of text with inter-line and intra-line change highlights. The table can be generated in either full or contextual difference mode.
The constructor for this class is:
| [tabsize][, wrapcolumn][, linejunk][, charjunk]) | 
Initializes an instance of HtmlDiff.
tabsize is an optional keyword argument to specify tab stop spacing
    and defaults to 8.
wrapcolumn is an optional keyword argument to specify the column
    number at which lines are broken and wrapped; it defaults to None,
    in which case lines are not wrapped.
linejunk and charjunk are optional keyword arguments passed
    into ndiff() (used by HtmlDiff to generate the
    side by side HTML differences).  See ndiff() documentation for
    argument default values and descriptions.
The following methods are public:
| fromlines, tolines [, fromdesc][, todesc][, context][, numlines]) | 
fromdesc and todesc are optional keyword arguments to specify from/to file column header strings (both default to an empty string).
context and numlines are both optional keyword arguments.
    Set context to True when contextual differences are to be
    shown, else the default is False to show the full files.
    numlines defaults to 5.  When context is True
    numlines controls the number of context lines which surround the
    difference highlights.  When context is False numlines
    controls the number of lines which are shown before a difference
    highlight when using the "next" hyperlinks (setting to zero would cause
    the "next" hyperlinks to place the next difference highlight at the top of
    the browser without any leading context).
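The constructor and file-generation arguments described above might be combined as in the sketch below. The input lines, the description strings, and the chosen values for tabsize, wrapcolumn, and numlines are illustrative assumptions, not recommended settings.

```python
from difflib import HtmlDiff

# Illustrative inputs; in practice these usually come from file.readlines().
fromlines = ['one\n', 'two\n', 'three\n']
tolines = ['ore\n', 'tree\n', 'emu\n']

# Expand tabs to 4 columns and wrap long lines at column 60.
differ = HtmlDiff(tabsize=4, wrapcolumn=60)

# context=True limits the output to changed regions, each surrounded
# by numlines lines of context.
html = differ.make_file(fromlines, tolines,
                        fromdesc='before.txt', todesc='after.txt',
                        context=True, numlines=2)

# html is a string holding a complete HTML document; it can be
# written to a file and opened in a browser.
```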
  
| fromlines, tolines [, fromdesc][, todesc][, context][, numlines]) | 
The arguments for this method are the same as those for the make_file() method.
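When only the table is needed, for instance to embed it in a page that supplies its own markup, a sketch along these lines may help. The surrounding page template is a hypothetical placeholder. The result is a table fragment rather than a whole document, so the styles that a generated file would carry must be provided by the enclosing page.

```python
from difflib import HtmlDiff

fromlines = ['one\n', 'two\n', 'three\n']
tolines = ['ore\n', 'tree\n', 'emu\n']

# make_table() returns only the <table> element as a string.
table = HtmlDiff().make_table(fromlines, tolines,
                              fromdesc='old', todesc='new')

# Hypothetical wrapper page; any template of your own would do.
page = '<html><body><h1>Changes</h1>%s</body></html>' % table
```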
Tools/scripts/diff.py is a command-line front-end to this class and contains a good example of its use.
New in version 2.4.
| a, b[, fromfile][, tofile][, fromfiledate][, tofiledate][, n][, lineterm]) | 
Context diffs are a compact way of showing just the lines that have changed plus a few lines of context. The changes are shown in a before/after style. The number of context lines is set by n which defaults to three.
By default, the diff control lines (those with *** or ---)
  are created with a trailing newline.  This is helpful so that inputs created
  from file.readlines() result in diffs that are suitable for use
  with file.writelines() since both the inputs and outputs have
  trailing newlines.
For inputs that do not have trailing newlines, set the lineterm
  argument to "" so that the output will be uniformly newline free.
The context diff format normally has a header for filenames and modification times. Any or all of these may be specified using strings for fromfile, tofile, fromfiledate, and tofiledate. The modification times are normally expressed in the format returned by time.ctime(). If not specified, the strings default to blanks.
Tools/scripts/diff.py is a command-line front-end for this function.
New in version 2.3.
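The lineterm behaviour described above can be sketched as follows. The inputs have no trailing newlines, so lineterm is set to "" to keep the output uniformly newline free. The sample lists and file names are illustrative assumptions.

```python
from difflib import context_diff

# Inputs without trailing newlines.
a = ['bacon', 'eggs', 'ham', 'guido']
b = ['python', 'eggy', 'hamster', 'guido']

lines = list(context_diff(a, b,
                          fromfile='before.py', tofile='after.py',
                          n=1, lineterm=''))

# The first two lines are the '***' and '---' file headers; with
# lineterm='' none of the generated lines ends in a newline.
text = '\n'.join(lines)
```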
| word, possibilities[, n][, cutoff]) | 
Optional argument n (default 3) is the maximum number
  of close matches to return; n must be greater than 0.
Optional argument cutoff (default 0.6) is a float in
  the range [0, 1].  Possibilities that don't score at least that
  similar to word are ignored.
The best (no more than n) matches among the possibilities are returned in a list, sorted by similarity score, most similar first.
>>> get_close_matches('appel', ['ape', 'apple', 'peach', 'puppy'])
['apple', 'ape']
>>> import keyword
>>> get_close_matches('wheel', keyword.kwlist)
['while']
>>> get_close_matches('apple', keyword.kwlist)
[]
>>> get_close_matches('accept', keyword.kwlist)
['except']
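The effect of the optional n and cutoff arguments can be seen with the same word list. In this sketch the variable names are illustrative; 'apple' scores 0.8 against 'appel' and 'ape' scores 0.75, so a cutoff between those two values keeps only the former.

```python
from difflib import get_close_matches

words = ['ape', 'apple', 'peach', 'puppy']

# Return at most one match: the single most similar candidate.
best = get_close_matches('appel', words, n=1)
# best == ['apple']

# Raise the cutoff above 0.75 so that 'ape' no longer qualifies.
strict = get_close_matches('appel', words, cutoff=0.78)
# strict == ['apple']
```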
| a, b[, linejunk][, charjunk]) | 
Optional keyword parameters linejunk and charjunk are
  for filter functions (or None):
linejunk: A function that accepts a single string
  argument, and returns true if the string is junk, or false if not.
  The default is None, starting with Python 2.3.  Before then,
  the default was the module-level function
  IS_LINE_JUNK(), which filters out lines without visible
  characters, except for at most one pound character ("#").
  As of Python 2.3, the underlying SequenceMatcher class
  does a dynamic analysis of which lines are so frequent as to
  constitute noise, and this usually works better than the pre-2.3
  default.
charjunk: A function that accepts a character (a string of length 1), and returns true if the character is junk, or false if not. The default is the module-level function IS_CHARACTER_JUNK(), which filters out whitespace characters (a blank or tab; note: it is a bad idea to include newline in this!).
Tools/scripts/ndiff.py is a command-line front-end to this function.
>>> diff = ndiff('one\ntwo\nthree\n'.splitlines(1),
...              'ore\ntree\nemu\n'.splitlines(1))
>>> print ''.join(diff),
- one
?  ^
+ ore
?  ^
- two
- three
?  -
+ tree
+ emu
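To make the junk filters more concrete, the sketch below calls the two module-level functions directly and then passes them to ndiff() explicitly. The sample strings and variable names are illustrative assumptions. Whatever filters are used, the original lines can still be recovered from the delta.

```python
from difflib import IS_CHARACTER_JUNK, IS_LINE_JUNK, ndiff, restore

# Blank lines and lines holding only a single '#' count as junk.
line_checks = [IS_LINE_JUNK(s) for s in ['\n', '  #   \n', 'hello\n']]
# line_checks == [True, True, False]

# Only a blank and a tab are junk characters; newline is not.
char_checks = [IS_CHARACTER_JUNK(c) for c in [' ', '\t', '\n', 'x']]
# char_checks == [True, True, False, False]

a = ['one\n', '\n', 'two\n']
b = ['one\n', 'two\n']

# Filters are supplied as keyword arguments.
delta = list(ndiff(a, b, linejunk=IS_LINE_JUNK,
                   charjunk=IS_CHARACTER_JUNK))

# Junk filtering affects how lines are matched, not what is reported:
# the first sequence is still fully recoverable.
recovered = list(restore(delta, 1))
```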
| sequence, which) | 
Given a sequence produced by Differ.compare() or ndiff(), extract lines originating from file 1 or 2 (parameter which), stripping off line prefixes.
Example:
>>> diff = ndiff('one\ntwo\nthree\n'.splitlines(1),
...              'ore\ntree\nemu\n'.splitlines(1))
>>> diff = list(diff) # materialize the generated delta into a list
>>> print ''.join(restore(diff, 1)),
one
two
three
>>> print ''.join(restore(diff, 2)),
ore
tree
emu
| a, b[, fromfile][, tofile][, fromfiledate][, tofiledate][, n][, lineterm]) | 
Unified diffs are a compact way of showing just the lines that have changed plus a few lines of context. The changes are shown in an inline style (instead of separate before/after blocks). The number of context lines is set by n which defaults to three.
By default, the diff control lines (those with ---, +++,
  or @@) are created with a trailing newline.  This is helpful so
  that inputs created from file.readlines() result in diffs
  that are suitable for use with file.writelines() since both
  the inputs and outputs have trailing newlines.
For inputs that do not have trailing newlines, set the lineterm
  argument to "" so that the output will be uniformly newline free.
The unified diff format normally has a header for filenames and modification times. Any or all of these may be specified using strings for fromfile, tofile, fromfiledate, and tofiledate. The modification times are normally expressed in the format returned by time.ctime(). If not specified, the strings default to blanks.
Tools/scripts/diff.py is a command-line front-end for this function.
New in version 2.3.
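A minimal sketch of the default newline handling: the inputs here carry trailing newlines, as lines from file.readlines() would, so the generated lines can be joined directly or handed to file.writelines(). The sample data and file names are illustrative assumptions.

```python
from difflib import unified_diff

# Inputs with trailing newlines, as produced by file.readlines().
a = ['bacon\n', 'eggs\n', 'ham\n', 'guido\n']
b = ['python\n', 'eggy\n', 'hamster\n', 'guido\n']

lines = list(unified_diff(a, b,
                          fromfile='before.py', tofile='after.py',
                          n=1))

# Control lines ('---', '+++', '@@') end in a newline by default,
# so the pieces concatenate into well-formed diff text.
diff_text = ''.join(lines)
```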
| line) | 
| ch) | 
See Also: