" lines in HTML files). That may be because this is the only method of
the 3 that has a *concept* of "junk".

Example, comparing two strings, and considering blanks to be "junk":

>>> s = SequenceMatcher(lambda x: x == " ",
...                     "private Thread currentThread;",
...                     "private volatile Thread currentThread;")
>>>

.ratio() returns a float in [0, 1], measuring the "similarity" of the
sequences. As a rule of thumb, a .ratio() value over 0.6 means the
sequences are close matches:

>>> print(round(s.ratio(), 3))
0.866
>>>

If you're only interested in where the sequences match,
.get_matching_blocks() is handy:

>>> for block in s.get_matching_blocks():
...     print("a[%d] and b[%d] match for %d elements" % block)
a[0] and b[0] match for 8 elements
a[8] and b[17] match for 21 elements
a[29] and b[38] match for 0 elements

Note that the last tuple returned by .get_matching_blocks() is always a
dummy, (len(a), len(b), 0), and this is the only case in which the last
tuple element (number of elements matched) is 0.

If you want to know how to change the first sequence into the second,
use .get_opcodes():

>>> for opcode in s.get_opcodes():
...     print("%6s a[%d:%d] b[%d:%d]" % opcode)
 equal a[0:8] b[0:8]
insert a[8:8] b[8:17]
 equal a[8:29] b[17:38]

See the Differ class for a fancy human-friendly file differencer, which
uses SequenceMatcher both to compare sequences of lines, and to compare
sequences of characters within similar (near-matching) lines.

See also function get_close_matches() in this module, which shows how
simple code building on SequenceMatcher can be used to do useful work.

Timing: Basic R-O is cubic time worst case and quadratic time expected
case. SequenceMatcher is quadratic time for the worst case and has
expected-case behavior dependent in a complicated way on how many
elements the sequences have in common; best case time is linear.


SequenceMatcher.__init__:

Construct a SequenceMatcher.

Optional arg isjunk is None (the default), or a one-argument
function that takes a sequence element and returns true iff the
element is junk. None is equivalent to passing "lambda x: 0", i.e.
no elements are considered to be junk. For example, pass
    lambda x: x in " \t"
if you're comparing lines as sequences of characters, and don't
want to synch up on blanks or hard tabs.

Optional arg a is the first of two sequences to be compared. By
default, an empty string. The elements of a must be hashable. See
also .set_seqs() and .set_seq1().

Optional arg b is the second of two sequences to be compared. By
default, an empty string. The elements of b must be hashable. See
also .set_seqs() and .set_seq2().

Optional arg autojunk should be set to False to disable the
"automatic junk heuristic" that treats popular elements as junk
(see module documentation for more information).


SequenceMatcher.set_seqs:

Set the two sequences to be compared.

>>> s = SequenceMatcher()
>>> s.set_seqs("abcd", "bcde")
>>> s.ratio()
0.75


SequenceMatcher.set_seq1:

Set the first sequence to be compared.

The second sequence to be compared is not changed.

>>> s = SequenceMatcher(None, "abcd", "bcde")
>>> s.ratio()
0.75
>>> s.set_seq1("bcde")
>>> s.ratio()
1.0
>>>

SequenceMatcher computes and caches detailed information about the
second sequence, so if you want to compare one sequence S against
many sequences, use .set_seq2(S) once and call .set_seq1(x)
repeatedly for each of the other sequences.

See also set_seqs() and set_seq2().


SequenceMatcher.set_seq2:

Set the second sequence to be compared.

The first sequence to be compared is not changed.

>>> s = SequenceMatcher(None, "abcd", "bcde")
>>> s.ratio()
0.75
>>> s.set_seq2("abcd")
>>> s.ratio()
1.0
>>>

SequenceMatcher computes and caches detailed information about the
second sequence, so if you want to compare one sequence S against
many sequences, use .set_seq2(S) once and call .set_seq1(x)
repeatedly for each of the other sequences.

See also set_seqs() and set_seq1().
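
The caching note in the set_seq1() and set_seq2() docstrings suggests a usage pattern that can be sketched as follows. This is a minimal illustration, not part of the module: rank_candidates is a hypothetical helper name, and the sample strings are chosen only for demonstration.

```python
from difflib import SequenceMatcher


def rank_candidates(target, candidates, isjunk=None):
    """Score every candidate against target, best match first.

    Hypothetical helper: the fixed sequence is installed once with
    set_seq2(), so the information SequenceMatcher caches about the
    second sequence is reused for every candidate.
    """
    matcher = SequenceMatcher(isjunk)
    matcher.set_seq2(target)  # the second sequence is analysed here, once
    scores = []
    for candidate in candidates:
        matcher.set_seq1(candidate)  # only the first sequence changes
        scores.append((matcher.ratio(), candidate))
    # Highest similarity first; ties keep their input order.
    scores.sort(key=lambda pair: pair[0], reverse=True)
    return scores


ranked = rank_candidates("bcde", ["abcd", "bcde", "xyz"])
for score, candidate in ranked:
    print("%-5s %.2f" % (candidate, score))
```

Swapping the roles (calling set_seq1() once and set_seq2() in the loop) would give the same ratios here, but it would discard the cached data about the second sequence on every iteration, which is what the docstrings advise against.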