Edit distance

wikipedia:Edit distance
编辑距离,又称Levenshtein距离(也叫做Edit Distance),是指两个字串之间,由一个转成另一个所需的最少编辑操作次数。许可的编辑操作包括将一个字符替换成另一个字符,插入一个字符,删除一个字符。
sitten (k->s)
sittin (e->i)
sitting (->g)
所以kitten和sitting的编辑距离是3。俄罗斯科学家Vladimir Levenshtein在1965年提出这个概念。

The problem is easy if you know the state transition equation.

//From 51nod
we define the function same(i,j) to be 0 if S[i] == T[j], otherwise 1.
Let us show the recursion:
f(i,j) = min(f(i – 1, j – 1) + same(i,j), f(i – 1,j ) + 1, f(i, j – 1) + 1)
initial value:
f(0, j) = j
f(i, 0) = i

The tutorial above 51nod is really good.Nou just LCS or Edit distance,but every tutorial there.
All the follwing are from 51nod.
The difficulty of this problem is that it is difficult to have such operations as "add" and "delete", which is very troublesome. Let's try to understand the problem from a different angle and see it as a string alignment problem. In fact, we can understand the problem from the perspective of bioinformatics comparison genes.

Given the strings S and T, we can use a special character to facilitate the alignment of the two strings. The special character we added is "-". We allow you to add this special character to S and T so that it is the same length, then "align" the two strings, and finally the two strings appear in the same position with different characters. With 1 point deduction, we want to make these two string alignment points as few as possible.

For the example we actually took this alignment:


Note: If you want to align, the two "-" are relatively meaningless, so we ask that this does not happen.
Then take a look:
(1) S, T corresponding positions are ordinary characters, the same, then no points. For example, the position 2, 4
(2) S, T is the normal character, and the difference is 1 point. For example, position 1
(3) S is a special character at this position, and T is a normal character at this position, then 1 point is deducted, for example, position 5
(4) S is a normal character at this position, and T is a special character at that position, then Deduct 1 point, for example, position 3,

let's see what the deduction points correspond to?

(1) No deduction, direct correspondence
(2) Corresponding to the character modification of the corresponding position
in T (3) Corresponding to deleting the character
in T (4) Corresponding to adding the character in T

, the target is clear, and it feels like Like LCS? Let us try:
Let f(i,j) denote the minimum deduction after the alignment of the first i bit of S and the first j bit of T.

Then let's take a look at the last one, the alignment

(1) S[i] == T[j] must be used. At this time, the first i – 1 and j – 1 bits are already aligned. This part must be deducted at least. The minimum deduction in this case is f(i-1,j-1)
(2) is similar to (1), S[i]≠T[j], in which case the least deduction is f(i) -1, j – 1) + 1
(3) The front i position of S and the front (j – 1) bit of T are already aligned, and this part has the least points. In this case, the least deduction is f(i,j-1) + 1
(4) The first (i-1) bit of S has been aligned with the first j position of T, which is the least. In this case, the minimum deduction is f(i,j-1) + 1

What is the value of f(i,j), obviously it depends on which case has the least deduction.

An example on nowcoder:
Link:Edit distance




Intentional analysis:
A basic Edit distance problem,just use the recursion above.


using namespace std;
const int maxn = 1050;
int f[maxn][maxn];
string a,b;
bool same(char x,char  y)
    if(x==y)    return 0;
    return 1;
int main()
        for(int i=0;i<=a.size();i++)
            for(int j=0;j<=b.size();j++)
                if(i==0)    f[i][j]=j;
                else if(j==0)    f[i][j]=i;
                else f[i][j]=min(f[i-1][j-1]+same(a[i-1],b[j-1]),min(f[i-1][j]+1,f[i][j-1]+1));
    return 0;