Edit distance

wikipedia:Edit distance
Description:
编辑距离,又称Levenshtein距离(也叫做Edit Distance),是指两个字串之间,由一个转成另一个所需的最少编辑操作次数。许可的编辑操作包括将一个字符替换成另一个字符,插入一个字符,删除一个字符。
例如将kitten一字转成sitting:
sitten (k->s)
sittin (e->i)
sitting (->g)
所以kitten和sitting的编辑距离是3。俄罗斯科学家Vladimir Levenshtein在1965年提出这个概念。
给出两个字符串a,b,求a和b的编辑距离。

The problem is easy if you know the state transition equation.

//From 51nod
we define the function same(i,j) to be 0 if S[i] == T[j], otherwise 1.
Let us show the recursion:
f(i,j) = min(f(i – 1, j – 1) + same(i,j), f(i – 1,j ) + 1, f(i, j – 1) + 1)
initial value:
f(0, j) = j
f(i, 0) = i

The tutorial above 51nod is really good.Nou just LCS or Edit distance,but every tutorial there.
All the follwing are from 51nod.
The difficulty of this problem is that it is difficult to have such operations as “add” and “delete”, which is very troublesome. Let’s try to understand the problem from a different angle and see it as a string alignment problem. In fact, we can understand the problem from the perspective of bioinformatics comparison genes.

Given the strings S and T, we can use a special character to facilitate the alignment of the two strings. The special character we added is “-“. We allow you to add this special character to S and T so that it is the same length, then “align” the two strings, and finally the two strings appear in the same position with different characters. With 1 point deduction, we want to make these two string alignment points as few as possible.

For the example we actually took this alignment:

12345
ABCF-
DB-FG

Note: If you want to align, the two “-“ are relatively meaningless, so we ask that this does not happen.
Then take a look:
(1) S, T corresponding positions are ordinary characters, the same, then no points. For example, the position 2, 4
(2) S, T is the normal character, and the difference is 1 point. For example, position 1
(3) S is a special character at this position, and T is a normal character at this position, then 1 point is deducted, for example, position 5
(4) S is a normal character at this position, and T is a special character at that position, then Deduct 1 point, for example, position 3,

let’s see what the deduction points correspond to?

(1) No deduction, direct correspondence
(2) Corresponding to the character modification of the corresponding position
in T (3) Corresponding to deleting the character
in T (4) Corresponding to adding the character in T

, the target is clear, and it feels like Like LCS? Let us try:
Let f(i,j) denote the minimum deduction after the alignment of the first i bit of S and the first j bit of T.

Then let’s take a look at the last one, the alignment

(1) S[i] == T[j] must be used. At this time, the first i – 1 and j – 1 bits are already aligned. This part must be deducted at least. The minimum deduction in this case is f(i-1,j-1)
(2) is similar to (1), S[i]≠T[j], in which case the least deduction is f(i) -1, j – 1) + 1
(3) The front i position of S and the front (j – 1) bit of T are already aligned, and this part has the least points. In this case, the least deduction is f(i,j-1) + 1
(4) The first (i-1) bit of S has been aligned with the first j position of T, which is the least. In this case, the minimum deduction is f(i,j-1) + 1

What is the value of f(i,j), obviously it depends on which case has the least deduction.

An example on nowcoder:
Link:Edit distance
Description:
UNIX系统下有一个行编辑器ed,它每次只对一行文本做删除一个字符、插入一个字符或替换一个字符三种操作。例如某一行的内容是“ABC”,经过把第二个字符替换成“D”、删除第一个字符、末尾插入一个字符“B”,这三步操作后,内容就变成了“DCB”。即“ABC”变成“DCB”需要经过3步操作,我们称它们的编辑距离为3。
现在给你两个任意字符串(不包含空格),请帮忙计算它们的最短编辑距离。

输入描述:
输入包含多组数据。

每组数据包含两个字符串m和n,它们仅包含字母,并且长度不超过1024。

输出描述:
对应每组输入,输出最短编辑距离。
示例1
输入
ABC CBCD
ABC DCB
输出
2
3

Intentional analysis:
A basic Edit distance problem,just use the recursion above.

Code:

#include<iostream>
#include<cstdio>
#include<cstring>
#include<algorithm>
using namespace std;
const int maxn = 1050;
int f[maxn][maxn];
string a,b;
bool same(char x,char  y)
{
    if(x==y)    return 0;
    return 1;
}
int main()
{
    while(cin>>a>>b)
    {
        memset(f,0,sizeof(f));
        for(int i=0;i<=a.size();i++)
        {
            for(int j=0;j<=b.size();j++)
            {
                if(i==0)    f[i][j]=j;
                else if(j==0)    f[i][j]=i;
                else f[i][j]=min(f[i-1][j-1]+same(a[i-1],b[j-1]),min(f[i-1][j]+1,f[i][j-1]+1));
            }
        }
        cout<<f[a.size()][b.size()]<<endl;
    }
    return 0;
}