my fork of dmp
Ready to publish
| -rw-r--r-- | CHANGELOG.md | 6 | ||||
| -rw-r--r-- | Cargo.toml | 4 | ||||
| -rw-r--r-- | README.md | 39 | ||||
| -rw-r--r-- | src/dmp.rs | 65 | ||||
| -rw-r--r-- | src/lib.rs | 39 |
5 files changed, 88 insertions, 65 deletions
diff --git a/CHANGELOG.md b/CHANGELOG.md index b53acc3..13253eb 100644 --- a/CHANGELOG.md +++ b/CHANGELOG.md @@ -1,5 +1,11 @@ # CHANGELOG.md +## 0.4.0 + +Performance: + + - Non-algorithmic improvements to `Diff` implementation resulting in `~41%` improvements over previous [benchmarks](). [Here's]() a writeup of the experiments. + ## 0.3.2 Fix: @@ -1,11 +1,11 @@ [package] name = "diff-match-patch-rs" -version = "0.3.2" +version = "0.4.0" edition = "2021" authors = ["Anubhab Bandyopadhyay"] homepage = "https://docs.rs/diff-match-patch-rs" repository = "https://github.com/AnubhabB/diff-match-patch-rs.git" -description = "A high-performance port of Myer's diff algorithm to perform the operations required for synchronizing plain text." +description = "The fastest implementation of Myer's diff algorithm to perform the operations required for synchronizing plain text." readme = "README.md" license = "MIT OR Apache-2.0" keywords = ["diff", "match", "patch", "text-synchronization"] @@ -4,7 +4,7 @@ [<img alt="crates.io" src="https://img.shields.io/crates/v/diff-match-patch-rs" height="20">](https://crates.io/crates/diff-match-patch-rs) [<img alt="docs.rs" src="https://img.shields.io/badge/docs.rs-diff_match_patch_rs?style=for-the-badge&logo=docs.rs&labelColor=%23555555" height="20">](https://docs.rs/diff-match-patch-rs) -A very **fast**, **accurate** and **wasm ready** port of [Diff Match Patch](https://github.com/dmsnell/diff-match-patch) in Rust. The diff implementation is based on [Myers' diff algorithm](https://neil.fraser.name/writing/diff/myers.pdf). +**Fastest**, **accurate** and **wasm ready** implementation of [Diff Match Patch](https://github.com/dmsnell/diff-match-patch) in Rust based on [Myers' diff algorithm](https://neil.fraser.name/writing/diff/myers.pdf). ## Highlights of this crate - Exposes two modes of operating with `diff-match-patch`, a `Efficient` mode and `Compat` mode. While the `Efficient` mode squeezes out the max performance the `Compat` mode ensures compatibility across other libraries or implementations (rust or otherwise). According to [Benchmarks](#benchmarks), our slower `Compat` mode is still faster than other implementations in rust. @@ -17,11 +17,29 @@ A very **fast**, **accurate** and **wasm ready** port of [Diff Match Patch](http - Added a `fuzzer` for sanity - Exposes the same APIs as [Diff Match Patch](https://github.com/dmsnell/diff-match-patch) with minor changes to make it more idiomatic in Rust. +## Benchmarks +Benchmarks are maintained [diff-match-patch-bench repository](https://github.com/AnubhabB/diff-match-patch-rs-bench) + +| Lang. | Library | Diff Avg. | Patch Avg. | Bencher | Mode | Correct | +|:-------:|:----------------------------------------------------------------------------------------:|:---------:|:----------:|:----------:|:-----------:|:-------:| +| `rust` | [diff_match_patch v0.1.1](https://crates.io/crates/diff_match_patch)[^1] | 56.618 ms | 9.00 ms | Criterion | - | ✅ | +| `rust` | [dmp v0.2.0](https://crates.io/crates/dmp) | 56.609 ms | 12.25 ms | Criterion | - | ✅ | +| `rust` | [diff-match-patch-rs](https://github.com/AnubhabB/diff-match-patch-rs.git)<sup>our</sup> | 32.795 ms | 533.17 µs | Criterion | `Efficient`| ✅ | +| `rust` | [diff-match-patch-rs](https://github.com/AnubhabB/diff-match-patch-rs.git)<sup>our</sup> | 33.222 ms | 993.90 µs | Criterion | `Compat` | ✅ | +| `go` | [go-diff](https://github.com/sergi/go-diff)[^2] | 30.31 ms | 118.2 ms | go test | - | ❗ | +| `node` | [diff-match-patch](https://www.npmjs.com/package/diff-match-patch)[^3] | 246.90 ms | 1.07 ms | tinybench | - | ❌ | +| `python`| [diff-match-patch](https://pypi.org/project/diff-match-patch/) | 1.01 s | 0.25 ms | timeit | - | ✅ | + + +[^1]: [diff_match_patch](https://crates.io/crates/diff_match_patch) Adds an extra clone to the iterator because the `patch_apply` method takes mutable refc. to `diffs`. +[^2]: [go-diff](https://github.com/sergi/go-diff) is technically a correct implementation but it generates far more diffs than any other implementation I've tested. E.g. In our test data [here](https://github.com/AnubhabB/diff-match-patch-rs-bench/testdata) the go lib ends up generating `~3000` diffs while other implementations are generating `~2300` diffs. My guess is one of the cleanup passes are skipped by `go-diff`. +[^3]: [diff-match-patch](https://www.npmjs.com/package/diff-match-patch) generated `patch text` and `delta` breaks on `unicode surrogates`. + ## Usage Examples ```toml [dependencies] -diff-match-patch-rs = "0.3.2" +diff-match-patch-rs = "0.4.0" ``` ### `Effitient` mode @@ -175,23 +193,6 @@ Please checkout the `examples` directory of the [source repo](https://github.com <div class="warning">The `Effitient` and `Compat` modes are mutually exclusive and will not generate correct output if used interchangibly at source and destination</div> -## Benchmarks -Benchmarks are maintained [diff-match-patch-bench repository](https://github.com/AnubhabB/diff-match-patch-rs-bench) - -| Lang. | Library | Diff Avg. | Patch Avg. | Bencher | Mode | Correct | -|:-------:|:----------------------------------------------------------------------------------------:|:---------:|:----------:|:----------:|:-----------:|:-------:| -| `rust` | [diff_match_patch v0.1.1](https://crates.io/crates/diff_match_patch)[^2] | 56.618 ms | 9.00 ms | Criterion | - | ✅ | -| `rust` | [dmp v0.2.0](https://crates.io/crates/dmp) | 56.609 ms | 12.25 ms | Criterion | - | ✅ | -| `rust` | [diff-match-patch-rs](https://github.com/AnubhabB/diff-match-patch-rs.git)<sup>our</sup> | 34.660 ms | 533.17 µs | Criterion | `Efficient`| ✅ | -| `rust` | [diff-match-patch-rs](https://github.com/AnubhabB/diff-match-patch-rs.git)<sup>our</sup> | 35.095 ms | 993.90 µs | Criterion | `Compat` | ✅ | -| `go` | [go-diff](https://github.com/sergi/go-diff) | 40.31 ms | 135.2 ms | go test | - | ✅ | -| `node` | [diff-match-patch](https://www.npmjs.com/package/diff-match-patch)[^1] | 246.90 ms | 1.07 ms | tinybench | - | ❌ | -| `python`| [diff-match-patch](https://pypi.org/project/diff-match-patch/) | 1.01 s | 0.25 ms | timeit | - | ✅ | - - -[^1]: [diff-match-patch](https://www.npmjs.com/package/diff-match-patch) generated `patch text` and `delta` breaks on `unicode surrogates`. -[^2]: Adds an extra clone to the iterator because the `patch_apply` method takes mutable refc. to `diffs`. - ## Gotchas **Diff incompatibility with `JavaScript` libs**: @@ -679,8 +679,13 @@ impl DiffMatchPatch { { let v_trg = v_offset as usize + 1; - v1[v_trg] = 0; - v2[v_trg] = 0; + if v_trg < v1.len() { + v1[v_trg] = 0; + } + + if v_trg < v2.len() { + v2[v_trg] = 0; + } } let delta = old_len - new_len; @@ -714,15 +719,21 @@ impl DiffMatchPatch { let (x1, y1) = { let k1_offset = v_offset + k1; - let v1_prev = if (1..v1.len() as isize).contains(&k1_offset) { - v1[k1_offset as usize - 1] - } else { - -1 + let v1_prev = { + let prev_idx = (k1_offset - 1) as usize; + if prev_idx < v1.len() { + v1[prev_idx] + } else { + -1 + } }; - let v1_next = if (0..v1.len() as isize).contains(&(k1_offset + 1)) { - v1[k1_offset as usize + 1] - } else { - -1 + let v1_next = { + let next_idx = (k1_offset + 1) as usize; + if next_idx < v1.len() { + v1[next_idx] + } else { + -1 + } }; let mut x1 = if k1 == -edit_dist || (k1 != edit_dist && v1_prev < v1_next) { @@ -751,10 +762,13 @@ impl DiffMatchPatch { // Ran off the bottom of the graph. k1start += 2; } else if front { - let k2_offset = v_offset + delta - k1; - if (0..v_len).contains(&k2_offset) && v2[k2_offset as usize] != -1 { + let k2_offset = (v_offset + delta - k1) as usize; + if (0..v_len as usize).contains(&k2_offset) + && k2_offset < v2.len() + && v2[k2_offset] != -1 + { // Mirror x2 onto top-left coordinate system. - let x2 = old_len - v2[k2_offset as usize]; + let x2 = old_len - v2[k2_offset]; if x1 >= x2 { // Overlap detected. return T::bisect_split( @@ -775,15 +789,17 @@ impl DiffMatchPatch { let mut k2 = k2start - edit_dist; while k2 < edit_dist + 1 - k2end { let (mut x2, y2) = { - let k2_offset = v_offset + k2; + let k2_offset = (v_offset + k2) as usize; + let prev_idx = k2_offset - 1; + let next_idx = k2_offset + 1; - let v2_prev = if (0..v2.len() as isize).contains(&(k2_offset - 1)) { - v2[k2_offset as usize - 1] + let v2_prev = if prev_idx < v2.len() { + v2[prev_idx] } else { -1 }; - let v2_next = if (0..v2.len() as isize).contains(&(k2_offset + 1)) { - v2[k2_offset as usize + 1] + let v2_next = if next_idx < v2.len() { + v2[next_idx] } else { -1 }; @@ -799,8 +815,8 @@ impl DiffMatchPatch { y2 = yi as isize; x2 = xi as isize; - if (0..v2.len() as isize).contains(&k2_offset) { - v2[k2_offset as usize] = x2; + if k2_offset < v2.len() { + v2[k2_offset] = x2; } (x2, y2) @@ -813,10 +829,11 @@ impl DiffMatchPatch { // Ran off the top of the graph. k2start += 2; } else if !front { - let k1_offset = v_offset + delta - k2; - if (0..v_len).contains(&k1_offset) && v1[k1_offset as usize] != -1 { - let x1 = v1[k1_offset as usize]; - let y1 = v_offset + x1 - k1_offset; + let k1_offset = (v_offset + delta - k2) as usize; + if k1_offset < v_len as usize && k1_offset < v1.len() && v1[k1_offset] != -1 { + // if (0..v_len).contains(&k1_offset) && k1_offset < v1.len() as isize && v1[k1_offset as usize] != -1 { + let x1 = v1[k1_offset]; + let y1 = v_offset + x1 - k1_offset as isize; // Mirror x2 onto top-left coordinate system. x2 = old_len - x2; if x1 >= x2 { @@ -5,8 +5,7 @@ //! [<img alt="docs.rs" src="https://img.shields.io/badge/docs.rs-diff_match_patch_rs?style=for-the-badge&logo=docs.rs&labelColor=%23555555" height="20">](https://docs.rs/diff-match-patch-rs) //! //! -//! A very **fast**, **accurate** and **wasm ready** port of [Diff Match Patch](https://github.com/dmsnell/diff-match-patch) in Rust. The -//! diff implementation is based on [Myers' diff algorithm](https://neil.fraser.name/writing/diff/myers.pdf). +//! **Fastest**, **accurate** and **wasm ready** implementation of [Diff Match Patch](https://github.com/dmsnell/diff-match-patch) in Rust based on [Myers' diff algorithm](https://neil.fraser.name/writing/diff/myers.pdf). //! //! ## Highlights of this crate //! - Exposes two modes of operating with `diff-match-patch`, a `Efficient` mode and `Compat` mode. While the `Efficient` mode squeezes out the max performance the `Compat` mode ensures compatibility across other libraries or implementations (rust or otherwise). According to [Benchmarks](#benchmarks), our slower `Compat` mode is still faster than other implementations in rust. @@ -19,11 +18,28 @@ //! - Added a `fuzzer` for sanity //! - Exposes the same APIs as [Diff Match Patch](https://github.com/dmsnell/diff-match-patch) with minor changes to make it more idiomatic in Rust. //! +//! ## Benchmarks +//! Benchmarks are maintained [diff-match-patch-bench repository](https://github.com/AnubhabB/diff-match-patch-rs-bench) +//! +//! | Lang. | Library | Diff Avg. | Patch Avg. | Bencher | Mode | Correct | +//! |:-------:|:----------------------------------------------------------------------------------------:|:---------:|:----------:|:----------:|:-----------:|:-------:| +//! | `rust` | [diff_match_patch v0.1.1](https://crates.io/crates/diff_match_patch)[^1] | 56.618 ms | 9.00 ms | Criterion | - | ✅ | +//! | `rust` | [dmp v0.2.0](https://crates.io/crates/dmp) | 56.609 ms | 12.25 ms | Criterion | - | ✅ | +//! | `rust` | [diff-match-patch-rs](https://github.com/AnubhabB/diff-match-patch-rs.git)<sup>our</sup> | 32.795 ms | 533.17 µs | Criterion | `Efficient`| ✅ | +//! | `rust` | [diff-match-patch-rs](https://github.com/AnubhabB/diff-match-patch-rs.git)<sup>our</sup> | 33.222 ms | 993.90 µs | Criterion | `Compat` | ✅ | +//! | `go` | [go-diff](https://github.com/sergi/go-diff)[^2] | 30.31 ms | 118.2 ms | go test | - | ❗ | +//! | `node` | [diff-match-patch](https://www.npmjs.com/package/diff-match-patch)[^3] | 246.90 ms | 1.07 ms | tinybench | - | ❌ | +//! | `python`| [diff-match-patch](https://pypi.org/project/diff-match-patch/) | 1.01 s | 0.25 ms | timeit | - | ✅ | +//! +//! [^1]: [diff_match_patch](https://crates.io/crates/diff_match_patch) Adds an extra clone to the iterator because the `patch_apply` method takes mutable refc. to `diffs`. +//! [^2]: [go-diff](https://github.com/sergi/go-diff) is technically a correct implementation but it generates far more diffs than any other implementation I've tested. E.g. In our test data [here](https://github.com/AnubhabB/diff-match-patch-rs-bench/testdata) the go lib ends up generating `~3000` diffs while other implementations are generating `~2300` diffs. My guess is one of the cleanup passes are skipped by `go-diff`.That's probably the reason behind unreasonably high `patch` timings. +//! [^3]: [diff-match-patch](https://www.npmjs.com/package/diff-match-patch) generated `patch text` and `delta` breaks on `unicode surrogates`. +//! //! ## Usage Examples //! //! ```toml //! [dependencies] -//! diff-match-patch-rs = "0.3.2" +//! diff-match-patch-rs = "0.4.0" //! ``` //! //! ### `Effitient` mode @@ -176,23 +192,6 @@ //! //! <div class="warning">The `Effitient` and `Compat` modes are mutually exclusive and will not generate correct output if used interchangibly at source and destination</div> //! -//! ## Benchmarks -//! Benchmarks are maintained [diff-match-patch-bench repository](https://github.com/AnubhabB/diff-match-patch-rs-bench) -//! -//! | Lang. | Library | Diff Avg. | Patch Avg. | Bencher | Mode | Correct | -//! |:-------:|:----------------------------------------------------------------------------------------:|:---------:|:----------:|:----------:|:-----------:|:-------:| -//! | `rust` | [diff_match_patch v0.1.1](https://crates.io/crates/diff_match_patch)[^2] | 56.618 ms | 10.596 ms | Criterion | - | ✅ | -//! | `rust` | [dmp v0.2.0](https://crates.io/crates/dmp) | 56.609 ms | 14.654 ms | Criterion | - | ✅ | -//! | `rust` | [diff-match-patch-rs](https://github.com/AnubhabB/diff-match-patch-rs.git)<sup>our</sup> | 55.149 ms | 631.13 µs | Criterion | `Efficient` | ✅ | -//! | `rust` | [diff-match-patch-rs](https://github.com/AnubhabB/diff-match-patch-rs.git)<sup>our</sup> | 55.275 ms | 1.1703 ms | Criterion | `Compat` | ✅ | -//! | `go` | [go-diff](https://github.com/sergi/go-diff) | 50.31 ms | 135.2 ms | go test | - | ✅ | -//! | `node` | [diff-match-patch](https://www.npmjs.com/package/diff-match-patch)[^1] | 246.90 ms | 1.07 ms | tinybench | - | ❌ | -//! | `python`| [diff-match-patch](https://pypi.org/project/diff-match-patch/) | 1.01 s | 0.25 ms | timeit | - | ✅ | -//! -//! [^1]: [diff-match-patch](https://www.npmjs.com/package/diff-match-patch) generated `patch text` and `delta` breaks on `unicode surrogates`. -//! [^2]: Adds an extra clone to the iterator because the `patch_apply` method takes mutable refc. to `diffs`. -//! -//! //! ## Gotchas //! **Diff incompatibility with `JavaScript` libs**: //! |