my fork of dmp
Ready to publish
Anubhab Bandyopadhyay 2025-01-03
parent a56e49a · commit 41166ab
-rw-r--r--CHANGELOG.md6
-rw-r--r--Cargo.toml4
-rw-r--r--README.md39
-rw-r--r--src/dmp.rs65
-rw-r--r--src/lib.rs39
5 files changed, 88 insertions, 65 deletions
diff --git a/CHANGELOG.md b/CHANGELOG.md
index b53acc3..13253eb 100644
--- a/CHANGELOG.md
+++ b/CHANGELOG.md
@@ -1,5 +1,11 @@
# CHANGELOG.md
+## 0.4.0
+
+Performance:
+
+ - Non-algorithmic improvements to `Diff` implementation resulting in `~41%` improvements over previous [benchmarks](). [Here's]() a writeup of the experiments.
+
## 0.3.2
Fix:
diff --git a/Cargo.toml b/Cargo.toml
index 3cc35cb..81ed106 100644
--- a/Cargo.toml
+++ b/Cargo.toml
@@ -1,11 +1,11 @@
[package]
name = "diff-match-patch-rs"
-version = "0.3.2"
+version = "0.4.0"
edition = "2021"
authors = ["Anubhab Bandyopadhyay"]
homepage = "https://docs.rs/diff-match-patch-rs"
repository = "https://github.com/AnubhabB/diff-match-patch-rs.git"
-description = "A high-performance port of Myer's diff algorithm to perform the operations required for synchronizing plain text."
+description = "The fastest implementation of Myer's diff algorithm to perform the operations required for synchronizing plain text."
readme = "README.md"
license = "MIT OR Apache-2.0"
keywords = ["diff", "match", "patch", "text-synchronization"]
diff --git a/README.md b/README.md
index 179dbca..c4e3316 100644
--- a/README.md
+++ b/README.md
@@ -4,7 +4,7 @@
[<img alt="crates.io" src="https://img.shields.io/crates/v/diff-match-patch-rs" height="20">](https://crates.io/crates/diff-match-patch-rs)
[<img alt="docs.rs" src="https://img.shields.io/badge/docs.rs-diff_match_patch_rs?style=for-the-badge&logo=docs.rs&labelColor=%23555555" height="20">](https://docs.rs/diff-match-patch-rs)
-A very **fast**, **accurate** and **wasm ready** port of [Diff Match Patch](https://github.com/dmsnell/diff-match-patch) in Rust. The diff implementation is based on [Myers' diff algorithm](https://neil.fraser.name/writing/diff/myers.pdf).
+**Fastest**, **accurate** and **wasm ready** implementation of [Diff Match Patch](https://github.com/dmsnell/diff-match-patch) in Rust based on [Myers' diff algorithm](https://neil.fraser.name/writing/diff/myers.pdf).
## Highlights of this crate
- Exposes two modes of operating with `diff-match-patch`, a `Efficient` mode and `Compat` mode. While the `Efficient` mode squeezes out the max performance the `Compat` mode ensures compatibility across other libraries or implementations (rust or otherwise). According to [Benchmarks](#benchmarks), our slower `Compat` mode is still faster than other implementations in rust.
@@ -17,11 +17,29 @@ A very **fast**, **accurate** and **wasm ready** port of [Diff Match Patch](http
- Added a `fuzzer` for sanity
- Exposes the same APIs as [Diff Match Patch](https://github.com/dmsnell/diff-match-patch) with minor changes to make it more idiomatic in Rust.
+## Benchmarks
+Benchmarks are maintained [diff-match-patch-bench repository](https://github.com/AnubhabB/diff-match-patch-rs-bench)
+
+| Lang. | Library | Diff Avg. | Patch Avg. | Bencher | Mode | Correct |
+|:-------:|:----------------------------------------------------------------------------------------:|:---------:|:----------:|:----------:|:-----------:|:-------:|
+| `rust` | [diff_match_patch v0.1.1](https://crates.io/crates/diff_match_patch)[^1] | 56.618 ms | 9.00 ms | Criterion | - | ✅ |
+| `rust` | [dmp v0.2.0](https://crates.io/crates/dmp) | 56.609 ms | 12.25 ms | Criterion | - | ✅ |
+| `rust` | [diff-match-patch-rs](https://github.com/AnubhabB/diff-match-patch-rs.git)<sup>our</sup> | 32.795 ms | 533.17 µs | Criterion | `Efficient`| ✅ |
+| `rust` | [diff-match-patch-rs](https://github.com/AnubhabB/diff-match-patch-rs.git)<sup>our</sup> | 33.222 ms | 993.90 µs | Criterion | `Compat` | ✅ |
+| `go` | [go-diff](https://github.com/sergi/go-diff)[^2] | 30.31 ms | 118.2 ms | go test | - | ❗ |
+| `node` | [diff-match-patch](https://www.npmjs.com/package/diff-match-patch)[^3] | 246.90 ms | 1.07 ms | tinybench | - | ❌ |
+| `python`| [diff-match-patch](https://pypi.org/project/diff-match-patch/) | 1.01 s | 0.25 ms | timeit | - | ✅ |
+
+
+[^1]: [diff_match_patch](https://crates.io/crates/diff_match_patch) Adds an extra clone to the iterator because the `patch_apply` method takes mutable refc. to `diffs`.
+[^2]: [go-diff](https://github.com/sergi/go-diff) is technically a correct implementation but it generates far more diffs than any other implementation I've tested. E.g. In our test data [here](https://github.com/AnubhabB/diff-match-patch-rs-bench/testdata) the go lib ends up generating `~3000` diffs while other implementations are generating `~2300` diffs. My guess is one of the cleanup passes are skipped by `go-diff`.
+[^3]: [diff-match-patch](https://www.npmjs.com/package/diff-match-patch) generated `patch text` and `delta` breaks on `unicode surrogates`.
+
## Usage Examples
```toml
[dependencies]
-diff-match-patch-rs = "0.3.2"
+diff-match-patch-rs = "0.4.0"
```
### `Effitient` mode
@@ -175,23 +193,6 @@ Please checkout the `examples` directory of the [source repo](https://github.com
<div class="warning">The `Effitient` and `Compat` modes are mutually exclusive and will not generate correct output if used interchangibly at source and destination</div>
-## Benchmarks
-Benchmarks are maintained [diff-match-patch-bench repository](https://github.com/AnubhabB/diff-match-patch-rs-bench)
-
-| Lang. | Library | Diff Avg. | Patch Avg. | Bencher | Mode | Correct |
-|:-------:|:----------------------------------------------------------------------------------------:|:---------:|:----------:|:----------:|:-----------:|:-------:|
-| `rust` | [diff_match_patch v0.1.1](https://crates.io/crates/diff_match_patch)[^2] | 56.618 ms | 9.00 ms | Criterion | - | ✅ |
-| `rust` | [dmp v0.2.0](https://crates.io/crates/dmp) | 56.609 ms | 12.25 ms | Criterion | - | ✅ |
-| `rust` | [diff-match-patch-rs](https://github.com/AnubhabB/diff-match-patch-rs.git)<sup>our</sup> | 34.660 ms | 533.17 µs | Criterion | `Efficient`| ✅ |
-| `rust` | [diff-match-patch-rs](https://github.com/AnubhabB/diff-match-patch-rs.git)<sup>our</sup> | 35.095 ms | 993.90 µs | Criterion | `Compat` | ✅ |
-| `go` | [go-diff](https://github.com/sergi/go-diff) | 40.31 ms | 135.2 ms | go test | - | ✅ |
-| `node` | [diff-match-patch](https://www.npmjs.com/package/diff-match-patch)[^1] | 246.90 ms | 1.07 ms | tinybench | - | ❌ |
-| `python`| [diff-match-patch](https://pypi.org/project/diff-match-patch/) | 1.01 s | 0.25 ms | timeit | - | ✅ |
-
-
-[^1]: [diff-match-patch](https://www.npmjs.com/package/diff-match-patch) generated `patch text` and `delta` breaks on `unicode surrogates`.
-[^2]: Adds an extra clone to the iterator because the `patch_apply` method takes mutable refc. to `diffs`.
-
## Gotchas
**Diff incompatibility with `JavaScript` libs**:
diff --git a/src/dmp.rs b/src/dmp.rs
index e1905ec..fef0ebb 100644
--- a/src/dmp.rs
+++ b/src/dmp.rs
@@ -679,8 +679,13 @@ impl DiffMatchPatch {
{
let v_trg = v_offset as usize + 1;
- v1[v_trg] = 0;
- v2[v_trg] = 0;
+ if v_trg < v1.len() {
+ v1[v_trg] = 0;
+ }
+
+ if v_trg < v2.len() {
+ v2[v_trg] = 0;
+ }
}
let delta = old_len - new_len;
@@ -714,15 +719,21 @@ impl DiffMatchPatch {
let (x1, y1) = {
let k1_offset = v_offset + k1;
- let v1_prev = if (1..v1.len() as isize).contains(&k1_offset) {
- v1[k1_offset as usize - 1]
- } else {
- -1
+ let v1_prev = {
+ let prev_idx = (k1_offset - 1) as usize;
+ if prev_idx < v1.len() {
+ v1[prev_idx]
+ } else {
+ -1
+ }
};
- let v1_next = if (0..v1.len() as isize).contains(&(k1_offset + 1)) {
- v1[k1_offset as usize + 1]
- } else {
- -1
+ let v1_next = {
+ let next_idx = (k1_offset + 1) as usize;
+ if next_idx < v1.len() {
+ v1[next_idx]
+ } else {
+ -1
+ }
};
let mut x1 = if k1 == -edit_dist || (k1 != edit_dist && v1_prev < v1_next) {
@@ -751,10 +762,13 @@ impl DiffMatchPatch {
// Ran off the bottom of the graph.
k1start += 2;
} else if front {
- let k2_offset = v_offset + delta - k1;
- if (0..v_len).contains(&k2_offset) && v2[k2_offset as usize] != -1 {
+ let k2_offset = (v_offset + delta - k1) as usize;
+ if (0..v_len as usize).contains(&k2_offset)
+ && k2_offset < v2.len()
+ && v2[k2_offset] != -1
+ {
// Mirror x2 onto top-left coordinate system.
- let x2 = old_len - v2[k2_offset as usize];
+ let x2 = old_len - v2[k2_offset];
if x1 >= x2 {
// Overlap detected.
return T::bisect_split(
@@ -775,15 +789,17 @@ impl DiffMatchPatch {
let mut k2 = k2start - edit_dist;
while k2 < edit_dist + 1 - k2end {
let (mut x2, y2) = {
- let k2_offset = v_offset + k2;
+ let k2_offset = (v_offset + k2) as usize;
+ let prev_idx = k2_offset - 1;
+ let next_idx = k2_offset + 1;
- let v2_prev = if (0..v2.len() as isize).contains(&(k2_offset - 1)) {
- v2[k2_offset as usize - 1]
+ let v2_prev = if prev_idx < v2.len() {
+ v2[prev_idx]
} else {
-1
};
- let v2_next = if (0..v2.len() as isize).contains(&(k2_offset + 1)) {
- v2[k2_offset as usize + 1]
+ let v2_next = if next_idx < v2.len() {
+ v2[next_idx]
} else {
-1
};
@@ -799,8 +815,8 @@ impl DiffMatchPatch {
y2 = yi as isize;
x2 = xi as isize;
- if (0..v2.len() as isize).contains(&k2_offset) {
- v2[k2_offset as usize] = x2;
+ if k2_offset < v2.len() {
+ v2[k2_offset] = x2;
}
(x2, y2)
@@ -813,10 +829,11 @@ impl DiffMatchPatch {
// Ran off the top of the graph.
k2start += 2;
} else if !front {
- let k1_offset = v_offset + delta - k2;
- if (0..v_len).contains(&k1_offset) && v1[k1_offset as usize] != -1 {
- let x1 = v1[k1_offset as usize];
- let y1 = v_offset + x1 - k1_offset;
+ let k1_offset = (v_offset + delta - k2) as usize;
+ if k1_offset < v_len as usize && k1_offset < v1.len() && v1[k1_offset] != -1 {
+ // if (0..v_len).contains(&k1_offset) && k1_offset < v1.len() as isize && v1[k1_offset as usize] != -1 {
+ let x1 = v1[k1_offset];
+ let y1 = v_offset + x1 - k1_offset as isize;
// Mirror x2 onto top-left coordinate system.
x2 = old_len - x2;
if x1 >= x2 {
diff --git a/src/lib.rs b/src/lib.rs
index 8431329..c5c1cdf 100644
--- a/src/lib.rs
+++ b/src/lib.rs
@@ -5,8 +5,7 @@
//! [<img alt="docs.rs" src="https://img.shields.io/badge/docs.rs-diff_match_patch_rs?style=for-the-badge&logo=docs.rs&labelColor=%23555555" height="20">](https://docs.rs/diff-match-patch-rs)
//!
//!
-//! A very **fast**, **accurate** and **wasm ready** port of [Diff Match Patch](https://github.com/dmsnell/diff-match-patch) in Rust. The
-//! diff implementation is based on [Myers' diff algorithm](https://neil.fraser.name/writing/diff/myers.pdf).
+//! **Fastest**, **accurate** and **wasm ready** implementation of [Diff Match Patch](https://github.com/dmsnell/diff-match-patch) in Rust based on [Myers' diff algorithm](https://neil.fraser.name/writing/diff/myers.pdf).
//!
//! ## Highlights of this crate
//! - Exposes two modes of operating with `diff-match-patch`, a `Efficient` mode and `Compat` mode. While the `Efficient` mode squeezes out the max performance the `Compat` mode ensures compatibility across other libraries or implementations (rust or otherwise). According to [Benchmarks](#benchmarks), our slower `Compat` mode is still faster than other implementations in rust.
@@ -19,11 +18,28 @@
//! - Added a `fuzzer` for sanity
//! - Exposes the same APIs as [Diff Match Patch](https://github.com/dmsnell/diff-match-patch) with minor changes to make it more idiomatic in Rust.
//!
+//! ## Benchmarks
+//! Benchmarks are maintained [diff-match-patch-bench repository](https://github.com/AnubhabB/diff-match-patch-rs-bench)
+//!
+//! | Lang. | Library | Diff Avg. | Patch Avg. | Bencher | Mode | Correct |
+//! |:-------:|:----------------------------------------------------------------------------------------:|:---------:|:----------:|:----------:|:-----------:|:-------:|
+//! | `rust` | [diff_match_patch v0.1.1](https://crates.io/crates/diff_match_patch)[^1] | 56.618 ms | 9.00 ms | Criterion | - | ✅ |
+//! | `rust` | [dmp v0.2.0](https://crates.io/crates/dmp) | 56.609 ms | 12.25 ms | Criterion | - | ✅ |
+//! | `rust` | [diff-match-patch-rs](https://github.com/AnubhabB/diff-match-patch-rs.git)<sup>our</sup> | 32.795 ms | 533.17 µs | Criterion | `Efficient`| ✅ |
+//! | `rust` | [diff-match-patch-rs](https://github.com/AnubhabB/diff-match-patch-rs.git)<sup>our</sup> | 33.222 ms | 993.90 µs | Criterion | `Compat` | ✅ |
+//! | `go` | [go-diff](https://github.com/sergi/go-diff)[^2] | 30.31 ms | 118.2 ms | go test | - | ❗ |
+//! | `node` | [diff-match-patch](https://www.npmjs.com/package/diff-match-patch)[^3] | 246.90 ms | 1.07 ms | tinybench | - | ❌ |
+//! | `python`| [diff-match-patch](https://pypi.org/project/diff-match-patch/) | 1.01 s | 0.25 ms | timeit | - | ✅ |
+//!
+//! [^1]: [diff_match_patch](https://crates.io/crates/diff_match_patch) Adds an extra clone to the iterator because the `patch_apply` method takes mutable refc. to `diffs`.
+//! [^2]: [go-diff](https://github.com/sergi/go-diff) is technically a correct implementation but it generates far more diffs than any other implementation I've tested. E.g. In our test data [here](https://github.com/AnubhabB/diff-match-patch-rs-bench/testdata) the go lib ends up generating `~3000` diffs while other implementations are generating `~2300` diffs. My guess is one of the cleanup passes are skipped by `go-diff`.That's probably the reason behind unreasonably high `patch` timings.
+//! [^3]: [diff-match-patch](https://www.npmjs.com/package/diff-match-patch) generated `patch text` and `delta` breaks on `unicode surrogates`.
+//!
//! ## Usage Examples
//!
//! ```toml
//! [dependencies]
-//! diff-match-patch-rs = "0.3.2"
+//! diff-match-patch-rs = "0.4.0"
//! ```
//!
//! ### `Effitient` mode
@@ -176,23 +192,6 @@
//!
//! <div class="warning">The `Effitient` and `Compat` modes are mutually exclusive and will not generate correct output if used interchangibly at source and destination</div>
//!
-//! ## Benchmarks
-//! Benchmarks are maintained [diff-match-patch-bench repository](https://github.com/AnubhabB/diff-match-patch-rs-bench)
-//!
-//! | Lang. | Library | Diff Avg. | Patch Avg. | Bencher | Mode | Correct |
-//! |:-------:|:----------------------------------------------------------------------------------------:|:---------:|:----------:|:----------:|:-----------:|:-------:|
-//! | `rust` | [diff_match_patch v0.1.1](https://crates.io/crates/diff_match_patch)[^2] | 56.618 ms | 10.596 ms | Criterion | - | ✅ |
-//! | `rust` | [dmp v0.2.0](https://crates.io/crates/dmp) | 56.609 ms | 14.654 ms | Criterion | - | ✅ |
-//! | `rust` | [diff-match-patch-rs](https://github.com/AnubhabB/diff-match-patch-rs.git)<sup>our</sup> | 55.149 ms | 631.13 µs | Criterion | `Efficient` | ✅ |
-//! | `rust` | [diff-match-patch-rs](https://github.com/AnubhabB/diff-match-patch-rs.git)<sup>our</sup> | 55.275 ms | 1.1703 ms | Criterion | `Compat` | ✅ |
-//! | `go` | [go-diff](https://github.com/sergi/go-diff) | 50.31 ms | 135.2 ms | go test | - | ✅ |
-//! | `node` | [diff-match-patch](https://www.npmjs.com/package/diff-match-patch)[^1] | 246.90 ms | 1.07 ms | tinybench | - | ❌ |
-//! | `python`| [diff-match-patch](https://pypi.org/project/diff-match-patch/) | 1.01 s | 0.25 ms | timeit | - | ✅ |
-//!
-//! [^1]: [diff-match-patch](https://www.npmjs.com/package/diff-match-patch) generated `patch text` and `delta` breaks on `unicode surrogates`.
-//! [^2]: Adds an extra clone to the iterator because the `patch_apply` method takes mutable refc. to `diffs`.
-//!
-//!
//! ## Gotchas
//! **Diff incompatibility with `JavaScript` libs**:
//!