//! This module implements the [`StackBlur`] struct and the improved version of
//! the Stackblur algorithm.
//!
//! ## The improved Stackblur algorithm
//!
//! As previously stated, this crate implements a modified version of Stackblur,
//! and understanding the original algorithm is key to understanding the
//! improvements that have been made to it.
//!
//! The original Stackblur is essentially a weighted box-blur. Instead of taking
//! the average of a flat array of pixels, like this:
//!
//! ```text
//! ( 00 + 01 + 02 + 03 + 04 + 05 + 06 ) / 7
//! ```
//!
//! it takes a weighted average of the pixels:
//!
//! ```text
//!              ( 03 +
//!           02 + 03 + 04 +
//!      01 + 02 + 03 + 04 + 05 +
//! 00 + 01 + 02 + 03 + 04 + 05 + 06 ) / 16
//! ```
//!
//! This is a rough approximation of a Gaussian blur, and in fact it's already
//! most of the way there to being a complete algorithm on its own. You can just
//! multiply each pixel by something like `radius + 1 - center_dist` when you
//! make your sum, then divide by `radius * (radius + 2) + 1`.
//!
//! But there are two problems with this:
//!
//! 1. That would be *O*(n(m * 2 + 1)²) complexity, where `n` is the number of
//!    pixels in the image, and `m` is the radius of the blur. This is basically
//!    just as expensive as running a proper convolution filter.
//!
//! 2. How do we handle pixels off the edge of the image?
//!
//! I'm scared of #1 so I'm going to handle #2 first. What most implementations
//! choose to do is just repeat the edge of the image:
//!
//! ```text
//!              ( 00 +
//!           00 + 00 + 01 +
//!      00 + 00 + 00 + 01 + 02 +
//! 00 + 00 + 00 + 00 + 01 + 02 + 03 ) / 16
//! ```
//!
//! But this creates even more problems, because the edge of the blur will be
//! quite biased towards the pixels that aren't even in the image. This is known
//! as "edge bleeding", where a single pixel at the edge of a blur can cause
//! very large and ugly artifacts to show up.
//!
//! The solution, of course, is to not calculate the denominator using that
//! equation from earlier, and instead incrementally update the denominator as
//! pixels are scanned, allowing you to sum a varying number of pixels:
//!
//! ```text
//! ( 00 +
//!   00 + 01 +
//!   00 + 01 + 02 +
//!   00 + 01 + 02 + 03 ) / 10
//!
//!      ( 01 +
//!   00 + 01 + 02 +
//!   00 + 01 + 02 + 03 +
//!   00 + 01 + 02 + 03 + 04 ) / 13
//!
//!           ( 02 +
//!        01 + 02 + 03 +
//!   00 + 01 + 02 + 03 + 04 +
//!   00 + 01 + 02 + 03 + 04 + 05 ) / 15
//!
//!                ( 03 +
//!             02 + 03 + 04 +
//!        01 + 02 + 03 + 04 + 05 +
//!   00 + 01 + 02 + 03 + 04 + 05 + 06 ) / 16
//! ```
//!
//! This is one of the improvements made to the Stackblur algorithm that is
//! implemented by this crate.
//!
//! Now for #1 - the complexity problem. It is possible to make a streaming
//! Stackblur that does not need to read any pixels out of order or more than
//! once, or even know how long the input is. In fact, that's exactly what this
//! crate does.
//!
//! First you fill up the cache with `radius + 1` pixels in order to be able to
//! make the first sum, then you start producing results until you run out of
//! pixels, then you produce the last `radius` results using what you have in
//! cache. I don't have the skill to explain the algorithm in full detail, but
//! it's open-source so you can look at it if you want.
//!
//! In this crate the "cache" is not actually the actual heap of values, as that
//! would be too slow. Instead, the "cache" is a list of changes to make to the
//! rate of change of the sum, and the denominator is updated incrementally in
//! response to the number of leading and trailing values changing.
//!
//! The reason the whole rate of change thing exists is so that adding to the
//! sum and registering it in the cache becomes *O*(1) instead of *O*(2n+1)
//! (where `n` is the radius). It's basically the most important thing that
//! makes the algorithm constant-time.

use std::collections::VecDeque;

use crate::traits::StackBlurrable;

/// An iterator that implements an improved Stackblur algorithm.
///
/// For any [`StackBlurrable`] element `T` and any iterator `I` over items of
/// type `T`, [`StackBlur`] will yield the same amount of elements as `I`,
/// blurred together according to the improved Stackblur algorithm as described
/// by the [`iter`][crate::iter] module documentation.
///
/// ## Usage
///
/// [`StackBlur`] just wraps another iterator with a radius and a cache (called
/// `ops` by [`StackBlur::new`]):
///
/// ```compile_fail
/// # use std::collections::VecDeque;
/// # use std::num::Wrapping;
/// # use stackblur_iter::iter::StackBlur;
/// #
/// let arr = [255u8, 0, 0, 0, 127, 0, 0, 0, 255u8];
/// let iter = arr.iter().copied().map(|i| Wrapping(i as usize));
/// let blur = StackBlur::new(iter, 2, VecDeque::new());
/// ```
///
/// That example unfortunately doesn't compile because `Wrapping<usize>` doesn't
/// implement `Mul<usize>` and `Div<usize>`, which are required for the
/// averaging that [`StackBlur`] performs. It's recommended to create a newtype
/// wrapper around whatever you plan on blurring, and implement all the traits
/// required by [`StackBlurrable`].
///
/// A [`StackBlur`] always yields exactly as many items as its inner iterator
/// does. Additionally, a non-fused iterator which repeats will cause the
/// [`StackBlur`] to repeat as well.
pub struct StackBlur<'a, T: StackBlurrable, I: Iterator<Item = T>> {
    iter: I,
    radius: usize,
    sum: T,
    rate: T,
    dnom: usize,
    ops: &'a mut VecDeque<T>,
    leading: usize,
    trailing: usize,
    done: bool,
    first: bool,
}

impl<'a, T: StackBlurrable, I: Iterator<Item = T>> StackBlur<'a, T, I> {
    /// Creates a new [`StackBlur`] from the provided iterator, radius, and
    /// [`VecDeque`].
    ///
    /// The iterator is not advanced until a call to [`StackBlur::next`].
    pub fn new(iter: I, radius: usize, ops: &'a mut VecDeque<T>) -> Self {
        Self {
            iter,
            radius,
            sum: T::default(),
            rate: T::default(),
            dnom: 0,
            leading: 0,
            trailing: 0,
            ops,
            done: true,
            first: true,
        }
    }

    fn init(&mut self) {
        self.done = false;

        for sub in 0..=self.radius {
            let Some(item) = self.iter.next() else { break };

            if sub == 0 {
                let start = self.radius + 1;
                let needed = start * 2 + 2;
                self.ops.reserve(needed.saturating_sub(self.ops.capacity()));
                self.ops
                    .iter_mut()
                    .take(start)
                    .for_each(|place| *place = T::default());
                self.ops.resize_with(start, T::default);

                self.sum = T::default();
                self.rate = T::default();
                self.dnom = 0;
                self.leading = 0;
                self.trailing = 0;
            }

            let mul = self.radius + 1 - sub;
            self.sum += item * mul;
            self.rate += item;
            self.dnom += mul;

            if self.dnom > mul {
                self.trailing += 1;
            }

            self.ops[sub] -= item * 2;
            self.ops.push_back(item);
        }

        if self.dnom == 0 {
            self.done = true;
        }
    }
}

impl<T: StackBlurrable, I: Iterator<Item = T>> Iterator for StackBlur<'_, T, I> {
    type Item = T;

    #[inline]
    fn next(&mut self) -> Option<Self::Item> {
        if self.done {
            self.init();

            if !std::mem::replace(&mut self.first, false) {
                return None;
            }
        }

        let result = self.sum / self.dnom;

        self.rate += self.ops.pop_front().unwrap();
        self.sum += self.rate;

        if self.leading < self.radius {
            self.leading += 1;
            self.dnom += self.radius + 1 - self.leading;
        }

        if self.radius == 0 || self.trailing == self.radius {
            if let Some(item) = self.iter.next() {
                self.sum += item;
                self.rate += item;
                self.ops[self.radius] -= item * 2;
                self.ops.push_back(item);
            } else if self.radius > 0 {
                self.dnom -= self.radius + 1 - self.trailing;
                self.trailing -= 1;
            } else {
                self.done = true;
            }
        } else if self.trailing > 0 {
            self.dnom -= self.radius + 1 - self.trailing;
            self.trailing -= 1;
        } else if self.trailing == 0 {
            self.done = true;
        }

        Some(result)
    }
}