好吧,这是 2D拉宾-卡普 https://en.wikipedia.org/wiki/Rabin%E2%80%93Karp_algorithm方法。
对于下面的讨论,假设我们想要在 (n, n) 矩阵。 (这个概念也适用于矩形矩阵,但我用完了索引。)
这个想法是,对于每个可能的子矩阵,我们计算一个散列。仅当该散列与我们想要查找的矩阵的散列匹配时,我们才会按元素进行比较。
为了提高效率,我们必须避免每次重新计算子矩阵的整个散列。因为我今晚睡得很少,所以我唯一能弄清楚如何轻松做到这一点的哈希函数是各个子矩阵中 1 的总和。我把它留给比我聪明的人作为练习,以找出更好的滚动哈希函数。
现在,如果我们刚刚检查了从 (i, j) 到 (i + m – 1, j + m – 1) 并且知道里面有x个1,我们可以计算子矩阵中1的数量向右一位 – 即从 (i, j + 1) 到 (i + m) – 1, j + m) – 从 (i, j 中减去子向量中 1 的数量>) 到 (i + m – 1, j) 并添加来自 (i>i , j + m) 到 (i + m – 1, j + m)。
如果我们到达大矩阵的右边距,我们将窗口向下移动一位,然后回到左边距,然后再次向下移动一位,然后再次向右移动,依此类推。
Note that this requires O(m) operations, not O(m2) for each candidate. If we do this for every pair of indices, we get O(mn2) work. Thus, by cleverly shifting a window of the size of the potential sub-matrix through the large matrix, we can reduce the amount of work by a factor of m. That is, if we don't get too many hash collisions.
这是一张图片:
当我们将当前窗口右移一位时,减去左边红色列向量中1的个数,加上右边绿色列向量中1的个数,得到新窗口中的1个数。窗户。
我已经使用伟大的实现了这个想法的快速演示Eigen http://eigen.tuxfamily.org/index.php?title=Main_PageC++ 模板库。该示例还使用了 Boost 中的一些内容,但仅用于参数解析和输出格式化,因此如果您没有 Boost 但想尝试代码,您可以轻松摆脱它。索引摆弄有点混乱,但我将不做进一步解释。上面的散文应该足以涵盖它。
#include <cassert>
#include <cstddef>
#include <cstdlib>
#include <iostream>
#include <random>
#include <type_traits>
#include <utility>
#include <boost/format.hpp>
#include <boost/lexical_cast.hpp>
#include <Eigen/Dense>
#define PROGRAM "submatrix"
#define SEED_CSTDLIB_RAND 1
using BitMatrix = Eigen::Matrix<bool, Eigen::Dynamic, Eigen::Dynamic>;
using Index1D = BitMatrix::Index;
using Index2D = std::pair<Index1D, Index1D>;
std::ostream&
operator<<(std::ostream& out, const Index2D& idx)
{
out << "(" << idx.first << ", " << idx.second << ")";
return out;
}
BitMatrix
get_random_bit_matrix(const Index1D rows, const Index1D cols)
{
auto matrix = BitMatrix {rows, cols};
matrix.setRandom();
return matrix;
}
Index2D
findSubMatrix(const BitMatrix& haystack,
const BitMatrix& needle,
Index1D *const collisions_ptr = nullptr) noexcept
{
static_assert(std::is_signed<Index1D>::value, "unsigned index type");
const auto end = Index2D {haystack.rows(), haystack.cols()};
const auto hr = haystack.rows();
const auto hc = haystack.cols();
const auto nr = needle.rows();
const auto nc = needle.cols();
if (nr > hr || nr > hc)
return end;
const auto target = needle.count();
auto current = haystack.block(0, 0, nr - 1, nc).count();
auto j = Index1D {0};
for (auto i = Index1D {0}; i <= hr - nr; ++i)
{
if (j == 0) // at left margin
current += haystack.block(i + nr - 1, 0, 1, nc).count();
else if (j == hc - nc) // at right margin
current += haystack.block(i + nr - 1, hc - nc, 1, nc).count();
else
assert(!"this should never happen");
while (true)
{
if (i % 2 == 0) // moving right
{
if (j > 0)
current += haystack.block(i, j + nc - 1, nr, 1).count();
}
else // moving left
{
if (j < hc - nc)
current += haystack.block(i, j, nr, 1).count();
}
assert(haystack.block(i, j, nr, nc).count() == current);
if (current == target)
{
// TODO: There must be a better way than using cwiseEqual().
if (haystack.block(i, j, nr, nc).cwiseEqual(needle).all())
return Index2D {i, j};
else if (collisions_ptr)
*collisions_ptr += 1;
}
if (i % 2 == 0) // moving right
{
if (j < hc - nc)
{
current -= haystack.block(i, j, nr, 1).count();
++j;
}
else break;
}
else // moving left
{
if (j > 0)
{
current -= haystack.block(i, j + nc - 1, nr, 1).count();
--j;
}
else break;
}
}
if (i % 2 == 0) // at right margin
current -= haystack.block(i, hc - nc, 1, nc).count();
else // at left margin
current -= haystack.block(i, 0, 1, nc).count();
}
return end;
}
int
main(int argc, char * * argv)
{
if (SEED_CSTDLIB_RAND)
{
std::random_device rnddev {};
srand(rnddev());
}
if (argc != 5)
{
std::cerr << "usage: " << PROGRAM
<< " ROWS_HAYSTACK COLUMNS_HAYSTACK"
<< " ROWS_NEEDLE COLUMNS_NEEDLE"
<< std::endl;
return EXIT_FAILURE;
}
auto hr = boost::lexical_cast<Index1D>(argv[1]);
auto hc = boost::lexical_cast<Index1D>(argv[2]);
auto nr = boost::lexical_cast<Index1D>(argv[3]);
auto nc = boost::lexical_cast<Index1D>(argv[4]);
const auto haystack = get_random_bit_matrix(hr, hc);
const auto needle = get_random_bit_matrix(nr, nc);
auto collisions = Index1D {};
const auto idx = findSubMatrix(haystack, needle, &collisions);
const auto end = Index2D {haystack.rows(), haystack.cols()};
std::cout << "This is the haystack:\n\n" << haystack << "\n\n";
std::cout << "This is the needle:\n\n" << needle << "\n\n";
if (idx != end)
std::cout << "Found as sub-matrix at " << idx << ".\n";
else
std::cout << "Not found as sub-matrix.\n";
std::cout << boost::format("There were %d (%.2f %%) hash collisions.\n")
% collisions
% (100.0 * collisions / ((hr - nr) * (hc - nc)));
return (idx != end) ? EXIT_SUCCESS : EXIT_FAILURE;
}
当它编译和运行时,请将上面的内容视为伪代码。我几乎没有尝试优化它。这只是我自己的一个概念验证。