PoseWarping: How to vectorize this for loop (z-buffer)

2024-03-21

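I'm trying to warp a frame from view 1 to view 2 using the ground-truth depth map, pose information, and camera matrix. I've been able to remove most of the for loops and vectorize the code, except for one loop. During warping, multiple pixels in view 1 can map to a single location in view 2 because of occlusions. In that case, I need to pick the pixel with the lowest depth value (the foreground object). I haven't been able to vectorize this part of the code. Any help vectorizing this for loop is appreciated.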
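The "keep the pixel with the lowest depth" rule is a z-buffer, and NumPy can build one without a Python loop using `numpy.minimum.at`, an unbuffered ufunc method that takes every repeated index into account. Below is a minimal sketch under these assumptions: `zbuffer_min_depth` is a hypothetical helper, `trans_pos_int` is the `(H, W, 2)` array of rounded `[x, y]` targets computed in `warp_frame_04`, and the view-1 `depth` is the ranking key, as in the question.

```python
import numpy


def zbuffer_min_depth(trans_pos_int, depth, height, width):
    """Hypothetical helper: vectorized z-buffer via numpy.minimum.at.

    trans_pos_int: (H, W, 2) rounded target positions [x, y] in view 2.
    depth:         (H, W) depth used as the z-buffer key (smaller = closer).
    Returns the (height, width) buffer of minimum depths (inf where nothing
    lands) and an (H, W) mask of source pixels whose depth equals that minimum.
    """
    x = trans_pos_int[..., 0].ravel()
    y = trans_pos_int[..., 1].ravel()
    d = depth.ravel().astype(float)
    inside = (x >= 0) & (x < width) & (y >= 0) & (y < height)
    flat = y[inside] * width + x[inside]  # linear index of each in-frame target
    zbuf = numpy.full(height * width, numpy.inf)
    # Unbuffered: every repeated index in `flat` takes part in the minimum.
    numpy.minimum.at(zbuf, flat, d[inside])
    # A source pixel is "visible" if it is the (or a) nearest one at its target.
    visible = numpy.zeros(d.shape, dtype=bool)
    visible[inside] = d[inside] == zbuf[flat]
    return zbuf.reshape(height, width), visible.reshape(depth.shape)
```

If two pixels reach the same target with exactly equal depth, both pass the equality test, so this mask alone does not break ties.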

Context:

I'm trying to warp an image into a new view, given the ground-truth pose, depth, and camera matrix. After computing the warped locations, I round them to the nearest integer. Any suggestions for implementing inverse bilinear interpolation are also welcome. My images are full-HD resolution, so warping the frames to the new view takes a long time. If I can vectorize the code, I plan to port it to TensorFlow or PyTorch and run it on a GPU. Any other suggestions for speeding up the warping, or pointers to existing implementations, are also welcome.
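For the "inverse bilinear interpolation" part, one common forward-warping variant is bilinear splatting: each source pixel is spread over the four integer pixels around its unrounded target position with bilinear weights, and the accumulated colours are divided by the accumulated weights. A minimal NumPy sketch, assuming a hypothetical helper `bilinear_splat`, finite unrounded targets (the `trans_pos` array in `warp_frame_04` before `numpy.round`), a target image of the same size as the source, and no occlusion handling, so nearby foreground and background pixels are blended:

```python
import numpy


def bilinear_splat(frame1, trans_pos):
    """Hypothetical helper: forward-warp by bilinear splatting, no occlusion test.

    frame1:    (H, W, C) source image.
    trans_pos: (H, W, 2) finite, unrounded target positions [x, y] in view 2.
    Returns the normalized warped image and the accumulated weight map.
    """
    h, w, c = frame1.shape
    x = trans_pos[..., 0].ravel()
    y = trans_pos[..., 1].ravel()
    colors = frame1.reshape(-1, c).astype(float)
    x0 = numpy.floor(x).astype(int)
    y0 = numpy.floor(y).astype(int)
    fx, fy = x - x0, y - y0  # fractional offsets in [0, 1)
    acc = numpy.zeros((h * w, c))  # weighted colour sums per target pixel
    wsum = numpy.zeros(h * w)      # weight sums per target pixel
    neighbours = ((0, 0, (1 - fx) * (1 - fy)), (1, 0, fx * (1 - fy)),
                  (0, 1, (1 - fx) * fy), (1, 1, fx * fy))
    for dx, dy, wgt in neighbours:  # 4 iterations, each vectorized over pixels
        xi, yi = x0 + dx, y0 + dy
        ok = (xi >= 0) & (xi < w) & (yi >= 0) & (yi < h)
        idx = yi[ok] * w + xi[ok]
        # add.at accumulates correctly when several sources hit the same idx
        numpy.add.at(acc, idx, wgt[ok][:, None] * colors[ok])
        numpy.add.at(wsum, idx, wgt[ok])
    out = numpy.zeros_like(acc)
    hit = wsum > 0
    out[hit] = acc[hit] / wsum[hit][:, None]
    return out.reshape(h, w, c), wsum.reshape(h, w)
```

The scatter-add pattern used here has direct counterparts in GPU frameworks, for example PyTorch's `Tensor.index_add_`. Combining the splat with a depth test is not covered by this sketch.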

Code:

import numpy
from tqdm import tqdm


def warp_frame_04(frame1: numpy.ndarray, depth: numpy.ndarray, intrinsic: numpy.ndarray, transformation1: numpy.ndarray,
                  transformation2: numpy.ndarray, convert_to_uint: bool = True, verbose_log: bool = True):
    """
    Vectorized Forward warping. Nearest Neighbor.
    Offset requirement of warp_frame_03() overcome.
    mask: 1 if pixel found, 0 if no pixel found
    Drawback: Nearest neighbor, collision resolving not vectorized
    """
    height, width, _ = frame1.shape
    assert depth.shape == (height, width)
    transformation = numpy.matmul(transformation2, numpy.linalg.inv(transformation1))

    y1d = numpy.arange(height)
    x1d = numpy.arange(width)
    x2d, y2d = numpy.meshgrid(x1d, y1d)
    ones_2d = numpy.ones(shape=(height, width))
    ones_4d = ones_2d[:, :, None, None]
    pos_vectors_homo = numpy.stack([x2d, y2d, ones_2d], axis=2)[:, :, :, None]

    intrinsic_inv = numpy.linalg.inv(intrinsic)
    intrinsic_4d = intrinsic[None, None]
    intrinsic_inv_4d = intrinsic_inv[None, None]
    depth_4d = depth[:, :, None, None]
    trans_4d = transformation[None, None]

    # Back-project every pixel into view-1 camera coordinates: d * K^-1 [u, v, 1]^T
    unnormalized_pos = numpy.matmul(intrinsic_inv_4d, pos_vectors_homo)
    world_points = depth_4d * unnormalized_pos
    # Transform the points with T2 @ inv(T1) (homogeneous coordinates)
    world_points_homo = numpy.concatenate([world_points, ones_4d], axis=2)
    trans_world_homo = numpy.matmul(trans_4d, world_points_homo)
    trans_world = trans_world_homo[:, :, :3]
    # Project with K, divide by z, and round to the nearest pixel in view 2
    trans_norm_points = numpy.matmul(intrinsic_4d, trans_world)
    trans_pos = trans_norm_points[:, :, :2, 0] / trans_norm_points[:, :, 2:3, 0]
    trans_pos_int = numpy.round(trans_pos).astype('int')

    # Solve occlusions
    a = trans_pos_int.reshape(-1, 2)
    d = depth.ravel()
    b = numpy.unique(a, axis=0, return_index=True, return_counts=True)
    collision_indices = b[1][b[2] >= 2]  # Unique indices which are involved in collision
    for c1 in tqdm(collision_indices, disable=not verbose_log):
        cl = a[c1].copy()  # Collision Location
        ci = numpy.where((a[:, 0] == cl[0]) & (a[:, 1] == cl[1]))[0]  # Colliding Indices: Indices colliding for cl
        cci = ci[numpy.argmin(d[ci])]  # Closest Collision Index: Index of the nearest point among ci
        a[ci] = [-1, -1]
        a[cci] = cl
    trans_pos_solved = a.reshape(height, width, 2)

    # Offset both axes by 1 and set any out of frame motion to edge. Then crop 1-pixel thick edge
    trans_pos_offset = trans_pos_solved + 1
    trans_pos_offset[:, :, 0] = numpy.clip(trans_pos_offset[:, :, 0], a_min=0, a_max=width + 1)
    trans_pos_offset[:, :, 1] = numpy.clip(trans_pos_offset[:, :, 1], a_min=0, a_max=height + 1)

    warped_image = numpy.ones(shape=(height + 2, width + 2, 3)) * numpy.nan
    warped_image[trans_pos_offset[:, :, 1], trans_pos_offset[:, :, 0]] = frame1
    cropped_warped_image = warped_image[1:-1, 1:-1]
    mask = numpy.isfinite(cropped_warped_image)
    cropped_warped_image[~mask] = 0
    if convert_to_uint:
        final_warped_image = cropped_warped_image.astype('uint8')
    else:
        final_warped_image = cropped_warped_image
    mask = mask[:, :, 0]
    return final_warped_image, mask

Code explanation:

I use equations [1,2] to get the pixel locations in view2.
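The equations in [1,2] are only available as an image and a linked question. For reference, this is the standard pinhole reprojection that `warp_frame_04` computes, assuming `transformation1` and `transformation2` are 4×4 world-to-camera matrices $T_1$, $T_2$ (with last row $(0, 0, 0, 1)$), `intrinsic` is $K$, and $d(u_1, v_1)$ is the view-1 depth at pixel $(u_1, v_1)$:

$$
\mathbf{X}_1 = d(u_1, v_1)\, K^{-1} \begin{bmatrix} u_1 \\ v_1 \\ 1 \end{bmatrix}, \qquad
\begin{bmatrix} \mathbf{X}_2 \\ 1 \end{bmatrix} = T_2\, T_1^{-1} \begin{bmatrix} \mathbf{X}_1 \\ 1 \end{bmatrix}, \qquad
\begin{bmatrix} x \\ y \\ z \end{bmatrix} = K\, \mathbf{X}_2, \qquad
(u_2, v_2) = \left( \frac{x}{z}, \frac{y}{z} \right)
$$

Here $\mathbf{X}_1$ is `world_points`, $\mathbf{X}_2$ is `trans_world`, and $(u_2, v_2)$ is `trans_pos`. For an intrinsic matrix whose last row is $(0, 0, 1)$, $z$ is the point's depth as seen from view 2; the code below ranks collisions by the view-1 depth instead.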
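Once I have the pixel locations, I need to determine whether there are any occlusions and, if so, select the foreground pixel.
`b = numpy.unique(a, axis=0, return_index=True, return_counts=True)` gives me the unique locations.
If multiple pixels in view1 map to a single pixel in view2 (a collision), `return_counts` gives a value greater than 1.
`collision_indices = b[1][b[2] >= 2]` gives the indices involved in collisions. Note that this gives only one index per collision.
For each such collision point, `ci = numpy.where((a[:, 0] == cl[0]) & (a[:, 1] == cl[1]))[0]` gives the indices of all pixels in view1 that map to the same point in view2.
`cci = ci[numpy.argmin(d[ci])]` gives the index of the pixel with the lowest depth value.
`a[ci] = [-1, -1]` and `a[cci] = cl` map all the other (background) pixels to the out-of-frame location (-1, -1), so they are ignored.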
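One way to vectorize this loop is to sort instead of searching: after a lexicographic sort by target location and then depth, the first entry of every run of equal locations is the nearest pixel for that location. A minimal sketch with a hypothetical helper `resolve_collisions`, taking the same `a` and `d` as the loop. `numpy.lexsort` is a stable sort, so among equal depths the lowest index wins, which matches `numpy.argmin` over the ascending indices from `numpy.where`.

```python
import numpy


def resolve_collisions(a, d):
    """Hypothetical vectorized replacement for the collision loop.

    a: (N, 2) int array of rounded target locations [x, y].
    d: (N,) depth used to rank colliding pixels (smaller = closer).
    Returns a copy of `a` in which, for every target location, only the
    closest pixel keeps its location; all others are moved to (-1, -1).
    """
    # lexsort uses the LAST key as the primary key: group by y, then x,
    # then order by depth. The sort is stable, so depth ties keep index order.
    order = numpy.lexsort((d, a[:, 0], a[:, 1]))
    loc = a[order]
    # The first element of each run of identical locations is the nearest pixel.
    is_first = numpy.ones(len(order), dtype=bool)
    is_first[1:] = numpy.any(loc[1:] != loc[:-1], axis=1)
    winners = order[is_first]
    solved = numpy.full_like(a, -1)
    solved[winners] = a[winners]
    return solved
```

Inside `warp_frame_04`, the loop could then be replaced with `trans_pos_solved = resolve_collisions(trans_pos_int.reshape(-1, 2), depth.ravel()).reshape(height, width, 2)`. The cost becomes one O(N log N) sort instead of one full-image `numpy.where` per collision, and only array operations remain, which is also what a later TensorFlow or PyTorch port needs.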

[1] https://i.stack.imgur.com/s1D9t.png
[2] https://dsp.stackexchange.com/q/69890/32876

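When you do image processing (which you are), 99.9% of the time you will run into edge cases that the default NumPy functions don't cover. I'm not sure how to vectorize this code with NumPy, but you don't have to. Take a look at Cython. It lets you write custom C/C++ extensions (which is essentially what NumPy is). You can start with plain Python code and gradually add type information, as well as disable Python-specific checks (for example, wraparound and bounds checking). These can cause crashes, so apply one optimization at a time and test after each step.

If your code is parallelizable (it seems to be) and you're comfortable with multithreading, you can release the GIL (with `nogil:`) and pass the raw arrays, offsets, and counts to a Cython function that operates on shared memory from different threads (the built-in thread pool usually works well). Let me know whether you want to go this route, so I can add more details and code snippets to this answer, or whether you'd rather stick with NumPy.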
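As an illustration of the "start with plain Python code" step, a single-pass z-buffer visits every source pixel once and keeps, per target pixel, the smallest depth and the index of the pixel that produced it, instead of running a full-image `numpy.where` for each collision. This is a plain-Python sketch with a hypothetical name `zbuffer_loop`; the Cython type declarations and directives mentioned above (such as `@cython.boundscheck(False)` and `@cython.wraparound(False)`) are not shown. Run as pure Python it is slow on full-HD frames; it is meant as the code one would then type and compile.

```python
import numpy


def zbuffer_loop(trans_pos_int, depth, height, width):
    """Hypothetical helper: one O(N) pass over the source pixels.

    trans_pos_int: (H, W, 2) rounded target positions [x, y] in view 2.
    depth:         (H, W) depth used to rank pixels (smaller = closer).
    Returns the (height, width) minimum-depth buffer and the index of the
    winning source pixel per target (-1 where nothing lands).
    """
    xs = trans_pos_int[..., 0].ravel()
    ys = trans_pos_int[..., 1].ravel()
    d = depth.ravel()
    zbuf = numpy.full((height, width), numpy.inf)
    owner = numpy.full((height, width), -1, dtype=numpy.int64)
    for i in range(d.shape[0]):
        x, y = xs[i], ys[i]
        # Skip out-of-frame targets; strict '<' keeps the first pixel on ties.
        if 0 <= x < width and 0 <= y < height and d[i] < zbuf[y, x]:
            zbuf[y, x] = d[i]
            owner[y, x] = i
    return zbuf, owner
```

The strict `<` keeps the lowest-index pixel on depth ties, matching `numpy.argmin` in the original loop. A multithreaded version would additionally have to handle two threads updating the same target pixel at the same time.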
