Flatten a nested list with indices in Python

2024-01-11

I have a list ['', '', '', ['', [['a', 'b'], ['c']]], [[['a', 'b'], ['c']]], [[['d']]]]

I want to flatten the list together with indices. The output should look like this:

flat list=['','','','','a','b','c','a','b','c','d']
indices=[0,1,2,3,3,3,3,4,4,4,5]

How can I do this?

I have tried this:

def flat(nums):
    res = []
    index = []
    for i in range(len(nums)):
        if isinstance(nums[i], list):
            res.extend(nums[i])
            index.extend([i]*len(nums[i]))
        else:
            res.append(nums[i])
            index.append(i)
    return res,index

But this does not work as expected.

TL;DR

This implementation handles nested iterables of unbounded depth:

def enumerate_items_from(iterable):
    cursor_stack = [iter(iterable)]
    item_index = -1
    while cursor_stack:
        sub_iterable = cursor_stack[-1]
        try:
            item = next(sub_iterable)
        except StopIteration:
            cursor_stack.pop()
            continue
        if len(cursor_stack) == 1:
            item_index += 1
        if not isinstance(item, str):
            try:
                cursor_stack.append(iter(item))
                continue
            except TypeError:
                pass
        yield item, item_index

def flat(iterable):
    return map(list, zip(*enumerate_items_from(iterable)))

It can be used to produce the desired output:


>>> nested = ['', '', '', ['', [['a', 'b'], ['c']]], [[['a', 'b'], ['c']]], [[['d']]]]
>>> flat_list, item_indexes = flat(nested)
>>> print(item_indexes)
[0, 1, 2, 3, 3, 3, 3, 4, 4, 4, 5]
>>> print(flat_list)
['', '', '', '', 'a', 'b', 'c', 'a', 'b', 'c', 'd']

Note that you should probably put the index first, to mimic what enumerate does. That would be easier to use for people who already know enumerate.
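As an illustration of that remark, here is a minimal sketch of an index-first variant. The name enumerate_nested is made up for this example and is not part of the original answer; the logic is otherwise the same as in the TL;DR code.

```python
def enumerate_nested(iterable):
    """Yield (top_level_index, item) pairs, index first like enumerate()."""
    cursor_stack = [iter(iterable)]
    item_index = -1
    while cursor_stack:
        try:
            item = next(cursor_stack[-1])
        except StopIteration:
            cursor_stack.pop()
            continue
        if len(cursor_stack) == 1:      # item sits in the top-level iterable
            item_index += 1
        if not isinstance(item, str):   # strings are treated as atomic
            try:
                cursor_stack.append(iter(item))
                continue
            except TypeError:           # item is not iterable
                pass
        yield item_index, item


nested = ['', '', '', ['', [['a', 'b'], ['c']]], [[['a', 'b'], ['c']]], [[['d']]]]
pairs = list(enumerate_nested(nested))
```

Each pair can then be unpacked the same way as with enumerate, for example: for index, item in enumerate_nested(nested).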

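Important note: unless you are sure the list will not be nested too deeply, you should not use any recursion-based solution. Otherwise, as soon as the depth of your nested list exceeds 1000, your code will crash. I explain this here: https://stackoverflow.com/a/51649649/1720199. Note that even a simple call to str(list) crashes on a test case with depth > 1000 (some Python implementations allow more than that, but the depth is always bounded). The typical exception you get with a recursion-based solution is the following (in short, it is due to how the Python call stack works):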

RecursionError: maximum recursion depth exceeded ...
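To make the failure concrete, here is a small sketch of a naive recursive flattening function hitting that limit. The function name flatten_recursive is invented for this illustration, and the depth is derived from sys.getrecursionlimit() so the example does not depend on the limit being exactly 1000.

```python
import sys


def flatten_recursive(items):
    """Naive recursive flattening: one call frame per nesting level."""
    flat = []
    for item in items:
        if isinstance(item, list):
            flat.extend(flatten_recursive(item))
        else:
            flat.append(item)
    return flat


# Build a list nested deeper than the interpreter's recursion limit.
depth = sys.getrecursionlimit() + 100
deep = [0]
for d in range(1, depth):
    deep = [d, deep]

try:
    flatten_recursive(deep)
    crashed = False
except RecursionError:
    crashed = True   # the call stack ran out before the innermost list
```

The same function works fine on shallow input such as [1, [2, [3]]]; only the nesting depth matters, not the number of items.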

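Implementation details

I will go step by step: first we flatten a list, then we output the flattened list together with the depth of every item, and finally we output the list together with the index of each item in the "main list".

Flattening the list

That being said, this is actually quite fun, because an iterative solution is a perfect fit here. You can start from a simple (non-recursive) list flattening algorithm: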

def flatten(iterable):
    return list(items_from(iterable))

def items_from(iterable):
    cursor_stack = [iter(iterable)]
    while cursor_stack:
        sub_iterable = cursor_stack[-1]
        try:
            item = next(sub_iterable)
        except StopIteration:       # post-order
            cursor_stack.pop()
            continue
        if isinstance(item, list):  # pre-order
            cursor_stack.append(iter(item))
        else:
            yield item              # in-order

Computing the depth

We can get the depth by looking at the stack size, depth = len(cursor_stack) - 1:

        else:
            yield item, len(cursor_stack) - 1      # in-order

This yields (item, depth) pairs. If we need to split the result into two sequences, we can use the zip function:

>>> a = [1,  2,  3, [4 , [[5, 6], [7]]], [[[8, 9], [10]]], [[[11]]]]
>>> flatten(a)
[(1, 0), (2, 0), (3, 0), (4, 1), (5, 3), (6, 3), (7, 3), (8, 3), (9, 3), (10, 3), (11, 3)]
>>> flat_list, depths = zip(*flatten(a))
>>> print(flat_list)
(1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11)
>>> print(depths)
(0, 0, 0, 1, 3, 3, 3, 3, 3, 3, 3)
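The depth snippet above shows only the lines that change. For reference, a complete version could look like the sketch below; the name items_with_depth is chosen here for illustration, and the input is assumed to contain only lists as containers, as in this step of the walkthrough.

```python
def items_with_depth(iterable):
    """Yield (item, depth) pairs; depth 0 means the item is in the outer list."""
    cursor_stack = [iter(iterable)]
    while cursor_stack:
        try:
            item = next(cursor_stack[-1])
        except StopIteration:
            cursor_stack.pop()          # this sub-list is exhausted
            continue
        if isinstance(item, list):
            cursor_stack.append(iter(item))
        else:
            # The stack holds one iterator per enclosing list.
            yield item, len(cursor_stack) - 1


a = [1, 2, 3, [4, [[5, 6], [7]]], [[[8, 9], [10]]], [[[11]]]]
flat_list, depths = map(list, zip(*items_with_depth(a)))
```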

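We will now do something similar, with item indices instead of depths.

Computing the item index

To compute the item index (in the main list), you need to count the number of top-level items seen so far. This can be done by adding 1 to item_index every time we iterate over an item at depth 0 (that is, when the stack size equals 1):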

def flatten(iterable):
    return list(items_from(iterable))

def items_from(iterable):
    cursor_stack = [iter(iterable)]
    item_index = -1
    while cursor_stack:
        sub_iterable = cursor_stack[-1]
        try:
            item = next(sub_iterable)
        except StopIteration:             # post-order
            cursor_stack.pop()
            continue
        if len(cursor_stack) == 1:        # If current item is in "main" list
            item_index += 1               
        if isinstance(item, list):        # pre-order
            cursor_stack.append(iter(item))
        else:
            yield item, item_index        # in-order

Similarly, we will use zip to split the pairs into two sequences, and we will also use map to convert both of them into lists:

>>> a = [1,  2,  3, [4 , [[5, 6], [7]]], [[[8, 9], [10]]], [[[11]]]]
>>> flat_list, item_indexes = map(list, zip(*flatten(a)))
>>> print(flat_list)
[1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11]
>>> print(item_indexes)
[0, 1, 2, 3, 3, 3, 3, 4, 4, 4, 5]

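Improvement: handling iterable inputs

Being able to accept a broader range of nested iterables as input may be desirable (especially if you build this for others to use). For example, the current implementation does not work as expected when given nested iterators as input: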

>>> a = iter([1, '2',  3, iter([4, [[5, 6], [7]]])])
>>> flat_list, item_indexes = map(list, zip(*flatten(a)))
>>> print(flat_list)
[1, '2', 3, <list_iterator object at 0x100f6a390>]
>>> print(item_indexes)
[0, 1, 2, 3]

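If we want this to work, we need to be a little careful, because strings are iterable but we want them to be treated as atomic items (not as lists of characters). Instead of assuming the input is a list, as we did before: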

        if isinstance(item, list):        # pre-order
            cursor_stack.append(iter(item))
        else:
            yield item, item_index        # in-order

we do not check the type of the item. We try to use it as if it were an iterable, and if that fails, we know it is not one (duck typing):

        if not isinstance(item, str):
            try:
                cursor_stack.append(iter(item))
                continue
            # item is not an iterable object:
            except TypeError:
                pass
        yield item, item_index

With this implementation, we have:

>>> a = iter([1, 2,  3, iter([4, [[5, 6], [7]]])])
>>> flat_list, item_indexes = map(list, zip(*flatten(a)))
>>> print(flat_list)
[1, 2, 3, 4, 5, 6, 7]
>>> print(item_indexes)
[0, 1, 2, 3, 3, 3, 3]
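Strings need the special case because iterating over a one-character string yields that same one-character string again, so the descent would never end. The sketch below generalizes the check slightly, under the assumption that you also want bytes to stay atomic (iterating over bytes yields integers). The function name items_from_iterables and the atomic_types parameter are hypothetical and not part of the original answer.

```python
def items_from_iterables(iterable, atomic_types=(str, bytes)):
    """Yield (item, top_level_index); atomic_types are never descended into."""
    cursor_stack = [iter(iterable)]
    item_index = -1
    while cursor_stack:
        try:
            item = next(cursor_stack[-1])
        except StopIteration:
            cursor_stack.pop()
            continue
        if len(cursor_stack) == 1:      # item is in the "main" iterable
            item_index += 1
        if not isinstance(item, atomic_types):
            try:
                cursor_stack.append(iter(item))
                continue
            except TypeError:           # item is not iterable
                pass
        yield item, item_index


data = iter(['ab', b'cd', (1, [2, 3]), range(4, 6)])
flat_list, item_indexes = map(list, zip(*items_from_iterables(data)))
```

Tuples and ranges are flattened like lists here, while 'ab' and b'cd' come out unchanged.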

Building test cases

If you need to generate test cases with a large depth, you can use this piece of code:

def build_deep_list(depth):
    """Returns a list of the form $l_{depth} = [depth-1, l_{depth-1}]$
    with $depth > 1$ and $l_1 = [0]$.
    """
    sub_list = [0]
    for d in range(1, depth):
        sub_list = [d, sub_list]
    return sub_list

You can use it to make sure my implementation does not crash when the depth is large:

a = build_deep_list(1200)
flat_list, item_indexes = map(list, zip(*flatten(a)))

We can also check that such a list cannot be printed using the str function:

>>> a = build_deep_list(1200)
>>> str(a)
RecursionError: maximum recursion depth exceeded while getting the repr of an object

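The function repr is called by str(list) on every element of the input list.

Closing remarks

In the end, I agree that recursive implementations are easier to read (since the call stack does half of the work for us), but when implementing a low-level function like this one, I think it is a good investment to have code that works in all cases (or at least in all the cases you can think of), especially when the solution is not that hard. It is also a way not to forget how to write non-recursive code that works on tree-like structures (which may not happen often unless you implement data structures yourself, but it is a good exercise).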

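Note that everything I say "against" recursion holds only because Python does not optimize call stack usage when facing recursion: Tail Recursion Elimination in Python http://neopythonic.blogspot.com/2009/04/tail-recursion-elimination.html. Many compiled languages, on the other hand, perform tail call optimization (TCO) https://stackoverflow.com/questions/310974/what-is-tail-call-optimization. This means that even if you write a perfectly tail-recursive https://stackoverflow.com/questions/33923/what-is-tail-recursion function in Python, it will crash on deeply nested lists.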
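A tiny sketch can illustrate this point. The two functions below are invented for this example: the first is tail-recursive (the recursive call is the last thing it does), yet in CPython it still fails once the depth exceeds the recursion limit, while the equivalent loop does not.

```python
import sys


def countdown_recursive(n):
    """Tail-recursive: nothing is left to do after the recursive call."""
    if n == 0:
        return 0
    return countdown_recursive(n - 1)


def countdown_loop(n):
    """The same computation as a loop, which needs a single frame."""
    while n != 0:
        n -= 1
    return n


too_deep = sys.getrecursionlimit() + 100

try:
    countdown_recursive(too_deep)
    tail_call_failed = False
except RecursionError:
    tail_call_failed = True   # no tail call elimination took place

loop_result = countdown_loop(too_deep)
```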

If you need more details about list flattening algorithms, you can refer to the post I linked.

