
Flattening sequences

Sometimes, we'll have zipped data that needs to be flattened. For example, our input could be a file that has columnar data. It looks like this:

2     3      5      7     11     13     17     19     23     29
31    37     41     43    47     53     59     61     67     71
...

We can easily use (line.split() for line in file) to create a sequence. Each item within that sequence will be a 10-item list of the string values from a single line.

This creates data in blocks of 10 values. It looks as follows:

>>> blocked = list(line.split() for line in file)
>>> blocked
[['2', '3', '5', '7', '11', '13', '17', '19', '23', '29'], ['31', '37', '41', '43', '47', '53', '59', '61', '67', '71'], ['179', '181', '191', '193', '197', '199', '211', '223', '227', '229']]
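The session above assumes an already-open `file`. A self-contained sketch of the same step follows. It assumes `io.StringIO` is an acceptable stand-in for the real file and uses just the first two lines of the sample data. Iterating over a `StringIO` yields one line at a time, exactly as a text file does.

```python
import io

# Hypothetical stand-in for the columnar file: two lines of ten primes each.
text = """2     3      5      7     11     13     17     19     23     29
31    37     41     43    47     53     59     61     67     71
"""

file = io.StringIO(text)

# Each line.split() call returns a list of the whitespace-separated strings.
blocked = list(line.split() for line in file)

for row in blocked:
    print(len(row), row[:3])
```

Each row is a `list` of `str`, not a tuple of numbers. Converting the strings is a separate step.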

This is a start, but it isn't complete. We want to get the numbers into a single, flat sequence. Each item in the input is a 10-item list, and we'd rather not decompose these lists one item at a time.

We can use a two-level generator expression, as shown in the following code snippet, for this kind of flattening:

>>> (x for line in blocked for x in line)
<generator object <genexpr> at 0x101cead70>
>>> list(_)
['2', '3', '5', '7', '11', '13', '17', '19', '23', '29', '31', 
'37', '41', '43', '47', '53', '59', '61', '67', '71',
... ]

The first for clause assigns each item (a list of 10 values) from the blocked list to the line variable. The second for clause assigns each individual string from the line variable to the x variable. The resulting generator yields the sequence of values assigned to the x variable.
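The two clauses nest left to right, and the generator is lazy: values are produced only as they are requested. The sketch below illustrates this. It uses a small hypothetical `blocked` list shaped like the one above and `itertools.islice` to pull just the first 12 values, which cross the boundary from the first row into the second.

```python
from itertools import islice

# A small hypothetical blocked list, shaped like the one above.
blocked = [
    ['2', '3', '5', '7', '11', '13', '17', '19', '23', '29'],
    ['31', '37', '41', '43', '47', '53', '59', '61', '67', '71'],
]

# The outer clause picks a row; the inner clause walks that row.
flat = (x for line in blocked for x in line)

# islice consumes exactly 12 values from the generator.
first_twelve = list(islice(flat, 12))
print(first_twelve)

# The generator resumes where islice stopped, partway through the second row.
following = next(flat)
print(following)
```

Because the generator keeps its position, a consumer can stop early without forcing the rest of the data to be read.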

We can understand this by rewriting it as an equivalent generator function:

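from typing import Any, Iterable

def flatten(data: Iterable[Iterable[Any]]) -> Iterable[Any]:
    for line in data:      # outer loop: one row at a time
        for x in line:     # inner loop: each value in that row
            yield x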

This transformation shows how the generator expression works. The first for clause (for line in data) steps through each 10-item list in the data. The second for clause (for x in line) steps through each item in the line chosen by the first for clause.
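The `Any` annotations in the function above discard the element type. The sketch below is a variation of our own, not the book's version. It assumes a `TypeVar` so that a type checker can see that flattening rows of `T` yields `T`. The usage line also shows that the rows do not need to have the same length.

```python
from typing import Iterable, Iterator, TypeVar

T = TypeVar("T")

def flatten(data: Iterable[Iterable[T]]) -> Iterator[T]:
    """Yield each item of each inner iterable, in order."""
    for line in data:
        for x in line:
            yield x

# Rows need not all be the same length; empty rows contribute nothing.
ragged = [[1, 2, 3], [], [4], [5, 6]]
print(list(flatten(ragged)))
```

The inner loop could also be written as `yield from line`. The behavior is the same for this purpose.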

This expression flattens a sequence-of-sequences structure into a single sequence. More generally, it removes one level of nesting from any iterable whose items are themselves iterables, producing a single, flat iterable. It works for a list-of-lists as well as a list-of-sets or any other combination of nested iterables.
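The sketch below checks the claims in the previous paragraph. It wraps the same two-level generator expression in a helper, `flatten_gen`, whose name is hypothetical. It applies the helper to a list of sets and to a tuple of ranges, and it shows that only one level of nesting is removed. It also compares the result with `itertools.chain.from_iterable`, a standard-library function that performs the same one-level flattening.

```python
from itertools import chain

def flatten_gen(data):
    # Hypothetical helper: the same two-level generator expression as in the text.
    return (x for line in data for x in line)

# A list of sets; set iteration order isn't guaranteed, so sort the result.
from_sets = sorted(flatten_gen([{1, 2}, {3}]))

# A tuple of ranges.
from_ranges = list(flatten_gen((range(3), range(3, 5))))

# Only one level of nesting is removed: the inner [2, 3] survives intact.
nested = [[1, [2, 3]], [4]]
one_level = list(flatten_gen(nested))

# itertools.chain.from_iterable yields the same one-level result.
same = list(chain.from_iterable(nested)) == one_level

print(from_sets, from_ranges, one_level, same)
```

Deeper structures would need the flattening applied again, or a recursive approach.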