Proposal: Add smart chunking utility (like chunked()) to itertools or stdlib
Hi All,
I’d like to propose adding a general-purpose, composable chunking utility to the Python standard library (possibly under itertools). The goal is to cover a wide range of real-world chunking needs, such as fixed-size grouping, sliding windows, conditional filtering, and chunk selection logic.
Project Info:
- GitHub: GitHub - catchmaurya/smartchunks: Advanced Python chunking with stride, filtering, and expressions
- PyPI: smartchunks · PyPI
Features:
- Fixed-size chunking
- Sliding window (via stride)
- nth_position and chunk_position for advanced selection
- Optional padding for incomplete chunks
- Filter function to keep only qualifying chunks
- Materialize as generator or list
Example usage:
```python
from smartchunks import chunked

data = [1, 2, 3, 4, 5, 6, 7, 8, 9]

# Basic chunking
print(list(chunked(data, size=3)))
# → [[1, 2, 3], [4, 5, 6], [7, 8, 9]]

# Sliding window
print(list(chunked(data, size=3, stride=1)))
# → [[1, 2, 3], [2, 3, 4], ..., [7, 8, 9]]

# Advanced selection
print(list(chunked(data, size=2, nth_position=2)))
# → [[1, 3], [5, 7], [9]]
```
Would love to hear your feedback on whether this could belong in the standard library or, potentially, as an enhancement to itertools.
Thanks so much!
– Maurya Allimuthu
I’m against it.
The tricky cases are not an API that’s sensible to use, and the easy cases are easy combinations of zip, iter, and tee.
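To make the "easy cases" point concrete, here is a minimal stdlib-only sketch of fixed-size chunking and a stride-1 sliding window built from zip, iter, tee, and islice. The helper names fixed_chunks and sliding_windows are hypothetical, chosen for illustration. Note that these idioms yield tuples rather than lists, and that the zip idiom silently drops an incomplete final chunk.

```python
from itertools import islice, tee


def fixed_chunks(iterable, size):
    """Non-overlapping tuples of exactly `size` items; a short tail is dropped."""
    it = iter(iterable)
    # zip pulls from the *same* iterator `size` times per output tuple.
    return zip(*[it] * size)


def sliding_windows(iterable, size):
    """Overlapping windows of `size` items, advancing one item at a time."""
    copies = tee(iterable, size)
    # Advance the i-th copy by i items, then zip the copies back together.
    shifted = [islice(c, i, None) for i, c in enumerate(copies)]
    return zip(*shifted)


data = [1, 2, 3, 4, 5, 6, 7, 8, 9]
print(list(fixed_chunks(data, 3)))
# [(1, 2, 3), (4, 5, 6), (7, 8, 9)]
print(list(sliding_windows(data, 3)))
# [(1, 2, 3), (2, 3, 4), (3, 4, 5), (4, 5, 6), (5, 6, 7), (6, 7, 8), (7, 8, 9)]
```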
NeilGirdhar (Neil Girdhar) June 8, 2025, 3:40pm 3
catchmaurya (Catchmaurya) June 8, 2025, 3:58pm 4
Thanks, Ronny and Neil, I appreciate your quick feedback!
@Neil Girdhar:
Yes, I reviewed more_itertools. It’s fantastic, and I agree that chunked, windowed, and stagger cover a lot of ground.
Where smartchunks goes beyond that is in pipeline logic:
- nth_position and chunk_position: useful for selective time-series sampling (e.g., telemetry, logs)
- stride + filter_fn: mimics overlapping windowed views but supports filtering out irrelevant chunks
- apply_nth_before_chunk: lets you reorder the pipeline, e.g. map → chunk vs. chunk → map
So it’s not meant to replace the basics, but to cover cases where chunk filtering and logic routing matter.
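The selection features listed above can be approximated by composing built-ins. The sketch below assumes, based only on the descriptions in this thread, that chunk_position means "keep every nth chunk" and that filter_fn is a predicate applied to each chunk; it is not smartchunks' implementation. List slicing keeps it short but requires a sequence rather than a stream.

```python
from itertools import islice

data = [1, 2, 3, 4, 5, 6, 7, 8, 9]
size, stride = 3, 2

# Overlapping windows with a stride (slicing needs a sequence, not a stream).
windows = [data[i:i + size] for i in range(0, len(data) - size + 1, stride)]
print(windows)  # [[1, 2, 3], [3, 4, 5], [5, 6, 7], [7, 8, 9]]

# A filter_fn-style step: the built-in filter() with any predicate.
# "Window sum above 10" is only an illustrative predicate.
big = list(filter(lambda w: sum(w) > 10, windows))
print(big)  # [[3, 4, 5], [5, 6, 7], [7, 8, 9]]

# A chunk_position-style step: keep every 2nd window via islice with a step.
sampled = list(islice(windows, 0, None, 2))
print(sampled)  # [[1, 2, 3], [5, 6, 7]]
```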
@Ronny Pfannschmidt:
Totally fair: the easy cases should stay easy (and zip/tee are still king there).
The more expressive options are inspired by real-world usage in logs, NLP batching, error detection pipelines, etc.
That said, I’d be open to a simpler version like chunked_plus() with a few extra knobs:
- stride
- optional filter
- optional pad
Would something in that direction feel more at home in itertools or in more_itertools?
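Since chunked_plus() is only named in this post, the following is a hypothetical sketch of what such a function could look like with the three knobs listed (stride, an optional filter, an optional pad). The signature, the parameter names filter_fn and pad, and the edge-case behavior are assumptions for illustration, not smartchunks' actual API. It accepts any iterable, not just sequences.

```python
from collections import deque
from itertools import islice

_NO_PAD = object()  # sentinel so that pad=None can still mean "pad with None"


def chunked_plus(iterable, size, stride=None, filter_fn=None, pad=_NO_PAD):
    """Hypothetical sketch: yield lists of up to `size` items.

    stride:    items to advance between chunk starts (defaults to size).
               stride < size gives overlapping windows; stride > size skips items.
    filter_fn: optional predicate; chunks for which it is falsy are skipped.
    pad:       if given, a short final chunk is filled up to `size` with it.
    """
    stride = size if stride is None else stride
    if size < 1 or stride < 1:
        raise ValueError("size and stride must be >= 1")
    it = iter(iterable)
    buf = list(islice(it, size))
    while buf:
        chunk = list(buf)
        if len(chunk) < size and pad is not _NO_PAD:
            chunk.extend([pad] * (size - len(chunk)))
        if filter_fn is None or filter_fn(chunk):
            yield chunk
        if len(buf) < size:
            return  # input exhausted: that was the final, short chunk
        if stride < size:
            fresh = list(islice(it, stride))
            if not fresh:
                return  # no new items, so any further window would be a repeat
            buf = buf[stride:] + fresh
        else:
            # Discard the gap between chunks (nothing when stride == size).
            deque(islice(it, stride - size), maxlen=0)
            buf = list(islice(it, size))


data = range(1, 10)
print(list(chunked_plus(data, 3)))
# [[1, 2, 3], [4, 5, 6], [7, 8, 9]]
print(list(chunked_plus(data, 3, stride=2)))
# [[1, 2, 3], [3, 4, 5], [5, 6, 7], [7, 8, 9]]
print(list(chunked_plus(range(1, 8), 3, pad=0)))
# [[1, 2, 3], [4, 5, 6], [7, 0, 0]]
print(list(chunked_plus(data, 3, stride=1, filter_fn=lambda c: sum(c) > 15)))
# [[5, 6, 7], [6, 7, 8], [7, 8, 9]]
```

The private sentinel lets pad=None mean "pad with None" rather than "no padding". Because this is a generator function, an invalid size or stride is only reported on the first next() call.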
NeilGirdhar (Neil Girdhar) June 8, 2025, 4:09pm 5
Just compose functions.
Compose in the other order.
Stagger does this.
Also, it’s a bit odd that since you knew about more-itertools, you would choose examples that are already handled by more-itertools.
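The "compose in the other order" suggestion can be shown concretely. In this minimal sketch the helper names chunks and every_nth are hypothetical; the first pipeline reproduces the output of the nth_position example from the original post, assuming that option means "take every nth item before chunking", and swapping the order of the two steps gives a different pipeline.

```python
from itertools import islice


def chunks(iterable, size):
    """Non-overlapping lists of `size` items; the last one may be shorter."""
    it = iter(iterable)
    while chunk := list(islice(it, size)):
        yield chunk


def every_nth(iterable, n):
    """Every n-th item, starting with the first."""
    return islice(iterable, 0, None, n)


data = [1, 2, 3, 4, 5, 6, 7, 8, 9]

# Select every 2nd item, then chunk.
print(list(chunks(every_nth(data, 2), 2)))  # [[1, 3], [5, 7], [9]]

# Chunk first, then keep every 2nd chunk.
print(list(every_nth(chunks(data, 2), 2)))  # [[1, 2], [5, 6], [9]]
```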
catchmaurya (Catchmaurya) June 8, 2025, 4:24pm 6
Hi Neil and team, thanks for the pointer!
I’m absolutely aware of, and appreciate, how more_itertools provides core utilities like chunked, windowed, and stagger. These cover the essentials brilliantly.
What smartchunks adds on top of that:
- nth_position – lets you skip items before or after chunking
- chunk_position – lets you sample every nth chunk
- Pipeline order – choose the order of transformations with apply_nth_before_chunk
- stride + filter_fn – get overlapping chunks and filter them based on custom logic
- Optional padding and generator/list materialization
So where more_itertools provides the building blocks, smartchunks offers a composed chunk-processing pipeline in one function, ideal for conditional workflows like log segmentation, telemetry sampling, or NLP batching.
I’m open to condensing this into a clear chunked_plus() or similarly simplified interface if that aligns better with itertools. Would love to hear whether that reframing makes sense!
Thanks again
– Maurya
ayhanfuat (Ayhan Ç.) June 8, 2025, 4:25pm 7
itertools added batched in 3.12.
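For reference, a short sketch of itertools.batched (Python 3.12+) for the fixed-size case, plus the zip_longest "grouper" idiom when a padded final chunk is wanted. The fallback definition for older Pythons is an approximation modeled on the pure-Python equivalent in the itertools documentation, included only so the snippet runs on earlier versions.

```python
import sys
from itertools import islice, zip_longest

if sys.version_info >= (3, 12):
    from itertools import batched
else:
    def batched(iterable, n):
        """Approximate stand-in for itertools.batched on Python < 3.12."""
        if n < 1:
            raise ValueError("n must be at least one")
        it = iter(iterable)
        while batch := tuple(islice(it, n)):
            yield batch

data = [1, 2, 3, 4, 5, 6, 7, 8]

# Fixed-size tuples; the last one may be shorter.
print(list(batched(data, 3)))  # [(1, 2, 3), (4, 5, 6), (7, 8)]

# Padding the final chunk: zip_longest over the same iterator repeated.
padded = list(zip_longest(*[iter(data)] * 3, fillvalue=None))
print(padded)  # [(1, 2, 3), (4, 5, 6), (7, 8, None)]
```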