Python has a memory allocator optimized for allocations <= 512 bytes: PyObject_Malloc(). There has been discussion of replacing it with the native "Low-fragmentation Heap" memory allocator on Windows. I'm not aware of anyone who has tried that. It would be nice to try, especially to run benchmarks. See also issue #26249: "Change PyMem_Malloc to use PyObject_Malloc allocator?".
We tried it at one point, but it made very little difference because we don't use the Windows heap for most allocations. IIRC, replacing Python's optimised allocator with the LFH was a slight performance regression, but I'm not sure the benchmarks were reliable enough back then to be trusted. I'm also not sure what optimisations have been made in Windows 8/10. Since the LFH is the default though, it really should just be a case of replacing Py_Malloc with a simple HeapAlloc shim and testing it. The APIs are nearly the same (the result of GetProcessHeap() will be stable for the lifetime of the process, and there's little value in creating a dedicated heap unless you intend to destroy the whole heap at once rather than free each allocation individually).
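For anyone who wants to rerun the experiment, here is a rough sketch of what such a HeapAlloc shim could look like. The shim_* names are hypothetical, not anything in CPython. The signatures are chosen to match the callbacks in PyMemAllocatorEx, and the non-Windows branch falls back to the C library only so the file builds everywhere. Bumping zero-sized requests to one byte and handling NULL explicitly are conservative assumptions on my part, not something measured.

```c
#include <stddef.h>
#include <stdint.h>
#include <stdlib.h>
#ifdef _WIN32
#include <windows.h>
#endif

/* Hypothetical shim functions; the ctx argument is unused because the
 * process heap handle is stable for the lifetime of the process. */

void *shim_malloc(void *ctx, size_t size)
{
    (void)ctx;
    if (size == 0) {
        size = 1;  /* always hand back a distinct non-NULL pointer */
    }
#ifdef _WIN32
    return HeapAlloc(GetProcessHeap(), 0, size);
#else
    return malloc(size);
#endif
}

void *shim_calloc(void *ctx, size_t nelem, size_t elsize)
{
    size_t size;
    (void)ctx;
    if (elsize != 0 && nelem > SIZE_MAX / elsize) {
        return NULL;  /* nelem * elsize would overflow */
    }
    size = nelem * elsize;
    if (size == 0) {
        size = 1;
    }
#ifdef _WIN32
    return HeapAlloc(GetProcessHeap(), HEAP_ZERO_MEMORY, size);
#else
    return calloc(1, size);
#endif
}

void *shim_realloc(void *ctx, void *ptr, size_t new_size)
{
    if (ptr == NULL) {
        /* HeapReAlloc needs an existing block, so allocate instead */
        return shim_malloc(ctx, new_size);
    }
    if (new_size == 0) {
        new_size = 1;
    }
#ifdef _WIN32
    return HeapReAlloc(GetProcessHeap(), 0, ptr, new_size);
#else
    return realloc(ptr, new_size);
#endif
}

void shim_free(void *ctx, void *ptr)
{
    (void)ctx;
    if (ptr == NULL) {
        return;  /* freeing NULL is a no-op */
    }
#ifdef _WIN32
    HeapFree(GetProcessHeap(), 0, ptr);
#else
    free(ptr);
#endif
}
```

To test it inside an interpreter build, one option would be to fill a PyMemAllocatorEx with these four functions and install it for the object domain with PyMem_SetAllocator(PYMEM_DOMAIN_OBJ, ...) before Py_Initialize(), then run the benchmark suite against an unmodified build.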