Blockwise Scaling for FP8 by manishucsd · Pull Request #1932 · NVIDIA/cutlass
sijialouintel added a commit to sijialouintel/cutlass that referenced this pull request
Handle MNK Sm90{Row, Col}Reduction problem shapes (NVIDIA#1803)
add is_last_tile
Improve sm90 mixed dtype kernel (NVIDIA#1883)
Add GMMA shape m64n40k16 (NVIDIA#1864)
Add all supported GMMA shapes (NVIDIA#1890)
add maximum support (NVIDIA#1833)
fix typo (NVIDIA#1853)
fix by adding public (NVIDIA#1753)
added mapping for bf16 to torch::kBFloat16 (NVIDIA#1843)
Co-authored-by: Haicheng Wu 57973641+hwu36@users.noreply.github.com
Fix README (NVIDIA#1658)
Fix README
Improve README
Co-authored-by: Haicheng Wu 57973641+hwu36@users.noreply.github.com
Adjusting code indentation (NVIDIA#1639)
Include of regular_tile_iterator.h fixed for NVRTC (NVIDIA#1765)
Include of regular_tile_iterator.h fixed for NVRTC
More include fixed for NVRTC
Update gemm_f16n_f16t_f32t_tensor_op_f32_sm80.cu with include "cutlass/gemm/device/gemm_universal.h" (NVIDIA#1569)
fix compile with cmake .. -DCUTLASS_ENABLE_TESTS=ON -DCUTLASS_TEST_LEVEL=2
- remove redundant hardcoded packing configs in mixed dtype gemm (NVIDIA#1894)
Co-authored-by: Siyuan Fu siyuanf@nvidia.com
fix wrong A/BLayout in MMA_Traits for binary mma and append other MMA_Traits support (NVIDIA#1856)
fix wrong A/BLayout in MMA_Traits and append support for m8n8k128, m16n8k128 mma.and.popc in MMA_Traits instantiation
add "print" template for subbyte_reference
Add a print for the uint{x}b_t type. (NVIDIA#1871)
Refactor some GroupedGEMM logic (NVIDIA#1899)
feat: support kFactor 8 used in mma tensor op tile iterator (NVIDIA#1512)
Update publications (NVIDIA#1912)
remove restriction of stride == kernel in nhwc_pooling (NVIDIA#1896)
fix undefined in device code error (NVIDIA#1880)
Fix the racing condition of mixed-input gemm when writing the registers (NVIDIA#1931)
move two warpgroup_wait
merge main
Co-authored-by: Siyuan Fu siyuanf@nvidia.com
Fix cutlass python library with cuda12.6.2.post1 (NVIDIA#1942)
Fix cutlass python library with cuda12.6.2.post1
Previously we had this error:
  File "/storage/home/cutlass/python/cutlass/backend/operation.py", line 39, in <listcomp>
    _version_splits = [int(x) for x in __version__.split("rc")[0].split(".")]
ValueError: invalid literal for int() with base 10: 'post1'
Update sm90_utils.py
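The traceback above comes from parsing a CUDA version string field by field with `int()`, which breaks on a suffix like `post1`. A minimal sketch of the failure and of one tolerant approach (keeping only the leading numeric fields; this is an illustration, not necessarily the actual fix in the PR):

```python
# Why parsing a version like "12.6.2.post1" fails: every dot-separated
# field is fed to int(), and "post1" is not an integer.
version = "12.6.2.post1"

try:
    _ = [int(x) for x in version.split("rc")[0].split(".")]
except ValueError as e:
    print(e)  # invalid literal for int() with base 10: 'post1'

# Tolerant variant: stop at the first non-numeric field.
numeric = []
for part in version.split("rc")[0].split("."):
    if not part.isdigit():
        break
    numeric.append(int(part))
print(numeric)  # [12, 6, 2]
```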
Update generator.py
Update python/cutlass_library/generator.py
Co-authored-by: Jack Kosaian jackkosaian@gmail.com
- Update python/cutlass_library/sm90_utils.py
Co-authored-by: Jack Kosaian jackkosaian@gmail.com
Co-authored-by: Jack Kosaian jackkosaian@gmail.com
add {uint4, uint2, int2} => {fp16, bf16} conversion (NVIDIA#1966)
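Sub-byte conversions like the uint4 one above first have to unpack multiple values from each byte before widening to fp16/bf16. A hedged sketch of just the packing convention (the real kernels do this with bit tricks in registers; the function name here is illustrative):

```python
# Two 4-bit unsigned values are packed per byte: low nibble first,
# then high nibble. Unpack both and widen to float.
def unpack_uint4_pair(byte: int):
    lo = byte & 0xF          # low nibble
    hi = (byte >> 4) & 0xF   # high nibble
    return float(lo), float(hi)

print(unpack_uint4_pair(0xAB))  # (11.0, 10.0)
```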
Improve mixed dtype GEMM (NVIDIA#1972)
update
fix a typo
fix a typo that fails the compiling when ElementScale is not the same as MmaType (NVIDIA#1977)
Fix CuTe README Typo (NVIDIA#1951)
Fix Typo (NVIDIA#1962)
3.6.0 update (NVIDIA#2005)
3.6.0 update
doc and swap stuff
Co-authored-by: yuzhai yuzhai@nvidia.com
Co-authored-by: Haicheng Wu haichengw@nvidia.com
Update CHANGELOG.md
Update 0x_gemm_tutorial.md (NVIDIA#1982)
Shouldn't this be BLK_M, BLK_K, k
fix bug: arch/mma_sm60.h Mma<2,2,1> computes wrong results (NVIDIA#1989)
fix mem fence (NVIDIA#2030)
Co-authored-by: yuzhai yuzhai@nvidia.com
Add half->int8 saturate conversion to ensure a valid range (NVIDIA#1983)
Add half->int8 saturate conversion to ensure a valid range
add gpu only macro
Co-authored-by: Haicheng Wu haichengw@nvidia.com
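The saturating conversion above exists because a plain truncating cast wraps out-of-range values around. A minimal scalar sketch of the behavior (round, then clamp to the int8 range; the hardware conversion uses round-to-nearest-even, which Python's `round` also does):

```python
# half->int8 saturating conversion sketch: out-of-range inputs pin to
# the endpoints [-128, 127] instead of wrapping around.
def saturate_to_int8(x: float) -> int:
    v = int(round(x))
    return max(-128, min(127, v))

print(saturate_to_int8(300.0))    # 127
print(saturate_to_int8(-1000.0))  # -128
print(saturate_to_int8(3.6))      # 4
```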
Add vector-types back to platform.h (NVIDIA#2026)
Fix typo in library_defaults.py (NVIDIA#2024)
Fix Typos (NVIDIA#2021)
Fix Typo
Fix Typo
Add Line Break (NVIDIA#2020)
Blockwise Scaling for FP8 (NVIDIA#1932)
F8 Blockwise Scaling
two more NumProducerThreadEvents
Co-authored-by: Haicheng Wu haichengw@nvidia.com
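The idea behind the blockwise FP8 scaling above: each block of the operand gets one scale so its values fit the narrow FP8 (e4m3) range, and results are rescaled per block on the way out. A pure-Python sketch of the quantization step; `E4M3_MAX` and `BLK` are illustrative assumptions, not the kernel's actual tiling:

```python
E4M3_MAX = 448.0  # max finite magnitude of FP8 e4m3
BLK = 2           # illustrative block size

def quantize_blockwise(a):
    """Return (scaled matrix, per-block scales); a is a list of lists."""
    m, k = len(a), len(a[0])
    scales = [[1.0] * (k // BLK) for _ in range(m // BLK)]
    q = [row[:] for row in a]
    for bi in range(m // BLK):
        for bj in range(k // BLK):
            amax = max(abs(a[i][j])
                       for i in range(bi * BLK, (bi + 1) * BLK)
                       for j in range(bj * BLK, (bj + 1) * BLK))
            s = max(amax / E4M3_MAX, 1e-12)  # one scale per block
            scales[bi][bj] = s
            for i in range(bi * BLK, (bi + 1) * BLK):
                for j in range(bj * BLK, (bj + 1) * BLK):
                    q[i][j] = a[i][j] / s
    return q, scales

a = [[1000.0, -2000.0, 3.0, 4.0],
     [500.0, 250.0, -1.0, 2.0],
     [8.0, 8.0, 8.0, 8.0],
     [8.0, 8.0, 8.0, 8.0]]
q, s = quantize_blockwise(a)
# every scaled value now fits the FP8 range (up to rounding)
assert all(abs(v) <= E4M3_MAX * (1 + 1e-9) for row in q for v in row)
```

Multiplying a scaled entry back by its block's scale recovers the original value, which is exactly what the epilogue rescale relies on.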
fix assertion in integer_subbytes.h (NVIDIA#1961)
CUTLASS 3.7 (NVIDIA#2045)
CUTLASS 3.7
clean up changelog
Co-authored-by: yuzhai yuzhai@nvidia.com
Co-authored-by: Haicheng Wu haichengw@nvidia.com
update 3.7 docs (NVIDIA#2051)
update docs
update docs
update docs
update docs
update docs
Co-authored-by: yuzhai yuzhai@nvidia.com
CUTLASS 3.8 Release (NVIDIA#2059)
CUTLASS 3.8 Release
update
Update README.md
Revert "Update README.md"
This reverts commit b353e36.
update
update
Co-authored-by: Haicheng Wu 57973641+hwu36@users.noreply.github.com
Co-authored-by: Haicheng Wu haichengw@nvidia.com
fix cuda 12.6 issues (NVIDIA#2066)
fix a readme broken link (NVIDIA#2069)
Update README.md
Groupwise scaling along M for FP8 gemm (NVIDIA#2037)
FP8 groupwise scaling along M
small updates
Co-authored-by: zl zl@deepseek.com
Co-authored-by: Haicheng Wu haichengw@nvidia.com
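Groupwise scaling along M differs from the 2D blockwise scheme above in that one scale covers a whole group of consecutive rows (spanning all of K) instead of a square block. A hedged sketch; `GROUP_M` and `E4M3_MAX` are illustrative assumptions:

```python
GROUP_M = 2       # rows per scaling group (illustrative)
E4M3_MAX = 448.0  # max finite magnitude of FP8 e4m3

def groupwise_scales_along_m(a):
    """One scale per GROUP_M consecutive rows of a (list of lists)."""
    return [
        max(max(abs(v) for row in a[g:g + GROUP_M] for v in row) / E4M3_MAX,
            1e-12)
        for g in range(0, len(a), GROUP_M)
    ]

a = [[896.0, 1.0], [2.0, 3.0],   # group 0: amax 896 -> scale 2.0
     [4.0, 5.0], [448.0, 6.0]]   # group 1: amax 448 -> scale 1.0
print(groupwise_scales_along_m(a))  # [2.0, 1.0]
```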
bugfix generic-k code in top-k with softmax (NVIDIA#1993)
bugfix generic-k code in top-k with softmax
Update include/cutlass/epilogue/fusion/sm90_visitor_topk_softmax.hpp
Co-authored-by: Ali Hassani 68103095+alihassanijr@users.noreply.github.com
- Update examples/61_hopper_gemm_with_topk_and_softmax/61_hopper_gemm_with_topk_and_softmax.cu
Co-authored-by: Ali Hassani 68103095+alihassanijr@users.noreply.github.com
Co-authored-by: Ali Hassani 68103095+alihassanijr@users.noreply.github.com
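The top-k-with-softmax epilogue being fixed above combines two steps per output row: keep the k largest logits, softmax over just those, and zero everything else. A minimal scalar sketch of that logic (an illustration, not the fused epilogue's actual implementation):

```python
import math

def topk_softmax(row, k=2):
    """Softmax over the k largest entries of row; all others become 0."""
    idx = sorted(range(len(row)), key=lambda i: row[i])[-k:]  # k largest
    top = max(row[i] for i in idx)
    z = {i: math.exp(row[i] - top) for i in idx}  # numerically stable
    total = sum(z.values())
    return [z[i] / total if i in z else 0.0 for i in range(len(row))]

res = topk_softmax([1.0, 3.0, 2.0, 0.5], k=2)
# entries outside the top-2 are zeroed; the kept entries sum to 1
assert res[0] == 0.0 and res[3] == 0.0
assert abs(sum(res) - 1.0) < 1e-12
```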
[EVT] Add support for Row/Col broadcast PtrArray (NVIDIA#2033)
Add group support to EVT row/col broadcast.
small modifications
Co-authored-by: Haicheng Wu haichengw@nvidia.com
v3.8.0 update (NVIDIA#2082)
3.8 update
fix Markus' name
Co-authored-by: yuzhai yuzhai@nvidia.com
- [WA] Fix compiling errors
Co-authored-by: Saagar Jha saagar@saagarjha.com
Co-authored-by: Haicheng Wu haichengw@nvidia.com
Co-authored-by: Sergey Klevtsov 141879860+sklevtsov-nvidia@users.noreply.github.com
Co-authored-by: Tri Dao tridao@users.noreply.github.com
Co-authored-by: Xinyu Yang ltyxy@buaa.edu.cn
Co-authored-by: sijialou sijia.lou@intel.com
Co-authored-by: Bogumil Sapinski Mobica 48835513+Bogumil-Sapinski-Mobica@users.noreply.github.com
Co-authored-by: Haicheng Wu 57973641+hwu36@users.noreply.github.com
Co-authored-by: Lei Mao dukeleimao@gmail.com
Co-authored-by: 103yiran 1039105206@qq.com
Co-authored-by: MaxAkaAltmer MaxAkaAltmer@yandex.ru
Co-authored-by: 侯奇 houqi1993@gmail.com
Co-authored-by: Lain 28486541+IwakuraRein@users.noreply.github.com
Co-authored-by: Siyuan Fu siyuanf@nvidia.com
Co-authored-by: Caleb_Du 59528230+CalebDu@users.noreply.github.com
Co-authored-by: LiYu Lu luliyucoordinate@outlook.com
Co-authored-by: azhurkevich 101208641+azhurkevich@users.noreply.github.com
Co-authored-by: chenwei 15601910741@163.com
Co-authored-by: Wenlei Bao 142055114+wenlei-bao@users.noreply.github.com
Co-authored-by: LiuQiang thorneliu@gmail.com
Co-authored-by: dan_the_3rd 43445237+danthe3rd@users.noreply.github.com
Co-authored-by: Jack Kosaian jackkosaian@gmail.com
Co-authored-by: Yujia Zhai yzhai015@ucr.edu
Co-authored-by: yuzhai yuzhai@nvidia.com
Co-authored-by: Andrew O'Neill foolusion@gmail.com
Co-authored-by: Dongxu.Wang wangdongxuking61@gmail.com
Co-authored-by: ZZK 359521840@qq.com
Co-authored-by: Driss Guessous 32754868+drisspg@users.noreply.github.com
Co-authored-by: ZincCat 52513999+zinccat@users.noreply.github.com
Co-authored-by: Manish Gupta mgupta.iitr@gmail.com
Co-authored-by: bobliao codechaser@163.com
Co-authored-by: mihir-awatramani 162148077+mihir-awatramani@users.noreply.github.com
Co-authored-by: Liang 44948473+soundOfDestiny@users.noreply.github.com
Co-authored-by: zl zl@deepseek.com
Co-authored-by: Tadej Ciglarič tadej.c@gmail.com
Co-authored-by: Ali Hassani 68103095+alihassanijr@users.noreply.github.com
Co-authored-by: Josh Fromm jwfromm@meta.com