BitsFusion
¹Snap Inc. ²Rutgers University
NeurIPS 2024
BitsFusion quantizes the weights of the Stable Diffusion v1.5 UNet from FP16 (1.72 GB) to 1.99 bits (219 MB), a 7.9X compression ratio, while achieving even better generation quality.
Left 1: a portrait of an anthropomorphic cyberpunk raccoon smoking a cigar, cyberpunk!, fantasy, elegant, digital painting, artstation, concept art, matte, sharp focus, illustration, art by josan Gonzalez
Left 2: Pirate ship trapped in a cosmic maelstrom nebula, rendered in cosmic beach whirlpool engine, volumetric lighting, spectacular, ambient lights, light pollution, cinematic atmosphere, art nouveau style, illustration art artwork by SenseiJaye, intricate detail.
Left 3: tropical island, 8 k, high resolution, detailed charcoal drawing, beautiful hd, art nouveau, concept art, colourful, in the style of vadym meller
Left 4: anthropomorphic art of a fox wearing a white suit, white cowboy hat, and sunglasses, smoking a cigar, texas inspired clothing by artgerm, victo ngai, ryohei hase, artstation. highly detailed digital painting, smooth, global illumination, fantasy art by greg rutkowsky, karl spitzweg
Left 5: a painting of a lantina elder woman by Leonardo da Vinci . details, smooth, sharp focus, illustration, realistic, cinematic, artstation, award winning, rgb , unreal engine, octane render, cinematic light, macro, depth of field, blur, red light and clouds from the back, highly detailed epic cinematic concept art CG render made in Maya, Blender and Photoshop, octane render, excellent composition, dynamic dramatic cinematic lighting, aesthetic, very inspirational, arthouse.
Left 6: panda mad scientist mixing sparkling chemicals, high-contrast painting
Left 7: An astronaut riding a horse on the moon, oil painting by Van Gogh.
Left 8: A red dragon dressed in a tuxedo and playing chess. The chess pieces are fashioned after robots.
Top: Images generated by full-precision Stable Diffusion v1.5. Bottom: Images generated by BitsFusion, whose UNet weights are quantized to 1.99 bits, making the UNet 7.9X smaller than that of Stable Diffusion v1.5. All images are synthesized with the PNDM sampler, 50 sampling steps, and random seed 1024.
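The size figures quoted above can be checked with a few lines of arithmetic. The sketch below assumes 1 GB = 1000 MB and 2 bytes per FP16 weight; the variable and function names are illustrative and do not come from the BitsFusion code. It also estimates the storage of the weights alone at 1.99 bits, which comes out slightly below the reported 219 MB; this page does not say what accounts for the difference.

```python
# Reported sizes from the page (assumption: 1 GB = 1000 MB).
FP16_SIZE_MB = 1.72 * 1000
QUANTIZED_SIZE_MB = 219
FP16_BYTES_PER_WEIGHT = 2
AVERAGE_BITS = 1.99


def compression_ratio(original_mb, compressed_mb):
    """Ratio between the original and the compressed storage size."""
    return original_mb / compressed_mb


def weight_only_size_mb(fp16_size_mb, average_bits):
    """Storage for the weights alone at the given average bit width."""
    num_weights = fp16_size_mb * 1e6 / FP16_BYTES_PER_WEIGHT
    return num_weights * average_bits / 8 / 1e6


ratio = compression_ratio(FP16_SIZE_MB, QUANTIZED_SIZE_MB)
weights_mb = weight_only_size_mb(FP16_SIZE_MB, AVERAGE_BITS)
print(f"compression ratio: {ratio:.1f}X")
print(f"weights alone at {AVERAGE_BITS} bits: {weights_mb:.1f} MB")
```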
Abstract
Diffusion-based image generation models have achieved great success in recent years by showing the capability of synthesizing high-quality content. However, these models contain a huge number of parameters, resulting in a significantly large model size. Saving and transferring them is a major bottleneck for various applications, especially those running on resource-constrained devices. In this work, we develop a novel weight quantization method that quantizes the UNet from Stable Diffusion v1.5 to 1.99 bits, achieving a model with 7.9X smaller size while exhibiting even better generation quality than the original one. Our approach includes several novel techniques, such as assigning optimal bits to each layer, initializing the quantized model for better performance, and improving the training strategy to dramatically reduce quantization error. Furthermore, we extensively evaluate our quantized model across various benchmark datasets and through human evaluation to demonstrate its superior generation quality.
Overview of Training and Inference Pipeline
Left: We analyze the quantization error of each layer in SD-v1.5 and derive a mixed-precision recipe that assigns different bit widths to different layers. We then initialize the quantized UNet by adding a balance integer, pre-computing and caching the time embedding, and alternately optimizing the scaling factor. Middle: In Stage-I training, we freeze the teacher model (i.e., SD-v1.5) and optimize the quantized UNet with CFG-aware quantization distillation and feature distillation losses, sampling time steps according to their quantization errors. In Stage-II training, we fine-tune the Stage-I model with the noise prediction objective. Right: At inference, the model uses the pre-cached time features to process text prompts and generate high-quality images.
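To make the idea of alternately optimizing a scaling factor concrete, here is a minimal sketch of one generic way to do it for a single weight tensor. It is not taken from the BitsFusion code and may differ from the authors' procedure: it assumes a symmetric integer grid from `-max_level` to `max_level`, and it alternates between rounding the weights to the grid with the scale fixed and refitting the scale by least squares with the integer levels fixed. All function names are hypothetical.

```python
def quantize(weights, scale, max_level):
    """Round each weight to the nearest integer level in [-max_level, max_level]."""
    return [max(-max_level, min(max_level, round(w / scale))) for w in weights]


def squared_error(weights, levels, scale):
    """Sum of squared differences between weights and their dequantized values."""
    return sum((w - scale * q) ** 2 for w, q in zip(weights, levels))


def fit_scale(weights, max_level, iterations=10):
    """Alternate between re-rounding the levels and refitting the scale."""
    largest = max(abs(w) for w in weights)
    if largest == 0:
        raise ValueError("all weights are zero; no scale to fit")
    scale = largest / max_level  # min-max initial guess (an assumption)
    levels = quantize(weights, scale, max_level)
    for _ in range(iterations):
        denom = sum(q * q for q in levels)
        if denom == 0:
            break
        # Least-squares optimal scale for the current integer levels.
        scale = sum(w * q for w, q in zip(weights, levels)) / denom
        # Nearest integer levels for the current scale.
        levels = quantize(weights, scale, max_level)
    return scale, levels


# Toy example with a three-level grid {-1, 0, 1}; the weights are made up.
toy_weights = [0.9, -0.1, 0.05, -0.4, 0.02, 0.3, -0.05, 0.1]
init_scale = max(abs(w) for w in toy_weights)
init_error = squared_error(toy_weights, quantize(toy_weights, init_scale, 1), init_scale)
final_scale, final_levels = fit_scale(toy_weights, max_level=1)
final_error = squared_error(toy_weights, final_levels, final_scale)
print(init_error, final_error)
```

Each of the two steps is optimal with the other quantity held fixed, so the reconstruction error never increases from one iteration to the next.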
Benchmark Comparisons
Comparison of our 1.99-bit model and SD-v1.5 on various evaluation metrics, with CFG scales ranging from 2.5 to 9.5.
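For readers unfamiliar with the CFG scale swept in this comparison: classifier-free guidance combines an unconditional and a text-conditioned noise prediction, and the scale controls how far the result is pushed toward the conditioned one. The sketch below shows the standard formulation on plain Python lists; it is not code from BitsFusion, and the function and variable names are illustrative.

```python
def cfg_combine(eps_uncond, eps_cond, cfg_scale):
    """Standard classifier-free guidance applied element-wise to two predictions."""
    return [u + cfg_scale * (c - u) for u, c in zip(eps_uncond, eps_cond)]


# Toy noise predictions (made-up numbers) at the two ends of the reported range.
eps_uncond = [0.0, 1.0]
eps_cond = [1.0, 1.0]
for scale in (2.5, 9.5):
    print(scale, cfg_combine(eps_uncond, eps_cond, scale))
```

A scale of 1.0 returns the conditioned prediction unchanged; larger scales extrapolate beyond it, which is why quality metrics are typically reported over a range of scales.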
Human Evaluation
Human evaluation comparisons between SD-v1.5 and BitsFusion. BitsFusion is favored 54.41% of the time over SD-v1.5.
More Comparisons
Left 1: A person standing on the desert, desert waves, gossip illustration, half red, half blue, abstract image of sand, clear style, trendy illustration, outdoor, top view, clear style, precision art, ultra high definition image
Left 2: A detailed oil painting of an old sea captain, steering his ship through a storm. Saltwater is splashing against his weathered face, determination in his eyes. Twirling malevolent clouds are seen above and stern waves threaten to submerge the ship while seagulls dive and twirl through the chaotic landscape. Thunder and lights embark in the distance, illuminating the scene with an eerie green glow.
Left 3: A solitary figure shrouded in mists peers up from the cobble stone street at the imposing and dark gothic buildings surrounding it. an old-fashioned lamp shines nearby. oil painting.
Left 4: A deep forest clearing with a mirrored pond reflecting a galaxy-filled night sky
Left 5: a handsome 24 years old boy in the middle with sky color background wearing eye glasses, it's super detailed with anime style, it's a portrait with delicated eyes and nice looking face
Left 6: A dog that has been meditating all the time.
Top: full-precision Stable Diffusion v1.5. Bottom: 1.99 bits BitsFusion.
Left 1: A small cactus with a happy face in the Sahara desert.
Left 2: A middle-aged woman of Asian descent, her dark hair streaked with silver, appears fractured and splintered, intricately embedded within a sea of broken porcelain. The porcelain glistens with splatter paint patterns in a harmonious blend of glossy and matte blues, greens, oranges, and reds, capturing her dance in a surreal juxtaposition of movement and stillness. Her skin tone, a light hue like the porcelain, adds an almost mystical quality to her form.
Left 3: A high contrast portrait photo of a fluffy hamster wearing an orange beanie and sunglasses holding a sign that says "Let's PAINT!"
Left 4: An extreme close-up of an gray-haired man with a beard in his 60s, he is deep in thought pondering the history of the universe as he sits at a cafe in Paris, his eyes focus on people offscreen as they walk as he sits mostly motionless, he is dressed in a wool coat suit coat with a button-down shirt , he wears a brown beret and glasses and has a very professorial appearance, and the end he offers a subtle closed-mouth smile as if he found the answer to the mystery of life, the lighting is very cinematic with the golden light and the Parisian streets and city in the background, depth of field, cinematic 35mm film.
Left 5: poster of a mechanical cat, techical Schematics viewed from front and side view on light white blueprint paper, illustartion drafting style, illustation, typography, conceptual art, dark fantasy steampunk, cinematic, dark fantasy
Left 6: I want to supplement vitamin c, please help me paint related food.
Top: full-precision Stable Diffusion v1.5. Bottom: 1.99 bits BitsFusion.
Left 1: new cyborg with cybertronic gadgets and vr helmet, hard surface, beautiful colours, sharp textures, shiny shapes, acid screen, biotechnology, tim hildebrandt, bruce pennington, donato giancola, larry elmore, masterpiece, trending on artstation, featured on pixiv, cinematic composition, dramatic pose, beautiful lighting, sharp, details, hyper - detailed, hd, hdr, 4 k, 8 k
Left 2: portrait of teenage aphrodite, light freckles, curly copper colored hair, smiling kindly, wearing an embroidered white linen dress with lace neckline, intricate, elegant, mother of pearl jewelry, glowing lights, highly detailed, digital painting, artstation, concept art, smooth, sharp focus, illustration, art by wlop, mucha, artgerm, and greg Rutkowski
Left 3: portrait of a dystopian cute dog wearing an outfit inspired by the handmaid ' s tale ( 2 0 1 7 ), intricate, headshot, highly detailed, digital painting, artstation, concept art, sharp focus, cinematic lighting, digital painting, art by artgerm and greg rutkowski, alphonse mucha, cgsociety
Left 4: Portrait of a man by Greg Rutkowski, symmetrical face, a marine with a helmet, using a VR Headset, Kubric Stare, crooked smile, he's wearing a tacitcal gear, highly detailed portrait, scifi, digital painting, artstation, book cover, cyberpunk, concept art, smooth, sharp foccus ilustration, Artstation HQ
Left 5: Film still of female Saul Goodman wearing a catmaid outfit, from Red Dead Redemption 2 (2018 video game), trending on artstation, artstationHD, artstationHQ
Left 6: oil paining of robotic humanoid, intricate mechanisms, highly detailed, professional digital painting, Unreal Engine 5, Photorealism, HD quality, 8k resolution, cinema 4d, 3D, cinematic, professional photography, art by artgerm and greg rutkowski and alphonse mucha and loish and WLOP
Top: full-precision Stable Diffusion v1.5. Bottom: 1.99 bits BitsFusion.
Left 1: anthropomorphic tetracontagon head in opal edgy darknimite mudskipper, intricate, elegant, highly detailed animal monster, digital painting, artstation, concept art, smooth, sharp focus, illustration, art by artgerm, bob eggleton, michael whelan, stephen hickman, richard corben, wayne barlowe, trending on artstation and greg rutkowski and alphonse mucha, 8 k
Left 2: background shows moon, many light effects, particle, lights, gems, symmetrical!!! centered portrait dark witch, large cloak, fantasy forest landscape, dragon scales, fantasy magic, undercut hairstyle, short purple black fade hair, dark light night, intricate, elegant, sharp focus, digital painting, concept art, matte, art by wlop and artgerm and greg rutkowski and alphonse mucha, masterpiece
Left 3: cat seahorse fursona, autistic bisexual graphic designer and musician, long haired attractive androgynous fluffy humanoid character design, sharp focus, weirdcore voidpunk digital art by artgerm, akihiko yoshida, louis wain, simon stalenhag, wlop, noah bradley, furaffinity, artstation hd, trending on deviantart
Left 4: concept art of ruins of a victorian city burning down by j. c. leyendecker, wlop, ruins, dramatic, octane render, epic painting, extremely detailed, 8 k
Left 5: hyperrealistic Gerald Gallego as a killer clown from outer space, trending on artstation, portrait, sharp focus, illustration, art by artgerm and greg rutkowski and magali Villeneuve
Left 6: low angle photo of a squirrel dj wearing on - ear headphones and colored sunglasses, stadning at a dj table playing techno music at a dance club, hyperrealistic, highly detailed, intricate, smoke, colored lights, concept art, digital art, oil painting, character design by charlie bowater, ross tran, artgerm, makoto shinkai, wlop
Top: full-precision Stable Diffusion v1.5. Bottom: 1.99 bits BitsFusion.
Left 1: a photograph of an ostrich wearing a fedora and singing soulfully into a microphone
Left 2: a pirate ship landing on the moon
Left 3: a pumpkin with a candle in it
Left 4: a rabbit wearing a black tophat and monocle
Left 5: a red sports car on the road
Left 6: a robot cooking in the kitchen.
Top: full-precision Stable Diffusion v1.5. Bottom: 1.99 bits BitsFusion.
Left 1: a baby daikon radish in a tutu
Left 2: a baby penguin wearing a blue hat, red gloves, green shirt, and yellow pants
Left 3: a woman with long black hair and dark skin
Left 4: an emoji of a baby penguin wearing a blue hat, red gloves, green shirt, and yellow pants
Left 5: a blue sports car on the road
Left 6: a butterfly.
Top: full-precision Stable Diffusion v1.5. Bottom: 1.99 bits BitsFusion.
Left 1: Helmet of a forgotten Deity, clowing corals, extremly detailed digital painting, in the style of Fenghua Zhong and Ruan Jia and jeremy lipking and Peter Mohrbacher, mystical colors, rim light, beautiful lighting, 8k, stunning scene, raytracing, octane, trending on artstation
Left 2: Jeff Bezos as a female amazon warrior, closeup, D&D, fantasy, intricate, elegant, highly detailed, digital painting, artstation, concept art, matte, sharp focus, illustration, hearthstone, art by Artgerm and Greg Rutkowski and Alphonse Mucha
Left 3: Portrait of a draconic humanoid, HD, illustration, epic, D&D, fantasy, intricate, elegant, highly detailed, digital painting, artstation, concept art, smooth, sharp focus, illustration, art by artgerm and greg rutkowski and alphonse mucha, monster hunter illustrations art book
Left 4: [St.Georges slaying a car adorned with checkered flag. Soviet Propaganda!!! poster!!!, elegant, highly detailed, digital painting, artstation, concept art, matte, sharp focus, illustration, octane render, unreal engine, photography]
Left 5: a fire - breathing dragon at a medieval hobbit home, ornate, beautiful, atmosphere, vibe, mist, smoke, chimney, rain, wet, pristine, puddles, waterfall, clear stream, bridge, forest, flowers, concept art illustration, color page, 4 k, tone mapping, doll, akihiko yoshida, james jean, andrei riabovitchev, marc simonetti, yoshitaka amano, digital illustration, greg rutowski, volumetric lighting, sunbeams, particles
Left 6: portrait of a well-dressed raccoon, oil painting in the style of Rembrandt
Top: full-precision Stable Diffusion v1.5. Bottom: 1.99 bits BitsFusion.