mrfakename (@realmrfakename) on X
We finally have open-source, long-form, expressive TTS models. VibeVoice-1.5B from Microsoft: ✅ MIT licensed ✅ Up to 90 minutes long ✅ Highly expressive & emotional!
New 17B open image generation model released 👀 ➡️ 3 variants: Full (50 steps), Dev (28 steps), Fast (16 steps) ➡️ Llama 3-based text encoder ➡️ MIT licensed (but text encoder licensed under Llama 3)
🎵 An open-source music generation model (ACE-Step) was just released by StepFun AI + ACE Studio! 📐 3.5B parameter open-weight model 🔓 Apache-2.0 license 🎙️ Supports lyric generation 🚀 Can generate 4-minute songs in just 20s on an A100
This should be illegal. I see it with every major model release. Kokoro, VibeVoice, F5-TTS, now Nano Banana. Low-effort sites masquerading as official pages. SEO optimizing and stealing traffic from the real authors. Often they just embed a Gradio demo or sometimes even sell
Hugging Face announces Cosmo 1B, a fully open sourced Phi competitor with an open sourced dataset. The dataset references various articles and textbooks as "seed data" to generate conversations. Licensed under the Apache 2.0 license.
Grok 2 was just released on Hugging Face... And it's not even open-sourced 😔 "Grok 2 Community License Agreement" ❌ $1M commercial usage cap ❌ No distillation allowed ❌ Attribution required "Powered by xAI"
🚀 MegaTTS 3 voice cloning is here. For context: a while back, ByteDance released MegaTTS 3 (with exceptional voice cloning capabilities), but for various reasons, they decided not to release the WavVAE encoder necessary for voice cloning to work. Recently, a WavVAE encoder
Just launched TTS Arena V2! 🎙️ Blind A/B voting for TTS models (open + closed) 🗣️ New: Conversational Arena (CSM 1B, Dia 1.6B, more) 📊 Personal Leaderboard to track your favorites (optional login) ⚡ Rebuilt from scratch - faster + keyboard shortcuts
🎙️ Just finished training OpenF5-TTS, an Apache 2.0 retrain of the F5-TTS model! 🔊 Zero-shot voice cloning w/ 3s of audio 🔓 Apache 2.0 - free for commercial use 🔃 Compatible w/ F5-TTS - use as a drop-in replacement ⏳ Trained for 1M steps on almost 100K hours of speech
OpenF5 TTS is trending on the first page of Hugging Face for text-to-speech models! It's an Apache 2.0-licensed retrain of the F5-TTS model that can be used for commercial purposes
Demo for Dia 1.6B now live on @huggingface Spaces. Dia is a new Apache-licensed 1.6B text-to-dialogue model released by Nari Labs. Below is a simple demo to try it out:
Replying to @GroqInc Very cool! Created a proxy to use it with Claude Code:
Muyan TTS was just released 🎙️ TTS model designed for podcast generation 💰 Trained with $50,000 budget 🔊 Trained on >100K hours of podcast speech 🔓 Licensed under Apache 2.0 🦙 Based on Llama 3B