Community lexicons · karashiiro/TextToTalk · Discussion #62

Community lexicons #62

Aug 13, 2021

71 comments · 55 replies

(Continued from #43)

If anyone has lexicons they're willing to share, I'd appreciate it if they could drop a link so I can provide them to anyone who wants them and doesn't know how to make them themselves 😄 Alternatively, feel free to post them in the #preset-sharing channel in the goat place Discord, and we can relink them.

A list of community lexicons is maintained on the wiki, and we have a checked-in collection in the repo.


FFXIVCharacters&LocationsEN.zip
Fixed a mistake where two entries' phonemes were swapped during testing.

I've done all the characters I noticed most while going through the MSQ, as well as mispronounced location names. These use phonemes, so pronunciation should be more consistent across different regional accents of English.
Works with all English voices (tested for US and GB).


Is there an app or something easy to make these lexicons?


I am not sure if this should be a separate issue or not, but I tried using the FFXIVCharacters&LocationsEN.zip lexicon (both through Amazon Polly and directly uploaded to the plugin) and it said "Maximum lexicons size has been exceeded".


Is there an app or something easy to make these lexicons?

https://docs.aws.amazon.com/polly/latest/dg/gs-put-lexicon.html

I don't know of any apps for this, but this article has some lexicons used in its examples that might explain the concept.
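For anyone who would rather script a lexicon than hand-write the XML, here is a minimal sketch in Python (3.9+, standard library only). It assumes the W3C PLS format used in the AWS article above (`<lexicon>`, `<lexeme>`, `<grapheme>`, `<phoneme>` with `alphabet="ipa"`); `build_lexicon` is a hypothetical helper, not something TextToTalk ships.

```python
import xml.etree.ElementTree as ET

# W3C Pronunciation Lexicon Specification (PLS) namespace, as used in the AWS examples.
PLS_NS = "http://www.w3.org/2005/01/pronunciation-lexicon"
XML_LANG = "{http://www.w3.org/XML/1998/namespace}lang"  # serialized as xml:lang


def build_lexicon(entries: dict[str, str], lang: str = "en-US") -> str:
    """Return PLS lexicon text mapping each grapheme to an IPA phoneme string."""
    ET.register_namespace("", PLS_NS)  # emit <lexeme>, not <ns0:lexeme>
    root = ET.Element(
        f"{{{PLS_NS}}}lexicon",
        {"version": "1.0", "alphabet": "ipa", XML_LANG: lang},
    )
    for grapheme, phoneme in entries.items():
        lexeme = ET.SubElement(root, f"{{{PLS_NS}}}lexeme")
        ET.SubElement(lexeme, f"{{{PLS_NS}}}grapheme").text = grapheme
        ET.SubElement(lexeme, f"{{{PLS_NS}}}phoneme").text = phoneme
    ET.indent(root, space="    ")  # needs Python 3.9+
    return '<?xml version="1.0" encoding="UTF-8"?>\n' + ET.tostring(root, encoding="unicode")


if __name__ == "__main__":
    # Entries taken from pronunciations suggested later in this thread.
    print(build_lexicon({"Ul'dah": "uːlˈdɑ", "Y'shtola": "j ʃtəʊlə"}))
```

The printed XML can be saved to a file and uploaded through the plugin like any other lexicon.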

I am not sure if this should be a separate issue or not, but I tried using the FFXIVCharacters&LocationsEN.zip lexicon (both through Amazon Polly and directly uploaded to the plugin) and it said "Maximum lexicons size has been exceeded".

I've never heard of this happening, but I assume that means Amazon Polly has some sort of size limit on lexicons. You can try splitting the lexicon in half, maybe? Pulling out half of the lexemes and putting them into a new lexicon file and uploading the resulting two smaller ones.
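The split-in-half approach suggested above can also be scripted. This is a minimal sketch, assuming the file is a standard PLS lexicon; `split_in_half` and the `.partN.pls` output names are hypothetical. Each half keeps the original `<lexicon>` attributes (language, alphabet), so it remains a valid lexicon on its own.

```python
import sys
import xml.etree.ElementTree as ET

PLS_NS = "http://www.w3.org/2005/01/pronunciation-lexicon"
LEXEME = f"{{{PLS_NS}}}lexeme"


def split_in_half(lexicon_xml: str) -> tuple[str, str]:
    """Split one PLS lexicon into two, each keeping the original <lexicon> attributes."""
    ET.register_namespace("", PLS_NS)  # keep the default namespace on output
    root = ET.fromstring(lexicon_xml)
    lexemes = root.findall(LEXEME)
    middle = (len(lexemes) + 1) // 2  # the first file gets the extra lexeme if the count is odd
    halves = []
    for chunk in (lexemes[:middle], lexemes[middle:]):
        new_root = ET.Element(root.tag, dict(root.attrib))
        new_root.extend(chunk)
        ET.indent(new_root, space="    ")
        halves.append(ET.tostring(new_root, encoding="unicode"))
    return halves[0], halves[1]


if __name__ == "__main__" and len(sys.argv) > 1:
    source = sys.argv[1]
    with open(source, encoding="utf-8-sig") as f:  # utf-8-sig tolerates a BOM
        parts = split_in_half(f.read())
    for number, text in enumerate(parts, start=1):
        with open(f"{source}.part{number}.pls", "w", encoding="utf-8") as out:
            out.write('<?xml version="1.0" encoding="UTF-8"?>\n' + text + "\n")
```

One caveat: Python's ElementTree parser drops XML comments by default, so any `<!-- ... -->` section headings in the source will not appear in the output files.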


I've never heard of this happening, but I assume that means Amazon Polly has some sort of size limit on lexicons. You can try splitting the lexicon in half, maybe? Pulling out half of the lexemes and putting them into a new lexicon file and uploading the resulting two smaller ones.

Looks like you nailed it: as per Amazon Polly’s site, “Each lexicon can be up to 4,000 characters in size.”
I’ll have to wait until I have time to figure out how to separate them on an actual computer.


Looks like you nailed it: as per Amazon Polly’s site, “Each lexicon can be up to 4,000 characters in size.”
I’ll have to wait until I have time to figure out how to separate them on an actual computer.

Wow, 4000 characters is quite a small limit; in the future I'll have to make split files for Amazon Polly. For now, though, I removed a lexeme that can't be used with the way the plugin currently works, and the character count is now 3999! If there are any issues, let me know!
FFXIVLexiconPollyEN.zip
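To check a file against the 4,000-character limit before uploading, a few lines of Python are enough. This sketch assumes Polly counts every character of the file (XML declaration, tags, and whitespace included) and counts Unicode code points, so combining IPA marks count separately; neither assumption is confirmed in this thread, and `polly_size_report` is a hypothetical name.

```python
import sys

# Limit quoted in this thread from Amazon Polly's docs: "Each lexicon can be up to 4,000 characters in size."
POLLY_MAX_LEXICON_CHARS = 4000


def polly_size_report(lexicon_text: str, limit: int = POLLY_MAX_LEXICON_CHARS) -> tuple[int, bool]:
    """Return (character count, fits under the limit) for a lexicon's full text."""
    count = len(lexicon_text)  # counts code points, so "m" plus a combining mark counts as 2
    return count, count <= limit


if __name__ == "__main__":
    for path in sys.argv[1:]:
        # newline="" keeps Windows "\r\n" line endings, so each counts as two characters
        with open(path, encoding="utf-8-sig", newline="") as f:
            count, fits = polly_size_report(f.read())
        print(f"{path}: {count} characters ({'OK' if fits else 'too big for Polly'})")
```

Run it with one or more lexicon file paths as arguments; the script name is up to you.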


Oh damn, thank you so much! Hah, karashiiro saw the future and knew it needed to get under 4000 characters LOL. But seriously, thank you so much, I'll try it in a few, hopefully, after a few morning jobs. Also, do you recommend uploading it to Amazon directly or just uploading it to the addon?


You'll need to upload it through the TextToTalk plugin.


FFXIVCharacters&Locations.zip


FFXIVCharacters&Locations.zip

* Fixes pronunciation of Urianger's name for Microsoft David. Hopefully works with other voices (confirmed working for Zira, at least). I haven't been able to test others since I reinstalled Windows. Under 4000 characters, btw.

That zip looks empty 👀


--------Update: Added plurals to the new additions--------

Fixed pronunciation for the word Aetheryte and one of the expansion's location names.
There will be more updates coming as I play through the expansion! Glad to have TTT on the first day I've been able to play the story!

Also I've split the polly zip version into two lexicon files to respect the 4000 character limit.

FFXIVCharacters.Locations.zip

FFXIVCharacters.Locations.Polly.zip


I've been sitting on an idea for the lexicon repository for a while now: I think it would be pretty neat to have a main repository for all the non-"controversial" pronunciations, for lack of a better word. Then, on the side, for people like me who prefer a more Western-style pronunciation of some character names and who use UK voices with British accents, I could create an alternative lexicon in which R's sound like R's and L's sound like L's in a British accent, which many voiced characters in FFXIV actually have.

For example, the character name Yugiri has two alternative ways of pronouncing it, depending on which character in FFXIV is speaking the name.


Should we consider removing "Eld" and "eld" from the lexicon? It seems to be interfering with the word "battlefield", pronouncing it as "battle-fee eld" instead of "battle-feeld". Is that a bug that can be fixed in the C# code, or is that just how lexicons work, meaning we should remove "eld" from the lexicon?

Below is a screenshot of the Dalamud Log for this text to speech:


@ryankhart

Tagging #104 as a related issue.

If, however, we force all graphemes to match only when surrounded by spaces or punctuation, that could also break the graphemes that make 's sound like z. So I like the regex idea, which would let us manually specify extra details about the surrounding context required for a match.

@johnysandels

Yeah, I'd just put a space or punctuation in the lexicon file on a case-by-case basis, until we sort out a better solution, that is.

Yeah, I mentioned the same issue in that community post I made here. I was just trying to figure out whether it's possible to regex those shorter words or something. The workaround for eld would be putting a space before it, to prevent it from matching mid-word.
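To illustrate the "battlefield" problem and the boundary idea discussed above, here is a hypothetical Python comparison of plain substring matching against a rule that only matches when no letter touches the grapheme on either side. The plugin's real matching code is C# and is not shown in this thread, so both helpers are assumptions about the behavior, not its implementation.

```python
import re

# Hypothetical helpers: TextToTalk's real C# matching code is not shown in this thread.
LETTER = r"[^\W\d_]"  # any Unicode letter


def apply_naive(text: str, grapheme: str, replacement: str) -> str:
    """Plain substring replacement: the grapheme matches anywhere, even mid-word."""
    return text.replace(grapheme, replacement)


def apply_word_bounded(text: str, grapheme: str, replacement: str) -> str:
    """Match the grapheme only when no letter touches it on either side."""
    pattern = f"(?<!{LETTER}){re.escape(grapheme)}(?!{LETTER})"
    # A function replacement stops re from treating backslashes in `replacement` specially.
    return re.sub(pattern, lambda _match: replacement, text)


if __name__ == "__main__":
    line = "The battlefield awaits."
    print(apply_naive(line, "eld", "[eld]"))         # The battlefi[eld] awaits.
    print(apply_word_bounded(line, "eld", "[eld]"))  # The battlefield awaits.
```

As noted above, the boundary rule would break entries that are meant to match inside a word, such as graphemes that turn 's into a z sound, so it would likely need to be a per-entry option rather than a global switch.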


I just finished adding Uberduck support as of v1.15. There's no need to debug lexicons on it, though, since it doesn't seem to support SSML.


Is there any reason why we shouldn't just append new lexemes to the bottom of the main lexicon? I grew tired of manually finding the right category and alphabetical position for each lexeme when I can just use Ctrl+F to check whether a word has been added yet. I'm inclined to just add new lexemes at the bottom of the lexicon and push the commit, but maybe there's a reason I haven't thought of for the way you have it organized, @johnysandels.

I've just pushed a batch of new lexemes that I added to my local copy while playing through Shadowbringers, since they've been working fine for me for a couple of weeks now.


@johnysandels

No real point to make it organized tbh. I just like it! If we're just doing them as we think of them, I'll remove the headings on the sections.

Hey, do you think there is a way to fix the mammets' speech with a lexicon? At least using Polly, it's unintelligible. Example:

UgH, mY hEaD. wHo...? Oh, It'S yOu. YoU fOlLoWeD mE...


@johnysandels

Ahahah! I could imagine what that sounds like.

It would be possible to fix, but it would have to be at the plugin level, and it would cause issues with lexicons afterwards depending on how it's implemented. Capitalization is important to how the lexicon is written, so if everything were forced into lowercase we would have to rewrite some of our lexicons, and we would run into other issues as well.

The only option I can think of is to have it change the text to lowercase conditionally. Maybe it could parse the string and only convert it all to lowercase if at least 40% of the text is capitals.

@karashiiro

Would it also cause issues if the graphemes were made to be lowercase, too? I don't know if there are any lexemes that are different depending on their capitalization.

@johnysandels

Yeah, the only real issue would be short words at the beginning of a sentence, where the word could be part of another word, like "Oh". Many words contain "oh", and at the beginning of the text the capital is the only way to tell it apart from them.

There was an idea about using regex to fix that issue, but it opens another can of worms.

@ryankhart

I just saw this, and it gave me an idea, but I don't know how feasible it is yet. What if literally all text was converted to lowercase before feeding it to text-to-speech? That might solve this problem and also save us from keeping various versions of every non-proper-noun word in the lexicons.

@karashiiro

That would run into the issue @johnysandels just mentioned, unless we changed SSML rendering significantly (which we could do, it'd just be a lot of work).
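johnysandels' conditional-lowercase idea (only lowercase a line when at least 40% of it is capitals) is simple to express. This is a minimal sketch, assuming the ratio is computed over letters only, ignoring spaces and punctuation; `normalize_shouting` is a hypothetical name, not plugin code.

```python
def normalize_shouting(text: str, threshold: float = 0.4) -> str:
    """Lowercase a line only when at least `threshold` of its letters are capitals."""
    letters = [c for c in text if c.isalpha()]
    if not letters:
        return text  # nothing to judge (e.g. "..."), leave it alone
    upper_ratio = sum(c.isupper() for c in letters) / len(letters)
    return text.lower() if upper_ratio >= threshold else text


if __name__ == "__main__":
    # The mammet line quoted above is roughly half capitals, so it gets lowercased.
    print(normalize_shouting("UgH, mY hEaD. wHo...? Oh, It'S yOu. YoU fOlLoWeD mE..."))
    # Ordinary dialogue stays as-is, so capital-sensitive graphemes like "Oh" still work.
    print(normalize_shouting("Oh, it's you. You followed me..."))
```

Very short lines such as "OK." would also cross the threshold, which may or may not be desirable.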

Small notice: I added Azure support in v1.18, and it has lexicons enabled. Not sure if it'll create any new compatibility issues, but from my minimal testing it looks like it works fine with Polly lexicons.


Hey

I have installed the plugin, and unfortunately every time I enter my Amazon Polly API credentials, they disappear. I thought I'd let you know about this bug; it may be due to the recent patch.

Thanks


@karashiiro

I'm aware; I just haven't had time to deal with this yet. I'm going to fix it tonight when I'm back at my hotel.

@IcePantha

Thanks! Honestly, your TextToTalk plugin is amazing; it's helped me so much while playing Final Fantasy.


Was doing some scholar/summoner quests and noticed these needed fixing:

<lexeme>
<grapheme>Zolka</grapheme>
<phoneme>zolkɑ</phoneme>
</lexeme>
<lexeme>
<grapheme>Nym</grapheme>
<phoneme>nɪ̈m</phoneme>
</lexeme>
<lexeme>
<grapheme>Nymian</grapheme>
<phoneme>nɪmi.ən</phoneme>
</lexeme>
<lexeme>
<grapheme>Mamool Ja</grapheme>
<phoneme>ma.muːl d͡ʒɑ</phoneme>
</lexeme>
<lexeme>
<grapheme>Y'mhitra</grapheme>
<phoneme>jˈmiːtɹɑ</phoneme>
</lexeme>
<lexeme>
<grapheme>Y'mhitra's</grapheme>
<phoneme>jˈmiːtɹɑz</phoneme>
</lexeme>
<lexeme>
<grapheme>Mhitra</grapheme>
<phoneme>miːtɹɑ</phoneme>
</lexeme>
<lexeme>
<grapheme>-egi</grapheme>
<grapheme>-Egi</grapheme>
<grapheme> egi </grapheme>
<grapheme> egi.</grapheme>
<phoneme>ˈɛɡi</phoneme>
</lexeme>
<lexeme>
<grapheme>Ramuh</grapheme>
<phoneme>raɱu</phoneme>
</lexeme>
<lexeme>
<grapheme>Ifrit</grapheme>
<phoneme>ifrjt</phoneme>
</lexeme>
<lexeme>
<grapheme>Mhach</grapheme>
<phoneme>mɑ́ːk</phoneme>
</lexeme>
<lexeme>
<grapheme>Mhachi</grapheme>
<phoneme>mɑ́ːki</phoneme>
</lexeme>
<lexeme>
<grapheme>voidmage</grapheme>
<phoneme>vɔɪdmeɪd͡ʒ</phoneme>
</lexeme>
<lexeme>
<grapheme>spriggan</grapheme>
<grapheme>Spriggan</grapheme>
<phoneme>sprɪgən</phoneme>
</lexeme>
<lexeme>
<grapheme>Carito</grapheme>
<phoneme>kɑɹˈiː.təʊ</phoneme>
</lexeme>
<lexeme>
<grapheme>Allag's</grapheme>
<phoneme>ælɛgz</phoneme>
</lexeme>
<lexeme>
<grapheme>R'ashaht</grapheme>
<phoneme>ɹʌʃˈaːt</phoneme>
</lexeme>
<lexeme>
<grapheme>Qiqirn</grapheme>
<phoneme>kɪˈkirn</phoneme>
</lexeme>
<lexeme>
<grapheme>Thanalan</grapheme>
<phoneme>ˈθænələn</phoneme>
</lexeme>
<lexeme>
<grapheme>a construct</grapheme>
<phoneme>ɑːˈkɑn.stɹʌkt</phoneme>
</lexeme>
<lexeme>
<grapheme>Bozja</grapheme>
<phoneme>ˈbozj̆a</phoneme>
</lexeme>

Also, I got tired of hearing "HM" for all those "Hm..." lines, so I had to extend the current ones. Can anyone else test these? I don't know if they'll interfere with other common words.

<lexeme>
<grapheme>Hm,</grapheme>
<grapheme>Hm.</grapheme>
<grapheme>Hm?</grapheme>
<grapheme> Hm,</grapheme>
<grapheme> Hm.</grapheme>
<grapheme> Hm?</grapheme>
<grapheme> hm,</grapheme>
<grapheme> hm.</grapheme>
<grapheme> hm?</grapheme>
<phoneme>hm̩</phoneme>
</lexeme>

A weird one I came across (the word was paladin-ing):

<lexeme>
<grapheme>-ing</grapheme>
<phoneme>ˈɪng</phoneme>
</lexeme>

Some tweaks I use and propose to add:

* kɝː.θəs for Coerthas (the current one is not really how they pronounce it in game, and it's also very convoluted)
* j ʃtəʊlə for Y'shtola
* uːlˈdɑ for Ul'dah
* ˈʃɑrliən for Sharlayan (not sure why it was ˈʃɑrliiən with ii; it doesn't pronounce it correctly that way)
* ˈʃɑrliənz for Sharlayans (same as above)
* ˈi.θɚ for Aether (it's pronounced "Ether" in game; Aether is a stylistic spelling choice)
* ˈi.θəraɪt for Aetheryte (same as above, with a slight tweak so it works for all English accents)
* ˈi.θəraɪtz for Aetherytes (same as above)
* ləˈmɪnsə for Lominsa (better syllable stress)

You must be logged in to vote

0 replies

Added more:

<lexeme>
<grapheme>Lahabrea</grapheme>
<phoneme>ˈlahʌˈbɹeːæ</phoneme>
</lexeme>
<lexeme>
<grapheme>Yayake</grapheme>
<phoneme>jə.jɑːkɛ</phoneme>
</lexeme>
<lexeme>
<grapheme>Telophoroi</grapheme>
<phoneme>telˈɔfɔɹɔɪ</phoneme>
</lexeme>
<lexeme>
<grapheme>grimoires</grapheme>
<phoneme>ˈɡɹɪmˌwɑːɹs</phoneme>
</lexeme>
<lexeme>
<grapheme>Halone</grapheme>
<phoneme>hʌlˈəʊnɛɪ</phoneme>
</lexeme>
<lexeme>
<grapheme>Cloudtop</grapheme>
<phoneme>klaʊdtɒp</phoneme>
</lexeme>
<lexeme>
<grapheme>Vundu Ok'Bendu</grapheme>
<phoneme>ˈvundu ok bɛndu</phoneme>
</lexeme>
<lexeme>
<grapheme>Padjal</grapheme>
<phoneme>pʌˈd͡ʒæl</phoneme>
</lexeme>
<lexeme>
<grapheme>Vidofnir</grapheme>
<phoneme>vɪd.ˈoʊfnɪɚ</phoneme>
</lexeme>
<lexeme>
<grapheme>Ystride de Caulignont</grapheme>
<phoneme>ɪstɹɪd du kɔlɪɲɒn</phoneme>
</lexeme>
<lexeme>
<grapheme>Ystride</grapheme>
<phoneme>ɪstɹɪd</phoneme>
</lexeme>
<lexeme>
<grapheme>Caulignont</grapheme>
<phoneme>kɔlɪɲɒn</phoneme>
</lexeme>
<lexeme>
<grapheme>Kupo</grapheme>
<grapheme>kupo</grapheme>
<phoneme>ˈkuːˌpoʊ</phoneme>
</lexeme>
<lexeme>
<grapheme>Moogle</grapheme>
<grapheme>moogle</grapheme>
<phoneme>muːɡəl</phoneme>
</lexeme>
<lexeme>
<grapheme>Moogles</grapheme>
<grapheme>moogles</grapheme>
<phoneme>muːɡəls</phoneme>
</lexeme>
<lexeme>
<grapheme>Asah</grapheme>
<phoneme>ʌsɑː</phoneme>
</lexeme>
<lexeme>
<grapheme>Gegeruju</grapheme>
<phoneme>ɡɛɡɛˈɹu.d͡ʒu</phoneme>
</lexeme>
<lexeme>
<grapheme>PvP</grapheme>
<grapheme>pvp</grapheme>
<alias>PV P</alias>
</lexeme>
<lexeme>
<grapheme>Lominsan</grapheme>
<phoneme>ləˈmɪnsən</phoneme>
</lexeme>
<lexeme>
<grapheme>Er.</grapheme>
<grapheme> er.</grapheme>
<grapheme> er </grapheme>
<grapheme> err.</grapheme>
<grapheme> err </grapheme>
<phoneme>ɜːɹ</phoneme>
</lexeme>
<lexeme>
<grapheme>Us.</grapheme>
<grapheme>Us,</grapheme>
<grapheme> us.</grapheme>
<grapheme> us,</grapheme>
<grapheme> us </grapheme>
<phoneme>ʌs</phoneme>
</lexeme>

For some reason I can't get it to pronounce the "E" here:

<lexeme>
<grapheme>E-Sumi-Yan</grapheme>
<phoneme>ˈiːˈsuːmiˈjɛn</phoneme>
</lexeme>

Does anyone have any ideas on how to get it to work? I tried separating it into different words, but it still doesn't work for some reason. I tried System, Amazon, and Azure.


@johnysandels

It seems this issue is caused by having "Attempt to remove stutter from NPC dialogue" turned on.

I'm adding your entries, but I need to comb through them to fix an issue I'm running into before posting. Also, thank you for the updated city names! I made those when I first started on the list, so I wasn't as good back then 👍

@SKLCLU

I see. Maybe it would work if I made an alias for it?

<lexeme>
<grapheme>E-Sumi-Yan</grapheme>
<alias>E Sumi Yan</alias>
<phoneme>ˈiːˈsuːmiˈjɛn</phoneme>
</lexeme>

I'll test it. EDIT: Hm, it didn't seem to work. I'm open to ideas.

Also thank you for the updated city names! I made those when I first started on the list, so I wasn't as good back then 👍

Oh, no problem! As a matter of fact, if anyone has an improvement over my stuff, I welcome it. I'm not saying they're all perfect; I'm still tweaking them as I go along.
While we're on the topic: I made Amh Araeng as ˈɑːmɐrɛng, which you included, but there's also an entry with ˈɑːm məˈræŋ, which I think is better. The problem is that there are two entries, one misspelled "Ahm" instead of "Amh". Just delete this entry.

@SKLCLU

Just realized PvP might work easier as

<lexeme>
<grapheme>PvP</grapheme>
<grapheme>pvp</grapheme>
<alias>PV P</alias>
</lexeme>

Untested yet though. EDIT: Works! Editing the original post with the update.

@johnysandels

I've updated the System lexicons, but I'll work on the Polly one later today. Also, unfortunately, the stutter fix removes the E entirely, so it won't work for E-Sumi-Yan; the option will just have to be turned off.
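Why a stutter filter can eat the "E" can be illustrated with a hypothetical regex in Python. The plugin's actual stutter rule is not shown in this thread; this sketch assumes it drops any single letter followed by a hyphen, and contrasts that with a stricter rule that only fires when the next word starts with the same letter (as in "W-Wait").

```python
import re

# Hypothetical stutter filters; the plugin's actual rule is not shown in this thread.
# Naive: drop any single letter plus hyphen in front of another word ("W-Wait" -> "Wait").
NAIVE_STUTTER = re.compile(r"\b([A-Za-z])-(?=[A-Za-z])")
# Stricter: only drop it when the next word starts with the same letter (case-insensitive).
STRICT_STUTTER = re.compile(r"\b([A-Za-z])-(?=\1)", re.IGNORECASE)


def remove_stutter(text: str, pattern: re.Pattern) -> str:
    """Delete every stutter prefix that `pattern` matches."""
    return pattern.sub("", text)


if __name__ == "__main__":
    line = "W-Wait, is that E-Sumi-Yan?"
    print(remove_stutter(line, NAIVE_STUTTER))   # Wait, is that Sumi-Yan?
    print(remove_stutter(line, STRICT_STUTTER))  # Wait, is that E-Sumi-Yan?
```

The stricter rule would keep names like E-Sumi-Yan intact, though a name that genuinely starts with a letter, a hyphen, and the same letter again would still be affected.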

@SKLCLU

Yeah, it seems like it. I mean, at least the pronunciation is correct, so there's not much more I can do.
BTW, I did a slight tweak for Lahabrea to ˈlahʌˈbɹeːæ and Halone to hʌlˈəʊnɛɪ. Should I keep posting here if I make changes and/or add stuff? I'll probably make a new post every now and then when I come across new words, but I'm wondering about making changes to current ones.

Good Morning! I see that this thread has been quiet for most of this current year (2024), but I am gonna try asking my question here to see if any of you have been successful with making a lexicon for something like this:

I have repeated issues with punctuation being read aloud whenever it is followed by a music note symbol: ( ♪ ).

This happens whenever a character is singing, such as The Wandering Minstrel, Jehantel (from the Bard questline), and even Tataru when she is humming a tune.

Here's what I've found:

  1. .♪ : When a verse ends with a period ( . ) directly before a music note symbol ( .♪ ) It reads "(any lyrical line) dot"
  2. ~♪ : When a verse ends with a tilde ( ~ ) directly before a music note symbol ( ~♪ ) It reads "(any lyrical line) tilde"
  3. ,♪ : When a verse ends with a comma ( , ) directly before a music note symbol ( ,♪) It reads it as "(any lyrical line) comma"
  4. -♪ : When a verse ends with a dash ( - ) directly before a music note symbol ( -♪ ) It reads it as "(any lyrical line) dash"

I think these are the four most common occurrences in which the music note causes the punctuation or other symbol to be read aloud, but there may be more. What I cannot figure out, yet, is how to silence those punctuations from being verbalized when directly followed by a music note symbol without any spaces, as generally the ♪ always comes after punctuation without any spaces.

It can be quite annoying, especially when the song or ditty has multiple lines to it.


@johnysandels

Hi there, that's an issue with how the text is parsed. I've created an issue for this here: #197.
It should be a simple fix where the parser just ignores that symbol.
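The fix tracked in #197 belongs in the plugin, but the idea can be sketched as a small pre-processing step in Python: drop the punctuation glued to a ♪, and the ♪ itself, before the text reaches the voice. This is a hypothetical sketch, not the plugin's code; replacing the pair with a plain period assumes a sentence-final period is read as a pause rather than as "dot", which is the usual TTS behavior.

```python
import re

# Hypothetical pre-processing; issue #197 tracks the real fix in the plugin.
# Punctuation reported above as being read aloud when glued to a note: . ~ , -
PUNCT_BEFORE_NOTE = re.compile(r"[.~,\-]+♪")


def strip_music_notes(text: str) -> str:
    """Turn 'lyric.♪' into 'lyric.' and drop any remaining ♪ symbols."""
    text = PUNCT_BEFORE_NOTE.sub(".", text)  # keep a plain period as a pause
    text = text.replace("♪", "")             # bare notes carry no speech
    return re.sub(r" {2,}", " ", text).strip()


if __name__ == "__main__":
    print(strip_music_notes("La la la.♪ Dum dee dum~♪"))  # La la la. Dum dee dum.
```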

Hey, just a heads up: I'm back in FFXIV again. And... I just pushed a fix for a bad commit I only just realized I'd made about 2 years ago, which completely messed up the Names and Locations Amazon Polly lexicon. I guess I'm the only one who uses it, since no one noticed or fixed it in the meantime. But it's fixed now.

8985f2c


Hey @johnysandels,

I pushed a commit (d27de5e) to the "Polly" lexicon that merges a bunch of your contributions in with mine. To do that, I overwrote the Polly file with the System file and then, using the built-in git diff tool in VS Code, manually brought my own additions back into the Polly file.

The above is relevant for this next part.

I also wrote a PowerShell and Windows Batch script set that automatically sorts each element by its first value. It only half works right now, so it needs about 10 seconds of manual editing afterward; that's why I'm not committing the scripts as utilities until they just work. My sorted lexicon commit is here (f87112a). I used the redhat.vscode-xml extension for VS Code by Red Hat to do all the indenting automatically (via Ctrl+Shift+P "Format Document").

So my question for you is: do you mind if we standardize on a 4-space indent for the XML formatting? If not for readability, then at least to make it easier to keep contributions in sync between the two major lexicons.

If you're OK with that, I'll do a sort and format for the System lexicon too. Then, periodically, when I sync the lexicons, I can just sort, format, and then merge them, in that order, so I don't accidentally merge in duplicates.
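The sort, format, then merge order described above can be sketched in Python too. This is not the script from the repo; it is an independent minimal sketch that sorts lexemes by their first grapheme (ignoring case and surrounding spaces), re-indents with 4 spaces, and lists graphemes that appear in more than one lexeme, so duplicates like the two Amh Araeng entries can be caught before merging. `sort_and_check` and the `.sorted.pls` output name are hypothetical.

```python
import sys
import xml.etree.ElementTree as ET
from collections import Counter

PLS_NS = "http://www.w3.org/2005/01/pronunciation-lexicon"
LEXEME = f"{{{PLS_NS}}}lexeme"
GRAPHEME = f"{{{PLS_NS}}}grapheme"


def sort_and_check(lexicon_xml: str) -> tuple[str, list[str]]:
    """Sort lexemes by first grapheme, indent with 4 spaces, and report duplicate graphemes."""
    ET.register_namespace("", PLS_NS)
    root = ET.fromstring(lexicon_xml)
    lexemes = root.findall(LEXEME)

    def sort_key(lexeme: ET.Element) -> str:
        first = lexeme.find(GRAPHEME)
        text = first.text if first is not None and first.text else ""
        return text.strip().casefold()  # " egi " sorts next to "egi"

    counts = Counter(g.text for lx in lexemes for g in lx.findall(GRAPHEME) if g.text)
    duplicates = sorted(text for text, n in counts.items() if n > 1)

    for lx in lexemes:
        root.remove(lx)
    root.extend(sorted(lexemes, key=sort_key))  # stable: equal keys keep their order
    ET.indent(root, space="    ")
    return ET.tostring(root, encoding="unicode"), duplicates


if __name__ == "__main__" and len(sys.argv) > 1:
    path = sys.argv[1]
    with open(path, encoding="utf-8-sig") as f:
        sorted_xml, dups = sort_and_check(f.read())
    for text in dups:
        print(f"duplicate grapheme: {text!r}")
    with open(path + ".sorted.pls", "w", encoding="utf-8") as out:
        out.write('<?xml version="1.0" encoding="UTF-8"?>\n' + sorted_xml + "\n")
```

ElementTree drops XML comments when parsing, so category headings written as comments would be lost, which fits the plan to go fully alphabetical.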


@johnysandels

Yeah, I'm fine with the 4-space formatting, although I do personally prefer having sub-categories for my lexicons, each listed alphabetically. If you just want to make everything alphabetical and delete the categories, that's also fine. I only know basic coding, so if you want to overhaul it, I really don't mind!

@ryankhart

Ok, I'm done making changes. I've got the System lexicon all sorted and formatted, and then I merged a few additions into it without merging in any conflicting differences since the system voices might not work as well with the phonemes I use with Amazon Polly.

Let me know if you have any trouble merging your local changes with the new format. You can send your local file to me, and I'll figure out how to merge them.

@ryankhart

Oh, I've also pushed my completed scripts for auto-sorting to the base lexicon directory. It has a README file for instructions. No coding necessary. I've included an option for drag and drop, and then double-clicking a file to run it in Windows Explorer (your file browser).

1514d95

I ended up completely rewriting my first set of scripts in Python mainly because I know Python. I don't know PowerShell as well. The only downside is that it requires you to install Python first.

@johnysandels Which System voices do you test your phonemes with? I'm updating the "Community Lexicon" wiki page with recommendations for voices since, at least with Amazon Polly, some voices are objectively worse than others when handling custom phonemes.


@johnysandels

Hey, sorry for the delay. I use Microsoft George and Microsoft Catherine.

This discussion was converted from issue #43 on December 18, 2021 19:15.
