Written by POPY
With thanks to W and M.
Synthesizer V Studio Flat is an enhanced build of SV1 Pro created by a group of Synth fans, inspired by Yumekey. In addition to unlocking and bundling voicebanks, it ships with an add-on called Flat Manager that lets you manage voicebanks and edit them with far more freedom.
With Synthesizer V Studio Flat and its editors, you can modify existing SV1 Pro voicebanks with almost no practical limits. If your goal is to improve vocal quality and push for more refined results, SV Flat opens up an entirely different workflow.
Read through this guide end to end, then experiment section by section. That's the fastest way to build an intuition for what each parameter controls and how to iterate safely.
SFPK is the voicebank file format used by Synthesizer V Studio Flat. It can be opened and installed via Flat Manager. Conceptually, an SFPK is an archive; depending on the voicebank, it may contain Base Model files, images, NOFS-JSON, and more.
NOFS-JSON is Flat's lightweight voicebank format. By editing the JSON in Flat Manager, you can change metadata, timbre parameters, Vocal Modes, pitch-related parameters, phoneme tables, and more.
Timbre / Vocal Mode / pitch parameters in the JSON are 256-character HEX strings. Internally they are parsed as 32 fp32 values, forming a 32-dimensional embedding vector (Emb). By performing mathematical operations on these floats, you can blend or reinforce timbre and Vocal Modes.
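The following minimal Python sketch shows the HEX-to-float mapping in both directions. It assumes the bytes are little-endian fp32. That assumption fits the sample data in this guide: the blank template's 00000080 groups decode to -0.0, and the (base) example decodes to small values such as about 0.196. Flat's own parser isn't shown here, and decode_emb / encode_emb are hypothetical helper names.

```python
import struct

EMB_DIM = 32  # 256 hex chars = 128 bytes = 32 float32 values


def decode_emb(hex_str: str) -> list:
    """Parse a 256-character HEX string into 32 floats (little-endian fp32 assumed)."""
    raw = bytes.fromhex(hex_str)
    if len(raw) != EMB_DIM * 4:
        raise ValueError(f"expected {EMB_DIM * 4} bytes, got {len(raw)}")
    return list(struct.unpack(f"<{EMB_DIM}f", raw))


def encode_emb(values: list) -> str:
    """Serialize 32 floats back into an uppercase 256-character HEX string."""
    if len(values) != EMB_DIM:
        raise ValueError(f"expected {EMB_DIM} values, got {len(values)}")
    return struct.pack(f"<{EMB_DIM}f", *values).hex().upper()
```

Float32 values round-trip exactly through struct, so encode_emb(decode_emb(h)) returns h unchanged when h is an uppercase string. Any edits you make to the floats in between are therefore the only changes written back.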
Note: For batch packaging, you can zip multiple .sfpk files and rename the zip extension to .sfpks.
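To script batch packaging, you can use Python's standard zipfile module, as in the sketch below. It makes two assumptions: that Flat Manager expects the .sfpk files at the archive root, and that the compression method does not matter. pack_sfpks is a hypothetical name.

```python
import zipfile
from pathlib import Path


def pack_sfpks(sfpk_files, out_path) -> Path:
    """Zip .sfpk files into one archive whose extension is .sfpks.

    Writing straight to .sfpks is equivalent to zipping and then renaming.
    Assumes the .sfpk files belong at the archive root.
    """
    out_path = Path(out_path).with_suffix(".sfpks")
    with zipfile.ZipFile(out_path, "w", compression=zipfile.ZIP_DEFLATED) as zf:
        for f in map(Path, sfpk_files):
            if f.suffix.lower() != ".sfpk":
                raise ValueError(f"not an .sfpk file: {f}")
            zf.write(f, arcname=f.name)  # store at archive root, no folders
    return out_path
```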
Base Model (base_model): The Common Denominator Behind Voicebanks
In Flat Manager, use the main menu sort option "By model" to group voicebanks by model. That model is the Base Model (base_model). In the voicebank JSON editor you'll also see entries such as:
"base_model": "2c233d8e9b19f1f4dc0276ba3a5542c1"
base_model is the foundational model a voicebank is built on. A practical way to think about it: SV1 Pro was likely trained on large datasets (male and female voices from a training batch), producing a Base Model. In generative-model terms (e.g., a VAE), a fixed Base Model defines a region of feature space. Within that space, adjusting embedding vectors (Emb) lets the singing move smoothly across different voice states.
In short, the Base Model is the lowest-level data source for a voicebank. It stores a range of voice states and largely determines the timbre range the voicebank can reach.
Base Model field format:
"base_model": "2c233d8e9b19f1f4dc0276ba3a5542c1"
A: Current Flat Manager versions support mixing/transferring/modifying voicebank timbre parameters (base / default / auxiliary). These parameters are rooted in the Base Model's feature space. If you want to mix or transfer Vocal Modes between voicebanks, using the same base_model is strongly recommended; otherwise results may become unpredictable (sometimes surprisingly good, but not controllable).
Q: Can I replace only the base_model?
A: You can, but it's not recommended. A voicebank's timbre is built on top of its Base Model. Replacing only the Base Model usually behaves like generating a random voicebank and is rarely useful.
A: Yes, aside from SV2 compatibility libraries / PLUS compatibility libraries. Many Base Models are fine-tuned from the SV1 "general" Base Model (2c233d8e9b19f1f4dc0276ba3a5542c1) with additional data, so they can be somewhat similar. On these fine-tuned Base Models (use release timing as a clue), you may sometimes reuse timbre parameters from the general SV1 Base Model, but the outcome still has significant uncertainty because the Base Model changed.
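Following the advice above, a script can check base_model before it mixes or transfers anything. This sketch treats a missing base_model as 2c233d8e9b19f1f4dc0276ba3a5542c1, which mirrors the default fill described later in this guide. The function names are hypothetical, and group_by_base_model only approximates the "By model" sort.

```python
# Flat Manager's documented fallback when base_model is missing
DEFAULT_BASE_MODEL = "2c233d8e9b19f1f4dc0276ba3a5542c1"


def base_model_of(voicebank: dict) -> str:
    """Return the voicebank's base_model, falling back to the default hash."""
    return voicebank.get("base_model", DEFAULT_BASE_MODEL)


def same_base_model(a: dict, b: dict) -> bool:
    """True when two voicebanks share a Base Model (the recommended case for mixing/transfer)."""
    return base_model_of(a) == base_model_of(b)


def group_by_base_model(voicebanks: list) -> dict:
    """Roughly mimic the "By model" sort: map each base_model hash to voicebank names."""
    groups = {}
    for vb in voicebanks:
        groups.setdefault(base_model_of(vb), []).append(vb.get("name", "<unnamed>"))
    return groups
```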
Timbre Parameters (base / default / auxiliary): Layered Offset Style Vectors
With a Base Model in place, SV still needs a way to point to a specific voicebank. That's the role of timbre parameters.
At a high level, SynthV Pro can be treated as a three-stage "offset model": three types of embedding vectors (Emb) are stacked to shape a specific voicebank:
- (base): the foundational timbre vector. It anchors core characteristics such as Articulation, accent, singing style, and overall timbre. In most Base Models, (base) is what identifies the voicebank (with exceptions such as SV2 compatibility libraries / PLUS compatibility libraries). This is hidden in stock SV1 Pro.
- (default): the default Vocal Mode. After (base) is set, (default) anchors the voicebank's default singing mode. (default) is effectively a 100%-strength Vocal Mode and can also be reused as a Vocal Mode in other voicebanks. This is also hidden in stock SV1 Pro.
- (auxiliary): adjustable Vocal Modes. Thanks to SV1 Pro's Base Model data coverage, (auxiliary) can express not only timbre but also Resonance, vocal technique, breathiness, and more. In SV Flat, you can tweak/transfer/modify/build special (auxiliary) modes for specific dynamic effects.

(default) and (auxiliary) are equivalent categories and can be blended with each other. (base) is different: in general, (base) should only be blended with other (base) vectors.
Practical importance usually looks like this:
$$ \mathtt{(base) > (default) > (auxiliary)} $$
You can think of SV1 Pro as stacking three offset vectors in order: (base), then (default), then (auxiliary). The direction of a vector represents timbre features; its magnitude represents feature weight. Controlling these vectors gives you fine-grained control over Vocal Modes.
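The guide's picture can be written down as a toy model: three stacked offsets, where direction encodes features and magnitude encodes weight. The additive stacking and the per-mode strength factor below are assumptions made for illustration. They do not describe SV's actual internals.

```python
import math


def magnitude(v):
    """L2 norm: in the guide's picture, the feature weight."""
    return math.sqrt(sum(x * x for x in v))


def direction(v):
    """Unit vector: in the guide's picture, which timbre features are expressed."""
    m = magnitude(v)
    return [x / m for x in v] if m else [0.0] * len(v)


def stack(base, default, auxiliaries):
    """Toy additive reading of the three-layer offset model.

    auxiliaries maps a Vocal Mode name to (vector, strength), where a strength
    of 0.35 means 35%. Illustrates the mental model only.
    """
    out = [b + d for b, d in zip(base, default)]
    for vec, strength in auxiliaries.values():
        out = [o + strength * a for o, a in zip(out, vec)]
    return out
```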
Replacing, mixing, or reshaping these three layers can significantly alter timbre, mouth shape behavior, and perceived texture. Adding or replacing auxiliary styles increases the editing range inside the editor (and works best when you stay aligned with the voicebank's base_model).
extra is a scalar correction used only for (auxiliary) styles. It adjusts details like aspiration. Deleting it (so it defaults to 0) usually causes only small changes. Flat Manager can predict extra for an (auxiliary) style (currently limited to Base Model 2c233d8e9b19f1f4dc0276ba3a5542c1).
Safety note: When editing a voicebank, change the version and save once first. Flat Manager creates a new version branch, which helps prevent accidental loss of the original.
Format example (styles):
"styles": [
{
"name": "(base)",
"data": "C622493E2C8638BEBE4E3B3EB7331FBE52C4DA3DCE1A21BD667B993D43A82D3E367652BEB53579BC7814C1BDBA9427BDFFB2913B14E9433D232DE03D5660BD3D2407653EBD37F8BB61C15E3D2F8478BDBD0A8E3D4AAE033EDCEDD83D23F5193E002985BD3A3D09BCA6CA46BD1C4D4BBDBDFDC0BDA52B81BC0201E43D7D4A383D"
},
{
"name": "(default)",
"data": "877D863A8DEEB1BD7ED4BDBDF6BF1FBE1FB8EFBD32474CBE7058D6BCAF8804BE64DED9BCC66E8F3C64F1B1BC6CE880BE263808BE997563BD78C42EBE8D80423D1F2B78BE374A64BB0045F7BDE0CF92BEC45A8EBEE71019BCEEEB45BE6CF7CFBEC8ED133EDA4C19BD951A37BDEAC7733EE8EC98BD4A9C1DBE7DF3013CAC661E3C"
},
{
"name": "Gentle",
"data": "6F231F3D0F37CEBD7EE0503D0080E53D92C8433E00A47C3C741E7B3EC178103EEC0DB3BC5E0556BD007F2E3E65E4233E3457D7BD62F9023EDC75C73D405DAF3ABF9E21BE0B8A593D003AF5BD4074CE3CC09DD2BBB4F4C33C443A59BDDC3775BE28B1C13C56E05D3C60170D3E54FD11BD6A14A13D409ED93B66ADB73C1E9DEF3D",
"extra": 0.18505549430847168
}
]
A0 — Add more Vocal Modes / create a blank template:
Open the voicebank editor, right-click any style and copy it, for example:
{
"name": "Gentle",
"data": "6F231F3D0F37CEBD7EE0503D0080E53D92C8433E00A47C3C741E7B3EC178103EEC0DB3BC5E0556BD007F2E3E65E4233E3457D7BD62F9023EDC75C73D405DAF3ABF9E21BE0B8A593D003AF5BD4074CE3CC09DD2BBB4F4C33C443A59BDDC3775BE28B1C13C56E05D3C60170D3E54FD11BD6A14A13D409ED93B66ADB73C1E9DEF3D",
"extra": 0.18505549430847168
}
Paste it back into the same list and change it into a blank template, for example:
{
"name": "TimbreStyle1",
"data": "0000000000000080000000000000008000000000000000800000000000000000000000800000008000000080000000800000000000000000000000000000000000000000000000800000000000000080000000000000000000000000000000000000008000000080000000800000008000000080000000800000000000000000",
"extra": 0
}
Make sure the JSON stays valid: commas are the most common issue (missing or extra, depending on whether you insert in the middle or at the end). Click Save (or press Ctrl+S) when done. This blank style is useful as a target slot for A2 — Mixing.
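If you edit the JSON outside the editor, let a JSON library serialize the file; that avoids the comma problem entirely. Below is a minimal sketch using Python's json module. add_blank_style is a hypothetical helper, and its all-zero data string is an equivalent stand-in for the template above.

```python
import json

# 32 zero floats; the template above mixes +0.0 and -0.0, which is the same zero vector
BLANK_DATA = "0" * 256


def add_blank_style(voicebank_json: str, name: str = "TimbreStyle1") -> str:
    """Append a zero-vector style to "styles" and return re-serialized JSON.

    Going through json.loads/json.dumps means commas are placed by the
    serializer rather than by hand.
    """
    vb = json.loads(voicebank_json)
    styles = vb.setdefault("styles", [])
    if any(s.get("name") == name for s in styles):
        raise ValueError(f"a style named {name!r} already exists")
    styles.append({"name": name, "data": BLANK_DATA, "extra": 0})
    return json.dumps(vb, indent=2, ensure_ascii=False)
```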
A1 — Magnitude control:
In the voicebank editor, click a style's data value. Press Ctrl+M (or right-click and select "Adjust embedding Magnitude") to view its absolute magnitude. Use the slider to set the magnitude within [-5, 5], or type a value directly (typed values are not range-limited). Negative values invert the vector direction. You can also add an "enfored_length" field with any number (for example, "enfored_length": 0) to set the absolute magnitude directly.
Note: Large magnitudes can easily cause clipping or extreme loudness. Be cautious, inspect rendered waveforms, then audition. Manual input can exceed 5; use that sparingly.
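The sketch below shows how rescaling could work. It assumes "absolute magnitude" means the Euclidean (L2) norm of the 32 floats; the guide doesn't say which norm Flat uses. set_magnitude is a hypothetical name.

```python
import math
import struct


def set_magnitude(hex_str: str, target: float) -> str:
    """Rescale a 256-HEX embedding so its L2 norm equals |target|.

    A negative target also flips the direction, matching the note that
    negative values invert the vector.
    """
    v = struct.unpack("<32f", bytes.fromhex(hex_str))
    norm = math.sqrt(sum(x * x for x in v))
    if norm == 0:
        raise ValueError("cannot rescale a zero vector")
    scale = target / norm
    return struct.pack("<32f", *(x * scale for x in v)).hex().upper()
```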
A2 — Vocal Mode mixing:
Click the style's data value, then press Ctrl+E (or right-click "Export embedding to Mixer") to send it to the Mixer (calculator icon in Flat Manager). Mix channels with sliders, then click Export to write an auxiliary embedding back into the JSON you're editing. Alternatively, copy the 256-HEX result and paste it into the blank template created in A0.
Note: In theory you can mix any embeddings (base/default/auxiliary, different Base Models, random vectors, pitch embeddings, etc.). For controlled results, follow the Base Model guidance and keep edits targeted.
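This guide doesn't document the Mixer's internal formula. Assuming it computes a weighted sum of channels, with the slider values as weights, a sketch looks like the following. mix is a hypothetical name. The "shuo"/"pastel" names simply echo the recipes later in this guide.

```python
import struct


def mix(channels: list) -> str:
    """Weighted sum of 256-HEX embeddings, e.g. [(shuo_hex, 0.3), (pastel_hex, 0.7)].

    Assumes a linear combination with slider values as weights.
    """
    acc = [0.0] * 32
    for hex_str, weight in channels:
        vec = struct.unpack("<32f", bytes.fromhex(hex_str))
        acc = [a + weight * x for a, x in zip(acc, vec)]
    return struct.pack("<32f", *acc).hex().upper()
```

Under this assumption, weights that sum to 1 give an interpolation between the channels, and larger sums also scale up the magnitude.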
A3 — Vocal Mode transfer:
In the source voicebank, copy a style entry. In the destination voicebank, paste it somewhere after (base) and (default) inside styles, then ensure JSON commas are correct. Save (Ctrl+S).
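You can also script this copy-and-paste on the parsed JSON. In the sketch below (transfer_style is a hypothetical name), a base_model mismatch only triggers a warning. That matches the guide's advice that such transfers are possible but unpredictable. Replacing an existing style of the same name is an extra assumption, made to keep style names unique.

```python
import copy
import warnings


def transfer_style(src: dict, dst: dict, style_name: str) -> dict:
    """Copy one style from src into a copy of dst and return the copy.

    The style is appended to the end of "styles", which keeps it after
    (base) and (default) as long as those come first, as in the example above.
    """
    if src.get("base_model") != dst.get("base_model"):
        warnings.warn("base_model differs; the transferred mode may behave unpredictably")
    style = next((s for s in src.get("styles", []) if s.get("name") == style_name), None)
    if style is None:
        raise KeyError(f"source has no style named {style_name!r}")
    out = copy.deepcopy(dst)
    styles = out.setdefault("styles", [])
    # Assumption: drop any same-named style so names stay unique
    styles[:] = [s for s in styles if s.get("name") != style_name]
    styles.append(copy.deepcopy(style))
    return out
```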
A4 — Random style:
In the voicebank editor, click "New Random Style" (round button with a plus). Flat Manager appends a random style to the end of styles. Save (Ctrl+S).
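This guide doesn't describe how Flat Manager generates random styles. The sketch below therefore picks a Gaussian random direction and then sets a chosen L2 magnitude. Both choices are assumptions, and random_style is a hypothetical name. A fixed seed makes a random search reproducible.

```python
import math
import random
import struct
from typing import Optional


def random_style(name: str, magnitude: float = 1.0, seed: Optional[int] = None) -> dict:
    """Build a style entry with a random 32-dim direction and a chosen L2 magnitude."""
    rng = random.Random(seed)
    v = [rng.gauss(0.0, 1.0) for _ in range(32)]
    norm = math.sqrt(sum(x * x for x in v)) or 1.0
    v = [x * magnitude / norm for x in v]
    return {"name": name, "data": struct.pack("<32f", *v).hex().upper(), "extra": 0}
```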
A5 — Random voicebank (Reset):
In the voicebank editor, click "Reset" (square button with a plus). This generates a completely random voicebank. Save (Ctrl+S).
A: Besides restarting the SV Flat editor, you can uninstall one minor version of the voicebank and use "Refresh" in the SV Flat editor to reload it.
A: Use version branching. Change version to create separate variants so you can switch between them:
"name": "GUMI AI",
"version": "101",
"vendor": "INTERNET Co., Ltd.",
"language": "japanese",
"phoneset": "romaji"
Note: Never create voicebanks that share a name but have different vendors; this can cause problems.
Q: Can I mix (default) with (base)?
A: Not recommended. (default) is equivalent to an (auxiliary) at 100% strength, but (base) is different and is generally safest to mix only with other (base) vectors. Mixing (base) with (default) is often unpredictable.
Auto-Pitch (f0_model / pitch): The "Secret" Behind Automatic Singing Pitch
In Flat Manager, f0_model and pitch control a voicebank's Auto-Pitch characteristics.
"f0_model": "dcee89442f69984189a5b2aedbf9f090",
"pitch": "9D31DDBD30D22D3D4A195ABDD00D0F3E311D253EF28921BE97631C3EA27055BD62F983BDAC2EDDBC7724243DC266003DF53852BC0699D43DC8119DBDAACDA73E2288033EB2A995BD76A4113EBC717FBD9449BB3E8AFE193E58011BBCD5E7863D763E623DD0FBF83CE814BEBD548B83BE27D7D23DCBF8B23DE4982DBEABB693BC"
Replacing both f0_model and pitch together lets you steer a voicebank's Auto-Pitch behavior.
A: If you replace POPY's f0_model & pitch with Minus's f0_model & pitch, POPY's Auto-Pitch behavior will shift toward Minus.
Q: pitch is also a float vector; can it be mixed?
A: In theory, yes, but it's usually not very useful. pitch is trained specifically, and mix quality is harder to evaluate than it is for Vocal Modes. Using a high-quality Auto-Pitch set is generally the better option.
Singing Assistant Model (sing_model): Optional Plug-ins for Resonance / Articulation Behavior
In Flat Manager, sing_model mainly affects Articulation (oral target / placement) and how the voice behaves in the mixed-voice range, which in turn influences fundamental support. It has a relatively small impact on pure-voice timbre and overall loudness.
"sing_model": "3f649ae6cb04ee4f7e9a7ed72ee29928"
As a rule of thumb: sing_model changes how the voice is produced, not the timbre itself.
sing_model is largely independent from base_model, so it's generally safe to try different sing_model options as an enhancer for a voicebank you like.
That said, its impact is often subtle and the result may not fit your target voicebank. Test carefully.
Q: How can I tell what a sing_model will do?
A: Try it, or analyze the source voicebank's singing characteristics and decide whether your target voicebank needs that direction.
Phoneme Timing (timing_model / timing): Controlling Phoneme Durations and Consonant Behavior
In Flat Manager, timing_model and timing control phoneme-level timing. Together they influence duration, consonant phenomena, and the "feel" of transitions.
"timing_model": "7b0ea690ada94b4484b50d9d64a21cae",
"timing": "D8168F3D9147F5BC6D9FA93C9A3E923D530D8F3C88D4883C806D7B3CCA41003D456A56BCD5FFC0BC666A5EBDAAA5E2BCC0082FBDB6A66C3D60BABB3C33CF233D064AD83C6BB9AA3C2DC6633CF3DB16BE3870853C28F2A03BBEAE20BDA1BACEBCF2F48C3C46E6FDBC002B48BD6656D9BAF3D7753D56954BBC0FD731BD97F4D43D253F2EBBE46EFCBD09EC7EBDED411EBD36DB9FBC29C2C8BC66F5413DC79A713B7A5F183D4ABF34BDA58529BBBB7F563B098E52BDC212B1BC26F0B03D4EEE573D4157363D68DB51BCF759A73B131389BBE695923CD48E67BDD4D1FB3DC6B4E5BB222421BD75BB15BC7128883D4D88B1BDB2EFA33C6E18973C14FA673BA3D233BD4C5C3BBDB063D33C399F87BBFEA8763D8B8F353D420E643CF3BB8D3C416F49BC0133CA3CCAF3DABDE99E643DC98626BBC405FA3CE27EA43A2770643D244D10BC7D72A9BD620CCDBC3AC22E3D1EEC693C3D098C3BF50D56BDB5BC073C7D4A5F3C845246BD4A13283E100B7ABD081F723C8CBC96BC77FDF43CE7EDBEBCA93525BAAD1B883D3035D53C6BEA413D61A688BC30832FBD98C3D23CAA551EBD73009F3D8D7B9C3CFCAE3BBD888F343CA4B4C23CA837513C31A228BDD3089DBD6D32E7BB87CC773D7BD9173DA6776EBD001D7E3D1BBD1A3D59C5463BC997B6BCE341153C3E6E0E3D0AD9293D36FF2F3DC251E7BCF6E17E3D4983B4BBCE10DA3D182A8B3C"
Replacing both timing_model and timing together lets you steer phoneme behavior:
- timing is more about how to place things (phoneme durations, Articulation tightness/looseness, transition dynamics).
- timing_model is more about what rules exist (voicing, whether consonant events like cl/br can appear, and whether they are triggerable).

The combined effect usually looks like this: Articulation won't be rebuilt into another voicebank's timbre or oral target (that's closer to sing_model), but it will feel as if your voicebank's Articulation has been re-timed and re-connected. You may hear tighter/looser Articulation, cleaner/smoother transitions, and noticeable duration changes for certain phonemes (e.g., m, n).
Within the rule boundaries provided by timing_model, voiced/unvoiced behavior or consonant events like cl/br may become easier to trigger or closer to the target's tendency. There is still a limit: even with a full swap, you typically move the trend toward the target rather than perfectly cloning its most extreme traits.
One-sentence summary: timing controls phoneme intensity/duration/connection, timing_model controls available phoneme behaviors and rules. Together they shape phoneme-level Articulation feel and consonant behavior.
This model set is also independent from the Base Model, so swapping is generally safe. Still, test carefully.
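The field template later in this guide lists timing as a 1024-HEX string. That is 512 bytes, or 128 fp32 values under the same little-endian assumption used for styles. The sketch below decodes it and measures how far a timing swap moves the vector. Both helper names are hypothetical. Euclidean distance is only a rough numeric gauge and does not predict how the result will sound.

```python
import math
import struct

TIMING_FLOATS = 128  # 1024 hex chars = 512 bytes = 128 float32 values


def decode_timing(hex_str: str) -> list:
    """Parse a timing string into 128 floats (little-endian fp32 assumed, as for styles)."""
    raw = bytes.fromhex(hex_str)
    if len(raw) != TIMING_FLOATS * 4:
        raise ValueError(f"expected {TIMING_FLOATS * 4} bytes, got {len(raw)}")
    return list(struct.unpack(f"<{TIMING_FLOATS}f", raw))


def timing_distance(hex_a: str, hex_b: str) -> float:
    """Euclidean distance between two timing vectors."""
    a, b = decode_timing(hex_a), decode_timing(hex_b)
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))
```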
Q: Can I replace only timing or only timing_model?
A: Yes, though the effect may be smaller. Try it and adjust carefully.
Q: Can I mix and match timing / timing_model / sing_model freely?
A: Yes. Their effects are often subtle and require careful control. If you switch all three together, the overall behavior tends to shift more consistently toward the chosen target.
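The guide recommends swapping f0_model with pitch, and timing_model with timing, as pairs. A script can enforce that grouping; the sketch below works on parsed voicebank JSON. MODEL_GROUPS and swap_models are hypothetical names. The POPY/Minus values in the test are placeholders, not real hashes.

```python
import copy

# Field groups that the guide recommends swapping together
MODEL_GROUPS = {
    "pitch": ("f0_model", "pitch"),
    "timing": ("timing_model", "timing"),
    "sing": ("sing_model",),
}


def swap_models(target: dict, donor: dict, groups=("pitch", "timing", "sing")) -> dict:
    """Return a copy of target with the chosen model groups taken from donor."""
    out = copy.deepcopy(target)
    for g in groups:
        for key in MODEL_GROUPS[g]:
            if key not in donor:
                raise KeyError(f"donor has no {key!r}")
            out[key] = donor[key]
    return out
```

For example, swap_models(popy, minus, groups=("pitch",)) shifts only Auto-Pitch toward Minus, as in the Q&A above.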
Flat Manager supports switching/editing/adding phoneme tables. Open the editor and click the phoneme table (the "A" icon with three dots).
Each language has its own phoneme table, and each entry contains:
- name: the identifier you type into the phoneme field.
- type: e.g., stop, vowel, fricative, nasal, liquid.
- token: the real token used in training/synthesis. If two phoneme names map to the same token, they are pronunciation-equivalent within the same language (cross-language equivalence is not guaranteed).

Flat lets you use any SynthV system token in any language. You can also rename phonemes or create new ones as you like. By editing phoneme tables (changes apply to all voicebanks) and setting which languages a voicebank can use, you can build new languages or your own phoneme system (the system still relies on the original training tokens, so results vary).
Flat has edited the phoneme tables of Cantonese and Spanish for Language Extensions.
{
"name": "cantonese-xsampa-phones",
"phonemes": [
{"name":"a", "type":"vowel", "token":"a"}
]
}
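Because names that share a token are pronunciation-equivalent within a language, you can list those equivalences straight from a table in the format above. The sample rows in the test come from the Cantonese section of the appendix table (E/e map to e, O/o to o, m/m= to m). equivalent_phonemes is a hypothetical helper.

```python
from collections import defaultdict


def equivalent_phonemes(table: dict) -> dict:
    """Group one language's phoneme names by token; keep only tokens shared by 2+ names.

    Names sharing a token are pronunciation-equivalent within that language only.
    """
    by_token = defaultdict(list)
    for ph in table["phonemes"]:
        by_token[ph["token"]].append(ph["name"])
    return {tok: names for tok, names in by_token.items() if len(names) > 1}
```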
The voicebank library includes timbre, Vocal Mode, and pitch data for currently known SV1 singers (including some SV2 compatibility libraries and SV2 PLUS libraries). You can open a voicebank or send embeddings to the Vocal Mode editor (also called the Mixer) for further processing.
Note: To reproduce timbre and Vocal Mode behavior accurately, using data from the same Base Model is recommended.
The style library supports grouping voicebanks by name / vendor / Base Model, and supports refresh, batch install, batch export, and search.
- NOFS-JSON (.nofs) voicebank you want to edit.
- NOFS-JSON.
- version by default.
- extra, and more.
- NOFS-JSON.
- .sfpk: save/export a voicebank to .sfpk.
- .sfpk: install a .sfpk file.
- styles.

Voicebank editor field template:
{
"name": "Voicebank display name",
"version": "Version string",
"vendor": "Vendor / publisher",
"language": "Default / primary language",
"phoneset": "Phoneme table (e.g., xsampa; can be inferred from language, not suggested)",
"support_languages": [
"List of supported languages(not suggested)"
],
"base_model": "Base Model hash (locate/verify model files)",
"sing_model": "Singing assistant model hash",
"timing_model": "Timing model hash",
"f0_model": "F0 model hash",
"styles": [
{
"name": "Style label (e.g., (base)/(default))",
"data": "Embedding vector (serialized 256-HEX string)"
},
{
"name": "Style label",
"data": "Embedding vector (serialized 256-HEX string)",
"extra": "Extra scalar correction"
}
],
"pitch": "Auto-Pitch embedding (serialized 256-HEX string)",
"timing": "Phoneme timing embedding (serialized 1024-HEX string)",
"note": "Additional properties are ok for notes"
}
Note 1: You can add your own custom fields to the voicebank JSON (Flat Manager won't read them). This can be used as a "notes" area for recipes and management:
"Memory_1": "Notes: ...",
"Memory_2": "Notes: ..."
Note 2: Flat Manager has a default fill mechanism. If critical fields (like base_model) are missing, it backfills them with defaults (e.g., 2c233d8e9b19f1f4dc0276ba3a5542c1). If a voicebank has no (base) or (default) style, that style is backfilled with 000... (256 zeros). The recommended practice is never to set phoneset and support_languages manually and to let the editor autofill them instead. See the minimal example in the NOFS-JSON of Refresh (you can view it in the Preview Editor).
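For offline checks, the fill behavior in Note 2 can be approximated as in the sketch below. This is not Flat's actual code. Putting (base) and (default) at the front of styles is an assumption of the sketch, and backfill is a hypothetical name.

```python
import copy

DEFAULT_BASE_MODEL = "2c233d8e9b19f1f4dc0276ba3a5542c1"
ZERO_EMB = "0" * 256  # 256 zeros = a zero embedding


def backfill(voicebank: dict) -> dict:
    """Return a copy with a default base_model and zero (base)/(default) styles if missing."""
    out = copy.deepcopy(voicebank)
    out.setdefault("base_model", DEFAULT_BASE_MODEL)
    styles = out.get("styles", [])

    def pick(name: str) -> dict:
        # Reuse an existing style of this name, otherwise create a zero-vector one
        return next((s for s in styles if s.get("name") == name),
                    {"name": name, "data": ZERO_EMB})

    rest = [s for s in styles if s.get("name") not in ("(base)", "(default)")]
    out["styles"] = [pick("(base)"), pick("(default)")] + rest
    return out
```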
The Mixer supports blending 256-HEX embeddings for timbre, Vocal Modes, and pitch.
- Editor or Preview Editors into the Mixer for mixing.
- Editor.
- Switch/browse phoneme tables. See the Phoneme Tables section in Basics and Background Concepts.
How to Install Language Extensions: Language extensions are essentially user dictionaries, but when used with Flat they unlock hidden phonemes. Because of a limitation in the original SynthV R2, a user dictionary can only be installed under one default language, meaning only voicebanks with that default language can use the dictionary. Therefore, when you install an extension package, the installer first asks you to choose the default language(s) for the extension. For example, to use the Aver or Asterian voicebank with an extension for any language, select Japanese + English.
Next, the installer asks which languages to install the extension for. If you need the Russian and French extensions, select Russian + French. After installation, Flat lets voicebanks whose default language is Japanese or English use Russian and French phonemes.
Note: Installing many user dictionaries can noticeably slow down SynthV startup.
How to Use Language Extensions: After selecting a voicebank like Aver, and confirming that the extension for its default language (Japanese) is installed, go to the user dictionary tab in the SV sidebar. Select the dictionary, such as "ru(sp)_dict." Here, "ru" refers to the Russian extension, and "(sp)" indicates that the language must be switched to Spanish in the singer tab. In other words, the "ru" extension is used under the "sp" language setting.
Once this is done, you can enter Russian lyrics directly.
Why Are Some Phonemes Silent / Why Does Auto-Pitch Render Incorrectly?
Older base models are more likely to have missing phonemes. Flat only unlocks hidden phonemes and does not forcefully add new ones. Auto-pitch generation is strongly correlated with phonemes, so phoneme errors also cause auto-pitch issues.
This section focuses on special workflows and ways of thinking. It's not "better" than basic usage, just different. This guide is also limited by the author's experience and the time spent compiling it, so treat it as a starting point and validate by ear.
Recall the Base Model definition:
base_model is the foundational model. SV1 Pro was likely trained on a large dataset, producing a Base Model that defines a feature space. By adjusting embedding vectors, singing can move continuously across different voice states. In short, the Base Model is the lowest-level data source and stores a range of voice states.
So, changes in timbre and singing style are changes in embedding vectors inside the Base Model's high-dimensional feature space.
If you want to build or tune a voicebank, start with a suitable Base Model and then confirm the timbre parameters step by step: from (base) to (default) to (auxiliary). This is a layered tuning workflow: foundational timbre → default mode → specific modes.
When creating a voicebank via random search / voiceprint comparison / data mining, keep in mind:
- (base) alone can sometimes land near a target voicebank, because (base) carries high weight and strongly determines core characteristics. As a rough heuristic, if the cosine similarity between a candidate (base) embedding and the target embedding is above ~0.7, it may sound relatively close.
- (default) often plays a more central role, so starting analysis from (default) can be more effective.

During tuning:
- Because (default) and (auxiliary) are equivalent categories, you can clear (default), rename it, and move it into the auxiliary section to tune the default mode as if it were an adjustable mode.
- (default) vectors can act as very effective soft/strong Vocal Modes.
- (default) vectors can create a hybrid embedding with traits from both. This can be a practical "XSY" workflow. You can then reuse that mixed (default) as an adjustable default mode to increase flexibility.
- (base) and (default). If the foundational styles are too far apart, auxiliary transfer may degrade. One workaround is to mix (base) and (default) toward the target first (for example, source:target = 7:3) to reduce feature-space distance. The ratio is not fixed; avoid overly large shifts, because they can distort timbre and drift away from the source identity.

If different Base Models are "supposed" to be incompatible, why do some cross-model transfers work?
Different-Base-Model mixing can work, but the outcome depends strongly on Base Model similarity, feature-space distance between (base) layers, and how the underlying data overlaps.
Many Base Models are fine-tuned from the general SV1 Base Model, which means there can be partial compatibility. However, feature-space distance is difficult to estimate directly; Base Models trained far apart in time or on very different data can be much harder to transfer between.
An observed workflow: if a cross-model Vocal Mode works well, you can sometimes use a voicebank's (base) as an adapter so that another voicebank on the same Base Model can inherit that cross-model mode more smoothly. This is a practical form of "similarity and compatibility".
Summary: whether transfer works is mostly correlated with Base Model and (base) similarity. "Cross Base Model" does not automatically mean "impossible".
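Two numbers from this section can be computed directly: the ~0.7 cosine-similarity heuristic and the 7:3 source:target pre-mix. This sketch assumes little-endian fp32 256-HEX strings and uses a plain linear blend for pre_align. Both helpers are hypothetical, and the 0.7 threshold is still only the rough heuristic stated above, not a guarantee.

```python
import math
import struct


def _vec(hex_str):
    return struct.unpack("<32f", bytes.fromhex(hex_str))


def cosine_similarity(hex_a: str, hex_b: str) -> float:
    """Cosine of the angle between two embeddings (0.0 if either is a zero vector)."""
    a, b = _vec(hex_a), _vec(hex_b)
    dot = sum(x * y for x, y in zip(a, b))
    na = math.sqrt(sum(x * x for x in a))
    nb = math.sqrt(sum(y * y for y in b))
    return dot / (na * nb) if na and nb else 0.0


def pre_align(src_hex: str, tgt_hex: str, src_share: float = 0.7) -> str:
    """Blend a source layer toward a target layer; src_share=0.7 means source:target = 7:3."""
    a, b = _vec(src_hex), _vec(tgt_hex)
    mixed = [src_share * x + (1 - src_share) * y for x, y in zip(a, b)]
    return struct.pack("<32f", *mixed).hex().upper()
```

Typical use: apply pre_align to both (base) and (default) before transferring an auxiliary mode, then compare cosine_similarity against the target before and after, and audition the result.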
In many cases, embedding magnitude affects a Vocal Mode's effective strength. But SV seems to enforce internal limits on some voicebanks: even if you increase the displayed percentage, the effect may saturate.
Large magnitudes can also cause clipping, no audible change, or extreme loudness.
In practice, a mode's influence depends not only on its magnitude but also on its distribution. If you cluster all Vocal Modes within a voicebank (high-dimensional analysis), outliers often sound distinctive (either very strong or very subtle), and their relationship with magnitude is not absolute.
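As a lightweight stand-in for the clustering analysis mentioned above, this sketch ranks a voicebank's auxiliary Vocal Modes by cosine similarity to their mean direction. The lowest scorers are the outlier candidates. It is a simplification (distance from the mean, not actual clustering), and rank_outliers is a hypothetical name.

```python
import math
import struct


def _unit(hex_str):
    v = struct.unpack("<32f", bytes.fromhex(hex_str))
    n = math.sqrt(sum(x * x for x in v))
    return [x / n for x in v] if n else [0.0] * len(v)


def rank_outliers(styles: list) -> list:
    """Return (name, score) pairs sorted from least to most typical direction."""
    units = {s["name"]: _unit(s["data"]) for s in styles}
    mean = [sum(col) / len(units) for col in zip(*units.values())]
    mn = math.sqrt(sum(x * x for x in mean)) or 1.0
    scores = {name: sum(u * m for u, m in zip(vec, mean)) / mn
              for name, vec in units.items()}
    return sorted(scores.items(), key=lambda kv: kv[1])
```

Pass only auxiliary styles. Including (base) would skew the mean, since (base) should be compared only with other (base) vectors. Per the text above, a top-ranked outlier may sound either very strong or very subtle, so audition it.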
When mixing, avoid blending strongly opposing "strong vs. weak" modes unless you're specifically exploring special effects. That kind of mixing can cancel amplitude in feature space and make the result bland.
Based on a large amount of analysis, many Vocal Modes (notably on Base Models such as 6e5da191faa421a20b529b40c3aa4968) can produce a roughly "opposite" effect via simple inversion.
Some practical ideas:
In the editor, a voicebank doesn't need to be a fixed timbre. You can push one voicebank into different singing states by adjusting oral target placement, Resonance / phonation behavior, Articulation tightness, phoneme transitions, breathiness, and more.
With that level of control, "similarity" becomes more than a subjective judgment. It can be turned into a workflow: align toward a target mode, infer which parameters to move, and iterate with structure. The long-term boundary may not be "timbre cloning" but a stable, reproducible space of singing strategies, closer to a creative and pedagogical vocal design system in which greater freedom does not sacrifice stability.
Here are some examples to get you started.
base: 30% shuo + 70% pastel
default: 30% shuo + 70% pastel
singmodel: Muxin
F0: Muxin
timing: Muxin
35% mild
50% shuo (powerful)
Minus (default)
to be continued ...
| Phoneme Name | Pronunciation Type | System Token | Language | Phoneme System |
|---|---|---|---|---|
| aa | vowel | a | english | arpabet |
| ae | vowel | ARP_ae | english | arpabet |
| ah | vowel | A | english | arpabet |
| ao | vowel | ARP_ao | english | arpabet |
| aw | diphthong | AU | english | arpabet |
| ax | vowel | ARP_ax | english | arpabet |
| ay | diphthong | ARP_ay | english | arpabet |
| b | stop | p | english | arpabet |
| ch | affricate | ts`h | english | arpabet |
| d | stop | t | english | arpabet |
| dx | stop | ARP_dx | english | arpabet |
| dr | affricate | ARP_dr | english | arpabet |
| dw | affricate | ARP_dw | english | arpabet |
| dh | fricative | ARP_dh | english | arpabet |
| eh | vowel | e | english | arpabet |
| er | vowel | ARP_er | english | arpabet |
| ey | diphthong | ARP_ey | english | arpabet |
| f | fricative | f | english | arpabet |
| g | stop | k | english | arpabet |
| hh | aspirate | x | english | arpabet |
| ih | vowel | ARP_ih | english | arpabet |
| iy | vowel | i | english | arpabet |
| jh | affricate | ts` | english | arpabet |
| k | stop | kh | english | arpabet |
| l | liquid | l | english | arpabet |
| m | nasal | m | english | arpabet |
| n | nasal | n | english | arpabet |
| ng | nasal | N | english | arpabet |
| ow | diphthong | @U | english | arpabet |
| oy | diphthong | ARP_oy | english | arpabet |
| p | stop | ph | english | arpabet |
| q | stop | ARP_q | english | arpabet |
| r | semivowel | z` | english | arpabet |
| s | fricative | s | english | arpabet |
| sh | fricative | s` | english | arpabet |
| t | stop | th | english | arpabet |
| tr | affricate | ARP_tr | english | arpabet |
| tw | affricate | ARP_tw | english | arpabet |
| th | fricative | ARP_th | english | arpabet |
| uh | vowel | u | english | arpabet |
| uw | vowel | y | english | arpabet |
| v | fricative | ARP_v | english | arpabet |
| w | semivowel | w | english | arpabet |
| y | semivowel | j | english | arpabet |
| z | fricative | ts | english | arpabet |
| zh | fricative | ARP_zh | english | arpabet |
| pau | silence | pau | english | arpabet |
| sil | silence | sil | english | arpabet |
| cl | stop | cl | english | arpabet |
| br | breath | br | english | arpabet |
| a | vowel | a | japanese | romaji |
| i | vowel | i | japanese | romaji |
| u | vowel | u | japanese | romaji |
| e | vowel | e | japanese | romaji |
| o | vowel | o | japanese | romaji |
| N | vowel | N | japanese | romaji |
| cl | stop | cl | japanese | romaji |
| t | stop | th | japanese | romaji |
| d | stop | t | japanese | romaji |
| s | fricative | s | japanese | romaji |
| sh | fricative | s\ | japanese | romaji |
| j | affricate | ts\ | japanese | romaji |
| z | affricate | ts | japanese | romaji |
| ts | affricate | tsh | japanese | romaji |
| k | stop | kh | japanese | romaji |
| kw | stop | ROM_kw | japanese | romaji |
| g | stop | k | japanese | romaji |
| gw | stop | ROM_gw | japanese | romaji |
| h | aspirate | x | japanese | romaji |
| b | stop | p | japanese | romaji |
| p | stop | ph | japanese | romaji |
| f | fricative | f | japanese | romaji |
| ch | affricate | ts\h | japanese | romaji |
| ry | liquid | ROM_ry | japanese | romaji |
| ky | stop | ROM_ky | japanese | romaji |
| py | stop | ROM_py | japanese | romaji |
| dy | stop | ROM_dy | japanese | romaji |
| ty | stop | ROM_ty | japanese | romaji |
| ny | nasal | ROM_ny | japanese | romaji |
| hy | aspirate | ROM_hy | japanese | romaji |
| my | nasal | ROM_my | japanese | romaji |
| gy | stop | ROM_gy | japanese | romaji |
| by | stop | ROM_by | japanese | romaji |
| n | nasal | n | japanese | romaji |
| m | nasal | m | japanese | romaji |
| r | liquid | l | japanese | romaji |
| w | semivowel | w | japanese | romaji |
| v | semivowel | ARP_v | japanese | romaji |
| y | semivowel | j | japanese | romaji |
| pau | silence | pau | japanese | romaji |
| sil | silence | sil | japanese | romaji |
| br | breath | br | japanese | romaji |
| a | vowel | a | mandarin | xsampa |
| A | vowel | A | mandarin | xsampa |
| o | vowel | o | mandarin | xsampa |
| @ | vowel | @ | mandarin | xsampa |
| e | vowel | e | mandarin | xsampa |
| 7 | vowel | 7 | mandarin | xsampa |
| U | vowel | U | mandarin | xsampa |
| u | vowel | u | mandarin | xsampa |
| i | vowel | i | mandarin | xsampa |
| i\ | vowel | i\ | mandarin | xsampa |
| i` | vowel | i` | mandarin | xsampa |
| y | vowel | y | mandarin | xsampa |
| AU | diphthong | AU | mandarin | xsampa |
| @U | diphthong | @U | mandarin | xsampa |
| ia | diphthong | ia | mandarin | xsampa |
| iA | diphthong | iA | mandarin | xsampa |
| iAU | diphthong | iAU | mandarin | xsampa |
| ie | diphthong | ie | mandarin | xsampa |
| iE | diphthong | iE | mandarin | xsampa |
| iU | diphthong | iU | mandarin | xsampa |
| i@U | diphthong | i@U | mandarin | xsampa |
| y{ | diphthong | y{ | mandarin | xsampa |
| yE | diphthong | yE | mandarin | xsampa |
| ua | diphthong | ua | mandarin | xsampa |
| uA | diphthong | uA | mandarin | xsampa |
| u@ | diphthong | u@ | mandarin | xsampa |
| ue | diphthong | ue | mandarin | xsampa |
| uo | diphthong | uo | mandarin | xsampa |
| :\i | coda | :\i | mandarin | xsampa |
| r\` | coda | r\` | mandarin | xsampa |
| :n | coda | :n | mandarin | xsampa |
| N | coda | N | mandarin | xsampa |
| p | stop | p | mandarin | xsampa |
| ph | stop | ph | mandarin | xsampa |
| t | stop | t | mandarin | xsampa |
| th | stop | th | mandarin | xsampa |
| k | stop | k | mandarin | xsampa |
| kh | stop | kh | mandarin | xsampa |
| ts\ | affricate | ts\ | mandarin | xsampa |
| ts | affricate | ts | mandarin | xsampa |
| tsh | affricate | tsh | mandarin | xsampa |
| ts` | affricate | ts` | mandarin | xsampa |
| ts`h | affricate | ts`h | mandarin | xsampa |
| x | aspirate | x | mandarin | xsampa |
| f | fricative | f | mandarin | xsampa |
| s | fricative | s | mandarin | xsampa |
| s` | fricative | s` | mandarin | xsampa |
| ts\h | fricative | ts\h | mandarin | xsampa |
| s\ | fricative | s\ | mandarin | xsampa |
| m | nasal | m | mandarin | xsampa |
| n | nasal | n | mandarin | xsampa |
| l | liquid | l | mandarin | xsampa |
| z` | semivowel | z` | mandarin | xsampa |
| w | semivowel | w | mandarin | xsampa |
| j | semivowel | j | mandarin | xsampa |
| pau | silence | pau | mandarin | xsampa |
| sil | silence | sil | mandarin | xsampa |
| cl | stop | cl | mandarin | xsampa |
| br | breath | br | mandarin | xsampa |
| ts | affricate | ts | cantonese | xsampa |
| tsh | affricate | tsh | cantonese | xsampa |
| f | fricative | f | cantonese | xsampa |
| h | fricative | x | cantonese | xsampa |
| s | fricative | s | cantonese | xsampa |
| l | liquid | l | cantonese | xsampa |
| m | nasal | m | cantonese | xsampa |
| n | nasal | n | cantonese | xsampa |
| N | nasal | YUE_N | cantonese | xsampa |
| w | semivowel | w | cantonese | xsampa |
| j | semivowel | j | cantonese | xsampa |
| p | stop | p | cantonese | xsampa |
| ph | stop | ph | cantonese | xsampa |
| t | stop | t | cantonese | xsampa |
| th | stop | th | cantonese | xsampa |
| k | stop | k | cantonese | xsampa |
| kh | stop | kh | cantonese | xsampa |
| kw | stop | ROM_gw | cantonese | xsampa |
| kwh | stop | ROM_kw | cantonese | xsampa |
| a | vowel | a | cantonese | xsampa |
| 6 | vowel | 6 | cantonese | xsampa |
| E | vowel | e | cantonese | xsampa |
| e | vowel | e | cantonese | xsampa |
| i | vowel | i | cantonese | xsampa |
| I | vowel | ARP_ih | cantonese | xsampa |
| O | vowel | o | cantonese | xsampa |
| o | vowel | o | cantonese | xsampa |
| u | vowel | u | cantonese | xsampa |
| U | vowel | U | cantonese | xsampa |
| 9 | vowel | 9 | cantonese | xsampa |
| 8 | vowel | 8 | cantonese | xsampa |
| y | vowel | y | cantonese | xsampa |
| m= | vowel | m | cantonese | xsampa |
| N= | vowel | N | cantonese | xsampa |
| :i | coda | :\i | cantonese | xsampa |
| :u | coda | :u | cantonese | xsampa |
| :m | coda | :m | cantonese | xsampa |
| :n | coda | :n | cantonese | xsampa |
| :N | coda | N | cantonese | xsampa |
| :p_} | coda | :p_} | cantonese | xsampa |
| :t_} | coda | :t_} | cantonese | xsampa |
| :k_} | coda | :k_} | cantonese | xsampa |
| pau | silence | pau | cantonese | xsampa |
| sil | silence | sil | cantonese | xsampa |
| cl | stop | cl | cantonese | xsampa |
| br | breath | br | cantonese | xsampa |
| a | vowel | a | spanish | xsampa |
| e | vowel | e | spanish | xsampa |
| i | vowel | i | spanish | xsampa |
| o | vowel | o | spanish | xsampa |
| u | vowel | u | spanish | xsampa |
| U | semivowel | w | spanish | xsampa |
| I | semivowel | ES_I | spanish | xsampa |
| y | semivowel | j | spanish | xsampa |
| ll | semivowel | ll | spanish | xsampa |
| b | stop | b | spanish | xsampa |
| B | stop | B | spanish | xsampa |
| d | stop | d | spanish | xsampa |
| D | stop | D | spanish | xsampa |
| g | stop | g | spanish | xsampa |
| k | stop | k | spanish | xsampa |
| p | stop | p | spanish | xsampa |
| t | stop | t | spanish | xsampa |
| l | liquid | l | spanish | xsampa |
| rr | trill | rr | spanish | xsampa |
| r | liquid | r | spanish | xsampa |
| m | nasal | m | spanish | xsampa |
| n | nasal | n | spanish | xsampa |
| N | nasal | N | spanish | xsampa |
| J | nasal | ROM_ny | spanish | xsampa |
| f | fricative | f | spanish | xsampa |
| s | fricative | s | spanish | xsampa |
| C | fricative | ARP_th | spanish | xsampa |
| sh | fricative | s` | spanish | xsampa |
| ch | affricate | ts`h | spanish | xsampa |
| x | fricative | x | spanish | xsampa |
| pau | silence | pau | spanish | xsampa |
| sil | silence | sil | spanish | xsampa |
| cl | stop | cl | spanish | xsampa |
| br | breath | br | spanish | xsampa |
| 4 | liquid | l | korean | xsampa |
| 6 | vowel | 6 | korean | xsampa |
| b | stop | b | korean | xsampa |
| d | stop | d | korean | xsampa |
| dz\ | affricate | ts\ | korean | xsampa |
| e_o | vowel | e | korean | xsampa |
| g | stop | g | korean | xsampa |
| h | fricative | x | korean | xsampa |
| i | vowel | i | korean | xsampa |
| j | semivowel | j | korean | xsampa |
| ts\h | affricate | ts\h | korean | xsampa |
| k | stop | k | korean | xsampa |
| k_t | stop | k_t | korean | xsampa |
| l | liquid | l | korean | xsampa |
| M | nasal | U | korean | xsampa |
| m | nasal | m | korean | xsampa |
| n | nasal | n | korean | xsampa |
| N | coda | N | korean | xsampa |
| o | vowel | o | korean | xsampa |
| p | stop | p | korean | xsampa |
| p_t | stop | p_t | korean | xsampa |
| s | fricative | s | korean | xsampa |
| s_t | fricative | s_t | korean | xsampa |
| t | stop | t | korean | xsampa |
| t_t | stop | t_t | korean | xsampa |
| ts\_h | affricate | ts`h | korean | xsampa |
| u | vowel | u | korean | xsampa |
| V | vowel | A | korean | xsampa |
| w | semivowel | w | korean | xsampa |
| pau | silence | pau | korean | xsampa |
| sil | silence | sil | korean | xsampa |
| cl | stop | cl | korean | xsampa |
| br | breath | br | korean | xsampa |
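
Before you edit a phoneme table like this in the JSON, a small script can help you sanity-check it. The following is a minimal Python sketch, not a Flat Manager feature. It assumes the five columns are, in order: the symbol typed in the editor, its phoneme class, the base-model phoneme it maps to, the language, and the notation. `PhonemeEntry`, `parse_rows`, `build_index`, and `shared_targets` are hypothetical helper names. The script indexes rows by (language, symbol), reports duplicate keys, and lists base phonemes shared by several symbols. For example, Cantonese `E` and `e` both map to `e` in the table above.

```python
from __future__ import annotations

from collections import defaultdict
from typing import NamedTuple


class PhonemeEntry(NamedTuple):
    # Column meanings are an assumption based on the table layout above.
    symbol: str    # phoneme as typed in the editor
    kind: str      # phoneme class, e.g. "vowel", "stop", "coda"
    target: str    # base-model phoneme the symbol is mapped to
    language: str  # e.g. "mandarin", "cantonese", "spanish", "korean"
    notation: str  # phonetic alphabet, "xsampa" in every row here


def parse_rows(markdown: str) -> list[PhonemeEntry]:
    """Parse '| a | b | c | d | e |' rows, skipping separator lines."""
    entries = []
    for line in markdown.splitlines():
        line = line.strip()
        if not line.startswith("|"):
            continue
        cells = [c.strip() for c in line.strip("|").split("|")]
        if len(cells) != 5:
            continue
        # Skip markdown separator rows such as '| --- | :-: | ... |'.
        if all(set(c) <= set("-: ") for c in cells):
            continue
        entries.append(PhonemeEntry(*cells))
    return entries


def build_index(entries: list[PhonemeEntry]):
    """Map (language, symbol) -> entry and collect duplicate keys."""
    index: dict[tuple[str, str], PhonemeEntry] = {}
    dupes: list[tuple[str, str]] = []
    for e in entries:
        key = (e.language, e.symbol)
        if key in index:
            dupes.append(key)
        index[key] = e  # later rows overwrite earlier ones
    return index, dupes


def shared_targets(entries: list[PhonemeEntry], language: str) -> dict[str, list[str]]:
    """Return base phonemes that more than one symbol maps to in a language."""
    groups: dict[str, list[str]] = defaultdict(list)
    for e in entries:
        if e.language == language:
            groups[e.target].append(e.symbol)
    return {t: syms for t, syms in groups.items() if len(syms) > 1}


# A few rows copied from the Cantonese section of the table.
SAMPLE = """
| E | vowel | e | cantonese | xsampa |
| e | vowel | e | cantonese | xsampa |
| O | vowel | o | cantonese | xsampa |
| o | vowel | o | cantonese | xsampa |
| kw | stop | ROM_gw | cantonese | xsampa |
"""

if __name__ == "__main__":
    rows = parse_rows(SAMPLE)
    index, dupes = build_index(rows)
    print(index[("cantonese", "kw")].target)    # ROM_gw
    print(dupes)                                # []
    print(shared_targets(rows, "cantonese"))    # {'e': ['E', 'e'], 'o': ['O', 'o']}
```

A dictionary keyed on (language, symbol) reflects how the same symbol can map differently per language. For example, `h` maps to `x` in Cantonese and Korean, while Mandarin uses `x` directly.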