Languages and narrations

In content creation, the voices you choose can make a significant difference in how your message reaches your audience.

Imagine having a YAML file with specific information about the voices you will use in your project. You can define each speaker's name, style, and settings, such as speed and pitch. This gives you precise control over how your characters or narrators sound.

As described in the quickstart documentation, the toc.yml file contains all the settings needed to generate the video, including the speaker settings.

General Voice Configuration

  • speaker: The name of the voice to use. The available voices are listed in the next part.
  • speed: The speaking rate, where 1.0 is normal speed. Values greater than 1.0 make speech faster, and values less than 1.0 make it slower. Accepts float values from 0.5 to 2.0.
  • pitch: The pitch of the voice, where 1.0 is normal pitch. Values greater than 1.0 raise the pitch, and values less than 1.0 lower it. Accepts float values from 0.5 to 2.0.
  • style: The speaking style of the voice, chosen from predefined styles such as customerservice or chat. The available styles are listed in the next part.
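The ranges above are easy to check before rendering a video. The following Python sketch validates a single settings mapping. validate_voice_settings is a hypothetical helper, not part of the tool: it only checks that speed and pitch fall within the documented 0.5 to 2.0 range and that speaker and style are strings. It does not check whether a voice or style name actually exists.

```python
# Hypothetical helper (not part of the tool): checks one voice-settings mapping
# against the documented ranges for speed and pitch.

ALLOWED_RANGE = (0.5, 2.0)  # documented bounds for speed and pitch


def validate_voice_settings(settings: dict) -> list[str]:
    """Return a list of problems found in one voice-settings mapping."""
    problems = []
    for key in ("speed", "pitch"):
        if key not in settings:
            continue  # optional field: nothing to check when it is absent
        value = settings[key]
        # bool is a subclass of int in Python, so reject it explicitly.
        if isinstance(value, bool) or not isinstance(value, (int, float)):
            problems.append(f"{key} must be a number, got {value!r}")
        elif not ALLOWED_RANGE[0] <= value <= ALLOWED_RANGE[1]:
            problems.append(
                f"{key}={value} is outside {ALLOWED_RANGE[0]}-{ALLOWED_RANGE[1]}"
            )
    for key in ("speaker", "style"):
        if key in settings and not isinstance(settings[key], str):
            problems.append(f"{key} must be a string")
    return problems


print(validate_voice_settings({"speaker": "Jenny", "speed": 1.1, "pitch": 1.2}))  # prints []
print(validate_voice_settings({"speed": 2.5, "pitch": "high"}))
```

An empty list means the mapping passed every check; the second call reports both the out-of-range speed and the non-numeric pitch.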

Chapter-specific Voice Configuration

You can specify a different speaker, speed, pitch, or style for each chapter, overriding the general voice settings for that particular chapter.

Section-specific Voice Configuration

Similar to chapters, you can specify individual voice settings for each section within a chapter, overriding both the general and chapter-specific voice settings for that section.

Example

name: Hello world
version: v1
speaker: Jenny
speed: 1.1
pitch: 1.2
style: customerservice
chapters:
  - name: Chapter One
    speaker: Aria # Change the voice and configuration for this chapter (and sections)
    speed: 1.2
    style: chat
    sections:
      - name: Test
        href: hello_world.md
      - name: Test_2
        href: hello_world_2.md
  - name: Chapter Two
    sections:
      - name: Test
        speaker: Guy # Change the voice and configuration for this section
        speed: 1
        pitch: 1
        style: chat
        href: hello_world.md
      - name: Test_2
        href: hello_world_2.md

In this example:

  • At the top level, Jenny is set as the general speaker with the customerservice style, speed 1.1, and pitch 1.2.
  • Chapter One switches to Aria with speed 1.2 and the chat style, overriding the general settings for both of its sections. It does not set a pitch, so the general pitch of 1.2 still applies.
  • In Chapter Two, the first section, Test, uses Guy with normal speed and pitch (1) and the chat style, as specified.
  • The second section of Chapter Two, Test_2, sets nothing of its own, so it falls back to the general settings: Jenny with the customerservice style, speed 1.1, and pitch 1.2.
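The precedence rules in this example can be sketched in Python. This is a minimal sketch, not the tool's implementation. It assumes each setting is resolved field by field (section first, then chapter, then the general settings), which matches the behavior described for the example. The dictionary mirrors the toc.yml above with the href entries left out, and resolve_voice is a hypothetical helper name.

```python
# Sketch of the override order: section settings win over chapter settings,
# which win over the general (top-level) settings. Each field is resolved
# independently. resolve_voice is a hypothetical name, not the tool's API.

VOICE_KEYS = ("speaker", "speed", "pitch", "style")

# Mirrors the toc.yml example (href entries omitted for brevity).
toc = {
    "speaker": "Jenny", "speed": 1.1, "pitch": 1.2, "style": "customerservice",
    "chapters": [
        {"name": "Chapter One", "speaker": "Aria", "speed": 1.2, "style": "chat",
         "sections": [{"name": "Test"}, {"name": "Test_2"}]},
        {"name": "Chapter Two",
         "sections": [
             {"name": "Test", "speaker": "Guy", "speed": 1, "pitch": 1, "style": "chat"},
             {"name": "Test_2"},
         ]},
    ],
}


def resolve_voice(general: dict, chapter: dict, section: dict) -> dict:
    """Return the effective voice settings for one section."""
    effective = {}
    for key in VOICE_KEYS:
        # Walk from the most specific level to the least specific one
        # and keep the first value found.
        for level in (section, chapter, general):
            if key in level:
                effective[key] = level[key]
                break
    return effective


for chapter in toc["chapters"]:
    for section in chapter["sections"]:
        print(chapter["name"], section["name"], resolve_voice(toc, chapter, section))
```

Resolving each field separately is what lets Chapter One change the speaker, speed, and style while still inheriting the general pitch.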

Inline styles

You can also add inline styles to a section's text to set the tone and mood of individual passages, enhancing the listener's experience. Write the style name in double square brackets in the text:

[[cheerful]] Welcome to our universe, a vast and wondrous place full of mysteries and surprises. In this video, we will share with you some fun facts about the universe that will blow your mind. So, sit back, relax, and enjoy the show!
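If you preprocess section text yourself, the markers can be found with a regular expression. The sketch below is hypothetical: split_inline_styles is not part of the tool, and it assumes a marker applies to the text after it until the next marker, while text before any marker keeps the section's configured style (shown as None). Whether the tool accepts several markers in one section is also an assumption here.

```python
import re
from typing import Optional

# Hypothetical parser for inline style markers such as [[cheerful]].
# Assumption: a marker applies to the text after it, up to the next marker.
MARKER = re.compile(r"\[\[([\w-]+)\]\]")


def split_inline_styles(text: str) -> list[tuple[Optional[str], str]]:
    """Split text into (style, passage) pairs; None means no inline style."""
    segments = []
    style = None
    pos = 0
    for match in MARKER.finditer(text):
        chunk = text[pos:match.start()].strip()
        if chunk:
            segments.append((style, chunk))
        style = match.group(1)  # style name without the brackets
        pos = match.end()
    tail = text[pos:].strip()
    if tail:
        segments.append((style, tail))
    return segments


print(split_inline_styles("[[cheerful]] Welcome to our universe."))
# prints [('cheerful', 'Welcome to our universe.')]
```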

Phonemes and Graphemes Support (Currently only available for English)

This feature enables you to specify how acronyms, initials, or any other set of characters are pronounced, either through an alias or by providing their phonetic representation in IPA (International Phonetic Alphabet).


Steps:

  • Create a file named lexicon-[LANGUAGE].json. Only English is currently supported, so the file must be named lexicon-en.json.
  • Define the aliases and phonemes as needed.
  • Enjoy enhanced pronunciation accuracy!
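The first two steps can also be scripted. This minimal sketch builds the lexicon with the entries shown in the structure below and writes lexicon-en.json to the current directory. Where the tool expects the file to be placed is not covered here, so the location is an assumption.

```python
import json
from pathlib import Path

# Build the lexicon as a dict with the two documented sections.
lexicon = {
    "phonemes": {"hello": "həˈloʊ"},
    "alias": {"BTW": "by the way"},
}

# Assumption: the file goes in the current directory.
path = Path("lexicon-en.json")
# ensure_ascii=False keeps IPA symbols readable instead of \u escapes,
# and UTF-8 encoding is required to write them to disk.
path.write_text(json.dumps(lexicon, ensure_ascii=False, indent=4), encoding="utf-8")
```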

Structure:

Aliases and phonemes are defined in the lexicon-[LANGUAGE].json file using the following format:

(lexicon-en.json)

{
    "phonemes":{
        "hello" : "həˈloʊ",
        "next_word" : "next_pronunciation"
    },
    "alias": {
        "BTW": "by the way",
        "next_alias" : "next_interpretation"
    }
}

In this structure, the phonemes object maps each word to its pronunciation in IPA, and the alias object maps each word or acronym to the text that is read in its place. Both help the narration pronounce your content correctly.
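To see what the two mappings do, the sketch below rewrites narration text into standard SSML, wrapping phoneme entries in a phoneme element with alphabet="ipa" and alias entries in a sub element. This is an assumption about how a lexicon can be applied, not a description of the tool's internals; apply_lexicon and LEXICON_JSON are hypothetical names, and matching here is case-sensitive and whole-word only.

```python
import json
import re
from xml.sax.saxutils import escape, quoteattr

# Hypothetical example lexicon in the documented format.
LEXICON_JSON = """{
    "phonemes": {"hello": "həˈloʊ"},
    "alias": {"BTW": "by the way"}
}"""


def apply_lexicon(text: str, lexicon: dict) -> str:
    """Wrap lexicon words in SSML <phoneme>/<sub> tags (assumed approach)."""
    phonemes = lexicon.get("phonemes", {})
    aliases = lexicon.get("alias", {})
    words = list(phonemes) + list(aliases)
    if not words:
        return escape(text)
    # Whole-word, case-sensitive match; longest entries first so they win.
    alternatives = "|".join(re.escape(w) for w in sorted(words, key=len, reverse=True))
    pattern = re.compile(r"\b(" + alternatives + r")\b")
    out, pos = [], 0
    for m in pattern.finditer(text):
        out.append(escape(text[pos:m.start()]))  # plain text between matches
        word = m.group(0)
        if word in phonemes:  # a phoneme entry takes priority over an alias
            out.append(f'<phoneme alphabet="ipa" ph={quoteattr(phonemes[word])}>{escape(word)}</phoneme>')
        else:
            out.append(f"<sub alias={quoteattr(aliases[word])}>{escape(word)}</sub>")
        pos = m.end()
    out.append(escape(text[pos:]))
    return "".join(out)


print(apply_lexicon("hello, BTW!", json.loads(LEXICON_JSON)))
# prints <phoneme alphabet="ipa" ph="həˈloʊ">hello</phoneme>, <sub alias="by the way">BTW</sub>!
```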