
YuE AI Music Generator: Create Full Songs from Lyrics and Genre Prompts


In this article, I’ll walk you through the YuE AI Music Generator, an open-source tool that can create full songs from just a few inputs. All you need to do is provide the lyrics and specify the genre, and it will generate a complete song for you. Let’s dive into how it works and explore some examples of what it can do.


How Does the YuE AI Music Generator Work?

The YuE AI Music Generator is a tool that allows you to create full songs by simply inputting lyrics and selecting a genre. It’s similar to other tools like Udio or Suno, but what sets it apart is its open-source nature and the fact that it’s completely free to use.


Here’s how it works:

  1. Input Lyrics: Write or paste the lyrics you want the song to include.
  2. Specify Genre: Choose a genre or style for the song, such as pop, rap, jazz, or heavy metal.
  3. Generate Song: The AI will create a full song, complete with vocals and instrumentation, based on your inputs.
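
In practice, both inputs are plain text. Here is a minimal Python sketch that writes them to two files. The file names genre.txt and lyrics.txt match the ones the inference command later in this article reads. The [verse] and [chorus] style labels, the blank line between sections, and the helper name write_prompts are my assumptions for illustration, so check the repository's prompt guide for the exact format.

```python
from pathlib import Path


def write_prompts(out_dir, genre_tags, sections):
    """Write genre.txt and lyrics.txt into out_dir and return both paths.

    genre_tags: list of style descriptors, e.g. ["pop", "airy vocal"].
    sections:   list of (label, text) pairs, e.g. ("verse", "line one\nline two").
    The [label] headers and blank-line separators are an assumed layout.
    """
    out = Path(out_dir)
    out.mkdir(parents=True, exist_ok=True)

    # Genre tags go on a single line, separated by spaces.
    genre_path = out / "genre.txt"
    genre_path.write_text(" ".join(genre_tags) + "\n", encoding="utf-8")

    # Each lyric section gets a bracketed label followed by its lines.
    blocks = [f"[{label}]\n{text.strip()}" for label, text in sections]
    lyrics_path = out / "lyrics.txt"
    lyrics_path.write_text("\n\n".join(blocks) + "\n", encoding="utf-8")
    return genre_path, lyrics_path
```

For example, write_prompts("my_song", ["pop", "airy vocal"], [("verse", "..."), ("chorus", "...")]) would create my_song/genre.txt and my_song/lyrics.txt.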

YuE AI Music Generator Overview:

Feature | Details
Model Name | YuE AI Music Generator
Functionality | Generate full songs from lyrics and genre prompts
Project Page | map-yue.github.io/
GitHub Repository | github.com/multimodal-art-projection/YuE
Hugging Face Model | huggingface.co/m-a-p/YuE-s1-7B-anneal-en-cot
Input | Lyrics and genre specifications
Output | Complete musical compositions with vocals and instrumentation

Examples of Songs Generated by YuE AI

To give you a better idea of what YuE AI can do, let’s look at some examples of songs it has generated across different genres.

Pop Genre Example

For the first example, I specified the genre as "inspiring female uplifting pop, airy vocal, electronic, bright vocal" and input the following lyrics:

"Staring at the sense colors the sky of you, keep swearing can't deny, I know you down and make me St, but I'm here to man the heart I didn't break, every word you take my the behind every dream you're a Chase, I reach you by the life you can't F this, and now I won't back down, you know you can't deny it now."

Here’s what the AI generated:

"Staring at the sense colors the sky of you, keep swearing can't deny, I know you down and make me St, but I'm here to man the heart I didn't break, every word you take my the behind every dream you're a Chase, I reach you by the life you can't F this, and now I won't back down, you know you can't deny it now."

If I generate the song again with the same inputs, here’s another variation:

"Come beat the sky, that's how you keep sweating, can't to high, I know I let you down, I made mistakes, but I'm here to man The, I didn't break, every you take, I'm eyes you."

As you can see, the AI creates unique variations each time, even with the same lyrics and genre.
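
This kind of variation is what you would expect if the model samples each next token from a probability distribution instead of always taking the most likely one. The toy Python sketch below illustrates only that idea. The word list, the weights, and the function names are made up, and it does not reproduce YuE's actual decoding code.

```python
import random

# Hypothetical next-word candidates with made-up probabilities.
CANDIDATES = ["sky", "night", "light", "fire"]
WEIGHTS = [0.4, 0.3, 0.2, 0.1]


def sample_line(seed, length=5):
    """Draw `length` words by weighted random sampling with a fixed seed."""
    rng = random.Random(seed)
    return rng.choices(CANDIDATES, weights=WEIGHTS, k=length)


def greedy_line(length=5):
    """Always pick the most likely word, so the result never changes."""
    best = CANDIDATES[WEIGHTS.index(max(WEIGHTS))]
    return [best] * length
```

Calling sample_line with the same seed repeats the same line, while different seeds will usually give different lines. greedy_line returns the same words on every call.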


Rap Genre Example

Next, let’s try a rap song. For this example, I specified the genre as "rap piano, street tough, piercing vocal, hip-hop synthesizer, clear vocal male" and input the following lyrics:

"This is my journey and I'm running this race, he the Cs about a base, said I couldn't do a, said I'd never rise, but now I'm soaring high, reaching for the skies, lessons that I learned made me who I am, standing T now I don't give a damn, echoes in the alley, music need to me, watch is cish, she be so me like me, let USA and SK get all to ca first, bir certain that b fing the from the world, cing the CH do and the witch and on this fool man on my wing and from my near the end there before the chorus."

Here’s what the AI generated:

"This is my journey and I'm running this race, he the Cs about a base, said I couldn't do a, said I'd never rise, but now I'm soaring high, reaching for the skies, lessons that I learned made me who I am, standing T now I don't give a damn, echoes in the alley, music need to me, watch is cish, she be so me like me, let USA and SK get all to ca first, bir certain that b fing the from the world, cing the CH do and the witch and on this fool man on my wing and from my near the end there before the chorus."

One thing to note is that the AI sometimes skips lines or jumps around with the lyrics. However, it’s impressive how it can continue rapping without any guidance, almost like freestyling.


Jazz Genre Example

Now, let’s explore a jazz example. For this, I specified the genre as "female blues, airy vocal, bright vocal, piano, sad romantic guitar, jazz" and input the following lyrics:

"Sh T to fall up the ne, echo through the hall, bre in the silent, I hear gentle voice, guarding me back Homeward, making my heart Rejoice, don't let this moment, they hold me close to with your he beside me, things around can't the Dem, don't want to let you go, stay with me forever."

Here’s what the AI generated:

"Sh T to fall up the ne, echo through the hall, bre in the silent, I hear gentle voice, guarding me back Homeward, making my heart Rejoice, don't let this moment, they hold me close to with your he beside me, things around can't the Dem, don't want to let you go, stay with me forever."

The result is a smooth, jazzy tune with a romantic vibe. It’s a great example of how versatile the YuE AI Music Generator can be.


Heavy Metal Genre Example

For those who prefer something more intense, YuE AI can also handle heavy metal. Here’s an example with the genre specified as "heavy metal" and the following lyrics:

"Step back all night without a fight, no SC get up with the first fight, up P your hands be light, might step back cuz I hold back F going going the on what thing."

Here’s what the AI generated:

"Step back all night without a fight, no SC get up with the first fight, up P your hands be light, might step back cuz I hold back F going going the on what thing."

The AI has no problem handling screaming and other extreme elements of heavy metal. Just a warning—you might want to turn your speakers down for this one!


Multilingual Song Example

YuE AI can also create songs in different languages. For this example, I input lyrics that include Japanese, English, and Korean:

"The only one I know, know the only know, love you, do you want me, and I want to be your number one, come a little closer."

Here’s what the AI generated:

"The only one I know, know the only know, love you, do you want me, and I want to be your number one, come a little closer."

Near the end of the song, the AI even added some autotune synth voice effects, which is a cool touch.


How to Use the YuE AI Music Generator

If you’re interested in trying out the YuE AI Music Generator for yourself, here’s what you need to know:

  1. Download the Tool: The GitHub repository listed in the overview table above contains all the instructions on how to download and use the tool.
  2. System Requirements: The tool requires significant GPU memory. You’ll need at least 16 GB of VRAM, though 24 GB is recommended for optimal performance.
  3. Open-Source License: The tool is released under the Apache 2.0 license, which permits use for any purpose, including commercial projects, as long as you follow the license terms (for example, keeping the license and attribution notices).
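
Before downloading anything, it can help to check how much VRAM your GPU has. The Python sketch below parses the output of nvidia-smi and compares it with the 16 GB and 24 GB figures quoted above. The function names are hypothetical, and the rounding step is my assumption to cope with drivers that report slightly less than the nominal card size.

```python
import subprocess

MIN_GB = 16          # minimum stated in this article
RECOMMENDED_GB = 24  # recommended amount stated in this article


def parse_total_mib(smi_output):
    """Parse one total-memory value (in MiB) per GPU from nvidia-smi CSV output."""
    return [int(line.strip()) for line in smi_output.splitlines() if line.strip()]


def classify(total_mib):
    """Label a GPU as 'recommended', 'minimum', or 'insufficient'."""
    # Round to the nearest GiB: a nominal 24 GB card may report about 24564 MiB.
    gib = round(total_mib / 1024)
    if gib >= RECOMMENDED_GB:
        return "recommended"
    if gib >= MIN_GB:
        return "minimum"
    return "insufficient"


def query_gpus():
    """Ask nvidia-smi for each GPU's total memory (requires an NVIDIA driver)."""
    out = subprocess.run(
        ["nvidia-smi", "--query-gpu=memory.total", "--format=csv,noheader,nounits"],
        capture_output=True, text=True, check=True,
    ).stdout
    return parse_total_mib(out)
```

On a machine with an NVIDIA driver installed, [classify(m) for m in query_gpus()] gives one label per GPU.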

How to Run YuE AI Locally from GitHub


Step 1: Install Environment and Dependencies

  1. Create a new Python environment using Conda:

    • Open your terminal and run:
      conda create -n yue python=3.8
      conda activate yue
    • Note: Python 3.8 or higher is recommended.
  2. Install PyTorch and CUDA (for GPU acceleration):

    • Install the necessary libraries:
      conda install pytorch torchvision torchaudio cudatoolkit=11.8 -c pytorch -c nvidia
  3. Install YuE requirements:

    • Run the following command to install the required dependencies:
      pip install -r <(curl -sSL https://raw.githubusercontent.com/multimodal-art-projection/YuE/main/requirements.txt)
  4. Save GPU memory by installing FlashAttention 2:

    • FlashAttention 2 helps reduce VRAM usage. Install it with:
      pip install flash-attn --no-build-isolation
    • Note: Make sure the FlashAttention version matches your CUDA version to avoid issues.
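
Once Step 1 is done, a quick sanity check can confirm that the environment is usable before you download any models. This Python sketch only checks whether each package can be found, not whether it works with your GPU. The module list is my assumption based on the install commands above, and check_environment is a hypothetical helper name.

```python
import importlib.util
import sys


def check_environment(modules=("torch", "torchaudio", "flash_attn"), min_python=(3, 8)):
    """Return a dict describing whether Python and each module look usable."""
    report = {"python_ok": sys.version_info[:2] >= min_python}
    for name in modules:
        # find_spec returns None when a top-level module is not installed.
        report[name] = importlib.util.find_spec(name) is not None
    return report
```

Run it inside the activated yue environment. Any False entry in the result points to the install step that needs another look.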

Step 2: Download Inference Code and Tokenizer

  1. Install Git LFS (Large File Storage):

    • Update your system and install Git LFS:
      sudo apt update
      sudo apt install git-lfs
      git lfs install
  2. Clone the YuE repository:

    • Download the YuE code with:
      git clone https://github.com/multimodal-art-projection/YuE.git
      cd YuE/inference/
  3. Download the tokenizer:

    • Clone the tokenizer model:
      git clone https://huggingface.co/m-a-p/xcodec_mini_infer
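
If Git LFS was not set up before cloning, large files can arrive as tiny text pointers instead of real data. The Python sketch below looks for such leftovers by checking for the standard Git LFS pointer header. The helper names are hypothetical, and pointing it at the xcodec_mini_infer folder is my assumption based on the clone command above.

```python
from pathlib import Path

# First bytes of every Git LFS pointer file.
LFS_HEADER = b"version https://git-lfs.github.com/spec/v1"


def is_lfs_pointer(path):
    """True if the file is a small Git LFS pointer rather than real content."""
    with open(path, "rb") as fh:
        return fh.read(len(LFS_HEADER)) == LFS_HEADER


def find_unfetched(root):
    """List files under root that are still Git LFS pointers."""
    return sorted(
        str(p) for p in Path(root).rglob("*")
        if p.is_file() and ".git" not in p.parts and is_lfs_pointer(p)
    )
```

If find_unfetched("xcodec_mini_infer") returns any paths, running git lfs pull inside that folder should fetch the real files.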

Step 3: Run Inference to Generate Music

Now you’re ready to generate music using YuE. Follow these steps:

  1. Basic inference:

    • Use the following command to generate music:

      python infer.py \
          --cuda_idx 0 \
          --stage1_model m-a-p/YuE-s1-7B-anneal-en-cot \
          --stage2_model m-a-p/YuE-s2-1B-general \
          --genre_txt ../prompt_egs/genre.txt \
          --lyrics_txt ../prompt_egs/lyrics.txt \
          --run_n_segments 2 \
          --stage2_batch_size 4 \
          --output_dir ../output \
          --max_new_tokens 3000
    • Parameters explained:

      • --run_n_segments: Number of lyric sections to generate.
      • --stage2_batch_size: Adjust based on your GPU memory.
      • --output_dir: Folder where the generated music will be saved.
  2. Customizing prompts:

    • Edit genre.txt and lyrics.txt to customize the music’s genre and lyrics.
    • Refer to the prompt engineering guide for advanced tips.
  3. Avoid out-of-memory (OOM) errors:

    • If you encounter OOM errors, reduce --stage2_batch_size.
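
The OOM advice above can be automated. This Python sketch reruns the inference command with a halved --stage2_batch_size whenever the failure looks like an out-of-memory error. The wrapper name, the injectable runner, and the assumption that OOM failures mention "out of memory" in stderr are mine and are not part of YuE itself.

```python
import subprocess


def run_with_fallback(base_args, batch_size=4, runner=None):
    """Run infer.py, halving --stage2_batch_size after an out-of-memory failure.

    base_args: argv list without the --stage2_batch_size flag.
    runner:    callable(argv) -> (returncode, stderr); injectable for testing.
    Returns the batch size that succeeded, or None if even size 1 failed.
    """
    if runner is None:
        def runner(argv):
            proc = subprocess.run(argv, capture_output=True, text=True)
            return proc.returncode, proc.stderr

    while batch_size >= 1:
        argv = list(base_args) + ["--stage2_batch_size", str(batch_size)]
        code, stderr = runner(argv)
        if code == 0:
            return batch_size
        # Assumption: PyTorch OOM failures mention "out of memory" in stderr.
        if "out of memory" not in stderr.lower():
            raise RuntimeError(f"inference failed for another reason:\n{stderr}")
        batch_size //= 2
    return None
```

Pass the rest of the command from Step 3 as base_args, leaving out --stage2_batch_size. Each retry starts inference from scratch, so a smaller starting value saves time on GPUs with less memory.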

Advanced: Music In-Context Learning (ICL)

YuE supports generating music based on reference audio tracks. There are two types of ICL:

1. Dual-Track ICL:

  • Provide separate vocal and instrumental tracks for better results.

    python infer.py \
        --cuda_idx 0 \
        --stage1_model m-a-p/YuE-s1-7B-anneal-en-icl \
        --stage2_model m-a-p/YuE-s2-1B-general \
        --genre_txt ../prompt_egs/genre.txt \
        --lyrics_txt ../prompt_egs/lyrics.txt \
        --run_n_segments 2 \
        --stage2_batch_size 4 \
        --output_dir ../output \
        --max_new_tokens 3000 \
        --use_dual_tracks_prompt \
        --vocal_track_prompt_path ../prompt_egs/pop.00001.Vocals.mp3 \
        --instrumental_track_prompt_path ../prompt_egs/pop.00001.Instrumental.mp3 \
        --prompt_start_time 0 \
        --prompt_end_time 30

2. Single-Track ICL:

  • Provide one audio track (mix, vocal, or instrumental).

    python infer.py \
        --cuda_idx 0 \
        --stage1_model m-a-p/YuE-s1-7B-anneal-en-icl \
        --stage2_model m-a-p/YuE-s2-1B-general \
        --genre_txt ../prompt_egs/genre.txt \
        --lyrics_txt ../prompt_egs/lyrics.txt \
        --run_n_segments 2 \
        --stage2_batch_size 4 \
        --output_dir ../output \
        --max_new_tokens 3000 \
        --use_audio_prompt \
        --audio_prompt_path ../prompt_egs/pop.00001.mp3 \
        --prompt_start_time 0 \
        --prompt_end_time 30
    • To prepare the separate vocal and instrumental tracks needed for the dual-track mode, you can use tools like Ultimate Vocal Remover GUI or python-audio-separator.
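
The basic, dual-track, and single-track commands share most of their flags and differ only in the stage 1 model and the audio prompt options. The Python sketch below makes that explicit by building the argument list for each mode. All flags and paths are copied from the commands in this article. The build_command helper and the mode names are hypothetical.

```python
# Flags shared by all three commands shown in this article.
COMMON = {
    "--cuda_idx": "0",
    "--stage2_model": "m-a-p/YuE-s2-1B-general",
    "--genre_txt": "../prompt_egs/genre.txt",
    "--lyrics_txt": "../prompt_egs/lyrics.txt",
    "--run_n_segments": "2",
    "--stage2_batch_size": "4",
    "--output_dir": "../output",
    "--max_new_tokens": "3000",
}


def build_command(mode="basic", start=0, end=30):
    """Return the infer.py argv for 'basic', 'dual', or 'single' mode."""
    cot = "m-a-p/YuE-s1-7B-anneal-en-cot"
    icl = "m-a-p/YuE-s1-7B-anneal-en-icl"
    # Basic inference uses the CoT checkpoint; both ICL modes use the ICL one.
    argv = ["python", "infer.py", "--stage1_model", cot if mode == "basic" else icl]
    for flag, value in COMMON.items():
        argv += [flag, value]
    if mode == "dual":
        argv += ["--use_dual_tracks_prompt",
                 "--vocal_track_prompt_path", "../prompt_egs/pop.00001.Vocals.mp3",
                 "--instrumental_track_prompt_path", "../prompt_egs/pop.00001.Instrumental.mp3"]
    elif mode == "single":
        argv += ["--use_audio_prompt", "--audio_prompt_path", "../prompt_egs/pop.00001.mp3"]
    elif mode != "basic":
        raise ValueError(f"unknown mode: {mode}")
    if mode != "basic":
        # Both ICL modes select a time window from the reference audio.
        argv += ["--prompt_start_time", str(start), "--prompt_end_time", str(end)]
    return argv
```

The returned list can be passed to subprocess.run from inside the YuE/inference/ directory, or joined with spaces to print the equivalent shell command.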

Final Thoughts

While the quality of the songs generated by YuE AI isn’t quite on par with tools like Udio or Suno AI yet, it’s still an impressive tool, especially considering that it’s completely free and open-source. As more users work on optimizing it, I’m sure we’ll see even better results in the future.