Introduction
The GitHub repository “GPT-SoVITS” hosts a project focused on voice data processing and text-to-speech (TTS). It highlights that a good TTS model can be trained with as little as one minute of voice data, an approach known as “few-shot voice cloning.” The project is released under the MIT license and is written mainly in Python.
This tutorial explains how to run this project on a Mac using the CPU.
- Don’t expect to train on a Mac yet; it’s good enough that preprocessing and inference work. Running LLMs might be possible, but if anyone has successfully trained on a Mac (with MPS), please let me know.
- This tutorial mainly covers inference after the model has been trained and downloaded to the local machine. I have tested these steps, and they all work.
- Training-related information can be found in the reference videos above, which are very detailed. The dataset is the key, and training takes patience.
MPS Not Supported
Project link: https://github.com/RVC-Boss/GPT-SoVITS
This tutorial is for communication and learning purposes only. Please do not use it for illegal, immoral, or unethical purposes.
Please ensure that you address any authorization issues related to the dataset on your own. You bear full responsibility for any problems arising from the usage of non-authorized datasets for training, as well as any resulting consequences. The repository and its maintainer, svc develop team, disclaim any association with or liability for the consequences.
It is strictly forbidden to use it for any political-related purposes.
Software requirements:
- Homebrew https://brew.sh/
- VS Code (optional)
- Python 3.9
brew install python@3.9
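Homebrew can leave several Python versions installed side by side, so it is worth confirming which interpreter you are actually running before creating the virtual environment. The sketch below assumes an exact 3.9 match is what you want; the function name is my own.

```python
import sys

# The tutorial targets Python 3.9; only major.minor is compared here.
REQUIRED = (3, 9)


def is_supported_python(version_info=sys.version_info) -> bool:
    """Return True if the given (or running) interpreter is Python 3.9.x."""
    return tuple(version_info[:2]) == REQUIRED


if __name__ == "__main__":
    current = ".".join(map(str, sys.version_info[:3]))
    if is_supported_python():
        print(f"OK: running Python {current}")
    else:
        print(f"Warning: running Python {current}, this tutorial expects 3.9")
```

Save it under any name (check_python.py is just a placeholder) and run it with python3.9 to make sure the Homebrew interpreter is the one being picked up.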
Local Inference
1. Create venv
Create a virtual environment
python3.9 -m venv myenv  # you can replace 'myenv' with another name
2. Enter venv
cd myenv
source bin/activate
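After source bin/activate, python should point at the venv rather than the system interpreter. As a quick sketch, you can check this from Python itself: inside a venv created with python -m venv, sys.prefix points at the venv folder while sys.base_prefix still points at the base interpreter.

```python
import sys


def in_virtualenv() -> bool:
    """True when running inside a venv created by `python -m venv`."""
    # Outside a venv both prefixes are the same directory.
    return sys.prefix != getattr(sys, "base_prefix", sys.prefix)


if __name__ == "__main__":
    print("venv active:", in_virtualenv())
    print("prefix:", sys.prefix)
```

If it prints False, run source bin/activate again from inside the venv folder.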
3. Download Project
git clone https://github.com/RVC-Boss/GPT-SoVITS.git
cd GPT-SoVITS  # change to the project directory
4. Install Packages
brew install ffmpeg
pip install torch numpy scipy tensorboard librosa==0.9.2 numba==0.56.4 pytorch-lightning gradio==3.14.0 ffmpeg-python onnxruntime tqdm cn2an pypinyin pyopenjtalk g2p_en chardet
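Once the command finishes, you can check that everything in that pip list is importable and that Homebrew's ffmpeg is on your PATH. This is a sketch: the pip-to-import name table is my assumption (for example, ffmpeg-python is imported as ffmpeg), and it only checks that each module can be found, not that the pinned versions were installed.

```python
import importlib.util
import shutil

# pip package name -> import name, for the packages in the pip command above.
# The import names are assumptions about what each package exposes.
REQUIRED = {
    "torch": "torch",
    "numpy": "numpy",
    "scipy": "scipy",
    "tensorboard": "tensorboard",
    "librosa": "librosa",
    "numba": "numba",
    "pytorch-lightning": "pytorch_lightning",
    "gradio": "gradio",
    "ffmpeg-python": "ffmpeg",
    "onnxruntime": "onnxruntime",
    "tqdm": "tqdm",
    "cn2an": "cn2an",
    "pypinyin": "pypinyin",
    "pyopenjtalk": "pyopenjtalk",
    "g2p_en": "g2p_en",
    "chardet": "chardet",
}


def missing_packages(required=REQUIRED):
    """Return the pip names whose top-level module cannot be found."""
    # find_spec locates a module without importing it.
    return [pip_name for pip_name, module in required.items()
            if importlib.util.find_spec(module) is None]


def ffmpeg_on_path() -> bool:
    """True if an ffmpeg binary (e.g. from brew install ffmpeg) is on PATH."""
    return shutil.which("ffmpeg") is not None


if __name__ == "__main__":
    missing = missing_packages()
    if missing:
        print("Missing, try: pip install " + " ".join(missing))
    else:
        print("All packages found.")
    print("ffmpeg on PATH:", ffmpeg_on_path())
```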
Additional Requirements
If you need Chinese ASR (supported by FunASR), install:
pip install modelscope torchaudio sentencepiece funasr
Note: If you get a “No module named ...” error, just install that package with pip.
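The module name in that error is not always the name you pass to pip. Here is a small hypothetical helper that turns the error message into an install command; the two mappings in PIP_NAME_FOR_MODULE come from the package list in step 4, and any other mismatches you run into would need to be added by hand.

```python
import re

# Import names whose pip package name differs (assumed; extend as needed).
PIP_NAME_FOR_MODULE = {
    "ffmpeg": "ffmpeg-python",
    "pytorch_lightning": "pytorch-lightning",
}

_PATTERN = re.compile(r"No module named '([^']+)'")


def suggest_install(error_message: str):
    """Turn "No module named 'x.y'" into a pip install command, or None."""
    match = _PATTERN.search(error_message)
    if match is None:
        return None
    top_level = match.group(1).split(".")[0]  # 'x.y' -> 'x'
    package = PIP_NAME_FOR_MODULE.get(top_level, top_level)
    return f"pip install {package}"
```

For example, suggest_install("ModuleNotFoundError: No module named 'cn2an'") returns "pip install cn2an".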
You can also install from requirements.txt, but if that causes problems, just install the packages I listed above.
pip install -r requirements.txt # No need to run this
Download pretrained models from GPT-SoVITS Models and place them in GPT_SoVITS/pretrained_models.
For Chinese ASR (optional), download models from Damo ASR Model, Damo VAD Model, and Damo Punc Model and place them in tools/damo_asr/models.
For UVR5 (Vocals/Accompaniment Separation & Reverberation Removal, optional), download models from UVR5 Weights and place them in tools/uvr5/uvr5_weights.
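To double-check that the downloads landed in the right places, a sketch like this reports each model folder as ok, empty, or missing. The folder paths come from the download steps; marking the ASR and UVR5 folders as extras and treating hidden files (such as a placeholder .gitignore) as "empty" are my assumptions. Run it from the project root.

```python
from pathlib import Path

# Folders named in the download steps. True marks the base pretrained models;
# False marks the extras (Chinese ASR and UVR5).
MODEL_DIRS = {
    "GPT_SoVITS/pretrained_models": True,
    "tools/damo_asr/models": False,
    "tools/uvr5/uvr5_weights": False,
}


def check_model_dirs(project_root="."):
    """Return {folder: 'ok' | 'empty' | 'missing'} for each model folder."""
    root = Path(project_root)
    report = {}
    for rel in MODEL_DIRS:
        folder = root / rel
        if not folder.is_dir():
            report[rel] = "missing"
        # Ignore hidden files such as a placeholder .gitignore.
        elif not any(not p.name.startswith(".") for p in folder.iterdir()):
            report[rel] = "empty"
        else:
            report[rel] = "ok"
    return report


if __name__ == "__main__":
    for rel, status in check_model_dirs().items():
        kind = "base" if MODEL_DIRS[rel] else "extra"
        print(f"{rel} ({kind}): {status}")
```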
If you want a one-click package, see: https://github.com/RVC-Boss/GPT-SoVITS/issues/4
5. Start WebUI
python webui.py
6. Choose Models
The models go in these two folders: GPT_weights holds the GPT model and SoVITS_weights holds the SoVITS model. Put each file in the right folder:
├── GPT_weights
│   └── LeiJun-e15.ckpt
├── SoVITS_weights
│   └── LeiJun_e10_s470.pth
Then click 是否开启TTS推理WebUI (“Enable TTS inference WebUI”).
An error may be reported at this point. To use CPU inference, you need to modify GPT_SoVITS/inference_webui.py.
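If you have several trained files to sort into the GPT_weights and SoVITS_weights folders, a small helper can move them by file extension. This is a hypothetical sketch: it assumes, based only on the example layout (LeiJun-e15.ckpt and LeiJun_e10_s470.pth), that GPT checkpoints end in .ckpt and SoVITS weights end in .pth, and that both folders live in the project root.

```python
import shutil
from pathlib import Path

# Assumed mapping from the example layout: .ckpt -> GPT, .pth -> SoVITS.
TARGET_DIR_FOR_SUFFIX = {
    ".ckpt": "GPT_weights",
    ".pth": "SoVITS_weights",
}


def target_folder(filename: str):
    """Return the weights folder a model file belongs in, or None if unknown."""
    return TARGET_DIR_FOR_SUFFIX.get(Path(filename).suffix.lower())


def place_model(file_path, project_root="."):
    """Move a downloaded model file into GPT_weights or SoVITS_weights."""
    src = Path(file_path)
    folder = target_folder(src.name)
    if folder is None:
        raise ValueError(f"Don't know where {src.name} belongs")
    dest_dir = Path(project_root) / folder
    dest_dir.mkdir(parents=True, exist_ok=True)
    # shutil.move returns the destination path.
    return Path(shutil.move(str(src), str(dest_dir / src.name)))
```

For example, place_model(Path.home() / "Downloads" / "LeiJun-e15.ckpt") would move that file into GPT_weights under the current directory.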
7. Change inference_webui.py
- Change CUDA to CPU
- Change half precision to full precision: model.half() ---> model.float()
Modified file for reference: https://github.com/RoversX/GPT-SoVITS/blob/main/GPT_SoVITS/inference_webui.py
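If you prefer not to edit the file by hand, the two changes in step 7 can be approximated with a small script. This is a rough sketch, not how the linked file was produced: the regular expressions assume the device is written as a plain "cuda" string literal (a value like "cuda:0" is not matched) and that half precision is enabled through .half() calls. A .bak copy is kept, so review the result before running the WebUI.

```python
import re
import shutil
from pathlib import Path

# Assumed spellings of the two manual edits from step 7.
REPLACEMENTS = [
    (re.compile(r"""(["'])cuda\1"""), r"\1cpu\1"),  # "cuda" / 'cuda' -> "cpu"
    (re.compile(r"\.half\(\)"), ".float()"),        # half -> full precision
]


def patch_source(text: str):
    """Apply the replacements; return (new_text, number_of_changes)."""
    total = 0
    for pattern, repl in REPLACEMENTS:
        text, n = pattern.subn(repl, text)
        total += n
    return text, total


def patch_file(path="GPT_SoVITS/inference_webui.py"):
    """Patch the file in place after saving a .bak copy next to it."""
    target = Path(path)
    shutil.copy2(target, target.with_name(target.name + ".bak"))
    new_text, changes = patch_source(target.read_text(encoding="utf-8"))
    target.write_text(new_text, encoding="utf-8")
    return changes
```

Run print(patch_file()) from the project root; if it reports 0 changes, the file spells these differently and needs the manual edits above.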
Save the changes and run the WebUI again:
python webui.py
Thanks for reading. If you have any questions or know a better way to do any of these steps, please point it out.