GPT-SoVITS for local inference on Intel or Apple Silicon Mac

Jan 21, 2024


The GitHub repository “GPT-SoVITS” is a project focused on voice data processing and text-to-speech (TTS). It highlights the ability to train a good TTS model from as little as one minute of voice data, an approach known as “few-shot voice cloning.” The project is released under the MIT license and is written primarily in Python.

This tutorial explains how to run this project on a Mac using the CPU.

  • Don’t expect to train on a Mac yet; it’s good enough that preprocessing and inference work. Running an LLM might be possible, but if anyone has successfully trained on a Mac (with MPS), please let me know.
  • This tutorial mainly covers inference, after the model has been trained and downloaded to the local machine. I have tested it, and it all works.
  • Training-related information can be found in the reference videos above, which are very detailed. The dataset is the key, and training takes patience.

MPS Not Supported
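Before settling on the CPU, you can check what PyTorch itself reports on your machine. The sketch below is a hypothetical helper (report_backends is not part of GPT-SoVITS); it only calls torch.cuda.is_available() and torch.backends.mps.is_available(), and returns None if PyTorch is not installed yet.

```python
import importlib.util


def report_backends():
    """Return {"cuda": bool, "mps": bool}, or None when PyTorch is not installed."""
    if importlib.util.find_spec("torch") is None:
        return None
    import torch

    # torch.backends.mps does not exist in very old PyTorch builds.
    mps = getattr(torch.backends, "mps", None)
    return {
        "cuda": torch.cuda.is_available(),
        "mps": bool(mps is not None and mps.is_available()),
    }


if __name__ == "__main__":
    print(report_backends())
```

On a Mac, "cuda" will be False. "mps" may be True on Apple Silicon, but as noted above MPS is not supported for this project here, so the tutorial sticks to the CPU.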

Project link:

This tutorial is for communication and learning purposes only. Please do not use it for illegal, immoral, or unethical purposes.

Please ensure that you address any authorization issues related to the dataset on your own. You bear full responsibility for any problems arising from the usage of non-authorized datasets for training, as well as any resulting consequences. The repository and its maintainer, svc develop team, disclaim any association with or liability for the consequences.

It is strictly forbidden to use it for any political-related purposes.

Software requirements:

  1. Homebrew
  2. VS Code (optional)
  3. Python 3.9
brew install python@3.9
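Homebrew's python@3.9 formula provides a python3.9 command. To confirm that a script is actually running under 3.9 (for example, inside the venv created next), a minimal check could look like this. Requiring exactly 3.9 is an assumption based on this tutorial rather than a documented project constraint, and matches_tutorial_python is a hypothetical name.

```python
import sys

TUTORIAL_PYTHON = (3, 9)  # the version used throughout this tutorial


def matches_tutorial_python(version_info=sys.version_info) -> bool:
    """True if the (major, minor) version equals the one this tutorial uses."""
    return tuple(version_info[:2]) == TUTORIAL_PYTHON


if __name__ == "__main__":
    current = ".".join(str(part) for part in sys.version_info[:3])
    status = "OK" if matches_tutorial_python() else "Warning: this tutorial uses 3.9"
    print(f"Python {current}: {status}")
```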

Local Inference

1. Create venv

Create a virtual environment

python3.9 -m venv myenv  # 'myenv' can be any name you like
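The same environment can also be created from Python's standard-library venv module. This is a minimal sketch; create_env is a hypothetical helper name, with_pip=True mirrors the default of the command above, and the symlinks setting mirrors what the venv CLI does on macOS.

```python
import os
import venv
from pathlib import Path


def create_env(path, with_pip: bool = True) -> Path:
    """Create a virtual environment at `path`, roughly like `python3.9 -m venv path`."""
    env_dir = Path(path)
    # The venv CLI symlinks the interpreter on macOS/Linux and copies it on Windows.
    venv.create(env_dir, with_pip=with_pip, symlinks=(os.name != "nt"))
    return env_dir
```

A venv always uses the interpreter that created it, so call create_env("myenv") from a python3.9 session to get the same folder as the command above.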

2. Enter venv

cd myenv
source bin/activate
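After source bin/activate, python and pip should resolve to the environment. A small check that works with the standard venv module follows; in_virtualenv is a hypothetical helper name, and the comparison it uses is the one the Python venv documentation describes.

```python
import sys


def in_virtualenv() -> bool:
    """True when the running interpreter belongs to a venv."""
    # Inside a venv, sys.prefix points at the environment folder,
    # while sys.base_prefix still points at the Python it was created from.
    return sys.prefix != sys.base_prefix


if __name__ == "__main__":
    print("venv active:", in_virtualenv())
    print("interpreter prefix:", sys.prefix)
```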

3. Download Project

git clone

cd into the project directory

4. Download Package

brew install ffmpeg
pip install torch numpy scipy tensorboard librosa==0.9.2 numba==0.56.4 pytorch-lightning gradio==3.14.0 ffmpeg-python onnxruntime tqdm cn2an pypinyin pyopenjtalk g2p_en chardet
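To confirm that the pinned versions in the command above actually landed, and that Homebrew's ffmpeg binary is on your PATH, something like the following works. check_pins and ffmpeg_on_path are hypothetical helper names; only the three pinned packages from the pip command are checked.

```python
import shutil
from importlib.metadata import PackageNotFoundError, version

# Versions pinned in the pip command above.
PINNED = {"librosa": "0.9.2", "numba": "0.56.4", "gradio": "3.14.0"}


def check_pins(pins=PINNED):
    """Return a list of human-readable mismatches (empty if everything matches)."""
    problems = []
    for dist, wanted in pins.items():
        try:
            got = version(dist)
        except PackageNotFoundError:
            problems.append(f"{dist}: not installed (want {wanted})")
            continue
        if got != wanted:
            problems.append(f"{dist}: {got} installed, want {wanted}")
    return problems


def ffmpeg_on_path() -> bool:
    # `brew install ffmpeg` should put an `ffmpeg` executable on PATH.
    return shutil.which("ffmpeg") is not None


if __name__ == "__main__":
    for line in check_pins():
        print("Mismatch:", line)
    print("ffmpeg found:", ffmpeg_on_path())
```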

Additional Requirements

If you need Chinese ASR (supported by FunASR), install:

pip install modelscope torchaudio sentencepiece funasr

Note: if you see an error like No module named 'xxx', just install that package with pip.
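Rather than hitting No module named errors one at a time, you can scan for all of them up front. This sketch uses importlib.util.find_spec, which locates a module without importing it. The mapping from import names to pip package names is our assumption of the usual pairing (for example, ffmpeg-python is imported as ffmpeg), and missing_packages is a hypothetical name.

```python
import importlib.util

# Import name -> pip requirement, following the pip command above.
DEPENDENCIES = {
    "torch": "torch",
    "numpy": "numpy",
    "scipy": "scipy",
    "tensorboard": "tensorboard",
    "librosa": "librosa==0.9.2",
    "numba": "numba==0.56.4",
    "pytorch_lightning": "pytorch-lightning",
    "gradio": "gradio==3.14.0",
    "ffmpeg": "ffmpeg-python",
    "onnxruntime": "onnxruntime",
    "tqdm": "tqdm",
    "cn2an": "cn2an",
    "pypinyin": "pypinyin",
    "pyopenjtalk": "pyopenjtalk",
    "g2p_en": "g2p_en",
    "chardet": "chardet",
}


def missing_packages(deps=DEPENDENCIES):
    """Return the pip requirements whose top-level module cannot be found."""
    return [req for module, req in deps.items() if importlib.util.find_spec(module) is None]


if __name__ == "__main__":
    missing = missing_packages()
    if missing:
        print("pip install " + " ".join(missing))
    else:
        print("All listed modules are importable.")
```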

You can also install from requirements.txt, but if that causes problems, just install the packages listed above instead.

pip install -r requirements.txt  # optional; not needed if the commands above succeeded

Download pretrained models from GPT-SoVITS Models and place them in GPT_SoVITS/pretrained_models.

For Chinese ASR (additionally), download models from Damo ASR Model, Damo VAD Model, and Damo Punc Model and place them in tools/damo_asr/models.

For UVR5 (Vocals/Accompaniment Separation & Reverberation Removal, additionally), download models from UVR5 Weights and place them in tools/uvr5/uvr5_weights.
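A quick way to confirm the models landed where this section says. The folder paths come from the steps above; check_model_dirs is a hypothetical helper, and it assumes you run it from the project root. It only checks that each folder holds something other than hidden files, not that the right files are there.

```python
from pathlib import Path

# Folders named in the steps above, relative to the project root.
MODEL_DIRS = {
    "pretrained models": "GPT_SoVITS/pretrained_models",
    "Chinese ASR (optional)": "tools/damo_asr/models",
    "UVR5 (optional)": "tools/uvr5/uvr5_weights",
}


def check_model_dirs(root=".", dirs=MODEL_DIRS):
    """Map each label to True if its folder exists and holds a non-hidden entry."""
    status = {}
    for label, rel in dirs.items():
        path = Path(root) / rel
        # Hidden files (e.g. a .gitignore placeholder) do not count as models.
        status[label] = path.is_dir() and any(
            not entry.name.startswith(".") for entry in path.iterdir()
        )
    return status


if __name__ == "__main__":
    for label, ok in check_model_dirs().items():
        print(f"{label}: {'found' if ok else 'missing or empty'}")
```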

If you want a one-click package:

5. Start WebUI


6. Choose Models

The models go in two folders: one holds the GPT model and the other the SoVITS model. Make sure each file goes in the right folder.

Click 是否开启TTS推理WebUI ("Enable the TTS inference WebUI?").

An error may be reported at this point. You need to modify GPT_SoVITS/ to use CPU inference.

├── GPT_weights
│   └── LeiJun-e15.ckpt
├── SoVITS_weights
│   └── LeiJun_e10_s470.pth
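To double-check the layout above, the sketch below lists the weight files in each folder. It assumes both folders sit in the project root, as in the tree, and that GPT weights end in .ckpt and SoVITS weights in .pth, matching the example files; list_weights is a hypothetical name.

```python
from pathlib import Path


def list_weights(root="."):
    """Return the GPT (.ckpt) and SoVITS (.pth) weight files found under `root`."""
    root = Path(root)

    def names(folder, pattern):
        d = root / folder
        # A missing folder simply yields an empty list.
        return sorted(p.name for p in d.glob(pattern)) if d.is_dir() else []

    return {
        "GPT": names("GPT_weights", "*.ckpt"),
        "SoVITS": names("SoVITS_weights", "*.pth"),
    }


if __name__ == "__main__":
    for kind, files in list_weights().items():
        print(kind, files or "none found")
```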

7. Modify the code for CPU inference

  1. Change the device from CUDA to CPU.
  2. Change half precision to full precision: model.half() → model.float()
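The two changes follow a common PyTorch pattern: use CUDA with half precision (fp16) only when a GPU is present, and otherwise fall back to the CPU with full precision (fp32), since in many PyTorch versions fp16 operations are unsupported or slow on the CPU. The sketch below illustrates that pattern generically; it is not the GPT-SoVITS source, and choose_device_and_precision, prepare, and the toy torch.nn.Linear model are hypothetical stand-ins.

```python
import importlib.util


def choose_device_and_precision(cuda_available: bool):
    """Return (device, is_half): CUDA with fp16 if available, else CPU with fp32."""
    if cuda_available:
        return "cuda", True
    return "cpu", False


def prepare(model, device: str, is_half: bool):
    # fp16 is the GPU path; on the CPU keep full precision (the model.float() edit).
    model = model.half() if is_half else model.float()
    return model.to(device)


if __name__ == "__main__" and importlib.util.find_spec("torch"):
    import torch

    device, is_half = choose_device_and_precision(torch.cuda.is_available())
    net = prepare(torch.nn.Linear(4, 2), device, is_half)
    dtype = torch.float16 if is_half else torch.float32
    x = torch.randn(1, 4, device=device, dtype=dtype)
    print(device, net(x).dtype)
```

Hard-coding "cpu" and .float(), as step 7 describes, is the simplest edit; deriving both from torch.cuda.is_available() lets the same file keep working on a CUDA machine.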

Modified file:

Save the changes and re-run the WebUI.


Thanks for reading. If you have questions or know a better method than the ones in this tutorial, please point them out.