How to use PaddleOCR and PP-OCRv5 to fine-tune a text recognition model.
Tim Chan • Published 2025-07-27 • ~5 min read
This blog post walks through, step by step, how I fine-tuned the PP-OCRv5 text recognition model used in the steakcam project.
As someone who does not use PaddleOCR often, let alone spend time fine-tuning OCR models, I found some of the available documentation hard to follow and missing context.
PaddleOCR is a multilingual OCR and document parsing toolkit based on PaddlePaddle (per the repo description). With the newest PP-OCRv5 model it handles English, Chinese, and Japanese text well.
However, there are situations where text recognition is not as accurate as desired. In those cases, training (a.k.a. fine-tuning) the model on a known dataset can be a good way to improve accuracy.
Ensure the following has been set up:
nvidia-smi
+-----------------------------------------------------------------------------------------+
| NVIDIA-SMI 576.88 Driver Version: 576.88 CUDA Version: 12.9 |
+-----------------------------------------------------------------------------------------+
The first step is to install paddlepaddle. Follow the steps from PaddleOCR's installation guide to install the right version. Given the nvidia-smi output showing CUDA Version: 12.9, the command to install paddlepaddle is:
pip install paddlepaddle-gpu==3.1.0 -i https://www.paddlepaddle.org.cn/packages/stable/cu129/
The next step is to clone the PaddleOCR repository and install its dependencies. The repository contains helper scripts and tools for fine-tuning a PP-OCRv5 model.
The path used in the instructions below is /home/timc/ocr. Adjust the path for your own system.
cd /home/timc/ocr
git clone https://github.com/PaddlePaddle/PaddleOCR.git
cd PaddleOCR
pip install -r requirements.txt
Gather your training data, whether this is your own data or from an existing dataset. The steps below focus on bringing your own dataset, whether it is synthetic training data or real-life examples of the text you want to recognise.
PaddleOCR expects the dataset to be in a specific format of filename{tabspace}label, which may differ from other training datasets. The utility script ppocr/utils/gen_label.py in the PaddleOCR repository will convert a CSV into the expected format.
Move the unlabelled (or labelled) data folder into the PaddleOCR directory.
cd /path/to/the/dataset
ls
data/
└── images/
├── randomfournumbers1.jpg
├── randomfournumbers2.jpg
└── randomfournumbers3.jpg
cp -r /path/to/the/dataset/* /home/timc/ocr/PaddleOCR/
Create data/labels.csv and annotate the data in the format filename,label. Do not add a header row to the CSV file.
randomfournumbers1.jpg,1234
randomfournumbers2.jpg,5678
randomfournumbers3.jpg,9012
Run the following command to generate the labels.txt file that PaddleOCR expects. Only the first two columns of the CSV are used.
cd /home/timc/ocr/PaddleOCR
python ppocr/utils/gen_label.py --mode="rec" --input_path=data/labels.csv --output_label=data/labels.txt
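For reference, the conversion the helper script performs in rec mode can be sketched in plain Python. This is a simplified stand-in, assuming the output format is simply the filename and label joined by a tab; the function name is illustrative.

```python
import csv

def csv_to_rec_labels(csv_path, txt_path):
    """Convert a headerless "filename,label" CSV into the
    tab-separated labels file PaddleOCR expects."""
    with open(csv_path, newline="") as src, open(txt_path, "w") as dst:
        for row in csv.reader(src):
            # Only the first two columns (filename, label) are used.
            dst.write(f"{row[0]}\t{row[1]}\n")
```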
Split the data into training (train.txt) and validation (val.txt) sets in whichever ratio you see fit.
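One simple way to do the split is a small Python snippet like the one below. The 90/10 ratio and the fixed seed are arbitrary choices, not anything PaddleOCR requires; write the two returned lists out to data/train.txt and data/val.txt.

```python
import random

def split_labels(lines, val_ratio=0.1, seed=0):
    """Shuffle labels.txt lines and split into (train, val) lists."""
    lines = list(lines)
    random.Random(seed).shuffle(lines)  # fixed seed for reproducibility
    n_val = max(1, int(len(lines) * val_ratio))
    return lines[n_val:], lines[:n_val]
```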
This is the character set the model will use while fine-tuning. Create data/dict.txt and add the characters to recognise, one per line.
1
2
3
...
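Rather than typing the dictionary out by hand, it can be derived from the labels themselves. A minimal sketch, assuming labels.txt lines are in the filename{tab}label format generated above:

```python
def build_dict(label_lines):
    """Collect the unique characters appearing in the labels,
    sorted, one entry per line of dict.txt."""
    chars = set()
    for line in label_lines:
        label = line.split("\t", 1)[1]  # everything after the first tab
        chars.update(label)
    return sorted(chars)
```

Write the returned list to data/dict.txt, one character per line.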
At this point, the training data has been annotated, the labels have been generated, and the dictionary has been created. The next step is to fine-tune the model.
Now that the training data has been prepared, fine-tuning can begin. Check to see the file structure looks somewhat like the below before continuing.
ls /home/timc/ocr/PaddleOCR
PaddleOCR/
├── applications/
├── configs/
├── data/
│ ├── dict.txt
│ ├── images/
│ │ ├── randomfournumbers1.jpg
│ │ ├── randomfournumbers2.jpg
│ │ └── randomfournumbers3.jpg
│ ├── labels.csv
│ ├── labels.txt
│ ├── train.txt
│ └── val.txt
├── docs/
├── ppocr/
└── tests/
PaddleOCR uses a config file to define the training parameters. Copy the existing PP-OCRv5 text recognition configuration from the configs/ folder as the base for the config.
cd /home/timc/ocr/PaddleOCR
cp configs/rec/PP-OCRv5/PP-OCRv5_server_rec.yml data/config.yml
Download the pre-trained text recognition model from PaddleOCR and place it in a pretrained_models directory.
mkdir pretrained_models
wget https://paddle-model-ecology.bj.bcebos.com/paddlex/official_pretrained_model/PP-OCRv5_server_rec_pretrained.pdparams -O pretrained_models/PP-OCRv5_server_rec_pretrained.pdparams
Modify the base config for fine-tuning. Refer to the PaddleOCR v2 documentation to understand some of the more important configuration options; field-level documentation for v3 doesn't seem to exist anywhere.
The following is an example of which fields to modify in the config to match the training data. Don't take the values as gospel; adjust them to whatever makes sense for your use case.
Global:
  use_gpu: true # false if using CPU
  epoch_num: 500 # number of epochs to train for
  save_model_dir: ./output/PP-OCRv5_server_rec # where to save the fine-tuned model
  save_epoch_step: 50 # save the model every 50 epochs
  eval_batch_step: [0, 500] # evaluate every 500 iterations after the 0th iteration
  pretrained_model: ./pretrained_models/PP-OCRv5_server_rec_pretrained.pdparams # path to the pretrained model; can also be a URL
  character_dict_path: ./data/dict.txt # the dictionary created earlier
  max_text_length: &max_text_length 4 # maximum text length of the labelled data

Train:
  dataset:
    data_dir: ./data/ # training dataset path
    transforms:
      - RecAug: # remove this line to stop random augmentation
    label_file_list:
      - ./data/train.txt
  sampler:
    first_bs: &bs 128 # batch size

Eval:
  dataset:
    data_dir: ./data/ # validation dataset path
    label_file_list:
      - ./data/val.txt
  loader:
    batch_size_per_card: 128 # batch size
The fine-tuning process can be started by running the training script. This can take a while depending on the size of the dataset, the configuration, and the hardware. The script provides an estimated time for completion.
cd /home/timc/ocr/PaddleOCR
python tools/train.py -c data/config.yml
Once training is complete, export the fine-tuned model into a usable format. This will create the following files which can then be used for text recognition.
python tools/export_model.py -c output/PP-OCRv5_server_rec/config.yml -o Global.pretrained_model=output/PP-OCRv5_server_rec/best_accuracy.pdparams Global.save_inference_dir="output/trained_model/"
ls output/trained_model/
output/trained_model/
├── inference.json
├── inference.pdiparams
└── inference.yaml
Now that the fine-tuned model has been exported, it can be used for text recognition. The model is used the same way as any other model in PaddleOCR.
A sample of a script in Python that uses the fine-tuned model for text recognition is below:
from paddleocr import TextRecognition
ocr = TextRecognition("/path/to/the/trained/model")
result = ocr.predict("/path/to/the/image")
print(result)
For a sample implementation, refer to the steakcam repository.