How to use PaddleOCR and PP-OCRv5 to fine-tune a text recognition model.
Tim Chan • Published 2025-07-27 • ~5 min read
This blog post walks through, step by step, how I fine-tuned the PP-OCRv5 text recognition model used in the steakcam project.
As someone who does not use PaddleOCR often, let alone spend time fine-tuning OCR models, I found some of the available documentation hard to follow and missing context.
PaddleOCR is a multilingual OCR and document parsing toolkit based on PaddlePaddle (per the repo description). With the newest PP-OCRv5 model it handles English, Chinese, and Japanese text well.
However, there are situations where text recognition is not as accurate as desired. In those cases, training (a.k.a. fine-tuning) the model on a known dataset can be a good way to improve accuracy.
Ensure the following has been set up:
nvidia-smi
+-----------------------------------------------------------------------------------------+
| NVIDIA-SMI 576.88 Driver Version: 576.88 CUDA Version: 12.9 |
+-----------------------------------------------------------------------------------------+
The first step is to install paddlepaddle. Follow the steps from PaddleOCR's installation guide to install the right version. Given the nvidia-smi output showing CUDA Version: 12.9, the command to install paddlepaddle is:
pip install paddlepaddle-gpu==3.1.0 -i https://www.paddlepaddle.org.cn/packages/stable/cu129/
The next step is to clone the PaddleOCR repository and install its dependencies. The repository contains helper scripts and tools for fine-tuning a PP-OCRv5 model.
The path used in the instructions below is /home/timc/ocr. Adjust the path for your own system.
cd /home/timc/ocr
git clone https://github.com/PaddlePaddle/PaddleOCR.git
cd PaddleOCR
pip install -r requirements.txt
Gather your training data, whether this is your own data or from an existing dataset. The steps below focus on bringing your own dataset, whether it is synthetic training data or real-life examples of the text you want to recognise.
PaddleOCR expects the dataset to be in a specific format of filename{tabspace}label, which may differ from other training datasets. The utility script ppocr/utils/gen_label.py in the PaddleOCR repository will convert a CSV into the expected format.
Move the unlabelled (or labelled) data folder into the PaddleOCR directory.
cd /path/to/the/dataset
ls
data/
└── images/
├── randomfournumbers1.jpg
├── randomfournumbers2.jpg
└── randomfournumbers3.jpg
cp -r /path/to/the/dataset/* /home/timc/ocr/PaddleOCR/
Create data/labels.csv and annotate the data in the format filename,label. Do not add a header row to the CSV file.
randomfournumbers1.jpg,1234
randomfournumbers2.jpg,5678
randomfournumbers3.jpg,9012
Run the following command to generate the labels.txt file that PaddleOCR expects. Only the first two columns of the CSV are used.
cd /home/timc/ocr/PaddleOCR
python ppocr/utils/gen_label.py --mode="rec" --input_path=data/labels.csv --output_label=data/labels.txt
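For reference, the conversion the helper script performs in rec mode can be sketched in plain Python. This is a simplified stand-in, assuming the output format is simply the filename and label joined by a tab; the function name is illustrative.

```python
import csv

def csv_to_rec_labels(csv_path, txt_path):
    """Convert a headerless "filename,label" CSV into the
    tab-separated labels file PaddleOCR expects."""
    with open(csv_path, newline="") as src, open(txt_path, "w") as dst:
        for row in csv.reader(src):
            # Only the first two columns (filename, label) are used.
            dst.write(f"{row[0]}\t{row[1]}\n")
```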
Split the data into training (train.txt) and validation (val.txt) sets in whichever ratio you see fit.
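One simple way to do the split is a small Python snippet like the one below. The 90/10 ratio and the fixed seed are arbitrary choices, not anything PaddleOCR requires; write the two returned lists out to data/train.txt and data/val.txt.

```python
import random

def split_labels(lines, val_ratio=0.1, seed=0):
    """Shuffle labels.txt lines and split into (train, val) lists."""
    lines = list(lines)
    random.Random(seed).shuffle(lines)  # fixed seed for reproducibility
    n_val = max(1, int(len(lines) * val_ratio))
    return lines[n_val:], lines[:n_val]
```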
This is the character set the model will use while fine-tuning. Create data/dict.txt and add the characters to recognise, one per line.
1
2
3
...
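Rather than typing the dictionary out by hand, it can be derived from the labels themselves. A minimal sketch, assuming labels.txt lines are in the filename{tab}label format generated above:

```python
def build_dict(label_lines):
    """Collect the unique characters appearing in the labels,
    sorted, one entry per line of dict.txt."""
    chars = set()
    for line in label_lines:
        label = line.split("\t", 1)[1]  # everything after the first tab
        chars.update(label)
    return sorted(chars)
```

Write the returned list to data/dict.txt, one character per line.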
At this point, the training data has been annotated, the labels have been generated, and the dictionary has been created. The next step is to fine-tune the model.
Now that the training data has been prepared, fine-tuning can begin. Check to see the file structure looks somewhat like the below before continuing.
ls /home/timc/ocr/PaddleOCR
PaddleOCR/
├── applications/
├── configs/
├── data/
│ ├── dict.txt
│ ├── images/
│ │ ├── randomfournumbers1.jpg
│ │ ├── randomfournumbers2.jpg
│ │ └── randomfournumbers3.jpg
│ ├── labels.csv
│ ├── labels.txt
│ ├── train.txt
│ └── val.txt
├── docs/
├── ppocr/
└── tests/
PaddleOCR uses a config file to define the training parameters. Copy the existing PP-OCRv5 text recognition configuration from the configs/ folder as the base for the config.
cd /home/timc/ocr/PaddleOCR
cp configs/rec/PP-OCRv5/PP-OCRv5_server_rec.yml data/config.yml
Download the pre-trained text recognition model from PaddleOCR and place it in a pretrained_models directory.
mkdir pretrained_models
wget https://paddle-model-ecology.bj.bcebos.com/paddlex/official_pretrained_model/PP-OCRv5_server_rec_pretrained.pdparams -O pretrained_models/PP-OCRv5_server_rec_pretrained.pdparams
Modify the base config for fine-tuning. Refer to the PaddleOCR v2 documentation to understand some of the more important configuration options; field-level documentation for v3 doesn't seem to exist anywhere.
The following is an example of which fields to modify in the config to match the training data. Don't take the values as gospel; adjust them to whatever makes sense for your use case.
Global:
  use_gpu: true # false if using CPU
  epoch_num: 500 # number of epochs to train for
  save_model_dir: ./output/PP-OCRv5_server_rec # where to save the fine-tuned model
  save_epoch_step: 50 # save the model every 50 epochs
  eval_batch_step: [0, 500] # evaluate every 500 iterations after the 0th iteration
  pretrained_model: ./pretrained_models/PP-OCRv5_server_rec_pretrained.pdparams # path to the pretrained model; can also be a URL
  character_dict_path: ./data/dict.txt # the dictionary created earlier
  max_text_length: &max_text_length 4 # maximum text length of the labelled data

Train:
  dataset:
    data_dir: ./data/ # training dataset path
    transforms:
      - RecAug: # remove this line to stop random augmentation
    label_file_list:
      - ./data/train.txt
  sampler:
    first_bs: &bs 128 # batch size

Eval:
  dataset:
    data_dir: ./data/ # validation dataset path
    label_file_list:
      - ./data/val.txt
  loader:
    batch_size_per_card: 128 # batch size
The fine-tuning process can be started by running the training script. This can take a while depending on the size of the dataset, the configuration, and the hardware. The script provides an estimated time for completion.
cd /home/timc/ocr/PaddleOCR
python tools/train.py -c data/config.yml
Once training is complete, export the fine-tuned model into a usable format. This will create the following files which can then be used for text recognition.
python tools/export_model.py -c output/PP-OCRv5_server_rec/config.yml -o Global.pretrained_model=output/PP-OCRv5_server_rec/best_accuracy.pdparams Global.save_inference_dir="output/trained_model/"
ls output/trained_model/
output/trained_model/
├── inference.json
├── inference.pdiparams
└── inference.yaml
Now that the fine-tuned model has been exported, it can be used for text recognition. The model is used the same way as any other model in PaddleOCR.
A sample of a script in Python that uses the fine-tuned model for text recognition is below:
from paddleocr import TextRecognition
ocr = TextRecognition("/path/to/the/trained/model")
result = ocr.predict("/path/to/the/image")
print(result)
For a sample implementation, refer to the steakcam repository.