Case: a model trained on 1% of the finance data runs inference on the 10% test data
Test set / metric | from scratch (Aihub finance train) | transfer learned (Aihub finance train) |
IC2015 bbox hmean | 0.0000 | not needed |
1% bbox hmean | 0.9317 | not needed |
10% bbox hmean | 0.9303 | not needed |
Case: a model trained on 10% of the finance data runs inference on the 100% test data
Test set / metric | from scratch (Aihub finance train) | transfer learned (Aihub finance train) |
IC2015 bbox hmean | 0.0029 | 0.0000 |
10% bbox hmean | 0.8930 | 0.9625 |
100% bbox hmean | 0.8429 | 0.9415 |
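To make the two columns easier to compare, the following minimal sketch computes the absolute hmean difference (transfer learned minus from scratch) per test set. The numbers are copied from the table above; the dictionary layout and the `hmean_gain` helper are illustrative assumptions, not part of the original workflow.

```python
# hmean values copied from the table above (model trained on 10% of the data).
scores = {
    "IC2015": {"scratch": 0.0029, "transfer": 0.0000},
    "10%": {"scratch": 0.8930, "transfer": 0.9625},
    "100%": {"scratch": 0.8429, "transfer": 0.9415},
}


def hmean_gain(row):
    """Absolute hmean difference: transfer learned minus from scratch."""
    return round(row["transfer"] - row["scratch"], 4)


gains = {name: hmean_gain(row) for name, row in scores.items()}
for name, gain in gains.items():
    print(f"{name:>7} bbox hmean: {gain:+.4f}")
```

On the in-domain test sets the transfer-learned column is higher by roughly 0.07 (10%) and 0.10 (100%). Both models score essentially zero on IC2015.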
• 10% from scratch
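# dist_train.sh arguments are CONFIG and GPUS; the trailing 2 is the GPU count, not a file descriptor
nohup tools/dist_train.sh \
configs/textdet/dbnet/dbnet_resnet18_fpnc_20e_aihubfinance10of100.py \
2 > nohup.out &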
train
# copy the final-epoch checkpoint into pretrained/ under a run-specific name
cp work_dirs/dbnet_resnet18_fpnc_20e_aihubfinance10of100/epoch_20.pth \
pretrained/dbnet_resnet18_fpnc_20e_aihubfinance10of100_sparkling-cloud-104.pth
# dist_test.sh arguments are CONFIG, CHECKPOINT, and GPUS (here 2)
nohup tools/dist_test.sh \
configs/textdet/dbnet/dbnet_resnet18_fpnc_20e_aihubfinance10of100.py \
pretrained/dbnet_resnet18_fpnc_20e_aihubfinance10of100_sparkling-cloud-104.pth \
2 > nohup.out &
eval
11/28 19:12:01 - mmengine - INFO - Epoch(test) [3250/3250] AihubFinance10of100/icdar/precision: 0.9538 AihubFinance10of100/icdar/recall: 0.9713 AihubFinance10of100/icdar/hmean: 0.9625 AihubFinance100of100/icdar/precision: 0.9253 AihubFinance100of100/icdar/recall: 0.9582 AihubFinance100of100/icdar/hmean: 0.9415 IC15/icdar/precision: 0.0000 IC15/icdar/recall: 0.0000 IC15/icdar/hmean: 0.0000
eval res
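The hmean values in the log line above are the harmonic mean of precision and recall, hmean = 2PR / (P + R). As a sanity check, the sketch below parses the metric fields out of the log line and recomputes hmean from the logged precision and recall. The regular expression and the helper names are assumptions made for this illustration. The logged value is presumably computed from raw match counts, so tiny rounding differences are expected.

```python
import re

# Metric fields from the evaluation log line above (timestamp prefix omitted).
LOG_LINE = (
    "AihubFinance10of100/icdar/precision: 0.9538 "
    "AihubFinance10of100/icdar/recall: 0.9713 "
    "AihubFinance10of100/icdar/hmean: 0.9625 "
    "AihubFinance100of100/icdar/precision: 0.9253 "
    "AihubFinance100of100/icdar/recall: 0.9582 "
    "AihubFinance100of100/icdar/hmean: 0.9415 "
    "IC15/icdar/precision: 0.0000 IC15/icdar/recall: 0.0000 "
    "IC15/icdar/hmean: 0.0000"
)

# Matches "<dataset>/icdar/<metric>: <value>".
METRIC_RE = re.compile(r"(\w+)/icdar/(precision|recall|hmean): ([0-9.]+)")


def parse_metrics(line):
    """Return {dataset: {metric_name: value}} for every metric in the line."""
    metrics = {}
    for dataset, name, value in METRIC_RE.findall(line):
        metrics.setdefault(dataset, {})[name] = float(value)
    return metrics


def hmean(precision, recall):
    """Harmonic mean of precision and recall; 0.0 when both are zero."""
    if precision + recall == 0:
        return 0.0
    return 2 * precision * recall / (precision + recall)


metrics = parse_metrics(LOG_LINE)
for dataset, m in metrics.items():
    recomputed = hmean(m["precision"], m["recall"])
    print(f"{dataset}: logged={m['hmean']:.4f} recomputed={recomputed:.4f}")
```

The zero-denominator guard matters for the IC15 row, where precision and recall are both 0.0000.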
python3 -m mmocr.ocr \
--det-config configs/textdet/dbnet/dbnet_resnet18_fpnc_20e_aihubfinance10of100.py \
--det-ckpt pretrained/dbnet_resnet18_fpnc_20e_aihubfinance10of100_sparkling-cloud-104.pth \
data/det/aihub_finance/part_10of100/imgs/IMG_OCR_6_F_00964.png \
--img-out-dir work_dirs/dbnet_resnet18_fpnc_20e_aihubfinance10of100 \
--pred-out-file work_dirs/dbnet_resnet18_fpnc_20e_aihubfinance10of100/output.pkl \
--device cpu
io
python3 -m work_dirs_utils.pkl2json \
work_dirs/dbnet_resnet18_fpnc_20e_aihubfinance10of100/output.pkl \
work_dirs/dbnet_resnet18_fpnc_20e_aihubfinance10of100/output.json
jsonify
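The `work_dirs_utils.pkl2json` module used above is a local helper whose source is not shown in these notes. The following is only a guess at what such a converter might look like, assuming the pickle holds plain containers plus array-like objects that expose `tolist()`. The names `pkl_to_json` and `to_jsonable` are hypothetical.

```python
import json
import pickle
from pathlib import Path


def to_jsonable(obj):
    """Fallback for json.dump: convert objects that JSON cannot encode."""
    if hasattr(obj, "tolist"):  # e.g. array-like objects (assumption)
        return obj.tolist()
    if isinstance(obj, (set, frozenset)):
        return list(obj)
    raise TypeError(f"Cannot serialize {type(obj).__name__} to JSON")


def pkl_to_json(pkl_path, json_path):
    """Load a pickle file and write its content as UTF-8 JSON."""
    with Path(pkl_path).open("rb") as f:
        data = pickle.load(f)  # only unpickle files you produced yourself
    with Path(json_path).open("w", encoding="utf-8") as f:
        json.dump(data, f, ensure_ascii=False, indent=2, default=to_jsonable)
    return data
```

With this sketch, the conversion above would correspond to calling `pkl_to_json` with the `output.pkl` and `output.json` paths. `ensure_ascii=False` keeps any Korean text in the output readable.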
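Random sampling of 200 images
Item | Count | Note |
Population | 5,000 images | 5% of the full dataset |
Sample | 180 of 5,000 images | |
Documents with problems found | 36 of 180 images | about 20% of the inspected data |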
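The sampling and the resulting rate can be reproduced with a short sketch. The counts come from the table above. The fixed seed and the integer indices are assumptions standing in for the real image file names, since the notes do not say how the sample was drawn.

```python
import random

POPULATION_SIZE = 5_000  # images in the population (from the table)
SAMPLE_SIZE = 180        # images actually inspected (from the table)
PROBLEM_COUNT = 36       # inspected images with problems (from the table)


def draw_sample(population_size, sample_size, seed=0):
    """Draw distinct indices uniformly at random, without replacement."""
    rng = random.Random(seed)  # fixed seed so the draw is repeatable
    return sorted(rng.sample(range(population_size), sample_size))


def problem_rate(problem_count, inspected_count):
    """Fraction of inspected images in which a problem was found."""
    return problem_count / inspected_count


sample_ids = draw_sample(POPULATION_SIZE, SAMPLE_SIZE)
rate = problem_rate(PROBLEM_COUNT, SAMPLE_SIZE)
print(f"inspected {len(sample_ids)} of {POPULATION_SIZE} images")
print(f"problem rate: {rate:.1%}")
```

Sampling without replacement ensures no image is inspected twice, so the 36 of 180 ratio is a rate over distinct images.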