⛳

2_1_3_2_1.1____3.a1.1.1_2. [entry] title: Pyramid Mask Text Detector 추론 구현체를 분석하고 구현의 정도를 확인하기

생성

prev summary

🚀 prev note

♻️ prev note

2_1_3_2_1.1____3.a1.1.1_1. [info] title: Pyramid Mask Text Detector 공식 구현체를 실행할 수 있는 상태로 만들기

next summary

🚀 next note

♻️ next note

관련 임시노트

9 more properties

논하는 세부 주제들 개요 MAX 25 links

•

저자는 구현체를 공개했지만, 이것은 추론 구현체에 불과하다.

◦

2_1_3_2_1.1____3.a1.1.1_1. [info] title:
Pyramid Mask Text Detector 공식 구현체를 실행할 수 있는 상태로 만들기

•

Top-down approach

가장 높은 수준에서의 추론 파이프라인은 결국 아래와 같이 요약할 수 있다.

# apply pre-processing to image
image = self.transforms(original_image)
# convert to an ImageList, padded so that it is divisible by
# cfg.DATALOADER.SIZE_DIVISIBILITY
image_list = to_image_list(image, self.cfg.DATALOADER.SIZE_DIVISIBILITY)
image_list = image_list.to(self.device)
# compute predictions
with torch.no_grad():
    predictions = self.model(image_list)
predictions = [o.to(self.cpu_device) for o in predictions]
Python
복사

•

predictions 변수는 torchvision 의 자료형이 들어있는 리스트이다.

•

self.transforms, self.model 을 확인하면 될 것 같다.

self.transforms

self.model

class PMTDDemo(COCODemo):
    CATEGORIES = [
        "__background",
        "text"
    ]
		# ...


# Copyright (c) Facebook, Inc.
class COCODemo(object):
    # COCO categories for pretty print
    CATEGORIES = [
				# 어차피 PMTDDemo 에서 덮어씀
		]

    def __init__(self, cfg, masker, ... ):
				# ...
				self.model = build_detection_model(cfg)
				# ...
				if show_mask_heatmaps:
            self.masker = Masker(threshold=-1, padding=1)
        else:
            self.masker = masker

# Copyright (c) Facebook, Inc.
from .generalized_rcnn import GeneralizedRCNN
_DETECTION_META_ARCHITECTURES = {"GeneralizedRCNN": GeneralizedRCNN}
def build_detection_model(cfg):
    meta_arch = _DETECTION_META_ARCHITECTURES[cfg.MODEL.META_ARCHITECTURE]
    return meta_arch(cfg)
Python
복사

•

모델 설정과 관련된 정보를 담고 있는 cfg 와 masker 이외에는 신경쓸 필요가 없다. 왜냐하면  저자는  self.model 을 직접 만든 것이 아니라, 그냥 maskrcnn-benchmark의 구현체를 불러다가 사용하고 있기 때문이다. 아래는 저자가 모델을 만들기 위해 사용하는 코드이다. masker 변수에 maskrcnn-benchmark 구현체 내에 구현되어 있는 Masker 클래스를 사용하는 것이 아니라 직접 제작한 PlaneClustering 클래스를 사용하고 있다.

from demo.inference import PlaneClustering
from maskrcnn_benchmark.modeling.roi_heads.mask_head.inference import Masker
if args.method == 'PlaneClustering':
    masker = PlaneClustering()
else:
    masker = Masker(threshold=0.01, padding=1)

from maskrcnn_benchmark.config import cfg
cfg.merge_from_file("configs/e2e_PMTD_R_50_FPN_1x_ICDAR2017MLT_test.yaml")
cfg.merge_from_list([
    'MODEL.DEVICE', args.device,
    'MODEL.WEIGHT', args.model_path,
    'INPUT.MAX_SIZE_TEST', args.longer_size,
])

pmtd_demo = PMTDDemo(
	  cfg,
	  masker=masker,
	  confidence_threshold=0.5,
	  show_mask_heatmaps=False,
)
Python
복사

•

다음으로 탐색할 소스코드는 두 가지이다. 첫째, PlaneClustering 클래스를 알아보아야 한다. 둘째, masker 변수가 self.model 과 상호작용하는 방식을 확인하는 것이다.

PlaneClustering

•

PlaneClustering 은 Masker 클래스를 상속받았다. 따라서 5를 분석한 뒤 어느 부분부터 분석을 해야 하는지를 알아보도록 한다.

from maskrcnn_benchmark.modeling.roi_heads.mask_head.inference import Masker
class PlaneClustering(Masker):
    def __init__(self):
				...
		def forward_single_image(self, ...):
				...
		def reg_pyramid_in_image(self, ...):
				...
Python
복사

•

masker 변수는 __call__ 이 구현되어 있다. 이 함수에서는 배치 차원 각각에 대해서 forward_single_image 을 호출한다. PlaneClustering 클래스는 forward_single_image 함수만을 구현한다.

•

이 함수는 다음 코드를 포함하고 있다.

◦

high level view

# arg: mask: shape (28, 28), ranged [0, 1] 
if torch.max(mask) <= 0.7:
		raise ValueError('No apex')

# arg: mask[..., None]: (28, 28, 1), ranged [0, 1]
# arg: self.assist_info: (28, 28, 3), ranged [0, 27]
src_points = torch.cat([self.assist_info, mask[..., None]], dim=2)
pos_points = src_points[(mask > 0.1) & (mask < 0.8)]
# ret: src_points: (28, 28, 4), ranged [0, 27]
# ret: pos_points: (28x28개_픽셀_중_조건을_만족하는_픽셀의_수, 4)

planes = plane_init(pos_points, ideal_value=1)
planes = plane_clustering(pos_points, planes)
points = get_intersection_of_plane(planes)
# ret: planes: (3, 4) - 평면 4개 각각의 평면방정식 변수 3개
# ret: points: (4, 2) - 평면 4개의 바닥면에서 교점, 값의 범위 [0, 27] 을 보장할 수 없음.

if not is_clockwise(points):
    raise ValueError("Points is not clockwise")
points = torch.clamp(points, - 0.5 * MASK_SCALE, 1.5 * MASK_SCALE)
Python
복사

◦

plane clustering algorithm

def plane_clustering(pos_points, planes, iter_num=10):
    ones = torch.ones((1, 4), dtype=torch.float32)
    for iter in range(iter_num):
        ans = torch.abs(torch.matmul(pos_points, torch.cat([planes, ones])))
        partition = torch.argmin(ans, dim=1)
        point_groups = [pos_points[partition == i] for i in range(planes.shape[1])]
        for i, group in enumerate(point_groups):
            if len(group) == 0:
                continue
            X = group[:, :3]
            B = -group[:, 3:]
            A = torch.lstsq(B, X)[0][:3]
            abs_residuals = torch.abs(torch.matmul(X, A) - B)
            abs_residual_scale = torch.median(abs_residuals)
            if abs_residual_scale > 1e-4:
                X_weight = abs_residuals / (6.9460 * abs_residual_scale)
                X_weight[X_weight > 1] = 0
                X_weight[X_weight <= 1] = (1 - X_weight[X_weight <= 1] ** 2) ** 2
                X_weighted = X_weight * X
                X = torch.matmul(X_weighted.t(), X)
                B = torch.matmul(X_weighted.t(), B)
                A = torch.lstsq(B, X)[0]
            planes[:, i] = A.flatten()
    return planes
Python
복사

masker 변수와 self.model 의 관계

•

Masker 클래스가 무엇을 하는 역할인지를 mask-rcnn 논문, 코드와 연관지어 생각해 보아야 한다.

◦

bounding box 로부터 지정된 특정 위치에 해당하는 이미지의 영역에 mask set 을 프로젝션한다.

class Masker(object):
    """
    Projects a set of masks in an image on the locations
    specified by the bounding boxes
    """
Python
복사

◦

Mask R-CNN 에서 마스크 브랜치의 최종 output 텐서인 mask 의 크기는 굉장히 작다. 그리고 원본 영상과의 위치관계가 제거되었다고도 볼 수 있다.

▪

◦

따라서 mask 를 projection 한다는 것은 다음을 의미한다.

해당 mask 를 만든 영역 제안이 그리는 bounding box 와 이에 해당하는 RoI 가 원본 영상에서 차지하는 영역에 해당 mask 를 잘 펼쳐 바른다.

기존 개요가 마음에 들지 않아서 
엔트리를 다음과 같이 변경한다.