The GaussianDreamer pipeline, as understood up to ‣

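A text-to-3D diffusion model called Shap-E is used to generate a 3D point cloud. After light post-processing (noisy point growing and color perturbation in the paper), the point cloud is used as the initialization of a Gaussian Splatting model. The end goal of GaussianDreamer is to optimize that Gaussian Splatting model into the final 3D asset. Note that the result is an explicit representation (a set of 3D Gaussians), not an implicit one like NeRF.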
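To make the initialization step concrete, here is a minimal sketch of turning a colored point cloud into per-point Gaussian parameters. The function name `init_gaussians`, the dictionary layout, the nearest-neighbour scale heuristic, and the starting opacity of 0.1 are all assumptions for illustration, not GaussianDreamer's actual code.

```python
import math

def init_gaussians(points, colors, init_opacity=0.1, k=3):
    """Build one isotropic Gaussian per point (hypothetical helper).

    points: list of (x, y, z); colors: list of (r, g, b) in [0, 1].
    Each Gaussian's scale is the mean distance to its k nearest
    neighbours, a heuristic assumed here for illustration.
    """
    gaussians = []
    for i, p in enumerate(points):
        # Brute-force neighbour search; fine for a sketch, O(n^2) in general.
        dists = sorted(math.dist(p, q) for j, q in enumerate(points) if j != i)
        nearest = dists[:k]
        scale = sum(nearest) / len(nearest) if nearest else 1.0
        gaussians.append({
            "position": tuple(p),
            "scale": (scale, scale, scale),      # isotropic to start with
            "rotation": (1.0, 0.0, 0.0, 0.0),    # identity quaternion (w, x, y, z)
            "opacity": init_opacity,             # assumed starting value
            "color": tuple(colors[i]),
        })
    return gaussians

# Usage: four points standing in for a post-processed Shap-E point cloud.
points = [(0, 0, 0), (1, 0, 0), (0, 1, 0), (0, 0, 1)]
colors = [(1.0, 0.0, 0.0)] * 4
gaussians = init_gaussians(points, colors)
```

Position, covariance (scale and rotation), opacity, and color are then the quantities the optimization updates.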

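At a high level, Gaussian Splatting can be understood as a successor to NeRF. NeRF renders an image by sampling points along camera rays and is optimized by comparing the rendered image with a GT image. Gaussian Splatting uses 3D Gaussians instead of sampled points. The rendering procedure differs (the Gaussians are rasterized rather than ray-marched), but the optimization is the same in spirit: the rendered image is compared with a GT image.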
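What the two renderers share is the front-to-back alpha blending of contributions along a pixel's line of sight, C = Σ c_i α_i Π_{j<i} (1 − α_j). A minimal sketch for a single pixel, assuming each splat's alpha already includes the Gaussian's 2D falloff at that pixel (the function name and input layout are hypothetical):

```python
def composite_pixel(splats):
    """Front-to-back alpha blending of splats covering one pixel.

    splats: list of (depth, alpha, (r, g, b)).
    Returns the blended colour and the remaining transmittance.
    """
    color = [0.0, 0.0, 0.0]
    transmittance = 1.0  # fraction of light not yet absorbed
    for _, alpha, rgb in sorted(splats, key=lambda s: s[0]):  # nearest first
        weight = transmittance * alpha
        for c in range(3):
            color[c] += weight * rgb[c]
        transmittance *= 1.0 - alpha
    return tuple(color), transmittance

# Usage: a half-transparent red splat in front of a half-transparent blue one.
pixel, remaining = composite_pixel([
    (2.0, 0.5, (0.0, 0.0, 1.0)),
    (1.0, 0.5, (1.0, 0.0, 0.0)),
])
```

Because every operation here is differentiable, a loss on the rendered pixel can be pushed back to each Gaussian's color and opacity.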

As with NeRF, training a Gaussian Splatting model requires GT images. Here, however, the images are rendered from a generated 3D point cloud, so no GT images exist. A 2D diffusion model is used to get around this problem.

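In GaussianDreamer, the diffusion model is treated as a tool for optimizing the 3D representation: it denoises a noised copy of the rasterized 2D image, and the result guides the update. The losses that can be used for this are the SDS loss and the SJC loss. The papers that proposed SDS and SJC used NeRF-style radiance fields as the 3D representation. GaussianDreamer uses Gaussian Splatting instead of NeRF, which improves both speed and quality.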

How can what a diffusion model actually learns be interpreted semantically?

parse me : An area for keeping material that might be useful in this note someday.

DreamBooth fine-tunes the entire UNet, which is inefficient. See parameter-efficient fine-tuning (PEFT) and LoRA.

To enrich details and improve the quality of the 3D asset, we optimize the 3D Gaussians $\theta_b$ with a 2D diffusion model $F_{2D}$ after initializing them with 3D diffusion model priors. We employ the SDS (Score Distillation Sampling) loss to optimize the 3D Gaussians. First, we use the method of 3D Gaussian Splatting [24] to obtain the rendered image $x = g(\theta_i)$. Here, $g$ represents the splatting rendering method as in Eq. 3. Then, we use Eq. 1 to calculate the gradients for updating the Gaussian parameters $\theta_i$ with the 2D diffusion model $F_{2D}$. After a short optimization period using the 2D diffusion model $F_{2D}$, the final generated 3D instance $\theta_f$ achieves high quality and fidelity on top of the 3D consistency provided by the 3D diffusion model $F_{3D}$.

• Page 5 of the paper
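For reference, the SDS gradient that the excerpt calls Eq. 1 is usually written as below. This is the form from DreamFusion, where SDS was introduced; the symbols $\hat{\epsilon}_\phi$ (noise predicted by the 2D diffusion model), $y$ (text prompt), $w(t)$ (timestep weighting), and $\alpha_t, \sigma_t$ (noise schedule) follow that convention and may differ slightly from the notation in the GaussianDreamer paper.

$$
\nabla_\theta \mathcal{L}_{SDS} = \mathbb{E}_{t,\epsilon}\left[ w(t)\,\big(\hat{\epsilon}_\phi(x_t;\, y,\, t) - \epsilon\big)\,\frac{\partial x}{\partial \theta} \right],
\qquad x = g(\theta), \quad x_t = \alpha_t x + \sigma_t \epsilon, \quad \epsilon \sim \mathcal{N}(0, I)
$$

The gradient flows only through the renderer $g$; the diffusion model's weights stay fixed, and the predicted-minus-true noise acts as a per-pixel direction in which to change the rendered image.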

When feeding the camera-rendered output into the UNet...

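• camera position query → Gaussian Splatting model → output rasterized image → UNet

  ◦ How is the camera set up?

  ◦ How is the camera setup information passed to the UNet together with the text?

• How many times is the UNet run? Is it run for all T steps?

  ◦ Is it still run for all T steps even if a clean image comes out earlier?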
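On the "all T steps?" question: in the SDS formulation from DreamFusion, each optimization step draws one random timestep, adds noise to the rendered image at that level, and calls the noise predictor once; the full T-step sampling chain is not run. Whether GaussianDreamer's code does exactly this should be confirmed against its implementation. The toy sketch below only illustrates the control flow. `toy_denoiser` and `render` are stand-ins, not a real UNet or splatting renderer, and `T`, the timestep range, the cosine schedule, and `w = sigma ** 2` are assumed values.

```python
import math
import random

T = 1000  # number of diffusion timesteps (assumed)

def alpha_sigma(t):
    """Toy cosine noise schedule: x_t = alpha * x + sigma * eps."""
    angle = 0.5 * math.pi * t / T
    return math.cos(angle), math.sin(angle)

def toy_denoiser(x_t, t, target):
    """Stand-in for the text-conditioned UNet (NOT a real model).

    It is the exact noise predictor when the data distribution is the
    single image `target`, which plays the role of the text prompt.
    """
    alpha, sigma = alpha_sigma(t)
    return [(xi - alpha * ti) / sigma for xi, ti in zip(x_t, target)]

def render(theta):
    """Stand-in for splatting g(theta); identity, so dx/dtheta = I."""
    return list(theta)

def sds_step(theta, target, lr, rng):
    x = render(theta)
    t = rng.randint(20, 980)                  # ONE random timestep per step
    alpha, sigma = alpha_sigma(t)
    eps = [rng.gauss(0.0, 1.0) for _ in x]
    x_t = [alpha * xi + sigma * ei for xi, ei in zip(x, eps)]
    eps_hat = toy_denoiser(x_t, t, target)    # a single denoiser call
    w = sigma ** 2                            # weighting w(t), one common choice
    grad = [w * (eh - e) for eh, e in zip(eps_hat, eps)]
    return [th - lr * g for th, g in zip(theta, grad)]

# Usage: the "rendered image" is pulled toward what the denoiser prefers.
rng = random.Random(0)
theta = [0.0, 0.0, 0.0]
target = [0.2, 0.5, 0.9]
for _ in range(200):
    theta = sds_step(theta, target, lr=1.0, rng=rng)
```

In this toy setting, 200 optimization steps cost 200 denoiser calls in total, regardless of `T`.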

from : Links and explains which earlier atomic ideas gave rise to this idea.

None

• Reason for the link

supplementary : Links new ideas that support the idea written in this document.

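ab0. title: A diffusion denoising model works like someone who restores a LEGO castle that a younger sibling wrecked by rolling dice, and who then applies the abstract feature-combination rules learned along the way to a pile of LEGO bricks never seen before.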

opposite : Links new ideas that contrast with the idea written in this document.

None

to : An area for writing down which ideas the idea in this document develops into or leads to.

None

ref : Materials consulted for this idea.

what we need to optimize are 1) the position (p), 2) transparency (alpha), 3) the covariance matrix (sigma), and 4) the color information represented by SH coefficients. … With these straightforward equations and adjustments to coefficients, you can represent various colors, as shown below. The point is that you can represent a 2D image using equations rather than storing actual RGB values. … SH is essentially an attempt to apply this methodology to 3D.
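As a small illustration of "representing color with equations rather than stored RGB values", the sketch below evaluates a view-dependent color from degree-0 and degree-1 SH coefficients. The sign and ordering convention and the +0.5 shift follow what is commonly seen in 3DGS-style code, as far as I recall; treat them, the function name `sh_to_rgb`, and the clamp to [0, 1] as assumptions.

```python
SH_C0 = 0.28209479177387814  # constant of the degree-0 basis function
SH_C1 = 0.4886025119029199   # magnitude of the degree-1 basis functions

def sh_to_rgb(sh, direction):
    """Evaluate a degree-0/1 SH colour for a unit view direction.

    sh: four (r, g, b) coefficient triples [dc, c1, c2, c3].
    direction: unit vector (x, y, z) from the camera toward the Gaussian.
    """
    x, y, z = direction
    rgb = []
    for ch in range(3):
        value = SH_C0 * sh[0][ch]                 # view-independent base colour
        value += -SH_C1 * y * sh[1][ch]           # view-dependent terms
        value += SH_C1 * z * sh[2][ch]
        value += -SH_C1 * x * sh[3][ch]
        rgb.append(min(1.0, max(0.0, value + 0.5)))  # shift, then clamp (assumed)
    return tuple(rgb)

# Usage: the same coefficients give a different red channel from +z and -z.
coeffs = [(0.0, 0.0, 0.0), (0.0, 0.0, 0.0), (1.0, 0.0, 0.0), (0.0, 0.0, 0.0)]
front = sh_to_rgb(coeffs, (0.0, 0.0, 1.0))
back = sh_to_rgb(coeffs, (0.0, 0.0, -1.0))
```

With only the first (DC) coefficient non-zero, the color is the same from every direction; the higher-degree coefficients are what make it change with the viewpoint.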