
We obtain non-confusing textual embeddings of a concept by fine-tuning CLIP with a contrastive objective that contrasts the concept against the over-segmented visual regions of other concepts.
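A minimal sketch of what such a contrastive objective could look like for a single concept. The note only says that a concept is contrasted against the over-segmented regions of other concepts. Everything else here is an assumption for illustration: the InfoNCE-style form, the use of the concept's own regions as positives, cosine similarity, the temperature of 0.07, and the hypothetical names `cosine` and `concept_contrastive_loss`. Plain Python lists stand in for CLIP text and region embeddings.

```python
import math


def cosine(u, v):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    return dot / (norm_u * norm_v)


def concept_contrastive_loss(text_emb, own_regions, other_regions, temperature=0.07):
    """InfoNCE-style loss for one concept (assumed form, not taken from the source).

    text_emb      -- embedding of the concept's text prompt
    own_regions   -- embeddings of regions belonging to this concept (positives, assumed)
    other_regions -- embeddings of over-segmented regions of other concepts (negatives)
    """
    pos = [math.exp(cosine(text_emb, r) / temperature) for r in own_regions]
    neg = [math.exp(cosine(text_emb, r) / temperature) for r in other_regions]
    denom = sum(pos) + sum(neg)
    # Average negative log-probability of picking a positive region.
    return -sum(math.log(p / denom) for p in pos) / len(pos)


# Toy 2-D embeddings: the first axis stands for "this concept".
own = [[0.9, 0.1]]
other = [[0.0, 1.0], [0.1, 0.9]]

aligned_loss = concept_contrastive_loss([1.0, 0.0], own, other)
confused_loss = concept_contrastive_loss([0.0, 1.0], own, other)
```

In this toy setup the loss is lower when the text embedding points toward the concept's own regions than when it points toward the regions of other concepts, which is the direction fine-tuning would push the embedding.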

Source
Collected at
2024/06/04 08:55
Linked