/
Search
💬

That's the embedding of the initial CLS token. It's "pooled" from all input tokens in the sense that the multiple attention layers will force it to depend on all other tokens. The pooling here is not the usual fixed "avg-pool" or "max-pool" operation. … It's "pooling" in the sense that it's extracting a representation for the whole sequence.

출처
수집시간
2024/05/27 07:44
연결완료
1 more property