
In DDN, 1×1 convolutions are used only in the output layers of the Discrete Distribution Layer (DDL). The NN blocks between DDLs, which account for most of the computation and parameters, use standard 3×3 convolutions.


Was there a specific reason for this choice?


A 1×1 convolution is the most lightweight operator for transforming features into outputs: it only mixes channels at each spatial position.

A 3×3 convolution is the most common operator for providing the network's basic computational capacity.
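
To put rough numbers on the difference, here is a minimal sketch that counts the learnable parameters of a standard 2D convolution. The helper name `conv2d_params` and the channel sizes (64 feature channels, 3 output channels) are assumptions chosen for illustration, not values taken from DDN.

```python
def conv2d_params(in_channels: int, out_channels: int,
                  kernel_size: int, bias: bool = True) -> int:
    """Learnable parameters in a standard (non-grouped) 2D convolution."""
    weights = in_channels * out_channels * kernel_size * kernel_size
    return weights + (out_channels if bias else 0)


# Hypothetical sizes, for illustration only.
FEATURES = 64   # channels inside the network
OUTPUTS = 3     # channels of the produced output, e.g. an RGB image

head_1x1 = conv2d_params(FEATURES, OUTPUTS, kernel_size=1)    # output layer as 1x1
head_3x3 = conv2d_params(FEATURES, OUTPUTS, kernel_size=3)    # same layer as 3x3
block_3x3 = conv2d_params(FEATURES, FEATURES, kernel_size=3)  # one backbone conv

print(f"1x1 output layer: {head_1x1} params")
print(f"3x3 output layer: {head_3x3} params")
print(f"3x3 backbone conv: {block_3x3} params")
```

With these assumed sizes, the 1×1 output layer needs roughly a ninth of the weights of a 3×3 one, and either is small next to a single 3×3 convolution between feature maps.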



