Published in TDS Archive | Can Weight Decay Work Without Residual Connections? | How do residual connections secretly fight overfitting? | Feb 21, 2023
Published in TDS Archive | Speaking Probes: Self-Interpreting Models? | Can language models aid in their interpretation? | Jan 16, 2023
Published in TDS Archive | Analyzing Transformers in Embedding Space — Explained | In this post, I present the paper “Analyzing Transformers in Embedding Space” (2022) by Guy Dar, Mor Geva, Ankit Gupta, and Jonathan… | Sep 14, 2022