

HarHarVeryFunny 4 months ago | on: BERT is just a single text diffusion step

Thanks, that's a useful way to think about it. Presumably the internal state at any given token position must also encode information specific to that position, as well as this evolving/current memory. So, can this be seen in the internal embeddings? Are they composed of a position-dependent part that changes a lot between positions, and an evolving memory part that is largely similar between positions and only changes slowly? Are there any papers or talks discussing this?

sailingparrot 4 months ago

I don't remember any paper looking at this specific question (though it might be out there), but in general Anthropic's Transformer Circuits Thread series of articles is very good on the broader subject: https://transformer-circuits.pub
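For anyone who wants to poke at the question empirically, here is a rough sketch of one way to look for a "shared, slowly changing" part versus a "position-specific" part in a stack of per-position vectors. Everything here is an assumption made for illustration: the function names are made up, the data is synthetic toy data (a slowly drifting vector plus independent per-position noise), and subtracting the mean across positions is only one crude stand-in for "the memory part". It is not taken from any paper or from the articles linked above.

```python
import math
import random


def cosine(u, v):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    return dot / (norm_u * norm_v)


def adjacent_cosines(states):
    """Cosine similarity between each position's vector and the next one."""
    return [cosine(states[i], states[i + 1]) for i in range(len(states) - 1)]


def split_shared_and_residual(states):
    """Split states into a mean vector shared by all positions and per-position residuals."""
    n, dim = len(states), len(states[0])
    shared = [sum(s[d] for s in states) / n for d in range(dim)]
    residuals = [[s[d] - shared[d] for d in range(dim)] for s in states]
    return shared, residuals


def synthetic_states(n_positions=16, dim=64, drift=0.05, local_scale=0.3, seed=0):
    """Toy data only: a slowly drifting 'memory' vector plus independent per-position noise."""
    rng = random.Random(seed)
    memory = [rng.gauss(0, 1) for _ in range(dim)]
    states = []
    for _ in range(n_positions):
        # The shared part takes a small random step at every position.
        memory = [m + drift * rng.gauss(0, 1) for m in memory]
        # The position-specific part is drawn fresh at every position.
        local = [local_scale * rng.gauss(0, 1) for _ in range(dim)]
        states.append([m + l for m, l in zip(memory, local)])
    return states


if __name__ == "__main__":
    states = synthetic_states()
    _, residuals = split_shared_and_residual(states)
    raw = adjacent_cosines(states)
    res = adjacent_cosines(residuals)
    print("mean adjacent cosine, raw states:", sum(raw) / len(raw))
    print("mean adjacent cosine, residuals: ", sum(res) / len(res))
```

To try this on a real model, you would replace `synthetic_states` with the per-position hidden states from one layer. If neighbouring positions stay highly similar in the raw states but not once the shared mean is removed, that would be weak evidence for the kind of split asked about; it would not by itself show that the model actually uses the two parts separately.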
