The Illustrated Retrieval Transformer · jalammar/jalammar.github.io · Discussion #21
Thanks for the great illustrations! I've been trying to find out which key-value database could be used for effective vector similarity search. Do you happen to know? Or is it a custom-built thing?
Thanks, Annoy seems like it's what I'm looking for. I like that it can use static files as indexes.
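For anyone else looking into this, here is a minimal sketch of building and querying an Annoy index over embedding vectors. The dimension, file name, and random vectors are placeholders, not anything from the paper:

```python
# Minimal Annoy example: build a static index file, then query it for
# approximate nearest neighbors. All sizes and names are placeholders.
import numpy as np
from annoy import AnnoyIndex

dim = 768                               # embedding size of whatever encoder you use
index = AnnoyIndex(dim, "angular")      # angular distance ~ cosine similarity

# Add each database chunk's embedding under an integer id.
embeddings = np.random.rand(1000, dim).astype("float32")    # stand-in vectors
for i, vec in enumerate(embeddings):
    index.add_item(i, vec.tolist())

index.build(10)                         # 10 trees; more trees -> better recall
index.save("chunks.ann")                # the index lives in a static file

# Query time: load the file and fetch the ids of the nearest chunks.
index2 = AnnoyIndex(dim, "angular")
index2.load("chunks.ann")
query = np.random.rand(dim).astype("float32")
neighbor_ids = index2.get_nns_by_vector(query.tolist(), 2)  # top-2 neighbors
print(neighbor_ids)
```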
Thanks for the great article Jay!
Are the completions also given to the encoder block? Or are they only used to calculate the loss?
They're not. The encoders are only for the retrieved neighbors.
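To make the data flow concrete, here is a rough sketch of which inputs go where. This is not the paper's code: it collapses RETRO's chunked cross-attention into plain encoder-decoder attention, and all module names and sizes are toy values chosen for illustration.

```python
# Toy sketch of the RETRO-style data flow: the encoder stack only sees the
# retrieved neighbor chunks; the completion is only used as the loss target.
import torch
import torch.nn as nn
import torch.nn.functional as F

vocab, d_model = 100, 32                               # toy sizes
prompt = torch.randint(0, vocab, (1, 16))              # input tokens for the decoder
neighbors = torch.randint(0, vocab, (1, 2, 16))        # two retrieved neighbor chunks
completion = torch.randint(0, vocab, (1, 16))          # target tokens (loss only)

embed = nn.Embedding(vocab, d_model)
encoder = nn.TransformerEncoder(nn.TransformerEncoderLayer(d_model, 4, batch_first=True), 1)
decoder = nn.TransformerDecoder(nn.TransformerDecoderLayer(d_model, 4, batch_first=True), 1)
to_logits = nn.Linear(d_model, vocab)

# 1) The encoder stack runs on the retrieved neighbors only.
memory = encoder(embed(neighbors).flatten(1, 2))       # (1, 2*16, d_model)

# 2) The decoder runs on the prompt and cross-attends to the encoded neighbors.
hidden = decoder(embed(prompt), memory)
logits = to_logits(hidden)

# 3) The completion never enters the encoder; it is only the training target.
loss = F.cross_entropy(logits.transpose(1, 2), completion)
print(loss.item())
```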
I read the paper a couple of weeks ago, and came away with a much better impression of the approach after reading your description!
But I think you should make it clearer that your examples are idealistic. If you look at the examples given in the paper, the two nearest neighbours it found seemed to always be the same sentence, just with slightly different punctuation. The downside of using a huge corpus.
You show the output of the encoder stack as Keys and Values, in different colours, as if they are separate things. But the output is a single embedding, which is used as both the key and the value by cross-attention.
But I think you should make it clearer that your examples are idealistic. If you look at the examples given in the paper, the two nearest neighbours it found seemed to always be the same sentence, just with slightly different punctuation. The downside of using a huge corpus.
Oh for sure. The educational device I tend to use is to oversimplify at the beginning to establish the key concepts, then incrementally add detail. Even at the end of the article, there are more details to be learned from the paper (e.g. how the actual attention mechanism incorporates different chunks). But the hope is that readers feel confident and prepared enough to continue tackling the concepts in the paper if they need more detail.
You show the output of the encoder stack as Keys and Values, in different colours, as if they are separate things. But the output is a single embedding, which is used as both the key and the value by cross-attention.
Hmmm, interesting. That wasn't my read. I would appreciate it if you could point out where they detail this. Thank you!
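For readers following this exchange: in standard Transformer cross-attention, the keys and values are both computed from the same encoder output, just through separate projection matrices, which is what the comment above is describing. Here is a generic sketch with toy shapes (not RETRO's actual dimensions or code, and without the chunking that the paper adds on top):

```python
# Generic cross-attention: one encoder output tensor is projected into both
# keys and values; queries come from the decoder side. Toy shapes throughout.
import torch
import torch.nn as nn

d_model = 32
W_q = nn.Linear(d_model, d_model, bias=False)
W_k = nn.Linear(d_model, d_model, bias=False)
W_v = nn.Linear(d_model, d_model, bias=False)

decoder_states = torch.randn(1, 16, d_model)   # hypothetical decoder hidden states
encoder_output = torch.randn(1, 8, d_model)    # one embedding per neighbor token

q = W_q(decoder_states)
k = W_k(encoder_output)                        # keys: projection of the encoder output
v = W_v(encoder_output)                        # values: projection of the SAME tensor

attn = torch.softmax(q @ k.transpose(-2, -1) / d_model ** 0.5, dim=-1)
out = attn @ v
print(out.shape)                               # torch.Size([1, 16, 32])
```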
Could you please explain chunked cross-attention?
Hey @jalammar!
I am part of a Brazilian community of ML practitioners focused on bringing valuable content to Brazil, in Portuguese. It is a collaborative community where students and professionals can participate by discussing and creating content for aspiring practitioners who are not able to consume this knowledge in English.
I was wondering if I could translate your article and share it with our community. I'd definitely give you proper credit and make it very clear that it is a free translation.
Our community is called BRAINS - Brazilian AI Networks, and it can be found at: https://brains.dev/
I hope you don't mind if I translate it. It would be invaluable for our community.