It is a wonderful feeling to understand something about the world that no one else has ever understood - Linus Pauling

Old dog, new tricks: Word Injection in the text embedding models

In real life, we rarely experience a mind-blowing moment that makes us rethink everything we know. Instead, new experiences build on past ones, and gathering knowledge is a gradual process rather than a sudden one. When I learn a new word, I probably try to understand it based on the concepts I already know. If I know the word “dog”, I can easily understand the word “puppy”. Somehow, we accept that the representation models, a....

September 22, 2024 · 11 min · Kacper Łukawski

Sharing datasets with embeddings on Hugging Face Hub

Hugging Face Hub is a go-to place for state-of-the-art open source machine learning models. However, being truly open source in that space is not only about releasing the weights under a proper license but also about exposing the training pipeline and the data used as its input. Models are only as good as the data used to teach them. Datasets are also valuable for evaluation and benchmarking, and Hugging Face repositories have become a standard way of sharing them with the public....

November 18, 2023 · 6 min · Kacper Łukawski