Posts

Hugging Face Hub is the go-to place for state-of-the-art open-source machine learning models. However, being truly open source in that space means not only publishing the weights under a proper license, but also sharing the training pipeline and the data used as input to that process. Models are only as good as the data used to train them. Datasets are also valuable for evaluation and benchmarking, and Hugging Face repositories have become a standard way of publishing them. The datasets library makes it convenient to download a selected dataset in any Python app or notebook. ...
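As a rough illustration of how little code the download step takes, here is a minimal sketch built on `load_dataset` from the datasets library. The wrapper name `load_hub_dataset` and its defaults are assumptions made for this example and are not taken from the post.

```python
from typing import Any


def load_hub_dataset(repo_id: str, split: str = "train", streaming: bool = False) -> Any:
    """Load one split of a dataset hosted on the Hugging Face Hub.

    `repo_id` is the "namespace/name" identifier of the dataset repository.
    With `streaming=True` the rows are fetched lazily instead of being
    downloaded in full before use.
    """
    # Imported lazily so this module can be imported even where the
    # `datasets` package is not installed.
    from datasets import load_dataset

    # The first download is cached locally, so repeated calls reuse the files.
    return load_dataset(repo_id, split=split, streaming=streaming)
```

Calling `load_hub_dataset("username/dataset-name")` would fetch the `train` split of that repository; the identifier here is a hypothetical placeholder, so substitute a dataset that actually exists on the Hub.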

Posts

Old dog, new tricks: Word Injection in the text embedding models

Sharing datasets with embeddings on Hugging Face Hub