GlotLID is a fastText language identification (LID) model that supports more than 2000 labels.
Latest: GlotLID has been updated to V3. V3 supports 2102 labels (three-letter ISO codes combined with a script code, e.g. eng_Latn). For details on the supported languages and performance, as well as significant changes from previous versions, see https://github.com/cisnlp/GlotLID/blob/main/languages-v3.md.
Here is how to use this model to detect the language of a given text:
>>> import fasttext
>>> from huggingface_hub import hf_hub_download
>>> # model.bin always points to the latest version
>>> model_path = hf_hub_download(repo_id="cis-lmu/glotlid", filename="model.bin")
>>> model = fasttext.load_model(model_path)
>>> model.predict("Hello, world!")
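fastText's `predict` returns a pair `(labels, probabilities)`, with each label carrying fastText's `__label__` prefix, and it raises a `ValueError` if the input string contains a newline. The sketch below post-processes that output; it assumes labels follow the `__label__<iso>_<Script>` pattern (e.g. `__label__eng_Latn`), and the helper names `clean_text` and `parse_prediction` are hypothetical, not part of fastText or GlotLID.

```python
from typing import List, Sequence, Tuple

LABEL_PREFIX = "__label__"  # fastText's default label prefix


def clean_text(text: str) -> str:
    """Hypothetical helper: fastText's predict() rejects strings containing
    newlines, so collapse every whitespace run (including \\n) into one space."""
    return " ".join(text.split())


def parse_prediction(
    labels: Sequence[str], probs: Sequence[float]
) -> List[Tuple[str, str, float]]:
    """Hypothetical helper: turn the (labels, probs) pair returned by
    model.predict() into (language, script, probability) triples,
    e.g. '__label__eng_Latn' -> ('eng', 'Latn', p)."""
    results = []
    for label, prob in zip(labels, probs):
        # Strip the fastText prefix if present.
        code = label[len(LABEL_PREFIX):] if label.startswith(LABEL_PREFIX) else label
        # Assumed format: <language>_<Script>; split on the first underscore.
        lang, _, script = code.partition("_")
        results.append((lang, script, float(prob)))
    return results
```

Usage with the model loaded above, asking for the top three candidates via `predict`'s `k` argument:

    >>> labels, probs = model.predict(clean_text("Hello,\nworld!"), k=3)
    >>> parse_prediction(labels, probs)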
If you prefer not to use huggingface_hub, download the model directly:
$ wget https://huggingface.co/cis-lmu/glotlid/resolve/main/model.bin
>>> import fasttext
>>> model = fasttext.load_model("/path/to/model.bin")
>>> model.predict("Hello, world!")
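If neither huggingface_hub nor wget is available, the same file can be fetched with Python's standard library. This is a minimal sketch assuming the direct-download URL pattern shown above; `glotlid_url` and `download_glotlid` are hypothetical helper names.

```python
import urllib.request
from pathlib import Path

# Direct-download base URL, taken from the wget command above.
REPO_URL = "https://huggingface.co/cis-lmu/glotlid/resolve/main"


def glotlid_url(filename: str = "model.bin") -> str:
    """Hypothetical helper: build the download URL for a file in cis-lmu/glotlid."""
    return f"{REPO_URL}/{filename}"


def download_glotlid(dest_dir: str = ".", filename: str = "model.bin") -> Path:
    """Hypothetical helper: download the model file into dest_dir using only
    the standard library, skipping the download if the file already exists."""
    dest = Path(dest_dir) / filename
    if not dest.exists():
        dest.parent.mkdir(parents=True, exist_ok=True)
        urllib.request.urlretrieve(glotlid_url(filename), dest)
    return dest
```

The returned path can then be passed to fastText as a string: `model = fasttext.load_model(str(download_glotlid()))`.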
The model is distributed under the Apache License, Version 2.0.
We keep all previous versions of GlotLID in our repository.
To access a specific version, append the version number to the filename:
- model_v1.bin: introduced in the GlotLID paper and used in all of its experiments.
- model_v2.bin: an edited version of v1 with more languages, cleaned of noisy corpora based on the analysis of v1.
- model_v3.bin: an edited version of v2 with more languages, excluding macrolanguages, further cleaned of noisy corpora and incorrect metadata labels based on the analysis of v2, and supporting the "zxx" and "und" series labels.
- model.bin: always refers to the latest version (v3).
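Because model.bin moves to each new release, pinning a versioned filename keeps results reproducible. The sketch below maps a version number to the filename convention listed above; `glotlid_filename` is a hypothetical helper, and `KNOWN_VERSIONS` is an assumption reflecting only the versions listed here.

```python
from typing import Optional

# Assumption: the versions listed above; model.bin always tracks the latest one.
KNOWN_VERSIONS = (1, 2, 3)


def glotlid_filename(version: Optional[int] = None) -> str:
    """Hypothetical helper: return the repository filename for a GlotLID
    version. None maps to 'model.bin', which always points to the latest."""
    if version is None:
        return "model.bin"
    if version not in KNOWN_VERSIONS:
        raise ValueError(f"unknown GlotLID version: {version!r}")
    return f"model_v{version}.bin"
```

For example, to reproduce the paper's setup with v1:

    >>> model_path = hf_hub_download(repo_id="cis-lmu/glotlid", filename=glotlid_filename(1))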
If you use this model, please cite the following paper:
@inproceedings{
kargaran2023glotlid,
title={{GlotLID}: Language Identification for Low-Resource Languages},
author={Kargaran, Amir Hossein and Imani, Ayyoob and Yvon, Fran{\c{c}}ois and Sch{\"u}tze, Hinrich},
booktitle={The 2023 Conference on Empirical Methods in Natural Language Processing},
year={2023},
url={https://openreview.net/forum?id=dl4e3EBz5j}
}