gilf/french-camembert-postag-model

About

french-camembert-postag-model is a part-of-speech (POS) tagging model for French, trained on the free-french-treebank dataset available on GitHub. Both the base tokenizer and the base model used for training are 'camembert-base'.

Supported Tags

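The model predicts the following tags:

| Tag | Category | Extra Info |
|--------|-----------------------------|---------------|
| ADJ | adjective | |
| ADJWH | adjective | |
| ADV | adverb | |
| ADVWH | adverb | |
| CC | coordinating conjunction | |
| CLO | pronoun | object |
| CLR | pronoun | reflexive |
| CLS | pronoun | subject |
| CS | subordinating conjunction | |
| DET | determiner | |
| DETWH | determiner | |
| ET | foreign word | |
| I | interjection | |
| NC | common noun | |
| NPP | proper noun | |
| P | preposition | |
| P+D | preposition + determiner | |
| PONCT | punctuation mark | |
| PREF | prefix | |
| PRO | other pronouns | |
| PROREL | other pronouns | relative |
| PROWH | other pronouns | interrogative |
| U | ? | |
| V | verb | |
| VIMP | imperative verb | |
| VINF | infinitive verb | |
| VPP | past participle | |
| VPR | present participle | |
| VS | subjunctive | |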

More information on the tags can be found here:

http://alpage.inria.fr/statgram/frdep/Publications/crabbecandi-taln2008-final.pdf
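
When post-processing predictions, it can help to have the tag list available as a lookup table. The following is a minimal sketch: the names `TAG_CATEGORIES` and `describe_tag` are hypothetical helpers, not part of the model or of transformers, and the values are English glosses of the categories listed above. The `U` tag is left as `?` because the list above does not describe it.

```python
# English glosses of the tag categories listed above.
# "U" stays "?" because the model card gives no description for it.
TAG_CATEGORIES = {
    "ADJ": "adjective",
    "ADJWH": "adjective",
    "ADV": "adverb",
    "ADVWH": "adverb",
    "CC": "coordinating conjunction",
    "CLO": "pronoun (object)",
    "CLR": "pronoun (reflexive)",
    "CLS": "pronoun (subject)",
    "CS": "subordinating conjunction",
    "DET": "determiner",
    "DETWH": "determiner",
    "ET": "foreign word",
    "I": "interjection",
    "NC": "common noun",
    "NPP": "proper noun",
    "P": "preposition",
    "P+D": "preposition + determiner",
    "PONCT": "punctuation mark",
    "PREF": "prefix",
    "PRO": "other pronouns",
    "PROREL": "other pronouns (relative)",
    "PROWH": "other pronouns (interrogative)",
    "U": "?",
    "V": "verb",
    "VIMP": "imperative verb",
    "VINF": "infinitive verb",
    "VPP": "past participle",
    "VPR": "present participle",
    "VS": "subjunctive",
}


def describe_tag(tag: str) -> str:
    """Return the category for a tag, or 'unknown tag' if it is not listed."""
    return TAG_CATEGORIES.get(tag, "unknown tag")


print(describe_tag("P+D"))
print(describe_tag("CLS"))
```

A plain dictionary is enough here because the tag set is small and fixed.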

Usage

The model follows the standard transformers usage pattern: load the tokenizer and model, then wrap them in a token-classification pipeline. Here is a short example:

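from transformers import AutoModelForTokenClassification, AutoTokenizer, pipeline

# Load the fine-tuned tokenizer and model from the Hugging Face Hub
tokenizer = AutoTokenizer.from_pretrained("gilf/french-camembert-postag-model")
model = AutoModelForTokenClassification.from_pretrained("gilf/french-camembert-postag-model")

# POS tagging runs through the token-classification ('ner') pipeline.
# grouped_entities=True merges adjacent tokens with the same tag, which rejoins
# sub-word pieces (newer transformers releases use aggregation_strategy="simple").
nlp_token_class = pipeline('ner', model=model, tokenizer=tokenizer, grouped_entities=True)

nlp_token_class('Face à un choc inédit, les mesures mises en place par le gouvernement ont permis une protection forte et efficace des ménages')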

In a Jupyter notebook, the lines above display something like the following. Note that the special tokens `<s>` and `</s>` appear in the raw output:

[{'entity_group': 'NC', 'score': 0.5760144591331482, 'word': '<s>'},
 {'entity_group': 'U', 'score': 0.9946700930595398, 'word': 'Face'},
 {'entity_group': 'P', 'score': 0.999615490436554, 'word': 'à'},
 {'entity_group': 'DET', 'score': 0.9995906352996826, 'word': 'un'},
 {'entity_group': 'NC', 'score': 0.9995531439781189, 'word': 'choc'},
 {'entity_group': 'ADJ', 'score': 0.999183714389801, 'word': 'inédit'},
 {'entity_group': 'P', 'score': 0.3710663616657257, 'word': ','},
 {'entity_group': 'DET', 'score': 0.9995903968811035, 'word': 'les'},
 {'entity_group': 'NC', 'score': 0.9995649456977844, 'word': 'mesures'},
 {'entity_group': 'VPP', 'score': 0.9988670349121094, 'word': 'mises'},
 {'entity_group': 'P', 'score': 0.9996246099472046, 'word': 'en'},
 {'entity_group': 'NC', 'score': 0.9995329976081848, 'word': 'place'},
 {'entity_group': 'P', 'score': 0.9996233582496643, 'word': 'par'},
 {'entity_group': 'DET', 'score': 0.9995935559272766, 'word': 'le'},
 {'entity_group': 'NC', 'score': 0.9995369911193848, 'word': 'gouvernement'},
 {'entity_group': 'V', 'score': 0.9993771314620972, 'word': 'ont'},
 {'entity_group': 'VPP', 'score': 0.9991101026535034, 'word': 'permis'},
 {'entity_group': 'DET', 'score': 0.9995885491371155, 'word': 'une'},
 {'entity_group': 'NC', 'score': 0.9995636343955994, 'word': 'protection'},
 {'entity_group': 'ADJ', 'score': 0.9991781711578369, 'word': 'forte'},
 {'entity_group': 'CC', 'score': 0.9991298317909241, 'word': 'et'},
 {'entity_group': 'ADJ', 'score': 0.9992275238037109, 'word': 'efficace'},
 {'entity_group': 'P+D', 'score': 0.9993300437927246, 'word': 'des'},
 {'entity_group': 'NC', 'score': 0.8353511393070221, 'word': 'ménages</s>'}]
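
The output above contains the special tokens `<s>` and `</s>`, once as a standalone entry and once fused onto the last word. One possible way to tidy this up is sketched below. `clean_predictions` is a hypothetical helper, not part of transformers, and it assumes the pipeline returns dictionaries shaped like the ones shown above. The sample is a three-entry excerpt of that output, so the sketch runs without downloading the model.

```python
# Special-token markers that show up in the raw pipeline output above.
SPECIAL_TOKENS = ("<s>", "</s>")


def clean_predictions(predictions):
    """Strip special-token markers and return (word, tag, score) tuples.

    Entries that consist only of a special token are dropped.
    """
    cleaned = []
    for item in predictions:
        word = item["word"]
        for token in SPECIAL_TOKENS:
            word = word.replace(token, "")
        word = word.strip()
        if word:  # skip entries that were only a special token
            cleaned.append((word, item["entity_group"], item["score"]))
    return cleaned


# Excerpt of the example output shown above.
sample = [
    {"entity_group": "NC", "score": 0.5760144591331482, "word": "<s>"},
    {"entity_group": "U", "score": 0.9946700930595398, "word": "Face"},
    {"entity_group": "NC", "score": 0.8353511393070221, "word": "ménages</s>"},
]

for word, tag, score in clean_predictions(sample):
    print(f"{word}\t{tag}\t{score:.4f}")
```

Keeping the score in each tuple makes it easy to flag low-confidence tags afterwards, such as the comma in the example, which is tagged `P` with a score of about 0.37.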