How to use seara/rubert-tiny2-russian-sentiment with Transformers:
# Use a pipeline as a high-level helper
from transformers import pipeline
pipe = pipeline("text-classification", model="seara/rubert-tiny2-russian-sentiment")
# Load model directly
from transformers import AutoTokenizer, AutoModelForSequenceClassification
tokenizer = AutoTokenizer.from_pretrained("seara/rubert-tiny2-russian-sentiment")
model = AutoModelForSequenceClassification.from_pretrained("seara/rubert-tiny2-russian-sentiment")
This is a RuBERT-tiny2 model fine-tuned for sentiment classification of short Russian texts. The task is multi-class classification with the following labels:
0: neutral
1: positive
2: negative
Label to Russian label:
neutral: нейтральный
positive: позитивный
negative: негативный
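The two mappings above can be kept as plain dictionaries when post-processing predictions. This is a minimal sketch: the names `ID2LABEL`, `LABEL2RU`, and `to_russian` are illustrative and not part of the model's API, and the index order is assumed to match the list in this card.

```python
# Class index -> English label, in the order listed in this card.
ID2LABEL = {0: "neutral", 1: "positive", 2: "negative"}

# English label -> Russian label, as listed in this card.
LABEL2RU = {
    "neutral": "нейтральный",
    "positive": "позитивный",
    "negative": "негативный",
}


def to_russian(class_id: int) -> str:
    """Translate a predicted class index into its Russian label (hypothetical helper)."""
    return LABEL2RU[ID2LABEL[class_id]]


print(to_russian(1))
```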
from transformers import pipeline
model = pipeline(model="seara/rubert-tiny2-russian-sentiment")
model("Привет, ты мне нравишься!")
# [{'label': 'positive', 'score': 0.9398769736289978}]
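The `score` in the output above is a probability for the winning class. The sketch below shows how such a result could be derived from the model's three raw logits, assuming the usual softmax post-processing for a single-label multi-class head. The logit values are hypothetical and `top_label` is an illustrative helper, not a Transformers function.

```python
import math

# Index order taken from the label list in this card.
LABELS = ["neutral", "positive", "negative"]


def softmax(logits):
    """Numerically stable softmax over a list of floats."""
    m = max(logits)
    exps = [math.exp(x - m) for x in logits]
    total = sum(exps)
    return [e / total for e in exps]


def top_label(logits):
    """Return the most probable label and its probability, pipeline-style."""
    probs = softmax(logits)
    best = max(range(len(probs)), key=probs.__getitem__)
    return {"label": LABELS[best], "score": probs[best]}


# Hypothetical logits for one input text (not produced by the real model).
example_logits = [-1.0, 3.0, -0.5]
result = top_label(example_logits)
print(result)
```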
This model was trained on the union of the following datasets:
An overview of the training data can be found in S. Smetanin's GitHub repository.
Download links for all Russian sentiment datasets collected by Smetanin can be found in this repository.
Training was done in this project with the following parameters:
tokenizer.max_length: 512
batch_size: 64
optimizer: adam
lr: 0.00001
weight_decay: 0
epochs: 5
Train/validation/test splits are 80%/10%/10%.
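An 80%/10%/10% split can be sketched as below. The card does not say whether the split was random or stratified, or which seed was used, so the shuffled random split, the `seed` value, and the `split_indices` helper are all assumptions made for illustration.

```python
import random


def split_indices(n, seed=0, fractions=(0.8, 0.1, 0.1)):
    """Shuffle indices 0..n-1 and cut them into train/validation/test parts."""
    indices = list(range(n))
    random.Random(seed).shuffle(indices)  # seed is an assumption, for reproducibility
    n_train = int(n * fractions[0])
    n_val = int(n * fractions[1])
    train = indices[:n_train]
    val = indices[n_train:n_train + n_val]
    test = indices[n_train + n_val:]  # the remainder goes to the test set
    return train, val, test


train, val, test = split_indices(1000)
print(len(train), len(val), len(test))
```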
| | neutral | positive | negative | macro avg | weighted avg |
|---|---|---|---|---|---|
| precision | 0.7 | 0.84 | 0.74 | 0.76 | 0.75 |
| recall | 0.74 | 0.83 | 0.69 | 0.75 | 0.75 |
| f1-score | 0.72 | 0.83 | 0.71 | 0.75 | 0.75 |
| auc-roc | 0.85 | 0.95 | 0.91 | 0.9 | 0.9 |
| support | 5196 | 3831 | 3599 | 12626 | 12626 |
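The averaged columns follow from the per-class values: the macro average is the unweighted mean over the three classes, and the weighted average uses each class's support as its weight. The sketch below recomputes them from the rounded table entries, so results agree with the table only to two decimals. The helper names are illustrative.

```python
# Per-class values copied from the table above.
precision = {"neutral": 0.70, "positive": 0.84, "negative": 0.74}
recall = {"neutral": 0.74, "positive": 0.83, "negative": 0.69}
support = {"neutral": 5196, "positive": 3831, "negative": 3599}


def f1(p, r):
    """Harmonic mean of precision and recall."""
    return 2 * p * r / (p + r)


def macro_avg(values):
    """Unweighted mean over classes."""
    return sum(values.values()) / len(values)


def weighted_avg(values, weights):
    """Mean over classes, weighted by class support."""
    total = sum(weights.values())
    return sum(values[k] * weights[k] for k in values) / total


f1_scores = {k: round(f1(precision[k], recall[k]), 2) for k in precision}
macro_precision = round(macro_avg(precision), 2)
weighted_precision = round(weighted_avg(precision, support), 2)
print(f1_scores, macro_precision, weighted_precision)
```

Because the inputs are already rounded, a recomputed value can in principle differ from the published one in the last digit; for this table the recomputed F1 scores and precision averages match.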