---
tags:
- setfit
- sentence-transformers
- text-classification
- generated_from_setfit_trainer
widget: []
metrics:
- accuracy
- f1
- precision
- recall
pipeline_tag: text-classification
library_name: setfit
inference: true
license: mit
datasets:
- NLBSE/nlbse25-code-comment-classification
language:
- en
base_model:
- sentence-transformers/all-MiniLM-L6-v2
---

# Python comment classifier

This is a [SetFit](https://github.com/huggingface/setfit) model for classifying Python code comments.

The model was trained with SetFit's few-shot learning approach, which involves:

1. Fine-tuning a [Sentence Transformer](https://www.sbert.net) with contrastive learning.
2. Training a classification head with features from the fine-tuned model.
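
The contrastive step relies on sentence pairs built from the labeled examples: two texts with the same label form a positive pair, and two texts with different labels form a negative pair. The snippet below is a minimal, library-free sketch of that pair construction, assuming single-label examples. `make_contrastive_pairs` and the toy comments and labels are hypothetical; they are not taken from the SetFit library, from this model's training code, or from the NLBSE dataset.

```python
from itertools import combinations


def make_contrastive_pairs(texts, labels):
    """Build (text_a, text_b, target) triples from labeled examples.

    Hypothetical helper: pairs sharing a label get target 1.0 (positive),
    pairs with different labels get target 0.0 (negative).
    """
    if len(texts) != len(labels):
        raise ValueError("texts and labels must have the same length")
    pairs = []
    # Enumerate every unordered pair of distinct examples.
    for i, j in combinations(range(len(texts)), 2):
        target = 1.0 if labels[i] == labels[j] else 0.0
        pairs.append((texts[i], texts[j], target))
    return pairs


# Toy, illustrative examples (not drawn from the NLBSE dataset)
texts = [
    "Return the sorted list.",
    "Compute the mean of the values.",
    "Call this before opening the connection.",
]
labels = ["summary", "summary", "usage"]

for a, b, target in make_contrastive_pairs(texts, labels):
    print(f"{target:.1f}  {a!r} <-> {b!r}")
```

In SetFit, pairs like these are used to fine-tune the Sentence Transformer so that embeddings of same-label texts move closer together; the library samples pairs according to its own training settings rather than necessarily enumerating every combination as this sketch does.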
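
## Model Description

- **Model Type:** SetFit
- **Sentence Transformer body:** [sentence-transformers/all-MiniLM-L6-v2](https://huggingface.co/sentence-transformers/all-MiniLM-L6-v2)
- **Classification head:** [RandomForestClassifier](https://scikit-learn.org/stable/modules/generated/sklearn.ensemble.RandomForestClassifier.html)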

## Sources

- **Repository:** [GitHub](https://github.com/fabiancpl/sbert-comment-classification/)
- **Paper:** [Evaluating the Performance and Efficiency of Sentence-BERT for Code Comment Classification](https://ieeexplore.ieee.org/document/11029440)
- **Dataset:** [HF Dataset](https://huggingface.co/datasets/NLBSE/nlbse25-code-comment-classification)

## How to use it

First, install the dependencies:

```bash
pip install setfit scikit-learn
```

Then, load the model and run inference:

```python
from setfit import SetFitModel

# Download the model from the 🤗 Hub
model = SetFitModel.from_pretrained("fabiancpl/nlbse25_python")
# Run inference on a single comment
preds = model("This function sorts a list of numbers.")
print(preds)
```
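
The call above classifies a single string; `SetFitModel.predict` also accepts a list of strings, which is convenient for labeling many comments at once. The sketch below wraps that call in a small helper that pairs each comment with its prediction. `label_comments` and the `Predictor` protocol are hypothetical names, and the exact type of each prediction (a label string, an integer id, or a multi-label vector) depends on how the model was trained and on the installed `setfit` version, so treat it as an assumption. The helper only expects an object with a `predict` method, which also lets it be exercised without downloading the model.

```python
from typing import Any, Iterable, List, Protocol, Tuple


class Predictor(Protocol):
    """Any object exposing a SetFit-style ``predict`` method (assumed interface)."""

    def predict(self, inputs: List[str]) -> Iterable[Any]:
        ...


def label_comments(model: Predictor, comments: List[str]) -> List[Tuple[str, Any]]:
    """Classify a batch of comments and pair each comment with its prediction.

    Hypothetical helper: it calls ``model.predict`` once on the whole list and
    zips the inputs with whatever sequence of predictions comes back.
    """
    if not comments:
        # Skip the model call entirely for an empty batch.
        return []
    preds = model.predict(comments)
    return list(zip(comments, preds))
```

With the model loaded as in the snippet above, `label_comments(model, ["This function sorts a list of numbers."])` returns a list of `(comment, prediction)` tuples.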

## Cite as

```bibtex
@inproceedings{11029440,
  author={Peña, Fabian C. and Herbold, Steffen},
  booktitle={2025 IEEE/ACM International Workshop on Natural Language-Based Software Engineering (NLBSE)},
  title={Evaluating the Performance and Efficiency of Sentence-BERT for Code Comment Classification},
  year={2025},
  pages={21-24},
  doi={10.1109/NLBSE66842.2025.00010}
}
```