# Software Citation Intent Classifier
The Software Citation Intent Classifier (soft-cite-intent-cls) predicts the citation or "reference" intent behind a textual reference or citation to a piece of software in an academic article.
Possible values include:

- used
- mentioned
- created
- other
For example, the sentence "The XYZ code and software, with an example input dataset and detailed instructions are available from GitHub (https://github.com/user/repo)" should be predicted as "created", as the authors are directly referencing code they created themselves.
The specific software name, username, and repository have been removed for privacy.
In comparison, the sentence "For the statistical analyses of the data in this study, the Statistical Package for the Social Sciences (SPSS) version 22 (IBM Corp, Armonk, New York) was used" should be predicted as "used", as the authors are informing the reader that this is not their own software but rather software they used for their analysis.
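A minimal usage sketch with the `transformers` text-classification pipeline, assuming the model is published under a Hub ID like `soft-cite-intent-cls` (a placeholder; substitute the actual repository ID):

```python
from transformers import pipeline

# Hypothetical Hub ID; replace with the real repository ID for this model.
classifier = pipeline("text-classification", model="soft-cite-intent-cls")

sentences = [
    "The XYZ code and software, with an example input dataset and detailed "
    "instructions are available from GitHub (https://github.com/user/repo)",
    "For the statistical analyses of the data in this study, the Statistical "
    "Package for the Social Sciences (SPSS) version 22 (IBM Corp, Armonk, "
    "New York) was used",
]

for sentence, prediction in zip(sentences, classifier(sentences)):
    # Each prediction is a dict such as {"label": "created", "score": 0.97}
    print(prediction["label"], "-", sentence[:60], "...")
```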
This model was originally created during the CZI Software Impact Hackathon by Ana-Maria Istrate, Joshua Fisher, Xinyu Yang, Kara Moraw, Kai Li, Donghui Li, and Martin Klein.
Their original work can be found in the SoftwareCitationIntent Repository.
Eva Maxfield Brown recreated and uploaded this version of the model to the Hugging Face Hub for her own work; her scripts for recreating this model can be found in the grobid-soft-proc repository.
## Model Details

### Model Description
- Developed by: Ana-Maria Istrate, Joshua Fisher, Xinyu Yang, Kara Moraw, Kai Li, Donghui Li, and Martin Klein
- Made available by: Eva Maxfield Brown
- Language(s) (NLP): en (English)
- License: MIT
- Finetuned from model: microsoft/deberta-v3-base
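Because the classifier is fine-tuned from microsoft/deberta-v3-base with a sequence-classification head, it should load with the standard Auto classes; a minimal sketch, again using a placeholder Hub ID:

```python
import torch
from transformers import AutoModelForSequenceClassification, AutoTokenizer

MODEL_ID = "soft-cite-intent-cls"  # hypothetical; replace with the actual Hub ID

tokenizer = AutoTokenizer.from_pretrained(MODEL_ID)
model = AutoModelForSequenceClassification.from_pretrained(MODEL_ID)

inputs = tokenizer(
    "All statistical analyses were performed with R version 4.2.",
    return_tensors="pt",
    truncation=True,
)
with torch.no_grad():
    logits = model(**inputs).logits

# id2label maps class indices to the intent labels (used / mentioned / created / other)
predicted = model.config.id2label[logits.argmax(dim=-1).item()]
print(predicted)
```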
### Model Sources
- Repository: Original Repository, Distribution Repository
- Paper: Scientific Software Citation Intent Classification using Large Language Models
## Training Details

### Training Data
- Hugging Face Dataset: soft-cite-intent
- CSV from Repo: soft-cite-intent
### Training Procedure
- Training Script: train-and-upload-best
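A simplified sketch of what the fine-tuning setup might look like; the dataset ID, column names ("text" and string "label"), and hyperparameters are assumptions for illustration, and the train-and-upload-best script above is the authoritative procedure:

```python
from datasets import load_dataset
from transformers import (
    AutoModelForSequenceClassification,
    AutoTokenizer,
    Trainer,
    TrainingArguments,
)

labels = ["used", "mentioned", "created", "other"]
id2label = dict(enumerate(labels))
label2id = {label: i for i, label in id2label.items()}

# Hypothetical dataset ID; replace with the actual Hub ID listed under Training Data.
raw = load_dataset("soft-cite-intent")["train"]
# Assumes "text" and string "label" columns; map string labels to integer ids.
raw = raw.map(lambda example: {"label": label2id[example["label"]]})
splits = raw.train_test_split(test_size=0.1, seed=42)

tokenizer = AutoTokenizer.from_pretrained("microsoft/deberta-v3-base")
model = AutoModelForSequenceClassification.from_pretrained(
    "microsoft/deberta-v3-base",
    num_labels=len(labels),
    id2label=id2label,
    label2id=label2id,
)

def tokenize(batch):
    return tokenizer(batch["text"], truncation=True)

tokenized = splits.map(tokenize, batched=True)

trainer = Trainer(
    model=model,
    args=TrainingArguments(output_dir="soft-cite-intent-cls", num_train_epochs=3),
    train_dataset=tokenized["train"],
    eval_dataset=tokenized["test"],
    tokenizer=tokenizer,
)
trainer.train()
```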
## Results
- Accuracy: 0.916
- Precision: 0.916
- Recall: 0.916
- F1: 0.916
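All four scores being identical is consistent with micro-averaging, where precision, recall, and F1 coincide with accuracy for single-label multiclass classification. A minimal sketch of computing such scores with scikit-learn, using placeholder labels rather than the actual test data:

```python
from sklearn.metrics import accuracy_score, precision_recall_fscore_support

# Placeholder gold labels and predictions; in practice these come from running
# the classifier over a held-out test split.
y_true = ["used", "created", "mentioned", "other", "used"]
y_pred = ["used", "created", "mentioned", "used", "used"]

accuracy = accuracy_score(y_true, y_pred)
precision, recall, f1, _ = precision_recall_fscore_support(
    y_true, y_pred, average="micro"
)
print(f"Accuracy: {accuracy:.3f}  Precision: {precision:.3f}  "
      f"Recall: {recall:.3f}  F1: {f1:.3f}")
```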
## Citation
See the original authors' paper, Scientific Software Citation Intent Classification using Large Language Models, presented at NSLP 2024.