SetFit with BAAI/bge-base-en-v1.5

This is a SetFit model that can be used for Text Classification. It uses BAAI/bge-base-en-v1.5 as the Sentence Transformer embedding model and a LogisticRegression instance as the classification head.

The model has been trained using an efficient few-shot learning technique that involves:

  1. Fine-tuning a Sentence Transformer with contrastive learning.
  2. Training a classification head with features from the fine-tuned Sentence Transformer.
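The two steps above can be sketched with NumPy and scikit-learn. This is a minimal illustration, not the SetFit implementation: the embeddings below are random stand-ins for the 768-dimensional BAAI/bge-base-en-v1.5 sentence vectors, and real SetFit training updates the encoder weights with CosineSimilarityLoss rather than merely enumerating pairs.

```python
import numpy as np
from itertools import combinations
from sklearn.linear_model import LogisticRegression

rng = np.random.default_rng(42)

# Stand-in "sentence embeddings": SetFit would produce these with the
# fine-tuned BAAI/bge-base-en-v1.5 body (768-dimensional vectors).
X = rng.normal(size=(20, 768))
y = rng.integers(0, 10, size=20)  # ten classes, matching labels 0-9

# Step 1 (contrastive learning): build labeled sentence pairs. Same-class
# pairs get a target cosine similarity of 1.0, different-class pairs 0.0;
# SetFit feeds such pairs to CosineSimilarityLoss to fine-tune the encoder.
pairs = [(i, j, 1.0 if y[i] == y[j] else 0.0)
         for i, j in combinations(range(len(y)), 2)]

# Step 2: fit the classification head on the (fine-tuned) embeddings.
head = LogisticRegression(max_iter=1000)
head.fit(X, y)
print(len(pairs), head.predict(X).shape)
```

With 20 sentences this yields 190 contrastive pairs, which is how SetFit turns a few labeled examples into a much larger fine-tuning signal.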

Model Details

Model Description

  • Model Type: SetFit
  • Sentence Transformer body: BAAI/bge-base-en-v1.5
  • Classification head: a LogisticRegression instance
  • Number of Classes: 10

Model Sources

  • Repository: https://github.com/huggingface/setfit
  • Paper: https://arxiv.org/abs/2209.11055

Model Labels

Label Examples
5
  • '"All of the cited sources do mention her, and enable reliable sourcing of her childhood, education, and acting career. There are reviews of her acting that can be added. However, the reason why I created this article is her play ''How to Load a Musket'' which I believe passes . 3. ""The person has created... (a work that) ha(s) been the primary subject... of multiple independent periodical articles or reviews"'
  • 'I concur. The company does not appear to meet and
6
  • '"I was the one who put this up for deletion, and I almost want to change my own vote, because of the disregard for the rules which people are performing by excercising such predjudice against this page. Come on people, the problem is that there are no reliable secondary sources as of yet, not that ""evil bloggers"" are trying to rule the world, or that neolgisms must be squooshed without mercy. The issue here is lack of established credible sources because it's way too young to have sources that qualify under "'
  • "Um, with all due respect, her widely publicized grassroots-organized ouster is what she is notable for, if being an elected official somehow wasn't enough. Nominator hasn't even looked at the massive amount of supporting media coverage, otherwise he would have known that the contents of the since-removed YouTube video are well documented by
8
2
  • 'as Carlos Suarez has pointed out this is an unsourced violation which even if sourced would likely fail our biographical guidelines anyhow. Lose-lose'
  • "While he isn't super notable, I don't see why he should be excluded any more than say a no-name backbench MP from one of the major parties should be excluded. Ultimately, he received considerable media coverage as a result of being elected to federal Parliament โ€“ and then again when he subsequently lost his spot when the results were declared void. He again ran for election at the special election, and there was coverage on him following his failure to gain a seat at that. It's really more than just
4
  • 'unsourced, unverified - could be
3
  • 'complete nonsense, meets #G1
7
  • '"per nom. Insufficiently notable publication, and no ""''credible, third-party sources with a reputation for fact-checking and accuracy''"" as required by "'
  • 'I mentioned the NYT link for purposes, not , as I believe was clear'
  • "As it was mentioned, there are remarkable claims being made in the article that need to be
9
  • 'as the one who PRODded it; I was tempted to tag its initial incarnation as G11, but I still think it should be deleted per '
  • "I don't understand why you're all voting to keep. Don't you know what a son of a bitch is? Isn't policy, or has that been rejected now"
  • 'Big ole mess of and '
1
  • 'Bot-like nomination made without any
0

Uses

Direct Use for Inference

First install the SetFit library:

pip install setfit

Then you can load this model and run inference.

from setfit import SetFitModel

# Download from the ๐Ÿค— Hub
model = SetFitModel.from_pretrained("research-dump/bge-base-en-v1.5_wikipedia_policy_wikipedia_policy")
# Run inference
preds = model("fails ")

Training Details

Training Set Metrics

Training set Min Median Max
Word count 2 38.196 433
Label Training Sample Count
0 23
1 17
2 21
3 17
4 39
5 671
6 60
7 36
8 100
9 16

Training Hyperparameters

  • batch_size: (8, 2)
  • num_epochs: (5, 5)
  • max_steps: -1
  • sampling_strategy: oversampling
  • num_iterations: 10
  • body_learning_rate: (1e-05, 1e-05)
  • head_learning_rate: 5e-05
  • loss: CosineSimilarityLoss
  • distance_metric: cosine_distance
  • margin: 0.25
  • end_to_end: True
  • use_amp: True
  • warmup_proportion: 0.1
  • l2_weight: 0.01
  • seed: 42
  • eval_max_steps: -1
  • load_best_model_at_end: False

Training Results

Epoch Step Training Loss Validation Loss
0.0004 1 0.2133 -
0.2 500 0.2428 0.2210
0.4 1000 0.1484 0.1927
0.6 1500 0.0528 0.1995
0.8 2000 0.0335 0.2373
1.0 2500 0.0346 0.2294
1.2 3000 0.0267 0.2447
1.4 3500 0.0239 0.2290
1.6 4000 0.0253 0.2354
1.8 4500 0.0219 0.2390
2.0 5000 0.02 0.2335
2.2 5500 0.019 0.2319
2.4 6000 0.0168 0.2281
2.6 6500 0.0154 0.2499
2.8 7000 0.013 0.2537
3.0 7500 0.015 0.2408
3.2 8000 0.0121 0.2423
3.4 8500 0.015 0.2391
3.6 9000 0.0131 0.2452
3.8 9500 0.0106 0.2438
4.0 10000 0.0135 0.2330
4.2 10500 0.0114 0.2396
4.4 11000 0.0115 0.2413
4.6 11500 0.0112 0.2348
4.8 12000 0.0111 0.2378
5.0 12500 0.013 0.2387

Framework Versions

  • Python: 3.12.7
  • SetFit: 1.1.1
  • Sentence Transformers: 3.4.1
  • Transformers: 4.48.2
  • PyTorch: 2.6.0+cu124
  • Datasets: 3.2.0
  • Tokenizers: 0.21.0

Citation

BibTeX

@article{https://doi.org/10.48550/arxiv.2209.11055,
    doi = {10.48550/ARXIV.2209.11055},
    url = {https://arxiv.org/abs/2209.11055},
    author = {Tunstall, Lewis and Reimers, Nils and Jo, Unso Eun Seo and Bates, Luke and Korat, Daniel and Wasserblat, Moshe and Pereg, Oren},
    keywords = {Computation and Language (cs.CL), FOS: Computer and information sciences, FOS: Computer and information sciences},
    title = {Efficient Few-Shot Learning Without Prompts},
    publisher = {arXiv},
    year = {2022},
    copyright = {Creative Commons Attribution 4.0 International}
}
Downloads last month
10
Safetensors
Model size
109M params
Tensor type
F32
ยท
Inference Providers NEW
This model is not currently available via any of the supported Inference Providers.

Model tree for research-dump/bge-base-en-v1.5_wikipedia_policy_wikipedia_policy

Finetuned
(365)
this model