---
tags:
- sentence-transformers
- sentence-similarity
- feature-extraction
- generated_from_trainer
- dataset_size:400
- loss:MatryoshkaLoss
- loss:MultipleNegativesRankingLoss
base_model: Snowflake/snowflake-arctic-embed-l
widget:
- source_sentence: What potential issues can arise from the use of AI systems in determining
access to financial resources and essential services?
sentences:
- the dispatching of emergency first response services, including by police, firefighters
and medical aid, as well as of emergency healthcare patient triage systems, should
also be classified as high-risk since they make decisions in very critical situations
for the life and health of persons and their property.
- systems do not entail a high risk to legal and natural persons. In addition, AI
systems used to evaluate the credit score or creditworthiness of natural persons
should be classified as high-risk AI systems, since they determine those persons’
access to financial resources or essential services such as housing, electricity,
and telecommunication services. AI systems used for those purposes may lead to
discrimination between persons or groups and may perpetuate historical patterns
of discrimination, such as that based on racial or ethnic origins, gender, disabilities,
age or sexual orientation, or may create new forms of discriminatory impacts.
However, AI systems provided for by Union law for the purpose of detecting fraud
in the offering
- In accordance with Articles 2 and 2a of Protocol No 22 on the position of Denmark,
annexed to the TEU and to the TFEU, Denmark is not bound by rules laid down in
Article 5(1), first subparagraph, point (g), to the extent it applies to the use
of biometric categorisation systems for activities in the field of police cooperation
and judicial cooperation in criminal matters, Article 5(1), first subparagraph,
point (d), to the extent it applies to the use of AI systems covered by that provision,
Article 5(1), first subparagraph, point (h), (2) to (6) and Article 26(10) of
this Regulation adopted on the basis of Article 16 TFEU, or subject to their application,
which relate to the processing of personal data by the Member States when carrying
- source_sentence: Why is the failure or malfunctioning of safety components in critical
infrastructure considered a significant risk?
sentences:
- As regards the management and operation of critical infrastructure, it is appropriate
to classify as high-risk the AI systems intended to be used as safety components
in the management and operation of critical digital infrastructure as listed in
point (8) of the Annex to Directive (EU) 2022/2557, road traffic and the supply
of water, gas, heating and electricity, since their failure or malfunctioning
may put at risk the life and health of persons at large scale and lead to appreciable
disruptions in the ordinary conduct of social and economic activities. Safety
components of critical infrastructure, including critical digital infrastructure,
are systems used to directly protect the physical integrity of critical infrastructure
or the
- (54)
- (42)
- source_sentence: How does the current Regulation relate to the provisions set out
in Regulation (EU) 2022/2065?
sentences:
- (39)
- '(11)
This Regulation should be without prejudice to the provisions regarding the liability
of providers of intermediary services as set out in Regulation (EU) 2022/2065
of the European Parliament and of the Council (15).
(12)'
- (53)
- source_sentence: Why is it important to ensure a consistent and high level of protection
for AI throughout the Union?
sentences:
- AI systems can be easily deployed in a large variety of sectors of the economy
and many parts of society, including across borders, and can easily circulate
throughout the Union. Certain Member States have already explored the adoption
of national rules to ensure that AI is trustworthy and safe and is developed and
used in accordance with fundamental rights obligations. Diverging national rules
may lead to the fragmentation of the internal market and may decrease legal certainty
for operators that develop, import or use AI systems. A consistent and high level
of protection throughout the Union should therefore be ensured in order to achieve
trustworthy AI, while divergences hampering the free circulation, innovation,
deployment and the
- '(5)
At the same time, depending on the circumstances regarding its specific application,
use, and level of technological development, AI may generate risks and cause harm
to public interests and fundamental rights that are protected by Union law. Such
harm might be material or immaterial, including physical, psychological, societal
or economic harm.
(6)'
- (57)
- source_sentence: What is the purpose of implementing a risk-based approach for AI
systems according to the context?
sentences:
- use of lethal force and other AI systems in the context of military and defence
activities. As regards national security purposes, the exclusion is justified
both by the fact that national security remains the sole responsibility of Member
States in accordance with Article 4(2) TEU and by the specific nature and operational
needs of national security activities and specific national rules applicable to
those activities. Nonetheless, if an AI system developed, placed on the market,
put into service or used for military, defence or national security purposes is
used outside those temporarily or permanently for other purposes, for example,
civilian or humanitarian purposes, law enforcement or public security purposes,
such a system would fall
- '(26)
In order to introduce a proportionate and effective set of binding rules for AI
systems, a clearly defined risk-based approach should be followed. That approach
should tailor the type and content of such rules to the intensity and scope of
the risks that AI systems can generate. It is therefore necessary to prohibit
certain unacceptable AI practices, to lay down requirements for high-risk AI systems
and obligations for the relevant operators, and to lay down transparency obligations
for certain AI systems.
(27)'
- To mitigate the risks from high-risk AI systems placed on the market or put into
service and to ensure a high level of trustworthiness, certain mandatory requirements
should apply to high-risk AI systems, taking into account the intended purpose
and the context of use of the AI system and according to the risk-management system
to be established by the provider. The measures adopted by the providers to comply
with the mandatory requirements of this Regulation should take into account the
generally acknowledged state of the art on AI, be proportionate and effective
to meet the objectives of this Regulation. Based on the New Legislative Framework,
as clarified in Commission notice ‘The “Blue Guide” on the implementation of EU
product rules
pipeline_tag: sentence-similarity
library_name: sentence-transformers
metrics:
- cosine_accuracy@1
- cosine_accuracy@3
- cosine_accuracy@5
- cosine_accuracy@10
- cosine_precision@1
- cosine_precision@3
- cosine_precision@5
- cosine_precision@10
- cosine_recall@1
- cosine_recall@3
- cosine_recall@5
- cosine_recall@10
- cosine_ndcg@10
- cosine_mrr@10
- cosine_map@100
model-index:
- name: SentenceTransformer based on Snowflake/snowflake-arctic-embed-l
results:
- task:
type: information-retrieval
name: Information Retrieval
dataset:
name: Unknown
type: unknown
metrics:
- type: cosine_accuracy@1
value: 0.8958333333333334
name: Cosine Accuracy@1
- type: cosine_accuracy@3
value: 1.0
name: Cosine Accuracy@3
- type: cosine_accuracy@5
value: 1.0
name: Cosine Accuracy@5
- type: cosine_accuracy@10
value: 1.0
name: Cosine Accuracy@10
- type: cosine_precision@1
value: 0.8958333333333334
name: Cosine Precision@1
- type: cosine_precision@3
value: 0.3333333333333333
name: Cosine Precision@3
- type: cosine_precision@5
value: 0.19999999999999998
name: Cosine Precision@5
- type: cosine_precision@10
value: 0.09999999999999999
name: Cosine Precision@10
- type: cosine_recall@1
value: 0.8958333333333334
name: Cosine Recall@1
- type: cosine_recall@3
value: 1.0
name: Cosine Recall@3
- type: cosine_recall@5
value: 1.0
name: Cosine Recall@5
- type: cosine_recall@10
value: 1.0
name: Cosine Recall@10
- type: cosine_ndcg@10
value: 0.9560997762648827
name: Cosine Ndcg@10
- type: cosine_mrr@10
value: 0.9409722222222222
name: Cosine Mrr@10
- type: cosine_map@100
value: 0.9409722222222223
name: Cosine Map@100
---
# SentenceTransformer based on Snowflake/snowflake-arctic-embed-l
This is a [sentence-transformers](https://www.SBERT.net) model fine-tuned from [Snowflake/snowflake-arctic-embed-l](https://huggingface.co/Snowflake/snowflake-arctic-embed-l). It maps sentences and paragraphs to a 1024-dimensional dense vector space and can be used for semantic textual similarity, semantic search, paraphrase mining, text classification, clustering, and more.
## Model Details
### Model Description
- **Model Type:** Sentence Transformer
- **Base model:** [Snowflake/snowflake-arctic-embed-l](https://huggingface.co/Snowflake/snowflake-arctic-embed-l)
- **Maximum Sequence Length:** 512 tokens
- **Output Dimensionality:** 1024 dimensions
- **Similarity Function:** Cosine Similarity
### Model Sources
- **Documentation:** [Sentence Transformers Documentation](https://sbert.net)
- **Repository:** [Sentence Transformers on GitHub](https://github.com/UKPLab/sentence-transformers)
- **Hugging Face:** [Sentence Transformers on Hugging Face](https://huggingface.co/models?library=sentence-transformers)
### Full Model Architecture
```
SentenceTransformer(
(0): Transformer({'max_seq_length': 512, 'do_lower_case': False}) with Transformer model: BertModel
(1): Pooling({'word_embedding_dimension': 1024, 'pooling_mode_cls_token': True, 'pooling_mode_mean_tokens': False, 'pooling_mode_max_tokens': False, 'pooling_mode_mean_sqrt_len_tokens': False, 'pooling_mode_weightedmean_tokens': False, 'pooling_mode_lasttoken': False, 'include_prompt': True})
(2): Normalize()
)
```
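
The pipeline above takes the hidden state of the first (`[CLS]`) token (`pooling_mode_cls_token: True`) and then L2-normalizes it (`Normalize()`). The NumPy sketch below imitates those two steps on random placeholder arrays, not real model outputs, to show why cosine similarity reduces to a plain dot product for this model's embeddings. The function name `cls_pool_and_normalize` is hypothetical and is not part of the sentence-transformers API.

```python
import numpy as np

def cls_pool_and_normalize(token_embeddings: np.ndarray) -> np.ndarray:
    """Imitate Pooling(pooling_mode_cls_token=True) followed by Normalize().

    token_embeddings: shape (batch, seq_len, 1024), a placeholder for the
    BertModel's last hidden states (hypothetical input for illustration).
    """
    # CLS pooling: keep only the hidden state of the first token.
    cls = token_embeddings[:, 0, :]
    # Normalize(): rescale each vector to unit L2 norm (guard against zero norm).
    norms = np.linalg.norm(cls, axis=1, keepdims=True)
    return cls / np.clip(norms, 1e-12, None)

rng = np.random.default_rng(0)
fake_hidden = rng.normal(size=(3, 8, 1024))  # 3 texts, 8 tokens each
emb = cls_pool_and_normalize(fake_hidden)
print(emb.shape)

# For unit-norm vectors, the dot product equals cosine similarity.
dot = emb @ emb.T
row_norms = np.linalg.norm(emb, axis=1)
cos = dot / np.outer(row_norms, row_norms)
print(np.allclose(dot, cos))
```

Because the normalization happens inside the model, any downstream index can use inner-product search and get cosine rankings.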
## Usage
### Direct Usage (Sentence Transformers)
First install the Sentence Transformers library:
```bash
pip install -U sentence-transformers
```
Then you can load this model and run inference.
```python
from sentence_transformers import SentenceTransformer
# Download from the 🤗 Hub
model = SentenceTransformer("Mdean77/legal-ft-3")
# Run inference
sentences = [
'What is the purpose of implementing a risk-based approach for AI systems according to the context?',
'(26)\n\n\nIn order to introduce a\xa0proportionate and effective set of binding rules for AI systems, a\xa0clearly defined risk-based approach should be followed. That approach should tailor the type and content of such rules to the intensity and scope of the risks that AI systems can generate. It is therefore necessary to prohibit certain unacceptable AI practices, to lay down requirements for high-risk AI systems and obligations for the relevant operators, and to lay down transparency obligations for certain AI systems.\n\n\n\n\n\n\n\n\n\n\n\n\n(27)',
'use of lethal force and other AI systems in the context of military and defence activities. As regards national security purposes, the exclusion is justified both by the fact that national security remains the sole responsibility of Member States in accordance with Article\xa04(2) TEU and by the specific nature and operational needs of national security activities and specific national rules applicable to those activities. Nonetheless, if an AI system developed, placed on the market, put into service or used for military, defence or national security purposes is used outside those temporarily or permanently for other purposes, for example, civilian or humanitarian purposes, law enforcement or public security purposes, such a\xa0system would fall',
]
embeddings = model.encode(sentences)
print(embeddings.shape)
# (3, 1024)
# Get the similarity scores for the embeddings
similarities = model.similarity(embeddings, embeddings)
print(similarities.shape)
# torch.Size([3, 3])
```
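
Because the model was trained with `MatryoshkaLoss` at 768, 512, 256, 128 and 64 dimensions (see Training Details), its embeddings are intended to remain useful when truncated to a leading prefix. Recent sentence-transformers releases accept a `truncate_dim` argument when loading, for example `SentenceTransformer("Mdean77/legal-ft-3", truncate_dim=256)`; check that your installed version supports it. The sketch below shows the idea with NumPy on random placeholder vectors standing in for `model.encode(...)` output: truncate, re-normalize (truncated vectors are no longer unit length), then rank passages by cosine similarity. Function names are hypothetical, and retrieval quality at reduced dimensions is not reported in this card.

```python
import numpy as np

def truncate_and_renormalize(embeddings: np.ndarray, dim: int) -> np.ndarray:
    """Keep the first `dim` coordinates and rescale each row to unit length."""
    truncated = embeddings[:, :dim]
    norms = np.linalg.norm(truncated, axis=1, keepdims=True)
    return truncated / np.clip(norms, 1e-12, None)

def rank_passages(query_emb: np.ndarray, passage_embs: np.ndarray) -> np.ndarray:
    """Return passage indices sorted from most to least similar."""
    scores = passage_embs @ query_emb  # rows are unit-norm, so dot == cosine
    return np.argsort(-scores)

# Placeholder vectors standing in for 1024-dimensional model.encode(...) output.
rng = np.random.default_rng(42)
passages = rng.normal(size=(5, 1024))
query = passages[2] + 0.1 * rng.normal(size=1024)  # query close to passage 2

for dim in (1024, 256, 64):
    p = truncate_and_renormalize(passages, dim)
    q = truncate_and_renormalize(query[None, :], dim)[0]
    print(dim, rank_passages(q, p)[0])
```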
## Evaluation
### Metrics
#### Information Retrieval
* Evaluated with [InformationRetrievalEvaluator](https://sbert.net/docs/package_reference/sentence_transformer/evaluation.html#sentence_transformers.evaluation.InformationRetrievalEvaluator)
| Metric | Value |
|:--------------------|:-----------|
| cosine_accuracy@1 | 0.8958 |
| cosine_accuracy@3 | 1.0 |
| cosine_accuracy@5 | 1.0 |
| cosine_accuracy@10 | 1.0 |
| cosine_precision@1 | 0.8958 |
| cosine_precision@3 | 0.3333 |
| cosine_precision@5 | 0.2 |
| cosine_precision@10 | 0.1 |
| cosine_recall@1 | 0.8958 |
| cosine_recall@3 | 1.0 |
| cosine_recall@5 | 1.0 |
| cosine_recall@10 | 1.0 |
| **cosine_ndcg@10** | **0.9561** |
| cosine_mrr@10 | 0.941 |
| cosine_map@100 | 0.941 |
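
The table is consistent with each query having exactly one relevant passage: precision@k equals accuracy@k divided by k, and recall@k equals accuracy@k. The sketch below computes these metrics under that single-relevant assumption. The input ranks are hypothetical: 43 queries hit at rank 1, 3 at rank 2 and 2 at rank 3 is one distribution that reproduces the reported values, but the actual evaluation-set size and per-query results are not given in this card. With a single relevant passage, average precision is 1/rank, which is why MAP@100 matches MRR@10 here.

```python
import math

def single_relevant_metrics(ranks, cutoffs=(1, 3, 5, 10), k=10):
    """IR metrics when each query has exactly one relevant passage.

    ranks: 1-based rank of the relevant passage for each query,
    or None if it was not retrieved.
    """
    n = len(ranks)

    def hit(rank, cutoff):
        return rank is not None and rank <= cutoff

    metrics = {}
    for c in cutoffs:
        acc = sum(hit(r, c) for r in ranks) / n
        metrics[f"accuracy@{c}"] = acc
        metrics[f"precision@{c}"] = acc / c  # at most one hit among the top c
        metrics[f"recall@{c}"] = acc        # per-query recall is either 0 or 1
    metrics[f"mrr@{k}"] = sum(1 / r for r in ranks if hit(r, k)) / n
    # Ideal DCG is 1 (relevant passage at rank 1), so NDCG = 1 / log2(rank + 1).
    metrics[f"ndcg@{k}"] = sum(1 / math.log2(r + 1) for r in ranks if hit(r, k)) / n
    return metrics

# Hypothetical ranks: 43 queries hit at rank 1, 3 at rank 2, 2 at rank 3.
ranks = [1] * 43 + [2] * 3 + [3] * 2
for name, value in single_relevant_metrics(ranks).items():
    print(f"{name:14s} {value:.4f}")
```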
## Training Details
### Training Dataset
#### Unnamed Dataset
* Size: 400 training samples
* Columns: `sentence_0` and `sentence_1`
* Approximate statistics based on the first 400 samples:

|         | sentence_0 | sentence_1 |
|:--------|:-----------|:-----------|
| type    | string     | string     |

* Samples:

| sentence_0 | sentence_1 |
|:-----------|:-----------|
| What is the significance of the number 55 in the given context? | (55) |
| How does the number 55 relate to the overall theme or subject being discussed? | (55) |
| What types of practices are prohibited by Union law according to the context? | (45) Practices that are prohibited by Union law, including data protection law, non-discrimination law, consumer protection law, and competition law, should not be affected by this Regulation. (46) |

* Loss: [MatryoshkaLoss](https://sbert.net/docs/package_reference/sentence_transformer/losses.html#matryoshkaloss) with these parameters:
```json
{
"loss": "MultipleNegativesRankingLoss",
"matryoshka_dims": [
768,
512,
256,
128,
64
],
"matryoshka_weights": [
1,
1,
1,
1,
1
],
"n_dims_per_step": -1
}
```
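
A minimal NumPy sketch of what this configuration computes, assuming the base `MultipleNegativesRankingLoss` uses its sentence-transformers defaults (cosine similarity, scale 20) and that `MatryoshkaLoss` adds up the weighted per-dimension losses. Each anchor (`sentence_0`) must pick out its own positive (`sentence_1`) from all positives in the batch, and with `n_dims_per_step: -1` the same objective is applied to the first 768, 512, 256, 128 and 64 coordinates at every step. Function names and the random inputs are illustrative only; this is not the library's implementation.

```python
import numpy as np

def mnrl_loss(anchors: np.ndarray, positives: np.ndarray, scale: float = 20.0) -> float:
    """Cross-entropy over a scaled cosine-similarity matrix: positives[i] is the
    target for anchors[i]; every other positive in the batch is a negative."""
    a = anchors / np.linalg.norm(anchors, axis=1, keepdims=True)
    p = positives / np.linalg.norm(positives, axis=1, keepdims=True)
    logits = scale * (a @ p.T)                   # shape (batch, batch)
    logits -= logits.max(axis=1, keepdims=True)  # numerical stability
    log_probs = logits - np.log(np.exp(logits).sum(axis=1, keepdims=True))
    return float(-np.mean(np.diag(log_probs)))   # correct pairs lie on the diagonal

def matryoshka_loss(anchors, positives,
                    dims=(768, 512, 256, 128, 64), weights=(1, 1, 1, 1, 1)):
    """Weighted sum of the base loss computed on each truncated prefix."""
    return sum(w * mnrl_loss(anchors[:, :d], positives[:, :d])
               for d, w in zip(dims, weights))

rng = np.random.default_rng(0)
batch = 10  # matches per_device_train_batch_size
anchors = rng.normal(size=(batch, 1024))
aligned = anchors + 0.1 * rng.normal(size=(batch, 1024))  # well-matched pairs
unrelated = rng.normal(size=(batch, 1024))                 # random pairs
print(matryoshka_loss(anchors, aligned))
print(matryoshka_loss(anchors, unrelated))
```

Well-matched pairs drive the loss toward zero at every prefix length, while unrelated pairs stay near the chance level of roughly log(batch) per dimension.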
### Training Hyperparameters
#### Non-Default Hyperparameters
- `eval_strategy`: steps
- `per_device_train_batch_size`: 10
- `per_device_eval_batch_size`: 10
- `num_train_epochs`: 10
- `multi_dataset_batch_sampler`: round_robin
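
With these values the training schedule works out as below, assuming a single device and the default `gradient_accumulation_steps` of 1 (the device count is not stated in this excerpt). The batch size also fixes how many in-batch negatives `MultipleNegativesRankingLoss` sees for each anchor.

```python
import math

num_samples = 400  # "Size: 400 training samples"
batch_size = 10    # per_device_train_batch_size
num_epochs = 10    # num_train_epochs
num_devices = 1    # assumption: single device, no gradient accumulation

steps_per_epoch = math.ceil(num_samples / (batch_size * num_devices))
total_steps = steps_per_epoch * num_epochs
# The other positives in each batch act as negatives for every anchor.
in_batch_negatives = batch_size - 1

print(steps_per_epoch, total_steps, in_batch_negatives)  # 40 400 9
```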
#### All Hyperparameters