---
tags:
- sentence-transformers
- sentence-similarity
- feature-extraction
- generated_from_trainer
- dataset_size:400
- loss:MatryoshkaLoss
- loss:MultipleNegativesRankingLoss
base_model: Snowflake/snowflake-arctic-embed-l
widget:
- source_sentence: Why should manipulative and exploitative uses of AI be prohibited
according to the context provided?
sentences:
- to operate without human intervention. The adaptiveness that an AI system could
exhibit after deployment, refers to self-learning capabilities, allowing the system
to change while in use. AI systems can be used on a stand-alone basis or as a component
of a product, irrespective of whether the system is physically integrated into
the product (embedded) or serves the functionality of the product without being
integrated therein (non-embedded).
- '(28)
Aside from the many beneficial uses of AI, it can also be misused and provide
novel and powerful tools for manipulative, exploitative and social control practices.
Such practices are particularly harmful and abusive and should be prohibited because
they contradict Union values of respect for human dignity, freedom, equality,
democracy and the rule of law and fundamental rights enshrined in the Charter,
including the right to non-discrimination, to data protection and to privacy and
the rights of the child.
(29)'
- A Union legal framework laying down harmonised rules on AI is therefore needed
to foster the development, use and uptake of AI in the internal market that at
the same time meets a high level of protection of public interests, such as health
and safety and the protection of fundamental rights, including democracy, the
rule of law and environmental protection as recognised and protected by Union
law. To achieve that objective, rules regulating the placing on the market, the
putting into service and the use of certain AI systems should be laid down, thus
ensuring the smooth functioning of the internal market and allowing those systems
to benefit from the principle of free movement of goods and services. Those rules
should be clear and robust
- source_sentence: What are the ethical principles mentioned in the context for developing
voluntary best practices and standards?
sentences:
- encouraged to take into account, as appropriate, the ethical principles for the
development of voluntary best practices and standards.
- completed human activity that may be relevant for the purposes of the high-risk
uses listed in an annex to this Regulation. Considering those characteristics,
the AI system provides only an additional layer to a human activity with consequently
lowered risk. That condition would, for example, apply to AI systems that are
intended to improve the language used in previously drafted documents, for example
in relation to professional tone, academic style of language or by aligning text
to a certain brand messaging. The third condition should be that the AI system
is intended to detect decision-making patterns or deviations from prior decision-making
patterns. The risk would be lowered because the use of the AI system follows a previously
- (17)
- source_sentence: How do climate change mitigation and adaptation relate to the conservation
of biodiversity?
sentences:
- of the conditions referred to above should draw up documentation of the assessment
before that system is placed on the market or put into service and should provide
that documentation to national competent authorities upon request. Such a provider
should be obliged to register the AI system in the EU database established under
this Regulation. With a view to providing further guidance for the practical implementation
of the conditions under which the AI systems listed in an annex to this Regulation
are, on an exceptional basis, non-high-risk, the Commission should, after consulting
the Board, provide guidelines specifying that practical implementation, completed
by a comprehensive list of practical examples of use cases of AI systems that
- the conservation and restoration of biodiversity and ecosystems and climate change
mitigation and adaptation.
- logistical point of view.
- source_sentence: How often should the risk-management system be reviewed and updated
to maintain its effectiveness?
sentences:
- The risk-management system should consist of a continuous, iterative process that
is planned and run throughout the entire lifecycle of a high-risk AI system. That
process should be aimed at identifying and mitigating the relevant risks of AI
systems on health, safety and fundamental rights. The risk-management system should
be regularly reviewed and updated to ensure its continuing effectiveness, as well
as justification and documentation of any significant decisions and actions taken
subject to this Regulation. This process should ensure that the provider identifies
risks or adverse impacts and implements mitigation measures for the known and
reasonably foreseeable risks of AI systems to the health, safety and fundamental
rights in light
- solely on profiling them or on assessing their personality traits and characteristics
should be prohibited. In any case, that prohibition does not refer to or touch
upon risk analytics that are not based on the profiling of individuals or on the
personality traits and characteristics of individuals, such as AI systems using
risk analytics to assess the likelihood of financial fraud by undertakings on
the basis of suspicious transactions or risk analytic tools to predict the likelihood
of the localisation of narcotics or illicit goods by customs authorities, for
example on the basis of known trafficking routes.
- be clear and robust in protecting fundamental rights, supportive of new innovative
solutions, enabling a European ecosystem of public and private actors creating
AI systems in line with Union values and unlocking the potential of the digital
transformation across all regions of the Union. By laying down those rules as
well as measures in support of innovation with a particular focus on small and
medium enterprises (SMEs), including startups, this Regulation supports the objective
of promoting the European human-centric approach to AI and being a global leader
in the development of secure, trustworthy and ethical AI as stated by the European
Council (5), and it ensures the protection of ethical principles, as specifically
requested by the
- source_sentence: How is the number 42 used in mathematical contexts?
sentences:
- (65)
- (42)
- to obtain prior authorisation. This could be, for example, a person involved in
a crime, being unwilling, or unable due to an accident or a medical condition,
to disclose their identity to law enforcement authorities.
pipeline_tag: sentence-similarity
library_name: sentence-transformers
metrics:
- cosine_accuracy@1
- cosine_accuracy@3
- cosine_accuracy@5
- cosine_accuracy@10
- cosine_precision@1
- cosine_precision@3
- cosine_precision@5
- cosine_precision@10
- cosine_recall@1
- cosine_recall@3
- cosine_recall@5
- cosine_recall@10
- cosine_ndcg@10
- cosine_mrr@10
- cosine_map@100
model-index:
- name: SentenceTransformer based on Snowflake/snowflake-arctic-embed-l
results:
- task:
type: information-retrieval
name: Information Retrieval
dataset:
name: Unknown
type: unknown
metrics:
- type: cosine_accuracy@1
value: 0.875
name: Cosine Accuracy@1
- type: cosine_accuracy@3
value: 1.0
name: Cosine Accuracy@3
- type: cosine_accuracy@5
value: 1.0
name: Cosine Accuracy@5
- type: cosine_accuracy@10
value: 1.0
name: Cosine Accuracy@10
- type: cosine_precision@1
value: 0.875
name: Cosine Precision@1
- type: cosine_precision@3
value: 0.3333333333333333
name: Cosine Precision@3
- type: cosine_precision@5
value: 0.19999999999999998
name: Cosine Precision@5
- type: cosine_precision@10
value: 0.09999999999999999
name: Cosine Precision@10
- type: cosine_recall@1
value: 0.875
name: Cosine Recall@1
- type: cosine_recall@3
value: 1.0
name: Cosine Recall@3
- type: cosine_recall@5
value: 1.0
name: Cosine Recall@5
- type: cosine_recall@10
value: 1.0
name: Cosine Recall@10
- type: cosine_ndcg@10
value: 0.9484108127976215
name: Cosine Ndcg@10
- type: cosine_mrr@10
value: 0.9305555555555555
name: Cosine Mrr@10
- type: cosine_map@100
value: 0.9305555555555557
name: Cosine Map@100
---
# SentenceTransformer based on Snowflake/snowflake-arctic-embed-l
This is a [sentence-transformers](https://www.SBERT.net) model finetuned from [Snowflake/snowflake-arctic-embed-l](https://huggingface.co/Snowflake/snowflake-arctic-embed-l). It maps sentences & paragraphs to a 1024-dimensional dense vector space and can be used for semantic textual similarity, semantic search, paraphrase mining, text classification, clustering, and more.
## Model Details
### Model Description
- **Model Type:** Sentence Transformer
- **Base model:** [Snowflake/snowflake-arctic-embed-l](https://huggingface.co/Snowflake/snowflake-arctic-embed-l)
- **Maximum Sequence Length:** 512 tokens
- **Output Dimensionality:** 1024 dimensions
- **Similarity Function:** Cosine Similarity
### Model Sources
- **Documentation:** [Sentence Transformers Documentation](https://sbert.net)
- **Repository:** [Sentence Transformers on GitHub](https://github.com/UKPLab/sentence-transformers)
- **Hugging Face:** [Sentence Transformers on Hugging Face](https://huggingface.co/models?library=sentence-transformers)
### Full Model Architecture
```
SentenceTransformer(
  (0): Transformer({'max_seq_length': 512, 'do_lower_case': False}) with Transformer model: BertModel
  (1): Pooling({'word_embedding_dimension': 1024, 'pooling_mode_cls_token': True, 'pooling_mode_mean_tokens': False, 'pooling_mode_max_tokens': False, 'pooling_mode_mean_sqrt_len_tokens': False, 'pooling_mode_weightedmean_tokens': False, 'pooling_mode_lasttoken': False, 'include_prompt': True})
  (2): Normalize()
)
```
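The architecture ends in CLS pooling followed by `Normalize()`, so the model emits unit-length vectors and cosine similarity reduces to a plain dot product. The sketch below illustrates that pipeline tail with NumPy, using random data as a stand-in for real transformer activations (the shapes match the card: batch of 3, 512 tokens, 1024 dimensions):

```python
import numpy as np

rng = np.random.default_rng(0)

# Stand-in for the BertModel token outputs (assumption: random data,
# not real activations): batch of 3 sentences, 512 tokens, 1024 dims.
token_embeddings = rng.normal(size=(3, 512, 1024))

# CLS pooling: keep only the first token's embedding (pooling_mode_cls_token=True).
cls = token_embeddings[:, 0, :]

# Normalize(): scale each embedding to unit L2 norm.
embeddings = cls / np.linalg.norm(cls, axis=1, keepdims=True)

# For unit vectors, the dot product *is* the cosine similarity.
similarities = embeddings @ embeddings.T
print(similarities.shape)                        # (3, 3)
print(np.allclose(np.diag(similarities), 1.0))   # True: self-similarity is 1
```

Because of the final normalization, downstream vector stores can index these embeddings with either dot-product or cosine metrics and get identical rankings.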
## Usage
### Direct Usage (Sentence Transformers)
First install the Sentence Transformers library:
```bash
pip install -U sentence-transformers
```
Then you can load this model and run inference.
```python
from sentence_transformers import SentenceTransformer
# Download from the 🤗 Hub
model = SentenceTransformer("arthikrangan/legal-ft-1")
# Run inference
sentences = [
'How is the number 42 used in mathematical contexts?',
'(42)',
'(65)',
]
embeddings = model.encode(sentences)
print(embeddings.shape)
# (3, 1024)
# Get the similarity scores for the embeddings
similarities = model.similarity(embeddings, embeddings)
print(similarities.shape)
# torch.Size([3, 3])
```
## Evaluation
### Metrics
#### Information Retrieval
* Evaluated with [InformationRetrievalEvaluator](https://sbert.net/docs/package_reference/sentence_transformer/evaluation.html#sentence_transformers.evaluation.InformationRetrievalEvaluator)
| Metric | Value |
|:--------------------|:-----------|
| cosine_accuracy@1 | 0.875 |
| cosine_accuracy@3 | 1.0 |
| cosine_accuracy@5 | 1.0 |
| cosine_accuracy@10 | 1.0 |
| cosine_precision@1 | 0.875 |
| cosine_precision@3 | 0.3333 |
| cosine_precision@5 | 0.2 |
| cosine_precision@10 | 0.1 |
| cosine_recall@1 | 0.875 |
| cosine_recall@3 | 1.0 |
| cosine_recall@5 | 1.0 |
| cosine_recall@10 | 1.0 |
| **cosine_ndcg@10** | **0.9484** |
| cosine_mrr@10 | 0.9306 |
| cosine_map@100 | 0.9306 |
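A note on reading this table: each query in the evaluation set has a single relevant passage, so precision@k is capped at 1/k, which is why precision@3 saturates at 0.3333 once accuracy@3 reaches 1.0. The sketch below computes accuracy@k, precision@k, and MRR for a hypothetical run of 8 queries where 7 rank the relevant passage first and one ranks it second (toy ranks chosen to illustrate the pattern, not the actual evaluation data):

```python
# Rank of the single relevant passage for each toy query (1 = ranked first).
ranks = [1, 1, 1, 1, 1, 1, 1, 2]

def accuracy_at_k(ranks, k):
    """Fraction of queries whose relevant passage appears in the top k."""
    return sum(r <= k for r in ranks) / len(ranks)

def precision_at_k(ranks, k):
    """With one relevant passage per query, precision@k is at most 1/k."""
    return sum((1 if r <= k else 0) / k for r in ranks) / len(ranks)

def mrr(ranks):
    """Mean reciprocal rank of the first relevant passage."""
    return sum(1 / r for r in ranks) / len(ranks)

print(accuracy_at_k(ranks, 1))             # 0.875
print(accuracy_at_k(ranks, 3))             # 1.0
print(round(precision_at_k(ranks, 3), 4))  # 0.3333
print(mrr(ranks))                          # 0.9375
```

These toy numbers reproduce the accuracy@1 and precision@3 pattern above; the real MRR (0.9306) differs slightly because the actual rank distribution is not known from the card.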
## Training Details
### Training Dataset
#### Unnamed Dataset
* Size: 400 training samples
* Columns: <code>sentence_0</code> and <code>sentence_1</code>
* Approximate statistics based on the first 400 samples:
  |      | sentence_0 | sentence_1 |
  |:-----|:-----------|:-----------|
  | type | string     | string     |
* Samples:
  | sentence_0                                                                    | sentence_1                                             |
  |:------------------------------------------------------------------------------|:-------------------------------------------------------|
  | <code>What was requested by the European Parliament?</code>                   | <code>requested by the European Parliament (6).</code> |
  | <code>Who made the request to the European Parliament?</code>                 | <code>requested by the European Parliament (6).</code> |
  | <code>What is the significance of the number 60 in the given context?</code>  | <code>(60)</code>                                      |
* Loss: [MatryoshkaLoss](https://sbert.net/docs/package_reference/sentence_transformer/losses.html#matryoshkaloss) with these parameters:
```json
{
"loss": "MultipleNegativesRankingLoss",
"matryoshka_dims": [
768,
512,
256,
128,
64
],
"matryoshka_weights": [
1,
1,
1,
1,
1
],
"n_dims_per_step": -1
}
```
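MatryoshkaLoss applies the contrastive loss not only to the full embedding but also to its leading 768, 512, 256, 128, and 64 dimensions (all weighted equally here), so truncated prefixes of the vector remain useful on their own. Downstream, that lets you trade accuracy for index size by slicing and re-normalizing, as in this sketch (a random vector stands in for a real model embedding):

```python
import numpy as np

rng = np.random.default_rng(0)

# Stand-in for one 1024-dim embedding from the model (assumption: random vector).
full = rng.normal(size=1024)
full /= np.linalg.norm(full)

# Truncate to each trained Matryoshka dimension and re-normalize,
# as a downstream index would do to shrink storage and speed up search.
for dim in [768, 512, 256, 128, 64]:
    truncated = full[:dim]
    truncated = truncated / np.linalg.norm(truncated)
    print(dim, truncated.shape)
```

With Sentence Transformers itself, the same effect can typically be achieved by loading the model with a reduced output dimensionality rather than slicing by hand.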
### Training Hyperparameters
#### Non-Default Hyperparameters
- `eval_strategy`: steps
- `per_device_train_batch_size`: 10
- `per_device_eval_batch_size`: 10
- `num_train_epochs`: 10
- `multi_dataset_batch_sampler`: round_robin
#### All Hyperparameters