|
--- |
|
library_name: hierarchy-transformers |
|
pipeline_tag: feature-extraction |
|
tags: |
|
- hierarchy-transformers |
|
- feature-extraction |
|
- hierarchy-encoding |
|
- subsumption-relationships |
|
- transformers |
|
license: apache-2.0 |
|
language: |
|
- en |
|
metrics: |
|
- precision |
|
- recall |
|
- f1 |
|
base_model: |
|
- sentence-transformers/all-MiniLM-L6-v2 |
|
--- |
|
|
|
# Hierarchy-Transformers/HiT-MiniLM-L6-WordNetNoun |
|
|
|
A **Hi**erarchy **T**ransformer Encoder (HiT) model that explicitly encodes entities according to their hierarchical relationships. |
|
|
|
### Model Description |
|
|
|
<!-- Provide a longer summary of what this model is. --> |
|
|
|
HiT-MiniLM-L6-WordNetNoun is a HiT model trained on WordNet's subsumption (hypernym) hierarchy of noun entities.
|
|
|
- **Developed by:** [Yuan He](https://www.yuanhe.wiki/), Zhangdie Yuan, Jiaoyan Chen, and Ian Horrocks |
|
- **Model type:** Hierarchy Transformer Encoder (HiT) |
|
- **License:** Apache License 2.0
|
- **Hierarchy**: WordNet's subsumption (hypernym) hierarchy of noun entities. |
|
- **Training Dataset**: [Hierarchy-Transformers/WordNetNoun](https://huggingface.co/datasets/Hierarchy-Transformers/WordNetNoun) |
|
- **Pre-trained model:** [sentence-transformers/all-MiniLM-L6-v2](https://huggingface.co/sentence-transformers/all-MiniLM-L6-v2) |
|
- **Training Objectives**: Jointly optimised on the *Hyperbolic Clustering* and *Hyperbolic Centripetal* losses (see definitions in the [paper](https://arxiv.org/abs/2401.11374); a rough sketch is given below)
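
For intuition, here is a minimal sketch of what the two objectives look like on a `geoopt` Poincaré ball. The curvature, the margins `alpha`/`beta`, and the function name `hit_joint_loss` are illustrative placeholders; the exact formulations and hyperparameters are given in the paper.

```python
import torch
import geoopt

# placeholder curvature and margins; the actual values are hyperparameters (see the paper)
manifold = geoopt.PoincareBall(c=1.0)
alpha, beta = 1.0, 0.1

def hit_joint_loss(child, parent, negative):
    """Sketch of the joint objective for (child, parent, negative) embedding triples."""
    # hyperbolic clustering loss: related pairs should be closer than negative pairs, by a margin
    clustering = torch.relu(
        manifold.dist(child, parent) - manifold.dist(child, negative) + alpha
    ).mean()
    # hyperbolic centripetal loss: parents should lie closer to the manifold origin than children
    centripetal = torch.relu(
        manifold.dist0(parent) - manifold.dist0(child) + beta
    ).mean()
    return clustering + centripetal
```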
|
|
|
### Model Versions |
|
|
|
| **Version** | **Model Revision** | **Note** | |
|
|------------|---------|----------| |
|
|v1.0 (Random Negatives)| `main` or `v1-random-negatives`| The variant trained on random negatives, as detailed in the [paper](https://arxiv.org/abs/2401.11374).| |
|
|v1.0 (Hard Negatives)| `v1-hard-negatives` | The variant trained on hard negatives, as detailed in the [paper](https://arxiv.org/abs/2401.11374). | |
|
|
|
|
|
### Model Sources |
|
|
|
<!-- Provide the basic links for the model. --> |
|
|
|
- **Repository:** https://github.com/KRR-Oxford/HierarchyTransformers |
|
- **Paper:** [Language Models as Hierarchy Encoders](https://arxiv.org/abs/2401.11374) |
|
|
|
## Usage |
|
|
|
<!-- Address questions around how the model is intended to be used, including the foreseeable users of the model and those affected by the model. --> |
|
|
|
HiT models are used to encode entities (represented as text) and predict their hierarchical relationships in hyperbolic space.
|
|
|
### Get Started |
|
|
|
Install the `hierarchy_transformers` package via `pip` or from source; see our [repository](https://github.com/KRR-Oxford/HierarchyTransformers) for detailed instructions.
|
|
|
Use the code below to get started with the model. |
|
|
|
```python |
|
from hierarchy_transformers import HierarchyTransformer |
|
|
|
# load the model |
|
model = HierarchyTransformer.from_pretrained('Hierarchy-Transformers/HiT-MiniLM-L6-WordNetNoun')
|
|
|
# entity names to be encoded. |
|
entity_names = ["computer", "personal computer", "fruit", "berry"] |
|
|
|
# get the entity embeddings |
|
entity_embeddings = model.encode(entity_names) |
|
``` |
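
The returned embeddings are points on the model's hyperbolic (Poincaré ball) manifold, with the same dimensionality as the backbone's hidden size. A quick sanity check (shapes shown are indicative):

```python
# the embeddings live on the model's Poincaré ball manifold
print(entity_embeddings.shape)  # e.g. (4, 384) for a MiniLM-L6 backbone
print(model.manifold)           # the manifold used for distance-based scoring below
```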
|
|
|
### Default Probing for Subsumption Prediction |
|
|
|
Use the entity embeddings to predict the subsumption relationships between them. |
|
|
|
```python |
|
# suppose we want to compare "personal computer" and "computer", "berry" and "fruit" |
|
child_entity_embeddings = model.encode(["personal computer", "berry"], convert_to_tensor=True) |
|
parent_entity_embeddings = model.encode(["computer", "fruit"], convert_to_tensor=True) |
|
|
|
# compute the hyperbolic distances and norms of entity embeddings |
|
dists = model.manifold.dist(child_entity_embeddings, parent_entity_embeddings) |
|
child_norms = model.manifold.dist0(child_entity_embeddings) |
|
parent_norms = model.manifold.dist0(parent_entity_embeddings) |
|
|
|
# use the empirical scoring function for subsumption prediction proposed in the paper;
# `centri_score_weight` and the decision threshold should be tuned on the validation set
centri_score_weight = 1.0  # placeholder value for illustration

subsumption_scores = - (dists + centri_score_weight * (parent_norms - child_norms))
|
``` |
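
To turn the scores into binary predictions, apply the threshold tuned on the validation set. The value below is a placeholder for illustration, not a number from the paper:

```python
# placeholder threshold; in practice, choose the value that maximises F1 on the validation set
threshold = -5.0

predictions = subsumption_scores > threshold  # boolean tensor, one entry per (child, parent) pair
```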
|
|
|
### Train Your Own Models |
|
|
|
Use the example scripts in our [repository](https://github.com/KRR-Oxford/HierarchyTransformers/tree/main/scripts) to reproduce existing models and train/evaluate your own models. |
|
|
|
|
|
|
|
## Full Model Architecture |
|
``` |
|
HierarchyTransformer( |
|
(0): Transformer({'max_seq_length': 128, 'do_lower_case': False}) with Transformer model: BertModel |
|
(1): Pooling({'word_embedding_dimension': 384, 'pooling_mode_cls_token': False, 'pooling_mode_mean_tokens': True, 'pooling_mode_max_tokens': False, 'pooling_mode_mean_sqrt_len_tokens': False, 'pooling_mode_weightedmean_tokens': False, 'pooling_mode_lasttoken': False}) |
|
) |
|
``` |
|
|
|
## Citation |
|
|
|
<!-- If there is a paper or blog post introducing the model, the APA and Bibtex information for that should go in this section. --> |
|
|
|
*Yuan He, Zhangdie Yuan, Jiaoyan Chen, Ian Horrocks.* **Language Models as Hierarchy Encoders.** Advances in Neural Information Processing Systems 37 (NeurIPS 2024). |
|
|
|
``` |
|
@inproceedings{NEURIPS2024_1a970a3e, |
|
author = {He, Yuan and Yuan, Moy and Chen, Jiaoyan and Horrocks, Ian}, |
|
booktitle = {Advances in Neural Information Processing Systems}, |
|
editor = {A. Globerson and L. Mackey and D. Belgrave and A. Fan and U. Paquet and J. Tomczak and C. Zhang}, |
|
pages = {14690--14711}, |
|
publisher = {Curran Associates, Inc.}, |
|
title = {Language Models as Hierarchy Encoders}, |
|
url = {https://proceedings.neurips.cc/paper_files/paper/2024/file/1a970a3e62ac31c76ec3cea3a9f68fdf-Paper-Conference.pdf}, |
|
volume = {37}, |
|
year = {2024} |
|
} |
|
``` |
|
|
|
|
|
## Model Card Contact |
|
|
|
For any queries or feedback, please contact Yuan He (`yuan.he(at)cs.ox.ac.uk`). |