|
--- |
|
library_name: hierarchy-transformers |
|
pipeline_tag: feature-extraction |
|
tags: |
|
- hierarchy-transformers |
|
- feature-extraction |
|
- hierarchy-encoding |
|
- subsumption-relationships |
|
- transformers |
|
license: apache-2.0 |
|
language: |
|
- en |
|
metrics: |
|
- precision |
|
- recall |
|
- f1 |
|
base_model: |
|
- sentence-transformers/all-MiniLM-L6-v2 |
|
--- |
|
|
|
# Hierarchy-Transformers/HiT-MiniLM-L6-WordNetNoun |
|
|
|
A **Hi**erarchy **T**ransformer Encoder (HiT) model that explicitly encodes entities according to their hierarchical relationships. |
|
|
|
### Model Description |
|
|
|
<!-- Provide a longer summary of what this model is. --> |
|
|
|
HiT-MiniLM-L6-WordNetNoun is a HiT model trained on WordNet's subsumption (hypernym) hierarchy of noun entities.
|
|
|
- **Developed by:** [Yuan He](https://www.yuanhe.wiki/), Zhangdie Yuan, Jiaoyan Chen, and Ian Horrocks |
|
- **Model type:** Hierarchy Transformer Encoder (HiT) |
|
- **License:** Apache License 2.0
|
- **Hierarchy**: WordNet's subsumption (hypernym) hierarchy of noun entities. |
|
- **Training Dataset**: [Hierarchy-Transformers/WordNetNoun](https://huggingface.co/datasets/Hierarchy-Transformers/WordNetNoun) |
|
- **Pre-trained model:** [sentence-transformers/all-MiniLM-L6-v2](https://huggingface.co/sentence-transformers/all-MiniLM-L6-v2) |
|
- **Training Objectives**: Jointly optimised on the *Hyperbolic Clustering* and *Hyperbolic Centripetal* losses (see definitions in the [paper](https://arxiv.org/abs/2401.11374); a rough sketch is given below)
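
For intuition, here is a minimal sketch of what the two objectives look like on a `geoopt` Poincaré ball. The curvature, the margins `alpha`/`beta`, and the function name `hit_joint_loss` are illustrative placeholders; the exact formulations and hyperparameters are given in the paper.

```python
import torch
import geoopt

# placeholder curvature and margins; the actual values are hyperparameters (see the paper)
manifold = geoopt.PoincareBall(c=1.0)
alpha, beta = 1.0, 0.1

def hit_joint_loss(child, parent, negative):
    """Sketch of the joint objective for (child, parent, negative) embedding triples."""
    # hyperbolic clustering loss: related pairs should be closer than negative pairs, by a margin
    clustering = torch.relu(
        manifold.dist(child, parent) - manifold.dist(child, negative) + alpha
    ).mean()
    # hyperbolic centripetal loss: parents should lie closer to the manifold origin than children
    centripetal = torch.relu(
        manifold.dist0(parent) - manifold.dist0(child) + beta
    ).mean()
    return clustering + centripetal
```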
|
|
|
### Model Versions |
|
|
|
| **Version** | **Model Revision** | **Note** | |
|
|------------|---------|----------| |
|
|v1.0 (Random Negatives)| `main` or `v1-random-negatives`| The variant trained on random negatives, as detailed in the [paper](https://arxiv.org/abs/2401.11374).| |
|
|v1.0 (Hard Negatives)| `v1-hard-negatives` | The variant trained on hard negatives, as detailed in the [paper](https://arxiv.org/abs/2401.11374). | |
|
|
|
|
|
### Model Sources |
|
|
|
<!-- Provide the basic links for the model. --> |
|
|
|
- **Repository:** https://github.com/KRR-Oxford/HierarchyTransformers |
|
- **Paper:** [Language Models as Hierarchy Encoders](https://arxiv.org/abs/2401.11374) |
|
|
|
## Usage |
|
|
|
<!-- Address questions around how the model is intended to be used, including the foreseeable users of the model and those affected by the model. --> |
|
|
|
HiT models are used to encode entities (represented as text) and predict their hierarchical relationships in hyperbolic space.
|
|
|
### Get Started |
|
|
|
Install the `hierarchy_transformers` package via `pip` or from source; see our [repository](https://github.com/KRR-Oxford/HierarchyTransformers) for detailed instructions.
|
|
|
Use the code below to get started with the model. |
|
|
|
```python |
|
from hierarchy_transformers import HierarchyTransformer |
|
|
|
# load the model |
|
model = HierarchyTransformer.from_pretrained('Hierarchy-Transformers/HiT-MiniLM-L6-WordNetNoun')
|
|
|
# entity names to be encoded. |
|
entity_names = ["computer", "personal computer", "fruit", "berry"] |
|
|
|
# get the entity embeddings |
|
entity_embeddings = model.encode(entity_names) |
|
``` |
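
The returned embeddings are points on the model's hyperbolic (Poincaré ball) manifold, with the same dimensionality as the backbone's hidden size. A quick sanity check (shapes shown are indicative):

```python
# the embeddings live on the model's Poincaré ball manifold
print(entity_embeddings.shape)  # e.g. (4, 384) for a MiniLM-L6 backbone
print(model.manifold)           # the manifold used for distance-based scoring below
```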
|
|
|
### Default Probing for Subsumption Prediction |
|
|
|
Use the entity embeddings to predict the subsumption relationships between them. |
|
|
|
```python |
|
# suppose we want to compare "personal computer" and "computer", "berry" and "fruit" |
|
child_entity_embeddings = model.encode(["personal computer", "berry"], convert_to_tensor=True) |
|
parent_entity_embeddings = model.encode(["computer", "fruit"], convert_to_tensor=True) |
|
|
|
# compute the hyperbolic distances and norms of entity embeddings |
|
dists = model.manifold.dist(child_entity_embeddings, parent_entity_embeddings) |
|
child_norms = model.manifold.dist0(child_entity_embeddings) |
|
parent_norms = model.manifold.dist0(parent_entity_embeddings) |
|
|
|
# use the empirical scoring function for subsumption prediction proposed in the paper;
# `centri_score_weight` and the decision threshold should be tuned on the validation set
centri_score_weight = 1.0  # placeholder value for illustration

subsumption_scores = - (dists + centri_score_weight * (parent_norms - child_norms))
|
``` |
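
To turn the scores into binary predictions, apply the threshold tuned on the validation set. The value below is a placeholder for illustration, not a number from the paper:

```python
# placeholder threshold; in practice, choose the value that maximises F1 on the validation set
threshold = -5.0

predictions = subsumption_scores > threshold  # boolean tensor, one entry per (child, parent) pair
```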
|
|
|
### Train Your Own Models |
|
|
|
Use the example scripts in our [repository](https://github.com/KRR-Oxford/HierarchyTransformers/tree/main/scripts) to reproduce existing models and train/evaluate your own models. |
|
|
|
|
|
|
|
## Full Model Architecture |
|
``` |
|
HierarchyTransformer( |
|
(0): Transformer({'max_seq_length': 128, 'do_lower_case': False}) with Transformer model: BertModel |
|
(1): Pooling({'word_embedding_dimension': 384, 'pooling_mode_cls_token': False, 'pooling_mode_mean_tokens': True, 'pooling_mode_max_tokens': False, 'pooling_mode_mean_sqrt_len_tokens': False, 'pooling_mode_weightedmean_tokens': False, 'pooling_mode_lasttoken': False}) |
|
) |
|
``` |
|
|
|
## Citation |
|
|
|
<!-- If there is a paper or blog post introducing the model, the APA and Bibtex information for that should go in this section. --> |
|
|
|
*Yuan He, Zhangdie Yuan, Jiaoyan Chen, Ian Horrocks.* **Language Models as Hierarchy Encoders.** Advances in Neural Information Processing Systems 37 (NeurIPS 2024). |
|
|
|
``` |
|
@inproceedings{NEURIPS2024_1a970a3e, |
|
author = {He, Yuan and Yuan, Moy and Chen, Jiaoyan and Horrocks, Ian}, |
|
booktitle = {Advances in Neural Information Processing Systems}, |
|
editor = {A. Globerson and L. Mackey and D. Belgrave and A. Fan and U. Paquet and J. Tomczak and C. Zhang}, |
|
pages = {14690--14711}, |
|
publisher = {Curran Associates, Inc.}, |
|
title = {Language Models as Hierarchy Encoders}, |
|
url = {https://proceedings.neurips.cc/paper_files/paper/2024/file/1a970a3e62ac31c76ec3cea3a9f68fdf-Paper-Conference.pdf}, |
|
volume = {37}, |
|
year = {2024} |
|
} |
|
``` |
|
|
|
|
|
## Model Card Contact |
|
|
|
For any queries or feedback, please contact Yuan He (`yuan.he(at)cs.ox.ac.uk`). |