Architecture & Training Configuration:

  • Base Model Configuration: This variant follows the Llama2-7B architecture configuration, providing a well-established foundation for the reproduction run.

  • Sequence Length Adaptation: Data originally preprocessed for a sequence length of 2048 was detokenized and re-encoded to a sequence length of 4096, following the Megatron-LM preprocessing pipeline, so the model trains on longer contiguous contexts (a minimal repacking sketch follows this list).

  • Batch Size & Token Management: Each optimizer step processes a global batch of roughly 4 million tokens, sized to match the longer 4096-token sequences (the arithmetic is worked out after this list).

  • Grouped-Query Attention (GQA) Integration: The configuration enables grouped-query attention with 32 query heads and a group size of 4, which reduces key/value memory and improves training and inference efficiency relative to full multi-head attention (a minimal GQA sketch follows this list).

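A minimal sketch of the sequence-length repacking step, assuming the 2048-token samples are available as plain lists of token IDs and that the same Llama2 tokenizer is used for decoding and re-encoding; the function name and data handling are illustrative, not Megatron-LM's actual preprocessing API.

```python
from transformers import AutoTokenizer

# Illustrative repacking: decode old 2048-token samples back to text, then
# re-encode and pack the token stream into 4096-token sequences.
# `old_samples` (an iterable of token-ID lists) and the helper name are
# assumptions for this sketch, not part of the released pipeline.
tokenizer = AutoTokenizer.from_pretrained("meta-llama/Llama-2-7b-hf")
NEW_SEQ_LEN = 4096

def detokenize_and_repack(old_samples, seq_len=NEW_SEQ_LEN):
    buffer = []
    for ids in old_samples:
        text = tokenizer.decode(ids, skip_special_tokens=True)   # back to raw text
        buffer.extend(tokenizer.encode(text, add_special_tokens=False))
        while len(buffer) >= seq_len:                            # emit full-length sequences
            yield buffer[:seq_len]
            buffer = buffer[seq_len:]
```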
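The 4-million-token batch can be sanity-checked against the new sequence length; the quick computation below assumes "4 million" means 4 × 2^20 tokens per step and that every sequence is fully packed to 4096 tokens.

```python
# Back-of-the-envelope check of the batch configuration. The exact token
# count per step is an assumption here, not a value read from the config.
seq_len = 4096
tokens_per_step = 4 * 1024 * 1024        # "4 million tokens" per global batch
global_batch_size = tokens_per_step // seq_len
print(global_batch_size)                 # 1024 sequences per optimizer step
```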
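A minimal grouped-query attention sketch in PyTorch, assuming "a group size of 4" means that every 4 query heads share one key/value head (32 query heads, 8 KV heads) and using the Llama2-7B hidden size of 4096; all tensor shapes and module names are illustrative.

```python
import torch
import torch.nn.functional as F

# Grouped-query attention: 32 query heads, group size 4 -> 8 key/value heads.
hidden_size, n_q_heads, group_size = 4096, 32, 4
n_kv_heads = n_q_heads // group_size          # 8
head_dim = hidden_size // n_q_heads           # 128

w_q = torch.nn.Linear(hidden_size, n_q_heads * head_dim, bias=False)
w_k = torch.nn.Linear(hidden_size, n_kv_heads * head_dim, bias=False)
w_v = torch.nn.Linear(hidden_size, n_kv_heads * head_dim, bias=False)

batch, seq_len = 2, 16                        # toy sizes for the demo
x = torch.randn(batch, seq_len, hidden_size)

q = w_q(x).view(batch, seq_len, n_q_heads, head_dim).transpose(1, 2)
k = w_k(x).view(batch, seq_len, n_kv_heads, head_dim).transpose(1, 2)
v = w_v(x).view(batch, seq_len, n_kv_heads, head_dim).transpose(1, 2)

# Each key/value head serves a group of `group_size` query heads.
k = k.repeat_interleave(group_size, dim=1)
v = v.repeat_interleave(group_size, dim=1)

out = F.scaled_dot_product_attention(q, k, v, is_causal=True)
out = out.transpose(1, 2).reshape(batch, seq_len, hidden_size)
```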