---
language:
- en
pipeline_tag: text-generation
tags:
- llama-3.1
- astronomy
- astrophysics
- cosmology
- arxiv
- gguf
- quantized
inference: true
base_model:
- meta-llama/Meta-Llama-3.1-8B
---

# AstroSage-Llama-3.1-8B-GGUF

https://arxiv.org/abs/2411.09012

AstroSage-Llama-3.1-8B-GGUF is the quantized version of AstroSage-Llama-3.1-8B. It provides a more accessible deployment option while preserving the model's specialized capabilities in astronomy, astrophysics, and cosmology.

## Model Details

- **Base Architecture**: Meta-Llama-3.1-8B
- **Base Model**: AstroSage-Llama-3.1-8B
- **Parameters**: 8 billion
- **Quantization**: GGUF format with two precision options
- **Training Focus**: Astronomy, Astrophysics, Cosmology, and Astronomical Instrumentation
- **License**: Llama 3.1 Community License
- **Development Process**:
  1. Based on the fully trained AstroSage-Llama-3.1-8B model
  2. Quantized to GGUF format in two versions
  3. Optimized for efficient inference

## Using the Model

### Python Implementation

```python
from llama_cpp import Llama
from huggingface_hub import hf_hub_download
import os
import sys
import contextlib

# Suppress noisy llama.cpp warnings written to stderr
@contextlib.contextmanager
def suppress_stderr():
    stderr = sys.stderr
    with open(os.devnull, 'w') as devnull:
        sys.stderr = devnull
        try:
            yield
        finally:
            sys.stderr = stderr

# Or change the filename to AstroSage-8B-BF16.gguf for the BF16 version
def download_model(repo_id="AstroMLab/AstroSage-8B-GGUF", filename="AstroSage-8B-Q8_0.gguf"):
    try:
        os.makedirs("models", exist_ok=True)
        local_path = os.path.join("models", filename)
        if not os.path.exists(local_path):
            print(f"Downloading {filename}...")
            with suppress_stderr():
                local_path = hf_hub_download(
                    repo_id=repo_id,
                    filename=filename,
                    local_dir="models",
                    local_dir_use_symlinks=False
                )
            print("Download complete!")
        return local_path
    except Exception as e:
        print(f"Error downloading model: {e}")
        raise

def initialize_llm():
    model_path = download_model()
    with suppress_stderr():
        return Llama(
            model_path=model_path,
            n_ctx=2048,
            n_threads=4
        )

def get_response(llm, prompt, max_tokens=128):
    response = llm(
        prompt,
        max_tokens=max_tokens,
        temperature=0.7,
        top_p=0.9,
        top_k=40,
        repeat_penalty=1.1,
        stop=["User:", "\n\n"]
    )
    return response['choices'][0]['text']

def main():
    llm = initialize_llm()

    # Example question about galaxy formation
    first_question = "How does a galaxy form?"
    print("\nQuestion:", first_question)
    print("\nAI:", get_response(llm, first_question).strip(), "\n")

    print("\nYou can now ask more questions! Type 'quit' or 'exit' to end the conversation.\n")

    while True:
        try:
            user_input = input("You: ")
            if user_input.lower() in ['quit', 'exit']:
                print("\nGoodbye!")
                break
            print("\nAI:", get_response(llm, user_input).strip(), "\n")
        except KeyboardInterrupt:
            print("\nGoodbye!")
            break
        except Exception as e:
            print(f"Error: {e}")

if __name__ == "__main__":
    main()
```
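In addition to plain text completion, `llama-cpp-python` exposes a chat-style interface, `create_chat_completion`, which formats messages with the chat template stored in the GGUF metadata. Below is a minimal sketch that builds on the `initialize_llm` helper above; the system prompt and sampling values are illustrative choices, not recommendations from the model card.

```python
def get_chat_response(llm, user_message, max_tokens=256):
    # create_chat_completion applies the model's chat template to the messages
    result = llm.create_chat_completion(
        messages=[
            {"role": "system",
             "content": "You are an assistant specialized in astronomy, astrophysics, and cosmology."},
            {"role": "user", "content": user_message},
        ],
        max_tokens=max_tokens,
        temperature=0.7,
        top_p=0.9,
    )
    return result["choices"][0]["message"]["content"]

# Example usage:
# llm = initialize_llm()
# print(get_chat_response(llm, "Why are spiral galaxies thin?"))
```

Both this helper and `get_response` operate on the same loaded `Llama` instance, so the two styles can be mixed within a single session.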
### Installation Requirements

```bash
pip install llama-cpp-python huggingface_hub
```

On a MacBook with Apple Silicon, install llama-cpp-python with the following instead:

```bash
CMAKE_ARGS="-DCMAKE_OSX_ARCHITECTURES=arm64 -DLLAMA_METAL=on" pip install llama-cpp-python
```

### Key Parameters

- `n_ctx`: Context window size (default: 2048)
- `n_threads`: Number of CPU threads to use (adjust based on your hardware)
- `temperature`: Controls randomness
- `top_p`: Nucleus sampling parameter
- `top_k`: Limits vocabulary choices
- `repeat_penalty`: Penalizes repetition
- `max_tokens`: Maximum length of the response (default: 128; increase for longer answers)

### Example Usage

The script will automatically:

1. Download the quantized model from Hugging Face
2. Initialize it with recommended parameters
3. Start with an example question about galaxy formation
4. Allow for interactive conversation
5. Support easy exit with the 'quit' or 'exit' commands

For different use cases, you can (see the sketch after this list):

- Use the BF16 version for maximum accuracy
- Adjust the context window size for longer conversations
- Modify the temperature for more or less deterministic responses
- Change `max_tokens` for longer or shorter responses
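As a concrete illustration of these adjustments, the sketch below reuses the `download_model` helper from the Python example above to fetch the BF16 file, opens a larger context window, and requests a longer, lower-temperature completion. The specific values and the example prompt are illustrative, not tuned recommendations.

```python
from llama_cpp import Llama

# Fetch the BF16 file instead of the default Q8_0 file
# (download_model is the helper defined in the Python example above)
model_path = download_model(filename="AstroSage-8B-BF16.gguf")

# Larger context window for longer conversations; more threads if your CPU allows
llm = Llama(
    model_path=model_path,
    n_ctx=4096,
    n_threads=8,
)

# Lower temperature and a larger token budget for a longer, more deterministic answer
response = llm(
    "Explain the Sunyaev-Zel'dovich effect.",
    max_tokens=512,
    temperature=0.3,
    top_p=0.9,
    repeat_penalty=1.1,
)
print(response["choices"][0]["text"])
```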
## Model Improvements and Performance

The quantized model offers several advantages:

- Reduced memory requirements
- CPU inference capability
- Faster inference speed
- Broader hardware compatibility

Note: Formal benchmarking of the quantized model is pending. Performance metrics will be updated once comprehensive testing is completed.

## Quantization Details

- **Format**: GGUF
- **Available Versions**:
  - AstroSage-8B-BF16.gguf: bfloat16, the original model precision
  - AstroSage-8B-Q8_0.gguf: 8-bit quantized, negligible loss in perplexity, smaller size
- **Compatibility**: Works with llama.cpp and derived projects
- **Trade-offs**:
  - BF16:
    - Best quality, closest to the original model's behavior
    - Larger file size and memory requirements
    - Recommended for accuracy-critical applications
  - Q8_0:
    - Reduced memory footprint
    - Good balance of performance and size
    - Suitable for most general applications

## Intended Use

- Curiosity-driven question answering
- Brainstorming new ideas
- Astronomical research assistance
- Educational support in astronomy
- Literature review and summarization
- Scientific explanation of concepts
- Low-resource deployment scenarios
- Edge device deployment
- CPU-only environments
- Applications requiring a reduced memory footprint

## Limitations

- All limitations of the original model apply
- Additional considerations:
  - Potential reduction in prediction accuracy due to quantization
  - May show increased variance in numeric calculations
  - Reduced precision in edge cases
  - Performance may vary based on hardware configuration

## Technical Specifications

- Architecture: Meta-Llama 3.1
- Deployment: CPU-friendly, reduced memory footprint
- Format: GGUF (compatible with llama.cpp)

## Ethical Considerations

While this model is designed for scientific use:

- It should not be used as the sole source for critical research decisions
- Output should be verified against primary sources
- It may reflect biases present in the astronomical literature

## Citation and Contact

- Corresponding author: Tijmen de Haan (tijmen dot dehaan at gmail dot com)
- AstroMLab: astromachinelearninglab at gmail dot com
- Please cite the AstroMLab 3 paper when referencing this model:

```
@preprint{dehaan2024astromlab3,
  title={AstroMLab 3: Achieving GPT-4o Level Performance in Astronomy with a Specialized 8B-Parameter Large Language Model},
  author={Tijmen de Haan and Yuan-Sen Ting and Tirthankar Ghosal and Tuan Dung Nguyen and Alberto Accomazzi and Azton Wells and Nesar Ramachandra and Rui Pan and Zechang Sun},
  year={2024},
  eprint={2411.09012},
  archivePrefix={arXiv},
  primaryClass={astro-ph.IM},
  url={https://arxiv.org/abs/2411.09012},
}
```

Additional note: When citing this quantized version, please cite the AstroMLab 3 paper above and specify that the GGUF quantized variant was used.