---
language:
- en
pipeline_tag: text-generation
tags:
- llama-3.1
- astronomy
- astrophysics
- cosmology
- arxiv
- gguf
- quantized
inference: true
base_model:
- meta-llama/Meta-Llama-3.1-8B
---

# AstroSage-Llama-3.1-8B-GGUF

https://arxiv.org/abs/2411.09012

AstroSage-Llama-3.1-8B-GGUF is the quantized version of AstroSage-Llama-3.1-8B. It provides a more accessible deployment option while preserving the model's specialized capabilities in astronomy, astrophysics, and cosmology.

## Model Details

- **Base Architecture**: Meta-Llama-3.1-8B
- **Base Model**: AstroSage-Llama-3.1-8B
- **Parameters**: 8 billion
- **Quantization**: GGUF format with two precision options
- **Training Focus**: Astronomy, Astrophysics, Cosmology, and Astronomical Instrumentation
- **License**: Llama 3.1 Community License
- **Development Process**:
  1. Based on the fully trained AstroSage-Llama-3.1-8B model
  2. Quantized to GGUF format in two versions
  3. Optimized for efficient inference

## Using the Model

### Python Implementation

```python
from llama_cpp import Llama
from huggingface_hub import hf_hub_download
import os
import sys
import contextlib

# Suppress noisy llama.cpp warnings written to stderr
@contextlib.contextmanager
def suppress_stderr():
    stderr = sys.stderr
    with open(os.devnull, 'w') as devnull:
        sys.stderr = devnull
        try:
            yield
        finally:
            sys.stderr = stderr

# Or change the filename to AstroSage-8B-BF16.gguf for the BF16 version
def download_model(repo_id="AstroMLab/AstroSage-8B-GGUF", filename="AstroSage-8B-Q8_0.gguf"):
    try:
        os.makedirs("models", exist_ok=True)
        local_path = os.path.join("models", filename)
        if not os.path.exists(local_path):
            print(f"Downloading {filename}...")
            with suppress_stderr():
                local_path = hf_hub_download(
                    repo_id=repo_id,
                    filename=filename,
                    local_dir="models",
                    local_dir_use_symlinks=False
                )
            print("Download complete!")
        return local_path
    except Exception as e:
        print(f"Error downloading model: {e}")
        raise

def initialize_llm():
    model_path = download_model()
    with suppress_stderr():
        return Llama(
            model_path=model_path,
            n_ctx=2048,
            n_threads=4
        )

def get_response(llm, prompt, max_tokens=128):
    response = llm(
        prompt,
        max_tokens=max_tokens,
        temperature=0.7,
        top_p=0.9,
        top_k=40,
        repeat_penalty=1.1,
        stop=["User:", "\n\n"]
    )
    return response['choices'][0]['text']

def main():
    llm = initialize_llm()

    # Example question about galaxy formation
    first_question = "How does a galaxy form?"
    print("\nQuestion:", first_question)
    print("\nAI:", get_response(llm, first_question).strip(), "\n")

    print("\nYou can now ask more questions! Type 'quit' or 'exit' to end the conversation.\n")

    while True:
        try:
            user_input = input("You: ")
            if user_input.lower() in ['quit', 'exit']:
                print("\nGoodbye!")
                break
            print("\nAI:", get_response(llm, user_input).strip(), "\n")
        except KeyboardInterrupt:
            print("\nGoodbye!")
            break
        except Exception as e:
            print(f"Error: {e}")

if __name__ == "__main__":
    main()
```
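In addition to plain text completion, `llama-cpp-python` exposes a chat-style interface, `create_chat_completion`, which formats messages with the chat template stored in the GGUF metadata. Below is a minimal sketch that builds on the `initialize_llm` helper above; the system prompt and sampling values are illustrative choices, not recommendations from the model card.

```python
def get_chat_response(llm, user_message, max_tokens=256):
    # create_chat_completion applies the model's chat template to the messages
    result = llm.create_chat_completion(
        messages=[
            {"role": "system",
             "content": "You are an assistant specialized in astronomy, astrophysics, and cosmology."},
            {"role": "user", "content": user_message},
        ],
        max_tokens=max_tokens,
        temperature=0.7,
        top_p=0.9,
    )
    return result["choices"][0]["message"]["content"]

# Example usage:
# llm = initialize_llm()
# print(get_chat_response(llm, "Why are spiral galaxies thin?"))
```

Both this helper and `get_response` operate on the same loaded `Llama` instance, so the two styles can be mixed within a single session.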
### Installation Requirements

```bash
pip install llama-cpp-python huggingface_hub
```

On a MacBook with Apple Silicon, install llama-cpp-python with the following instead:

```bash
CMAKE_ARGS="-DCMAKE_OSX_ARCHITECTURES=arm64 -DLLAMA_METAL=on" pip install llama-cpp-python
```

### Key Parameters

- `n_ctx`: Context window size (default: 2048)
- `n_threads`: Number of CPU threads to use (adjust based on your hardware)
- `temperature`: Controls randomness
- `top_p`: Nucleus sampling parameter
- `top_k`: Limits vocabulary choices
- `repeat_penalty`: Penalizes repetition
- `max_tokens`: Maximum length of the response (default: 128; increase for longer answers)

### Example Usage

The script will automatically:

1. Download the quantized model from Hugging Face
2. Initialize it with recommended parameters
3. Start with an example question about galaxy formation
4. Allow for interactive conversation
5. Support easy exit with the 'quit' or 'exit' commands

For different use cases, you can (see the sketch after this list):

- Use the BF16 version for maximum accuracy
- Adjust the context window size for longer conversations
- Modify the temperature for more or less deterministic responses
- Change `max_tokens` for longer or shorter responses
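As a concrete illustration of these adjustments, the sketch below reuses the `download_model` helper from the Python example above to fetch the BF16 file, opens a larger context window, and requests a longer, lower-temperature completion. The specific values and the example prompt are illustrative, not tuned recommendations.

```python
from llama_cpp import Llama

# Fetch the BF16 file instead of the default Q8_0 file
# (download_model is the helper defined in the Python example above)
model_path = download_model(filename="AstroSage-8B-BF16.gguf")

# Larger context window for longer conversations; more threads if your CPU allows
llm = Llama(
    model_path=model_path,
    n_ctx=4096,
    n_threads=8,
)

# Lower temperature and a larger token budget for a longer, more deterministic answer
response = llm(
    "Explain the Sunyaev-Zel'dovich effect.",
    max_tokens=512,
    temperature=0.3,
    top_p=0.9,
    repeat_penalty=1.1,
)
print(response["choices"][0]["text"])
```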
## Model Improvements and Performance

The quantized model offers several advantages:

- Reduced memory requirements
- CPU inference capability
- Faster inference speed
- Broader hardware compatibility

Note: Formal benchmarking of the quantized model is pending. Performance metrics will be updated once comprehensive testing is completed.

## Quantization Details

- **Format**: GGUF
- **Available Versions**:
  - AstroSage-8B-BF16.gguf: bfloat16, the original model precision
  - AstroSage-8B-Q8_0.gguf: 8-bit quantized, negligible loss in perplexity, smaller size
- **Compatibility**: Works with llama.cpp and derived projects
- **Trade-offs**:
  - BF16:
    - Best quality, closest to the original model's behavior
    - Larger file size and memory requirements
    - Recommended for accuracy-critical applications
  - Q8_0:
    - Reduced memory footprint
    - Good balance of performance and size
    - Suitable for most general applications

## Intended Use

- Curiosity-driven question answering
- Brainstorming new ideas
- Astronomical research assistance
- Educational support in astronomy
- Literature review and summarization
- Scientific explanation of concepts
- Low-resource deployment scenarios
- Edge device deployment
- CPU-only environments
- Applications requiring a reduced memory footprint

## Limitations

- All limitations of the original model apply
- Additional considerations:
  - Potential reduction in prediction accuracy due to quantization
  - May show increased variance in numeric calculations
  - Reduced precision in edge cases
  - Performance may vary based on hardware configuration

## Technical Specifications

- Architecture: Meta-Llama 3.1
- Deployment: CPU-friendly, reduced memory footprint
- Format: GGUF (compatible with llama.cpp)

## Ethical Considerations

While this model is designed for scientific use:

- It should not be used as the sole source for critical research decisions
- Output should be verified against primary sources
- It may reflect biases present in the astronomical literature

## Citation and Contact

- Corresponding author: Tijmen de Haan (tijmen dot dehaan at gmail dot com)
- AstroMLab: astromachinelearninglab at gmail dot com
- Please cite the AstroMLab 3 paper when referencing this model:

```
@preprint{dehaan2024astromlab3,
  title={AstroMLab 3: Achieving GPT-4o Level Performance in Astronomy with a Specialized 8B-Parameter Large Language Model},
  author={Tijmen de Haan and Yuan-Sen Ting and Tirthankar Ghosal and Tuan Dung Nguyen and Alberto Accomazzi and Azton Wells and Nesar Ramachandra and Rui Pan and Zechang Sun},
  year={2024},
  eprint={2411.09012},
  archivePrefix={arXiv},
  primaryClass={astro-ph.IM},
  url={https://arxiv.org/abs/2411.09012},
}
```

Additional note: When citing this quantized version, please cite the AstroMLab 3 paper above and specify that the GGUF quantized variant was used.