---
license: mit
datasets:
- SustcZhangYX/ChatEnv
language:
- en
tags:
- Environmental Science
---

# EnvGPT: Leveraging a Large Language Model for Environmental Science

**EnvGPT** is the first domain-specific large language model tailored for environmental science tasks.

Environmental science presents unique challenges for LLMs due to its interdisciplinary nature. EnvGPT was developed to address these challenges by leveraging a domain-specific environmental science instruction dataset and benchmark.

*The model was fine-tuned on this environmental science-specific instruction dataset, [ChatEnv](https://huggingface.co/datasets/SustcZhangYX/ChatEnv), through Supervised Fine-Tuning (SFT). The dataset contains **107,197,329** tokens in total, highlighting its depth and comprehensiveness for environmental science tasks.*

## 🚀 Getting Started

### Download the model

Download the model: [EnvGPT](https://huggingface.co/SustcZhangYX/EnvGPT)

```shell
git lfs install
git clone https://huggingface.co/SustcZhangYX/EnvGPT
```

### Model Usage

Here is a Python code snippet that demonstrates how to load the tokenizer and model and generate text using EnvGPT.

```python
import transformers
import torch

# Set the path to your local model
model_path = "YOUR_LOCAL_MODEL_PATH"

# The pipeline loads both the tokenizer and the model from model_path
pipeline = transformers.pipeline(
    "text-generation",
    model=model_path,  # Use local model path
    model_kwargs={"torch_dtype": torch.bfloat16},
    device_map="auto",
)

messages = [
    {"role": "system", "content": "You are an expert assistant in environmental science, EnvGPT. You are a helpful assistant."},
    {"role": "user", "content": "What is the definition of environmental science?"},
]

# Pass the sampling parameters directly in the pipeline call
outputs = pipeline(
    messages,
    max_new_tokens=4096,
    do_sample=True,   # Sample instead of greedy decoding so top_p and temperature take effect
    top_p=0.7,        # Nucleus sampling
    temperature=0.9,  # Temperature control
)

# For chat-style input, recent transformers versions return the full
# conversation here, with the model's reply as the last message
print(outputs[0]["generated_text"])
```

This code demonstrates how to load the tokenizer and model from your local path, define environmental science-specific prompts, and generate responses using sampling techniques like top-p and temperature.
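For repeated questions, the prompt construction and reply extraction above can be wrapped in small helpers. The following is a minimal sketch, not part of the official EnvGPT code: the names `build_messages` and `ask_envgpt` are hypothetical, the default `max_new_tokens=512` is an arbitrary choice, and the reply extraction assumes that the pipeline returns the conversation as a list of messages for chat-style input (falling back to the raw value otherwise).

```python
from typing import Any, Dict, List

# Mirrors the system prompt used in the snippet above.
SYSTEM_PROMPT = (
    "You are an expert assistant in environmental science, EnvGPT. "
    "You are a helpful assistant."
)


def build_messages(question: str, system_prompt: str = SYSTEM_PROMPT) -> List[Dict[str, str]]:
    """Build a chat-style message list (hypothetical helper)."""
    return [
        {"role": "system", "content": system_prompt},
        {"role": "user", "content": question},
    ]


def ask_envgpt(
    pipe: Any,
    question: str,
    max_new_tokens: int = 512,  # arbitrary default, chosen for illustration
    top_p: float = 0.7,
    temperature: float = 0.9,
) -> str:
    """Send one question through a text-generation pipeline and return the reply.

    `pipe` is assumed to behave like the `pipeline` object created above.
    """
    outputs = pipe(
        build_messages(question),
        max_new_tokens=max_new_tokens,
        do_sample=True,  # enable sampling so top_p and temperature apply
        top_p=top_p,
        temperature=temperature,
    )
    generated = outputs[0]["generated_text"]
    # Assumption: chat-style input yields a list of messages whose last
    # entry is the assistant reply; otherwise return the value as-is.
    if isinstance(generated, list):
        return generated[-1]["content"]
    return generated
```

With the `pipeline` object from the snippet above, a call would look like `print(ask_envgpt(pipeline, "What is the definition of environmental science?"))`.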
## 🌏 Acknowledgement

EnvGPT is fine-tuned based on the open-sourced [LLaMA](https://huggingface.co/meta-llama). We thank Meta AI for their contributions to the community.

## ❗ Disclaimer

This project is intended solely for academic research and exploration. Please note that, like all large language models, this model may exhibit limitations, including potential inaccuracies or hallucinations in generated outputs.

## Limitations

- The model may produce hallucinated outputs or inaccuracies, which are inherent to large language models.
- The model's identity has not been specifically optimized and may generate content that resembles outputs from other LLaMA-based models or similar architectures.
- Generated outputs can vary between attempts due to sensitivity to prompt phrasing and token context.

## 🚩 Citation

If you use EnvGPT in your research or applications, please cite this work as follows:

```Markdown
[Placeholder for Citation]
Please refer to the forthcoming publication for details about EnvGPT.
This section will be updated with the citation once the paper is officially published.
```