SustcZhangYX committed 3bc2906 (parent: fbac713)

Add README

Files changed (2):
  1. LOGO.PNG +0 -0
  2. README.md +81 -1
LOGO.PNG ADDED
README.md CHANGED
@@ -6,4 +6,84 @@
language:
  - en
tags:
  - Environmental Science
---
<div align="center">
<img src="LOGO.PNG" width="600px">
<h1 align="center">EnvGPT: A Framework for Applying Large Language Models in Environmental Science</h1>

</div>

EnvGPT, built on LLaMA 3.1-8B-Instruct, is the first domain-specific large language model tailored for environmental science tasks.

*Environmental science presents unique challenges for LLMs due to its interdisciplinary nature. EnvGPT was developed to address these challenges by leveraging domain-specific instruction datasets and benchmarks.*

## 🚀 Getting Started

### Download the model

Download the model: [EnvGPT](https://huggingface.co/SustcZhangYX/EnvGPT)

```shell
git lfs install
git clone https://huggingface.co/SustcZhangYX/EnvGPT
```
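If you prefer a programmatic download over `git clone`, the `huggingface_hub` package's `snapshot_download` can fetch the same files. A minimal sketch, assuming `huggingface_hub` is installed (`pip install huggingface_hub`); `download_envgpt` is an illustrative helper, not part of the EnvGPT release:

```python
def download_envgpt(local_dir="EnvGPT"):
    """Fetch all files of the EnvGPT repository and return the local path."""
    # Imported lazily so this helper stays optional if you use git clone instead.
    from huggingface_hub import snapshot_download
    return snapshot_download(repo_id="SustcZhangYX/EnvGPT", local_dir=local_dir)
```

The returned path can then be passed as the local model path in the usage example below.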

### Model Usage

Here is a Python snippet that loads the model with the `transformers` pipeline API and generates text using EnvGPT.

```python
import torch
import transformers

# Set the path to your local model
model_path = "YOUR_LOCAL_MODEL_PATH"

pipeline = transformers.pipeline(
    "text-generation",
    model=model_path,  # local model path (or "SustcZhangYX/EnvGPT")
    model_kwargs={"torch_dtype": torch.bfloat16},
    device_map="auto",
)

messages = [
    {"role": "system", "content": "You are EnvGPT, an expert assistant in environmental science."},
    {"role": "user", "content": "What is the definition of environmental science?"},
]

# Pass sampling parameters directly in the pipeline call
outputs = pipeline(
    messages,
    max_new_tokens=512,
    do_sample=True,   # enable sampling so top_p/temperature take effect
    top_p=0.7,        # nucleus sampling
    temperature=0.9,  # temperature control
)

# With chat-style input, generated_text holds the full message list;
# the last entry is the newly generated assistant reply.
print(outputs[0]["generated_text"][-1]["content"])
```

This snippet loads the model from your local path, defines environmental-science-specific prompts, and generates responses using sampling techniques such as top-p and temperature.
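To make the two sampling knobs concrete, here is a toy, stdlib-only re-implementation of temperature scaling and top-p (nucleus) filtering on a hand-made score table. This is an illustration of the concepts only; the real sampling happens inside `generate()` in `transformers`, and the token names are made up:

```python
import math
import random

def sample_token(logits, temperature=0.9, top_p=0.7, rng=None):
    """Toy illustration of the temperature and top_p parameters used above.

    `logits` maps candidate tokens to raw scores.
    """
    rng = rng or random.Random(0)
    # Temperature: scale logits before softmax (lower -> sharper distribution).
    scaled = {tok: score / temperature for tok, score in logits.items()}
    z = sum(math.exp(v) for v in scaled.values())
    probs = sorted(
        ((tok, math.exp(v) / z) for tok, v in scaled.items()),
        key=lambda kv: kv[1],
        reverse=True,
    )
    # Top-p (nucleus): keep the smallest prefix whose probability mass >= top_p.
    kept, mass = [], 0.0
    for tok, p in probs:
        kept.append((tok, p))
        mass += p
        if mass >= top_p:
            break
    # Renormalise the kept tokens and draw one at random.
    total = sum(p for _, p in kept)
    r, acc = rng.random() * total, 0.0
    for tok, p in kept:
        acc += p
        if acc >= r:
            return tok
    return kept[-1][0]

print(sample_token({"atmosphere": 2.0, "hydrosphere": 1.0, "biosphere": 0.2}))
```

Lower `temperature` concentrates probability on the top token, while lower `top_p` shrinks the candidate set; the values above (0.9 and 0.7) trade diversity against focus.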

## 🌏 Acknowledgement

EnvGPT is fine-tuned from the open-source [LLaMA](https://huggingface.co/meta-llama) family. We thank Meta AI for their contributions to the community.

## ❗ Disclaimer

This project is intended solely for academic research and exploration. Please note that, like all large language models, this model may exhibit limitations, including potential inaccuracies or hallucinations in generated outputs.

## Limitations

- The model may produce hallucinated or inaccurate outputs, a limitation inherent to large language models.
- The model's identity has not been specifically optimized, so it may generate content that resembles outputs from other LLaMA-based models or similar architectures.
- Generated outputs can vary between attempts due to sampling and sensitivity to prompt phrasing and token context.
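The run-to-run variability noted above comes from sampling (`do_sample`, `top_p`, `temperature`). For repeatable experiments you can fix the random seed before generating; `transformers` provides `set_seed()` for this. A stdlib-only sketch of the underlying idea (the vocabulary here is invented for illustration):

```python
import random

def generate_with_seed(seed, n=5):
    # Stand-in for seeding a sampler: with the same seed, stochastic
    # draws are identical run to run (transformers.set_seed() does the
    # analogous job across Python, NumPy, and PyTorch RNGs).
    rng = random.Random(seed)
    vocab = ["air", "water", "soil", "climate"]
    return [rng.choice(vocab) for _ in range(n)]

first = generate_with_seed(42)
second = generate_with_seed(42)  # identical to `first`
```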

## 🚩 Citation

If you use EnvGPT in your research or applications, please cite this work as follows:

```markdown
[Placeholder for Citation]
Please refer to the forthcoming publication for details about EnvGPT.
This section will be updated with the citation once the paper is officially published.
```