zahemen9900 committed
Commit 76512ea · 1 Parent(s): 76474f3

Update README.md to document the new streaming functionality, enhance the usage examples, and tune the generation parameters. Added a section for simple non-streaming usage and clarified the limitations and future improvements of FinSight AI.

Files changed (1)
  1. README.md +96 -31
README.md CHANGED
@@ -10,7 +10,10 @@ tags:
 - qlora
 - financial-advice
 - text-generation
-pipeline_tag: text-generation
+- peft
+- lora
+- adapter
+inference: false
 model-index:
 - name: FinSight AI
   results:
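
This metadata hunk replaces the `pipeline_tag` with PEFT-related tags and sets `inference: false`, which disables the Hub's hosted inference widget for the repo. Reassembled from the hunk (surrounding front-matter keys elided), the resulting YAML front matter reads:

```yaml
tags:
- qlora
- financial-advice
- text-generation
- peft
- lora
- adapter
inference: false
model-index:
- name: FinSight AI
```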
@@ -52,7 +55,7 @@ base_model: HuggingFaceTB/SmolLM2-1.7B-Instruct
 
 
 <div align="center">
-  <h3><a href="https://github.com/zahemen9900/Datasets-for-Finsight/blob/97d7cacfff62e7b6099ef3bb0af9cf3d044a5b35/metrics/model_paper.md">📄 Read Model Paper 📄</a></h3>
+  <h3><a href="https://github.com/zahemen9900/Datasets-for-Finsight/blob/97d7cacfff62e7b6099ef3bb0af9cf3d044a5b35/metrics/model_paper.md">Read Model Paper 📄</a></h3>
 </div>
 
 ## Model Details
@@ -72,10 +75,13 @@ Our evaluation demonstrates significant performance improvements across all stan
 
 ## Usage
 
+### Streaming Usage
+
 ```python
-from transformers import AutoModelForCausalLM, AutoTokenizer, BitsAndBytesConfig
+from transformers import AutoModelForCausalLM, AutoTokenizer, BitsAndBytesConfig, TextIteratorStreamer
 import torch
 from peft import PeftModel
+import threading
 
 # For 4-bit quantized inference (recommended)
 bnb_config = BitsAndBytesConfig(
@@ -96,44 +102,45 @@ base_model = AutoModelForCausalLM.from_pretrained(
 model = PeftModel.from_pretrained(base_model, "zahemen9900/finsight-ai")
 tokenizer = AutoTokenizer.from_pretrained("HuggingFaceTB/SmolLM2-1.7B-Instruct")
 
-# Example usage
-prompt = "What's a good strategy for long-term investing?"
-inputs = tokenizer(prompt, return_tensors="pt").to(model.device)
-outputs = model.generate(
-    inputs.input_ids,
-    max_new_tokens=512,
-    temperature=0.7,
-    top_p=0.95,
-    do_sample=True
-)
-response = tokenizer.decode(outputs[0], skip_special_tokens=True)
-print(response)
-```
-
-### For streaming options:
+device = 'cuda' if torch.cuda.is_available() else 'cpu'
+system_prompt = "You are Finsight, a finance bot trained to assist users with financial insights"
+prompt = "What's your name, and what're you good at?"
 
-```python
-from transformers import TextIteratorStreamer
-import threading
+messages = [
+    {"role": "system", "content": system_prompt},
+    {"role": "user", "content": prompt}
+]
 
-# Setup model and tokenizer (same as above)
+formatted_prompt = tokenizer.apply_chat_template(
+    messages, tokenize=False, add_generation_prompt=True
+)
 
-prompt = "What's a good strategy for long-term investing?"
-inputs = tokenizer(prompt, return_tensors="pt").to(model.device)
+# Tokenize the formatted prompt
+inputs = tokenizer(formatted_prompt, return_tensors="pt")
+inputs = {k: v.to(device) for k, v in inputs.items()}  # Move all tensors to device
 
 # Create a streamer
-streamer = TextIteratorStreamer(tokenizer, timeout=10.0, skip_prompt=True, skip_special_tokens=True)
+streamer = TextIteratorStreamer(tokenizer, timeout=20.0, skip_prompt=True, skip_special_tokens=True)
 
-# Generate in a separate thread
-generation_kwargs = {
-    "input_ids": inputs.input_ids,
-    "max_new_tokens": 512,
-    "temperature": 0.7,
+# Adjust generation parameters for more controlled responses
+generation_config = {
+    "max_new_tokens": 256,
+    "temperature": 0.6,
     "top_p": 0.95,
     "do_sample": True,
-    "streamer": streamer
+    "pad_token_id": tokenizer.eos_token_id,
+    "eos_token_id": tokenizer.eos_token_id,
+    "repetition_penalty": 1.2,
+    "no_repeat_ngram_size": 4,
+    "num_beams": 1,
+    "early_stopping": False,
+    "length_penalty": 1.0,
 }
 
+# Combine inputs and generation config for the generate function
+generation_kwargs = {**generation_config, "input_ids": inputs["input_ids"], "streamer": streamer}
+
+# Start generation in a separate thread
 thread = threading.Thread(target=model.generate, kwargs=generation_kwargs)
 thread.start()
 
@@ -141,7 +148,63 @@ thread.start()
 print("Response: ", end="")
 for text in streamer:
     print(text, end="", flush=True)
+```
+
+### Simple Non-Streaming Usage
+
+If you prefer a simpler approach without streaming:
 
+```python
+from transformers import AutoModelForCausalLM, AutoTokenizer, BitsAndBytesConfig
+import torch
+from peft import PeftModel
+
+# For 4-bit quantized inference
+bnb_config = BitsAndBytesConfig(
+    load_in_4bit=True,
+    bnb_4bit_use_double_quant=True,
+    bnb_4bit_quant_type="nf4",
+    bnb_4bit_compute_dtype=torch.bfloat16
+)
+
+# Load base model with quantization
+base_model = AutoModelForCausalLM.from_pretrained(
+    "HuggingFaceTB/SmolLM2-1.7B-Instruct",
+    quantization_config=bnb_config,
+    device_map="auto"
+)
+
+# Load adapter weights (LoRA)
+model = PeftModel.from_pretrained(base_model, "zahemen9900/finsight-ai")
+tokenizer = AutoTokenizer.from_pretrained("HuggingFaceTB/SmolLM2-1.7B-Instruct")
+
+# Prepare input
+system_prompt = "You are Finsight, a finance bot trained to assist users with financial insights"
+user_prompt = "What's a good strategy for long-term investing?"
+
+messages = [
+    {"role": "system", "content": system_prompt},
+    {"role": "user", "content": user_prompt}
+]
+
+formatted_prompt = tokenizer.apply_chat_template(
+    messages, tokenize=False, add_generation_prompt=True
+)
+
+inputs = tokenizer(formatted_prompt, return_tensors="pt").to(model.device)
+
+# Generate response
+outputs = model.generate(
+    inputs.input_ids,
+    max_new_tokens=256,
+    temperature=0.7,
+    top_p=0.95,
+    do_sample=True,
+    repetition_penalty=1.2
+)
+
+response = tokenizer.decode(outputs[0], skip_special_tokens=True)
+print("Response:\n", response.strip())
 ```
 
 ## Training Details
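
The streaming example above runs `model.generate` on a worker thread while the main thread consumes decoded chunks from the `TextIteratorStreamer`. A minimal sketch of the same pattern packaged as a reusable generator, assuming the `model` and `tokenizer` loaded above (the `stream_response` name is illustrative, not an API from this repository):

```python
import threading
from transformers import TextIteratorStreamer

def stream_response(model, tokenizer, messages, **gen_kwargs):
    """Yield decoded text chunks as they are generated (illustrative helper)."""
    # Format the chat history the same way the README example does
    prompt = tokenizer.apply_chat_template(
        messages, tokenize=False, add_generation_prompt=True
    )
    inputs = tokenizer(prompt, return_tensors="pt").to(model.device)
    streamer = TextIteratorStreamer(
        tokenizer, timeout=20.0, skip_prompt=True, skip_special_tokens=True
    )
    # generate() blocks, so it runs on a worker thread while we read the streamer
    thread = threading.Thread(
        target=model.generate,
        kwargs={"input_ids": inputs["input_ids"], "streamer": streamer, **gen_kwargs},
    )
    thread.start()
    try:
        yield from streamer
    finally:
        thread.join()

# Example: print tokens as they arrive
# for chunk in stream_response(model, tokenizer, messages, max_new_tokens=256):
#     print(chunk, end="", flush=True)
```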
@@ -174,6 +237,7 @@ The model was trained using the following configuration:
 - **Language Limitations**: While optimized for English financial terminology, the model may have reduced performance with non-English financial terms or concepts specific to regional markets.
 - **Regulatory Compliance**: The model is not updated with the latest financial regulations across different jurisdictions and cannot ensure compliance with local financial laws.
 - **Complexity Handling**: May struggle with highly complex or niche financial scenarios that were underrepresented in the training data.
+- **Dataset Size**: The size of the training dataset appears to be a significant bottleneck in the fine-tuning process; we observed that the model struggles to generate very useful content for niche or extremely specific topics.
 
 ## Future Improvements
 
@@ -181,6 +245,7 @@ The model was trained using the following configuration:
 - **Domain-Specific Fine-tuning**: Additional training on specialized financial domains like cryptocurrency, derivatives trading, and international tax regulations.
 - **Multilingual Support**: Expanding capabilities to handle financial terminology and concepts across multiple languages and markets.
 - **Personalization Framework**: Developing mechanisms to better contextualize responses based on stated user preferences while maintaining privacy.
+- **A Larger, Higher-Quality Dataset**: The model already shows promising results on the relatively small dataset it was trained on (16.5M tokens), which suggests that a larger high-quality dataset would yield significant gains in future fine-tuning runs. Steps will be taken to address this in a future version of the model.
 
 ## Citation
 
 
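Both usage examples keep the LoRA adapter attached through PEFT at generation time. If a full-precision copy of the base model fits in memory, the adapter can instead be folded into the base weights once and the result saved as a plain `transformers` checkpoint; a minimal sketch, assuming an unquantized bf16 base (merging does not apply to the 4-bit setup above, and the output directory name is hypothetical):

```python
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer
from peft import PeftModel

# Load the base model unquantized so the LoRA deltas can be merged into its weights
base = AutoModelForCausalLM.from_pretrained(
    "HuggingFaceTB/SmolLM2-1.7B-Instruct",
    torch_dtype=torch.bfloat16,
    device_map="auto",
)
model = PeftModel.from_pretrained(base, "zahemen9900/finsight-ai")

# Fold the adapter into the base weights and drop the PEFT wrapper
merged = model.merge_and_unload()
merged.save_pretrained("finsight-ai-merged")  # hypothetical output path

tokenizer = AutoTokenizer.from_pretrained("HuggingFaceTB/SmolLM2-1.7B-Instruct")
tokenizer.save_pretrained("finsight-ai-merged")
```

The merged checkpoint can then be loaded with `AutoModelForCausalLM.from_pretrained("finsight-ai-merged")`, with no `peft` dependency at inference time.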