zahemen9900 committed
Commit 76512ea · 1 Parent(s): 76474f3

Update README.md to document the new streaming functionality, enhance the usage examples, and tune the generation parameters. Added a section for simple non-streaming usage and clarified the limitations and future improvements of FinSight AI.

Files changed (1)
  1. README.md +96 -31
README.md CHANGED
@@ -10,7 +10,10 @@ tags:
 - qlora
 - financial-advice
 - text-generation
-pipeline_tag: text-generation
+- peft
+- lora
+- adapter
+inference: false
 model-index:
 - name: FinSight AI
   results:
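
This metadata hunk replaces the `pipeline_tag` with PEFT-related tags and sets `inference: false`, which disables the Hub's hosted inference widget for the repo. Reassembled from the hunk (surrounding front-matter keys elided), the resulting YAML front matter reads:

```yaml
tags:
- qlora
- financial-advice
- text-generation
- peft
- lora
- adapter
inference: false
model-index:
- name: FinSight AI
```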
@@ -52,7 +55,7 @@ base_model: HuggingFaceTB/SmolLM2-1.7B-Instruct
 
 
 <div align="center">
-  <h3><a href="https://github.com/zahemen9900/Datasets-for-Finsight/blob/97d7cacfff62e7b6099ef3bb0af9cf3d044a5b35/metrics/model_paper.md">📄 Read Model Paper 📄</a></h3>
+  <h3><a href="https://github.com/zahemen9900/Datasets-for-Finsight/blob/97d7cacfff62e7b6099ef3bb0af9cf3d044a5b35/metrics/model_paper.md">Read Model Paper 📄</a></h3>
 </div>
 
 ## Model Details
@@ -72,10 +75,13 @@ Our evaluation demonstrates significant performance improvements across all stan
 
 ## Usage
 
+### Streaming Usage
+
 ```python
-from transformers import AutoModelForCausalLM, AutoTokenizer, BitsAndBytesConfig
+from transformers import AutoModelForCausalLM, AutoTokenizer, BitsAndBytesConfig, TextIteratorStreamer
 import torch
 from peft import PeftModel
+import threading
 
 # For 4-bit quantized inference (recommended)
 bnb_config = BitsAndBytesConfig(
@@ -96,44 +102,45 @@ base_model = AutoModelForCausalLM.from_pretrained(
 model = PeftModel.from_pretrained(base_model, "zahemen9900/finsight-ai")
 tokenizer = AutoTokenizer.from_pretrained("HuggingFaceTB/SmolLM2-1.7B-Instruct")
 
-# Example usage
-prompt = "What's a good strategy for long-term investing?"
-inputs = tokenizer(prompt, return_tensors="pt").to(model.device)
-outputs = model.generate(
-    inputs.input_ids,
-    max_new_tokens=512,
-    temperature=0.7,
-    top_p=0.95,
-    do_sample=True
-)
-response = tokenizer.decode(outputs[0], skip_special_tokens=True)
-print(response)
-```
-
-### For streaming options:
+device = 'cuda' if torch.cuda.is_available() else 'cpu'
+system_prompt = "You are Finsight, a finance bot trained to assist users with financial insights"
+prompt = "What's your name, and what're you good at?"
 
-```python
-from transformers import TextIteratorStreamer
-import threading
+messages = [
+    {"role": "system", "content": system_prompt},
+    {"role": "user", "content": prompt}
+]
 
-# Setup model and tokenizer (same as above)
+formatted_prompt = tokenizer.apply_chat_template(
+    messages, tokenize=False, add_generation_prompt=True
+)
 
-prompt = "What's a good strategy for long-term investing?"
-inputs = tokenizer(prompt, return_tensors="pt").to(model.device)
+# Tokenize the formatted prompt
+inputs = tokenizer(formatted_prompt, return_tensors="pt")
+inputs = {k: v.to(device) for k, v in inputs.items()}  # Move all tensors to device
 
 # Create a streamer
-streamer = TextIteratorStreamer(tokenizer, timeout=10.0, skip_prompt=True, skip_special_tokens=True)
+streamer = TextIteratorStreamer(tokenizer, timeout=20.0, skip_prompt=True, skip_special_tokens=True)
 
-# Generate in a separate thread
-generation_kwargs = {
-    "input_ids": inputs.input_ids,
-    "max_new_tokens": 512,
-    "temperature": 0.7,
+# Adjust generation parameters for more controlled responses
+generation_config = {
+    "max_new_tokens": 256,
+    "temperature": 0.6,
     "top_p": 0.95,
     "do_sample": True,
-    "streamer": streamer
+    "pad_token_id": tokenizer.eos_token_id,
+    "eos_token_id": tokenizer.eos_token_id,
+    "repetition_penalty": 1.2,
+    "no_repeat_ngram_size": 4,
+    "num_beams": 1,
+    "early_stopping": False,
+    "length_penalty": 1.0,
 }
 
+# Combine inputs and generation config for the generate function
+generation_kwargs = {**generation_config, "input_ids": inputs["input_ids"], "streamer": streamer}
+
+# Start generation in a separate thread
 thread = threading.Thread(target=model.generate, kwargs=generation_kwargs)
 thread.start()
 
@@ -141,7 +148,63 @@ thread.start()
 print("Response: ", end="")
 for text in streamer:
     print(text, end="", flush=True)
+```
+
+### Simple Non-Streaming Usage
+
+If you prefer a simpler approach without streaming:
 
+```python
+from transformers import AutoModelForCausalLM, AutoTokenizer, BitsAndBytesConfig
+import torch
+from peft import PeftModel
+
+# For 4-bit quantized inference
+bnb_config = BitsAndBytesConfig(
+    load_in_4bit=True,
+    bnb_4bit_use_double_quant=True,
+    bnb_4bit_quant_type="nf4",
+    bnb_4bit_compute_dtype=torch.bfloat16
+)
+
+# Load base model with quantization
+base_model = AutoModelForCausalLM.from_pretrained(
+    "HuggingFaceTB/SmolLM2-1.7B-Instruct",
+    quantization_config=bnb_config,
+    device_map="auto"
+)
+
+# Load adapter weights (LoRA)
+model = PeftModel.from_pretrained(base_model, "zahemen9900/finsight-ai")
+tokenizer = AutoTokenizer.from_pretrained("HuggingFaceTB/SmolLM2-1.7B-Instruct")
+
+# Prepare input
+system_prompt = "You are Finsight, a finance bot trained to assist users with financial insights"
+user_prompt = "What's a good strategy for long-term investing?"
+
+messages = [
+    {"role": "system", "content": system_prompt},
+    {"role": "user", "content": user_prompt}
+]
+
+formatted_prompt = tokenizer.apply_chat_template(
+    messages, tokenize=False, add_generation_prompt=True
+)
+
+inputs = tokenizer(formatted_prompt, return_tensors="pt").to(model.device)
+
+# Generate response
+outputs = model.generate(
+    inputs.input_ids,
+    max_new_tokens=256,
+    temperature=0.7,
+    top_p=0.95,
+    do_sample=True,
+    repetition_penalty=1.2
+)
+
+response = tokenizer.decode(outputs[0], skip_special_tokens=True)
+print("Response:\n", response.strip())
 ```
 
 ## Training Details
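
The streaming example above runs `model.generate` on a worker thread while the main thread consumes decoded chunks from the `TextIteratorStreamer`. A minimal sketch of the same pattern packaged as a reusable generator, assuming the `model` and `tokenizer` loaded above (the `stream_response` name is illustrative, not an API from this repository):

```python
import threading
from transformers import TextIteratorStreamer

def stream_response(model, tokenizer, messages, **gen_kwargs):
    """Yield decoded text chunks as they are generated (illustrative helper)."""
    # Format the chat history the same way the README example does
    prompt = tokenizer.apply_chat_template(
        messages, tokenize=False, add_generation_prompt=True
    )
    inputs = tokenizer(prompt, return_tensors="pt").to(model.device)
    streamer = TextIteratorStreamer(
        tokenizer, timeout=20.0, skip_prompt=True, skip_special_tokens=True
    )
    # generate() blocks, so it runs on a worker thread while we read the streamer
    thread = threading.Thread(
        target=model.generate,
        kwargs={"input_ids": inputs["input_ids"], "streamer": streamer, **gen_kwargs},
    )
    thread.start()
    try:
        yield from streamer
    finally:
        thread.join()

# Example: print tokens as they arrive
# for chunk in stream_response(model, tokenizer, messages, max_new_tokens=256):
#     print(chunk, end="", flush=True)
```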
@@ -174,6 +237,7 @@ The model was trained using the following configuration:
 - **Language Limitations**: While optimized for English financial terminology, the model may have reduced performance with non-English financial terms or concepts specific to regional markets.
 - **Regulatory Compliance**: The model is not updated with the latest financial regulations across different jurisdictions and cannot ensure compliance with local financial laws.
 - **Complexity Handling**: May struggle with highly complex or niche financial scenarios that were underrepresented in the training data.
+- **Dataset Size**: The size of the training dataset appears to be a significant bottleneck in the fine-tuning process; we observed that the model struggles to generate very useful content for niche or extremely specific topics.
 
 ## Future Improvements
 
@@ -181,6 +245,7 @@ The model was trained using the following configuration:
 - **Domain-Specific Fine-tuning**: Additional training on specialized financial domains like cryptocurrency, derivatives trading, and international tax regulations.
 - **Multilingual Support**: Expanding capabilities to handle financial terminology and concepts across multiple languages and markets.
 - **Personalization Framework**: Developing mechanisms to better contextualize responses based on stated user preferences while maintaining privacy.
+- **A Larger, Higher-Quality Dataset**: The model already shows promising results on the relatively small dataset it was trained on (16.5M tokens), which suggests that a larger high-quality dataset would yield significant gains in future fine-tuning runs. Steps will be taken to address this in a future version of the model.
 
 ## Citation
 
 
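Both usage examples keep the LoRA adapter attached through PEFT at generation time. If a full-precision copy of the base model fits in memory, the adapter can instead be folded into the base weights once and the result saved as a plain `transformers` checkpoint; a minimal sketch, assuming an unquantized bf16 base (merging does not apply to the 4-bit setup above, and the output directory name is hypothetical):

```python
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer
from peft import PeftModel

# Load the base model unquantized so the LoRA deltas can be merged into its weights
base = AutoModelForCausalLM.from_pretrained(
    "HuggingFaceTB/SmolLM2-1.7B-Instruct",
    torch_dtype=torch.bfloat16,
    device_map="auto",
)
model = PeftModel.from_pretrained(base, "zahemen9900/finsight-ai")

# Fold the adapter into the base weights and drop the PEFT wrapper
merged = model.merge_and_unload()
merged.save_pretrained("finsight-ai-merged")  # hypothetical output path

tokenizer = AutoTokenizer.from_pretrained("HuggingFaceTB/SmolLM2-1.7B-Instruct")
tokenizer.save_pretrained("finsight-ai-merged")
```

The merged checkpoint can then be loaded with `AutoModelForCausalLM.from_pretrained("finsight-ai-merged")`, with no `peft` dependency at inference time.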