---
language:
- en
license: mit
library_name: transformers
tags:
- financial-analysis
- conversational
- finance
- qlora
- financial-advice
- text-generation
- peft
- lora
- adapter
inference: false
model-index:
- name: FinSight AI
  results:
  - task:
      type: text-generation
      name: Financial Advisory Generation
    dataset:
      type: custom
      name: Financial Conversations
    metrics:
    - type: rouge1
      value: 12.57
      name: ROUGE-1 Improvement (%)
    - type: rouge2
      value: 79.48
      name: ROUGE-2 Improvement (%)
    - type: rougeL
      value: 24.00
      name: ROUGE-L Improvement (%)
    - type: bleu
      value: 135.36
      name: BLEU Improvement (%)
base_model: HuggingFaceTB/SmolLM2-1.7B-Instruct
---

<div align="center">
  <h1>FinSight AI - Financial Advisory Chatbot</h1>
  <p>A fine-tuned version of SmolLM2-1.7B optimized for financial advice and discussion.</p>
</div>

<div align="center">
  <a href="https://pytorch.org/" style="display: inline-block; margin: 0 4px;"><img src="https://img.shields.io/badge/PyTorch-EE4C2C?style=for-the-badge&logo=pytorch&logoColor=white" alt="PyTorch"></a>
  <a href="https://huggingface.co/transformers/" style="display: inline-block; margin: 0 4px;"><img src="https://img.shields.io/badge/🤗%20Transformers-FFAE33?style=for-the-badge&logoColor=white" alt="Transformers"></a>
  <a href="https://huggingface.co/" style="display: inline-block; margin: 0 4px;"><img src="https://img.shields.io/badge/🤗%20Hugging%20Face-0050C5?style=for-the-badge&logoColor=white" alt="Hugging Face"></a>
  <a href="https://github.com/microsoft/LoRA" style="display: inline-block; margin: 0 4px;"><img src="https://img.shields.io/badge/LoRA-2088FF?style=for-the-badge&logo=github&logoColor=white" alt="LoRA"></a>
  <a href="https://github.com/TimDettmers/bitsandbytes" style="display: inline-block; margin: 0 4px;"><img src="https://img.shields.io/badge/BitsAndBytes-4D4D4D?style=for-the-badge&logo=github&logoColor=white" alt="BitsAndBytes"></a>
</div>

<div align="center">
  <h3><a href="https://github.com/zahemen9900/Datasets-for-Finsight/blob/97d7cacfff62e7b6099ef3bb0af9cf3d044a5b35/metrics/model_paper.md">Read Model Paper 📄</a></h3>
</div>

## Model Details

- **Base Model**: HuggingFaceTB/SmolLM2-1.7B-Instruct
- **Task**: Financial Advisory and Discussion
- **Training Data**: Curated dataset of ~11,000 financial conversations (~16.5M tokens)
- **Training Method**: QLoRA (4-bit quantization with LoRA)
- **Language**: English
- **License**: MIT

## Model Description

FinSight AI is a specialized financial advisory assistant built by fine-tuning SmolLM2-1.7B-Instruct using QLoRA (Quantized Low-Rank Adaptation). The model has been trained on a comprehensive dataset of financial conversations to provide accurate, concise, and helpful information across financial domains including personal finance, investing, market analysis, and financial planning.

Our evaluation demonstrates significant performance improvements across all standard NLP metrics **(ROUGE-1, ROUGE-2, ROUGE-L & BLEU)**, showcasing the effectiveness of our domain-specific training approach. The model exhibits richer financial terminology usage, more precise responses, improved handling of numerical data, and greater technical accuracy, all while maintaining a compact, resource-efficient architecture suitable for deployment on consumer hardware.

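As an illustration of how such metrics can be computed, the Hugging Face `evaluate` library exposes standard ROUGE and BLEU implementations. The snippet below is a minimal sketch with made-up `predictions` and `references`; it is not the evaluation harness that produced the numbers above.

```python
import evaluate

# Hypothetical model output and reference answer, for illustration only
predictions = ["Diversify across low-cost index funds and rebalance annually."]
references = ["A diversified portfolio of low-cost index funds, rebalanced once a year."]

rouge = evaluate.load("rouge")
bleu = evaluate.load("bleu")

# ROUGE-1/2/L F-scores between generated and reference answers
print(rouge.compute(predictions=predictions, references=references))

# BLEU expects a list of candidate references per prediction
print(bleu.compute(predictions=predictions, references=[[r] for r in references]))
```
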
## Usage

### Streaming Usage

```python
from transformers import AutoModelForCausalLM, AutoTokenizer, BitsAndBytesConfig, TextIteratorStreamer
import torch
from peft import PeftModel
import threading

# For 4-bit quantized inference (recommended)
bnb_config = BitsAndBytesConfig(
    load_in_4bit=True,
    bnb_4bit_use_double_quant=True,
    bnb_4bit_quant_type="nf4",
    bnb_4bit_compute_dtype=torch.bfloat16
)

# First load the base model with quantization
base_model = AutoModelForCausalLM.from_pretrained(
    "HuggingFaceTB/SmolLM2-1.7B-Instruct",
    quantization_config=bnb_config,
    device_map="auto"
)

# Then load the adapter weights (LoRA)
model = PeftModel.from_pretrained(base_model, "zahemen9900/finsight-ai")
tokenizer = AutoTokenizer.from_pretrained("HuggingFaceTB/SmolLM2-1.7B-Instruct")

system_prompt = "You are Finsight, a finance bot trained to assist users with financial insights"
prompt = "What's your name, and what're you good at?"
messages = [
    {"role": "system", "content": system_prompt},
    {"role": "user", "content": prompt}
]
formatted_prompt = tokenizer.apply_chat_template(
    messages, tokenize=False, add_generation_prompt=True
)

# Tokenize the formatted prompt and move it to the model's device
inputs = tokenizer(formatted_prompt, return_tensors="pt").to(model.device)

# Create a streamer that skips the prompt and special tokens in the output
streamer = TextIteratorStreamer(tokenizer, timeout=20.0, skip_prompt=True, skip_special_tokens=True)

# Generation parameters tuned for controlled, low-repetition responses
generation_config = {
    "max_new_tokens": 256,
    "temperature": 0.6,
    "top_p": 0.95,
    "do_sample": True,
    "pad_token_id": tokenizer.eos_token_id,
    "eos_token_id": tokenizer.eos_token_id,
    "repetition_penalty": 1.2,
    "no_repeat_ngram_size": 4,
}

# Combine the tokenized inputs (input_ids and attention_mask) with the generation config
generation_kwargs = {**generation_config, **inputs, "streamer": streamer}

# Start generation in a background thread so the main thread can consume the stream
thread = threading.Thread(target=model.generate, kwargs=generation_kwargs)
thread.start()

# Iterate over the generated text as it arrives
print("Response: ", end="")
for text in streamer:
    print(text, end="", flush=True)
thread.join()
print()
```

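Here, `TextIteratorStreamer` yields decoded text chunks as `generate` produces them, so running generation in a background thread keeps the main thread free to print each chunk the moment it arrives.
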
### Simple Non-Streaming Usage

If you prefer a simpler approach without streaming:

```python
from transformers import AutoModelForCausalLM, AutoTokenizer, BitsAndBytesConfig
import torch
from peft import PeftModel

# For 4-bit quantized inference
bnb_config = BitsAndBytesConfig(
    load_in_4bit=True,
    bnb_4bit_use_double_quant=True,
    bnb_4bit_quant_type="nf4",
    bnb_4bit_compute_dtype=torch.bfloat16
)

# Load base model with quantization
base_model = AutoModelForCausalLM.from_pretrained(
    "HuggingFaceTB/SmolLM2-1.7B-Instruct",
    quantization_config=bnb_config,
    device_map="auto"
)

# Load adapter weights (LoRA)
model = PeftModel.from_pretrained(base_model, "zahemen9900/finsight-ai")
tokenizer = AutoTokenizer.from_pretrained("HuggingFaceTB/SmolLM2-1.7B-Instruct")

# Prepare input
system_prompt = "You are Finsight, a finance bot trained to assist users with financial insights"
user_prompt = "What's a good strategy for long-term investing?"
messages = [
    {"role": "system", "content": system_prompt},
    {"role": "user", "content": user_prompt}
]
formatted_prompt = tokenizer.apply_chat_template(
    messages, tokenize=False, add_generation_prompt=True
)
inputs = tokenizer(formatted_prompt, return_tensors="pt").to(model.device)

# Generate response (passing the attention mask along with the input ids)
outputs = model.generate(
    **inputs,
    max_new_tokens=256,
    temperature=0.7,
    top_p=0.95,
    do_sample=True,
    repetition_penalty=1.2,
    pad_token_id=tokenizer.eos_token_id
)

# Decode only the newly generated tokens, skipping the echoed prompt
response = tokenizer.decode(
    outputs[0][inputs["input_ids"].shape[1]:], skip_special_tokens=True
)
print("Response:\n", response.strip())
```

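If you would rather not carry the PEFT wrapper at inference time, the adapter can also be merged into an unquantized copy of the base model. This is a minimal sketch assuming enough memory for the bf16 weights; the output path is hypothetical, and merging works best on full-precision (non-4-bit) weights.

```python
from transformers import AutoModelForCausalLM
import torch
from peft import PeftModel

# Load the base model unquantized; merging LoRA weights into 4-bit layers is not recommended
base_model = AutoModelForCausalLM.from_pretrained(
    "HuggingFaceTB/SmolLM2-1.7B-Instruct",
    torch_dtype=torch.bfloat16,
    device_map="auto"
)
model = PeftModel.from_pretrained(base_model, "zahemen9900/finsight-ai")

# Fold the adapter into the base weights and drop the PEFT wrapper
merged_model = model.merge_and_unload()
merged_model.save_pretrained("finsight-ai-merged")  # hypothetical output directory
```
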
## Training Details

The model was trained using the following configuration (see the configuration sketch below the list):

- **QLoRA Parameters**:
  - Rank (r): 64
  - Alpha: 16
  - Target modules: Query, Key, Value projections, MLP layers
  - 4-bit NF4 quantization with double quantization
- **Training Hyperparameters**:
  - Learning rate: 2e-4
  - Epochs: 2
  - Batch size: 2 (with gradient accumulation steps of 4)
  - Weight decay: 0.05
  - Scheduler: Cosine with restarts
  - Warmup ratio: 0.15
- **Hardware**: Consumer-grade NVIDIA RTX 3050 GPU with 6GB VRAM

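For reference, here is a minimal sketch of how the configuration above maps onto the standard `peft` and `transformers` APIs. The exact target-module names (`q_proj`, `k_proj`, etc.) and the `output_dir` are assumptions based on SmolLM2's Llama-style layer naming, not values taken from the actual training script.

```python
from peft import LoraConfig
from transformers import TrainingArguments

# QLoRA adapter configuration mirroring the parameters listed above
lora_config = LoraConfig(
    r=64,
    lora_alpha=16,
    # Assumed module names for the attention and MLP projections
    target_modules=["q_proj", "k_proj", "v_proj", "gate_proj", "up_proj", "down_proj"],
    bias="none",
    task_type="CAUSAL_LM",
)

# Training hyperparameters mirroring the list above
training_args = TrainingArguments(
    output_dir="finsight-ai-qlora",  # hypothetical output directory
    learning_rate=2e-4,
    num_train_epochs=2,
    per_device_train_batch_size=2,
    gradient_accumulation_steps=4,
    weight_decay=0.05,
    lr_scheduler_type="cosine_with_restarts",
    warmup_ratio=0.15,
    bf16=True,  # assumption: matches the bf16 compute dtype used for 4-bit inference
)
```
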
#### **More details can be found in the paper linked above.**

## Limitations

- **Information Currency**: Financial data and knowledge within the model is limited to the training data cutoff date. Market conditions, regulations, and financial instruments may have changed since then.
- **No Real-time Information**: The model operates without internet connectivity and cannot access current market data, breaking news, or recent economic developments.
- **Not Financial Advice**: Responses should not be considered personalized financial advice. The model cannot account for individual financial situations, risk tolerances, or specific circumstances required for proper financial planning.
- **Language Limitations**: While optimized for English financial terminology, the model may have reduced performance with non-English financial terms or concepts specific to regional markets.
- **Regulatory Compliance**: The model is not updated with the latest financial regulations across different jurisdictions and cannot ensure compliance with local financial laws.
- **Complexity Handling**: The model may struggle with highly complex or niche financial scenarios that were underrepresented in the training data.
- **Dataset Size**: The size of the dataset appears to be a significant bottleneck in the fine-tuning process; we observed an inability to generate very useful content for niche or extremely specific topics.

## Future Improvements

- **Retrieval Augmented Generation (RAG)**: Implementing RAG would allow the model to reference current financial data, market statistics, and regulatory information before generating responses, significantly improving accuracy and relevance.
- **Domain-Specific Fine-tuning**: Additional training on specialized financial domains like cryptocurrency, derivatives trading, and international tax regulations.
- **Multilingual Support**: Expanding capabilities to handle financial terminology and concepts across multiple languages and markets.
- **Personalization Framework**: Developing mechanisms to better contextualize responses based on stated user preferences while maintaining privacy.
- **A Larger, Higher-Quality Dataset**: The model already shows promising results on the relatively small dataset it was trained on (~16.5M tokens), which suggests that a larger high-quality dataset would yield substantially better results in future fine-tuning runs. Steps will be taken to address this in a future version of the model.

## Citation

If you use FinSight AI in your research, please cite:

```bibtex
@misc{FinSightAI2025,
  author = {Zahemen, FinsightAI Team},
  title = {FinSight AI: Enhancing Financial Domain Performance of Small Language Models Through QLoRA Fine-tuning},
  year = {2025},
  publisher = {GitHub},
  journal = {GitHub repository},
  howpublished = {\url{https://github.com/zahemen9900/FinsightAI}}
}
``` |