---
language:
- en
license: mit
library_name: transformers
tags:
- financial-analysis
- conversational
- finance
- qlora
- financial-advice
- text-generation
- peft
- lora
- adapter
inference: false
model-index:
- name: FinSight AI
  results:
  - task:
      type: text-generation
      name: Financial Advisory Generation
    dataset:
      type: custom
      name: Financial Conversations
    metrics:
    - type: rouge1
      value: 12.57
      name: ROUGE-1 Improvement (%)
    - type: rouge2
      value: 79.48
      name: ROUGE-2 Improvement (%)
    - type: rougeL
      value: 24.00
      name: ROUGE-L Improvement (%)
    - type: bleu
      value: 135.36
      name: BLEU Improvement (%)
base_model: HuggingFaceTB/SmolLM2-1.7B-Instruct
---

<div align="center">
  <h1>FinSight AI - Financial Advisory Chatbot</h1>
  <p>A fine-tuned version of SmolLM2-1.7B optimized for financial advice and discussion.</p>
</div>

<div align="center">
  <a href="https://pytorch.org/" style="display: inline-block; margin: 0 4px;"><img src="https://img.shields.io/badge/PyTorch-EE4C2C?style=for-the-badge&logo=pytorch&logoColor=white" alt="PyTorch"></a>
  <a href="https://huggingface.co/transformers/" style="display: inline-block; margin: 0 4px;"><img src="https://img.shields.io/badge/🤗%20Transformers-FFAE33?style=for-the-badge&logoColor=white" alt="Transformers"></a>
  <a href="https://huggingface.co/" style="display: inline-block; margin: 0 4px;"><img src="https://img.shields.io/badge/🤗%20Hugging%20Face-0050C5?style=for-the-badge&logoColor=white" alt="Hugging Face"></a>
  <a href="https://github.com/microsoft/LoRA" style="display: inline-block; margin: 0 4px;"><img src="https://img.shields.io/badge/LoRA-2088FF?style=for-the-badge&logo=github&logoColor=white" alt="LoRA"></a>
  <a href="https://github.com/TimDettmers/bitsandbytes" style="display: inline-block; margin: 0 4px;"><img src="https://img.shields.io/badge/BitsAndBytes-4D4D4D?style=for-the-badge&logo=github&logoColor=white" alt="BitsAndBytes"></a>
</div>

<div align="center">
  <h3><a href="https://github.com/zahemen9900/Datasets-for-Finsight/blob/97d7cacfff62e7b6099ef3bb0af9cf3d044a5b35/metrics/model_paper.md">Read Model Paper 📄</a></h3>
</div>

## Model Details

- **Base Model**: HuggingFaceTB/SmolLM2-1.7B-Instruct
- **Task**: Financial Advisory and Discussion
- **Training Data**: Curated dataset of ~11,000 financial conversations (~16.5M tokens)
- **Training Method**: QLoRA (4-bit quantization with LoRA)
- **Language**: English
- **License**: MIT

## Model Description

FinSight AI is a specialized financial advisory assistant built by fine-tuning SmolLM2-1.7B-Instruct using QLoRA (Quantized Low-Rank Adaptation). The model has been trained on a comprehensive dataset of financial conversations to provide accurate, concise, and helpful information across financial domains including personal finance, investing, market analysis, and financial planning.

Our evaluation demonstrates significant performance improvements across all standard NLP metrics **(ROUGE-1, ROUGE-2, ROUGE-L & BLEU)**, showcasing the effectiveness of our domain-specific training approach. The model exhibits richer financial terminology usage, more precise responses, improved handling of numerical data, and greater technical accuracy, all while maintaining a compact, resource-efficient architecture suitable for deployment on consumer hardware.

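As an illustration of how such metrics can be computed, the Hugging Face `evaluate` library exposes standard ROUGE and BLEU implementations. The snippet below is a minimal sketch with made-up `predictions` and `references`; it is not the evaluation harness that produced the numbers above.

```python
import evaluate

# Hypothetical model output and reference answer, for illustration only
predictions = ["Diversify across low-cost index funds and rebalance annually."]
references = ["A diversified portfolio of low-cost index funds, rebalanced once a year."]

rouge = evaluate.load("rouge")
bleu = evaluate.load("bleu")

# ROUGE-1/2/L F-scores between generated and reference answers
print(rouge.compute(predictions=predictions, references=references))

# BLEU expects a list of candidate references per prediction
print(bleu.compute(predictions=predictions, references=[[r] for r in references]))
```
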
## Usage

### Streaming Usage

```python
from transformers import AutoModelForCausalLM, AutoTokenizer, BitsAndBytesConfig, TextIteratorStreamer
import torch
from peft import PeftModel
import threading

# For 4-bit quantized inference (recommended)
bnb_config = BitsAndBytesConfig(
    load_in_4bit=True,
    bnb_4bit_use_double_quant=True,
    bnb_4bit_quant_type="nf4",
    bnb_4bit_compute_dtype=torch.bfloat16
)

# First load the base model with quantization
base_model = AutoModelForCausalLM.from_pretrained(
    "HuggingFaceTB/SmolLM2-1.7B-Instruct",
    quantization_config=bnb_config,
    device_map="auto"
)

# Then load the adapter weights (LoRA)
model = PeftModel.from_pretrained(base_model, "zahemen9900/finsight-ai")
tokenizer = AutoTokenizer.from_pretrained("HuggingFaceTB/SmolLM2-1.7B-Instruct")

system_prompt = "You are Finsight, a finance bot trained to assist users with financial insights"
prompt = "What's your name, and what're you good at?"
messages = [
    {"role": "system", "content": system_prompt},
    {"role": "user", "content": prompt}
]
formatted_prompt = tokenizer.apply_chat_template(
    messages, tokenize=False, add_generation_prompt=True
)

# Tokenize the formatted prompt and move it to the model's device
inputs = tokenizer(formatted_prompt, return_tensors="pt").to(model.device)

# Create a streamer that skips the prompt and special tokens in the output
streamer = TextIteratorStreamer(tokenizer, timeout=20.0, skip_prompt=True, skip_special_tokens=True)

# Generation parameters tuned for controlled, low-repetition responses
generation_config = {
    "max_new_tokens": 256,
    "temperature": 0.6,
    "top_p": 0.95,
    "do_sample": True,
    "pad_token_id": tokenizer.eos_token_id,
    "eos_token_id": tokenizer.eos_token_id,
    "repetition_penalty": 1.2,
    "no_repeat_ngram_size": 4,
}

# Combine the tokenized inputs (input_ids and attention_mask) with the generation config
generation_kwargs = {**generation_config, **inputs, "streamer": streamer}

# Start generation in a background thread so the main thread can consume the stream
thread = threading.Thread(target=model.generate, kwargs=generation_kwargs)
thread.start()

# Iterate over the generated text as it arrives
print("Response: ", end="")
for text in streamer:
    print(text, end="", flush=True)
thread.join()
print()
```

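Here, `TextIteratorStreamer` yields decoded text chunks as `generate` produces them, so running generation in a background thread keeps the main thread free to print each chunk the moment it arrives.
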
### Simple Non-Streaming Usage

If you prefer a simpler approach without streaming:

```python
from transformers import AutoModelForCausalLM, AutoTokenizer, BitsAndBytesConfig
import torch
from peft import PeftModel

# For 4-bit quantized inference
bnb_config = BitsAndBytesConfig(
    load_in_4bit=True,
    bnb_4bit_use_double_quant=True,
    bnb_4bit_quant_type="nf4",
    bnb_4bit_compute_dtype=torch.bfloat16
)

# Load base model with quantization
base_model = AutoModelForCausalLM.from_pretrained(
    "HuggingFaceTB/SmolLM2-1.7B-Instruct",
    quantization_config=bnb_config,
    device_map="auto"
)

# Load adapter weights (LoRA)
model = PeftModel.from_pretrained(base_model, "zahemen9900/finsight-ai")
tokenizer = AutoTokenizer.from_pretrained("HuggingFaceTB/SmolLM2-1.7B-Instruct")

# Prepare input
system_prompt = "You are Finsight, a finance bot trained to assist users with financial insights"
user_prompt = "What's a good strategy for long-term investing?"
messages = [
    {"role": "system", "content": system_prompt},
    {"role": "user", "content": user_prompt}
]
formatted_prompt = tokenizer.apply_chat_template(
    messages, tokenize=False, add_generation_prompt=True
)
inputs = tokenizer(formatted_prompt, return_tensors="pt").to(model.device)

# Generate response (passing the attention mask along with the input ids)
outputs = model.generate(
    **inputs,
    max_new_tokens=256,
    temperature=0.7,
    top_p=0.95,
    do_sample=True,
    repetition_penalty=1.2,
    pad_token_id=tokenizer.eos_token_id
)

# Decode only the newly generated tokens, skipping the echoed prompt
response = tokenizer.decode(
    outputs[0][inputs["input_ids"].shape[1]:], skip_special_tokens=True
)
print("Response:\n", response.strip())
```

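If you would rather not carry the PEFT wrapper at inference time, the adapter can also be merged into an unquantized copy of the base model. This is a minimal sketch assuming enough memory for the bf16 weights; the output path is hypothetical, and merging works best on full-precision (non-4-bit) weights.

```python
from transformers import AutoModelForCausalLM
import torch
from peft import PeftModel

# Load the base model unquantized; merging LoRA weights into 4-bit layers is not recommended
base_model = AutoModelForCausalLM.from_pretrained(
    "HuggingFaceTB/SmolLM2-1.7B-Instruct",
    torch_dtype=torch.bfloat16,
    device_map="auto"
)
model = PeftModel.from_pretrained(base_model, "zahemen9900/finsight-ai")

# Fold the adapter into the base weights and drop the PEFT wrapper
merged_model = model.merge_and_unload()
merged_model.save_pretrained("finsight-ai-merged")  # hypothetical output directory
```
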
## Training Details

The model was trained using the following configuration (see the configuration sketch below the list):

- **QLoRA Parameters**:
  - Rank (r): 64
  - Alpha: 16
  - Target modules: Query, Key, Value projections, MLP layers
  - 4-bit NF4 quantization with double quantization
- **Training Hyperparameters**:
  - Learning rate: 2e-4
  - Epochs: 2
  - Batch size: 2 (with gradient accumulation steps of 4)
  - Weight decay: 0.05
  - Scheduler: Cosine with restarts
  - Warmup ratio: 0.15
- **Hardware**: Consumer-grade NVIDIA RTX 3050 GPU with 6GB VRAM

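For reference, here is a minimal sketch of how the configuration above maps onto the standard `peft` and `transformers` APIs. The exact target-module names (`q_proj`, `k_proj`, etc.) and the `output_dir` are assumptions based on SmolLM2's Llama-style layer naming, not values taken from the actual training script.

```python
from peft import LoraConfig
from transformers import TrainingArguments

# QLoRA adapter configuration mirroring the parameters listed above
lora_config = LoraConfig(
    r=64,
    lora_alpha=16,
    # Assumed module names for the attention and MLP projections
    target_modules=["q_proj", "k_proj", "v_proj", "gate_proj", "up_proj", "down_proj"],
    bias="none",
    task_type="CAUSAL_LM",
)

# Training hyperparameters mirroring the list above
training_args = TrainingArguments(
    output_dir="finsight-ai-qlora",  # hypothetical output directory
    learning_rate=2e-4,
    num_train_epochs=2,
    per_device_train_batch_size=2,
    gradient_accumulation_steps=4,
    weight_decay=0.05,
    lr_scheduler_type="cosine_with_restarts",
    warmup_ratio=0.15,
    bf16=True,  # assumption: matches the bf16 compute dtype used for 4-bit inference
)
```
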
#### **More details can be found in the paper linked above.**

## Limitations

- **Information Currency**: Financial data and knowledge within the model is limited to the training data cutoff date. Market conditions, regulations, and financial instruments may have changed since then.
- **No Real-time Information**: The model operates without internet connectivity and cannot access current market data, breaking news, or recent economic developments.
- **Not Financial Advice**: Responses should not be considered personalized financial advice. The model cannot account for individual financial situations, risk tolerances, or specific circumstances required for proper financial planning.
- **Language Limitations**: While optimized for English financial terminology, the model may have reduced performance with non-English financial terms or concepts specific to regional markets.
- **Regulatory Compliance**: The model is not updated with the latest financial regulations across different jurisdictions and cannot ensure compliance with local financial laws.
- **Complexity Handling**: The model may struggle with highly complex or niche financial scenarios that were underrepresented in the training data.
- **Dataset Size**: The size of the dataset appears to be a significant bottleneck in the fine-tuning process; we observed an inability to generate very useful content for niche or extremely specific topics.

## Future Improvements

- **Retrieval Augmented Generation (RAG)**: Implementing RAG would allow the model to reference current financial data, market statistics, and regulatory information before generating responses, significantly improving accuracy and relevance.
- **Domain-Specific Fine-tuning**: Additional training on specialized financial domains like cryptocurrency, derivatives trading, and international tax regulations.
- **Multilingual Support**: Expanding capabilities to handle financial terminology and concepts across multiple languages and markets.
- **Personalization Framework**: Developing mechanisms to better contextualize responses based on stated user preferences while maintaining privacy.
- **A Larger, Higher-Quality Dataset**: The model already shows promising results on the relatively small dataset it was trained on (~16.5M tokens), which suggests that a larger high-quality dataset would yield substantially better results in future fine-tuning runs. Steps will be taken to address this in a future version of the model.

## Citation

If you use FinSight AI in your research, please cite:

```bibtex
@misc{FinSightAI2025,
  author = {Zahemen, FinsightAI Team},
  title = {FinSight AI: Enhancing Financial Domain Performance of Small Language Models Through QLoRA Fine-tuning},
  year = {2025},
  publisher = {GitHub},
  journal = {GitHub repository},
  howpublished = {\url{https://github.com/zahemen9900/FinsightAI}}
}
``` |