---
language:
- en
license: mit
library_name: transformers
tags:
- financial-analysis
- conversational
- finance
- qlora
- financial-advice
- text-generation
- peft
- lora
- adapter
inference: false
model-index:
- name: FinSight AI
  results:
  - task:
      type: text-generation
      name: Financial Advisory Generation
    dataset:
      type: custom
      name: Financial Conversations
    metrics:
    - type: rouge1
      value: 12.57%
      name: ROUGE-1 Improvement
    - type: rouge2
      value: 79.48%
      name: ROUGE-2 Improvement
    - type: rougeL
      value: 24.00%
      name: ROUGE-L Improvement
    - type: bleu
      value: 135.36%
      name: BLEU Improvement
base_model: HuggingFaceTB/SmolLM2-1.7B-Instruct
---

# FinSight AI - Financial Advisory Chatbot

A fine-tuned version of SmolLM2-1.7B optimized for financial advice and discussion.


Read Model Paper 📄

## Model Details

- **Base Model**: HuggingFaceTB/SmolLM2-1.7B-Instruct
- **Task**: Financial Advisory and Discussion
- **Training Data**: Curated dataset of ~11,000 financial conversations (~16.5M tokens)
- **Training Method**: QLoRA (4-bit quantization with LoRA)
- **Language**: English
- **License**: MIT

Check out the training repo here: [Finsight AI](https://github.com/zahemen9900/FinsightAI.git)

## Model Description

FinSight AI is a specialized financial advisory assistant built by fine-tuning SmolLM2-1.7B-Instruct using QLoRA (Quantized Low-Rank Adaptation). The model has been trained on a dataset of financial conversations to provide accurate, concise, and helpful information across financial domains including personal finance, investing, market analysis, and financial planning.

Our evaluation shows improvements across all standard NLP metrics we measured: **ROUGE-1 (12.57%), ROUGE-2 (79.48%), ROUGE-L (24.00%), and BLEU (135.36%)**. The model uses richer financial terminology, gives more precise responses, handles numerical data better, and is more technically accurate, all while keeping a compact, resource-efficient architecture suitable for deployment on consumer hardware.
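The metrics are reported as percentage improvements. A minimal sketch of how such a figure is typically computed is shown below, assuming the improvement is measured relative to a baseline score (the card does not state the exact formula). The function name `relative_improvement` and the example scores are hypothetical and are not taken from the evaluation.

```python
def relative_improvement(base_score: float, tuned_score: float) -> float:
    """Return the percentage change of tuned_score relative to base_score."""
    if base_score <= 0:
        raise ValueError("base_score must be positive")
    return (tuned_score - base_score) / base_score * 100.0


# Hypothetical scores for illustration only; not results from this model.
example_base, example_tuned = 0.20, 0.25
print(f"Relative improvement: {relative_improvement(example_base, example_tuned):.2f}%")
```

Under this reading, an improvement above 100% (as reported for BLEU) would mean the fine-tuned score is more than double the baseline score.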
## Usage

### Streaming function

```python
import threading

import torch
from peft import PeftModel
from transformers import (
    AutoModelForCausalLM,
    AutoTokenizer,
    BitsAndBytesConfig,
    TextIteratorStreamer,
)

# For 4-bit quantized inference (recommended)
bnb_config = BitsAndBytesConfig(
    load_in_4bit=True,
    bnb_4bit_use_double_quant=True,
    bnb_4bit_quant_type="nf4",
    bnb_4bit_compute_dtype=torch.bfloat16,
)

# First load the base model with quantization
base_model = AutoModelForCausalLM.from_pretrained(
    "HuggingFaceTB/SmolLM2-1.7B-Instruct",
    quantization_config=bnb_config,
    device_map="auto",
)

# Then load the adapter weights (LoRA)
model = PeftModel.from_pretrained(base_model, "zahemen9900/finsight-ai")
tokenizer = AutoTokenizer.from_pretrained("HuggingFaceTB/SmolLM2-1.7B-Instruct")

system_prompt = "You are Finsight, a finance bot trained to assist users with financial insights"
prompt = "What's your name, and what're you good at?"

messages = [
    {"role": "system", "content": system_prompt},
    {"role": "user", "content": prompt},
]

formatted_prompt = tokenizer.apply_chat_template(
    messages,
    tokenize=False,
    add_generation_prompt=True,
)

# Tokenize the formatted prompt and move all tensors to the model's device
inputs = tokenizer(formatted_prompt, return_tensors="pt").to(model.device)

# Create a streamer that yields decoded text as tokens are generated
streamer = TextIteratorStreamer(
    tokenizer, timeout=20.0, skip_prompt=True, skip_special_tokens=True
)

# Adjust generation parameters for more controlled responses
generation_config = {
    "max_new_tokens": 256,
    "temperature": 0.6,
    "top_p": 0.95,
    "do_sample": True,
    "pad_token_id": tokenizer.eos_token_id,
    "eos_token_id": tokenizer.eos_token_id,
    "repetition_penalty": 1.2,
    "no_repeat_ngram_size": 4,
    "num_beams": 1,
    "early_stopping": False,
    "length_penalty": 1.0,
}

# Combine inputs (input_ids and attention_mask), generation config, and streamer
generation_kwargs = {**inputs, **generation_config, "streamer": streamer}

# Start generation in a separate thread so the main thread can consume the stream
thread = threading.Thread(target=model.generate, kwargs=generation_kwargs)
thread.start()

# Iterate over the generated text as it arrives
print("Response: ", end="")
for text in streamer:
    print(text, end="", flush=True)
print()

# Wait for the generation thread to finish
thread.join()
```

### Simple Non-Streaming Usage

If you prefer a simpler approach without streaming:

```python
import torch
from peft import PeftModel
from transformers import AutoModelForCausalLM, AutoTokenizer, BitsAndBytesConfig

# For 4-bit quantized inference
bnb_config = BitsAndBytesConfig(
    load_in_4bit=True,
    bnb_4bit_use_double_quant=True,
    bnb_4bit_quant_type="nf4",
    bnb_4bit_compute_dtype=torch.bfloat16,
)

# Load base model with quantization
base_model = AutoModelForCausalLM.from_pretrained(
    "HuggingFaceTB/SmolLM2-1.7B-Instruct",
    quantization_config=bnb_config,
    device_map="auto",
)

# Load adapter weights (LoRA)
model = PeftModel.from_pretrained(base_model, "zahemen9900/finsight-ai")
tokenizer = AutoTokenizer.from_pretrained("HuggingFaceTB/SmolLM2-1.7B-Instruct")

# Prepare input
system_prompt = "You are Finsight, a finance bot trained to assist users with financial insights"
user_prompt = "What's a good strategy for long-term investing?"

messages = [
    {"role": "system", "content": system_prompt},
    {"role": "user", "content": user_prompt},
]

formatted_prompt = tokenizer.apply_chat_template(
    messages,
    tokenize=False,
    add_generation_prompt=True,
)

inputs = tokenizer(formatted_prompt, return_tensors="pt").to(model.device)

# Generate response (passing the attention mask along with the input IDs)
outputs = model.generate(
    **inputs,
    max_new_tokens=256,
    temperature=0.7,
    top_p=0.95,
    do_sample=True,
    repetition_penalty=1.2,
)

# Decode only the newly generated tokens, skipping the prompt
prompt_length = inputs["input_ids"].shape[-1]
response = tokenizer.decode(outputs[0][prompt_length:], skip_special_tokens=True)
print("Response:\n", response.strip())
```

## Training Details

The model was trained using the following configuration:

- **QLoRA Parameters**:
  - Rank (r): 64
  - Alpha: 16
  - Target modules: Query, Key, Value projections, MLP layers
  - 4-bit NF4 quantization with double quantization
- **Training Hyperparameters**:
  - Learning rate: 2e-4
  - Epochs: 2
  - Batch size: 2 (with gradient accumulation steps of 4)
  - Weight decay: 0.05
  - Scheduler: Cosine with restarts
  - Warmup ratio: 0.15
- **Hardware**: Consumer-grade NVIDIA RTX 3050 GPU with 6GB VRAM

#### **More details can be found in the paper linked above.**

## Limitations

- **Information Currency**: Financial data and knowledge within the model are limited to the training data cutoff date. Market conditions, regulations, and financial instruments may have changed since then.
- **No Real-time Information**: The model operates without internet connectivity and cannot access current market data, breaking news, or recent economic developments.
- **Not Financial Advice**: Responses should not be considered personalized financial advice. The model cannot account for the individual financial situations, risk tolerances, or specific circumstances that proper financial planning requires.
- **Language Limitations**: While optimized for English financial terminology, the model may have reduced performance with non-English financial terms or concepts specific to regional markets.
- **Regulatory Compliance**: The model is not updated with the latest financial regulations across different jurisdictions and cannot ensure compliance with local financial laws.
- **Complexity Handling**: May struggle with highly complex or niche financial scenarios that were underrepresented in the training data.
- **Size of Dataset**: The size of the dataset appears to be a significant bottleneck in the fine-tuning process: we observed that the model often fails to generate useful content for niche or extremely specific topics.

## Future Improvements

- **Retrieval Augmented Generation (RAG)**: Implementing RAG would allow the model to reference current financial data, market statistics, and regulatory information before generating responses, which should improve accuracy and relevance.
- **Domain-Specific Fine-tuning**: Additional training on specialized financial domains like cryptocurrency, derivatives trading, and international tax regulations.
- **Multilingual Support**: Expanding capabilities to handle financial terminology and concepts across multiple languages and markets.
- **Personalization Framework**: Developing mechanisms to better contextualize responses based on stated user preferences while maintaining privacy.
- **A larger, higher quality dataset**: The model already shows promising results on the relatively small dataset it was trained on (16.5M tokens). This suggests that a larger high-quality dataset would yield further gains in future fine-tuning runs. Steps will be taken to address this in a future version of the model.

## Citation

If you use FinSight AI in your research, please cite:

```bibtex
@misc{FinSightAI2025,
  author = {Zahemen, FinsightAI Team},
  title = {FinSight AI: Enhancing Financial Domain Performance of Small Language Models Through QLoRA Fine-tuning},
  year = {2025},
  publisher = {GitHub},
  journal = {GitHub repository},
  howpublished = {\url{https://github.com/zahemen9900/FinsightAI}}
}
```
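
## Appendix: Derived Training Quantities

The hyperparameters listed under Training Details imply a few derived quantities that are useful when reproducing the run. The sketch below works them out under stated assumptions: the dataset holds exactly 11,000 examples (the card says "~11,000"), every example is used for training with no held-out split, training runs on a single GPU, and LoRA uses the standard `alpha / r` scaling rather than a rank-stabilized variant. The `TrainingSetup` class is a hypothetical helper and is not part of the training repository.

```python
import math
from dataclasses import dataclass


@dataclass(frozen=True)
class TrainingSetup:
    """Hyperparameters from the model card plus labeled assumptions."""

    num_examples: int = 11_000        # assumption: "~11,000" taken as exact
    epochs: int = 2
    per_device_batch_size: int = 2
    grad_accum_steps: int = 4
    warmup_ratio: float = 0.15
    lora_rank: int = 64
    lora_alpha: int = 16
    num_gpus: int = 1                 # assumption: single RTX 3050

    @property
    def effective_batch_size(self) -> int:
        # Examples contributing to each optimizer update
        return self.per_device_batch_size * self.grad_accum_steps * self.num_gpus

    @property
    def optimizer_steps(self) -> int:
        # Optimizer updates per epoch (rounded up), times the number of epochs
        steps_per_epoch = math.ceil(self.num_examples / self.effective_batch_size)
        return steps_per_epoch * self.epochs

    @property
    def warmup_steps(self) -> int:
        # Number of steps spent in learning-rate warmup
        return math.ceil(self.optimizer_steps * self.warmup_ratio)

    @property
    def lora_scaling(self) -> float:
        # Standard LoRA scales the low-rank update by alpha / r
        return self.lora_alpha / self.lora_rank


setup = TrainingSetup()
print("Effective batch size:", setup.effective_batch_size)
print("Total optimizer steps:", setup.optimizer_steps)
print("Warmup steps:", setup.warmup_steps)
print("LoRA scaling factor:", setup.lora_scaling)
```

With these assumptions the effective batch size is 8 and the LoRA scaling factor is 0.25, so the adapter update is scaled down relative to configurations where alpha is equal to or larger than the rank.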