Spaces: Running

vinit5112 committed
Commit aff287e · 1 Parent(s): d994686

load model locally
Browse files
- Dockerfile +7 -1
- backend/CONVERSATION_HISTORY_SYSTEM.md +0 -249
- backend/SETUP_OFFLINE.md +0 -68
- backend/STREAMING_ANALYSIS.md +0 -178
- backend/download_model.py +0 -66
- backend/vector_store.py +59 -43
- model/all-MiniLM-L6-v2/1_Pooling/config.json +10 -0
- model/all-MiniLM-L6-v2/README.md +173 -0
- model/all-MiniLM-L6-v2/config.json +25 -0
- model/all-MiniLM-L6-v2/config_sentence_transformers.json +14 -0
- model/all-MiniLM-L6-v2/model.safetensors +3 -0
- model/all-MiniLM-L6-v2/modules.json +20 -0
- model/all-MiniLM-L6-v2/sentence_bert_config.json +4 -0
- model/all-MiniLM-L6-v2/special_tokens_map.json +37 -0
- model/all-MiniLM-L6-v2/tokenizer.json +0 -0
- model/all-MiniLM-L6-v2/tokenizer_config.json +65 -0
- model/all-MiniLM-L6-v2/vocab.txt +0 -0
- temp.py +6 -0
Dockerfile
CHANGED

```diff
@@ -25,7 +25,13 @@ RUN pip install --no-cache-dir -r requirements.txt
 # Copy backend code
 COPY backend/ /app/backend
 
-
+
+COPY model/ /app/model/
+
+ENV TRANSFORMERS_CACHE=/app/model
+ENV HF_HUB_OFFLINE=1
+ENV TRANSFORMERS_OFFLINE=1
+
 COPY --from=frontend-build /app/frontend/build /app/frontend_build
 
 # Install nginx
```
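The new `COPY model/ /app/model/` bakes the embedding model into the image, and the three `ENV` lines point the transformers stack at that directory while forbidding any Hugging Face Hub access at runtime. As a sanity check, here is a minimal sketch (assuming the image layout above, `/app/model/all-MiniLM-L6-v2`; this script is not part of the commit) showing the model should load with networking disabled:

```python
# Hypothetical smoke test to run inside the built image: confirms the
# baked-in model loads while the Hugging Face Hub is unreachable.
import os
from sentence_transformers import SentenceTransformer

os.environ["HF_HUB_OFFLINE"] = "1"          # mirrors the Dockerfile ENVs
os.environ["TRANSFORMERS_OFFLINE"] = "1"

model = SentenceTransformer("/app/model/all-MiniLM-L6-v2")
print(model.encode(["offline smoke test"]).shape)  # expect (1, 384)
```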
backend/CONVERSATION_HISTORY_SYSTEM.md
DELETED

# Conversation History Management System

## Overview
The conversation history system has been upgraded from a basic memory-only implementation to a comprehensive, persistent storage solution using localStorage with advanced features.

## 🔄 **Previous Implementation (Memory Only)**
```javascript
// ❌ OLD - Lost on page refresh
const [conversations, setConversations] = useState([]);
```

## ✅ **New Implementation (Persistent Storage)**

### 1. **Core Storage Utility** (`utils/conversationStorage.js`)
A comprehensive utility class that handles all conversation persistence:

```javascript
import ConversationStorage from './utils/conversationStorage';

// Load conversations from localStorage
const conversations = ConversationStorage.loadConversations();

// Save conversations to localStorage
ConversationStorage.saveConversations(conversations);
```

### 2. **Enhanced Conversation Structure**
```javascript
{
  id: "timestamp_based_id",
  title: "Conversation Title",
  messages: [
    {
      id: "message_id",
      role: "user" | "assistant",
      content: "message content",
      timestamp: Date
    }
  ],
  createdAt: Date,
  updatedAt: Date // ✅ NEW - Track when conversation was last modified
}
```

### 3. **Automatic Persistence**
- **Load on App Start**: Conversations are automatically loaded from localStorage
- **Save on Changes**: All conversation updates are automatically saved
- **No Manual Intervention**: Everything happens transparently

## 🚀 **Key Features**

### ✅ **Persistent Storage**
- Conversations survive page refreshes
- Conversations persist across browser sessions
- Automatic loading on app startup

### ✅ **Conversation Management**
- **Create**: New conversations are automatically saved
- **Update**: Message additions and title changes are saved
- **Delete**: Conversations can be permanently removed
- **Search**: Full-text search across all conversations

### ✅ **Storage Optimization**
- **Quota Management**: Handles localStorage size limits
- **Conversation Limits**: Maximum 50 conversations (configurable)
- **Automatic Cleanup**: Reduces storage when quota exceeded

### ✅ **Import/Export**
- **Export**: Download all conversations as JSON
- **Import**: Upload and merge conversation files
- **Backup**: Easy backup and restore functionality

### ✅ **Statistics & Monitoring**
- **Storage Usage**: Track localStorage consumption
- **Conversation Count**: Monitor total conversations
- **Message Count**: Track total messages across all conversations

## 🛠 **Implementation Details**

### App.js Integration
```javascript
// Load conversations on app start
useEffect(() => {
  const savedConversations = ConversationStorage.loadConversations();
  if (savedConversations.length > 0) {
    setConversations(savedConversations);
    setChatStarted(true);
    setActiveConversationId(savedConversations[0].id);
  }
}, []);

// Enhanced conversation management
const updateConversations = (updatedConversations) => {
  setConversations(updatedConversations);
  ConversationStorage.saveConversations(updatedConversations);
};
```

### ChatInterface.js Integration
```javascript
// Conversations are automatically saved when updated
setConversations(prev => prev.map(conv =>
  conv.id === conversationId
    ? { ...conv, messages: [...conv.messages, newMessage] }
    : conv
));
```

### Sidebar.js Integration
```javascript
// Delete conversations with confirmation
const handleDelete = (conversationId) => {
  if (window.confirm('Are you sure you want to delete this conversation?')) {
    onDeleteConversation(conversationId);
  }
};
```

## 📊 **Storage Management**

### Local Storage Structure
```
Key: "ca_study_conversations"
Value: JSON array of conversation objects
```

### Storage Limits
- **Maximum Conversations**: 50 (prevents localStorage overflow)
- **Auto-Reduction**: Reduces to 25 conversations if quota exceeded
- **Size Monitoring**: Tracks storage usage in KB

### Error Handling
- **JSON Parse Errors**: Gracefully handles corrupted data
- **Storage Quota**: Automatic handling of localStorage limits
- **Network Issues**: Offline-first design

## 🔧 **Advanced Features**

### 1. **Search Functionality**
```javascript
// Search conversations by title or content
const results = ConversationStorage.searchConversations("accounting");
```

### 2. **Export Conversations**
```javascript
// Download all conversations as a JSON file
ConversationStorage.exportConversations();
```

### 3. **Import Conversations**
```javascript
// Import conversations from a file
const result = await ConversationStorage.importConversations(file);
console.log(`Imported ${result.count} conversations`);
```

### 4. **Storage Statistics**
```javascript
// Get detailed storage information
const stats = ConversationStorage.getStatistics();
// Returns: { totalConversations, totalMessages, storageSize, ... }
```

## 🔐 **Data Security & Privacy**

### Client-Side Storage
- **No Server Storage**: All data stays in the user's browser
- **Privacy First**: No conversation data sent to servers
- **User Control**: Users can export/delete their own data

### Data Format
- **JSON Structure**: Human-readable format
- **Portable**: Easy to migrate between devices
- **Versionable**: Future-proof with version tracking

## 🎯 **User Experience Improvements**

### Before (Memory Only)
❌ Lost conversations on page refresh
❌ No conversation history
❌ No persistent sessions
❌ No conversation management

### After (Persistent Storage)
✅ Conversations survive page refreshes
✅ Full conversation history
✅ Persistent user sessions
✅ Advanced conversation management
✅ Search and filter capabilities
✅ Export/import functionality
✅ Storage monitoring and optimization

## 🚀 **Future Enhancements**

### Planned Features
1. **Cloud Sync**: Optional cloud storage integration
2. **User Authentication**: Multi-device synchronization
3. **Advanced Search**: Semantic search within conversations
4. **Tags/Categories**: Organize conversations by topics
5. **Shared Conversations**: Share conversations with others
6. **Analytics**: Conversation usage analytics

### Backend Integration (Optional)
```javascript
// Future: Optional backend storage
const backendStorage = new BackendConversationStorage();
await backendStorage.syncConversations(localConversations);
```

## 📋 **Migration Guide**

### For Existing Users
1. **Automatic Migration**: Existing conversations will be migrated to the new format
2. **No Data Loss**: All existing conversations are preserved
3. **Enhanced Features**: Immediate access to new capabilities

### For New Users
1. **Automatic Setup**: No configuration required
2. **Immediate Persistence**: Conversations saved from first use
3. **Full Feature Access**: All features available immediately

## 🔧 **Troubleshooting**

### Common Issues
1. **Storage Quota Exceeded**: Automatically handled with conversation reduction
2. **Corrupted Data**: Graceful fallback to an empty conversation list
3. **Import Errors**: Validation and error reporting for file imports

### Debug Information
```javascript
// Check storage status
const stats = ConversationStorage.getStatistics();
console.log('Storage Stats:', stats);

// Clear all conversations (emergency)
ConversationStorage.clearAllConversations();
```

## ✅ **Conclusion**

The conversation history system has been completely upgraded to provide:
- **Persistent Storage**: No more lost conversations
- **Advanced Management**: Full CRUD operations
- **User Control**: Export/import capabilities
- **Performance**: Optimized for large conversation histories
- **Reliability**: Robust error handling and data protection

This system provides a professional-grade conversation management experience while maintaining simplicity and user privacy.
backend/SETUP_OFFLINE.md
DELETED

# Offline Mode Setup Guide

## Problem
The application fails to start with network connectivity errors when trying to download the sentence transformer model from Hugging Face.

## Error Message
```
Failed to resolve 'huggingface.co' ([Errno 11001] getaddrinfo failed)
```

## Solutions

### Option 1: Download the Model When You Have Internet Access
1. When you have internet access, run the download script:
```bash
cd backend
python download_model.py
```
2. This will download and cache the model locally for offline use.

### Option 2: Manual Download
If you have internet access on another machine:

1. On a machine with internet access, run:
```python
from sentence_transformers import SentenceTransformer
model = SentenceTransformer('all-MiniLM-L6-v2')
```
2. Copy the cached model from:
   - Windows: `C:\Users\{username}\.cache\huggingface\transformers\`
   - Linux/Mac: `~/.cache/huggingface/transformers/`
3. Place it in the same location on your offline machine.

### Option 3: Force Offline Mode
If you believe the model is already cached, you can force offline mode by setting environment variables:

```bash
set TRANSFORMERS_OFFLINE=1
set HF_HUB_OFFLINE=1
python backend_api.py
```

### Option 4: Network Troubleshooting
If you should have internet access:

1. Check your internet connection
2. If behind a corporate firewall, ensure `huggingface.co` is accessible
3. Try accessing `https://huggingface.co` in your browser
4. Contact your IT department if needed

## Verification
After setting up offline mode, you can verify the model is working by running:
```bash
python download_model.py
```

This will check if the model is cached and available for offline use.

## Technical Details
The sentence transformer model "all-MiniLM-L6-v2" is approximately 80MB and is used for generating embeddings from text for the vector search functionality.

The application has been modified to:
1. Try loading the model normally first
2. Fall back to offline mode if the network fails
3. Provide clear error messages with solutions
backend/STREAMING_ANALYSIS.md
DELETED

# Streaming Implementation Analysis

## Overview
This document analyzes the streaming implementation across the backend and frontend components of the CA Study Assistant application.

## ✅ Backend Implementation Analysis

### 1. RAG Streaming Function (`rag.py`)
- **Status**: ✅ **GOOD** - Recently updated with latest API
- **Implementation**:
```python
for chunk in self.client.models.generate_content_stream(
    model='gemini-2.5-flash',
    contents=prompt
):
    yield chunk.text
```
- **✅ Improvements Made**:
  - Updated to use `generate_content_stream` instead of the deprecated method
  - Uses `gemini-2.5-flash` model (latest)
  - Proper error handling with try-catch

### 2. FastAPI Streaming Endpoint (`backend_api.py`)
- **Status**: ✅ **IMPROVED** - Enhanced with better error handling
- **Implementation**:
```python
@app.post("/api/ask_stream")
async def ask_question_stream(request: QuestionRequest):
    async def event_generator():
        for chunk in rag_system.ask_question_stream(request.question):
            if chunk:  # Only yield non-empty chunks
                yield chunk
    return StreamingResponse(event_generator(), media_type="text/plain")
```
- **✅ Improvements Made**:
  - Added null/empty chunk filtering
  - Enhanced error handling in the generator
  - Proper async generator implementation

## ✅ Frontend Implementation Analysis

### 1. API Service (`services/api.js`)
- **Status**: ✅ **IMPROVED** - Enhanced with better error handling
- **Implementation**:
```javascript
export const sendMessageStream = async (message, onChunk) => {
  const response = await fetch(`${API_BASE_URL}/ask_stream`, {
    method: 'POST',
    headers: { 'Content-Type': 'application/json' },
    body: JSON.stringify({ question: message }),
  });

  const reader = response.body.getReader();
  const decoder = new TextDecoder();

  while (true) {
    const { done, value } = await reader.read();
    if (done) break;
    const chunk = decoder.decode(value, { stream: true });
    if (chunk) onChunk(chunk);
  }
};
```
- **✅ Improvements Made**:
  - Added HTTP status code checking
  - Added reader.releaseLock() for proper cleanup
  - Enhanced error handling
  - Added null chunk filtering

### 2. Chat Interface (`components/ChatInterface.js`)
- **Status**: ✅ **GOOD** - Proper real-time UI updates
- **Implementation**:
```javascript
await sendMessageStream(message.trim(), (chunk) => {
  fullResponse += chunk;
  setConversations(prev => prev.map(conv =>
    conv.id === conversationId ? {
      ...conv,
      messages: conv.messages.map(msg =>
        msg.id === assistantMessageId
          ? { ...msg, content: fullResponse }
          : msg
      ),
    } : conv
  ));
});
```
- **✅ Features**:
  - Real-time message updates
  - Proper loading states
  - Error handling with toast notifications
  - Typing indicators during streaming

## 🔧 Additional Improvements Made

### 1. Error Handling Enhancement
- **Backend**: Added comprehensive error handling in the streaming generator
- **Frontend**: Added HTTP status checking and proper resource cleanup
- **Both**: Added null/empty chunk filtering

### 2. Testing Infrastructure
- **Created**: `test_streaming.py` - Comprehensive test suite for streaming
- **Features**:
  - API connection testing
  - Streaming functionality testing
  - Error handling verification
  - Performance metrics

### 3. Documentation
- **Created**: `STREAMING_ANALYSIS.md` - This comprehensive analysis
- **Updated**: Inline code comments for better maintainability

## 🚀 How to Test the Implementation

### 1. Test API Connection
```bash
cd backend
python test_streaming.py
```

### 2. Test Full Application
```bash
# Terminal 1 - Backend
cd backend
python backend_api.py

# Terminal 2 - Frontend
cd frontend
npm start
```

### 3. Test Streaming Manually
1. Open the application in a browser
2. Ask a question
3. Observe the real-time streaming response
4. Check browser dev tools for any errors

## 📊 Performance Characteristics

### Backend
- **Latency**: Low - streams immediately as chunks arrive from Gemini
- **Memory**: Efficient - no buffering, direct streaming
- **Error Recovery**: Graceful - continues streaming even if some chunks fail

### Frontend
- **UI Responsiveness**: Excellent - real-time updates without blocking
- **Memory Usage**: Low - processes chunks as they arrive
- **Error Handling**: Comprehensive - proper cleanup and user feedback

## 🎯 API Compatibility

### Google Generative AI API
- **✅ Model**: `gemini-2.5-flash` (latest)
- **✅ Method**: `generate_content_stream` (current)
- **✅ Parameters**: `model` and `contents` (correct format)

### FastAPI Streaming
- **✅ Response Type**: `StreamingResponse` (correct)
- **✅ Media Type**: `text/plain` (compatible with frontend)
- **✅ Async Generator**: Proper async/await implementation

### Frontend Fetch API
- **✅ ReadableStream**: Proper stream handling
- **✅ TextDecoder**: Correct UTF-8 decoding
- **✅ Resource Management**: Proper cleanup

## ✅ Conclusion

The streaming implementation is **WORKING CORRECTLY** and has been enhanced with:

1. **Latest API compatibility** - Uses gemini-2.5-flash with the correct method
2. **Robust error handling** - Comprehensive error management
3. **Performance optimizations** - Efficient streaming without buffering
4. **Proper resource management** - No memory leaks or resource issues
5. **Real-time UI updates** - Smooth user experience
6. **Comprehensive testing** - Test suite for validation

The implementation follows best practices and should provide a smooth, responsive chat experience with real-time streaming responses.
backend/download_model.py
DELETED

```python
#!/usr/bin/env python3
"""
Download the sentence transformer model for offline use.
Run this script when you have internet access to cache the model locally.
"""

import os
import sys
from sentence_transformers import SentenceTransformer

def download_model():
    """Download and cache the sentence transformer model."""
    try:
        print("Downloading sentence transformer model 'all-MiniLM-L6-v2'...")
        print("This may take a few minutes on first run...")

        # This will download and cache the model
        model = SentenceTransformer("all-MiniLM-L6-v2")

        # Test that it works
        test_text = "This is a test sentence."
        embedding = model.encode([test_text])

        print(f"✓ Model downloaded successfully!")
        print(f"✓ Model tested successfully!")
        print(f"✓ Embedding dimension: {len(embedding[0])}")
        print(f"✓ Model cache location: {model.cache_folder}")

        return True

    except Exception as e:
        print(f"✗ Failed to download model: {e}")
        return False

def check_model_exists():
    """Check if the model is already cached."""
    try:
        # Try to load from cache
        import os
        os.environ['TRANSFORMERS_OFFLINE'] = '1'
        os.environ['HF_HUB_OFFLINE'] = '1'

        model = SentenceTransformer("all-MiniLM-L6-v2")
        print("✓ Model is already cached and available for offline use!")
        return True

    except Exception:
        print("✗ Model is not cached or not available for offline use")
        return False

if __name__ == "__main__":
    print("Sentence Transformer Model Downloader")
    print("=" * 40)

    # Check if model already exists
    if check_model_exists():
        print("\nModel is already available. No download needed.")
        sys.exit(0)

    # Download the model
    print("\nDownloading model...")
    if download_model():
        print("\n✓ Setup complete! You can now run the application offline.")
    else:
        print("\n✗ Download failed. Please check your internet connection.")
        sys.exit(1)
```
backend/vector_store.py
CHANGED
@@ -38,62 +38,78 @@ class VectorStore:

The online loader with cache fallbacks is replaced by a loader that reads the model from the repository's local `model/` directory; the previous implementation is retained in the file as a comment block:

```python
        self._create_collection_if_not_exists()

    def _initialize_embedding_model(self):
        """Initialize the embedding model from a local directory"""
        try:
            print("Loading sentence transformer model from local path...")
            # Resolve local path to model directory
            current_dir = os.path.dirname(os.path.abspath(__file__))
            local_model_path = os.path.join(current_dir, "..", "model", "all-MiniLM-L6-v2")
            model = SentenceTransformer(local_model_path)
            print("Successfully loaded local sentence transformer model")
            return model
        except Exception as e:
            print(f"Failed to load local model: {e}")
            raise RuntimeError("Failed to initialize embedding model from local path")

    # def _initialize_embedding_model(self):
    #     """Initialize the embedding model with offline support"""
    #     try:
    #         # Try to load the model normally first
    #         print("Attempting to load sentence transformer model...")
    #         model = SentenceTransformer("all-MiniLM-L6-v2")
    #         print("Successfully loaded sentence transformer model")
    #         return model
    #
    #     except Exception as e:
    #         print(f"Failed to load model online: {e}")
    #         print("Attempting to load model in offline mode...")
    #
    #         try:
    #             # Try to load from cache with offline mode
    #             import os
    #             os.environ['TRANSFORMERS_OFFLINE'] = '1'
    #             os.environ['HF_HUB_OFFLINE'] = '1'
    #
    #             model = SentenceTransformer("all-MiniLM-L6-v2", cache_folder=None)
    #             print("Successfully loaded model in offline mode")
    #             return model
    #
    #         except Exception as offline_error:
    #             print(f"Failed to load model in offline mode: {offline_error}")
    #
    #             # Try to find a local cache directory
    #             try:
    #                 import transformers
    #                 cache_dir = os.path.join(os.path.expanduser("~"), ".cache", "huggingface", "transformers")
    #                 if os.path.exists(cache_dir):
    #                     print(f"Looking for cached model in: {cache_dir}")
    #
    #                     # Try to load from specific cache directory
    #                     model = SentenceTransformer("all-MiniLM-L6-v2", cache_folder=cache_dir)
    #                     print("Successfully loaded model from cache")
    #                     return model
    #
    #             except Exception as cache_error:
    #                 print(f"Failed to load from cache: {cache_error}")
    #
    #             # If all else fails, provide instructions
    #             error_msg = """
    #             Failed to initialize sentence transformer model. This is likely due to network connectivity issues.
    #
    #             Solutions:
    #             1. Check your internet connection
    #             2. If behind a corporate firewall, ensure huggingface.co is accessible
    #             3. Pre-download the model when you have internet access by running:
    #                python -c "from sentence_transformers import SentenceTransformer; SentenceTransformer('all-MiniLM-L6-v2')"
    #             4. Or manually download the model and place it in your cache directory
    #
    #             For now, the application will not work without the embedding model.
    #             """
    #
    #             print(error_msg)
    #             raise RuntimeError(f"Cannot initialize embedding model: {str(e)}")

    def _create_collection_if_not_exists(self) -> bool:
        """
```
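Inside the Docker image, `backend/vector_store.py` lives in `/app/backend`, so the relative `../model/all-MiniLM-L6-v2` resolves to the `/app/model/all-MiniLM-L6-v2` directory the Dockerfile now copies in. A minimal sketch of that resolution, assuming the image layout above:

```python
# Sketch of the path resolution performed by _initialize_embedding_model
# inside the container (assumed layout: code in /app/backend, model in /app/model).
import os

backend_file = "/app/backend/vector_store.py"   # stand-in for __file__
current_dir = os.path.dirname(os.path.abspath(backend_file))
local_model_path = os.path.join(current_dir, "..", "model", "all-MiniLM-L6-v2")
print(os.path.normpath(local_model_path))       # -> /app/model/all-MiniLM-L6-v2
```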
model/all-MiniLM-L6-v2/1_Pooling/config.json
ADDED

```json
{
  "word_embedding_dimension": 384,
  "pooling_mode_cls_token": false,
  "pooling_mode_mean_tokens": true,
  "pooling_mode_max_tokens": false,
  "pooling_mode_mean_sqrt_len_tokens": false,
  "pooling_mode_weightedmean_tokens": false,
  "pooling_mode_lasttoken": false,
  "include_prompt": true
}
```
model/all-MiniLM-L6-v2/README.md
ADDED

---
language: en
license: apache-2.0
library_name: sentence-transformers
tags:
- sentence-transformers
- feature-extraction
- sentence-similarity
- transformers
datasets:
- s2orc
- flax-sentence-embeddings/stackexchange_xml
- ms_marco
- gooaq
- yahoo_answers_topics
- code_search_net
- search_qa
- eli5
- snli
- multi_nli
- wikihow
- natural_questions
- trivia_qa
- embedding-data/sentence-compression
- embedding-data/flickr30k-captions
- embedding-data/altlex
- embedding-data/simple-wiki
- embedding-data/QQP
- embedding-data/SPECTER
- embedding-data/PAQ_pairs
- embedding-data/WikiAnswers
pipeline_tag: sentence-similarity
---

# all-MiniLM-L6-v2
This is a [sentence-transformers](https://www.SBERT.net) model: it maps sentences & paragraphs to a 384-dimensional dense vector space and can be used for tasks like clustering or semantic search.

## Usage (Sentence-Transformers)
Using this model becomes easy when you have [sentence-transformers](https://www.SBERT.net) installed:

```
pip install -U sentence-transformers
```

Then you can use the model like this:
```python
from sentence_transformers import SentenceTransformer
sentences = ["This is an example sentence", "Each sentence is converted"]

model = SentenceTransformer('sentence-transformers/all-MiniLM-L6-v2')
embeddings = model.encode(sentences)
print(embeddings)
```

## Usage (HuggingFace Transformers)
Without [sentence-transformers](https://www.SBERT.net), you can use the model like this: first, you pass your input through the transformer model, then you have to apply the right pooling operation on top of the contextualized word embeddings.

```python
from transformers import AutoTokenizer, AutoModel
import torch
import torch.nn.functional as F

# Mean Pooling - Take attention mask into account for correct averaging
def mean_pooling(model_output, attention_mask):
    token_embeddings = model_output[0]  # First element of model_output contains all token embeddings
    input_mask_expanded = attention_mask.unsqueeze(-1).expand(token_embeddings.size()).float()
    return torch.sum(token_embeddings * input_mask_expanded, 1) / torch.clamp(input_mask_expanded.sum(1), min=1e-9)


# Sentences we want sentence embeddings for
sentences = ['This is an example sentence', 'Each sentence is converted']

# Load model from HuggingFace Hub
tokenizer = AutoTokenizer.from_pretrained('sentence-transformers/all-MiniLM-L6-v2')
model = AutoModel.from_pretrained('sentence-transformers/all-MiniLM-L6-v2')

# Tokenize sentences
encoded_input = tokenizer(sentences, padding=True, truncation=True, return_tensors='pt')

# Compute token embeddings
with torch.no_grad():
    model_output = model(**encoded_input)

# Perform pooling
sentence_embeddings = mean_pooling(model_output, encoded_input['attention_mask'])

# Normalize embeddings
sentence_embeddings = F.normalize(sentence_embeddings, p=2, dim=1)

print("Sentence embeddings:")
print(sentence_embeddings)
```

------

## Background

The project aims to train sentence embedding models on very large sentence-level datasets using a self-supervised contrastive learning objective. We used the pretrained [`nreimers/MiniLM-L6-H384-uncased`](https://huggingface.co/nreimers/MiniLM-L6-H384-uncased) model and fine-tuned it on a 1B sentence pairs dataset. We use a contrastive learning objective: given a sentence from the pair, the model should predict which out of a set of randomly sampled other sentences was actually paired with it in our dataset.

We developed this model during the [Community week using JAX/Flax for NLP & CV](https://discuss.huggingface.co/t/open-to-the-community-community-week-using-jax-flax-for-nlp-cv/7104), organized by Hugging Face. We developed this model as part of the project: [Train the Best Sentence Embedding Model Ever with 1B Training Pairs](https://discuss.huggingface.co/t/train-the-best-sentence-embedding-model-ever-with-1b-training-pairs/7354). We benefited from efficient hardware infrastructure to run the project: 7 TPUs v3-8, as well as intervention from Google's Flax, JAX, and Cloud team members about efficient deep learning frameworks.

## Intended uses

Our model is intended to be used as a sentence and short paragraph encoder. Given an input text, it outputs a vector which captures the semantic information. The sentence vector may be used for information retrieval, clustering or sentence similarity tasks.

By default, input text longer than 256 word pieces is truncated.

## Training procedure

### Pre-training

We use the pretrained [`nreimers/MiniLM-L6-H384-uncased`](https://huggingface.co/nreimers/MiniLM-L6-H384-uncased) model. Please refer to the model card for more detailed information about the pre-training procedure.

### Fine-tuning

We fine-tune the model using a contrastive objective. Formally, we compute the cosine similarity of each possible sentence pair from the batch. We then apply the cross entropy loss by comparing with the true pairs.

#### Hyperparameters

We trained our model on a TPU v3-8. We trained the model for 100k steps using a batch size of 1024 (128 per TPU core). We used a learning rate warm-up of 500 steps. The sequence length was limited to 128 tokens. We used the AdamW optimizer with a 2e-5 learning rate. The full training script is accessible in this current repository: `train_script.py`.

#### Training data

We use the concatenation of multiple datasets to fine-tune our model. The total number of sentence pairs is above 1 billion. We sampled each dataset given a weighted probability; the configuration is detailed in the `data_config.json` file.

| Dataset | Paper | Number of training tuples |
|--------------------------------------------------------|:----------------------------------------:|:--------------------------:|
| [Reddit comments (2015-2018)](https://github.com/PolyAI-LDN/conversational-datasets/tree/master/reddit) | [paper](https://arxiv.org/abs/1904.06472) | 726,484,430 |
| [S2ORC](https://github.com/allenai/s2orc) Citation pairs (Abstracts) | [paper](https://aclanthology.org/2020.acl-main.447/) | 116,288,806 |
| [WikiAnswers](https://github.com/afader/oqa#wikianswers-corpus) Duplicate question pairs | [paper](https://doi.org/10.1145/2623330.2623677) | 77,427,422 |
| [PAQ](https://github.com/facebookresearch/PAQ) (Question, Answer) pairs | [paper](https://arxiv.org/abs/2102.07033) | 64,371,441 |
| [S2ORC](https://github.com/allenai/s2orc) Citation pairs (Titles) | [paper](https://aclanthology.org/2020.acl-main.447/) | 52,603,982 |
| [S2ORC](https://github.com/allenai/s2orc) (Title, Abstract) | [paper](https://aclanthology.org/2020.acl-main.447/) | 41,769,185 |
| [Stack Exchange](https://huggingface.co/datasets/flax-sentence-embeddings/stackexchange_xml) (Title, Body) pairs | - | 25,316,456 |
| [Stack Exchange](https://huggingface.co/datasets/flax-sentence-embeddings/stackexchange_xml) (Title+Body, Answer) pairs | - | 21,396,559 |
| [Stack Exchange](https://huggingface.co/datasets/flax-sentence-embeddings/stackexchange_xml) (Title, Answer) pairs | - | 21,396,559 |
| [MS MARCO](https://microsoft.github.io/msmarco/) triplets | [paper](https://doi.org/10.1145/3404835.3462804) | 9,144,553 |
| [GOOAQ: Open Question Answering with Diverse Answer Types](https://github.com/allenai/gooaq) | [paper](https://arxiv.org/pdf/2104.08727.pdf) | 3,012,496 |
| [Yahoo Answers](https://www.kaggle.com/soumikrakshit/yahoo-answers-dataset) (Title, Answer) | [paper](https://proceedings.neurips.cc/paper/2015/hash/250cf8b51c773f3f8dc8b4be867a9a02-Abstract.html) | 1,198,260 |
| [Code Search](https://huggingface.co/datasets/code_search_net) | - | 1,151,414 |
| [COCO](https://cocodataset.org/#home) Image captions | [paper](https://link.springer.com/chapter/10.1007%2F978-3-319-10602-1_48) | 828,395 |
| [SPECTER](https://github.com/allenai/specter) citation triplets | [paper](https://doi.org/10.18653/v1/2020.acl-main.207) | 684,100 |
| [Yahoo Answers](https://www.kaggle.com/soumikrakshit/yahoo-answers-dataset) (Question, Answer) | [paper](https://proceedings.neurips.cc/paper/2015/hash/250cf8b51c773f3f8dc8b4be867a9a02-Abstract.html) | 681,164 |
| [Yahoo Answers](https://www.kaggle.com/soumikrakshit/yahoo-answers-dataset) (Title, Question) | [paper](https://proceedings.neurips.cc/paper/2015/hash/250cf8b51c773f3f8dc8b4be867a9a02-Abstract.html) | 659,896 |
| [SearchQA](https://huggingface.co/datasets/search_qa) | [paper](https://arxiv.org/abs/1704.05179) | 582,261 |
| [Eli5](https://huggingface.co/datasets/eli5) | [paper](https://doi.org/10.18653/v1/p19-1346) | 325,475 |
| [Flickr 30k](https://shannon.cs.illinois.edu/DenotationGraph/) | [paper](https://transacl.org/ojs/index.php/tacl/article/view/229/33) | 317,695 |
| [Stack Exchange](https://huggingface.co/datasets/flax-sentence-embeddings/stackexchange_xml) Duplicate questions (titles) | | 304,525 |
| AllNLI ([SNLI](https://nlp.stanford.edu/projects/snli/) and [MultiNLI](https://cims.nyu.edu/~sbowman/multinli/)) | [paper SNLI](https://doi.org/10.18653/v1/d15-1075), [paper MultiNLI](https://doi.org/10.18653/v1/n18-1101) | 277,230 |
| [Stack Exchange](https://huggingface.co/datasets/flax-sentence-embeddings/stackexchange_xml) Duplicate questions (bodies) | | 250,519 |
| [Stack Exchange](https://huggingface.co/datasets/flax-sentence-embeddings/stackexchange_xml) Duplicate questions (titles+bodies) | | 250,460 |
| [Sentence Compression](https://github.com/google-research-datasets/sentence-compression) | [paper](https://www.aclweb.org/anthology/D13-1155/) | 180,000 |
| [Wikihow](https://github.com/pvl/wikihow_pairs_dataset) | [paper](https://arxiv.org/abs/1810.09305) | 128,542 |
| [Altlex](https://github.com/chridey/altlex/) | [paper](https://aclanthology.org/P16-1135.pdf) | 112,696 |
| [Quora Question Triplets](https://quoradata.quora.com/First-Quora-Dataset-Release-Question-Pairs) | - | 103,663 |
| [Simple Wikipedia](https://cs.pomona.edu/~dkauchak/simplification/) | [paper](https://www.aclweb.org/anthology/P11-2117/) | 102,225 |
| [Natural Questions (NQ)](https://ai.google.com/research/NaturalQuestions) | [paper](https://transacl.org/ojs/index.php/tacl/article/view/1455) | 100,231 |
| [SQuAD2.0](https://rajpurkar.github.io/SQuAD-explorer/) | [paper](https://aclanthology.org/P18-2124.pdf) | 87,599 |
| [TriviaQA](https://huggingface.co/datasets/trivia_qa) | - | 73,346 |
| **Total** | | **1,170,060,424** |
model/all-MiniLM-L6-v2/config.json
ADDED

```json
{
  "architectures": [
    "BertModel"
  ],
  "attention_probs_dropout_prob": 0.1,
  "classifier_dropout": null,
  "gradient_checkpointing": false,
  "hidden_act": "gelu",
  "hidden_dropout_prob": 0.1,
  "hidden_size": 384,
  "initializer_range": 0.02,
  "intermediate_size": 1536,
  "layer_norm_eps": 1e-12,
  "max_position_embeddings": 512,
  "model_type": "bert",
  "num_attention_heads": 12,
  "num_hidden_layers": 6,
  "pad_token_id": 0,
  "position_embedding_type": "absolute",
  "torch_dtype": "float32",
  "transformers_version": "4.53.0",
  "type_vocab_size": 2,
  "use_cache": true,
  "vocab_size": 30522
}
```
model/all-MiniLM-L6-v2/config_sentence_transformers.json
ADDED

```json
{
  "__version__": {
    "sentence_transformers": "5.0.0",
    "transformers": "4.53.0",
    "pytorch": "2.7.1+cpu"
  },
  "model_type": "SentenceTransformer",
  "prompts": {
    "query": "",
    "document": ""
  },
  "default_prompt_name": null,
  "similarity_fn_name": "cosine"
}
```
model/all-MiniLM-L6-v2/model.safetensors
ADDED

```
version https://git-lfs.github.com/spec/v1
oid sha256:1377e9af0ca0b016a9f2aa584d6fc71ab3ea6804fae21ef9fb1416e2944057ac
size 90864192
```
model/all-MiniLM-L6-v2/modules.json
ADDED

```json
[
  {
    "idx": 0,
    "name": "0",
    "path": "",
    "type": "sentence_transformers.models.Transformer"
  },
  {
    "idx": 1,
    "name": "1",
    "path": "1_Pooling",
    "type": "sentence_transformers.models.Pooling"
  },
  {
    "idx": 2,
    "name": "2",
    "path": "2_Normalize",
    "type": "sentence_transformers.models.Normalize"
  }
]
```
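`modules.json` defines the three-stage encode pipeline the loader in `vector_store.py` reconstructs: a BERT Transformer, mean Pooling (per `1_Pooling/config.json`), then L2 Normalize. A minimal sketch of inspecting that pipeline, assuming the `model/` directory from this commit is on disk:

```python
# Sketch: walk the module pipeline of the locally saved model.
from sentence_transformers import SentenceTransformer

model = SentenceTransformer("model/all-MiniLM-L6-v2")
for module in model:                     # Transformer -> Pooling -> Normalize
    print(type(module).__name__)

vec = model.encode(["hello world"])[0]   # already L2-normalized by the last stage
print(vec @ vec)                         # dot product with itself, approximately 1.0
```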
model/all-MiniLM-L6-v2/sentence_bert_config.json
ADDED

```json
{
  "max_seq_length": 256,
  "do_lower_case": false
}
```
model/all-MiniLM-L6-v2/special_tokens_map.json
ADDED

```json
{
  "cls_token": {
    "content": "[CLS]",
    "lstrip": false,
    "normalized": false,
    "rstrip": false,
    "single_word": false
  },
  "mask_token": {
    "content": "[MASK]",
    "lstrip": false,
    "normalized": false,
    "rstrip": false,
    "single_word": false
  },
  "pad_token": {
    "content": "[PAD]",
    "lstrip": false,
    "normalized": false,
    "rstrip": false,
    "single_word": false
  },
  "sep_token": {
    "content": "[SEP]",
    "lstrip": false,
    "normalized": false,
    "rstrip": false,
    "single_word": false
  },
  "unk_token": {
    "content": "[UNK]",
    "lstrip": false,
    "normalized": false,
    "rstrip": false,
    "single_word": false
  }
}
```
model/all-MiniLM-L6-v2/tokenizer.json
ADDED
The diff for this file is too large to render. See raw diff.
model/all-MiniLM-L6-v2/tokenizer_config.json
ADDED

```json
{
  "added_tokens_decoder": {
    "0": {
      "content": "[PAD]",
      "lstrip": false,
      "normalized": false,
      "rstrip": false,
      "single_word": false,
      "special": true
    },
    "100": {
      "content": "[UNK]",
      "lstrip": false,
      "normalized": false,
      "rstrip": false,
      "single_word": false,
      "special": true
    },
    "101": {
      "content": "[CLS]",
      "lstrip": false,
      "normalized": false,
      "rstrip": false,
      "single_word": false,
      "special": true
    },
    "102": {
      "content": "[SEP]",
      "lstrip": false,
      "normalized": false,
      "rstrip": false,
      "single_word": false,
      "special": true
    },
    "103": {
      "content": "[MASK]",
      "lstrip": false,
      "normalized": false,
      "rstrip": false,
      "single_word": false,
      "special": true
    }
  },
  "clean_up_tokenization_spaces": false,
  "cls_token": "[CLS]",
  "do_basic_tokenize": true,
  "do_lower_case": true,
  "extra_special_tokens": {},
  "mask_token": "[MASK]",
  "max_length": 128,
  "model_max_length": 256,
  "never_split": null,
  "pad_to_multiple_of": null,
  "pad_token": "[PAD]",
  "pad_token_type_id": 0,
  "padding_side": "right",
  "sep_token": "[SEP]",
  "stride": 0,
  "strip_accents": null,
  "tokenize_chinese_chars": true,
  "tokenizer_class": "BertTokenizer",
  "truncation_side": "right",
  "truncation_strategy": "longest_first",
  "unk_token": "[UNK]"
}
```
model/all-MiniLM-L6-v2/vocab.txt
ADDED
The diff for this file is too large to render. See raw diff.
temp.py
ADDED

```python
from sentence_transformers import SentenceTransformer

# This downloads the model to your local cache
model = SentenceTransformer('sentence-transformers/all-MiniLM-L6-v2')
print("Model downloading")
model.save('model/all-MiniLM-L6-v2')
```
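`temp.py` appears to be the one-off helper used to vendor the model: run once with network access, it downloads the checkpoint and saves a self-contained copy under `model/all-MiniLM-L6-v2`, which the Dockerfile then copies into the image. A quick follow-up check that the saved copy loads standalone, a sketch assuming it is run from the repository root and not part of the commit:

```python
# Verify the vendored copy loads without touching the Hugging Face Hub.
from sentence_transformers import SentenceTransformer

local = SentenceTransformer("model/all-MiniLM-L6-v2")  # no download triggered
print(local.get_sentence_embedding_dimension())        # -> 384
```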