Sergei Petrov committed
Commit 0ba6d5c · Parent(s): 2fe6fac

lance version

- README.md +1 -1
- gradio_app/requirements.txt +1 -1

README.md CHANGED

    @@ -5,7 +5,7 @@ Deliberately stripped down to leave some room for experimenting
     - Clone https://github.com/huggingface/transformers to a local machine
     - Use the **prep_scrips/markdown_to_text.py** script to extract raw text from markdown from transformers/docs/source/en/
     - Break the resulting texts down into semantically meaningful pieces. Experiment with different chunking mechanisms to make sure the semantic meaning is captured.
    -- Use **prep_scrips/lancedb_setup.py** to embed and store chunks in a [lancedb](https://lancedb.github.io/lancedb/) instance. It also creates an index for fast ANN retrieval (not really needed for this exercise but necessary at scale). You'll need to put your own values into VECTOR_COLUMN_NAME, TEXT_COLUMN_NAME, DB_TABLE_NAME.
    +- Use **prep_scrips/lancedb_setup.py** to embed and store chunks in a [lancedb](https://lancedb.github.io/lancedb/) instance. It also creates an index for fast ANN retrieval (not really needed for this exercise but necessary at scale). You'll need to put your own values into VECTOR_COLUMN_NAME, TEXT_COLUMN_NAME, DB_TABLE_NAME. If you are getting lancedb errors at the inference time try to drop the index because it might be not enough data to make it work.
     - Move the database directory (.lancedb by default) to **gradio_app/**
     - Use the template given in **gradio_app** to wrap everything into the [Gradio](https://www.gradio.app/docs/interface) app and run it on HF [spaces](https://huggingface.co/docs/hub/spaces-config-reference). Make sure to adjust VECTOR_COLUMN_NAME, TEXT_COLUMN_NAME, DB_TABLE_NAME according to your DB setup.
     - In your space, set up secrets OPENAI_API_KEY and HUGGING_FACE_HUB_TOKEN to use OpenAI and open-source models correspondingly
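The README line added in this commit suggests dropping the ANN index when LanceDB errors at inference time, because a small table may not hold enough data for the index. The sketch below illustrates that trade-off in plain Python, without LanceDB: `exact_search` is a brute-force cosine search, roughly what a vector store does when no index exists, and `should_build_ann_index` is a hypothetical guard. The guard assumes an IVF-style index that clusters vectors into `num_partitions` groups and therefore needs at least that many rows; the real LanceDB thresholds and error messages are not stated in this commit.

```python
import math
from typing import List, Sequence, Tuple


def cosine_similarity(a: Sequence[float], b: Sequence[float]) -> float:
    """Cosine similarity of two equal-length vectors (0.0 if either is all zeros)."""
    if len(a) != len(b):
        raise ValueError("vectors must have the same dimension")
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0 or norm_b == 0:
        return 0.0
    return dot / (norm_a * norm_b)


def exact_search(
    query: Sequence[float], vectors: Sequence[Sequence[float]], k: int
) -> List[Tuple[int, float]]:
    """Brute-force top-k search: scores every stored vector, so no index is needed."""
    scored = [(i, cosine_similarity(query, v)) for i, v in enumerate(vectors)]
    scored.sort(key=lambda pair: pair[1], reverse=True)
    return scored[:k]


def should_build_ann_index(num_rows: int, num_partitions: int) -> bool:
    """Hypothetical guard: only train a partitioned (IVF-style) index when
    there are at least as many rows as partitions to cluster."""
    return num_rows >= num_partitions


if __name__ == "__main__":
    vectors = [[1.0, 0.0], [0.0, 1.0], [0.7, 0.7]]
    print([i for i, _ in exact_search([1.0, 0.1], vectors, k=2)])  # [0, 2]
    print(should_build_ann_index(num_rows=100, num_partitions=256))  # False
```

An exhaustive scan costs one similarity per stored chunk per query, which is consistent with the README's note that the index is not really needed for this exercise but becomes necessary at scale.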
    	
gradio_app/requirements.txt CHANGED

    @@ -5,5 +5,5 @@ ipywidgets==8.1.1
     tqdm==4.66.1
     aiohttp==3.8.6
     huggingface-hub==0.17.3
    -lancedb
    +lancedb==0.3.1
     openai==0.28
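This commit replaces the unpinned `lancedb` requirement with `lancedb==0.3.1`, so the Space installs a fixed release instead of whatever version is newest at build time. A small stdlib-only sketch for auditing such a file is shown here. `parse_pins`, `unpinned`, `mismatches`, and `installed_versions` are hypothetical helpers, and the parser deliberately understands only bare names and `name==version` lines, not the full pip requirement-specifier grammar (ranges, extras, environment markers).

```python
from importlib import metadata
from typing import Dict, Iterable, List, Optional, Tuple


def parse_pins(requirements_text: str) -> Dict[str, Optional[str]]:
    """Map each package name (lowercased) to its pinned version, or None if unpinned."""
    pins: Dict[str, Optional[str]] = {}
    for raw in requirements_text.splitlines():
        line = raw.split("#", 1)[0].strip()  # drop comments and surrounding whitespace
        if not line:
            continue
        if "==" in line:
            name, version = line.split("==", 1)
            pins[name.strip().lower()] = version.strip()
        else:
            pins[line.lower()] = None  # bare name such as the old "lancedb" line
    return pins


def unpinned(pins: Dict[str, Optional[str]]) -> List[str]:
    """Names that have no exact version pin."""
    return sorted(name for name, version in pins.items() if version is None)


def mismatches(
    pins: Dict[str, Optional[str]], installed: Dict[str, Optional[str]]
) -> Dict[str, Tuple[str, Optional[str]]]:
    """Pinned packages whose installed version differs: name -> (wanted, found)."""
    out: Dict[str, Tuple[str, Optional[str]]] = {}
    for name, wanted in pins.items():
        if wanted is None:
            continue  # nothing to compare against
        found = installed.get(name)
        if found != wanted:
            out[name] = (wanted, found)
    return out


def installed_versions(names: Iterable[str]) -> Dict[str, Optional[str]]:
    """Look up installed distribution versions; None when a package is missing."""
    found: Dict[str, Optional[str]] = {}
    for name in names:
        try:
            found[name] = metadata.version(name)
        except metadata.PackageNotFoundError:
            found[name] = None
    return found


if __name__ == "__main__":
    sample = "tqdm==4.66.1\naiohttp==3.8.6\nhuggingface-hub==0.17.3\nlancedb==0.3.1\nopenai==0.28\n"
    pins = parse_pins(sample)
    print(unpinned(pins))
    print(mismatches(pins, installed_versions(pins)))
```

Run against the old file, `unpinned` would report `lancedb`; against the new one it reports nothing, and `mismatches` flags any environment where a different LanceDB release is installed.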