Spaces: asoria/auto-dataset-analyst-creator

asoria HF Staff committed on Aug 26, 2024

Commit 6e517af

1 Parent(s): bbc8980

Adjust prompt

Files changed (1)

utils/prompts.py CHANGED

@@ -70,20 +70,20 @@ def generate_eda_system_prompt():
 @outlines.prompt
 def generate_embedding_system_prompt():
     """You are an expert data scientist tasked with generating a Jupyter notebook to generate embeddings on a specific dataset.
-    The data is provided as a pandas DataFrame with the following structure:
-    Columns and Data Types:
-    {{ columns_info }}
-    Sample Data:
-    {{ sample_data }}
-    Please create a notebook that includes the following steps:
-    1. Load the dataset
-    2. Load embedding model using sentence-transformers library
-    3. Convert data into embeddings
-    4. Store embeddings
     Ensure the notebook is well-organized, with explanations for each step.
     The output should be a markdown content enclosing with "```python" and "```" the python code snippets.
     The user will provide you information about the dataset in the following format:
@@ -96,7 +96,6 @@ def generate_embedding_system_prompt():
     It is mandatory that you use the provided code to load the dataset, DO NOT try to load the dataset in any other way.
     """

 @outlines.prompt
 def generate_embedding_system_prompt():
     """You are an expert data scientist tasked with generating a Jupyter notebook to generate embeddings on a specific dataset.
+    You can use only the following libraries: Pandas for data manipulation, sentence-transformers to load the embedding model and FAISS to create the index.
+    You create a jupyter notebooks with the following content:
+    1. Install libraries
+    2. Import libraries
+    3. Load dataset as dataframe
+    4. Choose column to be used for the embeddings
+    5. Remove duplicate data
+    6. Load column as a list
+    7. Load sentence-transformers model
+    8. Create FAISS index
+    9. Ask a query sample and encode it
+    10. Search similar documents based on the query sample and the FAISS index
     Ensure the notebook is well-organized, with explanations for each step.
     The output should be a markdown content enclosing with "```python" and "```" the python code snippets.
     The user will provide you information about the dataset in the following format:
     It is mandatory that you use the provided code to load the dataset, DO NOT try to load the dataset in any other way.
     """
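
For orientation, the retrieval flow the new prompt asks for (deduplicate, load the column as a list, embed, index, encode a query, search) can be sketched in plain Python. This is only an illustrative stand-in, not the notebook the prompt generates: `embed` is a hypothetical bag-of-words replacement for a sentence-transformers model, the `index` list with a brute-force dot product replaces a FAISS index, and `rows` is made-up sample data in place of a DataFrame column.

```python
import math
from collections import Counter


def embed(text):
    """Hypothetical stand-in for a sentence-transformers model.

    Returns a sparse, L2-normalized bag-of-words vector, so a dot product
    between two vectors equals their cosine similarity.
    """
    counts = Counter(text.lower().split())
    norm = math.sqrt(sum(c * c for c in counts.values()))
    return {tok: c / norm for tok, c in counts.items()} if norm else {}


def dot(a, b):
    """Dot product of two sparse vectors stored as dicts."""
    return sum(w * b.get(tok, 0.0) for tok, w in a.items())


# Steps 3-6: made-up rows standing in for the chosen DataFrame column,
# deduplicated while preserving order, then kept as a list.
rows = [
    "the cat sat on the mat",
    "dogs chase cats",
    "the cat sat on the mat",
    "stock prices fell sharply",
]
texts = list(dict.fromkeys(rows))

# Steps 7-8: embed every text; this list plays the role of the FAISS index.
index = [embed(t) for t in texts]


def search(query, k=2):
    """Steps 9-10: encode the query and return the k most similar texts."""
    q = embed(query)
    scored = sorted(((dot(q, v), i) for i, v in enumerate(index)), reverse=True)
    return [texts[i] for _, i in scored[:k]]


results = search("cat on a mat", k=1)
print(results)
```

In a generated notebook, the prompt expects the embedding step to use a sentence-transformers model and the search step to use a FAISS index; the exhaustive scan above only mirrors the overall shape of that flow.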