initial commit
- app.py +57 -0
- formatted_data.csv +8 -0
- requirements.txt +2 -0
app.py
ADDED
@@ -0,0 +1,57 @@
import streamlit as st
import pandas as pd

# Path to the CSV file
csv_file_path = "formatted_data.csv"

# Reading the CSV file
df = pd.read_csv(csv_file_path)

# Displaying the DataFrame in the Streamlit app with enhanced interactivity
st.title('Olas Predict Benchmark')
st.markdown('## Leaderboard showing the performance of Olas Predict tools on the Autocast dataset.')
st.markdown("<style>.big-font {font-size:20px !important;}</style>", unsafe_allow_html=True)
st.markdown('Use the table below to interact with the data and explore the performance of different tools.', unsafe_allow_html=True)
st.dataframe(df.style.format(precision=2))

st.markdown("""
## Benchmark Overview
- The benchmark evaluates the performance of Olas Predict tools on the Autocast dataset.
- The dataset has been refined to enhance the evaluation of the tools.
- The leaderboard shows the performance of the tools on the refined dataset.
- The script to run the benchmark is available in the repo [here](https://github.com/valory-xyz/olas-predict-benchmark).

## How to run your tools on the benchmark
- Fork the repo [here](https://github.com/valory-xyz/olas-predict-benchmark).
- Initialize the git submodules and update them to get the latest `mech` tool and dataset:
  - `git submodule init`
  - `git submodule update --remote --recursive`
- Include your tool in the `mech/packages` directory accordingly.
  - Guidelines on how to include your tool can be found [here](xxx).
- Run the benchmark script.

## Dataset Overview
This project leverages the Autocast dataset from the research paper titled ["Forecasting Future World Events with Neural Networks"](https://arxiv.org/abs/2206.15474).
The dataset has undergone further refinement to enhance the performance evaluation of Olas mech prediction tools.
Both the original and refined datasets are hosted on HuggingFace.

### Refined Dataset Files
- You can find the refined dataset on HuggingFace [here](https://huggingface.co/datasets/valory/autocast).
- `autocast_questions_filtered.json`: a JSON subset of the initial Autocast dataset.
- `autocast_questions_filtered.pkl`: a pickle file mapping URLs to their respective scraped documents within the filtered dataset.
- `retrieved_docs.pkl`: contains all the scraped texts.

### Filtering Criteria
To refine the dataset, we applied the following criteria to ensure the reliability of the URLs:
- URLs not returning HTTP 200 status codes are excluded.
- Difficult-to-scrape sites, such as Twitter and Bloomberg, are omitted.
- Links with fewer than 1000 words are removed.
- Only samples with a minimum of 5 and a maximum of 20 working URLs are retained.

### Scraping Approach
The content of the filtered URLs has been scraped using various libraries, depending on the source:
- `pypdf2` for PDF URLs.
- `wikipediaapi` for Wikipedia pages.
- `requests`, `readability-lxml`, and `html2text` for most other sources.
- `requests`, `beautifulsoup`, and `html2text` for BBC links.
""", unsafe_allow_html=True)
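The filtering criteria described in the app text can be sketched in Python. This is an illustrative sketch, not code from the benchmark repo: the function names, the 10-second timeout, and the substring host check are assumptions.

```python
from urllib.request import urlopen
from urllib.error import URLError

BLOCKED_HOSTS = ("twitter.com", "bloomberg.com")  # difficult-to-scrape sites
MIN_WORDS = 1000                                  # minimum words per link
MIN_URLS, MAX_URLS = 5, 20                        # working-URL bounds per sample

def host_is_blocked(url: str) -> bool:
    # Omit hard-to-scrape sites such as Twitter and Bloomberg.
    return any(host in url for host in BLOCKED_HOSTS)

def has_enough_words(text: str) -> bool:
    # Links with fewer than 1000 words are removed.
    return len(text.split()) >= MIN_WORDS

def returns_http_200(url: str, timeout: float = 10.0) -> bool:
    # URLs not returning an HTTP 200 status code are excluded.
    try:
        with urlopen(url, timeout=timeout) as resp:
            return resp.status == 200
    except (URLError, ValueError):
        return False

def sample_is_kept(working_urls: list) -> bool:
    # Only samples with 5 to 20 working URLs are retained.
    return MIN_URLS <= len(working_urls) <= MAX_URLS
```

A sample would pass through `host_is_blocked`, `has_enough_words`, and `returns_http_200` per URL, then `sample_is_kept` on the surviving URLs.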
formatted_data.csv
ADDED
@@ -0,0 +1,8 @@
Tool,Accuracy,Correct,Total,Mean Tokens Used,Mean Cost ($)
claude-prediction-offline,0.7201834862385321,157,218,779.4770642201835,0.006891669724770637
claude-prediction-online,0.6600660066006601,200,303,1505.3135313531352,0.013348171617161701
prediction-online,0.676737160120846,224,331,1219.6918429003022,0.001332990936555879
prediction-offline,0.6599326599326599,196,297,579.6565656565657,0.000621023569023569
prediction-online-summarized-info,0.6209150326797386,190,306,1008.4542483660131,0.0011213790849673195
prediction-offline-sme,0.599406528189911,202,337,1190.2017804154302,0.0013518635014836643
prediction-online-sme,0.5905044510385756,199,337,1834.919881305638,0.0020690207715133428
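In this leaderboard CSV the `Accuracy` column is simply `Correct / Total` for each row, which can be checked with pandas on a self-contained excerpt (two rows inlined as a string, extra columns omitted):

```python
import io
import pandas as pd

# Two rows excerpted from formatted_data.csv above.
csv_excerpt = """Tool,Accuracy,Correct,Total
claude-prediction-offline,0.7201834862385321,157,218
claude-prediction-online,0.6600660066006601,200,303
"""

df = pd.read_csv(io.StringIO(csv_excerpt))

# Accuracy is the fraction of correctly answered questions.
assert ((df["Accuracy"] - df["Correct"] / df["Total"]).abs() < 1e-12).all()

# Rank tools by accuracy, highest first.
best_tool = df.sort_values("Accuracy", ascending=False).iloc[0]["Tool"]
print(best_tool)  # → claude-prediction-offline
```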
requirements.txt
ADDED
@@ -0,0 +1,2 @@
streamlit
pandas
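With these two dependencies, the app can be run locally using the standard Streamlit workflow (Streamlit serves on port 8501 by default; the commands below are the usual invocation, not taken from this repo):

```shell
# Install the dependencies listed in requirements.txt, then launch the app.
pip install -r requirements.txt
streamlit run app.py
```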