initial commit
- app.py +57 -0
- formatted_data.csv +8 -0
- requirements.txt +2 -0
app.py
ADDED
@@ -0,0 +1,57 @@
import streamlit as st
import pandas as pd

# Path to the CSV file
csv_file_path = "formatted_data.csv"

# Reading the CSV file
df = pd.read_csv(csv_file_path)

# Displaying the DataFrame in the Streamlit app with enhanced interactivity
st.title('Olas Predict Benchmark')
st.markdown('## Leaderboard showing the performance of Olas Predict tools on the Autocast dataset.')
st.markdown("<style>.big-font {font-size:20px !important;}</style>", unsafe_allow_html=True)
st.markdown('Use the table below to interact with the data and explore the performance of different tools.', unsafe_allow_html=True)
st.dataframe(df.style.format(precision=2))

st.markdown("""
## Benchmark Overview
- The benchmark evaluates the performance of Olas Predict tools on the Autocast dataset.
- The dataset has been refined to enhance the evaluation of the tools.
- The leaderboard shows the performance of the tools on the refined dataset.
- The script to run the benchmark is available in the repo [here](https://github.com/valory-xyz/olas-predict-benchmark).

## How to run your tools on the benchmark
- Fork the repo [here](https://github.com/valory-xyz/olas-predict-benchmark).
- Initialize the git submodules and update them to get the latest `mech` tool and dataset:
  - `git submodule init`
  - `git submodule update --remote --recursive`
- Include your tool in the `mech/packages` directory accordingly.
  - Guidelines on how to include your tool can be found [here](xxx).
- Run the benchmark script.

## Dataset Overview
This project leverages the Autocast dataset from the research paper titled ["Forecasting Future World Events with Neural Networks"](https://arxiv.org/abs/2206.15474).
The dataset has undergone further refinement to enhance the performance evaluation of Olas mech prediction tools.
Both the original and refined datasets are hosted on HuggingFace.

### Refined Dataset Files
- You can find the refined dataset on HuggingFace [here](https://huggingface.co/datasets/valory/autocast).
- `autocast_questions_filtered.json`: a JSON subset of the initial Autocast dataset.
- `autocast_questions_filtered.pkl`: a pickle file mapping URLs to their respective scraped documents within the filtered dataset.
- `retrieved_docs.pkl`: contains all the scraped texts.

### Filtering Criteria
To refine the dataset, we applied the following criteria to ensure the reliability of the URLs:
- URLs not returning HTTP 200 status codes are excluded.
- Difficult-to-scrape sites, such as Twitter and Bloomberg, are omitted.
- Links with fewer than 1000 words are removed.
- Only samples with a minimum of 5 and a maximum of 20 working URLs are retained.

### Scraping Approach
The content of the filtered URLs has been scraped using various libraries, depending on the source:
- `pypdf2` for PDF URLs.
- `wikipediaapi` for Wikipedia pages.
- `requests`, `readability-lxml`, and `html2text` for most other sources.
- `requests`, `beautifulsoup`, and `html2text` for BBC links.
""", unsafe_allow_html=True)
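The filtering criteria described in the app text can be sketched in Python. This is an illustrative sketch, not code from the benchmark repo: the function names, the 10-second timeout, and the substring host check are assumptions.

```python
from urllib.request import urlopen
from urllib.error import URLError

BLOCKED_HOSTS = ("twitter.com", "bloomberg.com")  # difficult-to-scrape sites
MIN_WORDS = 1000                                  # minimum words per link
MIN_URLS, MAX_URLS = 5, 20                        # working-URL bounds per sample

def host_is_blocked(url: str) -> bool:
    # Omit hard-to-scrape sites such as Twitter and Bloomberg.
    return any(host in url for host in BLOCKED_HOSTS)

def has_enough_words(text: str) -> bool:
    # Links with fewer than 1000 words are removed.
    return len(text.split()) >= MIN_WORDS

def returns_http_200(url: str, timeout: float = 10.0) -> bool:
    # URLs not returning an HTTP 200 status code are excluded.
    try:
        with urlopen(url, timeout=timeout) as resp:
            return resp.status == 200
    except (URLError, ValueError):
        return False

def sample_is_kept(working_urls: list) -> bool:
    # Only samples with 5 to 20 working URLs are retained.
    return MIN_URLS <= len(working_urls) <= MAX_URLS
```

A sample would pass through `host_is_blocked`, `has_enough_words`, and `returns_http_200` per URL, then `sample_is_kept` on the surviving URLs.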
formatted_data.csv
ADDED
@@ -0,0 +1,8 @@
Tool,Accuracy,Correct,Total,Mean Tokens Used,Mean Cost ($)
claude-prediction-offline,0.7201834862385321,157,218,779.4770642201835,0.006891669724770637
claude-prediction-online,0.6600660066006601,200,303,1505.3135313531352,0.013348171617161701
prediction-online,0.676737160120846,224,331,1219.6918429003022,0.001332990936555879
prediction-offline,0.6599326599326599,196,297,579.6565656565657,0.000621023569023569
prediction-online-summarized-info,0.6209150326797386,190,306,1008.4542483660131,0.0011213790849673195
prediction-offline-sme,0.599406528189911,202,337,1190.2017804154302,0.0013518635014836643
prediction-online-sme,0.5905044510385756,199,337,1834.919881305638,0.0020690207715133428
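In this leaderboard CSV the `Accuracy` column is simply `Correct / Total` for each row, which can be checked with pandas on a self-contained excerpt (two rows inlined as a string, extra columns omitted):

```python
import io
import pandas as pd

# Two rows excerpted from formatted_data.csv above.
csv_excerpt = """Tool,Accuracy,Correct,Total
claude-prediction-offline,0.7201834862385321,157,218
claude-prediction-online,0.6600660066006601,200,303
"""

df = pd.read_csv(io.StringIO(csv_excerpt))

# Accuracy is the fraction of correctly answered questions.
assert ((df["Accuracy"] - df["Correct"] / df["Total"]).abs() < 1e-12).all()

# Rank tools by accuracy, highest first.
best_tool = df.sort_values("Accuracy", ascending=False).iloc[0]["Tool"]
print(best_tool)  # → claude-prediction-offline
```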
requirements.txt
ADDED
@@ -0,0 +1,2 @@
streamlit
pandas
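With these two dependencies, the app can be run locally using the standard Streamlit workflow (Streamlit serves on port 8501 by default; the commands below are the usual invocation, not taken from this repo):

```shell
# Install the dependencies listed in requirements.txt, then launch the app.
pip install -r requirements.txt
streamlit run app.py
```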