import streamlit as st
import os
import pkg_resources
# Using this wacky hack to get around the massively ridiculous managed env loading order
def is_installed(package_name, version):
    try:
        pkg = pkg_resources.get_distribution(package_name)
        return pkg.version == version
    except pkg_resources.DistributionNotFound:
        return False
# shifted from below - this must be the first streamlit call; otherwise: problems
st.set_page_config(page_title='Vulnerability Analysis - EVALUATION PIPELINE',
                   initial_sidebar_state='expanded', layout="wide")
@st.cache_resource  # cache the function so it's not called every time app.py is triggered
def install_packages():
    install_commands = []
    # check against the same version we install, otherwise this reinstalls on every cold start
    if not is_installed("spaces", "0.17.0"):
        install_commands.append("pip install spaces==0.17.0")
    if not is_installed("pydantic", "1.8.2"):
        install_commands.append("pip install pydantic==1.8.2")
    if not is_installed("typer", "0.4.0"):
        install_commands.append("pip install typer==0.4.0")
    if install_commands:
        os.system(" && ".join(install_commands))
# install packages if necessary
install_packages()
import appStore.vulnerability_analysis as vulnerability_analysis
import appStore.target as target_analysis
import appStore.doc_processing as processing
from utils.uploadAndExample import add_upload
from utils.vulnerability_classifier import label_dict
from utils.config import model_dict
import pandas as pd
import plotly.express as px
with st.sidebar:
    # upload and example doc
    choice = st.sidebar.radio(label='Select the Document',
                              help='You can upload a document '
                                   'or try an example document',
                              options=('Upload Document', 'Try Example'),
                              horizontal=True)
    add_upload(choice)
# Now display the document name
if 'filename' in st.session_state:
    doc_name = os.path.basename(st.session_state['filename'])
# Create a list of options for the dropdown
model_options = ['Llama3.1-8B','Llama3.1-70B','Llama3.1-405B','Zephyr 7B β','Mistral-7B','Mixtral-8x7B']
# Dropdown selectbox: model
model_sel = st.selectbox('Select a model:', model_options)
model_sel_name = model_dict[model_sel]
st.session_state['model_sel_name'] = model_sel_name
with st.container():
    st.markdown("<h2 style='text-align: center;'> Vulnerability Analysis 3.1 - EVALUATION PIPELINE</h2>", unsafe_allow_html=True)
    st.write(' ')

with st.expander("ℹ️ - About this app", expanded=False):
    st.write(
        """
Pipeline for automated evaluation of vulnerability and target classifications, using GPT-4o as judge. The pipeline is integrated into a hacked version of the app, so you just run it as normal (I haven't pushed it to HF yet as there are some dependency issues). It runs the classifications and summarizations, then sends the full dataframe of classified paragraphs (i.e. not filtered) to OpenAI using crafted prompts. This happens twice per row in the dataframe - once for vulnerabilities and once for target. You then get the option to download an Excel file containing 3 sheets:
* Meta: document name (using doc code3 as per the master excel 'vul_africa_01')
* Summary: summarizations
* Results: shows each paragraph, its classifications, and the automated evals:
  * VC_prob: probability that the vulnerability classification is True (using the logprobs output from GPT-4o)
  * VC_keywords: fuzzy-matching index from 0 to 1 reflecting alignment with the label text (Levenshtein distance). Included as a secondary measure because GPT-4o understandably struggles with some of the vulnerability classifications.
  * VC_eval: Boolean based on VC_prob > 0.5 OR VC_keywords > 0
  * TMA_prob: probability that the target classification is True (using the logprobs output from GPT-4o)
  * TMA_eval: Boolean based on TMA_prob > 0.5
  * VC_check: used for manually noting corrections
  * TMA_check: used for manually noting corrections
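
As a minimal sketch (helper names are hypothetical, column semantics as above), the two eval booleans reduce to:

```python
# Hypothetical helpers mirroring the eval columns described above
def vc_eval(vc_prob: float, vc_keywords: float) -> bool:
    # passes if GPT-4o judges the label likely OR the keywords align at all
    return vc_prob > 0.5 or vc_keywords > 0

def tma_eval(tma_prob: float) -> bool:
    # target eval relies on the judge probability alone
    return tma_prob > 0.5
```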
Evaluation with GPT-4o as judge: to clarify, the automated pipeline is not 100% trustworthy, so I was just using the 'FALSE' tags as a starting point.
The complete protocol is as follows:
1. VC_eval == 'FALSE': manually check vulnerability labels that are suspect
2. VC_eval == 'TRUE' AND VC_prob < 0.9: manually check all remaining vulnerability labels where GPT-4o was not very certain (in some cases I also use VC_keywords to filter further down if a lot of samples were returned)
3. TMA_eval == 'FALSE': manually check target labels that are suspect
4. TMA_eval == 'TRUE' AND TMA_prob < 0.9: manually check all remaining target labels where GPT-4o was not very certain
5. If a classification is incorrect: enter the corrected value in the 'VC_check' and 'TMA_check' columns
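
The four filter steps above can be sketched as a single review-queue mask (pandas; the function name is hypothetical and column names are assumed from the Results sheet):

```python
import pandas as pd

def review_queue(df: pd.DataFrame, prob_cutoff: float = 0.9) -> pd.DataFrame:
    # Rows needing manual review per protocol steps 1-4
    flagged = (
        (~df["VC_eval"])                                    # step 1: suspect vulnerability labels
        | (df["VC_eval"] & (df["VC_prob"] < prob_cutoff))   # step 2: low-certainty vulnerability labels
        | (~df["TMA_eval"])                                 # step 3: suspect target labels
        | (df["TMA_eval"] & (df["TMA_prob"] < prob_cutoff)) # step 4: low-certainty target labels
    )
    return df[flagged]
```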
Takeaways from evaluation:
* The classifiers appear to suffer performance degradation on French-language source documents
  * In particular, the vulnerability classifier had issues
* The target classifier returns a lot of false negatives in all languages
* The GPT-4o pipeline is a useful tool for the assessment, but only in terms of increasing accuracy over random sampling. It still takes time to review each document.
""")
st.write("")
# Define the apps used
apps = [processing.app, vulnerability_analysis.app, target_analysis.app]
multiplier_val = 1 / len(apps)
if st.button("Analyze Document"):
    prg = st.progress(0.0)
    for i, func in enumerate(apps):
        func()
        prg.progress((i + 1) * multiplier_val)
# If there is data stored
if 'key0' in st.session_state:
    vulnerability_analysis.vulnerability_display()
    target_analysis.target_display(model_sel_name=model_sel_name, doc_name=doc_name)