mtyrrell committed on
Commit
034d6ea
·
1 Parent(s): dbb81b7
Files changed (1)
  1. app.py +3 -10
app.py CHANGED
@@ -84,7 +84,8 @@ with st.expander("ℹ️ - About this app", expanded=False):
  * TMA_prob: % probability that the target classification is True (using logprobs output from GPT-4o)
  * TMA_eval: Boolean based on TMA_prob > 0.5
  * VC_check: used for manually noting corrections
- * TMA_check: " "
+ * TMA_check: used for manually noting corrections
+

  Evaluation with GPT4o-as-judge: to clarify, the automated pipeline is not 100% trustworthy, so I was just using the 'FALSE' tags as a starting point
  The complete protocol is as follows:
@@ -94,6 +95,7 @@ with st.expander("ℹ️ - About this app", expanded=False):
  4. TMA_eval == 'TRUE' AND TMA_prob < 0.9: manually check all remaining target labels where GPT4o was not very certain.
  5. If incorrect classification: enter corrected value in 'VC_check' and 'TMA_check' columns.

+
  Takeaways from evaluation:
  * It appears the classifiers experience performance degradation in French-language source documents
  * In particular, the vulnerability classifier had issues
@@ -101,15 +103,6 @@ with st.expander("ℹ️ - About this app", expanded=False):
  * The GPT4o pipeline is a useful tool for the assessment, but only in terms of increasing accuracy over random sampling. It still takes time to review each document.
  """)

- st.write("""
- What Happens in background?
-
- - Step 1: Once the document is provided to app, it undergoes *Pre-processing*.\
- In this step the document is broken into smaller paragraphs \
- (based on word/sentence count).
- - Step 2: The paragraphs are then fed to the **Vulnerability Classifier** which detects if
- the paragraph contains any or multiple references to vulnerable groups.
- """)

  st.write("")
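For readers of this diff, the column definitions and protocol step 4 in the help text can be sketched roughly as follows. This is an illustrative sketch only, not code from app.py: the function names and sample rows are hypothetical, and it assumes TMA_prob is stored on a 0 to 1 scale (suggested by the 0.5 and 0.9 thresholds) and is obtained by exponentiating a single token log-probability.

```python
import math


def tma_prob_from_logprob(logprob: float) -> float:
    """Convert a log-probability into a probability (assumed 0-1 scale)."""
    return math.exp(logprob)


def tma_eval(prob: float) -> bool:
    """TMA_eval as described in the help text: True when TMA_prob > 0.5."""
    return prob > 0.5


def needs_manual_check(prob: float) -> bool:
    """Protocol step 4: labelled TRUE, but with TMA_prob below 0.9."""
    return tma_eval(prob) and prob < 0.9


# Hypothetical rows; the ids and log-probabilities are made up for illustration.
rows = [
    {"id": "a", "logprob": -0.02},  # high confidence
    {"id": "b", "logprob": -0.35},  # TRUE, but not very certain
    {"id": "c", "logprob": -1.2},   # evaluates to FALSE
]

flagged = [
    row["id"]
    for row in rows
    if needs_manual_check(tma_prob_from_logprob(row["logprob"]))
]
print(flagged)
```

Only the rule from step 4 is covered here; the earlier protocol steps are not visible in this diff and are deliberately left out.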