app.py
@@ -84,7 +84,8 @@ with st.expander("ℹ️ - About this app", expanded=False):
 * TMA_prob: % probability that the target classification is True (using logprobs output from GPT-4o)
 * TMA_eval: Boolean based on TMA_prob > 0.5
 * VC_check: used for manually noting corrections
-* TMA_check:
+* TMA_check: used for manually noting corrections
+
 
 Evaluation with GPT4o-as-judge: to clarify, the automated pipeline is not 100% trustworthy, so I was just using the 'FALSE' tags as a starting point
 The complete protocol is as follows:
@@ -94,6 +95,7 @@ with st.expander("ℹ️ - About this app", expanded=False):
 4. TMA_eval == 'TRUE' AND TMA_prob < 0.9: manually check all remaining target labels where GPT4o was not very certain.
 5. If incorrect classification: enter corrected value in 'VC_check' and 'TMA_check' columns.
 
+
 Takeaways from evaluation:
 * It appears the classifiers experience performance degradation in French-language source documents
 * In particular, the vulnerability classifier had issues
@@ -101,15 +103,6 @@ with st.expander("ℹ️ - About this app", expanded=False):
 * The GPT4o pipeline is a useful tool for the assessment, but only in terms of increasing accuracy over random sampling. It still takes time to review each document.
 """)
 
-st.write("""
-What Happens in background?
-
-- Step 1: Once the document is provided to app, it undergoes *Pre-processing*.\
-In this step the document is broken into smaller paragraphs \
-(based on word/sentence count).
-- Step 2: The paragraphs are then fed to the **Vulnerability Classifier** which detects if
-the paragraph contains any or multiple references to vulnerable groups.
-""")
 
 st.write("")
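For context, the `st.write` block deleted in the last hunk described the app's pre-processing step: breaking a document into smaller paragraphs based on word/sentence count before classification. A minimal sketch of that kind of splitter, assuming a greedy word-count limit (the function name and limit are illustrative, not the app's actual code):

```python
def split_into_chunks(text: str, max_words: int = 120) -> list[str]:
    """Pre-processing sketch: split on blank lines, then greedily pack
    paragraphs into chunks of at most max_words words each."""
    paragraphs = [p.strip() for p in text.split("\n\n") if p.strip()]
    chunks, current, count = [], [], 0
    for para in paragraphs:
        words = len(para.split())
        # Flush the current chunk before it would exceed the word budget.
        if current and count + words > max_words:
            chunks.append("\n\n".join(current))
            current, count = [], 0
        current.append(para)
        count += words
    if current:
        chunks.append("\n\n".join(current))
    return chunks

doc = "First paragraph here.\n\nSecond paragraph with more words in it."
chunks = split_into_chunks(doc, max_words=5)
```

Each resulting chunk would then be fed to the vulnerability classifier independently, as the removed text described.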
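The TMA_prob column described in the first hunk is a probability derived from GPT-4o's token logprobs, and TMA_eval is a simple threshold on it. A minimal sketch of how such a probability could be computed (the function name and the example logprob values are assumptions, not taken from the app):

```python
import math

def true_probability(top_logprobs: dict[str, float]) -> float:
    """Convert top_logprobs for the answer token into P(label == "True"),
    renormalising over just the True/False candidates."""
    p_true = math.exp(top_logprobs.get("True", float("-inf")))
    p_false = math.exp(top_logprobs.get("False", float("-inf")))
    return p_true / (p_true + p_false)

# Hypothetical logprobs for one classified paragraph.
tma_prob = true_probability({"True": -0.105, "False": -2.303})
tma_eval = tma_prob > 0.5  # Boolean, as in the TMA_eval column
```

Under the review protocol above, a paragraph like this one (TMA_eval True but TMA_prob below 0.9) would still be flagged for manual checking in step 4.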