app.py (CHANGED)
@@ -84,7 +84,8 @@ with st.expander("ℹ️ - About this app", expanded=False):
 * TMA_prob: % probability that the target classification is True (using logprobs output from GPT-4o)
 * TMA_eval: Boolean based on TMA_prob > 0.5
 * VC_check: used for manually noting corrections
-* TMA_check:
+* TMA_check: used for manually noting corrections
+
 
 Evaluation with GPT4o-as-judge: to clarify, the automated pipeline is not 100% trustworthy, so I was just using the 'FALSE' tags as a starting point
 The complete protocol is as follows:
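A minimal sketch of how the TMA_prob and TMA_eval columns above relate, assuming TMA_prob is obtained by exponentiating the GPT-4o logprob for the "True" token (the function name and example input are illustrative, not taken from app.py):

```python
import math

def tma_from_logprob(logprob: float) -> tuple[float, bool]:
    """Convert a token logprob into a probability and a boolean label.

    Hypothetical helper: logprobs returned by the API are natural-log
    probabilities, so exp() recovers the probability; the 0.5 threshold
    matches the TMA_eval definition above.
    """
    tma_prob = math.exp(logprob)
    tma_eval = tma_prob > 0.5
    return tma_prob, tma_eval

prob, label = tma_from_logprob(-0.05)
# exp(-0.05) ≈ 0.95, so this classification counts as True
```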
@@ -94,6 +95,7 @@ with st.expander("ℹ️ - About this app", expanded=False):
 4. TMA_eval == 'TRUE' AND TMA_prob < 0.9: manually check all remaining target labels where GPT4o was not very certain.
 5. If incorrect classification: enter corrected value in 'VC_check' and 'TMA_check' columns.
 
+
 Takeaways from evaluation:
 * It appears the classifiers experience performance degradation in French-language source documents
 * In particular, the vulnerability classifier had issues
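Protocol step 4 above amounts to a simple filter; a sketch in plain Python with assumed column names and made-up data (the actual app may use a dataframe instead):

```python
# Hypothetical sketch of protocol step 4: keep rows where GPT-4o judged the
# target label True but with probability below 0.9 (the uncertain cases).
# Column names (TMA_eval, TMA_prob) follow the description above.
rows = [
    {"TMA_eval": "TRUE", "TMA_prob": 0.95},   # confident True: skip
    {"TMA_eval": "TRUE", "TMA_prob": 0.72},   # uncertain True: review manually
    {"TMA_eval": "FALSE", "TMA_prob": 0.40},  # False: handled in earlier steps
]
to_review = [r for r in rows if r["TMA_eval"] == "TRUE" and r["TMA_prob"] < 0.9]
# -> one row left for manual checking (the TMA_prob == 0.72 entry)
```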
@@ -101,15 +103,6 @@ with st.expander("ℹ️ - About this app", expanded=False):
 * The GPT4o pipeline is a useful tool for the assessment, but only in terms of increasing accuracy over random sampling. It still takes time to review each document.
 """)
 
-st.write("""
-What Happens in background?
-
-- Step 1: Once the document is provided to app, it undergoes *Pre-processing*.\
-In this step the document is broken into smaller paragraphs \
-(based on word/sentence count).
-- Step 2: The paragraphs are then fed to the **Vulnerability Classifier** which detects if
-the paragraph contains any or multiple references to vulnerable groups.
-""")
 
 st.write("")
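The removed "What Happens in background?" text described a pre-processing step that splits the document into smaller paragraphs by word/sentence count. A rough sketch of a word-count split, assuming a fixed window (the 50-word window and function name are illustrative, not from app.py):

```python
def split_into_paragraphs(text: str, max_words: int = 50) -> list[str]:
    """Break a document into chunks of at most max_words words.

    Hypothetical pre-processing helper: the real app may also split on
    sentence boundaries rather than a plain word window.
    """
    words = text.split()
    return [
        " ".join(words[i:i + max_words])
        for i in range(0, len(words), max_words)
    ]

chunks = split_into_paragraphs("word " * 120, max_words=50)
# 120 words with a 50-word window -> chunks of 50, 50, and 20 words
```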