mihalykiss committed on
Commit
a57714d
·
1 Parent(s): 470f102

UX and formatted text fix

Files changed (1)
  1. app.py +16 -4
app.py CHANGED
@@ -35,7 +35,19 @@ label_mapping = {
  39: 'text-davinci-002', 40: 'text-davinci-003'
  }
 
  def classify_text(text):
  if not text.strip():
  result_message = (
  f"---- \n"
@@ -72,7 +84,7 @@ def classify_text(text):
  else:
  result_message = (
  f"**The text is** <span class='highlight-ai'>**{ai_total_prob:.2f}%** likely <b>AI generated</b>.</span>\n\n"
- f"**Identified AI Model:** {ai_argmax_model}"
  )
 
  return result_message
@@ -92,7 +104,7 @@ This tool uses the <b>ModernBERT</b> model to identify whether a given text was
  <div style="line-height: 1.8;">
  ✅ <b>Human Verification:</b> Human-written content is clearly marked.<br>
  🔍 <b>Model Detection:</b> Can identify content from over 40 AI models.<br>
- 📈 <b>Accuracy:</b> Works best with longer texts.
  📄 <b>Read more:</b> Our method is detailed in our paper:
  <a href="https://aclanthology.org/2025.genaidetect-1.15/" target="_blank" style="color: #007bff; text-decoration: none;"><b>LINK</b></a>.
  </div>
@@ -110,8 +122,8 @@ AI_texts = [
  ]
 
  Human_texts = [
- "The present book is intended as a text in basic mathematics. As such, it can have multiple use: for a one-year course in the high schools during the third or fourth year (if possible the third, so that calculus can be taken during the fourth year); for a complementary reference in earlier high school grades (elementary algebra and geometry are covered); for a one-semester course at the college level, to review or to get a firm foundation in the basic mathematics necessary to go ahead in calculus, linear algebra, or other topics. Years ago, the colleges used to give courses in “ college algebra” and other subjects which should have been covered in high school. More recently, such courses have been thought unnecessary, but some experiences I have had show that they are just as necessary as ever. What is happening is that thecolleges are getting a wide variety of students from high schools, ranging from exceedingly well-prepared ones who have had a good first course in calculus, down to very poorly prepared ones. T",
- "Fats are rich in energy, build body cells, support brain development of infants, help body processes, and facilitate the absorption and use of fat-soluble vitamins A, D, E, and K. The major component of lipids is glycerol and fatty acids. According to chemical properties, fatty acids can be divided into saturated and unsaturated fatty acids. Generally lipids containing saturated fatty acids are solid at room temperature and include animal fats (butter, lard, tallow, ghee) and tropical oils (palm,coconut, palm kernel). Saturated fats increase the risk of heart disease."
  "BERT, which stands for Bidirectional Encoder Representations from Transformers, is a deep learning model introduced by Google in 2018 to help machines understand the complex nuances of human language. Thanks to its Transformer-based architecture, it can grasp the deeper meaning and context of words in the text. This makes BERT especially effective at tasks like text classification, translation, question answering, and language inference."]
 
  iface = gr.Blocks(css="""
 
  39: 'text-davinci-002', 40: 'text-davinci-003'
  }
 
+ def clean_text(text):
+
+ text = text.replace("\r\n", "\n").replace("\r", "\n")
+
+ text = re.sub(r"\n\s*\n+", "\n\n", text)
+
+ text = re.sub(r"[ \t]+", " ", text)
+
+ text = text.strip()
+ return text
+
  def classify_text(text):
+ cleaned_text = clean_text(text)
  if not text.strip():
  result_message = (
  f"---- \n"
 
  else:
  result_message = (
  f"**The text is** <span class='highlight-ai'>**{ai_total_prob:.2f}%** likely <b>AI generated</b>.</span>\n\n"
+ f"**Identified AI Model: {ai_argmax_model}**"
  )
 
  return result_message
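
Review note: the `clean_text` helper added in this commit can be exercised on its own. The sketch below copies its four steps and runs them on a made-up sample string. It assumes `app.py` imports `re` somewhere outside the hunks shown, and the `sample` value is purely illustrative.

```python
import re


def clean_text(text):
    # Normalize Windows (\r\n) and bare \r line endings to \n.
    text = text.replace("\r\n", "\n").replace("\r", "\n")
    # Collapse runs of blank or whitespace-only lines into a single blank line.
    text = re.sub(r"\n\s*\n+", "\n\n", text)
    # Collapse runs of spaces and tabs into one space.
    text = re.sub(r"[ \t]+", " ", text)
    # Trim leading and trailing whitespace of the whole text.
    return text.strip()


# Hypothetical input with mixed line endings, tabs and repeated spaces.
sample = "  First   line.\r\n\r\n\r\n\tSecond\tline.  \r\n"
cleaned = clean_text(sample)
print(repr(cleaned))
```

Indentation at the start of an inner line is reduced to a single space rather than removed, because only the text as a whole is stripped. Whitespace-only input cleans to an empty string, so the unchanged `if not text.strip():` check agrees with a check on `cleaned_text`.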
 
  <div style="line-height: 1.8;">
  ✅ <b>Human Verification:</b> Human-written content is clearly marked.<br>
  🔍 <b>Model Detection:</b> Can identify content from over 40 AI models.<br>
+ 📈 <b>Accuracy:</b> Works best with longer texts.<br>
  📄 <b>Read more:</b> Our method is detailed in our paper:
  <a href="https://aclanthology.org/2025.genaidetect-1.15/" target="_blank" style="color: #007bff; text-decoration: none;"><b>LINK</b></a>.
  </div>
 
  ]
 
  Human_texts = [
+ "The present book is intended as a text in basic mathematics. As such, it can have multiple use: for a one-year course in the high schools during the third or fourth year (if possible the third, so that calculus can be taken during the fourth year); for a complementary reference in earlier high school grades (elementary algebra and geometry are covered); for a one-semester course at the college level, to review or to get a firm foundation in the basic mathematics necessary to go ahead in calculus, linear algebra, or other topics. Years ago, the colleges used to give courses in “ college algebra” and other subjects which should have been covered in high school. More recently, such courses have been thought unnecessary, but some experiences I have had show that they are just as necessary as ever. What is happening is that thecolleges are getting a wide variety of students from high schools, ranging from exceedingly well-prepared ones who have had a good first course in calculus, down to very poorly prepared ones.",
+ "Fats are rich in energy, build body cells, support brain development of infants, help body processes, and facilitate the absorption and use of fat-soluble vitamins A, D, E, and K. The major component of lipids is glycerol and fatty acids. According to chemical properties, fatty acids can be divided into saturated and unsaturated fatty acids. Generally lipids containing saturated fatty acids are solid at room temperature and include animal fats (butter, lard, tallow, ghee) and tropical oils (palm,coconut, palm kernel). Saturated fats increase the risk of heart disease.",
  "BERT, which stands for Bidirectional Encoder Representations from Transformers, is a deep learning model introduced by Google in 2018 to help machines understand the complex nuances of human language. Thanks to its Transformer-based architecture, it can grasp the deeper meaning and context of words in the text. This makes BERT especially effective at tasks like text classification, translation, question answering, and language inference."]
 
  iface = gr.Blocks(css="""
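
Review note: the comma added after the second `Human_texts` entry matters because Python joins adjacent string literals into one. As far as the hunk shows, the old list therefore held two items (the second and third texts fused) and the new one holds three. A minimal sketch of the effect, using short placeholder strings rather than the real samples:

```python
# Placeholder strings; the real entries in app.py are full paragraphs.
without_comma = [
    "first human sample",
    "second human sample"   # no comma: joined with the next literal
    "third human sample",
]

with_comma = [
    "first human sample",
    "second human sample",
    "third human sample",
]

print(len(without_comma))
print(len(with_comma))
print(without_comma[1])
```

The join happens silently at compile time, so no error is raised and the only symptom is a list that is shorter than expected.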