wakeupmh committed on
Commit 7a11d41 · 1 Parent(s): 7133da4

refactor: using falcon

Files changed (1)
  1. app.py +114 -21
app.py CHANGED
@@ -17,7 +17,7 @@ DATA_DIR = "/data" if os.path.exists("/data") else "."
  DATASET_DIR = os.path.join(DATA_DIR, "rag_dataset")
  DATASET_PATH = os.path.join(DATASET_DIR, "dataset")
  TOKENIZER_MODEL = "google/flan-t5-small"
- SUMMARIZATION_MODEL= "HuggingFaceTB/SmolVLM-256M-Instruct"
+ SUMMARIZATION_MODEL= "Falconsai/text_summarization"
  # SUMMARIZATION_MODEL="rhaymison/t5-portuguese-small-summarization"

  @st.cache_resource
@@ -210,26 +210,119 @@ def generate_answer(question, context, max_length=512):
      clean_question = clean_text(question)

      # Format the input for T5 (it expects a specific format)
-     input_text = f"""Objective:
- Provide a clear, simple, and well-structured answer about autism that is easy to understand for a general audience. Use the provided research papers as references.
-
- Question: {clean_question}
- Research Papers:
- {clean_context}
-
- Instructions:
- Start with a simple definition
- - Explain what autism is in a short and clear way, avoiding technical terms.
- - Use real-life examples
- - Give practical and relatable examples to help illustrate key points.
- - Explain research in simple words
- - Instead of just citing studies, summarize their key findings in a way that anyone can understand. Example: "A study from X University found that..."
- - Avoid complex words
- - If a scientific term is needed, provide a short and simple explanation.
- - Use clear formatting
- - Write in short paragraphs, bullet points, or numbered lists to improve readability.
- - Keep a friendly tone
- - Make the response engaging and easy to follow, so people without prior knowledge can understand."""
+     input_text = f"""Context
+ Input Question: {clean_question}
+ Source Materials: {clean_context}
+ Primary Objective
+ Generate a comprehensive yet accessible summary of autism research that bridges the gap between academic knowledge and public understanding. The response should be evidence-based while remaining engaging and practical for general readers.
+ Content Structure
+ 1. Opening Overview
+
+ Begin with a concise, jargon-free definition of autism
+ Frame the topic within everyday experiences
+ Establish relevance to the reader's understanding
+
+ 2. Key Concepts Breakdown
+
+ Transform complex research findings into digestible information
+ Structure information in a logical progression
+ Connect each point to real-world scenarios
+
+ 3. Research Integration
+ Present research findings using this framework:
+
+ Main finding: [Clear statement of what was discovered]
+ Real-world meaning: [Practical implications]
+ Context: [How this fits into broader understanding]
+
+ 4. Examples and Applications
+ Include:
+
+ Concrete, relatable scenarios
+ Day-to-day situations
+ Practical implications for families and individuals
+
+ Writing Guidelines
+ Language Requirements
+
+ Target reading level: 8th grade
+ Sentence length: Maximum 20 words
+ Paragraph length: 2-4 sentences
+ Technical terms: Must include plain language explanation in parentheses
+
+ Tone and Style
+
+ Empathetic and respectful
+ Solution-focused approach
+ Balanced perspective
+ Inclusive language
+
+ Formatting Specifications
+
+ Use headers for major sections
+ Include white space between concepts
+ Implement bullet points for lists
+ Bold key terms with immediate explanations
+
+ Research Citation Format
+ When referencing studies, follow this pattern:
+ "Research from [Institution] shows [finding in simple terms]. This means [practical interpretation]."
+ Quality Checks
+ Before finalizing, ensure the summary:
+
+ Answers the original question directly
+ Maintains scientific accuracy while being accessible
+ Provides actionable insights
+ Respects neurodiversity perspectives
+ Balances depth with clarity
+
+ Response Framework
+
+ Introduction (2-3 sentences)
+
+ Core definition
+ Relevance statement
+
+
+ Main Body (3-4 key points)
+
+ Evidence-based insights
+ Practical examples
+ Real-world applications
+
+
+ Conclusion (2-3 sentences)
+
+ Summary of key takeaways
+ Actionable next steps or implications
+
+
+
+ Engagement Elements
+
+ Include thought-provoking questions
+ Provide relatable scenarios
+ Connect to common experiences
+ Offer practical applications
+
+ Modified Output Analysis
+ The response should be evaluated against these criteria:
+
+ Clarity: Is the information immediately understandable?
+ Accuracy: Does it reflect the research correctly?
+ Relevance: Does it address the specific question?
+ Practicality: Are the insights actionable?
+ Engagement: Does it maintain reader interest?
+
+ Special Considerations
+
+ Acknowledge spectrum nature of autism
+ Respect diverse perspectives
+ Focus on strengths and challenges
+ Avoid deficit-based language
+ Include support-oriented information
+
+ Remember to adapt the depth and complexity based on the specific question while maintaining accessibility and scientific accuracy."""

      try:
          # T5 expects a specific format for the input
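
Review note (not part of the commit): the code comments say the input is formatted for T5, and T5-style summarizers usually take a short task prefix plus the text to condense rather than a long instruction block. The sketch below shows one possible way to build a compact input instead. It assumes the model behind `SUMMARIZATION_MODEL` follows the common T5 `summarize: ` prefix convention and has a limited input window. The helper name `build_summarizer_input` and the `max_context_words` parameter are hypothetical, and the word count is only a rough stand-in for the tokenizer's real token count.

```python
def build_summarizer_input(question: str, context: str, max_context_words: int = 300) -> str:
    """Return a compact 'summarize:' input for a T5-style summarizer.

    Hypothetical helper, not part of app.py. Counting words is only a
    crude proxy for the number of tokens the real tokenizer would produce.
    """
    # Keep only the first max_context_words words of the retrieved context.
    trimmed_context = " ".join(context.split()[:max_context_words])
    # "summarize: " is the task prefix conventionally used with T5 models
    # (assumed here, not confirmed by the commit).
    return f"summarize: Question: {question.strip()} Context: {trimmed_context}"


example = build_summarizer_input("What is autism?", "paper text " * 500, max_context_words=6)
print(example)
# prints: summarize: Question: What is autism? Context: paper text paper text paper text
```

In `generate_answer`, such a helper could be called with `clean_question` and `clean_context` in place of the long f-string. Whether that improves the answers would need to be checked against the actual model.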