andthattoo committed
Commit ebe443c · verified · 1 Parent(s): 55238c6

benchmark updates for new weights

Files changed (1)
  1. README.md +31 -29
README.md CHANGED
@@ -168,26 +168,26 @@ Let's break down the code into these steps:
  ```python
  from datetime import datetime, timedelta

- # Get today's date and calculate tomorrow's date
- today = datetime.now()
- tomorrow = today + timedelta(days=1)
- tomorrow_str = tomorrow.strftime('%Y-%m-%d')
+ # Get tomorrow's date
+ tomorrow = (datetime.now() + timedelta(days=1)).strftime("%Y-%m-%d")

  # Define the time slots
- start_time = '10:00'
- end_time = '12:00'
+ start_time = "10:00"
+ end_time = "12:00"

  # Step 1: Check availability
- is_available = check_availability(tomorrow_str, start_time, end_time)
+ is_available = check_availability(tomorrow, start_time, end_time)

- # Step 2: Make appointment if available
  if is_available:
-     appointment_result = make_appointment(tomorrow_str, start_time, end_time, "Meeting with Thesis Supervisor")
-
-     # Step 3: Add to reminders if appointment is made
-     if appointment_result['appointment_made']:
-         reminder_text = f"Appointment made for {appointment_result['day']} from {appointment_result['start_time']} to {appointment_result['end_time']}."
-         add_to_reminders(reminder_text)
+     # Step 2: Make the appointment
+     appointment_details = make_appointment(tomorrow, start_time, end_time, "Meeting with thesis supervisor")
+
+     if appointment_details['appointment_made']:
+         # Step 3: Add to reminders
+         reminder_text = f"Appointment with thesis supervisor scheduled for {tomorrow} from {start_time} to {end_time}."
+         add_to_reminders(reminder_text)
+ else:
+     appointment_details = {"day": tomorrow, "start_time": start_time, "end_time": end_time, "appointment_made": False}
  ```

  This code will first determine if the specified time slot is available tomorrow. If it is, it will attempt to make the appointment and then add it to the reminders if successful.
@@ -203,21 +203,23 @@ We evaluate the model on the following benchmarks:

  Below are the BFCL results: evaluation results for ***Qwen2.5-Coder-3B-Instruct***, ***Dria-Agent-α-3B***, ***Dria-Agent-α-7B***, and ***gpt-4o-2024-11-20***

- | Metric | Qwen/Qwen2.5-3B-Instruct | Dria-Agent-a-3B | Dria-Agent-a-7B | gpt-4o-2024-11-20 (Prompt) |
- |---------------------------------------|-----------|-----------|-----------|-----------|
- | **Non-Live Simple AST** | 75.50% | 75.08% | 77.83% | 79.42% |
- | **Non-Live Multiple AST** | 90.00% | 93.00% | 94.50% | 95.50% |
- | **Non-Live Parallel AST** | 80.00% | 85.00% | 87.00% | 94.00% |
- | **Non-Live Parallel Multiple AST** | 78.50% | 79.00% | 88.00% | 83.50% |
- | **Non-Live Simple Exec** | 82.07% | 87.57% | 80.00% | 100.00% |
- | **Non-Live Multiple Exec** | 86.00% | 85.14% | 84.00% | 94.00% |
- | **Non-Live Parallel Exec** | 82.00% | 90.00% | 70.00% | 86.00% |
- | **Non-Live Parallel Multiple Exec** | 80.00% | 88.00% | 65.00% | 77.50% |
- | **Live Simple AST** | 68.22% | 70.16% | 82.95% | 83.72% |
- | **Live Multiple AST** | 66.00% | 67.14% | 78.25% | 79.77% |
- | **Live Parallel AST** | 62.50% | 50.00% | 81.25% | 87.50% |
- | **Live Parallel Multiple AST** | 66.67% | 70.83% | 70.83% | 70.83% |
- | **Relevance Detection** | 88.89% | 100.00% | 100.00% | 83.33% |
+ | Metric | Qwen/Qwen2.5-3B-Instruct | Dria-Agent-a-3B | Dria-Agent-7B | gpt-4o-2024-11-20 (Prompt) |
+ |---------------------------------------|----------------------------|-------------------|-------------------|---------------------------|
+ | **Non-Live Simple AST** | 75.50% | 75.08% | 77.58% | 79.42% |
+ | **Non-Live Multiple AST** | 90.00% | 93.00% | 94.00% | 95.50% |
+ | **Non-Live Parallel AST** | 80.00% | 85.00% | 93.50% | 94.00% |
+ | **Non-Live Parallel Multiple AST** | 78.50% | 79.00% | 89.50% | 83.50% |
+ | **Non-Live Simple Exec** | 82.07% | 87.57% | 93.29% | 100.00% |
+ | **Non-Live Multiple Exec** | 86.00% | 85.14% | 88.00% | 94.00% |
+ | **Non-Live Parallel Exec** | 82.00% | 90.00% | 88.00% | 86.00% |
+ | **Non-Live Parallel Multiple Exec** | 80.00% | 88.00% | 72.50% | 77.50% |
+ | **Live Simple AST** | 68.22% | 70.16% | 81.40% | 83.72% |
+ | **Live Multiple AST** | 66.00% | 67.14% | 78.73% | 79.77% |
+ | **Live Parallel AST** | 62.50% | 50.00% | 75.00% | 87.50% |
+ | **Live Parallel Multiple AST** | 66.67% | 70.83% | 62.50% | 70.83% |
+ | **Relevance Detection** | 88.89% | 100.00% | 100.00% | 83.33% |
+
+

  and the MMLU-Pro and DPAB results:
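
The scheduling snippet in the first hunk calls `check_availability`, `make_appointment`, and `add_to_reminders` without defining them, so it cannot be run on its own. A minimal sketch for exercising the updated control flow is given here, under two assumptions: the three functions are hypothetical in-memory stand-ins written for illustration (not the tools' real implementations or signatures), and the `else` branch pairs with `if is_available`, since indentation is not preserved in the diff view.

```python
from datetime import datetime, timedelta

# Hypothetical in-memory stand-ins for the tools the README snippet calls.
booked = set()   # (day, start_time, end_time) tuples that are already taken
reminders = []   # reminder strings, in insertion order


def check_availability(day, start_time, end_time):
    """Return True if the slot has not been booked yet."""
    return (day, start_time, end_time) not in booked


def make_appointment(day, start_time, end_time, title):
    """Book the slot if it is free and report the outcome as a dict."""
    made = (day, start_time, end_time) not in booked
    if made:
        booked.add((day, start_time, end_time))
    return {"day": day, "start_time": start_time, "end_time": end_time,
            "appointment_made": made}


def add_to_reminders(reminder_text):
    """Store the reminder text."""
    reminders.append(reminder_text)
    return True


def schedule(day, start_time, end_time):
    """Same control flow as the updated snippet, wrapped in a function."""
    if check_availability(day, start_time, end_time):
        details = make_appointment(day, start_time, end_time,
                                   "Meeting with thesis supervisor")
        if details["appointment_made"]:
            add_to_reminders(
                f"Appointment with thesis supervisor scheduled for {day} "
                f"from {start_time} to {end_time}."
            )
    else:
        # Slot unavailable: report a failed booking without calling the tool.
        details = {"day": day, "start_time": start_time,
                   "end_time": end_time, "appointment_made": False}
    return details


tomorrow = (datetime.now() + timedelta(days=1)).strftime("%Y-%m-%d")
first = schedule(tomorrow, "10:00", "12:00")    # slot is free
second = schedule(tomorrow, "10:00", "12:00")   # slot is now taken
```

Calling `schedule` twice for the same slot exercises both branches: the first call books the slot and stores one reminder, while the second takes the `else` path and leaves `reminders` unchanged.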