georgeck committed on
Commit 2d9aa59 · 1 Parent(s): c1e0b35

Updated Model Card

Files changed (1)
  1. README.md +8 -6
README.md CHANGED
@@ -119,30 +119,31 @@ print(summary)
 This model was fine-tuned on the [georgeck/hacker-news-discussion-summarization-large](https://huggingface.co/datasets/georgeck/hacker-news-discussion-summarization-large) dataset, which contains 14,531 records of Hacker News front-page stories and their associated discussion threads.
 
 The dataset includes:
-- 13,077 training examples
-- 1,454 test examples
+- 6,300 training examples
+- 700 test examples
 - Structured representations of hierarchical comment threads
 - Normalized scoring system that represents comment importance
 - Comprehensive metadata about posts and comments
 
-Each example includes a post title, author information, timestamps, and a structured representation of the comment thread with information about comment scores, reply counts, and downvotes.
+Each example includes a post title, and a structured representation of the comment thread with information about comment scores, reply counts, and downvotes.
 
 ### Training Procedure
 
 #### Preprocessing
 
 - The hierarchical comment structure was preserved using a standardized format
-- Comments were filtered based on downvote counts, with heavily downvoted content (4+ downvotes) excluded
 - A normalized scoring system (1-1000) was applied to represent each comment's relative importance
 - Comments were organized to maintain their hierarchical relationships
 
+The training was done by using [OpenPipe](https://openpipe.ai/) infrastructure.
+
 ## Evaluation
 
 ### Testing Data, Factors & Metrics
 
 #### Testing Data
 
-The model was evaluated on the test split of the georgeck/hacker-news-discussion-summarization-large dataset, comprising 1,454 examples of Hacker News discussions and summaries.
+The model was evaluated on the test split of the georgeck/hacker-news-discussion-summarization-large dataset.
 
 #### Factors
 
@@ -157,7 +158,8 @@ Evaluation considered:
 
 ### Model Architecture and Objective
 
-This model is based on Llama-3.1-8B-Instruct, a causal language model. The primary training objective was to generate structured summaries of hierarchical discussion threads that capture the most important themes, perspectives, and insights while maintaining proper attribution.
+This model is based on Llama-3.1-8B-Instruct, a causal language model.
+The primary training objective was to generate structured summaries of hierarchical discussion threads that capture the most important themes, perspectives, and insights while maintaining proper attribution.
 
 The model was trained to specifically understand and process the hierarchical structure of Hacker News comments, including their scoring system, reply counts, and downvote information to appropriately weight content importance.
 
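The Preprocessing section of this README mentions a normalized scoring system (1-1000) for comment importance but does not give its formula. The sketch below is illustrative only: it assumes plain min-max scaling of one thread's raw comment scores onto the 1-1000 range, and the function name `normalize_scores` is hypothetical, not part of the model card or dataset.

```python
def normalize_scores(raw_scores: list[float], lo: int = 1, hi: int = 1000) -> list[int]:
    """Hypothetical min-max scaling of raw comment scores onto [lo, hi].

    Assumption: the model card only says scores were normalized to 1-1000;
    min-max scaling is a stand-in, not the documented method.
    """
    if not raw_scores:
        return []
    smin, smax = min(raw_scores), max(raw_scores)
    if smax == smin:
        # All comments look equally important; map them to the top of the range.
        return [hi] * len(raw_scores)
    span = smax - smin
    # Linear map: smallest raw score -> lo, largest -> hi, order preserved.
    return [round(lo + (s - smin) * (hi - lo) / span) for s in raw_scores]
```

For example, `normalize_scores([0, 1, 2, 3])` returns `[1, 334, 667, 1000]`. Scaling per thread keeps the relative ranking of comments within a discussion while putting every thread on the same 1-1000 scale; mapping ties to the top of the range is an arbitrary choice of this sketch.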