Updated Model Card
README.md
CHANGED
@@ -119,30 +119,31 @@ print(summary)
 This model was fine-tuned on the [georgeck/hacker-news-discussion-summarization-large](https://huggingface.co/datasets/georgeck/hacker-news-discussion-summarization-large) dataset, which contains 14,531 records of Hacker News front-page stories and their associated discussion threads.
 
 The dataset includes:
--
--
+- 6,300 training examples
+- 700 test examples
 - Structured representations of hierarchical comment threads
 - Normalized scoring system that represents comment importance
 - Comprehensive metadata about posts and comments
 
-Each example includes a post title,
+Each example includes a post title, and a structured representation of the comment thread with information about comment scores, reply counts, and downvotes.
 
 ### Training Procedure
 
 #### Preprocessing
 
 - The hierarchical comment structure was preserved using a standardized format
-- Comments were filtered based on downvote counts, with heavily downvoted content (4+ downvotes) excluded
 - A normalized scoring system (1-1000) was applied to represent each comment's relative importance
 - Comments were organized to maintain their hierarchical relationships
 
+The training was done by using [OpenPipe](https://openpipe.ai/) infrastructure.
+
 ## Evaluation
 
 ### Testing Data, Factors & Metrics
 
 #### Testing Data
 
-The model was evaluated on the test split of the georgeck/hacker-news-discussion-summarization-large dataset
+The model was evaluated on the test split of the georgeck/hacker-news-discussion-summarization-large dataset.
 
 #### Factors
 
@@ -157,7 +158,8 @@ Evaluation considered:
 
 ### Model Architecture and Objective
 
-This model is based on Llama-3.1-8B-Instruct, a causal language model.
+This model is based on Llama-3.1-8B-Instruct, a causal language model.
+The primary training objective was to generate structured summaries of hierarchical discussion threads that capture the most important themes, perspectives, and insights while maintaining proper attribution.
 
 The model was trained to specifically understand and process the hierarchical structure of Hacker News comments, including their scoring system, reply counts, and downvote information to appropriately weight content importance.
 
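The Preprocessing list in the card mentions a normalized 1-1000 score that represents each comment's relative importance, but it does not say how that score is computed. As a rough illustration only, the sketch below assumes plain min-max scaling over a single thread. The `Comment` class, its field names, and `normalize_scores` are hypothetical and are not taken from the dataset schema or the training code.

```python
from __future__ import annotations

from dataclasses import dataclass, field


@dataclass
class Comment:
    """Hypothetical comment node; field names are illustrative only."""
    author: str
    raw_score: float
    children: list[Comment] = field(default_factory=list)
    norm_score: int = 0


def flatten(comments):
    """Yield every comment in the thread depth-first, parents before replies."""
    for comment in comments:
        yield comment
        yield from flatten(comment.children)


def normalize_scores(thread, lo=1, hi=1000):
    """Min-max scale raw scores into [lo, hi] in place.

    Assumption: the card only states that a 1-1000 range is used; the
    scaling rule here is a guess, not the dataset's actual method.
    """
    nodes = list(flatten(thread))
    if not nodes:
        return thread
    smin = min(c.raw_score for c in nodes)
    smax = max(c.raw_score for c in nodes)
    span = smax - smin
    for c in nodes:
        if span == 0:
            # Every comment has the same raw score; treat them as equally important.
            c.norm_score = hi
        else:
            c.norm_score = lo + round((c.raw_score - smin) / span * (hi - lo))
    return thread
```

Because the tree is scored in place, the reply hierarchy stays intact and only `norm_score` changes, which matches the card's point that hierarchical relationships are preserved.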