julsCadenas/summarize-reddit

julsCadenas committed on Feb 2

Commit d2260f4 · verified · 1 Parent(s): f09504f

Update README.md

Files changed (1)

README.md +64 -0

@@ -48,6 +48,70 @@ This project uses a fine-tuned version of the BART model from Facebook for summa
 - **Original Model:** [facebook/bart-large-cnn](https://huggingface.co/facebook/bart-large-cnn)
 - **Fine-Tuned Model:** [julsCadenas/summarize-reddit](https://huggingface.co/julsCadenas/summarize-reddit)
+## **Usage**
+```python
+from transformers import pipeline
+class Summarize:
+    def __init__(self):
+        self.summarizer = pipeline(
+            "summarization",
+            model = "julsCadenas/summarize-reddit",
+            tokenizer = "julsCadenas/summarize-reddit",
+        )
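+    def summarize(self, text, prompt):
+        inputs = f"{prompt}: {text}"
+        # Count tokens without truncating so the length bounds reflect the full input
+        input_tokens = self.summarizer.tokenizer.encode(inputs, truncation=False)
+        input_len = len(input_tokens)
+        max_length = min(input_len * 2, 1024) # change depending on your use case
+        # Clamp so min_length never exceeds max_length (very short or very long inputs)
+        min_length = min(max(32, input_len // 4), max_length) # change depending on your use case
+        summary = self.summarizer(
+            inputs,
+            max_length=max_length,
+            min_length=min_length,
+            length_penalty=2.0,
+            num_beams=4,
+            truncation=True, # truncate inputs longer than the model's maximum input length
+        )
+        return summary[0]['summary_text']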
+    def process_data(self, response, prompt):
+        post_content = response[0]['data']['children'][0]['data'].get('selftext', '')
+        comments = []
+        for comment in response[1]['data']['children']:
+            if 'body' in comment['data']:
+                comments.append(comment['data']['body'])
+        comments_all = ' '.join(comments)
+        post_summary = self.summarize(post_content, prompt)
+        comments_summary = self.summarize(comments_all, prompt)
+        return {
+            "post_summary": post_summary,
+            "comments_summary": comments_summary
+        }
+```
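The `process_data` method above appears to expect the two-element list that Reddit returns for a thread's JSON view: the post listing first, the comment listing second. The following is a minimal sketch of just that parsing step, run on a hand-written sample so it can be tried without downloading the model. The helper name `extract_post_and_comments` and the sample data are illustrative assumptions, not part of this repository.

```python
def extract_post_and_comments(response):
    """Pull the post body and the joined comment bodies out of a thread listing."""
    # First element: the post listing; the post text lives under "selftext".
    post = response[0]["data"]["children"][0]["data"].get("selftext", "")
    # Second element: the comment listing; skip entries that carry no "body".
    comments = [
        child["data"]["body"]
        for child in response[1]["data"]["children"]
        if "body" in child["data"]
    ]
    return post, " ".join(comments)


# Hand-written sample in the assumed shape, not real Reddit data.
sample_response = [
    {"data": {"children": [{"data": {"selftext": "My laptop will not boot."}}]}},
    {"data": {"children": [
        {"data": {"body": "Try reseating the RAM."}},
        {"data": {"count": 3}},  # entry without a body, skipped
        {"data": {"body": "Check the charger first."}},
    ]}},
]

post_text, comments_text = extract_post_and_comments(sample_response)
print(post_text)
print(comments_text)
```

The two returned strings correspond to what `process_data` passes to `summarize` as the post content and the combined comments.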
+- If the summaries come back as JSON-encoded strings, you can use a script like this to parse them into nested JSON and save the result:
+```python
+import json
+def fix_json(jsonfile, path):
+    # jsonfile is a JSON string whose summary fields are themselves JSON-encoded strings
+    fixed_json = json.loads(jsonfile)
+    fixed_post_summary = json.loads(fixed_json['post_summary'])
+    fixed_comments_summary = json.loads(fixed_json['comments_summary'])
+    fixed_json['post_summary'] = fixed_post_summary
+    fixed_json['comments_summary'] = fixed_comments_summary
+    print(json.dumps(fixed_json, indent=4))
+    with open(path, 'w') as file:
+        json.dump(fixed_json, file, indent=4)
+```
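The script above assumes both summaries are valid JSON strings, so `json.loads` raises an error if the model returns plain text instead. Below is a hedged sketch of a more tolerant variant that leaves a field untouched when it cannot be parsed. The function name `parse_nested_summaries` and the sample payload are hypothetical and only illustrate the idea.

```python
import json


def parse_nested_summaries(raw):
    """Decode the outer JSON, then decode each summary field if it is itself JSON."""
    outer = json.loads(raw)
    for key in ("post_summary", "comments_summary"):
        try:
            outer[key] = json.loads(outer[key])
        except (json.JSONDecodeError, TypeError):
            # Not a JSON string (plain text or already decoded): keep it as-is.
            pass
    return outer


# Hypothetical payload: one summary is JSON-encoded, the other is plain text.
raw = json.dumps({
    "post_summary": json.dumps({"summary": "Laptop will not boot."}),
    "comments_summary": "plain text, not JSON",
})

result = parse_nested_summaries(raw)
print(json.dumps(result, indent=4))
```

Whether to fall back silently or raise depends on how strictly the prompt constrains the model's output format.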
 ## **Model Evaluation**
 ### **ROUGE-1 SCORES:**