SumanX22 committed ac6a7fc (verified; parent f66464a): Update README.md

pipeline_tag: text-classification
library_name: fasttext
tags:
- news
---

# FastText News Categorization

FastText News Categorization is a simple yet effective project for classifying news articles into categories using Facebook’s fastText library. This repository contains scripts for data preprocessing, model training, evaluation, and prediction on news datasets.

## Table of Contents

- [Overview](#overview)
- [Features](#features)
- [Dataset](#dataset)
- [Results](#results)
- [License](#license)

## Overview

Automatically categorizing news articles improves content organization and information retrieval. This project uses fastText to train a supervised text classifier that sorts news articles into predefined topics (e.g., politics, sports, technology, entertainment).

## Features

- **Efficient Text Classification:** Uses fastText’s supervised learning mode for fast, accurate news categorization.
- **Easy Model Evaluation:** Evaluate the trained model’s performance with minimal configuration.
- **Prediction Interface:** Run predictions on new articles to determine their categories.
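
The prediction interface itself is not documented in this README. As a rough sketch: with the official `fasttext` Python package, a loaded model's `predict(text, k=3)` returns a tuple of labels that still carry the `__label__` prefix, plus an array of probabilities. The helper below (`readable_predictions` is a hypothetical name, not part of this repository) strips the prefix and pairs each category with its probability. The sample labels and probabilities are made-up stand-ins for real model output.

```python
from typing import List, Sequence, Tuple

# fastText's default label prefix, used by every category in this README.
LABEL_PREFIX = "__label__"


def readable_predictions(
    labels: Sequence[str],
    probs: Sequence[float],
    min_prob: float = 0.0,
) -> List[Tuple[str, float]]:
    """Hypothetical helper: turn fastText-style (labels, probs) output into
    (category, probability) pairs, skipping entries below min_prob."""
    pairs = []
    for label, prob in zip(labels, probs):
        # "__label__SPORTS" -> "SPORTS"; strings without the prefix pass through.
        name = label[len(LABEL_PREFIX):] if label.startswith(LABEL_PREFIX) else label
        # float() converts NumPy scalars (as returned by fastText) to plain floats.
        if float(prob) >= min_prob:
            pairs.append((name, float(prob)))
    return pairs


if __name__ == "__main__":
    # Made-up stand-in for: labels, probs = model.predict(text, k=3)
    labels = ("__label__SPORTS", "__label__ENTERTAINMENT", "__label__BUSINESS_AND_ECONOMY")
    probs = (0.82, 0.11, 0.04)
    print(readable_predictions(labels, probs, min_prob=0.1))
    # [('SPORTS', 0.82), ('ENTERTAINMENT', 0.11)]
```

fastText's `predict` also accepts its own `threshold` argument; filtering afterwards is shown here only to keep the sketch independent of the `fasttext` package.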

#### News Categories and Definitions

- **`__label__POLITICS_AND_GOVERNMENT`:** News related to political events, government policies, elections, and political analysis.
- **`__label__BUSINESS_AND_ECONOMY`:** News concerning economic trends, business updates, financial markets, and economic policies.
- **`__label__CRIME_AND_JUSTICE`:** News focusing on crime reports, legal cases, law enforcement actions, and judicial decisions.
- **`__label__SPORTS`:** News covering sports events, athlete performances, game results, and sports analysis.
- **`__label__ENTERTAINMENT`:** News related to movies, music, television, celebrity gossip, and cultural events.
- **`__label__HEALTH_AND_SCIENCE`:** News covering medical research, health trends, scientific discoveries, and wellness advice.
- **`__label__ENVIRONMENT_AND_CLIMATE`:** News addressing long-term environmental issues, climate change, conservation efforts, and sustainability.
- **`__label__TECHNOLOGY`:** News about technological advancements, new gadgets, software innovations, and IT trends.
- **`__label__EDUCATION`:** News concerning educational policies, academic research, school and university updates, and academic achievements.
- **`__label__LIFESTYLE_AND_CULTURE`:** News covering cultural trends, lifestyle, fashion, travel, and social commentary.
- **`__label__DISASTER_AND_ACCIDENT`:** News related to natural disasters, accidents, emergencies, and crisis events.
- **`__label__SOCIAL_ISSUES`:** News addressing societal challenges, human rights, public debates, and community concerns.
- **`__label__MILITARY_AND_DEFENSE`:** News covering military operations, defense policies, international conflicts, and security matters.
- **`__label__WEATHER_AND_CLIMATE`:** News focused on immediate weather updates, forecasts, and meteorological conditions.
- **`__label__PROMOTIONAL`:** Content intended for advertising, sponsored material, or promotional purposes.
- **`__label__ARCHIVE`:** News that is outdated or no longer relevant and is generally not considered worth sharing.
- **`__label__MISCLENIOUS`:** News that does not fit into any other category, covering miscellaneous topics.
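
fastText's supervised mode reads training data as plain text with one example per line, each line starting with its `__label__`-prefixed label(s) followed by the text. The repository's own training files are not shown here, so the following is a minimal sketch assuming that default format: it checks a label against the 17 categories above (spelled exactly as listed) and formats one training line. `to_fasttext_line` is a hypothetical helper name.

```python
# The 17 category labels listed above, spelled exactly as in this README.
KNOWN_LABELS = frozenset({
    "__label__POLITICS_AND_GOVERNMENT",
    "__label__BUSINESS_AND_ECONOMY",
    "__label__CRIME_AND_JUSTICE",
    "__label__SPORTS",
    "__label__ENTERTAINMENT",
    "__label__HEALTH_AND_SCIENCE",
    "__label__ENVIRONMENT_AND_CLIMATE",
    "__label__TECHNOLOGY",
    "__label__EDUCATION",
    "__label__LIFESTYLE_AND_CULTURE",
    "__label__DISASTER_AND_ACCIDENT",
    "__label__SOCIAL_ISSUES",
    "__label__MILITARY_AND_DEFENSE",
    "__label__WEATHER_AND_CLIMATE",
    "__label__PROMOTIONAL",
    "__label__ARCHIVE",
    "__label__MISCLENIOUS",
})


def to_fasttext_line(label: str, text: str) -> str:
    """Hypothetical helper: format one labeled example as '<label> <text>'."""
    if label not in KNOWN_LABELS:
        raise ValueError(f"unknown label: {label!r}")
    # fastText treats each line as one example, so collapse newlines and
    # repeated whitespace inside the article text into single spaces.
    flat_text = " ".join(text.split())
    if not flat_text:
        raise ValueError("empty text")
    return f"{label} {flat_text}"


if __name__ == "__main__":
    print(to_fasttext_line("__label__SPORTS", "Local club wins\nthe national final"))
    # __label__SPORTS Local club wins the national final
```

Lines written this way can be collected into a file and passed to `fasttext.train_supervised(input=...)`; the file names and hyperparameters used to train this model are not documented in this README.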

## Dataset

The default dataset is a collection of news articles labeled with the categories listed above. The model was trained on 140,000 labeled news samples.
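
The preprocessing and train/test split applied to these 140,000 samples are not described here. The sketch below assumes lowercasing plus splitting common punctuation into separate tokens (similar to the preprocessing in fastText's text classification tutorial) and a seeded random split. The names `normalize` and `split_examples`, the 10% test fraction, and the seed are illustrative choices, not the project's documented settings.

```python
import random
import re
from typing import List, Sequence, Tuple

Example = Tuple[str, str]  # (label, text)


def normalize(text: str) -> str:
    """Lowercase, pad common punctuation with spaces, and collapse whitespace."""
    text = text.lower()
    # Put spaces around . ! ? , ' / ( ) so they become separate tokens.
    text = re.sub(r"([.!?,'/()])", r" \1 ", text)
    return " ".join(text.split())


def split_examples(
    examples: Sequence[Example], test_fraction: float = 0.1, seed: int = 42
) -> Tuple[List[Example], List[Example]]:
    """Shuffle a copy of the examples with a fixed seed and split off a test set."""
    items = list(examples)
    random.Random(seed).shuffle(items)
    n_test = round(len(items) * test_fraction)
    return items[n_test:], items[:n_test]


if __name__ == "__main__":
    print(normalize("Stocks Rally, Again!"))  # stocks rally , again !
    data = [("__label__SPORTS", f"match report {i}") for i in range(100)]
    train, test = split_examples(data)
    print(len(train), len(test))  # 90 10
```

Under this assumed 10% test fraction, 140,000 samples would split into 126,000 training and 14,000 test examples.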

## Results

After training and evaluation, the model typically achieves around 85-90% accuracy on the test set, depending on the dataset and preprocessing quality. Detailed evaluation reports are generated and saved in the `results/` directory.
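
The reports in `results/` are not reproduced here. For reference, fastText's Python `model.test(path)` returns the number of samples together with precision@1 and recall@1; when every test line carries exactly one label, precision@1 equals accuracy. The sketch below (function names are hypothetical, not this project's evaluation script) computes overall accuracy and per-category precision and recall from parallel lists of true and predicted labels, one way to break an accuracy figure down by category.

```python
from collections import Counter
from typing import Dict, Sequence, Tuple


def accuracy(y_true: Sequence[str], y_pred: Sequence[str]) -> float:
    """Fraction of examples whose predicted label equals the true label."""
    if len(y_true) != len(y_pred) or not y_true:
        raise ValueError("inputs must be non-empty and of equal length")
    return sum(t == p for t, p in zip(y_true, y_pred)) / len(y_true)


def per_class_precision_recall(
    y_true: Sequence[str], y_pred: Sequence[str]
) -> Dict[str, Tuple[float, float]]:
    """Map each label to (precision, recall); 0.0 when a denominator is zero."""
    true_pos, n_pred, n_true = Counter(), Counter(), Counter()
    for t, p in zip(y_true, y_pred):
        n_true[t] += 1
        n_pred[p] += 1
        if t == p:
            true_pos[t] += 1
    report = {}
    for label in sorted(set(n_true) | set(n_pred)):
        precision = true_pos[label] / n_pred[label] if n_pred[label] else 0.0
        recall = true_pos[label] / n_true[label] if n_true[label] else 0.0
        report[label] = (precision, recall)
    return report


if __name__ == "__main__":
    y_true = ["SPORTS", "SPORTS", "TECHNOLOGY", "EDUCATION"]
    y_pred = ["SPORTS", "TECHNOLOGY", "TECHNOLOGY", "EDUCATION"]
    print(accuracy(y_true, y_pred))  # 0.75
    print(per_class_precision_recall(y_true, y_pred))
    # {'EDUCATION': (1.0, 1.0), 'SPORTS': (1.0, 0.5), 'TECHNOLOGY': (0.5, 1.0)}
```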

## License

This project is licensed under the [Apache 2.0 License](LICENSE).