---
title: "News Summarizer & TTS"
emoji: "📰"
colorFrom: "blue"
colorTo: "green"
sdk: "streamlit"
app_file: "app.py"
pinned: false
---
# News Summarization and Hindi TTS Application
## Project Overview
This project is a web application that, for a given company, extracts news articles from multiple sources, summarizes them with a Transformer-based model (falling back to Sumy if needed), runs sentiment analysis and plots the results, translates the combined summary into Hindi, and converts the Hindi summary into an audio file via text-to-speech (TTS). The backend is built with FastAPI and the frontend with Streamlit.
## Features
- **News Extraction:**
Extracts news articles from multiple sources using web scraping techniques.
- **Summarization:**
Generates a combined summary using a Transformer-based summarizer (with fallback to Sumy if needed).
- **Sentiment Analysis:**
Analyzes the sentiment of the news content and visualizes the comparative sentiment (Positive, Negative, Neutral) as a bar graph using matplotlib.
- **Translation:**
Translates the summary from English to Hindi using googletrans.
- **Text-to-Speech (TTS):**
Converts the Hindi summary into an audio file using Edge TTS.
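
The "Transformer first, fallback second" behaviour described under Summarization can be sketched as below. This is a minimal illustration, not the code in `utils.py`: the function names are hypothetical, the primary path assumes the default Hugging Face `summarization` pipeline, and a small word-frequency extractor stands in for Sumy so the example stays self-contained.

```python
import re
from collections import Counter
from typing import Callable


def transformer_summary(text: str) -> str:
    """Primary path: Hugging Face summarization pipeline (default model assumed)."""
    from transformers import pipeline  # imported lazily because it is a heavy dependency

    summarizer = pipeline("summarization")
    result = summarizer(text, max_length=130, min_length=30, do_sample=False)
    return result[0]["summary_text"]


def frequency_summary(text: str, max_sentences: int = 2) -> str:
    """Stand-in for the Sumy fallback: keep the sentences with the most frequent words."""
    sentences = [s.strip() for s in re.split(r"(?<=[.!?])\s+", text) if s.strip()]
    freq = Counter(re.findall(r"[a-z']+", text.lower()))
    ranked = sorted(
        range(len(sentences)),
        key=lambda i: sum(freq[w] for w in re.findall(r"[a-z']+", sentences[i].lower())),
        reverse=True,
    )
    keep = sorted(ranked[:max_sentences])  # restore the original sentence order
    return " ".join(sentences[i] for i in keep)


def summarize_with_fallback(
    text: str,
    primary: Callable[[str], str] = transformer_summary,
    fallback: Callable[[str], str] = frequency_summary,
) -> str:
    """Try the primary summarizer; use the fallback if it raises or returns nothing."""
    try:
        summary = primary(text)
        if summary and summary.strip():
            return summary
    except Exception:
        pass  # missing model, oversized input, etc.: fall through to the fallback
    return fallback(text)
```

Passing the summarizers in as callables keeps the fallback logic testable without downloading a model.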
## Setup Instructions
### Dependencies
Install all required packages using the command below:
```bash
pip install fastapi uvicorn streamlit transformers newspaper3k beautifulsoup4 edge-tts selenium webdriver-manager spacy nltk sumy sacremoses requests googletrans==4.0.0-rc1 matplotlib
python -m spacy download en_core_web_sm
python -c "import nltk; nltk.download('vader_lexicon'); nltk.download('punkt')"
```
### Running the FastAPI Backend
In your project directory, run:
```bash
uvicorn api:app --reload
```
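This starts the backend server at [http://127.0.0.1:8000](http://127.0.0.1:8000). The argument `api:app` tells uvicorn to load the `app` object from `api.py`, and `--reload` restarts the server whenever the code changes.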
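
Once the backend is running, it can be queried from a script as well as from the Streamlit frontend. The sketch below uses only the standard library. The route `/news` and the `company` query parameter are assumptions made for illustration; check `api.py` for the actual endpoint names.

```python
import json
from urllib.parse import urlencode
from urllib.request import urlopen

BASE_URL = "http://127.0.0.1:8000"  # default uvicorn address


def build_news_url(company: str, base_url: str = BASE_URL) -> str:
    """Build the request URL. "/news" and "company" are hypothetical names."""
    return f"{base_url}/news?{urlencode({'company': company})}"


def fetch_news(company: str, timeout: float = 60.0) -> dict:
    """Call the backend and decode its JSON response (requires a running server)."""
    with urlopen(build_news_url(company), timeout=timeout) as response:
        return json.load(response)
```

The generous timeout reflects that scraping, summarizing, and synthesizing audio in one request may take a while.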
### Running the Streamlit Frontend
In another terminal (or a new tab), run:
```bash
streamlit run streamlit_app.py
```
This will launch the web interface where you can input a company name and interact with the application.
## Project Structure
- **`api.py`**
Contains the FastAPI application which exposes endpoints for processing news, generating summaries, performing sentiment analysis, translating summaries to Hindi, and creating TTS audio.
- **`utils.py`**
Houses utility functions for:
  - Extracting articles from news URLs.
  - Generating combined summaries using Transformer models with Sumy as a fallback.
  - Translating text to Hindi using googletrans.
  - Performing comparative sentiment analysis and generating a matplotlib bar chart.
  - Generating TTS audio from the Hindi summary.
- **`streamlit_app.py`**
Provides a simple and interactive web-based interface using Streamlit. Users can input a company name, view extracted news and summaries, see the sentiment analysis graph, and play the generated TTS audio.
- **`scrapes.py`**
Contains functions for scraping valid news URLs and extracting article content from web pages.
- **`sentiV_v2.py`**
Implements sentiment analysis on the article content using both NLTK’s VADER and Transformer-based methods.
- **`tts_hindi_edgetts.py`**
Utilizes Edge TTS to convert text to speech and saves the output as an audio file.
- **`.gitignore`**
Ensures that large or unnecessary files (like the virtual environment folder `venv/`) are not tracked by Git.
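
To illustrate the comparative sentiment step handled by `utils.py` and `sentiV_v2.py`, here is a minimal sketch that turns per-article VADER compound scores into Positive/Negative/Neutral counts and plots them. The function names are hypothetical, and the ±0.05 cut-offs are the conventional VADER thresholds, which this project may or may not use. In practice the scores would come from NLTK's `SentimentIntensityAnalyzer().polarity_scores(text)["compound"]`.

```python
from collections import Counter
from typing import Dict, Iterable

LABELS = ("Positive", "Negative", "Neutral")


def label_from_compound(compound: float, threshold: float = 0.05) -> str:
    """Map a VADER compound score in [-1, 1] to a label (threshold is an assumption)."""
    if compound >= threshold:
        return "Positive"
    if compound <= -threshold:
        return "Negative"
    return "Neutral"


def count_sentiments(compound_scores: Iterable[float]) -> Dict[str, int]:
    """Count articles per label, always returning all three labels."""
    counts = Counter(label_from_compound(score) for score in compound_scores)
    return {label: counts.get(label, 0) for label in LABELS}


def plot_sentiment_counts(counts: Dict[str, int], path: str = "sentiment.png") -> str:
    """Save the counts as a bar chart and return the file path."""
    import matplotlib

    matplotlib.use("Agg")  # render off-screen so this also works on a server
    import matplotlib.pyplot as plt

    fig, ax = plt.subplots()
    ax.bar(list(counts.keys()), list(counts.values()))
    ax.set_ylabel("Number of articles")
    ax.set_title("Comparative sentiment")
    fig.savefig(path)
    plt.close(fig)
    return path
```

Returning all three labels, even when a count is zero, keeps the bar chart's x-axis stable across companies.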
## Deployment Details
The application can be deployed on platforms like [Hugging Face Spaces](https://huggingface.co/spaces), Heroku, or Render. For example, if deployed on Hugging Face Spaces:
- The repository is linked to a new Space.
- The Streamlit interface is used as the main application.
- The deployment link (e.g., `https://huggingface.co/spaces/your-username/news-summarisation`) will be provided in the repository README for access.
## Usage Instructions
1. **Launch the Application:**
Run the FastAPI backend and Streamlit frontend as described above.
2. **Input a Company Name:**
On the Streamlit interface, enter the name of a company (e.g., "Tesla", "Netflix") and click the "Fetch News" button.
3. **View Results:**
   - **News Articles:**
     See a list of extracted news articles along with their metadata (title, URL, date, sentiment, excerpt).
   - **Sentiment Analysis:**
     View the comparative sentiment counts and a bar chart visualizing the distribution of positive, negative, and neutral articles.
   - **Summaries:**
     Read the combined summary of the news and the translated Hindi summary.
   - **Audio:**
     Play the TTS-generated audio of the Hindi summary.
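
The audio in the last step comes from Edge TTS. A minimal sketch of that call is shown below, using the `edge-tts` package's `Communicate` class. The function name, output filename, and the `hi-IN-SwaraNeural` voice are assumptions; `tts_hindi_edgetts.py` may be organized differently.

```python
import asyncio

HINDI_VOICE = "hi-IN-SwaraNeural"  # assumed voice; the project may use another Hindi voice


async def synthesize_hindi(
    text: str, output_path: str = "summary_hi.mp3", voice: str = HINDI_VOICE
) -> str:
    """Convert Hindi text to speech with Edge TTS and return the audio file path."""
    if not text.strip():
        raise ValueError("Nothing to synthesize: the Hindi summary is empty.")

    import edge_tts  # imported lazily; the call below needs network access

    communicate = edge_tts.Communicate(text, voice)
    await communicate.save(output_path)
    return output_path


# Usage (requires network access):
# asyncio.run(synthesize_hindi("नमस्ते, यह एक परीक्षण है।"))
```

Edge TTS is asynchronous, so a synchronous caller wraps the coroutine in `asyncio.run`.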
## Limitations & Future Improvements
### Limitations:
- Reliance on web scraping can sometimes result in incomplete article extraction due to website restrictions.
- The summarization and translation quality might vary based on input length and complexity.
- TTS accuracy depends on the Edge TTS service and may not always be perfect.
### Future Improvements:
- Integrate more robust error handling and fallback mechanisms.
- Enhance the UI for better user experience.
- Expand the number of news sources and improve the filtering of relevant content.
- Implement caching to reduce API call latency.
- Explore additional TTS options for higher quality audio output.
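
As one way to approach the caching item above, the sketch below wraps a per-company function in a small time-to-live cache, so repeated requests for the same company within a few minutes skip the scraping and summarization work. The decorator name and the 10-minute default are illustrative choices, not part of the current code.

```python
import time
from functools import wraps
from typing import Any, Callable, Dict, Tuple


def ttl_cache(ttl_seconds: float = 600.0, clock: Callable[[], float] = time.monotonic):
    """Cache results of a single-argument function for ttl_seconds."""

    def decorator(func: Callable[[str], Any]) -> Callable[[str], Any]:
        store: Dict[str, Tuple[float, Any]] = {}

        @wraps(func)
        def wrapper(key: str) -> Any:
            now = clock()
            hit = store.get(key)
            if hit is not None and now - hit[0] < ttl_seconds:
                return hit[1]  # still fresh: reuse the cached value
            value = func(key)
            store[key] = (now, value)
            return value

        return wrapper

    return decorator


@ttl_cache(ttl_seconds=600.0)
def process_company(company: str) -> dict:
    # Placeholder for the real pipeline (scrape, summarize, translate, TTS).
    return {"company": company}
```

A short expiry matters here because news results go stale quickly; an unbounded cache would keep serving old articles.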
## License
This project is licensed under the MIT License. See the [LICENSE](LICENSE) file for details.
## Contributing
Contributions are welcome! Please see the [CONTRIBUTING](CONTRIBUTING.md) file for guidelines on how to contribute to this project.