# Speed-Optimized Summarization with DistilBART

The original BART model is large (~1.6GB) and slow, so I replaced it with a much lighter, faster model and tuned the processing settings for better performance.

---

## Major Speed Optimizations Applied

### 1. Faster Model

- **Switched from** `facebook/bart-large-cnn` (**~1.6GB**)
- **To** `sshleifer/distilbart-cnn-12-6` (**~400MB**)
- **~4x smaller model size** = much faster loading and inference
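
In the app itself, this is essentially a change of the checkpoint name passed to the summarization pipeline. A minimal sketch (the `MODEL_NAME` constant and the example text are illustrative, not necessarily the exact code used here):

```python
from transformers import pipeline

# Old checkpoint: "facebook/bart-large-cnn" (~1.6GB)
MODEL_NAME = "sshleifer/distilbart-cnn-12-6"  # distilled, much smaller download

summarizer = pipeline("summarization", model=MODEL_NAME)
print(summarizer("Some long document text ...", max_length=60, min_length=20)[0]["summary_text"])
```
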
### 2. Processing Optimizations

- **Smaller chunks:** 512 words vs. 900 (faster processing)
- **Limited chunks:** at most 5 chunks processed (prevents hanging on huge documents)
- **Faster chunking:** word counts instead of full tokenization (see the sketch after this list)
- **Reduced beam search:** 2 beams instead of 4 (~2x faster generation)
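
A minimal sketch of that word-count-based chunking, assuming the 512-word and 5-chunk limits listed above; the function name is illustrative and the real implementation may differ:

```python
def chunk_text(text: str, max_words: int = 512, max_chunks: int = 5) -> list[str]:
    """Split text into word-count-based chunks without running a tokenizer."""
    words = text.split()
    chunks = []
    for start in range(0, len(words), max_words):
        if len(chunks) >= max_chunks:  # cap the work on very long documents
            break
        chunks.append(" ".join(words[start:start + max_words]))
    return chunks
```
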
### 3. Smart Summarization

- **Shorter summaries:** reduced maximum lengths across all modes
- **Skip the final summary pass:** for documents with ≤2 chunks (saves time)
- **Early stopping:** enabled so beam search terminates sooner
- **Progress tracking:** shows which chunk is being processed (see the loop sketched after this list)
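
Roughly, the per-chunk loop could look like this; `summarizer` is assumed to be a Hugging Face summarization pipeline, and the specific length limits are illustrative rather than the app's exact values:

```python
def summarize_document(text: str, summarizer) -> str:
    chunks = chunk_text(text)  # word-count chunking from the sketch above
    partial_summaries = []
    for i, chunk in enumerate(chunks, start=1):
        print(f"Summarizing chunk {i}/{len(chunks)} ...")  # progress tracking
        out = summarizer(chunk, max_length=120, min_length=30,
                         num_beams=2, early_stopping=True)
        partial_summaries.append(out[0]["summary_text"])

    combined = " ".join(partial_summaries)
    if len(chunks) <= 2:  # skip the final condensing pass for short documents
        return combined
    return summarizer(combined, max_length=150, min_length=40,
                      num_beams=2, early_stopping=True)[0]["summary_text"]
```
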
### 4. Memory & Performance

- **Float16 precision:** used when a GPU is available (faster inference)
- **Optimized pipeline:** more robust model loading with a fallback (sketched below)
- **`optimum` library added:** for additional speed improvements
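
A hedged sketch of fp16-on-GPU loading with a fallback; the function name is illustrative, and the fallback here simply reloads the default full-precision CPU pipeline:

```python
import torch
from transformers import pipeline

def load_summarizer(model_name: str = "sshleifer/distilbart-cnn-12-6"):
    try:
        if torch.cuda.is_available():
            # Half precision on GPU: smaller memory footprint, faster inference
            return pipeline("summarization", model=model_name,
                            device=0, torch_dtype=torch.float16)
        return pipeline("summarization", model=model_name, device=-1)
    except Exception:
        # Fallback: plain full-precision CPU load
        return pipeline("summarization", model=model_name)
```
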

---

## Expected Speed Improvements

| Task | Before | After |
|------|--------|-------|
| Model loading | ~30+ seconds | ~10 seconds |
| PDF processing | Minutes | ~5–15 seconds |
| Memory usage | ~1.6GB | ~400MB |
| Overall speed | Slow | ~5–10x faster |

---
## What is DistilBART?

**DistilBART** is a **compressed version of the BART model** designed to be **lighter and faster** while retaining most of BART's performance. It is the result of **knowledge distillation**, where a smaller model (the *student*) learns from a larger one (the *teacher*); for the summarization checkpoints used here the teacher is `facebook/bart-large-cnn`.

| Attribute | Description |
|-----------|-------------|
| **Full Name** | Distilled BART |
| **Base Model** | `facebook/bart-large` (summarization variants are distilled from `facebook/bart-large-cnn`) |
| **Distilled By** | Hugging Face |
| **Purpose** | Faster inference and a smaller footprint for tasks like summarization |
| **Architecture** | Encoder-decoder Transformer, like BART, but with fewer layers (the `12-6` variants keep all 12 encoder layers and halve the decoder to 6) |
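
As a rough illustration of the distillation idea (not the exact recipe used to train DistilBART, which also initializes the student by copying teacher layers), the student is typically trained to match the teacher's output distribution alongside the usual hard-label loss. A minimal PyTorch-style sketch with illustrative names:

```python
import torch
import torch.nn.functional as F

def distillation_loss(student_logits, teacher_logits, labels, T=2.0, alpha=0.5):
    """Blend a soft KL term (match the teacher) with the usual hard-label loss."""
    soft = F.kl_div(
        F.log_softmax(student_logits / T, dim=-1),
        F.softmax(teacher_logits / T, dim=-1),
        reduction="batchmean",
    ) * (T * T)
    hard = F.cross_entropy(student_logits.view(-1, student_logits.size(-1)),
                           labels.view(-1))
    return alpha * soft + (1 - alpha) * hard
```
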

---

## Key Differences: BART vs DistilBART

| Feature | BART (`bart-large-cnn`) | DistilBART (`distilbart-cnn-12-6`) |
|---------|-------------------------|------------------------------------|
| Encoder Layers | 12 | 12 |
| Decoder Layers | 12 | 6 |
| Parameters | ~406M | ~306M |
| Model Size | ~1.6GB | ~400MB |
| Speed | Slower | ~2x faster |
| Performance | Very high | Slight drop (~1–2% on ROUGE) |

---

## Use Cases

- **Text summarization** (the primary use case)
- **Other seq2seq tasks** such as translation (the architecture supports them, though these checkpoints are fine-tuned for summarization)
- Ideal for **edge devices** or **real-time systems** where speed and size matter

---
## Example: Summarization with DistilBART

You can easily use DistilBART with Hugging Face Transformers:
```python
from transformers import AutoTokenizer, AutoModelForSeq2SeqLM

# Load the pretrained DistilBART tokenizer and model
tokenizer = AutoTokenizer.from_pretrained("sshleifer/distilbart-cnn-12-6")
model = AutoModelForSeq2SeqLM.from_pretrained("sshleifer/distilbart-cnn-12-6")

# Input text
ARTICLE = "The Indian Space Research Organisation (ISRO) launched a new satellite today from the Satish Dhawan Space Centre..."

# Tokenize and summarize
inputs = tokenizer([ARTICLE], max_length=1024, return_tensors="pt", truncation=True)
summary_ids = model.generate(
    inputs["input_ids"],
    max_length=150,
    min_length=40,
    length_penalty=2.0,
    num_beams=4,
    early_stopping=True,
)

print(tokenizer.decode(summary_ids[0], skip_special_tokens=True))
```

---

## Available Variants

| Model Name | Task | Description |
|------------|------|-------------|
| `sshleifer/distilbart-cnn-12-6` | Summarization | Distilled from `facebook/bart-large-cnn` |
| `sshleifer/distilbart-xsum-12-6` | Summarization (XSum dataset) | Short, abstractive summaries |

[Find more on the Hugging Face Model Hub](https://huggingface.co/models?search=distilbart)

---

## Summary

* **DistilBART** is a distilled, faster version of **BART**
* Ideal for summarization tasks with lower memory and latency requirements
* Trained using **knowledge distillation** from BART-large teachers (`facebook/bart-large-cnn` for the checkpoint used here)
* Works well in apps that need faster performance without a significant loss in quality

---

**Try it now; it should be significantly faster!**
| ``` | |
| Thank You | |
| ``` | |