|
# Content Classification LoRA Adapter for Gemma-2B |
|
|
|
A LoRA adapter for unsloth/gemma-2b that determines content indexing suitability using chain-of-thought reasoning. |
|
|
|
Used in a pipeline. |
|
|
|
|
|
## Technical Specifications |
|
|
|
### Base Model |
|
- Model: unsloth/gemma-2b |
|
- LoRA Rank: 64 |
|
- Target Modules: q_proj, up_proj, down_proj, gate_proj, o_proj, k_proj, v_proj |
|
- Task: CAUSAL_LM |
|
- Dropout: 0 |
|
- Alpha: 32 |
|
|
|
### Input/Output Format |
|
|
|
Input XML structure: |
|
```xml |
|
<instruction>Determine true or false if the following content is suitable and should be indexed.</instruction> |
|
<suitable> |
|
<content>{input_text}</content> |
|
``` |
|
|
|
Output XML structure: |
|
```xml |
|
<thinking>{reasoning_process}</thinking> |
|
<category>{content_type}</category> |
|
<should_index>{true|false}</should_index> |
|
</suitable> |
|
|
|
``` |
|
|
|
The model then expects an indefinite list of `<suitable> ... </suitable>` that you may not want. But you can use this to do fewshots with incontext learning to correct a mistake or enhance the results. |
|
|
|
Your stop token should be `</suitable>`. |
|
|
|
## Deployment |
|
|
|
### VLLM Server Setup |
|
```bash |
|
export VLLM_ALLOW_RUNTIME_LORA_UPDATING=1 |
|
export VLLM_ALLOW_LONG_MAX_MODEL_LEN=1 |
|
|
|
vllm serve unsloth/gemma-2-2b \ |
|
--gpu-memory-utilization=1 \ |
|
--port 6002 \ |
|
--served-model-name="gemma" \ |
|
--trust-remote-code \ |
|
--max-model-len 8192 \ |
|
--disable-log-requests \ |
|
--enable-lora \ |
|
--lora-modules lora=./dataset/output/unsloth/lora_model \ |
|
--max-lora-rank 64 |
|
``` |
|
|
|
### Processing Pipeline |
|
|
|
1. Install Dependencies: |
|
```bash |
|
pip install requests tqdm concurrent.futures |
|
``` |
|
|
|
2. Run Content Processor: |
|
```bash |
|
python process.py --input corpus.jsonl --output results.jsonl --threads 24 |
|
``` |
|
|
|
### Client Implementation |
|
|
|
```python |
|
import requests |
|
|
|
def classify_content(text: str, vllm_url: str = "http://localhost:6002/v1/completions") -> dict: |
|
xml_content = ( |
|
'<instruction>Determine true or false if the following content is ' |
|
'suitable and should be indexed.</instruction>\n' |
|
'<suitable>\n' |
|
f' <content>{text}</content>' |
|
) |
|
|
|
response = requests.post( |
|
vllm_url, |
|
json={ |
|
"prompt": xml_content, |
|
"max_tokens": 6000, |
|
"temperature": 1, |
|
"model": "lora", |
|
"stop": ["</suitable>"] |
|
}, |
|
timeout=30000 |
|
) |
|
|
|
completion = response.json()["choices"][0]["text"] |
|
|
|
# Parse XML tags |
|
import re |
|
def extract_tag(tag: str) -> str: |
|
match = re.search(f'<{tag}>(.*?)</{tag}>', completion, re.DOTALL) |
|
return match.group(1).strip() if match else "" |
|
|
|
return { |
|
"thinking": extract_tag("thinking"), |
|
"category": extract_tag("category"), |
|
"should_index": extract_tag("should_index") |
|
} |
|
``` |
|
|
|
### Example Usage |
|
|
|
```python |
|
text = """Multiservice Tactics, Techniques, and Procedures |
|
for |
|
Nuclear, Biological, and Chemical Aspects of Consequence |
|
Management |
|
|
|
TABLE OF CONTENTS...""" |
|
|
|
result = classify_content(text) |
|
print(result) |
|
``` |
|
|
|
Example output: |
|
```json |
|
{ |
|
"thinking": "This is a table of contents for a document, not the actual content.", |
|
"category": "table of contents", |
|
"should_index": "false" |
|
} |
|
``` |
|
|
|
## Batch Processing |
|
|
|
The included processor supports parallel processing of JSONL files: |
|
|
|
```python |
|
from request_processor import RequestProcessor |
|
|
|
processor = RequestProcessor( |
|
input_file="corpus.jsonl", |
|
output_file="results.jsonl", |
|
num_threads=24 |
|
) |
|
processor.process_file() |
|
``` |
|
|
|
Input JSONL format: |
|
```json |
|
{ |
|
"pid": "document_id", |
|
"docid": "path/to/source", |
|
"content": "document text", |
|
"metadata": { |
|
"key": "value" |
|
} |
|
} |
|
``` |
|
|
|
Output JSONL format: |
|
```json |
|
{ |
|
"pid": "document_id", |
|
"docid": "path/to/source", |
|
"content": "document text", |
|
"metadata": { |
|
"key": "value" |
|
}, |
|
"thinking": "reasoning process", |
|
"category": "content type", |
|
"should_index": "true/false", |
|
"processed_at": "2024-10-22 02:52:33" |
|
} |
|
``` |
|
|
|
## Implementation and Performance Considerations |
|
|
|
- Use thread pooling for parallel processing |
|
- Implement atomic writes with file locking |
|
- Progress tracking with tqdm |
|
- Automatic error handling and logging |
|
- Configurable thread count for optimization |
|
|
|
## Error Handling |
|
|
|
Errors are captured in the output JSONL: |
|
```json |
|
{ |
|
"error": "error message", |
|
"processed_at": "timestamp" |
|
} |
|
``` |
|
|
|
Monitor errors in real-time: |
|
```bash |
|
tail -f results.jsonl | grep error |
|
``` |