forcemultiplier
/

is-suitable-to-index-cot-xml-g2bb-adapter

Safetensors

Model card Files Files and versions Community

is-suitable-to-index-cot-xml-g2bb-adapter / README.md

fullstack

Upload folder using huggingface_hub

e0b1078 verified 5 months ago

preview code

raw

history blame

4.43 kB

	# Content Classification LoRA Adapter for Gemma-2B

	A LoRA adapter for unsloth/gemma-2b that determines content indexing suitability using chain-of-thought reasoning.

	Used in a pipeline.


	## Technical Specifications

	### Base Model
	- Model: unsloth/gemma-2b
	- LoRA Rank: 64
	- Target Modules: q_proj, up_proj, down_proj, gate_proj, o_proj, k_proj, v_proj
	- Task: CAUSAL_LM
	- Dropout: 0
	- Alpha: 32

	### Input/Output Format

	Input XML structure:
	```xml
	<instruction>Determine true or false if the following content is suitable and should be indexed.</instruction>
	<suitable>
	<content>{input_text}</content>
	```

	Output XML structure:
	```xml
	<thinking>{reasoning_process}</thinking>
	<category>{content_type}</category>
	<should_index>{true\|false}</should_index>
	</suitable>

	```

	The model then expects an indefinite list of `<suitable> ... </suitable>` that you may not want. But you can use this to do fewshots with incontext learning to correct a mistake or enhance the results.

	Your stop token should be `</suitable>`.

	## Deployment

	### VLLM Server Setup
	```bash
	export VLLM_ALLOW_RUNTIME_LORA_UPDATING=1
	export VLLM_ALLOW_LONG_MAX_MODEL_LEN=1

	vllm serve unsloth/gemma-2-2b \
	--gpu-memory-utilization=1 \
	--port 6002 \
	--served-model-name="gemma" \
	--trust-remote-code \
	--max-model-len 8192 \
	--disable-log-requests \
	--enable-lora \
	--lora-modules lora=./dataset/output/unsloth/lora_model \
	--max-lora-rank 64
	```

	### Processing Pipeline

	1. Install Dependencies:
	```bash
	pip install requests tqdm concurrent.futures
	```

	2. Run Content Processor:
	```bash
	python process.py --input corpus.jsonl --output results.jsonl --threads 24
	```

	### Client Implementation

	```python
	import requests

	def classify_content(text: str, vllm_url: str = "http://localhost:6002/v1/completions") -> dict:
	xml_content = (
	'<instruction>Determine true or false if the following content is '
	'suitable and should be indexed.</instruction>\n'
	'<suitable>\n'
	f' <content>{text}</content>'
	)

	response = requests.post(
	vllm_url,
	json={
	"prompt": xml_content,
	"max_tokens": 6000,
	"temperature": 1,
	"model": "lora",
	"stop": ["</suitable>"]
	},
	timeout=30000
	)

	completion = response.json()["choices"][0]["text"]

	# Parse XML tags
	import re
	def extract_tag(tag: str) -> str:
	match = re.search(f'<{tag}>(.*?)</{tag}>', completion, re.DOTALL)
	return match.group(1).strip() if match else ""

	return {
	"thinking": extract_tag("thinking"),
	"category": extract_tag("category"),
	"should_index": extract_tag("should_index")
	}
	```

	### Example Usage

	```python
	text = """Multiservice Tactics, Techniques, and Procedures
	for
	Nuclear, Biological, and Chemical Aspects of Consequence
	Management

	TABLE OF CONTENTS..."""

	result = classify_content(text)
	print(result)
	```

	Example output:
	```json
	{
	"thinking": "This is a table of contents for a document, not the actual content.",
	"category": "table of contents",
	"should_index": "false"
	}
	```

	## Batch Processing

	The included processor supports parallel processing of JSONL files:

	```python
	from request_processor import RequestProcessor

	processor = RequestProcessor(
	input_file="corpus.jsonl",
	output_file="results.jsonl",
	num_threads=24
	)
	processor.process_file()
	```

	Input JSONL format:
	```json
	{
	"pid": "document_id",
	"docid": "path/to/source",
	"content": "document text",
	"metadata": {
	"key": "value"
	}
	}
	```

	Output JSONL format:
	```json
	{
	"pid": "document_id",
	"docid": "path/to/source",
	"content": "document text",
	"metadata": {
	"key": "value"
	},
	"thinking": "reasoning process",
	"category": "content type",
	"should_index": "true/false",
	"processed_at": "2024-10-22 02:52:33"
	}
	```

	## Implementation and Performance Considerations

	- Use thread pooling for parallel processing
	- Implement atomic writes with file locking
	- Progress tracking with tqdm
	- Automatic error handling and logging
	- Configurable thread count for optimization

	## Error Handling

	Errors are captured in the output JSONL:
	```json
	{
	"error": "error message",
	"processed_at": "timestamp"
	}
	```

	Monitor errors in real-time:
	```bash
	tail -f results.jsonl \| grep error
	```