	---
	title: Phramer AI
	emoji: 🎬
	colorFrom: blue
	colorTo: purple
	sdk: gradio
	sdk_version: 5.33.2
	app_file: app.py
	pinned: false
	license: apache-2.0
	tags:
	- multimodal
	- image-to-prompt
	- flux
	- midjourney
	- generative-ai
	- computer-vision
	- cinematic
	- photography
	- bagel
	- pariente-ai
	---

	# Phramer AI
	By Pariente AI, for MIA TV Series

	Logline: Phramer AI is a multimodal tool that reads an image and turns it into a refined, photorealistic prompt, ready for Midjourney, Flux, or any generative engine.

	## Overview

	Phramer AI is an advanced multimodal system developed by Pariente AI for the MIA TV Series creative pipeline.

	Upload any image, and Phramer AI will:
	- Analyze it deeply using a custom Bagel architecture
	- Generate a detailed semantic-visual description
	- Enhance it using a curated photographic knowledge base
	- Output a structured prompt with camera settings, composition hints, mood, and style — ready for Flux or other diffusion-based platforms

	Whether you're creating cinematic storyboards, photorealistic scenes, or exploring visual concepts, Phramer AI bridges the gap between image understanding and generative prompting.

	## Key Features

	### 🔍 Deep Multimodal Analysis
	- Custom Bagel-7B architecture for advanced image understanding
	- Semantic-visual analysis with professional photography insights
	- Context-aware scene detection and composition analysis

	### 🎯 Multi-Engine Optimization
	- Flux-ready prompts with technical specifications
	- Midjourney compatibility with style and mood descriptors
	- Universal format compatible with major generative engines

	### 📸 Professional Photography Knowledge
	- Curated database of camera settings and equipment
	- Lighting techniques and composition principles
	- Technical parameters optimized for photorealistic output

	### 🎬 Cinematic Focus
	- Designed for TV series and film production workflows
	- Storyboard and concept art optimization
	- Dramatic lighting and mood analysis

	## How It Works

	1. Image Upload - Supports JPG, PNG, and WebP images up to 1024x1024 pixels
	2. Bagel Analysis - Custom architecture analyzes visual content and composition
	3. Knowledge Enhancement - Professional photography database enriches the analysis
	4. Prompt Generation - Structured output with technical details and artistic direction
	5. Multi-Engine Ready - Copy and use in Flux, Midjourney, or any diffusion platform
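
Step 1 can be sketched in a few lines of standard-library Python. This is a minimal illustration, not the app's actual preprocessing: the names `is_supported` and `fit_within` are hypothetical, and the README does not say whether larger images are rejected or downsized, so downsizing with a preserved aspect ratio is assumed here.

```python
from pathlib import Path
from typing import Tuple

# Formats listed in step 1 of the README.
SUPPORTED_EXTENSIONS = {".jpg", ".jpeg", ".png", ".webp"}
MAX_SIDE = 1024  # the 1024px limit from step 1


def is_supported(path: str) -> bool:
    """Return True if the file extension is one of the listed formats."""
    return Path(path).suffix.lower() in SUPPORTED_EXTENSIONS


def fit_within(width: int, height: int, max_side: int = MAX_SIDE) -> Tuple[int, int]:
    """Scale (width, height) so the longer side is at most max_side.

    Aspect ratio is preserved and small images are never upscaled.
    (Assumed behaviour; the README does not specify it.)
    """
    longest = max(width, height)
    if longest <= max_side:
        return width, height
    scale = max_side / longest
    return max(1, round(width * scale)), max(1, round(height * scale))


print(is_supported("frame_001.WebP"))
print(fit_within(2048, 1024))
```

The computed size could then be passed to whatever imaging library performs the actual resize.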

	## Technical Specifications

	### Architecture
	- Base Model: Custom Bagel-7B multimodal architecture
	- Vision Processing: Advanced semantic-visual understanding
	- Knowledge Integration: Professional photography database built on 30+ years of expertise
	- Output Optimization: Multi-engine compatibility layer

	### Processing Pipeline
	- Image Preprocessing: Automatic optimization and format conversion
	- Multimodal Analysis: Deep scene understanding with technical assessment
	- Professional Enhancement: Camera, lighting, and composition recommendations
	- Prompt Structuring: Organized output with technical and artistic elements

	### Supported Platforms
	- Flux - Primary optimization target with technical specifications
	- Midjourney - Style and mood descriptors
	- Stable Diffusion - Technical parameter integration
	- Other Engines - Universal prompt format compatibility
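
One way to picture the per-platform emphasis listed above is a lookup that decides which parts of the analysis each engine receives. This is a hedged sketch only: `ENGINE_FIELDS`, `DEFAULT_FIELDS`, `format_for_engine`, and the field names are all hypothetical and do not describe Phramer AI's real compatibility layer.

```python
from typing import Dict, List

# Hypothetical mapping that mirrors the emphasis in the list above:
# Flux and Stable Diffusion get technical fields, Midjourney gets style and mood.
ENGINE_FIELDS: Dict[str, List[str]] = {
    "flux": ["description", "camera", "lighting"],
    "midjourney": ["description", "style", "mood"],
    "stable-diffusion": ["description", "camera", "lighting"],
}
# Any other engine gets every field (the "universal" format).
DEFAULT_FIELDS = ["description", "style", "mood", "camera", "lighting"]


def format_for_engine(fields: Dict[str, str], engine: str) -> str:
    """Join the fields relevant to `engine`, skipping missing or empty ones."""
    wanted = ENGINE_FIELDS.get(engine.lower(), DEFAULT_FIELDS)
    return ", ".join(fields[name] for name in wanted if fields.get(name))


analysis = {
    "description": "a cinematic portrait",
    "style": "film noir",
    "mood": "tense",
    "camera": "85mm lens at f/2.8",
    "lighting": "dramatic side lighting",
}
print(format_for_engine(analysis, "midjourney"))
print(format_for_engine(analysis, "flux"))
```

Keeping the mapping in data rather than in branches means a new engine only needs a new dictionary entry.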

	## Use Cases

	### 🎬 Film & TV Production
	- Storyboard creation and visualization
	- Concept art development
	- Scene planning and mood reference
	- Visual consistency across episodes

	### 📸 Photography Reference
	- Lighting setup recreation
	- Camera configuration guidance
	- Composition analysis and improvement
	- Technical parameter optimization

	### 🎨 Creative Development
	- Visual concept exploration
	- Style reference generation
	- Mood and atmosphere studies
	- Character and environment design

	### 💼 Commercial Applications
	- Product visualization
	- Marketing material creation
	- Brand consistency maintenance
	- Commercial photography planning

	## Example Workflow

	```
	Input: Portrait photograph of a person in dramatic lighting

	Phramer AI Analysis:
	├── Scene Detection: Studio portrait with dramatic side lighting
	├── Technical Analysis: Professional setup with controlled lighting
	├── Camera Recommendation: Canon EOS R5 with 85mm f/1.4 lens
	└── Enhancement: Cinematic mood with film-quality specifications

	Output Prompt:
	"A cinematic portrait of [subject description], shot on Canon EOS R5
	with 85mm f/1.4 lens at f/2.8, dramatic side lighting with subtle rim
	light, professional studio setup, film grain, photorealistic,
	ultra-detailed, commercial photography style"
	```
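
The output prompt above can be read as a set of analysis fields joined in a fixed order. The sketch below reproduces that string from structured data; the `Analysis` dataclass and `build_prompt` are illustrative names chosen for this example, not part of the Phramer AI codebase.

```python
from dataclasses import dataclass, field
from typing import List


@dataclass
class Analysis:
    """Hypothetical container for the analysis results shown above."""
    subject: str
    camera: str
    lens: str
    aperture: str
    lighting: List[str]
    style_tags: List[str] = field(default_factory=list)
    shot_type: str = "cinematic portrait"


def build_prompt(a: Analysis) -> str:
    """Assemble subject, camera, lighting, and style into one prompt string."""
    parts = [
        f"A {a.shot_type} of {a.subject}",
        f"shot on {a.camera} with {a.lens} lens at {a.aperture}",
        *a.lighting,
        *a.style_tags,
    ]
    return ", ".join(parts)


example = Analysis(
    subject="[subject description]",
    camera="Canon EOS R5",
    lens="85mm f/1.4",
    aperture="f/2.8",
    lighting=["dramatic side lighting with subtle rim light",
              "professional studio setup"],
    style_tags=["film grain", "photorealistic", "ultra-detailed",
                "commercial photography style"],
)
print(build_prompt(example))
```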

	## Quality Scoring

	Phramer AI evaluates generated prompts across multiple dimensions:

	- Prompt Quality (25%) - Content detail and description accuracy
	- Technical Details (25%) - Camera settings and equipment specifications
	- Professional Photography (25%) - Lighting, composition, and technical expertise
	- Multi-Engine Optimization (25%) - Compatibility and enhancement features

	Scores range from 0 to 100, with grades from POOR to LEGENDARY.
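
With four equally weighted dimensions, the overall score is a weighted average. The sketch below shows the arithmetic under stated assumptions: only POOR and LEGENDARY are named in this README, so the intermediate grade names and every cut-off in `GRADE_BANDS` are hypothetical, as are the function and key names.

```python
from typing import Dict

# Weights taken from the list above (25% each).
WEIGHTS: Dict[str, float] = {
    "prompt_quality": 0.25,
    "technical_details": 0.25,
    "professional_photography": 0.25,
    "multi_engine": 0.25,
}

# Hypothetical bands: only POOR and LEGENDARY appear in the README.
GRADE_BANDS = [(90, "LEGENDARY"), (75, "EXCELLENT"), (60, "GOOD"),
               (40, "FAIR"), (0, "POOR")]


def overall_score(scores: Dict[str, float]) -> float:
    """Weighted sum of the four dimension scores, each clamped to 0-100."""
    missing = set(WEIGHTS) - set(scores)
    if missing:
        raise ValueError(f"missing dimensions: {sorted(missing)}")
    return sum(w * min(100.0, max(0.0, scores[k])) for k, w in WEIGHTS.items())


def grade(score: float) -> str:
    """Map a 0-100 score to the first band whose cut-off it reaches."""
    for cutoff, label in GRADE_BANDS:
        if score >= cutoff:
            return label
    return "POOR"


total = overall_score({
    "prompt_quality": 80,
    "technical_details": 90,
    "professional_photography": 70,
    "multi_engine": 100,
})
print(total, grade(total))
```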

	## Installation & Usage

	### Requirements
	- Python 3.8+
	- CUDA-compatible GPU (recommended)
	- 8GB+ RAM
	- Internet connection for model access

	### Local Setup
	```bash
	git clone [repository-url]
	cd phramer-ai
	pip install -r requirements.txt
	python app.py
	```

	### Cloud Usage
	Available on Hugging Face Spaces with instant access - no installation required.

	## API Integration

	Phramer AI provides a simple API for integration into existing workflows:

	```python
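	from phramer import PhramerAI

	# Create the analyzer, then run it on a single image
	phramer = PhramerAI()
	# analyze_image returns the generated prompt and its analysis metadata
	prompt, metadata = phramer.analyze_image("path/to/image.jpg")
	print(f"Generated prompt: {prompt}")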
	```

	## Performance

	- Average Processing Time: 2-4 seconds per image
	- Supported Image Size: Up to 1024x1024 pixels
	- Batch Processing: Multiple images with queue management
	- Memory Optimization: Automatic cleanup and resource management
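
Batch processing with queue management could look roughly like the following. This is a sketch under assumptions, not the app's implementation: `process_batch` is a hypothetical helper, and the `analyze` callable stands in for whatever function actually produces a prompt for one image.

```python
import queue
from typing import Callable, Dict, Iterable


def process_batch(paths: Iterable[str],
                  analyze: Callable[[str], str]) -> Dict[str, str]:
    """Run `analyze` on each queued path in order.

    A failure on one image is recorded and does not stop the batch.
    """
    pending: "queue.Queue[str]" = queue.Queue()
    for path in paths:
        pending.put(path)

    results: Dict[str, str] = {}
    while not pending.empty():
        path = pending.get()
        try:
            results[path] = analyze(path)
        except Exception as exc:  # keep going; store the error for this image
            results[path] = f"error: {exc}"
        finally:
            pending.task_done()
    return results


def fake_analyze(path: str) -> str:
    """Stand-in analyzer used only to demonstrate the queue."""
    if path.endswith(".gif"):
        raise ValueError("unsupported format")
    return f"prompt for {path}"


print(process_batch(["a.jpg", "b.gif", "c.png"], fake_analyze))
```

In practice the stand-in would be replaced by a wrapper around the real analysis call.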

	## Roadmap

	### Version 2.1 (Coming Soon)
	- Video frame analysis
	- Batch processing improvements
	- Additional engine-specific optimizations
	- Enhanced cinematic analysis

	### Version 2.2 (Planned)
	- Style transfer integration
	- Custom knowledge base training
	- API rate limiting and authentication
	- Advanced composition analysis

	## Technical Details

	### Model Architecture
	- Bagel-7B Base: Advanced vision-language model
	- Custom Training: Optimized for prompt generation
	- Knowledge Integration: Professional photography database
	- Multi-Modal Processing: Image + text understanding

	### Optimization Features
	- Memory Efficient: Automatic resource management
	- GPU Acceleration: CUDA optimization when available
	- Batch Processing: Multiple image support
	- Error Handling: Robust fallback systems
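
GPU acceleration "when available" and fallback-based error handling follow a common pattern, sketched below. The README does not name the deep-learning framework, so PyTorch is an assumption here, and `pick_device` and `with_fallback` are hypothetical helper names.

```python
from typing import Callable, TypeVar

T = TypeVar("T")


def pick_device() -> str:
    """Return "cuda" when PyTorch reports a usable GPU, else "cpu".

    Assumes PyTorch; if it is not installed, the CPU is used.
    """
    try:
        import torch
    except ImportError:
        return "cpu"
    return "cuda" if torch.cuda.is_available() else "cpu"


def with_fallback(primary: Callable[[], T], fallback: Callable[[], T]) -> T:
    """Try the primary path; on any error, return the fallback's result."""
    try:
        return primary()
    except Exception:
        return fallback()


def failing_analysis() -> str:
    """Stand-in for a model call that runs out of memory or times out."""
    raise RuntimeError("model unavailable")


print(pick_device())
print(with_fallback(failing_analysis, lambda: "generic descriptive prompt"))
```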

	## Contributing

	We welcome contributions to improve Phramer AI:

	1. Fork the repository
	2. Create a feature branch
	3. Follow coding standards and include tests
	4. Submit a pull request with a detailed description

	## License

	Apache 2.0 - See LICENSE file for details.

	## Support

	For technical support, feature requests, or collaboration inquiries:

	- Technical Issues: Create an issue in the repository
	- Feature Requests: Submit detailed proposals
	- Commercial Licensing: Contact Pariente AI
	- MIA TV Series Integration: Production team coordination

	## Credits

	Phramer AI is developed by Pariente AI specifically for the MIA TV Series production pipeline.

	### Core Technologies
	- Bagel-7B multimodal architecture
	- Professional photography knowledge base
	- Advanced prompt optimization algorithms
	- Multi-engine compatibility layer

	### Research & Development
	- Pariente AI - Advanced multimodal AI research
	- MIA TV Series - Creative pipeline integration
	- Professional Photography Consultants - Knowledge database built on 30+ years of expertise
	- Community Contributors - Feature improvements and testing

	---

	Pariente AI • Advanced Multimodal AI Research & Development • MIA TV Series

	Bridging the gap between image understanding and generative prompting