Malaji71 committed on
Commit 2f71b86 · verified · 1 Parent(s): 21abf5c

Update README.md

Files changed (1)
  1. README.md +223 -34
README.md CHANGED
@@ -1,6 +1,6 @@
1
  ---
2
- title: Frame 0 Laboratory for MIA
3
- emoji: 🔬
4
  colorFrom: blue
5
  colorTo: purple
6
  sdk: gradio
@@ -9,57 +9,246 @@ app_file: app.py
9
  pinned: false
10
  license: apache-2.0
11
  tags:
12
- - image-analysis
 
13
  - flux
14
- - prompt-optimization
 
15
  - computer-vision
16
- - ai
17
  ---
18
 
19
- # Frame 0 Laboratory for MIA
20
 
21
- Advanced image analysis and FLUX prompt optimization tool powered by state-of-the-art vision-language models.
22
 
23
- ## Features
24
 
25
- - **Advanced Image Analysis**: Uses Florence-2 model for detailed image understanding
26
- - **FLUX Optimization**: Applies proven rules for optimal FLUX image generation
27
- - **Professional Prompts**: Adds camera settings, lighting, and technical parameters
28
- - **Quality Scoring**: Multi-dimensional evaluation of prompt quality
29
- - **Clean Interface**: Simple, professional Gradio interface
 
30
 
31
- ## How it Works
32
 
33
- 1. **Upload Image**: Support for JPG, PNG, WebP formats up to 1024px
34
- 2. **AI Analysis**: Florence-2 model analyzes content, composition, and style
35
- 3. **FLUX Optimization**: Applies camera configurations and lighting setups
36
- 4. **Quality Score**: Evaluates prompt across multiple quality dimensions
37
 
38
  ## Technical Details
39
 
40
- - **Model**: Microsoft Florence-2 for vision-language understanding
41
- - **Processing**: CPU/GPU adaptive with memory optimization
42
- - **Interface**: Gradio 4.44.0 with responsive design
43
- - **Optimization**: FLUX-specific rules and enhancements
 
44
 
45
- ## Usage
46
 
47
- Simply upload an image and click "Analyze Image" to get:
48
- - Optimized FLUX prompt ready for image generation
49
- - Detailed analysis report with technical specifications
50
- - Quality score with breakdown across different dimensions
51
 
52
- ## Architecture
53
 
54
- - **Modular Design**: Clean separation of concerns
55
- - **Error Handling**: Robust error management at every step
56
- - **Memory Efficient**: Automatic cleanup and optimization
57
- - **Scalable**: Ready for batch processing and API integration
58
 
59
  ## License
60
 
61
- Apache 2.0 - See LICENSE file for details
62
 
63
  ---
64
 
65
- *Frame 0 Laboratory for MIA - Advanced AI Research & Development*
 
 
 
1
  ---
2
+ title: Phramer AI
3
+ emoji: 🎬
4
  colorFrom: blue
5
  colorTo: purple
6
  sdk: gradio
 
9
  pinned: false
10
  license: apache-2.0
11
  tags:
12
+ - multimodal
13
+ - image-to-prompt
14
  - flux
15
+ - midjourney
16
+ - generative-ai
17
  - computer-vision
18
+ - cinematic
19
+ - photography
20
+ - bagel
21
+ - pariente-ai
22
  ---
23
 
24
+ # Phramer AI
25
+ *By Pariente AI, for MIA TV Series*
26
+
27
+ **Logline:** Phramer AI is a multimodal tool that reads an image and turns it into a refined, photorealistic prompt, ready for Midjourney, Flux, or any generative engine.
28
 
29
+ ## Overview
30
+
31
+ **Phramer AI** is an advanced multimodal system developed by **Pariente AI** for the **MIA TV Series** creative pipeline.
32
+
33
+ Upload any image, and Phramer AI will:
34
+ - **Analyze it deeply** using a custom Bagel architecture
35
+ - **Generate a detailed semantic-visual description**
36
+ - **Enhance it** using a curated photographic knowledge base
37
+ - **Output a structured prompt** with camera settings, composition hints, mood, and style, ready for **Flux** or other diffusion-based platforms
38
 
39
+ Whether you're creating cinematic storyboards, photorealistic scenes, or exploring visual concepts, Phramer AI bridges the gap between image understanding and generative prompting.
40
 
41
+ ## Key Features
42
+
43
+ ### 🔍 **Deep Multimodal Analysis**
44
+ - Custom Bagel-7B architecture for advanced image understanding
45
+ - Semantic-visual analysis with professional photography insights
46
+ - Context-aware scene detection and composition analysis
47
 
48
+ ### 🎯 **Multi-Engine Optimization**
49
+ - **Flux-ready prompts** with technical specifications
50
+ - **Midjourney compatibility** with style and mood descriptors
51
+ - **Universal format** compatible with major generative engines
52
 
53
+ ### 📸 **Professional Photography Knowledge**
54
+ - Curated database of camera settings and equipment
55
+ - Lighting techniques and composition principles
56
+ - Technical parameters optimized for photorealistic output
57
+
58
+ ### 🎬 **Cinematic Focus**
59
+ - Designed for TV series and film production workflows
60
+ - Storyboard and concept art optimization
61
+ - Dramatic lighting and mood analysis
62
+
63
+ ## How It Works
64
+
65
+ 1. **Image Upload** - Support for JPG, PNG, WebP formats up to 1024px
66
+ 2. **Bagel Analysis** - Custom architecture analyzes visual content and composition
67
+ 3. **Knowledge Enhancement** - Professional photography database enriches the analysis
68
+ 4. **Prompt Generation** - Structured output with technical details and artistic direction
69
+ 5. **Multi-Engine Ready** - Copy and use in Flux, Midjourney, or any diffusion platform
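Step 1 above caps uploads at 1024px. A minimal sketch of how a target size could be computed before analysis, assuming the limit applies to the longer side and that the aspect ratio is preserved. The function name `fit_within` is hypothetical and not part of Phramer AI:

```python
from typing import Tuple


def fit_within(width: int, height: int, max_side: int = 1024) -> Tuple[int, int]:
    """Return (width, height) scaled down so the longer side is at most max_side."""
    if width <= 0 or height <= 0:
        raise ValueError("width and height must be positive")
    longest = max(width, height)
    if longest <= max_side:
        # Already within the limit; this sketch never upscales.
        return width, height
    scale = max_side / longest
    # Round to whole pixels and keep each side at least 1 pixel.
    return max(1, round(width * scale)), max(1, round(height * scale))
```

The returned size could then be passed to whatever resize routine the image library in use provides.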
70
+
71
+ ## Technical Specifications
72
+
73
+ ### Architecture
74
+ - **Base Model**: Custom Bagel-7B multimodal architecture
75
+ - **Vision Processing**: Advanced semantic-visual understanding
76
+ - **Knowledge Integration**: Professional photography database built on 30+ years of expertise
77
+ - **Output Optimization**: Multi-engine compatibility layer
78
+
79
+ ### Processing Pipeline
80
+ - **Image Preprocessing**: Automatic optimization and format conversion
81
+ - **Multimodal Analysis**: Deep scene understanding with technical assessment
82
+ - **Professional Enhancement**: Camera, lighting, and composition recommendations
83
+ - **Prompt Structuring**: Organized output with technical and artistic elements
84
+
85
+ ### Supported Platforms
86
+ - **Flux** - Primary optimization target with technical specifications
87
+ - **Midjourney** - Style and mood descriptors
88
+ - **Stable Diffusion** - Technical parameter integration
89
+ - **Other Engines** - Universal prompt format compatibility
90
+
91
+ ## Use Cases
92
+
93
+ ### 🎬 **Film & TV Production**
94
+ - Storyboard creation and visualization
95
+ - Concept art development
96
+ - Scene planning and mood reference
97
+ - Visual consistency across episodes
98
+
99
+ ### 📸 **Photography Reference**
100
+ - Lighting setup recreation
101
+ - Camera configuration guidance
102
+ - Composition analysis and improvement
103
+ - Technical parameter optimization
104
+
105
+ ### 🎨 **Creative Development**
106
+ - Visual concept exploration
107
+ - Style reference generation
108
+ - Mood and atmosphere studies
109
+ - Character and environment design
110
+
111
+ ### 💼 **Commercial Applications**
112
+ - Product visualization
113
+ - Marketing material creation
114
+ - Brand consistency maintenance
115
+ - Commercial photography planning
116
+
117
+ ## Example Workflow
118
+
119
+ ```
120
+ Input: Portrait photograph of a person in dramatic lighting
121
+
122
+ Phramer AI Analysis:
123
+ ├── Scene Detection: Studio portrait with dramatic side lighting
124
+ ├── Technical Analysis: Professional setup with controlled lighting
125
+ ├── Camera Recommendation: Canon EOS R5 with 85mm f/1.4 lens
126
+ └── Enhancement: Cinematic mood with film-quality specifications
127
+
128
+ Output Prompt:
129
+ "A cinematic portrait of [subject description], shot on Canon EOS R5
130
+ with 85mm f/1.4 lens at f/2.8, dramatic side lighting with subtle rim
131
+ light, professional studio setup, film grain, photorealistic,
132
+ ultra-detailed, commercial photography style"
133
+ ```
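The output prompt in the workflow above reads as a comma-separated list of subject, camera, lighting, and style terms. A rough sketch of how such a string could be assembled from structured fields. The `PromptParts` dataclass, its field names, and the sample subject are illustrative assumptions, not Phramer AI's actual data model:

```python
from dataclasses import dataclass, field
from typing import List


@dataclass
class PromptParts:
    """Hypothetical container for the pieces of a generated prompt."""
    subject: str
    camera: str = ""
    lighting: str = ""
    style_tags: List[str] = field(default_factory=list)


def build_prompt(parts: PromptParts) -> str:
    """Join the non-empty pieces into one comma-separated prompt string."""
    pieces = [parts.subject]
    if parts.camera:
        pieces.append(f"shot on {parts.camera}")
    if parts.lighting:
        pieces.append(parts.lighting)
    # Skip empty tags so the result has no doubled commas.
    pieces.extend(tag for tag in parts.style_tags if tag)
    return ", ".join(pieces)


example = PromptParts(
    subject="A cinematic portrait of a violinist",
    camera="Canon EOS R5 with 85mm f/1.4 lens at f/2.8",
    lighting="dramatic side lighting with subtle rim light",
    style_tags=["film grain", "photorealistic"],
)
print(build_prompt(example))
```

Keeping the pieces separate until the final join makes it easy to drop or reorder terms for a particular engine.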
134
+
135
+ ## Quality Scoring
136
+
137
+ Phramer AI evaluates generated prompts across multiple dimensions:
138
+
139
+ - **Prompt Quality** (25%) - Content detail and description accuracy
140
+ - **Technical Details** (25%) - Camera settings and equipment specifications
141
+ - **Professional Photography** (25%) - Lighting, composition, and technical expertise
142
+ - **Multi-Engine Optimization** (25%) - Compatibility and enhancement features
143
+
144
+ Scores range from 0 to 100, with grades from POOR to LEGENDARY.
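With four dimensions weighted at 25% each, the overall score is a plain weighted sum. A minimal sketch, assuming each dimension is itself scored from 0 to 100. The dictionary keys and the `overall_score` function are hypothetical names, and the grade cut-offs are not stated in this README, so no grade mapping is shown:

```python
from typing import Dict

# Weights taken from the list above; each dimension counts for 25%.
# The key names are assumptions made for this sketch.
WEIGHTS: Dict[str, float] = {
    "prompt_quality": 0.25,
    "technical_details": 0.25,
    "professional_photography": 0.25,
    "multi_engine_optimization": 0.25,
}


def overall_score(dimension_scores: Dict[str, float]) -> float:
    """Combine per-dimension scores (each 0-100) into a weighted 0-100 total."""
    total = 0.0
    for name, weight in WEIGHTS.items():
        score = dimension_scores[name]
        if not 0 <= score <= 100:
            raise ValueError(f"{name} must be between 0 and 100, got {score}")
        total += weight * score
    return total
```

For example, dimension scores of 100, 50, 50, and 0 combine to an overall score of 50.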
145
+
146
+ ## Installation & Usage
147
+
148
+ ### Requirements
149
+ - Python 3.8+
150
+ - CUDA-compatible GPU (recommended)
151
+ - 8GB+ RAM
152
+ - Internet connection for model access
153
+
154
+ ### Local Setup
155
+ ```bash
156
+ git clone [repository-url]
157
+ cd phramer-ai
158
+ pip install -r requirements.txt
159
+ python app.py
160
+ ```
161
+
162
+ ### Cloud Usage
163
+ Available on Hugging Face Spaces with instant access - no installation required.
164
+
165
+ ## API Integration
166
+
167
+ Phramer AI provides a simple API for integration into existing workflows:
168
+
169
+ ```python
170
+ from phramer import PhramerAI
171
+
172
+ phramer = PhramerAI()
173
+ prompt, metadata = phramer.analyze_image("path/to/image.jpg")
174
+ print(f"Generated prompt: {prompt}")
175
+ ```
176
+
177
+ ## Performance
178
+
179
+ - **Average Processing Time**: 2-4 seconds per image
180
+ - **Supported Image Size**: Up to 1024x1024 pixels
181
+ - **Batch Processing**: Multiple images with queue management
182
+ - **Memory Optimization**: Automatic cleanup and resource management
183
+
184
+ ## Roadmap
185
+
186
+ ### Version 2.1 (Coming Soon)
187
+ - Video frame analysis
188
+ - Batch processing improvements
189
+ - Additional engine-specific optimizations
190
+ - Enhanced cinematic analysis
191
+
192
+ ### Version 2.2 (Planned)
193
+ - Style transfer integration
194
+ - Custom knowledge base training
195
+ - API rate limiting and authentication
196
+ - Advanced composition analysis
197
 
198
  ## Technical Details
199
 
200
+ ### Model Architecture
201
+ - **Bagel-7B Base**: Advanced vision-language model
202
+ - **Custom Training**: Optimized for prompt generation
203
+ - **Knowledge Integration**: Professional photography database
204
+ - **Multi-Modal Processing**: Image + text understanding
205
 
206
+ ### Optimization Features
207
+ - **Memory Efficient**: Automatic resource management
208
+ - **GPU Acceleration**: CUDA optimization when available
209
+ - **Batch Processing**: Multiple image support
210
+ - **Error Handling**: Robust fallback systems
211
 
212
+ ## Contributing
213
 
214
+ We welcome contributions to improve Phramer AI:
215
 
216
+ 1. Fork the repository
217
+ 2. Create a feature branch
218
+ 3. Submit a pull request with a detailed description
219
+ 4. Follow coding standards and include tests
220
 
221
  ## License
222
 
223
+ Apache 2.0 - See LICENSE file for details.
224
+
225
+ ## Support
226
+
227
+ For technical support, feature requests, or collaboration inquiries:
228
+
229
+ - **Technical Issues**: Create an issue in the repository
230
+ - **Feature Requests**: Submit detailed proposals
231
+ - **Commercial Licensing**: Contact Pariente AI
232
+ - **MIA TV Series Integration**: Production team coordination
233
+
234
+ ## Credits
235
+
236
+ **Phramer AI** is developed by **Pariente AI** specifically for the **MIA TV Series** production pipeline.
237
+
238
+ ### Core Technologies
239
+ - Bagel-7B multimodal architecture
240
+ - Professional photography knowledge base
241
+ - Advanced prompt optimization algorithms
242
+ - Multi-engine compatibility layer
243
+
244
+ ### Research & Development
245
+ - **Pariente AI** - Advanced multimodal AI research
246
+ - **MIA TV Series** - Creative pipeline integration
247
+ - **Professional Photography Consultants** - Database built on 30+ years of expertise
248
+ - **Community Contributors** - Feature improvements and testing
249
 
250
  ---
251
 
252
+ **Pariente AI** • Advanced Multimodal AI Research & Development • **MIA TV Series**
253
+
254
+ *Bridging the gap between image understanding and generative prompting*