# AWS SageMaker Deployment Guide

This guide provides step-by-step instructions for deploying the Image Description application to AWS SageMaker.

## Prerequisites

- AWS account with SageMaker permissions
- AWS CLI installed and configured
- Docker installed on your local machine
- The source code from this repository
## Step 1: Create an Amazon ECR Repository

```bash
aws ecr create-repository --repository-name image-descriptor
```

Note the `repositoryUri` value returned by this command. You'll use it in the next step.
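
The repository URI follows a fixed pattern built from the account ID, region, and repository name. As a minimal sketch (the helper name `ecr_image_uri` is hypothetical, and the pattern shown assumes a standard commercial AWS region), the image URI used in the following steps can be assembled like this:

```python
def ecr_image_uri(account_id: str, region: str, repository: str, tag: str = "latest") -> str:
    """Build the fully qualified image URI for a private ECR repository."""
    registry = f"{account_id}.dkr.ecr.{region}.amazonaws.com"
    return f"{registry}/{repository}:{tag}"


if __name__ == "__main__":
    # Placeholder account ID and region; substitute your own values.
    print(ecr_image_uri("123456789012", "us-east-1", "image-descriptor"))
```

Generating the URI once and reusing it avoids typos when the same string is needed for `docker tag`, `docker push`, and `model.json`.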
## Step 2: Build and Push the Docker Image

In the commands below, replace `your-region` and `your-account-id` with your own values.

1. Log in to ECR:

```bash
aws ecr get-login-password --region your-region | docker login --username AWS --password-stdin your-account-id.dkr.ecr.your-region.amazonaws.com
```

2. Build the Docker image:

```bash
docker build -t image-descriptor .
```

3. Tag and push the image:

```bash
docker tag image-descriptor:latest your-account-id.dkr.ecr.your-region.amazonaws.com/image-descriptor:latest
docker push your-account-id.dkr.ecr.your-region.amazonaws.com/image-descriptor:latest
```
## Step 3: Create a SageMaker Model

1. Create a `model.json` file:

```json
{
  "ModelName": "QwenVLImageDescriptor",
  "PrimaryContainer": {
    "Image": "your-account-id.dkr.ecr.your-region.amazonaws.com/image-descriptor:latest",
    "Environment": {
      "PORT": "8080"
    }
  },
  "ExecutionRoleArn": "arn:aws:iam::your-account-id:role/service-role/AmazonSageMaker-ExecutionRole"
}
```

2. Create the SageMaker model:

```bash
aws sagemaker create-model --cli-input-json file://model.json
```
## Step 4: Create an Endpoint Configuration

1. Create a `config.json` file:

```json
{
  "EndpointConfigName": "QwenVLImageDescriptorConfig",
  "ProductionVariants": [
    {
      "VariantName": "AllTraffic",
      "ModelName": "QwenVLImageDescriptor",
      "InstanceType": "ml.g5.2xlarge",
      "InitialInstanceCount": 1
    }
  ]
}
```

2. Create the endpoint configuration:

```bash
aws sagemaker create-endpoint-config --cli-input-json file://config.json
```
## Step 5: Create the Endpoint

```bash
aws sagemaker create-endpoint --endpoint-name qwen-vl-image-descriptor --endpoint-config-name QwenVLImageDescriptorConfig
```

Deployment takes several minutes. The endpoint is ready to serve requests once its status is `InService`.
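
Because deployment is asynchronous, a script usually has to wait before invoking the endpoint. The following is a minimal polling sketch, not the only way to do this; the helper name `wait_for_endpoint` and its default intervals are assumptions. It relies on the `EndpointStatus` and `FailureReason` fields returned by `describe_endpoint`.

```python
import time


def wait_for_endpoint(client, endpoint_name, poll_seconds=30, timeout_seconds=1800):
    """Poll describe_endpoint until the endpoint is InService, fails, or times out.

    `client` is expected to be boto3.client("sagemaker"); any object with a
    compatible describe_endpoint method also works, which keeps this testable.
    """
    deadline = time.monotonic() + timeout_seconds
    while True:
        description = client.describe_endpoint(EndpointName=endpoint_name)
        status = description["EndpointStatus"]
        if status == "InService":
            return status
        if status == "Failed":
            reason = description.get("FailureReason", "no reason reported")
            raise RuntimeError(f"Endpoint {endpoint_name} failed: {reason}")
        if time.monotonic() >= deadline:
            raise TimeoutError(
                f"Endpoint {endpoint_name} still {status} after {timeout_seconds}s"
            )
        time.sleep(poll_seconds)
```

Typical usage would be `wait_for_endpoint(boto3.client("sagemaker"), "qwen-vl-image-descriptor")`. boto3 also ships a built-in waiter, `client.get_waiter("endpoint_in_service")`, and the CLI offers `aws sagemaker wait endpoint-in-service`; the explicit loop is shown only to make the status transitions visible.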
## Step 6: Invoke the Endpoint

You can invoke the endpoint using the AWS SDK or AWS CLI.

Using the Python SDK (boto3):

```python
import base64
import json

import boto3

# Initialize the SageMaker runtime client
runtime = boto3.client('sagemaker-runtime')

# Load and encode the image
with open('data_temp/page_2.png', 'rb') as f:
    image_data = f.read()
image_b64 = base64.b64encode(image_data).decode('utf-8')

# Create the request payload
payload = {
    'image_data': image_b64
}

# Invoke the endpoint
response = runtime.invoke_endpoint(
    EndpointName='qwen-vl-image-descriptor',
    ContentType='application/json',
    Body=json.dumps(payload)
)

# Parse the response
result = json.loads(response['Body'].read().decode())
print(json.dumps(result, indent=2))
```
## Step 7: Set Up API Gateway (Optional)

For public HTTP access, set up an API Gateway:

1. Create a new REST API in API Gateway
2. Create a new resource and POST method
3. Configure the integration to use the SageMaker endpoint
4. Deploy the API to a stage
5. Note the API Gateway URL for client use
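
The steps above describe integrating API Gateway with the endpoint directly. A common alternative, shown here only as a sketch, is to place a small Lambda function between API Gateway (proxy integration) and SageMaker. The factory name `make_handler` is hypothetical, and the code assumes the client sends the same JSON body used in Step 6.

```python
import base64
import json

JSON_HEADERS = {"Content-Type": "application/json"}


def make_handler(runtime, endpoint_name):
    """Return a Lambda handler that forwards the request body to the endpoint.

    `runtime` is expected to be boto3.client("sagemaker-runtime").
    """

    def handler(event, context):
        body = event.get("body") or ""
        # API Gateway proxy events may deliver the body base64-encoded.
        if event.get("isBase64Encoded"):
            body = base64.b64decode(body).decode("utf-8")
        try:
            json.loads(body)  # reject malformed requests before calling SageMaker
        except ValueError:
            return {
                "statusCode": 400,
                "headers": JSON_HEADERS,
                "body": json.dumps({"error": "Request body must be valid JSON"}),
            }
        response = runtime.invoke_endpoint(
            EndpointName=endpoint_name,
            ContentType="application/json",
            Body=body.encode("utf-8"),
        )
        return {
            "statusCode": 200,
            "headers": JSON_HEADERS,
            "body": response["Body"].read().decode("utf-8"),
        }

    return handler
```

In a Lambda module this would be wired up as `handler = make_handler(boto3.client("sagemaker-runtime"), "qwen-vl-image-descriptor")`. Passing the client in, rather than creating it inside the handler, lets the function be exercised locally with a stub.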
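## Cost Optimization

To optimize costs:

1. Use SageMaker Serverless Inference instead of a dedicated endpoint. Serverless Inference does not offer GPU instances, so this applies only if the model runs acceptably on CPU.
2. Implement auto-scaling for your endpoint
3. Use Spot Instances for non-critical workloads. Note that SageMaker's Managed Spot option covers training jobs, not real-time endpoints, so it does not apply to the endpoint created in this guide.
4. Schedule endpoints to be active only during business hours, for example by deleting the endpoint outside those hours and recreating it from the saved endpoint configuration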
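
Auto-scaling for an endpoint variant is configured through Application Auto Scaling. The sketch below registers the variant as a scalable target and attaches a target-tracking policy; the function name, capacity limits, target value, and cooldowns are illustrative assumptions, not tuned recommendations.

```python
def configure_autoscaling(autoscaling, endpoint_name, variant_name="AllTraffic",
                          min_instances=1, max_instances=2,
                          invocations_per_instance=10.0):
    """Register the variant with Application Auto Scaling and add a policy.

    `autoscaling` is expected to be boto3.client("application-autoscaling").
    """
    resource_id = f"endpoint/{endpoint_name}/variant/{variant_name}"
    dimension = "sagemaker:variant:DesiredInstanceCount"

    # Declare the instance-count range the variant may scale within.
    autoscaling.register_scalable_target(
        ServiceNamespace="sagemaker",
        ResourceId=resource_id,
        ScalableDimension=dimension,
        MinCapacity=min_instances,
        MaxCapacity=max_instances,
    )
    # Track invocations per instance; the target value here is an assumption.
    autoscaling.put_scaling_policy(
        PolicyName=f"{endpoint_name}-invocations-target",
        ServiceNamespace="sagemaker",
        ResourceId=resource_id,
        ScalableDimension=dimension,
        PolicyType="TargetTrackingScaling",
        TargetTrackingScalingPolicyConfiguration={
            "TargetValue": invocations_per_instance,
            "PredefinedMetricSpecification": {
                "PredefinedMetricType": "SageMakerVariantInvocationsPerInstance",
            },
            "ScaleInCooldown": 300,
            "ScaleOutCooldown": 60,
        },
    )
    return resource_id
```

With the names used in this guide, the call would be `configure_autoscaling(boto3.client("application-autoscaling"), "qwen-vl-image-descriptor")`. A suitable target value depends on how long a single image description takes on the chosen instance type, so it is best derived from load testing.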
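## Monitoring

Set up CloudWatch Alarms to monitor:

1. Endpoint invocation metrics (`Invocations`)
2. Error rates (`Invocation4XXErrors`, `Invocation5XXErrors`)
3. Latency (`ModelLatency`, `OverheadLatency`)
4. Instance utilization (`CPUUtilization`, `MemoryUtilization`, `GPUUtilization`)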
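
As one example of such an alarm, the sketch below builds the arguments for CloudWatch's `put_metric_alarm` call to watch model latency. The helper name, alarm name, threshold, and evaluation window are assumptions chosen for illustration. `ModelLatency` is reported in microseconds, hence the unit conversion.

```python
def latency_alarm_params(endpoint_name, variant_name="AllTraffic", threshold_ms=5000.0):
    """Build keyword arguments for cloudwatch.put_metric_alarm on ModelLatency."""
    return {
        "AlarmName": f"{endpoint_name}-high-model-latency",
        "Namespace": "AWS/SageMaker",
        "MetricName": "ModelLatency",
        "Dimensions": [
            {"Name": "EndpointName", "Value": endpoint_name},
            {"Name": "VariantName", "Value": variant_name},
        ],
        "Statistic": "Average",
        "Period": 60,              # seconds per datapoint
        "EvaluationPeriods": 5,    # alarm after 5 consecutive breaching periods
        "Threshold": threshold_ms * 1000.0,  # convert ms to microseconds
        "ComparisonOperator": "GreaterThanThreshold",
        "TreatMissingData": "notBreaching",  # idle periods should not alarm
    }
```

The alarm would then be created with `boto3.client("cloudwatch").put_metric_alarm(**latency_alarm_params("qwen-vl-image-descriptor"))`. Separating parameter construction from the API call makes it easy to review or unit-test alarm definitions before applying them.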
## Cleanup

To avoid ongoing charges, delete resources when not in use:

```bash
aws sagemaker delete-endpoint --endpoint-name qwen-vl-image-descriptor
aws sagemaker delete-endpoint-config --endpoint-config-name QwenVLImageDescriptorConfig
aws sagemaker delete-model --model-name QwenVLImageDescriptor
# Optional: also remove the ECR repository and the images stored in it
aws ecr delete-repository --repository-name image-descriptor --force
```