Model card for PTA-Text - A Text Only Click Model

Table of Contents

  1. TL;DR
  2. Installation
  3. Running the model
  4. Contribution
  5. Citation

TL;DR

Details for PTA-Text:

-> Input: An image with a header containing the desired UI click command.

-> Output: An [x, y] click coordinate in relative units, with each value in the 0-1 range.

PTA-Text is an image encoder based on Matcha, which is itself an extension of Pix2Struct.

Installation

pip install askui-ml-helper

Download the ".pt" checkpoint from the files in this model card, or fetch it from your terminal:

curl -L "https://huggingface.co/AskUI/pta-text-0.1/resolve/main/pta-text-v0.1.1.pt?download=true" -o pta-text-v0.1.1.pt

Running the model

Get the annotated image

You can run the model in full precision on CPU:

import requests
from PIL import Image
from askui_ml_helper.utils.pta_text import PtaTextInference

pta_text_inference = PtaTextInference("pta-text-v0.1.1.pt")
url = "https://docs.askui.com/assets/images/how_askui_works_architecture-363bc8be35bd228e884c83d15acd19f7.png"
image = Image.open(requests.get(url, stream=True).raw).convert("RGB")
prompt = 'click on the text "Operating System"'

render_image = pta_text_inference.process_image_and_draw_circle(image, prompt, radius=15)
render_image.show()
# Shows the image with a red dot drawn where the click is predicted

Get the coordinates

import requests
from PIL import Image
from askui_ml_helper.utils.pta_text import PtaTextInference

pta_text_inference = PtaTextInference("pta-text-v0.1.1.pt")
url = "https://docs.askui.com/assets/images/how_askui_works_architecture-363bc8be35bd228e884c83d15acd19f7.png"
image = Image.open(requests.get(url, stream=True).raw).convert("RGB")
prompt = 'click on the text "Operating System"'

coordinates = pta_text_inference.process_image(image, prompt)
print(coordinates)
# [0.3981265723705292, 0.13768285512924194]
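Because the model returns relative coordinates, they usually need to be scaled to pixels before they can drive an actual click. The following is a minimal sketch of that conversion, not part of askui-ml-helper: `to_pixel_coordinates` is a hypothetical helper, and it assumes the origin is the top-left corner of the image, with x scaled by the image width and y by the image height.

```python
def to_pixel_coordinates(relative_xy, image_width, image_height):
    """Scale a relative [x, y] pair (0-1 range) to integer pixel coordinates.

    Hypothetical helper for illustration; assumes a top-left origin.
    """
    x_rel, y_rel = relative_xy
    # Round to the nearest pixel, then clamp so the point stays inside the image.
    x_px = min(max(round(x_rel * image_width), 0), image_width - 1)
    y_px = min(max(round(y_rel * image_height), 0), image_height - 1)
    return x_px, y_px


# Example with the coordinates shown above and an assumed 1920x1080 screenshot.
print(to_pixel_coordinates([0.3981265723705292, 0.13768285512924194], 1920, 1080))
# (764, 149)
```

With a PIL image, `image.width` and `image.height` can be passed as the two size arguments.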

Contribution

An AskUI open source initiative. This model was contributed to the Hugging Face ecosystem by Murali Manohar @ AskUI.

Citation

TODO
