Facial-Landmark-Detection: Pose Estimation

Facial-Landmark-Detection is a lightweight deep learning model for real-time facial keypoint detection (e.g., eyes, nose tip, mouth corners). It is optimized with multi-task learning and attention mechanisms for robustness in complex scenarios, and it employs a hybrid backbone (e.g., MobileNetV3-HRNet) with dynamic coordinate regression to handle occlusion, lighting variations, and extreme poses, supporting 68/106-point high-precision localization. Through knowledge distillation, the model is compressed to below 1 MB of parameters, achieving NRMSE <4.5% on the 300W and WFLW datasets at 30+ FPS on mobile devices, about 10x faster than traditional Dlib. With INT8 quantization for ultra-low latency, it balances edge deployment efficiency and sub-millimeter accuracy, making it well suited to AR virtual makeup, expression analysis, face alignment, and medical facial assessment.
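The NRMSE figure quoted above is not defined in this card. As a rough illustration, the sketch below computes one common reading of it: the mean per-point Euclidean error divided by the inter-ocular distance. The function name `normalized_mean_error` and the choice of normalizer are assumptions. The default indices 36 and 45 are the outer eye corners in the 0-based 68-point annotation scheme and would differ for a 106-point layout.

```python
import numpy as np

def normalized_mean_error(pred, gt, left_idx=36, right_idx=45):
    """Mean per-landmark Euclidean error divided by inter-ocular distance.

    pred, gt: arrays of shape (num_points, 2) in pixel coordinates.
    left_idx, right_idx: indices of the two outer eye corners (assumed
    68-point scheme; adjust for other layouts).
    """
    pred = np.asarray(pred, dtype=np.float64)
    gt = np.asarray(gt, dtype=np.float64)
    # Euclidean distance between each predicted and ground-truth point.
    per_point = np.linalg.norm(pred - gt, axis=1)
    # Normalizer: distance between the ground-truth outer eye corners.
    inter_ocular = np.linalg.norm(gt[left_idx] - gt[right_idx])
    return per_point.mean() / inter_ocular
```

Under this definition, a returned value of 0.045 would correspond to the 4.5% threshold mentioned above. Benchmarks may instead normalize by inter-pupil distance or bounding-box size, so the normalizer should match the evaluation protocol being compared against.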

Source model

  • Input shape: 1x3x128x128
  • Number of parameters: 5.17M
  • Model size: 20.95 MB
  • Output shape: 1x265
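To make the input shape listed above concrete, here is a minimal NumPy-only sketch that turns an H x W x 3 image into a 1x3x128x128 float tensor. The function name `to_model_input`, the RGB channel order, the scaling to [0, 1], and the nearest-neighbor resize are all assumptions, since the card does not document the preprocessing this model expects. The layout of the 1x265 output is also not documented here, so no decoding step is shown.

```python
import numpy as np

def to_model_input(image_hwc, size=128):
    """Convert an (H, W, 3) uint8 image to a (1, 3, size, size) float32 tensor.

    Assumed preprocessing: nearest-neighbor resize, scale to [0, 1],
    channels-first layout. The real model may need different normalization.
    """
    img = np.asarray(image_hwc)
    h, w = img.shape[:2]
    # Nearest-neighbor resize: pick one source row/column per output pixel.
    rows = np.arange(size) * h // size
    cols = np.arange(size) * w // size
    resized = img[rows][:, cols]
    # HWC -> CHW, scale to [0, 1], then add the batch dimension.
    chw = resized.astype(np.float32).transpose(2, 0, 1) / 255.0
    return chw[np.newaxis]
```

In practice a face detector would normally crop the face region first, and a library resize with interpolation would replace the nearest-neighbor step used here to keep the sketch dependency-free.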

The source model can be found here

Performance Reference

Please search for the model by name in Model Farm.

Inference & Model Conversion

Please search for the model by name in Model Farm.

License
