|
<!DOCTYPE html> |
|
<html> |
|
<head> |
|
<meta charset="utf-8"> |
|
|
|
|
|
<meta content="VideoReTalking: Audio-based Lip Synchronization for Talking Head Video Editing In the Wild" |
|
property="og:title"> |
|
<meta content="VideoReTalking: Audio-based Lip Synchronization for Talking Head Video Editing In the Wild" |
|
name="description" property="og:description"> |
|
<meta content="https://vinthony.github.io/video-retalking/" property="og:url"> |
|
|
|
<meta property="og:image" content="static/image/your_banner_image.png" /> |
|
<meta property="og:image:width" content="1200"/> |
|
<meta property="og:image:height" content="630"/> |
|
|
|
|
|
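<meta name="twitter:title" content="VideoReTalking: Audio-based Lip Synchronization for Talking Head Video Editing In the Wild">

<meta name="twitter:description" content="VideoReTalking: Audio-based Lip Synchronization for Talking Head Video Editing In the Wild">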
|
|
|
<meta name="twitter:image" content="static/images/your_twitter_banner_image.png"> |
|
<meta name="twitter:card" content="summary_large_image"> |
|
|
|
<meta name="keywords" content="VideoReTalking, lip synchronization, talking head, video editing">
|
<meta name="viewport" content="width=device-width, initial-scale=1"> |
|
|
|
|
|
<title>VideoReTalking</title>
|
<link rel="icon" type="image/x-icon" href="static/images/favicon.ico"> |
|
<link href="https://fonts.googleapis.com/css?family=Google+Sans|Noto+Sans|Castoro" |
|
rel="stylesheet"> |
|
|
|
<link rel="stylesheet" href="static/css/bulma.min.css"> |
|
<link rel="stylesheet" href="static/css/bulma-carousel.min.css"> |
|
<link rel="stylesheet" href="static/css/bulma-slider.min.css"> |
|
<link rel="stylesheet" href="static/css/fontawesome.all.min.css"> |
|
<link rel="stylesheet" |
|
href="https://cdn.jsdelivr.net/gh/jpswalsh/academicons@1/css/academicons.min.css"> |
|
<link rel="stylesheet" href="static/css/index.css"> |
|
|
|
<script src="https://ajax.googleapis.com/ajax/libs/jquery/3.5.1/jquery.min.js"></script> |
|
<script src="https://documentcloud.adobe.com/view-sdk/main.js"></script> |
|
<script defer src="static/js/fontawesome.all.min.js"></script> |
|
<script src="static/js/bulma-carousel.min.js"></script> |
|
<script src="static/js/bulma-slider.min.js"></script> |
|
<script src="static/js/index.js"></script> |
|
</head> |
|
<body> |
|
|
|
|
|
<section class="hero"> |
|
<div class="hero-body"> |
|
<div class="container is-max-desktop"> |
|
<div class="columns is-centered"> |
|
<div class="column has-text-centered"> |
|
<h1 class="xtitle is-1 publication-title">VideoReTalking: Audio-based Lip Synchronization for Talking Head Video Editing In the Wild</h1> |
|
<br/> |
|
<div class="is-size-5 publication-authors"> |
|
|
|
<span class="author-block"> |
|
<a href="#" target="_blank">Kun Cheng</a><sup>*,1,2</sup></span> |
|
<span class="author-block"> |
|
<a href="https://vinthony.github.io" target="_blank">Xiaodong Cun</a><sup>*,2</sup></span> |
|
<span class="author-block"> |
|
<a href="https://yzhang2016.github.io" target="_blank">Yong Zhang</a><sup>2</sup> |
|
</span> |
|
<span class="author-block"> |
|
<a href="https://menghanxia.github.io/" target="_blank">Menghan Xia</a><sup>2</sup> |
|
</span> |
|
<span class="author-block"> |
|
<a href="https://feiiyin.github.io/" target="_blank">Fei Yin</a><sup>2,3</sup> |
|
</span> |
|
<br/>
|
<span class="author-block"> |
|
<a href="https://web.xidian.edu.cn/mrzhu/en/index.html" target="_blank">Mingrui Zhu</a><sup>1</sup> |
|
</span> |
|
<span class="author-block"> |
|
<a href="https://xuanwangvc.github.io/" target="_blank">Xuan Wang</a><sup>2</sup> |
|
</span> |
|
<span class="author-block"> |
|
<a href="https://juewang725.github.io/" target="_blank">Jue Wang</a><sup>2</sup> |
|
</span> |
|
<span class="author-block"> |
|
<a href="https://web.xidian.edu.cn/nnwang/en/index.html" target="_blank">Nannan Wang</a><sup>1</sup> |
|
</span> |
|
</div> |
|
<br/> |
|
<div class="is-size-5 publication-authors"> |
|
<span class="author-block"> |
|
<sup>1</sup> Xidian University |
|
<sup>2</sup> Tencent AI Lab |
|
<sup>3</sup> Tsinghua University |
|
<br>SIGGRAPH Asia 2022 (Conference Track)</span> |
|
<span class="eql-cntrb"><small><br><sup>*</sup>Indicates Equal Contribution</small></span> |
|
</div> |
|
|
|
<div class="column has-text-centered"> |
|
<div class="publication-links"> |
|
|
|
<span class="link-block"> |
|
<a href="https://arxiv.org/pdf/2211.14758.pdf" target="_blank" |
|
class="external-link "> |
|
<span class="icon"> |
|
<i class="fas fa-file-pdf"></i> |
|
</span> |
|
<span>Paper</span> |
|
</a> |
|
</span> |
|
|
|
|
|
<span class="link-block"> |
|
<a href="https://github.com/vinthony/video-retalking/" target="_blank" |
|
class="external-link "> |
|
<span class="icon"> |
|
<i class="fab fa-github"></i> |
|
</span> |
|
<span>Code</span> |
|
</a> |
|
</span> |
|
|
|
|
|
<span class="link-block"> |
|
<a href="https://arxiv.org/abs/2211.14758" target="_blank" |
|
class="external-link "> |
|
<span class="icon"> |
|
<i class="ai ai-arxiv"></i> |
|
</span> |
|
<span>arXiv</span> |
|
</a> |
|
</span> |
|
</div> |
|
</div> |
|
</div> |
|
</div> |
|
</div> |
|
</div> |
|
</section> |
|
|
|
|
|
|
|
<section class="hero teaser"> |
|
<div class="container is-max-desktop"> |
|
<div class="hero-body-img"> |
|
<img src="./static/images/teaser.png" width="80%" alt="VideoReTalking teaser">
|
</div> |
|
</div> |
|
</section> |
|
|
|
|
|
|
|
<section class="section hero is-light"> |
|
<div class="container is-max-desktop"> |
|
<div class="columns is-centered has-text-centered"> |
|
<div class="column is-four-fifths"> |
|
<h2 class="title is-3">Abstract</h2> |
|
<div class="content has-text-justified"> |
|
<p> |
|
We present VideoReTalking, a new system to edit the faces of a real-world talking head video according to input audio, |
|
producing a high-quality and lip-syncing output video even with a different emotion. Our system disentangles this objective |
|
into three sequential tasks: (1) face video generation with a canonical expression; (2) audio-driven lip-sync; and |
|
(3) face enhancement for improving photo-realism. Given a talking-head video, we first modify the expression of each frame |
|
according to the same expression template using the expression editing network, resulting in a video with the canonical |
|
expression. This video, together with the given audio, is then fed into the lip-sync network to generate a lip-syncing video. |
|
Finally, we improve the photo-realism of the synthesized faces through an identity-aware face enhancement network and |
|
post-processing. We use learning-based approaches for all three steps, and all modules run in a sequential

pipeline without any user intervention.
|
</p> |
|
</div> |
|
</div> |
|
</div> |
|
</div> |
|
</section> |
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
<section class="hero is-small is-light"> |
|
<div class="hero-body"> |
|
<div class="container"> |
|
|
|
<h2 class="title is-3">Pipeline</h2> |
|
<div class="columns is-centered has-text-centered"> |
|
<div class="column is-four-fifths"> |
|
|
|
<div class="hero-body-img"> |
|
|
|
<img width="80%" src="static/images/pipeline.png" alt="VideoReTalking pipeline">
|
</div> |
|
</div> |
|
</div> |
|
</div> |
|
</div> |
|
</section> |
|
|
|
|
|
|
|
|
|
<section class="hero is-small is-light"> |
|
<div class="hero-body"> |
|
<div class="container"> |
|
|
|
<h2 class="title is-3"><strong>Video1</strong>: Video Results in the Wild.</h2> |
|
<div class="columns is-centered has-text-centered"> |
|
<div class="column is-four-fifths"> |
|
|
|
<video controls="" width="100%"> |
|
|
|
<source src="./static/videos/Results_in_the_wild.mp4#t=0.001" type="video/mp4"> |
|
</video> |
|
</div> |
|
</div> |
|
</div> |
|
</div> |
|
</section> |
|
|
|
|
|
|
|
<section class="hero is-small is-light"> |
|
<div class="hero-body"> |
|
<div class="container"> |
|
|
|
<h2 class="title is-3"><strong>Video2</strong>: Comparison with SOTA Methods.</h2> |
|
<div class="columns is-centered has-text-centered"> |
|
<div class="column is-four-fifths"> |
|
|
|
<video controls="" width="100%"> |
|
|
|
<source src="./static/videos/Comparison.mp4#t=0.001" type="video/mp4"> |
|
</video> |
|
</div> |
|
</div> |
|
</div> |
|
</div> |
|
</section> |
|
|
|
<section class="hero is-small is-light"> |
|
<div class="hero-body"> |
|
<div class="container"> |
|
|
|
<h2 class="title is-3"><strong>Video3</strong>: Ablation Study on Different Modules. </h2> |
|
<div class="columns is-centered has-text-centered"> |
|
<div class="column is-four-fifths"> |
|
|
|
<video controls="" width="100%"> |
|
|
|
<source src="./static/videos/Ablation.mp4#t=0.001" type="video/mp4">
|
</video> |
|
</div> |
|
</div> |
|
</div> |
|
</div> |
|
</section> |
|
|
|
|
|
<section class="section" id="BibTeX"> |
|
<div class="container is-max-desktop content"> |
|
<h2 class="title">BibTeX</h2> |
|
<pre><code>@misc{videoretalking, |
|
title={VideoReTalking: Audio-based Lip Synchronization for Talking Head Video Editing In the Wild}, |
|
author={Kun Cheng and Xiaodong Cun and Yong Zhang and Menghan Xia and Fei Yin and Mingrui Zhu and Xuan Wang and Jue Wang and Nannan Wang}, |
|
year={2022}, |
|
eprint={2211.14758}, |
|
archivePrefix={arXiv}, |
|
primaryClass={cs.CV} |
|
}</code></pre> |
|
</div> |
|
</section> |
|
|
|
|
|
|
|
<footer class="footer"> |
|
<div class="container"> |
|
<div class="columns is-centered"> |
|
<div class="column is-8"> |
|
<div class="content"> |
|
|
|
<p> |
|
This page was built using a <a href="https://github.com/vinthony/project-page-template">modified version</a> of the <a href="https://github.com/eliahuhorwitz/Academic-project-page-template" target="_blank">Academic Project Page Template</a> from <a href="https://github.com/vinthony">vinthony</a>.

You are free to borrow the source code of this website; we just ask that you link back to this page in the footer. <br> This website is licensed under a <a rel="license" href="http://creativecommons.org/licenses/by-sa/4.0/" target="_blank">Creative Commons Attribution-ShareAlike 4.0 International License</a>.
|
</p> |
|
|
|
</div> |
|
</div> |
|
</div> |
|
</div> |
|
</footer> |
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
</body> |
|
</html> |
|
|