Tele-Aloha: A Low-budget and High-authenticity Telepresence System Using Sparse RGB Cameras


Hanzhang Tu1, Ruizhi Shao1, Xue Dong2, Shunyuan Zheng†,3, Hao Zhang2, Lili Chen2, Meili Wang2, Wenyu Li2, Siyan Ma2, Shengping Zhang3,
Boyao Zhou✉,1, Yebin Liu✉,1

1Tsinghua University    2BOE Technology Group    3Harbin Institute of Technology
✉Corresponding author     †Work done during an internship at Tsinghua University


In this paper, we present a low-budget and high-authenticity bidirectional telepresence system, Tele-Aloha, targeting peer-to-peer communication scenarios. Compared to previous systems, Tele-Aloha uses only four sparse RGB cameras, one consumer-grade GPU, and one autostereoscopic screen to achieve high-resolution (2048×2048), real-time (30 fps), low-latency (less than 150 ms), and robust remote communication. As the core of Tele-Aloha, we propose an efficient novel view synthesis algorithm for the upper body. First, we design a cascaded disparity estimator to obtain a robust geometry cue. Additionally, a neural rasterizer based on Gaussian Splatting is introduced to project latent features onto the target view and decode them at a reduced resolution. Further, given the high-quality captured data, we leverage a weighted blending mechanism to refine the decoded image to the final 2K resolution. Exploiting a world-leading autostereoscopic display and low-latency iris tracking, users are able to experience a strong three-dimensional sense without any wearable head-mounted display device. Altogether, our telepresence system demonstrates the sense of co-presence in real-life experiments, inspiring the next generation of communication.
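A disparity map serves as a geometry cue because, for a rectified camera pair, disparity and depth are tied by the standard stereo triangulation relation Z = f · B / d. The sketch below illustrates only that textbook relation, not the cascaded estimator itself; the function name and the focal length, baseline, and disparity values are hypothetical placeholders, not the system's calibration.

```python
def disparity_to_depth(disparity_px, focal_px, baseline_m, eps=1e-6):
    """Rectified-stereo triangulation: Z = f * B / d.

    disparity_px: horizontal pixel offset of a point between the two views
    focal_px:     focal length in pixels (assumed equal for both cameras)
    baseline_m:   distance between the two camera centers, in meters
    """
    if disparity_px <= eps:
        # Zero disparity corresponds to a point at infinity.
        return float("inf")
    return focal_px * baseline_m / disparity_px


# Hypothetical numbers, chosen only to make the arithmetic easy to follow.
depth = disparity_to_depth(disparity_px=250.0, focal_px=1000.0, baseline_m=0.25)
print(f"depth = {depth:.2f} m")
```

Because depth is inversely proportional to disparity, a fixed disparity error costs more depth accuracy for distant points than for near ones, which is one reason a robust disparity estimate matters for sparse camera setups.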

Comparison with previous telepresence systems

Systems        Input setting  Efficiency  Resolution   GPU equipment                Complexity
Holoportation  8 RGBD         34 fps      -            10 Titan X                   Medium (F.B.)
LookinGood     1 RGBD         34 fps      512 × 1024   1 Titan V                    High (U.B.)
Starline       3 RGBD         60 fps      1600 × 1200  2 RTX 6000 + 2 Titan RTX     High (U.B.)
VirtualCube    6 RGBD         30 fps      1280 × 960   2 RTX 3090 + 1 RTX 2080      High (U.B.)
MetaStream     4 RGBD         30 fps      -            4 Jetson Nano + 1 RTX 2080S  Medium (F.B.)
AI-mediated    1 RGB          24 fps      512 × 512    1 RTX 6000 + 1 RTX 4090      Low (Head)
Live4d         20 RGB         24 fps      -            4 RTX 3090 + 1 RTX 4090      Medium (F.B.)
Ours           4 RGB          30 fps      2048 × 2048  1 RTX 4090                   High (U.B.)
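One rough way to read the resolution and frame-rate columns together is as a rendered pixel rate (width × height × fps). The sketch below computes it for the rows that list a resolution. This is an illustrative proxy introduced here, not a metric from the paper: it ignores the number of GPUs, image quality, and latency, and the `pixel_rate` helper is hypothetical.

```python
# (dimension 1, dimension 2, fps) copied from the comparison table.
# Rows without a listed resolution are omitted. Only the product of the two
# dimensions is used, so their order does not matter.
systems = {
    "LookinGood": (512, 1024, 34),
    "Starline": (1600, 1200, 60),
    "VirtualCube": (1280, 960, 30),
    "AI-mediated": (512, 512, 24),
    "Ours": (2048, 2048, 30),
}


def pixel_rate(dim_a, dim_b, fps):
    """Output pixels produced per second for one rendered view."""
    return dim_a * dim_b * fps


rates = {name: pixel_rate(*spec) for name, spec in systems.items()}

# Print the systems from highest to lowest pixel rate, in megapixels per second.
for name, rate in sorted(rates.items(), key=lambda kv: kv[1], reverse=True):
    print(f"{name:12s} {rate / 1e6:8.1f} Mpx/s")
```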

Static result on THuman dataset

Demo Video


    title={Tele-Aloha: A Low-budget and High-authenticity Telepresence System Using Sparse RGB Cameras},
    author={Tu, Hanzhang and Shao, Ruizhi and Dong, Xue and Zheng, Shunyuan and Zhang, Hao and Chen, Lili and Wang, Meili and Li, Wenyu and Ma, Siyan and Zhang, Shengping and Zhou, Boyao and Liu, Yebin},
    booktitle={ACM SIGGRAPH Conference Proceedings},