Shiqi Yang

杨诗琪 · Ph.D.

Director, Principal Research Scientist
Multimodal AI Department · SB Intuitions Corp., Tokyo

Email shiqi.yang147.jp@gmail.com

Location Tokyo, Japan

CV (updated July 2026) Scholar Open positions LinkedIn X

I lead the application-oriented Multimodal AI Department at SB Intuitions, a SoftBank R&D company. The department brings together the Creative Vision Team and the Conversational Speech Team, whose researchers and engineers cover vision and speech. Before joining SB Intuitions, I was an audio-visual research scientist at Sony Group Corporation. I received my Ph.D. (2023) from the Learning and Machine Perception (LAMP) team at the Computer Vision Center, Autonomous University of Barcelona, advised by Joost van de Weijer. I also serve the community as an area chair for ICML and NeurIPS, a guest editor for an IJCV special issue, and an organizer of the EVG and AVGenL workshop series.

Research focus: Interactive multimodal generation · Video world models

Hiring

Join our team

We work on streaming audio-video generation, video world models (for robotics), and speech-related topics such as speech-to-speech translation and streaming generation.

Currently no open headcount. Feel free to reach out by email if you'd like to express interest in future openings.

Updates

Latest News

Jun. 2026 We are organizing the official Runway Local Meetup Tokyo on July 16. Please see the official Runway meetup page and register here.
Apr. 2026 We will organize two ECCV workshops this year: the 1st Workshop on Efficient Visual Generation (EVG) and the 3rd Workshop on Audio-Visual Generation and Learning (AVGenL).
Sep. 2025 Two papers are accepted by NeurIPS 2025.
May 2025 We will host the 2nd Workshop on Audio-Visual Generation and Learning (AVGenL) at ICCV 2025. We will have one industry session this year: Veo 3 from Google DeepMind. Stay tuned for more details.
Feb. 2025 "One-way ticket" is accepted by CVPR 2025.
Jan. 2025 "Mining Your Own Secrets", "InterLCM" and "One-Prompt-One-Story" (spotlight) are accepted by ICLR 2025.
Oct. 2024 Gave visiting talks at the MICC Lab (Prof. Andrew Bagdanov), University of Florence, and the MHUG Lab (Prof. Nicu Sebe), University of Trento.
Sep. 2024 Our paper "Faster Diffusion: Rethinking the Role of UNet Encoder in Diffusion Models" is accepted by NeurIPS 2024.
Apr. 2024 We are organizing an ECCV 2024 workshop "AVGenL: Audio-Visual Generation and Learning". Please check the site for CfP and speakers.
Dec. 2023 My doctoral thesis received the "Pioneer Awards 2023 - CERCA".
Sep. 2023 Our paper "Dynamic Prompt Learning: Addressing Cross-Attention Leakage for Text-Based Image Editing" is accepted by NeurIPS 2023.
Aug. 2023 Extended version of "NRC" is accepted by IEEE TPAMI.
Jun. 2023 "Casting a BAIT for Offline and Online Source-free Domain Adaptation" is finally accepted by CVIU.
Jan. 2023 Gave a visiting talk in Prof. Maria Brbic's group at EPFL.
Nov. 2022 I will present our work on model adaptation under domain and category shift at the TrustML Young Scientist Seminars (hosted by RIKEN AIP) on Dec. 7.
Sep. 2022 "Attracting and Dispersing: A Simple Approach for Source-free Domain Adaptation" is accepted by NeurIPS 2022 as Spotlight, and our paper "Positive Pair Distillation Considered Harmful: Continual Meta Metric Learning for Lifelong Object Re-Identification" is accepted by BMVC 2022.
Sep. 2021 "Exploiting the Intrinsic Neighborhood Structure for Source-free Domain Adaptation" is accepted by NeurIPS 2021.
Jul. 2021 "Generalized Source-free Domain Adaptation" is accepted by ICCV 2021.

Career

Industry Experience

SB Intuitions, SoftBank, Tokyo, Japan

Jun. 2026 – Now Principal Research Scientist and Research Manager of Creative Vision Team

Apr. 2026 – Now Director of Multimodal AI Department

Apr. 2025 – May 2026 Chief Research Scientist and Research Manager of Creative Vision Team

Dec. 2024 – Mar. 2025 Lead Research Scientist

Sony Group Corporation, Tokyo, Japan

Oct. 2023 – Nov. 2024 Research Scientist

OMRON SINIC X, Tokyo, Japan

Jan. 2023 – Jun. 2023 Research Intern

Kyoto University, Japan

Oct. 2018 – Mar. 2019 Guest Research Associate

Recognition

Talks, Awards & Activities

Visiting talks at the MICC Lab (Prof. Andrew Bagdanov), University of Florence, and the MHUG Lab (Prof. Nicu Sebe), University of Trento, Italy, Oct. 2024.
Pioneer Awards 2023, CERCA Research Center of Catalonia, Spain, Dec. 2023.
Visiting talk in Prof. Maria Brbic's group at EPFL, Switzerland, Jan. 2023.
Invited talk at the TrustML Young Scientist Seminars, RIKEN AIP, Japan, Dec. 2022.
Participation in ICVSS Summer School, Sicily, Italy, Jul. 2022.
Invited talk at the AI Time Seminar on NeurIPS 2021 (Virtual), China, Feb. 2022.

Community

Academic Service

Conference Area Chair: NeurIPS; ICML.
Guest Editor: IJCV Special Issue "Audio-Visual Generation".
Workshop Organizer: ECCV 2026 "Workshop on Efficient Visual Generation (EVG)"; ECCV 2024 / ICCV 2025 / ECCV 2026 "Audio-Visual Generation and Learning (AVGenL) workshop".
Conference Reviewer: ICLR; ICCV; NeurIPS; ECCV; ICML; CVPR; WACV.
Journal Reviewer: IEEE TKDE; TPAMI; TAI; IJCV.

Background

Education

Sep. 2012 – Jun. 2016
Bachelor in Automation, Wuhan University of Science and Technology, China.
Sep. 2016 – Jun. 2019
Master in Control Science and Technology, Huazhong University of Science and Technology, China.
Oct. 2019 – Jul. 2023
Ph.D. in Computer Science, Computer Vision Center, Autonomous University of Barcelona, Spain.

Research Output

Full Publications

Browse journal articles, preprints, and international conference papers by year.

* Project lead

International Conference

EchoDistill: Bidirectional Concept Distillation for One-Step Diffusion Personalization. Yixiong Yang, Tao Wu, Senmao Li, Shiqi Yang, Yaxing Wang, Joost van de Weijer, Kai Wang. CVPR 2026 Findings. [arXiv]
Free-Lunch Color-Texture Disentanglement for Stylized Image Generation. Jiang Qin, Senmao Li, Alexandra Gomez-Villa, Shiqi Yang, Yaxing Wang, Kai Wang, Joost van de Weijer. Advances in Neural Information Processing Systems (NeurIPS), 2025. [arXiv]
From Cradle to Cane: A Two-Pass Framework for High-Fidelity Lifespan Face Aging. Tao Liu, Dafeng Zhang, Gengchen Li, Shizhuo Liu, Yongqi Song, Senmao Li, Shiqi Yang, Boqian Li, Kai Wang, Yaxing Wang. NeurIPS, 2025. [arXiv]
One-way ticket: Time-Independent Unified Encoder for Distilling Text-to-Image Diffusion Models. Senmao Li, Lei Wang, Kai Wang, Tao Liu, Jiehang Xie, Joost van de Weijer, Fahad Shahbaz Khan, Shiqi Yang, Yaxing Wang, Jian Yang. IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR), 2025. [arXiv]
Mining Your Own Secrets: Diffusion Classifier Scores for Continual Personalization of Text-to-Image Diffusion Models. Saurav Jha, Shiqi Yang*, Masato Ishii, Mengjie Zhao, Christian Simon, Muhammad Jehanzeb Mirza, Dong Gong, Lina Yao, Shusuke Takahashi, Yuki Mitsufuji. International Conference on Learning Representations (ICLR), 2025. [arXiv] [openreview] [project]
One-Prompt-One-Story: Free-Lunch Consistent Text-to-Image Generation Using a Single Prompt. Tao Liu, Kai Wang, Senmao Li, Joost van de Weijer, Fahad Shahbaz Khan, Shiqi Yang, Yaxing Wang, Jian Yang, Ming-Ming Cheng. ICLR, 2025. (Spotlight) [arXiv] [openreview] [project]
InterLCM: Low-Quality Images as Intermediate States of Latent Consistency Models for Effective Blind Face Restoration. Senmao Li, Kai Wang, Joost van de Weijer, Fahad Shahbaz Khan, Chun-Le Guo, Shiqi Yang, Yaxing Wang, Jian Yang, Ming-Ming Cheng. ICLR, 2025. [arXiv] [openreview] [project]
Faster Diffusion: Rethinking the Role of UNet Encoder in Diffusion Models. Senmao Li, Taihang Hu, Fahad Shahbaz Khan, Linxuan Li, Shiqi Yang, Yaxing Wang, Ming-Ming Cheng, Jian Yang. Advances in Neural Information Processing Systems (NeurIPS), 2024. [project] [arXiv] [code]
SpecMaskGIT: Masked Generative Modeling of Audio Spectrograms for Efficient Audio Synthesis and Beyond. Marco Comunità, Zhi Zhong, Akira Takahashi, Shiqi Yang, Mengjie Zhao, Koichi Saito, Yukara Ikemiya, Takashi Shibuya, Shusuke Takahashi, Yuki Mitsufuji. International Society for Music Information Retrieval (ISMIR), 2024. [arXiv]
Dynamic Prompt Learning: Addressing Cross-Attention Leakage for Text-Based Image Editing. Kai Wang, Fei Yang, Shiqi Yang, Muhammad Atif Butt, Joost van de Weijer. Advances in Neural Information Processing Systems (NeurIPS), 2023. [paper] [arXiv] [code]
Positive Pair Distillation Considered Harmful: Continual Meta Metric Learning for Lifelong Object Re-Identification. Kai Wang, Chenshen Wu, Andrew D. Bagdanov, Xialei Liu, Shiqi Yang, Shangling Jui, Joost van de Weijer. British Machine Vision Conference (BMVC), 2022. [arXiv] [code]
Attracting and Dispersing: A Simple Approach for Source-free Domain Adaptation. Shiqi Yang, Yaxing Wang, Kai Wang, Shangling Jui, Joost van de Weijer. NeurIPS, 2022. (Spotlight) [project] [paper] [arXiv] [code]
Exploiting the Intrinsic Neighborhood Structure for Source-free Domain Adaptation. Shiqi Yang, Yaxing Wang, Joost van de Weijer, Luis Herranz, Shangling Jui. Advances in Neural Information Processing Systems (NeurIPS), 2021. [project] [paper] [arXiv] [code]
Generalized Source-free Domain Adaptation. Shiqi Yang, Yaxing Wang, Joost van de Weijer, Luis Herranz, Shangling Jui. International Conference on Computer Vision (ICCV), 2021. [project] [paper] [arXiv] [code] [video]
Parallel Convolutional Networks for Image Recognition via a Discriminator. Shiqi Yang, Gang Peng. Asian Conference on Computer Vision (ACCV), 2018. [paper] [arXiv]
Attention to Refine Through Multi Scales for Semantic Segmentation. Shiqi Yang, Gang Peng. Pacific-Rim Conference on Multimedia (PCM), 2018. [paper] [arXiv]

Journal

Training-free image inversion for one-step diffusion models. Tao Wu, Senmao Li, Yaxing Wang, Shiqi Yang, Kai Wang, Joost van de Weijer. Pattern Recognition, 2026. [paper]
GLOV: Guided Large Language Models as Implicit Optimizers for Vision Language Models. M. Jehanzeb Mirza, Mengjie Zhao, Zhuoyuan Mao, Sivan Doveh, Wei Lin, Paul Gavrikov, Michael Dorkenwald, Shiqi Yang, Saurav Jha, Hiromi Wakaki, Yuki Mitsufuji, Horst Possegger, Rogerio Feris, Leonid Karlinsky, James Glass. Transactions on Machine Learning Research (TMLR), 2025. [arXiv]
Trust your Good Friends: Source-free Domain Adaptation by Reciprocal Neighborhood Clustering. Shiqi Yang, Yaxing Wang, Joost van de Weijer, Luis Herranz, Shangling Jui, Jian Yang. IEEE Transactions on Pattern Analysis and Machine Intelligence (TPAMI), 2023. [paper] [arXiv]
Casting a BAIT for Offline and Online Source-free Domain Adaptation. Shiqi Yang, Yaxing Wang, Luis Herranz, Shangling Jui, Joost van de Weijer. Computer Vision and Image Understanding (CVIU), 2023. [paper] [arXiv] [code]
On Implicit Attribute Localization for Generalized Zero-Shot Learning. Shiqi Yang, Kai Wang, Luis Herranz, Joost van de Weijer. IEEE Signal Processing Letters, 2021. [paper] [arXiv]

Preprint and workshop paper

Probing the effectiveness of World Models for Spatial Reasoning through Test-time Scaling. Saurav Jha, M Jehanzeb Mirza, Wei Lin, Shiqi Yang, Sarath Chandar. World Modeling Workshop 2026. [arXiv]
OpenMU: Your Swiss Army Knife for Music Understanding. Mengjie Zhao, Zhi Zhong, Zhuoyuan Mao, Shiqi Yang, Wei-Hsiang Liao, Shusuke Takahashi, Hiromi Wakaki, Yuki Mitsufuji. preprint, 2024. [arXiv] [code]
Visual Echoes: A Simple Unified Transformer for Audio-Visual Generation. Shiqi Yang, Zhi Zhong, Mengjie Zhao, Shusuke Takahashi, Masato Ishii, Takashi Shibuya, Yuki Mitsufuji. preprint, 2024. [arXiv] [demo]
MaTe3D: Mask-guided Text-based 3D-aware Portrait Editing. Kangneng Zhou, Daiheng Gao, Xuan Wang, Jie Zhang, Peng Zhang, Xusen Sun, Longhao Zhang, Shiqi Yang, Bang Zhang, Liefeng Bo, Yaxing Wang. preprint, 2023. [arXiv]
A Critical Look at the Current Usage of Foundation Model for Dense Recognition Task. Shiqi Yang, Atsushi Hashimoto, Yoshitaka Ushiku. preprint, 2023. [arXiv]
OneRing: A Simple Method for Source-free Open-partial Domain Adaptation. Shiqi Yang, Yaxing Wang, Kai Wang, Shangling Jui, Joost van de Weijer. preprint, 2022. [project] [arXiv] [code]