Shiqi Yang

杨诗琪 · Ph.D.

Team leader / Researcher, SB Intuitions (SoftBank), Tokyo, Japan

#Multi-Modal Generation

Email shiqi.yang147.jp@gmail.com

Location Tokyo, Japan

CV (updated Nov. 2025) Scholar Linkedin X

Since 2024.12, I have been a researcher and team leader (as well as a research manager) of the Creative Vision team at SB Intuitions in Tokyo, and I am also affiliated with SoftBank Corp. From 2023.10 to 2024.11, I worked as an audio-visual research scientist at Sony Group Corporation, Tokyo. Before that, I was a Ph.D. student in the Learning and Machine Perception (LAMP) team (2019.10 – 2023.7), advised by Joost van de Weijer at the Computer Vision Center, Autonomous University of Barcelona, Spain.

Currently, I am leading industrial projects (both pretraining and post-training) in visual generation and manipulation.
While at Sony, I worked on multi-modal (especially audio-visual) generation.
During my Ph.D., I focused on how to efficiently adapt pretrained models to real-world environments under domain and category shift in an unsupervised manner; the related research topics include zero-shot learning and source-free / test-time / continual / open-set domain adaptation.

News

[2025.11] Invited to serve as area chair for ICML 2026.
[2025.9] Two papers are accepted by NeurIPS 2025.
[2025.5] We will host the 2nd workshop on Audio-Visual Generation and Learning (AVGenL) at ICCV 2025. There will be one industrial session this year: Veo 3 from Google DeepMind. Stay tuned for more details.
[2025.2] "One-way ticket" is accepted by CVPR 2025.
[2025.1] "Mining Your Own Secrets", "InternLCM" and "One-Prompt-One-Story" (spotlight) are accepted by ICLR 2025.
[2024.10] Gave visiting talks at the MICC Lab (Prof. Andrew Bagdanov), University of Florence, and the MHUG Lab (Prof. Nicu Sebe), University of Trento.
[2024.9] Our paper "Faster Diffusion: Rethinking the Role of UNet Encoder in Diffusion Models" is accepted by NeurIPS 2024.
[2024.4] We are organizing an ECCV 2024 workshop "AVGenL: Audio-Visual Generation and Learning". Please check the site for CfP and speakers.
[2023.12] My doctoral thesis received "Pioneer Awards 2023 - CERCA".
[2023.9] Our paper "Dynamic Prompt Learning: Addressing Cross-Attention Leakage for Text-Based Image Editing" is accepted by NeurIPS 2023.
[2023.8] Extended version of "NRC" is accepted by IEEE TPAMI.
[2023.6] "Casting a BAIT for Offline and Online Source-free Domain Adaptation" is finally accepted by CVIU.
[2023.1] Gave a visiting talk in Prof. Maria Brbic's group at EPFL.
[2022.11] I will present our work on model adaptation under domain and category shift at the TrustML Young Scientist Seminars (hosted by RIKEN AIP) on Dec. 7.
[2022.9] "Attracting and Dispersing: A Simple Approach for Source-free Domain Adaptation" is accepted by NeurIPS 2022 as Spotlight, and our paper "Positive Pair Distillation Considered Harmful: Continual Meta Metric Learning for Lifelong Object Re-Identification" is accepted by BMVC 2022.
[2021.9] "Exploiting the Intrinsic Neighborhood Structure for Source-free Domain Adaptation" is accepted by NeurIPS 2021.
[2021.7] "Generalized Source-free Domain Adaptation" is accepted by ICCV 2021.

Experience

SB Intuitions, SoftBank, Tokyo, Japan

Apr. 2025 – Now Chief Research Scientist and Research Manager

Dec. 2024 – Mar. 2025 Lead Research Scientist

Sony Group Corporation, Tokyo, Japan

Oct. 2023 – Nov. 2024 Research Scientist

OMRON SINIC X, Tokyo, Japan

Jan. 2023 – Jun. 2023 Research Intern

Kyoto University, Japan

Oct. 2018 – Mar. 2019 Guest Research Associate

Invited Talks, Awards & Activities

Visiting talks at the MICC Lab (Prof. Andrew Bagdanov), University of Florence, and the MHUG Lab (Prof. Nicu Sebe), University of Trento, Italy, 2024.10.
Pioneer Awards 2023, CERCA Research Center of Catalonia, Spain, 2023.12.
Visiting talk in Prof. Maria Brbic's group at EPFL, Switzerland, 2023.1.
Invited talk at the TrustML Young Scientist Seminars, RIKEN AIP, Japan, 2022.12.
Participation in the ICVSS Summer School, Sicily, Italy, 2022.7.
Invited talk at the AI Time Seminar on NeurIPS 2021 (Virtual), China, 2022.2.

Academic Service

Conference Area Chair: ICML 2026.
Guest Editor: IJCV Special Issue "Audio-Visual Generation".
Organizer: ECCV 2024 / ICCV 2025 workshop "Audio-Visual Generation and Learning".
Conference Reviewer: ICLR; ICCV; NeurIPS; ECCV; ICML; CVPR; WACV.
Journal Reviewer: IEEE TKDE; TPAMI; TAI; IJCV.

Education

Oct. 2019 – Jul. 2023
Ph.D. in Computer Science, Computer Vision Center, Autonomous University of Barcelona, Spain.
Sep. 2016 – Jun. 2019
Master's in Control Science and Technology, Huazhong University of Science and Technology, China.
Sep. 2012 – Jun. 2016
Bachelor's in Automation, Wuhan University of Science and Technology, China.

Contact

Contact: shiqi.yang147.jp@gmail.com

Full Publications

Journal articles, preprints, and international conference papers.

International Conference

Free-Lunch Color-Texture Disentanglement for Stylized Image Generation. Jiang Qin, Senmao Li, Alexandra Gomez-Villa, Shiqi Yang, Yaxing Wang, Kai Wang, Joost van de Weijer. Advances in Neural Information Processing Systems (NeurIPS), 2025. [arXiv]
From Cradle to Cane: A Two-Pass Framework for High-Fidelity Lifespan Face Aging. Tao Liu, Dafeng Zhang, Gengchen Li, Shizhuo Liu, Yongqi Song, Senmao Li, Shiqi Yang, Boqian Li, Kai Wang, Yaxing Wang. NeurIPS, 2025. [arXiv]
One-way ticket: Time-Independent Unified Encoder for Distilling Text-to-Image Diffusion Models. Senmao Li, Lei Wang, Kai Wang, Tao Liu, Jiehang Xie, Joost van de Weijer, Fahad Shahbaz Khan, Shiqi Yang, Yaxing Wang, Jian Yang. IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR), 2025. [arXiv]
Mining Your Own Secrets: Diffusion Classifier Scores for Continual Personalization of Text-to-Image Diffusion Models. Saurav Jha, Shiqi Yang*, Masato Ishii, Mengjie Zhao, Christian Simon, Muhammad Jehanzeb Mirza, Dong Gong, Lina Yao, Shusuke Takahashi, Yuki Mitsufuji. International Conference on Learning Representations (ICLR), 2025. [arXiv] [openreview] [project]
One-Prompt-One-Story: Free-Lunch Consistent Text-to-Image Generation Using a Single Prompt. Tao Liu, Kai Wang, Senmao Li, Joost van de Weijer, Fahad Shahbaz Khan, Shiqi Yang, Yaxing Wang, Jian Yang, Ming-Ming Cheng. ICLR, 2025. (Spotlight) [arXiv] [openreview] [project]
InternLCM: Low-Quality Images as Intermediate States of Latent Consistency Models for Effective Blind Face Restoration. Senmao Li, Kai Wang, Joost van de Weijer, Fahad Shahbaz Khan, Chun-Le Guo, Shiqi Yang, Yaxing Wang, Jian Yang, Ming-Ming Cheng. ICLR, 2025. [arXiv] [openreview] [project]
Faster Diffusion: Rethinking the Role of UNet Encoder in Diffusion Models. Senmao Li, Taihang Hu, Fahad Shahbaz Khan, Linxuan Li, Shiqi Yang, Yaxing Wang, Ming-Ming Cheng, Jian Yang. Advances in Neural Information Processing Systems (NeurIPS), 2024. [project] [arXiv] [code]
SpecMaskGIT: Masked Generative Modeling of Audio Spectrograms for Efficient Audio Synthesis and Beyond. Marco Comunità, Zhi Zhong, Akira Takahashi, Shiqi Yang, Mengjie Zhao, Koichi Saito, Yukara Ikemiya, Takashi Shibuya, Shusuke Takahashi, Yuki Mitsufuji. International Society for Music Information Retrieval (ISMIR), 2024. [arXiv]
Dynamic Prompt Learning: Addressing Cross-Attention Leakage for Text-Based Image Editing. Kai Wang, Fei Yang, Shiqi Yang, Muhammad Atif Butt, Joost van de Weijer. Advances in Neural Information Processing Systems (NeurIPS), 2023. [paper] [arXiv] [code]
Positive Pair Distillation Considered Harmful: Continual Meta Metric Learning for Lifelong Object Re-Identification. Kai Wang, Chenshen Wu, Andrew D. Bagdanov, Xialei Liu, Shiqi Yang, Shangling Jui, Joost van de Weijer. British Machine Vision Conference (BMVC), 2022. [arXiv] [code]
Attracting and Dispersing: A Simple Approach for Source-free Domain Adaptation. Shiqi Yang, Yaxing Wang, Kai Wang, Shangling Jui, Joost van de Weijer. NeurIPS, 2022. (Spotlight) [project] [paper] [arXiv] [code]
Exploiting the Intrinsic Neighborhood Structure for Source-free Domain Adaptation. Shiqi Yang, Yaxing Wang, Joost van de Weijer, Luis Herranz, Shangling Jui. Advances in Neural Information Processing Systems (NeurIPS), 2021. [project] [paper] [arXiv] [code]
Generalized Source-free Domain Adaptation. Shiqi Yang, Yaxing Wang, Joost van de Weijer, Luis Herranz, Shangling Jui. International Conference on Computer Vision (ICCV), 2021. [project] [paper] [arXiv] [code] [video]
Parallel Convolutional Networks for Image Recognition via a Discriminator. Shiqi Yang, Gang Peng. Asian Conference on Computer Vision (ACCV), 2018. [paper] [arXiv]
Attention to Refine Through Multi Scales for Semantic Segmentation. Shiqi Yang, Gang Peng. Pacific-Rim Conference on Multimedia (PCM), 2018. [paper] [arXiv]

Journal

GLOV: Guided Large Language Models as Implicit Optimizers for Vision Language Models. M. Jehanzeb Mirza, Mengjie Zhao, Zhuoyuan Mao, Sivan Doveh, Wei Lin, Paul Gavrikov, Michael Dorkenwald, Shiqi Yang, Saurav Jha, Hiromi Wakaki, Yuki Mitsufuji, Horst Possegger, Rogerio Feris, Leonid Karlinsky, James Glass. Transactions on Machine Learning Research (TMLR), 2025. [arXiv]
Trust your Good Friends: Source-free Domain Adaptation by Reciprocal Neighborhood Clustering. Shiqi Yang, Yaxing Wang, Joost van de Weijer, Luis Herranz, Shangling Jui, Jian Yang. IEEE Transactions on Pattern Analysis and Machine Intelligence (TPAMI), 2023. [paper] [arXiv]
Casting a BAIT for Offline and Online Source-free Domain Adaptation. Shiqi Yang, Yaxing Wang, Luis Herranz, Shangling Jui, Joost van de Weijer. Computer Vision and Image Understanding (CVIU), 2023. [paper] [arXiv] [code]
On Implicit Attribute Localization for Generalized Zero-Shot Learning. Shiqi Yang, Kai Wang, Luis Herranz, Joost van de Weijer. IEEE Signal Processing Letters, 2021. [paper] [arXiv]

Preprints and workshop papers

Probing the effectiveness of World Models for Spatial Reasoning through Test-time Scaling. Saurav Jha, M. Jehanzeb Mirza, Wei Lin, Shiqi Yang, Sarath Chandar. World Modeling Workshop 2026. [arXiv]
EchoDistill: Bidirectional Concept Distillation for One-Step Diffusion Personalization. Yixiong Yang, Tao Wu, Senmao Li, Shiqi Yang, Yaxing Wang, Joost van de Weijer, Kai Wang. preprint, 2025. [arXiv]
OpenMU: Your Swiss Army Knife for Music Understanding. Mengjie Zhao, Zhi Zhong, Zhuoyuan Mao, Shiqi Yang, Wei-Hsiang Liao, Shusuke Takahashi, Hiromi Wakaki, Yuki Mitsufuji. preprint, 2024. [arXiv] [code]
Visual Echoes: A Simple Unified Transformer for Audio-Visual Generation. Shiqi Yang, Zhi Zhong, Mengjie Zhao, Shusuke Takahashi, Masato Ishii, Takashi Shibuya, Yuki Mitsufuji. preprint, 2024. [arXiv] [demo]
MaTe3D: Mask-guided Text-based 3D-aware Portrait Editing. Kangneng Zhou, Daiheng Gao, Xuan Wang, Jie Zhang, Peng Zhang, Xusen Sun, Longhao Zhang, Shiqi Yang, Bang Zhang, Liefeng Bo, Yaxing Wang. preprint, 2023. [arXiv]
A Critical Look at the Current Usage of Foundation Model for Dense Recognition Task. Shiqi Yang, Atsushi Hashimoto, Yoshitaka Ushiku. preprint, 2023. [arXiv]
OneRing: A Simple Method for Source-free Open-partial Domain Adaptation. Shiqi Yang, Yaxing Wang, Kai Wang, Shangling Jui, Joost van de Weijer. preprint, 2022. [project] [arXiv] [code]