Shiqi Yang

杨 诗琪   ·   Ph.D.
Research Scientist
Team leader of Creative Vision Team
Director of Multimodal AI Department
SB Intuitions Corp. (SoftBank Corp.), Tokyo, Japan

Since April 2026, I have been the Director and General Manager of the Multimodal AI Department at SB Intuitions, a SoftBank R&D company in Tokyo. The department is an application-oriented research and development unit that includes the Creative Vision Team and the Conversational Speech Team. I also continue to work as a research scientist. Since December 2024, I have led the Creative Vision Team as a research scientist, team leader, and research manager. From October 2023 to November 2024, I worked as an audio-visual research scientist at Sony Group Corporation in Tokyo. Before that, I was a Ph.D. student in the Learning and Machine Perception (LAMP) team from October 2019 to July 2023, advised by Joost van de Weijer at the Computer Vision Center, Autonomous University of Barcelona, Spain.

Currently, I lead industrial projects on visual generation and manipulation. I have broad, hands-on experience in image generation and manipulation, covering both pre-training and post-training stages, as well as audio-visual generation, transfer learning, and continual learning.

I also actively serve the research community as an area chair for ICML and NeurIPS, a guest editor for an IJCV special issue, and the main organizer of the Workshop on Efficient Visual Generation (EVG) and the Audio-Visual Generation and Learning (AVGenL) workshop.

News

Experience

  • SB Intuitions, SoftBank, Tokyo, Japan
    Apr. 2026 – Now Director of Multimodal AI Department
    Apr. 2025 – Now Chief Research Scientist and Research Manager of Creative Vision Team
    Dec. 2024 – Mar. 2025 Lead Research Scientist
  • Sony Group Corporation, Tokyo, Japan
    Oct. 2023 – Nov. 2024 Research Scientist
  • OMRON SINIC X, Tokyo, Japan
    Jan. 2023 – Jun. 2023 Research Intern
  • Kyoto University, Japan
    Oct. 2018 – Mar. 2019 Guest Research Associate

Invited Talks, Awards & Activities

Academic Service

Education

  • Oct. 2019 – Jul. 2023
    Ph.D. in Computer Science, Computer Vision Center, Autonomous University of Barcelona, Spain.
  • Sep. 2016 – Jun. 2019
    Master's in Control Science and Technology, Huazhong University of Science and Technology, China.
  • Sep. 2012 – Jun. 2016
    Bachelor's in Automation, Wuhan University of Science and Technology, China.

Contact

Contact: shiqi.yang147.jp@gmail.com

Full Publications
Journal articles, preprints, and international conference papers.

International Conference

  • EchoDistill: Bidirectional Concept Distillation for One-Step Diffusion Personalization Yixiong Yang, Tao Wu, Senmao Li, Shiqi Yang, Yaxing Wang, Joost van de Weijer, Kai Wang CVPR 2026 Findings. [arXiv]
  • Free-Lunch Color-Texture Disentanglement for Stylized Image Generation Jiang Qin, Senmao Li, Alexandra Gomez-Villa, Shiqi Yang, Yaxing Wang, Kai Wang, Joost van de Weijer Advances in Neural Information Processing Systems (NeurIPS), 2025. [arXiv]
  • From Cradle to Cane: A Two-Pass Framework for High-Fidelity Lifespan Face Aging Tao Liu, Dafeng Zhang, Gengchen Li, Shizhuo Liu, Yongqi Song, Senmao Li, Shiqi Yang, Boqian Li, Kai Wang, Yaxing Wang NeurIPS, 2025. [arXiv]
  • One-way ticket: Time-Independent Unified Encoder for Distilling Text-to-Image Diffusion Models Senmao Li, Lei Wang, Kai Wang, Tao Liu, Jiehang Xie, Joost van de Weijer, Fahad Shahbaz Khan, Shiqi Yang, Yaxing Wang, Jian Yang IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR), 2025. [arXiv]
  • Mining Your Own Secrets: Diffusion Classifier Scores for Continual Personalization of Text-to-Image Diffusion Models Saurav Jha, Shiqi Yang*, Masato Ishii, Mengjie Zhao, Christian Simon, Muhammad Jehanzeb Mirza, Dong Gong, Lina Yao, Shusuke Takahashi, Yuki Mitsufuji International Conference on Learning Representations (ICLR), 2025. [arXiv] [openreview] [project]
  • One-Prompt-One-Story: Free-Lunch Consistent Text-to-Image Generation Using a Single Prompt Tao Liu, Kai Wang, Senmao Li, Joost van de Weijer, Fahad Shahbaz Khan, Shiqi Yang, Yaxing Wang, Jian Yang, Ming-Ming Cheng ICLR, 2025. (Spotlight) [arXiv] [openreview] [project]
  • InternLCM: Low-Quality Images as Intermediate States of Latent Consistency Models for Effective Blind Face Restoration Senmao Li, Kai Wang, Joost van de Weijer, Fahad Shahbaz Khan, Chun-Le Guo, Shiqi Yang, Yaxing Wang, Jian Yang, Ming-Ming Cheng ICLR, 2025. [arXiv] [openreview] [project]
  • Faster Diffusion: Rethinking the Role of UNet Encoder in Diffusion Models Senmao Li, Taihang Hu, Fahad Shahbaz Khan, Linxuan Li, Shiqi Yang, Yaxing Wang, Ming-Ming Cheng, Jian Yang Advances in Neural Information Processing Systems (NeurIPS), 2024. [project] [arXiv] [code]
  • SpecMaskGIT: Masked Generative Modeling of Audio Spectrograms for Efficient Audio Synthesis and Beyond Marco Comunità, Zhi Zhong, Akira Takahashi, Shiqi Yang, Mengjie Zhao, Koichi Saito, Yukara Ikemiya, Takashi Shibuya, Shusuke Takahashi, Yuki Mitsufuji International Society for Music Information Retrieval (ISMIR), 2024. [arXiv]
  • Dynamic Prompt Learning: Addressing Cross-Attention Leakage for Text-Based Image Editing Kai Wang, Fei Yang, Shiqi Yang, Muhammad Atif Butt, Joost van de Weijer Advances in Neural Information Processing Systems (NeurIPS), 2023. [paper] [arXiv] [code]
  • Positive Pair Distillation Considered Harmful: Continual Meta Metric Learning for Lifelong Object Re-Identification Kai Wang, Chenshen Wu, Andrew D. Bagdanov, Xialei Liu, Shiqi Yang, Shangling Jui, Joost van de Weijer British Machine Vision Conference (BMVC), 2022. [arXiv] [code]
  • Attracting and Dispersing: A Simple Approach for Source-free Domain Adaptation Shiqi Yang, Yaxing Wang, Kai Wang, Shangling Jui, Joost van de Weijer NeurIPS, 2022. (Spotlight) [project] [paper] [arXiv] [code]
  • Exploiting the Intrinsic Neighborhood Structure for Source-free Domain Adaptation Shiqi Yang, Yaxing Wang, Joost van de Weijer, Luis Herranz, Shangling Jui Advances in Neural Information Processing Systems (NeurIPS), 2021. [project] [paper] [arXiv] [code]
  • Generalized Source-free Domain Adaptation Shiqi Yang, Yaxing Wang, Joost van de Weijer, Luis Herranz, Shangling Jui International Conference on Computer Vision (ICCV), 2021. [project] [paper] [arXiv] [code] [video]
  • Parallel Convolutional Networks for Image Recognition via a Discriminator Shiqi Yang, Gang Peng Asian Conference on Computer Vision (ACCV), 2018. [paper] [arXiv]
  • Attention to Refine Through Multi Scales for Semantic Segmentation Shiqi Yang, Gang Peng Pacific-Rim Conference on Multimedia (PCM), 2018. [paper] [arXiv]

Journal

  • GLOV: Guided Large Language Models as Implicit Optimizers for Vision Language Models M. Jehanzeb Mirza, Mengjie Zhao, Zhuoyuan Mao, Sivan Doveh, Wei Lin, Paul Gavrikov, Michael Dorkenwald, Shiqi Yang, Saurav Jha, Hiromi Wakaki, Yuki Mitsufuji, Horst Possegger, Rogerio Feris, Leonid Karlinsky, James Glass Transactions on Machine Learning Research (TMLR), 2025. [arXiv]
  • Trust your Good Friends: Source-free Domain Adaptation by Reciprocal Neighborhood Clustering Shiqi Yang, Yaxing Wang, Joost van de Weijer, Luis Herranz, Shangling Jui, Jian Yang IEEE Transactions on Pattern Analysis and Machine Intelligence (TPAMI), 2023. [paper] [arXiv]
  • Casting a BAIT for Offline and Online Source-free Domain Adaptation Shiqi Yang, Yaxing Wang, Luis Herranz, Shangling Jui, Joost van de Weijer Computer Vision and Image Understanding (CVIU), 2023. [paper] [arXiv] [code]
  • On Implicit Attribute Localization for Generalized Zero-Shot Learning Shiqi Yang, Kai Wang, Luis Herranz, Joost van de Weijer IEEE Signal Processing Letters, 2021. [paper] [arXiv]

Preprint and workshop paper

  • Probing the effectiveness of World Models for Spatial Reasoning through Test-time Scaling Saurav Jha, M Jehanzeb Mirza, Wei Lin, Shiqi Yang, Sarath Chandar World Modeling Workshop 2026. [arXiv]
  • OpenMU: Your Swiss Army Knife for Music Understanding Mengjie Zhao, Zhi Zhong, Zhuoyuan Mao, Shiqi Yang, Wei-Hsiang Liao, Shusuke Takahashi, Hiromi Wakaki, Yuki Mitsufuji preprint, 2024. [arXiv] [code]
  • Visual Echoes: A Simple Unified Transformer for Audio-Visual Generation Shiqi Yang, Zhi Zhong, Mengjie Zhao, Shusuke Takahashi, Masato Ishii, Takashi Shibuya, Yuki Mitsufuji preprint, 2024. [arXiv] [demo]
  • MaTe3D: Mask-guided Text-based 3D-aware Portrait Editing Kangneng Zhou, Daiheng Gao, Xuan Wang, Jie Zhang, Peng Zhang, Xusen Sun, Longhao Zhang, Shiqi Yang, Bang Zhang, Liefeng Bo, Yaxing Wang preprint, 2023. [arXiv]
  • A Critical Look at the Current Usage of Foundation Model for Dense Recognition Task Shiqi Yang, Atsushi Hashimoto, Yoshitaka Ushiku preprint, 2023. [arXiv]
  • OneRing: A Simple Method for Source-free Open-partial Domain Adaptation Shiqi Yang, Yaxing Wang, Kai Wang, Shangling Jui, Joost van de Weijer preprint, 2022. [project] [arXiv] [code]