Shiqi Yang

杨 诗琪   ·   Ph.D.
Team leader / Researcher, SB Intuitions (SoftBank), Tokyo, Japan
Visiting Researcher, Nankai University, China
#Multi-Modal Generation
Location: Tokyo, Japan

Since December 2024, I have been a researcher and team leader (as well as a research manager) of the Creative Vision team at SB Intuitions in Tokyo, and I am also affiliated with SoftBank Corp. From October 2023 to November 2024, I worked as an audio-visual research scientist at Sony Group Corporation, Tokyo. Before that, I was a Ph.D. student in the Learning and Machine Perception (LAMP) team (October 2019 to July 2023), advised by Joost van de Weijer at the Computer Vision Center, Autonomous University of Barcelona, Spain.

  • Currently, I lead industrial projects (both pretraining and post-training) in visual (image and video) generation and manipulation.
  • At Sony, I worked on multi-modal (especially audio-visual) generation.
  • During my Ph.D., I focused on efficiently adapting pretrained models to real-world environments under domain and category shift in an unsupervised manner. Related research topics include zero-shot learning and source-free / test-time / continual / open-set domain adaptation.

News

Experience

  • Dec. 2024 – present
    (Chief) Research Scientist / Team Leader, Research Manager, SB Intuitions, SoftBank, Tokyo, Japan.
  • Oct. 2023 – Nov. 2024
    Research Scientist, Sony Group Corporation, Tokyo, Japan.
  • Jan. 2023 – Jun. 2023
    Research Intern, OMRON SINIC X, Tokyo, Japan.
  • Oct. 2018 – Mar. 2019
    Guest Research Associate, Kyoto University, Japan.

Invited Talks, Awards & Activities

Academic Service

Education

  • Oct. 2019 – Jul. 2023
    Ph.D. in Computer Science, Computer Vision Center, Autonomous University of Barcelona, Spain.
  • Sep. 2016 – Jun. 2019
    Master's in Control Science and Technology, Huazhong University of Science and Technology, China.
  • Sep. 2012 – Jun. 2016
    Bachelor's in Automation, Wuhan University of Science and Technology, China.

Contact

Contact: shiqi.yang147.jp@gmail.com

Full Publications
Journal articles, preprints, and international conference papers.

International Conference

  • Free-Lunch Color-Texture Disentanglement for Stylized Image Generation Jiang Qin, Senmao Li, Alexandra Gomez-Villa, Shiqi Yang, Yaxing Wang, Kai Wang, Joost van de Weijer Advances in Neural Information Processing Systems (NeurIPS), 2025. [arXiv]
  • From Cradle to Cane: A Two-Pass Framework for High-Fidelity Lifespan Face Aging Tao Liu, Dafeng Zhang, Gengchen Li, Shizhuo Liu, Yongqi Song, Senmao Li, Shiqi Yang, Boqian Li, Kai Wang, Yaxing Wang NeurIPS, 2025. [arXiv]
  • One-way ticket: Time-Independent Unified Encoder for Distilling Text-to-Image Diffusion Models Senmao Li, Lei Wang, Kai Wang, Tao Liu, Jiehang Xie, Joost van de Weijer, Fahad Shahbaz Khan, Shiqi Yang, Yaxing Wang, Jian Yang IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR), 2025. [arXiv]
  • Mining Your Own Secrets: Diffusion Classifier Scores for Continual Personalization of Text-to-Image Diffusion Models Saurav Jha, Shiqi Yang*, Masato Ishii, Mengjie Zhao, Christian Simon, Muhammad Jehanzeb Mirza, Dong Gong, Lina Yao, Shusuke Takahashi, Yuki Mitsufuji International Conference on Learning Representations (ICLR), 2025. [arXiv] [openreview] [project]
  • One-Prompt-One-Story: Free-Lunch Consistent Text-to-Image Generation Using a Single Prompt Tao Liu, Kai Wang, Senmao Li, Joost van de Weijer, Fahad Shahbaz Khan, Shiqi Yang, Yaxing Wang, Jian Yang, Ming-Ming Cheng ICLR, 2025. (Spotlight) [arXiv] [openreview] [project]
  • InternLCM: Low-Quality Images as Intermediate States of Latent Consistency Models for Effective Blind Face Restoration Senmao Li, Kai Wang, Joost van de Weijer, Fahad Shahbaz Khan, Chun-Le Guo, Shiqi Yang, Yaxing Wang, Jian Yang, Ming-Ming Cheng ICLR, 2025. [arXiv] [openreview] [project]
  • Faster Diffusion: Rethinking the Role of UNet Encoder in Diffusion Models Senmao Li, Taihang Hu, Fahad Shahbaz Khan, Linxuan Li, Shiqi Yang, Yaxing Wang, Ming-Ming Cheng, Jian Yang Advances in Neural Information Processing Systems (NeurIPS), 2024. [project] [arXiv] [code]
  • SpecMaskGIT: Masked Generative Modeling of Audio Spectrograms for Efficient Audio Synthesis and Beyond Marco Comunità, Zhi Zhong, Akira Takahashi, Shiqi Yang, Mengjie Zhao, Koichi Saito, Yukara Ikemiya, Takashi Shibuya, Shusuke Takahashi, Yuki Mitsufuji International Society for Music Information Retrieval (ISMIR), 2024. [arXiv]
  • Dynamic Prompt Learning: Addressing Cross-Attention Leakage for Text-Based Image Editing Kai Wang, Fei Yang, Shiqi Yang, Muhammad Atif Butt, Joost van de Weijer Advances in Neural Information Processing Systems (NeurIPS), 2023. [paper] [arXiv] [code]
  • Positive Pair Distillation Considered Harmful: Continual Meta Metric Learning for Lifelong Object Re-Identification Kai Wang, Chenshen Wu, Andrew D. Bagdanov, Xialei Liu, Shiqi Yang, Shangling Jui, Joost van de Weijer British Machine Vision Conference (BMVC), 2022. [arXiv] [code]
  • Attracting and Dispersing: A Simple Approach for Source-free Domain Adaptation Shiqi Yang, Yaxing Wang, Kai Wang, Shangling Jui, Joost van de Weijer NeurIPS, 2022. (Spotlight) [project] [paper] [arXiv] [code]
  • Exploiting the Intrinsic Neighborhood Structure for Source-free Domain Adaptation Shiqi Yang, Yaxing Wang, Joost van de Weijer, Luis Herranz, Shangling Jui Advances in Neural Information Processing Systems (NeurIPS), 2021. [project] [paper] [arXiv] [code]
  • Generalized Source-free Domain Adaptation Shiqi Yang, Yaxing Wang, Joost van de Weijer, Luis Herranz, Shangling Jui International Conference on Computer Vision (ICCV), 2021. [project] [paper] [arXiv] [code] [video]
  • Parallel Convolutional Networks for Image Recognition via a Discriminator Shiqi Yang, Gang Peng Asian Conference on Computer Vision (ACCV), 2018. [paper] [arXiv]
  • Attention to Refine Through Multi Scales for Semantic Segmentation Shiqi Yang, Gang Peng Pacific-Rim Conference on Multimedia (PCM), 2018. [paper] [arXiv]

Journal

  • GLOV: Guided Large Language Models as Implicit Optimizers for Vision Language Models M. Jehanzeb Mirza, Mengjie Zhao, Zhuoyuan Mao, Sivan Doveh, Wei Lin, Paul Gavrikov, Michael Dorkenwald, Shiqi Yang, Saurav Jha, Hiromi Wakaki, Yuki Mitsufuji, Horst Possegger, Rogerio Feris, Leonid Karlinsky, James Glass Transactions on Machine Learning Research (TMLR), 2025. [arXiv]
  • Trust your Good Friends: Source-free Domain Adaptation by Reciprocal Neighborhood Clustering Shiqi Yang, Yaxing Wang, Joost van de Weijer, Luis Herranz, Shangling Jui, Jian Yang IEEE Transactions on Pattern Analysis and Machine Intelligence (TPAMI), 2023. [paper] [arXiv]
  • Casting a BAIT for Offline and Online Source-free Domain Adaptation Shiqi Yang, Yaxing Wang, Luis Herranz, Shangling Jui, Joost van de Weijer Computer Vision and Image Understanding (CVIU), 2023. [paper] [arXiv] [code]
  • On Implicit Attribute Localization for Generalized Zero-Shot Learning Shiqi Yang, Kai Wang, Luis Herranz, Joost van de Weijer IEEE Signal Processing Letters, 2021. [paper] [arXiv]

Preprint and workshop paper

  • Probing the effectiveness of World Models for Spatial Reasoning through Test-time Scaling Saurav Jha, M Jehanzeb Mirza, Wei Lin, Shiqi Yang, Sarath Chandar World Modeling Workshop 2026. [arXiv]
  • EchoDistill: Bidirectional Concept Distillation for One-Step Diffusion Personalization Yixiong Yang, Tao Wu, Senmao Li, Shiqi Yang, Yaxing Wang, Joost van de Weijer, Kai Wang preprint, 2025. [arXiv]
  • OpenMU: Your Swiss Army Knife for Music Understanding Mengjie Zhao, Zhi Zhong, Zhuoyuan Mao, Shiqi Yang, Wei-Hsiang Liao, Shusuke Takahashi, Hiromi Wakaki, Yuki Mitsufuji preprint, 2024. [arXiv] [code]
  • Visual Echoes: A Simple Unified Transformer for Audio-Visual Generation Shiqi Yang, Zhi Zhong, Mengjie Zhao, Shusuke Takahashi, Masato Ishii, Takashi Shibuya, Yuki Mitsufuji preprint, 2024. [arXiv] [demo]
  • MaTe3D: Mask-guided Text-based 3D-aware Portrait Editing Kangneng Zhou, Daiheng Gao, Xuan Wang, Jie Zhang, Peng Zhang, Xusen Sun, Longhao Zhang, Shiqi Yang, Bang Zhang, Liefeng Bo, Yaxing Wang preprint, 2023. [arXiv]
  • A Critical Look at the Current Usage of Foundation Model for Dense Recognition Task Shiqi Yang, Atsushi Hashimoto, Yoshitaka Ushiku preprint, 2023. [arXiv]
  • OneRing: A Simple Method for Source-free Open-partial Domain Adaptation Shiqi Yang, Yaxing Wang, Kai Wang, Shangling Jui, Joost van de Weijer preprint, 2022. [project] [arXiv] [code]