Shiqi Yang
Visiting Researcher, Nankai University, China
Since 2024.12, I have been a researcher and team leader (as well as a research manager) of the Creative Vision team at SB Intuitions in Tokyo, and I am also affiliated with SoftBank Corp. From 2023.10 to 2024.11, I worked as an audio-visual research scientist at Sony Group Corporation, Tokyo. Before that, I was a Ph.D. student in the Learning and Machine Perception (LAMP) team (2019.10 – 2023.7), advised by Joost van de Weijer at the Computer Vision Center, Autonomous University of Barcelona, Spain.
- Currently, I am leading industrial projects (pretraining and post-training) in visual (image and video) generation and manipulation.
- I worked on multi-modal (especially audio-visual) generation while at Sony.
- During my Ph.D., I focused on efficiently adapting pretrained models to real-world environments under domain and category shift in an unsupervised manner; related research topics include zero-shot learning and source-free / test-time / continual / open-set domain adaptation.
News
- [2025.11] Serving as an area chair for ICML 2026.
- [2025.9] 2 papers are accepted by NeurIPS 2025.
- [2025.5] We will host the 2nd workshop on Audio-Visual Generation and Learning (AVGenL) at ICCV 2025. We will have one industrial session this year: Veo 3 from Google DeepMind. Stay tuned for more details.
- [2025.2] "One-way ticket" is accepted by CVPR 2025.
- [2025.1] "Mining Your Own Secrets", "InternLCM" and "One-Prompt-One-Story" (spotlight) are accepted by ICLR 2025.
- [2024.10] Gave visiting talks at MICC Lab (Prof. Andrew Bagdanov) at the University of Florence and MHUG Lab (Prof. Nicu Sebe) at the University of Trento.
- [2024.9] Our paper "Faster Diffusion: Rethinking the Role of UNet Encoder in Diffusion Models" is accepted by NeurIPS 2024.
- [2024.4] We are organizing an ECCV 2024 workshop "AVGenL: Audio-Visual Generation and Learning". Please check the site for CfP and speakers.
- [2023.12] My doctoral thesis received "Pioneer Awards 2023 - CERCA".
- [2023.9] Our paper "Dynamic Prompt Learning: Addressing Cross-Attention Leakage for Text-Based Image Editing" is accepted by NeurIPS 2023.
- [2023.8] Extended version of "NRC" is accepted by IEEE TPAMI.
- [2023.6] "Casting a BAIT for Offline and Online Source-free Domain Adaptation" is finally accepted by CVIU.
- [2023.1] Gave a visiting talk in Prof. Maria Brbic's group at EPFL.
- [2022.11] I will present our work on model adaptation under domain and category shift at the TrustML Young Scientist Seminars (hosted by RIKEN AIP) on Dec. 7.
- [2022.9] "Attracting and Dispersing: A Simple Approach for Source-free Domain Adaptation" is accepted by NeurIPS 2022 as Spotlight, and our paper "Positive Pair Distillation Considered Harmful: Continual Meta Metric Learning for Lifelong Object Re-Identification" is accepted by BMVC 2022.
- [2021.9] "Exploiting the Intrinsic Neighborhood Structure for Source-free Domain Adaptation" is accepted by NeurIPS 2021.
- [2021.7] "Generalized Source-free Domain Adaptation" is accepted by ICCV 2021.
Experience
- Dec. 2024 – present: (Chief) Research Scientist / Team Leader, Research Manager, SB Intuitions, SoftBank, Tokyo, Japan.
- Oct. 2023 – Nov. 2024: Research Scientist, Sony Group Corporation, Tokyo, Japan.
- Jan. 2023 – Jun. 2023: Research Intern, OMRON SINIC X, Tokyo, Japan.
- Oct. 2018 – Mar. 2019: Guest Research Associate, Kyoto University, Japan.
Invited Talks, Awards & Activities
- Visiting talks at MICC Lab (Prof. Andrew Bagdanov) at the University of Florence and MHUG Lab (Prof. Nicu Sebe) at the University of Trento, Italy, 2024.10.
- Pioneer Awards 2023, CERCA Research Center of Catalonia, Spain, 2023.12.
- Visiting talk in Prof. Maria Brbic's group at EPFL, Switzerland, 2023.1.
- Invited talk at the TrustML Young Scientist Seminars, RIKEN AIP, Japan, 2022.12.
- Participation in ICVSS Summer School, Sicily, Italy, 2022.7.
- Invited talk at the AI Time Seminar on NeurIPS 2021 (Virtual), China, 2022.2.
Academic Service
- Conference Area Chair: ICML 2026.
- Guest Editor: IJCV Special Issue "Audio-Visual Generation".
- Organizer: ECCV 2024 / ICCV 2025 "Audio-Visual Generation and Learning" workshop.
- Conference Reviewer: ICLR; ICCV; NeurIPS; ECCV; ICML; CVPR; WACV.
- Journal Reviewer: IEEE TKDE; TPAMI; TAI; IJCV.
Education
- Oct. 2019 – Jul. 2023: Ph.D. in Computer Science, Computer Vision Center, Autonomous University of Barcelona, Spain.
- Sep. 2016 – Jun. 2019: Master in Control Science and Technology, Huazhong University of Science and Technology, China.
- Sep. 2012 – Jun. 2016: Bachelor in Automation, Wuhan University of Science and Technology, China.
Contact
Contact: shiqi.yang147.jp@gmail.com
International Conference
- Free-Lunch Color-Texture Disentanglement for Stylized Image Generation. Advances in Neural Information Processing Systems (NeurIPS), 2025. [arXiv]
- From Cradle to Cane: A Two-Pass Framework for High-Fidelity Lifespan Face Aging. NeurIPS, 2025. [arXiv]
- One-way ticket: Time-Independent Unified Encoder for Distilling Text-to-Image Diffusion Models. IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR), 2025. [arXiv]
- Mining Your Own Secrets: Diffusion Classifier Scores for Continual Personalization of Text-to-Image Diffusion Models. International Conference on Learning Representations (ICLR), 2025. [arXiv] [openreview] [project]
- One-Prompt-One-Story: Free-Lunch Consistent Text-to-Image Generation Using a Single Prompt. ICLR, 2025. (Spotlight) [arXiv] [openreview] [project]
- InternLCM: Low-Quality Images as Intermediate States of Latent Consistency Models for Effective Blind Face Restoration. ICLR, 2025. [arXiv] [openreview] [project]
- Faster Diffusion: Rethinking the Role of UNet Encoder in Diffusion Models. Advances in Neural Information Processing Systems (NeurIPS), 2024. [project] [arXiv] [code]
- SpecMaskGIT: Masked Generative Modeling of Audio Spectrograms for Efficient Audio Synthesis and Beyond. International Society for Music Information Retrieval (ISMIR), 2024. [arXiv]
- Dynamic Prompt Learning: Addressing Cross-Attention Leakage for Text-Based Image Editing. Advances in Neural Information Processing Systems (NeurIPS), 2023. [paper] [arXiv] [code]
- Positive Pair Distillation Considered Harmful: Continual Meta Metric Learning for Lifelong Object Re-Identification. British Machine Vision Conference (BMVC), 2022. [arXiv] [code]
- Attracting and Dispersing: A Simple Approach for Source-free Domain Adaptation. NeurIPS, 2022. (Spotlight) [project] [paper] [arXiv] [code]
- Exploiting the Intrinsic Neighborhood Structure for Source-free Domain Adaptation. Advances in Neural Information Processing Systems (NeurIPS), 2021. [project] [paper] [arXiv] [code]
- Generalized Source-free Domain Adaptation. International Conference on Computer Vision (ICCV), 2021. [project] [paper] [arXiv] [code] [video]
- Parallel Convolutional Networks for Image Recognition via a Discriminator. Asian Conference on Computer Vision (ACCV), 2018. [paper] [arXiv]
- Attention to Refine Through Multi Scales for Semantic Segmentation. Pacific-Rim Conference on Multimedia (PCM), 2018. [paper] [arXiv]
Journal
- GLOV: Guided Large Language Models as Implicit Optimizers for Vision Language Models. Transactions on Machine Learning Research (TMLR), 2025. [arXiv]
- Trust your Good Friends: Source-free Domain Adaptation by Reciprocal Neighborhood Clustering. IEEE Transactions on Pattern Analysis and Machine Intelligence (TPAMI), 2023. [paper] [arXiv]
- Casting a BAIT for Offline and Online Source-free Domain Adaptation. Computer Vision and Image Understanding (CVIU), 2023. [paper] [arXiv] [code]
- On Implicit Attribute Localization for Generalized Zero-Shot Learning. IEEE Signal Processing Letters, 2021. [paper] [arXiv]
Preprint and workshop paper
- Probing the effectiveness of World Models for Spatial Reasoning through Test-time Scaling. World Modeling Workshop 2026. [arXiv]
- EchoDistill: Bidirectional Concept Distillation for One-Step Diffusion Personalization. Preprint, 2025. [arXiv]
- OpenMU: Your Swiss Army Knife for Music Understanding. Preprint, 2024. [arXiv] [code]
- Visual Echoes: A Simple Unified Transformer for Audio-Visual Generation. Preprint, 2024. [arXiv] [demo]
- MaTe3D: Mask-guided Text-based 3D-aware Portrait Editing. Preprint, 2023. [arXiv]
- A Critical Look at the Current Usage of Foundation Model for Dense Recognition Task. Preprint, 2023. [arXiv]
- OneRing: A Simple Method for Source-free Open-partial Domain Adaptation. Preprint, 2022. [project] [arXiv] [code]