Biography
I am a second-year Ph.D. student in the Department of Computer Science at the University of Maryland, College Park, advised by Prof. Ang Li. Previously, I was a research intern at the Massachusetts Institute of Technology (CSAIL@MIT), advised by Dr. Tianlong Chen. Before that, I was a research assistant at the Research Institute of Intelligent Complex Systems at Fudan University, supervised by Prof. Siqi Sun, and a research intern at JD Explore Academy, supervised by Dr. Liang Ding and Prof. Dacheng Tao. My research interests lie primarily in deep learning, model compression, natural language processing (NLP), and AI + X (e.g., health, finance).
News
[10/2023]: One paper (Merging Experts into One) is accepted by EMNLP 2023.
[05/2023]: One paper (PAD-Net) is accepted by ACL 2023.
[04/2023]: One paper (NeuralSlice) is accepted by ICML 2023.
[10/2022]: One paper (SparseAdapter) is accepted by EMNLP 2022.
[08/2022]: One paper (SD-Conv) is accepted by WACV 2023.
[07/2022]: 🏆 Ranked 1st (Chinese<=>English, German<=>English, Czech<=>English, English=>Russian), 2nd (Russian=>English, Japanese=>English), and 3rd (English=>Japanese) in General Translation Task in WMT 2022.
[01/2022]: One paper is accepted by AAAI-22 KDF.
Research Experience
- CSAIL, Massachusetts Institute of Technology (11/2023 - 04/2024): Efficient ML
- IICS, Fudan University (07/2022 - 03/2023): AI for Protein, Computational Biology
- NLP Group, JD Explore Academy (02/2022 - 10/2022): Machine Learning, Efficient Methods for NLP
Selected Publications
- Shwai He*, Guoheng Sun*, Zheyu Shen, Ang Li, “What Matters in Transformers? Not All Attention is Needed”, arXiv. [Paper] [Code]
- Shwai He*, Daize Dong*, Liang Ding, Ang Li, “Demystifying the Compression of Mixture-of-Experts Through a Unified Framework”, arXiv. [Paper] [Code]
- Shwai He, Ang Li, Tianlong Chen, “Rethinking Pruning for Vision-Language Models: Strategies for Effective Sparsity and Performance Restoration”, arXiv. [Paper] [Code]
- Shwai He, Run-Ze Fan, Liang Ding, Li Shen, Tianyi Zhou, Dacheng Tao, “Merging Experts into One: Improving Computational Efficiency of Mixture of Experts”, Proceedings of The 2023 Conference on Empirical Methods in Natural Language Processing (EMNLP 2023 Oral). [Paper] [Code]
- Shwai He, Liang Ding, Daize Dong, Boan Liu, Fuqiang Yu, Dacheng Tao, “PAD-Net: An Efficient Framework for Dynamic Networks”, Proceedings of The 61st Annual Meeting of the Association for Computational Linguistics (ACL 2023). [Paper] [Code]
- Shwai He, Liang Ding, Daize Dong, Miao Zhang, Dacheng Tao, “SparseAdapter: An Easy Approach for Improving the Parameter-Efficiency of Adapters”, Findings of The 2022 Conference on Empirical Methods in Natural Language Processing (EMNLP 2022). [Paper] [Code]
- Shwai He, Chenbo Jiang, Daize Dong, Liang Ding, “SD-Conv: Towards the Parameter-Efficiency of Dynamic Convolution”, IEEE/CVF Winter Conference on Applications of Computer Vision, 2023 (WACV 2023). [Paper]
- Shwai He, Shi Gu, “Multi-modal Attention Network for Stock Movements Prediction”, The AAAI-22 Workshop on Knowledge Discovery from Unstructured Data in Financial Service (KDF 2022). [Paper]
- Chenbo Jiang, Jie Yang, Shwai He, Yu-Kun Lai, Lin Gao, “NeuralSlice: Neural 3D Triangle Mesh Reconstruction via Slicing 4D Tetrahedral Meshes”, Proceedings of the 40th International Conference on Machine Learning, 2023 (ICML 2023). [Paper] [Code]
- Changtong Zan, Keqin Peng, Liang Ding, Baopu Qiu, Boan Liu, Shwai He, Qingyu Lu, Zheng Zhang, Chuang Liu, Weifeng Liu, Yibing Zhan, Dacheng Tao, “Vega-MT: The JD Explore Academy Translation System for WMT”, The Conference on Machine Translation, 2022 (WMT 2022). [Paper]