Attention and Transformers for Vision

Title: Attention and Transformers for Vision

Date & Time: 14:00 – 16:00, December 26, 2021

Location: Prince Ballroom

Biography: Prof. Xi Li, IET Fellow and IEEE Senior Member, received the Zhejiang Provincial Science Foundation for Outstanding Young Scholars and was appointed a distinguished expert by the Zhejiang Province government. He has also served as a council member of the China Image and Graphics Society. He has extensive academic service experience as a program committee member for top conferences (e.g., NIPS, ICML, CVPR, ICCV) and as a reviewer for premier journals (e.g., IEEE TPAMI, IJCV, IEEE TIP), and he has given invited talks at well-known conferences in China and abroad (e.g., RACV 2016, ICSW 2017, ICDS 2017, IEEE FMT 2018). His research focuses on computer vision and machine learning; he has published approximately 150 papers in top conferences and leading journals, with about 4,600 Google Scholar citations. He has held many academic roles in conference organization (e.g., PRCV 2019 AC, ICPR 2018 AC, IJCAI 2019 SPC, ICCV 2019 AC, CVPR 2020 AC, ICPR 2020 AC) and in journal editorial management (e.g., associate editor of IEEE TNNLS, IEEE TCSVT, Neurocomputing, and Neural Processing Letters). He has won two Best International Conference Paper Awards (ACCV 2010 and DICTA 2012), an ICIP 2015 Top 10% Conference Paper Award, and an ACML 2017 Best Student Paper Award. In addition, he has won two China Natural Science and Technology Awards (one first-class and one second-class prize) and a Chinese Patent Excellence Award.

Abstract: The Internet and the Internet of Things have given rise to massive volumes of image and video data, and artificial intelligence techniques are urgently needed to extract knowledge from these data effectively. In the knowledge economy, AI-driven visual computing has therefore become a core technical problem of the highest priority. This lecture focuses on data-driven visual feature learning from large-scale image and video data using attention mechanisms and transformers. It introduces and analyzes the main research problems and technical methods in large-scale visual feature learning, covering the visual perception characteristics of targets, visual feature representation, the mechanisms for constructing deep learners, and high-level semantic understanding. It also systematically reviews the development of visual feature representation and learning, and presents representative recent works and practical applications that use visual feature learning for visual semantic analysis and understanding. The lecture concludes with a discussion of open problems and challenges in visual feature learning.