ICIG 2021 Symposium on Image Representation and Understanding

Date & Time: 15:40 – 17:40, December 28, 2021   Location: Meeting Room 1

Intro: Image-related research has attracted significant attention over the past decades. Great efforts have been made to process and analyze massive visual data, including both images and videos. Undoubtedly, image processing, from low-level representation to high-level understanding, will remain a popular topic in the foreseeable future. This workshop invites four speakers to share their recent contributions and novel ideas in image representation and understanding.


Organizers: Tiesong Zhao (Fuzhou University) and Xu Wang (Shenzhen University)




Bridging Vision and Language: Recent Advances in Visual Translation

Hanli Wang


Mask-convolution for restoring over-exposure of solar image

Long Xu


Recent researches in Interactive Video

You Yang


Deep Learning based Video Coding: Challenges and Opportunities

Yun Zhang


Title: Bridging Vision and Language: Recent Advances in Visual Translation

Speaker: Hanli Wang, Tongji University

Abstract: Automatically translating an image or a video into natural language is interesting and promising, but challenging. The task is to summarize the visual content of an image or video and re-express it with appropriate words, grammar, and sentence patterns, in keeping with human habits of expression. Nowadays, the encoder-decoder pipeline is the most commonly used framework for this goal. In particular, a convolutional neural network is used as the encoder to extract the semantics of images or videos, while a recurrent neural network is employed as the decoder to generate the word sequence. In this talk, the literature on image and video description is first reviewed, and preliminary research advances are then introduced, including visual captioning, visual storytelling, visual dense captioning, visual sentiment captioning, and the more complex visual paragraph description.

Biography: Hanli Wang received the B.S. and M.S. degrees in Electrical Engineering from Zhejiang University, Hangzhou, China, in 2001 and 2004, respectively, and the Ph.D. degree in Computer Science from City University of Hong Kong, Kowloon, Hong Kong, in 2007. From 2007 to 2008, he was a Research Fellow with the Department of Computer Science, City University of Hong Kong. From 2007 to 2008, he was also a Visiting Scholar with Stanford University, Palo Alto, CA. From 2008 to 2009, he was a Research Engineer with Precoad, Inc., Menlo Park, CA. From 2009 to 2010, he was an Alexander von Humboldt Research Fellow at the University of Hagen, Hagen, Germany. Since 2010, he has been a full Professor with the Department of Computer Science and Technology, Tongji University, Shanghai, China. His research interests include multimedia signal processing, computer vision, and machine learning. He has published more than 160 research papers in these fields. His personal website is at https://mic.tongji.edu.cn.


Title: Mask-convolution for restoring over-exposure of solar image

Speaker: Long Xu, Chinese Academy of Sciences

Abstract: Over-exposure may occur in solar imaging when extremely violent solar bursts take place: the signal intensity goes beyond the dynamic range of the imaging system, resulting in information loss. Although over-exposure can be alleviated somewhat by reducing the exposure time during flares, it cannot be eliminated completely. Recently, thanks to deep learning, many traditional image processing and reconstruction problems have seen breakthroughs, including image inpainting. Over-exposure recovery is similar to image inpainting. In this talk, we present a learning-based model, namely the mask-pix2pix network, for recovering/completing over-exposed regions of solar images. The proposed model is built on the pix2pix GAN, so it has the form of a GAN, where the generator and discriminator are a U-net and a PatchGAN, respectively. Beyond conventional pix2pix, it introduces a new convolution operator, namely mask-convolution, which is specially designed for inpainting tasks. The operator is designed both to perform block-wise convolution and to eliminate the interference of invalid pixels in the masked region, and it highlights the masked regions while the damaged image is being repaired. Experimental results validate the advantage of the proposed model in the recovery of over-exposed solar images.
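The abstract does not define the mask-convolution operator. As a rough illustration of the general idea only (convolving while ignoring invalid, over-exposed pixels), the sketch below assumes a partial-convolution-style formulation: each output pixel is computed from the valid pixels in its window and rescaled by the fraction of the window that is valid. The function name masked_conv2d, the rescaling rule, and the mask-update rule are all assumptions made for illustration, not the speaker's implementation.

```python
def masked_conv2d(image, mask, kernel):
    """Illustrative masked convolution (assumed formulation, not the talk's operator).

    image, mask, kernel are lists of lists; mask[y][x] is 1 for a valid pixel
    and 0 for an invalid (e.g. over-exposed) one. Pixels outside the image are
    treated as invalid. Returns (output, updated_mask), both the size of image.
    """
    h, w = len(image), len(image[0])
    kh, kw = len(kernel), len(kernel[0])
    ph, pw = kh // 2, kw // 2
    out = [[0.0] * w for _ in range(h)]
    new_mask = [[0] * w for _ in range(h)]
    for i in range(h):
        for j in range(w):
            acc = 0.0
            valid = 0
            for di in range(kh):
                for dj in range(kw):
                    y, x = i + di - ph, j + dj - pw
                    if 0 <= y < h and 0 <= x < w and mask[y][x]:
                        acc += kernel[di][dj] * image[y][x]
                        valid += 1
            if valid:
                # Rescale so a window with few valid pixels is not darkened.
                out[i][j] = acc * (kh * kw) / valid
                # A pixel becomes valid once any valid neighbour reaches it.
                new_mask[i][j] = 1
    return out, new_mask


# Toy example: a flat image whose centre pixel is saturated and masked out.
image = [[2.0, 2.0, 2.0], [2.0, 255.0, 2.0], [2.0, 2.0, 2.0]]
mask = [[1, 1, 1], [1, 0, 1], [1, 1, 1]]
mean_kernel = [[1.0 / 9.0] * 3 for _ in range(3)]
restored, updated_mask = masked_conv2d(image, mask, mean_kernel)
```

In this toy case the saturated value never enters the sum, so the centre is filled from its eight valid neighbours and the hole in the mask closes after one pass.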

Biography: Long Xu received his M.S. degree in applied mathematics from Xidian University, Xi’an, China, in 2002, and the Ph.D. degree from the Institute of Computing Technology, Chinese Academy of Sciences, Beijing, China. He was a Postdoc with the Department of Computer Science, City University of Hong Kong, and the Department of Electronic Engineering, Chinese University of Hong Kong, from July Aug. 2009 to Dec. 2012. From Jan. 2013 to March 2014, he was a Postdoc with the School of Computer Engineering, Nanyang Technological University, Singapore. Currently, he is with the Key Laboratory of Solar Activity, National Astronomical Observatories, Chinese Academy of Sciences. His research interests include image/video processing, wavelets, machine learning, and computer vision. He was selected for the 100-Talents Plan of the Chinese Academy of Sciences in 2014.


Title: Recent researches in Interactive Video

Speaker: You Yang, Huazhong University of Science and Technology

Abstract: Video services account for more than 80% of data transmission over the Internet, and how to bring more immersive experiences to audiences has become a challenging task for both academia and industry. Viewers want to feel involved in the video, just as in the visual experience of their daily life. In this case, more dimensions of interaction should be considered, including viewing angle, illumination conditions, focal length, etc. So far, only a limited number of interaction dimensions have been put into application by industry or taken into consideration by MPEG. In this talk, we present recent research progress on the processing chain, from data capture at the source to interaction at the terminal. Subtopics include multiview video capture and calibration, illumination coding and capture, and focal stack images and their coding schemes.

Biography: You Yang received the Ph.D. degree in computer science from the Institute of Computing Technology, Chinese Academy of Sciences, Beijing, China, in 2009. Since 2013, he has been with the School of Electronic Information and Communications, Huazhong University of Science and Technology, Wuhan, China, and also with the Division of Intelligent Media and Fiber Communications, Wuhan National Laboratory for Opto-Electronics, Wuhan, where he is currently a Professor and the Head of the Information Engineering Department. Before that, he worked as a Postdoctoral Fellow with the Automation Department, Tsinghua University, from 2009 to 2011, and as a Senior Research Scientist with Sumavision Research, from 2011 to 2013. He has authored or coauthored over 90 peer-reviewed articles and holds 26 authorized patents. His research interests include three-dimensional (3D) vision systems and their applications, including multiview imaging systems, 3D/VR/AR content processing and visual communications, human-machine interaction techniques, and interactive visual applications. Dr. Yang is a Fellow of the Institution of Engineering and Technology (IET), United Kingdom. He was Highly Commended in the 2020 E&T Innovation Awards for Outstanding Innovation in Communication & IT. He has been a Committee/TPC Member or Session Chair of over 30 international conferences, including ICME, ICASSP, VCIP, ICIMCS, MMM, and others, and a Reviewer for 33 prestigious international journals from IEEE, ACM, OSA, and other associations. He has served as the General Secretary of the Image and Video Communication Technical Committee of CSIG since 2020. He was invited to be a Judge of the IET Innovation Awards in 2019. He was a Guest Editor of Neurocomputing from 2014 to 2016. He has been an Associate Editor of Journal of Electronic Imaging since 2021, IEEE Access since 2018, IET Image Processing since 2018, and PLoS ONE since 2015.


Title: Deep Learning based Video Coding: Challenges and Opportunities

Speaker: Yun Zhang, Shenzhen Institute of Advanced Technology

Abstract: Due to the rapid growth of video applications and the increasing demand for higher-quality video services, such as UHD/HDR, 3D and VR, video data volume has been increasing explosively worldwide, which has become the most severe challenge for multimedia computing, transmission and storage. Video coding, such as HEVC and VVC, which compresses videos into a much smaller size for transmission or storage, is one of the key solutions; however, its development has become saturated to some extent after the compression ratio has grown continuously over the last three decades. Deep learning algorithms provide new opportunities for further upgrading video coding algorithms. In this talk, our recent progress on deep learning-based video coding will be introduced. Firstly, intra prediction in video coding is formulated as an inpainting task, and Generative Adversarial Network (GAN) based intra video coding is developed to achieve higher coding efficiency. Secondly, chroma prediction in video coding is formulated as an image colorization task, and deep learning-based chroma predictive coding is proposed. Experimental results on HEVC and VVC are given to validate the effectiveness of the proposed deep learning-based coding optimizations. Finally, challenging issues and opportunities will be identified.

Biography: Yun Zhang received the Ph.D. degree in Computer Science from the Institute of Computing Technology (ICT), Chinese Academy of Sciences (CAS), Beijing, China, in 2010. From 2010 to 2017, he was an Assistant Professor and then an Associate Professor at the Shenzhen Institute of Advanced Technology (SIAT), CAS, Shenzhen, China, where he is currently a Professor. His research interests are in the field of multimedia communications and visual signal processing, including video compression, computational visual perception, VR/AR, and machine learning. Prof. Zhang has published 1 book and over 100 high-quality scientific research papers, more than 40 of which appeared in top IEEE/ACM Transactions, such as IEEE Trans. Image Process., IEEE Trans. Broadcast., IEEE Trans. Circuits Syst. Video Technol., IEEE Trans. Indust. Electronics, and IEEE Trans. Indust. Informatics. In addition, he has filed over 40 CN/US/PCT patents on visual signal processing, more than 20 of which have been granted. He is a Senior Member of IEEE, and serves as an Associate Editor of IEEE Access and Electronics Letters, a Topic Editor of Sensors, and a Guest Editor of the Special Issue “Advances in Deep-Learning-Based Sensing, Imaging, and Video Processing” in Sensors.