Workshop

ICIG 2021 Workshop on Emerging Technologies in Multimedia and Its Applications

Intro: Multimedia data, such as text, audio, images, and video, have rapidly become the main avenues for the creation, exchange, and storage of information in the modern era. More and more emerging technologies have been proposed, in both academia and industry, to effectively process this massive multimedia data. This workshop invites three speakers, spanning academic research and industrial application, to discuss novel ideas in multimedia and their applications.

 

Agenda

14:00-14:45 Junchi Yan, Shanghai Jiao Tong University

14:45-15:30 Linna Zhou, Beijing University of Posts and Telecommunications

15:30-15:45 Break

15:45-16:30 Junsong Wang, V-Origin Tech.

 

 

Title: On the Optimization of High-Precision Bounding Boxes in Rotation Detection

Speaker: Junchi Yan

Abstract: Many recent rotation detection algorithms suffer from problems such as inconsistency between metric and loss, boundary discontinuity, and the square-like problem, which lead to non-robust performance in high-precision detection. For the inconsistency between metric and loss, we propose the IoU-Smooth L1 loss to approximate the non-differentiable rotated IoU loss, so as to align the learning and evaluation of the model. For the boundary discontinuity, we convert the angle prediction from a regression-based form to a classification-based form and design a Circular Smooth Label (CSL) to fundamentally eliminate this problem. We also extend this idea to quadrilateral detection and propose a Modulated Loss to solve it there. For the square-like problem, we propose a Densely Coded Label (DCL) based on CSL, which effectively overcomes the problem by introducing an angle-distance-aware and aspect-ratio-aware weight. More importantly, we design a unified optimization method, the Gaussian Wasserstein distance (GWD), whose unique properties elegantly solve the above three problems without adding extra parameters or computation. Finally, we open-source a rotation detection benchmark that supports multiple methods and datasets, together with quantitative and qualitative analyses of the above algorithms.
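The GWD idea above can be made concrete: a rotated box is modeled as a 2D Gaussian (mean at the box centre, covariance built from its size and angle), and the closed-form 2-Wasserstein distance between two such Gaussians serves as the loss. The sketch below is illustrative only, not the authors' implementation; it assumes the common (cx, cy, w, h, θ) box parameterization with θ in radians and uses a 2×2 closed form so plain NumPy suffices:

```python
import numpy as np

def box_to_gaussian(cx, cy, w, h, theta):
    """Convert a rotated box (centre, size, angle in radians) to a 2D Gaussian."""
    R = np.array([[np.cos(theta), -np.sin(theta)],
                  [np.sin(theta),  np.cos(theta)]])
    S = np.diag([w * w / 4.0, h * h / 4.0])  # squared half-extents
    return np.array([cx, cy]), R @ S @ R.T

def gwd2(box1, box2):
    """Squared 2-Wasserstein distance between the Gaussians of two boxes."""
    mu1, s1 = box_to_gaussian(*box1)
    mu2, s2 = box_to_gaussian(*box2)
    # For 2x2 SPD M: Tr(M^{1/2}) = sqrt(Tr M + 2 sqrt(det M)); here
    # Tr and det of S1^{1/2} S2 S1^{1/2} equal those of S1 @ S2.
    m = s1 @ s2
    tr_sqrt = np.sqrt(np.trace(m) + 2.0 * np.sqrt(max(np.linalg.det(m), 0.0)))
    return float(np.sum((mu1 - mu2) ** 2)
                 + np.trace(s1) + np.trace(s2) - 2.0 * tr_sqrt)
```

Note that two parameterizations of the same square (θ = 0 vs. θ = π/2) give a distance of zero, which is exactly how a Gaussian-based metric sidesteps the square-like problem.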

Biography: Dr. Junchi Yan is currently an Associate Professor with Shanghai Jiao Tong University, where he is also the program manager for the prestigious SJTU ACM Class (AI direction). Before that, he was a Senior Research Staff Member (Principal Scientist for Cognitive IoT) with IBM Research – China, where he started his career in April 2011, and an adjunct professor with the School of Data Science, Fudan University. His research interests are machine learning, data mining, and computer vision. He serves as an Associate Editor for IEEE Access and IoT Discovery, a (Managing) Guest Editor for IEEE Transactions on Neural Networks and Learning Systems, Pattern Recognition Letters, and Pattern Recognition, Vice Secretary of the China CSIG-BVD Technical Committee, and a member of the executive board of the ACM China Multimedia Chapter. He has published 80+ peer-reviewed papers (CCF-A) in top AI venues and has filed 30+ US patents. He has been a visiting researcher with the IBM Thomas J. Watson Research Center, Japan NII, and the Tencent/JD AI labs. He won the Distinguished Young Scientist of Scientific Chinese award for 2018, the CAAI Wuwenjun Outstanding Young Scientist award in 2020, and the CCF Outstanding Doctoral Thesis award in 2016.

 

Title: Multimedia forensics from the perspective of content-behavior-cognition

Speaker: Linna Zhou

Abstract: As a new scientific paradigm, big data behavior analysis provides new perspectives, methods, and technologies characterized by data-driven storage and computing. This talk integrates the technologies and methods of Deepfake, trace retention, and data footprint extraction in the big data environment, and studies digital forensics based on the whole life-cycle confrontation technology chain of Deepfake and its identification forensics. As examples, it traces the development of facial forgery from facial synthesis to face swapping to face manipulation, and the development of forensic identification from the pixel layer to the three-dimensional modeling layer to the multi-clue semantic layer. The talk interprets the AI-to-AI game between Deepfake and forensics in the era of artificial intelligence.

Biography: Linna Zhou is a professor at Beijing University of Posts and Telecommunications, where she received her doctoral degree; she completed postdoctoral research at Tsinghua University. She is a recipient of the State Council special allowance and the head of a National Key Areas Innovation Team. She has been selected into the National Million Talents Project and awarded the honorary title of "young and middle-aged expert with outstanding contributions". She has also served as a committee member of national associations. She has published many high-level academic papers and four monographs in related research fields. She has presided over and completed three key and general projects of the National Natural Science Foundation of China (NSFC), three National Key Scientific Research Projects, and dozens of major ministerial-level scientific research projects. She won the second prize of the National Science and Technology Progress Award in 2006, 2009, 2016, and 2019, and its first prize in 2011.

 

 

Title: Fine-Grained Texture Identification for Reliable Product Traceability

Speaker: Junsong Wang

Abstract: Texture exists in many products, such as wood, beef, and compressed tea. These abundant, stochastic texture patterns differ significantly between any two items. Unlike traditional digital-ID tracking, we propose a novel approach for product traceability that directly uses the natural texture of the product itself as its unique identifier. A texture identification traceability system for Pu'er compressed tea is developed to demonstrate the feasibility of the proposed solution. With tea-brick images collected from manufacturers and individual users, a large-scale dataset has been formed to evaluate the performance of the tea-brick texture verification and searching algorithms. The texture similarity approach, based on local feature extraction and matching, achieves a verification accuracy of 99.6% and a top-1 searching accuracy of 98.9%.

Biography: Junsong Wang is the director of the Innovation Lab at V-Origin, a startup focused on agricultural digitization using AI and blockchain. He received his master's degree from Xi'an Jiaotong University. He worked as a research staff member at IBM China Research Lab from 2010 to 2019. His research focuses on wireless signal processing, computer vision, and deep learning accelerators. His recent work is on texture identification and searching in digital agriculture. He has published more than 10 papers in international conferences and journals, and filed 20+ patents. He received a Best Paper Award at ICCAD 2018.


ICIG 2021 Workshop on

3D Point Cloud Processing, Analysis, and Communication (PC-PAC)

Intro: One emerging representation of 3D world objects is point clouds where in addition to the spatial coordinates of points on the surface of the objects, attributes such as color, reflectance, transparency, normal direction, motion direction, and so forth are captured. Point clouds are receiving increased attention due to their potential to improve immersive experience in virtual reality, augmented reality, mixed reality, and telepresence applications. The aim of the workshop is to bring together researchers interested in recent developments in this field, stimulate discussions, and promote new approaches and solutions to current challenges. This workshop is supported by the European Union’s Horizon 2020 research and innovation programme under the Marie Skłodowska-Curie grant agreement No 836192.

 

Title: High-Efficiency Point Cloud Compression: Learnt Approach & Distortion Measurement

Abstract: This talk covers two topics in recently-emerged point cloud processing: learning-driven point cloud geometry (PCG) compression, and graph-based point cloud distortion measurement. For PCG compression, we start with the theoretical motivation and discuss the explorations along this avenue; our learnt PCG compression currently offers state-of-the-art efficiency in comparison with the standard-compliant solutions. As many studies have reported, existing point cloud distortion measurements, e.g., point-to-point or point-to-plane, are greatly limited by the unstructured appearance of point cloud geometry in 3D space; we introduce a graph-based similarity (GraphSIM) to address this problem. Extensive simulations show that GraphSIM achieves encouraging efficiency across different point cloud applications.

Zhan Ma is with the School of Electronic Science and Engineering, Nanjing University, Jiangsu, 210093, China. He received the B.S. and M.S. degrees from the Huazhong University of Science and Technology, Wuhan, China, in 2004 and 2006, respectively, and his Ph.D. degree from New York University, New York, in 2011. From 2011 to 2014, he was with Samsung Research America, Dallas, TX, and Futurewei Technologies, Inc., Santa Clara, CA, respectively. His research focuses on learning-based image/video coding and computational imaging. He is a co-winner of a 2018 PCM Best Paper Finalist, the 2019 IEEE Broadcast Technology Society Best Paper Award, and the 2020 IEEE MMSP Grand Challenge Best Image Coding Solution.

 

 

Title: Deep Regular Geometry Representations for 3D Point Clouds

Abstract: Although convolutional neural networks have achieved remarkable success in analyzing 2D images/videos, it is still non-trivial to apply the well-developed 2D techniques in regular domains to the irregular 3D point cloud data. To bridge this gap, we propose ParaNet, a novel end-to-end deep learning framework, for representing 3D point clouds in a completely regular and nearly lossless manner. To be specific, ParaNet converts an irregular 3D point cloud into a regular 2D color image, named point geometry image (PGI), where each pixel encodes the spatial coordinates of a point. In contrast to conventional regular representation modalities based on multi-view projection and voxelization, the proposed representation is differentiable and reversible. The PGIs can be seamlessly coupled with a task network established upon standard and mature techniques for 2D images/videos to realize a specific task for 3D point clouds. We evaluate ParaNet over shape classification and point cloud upsampling, in which our solutions perform favorably against the existing state-of-the-art methods. We believe such a paradigm will open up many possibilities to advance the progress of deep learning-based point cloud processing and understanding.
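To make the point geometry image (PGI) representation concrete, the toy sketch below shows the core idea: every pixel of the 2D image stores one point's (x, y, z), so the mapping is lossless and trivially reversible. ParaNet learns the pixel arrangement end-to-end; the naive raster-scan used here is a hypothetical stand-in purely to illustrate the representation:

```python
import numpy as np

def points_to_pgi(points, h, w):
    """Arrange an (N, 3) point cloud into an (h, w, 3) point geometry image.
    A naive raster scan (not ParaNet's learnt arrangement) for illustration."""
    assert points.shape == (h * w, 3), "sketch assumes N == h * w"
    return points.reshape(h, w, 3)

def pgi_to_points(pgi):
    """Invert the mapping: each pixel's three channels are one point's (x, y, z)."""
    return pgi.reshape(-1, 3)

# Lossless round trip on a random cloud of 64 points
pts = np.random.rand(64, 3)
assert np.allclose(pgi_to_points(points_to_pgi(pts, 8, 8)), pts)
```

Once in this completely regular 2D form, the image can be fed to any standard 2D network (classification, upsampling, etc.), which is the point of the paradigm.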

Junhui Hou received the B.Eng. degree in information engineering (Talented Students Program) from the South China University of Technology, Guangzhou, China, in 2009, the M.Eng. degree in signal and information processing from Northwestern Polytechnical University, Xi'an, China, in 2012, and the Ph.D. degree in electrical and electronic engineering from the School of Electrical and Electronic Engineering, Nanyang Technological University, Singapore, in 2016. He has been an Assistant Professor with the Department of Computer Science, City University of Hong Kong, since 2017. His research interests fall into the general areas of visual computing, such as image/video/3D geometry data representation, processing and analysis, semi-/un-supervised data modeling, and data compression and adaptive transmission. Dr. Hou was the recipient of several prestigious awards, including the Chinese Government Award for Outstanding Students Studying Abroad from the China Scholarship Council in 2015 and the Early Career Award (3/381) from the Hong Kong Research Grants Council in 2018. He is currently serving as an Associate Editor for IEEE Transactions on Circuits and Systems for Video Technology and The Visual Computer, an Area Editor for Signal Processing: Image Communication, and a Guest Editor for the IEEE Journal of Selected Topics in Applied Earth Observations and Remote Sensing. He also served as an Area Chair of the ACM International Conference on Multimedia (ACM MM) 2019 and 2020, and of the IEEE International Conference on Multimedia & Expo (IEEE ICME) 2020. He is a senior member of IEEE.

 

Title: Techniques and Standardization for Point Clouds

Abstract: As a typical representation format for visual media, 3D point clouds have drawn wide attention from both industry and academia. Unlike image or video pixels, which are distributed evenly over a 2D grid, point clouds are highly irregular, which puts forward new requirements for the related processing techniques. For point cloud compression, the Moving Picture Experts Group (MPEG) has established dedicated activities, namely Video-based Point Cloud Compression (V-PCC) and Geometry-based Point Cloud Compression (G-PCC), to develop efficient compression methods based on data projection and octree-based data partitioning, respectively. As one of the earliest participants in MPEG-PCC, Shanghai Jiao Tong University (SJTU) has carried out a systematic study on point cloud compression and achieved state-of-the-art performance. Meanwhile, the development of lossy compression techniques creates an urgent demand for robust point cloud quality assessment. In this respect, the SJTU group has proposed projection-based, graph signal processing (GSP)-based, and point potential energy-based models to realize effective and robust prediction of subjective perception. These quality assessment models can also be used as loss functions to assist the generation of high-quality samples, in an unsupervised learning manner, in tasks such as point cloud reconstruction, shape completion, and upsampling.

 

Yiling Xu received the B.S., M.S., and Ph.D. degrees from the University of Electronic Science and Technology of China in 1999, 2001, and 2004, respectively. From 2004 to 2013, she was a senior engineer in the Multimedia Communication Research Institute of Samsung Electronics Inc., Korea. She then joined Shanghai Jiao Tong University (SJTU), where she is now a professor working on multimedia communication, 3D point cloud compression and assessment, system design, and network optimization. She is also an active member of standards organizations including MPEG, 3GPP, and AVS.

 

Title: Video Encoder Optimization for Video-Based Point Cloud Compression

Abstract: The Moving Picture Experts Group (MPEG) video-based point cloud compression (V-PCC) standard achieves state-of-the-art performance for dynamic point cloud compression. The main idea of V-PCC is to use mature video coding technologies to compress point clouds. However, current video coding technologies are mainly designed for natural videos, which limits the performance of V-PCC. In this talk, I will introduce several video encoder optimization methods that further improve V-PCC performance. First, I will give a very brief introduction to V-PCC. Second, to handle the non-corresponding patches between neighboring point cloud frames, I will introduce a 3D motion prediction method that finds an accurate motion vector predictor for each patch. Third, to handle the unoccupied pixels in point cloud frames, I will introduce an occupancy-map-based rate-distortion optimization method that ignores the distortion of unoccupied pixels in V-PCC. Finally, to handle the boundaries between occupied and unoccupied pixels, I will introduce an adaptive partition method that gives a proper partition to each boundary block. Experimental results show that the proposed algorithm yields significant BD-rate savings compared with the V-PCC anchor.
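The occupancy-map idea can be sketched in a few lines: since unoccupied pixels in a V-PCC geometry or attribute frame carry no reconstructed points, a rate-distortion decision can safely mask their distortion out. The NumPy snippet below is a simplified illustration of that masking (hypothetical helper, not the V-PCC reference software):

```python
import numpy as np

def occupancy_aware_sse(orig, recon, occupancy):
    """Sum of squared errors counted over occupied pixels only.
    orig, recon: 2D arrays of pixel values; occupancy: 2D 0/1 map.
    Unoccupied (padding) pixels are ignored, mirroring the
    occupancy-map-based RDO described in the talk (simplified sketch)."""
    diff = orig.astype(np.int64) - recon.astype(np.int64)
    return int(np.sum((diff * diff)[occupancy.astype(bool)]))
```

An encoder comparing two candidate coding modes would rank them by this masked distortion (plus rate), so no bits are wasted making padding pixels match.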

Li Li received the B.S. and Ph.D. degrees in electronic engineering from the University of Science and Technology of China (USTC), Hefei, Anhui, China, in 2011 and 2016, respectively. He is currently a research fellow at the University of Science and Technology of China. He was a visiting assistant professor at the University of Missouri-Kansas City from 2016 to 2020. His research interests include image/video/point cloud compression and processing. He has 60+ publications in book chapters, journals, and conferences in these areas. He received the Top 10% Paper Award at the 2016 IEEE Visual Communications and Image Processing (VCIP) Conference and at the 2019 IEEE International Conference on Image Processing (ICIP). He was the winner of the 2016 IEEE International Conference on Multimedia and Expo Grand Challenge on Light Field Image Compression.

 

Title: No Reference Point Cloud Quality Assessment via Multi-view Projection

Abstract: 3D point clouds are becoming popular thanks to their capability to represent the real world as an advanced content modality in modern communication systems. Given their wide applications, especially in immersive communication oriented to human perception, quality metrics for point clouds are essential. Existing point cloud quality evaluations rely on the full original point cloud or a certain portion of it, which severely limits their applications. To overcome this problem, we propose a novel deep learning-based no-reference point cloud quality assessment method, namely PQA-Net. Specifically, PQA-Net consists of a multi-view-based joint feature extraction and fusion (MVFEF) module, a distortion type identification (DTI) module, and a quality vector prediction (QVP) module. The DTI and QVP modules share the features generated by the MVFEF module. Using the distortion type labels, the DTI and MVFEF modules are first pre-trained to initialize the network parameters, after which the whole network is jointly trained to evaluate the point cloud quality. Experimental results on the Waterloo Point Cloud dataset show that PQA-Net achieves better or equivalent performance compared with the state-of-the-art quality assessment methods.

 

Hui Yuan (S'08-M'12-SM'17, IEEE) received the B.E. and Ph.D. degrees in telecommunication engineering from Xidian University, Xi'an, China, in 2006 and 2011, respectively. Since April 2011 he has been with Shandong University (SDU), Jinan, China, as a Lecturer (2011-2014), Associate Professor (2015-2016), and Full Professor (since September 2016). From January 2013 to December 2014 and from November 2017 to February 2018, he was a postdoctoral fellow (granted by the Hong Kong Scholar Project) and a research fellow, respectively, with the Department of Computer Science, City University of Hong Kong (CityU). He has been a Marie Skłodowska-Curie Fellow working on the OPT-PCC project funded by the European Commission since November 2020. He has authored and co-authored more than 80 academic papers on video compression (especially 3D video coding and processing) and transmission. His current research interests include video/image/immersive media compression, adaptive video streaming, and computer vision.


ICIG 2021 Workshop on

Few-Shot Learning-Based High-speed Railway Catenary Image Detection and Analysis

Intro: The catenary of a high-speed railway supplies electricity to EMU trains and plays a vital role in railway operation. The railway department therefore adopts a large number of inspection methods for the high-speed railway catenary, including manual inspection and track inspection vehicles. With the rapid development of UAV technology, aerial photography by drone has become a novel method of catenary inspection, and a large number of catenary pictures have been taken. To ensure the safe operation of high-speed railways and improve inspection efficiency, it is necessary to intelligently analyze the catenary pictures taken by track inspection vehicles and drones. However, because researchers have little knowledge of the industry and lack sufficient samples, research in this area, both domestic and international, is almost nonexistent.

To promote the development of intelligent detection and analysis of high-speed railway catenary, we propose a challenge termed "Few-Shot Learning-Based High-speed Railway Catenary Image Detection and Analysis". A part of the data collected on the high-speed railway is used for the challenge. The challenge aims to attract researchers in image processing and intelligent transportation to solve small-sample problems by combining the technical advantages of image processing with intelligent transportation expertise. This will provide a research foundation for the intelligent detection and analysis of the high-speed railway catenary and promote the development of research and applications in related fields.

The winner will receive an award certificate and a generous bonus, as well as the opportunity to publish a paper in a cooperating international journal.

Those interested in participating, please send the team information to zhiwei@bjtu.edu.cn.

2 Dataset

We provide a training set and a validation set of catenary inspection images. Together they include 50 defect images of the high-speed railway catenary, with annotation information in PASCAL VOC format. The testing set is not public and consists of 10 catenary defect images; we will run the participating teams' code on it. Teams will be ranked by the mean average precision (mAP) achieved on the testing set.
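For reference, PASCAL VOC-style detection evaluation scores each class by average precision (AP): detections are ranked by score, greedily matched to ground-truth boxes at an IoU threshold (0.5 by default), and AP is the area under the resulting precision-recall curve; mAP is the mean of the per-class APs. The following is a minimal single-class sketch of that protocol, not the official development kit (it omits details such as "difficult"-object handling):

```python
import numpy as np

def iou(a, b):
    """IoU of two axis-aligned boxes given as (x1, y1, x2, y2)."""
    ix1, iy1 = max(a[0], b[0]), max(a[1], b[1])
    ix2, iy2 = min(a[2], b[2]), min(a[3], b[3])
    inter = max(0.0, ix2 - ix1) * max(0.0, iy2 - iy1)
    union = ((a[2] - a[0]) * (a[3] - a[1])
             + (b[2] - b[0]) * (b[3] - b[1]) - inter)
    return inter / union if union > 0 else 0.0

def average_precision(detections, gt_boxes, iou_thr=0.5):
    """AP for one class. detections: iterable of (image_id, score, box);
    gt_boxes: dict image_id -> list of boxes. Greedy high-score-first matching."""
    matched = {img: [False] * len(b) for img, b in gt_boxes.items()}
    n_gt = sum(len(b) for b in gt_boxes.values())
    tps = []
    for img, score, box in sorted(detections, key=lambda d: -d[1]):
        ious = [iou(box, g) for g in gt_boxes.get(img, [])]
        j = int(np.argmax(ious)) if ious else -1
        if j >= 0 and ious[j] >= iou_thr and not matched[img][j]:
            matched[img][j] = True
            tps.append(1)
        else:
            tps.append(0)  # false positive: miss, duplicate, or low IoU
    tp = np.cumsum(tps)
    recall = tp / max(n_gt, 1)
    precision = tp / np.arange(1, len(tps) + 1)
    # Precision envelope + area under the P-R curve (all-point interpolation)
    mrec = np.concatenate(([0.0], recall))
    mpre = np.concatenate(([0.0], precision))
    for i in range(len(mpre) - 1, 0, -1):
        mpre[i - 1] = max(mpre[i - 1], mpre[i])
    return float(np.sum(np.diff(mrec) * mpre[1:]))
```

mAP is then simply the mean of this AP over all defect classes in the annotations.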

Training set and validation set:

Google drive:

https://drive.google.com/file/d/1YZO5JGpKd9_raS7NYQl3eSllonXu0Bcu/view?usp=sharing

Baiduyun:

https://pan.baidu.com/s/1Uh1cKRMaxLgEI_kfhygUyg (password: 720p)

3 Guide

3.1 Timeline

  • May 1 Release the challenge
  • June 20 Registration deadline
  • July 20 Deadline for code submission
  • July 20 Announce results
  • August 6-8 Awards during the ICIG 2021 conference

3.2 Awards

(1) Award certificate

(2) Bonus

(3) A paper in a cooperating international journal

3.3 Submission procedure

Participants should send the code (with a readme) of their object detection algorithm to cuijing@bjtu.edu.cn.

3.4 Attention

You must guarantee that your submission is your own original work; otherwise your results will be cancelled.

It is strictly forbidden to use other datasets for supervised or unsupervised training.

4 Organizing Committee

Department: State Key Lab of Rail Traffic Control & Safety (Beijing Jiaotong University)

Chairmen: Professor Qin Yong, Professor Jia Limin

Members: Xie Zhengyu, Ma Xiaoping, Wu Yunpeng, Cao Zhiwei, Cui Jing

Contact: Cao Zhiwei (zhiwei@bjtu.edu.cn), Cui Jing (cuijing@bjtu.edu.cn)

The State Key Lab of Rail Traffic Control & Safety (Beijing Jiaotong University) has long been engaged in the research and application of rail transit safety technology based on image processing. It has accumulated a wealth of academic research results, field application experience, and field data, which can be used in this challenge. The team participated in the 2018 CVPR image dehazing challenge and the 2018 ChinaMM image dehazing competition, winning second and third place respectively, and thus has rich experience as a participant. In addition, the team has hosted international conferences such as EITRT and organized ICIG competitions, and has rich experience in organizing conferences and competitions.

If you have any questions about this challenge, please send an email to zhiwei@bjtu.edu.cn.