Guided Study: Crowd Counting Project

AT-A-GLANCE: Crowd counting, the task of estimating the number of people in a scene, is a common and practical measurement problem. Humans cannot count the people in a scene quickly and accurately, especially when a very large crowd is crammed into a single frame, so deep learning and computer vision can greatly improve efficiency and reduce the error rate. A convolutional neural network (CNN) can be trained to learn the characteristics of images and generate a density map, from which the crowd count is estimated. This project adapts a state-of-the-art method, Dilated Convolutional Neural Networks, in three steps: generate ground-truth density maps from the given dataset; train the model on the data; and make predictions and test the results. This paper covers the entire application, from the background of the project, including a comparison of possible approaches, to the design and implementation, using the CityStreet dataset provided by the City University of Hong Kong. It is also my first computer vision project.
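To make the first step concrete, here is a minimal sketch of how a ground-truth density map can be built from dot annotations. It assumes a common convention rather than the exact procedure used in this project: each annotated head position contributes one Gaussian blob normalized to sum to 1, so that summing the map recovers the people count. The function name `density_map_from_dots`, the fixed `sigma`, and the sample points are hypothetical.

```python
import numpy as np

def density_map_from_dots(points, shape, sigma=4.0):
    """Build a density map from (x, y) dot annotations.

    Assumption: a fixed-width Gaussian per person, normalized so each
    person adds exactly 1 to the total (not necessarily the project's method).
    """
    h, w = shape
    ys, xs = np.mgrid[0:h, 0:w]          # pixel coordinate grids
    density = np.zeros(shape, dtype=np.float64)
    for x, y in points:
        # Unnormalized Gaussian centered on the annotated point
        g = np.exp(-((xs - x) ** 2 + (ys - y) ** 2) / (2.0 * sigma ** 2))
        # Normalize inside the image so the blob integrates to one person
        density += g / g.sum()
    return density

# Hypothetical annotations for a 40 x 50 image
dots = [(10, 12), (30, 5), (31, 6)]
dmap = density_map_from_dots(dots, shape=(40, 50))
estimated_count = dmap.sum()             # the count is the integral of the map
```

A trained network would predict such a map directly from the image, and the predicted count would be obtained the same way, by summing the output.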

Deliverables

Reflection

As a beginner in computer vision, I owe my greatest gratitude to my supervisor for introducing me to the concepts and ideas behind the techniques used. I also thank Zhang Qi, the proposer of the CityStreet dataset, for providing detailed information about the dataset, and Jia Wan for teaching me coding skills face-to-face during this critical period. Most importantly, I am grateful for Prof. Antoni Chan's support of my project. Without the weekly meetings and updates, I would never have understood this wondrous computer vision project.

This project was originally intended to produce a social distancing violation map, which addresses an essential issue during the pandemic. Such a map is of practical use to authorities for crowd control and helps safeguard public health. While counting objects and judging distances is relatively effortless for humans, it can be complicated for computer vision algorithms, which must first distinguish between shapes and objects. Regrettably, because of limited time and because I am new to this field, I was not able to finish the social distance violation map. However, I firmly intend to continue this study later. The proposed social distance violation map is generated in four steps: first, use the camera distance to calculate the size of the image patch in real-world meters, which is the transformation from image to reality; second, estimate the people count from the density map within the given image patch; next, calculate the people density, in count per square meter, from the people count and the image patch size; finally, threshold the people density to obtain the social distance violation map. By generating the thresholded ground-truth density map, which is built from the ground-truth dot annotations with nearest-neighbor distances, a violation map can be produced. The method is a transformation between the physical world and the image world.
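The four steps of the proposed violation map can be sketched roughly as follows. This is only an illustration under stated assumptions, not the project's implementation, which was never completed: it assumes the camera distance has already been converted into a `meters_per_pixel` scale (a scalar or a per-pixel array), uses square, non-overlapping patches, and takes an arbitrary threshold. The names `violation_map`, `patch_px`, and `max_people_per_m2`, and the value 0.25, are hypothetical.

```python
import numpy as np

def violation_map(density_map, meters_per_pixel, patch_px=32,
                  max_people_per_m2=0.25):
    """Return a boolean grid, one cell per image patch, marking violations.

    Assumptions: meters_per_pixel is precomputed from the camera distance;
    the 0.25 people/m^2 threshold is a placeholder, not a recommended value.
    """
    h, w = density_map.shape
    rows, cols = h // patch_px, w // patch_px
    hh, ww = rows * patch_px, cols * patch_px   # crop to whole patches

    # Step 1: real-world area of each patch in square meters
    scale = np.broadcast_to(np.asarray(meters_per_pixel, dtype=float), (h, w))
    pixel_area = scale[:hh, :ww] ** 2
    patch_area = pixel_area.reshape(rows, patch_px, cols, patch_px).sum(axis=(1, 3))

    # Step 2: people count per patch is the sum of the density map over it
    counts = density_map[:hh, :ww].reshape(rows, patch_px, cols, patch_px).sum(axis=(1, 3))

    # Step 3: people density in count per square meter
    people_per_m2 = counts / patch_area

    # Step 4: threshold the density to flag violating patches
    return people_per_m2 > max_people_per_m2

# Toy input: one person in the top-left 2 x 2 patch, 0.5 m per pixel
toy = np.zeros((4, 4))
toy[0, 0] = 1.0
flags = violation_map(toy, meters_per_pixel=0.5, patch_px=2)
```

In the toy input each 2 x 2 patch covers 1 square meter, so only the patch containing the person exceeds the placeholder threshold. In a real scene the scale varies with distance from the camera, which is why a per-pixel `meters_per_pixel` array is accepted.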