Talk Title: Content Analysis of Internet Video
Speaker: Prof. Alexander G. Hauptmann (School of Computer Science, Carnegie Mellon University)
Venue: Lecture Hall, 4th Floor, School of Information
Time: June 27 (Saturday), 10:00 AM
Abstract:
Recent developments in deep convolutional neural networks trained over large image collections have shown exciting progress in still-image object detection. This talk will report on the different approaches our team has been pursuing to make video analysis scalable, so that we can both analyze and benefit from larger amounts of data. Specifically, I will review several years of work on content analysis, indexing, and search of internet video. This research has resulted in advances in feature extraction and representation speed, in the scale and speed of semantic concept labeling, in event detection accuracy and speed, and in surveillance event detection accuracy. Our video features improve over the current state-of-the-art “improved dense trajectories”. Using quantization, huge feature representations can be reduced to a very tractable dimensionality. Mapping non-linear kernels into linear spaces allows for efficient linear classifiers, using several learning approaches. By using a multitude of features, a new approach to fusion provides additional benefits.
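As an illustration of the kernel-linearization step mentioned in the abstract, the following is a minimal sketch, not the speaker's actual system: it approximates a non-linear kernel with an explicit feature map and then trains an efficient linear classifier. The use of scikit-learn's Nystroem map and LinearSVC, the synthetic features, and all dimensions and parameters are illustrative assumptions.

    # Minimal sketch: map a non-linear (RBF) kernel into an explicit linear
    # feature space, then train a fast linear SVM on the mapped features.
    # All data, dimensions, and parameters below are illustrative assumptions.
    import numpy as np
    from sklearn.kernel_approximation import Nystroem
    from sklearn.pipeline import make_pipeline
    from sklearn.svm import LinearSVC

    rng = np.random.default_rng(0)
    X = rng.normal(size=(1000, 512))      # stand-in for 512-dim video features
    y = rng.integers(0, 2, size=1000)     # stand-in binary event labels

    model = make_pipeline(
        Nystroem(kernel="rbf", gamma=0.01, n_components=256, random_state=0),
        LinearSVC(C=1.0),
    )
    model.fit(X, y)
    print("training accuracy:", model.score(X, y))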
Since we have long been convinced that semantic analysis will be critical in the long term, we have been studying ways to train large numbers of semantic classifiers. We have obtained very encouraging results by automatically harvesting large amounts of weakly-labeled training data from the web, and by building an infrastructure that allows us to efficiently process the video data into semantic concept classifiers for objects, scenes, and actions. In turn, this has uncovered new challenges in how to use thousands of semantic detectors during browsing or querying.
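The weakly-labeled concept-training idea above can be pictured with a small sketch, again an illustration under assumed data formats rather than the speaker's pipeline: one binary detector is trained per concept from noisy web labels, and the per-concept scores can then serve browsing or querying. The concept vocabulary, feature dimensionality, and the logistic-regression detectors are assumptions.

    # Minimal sketch: train one detector per semantic concept from weakly
    # (noisily) labeled examples, then score a clip against every concept.
    # Concepts, features, and labels below are synthetic stand-ins.
    import numpy as np
    from sklearn.linear_model import LogisticRegression

    CONCEPTS = ["car", "kitchen", "running"]   # hypothetical concept vocabulary

    def train_concept_detectors(features, weak_labels):
        """Fit one binary classifier per concept from noisy 0/1 web labels."""
        detectors = {}
        for i, concept in enumerate(CONCEPTS):
            clf = LogisticRegression(max_iter=1000)
            clf.fit(features, weak_labels[:, i])
            detectors[concept] = clf
        return detectors

    rng = np.random.default_rng(1)
    X = rng.normal(size=(500, 128))                        # clip-level features
    Y = rng.integers(0, 2, size=(500, len(CONCEPTS)))      # weak concept labels
    detectors = train_concept_detectors(X, Y)

    # Per-concept confidence for one clip, usable as a semantic index entry.
    scores = {c: d.predict_proba(X[:1])[0, 1] for c, d in detectors.items()}
    print(scores)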
This research was critical to our team's top performance at this year's TRECVID video analysis evaluation, and it sets the stage for significant advances in the coming years.
Speaker Biography:
Alex Hauptmann is a Principal Systems Scientist in the Carnegie Mellon University Computer Science Department and a faculty member of CMU's Language Technologies Institute. His research interests have led him to pursue and combine several different areas: man-machine communication, natural language processing, speech understanding and synthesis, and machine learning. He worked on speech and machine translation at CMU from 1984 to 1994, when he joined the Informedia project, where he developed the News-on-Demand application. Since then he has conducted research on video analysis and retrieval of broadcast news as well as observational video, with success documented by outstanding performance in the annual NIST TRECVID video retrieval evaluations. His current research centers on robust analysis of internet-style and surveillance video at large scale.