Hussain Kanafani authored
Evaluating and Extending Unsupervised Video Summarization Methods
- Evaluate unsupervised methods with different metrics under the same configuration.
- Investigate the effect of the extracted features on unsupervised methods and extend them to perform better than the baseline work.
- Identify gaps from the previous step and try to fill them with a proposed solution that extends or modifies an existing model.
Abstract
This study validates the recent work in unsupervised video summarization and extends the experiments with feature variation.
In our understanding, deep-learning-based approaches select candidate frames for a video summary.
These frames are drawn with a probability that can be used to calculate a scaled importance score, which is compared with the ground truth for evaluation.
However, most works in the literature focus on network architecture and follow the same feature extraction technique, using the deep features produced by a single pretrained model.
A notable gap in the existing approaches is therefore feature variation, which this study explores.
Building on that, model variables such as network architecture, optimizer, and activation functions can affect performance when combined with different feature selection techniques that have not been explored yet.
Further, existing models are evaluated using a single metric, which may not be representative for the video summarization task, since it can ignore key frames in favor of other frames and still yield a relatively high value.
This work performs feature extraction using multiple pretrained neural models and measures their impact on current state-of-the-art works.
It then evaluates the state-of-the-art works using different evaluation metrics.
Ultimately, it aims to find an unsupervised video summarization method that fills these gaps and builds on the existing works.
Important Wiki Pages:
Datasets
Structured h5 files with the video features and annotations of the SumMe and TVSum datasets are available within the "data" folder. The GoogleNet features of the video frames were extracted by Ke Zhang and Wei-Lun Chao, and the h5 files were obtained from Kaiyang Zhou.
These files have the following structure:
/key
/features 2D-array with shape (n_steps, feature-dimension)
/gtscore 1D-array with shape (n_steps), stores ground truth importance score (used for training, e.g. regression loss)
/user_summary 2D-array with shape (num_users, n_frames), each row is a binary vector (used for test)
/change_points 2D-array with shape (num_segments, 2), each row stores indices of a segment
/n_frame_per_seg 1D-array with shape (num_segments), indicates number of frames in each segment
/n_frames number of frames in original video
/picks positions of subsampled frames in original video
/n_steps number of subsampled frames
/gtsummary 1D-array with shape (n_steps), ground truth summary provided by user (used for training, e.g. maximum likelihood)
/video_name (optional) original video name, only available for SumMe dataset
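The layout above can be read with h5py. The following is a minimal sketch rather than code from this repository: load_video and expand_scores are hypothetical helper names, and the rule used to spread per-step scores over the original frames (step i covers picks[i] up to the next pick) is an assumption about how /picks and /n_frames relate.

```python
def load_video(h5_path, key):
    """Read every dataset stored under one video key into a plain dict."""
    # h5py is a third-party package; it is imported lazily so that the
    # pure-Python helper below can be used without it.
    import h5py

    with h5py.File(h5_path, "r") as f:
        group = f[key]
        # [()] reads a whole dataset, including scalars such as n_frames.
        return {name: group[name][()] for name in group}


def expand_scores(step_scores, picks, n_frames):
    """Spread per-step scores over all frames of the original video.

    Assumption: the score of step i applies from picks[i] up to (but not
    including) picks[i + 1]; the last step runs to the end of the video.
    Frames before the first pick keep a score of 0.0.
    """
    if len(step_scores) != len(picks):
        raise ValueError("expected one score per subsampled frame")
    frame_scores = [0.0] * n_frames
    for i, start in enumerate(picks):
        end = picks[i + 1] if i + 1 < len(picks) else n_frames
        for frame in range(start, end):
            frame_scores[frame] = float(step_scores[i])
    return frame_scores
```

For example, load_video("datasets/eccv16_dataset_summe_google_pool5.h5", key) returns the fields listed above for one video, where key is whichever top-level name the file uses. The resulting gtscore, picks and n_frames can then be passed to expand_scores.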
Original videos and annotations for each dataset are also available in the authors' project webpages:
TVSum dataset: https://github.com/yalesong/tvsum
SumMe dataset: https://gyglim.github.io/me/vsum/index.html#benchmark
CSNet
We used an implementation of the SUM-GAN method as a starting point to implement CSNet.
How to train
The implementation of CSNet is located in the csnet directory. Run main.py with the configuration specified in configs.py to train the model.
SUM-Ind
Make splits
python create_split.py -d datasets/eccv16_dataset_summe_google_pool5.h5 --save-dir datasets --save-name summe_splits --num-splits 5
As a result, the dataset is randomly split 5 times, and the splits are saved as a JSON file.
Train and test code is in main.py. To see the detailed arguments, run python main.py -h.
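The idea of repeated random splitting can be sketched as follows. This is an illustration, not the code of create_split.py: the 80/20 train/test ratio, the train_keys and test_keys field names, and the fixed seed are all assumptions, and the real script may differ.

```python
import json
import random


def make_splits(keys, num_splits=5, train_fraction=0.8, seed=0):
    """Return num_splits independent random train/test partitions of keys."""
    rng = random.Random(seed)  # fixed seed (assumed) so the splits are reproducible
    num_train = int(round(len(keys) * train_fraction))
    splits = []
    for _ in range(num_splits):
        shuffled = list(keys)
        rng.shuffle(shuffled)
        splits.append({
            "train_keys": sorted(shuffled[:num_train]),  # hypothetical field name
            "test_keys": sorted(shuffled[num_train:]),   # hypothetical field name
        })
    return splits


def save_splits(splits, path):
    """Write all splits to a single JSON file."""
    with open(path, "w") as f:
        json.dump(splits, f, indent=2)
```

Each split is drawn independently, so a video can appear in the test set of more than one split. A split index such as the one passed via --split-id would then select one entry of the saved list.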
How to train
python main.py -d datasets/eccv16_dataset_summe_google_pool5.h5 -s datasets/summe_splits.json -m summe --gpu 0 --save-dir log/summe-split0 --split-id 0 --verbose
How to test
python main.py -d datasets/eccv16_dataset_summe_google_pool5.h5 -s datasets/summe_splits.json -m summe --gpu 0 --save-dir log/summe-split0 --split-id 0 --evaluate --resume path_to_your_model.pth.tar --verbose --save-results
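This README does not name the metric computed with --evaluate. Assuming it is the F-score commonly reported on SumMe and TVSum, a generated binary summary can be compared against each row of /user_summary as sketched below. The function names and the "avg"/"max" reduction modes are illustrative assumptions, not the interface of main.py.

```python
def f_score(machine, user):
    """F-score between two binary frame-selection vectors of equal length."""
    if len(machine) != len(user):
        raise ValueError("summaries must cover the same number of frames")
    overlap = sum(1 for m, u in zip(machine, user) if m and u)
    if overlap == 0:
        return 0.0
    precision = overlap / sum(1 for m in machine if m)
    recall = overlap / sum(1 for u in user if u)
    return 2 * precision * recall / (precision + recall)


def evaluate_summary(machine, user_summaries, reduce="avg"):
    """Compare one machine summary against every user summary.

    reduce="avg" averages the per-user F-scores and reduce="max" keeps the
    best one; which of the two applies to which dataset is left to the caller.
    """
    scores = [f_score(machine, user) for user in user_summaries]
    if reduce == "avg":
        return sum(scores) / len(scores)
    if reduce == "max":
        return max(scores)
    raise ValueError("reduce must be 'avg' or 'max'")
```

This also illustrates the concern raised in the abstract: the score depends only on the overlap counts, so two summaries that select different frames can still receive similar values.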
Citations
@article{zhou2017reinforcevsumm,
title={Deep Reinforcement Learning for Unsupervised Video Summarization with Diversity-Representativeness Reward},
author={Zhou, Kaiyang and Qiao, Yu and Xiang, Tao},
journal={arXiv:1801.00054},
year={2017}
}
@inproceedings{DBLP:conf/aaai/JungCKWK19,
author = {Yunjae Jung and
Donghyeon Cho and
Dahun Kim and
Sanghyun Woo and
In So Kweon},
title = {Discriminative Feature Learning for Unsupervised Video Summarization},
booktitle = {The Thirty-Third {AAAI} Conference on Artificial Intelligence, {AAAI}
2019, The Thirty-First Innovative Applications of Artificial Intelligence
Conference, {IAAI} 2019, The Ninth {AAAI} Symposium on Educational
Advances in Artificial Intelligence, {EAAI} 2019, Honolulu, Hawaii,
USA, January 27 - February 1, 2019},
pages = {8537--8544},
publisher = {{AAAI} Press},
year = {2019},
url = {https://doi.org/10.1609/aaai.v33i01.33018537},
doi = {10.1609/aaai.v33i01.33018537},
timestamp = {Wed, 25 Sep 2019 11:05:09 +0200},
biburl = {https://dblp.org/rec/conf/aaai/JungCKWK19.bib},
bibsource = {dblp computer science bibliography, https://dblp.org}
}