
## Evaluating and Extending Unsupervised Video Summarization Methods

1. Reproduce the unsupervised method CSNet and the reinforcement-learning method SUM-Ind.
2. Evaluate SUM-GAN-AAE, SUM-GAN-sl, CSNet, and SUM-Ind using F1-score and rank correlation coefficients.
3. Extend CSNet with feature variations by applying fusion techniques.
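
The two kinds of metrics named above can be sketched in plain Python. This is a minimal illustration, not the code in `/src/evaluation`: the function names are hypothetical, Kendall's tau is used as one example of a rank correlation coefficient (which coefficients the project computes is not stated here), and the `avg`/`max` reduction over annotators follows the convention commonly used for TVSum and SumMe respectively.

```python
from math import sqrt


def f1_score(machine_summary, user_summary):
    """F1 between two binary (0/1) frame-selection vectors of equal length."""
    overlap = sum(1 for m, u in zip(machine_summary, user_summary) if m and u)
    if overlap == 0:
        return 0.0
    precision = overlap / sum(machine_summary)
    recall = overlap / sum(user_summary)
    return 2 * precision * recall / (precision + recall)


def f1_over_users(machine_summary, user_summaries, reduce="avg"):
    """Aggregate the per-annotator F1 values with 'avg' or 'max'."""
    scores = [f1_score(machine_summary, u) for u in user_summaries]
    return max(scores) if reduce == "max" else sum(scores) / len(scores)


def kendall_tau(x, y):
    """Kendall's tau-b, computed by comparing every pair of positions."""
    concordant = discordant = ties_x = ties_y = 0
    for i in range(len(x)):
        for j in range(i + 1, len(x)):
            dx, dy = x[i] - x[j], y[i] - y[j]
            if dx == 0 and dy == 0:
                continue  # tied in both sequences: ignored by tau-b
            elif dx == 0:
                ties_x += 1
            elif dy == 0:
                ties_y += 1
            elif dx * dy > 0:
                concordant += 1
            else:
                discordant += 1
    denom = sqrt((concordant + discordant + ties_x)
                 * (concordant + discordant + ties_y))
    # Assumption of this sketch: return 0.0 when a sequence is constant.
    return (concordant - discordant) / denom if denom else 0.0
```

F1 compares a generated binary summary with the annotators' summaries, while the rank correlation compares predicted frame importance scores with human-assigned scores directly.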


# Project Structure
```
Directory: 
- /data
- /csnet (implementation of the CSNet method)
- /src/evaluation (evaluation using F1-score and rank correlation coefficients)
- /src/visualization 
- /sum-ind (implementation of SUM-Ind method)
- /CSNET-places365-early-fusion (Fusion variation based on CSNet)
- /CSNET-places365-late-fusion (Fusion variation based on CSNet)
- /CSNET-places365-intermediate-fusion (Fusion variation based on CSNet)

```
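The directory names suggest that the three fusion variants combine the existing frame features with a second feature set (presumably Places365-based). How the repository actually fuses them is not described here, so the following is only a generic sketch of early and late fusion with hypothetical function names and plain Python lists standing in for feature arrays.

```python
def early_fusion(features_a, features_b):
    """Early fusion: concatenate the per-frame feature vectors
    before they are passed to the summarization model."""
    return [a + b for a, b in zip(features_a, features_b)]


def late_fusion(scores_a, scores_b, weight=0.5):
    """Late fusion: blend the per-frame importance scores produced by
    two separately run models. `weight` is a hypothetical parameter."""
    return [weight * a + (1 - weight) * b for a, b in zip(scores_a, scores_b)]


# Two frames, a 2-d and a 1-d feature vector per frame.
fused_features = early_fusion([[1, 2], [3, 4]], [[5], [6]])
# Two frames scored by two models.
fused_scores = late_fusion([0.2, 0.8], [0.6, 0.4])
```

Intermediate fusion would, in the same spirit, merge hidden representations inside the network rather than its inputs or outputs.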
# Datasets
Structured h5 files with the video features and annotations of the SumMe and TVSum datasets are available in the `data` folder. The GoogLeNet features of the video frames were extracted by [Ke Zhang](https://github.com/kezhang-cs) and Wei-Lun Chao, and the h5 files were obtained from [Kaiyang Zhou](https://github.com/KaiyangZhou/pytorch-vsumm-reinforce).

These files have the following structure:
```
/key
    /features                 2D-array with shape (n_steps, feature-dimension)
    /gtscore                  1D-array with shape (n_steps), stores ground truth importance score (used for training, e.g. regression loss)
    /user_summary             2D-array with shape (num_users, n_frames), each row is a binary vector (used for test)
    /change_points            2D-array with shape (num_segments, 2), each row stores indices of a segment
    /n_frame_per_seg          1D-array with shape (num_segments), indicates number of frames in each segment
    /n_frames                 number of frames in original video
    /picks                    positions of subsampled frames in original video
    /n_steps                  number of subsampled frames
    /gtsummary                1D-array with shape (n_steps), ground truth summary provided by user (used for training, e.g. maximum likelihood)
    /video_name (optional)    original video name, only available for SumMe dataset
```
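A file with this layout could be inspected with `h5py` roughly as follows. This is a sketch under assumptions: the helper names are hypothetical, the example path in the comment is a guess at where the file lives, and `h5py` is imported lazily so that the shape helper itself works on any mapping of dataset names to array-like values.

```python
def summarize_video_entry(entry):
    """Return basic shape information for one video group.

    `entry` is anything indexable by dataset name, e.g. an h5py.Group.
    """
    features = entry["features"]
    change_points = entry["change_points"]
    return {
        "n_steps": len(features),          # number of subsampled frames
        "feature_dim": len(features[0]),   # length of one feature vector
        "num_segments": len(change_points),
    }


def iter_videos(h5_path):
    """Yield (key, info) for every video group in the h5 file."""
    import h5py  # third-party dependency, only needed for real files

    with h5py.File(h5_path, "r") as f:
        for key in f.keys():
            yield key, summarize_video_entry(f[key])


# Example (the path is an assumption; adjust it to your checkout):
# for key, info in iter_videos("data/eccv16_dataset_summe_google_pool5.h5"):
#     print(key, info)
```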
The original videos and annotations for each dataset are also available on the authors' project pages:

**TVSum dataset**: [https://github.com/yalesong/tvsum](https://github.com/yalesong/tvsum) 


**SumMe dataset**: [https://gyglim.github.io/me/vsum/index.html#benchmark](https://gyglim.github.io/me/vsum/index.html#benchmark)


### CSNet
We used the [SUM-GAN](https://github.com/j-min/Adversarial_Video_Summary) implementation as the starting point for implementing CSNet.

#### How to train
The implementation of CSNet is located in the `csnet` directory. To train the model, run `main.py` with the configuration specified in `configs.py`.


### SUM-Ind
Create the splits:
```bash
python create_split.py -d datasets/eccv16_dataset_summe_google_pool5.h5 --save-dir datasets --save-name summe_splits  --num-splits 5
```
This splits the dataset randomly 5 times and saves the resulting splits as a JSON file.
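
Conceptually, repeated random splitting can be sketched as below. This is not the code of `create_split.py`: the function name, the 80/20 train fraction, the `train_keys`/`test_keys` field names, and the `video_N` keys are assumptions made for illustration.

```python
import random


def make_splits(keys, num_splits=5, train_fraction=0.8, seed=0):
    """Shuffle the video keys `num_splits` times and cut each shuffle
    into a training part and a test part."""
    rng = random.Random(seed)  # fixed seed keeps the sketch reproducible
    n_train = int(len(keys) * train_fraction)
    splits = []
    for _ in range(num_splits):
        shuffled = list(keys)
        rng.shuffle(shuffled)
        splits.append({
            "train_keys": shuffled[:n_train],
            "test_keys": shuffled[n_train:],
        })
    return splits


# SumMe contains 25 videos; the key names here are hypothetical.
splits = make_splits([f"video_{i}" for i in range(1, 26)])
```

A list like `splits` could then be written to disk with `json.dump`, and a single entry selected by its index, which is the role `--split-id` plays in the commands below.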

The training and testing code is in `main.py`. To see all available arguments, run `python main.py -h`.

#### How to train
```bash
python main.py -d datasets/eccv16_dataset_summe_google_pool5.h5 -s datasets/summe_splits.json -m summe --gpu 0 --save-dir log/summe-split0 --split-id 0 --verbose
```

#### How to test
```bash
python main.py -d datasets/eccv16_dataset_summe_google_pool5.h5 -s datasets/summe_splits.json -m summe --gpu 0 --save-dir log/summe-split0 --split-id 0 --evaluate --resume path_to_your_model.pth.tar --verbose --save-results
```


**Important Wiki Pages:**

* [Notes on SUM-GAN-AAE (Apostolidis et al. 2020)](https://gitlab.uni-hannover.de/hussainkanafani/unsupervised-video-summarization/-/wikis/SUM-GAN-AAE-(Apostolidis-et-al.-2020))

* [References and important findings](https://gitlab.uni-hannover.de/hussainkanafani/unsupervised-video-summarization/-/wikis/Findings)

* [Reproduce (Zhou et al. 2018)](https://gitlab.uni-hannover.de/hussainkanafani/unsupervised-video-summarization/-/wikis/Reproduce-(Zhou-et-al.-2018)%E2%80%8B)

* [Conda cheat sheet](https://gitlab.uni-hannover.de/hussainkanafani/unsupervised-video-summarization/-/wikis/Conda-cheat-sheet)


### Citations
```
@article{zhou2017reinforcevsumm, 
   title={Deep Reinforcement Learning for Unsupervised Video Summarization with Diversity-Representativeness Reward},
   author={Zhou, Kaiyang and Qiao, Yu and Xiang, Tao}, 
   journal={arXiv:1801.00054}, 
   year={2017} 
}
```

```
@inproceedings{DBLP:conf/aaai/JungCKWK19,
  author    = {Yunjae Jung and
               Donghyeon Cho and
               Dahun Kim and
               Sanghyun Woo and
               In So Kweon},
  title     = {Discriminative Feature Learning for Unsupervised Video Summarization},
  booktitle = {The Thirty-Third {AAAI} Conference on Artificial Intelligence, {AAAI}
               2019, The Thirty-First Innovative Applications of Artificial Intelligence
               Conference, {IAAI} 2019, The Ninth {AAAI} Symposium on Educational
               Advances in Artificial Intelligence, {EAAI} 2019, Honolulu, Hawaii,
               USA, January 27 - February 1, 2019},
  pages     = {8537--8544},
  publisher = {{AAAI} Press},
  year      = {2019},
  url       = {https://doi.org/10.1609/aaai.v33i01.33018537},
  doi       = {10.1609/aaai.v33i01.33018537},
  timestamp = {Wed, 25 Sep 2019 11:05:09 +0200},
  biburl    = {https://dblp.org/rec/conf/aaai/JungCKWK19.bib},
  bibsource = {dblp computer science bibliography, https://dblp.org}
}
```