2025 Innovations in Intelligent Systems and Applications Conference, ASYU 2025, Bursa, Türkiye, 10-12 September 2025 (Full-Text Paper)
This study proposes and tests a hybrid approach for classifying short video clips as 'violent' or 'non-violent'. A pretrained deep convolutional neural network (CNN), DenseNet-121, extracts spatial features from the frames, and a long short-term memory (LSTM) model analyzes the temporal patterns across them. The proposed method is trained and evaluated on the Real Life Violence Situations (RLVS) dataset using 5-fold cross-validation. It achieves an average accuracy of 95.22% with a standard deviation of 0.64% across the folds, which indicates that performance is consistent across data splits and suggests that the model generalizes well.
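The evaluation protocol described above (5-fold cross-validation, reported as mean accuracy and standard deviation) can be sketched roughly as follows. This is a minimal illustration using only the Python standard library, not the authors' code. The names `k_fold_indices`, `summarize`, `cross_validate`, and the `train_and_score` callback are hypothetical, and the sketch assumes unstratified random folds and the sample standard deviation; the abstract does not say which variants were used.

```python
import random
import statistics


def k_fold_indices(n_samples, k=5, seed=0):
    """Yield (train_idx, test_idx) pairs for k-fold cross-validation.

    Assumption: plain shuffled folds. The abstract does not state whether
    the folds were stratified by class.
    """
    indices = list(range(n_samples))
    random.Random(seed).shuffle(indices)
    # Spread samples as evenly as possible: the first (n_samples % k)
    # folds receive one extra sample.
    fold_sizes = [n_samples // k + (1 if i < n_samples % k else 0)
                  for i in range(k)]
    start = 0
    for size in fold_sizes:
        test_idx = indices[start:start + size]
        train_idx = indices[:start] + indices[start + size:]
        yield train_idx, test_idx
        start += size


def summarize(fold_accuracies):
    """Return (mean, sample standard deviation) of per-fold accuracies.

    Assumption: sample standard deviation (n - 1 in the denominator).
    """
    return statistics.mean(fold_accuracies), statistics.stdev(fold_accuracies)


def cross_validate(n_samples, train_and_score, k=5, seed=0):
    """Run k-fold cross-validation.

    `train_and_score(train_idx, test_idx)` is a hypothetical callback that
    trains a fresh model on the training clips and returns its accuracy on
    the held-out clips.
    """
    scores = [train_and_score(train_idx, test_idx)
              for train_idx, test_idx in k_fold_indices(n_samples, k, seed)]
    return summarize(scores)
```

In use, `train_and_score` would wrap the DenseNet-121 feature extraction and LSTM training for one fold, and `cross_validate(len(clips), train_and_score)` would return the two summary statistics of the kind reported in the abstract. Training a fresh model per fold keeps each held-out fold unseen during training, which is what makes the standard deviation a meaningful indicator of stability.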