Weakly supervised spatiotemporal violence detection in surveillance video

dc.contributor.advisor Camara Chavez, Guillermo
dc.contributor.author Choqueluque Roman, David Gabriel
dc.date.accessioned 2023-09-26T21:38:16Z
dc.date.available 2023-09-26T21:38:16Z
dc.date.issued 2023
dc.description.abstract Violence detection in surveillance video is an important task for preventing social and personal security issues. Traditional surveillance systems usually need a human operator to monitor a large number of cameras, which leads to problems such as missed detections and false positives. To address this problem, researchers have in recent years proposed computer vision-based methods to detect violent actions. Violence detection can be considered a sub-task of action recognition, but it has been less investigated: although many action recognition works have been proposed for human behavior analysis, there are only a few CCTV-based surveillance methods for analyzing violent actions. In the violence detection literature, most methods tackle the problem as a classification task, where a short video is labeled as violent or non-violent. Only a few methods tackle it as a spatiotemporal detection task, where the method must localize violent actions both spatially and temporally. We assume that the lack of such methods is due to the exorbitant cost of annotating current violence datasets at frame level. In this work, we propose a spatiotemporal violence detection method that uses a weakly supervised approach to train the model with only video-level labels. Our proposal uses a deep learning model following a Fast R-CNN (Girshick, 2015) style architecture extended temporally. Our method starts by generating spatiotemporal proposals, called action tubes, by leveraging a pre-trained person detector and motion information. An action tube is defined as a set of temporally related bounding boxes that enclose and track a person performing an action. Then, a video with its action tubes is fed to the model to extract spatiotemporal features, and finally, we train a tube classifier based on multiple-instance learning (Liu et al., 2012).
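The multiple-instance learning step can be illustrated with a minimal sketch, assuming the common MIL max-pooling formulation: each video is a bag of action tubes, the video score is the maximum tube score, and a binary cross-entropy loss on that score is trained from video-level labels alone. The linear tube classifier, the made-up 2-D tube features, and the names `train_mil` and `bag_score` are hypothetical stand-ins for the spatiotemporal features and classifier head described above; this is not the thesis's implementation, and its exact MIL loss (following Liu et al., 2012) may differ.

```python
import math
from typing import List, Sequence, Tuple

# Hypothetical representation: a tube is a fixed-length feature vector
# (standing in for features from the spatiotemporal backbone).
Tube = Sequence[float]
Bag = Sequence[Tube]  # one video = one bag of action tubes


def sigmoid(z: float) -> float:
    return 1.0 / (1.0 + math.exp(-z))


def tube_score(w: Sequence[float], b: float, tube: Tube) -> float:
    """Violence probability of a single tube from a linear (logistic) classifier."""
    return sigmoid(sum(wi * xi for wi, xi in zip(w, tube)) + b)


def bag_score(w: Sequence[float], b: float, bag: Bag) -> Tuple[float, int]:
    """MIL max pooling: the video score is the score of its highest-scoring tube.

    The index of that tube doubles as a weak localization output.
    """
    scores = [tube_score(w, b, t) for t in bag]
    best = max(range(len(scores)), key=scores.__getitem__)
    return scores[best], best


def train_mil(bags: List[Bag], labels: List[int], lr: float = 0.5,
              epochs: int = 200) -> Tuple[List[float], float]:
    """Train from video-level labels only (binary cross-entropy on the bag score).

    The BCE gradient w.r.t. the selected tube's logit is (p - y); only the
    max-scoring tube receives gradient (a subgradient of the max).
    """
    dim = len(bags[0][0])
    w, b = [0.0] * dim, 0.0
    for _ in range(epochs):
        for bag, y in zip(bags, labels):
            p, best = bag_score(w, b, bag)
            g = p - y
            w = [wi - lr * g * xi for wi, xi in zip(w, bag[best])]
            b -= lr * g
    return w, b


# Toy data, invented for illustration: which tube is violent is never given.
bags = [
    [[0.1, 0.2], [0.9, 0.8], [0.2, 0.1]],  # violent video
    [[0.8, 0.9], [0.1, 0.1]],              # violent video
    [[0.1, 0.3], [0.2, 0.2]],              # non-violent video
    [[0.3, 0.1], [0.2, 0.1], [0.1, 0.2]],  # non-violent video
]
labels = [1, 1, 0, 0]

if __name__ == "__main__":
    w, b = train_mil(bags, labels)
    for bag, y in zip(bags, labels):
        p, best = bag_score(w, b, bag)
        print(f"label={y} score={p:.3f} top_tube={best}")
```

Max pooling is the simplest MIL aggregator; it lets the classifier pick out the single most violent tube without tube-level annotations, at the cost of updating only one tube per video in each step.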
The spatial localization relies on the pre-trained person detector and on motion regions extracted from dynamic images (Bilen et al., 2017). A dynamic image summarizes the motion of a sequence of frames in a single image. Temporal localization, in turn, is provided by the action tubes, which group spatial regions over time. We evaluate the proposed method on four publicly available datasets: Hockey Fight, RWF-2000, RLVSD, and UCFCrime2Local. Our proposal achieves accuracies of 97.3%, 88.71%, and 92.88% for violence detection on the Hockey Fight, RWF-2000, and RLVSD datasets, respectively, which are close to those of state-of-the-art methods. In addition, our method is able to detect the spatial locations of violent actions in video frames. To validate our spatiotemporal violence detection results, we use the UCFCrime2Local dataset, on which the proposed approach reduces the spatiotemporal localization error to 31.92%, demonstrating the feasibility of the approach for detecting and tracking violent actions.
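As an illustration of how a dynamic image summarizes motion, the sketch below applies approximate rank pooling (Bilen et al.) to a toy grayscale clip: frame t of T is weighted by alpha_t = 2(T - t + 1) - (T + 1)(H_T - H_{t-1}), where H_k is the k-th harmonic number and H_0 = 0, and the weighted frames are summed. Representing frames as nested Python lists, the min-max rescaling step, and the function names are assumptions made for this example; the thesis's own dynamic-image computation (window length, RGB handling, normalization, and how motion regions are then extracted) may differ.

```python
from typing import List

Frame = List[List[float]]  # illustrative grayscale frame: rows of pixel intensities


def arp_coefficients(T: int) -> List[float]:
    """Approximate rank pooling weights alpha_1..alpha_T.

    alpha_t = 2(T - t + 1) - (T + 1)(H_T - H_{t-1}), with H_0 = 0.
    """
    H = [0.0]  # H[k] = 1 + 1/2 + ... + 1/k
    for k in range(1, T + 1):
        H.append(H[-1] + 1.0 / k)
    return [2.0 * (T - t + 1) - (T + 1) * (H[T] - H[t - 1]) for t in range(1, T + 1)]


def dynamic_image(frames: List[Frame]) -> Frame:
    """Collapse a clip into one image as sum_t alpha_t * frame_t.

    The weights sum to zero, so pixels that never change cancel out, while
    pixels that change receive non-zero values (sign reflects direction in time).
    """
    if len(frames) < 2:
        raise ValueError("need at least two frames")
    alphas = arp_coefficients(len(frames))
    rows, cols = len(frames[0]), len(frames[0][0])
    out = [[0.0] * cols for _ in range(rows)]
    for alpha, frame in zip(alphas, frames):
        for r in range(rows):
            for c in range(cols):
                out[r][c] += alpha * frame[r][c]
    return out


def rescale_0_255(img: Frame) -> Frame:
    """Min-max rescale to [0, 255] for visualization (assumed post-processing)."""
    lo = min(min(row) for row in img)
    hi = max(max(row) for row in img)
    if hi - lo < 1e-12:
        return [[0.0] * len(row) for row in img]
    return [[255.0 * (v - lo) / (hi - lo) for v in row] for row in img]


# Toy 3-frame, 1x2 clip: the left pixel lights up at the end,
# the right pixel is bright only at the start.
clip = [
    [[0.0, 1.0]],
    [[0.0, 0.0]],
    [[1.0, 0.0]],
]

if __name__ == "__main__":
    print(arp_coefficients(3))
    d = dynamic_image(clip)
    print(d)  # left value positive (late change), right value negative (early change)
    print(rescale_0_255(d))
```

Because the weights sum to zero, a static background maps to zero, so regions of large magnitude in a dynamic image are natural candidates for motion regions.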
dc.description.uri Master's thesis
dc.format application/pdf
dc.identifier.other 1079964
dc.identifier.uri https://hdl.handle.net/20.500.12590/17743
dc.language.iso eng
dc.publisher Universidad Católica San Pablo
dc.publisher.country PE
dc.rights info:eu-repo/semantics/openAccess
dc.rights.uri https://creativecommons.org/licenses/by-nc/4.0/
dc.subject Weakly supervised learning
dc.subject Spatio-temporal detection of violence
dc.subject Dynamic image
dc.subject Video surveillance
dc.subject.ocde http://purl.org/pe-repo/ocde/ford#1.02.01
dc.title Weakly supervised spatiotemporal violence detection in surveillance video
dc.type info:eu-repo/semantics/masterThesis
dc.type.version info:eu-repo/semantics/publishedVersion
renati.advisor.dni 30960286
renati.advisor.orcid https://orcid.org/0000-0003-2440-0247
renati.author.dni 74071654
renati.discipline 611017
renati.juror Ochoa Luna, Jose Eduardo
renati.juror Gomez Nieto, Erick Mauricio
renati.juror Alves Bonfim de Queiroz, Rafael
renati.level https://purl.org/pe-repo/renati/level#maestro
renati.type https://purl.org/pe-repo/renati/type#tesis
thesis.degree.discipline Computer Science
thesis.degree.grantor Universidad Católica San Pablo. Department of Computer Science
thesis.degree.level Master's
thesis.degree.name Master in Computer Science
thesis.degree.program Professional School of Computer Science
Files
Original bundle
Name: AUTORIZACIÓN.pdf; Size: 202.9 KB; Format: Adobe Portable Document Format
Name: CHOQUELUQUE_ROMAN_DAV_WEA.pdf; Size: 93.41 MB; Format: Adobe Portable Document Format
Name: TURNITIN.pdf; Size: 19.11 MB; Format: Adobe Portable Document Format
Name: ACTA.pdf; Size: 58.64 KB; Format: Adobe Portable Document Format
License bundle
Name: license.txt; Size: 1.71 KB; Format: Item-specific license agreed upon to submission