Machine Learning Application using Pose Estimation to Detect and Moderate Violence in Live Videos

Abstract

Recordings of public violence have never been as readily available as they are today. Live feeds of shootings and attacks have become an ever-increasing problem, with gruesome images of violence only a click away from viewers of all ages. AI has begun to be employed to monitor video surveillance in prisons and psychiatric centres to detect “suspicious behaviour”, but this technique has yet to be exploited for more general monitoring of live broadcast and media sharing sites such as Twitch.tv and Facebook Live. The proposed model could serve as a “missing piece” in the field of censorship AI and form the basis of a start-up company, either as a web browser add-on or sold directly to streaming services for incorporation into their websites.

Overview

A significant amount of research has been done on methods to detect violence in videos, focusing on visual content,[1] audio content[2] or a combination of the two.[3] There has been major success with real-time monitoring of audio profanities and nudity, but most online platforms to date still rely on manual human monitoring and reporting to detect violent content.[4] This report focuses on violent human behaviour such as fighting, as opposed to videos involving weapons, blood or fire, which have previously been classified using simple image classification algorithms.[5]

Violent human behaviour can be classified in real time using pose estimation, an emerging area of research in which 3D stick-figure poses of individuals are extracted from 2D pictures and videos. Current applications include the automatic creation of assets for digital media such as video games, the analysis and coaching of athletes’ techniques and, of specific interest to this report, machine learning using image classification techniques.[6] Difficulties in the process include accounting for lighting, occlusion and variety of clothing. An advantage of deep learning over “hand-crafted” techniques is that it does not require generalisation of frames or prior information, so there is no need for heavy pre-processing of the data.

2.1.  Previous developments on pose classification 

Bogo and Kanazawa et al. previously reported a convolutional neural network that predicts the positions of individual joints. These predictions serve as the basis for fitting a Skinned Multi-Person Linear (SMPL) model in a classic bottom-up, top-down approach, from which the full 3D geometry and body type are also inferred.[7] The computational demand of this approach is reduced by applying constraints such as ruling out impossible joint bends, which minimises the number of possible solutions.

Real-time pose estimation has been achieved with the work of Güler et al.[8] Their technique, dense human pose estimation, maps human pixels in a video frame to the 3D surface of the human body with positive results, improved by training an additional “inpainting” network that fills in missing values based on the surrounding data. In contrast to the earlier work of Bogo and Kanazawa et al., the output poses are dense correspondences between the 2D images and the 3D models, an approach named DensePose.

  System description

The objective of this report was to develop a model for an end-to-end trainable deep neural network that classifies live videos to detect violence. The specific aims can be broken down as follows:

●       Assess the performance of a convolutional neural network trained on frame differences, followed by a ConvLSTM that classifies the frames, where the output is an overall violence probability score ranging from 0 (extremely unlikely to be violent) to 1 (certain violence).

●       Theorise the pros and cons of the system and evaluate the algorithm against known benchmarks.

●       Detail how this model can be incorporated into a live-video moderation application for streaming sites and browsers.
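As a minimal illustration of the score described in the first aim, the sketch below squashes an unbounded classifier output into the range 0 to 1. The logistic (sigmoid) function and the name `violence_score` are assumptions made for illustration; the report does not specify the output activation of the network.

```python
import math


def violence_score(logit: float) -> float:
    """Map an unbounded classifier output to a score in [0, 1].

    Assumption: a logistic (sigmoid) output unit. The report only states
    that the score ranges from 0 (unlikely violent) to 1 (certain violence).
    """
    # Numerically stable form: never call exp() on a large positive number.
    if logit >= 0:
        return 1.0 / (1.0 + math.exp(-logit))
    z = math.exp(logit)
    return z / (1.0 + z)


if __name__ == "__main__":
    # Hypothetical raw outputs, chosen only to show the shape of the mapping.
    for raw in (-4.0, 0.0, 4.0):
        print(f"logit={raw:+.1f} -> score={violence_score(raw):.3f}")
```

A raw output of zero maps to the midpoint of the range, and increasingly positive or negative outputs approach 1 and 0 respectively.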

3.1.  Pose classification

A convolutional neural network extracts frame-level features from a video in real time using frame differences. The output of the trained convolutional network, which is the desired pose information, is subsequently fed into a series of feed-forward layers that output the probability, and thus the level, of violence in the frames seen so far. The model can be considered a classical black box, a block diagram of which is shown in Figure 1.
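The frame-difference input mentioned above can be sketched as follows. This is a simplified illustration rather than the report's implementation: it assumes greyscale frames stored as nested Python lists, and the name `frame_differences` is hypothetical. A practical system would more likely use an array library and colour frames.

```python
from typing import List

# Assumption: one greyscale frame is a list of rows of pixel intensities.
Frame = List[List[float]]


def frame_differences(frames: List[Frame]) -> List[Frame]:
    """Return the pixel-wise difference between each pair of consecutive frames.

    For n input frames the result holds n - 1 difference images, each the
    same size as a frame. Static background cancels to zero, so the values
    that remain mostly reflect motion.
    """
    diffs: List[Frame] = []
    for prev, curr in zip(frames, frames[1:]):
        diffs.append([
            [c - p for p, c in zip(prev_row, curr_row)]
            for prev_row, curr_row in zip(prev, curr)
        ])
    return diffs


if __name__ == "__main__":
    # Three toy 2x2 frames: motion between the first two, none afterwards.
    clip = [
        [[0, 0], [0, 0]],
        [[1, 0], [0, 2]],
        [[1, 0], [0, 2]],
    ]
    for diff in frame_differences(clip):
        print(diff)
```

Feeding differences rather than raw frames means the network sees changes between frames directly, which is the property the classification stage relies on.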

Although an advantage of convolutional neural networks is an absence of extensive pre-processing of the training data, the method in which the video frames are fed to the model can improve the accuracy of the algorithm. Classification accuracy was investigated by Krizhevsky et al. for the ImageNet dataset using each video frame separately and the difference between each frame.[9] The classification accuracy rose from 96.0

The recognition accuracy and error rate of the algorithm can be evaluated on a number of standard benchmark datasets, for example the Hollywood, YouTube-Actions and Violent-Flows datasets. This can be done using cross-validation against other well-studied classification techniques such as ViF/SVM,[12] OViF[13] and MoSIFT,[14] which have been evaluated on the Violent-Flows dataset. A robust dataset for pose-action classification known as Action Similarity Labelling (ASLAN) was presented by Kliper-Gross et al. in 2012 and has become a standard benchmark for pose estimation in the years since.[15] The dataset includes thousands of videos covering over 400 complex pose-action classes, both violent and non-violent. Models incorporating both convolution and ConvLSTM layers for pose classification exhibit accuracies of approximately 94 % and above when evaluated against Violent-Flows and ASLAN.[4] Similar results should therefore be expected from the model proposed in this report. As mentioned in Section 3, problems have arisen in previous work when videos of sporting events were included in the training dataset. When evaluating this model, accuracy should therefore be measured both with and without sports footage.
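The with-and-without-sports comparison suggested above could be computed along the following lines. This is a sketch under stated assumptions: the `Clip` record, the `is_sport` flag and the 0.5 decision threshold are all hypothetical, and the clips in the demonstration are toy values, not benchmark results.

```python
from typing import Dict, Iterable, NamedTuple


class Clip(NamedTuple):
    label: bool     # ground truth: True if the clip is violent
    score: float    # model output in [0, 1]
    is_sport: bool  # hypothetical flag: True if the clip shows a sporting event


def accuracy(clips: Iterable[Clip], threshold: float = 0.5) -> float:
    """Fraction of clips whose thresholded score matches the ground truth."""
    clips = list(clips)
    if not clips:
        raise ValueError("no clips to evaluate")
    correct = sum((c.score >= threshold) == c.label for c in clips)
    return correct / len(clips)


def accuracy_with_and_without_sports(
    clips: Iterable[Clip], threshold: float = 0.5
) -> Dict[str, float]:
    """Report accuracy on all clips and on the subset without sports footage."""
    clips = list(clips)
    non_sport = [c for c in clips if not c.is_sport]
    return {
        "including_sports": accuracy(clips, threshold),
        "excluding_sports": accuracy(non_sport, threshold),
    }


if __name__ == "__main__":
    toy_clips = [
        Clip(label=True, score=0.9, is_sport=False),
        Clip(label=False, score=0.1, is_sport=False),
        Clip(label=False, score=0.8, is_sport=True),  # sport mistaken for violence
        Clip(label=True, score=0.7, is_sport=True),
    ]
    print(accuracy_with_and_without_sports(toy_clips))
```

The error rate is one minus the accuracy in each case, and a large gap between the two figures would indicate that sports footage is a significant source of misclassification.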

  Discussion: Application of the algorithm

The final output of the model described in Section 3 gives the user a continuous probability score for the presence of violence in a livestream in real time. The model incorporates modern methods such as ConvLSTM to improve classification accuracy and black-box convolutional neural networks to allow real-time detection. Assuming that the model performs well against established benchmark datasets, the next step is to research the algorithm’s viability as a commercial product and to specify the niche market, functionality and obstacles that a start-up company using this technology may face.

5.1.  Market analysis

The use of deep neural networks for video violence detection is currently in its infancy. The most prominent use of the technology to date is “AI Guardman”, developed by the company Earth Eyes and released in late 2018. The software targets shoplifting in CCTV footage using a pose estimation model based on OpenPose, a predecessor to DensePose discussed in Section 2. Although the source code is not available, the fact that the product is largely based on the OpenPose algorithm implies that it cannot compute in real time. To combat this, only a selected number of poses are defined to reflect “suspicious behaviour”, which leads to less accurate results and more false positives. The software occupies a niche because it does not require sound, which standard CCTV cameras do not process. It can be installed on the CCTV system directly, and alerts are sent to the phone of a shop worker, who can then handle the rest of the matter. The software is simple but is currently plagued by false alarms. As all commercial uses of violence detection are geared towards surveillance, a niche is identified for violence detection in streaming services, optimised for online applications.

5.2.  Implementation

Having identified this algorithm as a unique product, it is important to understand how to implement the model in a valuable product. It is critical to note that violence detection software to date has been confined to surveillance monitoring, where crude analysis can be tolerated because the software works as a warning system that can be followed up by human investigation. As a result, it has not needed to incorporate other elements such as audio classification.

On its own, this algorithm will be able to moderate livestreams based on action recognition, but when paired with already well-researched audio profanity and nudity detection it provides the missing piece of a robust streaming video moderator. Much work has previously been done on the detection of audio profanities in videos, most notably Bleep, developed for iOS and released in 2015, which can censor swear words from voice calls and videos. The same can be said for nudity detection with NudeNet, released for video censoring in March 2019. Amalgamating these three technologies results in a much-requested feature for streaming services such as Twitch.tv: the ability to censor and report streams instantaneously.

As mentioned above, websites such as Twitch.tv could benefit from this technology, and a first application for a start-up company would be to pitch the idea to streaming services for incorporation into their website and/or app. The feature could be toggled by adult users and made mandatory for child accounts. Because the output of the model is a probability score for violence, thresholds could be put in place. In addition, a browser add-on could be developed for unsupported streaming websites.
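The account-based thresholds described above might look like the following sketch. The policy structure, the function names and the threshold values (0.3 for child accounts, 0.7 as the adult default) are hypothetical placeholders; suitable values would have to be chosen from evaluation data.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class AccountPolicy:
    filter_enabled: bool    # whether violence filtering is active
    block_threshold: float  # block when the score reaches this value


# Assumption: filtering is mandatory and stricter for child accounts.
CHILD_POLICY = AccountPolicy(filter_enabled=True, block_threshold=0.3)


def adult_policy(filter_enabled: bool, block_threshold: float = 0.7) -> AccountPolicy:
    """Adult users may toggle the filter and, optionally, adjust the threshold."""
    return AccountPolicy(filter_enabled, block_threshold)


def should_block(score: float, policy: AccountPolicy) -> bool:
    """Decide whether a stream segment is hidden for a given account policy."""
    if not 0.0 <= score <= 1.0:
        raise ValueError("score must lie in [0, 1]")
    return policy.filter_enabled and score >= policy.block_threshold


if __name__ == "__main__":
    score = 0.5  # hypothetical model output for one stream segment
    print("child account blocked:", should_block(score, CHILD_POLICY))
    print("adult account blocked:", should_block(score, adult_policy(True)))
```

Keeping the decision separate from the model means a platform can tune thresholds per audience without retraining the classifier.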

The initial task of a start-up company would then be to build a model in which audio profanity and nudity detection algorithms run in parallel while frames are still classified in real time. The idea could then be pitched to established streaming websites, which could host the algorithm server-side. In addition, if there were demand for the product, a browser add-on could be developed, incorporating a user-friendly interface with customisable censoring options such as censoring nudity only or censoring violence above a certain probability threshold.
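One way to picture the parallel detectors and the customisable censoring options is sketched below. Everything here is an assumption for illustration: the detectors are stand-in functions returning fixed scores, not the real violence, nudity or profanity models, and the names `run_detectors` and `categories_to_censor` are hypothetical.

```python
from concurrent.futures import ThreadPoolExecutor
from typing import Callable, Dict, Set

# Assumption: each detector takes a media chunk and returns a score in [0, 1].
Detector = Callable[[bytes], float]


def run_detectors(chunk: bytes, detectors: Dict[str, Detector]) -> Dict[str, float]:
    """Run every detector on the same chunk concurrently and collect the scores."""
    with ThreadPoolExecutor(max_workers=max(1, len(detectors))) as pool:
        futures = {name: pool.submit(fn, chunk) for name, fn in detectors.items()}
        return {name: future.result() for name, future in futures.items()}


def categories_to_censor(
    scores: Dict[str, float], thresholds: Dict[str, float]
) -> Set[str]:
    """Return the categories whose score reaches the user's chosen threshold.

    A category missing from `thresholds` is treated as switched off by the user.
    """
    return {
        name for name, limit in thresholds.items()
        if scores.get(name, 0.0) >= limit
    }


if __name__ == "__main__":
    # Placeholder detectors with fixed outputs, standing in for real models.
    detectors = {
        "violence": lambda chunk: 0.8,
        "nudity": lambda chunk: 0.1,
        "profanity": lambda chunk: 0.6,
    }
    scores = run_detectors(b"", detectors)
    # A user who only wants violence censored above a 0.5 probability.
    print(categories_to_censor(scores, {"violence": 0.5}))
```

Threads are used here only to keep the sketch short. Real detectors would more plausibly run in separate processes or on dedicated hardware so that one slow model does not hold back the real-time violence score.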

5.3.  Conclusion

Overall, a literature review of pose estimation and violence detection was conducted to present the notable research in these fields and to highlight the lack of commercial applications aside from surveillance. A model was proposed that uses pose estimation to detect violence in real time, comprising a convolutional neural network and LSTM-divided layers based on current research. The system architecture was discussed, including a complete block diagram of the system. Pros and cons of the algorithm were theorised, along with a proposed system evaluation. Finally, the capability of the algorithm to act as a streaming moderator and as the product of a start-up company was discussed.

 References

[1] P. Bilinski and F. Bremond, “Human Violence Recognition and Detection in Surveillance Videos,” AVSS, August, 2016.

[2] T. Giannakopoulos, A. Pikrakis, and S. Theodoridis, “A multi-class audio classification method with respect to violent content in movies using Bayesian Networks,” in 2007 IEEE 9th International Workshop on Multimedia Signal Processing, MMSP 2007 – Proceedings, 2007, pp. 90–93.

[3] E. Acar, F. Hopfgartner, and S. Albayrak, “Breaking down violence detection: Combining divide-et-impera and coarse-to-fine strategies,” Neurocomputing, vol. 208, pp. 225–237, Oct. 2016.

[4] S. Sudhakaran and O. Lanz, “Learning to detect violent videos using convolutional long short-term memory,” in 2017 14th IEEE International Conference on Advanced Video and Signal Based Surveillance, AVSS 2017, 2017.

[5] O. Arriaga, P. Plöger, and M. Valdenegro-Toro, “Image Captioning and Classification of Dangerous Situations,” 2017.

[6] M. Ariz, A. Villanueva, and R. Cabeza, “Robust and accurate 2D-tracking-based 3D positioning method: Application to head pose estimation,” Computer Vision and Image Understanding, vol. 180, Academic Press, 2016, pp. 13–22.

[7] F. Bogo, A. Kanazawa, C. Lassner, P. Gehler, J. Romero, and M. J. Black, “Keep it SMPL: Automatic Estimation of 3D Human Pose and Shape from a Single Image,” ECCV 2016, arXiv:1607.08128v1 [cs.CV], 2016, pp. 1–18.

[8] R. A. Güler, N. Neverova, and I. Kokkinos, “DensePose: Dense Human Pose Estimation in the Wild,” in Proceedings of the IEEE Computer Society Conference on Computer Vision and Pattern Recognition, 2018, pp. 7297–7306.

[9] A. Krizhevsky, I. Sutskever, and G. E. Hinton, “ImageNet Classification with Deep Convolutional Neural Networks,” Adv. Neural Inf. Process. Syst., pp. 1–9, 2012.

[10] Z. Dong, J. Qin, and Y. Wang, “Multi-stream deep networks for person to person violence detection in videos,” in Communications in Computer and Information Science, 2016, vol. 662, pp. 517–531.

[11] C. Olah, A. Mordvintsev, and L. Schubert, “Feature Visualization,” Distill, vol. 2, no. 11, p. e7, Nov. 2017.

[12] T. Hassner, “Violent-Flows – Crowd Violence Non-violence Database and benchmark,” 2014.

[13] Y. Gao, H. Liu, X. Sun, C. Wang, and Y. Liu, “Violence detection using Oriented VIolent Flows,” Image and Vision Computing, vol. 48–49. pp. 37–41, 2016.

[14] E. Bermejo Nievas, O. Deniz Suarez, G. Bueno García, and R. Sukthankar, “Violence Detection in Video Using Computer Vision Techniques,” Springer, Berlin, Heidelberg, 2011, pp. 332–339.

[15] O. Kliper-Gross, T. Hassner, and L. Wolf, “The action similarity labeling challenge,” IEEE Trans. Pattern Anal. Mach. Intell., vol. 34, no. 3, pp. 615–621, 2012.
 
