Know your data: Understanding implicit usage versus explicit action in video content classification

2011

In this paper, we present a method for video category classification that uses only social metadata from websites like YouTube. In place of content analysis, we use the communicative and social contexts surrounding videos to determine a categorical genre (e.g., Comedy, Music). We hypothesize that video clips belonging to different genre categories have distinct signatures and patterns that are reflected in their collected metadata. In particular, we define and describe social metadata as usage or action to aid in classification. We trained a Naive Bayes classifier to predict categories from a sample of 1,740 YouTube videos representing the top five genre categories. Using just a small number of the available metadata features, we compare the classifications produced by our Naive Bayes classifier with those provided by the uploader of each video. Compared with random predictions on the YouTube data (21% accuracy), our classifier attained a mediocre 33% accuracy in predicting video genres. However, we found that the accuracy of our classifier improves significantly when the explicit data features are nominally factored. By factoring the ratings of the videos in the dataset, the classifier correctly predicted the genres of 75% of the videos. We argue that the patterns of social activity found in the metadata are not just meaningful in their own right but are also indicative of the meaning of the shared video content. The results presented by this project represent a first step in investigating the potential meaning and significance of social metadata and its relation to the media experience.
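The abstract does not describe how "nominal factoring" was implemented. One plausible reading is that numeric ratings are converted into a small set of named categories before they are passed to a Naive Bayes model over nominal features. The sketch below follows that reading and rests on several assumptions:

- Ratings are on a 0 to 5 scale.
- The `bin_rating` thresholds are hypothetical.
- The "comment activity" feature is hypothetical.
- The toy data is invented for illustration.

```python
import math
from collections import Counter, defaultdict


def bin_rating(rating):
    """Hypothetical nominal factoring: map a 0-5 star rating to a category."""
    if rating < 2.5:
        return "low"
    if rating < 4.0:
        return "medium"
    return "high"


class CategoricalNaiveBayes:
    """Naive Bayes over nominal features with add-one (Laplace) smoothing."""

    def fit(self, rows, labels):
        self.class_counts = Counter(labels)
        self.total = len(labels)
        self.counts = defaultdict(Counter)  # feature index -> Counter of (value, label)
        self.values = defaultdict(set)      # feature index -> values seen in training
        for row, label in zip(rows, labels):
            for i, value in enumerate(row):
                self.counts[i][(value, label)] += 1
                self.values[i].add(value)
        return self

    def predict(self, row):
        best_label, best_score = None, -math.inf
        for label, count in self.class_counts.items():
            score = math.log(count / self.total)  # log prior P(label)
            for i, value in enumerate(row):
                seen = len(self.values[i])
                # Smoothed log likelihood P(value | label).
                score += math.log((self.counts[i][(value, label)] + 1) / (count + seen))
            if score > best_score:
                best_label, best_score = label, score
        return best_label


# Hypothetical training data: (star rating, comment activity) -> uploader genre.
raw = [(4.8, "many"), (4.6, "many"), (3.0, "few"), (2.0, "few"), (4.9, "many"), (1.5, "few")]
genres = ["Music", "Music", "Comedy", "Comedy", "Music", "Comedy"]
rows = [(bin_rating(r), activity) for r, activity in raw]
model = CategoricalNaiveBayes().fit(rows, genres)
print(model.predict((bin_rating(4.7), "many")))  # Music
```

Working in log space avoids numeric underflow when many features are multiplied together. Add-one smoothing keeps an unseen (value, genre) pair from zeroing out a class.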

Knowing Funny: Genre Perception and Categorization in Social Video Sharing

Proceedings of the 2011 annual …, 2011

Categorization of online videos is often treated as a tag suggestion task; tags can be generated by individuals or by machine classification. In this paper, we suggest that categorization can be determined socially, based on people's interactions around media content, without recourse to metadata that are intrinsic to the media object itself. This work bridges the gap between the human perception of genre and the automatic categorization of genre in classifying online videos. We present findings from two internet surveys and from follow-up interviews in which we address how people determine genre classification for videos and how the social framing of video content can alter the perception and categorization of that content. Based on these findings, we train a Naive Bayes classifier to predict genre categories. The trained classifier achieved 82% accuracy using only social action data, without the use of content or media-specific metadata. We conclude with implications for how we categorize and organize media online, as well as what our findings mean for designing and building future tools and interaction experiences.
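Accuracy figures such as the 82% above are only meaningful next to a baseline and a view of which genres get confused. The sketch below compares predicted genres against reference labels, such as uploader- or survey-assigned genres. All labels in it are hypothetical. It computes two simple baselines:

- **Uniform random guessing:** the expected accuracy is 1/K for K genres, whatever the class distribution.
- **Always predicting the majority genre.**

Neither baseline is claimed to be the one either paper used.

```python
from collections import Counter


def accuracy(predicted, actual):
    """Fraction of items whose predicted genre matches the reference label."""
    if len(predicted) != len(actual) or not actual:
        raise ValueError("need two non-empty label lists of equal length")
    return sum(p == a for p, a in zip(predicted, actual)) / len(actual)


def confusion(predicted, actual):
    """Count (actual, predicted) pairs; off-diagonal pairs reveal genre confusion."""
    return Counter(zip(actual, predicted))


def uniform_random_baseline(labels):
    """Expected accuracy of guessing uniformly among the observed genres."""
    return 1 / len(set(labels))


def majority_baseline(labels):
    """Accuracy of always predicting the most frequent genre."""
    return max(Counter(labels).values()) / len(labels)


# Hypothetical reference labels and classifier predictions.
actual = ["Comedy", "Music", "Music", "Sports", "Comedy"]
predicted = ["Comedy", "Music", "Comedy", "Sports", "Comedy"]
print(accuracy(predicted, actual))                        # 0.8
print(confusion(predicted, actual)[("Music", "Comedy")])  # 1
```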

Boosting web video categorization with contextual information from social web

World Wide Web, 2012

Web video categorization is a fundamental task for web video search. In this paper, we explore web video categorization from a new perspective, integrating model-based and data-driven approaches to boost performance. The boost comes from two sources.

The first is improved text classification through query expansion from related videos and user videos. The model-based classifiers are built on text features extracted from titles and tags. Related videos and user videos act as external resources that compensate for these limited and noisy text features, and query expansion through them reinforces the classification performance of the text features.

The second improvement derives from integrating model-based classification with data-driven majority voting from related videos and user videos. From the data-driven viewpoint, related videos and user videos serve as sources for majority voting from the perspectives of video relevance and user interest, respectively. Semantic meaning from text, video relevance from related videos, and user interest induced from user videos are combined to determine the video category robustly. This combination of semantics, relevance, and interest further improves web video categorization.

Experiments on YouTube videos demonstrate a significant improvement of the proposed approach over traditional text-based classifiers.
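The abstract does not give the exact rule for combining the text classifier with majority voting from related and user videos. The following is a minimal sketch of one common choice, weighted linear score fusion. The function names and the weights `(0.5, 0.3, 0.2)` are hypothetical. It shows how neighbour votes can overturn a weak text-only decision.

```python
from collections import Counter


def vote_distribution(categories):
    """Turn neighbour video categories into a normalised vote distribution."""
    counts = Counter(categories)
    total = sum(counts.values())
    return {c: n / total for c, n in counts.items()} if total else {}


def fuse(text_scores, related_categories, user_categories, weights=(0.5, 0.3, 0.2)):
    """Weighted sum of semantics (text), relevance (related videos) and
    interest (uploader's other videos); the weights are hypothetical."""
    sources = [text_scores,
               vote_distribution(related_categories),
               vote_distribution(user_categories)]
    combined = Counter()
    for weight, dist in zip(weights, sources):
        for category, p in dist.items():
            combined[category] += weight * p
    return max(combined, key=combined.get)


# Text alone slightly prefers Comedy, but neighbours vote for Music.
text_scores = {"Music": 0.4, "Comedy": 0.6}
print(fuse(text_scores, ["Music", "Music", "Comedy"], ["Music", "Music"]))  # Music
print(fuse(text_scores, [], []))                                           # Comedy
```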

Content-based Automatic Video Genre Identification

International Journal of Advanced Computer Science and Applications, 2019

Video content is growing enormously with the heavy use of the internet and social media websites, and proper searching and indexing of such content is a major challenge. Existing video search relies mainly on information provided by users, such as video captions, descriptions, and subsequent comments. In such cases, if users provide insufficient or incorrect information about the video genre, the video may be indexed incorrectly and missed during search and retrieval. This paper proposes a mechanism that analyzes the contents of a video and categorizes it as Music Video, Talk Show, Movie/Drama, Animation, or Sports. For classification, the proposed system uses audio features (signal energy, zero crossing rate, and spectral flux) and visual features (shot boundaries, scene count, and actor motion). Tested on popular Hollywood, Bollywood, and YouTube videos, the system achieves an accuracy of 96%.
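The three audio features named above have standard frame-level definitions:

- **Short-time energy:** the mean squared amplitude of a frame.
- **Zero crossing rate:** the fraction of adjacent samples whose sign changes.
- **Spectral flux:** the change in magnitude spectrum between consecutive frames.

The sketch below computes them in plain Python under some assumptions:

- It uses a naive DFT, which is fine for short illustrative frames.
- Zero is counted as non-negative when detecting sign changes.
- Flux is taken as the sum of squared spectral differences.

The paper's exact frame sizes and normalisation are not stated.

```python
import cmath
import math


def frames(signal, size, hop):
    """Split a sample sequence into overlapping frames of `size` samples."""
    return [signal[i:i + size] for i in range(0, len(signal) - size + 1, hop)]


def energy(frame):
    """Short-time energy: mean squared amplitude of the frame."""
    return sum(x * x for x in frame) / len(frame)


def zero_crossing_rate(frame):
    """Fraction of adjacent sample pairs whose sign changes (0 counts as non-negative)."""
    crossings = sum(1 for a, b in zip(frame, frame[1:]) if (a >= 0) != (b >= 0))
    return crossings / (len(frame) - 1)


def magnitude_spectrum(frame):
    """Naive DFT magnitudes for bins 0..N/2."""
    n = len(frame)
    return [abs(sum(x * cmath.exp(-2j * math.pi * k * t / n) for t, x in enumerate(frame)))
            for k in range(n // 2 + 1)]


def spectral_flux(prev_frame, frame):
    """Sum of squared differences between consecutive magnitude spectra."""
    a, b = magnitude_spectrum(prev_frame), magnitude_spectrum(frame)
    return sum((y - x) ** 2 for x, y in zip(a, b))


# A steady tone has near-zero flux; silence followed by the tone has large flux.
tone = [math.sin(2 * math.pi * t / 8) for t in range(16)]
print(spectral_flux(tone[:8], tone[8:]), spectral_flux([0.0] * 8, tone[:8]))
```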

YouTube Data Analysis and Prediction of Views and Categories

International Journal for Research in Applied Science & Engineering Technology (IJRASET), 2022

YouTube has grown enormously in size and popularity. It has the potential to influence billions of lives globally, as the number of viewers grows day by day. Billions of videos are watched on YouTube every day, generating a huge amount of data. Because YouTube data is largely unstructured, there is great demand to store, process, and analyze it. This analysis helps reveal how people are performing on YouTube, making it easy to identify which content works best on the platform. The primary purpose of this project is to show how real-time data can be analyzed to obtain the latest analysis and trends on YouTube. The analysis uses user features such as views, comments, tags, likes, and dislikes. It can be performed with algorithms such as linear regression, classification, and other machine learning models, together with Python libraries such as pandas and matplotlib, to classify YouTube data and extract useful information.
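Linear regression is the simplest of the models mentioned for predicting a numeric quantity such as views from another engagement feature. The sketch below fits an ordinary least-squares line in plain Python. It uses hypothetical like and view counts, and the choice of likes as the single predictor is an assumption. In practice, columns loaded with pandas could be passed in as plain lists.

```python
def fit_line(xs, ys):
    """Ordinary least squares for y = slope * x + intercept."""
    n = len(xs)
    mean_x, mean_y = sum(xs) / n, sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    if sxx == 0:
        raise ValueError("x values must not all be equal")
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    slope = sxy / sxx
    return slope, mean_y - slope * mean_x


def predict(model, x):
    """Apply a fitted (slope, intercept) pair to a new x value."""
    slope, intercept = model
    return slope * x + intercept


# Hypothetical engagement data: likes -> views.
likes = [10, 20, 30, 40]
views = [1000, 2000, 3000, 4000]
model = fit_line(likes, views)
print(model, predict(model, 50))  # (100.0, 0.0) 5000.0
```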

Metadata Based Classification and Analysis of Large Scale Web Videos

With the astonishing growth of videos on Internet sites such as YouTube, Yahoo Screen, and Facebook, organizing videos into categories is of paramount importance for improving user experience and website utilization. In this information age, video is rapidly shared by people through social media websites such as YouTube, Facebook, and Yahoo Screen. Different categories of web video are shared on these sites and used by billions of users all over the world. Classifying/partitioning web videos in terms of video length, ratings, video age, number of comments, etc., and analyzing web video as unstructured, complex data, is a challenging task. In this work, we propose an effective model that classifies each category of web video (e.g., ‘Entertainment’, ‘People and Blogs’, ‘Sports’, ‘News and Politics’, ‘Science and Technology’) using other web metadata attributes as splitting criteria. Metadata is extracted from web videos, and on that basis the videos are classified/partitioned into categories by applying data mining classification algorithms such as Random Tree and J48. The classification results are compared and analyzed using cost/benefit analysis. The results also demonstrate that the classification of web videos depends largely on the available metadata and on the accuracy of the classification model. Classification/partitioning of web videos is an important task with many applications in video search and information retrieval. However, collecting the metadata required by a classification model may be prohibitively expensive. The experimental difficulties arise from the large data diversity within a category, the scarcity of metadata, and the poor condition of web video metadata.
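J48 is Weka's implementation of the C4.5 decision tree, which chooses splitting attributes by gain ratio. Gain ratio is information gain divided by the attribute's own split information. The sketch below computes the gain ratio for a nominal attribute. It assumes a numeric attribute such as video length has already been binned into labels like "short" and "long", and that binning is hypothetical. Real C4.5 also searches numeric thresholds directly and prunes the tree, which this sketch omits.

```python
import math
from collections import Counter, defaultdict


def entropy(labels):
    """Shannon entropy (bits) of a list of class labels."""
    n = len(labels)
    return -sum((c / n) * math.log2(c / n) for c in Counter(labels).values())


def gain_ratio(values, labels):
    """C4.5-style gain ratio of a nominal attribute with respect to the class labels."""
    n = len(labels)
    groups = defaultdict(list)
    for v, label in zip(values, labels):
        groups[v].append(label)
    remainder = sum(len(g) / n * entropy(g) for g in groups.values())
    info_gain = entropy(labels) - remainder
    split_info = -sum(len(g) / n * math.log2(len(g) / n) for g in groups.values())
    return info_gain / split_info if split_info > 0 else 0.0


# Hypothetical metadata: a perfectly predictive attribute vs. an uninformative one.
categories = ["News", "News", "Sports", "Sports"]
print(gain_ratio(["short", "short", "long", "long"], categories))  # 1.0
print(gain_ratio(["a", "b", "a", "b"], categories))                # 0.0
```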

Automatic genre identification for content-based video categorization

Proceedings 15th International Conference on Pattern Recognition. ICPR-2000, 2000

This paper presents a set of computational features, originating from our study of editing effects, motion, and color in videos, for the task of automatic video categorization. Besides representing human understanding of the typical attributes of different video genres, these features are also inspired by the techniques and rules many directors use to give a genre program specific characteristics that produce a certain emotional impact on viewers. We propose new features while also employing traditionally used ones for classification. This research goes beyond existing work with a systematic analysis of the trends exhibited by each of our features in genres such as cartoons, commercials, music, news, and sports. That analysis enables an understanding of the similarities, dissimilarities, and likely confusion between genres. Classification results from our experiments on several hours of video establish the usefulness of this feature set. We also explore the video clip duration required to achieve reliable genre identification and demonstrate its impact on classification accuracy.
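Editing-effect features typically start from shot boundary detection. Average shot length then summarises editing pace: fast cutting is common in commercials and music videos, for example. The abstract does not give the exact method. The sketch below uses a common stand-in, thresholding the distance between consecutive colour histograms. It rests on these assumptions:

- Frames are represented as lists of (r, g, b) pixels.
- The 4-bin quantisation is hypothetical.
- The 0.5 threshold is hypothetical.

```python
def color_histogram(pixels, bins=4):
    """Normalised joint RGB histogram with `bins` levels per channel (0-255 input)."""
    hist = [0.0] * (bins ** 3)
    for r, g, b in pixels:
        # Map each 0-255 channel value to a bin index in 0..bins-1.
        idx = (r * bins // 256) * bins * bins + (g * bins // 256) * bins + (b * bins // 256)
        hist[idx] += 1
    return [h / len(pixels) for h in hist]


def histogram_distance(h1, h2):
    """Half the L1 distance: 0 for identical, 1 for disjoint histograms."""
    return sum(abs(a - b) for a, b in zip(h1, h2)) / 2


def detect_cuts(frames, threshold=0.5):
    """Indices where a new shot starts (consecutive histograms differ sharply)."""
    hists = [color_histogram(f) for f in frames]
    return [i for i in range(1, len(hists))
            if histogram_distance(hists[i - 1], hists[i]) > threshold]


def average_shot_length(num_frames, cuts):
    """Mean shot length in frames; fast-paced editing gives short shots."""
    return num_frames / (len(cuts) + 1)


# Hypothetical 6-frame clip: two red frames, three blue frames, one red frame.
red, blue = [(255, 0, 0)] * 4, [(0, 0, 255)] * 4
video = [red, red, blue, blue, blue, red]
cuts = detect_cuts(video)
print(cuts, average_shot_length(len(video), cuts))  # [2, 5] 2.0
```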

Genre-specific semantic video indexing

Proceedings of the ACM International Conference on Image and Video Retrieval - CIVR '10, 2010

In many applications, we find large video collections from different genres where the user is often interested in only one or two specific video genres. So, when users query the system with a specific semantic concept, they are likely aiming at a genre-specific instantiation of that concept. A question, then, is how to detect genre-specific semantic concepts, such as Child in HomeVideo or FrontalFace in Porn, efficiently and accurately. We propose a framework for such genre-specific context detection. Genre-specific models are trained on a training set with data labelled at the video level for genres and at the shot level for semantic concepts. In the classification stage, video genre classification is applied first to reduce the entire data set to a relatively small subset; the genre-specific concept models are then applied to this subset only. Experiments were conducted on a small but realistic 28-hour video data set including YouTube videos, porn videos, TV programs, and home videos. Experimental results show that our proposed two-step method is efficient and effective. When the data set is filtered so that the percentage kept is approximately equal to the prior probability of each video genre, overall performance decreases by only about 12%, while processing speed increases by about 2 to 10 times for different video genres.
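The two-step method trades a small accuracy loss for speed. It ranks videos by a cheap genre score, keeps only the top fraction (set to the genre's prior probability, as described above), and runs the expensive concept detector on that subset alone. The sketch below assumes some details for illustration:

- The scoring functions are hypothetical callables passed in by the caller.
- The kept count is rounded up and is at least one video.

```python
import math


def two_step_detect(videos, genre_score, concept_score, prior):
    """Keep the top `prior` fraction of videos by genre score, then apply the
    (expensive) genre-specific concept detector only to that subset."""
    ranked = sorted(videos, key=genre_score, reverse=True)
    keep = max(1, math.ceil(prior * len(ranked)))
    return {v: concept_score(v) for v in ranked[:keep]}


# Hypothetical genre scores for four videos; count concept-detector calls.
scores = {"a": 0.9, "b": 0.2, "c": 0.7, "d": 0.1}
calls = []


def concept(video):
    calls.append(video)
    return 0.5  # placeholder concept score


result = two_step_detect(list(scores), scores.get, concept, prior=0.5)
print(sorted(result), len(calls))  # ['a', 'c'] 2
```

With a prior of 0.5, only half of the videos reach the concept detector. This is the source of the speed-up, at the cost of missing concept instances in videos the genre classifier ranked low.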

Analysis and Implementation Machine Learning for YouTube Data Classification by Comparing the Performance of Classification Algorithms

2020

Every day, people around the world upload 1.2 million videos to YouTube, or more than 100 hours per minute, and this number is increasing. This continuously growing data will be of little use unless it is put to use. Data mining can be a solution for extracting information from such large-scale data, and classification is one of its techniques. For most YouTube users, searching by video title returns videos that do not match the desired video category. This research was therefore conducted to classify YouTube data based on its search text. The article focuses on comparing three algorithms for classifying YouTube data into the Kesenian (arts) and Sains (science) categories. Data was collected by scraping the YouTube website for links, titles, descriptions, and searches. The research follows an experimental method: data collection, data processing, model proposal, testing, and model evaluation. The models applied...
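The abstract is truncated before it names the three algorithms, so the following is not one of the paper's confirmed models. It is a stand-in sketch of text-based classification of titles and descriptions: a bag-of-words representation with k-nearest-neighbour voting by cosine similarity. The training titles are hypothetical English examples, and `k=3` is an arbitrary choice.

```python
import math
import re
from collections import Counter


def bag_of_words(text):
    """Lower-case word counts from a title/description string."""
    return Counter(re.findall(r"\w+", text.lower()))


def cosine(a, b):
    """Cosine similarity between two word-count vectors."""
    dot = sum(a[w] * b[w] for w in a if w in b)
    norm = math.sqrt(sum(v * v for v in a.values())) * math.sqrt(sum(v * v for v in b.values()))
    return dot / norm if norm else 0.0


def knn_predict(train, text, k=3):
    """Majority label among the k training texts most similar to `text`."""
    query = bag_of_words(text)
    neighbours = sorted(train, key=lambda item: cosine(query, bag_of_words(item[0])),
                        reverse=True)[:k]
    return Counter(label for _, label in neighbours).most_common(1)[0][0]


# Hypothetical labelled titles.
train = [
    ("traditional dance performance", "Kesenian"),
    ("batik painting tutorial", "Kesenian"),
    ("gamelan music concert", "Kesenian"),
    ("physics experiment at home", "Sains"),
    ("chemistry reaction explained", "Sains"),
    ("astronomy planets explained", "Sains"),
]
print(knn_predict(train, "chemistry experiment explained"))  # Sains
```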

Web Video Mining: Metadata Predictive Analysis using Classification Techniques

Nowadays, data engineering is becoming an emerging trend for discovering knowledge from web audio-visual data such as YouTube videos, Yahoo Screen, and Facebook videos. Different categories of web video are being shared on such social websites and used by billions of users all over the world. Uploaded web videos carry different kinds of metadata as attribute information of the video data. These metadata attributes conceptually define the contents and characteristics of the web videos. Hence, accomplishing web video mining by extracting features of web videos in terms of metadata is a challenging task. In this work, we attempt to classify and predict metadata features of web videos, such as video length, number of comments, ratings information, and view counts, using data mining algorithms such as the J48 decision tree and naive Bayesian classifiers as part of web video mining. The results of the J48 decision tree and naive Bayesian classification models are analyzed and compared as a step in the process of knowledge discovery from web videos.
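A comparison between two classifiers such as J48 and naive Bayes is fairest when both are evaluated on the same data splits. Stratified k-fold cross-validation is a common protocol for this, though the abstract does not say which protocol was used. The sketch below builds class-balanced folds and reports mean accuracy for any `train_and_predict` function. A hypothetical majority-class baseline stands in for a real model, and the labels are invented.

```python
from collections import defaultdict


def stratified_folds(labels, k=10):
    """Assign example indices to k folds, dealing each class round-robin
    so every fold gets a similar share of every class."""
    by_class = defaultdict(list)
    for i, label in enumerate(labels):
        by_class[label].append(i)
    folds = [[] for _ in range(k)]
    position = 0
    for label in sorted(by_class):
        for i in by_class[label]:
            folds[position % k].append(i)
            position += 1
    return folds


def cross_validate(train_and_predict, rows, labels, k=10):
    """Mean accuracy of a classifier over k stratified folds (empty folds skipped)."""
    scores = []
    for fold in stratified_folds(labels, k):
        if not fold:
            continue
        held_out = set(fold)
        train_idx = [i for i in range(len(rows)) if i not in held_out]
        predictions = train_and_predict([rows[i] for i in train_idx],
                                        [labels[i] for i in train_idx],
                                        [rows[i] for i in fold])
        scores.append(sum(p == labels[i] for p, i in zip(predictions, fold)) / len(fold))
    return sum(scores) / len(scores)


def majority_baseline(train_rows, train_labels, test_rows):
    """Hypothetical stand-in model: always predict the most common training label."""
    most_common = max(set(train_labels), key=train_labels.count)
    return [most_common] * len(test_rows)


labels = ["Music"] * 6 + ["News"] * 4
rows = list(range(len(labels)))  # placeholder feature rows
print(cross_validate(majority_baseline, rows, labels, k=2))  # 0.6
```

Swapping `majority_baseline` for wrappers around two real classifiers lets them be scored on identical folds, which keeps the comparison fair.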