- Middle East Technical University
CORS: A Hybrid Music Recommender System
Gulfem Demir, Tugba Kaya, Cagatay Ogut, Ali Can Sag
Middle East Technical University, Ankara, Turkey
http://www.neo4j.org/learn/neo4j

Why do we need a recommender system?
• To satisfy the human need for discovery
• To deal with the massive scale of music data, i.e. a library of over 15 million songs available on demand for free on the web
• Because filters and guides are invaluable if music itself is to coexist with the new ways of getting at it

Why CORS?
• Because CORS uses a graph database, which can traverse deep into the whole dataset faster than SQL queries that span many table joins
• Because CORS combines the two main recommendation techniques into a hybrid approach, in which supplementary content features are used to improve the accuracy of collaborative filtering
• Because CORS is trained with a large amount of metadata (~300 GB)

Introduction
As digital content distribution grows, access to music collections has skyrocketed. Listening to one million songs non-stop would take more than seven years, yet commercial music libraries exceed 15 million songs, far more than any single person can listen to. To help the community cope with the rapidly growing catalogue of readily available music, a wide range of academic efforts have sought to automate the search and retrieval of musical content. Computational recommender systems have come into play to deal with this issue. They enable people to share their opinions and benefit from each other's experience.

2. Approaches
Most recommender systems take one of two basic approaches, collaborative filtering or content-based filtering, which are compared in Figure 2. Collaborative filtering arrives at a recommendation based on a model of prior user behavior, whereas the content-based approach tries to recommend items similar to those a user liked in the past.
Hybrid approaches that combine collaborative and content-based filtering are also increasing the efficiency (and complexity) of recommender systems; our project uses such a hybrid approach.

Figure 1. Model of Recommendation Process
Figure 2. Recommender System Approaches Comparison

1. Dataset
The Million Song Dataset Challenge (MSDC) is a large-scale music recommendation challenge in which the aim is to predict which songs a user will listen to, given that user's listening history. The challenge is based on the Million Song Dataset (MSD), which is also used in our project: a freely available collection of metadata for one million contemporary songs (e.g. song titles, artists, year of publication, audio features, and much more). The MSD is also a cluster of complementary datasets contributed by the community, such as lyrics provided by the MusiXmatch dataset and user data provided by the Taste Profile Subset.

CORS

1. Design and Implementation
CORS is a music recommendation system that provides users with real-time song suggestions based on their listening history. To give instant recommendations, CORS uses a graph database, which significantly reduces query time. In addition, to deliver richer content, CORS adopts a hybrid approach that combines collaborative filtering with content-based recommendations.

Figure 3. Neo4j Graph Database Structure [3]

2. Similarity Metrics
Collaborative filtering depends on a similarity measure between users and between items. Cosine similarity is one of many similarity metrics available. With alpha equal to 0.5 in the following formulas, it has the nice property of being symmetric. However, especially in the item case, we are more interested in how likely it is that a user will appreciate an item when we already know that the same user likes another item. This notion is clearly not symmetric, which is why alpha should not be equal to 0.5 in our case.

As an alternative to cosine similarity, CORS uses a parametric generalization of the above similarity measure, given by the following formulas.

[Formulas not preserved in this transcript.]

U(i): the set of users who have rated item i
I(u): the set of items rated by user u
alpha: a parameter between 0 and 1

After some trials, alpha was set to 0.3 for the user-based recommendation strategy and finalized as 0.15 for the item-based recommendation strategy.

3. Evaluation
Dataset: We used two separate datasets for evaluation. The training dataset is the one CORS uses to generate high-quality recommendations. To measure the accuracy of our recommender system, the test dataset provided by the MSDC is used. Statistics for these two collections are given in Figure 4.

Figure 4. Datasets

Evaluation metrics: The challenge asks for x = 500 recommended items for each user. Results are evaluated using the mean average precision (MAP) metric. MAP@500 is also used in CORS so that we can compare our results with those of the challenge finalists. MAP is simply the mean of the average precision (AP) values over all users: with 1000 users, we sum the AP of each user and divide the sum by 1000. Order matters in MAP, so it is better to submit the most certain recommendations first, followed by those we are less sure about.

Results: We evaluated all three techniques implemented for generating recommendations using mean average precision.

Figure 5. Mean Average Precision Results

Acknowledgements
We would like to thank our advisors Dilek Onal, Prof. Dr. Ismail Hakki Toroslu and Prof. Dr. Veysi Isler for their comments, which helped us improve this project considerably.

References
1. Aiolli F., A Preliminary Study on a Recommender System for the Million Songs Dataset Challenge
2. McFee B., Ellis D. P. W., Bertin-Mahieux T., Lanckriet G. R. G., The Million Song Dataset Challenge
3. http://neo4j.com/blog/musicbrainz-in-neo4j-part-1/
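
The similarity formulas referred to in section 2 (Similarity Metrics) are not reproduced in this text. As a minimal sketch, assuming they follow the asymmetric cosine similarity described by Aiolli [1] for binary listening data, the item-to-item weight could be written as w(i, j) = |U(i) ∩ U(j)| / (|U(i)|^alpha · |U(j)|^(1 − alpha)), where U(i) is taken here to mean the set of users who listened to item i. The user-to-user case would swap the roles of users and items. The function name, variable names, and toy data below are hypothetical and are not taken from CORS.

```python
def asymmetric_cosine(set_a, set_b, alpha=0.5):
    """Asymmetric cosine similarity between two sets (binary ratings).

    alpha = 0.5 gives the ordinary, symmetric cosine similarity.
    This form is an assumption based on reference [1], not CORS code.
    """
    if not set_a or not set_b:
        return 0.0
    overlap = len(set_a & set_b)
    return overlap / (len(set_a) ** alpha * len(set_b) ** (1.0 - alpha))


# Toy data (invented): item -> set of users who listened to it.
users_of_item = {
    "song_a": {"u1", "u2", "u3", "u4"},
    "song_b": {"u1", "u2"},
}

a, b = users_of_item["song_a"], users_of_item["song_b"]

# With alpha = 0.5 the measure is symmetric in its two arguments.
sym_ab = asymmetric_cosine(a, b, alpha=0.5)
sym_ba = asymmetric_cosine(b, a, alpha=0.5)

# With alpha = 0.15 (the item-based value reported in the text) it is not.
asym_ab = asymmetric_cosine(a, b, alpha=0.15)
asym_ba = asymmetric_cosine(b, a, alpha=0.15)

print(sym_ab, sym_ba)
print(asym_ab, asym_ba)
```

In this toy case every listener of song_b also listened to song_a, but not the other way round, so moving alpha away from 0.5 makes the two directions score differently.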
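
The MAP computation described under "Evaluation metrics" can be sketched as follows. This is an illustration only, not the challenge's official scoring code. It assumes average precision is truncated at k and normalized by min(k, number of relevant items), and the function names and toy data are hypothetical.

```python
def average_precision_at_k(recommended, relevant, k=500):
    """Truncated average precision for one user.

    recommended: ranked list of item ids, best first.
    relevant: set of item ids the user actually listened to.
    """
    if not relevant:
        return 0.0
    hits = 0
    precision_sum = 0.0
    seen = set()
    for rank, item in enumerate(recommended[:k], start=1):
        if item in relevant and item not in seen:
            hits += 1
            precision_sum += hits / rank  # precision at this rank
        seen.add(item)  # ignore duplicate recommendations
    return precision_sum / min(len(relevant), k)


def mean_average_precision(recommendations, ground_truth, k=500):
    """Mean of per-user AP values; users are the keys of ground_truth."""
    if not ground_truth:
        return 0.0
    total = sum(
        average_precision_at_k(recommendations.get(user, []), relevant, k)
        for user, relevant in ground_truth.items()
    )
    return total / len(ground_truth)


# Toy example (invented data): the same hits, ranked early vs. late.
truth = {"u1": {"s1", "s2"}}
hits_first = {"u1": ["s1", "s2", "s3", "s4"]}
hits_last = {"u1": ["s3", "s4", "s1", "s2"]}

map_first = mean_average_precision(hits_first, truth, k=4)
map_last = mean_average_precision(hits_last, truth, k=4)
print(map_first, map_last)
```

Both rankings contain the same two correct songs, yet the one that places them first scores higher, which is the sense in which order matters for MAP.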