Conclusion and future work

In the literature study we discussed recommender systems and their general context. We listed system properties, described three common recommendation approaches, and listed typical issues and shortcomings of recommender systems. One of these issues is the black box problem, in which the end user fails to gain insight into the recommendation process and as a result may have little trust in its recommendations. To address this problem, an explanation system can be used that explains the recommendation rationale.

In the next part of the literature study we looked at a way to visualize this rationale. Inspired by a visualization by Valdis Krebs, we came up with a graph-based visualization that represents the underlying utility matrix of collaborative recommendation and uses Holten’s edge-bundling algorithm along with node reduction to reduce the number of data dimensions.

Subsequently, we looked at an evaluation method for visualization insight developed by Chris North. We also investigated the insight-gaining process established by Klein et al. Finally, we adapted Ware and Mitchell’s visual thinking algorithm to describe how a user would interact with the visualization to solve a certain problem.

A number of visual explanation systems were discussed. To compare these systems we used a number of goals presented by Tintarev and Masthoff.

6.1 Objectives

The first objective described in section 1.1 was to conduct a literature study on techniques for the visualization of music suggestions. This has not been entirely reflected in this text. Nonetheless, an effort was made to link the presented techniques either to the end product or to other examples in the context of music recommendation.

The second objective was to design, implement and evaluate an interactive visualization that will allow the user to gain insight into the recommendation process as well as actively steer the process. The following success criteria for the application were listed in section 1.1.3 of the introduction:

  • Aimed at non-expert users with an average to high interest in music;
  • Achieve high usability, in particular learnability and memorability;
  • Provide transparency.

Although there were some casual listeners among the test users, the majority of participants were representative of the target audience. Results for the perceived usefulness in the last iteration vary between 2 and 5, suggesting that the first criterion has not been met entirely. Still, if the application were developed further and tested with more users, the average might increase.

An overall average SUS score of 80.5 in the final iteration suggests that the usability of the system is good, as perceived by users. However, the learnability of the system perhaps still has some room for improvement.
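For reference, the standard scoring rule behind a SUS score can be sketched as follows. This is a minimal illustration of the usual System Usability Scale calculation (ten items answered on a 1 to 5 scale, odd items positively worded, even items negatively worded), not the code used in this study. The function names and the sample answers are hypothetical and do not correspond to any participant.

```python
def sus_score(responses):
    """Return the SUS score (0-100) for one participant.

    `responses` holds the ten answers, each an integer from 1 to 5,
    in questionnaire order (item 1 first).
    """
    if len(responses) != 10:
        raise ValueError("SUS requires exactly ten responses")
    if any(r not in (1, 2, 3, 4, 5) for r in responses):
        raise ValueError("each response must be an integer from 1 to 5")

    total = 0
    for item, answer in enumerate(responses, start=1):
        if item % 2 == 1:
            # Odd items are positively worded: contribution is answer - 1.
            total += answer - 1
        else:
            # Even items are negatively worded: contribution is 5 - answer.
            total += 5 - answer
    # The summed contributions (0-40) are scaled to the 0-100 range.
    return total * 2.5


def mean_sus(all_responses):
    """Average the per-participant SUS scores."""
    scores = [sus_score(r) for r in all_responses]
    return sum(scores) / len(scores)


# Hypothetical answers, for illustration only.
neutral = [3] * 10
print(sus_score(neutral))
```

An overall average such as the one reported above is obtained by computing one score per participant and then averaging those scores, as in `mean_sus`.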

Results indicate that our design can be effective in explaining the rationale of collaborative recommendations. The explanations did not always increase system trust, but could give an indication of recommender system bias, as poor recommendations were often not connected to the user’s top neighbours. Finally, the explanation system may provide a starting point for further data exploration.

The objective that was not met was to enable users to actively steer the recommendation process. This is because the Last.fm API did not support this functionality. An alternative would have been to use other methods in the API to construct our own custom recommender system, but we chose to explain the artist recommendations made by the actual recommender instead. Another possibility would have been to use another recommender system altogether, but among the systems that were investigated, e.g. Spotify, Grooveshark, and Bandcamp, no significant additional functionality was discovered that could have overcome these issues.

6.2 Future work

6.2.1 Issues

Future work may include addressing the problems with visual clutter and slow data loading, as listed in table 4.8.

6.2.2 Evaluation

For future user tests, Last.fm users could be given a pre-test questionnaire to evaluate the Last.fm recommender and its explanations. Such a benchmark could have helped in assessing the usefulness of the application in this study. Other evaluation methods that can be used include expert-based evaluation and heuristic approaches.

6.2.3 Visualization and music

The focus of the literature study was mainly on providing a context for the elements that were used in the application. To improve the initial design, it might have been better to also incorporate some sort of comparative study of visualization techniques for music, especially if we were to build an explanation system for content-based recommendation. On the other hand, this subject may provide enough content for another thesis.

6.2.4 Extensions

The interactive elements could be enhanced, and the number of interactive elements increased. For example, by allowing interaction with edges, the user could dig deeper into the relationship between artists and the corresponding users.

The visual explanation system could be tested using other data sets and collaborative recommendation systems. The model could be extended for use in a hybrid environment, for example by also visualizing tag-based or other relationships among artists in Last.fm.

6.3 Personal reflection

All in all this has been an interesting project. During its course, a lot has been learned and the reasoning on this subject has developed as well. In hindsight there are inevitably things that one would do differently, and this is no exception.

6.3.1 An overview of how the project unfolded

The first months of this project were probably the most difficult ones. It was not always easy to determine what to look for. A lot of papers had to be read again at a later stage, since many details were overlooked due to a lack of context and direction. This is probably typical of students coming from the programme Schakelprogramma Master Toegepaste Informatica. Especially in a one-year programme, some classes that could have provided background for the thesis subject, such as the course Gebruikersinterfaces, may come late in the academic year. This may have affected the motivation for working on the thesis by the end of the first semester.

During the Christmas break, some papers were reread and a better idea of what needed to be done was formed. As a result, the slow progress in the first semester had to be made up for in the second one. Still, looking back, it is not easy to see how this problem could have been avoided; it is perhaps part of the insight-gaining process described in this thesis.

6.3.2 Lessons learned

One thing that could have been done differently is to have started user tests earlier. The idea explained in this thesis was developed early on in the project, but was evaluated much later. Conducting user studies early on would have yielded more test results and provided additional experience. Some issues with the application and testing methods would likely have been discovered at a much earlier stage as well. The reasons for stalling were a lack of confidence in the idea and a lack of experience in conducting user studies.
