# Literature study

This chapter describes the various components that are used in the development of this thesis. Section 2.1 looks at recommender systems and their rationale. Next we will look at visualization techniques and how these can be used to visualize the recommendation rationale in section 2.2. After this, insight gaining and human perceptual skills are investigated in section 2.3. We discuss how insight gaining can be evaluated and how the user is expected to use the visualization in a visual thinking algorithm. Finally we discuss and compare other visual explanation systems in section 2.4.

## 2.1 Recommender systems

A recommender system is a system that computes personalized suggestions for users based on item ratings by users. Additional information can be incorporated into the recommendation algorithm to refine suggestions. Several categorizations of these techniques are proposed in literature [4, 8, 19, 22, 35].

One of the incentives behind creating recommender system is the long-tail phenomenon [43]. This phenomenon can be explained as follows. Physical retail and warehouses can only keep a subset of all the available items in stock. These items are usually the most popular items on the market. Online vendors however, such as Amazon, can offer a vastly larger subset of these items to clients, including also less popular and/or less known items [43]. The Long Tail is visualized as a graph in which items are ordered by their popularity on the horizontal axis against the popularity rating on the vertical axis, as can be seen on figure 2.1. Physical stores will offer only items from the first part of the curve, whereas the online vendors will also sell items from the remaining ‘long tail’ of the graph [22, 43]. Recommender systems then provide a means to find relevant items within this much larger range of items [43]. They enable to “connect supply and demand, introducing consumers to these new and newly available goods and driving demand down the tail ” [1, 22].

Figure 2.1 : The long-tail: Items ordered by popularity are layed out against their popularity rating. Most of the items reside in the long tail of the graph. Companies such as Amazon can offer a vastly greater subset of the total item space.

Typical applications of recommender systems are product recommenders for online retailers, movie and music recommenders such as Netflix and Last.fm, and news article recommenders in online news services [22, 33, 43].

### 2.1.1 Properties of recommender systems

In [20] and [45], Herlocker et al. and Shani et al. respectively, provide metrics for the performance of recommender systems. They list a number properties associated with these systems. We will also describe methods for testing those properties, as described in [45].

• Accuracy: The accuracy of item recommendations. There are three broad classes of prediction accuracy measures:
• the prediction of the rating given by a user;
• the prediction whether or not a user will actually use the item (for example adding to a queue) opposed to predicting the rating itself;
• the prediction of a ranking among items rather than an explicit rating of each item independently.

This kind of property is typically tested through an offline test on a training set, where parts of the recorded user profiles are hidden. The accuracy of the recommender system can then be determined by comparing the found recommendations to the remainder of the user profiles.

• Novelty: Novel recommendations are recommendations for items that the user did not know about. In a user study, users can be asked whether or not recommendations were new to them.
• Serendipity: Serendipity is a measure of how surprising the successful recommendations are. One can think of serendipity as “the amount of relevant information that is new to the user in a recommendation, or alternatively as deviation from the ‘natural’ prediction” [45]. Serendipitous recommendations usually carry a higher risk, as they fall further from the class of known preferences. To find out the serendipity of recommendations, users can be asked directly if the recommendation was unexpected.
• Diversity: Diversity is generally defined as the opposite of similarity. To measure diversity content-based approaches can be used that compare recommended items.
• Coverage: One way to define coverage is the percentage of all items that are recommended to users during an experiment.

Many of these properties are closely related, as diverse recommendations that have a high coverage are likely to be serendipitous and novel. Also tradeoffs exist between properties; for example accuracy may drop as recommendations become more diverse [45]. Looking at the Long-tail phenomenon, coverage, serendipity, novelty and diversity tend to be important. In the context of music recommendation, users usually want to find new music [33].

### 2.1.2 A classification of recommendation algorithms

Based on classifications presented in [8] and [22], a categorization of different types of recommendation strategies can be identified. We will only discuss the two most prominent ones, namely collaborative filtering (CF) and content-based filtering (CB) [19, 43], and list some hybrid strategies. In the literature on recommender systems other general approaches that are commonly identified, are utility-based filtering, knowledge-based filtering, demographic filtering, and expert-based filtering [4, 8, 49].

#### Collaborative recommendation

Collaborative recommendation aggregates item ratings by users. By establishing overlaps between ratings in the corresponding user profiles, the system finds new item recommendations in the difference between user profiles [8, 19]. User profiles consist of a vector of items and their ratings, which are continuously updated as the user interacts with the system over time [8].

For CF-based recommendation, there are two classes of entities: users $U$ and items $I$. The data itself can then be represented by a utility matrix $A$. The entries $a_{i,j}$ of the utility matrix represent what is known about the degree of preference of user $u_{i}$ and item $i_{j}$ [43]. As can be seen in figure 2.2, the utility matrix will have many blanks as well. The goal of the recommendation algorithm is then to fill in the blanks that are likely to have high ratings [43].

Figure 2.2 : the utility matrix.

In order to calculate the blanks, there is a variety of similarity functions that has been developed, e.g. cosine distance, Pearson correlation, Tanimoto-Jaccard [56]. For example, a small cosine distance will correspond to a high similarity between profiles [43]. The discussion of the mathematics behind each of the algorithms is beyond the scope of this thesis.

#### Content-based recommendation

Content-based recommendation learns a profile of the user’s interests based on the features present in objects the user has rated. New recommendations can then be generated based on a similarity function on these features [8, 42].

When applying content-based filtering, the choice of similarity or classification function will have a significant impact on the quality of the recommendations. More importantly though, is the choice of features. To ensure good performance, these features should also be extracted easily from large quantities of data.

Depending on the type of item that is being recommended, different approaches can be applied to extract features and construct feature vectors. Textual information is often extracted using a technique called stemming. This technique uses root forms capturing a common meaning behind groups of words. Tuples of root forms and term frequency times inverse document frequency (TF.IDF) scores are computed for each word [42, 43]. The words with the highest scores are the words that characterize the document [43]. A downside of stemming is that the process may cause the loss of contextual information for each word [42].

In [4] and [35] web crawlers are used to gather and extract features from online documents. Each property or feature is a ‘bag of words’ that can be used in a naive Bayesian text classifier. This way each item can be categorized and the profile can be ‘learned’ [35].

Tags are very useful as well. Although they can be generated from text, for complex objects such as images and music, tag generation relies on user input [43]. Nonetheless, emerging technologies such as the ‘search by image’ option introduced by Google, allow to retrieve web sites, documents and key words related to the given image [11].

Mathematical models for music also allow for automated feature extraction. Algorithms have been developed to classify music based on content features [34, 54]. There are various types of acoustic features that can be extracted. In [34] a distinction is made between rhythmic content features, pitch content features and timbral content features.

#### Hybrid recommendation

Hybrid filtering combines two or more recommendation algorithms [8]. In [8] a number of hybrid recommendation strategies are discussed. Robin Burke lists seven different approaches for combining recommendation algorithms. Each of these combinations also has its advantages and disadvantages. Not necessarily all combinations will be successful, and not all of them have been implemented [8].

A good hybrid model is likely to outperform any single approach, as weaknesses of one strategy may be cancelled out by the advantages of another [49].

### 2.1.3 Challenges for recommender systems

Each recommendation technique has benefits as well as drawbacks. Some of these apply to all or most types of recommendation strategies, while others are only relevant to certain cases.

The cold start and gray sheep problem both affect prediction accuracy. The black box problem is more closely related to perceived accuracy and trust in the recommender system.

#### Cold start

Both CF and CB-based recommendation algorithms suffer from the ramp-up problem in one way or the other. The ramp-up or cold start problem (although they may refer to slightly different problems depending on the literature) is dual problem that encompasses two distinct, yet related problems as defined in [8]:

• New User : Finding recommendations for users with a limited rating history is hard. Since user profiles tend to build up over time, new users usually fall in this category. As a result, recommender accuracy tends to be lower for new users.
• New Item : A new item will most likely not have that many ratings associated with it, and as a result will not be easily recommended. This ‘new item problem’ typically emerges when new items are constantly added to the system; for example when browsing a constant stream of news articles. When new articles are introduced, not many users have had the chance yet to rate these items. In the case of a news feed, an additional problem is that these items are short-lived, meaning that at some point these item profiles will most likely stop receiving any ratings at all.

Both of these issues translate themselves into a sparse regions in the utility matrix. It is worth noting that content-based recommendation algorithms suffer less from the new item problem, as these tend to rely on features that are inherent to the items themselves, rather than user generated content. This is one of the reasons hybrid approaches can provide a solution to collaborative filtering [8]. For example, in [35], content-based predictors are used to create pseudo-user ratings to reduce sparsity of the utility matrix, used in a collaborative algorithm.

#### Gray sheep

A problem that is typical of collaborative filtering is the gray sheep problem [8, 19]. The gray sheep problem occurs when a user falls between different clusters of users that may have contradicting item ratings. As a result, it is hard to determine how to classify the user [8].

Classifying users is typically a harder problem than classifying artists. An artist can usually be classified into one particular genre, whereas two users may have one genre in common, but may have rather conflicting tastes as well [43].

#### Black box

Another issue with recommendation systems is that these system often appear as ‘black boxes’ towards the end user. The complexity of the algorithms used, prevents the user from understanding the recommendation rationale [62]. This problem may decrease the acceptance by the user of item suggestions. One of the solutions for this problem, proposed by Herlocker et al. in [19], is to provide an explanation system, i.e., the white box\index{white box}, on top of the recommender system that explains the recommendation process. This can be done through providing a transcript of the system’s reasoning or through visualizations [19].

In this thesis we will focus mainly on this problem. We will look at the black box problem in the context of collaborative music recommendation, and try to design, implement and test a new visual explanation system in an effort to overcome this problem.

### 2.1.4 Music recommendation

#### Examples

A number of popular recommender systems for music exist, such as Last.fm, Spotify, Grooveshark, Deezer, Tastekid, Pandora and so on.

Last.fm is an example of a music recommender system that is based on the collaborative filtering [33], although it uses additional content-based approaches as well [49]. Also Pandora, Allmusic and Shazam use hybrid approaches [49].

Bostandjiev et al. [4] have built a music recommender that uses Wikipedia, a content-based source, Facebook, a collaborative source, and Twitter, an expert-based or context-based source.

#### Profile generation

One of the difficulties for any recommendation algorithm is the modeling of good user and item profiles. Song et al. [49] list several approaches for music recommendation.

A typical user profile exists out of two types of data: a collection of demographic, geographic and/or psychologic data, and user listening experience [49].

Music libraries are often constantly changing, as new music is added to the user profile, listening patterns change and listening history is updated. As a result, music recommenders should be able to address this kind of dynamic evolvement [49]. For example, Last.fm deals with this problem through their Scrobbler, which tracks the user’s listening history through various media players [31].

In [49] three types of data are associated with item profiles:

• Editorial metadata (EM): for example composer, performing artist, title, genre, cover art, et cetera.
• Cultural metadata (CM): data obtained from analysis of text-based sources, enabling a categorization of music; for example tag-based feature extraction as described in section 2.1.2.
• Acoustic metadata (AM): data directly obtained from analysis of audio signals; for example wavelet-based approaches described section 2.1.2.

As described in section 2.1.2, item profiles can be used for content-based item recommendation.

## 2.2 Information visualization

In this section different aspects of information visualization that were used to visualize the recommendation rationale are highlighted. First types of data are discussed, next we will look at visual encodings for these types of data. Finally we will discuss some visualization, interaction and data reduction techniques that were used in the SoundSuggest application.

### 2.2.1 Types of data

Information visualization has been focusing on on data sets that lack inherent spatial semantics, thus posing a challenge to map the abstract data onto a two-dimensional screen space [25]. Within this category of data, still different types of data can be distinguished and their characteristics will have an influence on the type of visualization.

Tables of data consist out of rows, representing items, and columns, representing the data dimensions, or ‘attributes’. The number of dimensions is referred to as the dimensionality of the data set [25]. There are three different kinds of dimensions, namely [46]:

• Quantitative: numerical data on which arithmetic can be applied. For example playcount of a particular track, the duration of a track, the number of artists that two users have in common, et cetera. Quantative data occurs in all kinds of music and user metadata.
• Ordered: an enumeration that has a definite order. For example ratings such as ‘good’, ‘average’, and ‘bad’.
• Categorical: data that has no specific ordering, and is distinguished by name only. For example composer names, band members, artist tracks, users who like a particular item, et cetera. Most EM and CM can be considered categorical.

In the utility matrix each column corresponds to an item and a row to a user. The entries of this matrix are typically quantitative data, while item and user names are categorical data.

Relational data on the other hand consists out of nodes and links or edges [25, 46]. Both nodes and edges can have associated attributes. These attributes can again be either of quantitative, ordered, or categorical nature.

The underlying structure of collaborative filtering, the utility matrix, can be interpreted as a dual graph. This is a graph $G(V,E)$ for which $V = U \cup I$ such that $U \cap I = \emptyset \wedge E \subseteq U \times I$ [9]. Each non-blank entry in the utility matrix will then correspond to an edge. Figure 2.3 shows the dual graph of the corresponding utility matrix.

When applied to the context of collaborative filtering, the set of nodes $U$ corresponds to the set of users, and the other set of nodes $I$ is set of items. In conclusion this means that there only exist edges of that go from an item to a user or from a user to an item.

Figure 2.3 : Transforming the utility matrix into a dual graph: two distinct sets of nodes, users and items, only share edges between nodes of different sets.

### 2.2.2 Visual encoding and visual channels

Visual encoding is defined as the mapping of data set attributes to a visual representation. The choice of visual encoding is one of the central problems in the visualization design [46].

Visual encoding takes place through visual channels. A visual encoding corresponds to a graphical element, or ‘mark’. Examples of visual channels are spatial position, color, size, et cetera. The dimension of the mark may vary: a point is a zero-dimensional mark, a line a one-dimensional one, an area a two-dimensional one and so on.

A visual encoding has the following characteristics, as described in [46]:

• Distinguishability: the ability of a user to distinguish between visual encodings;
• Seperability: Separable visual channels are opposed to integral visual channels, which are focused together on a pre-conscious level. Separable visual channels are safe to use for encoding multiple dimensions;
• Pop-out: selecting a channel and make it visually stand out from all the others.

There is a variety of possible visual channels that a visualization designer can turn to in order to create a visual encoding, such as colour, spatial position, size, shape, orientation, and so on. The performance of the visual encoding (through a visual channel) depends on the type of data, i.e. quantitative, ordered or categorical [46]. Figure 2.4 gives an overview of the performance for each category, adapted from [46]. Note that spatial position is the most accurate for each data type [46].

Figure 2.4 : Visual encoding performance for each data type, ordered from best to worst.

#### Colour

In [46] colour is considered in terms of three separate channels: hue, saturation and brightness. This allows for different encodings. Just like for most visual channels, the choice of the channel (hue, saturation or brightness) depends heavily on the type of data.

For categorical data, hue can be successfully applied, keeping in mind its small range. As a rule of thumb, a maximum of eight hue categories should be used in a visualization. An important remark is that roughly 10% of men is red-green color deficient. If a coding uses red and green, it may be wise to apply redundant coding using lightness or saturation in addition to hue [46].

In the SoundSuggest application, items in a list of neighbours are highlighted using the hue visual channel to distinguish between three different groups of items.

#### Spatial layout

Spatial layouts form other visual channels. Although these tend to be the most accurate, spatial layouts in two and three dimensions have several weaknesses; two of these are [46]:

• Occlusion: Parts of the data set become hidden by others. In the case of the mapping of abstract dimensions onto spatial positions, understanding the details of a three-dimensional visualization may be challenging, even if the user is allowed to change viewpoints. For the SoundSuggest application, elements in the visualization may overlap when edges cross.
• Text in arbitrary orientations: Special care has to be taken with text, as it may become very hard to read depending on the orientation. In the SoundSuggest application artist names are turned according to their position on the circle layout.

To overcome limitations of visual channels, various visualization, interaction, and data reduction techniques may be applied [25, 46, 57]. Before discussing this any further, the following section will first take a closer look at graph drawing.

### 2.2.3 Graph-based visualization

Relational data is data that has an inherent relation among its elements [46]. The graph drawing problem describes the problem of how nodes and edges are visualized on a display [21].

Scalability is one of the central issues with graph drawing, as graph size poses several important challenges [21]. The following issues are of course closely related to the general problems of visualization design discussed in section 2.2.2.:

• Viewability and discernability: even if it is possible to layout and display all the elements, it may become impossible to discern between nodes and edges;
• Performance and responsiveness: graph layout algorithms may be relatively complex, and a large number of nodes and edges may become a bottleneck for performance, especially in interactive applications that require reasonable responsiveness;
• Usability: apart from problems with discernability, also information overload may occur. It is known that detailed analysis of data in graph structures is easiest when the displayed graph is small.

As graph size is one of the biggest issues of graph visualization, techniques have been developed to apply data reduction techniques on graphs [21, 46]. Apart from the traditional distortion and data transformation techniques, there are some techniques that are specific for graph-based visualization.

#### Data and dimensionality reduction

Based on a visualization design by Valdis Krebs in [50], a dimensionality reduction can be performed on the dual graph, by keeping only one set of nodes and representing the other set of nodes as implicit information in the edges. Figure 2.5 shows an example of this idea of row reduction. In Krebs’ visualization the items, books purchased from the Amazon web store in this case, were retained. In the resulting graph, two items would share an edge if a user bought both these items. The thickness of the edges represented the number of users that where linked to these items [29, 50].

Figure 2.5 : A row reduction operation on each pair of edges in a dual graph will result in a dimensionality reduction where one set of nodes is removed from the graph. An additional data reduction can be achieved by clustering edges into a thicker edge. Edge thickness then depends on the number of edges involved.

The SoundSuggest application does not use the final step of Krebs’ graph design. Instead parallel edges are retained to keep a direct link between user and edge. In the resulting visualization of the CF-based recommender, a quantification of the similarity between users can then be established by counting parallel edges between items that occur in neighbouring profiles. Figure 2.6 shows how the dual graph from figure 2.3 is transformed into a circular graph layout with the remaining item nodes, similar to the visualization in figure 2.7.

Figure 2.6 : Row reduction applied on the graph in figure 2.5.

To further limit graph size, the active user’s favourite items are used to give a representation of the active user’s profile. This way the user can still directly compare him/herself with neighbouring profiles.

#### Clutter reduction

Based on a survey by Herman et al. [21], and papers by Holten [23], and Holten et al. [24], the following gives an brief overview of two techniques: clustering and edge-bundling.

Clustering is the process of discovering groupings or classes in data based on chosen semantics, i.e., structure, or content. An example of structure-based approach is to bundle groups nodes that have certain number of edges between them. For content-based approaches, edges or nodes with similar attribute values can be clustered.

Due to the sparsity of the utility matrix, we might have to cluster multiple neighbouring profiles in a single node to ensure adequate connectivity within the resulting graph.

Edge-bundling is a technique to visualize compound graphs [23]. Edges are modelled as “flexible springs that can attract each other, similar to how electrical wires are bundled within a network” [24]. Figure 2.7 gives an overview of the different results for varying bundling strengths. It shows how edges are drawn closer towards each other, reducing visual clutter, and highlighting relationships between nodes.

Figure 2.7 : Hierarchical edge bundling. Taken from http://mbostock.github.io/d3/talk/20111116/bundle.html. By increasing the bundling strength, edges will be drawn towards each other, clearly marking pathways between endpoints.

Figure 2.8 shows how this can be applied on the graph from figure 2.6. From a graph drawing perspective we managed to reduce visual clutter by reducing the number of displayed items and applying edge-bundling [21, 23].

Figure 2.8 : Edge bundling applied on the graph in figure 2.6.

## 2.3 Gaining insight into interactive visualization

The goal of the SoundSuggest application is to allow the user to gain insight into the system through an interactive, visual explanation system. Two specific questions arise:

• How does a human gain insight?
• Are there any human limitations imposed on the design of an interactive visualization?

The following subsections try to establish an answer to these questions. Most of the ideas in these subsections are drawn from papers by Yi. et al. [61], Chris North [40], Klein et al. [27, 28], and a book by Colin Ware [57].

### 2.3.1 Insight gaining

In [40] it is argued that insight\index{insight} is not a well-defined term. A formal definition might be too restrictive to capture its essence, and yet too broad to be useful. To quantify insight, [40] and [61] list characteristics that allow a finer evaluation:

• Complex: insight is complex in the sense that it involves large amounts of data, cf. the input data of a recommender system, that form cognitive constructs, rather than individual units.
• Deep: insight is self-generating in a way, as insight provides a starting point for insight on the next level. Users may apply previously gained insight into a certain recommendation on other recommendations.
• Qualitative: insight is subjective, uncertain and can have multiple levels of resolution;
• Unexpected: insight is usually unpredictable, serendipitous and creative;
• Relevant: insight is deeply embedded in the data domain: it gives data meaning as it connects data to the existing domain knowledge: the user has to give meaning to seeminly random suggestions.

The quality of insight can then be determined by quantifying each of these characteristics [40]. North describes methods to evaluate insight gaining through visualizations, such as usability testing, heuristic evaluation, cognitive evaluation, and controlled experiments on benchmark tasks [40].

Controlled experiments are the dominant method for evaluating insight [40]. Chris North notes this kind of experiments suffer from four fundamental problems that may hinder effective evaluation of previously listed characteristics of insight. Such experiments are [40]:

• Predefined: Following specific, predefined instructions leave little room for unexpected insight.
• Limited in time: Short task times leave little room for deep insight.
• Definitive: Multiple choice questions leave little room for quantative insight.
• Superficial: Answers are concise, leaving little room for complex and relevant insight.

In this thesis we will focus on the think aloud technique instead. This is part of an alternative method described by Chris North that includes three key innovations [40]:

• an open-ended protocol;
• a qualitive insight analysis;
• an emphasis on domain relevance.

In this technique users are free to explore the data. By applying the think aloud protocol, the user’s insight’s can be captured. In order to increase domain relevance, the test users should be users from the target audience. Domain experts can then provide a “critical metric for the value of importance of the reported insights in the domain” [40].

#### Sensemaking

Sensemaking plays an important part in insight gaining [61]. The definitions for sensemaking may vary. We adapt the definition presented by [27] and [61].

In [27] sensemaking is looked at from a psychological perspective, a perspective of human-centered computing, and the perspective of naturalistic decision making. Sensemaking is then defined as follows: “sensemaking is a motivated, continuous effort to understand connections in order to anticipate their trajectories and act effectively” [27].

Based on the discussion in [28] and [61], Soo Yi et al. describe the process of sensemaking. Sensemaking is a:

• Cyclic and iterative proceduce: Consisting out of a generation loop searching for representations, a data coverage loop instantiating the representations and finally shift representations;
• Creation procedure: Being more about reasoning than discovery;
• Retrospective procedure: As people construct a framework and assign relevant information to a place withing this framework. If the data fits the framework well, the framework is confirmed, otherwise it may be updated or discarded;

An important remark made in [27] is that data fusion algorithms can reduce information overload, but they also pose challenges to sensemaking if the human can’t form an accurate mental model of the machine, to understand why and how the algorithms are doing what they are doing.

The data and dimensionality reduction algorithms used in SoundSuggest, increase the amount of implicit information that needs to be interpreted by the end user [21, 57]. In order to gain insight into the recommendation process, it is important that certain contextual information is retained. The contextual information we want to convey is two-fold:

1. The strength of the links between a recommendation and the user’s profile;
2. The position of the user in his/her neighbourhood and the relation with those neighbours.

The first type of information is contained in parallel edges between items. For the second type of information, the active user’s neighbours should be included in the visualization in one way or the other. In the resulting visualization the user’s top neighbours are listed next to the graph. By hovering or clicking one of the listed neighbours, the relevant parts of the graph, i.e., items owned by the neighbour and the edges between them, are highlighted.

#### Processes of insight gaining

Although sensemaking can play an important part in gaining insight, it is not the only path to arrive at insight [61]. Yi et al. [61] identify four processes, that are often intertwined, through which insight is established:

• Provide overview: In this process the individual gains understanding of the big picture of the dataset. It allows the user to make a distinction between what is known to him/her and what is not. The user gets an overview of the recommendation rationale.
• Adjust: In this process a person will explore a dataset by adjusting the level of abstraction and/or the range of selection. Typical actions involve filtering and grouping of data.
• Detect pattern: In this process the user will try to identify specific distributions, trends, frequencies, outliers or structure in the dataset. Through numerous interactions, the user is able to identify a pattern in the relationshpis between recommendations.
• Match mental model: In this process the gap between data and cognitive model is bridged, reducing cognitive load and linking the present visual information with real-world knowledge.

The link with sensemaking is found in the cyclic and iterative nature of sensemaking – provide overview, adjust and detected pattern can be applied iteratively, as well as its creative and retrospective aspects – adjust and detect pattern create hypotheses and test them through various interaction techniques [61].

It would be interesting to see if we can identify these steps in the user tests performed in this thesis as well.

#### Improving insight

Yi et al. [61] identify several ways in which the insight gaining process can be made more efficient. They list the system’s interactivity, the quality of visual encodings and usability among others, as possible enablers for increased insight gaining. Naturally, careless designs will act as barriers rather than enablers in the insight gaining process.

INTERACTIVITY Interactivity of the system promotes the user’s engagement into the dataset. Spending more time with the data will allow users to form more detailed and accurate hypotheses, and as a result greater insight [61]. At the same time, while using the visualization, the user will become more skilled at a task over time. Nonetheless, bare in mind that when performing long and tedious search tasks, vigilance will become an important aspect as well in the efficiency of data exploration [57]. Visual explanation systems may involve interaction techniques. For example, in the SoundSuggest application, users can click an item node. Related item nodes will then be highlighted with it as well. An important consideration is that the interval between the human action and the result on the screen should be low, promoting the so-called priciple of transparency. The objective of transparency is that the user will have the illusion of directly manipulating the data on the screen.

VISUALIZATION DESIGN Similarly visual encodings that are counter-intuitive will also increase the cognitive load. Other barriers on insight gaining are clutter, occlusion and data overload [61]. In the application we have built, we applied clutter reduction techniques for graph visualization.

USABILITY Usability is another aspect that may have an impact on the insight gaining process, as controls that are hard to use will inevitably occupy some of the cognitive capacity of the user [61]. In the ISO standard ISO 9241-11, usability is defined as “the extent to which a product can be used by specified users to achieve specified goals with effectiveness, efficiency and satisfaction in a specified context of use” [55].

Note that usability should not be considered a one-dimensional property of a user interface. Nielsen identifies several characteristics of usability in applications [37]:

• Learnability: if the system is easy to learn, the user can get started quickly;
• Efficiency: if the system is efficient to use, it will be possible to complete more work in less time;
• Error rate and severity: if the system should be robust and minimize faults;
• Memorability: once the system is learned, acquired skills should not be forgotten easily;
• Satisfaction: the system should be pleasant to use.

Note that an application may not necessarily have to have a high value for each of these properties. An expert visual explanation system that is highly complex may score low on learnability, while still remaining a very usable system. On the other hand, a system for casual users may require high learnability, otherwise users will be reluctant to use it. The application we have built is aimed at users that want to dig deeper into the recommendation data to gain a better understanding of the recommendation system’s rationale, but without going into high detail.

One of the aspects of the user tests we conducted involved testing the perceived usability of the application through test questionnaires. Through the think aloud protocol described ealier, some usability problems could also be detected through observation.

### 2.3.2 Interactive visualization

The relation between insight gaining and data visualization has been pointed out in other research. Colin Ware [57] describes interactive visualization as an “internal interface between the user and the computer in a problem solving system”. Keim [25] notes that “idea behind visual data exploration, is to present data in a visual form, allowing the user to gain insight into the data, draw conclusions, and directly interact with the data”.

In a chapter on visualization in [46], Tamara Munzner describes visualization as follows: “visualization allows the user to offload cognition to the perceptual system, using graphical data representations as a form of external memory. Therefore, by augmenting human capabilities, the data analyst is aided to understand, explore and form hypotheses of the data” [46]. In conclusion, the visual data exploration process can then be understood as a hypothesis generation process [25].

A reoccurring theme in visualization design is Schneiderman’s mantra: “overview first, zoom and filter, and details on demand” [25, 46, 57]. First, the user looks for patterns of interest in the data space. Next the user focuses on one or more of these patterns, and starts looking at the data on a more detailed level. The user can then draw his/her conclusions and explore the data space further [25].

In what follows, we try to describe how a human interacts with interactive visualization on a cognitive level. It will be clear that some parallels can be drawn with the insight gaining process. This should come as no surprise, since these processes are intertwined [25, 57, 61].

In [57], interactive visualization is characterized by three classes of feedback loops:

• Data selection and manipulation loop: the user selects and moves objects that are selected through simple interactions based on eye-hand coordination;
• Exploration and manipulation loop: the user tries to find his/her way through a large visual data space;
• Problem solving loop: the user forms hypotheses about the data and refines them through an augmented visualization process.

The next subsections describe each feedback loop in greater detail.

#### Data selection and manipulation loop

In the data selection and manipulation loop a user will try to interact with the visualization. For example by clicking a node in a graph, hovering a list of elements and so on. The quality of performance of selecting and manipulating data on a screen, depends on certain factors. Colin Ware discusses the following attributes:

• Reaction time: This is the amount of time for a user to identify and select certain objects.
• Types of interaction: different types of interaction will have a different influence on user performance. In our visualization only two types of interaction are supported: clicking and hovering buttons, links or words.
• Learning: The speed at which a user performs a task may decrease over time, as the user becomes more skilled at executing the task.

Each of these attributes may be influenced by different factors. Through experiments predictions for these parameters have been captured in various laws such as Hick-Hyman law (reaction time), Fitt’s law (selecting an object in a two-dimensional space) and the power law of practice (learning effects) [57]. Based on these laws, it is expected that reaction time will decrease over time as users become more familiar with the visualization.

In the second loop the user navigates through the data space. The basic navigation control loop is described as an iterative process that involves two distinct aspects [57]:

• Human: The user gains understanding of, i.e., gains insight into, the data space through a logical, spatial model. Parts of this model may be encoded in the longterm memory, on the condition that the data space is maintained for a long enough period of time.
• Computer: The visualization may be updated and refined based on user input. Through clicks and hover queries the user will be able to change parts of the visualization.

When exploring spatial maps, a user is confronted with the focus-context problem, i.e., “the problem of finding detail in a larger context”[57]. The objective is to see the relation between the larger context and the details, rather than finding details. Ware goes on to discern between spatial, structural and temporal scales in which the focus-context problem manifests itself [57].

Ware and Mitchell remark that the human visual system is already well-adapted to the spatial focus-context problem [57]. Therefore, they argue that when designing a display, it should already try to take a maximal advantage of these perceptual skills. This can be done by displaying as much data as possible, without causing visual overload [57]. Of course, there are always computational costs, either for finding the best way to represent large amounts of data, or instead, for using interaction techniques to delve deeper into the data [21, 46].

#### Problem solving loop

The problem solving loop can be described through means of a visual thinking algorithm [57]. Such an algorithm combines perceptual and cognitive actions into a process, as the user interacts with the visualization and explores the data space. As we want to keep the cost of knowledge low, it is obvious that the cost and time complexity of each of these actions should be kept at a minimum. The cognitive system that runs these algorithms, is made up out of several different components [57].

Visual queries translate a hypothesis into a cognitive task. The result of a visual query can be a pattern or lack of a pattern. To support visual queries, actions that support information search are executed, called epistemic actions. Out of all epistemic actions, eye movements have the lowest cost, before mouse selection and hover queries. At the lowest level, elementary features such as color, texture information, and local edges are extracted from the image. Next, patterns are recognized by combining these features. Through eye movements possibly interesting patterns are explored. In the visual working memory, which forms the intermediary between the long-term memory and the incoming patterns, patterns are processed as object files; these are a combination of visual attributes and semantic meaning. Internal images can be combined with external images to construct and test hypotheses about the visualized data [57].

We will try to approximate a visual thinking algorithm\index{visual thinking algorithm} applied by users when exploring SoundSuggest‘s visualization. The algorithm in algorithm 1 is the degree-of-relevance highlighting algorithm derived by Ware and Mitchell [57].

Algorithm 1 : Degree-of-relevance highlighting visual thinking algorithm by Ware and Mitchell [57].

Display environment: A display containing many symbols representing entities linked by a complex set of relationships.

1. Construct a visual query to find a symbol that may lead to useful information (information scent). For example, the user wants to find out more about a particular recommendation, shown in the visualization.
2. Execute an epistemic action by selecting a symbol. A symbol corresponds to either a user from the user list, or node on the graph in the visualization.
3. Computer highlights all symbols with a high degree of relevance to the selected symbol. These are relevant parts of the graph, i.e., items owned by the neighbour and the edges between them, are highlighted.
4. Execute a visual pattern query among the highlighted symbols for additional information scent.
5. If a very high relevance symbol is found, execute an epistemic action to drill down for additional information. Usually this will be presented in a different display window. In this window artist and user metadata are shown, for example artist playcount, shared top artists et cetera.
6. Repeat from step 1 as needed, cognitively marking visited symbols.

Based on this algorithm, we can make an estimation of the efficiency by which the user may to use the visualization. We can combine this with the insight gaining process, described in section 2.3.1, to get a better understanding of the user. Ware and Mitchell state that “if the degree of relevance algorithm can reduce the visual search to around $10$ to $20$ items, then the gain in cognitive efficiency can easily be an order of magnitude” [57]. They point out even though that degree-of-relevance highlighting can be used for data displays with over a thousand items, to be efficient, the number of members in the highlighted patterns should be much lower [57].

## 2.4 Visual explanation systems

In this section we will take a look at visual explanation systems that already exist and that have been evaluated as well. Five different applications are discussed: PeerChooser, Pharos, SFVis, SmallWorlds, and TasteWeights.

First we will give an overview of a number of objectives for explanation systems listed by Tintarev and Masthoff in [53]. A general comparison of these systems is made, based on the previously mentioned aims. Next some additional information is given for each application.

### 2.4.1 Comparing visual explanation systems for item recommendation

To compare explanation systems, Tintarev and Masthoff [53] identified a set of aims for explanation systems. Table 1.1 gives an overview of these aims.

Tintarev and Masthoff explain that some of these aims are hard to reconcile. For example, a system that focuses on persuasion and efficiency, is likely to score poorly for effectiveness, as the user may be quicker to try something he/she may not like. As a result they indicate that the performance for these metrics depends on the overall system goal [53].

Table 2.2 shows which of the characteristics described by Tintarev and Masthoff, were pursued for each of the explanation systems. All of the presented systems aim to provide transparency into the recommender system through a visualization. Four out of the five systems listed here, allowed the user to make changes to the recommendation process, adding scrutability. Persuasiveness is a measure that was not explicitly covered in these studies. Most of the studies tried to find out if the recommendations found, using the visualization, were effective. Also satisfaction, trust and efficiency were evaluated in some of the studies.

Table 2.2 : A comparison of the visual explanation systems, based on the aims by Tintarev and Masthoff listed in [53].

 Tra. Scr. Trust Efk. Pers. Efc. Sat. PeerChooser x x x x Pharos x x SFVis x x SmallWorlds x x x x TasteWeights x x x x x

### 2.4.2 PeerChooser

PeerChooser is a “collaborative movie recommender system with an interactive graphical explanation interface” [41]. It aims to address the black box problem, described in section 2.1.3 [41]. Figure 2.9 shows the graph-based interface of the PeerChooser application.

Figure 2.9 : The PeerChooser interface.

The application shows a network of the user’s neighbourhood. The visualization uses a force-directed graph layout where the on-screen distance between nodes corresponds an approximation of the node similarity. The active user is able to manipulate this neighbourhood by repositioning nodes of the graph. To deal with the high dimensionality of the data, extra cluster nodes are added to the visualization. These cluster peer nodes by genre [41].

By moving a cluster node closer towards the active user’s node, all user nodes associated with this cluster will be drawn closer towards the active user node. As a result, these profiles will be temorary considered more similar than before. Similarly individual users can be moved closer towards or further away from the active user node [41].

### 2.4.3 Pharos

Content-centric social websites, such as blogs and discussion forums, contain vast amounts of fast growing information. Recommendation systems have been developed to help users find the information they are looking for. The Pharos application tries to address two distinctive problems that present themselves in this context: the cold start problem and the black box problem [62].

As they hope to overcome previously defined problems, Zhao et al. [62] collect and visualize content-related social behaviour. The data set is tranformed into a social map. The social map provides a context for new users, addressing the cold start problem. Secondly, the user can explore the social map to increase understanding and user interaction, in an effort to overcome the black box problem. An example of what the resulting visualization looks like, is depicted in figure 2.10.

Figure 2.10 : The Pharos social map. Colours indicate activity within a certain group.

The generation of the social map takes place through the following three step process:

• Community extraction: a map depicting ‘which users are talking about what’. Starting from either relationships, people or content communities can be derived;
• Community/item/people ranking: the next step is to rank these communities. The ‘hotness’ can be measured on content, people authorities and so on;
• Community labeling: describing what each community is about.

### 2.4.4 SFVis

SFVis (Social Friends Visualization) is an application developed by Gou et al. [17] that helps users interactively explore and find friends under a context of interest. The system is a hybrid approach of social tags and social networks. SFVis uses circular visualizations for the different trees and graphs for views and interaction. Figure 2.11 displays what the resulting visualization could look like.

Figure 2.11 : SFViz graphical user interface: tag tree.

The SFVis framework transforms a data model consisting out of social tags and social networks, into a visual form. Users can both manipulate the input and visuals on demand, adding scrutability.

The visualization is constructed as follows. Social tags can form a network. Within this structure clusters may arise. From this tag network, a hierarchy is derived. A compound graph is generated from the tag hierarchy and social networks. A mapping function assigns actors in each social network to a tag tree. The actor similarity algorithm in SFVis considers both structure similarity in a social network and semantic similarity in a tag network. These scores will allow the recommendation system to compute friend suggestions [17].

### 2.4.5 Smallworlds

In [18], Gretarsson et al. used the Facebook API to create an application to generate social recommendations for Facebook users. Unfortunately the Facebook API does not support unauthorized reading of item preference information beyond the immediate friend group. This would not have been a problem unless traditional collaborative filtering strategies tend to produce item suggestions of inferior quality for small items. In this case however the research team relies on the social filtering through the active user’s peer group [18].

SmallWorlds is “a visual interactive graph-based interface that allows users to specify, refine and build item-preference profiles” [18]. The system promotes transparancy in the recommendation process, and gives the user a sense of control over the recommendation process through interactions. This way, Gretarsson et al. try to further overcome the limitations of their recommender system [18].

SmallWorlds uses a five-layered design to create suggestions:

1. the active user’s node;
2. the active user’s profile items;
3. friends who have items in common with the active user;
4. items that are not in the active user’s profile, but are liked by friends in layer $3$;
5. friends who have no items in common with the active user and items in their profiles, but not items in the profiles of friends in layer $3$.

The user can move nodes in each layer further or closer towards the active user’s node to adjust the weights of each node. This is used in combination with similarity functions to calculate the suggestions. Figure 2.12 shows a screenshot of the application.

Figure 2.12 : The SmallWords interface.

### 2.4.6 TasteWeights

TasteWeights is a hybrid recommender with an interactive graphical user interface [4]. The application allows the user to express his/her preferences by changing the weights of incoming data sources.

One of the challenges Bostandjiev et al. try to address is that “social web APIs and other data sources are constantly evolving, and traditional recommender system techniques such as automated collaborative filtering need to adapt to the changing environment of the social web” [4] (cf. Smallworlds). They introduce two enhancements for the traditional techniques. Multiple web sources, namely from Facebook, Twitter and Wikipedia are combined when computing the recommendation. This combination provides a hybrid of different recommendation strategies, namely: collaborative filtering, expert-based and content-based respectively. The second enhancement is a new user interface that provides transparency into the recommendation process.

There are three levels to be distinguished that are represented visually as well:

• Profile layer: liked items on Facebook;
• Context layer: items coming from different sources, namely Twitter, Facebook, and Wikipedia;
• Recommendation layer containing the actual recommendations.

Figure 2.13 shows the corresponding visual representation of each of these levels. Edges connect relevant parts between each of these levels on the visualization, in an attempt to explain the provenance of item recommendations. The user can influence the outcome displayed in the recommendation layer by attributing weights to the nodes in the profile layer and context layer [4].

Figure 2.13 : The TasteWeights interface.

## 2.5 Summary

First recommender systems were discussed: we looked at types of recommendation algorithms, a number of properties of recommender systems, and finally some challenges of recommender systems. One of these challenges was the black box problem. We saw that visualizations can be used to develop a white box model.

Next, aspects of information visualization were discussed: we looked at different types of data and visual encodings. We saw that the recommendation rationale behind collaborative filtering could be interpreted as a set of relationships between users and items. As a result, a graph-based approach could be used to visualize this data. Using data, dimensionality, and clutter reduction techniques along with interaction techniques, we developed a visualization that could serve as a white box model for collaborative filtering.

Then we described how users can gain insight into interactive visualization. We look at evaluation techniques for insight gaining. Finally a visual thinking algorithm was proposed that describes how a user could gain insight into the visualization developed in the previous section.

The resulting idea for the visualization is shown in figure 2.14. This will serve as the starting point for the iterations described in chapter 4.

Figure 2.14 : The resulting visualization serving as a white box model for collaborative filtering.

In the last section we looked at five visual explanation systems that already exist. A comparison of these systems was presented based on the evaluation criteria by Tintarev and Masthoff listed in [53]. We hope to obtain interesting results based on these criteria for the previously decribed model. The objectives for the SoundSuggest application with respect to Tintarev and Masthoff’s aims are listed in table 2.3.

Table 2.3 : The objectives for the SoundSuggest application with respect to Tintarev and Masthoff’s aims. A $\pm$ indicates that this aim is not a priority, based on the success criteria listed in section 1.1.3.

 Tra. Scr. Trust Efk. Pers. Efc. Sat. $+$ $-$ $\pm$ $\pm$ $\pm$ $\pm$ $+$

The next chapter describes SoundSuggest’s target audience, the application’s story board, and use cases. Chapter 4 gives an overview of the evaluation and design iterations. Chapter 5 describes the implementation of the SoundSuggest application. In chapter 6 conclusions and future work are discussed.