A Similarity-Based Software Recommendation Method Reflecting User Requirements

This study proposes a software recommendation method that reflects user requirements. The proposed method consists of a vector generation process and software retrieval process. Figure 1 shows the flow of software recommendation.

The vector generation process comprises a stage that generates the function matrix and function vectors, and a stage that defines the user requirement functions and generates the requirement vector. The software retrieval process comprises a software list generation phase using the Boolean model, and a similarity calculation phase based on that software list. First, the software list is generated using the Boolean model, which selects the software that includes the mandatory functions of the user requirements. Then, the similarity between the requirement vector and the function vectors is calculated for the software in the generated list, and the software recommendation list is generated based on the calculated similarity. Because the similarity is calculated only for the software that passes the Boolean filter, we expect that this will improve the search speed.

3.1 Generation of the Function Matrix and Function Vectors

To recommend software that reflects user requirements, the function matrix and function vectors are generated. To this end, the function matrix is defined based on software functional specifications, and the function vectors are assigned based on data collected from the web. The software information can be obtained from sources such as a dictionary or an official homepage that specifies the software information or conditions. It can also be obtained from the results of analyzing user evaluations of the software. Software can have various functions, and such functions can be added or modified according to user requirements. The generated function vectors are then stored in the database.

Table 1 shows an example of how software functions are structured.

The software functions include the category, supported operating system, hardware specifications, programming language, other functions, usage level, and so on.

The categories include the Programming Language, Database Management System, and Web Design Programming according to the software use. The operating systems are base development environments for the use of software, and can be classified as Linux, Windows, and other systems. The programming languages are classified as C, C++, Java, and other development languages. The number of functions, scalability, and so on are functions that can be quantified. The functions can be classified and defined based on the software category.

The function vector values of the software are calculated based on the data collected from the web and statistical data. Eq. (2) is used to calculate the function vector value, vs.

$$v_s = (b \times w_b) + \left(\frac{f}{\mathit{gf}} \times w_f\right) \tag{2}$$

Here, b indicates the existence or non-existence of the corresponding function (a Boolean value), f is the usage frequency of the corresponding function, and gf is the sum of the usage frequencies of the function group to which the corresponding function belongs. In addition, wb is the weight of the Boolean value, and wf is the weight of the usage frequency. Each weight has a value between zero and 1 inclusive, and the sum of wb and wf is 1. By default, wb is set to 1 and wf is set to zero, although each weight can be changed according to the user's choice. Setting wf above zero allows the usage frequency to be taken into account. However, for functions whose values are absolute, such as the category and OS, wb is set to 1 and wf is set to zero. These functions are used in the Boolean model. The value calculated through Eq. (2) is represented as a function vector.
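As an illustration of Eq. (2), the following is a minimal Python sketch. The function name `function_vector_value`, the argument validation, and the handling of a function group with zero total usage are assumptions made for this example and are not specified in the text.

```python
def function_vector_value(b, f, gf, wb=1.0, wf=0.0):
    """Sketch of Eq. (2): vs = (b * wb) + (f / gf * wf).

    b  : 1 if the software has the function, 0 otherwise
    f  : usage frequency of the function
    gf : summed usage frequency of the function's group
    wb : weight of the Boolean term (default 1, as in the text)
    wf : weight of the frequency term (default 0, as in the text)
    """
    if not (0.0 <= wb <= 1.0 and 0.0 <= wf <= 1.0):
        raise ValueError("weights must lie in [0, 1]")
    if abs(wb + wf - 1.0) > 1e-9:
        raise ValueError("wb and wf must sum to 1")
    # Assumption: a group with no recorded usage contributes nothing.
    frequency_ratio = f / gf if gf else 0.0
    return b * wb + frequency_ratio * wf


# With the default weights, only the presence of the function matters.
default_value = function_vector_value(b=1, f=30, gf=120)
# With equal weights, the relative usage frequency also contributes.
weighted_value = function_vector_value(b=1, f=30, gf=120, wb=0.5, wf=0.5)
```

With the default weights the value reduces to the Boolean b, which matches the statement that absolute functions such as the category and OS are handled by the Boolean model.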

Functions that express a degree, such as the number of functions, scalability, security level, and processing speed, are normalized to a value between zero and 1. The normalization is calculated using Eq. (3).

$$v_n = \frac{v_f - \min(n)}{\max(n) - \min(n)} \tag{3}$$

Here, vn is the normalized degree of the function. n is a set of values {v1, v2, ..., vn} for each function of the software, and vf is the degree value of the function.
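The min-max normalization of Eq. (3) can be sketched as follows. The function name `normalize_degrees` and the choice to return zero when all values are equal (where the denominator of Eq. (3) would be zero) are assumptions of this example, not part of the described method.

```python
def normalize_degrees(values):
    """Sketch of Eq. (3): scale each degree value into [0, 1].

    values: degree values of one function across all software,
            i.e. the set n = {v1, v2, ...} in the text.
    """
    lowest, highest = min(values), max(values)
    if highest == lowest:
        # Assumption: identical values carry no ranking information.
        return [0.0 for _ in values]
    return [(v - lowest) / (highest - lowest) for v in values]


# Hypothetical "number of functions" for three software packages.
normalized = normalize_degrees([10, 20, 40])
```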

3.2 Generation of the Requirement Vector

To reflect user requirements, the functions and vector of user requirements are generated. The functions of user requirements are defined, and their values are assigned according to the criteria of the user requirements. The values are calculated based on the importance conditions selected by the user. First, each function is marked as mandatory or optional, and then the importance condition for the remaining functions is selected. For example, suppose that a user selects software functions a, b, c, d, and e, and assigns a and c as mandatory conditions. The vector values can then be given to the software functions selected by the user, as shown in Table 2. The mandatory functions a and c and the optional functions b, d, and e are each assigned a value between 0 and 1.

In this paper, the importance conditions refer to the importance ratio and the priority, and the user can select either condition. The user requirement vector values are calculated according to the selected importance condition, and the sum of the requirement vector values is 1. The following paragraphs explain how the requirement vector values are calculated, first when an importance ratio is assigned and then when a priority is assigned.

When a user assigns an importance ratio value as the importance condition for a software function, the requirement vector value, vi, is calculated using Eq. (4). The calculated value is the share of the ratio value assigned to the corresponding function in the total of all assigned ratio values, so the sum of all function values is 1.

$$v_i = \frac{I_k}{\sum_{k=1}^{n} I_k} \tag{4}$$

Here, Ik is the ratio value assigned to the corresponding function, and n is the number of functions selected by the user. The weight is obtained by dividing each ratio value by the sum of all ratio values. The value calculated through Eq. (4) is represented as a requirement vector. By setting the importance ratios for the software functions selected by the user, appropriate software for the user requirements can be recommended.
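Eq. (4) can be illustrated with a short Python sketch. The function name `ratio_weights`, the dictionary-based input, and the example ratio values are hypothetical and chosen only for illustration.

```python
def ratio_weights(ratios):
    """Sketch of Eq. (4): each importance ratio divided by the sum of all ratios.

    ratios: mapping from function name to the user-assigned ratio value Ik.
    """
    total = sum(ratios.values())
    if total <= 0:
        raise ValueError("at least one ratio must be positive")
    return {name: value / total for name, value in ratios.items()}


# Hypothetical ratios for the optional functions b, d, and e.
weights = ratio_weights({"b": 5, "d": 3, "e": 2})
# The resulting values sum to 1, as the text requires.
```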

When a user assigns a priority as the importance condition for a software function, the requirement vector value, vp, is calculated using Eq. (5). The calculated value reflects the priority of the software function set by the user. The sum of all vector values is 1.

$$v_p = \frac{P_k^{-1}}{\sum_{k=1}^{n} P_k^{-1}} \tag{5}$$

Here, Pk is a value assigned for the priority of the corresponding function value, and n is the number of functions selected by the user. To calculate the weight according to the priority, the priority value of each function is processed as a reciprocal, and the total sum of the values is calculated and used. The value calculated through Eq. (5) is represented as a requirement vector. By setting the priorities for the software functions selected by the user, appropriate software can be recommended for the user requirement.
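The reciprocal weighting of Eq. (5) can be sketched as below. The function name `priority_weights` and the assumption that priorities are positive numbers with 1 as the highest priority are choices made for this example.

```python
def priority_weights(priorities):
    """Sketch of Eq. (5): reciprocal of each priority divided by the sum of reciprocals.

    priorities: mapping from function name to its priority value Pk.
    Assumption: priorities are positive and 1 denotes the highest priority.
    """
    if any(p <= 0 for p in priorities.values()):
        raise ValueError("priorities must be positive")
    reciprocals = {name: 1.0 / p for name, p in priorities.items()}
    total = sum(reciprocals.values())
    return {name: r / total for name, r in reciprocals.items()}


# Hypothetical priorities for the optional functions b, d, and e.
weights = priority_weights({"b": 1, "d": 2, "e": 3})
# b receives the largest weight because it has the highest priority.
```

For priorities 1, 2, and 3 the reciprocals are 1, 1/2, and 1/3, so the weights are 6/11, 3/11, and 2/11.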

The requirement vector value, vr, according to the user requirement is defined as Eq. (6) by assigning the importance ratio and priority, among other factors. Depending on the situation, other conditions can be added beyond these two, and according to such conditions, the ratios are calculated. The total value of the ratios is set to 1.

$$v_r = \begin{cases} v_i, & \text{importance ratio} \\ v_p, & \text{priority} \end{cases} \tag{6}$$

3.3 Generation of Software List Using the Boolean Model

We applied the Boolean model to the functions with Boolean values in the user requirement function set, and generated a software list from the result. To generate the software list, we used the generated software-function matrix.

Table 3 shows an example of the software-function matrix. If the operating system in the user's requirement is "Windows AND Linux", we take the vectors for Windows and Linux and perform a bitwise AND: 11110 AND 11010 = 11010, so the results are SW-A, SW-B, and SW-D. The Boolean model can be used for any query in the form of a Boolean expression of functions, that is, one in which functions are combined with the operators AND, OR, and NOT. We generated the software list using the Boolean model, and passed the software list on to the next step.
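The Boolean filtering step can be sketched in Python with integers used as bit vectors. The row values for Windows and Linux are taken from the example in the text; the names SW-C and SW-E, the leftmost-bit-first ordering, and the helper names are assumptions, since Table 3 is not reproduced here.

```python
# Assumption: five software packages, leftmost bit corresponds to SW-A.
SOFTWARE = ["SW-A", "SW-B", "SW-C", "SW-D", "SW-E"]

# Rows of the software-function matrix used in the text's example.
FUNCTION_ROWS = {"Windows": 0b11110, "Linux": 0b11010}


def negate(row, width):
    """NOT of a row, limited to the number of software columns."""
    return ~row & ((1 << width) - 1)


def select_software(mask, names=SOFTWARE):
    """Return the software whose bit is set in the mask (leftmost bit first)."""
    width = len(names)
    return [name for i, name in enumerate(names)
            if (mask >> (width - 1 - i)) & 1]


# "Windows AND Linux": bitwise AND of the two rows.
mask = FUNCTION_ROWS["Windows"] & FUNCTION_ROWS["Linux"]
candidates = select_software(mask)  # SW-A, SW-B, SW-D
```

OR queries use `|` in the same way, and NOT queries use `negate` so that the result stays within the five software columns.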

3.4 Similarity Calculation between the Requirement Vector and Function Vectors

To recommend software that reflects user requirements, the similarity between the user's requirement vector and each software function vector is calculated. The cosine similarity of the vector space model is used for this calculation: the similarity is measured as the cosine of the angle between the requirement vector and a function vector. Because all vector values are non-negative, the cosine value lies between zero and 1 inclusive, and the closer it is to 1, the more similar the software functions are to the user requirements. The cosine similarity between the requirement vector and function vectors is calculated using Eq. (7).

$$\mathrm{Sim}(\vec{s}, \vec{u}) = \frac{\sum_{i=1}^{n} s_i u_i}{\sqrt{\sum_{i=1}^{n} s_i^2}\,\sqrt{\sum_{i=1}^{n} u_i^2}} \tag{7}$$

Here, s is the function vector value, and u is the requirement vector value. The similarity between the requirement vector and each function vector is calculated, and the software recommendation list is produced by sorting the similarity values in descending order.
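The similarity calculation and ranking can be sketched as follows. The function names `cosine_similarity` and `recommend`, the example vectors, and the decision to return zero for an all-zero vector are assumptions made for this illustration.

```python
import math


def cosine_similarity(s, u):
    """Sketch of Eq. (7): cosine of the angle between vectors s and u."""
    dot = sum(si * ui for si, ui in zip(s, u))
    norm_s = math.sqrt(sum(si * si for si in s))
    norm_u = math.sqrt(sum(ui * ui for ui in u))
    if norm_s == 0 or norm_u == 0:
        # Assumption: an all-zero vector is treated as not similar.
        return 0.0
    return dot / (norm_s * norm_u)


def recommend(requirement, function_vectors):
    """Rank software by similarity to the requirement vector, highest first."""
    scored = [(name, cosine_similarity(vector, requirement))
              for name, vector in function_vectors.items()]
    return sorted(scored, key=lambda pair: pair[1], reverse=True)


# Hypothetical requirement vector and function vectors for the
# software that passed the Boolean filter.
requirement = [0.5, 0.3, 0.2]
filtered = {
    "SW-A": [1, 1, 0],
    "SW-B": [1, 0, 0],
    "SW-D": [0, 0, 1],
}
recommendation_list = recommend(requirement, filtered)
```

Only the software that survives the Boolean filter is scored, which is the source of the expected speed-up over scoring the entire database.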