Android apps and user feedback: a dataset for software evolution and quality improvement

Recommending and Localizing Change Requests for Mobile Apps Based on User Reviews

2017 IEEE/ACM 39th International Conference on Software Engineering (ICSE), 2017

Researchers have proposed several approaches to extract information useful for maintaining and evolving mobile apps from user reviews. However, most of them simply classify user reviews automatically according to specific keywords (e.g., bugs, features). Moreover, they do not provide any support for linking user feedback to the source code components that need to be changed, leaving developers with a manual, time-consuming, and error-prone task. In this paper, we introduce ChangeAdvisor, a novel approach that analyzes the structure, semantics, and sentiment of the sentences in user reviews to extract feedback useful from a maintenance perspective and to recommend changes to software artifacts. It relies on natural language processing and clustering algorithms to group user reviews around similar user needs and suggestions for change. It then uses text-based heuristics to determine the code artifacts that need to be maintained according to the recommended software changes. Quantitative and qualitative studies carried out on 44,683 user reviews of 10 open-source mobile apps, involving the apps' original developers, showed the high accuracy of ChangeAdvisor in (i) clustering similar user change requests and (ii) identifying the code components impacted by the suggested changes. Moreover, the results show that ChangeAdvisor is more accurate than a baseline approach for linking user feedback clusters to the source code in terms of both precision (+47%) and recall (+38%).
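
A minimal sketch of the general idea behind this pipeline, not ChangeAdvisor's actual implementation: cluster review sentences by textual similarity, then link each cluster to the most lexically similar source files. The reviews, file names, and identifier vocabularies below are invented, and KMeans stands in for the paper's own clustering step.

```python
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.cluster import KMeans
from sklearn.metrics.pairwise import cosine_similarity

reviews = [
    "app crashes when uploading a photo",
    "upload of pictures fails every time",
    "please add a dark mode option",
    "would love a night theme setting",
]
source_files = {  # file -> identifiers/comments extracted from its code (hypothetical)
    "PhotoUploader.java": "upload photo image file network retry",
    "ThemeManager.java": "theme dark light color style setting",
}

vec = TfidfVectorizer(stop_words="english")
review_matrix = vec.fit_transform(reviews)

# Group reviews expressing similar change requests (k fixed by hand here;
# the real approach derives the grouping automatically).
clusters = KMeans(n_clusters=2, n_init=10, random_state=0).fit_predict(review_matrix)

# Link each cluster to source files by cosine similarity between the cluster's
# concatenated text and each file's identifier vocabulary.
file_matrix = vec.transform(source_files.values())
for c in set(clusters):
    cluster_text = " ".join(r for r, k in zip(reviews, clusters) if k == c)
    sims = cosine_similarity(vec.transform([cluster_text]), file_matrix)[0]
    best = max(zip(source_files, sims), key=lambda p: p[1])
    print(f"cluster {c}: likely impacts {best[0]} (similarity {best[1]:.2f})")
```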

How can I improve my app? Classifying user reviews for software maintenance and evolution

2015 IEEE International Conference on Software Maintenance and Evolution (ICSME), 2015

App stores, such as Google Play or the Apple App Store, allow users to provide feedback on apps by posting review comments and giving star ratings. These platforms constitute a useful electronic means by which application developers and users can productively exchange information about apps. Previous research showed that user feedback contains usage scenarios, bug reports, and feature requests that can help app developers accomplish software maintenance and evolution tasks. However, for the most popular apps, the large amount of feedback received, its unstructured nature, and its varying quality can make identifying useful user feedback very challenging. In this paper we present a taxonomy for classifying app reviews into categories relevant to software maintenance and evolution, as well as an approach that merges three techniques: (1) natural language processing, (2) text analysis, and (3) sentiment analysis to automatically classify app reviews into the proposed categories. We show that the combined use of these techniques achieves better results (a precision of 75% and a recall of 74%) than each technique used individually (a precision of 70% and a recall of 67%).
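
An illustrative sketch of combining text features with a sentiment signal to classify reviews; the paper's actual features, lexicons, and classifier differ. The reviews, labels, and negative-word list here are toy placeholders.

```python
import numpy as np
from scipy.sparse import hstack, csr_matrix
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.linear_model import LogisticRegression

reviews = [
    "the app crashes on startup",           # bug report
    "crashes whenever I rotate the phone",  # bug report
    "please add offline support",           # feature request
    "would be great to export to PDF",      # feature request
]
labels = ["bug", "bug", "feature", "feature"]

NEGATIVE = {"crash", "crashes", "crashing", "fails", "broken", "bug"}

def sentiment_score(text):
    # Naive lexicon-based polarity: fraction of negative words in the review.
    words = text.lower().split()
    return sum(w in NEGATIVE for w in words) / len(words)

vec = TfidfVectorizer()
text_features = vec.fit_transform(reviews)
sentiment = csr_matrix([[sentiment_score(r)] for r in reviews])

# Stack lexical and sentiment features side by side, then train a classifier.
X = hstack([text_features, sentiment])
clf = LogisticRegression().fit(X, labels)

new = ["app keeps crashing", "add a widget please"]
X_new = hstack([vec.transform(new), csr_matrix([[sentiment_score(r)] for r in new])])
print(clf.predict(X_new))
```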

Small Scale Analysis of Source Code Quality with regard to Native Android Mobile Applications

2015

The popularity of smartphones and mobile applications grows every day, and new or updated mobile applications are continually being submitted to the various mobile stores. This rapidly increasing number of mobile applications has led to growing interest in overall source code quality. We therefore present a case study in which we analyzed open-source Android mobile applications from different domains and of different sizes. They were analyzed using the SonarQube platform, based on the SQALE method. We aimed to investigate overall code quality, the relationship between lines of code and technical debt, and the most common issues facing mobile applications. The results show that the majority of applications tend to have similar code issues and potential difficulties when it comes to maintenance or updates.
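
To relate lines of code to technical debt the way this study does, one can pull both measures from a SonarQube server's Web API after analysis; a minimal sketch below, where the server URL, token, and project key are placeholders (`sqale_index` is SonarQube's technical-debt measure, in minutes).

```python
import requests

SONAR_URL = "http://localhost:9000"   # assumed local SonarQube instance
PROJECT_KEY = "org.example.myapp"     # hypothetical project key
TOKEN = "<your-token>"                # user token for authentication

resp = requests.get(
    f"{SONAR_URL}/api/measures/component",
    params={"component": PROJECT_KEY, "metricKeys": "ncloc,sqale_index"},
    auth=(TOKEN, ""),  # SonarQube accepts a token as the basic-auth username
)
resp.raise_for_status()

measures = {m["metric"]: m["value"] for m in resp.json()["component"]["measures"]}
loc = int(measures["ncloc"])
debt_minutes = int(measures["sqale_index"])
print(f"{loc} lines of code, {debt_minutes} min of technical debt "
      f"({debt_minutes / loc:.2f} min/LOC)")
```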

An Empirical Investigation on the Effect of Code Smells on Resource Usage of Android Mobile Applications

IEEE Access

Code smells are suboptimal coding practices that affect software quality and non-functional requirements such as performance, maintainability, and resource usage. Although code smells in desktop applications have been extensively studied in the literature, mobile applications are comparatively new, and the effect of code smells on mobile devices has only recently begun to be studied. This paper investigates the effect of code refactoring on both CPU and memory usage. It presents a study of three code smells: HashMap Usage, Member Ignoring Method, and Slow Loop. Eight open-source applications were selected from GitHub for testing, and the three code smells were refactored individually and cumulatively to study their effects on a mobile phone's resource usage, with CPU usage and memory usage as the metrics of choice. The resource usage of five different versions of each of the eight mobile applications was measured to find the optimal refactoring strategy. The results suggest that refactoring HashMap Usage and Member Ignoring Method yielded statistically significant average improvements in CPU usage of 12.7% and 13.7%, respectively, while refactoring all three code smells yielded an improvement of up to 7.1% in memory usage. This research shows that certain refactorings have a significant impact on improving both CPU and memory usage. These statistically significant results can serve as the basis of guidelines for writing code that uses smartphone resources more efficiently and enhances app quality.
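
A sketch of the kind of measurement loop such a study needs: sampling an app's memory footprint via `adb` while each refactored version runs. The package name is hypothetical, and the parsing is simplified since `dumpsys meminfo` output varies across Android versions.

```python
import re
import subprocess
import time

PACKAGE = "com.example.refactoredapp"  # hypothetical app under test

def total_pss_kb(package):
    """Return the app's TOTAL PSS in kB as reported by dumpsys meminfo."""
    out = subprocess.run(
        ["adb", "shell", "dumpsys", "meminfo", package],
        capture_output=True, text=True, check=True,
    ).stdout
    match = re.search(r"TOTAL\s+(\d+)", out)  # format differs by Android version
    return int(match.group(1)) if match else None

samples = []
for _ in range(10):  # ten samples, one second apart
    pss = total_pss_kb(PACKAGE)
    if pss is not None:
        samples.append(pss)
    time.sleep(1)

print(f"mean PSS: {sum(samples) / len(samples):.0f} kB over {len(samples)} samples")
```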

Lightweight detection of Android-specific code smells: The aDoctor project

2017 IEEE 24th International Conference on Software Analysis, Evolution and Reengineering (SANER), 2017

Code smells are symptoms of poor design solutions applied by programmers during the development of software systems. While the research community has devoted a lot of effort to studying and devising approaches for detecting the traditional code smells defined by Fowler, little knowledge and support is available for an emerging category of mobile app code smells. Recently, Reimann et al. proposed a new catalogue of Android-specific code smells that may threaten the maintainability and efficiency of Android applications. However, current tools for mobile apps provide limited support and, more importantly, are not available to developers interested in monitoring the quality of their apps. To overcome these limitations, we propose a fully automated tool, named aDoctor, able to identify 15 Android-specific code smells from the catalogue by Reimann et al. An empirical study conducted on the source code of 18 Android applications reveals that the proposed tool achieves, on average, 98% precision and 98% recall. We have made aDoctor publicly available.
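
A toy detector in the spirit of this kind of lightweight analysis (aDoctor itself parses Java ASTs; this regex sketch only approximates one smell). It flags `new HashMap<` in Java sources, a pattern for which Android's `ArrayMap` is often preferable for small maps; the source-tree layout is assumed.

```python
import re
from pathlib import Path

HASHMAP_USAGE = re.compile(r"new\s+HashMap\s*<")

def detect_hashmap_usage(root):
    """Scan .java files under root and report lines instantiating HashMap."""
    findings = []
    for java_file in Path(root).rglob("*.java"):
        lines = java_file.read_text(errors="ignore").splitlines()
        for lineno, line in enumerate(lines, 1):
            if HASHMAP_USAGE.search(line):
                findings.append((str(java_file), lineno, line.strip()))
    return findings

# Assumed standard Android project layout.
for path, lineno, line in detect_hashmap_usage("app/src/main/java"):
    print(f"{path}:{lineno}: possible HashMap Usage smell: {line}")
```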

App store mining is not enough for app improvement

Empirical Software Engineering, 2018

The rise in popularity of mobile devices has led to parallel growth in the size of the app store market, spurring several research studies and commercial platforms that mine app stores. App store reviews are used to analyze different aspects of app development and evolution. However, app users' feedback does not exist only on the app store: despite the large quantity of posts made daily on social media, the importance and value of these discussions remain mostly untapped in the context of mobile app development. In this paper, we study how Twitter can provide complementary information to support mobile app development. By analyzing a total of 30,793 apps over a period of six weeks, we found strong correlations between the number of reviews and tweets for most apps. Moreover, by applying machine learning classifiers, topic modeling, and subsequent crowd-sourcing, we successfully mined 22.4% additional feature requests and 12.89% additional bug reports from Twitter. We also found that 52.1% of all feature requests and bug reports were discussed in both tweets and reviews. In addition to identifying information common to and unique to Twitter and the app store, we performed sentiment and content analysis for 70 randomly selected apps, finding that tweets provided more critical and objective views of apps than app store reviews. These results show that app store review mining alone is indeed not enough; other information sources provide added value and information for app developers.
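
The correlation analysis mentioned above could be reproduced along these lines; Spearman's rank correlation is one common choice (the paper may use a different coefficient), and the weekly counts below are invented for illustration.

```python
from scipy.stats import spearmanr

# Hypothetical weekly counts for one app over a six-week observation window.
weekly_reviews = [120, 135, 98, 160, 142, 155]
weekly_tweets  = [310, 360, 250, 420, 390, 400]

rho, p_value = spearmanr(weekly_reviews, weekly_tweets)
print(f"Spearman rho = {rho:.2f} (p = {p_value:.3f})")
```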

What would users change in my app? Summarizing app reviews for recommending software changes

Proceedings of the 2016 24th ACM SIGSOFT International Symposium on Foundations of Software Engineering

Mobile app developers constantly monitor feedback in user reviews with the goal of improving their apps and better meeting user expectations. Automated approaches have therefore been proposed in the literature to reduce the effort of analyzing the feedback contained in user reviews via automatic classification and prioritization according to specific topics. In this paper, we introduce SURF (Summarizer of User Reviews Feedback), a novel approach to condense the enormous amount of information that developers of popular apps have to manage due to the user feedback they receive on a daily basis. SURF relies on a conceptual model for capturing user needs that is useful for developers performing maintenance and evolution tasks. It then uses sophisticated summarization techniques to condense thousands of reviews into an interactive, structured agenda of recommended software changes. We performed an end-to-end evaluation of SURF on user reviews of 17 mobile apps (5 of them developed by Sony Mobile), involving 23 developers and researchers in total. The results demonstrate the high accuracy of SURF in summarizing reviews and the usefulness of the recommended changes. In evaluating our approach, we found that SURF helps developers better understand user needs, substantially reducing the time required compared with manually analyzing user change requests and planning future software changes.
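
A rough sketch of turning grouped reviews into a condensed change agenda by picking the review closest to each group's centroid. SURF's summarization model is considerably more sophisticated; the topics and reviews here are illustrative only.

```python
import numpy as np
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.metrics.pairwise import cosine_similarity

topic_reviews = {  # hypothetical topic -> reviews mapping
    "login": ["login fails with Google account",
              "cannot sign in since the update",
              "sign-in button does nothing"],
    "sync":  ["sync is very slow on wifi",
              "notes take minutes to sync",
              "syncing drains the battery"],
}

vec = TfidfVectorizer()
vec.fit(r for revs in topic_reviews.values() for r in revs)

print("Recommended change agenda:")
for topic, revs in topic_reviews.items():
    matrix = vec.transform(revs)
    centroid = np.asarray(matrix.mean(axis=0))
    # The review nearest the centroid stands in for the whole group.
    representative = revs[int(cosine_similarity(centroid, matrix).argmax())]
    print(f"- [{topic}] {len(revs)} reviews, e.g. \"{representative}\"")
```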

An Empirical Study on the Impact of Refactoring on Quality Metrics in Android Applications

2021 IEEE/ACM 8th International Conference on Mobile Software Engineering and Systems (MobileSoft)

Mobile applications must continuously evolve, sometimes under such time pressure that poor design or implementation choices are made, which inevitably result in structural software quality problems. Refactoring is the widely accepted approach to ameliorating such quality problems. While the impact of refactoring on software quality has been widely studied in object-oriented software, its impact is still unclear in the context of mobile apps. This paper reports on the first empirical study that aims to address this gap. We conduct a large empirical study that analyzes the evolution history of 300 open-source Android apps exhibiting a total of 42,181 refactoring operations. We analyze the impact of these refactoring operations on 10 common quality metrics using a causal inference method based on the Difference-in-Differences (DiD) model. Our results indicate that when refactoring affects the metrics, it generally improves them. In many cases refactoring has no significant impact on the metrics, whereas one metric (LCOM) deteriorates overall as a result of refactoring. These findings provide practical insights into the current practice of refactoring in the context of Android app development.
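
A minimal Difference-in-Differences sketch with statsmodels, where the coefficient of the `treated:post` interaction estimates the refactoring effect on a metric. The data frame is synthetic and the paper's model and controls are richer than this.

```python
import pandas as pd
import statsmodels.formula.api as smf

df = pd.DataFrame({
    # 1 = file touched by the refactoring, 0 = control file
    "treated": [1, 1, 1, 1, 0, 0, 0, 0],
    # 0 = before the refactoring commit, 1 = after
    "post":    [0, 0, 1, 1, 0, 0, 1, 1],
    # hypothetical values of a quality metric (e.g., WMC)
    "metric":  [10.0, 11.0, 7.0, 8.0, 10.0, 10.5, 10.2, 10.8],
})

model = smf.ols("metric ~ treated * post", data=df).fit()
print(model.summary().tables[1])  # the `treated:post` row is the DiD estimate
```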

Examining the Relationship between FindBugs Warnings and End User Ratings: A Case Study on 10,000 Android Apps

IEEE Software, 2016

In the mobile app ecosystem, end user ratings of apps (a measure of end user perception) are extremely important to study, as they are highly correlated with downloads and hence revenues. In this study we examine the relationship between the app ratings (and associated review comments) from end users and the static analysis warnings (collected using FindBugs) from 10,000 free-to-download Android apps. In our case study, we find that specific categories of FindBugs warnings, such as the 'Bad Practice', 'Internationalization', and 'Performance' categories, occur significantly more often in low-rated apps. We also find that there is a correspondence between these three categories of warnings and the complaints in the review comments of end users. These findings provide evidence that certain categories of FindBugs warnings have a strong relationship with an app's rating and hence are closely related to the user experience. Thus, app developers can use static analysis tools such as FindBugs to identify the culprit bugs behind the issues that users complain about before they release the app.
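
A sketch of the group comparison behind such findings: testing whether a warning category is more frequent in low-rated apps than in high-rated ones. The warning densities (warnings per KLOC) are invented, and a Mann-Whitney U test stands in for whatever test the study actually applied.

```python
from scipy.stats import mannwhitneyu

low_rated_density  = [4.2, 5.1, 3.8, 6.0, 4.9]  # e.g., 'Performance' warnings
high_rated_density = [2.1, 1.8, 2.6, 3.0, 2.2]

stat, p_value = mannwhitneyu(low_rated_density, high_rated_density,
                             alternative="greater")
print(f"U = {stat}, p = {p_value:.4f}")  # small p: low-rated apps have more warnings
```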

QDroid: Mobile Application Quality Analyzer for App Market Curators

Mobile Information Systems, 2016

Low-quality mobile applications damage the user experience, but given the sheer number of applications, quality analysis is a daunting task. To address this, QDroid is proposed: an automated quality analyzer that detects the presence of crashes, excessive resource usage, and compatibility problems without source code or human involvement. QDroid was applied to 67 applications for evaluation; it discovered 78% more crashes and attained 23% higher Activity coverage than Monkey testing. For detecting excessive resource usage and compatibility problems, QDroid reduced the number of applications requiring manual review by up to 96% and 69%, respectively.
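
QDroid itself is not publicly available, so the sketch below only shows the Monkey baseline it is compared against: fire pseudo-random UI events at a package and scan logcat for crashes. The package name is a placeholder.

```python
import subprocess

PACKAGE = "com.example.app"  # hypothetical app under test

subprocess.run(["adb", "logcat", "-c"], check=True)  # clear the log buffer

# 500 pseudo-random events, seeded for reproducibility.
subprocess.run(
    ["adb", "shell", "monkey", "-p", PACKAGE, "-s", "42", "500"],
    check=True,
)

# Dump error-level log lines and count fatal exceptions.
log = subprocess.run(["adb", "logcat", "-d", "*:E"],
                     capture_output=True, text=True, check=True).stdout
crashes = [line for line in log.splitlines() if "FATAL EXCEPTION" in line]
print(f"{len(crashes)} fatal exception(s) observed")
```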