How to Scrape Google Play App Reviews to Create a Dataset for Sentiment Analysis
A guide to using Python to scrape Android app reviews and turn the data into a sentiment analysis dataset.
Let’s look at how to scrape reviews and ratings for Android apps to produce a dataset for sentiment analysis. You’ll convert the app and review data into Pandas DataFrames and then save them to CSV files.
You can run the code in a notebook in your browser (Google Colab).
You’ll learn how to:
- Set a goal and inclusion criteria for your dataset.
- Collect real-world user reviews by scraping Google Play.
- Use Pandas to convert the dataset and save it to CSV files.
Setup:
Install the required packages and set up the imports:
import json
import pandas as pd
from tqdm import tqdm

import seaborn as sns
import matplotlib.pyplot as plt

from pygments import highlight
from pygments.lexers import JsonLexer
from pygments.formatters import TerminalFormatter

from google_play_scraper import Sort, reviews, app

%matplotlib inline
%config InlineBackend.figure_format='retina'

sns.set(style='whitegrid', palette='muted', font_scale=1.2)
The Target of the Dataset
You want feedback on your app, and both positive and negative reviews are valuable. Negative reviews, however, may reveal critical missing features or service outages (especially when they become much more frequent).
Fortunately, Google Play offers a diverse selection of apps, ratings, and reviews. We can scrape app metadata and reviews using the google-play-scraper package.
There are many apps you could analyze. However, different app categories have different target audiences, domain-specific quirks, and so on. Let’s start simple.
We want apps that have been around for a long time, so that feedback has accumulated organically. We also want to limit the influence of advertising campaigns on the reviews as much as possible. Because apps are updated regularly, the date of each review is an important factor.
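The "around for a long time" criterion can be checked mechanically once app information is available. The sketch below is only an illustration: it assumes the `released` field uses the "Nov 10, 2011" format seen in the sample output later in this article, and the helper names and the three-year threshold are hypothetical choices, not part of google-play-scraper.

```python
from datetime import datetime

def years_on_store(released, today):
    """Approximate years since release, given a 'Nov 10, 2011'-style string.

    Assumption: the date format matches the sample app info shown in this article.
    """
    release_date = datetime.strptime(released, "%b %d, %Y")
    return (today - release_date).days / 365.25

def is_established(app_info, today, min_years=3.0):
    """Return True if the app has been on the store for at least min_years."""
    released = app_info.get("released")
    if not released:
        # No release date available: treat the app as not meeting the criterion
        return False
    return years_on_store(released, today) >= min_years

# Hypothetical usage with a hand-written record instead of scraped data
sample = {"appId": "com.anydo", "released": "Nov 10, 2011"}
print(is_established(sample, today=datetime(2020, 4, 7)))
```

Passing `today` explicitly keeps the check reproducible; in a notebook you would typically pass `datetime.now()`.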
In a perfect world, you’d collect every possible review and use it to your advantage. In the real world, however, data is often limited (too large, inaccessible, etc.), so we’ll do the best we can.
Let’s pick a few apps from the Productivity category that fit these criteria. We’ll use AppAnnie to select some of the most popular apps in the US:
app_packages = [
    'com.anydo',
    'com.todoist',
    'com.ticktick.task',
    'com.habitrpg.android.habitica',
    'cc.forestapp',
    'com.oristats.habitbull',
    'com.levor.liferpgtasks',
    'com.habitnow',
    'com.microsoft.todos',
    'prox.lab.calclock',
    'com.gmail.jmartindev.timetune',
    'com.artfulagenda.app',
    'com.tasks.android',
    'com.appgenix.bizcal',
    'com.appxy.planner'
]
Extracting App Information
Let’s scrape the information for each app:
app_infos = []

for ap in tqdm(app_packages):
    info = app(ap, lang='en', country='us')
    del info['comments']
    app_infos.append(info)

We now have the information for all 15 apps. Let’s write a helper function that makes printing JSON objects easier:

def print_json(json_object):
    json_str = json.dumps(
        json_object,
        indent=2,
        sort_keys=True,
        default=str
    )
    print(highlight(json_str, JsonLexer(), TerminalFormatter()))

Here’s an example of app data from the list (the long descriptionHTML value repeats the description text and is shortened here):

print_json(app_infos[0])

{
  "adSupported": null,
  "androidVersion": "Varies",
  "androidVersionText": "Varies with device",
  "appId": "com.anydo",
  "containsAds": null,
  "contentRating": "Everyone",
  "contentRatingDescription": null,
  "currency": "USD",
  "description": "\ud83c\udfc6 Editor's Choice by Google\r\n\r\nAny.do is a To Do List, Calendar, Planner, Tasks & Reminders App That Helps Over 25M People Stay Organized and Get More Done.\r\n\r\n\ud83e\udd47 \"It\u2019s A MUST HAVE PLANNER & TO DO LIST APP\" (NYTimes, USA TODAY, WSJ & Lifehacker).\r\n\r\nAny.do is a free to-do list, planner & calendar app for managing and organizing your daily tasks, to-do lists, notes, reminders, checklists, calendar events, grocery lists and more.\r\n\r\n\ud83d\udcc5 Organize Your Tasks & To-Do List in Seconds\r\n\r\n\u2022 ADVANCED CALENDAR & DAILY PLANNER - Keep your to-do list and calendar events always at hand with our calendar widget. Any.do to-do list & planner support daily calendar view, 3-day Calendar view, Weekly calendar view & agenda view, with built-in reminders. Review and organize your calendar events and to do list side by side.\r\n\r\n\u2022 SYNCS SEAMLESSLY - Keeps all your to do list, tasks, reminders, notes, calendar & agenda always in sync so you\u2019ll never forget a thing. Sync your phone\u2019s calendar, google calendar, Facebook events, outlook calendar or any other calendar so you don\u2019t forget an important event.\r\n\r\n\u2022 SET REMINDERS - One time reminders, recurring reminders, Location reminders & voice reminders. NEW! Easily create tasks and get reminders in WhatsApp.\r\n\r\n\u2022 WORK TOGETHER - Share your to do list and assign tasks with your friends, family & colleagues from your task list to collaborate and get more done. \r\n\r\n---\r\n\r\nALL-IN-ONE PLANNER & CALENDAR APP FOR GETTING THINGS DONE\r\nCreate and set reminders with voice to your to do list. \r\nFor better task management flow we added a calendar integration to keep your agenda always up to date. \r\nFor better productivity, we added recurring reminders, location reminders, one-time reminder, sub-tasks, notes & file attachments. \r\nTo keep your to do list up to date, we\u2019ve added a daily planner and focus mode.\r\n\r\nINTEGRATIONS\r\nAny.do To do list, Calendar, planner & Reminders Integrates with Google Calendar, Outlook, WhatsApp, Slack, Gmail, Google Tasks, Evernote, Trello, Wunderlist, Todoist, Zapier, Asana, Microsoft to-do, Salesforce, OneNote, Google Assistant, Amazon Alexa, Office 365, Exchange, Jira & More.\r\n\r\nTO DO LIST, CALENDAR, PLANNER & REMINDERS MADE SIMPLE\r\nDesigned to keep you on top of your to do list, tasks and calendar events with no hassle. With intuitive drag and drop of tasks, swiping to mark to-do's as complete, and shaking your device to remove completed from your to do list - you can stay organized and enjoy every minute of it.\r\n\r\nPOWERFUL TO DO LIST TASK MANAGEMENT\r\nAdd a to do list item straight from your email / Gmail / Outlook inbox by forwarding do@Any.do. Attach files from your computer, Dropbox, or Google Drive to your to- tasks.\r\n\r\nDAILY PLANNER & LIFE ORGANIZER\r\nAny.do is a to do list, a calendar, an inbox, a notepad, a checklist, task list, a board for post its or sticky notes, a task & project management tool, a reminder app, a daily planner, a family organizer, an agenda, a bill planner and overall the simplest productivity tool you will ever have. \r\n\r\nSHARE LISTS, ASSIGN & ORGANIZE TASKS\r\nTo plan & organize projects has never been easier. Now you can share lists between family members, assign tasks to each other, chat and much more. Any.do will help you and the people around you stay in-sync and get reminders so that you can focus on what matters, knowing you had a productive day and crossed off your to do list.\r\n\r\nGROCERY LIST & SHOPPING LIST\r\nAny.do task list, calendar, agenda, reminders & planner is also great for shopping lists at the grocery store. Simply create a list on Any.do, share it with your loved ones and see them adding their shopping items in real-time.",
  "descriptionHTML": "...",
  "developer": "Any.do Calendar & To-Do List",
  "developerAddress": "Any.do Inc.\n\n6 Agripas Street, Tel Aviv\n6249106 ISRAEL",
  "developerEmail": "feedback+androidtodo@any.do",
  "developerId": "5304780265295461149",
  "developerInternalID": "5304780265295461149",
  "developerWebsite": "https://www.any.do",
  "free": true,
  "genre": "Productivity",
  "genreId": "PRODUCTIVITY",
  "headerImage": "https://lh3.googleusercontent.com/dZknnlk1LM8fYS3wjOvVHOmWKOGH1HAe691Yuh7LAeBj6a730A1CQqZnXxjNahAYUFFw",
  "histogram": [
    27291,
    9246,
    13735,
    29904,
    262997
  ],
  "icon": "https://lh3.googleusercontent.com/zgOLUXCHkF91H8xuMTMLT17smwgLPwSBjUlKVWF-cZRFjlv-Uvtman7DiHEii54fbEE",
  "installs": "10,000,000+",
  "minInstalls": 10000000,
  "offersIAP": true,
  "price": 0,
  "privacyPolicy": "https://www.any.do/privacy",
  "ratings": 343174,
  "recentChanges": "Faster and smoother for better user experience!",
  "recentChangesHTML": "Faster and smoother for better user experience!",
  "released": "Nov 10, 2011",
  "reviews": 122170,
  "score": 4.43388,
  "screenshots": [
    "https://lh3.googleusercontent.com/C-L3_FPMlKVrZItAORaszhnQzlzMyXcqF_-oGaabHm_OnwUW1jz02BXBVSKi0HRUtQ",
    "https://lh3.googleusercontent.com/uAP6G5ANQcgVs4Uj6yrcsAo4OUhejTJRVCXOxnAVA5Efit_OtAnrOYyL1SUHj1rv",
    "https://lh3.googleusercontent.com/AI5mLFu0Atsl0km2FO9_IwJXNy_1q1_X6Ua3EVMZNedp0dsDToDRaWQ1UDvI6mb1-I0",
    "https://lh3.googleusercontent.com/bYCAn3mjgB4ugSY0PL-PCcMBfbvXCSFkzL-pLSIIbZ8sQByQPerHboPQ2fA126K4LDtU",
    "https://lh3.googleusercontent.com/u-dX4lpTepsvXs33ds4xxYpApuGS4JBAEb0UsvY_fPbptxnF0QxaKNW0-tJVXaP8a1E",
    "https://lh3.googleusercontent.com/qvUz_9IXHQd6FSLUALZo8NKLx-s4uDGyElPOGRsU28TCEficQc0BoNRloRRLqUkH2A",
    "https://lh3.googleusercontent.com/tEyGs6MGlY97ccLc4c_HxV9xNOpsvwQyHz6uGAezkVtxm1ydAaTj5EZSUgqlg69qrrk",
    "https://lh3.googleusercontent.com/StN0i2BskOs6HCfaPO0DMBOCQMCag3okWVI_SlFJtMytwbgNMBnD5i9hbSqdNlGxffmn",
    "https://lh3.googleusercontent.com/GRKqWfo-PLzCKwpgZ8fej4PGsUp1q9eM5a3LQeiYCOW-KUpCOIHXOp3mteZWbJ-pz4My",
    "https://lh3.googleusercontent.com/pFQQ_qi8u92duWCNXpEcNKpH2lVpD_hFd5f-UlTP_f6wft3YyYLMzwLitxt-UI6G8vs",
    "https://lh3.googleusercontent.com/AoeCU6bT1x0eHRvJwvQyOSKJ31oSayox959qMNVaSzz3uN9bvk1cGek5zyRDe1BdtA",
    "https://lh3.googleusercontent.com/vICme1f4J9vFt8wY3xBY-LshGgYyvSbsa4TLJyEtNsy0alUI0i9oMQVq8oJ4l_yR1Aw",
    "https://lh3.googleusercontent.com/7sn9m__iVM-peiG6_jkKBuE-QVH_xDaycF_oR1XJlwcAC45ybNZ_Exor09ENOJ41Q2U",
    "https://lh3.googleusercontent.com/9I_m2ZXgPtiU4Po4cw_cyIaEpZxynxQ1n3YkhFgakATfbu63a8_f8vGQDxKOHYITzew"
  ],
  "size": "Varies with device",
  "summary": "Task Manager \u2705 Organizer \ud83d\udcc5 Agenda \ud83d\udcdd Daily Reminders \ud83d\udd14 All-in-One Simple App.",
  "summaryHTML": "Task Manager \u2705 Organizer \ud83d\udcc5 Agenda \ud83d\udcdd Daily Reminders \ud83d\udd14 All-in-One Simple App.",
  "title": "Any.do: To do list, Calendar, Planner & Reminders",
  "updated": 1586258773,
  "url": "https://play.google.com/store/apps/details?id=com.anydo&hl=en&gl=us",
  "version": "Varies with device",
  "video": "https://www.youtube.com/embed/2nkllLD0x6o?ps=play&vq=large&rel=0&autohide=1&showinfo=0",
  "videoImage": "https://i.ytimg.com/vi/2nkllLD0x6o/hqdefault.jpg"
}
The app information contains a lot of detail, including the number of ratings, the number of reviews, and the number of ratings for each score (1 to 5). Let’s set all of that aside and look at the apps’ icons:
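import io
import urllib.request

import numpy as np
from PIL import Image

def format_title(title):
    # Keep only the part before the first ':' (or '-' when there is no colon)
    sep_index = title.find(':') if title.find(':') != -1 else title.find('-')
    if sep_index != -1:
        title = title[:sep_index]
    return title[:10]

# Two rows of icons; with an odd number of apps the last one is left out
fig, axs = plt.subplots(2, len(app_infos) // 2, figsize=(14, 5))

for i, ax in enumerate(axs.flat):
    ai = app_infos[i]
    # Recent Matplotlib versions no longer accept URLs in plt.imread,
    # so download the icon and decode it with Pillow instead
    with urllib.request.urlopen(ai['icon']) as response:
        img = np.array(Image.open(io.BytesIO(response.read())))
    ax.imshow(img)
    ax.set_title(format_title(ai['title']))
    ax.axis('off')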
We can save the app information for later by converting the JSON objects into a Pandas data frame and saving the output to a CSV file:
app_infos_df = pd.DataFrame(app_infos)
app_infos_df.to_csv('apps.csv', index=None, header=True)
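As a quick sanity check on the scraped app information, the `histogram` field (rating counts for scores 1 to 5) can be turned into an average rating and compared with the reported `score`. This is a minimal sketch using the Any.do numbers from the sample output above; `average_score` is a hypothetical helper, not part of google-play-scraper.

```python
def average_score(histogram):
    """Weighted mean of star ratings from a [1-star, ..., 5-star] count list."""
    total = sum(histogram)
    if total == 0:
        # No ratings yet, so there is no meaningful average
        return None
    weighted = sum(stars * count for stars, count in enumerate(histogram, start=1))
    return weighted / total

# Values copied from the sample app info shown earlier
anydo_histogram = [27291, 9246, 13735, 29904, 262997]
print(round(average_score(anydo_histogram), 5))
```

For this sample the result agrees closely with the reported score of 4.43388, which suggests the two fields describe the same underlying ratings.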
Scraping App Reviews
The scraping package can filter reviews by score, which helps build a balanced dataset. It can also sort reviews by helpfulness, which Google Play treats as the main measure of relevance, so you can sample the reviews for each app.
We’re looking for:
- A balanced dataset: each score (1 to 5) has roughly the same number of reviews.
- A representative sample of each app’s reviews
The first requirement can be met with the scraping package’s option to filter by review score. For the second, we’ll sort the reviews by helpfulness, which surfaces the ones Google Play considers most relevant. Just in case, we’ll also take a subset of the newest reviews:
app_reviews = []

for ap in tqdm(app_packages):
    for score in list(range(1, 6)):
        for sort_order in [Sort.MOST_RELEVANT, Sort.NEWEST]:
            rvs, _ = reviews(
                ap,
                lang='en',
                country='us',
                sort=sort_order,
                count=200 if score == 3 else 100,
                filter_score_with=score
            )
            for r in rvs:
                r['sortOrder'] = 'most_relevant' if sort_order == Sort.MOST_RELEVANT else 'newest'
                r['appId'] = ap
            app_reviews.extend(rvs)
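The loop above requests a fixed number of reviews per app, score, and sort order, so the largest possible dataset size can be worked out ahead of time. The sketch below only does that arithmetic; the function names are hypothetical, and the real total can be lower whenever an app has fewer reviews than requested for some score.

```python
def count_for_score(score):
    """Reviews requested per call, mirroring the rule used in the scraping loop."""
    return 200 if score == 3 else 100

def max_reviews_requested(n_apps, n_sort_orders):
    """Upper bound on collected reviews across all apps, scores and sort orders."""
    per_app_and_order = sum(count_for_score(score) for score in range(1, 6))
    return n_apps * n_sort_orders * per_app_and_order

# 15 apps, 2 sort orders (most relevant and newest)
print(max_reviews_requested(15, 2))  # 18000
```

The 15,750 reviews reported later in this article sit below this ceiling of 18,000.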
Note that we add the app id and the sort order to each review. Here’s an example:
print_json(app_reviews[0])

{
  "appId": "com.anydo",
  "at": "2020-04-05 22:25:57",
  "content": "Update: After getting a response from the developer I would change my rating to 0 stars if possible. These guys hide behind confusing and opaque terms and refuse to budge at all. I'm so annoyed that my money has been lost to them! Really terrible customer experience. Original: Be very careful when signing up for a free trial of this app. If you happen to go over they automatically charge you for a full years subscription and refuse to refund. Terrible customer experience and the app is just OK.",
  "repliedAt": "2020-04-07 14:09:03",
  "replyContent": "Our policy and TOS are completely transparent and can be found in the Help Center and our main page. In addition, payment can only be made upon the user's authorization via the app and Google Play. We provide users with a full 7 days trial to test the app with an additional 48 hours for a refund, along with priority support for all issues.",
  "reviewCreatedVersion": "4.17.0.3",
  "score": 1,
  "sortOrder": "most_relevant",
  "thumbsUpCount": 37,
  "userImage": "https://lh3.googleusercontent.com/a-/AOh14GiHdfNEu1DwwcJ6yNyju8Yvn4JwjpzuXvD74aVmDA",
  "userName": "Andrew Thomas"
}
repliedAt and replyContent hold the developer’s response to the review. These fields can be missing.
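Because the reply fields may be absent, any code that uses them should handle that case explicitly. The following sketch computes how long a developer took to respond; it assumes `at` and `repliedAt` are `datetime` objects before they are printed as strings, and `reply_delay_hours` is a hypothetical helper rather than a library function.

```python
from datetime import datetime

def reply_delay_hours(review):
    """Hours between a review and the developer's reply, or None without a reply.

    Assumption: 'at' and 'repliedAt' hold datetime objects (or are missing/None).
    """
    posted = review.get("at")
    replied = review.get("repliedAt")
    if posted is None or replied is None:
        return None
    return (replied - posted).total_seconds() / 3600

# Timestamps taken from the example review above
example_review = {
    "at": datetime(2020, 4, 5, 22, 25, 57),
    "repliedAt": datetime(2020, 4, 7, 14, 9, 3),
}
print(round(reply_delay_hours(example_review), 1))

# A review without a developer reply
print(reply_delay_hours({"at": datetime(2020, 4, 5), "repliedAt": None}))
```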
How many reviews did we get?
len(app_reviews)
15750
Let’s save the reviews to a CSV file:
app_reviews_df = pd.DataFrame(app_reviews)
app_reviews_df.to_csv('reviews.csv', index=None, header=True)
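Before relying on the saved reviews, it may be worth confirming how they are distributed across scores. Here is a minimal sketch that counts reviews per star rating for a list of review dictionaries; the sample records are made up for illustration, and `reviews_per_score` is a hypothetical helper.

```python
from collections import Counter

def reviews_per_score(review_list):
    """Count reviews for each star rating, always reporting scores 1 to 5."""
    counts = Counter(review["score"] for review in review_list)
    return {score: counts.get(score, 0) for score in range(1, 6)}

# Hand-written records standing in for scraped reviews
sample_reviews = [
    {"appId": "com.anydo", "score": 1, "sortOrder": "most_relevant"},
    {"appId": "com.anydo", "score": 3, "sortOrder": "newest"},
    {"appId": "com.todoist", "score": 3, "sortOrder": "most_relevant"},
    {"appId": "com.todoist", "score": 5, "sortOrder": "newest"},
]
print(reviews_per_score(sample_reviews))
```

In the notebook you would pass `app_reviews` instead of the sample list; scores with a count of zero stand out immediately.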
Summary
We now have over 15K user reviews from 15 different productivity apps.
You learned how to:
- Define the goals and expectations for your dataset.
- Scrape Google Play app information.
- Scrape user reviews for Google Play apps.
- Save the data as a CSV file.
Next, we’ll use BERT to run sentiment analysis on the reviews.
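This article does not say how star ratings will be turned into sentiment labels. One common convention, assumed here purely for illustration, treats 1 to 2 stars as negative, 3 stars as neutral, and 4 to 5 stars as positive. Under that assumption, requesting twice as many 3-star reviews gives the three classes similar sizes.

```python
def to_sentiment(score):
    """Map a 1-5 star rating to a sentiment label (assumed three-class scheme)."""
    if score <= 2:
        return "negative"
    if score == 3:
        return "neutral"
    return "positive"

# Requested reviews per score in the scraping loop: 200 for 3 stars, 100 otherwise
requested = {score: (200 if score == 3 else 100) for score in range(1, 6)}

# Sum the requested counts per assumed sentiment class
per_class = {}
for score, count in requested.items():
    label = to_sentiment(score)
    per_class[label] = per_class.get(label, 0) + count

print(per_class)
```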
Are you looking for a way to harvest reviews from Google Play?
Request a quotation from ReviewGators today.