When a WordPress plugin is hosted on WordPress.org, some basic usage statistics are shown on its page. These can be used in your plugin analytics.
The problem
I needed an easy way to gather some of these daily stats into a .csv file for later analysis and monitoring.
Data I was interested in:
- The approximate current number of installs (this number is rounded)
- The number of yesterday’s new downloads (today’s download count depends on what time you run the script)
- The number of total downloads
- The number of support issues reported within the last month
- The number of support issues resolved within the last month
Most of these numbers are found in the static HTML, but the download counts are added later via JavaScript. Hence the need for PhantomJS.
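To make the target format concrete, here is a minimal sketch of how one day's numbers could become a single CSV line. The column names and their order are my assumption for illustration, not necessarily what the published script writes, and the sample values are made up.

```javascript
// Hypothetical column names and order, chosen for illustration only.
var COLUMNS = [
  'date',
  'installs',
  'downloadsYesterday',
  'downloadsTotal',
  'issuesReported',
  'issuesResolved'
];

// Quote a field if it contains a comma, a double quote, or a line break,
// doubling any embedded double quotes (the usual CSV convention).
function csvEscape(value) {
  var text = String(value);
  if (/[",\r\n]/.test(text)) {
    return '"' + text.replace(/"/g, '""') + '"';
  }
  return text;
}

// Turn one day's stats object into a CSV line (without a trailing newline).
// Missing values become empty fields so the columns stay aligned.
function toCsvRow(stats) {
  return COLUMNS.map(function (name) {
    var value = stats[name];
    return csvEscape(value === undefined || value === null ? '' : value);
  }).join(',');
}

// Made-up sample values, not real plugin statistics.
console.log(toCsvRow({
  date: '2018-01-31',
  installs: 900,
  downloadsYesterday: 12,
  downloadsTotal: 3456,
  issuesReported: 4,
  issuesResolved: 3
}));
```

Appending one such line per day keeps the file easy to load into a spreadsheet later.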
The solution
A good place to gather these statistics is the plugin's advanced view on WordPress.org. For example, to see some basic statistics for my plugin, Bitcoin and Altcoin Wallets, you could visit:
https://wordpress.org/plugins/wallets/advanced/
Scraping a static HTML page is easy. For example, it could have been done in PHP using phpQuery. But I decided to use this opportunity as a gentle introduction to PhantomJS. PhantomJS is very useful for black-box testing, but it is also suitable for scraping data. It also lets me scrape the download counts, which are not available in the static HTML.
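Whichever tool does the scraping, the numbers come off the page as display text rather than as integers. Below is a rough sketch of normalizing such text before it goes into the CSV. I am assuming formats like "900+" or "10,000" here; the exact text WordPress.org shows may differ, and `parseCount` is a hypothetical helper name, not something taken from the published script.

```javascript
// Convert a displayed count such as "900+" or "10,000" into an integer.
// Returns null when the text is not a plain number, so a change in the
// page's wording shows up as a gap in the data instead of a wrong value.
// The accepted formats are an assumption about how the page renders counts.
function parseCount(text) {
  var cleaned = String(text)
    .trim()
    .replace(/[,\s]/g, '')   // drop thousands separators and inner spaces
    .replace(/\+$/, '');     // drop the trailing "+" used on rounded counts
  return /^\d+$/.test(cleaned) ? parseInt(cleaned, 10) : null;
}

console.log(parseCount('900+'));
console.log(parseCount(' 10,000 '));
console.log(parseCount('n/a'));
```

Returning null instead of guessing means that text like "1+ million" is rejected rather than silently turned into 1.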
The script is now posted on GitHub. See the README.md file for usage instructions.
https://github.com/alex-georgiou/wordpress-plugin-stats-scraper
Running it daily
The script is suitable for running via cron. You should run it once a day.
Keep in mind that tasks run by cron may not have the working directory you expect, and the PATH variable might be minimal or empty. So make sure to specify full paths, both for the phantomjs binary and for the script and output file. Here’s what it looks like in my crontab:
0 1 * * * QT_QPA_PLATFORM=offscreen /usr/bin/phantomjs /home/alexg/wordpress-plugin-stats-scraper/wordpress-plugin-stats-scraper.js wallets /home/alexg/wallets-stats.csv