Project:LHS Graphs and Visualizations: Difference between revisions
Revision as of 12:39, 16 December 2010
LHS Graphs and Visualizations
Members: Elliot
Overview
I am supplementing the Cacti graphs that we have for LHS bandwidth and power with metrics that provide insight into the growth of our community and organisation over time.
Phase 1 is complete and we now chart the following:
- Number of members
- Wiki activity
Later I'd like to investigate:
- Website visitors and/or page impressions
- Mailing list activity
- Space occupancy
In progress
Space occupancy
I have located two Laser Diode Retroreflective Sensors in our stores which I intend to use to monitor space occupancy by having the beams cross the main doorway. The beams will be staggered so that direction (arriving, leaving) can be determined by the order in which the beams are broken. I was thinking of mounting them at waist height.
These sensors appear to be in working order. They should be safe to use below child-head height. They were manufactured in 1996 and use a Class II laser rated at 3 mW with a wavelength of 655-670 nm (datasheet).
The model number is: Q45BB6LL
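The staggered-beam idea above can be sketched in a few lines. This is only an illustration in Python, not code that runs in the space: the beam labels "outer" and "inner" and the function names are made up for the example, and it assumes the sensors deliver an ordered stream of "beam broken" events.

```python
from typing import Iterable, Tuple

# Hypothetical labels: OUTER is the beam nearer the corridor,
# INNER the beam nearer the room. The page does not name them.
OUTER, INNER = "outer", "inner"


def count_crossings(events: Iterable[str]) -> Tuple[int, int]:
    """Return (arrivals, departures) from an ordered stream of beam breaks.

    Two consecutive breaks of different beams form one crossing, and the
    beam that was broken first gives the direction of travel.
    """
    arrivals = departures = 0
    first = None  # beam broken first in the crossing being assembled
    for beam in events:
        if first is None or beam == first:
            # Start of a crossing, or the same beam again (someone
            # hesitating in the doorway): remember it and wait.
            first = beam
            continue
        if first == OUTER and beam == INNER:
            arrivals += 1
        elif first == INNER and beam == OUTER:
            departures += 1
        first = None  # crossing complete, start afresh
    return arrivals, departures


def occupancy(events: Iterable[str], start: int = 0) -> int:
    """Estimate the number of people inside, never dropping below zero."""
    arrivals, departures = count_crossings(events)
    return max(0, start + arrivals - departures)
```

For example, occupancy([OUTER, INNER, OUTER, INNER, INNER, OUTER]) counts two arrivals and one departure. Two people passing at once, or one person blocking both beams, would confuse a scheme this simple.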
Desired
Website visitors and/or page impressions
The main site and the Wiki use Google Analytics and this has an API. This documented method looks promising:
https://www.google.com/analytics/feeds/data?metrics=ga%3Avisits%2Cga%3Apageviews&start-date=2010-11-29&end-date=2010-12-13&max-results=50
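A small Python sketch of how the query string above could be assembled for an arbitrary date range. The function name and its defaults are illustrative. It reproduces only the parameters visible in the URL above; any authentication or profile identifier the API may also require is not covered here.

```python
from urllib.parse import urlencode

BASE_URL = "https://www.google.com/analytics/feeds/data"


def build_feed_url(start_date, end_date,
                   metrics=("ga:visits", "ga:pageviews"),
                   max_results=50):
    """Build the data feed URL for the given YYYY-MM-DD date range."""
    params = [
        ("metrics", ",".join(metrics)),   # comma-separated metric names
        ("start-date", start_date),
        ("end-date", end_date),
        ("max-results", max_results),
    ]
    # urlencode percent-encodes ':' and ',' as %3A and %2C.
    return BASE_URL + "?" + urlencode(params)


if __name__ == "__main__":
    print(build_feed_url("2010-11-29", "2010-12-13"))
```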
Mailing list activity
- Calculate the size of the list (members) from the mailing list download
- It would be easier to scrape this from the box on the right-hand side: http://groups.google.com/group/london-hack-space
- Scrape the number of posts from the front page: http://groups.google.com/group/london-hack-space
Excellent idea: I'll try this. --Teabot 12:22, 16 December 2010 (UTC)
- Candidate regex for this:
    <a [^>]+ href="/group/london-hack-space/topics"><b>Discussions</b></a><img [^>]+><span [^>]+>[0-9]+ of ([0-9]+) messages</span>
    <a [^>]+ href="/group/london-hack-space/members"><b>Members</b></a><img [^>]+><span [^>]+>([0-9]+) members</span>
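A rough Python sketch of applying the two candidate patterns. The SAMPLE fragment is invented to fit the patterns and is not copied from the real group page, so the actual markup would need checking before relying on this.

```python
import re

# The two candidate patterns from above, split across lines for readability.
MESSAGES_RE = re.compile(
    r'<a [^>]+ href="/group/london-hack-space/topics"><b>Discussions</b></a>'
    r'<img [^>]+><span [^>]+>[0-9]+ of ([0-9]+) messages</span>'
)
MEMBERS_RE = re.compile(
    r'<a [^>]+ href="/group/london-hack-space/members"><b>Members</b></a>'
    r'<img [^>]+><span [^>]+>([0-9]+) members</span>'
)


def extract_counts(html):
    """Return (messages, members); either is None if its pattern fails."""
    msg = MESSAGES_RE.search(html)
    mem = MEMBERS_RE.search(html)
    return (int(msg.group(1)) if msg else None,
            int(mem.group(1)) if mem else None)


# Made-up fragment shaped to satisfy the patterns; not real page markup.
SAMPLE = (
    '<a class="nav" href="/group/london-hack-space/topics"><b>Discussions</b></a>'
    '<img src="dot.gif"><span class="count">10 of 4321 messages</span>'
    '<a class="nav" href="/group/london-hack-space/members"><b>Members</b></a>'
    '<img src="dot.gif"><span class="count">250 members</span>'
)

if __name__ == "__main__":
    print(extract_counts(SAMPLE))
```

Patterns this tightly bound to the page layout will break silently if the markup changes, which is why the function returns None rather than guessing.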
Done
To chart the initial metrics, various bits of data are exposed in a Cacti-friendly way.
Number of members
COMPLETE - but may have to wait 12 months before it becomes interesting.
This data is stored in a SQLite database on Turing. See the Schema. It can be queried like so:
    SELECT COUNT(id) FROM users WHERE subscribed = 1;
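To try the query without access to Turing, here is a minimal Python sketch against an in-memory SQLite database. The table is a stand-in with only the id and subscribed columns that the query mentions; the real schema is linked above and may well differ.

```python
import sqlite3


def member_counts(conn):
    """Return (subscribed, pending) counts from the users table."""
    subscribed = conn.execute(
        "SELECT COUNT(id) FROM users WHERE subscribed = 1"
    ).fetchone()[0]
    pending = conn.execute(
        "SELECT COUNT(id) FROM users WHERE subscribed = 0"
    ).fetchone()[0]
    return subscribed, pending


def demo_connection():
    """Build a throwaway database with a simplified, assumed users table."""
    conn = sqlite3.connect(":memory:")
    conn.execute(
        "CREATE TABLE users (id INTEGER PRIMARY KEY, subscribed INTEGER NOT NULL)"
    )
    # Three subscribed members and one pending member.
    conn.executemany(
        "INSERT INTO users (subscribed) VALUES (?)", [(1,), (1,), (0,), (1,)]
    )
    return conn


if __name__ == "__main__":
    subscribed, pending = member_counts(demo_connection())
    print(f"subscribed:{subscribed} pending:{pending}")
```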
Cacti runs on Babbage but the members database is on Turing. There is a PHP script on Turing to expose the member numbers, and a script on Babbage pulls this in with an HTTP request.
This PHP generates the following output for Cacti: subscribed:136 pending:2
    <?php
    // Site bootstrap; $db is expected to come from here.
    require_once( $_SERVER['DOCUMENT_ROOT'] . '/../lib/init.php');

    // Count subscribed members and those still pending.
    $subscribers = $db->translatedQuery( 'SELECT COUNT(id) AS num FROM users WHERE subscribed=1' )->fetchRow();
    $pending = $db->translatedQuery( 'SELECT COUNT(id) AS num FROM users WHERE subscribed=0' )->fetchRow();

    // Emit space-separated name:value pairs for Cacti.
    print "subscribed:{$subscribers['num']} pending:{$pending['num']}";
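Perl to fetch member numbers for Cacti:
    #!/usr/bin/perl
    use strict;
    use warnings;
    use LWP::Simple;

    eval {
        # get() returns undef on failure instead of dying, so check it explicitly.
        my $file = get("http://london.hackspace.org.uk/member_stats.php");
        die "fetch failed\n" unless defined $file;
        print $file;
        1;
    } or do {
        # Fall back to NaN when the fetch fails.
        print "NaN";
    };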
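When testing these scripts by hand, it can help to check that their output really is a line of space-separated name:value pairs. A small Python sketch follows; the function name is made up, and treating a bare NaN as "no data" is an assumption based on the fallback the scripts print.

```python
def parse_cacti_output(line):
    """Parse 'name:value name:value' into a dict of floats.

    A bare 'NaN' (the scripts' failure output) or an empty line
    yields an empty dict. Anything else malformed raises ValueError.
    """
    line = line.strip()
    if not line or line == "NaN":
        return {}
    values = {}
    for token in line.split():
        name, sep, raw = token.partition(":")
        if not sep or not name:
            raise ValueError(f"not a name:value pair: {token!r}")
        values[name] = float(raw)  # float() raises ValueError on junk
    return values


if __name__ == "__main__":
    print(parse_cacti_output("subscribed:136 pending:2"))
```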
Wiki statistics
COMPLETE
- We get this using the MediaWiki API:
http://wiki.hackspace.org.uk/w/api.php?action=query&meta=siteinfo&siprop=statistics&format=xml
It returns:
    <?xml version="1.0"?>
    <api>
      <query>
        <statistics pages="759" articles="215" views="229656" edits="7368" images="186" users="166" activeusers="22" admins="61" jobs="13" />
      </query>
    </api>
Perl to generate the following output for Cacti: pages:759 articles:215 views:229656 edits:7368 images:186 users:166 activeusers:22 admins:61 jobs:13
    #!/usr/bin/perl
    use strict;
    use warnings;
    use LWP::Simple;
    use XML::Simple;

    eval {
        # get() returns undef on failure instead of dying, so check it explicitly.
        my $file = get("http://wiki.hackspace.org.uk/w/api.php?action=query&meta=siteinfo&siprop=statistics&format=xml");
        die "fetch failed\n" unless defined $file;

        my $xs1 = XML::Simple->new();
        my $doc = $xs1->XMLin($file);
        my $stats = $doc->{query}->{statistics};

        # Build the whole line first so a failure never leaves partial output.
        # Field order follows Perl's hash order and may vary between runs.
        print join(" ", map { $_ . ":" . $stats->{$_} } keys %$stats);
        1;
    } or do {
        print "NaN";
    };
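The same transformation can be tried offline against the sample response shown above. This is a Python sketch for experimenting, not the script Cacti runs. It sorts the fields by name so the output is stable, which differs from the field order quoted above.

```python
import xml.etree.ElementTree as ET

# The sample response from above, kept on one logical string so the
# XML declaration sits at the very start.
SAMPLE_XML = (
    '<?xml version="1.0"?>'
    '<api><query>'
    '<statistics pages="759" articles="215" views="229656" edits="7368" '
    'images="186" users="166" activeusers="22" admins="61" jobs="13" />'
    '</query></api>'
)


def stats_line(xml_text):
    """Turn the statistics attributes into 'name:value' pairs, or 'NaN'."""
    try:
        root = ET.fromstring(xml_text)
    except ET.ParseError:
        return "NaN"
    stats = root.find("./query/statistics")
    if stats is None:
        return "NaN"
    # Sorted by attribute name for a predictable field order.
    return " ".join(f"{name}:{value}"
                    for name, value in sorted(stats.attrib.items()))


if __name__ == "__main__":
    print(stats_line(SAMPLE_XML))
```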
Mailing list activity
COMPLETE
- There is no API for Google Groups. We poll the RSS feed and count new messages.
https://groups.google.com/group/london-hack-space/feed/rss_v2_0_msgs.xml?num=50
This script yields: messages:0
    #!/usr/bin/perl
    use strict;
    use warnings;
    use Date::Manip;
    use LWP::UserAgent;
    use XML::Simple;

    # Holds the timestamp of the newest message seen on the previous run.
    my $stateFile = 'lhs_rss_feed_ts.dat';

    eval {
        # Fetch the feed with a browser-like User-Agent string.
        my $ua = "Mozilla/5.0 (X11; U; Linux x86_64; en-US) AppleWebKit/540.0 (KHTML,like Gecko) Chrome/9.1.0.0 Safari/540.0";
        my $client = LWP::UserAgent->new;
        $client->agent($ua);
        my $response = $client->get("http://groups.google.com/group/london-hack-space/feed/rss_v2_0_msgs.xml?num=50");

        # Check the response before parsing it.
        die "fetch failed\n" unless $response->is_success;

        my $xs1 = XML::Simple->new(keeproot => 1, searchpath => ".", forcearray => 1, keyattr => ['key', 'tag']);
        my $doc = $xs1->XMLin($response->content);

        # Read the timestamp saved by the previous run (0 on the first run).
        my $lastTime = 0;
        if (open(my $in, '<', $stateFile)) {
            while (<$in>) {
                chomp;
                $lastTime = $_;
            }
            close($in);
        }

        # Count items newer than the saved timestamp and track the newest one.
        my $nextTime = $lastTime;
        my $messageCount = 0;
        foreach my $item (@{ $doc->{rss}->[0]->{channel}->[0]->{item} || [] }) {
            my $unixDate = UnixDate($item->{pubDate}->[0], "%s");
            next unless defined $unixDate;
            # Compare numerically; the timestamps are epoch seconds.
            if ($unixDate > $lastTime) {
                $messageCount++;
                $nextTime = $unixDate if $unixDate > $nextTime;
            }
        }

        open(my $out, '>', $stateFile) or die "cannot write $stateFile: $!\n";
        print $out $nextTime;
        close($out);

        print "messages:" . $messageCount;
        1;
    } or do {
        print "NaN";
    };
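The core of the script above is a high-water mark: count the items newer than the timestamp saved last time, then save the newest timestamp seen. A Python sketch of just that step, with the fetching and the state file left out; the function name is made up, and it assumes pubDate values in the RFC 822 style that RSS 2.0 specifies.

```python
from email.utils import parsedate_to_datetime


def count_new(pub_dates, last_ts):
    """Return (new_message_count, next_ts) for one polling run.

    pub_dates: iterable of RFC 822 date strings taken from the feed.
    last_ts:   epoch seconds of the newest message seen on the last run.
    """
    next_ts = last_ts
    count = 0
    for text in pub_dates:
        ts = int(parsedate_to_datetime(text).timestamp())
        if ts > last_ts:
            count += 1
            next_ts = max(next_ts, ts)  # remember the newest item seen
    return count, next_ts


if __name__ == "__main__":
    dates = [
        "Thu, 16 Dec 2010 12:00:00 +0000",
        "Thu, 16 Dec 2010 12:30:00 +0000",
    ]
    count, next_ts = count_new(dates, 0)
    print(f"messages:{count}")
```

Since the feed request asks for 50 items, a polling interval in which more than 50 messages arrive would be under-counted. Messages sharing the exact timestamp of the saved mark are also skipped.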