Extracting Useful Information from the Powersight Kit
A brief writeup of my BBWorld09 session. The PowerPoint file is here:
PowerSight Presentation from BBWorld09
What is the PowerSight Kit?
What is the PowerSight Kit? It’s a series of views in the database that are denormalized to make them easier to query. They include:
- rpt_gradebook
- rpt_learning_context
- rpt_learning_context_size
- rpt_member
- rpt_person
- rpt_template
- rpt_tracking
All of these views have interesting information in them, but for my purposes, I spent my time analyzing rpt_tracking. Rpt_tracking contains a record for every action taken in Vista, along with who took the action, which section it was in, which tool was used, and exactly when (down to the millisecond!). So you can see how parsing through this data could yield some useful results.
Special Notes about Chico
A couple notes about things we do at CSU, Chico that might be different than other institutions:
- We create a blank section for every section offered in the student information system. This results in many sections that never get used, but it also lowers the bar for faculty to jump in and use the system. It’s important for us to track how many sections actually end up getting used.
- We have our learning context hierarchy set up so that we have a division for every college (e.g. College of Business) and a group within each division for every department (e.g. Department of Management). This allows us to track usage by college and department. If you don’t do it that way, you could also gather this information by cleverly naming your sections, using other fields of the sections (like Organization Name, Organization Unit, etc).
- We use crosslisting. About 25% of the sections that get used are actually crosslisted. This has implications for gathering statistics because the “hits” in rpt_tracking are assigned to the parent section, but we want to track those hits by the child section. More about that next.
The Challenge of Crosslisted Sections
As mentioned above, crosslisted sections pose a special challenge. We’d like to report our usage based on the child sections (especially important when sections from different departments are crosslisted together). The best way I found to do this involves querying the rpt_tracking table by enrollments, rather than just by section. This way, any hits made by a student in a particular child section will be accumulated towards the usage of that child, rather than the parent.
The Basic Idea
So the basic concept I used is this:
- Query the database for a list of sections in a given term
- For each section in the list:
- Get a list of enrollments for the current section
- For each enrollment:
- Get all tracking information from rpt_tracking for the current section and current enrollment
- Step through each record, adding to various accumulators (I used Perl hashes, and hashes of hashes, rather extensively). For example, with a given record that involves a message-read action on the mail tool, I would accumulate a hit on the section, a hit by the person, a hit on the mail tool, a hit on the message-read action of the mail tool, etc.
- Go on to next enrollment
- Go on to next section
- When done with all sections, write output files based on all the data accumulated
SQL Used
Here are the main sql statements I used to gather the information:
To get all sections in a given term:
select tm.assigned_lcid, lc.source_id, lc.name, lc.description, lc.delivery_unit_type from webct.lc_term_mapping tm, webct.lc_term t, webct.learning_context lc where tm.lc_term_id = t.id and tm.assigned_lcid = lc.id and t.source_id = ?;
Note: you’ll notice that I didn’t use the rpt_learning_context view for this query; it turns out that the rpt_learning_context view doesn’t include what term a particular section is in.
Finding the parent section for a crosslisted child:
select master_lcid from webct.xlist_lc_mapping where child_lcid = ?
Finding the division (college, in our case) for a particular section:
select lc.name from webct.learning_context_index lci, webct.learning_context lc where lc.id = lci.left_lc_id and lci.hierarchy_level = 4 and right_lc_id = ?";
Finding the group (department, in our case) for a particular section:
select lc.name from webct.learning_context_index lci, webct.learning_context lc where lc.id = lci.left_lc_id and lci.hierarchy_level = 3 and right_lc_id = ?";
Getting a list of enrolled students for a given section:
select distinct p.webct_id from webct.member m, webct.role r, webct.role_definition rd, webct.person p where m.id = r.member_id and r.role_definition_id = rd.id and m.person_id = p.id and m.learning_context_id = ? and m.delete_status = 0 and rd.name = 'SSTU' and p.webct_id != ?";
Note: The demo student (used for Student View by instructors) is always named “webct_demo_” plus the learning context id of the section. This is the webct_id that is excluded by the last part of the where clause above.
Getting all activity for a given user in a given section:
select t.tracking_id, t.session_id, t.event_time_mil, p.webct_id, t.tool_name, t.action_name, t.page_name, t.dwell_time from webct.rpt_tracking t, webct.person p where t.person_id = p.id and t.learning_context_id = ? and t.event_time_mil >= ? and t.event_time_mil <= ? and p.webct_id = ?;
Note: event_time_mil is stored as milliseconds since Jan 1, 1970 UTC. If you divide by 1000 (to convert to seconds) and then pass this value to the Perl localtime function, it will return the actual date/time in your time zone.
Output Files
The files I output for each term include:
- summary_stats.txt – a text file with high level summaries of counts of faculty, students, and tools used, overall and by college and department.
- summary_stats.csv – a CSV file with essentially the same information as the text file, but in a format that allows easier manipulation (filtering, chart making, etc).
- detailed_stats.csv – a CSV file that contains information about each section, including primary instructors, number of enrolled students, crosslist status, total hits, and hits on each individual tool.
- student_list.csv – a list of unique students
- faculty_list.csv – a list of unique faculty
- pageview_stats.csv – a list of all content pages (learning module pages) and the number of hits on them (This was a special request from a faculty member who essentially wrote a text book for the class as a learning module. They wanted to know which pages got the most and least attention from the students).
- section_activity_stats.csv – a summary count of hits and sessions per section, including the average per week and per day of the term.
- person_activity_stats.csv – same as above, but by person instead of by section.
Pivot Tables
I demonstrated a pivot table built from the data in rpt_tracking as an example of what could be done with a data warehouse and a business intelligence package that can deal with multi-dimensional cubes. The pivot table is built by essentially dumping every record for the term into a flat file and then importing it to a database and using Excel to make the pivot table. Unfortunately, Microsoft Access has a 2GB file size limit, so I could only fit one week’s worth of date into it for the demonstration, but pivot tables are a powerful tool to “slice and dice” lots of data to look for interesting details. For instance, I was able to show which colleges on our campus used the Turnitin powerlink, and then drill down to show which departments in that college used it, then drill down again to see which sections. This kind of analysis is invaluable. Our plan is to move the tracking data into our data warehouse to enable this type of analysis across all sections and all terms, so perhaps next year’s BBWorld will include a presentation on how that went.
from webct.lc_term_mapping tm, webct.lc_term t, webct.learning_context lc
where tm.lc_term_id = t.id and tm.assigned_lcid = lc.id and t.source_id = ?;
