A Semantic Analysis of Participation at OSCon 2008

Schuyler Erle
23 July 2008

The Summary

O'Reilly Media's 2008 Open Source Convention presents a diverse multitude of talks on a variety of subjects, attracting participants from all across the Open Source community, and the IT industry at large. The conference website offers participants the opportunity to compose a 'personal calendar' of the event, and the aggregate information provided by these personal calendars paints, in broad strokes, a characterization of the attendeeship of the conference.

The Method

Edd Dumbill and Mark Levitt of O'Reilly were kind enough to provide a table summarizing the scheduled presentations and events at the conference, along with a sample of the 'personal calendars' created by attendees. All of the 'personal calendar' data was anonymized beforehand to guarantee the privacy of individual participants.

Armed with the data, I composed a matrix of the 298 scheduled talks and events against the 374 participants represented in the sample. This 298 x 374 matrix is unwieldy, and impractical to analyze directly; further, its sparse population of elements strongly suggested a great deal of redundancy in this particular representation.
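As a rough illustration of the matrix described above, here is a minimal sketch, assuming each anonymized calendar arrives as a set of talk identifiers. The function name, the talk ids, and the toy calendars are all invented for the example; the real data had 298 talks and 374 calendars.

```python
import numpy as np

def attendance_matrix(calendars, talk_ids):
    """Return a talks x attendees 0/1 matrix.

    calendars: one iterable of talk ids per (anonymous) attendee.
    talk_ids:  every scheduled talk id, in the row order to use.
    """
    row_of = {talk: i for i, talk in enumerate(talk_ids)}
    matrix = np.zeros((len(talk_ids), len(calendars)), dtype=np.int8)
    for col, picks in enumerate(calendars):
        for talk in picks:
            matrix[row_of[talk], col] = 1  # attendee `col` picked this talk
    return matrix

# Toy data, invented for illustration only.
talks = ["perl6", "memcached", "erlang", "keynote"]
calendars = [{"perl6", "keynote"}, {"memcached"}, {"erlang", "keynote"}]
m = attendance_matrix(calendars, talks)

# Fraction of non-zero cells; a low value is what "sparse" means here.
density = m.sum() / m.size
```

Each column is one attendee and each row one talk, matching the talks-by-participants layout described above.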

The Madness

With the scipy package in Python, I reduced the matrix from 298 dimensions to 7, by performing a singular value decomposition of the matrix, discarding the dimensions with the smallest singular values, and then normalizing each attendee's column of the resulting 7 x 374 matrix to the unit sphere. The result is, in essence, a set of 7-dimensional vectors representing the notional "location" of each attendee's interests in an abstract "semantic space" of the conference. I then performed a k-means clustering on the vector set, grouping participants' interests in the "semantic space" around particular topics.
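The steps in this paragraph can be sketched roughly as follows. This is not the original script: the function name attendee_vectors, the random toy matrix, and the choice of numpy.linalg.svd and scipy.cluster.vq.kmeans2 are assumptions made for illustration. Only the overall recipe (SVD, keep 7 dimensions, normalize, k-means) comes from the text.

```python
import numpy as np
from scipy.cluster.vq import kmeans2

def attendee_vectors(matrix, dims=7):
    """Project a talks x attendees matrix to `dims` dimensions per attendee."""
    # Thin SVD: matrix = u @ diag(s) @ vt, singular values sorted descending.
    u, s, vt = np.linalg.svd(matrix.astype(float), full_matrices=False)
    # Keep the top `dims` singular directions; rows of the result are attendees.
    coords = (s[:dims, None] * vt[:dims]).T
    # Scale every attendee vector to unit length (the "unit sphere").
    norms = np.linalg.norm(coords, axis=1, keepdims=True)
    norms[norms == 0] = 1.0  # avoid dividing by zero for empty calendars
    return coords / norms

# Toy stand-in for the real 298 x 374 matrix: 40 talks, 60 attendees.
rng = np.random.default_rng(0)
toy = (rng.random((40, 60)) < 0.15).astype(int)
toy[rng.integers(0, 40, size=60), np.arange(60)] = 1  # everyone picks >= 1 talk

vectors = attendee_vectors(toy, dims=7)

# k-means with k=7, seeded from randomly chosen data points.
centroids, labels = kmeans2(vectors, 7, minit="points")
```

Because the seeding is random, the labels will differ from run to run, and on data this small a cluster may occasionally come out empty, in which case scipy emits a warning.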

By examining the most popular talks in each cluster, we can discern seven broad groups of participants in OSCon 2008:

  1. Trend spotters (23% of the sampled attendees)

    These are the folks that attend every single keynote, Q&A session, et cetera. They're at OSCon to find out what the buzz is. They might be bloggers or C*Os.

  2. Web 2.0 Enthusiasts (16%)

    A lot of the Ruby and Python folks are in here, too. They're the ones who really care about development methodologies this year, it seems.

  3. Perl Mongers (16%)

    The perennial backbone of the OSCon community. They also tend to have the weirdest talk titles.

  4. PHP Hackers (14%)

    The other perennial backbone of the OSCon community, creatures of tremendous patience and persistence. God bless them for it.

  5. Scalability Gurus (13%)

    This interest cluster is new this year, I think. XMPP, Hadoop, EC2, Erlang. Mmm, Erlang.

  6. UNIX Wizards (10%)

    These are the folks who keep your servers running. They also seem to be curious about Python this year.

  7. The "Open" Crowd (6%)

    This is a smaller cluster, and sort of a grab bag, consisting of the Open Mobile and Open Hardware enthusiasts, and the folks who take a particular interest in Open Source community processes and war stories.

At the bottom of this page, I've listed the leading talks in each cluster. The percentage given for each talk is the portion of the participants in that cluster who opted to see that particular talk or event.
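For concreteness, the per-talk percentages could be computed along these lines. It is a sketch under assumptions: the helper name leading_talks, the tiny matrix, and the placeholder titles are invented, and it presumes a talks x attendees 0/1 matrix plus one cluster label per attendee.

```python
import numpy as np

def leading_talks(matrix, labels, talk_titles, cluster, top=20):
    """Rank talks by the share of a cluster's members who picked them."""
    members = matrix[:, labels == cluster]          # talks x members of cluster
    share = members.mean(axis=1)                    # fraction of members per talk
    order = np.argsort(-share, kind="stable")[:top] # most popular first
    return [(int(round(100 * float(share[i]))), talk_titles[i]) for i in order]

# Toy data, invented for illustration: 4 talks, 4 attendees, 2 clusters.
matrix = np.array([[1, 1, 0, 0],
                   [1, 0, 0, 0],
                   [0, 0, 1, 1],
                   [1, 1, 1, 0]])
labels = np.array([0, 0, 1, 1])
titles = ["A", "B", "C", "D"]

ranking = leading_talks(matrix, labels, titles, cluster=0, top=3)

# Share of all sampled attendees that fall in cluster 0.
cluster_share = float(np.mean(labels == 0))
```

The same talk can rank highly in several clusters, which is why some titles appear in more than one list.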

It's interesting to note that there doesn't seem to be a particular cluster of tutorial participants; rather, the tutorials are spread across five of the six technical clusters. Less surprisingly, Perl and PHP each form a tight cluster of their own, whereas Python and Ruby do not.

Live and in 7-D

The video is a visualization of the attendee vectors along dimensions 3, 4, and 5, rendered in ggobi and captured with xvidcap. I had a devil of a time doing video screen capture in Linux, so the quality is terrible, but you can plainly see how the attendees form a nice tetrahedron in these three dimensions. The colors were chosen to more or less match the clusters above. The Web 2.0, scalability, and PHP clusters form three of the four corners of the tetrahedron, with the Perl Mongers and the "Open" Crowd forming the other corner. Quite appropriately, the trend spotters are squarely in the middle, with the UNIX Wizards sprinkled throughout.

You Did What, Now?

Actually, I've done this before, for FOSS4G 2007, although for that conference, the analysis centered on the structure of the schedule (warning: PowerPoint) rather than the attendees themselves.

Now, obviously, this method of analysis is not without flaws. First off, the dimension reduction of the calendar matrix may be collapsing together interests or interest groups that don't belong together. Second, the clustering was done as a discrete classification of individual attendee interests, where a fuzzy classification is probably more appropriate. Third, the clustering method used is non-deterministic, as random points in the semantic space are used to seed the cluster centroids. I tried to overcome this by iteratively repeating the clustering process, Monte Carlo style, in order to maximize the explained variance, but it makes the results hard to reproduce precisely. Finally, the number of retained dimensions and the number of clusters used were both parameters chosen by trial and error. Further inquiry would be called for before drawing any hard conclusions from this analysis.
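The repeated-restart idea can be sketched like this. The exact scoring used in the original analysis is not given, so the definition of explained variance here (one minus within-cluster sum of squares over total sum of squares), the helper names, and the toy data are all assumptions.

```python
import numpy as np
from scipy.cluster.vq import kmeans2

def explained_variance(data, centroids, labels):
    """1 - (within-cluster sum of squares / total sum of squares)."""
    total = ((data - data.mean(axis=0)) ** 2).sum()
    within = ((data - centroids[labels]) ** 2).sum()
    return 1.0 - within / total

def best_of_n(data, k, runs=50):
    """Run k-means `runs` times from random starts; keep the best-scoring fit."""
    best = None
    for _ in range(runs):
        centroids, labels = kmeans2(data, k, minit="points")
        score = explained_variance(data, centroids, labels)
        if best is None or score > best[0]:
            best = (score, centroids, labels)
    return best

# Hand-checkable case: two pairs of points, each pair 2 apart, pairs 10 apart.
pts = np.array([[0.0, 0.0], [0.0, 2.0], [10.0, 0.0], [10.0, 2.0]])
ev = explained_variance(pts, np.array([[0.0, 1.0], [10.0, 1.0]]),
                        np.array([0, 0, 1, 1]))

# Toy blobs, invented for illustration.
rng = np.random.default_rng(1)
blobs = np.vstack([rng.normal(c, 0.5, size=(30, 2))
                   for c in ([0, 0], [10, 0], [0, 10])])
score, centroids, labels = best_of_n(blobs, 3, runs=10)
```

Keeping the best of many runs reduces the effect of an unlucky start, but unless the random seed is recorded the winning run still cannot be reproduced exactly, which is the caveat raised above.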


Trend Spotters (23%)
86% Welcome
83% Open Source on the O'Reilly Radar Keynote
77% Tim O'Reilly Interviews Monty Widenius & Brian Aker Keynote
71% Open Source Physical Security: Can We Have Both Privacy and Safety? Keynote
71% Keynote Keynote
70% Why Whinging Doesn't Work Keynote
67% Advocating Software Freedom by Revealing Errors Keynote
65% Language Inflection Point Keynote
60% Q & A Keynote
58% Anniversaries Keynote
58% Open Invention Network and Its Role in Open Source and Linux Keynote
57% Three Challenges Keynote
54% Expo Hall Reception Event
50% Open Source Heroes Keynote
49% fork() && exec(): Spawning the Next Generation of Hackers Keynote
49% Closing Get Together General
48% Learning from Airports Keynote
44% Q & A Keynote
42% Q & A Keynote
37% Keynote Keynote

Web 2.0 Enthusiasts (16%)
49% Skimmable Code: Fast to Read, Safe to Change Programming
47% CSS for High Performance JavaScript UI Web Applications
40% (The Lack of) Design Patterns in Python Python
40% Even Faster Web Sites Web Applications
37% Do You Believe in the Users? People
32% Web Frameworks of the Future: Flex, GWT, Grail, and Rails Web Applications
30% Code is Easy, People are Hard: Developing Meebo's Interview Process People
28% A Critical View of OpenID Web Applications
28% Experience-driven Development: Designers and Developers Working in Harmony Web Applications
27% Open Architecture at REST Web Applications
27% Top 10 Scalability Mistakes PHP
25% The Effects of Stress on Programmers' and Groups' Performance People
25% The Google Open Source Update Emerging Topics
25% Web Graphics & Animations Without Flash (or GFX Deliciousness with Dojo) Web Applications
25% An Introduction to Ruby Web Frameworks Ruby
23% Code Reviews for Fun and Profit Programming
23% Open Source Community Antipatterns People
23% Give Your Site a Boost with memcached PHP
23% An Illustrated History of Failure Programming
23% Beyond Agile: Enabling the Next Wave of Software Development Methods Programming, Ubuntu

Perl Mongers (16%)
63% Perl 6 Update Perl
63% The Twilight Perl Perl
59% Perl 5.10 for People Who Are Not (Totally) Insane Perl, Programming
50% Barely Legal XXX Perl Perl
47% Perl Lightning Talks Perl
47% Perl and Parrot: Baseless Myths and Startling Realities Perl
45% Stick a fork() in It: Parallel and Distributed Perl Perl
44% Perl Worst Practices Perl, Tutorial
42% Moose: A Postmodern Object System for Perl 5 Perl
39% Mastering Perl Perl, Tutorial
36% Rakudo: Perl 6 on Parrot Perl
34% Log4perl: the Only Logging System You'll Ever Need Perl
32% Perl Security Perl, Tutorial
31% Catalyst: 21st Century Perl Web Development Databases, Perl, Tutorial, Web Applications
29% Load Testing Using Perl Perl
27% Programming Vim Emerging Topics, Tutorial
27% Scaling Databases with DBIx::Router Perl
27% Subversion Worst Practices Programming
24% Shell Scripting Craftsmanship Programming
24% Code Reviews for Fun and Profit Programming

PHP Hackers (14%)
70% Caching and Performance: Lessons from Facebook PHP
66% Write Beautiful Code (in PHP) PHP
57% Securing the PHP Environment with PHPSecInfo PHP
55% Integration Testing PHP Applications PHP
51% Give Your Site a Boost with memcached PHP
50% Top 10 Scalability Mistakes PHP
46% PHP Taint Tool: It Ain't a Parser PHP
38% Even Faster Web Sites Web Applications
35% CSS for High Performance JavaScript UI Web Applications
35% PHP: Architecture, Scalability, and Security PHP, Tutorial
33% Trac: Project and Process Management for Developers and Sys Admins Fundamentals
33% The Internet is an Ogre: Finding Art in the Software Architecture PHP
31% Unlocking the APC Code PHP
31% PDO: PHP Data Objects PHP
31% Skimmable Code: Fast to Read, Safe to Change Programming
31% Security 2.0: Emerging Trends in Web Application Security Web Applications
29% Hack This App! PHP Security Workshop PHP, Tutorial
27% Testing with PHPUnit and Selenium PHP, Tutorial
25% Code Reviews for Fun and Profit Programming
25% intl Me This, intl Me That PHP

Scalability Gurus (13%)
54% XMPP/Open Source Components for Cloud Services Web Applications
50% Cloud Computing with bigdata Programming
50% Beyond REST? Building Data Services with XMPP PubSub Emerging Topics
48% CouchDB from 10,000 ft Databases
46% Cloud Computing with Persistent Data: Pushing the Envelope of Amazon Web Services Programming
44% Beautiful Concurrency with Erlang Programming
42% Processing Large Data with Hadoop and EC2 Programming
42% Practical Erlang Programming Emerging Topics, Programming, Tutorial
40% Code Reviews for Fun and Profit Programming
40% Open Architecture at REST Web Applications
38% HDFS Under the Hood Programming
36% Hypertable: An Open Source, High Performance, Scalable Database Databases
36% (The Lack of) Design Patterns in Python Python
32% Machine Learning for Knowledge Extraction from Wikipedia & Other Semantically Weak Sources Emerging Topics
32% Top 10 Scalability Mistakes PHP
30% Web Frameworks of the Future: Flex, GWT, Grail, and Rails Web Applications
28% Even Faster Web Sites Web Applications
26% A Critical View of OpenID Web Applications
26% Introduction to LucidDB Databases
26% Secrets of JavaScript Libraries Programming, Tutorial, Web Applications

UNIX Wizards (10%)
44% TCP/IP Troubleshooting for System Administrators Administration, Linux, Tutorial
39% Python in 3 Hours Python, Tutorial
39% Introduction to Django Python, Tutorial
39% Open Source Virtualization for People Who Feel Guilty About Using VMware So Much Administration
39% Subversion Worst Practices Programming
34% Shell Scripting Craftsmanship Programming
28% Open Source Virtualization Hacks Administration
23% Using Puppet: Real World Configuration Management Administration
23% Trac: Project and Process Management for Developers and Sys Admins Fundamentals
23% Eat My Data: How Everybody Gets File IO Wrong Programming
21% Rebuilding Linux for the Desktop Linux
21% Programming Vim Emerging Topics, Tutorial
21% Open Source's (VoIP) Call for Change Business
21% Pro PostgreSQL Databases, Tutorial
18% MondoRescue: the GPL Disaster Recovery Solution Administration
18% Searching for Neutrinos Using Open Source at the Bottom of the World Emerging Topics
18% An Open Source Startup in Three Hours People, Programming, Tutorial, Web Applications
18% Skimmable Code: Fast to Read, Safe to Change Programming
18% Secrets of JavaScript Libraries Programming, Tutorial, Web Applications
18% Top 10 Scalability Mistakes PHP

The "Open" Crowd (6%)
45% Going Open Source: The 20 Most Important Things To Do Fundamentals
41% Keynote Kick-off Keynote, Open Mobile Exchange
37% An Open Source Project Called "Failure:" Community Antipatterns to Know and Avoid People
37% Mobile Lightning Talks Open Mobile Exchange
37% Mobile Browser Roundtable Open Mobile Exchange
37% Integration: The Mobile Web Open Mobile Exchange
33% An Open Source Startup in Three Hours People, Programming, Tutorial, Web Applications
33% Automating Open Source Governance Using Free Tools and Data Business
33% Welcome Keynote, Open Mobile Exchange
33% Platform Showdown Open Mobile Exchange
29% Open Source Software Economics, Standards, and IP in One Lesson Fundamentals
29% Trac: Project and Process Management for Developers and Sys Admins Fundamentals
29% Open Source in China Emerging Topics
29% Handwave to Hardware in Twelve Months Open Mobile Exchange
25% Open Source Community Antipatterns People
25% Legal Rules for the New Open Source Project Business, Emerging Topics
25% Moblin.org: The Community for Linux on Mobile Internet Devices (MID), netbooks, nettops and More… Linux
20% A Critical View of OpenID Web Applications
20% Running a Successful User Group Fundamentals
20% Open Source / Open World Emerging Topics