Monthly Archives: February 2012

Releasing the collection on GitHub

Late last week we released the Cooper-Hewitt’s collection metadata as a downloadable file. And in a first for the Smithsonian, we dedicated the release to the public domain, using Creative Commons Zero.

I’m often asked why releasing collection metadata is important. My teams did similar things at the Powerhouse Museum when I was there, and I still believe that this is the direction museums and other collecting institutions need to go. With the growing field of Digital Humanities, there is increasing value in scholars being able to ‘see’ a collection at a macro, zoomed-out level, something that just isn’t possible with search interfaces. Likewise, releasing such data under liberal licenses, or to the public domain, brings closer a future in which cross-institutional discovery is the norm.

Philosophically, too, the public release of collection metadata asserts clearly that such metadata is the raw material on which interpretation through exhibitions, catalogues, public programmes, and experiences is built. On its own, unrefined, it is of minimal ‘value’ except as a tool for discovery. It also reminds us that collection metadata is not the collection itself.

Of course it is more complex than that.

There are plenty of reasons why museums are hesitant to release their metadata.

Collection metadata is often of low quality. Sometimes it is purposely unrefined, especially in art museums, where historical circumstance and scholarly norms have meant that so-called ‘tombstone data’ has been kept to a bare minimum so as not to ‘bring opinion’ to objects. At other times it has been kept minimal simply because of a lack of staff resources. Often, too, internal workflows still keep exhibition label and catalogue publishing separate from collection documentation, which means that obvious improvements, such as carrying ‘label copy’ and catalogue narrative over into object records, do not happen automatically.

But I digress.

We released our metadata through GitHub, and that needs some additional explanation.

GitHub is a source code repository of the kind traditionally used by coders. Lacking a robust public endpoint of our own that could track changes and produce diff files as we uploaded new versions of the collection data, we found GitHub the ideal candidate. Not only that, the type of ‘earlyvangelists’ we are targeting with the data release hang out there in large numbers.
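
GitHub gives us line-by-line diffs of the data files for free. For anyone who would rather see what changed record by record between two releases, here is a minimal Python sketch. It assumes each release is a CSV export with an identifier column called "id"; that column name and the tiny sample exports are illustrative assumptions, not a description of the actual files in the repository.

```python
import csv
import io


def load_snapshot(csv_text, key="id"):
    """Index one CSV export by its identifier column (assumed to be called "id")."""
    return {row[key]: row for row in csv.DictReader(io.StringIO(csv_text))}


def diff_snapshots(old_csv, new_csv, key="id"):
    """Compare two exports record by record.

    Returns sorted lists of identifiers that were added, removed, or whose
    fields changed between the old and the new release.
    """
    old = load_snapshot(old_csv, key)
    new = load_snapshot(new_csv, key)
    added = sorted(new.keys() - old.keys())
    removed = sorted(old.keys() - new.keys())
    changed = sorted(k for k in old.keys() & new.keys() if old[k] != new[k])
    return added, removed, changed


if __name__ == "__main__":
    # Tiny made-up exports standing in for two releases of the dataset.
    v1 = "id,title\n1,Chair\n2,Vase\n"
    v2 = "id,title\n1,Armchair\n3,Textile\n"
    print(diff_snapshots(v1, v2))  # → (['3'], ['2'], ['1'])
```

Working at the level of records rather than lines means a reordered export doesn't show up as a wall of spurious changes.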

The idea of using GitHub to host collection datasets had actually been bouncing around since April 2009. Aaron Straup-Cope and I were hanging out in between sessions at Museums and the Web in Indianapolis, talking about Solr, collection data, and APIs. Aaron suggested that GitHub would be the perfect place for museums to dump their collections, as giant text blobs, and certainly better than putting them on their own sites. Then 2010 happened, and the early-mover museums all quickly built APIs for their collections. Making a text dump dropped off the agenda, but the idea of using GitHub still played on my mind.

Now, Cooper-Hewitt is not yet in a position, infrastructurally, to develop an API for its collection. So when the time came to release the dataset, that conversation from 2009 suddenly became a reality.

And, fittingly, Aaron has been the first to fork the collection, creating an individual JSON file for each object record.
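
Aaron's fork is his own work; purely to illustrate the general idea of turning one big dump into per-object files, here is a minimal Python sketch. It assumes a CSV export with a hypothetical "id" column and writes one JSON file per row, named after a filesystem-safe version of that identifier.

```python
import csv
import io
import json
import re
import tempfile
from pathlib import Path


def split_to_json(csv_text, out_dir, key="id"):
    """Write one pretty-printed JSON file per record and return the paths.

    Assumes a CSV export with a hypothetical identifier column "id"; any
    character that isn't safe in a file name is replaced with "_".
    """
    out = Path(out_dir)
    out.mkdir(parents=True, exist_ok=True)
    written = []
    for row in csv.DictReader(io.StringIO(csv_text)):
        safe_id = re.sub(r"[^A-Za-z0-9._-]", "_", row[key])
        path = out / f"{safe_id}.json"
        path.write_text(json.dumps(row, indent=2, sort_keys=True), encoding="utf-8")
        written.append(path)
    return written


if __name__ == "__main__":
    # Made-up sample rows, written to a throwaway directory.
    sample = "id,title\n1,Side chair\n2,Wallpaper sample\n"
    with tempfile.TemporaryDirectory() as tmp:
        for p in split_to_json(sample, tmp):
            print(p.name)  # 1.json, then 2.json
```

One file per object also plays nicely with GitHub: an edit to a single record shows up as a change to a single small file.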

Could GitHub become not just a source code repository but a repository for ‘cultural source code’?

(But read the data info first!)

Deploying 129 iPads to NYC Schools

One of the exciting projects we have underway is deploying 129 iPads to teachers in New York City public schools. As you can imagine, this is a somewhat challenging project and not just from a technical perspective.

This is happening as part of a US Department of Education funded project called Arts Achieve. Here’s Katie Shelly, who is handling the tech for the project.

What is i3 and Arts Achieve?

i3, which stands for Investing in Innovation, is a new grant program of the U.S. Department of Education. Established in 2009 as part of the American Recovery and Reinvestment Act, the i3 Fund provides competitive grants to cultural and educational organizations to expand innovative practices that will improve education by advancing student achievement and student growth, closing achievement gaps, decreasing dropout rates, and increasing graduation, college enrollment and/or college completion rates.

In 2010, Studio in a School was awarded an i3 grant in conjunction with five partner institutions, including Cooper-Hewitt. The winning entry, called Arts Achieve, proposed an ambitious pilot program in New York City public schools to improve student achievement in the arts by building high-quality, digitally replicable arts assessments, along with a corresponding digital community and resource kit. Each partner organization brings different expertise to the pilot. Carnegie Hall, The 92nd Street Y, ArtsConnection and Studio in a School bring music, dance, theatre and visual art expertise, respectively. Cooper-Hewitt brings expertise in the innovative use of technology for education and in design thinking, and The NYC Department of Education brings expertise in curricula and state and national standards and, very importantly, the ability for us to connect with public schools. We’re currently in year 2 of 5, which means the Arts Achieve program is up and running, with 43 participating schools, hundreds of education professionals and thousands of students. Our online community, powered by Ning, is abuzz with about 200 members who use the space to share lesson plans, media, resources, and feedback, bringing hundreds of individuals with different areas of niche expertise together in a single network focused on developing and implementing high-quality assessments for the arts.

How did you come to select iPads as the most appropriate technology? What can they do that others cannot?

We considered laptops, PC tablets, linking up people’s phones... we even considered commissioning an elaborate arts assessment booth or kiosk (we jokingly call this wild idea “the laser box”).

The iPod touch and smartphones were cheap, but their screens were too small for good collaboration.

Laptop users’ “don’t bug me” body language.

Laptops are highly capable but too individual-oriented to cultivate the feedback-rich classroom environment we wanted. When people are working on a laptop, it sends out a message that says: “I’m working, don’t bug me!” We wanted technology that would set the stage for collaboration, teamwork and, most importantly, many layers of feedback: from student to student, teacher to student, outside the school to inside, and so on. A constant flow of feedback is a big philosophical pillar of the project.

The kiosk idea, though tempting because we could customize it endlessly for our project, was too pricey to be nationally replicable. A key goal of our pilot is to create something that can be replicated elegantly and affordably in any public school classroom in the United States. For that reason, we decided to adapt existing technologies rather than develop something proprietary.

The only thing we rely on schools to provide is an Internet connection. Everything else you need to participate is included in our package: three customized iPads, a wireless access point with a 14 ft ethernet cable, a speaker/pico projector combo dock (chosen for its wonderful lack of hookup cables), some styli, an iRig mic, and durable iPad cases.

iPad users’ ‘hey, let’s explore together!’ body language

We liked iPads because we know that many teachers and students are already familiar and comfortable with the interface. The touchscreen was a big thing for us, because the flat, “swipey” interface fosters body language that says “come play with me, let’s explore!” We could imagine a bunch of kids gathered around a video or a group game, working together. And we had seen compelling reports of classrooms around the country using iPads to do just that, which confirmed our hunch that this was the best way to go.

What Apps did you select? What criteria did you use?

Our custom iPad image has 90 apps pre-loaded for the classroom. We asked our partner institutions in the different arts disciplines to suggest high-quality apps for their disciplines.

We looked for apps that we could picture a group of kids using together. There’s a beautiful one called Visible Body that lets you zoom and spin around the entire human anatomy: bones, muscles, ligaments. I could see that engaging a group of students trying to draw from life, or learning to understand their bodies through dance. I also like one called Educreations, which lets you draw on the screen while speaking or explaining. You could imagine theater students using it to plan out blocking while speaking cues, and perhaps using the recorded video to explain their vision to others. A drawing teacher could use it to make a demo video to open a lesson on 3-D shading techniques, or to support students who need extra help.

We included several video and photo editing apps to help teachers record and share what’s happening in their classroom in the Arts Achieve online community too.

If an app required elaborate setup or login, we nixed it. We operated on the assumption that since teachers are always extremely pressed for time and juggling many demands, there’s no time for anything that takes more than a few taps to get up and running.

What were the challenges in setting the iPads up for use across so many different schools?

A huge issue for teachers trying to harness the educational power of technology is simply getting online. Find the right wi-fi network, track down the IT guy to get the password, enter a 15-character password… enter a proxy setting, possibly another password in the browser… and repeat that process for multiple devices… By now your 40-minute lesson is half over.

A huge win for us was the ability to pre-configure these iPads for instant web access. All the teacher has to do is plug an AirPort Express brick into the outlet and connect it to the nearest ethernet jack. The iPads are pre-programmed to look for the AirPort and pre-configured with the DOE proxy settings, so once the brick is plugged in and the iPads are turned on, they’re online. A little configuration legwork from us will save these teachers hours of accumulated time.
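
One common way to pre-load Wi-Fi and proxy settings like these is an Apple configuration profile, a .mobileconfig property list. As a minimal sketch, and not necessarily how our iPads were set up, the Python below builds one with the standard library's plistlib. The SSID, password, proxy host and port, organization name and payload identifiers are all placeholders, and the payload keys should be checked against Apple's Configuration Profile Reference before use.

```python
import plistlib
import uuid


def wifi_proxy_profile(ssid, password, proxy_host, proxy_port, org="Example School Program"):
    """Build a configuration profile dict with one WPA Wi-Fi payload that
    uses a manual proxy. Every argument here is a placeholder."""
    wifi = {
        "PayloadType": "com.apple.wifi.managed",
        "PayloadVersion": 1,
        "PayloadIdentifier": "org.example.classroom.wifi",  # hypothetical reverse-DNS ID
        "PayloadUUID": str(uuid.uuid4()).upper(),
        "PayloadDisplayName": f"{ssid} Wi-Fi",
        "SSID_STR": ssid,
        "HIDDEN_NETWORK": False,
        "AutoJoin": True,
        "EncryptionType": "WPA",
        "Password": password,
        "ProxyType": "Manual",  # send web traffic through the school proxy
        "ProxyServer": proxy_host,
        "ProxyServerPort": proxy_port,
    }
    return {
        "PayloadType": "Configuration",
        "PayloadVersion": 1,
        "PayloadIdentifier": "org.example.classroom",  # hypothetical
        "PayloadUUID": str(uuid.uuid4()).upper(),
        "PayloadDisplayName": "Classroom network",
        "PayloadOrganization": org,
        "PayloadContent": [wifi],
    }


def write_profile(profile, path):
    """Serialize the profile as an XML property list (a .mobileconfig file)."""
    with open(path, "wb") as fh:
        plistlib.dump(profile, fh)
```

Something like write_profile(wifi_proxy_profile("Classroom", "changeme", "proxy.example.org", 8080), "classroom.mobileconfig") produces a file that can be installed on each device or distributed through a device management tool, so the settings only have to be worked out once.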

Each iPad has the same disk image, custom configured and optimized for the project. They’re pre-loaded with networking settings, relevant bookmarks for the Ning network, wallpaper with our logo and even keyboard shortcuts that reflect the Arts Achieve vocabulary. The iPads are centrally tethered and controlled using AirWatch, so we can see when and how they’re being used and where they are, push out new apps as we learn about them, and block whatever distracting new game is out there. We can also troubleshoot problems remotely, which is huge because the test schools are spread all around the city.

How did you balance the locked down needs of schools with the needs of the Apps?

A 3G connection was not an option because we needed to keep everyone inside the school firewall. Staying inside that firewall also satisfies a lot of the schools’ online safety requirements.

Transferring photos easily and wirelessly with Photosync App

We also can’t encourage using e-mail to send and receive media, because students currently aren’t allowed to use it. To get around this, we found two brilliant apps that let an iPad send and receive data wirelessly with any computer: Photosync and MP3 player. This avoids the annoying issue of having to designate a computer and an active iTunes account for each iPad to sync with. All the teacher wants to do is get their video or photo off the iPad and onto a computer so they can work on it later or post it online. These apps let them do that in the simplest way possible.

What extra features did you wish the iPad/iOS had to help with these sorts of rollouts?

The iPad is designed for an individual to use and sync with their personal computer. It would be nice if there were a “group mode” or something similar for iOS that made it easier to deploy multiple iPads to a user group who don’t have syncing computers. In our dream setup, Arts Achieve central would have a Master iPad, and any changes we made to the master unit would automatically push out to the classroom iPads, without the teachers having to log in to iTunes, memorize any passwords, punch in credit card info, or face any other time-killing, lesson-derailing obstacles. “Group mode” would be good for schools, for a company issuing iPads to employees, or for a parent who wants to manage their kids’ devices. iCloud is close, in that it eliminates some of the headache of plugging in and physically syncing, but again, that service is designed for an individual consumer managing a personal media library… it wouldn’t work that well for a project like Arts Achieve, which demands replicability and uniformity from one classroom to the next.

Upending ticketing

One of the opportunities we have right now is to challenge the conventional wisdom that back-of-house systems always need to be ‘enterprise grade’. As we are currently in renovation mode, with our exhibitions and programs happening offsite and around the city, we have the chance to rethink and experiment with different systems for common functions such as ticketing. In doing so, we are looking at the way different systems shape visitor/staff interactions, and we can choose systems based on their user experience rather than their ‘backwards compatibility’.

A recent change we’ve made is to use EventBrite for ticketing. It replaced a system that, despite being tightly integrated with our donor management system, placed an inscrutable purchasing interface between customers and the tickets they wanted. It isn’t a permanent solution (what is these days?), but rather the opening of a ‘possibility space’.

So how is it going?

Our ticket-selling velocity has increased (events sell more quickly), and we’ve been able to integrate ticket selling directly into our email marketing as well. When ticket price points have reached capacity, we’ve used automatic waitlisting. We’ve even been able to collect donations as purchasers buy tickets, and to issue refunds easily when required. Most importantly, the customer experience of purchasing tickets has vastly improved.

Last night, we had our first trial of check-in at a medium-sized event. Using the EventBrite iPhone Check-In App, we were able to run a cashless door, using staff members’ iPhones to check everyone in quickly. Check-ins were done by scanning tickets or, where people had forgotten their printed ticket, by name. Each iPhone synced to the master list, which meant we could easily ‘add extra ticket staff’ to process more people if we had a logjam. A nice side effect was that it freed up staff to direct visitors to our roving iPads for quick signup to our mailing list on their way into the venue.

But the purpose of deploying lightweight technologies as a replacement for gargantuan enterprise systems is not just about improving visitor experience, or streamlining back-of-house operations – it is also about positioning us to reconceptualise the type of entry/ticketing experience we might want for our new building and galleries when they are completed.

If it is possible to run the entry experience for events seamlessly with only mobile devices, can a museum jettison its ticket counter in a redesign? It also forces us to be specific about the other functions ticket counters might serve.

Media servers and some open sourceness

We use Amazon S3 for a good portion of our media hosting. It’s a simple and cost-effective solution for serving up assets big and small. When we initially moved to Drupal 6.x (about a year ago), I wanted to be sure that we would use S3 for as many of our assets as possible. This was partly to keep the Drupal codebase nice and clean, and partly to allow us to scale horizontally if needed (multiple app servers behind a load balancer).

Horizontal Scaling

So, in an attempt to streamline workflows, we modified the amazon_s3 Drupal module a little. The idea was to let authors use the Drupal node editor to upload their images and PDFs directly to our S3 bucket. The module also rewrites the URLs so that the content is pulled from our CloudFront CDN, and it sorts your images into folders based on the date (à la WordPress).


Our fork of amazon_s3 rewrites URLs for our CDN and sorts files into folders by date.
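
To make the folder sorting and URL rewriting concrete, here is a small Python sketch of the same idea. It is not our fork itself, which is PHP since it's a Drupal module. The "uploads" prefix, the year/month layout and the cdn.example.org hostname are assumptions for illustration.

```python
from datetime import date
from pathlib import PurePosixPath
from urllib.parse import quote

CDN_BASE = "https://cdn.example.org"  # hypothetical CloudFront CNAME


def dated_key(filename, when=None, prefix="uploads"):
    """Build an S3 object key in WordPress-style year/month folders,
    e.g. uploads/2012/02/chair.jpg. The "uploads" prefix is an assumption."""
    when = when or date.today()
    name = PurePosixPath(filename).name  # keep only the final path component
    return f"{prefix}/{when:%Y}/{when:%m}/{name}"


def cdn_url(key, base=CDN_BASE):
    """Return the public CDN URL for an S3 key, percent-encoding unsafe characters."""
    return f"{base.rstrip('/')}/{quote(key)}"
```

Because the key is decided at upload time, the same key serves both as the S3 object name and as the path on the CDN, so no separate lookup table is needed.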

I’ve now open sourced that code, which is simply a fork of the amazon_s3 module. It works pretty well on Drupal 6.x, but it has an issue where it uploads assets with some incorrect metadata. This is really only a problem for uploaded PDFs, which will download but won’t open in your browser. It comes down to the content type stored with the object in S3: application/octet-stream instead of application/pdf. All in all, I think it’s a pretty useful module.
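
The general fix for the PDF problem is to set the Content-Type explicitly when the object is uploaded. Here is a minimal Python sketch of that, assuming a boto3 S3 client is passed in; it is not a patch to the Drupal module, and the bucket and key names in the usage line are hypothetical.

```python
import mimetypes


def content_type_for(path):
    """Guess a MIME type from the file extension, falling back to the generic
    binary type only when the extension is unknown."""
    ctype, _encoding = mimetypes.guess_type(path)
    return ctype or "application/octet-stream"


def upload_asset(s3_client, path, bucket, key):
    """Upload a local file to S3 with an explicit Content-Type.

    s3_client is assumed to be a boto3 S3 client, e.g. boto3.client("s3").
    """
    s3_client.upload_file(path, bucket, key, ExtraArgs={"ContentType": content_type_for(path)})
```

For example, upload_asset(boto3.client("s3"), "annual-report.pdf", "our-media-bucket", "uploads/2012/02/annual-report.pdf") would store the PDF as application/pdf, so browsers should be able to open it inline rather than only download it.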

As we move toward migrating to Drupal 7, I have been doing some more research about serving assets via S3 and CloudFront. It also seems the Drupal community has developed some new modules that should help streamline a few things.

Custom Origin

Create a CloudFront distribution for your whole site using a custom origin

A couple of years ago, Amazon’s CloudFront CDN started allowing you to use a custom origin. This is really great, as you can simply tell it to pull from your own domain rather than from an S3 bucket.

So, for example, I set this blog up with a CloudFront distribution that pulls directly from the blog’s own domain; if you go to the distribution’s CloudFront URL, you should see a mirror of this site. Then all we have to do is install a WordPress plugin to replace static asset URLs with the CloudFront URL. You might notice this in action if you inspect the URL of any image on the site. You can, of course, add a CNAME to make the CloudFront URL prettier, but it isn’t required.
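
The core of what such a plugin does is rewrite the URLs of static files in each page so they point at the CloudFront hostname instead of the origin. Here is a rough Python sketch of that rewrite. It is not how W3TC or the Drupal CDN module is actually implemented: it only handles absolute URLs in src/href attributes, and real plugins cover many more cases (relative URLs, CSS url() references, and so on). The hostnames are left as parameters.

```python
import re

# Sketch only: just these file types are repointed; page links keep going to the origin.
STATIC_EXTENSIONS = ("jpg", "jpeg", "png", "gif", "css", "js")


def rewrite_static_urls(html, origin, cdn):
    """Repoint absolute src/href URLs for static files from `origin` to `cdn`."""
    pattern = re.compile(
        r"""(?P<prefix>\b(?:src|href)\s*=\s*["'])"""  # attribute name and opening quote
        + re.escape(origin.rstrip("/"))  # the site's own scheme and hostname
        + r"""(?P<path>/[^"'?#\s]*\.(?:"""  # path up to the file extension
        + "|".join(STATIC_EXTENSIONS)
        + r"""))(?=[?#"'])""",  # extension must end the path (query strings are kept)
        re.IGNORECASE,
    )
    return pattern.sub(lambda m: m.group("prefix") + cdn.rstrip("/") + m.group("path"), html)
```

With hypothetical hostnames, rewrite_static_urls(html, "https://blog.example.org", "https://d1234abcd.cloudfront.net") leaves links to pages alone and only repoints image, CSS and script URLs, which CloudFront then fetches from the origin on the first request.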

On the Drupal end of things, there is a simple module called CDN that does the same thing we are doing here with the WordPress W3TC plugin: it replaces static asset URLs with your CloudFront domain. Additionally, I see there is now a new Drupal module called amazons3 (note the lack of the underscore). This module is designed to let Drupal replace its default file system with your S3 bucket. So, when a user uploads files through the Drupal admin interface (which normally sends files to sites/default/files on your local server), the files automatically wind up in your S3 bucket.

I haven’t got this working yet, but I think it’s a promising approach. With this setup, you could maintain a clean and scalable Drupal codebase, keeping all of your user-uploaded assets in an S3 bucket without much change to the standard workflow in the Drupal backend. NICE!


Moving to the Fog

When people have asked me where we host our website, I have usually replied with “it’s complicated.”

Last week we made some serious changes to our web infrastructure. Up until now we have been running most of our web properties on servers we have managed ourselves at Rackspace. These have included dedicated physical servers as well as a few cloud based instances. We also have a couple of instances running on Amazon EC2, as well as a few properties running at the Smithsonian Mothership in Washington DC.

For a long time, I had been looking for a more seamless, easier-to-manage solution. I partially achieved this when I moved the main site from our old dedicated server to a cloud-based set of instances behind a Rackspace load balancer. It seemed to perform pretty well, but I was still mostly responsible for it on my own.


PHPFog can be used to easily scale your web-app by adding multiple app servers

Eventually I discovered a service built on top of Amazon EC2 known as PHPFog. This Platform as a Service (PaaS) is designed to let people like me easily develop and deploy PHP-based web apps in the cloud. Essentially, PHPFog sets up an EC2 instance, configured and optimized to their own design. This is placed behind their own set of load balancers, Varnish cache servers and other goodies, and connected to an Amazon RDS MySQL server. They also give you a hosted Git repository, and in fact Git becomes your only connection to the file system. At first this seemed very unorthodox. No SSH, no FTP, nothing… just Git, plus phpMyAdmin to deal with the database. However, I spent a good deal of time experimenting with PHPFog, and after a while I found the workflow really simple and easy to manage. Deployment is as easy as doing a git push, and the whole thing worked in a similar fashion to the popular Ruby on Rails PaaS.

What’s more, PHPFog, being built on EC2, was fairly extensible. If I wanted to, I could easily add an ElastiCache server, or my own dedicated RDS server. Basically, by setting up security groups that allow communication with PHPFog’s instances, I am able to connect to just about anything Amazon AWS has to offer.
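
As an illustration of the security-group step, here is a minimal Python sketch that opens a single TCP port on a resource's security group (for example 3306 for MySQL on RDS, or 11211 for memcached on ElastiCache) to members of another group. It assumes a modern boto3 EC2 client and placeholder group IDs; PHPFog's actual setup in 2012 may well have worked differently.

```python
MYSQL_PORT = 3306  # default MySQL port, e.g. an RDS MySQL instance
MEMCACHED_PORT = 11211  # default memcached port, e.g. an ElastiCache node


def build_ingress_rule(source_group_id, port):
    """Describe one inbound TCP rule: allow `port` from members of `source_group_id`."""
    return [{
        "IpProtocol": "tcp",
        "FromPort": port,
        "ToPort": port,
        "UserIdGroupPairs": [{"GroupId": source_group_id}],
    }]


def allow_app_servers(ec2_client, resource_group_id, app_group_id, port):
    """Open `port` on the resource's security group to the app servers' group.

    ec2_client is assumed to be a boto3 EC2 client, e.g. boto3.client("ec2");
    both group IDs are placeholders.
    """
    ec2_client.authorize_security_group_ingress(
        GroupId=resource_group_id,
        IpPermissions=build_ingress_rule(app_group_id, port),
    )
```

Referencing the app servers' group, rather than their IP addresses, means the rule keeps working as app servers are added or replaced behind the load balancer.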

I continued to experiment with PHPFog and found some additional highlights. Each paid account comes with a free NewRelic monitoring account. NewRelic is really great as it offers a much more comprehensive monitoring system than many of the typical server alerting and monitoring apps available today. You can really get a nice picture of where the different bottlenecks are happening on your app, and what the real “end user” experience is like. In short, NewRelic was the icing on the cake.


Our NewRelic Dashboard

So, last week, we made the switch and are now running our main site on “The Fog.” We have also been running this blog on the same instance. In fact, if you are really interested, you can check out our NewRelic stats for the last three hours in the “Performance” menu tab! It took a little tweaking to get our NewRelic alerts properly configured, but they seem to be working pretty seamlessly now.

Here’s a nice video explaining how AppFog/PHPFog works.

As you can see, we’ve got a nice little stack running here and all easily managed with minimal staff resource.

And here’s a somewhat different Fog altogether.

(Yes, we are a little John Carpenter-obsessed here.)