Category Archives: Backends

Understanding how the Pen interacts with the API

Detail of instructional postcard now available to museum visitors at entry to accompany The Pen.

Detail of instructional postcard now available to museum visitors at entry to accompany The Pen.

The Pen has been up and running now for five weeks and the museum as a whole has been coming to terms with exactly what that means. Some things can be planned for, others can be hedged against, but inevitably there will be surprises – pleasant and unpleasant. We can report that our expectations of usage have been far exceeded with extremely high take up rates, over 400,000 ‘acts of collection’ (saving museum objects with the Pen), and a great post-visit log in rate.

The Pen touches almost every operation of the museum – even though the museum was able to operate completely without it from our opening in December until March. At its most simple, object labels need NFC tags which in turn needs up-to-the-minute location information entered into our collection management system (TMS); the ticketing system needs a constant connection not only to its own servers but also to our API functions that create unique shortcodes for each visitor’s visit; and the Pens need regular cleaning and their monthly battery change. So everyone in the museum has been continuously improving and altering backend systems, improving workflows, and even the front-end UI on tablets that the ticket staff use to pair Pens with tickets.

Its complex.

Katie drew up (another) useful diagram of the journey of a Pen through a visit and how it interacts with our API.

Single visit 'lifecycle' of The Pen. Illustration by Katie Shelly, 2015. [click to enlarge]

Single visit ‘lifecycle’ of The Pen. Illustration by Katie Shelly, 2015. [click to enlarge]

Even more details of the overall system design and development saga can be found in the (long) Museums and the Web 2015 paper by Chan & Cope.

The digital experience at Cooper Hewitt is supported by Bloomberg Philanthropies. The Pen is the result of a collaboration between Cooper Hewitt, SistelNetworks, GE, MakeSimply, Undercurrent, and an original concept by Local Projects with Diller Scofidio + Renfro.

A new feature you may never see – ticketing follow up emails

A few weeks ago we rolled out a small update to the ticketing website that sends a reminder email to anyone who has purchased tickets in advance.

This is sort of an experiment. but first, some background.

Recently I attended an event at another cultural institution ( which I won’t name ). A few days prior to the event I received a very up-selling reminder email, reminding me of membership discounts and other events I might like. The links within the email took me to the landing page for the event, and offered little actual information that I found useful unless I wanted to buy even more tickets, or combine my purchase with a book in the shop.

To make things worse, on the night of the event, and just as it was finishing up, I received a “follow up email” , which I found really annoying. It was literally timed to send exactly as the event was ending and while I was on my way out the door, as if to say, “wait, come back in and buy the book too!”

In fact, the subject line read “How did I enjoy [insert event title here]?” But, the email itself didn’t offer me a way to answer that question ( even if I wanted to ) and instead simply pointed me to the same landing page of the event I had just attended, along with links to their social media channels and other upcoming shows I might be interested in. The whole thing made me cringe a little as I pressed the delete button on my phone.

I thought to myself, “let’s not do that.”

I really just wanted to send a gentle reminder email, full of actually useful info to people who were planning to come visit the museum. I thought it would be nice if I had booked tickets in advance to get something like this the day before I was planning to visit. Something with a map and some info on how to get here, and potentially a little synopsis of what I might do once I arrived.

So, here was my thought process.

Like I said, it’s an experiment, and so I’m still just sort of beta-testing this feature, and trying to analyze how useful/annoying people find it. We already get so many emails, so I really wanted to make sure I wasn’t bombarding our visitors with additional garbage, or even worse, confusing them with unneeded information like what I’d recently experienced.

First of all, it would be all about timing. While talking out loud in the Labs about this one, Aaron’s comment was simply “time zones.” Computer’s have time zones ( all of ours are set to UTC ), people are in time zones. It was clearly something to consider.

Right now, we can only assume that you will be here sometime during our open hours on the day you purchased the ticket. We don’t presently do timed tickets, and unlike an event space, each day’s “performance” spans the entirety of our hours.

So we decided to try out sending the reminders the day before at 4pm, our time. I guess it’s generally safe to say that visitors are nearing our time zone the day before their visit, but its really still a best guess. Also, we are not going to “remind you” if you are booking for the same day as that’s probably overkill. So for now, at 4pm, the day before your visit, is when the email goes out.

Next I had some fun coming up with a way to extract all the relevant info from our Ticketing CRM ( Tessitura ).

I needed the following info:

  • All the things going on tomorrow ( this is sort of future proofing for when we let you book other things beyond general admission )
  • All the current orders for all the things going on tomorrow.
  • All the email addresses for all the orders for all the things going on tomorrow.

Getting “tomorrow” was pretty easy in PHP.

$datetime = new DateTime(‘tomorrow’);
$tomorrow = $datetime->format(‘Y-m-d’);

4pm EST is 9pm UTC on the same day, so all good there in calculating “tomorrow.”

Our Tessitura API wrapper I mentioned in my last post has a method that lets us get all the “performances” in Tessitura for a given date range. Simply passing it “tomorrow” yields us all the things we are looking for. We also have a method that can get all the orders placed for a given performance. Finally, we have a method that gets the email for the user that placed the order.

Now we can send the actual email.

Obviously, people place multiple orders and buy multiple tickets per order. I really only want to send one email regardless of what you’ve booked. So when I am looping through all the orders, I only add the email address to the list once.

The last step was to make a cron job that runs this script once a day at 4pm. Done!

( Incidentally, all of our servers are set to UTC, but for some reason our RedHat server’s crontab doesn’t seem to care, and somehow ( possibly magically ) thinks it’s on EST. I have yet to figure out why this is, but for now I am just going with it. )

Right now, the email is a fixed template. We are sending out emails via Mandrill, so we get some decent analytics and can track open rates, and click rates. We also added Google Analytics tracking codes to all the links in the email so we can see what people are clicking right in GA.

So far we’ve experienced an open rate of about 75% and a click rate of about 20%, which seems pretty good to me.

Our open and click rate for the last 30 days.

Our open and click rate for the last 30 days.

And here is the GA results for the “Ticket Reminder” campaign from the same time period. From here you can dive deeper into the analytics to see what pages people are heading to once they are on the site, and all sorts of other metrics.

GA TicketReminder

Since you can only really “see” this feature if you book an advance ticket I’ve posted an image of what the email looks like below. We went through a few design iterations to get it to look the way it looks, and I’d really love to hear your thoughts about it. If you were visiting us, and received this email the day before your visit at 4pm, would you find it useful, annoying, or confusing?

Reminder Email Template

Our reminder email template

Of course this will change again when the Pen goes live shortly.

How re-opening the museum enhanced our online collection: new views, new API methods

At the backend of our museum’s new interactive experiences lies our API, which is responsible for providing the frontend with all the data necessary to flesh out the experience. From everyday information like an object’s title to more novel features such as tags, videos and people relationships, the API gathers and organizes everything that you see on our digital tables before it gets displayed.

In order to meet the needs of the experiences designed for us by Local Projects on our interactive tables, we added a lot of new data to the API. Some of it was sitting there and we just had to go find it, other aspects we had to generate anew.

Either way, this marks a huge step towards a more complete and meaningful representation of our collection on the internet.

Today, we’re happy to announce that all of this newly-gathered data is live on our website and is also publicly available over the API (head to the API methods documentation to see more about that if you’re interested in playing with it programmatically).

People

For the Hewitt Sisters Collect exhibition, Local Projects designed a front-end experience for the multitouch tables that highlights the early donors to the museum’s collection and how they were connected to each other. Our in-house “TMS liaison”, Sara Rubinow, worked to gather and structure this information before adding it to TMS, our collection management system, as “constituent associations”. From there I extracted the structured data to add to our website.

We created a the following new views on the web frontend to house this data:

We also added a few new biography-related fields: portraits or photographs of Hewitt Sisters people and two new biographies, one 75 words and the other 50 characters. These changes are viewable on applicable people pages (e.g. Eleanor Garnier Hewitt) and the search results page.

The overall effect of this is to make more use of this ‘people-related’ data, and to encourage the further expansion of it over time. We can already imagine a future where other interfaces examining and revealing the network of relationships behind the people in our collection are easily explored.

Object Locations and Things On Display

Some of the more difficult tasks in updating our backend to meet the new requirements related to dealing with objects no longer being static – but moving on and off display. As far as the website was concerned, it was a luxury in our three years of renovation that objects weren’t moving around a whole lot because it meant we didn’t have to prioritize the writing of code to handle their movement.

But now that we are open we need to better distinguish those objects in storage from those that are on display. More importantly, if it is on display, we also need to say which exhibition, and which room it is on display.

Object locations have a lot of moving parts in TMS, and I won’t get into the specifics here. In brief, object movements from location to location are stored chronologically in a database. The “movement” is its own row that references where it moved and why it moved there. By appropriately querying this history we can say what objects have ever been in the galleries (like all museums there are a large portion of objects that have never been part of an exhibition) and what objects are there right now.

We created the following views to house this information:

Exhibitions

The additions we’ve made to exhibitions are:

There is still some work to be done with exhibitions. This includes figuring out a way to handle object rotations (the process of swapping out some objects mid-exhibition) and outgoing loans (the process of lending objects to other institutions for their exhibitions). We’re expecting that objects on loan should say where they are, and in which external exhibition they are part of — creating a valuable public ‘trail’ of where an object has traveled over its life.

Tags

Over the summer, we began an ongoing effort to ‘tag’ all the objects that would appear on the multitouch tables. This includes everything on display, plus about 3,000 objects related to those. The express purpose for tags was to provide a simple, curated browsing experience on the interactive tables – loosely based around themes ‘user’ and ‘motif’. Importantly these are not unstructured, and Sara Rubinow did a great job normalizing them where possible, but there haven’t been enough exhibitions, yet, to release a public thesaurus of tags.

We also added tags to the physical object labels to help visitors draw their own connections between our objects as they scan the exhibitions.

On the website, we’ve added tags in a few places:

That’s it for now – happy exploring! I’ll follow up with more new features once we’re able to make the associated data public.

Until then, our complete list of API methods is available here.

Our new ticketing website

Screenshot 2014-12-16 16.25.03

This past week we launched https://tickets.cooperhewitt.org — a new online ticketing system which leverages our Consitutent Relationship Management application, Tessitura, as its “source of truth.”

It’s a simple application, really. It lets you pick the day you want to come visit us, select the kind of tickets you want to buy, and then you fill out your basic info, plug in your credit card digits and off you go. Moments later you receive an email with PDF versions of your tickets attached.

On the user-facing side of things, it is designed to be as simple as possible. You don’t need to log in, there is no “shopping cart”, and above all, you can do all of this from your phone if you want to skip ahead of the lines this Winter on your first visit back since we closed nearly three years ago.

A little background

The idea to pre-book tickets online came at us from a number of directions. Some time last year we decided to invest in Tessitura to handle all of our CRM needs. Tessitura, if you have never heard of it before, is an enterprise class, battleship that grew out of the Met Opera House and has made its way around the performing arts sector. It’s a great tool if you are looking to centralize everything there is to know about a Constituent. As a museum, it is also appealing in that it does many of the things that non-profit type cultural institutions need to do out of the box.

So, Tessitura. It is now a thing at our museum. Everyone on our staff started ramping up on the software and getting settled into the idea of using Tessitura for one thing or another. Our department began to get requests.

Obviously our membership department would like to use Tessitura to sell and manage memberships. Development would like to use it to manage and collect donations and gifts. Education would like to centralize all public programs, book tours, manage special events, and all of the other crazy things they do. And did I mention we have a museum that sells tickets?

This is how it always starts. The avalanche of ideas, whiteboard sessions, product demos and gentle emails that say things like “When will Tessitura be ready?” begin to pour in. You have to soak it all in and then wring it all out.

The Simplest, Dumbest Thing

Aaron says that quite often. “Just do the simplest, dumbest thing…” and he’s right. Often times you have to boil things down a bit to get to the real core issues at hand. It was clear from the start that this would be an essential part of the “design process” on this project.

So, I started out by asking myself this question:  “What is the most basic thing we want to do with Tessitura?”

I wound up with two clear answers.

  1. We wanted to be able to sell tickets online. Just basic, general admission tickets. Nothing fancy yet.
  2. We eventually want to use Tessitura as our identity provider, and as a way to pair your ticket with the Pen. More on that towards the end…

Tackling Tickets

So to get started, I thought about the challenge of selling a ticket online. I looked at other sites I liked such as StubHub, EventBrite, and other venue websites that I knew used Tessitura like BAM, Jazz at Lincoln Center, and the 92Y. I did some research, I bought some tickets, and I asked all my friends who used these sites what they liked and disliked. Eventually I started to find my way gravitating towards the Eventbrite way of doing things. We have been using Eventbrite for a couple of years here at Cooper Hewitt, for the most part as a way to sell tickets to education programs and events.

To tell you the truth, Eventbrite has been a dream come true for us and our sales for these events, both paid and free, have been very good. So, what is Eventbrite doing right? Simply put they’ve made the process of purchasing a ticket to an event online stupidly simple.

I wanted to know more. So, I spent some time and slowly walked myself through the process of booking all kinds of things on Eventbrite. I tried to step through each page in the process, I tried to notice what kind of user feedback I got, and what sort of emails and notifications I got. I tried the same on mobile devices and through their iPhone app. Here are a few takeaways.

  1. You don’t need to register in order to purchase a ticket on Eventbrite.
  2. If you don’t register when making an initial purchase, you can register later and see your purchase history.
  3. As soon as you book your tickets, you get them in an email.
  4. The Thank You page is just as useful when you are logged out as when you are logged in.
  5. Most importantly, you can only buy one thing at a time. In other words, there is no idea of a Shopping Cart.

That last one was pretty huge. Most eCommerce sites are built around the idea that users put items in a cart and then “Checkout.” Eventbrite doesn’t do it this way. Instead, you simply pick the thing you are wanting to attend, select the kinds of tickets you want ( student, senior, etc ) and then put in your credit card info. Once you hit submit, you’ve paid for your tickets and your transaction is complete.

I felt this flow was incredibly powerful and probably one of the reasons Eventbrite was working so well for our education programs. There are simply less chances to change your mind, less confusion over what you are buying, and the end-to-end process of picking something out and paying for it is just so much smaller than the more traditional shopping cart experience.

I began to think of it kind of like the difference between getting your weekly groceries and just picking up a six pack. The behaviors are totally different because you are trying to accomplish two totally different tasks. One is very routine, requires a little creativity and some patience, and a willingness to wander around and “pick.” The other is a strategic strike, designed to get in and get out so you can get home and relax with a nice cold one.

The Eventbrite concept seemed like what we wanted. I had my simplest dumbest thing, and something to model it on.

Technical Challenges

With every new project comes some kind of technical challenge(s). Tessitura is a “new to us” application and our staff at Cooper Hewitt were clearly at the bottom left of a steep learning curve when we started the project. We also had many challenges we knew we were going to have to face because we are a “Governmental Institution.” So things like PCI compliance, complex network configurations, and security scans were all things I was going to need to learn about.

Tessitura comes with two APIs. One is a somewhat older ( as in the first thing they built ) SOAP API, and the other is a newer ( as in still under development ) REST API. Both allow data to get in and out of Tessitura in a variety of forms.

In addition to the standard SOAP and REST APIs, Tessitura has the facility to expose just about anything you can build into an MS SQL Stored Procedure through its API. This is an incredibly powerful feature, which can also be quite dangerous if you think about it.

When I attended the Tessitura Learning Conference & Convention this past summer, it became clear to me that many institutions that use Tessitura are building some kind of API wrapper, or some type of middleware that helps them make sense of it all. We chose to do the same. To accomplish this, I chose to model the API wrapper off our own Collections API, which is a REST-ish API based on Flamework, and uses oAuth2 for authentication. Having this API wrapper allows us to all speak the same language and use the same interface. It is also very, very similar to the Collections API, so among our own staff, it is pretty easy to navigate. The API wrapper, wraps methods from both the Tessitura SOAP and the REST APIs and presents a unified interface to both of them. It doesn’t implement every single API method, and it exposes “new” methods that we have custom built via those Stored Procedures I mentioned.

The Tickets website is a separate project that talks directly to the API. It is also a Flamework project, written primarily in PHP. It uses a MySQL database to store a small amount of local data, but for the most part it is making calls to the API wrapper, which in turn is making calls to either the SOAP or REST Tessitura APIs. Tessitura is the source of truth for most of the things the Ticketing website does.

The Front End

Screenshot 2014-12-16 16.25.44

The Tickets site from the user’s perspective is designed to be extremely simple. I worked with Sam ( our in house front end guru ) to build a responsive, and simple web application that does basically one thing, but the devil is always in the details.

At first glance, all the site does is allow you to select the day you want to come visit us, pick out what kinds of tickets you want, and then fill out your billing info and receive your tickets. It’s basically a calendar and a form and not much more. But like I said, the devil is in the details.

First, Sam built a beautiful calendar like one I’d never seen before. We talked at length about how dumb most website calendars were, and we tried to push things in a new direction. Our calendar starts out by showing you what you most likely are looking for–Today. It displays the next bunch of days up to two weeks worth by default, and if you are looking for a special date, it lets you drop down and navigate around until you find it. On mobile its slightly different in that it doesn’t show you any past dates ( why would you want to book the past? ) and it limits things a little so you’re not as overwhelmed by the interface. We call this “designing for context” and we thought that users might be using their mobile phones to buy a ticket online and jump up to the front of the queue.

Once you’ve selected the date you want, the app loads up the available tickets right below the calendar. You can easily change your mind and pick a different date. From here you just select the type and quantity of each ticket you want. Sam’s code does a bunch of front-end validations to make sure everything you are trying to do makes sense ( you can only purchase a youth ticket with another paid ticket for example ). Between the two of us, we try to do as much validation to what you are selecting as possible, both in Javascript on the front-end and in PHP on the back.

Once you hit Order Now, an order form is generated and displayed. I think its important to note here that nothing has really happened yet in Tessitura-land. We asked Tessitura for some details about the tickets you are interested in, but we haven’t “added them to your cart” or anything like that as of yet.

Screenshot 2014-12-16 16.28.34

You can then fill out your vitals. We ask you to give us your name, email, credit card details and billing address. We store all of this, with the exception of your credit card, in Tessitura. We make you an account, and at this point we send you an activation email which allows you to set up your password at your leisure. If all goes well with your credit card, we build your tickets ( I chose to do all this with FPDF rather than try and use Tessitura’s built in Print at Home server thing ) send them in an email, and then take you to a Thank You Page. You never had to register, or log in, and you technically never do. Your PDF tickets arrive in your inbox and that is technically all you need.

Screenshot 2014-12-16 16.27.35

As a little bonus, we just stick the barcode and some basic metadata about each ticket you bought on the Thank You Page so you can just present your phone at the door. This part is still a little rough and I chose to leave it that way for the time being so we can do some user research in the galleries. It’s a nice feature, but only time will tell if people actually use it or if it needs some finesse.

Screenshot 2014-12-16 16.26.50

Now that you are in the system, you can buy more tickets using the same email address and they will be connected to your same account, even if you still have never logged in. If you do choose to activate your account and login, you can look at your order history, and reprint your tickets if you’ve lost your email copy.

Screenshot 2014-12-16 16.27.10

Tessitura & The Pen

A while back in this blog post, I mentioned that we also wanted to use Tessitura as our identity provider and as a way to pair the ticket you’ve purchased with the pen we’ve handed you. This work is nearly done, but not yet in production. It will go live when our pen is available sometime in early 2015. But, the short story is, when you buy a ticket, either online or in person, we generate a special coded version of your ticket. This code gets paired up with the internal ID of the pen we gave you and that pairing gets stored in a database. What this all amounts to is that when you get home, and you want to see all the cool things you’ve done with your pen, you simply enter the code ( or go to a custom short URL ) on our website. We look up your pairing and are able to connect your Identity ( Tessitura ) with your Visit ( on the Collections site ). But that is all the topic of a future series of blog posts.

Next

Now that the Tickets site is up and running, and we are watching the sales roll in, it’s easy to start thinking of more features and new ways to expand what the site can do. I’ve already started building simple admin tools and have been thinking about building a basic check-in app for off-site events. It’s too early to talk about all of the things we aim to do and how we plan to expand our online sales, but I’m hopeful that we will stay focused and narrow in our approach, offering our users the most elegant visitor experience possible. Or at the very least, the simplest, dumbest thing.

API methods (new and old) to reflect reality

design-eagle-cloud

A quick end-of-week blog post to mention that now that the museum has re-opened we have updated the cooperhewitt.galleries.openingHours and cooperhewitt.galleries.isOpen API methods to reflect… well, reality.

In addition to the cooperhewitt.galleries API methods we’ve also published corresponding openingHours and isOpen methods for the cafe!

For example, cooperhewitt.galleries.isOpen.

curl 'https://api.collection.cooperhewitt.org/rest/?method=cooperhewitt.galleries.isOpen&access_token=***'

{
	"open": 0,
	"holiday": 0,
	"hours": {
		"open": "10:00",
		"close": "18:00"
	},
	"time": "18:01",
	"timezone": "America/New_York",
	"stat": "ok"
}

Or, cooperhewitt.cafe.openingHours.

curl -X GET 'https://api.collection.cooperhewitt.org/rest/?method=cooperhewitt.cafe.openingHours&access_token=***'

{
	"hours": {
		"Sunday": {
			"open": "07:30",
			"close": "18:00"
		},
		"Monday": {
			"open": "07:30",
			"close": "18:00"
		},
		"Tuesday": {
			"open": "07:30",
			"close": "18:00"
		},
		"Wednesday": {
			"open": "07:30",
			"close": "18:00"
		},
		"Thursday": {
			"open": "07:30",
			"close": "18:00"
		},
		"Friday": {
			"open": "07:30",
			"close": "18:00"
		},
		"Saturday": {
			"open": "07:30",
			"close": "21:00"
		}
	},
	"timezone": "America/New_York",
	"stat": "ok"
}

Because coffee, right?

Sharing our videos, forever

This is one in a series of Labs blogposts exploring the inhouse built technologies and tools that enable everything you see in our galleries.

Our galleries and Pen experience are driven by the idea that everything a visitor can see or do in the museum itself should be accessible later on.

Part of getting the collections site and API (which drives all the interfaces in the galleries designed by Local Projects) ready for reopening has involved the gathering and, in some cases, generation of data to display with our exhibits and on our new interactive tables. In the coming weeks, I’ll be playing blogger catch-up and will write about these new features. Today, I’ll start with videos.

jazz hands

Besides the dozens videos produced in-house by Katie – such as the amazing Design Dictionary series – we have other videos relating to people, objects and exhibitions in the museum. Currently, these are all streamed on our YouTube channel. While this made hosting much easier, it meant that videos were not easily related to the rest of our collection and therefore much harder to find. In the past, there were also many videos that we simply didn’t have the rights to show after their related exhibition had ended, and all the research and work that went into producing the video was lost to anyone who missed it in the gallery. A large part of this effort was ensuring that we have the rights to keep these videos public, and so we are immensely grateful to Matthew Kennedy, who handles all our image rights, for doing that hard work.

A few months ago, we began the process of adding videos and their metadata in to our collections website and API. As a result, when you take a look at our page for Tokujin Yoshioka’s Honey-Pop chair , below the object metadata, you can see its related video in which our curators and conservators discuss its unique qualities. Similarly, when you visit our page for our former director, the late Bill Moggridge, you can see two videos featuring him, which in turn link to their own exhibitions and objects. Or, if you’d prefer, you can just see all of our videos here.

In addition to its inclusion in the website, video data is also now available over our API. When calling an API method for an object, person or exhibition from our collection, paths to the various video sizes, formats and subtitle files are returned. Here’s an example response for one of Bill’s two videos:

{
  "id": "68764297",
  "youtube_url": "www.youtube.com/watch?v=DAHHSS_WgfI",
  "title": "Bill Moggridge on Interaction Design",
  "description": "Bill Moggridge, industrial designer and co-founder of IDEO, talks about the advent of interaction design.",
  "formats": {
    "mp4": {
      "1080": "https://s3.amazonaws.com/videos.collection.cooperhewitt.org/DIGVID0059_1080.mp4",
      "1080_subtitled": "https://s3.amazonaws.com/videos.collection.cooperhewitt.org/DIGVID0059_1080_s.mp4",
      "720": "https://s3.amazonaws.com/videos.collection.cooperhewitt.org/DIGVID0059_720.mp4",
      "720_subtitled": "https://s3.amazonaws.com/videos.collection.cooperhewitt.org/DIGVID0059_720_s.mp4"
    }
  },
  "srt": "https://s3.amazonaws.com/videos.collection.cooperhewitt.org/DIGVID0059.srt"
}

The first step in accomplishing this was to process the videos into all the formats we would need. To facilitate this task, I built VidSmanger, which processes source videos of multiple sizes and formats into consistent, predictable derivative versions. At its core, VidSmanger is a wrapper around ffmpeg, an open-source multimedia encoding program. As its input, VidSmanger takes a folder of source videos and, optionally, a folder of SRT subtitle files. It outputs various sizes (currently 1280×720 and 1920×1080), various formats (currently only mp4, though any ffmpeg-supported codec will work), and will bake-in subtitles for in-gallery display. It gives all of these derivative versions predictable names that we will use when constructing the API response.

a flowchart showing two icons passing through an arrow that says "vidsmang" and resulting in four icons

Because VidSmanger is a shell script composed mostly of simple command line commands, it is easily augmented. We hope to add animated gif generation for our thumbnail images and automatic S3 uploading into the process soon. Here’s a proof-of-concept gif generated over the command line using these instructions. We could easily add the appropriate commands into VidSmanger so these get made for every video.

anim

For now, VidSmanger is open-source and available on our GitHub page! To use it, first clone the repo and the run:

./bin/init.sh

This will initialize the folder structure and install any dependencies (homebrew and ffmpeg). Then add all your videos to the source-to-encode folder and run:

./bin/encode.sh

Now you’re smanging!

HTTP ponies

Most of the image processing for the collections website is done using the Python programming language. This includes things like: extracting colours or calculating an image’s entropy (its “busy-ness”) or generating those small halftone versions of image that you might see while you wait for a larger image to load.

Soon we hope to start doing some more sophisticated computer vision related work which will almost certainly mean using the OpenCV tool chain. This likely means that we’ll continue to use Python because it has easy to use and easy to install bindings to hide most of the fiddly bits required to look at images with “robot eyes”.

shirt

The collections website itself is not written in Python and that’s okay. There are lots of ways for different languages to hold hands inside of a single “application” and we’ve used many of them. But we also think that most of these little pieces of functionality are useful in and of themselves and shouldn’t require that a person (including us) have to burden themselves with the minutiae of the collections website infrastructure to use them.

We’ve slowly been taking the various bits of code we’ve written over the years and putting them in to discrete libraries that can then be wrapped up in little standalone HTTP “pony” or “plumbing” servers. This idea of exposing bespoke pieces of functionality via a web server is hardly new. Dave Winer has been talking about “fractional horsepower HTTP servers” since 1997. What’s changed between then and now is that it’s more fun to say “HTTP pony” and it’s much easier to bake a little web server in to an application because HTTP has become the lingua franca of the internet and that means almost every programming language in use today knows how to “speak” it.

In practice we end up with a “stack” of individual pieces that looks something like this:

  1. Other people’s code that usually does all the heavy-lifting. An example of this might be Giv Parvaneh’s RoyGBiv library for extracting colours from images or Mike Migurski’s Atkinson library for dithering images.
  2. A variety of cooperhewitt.* libraries to hide the details of other people’s code.
  3. The cooperhewitt.flask.http_pony library which exports a setup of helper utilities for the running Flask-based HTTP servers. Things like: doing a minimum amount of sanity checking for uploads and filenames or handling (common) server configuration details.
  4. A variety of plumbing-SOMETHING-server HTTP servers which export functionality via HTTP GET and POST requests. For example: plumbing-atkinson-server, plumbing-palette-server and so on.
  5. Flask, a self-described “micro-framework” which is what handles all the details of the HTTP call and response life cycle.
  6. Optionally, a WSGI-compiliant server-container-thing-y for managing requests to a number Flask instances. Personally we like gunicorn but there are many to choose from.

Here is a not-really-but-treat-it-like-pseudo-code-anyway example without any error handling for the sake of brevity of a so-called “plumbing” server:

# Let's pretend this file is called 'example-server.py'.

import flask
from flask_cors import cross_origin
import cooperhewitt.example.code as code
import cooperhewitt.flask.http_pony as http_pony

app = http_pony.setup_flask_app('EXAMPLE_SERVER')

@app.route('/', methods=['GET', 'POST'])
@cross_origin(methods=['GET', 'POST'])
def do_something():

    if flask.request.method=='POST':
       path = http_pony.get_upload_path(app)
    else:
       path = http_pony.get_local_path(app)

    rsp = code.do_something(path)
    return flask.jsonify(rsp)

if __name__ == '__main__':
    http_pony.run_from_cli(app)

So then if we just wanted to let Flask take care of handling HTTP requests we would start the server like this:

$> python example-server.py -c example-server.cfg

And then we might talk to it like this:

$> curl -X POST -F 'file=@/path/to/file' https://localhost:5000

Or from the programming language of our choosing:

function example_do_something($path){
        $url = "https://localhost:5000";
        $file = curl_file_create($path);
        $body = array('file' => $file);
        $rsp = http_post($url, $body);
        return $rsp;
}

Notice the way that all the requests are being sent to localhost? We don’t expose any of these servers to the public internet or even between different machines on the same network. But we like having the flexibility to do that if necessary.

Finally if we just need to do something natively or want to write a simple command-line tool we can ignore all the HTTP stuff and do this:

$> python
>>> import cooperhewitt.example.code as code
>>> code.do_something("/path/to/file")

Which is a nice separation of concerns. It doesn’t mean that programs write themselves but they probably shouldn’t anyway.

If you think about things in terms of bricks and mortar you start to notice that there is a bad habit in (software) engineering culture of trying to standardize the latter or to treat it as if, with enough care and forethought, it might achieve sentience.

That’s a thing we try to guard against. Bricks, in all their uniformity, are awesome but the point of a brick is to enable a multiplicity of outcomes so we prefer to leave those details, and the work they entail, to people rather than software libraries.

face-tower

Most of this code has been open-sourced and hiding in plain sight for a while now but since we’re writing a blog post about it all, here is a list of related tools and libraries. These all fall into categories 2, 3 or 4 in the list above.

  • cooperhewitt.flask — Utility functions for writing Flask-based HTTP applications. The most important thing to remember about things in this class is that they are utility functions. They simply wrap some of the boilerplate tasks required to set up a Flask application but you will still need to take care of all the details.

Everything has a standard Python setup.py for installing all the required bits (and more importantly dependencies) in all the right places. Hopefully this will make it easier for us break out little bits of awesomeness as free agents and share them with the world. The proof, as always, will be in the doing.

face-mirror

We’ve also released go-ucd which is a set of libraries and tools written in Go for working with Unicode data. Or more specifically, for the time being since they are not general purpose Unicode tools, looking up the corresponding ASCII name for a Unicode character.

For example:

$> ucd 䍕
NET; WEB; NETWORK, NET FOR CATCHING RABBIT

Or:

$> ucd THIS → WAY
LATIN CAPITAL LETTER T
LATIN CAPITAL LETTER H
LATIN CAPITAL LETTER I
LATIN CAPITAL LETTER S
SPACE
RIGHTWARDS ARROW
SPACE
LATIN CAPITAL LETTER W
LATIN CAPITAL LETTER A
LATIN CAPITAL LETTER Y

There is, of course, a handy “pony” server (called ucd-server) for asking these questions over HTTP:

$> curl -X GET -s 'https://localhost:8080/?text=♕%20HAT' | python -mjson.tool
{
    "Chars": [
        {
            "Char": "u2655",
            "Hex": "2655",
            "Name": "WHITE CHESS QUEEN"
        },
        {
            "Char": " ",
            "Hex": "0020",
            "Name": "SPACE"
        },
        {
            "Char": "H",
            "Hex": "0048",
            "Name": "LATIN CAPITAL LETTER H"
        },
        {
            "Char": "A",
            "Hex": "0041",
            "Name": "LATIN CAPITAL LETTER A"
        },
        {
            "Char": "T",
            "Hex": "0054",
            "Name": "LATIN CAPITAL LETTER T"
        }
    ]
}

This one, potentially, has a very real and practical use-case but it’s not something we’re quite ready to talk about yet. In the meantime, it’s a fun and hopefully useful tool so we thought we’d share it with you.

Note: There are equivalent libraries and an HTTP pony for ucd written in Python but they are incomplete compared to the Go version and may eventually be deprecated altogether.

Comments, suggestions and gentle clue-bats are welcome and encouraged. Enjoy!

face-stand

Rethinking Search on the Collections Site

One of my longer-term projects since joining the museum has been rethinking how the search feature functions on the collections website. As we get closer to re-opening the museum with a suite of new technologies, our work in collaboration with Local Projects has prompted us to take a close look at the moving pieces that comprise the backend of our collections site and API. Search, naturally, forms a large piece of that. Last week, after a few weeks of research and experimentation, I pushed the first iteration live. In this post, I’ll share some of the thoughts and challenges that informed our changes.

First, a glossary of terms for readers who (like me, a month ago) have little-to-no experience with the inner-workings of a search engine:

  • Platform: The software that actually does the searching. The general process is that we feed data to the platform (see “index”), and then we ask it for results matching a certain set of parameters (see “query”). Everything else is handled by the platform itself. Part of what I’ll get into below involves our migration from one platform, Apache Solr, to another, Elasticsearch.
  • Index: An index is the database that the search platform uses to perform searches on. The search index is a lot like the primary database (it probably could fill that role if it had to) but it adds extra functionality to facilitate quick and accurate retrieval of search results.
  • Query: The rules to follow in selecting things that are appropriate to provide as search results. For users, the query could be something like “red concert poster,” but we have to translate that into something that the search provider will understand before results can be retrieved. Search providers give us a lot of different ways we can query things (ranges of a number, geographic distance or word matching to name a few), and a challenge for us as interface designers is to decide how transparent we want to make that translation. Queries also allow us to define how results should be sorted and how to facet results.
  • Faceting/Aggregation: A way of grouping results based on traits they posses. For example, faceting on “location” when you search our collection for “cat” reveals that 80 of our cat-related things are from the USA, 16 are from France, and so on.
  • Analysis (Tokenization/Stemming etc): A process that helps a computer work with sentences. Tokenization, for example, would split a search for “white porcelain vase” into the individual tokens: “white,” “porcelain” and “vase,” and then perform a search for any number of those tokens. Another example is stemming, which would allow the platform to understand that if a user searches for “running,” then items containing other words like “run” or “runner” are also valid search results. Analysis also gives us the opportunity to define custom rules that might include “marathon” and “track” as valid results in a search for “running.”

The State of Search

Our old search functionality showed its symptoms of under-performance in a few ways. For example, basic searches — phrases like “red concert poster” — turned up no results despite the presence of such objects in our collection, and searching for people would not return the person themselves, only their objects. These symptoms led me to identify what I considered the two big flaws in our search implementation.

On the backend, we were only indexing objects. This meant that if you searched for “Ray Eames,” you would see all of the objects we have associated with her, but to get to her individual person page, you would have to first click on an object and then click on her name. Considering that we have a lot of non-objects1, it makes sense to index them all and include them, where relevant, in the results. This made my first objective to find a way to facilitate the indexing and querying of different types of things.

On the frontend, we previously gave users two different ways to search our collection. The default method, accessible through the header of every page, performed a full text search on our Solr index and returned results sorted by image complexity. Users could also choose the “fancy search” option, which allows for searches on one or more of the individual fields we index, like “medium,” “title,” or “decade.” We all agreed here that “fancy search” was confusing, and all of its extra functionality — faceting, searching across many fields — shouldn’t be seen as “advanced” features. My second objective in rethinking how search works, then, was to unify “fancy” and “regular” search into just “search.”

Objective 1: Update the Backend

Our search provider, Solr, requires that a schema be present for every type of thing being indexed. The schema (an XML file) tells Solr what kind of value to expect for a certain field and what sort of analysis to perform on the field. This means I’d have to write a schema file — anticipating how I’d like to form all the indexed data — for each new type of thing we want to search on.

One of the features of Elasticsearch is that it is “schemaless,” meaning I can throw whatever kind of data I want at the index and it figures out how to treat it. This doesn’t mean Elasticsearch is always correct in its guesses — for example, it started treating our accession numbers as dates, which made them impossible to search on — so it also gives you the ability to define mappings, which has the same effect as Solr’s schema. But if I want to add “people” to the index, or add a new “location” field to an object, using Elasticsearch means I don’t have to fiddle with any schemas. This trait of Elasticsearch alone made worth the switch (see Larry Wall’s first great virtue of programmers, laziness: “the quality that makes you go to great effort to reduce overall energy expenditure”) because it’s important to us that we have the ability to make quick changes to any part of our website.

Before building anything in to our web framework, I spent a few days getting familiar with Elasticsearch on my own computer. I wrote a python script that loops through all of the CSVs from our public collections repository and indexed them in a local Elasticsearch server. From there, I started writing queries just to see what was possible. I was quickly able to come up with a lot of the functionality we already have on our site (full-text search, date range search) and get started with some complex queries as well (“most common medium in objects between 1990-2000,” for example, which is “paper”). This code is up on Github, so you can get started with your own Cooper Hewitt search engine at home!

Once I felt that I had a handle on how to index and query Elasticsearch, I got started building it into our site. I created a modified version of our Solr indexing script (in PHP) that copied objects, people, roles and media from MySQL and added them to Elasticsearch. Then I got started on the endpoint, which would take search parameters from a user and generate the appropriate query. The code for this would change a great deal as I worked on the frontend and occasionally refactored and abstracted pieces of functionality, but all the pieces of the pipeline were complete and I could begin rethinking the frontend.

Objective 2: Update the Frontend

Updating the frontend involved a few changes. Since we were now indexing multiple categories of things, there was still a case for keeping a per-category search view that gave users access to each field we have indexed. To accommodate these views, I added a tab bar across the top of the search forms, which defaults to the full-collection search. This also eliminates confusion as to what “fancy search” did as the search categories are now clearly labeled.

Showing the tabbed view for search options

Showing the tabbed view for search options

The next challenge was how to display sorting. Previously, the drop-down menu containing sort options was hidden in a “filter these results” collapsible menu. I wanted to lay out all of the sorting options for the user to see at a glance and easily switch between sorting modes. Instead of placing them across the top in a container that would push the search results further down the page, I moved them to a sidebar which would also house search result facets (more on that soon). While it does cut in to our ability to display the pictures as big as we’d like, it’s the only way we can avoid hiding information from the user. Placing these options in a collapsible menu creates two problems: if the menu is collapsed by default, we’re basically ensuring that nobody will ever use them. If the menu is expanded by default, then it means that the actual results are no longer the most important thing on the page (which, on a search results page, they clearly are). The sidebar gives us room to lay out a lot of options in an unobtrusive but easily-accessible way2.

Switching between sort mode and sort order.

Switching between sort mode and sort order.

The final challenge on the frontend was how to handle faceting. Faceting is a great way for users who know what they’re looking for to narrow down options, and a great way for users who don’t know what they’re looking for to be exposed to the various buckets we’re able to place objects in to.

Previously on our frontend, faceting was only available on fancy search. We displayed a few of the faceted fields across the top of the results page, and if you wanted further control, users could select individual fields to facet on using a drop-down menu at the bottom of the fancy search form. When they used this, though, the results page displayed only the facets, not the objects. In my updates, I’ve turned faceting on for all searches. They appear alongside the search results in the sidebar.

Relocating facets from across the top of the page to the sidebar

Relocating facets from across the top of the page to the sidebar

Doing it Live

We initially rolled these changes out about 10 days ago, though they were hidden from users who didn’t know the URL. This was to prove to ourselves that we could run Elasticsearch and Solr alongside each other without the whole site blowing up. We’re still using Solr for a bit more than just the search (for example, to show which people have worked with a given person), so until we migrate completely to Elasticsearch, we need to have both running in parallel.

A few days later, I flipped the switch to make Elasticsearch the default search provider and passed the link around internally to get some feedback from the rest of the museum. The feedback I got was important not just for working out the initial bugs and kinks, but also (and especially for myself as a relative newbie to the museum world) to help me get the language right and consider all the different expectations users might have when searching our collection. This resulted in some tweaks to the layout and copy, and some added functionality, but mostly it will inform my bigger-picture design decisions going forward.

A Few Numbers…

Improving performance wasn’t a primary objective in our changes to search, but we got some speed boosts nonetheless.

Query Before (Solr) After (Elasticsearch)
query=cat, facets on 162 results in 1240-1350ms 167 results in 450-500ms
year_acquired=gt1990, facets on 13,850 results in 1430-1560ms 14,369 results in 870-880ms
department_id=35347493&period_id=35417101, facets on 1,094 results in 1530-1580ms 1,150 results in 960-990ms

There are also cases where queries that turned up nothing before now produce relevant results, like “red concert poster,” (0 -> 11 results) “German drawings” (0 -> 101 results) and “checkered Girard samples” (0 -> 10 results).

Next Steps

Getting the improved search in front of users is the top priority now – that means you! We’re very interested in hearing about any issues, suggestions or general feedback that you might have — leave them in the comments or tweet us @cooperhewittlab.

I’m also excited about integrating some more exiting search features — things like type-ahead search and related search suggestion — on to the site in the future. Additionally, figuring out how to let users make super-specific queries (like the aforementioned “most common medium in objects between 1990-2000”) is a challenge that will require a lot of experimentation and testing, but it’s definitely an ability we want to put in the hands of our users in the future.

New Search is live on our site right now – go check it out!

1 We’ve been struggling to find a word to use for things that are “first-class” in our collection (objects, people, countries, media etc.) that makes sense to both museum-folk and the laypeople. We can’t use “objects” because those already refer to a thing that might go on display in the museum. We’ve also tried “items,” “types” and “isas” (as in, “what is this? it is a person”). But nothing seems to fit the bill.

2 We’re not in complete agreement here at the labs over the use of a sidebar to solve this design problem, but we’re going to leave it on for a while and see how it fares with time. Feedback is requested!

The Medium is the Message (and pubsocketd)

Screen Shot 2014-08-02 at 1.31.57 PM

Have you ever wanted to see a real-time view of all the objects that people are looking at on the collections website? Now you can!

At least for objects with images. There are lots of opportunities to think about interesting ways to display objects without images but since everything that follows has been a weekends-and-mornings project we’ve opted to start with the “simple” thing first.

We have lots of different ways of describing media: 12, 865 ways at last count to be precise. The medium with the most objects (2, 963) associated with it is cotton but all of these numbers are essentially misleading. The history of the cataloging of the collection has preferenced precision and detail over the kind of rough bucketing (for example, tags) that lots of people are used to these days.

It’s a practice that can sometimes seem frustrating in the moment but, in the long-run, we’re better served for it. In time we will get around to assigning high-level categorizations for equally high-level browsing but it’s worth remembering that the practice of describing objects in minute detail predates things like databases, which we take for granted today. In fact these classifications, and their associated conventions and rituals, were the de-facto databases before computers or databases had even been invented.

But 13,000 different media, most of which only describe a single object, can be overwhelming. Where do you start? How do you know what to look for? Given the breadth of our collection what don’t we have? And given the level of detail we try to assign to objects how to do you whether a search doesn’t yield any results because it’s not in our collection or simply because we’re using a different name for the same thing you’re looking for?

This is a genuinely Big and Hairy Problem and we have not solved it yet. But the ability to relay objects as they are viewed by the public, in real-time, offers an interesting opportunity: What if we just displayed (and where possible, read aloud) the medium for that object?

Screen Shot 2014-08-02 at 1.31.29 PM

That’s all The Medium is the Message does: It is an ambient display that let you keep an eye on the kinds of things that are in our collection and offered a gentle, polite way to start to see the shape of all the different things that tell the story of the museum. It’s not a tool to help you take a quiz so much as a way to absorb an awareness of the collection as if by osmosis. To show people an aspect of the collection as an avenue to begin understanding its entirety.

We’re not thinking enough about sound. If we want all these things to communicate with us, and we don’t want to be starting at screens and they’re going to do more than flash a couple of lights, then we need to work with sound. Either ‘sound effects’ that mean something or devices that talk to us. Personally, I think it’ll be the latter morphing into the former. And this is worth thinking about because it’s already creeping up on us. Self-serve checkouts are talking at us, reversing trucks are beeping at us, trucks turning left are barking at us, incoherently – all with much less apparent thought and ‘design’ than we devote to screens.

— Russell Davies,  the internet of talking

While The Medium is the Message is a full-screen application that displays a scaled-up version of the square-crop thumbnail for an object it also tries to use your browser’s text-to-speech capabilities to read aloud that object’s medium. It may not be the kind of thing you want playing in a room full of people but alone in your room, or under a pair of headphones, it’s fun to imagine it as a kind of Music for Airports for cultural heritage.

Screen Shot 2014-08-02 at 1.32.27 PM

Text-to-speech is currently best supported in Chrome and Safari. Conversely the best support for crisp and pixelated image-rendering is in Firefox. Because… computers, right?

For the time being The Medium is the Message lives in a little sand-box all by itself over here:

https://medium.collection.cooperhewitt.org

Eventually we hope to merge it back in to the main collections website but since it’s all brand-new we’re going to put it some place where it can, if necessary, have little melt-downs and temper-tantrums without adversely affecting the rest of the collections website. It’s also worth noting that some internal networks – like at a big company or organization – might still disallow WebSockets traffic which is what we’re using for this. If that’s the case try waiting until you’re home.

Screen Shot 2014-08-04 at 3.16.31 PM
And now, for the Nerdy Bits: The rest of this blog post is captial-T technical so you can stop reading now if that’s not your thing (though we think it’s stil pretty interesting even if the details sound like gibberish).

The Medium is the Message is part of a larger project to investigate a few different tools in order to understand how they might fit together and to what effect. They are:

  • Redis and in particular its implementation of Publish/Subscribe messaging paradigm – Every once in a while there’s a piece of software that is released which feels like genuine magic. Arguably one of the last examples of this was memcached originally written by Brad Fitzpatrick, for the website LiveJournal and without which entire slices of the web as we now know it wouldn’t exist. Both Redis and memcached are similar in spirit in that their feature-set is limited by design but what they claim to do Just Works™ and both have broad support across the landscape of programming languages. That last piece is incredibly important since it means we can use Redis to bridge applications written in whatever language suits the problem best. We’ll return to that idea in the discussion of “step 0” below.
  • Websockets – WebSockets are a way for a web browser and a server to create and maintain a persistent connection and to shuttle messages back and forth. Normally the chatter between a browser and a server happens akin to the way two people might send each other postcards in the mail and WebSockets are more like a pair of teenagers calling each on the phone and talking for hours and hours and hours. Sort of like Pub/Sub for a web browser, right? WebSockets have been around for a few years now but they are still a bit of a new territory; super-cool but not without some pitfalls.
  • Go – Go is a programming language from the nice people at Google, that recently celebrated its fourth anniversary. It is part of growing trend in language design to find a middle ground between loosely typed languages, and the need to develop stable applications with a minimum of fussiness. Go is probably not the language we would develop a complex user-facing application in but for long-running services with well-defined boundaries it seems kind of perfect. (Go’s notion of code-based channels are a fascinating parallel to both Pub/Sub and WebSockets but that’s a whole other blog post.)

Fun fact: The Labs’ very own Sam Brenner‘s ITP thesis project called Adventures of Teen Bloggers is an archive of old LiveJournal accounts in the shape of an 8-bit video game!

Adventures of Teen Bloggers

In order to test all of those technologies and how they might play together we built pubsocketd which is a simple daemon written in Go that subscribes to a Pub/Sub channel and ferries those messages to a browser using Websockets (WS).

  1. Listen for messages from a specific (Redis) Pub/Sub channel
  2. Accept incoming WS requests
  3. Shuttle any messages from the Pub/Sub channel to all the open WS connections

That’s it. It is left up to WS clients (your web browser) to figure out what to do with those messages.

$> ./pubsocketd -ws-origin=https://example.com
2014/08/01 17:23:38 [init] listening for websocket requests on 127.0.0.1:8080/, from https://example.com
2014/08/01 17:23:38 [init] listening for pubsub messages from 127.0.0.1:6379 sent to the pubsocketd channel
2014/08/01 17:23:44 [10.20.30.40][10.20.30.40:56401][handshake] OK
2014/08/01 17:23:44 [10.20.30.40][10.20.30.40:56401][request] OK
2014/08/01 17:23:44 [10.20.30.40][10.20.30.40:56401][connect] OK
2014/08/01 17:23:53 [10.20.30.40][10.20.30.40:56401][send] OK
2014/08/01 17:24:05 [10.20.30.40][10.20.30.40:56401][send] OK
# and so on...

The “step 0” in all of this is the ability for the collections website itself to connect to a Redis server and send a Pub/Sub message, whenever someone views an object, to the same channel that the pubsocketd server is listening to.

ws-liden

This allows for a nice clean separation of concerns and provides a simple way for related, but fundamentally discrete, applications to interact without getting up in each other’s business.

Given the scope of the project we probably could have accomplished the same thing, with less scaffolding, using Server-Sent Events (SSE) but this was as much an exercise designed to get our feet wet with both WebSockets and Go so it’s been worth doing it the “hard way”.

Matthew Rothenberg, creator of the popular EmojiTracker, was nice enough to open-source the Go-based SSE server endpoint he wrote to feed his application and we may eventually re-write The Medium is the Message, or future applications like it, to use that.

Screen Shot 2014-08-04 at 4.11.45 PM

We’ve open-sourced the code for pubsocketd under a BSD license and we welcome suggestions, patches and (gentle) clue-bats:

https://github.com/cooperhewitt/go-pubsocketd

Enjoy!

Downgrading your website (or why we are moving to WordPress)

Below are the slides and most of what I said at the 2014 Museums & The Web conference in Baltimore, Maryland.

“I believe that if we think first about people and then try, try, and try again to prototype our designs, we stand a good chance of creating innovative solutions that people will value and enjoy.” — Bill Moggridge

MW2014.002

Let me begin by telling you a little story about a small museum that sat along 5th Ave. on New York’s Upper East Side. This is of course a largely fictional story. Names, and actual events have been changed.

MW2014.003
This is the story of a little museum with big aspirations. Long ago this little museum had a website. It had a webmaster, and it published a blog. It even had a whole bunch of microsites, flash driven exhibition sites, event calendars and archives. In fact, it won a few Webby’s.

MW2014.004
The website was very much the product of an organization trying to get the job done. And, it succeeded in this effort. Staff members would produce content on their company issued PCs and would then hand these documents off to the museum’s webmaster who would convert them into HTML and Javascript. The webmaster would press a specially designed “button” which would upload the new content to the little museum’s web servers where the pages would be served and maintained by a giant umbrella organization that had close ties to the government.
MW2014.005
With a single webmaster managing the entirety of the museum’s web properties, the little staff of this museum faced an inevitability. It was just too much work for the webmaster to do alone. Even if they allowed the webmaster an apprentice, the workload would continue to grow, and the little museum’s website would suffer. Eventually, they all realized they would have to move towards a system that would allow the entire staff to collaborate more efficiently.

Eventually, they realized they would need a content management system.

MW2014.006
There were many options out there already, and the little museum’s webmaster took stock in as many of them as he could. Meetings were had, and budgets were considered. The “committee to select a content management system” was formed, and consultants were brought in.

Wire frames were presented, and scopes of work were proposed, but the committee remained vigilant and put off making a decision as long as it could. They simply never felt like they had the right solution placed in front of them.

There was a lot at stake and many facets and bullet points drove them to a moment of indecision. There was due-dilligence due to their “mothership” in Washington, and there were “rights in data” clauses to be haggled over, with threats of time in a Federal prison always on everyone’s minds. Eventually the committee was disbanded and the project was put on hold.
MW2014.007
Time went on and the little museum’s website continued to shine as the public face of the institution. It continued to be updated with more and more content, and eventually the little museum even invested a fair amount of money in putting their collections online for all to see.

The word on the street was that this little museum’s website was starting to blow up, more and more people were beginning to rely on it as source of good information, and the time had come to re-think the idea of re-building.
MW2014.008
The webmaster at the little museum was doing his best, running around from staff member to staff member, trying to understand what had been going on all this time. One day he had the fortune to sit in on a meeting with a prominent weblogger and asked him a very important question.

“What CMS do you think we should chose” the webmaster said.

“CMS’s are all basically the same”, said the blogger, “just chose one you like and don’t look back.”

The webmaster took this to heart and selected three CMS systems that were free and easy to set up. He presented these to the higher ups and after a couple of hours of debate and one technical review board meeting, the webmaster had his answer.

MW2014.009
Drupal would be the content management system for the little museum. Drupal.

MW2014.010
The end, well sorta.

Most of that actually happened at the Cooper-Hewitt. The team eventually just had to pick a system, (without a whole lot of experience with the product itself) and kind of just “go for it.” From that point on, the staff at Cooper-Hewitt were living with Drupal. Drupal, a word almost none of the staff had ever heard before became, in less than a few months, a dirty word, spoken in fits of anger and dismay.

Now, before we go any further, it really needs to be said out loud that Drupal is really fine piece of software that has grown and evolved into a very sophisticated and well thought out framework for building websites. It has a rich community of developers and enthusiasts behind it and it powers some of the most popular websites on the planet. It’s used by giant companies far and wide, governments, and educational institutions all over. As well, our team in Washington has come a really long way in learning how to host and maintain Drupal based websites and presently, many of the latest Smithsonian websites are being built on Drupal. There is nothing intrinsically wrong with Drupal, we just realized, after a long time, it wasn’t for us.
MW2014.011
I’m Micah Walter. I’m part of the nerd crew at Cooper-Hewitt. We are part of the Smithsonian ( that umbrella corporation in Washington )… and we are in the middle of a re-launch of our physical museum, as well as our digital presence.
MW2014.012
Cooper-Hewitt started it’s life with a CMS by installing a copy of Drupal 6. Shortly thereafter, we installed some modules, and more modules, and more…modules. Eventually we had a pretty awesome website. We hired an engineering team to convert the look and feel of the old website into a Drupal theme, and we “went live.” Cooper-Hewitt was on a CMS and it felt good.

MW2014.013

random extra slide

MW2014.014
Some time during this process we sat down with all the staff members to show off our new CMS. We took them on a tour of the system and poked around with a few of the CMSs features, with the hopes of getting staffers excited about the whole thing. The staff seemed to respond positively, and after a couple of months of configuring Drupal’s permissions matrix, we gave out login details to a select number of “power users” around the museum. A few of these power users got it right away and were off and running, updating their existing webpages when they needed to. It wasn’t too bad actually. Staff could easily log in, search around for the relevant content and make minor changes to their pages. The problems started to appear when they wanted to do just slightly more. A staffer wasn’t able to easily upload an image to Drupal. The image had to first be sent to our graphics person who would convert it to a jpeg, resize it for the web and then it would be sent to the webmaster who would upload it to an Amazon S3 server. Once this was done the webmaster would email the URL to the image back to the staffer who would then try and figure out how to insert it into their page.

Another issue arose when staffers tried to author new pages. It was simply difficult for them to understand how the new page would find its way within the information architecture that was already in place. How were they to set the new page’s URL and menu items. Those kinds of tasks inevitably wound up back on the webmaster’s desk.
MW2014.015
For the most part, notwithstanding a few hiccups here and there, Drupal 6 ran pretty smoothly. Staffers were able to distribute the workload a little more than they used to, and that was considered a good thing. But, about a year into it, a grant became available and the notion of running a daily blog about our objects turned into a reality. Object of the Day was born, and we had our work cut out for us.
MW2014.016
Object of the Day went through many stages of evolution, eventually winding up as an institutional blog authored by staffers, students in our Masters program, docents, and even teens and high school kids interested in design. Every day another object from the collection was chosen and a post was written about it and published to our blog. Great pains were taken to ensure we considered the collection record, tags, the authors vitals and more. We met in committee meetings over and over and eventually worked out a plan to allow us to manage project. The end result would be a new post about a different object, every day.

In the beginning we toyed around with the idea of Object of the Day being run on a separate platform. We considered Tumblr, WordPress.com and even Blogger. But in the end, we decided we would put our new CMS to the test and put ourselves through the process of managing a daily blog with Drupal.
MW2014.017
To accomplish this, the digital team realized we’d probably be wise to migrate to Drupal 7 in order to take advantage of its much improved back end user interface. So, with Object of the Day as catalyst, we moved ahead with plans to migrate our Drupal installation to D7. Consultants were hired, interns were enslaved and the whole process took just a few months. In the end we wound up with a fresh installation of Drupal 7, and about 20 or so contributed modules.
MW2014.018
In parallel to this migration project we began to meet with staff members and work out the details of how this Object of the Day project would go down. We discussed a variety of organizational schemes, we talked about available resources, and how far the grant money might take us. In the end we came up with a pretty simple plan. Each month, one staff member would be the “editor” for Object of the Day. He or She would be responsible for collecting all the entries for the month, making sure they were entered into Drupal, edited and fact checked. They would then get scheduled to be published automatically on their specific day. This included many spreadsheets, checklists and meetings. It was of course, great user research for me and my team.

Once we had D7 up and running staff members started to get the hang of it. They started logging in and authoring content. And then the problems started to happen.
MW2014.019
We already had about 1500 pages ( Drupal calls these nodes ) in the CMS. They were mostly static web pages about one program or another, or blog posts from the old days, or exhibition archives and other kinds of historic content. This was just fine as that content rarely got touched or updated. It was also fine when we wanted to add a fresh blog post or a new static page every once in a while…

The problem though was what happened when the monthly Object of the Day editor had to log in to start work on their thirty some posts for the upcoming month. It was nearly impossible for them to collect all the posts in one place within the CMS so that they could see what had been entered, what was finalized and what was ultimately scheduled. This was a major first hiccup and the digital team worked out a solution involving a number of custom Drupal views that would allow the editors to more easily see what they were working on. It kind of worked, but we could tell that it was a hack solution to a real problem.

The end result was, they lived with it. They lived with the system, learned to hate it, and just didn’t talk about it much. Drupal became this beast that they just came to terms with.
MW2014.020
Time went one, and we all learned to work with Drupal. Many of the staff members became proficient enough to get by, and the calls to the webmaster desk lessened. But, the problems hadn’t gone away. In fact our little experiment to try and get staff members excited about authoring content on the web had actually backfired. Now, staff members authored content for Object of the Day because it was part of their job, listed in their work plans and reviewed during their performance evaluations at the end of each year. They hated it.

Meanwhile, Object of the Day took off. The public facing version of the blog became a big success. It received additional funding for a second year with the idea around the Sr. Management table being that it would go on forever. It was for a time our most popular page on the site.
MW2014.021
If there is one truth we have learned about maintaining a website using a CMS its that you’ll eventually jump ship and switch to something else. In fact you may do this operation again and again. Its just the nature of the beast—the grass is always greener.

When we realized we needed to jump ship, we took to heart all the feedback we got from our content creators. We realized that what they really wanted were pleasant, easy to work with tools that allowed them to feel empowered. Tools that gave them a sense of authority, and made them feel good about the work they were doing. Like it was a way for them to communicate with the world all the important things they had going on.

In the end we chose WordPress. We looked at lots of options. We thought about even simpler options like a static site generator, or hmm, Squarespace? Could a museum run their entire website on Tumblr? All of these options afforded us with a great user experience, but seemed to trade of the ability to be flexible enough for our institutions needs. It really depends on the needs of each institution.

We searched far and wide. But we kept coming back to WordPress. It was familiar to everyone. Many of the staff already had their own WordPress blogs. WordPress gave us a nice balance between having the ability to create a sophisticated website and also being simple enough to use. In fact, while I was writing tools to migrate our content to WordPress, we realized that its more simplified system allowed us to re-organize our content, making the site easier to navigate. It’s not that we couldn’t do this in Drupal, but over time, Drupal just got out of control, because it let us.
MW2014.022
We realized through the Object of the Day project that it was our CMS that stood in the way of success. The content was already good, the audience was already there. We just needed a way to get our own staff excited about doing it. It shouldn’t be hard. It should be really easy and really fun to do. WordPress lets our staff get excited about the work they are doing. It gives them a simple to use, enjoyable writing experience, and for the editors, we found some really great plugins that let them manage all the content without feeling overwhelmed. Thats really why we chose WordPress.

We kind of think of it as a downgrade on the technical side of things, but its definitely an upgrade when it comes to usability.

The end.

Postscript

There was some good discussion following the talk. A few things of note that were brought up included how our staff already had some experience with WordPress via our DesignOther90.org website, our use of EditFlow for notifications and calendaring/scheduling of content and Pressbooks to aid with the production of our eBooks.

We also talked a little about hindsight…