I’ve been visiting Cooper Hewitt for the last few months designing a new way of exploring the collection using timelines and tags. (For more background and details of the project, I’ve written a post here).
I’m set up with a prototype on a touchscreen in the Cooper Hewitt galleries today seeking impressions and feedback from visitors. Do drop in and have a play! I would love to hear your thoughts.
‘Black & white’ timeline detail, Cooper Hewitt data
“A physical museum is itself a sort of data set — an aggregation of the micro in order to glimpse the macro. One vase means little on its own, beyond perhaps illustrating a scene from daily life. But together with its contemporaries, it means the contours of a civilization. And when juxtaposed against all vases, it helps create a first-hand account of the history of the world.”
From ‘An Excavation Of One Of The World’s Greatest Art Collections’
“The ability to draw on historic examples from various cultures, to access forgotten techniques and ideas and juxtapose them with contemporary works, creates provocative dialogues and amplifies the historic continuum. This range is an asset few museums have or utilize and provides a continuing source of inspiration to contemporary viewers and designers.”
From ‘Making Design: Cooper Hewitt, Smithsonian Design Museum Collection’ p.28
I’m 4 months into a 5-month fellowship at Cooper Hewitt working with their digitised collection. I’m normally based in London where I’m a PhD student in Innovation Design Engineering at the Royal College of Art supervised by Stephen Boyd Davis, Professor of Design Research. My PhD topic is designing and building interactive timelines for exploring cultural data (digitised museum, archive and library collections). And, in London, I have been working with partners at the V&A, the Wellcome Library and the Science Museum.
The key issue in my PhD is how we ‘make sense’ of history using interactive diagrams. This is partly about visualisation of things we already know in order to communicate them to others. But it is also about visual analytics – using visuals for knowledge discovery. I’m particularly interested in what connects objects to one another, across time and through time.
I am very fortunate to be spending time at Cooper Hewitt as they have digitised their entire collection, more than 200,000 objects, and it is publicly available through an API. The museum is also known for its pioneering work in digital engagement with visitors and technical innovations in the galleries. It is a privilege to be able to draw on the curatorial, historical and digital expertise of the staff around me here for developing and evaluating my designs.
As I began exploring the collection API, I noticed many of the object records had ‘tags’ applied to them (like ‘birds’, ‘black & white’, ‘coffee and tea drinking’, ‘architecture’, ‘symmetry’ or ‘overlapping’). These tags connect diverse objects from across the collection: they represent themes that extend over time and across the different museum departments. This tagging interested me because it seemed to offer different paths through the data around shape, form, style, texture, motif, colour, function or environment. (It’s similar to the way users on platforms like Pinterest group images into ‘boards’ around different ideas). An object can have many tags applied to it suggesting different ways to look at it, and different contexts to place it in.
Where do these tags come from? Here, the tags are chosen and applied by the museum when objects are included in an exhibition. They provide a variety of ways to think about an object, highlighting different characteristics, and purposely offer a contrasting approach to more scholarly descriptive information. The tags are used to power recommendation systems on the museum collection website and applications in the galleries. They constitute both personal and institutional interpretation of the collection, and situate each item in a multi-dimensional set of contexts.
Some examples of tags and tagged objects in the Cooper Hewitt collection
I was interested to trace these themes over the collection and, since objects often have multiple tags, to explore what it would be like to situate or view each object through these various lenses.
The temporal dimension is important for identifying meaningful connections between items in cultural collections, so my first thoughts were to map tagged objects by date.
I’m working on a prototype interface that allows users to browse in a visually rich way through the collection by tags. A user starts with one object image and a list of the tags that apply to that object. They may be interested to see what other objects in the collection share a given tag and how the starting image sits in each of those contexts. When they click a tag, a timeline visualisation is generated of images of the other objects sharing that tag – arranged by date. The user can then click on further tags, to generate new timeline visualisations around the same starting image, viewing that image against contrasting historical narratives. And if they see a different image that interests them in one of these timelines, they can click on that image making it the new central image with a new list of tags through which to generate timelines and further dig into the collection. By skipping from image to image and tag to tag, it’s easy to get absorbed in exploring the dataset this way; the browsing can be undirected and doesn’t require a familiarity with the dataset.
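To make the browsing loop concrete, here is a minimal Python sketch of it. The object records, years and tags below are invented for illustration; the real data comes from the Cooper Hewitt collection API.

```python
# Minimal sketch of the tag-browsing loop: pick a central object, list its
# tags, and build a date-ordered "timeline" of other objects sharing a tag.
# The records here are invented; real data comes from the Cooper Hewitt API.

OBJECTS = {
    1: {"title": "Coffee pot", "year": 1760, "tags": {"coffee and tea drinking", "symmetry"}},
    2: {"title": "Teacup",     "year": 1890, "tags": {"coffee and tea drinking", "floral"}},
    3: {"title": "Poster",     "year": 1965, "tags": {"symmetry", "black & white"}},
}

def tags_for(object_id):
    """Tags applied to the current central object."""
    return sorted(OBJECTS[object_id]["tags"])

def timeline_for_tag(tag, exclude_id=None):
    """IDs of the other objects sharing `tag`, arranged by date."""
    hits = [
        (rec["year"], oid)
        for oid, rec in OBJECTS.items()
        if tag in rec["tags"] and oid != exclude_id
    ]
    return [oid for _, oid in sorted(hits)]
```

Starting from object 1, clicking ‘symmetry’ would build a timeline containing object 3; clicking object 3 in that timeline would make it the new central image with its own tag list.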
‘Coffee and tea drinking’ timeline: designs in the collection stretch from 1700 to the present with a great diversity of forms and styles, elaborate and minimal.
‘Water’ timeline. Here there are many different ways of thinking about water: images of garden plans with fountains and lakes from the 16th–18th Century, or modern interventions for accessing and cleaning water in developing countries. Contrasting representations (landscape painting to abstracted pattern) and functions (drinking to boating) stretch between.
‘Water’ timeline, detail
‘Space’ timeline: 1960s ‘space age’ souvenirs (Soviet and American) precede modern telescope imaging. And a 19th Century telescope reminds us of the long history of mankind’s interest in space.
I’m plotting the object images themselves as data points so users can easily make visual connections between them and observe trends over time (for instance in how an idea is visually represented or embodied in objects, or the types of objects present at different points in time). The images are arranged without overlapping, but in an irregular way. I hoped to emulate a densely packed art gallery wall or mood board to encourage these visual connections. Since the tags are subjective and haven’t been applied across the whole collection, I also felt this layout would encourage users to view the data in a more qualitative way.
While most of the post-1800 objects in the dataset have a date/date span expressed numerically, pre-1800 objects often only have date information as it would appear on a label: for example ‘Created before 1870s’, ‘late 19th–early 20th century’, ‘ca. 1850’ or ‘2012–present’. My colleagues at the Royal College of Art have previously written about the challenges of visualising temporal data from cultural collections (Davis, S.B. and Kräutli, F., 2015. The Idea and Image of Historical Time: Interactions between Design and Digital Humanities. Visible Language, 49(3), p.101).
In order to process this data computationally, I translated the label date text to numbers using the yearrange library (which is written for working with curatorial date language). This library works by converting, for example, ‘late 18th century’ to ‘start: 1775, end: 1799’. For my purposes, this seems to work well, though I am unsure how to deal with some cases:
How should I deal with objects whose date is ‘circa X’ or ‘ca. X’ etc.? At the moment I’m just crudely extending the date span by ±20 years.
How should I deal with ‘before X’? How much ‘before’ does that mean? I’m currently just using X as the date in this case.
The library does not translate BC dates (though I could make adjustments to the code to enable this…). I am just excluding these at the moment.
There are some very old objects in the Cooper Hewitt collection, for example ‘1.85 million years old’, ‘ca. 2000–1595 BCE’ and ‘300,000 years old’. These will create problems if I want to include them on a uniformly scaled timeline! Since these are rare cases, I’m excluding them at the moment.
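To illustrate the kind of translation involved, here is a much-simplified sketch of turning curatorial date text into a (start, end) year pair. This is not the yearrange library itself, just a few regex rules covering the cases discussed above, including my crude ±20-year padding for ‘circa’ and the decision to treat ‘before X’ as X.

```python
import re

CIRCA_PAD = 20  # crude ±20-year padding for 'ca. X', as described above

def parse_label_date(text):
    """Simplified sketch of translating curatorial date text to a
    (start, end) year pair. The prototype uses the yearrange library;
    these regexes only cover the handful of cases discussed here."""
    text = text.strip().lower()

    m = re.match(r"(?:circa|ca\.?)\s*(\d{4})$", text)
    if m:  # 'ca. 1850' -> pad the single year by +/-20
        y = int(m.group(1))
        return (y - CIRCA_PAD, y + CIRCA_PAD)

    m = re.match(r"before\s*(\d{4})s?$", text)
    if m:  # 'before 1870' -> unclear how much before; just use X
        y = int(m.group(1))
        return (y, y)

    m = re.match(r"late (\d+)(?:st|nd|rd|th) century$", text)
    if m:  # 'late 18th century' -> last quarter of the century
        c = int(m.group(1))
        return ((c - 1) * 100 + 75, c * 100 - 1)

    m = re.match(r"(\d{4})\s*[-–]\s*(\d{4})$", text)
    if m:  # plain numeric span
        return (int(m.group(1)), int(m.group(2)))

    return None  # BC dates and other hard cases are excluded for now
```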
Skewing the timeline scale
The Cooper Hewitt collection is skewed towards objects dating post-1800, so to even out image distribution over the timeline I am using a power scale. Some tags, however – such as ‘neoclassical’ or ‘art nouveau’ – have a strong temporal component, and the power scale fails to even out image distribution in these cases.
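The idea of the power scale can be sketched in a few lines of Python. The domain and the exponent below are illustrative assumptions, not the values used in the prototype:

```python
def power_scale(year, domain=(1500, 2020), exponent=4):
    """Map a year to a 0..1 horizontal position. Raising the normalised
    year to a power > 1 compresses the sparse early centuries and gives
    the crowded post-1800 objects more room. The domain and exponent
    here are illustrative, not the prototype's actual values."""
    d0, d1 = domain
    t = (year - d0) / (d1 - d0)   # linear position, 0..1
    return t ** exponent
```

With these numbers, everything before 1800 is squeezed into roughly the first tenth of the axis, leaving the rest for the crowded 19th and 20th centuries.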
How are the images arranged?
My layout algorithm aims to separate images so that they are not overlapping, but still fairly closely packed. I am using a rule that images can be shifted horizontally to avoid overlaps so long as there is still some part of the image within its date span. Since images are large data markers, it is already not possible to read dates precisely from this timeline. And the aim here is for users to observe trends and relationships, rather than read off exact dates, so I felt it was not productive to worry too much about exact placement horizontally. (Also, this is perhaps an appropriate design feature here since dating cultural objects is often imprecise and/or uncertain anyway). This way the images are quite tightly packed, but don’t stray too far from their dates.
‘Personal environmental control’ timeline: a dry juxtaposition of these decorated fans against modern Nest thermostats.
‘Foliate’ timeline, detail
I’ve also tried to spread images out within date spans, rather than just use the central point, to avoid misleading shapes forming (such as a group of objects dating 18th century forming a column at the midpoint, 1750).
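The packing rule described above can be sketched greedily. This is a simplification, not the prototype's actual algorithm: image sizes, the row-based vertical placement and the leftmost-first search are all illustrative choices.

```python
def overlaps(a, b, img_w, img_h):
    """Axis-aligned overlap test for two top-left corners."""
    return abs(a[0] - b[0]) < img_w and abs(a[1] - b[1]) < img_h

def layout(spans, img_w=40, img_h=40, rows=8):
    """Greedy sketch of the packing rule. Each entry in `spans` is the
    (x0, x1) pixel range of one image's date span. An image may shift
    horizontally only while some part of it still touches its span; it
    then drops into the first row where it overlaps nothing."""
    placed = []
    for x0, x1 in spans:
        lo, hi = x0 - img_w + 1, x1 - 1   # keep >= 1px inside the date span
        spot = None
        for row in range(rows):
            y = row * img_h
            for x in range(lo, hi + 1):
                if all(not overlaps((x, y), p, img_w, img_h) for p in placed):
                    spot = (x, y)
                    break
            if spot:
                break
        placed.append(spot or (x0, rows * img_h))  # overflow row as a fallback
    return placed
```

Three objects with the same point date end up stacked in successive rows, each still touching its date, rather than hiding one another.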
Things to think about
The layout algorithm slows when there are many (100 or more) images visualised. Is there a more efficient way to do this?
I’m considering rotating the design 90° for web-use; I anticipate users will be interested to scroll along time, and scrolling vertically may improve usability with a mouse.
Would a user be interested to see different timeline visualisations next to each other, to compare them?
It could be interesting to apply this interface to other ways of grouping objects such as type, colour, country or other descriptor.
I need to build in a back button, or some way to return to previously selected images. Maybe a search option for tags? Or a way to save images to return to?
This visualisation design relies on curator-applied tags and, therefore, would be difficult to apply to other datasets: might there be a way to automate part of this? Maybe using computer vision technologies?
Since objects are only tagged if they are featured in an exhibition, the interface misses many relevant objects in the collection when visualising a theme. For instance there are 23 objects tagged ‘Japanese’, but keyword searching the collection for ‘Japanese’ returns 453 objects. While the interface works well with the current quantities of images (up to about 100), what changes to the design would be needed to increase this number?
What about grouping tags together? There is no dictionary or hierarchy to them so some are very similar, for instance: ‘floral’, ‘floral bouquets’, ‘floral swag’, ‘flower’, ‘flowering vine’, and ‘flowers’. Though it can be interesting to see the subtle differences in how related tags have been applied. For instance: ‘biomorphic’ is more often applied to modern objects; ‘nature’ is generally applied to depictions of nature such as landscape paintings; while ‘organic’ is applied in a more abstract sense to describe objects’ form.
I’m at a stage where I’d like to get user feedback from a range of audiences (general and scholarly) to explore some of these questions.
This is very much a work in progress, and feedback is welcome! (email@example.com to get in touch by email)
Launching alongside the long-awaited Jazz Age exhibition, the exciting new large-print label feature on our collection site is a key part of Cooper Hewitt’s ongoing accessibility initiative.
The original goal for the large-print labels project was to create a physical manifestation of our exhibition label content that could be distributed to museum visitors by our Visitor Experiences team upon request. Graphic designer Ayham Ghraowi, one of our 2015 Peter A. Krueger summer interns, worked closely with Micah Walter and Pamela Horn, director of Cross-Platform Publications, to design and develop a prototype application, the Label Book Generator. This prototype, built with Python using the Flask framework and originally hosted on Heroku, produced printable label booklets that met the museum’s standards for accessibility. This prototype was the first big step toward providing an accessible complementary experience of exhibition content for our visitors with low vision.
This fall, using the CSS stylesheets that Ayham developed for his prototype, Digital & Emerging Media experimented with a number of possible ways to deliver this large-print content to VE in production. Ultimately, we decided that rather than establishing a dedicated application for large-print label booklets, integrating the large-print labels into our collection site would be the best solution. This would not only allow us to rely on existing database connections and application code to generate the large-print label documents, but it would also corral all of our exhibition content under one domain, reducing any complications or barriers to discovery for VE and visitors alike. And by providing the label chats for each object, which are otherwise not included in any of our digital content, the large-print pages serve to supplement the main exhibition pages for our website visitors as well, adding a deeper layer of engagement to both the web and in-gallery experiences.
As of today, when a user visits any exhibition page on our collection site or main website, they’ll see a new option in the sidebar, inviting them to view and print the exhibition labels. The large-print page for each exhibition includes A-panel text alongside all available object images and label chats. If an exhibition is organized into thematic sections, this is reflected in the ordering of the large-print labels, and B-panel text is included alongside the section headers.
To generate these pages, I created a new exhibitions_large_print PHP library, which leverages the existing exhibitions, objects, and exhibitions_objects libraries to assemble the necessary information. Because we want to be able to print the large-print pages as one document for in-gallery use, our large-print pages cannot be paginated, unlike our main exhibition pages. This presents no issues for smaller exhibitions, like the ongoing Scraps. But very large exhibitions — like Jazz Age, for example, with over 400 objects — require way too much memory to be processed in production.
To get around this issue, I decided to assemble the large-print label data for certain “oversized” exhibitions in advance and store it in a series of JSON files on the server. A developer can manually run a PHP script to build the JSON files and write them to a data directory, each identified by exhibition ID. The ID for each oversized exhibition is added to a config variable, which tells our application to load from JSON rather than query the database.
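The pattern is simple to sketch. The example below is in Python for brevity (the real site is PHP), and the exhibition ID and record contents are invented:

```python
import json
import os

# Sketch of the "oversized exhibition" pattern: pre-build label data into
# per-ID JSON files, and have the app read the file instead of querying
# the database. The ID and records below are invented for illustration.

OVERSIZED_IDS = {1122}  # config: exhibitions too big to assemble per-request

def build_json(exhibition_id, records, data_dir):
    """Run manually by a developer to write the cached label data."""
    path = os.path.join(data_dir, f"{exhibition_id}.json")
    with open(path, "w") as f:
        json.dump(records, f)
    return path

def load_labels(exhibition_id, data_dir, query_db):
    """Load from JSON if the exhibition is oversized, else hit the DB."""
    if exhibition_id in OVERSIZED_IDS:
        with open(os.path.join(data_dir, f"{exhibition_id}.json")) as f:
            return json.load(f)
    return query_db(exhibition_id)
```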
For greater flexibility based on individual needs, our large-print pages include clear and easy-to-locate UI tools for printing and adjusting font size. A user can select one of six default font sizes ranging from 18pt to 28pt, triggering some simple JS to reset the body font size accordingly. Internally, we can use the pt query string parameter to enable large-print links to open the page with a specific default font size selected. For example, navigating to the large-print label page from an exhibition page using the Large Print sidebar icon opens the page at 24pt font.
Visitor Experiences has prepared a number of printed large-print label booklets for our Jazz Age exhibition, available upon request at the front desk. Visitors may also print this document at home and bring it along with them, and any individual can access this responsive page on their desktop or mobile device.
We’ll be keeping an ear out for suggested improvements to this feature, and we’re excited to see how our visitors are engaging with these large-print labels on the web and in the galleries!
In addition to launching the large-print label pages, we’ve added an accessibility link to our header and footer navigation across all our sites, where visitors can learn more about the growing list of access services that Cooper Hewitt offers alongside large-print label booklets.
If you’ve taken a look at Cooper Hewitt’s website in the past few months, you may have noticed some subtle changes to our main navigation. Until recently, all visitors to the main museum site saw exactly the same options in the main navigation. Anyone could (and still can) use the nav to retrieve a visit, explore special events and public programs, visit the blog or the shop, and travel to our collection and tickets sites to take care of business over there.
But previously, our main site could not tell whether users were signed into their Cooper Hewitt accounts. And now?
We know who they are, everywhere! From anywhere within the cooperhewitt.org domain, a user can sign into their account or create a new one, and we’ll remember them as they browse. We’re seriously excited about this change.
The Cooper Hewitt account already plays an important role in helping our visitors reconnect with the experiences they’ve had inside the museum. When a visitor returns home from Cooper Hewitt, enters their ticket shortcode, and explores the objects they collected with the Pen, their user account allows them to store this visit so that they can come back to it again and again. Having an account also allows visitors to save collection objects to a digital shoebox, take personal notes on things they’ve seen at the museum, and download or gaze fondly at their creations from the Immersion Room and the interactive tables. For now, that’s about it — and as account experiences go, that’s pretty cool. But lately at Labs, we’ve been thinking about what else we might be able to do for visitors who choose to sign up for a Cooper Hewitt account.
The first step we’ve taken toward an improved Cooper Hewitt account experience is to make creating or accessing an account as seamless and consistent as possible. In reviewing our main navigation this fall, we realized that while our collection and ticketing sites made clear how you could connect to your account, our main site presented some substantial barriers. Finding your way to your account required locating the correct sub-menu item and navigating a circuitous path to our collection site, all before any signing in could commence. This required intent and some perseverance on the part of the visitor — not to mention prior knowledge that a Cooper Hewitt account is a thing one can have. (And sad as this is for us, we know there are at least a handful of you out there who don’t know this. Try it! You’ll like it.)
We wanted to make it much easier for visitors to access their accounts and to use the cool features available only to signed-in users. To do this, we need to know a user’s login status everywhere in our domain. This helps us get users where they really want to go. Is this an existing user who just wants to get to their saved visits quickly? Is this a new user who wants to sign up for an account, so that they can get started exploring the collection online? And thinking further down the line: does this user want to see all the tickets they’ve purchased, or manage their membership status and newsletter subscriptions, or explore data around their interactions with the collection — starting from anywhere in the Cooper Hewitt family of websites?
This unified experience requires Single Sign-On (SSO) authentication across all our sites. SSO is a method of user authentication that allows a collection of sites to share awareness of a user’s data, so that the user doesn’t need to sign in repeatedly as they move from site to site within the network.
This is a perfect method for us, because each of the websites we run is a separate application hosted on a subdomain of cooperhewitt.org. For now, this includes our main website, collection website, ticketing website, and a handful of other microsites. It also includes the dedicated instance that takes care of all SSO logic, which we call You. For security reasons, You is the only one of our servers that communicates directly with our user accounts database. This data is exposed internally through a private REST API, which allows us to verify and update user information from any of our sites with an authorized access token. Regardless of a user’s starting point within the Cooper Hewitt ecosystem, they’ll have to communicate with You to sign up for, sign into, or sign out of their account, before returning from whence they came.
Our collection and ticket sites had been using SSO for a while, but our main website was not. Implementing SSO on our main site would be a challenge, because while most of our sites are custom applications built on Flamework, an open-source PHP framework, our main site is actually a WordPress site dressed in the same clothes.
First, a little background on how the SSO flow currently works within a Flamework PHP app, of which our collection site is a good example. Every page load on the collection site triggers login_check_login(), a PHP function to check the user’s current login status, which:
Looks for the user ID in a global config variable, $GLOBALS['cfg']['user']['id'].
If the user ID is not set, looks to the browser for an authorized cookie, which, if found, we decrypt using mcrypt. (More in a moment on how that cookie gets there.)
Decodes the decrypted cookie, which contains a user’s numeric ID and base64-encoded password.
Queries our user_accounts database to retrieve user data by ID.
Compares the decrypted password from the cookie to the password pulled from our database.
If everything checks out, we confirm the user as signed in and set $GLOBALS['cfg']['user']['id'] appropriately. The user will now pass future login checks, so long as the authorized cookie remains in their browser. If any of these tests fail, the global variable remains empty, and the user is considered signed out.
On every page that requires a user to be signed in — your saved visits page, for example — we run login_ensure_loggedin(), which looks for $GLOBALS['cfg']['user']['id'] and redirects signed-out users to the collection sign-in page, along with a redir parameter pointing to the current URI.
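The checking steps above can be condensed into a short sketch. This is Python pseudocode of the flow, not the actual Flamework PHP: cookie encryption (mcrypt) and the database lookup are stubbed out as injected functions, and the cookie name and format are invented for illustration.

```python
import base64

# Hedged Python sketch of the Flamework login_check_login() flow.
# Real code is PHP and stores the ID in $GLOBALS['cfg']['user']['id'];
# `decrypt` and `fetch_user` stand in for mcrypt and the accounts DB.

cfg = {"user": {"id": None}}

def check_login(cookies, decrypt, fetch_user):
    if cfg["user"]["id"]:                     # 1. already signed in?
        return True
    raw = cookies.get("auth")                 # 2. look for the auth cookie
    if raw is None:
        return False
    user_id, encoded_pw = decrypt(raw).split(":", 1)  # 3. id + b64 password
    user = fetch_user(int(user_id))           # 4. look up the account
    if user and user["pw"] == base64.b64decode(encoded_pw).decode():  # 5.
        cfg["user"]["id"] = user["id"]        # confirmed: set the global
        return True
    return False
```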
When the user lands on the collection sign-in page, we build a request parameter that contains this redir param, alongside encoded server data and information about the SSO-enabled receiver (in this case, the collection site). The user is then kicked over to the sign-in page on You with these params in tow.
Once the user lands on the sign-in page, we validate the request data sent over by the receiver (here, collection) to ensure that the user has come to You from a friendly place. Then we double check that the user is actually signed out by running login_check_login() again, expecting another negative response. Finally, if all of these checks pass, we display the sign-in form to the user.
The user fills out their email and password and submits the form, which we validate both client-side and server-side using the You API. Once a user submits a valid form, we sign them in using login_do_login(), which:
Generates an authorized cookie, encrypts it using mcrypt, and secures it so that it’s visible only over HTTPS.
Sets the cookie in the user’s browser, giving it the name stored in $GLOBALS['cfg']['auth_cookie_name'].
Looks at the redir parameter passed down by the SSO receiver, and sends the user back to their starting point.
Now the user is signed in, cookied, and passing all login checks back on the collection site — and on any of our SSO-enabled sites. Users who need to sign up for an account go through a similar process that also includes writing new information to the user_accounts database. This flow is fairly seamless, in part because Flamework apps understand each other well and can communicate whatever pieces of data are needed to handle users securely.
The real work from here was providing our WordPress (WP) site with the data and logic necessary to establish a similar flow. My task was to write a plugin that would enable our WP site to:
Check whether or not a user is signed in.
Allow signed-in users to access their account page directly from the main nav.
Prompt signed-out users to sign in or sign up, also from the main nav.
Adapt the items shown in the main nav based on a user’s login status.
Back in 2015, Micah wrote wp-cooperhewitt, a WP plugin for talking with our collections API. This plugin has been live on our site since then, and right now we’re using it to enable a number of fun features on our main site. We decided that I would simply expand this plugin to take care of the SSO business as well. (And yes, next steps for this project should include breaking apart this plugin into a handful of plugins with separate purposes.)
Doing it the Wrong Way
My first attempt at SSO implementation registered an sso_check_login() action during page load, just after the theme was set up, using the after_setup_theme hook. This function relied on PHP’s $_COOKIE superglobal variable and a handful of WP admin dashboard settings to locate an authorized cookie and run it through a series of checks nearly identical to those in our Flamework code. Once we’d validated the signed-in user (or not), we set PHP’s $_SESSION (another superglobal) to reflect the user’s login status, and our header and footer templates reacted accordingly.
Rather than trying to establish a direct database connection here, I adapted the cooperhewitt_api_call() function that Micah had already created, so that we could accommodate requests to the You API as part of verifying the user’s identity.
Despite leaning heavily on what I later learned was some bad WordPress practice, this solution seemed to work on our development server. And then it worked on production, too, and this was cause for much celebration and the eating of an actual cookie — until I signed out of WordPress, and everything broke. Once I was signed out of my WP admin account, the code I’d written was suddenly unable to access anything in the $_COOKIE variable, and our main site could no longer tell whether I was signed into my Cooper Hewitt account.
Huh? I could see the elusive cookie in my browser. I had tested my plugin code on our dev server, both signed into and signed out of my WP admin account. What was different about production? Some irritated Googling revealed that WPEngine, our WP hosting service, uses page caching to improve site performance. This is a great thing on many levels, but it also means that for a non-admin user, site pages are often served from the cache. Any PHP code run upon loading those pages won’t be rerun, so $_COOKIE and $_SESSION weren’t being touched.
Doing it a Better Way
The best way to get around the limitations imposed by page caching, WPEngine suggested, was to tap into WP’s built-in Ajax support. On page load, we’d have to grab the appropriate cookie on the front end and post an Ajax request to the PHP server, which would then trigger a custom action to check for an authenticated user.
First, we enqueue the cooperhewitt_sso script as part of the WP initialization process. Then we expose to the script a local variable ajax_object, which contains the action URL and the sought-after cookie name.
When run on page ready, the cooperhewitt_sso script passes the browser cookie to the server-side sso_check_login function and awaits a response containing a signed-in user’s username. Based on this response, the main nav bar elements are modified to expose the appropriate options — either Sign In / Sign Up or a personalized greeting that links the user to their account page.
Today is my last day at Cooper Hewitt, Smithsonian Design Museum. It’s been an incredible seven years. You can read all about my departure over on my personal blog if you are interested. I’ve got some exciting things on the horizon, so would love it if you’d follow me!
Before I leave, I have been working on a couple last “parting gifts” which I’d like to talk a little about here. The first one is a new feature on our Collections website, which I’ve been working on in the margins with former Labs member, Aaron Straup Cope. He did most of the work, and has written extensively on the topic. I was mainly the facilitator in this story.
Zoomable Object Images
The feature, which I am calling “Zoomable Object Images” is live now and lets you zoom in on any object in our collection with a handy interface. Just go to any object page and add “/zoom” to the URL to try it out. It’s very much an experiment/prototype at this point, so things may not work as expected, and not ALL objects will work, but for the majority that do, I think it’s pretty darn cool!
Notice that there is also a handy camera button that lets you grab a snapshot of whatever you have currently showing in the window!
Click on the camera to download a snapshot of the view!
This project started out as a fork of some code developed around the relatively new IIIF standard that seems to be making some waves within the cultural sector. So, it’s not perfectly compliant with the standard at the moment (Aaron says “there are capabilities in the info.json files that don’t actually exist, because static files”), but we think it’s close enough.
You can’t tell from this image, but you can pinch and zoom this on your mobile device.
To do the job of creating zoomable image tiles for over 200,000 objects we needed a few things. First, we built a very large EC2 server. We tried a few instance sizes and eventually landed on an m4.2xlarge which seemed to have enough CPU and RAM to get the job done. I also added a terabyte of storage to the instance so we’d have enough room to store all the images as we processed them.
Aaron wrote the code to download all of our high res image files and process them into tiles. The code is designed to run with multiple threads so we can get the most efficiency out of our big-giant-ec2 server as possible (otherwise we’d be waiting all year for the results). We found out after a few attempts that there was one major flaw in the whole operation. As we started to generate thousands and thousands of small files, we also started to run up against the limit of inodes available to us on our boot drive. I don’t know if we ever really figured this out, but it seemed to be that the OS was trying to index all those tiny files in some way that was causing the extra overhead. There is likely a very simple way to turn that off, but in the interest of getting the job done we decided to take an alternative route.
Instead of processing and saving the images locally and then transferring them all to our S3 storage bucket, Aaron rewrote the code so that it would upload the files as they were processed. S3 would be their final resting place in any case, so they needed to get there one way or another and this meant that we wouldn’t need so much storage during processing. We let the script run and after a few weeks or so, we had all our tiles, neatly organized and ready to go on S3.
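The revised pipeline can be sketched as below. This is illustrative Python, not Aaron's actual code: the tiling and S3-upload steps are stubbed as injected functions, and the worker count is an arbitrary choice.

```python
from concurrent.futures import ThreadPoolExecutor

# Sketch of the revised pipeline: tile each image and push the tiles
# straight to storage (stubbed here as `upload`) instead of accumulating
# millions of small files, and their inodes, on the local disk.
# Function names and thread counts are illustrative, not the actual code.

def process_image(image_id, tile, upload):
    """Tile one image, uploading each tile as soon as it's produced."""
    uploaded = 0
    for tile_key, tile_bytes in tile(image_id):
        upload(tile_key, tile_bytes)   # the tile never rests on local disk
        uploaded += 1
    return uploaded

def run_pipeline(image_ids, tile, upload, workers=8):
    """Process many images concurrently; returns the total tile count."""
    with ThreadPoolExecutor(max_workers=workers) as pool:
        return sum(pool.map(lambda i: process_image(i, tile, upload), image_ids))
```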
Last but not least, Aaron wrote some code that allowed me to plug it into our collections website, resulting in the /zoom pages that are now available. Woosh!
The second little gift is around some work I’ve been doing to try and make developing and maintaining code at Cooper Hewitt a tiny bit better.
Since we started all this work we’ve been utilizing a server that sits on the third floor (right next to Josue) called, affectionately, “Bill.” Bill (pictured above) has been our development machine for many years, and has also served as the server in charge of extracting all of our collection data and images from TMS and getting them published to the web. Bill is a pretty important piece of equipment.
The pros of doing things this way are pretty clear. We always have, at the ready, a clone of our entire technology stack available to use for development. All a developer needs to do is log in to Bill and get coding. Being within the Smithsonian network also means we get built-in security, so we don’t need to think about putting passwords in place or trying to hide in plain sight.
The cons are many.
For one, you need to be aware of each other. If more than one developer is working on the same file at the same time, bad things can happen. Sam sort of fixed this by creating individual user instances of the major web applications, but that requires a good bit of work each time you set up a new developer. Also, you are pretty much forced to use either Emacs or Vi. We’ve all grown to love Emacs over time, but it’s a pain if you’ve never used it before as it requires a good deal of muscle memory. Finally, you have to be sitting pretty close to Bill. You at least need to be on the internal network to access it easily, so remote work is not really possible.
To deal with all of this and make our development environment a little more developer friendly, I spent some time building a Vagrant machine to essentially clone Bill and make it portable.
Vagrant is a popular system amongst developers since it can easily replicate just about any production environment, and allows you to work locally on your MacBook from your favorite coffee shop. Usually, Vagrant is set up on a project-by-project basis, but since our tech stack has grown in complexity beyond a single project (I’ve had to chew on lots of server diagrams over the years), I chose to build more of a “workspace.”
Essentially, there is a simple Vagrantfile that builds the machine to match our production server type, and then there is a setup.sh script that does the work of installing everything. Within each project repository there is also a /vagrant/setup.sh script that installs the correct components, customized a little for ease of use within a Vagrant instance. There is also a folder in each repo called /data with tools that are designed to fetch the most recent data-snapshots from each application so you can have a very recent clone of what’s in production. To make this as seamless as possible, I’ve also added nightly scripts to create “latest” snapshots of each database in a place where the Vagrant tools can easily download them.
This is all starting to sound very nerdy, so I’ll just sum it up by saying that now anyone who winds up working on Cooper Hewitt’s tech stack will have a pretty simple way to get started quickly, will be able to use their favorite code editor, and will be able to do so from just about anywhere. Woosh!
Lastly, it’s been an incredible journey working here at Cooper Hewitt and being part of the “Labs.” We’ve done some amazing work and have tried to talk about it here as openly as possible–I hope that continues after I’m gone. Everyone who has been a part of the Labs in one way or another has meant a great deal to me. It’s a really exciting place to be.
There is a ton more work to do here at Cooper Hewitt Labs, and I am really excited to continue to watch this space and see what unfolds. You should too!
To date, Cooper Hewitt has published several groupings of exhibition-related content in the channels editorial web format. You can read about the development of channels in my earlier post on the topic. This article will focus on post-launch observations of the two most content-rich channels currently on cooperhewitt.org: Scraps and By the People. The Scraps channel contains a wonderful series of posts about sustainability and textiles by Magali An Berthon, and the By the People channel has a number of in-depth articles written by the Curator of Socially Responsible Design at Cooper Hewitt, Cynthia E. Smith. This article focuses on channels as a platform, but I’d like to note that the metrics cited throughout reflect the appeal of the fabulous photography, illustration, research, and writing of channel contributors.
The Scraps exhibition installed in Cooper Hewitt’s galleries.
Since launch, there’s been a positive reaction among staff to channels. Overall they seem excited to have a considered editorial space in which to communicate with readers and exhibition-goers. There has also been strong user engagement with channels. Through Google Analytics we’ve seen two prominent user stories emerge in relation to channels. The first is a user who is planning or considering a trip to the museum. They enter the most common pathway to channel pages through the Current Exhibitions page. From the channel page, they then enter the ticket purchase path through the sidebar link. [Fig. 1] 4.25% of channel visitors proceeded to tickets.cooperhewitt.org from the Scraps channel; 6.09% did the same from the By the People channel. Web traffic through the Scraps channel contributed 13.31% of all web sales since launch, and By the People contributed 15.7%.
Fig. 1. The Scraps channel sidebar contains two well-trafficked links: one to purchase tickets to the museum, and one to the Scraps exhibition page on the collection website.
The second most prominent group of users demonstrates interest in diving into content. 16.32% of Scraps channel visitors used the sidebar link to visit the corresponding exhibition page that houses extended curatorial information about the objects on display; 10.99% used the navigation buttons to view additional channel posts. 19.11% of By the People channel visitors continued to the By the People exhibition page, and 2.7% navigated to additional channel posts.
Navigation patterns indicate that the two main types of users — those who are planning a visit to the museum and those who dive into editorial content — are largely distinct. There is little conversion from post reads to ticket sales, or vice-versa. Future iterations on channels could be directed at improving the cross-over between these user behaviors. Alternately, we could aim to disentangle the two user journeys to create clearer navigational pathways. Further investigation is required to know which is the right course of development.
Through analytics we’ve also observed some interesting behaviors in relation to channels and social media. One social-friendly affordance of the channel structure is that each post contains a digestible chunk of content with a dedicated URL. Social buttons on posts also encourage sharing. Pinterest has been the most active site for sharing content to date. Channel posts cross-promoted through Cooper Hewitt’s Object of the Day email subscription service are by far the most read and most shared. Because posts were shared so widely, 8.65% of traffic to the Scraps channel originated from posts. (By the People content has had less impact on social media and has driven a negligible amount of site traffic.)
Since posts lend themselves to distribution, we realized they needed to serve as effective landing pages to drive discovery of channel content. As a solution, Publications department staff developed language to append to the bottom of each post to help readers understand the editorial context of the posts and navigate to the channel page. [Fig. 2] To make use of posts as points of entry, future channel improvements could develop discovery features on posts, such as suggested content. Currently, cross-post navigation is limited to a single increment forward or backward.
Fig. 2. Copy appended to each post contextualizes the content and leads readers to the channel home page or the exhibition page on the collection website.
Further post-launch iterations focused on the appearance of posts in the channels page. Publications staff began utilizing an existing feature in WordPress to create customized preview text for posts. [Fig. 3] These crafted appeals are much more inviting to potential readers than the large blocks of automatically excerpted text. [Fig. 4]
Fig. 3. View of a text-based post on a channel page, displaying customized preview text and read time.
Fig. 4. View of a text-based post on a channel page, displaying automatically excerpted preview text.
Digital & Emerging Media (D&EM) department developer, Rachel Nackman, also implemented some improvements to the way that post metadata displays in channels. We opted to calculate and show read time for text-based posts. I advocated for the inclusion of this feature because channel posts range widely in length. I hypothesized that showing read time to users would set appropriate expectations and would mitigate potential frustration that could arise from the inconsistency of post content. We also opted to differentiate video and publication posts in the channel view by displaying “post type” and omitting post author. [Fig. 5 and 6] Again, these tweaks were aimed at fine-tuning the platform UX and optimizing the presentation of content.
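The read-time calculation itself is simple; this hypothetical helper assumes an average reading speed of 200 words per minute (the actual figure used on the site isn’t documented here) and rounds up to a whole minute:

```python
import math
import re

def read_time_minutes(text, wpm=200):
    """Estimate reading time from a word count, rounded up to a whole minute."""
    words = len(re.findall(r"\S+", text))
    return max(1, math.ceil(words / wpm))

print(read_time_minutes("word " * 450))  # -> 3
print(read_time_minutes("A very short post."))  # -> 1
```

Rounding up and enforcing a one-minute floor keeps the displayed expectation honest for very short posts.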
Fig. 5. View of a video post on a channel page, displaying “post type” metadata and omitting post author information.
Fig. 6. View of a publication post on a channel page, displaying “post type” metadata and omitting post author information.
The channels project is as much an expansion of user-facing features as it is an extension of the staff-facing CMS. It has been useful to test both new methods of content distribution and new editorial workflows. Initially I intended channels to lean heavily on existing content creation workflows, but we have found that it is crucial to tailor content to the format in order to optimize user experience. It’s been an unexpectedly labor intensive initiative for content creators, but we’ve seen a return on effort through the channel format’s contribution to Cooper Hewitt business goals and educational mission.
Based on observed navigation patterns and engagement analytics it remains a question as to whether the two main user journeys through channels — toward ticket purchases and toward deep-dive editorial content — should be intertwined. We’ve seen little conversion between the two paths, so perhaps user needs would be better served by maintaining a separation between informational content (museum hours, travel information, what’s on view, ticket purchasing, etc.) and extended editorial and educational content. The question certainly bears further investigation — as we’ve seen, even the smallest UI changes to a content platform can have a big impact on the way content is received.
Cooper Hewitt, Smithsonian Design Museum is pleased to announce it will begin its first major initiative to address the conservation needs of digital materials in 2017. Supported by the Smithsonian Collections Care and Preservation Fund, the Digital Collection Materials Project will serve to set standards, practices, and strategies related to digital materials in Cooper Hewitt’s permanent collection. Of the more than 210,000 design objects in the collection, it is estimated that roughly 150 items incorporate information conveyed in a digital form. Many of these objects are home and office electronics, personal computing and mobile devices, and media players with interfaces that span both hardware and software. Among the 150 items, there are also born digital works–examples of design that originated in electronic form that are saved as digital data. These include both creative and useful software applications, as well as media assets, such as videos and computer-aided designs.
The first phase of the Digital Collection Materials Project will be the design and execution of a collection survey. The second phase will be case studies of select objects. The final phase will synthesize the survey results and case study findings in order to determine recommendations for a strategic plan of care, preservation, and responsible acquisition of digital materials for the collection.
The historical core of Cooper Hewitt, Smithsonian Design Museum’s collection comprises objects selected by the museum’s founders, Sarah and Eleanor Hewitt, to document outstanding technical and artistic accomplishments in the decorative arts. Established in 1897 as an educational resource for The Cooper Union for the Advancement of Science and Art, Cooper Hewitt’s collection continues to expand to encompass a range of materials exemplifying the broad category of human ingenuity and artistry that today we call design. The diversity of the museum’s collection exemplifies the core institutional belief that design is best understood through process, a framework that fosters understanding of human activity as it intersects with many materials and technologies, including the important fields of interface design, interaction design, and user experience design.
The Digital Collections Materials Project will help preserve long-term access to digital materials in the collection while maintaining the integrity of the designs they express. It will also allow Cooper Hewitt to move forward responsibly with acquisitions in the exciting realm of digital design. Since digital materials are especially vulnerable to the deleterious effects of technological obsolescence and decay, which can lead to inaccessibility and information loss, there is an urgent need to address the conservation needs of digital materials in the collection. It is with an eye to these materials’ cultural significance and vulnerability that the museum moves forward with the Digital Collection Materials Project.
This project received Federal support from the Smithsonian Collections Care and Preservation Fund, administered by the National Collections Program and the Smithsonian Collections Advisory Committee.
Digital Project, Ten Thousand Cents, 2007–08; Designed by Aaron Koblin and Takashi Kawashima; USA; processing, adobe flash cs3, php/mysql, amazon mechanical turk, adobe photoshop, adobe after effects; Gift of Aaron Koblin and Takashi Kawashima; 2014-41-2; Object Record
And we need your help! We are looking for two ultra-talented and fearless media spelunkers to dive into the collection and surface all of the computer, product design, and interaction design history within. We want you to help research and invigorate this part of the collection so that we can share it with the world. It’s a noble cause, and one that will help give museum visitors an even better experience of design at Cooper Hewitt.
One Laptop Per Child XO Computer, 2007; Designed by Yves Béhar, Bret Recor and fuseproject; injection molded abs plastic and polycarbonate, printed rubber, liquid crystal display, electronic components; steel, copper wire (power plug); H x W x D (closed): 3.5 × 22.9 × 24.1 cm (1 3/8 in. × 9 in. × 9 1/2 in.); Gift of George R. Kravis II; 2015-5-8-a,b; Object Record
We are hiring for two contract positions: Media Preservation Specialist and Time-Based Media Curatorial Assistant. The contractors will work together on the first phase of the Digital Collection Materials Project to survey and document collection items. Check out the official project announcement below to understand the full scope of the project.
To apply for the Media Preservation Specialist or Time-Based Media Curatorial Assistant position:
Looking forward to seeing your applications—we can’t wait to partner with you for this important work!
This project received Federal support from the Smithsonian Collections Care and Preservation Fund, administered by the National Collections Program and the Smithsonian Collections Advisory Committee.
Fig.1. Process Lab: Citizen Designer exhibition and signage, on view at Cooper Hewitt
The Process Lab is a hands-on educational space where visitors are invited to get involved in design process. Process Lab: Citizen Designer complemented the exhibition By the People: Designing a Better America, exploring the poverty, income inequality, stagnating wages, rising housing costs, limited public transport, and diminishing social mobility facing America today.
In Process Lab: Citizen Designer participants moved through a series of prompts and completed a worksheet [fig. 2]. Selecting a value they care about, a question that matters, and design tactics they could use to make a difference, participants used these constraints to create a sketch of a potential solution.
Cooper Hewitt’s Education Department asked Digital & Emerging Media (D&EM) to build an interactive experience that would encourage visitors to learn from each other by allowing them to share and compare their participation in exhibition Process Lab: Citizen Designer.
I served as project manager and user-experience/user-interaction designer, working closely with D&EM’s developer, Rachel Nackman, on the project. Interface Studio Architects (ISA) collaborated on concept and provided environmental graphics.
Fig. 2. Completed worksheet with question, value and tactic selections, along with a solution sketch
Project collaborators—D&EM, the Education Department, and ISA—came together for the initial steps of ideation. Since the exhibition concept and design were well established by this time, it was clear how participants would engage with the activity. Through the process of using cards and prompts to complete a worksheet they would generate several pieces of information: a value, a question, one to two tactic selections, and a solution sketch. The group decided that these elements would provide the content for the “sharing and comparing” specification in the project brief.
Of the participant-generated information, the solution sketch stood out as the only non-discrete element. We determined that given the available time and budget, a simple analog solution would be ideal. This became a series of wall-mounted display bins in which participants could deposit their completed worksheets. This left value, question, and tactic information to work with for the content of the digital interactive.
From the beginning, the Education Department mentioned a “broadcast wall.” Through conversation, we unpacked this term and found a core value statement within it. Phrased as a question, we could now ask:
“How might we empower participants to think about themselves within a community so that they can be inspired to design for social change?”
Framing this question allowed us to outline project objectives, knowing the solution should:
Help form a virtual community of exhibition participants.
Allow individual participants to see themselves in relation to that community.
Encourage participants to apply learnings from the exhibition to other communities.
As the project team clarified project objectives, we also identified a number of challenges that the design solution would need to navigate:
Adding Value, Not Complexity. The conceptual content of Process Lab: Citizen Designer was complex. The design activity had a number of steps and choices. The brief asked that D&EM add features to the experience, but the project team also needed to mitigate a potentially heavy cognitive load on participants.
Predetermined Technologies. An implicit part of the brief required that D&EM incorporate the Pen into user interactions. Since the Pen’s NFC-reading technology is embedded throughout Cooper Hewitt, the digital interactive needed to utilize this functionality.
Spatial Constraints. Data and power drops, architectural features, and HVAC components created limitations for positioning the interactive in the room.
Time Constraints. D&EM had two months to conceptualize and implement a solution in time for the opening of the exhibition.
Adapting to an Existing Design. D&EM entered the exhibition design process at its final stages. The solution for the digital interactive had to work with the established participant-flow, environmental graphics, copy, furniture, and spatial arrangement conceived by ISA and the Education Department.
Budget. Given that the exhibition design was nearly complete, there was virtually no budget for equipment purchases or external resourcing.
Process: Defining a Design Direction
From the design brief, challenges, objectives, and requirements established so far, we could now begin to propose solutions. Data visualization surfaced as a potential way to fulfill the sharing, comparing and broadcasting requirements of the project. A visualization could also accommodate the requirement to allow individual participants to compare themselves to the virtual exhibition community by displaying individual data in relation to the aggregate.
ISA and I sketched ideas for the data visualization [figs. 3 and 4], exploring a variety of structures. As the project team shared and reviewed the sketches, discussion revealed some important requirements for the data organization:
The question, value and tactic information should be hierarchically nested.
The hierarchy should be arranged so that question was the parent of value, and value was the parent of tactics.
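That nesting can be sketched as a simple tree of counts; the entries below are hypothetical, but the structure (question as parent of value, value as parent of tactics) follows the requirements, and the per-node totals are exactly what sizes the circles in an enclosure chart:

```python
from collections import defaultdict

# Hypothetical participant selections: (question, value, tactic).
entries = [
    ("Q1", "equity", "protest"),
    ("Q1", "equity", "advocate"),
    ("Q1", "access", "protest"),
    ("Q2", "equity", "map"),
]

# question -> value -> tactic -> count
tree = defaultdict(lambda: defaultdict(lambda: defaultdict(int)))
for q, v, t in entries:
    tree[q][v][t] += 1

def size(node):
    """A circle's size is its count, or the sum of its children's sizes."""
    return node if isinstance(node, int) else sum(size(c) for c in node.values())

print(size(tree["Q1"]))  # -> 3 (all entries under question Q1)
```

Because sizes roll up from the leaves, the aggregate view stays consistent no matter which level of the hierarchy is displayed.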
Fig. 3. My early data visualization sketches
Fig. 4. ISA’s data visualization sketch
With this information in hand, Rachel proceeded with the construction of the database that would feed the visualization. The project team identified an available 55-inch monitor to display the data visualization in the gallery; oriented vertically it could fit into the room. As I sketched ideas for data visualizations I worked within the given size and aspect ratio. Soon it became clear that the number of possible combinations within the given data structure made it impossible to accommodate the full aggregate view in the visualization. To illustrate the improbability of showing all the data, I created a leaderboard with mock values for the thousands of permutations that result from combining 12 value, 12 question and 36 tactic selections [fig. 5, left]. Not only was the volume of information overwhelming on the leaderboard, but Rachel and I agreed that the format drew no interpretive meaning from the data. If the solution should serve the project goal to “empower participants to think about themselves within a community so that they can be inspired to design for social change,” it needed to have a clear message. This insight led to a series of steps towards narrativizing the data with text [fig. 5].
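The combinatorics behind that improbability are easy to check. Assuming one tactic per worksheet, the distinct combinations already number in the thousands; allowing a second tactic (as the worksheet permits) multiplies that much further:

```python
from math import comb

questions, values, tactics = 12, 12, 36

# Exactly one tactic selected per worksheet.
one_tactic = questions * values * tactics
# One or two tactics selected (order doesn't matter).
one_or_two = questions * values * (tactics + comb(tactics, 2))

print(one_tactic, one_or_two)  # 5184 and 95904
```

Either figure is far more than a leaderboard on a single vertical monitor could meaningfully display, which is what pushed the design toward aggregation and narrative.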
Concurrently, the data visualization component was taking shape as an enclosure chart, also known as a circle packing representation. This format could accommodate both hierarchical information (nesting of the circles) and values for each component (size of the circles). With the full project team on board with the design direction, Rachel began development on the data visualization using the D3.js library.
Fig. 5. Series of mocks moving from a leaderboard format to a narrativized presentation of data with an enclosure chart
Process: Refining and Implementing a Solution
Through parallel work and constant communication, Rachel and I progressed through a number of decisions around visual presentation and database design. We agreed that to enhance legibility we should eliminate tactics from the visualization and present them separately. I created a mock that applied Cooper Hewitt’s brand to Rachel’s initial implementation of the enclosure chart. I proposed copy that wrapped the data in understandable language, and compared the latest participant to the virtual community of participants. I opted for percentage values to reinforce the relationship of individual response to aggregate. The design was black and white overall, with hot pink highlighting the relationship between the text and the data visualization. A later iteration used pink to indicate all participant data points. I inverted the background in the lower quarter of the screen to separate tactic information from the data visualization so that it was apparent this data was not feeding into the enclosure chart, and I utilized tactic icons provided by ISA to visually connect the digital interactive to the worksheet design [fig. 2].
Next, I printed a paper prototype at scale to check legibility and ADA compliance. This let us analyze the design in a new context and invited feedback from officemates. As Rachel implemented the design in code, we worked with Education to hone the messaging through copy changes and graphic refinements.
Fig. 5. A paper prototype made to scale invited people outside the project team to respond to the design, and helped check for legibility
The next steps towards project realization involved integrating the data visualization into the gallery experience, and the web experience on collection.cooperhewitt.org, the collection website. The Pen bridges these two user-flows by allowing museum visitors to collect information in the galleries. The Pen is associated with a unique visit ID for each new session. NFC tags in the galleries are loaded with data by curatorial and exhibitions staff so that visitors can use the Pen to save information to its onboard memory. When they finish their visit, the Pen data is uploaded by museum staff to a central database that feeds into unique URLs established for each visit on the collection site.
The Process Lab: Citizen Designer digital interactive project needed to work with the established system of Pens, NFC tags, and collection site, but also accommodate a new type of data. Rachel connected the question/value/tactic database to the Cooper Hewitt API and collections site. A reader-board at a freestanding station would allow participants to upload Pen data to the database [fig. 6]. The remaining parts of the participant-flow to engineer were the presentation of real time data on the visualization screen, and the leap from the completed worksheet to digitized data on the Pen.
Rachel found that her code could ping the API frequently to look for new database information to display on the monitor—this would allow for near real-time responsiveness of the screen to reader-board Pen data uploads. Rachel and I decided on the choreography of the screen display together: a quick succession of entries would result in a queue. A full queue would cycle through entries. New entries would be added to the back of the queue. An empty queue would hold on the last entry. This configuration assumed that if the queue was full when they added their entry participants may not see their data immediately. We agreed to offload the challenge of designing visual feedback about the queue length and succession to a subsequent iteration in service of meeting the launch deadline. The queue length has not proven problematic so far, and most participants see their data on screen right away.
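That display choreography can be modeled in a few lines. This is a simplified sketch, not the production code, assuming a fixed maximum queue length; new entries join the back, the display advances one entry per tick, and an empty queue holds on the last entry shown:

```python
from collections import deque

class DisplayQueue:
    """Model of the on-screen entry rotation for the visualization."""

    def __init__(self, maxlen=5):
        self.queue = deque(maxlen=maxlen)  # oldest entries drop off when full
        self.current = None

    def add(self, entry):
        # New uploads from the reader-board join the back of the queue.
        self.queue.append(entry)

    def tick(self):
        # Advance the display: show the next queued entry,
        # or hold on the last entry if the queue is empty.
        if self.queue:
            self.current = self.queue.popleft()
        return self.current

dq = DisplayQueue()
dq.add("a")
dq.add("b")
print(dq.tick(), dq.tick(), dq.tick())  # a b b  (holds on the last entry)
```

One consequence of this design, noted above, is that during a burst of uploads a participant’s entry may sit several ticks back in the queue before appearing.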
Fig. 6. Monitor displaying the data visualization website; to the left is the reader-board station
As Rachel and I brought the reader board, data visualization database, and website together, ISA worked on the graphic that would connect the worksheet experience to the digital interactive. The project team agreed that NFC tags placed under a wall graphic would serve as the interface for participants to record their worksheet answers digitally [fig. 7].
Fig. 7. ISA-designed “input graphic” where participants record their worksheet selections; NFC tags beneath the circles write question, value and tactic data to the onboard memory of the Pen
Process: Installation, Observation & Iteration
Rachel and I had the display website ready just in time for exhibition installation. Exhibitions staff and the project team negotiated the placement of all the elements in the gallery. Because of obstacles in the room, as well as data and power drop locations, the input wall graphic [fig. 7] had to be positioned apart from the reader-board and display screen. This was unfortunate given the interconnection of these steps. Also non-ideal was the fact that ISA’s numeric way-finding system omitted the step of uploading Pen data at the reader-board and viewing the data on-screen [fig.1]. After installation we had concerns that there would be low engagement with the digital interactive because of its disconnect from the rest of the experience.
As soon as the exhibition was open to the public we could see database activity. Engagement metrics looked good with 9,560 instances of use in the first ten days. The quality of those interactions, however, was poor. Only 5.8% satisfied the data requirements written into the code. The code was looking for at least one question, one value, and one tactic in order to process the information and display it on-screen. Any partial entries were discounted.
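The completeness rule the code enforced can be sketched as a simple filter; the field names here are illustrative, not the actual database schema:

```python
def is_complete(entry):
    """An entry counts only if it has at least one question, value, and tactic."""
    return all(entry.get(k) for k in ("questions", "values", "tactics"))

entries = [
    {"questions": ["Q1"], "values": ["equity"], "tactics": ["protest"]},
    {"questions": ["Q1"], "values": [], "tactics": ["protest"]},  # partial: discarded
    {"questions": [], "values": [], "tactics": ["map"]},          # partial: discarded
]
valid = [e for e in entries if is_complete(e)]
print(f"{len(valid)}/{len(entries)} entries satisfied the requirements")
```

A stringent all-or-nothing rule like this is what turned the many partial Pen uploads into discarded records rather than partial data points.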
Fig. 8. A snippet of database entries from the first few days of the exhibition showing a high number of missing question, value and tactic entries
The project team met the steep challenges of limited time and budget—we designed and built a completely new way to use the Pen technology. High engagement with the digital interactive showed that what we created was inviting, and fit into the participatory context of the exhibition. Database activity, however, showed points of friction for participants. Most had trouble selecting a question, value and tactic on the input graphic, and most did not successfully upload their Pen data at the reader-board. Stringent database requirements added further difficulty.
Based on these observations, it is clear that the design of the digital interactive could be optimized. We also learned that some of the challenges facing the project could have been mitigated by closer involvement of D&EM with the larger exhibition design effort. Our next objective is to stabilize the digital interactive at an acceptable level of usability. We will continue observing participant behavior in order to inform our next iterations toward a minimum viable product. Once we meet the usability requirement, our next goal will be to hand-off the interactive to gallery staff for continued maintenance over the duration of the exhibition.
As an experience, the Process Lab: Citizen Designer digital interactive has a ways to go, but we are excited by the project’s role in expanding how visitors use the Pen. This is the first time that we’ve configured Pen interactivity to allow visitors to input information and see that input visualized in near real-time. There’s significant potential to reuse the infrastructure of this project again in a different exhibition context, adapting the input graphic and data output design to a new educational concept, and the database to new content.
This is part two in a series on digitization. My name is Allison Hale, Digital Imaging Specialist at Cooper Hewitt. I started working at the museum in 2014 during the preparations for a mass digitization project. Before the start of digitization, there were 3,690 collection objects that had high resolution, publication quality photography. The museum has completed phase two of the project and has completed photography of more than 200,000 collection objects.
Prior to DAMS (Digital Asset Management System), image files were stored on optical discs and RAID storage arrays. This was not an ideal situation for our legacy files or for a mass digitization workflow. There was a need to connect image assets to the collections database, The Museum System, and to deliver files to our public-facing technologies.
Vendor Server to DAMS Workflow
The Smithsonian’s DAMS Team and Cooper Hewitt staff worked together to build a workflow that could be used daily to ingest hundreds of images. The images moved from a vendor server to the Smithsonian’s digital repository. The preparation for the project began with 5 months of planning, testing, and upgrades to network infrastructure to increase efficiency. During mass digitization, 4 Cooper Hewitt staff members shared the responsibility for daily “ingests” or uploads of assets to DAMS. Here is the general workflow:
Cooper Hewitt to DAMS workflow.
Images are stored by the vendor on a staging server, bucketed into folders titled with the shoot date and photography station. The vendor delivers three versions of each object image in separate folders: RAW (proprietary camera format or DNG file), TIF (full frame, with image-quality target), and JPG (full-scale, cropped, and ready for a public audience).
Images are copied from the server into a “hot folder,” a folder that is networked to DAMS. The folder contains a temporary transfer area and separate active folders called MASTER, SUBFILE, and SUB_SUBFILE.
Once the files have moved to the transfer area, the RAW files move to the MASTER folder, the TIFs to the SUBFILE folder, and the JPGs to the SUB_SUBFILE folder. The purpose of the MASTER/SUB/SUB_SUB structure is to keep the images parent-child linked once they enter DAMS. The parent-child relationship keeps files “related” and indexable.
An empty text file called “ready.txt” is placed in the MASTER folder. Every 15 minutes a script runs that searches for this ready.txt trigger.
Images are ingested from the hot folder into the DAMS storage repository.
During the initial setup, the DAMS user created a “template” for the hot folder. The template automatically applies bulk administrative information to each image’s DAMS record, as well as categories and security policies.
Once the images are in DAMS, security policies and categories can be changed to allow the images to be pushed to TMS (The Museum System) via CDIS (Collection DAMS Integration System) and IDS (Image Delivery Service).
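The routing step in the workflow above amounts to a simple rule: each delivered file format lands in the hot-folder area that will become its level in the DAMS parent-child hierarchy. Here is a minimal sketch of that rule; the class and method names are invented for illustration, not taken from the actual ingest scripts.

```java
import java.util.Map;

// Illustrative sketch of the hot-folder routing rule described above:
// RAW/DNG masters, full-frame TIFs, and public JPGs each land in their own
// folder so that DAMS can parent-child link them after ingest.
public class HotFolderRouter {
    // Extension-to-folder mapping as described in the workflow
    static final Map<String, String> ROUTES = Map.of(
        "raw", "MASTER",
        "dng", "MASTER",
        "tif", "SUBFILE",
        "jpg", "SUB_SUBFILE"
    );

    /** Returns the hot-folder subfolder for a delivered file, or null if unrecognized. */
    public static String routeFor(String filename) {
        int dot = filename.lastIndexOf('.');
        if (dot < 0) return null;
        String ext = filename.substring(dot + 1).toLowerCase();
        return ROUTES.get(ext);
    }

    public static void main(String[] args) {
        System.out.println(routeFor("1995-50-1.dng"));  // MASTER
        System.out.println(routeFor("1995-50-1.tif"));  // SUBFILE
        System.out.println(routeFor("1995-50-1.jpg"));  // SUB_SUBFILE
    }
}
```

In practice the ready.txt trigger then signals that a batch routed this way is complete and safe to ingest.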
DAMS to TMS and Beyond: Q&A with Robert Feldman
DAMS is repository storage and is designed to interface with databases. A considerable amount of planning and testing went into connecting mass digitization images to Cooper Hewitt’s TMS database. This is where I introduce Robert Feldman, who works with the Smithsonian’s Digital Asset Management Team to manage all aspects of CDIS, the Collection DAMS Integration System. Robert has expertise in software development and systems analysis. A background in the telecommunications industry and experience with government agencies allow him to work in a matrixed environment while supporting many projects.
AH: Can you describe your role at Smithsonian’s DAMS?
RF: As a member of the DAMS team, I develop and support CDIS (Collection DAMS Integration System). My role has recently expanded to creating and supporting new systems that assist the Smithsonian OCIO’s goal of integrating the Smithsonian’s IT systems beyond the scope of CDIS. One of these additional tools is VFCU (Volume File Copy Utility). VFCU validates and ingests large batches of media files into DAMS. CDIS and VFCU are coded in Java and make use of Oracle and SQL-Server databases.
AH: We understand that CDIS was written to connect images in DAMS to the museum database. Can you tell us more about the purpose of the code?
RF: The primary idea behind CDIS is to identify and store the connection between the image in DAMS and the rendition in TMS. Once these connections are stored in the CDIS database, CDIS can use these connections to move data from the DAMS system to TMS, and from TMS to DAMS.
AH: Why is this important?
RF: CDIS provides the automation of many steps that would otherwise be performed manually. CDIS connects all the pieces together. The CDIS application enables Cooper Hewitt to manage its large collection in the Smithsonian IT systems in a streamlined, traceable, and repeatable manner, reduces the ‘human error’ element, and more.
AH: How is this done?
RF: For starters, CDIS creates media rendition records in TMS based on the image in DAMS. This enables Cooper Hewitt to manage these renditions in TMS hours after they are uploaded into DAMS and assigned the appropriate DAMS category.
CDIS creates the media record in TMS by inserting new rows directly into six different tables in the TMS database. These tables hold information pertaining to the Media and Rendition records and the linkages to the object record. The thumbnail image in TMS is generated by saving a copy of the reduced-resolution image from DAMS into the database field in TMS that holds the thumbnail, and a reference to the full-resolution image is saved in the TMS MediaFiles table.
This reference to the full-resolution image consists of the DAMS UAN (the Unique Asset Name, a unique reference to the particular image in DAMS) appended to the IDS base pathname. By stringing together the IDS base pathname with the UAN, we have a complete URL pointing to the IDS derivative, which is viewable in any browser.
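The URL construction described here is plain string concatenation: IDS base pathname plus UAN. A minimal sketch, using an invented base path and UAN rather than real Smithsonian identifiers:

```java
// Sketch of building a browser-viewable IDS derivative URL from the IDS
// base pathname and a DAMS UAN, as described above. The base path and UAN
// in main() are placeholders, not real identifiers.
public class IdsUrl {
    /** Appends the UAN to the IDS base pathname, adding a separator if needed. */
    public static String derivativeUrl(String idsBase, String uan) {
        // Ensure exactly one joining character between base and UAN.
        return (idsBase.endsWith("=") || idsBase.endsWith("/"))
            ? idsBase + uan
            : idsBase + "/" + uan;
    }

    public static void main(String[] args) {
        String base = "https://ids.example.edu/deliveryService?id=";  // placeholder
        System.out.println(derivativeUrl(base, "CHSDM-EXAMPLE-000001"));
    }
}
```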
The full references to this DAMS UAN and IDS pathname, along with the object number and other descriptive information, populate a feed from TMS. The ‘Collections’ area of the Cooper Hewitt website uses this feed to display the images in its collection. The feed is also used for the digital tables and interactive displays within the museum, and more!
A museum visitor looking at an image on the Digital Table. Photo by Matt Flynn.
Another advantage of the integration with CDIS is that Cooper Hewitt no longer has to store a physical copy of the image on the TMS media drive. The digital media image is stored securely in DAMS, where it can be accessed and downloaded at any time, and a derivative of the DAMS image can be easily viewed by using the IDS URL. This flow reduces duplication of effort and removes the need for Cooper Hewitt to support the infrastructure to store its images on optical discs and massive storage arrays.
When CDIS creates the media record in TMS, CDIS saves the connection to this newly created rendition. This connection allows CDIS to bring object and image descriptive data from TMS into DAMS metadata fields. If the descriptive information in TMS is altered at any point, these changes are automatically carried to DAMS via a nightly CDIS process. The transfer of metadata from TMS to DAMS is known as the CDIS ‘metadata sync’ process.
On the left, image of object record in The Museum System database. On right, object in the DAMS interface with mapped metadata from the TMS record. Click photo to enlarge.
Because CDIS carries the object descriptive data into searchable metadata fields in the DAMS, the metadata sync process makes it possible to locate images in the DAMS. When a DAMS user performs a simple search of any of the words that describe the object or image in TMS, all the applicable images will be returned, provided of course that the DAMS user has permissions to see those particular images!
Image of search functionality in DAMS. Click to enlarge image.
The metadata sync is a powerful tool that not only provides the ability to locate Cooper Hewitt-owned objects in the DAMS system, but also gives Cooper Hewitt control over how the Smithsonian IDS (Image Delivery Service) displays each image. Cooper Hewitt specifies in TMS, on an image-by-image basis, a flag indicating whether to make the image available to the general public, as well as the maximum resolution to display on public-facing sites. With each metadata update, CDIS transfers these settings from TMS to DAMS along with the descriptive metadata. DAMS in turn sends this information to IDS. CDIS is thus a key piece that bridges TMS to DAMS to IDS.
AH: Can you show us an example of the code? How was it written?
RF: Once a small utility, CDIS has since expanded into what may be considered a ‘suite’ of several tools. Each CDIS tool, or ‘CDIS operation type,’ serves a unique purpose.
For Cooper Hewitt, three operation types are used. The ‘Create Media Record’ tool creates the TMS media, then the ‘Metadata Sync’ tool brings over metadata to DAMS, and finally the ‘Timeframe Report’ is executed. The Timeframe Report emails details of the activity that has been performed (successes and failures) in the past day. Other CDIS operations are used to support the needs of other Smithsonian units.
The following is a screenshot of the listing of the CDIS code, developed in the NetBeans IDE with Java. The classes that drive each ‘operation type’ are highlighted in the top left.
A screenshot of the listing of the CDIS code, developed in the NetBeans IDE with Java.
It may be noted that more than half of the classes reside in folders that end in ‘Database’. These classes map directly to database tables of the corresponding name, and contain functions that act on those individual database tables. Thus MediaFiles.java in edu.si.CDIS.CIS.TMS.Database performs operations on the TMS table ‘MediaFiles’.
Something I find a little more interesting than the Java code is the configuration files. Each instance of CDIS requires two configuration files that enable OCIO to tailor the behavior of CDIS to each Smithsonian unit’s specific needs. We can examine one of these configuration files: the XML-formatted cdisSql.xml file.
The use of this file is two-fold. First, it contains the criteria CDIS uses to identify which records are to be selected each time a CDIS process is run. The criteria are specified by the actual SQL statement that CDIS will use to find the applicable records. To illustrate the first use, here is an example from the cdisSql.xml file:
The cdisSql.xml file.
This query is part of the metadataSync operation type as the xml tag indicates. This query obtains a list of records that have been connected in CDIS, are owned by Cooper Hewitt (CHSDM), and have never been metadata synced before (there is no metadata sync record in the cdis_activity_log table).
A second use for the cdisSql.xml file is it contains the mappings used in the metadata sync. Each Smithsonian unit has different fields in TMS that are important to them. Because Cooper Hewitt has its own xml file, CDIS provides specialized metadata mapping for Cooper Hewitt.
A selection of code for the metadata sync mapping.
If we look at the first query, the creditLine in the TMS database table ‘object’ is mapped to the field ‘credit’ in DAMS. Likewise, the data in the TMS object table’s description column is carried over to the description field in DAMS, and so on. In the second query, three different fields in TMS are appended to each other (with spaces between them) to make up the ‘other_constraints’ field in DAMS. In the third query (which is flagged as a ‘cursor append’ query with the delimiter ‘,’), a list of values may be returned from TMS. Each member of the list is concatenated into a single field in DAMS (the DAMS ‘caption’ field), with a comma (the specified delimiter) separating each value returned from TMS. The metadata sync process combines the results of all three of these queries, and more, to generate the metadata sync update in DAMS.
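The two appending behaviors described above can be sketched as a pair of small helpers: one that joins several single-value TMS columns with spaces, and one that joins a multi-row (‘cursor append’) result with a delimiter. The class name and the sample values below are illustrative, not the real cdisSql.xml mappings.

```java
import java.util.List;

// Illustrative sketch of the metadata-sync field assembly described above.
public class MetadataSyncSketch {
    /** Joins several single-value TMS columns with spaces, skipping empty values. */
    public static String appendFields(String... values) {
        StringBuilder sb = new StringBuilder();
        for (String v : values) {
            if (v == null || v.isEmpty()) continue;
            if (sb.length() > 0) sb.append(' ');
            sb.append(v);
        }
        return sb.toString();
    }

    /** Concatenates a multi-row query result into one DAMS field with a delimiter. */
    public static String cursorAppend(List<String> rows, String delimiter) {
        return String.join(delimiter, rows);
    }

    public static void main(String[] args) {
        // Second-query style: three TMS fields joined with spaces.
        System.out.println(appendFields("Gift of", "Anonymous", "Donor"));  // Gift of Anonymous Donor
        // Third-query style: a list of captions joined with the ',' delimiter.
        System.out.println(cursorAppend(List.of("caption one", "caption two"), ","));  // caption one,caption two
    }
}
```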
The advantage of locating these queries in the configuration file is that it allows each Smithsonian unit to be configured with different criteria for the metadata sync. This design also permits CDIS to perform a metadata sync on other CIS systems (besides TMS) that may use any variety of RDBMS. As long as the source data can be selected with a SQL query, it can be brought over to the DAMS.
AH: To date, how many Cooper Hewitt images have been successfully synced with the CDIS code?
RF: For Cooper Hewitt, CDIS currently maintains the TMS to DAMS connections of nearly 213,000 images. This represents more than 172,000 objects.
AH: From my perspective, many of our team projects have involved mapping metadata. Are there any other parts of the code that you find challenging or rewarding?
RF: As for challenges, I deal with many different Smithsonian units. They each have their own set of media records in various IT systems, and they all need to be integrated. There is a certain balancing act that must take place nearly every day: providing for the unique needs of each Smithsonian unit while also identifying the commonalities among the units. Because CDIS is so flexible, without proper planning and examining the whole picture with the DAMS team, CDIS would be in danger of becoming too unwieldy to support.
As for rewards, I have always valued projects that allow me to be creative. Investigating the most elegant ways of doing things allows me to keep learning and be creative at the same time. The design of new processes, such as the newly redesigned CDIS and VFCU, fulfills that need. But the most rewarding experience is discovering how my efforts are used by researchers, and how they educate the public in the form of public-facing websites and interactive displays. Knowing that I am a part of the historic digitization effort the Smithsonian is undertaking is very rewarding in itself.
AH: Has the CDIS code changed over the years? What types of upgrades have you recently worked on?
RF: CDIS has changed much since we have integrated the first images for Cooper Hewitt. The sheer volume of the data flowing through CDIS has increased exponentially. CDIS now connects well over half a million images owned by nearly a dozen different Smithsonian Units, and that number is growing daily.
CDIS has undergone many changes to support such an increase in scale. In CDIS version 2, CDIS was intrinsically hinged to TMS and relied on database tables in each unit’s own TMS system. For CDIS version 3, we took issues such as this into account and migrated the backend database for CDIS to a dedicated schema within the DAMS database. Cooper Hewitt’s instance of CDIS was updated to version 3 less than two months ago.
Now that the CDIS database is no longer hinged to TMS, CDIS version 3 has opened the doors to mapping DAMS records to a larger variety of CIS systems. We no longer depend on the TMS database structure, or even require that the CIS system uses the SQL-Server RDBMS. This has enabled the Smithsonian OCIO to expand CDIS’s role beyond TMS and allow integration with other CIS systems, such as the National Museum of Natural History’s EMuseum, the Archives of American Art’s proprietary system, and the Smithsonian Gardens IRIS-BG. All three are currently using the new CDIS today, and more are coming on board to integrate with CDIS in the near future!
One challenge has been correcting mass digitization images that end up in the wrong object record. If an object was incorrectly barcoded, the image attached to that barcode’s corresponding collection record is also incorrect. Once the object record’s image is known to be incorrect, the asset must be exported, deleted, and purged from DAMS. The image must also be deleted from the media rendition in TMS. When the correct record is located, the file’s barcode or filename can be changed and the file re-ingested into DAMS. The process can take several days.
The adoption of the Smithsonian’s DAMS has greatly improved redundancy and our workflow for digitization and professional photography. The flexibility of the CDIS coding has allowed me to work with photography assets of our collection’s objects, or “collection surrogates,” as well as images from other departments, such as the Library. Overall, the change has been extremely user-friendly.