Most of the image processing for the collections website is done using the Python programming language. This includes things like extracting colours, calculating an image’s entropy (its “busy-ness”) or generating those small halftone versions of images that you might see while you wait for a larger image to load.
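To give a feel for what "entropy" means here, this is a minimal sketch of Shannon entropy computed over a flat list of pixel values. It is not the code in our libraries; the function name `shannon_entropy` and the toy inputs are made up for illustration, and a real image would first need to be read into a list of values.

```python
import math
from collections import Counter


def shannon_entropy(pixels):
    """Return the Shannon entropy, in bits, of a flat sequence of pixel values."""
    total = len(pixels)
    if total == 0:
        return 0.0
    counts = Counter(pixels)
    # H = sum(p * log2(1 / p)) over the values that actually occur,
    # where p = n / total is the share of pixels with that value.
    return sum((n / total) * math.log2(total / n) for n in counts.values())


flat = [128] * 16       # a single grey tone: not busy at all
busy = list(range(16))  # sixteen different tones: much busier

print(shannon_entropy(flat))
print(shannon_entropy(busy))
```

The flat input scores zero and the varied one scores higher, which is roughly the intuition behind "busy-ness".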
Soon we hope to start doing some more sophisticated computer vision related work which will almost certainly mean using the OpenCV tool chain. This likely means that we’ll continue to use Python because it has easy-to-use and easy-to-install bindings that hide most of the fiddly bits required to look at images with “robot eyes”.
The collections website itself is not written in Python and that’s okay. There are lots of ways for different languages to hold hands inside of a single “application” and we’ve used many of them. But we also think that most of these little pieces of functionality are useful in and of themselves and shouldn’t require that a person (including us) have to burden themselves with the minutiae of the collections website infrastructure to use them.
We’ve slowly been taking the various bits of code we’ve written over the years and putting them into discrete libraries that can then be wrapped up in little standalone HTTP “pony” or “plumbing” servers. This idea of exposing bespoke pieces of functionality via a web server is hardly new. Dave Winer has been talking about “fractional horsepower HTTP servers” since 1997. What’s changed between then and now is that it’s more fun to say “HTTP pony” and it’s much easier to bake a little web server into an application because HTTP has become the lingua franca of the internet and that means almost every programming language in use today knows how to “speak” it.
In practice we end up with a “stack” of individual pieces that looks something like this:
- Other people’s code that usually does all the heavy-lifting. An example of this might be Giv Parvaneh’s RoyGBiv library for extracting colours from images or Mike Migurski’s Atkinson library for dithering images.
- A variety of cooperhewitt.* libraries to hide the details of other people’s code.
- The cooperhewitt.flask.http_pony library which exports a set of helper utilities for running Flask-based HTTP servers. Things like: doing a minimum amount of sanity checking for uploads and filenames or handling (common) server configuration details.
- A variety of plumbing-SOMETHING-server HTTP servers which export functionality via HTTP GET and POST requests. For example: plumbing-atkinson-server, plumbing-palette-server and so on.
- Flask, a self-described “micro-framework” which is what handles all the details of the HTTP call and response life cycle.
- Optionally, a WSGI-compliant server-container-thing-y for managing requests to a number of Flask instances. Personally we like gunicorn but there are many to choose from.
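As an illustration of the "minimum amount of sanity checking" for filenames mentioned in the list above, here is a small sketch using only the standard library. It is not the actual cooperhewitt.flask.http_pony code: the function name `is_safe_image_name` and the `ALLOWED_EXTENSIONS` allow-list are hypothetical, and a real server would likely read its allowed types from a config file.

```python
import os

# Hypothetical allow-list, for illustration only.
ALLOWED_EXTENSIONS = {".jpg", ".jpeg", ".png", ".gif"}


def is_safe_image_name(filename, allowed=ALLOWED_EXTENSIONS):
    """Accept only bare filenames that end in an allowed extension."""
    if not filename:
        return False
    # Reject anything that tries to smuggle in a directory component.
    if "/" in filename or "\\" in filename:
        return False
    # Reject hidden files and names that are nothing but an extension.
    if filename.startswith("."):
        return False
    ext = os.path.splitext(filename)[1].lower()
    return ext in allowed


print(is_safe_image_name("poster.JPG"))
print(is_safe_image_name("../../etc/passwd"))
```

A check like this only looks at the name. It says nothing about whether the bytes in the file are actually an image.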
Here is a not-really-but-treat-it-like-pseudo-code-anyway example of a so-called “plumbing” server, without any error handling for the sake of brevity:

    # Let's pretend this file is called 'example-server.py'.

    import flask
    from flask_cors import cross_origin

    import cooperhewitt.example.code as code
    import cooperhewitt.flask.http_pony as http_pony

    app = http_pony.setup_flask_app('EXAMPLE_SERVER')

    @app.route('/', methods=['GET', 'POST'])
    @cross_origin(methods=['GET', 'POST'])
    def do_something():

        # POST requests carry an uploaded file; GET requests point at a local one.
        if flask.request.method == 'POST':
            path = http_pony.get_upload_path(app)
        else:
            path = http_pony.get_local_path(app)

        rsp = code.do_something(path)
        return flask.jsonify(rsp)

    if __name__ == '__main__':
        http_pony.run_from_cli(app)
So then if we just wanted to let Flask take care of handling HTTP requests we would start the server like this:
    $> python example-server.py -c example-server.cfg
And then we might talk to it like this:
    $> curl -X POST -F 'file=@/path/to/file' http://localhost:5000
Or from the programming language of our choosing:
    function example_do_something($path){

        $url = "http://localhost:5000";
        $file = curl_file_create($path);
        $body = array('file' => $file);

        $rsp = http_post($url, $body);
        return $rsp;
    }

Notice the way that all the requests are being sent to localhost? We don’t expose any of these servers to the public internet or even between different machines on the same network. But we like having the flexibility to do that if necessary.
Finally if we just need to do something natively or want to write a simple command-line tool we can ignore all the HTTP stuff and do this:
    $> python
    >>> import cooperhewitt.example.code as code
    >>> code.do_something("/path/to/file")
Which is a nice separation of concerns. It doesn’t mean that programs write themselves but they probably shouldn’t anyway.
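A simple command-line tool built on the same library could be sketched like this. Since cooperhewitt.example.code is a pretend library, the sketch uses a hypothetical stand-in `do_something` function so that it runs on its own; in practice you would import the real function instead.

```python
import argparse
import json


def do_something(path):
    """Hypothetical stand-in for cooperhewitt.example.code.do_something."""
    return {"path": path, "ok": True}


def main(argv=None):
    parser = argparse.ArgumentParser(description="Run do_something on some files.")
    parser.add_argument("paths", nargs="*", help="files to process")
    args = parser.parse_args(argv)

    results = []
    for path in args.paths:
        rsp = do_something(path)
        # Print one JSON document per file, much like the HTTP server would return.
        print(json.dumps(rsp))
        results.append(rsp)
    return results


if __name__ == "__main__":
    main()
```

The same `do_something` function sits behind both the HTTP pony and the command-line tool, which is the whole point: the library does the work and the wrappers stay thin.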
If you think about things in terms of bricks and mortar you start to notice that there is a bad habit in (software) engineering culture of trying to standardize the latter or to treat it as if, with enough care and forethought, it might achieve sentience.
That’s a thing we try to guard against. Bricks, in all their uniformity, are awesome but the point of a brick is to enable a multiplicity of outcomes so we prefer to leave those details, and the work they entail, to people rather than software libraries.
Most of this code has been open-sourced and hiding in plain sight for a while now but since we’re writing a blog post about it all, here is a list of related tools and libraries. These all fall into categories 2, 3 or 4 in the list above.
- cooperhewitt.swatchbook — Functions for working with colour palettes.
- cooperhewitt.roboteyes.atkinson — Functions for rendering halftone images using Bill Atkinson’s dithering technique in either pure Python (slow) or C (fast), if the atk library is available.
- cooperhewitt.roboteyes.colors — Functions for extracting colours from an image using a specific palette (as defined by the cooperhewitt.swatchbook library).
- cooperhewitt.roboteyes.opencv — A variety of OpenCV related functions. This one doesn’t really do anything yet but we’re including it here for good measure.
- cooperhewitt.roboteyes.shannon — Functions for measuring an image’s entropy and for calculating where to crop an image when generating thumbnails.
- cooperhewitt.roboteyes — A meta library whose only purpose is to install all the other py-cooperhewitt-roboteyes libraries at once.
- cooperhewitt.flask — Utility functions for writing Flask-based HTTP applications. The most important thing to remember about things in this class is that they are utility functions. They simply wrap some of the boilerplate tasks required to set up a Flask application but you will still need to take care of all the details.
- plumbing-atkinson-server — A simple Flask-based HTTP pony server to dither images.
- plumbing-shannon-server — A simple Flask-based HTTP pony server for extracting “Shannon-related” properties from images.
- plumbing-palette-server — A simple Flask-based HTTP pony server for extracting colors from images.
- plumbing-bauta-server — A simple Flask-based HTTP pony server for doing OpenCV related processing. This one, like cooperhewitt.roboteyes.opencv, doesn’t really do anything yet but it will and I just like saying “bauta” in the context of face detection.
Everything has a standard Python setup.py for installing all the required bits (and more importantly dependencies) in all the right places. Hopefully this will make it easier for us to break out little bits of awesomeness as free agents and share them with the world. The proof, as always, will be in the doing.
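To make the dithering mentioned in the list above a little more concrete, here is a minimal pure-Python sketch of Atkinson dithering on a 2D list of grey values. It is a generic rendition of the well-known technique, not the code in cooperhewitt.roboteyes.atkinson; the function name `atkinson_dither` and the default threshold are assumptions, and the input is assumed to be rectangular.

```python
def atkinson_dither(pixels, threshold=128):
    """Dither a 2D list of 0-255 grey values to pure black (0) and white (255)."""
    height = len(pixels)
    width = len(pixels[0]) if height else 0
    # Work on a copy so the caller's data is left untouched.
    work = [list(row) for row in pixels]
    # (dx, dy) offsets of the six neighbours that each receive 1/8 of the error.
    # Only 6/8 of the error is passed on, which gives Atkinson its contrasty look.
    neighbours = [(1, 0), (2, 0), (-1, 1), (0, 1), (1, 1), (0, 2)]
    for y in range(height):
        for x in range(width):
            old = work[y][x]
            new = 255 if old >= threshold else 0
            work[y][x] = new
            err = (old - new) / 8
            for dx, dy in neighbours:
                nx, ny = x + dx, y + dy
                if 0 <= nx < width and 0 <= ny < height:
                    work[ny][nx] += err
    return work


grey = [[100] * 6 for _ in range(4)]
for row in atkinson_dither(grey):
    print("".join("#" if value == 0 else "." for value in row))
```

Looping over every pixel in Python like this is exactly why the pure-Python path is the slow one.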
We’ve also released go-ucd which is a set of libraries and tools written in Go for working with Unicode data. Or more specifically, since for the time being they are not general-purpose Unicode tools, for looking up the corresponding ASCII name for a Unicode character.
For example:
    $> ucd 䍕
    NET; WEB; NETWORK, NET FOR CATCHING RABBIT
Or:
    $> ucd THIS → WAY
    LATIN CAPITAL LETTER T
    LATIN CAPITAL LETTER H
    LATIN CAPITAL LETTER I
    LATIN CAPITAL LETTER S
    SPACE
    RIGHTWARDS ARROW
    SPACE
    LATIN CAPITAL LETTER W
    LATIN CAPITAL LETTER A
    LATIN CAPITAL LETTER Y
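For comparison, a similar per-character lookup can be sketched in Python using the standard library's `unicodedata` module. This is not the go-ucd code or its Python equivalent; the function name `char_names` and the `<unnamed>` placeholder are made up for this example. Note also that `unicodedata` only knows formal character names, so for a CJK ideograph it returns a generic name of the form CJK UNIFIED IDEOGRAPH-XXXX rather than a descriptive gloss like the one in the first example.

```python
import unicodedata


def char_names(text):
    """Return the Unicode name of each character in `text`, in order."""
    # Characters without a formal name (control characters, say) get a placeholder.
    return [unicodedata.name(ch, "<unnamed>") for ch in text]


for name in char_names("THIS → WAY"):
    print(name)
```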
There is, of course, a handy “pony” server (called ucd-server) for asking these questions over HTTP:

    $> curl -X GET -s 'http://localhost:8080/?text=♕%20HAT' | python -mjson.tool
    {
        "Chars": [
            {
                "Char": "\u2655",
                "Hex": "2655",
                "Name": "WHITE CHESS QUEEN"
            },
            {
                "Char": " ",
                "Hex": "0020",
                "Name": "SPACE"
            },
            {
                "Char": "H",
                "Hex": "0048",
                "Name": "LATIN CAPITAL LETTER H"
            },
            {
                "Char": "A",
                "Hex": "0041",
                "Name": "LATIN CAPITAL LETTER A"
            },
            {
                "Char": "T",
                "Hex": "0054",
                "Name": "LATIN CAPITAL LETTER T"
            }
        ]
    }
This one, potentially, has a very real and practical use-case but it’s not something we’re quite ready to talk about yet. In the meantime, it’s a fun and hopefully useful tool so we thought we’d share it with you.
Note: There are equivalent libraries and an HTTP pony for ucd written in Python but they are incomplete compared to the Go version and may eventually be deprecated altogether.
Comments, suggestions and gentle clue-bats are welcome and encouraged. Enjoy!