Wednesday, January 13, 2016

Not using HTTPS on your website is like sending your users outside in just their underwear.

#ALAMW16 exhibits,
viewed from the escalator
This past weekend, I spent 3 full days talking to librarians, publishers, and library vendors about making the switch to HTTPS. The Library Freedom Project staffed a table in the exhibits at the American Library Association Midwinter meeting. We had the best location we could possibly wish for, and we (Alison Macrina, Nima Fatemi, Jennie Rose Halperin and myself) talked our voices hoarse with anyone interested in privacy in libraries, which seemed to be everyone. We had help from Jason Griffey and Andromeda Yelton (who were next to us, showing off the cutest computers in town for the "Measure the Future" project).

Badass librarians with
framed @snowden tweet.
We had stickers, we had handouts. We had ACLU camera covers and 3D-printed logos. We had new business cards. We had a framed tweet from @Snowden praising @libraryfreedom and "Badass Librarians", who were invited to take selfies.
Apart from helping to raise awareness about internet privacy, talking to lots of real people can help hone a message. Some people didn't really get encryption, and a few were all "What??? Libraries don't use encrypted connections???" By the end of the first day, I had the message down to the one sentence:
Not using HTTPS on your website is like sending your users outside in just their underwear.
Because, if you don't use HTTPS, people can see everything, and though there's nothing really WRONG with not wearing clothes outside, we live in a society where wearing them is, by custom, the respectful thing to do. There are many excellent reasons to preserve our users' privacy, but many of the reasons tend to highlight the needs of other people. The opposing viewpoint is often "Privacy is a thing of the past, just get over it" or "I don't have anything to hide, so why work hard so you can keep all your dirty secrets?" But most people don't think wearing clothes is a thing of the past; connecting encrypted connections with nice clothes just normalizes the normal.

We've previously used the analogy that HTTP is like sending postcards while HTTPS is like sending notes in envelopes. This is a harder analogy to use in a 30-second explainer because you have to make a second argument that websites shouldn't be sent on postcards.

We need to craft better slogans because there's a lot of anti-crypto noise trying to apply an odor of crime and terrorism to good privacy and security practices. The underwear argument is effective against that - I don't know anyone that isn't at least a bit creeped out by the "unclothing" done by the TSA's full body scanners.

No Pants Subway Ride 2015: cosmetic trierarchs CC BY-NC-ND by captin_nod

Maybe instead of green lock icons for HTTPS, browser software could display some sort of flesh-tone nudity icon for unencrypted HTTP connections. That might change user behavior rather quickly. I don't know about you, but I never lose sleep over door locks; I do have nightmares about going out without my pants!

Saturday, January 2, 2016

The Best eBook of 2015: "What is Code?"

When the Compact Disc was being standardized, its capacity was set to accommodate the length of Beethoven's Ninth Symphony, reportedly at the insistence of Sony executive Norio Ohga. In retrospect it seems obvious that a media technology should adapt to media art wherever possible, not vice versa. This is less possible when new media technologies enable new forms of creation, but that's what makes them so exciting.

I've been working primarily on ebooks for the past 5 years, mostly because I'm excited at the new possibilities they enable. I'm impressed - and excited - when ebooks do things that can't be done for print books, partly because ebooks often can't capture the most innovative uses of ink on paper.

Looking back on 2015, there was one ebook more than any other that demonstrated the possibilities of the ebook as an art form, while at the same time being fun, captivating, and awe-inspiring: Paul Ford's What Is Code?

Unfortunately, today's ebook technology standards can't fully accommodate this work. The compact disc of ebooks can store only four and a half movements of Beethoven's Ninth. That makes me sad.

You might ask, how does What Is Code? qualify as an ebook if it doesn't quite work on a Kindle or your favorite ebook app? What Is Code? was conceived and developed as an HTML5 web application for Business Week magazine, not with the idea of making an ebook. Nonetheless, What Is Code? uses the forms and structures of traditional books. It has a title page. It has chapters, sections, footnotes and a table of contents which support a linear narrative. It has marginal notes, figures and asides.

Despite its bookishness, it's hard to imagine What Is Code? in print. Although the share buttons and video embeds are mostly adornments for the text, activity modules are core to the book's exposition. The book is about code, and by bringing code to life, it immerses the reader in its subject matter. There's a greeter robot that waves and seems to know the reader, showing the ebook's "intelligence". The "how do you type an 'A'" activity in section 2.1 is a script worth a thousand words, and the "mousemove" activity in section 6.2 is a revelation even to an experienced programmer. If all that weren't enough, there's a random, active background that manages to soothe more than it distracts.

Even with its digital doodads, What Is Code? can be completely self-contained and portable. To demonstrate this, I've packaged it up and archived it at Internet Archive; you can download it with this link (21MB). Once you've downloaded it, unzip it and load the "index.html" file into a modern browser. Everything will work fine, even if you turn off your internet connection. What Is Code? will continue to work after Business Week disappears from the internet (or behind the most censorious firewall). [1]

I was curious how much of What is Code? could be captured in a standard EPUB ebook file. I first tried making an EPUB version 2 file with Calibre. The result was not as lame as I thought it would be, but stripped of interactivity, it seemed like a photocopy of a sticker book - the story's there, but the fun, not so much. Same with the Kindle version.

I hoped that more of the scripts would work with an EPUB 3 file. This is more or less the same as the zipped html file I made, but I was unable to get it to display properly in iBooks despite 2 days of trying. Perhaps someone more experienced with javascript in EPUB3 could manage it. The display in Calibre was a bit better. Readium, the flagship software for EPUB3, just sat there spinning a cursor. It seems that the scripts handling the vertical swipe convention of the web conflict with the more bookish pagination attempted by iBooks.

The stand-alone HTML zip archive that I made addresses most of the use cases behind EPUB. The text is reflowable and user-adjustable. Elements adjust nicely to the size of the screen from laptop to smartphone. The javascript table of contents works the same as in an ebook reader. Accessibility could be improved, but that's mostly a matter of following accessibility standards that aren't specific to ebooks.

My experimentation with the code behind What Is Code? is another exciting aspect of making books into ebooks. Code and digital text can use open licenses [2] that permit others to use, re-use and learn from What Is Code?. The entire project archive is hosted on GitHub and to date has been enhanced 671 times by 29 different contributors. There have been 187 forks (like mine) of the project. I think this collaborative creation process will be second nature to the ebook of the future.

There have been a number of proposals for portable HTML5 web archive formats for ebook technology moving forward. Among these are "5DOC" and W3C's "Portable Web Platform". As far as I can tell, these proposals aren't getting much traction or developer support. To succeed, the format has to be very lightweight and useful, or be supported by at least 2 of Amazon, Apple, and Google. I hope someone succeeds at this.

Whatever happens I hope there's room for Easter Eggs in the future of the ebook. There's a "secret" keyboard combination that triggers a laugh-out-loud Easter Egg on What is Code? And if you know how to look at What Is Code?'s javascript console, you'll see a message that's an appropriate ending for this post:


Best of 2015, don't you agree?

[1] To get everything in What Is Code? to work without an internet connection, I needed to add a small number of remotely loaded resources and fix a few small javascript bugs specific to loading from a file. (If you must know, pushing to the document.history of a file isn't allowed.) The YouTube embed is blank, of course, and a horrible, gratuitous Flash widget needed to be excised. You can see the details on GitHub.

[2] In this case, the Apache License and the Creative Commons By-NC-ND License.

Thursday, December 31, 2015

A New Year's Resolution for Publishers and Libraries: Switch to HTTPS

The endorsement list for the Library Digital Privacy Pledge of 2015-2016 is up, and ready for you to add the name of your organization. We added the "-2016" part, because various things took longer than we thought.

Everything takes longer than you think it will. Web development, business, committee meetings, that blog post. Over the past few months, I've talked to all sorts of people about switching to HTTPS. Librarians, publishers, technologists. Library directors, CEOs, executive editors, engineering managers. Everyone wants to do it, but there are difficulties and complications, many of them small and some of them sticky. It's clear that we all have to work together to make this transition happen.

The list will soon get a lot longer, because a lot of people wanted to meet about it at the ALA Midwinter meeting just 1 week away OMG it's so soon! Getting it done is the perfect New Year's resolution for everyone in the world of libraries.

Here's what you can do:

If you're a Publisher...

... you probably know you need to make the switch, if for no other reason than the extra search engine ranking. By the end of the year, don't be surprised if non-secure websites look unprofessional, which is not what a publisher wants to project.

If you're a Librarian...

... you probably recognize the importance of user privacy, but you're at the mercy of your information and automation suppliers. If those publishers and suppliers haven't signed the pledge, go and ask them why not. And where you control a service, make it secure!

If you're a Library Technology Vendor...

... here's your opportunity to be a hero. You can now integrate security and privacy into your web solution without the customer paying for certificates. So what are you waiting for?

If you're a Library user...

... ask your library if their services are secure and private. Ask publishers if their services are immune to eavesdropping and corruption. If those services are delivered without encryption, the answer is NO!

Everything takes longer than you think it will. Until it happens faster than you can imagine. Kids grow up so fast!

Tuesday, December 22, 2015

xISBN: RIP


When I joined OCLC in 2006 (via acquisition), one thing I was excited about was the opportunity to make innovative uses of OCLC's vast bibliographic database. And there was an existence proof that this could be done: a neat little API, prototyped in OCLC's Office of Research, called xISBN.

xISBN was an example of a microservice: it offered a small piece of functionality and it did it very fast. Throw it an ISBN, and it would give you back a set of related ISBNs. Ten years ago, microservices and mashups were all the rage. So I was delighted when my team was given the job of "productizing" the xISBN service, moving it out of research and into the marketplace.

Last week, I was sorry to hear about the imminent shutdown of xISBN. But it got me thinking about the limitations of services like xISBN and why no tears need be shed on its passing.

The main function of xISBN was to say "Here's a group of books that are sort of the same as the book you're asking about." That summary instantly tells you why xISBN had to die, because any time a computer tells you something "sort of", it's a latent bug. Where you draw the line between something that's the same and something that's different is a matter of opinion and depends on the use you want to make of the distinction. For example, if you ask for A Study in Scarlet, you might be interested in a version in Chinese, or you might be interested to get a paperback version, or you might want to get Sherlock Holmes compilations that included A Study in Scarlet. For each question you want a slightly different answer. If you are a developer needing answers to these questions, you would combine xISBN with other information services to get what you need.

Today we have better ways to approach this sort of problem. Serious developers don't want a microservice, they want rich "Linked Data". In 2015, most of us can afford our own data-crunching big-data-stores-in-the-cloud and we don't need to trust algorithms we can't control. OCLC has been publishing rather nice Linked Data for this purpose. So, if you want all the editions for Cory Doctorow's Homeland, you can "follow your nose" and get all the data you need.

  1. First you look up the ISBN at http://www.worldcat.org/isbn/9780765333698
  2. which leads you to http://www.worldcat.org/oclc/795174333.jsonld (containing a few more ISBNs).
  3. From there you can follow the associated "work" record: http://experiment.worldcat.org/entity/work/data/1172568223
  4. which yields a bunch more ISBNs.

It's a lot messier than xISBN, but that's mostly because the real world is messy. Every application requires a different sort of cleaning up, and it's not all that hard.
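
As a rough illustration of the "follow your nose" steps, here is a minimal shell sketch. The function names are hypothetical, the URL patterns are simply copied from the example above (they may no longer resolve), and pulling ISBNs out of a response with a regular expression is an assumption that stands in for real JSON-LD parsing.

```shell
#!/bin/sh
# Hypothetical helpers; the URL patterns are taken from the worked example.

# Step 1: the WorldCat lookup URL for an ISBN.
isbn_url() {
    printf 'http://www.worldcat.org/isbn/%s\n' "$1"
}

# Step 2: the JSON-LD record URL for an OCLC number.
oclc_jsonld_url() {
    printf 'http://www.worldcat.org/oclc/%s.jsonld\n' "$1"
}

# Steps 2 and 4: crude extraction of ISBN-13 strings from stdin.
# Assumption: a regex is good enough for hints; it is not a JSON-LD parser.
extract_isbn13() {
    grep -oE '97[89][0-9]{10}' | sort -u
}

# Fetch a URL (following redirects) and list the ISBN-13s found in the body.
isbns_at() {
    curl -sL "$1" | extract_isbn13
}
```

Something like `isbns_at "$(oclc_jsonld_url 795174333)"` would then correspond to step 2, and the same function pointed at the "work" record URL to step 4. The cleaning up mentioned above (deciding which of those ISBNs count as "the same book" for your purposes) is deliberately left out.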

If cleaning up the mess seems too intimidating, and you just want light-weight ISBN hints from a convenient microservice, there's always "thingISBN". ThingISBN is a data exhaust stream from the LibraryThing catalog. To be sustainable, microservices like xISBN need to be exhaust streams. The big cost to any data service is maintaining the data, so unless maintaining that data is in the engine block of your website, the added cost won't be worth it. But if you're doing it anyway, dressing the data up as a useful service costs you almost nothing and benefits the environment for everyone. Let's hope that OCLC's Linked Data services are of this sort.

In thinking about how I could make the data exhaust from Unglue.it more ecological, I realized that a microservice connecting ISBNs to free ebook files might be useful. So with a day of work, I added the "Free eBooks by ISBN" endpoint to the Unglue.it API.

xISBN, you lived a good micro-life. Thanks.

Wednesday, November 11, 2015

Using Let's Encrypt to Secure an Elastic Beanstalk Website

Since I've been pushing the library and academic publishing community to implement HTTPS on all their information services, I was really curious to see how the new Let's Encrypt (LE) certificate authority is really working, with its "general availability" date imminent. My conclusion is that "general availability" will not mean "general usability" right away; its huge impact will take six months to a year to arrive. For now, it's really important for the community to put our developers to work on integrating Let's Encrypt into our digital infrastructure.

I decided to secure the www.gitenberg.org website as my test example. It's still being developed, and it's not quite ready for use, so if I screwed up it would be no disaster. Gitenberg.org is hosted using Elastic Beanstalk (EB) on Amazon Web Services (AWS), which is a popular and modern way to build scalable web services. The servers that Elastic Beanstalk spins up have to be completely configured in advance; you can't just log in and write some files. And EB does its best to keep servers serving. It's no small matter to shut down a server and run some temporary server, because EB will spin up another server to handle rerouted traffic. These characteristics of Elastic Beanstalk exposed some of the present shortcomings and future strengths of the Let's Encrypt project.

Here's the mission statement of the project:
Let’s Encrypt is a free, automated, and open certificate authority (CA), run for the public’s benefit.
While most of us focus on the word "free", the more significant word here is "automated":
Automatic: Software running on a web server can interact with Let’s Encrypt to painlessly obtain a certificate, securely configure it for use, and automatically take care of renewal.
Note that the objective is not to make it painless for website administrators to obtain a certificate, but to enable software to get certificates. If the former is what you want, in the near term, then I strongly recommend that you spend some money with one of the established certificate authorities. You'll get a certificate that isn't limited to 90 days, as the LE certificates are, you can get a wildcard certificate, and you'll be following the manual procedure that your existing web server software expects you to be following.

The real payoff for Let's Encrypt will come when your web server applications start expecting you to use the LE methods of obtaining security certificates. Then, the chore of maintaining certificates for secure web servers will disappear, and things will just work. That's an outcome worth waiting for, and worth working towards today.

So here's how I got Let's Encrypt working with Elastic Beanstalk for gitenberg.org.

The key thing to understand here is that before Let's Encrypt can issue me a certificate, I have to prove to them that I really control the hostname that I'm requesting a certificate for. So the Let's Encrypt client has to be given access to a "privileged" port on the host machine designated by DNS for that hostname. Typically, that means I have to have root access to the server in question.

In the future, Amazon should integrate a Let's Encrypt client with their Beanstalk Apache server software so all this is automatic, but for now we have to use the Let's Encrypt "manual mode". In manual mode, the Let's Encrypt client generates a cryptographic "challenge/response", which then needs to be served from the root directory of the gitenberg.org web server.

Even running Let's Encrypt in manual mode required some jumping through hoops. It won't run on Mac OS X. It doesn't yet support the flavor of Linux used by Elastic Beanstalk, so it does no good configuring Elastic Beanstalk to install it there. Instead I used the Let's Encrypt Docker container, which works nicely, and I ran a Docker-Machine inside "virtualbox" on my Mac.

Having configured Docker, I ran
docker run -it --rm -p 443:443 -p 80:80 --name letsencrypt \
-v "/etc/letsencrypt:/etc/letsencrypt" \
-v "/var/lib/letsencrypt:/var/lib/letsencrypt" \
quay.io/letsencrypt/letsencrypt:latest -a manual -d www.gitenberg.org \
--server https://acme-v01.api.letsencrypt.org/directory auth
 
(The --server option requires your domain to be whitelisted during the beta period.) After paging through some screens asking for my email address and permission to log my IP address, the client responded with
Make sure your web server displays the following content at http://www.gitenberg.org/.well-known/acme-challenge/8wBDbWQIvFi2bmbBScuxg4aZcVbH9e3uNrkC4CutqVQ before continuing:
8wBDbWQIvFi2bmbBScuxg4aZcVbH9e3uNrkC4CutqVQ.hZuATXmlitRphdYPyLoUCaKbvb8a_fe3wVj35ISDR2A
To do this, I configured a virtual directory "/.well-known/acme-challenge/" in the Elastic Beanstalk console with a mapping to a "letsencrypt/" directory in my application (configuration page, software configuration section, static files section). I then made a file named "8wBDbWQIvFi2bmbBScuxg4aZcVbH9e3uNrkC4CutqVQ" with the specified content in my letsencrypt directory, committed the change with git, and deployed the application with the Elastic Beanstalk command line interface. After waiting for the deployment to succeed, I checked that http://www.gitenberg.org/.well-known/acme-challenge/8wBD... responded correctly, and then hit <enter>. (Though the LE client tells you that the MIME type "text/plain" MUST be sent, Elastic Beanstalk sets no MIME header, which is allowed.)
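
For anyone repeating this, the file-creation step could be scripted roughly as follows. This is only a sketch: the function name `write_challenge` is made up, and it assumes (as in the example above) that the file name is the part of the content string before the first dot.

```shell
#!/bin/sh
# Hypothetical helper: write an ACME challenge file for static serving.
# $1 = the full content string printed by the client ("<token>.<rest>")
# $2 = directory mapped to /.well-known/acme-challenge/ (default: letsencrypt)
write_challenge() {
    content=$1
    dir=${2:-letsencrypt}
    token=${content%%.*}          # everything before the first dot
    [ -n "$token" ] && [ "$token" != "$content" ] || {
        echo "expected <token>.<rest>" >&2
        return 1
    }
    mkdir -p "$dir" || return 1
    # No trailing newline, so the file holds exactly the content string.
    printf '%s' "$content" > "$dir/$token"
    # Print the path that was written, for use in a commit or a quick check.
    echo "$dir/$token"
}
```

Called with the content string the client printed, this leaves a file in the letsencrypt directory ready to be committed and deployed as described. It does nothing about the virtual directory mapping, which still has to be set up in the console.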

And SUCCESS!
IMPORTANT NOTES:
 - Congratulations! Your certificate and chain have been saved at
   /etc/letsencrypt/live/www.gitenberg.org/fullchain.pem. Your cert
   will expire on 2016-02-08. To obtain a new version of the
   certificate in the future, simply run Let's Encrypt again.
...except since I was running Docker inside virtualbox on my Mac, I had to log into the docker machine and copy three files out of that directory (cert.pem, privkey.pem, and chain.pem). I put them in my local <.elasticbeanstalk> directory. (See this note for a better way to do this.)

The final step was to turn on HTTPS in elastic beanstalk. But before doing that, I had to upload the three files to my AWS Identity and Access Management Console. To do this, I needed to use the aws command line interface, configured with admin privileges. The command was
aws iam upload-server-certificate \
--server-certificate-name gitenberg-le \
--certificate-body file://<.elasticbeanstalk>/cert.pem \
--private-key file://<.elasticbeanstalk>/privkey.pem \
--certificate-chain file://<.elasticbeanstalk>/chain.pem
One more trip to the Elastic Beanstalk configuration console (network/load balancer section), and gitenberg.org was on HTTPS.
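
Since LE certificates run out after 90 days (this one on 2016-02-08), the whole manual procedure has to be repeated before then. A small check along these lines could serve as a reminder. It is a sketch, not part of what I actually ran: the function name `cert_needs_renewal` and the 30-day default window are assumptions, and it relies on the `-checkend` option of `openssl x509`.

```shell
#!/bin/sh
# Hypothetical helper: succeed (exit 0) when a certificate should be renewed.
# $1 = path to a PEM certificate, $2 = warning window in days (default 30)
cert_needs_renewal() {
    cert=$1
    days=${2:-30}
    [ -r "$cert" ] || { echo "cannot read $cert" >&2; return 2; }
    # -checkend N exits 0 if the certificate is still valid N seconds from now.
    if openssl x509 -checkend "$((days * 86400))" -noout -in "$cert" >/dev/null 2>&1; then
        return 1    # not yet inside the warning window
    fi
    return 0        # expires within the window (or could not be parsed)
}
```

For example, `cert_needs_renewal cert.pem 30 && echo "time to run Let's Encrypt again"` prints the reminder only when the copy of cert.pem it is pointed at has less than 30 days left.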


Given that my sys-admin skills are rudimentary, the fact that I was able to get Let's Encrypt to work suggests that they've done a pretty good job of making the whole process simple. However, the documentation I needed was non-existent, apparently because the LE developers want to discourage the use of manual mode. Figuring things out required a lot of error-message googling. I hope this post makes it easier for people to get involved to improve that documentation or build support for Let's Encrypt into more server platforms.

(Also, given that my sys-admin skills are rudimentary, there are probably better ways to do what I did, so beware.)

If you use web server software developed by others, NOW is the time to register a feature request. If you are contracting for software or services that include web services, NOW is the time to add a Let's Encrypt requirement into your specifications and contracts. Let's Encrypt is ready for developers today, even if it's not quite ready for rank and file IT administrators.

Update (11/12/2015):
I was alerted to the fact that while https://www.gitenberg.org was working, https://gitenberg.org was failing authentication. So I went back and did it again, this time specifying both hostnames. I had to guess at the correct syntax. I also tested out the suggestion from the support forum to get the certificates saved in my Mac's filesystem. (It's worth noting here that the community support forum is an essential and excellent resource for implementers.)

To get the multi-host certificate generated, I used the command:
docker run -it --rm -p 443:443 -p 80:80 --name letsencrypt \
-v "/Users/<my-mac-login>/letsencrypt/etc/letsencrypt:/etc/letsencrypt" \
-v "/Users/<my-mac-login>/letsencrypt/etc/letsencrypt/var/lib/letsencrypt:/var/lib/letsencrypt" \
-v "/Users/<my-mac-login>/letsencrypt/var/log/letsencrypt:/var/log/letsencrypt" \
quay.io/letsencrypt/letsencrypt:latest -a manual \
-d www.gitenberg.org -d gitenberg.org \
--server https://acme-v01.api.letsencrypt.org/directory auth
This time, I had to go through the challenge/response procedure twice, once for each hostname.

With the certs saved to my filesystem, the upload to AWS was easier:
aws iam upload-server-certificate \
--server-certificate-name gitenberg-both \
--certificate-body file:///Users/<my-mac-login>/letsencrypt/etc/letsencrypt/live/www.gitenberg.org/cert.pem \
--private-key file:///Users/<my-mac-login>/letsencrypt/etc/letsencrypt/live/www.gitenberg.org/privkey.pem \
--certificate-chain file:///Users/<my-mac-login>/letsencrypt/etc/letsencrypt/live/www.gitenberg.org/chain.pem
And now, traffic on both hostnames is secure!

Resources I used:

Update 12/6/2015: Let's Encrypt is now in public beta; anyone can use it. I've added details about creating the virtual directory in response to a question on Twitter.