JakeSavin.com

“Have no fear of perfection—you’ll never reach it.” – Salvador Dalí

Hello World!

If you see this, you’re looking at all of my old site’s content running on my self-hosted WordPress server… Phew!

I think that’s enough for tonight, but there will be more soon. I’ve got a bunch of details I want to write up about this project. Plus now that I have the tools I wrote to migrate from Manila to WordPress, I’ve got a bunch of other old content I want to migrate.

I suppose I should first figure out some redirects though, at least so my RSS subscribers don’t all break.

Stay tuned… ;-)


Ps. All of the <guid>s in my feed have now changed to a new format. I apologize that your RSS aggregator is probably about to freak out. Fixing this was sadly not worth the effort at this point. :-(

Pps. I realize it’s now 4:30 am, but I couldn’t let my links rot. I modified my WordPress permalink format to make legacy incoming links continue to work. I still need to do some testing, but for the moment things are much better than having basically every incoming permalink go 404.
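(For anyone curious about the mechanics: the trick is to pick a WordPress permalink structure that lines up with the old date-based URLs, and redirect whatever's left over. The sketch below is purely illustrative; the actual old and new URL formats on this site aren't spelled out here, so the patterns are assumptions.)

```python
# Purely illustrative sketch. The real fix was a WordPress permalink setting,
# not code, and the URL pattern below is an assumption about what a legacy
# Manila-style day-page path might look like.
import re
from typing import Optional

# Assumed legacy path, e.g. /2014/09/06.html
OLD_DAY_PAGE = re.compile(r"^/(\d{4})/(\d{2})/(\d{2})\.html$")

def legacy_to_wordpress(path: str) -> Optional[str]:
    """Map an assumed Manila-style day-page path onto a date-based
    WordPress permalink structure (/%year%/%monthnum%/%day%/)."""
    m = OLD_DAY_PAGE.match(path)
    if not m:
        return None
    year, month, day = m.groups()
    return f"/{year}/{month}/{day}/"

if __name__ == "__main__":
    print(legacy_to_wordpress("/2014/09/06.html"))  # -> /2014/09/06/
```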

Porting to WordPress: Worknotes Part 1

About a week ago I started a project to port this site from Manila to WordPress. While there are probably very few Manila users still out there who might want to do this, I thought it would still be a useful exercise to document the process here, in case anything I’ve learned might be useful.

This is something I have been wanting to do in my spare time for many months now — probably two years or more. But with family and work obligations, a couple of job changes, and a move from Woodinville to Seattle last fall, carving out the time to do this well was beyond my capacity.

Now that I’m recently between jobs, the only big obligation I have outside of spending time with my wife and son is to find another job. Some learning can’t hurt that effort. Plus I seem to have a surplus of spare time on my hands right at the moment.

Managed or Self-hosted?

The first question I needed to answer before even starting this process is whether I want to host on a managed service (most likely WordPress.com), or if I should self-host. There are trade-offs either way.

The biggest advantages of the managed option come from the very fact that the servers are run by someone else. I wouldn’t have to worry about network outages, hardware failures, software installation and updates, and applying an endless stream of security patches.

But some of the same features that make a hosted solution attractive are also limiting. I would have limited control over customization. I wouldn’t be able to install additional software alongside WordPress. I would be limited to the number of sub-sites I was willing to pay for. I wouldn’t necessarily have direct access to the guts of the system (database, source code, etc.).

Most importantly, I wouldn’t be in control of my web presence end-to-end — something which has been important to me ever since I first started publishing my own content on the Web in 1997.

There’s one more advantage of self-hosting which is important to me: I want to learn how WordPress itself actually works. I want to understand what’s actually required to administer a server, and also start learning about the WordPress source code. A fringe benefit is learning some PHP. While some web developers prefer alternative languages like Ruby, Python, or Node.js, the install base of WordPress is so enormous that, from a professional development perspective, learning some PHP is a pretty sensible thing to do.

I decided to go self-hosted, on my relatively new Synology DS412+ NAS. It’s more than capable of running the site along with the other services I use it for. It’s always on, always connected to the Internet, and has RAID redundancy, which will at least somewhat limit the risks associated with hardware failure.

Develop a Strategy

The next thing I needed to work out was an overarching plan for how to do this.

Aside from getting WordPress installed and running on my NAS, how the heck am I going to get all the data ported over?

First, I made a list of what’s actually on the site:

  1. A bunch of blog posts (News Items) sitting in a Frontier object database
  2. Comments on those posts
  3. A small amount of discussion in the threaded discussion group
  4. User accounts for everyone who commented or posted in the discussion group
  5. A bunch of pictures and other media files
  6. A few “stories”
  7. Some “departments” that blog posts live in
  8. A site structure that put some pages on friendlier URLs
  9. Logs and stats that I don’t care much about
  10. A sub-site I never posted much to, and abandoned years ago

For the most part, there aren’t any types of information that don’t have an analogue in WordPress. News items are blog posts, comments are comments, stories are pages, pictures are image attachments, departments are categories. The stats and logs I’m happy to throw away. I’m not sure what to do with the site structure, but if push comes to shove, I can just use .htaccess files to redirect the old URLs to their new homes.

Next I needed a development environment — someplace where I can write and refine code that would extract the data and get it into WordPress.

On the Manila side, I did some work a little over a year ago to get Manila nominally working in Dave Winer’s OPML Editor, which is based on the same kernel and foundation as UserLand Frontier, the environment Manila was originally developed on. The nice thing about this is that I have a viable development environment that I can use separately from the Manila server that’s currently running the production site.

On the WordPress side it makes sense to just host my development test sites on my MacBook Air, and then once I have the end-to-end porting process working well, actually port to my production server — the Synology NAS.

Data Transmogrification

Leaving the media files and comments aside for a moment, I needed to make a big decision about how to get the blog post data out of my site, and into my WordPress site. This was going to involve writing code somewhere to pull the data out, massage it in an as-yet unknown way, and then put it somewhere that WordPress could use it to (re-)build the site.

It seemed like there were about five ways to go and maybe only one or two good ones. Which method I picked would determine how hard this would actually be, how long it might take, and if it’s even feasible at all.

Method 1: Track down Erin Clerico

A bunch of years ago, Erin Clerico (a long-time builder and host of Manila sites in the 2000s) had developed some tools to port Manila sites to WordPress.

As it turned out, a couple years back I’d discussed with Erin the possibility of porting my site using his tools. Sadly he was no longer maintaining them at that time.

If I remembered correctly, his tools used the WordPress API to transmit the content into WordPress from a live Manila server — I have one of those. It might be possible, I thought, to see if Erin would share his code with me, and I could update and adapt it as necessary for my site, and the newer versions of WordPress itself.

But this was unknown territory: I’ve never looked at Erin’s code, know very little about what may have changed over the years in the WordPress API, and don’t even know if Erin still has that code anywhere.

Method 2: Use the WordPress API

I could of course write my own code from scratch that sends content to WordPress via its API.

This would be a good learning exercise, since I would get to know the API well. And the likelihood that WordPress will do the right thing with the data I send it is obviously pretty high. Since that component is widely used, it’s probably quite well tested and robust.

This approach would also work equally well, no matter where I decided to host the site — on my own server or whatever hosted service I chose.

But there are potential problems:

  • Manila/Frontier may speak a different dialect on the wire than WordPress — I haven’t tested it myself.
  • Client/server debugging can be a pain, unless you have good debugging tools on both sides of the connection. I’ve got great tools on the Manila side, but basically no experience debugging web services in PHP on the WordPress side.
  • It’s likely to be slow because of all the extra work the machines will have to do in order to get the data to go between the “on-the-wire” format and their native format. (This will also make debugging more tedious.)
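(For a sense of what Method 2 would involve on the wire, here’s a minimal sketch in Python rather than UserTalk, assuming WordPress’s standard XML-RPC endpoint (xmlrpc.php) and its wp.newPost method. The site URL, credentials, and post content are placeholders, and the real exporter would of course live on the Manila/Frontier side.)

```python
# Sketch of pushing one post into WordPress over its XML-RPC API.
# Assumes the standard xmlrpc.php endpoint and the wp.newPost method;
# URL, credentials, and content below are placeholders.
import datetime
import xmlrpc.client

WP_ENDPOINT = "https://example.com/xmlrpc.php"   # placeholder
USERNAME = "admin"                               # placeholder
PASSWORD = "secret"                              # placeholder

def push_post(title, html_body, posted_at, categories):
    server = xmlrpc.client.ServerProxy(WP_ENDPOINT)
    content = {
        "post_type": "post",
        "post_status": "publish",
        "post_title": title,
        "post_content": html_body,
        "post_date": posted_at,                  # datetime -> dateTime.iso8601
        "terms_names": {"category": categories},
    }
    # The first argument (blog_id) is required but ignored on single-site installs.
    return server.wp.newPost(0, USERNAME, PASSWORD, content)

if __name__ == "__main__":
    post_id = push_post("Hello World!", "<p>Migrated from Manila.</p>",
                        datetime.datetime(2014, 9, 6, 4, 30), ["Worknotes"])
    print("Created post", post_id)
```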

Method 3: Use Manila’s RSS generator

Of course Manila speaks RSS (duh). And WordPress has an RSS import tool — Cool!

In theory I should be able to set Manila’s RSS feed to include a very large number of items (say 5,000), and then have WordPress read and import from the feed.

The main problem here is that I would lose all the comments. Also I’m not sure what happens to images and the like. Would they be imported too? Or would I have to go through every post that has a picture, upload the picture, and edit the post to link to the new URL?

I’m less worried about the images, since I can just maintain them at their current URLs. It’s a shame not to have real attachment objects in my WordPress site, but not the end of the world.

Loss of the comments however would be a let-down to my users, and would also limit the export tool’s potential usefulness for other people (or my other sites).

Method 4: Make Manila impersonate another service

In theory it should be possible to make Manila expose RPC interfaces that work just like Blogger, LiveJournal, or Tumblr. WordPress has importers that work with all of these APIs against the original services.

Assuming there aren’t limitations of Frontier (for example no HTTPS, or complications around authentication) that would prevent this from working, this should get most or all of the information I want into WordPress.

But there are limitations with some of the importers:

  • The Tumblr importer imports posts and media, but I’d lose comments and users (commenters’ identities)
  • The LiveJournal importer seems to only understand posts
  • The Movable Type and TypePad importer imports from an export/backup file, and understands posts and comments, but not media

The only importer that appears to work directly from an API, and supports posts, comments, and users is the Blogger importer. (It doesn’t say it’ll pick up media however.)

In the Movable Type / TypePad case, I’d have to write code to export to their file format, and it’s not clear what might get lost in that process. It’s probably also roughly the same amount of work that would be needed to export to WordPress’ own WXR format (see below), so that’s not a clear win.

When it comes to emulating the APIs of other services (Blogger, Tumblr, LiveJournal), there’s potentially a large amount of work involved, and except for Blogger, there would be missing data. There’s also the non-trivial matter of learning those APIs. (If I’m going to learn a new API, I’d rather learn the WordPress API first.)

Method 5: Make Manila pretend to be WordPress

While researching the problem, I discovered quickly that WordPress itself exports to a format they call WXR, which stands for WordPress eXtended RSS. Basically it’s an XML file containing an RSS 2.0 feed, with additional elements in an extension namespace (wp:). The extension elements provide additional information for posts, and also add comments and attachment information.

On first glance, this seemed like the best approach, since I wouldn’t be pretending to understand the intricacies of another service, and instead would be speaking RSS with the eXtended WordPress elements — a format that WordPress itself natively understands.

Also since I’m doing a static file export, my code-test-debug cycle should be tighter: More fun to do the work, and less time overall.
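(To make that concrete, here’s a small sketch in Python rather than UserTalk of the kind of file the exporter needs to produce: an RSS 2.0 document with extra elements in the wp: namespace. The element and namespace names follow WXR 1.2 as I understand it, but they’re worth double-checking against a file exported from a real WordPress site.)

```python
# Sketch of a minimal WXR-style export: RSS 2.0 plus wp: extension elements.
# Element and namespace names follow WXR 1.2 as I understand it; verify them
# against a real WordPress export before relying on this.
import xml.etree.ElementTree as ET

NS = {
    "wp": "http://wordpress.org/export/1.2/",
    "content": "http://purl.org/rss/1.0/modules/content/",
    "dc": "http://purl.org/dc/elements/1.1/",
}
for prefix, uri in NS.items():
    ET.register_namespace(prefix, uri)

def q(prefix, tag):
    """Build a namespace-qualified tag name."""
    return f"{{{NS[prefix]}}}{tag}"

rss = ET.Element("rss", {"version": "2.0"})
channel = ET.SubElement(rss, "channel")
ET.SubElement(channel, "title").text = "JakeSavin.com"
ET.SubElement(channel, q("wp", "wxr_version")).text = "1.2"

item = ET.SubElement(channel, "item")
ET.SubElement(item, "title").text = "Hello World!"
ET.SubElement(item, q("dc", "creator")).text = "jake"            # author login
ET.SubElement(item, q("content", "encoded")).text = "<p>Post body.</p>"
ET.SubElement(item, q("wp", "post_type")).text = "post"
ET.SubElement(item, q("wp", "status")).text = "publish"
ET.SubElement(item, q("wp", "post_date")).text = "2014-09-06 04:30:00"
ET.SubElement(item, "category",
              {"domain": "category", "nicename": "worknotes"}).text = "Worknotes"

# Comments ride along inside the item as wp:comment elements.
comment = ET.SubElement(item, q("wp", "comment"))
ET.SubElement(comment, q("wp", "comment_author")).text = "A Reader"
ET.SubElement(comment, q("wp", "comment_content")).text = "Welcome back!"

ET.ElementTree(rss).write("export.wxr", encoding="utf-8", xml_declaration=True)
```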

Method 6: Reverse-engineer the WordPress database schema

I did briefly consider diving into MySQL and trying to understand how WordPress stores data in the database itself. It’s theoretically possible to have Manila inject database records into MySQL directly, and then WordPress would be none the wiser that the data didn’t come from WordPress itself.

This idea is pretty much a non-starter for this project though, for the primary reason that reverse-engineering anything is inherently difficult, and the likelihood that I would miss something important and not realize it until much later is pretty high.

Time to get started!

I decided on Method 5: Make Manila pretend to be WordPress. It’s the easiest overall from a coding perspective, the least different from things I already know (RSS 2.0 + extensions), and should support all of the data that I want to get into WordPress from my site. It also has the advantage of being likely to work regardless of whether I stick with the decision to self-host, or decide to host at WordPress.com or wherever else.

Implementing the Blogger API was a close second, and indeed if Manila still had a large user-base I almost certainly would have done this. (There are many apps and tools that know how to talk to Blogger, so there would have been multiple benefits from this approach for Manila’s users.)

In my next post I’ll talk about WXR itself, and the structure of the code I wrote in the OPML Editor to export to it.

Join the Battle for Net Neutrality

Today, sites all over the Web are making a statement by

“[covering] the web with symbolic ‘loading’ icons, to remind everyone what an Internet without net neutrality would look like, and drive record numbers of emails and calls to lawmakers.”

Obviously if you’re reading this, you see that I’m participating. You can too.

Go here: https://www.battleforthenet.com/sept10th/

There are super-simple instructions there (scroll down) for adding a modal or banner to your site, to show your support. A modal like the one you saw here is best, because visitors to your site can very easily add their names to the letter to Congress, and also get connected to their representative by phone, without even having to dial. Either way, all it takes is a few lines of HTML code in your site’s <head> element.

The Web and indeed the Internet as we know it today wouldn’t exist if it weren’t for equal access to bandwidth without the throttling and corporate favoritism that the big ISPs and carriers are lobbying for. Without Net Neutrality, we will be forced to pay more for services we love, and miss out on continued incredible innovation that’s only possible if new and small players have the same access to Internet bandwidth as the BigCo’s.

Please help!

https://www.battleforthenet.com/sept10th/

Ps. If you need a spinner (gif or png), check out SpiffyGif.

CloudKit: Square Pegs and Round Holes

My long-time friend Brent Simmons has been pretty prolific on his blog recently — sadly me, not so much. (I’m working on it.) Monday, he wrote a response to Marco Tabini‘s Macworld article, Why you should care about CloudKit:

“While it’s technically possible to use the public data for group collaboration, it’s only the code in the client app that enforces this. CloudKit has no notion of groups.

“It would be irresponsible to use the public data for private group collaboration.

“Neither of the two apps mentioned as example — Glassboard and Wunderlist — should use CloudKit.”

I completely agree, and actually the question of whether Glassboard (or Twitter) would be possible to build with CloudKit was the source of some discussion among some of the folks with whom I attended WWDC this year.

CloudKit doesn’t actually provide any mechanism at all for natively declaring that person X and person Y have access to resource Q (and no one else does). It provides the ability to securely and privately store some data for a single person as associated with an app, and/or to store some data that’s available to everyone who is associated with that app. That’s it (mostly).

It’s possible separately, via a web portal (not programmatically as far as I know), to configure a subset of data to be editable only by specific people, but the idea is more about providing a way for the maintainers of some data resources to update that data than it is about providing a mechanism for users to create ad-hoc groups among themselves (i.e. dynamic configuration data that’s loaded by the app at launch).

While this is a super useful feature, the value of which hasn’t really been called out much by the iOS dev community, it is not what Marco Tabini described. (I can see how the misunderstanding arose though.)

But can’t I do groups on top of CloudKit?

Seems like a reasonable question, right? Why not leverage the lower-level infrastructure that Apple is providing, and implement the security over the top of it? Bad idea.

While it’s probably theoretically possible to integrate an encryption library and set up a mechanism for building and maintaining groups that is actually private and secure on top of Apple’s CloudKit service, this would be a terrible idea from a security, testability, and code-maintainability perspective.

First there’s the issue of bugs and vulnerabilities in the encryption library you choose to include. I’m not saying anything specific about any particular open-source or licensable encryption code or algorithm, but this is a notoriously difficult thing to get right, and encryption is under constant attack from every angle you could imagine. The world’s government intelligence services and organized crime syndicates are almost certain to do a better job hacking these things than you (or the maintainers of the open source code) are going to do at protecting your users.

Then there’s the problem of an external dependency keeping up with changes to iOS itself. Let’s say for example that two years from now you want to move your code to Swift, but you’re dependent on an open source project that hasn’t been updated to work either with Swift or with ObjC in the latest version of iOS. Guess what: You’re now in a holding pattern until either (gasp!) you port or patch the open source code, or someone else does. That’s a dependency you don’t want to take.

Then there’s Apple. It seems likely (and I speak without any insider knowledge at all) that at some point Apple will start to add group collaboration features to CloudKit itself, to its successor, or to some higher-level service.

Now you have another horrible choice to make: Do I continue to bear the burden of technical debt that comes from having rolled my own solution, or do I hunker down for six months and port to the new thing? And how do I migrate my users’ data? What’s going to break when I have a data migration bug? How am I going to recover from that? Where’s the backup?

(Brent also made the excellent point that if you want your users to be able to get to their data from anywhere else besides their iOS devices, CloudKit isn’t going to get you there right now.)

Architectural decisions should not be taken lightly

I’ll say it again: Architectural decisions should not be taken lightly.

You have to think deeply about this stuff right at the beginning if you want your app, your product(s), and your company to succeed over time. The big design decisions you make early on will have a lasting and possibly profound impact on what happens in the long run…

… And, when it comes to privacy and security, we almost never get second chances. You should fully expect that a breach of trust, whether intentional or not, will be met with revolt.

Looking at the situation from 30,000 feet: Would you rather go with a somewhat more difficult solution up-front, one that came perhaps with some of its own problems, but which solved the privacy, security, and platform-footprint issues right now?

Or would you rather build something you don’t fully understand yourself, on top of a service which isn’t really intended to do what you’re forcing it to do?

Just sayin’…

CloudKit is very promising

For simpler scenarios, CloudKit is going to provide a ton of value. More than likely, the service will meet the needs of a huge number of developers… with some caveats:

  • It’s Apple-only. You’re not going to get to the web or Android right now, and no promises at all about the future.
  • Access is public or individual. There’s no good way to deal with groups right now.
  • You can’t write any server-side business logic. It’s purely a data store, and that’s it. This might change in the future, but don’t bet your business or livelihood on it.

Those are the big ones. There are almost certainly others, including pricing, resiliency, backups, roll-backs, etc.

Cloud-based data storage is a huge and complex field. I for one am very happy to see Apple taking a methodical and measured approach to it this time around. But that inherently means we have to live within its limitations.

I’m confident that CloudKit is the right approach for a lot of developers, and mostly confident that it will work for those developers and not fall on its face. It’s not the end-all and be-all that some folks would want it to be. And frankly I’m glad it’s not trying to be.

Is it possible to leave Facebook?

(Posted originally to my Facebook feed.)

I keep trying to reclaim online time from Facebook, and then someone tags me or posts to my timeline, and my inner moderator kicks in.

And then there I am right back on Facebook again, with their web bugs and their GPS tracking. To me recently Facebook feels like the web version of talking to your friends on the phone while the NSA records your call, only plus baby pictures and pithy memes. (Oh-hi, NSA agent, how’s your day? Did I mention NSA? [Attempts Jedi hand-wave.])

I wonder sometimes if Facebook has made it so difficult to feel as if you have any privacy, that for some of us the only way to feel we’re not being spied on by the big-data-big-brother is to delete our accounts entirely–to commit Facebook-Seppuku.

… And now that I’ve said all that, I’m pretty sure that since I’m mentioning on Facebook that I’m considering leaving, well, Facebook, I’m going to start getting a flood of “compelling” push notifications and emails saying how much my friends miss me and that I need to come back to Facebook and approve all those timeline posts and wish distant acquaintances “Happy Birthday” and the “like”…

I love it that my online life has brought me closer to those that I love, work with, and care about. I love that information now flows (mostly) so easily. I owe my livelihood to the Internet and the web.

But Facebook is *not* the Internet, people. It’s totally possible to interact engagingly online without it. Send some email. Start a website. Spread your footprint out to other services. Sure you won’t have quite so many “friends” “liking” your pictures or leaving pithy comments on your posts, but you might just get some intimacy back in return, and you’re going to have a hard time finding that inside the walls of Zuck’s castle.

Monday will be the very last day that you will be able to access your RSS feeds using Google Reader, so if you haven’t already migrated to one of the other services, I strongly recommend that you…

Export your Google Reader data before Monday!

Here’s a quick how-to:

  1. Go to Google Takeout at https://www.google.com/takeout/
  2. Assuming you only want Reader data right now, click on Choose services at the top of the page
  3. Click on the Reader button
  4. Click the Create Archive button

Unless you have an enormous number of feeds, your archive should be created relatively quickly. If it’s taking a long time, you can check the box to have an email sent to you when the archive is ready for download. (There’s no need to keep your browser open in this case.)

Once it’s all packaged up, click the Download button next to your new archive. (If you did the email option, there will be a link in the email to take you to the download page.)

What the heck do I do with it?

The download will be a zip archive containing a bunch of JSON files and one XML file. The JSON files have your notes, likes, starred and shared items, and follower data.

The XML file – subscriptions.xml – is the important one. It has a list of all of your feeds, and what folders they are in. (It’s actually in OPML format, which is based on XML.) Most feed reading services and apps will know how to import this file, and recreate your subscriptions. Some will be able to understand your folder structure, but not all.
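(If you’re curious what’s inside, it’s easy to inspect yourself. Here’s a small sketch, assuming Takeout’s usual subscriptions.xml layout, where a folder is an <outline> element wrapping the feed <outline>s inside it.)

```python
# Sketch: list the feeds (and the folder each lives in) from the
# subscriptions.xml file in a Google Reader Takeout archive. Assumes the
# usual OPML layout where a folder is an <outline> wrapping feed <outline>s.
import xml.etree.ElementTree as ET

def list_feeds(path="subscriptions.xml"):
    body = ET.parse(path).getroot().find("body")
    for outline in body:
        if outline.get("xmlUrl"):                      # top-level feed
            yield None, outline.get("title"), outline.get("xmlUrl")
        else:                                          # folder of feeds
            folder = outline.get("title")
            for feed in outline.findall("outline"):
                yield folder, feed.get("title"), feed.get("xmlUrl")

if __name__ == "__main__":
    for folder, title, url in list_feeds():
        print(f"{folder or '(no folder)'}: {title} -> {url}")
```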

Preserving my read/unread state?

Sadly, importing just subscriptions.xml doesn’t keep your read/unread state, and most services also don’t know how to import the JSON files at all.

There are only two web-based services that I’ve tried so far that actually do keep your read/unread state: Feedly and Newsblur. Of the two, I prefer Newsblur’s UI over Feedly’s, since it’s more like what I’m used to, but lots of people seem to like Feedly’s slicker, less cluttered UI better.

Both Feedly and Newsblur were able to import from Google directly, as can many others, but these are the only two I know of that keep your read/unread state. To do this, you connect the app to your Google Account, and they go out to Google to get your data.

Both services can also import your subscriptions.xml, but connecting to your Google account is the better option if you’re doing your import before Reader is shut off. This will capture read/unread state (and, in Newsblur’s case, your shared stories) instead of just your subscriptions.

Edit: I just tried the new AOL Reader, and while it has a decent mobile web UI (gah) and did import my feeds from Google, it did not preserve my read/unread state.

Other web-based services

There are a slew of other services out there too, spanning a wide range of feature-completeness, API support, iPhone or other mobile apps, and social/sharing functionality.

The ones I’ve looked at most closely are:

  • David Smith’s Feedwrangler, which lacks folder support but has a very interesting Smart Streams feature, and has its own iOS apps and API
  • The Old Reader, which started as a web-only Reader replacement that imitated an older version of Google’s UI, and does include folder support

Both services can import your feed list either directly from Google or using your subscriptions.xml from Google Takeout, but neither will preserve your read/unread articles or shared/starred stories.

Disclosure:

Last month, I started working at Black Pixel, which just released NetNewsWire 4 in open beta.

NetNewsWire is a desktop RSS reader for Mac, which was originally created by my friend and former colleague, Brent Simmons. Previous versions of the app supported Google Reader syncing, but Reader sync was removed from this version since Reader itself is shutting down.

To be clear: I’m not working on NetNewsWire at Black Pixel, so I don’t have intimate knowledge of the roadmap, but syncing will come.

There is some public information on the beta release announcement here.

I’m Joining Black Pixel

Last Friday was my last day at Hitachi Data Systems, and I mentioned that I’m leaving the BigCo’s for something new.

Wednesday will be my first day at Black Pixel. I’m SuperExcited™ to be joining the team! I’ve known a few folks there for some time, and regard all of them as super bright, thoughtful, respectful, high-integrity people—exactly the type of people that I want to surround myself with. When the opportunity arose to work with them, I immediately realized that it was too good to pass up.

In my own experience, the chance to make a move like this only comes along once every few years (if that). I’ve learned, sometimes the hard way, that when they happen you shouldn’t let them pass you by.

I love new challenges, and I can’t wait to start this new chapter in my professional life!

Ps. For anyone at HDS who might be reading this, it’s been a real pleasure working with you all. Thanks to everyone for all your help and support over the last year. The team really is awesome, and I wish you all great success moving forward!

Today is my last full day at Hitachi Data Systems. I’m leaving the world of the BigCo’s. Next week I start something new, and I’m super excited about it! More on that very soon…

In the meantime, here’s this:

Hello, and welcome to ‘The Middle of the Film’, the moment where we take a break to invite you, the audience, to join us, the film-makers, in ‘Find the Fish’. We’re going to show you a scene from another film and ask you to guess where the fish is, but, if you think you know, don’t keep it to yourselves. Yell out so that all the cinema can hear you. So, here we are with… ‘Find the Fish’.

(For a hint, check out the categories on this post.)

Filling in Behind Google Reader

Russell Beattie has compiled a quite comprehensive list of services and companies that either already exist, or are moving in to fill the gap that will be left behind when Google shuts off Reader.

And there seem to be around 50 of them.

My guess — and it’s a total guess — is that there’s room for one or two successful startups, and around five successful pivots by existing companies. It’s going to be one wild ride once the shutdown actually happens.

Of the 50 or so, probably fewer than 10 will end up being reasonable direct replacements for both Reader’s web interface and never-supported sync API. Half of those will screw up something like scaling or reliability pretty early, or fail to catch enough buzz to be sustainable financially.

So that leaves maybe 3 to 5 serious products that will back-fill the gap that Reader leaves behind in the near-to-mid term.

Just a guess.

(I’m very intrigued to see what the open source community provides in this space as well.)