Palagpat Coding

Fun with JavaScript, HTML5 game design, and the occasional outbreak of seriousness

Why do we need NaCl?

Tuesday, January 17, 2012

Just a quick post on some of the touted aspects of Google's Native Client (NaCl), and how they match up against the various browser-native alternatives. I'll have more to say on this soon.

Audio

Pepper audio API vs. Mozilla's Audio Data API

User input

NaCl input events vs. Mozilla's joystick API & mouse lock API (in fact, these are both part of a larger Mozilla project called Paladin that could be seen as something of a browser-native alternative to NaCl)

3D graphics

NaCl 3D graphics vs. WebGL

So why do we need NaCl, again?

Optimizing the Dojo DataGrid

Wednesday, November 9, 2011

Recently I had the privilege of both attending and presenting at the inaugural DojoConf event, a developer's conference focused specifically on the Dojo toolkit, but more generally, on using it in conjunction with JavaScript best practices to build awesome stuff. My topic was something that seems to engender a lot of interest, at least based on the comments I got afterward: how to use the Dojox DataGrid widget to its best advantage.

(If you want to follow along at home, my HTML5-powered slides — which use the jQuery-dependent deck.js, ironically enough — are here. If you have an older browser, there's a slightly lower-fidelity copy on my SlideShare page)

In my day job at White Oak Technologies, we spend quite a bit of time and energy on helping our clients to find interesting nuggets of information hidden in gigantic piles of data. For the past year or so we've been working with a normalized database schema that's grown to just shy of 200 columns wide, and users have started producing query result sets that frequently contain tens (or even hundreds) of thousands of rows. To say it's grown beyond our team's expectations would be understating things, and as these data sizes ballooned, we started getting reports of substandard — in many cases outright unacceptable — performance.

For the rest of this article, I'm going to outline what we learned about the DataGrid, why it was underperforming so badly, and what we could do to get around its limitations.

Grid Hacking 101

When dealing with large data sets, the DataGrid's defaults and the sample code from Dojo's online tutorials are a poor fit at best. There are basically four different classes of hacks to consider when using the Grid for big data: its configuration, presentation structure, custom formatting options, and data store optimization.

Configuration Hacks

Until recently, almost all Dojo tutorials involving data-bound widgets used either the ItemFileReadStore or ItemFileWriteStore class as their data provider. That's fine for small, static data sets, and it's pretty easy to get up and running with these; the problem is that they're not optimized. As Bryan Forbes said in a recent tutorial on the Dojo website:

dojo.data.ItemFileReadStore and dojo.data.ItemFileWriteStore were originally intended only as reference implementations. For a more performant store, consider using dojox.data.JsonRestStore.

Bryan Forbes, DojoToolkit.org

Bryan's suggestion is a good one, if your back-end is fully RESTful. Another good option is the one we've been using for the past few years, dojox.data.QueryReadStore, which provides support for partial dataset loading: all filtering, sorting, and pagination are handled by the server, instead of having all the data in memory on the client all the time. Once you get over a few hundred or a thousand rows of data, that becomes crucial.
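To make that division of labor concrete, here's a minimal sketch of the work a server-side handler takes on for each grid request: sort, slice out the requested window, and report the total row count. The function name `fetchPage`, the `start`/`count`/`sort` request fields, and the `numRows`/`items` response shape are assumptions chosen for illustration, so check the QueryReadStore documentation for what your version actually sends and expects.

```javascript
// Hypothetical server-side handler: the client only ever holds one page.
// `allRows` stands in for whatever your database query returns.
function fetchPage(allRows, request) {
  var start = request.start || 0;
  var count = request.count || 25;
  var rows = allRows.slice(); // copy, so sorting never mutates the source

  if (request.sort) {
    var key = request.sort.attribute;
    var dir = request.sort.descending ? -1 : 1;
    rows.sort(function (a, b) {
      if (a[key] < b[key]) { return -1 * dir; }
      if (a[key] > b[key]) { return 1 * dir; }
      return 0;
    });
  }

  return {
    numRows: rows.length,                   // total size of the result set
    items: rows.slice(start, start + count) // only the requested window
  };
}

// Example: 1,000 fake rows; ask for 25 rows starting at offset 50,
// sorted by id in descending order.
var data = [];
for (var i = 0; i < 1000; i++) {
  data.push({ id: i, name: "row " + i });
}
var page = fetchPage(data, {
  start: 50,
  count: 25,
  sort: { attribute: "id", descending: true }
});
console.log(page.numRows, page.items.length, page.items[0].id);
```

However many rows match the query, the browser only ever receives `count` of them at a time.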

Along with using a better data store, you should consider the number of records DataGrid should request in each transaction. This is exposed via the rowsPerPage property, and defaults to 25 – meaning the widget will request 25 rows at a time. Depending on the speed of the server code that fetches your data, it might make more sense to fetch rows in batches of 10, 50, or even 100. This will probably take you some trial and error to get right, but fine-tuning could help a lot.

Structure Hacks

As I said during my presentation: “the more columns you show, the slower you'll go.” While DataGrid does have the awesome ability to create new DOM elements on demand, throwing away nodes previously created but no longer visible as you scroll down the grid, it can't do the same thing for columns.

Like I said at the beginning, our primary schema has almost 200 distinct columns, all fairly sparsely populated but all of some degree of significance to the user. Originally, each of those columns was being shown in our primary DataGrid, even if the user viewing the grid never bothered to scroll to the right to see the ones beyond the right edge of the grid's viewport! Each of those cells has a CSS display value of table-cell, which means they all have to be taken into account by the browser's reflow calculations.

As a rough example, say your DataGrid keeps three 25-row pages in memory at any given time (which is the default behavior). For each extra column added to the structure definition, the grid has to manage an additional 75 DOM nodes, not counting the overhead of nodes needed to render the column header. Two hundred columns would mean 15 thousand DOM nodes, which get repeatedly created and destroyed, triggering reflow after reflow as the user scrolls through the grid! See why that might be sub-optimal?
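The arithmetic is simple enough to put in a helper. This is only a back-of-the-envelope estimate under the same assumptions as the paragraph above (one node per cell, header and wrapper nodes ignored); the function name is made up for illustration.

```javascript
// Rough estimate of the cell nodes a grid keeps alive at once.
// Ignores header cells and any wrapper elements.
function estimateCellNodes(columns, rowsPerPage, pagesKept) {
  return columns * rowsPerPage * pagesKept;
}

console.log(estimateCellNodes(1, 25, 3));   // 75
console.log(estimateCellNodes(200, 25, 3)); // 15000
console.log(estimateCellNodes(6, 25, 3));   // 450
```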

There are a couple of things to consider when evaluating your grid structure. The first is to look for empty columns. Query your database for those columns that never have a value in them (i.e., they have a cardinality of 0): there's a good chance you can eliminate that column from your grid, unless the absence of any values is itself meaningful information.
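If querying the database directly isn't convenient, you can get a first approximation from a representative sample of rows on the client. The sketch below is an assumption-laden stand-in for a real database query: the function name and sample data are hypothetical, and "empty" is taken to mean null, undefined, or an empty string.

```javascript
// Returns the names of columns that are null, undefined, or "" in every row.
function findEmptyColumns(rows, columnNames) {
  return columnNames.filter(function (col) {
    return rows.every(function (row) {
      var value = row[col];
      return value === null || value === undefined || value === "";
    });
  });
}

var sample = [
  { id: 1, name: "alpha", fax: null, notes: "" },
  { id: 2, name: "beta",  fax: null, notes: "" },
  { id: 3, name: "",      fax: null, notes: "call back" }
];
var emptyColumns = findEmptyColumns(sample, ["id", "name", "fax", "notes"]);
console.log(emptyColumns);
```

A column that's empty in a sample may still have values elsewhere, so treat the result as a list of candidates to confirm against the full data set.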

The second structural hack involves careful evaluation of your schema to determine which fields actually need to be conveyed at a glance in tabular format, and which could be reserved for a “details view” when examining a particular row. Not only can this drastically reduce the load on your grid, but it may even make the grid more useful for your users... I recently had an enlightening conversation with one of our power users, who told me that seeing the entirety of the schema makes the grid harder for him to use! The salient data points he's seeking get lost in a sea of unimportant details: the proverbial “needles” lost in the haystack.

Formatting Hacks

One of the really cool things about the DataGrid is its ability to adapt to accommodate the data it presents. By default, it tries to size each row according to the height of the cell containing the most data in that row, and it does this by looking at the offsetHeight property of each node in the row — which will most certainly trigger a reflow. That is, it forces the browser to recalculate the layout of every visible element in the document, which can be slow, especially if you've got a lot of DOM nodes (at which point I refer you back to my earlier point about column count).

Fortunately, DataGrid provides us with an escape hatch to get around these potentially expensive calculations: if your data values are of a predictable size, you can preemptively set the rowHeight parameter to fit that size. The DataGrid's internal _ViewManager class then takes your explicit height as a given, and bypasses all of the height calculation code. When we figured out we could do this, it gave us a big performance win. Your mileage may vary, but it's worth setting if you don't care about dynamic row height.
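Here's a minimal sketch of what those settings might look like together, assuming a fixed row height of 24 pixels suits your data. `rowsPerPage` and `rowHeight` are the DataGrid properties discussed in this article; the helper name `buildGridOptions`, the specific numbers, the column definitions, and the `gridNode` id are all hypothetical.

```javascript
// Hypothetical helper that gathers the tuning knobs in one place.
function buildGridOptions(store, structure) {
  return {
    store: store,
    structure: structure,
    rowsPerPage: 50, // rows requested per fetch; tune by trial and error
    rowHeight: 24    // fixed pixel height, so per-row measurement is skipped
  };
}

var options = buildGridOptions(null /* your data store goes here */, [
  { field: "id", name: "ID", width: "60px" },
  { field: "name", name: "Name", width: "200px" }
]);
console.log(options.rowsPerPage, options.rowHeight);

// In a page that has loaded Dojo, the options would be handed to the widget:
//   var grid = new dojox.grid.DataGrid(options, "gridNode");
//   grid.startup();
```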

Another performance bottleneck comes from overuse (or abuse) of custom cell formatters. If you have a formatter function that returns a cool rich-text representation of a cell's value, awesome. Good for you. If you have multiple columns with custom formatters, that's cool too. But if your formatters have to do any kind of slow work (like heavy-duty string parsing, complex math calculations, etc), you might be introducing another bottleneck. Remember our numbers above: by default, your grid may have up to 75 rows in memory at any given time. If each row takes even half a second to render due to slow formatters, things are going to start feeling sluggish as you scroll.

[Image: Too many cell formatters!]
Slick, but slow. #winning?

“But Ryan,” you say, “my grid is so much easier on the eyes with formatters!” I don't doubt that it is, and I'm not saying not to use them. Just be careful... is formatting alone worth a price paid in poor responsiveness? If your formatters are small and quick, you'll likely not have a performance problem. However, if you've got formatters that take a lot of processing time, you should consider doing that formatting on the server side, caching the “cooked” representation, and sending that to the Grid in place of the raw data value. Then the Grid is just rendering the data you're passing down the wire, rather than having to generate it on the fly. That could make a noticeable difference in user-perceived response time.
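As a rough illustration of the "cook it once" idea, the sketch below formats each distinct raw value a single time and attaches the result to the row before it's sent to the client. Everything here is hypothetical: `slowFormat` is a stand-in for your expensive formatter, `cookRows` and the `Html` field suffix are names I made up, and a real server would likely be written in another language and persist its cache between requests.

```javascript
// Stand-in for an expensive formatter (heavy parsing, math, and so on).
var slowCalls = 0;
function slowFormat(value) {
  slowCalls++;
  return "<b>" + String(value).toUpperCase() + "</b>";
}

// "Cook" each row once, reusing the result for repeated raw values.
// Returns new row objects with an extra "<field>Html" property.
function cookRows(rows, field) {
  var cache = {};
  return rows.map(function (row) {
    var key = String(row[field]);
    if (!Object.prototype.hasOwnProperty.call(cache, key)) {
      cache[key] = slowFormat(row[field]);
    }
    var cooked = {};
    for (var k in row) {
      if (Object.prototype.hasOwnProperty.call(row, k)) {
        cooked[k] = row[k];
      }
    }
    cooked[field + "Html"] = cache[key];
    return cooked;
  });
}

var cookedRows = cookRows(
  [
    { id: 1, status: "open" },
    { id: 2, status: "closed" },
    { id: 3, status: "open" }
  ],
  "status"
);
console.log(cookedRows[2].statusHtml, slowCalls);
```

Three rows but only two distinct values means the slow formatter runs twice, and the grid column can simply display the precomputed field.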

Query Hacks

Finally, don't forget the server side of the equation. Some possible hacks to consider:

If you have an extremely expensive / slow database query, maybe it makes sense to have a really large rowsPerPage size, so you don't have to pay the database penalty as often.

Maybe you can optimize that query, or have an experienced DBA look at it for you to see if there are any available caching strategies that you may have missed.

Or, you could periodically save slow database results in a JSON file and use a JsonRestStore to back your grid instead, which could work fine if your data set doesn't change very often.

You might be able to federate your query somehow, splitting it into multiple, smaller database queries and piping the results back to the grid as they return via a custom-written data store (we're doing this in one area of our app, and the user experience was dramatically improved as a result).
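To give a feel for the federated approach, here's a minimal sketch (not our production code) of fanning out several smaller queries and handing each batch of rows to the grid as soon as it arrives. The `federate` function and its callback signatures are hypothetical, and error handling is left out to keep it short.

```javascript
// Runs several smaller queries and hands each batch of rows to `onRows`
// as soon as it arrives; `onDone` fires once, after the last batch.
// Each query is a function that accepts a callback(rows).
function federate(queries, onRows, onDone) {
  var pending = queries.length;
  if (pending === 0) {
    onDone();
    return;
  }
  queries.forEach(function (runQuery) {
    runQuery(function (rows) {
      onRows(rows);
      pending--;
      if (pending === 0) {
        onDone();
      }
    });
  });
}

// Fake queries that answer immediately; real ones would call back
// whenever the database responds.
var received = [];
var finished = false;
federate(
  [
    function (cb) { cb([{ id: 1 }, { id: 2 }]); },
    function (cb) { cb([{ id: 3 }]); }
  ],
  function (rows) { received = received.concat(rows); },
  function () { finished = true; }
);
console.log(received.length, finished);
```

The user sees the first rows as soon as the fastest sub-query returns, rather than waiting for the slowest one.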

The point is, there are more than likely a lot of different ways for you to approach the server end of your grid transactions, and restricting your optimizations to the client side will limit your effectiveness in the end.

Putting it all together

As part of my presentation, I put together a sample page that illustrates the difference between poorly-configured DataGrids and one where these performance hacks have been applied. The difference is pretty substantial; here are some timings for the three grids, based on recent runs on a relatively suboptimal test platform (Internet Explorer 8 on my year-old work PC), and a pretty good one (Chrome 14 on my new dual-core laptop):

| DataGrid | Columns | Datastore | Setup time | Rows | rowsPerPage | Paging time (averaged) |
| --- | --- | --- | --- | --- | --- | --- |
| Slow (Chrome) | 206 | ItemFileReadStore | 6.978 sec | 1,000 | 100 | 6.823 sec |
| Slow (IE8) | 206 | ItemFileReadStore | 17.66 sec | 1,000 | 100 | 13.629 sec |
| Optimized (IE8) | 6 | QueryReadStore | 0.546 sec | 100,000 | 25 | 0.281 sec |

It's remarkable to me how much slower the poorly-configured DataGrid is, even though it's rendering only 1% of the rows that the optimized DataGrid handles with ease. Note too that the demo grids only differ in three ways: a better data store (QueryReadStore versus ItemFileReadStore), the configured rowsPerPage, and significantly fewer columns in the structure (6 versus 206). If dynamic row height and custom cell formatters were included, the difference could be even more dramatic.

Finally, although we web developers may wish we could ignore support for legacy versions of Internet Explorer, IE8 is likely to be around for a while, and the "Slow" grid is not just slow in IE, it's nearly unusable. Nearly 18 seconds just to create the grid, and more than 13 every time you fetch a new page? Users are more likely to just navigate away in disgust than they are to quietly put up with that noise. Especially in light of the recent moves by the Mozilla and Chrome teams to ultra-compressed release cycles, there are going to be environments where, like it or not, "get a better browser" isn't going to be an acceptable solution to this problem.

One postscript on timings, which I didn't cover during my talk. I've added a third grid to the test page, identical to Optimized Grid in every way but one: 5 of the 6 columns have (relatively benign) custom formatter functions. The difference in performance illustrates a point worth making:

| DataGrid | Setup time | Paging time (averaged) |
| --- | --- | --- |
| Optimized (IE8) | 0.546 sec | 0.281 sec |
| Custom-formatted (IE8) | 0.546 sec | 0.374 sec |

As the table shows, the setup time for the two grids was exactly the same; it's only paging time that suffers, and even then, it's not by much. So, if you can optimize enough other aspects of your grid, and if your formatters are reasonably fast, there's no harm in using them... just don't go crazy.

The Future

Shortly after my talk was accepted, the Sitepen blog published an article by Kris Zyp, Dojo ninja, about how the DataGrid was getting a little long in the tooth, hard to configure, and harder to style (these things are all true, of course). Then he announced that he was working on a new dGrid, which would address all of these problems:

... the DataGrid is suboptimal and difficult to customize and extend. The time has come for a fresh start on the grid.

Kris Zyp, Sitepen blog

I was stunned. The new grid sounded awesome, but what did this mean for my talk? Was I giving what would amount to the DataGrid's eulogy? I reached out to Kris, and learned that the plan is, apparently, for both components to live on, each suited to a slightly different use case (I gather the goal is to make dGrid much lighter and more flexible, but at least initially, it may not be as feature-rich).

As it happened, Kris immediately followed me in the DojoConf schedule, presenting the work in progress on his new project, and I found myself among those wanting to use dGrid as soon as I could, because it does indeed look very, very cool. A co-worker and I have been playing with the latest version for a little while, and although our humongous data will probably stay in DataGrid for the foreseeable future, there are other places where dGrid looks like it'll fit quite comfortably. As with so many other things in our field, this is a case of using the right tool for the right job.

JSConf.us 2011

Monday, May 2, 2011

Oregon or Bust!

This May my company gave me the opportunity to fly to Portland for a few days to be a part of JSConf, the annual JavaScript developers' conference. I was grateful for the chance to get out of the office for a few days and take a step back to look at my day-to-day responsibilities from a slightly better vantage point. Plus, I got to hear from some pretty awesome speakers about the latest and greatest stuff happening in the Web-centric world, which never fails to inspire me to come back and write better code myself.

I took notes on a lot of the talks I attended, although sadly not all of them. I'll share some of those here, in more-or-less unfiltered form; I'll likely have more to say about specific aspects of the trip at a later date.

Day 1 (May 2nd)

Allen Pike: Making a JS meetup blow minds

Allen reviewed his efforts to bootstrap a JavaScript community in Vancouver. By organizing and shepherding VanJS (the Vancouver JavaScript developers meetup), he learned about the principles that go into a successful local event:

  • should last around 45 minutes
  • should be 2 speakers (not everyone will connect with a single speaker, but the odds go up quite a bit with two, while not stretching things out too much)
  • should happen on a Monday through Thursday, not a weekend (makes people less likely to have other plans).
  • Speakers? Start with yourself, ask your friends to help (Don't go after the "big names" right away).
  • Make sure there's "beer" afterward (i.e. a pub of some kind within walking distance) — “this is what turns a group of people talking about code into a community” (it certainly seems to be a key part of the JSConf formula)

I definitely left this session inspired to try something like this in my own community. Stay tuned on that one.

Paolo Fragomeni: Rethinking Templating

  • Django-style templating, while currently acceptable, is “really not ok” (as it pollutes the HTML markup)
  • Introduced his own project, weld
  • Avoid "too much templating"
  • Instead, leverage your APIs and deliver data via AJAX/JSON

I can see where Paolo is coming from, but completely removing templating logic struck me as a bit extreme. Maybe my opinion will change over time, but right now I fail to see the benefit.

Day 2 (May 3rd)

Ray Morgan: App vs Web

Ray detailed lessons he'd learned at Zappos.com while working on their mobile offerings.

  • iPad app (and full API) developed in 8 weeks, after 3 weeks of experience
  • when developing an API, build an app to go along with it
  • “native apps aren't that hard to write”
  • live-coded a simple Objective-C app
  • crazy-weird syntax, but altogether not that hard (lots of good documentation)
  • instead of targeting Android, they targeted Mobile Web:
    • “Should mobile web really be trying to emulate native apps?”
    • it takes a lot of code to do things (even simple things) with mobile web
  • why not just semantic HTML?
  • progressive enhancement
  • you don't have to support everything, just your users
  • focus on usability
  • simplicity (“links should be links”)
  • flickr mobile website “captures what the web is about” (but back button behavior is still a little frustrating)
  • Facebook provides all the core functionality in a mobile-friendly way

Adam Baldwin: Writing an (in)secure webapp in 3 easy steps

  • “I haven't secured it yet” — the problem is that “yet” never comes
  • devs should take a little more responsibility for security
  • #! navigation (e.g. "/#http://evilpacket.net/login") will work if it sets CORS headers right
  • don't roll your own encoders — there are others who've done it already
  • Content Security Policy (new feature in Firefox 4)
  • implementing browser-specific security policies may not completely cover you, but it puts you ahead of the guy who doesn't do anything
  • cross-site request forgery
  • clickjacking (x-frame-options allows you to specify which sites are allowed to frame you, if any)
  • cookies (HTTPOnly / secure)
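The last two items on that list boil down to a couple of response headers. Here's a minimal sketch of building them; the function names, the `sid` cookie name, and the choice of `SAMEORIGIN` are my own assumptions for illustration, not anything Adam showed.

```javascript
// Builds a Set-Cookie value with the HttpOnly and Secure attributes, so the
// cookie is hidden from page scripts and only sent over HTTPS.
function sessionCookie(name, value) {
  return name + "=" + encodeURIComponent(value) + "; Path=/; HttpOnly; Secure";
}

// Response headers that stop other origins from framing the page.
function securityHeaders(sessionId) {
  return {
    "X-Frame-Options": "SAMEORIGIN",
    "Set-Cookie": sessionCookie("sid", sessionId)
  };
}

var headers = securityHeaders("abc 123");
console.log(headers["X-Frame-Options"]);
console.log(headers["Set-Cookie"]);
```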

Dan Webb: Pimp your JS library

  • “All programmers are API designers”
  • a beautiful API is what made jQuery so successful
  • 3 things:
    1. predictability
      • developers don't want complicated solutions when they go looking for a library
      • think about your audience
      • useCamelCase, yesReally
      • be careful with polyfills (your library probably isn't the right place to fix array.forEach)
      • borrow conventions from other popular JS libs (e.g. Raphael aping jQuery's attr())
    2. simplicity
      • Steve Krug's Don't Make Me Think
      • “Don't make me RTFM again...” (hint: your API is too big)
      • decide on sensible defaults, and what can be optional
        • called out the DOM .initMouseEvent() and .addEventListener() methods for overly-complex APIs
        • use options hashes (e.g. Dojo's style) for optional arguments
      • function calls should read well (compared DOM node replace functions in raw DOM vs Dojo vs jQuery)
    3. flexibility
      • how?
      • don't be like the 60-tool swiss army knife: you can't please everyone!
      • instead of infinitely growing your options hash, think about how to add hackability
      • have public, internal, and protected API levels (e.g. _internal vs public)
      • solve your own problems with your library
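To illustrate the "sensible defaults plus an options hash" point, here's a small sketch of my own (not from Dan's slides). The `showTooltip` function and its option names are hypothetical.

```javascript
// Hypothetical library function: one required argument, everything
// else optional with sensible defaults.
function showTooltip(text, options) {
  var defaults = { position: "above", delay: 200, sticky: false };
  var settings = {};
  var key;
  options = options || {};
  for (key in defaults) {
    if (Object.prototype.hasOwnProperty.call(defaults, key)) {
      settings[key] = Object.prototype.hasOwnProperty.call(options, key)
        ? options[key]
        : defaults[key];
    }
  }
  settings.text = text;
  return settings;
}

var plain = showTooltip("Saved!");
var instant = showTooltip("Saved!", { delay: 0 });
console.log(plain.delay, instant.delay, instant.position);
```

The common case reads well with a single argument, and callers only name the settings they want to change. Checking for the key's presence (rather than its truthiness) keeps a legitimate value like `0` from being replaced by the default.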

Matthew Eernisse: Heaven and Hell: JavaScript Everywhere

  • An AIR app doesn't equal “bad app” any more than “native app” means “better”
  • problems with the AIR JS environment:
    • older Webkit
    • limited/nonstandard JS (no eval, sandboxBridge instead of postMessage, etc)
    • debugging is painful (no firebug/console)
  • limit use of privates: “sometimes you need to monkeypatch, so you can just ship”
  • test!
  • system tools: look at Jake
  • Node isn't the first SSJS... it's been around in various forms since the late 90s
  • if you aren't testing in the real environment, you aren't really testing

Rebecca Murphey: Modern Javascript

I missed this the first time around, but the Twitter feedback and positive buzz afterward were enormous, so it was one of the very first talks I watched when they started showing up on the JSConf channel on blip.tv. There's a great writeup of the talk here, Rebecca blogged about it later herself, and it's really, really worth your time to watch the video.

Dethe Elza: Introducing Waterbear

  • Alan Kay, who worked on the Alto, which in turn inspired the Mac, said: “The revolution hasn't happened yet”
  • promote a revolution in JS teaching (echoes of Rebecca Murphey's talk)
  • showed off Scratch, a visual code-builder
    • snap together building blocks
    • readable code
  • Waterbear is the same idea: drag and drop script blocks
  • Safari/Mobile Safari are the targets right now
  • children learn best by immersion, but the only thing we actually teach that way is language
  • making the world safe for “casual programmers” — those who only build little chunks when they need (e.g. a poll on a personal blog)

Thomas Fuchs: Zepto

  • what does a “real” computer have that a mobile device doesn't?
    • a fast & stable network connection
    • lots of storage
    • fast, multi-core CPUs
    • hardware-accelerated graphics
  • all the major JS libs were created before phones had web browsers of any significance
  • document.querySelectorAll returns a NodeList, not an array — otherwise it's pretty awesome
    • You can call [].slice.apply(nodelist) to get an array
  • native JSON is faster than library implementations
  • [1,2,3].forEach(alert)
  • why do we need a true mobile JavaScript framework?
    • more code causes longer download and init times
    • need something easy to extend
    • need a fallback for non-webkit browsers
  • Talked about Zepto
    • target size: 5k
    • jQuery-compatible API
    • uses webkit features whenever possible
    • easily replaceable with full jQuery if needed
    • but it's 4.83kb vs 31.33k
  • “one more thing...”
  • classic frameworks have a lot of drawbacks on mobile:
    • they force you into an API
    • do all things and do them ok-ish
    • not modular (enough)
    • 25k+ minified/gzipped
    • many extensions are now available in latest DOM/JS
  • micro-frameworks, on the other hand:
    • do one thing, and do it really well
    • directly or loosely-coupled
    • no dependencies
    • small is beautiful (faster, easier to grok, fewer bugs, and you'll really grok JS)
  • Launched MicroJS.com: a simple list that tells you what's out there
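The NodeList-to-array trick from Thomas's talk is worth spelling out. This sketch uses the equivalent `.call` form of `[].slice.apply(nodelist)`; the `toArray` helper name is mine, and a plain array-like object stands in for a real NodeList so the snippet runs outside a browser.

```javascript
// Converts any array-like object (a NodeList, `arguments`, ...) to a real array.
function toArray(arrayLike) {
  return Array.prototype.slice.call(arrayLike);
}

// Stand-in for the result of document.querySelectorAll(...).
var fakeNodeList = { 0: "div", 1: "span", 2: "p", length: 3 };
var tags = toArray(fakeNodeList);

console.log(Array.isArray(fakeNodeList), Array.isArray(tags));
tags.forEach(function (tag) {
  console.log(tag);
});

// In a browser: toArray(document.querySelectorAll("a")).forEach(...)
```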

Nikolai Onken: Mobile Deathmatch — (almost) all you need to know in 20 minutes

Nikolai talked about a way to do lightweight, inter-object communication, binding object state to the DOM. Three main points:

  1. Delegate
    • tiny library to approach event handling differently (it's basically just a mixin)
    • less overhead than dojo.connect
    • simple, clear implementation
    • more explicit than pub/sub
    • https://github.com/uxebu/delegate
    • showed an example using dojo.io.iframe and Delegate to check file upload progress (WOW)
  2. Data binding
    • based on delegate's event handling
    • no templating language
    • objects emit events on state change
    • data binding takes care of modifying DOM
  3. DOM events
    • binds native DOM events to object methods
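To make the "objects emit events on state change" idea concrete, here's a tiny sketch of my own. It is not the Delegate library's API: the `Model` constructor and its `onChange`/`set` methods are hypothetical, and a plain object stands in for a DOM node.

```javascript
// Hypothetical observable model; this is not the uxebu Delegate API.
function Model(state) {
  this.state = state || {};
  this.listeners = [];
}

Model.prototype.onChange = function (listener) {
  this.listeners.push(listener);
};

// Updates one key and notifies listeners, but only if the value changed.
Model.prototype.set = function (key, value) {
  var old = this.state[key];
  if (old === value) {
    return;
  }
  this.state[key] = value;
  this.listeners.forEach(function (listener) {
    listener(key, value, old);
  });
};

// A plain object stands in for a DOM node.
var fakeNode = { textContent: "" };
var notifications = 0;
var upload = new Model({ progress: 0 });
upload.onChange(function (key, value) {
  notifications++;
  if (key === "progress") {
    fakeNode.textContent = value + "%";
  }
});

upload.set("progress", 40);
upload.set("progress", 40); // no change, so no event
console.log(fakeNode.textContent, notifications);
```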

Tobias Schneider: VNC with JavaScript

Tobias showed how you can create your own services on Node, built on libvnc. HOLY COW, his live-coding demo blew my mind! You should totally watch the video, it's pretty amazing what he's doing.

JavaScript: Say it like you mean it

The final talk was a fakeout by Peter Higgins, then Alex Russell and a co-worker from Google talked about the future of JavaScript, specifically introducing Traceur, their JS.Next->JS.Now compiler. Interesting stuff. I liked the gist of what Alex said:

language design is library design, library design is language design

Hi, I'm Buyog... I mean Ryan

I love that conferences provide the opportunity for "hallconf", i.e. the meetings between the scheduled meetings that are often our first "meatspace" interactions with people we'd previously known solely through their digital personas. That's pretty cool, because as awesome as the Internet is, there's something about physical interaction that just deepens relationships in important ways. I suspect it's buried deep in our shared subconscious as human beings.

Anyway, on this trip, I had the pleasure of meeting Carter Rabasa (IE Product Manager for Microsoft), Dustin Machi (Sitepen developer in Blacksburg, Virginia, an area where we have some interest in relocating at some point), Rey Bango (Microsoft developer evangelist, jQuery committer, and Ajaxian), Daniel Lopez (front-end guy at Zappos.com), and Richard D. Worth (jQuery UI lead, Dev/Trainer at Bocoup, and founder of RewardJS). I even got to help Richard's RewardJS out a little with some code I hacked together for a leaderboard for that project, which became my first Github pull request! Pretty neat.

I hereby resolve...

Whenever I attend events like these, I always come away with a fresh view of code challenges I face back home in my role as my project's lead front-end guy. This trip was no exception: I see a lot of opportunities for improvement in the code I'm producing from day to day, and hope to pass along something of value to the rest of my team.