Just a quick post on some of the touted aspects of Google's Native Client (NaCl), and how they match up against the various browser-native alternatives. I'll have more to say on this soon.
NaCl input events vs. Mozilla's joystick API & mouse lock API (in fact, these are both part of a larger Mozilla project called Paladin that could be seen as something of a browser-native alternative to NaCl)
Recently I had the privilege of both attending and presenting at the inaugural DojoConf event, a developers' conference focused specifically on the Dojo toolkit, but more generally, on using it in conjunction with JavaScript best practices to build awesome stuff. My topic was something that seems to engender a lot of interest, at least based on the comments I got afterward: how to use the Dojox DataGrid widget to its best advantage.
(If you want to follow along at home, my HTML5-powered slides — which use the jQuery-dependent deck.js, ironically enough — are here. If you have an older browser, there's a slightly lower-fidelity copy on my SlideShare page)
In my day job at White Oak Technologies, we spend quite a bit of time and energy on helping our clients to find interesting nuggets of information hidden in gigantic piles of data. For the past year or so we've been working with a normalized database schema that's grown to just shy of 200 columns wide, and users have started producing query result sets that frequently contain tens (or even hundreds) of thousands of rows. To say it's grown beyond our team's expectations would be understating things, and as these data sizes ballooned, we started getting reports of substandard — in many cases outright unacceptable — performance.
For the rest of this article, I'm going to outline what we learned about the DataGrid, why it was underperforming so badly, and what we could do to get around its limitations.
Grid Hacking 101
When dealing with large data sets, the DataGrid's defaults and the sample code from Dojo's online tutorials are a poor fit at best. There are basically four different classes of hacks to consider when using the Grid for big data: its configuration, presentation structure, custom formatting options, and data store optimization.
Configuration Hacks
Until recently, almost all Dojo tutorials involving data-bound widgets used either the ItemFileReadStore or ItemFileWriteStore class as their data provider. That's fine for small, static data sets, and it's pretty easy to get up and running with these; the problem is that they're not optimized. As Bryan Forbes said in a recent tutorial on the Dojo website:
dojo.data.ItemFileReadStore and dojo.data.ItemFileWriteStore were originally intended only as reference implementations. For a more performant store, consider using dojox.data.JsonRestStore.
Bryan's suggestion is a good one, if your back-end is fully RESTful. Another good option is the one we've been using for the past few years, dojox.data.QueryReadStore, which provides support for partial dataset loading: all filtering, sorting, and pagination are handled by the server, instead of having all the data in memory on the client all the time. Once you get over a few hundred or a thousand rows of data, that becomes crucial.
Along with using a better data store, you should consider the number of records DataGrid should request in each transaction. This is exposed via the rowsPerPage property, and defaults to 25 – meaning the widget will request 25 rows at a time. Depending on the speed of the server code that fetches your data, it might make more sense to fetch rows in batches of 10, 50, or even 100. This will probably take you some trial and error to get right, but fine-tuning could help a lot.
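To make the partial-loading idea concrete, here's a minimal sketch of what the server half of that conversation might look like. It's plain JavaScript rather than real back-end code, and the request fields (start, count, sort) and response fields (numRows, items) are my assumptions about the wire format, so check them against what your store actually sends and expects.

```javascript
// Hypothetical server-side handler: sorts the full result set, then returns
// only the slice the grid asked for, plus the total row count.
function fetchPage(allRows, request) {
  var start = request.start || 0;
  var count = request.count || 25; // mirrors the grid's default rowsPerPage
  var rows = allRows.slice();      // copy, so sorting doesn't disturb the source

  if (request.sort) {
    var key = request.sort.attribute;
    var direction = request.sort.descending ? -1 : 1;
    rows.sort(function (a, b) {
      if (a[key] < b[key]) { return -1 * direction; }
      if (a[key] > b[key]) { return 1 * direction; }
      return 0;
    });
  }

  return {
    numRows: rows.length,                   // lets the grid size its scrollbar
    items: rows.slice(start, start + count) // just the requested page
  };
}

// Example: 1,000 rows live on the "server"; the grid asks for rows 50 to 74.
var table = [];
for (var i = 0; i < 1000; i++) {
  table.push({ id: i, name: "row " + i });
}
var page = fetchPage(table, { start: 50, count: 25 });
console.log(page.numRows, page.items.length, page.items[0].id);
```

The client only ever holds the page it asked for, which is the whole point once you're past a few hundred rows.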
Structure Hacks
As I said during my presentation: “the more columns you show, the slower you'll go.” While DataGrid does have the awesome ability to create new DOM elements on demand, throwing away nodes previously created but no longer visible as you scroll down the grid, it can't do the same thing for columns.
Like I said at the beginning, our primary schema has almost 200 distinct columns, all fairly sparsely populated but all of some degree of significance to the user. Originally, each of those columns was being shown in our primary DataGrid, even if the user viewing the grid never bothered to scroll to the right to see the ones beyond the right edge of the grid's viewport! Each of those cells has a CSS display value of table-cell, which means they all have to be taken into account by the browser's reflow calculations.
As a rough example, say your DataGrid keeps three 25-row pages in memory at any given time (which is the default behavior). For each extra column added to the structure definition, the grid has to manage an additional 75 DOM nodes, not counting the overhead of nodes needed to render the column header. Two hundred columns would mean 15 thousand DOM nodes, which get repeatedly created and destroyed, triggering reflow after reflow as the user scrolls through the grid! See why that might be sub-optimal?
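The arithmetic above is simple enough to check in a few lines. This is just a back-of-the-envelope helper; the function and parameter names are mine, not DataGrid properties.

```javascript
// Rough count of the cell nodes the grid keeps alive at once
// (ignores header cells and any wrapper elements).
function cellNodeCount(columns, rowsPerPage, pagesInMemory) {
  return columns * rowsPerPage * pagesInMemory;
}

console.log(cellNodeCount(1, 25, 3));   // 75: the cost of one extra column
console.log(cellNodeCount(200, 25, 3)); // 15000: the 200-column case
console.log(cellNodeCount(6, 25, 3));   // 450: a six-column grid, for contrast
```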
There are a couple of things to consider when evaluating your grid structure. The first is to look for empty columns. Query your database for those columns that never have a value in them (i.e., they have a cardinality of 0): there's a good chance you can eliminate that column from your grid, unless the absence of any values is itself meaningful information.
The second structural hack involves careful evaluation of your schema to determine which fields actually need to be conveyed at a glance in tabular format, and which could be reserved for a “details view” when examining a particular row. Not only can this drastically reduce the load on your grid, but it may even make the grid more useful for your users... I recently had an enlightening conversation with one of our power users, who told me that seeing the entirety of the schema makes the grid harder for him to use! The salient data points he's seeking get lost in a sea of unimportant details: the proverbial “needles” lost in the haystack.
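Both structural hacks can be sketched as a pair of small filters over a grid structure definition. This is an illustration only: the field names and sample rows are made up, and in practice you'd find empty columns with a database query rather than by scanning rows in the browser.

```javascript
// Structure entries use the { name, field } shape; the fields are hypothetical.
var fullStructure = [
  { name: "ID", field: "id" },
  { name: "Name", field: "name" },
  { name: "Fax", field: "fax" },
  { name: "Notes", field: "notes" }
];
var sampleRows = [
  { id: 1, name: "Ada", fax: null, notes: "" },
  { id: 2, name: "Grace", fax: null, notes: "prefers email" }
];

function hasValue(value) {
  return value !== null && value !== undefined && value !== "";
}

// Hack 1: keep only the columns that hold at least one value.
function dropEmptyColumns(structure, rows) {
  return structure.filter(function (column) {
    return rows.some(function (row) { return hasValue(row[column.field]); });
  });
}

// Hack 2: split what's left into at-a-glance grid columns and fields
// that can wait for a details view.
function splitForDetails(structure, glanceFields) {
  var result = { grid: [], details: [] };
  structure.forEach(function (column) {
    var target = glanceFields.indexOf(column.field) !== -1 ? result.grid : result.details;
    target.push(column);
  });
  return result;
}

var populated = dropEmptyColumns(fullStructure, sampleRows);
var layout = splitForDetails(populated, ["id", "name"]);
console.log(layout.grid.length, layout.details.length);
```

Here the always-empty Fax column disappears entirely, and Notes moves to the details view, leaving two columns in the grid itself.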
Formatting Hacks
One of the really cool things about the DataGrid is its ability to adapt to accommodate the data it presents. By default, it tries to size each row according to the height of the cell containing the most data in that row, and it does this by looking at the offsetHeight property of each node in the row — which will most certainly trigger a reflow. That is, it forces the browser to recalculate the layout of every visible element in the document, which can be slow, especially if you've got a lot of DOM nodes (at which point I refer you back to my earlier point about column count).
Fortunately, DataGrid provides us with an escape hatch to get around these potentially expensive calculations: if your data values are of a predictable size, you can preemptively set the rowHeight parameter to fit that size. The DataGrid's internal _ViewManager class then takes your explicit height as a given, and bypasses all of the height calculation code. When we figured out we could do this, it gave us a big performance win. Your mileage may vary, but it's worth setting if you don't care about dynamic row height.
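Pulling the configuration hacks so far into one place, a grid setup might look something like the sketch below. The column fields, the "/data/rows" URL, and the "gridNode" id are all placeholders I invented for illustration, and I'm assuming rowHeight takes a pixel value. The Dojo calls only run on a page where the dojox modules have already been loaded.

```javascript
// A short, at-a-glance column layout (field names are hypothetical).
var structure = [
  { name: "ID", field: "id", width: "60px" },
  { name: "Name", field: "name", width: "200px" },
  { name: "Status", field: "status", width: "100px" }
];

var gridOptions = {
  structure: structure,
  rowsPerPage: 50, // tune by trial and error; the default is 25
  rowHeight: 24    // fixed height, so the grid can skip measuring each row
};

// Guarded so the snippet is harmless anywhere Dojo isn't loaded.
if (typeof dojox !== "undefined" && dojox.grid && dojox.data) {
  // Hypothetical endpoint that handles sorting and paging on the server.
  gridOptions.store = new dojox.data.QueryReadStore({ url: "/data/rows" });
  var grid = new dojox.grid.DataGrid(gridOptions, "gridNode"); // hypothetical node id
  grid.startup();
}
```

If your cell contents can wrap or vary in size, leave rowHeight out and accept the measurement cost instead.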
Another performance bottleneck comes from overuse (or abuse) of custom cell formatters. If you have a formatter function that returns a cool rich-text representation of a cell's value, awesome. Good for you. If you have multiple columns with custom formatters, that's cool too. But if your formatters have to do any kind of slow work (like heavy-duty string parsing, complex math calculations, etc), you might be introducing another bottleneck. Remember our numbers above: by default, your grid may have up to 75 rows in memory at any given time. If each row takes even half a second to render due to slow formatters, things are going to start feeling sluggish as you scroll.
Slick, but slow. #winning?
“But Ryan,” you say, “my grid is so much easier on the eyes with formatters!” I don't doubt that it is, and I'm not saying not to use them. Just be careful... is formatting alone worth a price paid in poor responsiveness? If your formatters are small and quick, you'll likely not have a performance problem. However, if you've got formatters that take a lot of processing time, you should consider doing that formatting on the server side, caching the “cooked” representation, and sending that to the Grid in place of the raw data value. Then the Grid is just rendering the data you're passing down the wire, rather than having to generate it on the fly. That could make a noticeable difference in user-perceived response time.
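Here's a rough sketch of the "cook it once, cache it" idea. Everything in it is hypothetical: slowFormat stands in for whatever expensive work your formatter does, and the cache is a plain in-memory object keyed by the value's string form, which assumes distinct raw values have distinct string forms.

```javascript
var slowCalls = 0;

// Stand-in for an expensive formatter (heavy parsing, math, and so on).
function slowFormat(value) {
  slowCalls += 1;
  return "<b>" + String(value).toUpperCase() + "</b>";
}

// Wrap a formatter so each distinct raw value is only cooked once.
function cached(formatFn) {
  var cache = Object.create(null);
  return function (value) {
    var key = String(value);
    if (!(key in cache)) {
      cache[key] = formatFn(value);
    }
    return cache[key];
  };
}

var cook = cached(slowFormat);
var rawRows = [{ status: "open" }, { status: "closed" }, { status: "open" }];

// Attach the cooked representation before the rows go down the wire,
// so the grid only has to render a ready-made string.
var cookedRows = rawRows.map(function (row) {
  return { status: row.status, statusHtml: cook(row.status) };
});

console.log(cookedRows[2].statusHtml, slowCalls);
```

Three rows, but only two trips through the slow function, because "open" was already in the cache the second time around.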
Query Hacks
Finally, don't forget the server side of the equation. Some possible hacks to consider:
If you have an extremely expensive / slow database query, maybe it makes sense to have a really large rowsPerPage size, so you don't have to pay the database penalty as often.
Maybe you can optimize that query, or have an experienced DBA look at it for you to see if there are any available caching strategies that you may have missed.
Or, you could periodically save slow database results in a JSON file and use a JsonRestStore to back your grid instead, which could work fine if your data set doesn't change very often.
You might be able to federate your query somehow, splitting it into multiple, smaller database queries and piping the results back to the grid as they return via a custom-written data store (we're doing this in one area of our app, and the user experience was dramatically improved as a result).
The point is, there are more than likely a lot of different ways for you to approach the server end of your grid transactions, and restricting your optimizations to the client side will limit your effectiveness in the end.
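For the federated approach, the core of the idea fits in one small function. This is a simplified sketch, not our production store: the helper names are invented, and the fake queries answer immediately, where real ones would call back whenever the database responds.

```javascript
// Run several smaller queries and hand each partial result to the caller
// as soon as it arrives; report the total once every query has answered.
function federate(queries, onPartial, onDone) {
  var remaining = queries.length;
  var total = 0;
  if (remaining === 0) {
    onDone(0);
    return;
  }
  queries.forEach(function (runQuery, index) {
    runQuery(function (rows) {
      total += rows.length;
      onPartial(rows, index); // e.g. append these rows to the grid's store
      remaining -= 1;
      if (remaining === 0) {
        onDone(total);
      }
    });
  });
}

// Stand-in for a real database call.
function fakeQuery(rows) {
  return function (callback) { callback(rows); };
}

var received = [];
var finalCount = -1;
federate(
  [fakeQuery([{ id: 1 }, { id: 2 }]), fakeQuery([{ id: 3 }])],
  function (rows) { received = received.concat(rows); },
  function (count) { finalCount = count; }
);
console.log(received.length, finalCount);
```

The user sees the first batch of rows as soon as the fastest sub-query returns, instead of waiting on the slowest one.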
Putting it all together
As part of my presentation, I put together a sample page that illustrates the difference between poorly-configured DataGrids and one where these performance hacks have been applied. The difference is pretty substantial; here are some timings for the three grids, based on recent runs on a relatively suboptimal test platform (Internet Explorer 8 on my year-old work PC), and a pretty good one (Chrome 14 on my new dual-core laptop):
| DataGrid | Columns | Datastore | Setup time | Rows | rowsPerPage | Paging time (averaged) |
| --- | --- | --- | --- | --- | --- | --- |
| Slow (Chrome) | 206 | ItemFileReadStore | 6.978 sec | 1,000 | 100 | 6.823 sec |
| Slow (IE8) | 206 | ItemFileReadStore | 17.66 sec | 1,000 | 100 | 13.629 sec |
| Optimized (IE8) | 6 | QueryReadStore | 0.546 sec | 100,000 | 25 | 0.281 sec |
It's remarkable to me how much slower the poorly-configured DataGrid is, even though it's rendering only 1% of the rows that the optimized DataGrid handles with ease. Note too that the demo grids differ in only three ways: a better data store (QueryReadStore versus ItemFileReadStore), the configured rowsPerPage, and significantly fewer columns in the structure (6 versus 206). If dynamic row height and custom cell formatters were included, the difference could be even more dramatic.
Finally, although we web developers may wish we could ignore support for legacy versions of Internet Explorer, IE8 is likely to be around for a while, and the "Slow" grid is not just slow in IE, it's nearly unusable. Nearly 18 seconds just to create the grid, and more than 13 every time you fetch a new page? Users are more likely to just navigate away in disgust than they are to quietly put up with that noise. Especially in light of the recent moves by the Mozilla and Chrome teams to ultra-compressed release cycles, there are going to be environments where, like it or not, "get a better browser" isn't going to be an acceptable solution to this problem.
One postscript on timings, which I didn't cover during my talk. I've added a third grid to the test page, identical to Optimized Grid in every way but one: 5 of the 6 columns have (relatively benign) custom formatter functions. The difference in performance illustrates a point worth making:
| DataGrid | Setup time | Paging time (averaged) |
| --- | --- | --- |
| Optimized (IE8) | 0.546 sec | 0.281 sec |
| Custom-formatted (IE8) | 0.546 sec | 0.374 sec |
As the table shows, the setup time for the two grids was exactly the same; it's only paging time that suffers, and even then, it's not by much. So, if you can optimize enough other aspects of your grid, and if your formatters are reasonably fast, there's no harm in using them... just don't go crazy.
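If you'd rather see those IE8 timings as ratios, a few lines of arithmetic will do it. The numbers are copied straight from the two timing tables; the variable names are mine.

```javascript
// IE8 timings in seconds, taken from the tables above.
var timings = {
  slow: { setup: 17.66, paging: 13.629 },
  optimized: { setup: 0.546, paging: 0.281 },
  formatted: { setup: 0.546, paging: 0.374 }
};

// How many times faster the optimized grid is than the slow one.
var setupSpeedup = timings.slow.setup / timings.optimized.setup;
var pagingSpeedup = timings.slow.paging / timings.optimized.paging;

// Extra paging cost of the custom formatters, as a percentage.
var formatterCost =
  (timings.formatted.paging - timings.optimized.paging) / timings.optimized.paging * 100;

console.log(setupSpeedup.toFixed(1) + "x faster setup");
console.log(pagingSpeedup.toFixed(1) + "x faster paging");
console.log(formatterCost.toFixed(1) + "% slower paging with formatters");
```

So the optimized grid sets up roughly 32 times faster and pages roughly 48 times faster, while the formatters add about a third to paging time.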
The Future
Shortly after my talk was accepted, the Sitepen blog published an article by Kris Zyp, Dojo ninja, about how the DataGrid was getting a little long in the tooth, hard to configure, and harder to style (these things are all true, of course). Then he announced that he was working on a new dGrid, which would address all of these problems:
... the DataGrid is suboptimal and difficult to customize and extend. The time has come for a fresh start on the grid.
I was stunned. The new grid sounded awesome, but what did this mean for my talk? Was I giving what would amount to the DataGrid's eulogy? I reached out to Kris, and learned that the plan is, apparently, for both components to live on, each suited to a slightly different use case (I gather the goal is to make dGrid much lighter and more flexible, but at least initially, it may not be as feature-rich).
As it happened, Kris immediately followed me in the DojoConf schedule, presenting the work in progress on his new project, and I found myself among those wanting to use dGrid as soon as I could, because it does indeed look very, very cool. A co-worker and I have been playing with the latest version for a little while, and although our humongous data will probably stay in DataGrid for the foreseeable future, there are other places where dGrid looks like it'll fit quite comfortably. As with so many other things in our field, this is a case of using the right tool for the right job.
This May my company gave me the opportunity to fly to Portland for a few days to be a part of the annual JSConf, a JavaScript developers' conference. I was grateful for the chance to get out of the office for a few days and take a step back to look at my day-to-day responsibilities from a slightly better vantage point. Plus, I got to hear from some pretty awesome speakers about the latest and greatest stuff happening in the Web-centric world, which never fails to inspire me to come back and write better code myself.
I took notes on a lot of the talks I attended, although sadly not all of them. I'll share some of those here, in more-or-less unfiltered form; I'll likely have more to say about specific aspects of the trip at a later date.
Alan reviewed his efforts to bootstrap a JavaScript community in Vancouver. By organizing and shepherding VanJS (the Vancouver JavaScript developers meetup), he learned about the principles that go into a successful local event:
should last around 45 minutes
should be 2 speakers (not everyone will connect with a single speaker, but the odds go up quite a bit with two, while not stretching things out too much)
should happen on a Monday through Thursday, not a weekend (makes people less likely to have other plans).
Speakers? Start with yourself, ask your friends to help (Don't go after the "big names" right away).
Make sure there's "beer" afterward (i.e. a pub of some kind within walking distance) — “this is what turns a group of people talking about code into a community” (it certainly seems to be a key part of the JSConf formula)
I definitely left this session inspired to try something like this in my own community. Stay tuned on that one.
Instead, leverage your APIs and deliver data via AJAX/JSON
I can see where Paolo is coming from, but completely removing templating logic struck me as a bit extreme. Maybe my opinion will change over time, but right now I fail to see the benefit.
developers don't want complicated solutions when they go looking for a library
think about your audience
useCamelCase, yesReally
be careful with polyfills (your library probably isn't the right place to fix array.forEach)
borrow conventions from other popular JS libs (e.g. Raphael aping jQuery's attr())
simplicity
Steve Krug's Don't Make Me Think
“Don't make me RTFM again...” (hint: your API is too big)
decide on sensible defaults, and what can be optional
called out the DOM .initMouseEvent() and .addEventListener() methods for overly-complex APIs
use options hashes (e.g. Dojo's style) for optional arguments
function calls should read well (compared DOM node replace functions in raw DOM vs Dojo vs jQuery)
flexibility
how?
don't be like the 60-tool swiss army knife: you can't please everyone!
instead of infinitely growing your options hash, think about how to add hackability
have public, internal, and protected API levels (e.g. _internal vs public)
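A couple of the notes above (sensible defaults, options hashes, and underscore-prefixed internals) are easier to picture with code. This is my own toy example, not something from the talk; createTooltip and its options are invented.

```javascript
// One required argument, everything else optional with sensible defaults.
function createTooltip(text, options) {
  options = options || {};
  var defaults = { position: "above", delay: 200, sticky: false };
  var settings = {};
  Object.keys(defaults).forEach(function (key) {
    var provided = Object.prototype.hasOwnProperty.call(options, key);
    settings[key] = provided ? options[key] : defaults[key];
  });

  return {
    text: text,
    settings: settings,
    // Underscore prefix marks this as internal; callers shouldn't rely on it.
    _render: function () {
      return "[" + settings.position + "] " + text;
    },
    // Public API: reads well at the call site and hides the details.
    show: function () {
      return this._render();
    }
  };
}

var tip = createTooltip("Saved!", { position: "below" });
console.log(tip.show(), tip.settings.delay);
```

The caller only names the one option they care about, which is a lot friendlier than a long list of positional arguments.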
I missed this the first time around, but the Twitter feedback and positive buzz afterward were enormous, so it was one of the very first talks I watched when they started showing up on the JSConf channel on blip.tv. There's a great writeup of the talk here, Rebecca blogged about it later herself, and it's really, really worth your time to watch the video.
what does a “real” computer have that a mobile device doesn't?
a fast & stable network connection
lots of storage
fast, multi-core CPUs
hardware-accelerated graphics
all the major JS libs were created before phones had web browsers of any significance
document.querySelectorAll returns a NodeList, not an array — otherwise it's pretty awesome
You can call [].slice.apply(nodelist) to get an array
native JSON is faster than library implementations
[1,2,3].forEach(alert)
why do we need a true mobile JavaScript framework?
- more code causes longer download and init times
- need something easy to extend
- need a fallback for non-webkit browsers
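The NodeList note is worth a quick illustration. Since there's no DOM outside a browser, this sketch uses a hand-built array-like object as a stand-in for what document.querySelectorAll would return; the conversion trick is the same either way.

```javascript
// Stand-in for a NodeList: indexed entries plus a length, but no array methods.
var nodeListLike = { 0: "header", 1: "main", 2: "footer", length: 3 };

// Borrow Array's slice to get a real array.
var asArray = [].slice.apply(nodeListLike);

// Now the native array methods are available.
var upper = asArray.map(function (name) { return name.toUpperCase(); });

// Native JSON round trip, no library needed.
var roundTrip = JSON.parse(JSON.stringify({ ids: asArray }));

console.log(Array.isArray(nodeListLike), Array.isArray(asArray));
console.log(upper.join(","), roundTrip.ids.length);
```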
Tobias showed how you can create your own services on Node, built on libvnc. HOLY COW, his live-coding demo blew my mind! You should totally watch the video, it's pretty amazing what he's doing.
JavaScript: Say it like you mean it
The final talk was a fakeout by Peter Higgins, then Alex Russell and a co-worker from Google talked about the future of JavaScript, specifically introducing Traceur, their JS.Next->JS.Now compiler. Interesting stuff. I liked the gist of what Alex said:
language design is library design, library design is language design
Hi, I'm Buyog... I mean Ryan
I love that conferences provide the opportunity for "hallconf", i.e. the meetings between the scheduled meetings that are often our first "meatspace" interactions with people we'd previously known solely through their digital personas. That's pretty cool, because as awesome as the Internet is, there's something about physical interaction that just deepens relationships in important ways. I suspect it's buried deep in our shared subconscious as human beings.
Anyway, on this trip, I had the pleasure of meeting Carter Rabasa (IE Product Manager for Microsoft), Dustin Machi (Sitepen developer in Blacksburg, Virginia, an area where we have some interest in relocating at some point), Rey Bango (Microsoft developer evangelist, jQuery committer, and Ajaxian), Daniel Lopez (front-end guy at Zappos.com), and Richard D. Worth (jQuery UI lead, Dev/Trainer at Bocoup, and founder of RewardJS). I even got to help Richard's RewardJS out a little with some code I hacked together for a leaderboard for that project, which became my first Github pull request! Pretty neat.
I hereby resolve...
Whenever I attend events like these, I always come away with a fresh view of code challenges I face back home in my role as my project's lead front-end guy. This trip was no exception: I see a lot of opportunities for improvement in the code I'm producing from day to day, and hope to pass along something of value to the rest of my team.