Saturday, October 29, 2011

AMD - Loading from HTML5 localstorage

One of the best advantages of using AMD as your JavaScript loader architecture is that it allows different AMD implementations to be "plugged in" without requiring modifications to your application code. As more devices ship with HTML5-compliant browsers as part of their bundled software, the need for more specialized loaders becomes important; a tailored loader will ensure your application loads in the most efficient manner.

HTML5 can play an important part in all of this as it provides a number of "localstorage" specifications that can be leveraged by AMD loaders to load modules locally on the device when possible.

Over the summer I decided to try this out and the result is an AMD compliant loader that will load its modules from a "localstorage" implementation. It's available here on github.

One goal I had was that the "localstorage" implementation used should be configurable. The HTML5 localStorage API works great, however it has a hard limit of 5MB of storage. This limit is quite easy to reach, especially if you are using a large set of uncompressed modules. In addition to the default localStorage implementation, the loader provides a Web SQL storage implementation that can specify the size of the storage to be used. While Web SQL is no longer active in the HTML5 spec, it is still currently available in all WebKit based browsers. That means Google Chrome, Safari and iOS based browsers can use it. There is now also an IndexedDB storage implementation available that works in Chrome and Firefox. Alternative custom storage implementations can be written by following the storage implementation spec.
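To give a concrete idea of what a custom storage implementation might look like, here is a minimal sketch that wraps the HTML5 localStorage API. The method names here are illustrative assumptions, not the actual lsjs storage spec, which should be consulted for the real contract.

// Hypothetical storage implementation sketch wrapping HTML5 localStorage.
var jsonLocalStorage = {
    isSupported: function() {
        return typeof localStorage !== "undefined";
    },
    get: function(key) {
        var raw = localStorage.getItem(key);
        return raw === null ? null : JSON.parse(raw);
    },
    set: function(key, value) {
        try {
            localStorage.setItem(key, JSON.stringify(value));
            return true;
        } catch (e) {
            return false; // quota exceeded: the 5MB hard limit has been hit
        }
    },
    remove: function(key) {
        localStorage.removeItem(key);
    },
    clear: function() {
        localStorage.clear();
    }
};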

Another goal I had was to make the loader work independently of any server-side requirement. It is usable just by referencing it via a script tag in your HTML file. There is a caveat with using it this way though: when modules need to be updated, the whole storage area must be cleared. To overcome this limitation the loader can optionally use a configured "check timestamp" URL to check timestamps for each of the modules it loads. If the returned timestamp for a module differs from the one stored locally, the module is reloaded from the server. The protocol for the "check timestamp" API is very simple and allows a variety of server-side technologies to provide this functionality. In fact the lsjs project provides a Java based servlet implementation and also a node.js based implementation.
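As a rough illustration of the idea (the URL shape, query parameter and payload here are assumptions, not the actual lsjs protocol), a timestamp check might look something like this:

// Assumes the configured URL returns a JSON map of module id -> timestamp.
function checkTimestamps(checkUrl, moduleIds, reload) {
    var xhr = new XMLHttpRequest();
    xhr.open("GET", checkUrl + "?modules=" + encodeURIComponent(moduleIds.join(",")), true);
    xhr.onreadystatechange = function() {
        if (xhr.readyState === 4 && xhr.status === 200) {
            var serverTimestamps = JSON.parse(xhr.responseText);
            for (var i = 0; i < moduleIds.length; i++) {
                var id = moduleIds[i];
                var local = localStorage.getItem("timestamp:" + id);
                if (local !== String(serverTimestamps[id])) {
                    reload(id); // stored copy is stale, reload from the server
                }
            }
        }
    };
    xhr.send();
}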

So how does it perform? Compared with regular AMD loaders that load modules asynchronously, its benefit shows up in higher latency environments, where the time taken to load the modules, or to make validation based cache calls for them, can affect load time. The lsjs loader does still have to load each module independently via script injection or eval calls, and this can affect load performance, especially in browsers with slower JavaScript engines. One thought I have on improving this is to allow the lsjs loader to load all the required modules in a single script injection or eval call. That would require the loader to store knowledge of the dependency chains involved.

In addition to the load performance improvements, I have some other ideas on how the loader can emulate HTML5 appcache. Using AMD plugins, other required resources such as text, CSS and images can be loaded from localstorage too. The advantage over appcache is that the loader has complete control over if and when the cache is used versus loading from the server.

Overall it has been a very interesting experiment that has allowed me to learn how to write an AMD compliant loader. As I have found with my work on dynamic server-side optimizers, the ability to write specialized AMD loaders has become a hard requirement (my next blog post will cover more on this).

Saturday, May 21, 2011

Compressing JavaScript

In this blog post, one of the recommendations I made for obtaining the best performance when downloading your JavaScript was to compress it before returning it to the requesting web client. There are a number of options available, and I'm going to compare a variety of them to demonstrate their differences.

Before I get into the comparison results, here are the details of the environment used to run the tests.
  • The HTML page simply contained a <div> tag with a dojo dijit.Calendar widget attached.
  • All JavaScript is loaded via the Zazl AMD dynamic optimizer. It is delivered in a single response connected to a single script tag in the page.
  • When a JavaScript compressor was applied it was on an individual module basis, not on the whole JavaScript response that is returned.
  • Dojo 1.6 in AMD mode was used for providing the dijit.Calendar Widget.
  • RequireJS was used for the AMD loader. 
  • Google Chrome was used to load the page, and its Developer Tools were used to show the size of the JavaScript downloaded.
  • The list of modules returned was as follows:
dojo/_base/_loader/bootstrap.js
dojo/lib/backCompat.js
dojo/_base/_loader/hostenv_browser.js
dojo/lib/kernel.js
dojo/_base/lang.js
dojo/_base/array.js
dojo/_base/declare.js
dojo/_base/connect.js
dojo/_base/Deferred.js
dojo/_base/json.js
dojo/_base/Color.js
dojo/_base/window.js
dojo/_base/event.js
dojo/_base/html.js
dojo/_base/NodeList.js
dojo/_base/query.js
dojo/_base/xhr.js
dojo/_base/fx.js
dojo/lib/main-browser.js
dijit/lib/main.js
dojo/i18n.js
dojo/cldr/supplemental.js
dojo/date.js
dojo/regexp.js
dojo/string.js
dojo/date/locale.js
dijit/_base/manager.js
dojo/Stateful.js
dijit/_WidgetBase.js
dojo/window.js
dijit/_base/focus.js
dojo/AdapterRegistry.js
dijit/_base/place.js
dijit/_base/window.js
dijit/_base/popup.js
dijit/_base/scroll.js
dojo/uacss.js
dijit/_base/sniff.js
dijit/_base/typematic.js
dijit/_base/wai.js
dijit/_base.js
dijit/_Widget.js
dojo/date/stamp.js
dojo/parser.js
dojo/cache.js
dijit/_Templated.js
dijit/_CssStateMixin.js
dijit/form/_FormWidget.js
dijit/_Container.js
dijit/_HasDropDown.js
dijit/form/Button.js
dijit/form/DropDownButton.js
dijit/Calendar.js
app/Calendar.js

The types of compression used were:

No Compression

With no compression at all, that is with gzip turned off and the modules written into the response "as is", the result is a transfer of around 755kb. This is quite a sizable chunk considering that all the page contains is a single widget.

Gzip

As you can see, just turning on Gzip results in a 218kb download versus 755kb.
 
Gzip + Simple Comment and Whitespace removal

Once again quite a significant size reduction (85kb vs 218kb) just by removing comments and whitespace. I should note that I used simple home-grown code to remove comments and whitespace. Writing code that reliably removes comments is actually not as straightforward as it first appears. Ideally using a real JS parser is the best solution, but that can increase the compression time. Google Closure does support a "comment and whitespace" removal mode that will do a thorough job at the potential expense of increased time to compress.
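A quick illustration of why naive removal is dangerous: a regular expression that strips everything from "//" to the end of a line will also destroy string and regex literals that happen to contain those characters.

var src = 'var url = "http://example.com"; // a real comment';
var naive = src.replace(/\/\/.*$/gm, "");
// naive is now: 'var url = "http:' -- the string literal has been corrupted.
// A tokenizer that tracks string/regex literal boundaries is needed to do
// this reliably, which is why a real JS parser is the safer option.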

Gzip + Dojo Shrinksafe

Shrinksafe renames locally scoped variables in addition to removing comments and whitespace. We have now gone from 85kb to 68kb.

Gzip + Google Closure

This result is from using Google Closure with its SIMPLE_OPTIMIZATIONS mode turned on. We see a better result than Shrinksafe, going from 68kb to 58kb. Closure also supports more advanced optimizations, but typically you have to modify your code to be Closure-friendly. If you plan to use Closure exclusively this might be something to consider.

Gzip + Uglify-JS

This result is from using Uglify-JS in its default mode. As can be seen, Uglify-JS compresses almost as well as Closure. The following code was used to execute it.

var jsp = require("uglify-js").parser;
var pro = require("uglify-js").uglify;

var ast = jsp.parse(src);              // parse the module source into an AST
ast = pro.ast_mangle(ast);             // shorten (mangle) local variable names
ast = pro.ast_squeeze(ast);            // apply compression optimizations
var compressedSrc = pro.gen_code(ast); // regenerate compact source from the AST

Conclusion
It's pretty obvious that compression can provide a substantial reduction in the size of the JavaScript code delivered to web clients. Certainly the best bang for the buck is simply turning on Gzip, however adding any of the three compression engines used here, or even just removing whitespace and comments, will result in much smaller downloads.

Saturday, April 30, 2011

Optimizing with AMD JavaScript loaders

In a previous blog post I talked about writing JavaScript optimizers that run dynamically. That is to say, when a web application is serving JavaScript resources, the server itself performs optimizations to ensure the JavaScript is returned to the web client in the most efficient way possible.

AMD (Asynchronous Module Definition) is rapidly becoming the preferred type of loader to use when developing web client JavaScript. Its "define" API provides a way for modules to declare what other resources (other JavaScript modules, text, i18n, etc.) they depend on. This information is key to how an optimizer running on the server can ensure that when a request is received for a JavaScript resource, all of its dependencies are included in the response too. This avoids the loader code running in the web client having to make additional HTTP requests for these dependencies.
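For anyone unfamiliar with AMD, here is a typical "define" call (the module ids are illustrative). The dependency array is exactly the information a server-side optimizer can analyze:

define(["dojo/date/locale", "i18n!app/nls/messages", "text!app/templates/Calendar.html"],
    function(locale, messages, template) {
        // the loader guarantees all three dependencies are resolved
        // before this factory function runs
        return {
            render: function(node) { /* use locale, messages and template */ }
        };
    });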

So how does one obtain this dependency information? Probably the most reliable way is to use a JavaScript lexer/parser to parse the code and analyze the calls to the "define" and "require" APIs. There are a number of options here (Mozilla Rhino and Google Closure, both Java based, provide AST parsers), however this typically means having to pick an environment or language. For the AMD optimizer that I wrote I decided to use the parser that is part of Uglify-JS. As it is written in JavaScript itself, I had more flexibility in the environments where I could run it.

The Uglify-JS AST parser API provides a mechanism to walk the ASTs that the parser returns. The parse call to obtain the AST object is a one-time step; you can then walk the AST as many times as you need. The module "process.js" exports a function called "ast_walker" that you can use to walk the AST with your provided walker functions.

Here is an example of an AST walker that obtains the position within a module where a name/id should be inserted into a "define" call if one is not provided. (The AMD optimizer I have written uses this walker to perform any name insertions that are required.)

var jsp = require("uglify-js").parser;
var uglify = require("uglify-js").uglify;

var ast = jsp.parse(src, false, true); // third argument requests position info
var w = uglify.ast_walker();
var nameIndex = -1;
w.with_walkers({
    "call": function(expr, args) {
        // look for calls to a function named "define"
        if (expr[0] === "name" && expr[1] === "define") {
            // if the first argument is not a string, no id was provided
            if (args[0][0] !== "string") {
                // record the position just after the opening parenthesis
                nameIndex = w.parent()[0].start.pos +
                    (src.substring(w.parent()[0].start.pos).indexOf('(') + 1);
            }
        }
    }
}, function() {
    w.walk(ast);
});

The walker looks for "call" statements whose name is "define". The "expr" argument provides this information. The "args" argument provides the type and value of the arguments passed to the call. In this example the code checks that the type of the first argument is not a string and records the position where the name should be inserted. The walker object (w) provides access to the parent object, which contains the position in the source code.
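To make this more concrete, here is roughly (and simplified) the array-based AST shape that Uglify-JS hands to the walker for an anonymous define call:

// For: define(["dep"], function(dep) { ... });
// expr -> ["name", "define"]
// args -> [ ["array", [ ["string", "dep"] ]],
//           ["function", null, ["dep"], [ /* body */ ]] ]
// For a named module, define("id", [...], ...), args[0] would instead be
// ["string", "id"], which is why the walker above tests
// args[0][0] !== "string" to detect a missing id.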

Using it in a Dynamic Environment

For the optimizer that I have written for AMD based environments (the walker code can be seen here), I use a single AST walker function. The walker code recursively walks down the chain of modules, recording the relevant information that each module provides. Given a module URL as a starting point, it obtains the information for each dependency module, placing it in a data structure and adding it to a module map keyed by the module URI.

The walker records the following:
  • For each module, a list of JavaScript dependencies
  • A list of "text" dependencies
  • A list of "i18n" dependencies
  • If a module does not contain an id, the index within the source where an id should be placed
With this information now available it's possible to write an HTTP handler that writes a single stream of content containing:
  • An AMD compliant loader
  • An "i18n" plugin handler
  • A "text" plugin handler
  • Each "text" resource as a separate "define"'d module
  • Each "i18n" resource (see below for more details)
  • Each required module in its correct dependency order
Note: all modules will have an id added if one is not present.
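To make the layout concrete, here is a rough sketch of the resulting stream (the module ids and contents are illustrative, not the actual optimizer output):

// 1. The AMD compliant loader source, inlined first
// 2. The plugin handlers
define("text", [], function() { /* text plugin source */ });
define("i18n", [], function() { /* i18n plugin source */ });
// 3. Each text resource pre-defined so no XHR is needed for it
define("text!app/templates/Calendar.html", [], function() {
    return "<div class='calendar'>...</div>";
});
// 4. Each i18n resource (up to 3 modules per dependency, see below)
define("app/nls/messages", [], { today: "Today" });
// 5. The application modules in dependency order, ids inserted where missing
define("app/Calendar", ["text!app/templates/Calendar.html"], function(template) {
    /* module source */
});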
i18n dependencies

An i18n dependency is declared in the form "i18n!<..../nls/...>". If the required locale value is available (with HTTP this can be obtained by parsing the "accept-language" header for the best fit), the set of messages can be provided in separate AMD modules that are merged together by the i18n plugin. When processing what has to be written into the response stream, the optimizer will provide up to 3 separate modules based on:
  • The root locale
  • The intermediate locale
  • The full locale
For example, given the i18n dependency "i18n!amdtest/nls/messages" and the request locale "fr-fr", the optimizer will look for the:
  • root module in "amdtest/nls/messages.js"
  • intermediate module in "amdtest/nls/fr/messages.js"
  • full module in "amdtest/nls/fr_fr/messages.js"
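Below is a small sketch of this expansion, with the paths derived from the example above (the function name and exact resolution rules are illustrative):

function expandLocale(i18nDep, locale) {           // ("amdtest/nls/messages", "fr-fr")
    var parts = locale.toLowerCase().split("-");   // ["fr", "fr"]
    var slash = i18nDep.lastIndexOf("/");
    var prefix = i18nDep.substring(0, slash + 1);  // "amdtest/nls/"
    var name = i18nDep.substring(slash + 1);       // "messages"
    var modules = [prefix + name + ".js"];         // root locale
    if (parts.length > 0) {
        modules.push(prefix + parts[0] + "/" + name + ".js");        // intermediate
    }
    if (parts.length > 1) {
        modules.push(prefix + parts.join("_") + "/" + name + ".js"); // full
    }
    return modules;
}
// expandLocale("amdtest/nls/messages", "fr-fr") ->
// ["amdtest/nls/messages.js",
//  "amdtest/nls/fr/messages.js",
//  "amdtest/nls/fr_fr/messages.js"]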
Seeing it in action

You can see some samples of the AMD optimizer if you download:
  • amdoptimizerjsp.war (load into a JEE webcontainer and access "/amdoptimizerjsp/amdcalendar.jsp" or "/amdoptimizerjsp/amddeclarative.jsp")
  • zazlnodejs.tar.gz (run "node zazlserver.js ./amdsamples ./dojo16 requirejs.json")

Sunday, April 3, 2011

JavaScript Loaders and Dynamic Optimization

If you are using a JavaScript framework to develop browser based applications, then the chances are you are using a JavaScript loader that the framework provides. The advantages of using a loader are that you can write your JavaScript code as separate modules and declare dependencies on other modules. When your application is loaded, the framework's loader ensures that the modules and their dependencies are loaded in the correct order.

For most of the JavaScript work I have done I have used Dojo as a framework. Dojo provides two core APIs that enable modules to be loaded and to declare dependencies:
  • dojo.provide() - used to declare the id of your module
  • dojo.require() - used to declare a dependency on another module
When dojo.require() is called, the Dojo framework checks whether the module has already been loaded and, if not, loads it via an XHR request, as shown in the sketch below.
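A minimal example of the sync loader style (the module names are illustrative):

dojo.provide("app.Calendar");      // declare this module's id

dojo.require("dijit._Widget");     // each of these may trigger a
dojo.require("dojo.date.locale");  // synchronous XHR on first load

dojo.declare("app.Calendar", dijit._Widget, {
    // widget implementation ...
});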

This works well while developing your application; however, when it comes time to deploy in a production environment you do not want it making potentially hundreds of HTTP requests to load all its modules. Performance will suffer, especially in a high latency environment.

To handle this issue most frameworks provide a "build" tool that allows you to package your application and all its dependencies into a single resource, or multiple resources, that can be loaded via a small set of script tags. Dojo provides such a "build" tool.

If you are like me though, and don't particularly care for having to run static builds, then using a dynamic optimizer is more appealing. Also, if your application supports extensibility, a static build may not even be an option unless you are willing to customize the framework's build tool. One example of this is Jazz, whose Web UI is extensible and also provides a dynamic optimizer that supports code contributions via extensions. (I should note that I am the writer of the original Jazz Web UI optimizer.)

Concatenating modules and their dependencies into an easily loadable resource is only one step in obtaining well-performing JavaScript loading. Bill Higgins, a former colleague of mine from Jazz, wrote an excellent blog post that details some core techniques for optimizing. With this in mind I decided to write an optimizer that would support these objectives:
  1. Given a set of module ids, enable the loading of these modules plus all their dependencies in a single HTTP request.
  2. Use both validation based caching and expiration based caching when possible.
  3. Load any required localization modules, using the locale of the client to determine the message files written into the response.
  4. Support a "debug" flag that, when set to true, ensures each module plus its dependencies can be written into the loading HTML response as individual <script> tags, thus enabling easy debugging via browser tooling.
  5. Allow the JavaScript to be compressed as part of writing the HTTP response. Use both gzip and JavaScript-specific compressors (Shrinksafe, Uglify-JS, etc.).
  6. Support a variety of server-side environments written in Java and JavaScript, for example JEE servlet based (OSGi, JEE web containers) and CommonJS based environments such as NodeJS.
What I have talked about so far covers the old-style Dojo sync loader. Dojo is now moving to adopt the Asynchronous Module Definition for its loader architecture (1.6 in its source form is AMD compliant in the dojo and dijit namespaces; 1.7 should use an AMD loader by default). This affects how 1) and 4) above are implemented.

1) Given a set of module ids, enable the loading of these modules plus all their dependencies in a single HTTP request.

This requires performing dependency analysis on the set of modules that make up the application. The result is an ordered set of modules that can be used to build a stream of content written into the response to the HTTP request.

For a Dojo sync loader based system this has typically meant using the Dojo bootstrap process with replaced versions of dojo.provide() and dojo.require() that record the ids/dependencies. For each module that is loaded, a regular expression is applied to the source code to obtain the dojo.provide() details and then the dependencies from each included dojo.require(). Regular expressions work quite well in this scenario as the APIs are very simple in structure. This is how the Dojo build tool works to build its optimized versions, and also the Jazz Web UI Framework.
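A simplified sketch of the RegEx approach (real build tools handle more edge cases, such as commented-out calls and string literals, than this):

var provideRegex = /dojo\.provide\(\s*["']([^"']+)["']\s*\)/g;
var requireRegex = /dojo\.require\(\s*["']([^"']+)["']\s*\)/g;

function scanModule(src) {
    var match, provides = [], requires = [];
    while ((match = provideRegex.exec(src)) !== null) {
        provides.push(match[1]); // the module's declared id
    }
    while ((match = requireRegex.exec(src)) !== null) {
        requires.push(match[1]); // a dependency to analyze recursively
    }
    return { provides: provides, requires: requires };
}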

For an AMD loader based system, using regular expressions, while certainly possible, is not what I would consider the best option, as the AMD API is more complex in structure. In this case using a real JavaScript language lexer/parser is a much better solution. As I wanted the optimizer to run in environments such as NodeJS, I needed a JavaScript lexer/parser written in JavaScript itself. Luckily the excellent Uglify-JS provides one. Similar to how the Dojo sync loader analyzer works, each module is parsed and scanned for "require" and "define" API calls, the results of which are recorded to obtain the ordered list of dependencies. One downside to using a true lexer/parser over RegEx is that performance is affected somewhat; however, other sorts of optimizations can now be better supported, as the information available from the parser is far richer in detail than what the RegEx can provide. For example, Dojo is considering using has.js to sniff out features. The parser can be used to remove features identified by the "has" statements, thus making the returned code better tailored to the environment that is loading it.

2) Use both validation based caching and expiration based caching when possible.

While returning a single stream of content versus each individual module is a significant performance improvement, the size of the content can still take considerable time to retrieve from the server. Using both validation and expiration based caching helps significantly here. Both techniques require some form of unique identifier that represents the content being delivered. For my optimization work I decided to use an MD5 checksum value calculated from the JavaScript contents itself.

Validation based caching makes use of HTTP caching headers. When the JavaScript response is returned, an "ETag" header containing the checksum value is added. When requests are received for the JavaScript content, the request is searched for an "If-None-Match" header. If one is found and its value matches the checksum for the modules being returned, an HTTP status code of SC_NOT_MODIFIED (304) is set and no response body is written. This indicates to the browser that the contents should be loaded from its local cache.

Expiration based caching takes it one stage further. If the HTML page loading the optimized JavaScript content can include a URL that contains the checksum, then the HTTP handler that returns the JavaScript content can also set an "Expires" HTTP header with an expiration sometime in the future. For the optimizer I have written, the HTTP handler looks for a "version" query parameter and, if it matches the relevant checksum value, sets the "Expires" header one year in advance. This works better than validation based caching because the browser will not even make an HTTP request for the JavaScript content, instead loading it from its local cache. To use this type of caching you must have some form of HTML generation that is able to access the optimizer to obtain the checksum values for the set of modules. This ensures that a URL with the correct checksum value is specified for the script tag loading the JavaScript content. Here is an example of a script tag URL that the optimizer I have written uses. Note it also contains a locale parameter that ensures applications using i18n modules can use the caching too.

/_javascript?modules=app/Calendar,&namespaces=&version=63ffd833cbe0a7ded0b92a124abd437a&locale=en_US
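Here is a minimal NodeJS flavored sketch of both techniques together. The handler shape and the helpers (getChecksumForModules, buildOptimizedResponse) are assumptions for illustration, not the actual Zazl code:

var http = require("http");
var url = require("url");

http.createServer(function(request, response) {
    var query = url.parse(request.url, true).query;
    var checksum = getChecksumForModules(query.modules); // assumed helper

    // validation based caching: compare the ETag sent back by the browser
    if (request.headers["if-none-match"] === checksum) {
        response.writeHead(304); // SC_NOT_MODIFIED, no response body
        response.end();
        return;
    }
    var headers = { "Content-Type": "text/javascript", "ETag": checksum };

    // expiration based caching: the page embedded the current checksum in
    // the URL, so this exact content can be cached far into the future
    if (query.version === checksum) {
        headers["Expires"] = new Date(Date.now() + 365 * 24 * 60 * 60 * 1000).toUTCString();
    }
    response.writeHead(200, headers);
    response.end(buildOptimizedResponse(query.modules, query.locale)); // assumed helper
}).listen(8080);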

3) Load any required localization modules, using the locale of the client to determine the message files written into the response.

The Dojo framework allows developers to use i18n modules for their messages. This means that simply changing the browser's configured locale will show the language-specific messages.

Dojo sync loader based applications use the dojo.requireLocalization() API to identify dependencies on i18n modules. Likewise, in an AMD environment quite a few of the implementations provide an i18n plugin that allows developers to prefix i18n dependency specifiers with "i18n!". Using techniques similar to those used for obtaining the JavaScript module dependencies, the optimizer I have written gathers these i18n dependencies too. The HTTP JS handlers I have written look for a locale query parameter and use that value to determine which language-specific i18n modules to write into the response. This means that returning i18n modules for all locales can be avoided, although it does require that the mechanism responsible for writing the application's HTML containing the script tags be able to obtain the locale information from the HTTP request (via the "accept-language" HTTP header).

4) Support a "debug" flag that, when set to true, ensures each module plus its dependencies can be written into the loading HTML response as individual <script> tags, thus enabling easy debugging via browser tooling.

What has been described so far is great for obtaining quick-loading JavaScript, however it is not particularly friendly for developer usage. Debugging code that is concatenated together, and potentially compressed using a JavaScript compression tool, is not fun at all.

Dealing with this in an AMD based environment is actually very simple, as one of the goals of AMD is to enable loading of modules individually. What this means is that the debug flag is used just to ensure that neither JavaScript compression nor concatenation of modules is applied.

For the Dojo sync loader environments, my optimizer allows HTML generation mechanisms to obtain the list of dependencies and write script tags for each into the generated HTML. This means that debugging tools will see each module independently.

5) Allow the JavaScript to be compressed as part of writing the HTTP response. Use both gzip and JavaScript-specific compressors (Shrinksafe, Uglify-JS, etc.).

If your HTTP handler's environment supports gzip, then it is a very simple way to significantly reduce the size of the JavaScript content written back to the browser. This typically involves writing the content into some form of gzip stream, which is then written out into the returning HTTP response. Browsers that support gzip send the HTTP header "Accept-Encoding" containing the value "gzip".
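A minimal sketch of this in a NodeJS style handler (the function shape is illustrative, not the actual optimizer code):

var zlib = require("zlib");

function writeJsResponse(request, response, jsContent) {
    var acceptEncoding = request.headers["accept-encoding"] || "";
    if (acceptEncoding.indexOf("gzip") !== -1) {
        // the browser supports gzip, so compress before writing
        zlib.gzip(jsContent, function(err, compressed) {
            response.writeHead(200, {
                "Content-Type": "text/javascript",
                "Content-Encoding": "gzip"
            });
            response.end(compressed);
        });
    } else {
        response.writeHead(200, { "Content-Type": "text/javascript" });
        response.end(jsContent);
    }
}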

In addition to this, using a JavaScript compression tool can reduce the content size significantly too. Tools such as Shrinksafe, Uglify-JS and Google Closure all provide good compression results. The optimizer I have written enables different compression tools to be plugged into the resource loading step of the process. The HTTP handler responsible for writing the JavaScript content uses these JS-compression-enabled resource loaders.

6) Support a variety of server-side environments written in Java and JavaScript, for example JEE servlet based (OSGi, JEE web containers) and CommonJS based environments such as NodeJS.

I'm not going to go into too much detail here, as I will probably write more about this in future blog posts, but briefly I will say that where possible the optimizer I wrote uses JavaScript to do any optimization work. I have written Java bindings and also NodeJS bindings. Both sets of bindings use a common set of JavaScript modules. All the Java modules can be used in both OSGi and POJO environments.

Seeing the optimizers in action

The optimizers I have written are available in the Dojo Zazl project. The easiest way to see them is via the samples that use the Zazl DTL templating engine for HTML generation. You can use the README for setup help, and this site too.

For a Dojo sync loader example, run the "personGrid" sample in the zazlsamples WAR file or from the zazlnodejs package using this command:

node zazlserver.js ./samples ./dojo15

For an AMD loader using RequireJS, run the zazlamdsamples WAR file or run the zazlnodejs package using this command:

node zazlserver.js ./amdsamples ./dojo16 requirejs.json

Use a tool like Firebug to observe the script tag load. You should see subsequent requests load from the cache and be significantly faster.

Also, you can see the i18n support in action by selecting a different locale. In the sync loader "personGrid" sample you can see console messages displayed in the currently selected language. In the AMD samples you should observe the calendar widget change.

Saturday, March 26, 2011

Writing your own CommonJS module loader

As more and more JavaScript code uses the CommonJS module spec for loading, it became apparent that the work I had been doing in the Dojo Zazl project was going to have to support loading CommonJS modules. One of the driving factors for me was wanting to use the fantastic JavaScript compressor and parser Uglify-JS, which is written in JavaScript itself.

The Uglify-JS code uses "require" for its dependency loading, and if I wanted to use it in my Zazl code I would have to either modify the Uglify-JS code to remove the "require" statements and load the dependencies manually, or make the Zazl JavaScript loader CommonJS module compliant. Also, I was interested in making my Zazl code run in NodeJS, and that meant supporting CommonJS modules in the Zazl JavaScript too.

The existing Zazl JavaScript loader is implemented as a native method called "loadJS". When using Rhino, "loadJS" uses the Rhino APIs to compile and load. Likewise for V8, its APIs are used. Loading a CommonJS module is more involved than this though. In addition to compiling and loading, it has to also minimally:
  1. Provide a sandbox for the module such that access to other globals is restricted
  2. Provide a "require" method to enable loading of other modules
  3. Track paths such that requests to "require" containing relative paths (starting with ./ or ../) are resolved correctly
  4. Provide an "exports" variable that the module can attach its exports to
  5. Provide a "module" variable that contains details about the current module
My current loadJS support now had to be somewhat smarter to support these requirements (a minimal sketch follows below). The solution I came up with was to write a common "loader" JavaScript module that can be used in both Rhino and V8 environments. In support of this new loader code, a new native method called "loadCommonJSModule" had to be provided for both Rhino and V8. In addition to the "path" parameter passed to "loadJS", this new method also expects a "context" object that contains the values for "exports" and "module". The native method ensures that this context object is used for the module's parent context, basically its global object.
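Here is a stripped-down sketch of what such a loader has to do. Resolution and error handling are simplified; resolvePath is an assumed helper, and loadCommonJSModule is the native method described above. The numbered comments map to the requirements listed.

var moduleCache = {};

function createRequire(currentPath) {
    return function require(id) {
        // 3. resolve relative ids (./ or ../) against the requesting module
        var resolved = (id.charAt(0) === ".") ? resolvePath(currentPath, id) : id;
        if (moduleCache[resolved]) {
            return moduleCache[resolved].exports;
        }
        // 4. and 5. fresh "exports" and "module" objects for this module
        var module = { id: resolved, exports: {} };
        moduleCache[resolved] = module;
        var moduleContext = {
            require: createRequire(resolved), // 2. a "require" for nested loads
            exports: module.exports,
            module: module
        };
        // 1. the native method runs the script with moduleContext as its
        // global object, sandboxing it from other modules' globals
        loadCommonJSModule(resolved + ".js", moduleContext);
        return module.exports;
    };
}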

Doing this in Rhino is fairly straightforward. The Java code that supports the "loadCommonJSModule" call has to create an org.mozilla.javascript.Scriptable object. The CommonJS loader makes the call:

loadCommonJSModule(modulePath, moduleContext);

And the Rhino based Java code does this:

Scriptable nativeModuleContext = Context.toObject(moduleContext, thisObj);
classloader.loadJS(resource, cx, nativeModuleContext);

The thisObj parameter is what Rhino has passed to the Java invocation of "loadCommonJSModule". The classloader object is a RhinoClassLoader (see another blog post for more details) used to create an instance of the module script, using the moduleContext object for its scope.

In V8 it's a little more involved. A new V8 Context is created for each module load. Creating contexts in V8 is cheap, so the performance overhead should be minimal. Each attribute in the provided moduleContext is copied into the new Context object, which is then used to run the module script. The following are some code snippets from the v8javabridge.cpp file.

v8::Handle<v8::ObjectTemplate> global = CreateGlobal();
v8::Handle<v8::Context> moduleContext = v8::Context::New(NULL, global);
v8::Handle<v8::Value> requireValue = context->Global()->Get(v8::String::New("require"));
v8::Context::Scope context_scope(moduleContext);
moduleContext->Global()->Set(v8::String::New("require"), requireValue);

v8::Local<v8::Object> module = args[1]->ToObject();
v8::Local<v8::Array> keys = module->GetPropertyNames();

// copy each attribute of the provided module context into the
// new context's global object
unsigned int i;
for (i = 0; i < keys->Length(); i++) {
    v8::Handle<v8::String> key = keys->Get(v8::Integer::New(i))->ToString();
    v8::String::Utf8Value keystr(key);
    v8::Handle<v8::Value> value = module->Get(key);
    moduleContext->Global()->Set(key, value);
}

You can see the JavaScript loader code here.

Running CommonJS code in Zazl now is just a matter of:

loadJS('/jsutil/commonjs/loader.js');
require('mymodule');

For validation I used the CommonJS set of unit tests to check that my loader runs correctly to the spec.

I have some gists that provide both a RhinoCommonJSLoader and a V8CommonJSLoader. Using a multiple rooted resource loader, it's pretty easy to run CommonJS based code from Java:

File root1 = new File("/commonjs/example");
File root2 = new File("/commonjs/runtime");
File[] roots = new File[] {root1, root2};

ResourceLoader resourceLoader = new MultiRootResourceLoader(roots);
try {
    RhinoCommonJSLoader loader = new RhinoCommonJSLoader(resourceLoader);
    loader.run("program");
} catch (IOException e) {
    e.printStackTrace();
}

The /commonjs/runtime root must contain the contents of the Zazl jsutil JavaScript directory, pathed such that the runtime directory contains the jsutil directory. You can place a "program.js" and its dependencies in the /commonjs/example directory and the loader will perform a require('program') to load it.

Tuesday, March 22, 2011

Making the most of using the Mozilla Rhino JavaScript Engine

If you are using the Mozilla Rhino JavaScript engine to run your JavaScript code from Java, then you have probably found that performance is not one of its strong points. There are, however, a few things you can do to improve it, especially if you are creating recurring contexts that use the same script files:
  1. Load JavaScript resources from some form of caching resource loader.
  2. Compile scripts into org.mozilla.javascript.Script objects and use them instead of compiling the script each time.

1) This should be pretty obvious, but using a caching resource loader ensures that JavaScript resources are not read from disk multiple times. If you use a single caching resource loader for a given application, then you will end up loading a given resource only once from disk. The Dojo Zazl project source code contains a simple implementation of a caching resource loader. The resource loader interface and the caching implementation can be seen here.

2) Rhino provides an API for compiling JavaScript resources into Script objects. If your usage of Rhino involves loading the same set of JavaScript resources multiple times, then using the compiled Script to instantiate instances improves performance significantly.

For the Dojo Zazl project I wrote a RhinoClassLoader class that enables the compiling of the scripts and also uses a classloader as a script class cache store. Here is a snippet from the code showing the compilation:

import org.mozilla.javascript.CompilerEnvirons;
import org.mozilla.javascript.optimizer.ClassCompiler;

ClassCompiler classCompiler = new ClassCompiler(new CompilerEnvirons());
Object[] classBytes = classCompiler.compileToClassFiles(resource, fileName.replace('-', '_'), 1, name.replace('-', '_'));
Class c = defineClass(name.replace('-', '_'), (byte[])classBytes[1], 0, ((byte[])classBytes[1]).length);

You can see the full code for it here.

The loadClass method of the RhinoClassLoader uses the classloader cache by simply using the ClassLoader class's findLoadedClass method:

Class<?> c = findLoadedClass(name.replace('-', '_'));
if (c != null) {
    return c;
}

Usage is simply:

RhinoClassLoader rcl = new RhinoClassLoader(resourceLoader);
Object scriptInstance = rcl.loadJS(uri, context, thisObj);

with the returned object being the instance of the script. The RhinoClassLoader uses the resourceLoader instance to load the JavaScript resource. If you use a caching version of the resource loader then you improve the efficiency even more.

Monday, March 21, 2011

Using the Google V8 JavaScript Engine in Java

When you want to run JavaScript code in a Java environment, the only option you really have is to use the Mozilla Rhino JavaScript engine. It has some great features, but performance is quite lacking, especially when compared to a native engine such as Google's V8. So what if you wanted to run JavaScript code in V8 from Java?

As part of the work I did for the Dojo Zazl project, I investigated using V8 as the JavaScript engine when executing requests for DTL templates. This consisted of writing a JNI layer on top of the V8 C++ API. This part was fairly straightforward and was really just an exercise in writing JNI code. There is a pretty good embedding guide here that explains the V8 concepts.

But what if you wanted to make calls back into your Java code from your JavaScript code? In Rhino this is very easy to achieve. It's also easy in V8 if you know the names of the methods you want to call and their corresponding signatures. That approach, however, is not easily extendable.

For the Zazl project I wanted to produce something more flexible. What I ended up writing was a V8-Java bridge that allows you to run any JavaScript code you want and also call back any provided Java methods. The only restriction is that the signature of the methods has to be fixed. Because of this, JSON is used for the method parameters and also for the return value.


Using the bridge is a simple matter of writing a class that extends org.dojotoolkit.rt.v8.V8JavaBridge. You can see the code for it here. You must provide your own readResource method that is responsible for loading JavaScript resources requested by the JavaScript engine:

public String readResource(String path, boolean useCache) throws IOException {
......
}

The Zazl project provides a number of implementations of a ResourceLoader interface for different types of environments (JEE web applications, OSGi). Also, there are some gists that provide examples of file based ResourceLoaders (example).

Running the script is simply like this:

StringBuffer sb = new StringBuffer();
sb.append("var v = test(JSON.stringify({input: \"Hello\"})); print(v);");
try {
    runScript(sb.toString(), new String[]{"test"});
} catch (V8Exception e) {
    e.printStackTrace();
}

The call to runScript passes the name of a callback method (in this example "test"). Within the JavaScript code the test method is called. Note that the parameter must be a stringified JSON object. The return value will also be a stringified JSON object. For this example the test method looks like:


public String test(String json) {
    try {
        Map<String, Object> input = (Map<String, Object>)JSONParser.parse(new StringReader(json));
        System.out.println("json input value = " + input.get("input"));
        Map<String, Object> returnValue = new HashMap<String, Object>();
        returnValue.put("returnValue", "Hello Back");
        StringWriter sw = new StringWriter();
        JSONSerializer.serialize(sw, returnValue);
        return sw.toString();
    } catch (IOException e) {
        e.printStackTrace();
        return "{}";
    }
}

The whole example can be seen in this gist.

The V8JavaBridge class also contains runScript methods that support providing an object reference in addition to the method names, so that a more generic extender of the bridge can be passed callback methods.


For the Zazl project I have produced native libraries for 32-bit Linux, 32-bit Windows, and both 64-bit and 32-bit Mac. These native libraries must be accessible to the JVM (in the same directory as the JVM invocation or via -Djava.library.path). Alternatively, if you run in an OSGi environment you can use the org.dojotoolkit.rt.v8 bundle. You can get the native libraries from here.


Details on building the Java code can be found on the main github page for the Zazl project. If you build the org.dojotoolkit.rt.v8 feature and org.dojotoolkit.server.util.feature features, you will have POJO JAR files that you can use in a variety of different Java environments.


One thing that should be noted is that V8 is single threaded. Because of this, the JNI code has to ensure synchronization via its v8::Locker object. An unlock occurs while any Java callback is in process, so that the lock is only in effect while the V8 engine is actually running JavaScript code. As the V8 engine is so fast, I have not seen any noticeable issues with this so far, but it is something that has to be considered when deciding when and what code is run via V8.