Tagneto: commonjs

Showing posts with label commonjs. Show all posts

Monday, June 25, 2012

ES Modules: suggestions for improvement

There has been a recent bout of comments about ECMAScript (ES) harmony modules on twitter and elsewhere. Here is my attempt to explain parts of it, some of the design tradeoffs, and perhaps a middle ground that would open up some options that may bridge some gaps.

Modules are one of those things that seem very simple, but involve quite a lot of decisions and tradeoffs. This post is mostly just about module linking and module ID resolution, and even with that, it is quite long.

If ES Modules do not come up with different ways to work (or maybe explain where I have it wrong), they are not competing well with what can be done with a combination of CommonJS/Node and AMD.

My background: I work on RequireJS and AMD.

What is it

First, some links to the specs. The "harmony" moniker means it is in process for the next version of the ECMAScript (JavaScript) language:

Harmony Modules: The basic module spec.
Harmony Module Loaders: Specifies how you can create isolated loaders for modules. Very useful.

The module examples page is suggested if you want to get a feel for it, but it is good to read the other docs too. It can be a bit daunting though, unless you speak the spec language.

Points of reference

One way to evaluate how ES Modules works is to compare it to something you may already know:

CommonJS / Node. Node implements a version of the CommonJS module API.
AMD / RequireJS. RequireJS implements the AMD module API.

Run time vs compile time

ES is a "compile time" approach where the formats mentioned above are "run time" approaches. Maybe not precise terms, but here is a definition of what is meant by those terms for the purposes of this post:

"Compile time" means:

JS text is parsed, and the "module" "import", and "export" syntax is found.
Any dependencies are fetched and parsed.
Once the dependency tree has been all fetched, the ES module loader will wire up the exports from a dependency to a module's "module" or "import" use, and do type checking on that export type and how it is referenced in the module
The module code is then evaluated/executed.

"Run time" means: there is usually no pre-parse stage. The JS text is evaluated, and any module API that is encountered is run as it is encountered.

AMD will actually do a small parse step if the module looks like:

define(function (require) {
var a = require('a');
});

In that case, it will parse the function to look for require() dependencies, and then load and execute the dependencies first before running the function above.

"Compile time" was chosen for ES because:

it is familiar from other scripting languages
sets the way for other possible static features, like macros
ensures that "import *" are static, *not* dynamic bindings
allows some type checking on the values that are explicitly "export"ed.
generally seen as safer and easier to reason about that run time.

The CommonJS/Node style of pure runtime, no parse, was hard to get to work with some edge cases as I understand it, but I heard that second-hand, I did not see that discussion.

Impact of compile time

For compile time to work well, it should use new keywords in the language, to have clear markers on what is participating in the module system.

Although, it could work with a module API instead of new syntax, by only recognizing literal use of that API, and do not support variable assignment of the API or dependencies to other names. This is what AMD does for the "sugared CommonJS form" mentioned above.

For "import *", static binding is critical because anything that is a runtime scope lookup gets into "with" territory, and "with" has been seen as a mistake by the committee. ES5's 'use strict' bars its use.

Since new syntax is involved, ES Modules cannot be "shimmed" into existing JS libraries. There is a Module Loader "runtime" registration call that can used for a module to register its, but it means those libraries cannot participate in the compile time linking stage, so they need to be pre-loaded by a script loader before an ES Module can effectively reference it with module syntax.

Module ID resolution

One other consideration, one that is usually overlooked when talking about modules, is how a module ID like "jquery" is resolved to a file path and loaded.

Both CommonJS/Node and AMD/RequireJS support "short, logical names" for dependencies. So, you can say require('jquery') and that jquery gets converted to a path using some algorithm. Node uses multiple paths to find jquery.js, and AMD in the browser relies on a declarative configuration to do so.

ES modules do not really have support for this, unless you also implement an imperative resolver. They support full URLs, like:

module foo at 'http://example.com/scripts/foo.js'

but we have found in AMD that it is useful to be able to say require('jquery'), but then declaratively map that to zepto.js.

So, an individual module specifies a dependency on an API provider, but how that provider is satisfied is resolved using the declarative configuration.

If there is only an imperative resolve API, no simple declarative API to resolve short names, it will mean shipping a userland "loader library" to effectively use modules. This opens the door to balkanization in module ID resolution since there is not built in support.

Special factors in JavaScript

There are a few special factors with JavaScript that are not usually in other programming languages, and they have an impact on the design:

The largest deployed use case of JavaScript, the browser, is async, network IO. File size and number of requests are very important to performance. So combining modules together into one file, and minifying/transforming the source for smaller delivery is common.
There is a large legacy codebase of browser-based JavaScript that just use browser globals, and no real module format. Some small uses of JavaScript do not need modules, and browsers will support those use cases indefinitely.

My goals

I want AMD and RequireJS to go away.

They solve a real problem, but ideally the language and runtime should have similar capabilities built in.

Native support should be able to cover the 80% case of RequireJS usage, to the point that no userland "module loader" library should be needed for those use cases, at least in the browser.

If the ES module format requires a web developer to use a script loader to use existing, non-AMD/CommonJS, non-ES JS in a project for those 80% use cases, it is a failure.

Example: If I cannot use jquery and backbone in an ES 6 module without needing another library to preload or prep those libraries for ES 6 module use, then existing JS users will not see much advantage over using AMD.

If the web developer needs to code any imperative logic to wire up the ES Module Loader, that will result in a loader library. That is a failure condition.

As compared to AMD: if the ES approach cannot do the above without a helper loader library and the ES approach does not allow something like loader plugins, then there is no contest -- AMD will still be more useful to a developer than the built in system. Small savings in the amount to type and a thin layer of type checking is not enough.

This may very well not be the goal of ES modules, and it would be great if the specs or some background material acknowledge that, and list out the mitigation strategies developers are expected to use.

Shortcomings of ES modules

Right now, ES harmony modules do not improve an AMD user's workflow because of the following:

New syntax makes it very hard to optionally upgrade

If I am the author of something like jQuery or Backbone, I cannot optionally add in a way to register as an ES module because ES modules use new syntax. However, there are many uses of those libraries which will not be in ES module-capable browsers.

The Node and AMD communities have found optional opt-in via a runtime API very useful for adoption of code that works with their module systems, but still work in older "use plain script tags with browser globals" approach.

There is a runtime API in the ES module loader proposal that would allow a legacy script to register something as a module, but that requires the end developer to use another script loader library to load that library so it can do that runtime call, then start loading ES module code.

The developer may as well just stick with AMD. Complexity has not been reduced.

Register module and a global

Backbone originally had trouble adopting AMD because if it called define() to register a module, they found other libraries, like Backbone plugins, would break. The plugins were expecting to find a Backbone global variable but when Backbone called define() it was not also exporting a global.

This same problem will exist in ES-mixed code. Any dynamic registration also needs to allow an export of a global so that downstream libraries will work until they are also converted to optional module registration.

There should be a migration path, one that allows gradual rollout of modules without requiring a project to go whole hog on module syntax.

Declarative module ID resolution

While I have made tools to allow a developer to "convert" an existing library to AMD, there are many developers that did not want to touch existing libraries. It makes it difficult to compare against new versions and there is a concern that the conversion introduces breaking scope changes (rightly so).

So the "shim" configuration for requirejs was introduced to allow specifying dependencies and an export value for JS code that does not call a module API. This has been well received in the community. More background on shim here.

"shim" with "paths" and "map" make it possible to declaratively set up a configuration that allows for one file IO lookup per module ID, an easy way to "shim" old libraries, and to load more than one version of a module for use by different modules. That covers the 80% case for using old and new code with a module loader.

By using a declarative configuration that is supported by the "default" module ID resolution mechanism in the language, then it avoids having to ship a userland loader library for browser use. This is a big win because it will help kill AMD.

It is fine if the Module Loaders API still has an imperative API to set up different module ID resolution logic. That would allow Node to maintain its current multiple IO, nested directory lookup logic. However, the default should favor the harsher browser environment in such a way that an extra loader library is not needed.

Loader plugins

I can appreciate that supporting Loader plugins may seem out of scope for the default module loader, but they have been incredibly useful for AMD. Node has seen a use for them, even though they are done in a different way via require.extensions. They effectively allow use of transpiled languages.

I find the AMD loader plugins better than Node's approach because:

load behavior vs. file format: since a prefix is used on the resource ID instead of just using a file extension suffix, it allows multiple plugins to deal with the same type of file extension. For example, "text!index.html" and "template!index.html" can be used in the same app, the first one just giving the raw text, the second one "compiling" some text for use as a template. The developer, not the plugin provider, chooses the right use. It still allows "single extension" plugins too, and for those, they can omit the file extension in the ID, so no increase in ID length.
one IO lookup: For a resource ID "foo", node may do a lookup for "foo.js", "foo.coffee" and "foo.node". By specifying the loading mechanism via the prefix, it avoids multiple IO lookups, which are important for browser use. It also makes it clear what handles the loading.

AMD loader plugins can participate in build steps, so the "text!" plugin can inline the text as module in a built file:

define('text!index.txt', function() {
return 'hello world';
});

That is incredibly useful for getting good network performance in the browser.

Even for local file environments like Node, being able to combine all the assets for a program into one file is really great for distribution. It is conceptually simpler to reason about tracking one file vs. "nested directory of directory" installs. Think of it as a way to easily share shell scripts.

The middle way

For developer workflow, right now AMD is a better alternative than ES harmony modules, given the choices around compile time linking, new syntax, and the use of imperative ID resolution.

Here are some suggestions on how to allow some of the benefits of the compile time approach with the run time ones used by AMD. The goals are reuse non-module code in modular systems, allow for a way to get some static version of import *, and perhaps even macros.

Fetch dependencies, execute, modify, execute

The core of the middle way for compile time vs run time:

do not force compile time operations to be all up front, before any evaluation.
evaluate dependencies before executing the current module.
provide an API for modules, not just new syntax

These are effectively what AMD does today, except it does not have a way to alter the AST before final execution. Well, an AMD loader could do that, but AMD loaders have traditionally avoided it. However, the harmony loader plugin I made effectively does this to support "import *". More below:

The ES module loader would operate like so:

Load JS text. Parse out dependency references.
Load dependencies, parse out their dependencies, load them, etc...
Before executing a given module, execute its dependencies, and wait for the dependencies to finish exporting their module values.
Take that exported value and if there is an "import *" in the current module, modify the AST of the current module such that it gets a locally bound variable to any of the hasOwnProperties of the dependency that are known at that time. So, any properties added to the module after this point are not visible. This should avoid concerns about dynamic scope.
Once the module AST has been fixed up for any import *, then evaluate it.

When parsing out dependency references, look for any new keywords, but also any API that corresponds to that keyword. So, look for at('moduleID') for dependency references in addition to at 'moduleID'.

The runtime API for the module should be something like at('moduleID') for dependencies and exports.propertyName for specifying export properties. I am not arguing for that specific API, just mentioning that there would be an API alternative to the new syntax. The API alternative does not need an import alternative though.

Since a module is executed before giving the exports to a module that depends on it, and since there is a runtime API for modules, then that allows existing JS code to opt-in to ES modules without getting bitten by new syntax.

Since all dependencies are executed before executing the current module, an "import *" can be supported, and I believe that would allow for macros later.

There are some limitations around circular dependencies, but they are still possible, and the restrictions are minor in comparison to allowing existing code to opt in to ES modules and still work in non-ES module environments.

Declarative configuration

Support something like the "paths", "map" and "shim" config as used in RequireJS. This allows easier use of old code, and scales up to very large code without requiring a developer to ship a library that sets up an imperative resolution API.

Support loader plugins

Now that all dependencies are executed before executing the current module, then it is easier to support loader plugins, as the loader will have the exported value for that plugin resource before running the current module.

These environment-based loading, like an "env!" plugin that can load a module for Node and a different API-compatible one for the browser. See also a "has!" plugin for feature detection-based loading, and plugins to enable transpilers.

Yes, it is more to sort out, but they provide a lot of benefit. AMD has already primed this pump. It even works with a build/optimization step for inlining resources.

Use string IDs for module identifiers

This allows the module references in dependencies to be the same as the ID that is inlined when modules are combined and named in built files. Right now it is weird to use a JS identifier, like module Foo {} to name a module, but then see module Foo at "Foo". It is hard to match up at "Foo" with module Foo.

The extreme positions

The following is based on my limited experience. I am not a language designer. I am but a simple plumber that uses the pipes that available to build things. I may not have the right long term thinking involved, but I think the following make the ES module proposal simpler.

To be clear though, I think the middle way above is enough to bridge the gap. Please, do not read the following and then discount the middle way. The middle way is separate from these more extreme measures.

No new syntax

If there is a runtime API available to allow existing code to opt in, just shed the new syntax. Just have one way to do it via API that can also then be shimmed.

no import *

import * makes it difficult to determine where code comes from. If
this type of construct is allowed:

module foo {
   var sin = function () {};

   module bar {
       import * from "Math";
       sin();
   }
}

for a minifier, it now needs access to Math to do its work correctly. This has not been the case in the past. It would suck to need all of the code for all the modules used in a system just to complete a minifier pass.

For developers, if you have two modules that do import * it can be difficult to know where something comes from.

Destructuring provides enough benefit for these use cases, just do the
comma separated list for things you really use:

   import {sin, cos} from "Math";

import * is a bad pattern and it does not save much.

If you get rid of import * then with the "middle way" of evaluating modules, then regular var/let-based destructuring is enough, there is no need for an import keyword.

No macros

Similarly, rethink the need for macros long term. They suffer from the same "where did this come from" problem as import * does. The function capabilities in JavaScript are good enough to get the job done for the "don't repeat yourself" task.

A way forward for today's code

The nice thing is that we can prototype this new world by combining what CommonJS/Node does today with AMD. So we can just use the require() and define() as used today to get there. The ES committee does not have to ratify it, and we get the benefit of real world implementation and use before committing to default language support.

Cajon is my attempt from the AMD side to bridge the gap with plain Node code and a runtime browser loader. LinkedIn's Inject is another AMD loader that uses a similar approach. So, just use CommonJS/Node modules in the browser in dev, use the r.js optimizer to compile down to AMD for final deployment.

The cjsTranslate capability in the r.js optimizer allows a developer that always likes to do builds, even in dev, can code in Node syntax but output to AMD and load it in the browser either by the small Almond AMD shim, or the full dynamic loader via RequireJS. Or choose Dojo or curl.js.

browserify can be updated to use AMD as its transport format instead of its home-grown require.define() API, and then not have to ship a loader, but use one of the AMD loaders/API shims. browserify is nice in that, unlike the r.js optimizer+cjsTranslate, it provides browser module shims for the native node modules. It would be great to break those out as a separate project that could be consumed by a project just using the r.js optimizer.

If Node adds define() support, callback-require for use within a module for dynamically calculated dependencies, and supports at least a limited form of loader plugins, then we're done. The amdefine project is an implementation proof of that support. There are details to sort out, but it is doable. Any node committers are interested, give me holler. We can work out the details.

Summary

For developer workflow, the current ES module spec is not competing well with a combination of CommonJS/Node and AMD with loader plugins. Or even just AMD with loader plugins.

Using the middle way for module execution and getting a good declarative module ID to path configuration in the ES spec will level the playing field. Add loader plugins to get language transpiler support and environment/feature detection loading that is efficient for the browser.

I have given some of this feedback to the es-discuss list, but I think some of it, in particular the "middle way" module evaluation flow, got lost in my poor communication where it seemed like I was proposing a dynamically scoped import *. Hopefully this post clarifies what I was trying to achieve with that earlier feedback.

Finally, I appreciate working on the ES committee is very difficult work. I do not envy them. I do not mean for this feedback to come across harshly, but the committee is running out of time, and I do not feel like it has made the case very well for how what is being proposed is better than what we have cobbled together with existing technology. To be clear, I want an ES Modules proposal to succeed because I do not want to do AMD or RequireJS forever. Hopefully this feedback can be viewed as loyal opposition, and as a challenge to do better, or at least to do it in a way that is explained more clearly.

Tuesday, March 30, 2010

CommonJS Module Trade-offs

First of all: why should you care about module formats?

If you use JavaScript, particularly in the browser, more is being expected of you each day. Every site or webapp that you build will want to do more things over time, and browser engines are getting faster, making more complex, web-native experiences possible. Having modular code makes it much easier to build these experiences.

One wrinkle though, there is no standard module format for the browser. There is the very useful Module Pattern, that helps encapsulate code to define a module, but there is no standard way to indicate your module's dependencies.

I have been following some of the threads in the CommonJS mailing list about trying to come up with a require.async/ensure spec and a Transport spec. The reason those two specs are needed in addition to the basic module spec is because the CommonJS module spec decided to make some tradeoffs that were not browser-friendly.

This is my attempt to explain the trade-offs the CommonJS module spec has made, and why I believe they are not the right trade-offs. The trade-offs end up creating a bunch of extra work and gear that is needed in the browser case -- to me, the most important case to get right.

I do not expect this to influence or change the CommonJS spec -- the developers that make up most of the list seem to generally like the module format as written. At least they agreed on something. It is incredibly hard to get a group of people to code in a certain direction, and I believe they are doing it because they love coding and want to make it easier.

I want to point out the trade-offs made though, and suggest my own set of trade-offs. Hopefully by explicitly listing them out, other developers can make informed choices on what they want to use for their project.

Most importantly, just because "CommonJS" is used for the module spec, it should not be assumed that it is an optimal module spec for the browser, or that it should be the default choice for a module spec.

Disclosure: I have a horse in this race, RequireJS, and much of its design comes from a different set of tradeoffs that I will list further down. I am sure someone who prefers the CommonJS spec might have a different take on the trade-offs.

To the trade-offs:

1) No function for encapsulating a module.

A function around a module can seem like more boilerplate. Instead each module in the CommonJS spec is just a file. This means only one module per file. This is fine on the server or local disk, but not great in the browser if you want performance.

2) Referencing and loading dependencies synchronously is easier than asynchronous

In general, sync programming is easier to do. That does not work so well in the browser though.

3) exports

How do you define the module value that other modules can use? If a function was used around the module, a return value from that function could be used as the module definition. However, in the effort to avoid a function wrapper, it complicates setting up a return value. The CommonJS spec instead uses a free variable called "exports".

The value of exports is different for each module file, and it means that you can only attach properties to the exports module. Your module cannot assign a value to exports.

It means you cannot make a function as the module value. Some frameworks use constructor functions as the module values -- these will not be possible in CommonJS modules. Instead you will need to define a property on the exports object that holds the function. More typing for users of your module.

Using an exports object has an advantage: you can pass it to circular dependencies, and it reduces the probability of an error in a circular dependency case. However, it does not completely avoid circular dependency problems.

Instead, I favor these trade-offs:

1) Use a function to encapsulate the module.

This is basically the core of the previously-mentioned Module Pattern. It is in use today, it is an understood practice, and functions are at the core of JavaScript's built-in modularity.

While it is an extra function(){} to type, it is fairly standard to do this in JavaScript. It also means you can put more than one module in a file.

While you should avoid multiple modules in a file while developing, being able to concatenate a bunch of modules together for better performance in the browser is very desirable.

2) Assume async dependencies

Async performs better overall. While it may not help performance much in the server case, making sure a format performs well out of the box in the browser is very important.

This means module dependencies must be listed outside the function that defines the module, so they can be loaded before the module function is called.

3) Use return to define modules

Once a function is used to encapsulate the module, the function can return a value to define the module. No need for exports.

This fits more naturally with basic JavaScript syntax, and it allows returning functions as the module definition. Hooray!

There is a slightly higher chance of problems in circular dependency cases, but circular dependencies are rare, and usually a sign of bad design. There are valid cases for having circular dependencies, but the cases where a return value might be a problem for a circular dependency case is very small, and can be worked around.

If getting function return values means a slightly higher probability of a circular dependency error (which has a mitigation) then that is the good trade-off.

This avoids the need for the "exports" variable. This is fairly important to me, because exports has always looked odd to me, like it did not belong. It requires extra discovery to know its purpose.

Return values are more understandable, and allowing your module to return a function value, like a constructor function, seems like a basic requirement. It fits better with basic JavaScript.

4) Pass in dependencies to the module's function wrapper

This is done to decrease the amount of boilerplate needed with a function wrapped modules. If this is not done, you end up typing the dependency name twice (an opportunity for error), and it does not minify as well.

An example: let's define a module called "foo", which needs the "logger" module to work:


require.def("foo", ["logger"], function () {

    //require("logger") can be a synchronous call here, since
    //logger was specified in the dependency array outside
    //the module function
    require("logger").debug("starting foo's definition");

    //Define the foo object
    return {
        name: "foo"
    };
});

Compare with a version that passes in "logger" to the function:


require.def("foo", ["logger"], function (logger) {

    //Once "logger" module is loaded it is passed
    //to this function as the logger function arg
    logger.debug("starting foo's definition");

    //Define the foo object
    return {
        name: "foo"
    };
});

Passing in the module has some circular dependency hazards -- logger may not be defined yet if it was a circular dependency. So the first style, using require() inside the function wrapper should still be allowed. For instance, require("logger") inside a method that is created on the foo object could be used to avoid the circular dependency problem.

So again, I am making a trade-off where the more common useful case is easier to code vs increasing the probability of circular dependency issues. Circular dependencies are rare, and the above has a mitigation via the use of require("modulename").

There is another hazard that can happen with naming args in the function for each dependency. You can get an off-by-one problem:


require.def("foo", ["one", "two", "three"], function (one, three) {
    //In here, three is actually pointing to the "two" module
});

However, this is a standard coding hazard, not matching inputs args to a function. And there is mitigation, you could use require("three") inside the module if you wanted.

The convenience and less typing of having the argument be the module is useful. It also fits well with JSLint -- it can help catch spelling errors using the argument name inside the function.

5) Code the module name inside the module

To define the foo module, the name "foo" needs to be part of the module definition:


require.def("foo", ["logger"], function () {});

This is needed because we want the ability to combine multiple module definitions into one file for optimization. In addition, there is no good way to match a module definition to its name in the browser without it.

If script.onload fired exactly after the script is executed, not having the module name in the module definition might work, but this is not the case across browsers. And we still need to allow the name to be there for optimization case, where more than one module is in a file.

There is a legitimate concern that encoding the module name in the module definition makes it hard to move around code -- if you want to change the directory where the module is stored, it means touching the module source to change the names.

While that can be an issue, in Dojo we have found it is not a problem. I have not heard complaints of that specific issue. I am sure it happens, but the fix cost is not that onerous. This is not Java. And YUI 3 does something similar to Dojo, encode a name with the module definition.

I think the rate of occurrence of this issue, and the work it takes to fix are rarer and one time costs vs. forcing every browser developer taking extra, ongoing costs of using the CommonJS module format in the browser.

Conclusion

Those are the CommonJS trade-offs and my trade-offs. Some of them are not "more right" but just preferences, just like any language design. However, the lack of browser support in the basic module spec is very concerning to me.

In my eyes, the trade-offs CommonJS has made puts more work on browser developers to navigate more specs and need more gear to get it to work. Adding more specs that allow modules to be expressed in more than one way is not a good solution for me.

I see it as the CommonJS module spec making a specific bet: treating the browser as a second class module citizen will pay off in the long run and allow it to get a foothold in other environments where Ruby or Python might live.

Historically, and more importantly for the future, treating the browser as second class is a bad bet to make.

All that said, I wish the CommonJS group success, and there are lots of smart people on the list. I will try to support what I can of their specs in RequireJS, but I do feel the trade-offs in the basic module spec are not so great for browser developers.