In a comment, Neel Krishnaswami asked us to discuss networked persistence. Here
are some preliminary thoughts on this.
The most important thing about persistence in any context is that it
should fit naturally with the language. This means lots of things:
internal-external data interchange; persistent storage identification
and management; hooking up to the language's control model; and more.
For now I'll discuss these three points. Notice that the questions
themselves aren't really about networked persistence, but the
network will show up in each of the answers.
Data interchange on the Internet naturally demands an XML story, and
lots of language research is trying to wrestle with this. If you step
outside research and spend some time in blog-space or studying Web
service APIs, you see a very interesting trend: the growing acceptance
and even promotion of JSON. Because JSON obviously integrates nicely
into JavaScript, and its support is increasing, we're ducking the XML
question entirely for now. In fact, our Web services API
automatically converts XML data into JSON for your convenience.
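
To make the "integrates nicely" point concrete, here is a minimal sketch of what consuming converted data looks like from JavaScript. The payload, its field names, and the XML-to-JSON mapping shown in the comment are all made up for illustration; this is not the output format of our Web services API.

```javascript
// Hypothetical example: suppose a service's XML response
//   <user><name>Ada</name><langs><lang>JS</lang><lang>ML</lang></langs></user>
// arrived already converted to JSON. The mapping below is an assumption.
const payload = '{"user": {"name": "Ada", "langs": ["JS", "ML"]}}';

// JSON.parse yields ordinary JavaScript objects and arrays...
const data = JSON.parse(payload);

// ...so fields are read with plain property access, with no tree-walking API.
const name = data.user.name;
const langCount = data.user.langs.length;

// Going back to the wire format is a single call as well.
const roundTrip = JSON.stringify(data);
```

The point is only that no separate document model sits between the wire format and the program's own values.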
What is persistent storage? On a local system, computer scientists
find it useful to categorize at least three different kinds of
storage: the database, the persistent heap, and the filesystem. (And
that's arguably just a 1970s view of systems.) We can distinguish
between these along several dimensions: the object model, the power of
query, the style of naming, the interaction with processes, etc. In
Flapjax, though, many of these dimensions lose their relevance for two
important reasons. First, we don't have (and don't want) a true
distributed operating system, so the process boundary dies at the
network interface (with end-to-end services taking its place).
Second, network latency greatly distorts the cost models. So some
unification of these concepts may be possible. For now, Flapjax gives
each user a home object whose fields hold their data. From the
program's viewpoint it's just a persistent object, but the user can
think of fields as a pun for subdirectories, and use the
object-browser like a file-browser. What we definitely don't have is
a story on search, aggregation, and other functions you get from SQL.
We have various thoughts on this, ranging from leveraging XPath to
letting users define schemas, but (indeed, therefore!) this is very
much an open question.
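
The "fields as subdirectories" pun can be sketched in a few lines. This is an in-memory illustration only: the `home` contents and the `resolve` and `list` helpers are hypothetical names invented here, not part of Flapjax, and nothing below touches the network or real persistence.

```javascript
// Hypothetical home object: nested fields play the role of subdirectories.
const home = {
  notes: { todo: "write paper", drafts: { intro: "In a comment..." } },
  settings: { theme: "dark" }
};

// Follow a slash-separated path through nested fields, file-browser style.
// Returns undefined if any step of the path is missing.
function resolve(root, path) {
  const parts = path.split("/").filter((p) => p.length > 0);
  let node = root;
  for (const part of parts) {
    const isObject = node !== null && typeof node === "object";
    if (!isObject || !Object.prototype.hasOwnProperty.call(node, part)) {
      return undefined;
    }
    node = node[part];
  }
  return node;
}

// List the "directory entries" (field names) at a path; leaves have none.
function list(root, path) {
  const node = resolve(root, path);
  return node !== null && typeof node === "object" ? Object.keys(node) : [];
}
```

For instance, `list(home, "notes")` gives the field names under `notes`, and `resolve(home, "notes/drafts/intro")` reads a leaf value. Notice what is missing: there is no way to ask a question across fields, which is exactly the search and aggregation gap described above.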
The control front is somewhat obvious: because Flapjax computations
are time-varying, changes must naturally be pushed to the server, and
server changes must naturally trigger renewed computation on the
client. Defining this crisply has proven to be somewhat tricky. One
issue is handling aggregate data, which Michael Greenberg is studying
from first principles. The other is how to deal with multiple clients
that write simultaneously. Obviously there is already research on this
topic, but we do have our hands tied by the lack of server-push, the
sheer number of clients that may be accessing a datum, existing APIs,
etc. We have our own distributed algorithm that helps with this, but
just defining reasonable behavior is tricky: if you don't get it
right, a client who is sharing a writeable buffer will see the new
characters they type disappear before their very eyes.
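
The disappearing-characters problem is easy to reproduce in a toy model. The sketch below is not our distributed algorithm; it assumes a polling client (since there is no server-push), handles appends only, and every name in it (`type`, `naivePoll`, `safePoll`, `flush`) is hypothetical.

```javascript
// Toy model: the server holds one shared text buffer; clients poll it.
const server = { text: "hello" };

// A client keeps its visible text plus whatever it typed but has not sent.
function makeClient(initialText) {
  return { text: initialText, pending: "" };
}

// Local typing shows up immediately and is remembered as unsent.
function type(client, chars) {
  client.text += chars;
  client.pending += chars;
}

// Naive poll: the server snapshot simply replaces the local buffer,
// so anything typed since the last send vanishes from the screen.
function naivePoll(client, serverText) {
  client.text = serverText;
}

// Safer poll (appends only): re-apply unsent typing on top of the snapshot.
function safePoll(client, serverText) {
  client.text = serverText + client.pending;
}

// Send the unsent typing to the server and clear it locally.
function flush(client, srv) {
  srv.text += client.pending;
  client.pending = "";
}

// Scenario: two clients type " world"; meanwhile someone else appends "!".
const naive = makeClient(server.text);
const safe = makeClient(server.text);
type(naive, " world");
type(safe, " world");
server.text += "!";                 // a concurrent write by another client

naivePoll(naive, server.text);      // naive.text loses " world"
safePoll(safe, server.text);        // safe.text keeps it after the "!"
flush(safe, server);                // the server now has both edits
```

Even this toy version dodges the hard parts: inserts in the middle of the buffer, deletions, and deciding what order concurrent edits should land in all need a real definition of "reasonable behavior".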
We need to clean things up a little more before we say much. My
reticence isn't in the usual academic "I have a paper coming up
and I don't want someone to scoop it or find the flaw in it before it
gets in" manner; rather, we really do have more design work to
do.
And, of course, we welcome feedback and prioritization.