Julius Šėporaitis / Sepa Software Engineering

There is a problem with web frameworks

This is an essay about a new way to build websites, a way completely different from what current web frameworks (at least the PHP ones) offer. I will dare to say that they are all flawed and then present one way to fix the problem. I will also boast a little: this way was presented at a W3C workshop to people from W3C, IBM, Nokia, Oracle and others. It is also available for you to try out, as an open source project. Although this post is long, I hope you won't be frightened by that and will enjoy reading it as much as I enjoyed writing it!

However...

TL;DR. Frameworks are bad because of object-relational impedance mismatch and leaky abstractions. We made Graphity - an open-source project. Read our position paper for W3C workshop here.

First things first. In the summer of 2011 I was working, among a couple of other things, on a website called Wulffmorgenthaler. There is a different story behind the English version (just in case you have questions), but today I will concentrate on the Danish website and its successor - HeltNormalt. Around the middle of 2011, the owner of the website decided to rebrand: a new website with a LOT of new kinds of content. The old codebase was a terrible piece of craftsmanship, part of which, sadly, was my doing too, but the rebranding meant a green light for a fresh start. Oh boy, was I happy about it! No more whining about Other People's Code, and I was eager not to repeat my past mistakes and to make my code even better this time.

At the same time, Symfony2 Beta happened to be out. I had not used Symfony, but decided to take a look - especially as some of my good developer friends were buzzing about it. I downloaded the sandbox, read the documentation and some blog posts. Symfony2 felt like a fresh breeze after years with Zend Framework. I liked the structure of the code, I liked their approach to development/production environments and asset management. I thought, "I shall push to use it for our fresh start!". Early adoption all the way.

But then again, I was not in this alone. We were three coders, and while I happened not to be the lead, I did have a voice. Oh man, did I preach "Symfony2! Symfony2!", but the final word was in other hands.

The lead, Martynas, had a long-standing and never-ending interest in a technology called the Semantic Web. He deserves credit for what happened in the following months. So the decision was made to use his approach and build any tools necessary along the way (there were not many of them for PHP anyway). Then all hell broke loose. Another colleague, Aleksandras, and I both loudly objected, argued, discussed, SHOUTED IN CAPS LOCK and so on. But the decision was immovable and, as the future showed, it was to the benefit of our own experience, the code quality and our perspective on things.

The coding began... And during those months of development I saw that there is a drastically different approach to doing things. I do not lie - it took maybe three months to grasp all this myself. Sometimes I didn't even know what I was coding! Luckily, Martynas had a clear vision and led us through, though there were still some discussions up until the end.

So, what did I grasp?

There is a problem with frameworks, because

They try to help me too much. I started to hate the idea of using any MVC-based framework. "Why?" is a good question. Here's an answer:

Contrary to the "easy to learn" slogans, shiny documentation and easy examples, frameworks do not let me prototype things fast, unless I know them inside-out. And even if I am an expert, I am constrained.

Consider this: a framework is missing a feature. There are two ways to solve this. I can search for a piece of code written for that same purpose by another great developer, and use it. But 9 times out of 10 that piece of code does not fit my exact needs - I will need to "polish" it. Or it fits my requirements, but the code structure is totally different. Or it has no tests, but seems to work. Or I might end up writing it myself and, hopefully, if I understand the problem domain well enough, the code will be pretty good too. However, even in this case I am constrained. I am bound to structure my code in the way the framework authors intended the framework to be extended. What if I want to be a free spirit and do things my way, when I know I will do them better?

P.S. As I am writing this - "Linus Torvalds on C++" appeared on Hacker News. And I totally relate to his ideas, in a different domain though.

models are a fifth foot on a dog, and

I have long since heard that ORM is an anti-pattern and I agree with that. But I had always thought the problem was with the implementations - no one had created the right one yet. Now I have come to the horrifying conclusion that Models in MVC are a totally worthless piece of code. <irony>They are needed only for bugs to hide somewhere.</irony> This comes from realizing that most of the time I opened Model code, it was to fix something.

Consider this train of thought: requirements of (especially web) software are constantly changing, and these changes map directly onto the data structure. When an unavoidable change to the data and logic arrives, you update them and in the end open up the Model class, add/remove methods/parameters, and change various queries to use those methods/parameters. It feels so unnatural that when you change your data and your queries, you have to change something more. This is a constraint on data. There is a better way.

Your data should be your model - it should be self-contained. Your code shouldn't bother with the internal representation; rather, it should care about data transformation. Which brings me to my next idea.

views are impostors!

Yes, Views do not do what they should. Views are representing your data, rather than presenting it. This is such a subtle difference that it didn't come to me quickly, but when I grasped it - I was shocked again.

I'd expected a View to be, in one way or another, the same data I have in my database, except presented in a way that the human eye or a browser script could make sense of. However, your commonly known View mocks itself by representing the data using a spaghetti of moustaches! A View is also static and tightly knit to the underlying Model it tries to represent.

Here's an example: a requirement change happens for some theoretical website. Some theoretical engineer opens up the Model code, updates properties and queries in accordance with the new requirements and then goes on to change the View.

It's quite alright to update the views with information on how to present new properties, but why should a View care if those properties are available in the data or not? It shouldn't.

The alternative is The Transformation - a thorough dramatic change in form or appearance.

Think of it like this: The Transformation thingy knows what properties our data might have and how to present them. Yes yes, tiny snippets of code. Then you query your data, retrieve some properties and push them through The Transformation. What comes out? Data is presented according to these rules.

If you change the query, add or remove properties, the data is still presented without any change to the transformation code!
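To make the idea a bit more tangible, here is a minimal sketch in Python. It is not Graphity code and it is not XSLT; the names `RULES` and `transform` and the plain-dictionary "data" are assumptions made up purely for illustration. Each rule is a tiny snippet that knows how to present one property, and the transformation simply applies whichever rules have matching data.

```python
from html import escape

# Hypothetical presentation rules: one tiny snippet per property
# that the data *might* have.
RULES = {
    "title": lambda value: "<h1>%s</h1>" % escape(value),
    "thumbnail": lambda value: '<img src="%s">' % escape(value),
    "issued": lambda value: "<time>%s</time>" % escape(value),
}


def transform(resource):
    """Present every known property that the query happened to return."""
    return "".join(
        rule(resource[name]) for name, rule in RULES.items() if name in resource
    )


# The same transformation handles a rich result and a sparse one.
full = {
    "title": "Striben",
    "thumbnail": "/img/strip/thumb/2012/02/28.jpg",
    "issued": "2012-02-28",
}
partial = {"title": "Striben"}

print(transform(full))
print(transform(partial))
```

The query result can grow or shrink and `transform` stays untouched; only a brand new kind of property needs a new rule.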

I know, the difference might seem very subtle, but the implications are significant.

You can also consider this metaphor: a View is like looking at a photograph of a window - you will see the same thing until you take another picture of the same window from a different angle. A transformation, on the other hand, is like looking directly through the window and seeing a different view as you smoothly change your perspective.

How it should work.

What do I want from a framework, then? Primarily, healthy abstractions of the low-level stuff that I need for a webapp. For example, I now argue (I wouldn't have believed it a year ago) that the simplest abstractions needed for a website are:

  1. Request - something that comes from the client.
  2. Response - something that goes back to the client.
  3. Resource - the data.
  4. Repository - a place to store, retrieve and update Resources.
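
As a rough illustration of how little these four things need to be, here is a sketch in Python. This is not the Graphity API: every class, method and the `handle` function are hypothetical, and the in-memory dictionary merely stands in for a real datastore.

```python
from dataclasses import dataclass, field


@dataclass
class Request:
    """Something that comes from the client."""
    method: str
    uri: str


@dataclass
class Response:
    """Something that goes back to the client."""
    status: int
    body: str = ""


@dataclass
class Resource:
    """The data: a URI plus whatever properties it happens to have."""
    uri: str
    properties: dict = field(default_factory=dict)


class Repository:
    """A place to store, retrieve and update Resources (in memory here)."""

    def __init__(self):
        self._resources = {}

    def store(self, resource):
        # Storing under an existing URI doubles as an update.
        self._resources[resource.uri] = resource

    def retrieve(self, uri):
        return self._resources.get(uri)


def handle(request, repository):
    """Hypothetical glue: look the requested URI up and answer with its data."""
    resource = repository.retrieve(request.uri)
    if resource is None:
        return Response(404)
    lines = ["%s: %s" % item for item in sorted(resource.properties.items())]
    return Response(200, "\n".join(lines))


repository = Repository()
repository.store(
    Resource("/striben/2012/02/28", {"type": "Post", "issued": "2012-02-28"})
)
print(handle(Request("GET", "/striben/2012/02/28"), repository))
print(handle(Request("GET", "/missing"), repository))
```

Note that nothing here describes what a "Post" is; the Resource just carries whatever properties the data has.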

Do you see the difference? Instead of hiding the low-level stuff that modern frameworks tend to hide, we embraced it! Indeed, when we finished the rebranded Danish entertainment website HeltNormalt last year, after months of work, there were just those four things behind the scenes. Yet, believe it or not, the new website holds more than ten different types of content compared to just two in the old one. Here are some statistics about the code:

  • Controller code in the old website, LOC (Lines-of-Code): 7625 vs Resource code in new one, LOC: 652.
  • View code in the old website, LOC: 2528 vs Transformations code in new one, LOC: 1898.
  • Model code in the old website, LOC: 19125 vs Query code in the new one, LOC: 614.
  • Zend Framework behind old website, LOC: tens of thousands vs Our framework (Graphity), LOC: ~5000.

Less code - fewer bugs.

The Model/Query difference comes mainly from our ORM. We used Propel, which generated a lot of code. You might ask, what's the Query thing? Well, we don't have models - but we do query the data. The point is, because our data is autonomous, we need only to query for the stuff (properties) that we need. We need not describe the data as Models.

Putting it all together (IRL).

Let me explain how it all works in real life, on the HeltNormalt website, without diving deeply into RDF and SPARQL (the Semantic Web technologies behind the scenes).

Resources

Every resource in our datastore comprises a number of triples, each of which is a resource URI, a property name, and a value. For simplicity's sake, a heavily stripped-down version of a resource looks like this:

    @base <http://heltnormalt.dk> .

    @prefix rdf: <http://www.w3.org/1999/02/22-rdf-syntax-ns#> .
    @prefix xsd: <http://www.w3.org/2001/XMLSchema#> .
    @prefix dct: <http://purl.org/dc/terms/> .
    @prefix sioc: <http://rdfs.org/sioc/ns#> .
    @prefix foaf: <http://xmlns.com/foaf/0.1/> .

    </striben/2012/02/28> rdf:type sioc:Post .
    </striben/2012/02/28> dct:issued "2012-02-28T00:00:00"^^xsd:dateTime .
    </striben/2012/02/28> sioc:has_container </striben> .
    </striben/2012/02/28> foaf:thumbnail </img/strip/thumb/2012/02/28.jpg> .
    </striben/2012/02/28> foaf:depiction </img/strip/2012/02/28.jpg> .

It is pretty straightforward and natural - every resource has a URL. Each URL (and thus each resource) can have named properties, and each property has a value. A value can be another URL (thus a link to another Resource) or a literal, such as a string or a date.
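
If triples still feel abstract, here is the same resource as plain Python tuples. This is only a sketch for illustration: the prefixed names are kept as ordinary strings, and the `describe` helper is hypothetical, not something a real triple store exposes under that name.

```python
from collections import defaultdict

BASE = "http://heltnormalt.dk"
STRIP = BASE + "/striben/2012/02/28"

# Each triple is (resource URI, property name, value).
triples = [
    (STRIP, "rdf:type", "sioc:Post"),
    (STRIP, "dct:issued", "2012-02-28T00:00:00"),
    (STRIP, "sioc:has_container", BASE + "/striben"),
    (STRIP, "foaf:thumbnail", BASE + "/img/strip/thumb/2012/02/28.jpg"),
    (STRIP, "foaf:depiction", BASE + "/img/strip/2012/02/28.jpg"),
]


def describe(subject, triples):
    """Collect every property of one resource; no class definition needed."""
    properties = defaultdict(list)
    for resource, name, value in triples:
        if resource == subject:
            properties[name].append(value)
    return dict(properties)


strip = describe(STRIP, triples)
for name, values in strip.items():
    print(name, values)
```

Adding a new property to the data means adding one more tuple; there is no schema or class to update.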

Remember, how I wrote that we don't need Models, because our data is self-contained? This is what I meant.

P.S. The snippet above is a very helpful thing called Turtle syntax, though simplified here. Actual data is in RDF/XML.

Queries

Now, as we don't have Models and ORM as such, we still need to get our data somehow. So imagine the triples above as a graph - in the center there is a resource with edges going out (properties). At the other end of each edge there is a value - a string, or (surprise surprise) another resource linked to this one. Now imagine hundreds of resources linked this way. A web of linked data.

How do you retrieve information from this graph? By a thing similar to pattern matching! In the query you say that you want to get some triples with some properties and values, and leave some blanks that should be filled up in results. Sounds vague, but here's an example:

    PREFIX rdf: <http://www.w3.org/1999/02/22-rdf-syntax-ns#>
    PREFIX foaf: <http://xmlns.com/foaf/0.1/>
    PREFIX sioc: <http://rdfs.org/sioc/ns#>

    CONSTRUCT {
        ?uri rdf:type sioc:Post .
        ?uri foaf:thumbnail ?thumbUrl .
    } WHERE {
        ?uri rdf:type sioc:Post .
        ?uri sioc:has_container <http://heltnormalt.dk/striben> .
        ?uri foaf:thumbnail ?thumbUrl .
    }

A very similar query is executed when you type in the address http://heltnormalt.dk/striben - and you get the list of strips. In simplified form, the results look like this:

    <?xml version="1.0" encoding="utf-8"?>
    <rdf:RDF xmlns:rdf="http://www.w3.org/1999/02/22-rdf-syntax-ns#"
             xmlns:foaf="http://xmlns.com/foaf/0.1/">
        <rdf:Description rdf:about="http://heltnormalt.dk/striben/2012/02/28">
            <rdf:type rdf:resource="http://rdfs.org/sioc/ns#Post"/>
            <foaf:thumbnail rdf:resource="http://heltnormalt.dk/img/strip/thumb/2012/02/28.jpg"/>
        </rdf:Description>
        <rdf:Description rdf:about="http://heltnormalt.dk/striben/2012/02/29">
            <rdf:type rdf:resource="http://rdfs.org/sioc/ns#Post"/>
            <foaf:thumbnail rdf:resource="http://heltnormalt.dk/img/strip/thumb/2012/02/29.jpg"/>
        </rdf:Description>
    </rdf:RDF>

Sorry for the XML, but I promise - it is important. I hope you see how the pattern matching worked here. Just in case: the query said "find me all resources (?uri) and their thumbnails (?thumbUrl), where the resources have the type Post, the container /striben and a thumbnail property."
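
For the curious, the "leave some blanks" idea can be imitated in a few lines of Python. This is a toy sketch, not how a SPARQL engine is actually implemented: `match` and `query` are hypothetical helpers, terms are plain strings, anything starting with `?` is treated as a blank, and the `/other/1` resource is invented just to show something being filtered out.

```python
def match(pattern, triple, binding):
    """Extend `binding` so that `pattern` fits `triple`, or return None."""
    extended = dict(binding)
    for wanted, actual in zip(pattern, triple):
        if wanted.startswith("?"):
            # A blank: fill it in, or check it agrees with an earlier fill.
            if extended.setdefault(wanted, actual) != actual:
                return None
        elif wanted != actual:
            return None
    return extended


def query(patterns, triples):
    """Return every way of filling the blanks that satisfies all patterns."""
    bindings = [{}]
    for pattern in patterns:
        next_bindings = []
        for binding in bindings:
            for triple in triples:
                extended = match(pattern, triple, binding)
                if extended is not None:
                    next_bindings.append(extended)
        bindings = next_bindings
    return bindings


BASE = "http://heltnormalt.dk"
triples = [
    (BASE + "/striben/2012/02/28", "rdf:type", "sioc:Post"),
    (BASE + "/striben/2012/02/28", "sioc:has_container", BASE + "/striben"),
    (BASE + "/striben/2012/02/28", "foaf:thumbnail", BASE + "/img/strip/thumb/2012/02/28.jpg"),
    (BASE + "/striben/2012/02/29", "rdf:type", "sioc:Post"),
    (BASE + "/striben/2012/02/29", "sioc:has_container", BASE + "/striben"),
    (BASE + "/striben/2012/02/29", "foaf:thumbnail", BASE + "/img/strip/thumb/2012/02/29.jpg"),
    # Invented resource in another container; the query should skip it.
    (BASE + "/other/1", "rdf:type", "sioc:Post"),
    (BASE + "/other/1", "sioc:has_container", BASE + "/other"),
]

# The WHERE part of the query above, written as three patterns.
patterns = [
    ("?uri", "rdf:type", "sioc:Post"),
    ("?uri", "sioc:has_container", BASE + "/striben"),
    ("?uri", "foaf:thumbnail", "?thumbUrl"),
]

results = query(patterns, triples)
for row in results:
    print(row["?uri"], row["?thumbUrl"])
```

Dropping the third pattern would return the same two strips without thumbnails, and nothing else in the code would need to change.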

Transformation

Time to live up to the promise - why is XML important? Well, because for transformations we used the most natural transformation tool available for XML: XSLT. I won't dive into XSLT (the Wikipedia article has some examples), but suffice it to say that we can present our data in any way we want - HTML, XML, JSON, plain text, etc. - just by using different XSL stylesheets. We could even generate a valid SQL dump to import into a MySQL database, but seriously - we don't want to do that. :-) (But we did the exact opposite! We had to import old data.)

That's one of the greatest outcomes of all this - the logic is stripped down (there is still some logic, though), and what's left for you is XML transformations. The thing is, you greatly reduce the chance of bugs - your data can be incorrect, but it cannot contain bugs or be invalid (as long as validation in the datastore works correctly). And when we did have issues with some properties missing from a resource, nothing broke: we had our XSLTs set up in such a way that the part where that property value should have been shown simply was not shown. No ifs, no template logic. And you get this pretty much by default if you use XSLT correctly.
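
Here is a rough imitation of that behaviour in Python, using the standard library's ElementTree rather than a real XSLT processor (so this is a stand-in technique, not what we ran in production). The `TEMPLATES` table plays the role of per-element templates and `render` the role of applying them; both names, and the HTML they produce, are assumptions for illustration only. The second resource deliberately has no thumbnail.

```python
import xml.etree.ElementTree as ET
from html import escape

RDF = "http://www.w3.org/1999/02/22-rdf-syntax-ns#"
FOAF = "http://xmlns.com/foaf/0.1/"

# ElementTree spells namespaced names as "{namespace}local".
RDF_ABOUT = "{" + RDF + "}about"
RDF_RESOURCE = "{" + RDF + "}resource"

RDF_XML = f"""<rdf:RDF xmlns:rdf="{RDF}" xmlns:foaf="{FOAF}">
  <rdf:Description rdf:about="http://heltnormalt.dk/striben/2012/02/28">
    <foaf:thumbnail rdf:resource="http://heltnormalt.dk/img/strip/thumb/2012/02/28.jpg"/>
  </rdf:Description>
  <rdf:Description rdf:about="http://heltnormalt.dk/striben/2012/02/29"/>
</rdf:RDF>"""

# One "template" per property element; each only knows how to present itself.
TEMPLATES = {
    "{" + FOAF + "}thumbnail": lambda el: '<img src="%s">' % escape(el.get(RDF_RESOURCE)),
}


def render(description):
    """Apply the matching template to every child element that has one."""
    parts = [TEMPLATES[child.tag](child) for child in description if child.tag in TEMPLATES]
    return '<li data-uri="%s">%s</li>' % (escape(description.get(RDF_ABOUT)), "".join(parts))


root = ET.fromstring(RDF_XML)
html = "<ul>" + "".join(render(description) for description in root) + "</ul>"
print(html)
```

The only conditional lives in the generic dispatch, not in the thumbnail template itself: a resource without a thumbnail just produces an empty list item instead of an error.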

Embrace the Open Source version of this!

Early on, Martynas said that after we finished the website, we should extract the backbone system and present it as an open-source framework. Behold, fellow hackers - The Graphity.

At the end of the process, we felt that we did not invent anything new - we just reused what was always there, but hidden. So sometimes we like to call Graphity not a Framework, but instead - an Architecture! In theory this approach should work in any language that is widely used in web development today. Martynas has a working Java version of the same thing, slightly more sophisticated because Java already has some packages for working with linked data / the semantic web, so he did not need to write everything from scratch as in the PHP version. Python? Ruby? I hope there will be a version for those languages, and others, too!

Oh, and by the way, you might be puzzled - where to store the data if you decide to play with Graphity? Well, I happen to know one SaaS company, Dydra - just register and use it. In fact we did and, oh boy, how friendly and helpful they were throughout the development of HeltNormalt! A perfect example for me of what customer care should look like. Seriously, check them out.

Adventures at M.I.T. and a paper about Graphity.

In fall 2011 Martynas found a call for participation in the Linked Enterprise Data Patterns Workshop and said we should try to enter by writing a short paper about what we did, how we did it, and how this could benefit the Semantic Web movement.

We did write the paper, we were accepted, and in early December we flew over The Pond to Boston, MA to do a presentation! Actually, at the time I felt that our presentation (a Danish entertainment company website) looked a little bit off among enterprise giants like IBM, Nokia and Oracle, just to name a few! But then again, who cares about being a little bit off when sitting next to Tim Berners-Lee and listening to a bunch of other great people talking about this great technology and drafting the guidelines for its future?

If you are interested in the paper we wrote, it has more comparative information on how this approach differs from today's common practice in web development. Read it here: "Graphity - A Generic Linked Data Framework".

We invite you to collaborate!

I hope someone endured up until here :-) This is the most important part actually!

We truly believe in Graphity, but as our ways with HeltNormalt have parted, we cannot spend a significant amount of time on it.

Although we do spend some hours per week improving it, more hands and minds are always better, so if you feel interested - don't hesitate! Try it out and contribute; we will be there for you on our GitHub account. I will also try to write more about it on this new and shiny blog, and you can always drop an email to me directly or to info@graphity.org.

What are your thoughts on this - let's talk in comments!

P.S. This essay wouldn't be real without some help and feedback from: Martynas, Aleksandras, Aurelijus, James and Adomas. Huge thanks, guys!