Julius Šėporaitis / Sepa Software Engineering

Being email friendly

Managing email is hard. All notifications, error messages and marketing email aside, I get a few actionable emails from people each day. The number itself is by no means big, but the keyword here is actionable. Actionable means it carries important information that triggers an action: a code review reminder, a bug hunt request, a check in the code to confirm that a feature is not a bug, a reply to a partner or colleague unblocking their tasks.

All of it takes time, and we always want to save time to have more of it for our own tasks at hand. I, for instance, can plead guilty to trying to save this precious time by reducing my email output: writing fewer emails or, when I could, fewer words. Fewer words mean less information and less human touch. Less of both results in miscommunication, uncertainty and general grumpiness.

I used to do this extensively - reply only when I had something to share, with just enough information. If an actionable email came in, I used to reply only once I had done the work, unless explicitly asked for an estimate. It took one book and a period of silent experiments until I think I can finally say that my email etiquette has made an almost 180 degree turn. Not a full 180, because when overwhelmed I still sin and fail to reply properly or in time, and my personal inbox still hurts. But here's how the good things happened...

There are countless blog posts and books about inbox management. What might set this one apart is the fact that the book which helped me was not about email management at all, but about making friends: the never-aging classic "How to Win Friends and Influence People" by Dale Carnegie.

It is a simple book about human(e) interactions. I honestly regret not reading it much, much earlier, but I am happy that I did eventually. What I liked most about it is that it had practical examples of situations. For me, lacking in people skills, this was the backbone of the change, and I immediately started applying the principles from the book that could be applied to email.

In a similar way to the book, I'll present a principle and how I applied it. Hopefully, you'll find something new and useful.

Remember that a person's name is to that person the sweetest and most important sound in any language

Not writing a name in emails now feels rude to me - if an email is genuinely about the person, why start it with a plain ol' "Hi," when you can write:

Hi Name,

It feels awkward to start with a name in a multi-person thread, but if I appreciate what the person does, I try to use their name at least once in the main body.

Talk in terms of the other person's interests

I use this one at work. If I have to write an email to our partners, I spend a couple of extra minutes thinking about what my goal is and how to express it in terms of what the recipient might be interested in. It is very easy to just go with "Hi, I want this and that done by then, otherwise I cannot do mutual interest here," but it is much more effective to write:

Hi Name, I am close to delivering mutual interest, but before I can do that I need a couple of questions answered.

Make the other person feel important - and do it sincerely

I recently read an article, and one idea from it stuck with me: leaders avoid using first person pronouns. So, striving to emulate a small leadership quality, I started to avoid using first person pronouns except when absolutely necessary. The result is naturally more about the recipient. It felt unnatural and insincere at first, simply because I had never tried to do it, but after a little practice - multiple rewrites of a paragraph - I would get to a satisfying result and feel genuinely happy that I did not sound selfish. And I started to sincerely want the person on the other end, whom I might have never met, to feel important.

Hi Name, mutual interest is not far from being finished and your guidance could help me take the last step.

Begin with praise and honest appreciation

Raise your hand if you have started an email with your needs and requirements. Working day to day, we are all usually at the mercy of various requests that we eventually have to fulfill. We all strive to feel important and appreciated, but do we appreciate others? I certainly feel like I did not, and even as I write this I think I could do better. Why not spend an extra minute or two recapping the things already done and write one or two sentences about that?

Hi Name, your guidance was really invaluable for us so far towards the mutual interest. We are not far from the finish line and if you could provide us with more information - that would really make a big difference.

If someone did a good job taking initiative - mention it and thank them for it. If you are just a man in the middle delegating requests that others implement - mention how the implementation is doing and how it helped. If long weeks of mutual work are finally over - recap and reflect on how good the collaboration was! Heck, if someone replied quickly - appreciate the quick response!

People have already spent, or will spend, time on your interest - at the very least say "Thanks a lot!" I know this sort of example smells generic - it is - but the point holds: there is always something to appreciate.

Let the other person save face

Too often something is mistyped or forgotten. At work people are busy and overwhelmed with more than work - it is easy to forget that mistakes are natural. So instead of pointing out an error, why not ask yourself: is it really that important?

Here's one idea from computer science - idempotence - that I have applied multiple times with a lot of success. Roughly speaking, it means that you can perform a certain operation multiple times without changing the result. I apply it by thinking: is the mistake bad enough that if I asked for the same action to be repeated, something would break? If the answer is "no", I rephrase the question as if nothing happened and send it back. No harm, no bitterness.

Hi Name, your guidance was really invaluable for us so far towards the mutual interest. We are not far from the finish line and a bit more information would take us there - could you send us the documentation on XYZ?

Call attention to people's mistakes indirectly; and Talk about your own mistakes before criticizing the other person

Another tool I use is to not assume too much about what happened on the recipient's side and to assume it was my fault. I am not a native English speaker, but for at least 8 hours a day I speak, think and write in English. Mistakes are destined to follow me. When something is done differently from how I asked, I assume it was because I was not clear enough. Rephrasing the question in terms of what I failed to say and what I meant (not pointing to a failure to understand me) usually helps a great deal.

Hi Name, your guidance was really invaluable for us so far towards the mutual interest. I have read the documentation and failed to find the part explaining what I need to do - could you point me to a page or a keyword I should search for?

By admitting that you misunderstood something or failed to express something, you will be right most of the time - even when you're wrong.

Ask questions instead of giving direct orders

Do you respond to "do this and that" with enthusiasm? Or to "From now on things will be this way"? I don't, though I used to write that way - not quite as harshly, but in a similar manner. The fact is, no one likes to take orders, especially unjustified ones. Save yourself some time arguing and ask a question instead: "What do you think about ..?", "Could you find time for ..?", "Do you think doing ... makes sense?" What is the worst that could happen? You'll hear something you haven't thought about.

Throw down a challenge; and Let the other person do a great deal of the talking; and Let the other person feel that the idea is his or hers

Just recently, after a nudge from the side, I had another thought about how I write specs: why do I take the juiciest bits out of the task? The exploration? The ideas? The possibility of slipping, getting up and finishing it just in time? Oh, the excitement! I don't know! I work with a smart bunch of people, so I decided to give it a try - my latest spec is just "where we are now", "where we want to be" and a couple of observations about where I think the slippery places might be. So far it's going great - there's now a plan which sounds better than what I had been thinking of, and it is about to kick off. Hope it goes well!

No but's and yet's

I think this one is not from the book, but a rule that I have adopted to a great extent.

What do you feel when you read "You are great, but ..." or "Things are working fine, yet ..."? You just brace for something negative. Alternatively, what do you feel when you read "You are great and ..." or "Things are working fine and ..."? Your defenses stay down - no red alert.

The ugly truth is that if I say something relatively negative after "and", it sounds much, much milder than after a "but" or "yet", even if the whole sentence feels a little odd. Compare "This is great, but could be better." with "This is great and it could be even better."

Conclusion

To most people all of this probably sounds obvious or natural, but for me it wasn't.

I can even share a couple of success stories. The changes were noticed by colleagues. One partner replied even while on holiday; the request I had was nothing urgent, I had just written it in a friendly way - I felt happy and appreciated the reply. Another partner, a big tech company that runs a big show in London every year, said it was a pleasure to work with us - the feeling was mutual, and this appreciation was exchanged multiple times. Various technical support conversations are now more enjoyable when you feel that people care, try to reply quickly and even follow up unprompted.

I have yet to bring all of this into my personal inbox, which lacks attention, and into my real life people skills in general. Sometimes I forget, but I guess practice just takes time.

If you can relate to any of this, I cannot recommend "How to Win Friends and Influence People" by Dale Carnegie enough - it has made a profound impact on the quality of my people skills.

Most url dispatchers are redundant.

Last week, as I was reviewing a small snippet of code adding yet another url pattern to an application route table, I was struck by this thought:

wait a second, this url routing - it should not be part of the application, but rather of the webserver. The application is responsible for taking input and producing output, and url parsing and processing is not part of that process. Meanwhile, webservers are super efficient at taking a request and passing it to the correct handler function, via gateways.

I immediately made a note of it, and over the following days I began to tinker with this idea in my mind. Why did it suddenly pop into my head? Does it make sense at all to do that? If it does - why is it not done, and how should it be done? What would be the immediate problems if I suddenly decided to switch all my application routing to webservers?

As to why it caught my attention, the answer is pretty easy: at my startup I happen to be a two-in-one person. As a software engineer I work on the APIs our servers provide for the mobile application; as a site reliability engineer I make sure that our infrastructure is in perfect shape to run the server code - part of which is configuring web servers. So, as I share my time between the two parts of the whole picture, I suppose the inception of this idea was a natural thought process.

Pros

Does it make sense? The more time I spend thinking about it, the more I am convinced that it actually does. Hear me out:

1. Web servers are more efficient at routing urls.

Probably all modern day web frameworks or libraries implement a url routing component. This component usually works by matching the request url against a list of patterns to find the first or most specific match. Some of them do not even try to optimize this process[1], some of them do[2, 3]. But they are all bound by the performance of the language they are implemented in and by the overhead of request processing and initialization.

I may well be an ignorant fool, but I think in this exact case a compiled language solution would always win - and it so happens that modern production grade web servers are written in compiled languages. They also have more options for URL patterns, e.g. patterns can be static, prefix based or regular expressions. This leads to my next point...
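To make the overhead concrete, here is a minimal sketch (hypothetical, not taken from any particular framework) of how such a routing component typically works - a linear scan over regex patterns, executed in the interpreter on every single request:

    import re

    # Hypothetical framework-style dispatcher: every request is matched
    # against the pattern list in application code, one pattern at a time.
    ROUTES = [
        (re.compile(r"^/api/1/resource/(?P<resource_id>\d+)/?$"), "resource_detail"),
        (re.compile(r"^/api/1/resource/?$"), "resource_list"),
    ]

    def dispatch(path):
        for pattern, handler_name in ROUTES:
            match = pattern.match(path)
            if match:
                return handler_name, match.groupdict()
        return None, {}

    # dispatch("/api/1/resource/42") -> ("resource_detail", {"resource_id": "42"})

Every pattern that doesn't match is still paid for, and the whole scan happens after the web server has already parsed the request once.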

2. Web servers have better low-level[4] http control options.

After the routing is done, web frameworks require the application to parse the request, either directly or with support from helpers, to give some control over which HTTP methods are supported. Some of the frameworks also force you into doing response caching at the application level.

Again, this could be done by webservers - they are good at low-level http control as well as at working as a proxy cache. It would free the application of a lot of redundant code.
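As a rough sketch of what I mean (the "api_cache" zone and "app_server" upstream are assumed names, defined elsewhere via uwsgi_cache_path and upstream blocks), both concerns can be expressed directly in Nginx configuration:

    # Hedged sketch: HTTP method filtering and response caching at the web server.
    location /api/ {
        limit_except GET POST { deny all; }    # only GET/POST (and HEAD) reach the app
        uwsgi_cache        api_cache;          # cache zone defined with uwsgi_cache_path
        uwsgi_cache_valid  200 10m;            # keep successful responses for 10 minutes
        uwsgi_pass         app_server;
        include            /etc/nginx/uwsgi_params;
    }

The application then never sees disallowed methods, and repeated requests for cacheable responses never reach it at all.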

3. Web servers are good at rudimentary url processing.

Imagine a webserver matching the correct url pattern, extracting the parameters and forwarding them to the application handler. Well... here's a snippet of Nginx configuration[5]:

    # match the resource url and capture the numeric id as a named group
    location ~ ^/api/1/resource/(?P<resource_id>\d+)/?$ {
        uwsgi_pass      app_server;
        include         /etc/nginx/uwsgi_params;
        uwsgi_param     SCRIPT_NAME /resource;
        uwsgi_param     PARAM_RESOURCE_ID $resource_id;   # forward the captured id
        uwsgi_modifier1 30;
    }

And a handler:

    import uwsgi

    def resource(request, response):
        # "request" is the WSGI environ dict, "response" is the start_response callable
        response('200 OK', [('Content-Type', 'text/plain')])
        yield "Resource Id: {resource_id}".format(
            resource_id=request['PARAM_RESOURCE_ID']
        )

    # map the SCRIPT_NAME set in the Nginx config to the handler above
    uwsgi.applications = {
        '/resource': resource,
    }

Notice how resource_id is captured and passed along to the uWSGI handler (upstream).

Now the framework/application is doing what it actually is supposed to be doing: given input - generate output.

4. Url patterns are inherently static.

Emphasis on the word patterns.

Any time a new website view (function) or resource (for an API) is created, a url pattern to access it is born, and it shouldn't[6] change. In essence, it is static content. A gentleman's rule of thumb in web development is that serving static content (js, css, images) should never reach the execution path of your application, and so most of the static files on the internet are served directly by web servers[7].

Wouldn't it make more sense to somehow deploy the mapping of patterns to handlers to the webserver, rather than process it on every request?

5. Url routing becomes reduced to a well defined interface.

I think this is the most impactful thing that comes out of the previous 4 points.

The single black-box component that ties together all the different pieces of many web frameworks is gone. A component which more often than not forces the developer into a specific mindset or code architecture is gone, and suddenly nothing stands between a request and the application code.

Please give yourself a minute for this idea to sink in - a web framework is reduced to a well defined, standardized and transparent interface[8] that is plugged into a webserver. To me this sounds liberating.

Cons

Thinking further, I tried to come up with counterarguments - what immediate problems would this practice cause? Disclaimer: I am very high on this idea right now, so most likely I am doing a very bad job at countering it. More ideas are welcome.

First, url patterns are used in two ways - to resolve a request into a handler, and also in reverse - given parameters, to generate a url. Wouldn't this immediately force duplication - in the application and in the web server? A dubious answer is: it depends. Obviously, if the application needs them and the web server needs them and you type them into both by hand, it is of course duplication.

But then again, if one place is promoted as the authority - let's say the application level - a tool that generates the web server url pattern configuration file, which can be deployed together with the code, would reduce the effect of duplication by automating it.
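A minimal sketch of what such a tool could look like (all names here are hypothetical - the point is only that the application's route table is the single source of truth and the Nginx snippet is generated from it at deploy time):

    # Hypothetical sketch: generate Nginx "location" blocks from an
    # application-level route table, so patterns are declared only once.
    ROUTES = [
        (r"^/api/1/resource/(?P<resource_id>\d+)/?$", "/resource"),
        (r"^/api/1/resource/?$", "/resources"),
    ]

    TEMPLATE = """location ~ {pattern} {{
        uwsgi_pass      app_server;
        include         /etc/nginx/uwsgi_params;
        uwsgi_param     SCRIPT_NAME {script_name};
        uwsgi_modifier1 30;
    }}"""

    def render_locations(routes):
        return "\n\n".join(
            TEMPLATE.format(pattern=pattern, script_name=script_name)
            for pattern, script_name in routes
        )

    if __name__ == "__main__":
        # Write the generated snippet to a file included from nginx.conf
        # and deployed together with the application code.
        with open("generated_locations.conf", "w") as conf:
            conf.write(render_locations(ROUTES))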

Second - testing. Some of the frameworks provide a testing facility that allows you to mimic url requests in unit or integration tests. At first thought, this would require implementing a url dispatcher nonetheless, just to make it work, but...

On second thought, does that testing facility actually test the url? No way - there's a webserver in production which can drastically change the behaviour of any url, as is often the case. What that facility is doing is just testing the web framework's url routing component. Does this test belong in unit or integration tests? No. More like acceptance or smoke tests.

So with urls in the webserver, the fake testing client becomes redundant and can (must!) be replaced with a real http requests library. Further, acceptance tests end up in their correct place and test the actual thing, rather than faking it.
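For instance, a minimal sketch of such an acceptance test, assuming the Python requests library and a test web server already running at TEST_HOST (an assumed name) with the Nginx and uWSGI configuration from above:

    import requests

    # Hypothetical acceptance test: exercises the real web server routing,
    # not a framework's fake test client.
    TEST_HOST = "http://localhost:8080"

    def test_resource_detail_is_routed():
        response = requests.get(TEST_HOST + "/api/1/resource/42")
        assert response.status_code == 200
        assert "Resource Id: 42" in response.text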

Third - what's the point? Well, with this I cannot argue, but I can express my position, which is: I am a huge fan of clean and simple interfaces. I am a huge fan of being able to control the structure of the code as it fits the application, not a framework or a library.

Some might say that there's no point even thinking about it, because url resolving takes very little time wherever it happens. True! But... I am a huge fan of less code - if this allows me to shave off a huge chunk (all) of the code standing between my code and the web server, all the better!

Also, I am curious how one would benchmark a url router under heavy load. I think benchmarks are hard, and I haven't done anything more than a simple proof of concept showing that such a setup could work, but my gut tells me that under stress the performance of a non-webserver url dispatcher would diverge for the worse. (This is my speculation.)

History

As to why it seems so ubiquitous for everyone to use a framework/library embedded url dispatcher, I came up with two hypotheses:

1. It is historical legacy.

A long time ago web servers were configured to use one directory with multiple .html files as the entry point to a website. In fact, people were too lazy to always type index.html at the end of the url, so there were options like DirectoryIndex, which would tell the web server which file to load if none was specified. Then dynamic and template languages emerged, and suddenly websites became a bunch of .py, .php, .cgi script files with HTML embedded between bits of logic, each representing a page of a website.

2. Convenience.

Before long, someone noticed that with the help of dynamic languages it is much easier to process all requests in one index.php file, rather than repeat a lot of bootstrapping code throughout the files. Also, web server management ten years ago was a really cumbersome process, mostly frowned upon and thought of in terms of "yuck, let's finish this quickly and move back to the real deal."

Fast forward to this day and adding a new url to application level code seems normal, because hey - all the frameworks and libraries do that.

What now?

Honestly, I do not know, but I will continue to tinker with this idea. What do you think? (Discussion on HackerNews.)

Footnotes:

[1]: I believe the Django url resolver treats every pattern as a regex.

[2]: I believe Pyramid traversal does that.

[3]: JAX-RS Specification.

[4]: By low-level I mean which HTTP methods are supported and not Authorization or Authentication.

[5]: Regular expression names in Nginx

[6]: Cool URIs don't change

[7]: Unless the static content is compiled by application code, but even then it is cached on webserver for performance reasons.

[8]: Be it WSGI or Pump.

An essay for aspiring engineers

This article was originally posted on my Lithuanian (native tongue) blog and was received quite well among my fellow software engineers as well as non-technical friends. It was meant as an answer to a different blog post complaining about aspiring programmers asking the wrong questions about career bootstrapping, about businesses and universities marketing a constant lack of talent (encouraging too many people to study IT related fields), and about all the worthless stuff the universities are teaching. Some of the bits apply only to the situation in Lithuania, but I think the main point I try to make is universal.

It all started with another blog post written by a person who had studied computer science, does not work as a programmer, but has a career in IT. I thought that I could walk through similar points to the ones she makes, providing my perspective, because I also work in IT and, as they say, "I have seen some things."

Questions

Every now and then I receive questions on how to start a programming career. Where to start? What technologies to learn? Which ones generate the most revenue? It doesn't happen very often, but when it does I never give a straight answer; I give directions on where to look for answers - books, blog posts, other people. Much like a solitary monk would say to his first apprentice: "You must find your path yourself, young one."

Some could think that I am being evasive, but I disagree. I do not know the correct answers to those and other questions: is my decision to be a developer a good one? What are the qualities of a good programmer? What does a single day at work look like? Honestly, I do not believe anyone could answer them - not even the last one. The difference between how my day looks now and how it looked for someone else when I was taking my first steps learning a programming language is like day and night.

However, I can give you one question with a constant right answer - who is a programmer? (A software engineer and a developer too - I'm using the terms interchangeably here.)

A programmer is a person, efficiently solving someone else's problem(s).*

* With each solved problem, the programmer creates at least one new problem for themselves.

When you look at it from this perspective, it is no longer important whether web or desktop programming jobs are better, but rather what problem a person wants and is able to solve efficiently. If you're just starting out - the internet is full of helpful resources, e.g. look for the engineering blogs of tech companies[1]. They are full of articles about problems those companies were having and how they solved them. Read and learn! Get some perspective. If you are able to solve the same problem more effectively - you are guaranteed a job. If multiple companies have the same problem - maybe it is time to think about a startup.

Any technology you can think of is only a tool in a developer's hands, so a question like "which one should I learn to maximize my income" is obviously from absurd-land. The more tools you have, the bigger the solution space you create for yourself to choose from. The bigger the solution space, the better the chances you will solve the problem effectively. Even the myth of the "natural lifelong learning skill" that engineers and developers are praised for is nothing more than comprehending: "if I always dig a well with a shovel, it is only a question of time until someone with an excavator comes to replace me". I know a few people who are developers but are far from enthusiastic about it, yet they have enough grit to follow this truth and keep their skillset up to date.

Sometimes, however, even the importance of all available tools fades away in the light of someone's ability to squeeze the maximum out of a single one. There are people[2] who are solving problems with Excel and/or Visual Basic (stereotypically an inferior language). I know at least one personally. And you know what... no one could beat them at what they do. Why? Switching would be too costly, or breaking a human habit (the users') too hard[3].

Marketing

This was about various companies' PR agenda too often being about the lack of talent and boasting about the size of the income, which is usually a couple of times the average monthly wage in Lithuania. Also about universities agreeing with this agenda to boost the number of student applications, because universities in Lithuania benefit from something called the "student basket" - the amount of money they get per student. This, supposedly, was meant to increase the quality of studies. Don't ask me to explain...

Marketing is good, and I will repeat the message - there is a lack of software engineers. Not just any engineers, but those who are able to understand the scope and context of a problem, come up with a sensible solution and write source code to make computers (even thousands of them) solve that problem in an automated and efficient manner.

Companies that are not picky enough about their engineers often do not understand the cost[4].

On the contrary, companies that value their product quality are really harsh when picking talent. And I mean really harsh. Consider this: the output of my university class was roughly 70-80%. Best case, a good company will hire only 1 out of 10. Best case.

At Silicon Milkroundabout[5] I had a chat with a colleague working at StackExchange (disclaimer: I do not work for SE, but I call all engineers my colleagues). He said they hire fewer than ten out of a hundred applicants. I couldn't believe it at first, but the process is fine-tuned to filter out only the best: a short call to get acquainted with the candidate, a longer technical call, another one if there are doubts, and finally you're invited on-site for even more technical interviews, sometimes lasting a whole day.

On the other hand, universities are educating knowledge synthesizers and (ideally) innovators. Before applying to university, a person must understand that there is not enough demand for that many people with higher education. (There is this absurd belief in my country, arguably a Soviet relic embedded in my parents' generation and pushed further by them, that if you do not have a diploma you are worthless and won't get a job.) For instance, some of my friends do not have a formal higher education in anything, but they are developing amazing things. I also had the chance to interview a guy who, from the looks of it, dropped out of school to pursue a programming career. While I might not approve of such an approach, I was impressed with his ability to come up with optimal solutions to my questions and follow-ups almost immediately. On the other end of the spectrum, some friends knew nothing about programming before university and are now on a successful software engineering career track.

Even better is when you are not a programmer by education, but know how to do it. I first heard this from a programming enthusiast in Copenhagen - he had quoted Zed Shaw[6]. Later, I had to hold my jaw closed when I found out who he was: general counsel at the world's biggest shipping company by day, Clojure hacker by night.

Studies

Disclaimer: this part is heavily based on quotes from that other article, but I'll try my best to give context.

"Universities lag behind business requirements." (This phrase is almost ubiquitous between students and businesses in Lithuania.) But following this logic it would mean that universities are training engineers almost for any business - be it template website "bakers" or banking systems providers. NO. Like, this NO!

Look:

Universities must prepare knowledge synthesizers who are able to solve unique business problems.

Following this heuristic, it is the businesses that lag behind. That isn't bad - some of them do not need brain-picking bespoke solutions. It is absolutely natural and normal. But imagine, after four years of ingesting all the important knowledge about software engineering and being ready to work on the next big thing, finally being tasked with mashing together a website using a template and a CMS. That is why I felt cheated, didn't write my thesis and dropped out after four(!) years. I completely lacked motivation and often sang the quote I started this section with. It took me a couple of years, and a few career steps, until I understood what I had missed, and I am fixing this mistake at my own expense now.

This next quote (about curriculum) also resonated with me:

However, you are forced to write a compiler, because "what if you'll need to write one." Maybe you could add aircraft flying to the curriculum? In case the whole crew passes out during a flight. You know, just in case.

It resonated with me because once I was singing the same song - why the heck would I need to "write an operating system", dig into "computer architecture" or learn how "computer networks" work at a low level? All operating systems are already written and too hard for a newbie to comprehend, no new mainstream computer architectures have been invented recently (except[7]), and there are networking libraries ready to be used.

In the last 6 months I had to roll up my sleeves and dive into the Linux kernel source code to understand why packets sent over the network to rsyslog were being lost, when all the documentation I could find and my understanding of the domain said they should go through. Do you feel the irony? All three of them - low level code, operating system and network. I could have brushed it away with an incompetent "this is not my domain, I don't know why this is happening" and gone after another, more rewarding task. However, my present self was super happy that my past self wasn't that dumb and didn't skip all the classes - and those skipped were compensated for by self-education. And I found the root cause and fixed the problem: there is a limit to the UDP packet size, which you can trivially increase, but only up to as much memory as kmalloc can allocate, which, indeed, had a cap of 128k bytes.

There is a saying - ignorance is bliss - and it fits here perfectly: there's a programming language and a compiler or interpreter, nothing out of the ordinary, but if something is not working, it becomes a language limitation, the compiler's fault or an interpreter performance issue. This is where all the problems hide from ignorance and incompetence - this is the bliss.

A software engineer's efficiency is limited by the depth of his or her understanding of the stack.

Sounds intuitive enough - if a developer only knows how to use a framework, their efficiency will be limited by that framework. If an engineer knows only their language, all optimizations will be limited by the language features. If, on top of everything else, they also know the language internals, they are competent enough to consciously write the most efficient code.

If you ever dreamt of working at a company like Facebook (or building one), think of the problems they were having - the Facebook website code was too slow. Rewriting 9.2 million[8] lines of code without stalling everyday progress is a disastrous step[9]. What else? Either buy a thousand more servers or rewrite the underlying language interpreter into a compiler. They successfully managed to do exactly that[10]. For the same purpose, they redefined how data centers are built - hardware and software[11]. I have no doubt that the engineers who pulled all this off are heroes among their peers. And - they knew how to write a compiler.

That imperceptible half a second saved between you clicking "Like" and seeing a response accounts for millions of dollars when we are talking about tens or hundreds of thousands of servers generating the Facebook wall for all of us.

Finally

I think I used different arguments to make the same point as the other article. It doesn't rain gold bricks on a software engineer. On the contrary, the work is full of intensive brain picking and, often, very little reward, because with every problem solved you create a new one for yourself - maintaining the code. Whether you're going to do boring tasks or push the boundaries, wherever they might be, depends on your problem solving skills and your understanding of the toolkit. So if you are an aspiring software engineer - start by finding a problem to solve. It doesn't need to be grandiose - solve one for yourself, your family or friends. And if you got hooked - continue, you are on the right path.

Hope you liked this article, please do comment or upvote on Hacker News.

References

Some additional reading, explaining where my thoughts come from. If you trust me, there's no need to follow them, but it's recommended if you want to check the facts.

The original article this was intended to reply to (in Lithuanian) - here.

[1]: Facebook Engineering, Twitter Engineering, LinkedIn Engineering, Netflix Tech Blog, Code as Craft.

[2]: My friends call me a scumbag because I automate my work when I was hired to do it manually. Am I?

[3]: We are not normal people

[4]: Bad Indian Programmers

[5]: Silicon Milkroundabout

[6]: Zed Shaw on Programming

[7]: D-Wave Systems

[8]: How many lines of code is Facebook?

[9]: Things You Should Never Do, Part I

[10]: Facebook Speeds Development With “HipHop Virtual Machine”, A 60% Faster PHP Executor

[11]: Facebook Saved Over A Billion Dollars By Building Open Sourced Servers

There is a problem with web frameworks

This is an essay about a new way to build websites, a way completely different from what current web frameworks (at least in PHP) offer. I will dare to say that they are all flawed and then present one way to fix the problem, and I will boast a little that this way was presented at one of the W3C workshops to people from W3C, IBM, Nokia, Oracle and others. The approach is also available for you to try out, as an open source project. Although this post is long, I hope you won't be frightened by that and will enjoy reading it as much as I enjoyed writing it!

However...

TL;DR. Frameworks are bad because of the object-relational impedance mismatch and leaky abstractions. We made Graphity - an open-source project. Read our position paper for the W3C workshop here.

First things first. In the summer of 2011 I was working, among a couple of other things, on a website called Wulffmorgenthaler. There is a different story behind the English version (just in case you have questions), but today I will concentrate on the Danish website and its successor - HeltNormalt. Around the middle of 2011, the owner of the website decided to do a rebranding - a new website with a LOT of new kinds of content. The old codebase was a terrible piece of craftsmanship, part of which, sadly, involves me too, but the rebranding meant a green light for a fresh start on the codebase. Oh boy, was I happy about it! No more whining about Other People's Code, and I was also eager not to repeat my past mistakes and to make my code even better this time.

At the same time, the Symfony2 beta happened to be out. I had not used Symfony, but decided to take a look - especially as some of my good developer friends were buzzing about it. I downloaded the sandbox, read the documentation and some blog posts. Symfony2 looked like a breath of fresh air after years with Zend Framework. I liked the structure of the code, I liked their approach to development/production environments and asset management. I thought, "I shall push to use it for our fresh start!" Early adoption all the way.

But then again, I was not in this alone. We were three coders, and while I happened not to be the lead, I did have a voice. Oh man, did I preach "Symfony2! Symfony2!", but the final word was in other hands.

The lead, Martynas, had a long-time and never-ending interest in a technology called the Semantic Web. He deserves credit for what happened in the following months. So the decision was made to use his approach and build any tools necessary along the way (there were not many of them for PHP anyway). Then all hell broke loose. Another colleague, Aleksandras, and I both loudly objected, argued, discussed, SHOUTED IN CAPS LOCK and so on. But the decision was immovable and, as the future showed, it was to the benefit of our own experience, the code quality and a different perspective on things.

The coding began... And during those months of development I saw that there is a drastically different approach to doing things. I do not lie - it took maybe three months to grasp all of this myself. Sometimes I didn't even know what I was coding! Luckily, Martynas had a clear vision and led us through, though there were still some discussions up until the end.

So, what did I grasp?

There is a problem with frameworks, because

They try to help me too much. I started to hate the idea of using any MVC based framework. Why? That's a good question. Here's an answer:

Contrary to the "easy to learn" slogans, shiny documentation and easy examples, frameworks do not let me prototype things fast, unless I know them inside-out. And even if I am expert, I am constrained.

Consider this: a framework is missing a feature. There are two ways to solve this. I can search for a piece of code written for that same purpose by another great developer, and use it. But 9 out of 10 times that piece of code does not fit my exact needs - I will need to "polish" it. Or it fits my requirements, but the code structure is totally different. Or it has no tests, but seems to work. I might end up writing it myself; hopefully, if I understand the problem domain well enough, the code will be pretty good too. However, even in this case I am constrained. I am bound to structure my code in the way the framework authors intended the framework to be extended. What if I want to be a free spirit and do things my way, when I know I will do them better?

P.S. As I am writing this - "Linus Torvalds on C++" appeared on Hacker News. And I totally relate to his ideas, in a different domain though.

models are a fifth foot on a dog, and

I have long since heard that ORM is an anti-pattern, and I agree with that. But I had always thought the problem was with the implementations - no one had created the right one yet. Now I have come to the horrifying conclusion that Models in MVC are a totally worthless piece of code. <irony>They are needed only so that bugs have somewhere to hide.</irony> This comes from realizing that I mostly opened Model code only to fix something.

Consider this mindflow: the requirements of (especially web) software are constantly changing, and these changes mirror directly onto the data structure. When an unavoidable change to the data and logic arrives, you update them, and in the end you open up the Model class, add/remove methods/parameters, and change various queries to use those methods/parameters. It feels so unnatural that when you change your data and your queries, you have to change something more. This is a constraint on the data. There is a better way.

Your data should be your model - it should be self contained. Your code shouldn't bother about internal representation, rather it should care about data transformation. Which brings me to my next idea.

views are impostors!

Yes, Views do not do what they should. Views represent your data, rather than presenting it. This is such a subtle difference that it didn't come to me quickly, but when I grasped it - I was shocked again.

I'd expected a view to be, in one way or another, the same data I have in my database, except presented in a way that the human eye or a browser script could make sense of. However, your commonly known View mocks itself by representing the data through a spaghetti of moustaches! A View is also static and tightly knit to the underlying Model it tries to represent.

Here's an example: a requirement change happens for some theoretical website. Some theoretical engineer opens up the Model code, updates properties and queries in accordance with the new requirements and then goes on to change the View.

It's quite alright to update the views with information on how to present new properties, but why should a View care if those properties are available in the data or not? It shouldn't.

The alternative is The Transformation - a thorough dramatic change in form or appearance.

Think of it like this: The Transformation thingy knows what properties our data might have and how to present them. Yes yes, tiny snippets of code. Then you query your data, retrieve some properties and push them through The Transformation. What comes out? The data, presented according to those rules.

If you change the query, add or remove properties, the data is still presented without any change to the transformation code!

I know, the difference might seem very subtle, but the implications are significant.

You can also consider this metaphor: a View is like looking at a picture of a window - you will see the same thing until you take another picture of the same window from a different perspective. A Transformation, on the other hand, is like looking directly through the window and seeing a different view as you smoothly change your perspective.

How it should work.

What do I want from a framework, then? Primarily, healthy abstractions of the low level stuff that I need for a webapp. For example, I now argue (I wouldn't have believed it a year ago) that the simplest abstractions needed for a website are the four below (sketched in code right after the list):

  1. Request - something that comes from the client.
  2. Response - something that goes back to the client.
  3. Resource - the data.
  4. Repository - a place to store, retrieve and update Resources.
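Purely as an illustration (these class names and methods are mine, not Graphity's actual API), the four abstractions could be sketched like this:

    # Hypothetical sketch of the four abstractions; not Graphity's real code.
    class Request:
        """Something that comes from the client."""
        def __init__(self, method, uri, headers=None, body=b""):
            self.method, self.uri = method, uri
            self.headers, self.body = headers or {}, body

    class Response:
        """Something that goes back to the client."""
        def __init__(self, status, headers=None, body=b""):
            self.status, self.headers, self.body = status, headers or {}, body

    class Resource:
        """The data behind a URI."""
        def __init__(self, uri, properties):
            self.uri, self.properties = uri, properties

    class Repository:
        """A place to store, retrieve and update Resources."""
        def __init__(self):
            self._store = {}

        def get(self, uri):
            return self._store.get(uri)

        def save(self, resource):
            self._store[resource.uri] = resource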

Do you see the difference? Instead of hiding the low-level stuff that modern frameworks tend to hide, we embraced it! Indeed, last year, after months of work, when we finished the rebranded Danish entertainment website called HeltNormalt, there were just those four things behind the scenes. Yet, believe it or not, the new website holds more than ten different types of content compared to just two in the old one. Here are some statistics about the code:

  • Controller code in the old website, LOC (Lines-of-Code): 7625 vs Resource code in new one, LOC: 652.
  • View code in the old website, LOC: 2528 vs Transformations code in new one, LOC: 1898.
  • Model code in the old website, LOC: 19125 vs Query code in the new one, LOC: 614.
  • Zend Framework behind old website, LOC: tens of thousands vs Our framework (Graphity), LOC: ~5000.

Less code - fewer bugs.

The Model/Query difference comes mainly from our ORM. We used Propel, which generated a lot of code. You might ask, what's this Query thing? Well, we don't have models - but we do query the data. The point is, because our data is autonomous, we only need to query for the stuff (properties) that we need. We do not need to describe the data as Models.

Putting it all together (IRL).

Let me explain how it all works in real life, on the HeltNormalt website, without diving too deeply into RDF and SPARQL (the Semantic Web technologies behind the scenes).

Resources

Every resource in our datastore comprises a number of triples, each of which is a Resource URI, a property name and a value. For simplicity's sake, a heavily stripped down version of a resource looks like this:

    @base <http://heltnormalt.dk> .

    @prefix rdf: <http://www.w3.org/1999/02/22-rdf-syntax-ns#> .
    @prefix dct: <http://purl.org/dc/terms/> .
    @prefix sioc: <http://rdfs.org/sioc/ns#> .
    @prefix foaf: <http://xmlns.com/foaf/0.1/> .
    @prefix xsd: <http://www.w3.org/2001/XMLSchema#> .

    </striben/2012/02/28> rdf:type sioc:Post .
    </striben/2012/02/28> dct:issued "2012-02-28T00:00:00"^^xsd:dateTime .
    </striben/2012/02/28> sioc:has_container </striben> .
    </striben/2012/02/28> foaf:thumbnail </img/strip/thumb/2012/02/28.jpg> .
    </striben/2012/02/28> foaf:depiction </img/strip/2012/02/28.jpg> .

It is pretty straightforward and natural - every resource has a URL. Each URL (and thus each resource) can have named properties, where each property has a value. A value can be another URL (and thus a link to another Resource) or a literal such as a string.

Remember, how I wrote that we don't need Models, because our data is self-contained? This is what I meant.

P.S. The snippet above is written in a very helpful notation called Turtle, though simplified here. The actual data is in RDF/XML.

Queries

Now, as we don't have Models or an ORM as such, we still need to get our data somehow. So imagine the triples above as a graph - in the center there is a resource with edges going out (the properties). At the other end of each edge there is a value - a string, or (surprise surprise) another resource linked to this one. Now imagine hundreds of resources linked this way. A web of linked data.

How do you retrieve information from this graph? By something similar to pattern matching! In the query you say that you want to get triples with certain properties and values, and leave some blanks to be filled in in the results. Sounds vague, but here's an example:

    PREFIX rdf: <http://www.w3.org/1999/02/22-rdf-syntax-ns#>
    PREFIX foaf: <http://xmlns.com/foaf/0.1/>
    PREFIX sioc: <http://rdfs.org/sioc/ns#>

    CONSTRUCT {
        ?uri rdf:type sioc:Post .
        ?uri foaf:thumbnail ?thumbUrl .
    } WHERE {
        ?uri rdf:type sioc:Post .
        ?uri sioc:has_container <http://heltnormalt.dk/striben> .
        ?uri foaf:thumbnail ?thumbUrl .
    }

A very similar query is executed when you type in the address http://heltnormalt.dk/striben - and you get the list of strips. In simplified form, the results look like this:

    <?xml version="1.0" encoding="utf-8"?>
    <rdf:RDF xmlns:rdf="http://www.w3.org/1999/02/22-rdf-syntax-ns#"
             xmlns:foaf="http://xmlns.com/foaf/0.1/">
        <rdf:Description rdf:about="http://heltnormalt.dk/striben/2012/02/28">
            <rdf:type rdf:resource="http://rdfs.org/sioc/ns#Post"/>
            <foaf:thumbnail rdf:resource="http://heltnormalt.dk/img/strip/thumb/2012/02/28.jpg"/>
        </rdf:Description>
        <rdf:Description rdf:about="http://heltnormalt.dk/striben/2012/02/29">
            <rdf:type rdf:resource="http://rdfs.org/sioc/ns#Post"/>
            <foaf:thumbnail rdf:resource="http://heltnormalt.dk/img/strip/thumb/2012/02/29.jpg"/>
        </rdf:Description>
    </rdf:RDF>

Sorry for the XML, but I promise - it is important. I hope you can see how the pattern matching worked here. Just in case: the query said "find me all resources (?uri) and their thumbnails (?thumbUrl), where the resources have the type Post, the given container and a thumbnail property."

Transformation

Time to live up to the promise - why is XML important? Because for transformations we used the most natural transformation tool available for XML: XSLT. I won't dive into XSLT - the Wikipedia article has some examples - but suffice it to say that we can present our data in any way we want - HTML, XML, JSON, plain text, etc. - just by using different XSL stylesheets. We could even generate a valid SQL dump to import into a MySQL database, but seriously - we don't want to do that. :-) (Though we did the exact opposite! We had to import the old data.)

That's one of the greatest outcomes of all this - the logic is stripped down (some logic still remains), and what's left for you is XML transformations. The thing is, you greatly reduce the chance of a bug - your data can be incorrect, but it cannot contain bugs or be invalid (as long as validation in the datastore works correctly). And when we did have issues with some properties missing from a resource, nothing broke: our XSLTs were set up in such a way that the part where that property's value should be shown simply was not shown. No ifs, no template logic. And you get this pretty much by default if you use XSLT correctly.
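A tiny sketch of what I mean (simplified and hypothetical, not the actual HeltNormalt stylesheet): a template matches the foaf:thumbnail property from the RDF/XML results above, and if a resource has no such property the template simply never fires, so nothing is rendered for it and nothing breaks.

    <?xml version="1.0" encoding="utf-8"?>
    <!-- Hedged sketch of the "no ifs" behaviour described above. -->
    <xsl:stylesheet version="1.0"
        xmlns:xsl="http://www.w3.org/1999/XSL/Transform"
        xmlns:rdf="http://www.w3.org/1999/02/22-rdf-syntax-ns#"
        xmlns:foaf="http://xmlns.com/foaf/0.1/">

        <!-- each rdf:Description becomes one list item -->
        <xsl:template match="rdf:Description">
            <li><xsl:apply-templates select="foaf:thumbnail"/></li>
        </xsl:template>

        <!-- the thumbnail property, if present, becomes an image;
             if it is missing, this template never fires: no ifs -->
        <xsl:template match="foaf:thumbnail">
            <img src="{@rdf:resource}"/>
        </xsl:template>

    </xsl:stylesheet>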

Embrace the Open Source version of this!

Right from the beginning, Martynas said that after we finished the website, we should extract the backbone system and present it as an open-source framework. Behold, fellow hackers - Graphity.

At the end of the process, we felt that we did not invent anything new - we just reused what was always there, but hidden. So sometimes we like to call Graphity not a Framework, but an Architecture! In theory this approach should work in any language that is widely used in web development today. Martynas has a working Java version of the same thing, slightly more sophisticated because Java already has packages for working with linked data / the semantic web, so he did not need to write everything from scratch as in the PHP version. Python? Ruby? I hope there will be versions for those languages, and others, too!

Oh, and by the way, you might be puzzled - where do you store the data if you decide to play with Graphity? Well, I happen to know one SaaS company, Dydra - just register and use it. In fact we did, and oh boy, how friendly and helpful they were throughout the process of developing HeltNormalt! A perfect example for me of how customer care should look. Seriously, check them out.

Adventures at M.I.T. and a paper about Graphity.

In the fall of 2011 Martynas found a call for participation in a Linked Enterprise Data Patterns Workshop and said we should try to enter the event by writing a short paper about what we did, how we did it, and how this could benefit the Semantic Web movement.

We did write the paper, we were accepted, and in early December we flew over The Pond to Boston, MA to do a presentation! Actually, at the time I felt that our presentation (a Danish entertainment company website) looked a little bit off among enterprise giants like IBM, Nokia and Oracle, just to name a few! But then again, who cares about being a little bit off when you are sitting next to Tim Berners-Lee and listening to a bunch of other great people talking about this great technology and drafting the guidelines for its future.

If you are interested in the paper we wrote, it has more comparative information on how this approach differs from today's common practice in web development. Read it here: "Graphity - A Generic Linked Data Framework".

We invite you to collaborate!

I hope someone has endured up to this point :-) This is actually the most important part!

We truly believe in Graphity, but as our ways with HeltNormalt have parted - we can not spend a significant amount of time on it.

Although we do spend some hours per week improving it, more hands and minds are always better, so if you feel interested - don't hesitate! Try it out and contribute; we will be there for you on our Github account. I will also try to write more about it on this new and shiny blog, and you can always drop an email to me directly or to info@graphity.org.

What are your thoughts on this - let's talk in comments!

P.S. This essay wouldn't be real without some help and feedback from: Martynas, Aleksandras, Aurelijus, James and Adomas. Huge thanks, guys!