Some thoughts on technical debt and efficiency

For the past few years I've spent a lot of thought on how to improve things. As a software developer you might know this: you see something, a workflow or just a tiny task, and you immediately start thinking "If I change this...". A short while later the thing you just looked at is better, more efficient, or just plain faster or easier to do.

But how to get there?

Let's go back a few years to the point where I first realized this. I was in between schools and had nothing to do, so I started working at my father's company. My father is a craftsman who insulates roofs (flat ones, especially). He has been doing so for 40+ years now, and I go as far as to call him an expert at what he does. I had been helping him out during my holidays since I was at least 14 years old, so I basically knew what to do around him and how to help - but in the few months we worked that closely together I took away a lot more than just money to spend on new computer stuff. He always showed me how to place the materials efficiently, so that one doesn't have to walk back to get more, or at least has a few steps less to walk.

At this point you are probably asking yourself "where is this going and what does it have to do with programming?" and the answer is "everything". In that time with my father I learned how to work efficiently and how to prepare my work (materials), so I can do my job without too many iterations and thus get better every time I do something.

Everything you do is potentially improvable! Give it some thought: When you're cleaning the house, where do you place your working materials and in what order do you clean? Do you have to walk back and get something? Can you work seamlessly? Asking these questions and REALLY looking at what you're doing is the key. See if you can do the same work with less effort, less walking, less material or just "more efficiently".

Now back to the all-technical stuff :-)

When developing some new software, it's always the same:

  • Get a development environment
  • Provide some "common ground" for starting (e.g. a framework install)
  • Do the usual database layout, controllers, views, templates
  • Create some fancy stuff the customer wants
  • Do some testing
  • Create some kind of deployment
  • ...

Do you see some similarities to your own projects? OK, let's take a closer look.

Improving workflows

Every time you start a new project, you have to start somewhere. Did a sentence with "again" in it cross your mind? That might be the starting point for your first improvement. A clear indicator of something that needs improvement is repetition. If you do something several times in the same way with slight variations, it can probably be automated.

  • You need a Linux environment? Use e.g. vagrant or docker and VM images together with a provisioning tool like ansible.
  • New project, clone a framework, add some default tools like node, gulp, PHP classes? Use a skeleton project.
  • Every time you check out that project you need to add configs to apache/nginx, restart services, ...? Use a task runner like robo.
  • ...

I think you get the point there :-) Automation is your friend and there are many tools you can use to make your life better.

Improving work

When I started coding I read a rule somewhere that got stuck in my mind: "Every time you edit or look at a piece of code, leave it better than before." But what does that mean? Most developers have some (or many) "old" systems to maintain, meaning there's a legacy application with "bad code" and a lot of technical debt. No one likes these systems. So what you want to do is just "get the hell out of here" and leave it be. No?

The right answer should be: make things better. Even if you only change tiny things, like moving to PSR-2, correcting indentation, fixing variable names or adding inline comments, these things are likely to help the next developer who looks at this code. You can even go further and build a unit test for the class/module to have a specification of its behavior, and then refactor the whole thing with the security of not breaking the application. But these things take courage, so be courageous! Integrate that behavior into your work and you will see steady improvement in your codebase, and your feelings towards your code will improve as well.

And don't be mad at the people (or yourself) who wrote that special piece of shitty code. Always remember: most likely someone with a lot of skill and the best intentions successfully solved a problem with the best tools available to them at that specific point in time. This makes it a lot easier. At least somewhat easier. If it's not complete nonsense code...

Most developers say "I don't have enough time to do this" or "No one pays for this". This is correct - unless you integrate a small amount of refactoring cost into your estimations :-) I call this a "technical debt tax", and it is not listed explicitly in my estimations. It's always a few percent on top of the estimate, and it will help a lot over time.
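To make the "technical debt tax" concrete, here is a minimal arithmetic sketch. It is written in Python for brevity; the function name and the 5% default rate are assumptions for illustration only, since the text only says "a few percent".

```python
from decimal import Decimal


def with_debt_tax(estimate_hours: Decimal, tax_rate: Decimal = Decimal("0.05")) -> Decimal:
    """Return an estimate that silently includes a refactoring surcharge.

    tax_rate is a hypothetical default (5%); pick whatever "a few percent"
    means for your projects.
    """
    if estimate_hours < 0 or tax_rate < 0:
        raise ValueError("estimate and tax rate must not be negative")
    return estimate_hours * (1 + tax_rate)


# A feature estimated at 40 hours is quoted with the tax already included.
quoted = with_debt_tax(Decimal("40"))
```

With these assumed numbers, a 40 hour task leaves two hours for cleaning up the code you touch anyway.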

Improve yourself

Nothing much to say, except: If you want to make your work and also your life better, always improve yourself. Read about new technologies, try new things and find new ways of doing things. Even if it does not work it will help your understanding of how things work. So, never stop learning.

In the end...

I could go on for much longer on how to improve, but in the end it boils down to a few guidelines and a lot of hard work to change your mindset:

  • Think about your problem domain and understand all aspects before you start to code
  • Always look for repetitive tasks and automate where you can (and it makes sense)
  • Improve what you find whenever you have to work on existing code - even small improvements help over time
  • Be lazy. Because a lazy person tends to be an efficient problem solver...
  • Never stop learning

Tags: php, architecture, work

Flow based programming and ETL

For quite some time I've been searching for a reasonable approach to Extract, Transform, Load (ETL) in PHP, where I can define a workflow based on e.g. a UML diagram and just "run" it asynchronously. A solution with a fully fledged ETL tool like MS SSIS or talend was out of the question, due to their high complexity and hardware requirements. Also, a possible solution has to integrate into our existing PHP environment.

phpflo

If you have already read my other posts, you know that I already use RabbitMQ and php-amqp for handling import processes asynchronously. This goes one step further and introduces the flow-based programming "design pattern".

In computer science, flow-based programming (FBP) is a programming paradigm that defines applications as networks of "black box" processes, which exchange data across predefined connections by message passing, where the connections are specified externally to the processes. These black box processes can be reconnected endlessly to form different applications without having to be changed internally. FBP is thus naturally component-oriented. (Wikipedia)
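To illustrate the definition, here is a minimal sketch of such a network. It is written in Python for brevity and is not phpflo's API: the Network class, the component functions and the port names are all made up for this example. The point it shows is that components know nothing about each other, and the wiring lives outside of them.

```python
from collections import deque


def splitter(port, data):
    """Black box: emits every word of the incoming text on its "out" port."""
    for word in data.split():
        yield ("out", word)


def upper(port, data):
    """Black box: emits the upper-cased input on its "out" port."""
    yield ("out", data.upper())


class Collector:
    """Black box with no output ports; it just stores what it receives."""

    def __init__(self):
        self.items = []

    def __call__(self, port, data):
        self.items.append(data)
        return ()


class Network:
    """Holds processes and the connections between their ports."""

    def __init__(self):
        self.processes = {}
        self.connections = {}

    def add(self, name, component):
        self.processes[name] = component

    def connect(self, src, out_port, dst, in_port):
        self.connections.setdefault((src, out_port), []).append((dst, in_port))

    def send(self, name, port, data):
        # Deliver messages breadth-first until no process emits anything.
        queue = deque([(name, port, data)])
        while queue:
            proc, in_port, payload = queue.popleft()
            for out_port, out in self.processes[proc](in_port, payload):
                for dst, dst_port in self.connections.get((proc, out_port), []):
                    queue.append((dst, dst_port, out))


# The wiring is defined externally; the components stay untouched.
sink = Collector()
net = Network()
net.add("Split", splitter)
net.add("Upper", upper)
net.add("Sink", sink)
net.connect("Split", "out", "Upper", "in")
net.connect("Upper", "out", "Sink", "in")
net.send("Split", "in", "hello flow world")
```

Reconnecting "Split" directly to "Sink" would produce a different application without changing a single component, which is the reusability the definition describes.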

Developers used to the Unix philosophy should be immediately familiar with FBP:

This is the Unix philosophy: Write programs that do one thing and do it well. Write programs to work together. Write programs to handle text streams, because that is a universal interface.

It also fits well in Alan Kay's original idea of object-oriented programming:

I thought of objects being like biological cells and/or individual computers on a network, only able to communicate with messages (so messaging came at the very beginning -- it took a while to see how to do messaging in a programming language efficiently enough to be useful).

Sounds good, doesn't it?

Improvements and status quo

Initially I worked with phpflo, adapted it for symfony to use dependency injection instead of a factory-like process and was kind of happy. After a short while, the first problems arose:

Having several long-running processes introduced the problem of "state" within components and also within the network. Already initialized networks could not be reused and had to be destroyed. Using a compiler-pass approach with a registry of components also introduced port states within the process.

Several ideas came to my mind: just restart the processes after every message from the queue, or even fork the single ETL processes per message - but everything just led to more problems:

  • Restarting processes means framework initialization overhead
  • Forking processes needs some kind of low-level process management

Overall, the best approach was to integrate some state management into phpflo, split the library into several components and implement a parser for the (more convenient) FBP domain-specific language (DSL). You can find the implementation here. The split into several libraries was necessary for separation of concerns, maintenance and possible future contributions of generic components.

Integration

Added to our technology stack, phpflo integrated fine with symfony and all components are loaded via DIC. This allows for easy configuration of processes:

CategoryCreator() out -> in MdbPersister()
CategoryCreator() route -> in RouteCreator()
CategoryCreator() media -> in MediaIterator()
MediaIterator(MediaIterator) out -> in MediaHandler()
CategoryCreator() bannerset -> in BannersetHandler()
BannersetHandler() out -> bannerset CategoryCreator()
CategoryCreator() tag -> tags TagCreator(TagCreator)
TagCreator() tags -> tag CategoryCreator()
CategoryCreator() hierarchy -> hierarchy TagCreator()
TagCreator(TagCreator) hierarchy -> hierarchy CategoryCreator()
CategoryCreator() sidebar -> in SidebarHandler()
SidebarHandler() out -> sidebar CategoryCreator()
SidebarHandler() build -> in JsonFieldFetcher()
JsonFieldFetcher() sidebar -> in SidebarCreator()
RouteCreator() out -> in MdbPersister()

This replaces a JSON file of 450+ lines!
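To give an idea of why the DSL is so compact, here is a rough sketch of how lines like the ones above can be turned into connection records. It is written in Python and is not the phpflo parser: it only handles the single "Process(Component) port -> port Process(Component)" form used in this post, and the names Connection and parse_connections are made up for illustration.

```python
import re
from typing import List, NamedTuple

# One edge per line: Source(Component) outport -> inport Target(Component)
EDGE = re.compile(
    r"^(?P<src>\w+)\((?P<src_component>\w*)\)\s+(?P<out_port>\w+)\s*->\s*"
    r"(?P<in_port>\w+)\s+(?P<dst>\w+)\((?P<dst_component>\w*)\)$"
)


class Connection(NamedTuple):
    src: str
    out_port: str
    in_port: str
    dst: str


def parse_connections(definition: str) -> List[Connection]:
    """Parse a multi-line definition into a list of connections."""
    connections = []
    for number, raw in enumerate(definition.splitlines(), start=1):
        line = raw.strip()
        if not line:
            continue  # blank lines separate nothing here, just skip them
        match = EDGE.match(line)
        if match is None:
            raise ValueError(f"line {number}: cannot parse {line!r}")
        connections.append(
            Connection(match["src"], match["out_port"], match["in_port"], match["dst"])
        )
    return connections


SAMPLE = """
CategoryCreator() out -> in MdbPersister()
MediaIterator(MediaIterator) out -> in MediaHandler()
"""
edges = parse_connections(SAMPLE)
```

Each parsed edge carries the same information that a JSON graph definition has to spell out as a nested object, which is where the line savings come from.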

So, given all processes are defined as symfony (private) services, they can use all dependencies they need and are even easier to test.

Thanks to the datatype checks I've introduced into phpflo, connections are checked for compatibility. For us this means: every component with compatible ports can be stitched together and worked with. That removed a lot of inheritance, type checks and so on.
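The idea behind such a check can be sketched in a few lines. This is Python, not phpflo code, and the Port class, the "all" wildcard and the concrete type names are assumptions for illustration: two ports may be connected if their declared datatypes match, or if one side accepts anything.

```python
from dataclasses import dataclass

ANY = "all"  # hypothetical wildcard datatype: compatible with everything


@dataclass(frozen=True)
class Port:
    name: str
    datatype: str = ANY


def is_compatible(out_port: Port, in_port: Port) -> bool:
    """Ports fit if either side is a wildcard or both declare the same type."""
    return ANY in (out_port.datatype, in_port.datatype) or out_port.datatype == in_port.datatype


def connect(out_port: Port, in_port: Port):
    """Return the wired pair, or refuse the connection at wiring time."""
    if not is_compatible(out_port, in_port):
        raise TypeError(
            f"cannot connect {out_port.name} ({out_port.datatype}) "
            f"to {in_port.name} ({in_port.datatype})"
        )
    return (out_port.name, in_port.name)


wired = connect(Port("out", "array"), Port("in", "array"))
```

Because the check happens when the network is built, a mismatch shows up before any message is processed, and the components themselves no longer need defensive type checks.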

If you need a similar solution, I suggest you continue reading here: phpflo on GitHub

And last, but not least: Big thanks to James (@aretecode) for his code reviews and support concerning architectural decisions!

Tags: php, symfony2

Refactoring the ansible-symfony2 deployment

For a few months now, I've been maintaining the ServerGrove ansible-symfony2 deployment, which at the time was more of a proof-of-concept ansible galaxy role.

In search of deployment

When I was first looking into deploying our Symfony application, I tried Capistrano first (who hasn't?). Although I was already working with ansible at that time, using it for deployment rather than provisioning was not the first idea I had in mind. After struggling with Capistrano and its ruby-based dependencies, I quickly realized: I need something else, with more manageable dependencies and easy configuration.

ansible-symfony2

After some research (read: using google), I ran into ServerGrove's deployment role. At first I just copied it and rewrote most parts to fit my application. At that time the repo was already quite abandoned. Seeing the open issues and PRs, I decided to ask if I could maintain the repo and even develop it further. I added a few pieces of code from my modified deployment, added testing and thought I was done.

Refactoring

Because some people seemed to use the role for their deployments - or at least as a base to start from - I decided to take it further and incorporate some of the key features of my custom build. This includes a RELEASE file for each release that holds the release's unique identifier, hooks for additional tasks, git shallow copy and splitting into multiple files. Full changelog here.

Updating tests to use docker infrastructure

Getting Travis CI to do what I wanted for testing took me quite a while, too ;-). The main problem here: if you choose python as the container language on the docker infrastructure in order to pin version 2.7 for ansible, only php 5.3.x is available. Using the "old" infrastructure with virtual machines also only brings 5.3.x - plus some side effects, like really having to install everything yourself. So for current testing, I had to go with symfony 2.6.

Extra tasks: hooks

In my current production deployment for Motor Presse, I need some tasks done with gulp. We rely heavily on gulp to compile our stylus files to CSS, build minified JS and generally handle asset versioning. So how do you include tasks like that in the deployment when they are most likely only an edge case? One option would be to add another trigger like "symfony_project_trigger_gulp" - but more config means more tests, and not every project uses something like gulp or grunt.

The second problem I was facing was cache flushing. How do you flush caches on/after deployment? Depending on the infrastructure this might be a php5-fpm restart - or an apache2 restart, because someone uses mod_php. In our special case, we're forking into the php process and manually trigger php code to flush APCu.

These are only three examples, and they already put so much complexity into our deployment that it's almost unmanageable. Thank god there's the include statement. Using a configuration variable for a (role-external) task path, you can dynamically include that YAML file within a task of your role. In config:

---
symfony_project_post_folder_creation_tasks: "{{playbook_dir}}/hooks/post_folder_creation.yml"

and in the actual task:

---
- include: "{{ symfony_project_post_folder_creation_tasks | default('empty.yml') }}"

What is nice about this approach is the error handling: if no hook is configured, an empty default file is included. I'd have preferred a solution with something like "skip if no include is provided", but instead of a nice one-liner, this would have been a file stat, a check for stat.exists and then the include. At least 6 lines of code instead of just one, just to have a "skipped". OK, I could add a debug to the empty file where it says "skipped: no hook defined", but... no real sense in that.

Future ideas

Since I'm actively using this role to deploy large platforms, I'm trying to improve it step by step. The 2.0.1-alpha release is a huge leap forward to a clean and flexible role. For future releases, there's still a lot of work: as soon as ansible is released in a 2.x version and error handling is finally upon us, I'd be glad to integrate it into this role. For my current production deployment I'm using a GELF message curl request to a graylog server to promote a deployment. But how to check if there's an error? Or how to add partial errors/skips to such a message? If you have further ideas, improvements, issues or comments, just let me know on github :-)
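As a thought experiment for the question about partial errors and skips, here is a sketch of what such a message could look like. It is written in Python rather than as a curl call. The version, host, short_message, timestamp and level fields follow the GELF 1.1 payload format; the underscore-prefixed fields and the function name are assumptions for illustration, and this is not what my current deployment sends.

```python
import json
import time


def build_gelf_message(host, release, failed=0, skipped=0, timestamp=None):
    """Build a GELF 1.1 payload announcing a deployment.

    The additional fields (prefixed with "_", as GELF requires for custom
    fields) are hypothetical names chosen for this sketch.
    """
    status = "failed" if failed else "ok"
    return {
        "version": "1.1",
        "host": host,
        "short_message": f"Deployment of release {release}: {status}",
        "timestamp": time.time() if timestamp is None else timestamp,
        "level": 3 if failed else 6,  # syslog levels: 3 = error, 6 = informational
        "_release": release,
        "_failed_tasks": failed,
        "_skipped_tasks": skipped,
    }


# Example: a deployment that went through but skipped two tasks.
# Host and release id are placeholder values.
payload = json.dumps(build_gelf_message("web01", "20150101120000", skipped=2, timestamp=0))
```

The counts would still have to come from the playbook run itself, which is exactly the part that depends on ansible's error handling.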

Tags: php, symfony2, deployment, ansible