Sunday, January 24, 2016

Developer Productivity

Recently I gave a presentation regarding developer productivity - something that has always interested me.  Generally, when a system gets out of control with spiralling technical debt and software entropy it becomes impossible to make changes or add new functionality to that system in a reasonable and predictive way.   At a critical level, this can be the difference between project success and project failure.  If processes are smart, good structures are in place, developers are productive not burning out from late hours debugging crazy code, terrified of the impacts of any change.

Everyone wants a path to project delivery that is smoother.  No matter what the job or the task it is generally far easier to get a sense of satisfaction from a sense of productivity.   But let's face it; software entropy exists in nearly every project.  Why?  Are we not smart enough to use the machines we designed?  Do commercial realities inevitably mean bad code happens more than good code?

These are questions that spark lively discussions.  In this presentation, I outline some of my own ideas on developer productivity.

Saturday, September 12, 2015

Custom User types in GORM

Recently, I wanted to model a Merchant which like many things in a domain model had an Address. I thought it made sense that the Address was embedded inside the Merchant. Reasons:
  • It had no lifecycle outside the Merchant. Merchant dies so should the address.
  • It only ever belonged to one and only one Merchant
So pretty obvious this was a composition relationship.

Now, it is possible to model composition relationships in GORM. See here. However, this approach comes with the caveat that the Address must be a GORM object. I didn't want the Address being a GORM object because GORM objects are powerful in Grails. With all their dynamic finders and GORM APIs they are essentially like really a DAO on steroids. If a developer gets their hands on can do lots of things (not always good things). I didn't want or need any of this. In addition, a good architecture makes it difficult for developers to make mistakes when they are working under pressure at fast speeds. That means, when you are making design decisions you need to think about the power you need to give, should give and will give.

So with that in mind, I looked into wiring up a custom type for Address. This would just be a data structure that would model the address, could be reused outside the Merchant (thus promoting consistency and again thus promoting good design) and wouldn't come with the power of the GORM. There is some documentation in the GORM doc's for custom types but there isn't a full working example. I had a look at some Hibernate examples and then put managed to put this together and get working.

Here is my address object.

class Address {

    private final String city;
    private final String country;
    private final String state;
    private final String street1;
    private final String street2;
    private final String street3;
    private final String zip;

    public String getCity() {
        return city;

    public String getCountry() {
        return country;

    public String getZip() {
        return zip;

    public String getState() {
        return state;

    public String getStreet1() {
        return street1;

    public String getStreet2() {
        return street2;

    public String getStreet3() {
        return street3;
Here is my AddressUserType object:
class AddressUserType implements UserType {

    public int[] sqlTypes() {
        return [
        ] as int[]

    public Class getReturnedClass() {
        return Address.class;

    public Object nullSafeGet(ResultSet rs, String[] names, SessionImplementor session, Object owner) throws SQLException {
        assert names.length == 7;
        String city = StringType.INSTANCE.get(rs, names[0], session); // already handles null check
        String country = StringType.INSTANCE.get(rs, names[1], session ); // already handles null check
        String state = StringType.INSTANCE.get(rs, names[2], session ); // already handles null check
        String street1 = StringType.INSTANCE.get(rs, names[3], session ); // already handles null check
        String street2 = StringType.INSTANCE.get(rs, names[4], session ); // already handles null check
        String street3 = StringType.INSTANCE.get(rs, names[5], session ); // already handles null check
        String zip = StringType.INSTANCE.get(rs, names[6], session ); // already handles null check

        return city == null && v == null ? null : new GAddress(city: city, country: country, state: state, street1: street1, street2: street2,  street3: street3, zip: zip);

    void nullSafeSet(java.sql.PreparedStatement st, java.lang.Object value, int index, org.hibernate.engine.spi.SessionImplementor session) throws org.hibernate.HibernateException, java.sql.SQLException {
        if ( value == null ) {
            StringType.INSTANCE.set( st, null, index );
            StringType.INSTANCE.set( st, null, index+1 );
            StringType.INSTANCE.set( st, null, index+2 );
            StringType.INSTANCE.set( st, null, index+3 );
            StringType.INSTANCE.set( st, null, index+4 );
            StringType.INSTANCE.set( st, null, index+5 );
            StringType.INSTANCE.set( st, null, index+6 );
        else {
            final Address address = (Address) value;
            StringType.INSTANCE.set( st, address.getCity(), index,session );
            StringType.INSTANCE.set( st, address.getCountry(), index+1,session);
            StringType.INSTANCE.set( st, address.getState(), index+2,session);
            StringType.INSTANCE.set( st, address.getStreet1(), index+3,session);
            StringType.INSTANCE.set( st, address.getStreet2(), index+4,session);
            StringType.INSTANCE.set( st, address.getStreet3(), index+5,session);
            StringType.INSTANCE.set( st, address.getZip(), index+6,session);

    public boolean isMutable() {
        return false;

    public boolean equals(Object x, Object y) throws HibernateException {
        // for now
        return x.equals(y);

    public int hashCode(Object x) throws HibernateException {
        assert (x != null);
        return x.hashCode();

    public Object deepCopy(Object value) throws HibernateException {
        return value;

    public Object replace(Object original, Object target, Object owner)
            throws HibernateException {
        return original;

    public Serializable disassemble(Object value) throws HibernateException {
        return (Serializable) value;

    public Object assemble(Serializable cached, Object owner)
            throws HibernateException {
        return cached;

    public Class returnedClass() {
        return Address.class;
And here is my Merchant which has an Address.
class Merchant {
    UUID id;

    String color;
    String displayName;

    Address address
    static mapping = {
        address type: AddressUserType, {
            column name: "city"
            column name: "country"
            column name: "zip"
            column name: "state"
            column name: "street1"
            column name: "street2"
            column name: "street3"

As stated, with this approach, the Address data structure could be used in other GORM objects. Until the next time take care of yourselves.

Monday, August 3, 2015

Book Review: Cloud Based Architectures

Migrating to Cloud-Native Application Architectures, Matt Stine.

In the last few years, it can be very easy to believe marketing departments simply aren’t paid unless they use the word "Cloud" as often as possible.  Its usage has become so ubiquitous that is has inevitably become ambiguous so one very astute thing author Matt Stine does is outline what he means by "Cloud" (in particular cloud architectures) in the opening pages of this book.  Essentially, cloud architectures can be summarised by the three S’s. 

  • Speed: ability to provision and release resources (computing, networking and storage) with ease thereby achieving faster project development time
  • Safety: making software architectures more resilient (fault tolerance, fault isolation etc) 
  • Scalability: One the many non-functional characteristics that has become even more important with proliferation of mobile computing.  More mobile devices mean more application usage mean more back ends APIs getting hit.  Architectures must be able to respond to this demand otherwise applications fail. 

Twelve Factor App

Throughout the book approaches and patterns for achieving cloud based architectures are introduced. Now, if you have worked on anything distributed which was well architected, you will definitely experience a little bit of deja vu.  For example, the Twelve Factor App is a collection of patterns developed by engineers at Heroku containing twelve (you guessed it) principles to guide a good architecture fit for cloud.  You are sure to have used codebase (make deployable units have their own code base), dependencies (use proper tooling for managing dependencies), config (externalise configuration), backing services (databases, caches etc should be consumed identically by all resources) before but patterns are supposed to be common solutions to common problems and good architecture is good architecture, so it is not unexpected some of the described approaches won’t always sound completely new. 


How do Microservices help enable the 3 S’s?
  • Speed: Deployment is much simpler and faster with independent code chunks.
  • Security: Not too much here, but it is fairly intuitive that if you break something large into small chunks it is much easier to access control sensitive parts. 
  • Scale:  Much easier to only have to scale the parts of an architecture you need to scale as opposed to having to scale the entire architecture - which is what happens in a Monolithic application.

Cloud Migration

For organisations to move to Cloud based architectures, several changes are needed:
  • DevOps rather than Silos: this is to facilitate speed of delivery.
  • Continuous Delivery:  one way to be to sure the problems of long release cycles are guaranteed to be avoided 
  • Decentralise autonomy:  yes, it is a nice idea to give everyone more power and influence. It will certainly speed things up. But, in my own view some things (core APIs, major technical architectural decisions) should only be handled by people of a certain technical background who are prepared to be fully accountable for them.  Would you let every person working on the design of a bridge or in a heart transplant have the exact same say? 
  • Inverse conway manoeuvre: Software companies should do their architecture first and then re-align their organisations to fit that so that they avoid the anti-pattern of making architectures to resemble an existing organisational structure.   This sounds nice in theory but it may not be so easy to achieve (possibly easier to do in very small companies that never have a strict organisational structure anyway).

Break up the Monolith

So if you have bought into the cloud based approach and since it’s unlikely your project is green field, you need to figure out how to break up the Monolith.  Some ideas include:
  • Bounded Contexts: This allows inconsistent definitions of concepts as long as they are consistent inside a well defined context which makes it much easier to decompose data models. For example, in the Security context a User may always refer to something that can be authenticated and has particular roles but in a Management context it would be something that has an image, an address and some application capabilities (checking out a shopping cart).
  • Identifying bounded contexts in the monolith and making them micro services.
  • Containerisation: Using something like Docker can provide many advantages to using VM's
  • Writing new features as micro services
  • Making the Monolith look like micro services by using anti-corruption layers

Spring cloud and Netflix

By the sounds of it Spring cloud's and the Netflix OSS project sound like an absolute must to checkout. Amongst other things, they help:
  • Achieve dynamic configuration
  • Implement sophisticated service discovery
  • provide alternative to load balancing which involve client side more.  
In addition, Netflix's Hystrix library (which provides some very useful metrices) employs some clever fault tolerance patterns:
  • circuit breakers - if a system notices that another system is broken it stops calling it.
  • bulkheads -  failure is limited by strict partitioning

Any negatives?

So lots of positives about this book.  Plenty of well explained information in a short number of pages. That is a major achievement.   I have alluded to a few minor points above, but they are very minor.  It would be very difficult to criticise this book in any substantial manner especially when it is free.    

However, it is important for the reader to bear a few considerations.
  • Business realities: Approaches such as Continuous Delivery just may not fit your business plan.  Why would you want to provide a feature someone hasn’t paid for?  (In fairness, the author does make this very point, but it really is worth re-iterating)
  • Technical realities:  One problem I have with many software approaches is that they assume that everyone is at a given skill level. In an ideal world the only people who work in software engineering would be very good engineers but the reality is there is always a very wide range of skill levels on any project.  From super meticulous to complete spoofers.  This means that some things that can sound very elegant in theory but a lot more difficult to achieve in practise.  When considering moving to things like DevOps, there will definitely be benefits. But are organisations going to find it easier or harder to make engineers accountable? I would imagine there would be serious risks of it being latter if it is not properly thought through.
  • Some concepts e.g. Client Side load balancing sound very interesting. But, they are only introduced rather than critically analysed.  Much more research and thought would be required (for me anyway) before considering adopting.  

Sunday, July 5, 2015

Postgres indexes

Recently, I had a situation where I needed to think how I was using Postgres indexes. I had a simple Book table with the following schema...
>\d book

                  Table ""
       Column        |          Type          | Modifiers 
 id                  | uuid                   | not null
 version             | bigint                 | not null
 amount_minor_units  | integer                | not null
 currency            | character varying(255) | not null
 author       | character varying(255) | not null
 publisher           | character varying(255) |    
The author and publisher columns were just String pointers to actual Author and Publisher references that were on another system meaning that classical foreign keys couldn't be used and that these dudes were just normal columns.

I needed to get an idea how the table would perform with lots of data, so first up some simple SQL to put in lots of test data:

> CREATE EXTENSION "uuid-ossp";
> insert into book (id, version, amount_minor_units, currency, author, publisher) 
select uuid_generate_v4(), 2, 22, 'USD', 'author' ||, 'publisher' || from generate_series(1,1000000) AS x(id); 
This table was going to be hit lots of times with this simple query:
select * from book where author = 'Tony Biggins' and publisher='Books unlimited';
To get the explain plain, I did:
dublintech=> EXPLAIN (FORMAT JSON) select * from book where author = 'Tony Biggins' and publisher = 'Books unlimited';
                                                            QUERY PLAN                                                             
 [                                                                                                                                +
   {                                                                                                                              +
     "Plan": {                                                                                                                    +
       "Node Type": "Seq Scan",                                                                                                   +
       "Relation Name": "book",                                                                                                   +
       "Alias": "book",                                                                                                           +
       "Startup Cost": 0.00,                                                                                                      +
       "Total Cost": 123424.88,                                                                                                   +
       "Plan Rows": 1,                                                                                                            +
       "Plan Width": 127,                                                                                                         +
       "Filter": "(((author)::text = 'Tony Biggins'::text) AND ((publisher)::text = 'Books unlimited'::text))"+
     }                                                                                                                            +
   }                                                                                                                              +
(1 row)
As can be seen, Postgres is doing a Seq Scan, aka a table scan. I wanted to speed things up. There was only one index on the table which was for the id. This was just a conventional B-Tree index which would be useless in this query since it wasn't even in the where clause. Some of options I was thinking about:
  • Create an index on author or publisher
  • Create an index on author and create an index on publisher
  • Create a combination index on both index and publisher.
Hmmm... let the investigations begin. Start by indexing just author.
dublintech=> create index author_idx on book(author);
dublintech=> EXPLAIN (FORMAT JSON) select * from book where publisher = 'publisher3' and author='author3';
                                  QUERY PLAN                                   
 [                                                                            +
   {                                                                          +
     "Plan": {                                                                +
       "Node Type": "Index Scan",                                             +
       "Scan Direction": "Forward",                                           +
       "Index Name": "author_idx",                                  +
       "Relation Name": "book",                                               +
       "Alias": "book",                                                       +
       "Startup Cost": 0.42,                                                  +
       "Total Cost": 8.45,                                                    +
       "Plan Rows": 1,                                                        +
       "Plan Width": 127,                                                     +
       "Index Cond": "((author)::text = 'author3'::text)",+
       "Filter": "((publisher)::text = 'publisher3'::text)"                     +
     }                                                                        +
   }                                                                          +
(1 row)
As can be seen Postgres performs an index scan and the total cost is much lower than the same query which uses a table scan. What about the multiple column index approach? Surely, since both are used in the query it should be faster again, right?
dublintech=> create index author_publisher_idx on book(author, publisher);
dublintech=> EXPLAIN (FORMAT JSON) select * from book where publisher = 'publisher3' and author='author3';
                                                        QUERY PLAN                                                         
 [                                                                                                                        +
   {                                                                                                                      +
     "Plan": {                                                                                                            +
       "Node Type": "Index Scan",                                                                                         +
       "Scan Direction": "Forward",                                                                                       +
       "Index Name": "author_publisher_idx",                                                                     +
       "Relation Name": "book",                                                                                           +
       "Alias": "book",                                                                                                   +
       "Startup Cost": 0.42,                                                                                              +
       "Total Cost": 8.45,                                                                                                +
       "Plan Rows": 1,                                                                                                    +
       "Plan Width": 127,                                                                                                 +
       "Index Cond": "(((author)::text = 'author3'::text) AND ((publisher)::text = 'publisher3'::text))"+
     }                                                                                                                    +
   }                                                                                                                      +
(1 row)
This time Postgres, uses the multi-index, but the query doesn't go any faster. Mai, pourquoi? Recall, how we populated the table.
insert into book (id, version, amount_minor_units, currency, author, publisher) 
select uuid_generate_v4(), 2, 22, 'USD', 'author' ||, 'publisher' || from generate_series(1,1000000) AS x(id); 
There are lots of rows, but every row has a unique author value and a unique publisher value. That would mean the author index for this query should perform just as well. An analogy would be, you go into a music shop looking for a new set of loudspeakers someone has told you to buy that have a particular cost and a particular power output (number of watts). When you enter the shop, you see the speakers are nicely ordered by cost and you know what? No two sets of loudspeakers have the same cost. Think about it. Are you going to find the speakers any faster if you use just use the cost or you use the cost and the loudspeaker?

Now, imagine the case if lots of the loudspeakers were the same cost. Then of course using both the cost and the power will be faster.

Now, let's take this point to the extremes in our test data. Suppose all the authors were the same. The author index becomes useless and if we don't have the author / publisher combination index we would go back to table scan.

// drop combination index and just leave author index on table 
dublintech=> drop index author_uesr_ref_idx;
dublintech=> update book set author='author3';
dublintech=> EXPLAIN (FORMAT JSON) select * from book where publisher = 'publisher3' and author='author3';
                                                      QUERY PLAN                                                       
 [                                                                                                                    +
   {                                                                                                                  +
     "Plan": {                                                                                                        +
       "Node Type": "Seq Scan",                                                                                       +
       "Relation Name": "book",                                                                                       +
       "Alias": "book",                                                                                               +
       "Startup Cost": 0.00,                                                                                          +
       "Total Cost": 153088.88,                                                                                       +
       "Plan Rows": 1,                                                                                                +
       "Plan Width": 123,                                                                                             +
       "Filter": "(((publisher)::text = 'publisher3'::text) AND ((author)::text = 'author3'::text))"+
     }                                                                                                                +
   }                                                                                                                  +
(1 row)
So we can conclude from this that single column indexes for combination searches can perform as well as combinational indexes when there is a huge degree of variance in the data of that single column. However, when there isn't, they won't perform as well and a combinational index should be used. Yes, I have tested by going to extremes but that is the best way to make principles clear.And please note: For the case when there is maximum variance in data, adding another index to the other column in the where clause, publisher made no difference. This is as expected.

Ok, let's stick with the case when there is massive variance in data values in the column. Consider the case of maximum variance and the query only ever involves exact matching. In this case, all authors values are guaranteed to be unique and you never have any interest in doing anything like less than or greater than. So why not use a hash index instead of a B-Tree index?

dublintech=> create index author_hash on book using hash (author);
dublintech=> EXPLAIN (FORMAT JSON) select * from book where publisher = 'publisher3' and author='author3';
                                  QUERY PLAN                                   
 [                                                                            +
   {                                                                          +
     "Plan": {                                                                +
       "Node Type": "Index Scan",                                             +
       "Scan Direction": "NoMovement",                                        +
       "Index Name": "author_hash",                                 +
       "Relation Name": "book",                                               +
       "Alias": "book",                                                       +
       "Startup Cost": 0.00,                                                  +
       "Total Cost": 8.02,                                                    +
       "Plan Rows": 1,                                                        +
       "Plan Width": 127,                                                     +
       "Index Cond": "((author)::text = 'author3'::text)",+
       "Filter": "((publisher)::text = 'publisher3'::text)"                     +
     }                                                                        +
   }                                                                          +
(1 row)
Interesting, we have gone faster again. Not a massive difference this time around but an improvement nonetheless that could be more relevant with more data growth and / or when a more complex query with more computation is required. We can safely conclude from this part that yeah if you are only interested in exact matches then the hash index beats the b-tree index. Until the next time take care of yourselves. References: