Reflections on the second Chalice scrum

We had a second two-week Scrum session on code for the Chalice project. This was a follow-up to the first Chalice scrum, during which we made solid progress.

During the second Scrum the team ran into some blocks and progress slowed. The following is quite a soul-searching post, in accordance with the project documentation instructions: “don’t forget to post the FAIL(s) as well: telling people where things went wrong so they don’t repeat mistakes is priceless for a thriving community.”

Our core problem was the relative inflexibility of the relational database backend. We’d chosen to use an RDBMS rather than an RDF triplestore mainly for the benefits of code-reuse and familiarity, as this enabled us to repurpose code from a couple of similar EDINA projects, Unlock and Addressing History.

However, when the time came to revise the model based on updated data extracted from EPNS volumes, each revision created a chain of dependencies – updates to the data model, then the API, then the prototype visualisation. Progress slowed, and not much changed over the course of the second sprint.

A second problem was the lack of clearly defined use cases, especially for a visual interface to the Chalice data. Here we have a bit of a chicken-and-egg situation: the work exploring how different archive projects can re-use the Chalice data to enhance their collections is still going on. This is something we will place more emphasis on during the latter part of the project.

So on the one hand there’s a need for a working prototype to be able to integrate Chalice data with other resources; and on the other, a need to know how those resources will re-use the Chalice data to inform the prototype.

So what would we do differently if we did it again?

  • More of a design phase before the Scrum proper starts – with time to experiment with different data storage backends
  • More work developing detailed use cases before software development starts
  • More active collaboration between people talking to end users and people developing the backend (made more difficult because the project partners are distributed in space)

Below are some detailed comments from two of the Scrum team members, Ross and Murray.

Ross: I found Scrum useful and efficient: great for noticing both what others are doing and when you’re heading down the wrong path, and for identifying when you need further meetings, as was the case a few times early in the process. The whiteboard idea developed later on was also very useful. I don’t think the bottlenecks were anything to do with the use of Scrum, just with the amount of information and the quality of data we had available to us; maybe this is due partially to the absence of requirements gathering in Scrum.

The data we received had to be reverse engineered to some extent. As well as figuring out what everything in the given format was for (such as regnal dates, alternative names, contained places and their location relative to parent) and what parts were important to us (such as which of the many date formats we were going to store, i.e. start, end and/or approximations), we also had no direct control over it.

In order for the database, interface and API to work we had to decide on a structure quickly and get data into the database. Learning how to install and operate a triplestore (the recommended method), or spending time figuring out how to get Hibernate (a more adaptable database access technology) to work with the chosen structure, would have delayed everything, so a trade-off was made: we manually wrote code to parse the data from XML and enter it into a familiar relational database, which caused us more problems later on. One of these was that the data continued to change with every generation; elements being added, removed or completely changed meant changing the parsing, then the domain objects, then the database, and lastly the database insertion code.
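The ripple effect described above can be sketched in miniature. This is an illustrative Python sketch, not the project's Java code, and the element names (`place`, `attestation`) and columns are invented for the example – but it shows how the hand-written parser, the record shape, and the insertion code all encode the same structure, so any change to the XML touches all three:

```python
import sqlite3
import xml.etree.ElementTree as ET

# Hypothetical sample in the spirit of the EPNS extracts; the real
# format, element names and date handling differed.
SAMPLE = """
<places>
  <place name="Shotover" county="Oxfordshire">
    <attestation form="Scotorne" date="1086"/>
  </place>
</places>
"""

def load(conn, xml_text):
    """Parse the XML and insert rows. Every change to the source
    format ripples through this parser, the row shape, and the
    INSERT statement below."""
    conn.execute("CREATE TABLE IF NOT EXISTS place "
                 "(name TEXT, county TEXT, form TEXT, date TEXT)")
    root = ET.fromstring(xml_text)
    for place in root.findall("place"):
        for att in place.findall("attestation"):
            conn.execute("INSERT INTO place VALUES (?, ?, ?, ?)",
                         (place.get("name"), place.get("county"),
                          att.get("form"), att.get("date")))
    conn.commit()

conn = sqlite3.connect(":memory:")
load(conn, SAMPLE)
```

Add an element to the XML, and the parser, the column list and the insert all have to change in step – which is exactly the maintenance cost Ross describes.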

Lack of use cases: From the start we were developing an app without knowing what it should look like or how it should function. We were unsure what data we should or would need to store, and how much control users of the service would have over the data in the database. We were unsure how to query the database and display API request responses so as to best fit the needs of the intended users in an efficient, useful way. We are slightly clearer on this now, but more information on how the product will be used would be greatly helpful.

And as for future development… If we are sticking with the relational database model, I definitely think it’s wise to get rid of all the database reading/writing code in favour of a Hibernate solution. This would be tricky with our database structure, but it would be more adaptable and symmetrical, so that changes to the input method are also made to the output and only one change needs to be made. Some sort of XML-to-POJO relational tool may also be useful to further improve adaptability, although it would make importing new datasets more complex (perhaps using XSLT). As well as that, some more specific use cases mentioning inputs and required outputs would be very useful.
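The symmetry Ross wants – one declarative mapping driving both reading and writing, so a field is added or renamed in exactly one place – can be sketched like this. Again a Python illustration rather than Hibernate, with invented field names:

```python
import sqlite3

# One declarative mapping: changing this list is the single change.
# Both the INSERT and the SELECT below are generated from it.
FIELDS = ["name", "county", "start_date", "end_date"]

def save(conn, record):
    cols = ", ".join(FIELDS)
    marks = ", ".join("?" for _ in FIELDS)
    conn.execute(f"INSERT INTO place ({cols}) VALUES ({marks})",
                 [record[f] for f in FIELDS])

def load_all(conn):
    cols = ", ".join(FIELDS)
    return [dict(zip(FIELDS, row))
            for row in conn.execute(f"SELECT {cols} FROM place")]

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE place (%s)" %
             ", ".join(f + " TEXT" for f in FIELDS))
save(conn, {"name": "Shotover", "county": "Oxfordshire",
            "start_date": "1086", "end_date": ""})
```

This is the essence of what an ORM mapping buys: input and output can no longer drift apart, because both are derived from the same declaration.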

Murray: My comment would be that we possibly should have worked on a Hibernate ORM first, before creating the database. As soon as we had natural keys, triggers and stored procs in the database, it became too cumbersome to reverse engineer them.

If we had created an ORM mapping first we could have automatically generated the db schema from that, rather than the other way round. I presume we could write the searches, even the spatial ones, in Hibernate rather than stored procs. Then it would be easier to cope with all the shifts in the XML structure: propagating changes through the tiers would be a case of regenerating the db and domain objects from the mappings, rather than doing it by hand.
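The mapping-first workflow Murray describes (in Hibernate, this is what the hbm2ddl schema-export tooling does) can be illustrated with a toy schema generator; the table and column definitions here are assumptions for the example:

```python
import sqlite3

# A toy mapping declaration, in the spirit of a Hibernate mapping:
# the schema is generated from it, never reverse-engineered from
# a hand-built database.
MAPPING = {
    "table": "place",
    "columns": {"id": "INTEGER PRIMARY KEY",
                "name": "TEXT",
                "county": "TEXT"},
}

def ddl(mapping):
    """Generate a CREATE TABLE statement from the mapping."""
    cols = ", ".join(f"{n} {t}" for n, t in mapping["columns"].items())
    return f"CREATE TABLE {mapping['table']} ({cols})"

conn = sqlite3.connect(":memory:")
conn.execute(ddl(MAPPING))  # regenerate, rather than hand-edit, the schema
```

When the XML shifts, only `MAPPING` changes; the schema and (in a full ORM) the domain objects are regenerated from it.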

The generated domain objects could be reused across the data loading, API and search. The default lazy loading in Hibernate would have been good enough to deal with the hierarchical nature of the data to an arbitrary depth.
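The lazy-loading pattern Murray is relying on – contained places fetched only when first accessed, so an arbitrarily deep hierarchy is never loaded eagerly – looks roughly like this. A Python sketch with an invented `parent` column, not the project's Hibernate code:

```python
import sqlite3

class Place:
    """Children are queried only on first access, so a deep
    contained-places hierarchy is never loaded up front."""
    def __init__(self, conn, pid, name):
        self.conn, self.id, self.name = conn, pid, name
        self._children = None            # not loaded yet

    @property
    def children(self):
        if self._children is None:       # load lazily, on first access
            rows = self.conn.execute(
                "SELECT id, name FROM place WHERE parent = ?", (self.id,))
            self._children = [Place(self.conn, i, n) for i, n in rows]
        return self._children

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE place (id INTEGER, name TEXT, parent INTEGER)")
conn.executemany("INSERT INTO place VALUES (?, ?, ?)",
                 [(1, "Oxfordshire", None), (2, "Shotover", 1)])
county = Place(conn, 1, "Oxfordshire")
```

Hibernate's collection proxies do essentially this behind the scenes, which is why the default behaviour copes with nesting of any depth.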

Musings on the first Chalice Scrum

For a while I’ve been hearing enthusiastic noises about how the Scrum development practice can focus productivity and improve morale, and have been agitating within EDINA to try it out. So Chalice became the guinea-pig first project for a “Rapid Application Development” team; we did three weeks between September 20th and October 7th. In the rest of this post I’ll talk about what happened, what seemed to work, and what seemed soggy.

What happened?

  • We worked as a team 4 days a week, Monday-Thursday, with Fridays either to pick up pieces or to do support and maintenance work for other projects.
  • Each morning we met at 9:45 for 15 minutes to review what had happened the day before and what would happen that day
  • Each item of work-in-progress went on a post-it note in our meeting room
  • The team was 4+1 people – four software developers, with a database engineer consulting and sanity-checking
  • We had three deliverables –
        a data store and data loading tools
        a RESTful API to query the data
        a user interface to visualise the data as a graph and map

In essence, this was it. We slacked on the full Scrum methodology in several ways:

  • No estimates.

Why no estimates? The positive reason: this sprint was mostly about code re-use and concept re-design; we weren’t building much from scratch. The data model design, and API to query bounding boxes in time and space, were plundered and evolved from Unlock. The code for visualising queries (and the basis for annotating results) was lifted from Addressing History. So we were working with mostly known quantities.
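The core query idea plundered from Unlock – filter places by a bounding box in space and a span in time – can be sketched as a simple predicate. The record shape and field names here are invented for illustration, not the actual Unlock or Chalice API:

```python
# Hypothetical records: (name, lon, lat, start_year, end_year)
PLACES = [
    ("Shotover", -1.18, 51.75, 1086, 1500),
    ("Durham",   -1.58, 54.78,  995, 1400),
]

def query(places, west, south, east, north, start, end):
    """Return the names of places inside the spatial box whose
    date range overlaps the interval [start, end]."""
    return [name for name, lon, lat, s, e in places
            if west <= lon <= east and south <= lat <= north
            and s <= end and e >= start]
```

The real service wraps the same idea in a RESTful API and pushes the filtering into the database, but the shape of the query – box plus time span – is the part carried over from Unlock.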

  • No product owner

This was mostly an oversight, as we went into the process without much preparation time. I put myself in the “Scrum master” role by instinct, whereas other project managers might be more comfortable playing “product owner”. With hindsight, it would have been great to have a team member from a different institution (the user-facing folk at CeRch) or our JISC project officer, visit for a day and play product owner.

What seemed to work?

The “time-boxed” meeting (every morning for 15 minutes at 9:45) seemed to work very well. It helped keep the team focused and communicating. I was surprised that team members actually wanted to talk for longer, and broke up into smaller groups to discuss specific issues.

The team got to share knowledge on fundamentals that should be reusable across many other projects and services – for example, the optimum use of Hibernate to move objects around in Java, decoupled from the original XML sources and the database implementation.

Emphasis on code re-use meant we could put together a lot of stuff in a compressed amount of time.

Where did things go soggy?

From this point we get into some collective soul-searching, in the hope that it’s helpful to others for future planning.

The start and end were both a bit halting – so out of the 12 days available, we were actually “on” for only 7 or 8 of them. The start went a bit awkwardly because:

  • We didn’t have the full team available ’til day 3 – holidays scheduled before the Scrum was planned
  • It wasn’t clear to other project managers that the team were exclusively working on something else; so a couple of team members were yanked off to do support work before we could clearly establish our rules (e.g. “you’ll get yours later”).

We could address the first problem through more upfront public planning. If the Scrum approach seems to work out and EDINA sticks with it for other projects and services, then a schedule of intense development periods can be published with a horizon of up to 6 months – team members know which times to avoid – and we can be careful about not clashing with school holidays.

We could address the second problem by broadcasting more, internally to the organisation, about what’s being worked on and why. Other project managers will hopefully feel happier with arrangements once they’ve had a chance to work with the team. It is a sudden adjustment in development practice, where the norm has been one or two people full-time for a longish stretch on one service or project.

The end went a bit awkwardly because:

  • I didn’t pin down a definite end date – I wasn’t sure if we’d need two or three weeks to get enough done, and my own dates for the third week were uncertain
  • Non-movable requirements for other project work came up right at the end, partly as a by-product of this

The first problem meant we didn’t really build to a crescendo; rather, we turned up at the beginning of week 3 and looked at how much of the post-it-note map we still had to cover. Then we lost a team member, and the last couple of days turned into a fest of testing and documentation. This was great in the sense that one cannot overstate the importance of tests and documentation; less great in that the momentum somewhat trickled away.

On the basis of this, I imagine that we should:

  • Schedule up-front more, making sure that everyone involved has several months advance notice of upcoming sprints
  • Possibly leave more time than the one review week between sprints on different projects
  • Wait until everyone, or almost everyone, is available, rather than make a halting start with 2 or 3 people

We were operating in a bit of a vacuum as to end-user requirements, and we also had somewhat shifting data (changing in format and quality during the sprint). This was another scheduling fail for me – in an ideal world we would have waited another month, seen some in-depth use case interviews from CeRch and had a larger and more stable collection of data from LTG. But when the chance to kick off the Scrum process within the larger EDINA team came up so quickly, I just couldn’t postpone it.

We plan a follow-up sprint, with the intense development time between November 15th and 25th. The focus will be on:

  • adding annotation / correction to the user interface and API (the seeds already exist in the current codebase)
  • adding the ability to drop in custom map layers

Everything we built at EDINA during the sprint is in Chalice’s subversion repository on Sourceforge – which I’m rather happy with.

CHALICE: The Plan of Work

DRAFT

GANTT-like chart showing the interconnection between different work packages and participants in CHALICE – not a very high-quality scan, sorry. When there are shifts and revisions in the workplan, Jo will rub out the pencil markings and scan the chart in again, but clearer this time.

As far as software development goes, we aspire to do a Scrum, though given the resources available it will be more of a “Scrum-but”. Depending on how many people we can get to Scrum, we may have to compress the development schedule in the centre – spike one week, pretty much deliver the next – then have an extended maintenance and integration period with just one engineer involved.

The preparation of structured versions of digitised text with markup of extracted entities will be more of a long slog, but perhaps I can ask CDDA and LTG to write something about their methodologies.

The use case gathering and user engagement parts of the project will develop on the form already used in the TextVRE project.