Version 1 (modified by jannekevdp@…, 12 years ago)

moved from nbic wiki

GSCF 1.0 Code Cleanup

To professionalize our code base and to truly allow others to collaborate in this open source project easily, we have to make a few efforts. This page is intended to collect those efforts.

Add authentication

Without authentication, deploying GSCF on a public internet site is of limited use.

Refactor the code in study/show

The study/show code is in bad shape: controller and view logic are intertwined, and almost everything lives in the view. The data model and business logic should be moved to the controller, and the view should be split up into multiple templates.

Ability to exchange templates

In order to copy example templates or specific template sets without having to modify the code, we should agree on a template description format (probably XML, RDF or XMI). Feature request #121 is a prerequisite for this.

Ability to export/import studies

This feature could arguably be saved for a later version, but sooner or later we will need an open exchange format, preferably something that is easily parseable, such as XML. The templates could act as XML descriptors, but this is not necessary. See also Feature request #80.
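As an illustration of what an easily parseable exchange format could look like, here is a minimal Python sketch of an XML round trip for a study. The element and attribute names (`study`, `subject`, `field`) and the dictionary layout are assumptions made for this example only; they do not reflect the GSCF domain model or any agreed format.

```python
import xml.etree.ElementTree as ET


def export_study(study: dict) -> str:
    """Serialize a study dictionary to an XML string (hypothetical layout)."""
    root = ET.Element("study", {"title": study["title"]})
    for subject in study["subjects"]:
        subject_el = ET.SubElement(root, "subject", {"name": subject["name"]})
        for field_name, value in subject["fields"].items():
            # Each template field becomes one <field name="..."> element.
            ET.SubElement(subject_el, "field", {"name": field_name}).text = str(value)
    return ET.tostring(root, encoding="unicode")


def import_study(xml_text: str) -> dict:
    """Parse the XML produced by export_study back into a dictionary."""
    root = ET.fromstring(xml_text)
    return {
        "title": root.get("title"),
        "subjects": [
            {
                "name": subject_el.get("name"),
                "fields": {f.get("name"): f.text for f in subject_el.findall("field")},
            }
            for subject_el in root.findall("subject")
        ],
    }
```

A round trip (export followed by import) should return the original study, which also makes a natural regression test for whatever format is eventually chosen. Note that this sketch stores all values as text, so typed fields would need an explicit type in the real format.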

Improve test coverage

The test coverage of large portions of the code is very low. This undermines the reliability and maintainability of the code.

Test study create wizard systematically

In order to test the study create wizard, we need to implement some sort of web testing (e.g. by using the Grails webtest plugin) to check whether the data in the flow scope remains consistent under all possible changes.

E.g. what happens in the following scenario:

  • create study
  • create template
  • select template
  • enter some values
  • change template: delete the fields you just entered values in
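The scenario above comes down to one invariant: after a template change, the flow scope must not hold values for fields that no longer exist in the template. The following Python sketch shows that check in isolation. It is not Grails or webtest code, and the function name and data shapes are hypothetical.

```python
def reconcile_values(values: dict, template_fields: list) -> tuple:
    """Split entered values into those still backed by a template field
    and the names of fields that were removed from the template."""
    allowed = set(template_fields)
    kept = {name: value for name, value in values.items() if name in allowed}
    orphaned = sorted(name for name in values if name not in allowed)
    return kept, orphaned


def is_consistent(values: dict, template_fields: list) -> bool:
    """True if every entered value refers to an existing template field."""
    return all(name in template_fields for name in values)
```

A web test for the wizard could walk through the steps listed above and then assert the equivalent of `is_consistent` on the data the wizard reports back.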


Turn GSCF into a Grails plugin

To really encourage people to build their own GSCF application, and to facilitate omics modules development, we could turn GSCF into a Grails plugin. This plugin should contain three key elements:

  • the templating structure
  • the wizard
  • the importer

The first goal, encouraging people to build their own application, may be a little far-fetched, but the way MOLGENIS works, for example, suggests that this is a good approach. At the very least, it would help us clean up our own code by having a plugin project with the core elements and an application project with the specifics (in the end we may even need different ones for the different consortia).

The second goal is even more obvious. If you look at the current implementation of SAM, both the wizard and the importer code are copied there. Of course, that is not good programming practice. To refactor at that level, the nicest way of sharing this code would be to create a Grails plugin containing those elements.

The metabolomics module could then also reuse this plugin and profit from the wizard, the importer and the templating structure, because Excel-like editing is needed there as well.

GSCF 1.0 - Hurdles to take from a user perspective

From a user perspective, the biggest obstacles to smoothly entering a medium-sized nutrigenomics study (let's say 80 subjects, 40 events, 1000 samples) are:

Import of data in general

The importer should recognize the header names right away, even if they don't match exactly (best guess). Setting all the dropdown boxes to the right property by hand can be very tedious. This applies even more to the SAM module.
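One possible way to implement such a best guess is fuzzy string matching on normalized header names. The sketch below uses `difflib.get_close_matches` from the Python standard library. The property names and the cutoff of 0.6 are assumptions for illustration, and the actual importer may need a different similarity measure.

```python
import difflib
import re


def normalize(name: str) -> str:
    """Lower-case a name and drop everything except letters and digits."""
    return re.sub(r"[^a-z0-9]", "", name.lower())


def guess_properties(headers, properties, cutoff=0.6):
    """Map each column header to the closest property name, or None
    when no property is similar enough to pre-select."""
    by_normalized = {normalize(prop): prop for prop in properties}
    guesses = {}
    for header in headers:
        matches = difflib.get_close_matches(
            normalize(header), list(by_normalized), n=1, cutoff=cutoff
        )
        guesses[header] = by_normalized[matches[0]] if matches else None
    return guesses
```

Headers that get `None` would keep an empty dropdown for the user to fill in, so a wrong guess costs no more than the current manual selection.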

Entering of events

Due to complex time-based schemas, the flat list of all event x time combinations grows quickly (to something like 40-50 events). Right now, the only viable way to enter all the events in a reasonable manner is to write them down in Excel and then import them, because you have to click 'Add' for each new event. It would already improve things if we could just say 'add 20 events of type X', as is already possible with Subjects. But the real killer feature would be to enter the events in an event types x time (+ group?) table and have them added automatically. That would save a lot of entering, copying and group clicking.
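To make the table idea concrete, here is a minimal Python sketch that expands an event types x time table into a flat list of events, one per type, time and group. The table layout and the `type`, `time` and `group` keys are hypothetical and not taken from the GSCF domain classes.

```python
def expand_event_table(table: dict, groups: list) -> list:
    """Expand {event type: [start times]} into one event per
    (type, time, group) combination."""
    events = []
    for event_type, times in table.items():
        for time in times:
            for group in groups:
                events.append({"type": event_type, "time": time, "group": group})
    return events
```

With a table of two event types at two and three time points and two groups, this yields 10 events from a handful of table cells, instead of 10 separate clicks on 'Add'.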

Entering of samples

With samples, it's basically the same story as with events, but the numbers are even worse. It's no use scrolling through 1000 samples to check whether they have the right template. The template should be set automatically, depending on the parent SamplingEvent. It would certainly help if the table could be reviewed after that, especially for smaller studies, but the automatic generation is the most important part.
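A minimal Python sketch of this automatic generation, assuming that each sampling event carries the template its samples should get. The class layout, the `sample_template` attribute and the sample naming scheme are assumptions for illustration, not the GSCF domain model.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class SamplingEvent:
    name: str
    sample_template: str  # hypothetical: template for samples taken at this event


@dataclass(frozen=True)
class Sample:
    name: str
    subject: str
    parent_event: SamplingEvent
    template: str


def generate_samples(subjects, sampling_events):
    """Create one sample per subject and sampling event; the template
    is copied from the parent event instead of being set by hand."""
    return [
        Sample(
            name=f"{subject}_{event.name}",
            subject=subject,
            parent_event=event,
            template=event.sample_template,
        )
        for subject in subjects
        for event in sampling_events
    ]
```

The generated list could then be shown in a table for review, which is mainly useful for smaller studies.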

Entering data into SAM

As mentioned earlier, the column headers in the SAM importer should be recognized automatically, since there can be hundreds of them. It should also be possible to choose between different sheet layouts, the most common one being samples in rows and the different measurements (called MeasurementTypes) in columns. Recognizing the headers is fine, but if a MeasurementType is not already in the database, you cannot import its data without importing the MeasurementType first, which is a little tedious as well. Maybe we should support adding new MeasurementTypes on the fly. It would be even fancier if SAM could connect to public compound databases (HMDB, DrugBank etc.) and search them to identify what you are importing.
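The most common layout, combined with adding unknown types on the fly, could work roughly as in the Python sketch below. It assumes the first column holds the sample name and every other column header names a measurement type. The function name and data shapes are hypothetical and do not describe the SAM importer as it exists.

```python
def import_measurements(header, rows, known_types):
    """Import a samples-in-rows sheet. Unknown measurement types are
    added to known_types instead of aborting the import."""
    type_names = header[1:]  # the first column is the sample name
    created = []
    for name in type_names:
        if name not in known_types:
            known_types.add(name)  # on-the-fly creation of a new type
            created.append(name)
    measurements = []
    for row in rows:
        sample, values = row[0], row[1:]
        for name, value in zip(type_names, values):
            if value is not None:  # skip empty cells
                measurements.append((sample, name, value))
    return measurements, created
```

Returning the list of newly created types lets the importer show them to the user for confirmation before anything is saved.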