wiki:Running imports

Introduction

ConceptWiki can be extended with data from an additional source by means of ETL (extract, transform, and load) scripts. When you want to add a new data source, the import interfaces in the ConceptWiki code help you do so. The nl.nbic.conceptwiki.imports.example package contains several example imports you can use as inspiration: Enzyme, UMLS, and WikiPathways. We describe the Enzyme import in more detail here, because it is relatively straightforward. The diagram below gives a high-level overview of the import process:

[Figure: ETL process - whiteboard]

Enzyme import

As the above diagram shows, the import is started by the ImportRunner class. The main method searches the Spring context (for the Enzyme import this is defined in the imports-enzyme-context.xml file) for a class that implements the ImportJob interface. For the Enzyme import, the DefaultImporter is used. The default importer uses an extractor, transformer, and loader. For the Enzyme import we use the following classes: EnzymeExtractor, EnzymeTransformer, and ConceptWikiLoader (the extractor, transformer and loader objects are injected using Spring).
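
The extractor, transformer, and loader chain described above can be sketched roughly as follows. This is only an illustration of the pattern: the Extractor, Transformer, Loader, and SketchImporter types and the sample records are hypothetical and are not the actual ConceptWiki import interfaces. In ConceptWiki the three collaborators are injected by Spring; the sketch simply passes them to a constructor.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

// Hypothetical interfaces, used only to illustrate the extract/transform/load split.
interface Extractor<R> { Iterable<R> extract(); }
interface Transformer<R, C> { C transform(R raw); }
interface Loader<C> { void load(C item); }

// Hypothetical stand-in for an importer that wires the three steps together.
final class SketchImporter<R, C> {
    private final Extractor<R> extractor;
    private final Transformer<R, C> transformer;
    private final Loader<C> loader;

    SketchImporter(Extractor<R> extractor, Transformer<R, C> transformer, Loader<C> loader) {
        this.extractor = extractor;
        this.transformer = transformer;
        this.loader = loader;
    }

    // Runs every extracted record through the transformer and hands it to the loader.
    int run() {
        int count = 0;
        for (R raw : extractor.extract()) {
            loader.load(transformer.transform(raw));
            count++;
        }
        return count;
    }
}

class EtlSketch {
    // Uses placeholder records and an in-memory list instead of a real source and store.
    static List<String> demo() {
        List<String> store = new ArrayList<>();
        SketchImporter<String, String> importer = new SketchImporter<String, String>(
                () -> Arrays.asList("raw-1", "raw-2"),   // extract: placeholder records
                raw -> "concept:" + raw,                 // transform: placeholder mapping
                store::add);                             // load: append to the list
        importer.run();
        return store;
    }

    public static void main(String[] args) {
        System.out.println(demo());
    }
}
```

Keeping the three steps behind separate interfaces is what allows a new import to supply its own extractor and transformer while reusing a shared loader, as the Enzyme import does with ConceptWikiLoader.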

UMLS import

The UMLS import follows the same pattern as the Enzyme import. It is started by the ImportRunner class, whose main method searches the Spring context (for the UMLS import this is defined in the imports-umls-context.xml file) for a class that implements the ImportJob interface. For the UMLS import, the DefaultImporter is used. The default importer uses an extractor, transformer, and loader. For the UMLS import we use the following classes: UMLSExtractor, UMLSTransformer, and ConceptWikiLoader (the extractor, transformer, and loader objects are injected using Spring).

Troubleshooting

While you are getting your import up and running, you might run into some issues. In this section we discuss a few common issues and how you could resolve them.

The import terminates with a BeanInitializationException: Could not load properties

This is often caused by a properties file that was not found. The nested exception in this case is a FileNotFoundException and the path of the file that the system was looking for should be included. For example: /home/not-found/imports-umls.properties. The location for the imports-umls.properties file is specified in the context:property-placeholder element in the imports-umls-context.xml file. Please check the location and change it if needed.
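
Before changing the context file, it can help to confirm whether the file really is missing at the reported path. The following is a minimal sketch of such a check using only the Java standard library; the PropertiesCheck class and its diagnose method are hypothetical helpers and not part of ConceptWiki.

```java
import java.io.IOException;
import java.io.InputStream;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.Paths;
import java.util.Properties;

// Hypothetical helper: reports whether a properties file exists and can be parsed.
class PropertiesCheck {
    static String diagnose(String location) {
        Path path = Paths.get(location).toAbsolutePath();
        if (!Files.isRegularFile(path)) {
            // This is the situation that surfaces as a nested FileNotFoundException.
            return "missing: " + path;
        }
        Properties props = new Properties();
        try (InputStream in = Files.newInputStream(path)) {
            props.load(in);
        } catch (IOException e) {
            return "unreadable: " + path + " (" + e.getMessage() + ")";
        }
        return "ok: " + props.size() + " properties in " + path;
    }

    public static void main(String[] args) {
        // Pass the path from the exception message as the first argument.
        String location = args.length > 0 ? args[0] : "/home/not-found/imports-umls.properties";
        System.out.println(diagnose(location));
    }
}
```

If the result starts with "missing:", either create the file at that path or update the location in the context:property-placeholder element.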

ETL

The ETL scripts are based on the AbstractConceptImporter in the imports-common package.

Post-processing tasks

A few post-processing tasks are defined in the trunk/code/conceptwiki/util/fix-ups project. A generic UtilRunner class lets you select the tasks on the command line. To print the complete list of options, run:

 cd util/fix-ups
 mvn --quiet exec:java -Dexec.mainClass="nl.nbic.conceptwiki.fixups.UtilRunner" -Dexec.args="-h"

The UtilRunner takes configuration from fix-ups.properties in your home directory. This file is required to run the tasks.

The following properties must be defined in this file.

# History service database
jdbc.driver=org.h2.Driver
jdbc.host=jdbc:h2:file:target/h2-db/db
jdbc.user=sa
jdbc.pass=
jdbc.initscript=classpath:/nl/nbic/conceptwiki/service/schema-h2.sql

# setup neo4j properties
neo4j.storeDir=target/neo4j

# Solr search settings. These are waiting for a refactoring step that removes the
# hardcoded configuration settings and makes them more user/deployment-configurable.
solr.url=http://localhost:10080/solr

The database settings and the Solr URL must be configured correctly for any of the utilities to work.
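
A quick way to catch an incomplete fix-ups.properties before running a task is to check that every key listed above is present. The sketch below does this with the Java standard library; the FixUpsConfigCheck class is a hypothetical helper and not part of the fix-ups project, and it checks only that the keys exist, not that their values are valid.

```java
import java.io.IOException;
import java.io.Reader;
import java.io.StringReader;
import java.io.UncheckedIOException;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.Paths;
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;
import java.util.Properties;

// Hypothetical helper: lists required keys that are absent from fix-ups.properties.
class FixUpsConfigCheck {
    // Keys taken from the listing above.
    static final List<String> REQUIRED = Arrays.asList(
            "jdbc.driver", "jdbc.host", "jdbc.user", "jdbc.pass",
            "jdbc.initscript", "neo4j.storeDir", "solr.url");

    // A key with an empty value (such as jdbc.pass=) counts as present.
    static List<String> missingKeys(Properties props) {
        List<String> missing = new ArrayList<>();
        for (String key : REQUIRED) {
            if (props.getProperty(key) == null) {
                missing.add(key);
            }
        }
        return missing;
    }

    // Parses properties from text; convenient for trying the check without a file.
    static Properties parse(String text) {
        Properties props = new Properties();
        try {
            props.load(new StringReader(text));
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
        return props;
    }

    public static void main(String[] args) throws IOException {
        Path path = Paths.get(System.getProperty("user.home"), "fix-ups.properties");
        Properties props = new Properties();
        try (Reader reader = Files.newBufferedReader(path)) {
            props.load(reader);
        }
        System.out.println("Missing keys: " + missingKeys(props));
    }
}
```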

The UtilRunner supports the following operations:

reindex:

Rebuild the Solr index.

linksets:

Generate linksets. Takes a target directory as a required argument.

preflabels:

Generate the mapping between preferred terms and concepts. Takes a target filename as a required argument.

getconcept:

Query the graph for a specific concept.

After an import, the reindex, linksets, and preflabels tasks must all be run; the order does not matter.

The output of the linksets and preflabels commands needs to be zipped and uploaded to http://downloads.nbiceng.net/linksets/

Last modified on Nov 20, 2013, 12:07:50 PM
