Welcome to the Peregrine data-mining project
The data-mining project contains Peregrine and several supporting modules, e.g. for ontologies and datasets. Peregrine is an indexing engine, or tagger: a piece of software that recognizes concepts in human-readable text, based on a database (thesaurus) of known terms. Multi-word terms are recognized as single terms. If a term can represent multiple concepts, Peregrine will attempt to disambiguate it.
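To make the idea of thesaurus-based tagging concrete, here is a minimal sketch in Python. It is not Peregrine's implementation or API: the toy thesaurus, the concept IDs, and the names `tokenize` and `tag` are all hypothetical, and the greedy longest-match strategy is just one common way to make sure multi-word terms win over their single-word parts.

```python
import re

# Hypothetical toy thesaurus: term (as a tuple of lowercased tokens) -> concept IDs.
# The IDs are made up for illustration only.
THESAURUS = {
    ("heart", "attack"): ["C001"],
    ("heart",): ["C002"],
    ("aspirin",): ["C003"],
    ("cold",): ["C004", "C005"],  # ambiguous term: two candidate concepts
}

MAX_TERM_LENGTH = max(len(term) for term in THESAURUS)


def tokenize(text):
    """Return (lowercased token, start offset, end offset) for each word."""
    return [(m.group().lower(), m.start(), m.end())
            for m in re.finditer(r"\w+", text)]


def tag(text):
    """Greedy longest-match lookup of thesaurus terms in the text.

    Returns a list of (matched text, start, end, candidate concept IDs).
    """
    tokens = tokenize(text)
    matches = []
    i = 0
    while i < len(tokens):
        # Try the longest possible term first so "heart attack" beats "heart".
        for length in range(min(MAX_TERM_LENGTH, len(tokens) - i), 0, -1):
            candidate = tuple(tok for tok, _, _ in tokens[i:i + length])
            if candidate in THESAURUS:
                start = tokens[i][1]
                end = tokens[i + length - 1][2]
                matches.append((text[start:end], start, end, THESAURUS[candidate]))
                i += length
                break
        else:
            # No term starts at this token; move on.
            i += 1
    return matches
```

In this sketch an ambiguous term such as "cold" simply comes back with more than one candidate concept ID. Choosing among those candidates is the disambiguation step, which is not shown here.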
Peregrine was originally developed by Martijn Schuemie at the department of Medical Informatics of the Erasmus University Medical Center (EMC) in Rotterdam, and has been improved and released as open source in collaboration with NBIC's BioAssist Engineering Team. The Peregrine project has a dual licensing model. Its source code is released under the AGPL license for the open source community. If you are interested in closed-source or commercial use of Peregrine, please contact Jan Kors or ErasmusMC's Technology Transfer Office.
Key applications of the Peregrine system are currently:
- Internal developments at the Biosemantics Group at Erasmus University Medical Center (EMC) in Rotterdam and the Leiden University Medical Center (LUMC).
- The Intext-semantic package and the Linker, which are used by NBIC BioAssist to provide a Peregrine engine on top of the ConceptWiki.
If you will be using Peregrine yourself, drop us a line and we will add your application to this list. Also, become a member of the data-mining-users list mentioned below, so that we can keep you up to date with things we're planning to do and answer any questions you may have.
Learning about Peregrine
Please become a member of the data-mining-users mailing list (mentioned below), and if you have any question that is not addressed here feel free to post it to that list.
Getting started guide
- Please read the prerequisites.
- If you want to integrate Peregrine into your existing Java project, you have two options:
- In all cases where you run Peregrine yourself, you need to load it with an ontology. There are two ways of supplying an ontology to Peregrine:
- Look at the release notes to learn more about features in Peregrine releases.
- Read the developer's guide if you want to set up an Eclipse project and develop new features in Peregrine.
- Read the indexing process if you want to know how Peregrine uses an ontology and indexes text.
- Read the disambiguation steps if you want to understand the concept disambiguation procedure.
- The architecture specification uses several diagrams to give a high-level overview of Peregrine and related components.
- We use continuous integration to maintain our code quality.
- Read the text indexing architecture if you want to understand the text indexing tool chain.
- The release procedure describes the steps to make a new release.
- The developer meetings page records the meetings that we have had with Peregrine developers and advanced users.
Accessing Peregrine from your browser
We provide a public Peregrine web service at http://peregrine.nbiceng.net. This service uses Peregrine and the Intext-semantic package to recognize concepts in text that you supply. An English-language biomedical ontology is pre-loaded.
The Peregrine project provides the following mailing lists:
- data-mining-commits: a list that receives commit messages
- data-mining-devel: a list intended for discussion among developers (subscription is restricted to registered developers)
- data-mining-users: a list intended for general discussion on the project
- data-mining-admins: a list for the project administrators of the Peregrine project
You can get the source code by running the following Subversion (svn) command:
svn co https://trac.nbic.nl/svn/data-mining data-mining
Write access is only available to registered developers. You can become a developer by registering yourself on the NBIC trac system if you haven't already done so, and requesting write access on the data-mining-admins mailing list.
Other NBIC software projects
Many other NBIC software projects can be accessed from the project index.