[[ProjectMembers]]

= Welcome to the Peregrine data-mining project =

The ''data-mining'' project contains Peregrine and several supporting modules, e.g. for ontologies and datasets.

Peregrine is an ''indexing engine'' or ''tagger'': a piece of software that recognizes concepts in human-readable text, based on a database (thesaurus) of known terms. Multi-word terms are recognized correctly. If a term can represent multiple concepts, Peregrine attempts to disambiguate it.

Peregrine was originally developed by Martijn Schuemie at the Department of Medical Informatics of the Erasmus University Medical Center (EMC) in Rotterdam. It has been improved and released as open source in collaboration with NBIC's [http://www.nbic.nl/support/BET BioAssist Engineering Team].

The Peregrine project has a dual licensing model. Its source code is released under the AGPL license for the open source community. If you are interested in closed-source and commercial use of Peregrine, please contact [http://www.biosemantics.org/new/index.php?page=Jan-Kors Jan Kors] or [http://www.erasmusmc.nl/tto-cs/976943/kennistransfer?lang=en ErasmusMC's Technology Transfer Office].

Key applications of the Peregrine system are currently:
 * Internal developments at the [http://www.biosemantics.org Biosemantics Group] at Erasmus University Medical Center (EMC) in Rotterdam and the Leiden University Medical Center (LUMC).
 * The [https://trac.nbic.nl/intext-semantic/ Intext-semantic] package and the [https://trac.nbic.nl/linker Linker], which are used by [http://www.nbic.nl/support/BET NBIC BioAssist] to provide a Peregrine engine on top of the [http://www.conceptwiki.org/ ConceptWiki].

If you use Peregrine yourself, drop us a line and we will add your application to this list. Please also join the data-mining-users list mentioned below, so that we can keep you up to date with our plans and answer any questions you may have.

== Learning about Peregrine ==

Please join the data-mining-users mailing list (mentioned below). If you have a question that is not addressed here, feel free to post it to that list.

==== Getting started guide ====

 * Please read the [wiki:Prerequisites prerequisites].
 * If you want to integrate Peregrine into your existing Java project, you have two options:
   * [wiki:"Using plain jar files" Use Peregrine with plain jar files].
   * [wiki:"Using maven" Use Peregrine with Maven].
 * Whenever you run Peregrine yourself, you need to load it with an ontology. There are two ways of supplying an ontology to Peregrine:
   * [wiki:"ErasmusMC ontology file format" Ontology in a text file]. A sample [download:14 ontology file] can be found in the Downloads section.
   * [wiki:"DB Schema" Ontology in a database].
 * Look at the [wiki:ReleaseNotes release notes] to learn more about the features in each Peregrine release.

==== Developer documentation ====

 * Read the [wiki:DeveloperGuide developer's guide] if you want to set up an Eclipse project and develop new features in Peregrine.
 * Read the [wiki:IndexingProcess indexing process] if you want to know how Peregrine uses an ontology and indexes text.
 * Read the [wiki:DisambiguationSteps disambiguation steps] if you want to understand the concept disambiguation procedure.
 * The [wiki:ArchitectureSpecification architecture specification] uses several diagrams to give a high-level overview of Peregrine and related components.
 * We use [wiki:ContinuousIntegration continuous integration] to maintain our code quality.
 * Read the [wiki:"Text Indexing Architecture" text indexing architecture] if you want to understand the text indexing tool chain.
 * The [wiki:ReleaseProcedure release procedure] describes the steps to make a new release.
 * The [wiki:"Developer Meetings" developer meetings] page records the meetings that we have had with Peregrine developers and advanced users.

== Accessing Peregrine from your browser ==

We provide a public Peregrine web service at http://peregrine.nbiceng.net. This service uses Peregrine and the [https://trac.nbic.nl/intext-semantic Intext-semantic] package to recognize concepts in text that you supply. An English-language biomedical ontology is pre-loaded.

== Mailing lists ==

The Peregrine project provides the following mailing lists:

[[ProjectLists]]

== Source access ==

You can get the source code by running the following Subversion (svn) command:

{{{
svn co https://trac.nbic.nl/svn/data-mining data-mining
}}}

Write access is only available to registered developers. To become a developer, [https://trac.nbic.nl/index/#register register] on the NBIC trac system if you haven't already done so, and then request write access on the [https://trac.nbic.nl/mailman/listinfo/data-mining-admins data-mining-admins] mailing list.

== Other NBIC software projects ==

Many other NBIC software projects can be accessed from the [https://trac.nbic.nl/ project index].