[nexml]
phylogenetic data in xml
The future data exchange standard is here!
nexml is an exchange standard for representing phylogenetic data — inspired by the commonly used NEXUS format, but more robust and easier to process.
Overview
The NEXUS file format is a commonly used format for phylogenetic data. Unfortunately, over time, the format has become overloaded, which has caused various problems. Meanwhile, new technologies around the XML standard have emerged. These technologies have the potential to greatly simplify the processing of phylogenetic data and to make it more robust:
- Syntax validation — some of the issues hampering interoperability are caused by the fact that no formal specification exists for NEXUS and other flat files, and no unambiguous way to validate them. Thanks to XML Schema we can now define a grammar against which data files can be validated syntactically.
- Semantic annotation — another issue in current file formats is that their semantics are not well-defined. For example, what does it mean to use an ambiguity code in a matrix? Is it uncertainty or polymorphism? With the wider EvoInfo working group we are developing an ontology on which we are mapping nexml schema types so that the semantics of data files become well-defined.
- Web services — a number of different technologies (such as XML-RPC, REST and SOAP) have emerged allowing disparate, xml-based services to be glued together over the internet. Using such services, researchers can "farm out" their calculations to dedicated servers, such as those of the CIPRES project. The wider plan is to integrate such services in an ontology-mediated architecture.
Therefore, a group of developers of phylogenetic software have come together as part of the NESCent working group for evolutionary informatics to develop a new data exchange standard based on these technologies.
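As a rough illustration of the syntax-validation point above, the sketch below uses only Python's standard library to check that a document is well-formed XML and that its root element sits in the namespace quoted on this page. This is deliberately weaker than validation against the XML Schema, which needs a schema-aware library (lxml's `etree.XMLSchema` is one option). The element names (`nexml`, `otus`, `otu`), the sample data, and the helper functions `check_document` and `otu_labels` are assumptions made for the example, not taken from the schema documentation.

```python
import xml.etree.ElementTree as ET

# Namespace quoted on this page; the element names below are illustrative only.
NEXML_NS = "http://www.nexml.org/1.0"

# Hypothetical sample document, written for this example.
SAMPLE = f"""<nexml xmlns="{NEXML_NS}">
  <otus id="taxa1">
    <otu id="t1" label="Homo sapiens"/>
    <otu id="t2" label="Pan troglodytes"/>
  </otus>
</nexml>"""


def check_document(text):
    """Return (ok, message) after a well-formedness and root-namespace check.

    This is NOT schema validation: it only catches malformed XML and a
    root element outside the expected namespace.
    """
    try:
        root = ET.fromstring(text)
    except ET.ParseError as err:
        return False, f"not well-formed: {err}"
    expected = f"{{{NEXML_NS}}}nexml"  # ElementTree spells tags as {namespace}name
    if root.tag != expected:
        return False, f"unexpected root element: {root.tag}"
    return True, "well-formed, root element in expected namespace"


def otu_labels(text):
    """Collect the label attribute of every (assumed) otu element."""
    root = ET.fromstring(text)
    return [otu.get("label") for otu in root.iter(f"{{{NEXML_NS}}}otu")]
```

Calling `check_document(SAMPLE)` reports success, while a document with a mismatched tag or a root element outside the namespace is rejected before any downstream tool has to guess at its structure.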
What are we doing about it?
Nexml development is being undertaken in a number of subprojects:
- In the first place, we're designing an XML schema. This schema (designated as namespace http://www.nexml.org/1.0) is documented on our wiki. The bleeding-edge version is available from svn, and the source code can be browsed on our site (a checkout of our repository that is updated every five minutes). For bug reports and feature requests, please visit our issue tracker page.
- Secondly, we're implementing nexml read/write abilities in a number of software packages:
  - Mesquite now supports reading and writing of nexml. This implementation has been developed by Peter Midford and Rutger Vos.
  - At the most recent EvoInfo meeting, Xuhua Xia demonstrated DAMBE's abilities to read and write nexml data transparently.
  - The phylobase package for R reads and writes tree descriptions, with character matrices under way. This implementation is being developed by Aaron Mackey.
  - Jeet Sukumaran has implemented nexml I/O for Python in DendroPy.
  - Chase Miller has implemented nexml I/O for BioPerl's TreeIO and AlignIO interfaces, which under the hood reuse Rutger Vos's Bio::Phylo parser libraries.
  - Mark Jensen has implemented nexml compatibility for the HIVQuery web application.
- Third, we're cross-referencing the nexml schema with the Character Data Analysis Ontology, which is being developed by other members of the EvoInfo working group.
Get involved!
If you are interested in being involved in the nexml project in any way, please do! Here are some ways to get involved:
- Get informed — information about the nexml project is distributed over the wiki (for an overview of vision, plans, implementation), documentation (for formal description of the schema) and the mailing list (for immediate plans and discussion).
- Try it out — the download section of the website has nightly builds of bindings for various languages. Take these for a spin!
- Contribute — if you are a programmer interested in extending nexml support, please contact us through the mailing list to get commit support for the subversion repository.