TextFormats: Simplifying the definition and parsing of text formats in bioinformatics

2022 | journal article. A publication with affiliation to the University of Göttingen.

Cite this publication

Gonnella, G. (2022). TextFormats: Simplifying the definition and parsing of text formats in bioinformatics. PLoS One, 17(5), art. e0268910. DOI: https://doi.org/10.1371/journal.pone.0268910

Documents & Media

Main article (769.03 kB, Adobe PDF)

License

Published Version

Attribution 4.0 (CC BY 4.0)

Details

Authors
Gonnella, Giorgio
Abstract
Text formats are common in bioinformatics: they allow editing and filtering with standard tools and, since they are often human readable, manual inspection and evaluation of the data. Bioinformatics is a rapidly evolving field; hence new techniques, new software tools, and new kinds of data often require the definition of new formats. Often, new formats are not formally described in a standard or specification document. Although software libraries are available for accessing the most common formats, writing parsers for text formats for which no library is currently available is a very common though tedious task, carried out by many researchers in the field. This manuscript presents the open source software library and toolset TextFormats (available at https://github.com/ggonnella/textformats), which aims at simplifying the definition and parsing of text formats. Format specifications are written in a simple data description format using an interactive wizard. Automatic generation of data examples and automatic testing of specifications allow for checking their correctness. Given the specification for a text format, TextFormats allows parsing and writing data in that format, using several programming languages (Nim, Python, C/C++) or the provided command line and graphical user interface tools. Although TextFormats is designed as general purpose software, its main application field is, for the reasons mentioned above, expected to be bioinformatics; thus, the specifications of several common existing bioinformatics formats are included.
Issue Date
2022
Journal
PLoS One 
eISSN
1932-6203
Language
English
Sponsor
Open-Access-Publikationsfonds 2022
