abiannotate: how to access sequence identifier? - ekrell - 08-10-2016

My goal is to use the string of the sequence identifier when adding/modifying sequence attributes.

This can be done with the sequence itself:

> obiannotate -S short:'len(sequence)<100' seq1.fasta > seq2.fasta

But I wish to do the same with the identifier. I have already tried "identifier" as well as a number of similar terms. I feel like this is supported and I just do not see it.

If it is not supported, I hope that it will be. Many existing fast[aq] files have identifiers in which the fields are separated by some delimiter, such as a colon. I find the obitools style of key-value attribute pairs much more appealing, as it allows using obitools to their full potential. What I would like to do is take the existing sequence identifier and convert it into those key-value pairs using obiannotate.

For example, suppose I have a sequence identifier such as:

GH8768:flounder:5865:5687:47

I would use the -S option with a Python expression to extract the substrings and produce:

GH8768:flounder:5865:5687:47 SEQID=GH8768; common_name=flounder; taxid=5865; genbank=5687; count=47;
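
To make the intended conversion concrete, here is a minimal sketch in plain Python, run outside obitools. The field names come from the example header; the names `FIELDS` and `expand_identifier` are hypothetical and not part of obitools, and the sketch assumes every identifier has exactly one value per field.

```python
# Hypothetical field names, taken from the example header above.
FIELDS = ["SEQID", "common_name", "taxid", "genbank", "count"]


def expand_identifier(identifier, fields=FIELDS, sep=":"):
    """Return 'identifier key=value; ...' built from a delimited identifier."""
    values = identifier.split(sep)
    # Refuse identifiers that do not match the assumed layout.
    if len(values) != len(fields):
        raise ValueError(
            "expected %d fields, got %d" % (len(fields), len(values))
        )
    # One 'key=value;' pair per field, separated by spaces.
    attributes = " ".join("%s=%s;" % (k, v) for k, v in zip(fields, values))
    return "%s %s" % (identifier, attributes)


print(expand_identifier("GH8768:flounder:5865:5687:47"))
```

The identifier itself is kept unchanged as the first token, so tools that only read that token still see the original string.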

I also want to do the opposite: since many existing programs only read that first colon-delimited string, I want to be able to collapse all or selected attributes into the delimited header. With both directions available, I could use obitools and also prepare the data for vsearch, whose options sometimes require a specific configuration of the sequence identifier.
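
The reverse direction could be sketched the same way in plain Python, again outside obitools. The function name `collapse_header` is hypothetical, and the sketch assumes the attributes follow the identifier as `key=value;` pairs with no semicolons inside the values.

```python
import re


def collapse_header(header, keys, sep=":"):
    """Join the identifier and the selected attribute values with sep."""
    # The identifier is assumed to be the first whitespace-delimited token.
    identifier = header.lstrip(">").split(None, 1)[0]
    # Collect every 'key=value;' pair found in the header.
    attributes = dict(re.findall(r"(\w+)=([^;]*);", header))
    missing = [k for k in keys if k not in attributes]
    if missing:
        raise KeyError("missing attributes: %s" % ", ".join(missing))
    return sep.join([identifier] + [attributes[k].strip() for k in keys])


header = "GH8768 common_name=flounder; taxid=5865; genbank=5687; count=47;"
print(collapse_header(header, ["common_name", "taxid", "genbank", "count"]))
```

Passing a shorter `keys` list collapses only the selected attributes, which covers the "all or selected" case.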

Currently, I achieve both of the above via bash scripting, but much less easily than if I could just access the identifier within obitools.

RE: abiannotate: how to access sequence identifier? - coissac - 08-11-2016

Hi ekrell,

Just use

All the best