talk @ metabarcoding.org
abiannotate: how to access sequence identifier?


abiannotate: how to access sequence identifier? - ekrell - 08-10-2016

My goal is to use the string of the sequence identifier when adding/modifying sequence attributes.


This can be done with the sequence itself:

Code:
> obiannotate -S short:'len(sequence)<100' seq1.fasta > seq2.fasta


But I wish to do the same with the identifier. I have already tried "identifier" as well as a number of similar terms. I feel like this is supported and I just do not see it.

If it is not supported, I hope that it will be. Many existing fast[aq] files have identifiers in which the fields are separated by some delimiter, such as a colon. I find the obitools style of key-value attribute pairs much more appealing, as it allows using obitools to their full potential. What I would like to do is take the existing sequence identifier and convert it into those key-value pairs using obiannotate.

For example, suppose I have a sequence identifier such as:

Code:
GH8768:flounder:5865:5687:47

I would use the -S option to use python to grab substrings and form:

Code:
GH8768:flounder:5865:5687:47 SEQID=GH8768; common_name=flounder; taxid=5865; genbank=5687; count=47;
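
To make the intended transformation concrete, here is a minimal standalone Python sketch of it. It is not obitools code: the function name expand_identifier and the FIELDS list are hypothetical, and the field names are simply the ones used in the example header above.

```python
# Hypothetical field names, taken from the example header above.
FIELDS = ["SEQID", "common_name", "taxid", "genbank", "count"]


def expand_identifier(identifier, fields=FIELDS, sep=":"):
    """Return 'identifier key=value; ...' built from a delimited identifier."""
    values = identifier.split(sep)
    if len(values) != len(fields):
        raise ValueError("expected %d fields, got %d" % (len(fields), len(values)))
    # One 'key=value;' pair per field, in the order the fields appear.
    attributes = " ".join("%s=%s;" % (k, v) for k, v in zip(fields, values))
    return "%s %s" % (identifier, attributes)


print(expand_identifier("GH8768:flounder:5865:5687:47"))
```

Each value is just the substring at a fixed position after splitting on the delimiter, which is the kind of expression I would like to hand to -S.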

I also want to do the opposite: Since many existing programs only read that first colon-delimited string, I want to be able to collapse all or selected attributes into the delimited header. By having both, I would be able to use obitools but also prepare the data for vsearch, whose options sometimes require specific configuration of the sequence identifier.
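
The reverse direction could look roughly like the sketch below. It is plain Python, not obitools code; collapse_header is a hypothetical helper, and it assumes a simple header in which the identifier is followed by a space and then 'key=value;' pairs whose values contain no semicolons.

```python
def collapse_header(header, keys, sep=":"):
    """Join the identifier and the selected attribute values with `sep`."""
    # Everything after the first space is treated as 'key=value;' pairs.
    seq_id, _, attribute_text = header.partition(" ")
    attributes = {}
    for pair in attribute_text.split(";"):
        if "=" in pair:
            key, value = pair.split("=", 1)
            attributes[key.strip()] = value.strip()
    # Keep only the requested attributes, in the requested order.
    return sep.join([seq_id] + [attributes[k] for k in keys])


header = "GH8768 common_name=flounder; taxid=5865; count=47;"
print(collapse_header(header, ["common_name", "count"]))
```

Selecting the keys explicitly is what would let the same file be reshaped for whichever identifier layout a downstream program expects.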

Currently, I achieve both of the above via bash scripting, but much less easily than if I could just access the identifier within obitools.


RE: abiannotate: how to access sequence identifier? - coissac - 08-11-2016

Hi ekrell,

Just use sequence.id

All the best

Eric
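
For readers who find this thread later, here is a small Python sketch of the kind of expression that sequence.id makes possible. The Sequence class is only a stand-in for the record that obiannotate binds to the name sequence, and the expression string is an illustration rather than a tested obitools command. Following the -S key:'expression' form from the first post, it would presumably be passed as -S common_name:'sequence.id.split(":")[1]', but that exact usage is an assumption.

```python
class Sequence(object):
    """Stand-in for the record obiannotate exposes as `sequence` (assumption)."""

    def __init__(self, id, seq):
        self.id = id
        self.seq = seq


sequence = Sequence("GH8768:flounder:5865:5687:47", "ACGT")

# The expression is evaluated here by hand, with `sequence` bound to the record.
expression = 'sequence.id.split(":")[1]'
common_name = eval(expression, {"sequence": sequence})
print(common_name)
```

Changing the index in the expression selects a different field of the colon-delimited identifier.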