obiannotate: how to access sequence identifier?
#1
My goal is to use the string of the sequence identifier when adding/modifying sequence attributes.


This can be done with the sequence itself:

Code:
> obiannotate -S short:'len(sequence)<100' seq1.fasta > seq2.fasta


But I wish to do the same with the identifier. I have already tried "identifier" as well as a number of similar terms. I feel like this is supported and I just do not see it.

If it is not supported, I hope that it will be. Many existing fast[aq] files have identifiers in which the fields are separated by some delimiter, such as a colon. I find the obitools style of key-value attribute pairs much more appealing, as it allows using obitools to their full potential. What I would like to do is take the existing sequence identifier and convert it into those key-value pairs using obiannotate.

For example, suppose I have a sequence identifier such as:

Code:
GH8768:flounder:5865:5687:47

I would use the -S option with Python expressions to extract substrings and produce:

Code:
GH8768:flounder:5865:5687:47 SEQID=GH8768; common_name=flounder; taxid=5865; genbank=5687; count=47;
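
As a rough sketch in plain Python (outside obitools), the split described above could look like the following. The field order is taken from the example identifier; the names `FIELDS`, `id_to_attributes` and `format_header` are hypothetical helpers for illustration and are not part of obitools.

Code:
```python
# Hypothetical field order, assumed from the example identifier in this post.
FIELDS = ["SEQID", "common_name", "taxid", "genbank", "count"]


def id_to_attributes(identifier, delimiter=":"):
    """Split a delimited identifier into an ordered list of (key, value) pairs."""
    values = identifier.split(delimiter)
    if len(values) != len(FIELDS):
        raise ValueError(
            f"expected {len(FIELDS)} fields, got {len(values)}: {identifier!r}"
        )
    return list(zip(FIELDS, values))


def format_header(identifier, delimiter=":"):
    """Return the identifier followed by 'key=value;' pairs built from its fields."""
    pairs = id_to_attributes(identifier, delimiter)
    attributes = " ".join(f"{key}={value};" for key, value in pairs)
    return f"{identifier} {attributes}"


print(format_header("GH8768:flounder:5865:5687:47"))
```

The identifier itself is left unchanged; only the attribute part after the first space is generated.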

I also want to do the opposite: Since many existing programs only read that first colon-delimited string, I want to be able to collapse all or selected attributes into the delimited header. By having both, I would be able to use obitools but also prepare the data for vsearch, whose options sometimes require specific configuration of the sequence identifier.
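
The reverse direction, again only as a plain-Python sketch: join selected attribute values back into one delimited string. The function name `attributes_to_id` and the dictionary of attributes are assumptions for illustration; how obitools stores attributes internally is not modelled here.

Code:
```python
def attributes_to_id(attributes, keys, delimiter=":"):
    """Join the values of the selected keys, in the given order, into one identifier."""
    missing = [key for key in keys if key not in attributes]
    if missing:
        raise KeyError(f"missing attributes: {missing}")
    return delimiter.join(str(attributes[key]) for key in keys)


# Example attributes, using the values from the identifier in this post.
attrs = {
    "SEQID": "GH8768",
    "common_name": "flounder",
    "taxid": "5865",
    "genbank": "5687",
    "count": "47",
}

# Collapse only a subset of the attributes into the header.
print(attributes_to_id(attrs, ["SEQID", "common_name", "count"]))
```

Passing the list of keys explicitly keeps the field order under the caller's control, which matters when a downstream tool expects a specific identifier layout.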

Currently, I achieve both of the above via bash scripting, but much less easily than if I could just access the identifier within obitools.
#2
Hi ekrell,

Just use sequence.id
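
For illustration, here is how expressions built on sequence.id behave in plain Python. `FakeSequence` is a hypothetical stand-in that models only the `.id` attribute; it is not the obitools sequence class, and the expressions have not been tested inside obiannotate itself.

Code:
```python
class FakeSequence:
    """Stand-in for the object exposed as `sequence` (assumption: only `.id` is modelled)."""

    def __init__(self, id):
        self.id = id


sequence = FakeSequence("GH8768:flounder:5865:5687:47")

# Expressions of the kind one might try with the -S option.
common_name = sequence.id.split(":")[1]
count = int(sequence.id.split(":")[4])

print(common_name, count)
```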

All the best

Eric