
ecotag with custom reference library
I've built a custom reference library from in-house sequencing and converted it to an ecoPCR database with obiconvert. Now I'm trying to use it to assign taxonomy to my data. I downloaded the taxonomy from NCBI as taxdump.tar.gz. I'm getting errors when I run ecotag. It seems my obiconvert step may not have worked properly, since I'm apparently missing a .pdx file?
Thanks very much in advance for help! 

obiconvert code:
obiconvert --fasta --ecopcrdb-output=customdb -t ./TAXO ./custom/custom_ref_lib.fasta

output files:

  24328144 24 Aug 17:04 customdb.adx
146202025 24 Aug 17:04 customdb.ndx
           385 24 Aug 17:03 customdb.rdx
  75063901 24 Aug 17:03 customdb.tdx
     130312 24 Aug 17:04 customdb_001.sdx

ecotag code:
ecotag -d customdb -R ./custom/custom_ref_lib.fasta ./EKB_run1/ebl_data/ebl.fasta ebl.taxonomy.fasta

ecotag error:

Reading binary taxonomy database...
 [INFO : Taxon alias file found] 
 [INFO : Local taxon file not found] 
Taxonomical tree read
[Errno 2] No such file or directory: 'customdb.pdx'
 [INFO : Preferred taxon name file not found]
Reading reference DB ...  : 750
Traceback (most recent call last):
  File "/Users/DC/Code/OBITools-1.2.10/export/bin/ecotag", line 345, in <module>
    assert seqid not in taxonlink

The "[Errno 2] No such file or directory: 'customdb.pdx'" message is misleading: it is an informational message, not an error. It only means that your database has no file of preferred taxon names, and that file is optional.

The error that actually stops ecotag is the failed assertion at the end of the traceback (assert seqid not in taxonlink). It occurs because at least two sequences in your reference FASTA file share the same identifier, and identifiers must be unique because they are how each sequence is identified.

To fix it, you can run:

obiannotate --uniq-id my_ref_seqs.fasta > my_ref_seqs_with_unique_ids.fasta
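
If you want to see which identifiers collide before fixing them (or confirm that none remain afterwards), something like the following minimal Python sketch should work. It is not part of OBITools; the function names (fasta_ids, duplicate_ids, report) are made up for illustration, and it assumes the identifier is the first whitespace-delimited token after ">" on each header line.

```python
from collections import Counter


def fasta_ids(lines):
    """Yield the identifier of each FASTA record.

    Assumption: the identifier is the first whitespace-delimited
    token after '>' on the header line.
    """
    for line in lines:
        if line.startswith(">"):
            header = line[1:].strip()
            yield header.split()[0] if header else ""


def duplicate_ids(lines):
    """Return {identifier: count} for identifiers seen more than once."""
    counts = Counter(fasta_ids(lines))
    return {seq_id: n for seq_id, n in counts.items() if n > 1}


def report(path):
    """Print each duplicated identifier and its count, tab-separated."""
    with open(path) as handle:
        for seq_id, n in sorted(duplicate_ids(handle).items()):
            print(f"{seq_id}\t{n}")
```

Calling report("my_ref_seqs.fasta") lists the duplicated identifiers; running it on the output of the obiannotate command above should print nothing if every identifier is now unique.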