Problem working with OBItools/EcoPCR
#1
We are working with OBITools, following the wolves' diet protocol, and we
really love it. But we have a big problem downloading the whole set of
EMBL sequences. The following command downloads nothing:

wget -nH --cut-dirs=4 -Arel_std_\*.dat.gz -m ftp://ftp.ebi.ac.uk/pub/databases/embl/release/

We also tried to download with

wget -nH --cut-dirs=4 -Arel_std_\*.dat.gz -m ftp://ftp.ebi.ac.uk/pub/databases/ena/sequence/release

Here we only get a wget-log file, and the next step fails after
obiconvert: it probably produces empty sdx files, and ecoPCR then stops
with an error (error 3). Obviously it is not the right format.

Because of that, we also tried to use NCBI FASTA downloads, but
obiconvert also fails there, with errors like:

  File "/opt/anaconda27/bin/obiconvert", line 52, in <module>
    writer(entry)
  File "/opt/anaconda27/lib/python2.7/site-packages/obitools/format/options.py", line 371, in sequenceWriter
    writer.put(sequence)
  File "/opt/anaconda27/lib/python2.7/site-packages/obitools/ecopcr/sequence.py", line 173, in put
    self._file.write(self._ecoSeqPacker(sequence))
  File "/opt/anaconda27/lib/python2.7/site-packages/obitools/ecopcr/sequence.py", line 127, in _ecoSeqPacker
    raise Exception("Taxonomy error for %s: %s"%(seq.id, "taxonomy is missing" if self._taxonomy is None else "sequence has no taxid" if 'taxid' not in seq else "wrong taxid"))
Exception: Taxonomy error for gi|6066746|emb|AJ012532.1|: sequence has no taxid
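
For anyone hitting the same exception: OBITools-style FASTA headers carry annotations as "key=value;" pairs, and the message says that no taxid annotation was found for that record. A minimal Python 3 sketch for listing such records before running obiconvert could look like the following. The function name, the regular expression and the example headers are made up for illustration and are not part of OBITools.

```python
import re

# Matches an OBITools-style "taxid=<number>;" annotation in a header.
TAXID_RE = re.compile(r"(?:^|[\s;])taxid=\s*(\d+)\s*;")


def records_without_taxid(fasta_lines):
    """Return the ids of FASTA records whose header has no taxid annotation."""
    missing = []
    for line in fasta_lines:
        if not line.startswith(">"):
            continue  # sequence line, not a header
        parts = line[1:].strip().split(None, 1)
        if not parts:
            continue  # empty header, nothing to report
        seq_id = parts[0]
        annotations = parts[1] if len(parts) > 1 else ""
        if TAXID_RE.search(annotations) is None:
            missing.append(seq_id)
    return missing


# Hypothetical input: the first header mimics a plain NCBI FASTA header,
# the second one carries a taxid annotation.
example = [
    ">gi|6066746|emb|AJ012532.1| hypothetical description",
    "ACGTACGT",
    ">seq2 taxid=12345; hypothetical description",
    "ACGT",
]
print(records_without_taxid(example))  # ['gi|6066746|emb|AJ012532.1|']
```

To check a real file, the same function can be fed an open file object, for example records_without_taxid(open("18S_NCBI.fasta")).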

We also tried ecoPCRFormat.py on the same FASTA files:

ecoPCRFormat.py --fasta --name 18Seco_b --taxonomy ./TAXO 18S_NCBI.fasta

This gives us the following files:

-rw-r--r--    4  18Seco_b_001.sdx
-rw-r--r--  125429400 18Seco_b.ndx
-rw-r--r--  368  18Seco_b.rdx
-rw-r--r--  64173025 18Seco_b.tdx

ecoPCR -d 18Seco_b -e 3 -l 50 -L 250 \
  AGGTCWGTRATGCCCTYMG TGYACAAAGGBCAGGGAC > 18S_b.ecopcr
→ Error 3 in file ecoapat.c line 157 : Error in pattern checking
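
One thing that may be worth ruling out with a pattern-checking error is a stray character in a primer. For instance, if a backslash meant as a line continuation is followed by a space instead of a newline, the shell keeps that space as the first character of the next argument. The Python sketch below only checks primers against the plain IUPAC DNA letters. That is an assumption about what counts as valid, since ecoPCR's own pattern syntax may accept more than this, and the function name is hypothetical.

```python
# IUPAC DNA codes: the four bases plus the ambiguity letters.
IUPAC_DNA = set("ACGTRYSWKMBDHVN")


def invalid_primer_chars(primer):
    """Return (position, character) pairs that are not IUPAC DNA codes."""
    return [(i, c) for i, c in enumerate(primer) if c.upper() not in IUPAC_DNA]


forward = "AGGTCWGTRATGCCCTYMG"
reverse = "TGYACAAAGGBCAGGGAC"

print(invalid_primer_chars(forward))        # []
print(invalid_primer_chars(reverse))        # []
# A primer that picked up a leading blank from the command line:
print(invalid_primer_chars(" " + forward))  # [(0, ' ')]
```

Both primers pass this check, so if the error persists with clean arguments, the database itself is the next thing to look at.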

In general, we are targeting Metazoa with 18S and COI from environmental
samples. Sorry to bother you, but we are a little bit lost. We hope
you can help us download the whole set of sequences and format them
into a database suitable for ecoPCR.
#2
Hi Babett,

The EBI FTP site regularly changes its structure. The current address to use is

    http://ftp.ftp.ebi.ac.uk/pub/databases/embl/release/std

One more subdirectory, 'std', has been added.
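
Note that with one more directory level, the --cut-dirs value in the original wget command probably needs to grow by one as well if the files should still land directly in the working directory. A small Python 3 sketch that derives the value from the URL path is given below. The helper name is hypothetical, the host name is taken from the wget command in the first post, and the URL with 'std' appended is an assumption that should be checked against the live site.

```python
from urllib.parse import urlparse


def build_wget_args(url, accept_pattern="rel_std_*.dat.gz"):
    """Build a wget argument list that mirrors url without its directory tree."""
    # One --cut-dirs level per directory component in the URL path.
    path_dirs = [p for p in urlparse(url).path.split("/") if p]
    return [
        "wget",
        "-nH",                                # do not create a host directory
        "--cut-dirs=%d" % len(path_dirs),     # strip all remote directory levels
        "-A", accept_pattern,                 # only accept matching file names
        "-m",                                 # mirror mode
        url,
    ]


# Assumed URL: the address from the first post plus the new 'std' level.
args = build_wget_args("ftp://ftp.ebi.ac.uk/pub/databases/embl/release/std/")
print(" ".join(args))
```

The list can be passed to subprocess.run(args). Because no shell is involved in that case, the * in the accept pattern does not need a backslash.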

Concerning the NCBI data, you must always download the GenBank format, because the NCBI taxid is stored in that format but not in the FASTA format.
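
To illustrate where the taxid sits: in a GenBank flat file it appears in the source feature as a /db_xref="taxon:..." qualifier. The following Python 3 sketch pulls it out per record. It is only meant to show the idea on a made-up record, not to replace obiconvert, and the function name and sample entry are hypothetical.

```python
import re

ACCESSION_RE = re.compile(r"^ACCESSION\s+(\S+)", re.MULTILINE)
TAXON_RE = re.compile(r'/db_xref="taxon:(\d+)"')


def taxids_from_genbank(text):
    """Map each record's accession to its taxid (None if no taxon qualifier)."""
    result = {}
    # Records in a GenBank flat file are terminated by a line containing "//".
    for record in text.split("\n//"):
        accession = ACCESSION_RE.search(record)
        if accession is None:
            continue  # trailing whitespace after the last record
        taxon = TAXON_RE.search(record)
        result[accession.group(1)] = int(taxon.group(1)) if taxon else None
    return result


# Made-up record, for illustration only.
sample = '''LOCUS       EXAMPLE1     20 bp    DNA     linear   INV 01-JAN-2000
ACCESSION   EXAMPLE1
FEATURES             Location/Qualifiers
     source          1..20
                     /organism="Hypothetical organism"
                     /db_xref="taxon:12345"
ORIGIN
        1 acgtacgtac gtacgtacgt
//
'''
print(taxids_from_genbank(sample))  # {'EXAMPLE1': 12345}
```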

All the best

Eric
#3
Hi Eric,

Thanks so much for the new address!  Along the same lines, the mitochondria directory for this tutorial has been archived - is there a new directory we can use instead?

 Again, thank you so much for your time.