  ecotag with custom reference library
Posted by: davon - 08-25-2017, 05:15 PM - Forum: Using OBITools - Replies (3)

Hi,
I've built a custom reference library from in-house sequencing and converted it to an ecopcr database with obiconvert. Now I'm trying to use it to assign taxonomy to my data. I downloaded the taxonomic data as taxdump.tar.gz from NCBI. I'm getting errors when I run ecotag; it seems my obiconvert run may not have worked properly, as I'm apparently missing a .pdx file?
Thanks very much in advance for help! 
Davon

Obiconvert code: obiconvert  --fasta --ecopcrdb-output=customdb -t ./TAXO ./custom/custom_ref_lib.fasta

output files:

 24328144 24 Aug 17:04 customdb.adx
146202025 24 Aug 17:04 customdb.ndx
      385 24 Aug 17:03 customdb.rdx
 75063901 24 Aug 17:03 customdb.tdx
   130312 24 Aug 17:04 customdb_001.sdx

ecotag code:
ecotag -d customdb -R ./custom/custom_ref_lib.fasta ./EKB_run1/ebl_data/ebl.fasta ebl.taxonomy.fasta

ecotag error:

Reading binary taxonomy database...
 [INFO : Taxon alias file found] 
 [INFO : Local taxon file not found] 
Taxonomical tree read
[Errno 2] No such file or directory: 'customdb.pdx'
 [INFO : Preferred taxon name file not found]
 ok
Reading reference DB ...  : 750
Traceback (most recent call last):
  File "/Users/DC/Code/OBITools-1.2.10/export/bin/ecotag", line 345, in <module>
    assert seqid not in taxonlink
AssertionError
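
One possible reading of the traceback above: the failing line, `assert seqid not in taxonlink`, complains that a sequence identifier has already been recorded, so duplicated identifiers in the reference FASTA are worth ruling out. The following is only a minimal sketch for that check, assuming the identifier is the first word after `>` on each header line; the function name `duplicate_ids` is made up for illustration and is not part of OBITools.

```python
from collections import Counter


def duplicate_ids(fasta_lines):
    """Return identifiers that appear on more than one FASTA header line."""
    counts = Counter(
        line[1:].split()[0]  # identifier = first word after '>'
        for line in fasta_lines
        if line.startswith(">") and line[1:].strip()
    )
    return sorted(seq_id for seq_id, n in counts.items() if n > 1)


if __name__ == "__main__":
    # Made-up records, only to show the behaviour
    example = [
        ">seq1 taxid=9606;",
        "ACGT",
        ">seq2 taxid=9615;",
        "GGCC",
        ">seq1 taxid=9606;",
        "TTAA",
    ]
    print(duplicate_ids(example))
```

To check the real file, the same function can be fed an open file handle, for example `duplicate_ids(open("./custom/custom_ref_lib.fasta"))`. An empty list would mean the cause lies elsewhere.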


  Error running obiconvert
Posted by: laur34 - 10-20-2016, 02:07 PM - Forum: Using OBITools - Replies (3)

Hi,

I am also going through the tutorial for OBITools, and I run into an error when formatting the data using obiconvert:

command:

  • obiconvert --embl -t ./TAXO --ecopcrdb-output=embl_last ./EMBL/std/*.dat.gz
After a few minutes, it causes the error
Error on sequence : EU394480;
and
struct.error: cannot convert argument to integer

I tried downloading a new taxdump tarball, but I still get the same error.
I found on another forum that it is caused by a discrepancy with the taxid, but I'm still not sure how to overcome it and get the program to finish running.
Thank you to anyone who helps.


  Problem with building the reference database
Posted by: Robert - 10-07-2016, 11:20 AM - Forum: Using OBITools - Replies (3)

Hi,
I am experiencing a problem when trying to build a reference database (→ wolf tutorial).

Using

obiconvert --embl -t ./TAXO --ecopcrdb-output=embl_last ./EMBL/*.dat

generates the error messages

ValueError: too many values to unpack
Exception AttributeError: "'EcoPCRDBSequenceWriter' object has no attribute '_file'" in <bound method EcoPCRDBSequenceWriter.__del__ of <obitools.ecop

See full output below. Any suggestions what might be the cause and how to fix it? Thanks!

full output:


robert@vfm-d023:~/wolf_tutorial2$ obiconvert --embl -t ./TAXO --ecopcrdb-output=embl_last ./EMBL/*.dat
Reading taxonomy dump file...
List all taxonomy rank...
Indexing taxonomy...
Indexing parent and rank...
Adding scientific name...
Traceback (most recent call last):
  File "/home/robert/obitools/OBITools-1.2.9/export/bin/obiconvert", line 43, in <module>
    writer = sequenceWriterGenerator(options)
  File "/home/robert/obitools/OBITools-1.2.9/lib/python2.7/site-packages/obitools/format/options.py", line 364, in sequenceWriterGenerator
    writer=EcoPCRDBSequenceWriter(options)
  File "/home/robert/obitools/OBITools-1.2.9/lib/python2.7/site-packages/obitools/ecopcr/sequence.py", line 72, in __init__
    self._taxonomy= loadTaxonomyDatabase(options)
  File "/home/robert/obitools/OBITools-1.2.9/lib/python2.7/site-packages/obitools/ecopcr/options.py", line 78, in loadTaxonomyDatabase
    taxonomy = TaxonomyDump(options.taxdump)
  File "/home/robert/obitools/OBITools-1.2.9/lib/python2.7/site-packages/obitools/ecopcr/taxonomy.py", line 424, in __init__
    for taxid,name,classname in self._nameIterator('%s/names.dmp' % taxdir):
  File "/home/robert/obitools/OBITools-1.2.9/lib/python2.7/site-packages/obitools/ecopcr/taxonomy.py", line 526, in _nameIterator
    for taxid,name,unique,classname,white in names:
ValueError: too many values to unpack
Exception AttributeError: "'EcoPCRDBSequenceWriter' object has no attribute '_file'" in <bound method EcoPCRDBSequenceWriter.__del__ of <obitools.ecop
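
The last frame of the traceback unpacks exactly five values per `names.dmp` record. Assuming the iterator splits each line on `|` (an assumption, since the splitting code itself is not shown in the output), any record containing an extra `|` would produce more than five values and raise this error. A minimal sketch for locating such records; the function name `odd_name_records` is hypothetical:

```python
def odd_name_records(lines, expected_fields=5):
    """Yield (line_number, line) for records that do not split into the expected number of '|' fields."""
    for number, line in enumerate(lines, start=1):
        if len(line.split("|")) != expected_fields:
            yield number, line.rstrip("\n")


if __name__ == "__main__":
    # Two made-up records in the tab-pipe-tab layout of names.dmp;
    # the second one carries an extra '|' inside the name field.
    sample = [
        "1\t|\troot\t|\t\t|\tscientific name\t|\n",
        "2\t|\tname with | inside\t|\t\t|\tsynonym\t|\n",
    ]
    for number, line in odd_name_records(sample):
        print(number, repr(line))
```

Run against the real file, for example with `odd_name_records(open("./TAXO/names.dmp"))`, this would list the line numbers to inspect. If nothing is reported, the splitting assumption is probably wrong and the cause lies elsewhere.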


  obiannotate: how to access sequence identifier?
Posted by: ekrell - 08-10-2016, 02:49 PM - Forum: Using OBITools - Replies (1)

My goal is to use the string of the sequence identifier when adding/modifying sequence attributes.


This can be done with the sequence itself:

Code:
> obiannotate -S short:'len(sequence)<100' seq1.fasta > seq2.fasta


But I wish to do the same with the identifier. I have already tried "identifier" as well as a number of similar terms. I feel like this is supported and I just do not see it.

If it is not supported, I hope that it will be. Many existing fast[aq] files have identifiers in which each field is separated by some delimiter, such as a colon. I find the obitools style of key-value attribute pairs much more appealing, as they allow using obitools to their full potential. What I would like to do is use the existing sequence identifier and convert it into those key-value pairs using obiannotate.

For example, suppose I have a sequence identifier such as:

Code:
GH8768:flounder:5865:5687:47

I would use the -S option to use python to grab substrings and form:

Code:
GH8768:flounder:5865:5687:47 SEQID=GH8768; common_name=flounder; taxid=5865; genbank=5687; count=47;

I also want to do the opposite: Since many existing programs only read that first colon-delimited string, I want to be able to collapse all or selected attributes into the delimited header. By having both, I would be able to use obitools but also prepare the data for vsearch, whose options sometimes require specific configuration of the sequence identifier.

Currently, I achieve both of the above via bash scripting, but much less easily than if I could just access the identifier within obitools.
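
For illustration, the two conversions described above can be sketched in plain Python, outside obitools. This is not an obiannotate feature: the field names come from the example in this post, the function names are made up, and values containing `;` or `=` are not handled.

```python
# Field names taken from the example in the post; adjust to the real header layout
FIELDS = ("SEQID", "common_name", "taxid", "genbank", "count")


def expand_header(identifier, fields=FIELDS, sep=":"):
    """Append 'key=value;' attributes built from a delimited identifier."""
    values = identifier.split(sep)
    if len(values) != len(fields):
        raise ValueError("expected %d fields, got %d" % (len(fields), len(values)))
    attributes = " ".join("%s=%s;" % pair for pair in zip(fields, values))
    return "%s %s" % (identifier, attributes)


def collapse_header(header, fields=FIELDS, sep=":"):
    """Rebuild a delimited identifier from the selected key=value attributes."""
    _, _, attribute_text = header.partition(" ")
    attributes = {}
    for item in attribute_text.split(";"):
        key, equals, value = item.strip().partition("=")
        if equals:
            attributes[key] = value
    return sep.join(attributes[name] for name in fields)


if __name__ == "__main__":
    header = expand_header("GH8768:flounder:5865:5687:47")
    print(header)
    print(collapse_header(header, fields=("SEQID", "count")))
```

Passing a shorter `fields` tuple to `collapse_header` collapses only the selected attributes, which is the case needed when a downstream tool expects a specific identifier layout.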


  Database
Posted by: adejode - 08-08-2016, 02:50 PM - Forum: Using OBITools - No Replies

Hi,

I have followed the Wolves' diet tutorial, adapting it to my data.

Now I need to build or choose my database to make a taxonomic assignment for my sequences. My project is quite exploratory, so I would like to have a large reference database.

I was wondering if it is possible to use the ecotag command directly on GenBank? Or is it necessary to first run ecopcr on GenBank to create a subset of the whole dataset?

Thanks


  obiextract: bug?
Posted by: cbird - 08-01-2016, 03:30 PM - Forum: Using OBITools - Replies (2)

When using obiextract to divide a fasta file into multiple fasta files by sample (1 file per sample), a line is added at the beginning of the file that is not compatible with the fasta format.  

This then requires sed, awk, or perl to remove the added line from each file.

There seem to be no arguments that prevent this from happening.
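
As a stopgap alongside sed, awk, or perl, the clean-up can also be sketched in Python. The post does not say what the added line looks like, so this assumes only that it does not start with `>`; everything before the first FASTA header is dropped. The function name `strip_preamble` is made up for illustration.

```python
def strip_preamble(lines):
    """Yield lines starting from the first FASTA header ('>'), dropping anything before it."""
    seen_header = False
    for line in lines:
        if not seen_header and line.startswith(">"):
            seen_header = True
        if seen_header:
            yield line


if __name__ == "__main__":
    # Made-up content: one non-FASTA line followed by a normal record
    messy = ["some added line\n", ">sample1_seq1\n", "ACGTACGT\n"]
    print("".join(strip_preamble(messy)), end="")
```

A file that already starts with a header passes through unchanged, so the function can be applied to every per-sample file without checking first.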


  illuminapairedend: alignment scoring system?
Posted by: cbird - 07-23-2016, 04:10 PM - Forum: Using OBITools - Replies (2)

I'm trying to determine what a reasonable alignment score would be, given my sequence length and expected amount of overlap.  

What is the alignment scoring system employed by the illuminapairedend command?

Match score? 1?

Mismatch penalty ? -2?

Or is a complex match/mismatch scoring matrix used?


Gap penalty type? Affine?


Gap open penalty value?

Gap extend penalty value?


  Parallel Strategies
Posted by: cbird - 07-23-2016, 12:05 PM - Forum: Using OBITools - Replies (1)

We've been using obitools for a few months in my lab. It has been working well, but most or all of the commands are single-threaded. I actually like that because it gives me more control over how obitools is running.

I was wondering if anybody was willing to share their strategies for making obitools parallel?

Here is ours:

This past week, I was helping a student with illuminapairedend, obigrep, and ngsfilter. It was taking several hours for our workstation (dual Xeon v3, 40 threads of capacity) to plow through illuminapairedend because it was only using 1 thread. I wanted to see how adjusting the alignment score would affect the retention of sequences, so we wanted to run illuminapairedend with 5 different alignment scores. We wrote a bash script that broke the fastq files into several files, then used GNU parallel (rather than a nested for loop) to run illuminapairedend on the directory of files while iterating through the 5 different alignment scores, and it finished very quickly. After the GNU parallel command, we concatenated the files back together. FWIW, changing the alignment score didn't change much. However, after reading the documentation, we changed a couple of arguments and increased our number of retained reads by 5x. We were able to employ the parallel strategy with the other obitools commands also.

This same strategy works well for other obitools steps if you are only processing 1 file. To review:
     Divide the fastq into several files (one pair of f and r files per thread)
     Use GNU parallel instead of a for loop to run the obitools command on each of the sub-files (this provides a dramatic speed increase if you have a lot of threads available)
     Concatenate the files (or not, if you are going to run another obitools command with GNU parallel)
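
The divide and concatenate steps listed above could look roughly like this in Python. This is a sketch, not our actual bash script: it assumes plain 4-line FASTQ records (no wrapped sequence lines), and the function and file names are made up. Records are dealt out round-robin, so running it on the forward and reverse files with the same number of chunks keeps each read pair in matching chunk files, provided both files list the reads in the same order.

```python
import itertools
import os


def split_fastq(path, n_chunks, out_dir):
    """Distribute 4-line FASTQ records round-robin over n_chunks files; return their paths."""
    os.makedirs(out_dir, exist_ok=True)
    base = os.path.basename(path)
    chunk_paths = [os.path.join(out_dir, "%s.part%03d" % (base, i)) for i in range(n_chunks)]
    outputs = [open(p, "w") for p in chunk_paths]
    try:
        with open(path) as handle:
            for index in itertools.count():
                record = list(itertools.islice(handle, 4))  # next 4 lines = one record
                if not record:
                    break
                outputs[index % n_chunks].writelines(record)
    finally:
        for out in outputs:
            out.close()
    return chunk_paths


def concatenate(paths, destination):
    """Join result files back together in the given order."""
    with open(destination, "w") as out:
        for p in paths:
            with open(p) as handle:
                out.writelines(handle)
```

The obitools command itself is then run once per chunk (with gnu parallel, as described above) between the two calls. Note that after a round-robin split, the concatenated output is no longer in the original read order, which does not matter for the steps mentioned here but may for others.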


  ngsfilter: error arg / fuzzy pattern match
Posted by: cbird - 07-23-2016, 11:13 AM - Forum: Using OBITools - Replies (1)

The documentation states that the -e argument allows one to set the number of allowable errors in the pattern match to the primer.

What errors are included? Insertions, deletions, and substitutions?

If -e 2,  would that allow for 2 "errors" in the pattern match to one primer, or 2 errors across both primers?  

Am I correct in understanding that there can be no errors in the pattern match to the tags?


I can figure it out from here:

https://git.metabarcoding.org/obitools/o...sfilter.py


  ngsfilter: pattern matching
Posted by: cbird - 07-22-2016, 10:46 AM - Forum: Using OBITools - Replies (4)

What is the general structure of the pattern match that ngsfilter searches for?

If this is the format of my sequences $tag1$fprimer.*$rprimer$tag2  where
$tag1 is the first tag in the sample description file
$fprimer is the forward primer
.* is POSIX regex, meaning zero or more bases of any identity
then it seems to work fine.

What if, for some reason, there are extra bases in front of tag1? Will ngsfilter still recognize a match, or does it require that the tag be at the beginning of the line?

Will ngsfilter also detect the following patterns?

$fprimer$tag1.*$rprimer
$tag1$fprimer.*$rprimer

I have some data that is formatted like this, and need to demultiplex it.
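
To make the question concrete, here is the layout written as an ordinary Python regular expression. This only illustrates the pattern notation used above; it is not how ngsfilter matches reads (it allows no mismatches, for one), and the tag and primer sequences are made up. It shows the difference between an anchored match, which requires the tag at the very start of the read, and an unanchored search, which tolerates extra leading bases.

```python
import re


def layout_regex(tag1, fprimer, rprimer, tag2=""):
    """Compile tag1 + fprimer + any bases + rprimer + tag2 as a literal-string regex."""
    return re.compile(
        re.escape(tag1) + re.escape(fprimer) + "(.*)" + re.escape(rprimer) + re.escape(tag2)
    )


if __name__ == "__main__":
    pattern = layout_regex("AACC", "GGGTT", "CCCAA", "TTGG")  # made-up tags and primers
    read = "NN" + "AACC" + "GGGTT" + "ACGTACGT" + "CCCAA" + "TTGG"

    # search() scans the whole read, so the two extra leading bases are tolerated
    print(pattern.search(read).group(1))
    # match() is anchored at position 0, so the same read is rejected
    print(pattern.match(read))
    # '.*' also accepts an empty stretch between the primers
    print(pattern.search("AACCGGGTTCCCAATTGG").group(1) == "")
```

The other layouts in the question can be tried by swapping the argument order, for example `layout_regex("", fprimer + tag1, rprimer)` for the primer-then-tag case.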
