  Error running obiconvert
Posted by: laur34 - 10-20-2016, 02:07 PM - Forum: Using OBITools - Replies (3)

Hi,

I am also going through the tutorial for OBITools, and I run into an error when formatting the data using obiconvert:

command:

  • obiconvert --embl -t ./TAXO --ecopcrdb-output=embl_last ./EMBL/std/*.dat.gz
After a few minutes, it causes the error
Error on sequence : EU394480;
and
struct.error: cannot convert argument to integer

I tried downloading a new taxdump tarball, but I still get the same error.
I found on another forum that it is caused by a discrepancy with the taxid, but I'm still not sure how to overcome it and get the program to finish running.
Thank you to anyone who helps.
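
If the cause really is a taxid mismatch, as the other forum suggests, one way to narrow it down might be to list the EMBL records whose taxon cross-reference is missing from the taxonomy dump. The sketch below is only a diagnostic under that assumption; the function names are made up for illustration and it does not use or modify OBITools.

```python
import gzip
import re

TAXON_RE = re.compile(r'/db_xref="taxon:(\d+)"')

def load_taxids(nodes_dmp):
    """Collect every taxid found in the first column of nodes.dmp."""
    taxids = set()
    with open(nodes_dmp) as handle:
        for line in handle:
            # Columns are separated by '|'; the first column is the taxid.
            taxids.add(int(line.split("|", 1)[0].strip()))
    return taxids

def records_with_unknown_taxid(embl_path, known_taxids):
    """Yield (accession, taxid) for records whose taxid is not in the dump.

    taxid is None when a record carries no taxon cross-reference at all.
    Only the first taxon cross-reference of each record is considered.
    """
    opener = gzip.open if embl_path.endswith(".gz") else open
    accession, taxid = None, None
    with opener(embl_path, "rt") as handle:
        for line in handle:
            if line.startswith("ID "):
                accession = line[2:].split(";", 1)[0].strip()
                taxid = None
            elif line.startswith("FT") and taxid is None:
                match = TAXON_RE.search(line)
                if match:
                    taxid = int(match.group(1))
            elif line.startswith("//"):
                # '//' closes an EMBL record.
                if accession is not None and taxid not in known_taxids:
                    yield accession, taxid
                accession, taxid = None, None
```

Called with the nodes.dmp file from ./TAXO and one of the .dat.gz files, this would show whether EU394480 is among the records with an unknown taxid. It does not fix the error itself.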


  Problem with building the reference database
Posted by: Robert - 10-07-2016, 11:20 AM - Forum: Using OBITools - Replies (3)

Hi,
I am experiencing a problem when trying to build a reference database (→ wolf tutorial).

Using

obiconvert --embl -t ./TAXO --ecopcrdb-output=embl_last ./EMBL/*.dat

generates the error messages

ValueError: too many values to unpack
Exception AttributeError: "'EcoPCRDBSequenceWriter' object has no attribute '_file'" in <bound method EcoPCRDBSequenceWriter.__del__ of <obitools.ecop

See full output below. Any suggestions what might be the cause and how to fix it? Thanks!

full output:


robert@vfm-d023:~/wolf_tutorial2$ obiconvert --embl -t ./TAXO --ecopcrdb-output=embl_last ./EMBL/*.dat
Reading taxonomy dump file...
List all taxonomy rank...
Indexing taxonomy...
Indexing parent and rank...
Adding scientific name...
Traceback (most recent call last):
File "/home/robert/obitools/OBITools-1.2.9/export/bin/obiconvert", line 43, in <module>
writer = sequenceWriterGenerator(options)
File "/home/robert/obitools/OBITools-1.2.9/lib/python2.7/site-packages/obitools/format/options.py", line 364, in sequenceWriterGenerator
writer=EcoPCRDBSequenceWriter(options)
File "/home/robert/obitools/OBITools-1.2.9/lib/python2.7/site-packages/obitools/ecopcr/sequence.py", line 72, in __init__
self._taxonomy= loadTaxonomyDatabase(options)
File "/home/robert/obitools/OBITools-1.2.9/lib/python2.7/site-packages/obitools/ecopcr/options.py", line 78, in loadTaxonomyDatabase
taxonomy = TaxonomyDump(options.taxdump)
File "/home/robert/obitools/OBITools-1.2.9/lib/python2.7/site-packages/obitools/ecopcr/taxonomy.py", line 424, in __init__
for taxid,name,classname in self._nameIterator('%s/names.dmp' % taxdir):
File "/home/robert/obitools/OBITools-1.2.9/lib/python2.7/site-packages/obitools/ecopcr/taxonomy.py", line 526, in _nameIterator
for taxid,name,unique,classname,white in names:
ValueError: too many values to unpack
Exception AttributeError: "'EcoPCRDBSequenceWriter' object has no attribute '_file'" in <bound method EcoPCRDBSequenceWriter.__del__ of <obitools.ecop
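
The traceback shows the name iterator unpacking exactly five values per line of names.dmp. Assuming those five values come from splitting each line on the '|' character (an assumption based on the traceback, not on the OBITools source), a quick check is to count the pieces per line and look at the lines that deviate. The function name below is hypothetical.

```python
from collections import Counter

def field_count_report(names_dmp, expected=5, max_examples=5):
    """Count '|'-separated pieces per line and collect lines that deviate."""
    counts = Counter()
    odd_lines = []
    with open(names_dmp) as handle:
        for number, line in enumerate(handle, start=1):
            text = line.rstrip("\n")
            pieces = text.split("|")
            counts[len(pieces)] += 1
            if len(pieces) != expected and len(odd_lines) < max_examples:
                odd_lines.append((number, text))
    return counts, odd_lines
```

If every line yields five pieces, the dump file is probably not the culprit. Lines with more pieces (for example a name that itself contains '|') would be candidates for the "too many values to unpack" error.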


  obiannotate: how to access sequence identifier?
Posted by: ekrell - 08-10-2016, 02:49 PM - Forum: Using OBITools - Replies (1)

My goal is to use the string of the sequence identifier when adding/modifying sequence attributes.


This can be done with the sequence itself:

Code:
> obiannotate -S short:'len(sequence)<100' seq1.fasta > seq2.fasta


But I wish to do the same with the identifier. I have already tried "identifier" as well as a number of similar terms. I feel like this is supported and I just do not see it.

If it is not supported, I hope that it will be. Many existing fast[aq] files have identifiers in which each field is separated by some delimiter, such as a colon. I find the obitools style of key-value attribute pairs much more appealing, as they allow using obitools to their full potential. What I would like to do is use the existing sequence identifier and convert it into those key-value pairs using obiannotate.

For example, suppose I have a sequence identifier such as:

Code:
GH8768:flounder:5865:5687:47

I would use the -S option to use python to grab substrings and form:

Code:
GH8768:flounder:5865:5687:47 SEQID=GH8768; common_name=flounder; taxid=5865; genbank=5687; count=47;

I also want to do the opposite: Since many existing programs only read that first colon-delimited string, I want to be able to collapse all or selected attributes into the delimited header. By having both, I would be able to use obitools but also prepare the data for vsearch, whose options sometimes require specific configuration of the sequence identifier.

Currently, I achieve both of the above via bash scripting, but much less easily than if I could just access the identifier within obitools.
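
For reference, the two conversions described above can be sketched in plain Python, outside obitools. This is only an illustration of the intended transformation on header strings; the function names are invented here and it is not an obiannotate feature.

```python
def identifier_to_attributes(identifier, keys, delimiter=":"):
    """Append key=value; pairs built from a delimited identifier."""
    fields = identifier.split(delimiter)
    if len(fields) != len(keys):
        raise ValueError("expected %d fields, got %d" % (len(keys), len(fields)))
    attributes = "".join(" %s=%s;" % (key, value) for key, value in zip(keys, fields))
    return identifier + attributes

def attributes_to_identifier(header, keys, delimiter=":"):
    """Collapse the selected key=value attributes of a header into one identifier."""
    # Everything after the first space is treated as the attribute section.
    _, _, attribute_text = header.partition(" ")
    attributes = {}
    for item in attribute_text.split(";"):
        if "=" in item:
            key, value = item.split("=", 1)
            attributes[key.strip()] = value.strip()
    return delimiter.join(attributes[key] for key in keys)

keys = ["SEQID", "common_name", "taxid", "genbank", "count"]
header = identifier_to_attributes("GH8768:flounder:5865:5687:47", keys)
collapsed = attributes_to_identifier(header, keys)
```

The sketch assumes that attribute values never contain ';' or '=', and that the header is passed without the leading '>' or '@'.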


  Database
Posted by: adejode - 08-08-2016, 02:50 PM - Forum: Using OBITools - No Replies

Hi,

I have followed the Wolves' diet tutorial, adapting it to my data.

Now I need to build or choose my database to make a taxonomic assignment for my sequences. My project is quite exploratory, so I would like to have a large reference database.

I was wondering if it is possible to use the ecotag command directly on GenBank? Or is it necessary to first run ecopcr on GenBank to create a subset of the whole dataset?

Thanks


  obiextract: bug?
Posted by: cbird - 08-01-2016, 03:30 PM - Forum: Using OBITools - Replies (2)

When using obiextract to divide a fasta file into multiple fasta files by sample (1 file per sample), a line is added at the beginning of the file that is not compatible with the fasta format.  

This then requires sed, awk, or perl to remove the added line from each file.

There seem to be no arguments that prevent this from happening.
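
As a stopgap, the cleanup can also be done in Python instead of sed, awk, or perl. The sketch below assumes that the unwanted content is whatever precedes the first '>' header line; the post does not say what the added line looks like, so treat this as a guess at the fix. The function names are made up.

```python
from itertools import dropwhile

def strip_leading_non_fasta(lines):
    """Drop every line that comes before the first '>' header line."""
    return dropwhile(lambda line: not line.startswith(">"), lines)

def clean_fasta_file(path):
    """Rewrite a FASTA file in place without its leading non-FASTA lines."""
    with open(path) as handle:
        kept = list(strip_leading_non_fasta(handle))
    with open(path, "w") as handle:
        handle.writelines(kept)
```

Lines after the first header are left untouched, so a file that is already valid FASTA comes out unchanged.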


  illuminapairedend: alignment scoring system?
Posted by: cbird - 07-23-2016, 04:10 PM - Forum: Using OBITools - Replies (2)

I'm trying to determine what a reasonable alignment score would be, given my sequence length and expected amount of overlap.  

What is the alignment scoring system employed by the illuminapairedend command?

Match score? 1?

Mismatch penalty? -2?

Or is a complex match/mismatch scoring matrix used?


Gap penalty type? Affine?


Gap open penalty value?

Gap extend penalty value?


  Parallel Strategies
Posted by: cbird - 07-23-2016, 12:05 PM - Forum: Using OBITools - Replies (1)

We've been using obitools for a few months in my lab. It has been working well, but most or all of the commands are single-threaded. I actually like that because it gives me more control over how obitools is running.

I was wondering if anybody was willing to share their strategies for making obitools parallel?

Here is ours:

This past week, I was helping a student with illuminapairedend, obigrep, and ngsfilter. It was taking several hours for our workstation (dual Xeon v3, 40 threads of capacity) to plow through illuminapairedend because it was only using 1 thread. I wanted to see how adjusting the alignment score would affect the retention of sequences, so we wanted to run illuminapairedend with 5 different alignment scores.

We wrote a bash script that broke the fastq files into several files, then used GNU parallel (rather than a nested for loop) to run illuminapairedend on the directory of files while iterating through the 5 different alignment scores, and it finished very quickly. After the GNU parallel command, we concatenated the files back together.

FWIW, changing the alignment score didn't change much. However, after reading the documentation, we changed a couple of arguments and increased our number of retained reads by 5x. We were able to employ the parallel strategy with the other obitools commands also.

This same strategy works well for other obitools steps if you are only processing 1 file. To review:
     Divide the fastq into several files (one pair of f and r files per thread)
     Use GNU parallel instead of a for loop to run the obitools command on each of the sub-files (this provides a dramatic speed increase if you have a lot of threads available)
     Concatenate the files (or not, if you are going to run another obitools command with GNU parallel)
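
The three steps above can also be sketched in Python for anyone who does not have GNU parallel at hand. This swaps the bash script and GNU parallel for `concurrent.futures` and `subprocess`; all function names are invented for the illustration, and it assumes plain 4-line FASTQ records (no wrapped sequence lines).

```python
import subprocess
from concurrent.futures import ThreadPoolExecutor
from itertools import islice

def split_fastq(path, n_chunks, prefix):
    """Deal 4-line FASTQ records round-robin into n_chunks files."""
    names = ["%s.%03d.fastq" % (prefix, i) for i in range(n_chunks)]
    outputs = [open(name, "w") for name in names]
    try:
        with open(path) as handle:
            index = 0
            while True:
                record = list(islice(handle, 4))
                if not record:
                    break
                outputs[index % n_chunks].writelines(record)
                index += 1
    finally:
        for out in outputs:
            out.close()
    return names

def run_one(argv, out_path):
    """Run one command and send its standard output to out_path."""
    with open(out_path, "w") as out:
        subprocess.run(argv, stdout=out, check=True)
    return out_path

def run_parallel(jobs, workers):
    """jobs is a list of (argv, out_path) pairs; results keep the job order."""
    # Threads are enough here: each one only waits for its subprocess.
    with ThreadPoolExecutor(max_workers=workers) as pool:
        return list(pool.map(lambda job: run_one(*job), jobs))

def concatenate(paths, merged_path):
    """Join the per-chunk outputs back into a single file."""
    with open(merged_path, "w") as merged:
        for path in paths:
            with open(path) as part:
                merged.writelines(part)
```

Splitting the forward and reverse files with the same number of chunks keeps read pairs together, because record i always lands in chunk i modulo n_chunks. Each job's argv would hold the obitools command and its arguments for one chunk (or one pair of chunks).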


  ngsfilter: error arg / fuzzy pattern match
Posted by: cbird - 07-23-2016, 11:13 AM - Forum: Using OBITools - Replies (1)

The documentation states that the -e argument allows one to set the number of allowable errors in the pattern match to the primer.

What errors are included? Insertions, deletions, and substitutions?

If -e 2,  would that allow for 2 "errors" in the pattern match to one primer, or 2 errors across both primers?  

Am I correct in understanding that there can be no errors in the pattern match to the tags?


I can figure it out from here:

https://git.metabarcoding.org/obitools/o...sfilter.py
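
To make the first question concrete: counting only substitutions corresponds to a Hamming distance, while counting insertions and deletions as well corresponds to an edit (Levenshtein) distance. The sketch below shows the difference on made-up sequences; it says nothing about which of the two ngsfilter actually uses.

```python
def hamming(a, b):
    """Number of substituted positions between two equal-length strings."""
    if len(a) != len(b):
        raise ValueError("Hamming distance needs equal-length strings")
    return sum(x != y for x, y in zip(a, b))

def levenshtein(a, b):
    """Minimum number of substitutions, insertions and deletions turning a into b."""
    previous = list(range(len(b) + 1))
    for i, x in enumerate(a, start=1):
        current = [i]
        for j, y in enumerate(b, start=1):
            current.append(min(
                previous[j] + 1,             # delete x from a
                current[j - 1] + 1,          # insert y into a
                previous[j - 1] + (x != y),  # substitute, or match at no cost
            ))
        previous = current
    return previous[-1]

primer = "GGTCTA"                                 # made-up primer
one_substitution = hamming(primer, "GGACTA")      # T replaced by A
one_deletion = levenshtein(primer, "GGTTA")       # C missing from the read
```

With `-e 2` under a substitution-only scheme, the read with the missing base would not match at a fixed position, whereas under an edit-distance scheme it would count as a single error.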


  ngsfilter: pattern matching
Posted by: cbird - 07-22-2016, 10:46 AM - Forum: Using OBITools - Replies (4)

What is the general structure of the pattern match that ngsfilter searches for?

If this is the format of my sequences: $tag1$fprimer.*$rprimer$tag2, where
$tag1 is the first tag in the sample description file,
$fprimer is the forward primer, and
.* is POSIX regex, meaning zero or more bases of any identity,
then it seems to work fine.

What if, for some reason, there are extra bases in front of $tag1? Will ngsfilter still recognize a match, or does it require that the tag be at the beginning of the line?

Will ngsfilter also detect the following patterns?

$fprimer$tag1.*$rprimer
$tag1$fprimer.*$rprimer

I have some data that is formatted like this, and need to demultiplex it.
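
The question about extra leading bases comes down to whether the pattern is anchored at the start of the read. The sketch below illustrates that distinction with Python's `re` module on made-up tag and primer sequences; it describes the regex from the post, not how ngsfilter is implemented.

```python
import re

def build_pattern(tag1, fprimer, rprimer, tag2):
    """Regex for tag1 + forward primer + any bases + reverse primer + tag2."""
    return re.compile(tag1 + fprimer + ".*" + rprimer + tag2)

# Hypothetical sequences, chosen only for the illustration.
pattern = build_pattern("ACAC", "GGTCTA", "TTGCAA", "GTGT")

read = "ACACGGTCTA" + "AAAA" + "TTGCAAGTGT"
prefixed = "NN" + read                          # two extra bases before tag1
empty_insert = "ACACGGTCTA" + "TTGCAAGTGT"      # nothing between the primers

anchored_hit = pattern.match(read)              # match() only tries position 0
anchored_miss = pattern.match(prefixed)         # None: the tag is not at the start
floating_hit = pattern.search(prefixed)         # search() scans the whole read
zero_length_ok = pattern.match(empty_insert)    # '.*' also accepts zero bases
```

An anchored match rejects the read with extra leading bases, while an unanchored search finds the tag at offset 2.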


  ngsfilter: how to represent degenerate primers in sample description file?
Posted by: cbird - 07-21-2016, 04:18 PM - Forum: Using OBITools - Replies (2)

When using ngsfilter, I want it to be able to search for degenerate primers.

I was wondering if I can use regex in the sample description file that is referenced by ngsfilter?

If yes, what flavor of regex can I use?  POSIX?  PERL?

Otherwise, does ngsfilter understand the IUPAC ambiguity code?  

The universal COI primers developed by Leray et al. for metabarcoding are highly degenerate. Consequently, there has to be a way to allow for multiple matches at certain positions.

Let's say my primer is

GGYCTW

I'm thinking about putting the following into my sample description file
GG[CT]CT[AT]


I know that I could set the -e argument to specify the number of mismatches, but that's not an elegant solution with 10 in 26 positions being degenerate.
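
For what it's worth, the manual expansion shown above (GGYCTW to GG[CT]CT[AT]) can be generated from the standard IUPAC nucleotide codes. The sketch below only builds the regex string; whether the sample description file accepts such a pattern is exactly the open question, and the function name is made up.

```python
import re

# Standard IUPAC nucleotide ambiguity codes.
IUPAC = {
    "A": "A", "C": "C", "G": "G", "T": "T",
    "R": "AG", "Y": "CT", "S": "CG", "W": "AT", "K": "GT", "M": "AC",
    "B": "CGT", "D": "AGT", "H": "ACT", "V": "ACG", "N": "ACGT",
}

def iupac_to_regex(primer):
    """Expand each ambiguity code into a regex character class."""
    parts = []
    for base in primer.upper():
        bases = IUPAC[base]  # raises KeyError on a non-IUPAC character
        parts.append(bases if len(bases) == 1 else "[%s]" % bases)
    return "".join(parts)

expanded = iupac_to_regex("GGYCTW")
matches_variant = re.fullmatch(expanded, "GGTCTA") is not None
```

Here `expanded` is the same string as the hand-written pattern in the post, and it matches each of the four concrete sequences the degenerate primer stands for.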
