Natural Science Forum / Biology / Microbiology / June 2007




PHYLIP and DNADIST

Chris Hoffmann - 27 Jun 2007 16:44 GMT
Hi everybody,
I was wondering about DNADIST, from the PHYLIP package.
I am conducting a big sequencing project and there will be several phases. I
would like to construct a distance matrix using DNADIST with an initial
dataset and later on only add more sequences to the set, but I don't want
to have to re-run the program with all the sequences again. Is there a way
to insert only the new data into the matrix?
For example:
initially I want to calculate the distances among the sequences in
group A;
then when I get group B, calculate the distances among the sequences
within group B;
and calculate the distances between sequences in group A and group B without
having to re-calculate the distances for group A again.
This is a simple example; I am actually likely to have 5 or more sets of
sequences, ranging from 5000 to 20000 sequences per group (perhaps more).
I realize I may have to adapt the code (another issue entirely) but what I
am concerned about is whether the methods used by DNADIST give reliable
results if I calculate them in this fashion.
I wanted to use the F84 model, the default, but I am open to suggestions.
Any help on this would be great.
Thanks
Chris
Joe Felsenstein - 30 Jun 2007 03:01 GMT
>I was wondering about DNADIST, from the PHYLIP package.
>I am conducting a big sequencing project and there will be several phases. I
[quoted text clipped - 14 lines]
>am concerned is if the methods used by DNADIST give reliable results if I
>calculate them in this fashion.

1. Dnadist will not add the new distances without recomputing the old ones
  in this way.
2. In any case, for the F84 distances the formulas use the base frequencies
  found (empirically) in the input sequences.  If you add more input
  sequences you then most likely have slightly altered empirical frequencies
  so you want to recompute the original ones anyway.
3. I suspect our formulas can compute this many distances, but
4. With 20,000 sequences there are 400,000,000 distances in all (20,000 x
  20,000 entries in the full square table) which, if each is about 10 bytes
  long, is a table 4 GB in size.  That is too big to use.  You ought to
  therefore reconsider your motivation for doing this.
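
As a rough illustration of points 2 and 4, here is a small Python sketch. It is not PHYLIP code: the toy sequences and the helper names base_frequencies and square_matrix_bytes are made up for the example, and it assumes the empirical frequencies are simply pooled over all input sequences. It shows that adding a second group shifts the pooled base frequencies (so distances computed earlier would no longer use the same frequencies), and it reproduces the table-size arithmetic.

```python
from collections import Counter


def base_frequencies(seqs):
    """Pooled empirical frequencies of A, C, G, T over all sequences.

    Assumption for this sketch: frequencies are pooled across the whole
    input set; ambiguity codes and gaps are simply ignored.
    """
    counts = Counter()
    for s in seqs:
        counts.update(b for b in s.upper() if b in "ACGT")
    total = sum(counts.values())
    return {b: counts[b] / total for b in "ACGT"}


def square_matrix_bytes(n_seqs, bytes_per_entry=10):
    """Size in bytes of a full n x n distance table."""
    return n_seqs * n_seqs * bytes_per_entry


# Hypothetical toy data: group A alone, then A plus a later group B.
group_a = ["ACGTACGT", "ACGTACGA"]
group_b = ["GGGGCCCC", "GGGCCCCC"]

freq_a = base_frequencies(group_a)
freq_ab = base_frequencies(group_a + group_b)

print("A only:", freq_a)
print("A + B :", freq_ab)
print("table size for 20,000 sequences:", square_matrix_bytes(20_000), "bytes")
```

With real data the shift in frequencies would usually be much smaller than in this toy case, but any shift means the A-versus-A distances computed in the first phase and the A-versus-B distances computed later would rest on slightly different frequencies.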

I have posted this rather than emailing to the original poster because
it might be educational for others using our programs.

----
Joe Felsenstein         joe@removethispart.gs.washington.edu
Department of Genome Sciences and Department of Biology,
University of Washington, Box 355065, Seattle, WA 98195-5065 USA
 