Week 8 blog reflection
This week we talked a lot about ASVs (amplicon sequence variants) and DADA2. DADA2 is a tool I use within QIIME 2 to turn raw reads into ASVs, the step that comes before assigning taxonomy to DNA samples. I would like to learn more about how DADA2 works. As it is, my understanding is that it uses similarity scoring, error thresholds, the predominance of exact sequences, and quality scores, but I don’t feel like I really know how it does any of that. This is partly the result of what is both the advantage and the disadvantage of QIIME 2: the DADA2 plugin has very few options. In fact, it’s run more or less through a single command, with only a few parameters compared to the R version.

As the machinery for reading DNA becomes more accurate and allows for longer reads, the software that interprets that data, like DADA2, is also improving. Future methods of DNA clustering may also improve as machine learning methods improve. Perhaps researchers will find a way to unify data more. For instance, the presentation covered clustering reads either ‘de novo’, using just the read data from a lane, or with the help of a reference database. As machine learning programs get better at learning intelligently from diverse sources, perhaps this could be extended: algorithms could learn from dozens of different samples from the same machine in the lab, or maybe from thousands of samples in a database of runs from that machine. These additional samples might be given lesser weight, but could add cumulatively to precision by letting the algorithm learn at multiple scales.

Regardless, one has the feeling that we are seeing only the tiniest tip of the mammoth iceberg that is the capabilities of DNA sequencing and, by extension, eDNA analysis.
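
Coming back to the quality scores and the predominance of exact sequences mentioned at the start, here is a minimal Python sketch of how those two ideas could fit together. It is a toy for building intuition, not DADA2's actual implementation. The only standard piece is the Phred formula (error probability = 10^(-Q/10)). Everything else is an assumption made for illustration: the function names are hypothetical, each base is assumed to err independently, and an error is assumed equally likely to produce any of the three other bases. The idea is to ask how many copies of a rare sequence we would expect to see purely from sequencing errors in an abundant one, and whether the observed count is too high for that explanation.

```python
import math


def phred_to_error_prob(q):
    """Convert a Phred quality score to the probability that the base call is wrong."""
    return 10 ** (-q / 10)


def expected_error_reads(abundant_count, qualities, mismatch_positions):
    """Expected number of reads of one specific error variant.

    Toy assumption (not DADA2's learned error model): every base errs
    independently with its Phred probability, and an error is equally
    likely to yield any of the 3 other bases.
    """
    prob = 1.0
    for i, q in enumerate(qualities):
        p_err = phred_to_error_prob(q)
        if i in mismatch_positions:
            prob *= p_err / 3      # this base must be misread as one specific base
        else:
            prob *= 1 - p_err      # this base must be read correctly
    return abundant_count * prob


def poisson_tail(observed, expected):
    """P(X >= observed) for X ~ Poisson(expected)."""
    cdf = sum(
        math.exp(-expected) * expected ** k / math.factorial(k)
        for k in range(observed)
    )
    return max(0.0, 1.0 - cdf)     # clamp tiny negative float error


if __name__ == "__main__":
    # Hypothetical data: 10,000 reads of an abundant sequence, 100 bases at Q30,
    # and a rare sequence that differs from it at a single position.
    expected = expected_error_reads(10_000, [30] * 100, {50})
    for observed in (2, 40):
        p = poisson_tail(observed, expected)
        print(f"observed={observed}, expected from errors={expected:.2f}, p={p:.3g}")
```

With these made-up numbers, errors alone would be expected to produce about three copies of the rare sequence. Seeing two copies is unremarkable, so it would be treated as noise, while seeing forty is very unlikely under the error model, which suggests a real variant. The comparison between an observed abundance and the abundance that errors could explain is the intuition being sketched here. How DADA2 actually estimates its error rates and thresholds is something I would need to look up.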