#clustering


**OptimOTU: Taxonomically aware OTU clustering with optimized thresholds and a bioinformatics workflow for metabarcoding data**

arxiv.org/abs/2502.10350

arXiv.org: To turn environmentally derived metabarcoding data into community matrices for ecological analysis, sequences must first be clustered into operational taxonomic units (OTUs). This task is particularly complex for data including large numbers of taxa with incomplete reference libraries. OptimOTU offers a taxonomically aware approach to OTU clustering. It uses a set of taxonomically identified reference sequences to choose optimal genetic distance thresholds for grouping each ancestor taxon into clusters which most closely match its descendant taxa. Then, query sequences are clustered according to preliminary taxonomic identifications and the optimized thresholds for their ancestor taxon. The process follows the taxonomic hierarchy, resulting in a full taxonomic classification of all the query sequences into named taxonomic groups as well as placeholder "pseudotaxa" which accommodate the sequences that could not be classified to a named taxon at the corresponding rank. The OptimOTU clustering algorithm is implemented as an R package, with computationally intensive steps implemented in C++ for speed, and incorporating open-source libraries for pairwise sequence alignment. Distances may also be calculated externally, and may be read from a UNIX pipe, allowing clustering of large datasets where the full distance matrix would be inconveniently large to store in memory. The OptimOTU bioinformatics pipeline includes a full workflow for paired-end Illumina sequencing data that incorporates quality filtering, denoising, artifact removal, taxonomic classification, and OTU clustering with OptimOTU. The OptimOTU pipeline is developed for use on high performance computing clusters, and scales to datasets with millions of reads per sample, and tens of thousands of samples.
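The threshold-selection idea in the abstract can be illustrated with a toy Python sketch. This is not the OptimOTU implementation (which is an R package with C++ internals): the single-linkage clustering, the pairwise F1 agreement score, the distance matrix, and the candidate thresholds below are all assumptions made up for illustration. The sketch clusters a few labelled reference sequences at each candidate threshold and keeps the threshold whose clusters agree best with the known taxa.

```python
from itertools import combinations

def single_linkage(n, dist, threshold):
    """Cluster items 0..n-1 by linking every pair with distance <= threshold."""
    parent = list(range(n))

    def find(i):
        # Union-find root lookup with path halving.
        while parent[i] != i:
            parent[i] = parent[parent[i]]
            i = parent[i]
        return i

    for i, j in combinations(range(n), 2):
        if dist[i][j] <= threshold:
            parent[find(i)] = find(j)
    return [find(i) for i in range(n)]

def pair_f1(clusters, labels):
    """Pairwise F1: how well 'same cluster' agrees with 'same taxon'."""
    tp = fp = fn = 0
    for i, j in combinations(range(len(labels)), 2):
        same_cluster = clusters[i] == clusters[j]
        same_taxon = labels[i] == labels[j]
        if same_cluster and same_taxon:
            tp += 1
        elif same_cluster:
            fp += 1
        elif same_taxon:
            fn += 1
    return 2 * tp / (2 * tp + fp + fn) if tp else 0.0

def best_threshold(dist, labels, candidates):
    """Return the candidate threshold whose clusters best match the labels."""
    return max(candidates,
               key=lambda t: pair_f1(single_linkage(len(labels), dist, t), labels))

# Hypothetical reference set: two sequences of taxon A, three of taxon B.
labels = ["A", "A", "B", "B", "B"]
dist = [
    [0.00, 0.02, 0.10, 0.12, 0.11],
    [0.02, 0.00, 0.11, 0.10, 0.12],
    [0.10, 0.11, 0.00, 0.03, 0.06],
    [0.12, 0.10, 0.03, 0.00, 0.06],
    [0.11, 0.12, 0.06, 0.06, 0.00],
]
chosen = best_threshold(dist, labels, [0.01, 0.03, 0.07, 0.15])
print(chosen)  # prints 0.07
```

In this toy data a threshold of 0.03 splits taxon B, while 0.15 merges both taxa into one cluster, so 0.07 scores best.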

How do you work with your audio recording files? When the files are huge, I like a bit of help: I usually analyze them with a transient, beat, or onset detection algorithm so I can make more precise cuts, then use a clustering algorithm to remove the audio segments that are too similar to each other and organize the rest by similarity. I made a GUI version of that tool to share it.
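A rough Python sketch of that kind of workflow, using only the standard library. The post does not say which algorithms the tool uses, so everything here is an assumption: an energy-jump onset detector, two crude features per segment (RMS level and zero-crossing rate), and a greedy "keep it only if it is not too close to something already kept" pass standing in for the clustering step. The test signal is synthetic.

```python
import math

SR = 8000      # sample rate in Hz (assumed)
FRAME = 400    # 50 ms analysis frames (assumed)

def burst(freq, n):
    """Synthetic sine burst used as stand-in audio."""
    return [0.5 * math.sin(2 * math.pi * freq * i / SR) for i in range(n)]

def frame_energies(samples, frame):
    """Mean squared amplitude of each non-overlapping frame."""
    return [sum(x * x for x in samples[i:i + frame]) / frame
            for i in range(0, len(samples) - frame + 1, frame)]

def detect_onsets(energies, floor=1e-4, ratio=4.0):
    """Frame indices where energy jumps sharply over the previous frame."""
    return [k for k in range(1, len(energies))
            if energies[k] > ratio * (energies[k - 1] + floor)]

def features(seg):
    """Two crude descriptors: RMS level and zero-crossing rate."""
    rms = math.sqrt(sum(x * x for x in seg) / len(seg))
    zcr = sum((a < 0) != (b < 0) for a, b in zip(seg, seg[1:])) / len(seg)
    return (rms, zcr)

def dedupe(feats, min_dist):
    """Keep a segment only if it is at least min_dist from every kept one."""
    kept = []
    for idx, f in enumerate(feats):
        if all(math.dist(f, feats[k]) >= min_dist for k in kept):
            kept.append(idx)
    return kept

silence = [0.0] * 800
signal = (silence + burst(440, 1600) + silence + burst(440, 1600)
          + silence + burst(1760, 1600) + silence)

onsets = detect_onsets(frame_energies(signal, FRAME))
cuts = [k * FRAME for k in onsets] + [len(signal)]
segments = [signal[a:b] for a, b in zip(cuts, cuts[1:])]
kept = dedupe([features(s) for s in segments], min_dist=0.05)
print(onsets, kept)  # prints [2, 8, 14] [0, 2]
```

The second 440 Hz burst is dropped as a near-duplicate of the first, while the 1760 Hz burst survives. Real recordings would need richer features (spectral descriptors, for example) and a proper onset detector from an audio library.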

When you are reading up on deploying #databases, the most frequent piece of drive-by advice is "don't use networked storage". Before you can ask the smart-ass what they suggest instead in an age of #virtualization, #clustering, and #kubernetes, they have already disappeared into the ether. Not an easy nut to crack, especially in a #homelab. This guy has an actual workable answer: medium.com/@camphul/cloudnativ using #longhorn and #cloudnativepg and some smart scheduling. #k8s #selfhosting

Medium: CloudNative-PG in the homelab with Longhorn, by Luca Camphuisen

Two great sources to explore the use of pan and zoom techniques in data visualization:

1. Shneiderman's "information-seeking mantra" emphasizes the importance of overview, zoom, and filter in exploring data clusters.
infovis-wiki.net/wiki/Visual_I
2. "Zoomland" (de Gruyter, 2023), edited by Armaselu and Fickers, offers insights on zooming in data visualization.
degruyter.com/document/doi/10.

infovis-wiki.net: Visual Information-Seeking Mantra - InfoVis:Wiki

The paper "Interpretable Clusters for Representing Citizens’ Sense of Belonging through Interaction with Cultural Heritage" has been published in the ACM Journal of Computing and Cultural Heritage.

📝 Title: Interpretable Clusters for Representing Citizens’ Sense of Belonging through Interaction with Cultural Heritage

🌐 Index Terms: technology for #culturalheritage; #clustering; #affectivecomputing; social cohesion; #museum interaction
Full paper: doi.org/10.1145/3665142
@academicchatter


I’ve also gone deep into #clustering algorithms. I’m coming to the conclusion that K-Means makes assumptions that don’t work well for me, and probably don’t hold in most cases. Some big ones:

- clusters are the same size
- the number of clusters is known

I’m clustering posts by embedding (text content/meaning). Most of the time I don’t know how many posts there are, and my feed is too dynamic for these assumptions to hold.

I’m learning about other algorithms, like DBSCAN.
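For anyone following along, here is a minimal pure-Python sketch of DBSCAN, which needs neither a cluster count nor equal-sized clusters. The 2-D points, `eps`, and `min_samples` values are made up for illustration; real post embeddings are high-dimensional and would probably call for cosine distance and tuned parameters.

```python
import math

NOISE = -1

def dbscan(points, eps, min_samples):
    """Label each point with a cluster id (0, 1, ...) or NOISE (-1)."""
    labels = [None] * len(points)

    def neighbors(i):
        # All points within eps of point i (including i itself).
        return [j for j, q in enumerate(points)
                if math.dist(points[i], q) <= eps]

    cluster = -1
    for i in range(len(points)):
        if labels[i] is not None:
            continue
        seeds = neighbors(i)
        if len(seeds) < min_samples:
            labels[i] = NOISE          # may be relabelled as a border point later
            continue
        cluster += 1                   # i is a core point: start a new cluster
        labels[i] = cluster
        queue = [j for j in seeds if j != i]
        while queue:
            j = queue.pop()
            if labels[j] == NOISE:
                labels[j] = cluster    # border point: joins but does not expand
            if labels[j] is not None:
                continue
            labels[j] = cluster
            nbrs = neighbors(j)
            if len(nbrs) >= min_samples:
                queue.extend(nbrs)     # j is also a core point: keep growing
    return labels

# Toy "embeddings": a group of 4, a group of 3, and one outlier.
points = [(0, 0), (0.5, 0), (0, 0.5), (0.5, 0.5),
          (5, 5), (5.5, 5), (5, 5.5),
          (10, 0)]
labels = dbscan(points, eps=1.0, min_samples=3)
print(labels)  # prints [0, 0, 0, 0, 1, 1, 1, -1]
```

The two groups have different sizes, the number of clusters was never specified, and the stray point is marked as noise instead of being forced into a cluster. scikit-learn ships a production version as `sklearn.cluster.DBSCAN`.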