@Natris1979 @brainwane Full-text indexing of all content is certainly needed down the line. But full-text indexing "only" READMEs is a very interesting idea that I don't think we have explored in the past. Maybe it's something that could just fit into our current indexing infrastructure for metadata. Thanks for raising it!

@brainwane @Natris1979 on metadata search, for sure. Right now we index "package metadata" from a relatively small subset of package formats out there. We're working to extend coverage to other formats, as well as to crawl metadata from forges (e.g., github descriptions, tags, etc.). Search for "metadata" here: docs.softwareheritage.org/deve Code contributions are welcome on that front (and the tech entry barrier should be fairly low).

@Natris1979 @brainwane just on this one, the archive size is 1 PiB, not 11 TiB (which is just the archive "structure" in terms of commits and source code trees).

See mastodon.xyz/web/@zacchiro/109 for a more detailed answer on full text search plans.

@brainwane @Natris1979 @SWHeritage thanks for the highlight!

Full-text search is part of our technical roadmap, but the resources to deploy that at our scale are very significant and we don't have them right now.

The archive size is, in fact, 1 PiB (the 11 TiB mentioned above are just for the graph structure of the archive, not the actual source code files) and a decent fulltext index will be roughly the same size.

People and/or companies interested in helping us out with this are welcome!

#GNU #Guix is celebrating its 10th anniversary from 16 to 18 September 2022 with an impressive program: 10years.guix.gnu.org

Don't know GNU Guix yet? No more excuses 👉 guix.gnu.org/fr/

Second keynote at + , by @vlfilkov on the sustainability of FOSS projects (and communities).

Now that the deadline has passed, if you are a junior academic in software engineering (or you know one), here's a reminder that we're hiring a tenured associate professor in the field. Info in this previous toot: mastodon.xyz/@zacchiro/1086913 Deadline is only ~19 days away.

Inquiries welcome!

If you use stocat for randomly sampling large files without having to materialize permutations (as with shuf), I've just added a new flag -s/--seed so that you can make your sampling repeatable, e.g., in replication packages: gitlab.com/zacchiro/stocat
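
For anyone wondering why a seed makes sampling repeatable, here is a minimal Python sketch of the general idea. It is not stocat's actual implementation: the function name `sample_lines` and its `probability` and `seed` parameters are hypothetical, and it assumes a streaming scheme where each line is kept independently with a fixed probability, so nothing needs to be shuffled or held in memory.

```python
import random
from typing import Iterable, Iterator, Optional


def sample_lines(lines: Iterable[str], probability: float,
                 seed: Optional[int] = None) -> Iterator[str]:
    """Yield each input line independently with the given probability.

    Hypothetical helper for illustration only, not stocat's code.
    """
    if not 0.0 <= probability <= 1.0:
        raise ValueError("probability must be between 0 and 1")
    # A private generator: the same seed always produces the same
    # sequence of draws, and the global random state is left untouched.
    rng = random.Random(seed)
    for line in lines:
        if rng.random() < probability:
            yield line


if __name__ == "__main__":
    data = [f"line {i}" for i in range(1000)]
    first = list(sample_lines(data, 0.1, seed=42))
    second = list(sample_lines(data, 0.1, seed=42))
    # Same input, same probability, same seed: identical samples.
    print(first == second)
```

With the seed left as `None`, each run draws a different sample; recording the seed in a replication package is what lets others reproduce the exact same subset.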

I finished a 5 part series on how to run an e-mail server in 2022 with all the DKIM/SPF/DMARC stuff working. It's not a simple HOWTO series, but it does explain all the moving parts. So if you're interested to learn a bit about modern e-mail - you're welcome! Starts here: jan.wildeboer.net/2022/08/Emai

@rdg @civodul FWIW there are scientific journals that work exactly that way, like PeerJ

It's called the Metaverse because it has mostly been Met with Averse reactions

My team at @telecomparis is hiring a tenured associate professor of software engineering for Safe and Secure Systems.

Keywords: , , mining sw. repositories, empirical sw.eng., .

Details: institutminestelecom.recruitee

Deadline: 25 September 2022.

YES! I'm SO EXCITED to say that the Spritely Institute (of which I'm CTO) just got a substantial amount of multi-year funding from the Filecoin Foundation for the Decentralized Web! spritely.institute/news/ffdw-s

We also just put up this lovely blog post giving a tour of the Spritely Institute's technology! spritely.institute/news/blast-

This is really huge! I'll expand more on what it means below in this thread!

@zacchiro @LiberalArtist @TheRegisterBot @fsfe We have it on our radar but are still discussing the best alternative. Please feel free to participate: github.com/fsfe/reuse-docs/iss

@LiberalArtist @TheRegisterBot @fsfe I wasn't aware of this recommendation and I think it should be rectified.

@emacsen the beancount mailing list is amazing (although on Google Groups...). The IRC channel not so active.
