See https://email@example.com/T/#u for more context.
@Natris1979 @brainwane Full-text indexing of all content is certainly needed down the line. But full-text indexing "only" READMEs is a very interesting idea that I don't think we have explored in the past. Maybe it's something that could just fit into our current indexing infrastructure for metadata. Thanks for raising it!
@brainwane @Natris1979 on metadata search, for sure. Right now we index "package metadata" from a relatively small subset of the package formats out there. We're working to extend coverage to other formats, as well as to crawl metadata from forges (e.g., GitHub descriptions, tags, etc.). Search for "metadata" here: https://docs.softwareheritage.org/devel/roadmap/roadmap-2022.html Code contributions are welcome on that front (and the tech entry barrier should be fairly low).
See https://mastodon.xyz/web/@zacchiro/109038129028072927 for a more detailed answer on full text search plans.
Full-text search is part of our technical roadmap, but the resources to deploy that at our scale are very significant and we don't have them right now.
The archive size is, in fact, 1 PiB (the 11 TiB mentioned above are just for the graph structure of the archive, not the actual source code files) and a decent full-text index will be roughly the same size.
People and/or companies interested in helping us out with this are welcome!
Now that the #ICSE2023 deadline has passed, if you are a junior academic in software engineering (or you know one), here's a reminder that we're hiring a tenured associate professor in the field. Info in this previous toot: https://mastodon.xyz/@zacchiro/108691392837627884 Deadline is only ~19 days away.
I finished a 5-part series on how to run an e-mail server in 2022 with all the DKIM/SPF/DMARC stuff working. It's not a simple HOWTO series, but it does explain all the moving parts. So if you're interested in learning a bit about modern e-mail - you're welcome! Starts here: https://jan.wildeboer.net/2022/08/Email-0-The-Journey-2022/
« The Toxic Culture of Rejection in Computer Science » – ACM SIGBED: https://sigbed.org/2022/08/22/the-toxic-culture-of-rejection-in-computer-science/
My team at @telecomparis is hiring a tenured associate professor of #SoftwareEngineering for Safe and Secure Systems.
Deadline: 25 September 2022.
YES! I'm SO EXCITED to say that the Spritely Institute (of which I'm CTO) just got a substantial amount of multi-year funding from the Filecoin Foundation for the Decentralized Web! https://spritely.institute/news/ffdw-support-announcement.html
We also just put out this lovely blog post giving a tour of the Spritely Institute's technology! https://spritely.institute/news/blast-off-spritely-institutes-tech-tour.html
This is really huge! I'll expand more on what it means below in this thread!
@emacsen the beancount mailing list is amazing (although on Google Groups...). The IRC channel is not so active.
Computer Science full professor at Télécom Paris, Polytechnic Institute of Paris. Co-founder & CTO Software Heritage. Previously: Debian, Open Source Initiative board.