For the gory details: we used #machinelearning, specifically convolutional neural networks (CNNs) pretrained on generic image recognition tasks and then adapted via #transferlearning to the problem of identifying the programming language of source code from its image. [2/n]
@zacchiro
Do you know of similar software for natural languages?
We are building a database of translations. https://blog.translatescience.org/launch-of-translate-science/
We would like adding new data to the database to be as easy as possible. Ideally, people could just provide the DOIs/URLs of the original and the translation, and the system would determine (or estimate) the languages. Alternatively, the system could pre-fill a web form with a reasonably accurate guess that people would usually only have to confirm.
We trained on 300k real-world code snippets from popular #GitHub repositories extracted from #SoftwareHeritage @swheritage, achieving 92% precision and recall. Even more gory details can be found in the #openaccess #replication package. Feedback welcome, enjoy! [3/3]
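For readers wondering what "precision and recall" means for a many-class classifier like this one, here is a minimal sketch of macro-averaged precision and recall. The thread does not say which averaging was used; macro averaging (the unweighted mean of per-language scores) is an assumption here, and `macro_precision_recall` is a hypothetical helper, not code from the replication package. For single-label classification, micro-averaged precision and recall would both equal plain accuracy.

```python
from collections import Counter


def macro_precision_recall(y_true, y_pred):
    """Hypothetical helper: macro-averaged precision and recall.

    Assumes single-label multi-class predictions (one language per snippet).
    Each label seen in y_true or y_pred gets its own precision and recall,
    and the per-label scores are averaged with equal weight.
    """
    if len(y_true) != len(y_pred):
        raise ValueError("y_true and y_pred must have the same length")
    labels = set(y_true) | set(y_pred)
    if not labels:
        raise ValueError("no labels to evaluate")

    true_count = Counter(y_true)   # how often each language actually occurs
    pred_count = Counter(y_pred)   # how often each language was predicted
    true_pos = Counter(t for t, p in zip(y_true, y_pred) if t == p)

    # A label never predicted (or never present) contributes 0.0 to that average.
    precisions = [true_pos[l] / pred_count[l] if pred_count[l] else 0.0 for l in labels]
    recalls = [true_pos[l] / true_count[l] if true_count[l] else 0.0 for l in labels]
    return sum(precisions) / len(labels), sum(recalls) / len(labels)


if __name__ == "__main__":
    # Toy labels, not data from the study.
    gold = ["python", "python", "java", "c"]
    guess = ["python", "java", "java", "c"]
    p, r = macro_precision_recall(gold, guess)
    print(f"precision={p:.3f} recall={r:.3f}")
```

Macro averaging weights rare languages as much as common ones, so a classifier that only does well on popular languages scores lower than under micro averaging.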