We really need a FOSS search engine with (and this is important) its own in-house FOSS crawler
@sir Where do you store the crawler's data?
@sir a search engine that searches only (independent) blogs would be great too
One major problem with using the global YaCy network is that you have to pick a cut-off for how long to wait for global results and drop the slower servers, because some take minutes to respond. That's just too slow. Also, a patch is needed to sort results; the default is first come, first shown.
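To make the cut-off idea concrete, here is a rough sketch in Python. It is not YaCy's code or API: the peers are simulated with `asyncio.sleep`, and `Hit`, `query_peer`, `search`, `PEERS` and the `score` field are all made-up names for illustration. It waits a fixed time, drops whoever has not answered, and sorts what arrived by score instead of showing it in arrival order.

```python
import asyncio
from typing import NamedTuple


class Hit(NamedTuple):
    url: str
    score: float  # hypothetical relevance score; higher is better


async def query_peer(delay: float, hits: list[Hit]) -> list[Hit]:
    """Stand-in for a remote peer (assumption): answers after `delay` seconds."""
    await asyncio.sleep(delay)
    return hits


async def search(peers: dict[str, tuple[float, list[Hit]]], cutoff: float) -> list[Hit]:
    """Wait at most `cutoff` seconds for peers, then sort the merged hits."""
    tasks = [asyncio.create_task(query_peer(d, h)) for d, h in peers.values()]
    if not tasks:
        return []
    done, pending = await asyncio.wait(tasks, timeout=cutoff)
    for task in pending:
        task.cancel()  # drop peers that missed the cut-off
    # Let the cancellations settle so no task is left dangling.
    await asyncio.gather(*pending, return_exceptions=True)
    merged = [hit for task in done for hit in task.result()]
    # Sort by score rather than "first come, first shown".
    return sorted(merged, key=lambda h: h.score, reverse=True)


# Simulated peers: name -> (response delay in seconds, hits it would return).
PEERS = {
    "fast-a": (0.01, [Hit("https://example.org/a", 0.4)]),
    "fast-b": (0.02, [Hit("https://example.org/b", 0.9)]),
    "slow-c": (5.0, [Hit("https://example.org/c", 1.0)]),
}


def run_demo(cutoff: float = 0.5) -> list[Hit]:
    return asyncio.run(search(PEERS, cutoff))


if __name__ == "__main__":
    for hit in run_demo():
        print(f"{hit.score:.1f}  {hit.url}")
```

The trade-off is the one described above: the slow peer may hold the best hit, but with a hard cut-off it never makes it into the results.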
@sir let's hope spider/ask.moe will free us from this search engine prison. I've been using qwant, which seems to make similar promises to ddg, but it's also not FOSS, which is a shame.
@sir The Gigablast search engine published their source code to a git repository a while back, but it definitely needs an overhaul.
@sir I was literally just working on this! My use-case is that I've contributed lots on GitHub and I want to download all of the repos I've worked on... but I can't get a list of them.
Currently fighting with their GraphQL API, but I'd kill for a "give me a list of all repos where a commit is authored by me" search query.
@christianbundy that's not what I meant. I meant a FOSS search engine for searching the web at large
@sir oh! I haven't looked into those in a while, last I saw I think YaCy was state-of-the-art. If you find anything (or build anything) I'd be happy to test.
@sir 1) https://yacy.net/ - an implementation of a P2P (peer-to-peer) search engine
2) https://commoncrawl.org/2020/06/may-june-2020-crawl-archive-now-available/ - they provide a public index and code: https://github.com/commoncrawl
@_1751015 where can I play with a search engine powered by this data?
@cuniculus @sir YaCy has some niche applications that are interesting. Check the writing here and the comments:
Personal index of curated URLs + eventually sharing the index - IMO it has advantages over a general purpose search engine.
@sir Even if you have a FOSS search engine with a FOSS crawler, like what's running on https://yacy.everdot.org/, you'll quickly run into performance issues and economic issues. Going FOSS won't automatically bring in advertisement revenue, and that's what Google/Bing/etc. actually do: they are advertisement agencies, not search engines. That's how they afford thousands of servers. There's free software, but there's no such thing as free hardware.
@sir Agree. But, once again: maybe this is not so much a F(L)OSS issue as an issue of handling a large, potentially decentralized / distributed search index at runtime, keeping things available, stable, and performant 24x7. Maybe this is finally a situation that helps us understand that our current focus on code and code licensing is important but not *all* it takes to have working technology available...? 🙂