/Δ/ - diy/projects

make it. create it. do-it-yourself. hardware, software, and community projects.
Kalyx ######


File: 1504479006217.jpg (30.53 KB, 800x600, SerialExperimentLainTachib….jpg)

 No.528

Hello there,
After a few years of browsing the web, I've noticed that I often revisit the same websites every 2 months or so. For example, I sometimes forget how to initialise an array in Python because I haven't touched the language in 3 months, so I just Google it and open a related StackOverflow question.

However, this is an inefficient way to browse the web.

I think it is possible to improve this logic by:
- Caching the relevant information on a "personal web" server
- Building a smarter way to order these data
- Building a smarter way to access these data

For now, all I do is type my request into Google, and the website I visited long ago pops up amongst other, less relevant websites. Then, when I click on it, it takes time to download again. Once it's loaded, I have to look for the piece of information I'm after.

The idea of this personal web is to keep only the relevant information by discarding the style (CSS), menu buttons, etc., leaving just a black background with white text and the information clearly displayed on it.
Then, I thought it might be a good idea to implement some kind of service that checks whether the cached information still matches the website and, if the website has been updated, updates the cached copy too.
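
A minimal sketch of the two steps above (stripping a page down to its text, then checking whether it has changed), using only the Python standard library. The names `TextExtractor`, `extract_text`, `fingerprint` and `has_changed`, and the list of tags treated as noise, are assumptions made for illustration, not part of any existing tool.

```python
import hashlib
from html.parser import HTMLParser


class TextExtractor(HTMLParser):
    """Collects visible text, skipping tags that carry no readable content."""

    # Assumption: these tags are "noise" (styling, scripts, menus).
    SKIPPED = {"script", "style", "nav", "header", "footer"}

    def __init__(self):
        super().__init__()
        self.parts = []
        self._skip_depth = 0

    def handle_starttag(self, tag, attrs):
        if tag in self.SKIPPED:
            self._skip_depth += 1

    def handle_endtag(self, tag):
        if tag in self.SKIPPED and self._skip_depth > 0:
            self._skip_depth -= 1

    def handle_data(self, data):
        # Keep text only when we are not inside a skipped tag.
        if self._skip_depth == 0 and data.strip():
            self.parts.append(data.strip())


def extract_text(html):
    """Return the page's text content, one chunk per line."""
    parser = TextExtractor()
    parser.feed(html)
    parser.close()
    return "\n".join(parser.parts)


def fingerprint(text):
    """Hash of the extracted text, stored alongside the cached copy."""
    return hashlib.sha256(text.encode("utf-8")).hexdigest()


def has_changed(cached_fingerprint, fresh_html):
    """True if the freshly downloaded page no longer matches the cache."""
    return fingerprint(extract_text(fresh_html)) != cached_fingerprint
```

Because the hash is taken over the extracted text rather than the raw HTML, a restyled page or a changed menu would not count as an update; only a change in the kept text would.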

So far, to select which relevant information to keep and cache, I've thought of developing a Google Chrome extension that would send the selected information (mostly text, but also images and misc files) to the server, which would then proceed to the smart ordering phase.
Then, to order the received data, I thought about using some sort of machine learning algorithm to learn which data goes into which category. I know about deep neural networks and how to set them up, so I could do that.
Finally, we've got to be able to access this data. To do so, we should minimise the amount of mouse movement, as it would slow down typing the request (so no point'n'click system). Instead, I thought about implementing SPARQL, which is IMO a rather smart way to query data and would fit this project's requirements quite well.

If you know of any better technology to do this, or if you have suggestions about the smart ordering part using deep learning, I'd really appreciate it.

 No.531


 No.538

counterpoint:

squid cache. did this on dialup in another lifetime. It worked very well to reduce traffic, and sped up browsing immensely.

if you're not into having the 12GBP per year cost and having the data on someone else's boxen, you might want to look into this.
http://www.squid-cache.org/

 No.553

File: 1505069532772.png (1.35 MB, 3480x2280, screenshot.png)

Ok, so I've been working hard on this during the past week and the idea has evolved.
So I still think it'd be good to develop some cache system in the future but that's not a priority.
The main idea is to allow fast access to ordered information. To do that, I've developed a website that allows anyone to search other people's entries on a particular topic and see only the information these people found relevant. Each entry also includes the source URL the information comes from. For now the website is accessible at https://wired.sa.muel.coffee/ .
Also, I'm currently working on a Google Chrome extension that allows users to:
- Add new entries simply by selecting text, images and links from pages and adding them to the new entry, then manually edit the entry if they want to and send it to the entry database. (not implemented yet, WIP)
- Highlight information on the current page that matches entries which have this same webpage as a source (or Origin). (implemented, see the screenshot)
- Search the entries database directly from the extension's popup. (implemented, see the screenshot)

I've changed my mind about the way to access information: I think the good old search bar is the best, but the problem is that there is too much information for the same search terms, because Google and the web in general are full of commercial ads. So a search engine that only references useful knowledge (useful according to the authors of the entries) could make this more efficient.

However, I'm still looking for a way to filter the content and prevent people from posting ads. For now I'm thinking of using a machine learning algorithm to detect ad posts and also checking whether the content matches, even partially, the source (so that they'd have to create a real webpage to post ad content). So, if any of you have even the slightest idea of how to filter out spam posts based solely on their content, feel free to reply.
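
The "does the content match the source, even partially" check could be sketched roughly as below. This is only an illustration with made-up names (`tokens`, `source_overlap`, `looks_unrelated`) and an arbitrary threshold of 0.5. It is a plain word-overlap heuristic, not a machine learning model, and it assumes the source page text has already been downloaded.

```python
import re


def tokens(text):
    """Lowercase word tokens; a deliberately crude tokenizer."""
    return re.findall(r"[a-z0-9]+", text.lower())


def source_overlap(entry_text, source_text):
    """Fraction of the entry's distinct words that also occur in the source page."""
    entry_words = set(tokens(entry_text))
    if not entry_words:
        return 0.0
    source_words = set(tokens(source_text))
    return len(entry_words & source_words) / len(entry_words)


def looks_unrelated(entry_text, source_text, threshold=0.5):
    """Flag an entry whose wording barely overlaps with its claimed source.

    The threshold is an assumption and would need tuning on real entries.
    """
    return source_overlap(entry_text, source_text) < threshold
```

An entry copied or lightly edited from its source scores close to 1.0, while ad text pasted under an unrelated URL scores close to 0.0. Someone hosting their own ad page would still pass, which is the limitation already noted above.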

Finally, I'm still thinking about clustering the articles. I first thought of using Linear Discriminant Analysis, but it needs to know the class (cluster) labels. Then I thought of k-means, but you also need to specify the number of clusters, so for now I'm planning on using DBSCAN clustering (on a friend's advice). Once the entries are numerous enough (i.e. once I've finished the extension and opened a "beta" stage so that people can start sending entries), I'll use the clustering to link each entry to the other entries in the same cluster, so when people browse the website and find an entry on a specific topic, they can easily access related entries and discover associated concepts they didn't know about.
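
To illustrate why DBSCAN fits here (no labels and no preset number of clusters), here is a small pure-Python sketch of the algorithm. Representing each entry as a set of words and comparing entries with Jaccard distance are assumptions made for this example, as are the `eps` and `min_samples` values and the sample entries. A real setup would more likely use a library implementation and better text features.

```python
def jaccard_distance(a, b):
    """1 - |a & b| / |a | b| for two sets of words."""
    union = a | b
    if not union:
        return 0.0
    return 1.0 - len(a & b) / len(union)


def dbscan(points, eps, min_samples, distance):
    """Return one label per point: a cluster id (0, 1, ...) or -1 for noise."""
    UNVISITED, NOISE = None, -1
    labels = [UNVISITED] * len(points)

    def neighbours(i):
        # Indices within eps of point i (including i itself).
        return [j for j in range(len(points))
                if distance(points[i], points[j]) <= eps]

    cluster = -1
    for i in range(len(points)):
        if labels[i] is not UNVISITED:
            continue
        seeds = neighbours(i)
        if len(seeds) < min_samples:
            labels[i] = NOISE  # may be relabelled later as a border point
            continue
        cluster += 1
        labels[i] = cluster
        queue = [j for j in seeds if j != i]
        while queue:
            j = queue.pop()
            if labels[j] == NOISE:
                labels[j] = cluster  # border point reached from a core point
            if labels[j] is not UNVISITED:
                continue
            labels[j] = cluster
            j_neighbours = neighbours(j)
            if len(j_neighbours) >= min_samples:
                queue.extend(j_neighbours)  # j is a core point: keep expanding
    return labels


# Hypothetical entries, reduced to bags of words for the example.
entries = [
    "python list append example",
    "python list append method",
    "python list extend method",
    "squid proxy cache config",
    "squid proxy cache setup",
    "lain font download",
]
word_sets = [set(entry.split()) for entry in entries]
labels = dbscan(word_sets, eps=0.5, min_samples=2, distance=jaccard_distance)
```

Entries sharing a label could then be linked to each other as "related", while entries labelled -1 are left unlinked until similar entries show up.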

Anyway, if you want to talk about this project more in depth, check out arisuchan's Discord, I'm usually on it, just link this post and I'll show up.


I named the project "The Wired", because every entry is connected to another.

 No.587

>>553
That's really awesome.
Some way to navigate through entries as a kind of graph of relevance would be useful. At least some way to navigate when you don't know what you are looking for.

 No.596

Very cool, though I would discard the lain font and use something more readable for informational text. Keep it for the main title for style maybe.

Something like a proper monospaced font would still look cool and cyber but be more readable. Or just go with a nice sans serif.

Cheers

 No.597

I just run it through httrack. I have a web archive organized by topics and subtopics, e.g. networking > routing and switching > pagesineed.html

then I can type patterns into mlocate or grep and it will bring them right up
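
For anyone who wants the same kind of lookup from a script instead of grep, a rough stdlib-only sketch follows. The function name `search_archive` is made up, and it assumes the archive is a plain directory tree of text/HTML files like the one described above.

```python
import os
import re


def search_archive(root, pattern):
    """Yield (path, line_number, line) for every line under root matching pattern."""
    regex = re.compile(pattern, re.IGNORECASE)
    for dirpath, _dirnames, filenames in os.walk(root):
        for name in sorted(filenames):
            path = os.path.join(dirpath, name)
            try:
                with open(path, encoding="utf-8", errors="replace") as handle:
                    for number, line in enumerate(handle, start=1):
                        if regex.search(line):
                            yield path, number, line.rstrip("\n")
            except OSError:
                continue  # unreadable file: skip it
```

It behaves roughly like a recursive, case-insensitive grep with line numbers, without any index, so it will be slower than mlocate on a large archive.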

 No.598

>>553
This is a very cool project and I'm all on board and all, but isn't this just a wiki?

A wiki where content is arranged by task rather than subject is a good idea though.

 No.599

>>553
Is there some other way to add the functionality you want than a chrome extension?

What about people who don't use chrome?


