Page Content Capture
remio is designed to capture valuable knowledge from web pages. This guide explains how remio identifies, filters, records, and structures page content, and how users can interact with the captured information.
1. Automatic Content Capture Process
remio automatically identifies and captures web pages that are likely to be valuable.
Most pages you visit are captured, but each one first passes through a filtering process so that only useful information is stored.
The following steps outline the capture and filtering process:
Identification and Filtering: Web pages are first identified and filtered based on specific criteria. Useful content is captured, while irrelevant content is filtered out.
Recording and Structuring: Valuable content is recorded and structured into a format called a resource, which holds the parsed content of the page. Each resource records key attributes: URL, title, body (in markdown format), content type, author (optional), screenshots (optional), last modified time, logged time, origin created time (optional), and related links.
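To make the resource format more concrete, here is a minimal Python sketch of what such a record might look like. The class name, field names, and types are assumptions made for illustration and are not remio's actual schema. Only the list of attributes and which of them are optional come from the description above.

```python
from dataclasses import dataclass, field
from datetime import datetime, timezone
from typing import List, Optional


@dataclass
class Resource:
    """Hypothetical shape of a captured resource (field names are illustrative)."""
    url: str
    title: str
    body: str                                   # page body in markdown format
    content_type: str
    last_modified_time: datetime
    logged_time: datetime
    author: Optional[str] = None                # optional
    screenshots: List[str] = field(default_factory=list)   # optional
    origin_created_time: Optional[datetime] = None          # optional
    related_links: List[str] = field(default_factory=list)


now = datetime.now(timezone.utc)
example = Resource(
    url="https://example.com/post",
    title="Example post",
    body="# Example post\n\nSome text.",
    content_type="article",
    last_modified_time=now,
    logged_time=now,
)
```

Optional attributes default to empty values, so a page with no author or screenshots can still be recorded.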
2. Content Filtering Standards
remio uses a series of filters to decide whether to capture content from a webpage. All of these filter rules run locally, without any third-party service, so you don't need to worry about your browsing history being leaked to external parties.
Filters are applied in the order shown below:
Page Load Status: Pages that do not fully load are filtered out, and pages that load in less than 1 second are not automatically captured.
Pages that return an HTTP status code other than 200, or that do not trigger the "DOMContentLoaded" event, are also excluded.
URL Filtering: Certain URLs are filtered out based on keywords. URLs containing terms like "login", "logout", "account", "setting", "auth", "search", "index", "mail", "app", "dashboard", "edit", "permissions", "roles", "order", "pay", and "submit" are generally excluded. These pages serve basic site functions rather than content.
Specific sites that carry no knowledge content are also filtered out.
Inappropriate Content: Pages containing inappropriate content are filtered out using keyword scanning, AI image/video detection, and domain blacklists. This includes pages with explicit content (adult, gambling, drugs, violence, hate, racism) and pages with potentially pirated or unauthorized content.
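The ordered filters above can be sketched as a small pipeline that stops at the first filter a page fails. This is only an illustration under stated assumptions: `PageVisit`, the function names, and the plain substring matching are hypothetical and not remio's actual implementation. The AI image/video detection step is left out, and the site and term lists are empty placeholders.

```python
from dataclasses import dataclass
from typing import FrozenSet, Optional
from urllib.parse import urlparse

# Keywords listed in the guide. Substring matching is an assumption of this
# sketch and will over-match (for example, "app" also matches "apple").
EXCLUDED_URL_KEYWORDS = (
    "login", "logout", "account", "setting", "auth", "search", "index", "mail",
    "app", "dashboard", "edit", "permissions", "roles", "order", "pay", "submit",
)


@dataclass
class PageVisit:
    """Hypothetical record of one page load."""
    url: str
    status_code: int
    dom_content_loaded: bool
    fully_loaded: bool
    load_seconds: float
    text: str = ""


def passes_load_status(page: PageVisit) -> bool:
    # Fully loaded, HTTP 200, DOMContentLoaded fired, and load time of at least 1 second.
    return (page.fully_loaded and page.dom_content_loaded
            and page.status_code == 200 and page.load_seconds >= 1.0)


def passes_url_filter(page: PageVisit, non_knowledge_sites: FrozenSet[str]) -> bool:
    url = page.url.lower()
    host = urlparse(url).hostname or ""
    if host in non_knowledge_sites:
        return False
    return not any(keyword in url for keyword in EXCLUDED_URL_KEYWORDS)


def passes_content_filter(page: PageVisit, blocked_domains: FrozenSet[str],
                          blocked_terms: FrozenSet[str]) -> bool:
    # Only the domain blacklist and keyword scan are sketched here.
    host = urlparse(page.url.lower()).hostname or ""
    if host in blocked_domains:
        return False
    text = page.text.lower()
    return not any(term in text for term in blocked_terms)


def first_failed_filter(page: PageVisit,
                        non_knowledge_sites: FrozenSet[str] = frozenset(),
                        blocked_domains: FrozenSet[str] = frozenset(),
                        blocked_terms: FrozenSet[str] = frozenset()) -> Optional[str]:
    """Apply the filters in order; return the first one that rejects the page, else None."""
    checks = [
        ("page_load_status", lambda: passes_load_status(page)),
        ("url_filtering", lambda: passes_url_filter(page, non_knowledge_sites)),
        ("inappropriate_content",
         lambda: passes_content_filter(page, blocked_domains, blocked_terms)),
    ]
    for name, check in checks:
        if not check():
            return name
    return None
```

Because the checks run in order, a login page that also loaded in under 1 second is reported as failing the page load check, and the later filters never run.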
3. How Captured Content is Used
Unprocessed: User highlights are included in the "Unprocessed" section and are directly associated with their source web page. In the current version, however, auto-captured content may not appear in the "Unprocessed" section for review. You can still use it by searching or by asking remio directly.
De-dup: If a user highlights a page multiple times, all highlights are stored together in a single note.
User Editing: Users can edit and further process the "Unprocessed" content.
Mark as unprocessed/processed: You can switch the status of a note between unprocessed and processed.
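The de-dup and status behavior described above can be sketched as a store that keeps one note per source URL. `Note`, `NoteStore`, and their methods are hypothetical names for this example and do not reflect remio's internal data model.

```python
from dataclasses import dataclass, field
from typing import Dict, List


@dataclass
class Note:
    source_url: str
    highlights: List[str] = field(default_factory=list)
    processed: bool = False  # assumption: new notes start as unprocessed


class NoteStore:
    """Hypothetical in-memory store that keeps one note per source URL."""

    def __init__(self) -> None:
        self._notes_by_url: Dict[str, Note] = {}

    def add_highlight(self, url: str, text: str) -> Note:
        # Repeated highlights on the same page are appended to the same note.
        note = self._notes_by_url.setdefault(url, Note(source_url=url))
        note.highlights.append(text)
        return note

    def set_processed(self, url: str, processed: bool) -> None:
        self._notes_by_url[url].processed = processed

    def unprocessed(self) -> List[Note]:
        return [n for n in self._notes_by_url.values() if not n.processed]


store = NoteStore()
store.add_highlight("https://example.com/post", "First highlight")
store.add_highlight("https://example.com/post", "Second highlight")
```

After these two calls the store holds a single unprocessed note with both highlights. Calling `store.set_processed("https://example.com/post", True)` would move it out of the unprocessed list.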
4. User Interactions with Web Pages
After installing the remio browser extension, users can interact with webpages in 4 ways:
Highlighting: Users can select and highlight specific content on a webpage, including both text and images. Highlighted content is stored in the "Unprocessed" section and displayed as a note that shows the source webpage.
The notes are tied to the page’s URL.
Users can remove a highlight from the page, but the highlighted content remains in the note.
Adding Comments: Users can add comments to highlighted content. Comments are shown below their corresponding highlights in the note.
Multiple comments can be added to a single highlight, but they cannot be deleted individually.
Add to Collection: Users can add an entire webpage to a collection. After that, the whole page is saved as a note and added to All Notes. remio recommends the most suitable collection for the page, and you can add one page to multiple collections, either by following remio's recommendation or manually.
Add to Favorites: Users can save the whole page to favorites. When a page is added to favorites, the whole page is saved as a "page" note and added to All Notes.
The system generates a summary of the page and adds it to Favorites.
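As a rough illustration of the highlighting and commenting rules above, the sketch below models a highlight whose content stays in the note after it is removed from the page, with comments rendered beneath it. The `Highlight` class, the `render_note` function, and the output layout are assumptions made for this example only.

```python
from dataclasses import dataclass, field
from typing import List


@dataclass
class Highlight:
    """Hypothetical highlight record; comments can be appended but not deleted one by one."""
    content: str
    comments: List[str] = field(default_factory=list)
    visible_on_page: bool = True

    def add_comment(self, text: str) -> None:
        self.comments.append(text)

    def remove_from_page(self) -> None:
        # Removing the on-page marker does not delete the content from the note.
        self.visible_on_page = False


def render_note(source_url: str, highlights: List[Highlight]) -> str:
    """Render a note with each highlight followed by its comments."""
    lines = [f"Source: {source_url}"]
    for highlight in highlights:
        lines.append(f"> {highlight.content}")
        for comment in highlight.comments:
            lines.append(f"  - {comment}")
    return "\n".join(lines)


h = Highlight("Caching reduces latency.")
h.add_comment("Check the numbers")
h.add_comment("Compare with a CDN")
h.remove_from_page()
note_text = render_note("https://example.com/post", [h])
```

Here `note_text` still contains the highlighted sentence and both comments, even though the highlight is no longer shown on the page.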
5. Information Updates
When a page or file changes, remio records the changes, either by fetching the content again or by listening for updates.
Content that is related to the same topic but updated over time is recorded into a single source. Examples include email threads or frequently updated posts.
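One way to picture a single source that accumulates updates is a record keyed by topic, where each changed version is appended to the same entry. This is a minimal sketch under stated assumptions: the `Source` class, the `record_update` function, and the use of a thread identifier as the key are hypothetical, and how remio actually detects or stores changes is not described here.

```python
from dataclasses import dataclass, field
from typing import Dict, List, Optional


@dataclass
class Source:
    """Hypothetical single source that accumulates versions of the same topic."""
    key: str
    versions: List[str] = field(default_factory=list)

    @property
    def latest(self) -> Optional[str]:
        return self.versions[-1] if self.versions else None


def record_update(sources: Dict[str, Source], key: str, content: str) -> bool:
    """Append content to the source for this key; return False if nothing changed."""
    source = sources.setdefault(key, Source(key=key))
    if source.latest == content:
        return False  # same content as the last recorded version
    source.versions.append(content)
    return True


sources: Dict[str, Source] = {}
record_update(sources, "thread-1", "First message")
record_update(sources, "thread-1", "First message")                # unchanged, ignored
record_update(sources, "thread-1", "First message\nReply added")   # new version
```

All three calls target the same key, so the result is one source with two recorded versions rather than three separate entries.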
By following these steps, remio helps users automatically capture, organize, and review valuable knowledge from the web.