
A Unix-style personal search engine and web crawler for your digital footprint


Background
Thesis
Design
Architecture
Data Schema
Workflows
Document Storage
Shut up, how can I use it?
Notes
Future
Inspirations

articles, podcasts, and other stuff, I forget things all the time.

Architecture
Apollo’s client side is written in Poseidon. The client side interacts with the backend via a REST-like API which provides endpoints for searching data and adding a new entry.

The backend is written in Go and is composed of a few important components:

  1. The web server which serves the endpoints
  2. A tokenizer and stemmer used during search queries and when building the inverted index on the data
  3. A simple web crawler for scraping links to articles, blog posts, and YouTube videos
  4. The actual search engine, which takes a query, tokenizes and stems it, finds the relevant results in the inverted index using those stemmed tokens,
    then ranks the results with TF-IDF
  5. A package which pulls in data from a couple of different sources – if you want to pull data from a custom data source, this is where you should add it.
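
To make components 2 and 4 concrete, here is a minimal sketch of the tokenize, stem, index, and rank pipeline. It is not Apollo's actual code: the names Tokenize, Stem, BuildIndex, and Search are made up for illustration, the stemmer is a naive suffix stripper standing in for a real one, and the TF-IDF weighting shown (term frequency times log(1 + N/df)) is just one common variant.

```go
package main

import (
	"fmt"
	"math"
	"sort"
	"strings"
	"unicode"
)

// Tokenize lowercases text and splits it on anything that is not a letter or digit.
func Tokenize(text string) []string {
	return strings.FieldsFunc(strings.ToLower(text), func(r rune) bool {
		return !unicode.IsLetter(r) && !unicode.IsDigit(r)
	})
}

// Stem is a deliberately naive suffix stripper (an assumption, not a real stemmer).
func Stem(tok string) string {
	for _, suf := range []string{"ing", "es", "s"} {
		if len(tok) > len(suf)+2 && strings.HasSuffix(tok, suf) {
			return strings.TrimSuffix(tok, suf)
		}
	}
	return tok
}

// Index is a small inverted index over stemmed tokens.
type Index struct {
	Postings map[string]map[string]int // stemmed token -> doc ID -> occurrences
	DocLen   map[string]int            // doc ID -> total token count
}

// BuildIndex tokenizes and stems every document and records the postings.
func BuildIndex(docs map[string]string) *Index {
	idx := &Index{Postings: map[string]map[string]int{}, DocLen: map[string]int{}}
	for id, text := range docs {
		for _, tok := range Tokenize(text) {
			t := Stem(tok)
			if idx.Postings[t] == nil {
				idx.Postings[t] = map[string]int{}
			}
			idx.Postings[t][id]++
			idx.DocLen[id]++
		}
	}
	return idx
}

// Result is one ranked hit.
type Result struct {
	ID    string
	Score float64
}

// Search stems the query tokens, looks them up in the index, and sums a
// TF-IDF score per document. Ties are broken by doc ID for stable output.
func (idx *Index) Search(query string) []Result {
	scores := map[string]float64{}
	n := float64(len(idx.DocLen))
	for _, tok := range Tokenize(query) {
		posting := idx.Postings[Stem(tok)]
		if len(posting) == 0 {
			continue
		}
		idf := math.Log(1 + n/float64(len(posting)))
		for id, count := range posting {
			tf := float64(count) / float64(idx.DocLen[id])
			scores[id] += tf * idf
		}
	}
	results := make([]Result, 0, len(scores))
	for id, s := range scores {
		results = append(results, Result{ID: id, Score: s})
	}
	sort.Slice(results, func(i, j int) bool {
		if results[i].Score != results[j].Score {
			return results[i].Score > results[j].Score
		}
		return results[i].ID < results[j].ID
	})
	return results
}

func main() {
	idx := BuildIndex(map[string]string{
		"a": "Searching the web with crawlers",
		"b": "Notes on podcasts and articles",
		"c": "A search engine ranks search results",
	})
	for _, r := range idx.Search("search") {
		fmt.Printf("%s %.3f\n", r.ID, r.Score)
	}
}
```

Because the same Stem function runs at index time and at query time, a query for "search" also matches a document that only contains "searching".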

read.amazon.com and a Readwise extension to download the exported highlights for a book. I put any new book JSON files in a kindle folder in the outer directory. Every time the inverted index is recomputed, the kindle file takes any new book highlights, integrates them into the main kindle.json file stored in the data folder, then deletes the old file.

  1. this to install it.
  2. Navigate to the root directory of the project: cd apollo .
    Note: since Apollo syncs from some personal data sources, you’ll want to remove them, add your own, or build stuff on top of them. Otherwise the terminal will complain if you attempt to run it, so:
  3. Open pkg/apollo/sources in your preferred editor and replace the body of the GetData function with return make(map[string]schema.Data)
  4. Create a folder data in the outer directory
  5. Create a .env file in the outermost directory (i.e. in the same directory as the README.md) and add PASSWORD=<val>, where <val> is whatever password you want. This is necessary for adding or scraping data: you’ll want to “prove you’re Amir”, i.e. authenticate yourself, and then you won’t need to do this in the future. If this is not making sense, try adding some data on apollo.amirbolous.com/add and see what happens.
  6. Go back to the outer directory (meaning you should see the files the way GitHub is displaying them right now) and run go run cmd/apollo.go in the terminal.
  7. Navigate to 127.0.0.1:8993 in your browser
  8. It should be working! You can add data and index data from the database
    If you run into problems, open an issue or DM me on Twitter
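
For step 3, the stubbed-out function would look roughly like this. The sketch is self-contained, so Data here is a local stand-in for the project's schema.Data type (its real fields are not shown in this README), and the package name differs from the real sources package.

```go
package main

import "fmt"

// Data is a local stand-in for the project's schema.Data type; the real
// fields are not shown in this README, so the struct is left empty.
type Data struct{}

// GetData mirrors the stub from step 3: it returns an empty, non-nil map,
// so no personal data sources are synced when the index is built.
func GetData() map[string]Data {
	return make(map[string]Data)
}

func main() {
	fmt.Println("records from custom sources:", len(GetData()))
}
```

In the real project the return line stays exactly as written in step 3, using schema.Data rather than this placeholder type.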

As a side note, although I want others to be able to use Apollo, this is not a “commercial product”. Feel free to open a feature request if you’d like one, but it’s unlikely I will get to it unless it becomes something I personally want to use.

Inspirations

GitHub

https://github.com/amirgamil/apollo



