Wednesday, April 19, 2023 - Logseq Knowledge Base

- Rules
- Don’t check my phone or notifications of any kind until I’m out of bed.
- Robert Spira knows
- Product Market Fit
collapsed:: true
- @Rahul Vohra's Superhuman Email story
- {{twitter https://twitter.com/rahulvohra/status/1062492954277736448?s=12&t=ouA9TAj95-G6bqAxFZLRKQ}}
- #projectidea
- Rabbit Hole
- AI-based deep dive on topics. Remember my search on the Unix system and its evolution. It was my first Evernote note! A good one. I went down the rabbit hole. It would be great UX if I could just drill down the rabbit hole: highlight things I like as I see them, click suggested links or whatever I see. Then I have a summary in front of me. Wow.
- Tech/Text Splitting
- Text splitting is the heart of creating an embedding DB for LLMs. I remember ruminating that this tokenization of data is 80% of the problem, as the tokenization is the representation of the data. The Tech/Langchain folks have given us some nice text-splitting objects and ideas. Their conceptual overview is inspiring:
- When you want to deal with long pieces of text, it is necessary to split up that text into chunks. As simple as this sounds, there is a lot of potential complexity here. Ideally, you want to keep the semantically related pieces of text together. What “semantically related” means could depend on the type of text. This notebook showcases several ways to do that.
- At a high level, text splitters work as follows:
- Split the text up into small, semantically meaningful chunks (often sentences).
- Start combining these small chunks into a larger chunk until you reach a certain size (as measured by some function).
- Once you reach that size, make that chunk its own piece of text and then start creating a new chunk of text with some overlap (to keep context between chunks).
- That means there are two different axes along which you can customize your text splitter: a) how the text is split, and b) how the chunk size is measured.
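- To make the three steps concrete for myself, here is a minimal sketch in plain Python. It is not Langchain's implementation; the names `split_sentences`, `split_text`, `chunk_size`, `chunk_overlap`, `splitter`, and `length_fn` are my own placeholders. The two axes show up as the `splitter` argument (how the text is split) and the `length_fn` argument (how the chunk size is measured). It assumes a naive regex is good enough to find sentence boundaries, and it keeps a single over-long sentence whole rather than cutting it.

```python
import re
from typing import Callable, List


def split_sentences(text: str) -> List[str]:
    """Naive sentence splitter: break after '.', '!' or '?' followed by whitespace."""
    return re.split(r"(?<=[.!?])\s+", text.strip())


def split_text(
    text: str,
    chunk_size: int = 200,
    chunk_overlap: int = 40,
    splitter: Callable[[str], List[str]] = split_sentences,  # axis a) how text is split
    length_fn: Callable[[str], int] = len,                    # axis b) how size is measured
    joiner: str = " ",
) -> List[str]:
    # Step 1: split into small, (hopefully) semantically meaningful pieces.
    pieces = [p for p in splitter(text) if p]

    chunks: List[str] = []
    current: List[str] = []
    for piece in pieces:
        # Step 2: keep combining pieces until the size limit would be exceeded.
        if current and length_fn(joiner.join(current + [piece])) > chunk_size:
            # Step 3: emit the chunk, then seed the next one with trailing
            # pieces of the old chunk (the overlap) to keep some context.
            chunks.append(joiner.join(current))
            overlap: List[str] = []
            for prev in reversed(current):
                if length_fn(joiner.join([prev] + overlap)) > chunk_overlap:
                    break
                overlap.insert(0, prev)
            # Drop overlap pieces that would push the new chunk over the limit.
            while overlap and length_fn(joiner.join(overlap + [piece])) > chunk_size:
                overlap.pop(0)
            current = overlap
        current.append(piece)

    if current:
        chunks.append(joiner.join(current))
    return chunks


if __name__ == "__main__":
    sample = "One. Two. Three. Four."
    # Size measured in characters.
    print(split_text(sample, chunk_size=12, chunk_overlap=5))
    # Size measured in words instead, by swapping only length_fn.
    print(split_text(sample, chunk_size=2, chunk_overlap=1,
                     length_fn=lambda s: len(s.split())))
```

- Swapping `length_fn` for a real tokenizer's token count would probably be the closest match to how an LLM sees the chunk, but that needs an extra library, so I left it as character and word counts here.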