Quick notes while reading The AI Pocket Book by Emmanuel Maggiori
A good enough book to learn the necessary keywords for future explorations.
LLMs are autoregressive. They're designed to produce a single extra piece of content based on the content that came before
autocomplete on crack!
Difference between LLM and LLM wrapper
LLMs -> GPT-4, GPT-4o, Sonnet 4
LLM wrapper -> ChatGPT, Claude
To generate a full sentence, the LLM wrapper uses the LLM to generate one word, appends that word to the initial prompt, then calls the LLM again to generate one more word (see the loop sketch below)
Hello -> LLM -> Cruel
Hello Cruel -> LLM -> World
Hello Cruel World -> LLM -> <|end of text|>
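a toy Python sketch of that wrapper loop (generate_next_token is a made-up stand-in for a real LLM call, not from the book):

```python
# Toy sketch of the wrapper's generate-one-word-at-a-time loop.
# generate_next_token is a hypothetical stand-in; a real wrapper would
# call a model API or run local inference here.
def generate_next_token(prompt: str) -> str:
    continuation = {"Hello": "Cruel", "Hello Cruel": "World"}
    return continuation.get(prompt, "<|end of text|>")

def complete(prompt: str, max_tokens: int = 10) -> str:
    text = prompt
    for _ in range(max_tokens):
        token = generate_next_token(text)
        if token == "<|end of text|>":   # stop token: generation is done
            break
        text = text + " " + token        # append, then feed the longer prompt back in
    return text

print(complete("Hello"))  # -> "Hello Cruel World"
```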
Retrieval-augmented Generation (RAG)
basically "augmenting" LLM by adding more context like internal documents (external knowledge base)
popular approach to create in-house chatbot that's customize to specific needs
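a minimal sketch of the RAG flow, assuming a toy keyword-overlap retriever (real systems use embedding similarity search); it just builds the augmented prompt instead of calling a real LLM:

```python
# Minimal RAG sketch: retrieve the most relevant internal docs, then prepend
# them to the question before sending everything to the LLM.
def retrieve(question: str, documents: list[str], top_k: int = 2) -> list[str]:
    q_words = set(question.lower().split())
    scored = sorted(documents,
                    key=lambda d: len(q_words & set(d.lower().split())),
                    reverse=True)
    return scored[:top_k]

def build_rag_prompt(question: str, documents: list[str]) -> str:
    context = "\n".join(retrieve(question, documents))
    return f"Answer using only this context:\n{context}\n\nQuestion: {question}"

docs = ["Refunds are accepted within 30 days.", "Support hours are 9am to 5pm."]
print(build_rag_prompt("What are the support hours?", docs))  # prompt sent to the LLM
```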
Concept of token
each element in the vocab is a token. Don't confuse with "words", e.g.:
common words (fish, dog)
common pieces of words (ish)
common latin characters (a, b, c)
LLMs don't read raw text. Instead, the LLM wrapper first converts the input prompt into a list of integers (token IDs)
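quick check with OpenAI's tiktoken library (assumes pip install tiktoken), just to see the integers:

```python
# Turn text into token IDs with tiktoken (tokenizer library for OpenAI models).
import tiktoken

enc = tiktoken.get_encoding("cl100k_base")        # encoding used by GPT-4-era models
ids = enc.encode("strawberry fishing")
print(ids)                                        # a list of integers (token IDs)
print([enc.decode([i]) for i in ids])             # the text piece behind each ID
```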
Embeddings
the most common way LLMs represent the meaning of a token: each token ID maps to a vector of numbers
this is why most LLMs struggle to correctly analyze the individual letters in a word (how many r's in strawberry)
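toy sketch of an embedding table with made-up vectors (real models learn these during training); similar meanings end up as nearby vectors, and nothing about individual letters is stored:

```python
# Toy embedding table: each token ID maps to a vector of numbers.
# Values are made up for illustration; real models learn them in training.
import numpy as np

vocab = {"fish": 0, "dog": 1, "cat": 2}
embedding_table = np.array([
    [0.9, 0.1, 0.0],   # "fish"
    [0.1, 0.8, 0.3],   # "dog"
    [0.2, 0.7, 0.4],   # "cat"
])

def cosine(a, b):
    return float(a @ b / (np.linalg.norm(a) * np.linalg.norm(b)))

dog, cat, fish = (embedding_table[vocab[w]] for w in ("dog", "cat", "fish"))
print(cosine(dog, cat))    # high: similar meaning, nearby vectors
print(cosine(dog, fish))   # lower: less related
```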
Transformer architecture
Attention is all you need -> start from this paper (2017)
The architecture that powers current LLMs. Previously it was LSTMs
CNNs (Convolutional Neural Networks) are a common architecture to process other types of data, like images
CNNs + Transformers are used to create multimodal AI, like diffusion models
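a minimal numpy sketch of the core operation from the 2017 paper (scaled dot-product attention), just to see the shapes; not how production libraries implement it:

```python
# Scaled dot-product attention, the core operation of the Transformer.
import numpy as np

def softmax(x, axis=-1):
    e = np.exp(x - x.max(axis=axis, keepdims=True))
    return e / e.sum(axis=axis, keepdims=True)

def attention(Q, K, V):
    d = Q.shape[-1]
    scores = Q @ K.T / np.sqrt(d)        # how much each token attends to every other token
    weights = softmax(scores, axis=-1)   # each row sums to 1
    return weights @ V                   # weighted mix of value vectors

rng = np.random.default_rng(0)
x = rng.normal(size=(4, 8))              # 4 token embeddings, 8 dimensions each
print(attention(x, x, x).shape)          # (4, 8): one updated vector per token
```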
Hallucinations
confidently wrong outputs generated by AI
what? unsatisfactory output by AI with these characteristics: incorrect, overconfident & happens in unpredictable ways
why? inadequate world models, misaligned objectives
can be mitigated by prompt engineering techniques (see the prompt sketch below)
always keep hallucination in mind
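one prompt-engineering mitigation sketched in Python; the wording is my own illustration, not a prescription from the book:

```python
# Sketch of a grounding prompt: answer only from supplied context, admit uncertainty.
def grounded_prompt(context: str, question: str) -> str:
    return (
        "Answer strictly from the context below. "
        'If the answer is not in the context, reply "I don\'t know".\n\n'
        f"Context:\n{context}\n\nQuestion: {question}"
    )

print(grounded_prompt("Our refund window is 30 days.", "Can I get a refund after 45 days?"))
```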