The Transformer Architecture: This diagram, from the original "Attention Is All You Need" paper by Vaswani et al. (2017), illustrates the key components of the Transformer model, including the Encoder and Decoder stacks, Multi-Head Attention layers, and Positional Encoding.
This repository documents my learning journey into the world of AI and Large Language Models (LLMs), with a specific focus on applications in Finance and Consultancy. The structure of this journey is based on the "AI/LLM Learning Plan for Finance & Consultancy Roles" document.
Here, I will share my notes, projects, and implementations as I progress through the learning plan.
Learning Objectives:
- Master transformer architecture from first principles
- Understand attention mechanisms mathematically
- Grasp positional encodings, layer normalization, and residual connections
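As a small illustration of the sinusoidal positional encodings listed above, here is a NumPy sketch (not the repository's TensorFlow implementation) following the formulas from "Attention Is All You Need": even dimensions use sine, odd dimensions use cosine, with frequencies decreasing across the embedding dimension.

```python
import numpy as np

def positional_encoding(max_len: int, d_model: int) -> np.ndarray:
    """Sinusoidal positional encodings:
    PE[pos, 2i]   = sin(pos / 10000^(2i / d_model))
    PE[pos, 2i+1] = cos(pos / 10000^(2i / d_model))
    """
    positions = np.arange(max_len)[:, np.newaxis]        # (max_len, 1)
    dims = np.arange(d_model)[np.newaxis, :]             # (1, d_model)
    angle_rates = 1.0 / np.power(10000.0, (2 * (dims // 2)) / d_model)
    angles = positions * angle_rates                     # (max_len, d_model)
    pe = np.zeros((max_len, d_model))
    pe[:, 0::2] = np.sin(angles[:, 0::2])                # even indices: sine
    pe[:, 1::2] = np.cos(angles[:, 1::2])                # odd indices: cosine
    return pe

pe = positional_encoding(50, 16)
```

Because the encoding is deterministic in the token position, it can be added to the input embeddings to give the otherwise order-blind attention layers a sense of sequence order.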
Projects & Learnings:
- Transformers from Scratch: This project implements the Transformer architecture from the ground up, as detailed in the seminal paper "Attention Is All You Need". The implementation uses TensorFlow and walks step by step through the core components of a Transformer. This notebook serves as a practice guide for the concepts covered in the Sequence Models course from the DeepLearning.AI Natural Language Processing Specialization on Coursera.
- Jupyter Notebook: Transformers-from-scratch.ipynb
- Requirements: The `requirements.txt` file in the `Phase 1` folder contains the necessary packages for this notebook.
- Key Concepts Covered:
- Positional Encodings
- Masking (Padding and Look-Ahead)
- Self-Attention (Scaled Dot Product Attention)
- Encoder (Encoder Layer and Full Encoder)
- Decoder (Decoder Layer and Full Decoder)
- Transformer Assembly
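The first three concepts above can be sketched together in a few lines of NumPy (illustrative only; the notebook's TensorFlow version differs in details): scaled dot-product attention with an optional mask, plus the look-ahead mask used in the decoder.

```python
import numpy as np

def scaled_dot_product_attention(q, k, v, mask=None):
    """Attention(Q, K, V) = softmax(QK^T / sqrt(d_k)) V.

    Mask entries equal to 1 mark positions to block: their scores are set
    to a large negative number before the softmax, as in padding and
    look-ahead masking.
    """
    d_k = q.shape[-1]
    scores = q @ k.swapaxes(-1, -2) / np.sqrt(d_k)   # (..., seq_q, seq_k)
    if mask is not None:
        scores = np.where(mask == 1, -1e9, scores)   # block masked positions
    scores = scores - scores.max(axis=-1, keepdims=True)  # stable softmax
    weights = np.exp(scores)
    weights = weights / weights.sum(axis=-1, keepdims=True)
    return weights @ v, weights

def look_ahead_mask(size):
    """Upper-triangular mask: position i may not attend to positions > i."""
    return np.triu(np.ones((size, size)), k=1)

q = k = v = np.eye(3)
out, w = scaled_dot_product_attention(q, k, v, look_ahead_mask(3))
```

With the look-ahead mask applied, the first position can only attend to itself, which is what prevents the decoder from peeking at future tokens during training.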
Learning Objectives:
- Implement an Encoder-only Transformer model for a classification task.
- Apply the model to a real-world financial dataset.
- Evaluate the model's performance and identify areas for improvement.
Projects & Learnings:
- Mini-Transformer for Financial Sentiment Analysis: This project builds a smaller version of the Transformer model, using only the Encoder layer from the previously developed `transformers_model.py`, to perform sentiment analysis on financial news headlines. The model is trained on the `financial_phrasebank` dataset from Hugging Face.
- Jupyter Notebook: Mini-Transformer.ipynb
- Key Concepts Covered:
- Using a pre-built Transformer Encoder.
- Sentiment analysis as a classification task.
- Data preprocessing for financial text.
- Training and evaluating a Transformer-based model.
- Analyzing model performance and suggesting improvements.
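One way an Encoder-only model is turned into a classifier, sketched in NumPy with made-up dimensions (the notebook's actual head may pool differently): mask-aware mean-pooling of the encoder outputs, then a linear layer and softmax over the sentiment classes.

```python
import numpy as np

rng = np.random.default_rng(0)

def classify_from_encoder(encoder_out, w, b, pad_mask):
    """Turn encoder outputs (seq_len, d_model) into class probabilities.

    Mean-pools over non-padding tokens, applies a linear head, then softmax.
    pad_mask: 1 for real tokens, 0 for padding.
    """
    weights = pad_mask[:, None] / pad_mask.sum()   # average real tokens only
    pooled = (encoder_out * weights).sum(axis=0)   # (d_model,)
    logits = pooled @ w + b                        # (num_classes,)
    exp = np.exp(logits - logits.max())
    return exp / exp.sum()

d_model, num_classes, seq_len = 8, 3, 5
encoder_out = rng.normal(size=(seq_len, d_model))  # stand-in for encoder output
w = rng.normal(size=(d_model, num_classes))
b = np.zeros(num_classes)
pad_mask = np.array([1, 1, 1, 0, 0])               # last two tokens are padding
probs = classify_from_encoder(encoder_out, w, b, pad_mask)
```

Excluding padding tokens from the pooling matters for financial headlines, which vary widely in length after tokenization.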
Learning Objectives:
- Understand different pre-training objectives (MLM, CLM, etc.)
- Master fine-tuning strategies and when to use each
- Learn about parameter-efficient fine-tuning (LoRA, Adapters)
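The core idea behind LoRA can be shown in a few lines of NumPy (a sketch of the math only; in practice one would use a library such as Hugging Face `peft`, and the dimensions below are merely typical): the frozen weight W is left untouched, and a trainable low-rank product BA, scaled by alpha/r, is added to its output.

```python
import numpy as np

d_in, d_out, r, alpha = 768, 768, 8, 16   # BERT-sized layer, small rank

W = np.zeros((d_in, d_out))               # frozen pre-trained weight (stand-in)
A = np.random.randn(d_in, r) * 0.01       # trainable low-rank factor
B = np.zeros((r, d_out))                  # zero-initialized: no change at start

def lora_forward(x):
    """y = x W + (alpha / r) * x A B  -- only A and B are trained."""
    return x @ W + (alpha / r) * (x @ A) @ B

full_params = W.size                      # what full fine-tuning would update
lora_params = A.size + B.size             # what LoRA updates
print(f"full: {full_params:,} params; LoRA: {lora_params:,} params")
```

For this single layer the trainable parameter count drops by a factor of 48, which is where LoRA's training-time and memory savings come from.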
Projects & Learnings:
- Fine-tuning BERT for Financial Text Classification: This project explores two methods for fine-tuning a pre-trained BERT model for a financial sentiment analysis task: full fine-tuning and Parameter-Efficient Fine-Tuning (PEFT) using Low-Rank Adaptation (LoRA).
- Jupyter Notebook: Fine_tune_BERT.ipynb
- Performance Comparison Report: performance_comparison_report.md
- Key Concepts Covered:
- Full fine-tuning of a pre-trained BERT model.
- Parameter-Efficient Fine-Tuning (PEFT) with LoRA.
- Comparison of performance, training time, and trainable parameters between the two methods.
- Cost analysis of the two fine-tuning approaches.
- Results: The project demonstrates that LoRA can achieve performance comparable to full fine-tuning with significantly fewer trainable parameters, leading to faster training times and lower computational costs. The following image summarizes the comparison:
Learning Objectives:
- Apply fine-tuning to complex financial NLP tasks
- Master evaluation metrics for NLP models
- Understand overfitting prevention in fine-tuning
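Since mastering evaluation metrics is an objective here, a from-scratch computation of precision, recall, and F1 for a single class (pure Python; in practice `sklearn.metrics` offers the same) makes the definitions concrete:

```python
def precision_recall_f1(y_true, y_pred, positive=1):
    """Precision = TP/(TP+FP), Recall = TP/(TP+FN), F1 = their harmonic mean."""
    tp = sum(1 for t, p in zip(y_true, y_pred) if t == positive and p == positive)
    fp = sum(1 for t, p in zip(y_true, y_pred) if t != positive and p == positive)
    fn = sum(1 for t, p in zip(y_true, y_pred) if t == positive and p != positive)
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    f1 = 2 * precision * recall / (precision + recall) if precision + recall else 0.0
    return precision, recall, f1

y_true = [1, 0, 1, 1, 0, 1]   # toy labels
y_pred = [1, 0, 0, 1, 1, 1]   # toy predictions
p, r, f1 = precision_recall_f1(y_true, y_pred)
```

On imbalanced financial datasets, per-class F1 is usually more informative than raw accuracy, since a model can score high accuracy by predicting the majority class.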
Projects & Learnings:
- (Add your notes and project links here)
Learning Objectives:
- Master financial text preprocessing and domain-specific challenges
- Understand financial entity recognition and relationship extraction
- Learn about financial document summarization and key information extraction
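As a toy baseline for the entity-recognition objective above (regexes invented for illustration; real financial NER uses learned models, which is the point of this phase), one can pull tickers and monetary amounts out of a headline:

```python
import re

# Hypothetical, illustrative patterns -- not production-grade NER.
TICKER = re.compile(r"\b[A-Z]{1,5}\b(?=\sshares|\sstock)")
AMOUNT = re.compile(r"\$\d+(?:\.\d+)?\s?(?:billion|million|[BM])?")

def extract_entities(text):
    """Return ticker symbols and dollar amounts found in the text."""
    return {"tickers": TICKER.findall(text), "amounts": AMOUNT.findall(text)}

headline = "AAPL shares rose after the company reported $89.5 billion in revenue."
entities = extract_entities(headline)
```

Comparing such a brittle rule-based baseline against a fine-tuned model is a useful way to quantify what domain-specific training actually buys.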
Projects & Learnings:
- (Add your notes and project links here)
Learning Objectives:
- Master question-answering systems for financial documents
- Understand retrieval-augmented generation (RAG) systems
- Learn about conversational AI for financial advisory
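The retrieval half of a RAG system can be sketched with plain bag-of-words cosine similarity (stdlib only; real systems use dense embeddings and a vector store, and the document snippets below are invented for illustration):

```python
import math
from collections import Counter

docs = {
    "10-K": "annual report with audited financial statements and risk factors",
    "8-K": "report of unscheduled material events or corporate changes",
    "10-Q": "quarterly report with unaudited financial statements",
}

def cosine(a: Counter, b: Counter) -> float:
    dot = sum(a[w] * b[w] for w in a)
    na = math.sqrt(sum(v * v for v in a.values()))
    nb = math.sqrt(sum(v * v for v in b.values()))
    return dot / (na * nb) if na and nb else 0.0

def retrieve(query: str) -> str:
    """Return the name of the most similar document."""
    q = Counter(query.lower().split())
    return max(docs, key=lambda name: cosine(q, Counter(docs[name].lower().split())))

def build_prompt(query: str) -> str:
    """Generation half of RAG: stuff the retrieved context into the prompt."""
    best = retrieve(query)
    return f"Context ({best}): {docs[best]}\n\nQuestion: {query}\nAnswer:"

query = "which filing has quarterly unaudited financial statements"
```

The prompt built here grounds the model's answer in retrieved text, which is what lets RAG systems cite sources and stay current without retraining.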
Projects & Learnings:
- (Add your notes and project links here)
Learning Objectives:
- Understand GPT family evolution (GPT-1 to GPT-4+)
- Master prompt engineering techniques and best practices
- Learn about in-context learning and few-shot prompting
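Few-shot prompting amounts to prompt assembly: an instruction, a handful of labeled examples, then the new input for the model to complete. A minimal builder (the headline/label format here is an arbitrary choice for illustration):

```python
def few_shot_prompt(examples, query, instruction):
    """Assemble an in-context-learning prompt: instruction, then labeled
    examples, then the new input left for the model to complete."""
    lines = [instruction, ""]
    for text, label in examples:
        lines.append(f"Headline: {text}\nSentiment: {label}\n")
    lines.append(f"Headline: {query}\nSentiment:")
    return "\n".join(lines)

examples = [
    ("Company beats earnings estimates by 12%", "positive"),
    ("Regulator opens probe into accounting practices", "negative"),
]
prompt = few_shot_prompt(
    examples,
    "Firm guides full-year revenue above consensus",
    "Classify the sentiment of each financial headline.",
)
```

Ending the prompt mid-pattern, right after `Sentiment:`, is what nudges the model to continue with a label in the same format as the examples.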
Projects & Learnings:
- (Add your notes and project links here)
Learning Objectives:
- Learn LLM deployment strategies and optimization
- Understand model serving, caching, and scaling
- Master monitoring and evaluation of LLM applications
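One of the simplest serving optimizations above, response caching, can be sketched as an exact-match LRU cache keyed on a hash of model and prompt (stdlib only; production stacks add TTLs, semantic matching, and distributed stores, and `fake_llm` below stands in for a real model call):

```python
import hashlib
from collections import OrderedDict

class LLMResponseCache:
    """Tiny LRU cache for LLM responses, keyed on (model, prompt)."""

    def __init__(self, max_size=1024):
        self.store = OrderedDict()
        self.max_size = max_size
        self.hits = self.misses = 0

    @staticmethod
    def _key(model, prompt):
        return hashlib.sha256(f"{model}\x00{prompt}".encode()).hexdigest()

    def get_or_compute(self, model, prompt, compute):
        key = self._key(model, prompt)
        if key in self.store:
            self.hits += 1
            self.store.move_to_end(key)      # mark as recently used
            return self.store[key]
        self.misses += 1
        value = compute(prompt)              # the expensive LLM call
        self.store[key] = value
        if len(self.store) > self.max_size:
            self.store.popitem(last=False)   # evict least recently used
        return value

cache = LLMResponseCache()
fake_llm = lambda p: p.upper()               # stand-in for a real model call
a = cache.get_or_compute("m", "hello", fake_llm)
b = cache.get_or_compute("m", "hello", fake_llm)
```

Tracking the hit/miss counters as done here is also a first step toward the monitoring objective: cache hit rate is a standard metric for LLM serving cost.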
Projects & Learnings:
- (Add your notes and project links here)
Learning Objectives:
- Integrate all learned concepts into a comprehensive project
- Create professional documentation and presentation materials
- Optimize existing projects for maximum impact
Projects & Learnings:
- (Add your notes and project links here)
Learning Objectives:
- Master technical interviews for AI/ML roles
- Understand business case studies relevant to finance/consulting
- Build industry connections and personal brand
Projects & Learnings:
- (Add your notes and project links here)

