Creating a YouTube Video Chatbot using OpenAI and ChromaDB

You can download and run this notebook from here.

In this blog post, we will look at how we can build an AI chatbot that lets us chat with YouTube videos.

The main problem I wanted to solve is that I sometimes have to watch a long video just to find one piece of information. So I thought, why not build a chatbot that can do that for me? It would not only save time but also surface the information I need quickly.

Check out this blog post in video form:

AI Workshop - Chat with videos

Let's install the pytubefix library. We will use it to download the audio from the video.

!pip install pytubefix

I have chosen a video that explains the concept of pointers in the C language. I will use this video to build the chatbot.

Check out the video here:

#23 C Pointers | C Programming For Beginners

Let's download the audio from the video.

from pytubefix import YouTube

# point pytubefix at the video and grab its audio-only stream
yt = YouTube("https://www.youtube.com/watch?v=KGhacRRMnDw")
ys = yt.streams.get_audio_only()
ys.download()
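
By the way, download() returns the path of the file it writes (this mirrors pytube's behaviour, which I'm assuming pytubefix keeps), so you can capture it instead of hard-coding the filename later:

# download() returns the path of the written audio file
# (assumed here, mirroring pytube's Stream.download behaviour)
audio_path = ys.download()
print(audio_path)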

Now, let's install the openai library. We will first transcribe the audio to text using OpenAI's Whisper API and then use an LLM to generate coherent responses.

!pip install openai

In the next step, I am setting up the OpenAI API key inside the Jupyter notebook. If you are not using a Jupyter notebook, you can set the key in the OPENAI_API_KEY environment variable instead.

import getpass
import os

openai_key = getpass.getpass("Enter your OpenAI key: ")
# note: a !export line only affects a throwaway subshell, so set the
# environment variable from Python instead
os.environ["OPENAI_API_KEY"] = openai_key

Now, we transcribe the audio to text.

from openai import OpenAI

client = OpenAI(api_key=openai_key)

# open the downloaded audio file and transcribe it with Whisper
with open("./23 C Pointers  C Programming For Beginners.mp4", "rb") as audio_file:
    transcription = client.audio.transcriptions.create(
        model="whisper-1", file=audio_file
    )
transcription = transcription.text
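
It is worth a quick sanity check on the transcript before moving on:

# quick look at the transcript length and its first few hundred characters
print(len(transcription), "characters")
print(transcription[:500])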

Now that we have transcribed the audio to text, we need to split the text into smaller chunks. This is because most LLMs have a limited context window, so we cannot simply pass the entire transcript in one go. We will split the text into smaller chunks, retrieve the relevant ones, and then use the LLM to generate responses. We will use the nltk library to split the text into sentences.

!pip install nltk
import nltk
nltk.download('punkt_tab')
from nltk.tokenize import sent_tokenize

Using the sentence tokenizer, we will split the transcription into individual sentences. (On older versions of nltk you may need nltk.download('punkt') instead of punkt_tab.)

lines = sent_tokenize(transcription)

Let's group the sentences into chunks of roughly 5 sentences each.

# group the lines array by combining 5 lines into a single string
grouped_lines = []
for i in range(0, len(lines), 5):
    grouped_lines.append(" ".join(lines[i : i + 5]))

Next, we will install ChromaDB, a popular vector database that is very convenient to use from Python. We will be using ChromaDB in in-memory mode.

!pip install chromadb

Now that we have installed ChromaDB, let's initialize the database.

import chromadb

# set up Chroma in-memory for easy prototyping; persistence can be added later
# (named chroma_client so it does not shadow the OpenAI client above)
chroma_client = chromadb.Client()

# create a collection (get_collection, get_or_create_collection and delete_collection are also available)
collection = chroma_client.create_collection("all-my-documents")

Let's feed the text to ChromaDB. ChromaDB will convert each chunk into a vector (embedding) and store it in the database. Although ChromaDB's default embedding model is a fairly small transformer, it is still quite capable for tasks like this. For production use, you might want to swap in a larger embedding model.

grouped_lines_metadata = [{"text": line} for line in grouped_lines]
# use the index of each chunk in the grouped_lines array as its document ID
grouped_lines_ids = [str(i) for i in range(len(grouped_lines))]
collection.add(
    documents=grouped_lines,  # Chroma handles tokenization, embedding, and indexing automatically; you can also pass your own embeddings
    metadatas=grouped_lines_metadata,  # metadata can be used for filtering
    ids=grouped_lines_ids,  # unique ID for each document
)
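
If you do want higher-quality embeddings, Chroma lets you plug in a different embedding function when creating a collection. A minimal sketch using OpenAI embeddings (assuming Chroma's bundled OpenAIEmbeddingFunction helper and the text-embedding-3-small model; the collection name here is just illustrative):

from chromadb.utils import embedding_functions

# swap Chroma's default model for OpenAI embeddings
openai_ef = embedding_functions.OpenAIEmbeddingFunction(
    api_key=openai_key,
    model_name="text-embedding-3-small",
)
openai_collection = chroma_client.create_collection(
    "all-my-documents-openai", embedding_function=openai_ef
)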

Let's now define a user query and retrieve the most relevant chunks from ChromaDB.

question = "how to read a pointers value?"
results = collection.query(
    query_texts=[question],
    n_results=2,
    # where={"metadata_field": "is_equal_to_this"}, # optional filter
    # where_document={"$contains":"search_string"}  # optional filter
)
results
import json
context = json.dumps(results["documents"], indent=2)
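
Since results["documents"] is a nested list (one inner list per query text), an alternative is to join the retrieved chunks into plain text rather than JSON-dumping the whole structure; either works as the context for the prompt below:

# alternative: build the context by joining the retrieved chunks
context = "\n\n".join(results["documents"][0])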

Let's now install LangChain, a library that makes it easy to compose prompts and call LLMs.

!pip install langchain

Also install the OpenAI integration package for LangChain.

!pip install langchain-openai

Let's now prompt the LLM with the user query and the retrieved context. We are using a very basic prompt here; you can use more elaborate prompts to get better responses.

from langchain_core.prompts import PromptTemplate
from langchain_openai import OpenAI

llm = OpenAI(api_key=openai_key)
prompt = PromptTemplate.from_template(
    """
    You are a C programming instructor. You can only answer from the context that is given to you.
    Question: {question}
    
    Context: {context}
    
    Based on the question and the context, answer the question. Do not provide any information that is not present in the context.
"""
)

chain = prompt | llm
chain.invoke({"question": question, "context": context})
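
To make this feel more like a chatbot, you can wrap retrieval and generation into a single helper. A minimal sketch using the objects defined above (the ask name and the example question are just illustrative):

def ask(question: str, n_results: int = 2) -> str:
    # retrieve the most relevant transcript chunks for this question
    results = collection.query(query_texts=[question], n_results=n_results)
    context = "\n\n".join(results["documents"][0])
    # pass the question and the retrieved context through the prompt | llm chain
    return chain.invoke({"question": question, "context": context})

print(ask("What does dereferencing a pointer mean?"))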

Conclusion

In this blog post, we saw how we can build a chatbot that helps us extract information from videos. We used the openai library to transcribe the audio to text and then used an LLM to generate responses. We also used ChromaDB to store the transcript chunks and retrieve the context relevant to a question. This is a very basic implementation; you can use more elaborate prompts and models to get better responses.

You can download and run this notebook from here.