Chat with Github

Notes

GitHub Repository Chat

A Streamlit application that allows you to chat with any GitHub repository using Retrieval-Augmented Generation (RAG).

Features

  • Repository Loading: Load any public GitHub repository by its URL.
  • Intelligent Chat: Ask questions about code, structure, or functionality of the repository.
  • Persistent Context: The application maintains context throughout your conversation.
  • Single Instance Architecture: Optimized to use a single embedchain instance for better performance.

How It Works

This application uses:

  • Embedchain: For creating a knowledge base from GitHub repositories
  • Clarifai: For AI model hosting and embeddings
  • Streamlit: For the web interface
  • DeepSeek-R1-Distill-Qwen-32B: As the large language model backend

The app employs RAG (Retrieval-Augmented Generation) technology to:

  1. Index the repository contents
  2. Retrieve relevant information when you ask questions
  3. Generate informative responses based on the repository's code and documentation

Getting Started

Prerequisites

  • Python 3.8+
  • Streamlit
  • Embedchain
  • GitHub Personal Access Token (for repository access)
  • Clarifai API Token

Installation

# Install required packages
pip install streamlit embedchain clarifai

# Run the application
streamlit run app.py

Environment Setup

The application requires the following environment variables:

  • CLARIFAI_PAT: Your Clarifai Personal Access Token
  • GITHUB_TOKEN: Your GitHub Personal Access Token with repository read access

Usage

  1. Enter a GitHub repository URL in the sidebar (e.g., https://github.com/owner/repo)
  2. Click "Load Repository"
  3. Wait for the repository to be processed (this may take a few moments)
  4. Start asking questions about the repository in the chat interface

Example Questions

  • "What are the main components of this repository?"
  • "How does the authentication system work?"
  • "Explain the data flow in this application"
  • "What dependencies does this project have?"
  • "Show me how error handling is implemented"

Technical Details

The application uses a single Embedchain App instance throughout its lifecycle to improve performance and reduce resource usage. The vector database is stored in a temporary directory and the application maintains session state to preserve your conversation history.

Limitations

  • Very large repositories may take longer to process
  • The app works best with well-documented repositories
  • For repositories with limited documentation, the responses might be less detailed