Application card: GitHub Copilot Chat - GitHub Docs

Learn how to use GitHub Copilot Chat responsibly by understanding its purposes, capabilities, and limitations.

What is an Application Card?

GitHub’s application and platform cards are intended to help you understand how our AI technology works, the choices application owners can make that influence application performance and behavior, and the importance of considering the whole application, including the technology, the people, and the environment. Application cards are created for AI applications and platform cards are created for AI platform services. These resources can support the development or deployment of your own applications and can be shared with users or stakeholders impacted by them.

As part of its commitment to responsible AI, GitHub adheres to Microsoft's six core principles: fairness, reliability and safety, privacy and security, inclusiveness, transparency, and accountability. These principles are embedded in the Responsible AI Standard, which guides teams in designing, building, and testing AI applications. Application and platform cards play a key role in operationalizing these principles by offering transparency around capabilities, intended uses, and limitations. For further insight, see Microsoft’s Responsible AI Transparency Report and GitHub Terms.

1. Overview

GitHub Copilot Chat is a chat interface that lets you interact with GitHub Copilot to ask coding-related questions and receive answers. GitHub Copilot Chat is available on GitHub.com, in supported IDEs (VS Code, Visual Studio, JetBrains, and Eclipse), on GitHub Mobile, and in Windows Terminal. On GitHub.com and in GitHub Desktop, Copilot can also generate pull request summaries and commit messages, which are AI-powered overviews of the changes made in a pull request or commit.

GitHub Copilot Chat can answer a wide range of coding-related questions on topics including syntax, programming concepts, test cases, debugging, and more. GitHub Copilot Chat is not designed to answer non-coding questions or provide general information on topics outside of coding.

The primary supported language for GitHub Copilot Chat is English.

2. Key terms

The following list provides a glossary of key terms related to GitHub Copilot Chat:

3. Key features or capabilities

The key features and capabilities outlined here describe what GitHub Copilot Chat is designed to do and how it performs across supported tasks.

4. Intended uses

GitHub Copilot Chat can be used in multiple scenarios across a variety of industries. Some examples of use cases include:

5. Models and training data

GitHub Copilot Chat leverages a variety of AI models to power the experience that users see. For a comparison of the models available for Copilot, see AI model comparison. For the full list of supported models, see Supported AI models in GitHub Copilot. For information on where models are hosted, see Hosting of models for GitHub Copilot. To learn more about the data used to train the foundation models behind GitHub Copilot Chat, refer to the linked AI model comparison above and What data has GitHub Copilot been trained on? in the GitHub Copilot FAQ.

Using Bring Your Own Key (BYOK)

When you use Bring Your Own Key with GitHub Copilot Chat, you can connect the chat experience to large language models from supported providers beyond the default Copilot model. Examples of supported providers include Anthropic, AWS Bedrock, Google AI Studio, Microsoft Foundry, OpenAI, OpenAI-compatible providers, and xAI. You add your API key for the chosen provider directly in your Copilot settings.

When BYOK is active:

BYOK empowers your organization to choose the language model that best fits your needs. Note that model performance and safety characteristics are provider-dependent.

6. Performance

GitHub Copilot Chat works by using a combination of natural language processing and machine learning to understand your question and provide you with an answer. This process involves:

  1. Input processing: The user's prompt is pre-processed by the system, combined with contextual information (for example, the current repository, open files, or chat history), and sent to a large language model. User input can take the form of code snippets or plain language.
  2. Language model analysis: The prompt is passed through the language model, which is a neural network trained on a large body of text data. The language model analyzes the input prompt.
  3. Response generation: The model generates a response based on its analysis of the input prompt and the context provided to it. This response can take the form of generated code, code suggestions, or explanations of existing code.
  4. Output formatting: The response is formatted with syntax highlighting, indentation, and other features to add clarity. Depending on the type of question, links to context that the model used—such as source code files, issues, or documentation—may also be provided.
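
The four steps above can be sketched as a small pipeline. This is an illustrative sketch only, not GitHub's implementation: `ChatContext`, `build_prompt`, `stub_model`, `format_response`, and `answer` are hypothetical names, and the model call is replaced by a stub that returns a canned reply.

```python
from dataclasses import dataclass, field


@dataclass
class ChatContext:
    """Hypothetical container for the context combined with a prompt."""
    repository: str = ""
    open_files: list[str] = field(default_factory=list)
    history: list[str] = field(default_factory=list)


def build_prompt(user_input: str, context: ChatContext) -> str:
    """Step 1: combine the user's prompt with contextual information."""
    parts = [f"Repository: {context.repository}"]
    parts += [f"Open file: {name}" for name in context.open_files]
    parts += [f"Earlier turn: {turn}" for turn in context.history]
    parts.append(f"User: {user_input.strip()}")
    return "\n".join(parts)


def stub_model(prompt: str) -> str:
    """Steps 2 and 3: stand-in for the language model (assumption: canned reply)."""
    return "def add(a, b):\n    return a + b"


def format_response(raw: str, sources: list[str]) -> str:
    """Step 4: indent the reply and list the context that was used."""
    indented = "\n".join("    " + line for line in raw.splitlines())
    if not sources:
        return indented
    refs = "\n".join(f"- {source}" for source in sources)
    return f"{indented}\n\nContext used:\n{refs}"


def answer(user_input: str, context: ChatContext, model=stub_model) -> str:
    """Run the four steps in order; `model` is any callable from str to str."""
    prompt = build_prompt(user_input, context)
    raw = model(prompt)
    return format_response(raw, context.open_files)
```

Passing the model in as a callable keeps the sketch testable without any network access; a real deployment would swap `stub_model` for a call to a hosted model.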

Differences by experience

GitHub Copilot Chat is intended to provide you with the most relevant answer to your question. However, it may not always provide the answer you are looking for. Users of GitHub Copilot Chat are responsible for reviewing and validating responses generated by the system to ensure they are accurate and appropriate.

7. Limitations

Understanding GitHub Copilot Chat's limitations is crucial to determining whether it is being used within safe and effective boundaries. While we encourage customers to leverage GitHub Copilot Chat in their innovative solutions or applications, GitHub Copilot Chat was not designed for every possible scenario. We encourage users to refer to GitHub Terms as well as the following considerations when choosing a use case:

8. Evaluations

Performance and safety evaluations assess whether AI applications are operating reliably and securely by examining factors like groundedness, relevance, and coherence while identifying the risks of generating harmful content. The following evaluations were conducted with safety components already in place, which are also described in 9. Safety components and mitigations.

Performance and quality evaluations

GitHub Copilot Chat AI features are evaluated using a combination of industry-standard benchmarks (e.g., SWE-Bench) and internally developed evaluation suites. Benchmark tasks are sourced from public open-source repositories and synthetic scenarios; no real user queries or customer code are used. Each evaluation includes multiple independent runs to account for nondeterminism in model outputs. Key metrics include resolution rate (percentage of tasks successfully completed), token efficiency, latency, and tool call reliability. Models are re-evaluated when updates are made and monitored continuously in production via error rates, response latency, and aggregate usage patterns.
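
As a rough illustration of how resolution rate might be aggregated over multiple independent runs, the sketch below computes the per-run percentage of completed tasks and then summarizes the spread across runs. The function names and the choice of summary statistics are assumptions made for this example; the source does not describe how GitHub aggregates its runs.

```python
from statistics import mean, pstdev


def resolution_rate(results: list[bool]) -> float:
    """Percentage of tasks successfully completed in a single run."""
    if not results:
        raise ValueError("a run must contain at least one task result")
    return 100.0 * sum(results) / len(results)


def summarize_runs(runs: list[list[bool]]) -> dict[str, float]:
    """Summarize resolution rate across independent runs of the same suite.

    Reporting the spread alongside the mean makes run-to-run
    nondeterminism visible (assumed aggregation, for illustration).
    """
    rates = [resolution_rate(run) for run in runs]
    return {
        "mean": mean(rates),
        "spread": pstdev(rates),
        "min": min(rates),
        "max": max(rates),
    }
```

For example, three runs that resolve 2, 1, and 3 of the same 4 tasks have per-run rates of 50%, 25%, and 75%, so the mean is 50% while the minimum and maximum show how far a single run can deviate.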

Performance and quality evaluation methods

New models for GitHub Copilot Chat undergo a staged evaluation process before deployment. Integrator teams run benchmark suites specific to their surface, testing the model on representative coding tasks such as bug fixes, code generation, and multi-file refactoring. Results are reviewed against established baselines and existing production models. Models must meet or exceed baseline performance across key metrics such as resolution rate, token efficiency, and latency before advancing to the next stage. A cross-functional review board makes a formal go/no-go decision before any model is approved for user-facing deployment.
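
The baseline comparison described above could look something like the following check. This is a hedged sketch: the metric names, the direction in which each metric counts as "better", and the function `baseline_regressions` are all assumptions made for illustration, and the check covers only the automated comparison, not the review board's go/no-go decision.

```python
# Assumed metric directions: the source names the metrics but not how
# each one is compared, so these flags are hypothetical.
HIGHER_IS_BETTER = {
    "resolution_rate": True,    # more tasks resolved is better
    "tokens_per_task": False,   # fewer tokens is more efficient
    "latency_seconds": False,   # lower latency is better
}


def baseline_regressions(
    candidate: dict[str, float], baseline: dict[str, float]
) -> list[str]:
    """Return the metrics on which the candidate falls short of the baseline.

    An empty list means the candidate meets or exceeds the baseline on
    every tracked metric and could advance to the next stage.
    """
    failures = []
    for name, higher_is_better in HIGHER_IS_BETTER.items():
        cand, base = candidate[name], baseline[name]
        meets = cand >= base if higher_is_better else cand <= base
        if not meets:
            failures.append(name)
    return failures
```

Returning the list of failing metrics, instead of a single boolean, gives reviewers a concrete starting point when a candidate model is held back.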

Risk and safety evaluations

Evaluating potential risks associated with AI-generated content is essential for safeguarding against content risks with varying degrees of severity. This includes evaluating an AI application's predisposition towards generating harmful content or testing vulnerabilities to jailbreak attacks. For GitHub, we conduct performance evaluations, including those which are adapted for coding purposes from Microsoft Foundry:

Evaluation data for quality and safety

Our evaluation data is custom-built to assess AI application performance across key areas of safety and quality, simulating real-world scenarios and risks. We begin by identifying relevant aspects of concern based on multi-disciplinary research and expert input. These concerns are translated into targeted evaluation objectives and guide the formulation of evaluation metrics. For safety, we create adversarial prompts to elicit undesirable or edge-case responses, which are then scored using AI-assisted annotators trained to assess alignment with GitHub’s standards. For quality, we craft rubric-based prompts relevant to scenarios such as the evaluation of retrieval-augmented generation (RAG) applications and agents. Datasets are curated from diverse sources, including synthetic and public datasets, to simulate real-world user scenarios. Using the curated datasets, both kinds of evaluation undergo iterative refinement and human alignment to improve metric efficacy and reliability. This methodology forms the foundation of repeatable, rigorous assessments that reflect how customers use evaluations to build better AI.

Custom evaluations

As part of our product development process, we undertake red teaming to understand and improve the safety of GitHub Copilot Chat. When enabled, input prompts and output completions are run through content filters.

9. Safety components and mitigations

10. Best practices for deploying and adopting GitHub Copilot Chat

Responsible AI is a shared commitment between GitHub and its customers. While GitHub builds AI applications with safety, fairness, and transparency at the core, customers play a critical role in deploying and using these technologies responsibly within their own contexts. To support this partnership, we offer the following best practices for deployers and end users to help customers implement responsible AI effectively.

11. Learn more about GitHub Copilot Chat

For additional guidance on the responsible use of GitHub Copilot Chat, we recommend reviewing the following documentation:

Learn more about responsible AI