OpenAI API tutorial: How to use AI prompt chaining
There's much ado about AI in the enterprise, but to truly be a more efficient and productive developer with it, you must learn how to make AI models do what you want.
This tutorial explains how to use the OpenAI API to synthesize information from a variety of models into a single, useful response. To accomplish that, we'll tackle the following tasks:
- Ask the AI model to compile and return a list of animals mentioned in the William Shakespeare plays Othello and The Winter's Tale.
- Incorporate that response data into a second prompt for another AI model, to create an illustration with the animals in the list and in a selected graphic rendering style, such as science fiction or realism.
All the concepts and techniques discussed are demonstrated in a Node.js project stored in a GitHub repository.
We'll start by providing a brief overview of how to work with the OpenAI API.
Working with the OpenAI API
The OpenAI API provides artificial intelligence services to developers at a programming level. Developers can interact with the OpenAI API in code in the same way that end users interact with the ChatGPT website, which is also published by OpenAI.
With the OpenAI API, developers can perform numerous tasks, such as natural language processing, language translation, code generation and completion, and image generation. For this article, we'll use the OpenAI API to research some specific data, and generate an image based on the results of that research.
Figure 1 shows the workflow of the demonstration use case discussed in this tutorial.
Figure 1. The workflow of the demonstration use case described in this article.
The first step in using the OpenAI API is to register at its website, provide some profile information and receive an API key that allows your code to access the API.
Figure 2. Creating an API key in the OpenAI website.
The OpenAI API is not a free service. You must provide credit card information as part of the setup process. You will be charged according to the volume and complexity of calls made to the API. The cost to experiment with the OpenAI API is minimal, however. Typical experimentation will require a small investment of about $10.
Figure 3. Users are billed for using the OpenAI API.
Once you get an API key and provide payment information, you're ready to program using the various models that are accessible via the OpenAI API.
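One common way to hand the key to your code is through an environment variable rather than a hard-coded string. The sketch below assumes the key is stored in OPENAI_API_KEY, the variable name the official openai npm package reads by default; getApiKey is a hypothetical helper and is not part of the tutorial's repository.

```javascript
// Minimal sketch: read the API key from the environment and fail early
// with a clear message when it is missing. getApiKey is a hypothetical
// helper; OPENAI_API_KEY is the variable name assumed here.
function getApiKey(env = process.env) {
  const key = (env.OPENAI_API_KEY || '').trim();
  if (!key) {
    throw new Error('OPENAI_API_KEY is not set; create a key on the OpenAI website first.');
  }
  return key;
}

// Typical use, assuming `npm install openai` has been run:
//   const OpenAI = require('openai');
//   const openai = new OpenAI({ apiKey: getApiKey() });
```

Keeping the key out of source files also keeps it out of the GitHub repository.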
Working with AI models
As previously mentioned, developers use OpenAI to write code using one or more AI models provided by the API. For example, the GPT series of models supports language processing, so developers can use it to pose queries or execute instructions in a natural language format, such as "show the sum of the numbers two plus two." The Dall-E model is used to generate images, while the Codex model is used for code completion.
Developers work with the OpenAI API via language-specific libraries and SDKs. Once you install an OpenAI API library in your programming environment, you select a particular model to use for a particular task. Figure 4 shows some of the models available in the Node.js OpenAI npm package.
Figure 4. Selecting a model when coding the OpenAI package in Node.js.
The demonstration code, discussed in the following section, uses two models. One is GPT-4 Turbo, which we'll use to discover the animals mentioned in the plays Othello and The Winter's Tale. The other model is Dall-E 3, which will generate the image that contains the animals in the list. We'll use both models in a chained manner.
Programming using a model chain
The code block in Figure 5 shows the Node.js code for the function generateImage(selectedStyle). The function encapsulates all the logic that directs the OpenAI models to discover the list of animals mentioned in Shakespeare's plays and then generate an image that includes those animals. (Again, all of the code for this is in the aforementioned GitHub repository.)
Notice that the calls to models in the OpenAI API are made in a chained manner. The first two calls, made to the gpt-4-turbo model, are at Line 17 to get the animals in the plays and Line 27 to transform the information in JSON format into a simple list of animals. Line 40 calls the dall-e-3 model to generate the image that includes those animals according to a specific graphical style, and the response is a URL within the OpenAI ecosystem that references the generated image.
Also notice that the calls to the models use natural language prompts, created at Lines 13, 24 and 37. A key feature of the OpenAI API is that it accepts programming statements expressed as natural language. This novel approach to software development will have significant implications as the paradigm matures.
Figure 5. This code prompts the OpenAI models for a list of animals and generates an image based on those animals.
Lastly, notice that a prompt for a particular call to the OpenAI API is declared within a JSON object that is passed as a parameter to the particular API method. That JSON object's configuration depends on the API function called. For example, the call at lines 17-21 declares the prompt as the value of the messages property of the JSON parameter:
let response = await openai.chat.completions.create({
  model: 'gpt-4-turbo',
  messages: [{"role": "user", "content": query_1}],
  max_tokens: 300
});
By contrast, the call at lines 40-45 uses the prompt property to declare the prompt:
const imageResponse = await openai.images.generate({
  model: 'dall-e-3',
  prompt: imagePrompt,
  n: 1,
  size: "1024x1024"
});
Also notice that each call names the model it uses in the model property.
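To make the chaining pattern in this section concrete, here is a minimal sketch of how the three calls can feed one another. It is not the repository's generateImage function: the name generateImageUrl, the prompt wording and the injected client parameter are assumptions made for illustration. The client only needs to look like the OpenAI Node client (chat.completions.create and images.generate), which lets the chain run against a stub without spending API credit.

```javascript
// Sketch of a three-step prompt chain. `client` is any object shaped like
// the OpenAI Node client. The prompt wording below is hypothetical.
async function generateImageUrl(client, selectedStyle) {
  // Step 1: ask the language model for the animals, as JSON.
  const query1 = 'Return a JSON object listing the animals mentioned in the ' +
    "Shakespeare plays Othello and The Winter's Tale.";
  const first = await client.chat.completions.create({
    model: 'gpt-4-turbo',
    messages: [{ role: 'user', content: query1 }],
    max_tokens: 300,
  });
  const animalsJson = first.choices[0].message.content;

  // Step 2: feed the first answer into a second prompt to get a plain list.
  const query2 = `List all the animals in this JSON object: ${animalsJson}`;
  const second = await client.chat.completions.create({
    model: 'gpt-4-turbo',
    messages: [{ role: 'user', content: query2 }],
    max_tokens: 300,
  });
  const animalList = second.choices[0].message.content;

  // Step 3: feed the list into the image model's prompt.
  const imagePrompt = `Create an illustration in the ${selectedStyle} style ` +
    `that includes these animals: ${animalList}`;
  const imageResponse = await client.images.generate({
    model: 'dall-e-3',
    prompt: imagePrompt,
    n: 1,
    size: '1024x1024',
  });

  // The image API responds with a URL that references the generated image.
  return { animalList, imageUrl: imageResponse.data[0].url };
}
```

Each step uses the text returned by the previous one as part of its own prompt, which is what makes this a chain rather than three independent calls.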
Running the code
The demonstration code uses the npm package readline to facilitate console interaction with the application in a terminal window. The following shows the first console interaction, asking the user to declare a graphic style for the image that will be generated.
Choose an image style according to its number:
1. science fiction
2. anime
3. fantasy
4. abstract
5. realism
Enter your choice (1-5): 1
You selected science fiction, generating image...
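A console menu like the one above can be sketched as follows. This sketch uses Node's built-in readline module; the names STYLES, parseStyleChoice and askForStyle are hypothetical and are not taken from the demonstration code.

```javascript
const readline = require('readline');

// Styles in the order the menu shows them.
const STYLES = ['science fiction', 'anime', 'fantasy', 'abstract', 'realism'];

// Map raw console input such as "1" to a style name, or null if invalid.
function parseStyleChoice(input) {
  const trimmed = String(input).trim();
  if (!/^\d+$/.test(trimmed)) return null;
  const n = Number(trimmed);
  return n >= 1 && n <= STYLES.length ? STYLES[n - 1] : null;
}

// Print the menu and resolve with the chosen style; asks again on bad input.
function askForStyle(input = process.stdin, output = process.stdout) {
  const rl = readline.createInterface({ input, output });
  const menu = 'Choose an image style according to its number:\n' +
    STYLES.map((s, i) => `${i + 1}. ${s}`).join('\n') +
    `\nEnter your choice (1-${STYLES.length}): `;
  return new Promise((resolve) => {
    const ask = () => rl.question(menu, (answer) => {
      const style = parseStyleChoice(answer);
      if (style === null) return ask();
      rl.close();
      resolve(style);
    });
    ask();
  });
}
```

Separating the parsing from the prompting keeps the input validation easy to check without a terminal.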
Next is the result of prompting the GPT-4 Turbo model in the OpenAI API:
The animals are:
Here is the list of all the animals mentioned in the plays *Othello* and *The Winter's Tale* according to the JSON object provided:
**Othello:**
1. Horse
2. Cat
3. Goat
4. Monkey
5. Wolf
6. Bear
7. Flies
8. Toad
**The Winter's Tale:**
1. Sheep
2. Lamb
3. Bear
4. Camel
5. Deer
6. Fish
7. Camelot (Note: "Camelot" might be a misentry, as it is generally known as a castle associated with King Arthur rather than an animal. This could be an error in the original data.)
8. Sparrow
9. Lark
10. Crow
11. Calf
12. Ox
13. Rook
14. Ule (Note: "Ule" might be a typographical or transcription error unless it refers to a specific context within the play that is not commonly known. More commonly, this might be referring to an owl.)
This comprehensive list from the JSON object details the animals depicted, adding thematic layering and symbolism within these Shakespearean plays.
Here is the result of prompting the dall-e-3 model to create the image based on the selected graphical style and the list of animals discovered in the plays Othello and The Winter's Tale:
The image URL returned by the OpenAI API is temporary and will expire after a few hours. To keep the image available for long-term use, save it to your local machine.
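Saving the image can be sketched as a small download helper. The sketch assumes Node 18 or later, where fetch is available globally; saveImage and its fetchImpl parameter are hypothetical names introduced here so the helper can be exercised without a network connection.

```javascript
const fs = require('fs/promises');

// Download the image behind a temporary URL and write it to disk.
// fetchImpl defaults to the global fetch available in Node 18 and later.
async function saveImage(url, filePath, fetchImpl = globalThis.fetch) {
  const res = await fetchImpl(url);
  if (!res.ok) {
    throw new Error(`Download failed with HTTP status ${res.status}`);
  }
  // Convert the response body to a Buffer before writing it out.
  const bytes = Buffer.from(await res.arrayBuffer());
  await fs.writeFile(filePath, bytes);
  return filePath;
}

// Typical use with the URL returned by the image call:
//   await saveImage(imageUrl, './animals.png');
```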
And last but certainly not least, Figure 6 is the illustration of the animals in the list, rendered in a science-fiction style.
Figure 6. The science fiction-style illustration created by the Dall-E 3 model through the OpenAI API, based on the list of animals returned by the earlier prompts.
The pluses and minuses of the OpenAI API
Working with the OpenAI API is straightforward. Accessing the API requires minimal setup time and very little out-of-pocket expense.
However, programming with the OpenAI API is imperfect. At this time, the result of a complex prompt is not deterministic, which means the same complex prompt can return a different result at different times. For example, the prompt "show me the sum of two plus two" will always return four, but the prompt "show me all the animals mentioned in the Shakespeare plays Othello and The Winter's Tale" returns results that differ significantly from one run to the next. This makes sense given the mechanics of prompt processing: the models generate text by sampling from a probability distribution, so an open-ended prompt leaves more room for variation.
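One setting that affects this variation is the temperature parameter of a chat completions request. The sketch below builds a request with temperature set to 0; buildChatRequest is a hypothetical helper, not part of the demonstration code. Lower values generally make responses more repeatable, but they do not guarantee identical output on every call.

```javascript
// Build a chat completions request that asks for less random sampling.
// `temperature` is a standard request parameter; buildChatRequest and its
// option names are hypothetical.
function buildChatRequest(prompt, { model = 'gpt-4-turbo', temperature = 0, maxTokens = 300 } = {}) {
  return {
    model,
    messages: [{ role: 'user', content: prompt }],
    max_tokens: maxTokens,
    temperature,
  };
}

// Typical use:
//   const response = await openai.chat.completions.create(buildChatRequest(query));
```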
Keep in mind, however, that AI technology is still maturing. In the early days of voice recognition, users had to train the recognition engine on their own voice, but today voice recognition engines convert audio spoken by any native speaker into text without training. We can expect the same degree of improvement in artificial intelligence technology in general, and in the OpenAI API in particular.
Using the OpenAI API opens up a host of new possibilities in software development. The power of natural language interaction, along with the scope of intelligence each model provides, enables developers to quickly create versatile and powerful applications. Chaining AI prompts and models together increases the power of such applications.
The OpenAI API is a game-changer that puts the power of artificial intelligence at the fingertips of all developers. This opens up a new path for computer programming that will dramatically benefit developers and those they serve.
Bob Reselman is a software developer, system architect and writer. His expertise ranges from software development technologies to techniques and culture.