
Expand
titleThe available models

GPT : Generative Pre-trained Transformer 

Expand
titleGPT-4
  • Latest model
  • With broad general knowledge and domain expertise, GPT-4 can follow complex instructions in natural language and solve difficult problems with greater accuracy.
  • GPT-4 is more creative and collaborative than ever before. It can generate, edit, and iterate with users on creative and technical writing tasks, such as composing songs, writing screenplays, or learning a user’s writing style.

  • Following the research path from GPT, GPT-2, and GPT-3, the deep learning approach leverages more data and more computation to create increasingly sophisticated and capable language models.
  • OpenAI spent six months making GPT-4 safer and more aligned.

  • GPT-4 is 82% less likely to respond to requests for disallowed content and 40% more likely to produce factual responses than GPT-3.5 on OpenAI's internal evaluations.

  • Price list for GPT-4  (Multiple models, each with different capabilities and price points. Prices are per 1,000 tokens. You can think of tokens as pieces of words, where 1,000 tokens is about 750 words.)

    Model        | Prompt             | Completion
    8K context   | $0.03 / 1K tokens  | $0.06 / 1K tokens
    32K context  | $0.06 / 1K tokens  | $0.12 / 1K tokens
  • GPT-4 models

    LATEST MODEL   | DESCRIPTION | MAX TOKENS | TRAINING DATA
    gpt-4          | More capable than any GPT-3.5 model, able to do more complex tasks, and optimized for chat. Will be updated with our latest model iteration. | 8,192 tokens | Up to Sep 2021
    gpt-4-0314     | Snapshot of gpt-4 from March 14th 2023. Unlike gpt-4, this model will not receive updates, and will only be supported for a three month period ending on June 14th 2023. | 8,192 tokens | Up to Sep 2021
    gpt-4-32k      | Same capabilities as the base gpt-4 model but with 4x the context length. Will be updated with our latest model iteration. | 32,768 tokens | Up to Sep 2021
    gpt-4-32k-0314 | Snapshot of gpt-4-32k from March 14th 2023. Unlike gpt-4-32k, this model will not receive updates, and will only be supported for a three month period ending on June 14th 2023. | 32,768 tokens | Up to Sep 2021


    For many basic tasks, the difference between GPT-4 and GPT-3.5 models is not significant. However, in more complex reasoning situations, GPT-4 is much more capable than any of our previous models.

  • Limitation:

          GPT-4 is currently in a limited beta and only accessible to those who have been granted access. To use this API, we need to join the waitlist and will be given access when capacity becomes available.
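
  For anyone who has been granted access, a GPT-4 request looks like any other chat-completion call. The snippet below is a minimal sketch rather than official sample code: it assumes the pre-1.0 openai Python package, a placeholder API key, and it simply applies the 8K-context prices from the table above to the token usage reported in the response.

      import openai

      openai.api_key = "sk-..."  # placeholder; assumes a key that has been granted GPT-4 access

      response = openai.ChatCompletion.create(
          model="gpt-4",  # 8K-context model from the table above
          messages=[
              {"role": "system", "content": "You are a helpful assistant."},
              {"role": "user", "content": "Summarize the differences between GPT-4 and GPT-3.5."},
          ],
      )
      print(response["choices"][0]["message"]["content"])

      # Rough cost estimate using the 8K-context prices above
      # ($0.03 / 1K prompt tokens, $0.06 / 1K completion tokens).
      usage = response["usage"]
      cost = usage["prompt_tokens"] / 1000 * 0.03 + usage["completion_tokens"] / 1000 * 0.06
      print(f"Approximate cost: ${cost:.4f}")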

Expand
titleGPT-3.5

GPT-3.5 models can understand and generate natural language or code. OpenAI's most capable and cost-effective model in the GPT-3.5 family is gpt-3.5-turbo, which has been optimized for chat but works well for traditional completions tasks as well.

LATEST MODEL       | DESCRIPTION | MAX TOKENS | TRAINING DATA
gpt-3.5-turbo      | Most capable GPT-3.5 model and optimized for chat at 1/10th the cost of text-davinci-003. Will be updated with our latest model iteration. | 4,096 tokens | Up to Sep 2021
gpt-3.5-turbo-0301 | Snapshot of gpt-3.5-turbo from March 1st 2023. Unlike gpt-3.5-turbo, this model will not receive updates, and will only be supported for a three month period ending on June 1st 2023. | 4,096 tokens | Up to Sep 2021
text-davinci-003   | An advanced language model developed by OpenAI. Can do any language task with better quality, longer output, and consistent instruction-following than the curie, babbage, or ada models. Also supports inserting completions within text. | 4,097 tokens | Up to Jun 2021
text-davinci-002   | Similar capabilities to text-davinci-003 but trained with supervised fine-tuning instead of reinforcement learning. | 4,097 tokens | Up to Jun 2021
code-davinci-002   | Optimized for code-completion tasks. Now deprecated. | 8,001 tokens | Up to Jun 2021

Models referred to as "GPT 3.5"

The GPT-3.5 series is a set of models trained on a blend of text and code from before Q4 2021. The following models are in the GPT-3.5 series:

    1. code-davinci-002 is a base model, so good for pure code-completion tasks
    2. text-davinci-002 is an InstructGPT model based on code-davinci-002
    3. text-davinci-003 is an improvement on text-davinci-002
    4. gpt-3.5-turbo-0301 is an improvement on text-davinci-003, optimized for chat


We recommend using gpt-3.5-turbo over the other GPT-3.5 models because of its lower cost.

Experimenting with gpt-3.5-turbo is a great way to find out what the API is capable of doing. After you have an idea of what you want to accomplish, you can stay with gpt-3.5-turbo or try another model and optimize around its capabilities.

Note: OpenAI models are non-deterministic, meaning that identical inputs can yield different outputs. Setting temperature to 0 will make the outputs mostly deterministic, but a small amount of variability may remain.
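
A quick way to see this in practice is to send the same prompt twice with temperature=0 and compare the two outputs. The sketch below assumes the pre-1.0 openai Python package and a placeholder API key; the prompt is just an example.

    import openai

    openai.api_key = "sk-..."  # placeholder key

    def ask(prompt: str) -> str:
        # temperature=0 makes sampling greedy, so repeated calls with the
        # same input should give mostly identical outputs.
        response = openai.ChatCompletion.create(
            model="gpt-3.5-turbo",
            messages=[{"role": "user", "content": prompt}],
            temperature=0,
        )
        return response["choices"][0]["message"]["content"]

    first = ask("Name three backend web frameworks.")
    second = ask("Name three backend web frameworks.")
    print("Identical outputs:", first == second)  # usually True, but not guaranteed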



Expand
titleGPT-3
  • GPT-3 models can understand and generate natural language.
  • These models were superseded by the more powerful GPT-3.5 generation models.
  • However, the original GPT-3 base models (davinci, curie, ada, and babbage) are currently the only models available to fine-tune.
  • Fine-tuning means training one of the GPT-3 base models on our own data (a minimal sketch follows below).
  • Prices:
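
  The sketch below shows what that fine-tuning flow looks like with the pre-1.0 openai Python package and the legacy fine-tunes endpoint; the training file name and its contents are hypothetical, and the exact call names should be treated as an assumption based on the documentation of that era rather than the current API.

      import openai

      openai.api_key = "sk-..."  # placeholder key

      # training.jsonl (hypothetical file), one JSON object per line in the
      # legacy prompt/completion format:
      #   {"prompt": "Question about our product ->", "completion": " Answer text.\n"}

      # 1. Upload the training data.
      upload = openai.File.create(file=open("training.jsonl", "rb"), purpose="fine-tune")

      # 2. Start a fine-tune job on one of the GPT-3 base models
      #    (davinci, curie, babbage, or ada).
      job = openai.FineTune.create(training_file=upload["id"], model="davinci")
      print("Fine-tune job id:", job["id"])

      # 3. Once the job finishes, the resulting fine-tuned model name can be
      #    used with the completions endpoint like any other model.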

         

Expand
titletext-davinci-002 vs text-davinci-003
  • Enhanced capabilities of text-davinci-003

The main difference between text-davinci-002 and text-davinci-003 is that the latter has been trained on a larger dataset, allowing it to generate more accurate results. Additionally, text-davinci-003 can better understand natural language instructions and produce more detailed responses. It also has improved capabilities for summarizing long documents and generating coherent paragraphs from multiple sources. Furthermore, OpenAI claims that text-davinci-003 can generate longer content with greater clarity and engagement.


  • Strengths and Weaknesses of Each Model

The output from text-davinci-002 was a series of short, concise sentences that were easy to read and understand. On the other hand, text-davinci-003 produced longer, more complex sentences with more intricate language. Both outputs had their own unique style and could be used for different purposes depending on the context. It is clear that both models have their strengths and weaknesses when it comes to text generation. Note: text-davinci-003 takes longer than the previous version to process prompts.

  • text-davinci-003 includes the following improvements:

    • It produces higher quality writing. This will help your applications deliver clearer, more engaging, and more compelling content.
    • It can handle more complex instructions, meaning you can get even more creative with how you make use of its capabilities now.
    • It’s better at longer form content generation, allowing you to take on tasks that would have previously been too difficult to achieve.
    Expand
    titleTo illustrate the improvements, check out the following example:
    • text-davinci-002:

    Prompt: "I am trying to write a web application. What backend frameworks could I use, and what are their pros and cons?"

    There are many backend frameworks that could be used for a web application. Some popular one

    • text-davinci-003:

    Prompt: "I am trying to write a web application. What backend frameworks could I use, and what are their pros and cons?"

    1. Node.js: Node.js is a JavaScript-based runtime environment that is popular for writing backend applications. Pros: Node.js is fast, lightweight, and easy to learn, and it has a large community of developers. Cons: Node.js can be difficult to debug and there is a lack of robust tools for building large-scale applications.
    2. Django: Django is a popular Python-based web framework. Pros: Django is powerful and well-documented, making it easy to get started. It also supports a wide range of databases and APIs, allowing you to easily integrate with other services. Cons: Django can be difficult to scale for large applications and can be slow for certain types of requests.
    3. Ruby on Rails: Ruby on Rails is a popular web framework based on the Ruby programming language. Pros: Ruby on Rails is easy to learn and can be used to quickly build powerful web applications. It also has a wide range of built-in features, such as authentication and authorization. Cons: Ruby on Rails can be slow and resource-intensive, and it can be difficult to scale.

    Users can access `text-davinci-003` through the OpenAI API and Playground, at the same price as their other Davinci base language models ($0.0200 / 1K tokens).
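
    As a rough sketch of how the example prompt above maps onto the API: text-davinci-003 is called through the completions endpoint rather than the chat endpoint, and a larger max_tokens value leaves room for the longer-form answer shown above. This assumes the pre-1.0 openai Python package and a placeholder API key.

        import openai

        openai.api_key = "sk-..."  # placeholder key

        response = openai.Completion.create(
            model="text-davinci-003",
            prompt="I am trying to write a web application. "
                   "What backend frameworks could I use, and what are their pros and cons?",
            max_tokens=512,   # leave room for the longer, more detailed answer
            temperature=0.7,
        )
        print(response["choices"][0]["text"])

        # Cost at the Davinci rate quoted above ($0.0200 / 1K tokens, applied
        # to prompt + completion tokens for the completions endpoint).
        total_tokens = response["usage"]["total_tokens"]
        print(f"Approximate cost: ${total_tokens / 1000 * 0.02:.4f}")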

Expand
titlegpt-3.5-turbo vs text-davinci-003
  • OpenAI's new GPT-3.5 Turbo model offers a cost-effective option for many non-chat use cases.
  • While GPT-3.5 Turbo performs well in 0-shot classification and math, Davinci-003 performs slightly better in k-shot classification and may be a better option for those looking for clear, concise responses that get straight to the point.
  • A comparison for different tasks can be found below:
    • Expand
      titleClassification

      From our previous experiments, we concluded that Davinci-003 performed much better than Davinci-002 with 0-shot prompts (92% vs 69.91%), and on par with or slightly worse than Davinci-002 on few-shot prompts (87.2% vs 91.6%). We observe a similar case with GPT-3.5 Turbo and Davinci-003.

      GPT-3.5 Turbo performs better on 0-shot classification: on an insider trading classification example it achieves 91.72% accuracy, versus 82.02% for Davinci-003.

    • Expand
      titleText Generation

      If you're looking for an AI language model that can generate long, detailed answers to complex questions, GPT-3.5 Turbo might seem like the obvious choice since it's been trained on a massive dataset of human language and produces coherent, contextually-appropriate responses.

      But what if you're looking for a model that can provide clear, concise responses? In that case, you might want to consider using Davinci-003 instead.

      In our experiment, we provided both GPT-3.5 Turbo and Davinci-003 with a set of 30 questions, and asked each model to provide 30 answers. What we found was that GPT-3.5 Turbo’s responses tended to be much longer than Davinci-003's, with many of them exceeding 100 words. In contrast, Davinci-003's responses were much more concise, often consisting of just a few sentences. Given a max_tokens setting of 500, the average response from GPT-3.5 Turbo was 156 words (~208 tokens), nearly twice that of Davinci-003, whose responses averaged 83 words (~111 tokens).

    • Expand
      titleMath

      Notably, GPT-3.5 Turbo is significantly better than Davinci-003 at math. When evaluating by exact match with an answer key, GPT-3.5 Turbo gets the correct numerical answer 75% of the time, while Davinci-003 only gets 61% correct.

    • Overall, GPT-3.5 Turbo performs better than Davinci-003 on most tasks at 10% of the cost, but its answers are much wordier. You can test whether GPT-3.5 Turbo is better for your use case through Spellbook.
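
    To make the "10% of the cost" point concrete, here is a back-of-the-envelope sketch using the average completion lengths measured above; the 100-token prompt is hypothetical, the Davinci rate is the $0.0200 / 1K tokens quoted earlier on this page, and the gpt-3.5-turbo rate is the 1/10th of that implied by the model table.

        # Average completion lengths from the experiment above:
        # ~208 tokens for gpt-3.5-turbo, ~111 tokens for text-davinci-003.
        PROMPT_TOKENS = 100  # hypothetical prompt length

        turbo_price = 0.002 / 1000    # 1/10th of the Davinci rate, per token
        davinci_price = 0.020 / 1000  # $0.0200 / 1K tokens, per token

        turbo_cost = (PROMPT_TOKENS + 208) * turbo_price
        davinci_cost = (PROMPT_TOKENS + 111) * davinci_price

        print(f"gpt-3.5-turbo:    ${turbo_cost:.5f} per request")    # ~$0.00062
        print(f"text-davinci-003: ${davinci_cost:.5f} per request")  # ~$0.00422
        # Even with its wordier answers, gpt-3.5-turbo comes out far cheaper per request.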

Expand
titleGPT-4 vs. ChatGPT-3.5
  • GPT-4 and gpt-3.5-turbo API cost comparison and understanding
  • Below you can find a list of reference links:
  • Expand
    titleWhat Is GPT 3.5?

    GPT 3.5 is, as the name suggests, a sort of bridge between GPT-3 and GPT-4. OpenAI hasn’t really been particularly open about what makes GPT 3.5 specifically better than GPT 3, but it seems that the main goals were to increase the speed of the model and perhaps most importantly to reduce the cost of running it.

    Interestingly, what OpenAI has made available to users isn’t the raw core GPT 3.5, but rather several specialized offshoots. For example, GPT 3.5 Turbo is a version that’s been fine-tuned specifically for chat purposes, although it can generally still do all the other things GPT 3.5 can.

  • Expand
    titleWhat Is GPT 4?

    OpenAI’s GPT-4 has emerged as their most advanced language model yet, offering safer and more effective responses. This cutting-edge, multimodal system accepts both text and image inputs and generates text outputs, showcasing human-level performance on an array of professional and academic benchmarks.

    When comparing GPT-3 and GPT-4, the difference in their capabilities is striking. GPT-4 has enhanced reliability, creativity, and collaboration, as well as a greater ability to process more nuanced instructions. This marks a significant improvement over the already impressive GPT-3, which often made logic and other reasoning errors with more complex prompts.

    Expand
    titleOther differences:
    • Another key distinction between the two models lies in their size. GPT-3 boasts a remarkable 175 billion parameters, while GPT-4 takes it a step further with a (rumored) 1 trillion parameters.
      • GPT-4 is 10 times more advanced than its predecessor, GPT-3.5. This enhancement enables the model to better understand context and distinguish nuances, resulting in more accurate and coherent responses.
      • Furthermore, GPT-4 has a maximum token limit of 32,000 (equivalent to 25,000 words), which is a significant increase from GPT-3.5’s 4,000 tokens (equivalent to 3,125 words).


    • While GPT-4 is not perfect, the measures it adopts to ensure safer responses are a welcome upgrade from those of the GPT-3.5 model. With GPT-3.5, OpenAI took a more moderation-based approach to safety. In other words, some of the safety measures were more of an afterthought. OpenAI monitored what users did and the questions they asked, identified flaws, and tried to fix them on the go.


    • One of GPT-3.5's flaws is its tendency to produce nonsensical and untruthful information confidently. In AI lingo, this is called "AI hallucination" and can cause distrust of AI-generated information. In GPT-4, hallucination is still a problem. However, according to the GPT-4 technical report, the new model is 19% to 29% less likely to hallucinate when compared to the GPT-3.5 model. But this isn't just about the technical report. Responses from the GPT-4 model on ChatGPT are noticeably more factual.


    • A less talked about difference between GPT-4 and GPT-3.5 is the context window and context size. A context window is how much data a model can retain in its "memory" during a chat session and for how long. GPT-4 has a significantly better context size and window than its predecessor model.


    • Another issue is the limitation on the volume of text you can use in a prompt at once. Summarizing long text using GPT-3 typically means splitting the text into multiple chunks and summarizing them bit by bit. The improvement in context length in the GPT-4 model means you can paste entire PDFs in one go and get the model to summarize them without splitting them into chunks (a chunking sketch follows after this list).


    • GPT-4 has a much larger model size, which means it can handle more complex tasks and generate more accurate responses. This is thanks to its more extensive training dataset, which gives it a broader knowledge base and improved contextual understanding.


    • GPT-4 is better equipped to handle longer text passages, maintain coherence, and generate contextually relevant responses. For this reason, it’s an incredibly powerful tool for natural language understanding applications. It’s so complex that some researchers from Microsoft think it shows “Sparks of Artificial General Intelligence”, or AGI.


    • But there’s a downside, as with any cutting-edge technology. The significant advancements in GPT-4 come at the cost of increased computational power requirements. This makes it less accessible to smaller organizations or individual developers who may not have the resources to invest in such a high-powered machine.
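
    To illustrate the chunking point above, here is a minimal sketch assuming the pre-1.0 openai Python package and a placeholder API key; the chunk size is arbitrary and a real implementation would split on token counts rather than characters.

        import openai

        openai.api_key = "sk-..."  # placeholder key

        def summarize(text: str, model: str = "gpt-3.5-turbo") -> str:
            response = openai.ChatCompletion.create(
                model=model,
                messages=[{"role": "user", "content": f"Summarize the following text:\n\n{text}"}],
            )
            return response["choices"][0]["message"]["content"]

        def summarize_long_document(document: str, chunk_size: int = 8000) -> str:
            # With a small context window, the document has to be split into
            # chunks, each chunk summarized separately, and the partial
            # summaries then summarized once more.
            chunks = [document[i:i + chunk_size] for i in range(0, len(document), chunk_size)]
            partial = [summarize(chunk) for chunk in chunks]
            return summarize("\n\n".join(partial))

        # With a 32K-context model such as gpt-4-32k, a whole document can
        # often be summarized in a single call instead:
        #     summarize(document, model="gpt-4-32k")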


Expand
titleComparative Table:


Expand
titlePricing comparison

...