Questions tagged [gpt]

97 questions
19 votes · 7 answers

ChatGPT's Architecture - Decoder Only? Or Encoder-Decoder?

Does ChatGPT use an encoder-decoder architecture, or a decoder-only architecture? I have been coming across Medium and TowardsDataScience articles suggesting that it has an encoder-decoder architecture (see sources below): --…
user141493
18 votes · 1 answer

How does an LLM "parameter" relate to a "weight" in a neural network?

I keep reading about how the latest and greatest LLMs have billions of parameters. As someone who is more familiar with standard neural nets but is trying to better understand LLMs, I'm curious whether an LLM parameter is the same thing as an NN weight, i.e. is…
slim_wizard
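
The short answer is that "parameters" is the broader term: it covers the weight matrices plus biases, embeddings, and layer-norm tensors. A minimal PyTorch sketch (the toy layer sizes are arbitrary):

```python
# Minimal sketch: "parameters" = weights + biases (and, in transformers,
# embedding and layer-norm tensors as well).
import torch.nn as nn

layer = nn.Linear(in_features=4, out_features=3)  # toy layer

n_weights = layer.weight.numel()                       # 4 * 3 = 12
n_biases = layer.bias.numel()                          # 3
n_params = sum(p.numel() for p in layer.parameters())  # 15

print(n_weights, n_biases, n_params)  # 12 3 15
```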
14 votes · 3 answers

Why does everyone use BERT in research instead of LLaMA, GPT, PaLM, etc.?

It could be that I'm misunderstanding the problem space and the iterations of LLaMA, GPT, and PaLM are all based on BERT like many language models are, but every time I see a new paper on improving language models, it takes BERT as a base and adds…
Ethan
11 votes · 2 answers

Does BERT have any advantage over GPT-3?

I have read a couple of documents that explain in detail the edge that GPT-3 (Generative Pre-trained Transformer 3) has over BERT (Bidirectional Encoder Representations from Transformers). So I am curious to know whether BERT scores better…
Bipin
10 votes · 1 answer

How is GPT able to handle large vocabularies?

From what I understand, GPT and GPT-2 are trained to predict the $N^{th}$ word in a sentence given the previous $N-1$ words. When the vocabulary size is very large (100k+ words), how is the model able to generate any meaningful prediction? Shouldn't it…
AAC
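
Part of the answer is that the output layer's size is not itself a problem: a softmax over 100k logits is one normalization, and conditioning typically concentrates the probability mass on a handful of tokens. An illustrative NumPy sketch (the boosted index is made up):

```python
# Illustrative sketch: a softmax over a 100k-token vocabulary is just one
# matrix-vector product plus a normalization; size alone is not the issue.
import numpy as np

rng = np.random.default_rng(0)
vocab_size = 100_000
logits = rng.normal(size=vocab_size)  # stand-in for the model's output logits
logits[42] += 10.0                    # pretend the context strongly favors one token

probs = np.exp(logits - logits.max())  # subtract max for numerical stability
probs /= probs.sum()

print(probs.argmax(), probs[42])  # most of the mass sits on token 42
```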
10 votes · 1 answer

How to summarize a long text using GPT-3

What is the best way to summarize a long text that exceeds the 4,096-token limit (a podcast transcript, for example)? As I understand it, I need to split the text into chunks, summarize each, then concatenate the results and summarize those. Is there…
Poma
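
The chunk-then-recombine approach the asker describes is often called "map-reduce" summarization. A hedged sketch using the current OpenAI Python client; the model name and character-based chunk size are assumptions, not recommendations:

```python
# Hedged sketch of recursive ("map-reduce") summarization with the OpenAI
# Python client. Chunking by characters is a simplification; chunking by
# tokens would be more precise.
from openai import OpenAI

client = OpenAI()  # reads OPENAI_API_KEY from the environment

def summarize(text: str) -> str:
    resp = client.chat.completions.create(
        model="gpt-3.5-turbo",  # placeholder model name
        messages=[{"role": "user", "content": f"Summarize:\n\n{text}"}],
    )
    return resp.choices[0].message.content

def summarize_long(text: str, chunk_chars: int = 8_000) -> str:
    if len(text) <= chunk_chars:
        return summarize(text)
    chunks = [text[i:i + chunk_chars] for i in range(0, len(text), chunk_chars)]
    partial = "\n".join(summarize(c) for c in chunks)  # "map" step
    return summarize_long(partial, chunk_chars)        # "reduce" step, recursively
```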
10 votes · 1 answer

BERT vs GPT: architectural, conceptual and implementational differences

In the BERT paper, I learnt that BERT is an encoder-only model, that is, it involves only transformer encoder blocks. In the GPT paper, I learnt that GPT is a decoder-only model, that is, it involves only transformer decoder blocks. I was guessing what's…
Rnj
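
The core mechanical difference between the two block types is the attention mask. A minimal PyTorch sketch:

```python
# Sketch of the key difference the question asks about: BERT's encoder blocks
# let every position attend to every other position, while GPT's decoder
# blocks apply a causal (lower-triangular) mask.
import torch

seq_len = 4
bert_mask = torch.ones(seq_len, seq_len)             # full bidirectional attention
gpt_mask = torch.tril(torch.ones(seq_len, seq_len))  # position i sees only <= i

print(gpt_mask)
# tensor([[1., 0., 0., 0.],
#         [1., 1., 0., 0.],
#         [1., 1., 1., 0.],
#         [1., 1., 1., 1.]])
```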
9 votes · 1 answer

What tokenizer does OpenAI's GPT-3 API use?

I'm building an application for the API, but I would like to be able to count the number of tokens my prompt will use, before I submit an API call. Currently I often submit prompts that yield a 'too-many-tokens' error. The closest I got to an answer…
Herman Autore
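
For reference: the GPT-3 models use a byte-pair encoding, and OpenAI's tiktoken library exposes it, so tokens can be counted locally before an API call. A short sketch (the model name is one example; `encoding_for_model` picks the matching BPE):

```python
# Count tokens locally with OpenAI's tiktoken library (pip install tiktoken)
# before submitting a prompt, to avoid 'too-many-tokens' errors.
import tiktoken

enc = tiktoken.encoding_for_model("text-davinci-003")  # maps model -> encoding
prompt = "How many tokens will this prompt use?"
n_tokens = len(enc.encode(prompt))
print(n_tokens)
```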
7 votes · 1 answer

Masking during transformer inference?

I sent the following question to both ChatGPT and DeepSeek. "Let's say we're not training the large language model, we are inferencing. The model has already generated a sequence [A,B,C] and is about to predict the next token D. The model needs to perform…
OnCodeDeny
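
One way to see why no explicit mask is needed for the newest position during incremental decoding: its query attends only to cached keys, which are all in the past by construction. A toy PyTorch sketch (dimensions are arbitrary):

```python
# Incremental decoding sketch: the query for the next token attends to *all*
# cached positions, because no future positions exist in the cache yet.
import torch
import torch.nn.functional as F

d = 8
cached_keys = torch.randn(3, d)    # keys for the already-generated [A, B, C]
cached_values = torch.randn(3, d)
q_new = torch.randn(1, d)          # query for the position that will emit D

scores = q_new @ cached_keys.T / d ** 0.5  # shape (1, 3): nothing to mask
attn = F.softmax(scores, dim=-1)
context = attn @ cached_values             # (1, d) context vector used to predict D
print(context.shape)
```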
7 votes · 1 answer

How Exactly Does In-Context Few-Shot Learning Actually Work in Theory (Under the Hood), Despite only Having a "Few" Support Examples to "Train On"?

Recent models like the GPT-3 Language Model (Brown et al., 2020) and the Flamingo Visual-Language Model (Alayrac et al., 2022) use in-context few-shot learning. The models are able to make highly accurate predictions even when only presented with a…
user141493
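
For readers new to the term, here is what in-context few-shot learning concretely looks like at inference time: the support examples sit in the prompt and no weights are updated. The reviews below are made up for illustration:

```python
# A few-shot prompt: the "training examples" are just text the model
# conditions on; no gradient updates happen.
few_shot_prompt = """Classify the sentiment of each review.

Review: The film was a delight from start to finish.
Sentiment: positive

Review: Two hours of my life I will never get back.
Sentiment: negative

Review: The soundtrack alone is worth the ticket price.
Sentiment:"""
# The model is simply asked to continue this text; the "learning" happens
# entirely through conditioning on the prompt.
```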
5 votes · 5 answers

Is using GPT-4 to label data advisable?

If I have a lot of text data that needs to be labeled (e.g. sentiment analysis), and given the high accuracy of GPT-4, could I use it to label data? Or would that introduce bias or some other issues?
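
Mechanically this is straightforward; the open question is quality control. A hedged sketch with the OpenAI Python client (the prompt wording and model name are assumptions, and in practice a random sample of the labels should be audited against human annotators):

```python
# Hedged sketch of GPT-4 as a data labeler. Audit a sample of its outputs
# against human labels before trusting them at scale.
from openai import OpenAI

client = OpenAI()

def label_sentiment(text: str) -> str:
    resp = client.chat.completions.create(
        model="gpt-4",
        messages=[{
            "role": "user",
            "content": "Label the sentiment of this text as exactly one word: "
                       f"positive, negative, or neutral.\n\n{text}",
        }],
    )
    return resp.choices[0].message.content.strip().lower()

print(label_sentiment("The battery died after two days."))  # e.g. "negative"
```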
5 votes · 2 answers

What exactly are the parameters in GPT-3's 175 billion parameters?

What exactly are the parameters in GPT-3's 175 billion parameters? Are these the words in the text on which the model is trained?
user16584277
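
They are the learned weights, not training words. A back-of-the-envelope check using the sizes reported in the GPT-3 paper (96 layers, d_model = 12288) and the common 12 · n_layers · d_model² approximation for a transformer's attention and MLP weights (embeddings and biases ignored):

```python
# Rough parameter count for GPT-3 from its published layer sizes.
n_layers, d_model = 96, 12288
approx_params = 12 * n_layers * d_model ** 2
print(f"{approx_params:,}")  # 173,946,175,488 -- close to the reported 175B
```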
4 votes · 1 answer

What's the right input format for GPT-2 in NLP

I'm fine-tuning pre-trained GPT-2 for text summarization. The dataset contains 'text' and 'reference summary' fields. So my question is how to add special tokens to get the right input format. Currently I'm thinking of doing it like this: example1 text …
yuqiong11
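
One common format, sketched with the Hugging Face transformers library: join text and summary with a separator and end with EOS. The "TL;DR:" separator follows the GPT-2 paper's summarization prompt, but any consistent separator works; the example strings are placeholders:

```python
# Sketch of one conventional input format for GPT-2 summarization fine-tuning.
from transformers import GPT2Tokenizer

tokenizer = GPT2Tokenizer.from_pretrained("gpt2")
tokenizer.pad_token = tokenizer.eos_token  # GPT-2 has no pad token by default

text = "Long article text goes here ..."  # placeholder
summary = "Short reference summary ..."   # placeholder
example = f"{text}\nTL;DR: {summary}{tokenizer.eos_token}"

ids = tokenizer(example, return_tensors="pt")
# During training, the loss is usually masked so it is computed only on the
# summary tokens after "TL;DR:".
```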
4 votes · 1 answer

How do I prompt GPT-4 to look at a PDF in Jupyter Notebook?

I am a beginner. I purchased tokens to use GPT-4 and finally figured out how to import the GPT-4 model into my Jupyter Notebook. %env OPENAI_API_KEY= (my key goes here) !pip install --upgrade openai wandb from openai import OpenAI LLM =…
Mas
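
The model cannot open files itself; the usual approach is to extract the PDF's text first and place it in the prompt. A hedged sketch using the pypdf library (pip install pypdf openai); the file path is a placeholder, and the slice is a crude guard against exceeding the context window:

```python
# Extract PDF text with pypdf, then pass it to GPT-4 as part of the prompt.
from openai import OpenAI
from pypdf import PdfReader

reader = PdfReader("paper.pdf")  # placeholder path
pdf_text = "\n".join(page.extract_text() or "" for page in reader.pages)

client = OpenAI()
resp = client.chat.completions.create(
    model="gpt-4",
    messages=[{"role": "user",
               "content": f"Summarize this document:\n\n{pdf_text[:12000]}"}],
)
print(resp.choices[0].message.content)
```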
4 votes · 2 answers

ChatGPT: How to use long texts in prompt?

I like the website chatpdf.com a lot. You can upload a PDF file and then discuss the textual content of the file with the file "itself". It uses ChatGPT. I would like to program something similar. But I wonder how to use the content of long PDF…
meyer_mit_ai
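
Tools like chatpdf.com typically use retrieval-augmented generation: embed the document's chunks once, then put only the chunks most similar to the user's question into the prompt. A hedged sketch with OpenAI embeddings and NumPy; the chunk strings and model names are assumptions:

```python
# Retrieval-augmented generation sketch: embed chunks, retrieve by cosine
# similarity, and answer from the retrieved context only.
import numpy as np
from openai import OpenAI

client = OpenAI()

def embed(texts: list[str]) -> np.ndarray:
    resp = client.embeddings.create(model="text-embedding-ada-002", input=texts)
    return np.array([d.embedding for d in resp.data])

chunks = ["chunk 1 of the PDF ...", "chunk 2 ...", "chunk 3 ..."]  # placeholders
chunk_vecs = embed(chunks)

question = "What does the document say about pricing?"
q_vec = embed([question])[0]

# Cosine similarity, then keep the top-2 chunks as context.
sims = chunk_vecs @ q_vec / (
    np.linalg.norm(chunk_vecs, axis=1) * np.linalg.norm(q_vec))
context = "\n\n".join(chunks[i] for i in sims.argsort()[-2:])

resp = client.chat.completions.create(
    model="gpt-3.5-turbo",
    messages=[{"role": "user",
               "content": f"Answer using only this context:\n{context}\n\nQ: {question}"}],
)
print(resp.choices[0].message.content)
```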