Questions tagged [gpt]

97 questions
19 votes · 7 answers

ChatGPT's Architecture - Decoder Only? Or Encoder-Decoder?

Does ChatGPT use an encoder-decoder architecture, or a decoder-only architecture? I have been coming across Medium and TowardsDataScience articles suggesting that it has an encoder-decoder architecture (see sources below): --…
user141493
18 votes · 1 answer

How does an LLM "parameter" relate to a "weight" in a neural network?

I keep reading about how the latest and greatest LLMs have billions of parameters. As someone who is more familiar with standard neural nets but is trying to better understand LLMs, I'm curious whether an LLM parameter is the same thing as an NN weight, i.e. is…
slim_wizard
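
The short answer is that "parameters" is the broader term: it covers the weight matrices plus biases, embeddings, and layer-norm tensors. A minimal PyTorch sketch (the toy layer sizes are arbitrary):

```python
# Minimal sketch: "parameters" = weights + biases (and, in transformers,
# embedding and layer-norm tensors as well).
import torch.nn as nn

layer = nn.Linear(in_features=4, out_features=3)  # toy layer

n_weights = layer.weight.numel()                       # 4 * 3 = 12
n_biases = layer.bias.numel()                          # 3
n_params = sum(p.numel() for p in layer.parameters())  # 15

print(n_weights, n_biases, n_params)  # 12 3 15
```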
14 votes · 3 answers

Why does everyone use BERT in research instead of LLaMA, GPT, PaLM, etc.?

It could be that I'm misunderstanding the problem space and the iterations of LLaMA, GPT, and PaLM are all based on BERT like many language models are, but every time I see a new paper on improving language models, it takes BERT as a base and adds…
Ethan
11 votes · 2 answers

Does BERT have any advantage over GPT-3?

I have read a couple of documents that explain in detail the edge that GPT-3 (Generative Pre-trained Transformer 3) has over BERT (Bidirectional Encoder Representations from Transformers). So I am curious to know whether BERT scores better…
Bipin
10 votes · 1 answer

How is GPT able to handle large vocabularies?

From what I understand, GPT and GPT-2 are trained to predict the $N^{th}$ word in a sentence given the previous $N-1$ words. When the vocabulary size is very large (100k+ words), how is the model able to generate any meaningful prediction? Shouldn't it…
AAC
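
Part of the answer is that the output layer's size is not itself a problem: a softmax over 100k logits is one normalization, and conditioning typically concentrates the probability mass on a handful of tokens. An illustrative NumPy sketch (the boosted index is made up):

```python
# Illustrative sketch: a softmax over a 100k-token vocabulary is just one
# matrix-vector product plus a normalization; size alone is not the issue.
import numpy as np

rng = np.random.default_rng(0)
vocab_size = 100_000
logits = rng.normal(size=vocab_size)  # stand-in for the model's output logits
logits[42] += 10.0                    # pretend the context strongly favors one token

probs = np.exp(logits - logits.max())  # subtract max for numerical stability
probs /= probs.sum()

print(probs.argmax(), probs[42])  # most of the mass sits on token 42
```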
10 votes · 1 answer

How to summarize a long text using GPT-3

What is the best way to summarize a long text that exceeds the 4,096-token limit (a podcast transcript, for example)? As I understand it, I need to split the text into chunks, summarize each, then concatenate the results and summarize those. Is there…
Poma
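
The chunk-then-recombine approach the asker describes is often called "map-reduce" summarization. A hedged sketch using the current OpenAI Python client; the model name and character-based chunk size are assumptions, not recommendations:

```python
# Hedged sketch of recursive ("map-reduce") summarization with the OpenAI
# Python client. Chunking by characters is a simplification; chunking by
# tokens would be more precise.
from openai import OpenAI

client = OpenAI()  # reads OPENAI_API_KEY from the environment

def summarize(text: str) -> str:
    resp = client.chat.completions.create(
        model="gpt-3.5-turbo",  # placeholder model name
        messages=[{"role": "user", "content": f"Summarize:\n\n{text}"}],
    )
    return resp.choices[0].message.content

def summarize_long(text: str, chunk_chars: int = 8_000) -> str:
    if len(text) <= chunk_chars:
        return summarize(text)
    chunks = [text[i:i + chunk_chars] for i in range(0, len(text), chunk_chars)]
    partial = "\n".join(summarize(c) for c in chunks)  # "map" step
    return summarize_long(partial, chunk_chars)        # "reduce" step, recursively
```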
10 votes · 1 answer

BERT vs GPT: architectural, conceptual and implementational differences

In the BERT paper, I learnt that BERT is an encoder-only model, that is, it involves only transformer encoder blocks. In the GPT paper, I learnt that GPT is a decoder-only model, that is, it involves only transformer decoder blocks. I was guessing what's…
Rnj
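
The core mechanical difference between the two block types is the attention mask. A minimal PyTorch sketch:

```python
# Sketch of the key difference the question asks about: BERT's encoder blocks
# let every position attend to every other position, while GPT's decoder
# blocks apply a causal (lower-triangular) mask.
import torch

seq_len = 4
bert_mask = torch.ones(seq_len, seq_len)             # full bidirectional attention
gpt_mask = torch.tril(torch.ones(seq_len, seq_len))  # position i sees only <= i

print(gpt_mask)
# tensor([[1., 0., 0., 0.],
#         [1., 1., 0., 0.],
#         [1., 1., 1., 0.],
#         [1., 1., 1., 1.]])
```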
9 votes · 1 answer

What tokenizer does OpenAI's GPT-3 API use?

I'm building an application for the API, but I would like to be able to count the number of tokens my prompt will use, before I submit an API call. Currently I often submit prompts that yield a 'too-many-tokens' error. The closest I got to an answer…
Herman Autore
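
For reference: the GPT-3 models use a byte-pair encoding, and OpenAI's tiktoken library exposes it, so tokens can be counted locally before an API call. A short sketch (the model name is one example; `encoding_for_model` picks the matching BPE):

```python
# Count tokens locally with OpenAI's tiktoken library (pip install tiktoken)
# before submitting a prompt, to avoid 'too-many-tokens' errors.
import tiktoken

enc = tiktoken.encoding_for_model("text-davinci-003")  # maps model -> encoding
prompt = "How many tokens will this prompt use?"
n_tokens = len(enc.encode(prompt))
print(n_tokens)
```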
7 votes · 1 answer

Masking during transformer inference?

I sent the following question to both ChatGPT and DeepSeek. "Let's say we're not training the large language model, we are inferencing. The model has already generated a sequence [A,B,C] and is about to predict the next token D. The model needs to perform…
OnCodeDeny
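
One way to see why no explicit mask is needed for the newest position during incremental decoding: its query attends only to cached keys, which are all in the past by construction. A toy PyTorch sketch (dimensions are arbitrary):

```python
# Incremental decoding sketch: the query for the next token attends to *all*
# cached positions, because no future positions exist in the cache yet.
import torch
import torch.nn.functional as F

d = 8
cached_keys = torch.randn(3, d)    # keys for the already-generated [A, B, C]
cached_values = torch.randn(3, d)
q_new = torch.randn(1, d)          # query for the position that will emit D

scores = q_new @ cached_keys.T / d ** 0.5  # shape (1, 3): nothing to mask
attn = F.softmax(scores, dim=-1)
context = attn @ cached_values             # (1, d) context vector used to predict D
print(context.shape)
```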
7 votes · 1 answer

How Exactly Does In-Context Few-Shot Learning Actually Work in Theory (Under the Hood), Despite only Having a "Few" Support Examples to "Train On"?

Recent models like the GPT-3 Language Model (Brown et al., 2020) and the Flamingo Visual-Language Model (Alayrac et al., 2022) use in-context few-shot learning. The models are able to make highly accurate predictions even when only presented with a…
user141493
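
For readers new to the term, here is what in-context few-shot learning concretely looks like at inference time: the support examples sit in the prompt and no weights are updated. The reviews below are made up for illustration:

```python
# A few-shot prompt: the "training examples" are just text the model
# conditions on; no gradient updates happen.
few_shot_prompt = """Classify the sentiment of each review.

Review: The film was a delight from start to finish.
Sentiment: positive

Review: Two hours of my life I will never get back.
Sentiment: negative

Review: The soundtrack alone is worth the ticket price.
Sentiment:"""
# The model is simply asked to continue this text; the "learning" happens
# entirely through conditioning on the prompt.
```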
5 votes · 5 answers

Is using GPT-4 to label data advisable?

If I have a lot of text data that needs to be labeled (e.g. sentiment analysis), and given the high accuracy of GPT-4, could I use it to label data? Or would that introduce bias or some other issues?
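
Mechanically this is straightforward; the open question is quality control. A hedged sketch with the OpenAI Python client (the prompt wording and model name are assumptions, and in practice a random sample of the labels should be audited against human annotators):

```python
# Hedged sketch of GPT-4 as a data labeler. Audit a sample of its outputs
# against human labels before trusting them at scale.
from openai import OpenAI

client = OpenAI()

def label_sentiment(text: str) -> str:
    resp = client.chat.completions.create(
        model="gpt-4",
        messages=[{
            "role": "user",
            "content": "Label the sentiment of this text as exactly one word: "
                       f"positive, negative, or neutral.\n\n{text}",
        }],
    )
    return resp.choices[0].message.content.strip().lower()

print(label_sentiment("The battery died after two days."))  # e.g. "negative"
```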
5 votes · 2 answers

What exactly are the parameters in GPT-3's 175 billion parameters?

What exactly are the parameters in GPT-3's 175 billion parameters? Are these the words in the text on which the model is trained?
user16584277
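
They are the learned weights, not training words. A back-of-the-envelope check using the sizes reported in the GPT-3 paper (96 layers, d_model = 12288) and the common 12 · n_layers · d_model² approximation for a transformer's attention and MLP weights (embeddings and biases ignored):

```python
# Rough parameter count for GPT-3 from its published layer sizes.
n_layers, d_model = 96, 12288
approx_params = 12 * n_layers * d_model ** 2
print(f"{approx_params:,}")  # 173,946,175,488 -- close to the reported 175B
```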
4 votes · 1 answer

What's the right input format for GPT-2 in NLP

I'm fine-tuning pre-trained GPT-2 for text summarization. The dataset contains 'text' and 'reference summary' fields. So my question is how to add special tokens to get the right input format. Currently I'm thinking of doing it like this: example1 text …
yuqiong11
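
One common format, sketched with the Hugging Face transformers library: join text and summary with a separator and end with EOS. The "TL;DR:" separator follows the GPT-2 paper's summarization prompt, but any consistent separator works; the example strings are placeholders:

```python
# Sketch of one conventional input format for GPT-2 summarization fine-tuning.
from transformers import GPT2Tokenizer

tokenizer = GPT2Tokenizer.from_pretrained("gpt2")
tokenizer.pad_token = tokenizer.eos_token  # GPT-2 has no pad token by default

text = "Long article text goes here ..."  # placeholder
summary = "Short reference summary ..."   # placeholder
example = f"{text}\nTL;DR: {summary}{tokenizer.eos_token}"

ids = tokenizer(example, return_tensors="pt")
# During training, the loss is usually masked so it is computed only on the
# summary tokens after "TL;DR:".
```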
4 votes · 1 answer

How do I prompt GPT-4 to look at a PDF in Jupyter Notebook?

I am a beginner. I purchased tokens to use GPT-4 and finally figured out how to import the GPT-4 model into my Jupyter Notebook. %env OPENAI_API_KEY= (my key goes here) !pip install --upgrade openai wandb from openai import OpenAI LLM =…
Mas
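
The model cannot open files itself; the usual approach is to extract the PDF's text first and place it in the prompt. A hedged sketch using the pypdf library (pip install pypdf openai); the file path is a placeholder, and the slice is a crude guard against exceeding the context window:

```python
# Extract PDF text with pypdf, then pass it to GPT-4 as part of the prompt.
from openai import OpenAI
from pypdf import PdfReader

reader = PdfReader("paper.pdf")  # placeholder path
pdf_text = "\n".join(page.extract_text() or "" for page in reader.pages)

client = OpenAI()
resp = client.chat.completions.create(
    model="gpt-4",
    messages=[{"role": "user",
               "content": f"Summarize this document:\n\n{pdf_text[:12000]}"}],
)
print(resp.choices[0].message.content)
```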
4 votes · 2 answers

ChatGPT: How to use long texts in prompt?

I like the website chatpdf.com a lot. You can upload a PDF file and then discuss the textual content of the file with the file "itself". It uses ChatGPT. I would like to program something similar. But I wonder how to use the content of long PDF…
meyer_mit_ai
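
Tools like chatpdf.com typically use retrieval-augmented generation: embed the document's chunks once, then put only the chunks most similar to the user's question into the prompt. A hedged sketch with OpenAI embeddings and NumPy; the chunk strings and model names are assumptions:

```python
# Retrieval-augmented generation sketch: embed chunks, retrieve by cosine
# similarity, and answer from the retrieved context only.
import numpy as np
from openai import OpenAI

client = OpenAI()

def embed(texts: list[str]) -> np.ndarray:
    resp = client.embeddings.create(model="text-embedding-ada-002", input=texts)
    return np.array([d.embedding for d in resp.data])

chunks = ["chunk 1 of the PDF ...", "chunk 2 ...", "chunk 3 ..."]  # placeholders
chunk_vecs = embed(chunks)

question = "What does the document say about pricing?"
q_vec = embed([question])[0]

# Cosine similarity, then keep the top-2 chunks as context.
sims = chunk_vecs @ q_vec / (
    np.linalg.norm(chunk_vecs, axis=1) * np.linalg.norm(q_vec))
context = "\n\n".join(chunks[i] for i in sims.argsort()[-2:])

resp = client.chat.completions.create(
    model="gpt-3.5-turbo",
    messages=[{"role": "user",
               "content": f"Answer using only this context:\n{context}\n\nQ: {question}"}],
)
print(resp.choices[0].message.content)
```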