Questions tagged [labelling]

26 questions
5
votes
5 answers

Is using GPT-4 to label data advisable?

I have a lot of text data that needs to be labeled (e.g. for sentiment analysis). Given the high accuracy of GPT-4, could I use it to label the data, or would that introduce bias or other issues?
4
votes
1 answer

Using training data that requires manual interpretation

I have a dataset that comprises several data streams that are measured on objects (>10k objects). The data is essentially time series data (0.5 second intervals). Typically, an expert interpreter would then manually segment the data (~500 seconds in…
2
votes
1 answer

Solutions for Labelling Training Data for Binary Classification Problems

I have a huge dataset for which I am trying to use an 80-20 (Holdout method) approach to train and test my model. However, the dataset I have been given has 6m rows. The objective is to train+test+validate the model before using live data traffic…
2
votes
1 answer

Online Audio annotation tools

I need to find a decent online annotation tool to transcribe audio. There are some requirements for a potential tool: I should be able to deliver audio files to a few labelers. I should be able to track which files went to which labeler. It should…
Aidos
2
votes
1 answer

Python package for machine-learning aided data labelling

In a lot of cases, unlabelled data needs to be transformed into labelled data. The best solution is to use (multiple) human classifiers. However, going through all the data by hand (e.g. in text mining or image processing) is often a daunting task. Is there…
Pieter
2
votes
1 answer

Too much data to label

I'm working on a data science project to flag bots on Instagram. I collected a lot of data (80k+ users) and now I have to label them as bot/legit users. I already flagged 20k users with different techniques, but now I feel like I'm going to have to flag…
Marc
2
votes
0 answers

Labelling a dataset for sentiment analysis, which model is the best?

I want to do some sentiment analysis on a large text dataset I scraped. From what I've learned so far, I know that I need to either manually label each text (positive, negative, neutral) or use a pre-trained model like BERT or TextBlob. I…
Dan K
1
vote
0 answers

Best practices for image annotation for object detection when objects overlap

If I have the following example: How should I annotate the bottom image? I can think of these scenarios: Create a large box that captures class B and a second box that entirely captures class A. This will lead to overlapping…
1
vote
2 answers

How do I label images faster?

I have around 1600 images extracted from videos shot at night. I am labeling each image and trying to be as accurate as I can in assigning bounding boxes. I am labeling vehicles and traffic lights/traffic signs. This is very time-consuming, I am…
Vendetta
1
vote
1 answer

Label A records B times or label A*B records

This question concerns pre-training data sourcing. Suppose you have a human workforce of B individuals and a potentially unlimited source of data. The task is labeling images with classes. These classes are somewhat subjective (emotions). This…
1
vote
1 answer

What are good ways to extend an ML model with a new class without relabeling all previous data?

I have a segmentation model trained using 1,000 images that can predict 4 classes (dog, cat, mouse, elephant). I would now like to extend the model with a 5th class (horse). Horses are present in the 1,000 images used for the first model, but not…
nickponline
1
vote
0 answers

Labelling spectrograms

I'm currently working on an ML project and need some information: is there a tool that can load audio files and generate spectrograms, with an option to annotate/label the spectrograms? I have thousands of audio files to…
1
vote
1 answer

Labelling large amounts of audio data in an automatic or semi-automatic way

I am working on a project where I have to label audio datasets containing thousands of clips, each one second long. I have to label whether each clip is idle, an event happening, or noise. I used some tools like Audacity and Label Studio, I…
1
vote
1 answer

Sub labelling of an object

First timer in image processing, pardon my cluelessness. Is there a concept of sub-labeling in object identification? I want to label a person and sub-label the "eye" of a person, and train a model to detect if the person's eye is open or closed. i.e…
Jean
1
vote
0 answers

How to label legit users when developing a bot-flagging classification model?

I’m working on a project where I try to distinguish bots from legit users on social media. The data I collected is not labeled, but I have labeled about 17% of it (22k users) through different techniques. Finding bots was easy as they all have similarities…
Marc