Questions tagged [data-stream-mining]

An activity that seeks patterns in a continuous stream of data elements, usually involving summarizing the stream in some way.

24 questions
12
votes
2 answers

Open-source tools for mining a stream of leaderboard scores

Consider a stream containing tuples (user, new_score) representing users' scores in an online game. The stream could have 100-1,000 new elements per second. The game has 200K to 300K unique players. I would like to have some standing queries like:…
Tahir Akhtar
6
votes
1 answer

Which Big Data technology stack is most suitable for processing tweets, extracting/expanding URLs, and pushing (only) new links into a 3rd-party system?

(Note: I pulled this question from the list of questions in Area51, but I believe it is self-explanatory. That said, I believe I get the general intent of the question and, as a result, am likely able to field any questions on the question that…
blunders
5
votes
1 answer

Real-time noise removal using the Savitzky-Golay method

I would like to ask if Savitzky-Golay can be implemented on real-time data. I have used it on a fixed-size array, but would like to extend it to output values for real-time sensor data. Can anyone refer me to an appropriate implementation or hint…
4
votes
2 answers

Choosing between Storm+Trident-ML, Storm+SAMOA or Spark Streaming+MLlib

I want to implement streaming Naive Bayes in a distributed system. What is the best way to choose a framework? Should I choose: Storm alone, implementing streaming Naive Bayes on my own in a Storm topology; Storm + Trident-ML; Storm + SAMOA; Spark…
3
votes
0 answers

High dimensional data stream summarization and processing

Can anyone recommend a method for summarizing and processing high dimensional data streams efficiently and effectively for anomaly detection? In fact, I investigated the different methods for data stream summarization (sampling, histograms,…
3
votes
1 answer

Designing a ConvNet to facilitate game playing

For fun, I want to design a convolutional neural net to recognize enemy NPCs in a first-person shooter. I have captured 100 JPEGs of the NPCs as well as 100 JPEGs of non-NPCs. I have successfully trained a really simple ConvNet to identify NPCs. This…
aquagremlin
2
votes
1 answer

What are the approaches to aggregate categorical variables?

I am working on a clickstream dataset. I have come up with the following example dataset to explain my problem: ClickTimeStamp | SessionID | ART_weekOfYear | PagenameClicked | TimeSpentPerSession | CustID | ContractID | ... | TARGET…
2
votes
2 answers

Online learning w/ feature weighting/adjusting

Let's say I have a supervised learning problem with a sequence of features and labels. First, I learn on the training data and then I decide to stream in data, point by point and do online learning. Is it possible to update the weights or figure out…
2
votes
0 answers

Is there a counting sketch optimized for intersections?

Popular counting sketches (LogLog, HyperLogLog, etc.) feature natural union operations. Are there any known counting sketches that feature natural intersection operations?
Newbie
2
votes
1 answer

How to build this data pipeline?

I don't have much experience in data engineering, so I'm here to ask for advice. I am working on a project which consists of building a dashboard for the IT department of a bank. The dashboard should present information from log data. Log data…
2
votes
1 answer

What is the difference between real concept drift, virtual concept drift, and feature drift?

As far as I know, real concept drift is caused by changes in the decision boundary, while virtual drift occurs because of changes in the data distribution. Some researchers mention that virtual drift can be denoted as feature change. Is my…
2
votes
1 answer

Newbie questions: real-time clustering of messages

I'm very much a newbie in NLP, so please accept my apologies if this is an obvious question, the wrong place to ask it or any other error I could be making. I am considering using NLP for some subset of real-time spam detection in real-time chat.…
2
votes
0 answers

Local regression with streaming data

From a data stream, I'm receiving a pair of measurements every second, consisting of a current consumption and a current percentage. Accumulated over time, the consumption will eventually represent the maximum capacity when the percentage…
2
votes
1 answer

Analysis of Real-Time Bidding

I'm totally new to the topic of real-time bidding, in which I know machine learning algorithms are used pretty often. Can somebody explain the system to me in plain language, i.e. language for a non-technical person? What is the bidding? Who bids on…
DanielWelke
1
vote
1 answer

Reduction of samples from a video sample

Well, I posted the same question on the main Stack before finding the right place, sorry. A friend of mine is working with more than 100 videos as samples for his neural network. Each video lasts more than a couple of minutes, with around 24 frames per…
T.Dunglas