Highest Voted 'big-data' Questions - Computer Science Stack Exchange

8

votes

4 answers

Applying algorithms on large data

Is there any book or tutorial that teaches us how to efficiently apply the common algorithms (sorting, searching, etc.) on large data (i.e. data that cannot be fully loaded into main memory) and how to efficiently apply those algorithms considering…

asked Aug 22 '12 at 11:10

Arani

523
4
11

4

votes

1 answer

What is the difference between FLOPS and OPS?

I have typically heard computer performance discussed in terms of FLOPS. However, I have recently seen multiple references instead using OPS i.e. operations per second, typically in the context of Big Data. What is the difference between FLOPS and…

performance big-data high-performance-computing

asked Sep 25 '21 at 13:17

user1887919

141
1
3

3

votes

4 answers

If I have a large random array of 0s and 1s that I want to sort, what kind of algorithm and data structures should I consider?

What are the types of things that need to be considered if I need to sort a large random array of 0s and 1s? You can assume the array holds on the order of millions or billions of elements. I understand there are tons of sorting algorithms out there (quick,…

sorting big-data

asked Dec 09 '12 at 02:44

user1068636

217
3
6
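For the question above on sorting 0s and 1s, one common idea is that no comparison sort is needed: a single pass that counts the zeros is enough to write out the sorted result. The following is a minimal sketch of that counting approach, assuming the input can be read once as an iterable; the function name `sort_bits` is illustrative and not taken from the question or its answers.

```python
def sort_bits(bits):
    """Return the 0s and 1s of `bits` in sorted order using one counting pass."""
    zeros = 0
    total = 0
    for b in bits:          # works for lists and for streamed input alike
        total += 1
        if b == 0:
            zeros += 1
    # All zeros first, then all ones.
    return [0] * zeros + [1] * (total - zeros)


if __name__ == "__main__":
    print(sort_bits([1, 0, 1, 0, 0]))
```

Because only two counters are kept while reading, the same pass also works when the array is too large to hold in memory; the output can then be written out in chunks instead of being built as one list.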

3

votes

2 answers

Bloom Filter for 208 million URLs

I need to create a Bloom filter of 208 million URLs. What would be a good choice of bit vector size and number of hash functions? I tried a bit vector of size 1 GB and 4 hash functions, but it resulted in too many false positives while reading. I…

data-structures probabilistic-algorithms searching big-data bloom-filters

asked Sep 19 '12 at 15:58

Aadith Ramia

157
4
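For the Bloom filter question above, the usual approximation for the false-positive rate with n items, m bits and k hash functions is p ≈ (1 - e^(-kn/m))^k, and the rate is minimised near k = (m/n) ln 2. The sketch below evaluates these formulas. It assumes that "1 GB" means 2^30 bytes and that the hash functions behave independently and uniformly; the function names are illustrative.

```python
import math


def false_positive_rate(n_items, m_bits, k_hashes):
    """Approximate false-positive rate: (1 - e^(-k*n/m))^k."""
    return (1.0 - math.exp(-k_hashes * n_items / m_bits)) ** k_hashes


def optimal_hash_count(n_items, m_bits):
    """Number of hash functions that roughly minimises the rate: (m/n) * ln 2."""
    return max(1, round(m_bits / n_items * math.log(2)))


def required_bits(n_items, target_rate):
    """Bits needed to reach target_rate with an optimal k: -n ln p / (ln 2)^2."""
    return math.ceil(-n_items * math.log(target_rate) / math.log(2) ** 2)


if __name__ == "__main__":
    n = 208_000_000      # URLs, from the question
    m = 8 * 2**30        # 1 GB expressed in bits (assumption: 1 GB = 2^30 bytes)
    print(false_positive_rate(n, m, 4))
    print(optimal_hash_count(n, m))
```

Under these assumptions the predicted rate for 1 GB and 4 hash functions is well below 0.1%, so a much higher observed rate may point to the quality of the hash functions rather than to the sizing.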

3

votes

4 answers

High Dimensional Data Structures

I have a 20-dimensional dataset with a large number of data points. I would like to have each dimension discretized into bins. Per bin, I would like to be able to access two neighbours per dimension (i.e. +1 and -1 per dimension). Basically I want…

data-structures big-data

asked Jun 05 '14 at 11:34

danielvdende

133
2
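For the high-dimensional binning question above, one possible layout is a sparse hash map keyed by the tuple of bin indices, so that only occupied bins use memory and each of the two neighbours per dimension is one lookup. This is a sketch under assumptions not stated in the question: equal-width bins, a shared lower bound `lower` and bin width `width` for all dimensions, and hypothetical helper names.

```python
import math
from collections import defaultdict


def bin_key(point, lower, width):
    """Map a point to its tuple of integer bin indices (equal-width bins)."""
    return tuple(math.floor((x - lower) / width) for x in point)


def neighbours(key):
    """Yield the bins at -1 and +1 along each dimension of `key`."""
    for d in range(len(key)):
        for delta in (-1, 1):
            yield key[:d] + (key[d] + delta,) + key[d + 1:]


if __name__ == "__main__":
    bins = defaultdict(list)                 # only occupied bins are stored
    for p in [(0.5, 2.3), (1.2, 2.9), (0.1, 0.1)]:
        bins[bin_key(p, lower=0.0, width=1.0)].append(p)
    key = bin_key((0.5, 2.3), lower=0.0, width=1.0)
    # Keep only neighbours that actually contain points.
    print([k for k in neighbours(key) if k in bins])
```

With 20 dimensions each bin has 40 such neighbours, so a neighbour query costs 40 dictionary lookups regardless of how many bins exist in total.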

3

votes

2 answers

How do X-fast and Y-fast tries compare to B-trees in performance?

I learned that Y-fast tries support amortized O(log log u) time insertions and deletions, and O(log log u) time membership, successor and predecessor operations, with O(n) space. So when n is closer to u in dense big data environments, Y-fast tries seem…

big-data b-tree

asked Feb 25 '23 at 14:52

thambi

125
9

3

votes

2 answers

Semi-streaming algorithm for $s$-$t$ connectivity

Let $G=(V,E)$ be an undirected graph. Given a pair of vertices $s,t \in V$, how can we construct a semi-streaming algorithm which determines whether $s$ and $t$ are connected? Is there any way to construct such an algorithm which scans the input stream…

streaming-algorithm big-data

asked Sep 13 '20 at 13:36

KaliTheGreat

181
7
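For the $s$-$t$ connectivity question above, a commonly used approach is to maintain a union-find (disjoint-set) structure over the vertices while reading the edge stream once. It stores one parent pointer per vertex, i.e. $O(n \log n)$ bits, which is within the usual semi-streaming budget of $O(n \,\mathrm{polylog}\, n)$. The sketch below assumes vertices are numbered 0 to n-1 and that edges arrive as pairs; the function name is illustrative.

```python
def st_connected(n, edge_stream, s, t):
    """Single pass over `edge_stream`; memory is one parent entry per vertex."""
    parent = list(range(n))

    def find(x):
        # Path halving keeps the trees shallow.
        while parent[x] != x:
            parent[x] = parent[parent[x]]
            x = parent[x]
        return x

    for u, v in edge_stream:        # each edge is seen once and then discarded
        ru, rv = find(u), find(v)
        if ru != rv:
            parent[ru] = rv         # merge the two components
    return find(s) == find(t)


if __name__ == "__main__":
    edges = iter([(0, 1), (1, 2), (3, 4)])
    print(st_connected(5, edges, 0, 2))
```

The edges themselves are never stored, which is the point of the streaming setting: only the component structure survives the pass.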

2

votes

1 answer

(a,b)-tree vs B-tree

I would like to know what the differences are between an (a,b)-tree and a B-tree. I have been studying different papers for a few days, and I am seeing different definitions that confuse me. For example, in External Memory geometric Data Structure…

data-structures trees big-data

asked Feb 13 '16 at 19:37

M a m a D

1,561
2
18
33

2

votes

0 answers

Examples of real world graphs that are too big for a single commodity-type machine

I've been reading about distributed systems for processing large graphs. The most prominent examples include Pregel (developed by Google) and Apache Giraph. Most of these systems justify their existence by saying that they are for "big data" processing, i.e.,…

distributed-systems space-complexity big-data

asked Jan 23 '15 at 07:23

bigdataguy

21
1

2

votes

2 answers

Hashing by doing modulo $m$ for $m=p^2$ for a prime $p$ instead of using a prime $m$ - is it that bad?

I am doing an exercise from a Big Data course I'm taking on Coursera (this exercise is for experimenting with a big-data problem and is not for any credit or homework). The assignment was described briefly: Your task is to quickly find the number…

hash big-data

asked Oct 16 '14 at 22:11

Belgi

267
1
9

2

votes

0 answers

Looking for dynamic network data sets

There are a number of collections of network (or graph) data sets freely available on the web, e.g. http://snap.stanford.edu/data/index.html http://www.cc.gatech.edu/dimacs10/downloads.shtml I am looking for dynamic network data sets, i.e.…

graphs data-sets social-networks big-data

asked Jul 31 '13 at 15:18

cls

21
2

1

vote

0 answers

Clarification on MapReduce description in textbook

I am reading through chapter 2 of the free textbook "Mining of Massive Datasets" (http://www.mmds.org/). On page 28 the following is stated: "It is reasonable to create one Map task for every chunk of the input file(s), but we may wish to create…

algorithms big-data mapreduce

asked Oct 03 '18 at 16:31

ClownInTheMoon

323
2
9

1

vote

1 answer

Why are split text files bigger than one large file with the same content?

I have a large text file that is about 2 GB when unzipped. I split it into multiple (more than 5 million) files, and now I have a folder of about 20 GB. How is this possible?

filesystems big-data

asked Apr 07 '17 at 19:45

Marcelo Machado

165
5
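For the split-files question above, one likely explanation is allocation granularity: a file system hands out space in whole blocks, so every small file is rounded up to at least one block. The sketch below estimates the on-disk size under the assumption of a 4096-byte block and files of about 400 bytes each (2 GB divided by 5 million); both figures are assumptions, not values from the question, and the function name is illustrative.

```python
def allocated_bytes(file_sizes, block_size=4096):
    """Total on-disk size when every file is rounded up to whole blocks."""
    total = 0
    for size in file_sizes:
        blocks = -(-size // block_size)     # ceiling division on integers
        total += blocks * block_size
    return total


if __name__ == "__main__":
    # Five files of 400 bytes each occupy five full blocks.
    print(allocated_bytes([400] * 5))
```

Scaled to 5,000,000 files of 400 bytes, the same rounding gives 5,000,000 × 4,096 = 20,480,000,000 bytes, which is close to the 20 GB reported; per-file metadata would add further overhead.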

1

vote

0 answers

What are internal clustering indices for binary data? And, if possible, ones applicable to massive clusters?

I was wondering what internal clustering indices currently exist for binary data. I already know the silhouette and Davies-Bouldin indices for Euclidean space; I suppose they work as well in binary space using Hamming distance, for example. Tell me if…

clustering big-data

asked Mar 02 '17 at 09:29

KyBe

235
3
9

1

vote

1 answer

Taking intersection in large search

As I understand, you can build the word -> pages index in Google or a large SQL database since indexed search has complexity O(1) -- lookup gives you a billion-page result at once: computer -> About 2.14 bln results; science -> About 1.93 bln…

search-algorithms big-data search

asked May 27 '16 at 10:12

Little Alien

195
6
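For the intersection question above, a standard technique is to keep each word's posting list sorted by page ID and intersect two lists with a linear merge, which costs time proportional to the sum of the list lengths. The sketch below assumes sorted lists of integer page IDs without duplicates; that storage format is an assumption, since the excerpt does not describe it, and the function name is illustrative.

```python
def intersect_sorted(a, b):
    """Intersect two sorted posting lists with a two-pointer merge."""
    i = j = 0
    out = []
    while i < len(a) and j < len(b):
        if a[i] == b[j]:
            out.append(a[i])    # page contains both words
            i += 1
            j += 1
        elif a[i] < b[j]:
            i += 1              # advance the list with the smaller ID
        else:
            j += 1
    return out


if __name__ == "__main__":
    computer = [1, 3, 5, 7]     # hypothetical page IDs for one word
    science = [3, 4, 5, 8]      # hypothetical page IDs for another word
    print(intersect_sorted(computer, science))
```

When one list is much shorter than the other, looking up each of its IDs in the longer list by binary search is a common alternative to the linear merge.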

Questions tagged [big-data]