Questions tagged [indexing]

Indexing is one of the most important parts of data management: it enables efficient, correct storage and retrieval of data from storage media.

Different programming languages offer different indexing algorithms and data structures.

For example, in Java, the Apache Software Foundation's Lucene is very popular.

Over the years, many efficient, fast, and scalable indexing techniques have been introduced, such as MurmurHash (used by some NoSQL databases), B+ trees (used by some versions of the Windows OS itself), and Lucene (very popular in web technologies) and its variants.

There are also problem-specific indexing schemes, for example for chemical or medical data representations.
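To make the contrast between hash-based and tree-based indexing concrete, here is a minimal sketch in Python. It is an illustration under stated assumptions, not how any of the systems named above are implemented: a plain `dict` stands in for a hash index, a sorted list searched with `bisect` stands in for the ordered leaf level of a B+ tree, and the `records` data and the helper names `lookup` and `range_query` are hypothetical.

```python
import bisect

# Hypothetical records: (key, value) pairs stored in arbitrary order.
records = [
    (17, "carol"),
    (3, "alice"),
    (42, "dave"),
    (8, "bob"),
]

# Hash index: maps each key to its position in `records`.
# Good for exact-match lookups, but keeps no ordering information.
hash_index = {key: pos for pos, (key, _) in enumerate(records)}

# Sorted index: (key, position) pairs kept in key order.
# A stand-in for the ordered leaf level of a B+ tree, which supports range scans.
sorted_index = sorted((key, pos) for pos, (key, _) in enumerate(records))
sorted_keys = [key for key, _ in sorted_index]


def lookup(key):
    """Exact-match lookup through the hash index."""
    pos = hash_index.get(key)
    return None if pos is None else records[pos][1]


def range_query(low, high):
    """Return values whose keys fall in [low, high], in key order."""
    start = bisect.bisect_left(sorted_keys, low)
    end = bisect.bisect_right(sorted_keys, high)
    return [records[pos][1] for _, pos in sorted_index[start:end]]
```

With this layout, `lookup(8)` goes straight through the hash index, while `range_query(5, 20)` uses binary search on the sorted keys to find the boundaries of the range. Which structure fits better depends on whether the workload is dominated by exact-match lookups or by range scans.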

21 questions
11
votes
2 answers

Counting indexes in pandas

I feel like this is a rudimentary question but I'm very new to this and just haven't been able to crack it / find the answer. Ultimately what I'm trying to do here is to count unique values on a certain column and then determine which of those…
Mr. Hasquestions
  • 113
  • 1
  • 1
  • 6
10
votes
1 answer

What is the most efficient data indexing technique

As we all know, there are several data indexing techniques used by well-known indexing tools, such as Lucene (for Java) or Lucene.NET (for .NET), MurmurHash, B+ trees, etc. For a NoSQL / object-oriented database (which I try to write/play a little around…
4
votes
1 answer

Pandas dataframe with multiple hierarchical indices

I have a data frame which looks like this:

FRUIT  ID   COLOR  WEIGHT
Apple  142  Red    Heavy
Mango  231  Red    Light
Apple  764  Green  Light
Apple  543  Green  Heavy

And I want the…
Phil
  • 73
  • 1
  • 7
4
votes
1 answer

Why keep vocabulary and posting list separate in a search engine

I am taking a class in information retrieval. We learned that the index of a search engine has (possibly among other things): A vocabulary mapping terms to their statistics (frequency, type, ...) and A posting list mapping terms to the documents…
icehawk
  • 141
  • 2
3
votes
1 answer

convert single index pandas data frame to multi-index

I have a data frame with following structure: df.columns Index(['first_post_date', 'followers_count', 'friends_count', 'last_post_date','min_retweet', 'retweet_count', 'screen_name', 'tweet_count', 'tweet_with_max_retweet', 'tweets',…
Rakib
  • 225
  • 3
  • 10
2
votes
2 answers

Pandas: Assign back to table from grouping by column and index

I am trying to implement Exponential Moving Average calculation on a DataFrame. The formula is An additional complication is that my table is grouped and there is a unique bin number per group. This is what I tried import numpy as np import…
Steztric
  • 181
  • 8
2
votes
2 answers

Tiering after clustering with Kmeans

I would like to have some suggestions on possible avenues that would make sense in the following context. 3 optimal clusters have been identified in a list of 5000 customers using Kmeans. The data model has 30 features and a PCA was performed prior to…
1
vote
0 answers

Book indexing data science project

Is it possible to perform Book index searching using Machine learning algorithms? Inputs : 1 Book pages with page numbers as images. 2 Index words in the book. Output: Tracing the page number/s with the indexes provided.
1
vote
0 answers

Primary indexes

I would like to ask you two questions about indexing: 1) Since a primary index, or clustering index, stores the tuples of a relation in the primary index itself (but primary index might also be separated from the file containing the tuples), how can…
UberM
  • 53
  • 3
1
vote
0 answers

Primary indexes and index-sequential files

I am studying the physical organization of databases and right now I am trying to understand the concept of a primary index or clustering index. The book states the primary index can be realized by storing the tuples on the index itself (the index…
UberM
  • 53
  • 3
1
vote
1 answer

Index for efficient argmax(w.x) query ~ 20d

I'm looking for a spatial index that can efficiently find the most extreme n points in a certain direction, i.e. for a given w, find x[0:n] in the dataset where x0 gives the largest value of w.x and x1 the second largest value of w.x, etc... . Is…
user1158559
  • 151
  • 3
1
vote
1 answer

How to create dictionary with multiple keys from dataframe in python?

I have a pandas dataframe as follows, I want to convert it to a dictionary format with 2 keys as shown: id name energy fibre 0 11005 4-Grain Flakes 1404 …
KHAN irfan
  • 421
  • 1
  • 7
  • 16
1
vote
1 answer

How to resolve too many indices for array Index Error

I'm performing a binary classification in Keras and attempting to plot the ROC curves. When I tried to compute the fpr and tpr metrics, I get the "too many indices for array" error. Here is my code: #declare the number of…
shiva
  • 311
  • 3
  • 7
  • 15
1
vote
1 answer

The difference between Faiss Index and a Database Index

An index points to data in a table. In a database, indexes are similar to those in books. I am a little confused about the meaning of an index in the Faiss library and how it differs from a database index. Could someone explain, if possible?
Avv
  • 231
  • 1
  • 2
  • 10
1
vote
0 answers

How does Google index text documents?

Unfortunately, I didn't find much information about this: no PDFs and no textbooks that discuss it in enough detail, and I didn't see any forum posts about it either. I just want to learn an example scenario of indexing in big data. I've found…
jewloa
  • 11
  • 1