Questions tagged [apache-mahout]

Apache Mahout is an open-source, scalable machine learning project.

This topic covers questions related to Apache Mahout, a scalable machine learning project written in Java and largely based on Apache Hadoop, with implementations of algorithms for:

  • collaborative-filtering / recommenders
  • classification
  • clustering
  • frequent pattern mining
  • regression
  • locality-sensitive-hashing
  • more
16 questions
7
votes
2 answers

Item Based Collaborative Filtering with No Ratings

I am building a recommender for web pages. For each web page in our data set, we wish to generate a list of web pages that other users have also visited. Our data only shows that a user has either visited a page, or they have not. Users do not…
sheldonkreger
  • 1,169
  • 8
  • 20
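
For readers skimming this listing, the "visited or not visited" setup in the question above can be illustrated with a plain co-occurrence count. This is only a minimal sketch, not Mahout's implementation: the function names `cooccurrence_counts` and `also_visited`, and the sample user and page names, are hypothetical and chosen purely for illustration.

```python
from collections import defaultdict
from itertools import combinations


def cooccurrence_counts(visits):
    """Count, for each pair of pages, how many users visited both.

    `visits` maps a user id to the pages that user visited (no ratings).
    """
    counts = defaultdict(lambda: defaultdict(int))
    for pages in visits.values():
        # Deduplicate so repeat visits by one user count only once.
        for a, b in combinations(sorted(set(pages)), 2):
            counts[a][b] += 1
            counts[b][a] += 1
    return counts


def also_visited(counts, page, top_n=3):
    """Return up to top_n pages most often co-visited with `page`."""
    related = counts.get(page, {})
    # Sort by descending count, breaking ties alphabetically.
    return sorted(related, key=lambda p: (-related[p], p))[:top_n]


# Hypothetical sample data: user -> visited pages.
visits = {
    "u1": ["home", "pricing"],
    "u2": ["home", "pricing", "docs"],
    "u3": ["home", "docs"],
    "u4": ["pricing", "blog"],
}
counts = cooccurrence_counts(visits)
print(also_visited(counts, "pricing", top_n=2))
```

Raw counts favor globally popular pages, which is one reason boolean-data recommenders often swap in a measure such as log-likelihood ratio; the sketch leaves that step out.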
5
votes
2 answers

Item based recommender using SVD

I have an item-item similarity matrix, e.g. (the matrix is symmetric, and much bigger):

1.00 0.88 0.96 0.99
0.88 1.00 0.99 0.96
0.96 0.99 1.00 0.86
0.99 0.96 0.86 1.00

I need to implement a recommender which, for a set of items, recommends a new…
Ognjen
  • 151
  • 2
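
As a rough illustration of the question above, the sketch below ranks unseen items by their summed similarity to a set of seed items, using the 4x4 matrix from the excerpt. It deliberately does not use SVD; it is a simple neighborhood-style baseline. The function name `recommend` and the zero-based item indices are assumptions made for this example.

```python
# Item-item similarity matrix copied from the question excerpt (symmetric).
SIMILARITY = [
    [1.00, 0.88, 0.96, 0.99],
    [0.88, 1.00, 0.99, 0.96],
    [0.96, 0.99, 1.00, 0.86],
    [0.99, 0.96, 0.86, 1.00],
]


def recommend(similarity, seed_items, top_n=1):
    """Rank items outside seed_items by summed similarity to the seeds."""
    seeds = set(seed_items)
    scores = {}
    for candidate in range(len(similarity)):
        if candidate in seeds:
            continue  # never recommend an item the user already has
        scores[candidate] = sum(similarity[candidate][s] for s in seeds)
    # Highest score first; ties broken by the lower item index.
    ranked = sorted(scores, key=lambda item: (-scores[item], item))
    return ranked[:top_n]


print(recommend(SIMILARITY, [0]))  # item most similar to item 0
print(recommend(SIMILARITY, [2]))  # item most similar to item 2
```

Summing similarities is only one possible aggregation; averaging or taking the maximum over the seed set are equally plausible choices depending on the data.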
5
votes
1 answer

Using Spark for finding similar users to a user?

I read about https://mahout.apache.org/users/algorithms/intro-cooccurrence-spark.html but couldn't find a Spark library for this implementation. I have a columnar string dataset. It contains data for around 15-20 million users with their…
Nikhil Verma
  • 191
  • 1
  • 1
  • 9
4
votes
1 answer

User profiling with Mahout from categorized user behavior

I'm trying to cluster and classify users with Mahout. At the moment I am at the planning phase, my mind is completely mixed up with ideas, and since I'm relatively new to the area I'm stuck at the data formatting. Let's say we have two data tables (big…
Turcia
  • 41
  • 4
3
votes
1 answer

Content based recommendation on Mahout

Is it possible to get recommendations for similar products using Mahout? E.g., I have a data set of movies with the following attributes: Movie_name, Actor_1, Actor_2, Actress_1, Actress_2, Director, Theme, Language. Now, given a Movie_name, the system should …
Sreejithc321
  • 1,940
  • 3
  • 20
  • 34
3
votes
1 answer

N-fold cross validation in Mahout

Is there a method/class available in Apache Mahout to perform n-fold cross validation? If so, how can it be done?
Sreejithc321
  • 1,940
  • 3
  • 20
  • 34
3
votes
0 answers

Interpretation of Similarity Number generated by LogLikelihood in Mahout

I have a pretty basic question and I was hoping someone could help me. I’m not a math person and I’m fairly new to Mahout, so I’m looking for a poor man’s explanation. It is a typical order recommendation system. I have a database with around 699,445…
2
votes
2 answers

Building Recommendation engine with Python

Which Python libraries are equivalent to, or more advanced than, Mahout for building recommendation systems with collaborative filtering and content-based filtering? Also, is there a way to integrate Mahout with Python?
Sreejithc321
  • 1,940
  • 3
  • 20
  • 34
2
votes
0 answers

Creating Data model for Mahout recommendation engine

I am trying to build an item-item similarity matching recommendation engine with Mahout. The data set is in the following format (attributes are text, not numeric): name : category : cost : ingredients x : xx1 : 15 : xxx1, xxx2,…
2
votes
1 answer

Mahout Spark shell not working

I installed Hadoop, Mahout and Spark. I am able to see the Hadoop and Spark MasterWebUI. Moreover, I can also run the following command: [hadoop@muildevcel01 mahout]$ bin/mahout However, when I try running the spark-shell I run into the problem stated…
Dimag Kharab
  • 141
  • 1
  • 5
1
vote
0 answers

Parameters for OnlineLogisticRegression function in Mahout

Can anyone tell me where I can find documentation for parameters like -stepOffset, -alpha and -decayExponent in the OnlineLogisticRegression function in Mahout? I am interested in what they change in calls like this one: int FEATURES = 10000; …
Marcin
  • 235
  • 4
  • 15
1
vote
1 answer

Mimic a Mahout-like system

I have a data set, in Excel format, with account names, reported symptoms, a determined root cause and a date in month-year format for each row. I am trying to implement a Mahout-like system with a purpose of determining the likelihood symptoms an…
SRS
  • 1,065
  • 5
  • 11
  • 22
1
vote
1 answer

Unknown program 'spark-itemsimilarity' chosen

I have Cloudera CDH5 running inside VirtualBox. When I try to run: mahout spark-itemsimilarity .... I get the error: Unknown program 'spark-itemsimilarity' chosen. Do I have to install any additional package to run spark-itemsimilarity? Any…
1
vote
0 answers

Mahout clusterdump top terms meaning

I apologize if this has been asked and I feel that it may be obvious, but I am wondering exactly what the meaning is of the numerical value below from clusterdump: Top Terms: monkey => 0.8170868432876803 I believe that to be the center of…
Chris
  • 221
  • 1
  • 2
  • 8
0
votes
2 answers

Collaborative filtering using graph and machine learning

What are the advantages and disadvantages of collaborative-filtering-based recommendation using a machine learning approach versus a graph-based approach? Say I have user purchase data (user_name, user_location, user_company_name, product_name,…
Sreejithc321
  • 1,940
  • 3
  • 20
  • 34