Questions tagged [open-source]

23 questions
203
votes
35 answers

Publicly Available Datasets

One of the common problems in data science is gathering data from various sources in a reasonably clean (semi-structured) format and combining metrics from those sources for higher-level analysis. Looking at other people's efforts,…
Amir Ali Akbari
  • 1,393
  • 3
  • 13
  • 25
28
votes
7 answers

Publicly available social network datasets/APIs

As an extension to our great list of publicly available datasets, I'd like to know if there is any list of publicly available social network datasets/crawling APIs. It would be very nice if, alongside a link to the dataset/API, characteristics…
Rubens
  • 4,117
  • 5
  • 25
  • 42
18
votes
5 answers

Open source data science projects to contribute

Contributing to open source projects is typically a good way for newbies to get some practice and for experienced data scientists and analysts to try a new area. Which projects do you contribute to? Please provide a short intro and a GitHub link.
IgorS
  • 5,474
  • 11
  • 34
  • 43
7
votes
5 answers

Where can I find a free spatio-temporal dataset for download?

Where can I find a free spatio-temporal dataset for download so that I can play with it in R?
mynameisJEFF
  • 171
  • 1
  • 3
7
votes
2 answers

Item Based Collaborative Filtering with No Ratings

I am building a recommender for web pages. For each web page in our data set, we wish to generate a list of web pages that other users have also visited. Our data only shows that a user has either visited a page, or they have not. Users do not…
sheldonkreger
  • 1,169
  • 8
  • 20
5
votes
1 answer

Community-driven Open Data Platforms

Does anybody know of community-driven open data platforms? For example, consider the object detection task. The platforms that come to mind are Kaggle and Roboflow. However, in my opinion, both have a significant issue that makes it difficult to use these…
4
votes
3 answers

What open-source books (or other materials) provide a relatively thorough overview of data science?

As a researcher and instructor, I'm looking for open-source books (or similar materials) that provide a relatively thorough overview of data science from an applied perspective. To be clear, I'm especially interested in a thorough overview that…
statsRus
  • 325
  • 1
  • 10
4
votes
3 answers

Data available from industry operations

I'm about to start my degree thesis, and I want to build a fault detection system using machine learning techniques. I need datasets for the thesis, but I don't know where to get them. I'm looking for historical operation/maintenance/fault datasets…
Juan David
  • 143
  • 3
4
votes
1 answer

Open source solver for large mixed integer programming task?

I'm currently using General Algebraic Modeling System (GAMS), and more specifically CPLEX within GAMS, to solve a very large mixed integer programming problem. This allows me to parallelize the process over 4 cores (although I have more, CPLEX…
rnorberg
  • 203
  • 2
  • 7
2
votes
0 answers

Regression dataset with categorical features

I have thought of a regression technique that I want to try on several datasets. I would like these datasets to have the following properties: Be a tabular dataset (no images). Have at least 20k rows, and ideally around 100k. Have some categorical…
David Masip
  • 6,136
  • 2
  • 28
  • 62
2
votes
0 answers

Which algorithm is good for detecting and recognizing faces from a variety of angles?

I am building a face recognition app for my class attendance system. I collect training data from social websites such as Facebook, Instagram, and others. As you can see, the images I got from there are usually not frontal but taken at a variety of angles. I…
2
votes
1 answer

Difficulties of getting raw data

I am trying to obtain raw data on (violent) crime rates for a US/Canadian city (any city would do), but I need the data to be granular and raw. All I could find were interpretations, summary data, or useless editorials. I'm trying to do…
2
votes
1 answer

Where can I find open/free Galton's Ox estimate/Wisdom of Crowd dataset and similar?

I am playing around with some thoughts on the Wisdom of Crowd phenomenon and wanted to do some analysis in R/Excel. Francis Galton pioneered this concept and I was hoping to use his dataset, but I can't find it anywhere online; I thought there was a…
Ethos
  • 21
  • 2
1
vote
1 answer

Publicly available news APIs/datasets?

In addition to our list of publicly available datasets, I'd like to know if there is any list of publicly available news datasets/crawling APIs. It would be very nice if, alongside a link to the dataset/API, characteristics of the data available…
stevec
  • 211
  • 1
  • 7
1
vote
3 answers

Tools to preprocess big data for dashboards?

I have a complex dataset with more than 16M rows from the pharmaceutical industry. The data is stored in a SQL server across more than 400 relational tables. It has several levels of hierarchy, such as province, city, postal…
JeanVuda
  • 431
  • 4
  • 6