Questions tagged [open-source]
23 questions
203
votes
35 answers
Publicly Available Datasets
One of the common problems in data science is gathering data from various sources in a somewhat cleaned (semi-structured) format and combining metrics from various sources for a higher-level analysis. Looking at other people's efforts,…
Amir Ali Akbari
28
votes
7 answers
Publicly available social network datasets/APIs
As an extension to our great list of publicly available datasets, I'd like to know if there is any list of publicly available social network datasets/crawling APIs. It would be very nice if, alongside a link to the dataset/API, characteristics…
Rubens
18
votes
5 answers
Open source data science projects to contribute to
Contributing to open source projects is typically a good way for newbies to get some practice, and for experienced data scientists and analysts to try a new area.
Which projects do you contribute to? Please provide a short intro and a link on GitHub.
IgorS
7
votes
5 answers
Where can I find free spatio-temporal datasets for download?
Where can I find free spatio-temporal datasets for download so that I can play with them in R?
mynameisJEFF
7
votes
2 answers
Item Based Collaborative Filtering with No Ratings
I am building a recommender for web pages. For each web page in our data set, we wish to generate a list of web pages that other users have also visited.
Our data only shows that a user has either visited a page, or they have not. Users do not…
sheldonkreger
5
votes
1 answer
Community-driven Open Data Platforms
Does anybody know of any community-driven open data platforms?
For example, consider the object detection task. The platforms that come to mind are Kaggle and Roboflow. However, in my opinion, both have a significant issue that makes it difficult to use these…
Leon Useinov
4
votes
3 answers
What open-source books (or other materials) provide a relatively thorough overview of data science?
As a researcher and instructor, I'm looking for open-source books (or similar materials) that provide a relatively thorough overview of data science from an applied perspective. To be clear, I'm especially interested in a thorough overview that…
statsRus
4
votes
3 answers
Data available from industry operations
I'm going to start my degree thesis and I want to build a fault detection system using machine learning techniques. I need datasets for my thesis, but I don't know where I can get that data. I'm looking for historical operation/maintenance/fault datasets…
Juan David
4
votes
1 answer
Open source solver for large mixed integer programming task?
I'm currently using General Algebraic Modeling System (GAMS), and more specifically CPLEX within GAMS, to solve a very large mixed integer programming problem. This allows me to parallelize the process over 4 cores (although I have more, CPLEX…
rnorberg
2
votes
0 answers
Regression dataset with categorical features
I have thought of a regression technique that I want to try on several datasets. I would like these datasets to have the following properties:
Be a tabular dataset (no images).
Have at least 20k rows, and ideally around 100k.
Have some categorical…
David Masip
2
votes
0 answers
Which algorithm is good for detecting and recognizing faces from a variety of angles?
I am building a face recognition app for my class attendance system. I collect training data from social websites like Facebook, Instagram, and others. As you can see, the images I got from there are usually not frontal but taken at a variety of angles. I…
RISHABH RAI
2
votes
1 answer
Difficulties of getting raw data
I am trying to obtain raw data for (violent) crime rates of a US/Canadian city (any city would do), but I need the data to be granular and raw. All I could find are interpretations, summary data, or useless editorials. I'm trying to do…
LearnByReading
2
votes
1 answer
Where can I find an open/free dataset of Galton's ox estimates (Wisdom of Crowds) and similar?
I am playing around with some thoughts on the Wisdom of Crowds phenomenon and wanted to do some analysis in R/Excel. Francis Galton pioneered this concept and I was hoping to use his dataset, but I can't find it anywhere online; I thought there was a…
Ethos
1
vote
1 answer
Publicly available news APIs/datasets?
In addition to our list of publicly available datasets, I'd like to know if there is any list of publicly available news datasets/crawling APIs. It would be very nice if, alongside a link to the dataset/API, characteristics of the data available…
stevec
1
vote
3 answers
Tools to preprocess big data for dashboards?
I have a complex dataset with more than 16M rows coming from the pharmaceutical industry. The data is stored in a SQL Server with multiple (more than 400) relational tables. The data has several levels of hierarchies like province, city, postal…
JeanVuda