I am working on a research project on illegal dumping within a city. The aim is to devise advanced analytics solutions for the waste management company, such as resource optimisation and trend identification, with a specific focus on advanced clustering. The dataset consists of public reports of illegal dumping in an urban city. It contains:
- enquiry_desc column: textual descriptions of dumped items
- Date and time information
- Location data (geocodes or latitude/longitude)
- Quantity/size of dumped items (must be extracted from the enquiry_desc column; it has 3 categories like small, mid, large vehicle)
Analysis Plan:
Cluster reported incidents based on the "Types of items dumped" category. I am trying to extract the items from the enquiry_desc column, which is turning out to be very difficult because the dataset is large, the text is inconsistently formatted, and free text is hard to work with in general. I intend to categorise the dumped items (furniture, construction, electronics etc.) and use the location data to visualise these clusters on a map, to help the waste management company with resource allocation.
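To make the extraction step concrete, here is a minimal sketch of the keyword-matching approach I have in mind. The category names come from my plan, but the keyword lists, the "other" fallback and the function name categorise_items are placeholders I made up for illustration; they would need to be refined against the real enquiry_desc text.

```python
import re

# Hypothetical keyword lists (assumption): placeholders, not derived from the data.
ITEM_KEYWORDS = {
    "furniture": ["sofa", "couch", "mattress", "chair", "table", "wardrobe"],
    "construction": ["rubble", "brick", "plasterboard", "timber", "tile"],
    "electronics": ["tv", "television", "fridge", "washing machine", "microwave"],
}


def categorise_items(description):
    """Return the sorted list of item categories whose keywords appear in the text."""
    text = description.lower()
    found = set()
    for category, keywords in ITEM_KEYWORDS.items():
        for kw in keywords:
            # \b stops "table" matching inside "vegetable"; s? allows simple plurals.
            if re.search(r"\b" + re.escape(kw) + r"s?\b", text):
                found.add(category)
                break
    # Reports with no recognised keyword fall into a catch-all bucket.
    return sorted(found) or ["other"]


if __name__ == "__main__":
    # Made-up example description, not taken from the dataset.
    print(categorise_items("Old sofa and 2 broken TVs left on pavement"))
```

A plain keyword lookup like this will miss misspellings and unusual wording, so I see it as a baseline to measure anything more sophisticated against.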
I figured K-means may not be the right approach, as it forces every report into one of a fixed number of clusters that has to be chosen in advance. DBSCAN might be an option, as we don't have to input the number of clusters. I have not done clustering before, and I would like to be able to visualise the clusters on a map. Does anyone have any input or suggestions? Let me know your thoughts and feel free to ask questions.
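For reference, this is my current understanding of how DBSCAN would work on the location data, written as a small self-contained sketch with only the standard library. The coordinates, the 50 m radius (eps_m) and min_samples=3 are made-up values for illustration, and the function names are my own. As far as I know, scikit-learn's DBSCAN can do the same thing with metric="haversine" on coordinates in radians, which is probably what I would use on the full dataset.

```python
import math

EARTH_RADIUS_M = 6371000.0


def haversine_m(p, q):
    """Great-circle distance in metres between two (lat, lon) points in degrees."""
    lat1, lon1, lat2, lon2 = map(math.radians, (p[0], p[1], q[0], q[1]))
    a = (math.sin((lat2 - lat1) / 2) ** 2
         + math.cos(lat1) * math.cos(lat2) * math.sin((lon2 - lon1) / 2) ** 2)
    return 2 * EARTH_RADIUS_M * math.asin(math.sqrt(a))


def dbscan(points, eps_m, min_samples):
    """Label each point with a cluster id (0, 1, ...) or -1 for noise."""
    n = len(points)
    # Brute-force neighbourhoods, O(n^2); each point counts as its own neighbour.
    neighbours = [
        [j for j in range(n) if haversine_m(points[i], points[j]) <= eps_m]
        for i in range(n)
    ]
    labels = [None] * n
    cluster = -1
    for i in range(n):
        if labels[i] is not None:
            continue
        if len(neighbours[i]) < min_samples:
            labels[i] = -1  # provisional noise; may become a border point later
            continue
        cluster += 1
        labels[i] = cluster
        queue = list(neighbours[i])
        while queue:
            j = queue.pop()
            if labels[j] == -1:
                labels[j] = cluster  # noise reachable from a core point: border
            if labels[j] is not None:
                continue
            labels[j] = cluster
            if len(neighbours[j]) >= min_samples:
                queue.extend(neighbours[j])  # j is a core point: keep expanding
    return labels


if __name__ == "__main__":
    # Made-up coordinates: two tight groups of reports plus one isolated report.
    reports = [
        (51.5000, -0.1000), (51.5001, -0.1001), (51.5002, -0.1000),
        (51.5200, -0.1200), (51.5201, -0.1201), (51.5202, -0.1200),
        (51.6000, -0.3000),
    ]
    print(dbscan(reports, eps_m=50, min_samples=3))
```

The labels could then be used to colour the points on a map, with -1 marking isolated reports that belong to no hotspot. Running this separately for each item category would give per-category hotspots.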
Thanks!