2

I want to load a JSON file into a pandas DataFrame (Python). I tried read_json(), but got this error:

UnicodeDecodeError: 'charmap' codec can't decode byte 0x81 in position 21596351: character maps to <undefined>

I suspect the JSON file contains some unwanted bytes (noise). The data is server-generated.

This is one record from the JSON file:

{"_id":{"$oid":"57a30ce368fd0809ec4d1b41"},"session":{"start_timestamp":{"$numberLong":"1470151881189"},"session_id":"8356bd90-20160802-153121189"},"metrics":{},"arrival_timestamp":{"$numberLong":"1470152028294"},"event_type":"OfferViewed","event_timestamp":{"$numberLong":"1470151943271"},"event_version":"3.0","application":{"package_name":"com.think.vito","title":"Vito","version_code":"5","app_id":"7ffa58dab3c646cea642e961ff8a8070","cognito_identity_pool_id":"us-east-1:4d9cf803-0487-44ec-be27-1e160d15df74","version_name":"2.0.0.0","sdk":{"version":"2.2.2","name":"aws-sdk-android"}},"client":{"cognito_id":"us-east-1:1d507b8f-857c-42a4-a705-8db07d46bc8f","client_id":"aa092911-b9a7-498a-82da-76318356bd90"},"device":{"locale":{"country":"US","code":"en_US","language":"en"},"platform":{"version":"5.1.1","name":"ANDROID"},"make":"Xiaomi","model":"Redmi Note 3"},"attributes":{"Category":"90000","CustomerID":"4077","OfferID":"11846"}}
Eskapp
Abhishek Pathak

3 Answers

1

You have to read the file line by line. A detailed answer is given in a related Stack Overflow question.
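
As a minimal sketch of the line-by-line approach: assuming the file holds one JSON document per line (as in the sample record in the question) and is UTF-8 encoded, each line can be parsed on its own and the nested fields flattened into columns. The function name load_json_lines is illustrative and is not taken from the linked answer.

```python
import json
import pandas as pd

def load_json_lines(path, encoding="utf-8"):
    """Parse one JSON document per line and return a flattened DataFrame."""
    records = []
    # An explicit encoding avoids the platform default ('charmap' on Windows).
    with open(path, "r", encoding=encoding) as f:
        for line in f:
            line = line.strip()
            if line:  # skip blank lines
                records.append(json.loads(line))
    # json_normalize expands nested dicts into dotted column names,
    # e.g. "session.session_id".
    return pd.json_normalize(records)
```

pandas can also read such a file directly with pd.read_json(path, lines=True, encoding="utf-8"), although nested objects then stay as dicts inside the columns.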

Dani Mesejo
0
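import pandas as pd
# Open the file explicitly as UTF-8 so Python does not fall back to
# the platform default codec ('charmap' on Windows).
with open('json_file', 'r', encoding='utf-8') as f:
    df = pd.read_json(f)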

This should work, provided the file is actually UTF-8 encoded.
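
If the file turns out to contain bytes that are not valid UTF-8 (the "noise" the question mentions), one hedged variant is to decode with errors="replace", so that undecodable bytes become U+FFFD instead of raising. This is only a sketch: the helper name read_json_lenient is hypothetical, and it assumes the file fits in memory and holds one JSON document per line.

```python
import io
import pandas as pd

def read_json_lenient(path, encoding="utf-8", lines=True):
    """Read JSON into a DataFrame, replacing undecodable bytes with U+FFFD."""
    with open(path, "r", encoding=encoding, errors="replace") as f:
        text = f.read()
    # Wrap the text in StringIO so pandas treats it as a file-like object.
    return pd.read_json(io.StringIO(text), lines=lines)
```

Replaced characters show up as "\ufffd" in string columns, which makes the affected records easy to find afterwards.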

Himanshu Rai
-1

You can try this.

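import io
import pandas as pd
# Read line by line as UTF-8 text (one JSON document per line);
# strip() removes the trailing \n and blank lines are skipped.
with open('json_file', 'r', encoding='utf-8') as f:
    cleaned = [line.strip() for line in f if line.strip()]
# Join the documents into a single JSON array
json_text = "[" + ",".join(cleaned) + "]"

df = pd.read_json(io.StringIO(json_text))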
Himanshu Rai