get_items() is a generator that yields the search results one at a time, each as a Tweet object.
It has no length, so the tweets have to be counted inside a for loop.
The following code works.
Fetching 100K tweets is possible, but it takes too much time, so I reduced the limit to 1K tweets.
import snscrape.modules.twitter as sntwitter
import pandas as pd

query = 'from:elonmusk since:2022-08-01 until:2023-01-28'
limit = 1000  # stop after this many tweets

# get_items() is a generator, so tweets are fetched lazily as the loop runs
tweets = sntwitter.TwitterSearchScraper(query).get_items()
index = 0
df = pd.DataFrame(columns=['Date', 'URL', 'Tweet'])

for tweet in tweets:
    if index == limit:
        break
    URL = "https://twitter.com/{0}/status/{1}".format(tweet.user.username, tweet.id)
    row = {'Date': tweet.date, 'URL': URL, 'Tweet': tweet.rawContent}
    # Each appended row keeps index label 0 because ignore_index is not set
    df = pd.concat([df, pd.DataFrame.from_records([row])])
    index = index + 1

# Convert the time zone from UTC to UTC-8.
# 'Etc/GMT+8' uses the POSIX sign convention, so it is 8 hours behind UTC.
df['Date'] = df['Date'].dt.tz_convert('Etc/GMT+8')
print(df)
df.to_csv('tweets.csv')
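The manual counter in the loop above can also be expressed with itertools.islice, which stops pulling from a generator after a fixed number of items. This is only a minimal sketch: fake_tweets is a hypothetical stand-in for TwitterSearchScraper(query).get_items(), used so the example runs without network access, and its field values are placeholders.

```python
from itertools import islice
from types import SimpleNamespace


def fake_tweets():
    """Hypothetical stand-in for get_items(): yields tweet-like objects forever."""
    tweet_id = 0
    while True:
        tweet_id += 1
        yield SimpleNamespace(
            id=tweet_id,
            date=f"2023-01-{tweet_id:02d}",  # placeholder; real items carry a datetime
            rawContent=f"tweet number {tweet_id}",
            user=SimpleNamespace(username="example_user"),
        )


limit = 5
rows = []
# islice stops after `limit` items, so no index counter or break is needed
for tweet in islice(fake_tweets(), limit):
    rows.append({
        "Date": tweet.date,
        "URL": f"https://twitter.com/{tweet.user.username}/status/{tweet.id}",
        "Tweet": tweet.rawContent,
    })

print(len(rows))
print(rows[0]["URL"])
```

With the real scraper, the collected list could be passed once to pd.DataFrame(rows) after the loop, which avoids calling pd.concat for every tweet.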
This is a single item from get_items().
Only the values of the required keys need to be extracted:
tweet.date -> Date
https://twitter.com/{tweet.user.username}/status/{tweet.id} -> URL
tweet.rawContent -> Tweet
{
  "_type": "snscrape.modules.twitter.Tweet",
  "url": "https://twitter.com/elonmusk/status/1619164489710178307",
  "date": "2023-01-28T02:44:31+00:00",
  "rawContent": "@tn_daki @ShitpostGate Yup",
  "renderedContent": "@tn_daki @ShitpostGate Yup",
  "id": 1619164489710178307,
  "user": {
    "_type": "snscrape.modules.twitter.User",
    "username": "elonmusk",
    "id": 44196397,
    "displayname": "Mr. Tweet",
    "rawDescription": "",
    "renderedDescription": "",
    "descriptionLinks": null,
    "verified": true,
    "created": "2009-06-02T20:12:29+00:00",
    "followersCount": 127536699,
    "friendsCount": 176,
    "statusesCount": 22411,
    "favouritesCount": 17500,
    "listedCount": 113687,
    "mediaCount": 1367,
    "location": "",
    "protected": false,
    "link": null,
    "profileImageUrl": "https://pbs.twimg.com/profile_images/1590968738358079488/IY9Gx6Ok_normal.jpg",
    "profileBannerUrl": "https://pbs.twimg.com/profile_banners/44196397/1576183471",
    "label": null,
    "url": "https://twitter.com/elonmusk"
  }
  ... cut off
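The key mapping can be checked against the sample item itself. The sketch below works on the JSON (dict) form shown above rather than on the Tweet object, so values are read with item["date"] instead of tweet.date. extract_row is a hypothetical helper name, and the sample is trimmed to the keys that are actually used.

```python
import json

# Trimmed copy of the sample item; only the keys used here are kept.
sample = json.loads("""
{
  "date": "2023-01-28T02:44:31+00:00",
  "rawContent": "@tn_daki @ShitpostGate Yup",
  "id": 1619164489710178307,
  "user": {"username": "elonmusk"}
}
""")


def extract_row(item):
    """Hypothetical helper: pick Date, URL and Tweet out of one item dict."""
    return {
        "Date": item["date"],
        "URL": "https://twitter.com/{0}/status/{1}".format(
            item["user"]["username"], item["id"]
        ),
        "Tweet": item["rawContent"],
    }


row = extract_row(sample)
print(row["URL"])  # same value as the "url" key of the sample item
```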
Result
>python get-data.py
Date URL Tweet
0 2023-01-27 15:29:36-08:00 https://twitter.com/elonmusk/status/1619115435... @farzyness No way
0 2023-01-27 15:14:05-08:00 https://twitter.com/elonmusk/status/1619111533... @mtaibbi Please correct your bs @PolitiFact &a...
0 2023-01-27 14:52:55-08:00 https://twitter.com/elonmusk/status/1619106207... @WallStreetSilv A quarter of all taxes just to...
0 2023-01-27 13:28:26-08:00 https://twitter.com/elonmusk/status/1619084945... @nudubabba @mikeduncan Yeah, whole thing
0 2023-01-27 13:12:16-08:00 https://twitter.com/elonmusk/status/1619080876... @TaraBull808 That’s way more monkeys than the ...
.. ... ... ...
0 2022-12-14 11:14:53-08:00 https://twitter.com/elonmusk/status/1603106271... @Jason Advertising revenue next year will be l...
0 2022-12-14 04:08:43-08:00 https://twitter.com/elonmusk/status/1602999020... @Balyx_ He would be welcome
0 2022-12-14 03:42:47-08:00 https://twitter.com/elonmusk/status/1602992493... @NorwayMFA @TwitterSupport @jonasgahrstore @AH...
0 2022-12-14 03:35:14-08:00 https://twitter.com/elonmusk/status/1602990594... @AvidHalaby Wow
0 2022-12-14 03:35:03-08:00 https://twitter.com/elonmusk/status/1602990549... @AvidHalaby Live & learn …
[1000 rows x 3 columns]
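The -08:00 offsets in the output come from the zone name 'Etc/GMT+8'. In the Etc/GMT names the sign is inverted relative to the usual UTC offset notation, which is easy to get wrong. A small sketch to illustrate it, using a UTC timestamp derived from the second row of the output (15:14:05-08:00 is 23:14:05 UTC):

```python
import pandas as pd

# One UTC timestamp, derived from the second row of the output
dates = pd.Series(pd.to_datetime(["2023-01-27T23:14:05+00:00"], utc=True))

behind = dates.dt.tz_convert("Etc/GMT+8")  # POSIX-style name: 8 hours behind UTC
ahead = dates.dt.tz_convert("Etc/GMT-8")   # 8 hours ahead of UTC

print(behind.iloc[0])  # 2023-01-27 15:14:05-08:00
print(ahead.iloc[0])   # 2023-01-28 07:14:05+08:00
```

If the target is a zone 8 hours ahead of UTC, 'Etc/GMT-8' or a named zone would be the one to pass to tz_convert.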
Reference
Converting time zone pandas dataframe
Tweet URL format
Detailed information is here
Example:
URL = "https://twitter.com/elonmusk/status/1619111533216403456"
It is saved into the CSV file as:
0,2023-01-27 15:14:05-08:00,https://twitter.com/elonmusk/status/1619111533216403456,@mtaibbi Please correct your bs @PolitiFact & @snopes
The tweet content at that URL matches the Tweet column of the pandas DataFrame.
You can also add more columns, such as followersCount, friendsCount, statusesCount, favouritesCount, listedCount, mediaCount, replyCount, retweetCount, likeCount and viewCount.
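A minimal sketch of building a row with extra columns follows. It uses a hypothetical tweet-like object instead of a real scrape. The user-level names come from the sample item, while the tweet-level names (replyCount, retweetCount, likeCount, viewCount) are assumed from the list above and may not exist in every snscrape version, which is why getattr with a default is used.

```python
from types import SimpleNamespace

# Hypothetical tweet-like object. The user counts are copied from the sample item;
# the tweet-level counts are made-up placeholder values.
tweet = SimpleNamespace(
    id=1619164489710178307,
    rawContent="@tn_daki @ShitpostGate Yup",
    replyCount=1,
    retweetCount=2,
    likeCount=3,
    user=SimpleNamespace(username="elonmusk", followersCount=127536699, friendsCount=176),
)

USER_FIELDS = ["followersCount", "friendsCount"]
TWEET_FIELDS = ["replyCount", "retweetCount", "likeCount", "viewCount"]  # assumed names

row = {"Tweet": tweet.rawContent}
# getattr with a default keeps the row usable when an attribute is absent
for name in USER_FIELDS:
    row[name] = getattr(tweet.user, name, None)
for name in TWEET_FIELDS:
    row[name] = getattr(tweet, name, None)

print(row)
```

Here viewCount ends up as None because the stand-in object does not define it. The column names also need to be added to the DataFrame so the new values are kept.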
