Analyzing My Google Search History
Google makes it really easy to download a list of everything you've ever searched, meticulously logged to the microsecond. I'm always looking for excuses to practice Python, so over the latter part of my winter break I put a few hours into writing a Python script that shows some metrics of my search history.
Getting started
First, I downloaded my Google history from Google Takeout, which is good practice anyway. I looked at a couple of the different data types and decided to look at my search history, for which I have multiple years of data.
Formatting the data
The first thing I needed to do was format the data into some usable structures. The search history is in JSON:
{"query":{"id":[{"timestamp_usec":"1372456830180971"}],"query_text":"Restaurants"}},
Not too bad. And yes, that's a lot of digits for a timestamp! That's because it's in microseconds.
Python's json library makes quick work of the files, which I organized into a map where each query is a key pointing to a list of its timestamps. I also recorded all timestamps in a single list for a frequency analysis.
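A minimal sketch of that grouping might look like the following. It is not the script's actual code: the function name `load_queries` is made up for illustration, and it assumes the records have already been gathered into a single JSON list shaped like the sample line above (the real export may wrap them in an outer object).

```python
import json
from collections import defaultdict


def load_queries(raw_json):
    """Return (query text -> list of timestamps, flat list of all timestamps)."""
    records = json.loads(raw_json)  # assumed: a list of {"query": {...}} records
    by_query = defaultdict(list)
    all_timestamps = []
    for record in records:
        query = record["query"]
        text = query["query_text"]
        # A query can carry several timestamps, one per time it was searched.
        for entry in query["id"]:
            usec = int(entry["timestamp_usec"])  # microseconds since the epoch
            by_query[text].append(usec)
            all_timestamps.append(usec)
    return by_query, all_timestamps


sample = (
    '[{"query":{"id":[{"timestamp_usec":"1372456830180971"}],'
    '"query_text":"Restaurants"}}]'
)
by_query, all_timestamps = load_queries(sample)
```

Keeping the timestamps as integers avoids any rounding until they are actually converted to dates.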
Doing something with the data
The first thing I needed to do was convert the timestamps to a more usable format, which I did by converting them to days. Then, using matplotlib, I plotted a bar graph of searches per day over the full span of the data.
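As a rough sketch of the conversion, the microsecond timestamps can be divided down to seconds and turned into calendar dates, then counted. The helper names here are hypothetical, and using UTC is an assumption; the original script may well have used local time.

```python
from collections import Counter
from datetime import datetime, timezone


def to_day(timestamp_usec):
    """Convert a microsecond timestamp to a calendar date (UTC assumed)."""
    seconds = timestamp_usec / 1_000_000
    return datetime.fromtimestamp(seconds, tz=timezone.utc).date()


def searches_per_day(timestamps_usec):
    """Count how many searches fall on each calendar date."""
    return Counter(to_day(t) for t in timestamps_usec)


per_day = searches_per_day([1372456830180971, 1372456830180971])
```

The keys and values of the resulting counter are the x positions and bar heights a bar chart needs.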
That graph doesn't tell us much, though: there's too much variance in the data, and little meaning can be drawn from individual days. So I converted days to weeks and plotted the number of searches per week as a line plot.
This is a bit more meaningful, and I can start noticing trends that align with life changes. It's a bit easier to see on a plot of my searches per month.
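One way to coarsen the buckets, sketched under the same UTC assumption, is to swap the grouping key. The names `week_key`, `month_key`, and `count_by` are illustrative only, and ISO week numbering is my choice here rather than something the post specifies.

```python
from collections import Counter
from datetime import datetime, timezone


def _moment(timestamp_usec):
    """Microsecond timestamp -> timezone-aware datetime (UTC assumed)."""
    return datetime.fromtimestamp(timestamp_usec / 1_000_000, tz=timezone.utc)


def week_key(timestamp_usec):
    """(ISO year, ISO week number) for the timestamp."""
    iso = _moment(timestamp_usec).isocalendar()
    return (iso[0], iso[1])


def month_key(timestamp_usec):
    """(year, month) for the timestamp."""
    moment = _moment(timestamp_usec)
    return (moment.year, moment.month)


def count_by(timestamps_usec, key):
    """Count searches per bucket, where key maps a timestamp to its bucket."""
    return Counter(key(t) for t in timestamps_usec)
```

Sorting the bucket keys gives the x axis in chronological order, since tuples of (year, week) or (year, month) compare naturally.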
There are a few very noticeable valleys and peaks on this plot. Starting from the left, the first massive jump occurs during my senior year of high school, which I attribute to regular programming and school work. The summer of 2014 saw a drop in searches while I worked in a warehouse: I wasn't allowed on my phone and spent little time at the computer, so my searches dropped drastically. My searches go back up for the duration of my freshman year of college, where they stay relatively consistent. The first major peak occurs in October of 2015, once I started programming regularly at my first internship. After that internship ended in December, my searches go down until the summer of 2016, when I was once again on an internship. When I went back to school in the fall of 2016 my searches drop again, until I got my Google Home at the start of November, which caused them to spike. My searches fell during winter break, and I expect them to spike again once data comes in for my next internship.
The clear effects that working had on my search numbers inspired me to plot two new metrics. The first was the percent of my searches during working hours on weekdays.
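A sketch of how that percentage could be computed per month follows. The post does not say what counts as working hours or which time zone was used, so the 9:00 to 17:00 window and the fixed UTC-5 offset below are assumptions, as are all the names.

```python
from collections import defaultdict
from datetime import datetime, timedelta, timezone

# Assumptions, not taken from the post: a 9:00-17:00 workday and a fixed
# UTC-5 offset (no daylight-saving handling).
LOCAL_TZ = timezone(timedelta(hours=-5))
WORK_START_HOUR = 9
WORK_END_HOUR = 17


def is_workday_search(timestamp_usec, tz=LOCAL_TZ):
    """True if the search happened Monday-Friday inside the work window."""
    moment = datetime.fromtimestamp(timestamp_usec / 1_000_000, tz=tz)
    on_weekday = moment.weekday() < 5  # Monday is 0, Sunday is 6
    return on_weekday and WORK_START_HOUR <= moment.hour < WORK_END_HOUR


def workday_percent_by_month(timestamps_usec, tz=LOCAL_TZ):
    """Map (year, month) -> percent of that month's searches made at work."""
    totals = defaultdict(int)
    during_work = defaultdict(int)
    for t in timestamps_usec:
        moment = datetime.fromtimestamp(t / 1_000_000, tz=tz)
        month = (moment.year, moment.month)
        totals[month] += 1
        if is_workday_search(t, tz):
            during_work[month] += 1
    return {m: 100.0 * during_work[m] / totals[m] for m in totals}
```

Because the result is a percentage rather than a raw count, months with very few searches can swing wildly, which is worth keeping in mind when reading the plot.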
This matches a lot of the conclusions I drew from the previous plot about the impact of my internships on the number of searches I made. So much so that I plotted the percent of searches during the workday against the number of searches I made per month.
My searches per month are in red, and the percent of searches during the workday are in blue. After the start of co-op, the months that see large jumps in the number of searches also see large increases in the percent of searches that occur during the workday. During my last school term, the percent of searches that occurred during the workday dropped drastically, while the number of searches per month skyrocketed. I again attribute this to getting my Google Home.
I also plotted the average number of searches made during each day of the week.
However, I don't think this graph is very meaningful. Because it includes data from the whole lifetime of my Google account, all days average out to about the same in the long term. I suspect that restricting the data to selected date ranges would show a higher average for weekdays during work semesters, but I chose to pursue other metrics.
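For reference, here is one possible way to compute a per-weekday average. It is only a sketch: it assumes UTC dates and averages over every calendar day between the first and last search, counting days with no searches as zero, which may differ from how the original plot was produced.

```python
from collections import Counter
from datetime import datetime, timedelta, timezone

DAY_NAMES = ["Monday", "Tuesday", "Wednesday", "Thursday",
             "Friday", "Saturday", "Sunday"]


def average_per_weekday(timestamps_usec):
    """Map weekday name -> mean searches per day on that weekday."""
    per_day = Counter(
        datetime.fromtimestamp(t / 1_000_000, tz=timezone.utc).date()
        for t in timestamps_usec
    )
    if not per_day:
        return {}
    totals = [0] * 7
    day_counts = [0] * 7
    current, last = min(per_day), max(per_day)
    # Walk every calendar day in range so that quiet days count as zero.
    while current <= last:
        totals[current.weekday()] += per_day[current]
        day_counts[current.weekday()] += 1
        current += timedelta(days=1)
    return {
        DAY_NAMES[i]: totals[i] / day_counts[i]
        for i in range(7)
        if day_counts[i]
    }
```

Filtering the timestamp list to a single semester before calling the function would be one way to test the hunch about work terms.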
What I Searched For
The search data came in the form of saved queries, that is, the entire string of what I searched for. The first thing I did was find my top queries, which were skewed because slight variations in a search produce different queries. My top queries, scrubbed for identifying information (represented by ~what it was~), can be found at the bottom of the post.
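Ranking queries from the query-to-timestamps map could be sketched like this. The function names are hypothetical, and the optional `normalize` flag is an addition of mine to show how case and whitespace variations might be merged; the listing at the bottom of the post was evidently produced without it, since "UC PAL" and "uc pal" appear separately.

```python
from collections import Counter


def top_queries(by_query, limit=10, normalize=False):
    """Return the most frequent queries as (text, count) pairs."""
    counts = Counter()
    for text, stamps in by_query.items():
        # Optionally fold case and surrounding whitespace so that near
        # duplicates share one entry.
        key = text.strip().lower() if normalize else text
        counts[key] += len(stamps)
    return counts.most_common(limit)


def format_ranking(ranked):
    """Render (text, count) pairs in the 'rank:(count) text' style."""
    return [
        f"{rank}:({count}) {text}"
        for rank, (text, count) in enumerate(ranked, start=1)
    ]


example = {"google maps": [1, 2, 3], "UC PAL": [4], "uc pal": [5]}
merged = top_queries(example, normalize=True)
```

Normalizing only helps with trivial variations; rephrased searches for the same thing still end up as separate queries.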
There are some odd things in here, like a search for directions from a Greyhound station to UC; the only explanation I have is that the search results weren't loading. I've made that trip once and definitely didn't need to plan it 36 times.
Because my top queries were so arbitrary, I broke queries up into individual words. The results (again scrubbed for identifying information) can be found below my top queries.
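A minimal sketch of the word breakdown, assuming words are lowercased and split on whitespace with punctuation left attached (which matches entries such as "cincinnati," in the listing). The name `top_words` is illustrative, and weighting each word by how often its query was searched is my assumption.

```python
from collections import Counter


def top_words(by_query, limit=10):
    """Return the most frequent words across all queries as (word, count)."""
    counts = Counter()
    for text, stamps in by_query.items():
        # split() with no argument drops empty strings, so repeated spaces
        # do not produce a blank "word".
        for word in text.lower().split():
            counts[word] += len(stamps)
    return counts.most_common(limit)


example = {"Cincinnati, OH weather": [1, 2], "weather": [3]}
words = top_words(example)
```

Filtering out common stop words such as "to", "the", and "of" before counting would push the more telling words toward the top.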
This data was a little harder to work with: without a way to categorize words, the information is hard to digest. Clearly a number of searches relate to programming, which is my largest use case for searching. A lot of results are also directions. If nothing else, this data is a window into my interests and Internet usage habits.
Running the Script
The source code I used to generate this data can be found on GitHub. Usage instructions can be found in the README, including directions on downloading your Google search history. Happy coding!
My top queries and searched for words
********************************************************************************
Top Searched Queries
********************************************************************************
1:(71) google maps
2:(39) nearby gamestop
3:(38) uc onestop
4:(36) Greyhound Bus Lines, 1005 Gilbert Ave, Cincinnati, OH 45202 -> University of Cincinnati, 2600 Clifton Ave, Cincinnati, OH 45220
5:(35) tesla
6:(35) yes
7:(34) wolfram alpha
8:(32) add reminder
9:(28) easy bib
10:(26) weather
11:(24) movies
12:(24) stop
13:(20) google keep
14:(18) ~My Home Address~ -> ~My brother's home address~
15:(18) rap genius
16:(17) restaurants
17:(17) (Current Location) -> Play it Again Sports, 8223 Colerain Ave, Cincinnati, OH 45239
18:(17) saucony guide 5
19:(16) solipsism
20:(16) amazon
21:(16) galaxy note 10.1 2015
22:(16) indians
23:(15) university of cincinnati
24:(15) motel
25:(15) netflix
26:(15) chipotle
27:(14) gmail
28:(14) urban dictionary
29:(14) gannon university programming contest
30:(13) rate my professor
31:(13) hotel
32:(13) UC PAL
33:(13) (Current Location) -> San Francisco International Airport (SFO), San Francisco, CA
34:(13) the venue of athens
35:(13) new reminder
36:(13) what's the temperature
37:(13) uc pal
38:(12) google translate
39:(12) wfmj
40:(12) lil dicky kickstarter
41:(12) act
42:(12) dokku
43:(11) (Current Location) -> ~My parent's address~
44:(11) MUTF:VTTSX
45:(11) dokku remote rejected pre receive access hook declined
46:(11) pnc bank
47:(11) applebees
48:(11) Cincinnati, OH 45202 1
49:(11) samsung galaxy note 10.1 2015
50:(10) duke energy
51:(10) Cleveland Cavaliers score
52:(10) bogarts hoodie allen
53:(10) google calendar
54:(10) mobile access to cincinnati email
55:(10) ~My home address~ -> ~My brother's street~
56:(9) putt putt
57:(9) spotify
58:(9) wall street journal
59:(9) seafood
60:(9) 21 news youngstown
61:(9) google
62:(9) what's the weather today
63:(9) uc email
64:(9) food
65:(9) deploy express app to dokku
66:(9) directions home
67:(9) ~My girlfriend's name~
68:(9) gopherarun
69:(8) endocrine system
70:(8) github
71:(8) dunkin donuts
72:(8) (Current Location) -> ~My home address~
73:(8) wsj
74:(8) iowa caucus
75:(8) remind me to respond to Adam
76:(8) game theory
77:(8) gayathri
78:(8) netbeans
79:(8) somo
80:(8) ohio cicada map
81:(8) hacktoberfest
82:(7) spanish dict
83:(7) ~My brother's address~ -> ~My parent's address~
84:(7) Toronto, ON, Canada 1
85:(7) set an alarm for 8 a.m.
86:(7) food near me
87:(7) tmux cheat sheet
88:(7) the onion
89:(7) (Current Location) -> Grange Road, Santa Rosa, CA
90:(7) cavs
91:(7) northrop grumman baltimore
92:(7) Cleveland Indians
93:(7) math backgrounds
94:(7) indians standings
95:(7) uc ceas
96:(7) ohio state
97:(7) running quotes
98:(7) note 10.1 2015
99:(7) japanese spider crab
100:(7) mount st helens
101:(7) drizzy
********************************************************************************
Top Searched Words
********************************************************************************
0:(1265) to 200:( 30) chevrolet 400:( 18) hall
1:( 752) the 201:( 30) spotify 401:( 18) man
2:( 654) of 202:( 30) start 402:( 18) flash
3:( 626) how 203:( 30) 7 403:( 18) max
4:( 541) oh 204:( 30) good 404:( 18) howland
5:( 507) in 205:( 29) university, 405:( 18) version
6:( 483) a 206:( 29) toronto, 406:( 18) more
7:( 453) -> 207:( 29) ne, 407:( 18) random
8:( 378) is 208:( 29) wiki 408:( 18) record
9:( 370) on 209:( 29) much 409:( 18) reply
10:( 322) for 210:( 29) doesn't 410:( 18) iron
11:( 303) me 211:( 29) convert 411:( 18) picture
12:( 288) 212:( 29) matlab 412:( 18) stone
13:( 245) what 213:( 28) download 413:( 18) setup
14:( 242) (current 214:( 28) richmond, 414:( 18) developer
15:( 242) location) 215:( 28) ga 415:( 18) games
16:( 226) cincinnati, 216:( 28) two 416:( 18) weather
17:( 219) remind 217:( 28) remove 417:( 18) middle
18:( 212) python 218:( 28) directory 418:( 18) header
19:( 205) and 219:( 28) eastman 419:( 18) access
20:( 192) cincinnati 220:( 28) class 420:( 18) a.m.
21:( 181) ohio 221:( 28) studios 421:( 17) chicken
22:( 180) java 222:( 28) high 422:( 17) feminine
23:( 167) uc 223:( 28) without 423:( 17) library
24:( 166) google 224:( 28) center 424:( 17) cedar
25:( 164) c++ 225:( 28) what's 425:( 17) pittsburgh,
26:( 153) qt 226:( 28) song 426:( 17) value
27:( 153) university 227:( 28) like 427:( 17) link
28:( 146) git 228:( 28) server 428:( 17) ascii
29:( 136) set 229:( 28) 2016 429:( 17) meaning
30:( 131) do 230:( 28) off 430:( 17) main
31:( 127) state 231:( 28) car 431:( 17) know
32:( 126) does 232:( 27) cleveland 432:( 17) dayton
33:( 126) an 233:( 27) table 433:( 17) need
34:( 125) from 234:( 27) bash 434:( 17) goat
35:( 118) windows 235:( 27) childish 435:( 17) harvest
36:( 117) file 236:( 27) bar 436:( 17) water
37:( 112) define: 237:( 27) dead 437:( 17) doesnt
38:( 112) not 238:( 27) school 438:( 17) pay
39:( 109) galaxy 239:( 27) should 439:( 17) swing
40:( 106) my 240:( 27) view 440:( 17) control
41:( 105) i 241:( 26) was 441:( 17) scheme
42:( 99) you 242:( 26) video 442:( 17) american
43:( 98) vs 243:( 26) ocean 443:( 17) box
44:( 98) pi 244:( 26) image 444:( 17) act
45:( 98) get 245:( 26) valley 445:( 17) item
46:( 95) at 246:( 26) music 446:( 17) white
47:( 94) st, 247:( 26) w 447:( 17) football
48:( 93) with 248:( 26) usb 448:( 17) primary
49:( 93) new 249:( 26) 45219, 449:( 17) account
50:( 93) linux 250:( 26) clifton 450:( 17) example
51:( 90) vine 251:( 26) student 451:( 17) because
52:( 89) raspberry 252:( 26) running 452:( 17) netflix
53:( 80) command 253:( 26) take 453:( 17) birthday
54:( 79) work 254:( 26) npm 454:( 16) edit
55:( 79) when 255:( 25) college 455:( 16) york
56:( 77) rd, 256:( 25) array 456:( 16) c
57:( 76) up 257:( 25) program 457:( 16) drop
58:( 74) time 258:( 25) tour 458:( 16) 2014
59:( 73) genius 259:( 25) bad 459:( 16) mayer
60:( 72) 2 260:( 25) api 460:( 16) ap
61:( 72) are 261:( 25) store 461:( 16) year
62:( 72) it 262:( 25) xbox 462:( 16) monday
63:( 71) github 263:( 25) app 463:( 16) tell
64:( 70) ~house number~ 264:( 25) center, 464:( 16) =
65:( 69) warren, 265:( 25) project 465:( 16) sound
66:( 69) code 266:( 25) microsoft 466:( 16) keys
67:( 66) ubuntu 267:( 25) movie 467:( 16) youtube
68:( 66) make 268:( 24) put 468:( 16) tree
69:( 65) computer 269:( 24) quotes 469:( 16) hill
70:( 64) use 270:( 24) minutes 470:( 16) space
71:( 63) reddit 271:( 24) size 471:( 16) vim
72:( 62) run 272:( 24) best 472:( 16) big
73:( 62) where 273:( 24) columbus 473:( 16) window
74:( 61) call 274:( 24) 45150 474:( 16) savannah,
75:( 61) android 275:( 24) season 475:( 16) public
76:( 61) line 276:( 24) calendar 476:( 16) chipotle
77:( 60) change 277:( 24) word 477:( 16) same
78:( 60) have 278:( 24) internet 478:( 16) temperature
79:( 60) no 279:( 24) girl 479:( 16) spanish
80:( 60) text 280:( 24) all 480:( 16) ny
81:( 60) squires 281:( 24) card 481:( 16) free
82:( 59) play 282:( 24) software 482:( 16) group
83:( 58) tomorrow 283:( 24) django 483:( 16) pnc
84:( 56) install 284:( 24) pictures 484:( 16) meme
85:( 56) columbus, 285:( 24) buy 485:( 16) char
86:( 55) ~house number~ 286:( 24) chevy 486:( 16) 30
87:( 55) alarm 287:( 23) point 487:( 16) s
88:( 55) list 288:( 23) x 488:( 16) first
89:( 54) string 289:( 23) siemens 489:( 16) column
90:( 54) one 290:( 23) star 490:( 15) wallpaper
91:( 53) park 291:( 23) kanye 491:( 15) full
92:( 53) error 292:( 23) only 492:( 15) m
93:( 52) by 293:( 23) website 493:( 15) build
94:( 52) or 294:( 23) connect 494:( 15) creek
95:( 52) home 295:( 23) numbers 495:( 15) return
96:( 51) add 296:( 23) express 496:( 15) san
97:( 51) define 297:( 23) battery 497:( 15) cruze
98:( 51) about 298:( 23) multiple 498:( 15) office
99:( 51) ave, 299:( 23) hour 499:( 15) binding
100:( 51) create 300:( 23) toronto 500:( 15) safe
101:( 51) hours 301:( 23) power 501:( 15) panera
102:( 51) west 302:( 23) st 502:( 15) chris
103:( 49) drive 303:( 22) docker 503:( 15) down
104:( 49) samsung 304:( 22) has 504:( 15) watch
105:( 49) road, 305:( 22) folder 505:( 15) hackathon
106:( 49) out 306:( 22) 2000 506:( 15) digital
107:( 49) schedule 307:( 22) results 507:( 15) spitzer
108:( 48) can 308:( 22) things 508:( 15) bell
109:( 48) 3 309:( 22) east 509:( 15) over
110:( 48) if 310:( 22) did 510:( 15) lake
111:( 48) 1 311:( 22) pizza 511:( 15) kit
112:( 47) near 312:( 22) port 512:( 15) score
113:( 47) email 313:( 22) nx 513:( 15) people
114:( 46) visual 314:( 22) test 514:( 15) repo
115:( 45) directions 315:( 22) stop 515:( 15) merge
116:( 45) street, 316:( 22) 6 516:( 15) bank
117:( 44) usa 317:( 22) 45219 517:( 15) winery
118:( 44) dr, 318:( 22) data 518:( 15) pdf
119:( 44) find 319:( 22) today 519:( 15) frank
120:( 44) map 320:( 22) delete 520:( 15) food
121:( 44) that 321:( 22) - 521:( 14) synonyms
122:( 43) rap 322:( 22) old 522:( 14) top
123:( 43) drake 323:( 21) write 523:( 14) chromecast
124:( 43) phone 324:( 21) source 524:( 14) overflow
125:( 42) kent 325:( 21) laptop 525:( 14) richmond
126:( 42) this 326:( 21) zenbook 526:( 14) lil
127:( 42) sprint 327:( 21) road 527:( 14) media
128:( 41) be 328:( 21) 10.1 528:( 14) section
129:( 41) ssh 329:( 21) search 529:( 14) micro
130:( 41) canada 330:( 21) market 530:( 14) color
131:( 41) milford, 331:( 21) snapchat 531:( 14) auto
132:( 41) s6 332:( 21) web 532:( 14) latex
133:( 41) 10 333:( 21) haskell 533:( 14) 100
134:( 41) see 334:( 21) output 534:( 14) too
135:( 40) dokku 335:( 21) name 535:( 14) fairfield
136:( 40) states 336:( 21) disable 536:( 14) runescape
137:( 40) house 337:( 21) mean 537:( 14) tutorial
138:( 39) programming 338:( 21) world 538:( 14) html
139:( 39) avenue, 339:( 21) north 539:( 14) edition
140:( 39) 4 340:( 21) tesla 540:( 14) photos
141:( 39) ghost 341:( 21) database 541:( 14) age
142:( 39) day 342:( 21) analytics 542:( 14) model
143:( 39) menu 343:( 21) warren 543:( 14) community
144:( 39) acm 344:( 20) arm 544:( 14) 2013
145:( 39) mysql 345:( 20) commit 545:( 14) remote
146:( 38) amazon 346:( 20) book 546:( 14) super
147:( 38) united 347:( 20) dictionary 547:( 14) coffee
148:( 38) as 348:( 20) national 548:( 14) mail
149:( 38) john 349:( 20) volume 549:( 14) location
150:( 38) system 350:( 20) live 550:( 14) found
151:( 38) studio 351:( 20) wifi 551:( 14) money
152:( 38) game 352:( 20) after 552:( 14) just
153:( 38) twitter 353:( 20) function 553:( 14) mall
154:( 38) mongodb 354:( 20) park, 554:( 14) commons
155:( 37) update 355:( 20) osu 555:( 14) south
156:( 37) why 356:( 20) county 556:( 14) save
157:( 37) post 357:( 20) blue 557:( 14) between
158:( 36) your 358:( 20) lyrics 558:( 13) case
159:( 36) street 359:( 20) branch 559:( 13) movies
160:( 36) s3 360:( 20) cool 560:( 13) form
161:( 36) science 361:( 20) mystique 561:( 13) log
162:( 36) ln 362:( 20) user 562:( 13) cafe
163:( 36) number 363:( 20) graph 563:( 13) last
164:( 35) lane 364:( 20) real 564:( 13) installing
165:( 35) black 365:( 20) markdown 565:( 13) thomas
166:( 35) many 366:( 20) track 566:( 13) newport
167:( 35) show 367:( 20) maps 567:( 13) kentucky
168:( 35) ky 368:( 20) request 568:( 13) timer
169:( 34) files 369:( 19) sort 569:( 13) passport
170:( 34) pa 370:( 19) oh, 570:( 13) msi
171:( 34) note 371:( 19) surreywood 571:( 13) tmux
172:( 34) asus 372:( 19) history 572:( 13) bridge
173:( 34) gambino 373:( 19) nodejs 573:( 13) kill
174:( 33) drive, 374:( 19) read 574:( 13) won't
175:( 33) who 375:( 19) life 575:( 13) backup
176:( 33) & 376:( 19) monitor 576:( 13) cinemas
177:( 33) but 377:( 19) can't 577:( 13) message
178:( 33) back 378:( 19) hard 578:( 13) circle
179:( 32) city 379:( 19) password 579:( 13) cookie
180:( 32) morning 380:( 19) * 580:( 13) contest
181:( 32) minecraft 381:( 19) e 581:( 13) rd
182:( 32) indians 382:( 19) address 582:( 13) wikipedia
183:( 32) check 383:( 19) 8 583:( 13) desktop
184:( 32) using 384:( 19) outlook 584:( 13) order
185:( 32) javascript 385:( 19) speed 585:( 13) tickets
186:( 32) go 386:( 19) event 586:( 13) parking
187:( 32) 2015 387:( 19) fire 587:( 13) logo
188:( 32) northeast, 388:( 19) review 588:( 13) jtable
189:( 32) red 389:( 18) p.m. 589:( 13) now
190:( 32) long 390:( 18) copy 590:( 13) information
191:( 32) date 391:( 18) cast 591:( 13) grill
192:( 31) r 392:( 18) regal 592:( 13) amc
193:( 31) 5 393:( 18) ikea 593:( 13) night
194:( 31) into 394:( 18) keyboard 594:( 13) formula
195:( 31) open 395:( 18) river 595:( 13) short
196:( 31) cannot 396:( 18) calculator 596:( 13) youngstown
197:( 30) tonight 397:( 18) fallout 597:( 13) tools
198:( 30) va 398:( 18) service 598:( 13) ysu
199:( 30) node 399:( 18) key 599:( 13) dialog
600:( 13) price
601:( 13) area