Analyzing My Google Search History

Google makes it really easy to download a list of everything you've ever searched, meticulously logged to the microsecond. I'm always looking for excuses to practice Python, so over the latter part of my winter break I spent a few hours writing a Python script that shows some metrics about my search history.

Getting started

First, I downloaded my Google history from Google Takeout, which is good practice anyway. I looked at a few of the available data types and settled on my search history, which covers several years.

Formatting the data

The first thing I needed to do was format the data into some usable structures. The search history is stored as JSON:

{"query":{"id":[{"timestamp_usec":"1372456830180971"}],"query_text":"Restaurants"}},

Not too bad. And yes, that's a lot of digits for a timestamp! That's because it counts microseconds since the Unix epoch.
Python's json library makes quick work of the files. I organized the parsed data into a dictionary where each query is a key pointing to a list of its timestamps, and I also collected every timestamp into a single list for frequency analysis.
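A minimal sketch of that parsing step, assuming each record has the shape of the sample above. The function names are hypothetical, and the top-level `"event"` key used in `load_events` is an assumption about how the Takeout files wrap their records, not something confirmed in this post.

```python
import json
from collections import defaultdict


def index_queries(events):
    """Map each query string to its timestamps and collect every timestamp."""
    by_query = defaultdict(list)  # query text -> timestamps in microseconds
    all_timestamps = []           # every timestamp, for frequency analysis
    for event in events:
        query = event["query"]
        text = query["query_text"]
        # "id" is a list, so a single record may carry several timestamps.
        for stamp in query["id"]:
            ts = int(stamp["timestamp_usec"])
            by_query[text].append(ts)
            all_timestamps.append(ts)
    return dict(by_query), all_timestamps


def load_events(path):
    """Read one Takeout search file.

    ASSUMPTION: the records live under a top-level "event" list.
    """
    with open(path, encoding="utf-8") as f:
        return json.load(f)["event"]


# Usage with the sample record shown above:
sample = json.loads(
    '[{"query":{"id":[{"timestamp_usec":"1372456830180971"}],"query_text":"Restaurants"}}]'
)
by_query, all_timestamps = index_queries(sample)
print(by_query)  # {'Restaurants': [1372456830180971]}
```

Keeping both structures is cheap: the dictionary answers "what did I search for", while the flat list answers "when did I search".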

Doing something with the data

Next, I converted the timestamps into a more usable format by turning them into days. Then, using matplotlib, I plotted a bar graph of searches per day over the full duration of the data.
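Converting a microsecond timestamp to a calendar day only takes an integer division and `datetime.fromtimestamp`. This sketch interprets timestamps in UTC, which is an assumption (local time can move late-night searches to a different day), and the helper names are hypothetical.

```python
from collections import Counter
from datetime import datetime, timezone


def usec_to_date(ts_usec):
    """Convert a microsecond Unix timestamp to a calendar date, interpreted in UTC."""
    # Integer-divide down to whole seconds; sub-second precision doesn't matter for days.
    return datetime.fromtimestamp(ts_usec // 1_000_000, tz=timezone.utc).date()


def searches_per_day(timestamps):
    """Count how many searches fall on each calendar date."""
    return Counter(usec_to_date(ts) for ts in timestamps)


print(usec_to_date(1372456830180971))  # 2013-06-28
```

The resulting counter could then be handed to matplotlib, for example `plt.bar(list(per_day.keys()), list(per_day.values()))`, since matplotlib accepts `date` objects on the x-axis.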

Except the daily graph doesn't tell us much: there's too much variance in the data, and individual days don't carry much meaning. So I grouped days into weeks and plotted the number of searches per week as a line plot.
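One way to bucket searches by week is to key each date on its ISO year and week number. This is a sketch under the same UTC assumption as before; ISO weeks (Monday start) are my choice here and not necessarily what the original script uses.

```python
from collections import Counter
from datetime import datetime, timezone


def usec_to_date(ts_usec):
    """Microsecond Unix timestamp -> calendar date, interpreted in UTC (an assumption)."""
    return datetime.fromtimestamp(ts_usec // 1_000_000, tz=timezone.utc).date()


def searches_per_week(timestamps):
    """Count searches per (ISO year, ISO week) pair, sorted chronologically."""
    counts = Counter()
    for ts in timestamps:
        iso = usec_to_date(ts).isocalendar()  # (ISO year, ISO week, ISO weekday)
        counts[(iso[0], iso[1])] += 1
    return sorted(counts.items())
```

Using the ISO year rather than the calendar year keeps the few days around New Year in the correct week bucket.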

The weekly plot is a bit more meaningful, and I can start to notice trends that align with changes in my life. They're even easier to see on a plot of my searches per month.
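Monthly totals follow the same pattern with a (year, month) key. Again a sketch, assuming UTC; the function name is hypothetical.

```python
from collections import Counter
from datetime import datetime, timezone


def usec_to_date(ts_usec):
    """Microsecond Unix timestamp -> calendar date, interpreted in UTC (an assumption)."""
    return datetime.fromtimestamp(ts_usec // 1_000_000, tz=timezone.utc).date()


def searches_per_month(timestamps):
    """Count searches per (year, month), sorted chronologically."""
    counts = Counter()
    for ts in timestamps:
        day = usec_to_date(ts)
        counts[(day.year, day.month)] += 1
    return sorted(counts.items())
```

Months with no searches at all simply don't appear, so a plot of this output should treat missing months as zero rather than connecting across the gap.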

There are a few very noticeable valleys and peaks on the monthly plot. Starting from the left, the first massive jump occurs during my senior year of high school, which I attribute to regular programming and schoolwork. The summer of 2014 saw a drop in searches while I worked in a warehouse: I wasn't allowed on my phone and spent little time at the computer, so my searches dropped drastically. They went back up for my freshman year of college and stayed relatively consistent. The first major peak occurs in October 2015, once I started programming regularly at my first internship. After that internship ended in December, my searches declined until the summer of 2016, when I was on another internship. When I returned to school in the fall of 2016 they dropped again, until I got my Google Home at the start of November, which made them spike. My searches fell over winter break and should spike again once data comes in from my next internship.

The clear effects that working had on my search numbers inspired me to plot two new metrics. The first was the percent of my searches during working hours on weekdays.

This plot matches many of the conclusions I drew from the monthly plot about how my internships affected the number of searches I made, so much so that I plotted the percent of searches during the workday against the number of searches I made per month.

My searches per month are in red, and the percent of searches during the workday are in blue. After the start of co-op, the months that see large jumps in the number of searches also see large increases in the percent of searches that occur during the workday. During my last school term, the percent of searches that occurred during the workday dropped drastically, while the number of searches per month skyrocketed. I again attribute this to getting my Google Home.
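A sketch of the workday metric. "Working hours" here means Monday through Friday, 9:00 to 17:00; those bounds are assumptions, as is the UTC default, since the post doesn't say which hours or time zone were used. For local time you could pass something like `zoneinfo.ZoneInfo("America/New_York")` as `tz` (Python 3.9+).

```python
from collections import Counter
from datetime import datetime, timezone

WORK_START_HOUR = 9  # assumed start of the workday (inclusive)
WORK_END_HOUR = 17   # assumed end of the workday (exclusive)


def is_workday_search(dt):
    """True if dt falls Monday-Friday within the assumed working hours."""
    return dt.weekday() < 5 and WORK_START_HOUR <= dt.hour < WORK_END_HOUR


def workday_percent_by_month(timestamps, tz=timezone.utc):
    """Percent of each month's searches made during weekday working hours."""
    totals = Counter()
    during_work = Counter()
    for ts in timestamps:
        dt = datetime.fromtimestamp(ts // 1_000_000, tz=tz)
        key = (dt.year, dt.month)
        totals[key] += 1
        if is_workday_search(dt):
            during_work[key] += 1
    return {key: 100.0 * during_work[key] / totals[key] for key in sorted(totals)}
```

To draw both series on one chart with separate scales, matplotlib's `Axes.twinx()` gives the second series its own y-axis.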

I also plotted the average number of searches made during each day of the week.

However, I don't think this graph is very meaningful. Because the data spans the lifetime of my Google account, all days average out to roughly the same number in the long term. I suspect that restricting the data to work semesters would show a higher average for weekdays, but I chose to pursue other metrics instead.
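For reference, a per-weekday average could be computed like this. The sketch divides each weekday's total by the number of such days in the covered range, so days with zero searches still count; whether the original script handles empty days this way is an assumption, and the UTC interpretation is assumed as before.

```python
from collections import Counter
from datetime import datetime, timedelta, timezone

DAY_NAMES = ["Mon", "Tue", "Wed", "Thu", "Fri", "Sat", "Sun"]  # datetime.weekday() order


def average_by_weekday(timestamps):
    """Average searches per calendar day, grouped by day of the week."""
    if not timestamps:
        return {name: 0.0 for name in DAY_NAMES}
    dates = [datetime.fromtimestamp(ts // 1_000_000, tz=timezone.utc).date() for ts in timestamps]
    per_day = Counter(dates)
    totals = [0] * 7      # searches per weekday
    day_counts = [0] * 7  # how many of each weekday the range covers
    day, last = min(dates), max(dates)
    while day <= last:
        totals[day.weekday()] += per_day[day]  # Counter yields 0 for days without searches
        day_counts[day.weekday()] += 1
        day += timedelta(days=1)
    return {
        name: (totals[i] / day_counts[i] if day_counts[i] else 0.0)
        for i, name in enumerate(DAY_NAMES)
    }
```

Passing only the timestamps from a work semester would be one way to test the weekday hypothesis.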

What I Searched For

The search data comes as saved queries: the entire string of what I searched for. The first thing I did was find my top queries, though the counts were skewed because slight variations in a search produce different queries (for example, "UC PAL" and "uc pal" are counted separately). My top queries, scrubbed for identifying information (represented by ~what it was~), can be found at the bottom of the post.
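Ranking queries is a natural fit for `collections.Counter.most_common`. This sketch takes a query-to-timestamps dictionary like the one built earlier; the optional `normalize` flag is my own addition for merging case variants, which the list at the bottom of the post does not do. Function names are hypothetical.

```python
from collections import Counter


def top_queries(by_query, n=100, normalize=False):
    """Rank queries by how many times they were searched.

    With normalize=True, case and surrounding whitespace are ignored, so
    variants such as "UC PAL" and "uc pal" merge into one entry.
    """
    counts = Counter()
    for query, stamps in by_query.items():
        key = query.strip().lower() if normalize else query
        counts[key] += len(stamps)
    return counts.most_common(n)


def format_ranking(ranking):
    """Render (query, count) pairs in a "rank:(count) query" style."""
    return [f"{rank}:({count}) {query}" for rank, (query, count) in enumerate(ranking, start=1)]
```

Normalizing only fixes case differences; variations like "galaxy note 10.1 2015" versus "samsung galaxy note 10.1 2015" would still need fuzzier matching.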

There are some odd entries among my top queries, like directions from a Greyhound station to UC, which I can only explain by the search results not loading. I've made that trip once and definitely didn't need to plan it 36 times.

Because my top queries were so arbitrary, I broke queries up into individual words. The results (again scrubbed for identifying information) can be found below my top queries.

This data was harder to work with: without a way to categorize words, the information is hard to digest. Many searches clearly relate to programming (python, java, c++, git), which is my largest use case for searching. A lot of results also come from directions, such as "->" and "(current location)". If nothing else, this data is a window into my interests and Internet usage habits.
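Splitting queries into words can reuse the same counting approach, weighting each word by how often its query was made. The sketch lowercases and splits on whitespace only, which would leave punctuation attached the way entries like "cincinnati," and "(current" do in the list below; the function name is hypothetical.

```python
from collections import Counter


def top_words(by_query, n=600):
    """Count words across all searches, weighted by how often each query was made."""
    counts = Counter()
    for query, stamps in by_query.items():
        # Lowercase and split on whitespace only; punctuation stays attached,
        # so "cincinnati," and "cincinnati" are counted as different words.
        for word in query.lower().split():
            counts[word] += len(stamps)
    return counts.most_common(n)
```

Filtering out common words such as "to", "the", and "of" before counting would push the more interesting terms higher in the ranking.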

Running the Script

The source code I used to generate this data can be found on GitHub. Usage instructions, including directions for downloading your Google search history, are in the README. Happy coding!

My top queries and searched-for words

********************************************************************************
                            Top Searched Queries
********************************************************************************
1:(71) google maps
2:(39) nearby gamestop
3:(38) uc onestop
4:(36) Greyhound Bus Lines, 1005 Gilbert Ave, Cincinnati, OH 45202 -> University of Cincinnati, 2600 Clifton Ave, Cincinnati, OH 45220
5:(35) tesla
6:(35) yes
7:(34) wolfram alpha
8:(32) add reminder
9:(28) easy bib
10:(26) weather
11:(24) movies
12:(24) stop
13:(20) google keep
14:(18) ~My Home Address~ -> ~My brother's home address~
15:(18) rap genius
16:(17) restaurants
17:(17) (Current Location) -> Play it Again Sports, 8223 Colerain Ave, Cincinnati, OH 45239
18:(17) saucony guide 5
19:(16) solipsism
20:(16) amazon
21:(16) galaxy note 10.1 2015
22:(16) indians
23:(15) university of cincinnati
24:(15) motel
25:(15) netflix
26:(15) chipotle
27:(14) gmail
28:(14) urban dictionary
29:(14) gannon university programming contest
30:(13) rate my professor
31:(13) hotel
32:(13) UC PAL
33:(13) (Current Location) -> San Francisco International Airport (SFO), San Francisco, CA
34:(13) the venue of athens
35:(13) new reminder
36:(13) what's the temperature
37:(13) uc pal
38:(12) google translate
39:(12) wfmj
40:(12) lil dicky kickstarter
41:(12) act
42:(12) dokku
43:(11) (Current Location) -> ~My parent's address~
44:(11) MUTF:VTTSX
45:(11) dokku remote rejected pre receive access hook declined
46:(11) pnc bank
47:(11) applebees
48:(11) Cincinnati, OH 45202 1
49:(11) samsung galaxy note 10.1 2015
50:(10) duke energy
51:(10) Cleveland Cavaliers score
52:(10) bogarts hoodie allen
53:(10) google calendar
54:(10) mobile access to cincinnati email
55:(10) ~My home address~ -> ~My brother's street~
56:(9) putt putt
57:(9) spotify
58:(9) wall street journal
59:(9) seafood
60:(9) 21 news youngstown
61:(9) google
62:(9) what's the weather today
63:(9) uc email
64:(9) food
65:(9) deploy express app to dokku
66:(9) directions home
67:(9) ~My girlfriend's name~
68:(9) gopherarun
69:(8) endocrine system
70:(8) github
71:(8) dunkin donuts
72:(8) (Current Location) -> ~My home address~
73:(8) wsj
74:(8) iowa caucus
75:(8) remind me to respond to Adam
76:(8) game theory
77:(8) gayathri
78:(8) netbeans
79:(8) somo
80:(8) ohio cicada map
81:(8) hacktoberfest
82:(7) spanish dict
83:(7) ~My brother's address~ -> ~My parent's address~
84:(7) Toronto, ON, Canada 1
85:(7) set an alarm for 8 a.m.
86:(7) food near me
87:(7) tmux cheat sheet
88:(7) the onion
89:(7) (Current Location) -> Grange Road, Santa Rosa, CA
90:(7) cavs
91:(7) northrop grumman baltimore
92:(7) Cleveland Indians
93:(7) math backgrounds
94:(7) indians standings
95:(7) uc ceas
96:(7) ohio state
97:(7) running quotes
98:(7) note 10.1 2015
99:(7) japanese spider crab
100:(7) mount st helens
101:(7) drizzy
********************************************************************************
                            Top Searched Words
********************************************************************************
  0:(1265) to              200:(  30) chevrolet       400:(  18) hall        
  1:( 752) the             201:(  30) spotify         401:(  18) man         
  2:( 654) of              202:(  30) start           402:(  18) flash       
  3:( 626) how             203:(  30) 7               403:(  18) max         
  4:( 541) oh              204:(  30) good            404:(  18) howland     
  5:( 507) in              205:(  29) university,     405:(  18) version     
  6:( 483) a               206:(  29) toronto,        406:(  18) more        
  7:( 453) ->              207:(  29) ne,             407:(  18) random      
  8:( 378) is              208:(  29) wiki            408:(  18) record      
  9:( 370) on              209:(  29) much            409:(  18) reply       
 10:( 322) for             210:(  29) doesn't         410:(  18) iron        
 11:( 303) me              211:(  29) convert         411:(  18) picture     
 12:( 288)                 212:(  29) matlab          412:(  18) stone       
 13:( 245) what            213:(  28) download        413:(  18) setup       
 14:( 242) (current        214:(  28) richmond,       414:(  18) developer   
 15:( 242) location)       215:(  28) ga              415:(  18) games       
 16:( 226) cincinnati,     216:(  28) two             416:(  18) weather     
 17:( 219) remind          217:(  28) remove          417:(  18) middle      
 18:( 212) python          218:(  28) directory       418:(  18) header      
 19:( 205) and             219:(  28) eastman         419:(  18) access      
 20:( 192) cincinnati      220:(  28) class           420:(  18) a.m.        
 21:( 181) ohio            221:(  28) studios         421:(  17) chicken     
 22:( 180) java            222:(  28) high            422:(  17) feminine    
 23:( 167) uc              223:(  28) without         423:(  17) library     
 24:( 166) google          224:(  28) center          424:(  17) cedar       
 25:( 164) c++             225:(  28) what's          425:(  17) pittsburgh, 
 26:( 153) qt              226:(  28) song            426:(  17) value       
 27:( 153) university      227:(  28) like            427:(  17) link        
 28:( 146) git             228:(  28) server          428:(  17) ascii       
 29:( 136) set             229:(  28) 2016            429:(  17) meaning     
 30:( 131) do              230:(  28) off             430:(  17) main        
 31:( 127) state           231:(  28) car             431:(  17) know        
 32:( 126) does            232:(  27) cleveland       432:(  17) dayton      
 33:( 126) an              233:(  27) table           433:(  17) need        
 34:( 125) from            234:(  27) bash            434:(  17) goat        
 35:( 118) windows         235:(  27) childish        435:(  17) harvest     
 36:( 117) file            236:(  27) bar             436:(  17) water       
 37:( 112) define:         237:(  27) dead            437:(  17) doesnt      
 38:( 112) not             238:(  27) school          438:(  17) pay         
 39:( 109) galaxy          239:(  27) should          439:(  17) swing       
 40:( 106) my              240:(  27) view            440:(  17) control     
 41:( 105) i               241:(  26) was             441:(  17) scheme      
 42:(  99) you             242:(  26) video           442:(  17) american    
 43:(  98) vs              243:(  26) ocean           443:(  17) box         
 44:(  98) pi              244:(  26) image           444:(  17) act         
 45:(  98) get             245:(  26) valley          445:(  17) item        
 46:(  95) at              246:(  26) music           446:(  17) white       
 47:(  94) st,             247:(  26) w               447:(  17) football    
 48:(  93) with            248:(  26) usb             448:(  17) primary     
 49:(  93) new             249:(  26) 45219,          449:(  17) account     
 50:(  93) linux           250:(  26) clifton         450:(  17) example     
 51:(  90) vine            251:(  26) student         451:(  17) because     
 52:(  89) raspberry       252:(  26) running         452:(  17) netflix     
 53:(  80) command         253:(  26) take            453:(  17) birthday    
 54:(  79) work            254:(  26) npm             454:(  16) edit        
 55:(  79) when            255:(  25) college         455:(  16) york        
 56:(  77) rd,             256:(  25) array           456:(  16) c           
 57:(  76) up              257:(  25) program         457:(  16) drop        
 58:(  74) time            258:(  25) tour            458:(  16) 2014        
 59:(  73) genius          259:(  25) bad             459:(  16) mayer       
 60:(  72) 2               260:(  25) api             460:(  16) ap          
 61:(  72) are             261:(  25) store           461:(  16) year        
 62:(  72) it              262:(  25) xbox            462:(  16) monday      
 63:(  71) github          263:(  25) app             463:(  16) tell        
 64:(  70) ~house number~  264:(  25) center,         464:(  16) =           
 65:(  69) warren,         265:(  25) project         465:(  16) sound       
 66:(  69) code            266:(  25) microsoft       466:(  16) keys        
 67:(  66) ubuntu          267:(  25) movie           467:(  16) youtube     
 68:(  66) make            268:(  24) put             468:(  16) tree        
 69:(  65) computer        269:(  24) quotes          469:(  16) hill        
 70:(  64) use             270:(  24) minutes         470:(  16) space       
 71:(  63) reddit          271:(  24) size            471:(  16) vim         
 72:(  62) run             272:(  24) best            472:(  16) big         
 73:(  62) where           273:(  24) columbus        473:(  16) window      
 74:(  61) call            274:(  24) 45150           474:(  16) savannah,   
 75:(  61) android         275:(  24) season          475:(  16) public      
 76:(  61) line            276:(  24) calendar        476:(  16) chipotle    
 77:(  60) change          277:(  24) word            477:(  16) same        
 78:(  60) have            278:(  24) internet        478:(  16) temperature 
 79:(  60) no              279:(  24) girl            479:(  16) spanish     
 80:(  60) text            280:(  24) all             480:(  16) ny          
 81:(  60) squires         281:(  24) card            481:(  16) free        
 82:(  59) play            282:(  24) software        482:(  16) group       
 83:(  58) tomorrow        283:(  24) django          483:(  16) pnc         
 84:(  56) install         284:(  24) pictures        484:(  16) meme        
 85:(  56) columbus,       285:(  24) buy             485:(  16) char        
 86:(  55) ~house number~  286:(  24) chevy           486:(  16) 30          
 87:(  55) alarm           287:(  23) point           487:(  16) s           
 88:(  55) list            288:(  23) x               488:(  16) first       
 89:(  54) string          289:(  23) siemens         489:(  16) column      
 90:(  54) one             290:(  23) star            490:(  15) wallpaper   
 91:(  53) park            291:(  23) kanye           491:(  15) full        
 92:(  53) error           292:(  23) only            492:(  15) m           
 93:(  52) by              293:(  23) website         493:(  15) build       
 94:(  52) or              294:(  23) connect         494:(  15) creek       
 95:(  52) home            295:(  23) numbers         495:(  15) return      
 96:(  51) add             296:(  23) express         496:(  15) san         
 97:(  51) define          297:(  23) battery         497:(  15) cruze       
 98:(  51) about           298:(  23) multiple        498:(  15) office      
 99:(  51) ave,            299:(  23) hour            499:(  15) binding     
100:(  51) create          300:(  23) toronto         500:(  15) safe        
101:(  51) hours           301:(  23) power           501:(  15) panera      
102:(  51) west            302:(  23) st              502:(  15) chris       
103:(  49) drive           303:(  22) docker          503:(  15) down        
104:(  49) samsung         304:(  22) has             504:(  15) watch       
105:(  49) road,           305:(  22) folder          505:(  15) hackathon   
106:(  49) out             306:(  22) 2000            506:(  15) digital     
107:(  49) schedule        307:(  22) results         507:(  15) spitzer     
108:(  48) can             308:(  22) things          508:(  15) bell        
109:(  48) 3               309:(  22) east            509:(  15) over        
110:(  48) if              310:(  22) did             510:(  15) lake        
111:(  48) 1               311:(  22) pizza           511:(  15) kit         
112:(  47) near            312:(  22) port            512:(  15) score       
113:(  47) email           313:(  22) nx              513:(  15) people      
114:(  46) visual          314:(  22) test            514:(  15) repo        
115:(  45) directions      315:(  22) stop            515:(  15) merge       
116:(  45) street,         316:(  22) 6               516:(  15) bank        
117:(  44) usa             317:(  22) 45219           517:(  15) winery      
118:(  44) dr,             318:(  22) data            518:(  15) pdf         
119:(  44) find            319:(  22) today           519:(  15) frank       
120:(  44) map             320:(  22) delete          520:(  15) food        
121:(  44) that            321:(  22) -               521:(  14) synonyms    
122:(  43) rap             322:(  22) old             522:(  14) top         
123:(  43) drake           323:(  21) write           523:(  14) chromecast  
124:(  43) phone           324:(  21) source          524:(  14) overflow    
125:(  42) kent            325:(  21) laptop          525:(  14) richmond    
126:(  42) this            326:(  21) zenbook         526:(  14) lil         
127:(  42) sprint          327:(  21) road            527:(  14) media       
128:(  41) be              328:(  21) 10.1            528:(  14) section     
129:(  41) ssh             329:(  21) search          529:(  14) micro       
130:(  41) canada          330:(  21) market          530:(  14) color       
131:(  41) milford,        331:(  21) snapchat        531:(  14) auto        
132:(  41) s6              332:(  21) web             532:(  14) latex       
133:(  41) 10              333:(  21) haskell         533:(  14) 100         
134:(  41) see             334:(  21) output          534:(  14) too         
135:(  40) dokku           335:(  21) name            535:(  14) fairfield   
136:(  40) states          336:(  21) disable         536:(  14) runescape   
137:(  40) house           337:(  21) mean            537:(  14) tutorial    
138:(  39) programming     338:(  21) world           538:(  14) html        
139:(  39) avenue,         339:(  21) north           539:(  14) edition     
140:(  39) 4               340:(  21) tesla           540:(  14) photos      
141:(  39) ghost           341:(  21) database        541:(  14) age         
142:(  39) day             342:(  21) analytics       542:(  14) model       
143:(  39) menu            343:(  21) warren          543:(  14) community   
144:(  39) acm             344:(  20) arm             544:(  14) 2013        
145:(  39) mysql           345:(  20) commit          545:(  14) remote      
146:(  38) amazon          346:(  20) book            546:(  14) super       
147:(  38) united          347:(  20) dictionary      547:(  14) coffee      
148:(  38) as              348:(  20) national        548:(  14) mail        
149:(  38) john            349:(  20) volume          549:(  14) location    
150:(  38) system          350:(  20) live            550:(  14) found       
151:(  38) studio          351:(  20) wifi            551:(  14) money       
152:(  38) game            352:(  20) after           552:(  14) just        
153:(  38) twitter         353:(  20) function        553:(  14) mall        
154:(  38) mongodb         354:(  20) park,           554:(  14) commons     
155:(  37) update          355:(  20) osu             555:(  14) south       
156:(  37) why             356:(  20) county          556:(  14) save        
157:(  37) post            357:(  20) blue            557:(  14) between     
158:(  36) your            358:(  20) lyrics          558:(  13) case        
159:(  36) street          359:(  20) branch          559:(  13) movies      
160:(  36) s3              360:(  20) cool            560:(  13) form        
161:(  36) science         361:(  20) mystique        561:(  13) log         
162:(  36) ln              362:(  20) user            562:(  13) cafe        
163:(  36) number          363:(  20) graph           563:(  13) last        
164:(  35) lane            364:(  20) real            564:(  13) installing  
165:(  35) black           365:(  20) markdown        565:(  13) thomas      
166:(  35) many            366:(  20) track           566:(  13) newport     
167:(  35) show            367:(  20) maps            567:(  13) kentucky    
168:(  35) ky              368:(  20) request         568:(  13) timer       
169:(  34) files           369:(  19) sort            569:(  13) passport    
170:(  34) pa              370:(  19) oh,             570:(  13) msi         
171:(  34) note            371:(  19) surreywood      571:(  13) tmux        
172:(  34) asus            372:(  19) history         572:(  13) bridge      
173:(  34) gambino         373:(  19) nodejs          573:(  13) kill        
174:(  33) drive,          374:(  19) read            574:(  13) won't       
175:(  33) who             375:(  19) life            575:(  13) backup      
176:(  33) &               376:(  19) monitor         576:(  13) cinemas     
177:(  33) but             377:(  19) can't           577:(  13) message     
178:(  33) back            378:(  19) hard            578:(  13) circle      
179:(  32) city            379:(  19) password        579:(  13) cookie      
180:(  32) morning         380:(  19) *               580:(  13) contest     
181:(  32) minecraft       381:(  19) e               581:(  13) rd          
182:(  32) indians         382:(  19) address         582:(  13) wikipedia   
183:(  32) check           383:(  19) 8               583:(  13) desktop     
184:(  32) using           384:(  19) outlook         584:(  13) order       
185:(  32) javascript      385:(  19) speed           585:(  13) tickets     
186:(  32) go              386:(  19) event           586:(  13) parking     
187:(  32) 2015            387:(  19) fire            587:(  13) logo        
188:(  32) northeast,      388:(  19) review          588:(  13) jtable      
189:(  32) red             389:(  18) p.m.            589:(  13) now         
190:(  32) long            390:(  18) copy            590:(  13) information 
191:(  32) date            391:(  18) cast            591:(  13) grill       
192:(  31) r               392:(  18) regal           592:(  13) amc         
193:(  31) 5               393:(  18) ikea            593:(  13) night       
194:(  31) into            394:(  18) keyboard        594:(  13) formula     
195:(  31) open            395:(  18) river           595:(  13) short       
196:(  31) cannot          396:(  18) calculator      596:(  13) youngstown  
197:(  30) tonight         397:(  18) fallout         597:(  13) tools       
198:(  30) va              398:(  18) service         598:(  13) ysu         
199:(  30) node            399:(  18) key             599:(  13) dialog      
                                                      600:(  13) price
                                                      601:(  13) area