Flywheel has an API call that shows drivers near you. When you open the app, the first thing the app does is make a call to see what drivers are near you, so it can show the yellow icons on the map. Because of this, the call to that API is a pretty good proxy for “how many people have our app open,” which means that we track it pretty closely — we have a monitor way up on our wall that shows a real-time feed of how often that API is being hit.
In the last couple of days, we’d noticed a big, very consistent increase in calls to that endpoint. Day and night, it was like a bunch of people were keeping their apps open — but never hailing a taxi. This made us suspicious, and we dove into our logs.
We discovered that a very large percentage of the traffic to that endpoint was coming from a single IP address. That one IP address was hitting the endpoint every second or two, and was rotating between asking for drivers near three different geographical locations, one corresponding to each of our major markets. We suspected that a competitor was trying to get business information from our app, and resolved to figure out who it was.
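The kind of tally that surfaces a single dominant client can be sketched in a few lines of Python. This is illustrative only, not the query that was actually run: the `(ip, path)` record shape, the `/drivers/nearby` path, and the sample addresses are all assumptions made up for the example.

```python
from collections import Counter

def top_talkers(records, endpoint, limit=3):
    """Return (ip, count, share_of_endpoint_traffic) for the busiest clients."""
    # Count only requests that hit the endpoint we care about.
    counts = Counter(ip for ip, path in records if path == endpoint)
    total = sum(counts.values())
    # most_common() is empty when there is no traffic, so no division by zero.
    return [(ip, n, n / total) for ip, n in counts.most_common(limit)]

# Made-up sample data; real records would be parsed from access logs.
sample = [("203.0.113.7", "/drivers/nearby")] * 8 + [
    ("198.51.100.2", "/drivers/nearby"),
    ("198.51.100.9", "/drivers/nearby"),
    ("198.51.100.2", "/login"),
]

for ip, n, share in top_talkers(sample, "/drivers/nearby"):
    print(f"{ip}: {n} calls ({share:.0%} of endpoint traffic)")
```

A client that accounts for most of an endpoint's traffic, as the first address does in this sample, is the sort of outlier worth a closer look.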
The driver search API is unauthenticated (that is, you do not need to be logged in to use it). We did a traceroute on the IP address the calls were coming from and got an unhelpful generic Comcast address. We checked a service that purports to put a lat/long on an IP address, and came up with an address somewhere in Pleasanton. (This, it turned out, was a total red herring, but I thought I’d show all of our work.)
We used Google Maps and Street View to check out the address in Pleasanton; it was an unremarkable residence. We checked tax records, though I can’t recall whether we were able to get the owner’s name from them. We checked our logs to see whether any users had made location-aware calls from that area, and it turned out one had. That user account came with an email address, which we traced to an identity and then investigated, but nothing really seemed to fit. No smoking guns. So we abandoned that avenue of investigation.
We figured that anyone who had scouted out our service enough to write a bot to harvest data from it had probably made an account with us at some point. So we looked way back at the first traffic from our offending IP address, and sure enough, the first bit of traffic from it was a login. This got us an account, and thus an email address. We searched the email address on Facebook and came up with a real name for the person — let’s call her Jane Smith (her real name was actually relatively unique) — and searched her name on LinkedIn. LinkedIn revealed that there was a Jane Smith who was an Android QA engineer, which fit the user-agent of the offending requests — they were all purportedly from Android Nexus devices. So we thought it was pretty likely that Jane was our culprit.
LinkedIn claimed that Jane worked for Motorola Mobility, but Jane’s location-aware calls (before she started claiming to be from three different points in our main markets) were from a nondescript address in Palo Alto, far from any Motorola campus. This also didn’t seem like a very Motorola sort of thing to do, so we suspected that Jane’s LinkedIn information was out of date.
Since the first call we’d seen from that IP address was a login, not a create-account, we knew that Jane must have hit us from other IP addresses earlier. We searched our logs for her account ID, and found another IP address, also from Comcast. We searched our logs for that IP address, and saw a login for another account. That account had location calls from the same place that Jane’s did, so we figured it was very likely a colleague of Jane’s. We did email -> Facebook -> LinkedIn for that person as well (we’ll call him Bob Jones, but his real name was super distinctive) and saw that Bob’s LinkedIn listed him as working for a start-up. We checked that start-up’s address, and it was the place that Jane and Bob had made location-aware calls from.
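The back-and-forth between IP addresses and account IDs amounts to walking a small graph. Here is a minimal sketch of that pivot, assuming log events have already been reduced to `(ip, account_id)` pairs, with `None` for unauthenticated calls. The function name, the event shape, and the sample data are hypothetical.

```python
from collections import defaultdict

def linked_identities(events, start_ip):
    """Expand IP -> accounts -> IPs repeatedly until nothing new turns up."""
    accounts_by_ip = defaultdict(set)
    ips_by_account = defaultdict(set)
    for ip, account in events:
        if account is not None:  # unauthenticated calls carry no account
            accounts_by_ip[ip].add(account)
            ips_by_account[account].add(ip)

    seen_ips, seen_accounts = {start_ip}, set()
    frontier = [start_ip]
    while frontier:
        ip = frontier.pop()
        for account in accounts_by_ip.get(ip, ()):
            if account in seen_accounts:
                continue
            seen_accounts.add(account)
            # Every other address this account used becomes a new lead.
            for other_ip in ips_by_account[account]:
                if other_ip not in seen_ips:
                    seen_ips.add(other_ip)
                    frontier.append(other_ip)
    return seen_ips, seen_accounts

# Made-up events: the bot IP, one account seen on two IPs, and a bystander.
events = [
    ("203.0.113.7", None),
    ("203.0.113.7", "jane"),
    ("198.51.100.4", "jane"),
    ("198.51.100.4", "bob"),
    ("192.0.2.1", "carol"),
]
ips, accounts = linked_identities(events, "203.0.113.7")
print(sorted(ips), sorted(accounts))
```

A shared address (an office, a coffee shop, a carrier NAT) can link unrelated people, so a match like this is only a lead; in the story above it was the matching location calls that made the connection convincing.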
An article in the tech press noted that the start-up in question made special-purpose Android devices, which fit what we saw. (It also provided additional confirmation that Jane worked for the start-up, by name-checking a previous employer of hers.) We developed the hypothesis that they were prototyping some kind of device that wanted to know about taxis, and were using us as a data source. Not exactly ideal, but it didn’t seem like they were a direct competitor.
A WHOIS on the company’s website revealed the phone number of the CEO of the company. My boss called him and asked him about the traffic. The CEO was more than a bit bewildered by the cold call from someone who could refer to several of his employees by name, and the whole thing seems like it will be resolved amicably.
- The email -> Facebook -> LinkedIn technique is really powerful. You can just put someone’s email into Facebook’s search, and at minimum you’re likely to get their real name. A search on LinkedIn for a real name will list profiles, and you can learn a lot about someone based on their work history. Especially if your real name is fairly unique, you may want to have an email in Facebook that is not one that you use elsewhere, to prevent this chain from occurring.
- Locational services like ours provide a lot of scary contextual information that allows you to draw inferences about people in a bunch of ways. We don’t expose this information, of course, but I’m probably going to suggest that we lessen the precision of it even for ourselves.
- Judging by anecdotes from other members of the team, it’s very likely that people who are doing sketchy things with your API will scout it with an easily traceable “real” account that can be correlated with the IP of the attack. This is a bad idea, guys! Use Tor!
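On the second point above, one simple way to lessen location precision is to round coordinates before they are stored or logged. This is a sketch of the idea, not something Flywheel is known to do; the `coarsen` helper and the sample coordinates are made up. Two decimal places of latitude is roughly 1.1 km, and the same step in longitude covers less ground the farther you are from the equator.

```python
def coarsen(lat, lon, decimals=2):
    """Round a coordinate pair so it identifies a neighborhood, not a building."""
    return round(lat, decimals), round(lon, decimals)

# A made-up point in Palo Alto, before and after coarsening.
precise = (37.441883, -122.143019)
print(precise, "->", coarsen(*precise))
```

The trade-off is that coarse coordinates are enough for market-level analytics but would no longer pin a caller to a specific office address, which is exactly the inference that made this investigation possible.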
So that was an interesting day at work.