Get Business Data with Yelp API and Python

There are many ways to get data from the web. Some sources, such as Google, Foursquare, and Yelp, offer free, ready-to-use APIs.

Here we use the Yelp API, which saves the time it would take to scrape business listings with Python packages such as Beautiful Soup or Scrapy. Let's explore some of its features and gather data for a later analysis.

First, we need to create a developer account on Yelp. Under Yelp Fusion, we can choose the REST API for business data. Keep the API's rate limit in mind: 5000 API calls per 24 hours.

After the account is created, an API key is generated for our application, which in this case is written in Python. All of the REST endpoints are called with the GET method.
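As a small sketch of how the key ends up in a request, the helper below builds the Authorization header. The function name build_headers and the environment variable YELP_API_KEY are assumptions made for illustration and are not part of the Yelp API; reading the key from the environment is just one way to keep it out of the source code.

```python
import os


def build_headers(api_key):
    """Return the HTTP headers for a request authenticated with a bearer token."""
    if not api_key:
        raise ValueError("an API key is required")
    # The scheme and the token are separated by a single space
    return {"Authorization": "Bearer " + api_key}


# YELP_API_KEY is a hypothetical variable name; fall back to a placeholder
api_key = os.environ.get("YELP_API_KEY", "YOUR_API_KEY")
headers = build_headers(api_key)
```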

In Python, the starting code looks like this:

import requests

api_key = "YOUR_API_KEY"

url = "https://api.yelp.com/v3/businesses/search"

# The token must be separated from "Bearer" by a space
headers = {
    "Authorization": "Bearer " + api_key
}

# Search term and location for the query
params = {
    "term": "Bars",
    "location": "Cologne"
}

response = requests.get(url, headers=headers, params=params)
result = response.json()
print(result)

The response is JSON, which response.json() parses into a dictionary. Its 'businesses' key holds a list of dictionaries, one per business. The output of this request looks as follows:

{'businesses':
 [{'id': 'cFQxH8dxSIETRe3M7KkesQ',
 'alias': 'treffpunkt-köln-4', 
'name': 'Treffpunkt', 
'image_url': 'https://s3-media2.fl.yelpcdn.com/bphoto/5xTekBIGIJQ2yHxNsuMSvA/o.jpg', 
'is_closed': False, 
'url': 'https://www.yelp.com/biz/treffpunkt-k%C3%B6ln-4?adjust_creative=GEhWmzIpqBnTem3JWAfkdg&utm_campaign=yelp_api_v3&utm_medium=api_v3_business_search&utm_source=GEhWmzIpqBnTem3JWAfkdg', 
'review_count': 2, 
'categories': [{'alias': 'divebars', 'title': 'Dive Bars'}], 
'rating': 5.0, 
'coordinates': {'latitude': 50.881554, 'longitude': 7.092187},
 'transactions': [], 
'location': {'address1': 'Waldstr. 88', 'address2': '', 
'address3': '', 
'city': 'Cologne', 
'zip_code': '51145', 
'country': 'DE', 
'state': 'NW', 
'display_address': ['Waldstr. 88', '51145 Cologne', 'Germany']}, 
'phone': '', 
'display_phone': '', 
'distance': 11679.138017139663}, ..., ...]
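Each record nests its categories and address inside further lists and dictionaries. The sketch below shows one way to pull those nested fields into a single readable line. The function name describe_business and the output layout are illustrative choices, and the sample is a trimmed copy of the first record shown above.

```python
def describe_business(business):
    """Build a one-line description from a business record's nested fields."""
    # 'categories' is a list of dicts; keep only the human-readable titles
    categories = ", ".join(c["title"] for c in business.get("categories", []))
    # 'display_address' is a list of address lines inside 'location'
    address = ", ".join(business.get("location", {}).get("display_address", []))
    return "{} | {} | {} ({} reviews) | {}".format(
        business.get("name", "?"),
        categories,
        business.get("rating"),
        business.get("review_count"),
        address,
    )


# Trimmed copy of the first record in the response above
sample = {
    "name": "Treffpunkt",
    "review_count": 2,
    "categories": [{"alias": "divebars", "title": "Dive Bars"}],
    "rating": 5.0,
    "location": {"display_address": ["Waldstr. 88", "51145 Cologne", "Germany"]},
}

line = describe_business(sample)
print(line)
```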

To present the results more readably, we can change the code to print only the business names:

response = requests.get(url, headers=headers, params=params)
result = response.json()['businesses']
names =[business["name"] for business in result]  
print(names)

The output for bars in Cologne is:

['Treffpunkt', 'Harry´s New York Bar', 'Ona Mor', 'Einstein', 'Soul Bar', 'Santiago De Cuba', 'Lapidarium', 'Low Budget', 'One Night Club', 'Durst', 'Legends Bar', 'Lommerzheim', "Papa Joe's", 'The Corkonian', 'Toddy Tapper', 'Zum Köbes', 'SUDERMAN', 'BarFly', 'Die Kunstbar', 'Barney Vallelys']

A new request with the business type changed to hair salons returns:

['HAIR Relax & Lounge', 'Cut & Colour', 'One Head', 'Kastenbein & Bosch', 'curly&straight', 'Marcel Michels', "Jag's Hair Douglas", 'Since Eleven - Friseure', 'We Love Hair', 'Aristocutz', 'Frisuren Mode Kiefer', 'Hairkiller', 'Kopfsalat', 'B/B Bauer Bauer Hairdressers', 'Die Haarwerkstatt', 'Svenja Willmes', 'Carmelo', 'GFG Hair & Styling Pulheim', 'Salon Schnittig', 'Aspekt Kosmetik Mobile Visagistin']

Requested this way, the API returns only the first 20 businesses. To get more data, we set the limit parameter to 50 and loop over 20 batches using the offset parameter, for up to 1000 results, which is the limit of this endpoint.
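The arithmetic behind the batches can be checked with a short sketch. It assumes offsets are zero-based and uses the figures quoted in this post (50 results per request, 1000 results per search, 5000 calls per 24 hours); the function name page_offsets is made up for illustration.

```python
def page_offsets(max_results=1000, limit=50):
    """Return the zero-based offsets needed to page through max_results items."""
    return list(range(0, max_results, limit))


offsets = page_offsets()

# One request per offset, so a full search costs len(offsets) calls
calls_per_search = len(offsets)

# How many full searches fit into the daily rate limit quoted above
searches_per_day = 5000 // calls_per_search

print(calls_per_search, offsets[0], offsets[-1], searches_per_day)
```

At 50 results per request, a full search of 1000 results takes 20 requests, so the daily limit covers 250 such searches.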

import requests

url = "https://api.yelp.com/v3/businesses/search"
api_key= "YOUR_API_KEY"

headers ={
    "Authorization": "Bearer "+ api_key
} 


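def get_biz(offset):
    """
    Get the names of businesses from the Yelp API, starting at the given offset.
    """
    params = {
        "term": "restaurants",
        "location": "Cologne",
        "limit": 50,
        "offset": offset
    }

    try:
        response = requests.get(url, headers=headers, params=params)
        response.raise_for_status()
        result = response.json()['businesses']
        names = [business["name"] for business in result]

    except (requests.RequestException, KeyError, ValueError) as err:
        # A failed request or an unexpected payload ends up here,
        # for example when the offset goes past the endpoint's limit
        print("Request failed or reached the limit:", err)
        names = []

    return names

if __name__ == '__main__':
    biz = []

    # Offsets are zero-based: 0, 50, ..., 950 gives 20 batches of 50
    for offset in range(0, 1000, 50):
        # extend keeps one flat list of names instead of a list of lists
        biz.extend(get_biz(offset))

    print(biz)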

Further work

To get more information than just the business names, we can adapt the code and load the data into a pandas DataFrame. We can then visualize statistics and even use the coordinates to plot the businesses on a map.
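As a starting point, the sketch below flattens business records into rows and computes two simple statistics using only the standard library. The two records are made up for illustration and only mimic the shape of the API output; the function name to_rows is also hypothetical. A list of flat dictionaries like this can be passed to pandas.DataFrame if pandas is installed.

```python
from statistics import mean


def to_rows(businesses):
    """Turn business records into flat rows, one dict per business."""
    rows = []
    for business in businesses:
        coords = business.get("coordinates") or {}
        rows.append({
            "name": business.get("name"),
            "rating": business.get("rating"),
            "review_count": business.get("review_count"),
            "latitude": coords.get("latitude"),
            "longitude": coords.get("longitude"),
        })
    return rows


# Two made-up records with the same shape as the API output
businesses = [
    {"name": "Bar A", "rating": 4.0, "review_count": 10,
     "coordinates": {"latitude": 50.94, "longitude": 6.96}},
    {"name": "Bar B", "rating": 5.0, "review_count": 2,
     "coordinates": {"latitude": 50.88, "longitude": 7.09}},
]

rows = to_rows(businesses)
average_rating = mean(row["rating"] for row in rows)
most_reviewed = max(rows, key=lambda row: row["review_count"])
print(average_rating, most_reviewed["name"])
```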