Foodie’s Guide to Finding a Home in Bahrain

By clustering neighborhoods based on popular venues surrounding them
ml
sklearn
Published

September 10, 2021

I am a foodie looking for a place to stay in Bahrain. I want to study certain areas in Bahrain and the kinds of restaurants that surround them.

I think a lot of people, not just young people, could benefit from this: the challenge isn’t only finding a decent place to stay in Bahrain, but finding one that best serves their culinary interests.

There are obviously more important factors to look at besides food. For this problem, however, I want to stick to what I can get from Foursquare with a free license. By categorizing areas on attributes such as the frequency of coffee shops, closeness to malls, etc., I can make a better guesstimate of where a foodie might want to stay.

Foursquare lets us grab information on the venues surrounding a given location, so we will look at the most frequent kinds of venues around each area and cluster the areas on that basis.

So let’s get started!

Scrape Bahrain City/Town Data from Wikipedia

I need to scrape data from Wikipedia to look up towns and cities in Bahrain. We’re going to use the popular HTML-parsing library Beautiful Soup to do that.

import requests
import pandas as pd
from bs4 import BeautifulSoup

url = 'https://en.wikipedia.org/wiki/Category:Populated_places_in_Bahrain'
html_doc = requests.get(url).text # Get HTML Doc
soup = BeautifulSoup(html_doc, 'html.parser') # Parse using bs4
blocks = soup.find_all("div", {"class": "mw-category-group"})[1:]

bh_data=[]
for block in blocks:
    places = block.find('ul').find_all('li')
    for place in places:
        bh_data.append(place.a.text.split(',')[0])

bh_data = pd.DataFrame(bh_data, columns=['Area'])
remove_places = ['Rifa and Southern Region', 'Northern City'] # Exclude these places
bh_data = bh_data[~bh_data['Area'].isin(remove_places)].reset_index(drop=True)
bh_data.head(5)
Area
0 A'ali
1 Abu Baham
2 Abu Saiba
3 Al Garrya
4 Al Hajar

So there are about 82 areas in Bahrain to study.
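
To make the parsing logic easier to follow, here is a minimal sketch that runs the same extraction on a small hand-written HTML fragment instead of the live Wikipedia page. The fragment and the helper name extract_place_names are made up for illustration; only the mw-category-group class and the ul/li/a structure are taken from the code above.

```python
from bs4 import BeautifulSoup

# Hand-written stand-in for the category page (illustrative only).
SAMPLE_HTML = """
<div class="mw-category-group"><h3>*</h3>
  <ul><li><a href="#">Populated places in Bahrain</a></li></ul></div>
<div class="mw-category-group"><h3>A</h3>
  <ul><li><a href="#">A'ali</a></li>
      <li><a href="#">Abu Saiba, Bahrain</a></li></ul></div>
"""

def extract_place_names(html, skip_groups=1):
    """Return the link text of every list item, dropping any ', suffix'."""
    soup = BeautifulSoup(html, "html.parser")
    # Skip the leading group(s), as the [1:] slice does in the real code.
    groups = soup.find_all("div", {"class": "mw-category-group"})[skip_groups:]
    names = []
    for group in groups:
        for item in group.find("ul").find_all("li"):
            names.append(item.a.text.split(",")[0])
    return names

names = extract_place_names(SAMPLE_HTML)
print(names)
```

The split on the comma keeps only the place name when a link text carries a trailing qualifier.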

Retrieving Coordinates via a Geocoder

After that, we need to geocode them, that is, convert each place name into latitude and longitude values.

We will use the popular geocoders OpenStreetMap and MapQuest.

import geocoder

apikey = "API-KEY-XXXXXXXXXXX"  # placeholder for your geocoding API key

lats = []
lngs = []
for city in bh_data['Area']:
    geocoder_type = 'osm'
    try:
        g = geocoder.osm(f"{city}, Bahrain", key=apikey)
        geodata = g.json
        lats.append(geodata['lat'])
    except Exception:  # OSM lookup failed, fall back to MapQuest
        geocoder_type = 'MAPQUEST'
        g = geocoder.mapquest(f"{city}, Bahrain", key=apikey)
        geodata = g.json
        lats.append(geodata['lat'])
    lngs.append(geodata['lng'])
    print(city, "|", geocoder_type)
bh_data['Latitude'] = lats
bh_data['Longitude'] = lngs

Here are the first few geocoded areas:

Area Latitude Longitude
0 A'ali 26.154454 50.527364
1 Abu Baham 26.205737 50.541668
2 Abu Saiba 30.325299 48.266157
3 Al Garrya 26.232690 50.578110
4 Al Hajar 26.225405 50.590138
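
Geocoders occasionally resolve a name to a place in another country, so a quick sanity check is worthwhile. The sketch below tests the five rows above against a rough bounding box for Bahrain. The box limits are my own approximation rather than an official boundary, and outside_box is a hypothetical helper name.

```python
# Rough bounding box for Bahrain (approximate limits, assumed for illustration).
LAT_RANGE = (25.5, 26.5)
LNG_RANGE = (50.3, 50.9)

# The five geocoded rows shown above.
geocoded = [
    ("A'ali", 26.154454, 50.527364),
    ("Abu Baham", 26.205737, 50.541668),
    ("Abu Saiba", 30.325299, 48.266157),
    ("Al Garrya", 26.232690, 50.578110),
    ("Al Hajar", 26.225405, 50.590138),
]

def outside_box(lat, lng):
    """True when the point falls outside the rough Bahrain box."""
    inside = (LAT_RANGE[0] <= lat <= LAT_RANGE[1]
              and LNG_RANGE[0] <= lng <= LNG_RANGE[1])
    return not inside

suspicious = [name for name, lat, lng in geocoded if outside_box(lat, lng)]
print(suspicious)
```

Abu Saiba is flagged: its coordinates lie several degrees north-west of Bahrain, so the geocoder evidently matched a different place with a similar name. Rows flagged this way can be re-geocoded with a more specific query or corrected by hand.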

Visualization on a Map

We will now use Folium to draw a map of Bahrain with each area shown as a point.

import folium

# create map of Bahrain using latitude and longitude values
latitude, longitude = 26.0766404, 50.334118

map_bahrain = folium.Map(location=[latitude, longitude], zoom_start=10)

# add markers to map
for lat, lng, city in zip(bh_data['Latitude'], bh_data['Longitude'],
                                           bh_data['Area']):
    
    label = folium.Popup(city, parse_html=True)
    folium.CircleMarker(
        [lat, lng],
        radius=5,
        popup=label,
        color='blue',
        fill=True,
        fill_color='#3186cc',
        fill_opacity=0.7,
        parse_html=True).add_to(map_bahrain)  
map_bahrain
(Interactive Folium map of Bahrain with a marker for each area.)

Foursquare: Exploring Areas for Food Places

Furthermore, we’ll leverage the Foursquare API to gather the most common types of restaurants within 500 m of each area’s center. We’ll then look at the various food places and restaurants and extract their types for further analysis.

Note: To filter only restaurants & food places, we will use the specific “Food” category ID : 4d4b7105d754a06374d81259

food_categoryId = "4d4b7105d754a06374d81259"

Alright, let’s look at all the food places within a 500 m radius of the first area

… which happens to be A’ali

radius = 500
lat, lng = bh_data[['Latitude', 'Longitude']].iloc[0].values

url = 'https://api.foursquare.com/v2/venues/search?&client_id={}&client_secret={}&v={}&ll={},{}&categoryId={}&radius={}&limit={}'.format(CLIENT_ID, CLIENT_SECRET, VERSION, lat, lng, food_categoryId, radius, LIMIT)
results = requests.get(url).json()
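
String-formatting the URL works, but it leaves the query values unescaped, and the credentials are not defined anywhere in this snippet. Here is a minimal sketch of the same request built from a parameter dictionary. The credential values, VERSION and LIMIT are placeholders I chose for illustration, and build_search_url is a hypothetical helper name.

```python
from urllib.parse import urlencode

# Placeholder credentials and settings (hypothetical values, not from the post).
CLIENT_ID = "YOUR-CLIENT-ID"
CLIENT_SECRET = "YOUR-CLIENT-SECRET"
VERSION = "20210910"  # API version date in YYYYMMDD form (assumed value)
LIMIT = 50            # maximum venues per request (assumed value)

def build_search_url(lat, lng, category_id, radius=500):
    """Build a venues/search URL with properly encoded query parameters."""
    params = {
        "client_id": CLIENT_ID,
        "client_secret": CLIENT_SECRET,
        "v": VERSION,
        "ll": f"{lat},{lng}",       # "latitude,longitude" of the search center
        "categoryId": category_id,  # restricts results to one category tree
        "radius": radius,           # search radius in metres
        "limit": LIMIT,
    }
    return "https://api.foursquare.com/v2/venues/search?" + urlencode(params)

url = build_search_url(26.154454, 50.527364, "4d4b7105d754a06374d81259")
print(url)
```

With requests, passing the same dictionary as requests.get(base_url, params=params) produces an equivalently encoded request.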

Looking at the first food place in the results json, we get this output:

results['response']['venues'][0]
{'id': '4e99da8f8231878c15393aa2',
 'name': 'Costa Coffee',
 'location': {'lat': 26.157464331750106,
  'lng': 50.52587327276449,
  'labeledLatLngs': [{'label': 'display',
    'lat': 26.157464331750106,
    'lng': 50.52587327276449}],
  'distance': 366,
  'cc': 'BH',
  'city': 'Madīnat ‘Īsá',
  'state': 'al Muḩāfaz̧ah Al Janūbīyah',
  'country': 'البحرين',
  'formattedAddress': ['Madīnat ‘Īsá', 'البحرين']},
 'categories': [{'id': '4bf58dd8d48988d1e0931735',
   'name': 'Coffee Shop',
   'pluralName': 'Coffee Shops',
   'shortName': 'Coffee Shop',
   'icon': {'prefix': 'https://ss3.4sqi.net/img/categories_v2/food/coffeeshop_',
    'suffix': '.png'},
   'primary': True}],
 'referralId': 'v-1631999729',
 'hasPerk': False}

The first venue is Costa Coffee, and its category is Coffee Shop.
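
The distance field (366) can be cross-checked against the coordinates. The sketch below computes the great-circle distance between the A’ali center used in the query and the venue’s reported location with the haversine formula. I am assuming the field is measured in metres from the query point; haversine_m is a hypothetical helper name.

```python
from math import radians, sin, cos, asin, sqrt

EARTH_RADIUS_M = 6_371_000  # mean Earth radius in metres

def haversine_m(lat1, lng1, lat2, lng2):
    """Great-circle distance in metres between two (lat, lng) points."""
    phi1, phi2 = radians(lat1), radians(lat2)
    dphi = phi2 - phi1
    dlmb = radians(lng2 - lng1)
    a = sin(dphi / 2) ** 2 + cos(phi1) * cos(phi2) * sin(dlmb / 2) ** 2
    return 2 * EARTH_RADIUS_M * asin(sqrt(a))

area_center = (26.154454, 50.527364)             # A'ali, from bh_data
venue = (26.157464331750106, 50.52587327276449)  # Costa Coffee, from the JSON
d = haversine_m(*area_center, *venue)
print(round(d))
```

The result lands within a few metres of the reported 366, comfortably inside the 500 m search radius.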

So now let’s build a helpful function to extract the category of each food place. We’ll use the same area as an example.

# function that extracts the category of the restaurant
def get_category_type(row):
    try:
        categories_list = row['categories']
    except KeyError:
        categories_list = row['venue.categories']
        
    if len(categories_list) == 0:
        return None
    else:
        return categories_list[0]['name']
    
venues = results['response']['venues']
    
nearby_food = pd.json_normalize(venues) # flatten JSON

# filter columns
filtered_columns = ['name', 'categories', 'location.lat', 'location.lng']
nearby_food = nearby_food.loc[:, filtered_columns]

# filter the category for each row
nearby_food['categories'] = nearby_food.apply(get_category_type, axis=1)

# clean columns
nearby_food.columns = [col.split(".")[-1] for col in nearby_food.columns]

nearby_food.head()
name categories lat lng
0 Costa Coffee Coffee Shop 26.157464 50.525873
1 Chilis Aali Diner 26.152996 50.526268
2 Loop Cafe Café 26.156017 50.531527
3 Hospital Resturant (كافيتيريا المستشفى) Restaurant 26.153012 50.526232
4 كفتيريا المستشفى Restaurant 26.153455 50.528375

These are just some of them; in total, the search returns 19 food places around A’ali.

Exploring All Areas

We’ve still got 82 places to explore, so let’s create a function to do this task much faster.

def getNearbyFoods(names, latitudes, longitudes, radius=500):
    food_categoryId = "4d4b7105d754a06374d81259"
    foods_list=[]
    for name, lat, lng in zip(names, latitudes, longitudes):
        print(name)

            
        # create the API request URL
        url = 'https://api.foursquare.com/v2/venues/search?&client_id={}&client_secret={}&v={}&ll={},{}&categoryId={}&radius={}&limit={}'.format(
            CLIENT_ID, 
            CLIENT_SECRET, 
            VERSION, 
            lat, 
            lng, 
            food_categoryId,
            radius, 
            LIMIT)
            
        # make the GET request
        response = requests.get(url).json()
        try:
            results = response["response"]['venues']
        except KeyError:
            # show the raw payload (e.g. an API error) before re-raising
            print(response)
            raise
        
        venue_list = []
        # return only relevant information for each nearby food place
        for v in results:
            vname, vlat, vlng = v['name'], v['location']['lat'], v['location']['lng']
            try:
                vcategory = v['categories'][0]['name']
                venue_list.append((name, 
                                    lat, 
                                    lng,
                                    vname, 
                                    vlat,
                                    vlng,
                                    vcategory))
            except (IndexError, KeyError):  # venue has no category, skip it
                continue
        foods_list.append(venue_list)
    nearby_foods = pd.DataFrame([item for venue_list in foods_list for item in venue_list])
    nearby_foods.columns = ['Area', 
                  'Area Latitude', 
                  'Area Longitude',
                  'Venue', 
                  'Venue Latitude', 
                  'Venue Longitude', 
                  'Venue Category']
    
    return(nearby_foods)

We run the above function on each area and create a new dataframe called bh_food.

bh_food = getNearbyFoods(bh_data['Area'], bh_data['Latitude'], 
                                   bh_data['Longitude'], 500)
bh_food.head()
Area Area Latitude Area Longitude Venue Venue Latitude Venue Longitude Venue Category
0 A'ali 26.154454 50.527364 Costa Coffee 26.157464 50.525873 Coffee Shop
1 A'ali 26.154454 50.527364 Chilis Aali 26.152996 50.526268 Diner
2 A'ali 26.154454 50.527364 Loop Cafe 26.156017 50.531527 Café
3 A'ali 26.154454 50.527364 كفتيريا المستشفى 26.153455 50.528375 Restaurant
4 A'ali 26.154454 50.527364 Hospital Resturant (كافيتيريا المستشفى) 26.153012 50.526232 Restaurant

This gives us a whopping 1962 food places.

We can study the count for each area…

bh_food.groupby('Area').count().head()
Area Latitude Area Longitude Venue Venue Latitude Venue Longitude Venue Category
Area
A'ali 19 19 19 19 19 19
Abu Baham 9 9 9 9 9 9
Al Daih 50 50 50 50 50 50
Al Dair 9 9 9 9 9 9
Al Garrya 50 50 50 50 50 50

We’ve trimmed out the remaining areas for brevity’s sake.

Interestingly, this data contains 88 unique food categories.

Some of them include: Coffee Shop, Diner, Café, Restaurant, Breakfast Spot and so on.

Most Common Food Places

Our solution relies on segmenting areas based on the most common types of food places within each area. This gives us an idea of what kind of area it is from a culinary point of view and lets us judge whether the food suits our taste. We also want to factor in the total number of food places within an area, since some places in Bahrain may not be ideal to live in if they don’t have enough places to eat.

Using the dataframe bh_food, we form a one-hot encoding of the Venue Category field, which produces a new column for each category. Each record in this table corresponds to one venue, with a 1 placed in the column of that venue’s category. The only other field that is retained is the area name. We will call this bh_onehot.

# one hot encoding
bh_onehot = pd.get_dummies(bh_food[['Venue Category']], prefix="", prefix_sep="")

# add neighborhood column back to dataframe
bh_onehot = pd.concat([bh_food[['Area']], bh_onehot], axis=1) 

bh_onehot.head()
Area Afghan Restaurant African Restaurant American Restaurant Arepa Restaurant Asian Restaurant BBQ Joint Bagel Shop Bakery Bistro Breakfast Spot Bubble Tea Shop Buffet Burger Joint Burrito Place Cafeteria Café Chaat Place Chinese Restaurant Coffee Shop College Lab Comfort Food Restaurant Creperie Cuban Restaurant Cupcake Shop Deli / Bodega Dessert Shop Diner Doner Restaurant Donut Shop Dumpling Restaurant Eastern European Restaurant Egyptian Restaurant Falafel Restaurant Farmers Market Fast Food Restaurant Filipino Restaurant Fish & Chips Shop Food Food Court Food Truck French Restaurant Fried Chicken Joint Frozen Yogurt Shop Gas Station Gastropub Greek Restaurant Halal Restaurant Hookah Bar Hot Dog Joint Ice Cream Shop Indian Restaurant Iraqi Restaurant Italian Restaurant Japanese Restaurant Juice Bar Kebab Restaurant Korean Restaurant Lebanese Restaurant Mediterranean Restaurant Mexican Restaurant Middle Eastern Restaurant Moroccan Restaurant Movie Theater New American Restaurant Noodle House Pastry Shop Persian Restaurant Pie Shop Pizza Place Portuguese Restaurant Restaurant Salad Place Sandwich Place Seafood Restaurant Shawarma Place Snack Place South Indian Restaurant Steakhouse Supermarket Sushi Restaurant Tea Room Thai Restaurant Theme Restaurant Tibetan Restaurant Turkish Restaurant Vegetarian / Vegan Restaurant Vietnamese Restaurant Wings Joint
0 A'ali 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 1 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0
1 A'ali 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 1 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0
2 A'ali 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 1 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0
3 A'ali 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 1 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0
4 A'ali 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 1 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0

Now, let’s group the rows by area and take the mean of each one-hot column, which gives the frequency of occurrence of each category in that area. We also record the number of food places surrounding each area (NumberOfFoodPlaces).

The number of food places matters because some areas have far fewer restaurants than others, which could be a deciding factor for a “foodie” looking for a place to stay.
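
The one-hot and group-mean steps are easier to see on a toy table first. The sketch below uses four made-up venues in two made-up areas; only the column names match the real data.

```python
import pandas as pd

# Toy data: two areas, four venues (made-up rows for illustration).
toy = pd.DataFrame({
    "Area": ["X", "X", "X", "Y"],
    "Venue Category": ["Café", "Café", "Bakery", "Bakery"],
})

# One column per category, one row per venue, a 1 in the venue's category.
onehot = pd.get_dummies(toy[["Venue Category"]], prefix="", prefix_sep="", dtype=int)
onehot = pd.concat([toy[["Area"]], onehot], axis=1)

# The mean of a 0/1 column within an area is that category's share of venues.
grouped = onehot.groupby("Area").mean().reset_index()
grouped["NumberOfFoodPlaces"] = onehot.groupby("Area").size().values
print(grouped)
```

Area X has two cafés out of three venues, so its Café frequency is 2/3, while area Y consists of a single bakery and gets a Bakery frequency of 1.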

bh_grouped = bh_onehot.groupby(['Area']).mean().reset_index()
bh_grouped['NumberOfFoodPlaces'] = bh_onehot.groupby('Area').size().values  # same sorted-by-area order as bh_grouped
bh_grouped.head()
Area Afghan Restaurant African Restaurant American Restaurant Arepa Restaurant Asian Restaurant BBQ Joint Bagel Shop Bakery Bistro Breakfast Spot Bubble Tea Shop Buffet Burger Joint Burrito Place Cafeteria Café Chaat Place Chinese Restaurant Coffee Shop College Lab Comfort Food Restaurant Creperie Cuban Restaurant Cupcake Shop Deli / Bodega Dessert Shop Diner Doner Restaurant Donut Shop Dumpling Restaurant Eastern European Restaurant Egyptian Restaurant Falafel Restaurant Farmers Market Fast Food Restaurant Filipino Restaurant Fish & Chips Shop Food Food Court Food Truck French Restaurant Fried Chicken Joint Frozen Yogurt Shop Gas Station Gastropub Greek Restaurant Halal Restaurant Hookah Bar Hot Dog Joint Ice Cream Shop Indian Restaurant Iraqi Restaurant Italian Restaurant Japanese Restaurant Juice Bar Kebab Restaurant Korean Restaurant Lebanese Restaurant Mediterranean Restaurant Mexican Restaurant Middle Eastern Restaurant Moroccan Restaurant Movie Theater New American Restaurant Noodle House Pastry Shop Persian Restaurant Pie Shop Pizza Place Portuguese Restaurant Restaurant Salad Place Sandwich Place Seafood Restaurant Shawarma Place Snack Place South Indian Restaurant Steakhouse Supermarket Sushi Restaurant Tea Room Thai Restaurant Theme Restaurant Tibetan Restaurant Turkish Restaurant Vegetarian / Vegan Restaurant Vietnamese Restaurant Wings Joint NumberOfFoodPlaces
0 A'ali 0.0 0.0 0.00 0.00 0.00 0.000000 0.0 0.052632 0.0 0.052632 0.0 0.0 0.00 0.0 0.000000 0.210526 0.0 0.0 0.105263 0.0 0.0 0.0 0.0 0.105263 0.0 0.00 0.052632 0.0 0.000000 0.0 0.0 0.0 0.052632 0.0 0.000000 0.00 0.000000 0.052632 0.00 0.0 0.0 0.00 0.0 0.0 0.0 0.0 0.00 0.0 0.0 0.000000 0.0 0.00 0.000000 0.0 0.052632 0.0 0.0 0.0 0.000000 0.0 0.052632 0.0 0.0 0.0 0.0 0.0 0.00 0.0 0.00 0.0 0.157895 0.0 0.052632 0.0 0.0 0.00 0.0 0.00 0.00 0.0 0.00 0.00 0.0 0.0 0.00 0.0 0.0 0.0 19
1 Abu Baham 0.0 0.0 0.00 0.00 0.00 0.111111 0.0 0.000000 0.0 0.000000 0.0 0.0 0.00 0.0 0.111111 0.000000 0.0 0.0 0.000000 0.0 0.0 0.0 0.0 0.000000 0.0 0.00 0.000000 0.0 0.111111 0.0 0.0 0.0 0.000000 0.0 0.000000 0.00 0.111111 0.000000 0.00 0.0 0.0 0.00 0.0 0.0 0.0 0.0 0.00 0.0 0.0 0.111111 0.0 0.00 0.000000 0.0 0.000000 0.0 0.0 0.0 0.111111 0.0 0.222222 0.0 0.0 0.0 0.0 0.0 0.00 0.0 0.00 0.0 0.111111 0.0 0.000000 0.0 0.0 0.00 0.0 0.00 0.00 0.0 0.00 0.00 0.0 0.0 0.00 0.0 0.0 0.0 9
2 Al Daih 0.0 0.0 0.02 0.00 0.02 0.020000 0.0 0.140000 0.0 0.100000 0.0 0.0 0.04 0.0 0.020000 0.060000 0.0 0.0 0.000000 0.0 0.0 0.0 0.0 0.000000 0.0 0.06 0.040000 0.0 0.000000 0.0 0.0 0.0 0.020000 0.0 0.000000 0.00 0.000000 0.000000 0.00 0.0 0.0 0.02 0.0 0.0 0.0 0.0 0.02 0.0 0.0 0.020000 0.0 0.00 0.020000 0.0 0.000000 0.0 0.0 0.0 0.000000 0.0 0.160000 0.0 0.0 0.0 0.0 0.0 0.02 0.0 0.00 0.0 0.060000 0.0 0.020000 0.0 0.0 0.00 0.0 0.04 0.02 0.0 0.00 0.00 0.0 0.0 0.06 0.0 0.0 0.0 50
3 Al Dair 0.0 0.0 0.00 0.00 0.00 0.111111 0.0 0.555556 0.0 0.000000 0.0 0.0 0.00 0.0 0.000000 0.000000 0.0 0.0 0.000000 0.0 0.0 0.0 0.0 0.000000 0.0 0.00 0.000000 0.0 0.000000 0.0 0.0 0.0 0.000000 0.0 0.111111 0.00 0.000000 0.000000 0.00 0.0 0.0 0.00 0.0 0.0 0.0 0.0 0.00 0.0 0.0 0.000000 0.0 0.00 0.111111 0.0 0.000000 0.0 0.0 0.0 0.000000 0.0 0.000000 0.0 0.0 0.0 0.0 0.0 0.00 0.0 0.00 0.0 0.111111 0.0 0.000000 0.0 0.0 0.00 0.0 0.00 0.00 0.0 0.00 0.00 0.0 0.0 0.00 0.0 0.0 0.0 9
4 Al Garrya 0.0 0.0 0.02 0.02 0.00 0.020000 0.0 0.020000 0.0 0.120000 0.0 0.0 0.02 0.0 0.020000 0.060000 0.0 0.0 0.060000 0.0 0.0 0.0 0.0 0.000000 0.0 0.02 0.020000 0.0 0.000000 0.0 0.0 0.0 0.000000 0.0 0.040000 0.06 0.000000 0.000000 0.02 0.0 0.0 0.04 0.0 0.0 0.0 0.0 0.00 0.0 0.0 0.000000 0.1 0.02 0.000000 0.0 0.000000 0.0 0.0 0.0 0.000000 0.0 0.020000 0.0 0.0 0.0 0.0 0.0 0.02 0.0 0.02 0.0 0.180000 0.0 0.000000 0.0 0.0 0.02 0.0 0.02 0.00 0.0 0.02 0.02 0.0 0.0 0.00 0.0 0.0 0.0 50

Let’s call this bh_grouped. Now that we have this processed information, we can analyze it more clearly by reordering it so that only the 10 most common types of food places for each area are retained.

import numpy as np

# Function to sort venues by most common ones
def return_most_common_venues(row, num_top_venues):
    row_categories = row.iloc[1:-1]
    row_categories_sorted = row_categories.sort_values(ascending=False)
    
    return row_categories_sorted.index.values[0:num_top_venues]

num_top_venues = 10

indicators = ['st', 'nd', 'rd']

# create columns according to number of top venues
columns = ['Area', 'NumberOfFoodPlaces']
for ind in np.arange(num_top_venues):
    try:
        columns.append('{}{} Most Common Food Place'.format(ind+1, indicators[ind]))
    except IndexError:  # 4th onwards all take the 'th' suffix
        columns.append('{}th Most Common Food Place'.format(ind+1))

# create a new dataframe
foods_sorted = pd.DataFrame(columns=columns)
foods_sorted[['Area','NumberOfFoodPlaces']] = bh_grouped[['Area','NumberOfFoodPlaces']]

for ind in np.arange(bh_grouped.shape[0]):
    foods_sorted.iloc[ind, 2:] = return_most_common_venues(bh_grouped.iloc[ind, :], num_top_venues)

# preview the result
foods_sorted.head()
Area NumberOfFoodPlaces 1st Most Common Food Place 2nd Most Common Food Place 3rd Most Common Food Place 4th Most Common Food Place 5th Most Common Food Place 6th Most Common Food Place 7th Most Common Food Place 8th Most Common Food Place 9th Most Common Food Place 10th Most Common Food Place
0 A'ali 19 Café Restaurant Coffee Shop Cupcake Shop Breakfast Spot Food Sandwich Place Falafel Restaurant Middle Eastern Restaurant Bakery
1 Abu Baham 9 Middle Eastern Restaurant Cafeteria Ice Cream Shop Donut Shop BBQ Joint Restaurant Fish & Chips Shop Mediterranean Restaurant Afghan Restaurant New American Restaurant
2 Al Daih 50 Middle Eastern Restaurant Bakery Breakfast Spot Dessert Shop Café Restaurant Turkish Restaurant Burger Joint Diner Steakhouse
3 Al Dair 9 Bakery Restaurant BBQ Joint Italian Restaurant Fast Food Restaurant Afghan Restaurant Lebanese Restaurant Noodle House New American Restaurant Movie Theater
4 Al Garrya 50 Restaurant Breakfast Spot Indian Restaurant Coffee Shop Filipino Restaurant Café Fried Chicken Joint Fast Food Restaurant Diner Middle Eastern Restaurant

Let’s call this table foods_sorted.
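
One caveat: an area with only a few food places has fewer than ten categories with a non-zero frequency, so the tail of its top-10 list is padded with zero-frequency categories in no meaningful order. Al Dair, for example, has 9 food places across 5 categories, yet its list also contains entries such as Movie Theater. A minimal sketch of a variant that skips zero frequencies follows; top_nonzero_categories is a hypothetical helper, not part of the original notebook.

```python
import pandas as pd

def top_nonzero_categories(freqs, num_top=10):
    """Return up to num_top category names, skipping zero frequencies."""
    nonzero = freqs[freqs > 0].sort_values(ascending=False)
    return list(nonzero.index[:num_top])

# Frequencies for Al Dair copied from the bh_grouped output (most zeros omitted).
al_dair = pd.Series({
    "Bakery": 0.555556,
    "BBQ Joint": 0.111111,
    "Fast Food Restaurant": 0.111111,
    "Italian Restaurant": 0.111111,
    "Restaurant": 0.111111,
    "Afghan Restaurant": 0.0,
    "Movie Theater": 0.0,
})

top = top_nonzero_categories(al_dair)
print(top)
```

Only the five categories that actually occur are returned, with Bakery first; the order among the four tied categories is not significant.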

Cluster Areas

Now we are ready for further analysis and clustering. We will use the bh_grouped dataframe since it contains the necessary numerical values for machine learning.

Our feature set consists of the frequencies of all the food categories (88 features).

We exclude NumberOfFoodPlaces from the input to the ML model, since our problem requires segmenting areas by the type of food available. That quantity only matters at the end, when deciding whether to live in an area or not.

A more concrete reason to exclude it is that we are already neglecting all sorts of other factors for lack of data, such as living costs, access to public transport, etc.

This is a foodie’s guide to finding a place, and the venture shouldn’t be bogged down by the fact that some areas have fewer restaurants than one would expect.

The output will be a cluster label for each area.

For the clustering itself, we will use K-Means, one of the simplest clustering algorithms and an unsupervised machine learning approach that suits our purpose. We’ll use the popular machine learning library scikit-learn to do that in Python.

We’ll run k-means to group the areas into 5 clusters. We pick this number for the sake of examination. We’ll fit the model on the entire data to learn these clusters.

from sklearn.cluster import KMeans

# set number of clusters
kclusters = 5

# keep only the category-frequency columns as features
bh_grouped_clustering = bh_grouped.drop(columns=['Area', 'NumberOfFoodPlaces'])

# run k-means clustering
kmeans = KMeans(n_clusters=kclusters, random_state=0).fit(bh_grouped_clustering)

# check cluster labels generated for each row in the dataframe
kmeans.labels_[0:10] 
array([1, 0, 0, 2, 0, 0, 0, 0, 1, 1], dtype=int32)
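
To get a feel for what k-means does with frequency vectors, here is a minimal sketch on four made-up areas described by three categories. The numbers are invented for illustration, and n_init=10 is set explicitly only to keep the behavior stable across scikit-learn versions.

```python
import numpy as np
from sklearn.cluster import KMeans

# Toy frequency vectors over three categories: [Café, Bakery, Restaurant].
X = np.array([
    [0.8, 0.1, 0.1],  # café-heavy area
    [0.7, 0.2, 0.1],  # café-heavy area
    [0.1, 0.1, 0.8],  # restaurant-heavy area
    [0.2, 0.0, 0.8],  # restaurant-heavy area
])

toy_kmeans = KMeans(n_clusters=2, n_init=10, random_state=0).fit(X)
labels = toy_kmeans.labels_
print(labels)
```

The two café-heavy areas end up in one cluster and the two restaurant-heavy areas in the other. The label numbers themselves are arbitrary, so only the grouping matters.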

Let’s create a new dataframe bh_merged that includes the cluster as well as the top 10 food places for each area.

# add clustering labels
try:
    foods_sorted.insert(0, 'Cluster Labels', kmeans.labels_)
except ValueError:
    # Allows me to retry if the Cluster Labels column exists
    foods_sorted['Cluster Labels'] = kmeans.labels_

bh_merged = bh_data

# merge foods_sorted with bh_data to add latitude/longitude for each area
bh_merged = bh_merged.join(foods_sorted.set_index('Area'), on='Area')
bh_merged.dropna(how='any', axis=0, inplace=True)
bh_merged['Cluster Labels'] = bh_merged['Cluster Labels'].astype(np.int32)
bh_merged.head() # check the last columns!
Area Latitude Longitude Cluster Labels NumberOfFoodPlaces 1st Most Common Food Place 2nd Most Common Food Place 3rd Most Common Food Place 4th Most Common Food Place 5th Most Common Food Place 6th Most Common Food Place 7th Most Common Food Place 8th Most Common Food Place 9th Most Common Food Place 10th Most Common Food Place
0 A'ali 26.154454 50.527364 1 19.0 Café Restaurant Coffee Shop Cupcake Shop Breakfast Spot Food Sandwich Place Falafel Restaurant Middle Eastern Restaurant Bakery
1 Abu Baham 26.205737 50.541668 0 9.0 Middle Eastern Restaurant Cafeteria Ice Cream Shop Donut Shop BBQ Joint Restaurant Fish & Chips Shop Mediterranean Restaurant Afghan Restaurant New American Restaurant
3 Al Garrya 26.232690 50.578110 0 50.0 Restaurant Breakfast Spot Indian Restaurant Coffee Shop Filipino Restaurant Café Fried Chicken Joint Fast Food Restaurant Diner Middle Eastern Restaurant
4 Al Hajar 26.225405 50.590138 0 49.0 Café Filipino Restaurant Middle Eastern Restaurant Fast Food Restaurant Coffee Shop Asian Restaurant Indian Restaurant Pizza Place BBQ Joint Restaurant
5 Al Kharijiya 26.160230 50.609140 0 16.0 Cafeteria Asian Restaurant Fast Food Restaurant Bakery Wings Joint Pizza Place Falafel Restaurant Middle Eastern Restaurant Café Food Court

Finally, let’s visualize the resulting clusters.
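
The plotting code is not shown in the post, so the following is only a sketch of how such a map could be drawn, reusing the marker style from the earlier map. The helper names cluster_colors and build_cluster_map are hypothetical, and the colour scheme is my own choice.

```python
import colorsys

def cluster_colors(n_clusters):
    """Return one hex colour per cluster, evenly spaced around the hue wheel."""
    colors = []
    for i in range(n_clusters):
        r, g, b = colorsys.hsv_to_rgb(i / n_clusters, 0.8, 0.9)
        colors.append("#{:02x}{:02x}{:02x}".format(
            int(r * 255), int(g * 255), int(b * 255)))
    return colors

def build_cluster_map(df, n_clusters, center=(26.0766404, 50.334118)):
    """Draw one circle marker per area, coloured by its cluster label."""
    import folium  # imported here so the colour helper works without folium

    palette = cluster_colors(n_clusters)
    cluster_map = folium.Map(location=list(center), zoom_start=10)
    for lat, lng, area, label in zip(df['Latitude'], df['Longitude'],
                                     df['Area'], df['Cluster Labels']):
        folium.CircleMarker(
            [lat, lng],
            radius=5,
            popup=folium.Popup(f"{area} (cluster {label + 1})", parse_html=True),
            color=palette[label],
            fill=True,
            fill_color=palette[label],
            fill_opacity=0.7,
        ).add_to(cluster_map)
    return cluster_map

print(cluster_colors(5))
```

In the notebook this would be called as build_cluster_map(bh_merged, kclusters), with the returned map displayed as the last expression of the cell.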

(Interactive Folium map of the areas colored by cluster.)

Examine Clusters & Final Conclusion

Now, we can examine & determine the discriminating characteristics of each cluster.
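
One quick way to spot what distinguishes the clusters is to tally the top-ranked category per cluster. The sketch below does this on a small made-up table; toy_merged stands in for bh_merged and only borrows its column names.

```python
import pandas as pd

# Toy stand-in for bh_merged (made-up rows; only the column names match).
toy_merged = pd.DataFrame({
    "Cluster Labels": [0, 0, 0, 1, 1],
    "1st Most Common Food Place": [
        "Middle Eastern Restaurant",
        "Restaurant",
        "Middle Eastern Restaurant",
        "Café",
        "Café",
    ],
})

# Rows are clusters, columns are categories, cells count the areas.
summary = pd.crosstab(toy_merged["Cluster Labels"],
                      toy_merged["1st Most Common Food Place"])

# The most frequent top-ranked category within each cluster.
dominant = summary.idxmax(axis=1)
print(summary)
print(dominant)
```

Applied to the real bh_merged, the same two lines give a compact summary to read alongside the full per-cluster tables.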

Cluster 1

cluster1 = bh_merged.loc[bh_merged['Cluster Labels'] == 0, bh_merged.columns[[0] + list(range(4, bh_merged.shape[1]))]]
cluster1
Area NumberOfFoodPlaces 1st Most Common Food Place 2nd Most Common Food Place 3rd Most Common Food Place 4th Most Common Food Place 5th Most Common Food Place 6th Most Common Food Place 7th Most Common Food Place 8th Most Common Food Place 9th Most Common Food Place 10th Most Common Food Place
1 Abu Baham 9.0 Middle Eastern Restaurant Cafeteria Ice Cream Shop Donut Shop BBQ Joint Restaurant Fish & Chips Shop Mediterranean Restaurant Afghan Restaurant New American Restaurant
3 Al Garrya 50.0 Restaurant Breakfast Spot Indian Restaurant Coffee Shop Filipino Restaurant Café Fried Chicken Joint Fast Food Restaurant Diner Middle Eastern Restaurant
4 Al Hajar 49.0 Café Filipino Restaurant Middle Eastern Restaurant Fast Food Restaurant Coffee Shop Asian Restaurant Indian Restaurant Pizza Place BBQ Joint Restaurant
5 Al Kharijiya 16.0 Cafeteria Asian Restaurant Fast Food Restaurant Bakery Wings Joint Pizza Place Falafel Restaurant Middle Eastern Restaurant Café Food Court
12 Arad 50.0 Middle Eastern Restaurant Restaurant Dessert Shop Burger Joint Café Fast Food Restaurant Diner Ice Cream Shop Bakery Sandwich Place
15 Budaiya 36.0 Middle Eastern Restaurant Bakery Cafeteria Burger Joint Seafood Restaurant Tea Room Café Sandwich Place Ice Cream Shop Restaurant
16 Jid Ali 48.0 Restaurant Middle Eastern Restaurant Café Italian Restaurant Coffee Shop Breakfast Spot Dessert Shop Pizza Place Diner Seafood Restaurant
18 Bani Jamra 19.0 Breakfast Spot Restaurant Café Cafeteria Middle Eastern Restaurant Vegetarian / Vegan Restaurant Bakery Snack Place Indian Restaurant Fast Food Restaurant
19 Barbar 6.0 Pizza Place BBQ Joint Bakery Sandwich Place Middle Eastern Restaurant Juice Bar Lebanese Restaurant Noodle House New American Restaurant Movie Theater
21 Bu Quwah 12.0 Turkish Restaurant Cafeteria Food Pizza Place Asian Restaurant Bakery Restaurant Falafel Restaurant Seafood Restaurant Coffee Shop
22 Buri 22.0 Indian Restaurant Bakery Breakfast Spot Restaurant Food Food Court Middle Eastern Restaurant Café Burger Joint Cafeteria
23 Busaiteen 43.0 Coffee Shop Café Burger Joint Ice Cream Shop Tea Room Juice Bar Middle Eastern Restaurant Donut Shop Restaurant Dessert Shop
24 Al Daih 50.0 Middle Eastern Restaurant Bakery Breakfast Spot Dessert Shop Café Restaurant Turkish Restaurant Burger Joint Diner Steakhouse
28 Diraz 16.0 Cafeteria Middle Eastern Restaurant Fast Food Restaurant Steakhouse Snack Place Vegetarian / Vegan Restaurant Asian Restaurant Indian Restaurant Restaurant Breakfast Spot
32 East Hidd City 47.0 Coffee Shop Fried Chicken Joint Pizza Place Cafeteria Café Indian Restaurant BBQ Joint Ice Cream Shop Bakery Burrito Place
34 Galali 50.0 Fast Food Restaurant Restaurant Ice Cream Shop Sandwich Place Middle Eastern Restaurant Turkish Restaurant Tea Room Burger Joint Italian Restaurant Café
35 Al Hidd 47.0 Coffee Shop Fried Chicken Joint Pizza Place Cafeteria Café Indian Restaurant BBQ Joint Ice Cream Shop Bakery Burrito Place
36 Halat Bu Maher 48.0 Middle Eastern Restaurant Ice Cream Shop Seafood Restaurant Café Sandwich Place Cafeteria Fast Food Restaurant Indian Restaurant Restaurant Turkish Restaurant
37 Halat Nuaim 12.0 Restaurant Food Truck Lebanese Restaurant Indian Restaurant Deli / Bodega Bakery Seafood Restaurant Burger Joint Coffee Shop Café
38 Hamad Town 7.0 Sandwich Place Wings Joint Bakery Juice Bar Vegetarian / Vegan Restaurant Vietnamese Restaurant Hookah Bar Lebanese Restaurant New American Restaurant Movie Theater
41 Hillat Abdul Saleh 47.0 Middle Eastern Restaurant Fast Food Restaurant Coffee Shop Ice Cream Shop Pizza Place Juice Bar Donut Shop Café Food Truck Bakery
47 Jid Al-Haj 48.0 Restaurant Middle Eastern Restaurant Café Italian Restaurant Coffee Shop Breakfast Spot Dessert Shop Pizza Place Diner Seafood Restaurant
48 Jidhafs 49.0 Bakery Breakfast Spot Middle Eastern Restaurant Turkish Restaurant Dessert Shop American Restaurant Food Burger Joint Steakhouse Cafeteria
51 Karrana 28.0 Middle Eastern Restaurant Restaurant Ice Cream Shop Asian Restaurant Falafel Restaurant Bakery Coffee Shop Cafeteria Italian Restaurant Fried Chicken Joint
52 Karzakan 4.0 Vegetarian / Vegan Restaurant Turkish Restaurant Restaurant Shawarma Place Afghan Restaurant Kebab Restaurant Movie Theater Moroccan Restaurant Middle Eastern Restaurant Mexican Restaurant
53 Khamis 20.0 Middle Eastern Restaurant Sandwich Place Turkish Restaurant Cafeteria Mediterranean Restaurant Café Restaurant Tea Room Italian Restaurant Diner
57 Manama 45.0 Indian Restaurant Filipino Restaurant Asian Restaurant Pizza Place Middle Eastern Restaurant Coffee Shop Café Cafeteria BBQ Joint Restaurant
59 Muharraq 48.0 Middle Eastern Restaurant Seafood Restaurant Café Cafeteria Restaurant Ice Cream Shop Turkish Restaurant Coffee Shop Pizza Place Burger Joint
70 Samaheej 3.0 Restaurant Bakery Afghan Restaurant Korean Restaurant Noodle House New American Restaurant Movie Theater Moroccan Restaurant Middle Eastern Restaurant Mexican Restaurant
71 Sanad 38.0 Middle Eastern Restaurant Restaurant Ice Cream Shop Sandwich Place Pizza Place Breakfast Spot Cafeteria Burger Joint Café Bakery
72 Sar 27.0 Pizza Place Burger Joint Gas Station Chinese Restaurant Fast Food Restaurant Breakfast Spot Thai Restaurant Coffee Shop Fried Chicken Joint Deli / Bodega
75 Shakhura 45.0 Middle Eastern Restaurant Breakfast Spot Café Restaurant Fast Food Restaurant Dessert Shop Bakery Pizza Place Donut Shop Ice Cream Shop
78 Tashan 25.0 Middle Eastern Restaurant Restaurant Turkish Restaurant Seafood Restaurant Cafeteria Indian Restaurant Italian Restaurant Mediterranean Restaurant Diner Dessert Shop
79 Tubli 34.0 Restaurant Bakery Burger Joint Juice Bar Middle Eastern Restaurant Coffee Shop Café Turkish Restaurant Mediterranean Restaurant Pie Shop

This cluster has 34 areas.

Cluster 2

cluster2 = bh_merged.loc[bh_merged['Cluster Labels'] == 1, bh_merged.columns[[0] + list(range(4, bh_merged.shape[1]))]]
cluster2
| | Area | NumberOfFoodPlaces | 1st Most Common Food Place | 2nd Most Common Food Place | 3rd Most Common Food Place | 4th Most Common Food Place | 5th Most Common Food Place | 6th Most Common Food Place | 7th Most Common Food Place | 8th Most Common Food Place | 9th Most Common Food Place | 10th Most Common Food Place |
| 0 | A'ali | 19.0 | Café | Restaurant | Coffee Shop | Cupcake Shop | Breakfast Spot | Food | Sandwich Place | Falafel Restaurant | Middle Eastern Restaurant | Bakery |
| 6 | Al Markh | 17.0 | Café | Fast Food Restaurant | Burger Joint | Ice Cream Shop | Juice Bar | Coffee Shop | Middle Eastern Restaurant | Dessert Shop | Bakery | BBQ Joint |
| 7 | Al Musalla | 17.0 | Fast Food Restaurant | Café | Coffee Shop | Middle Eastern Restaurant | Seafood Restaurant | Steakhouse | Food Court | Restaurant | Pizza Place | Japanese Restaurant |
| 8 | Al Qadam | 4.0 | Pizza Place | Café | Cafeteria | Lebanese Restaurant | Noodle House | New American Restaurant | Movie Theater | Moroccan Restaurant | Middle Eastern Restaurant | Mexican Restaurant |
| 9 | Al Qala | 50.0 | Café | Burger Joint | Restaurant | Dessert Shop | Coffee Shop | Sandwich Place | Bakery | Juice Bar | Ice Cream Shop | Pizza Place |
| 10 | Al Qurayyah | 11.0 | Restaurant | Café | Mediterranean Restaurant | Middle Eastern Restaurant | Comfort Food Restaurant | Coffee Shop | Diner | Japanese Restaurant | Juice Bar | Pastry Shop |
| 11 | Amwaj Islands | 37.0 | Café | Middle Eastern Restaurant | American Restaurant | Indian Restaurant | Restaurant | Asian Restaurant | Pizza Place | Deli / Bodega | Diner | Portuguese Restaurant |
| 13 | Askar | 4.0 | Cafeteria | Burger Joint | Café | Pie Shop | Pastry Shop | Noodle House | New American Restaurant | Movie Theater | Moroccan Restaurant | Middle Eastern Restaurant |
| 17 | Bahrain Bay | 50.0 | Coffee Shop | Café | Indian Restaurant | Pizza Place | Steakhouse | Burger Joint | Middle Eastern Restaurant | Restaurant | Fried Chicken Joint | American Restaurant |
| 20 | Bilad Al Qadeem | 46.0 | Café | Middle Eastern Restaurant | Ice Cream Shop | Breakfast Spot | Sandwich Place | Fast Food Restaurant | Pizza Place | Burger Joint | Restaurant | Bakery |
| 27 | Diplomatic Area | 50.0 | Coffee Shop | Café | American Restaurant | Fried Chicken Joint | Burger Joint | Restaurant | Food Court | Middle Eastern Restaurant | Indian Restaurant | French Restaurant |
| 30 | Dumistan | 49.0 | Café | Coffee Shop | Bakery | Burger Joint | Indian Restaurant | Middle Eastern Restaurant | Fast Food Restaurant | Ice Cream Shop | Falafel Restaurant | Filipino Restaurant |
| 33 | Eker | 6.0 | Diner | Snack Place | Creperie | Middle Eastern Restaurant | Café | Cafeteria | Pastry Shop | Noodle House | New American Restaurant | Movie Theater |
| 39 | Hamala | 29.0 | Coffee Shop | Café | Burger Joint | Pizza Place | Restaurant | Sandwich Place | Italian Restaurant | Mexican Restaurant | Hot Dog Joint | Mediterranean Restaurant |
| 42 | Isa Town | 43.0 | Café | Indian Restaurant | Restaurant | Pizza Place | Bakery | Cafeteria | Coffee Shop | Fast Food Restaurant | Theme Restaurant | Italian Restaurant |
| 43 | Janabiyah | 29.0 | Coffee Shop | Café | Burger Joint | Pizza Place | Restaurant | Sandwich Place | Italian Restaurant | Mexican Restaurant | Hot Dog Joint | Mediterranean Restaurant |
| 49 | Jurdab | 49.0 | Café | Coffee Shop | Bakery | Burger Joint | Indian Restaurant | Middle Eastern Restaurant | Fast Food Restaurant | Ice Cream Shop | Falafel Restaurant | Filipino Restaurant |
| 50 | Karbabad | 12.0 | Café | Sandwich Place | Coffee Shop | Asian Restaurant | Bakery | Middle Eastern Restaurant | Burger Joint | Cafeteria | Afghan Restaurant | Mediterranean Restaurant |
| 55 | Mahazza | 49.0 | Café | Coffee Shop | Bakery | Burger Joint | Indian Restaurant | Middle Eastern Restaurant | Fast Food Restaurant | Ice Cream Shop | Falafel Restaurant | Filipino Restaurant |
| 58 | Marquban | 49.0 | Café | Coffee Shop | Bakery | Burger Joint | Indian Restaurant | Middle Eastern Restaurant | Fast Food Restaurant | Ice Cream Shop | Falafel Restaurant | Filipino Restaurant |
| 60 | Muqaba | 35.0 | Café | Coffee Shop | Restaurant | Bakery | Breakfast Spot | Cafeteria | Fried Chicken Joint | Food Court | Middle Eastern Restaurant | Persian Restaurant |
| 61 | Muqsha | 49.0 | Café | Coffee Shop | Bakery | Burger Joint | Indian Restaurant | Middle Eastern Restaurant | Fast Food Restaurant | Ice Cream Shop | Falafel Restaurant | Filipino Restaurant |
| 64 | Nuwaidrat | 7.0 | Café | Asian Restaurant | Restaurant | Bakery | Middle Eastern Restaurant | Falafel Restaurant | Afghan Restaurant | Mediterranean Restaurant | Noodle House | New American Restaurant |
| 65 | Riffa | 49.0 | Café | Coffee Shop | Burger Joint | Restaurant | Bakery | Sandwich Place | Dessert Shop | Pizza Place | Italian Restaurant | Lebanese Restaurant |
| 66 | Reef Island | 13.0 | Café | American Restaurant | Middle Eastern Restaurant | Restaurant | French Restaurant | Japanese Restaurant | Breakfast Spot | Dessert Shop | Lebanese Restaurant | Mediterranean Restaurant |
| 68 | Sakhir | 8.0 | Burger Joint | Diner | Food Truck | Middle Eastern Restaurant | Café | Lebanese Restaurant | Noodle House | New American Restaurant | Movie Theater | Moroccan Restaurant |
| 69 | Salmabad | 12.0 | Café | Cupcake Shop | Bakery | Coffee Shop | Middle Eastern Restaurant | Breakfast Spot | Juice Bar | Food | Asian Restaurant | Japanese Restaurant |
| 73 | Sehla | 6.0 | Burger Joint | Café | American Restaurant | Ice Cream Shop | Bakery | Food Court | Pastry Shop | Noodle House | New American Restaurant | Movie Theater |
| 81 | Zallaq | 29.0 | Café | Coffee Shop | Restaurant | Juice Bar | Middle Eastern Restaurant | Fast Food Restaurant | Creperie | Cafeteria | Burrito Place | Burger Joint |

This cluster has 29 areas

Cluster 3

cluster3 = bh_merged.loc[bh_merged['Cluster Labels'] == 2, bh_merged.columns[[0] + list(range(4, bh_merged.shape[1]))]]
cluster3
| | Area | NumberOfFoodPlaces | 1st Most Common Food Place | 2nd Most Common Food Place | 3rd Most Common Food Place | 4th Most Common Food Place | 5th Most Common Food Place | 6th Most Common Food Place | 7th Most Common Food Place | 8th Most Common Food Place | 9th Most Common Food Place | 10th Most Common Food Place |
| 25 | Al Dair | 9.0 | Bakery | Restaurant | BBQ Joint | Italian Restaurant | Fast Food Restaurant | Afghan Restaurant | Lebanese Restaurant | Noodle House | New American Restaurant | Movie Theater |
| 44 | Jannusan | 11.0 | Bakery | Gastropub | Cafeteria | Indian Restaurant | Sandwich Place | Fish & Chips Shop | Seafood Restaurant | Snack Place | New American Restaurant | Movie Theater |
| 46 | Jaww | 1.0 | Bakery | Afghan Restaurant | Korean Restaurant | Noodle House | New American Restaurant | Movie Theater | Moroccan Restaurant | Middle Eastern Restaurant | Mexican Restaurant | Mediterranean Restaurant |
| 54 | Ma'ameer | 3.0 | Creperie | Diner | Bakery | Afghan Restaurant | Lebanese Restaurant | Noodle House | New American Restaurant | Movie Theater | Moroccan Restaurant | Middle Eastern Restaurant |
| 76 | Sitra | 2.0 | Turkish Restaurant | Bakery | Afghan Restaurant | Korean Restaurant | Noodle House | New American Restaurant | Movie Theater | Moroccan Restaurant | Middle Eastern Restaurant | Mexican Restaurant |

This cluster has 5 areas
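
Every cluster is pulled out with the same .loc expression and only the label changes, so the selection could be wrapped in a small helper. Here is a minimal sketch of that idea. select_cluster is a hypothetical name, the toy frame only stands in for bh_merged, and I'm assuming columns 1 to 3 hold latitude, longitude and the cluster label (the zero coordinates are placeholders, not real data).

```python
import pandas as pd

def select_cluster(df, label, first_feature_col=4, label_col="Cluster Labels"):
    """Return column 0 (Area) plus every column from first_feature_col onward
    for the rows whose cluster label equals `label`."""
    keep = [0] + list(range(first_feature_col, df.shape[1]))
    return df.loc[df[label_col] == label, df.columns[keep]]

# Toy stand-in for bh_merged. The column layout is an assumption and the
# coordinates are placeholders; the other values come from the tables in this post.
toy = pd.DataFrame({
    "Area": ["Al Dair", "Awali", "Sitra"],
    "Latitude": [0.0, 0.0, 0.0],
    "Longitude": [0.0, 0.0, 0.0],
    "Cluster Labels": [2, 3, 2],
    "NumberOfFoodPlaces": [9.0, 1.0, 2.0],
    "1st Most Common Food Place": ["Bakery", "Café", "Turkish Restaurant"],
})

cluster3 = select_cluster(toy, 2)  # label 2 is shown as "Cluster 3" in this post
print(cluster3)
print(f"This cluster has {len(cluster3)} areas")
```

With a helper like this, each cluster section would reduce to a single call with a different label.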

Cluster 4

cluster4 = bh_merged.loc[bh_merged['Cluster Labels'] == 3, bh_merged.columns[[0] + list(range(4, bh_merged.shape[1]))]]
cluster4
| | Area | NumberOfFoodPlaces | 1st Most Common Food Place | 2nd Most Common Food Place | 3rd Most Common Food Place | 4th Most Common Food Place | 5th Most Common Food Place | 6th Most Common Food Place | 7th Most Common Food Place | 8th Most Common Food Place | 9th Most Common Food Place | 10th Most Common Food Place |
| 14 | Awali | 1.0 | Café | Afghan Restaurant | Persian Restaurant | Noodle House | New American Restaurant | Movie Theater | Moroccan Restaurant | Middle Eastern Restaurant | Mexican Restaurant | Mediterranean Restaurant |

This cluster has 1 area

Cluster 5

cluster5 = bh_merged.loc[bh_merged['Cluster Labels'] == 4, bh_merged.columns[[0] + list(range(4, bh_merged.shape[1]))]]
cluster5
| | Area | NumberOfFoodPlaces | 1st Most Common Food Place | 2nd Most Common Food Place | 3rd Most Common Food Place | 4th Most Common Food Place | 5th Most Common Food Place | 6th Most Common Food Place | 7th Most Common Food Place | 8th Most Common Food Place | 9th Most Common Food Place | 10th Most Common Food Place |
| 26 | Dar Kulaib | 4.0 | Restaurant | Coffee Shop | Sandwich Place | Breakfast Spot | Afghan Restaurant | Lebanese Restaurant | New American Restaurant | Movie Theater | Moroccan Restaurant | Middle Eastern Restaurant |
| 56 | Malkiya | 2.0 | Ice Cream Shop | Coffee Shop | Afghan Restaurant | Lebanese Restaurant | Noodle House | New American Restaurant | Movie Theater | Moroccan Restaurant | Middle Eastern Restaurant | Mexican Restaurant |
| 77 | Sufala | 3.0 | Coffee Shop | Restaurant | Afghan Restaurant | Lebanese Restaurant | Noodle House | New American Restaurant | Movie Theater | Moroccan Restaurant | Middle Eastern Restaurant | Mexican Restaurant |

This cluster has 3 areas
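
The "This cluster has N areas" lines can also be produced in one go by counting the labels. This is a rough sketch rather than the code I actually ran: the labels frame is a stand-in for bh_merged, built only from the area and label pairs in the Cluster 3, 4 and 5 tables above.

```python
import pandas as pd

# Area/label pairs copied from the Cluster 3, 4 and 5 tables in this post.
# The labels are zero-based, so label 2 is displayed as "Cluster 3".
labels = pd.DataFrame({
    "Area": ["Al Dair", "Jannusan", "Jaww", "Ma'ameer", "Sitra",
             "Awali", "Dar Kulaib", "Malkiya", "Sufala"],
    "Cluster Labels": [2, 2, 2, 2, 2, 3, 4, 4, 4],
})

# Count how many areas carry each label, ordered by label
sizes = labels["Cluster Labels"].value_counts().sort_index()

for label, n in sizes.items():
    noun = "area" if n == 1 else "areas"
    print(f"Cluster {label + 1} has {n} {noun}")
```

Run on the full bh_merged frame, the same two lines would give the size of every cluster at once, which makes lopsided splits (like a cluster with a single area) easy to spot.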

Conclusion

Phew! We’re done finding our clusters and seeing which areas fall into each one. For the constraints of this approach and my closing discussion, please refer to the report in my GitHub repo, where you will also find the datasets I’ve used :D

I hope you enjoyed reading this post and learned something new from it. I did this as part of my data science course, and I hope you can do the same with your own hobby projects.

Until next time, cheers!