Recipe recommendation using ingredient networks · We then generated an ingredient list sorted by...

10
Recipe recommendation using ingredient networks Chun-Yuen Teng School of Information University of Michigan Ann Arbor, MI, USA [email protected] Yu-Ru Lin IQSS, Harvard University CCS, Northeastern University Boston, MA [email protected] Lada A. Adamic School of Information University of Michigan Ann Arbor, MI, USA [email protected] ABSTRACT The recording and sharing of cooking recipes, a human ac- tivity dating back thousands of years, naturally became an early and prominent social use of the web. The resulting online recipe collections are repositories of ingredient com- binations and cooking methods whose large-scale and vari- ety yield interesting insights about both the fundamentals of cooking and user preferences. At the level of an individual ingredient we measure whether it tends to be essential or can be dropped or added, and whether its quantity can be modi- fied. We also construct two types of networks to capture the relationships between ingredients. The complement network captures which ingredients tend to co-occur frequently, and is composed of two large communities: one savory, the other sweet. The substitute network, derived from user-generated suggestions for modifications, can be decomposed into many communities of functionally equivalent ingredients, and cap- tures users’ preference for healthier variants of a recipe. Our experiments reveal that recipe ratings can be well predicted with features derived from combinations of ingredient net- works and nutrition information. Categories and Subject Descriptors H.2.8 [Database Management]: Database applications— Data mining General Terms Measurement; Experimentation Keywords ingredient networks, recipe recommendation 1. INTRODUCTION The web enables individuals to collaboratively share knowl- edge and recipe websites are one of the earliest examples of collaborative knowledge sharing on the web. Allrecipes.com, the subject of our present study, was founded in 1997, years ahead of other collaborative websites such as the Wikipedia. Recipe sites thrive because individuals are eager to share their recipes, from family recipes that had been passed down for generations, to new concoctions that they created that afternoon, having been motivated in part by the ability to share the result online. Once shared, the recipes are imple- mented and evaluated by other users, who supply ratings and comments. The desire to look up recipes online may at first appear odd given that tombs of printed recipes can be found in almost every kitchen. The Joy of Cooking [12] alone con- tains 4,500 recipes spread over 1,000 pages. There is, how- ever, substantial additional value in online recipes, beyond their accessibility. While the Joy of Cooking contains a single recipe for Swedish meatballs, Allrecipes.com hosts “Swedish Meatballs I”,“II”, and“III”, submitted by different users, along with 4 other variants, including “The Amaz- ing Swedish Meatball”. Each variant has been reviewed, from 329 reviews for “Swedish Meatballs I” to 5 reviews for “Swedish Meatballs III”. The reviews not only provide a crowd-sourced ranking of the different recipes, but also many suggestions on how to modify them, e.g. using ground turkey instead of beef, skipping the “cream of wheat” be- cause it is rarely on hand, etc. The wealth of information captured by online collabora- tive recipe sharing sites is revealing not only of the fun- damentals of cooking, but also of user preferences. The co- occurrence of ingredients in tens of thousands of recipes pro- vides information about which ingredients go well together, and when a pairing is unusual. Users’ reviews provide clues as to the flexibility of a recipe, and the ingredients within it. Can the amount of cinnamon be doubled? Can the nut- meg be omitted? If one is lacking a certain ingredient, can a substitute be found among supplies at hand without a trip to the grocery store? Unlike cookbooks, which will contain vetted but perhaps not the best variants for some individu- als’ tastes, ratings assigned to user-submitted recipes allow for the evaluation of what works and what does not. In this paper, we seek to distill the collective knowledge and preference about cooking through mining a popular recipe-sharing website. To extract such information, we first parse the unstructured text of the recipes and the accom- panying user reviews. We construct two types of networks that reflect different relationships between ingredients, in order to capture users’ knowledge about how to combine in- gredients. The complement network captures which ingre- dients tend to co-occur frequently, and is composed of two arXiv:1111.3919v3 [cs.SI] 21 May 2012

Transcript of Recipe recommendation using ingredient networks · We then generated an ingredient list sorted by...

Page 1: Recipe recommendation using ingredient networks · We then generated an ingredient list sorted by frequency of ingredient occurrence and selected the top 1000 common ingredient names

Recipe recommendation using ingredient networks

Chun-Yuen TengSchool of InformationUniversity of MichiganAnn Arbor, MI, USA

[email protected]

Yu-Ru LinIQSS, Harvard University

CCS, Northeastern UniversityBoston, MA

[email protected]

Lada A. AdamicSchool of InformationUniversity of MichiganAnn Arbor, MI, USA

[email protected]

ABSTRACTThe recording and sharing of cooking recipes, a human ac-tivity dating back thousands of years, naturally became anearly and prominent social use of the web. The resultingonline recipe collections are repositories of ingredient com-binations and cooking methods whose large-scale and vari-ety yield interesting insights about both the fundamentals ofcooking and user preferences. At the level of an individualingredient we measure whether it tends to be essential or canbe dropped or added, and whether its quantity can be modi-fied. We also construct two types of networks to capture therelationships between ingredients. The complement networkcaptures which ingredients tend to co-occur frequently, andis composed of two large communities: one savory, the othersweet. The substitute network, derived from user-generatedsuggestions for modifications, can be decomposed into manycommunities of functionally equivalent ingredients, and cap-tures users’ preference for healthier variants of a recipe. Ourexperiments reveal that recipe ratings can be well predictedwith features derived from combinations of ingredient net-works and nutrition information.

Categories and Subject DescriptorsH.2.8 [Database Management]: Database applications—Data mining

General TermsMeasurement; Experimentation

Keywordsingredient networks, recipe recommendation

1. INTRODUCTIONThe web enables individuals to collaboratively share knowl-

edge and recipe websites are one of the earliest examples ofcollaborative knowledge sharing on the web. Allrecipes.com,

the subject of our present study, was founded in 1997, yearsahead of other collaborative websites such as the Wikipedia.Recipe sites thrive because individuals are eager to sharetheir recipes, from family recipes that had been passed downfor generations, to new concoctions that they created thatafternoon, having been motivated in part by the ability toshare the result online. Once shared, the recipes are imple-mented and evaluated by other users, who supply ratingsand comments.

The desire to look up recipes online may at first appearodd given that tombs of printed recipes can be found inalmost every kitchen. The Joy of Cooking [12] alone con-tains 4,500 recipes spread over 1,000 pages. There is, how-ever, substantial additional value in online recipes, beyondtheir accessibility. While the Joy of Cooking contains asingle recipe for Swedish meatballs, Allrecipes.com hosts“Swedish Meatballs I”, “II”, and “III”, submitted by differentusers, along with 4 other variants, including “The Amaz-ing Swedish Meatball”. Each variant has been reviewed,from 329 reviews for “Swedish Meatballs I” to 5 reviewsfor “Swedish Meatballs III”. The reviews not only providea crowd-sourced ranking of the different recipes, but alsomany suggestions on how to modify them, e.g. using groundturkey instead of beef, skipping the “cream of wheat” be-cause it is rarely on hand, etc.

The wealth of information captured by online collabora-tive recipe sharing sites is revealing not only of the fun-damentals of cooking, but also of user preferences. The co-occurrence of ingredients in tens of thousands of recipes pro-vides information about which ingredients go well together,and when a pairing is unusual. Users’ reviews provide cluesas to the flexibility of a recipe, and the ingredients withinit. Can the amount of cinnamon be doubled? Can the nut-meg be omitted? If one is lacking a certain ingredient, can asubstitute be found among supplies at hand without a tripto the grocery store? Unlike cookbooks, which will containvetted but perhaps not the best variants for some individu-als’ tastes, ratings assigned to user-submitted recipes allowfor the evaluation of what works and what does not.

In this paper, we seek to distill the collective knowledgeand preference about cooking through mining a popularrecipe-sharing website. To extract such information, we firstparse the unstructured text of the recipes and the accom-panying user reviews. We construct two types of networksthat reflect different relationships between ingredients, inorder to capture users’ knowledge about how to combine in-gredients. The complement network captures which ingre-dients tend to co-occur frequently, and is composed of two

arX

iv:1

111.

3919

v3 [

cs.S

I] 2

1 M

ay 2

012

Page 2: Recipe recommendation using ingredient networks · We then generated an ingredient list sorted by frequency of ingredient occurrence and selected the top 1000 common ingredient names

large communities: one savory, the other sweet. The sub-stitute network, derived from user-generated suggestions formodifications, can be decomposed into many communities offunctionally equivalent ingredients, and captures users’ pref-erence for healthier variants of a recipe. Our experimentsreveal that recipe ratings can be well predicted by featuresderived from combinations of ingredient networks and nu-trition information (with accuracy .792), while most of theprediction power comes from the ingredient networks (84%).

The rest of the paper is organized as follows. Section 2 re-views the related work. Section 3 describes the dataset. Sec-tion 4 discusses the extraction of the ingredient and comple-ment networks and their characteristics. Section 5 presentsthe extraction of recipe modification information, as well asthe construction and characteristics of the ingredient substi-tute network. Section 6 presents our experiments on reciperecommendation and Section 7 concludes.

2. RELATED WORKRecipe recommendation has been the subject of much

prior work. Typically the goal has been to suggest recipesto users based on their past recipe ratings [15][3] or brows-ing/cooking history [16]. The algorithms then find simi-lar recipes based on overlapping ingredients, either treat-ing each ingredient equally [4] or by identifying key ingre-dients [19]. Instead of modeling recipes using ingredients,Wang et al. [17] represent the recipes as graphs which arebuilt on ingredients and cooking directions, and they demon-strate that graph representations can be used to easily ag-gregate Chinese dishes by the flow of cooking steps and thesequence of added ingredients. However, their approach onlymodels the occurrence of ingredients or cooking methods,and doesn’t take into account the relationships between in-gredients. In contrast, in this paper we incorporate the like-lihood of ingredients to co-occur, as well as the potential ofone ingredient to act as a substitute for another.

Another branch of research has focused on recommend-ing recipes based on desired nutritional intake or promotinghealthy food choices. Geleijnse et al. [7] designed a proto-type of a personalized recipe advice system, which suggestsrecipes to users based on their past food selections and nutri-tion intake. In addition to nutrition information, Kamiethet al. [9] built a personalized recipe recommendation systembased on availability of ingredients and personal nutritionalneeds. Shidochi et al. [14] proposed an algorithm to extractreplaceable ingredients from recipes in order to satisfy users’various demands, such as calorie constraints and food avail-ability. Their method identifies substitutable ingredients bymatching the cooking actions that correspond to ingredientnames. However, their assumption that substitutable ingre-dients are subject to the same processing methods is less di-rect and specific than extracting substitutions directly fromuser-contributed suggestions.

Ahn et al. [1] and Kinouchi et al [10] examined networksinvolving ingredients derived from recipes, with the formermodeling ingredients by their flavor bonds, and the latterexamining the relationship between ingredients and recipes.In contrast, we derive direct ingredient-ingredient networksof both compliments and substitutes. We also step beyondcharacterizing these networks to demonstrating that theycan be used to predict which recipes will be successful.

3. DATASETAllrecipes.com is one of the most popular recipe-sharing

websites, where novice and expert cooks alike can uploadand rate cooking recipes. It hosts 16 customized interna-tional sites for users to share their recipes in their nativelanguages, of which we study only the main, English, ver-sion. Recipes uploaded to the site contain specific instruc-tions on how to prepare a dish: the list of ingredients, prepa-ration steps, preparation and cook time, the number of serv-ings produced, nutrition information, serving directions, andphotos of the prepared dish. The uploaded recipes are en-riched with user ratings and reviews, which comment onthe quality of the recipe, and suggest changes and improve-ments. In addition to rating and commenting on recipes,users are able to save them as favorites or recommend themto others through a forum.

We downloaded 46,337 recipes including all informationlisted from allrecipes.com, including several classifications,such as a region (e.g. the midwest region of US or Eu-rope), the course or meal the dish is appropriate for (e.g.:appetizers or breakfast), and any holidays the dish may beassociated with. In order to understand users’ recipe prefer-ences, we crawled 1,976,920 reviews which include reviewers’ratings, review text, and the number of users who voted thereview as useful.

3.1 Data preprocessingThe first step in processing the recipes is identifying the

ingredients and cooking methods from the freeform text ofthe recipe. Usually, although not always, each ingredientis listed on a separate line. To extract the ingredients, wetried two approaches. In the first, we found the maximalmatch between a pre-curated list of ingredients and the textof the line. However, this missed too many ingredients,while misidentifying others. In the second approach, weused regular expression matching to remove non-ingredientterms from the line and identified the remainder as the in-gredient. We removed quantifiers, such as e.g. “1 lb” or “2cups”, words referring to consistency or temperature, e.g.chopped or cold, along with a few other heuristics, such asremoving content in parentheses. For example “1 (28 ounce)can baked beans (such as Bush’s Original R©)” is identifiedas “baked beans”. By limiting the list of potential termsto remove from an ingredient entry, we erred on the sideof not conflating potentially identical or highly similar in-gredients, e.g. “cheddar cheese”, used in 2450 recipes, wasconsidered different from “sharp cheddar cheese”, occurringin 394 recipes.

We then generated an ingredient list sorted by frequencyof ingredient occurrence and selected the top 1000 commoningredient names as our finalized ingredient list. Each of thetop 1000 ingredients occurred in 23 or more recipes, withplain salt making an appearance in 47.3% of recipes. Theseingredients also accounted for 94.9% of ingredient entries inthe recipe dataset. The remaining ingredients were missedeither because of high specificity (e.g. yolk-free egg noodle),referencing brand names (e.g. Planters almonds), rarity (e.g.serviceberry), misspellings, or not being a food (e.g. “nylonnetting”).

The remaining processing task was to identify cookingprocesses from the directions. We first identified all heatingmethods using a listing in the Wikipedia entry on cooking[18]. For example, baking, boiling, and steaming are all ways

Page 3: Recipe recommendation using ingredient networks · We then generated an ingredient list sorted by frequency of ingredient occurrence and selected the top 1000 common ingredient names

bake boil fry grill roast simmer marinate

midwestmountainnortheastwest coastsouth

method

% in

rec

ipes

010

2030

40

Figure 1: The percentage of recipes by region thatapply a specific heating method.

of heating the food. We then identified mechanical ways ofprocessing the food such as chopping and grinding, and otherchemical techniques such as marinating and brining.

3.2 Regional preferencesChoosing one cooking method over another appears to be

a question of regional taste. 5.8% of recipes were classifiedinto one of five US regions: Mountain, Midwest, Northeast,South, and West Coast (including Alaska and Hawaii). Fig-ure 1 shows significantly (χ2 test p-value < 0.001) varyingpreferences in the different US regions among 6 of the mostpopular cooking methods. Boiling and simmering, both in-volving heating food in hot liquids, are more common in theSouth and Midwest. Marinating and grilling are relativelymore popular in the West and Mountain regions, but in theWest more grilling recipes involve seafood (18/42 = 42%)relative to other regions combined (7/106 = 6%). Fryingis popular in the South and Northeast. Baking is a univer-sally popular and versatile technique, which is often used forboth sweet and savory dishes, and is slightly more popularin the Northeast and Midwest. Examination of individualrecipes reflecting these frequencies shows that these differ-ences in preference can be tied to differences in demograph-ics, immigrant culture and availability of local ingredients,e.g. seafood.

4. INGREDIENT COMPLEMENT NETWORKCan we learn how to combine ingredients from the data?

Here we employ the occurrences of ingredients across recipesto distill users’ knowledge about combining ingredients.

We constructed an ingredient complement network basedon pointwise mutual information (PMI) defined on pairs ofingredients (a, b):

PMI(a,b) = logp(a, b)

p(a)p(b),

where

p(a, b) =# of recipes containing a and b

# of recipes,

p(a) =# of recipes containing a

# of recipes,

p(b) =# of recipes containing b

# of recipes.

The PMI gives the probability that two ingredients occurtogether against the probability that they occur separately.Complementary ingredients tend to occur together far moreoften than would be expected by chance.

Figure 2 shows a visualization of ingredient complemen-tarity. Two distinct subcommunities of recipes are imme-diately apparent: one corresponding to savory dishes, theother to sweet ones. Some central ingredients, e.g. egg andsalt, actually are pushed to the periphery of the network.They are so ubiquitous, that although they have many edges,they are all weak, since they don’t show particular comple-mentarity with any single group of ingredients.

We further probed the structure of the complementaritynetwork by applying a network clustering algorithm [13].The algorithm confirmed the existence of two main clusterscontaining the vast majority of the ingredients. An interest-ing satellite cluster is that of mixed drink ingredients, whichis evident as a constellation of small nodes located near thetop of the sweet cluster in Figure 2. The cluster includesthe following ingredients: lime, rum, ice, orange, pineapplejuice, vodka, cranberry juice, lemonade, tequila, etc.

For each recipe we recorded the minimum, average, andmaximum pairwise pointwise mutual information betweeningredients. The intuition is that complementary ingredi-ents would yield higher ratings, while ingredients that don’tgo together would lower the average rating. We found thatwhile the average and minimum pointwise mutual informa-tion between ingredients is uncorrelated with ratings, themaximum is very slightly positively correlated with the av-erage rating for the recipe (ρ = 0.09, p-value < 10−10). Thissuggests that having at least two complementary ingredientsvery slightly boosts a recipe’s prospects, but having clashingor unrelated ingredients does not seem to do harm.

5. RECIPE MODIFICATIONSCo-occurrence of ingredients aggregated over individual

recipes reveals the structure of cooking, but tells us littleabout how flexible the ingredient proportions are, or whethersome ingredients could easily be left out or substituted. Anexperienced cook may know that apple sauce is a low-fat al-ternative to oil, or may know that nutmeg is often optional,but a novice cook may implement recipes literally, afraidthat deviating from the instructions may produce poor re-sults. While a traditional hardcopy cookbook would providefew such hints, they are plentiful in the reviews submittedby users who implemented the recipes, e.g. “This is a greatrecipe, but using fresh tomatoes only adds a few minutes tothe prep time and makes it taste so much better”, or anothercomment about the same salsa recipe“This is by far the bestrecipe we have ever come across. We did however change itjust a little bit by adding extra onion.”

As the examples illustrate, modifications are reported evenwhen the user likes the recipe. In fact, we found that 60.1%of recipe reviews contain words signaling modification, suchas “add”, “omit”, “instead”, “extra” and 14 others. Further-more, it is the reviews that include changes that have a sta-tistically higher average rating (4.49 vs. 4.39, t-test p-value< 10−10), and lower rating variance (0.82 vs. 1.05, Bartletttest p-value < 10−10), as is evident in the distribution ofratings, shown in Fig. 3. This suggests that flexibility inrecipes is not necessarily a bad thing, and that reviewerswho don’t mention modifications are more likely to think ofthe recipe as perfect, or to dislike it entirely.

Page 4: Recipe recommendation using ingredient networks · We then generated an ingredient list sorted by frequency of ingredient occurrence and selected the top 1000 common ingredient names

cherry gelatin

graham cracker

low fat cottage cheese

pork shoulder roast

heavy whipping cream tofu

bok choy

butter cracker

baking soda

pimento pepper

milk powder

chorizo sausage

lady�nger

steak sauce

crimushroom

radishe

shiitake mushroom

pesto

brownie mix

pumpkin pie spice

rye �our

cardamom

sa�ron thread

linguine

corn

fat free sour cream

basmati rice

bittersweet chocolate

bay

corn chip

cracker

french green bean

poppy seed

vegetable oil

grape tomato

pizza crust dough

low sodium beef broth

club soda

lard

soy saucepanko bread

couscou

crab meat

mango

unpastry shell

catalina dressing

pasta shell

italian salad dressing

mexican corn

decorating gel

italian bread

napa cabbage

onion powder

white wine vinegar

cocktail rye bread

basil sauce

crouton

brown gravy mix

barbeque sauce

apple cider vinegar

hoagie roll

milk chocolate candy kisse

�ounder

salt black pepper

maraschino cherry juice

chow mein noodle

tiger prawn

banana pepper

cranberry

vermicelli pasta

root beer

strawberry jam

lemon gelatin mix

creamed corn

pretzel

pie shell

sun�ower kernel

rump roast

romaine

vegetable stock

lemon pepper seasoning

guacamole

louisiana hot sauce

cabbage

yellow onion

super�ne sugar

orange peel

raspberry

cumin seedcandied mixed fruit peel

cream of coconut

bow tie pasta

creme fraiche

currant

pork chop

turkey gravy

fat free half and half

chicken ramen noodle

wooden skewer

whipping cream

mace

seasoning salt

mozzarella cheesepasta sauce

lean pork

broccoli �oweret

tomatillo

lemonade

tomato paste

caesar dressing

basil pesto

melon liqueur

coconut milk

whole wheat pastry �our

muenster cheese

lump crab meat

angel food cake

ring

cheese tortellini

spiral pasta

vanilla pudding

cauli�oweret

smoked sausage

hot dog

pita bread

cocoa powder

garbanzo bean

tart apple

wheat bran

hot pepper sauce

chili

refried bean

salmon steak

white cheddar cheese

low fat mayonnaise

grapefruit

dijon mustard

tomato juice

yellow squash

baking apple

cream of tartar

vodka

rye bread

white chip

�at iron steak

linguine pasta

fennel

whole wheat bread

baking mix

alfredo pasta sauce

margarine

confectioners' sugarfruit gelatin mix

pork

balsamic vinegar

pork loin chop

jicama

pre pizza crust

triple sec

teriyaki sauce

cola carbonated beverage

polish sausage

cracked black pepper

poblano chile pepper

individually wrapped caramel

roast beef

bread stu�ng mix

eggnog

pear

caramel

beet

worcestershire sauce

chicken stock

horseradish

semisweet chocolate chip

basil

red grape

plum

cinnamon sugar

fajita seasoning

rice noodle

powdered milk

star anise pod

short grain rice

ramen noodle

vegetable

coconut oil

whiskey

lime gelatin mix

peanut oil

ham

ginger root

lima bean

pimento stu�ed green olive

hoisin sauce

round steak

stu�ng

part skim ricotta cheese

broiler fryer chicken up

milk chocolate chip

turbinado sugar

vegetable shortening

tarragon vinegar

golden delicious apple

turkey

rigatoni pasta

stu�ng mix

milk

juiced

burgundy wine

red kidney bean

dill

candied pineapple

german chocolate cake mix

arborio rice

sugar free vanilla pudding mix

pine nut

green apple

cucumber oreganopearl onion

stu�ed green olive

whipped topping mix

broccoli

pinto bean

pasta

beef short rib

gelatin

garlic powder

rutabaga

chicken liver

pepperjack cheese

herb

lemon gras

sweet potato

pineapple ring

parsley �ake

pie �lling

spice cake mix

butterscotch chip

greek yogurt

vanilla ice cream

seafood seasoning

parsnip

applesauce

chinese �ve spice powder

salt pepper

beef broth

cherry tomato

sage

vanilla

vital wheat gluten

artichoke heart

mixed berry

bacon dripping

self rising �our

nilla wafer

navy bean

bacon

egg yolk

wonton wrapper

chocolate pudding mix

salsa

coconut

tomato based chili sauce

marsala wine

mussel

manicotti shell

anise extract

mustard seed

nutmeg

cayenne pepper

black bean pepperokra

asparagu

mustard powder

�rmly brown sugar

balsamic vinaigrette dressing

chicken breastoyster

ditalini pasta

old bay seasoning tm

brown rice

process american cheese

chocolate

miso paste

pineapple

iceberg lettuce

pearl barley

oat

greek seasoning

biscuit

clove

browning sauce

chicken bouillon powder

green pea

bread dough

cream cheesepeanut butter chip

silken tofu

pineapple chip

sea scallop

ricotta cheese

papaya

red cabbage

egg substitute

zesty italian dressing

devil's food cake mix

bagel

sour mix

lamb

irish stout beer

sea salt

romaine lettuce

kalamata olive

salt

monosodium glutamate

rice wine

white potato

rum extract

grape jelly

crescent roll dough

beer

phyllo dough

fettuccine pasta

chili seasoning mix

biscuit mix

candy coated chocolate

green cabbage

ranch bean

cream of celery soup

apple pie �lling

caper

nectarine

white mushroom

banana

orange gelatin mix

1% buttermilk

apple jelly

dinner roll

sugar pumpkin

salad green

shrimp

cheese ravioli

chicken wing

sour cream

saltine

cornmeal

mixed vegetable

beef tenderloin

sherry

rotini pasta

mexican cheese blend

kosher salt black pepper

mayonnaise

lobster

white onion

chocolate cookie

white bread

french baguette

bread

vanilla frosting

anise seed

ranch dressing mix

wild rice

hot

canadian bacon

corn�akes cereal

wax bean

cantaloupe

non fat yogurt

lite whipped topping

spaghetti squash

egg roll wrapper

solid pack pumpkin

recipe pastry

asafoetida powder

co�ee powder

italian sauce

amaretto liqueur

shortening

turmeric

semolina �our

pomegranate juice

corned beef

skewer

shallotspanish onion

tapioca

provolone cheese

chile sauce

vanilla bean

chile pepperangel hair pasta

pumpkin

tilapia

brie cheese

cottage cheese

banana liqueur

lemon

smoked salmon

ginger paste

brown mustard

peanut butter

escarole

sour milk

olive oil

country pork rib

pastry shell

adobo seasoning

candy coated milk chocolate

curryghee

alfredo sauce

yellow cake mix

granny smith apple

beef chuck

chocolate hazelnut spread

maple syrup

squid

gingersnap cooky

raspberry gelatin

molasse

lemon cake mix

�sh stock

cook

grenadine syrup

pu� pastry

rum

grapefruit juice

tahini

black pepperbutternut squash

key lime juice

sirloin steak

macaroni

butter shortening

brown lentil

chicken broth

chili bean

pickling spice

yellow food coloring

great northern bean

mixed nut

green chile

salmon

english mu�n

co�ee liqueur

non fat milk powder

buttermilk

distilled white vinegar

golden syrup

powdered fruit pectin

green chily

grape

raspberry gelatin mix

low fat sour cream

topping

pineapple juice

red lettuce

orange zest

ketchup

chunk chicken

steak seasoning

sandwich roll

crystallized ginger

kosher salt

roma tomato

red bean

red candied cherry

sesame seed

beef stock

cashew

popped popcorn

apricot nectar

any fruit jam

processed cheese food

red pepper

coleslaw mix

white cake mix

cherry pie �lling

canola oil

whole wheat �our

honey

long grain

marinara sauce

yellow summer squash

to�ee baking bit

whole milk

trout

onion separated

low fat cream cheese

corn oil

oat bran

cream of potato soup

allspice berry

mandarin orange

cumin

saltine cracker

swiss chard

fenugreek seed

�sh sauce

eggplant

baby corn

cider vinegar

orange sherbet

debearded

beef bouillon

kernel corn

vanilla vodka

chicken leg quarter

mintfeta cheese

lime juice

raspberry jam

cooking oil

white corn

herb stu�ng mix

lemon lime soda

pork sausage

ziti pasta

orange marmalade

yogurt

bean

ginger garlic paste

crescent dinner roll

scallop

walnut oil

smoked ham

red food coloring

triple sec liqueur

fat free evaporated milk

walnutbaking chocolate

blueberry

caramel ice cream topping

bacon grease

fat free italian dressing

steak

�g

miracle whip ‚Ñ

potato starch

luncheon meat

brandy based orange liqueur

smoked paprika

pu� pastry shell

raspberry preserve

apple butter

tomato sauce

white rice

beef stew meat

taco seasoning mix

date

whipped topping

marshmallow

co�ee

butterscotch schnapp

red wine vinegar

orange

chicken thigh

mild italian sausage

blueberry pie �lling

yeast

lime peel

rice �our

chocolate cake mix

barbecue sauce

monterey jack cheese

halibut

beef round steak

seed

sour cherry

pork sparerib

orange roughy

barley nugget cereal

leek

maraschino cherry

chickpea

fettuccini pasta

orange juice

blue cheese dressing

yam

garam masala

black eyed pea

penne pasta

serrano chile pepper

�ourchive

marjoram

herb stu�ng

beef sirloin

beef

maple extract

bamboo shoot

lemon extract

meat tenderizer

kielbasa sausage

low sodium chicken broth

asparagus

cod

italian seasoning

lime gelatin

vegetable bouillon

andouille sausage

collard green

blackberry

beef gravy

green grape

tamari

fruit

malt vinegar

strawberry gelatin

lemon gelatin

green olive

poultry seasoning

prune

beef consomme

chili powder

dressing

fennel seed

gruyere cheese

jellied cranberry sauce

chipotle pepper

vanilla extract

apricot

linguini pasta

cranberry sauce

port wine

process cheese

cornish game hen

cilantro

green chile pepper

wheat

bread machine yeast

tube pasta

biscuit baking mix

cream corn

spinach

low fat whipped topping

irish cream liqueur

candy

zucchini

mild cheddar cheese

orange gelatin cornstarch

cheese

snow pea

low fat margarine

green candied cherry

vermouth

brandy

white grape juice

corn bread mix

broccoli �oret

vidalia onion

cocktail sauce

pickled jalapeno pepper

beaten egg

hamburger bun

black walnut

dill pickle juice

dill pickle relish

habanero pepper

white chocolate chip

veal

powdered non dairy creamer

lasagna noodle

gingerapricot jam

imitation crab meat

chicken soup base

white bean

tarragon

onion soup mix

thousand island dressing

red lentil

pancake mix

wheat germ

fat free mayonnaise

yukon gold potato

long grain rice

carrot

cauli�ower �oret

vegetable cooking spray

craw�sh tail

peppermint extract

brussels sprout

onion salt buttermilk biscuit

white kidney bean

mango chutney

black olive

meatless spaghetti sauce

curry powder

coriander

red snapper

biscuit dough

sausage

cheddar cheese soup

lettuce

pork loin roast

lemon pepper

red curry paste

egg noodle

hot sauce

raspberry vinegar

butter cooking spray

peach schnapp

eggspicy pork sausage

mixed fruit

cat�sh

venison

yellow pepper

carbonated water

pumpkin seed

new potato

lemon juice

chocolate pudding

watermelon

chicken breast half

gorgonzola cheese

buttery round cracker

apple pie spice

process cheese sauce

jasmine rice

lemon pudding mix

cooking sherry

strawberry preserve

french bread

toothpick

sauce

corn tortilla chip

garlic paste

salt free seasoning blend

elbow macaronipickle

cream of chicken soup

cardamom pod

persimmon pulp

chicken

liquid smoke

cocoa

pound cake

bell pepper

food coloring

coconut extract

chocolate chip

berry cranberry sauce

red bell pepper

seashell pasta

american cheese

oatmeal

sourdough bread

cornbread

mixed salad green

arugula

oil

parmesan cheeseclam juice

brick cream cheesecereal

italian parsley

milk chocolate

rice wine vinegar

hot dog bun

pistachio pudding mix

curd cottage cheese

garlic salt

chocolate cookie crust

orange extract

cream of mushroom soup

sa�ron

mushroom

tortilla chip

white hominy

green beans snapped

dill pickle

french onion soup

skim milk

tequila

�ax seed

low fat cheddar cheese

red wine

nut

apple cider

candied cherry

cheddar cheese

gingerroot

chocolate frosting

low fat yogurt

peppercorn

pepperoni

artichoke

baby pea

crisp rice cereal

potato chip

coconut cream

angel food cake mix

onion �ake

salad shrimp

taco seasoning

champagne

peach

low fat

yellow cornmeal

pork roast

baby spinach

portobello mushroom cap

blue cheese

strawberry gelatin mix

pink lemonade

chestnut

strawberry

oyster sauce

sugar snap pea

ka�r lime

anchovy

stu�ed olive

herb bread stu�ng mix

half and half

serrano pepper

coconut rum

red apple

cherry

�ank steak

round

peppermint candy

butter bean

almond

white vinegarcelery seed

corn syrupfat free cream cheese

cannellini bean

clam

mustard

scallion

potato �ake

parsley

fat free yogurt

pita bread round

red pepper �ake

onion

bourbon whiskey

creme de menthe liqueur

golden raisin

pancetta bacon

apple juice

egg white

fontina cheese

kale

asiago cheese

spiced rum

farfalle pasta

lobster tail

mirin

leg of lamb

tomato

zested

sauerkraut

unpie crust

bourbon

lean beef

tuna steak

wild rice mix

raisin

chocolate syrup

juice

cajun seasoning

cauli�owerwaterlemon yogurt

tapioca �our

vanilla yogurt

pimiento

hazelnut liqueur

thyme

part skim mozzarella cheese

mandarin orange segment

cinnamon

corn tortilla

crispy rice cereal

colby monterey jack cheese

apricot preserve

chipotle chile powder

swiss cheese

white wine

baking powdergraham cracker crust

vanilla wafer

lime

sugar based curing mixture

cream cheese spread

celeryolive

simple syrup

asian sesame oil

bacon bit

sharp cheddar cheese

rice vinegar

sea salt black pepper

curry paste

beef chuck roast

butter extract

pork loin

ginger ale

chicken leg

adobo sauce

lime zest

ham hock

watercres

pastry

seasoning

lentil

mascarpone cheese

baker's semisweet chocolate

acorn squash

chunk chicken breast

pepperoni sausage

brown sugar

fusilli pasta

kaiser roll

red delicious apple

honey mustard

unbleached �our

vinegar

spicy brown mustard

chuck roast

candied citron

vegetable combination

beef �ank steak

red chile pepper

avocado

quinoa

cake �our

whole wheat tortilla

dill seed

turnip

vegetable broth

sugarsugar cookie mix neufchatel cheese

coriander seed

apple

vegetable soup mix

chocolate sandwich cooky

colby cheese

sourdough starter

green bean

pecansoftened butter matzo meal

hash brown potato

vanilla pudding mix

pickle relish

noodlered potato

white chocolate

pistachio nut

green food coloring

lemon zest

chutney

splenda

buttermilk baking mix

caraway seedmaple �avoring

taco sauce

chili oil

kiwi

lean turkey

garlic

golden mushroom soup

grit

chili sauce

rosemary

green salsa

corkscrew shaped pasta

marshmallow creme

enchilada sauce

baby carrot

savory

cinnamon red candy

corn mu�n mix

black peppercorn

green bell pepperwater chestnut

french dressing

almond extract

rose water

paprika

english cucumber

nutritional yeast

unpie shell

ears corn

cream of shrimp soup

plum tomato

bratwurst

green lettuce

lemon lime carbonated beverage

ice

creole seasoning

grape juice

italian sausage

pizza crust

orzo pasta

white rum

crescent roll

italian cheese blend

rhubarb

chicken bouillon

prosciutto

cream

red onion

marinated artichoke heart

jalapeno chile pepper

tater tot

pork tenderloin

spaghetti

gin

semisweet chocolate

pie crust

cooking spray

spaghetti sauce

bread �our

butterscotch pudding mix

romano cheese

bulgur

hungarian paprika

white balsamic vinegar

picante sauce

meatball

tuna

chili without bean

bean sprout

baking cocoa

chile paste

butter

yellow mustard

haddock

sun�ower seed

processed american cheese

russet potato

allspice

giblet

button mushroom

peanut

kidney bean

portobello mushroom

ranch dressing

almond paste

hazelnut

beef brisket

sake

fruit cocktail

beef sirloin steak

pimento

honeydew melon

low fat milk

salami

german chocolate

pizza sauce

green tomato

orange liqueur

celery salt

chocolate mix

cranberry juice

white pepper

barley

soy milk

sweet

poblano pepper

macadamia nut

goat cheese

tomato soup

tea bag

mixed spice

low fat peanut butter

turkey breast

lemon peel

tomato vegetable juice cocktail

jalapeno pepper

low sodium soy sauce

processed cheese

limeade

arti�cial sweetener

sesame oil

heavy cream

fat free chicken broth

pork shoulder

evaporated milk

corn�ake

bay scallop

chocolate waferwhite sugar

rapid rise yeast potato

�our tortilla

chicken drum

chocolate ice cream

pepper jack cheese

baking potato

italian dressing mix

Figure 2: Ingredient complement network. Two ingredients share an edge if they occur together more thanwould be expected by chance and if their pointwise mutual information exceeds a threshold.

1 2 3 4 5

rating

prop

ortio

n of

revi

ews

with

giv

en ra

ting

0.0

0.1

0.2

0.3

0.4

0.5

0.6

no modificationwith modification

Figure 3: The likelihood that a review suggests amodification to the recipe depends on the star ratingthe review is assigning to the recipe.

In the following, we describe the recipe modifications ex-tracted from user reviews, including adjustment, deletionand addition. We then present how we constructed an in-gredient substitute network based on the extracted informa-tion.

5.1 AdjustmentsSome modifications involve increasing or decreasing the

amount of an ingredient in the recipe. In this and the fol-lowing analyses, we split the review on punctuation suchas commas and periods. We used simple heuristics to de-tect when a review suggested a modification: adding/usingmore/less of an ingredient counted as an increase/decrease.Doubling or increasing counted as an increase, while reduc-ing, cutting, or decreasing counted as a decrease. While it islikely that there are other expressions signaling the adjust-ment of ingredient quantities, using this set of terms allowed

us to compare the relative rate of modification, as well asthe frequency of increase vs. decrease between ingredients.The ingredients themselves were extracted by performing amaximal character match within a window following an ad-justment term.

Figure 4 shows the ratios of the number of reviews sug-gesting modifications, either increases or decreases, to thenumber of recipes that contain the ingredient. Two patternsare immediately apparent. Ingredients that may be per-ceived as being unhealthy, such as fats and sugars, are, withthe exception of vegetable oil and margarine, more likelyto be modified, and to be decreased. On the other hand,flavor enhancers such as soy sauce, lemon juice, cinnamon,Worcestershire sauce, and toppings such as cheeses, baconand mushrooms, are also likely to be modified; however, theytend to be added in greater, rather than lesser quantities.Combined, the patterns suggest that good-tasting but “un-healthy” ingredients can be reduced, if desired, while spices,extracts, and toppings can be increased to taste.

5.2 Deletions and additionsRecipes are also frequently modified such that ingredients

are omitted entirely. We looked for words indicating thatthe reviewer did not have an ingredient (and hence did notuse it), e.g. “had no” and “didn’t have”. We further used“omit/left out/left off/bother with” as indication that thereviewer had omitted the ingredients, potentially for otherreasons. Because reviewers often used simplified terms, e.g.“vanilla” instead of “vanilla extract”, we compared words inproximity to the action words by constructing 4-character-grams and calculating the cosine similarity between the n-grams in the review and the list of ingredients for the recipe.

To identify additions, we simply looked for the word“add”,but omitted possible substitutions. For example, we woulduse “added cucumber”, but not “added cucumber instead ofgreen pepper”, the latter of which we analyze in the follow-ing section. We then compared the addition to the list ofingredients in the recipes, and considered the addition validonly if the ingredient does not already belong in the recipe.

Page 5: Recipe recommendation using ingredient networks · We then generated an ingredient list sorted by frequency of ingredient occurrence and selected the top 1000 common ingredient names

0.01 0.02 0.05 0.10 0.20 0.50 1.00

0.01

0.02

0.05

0.10

0.20

0.50

1.00

(# reviews adjusting down)/(# recipes)

(# re

view

s ad

just

ing

up)/(

# re

cipe

s)

salt

butter

egg

flour

white sugarwateronion

garlic

milk

vanilla extract

pepper olive oil

vegetable oil

brown sugar

black peppersugar

cinnamon

tomato

margarine

baking powderbaking soda

lemon juice

parsley

cs’. sugar

parmesan

celery

cream cheese

green bell pepper

carrot

walnut

cheddar

sour cream

garlic powderchicken breast

nutmegbasil

pecan

mushroom

mayonnaise

chicken broth

potato

soy sauce

oregano

cornstarch

shortening

honeychocolate chipbacon

worcestershire s.

Figure 4: Suggested modifications of quantity forthe 50 most common ingredients, derived fromrecipe reviews. The line denotes equal numbers ofsuggested quantity increases and decreases.

Table 1 shows the correlation between ingredient modifi-cations. As might be expected, the more frequently an in-gredient occurs in a recipe, the more times its quantity hasthe opportunity to be modified, as is evident in the strongcorrelation between the the number of recipes the ingredientoccurs in and both increases and decreases recommended inreviews. However, the more common an ingredient, the morestable it appears to be. Recipe frequency is negatively cor-related with deletions/recipe (ρ = −0.22), additions/recipe(ρ = −0.25), and increases/recipe (ρ = −0.26). For exam-ple, salt is so essential, appearing in over 21,000 recipes, thatwe detected only 18 reviews where it was explicitly dropped.In contrast, Worcheshire sauce, appearing in 1,542 recipes,is dropped explicitly in 148 reviews.

As might also be expected, additions are positively corre-lated with increases, and deletions with decreases. However,additions and deletions are very weakly negatively corre-lated, indicating that an ingredient that is added frequentlyis not necessarily omitted more frequently as well.

Table 1: Correlations between ingredient modifica-tions

addition deletion increase decrease# recipes 0.41 0.22 0.61 0.68addition -0.15 0.79 0.11deletion 0.09 0.58increase 0.39

5.3 Ingredient substitute networkReplacement relationships show whether one ingredient

is preferable to another. The preference could be basedon taste, availability, or price. Some ingredient substitu-tion tables can be found online1, but are neither extensivenor contain information about relative frequencies of each

1e.g., http://allrecipes.com/HowTo/common-ingredient-substitutions/detail.aspx

Figure 5: Ingredient substitute network. Nodes aresized according to the number of times they havebeen recommended as a substitute for another in-gredient, and colored according to their indegree.

substitution. Thus, we found an alternative source for ex-tracting replacement relationships – users’ comments, e.g.“I replaced the butter in the frosting by sour cream, just tosoothe my conscience about all the fatty calories”.

To extract such knowledge, we first parsed the reviewsas follows: we considered several phrases to signal replace-ment relationships: “replace a with b”, “substitute b for a”,“b instead of a”, etc, and matched a and b to our list ofingredients.

We constructed an ingredient substitute network to cap-ture users’ knowledge about ingredient replacement. Thisweighted, directed network consists of ingredients as nodes.We thresholded and eliminated any suggested substitutionsthat occurred fewer than 5 times. We then determined theweight of each edge by p(b|a), the proportion of substitu-tions of ingredient a that suggest ingredient b. For example,68% of substitutions for white sugar were to splenda, anartificial sweetener, and hence the assigned weight for thesugar → splenda edge is 0.68.

The resulting substitution network, shown in Figure 5,exhibits strong clustering. We examined this structure byapplying the map generator tool by Rosvall et al. [13], whichuses a random walk approach to identify clusters in weighted,directed networks. The resulting clusters, and their relation-ships to one another, are shown in Fig. 6. The derived clus-ters could be used when following a relatively new recipewhich may not receive many reviews, and therefore manysuggestions for ingredient substitutions. If one does not haveall ingredients at hand, one could examine the content ofone’s fridge and pantry and match it with other ingredientsfound in the same cluster as the ingredient called for bythe recipe. Table 2 lists the contents of a few such sampleingredient clusters, and Fig. 7 shows two example clustersextracted from the substitute network.

Page 6: Recipe recommendation using ingredient networks · We then generated an ingredient list sorted by frequency of ingredient occurrence and selected the top 1000 common ingredient names

Table 2: Clusters of ingredients that can be substi-tuted for one another. A maximum of 5 additionalingredients for each cluster are listed, ordered byPageRank.

main other ingredients

chicken turkey, beef, sausage, chicken breast, baconolive oil butter, apple sauce, oil, banana, margarine

sweet yam, potato, pumpkin, butternut squash,potato parsnipbaking baking soda, cream of tartarpowderalmond pecan, walnut, cashew, peanut, sunflower s.

apple peach, pineapple, pear, mango, pie fillingegg egg white, egg substitute, egg yolk

tilapia cod, catfish, flounder, halibut, orange roughyspinach mushroom, broccoli, kale, carrot, zucchiniitalian basil, cilantro, oregano, parsley, dill

seasoningcabbage coleslaw mix, sauerkraut, bok choy

napa cabbage

Finally, we examine whether the substitution network en-codes preferences for one ingredient over another, as evi-denced by the relative ratings of similar recipes, one whichcontains an original ingredient, and another which imple-ments a substitution. To test this hypothesis, we constructa “preference network”, where one ingredient is preferred toanother in terms of received ratings, and is constructed bycreating an edge (a, b) between a pair of ingredients, where aand b are listed in two recipes X and Y respectively, if reciperatings RX > RY . For example, if recipe X includes beef,ketchup and cheese, and recipe Y contains beef and pick-les, then this recipe pair contributes to two edges: one frompickles to ketchup, and the other from pickles to cheese. Theaggregate edge weights are defined based on PMI. BecausePMI is a symmetric quantity (PMI(a; b) = PMI(b; a)), weintroduce a directed PMI measure to cope with the direc-tionality of the preference network:

PMI(a→ b) = logp(a→ b)

p(a)p(b),

where

p(a→ b) =# of recipe pairs from a to b

# of recipe pairs,

and p(a), p(b) are defined as in the previous section.We find high correlation between this preference network

and the substitution network (ρ = 0.72, p < 0.001). This ob-servation suggests that the substitute network encodes users’ingredient preference, which we use in the recipe predictiontask described in the next section.

6. RECIPE RECOMMENDATIONWe use the above insights to uncover novel recommen-

dation algorithms suitable for recipe recommendations. Weuse ingredients and the relationships encoded between themin ingredient networks as our main feature sets to predictrecipe ratings, and compare them against features encod-ing nutrition information, as well as other baseline featuressuch as cooking methods, and preparation and cook time.

chicken,..

tilapia,..

italian seasoning,..seasoning,..

onion,..

garlic,..chicken broth,..

milk,..

sour cream,..

honey,..

olive oil,..

spinach,..

bread,..apple,..

sweet potato,..

cinnamon,..

black bean,..

flour,..

tomato,..

sauce,..

lemon juice,..

pepper,..brown rice,..

white wine,..

strawberry,..

spaghetti sauce,..

almond extract,..vanilla,..

cheese,..

almond,..

chocolate chip,..

baking powder,..

cream of mushroom soup,..

egg,..

cranberry,..pie crust,..

cabbage,..

celery,..

champagne,..

coconut milk,..

corn chip,..

sea scallop,..

apple juice,..

hoagie roll,..

iceberg lettuce,..

cottage cheese,..

golden syrup,.. black olive,..

pickle,..

red potato,..

quinoa,..

graham cracker,..

lemon cake mix,..

imitation crab meat,..

peach schnapp,..

hot,..

vegetable shortening,..pumpkin seed,..

lemonade,..

curry powder,..

dijon mustard,..

sugar snap pea,..

smoked paprika,..

Figure 6: Ingredient substitution clusters. Nodesrepresent clusters and edges indicate the presence ofrecommended substitutions that span clusters. Eachcluster represents a set of related ingredients whichare frequently substituted for one another.

milkheavy whipping cream

whole milk

skim milk

whipping cream

heavy cream

buttermilk

soy milk

half and half

evaporated milk

cream

cinnamon

ginger

pumpkin pie spicecardamom

nutmeg allspice

ginger root

clove

mace

(a) milk substitutes (b) cinammon substitutes

Figure 7: Relationships between ingredients locatedwithin two of the clusters from Fig. 6.

Then we apply a discriminative machine learning method,stochastic gradient boosting trees [6], to predict recipe rat-ings.

In the experiments, we seek to answer the following threequestions. (1) Can we predict users’ preference for a newrecipe given the information present in the recipe? (2) Whatare the key aspects that determine users’ preference? (3)Does the structure of ingredient networks help in recipe rec-ommendation, and how?

6.1 Recipe Pair PredictionThe goal of our prediction task is: given a pair of similar

recipes, determine which one has higher average rating thanthe other. This task is designed particularly to help userswith a specific dish or meal in mind, and who are trying todecide between several recipe options for that dish.

Recipe pair data. The data for this prediction taskconsists of pairs of similar recipes. The reason for select-ing similar recipes, with high ingredient overlap, is thatwhile apples may be quite comparable to oranges in thecontext of recipes, especially if one is evaluating salads ordesserts, lasagna may not be comparable to a mixed drink.To derive pairs of related recipes, we computed similarity

Page 7: Recipe recommendation using ingredient networks · We then generated an ingredient list sorted by frequency of ingredient occurrence and selected the top 1000 common ingredient names

with a cosine similarity between the ingredient lists for thetwo recipes, weighted by the inverse document frequency,log(# of recipes/# of recipes containing the ingredient).We considered only those pairs of recipes whose cosine sim-ilarity exceeded 0.2. The weighting is intended to identifyhigher similarity among recipes sharing more distinguishingingredients, such as Brussels sprouts, as opposed to recipessharing very common ones, such as butter.

A further challenge to obtaining reliable relative rankingsof recipes is variance introduced by having different userschoose to rate different recipes. In addition, some usersmight not have a sufficient number of reviews under theirbelt to have calibrated their own rating scheme. To con-trol for variation introduced by users, we examined recipepairs where the same users are rating both recipes and arecollectively expressing a preference for one recipe over an-other. Specifically, we generated 62,031 recipe pairs (a, b)where ratingi(a) > ratingi(b), for at least 10 users i, andover 50% of users who rated both recipe a and recipe b. Fur-thermore, each user i should be an active enough reviewerto have rated at least 8 other recipes.

Features. In the prediction dataset, each observationconsists of a set of predictor variables or features that rep-resent information about two recipes, and the response vari-able is a binary indicator of which gets the higher rating onaverage. To study the key aspects of recipe information, weconstructed different set of features, including:

• Baseline: This includes cooking methods, such as chop-ping, marinating, or grilling, and cooking effort de-scriptors, such as preparation time in minutes, as wellas the number of servings produced, etc. These fea-tures are considered as primary information about arecipe and will be included in all other feature setsdescribed below.

• Full ingredients: We selected up to 1000 popular ingre-dients to build a “full ingredient list”. In this featureset, each observed recipe pair contains a vector withentries indicating whether an ingredient from the fulllist is present in either recipe in the pair.

• Nutrition: This feature set does not include any in-gredients but only nutrition information such the totalcaloric content, as well as quantities of fats, carbohy-drates, etc.

• Ingredient networks: In this set, we replaced the fullingredient list by structural information extracted fromdifferent ingredient networks, as described in Sections 4and 5.3. Co-occurrence is treated separately as a rawcount, and a complementarity, captured by the PMI.

• Combined set: Finally, a combined feature set is con-structed to test the performance of a combination offeatures, including baseline, nutrition and ingredientnetworks.

To build the ingredient network feature set, we extractedthe following two types of structural information from theco-occurrence and substitution networks, as well as the com-plement network derived from the co-occurrence informa-tion:Network positions are calculated to represent how a recipe’s

ingredients occupy positions within the networks. Such po-sition measures are likely to inform if a recipe contains any“popular” or “unusual” ingredients. To calculate the posi-tion measures, we first calculated various network centrality

baseline

full ingredients

nutrition

ing. networks

combined

Accuracy

0.60

0.65

0.70

0.75

0.80

Figure 8: Prediction performance. The nutritioninformation and ingredient networks are more effec-tive features than full ingredients. The ingredientnetwork features lead to impressive performance,close to the best performance.

measures, including degree centrality, betweenness central-ity, etc., from the ingredient networks. A centrality measurecan be represented as a vector ~g where each entry indicatesthe centrality of an ingredient. The network position of arecipe, with its full ingredient list represented as a binary

vector ~f , can be summarized by ~gT · ~f , i.e., an aggregatedcentrality measure based on the centrality of its ingredients.

Network communities provide information about whichingredient is more likely to co-occur with a group of otheringredients in the network. A recipe consisting of ingredientsthat are frequently used with, complemented by or substi-tuted by certain groups may be predictive of the ratingsthe recipe will receive. To obtain the network communityinformation, we applied latent semantic analysis (LSA) onrecipes. We first factorized each ingredient network, rep-resented by matrix W , using singular value decomposition(SVD). In the matrix W , each entry Wij indicates whetheringredient i co-occurrs, complements or substitues ingredi-ent j.

Suppose Wk = UkΣkVTk is a rank-k approximation of W ,

we can then transform each recipe’s full ingredient list using

the low-dimensional representation, Σ−1k V T

k~f , as community

information within a network. These low-dimensional vec-tors, together with the vectors of network positions, consti-tute the ingredient network features.

Learning method. We applied discriminative machinelearning methods such as support vector machines (SVM) [2]and stochastic gradient boosting trees [5] to our predictionproblem. Here we report and discuss the detailed resultsbased on the gradient boosting tree model. Like SVM, thegradient boosting tree model seeks a parameterized classi-fier, but unlike SVM that considers all the features at onetime, the boosting tree model considers a set of featuresat a time and iteratively combines them according to theirempirical errors. In practice, it not only has competitiveperformance comparable to SVM, but can serve as a featureranking procedure [11].

In this work, we fitted a stochastic gradient boosting treemodel with 8 terminal nodes under an exponential loss func-tion. The dataset is roughly balanced in terms of whichrecipe is the higher-rated one within a pair. We randomly

Page 8: Recipe recommendation using ingredient networks · We then generated an ingredient list sorted by frequency of ingredient occurrence and selected the top 1000 common ingredient names

feature

impo

rtan

ce

0.0

0.2

0.4

0.6

0.8

1.0

20 40 60 80 100

group

nutrition (6.5%)

cook effort (5.0%)

ing. networks (84%)

cook methods (3.9%)

Figure 9: Relative importance of features in thecombined set. The individual items from nutri-tion information are very indicative in differentiat-ing highly rated recipes, while most of the predictionpower comes from ingredient networks.

feature

impo

rtan

ce

0.0

0.1

0.2

0.3

0.4

0.5

0.6

0.7

20 40 60 80 100

network

substitution (39.8%)

co−occurrence (30.9%)

complement (29.2%)

Figure 10: Relative importance of features repre-senting the network structure. The substitution net-work has the strongest contribution (39.8%) to thetotal importance of network features, and it also hasmore influential features in the top 100 list, whichsuggests that the substitution network is comple-mentary to other features.

divided the dataset into a training set (2/3) and a testingset (1/3). The prediction performance is evaluated based onaccuracy, and the feature performance is evaluated in termsof relative importance [8]. For each single decision tree, oneof the input variables, xj , is used to partition the region as-sociated with that node into two subregions in order to fitto the response values. The squared relative importance ofvariable xj is the sum of such squared improvements overall internal nodes for which it was chosen as the splittingvariable, as:

imp(j) =∑k

i2kI(splits on xj)

where i2k is the empirical improvement by the k-th nodesplitting on xj at that point.

6.2 ResultsThe overall prediction performance is shown in Fig. 8.

Surprisingly, even with a full list of ingredients, the pre-diction accuracy is only improved from .712 (baseline) to

feature

impo

rtan

ce

0.0

0.2

0.4

0.6

0.8

1.0

2 4 6 8 10 12

nutrition

carbs (20.9%)

cholesterol (17.7%)

calories (19.7%)

sodium (16.8%)

fiber (12.3%)

fat (12.4%)

Figure 11: Relative importance of features fromnutrition information. The carbs item is the mostinfluential feature in predicting higher-rated recipes.

.746. In contrast, the nutrition information and ingredientnetworks are more effective (with accuracy .753 and .786, re-spectively). Both of them have much lower dimensions (fromtens to several hundreds), compared with the full ingredientsthat are represented by more than 2000 dimensions (1000ingredients per recipe in the pair). The ingredient networkfeatures lead to impressive performance, close to the bestperformance given by the combined set (.792), indicatingthe power of network structures in recipe recommendation.

Figure 9 shows the influence of different features in thecombined feature set. Up to 100 features with the highestrelative importance are shown. The importance of a featuregroup is summarized by how much the total importance iscontributed by all features in the set. For example, thebaseline consisting of cooking effort and cooking methodscontribute 8.9% to the overall performance. The individualitems from nutrition information are very indicative in differ-entiating highly-rated recipes, while most of the predictionpower comes from ingredient networks (84%).

Figure 10 shows the top 100 features from the three net-works. In terms of the total importance of ingredient net-work features, the substitution network has slightly strongercontribution (39.8%) than the other two networks, and italso has more influential features in the top 100 list. Thissuggests that the structural information extracted from thesubstitution network is not only important but also comple-mentary to information from other aspects.

Looking into the nutrition information (Fig. 11), we foundthat carbohydrates are the most influential feature in pre-dicting higher-rated recipes. Since carbohydrates comprisearound 50% or more of total calories, the high importanceof this feature interestingly suggests that a recipe’s ratingcan be influenced by users’ concerns about nutrition anddiet. Another interesting observation is that, while individ-ual nutrition items are powerful predictors, a higher predic-tion accuracy can be reached by using ingredient networksalone, as shown in Fig. 8. This implies the informationabout nutrition may have been encoded in the ingredientnetwork structure, e.g. substitutions of less healthful ingre-dients with “healthier” alternatives.

Constructing the ingredient network feature involves re-ducing high-dimensional network information through SVD,as described in the previous section. The dimensionality canbe determined by cross-validation. As shown in Fig. 12, fea-tures with a very large dimension tend to overfit the training

Page 9: Recipe recommendation using ingredient networks · We then generated an ingredient list sorted by frequency of ingredient occurrence and selected the top 1000 common ingredient names

Dimensions

Acc

urac

y

0.76

0.77

0.78

0.79

0.80

●●

●●

10 20 30 40 50 60 70

network

● combined

substitution

complement

co−occurrence

Figure 12: Prediction performance over reduceddimensionality. The best performance is given byreduced dimension k = 50 when combining all threenetworks. In addition, using the information aboutthe complement network alone is more effective inprediction than using other two networks.

4

8

43

19

6

chicken breastporkitalian sausagesausagechickenturkeycoconut extractwalnutlim

e juicelem

on extractchocolate puddingalm

ond extractcream

of chicken soupbeefalm

ondkalevanillavanilla extractevaporated m

ilksour creambutterm

ilkchicken brothhalf and halfm

ilkbrow

n sugarbutterhoneyapplesauceolive oilsplenda

−0.5Value

Color Key

4 8 43 19 6

chicken breastporkitalian sausagesausagechickenturkeycoconut extractwalnutlime juicelemon extractchocolate puddingalmond extractcream of chicken soupbeefalmondkalevanillavanilla extractevaporated milksour creambuttermilkchicken brothhalf and halfmilkbrown sugarbutterhoneyapplesauceolive oilsplenda

−0.5 0.5Value

Color Key

ingredient

svd dimension

1

2

3

4

5

Figure 13: Influential substitution communities.The matrix shows the most influential feature di-mensions extracted from the substitution network.For each dimension, the six representative ingredi-ents with the highest intensity values are shown,with colors indicating their intensity. These featuressuggest that the communities of ingredient substi-tutes, such as the sweet and oil in the first dimen-sion, are particularly informative in prediction.

data. Hence we chose k = 50 for the reduced dimension ofall three networks. The figure also shows that using theinformation about the complement network alone is moreeffective in prediction than using either the co-occurrenceand substitute networks, even in the case of low dimen-sions. Consistently, as shown in terms of relative importance(Fig. 10), the substitution network alone is not the most ef-fective, but it provides more complementary information inthe combined feature set.

In Figure 13 we show the most representative ingredientsin the decomposed matrix derived from the substitution net-work. We display the top five influential dimensions, eval-uated based on the relative importance, from the SVD re-sultant matrix Vk, and in each of these dimensions we ex-tracted six representative ingredients based on their inten-sities in the dimension (the squared entry values). Theserepresentative ingredients suggest that the communities ofingredient substitutes, such as the sweet and oil substitutesin the first dimension or the milk substitutes in the seconddimesion (which is similar to the cluster shown in Fig. 6),are particularly informative in predicting recipe ratings.

To summarize our observations, we find we are able toeffectively predict users’ preference for a recipe, but the pre-diction is not through using a full list of ingredients. Instead,by using the structural information extracted from the re-lationships among ingredients, we can better uncover users’preference about recipes.

7. CONCLUSIONRecipes are little more than instructions for combining

and processing sets of ingredients. Individual cookbooks,even the most expansive ones, contain single recipes for eachdish. The web, however, permits collaborative recipe gen-eration and modification, with tens of thousands of recipescontributed in individual websites. We have shown how thisdata can be used to glean insights about regional preferencesand modifiability of individual ingredients, and also how itcan be used to construct two kinds of networks, one of in-gredient complements, the other of ingredient substitutes.These networks encode which ingredients go well together,and which can be substituted to obtain superior results, andpermit one to predict, given a pair of related recipes, whichone will be more highly rated by users.

In future work, we plan to extend ingredient networks toincorporate the cooking methods as well. It would also beof interest to generate region-specific and diet-specific rat-ings, depending on the users’ background and preferences.A whole host of user-interface features could be added forusers who are interacting with recipes, whether the recipeis newly submitted, and hence unrated, or whether they arebrowsing a cookbook. In addition to automatically predict-ing a rating for the recipe, one could flag ingredients thatcan be omitted, ones whose quantity could be tweaked, aswell as suggested additions and substitutions.

8. ACKNOWLEDGMENTSThis work was supported by MURI award FA9550-08-1-

0265 from the Air Force Office of Scientific Research. Themethodology used in this paper was developed with sup-port from funding from the Army Research Office, Multi-University Research Initiative on Measuring, Understand-ing, and Responding to Covert Social Networks: Passive andActive Tomography. The authors gratefully acknowledge D.Lazer for support.

9. REFERENCES[1] Ahn, Y., Ahnert, S., Bagrow, J., and Barabasi, A.

Flavor network and the principles of food pairing.Bulletin of the American Physical Society 56 (2011).

[2] Cortes, C., and Vapnik, V. Support-vector networks.Machine learning 20, 3 (1995), 273–297.

Page 10: Recipe recommendation using ingredient networks · We then generated an ingredient list sorted by frequency of ingredient occurrence and selected the top 1000 common ingredient names

[3] Forbes, P., and Zhu, M. Content-boosted matrixfactorization for recommender systems: Experimentswith recipe recommendation. Proceedings ofRecommender Systems (2011).

[4] Freyne, J., and Berkovsky, S. Intelligent foodplanning: personalized recipe recommendation. In IUI,ACM (2010), 321–324.

[5] Friedman, J. Stochastic gradient boosting.Computational Statistics & Data Analysis 38, 4(2002), 367–378.

[6] Friedman, J., Hastie, T., and Tibshirani, R. Additivelogistic regression: a statistical view of boosting.Annals of Statistics 28 (1998), 2000.

[7] Geleijnse, G., Nachtigall, P., van Kaam, P., andWijgergangs, L. A personalized recipe advice systemto promote healthful choices. In IUI, ACM (2011),437–438.

[8] Hastie, T., Tibshirani, R., Friedman, J., and Franklin,J. The elements of statistical learning: data mining,inference and prediction. The MathematicalIntelligencer 27, 2 (2005).

[9] Kamieth, F., Braun, A., and Schlehuber, C. Adaptiveimplicit interaction for healthy nutrition and foodintake supervision. Human-Computer Interaction.Towards Mobile and Intelligent InteractionEnvironments (2011), 205–212.

[10] Kinouchi, O., Diez-Garcia, R., Holanda, A.,Zambianchi, P., and Roque, A. The non-equilibriumnature of culinary evolution. New Journal of Physics10 (2008), 073020.

[11] Lu, Y., Peng, F., Li, X., and Ahmed, N. Couplingfeature selection and machine learning methods fornavigational query identification. In CIKM, ACM(2006), 682–689.

[12] Rombauer, I., Becker, M., Becker, E., and Maestro, L.Joy of cooking. Scribner Book Company, 1997.

[13] Rosvall, M., and Bergstrom, C. Maps of random walkson complex networks reveal community structure.PNAS 105, 4 (2008), 1118.

[14] Shidochi, Y., Takahashi, T., Ide, I., and Murase, H.Finding replaceable materials in cooking recipe textsconsidering characteristic cooking actions. In Proc. ofthe ACM multimedia 2009 workshop on Multimediafor cooking and eating activities, ACM (2009), 9–14.

[15] Svensson, M., Hook, K., and Coster, R. Designing andevaluating kalas: A social navigation system for foodrecipes. ACM Transactions on Computer-HumanInteraction (TOCHI) 12, 3 (2005), 374–400.

[16] Ueda, M., Takahata, M., and Nakajima, S. User’s foodpreference extraction for personalized cooking reciperecommendation. Proc. of the Second Workshop onSemantic Personalized Information Management:Retrieval and Recommendation (2011).

[17] Wang, L., Li, Q., Li, N., Dong, G., and Yang, Y.Substructure similarity measurement in chineserecipes. In WWW, ACM (2008), 979–988.

[18] Wikipedia. Outline of food preparation, 2011. [Online;accessed 22-Oct-2011].

[19] Zhang, Q., Hu, R., Mac Namee, B., and Delany, S.Back to the future: Knowledge light case base cookery.In Proc. of The 9th European Conference onCase-Based Reasoning Workshop (2008), 15.