
Exploring Recipe Complexity Trends

Do people tend to make more complex recipes as time goes on?

Predicting-Recipe-Trends

source: food.com

by Rio Aguina-Kang (raguinakang@ucsd.edu) and Judel Ancayan (jancayan@ucsd.edu)


Framing the Problem

In this project, our objective was to explore the factors influencing recipe complexity. We constructed two regression models: a baseline linear regressor and a final multilayer perceptron (MLP) regressor.

The response variable in both models is the recipe complexity, which we defined as the number of steps required to complete the recipe. We selected the number of steps as a measure of complexity because simpler recipes tend to have fewer steps (such as making a sandwich), while more complex recipes involve multiple intricate steps (like making a pizza). Our hypothesis was that certain features would correlate with the number of steps in a recipe, and we aimed to investigate these relationships.

The features used to predict the number of steps were:

- 'minutes': how long the recipe takes to make
- 'calories': the number of calories in the recipe
- 'year': the year the recipe was released
- 'n_ingredients': the number of ingredients in the recipe

We believed these features could provide insight into recipe complexity, and the models were trained to predict the number of steps from them.

To evaluate the models’ performance, we used the coefficient of determination (R², the score reported by scikit-learn regressors) as our metric. We preferred it over the root mean square error (RMSE) because accurately predicting the exact number of steps is difficult given the discrete nature of the target; R² instead measures the proportion of variance in the number of steps that a model explains, giving a score of at most 1 (and typically between 0 and 1) that is easy to compare across models.
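For reference, this is the quantity that scikit-learn regressors report from their `.score` method; a toy illustration (the true values are the step counts from the data preview below, the predictions are made up):

```python
import numpy as np
from sklearn.metrics import r2_score

# Toy example: actual step counts (from the data preview below) versus
# made-up predictions from some hypothetical model.
y_true = np.array([11, 6, 7, 11, 8])
y_pred = np.array([9.5, 7.0, 7.5, 10.0, 8.5])

# Coefficient of determination: 1 - SS_res / SS_tot. A score of 1 means
# perfect predictions; 0 means no better than always predicting the mean.
print(r2_score(y_true, y_pred))

# Every scikit-learn regressor reports the same quantity from .score(),
# e.g. model.score(X_test, y_test).
```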

Prior to training the models, the data underwent a thorough cleaning process (see this project’s data cleaning section), and only the columns relevant to this analysis were retained.

Both the baseline linear regressor and the final MLP regressor were fitted to the cleaned dataset, and their predictions were evaluated using R², allowing us to assess how well the chosen features explain recipe complexity.

Please note that the complete details of the data cleaning process and the R² analysis can be found in the referenced project documentation.

The dataframe below represents the first 5 rows of the dataset we used:

print(unique_recipe.head().to_markdown(index=False))
| minutes | calories | n_ingredients | year | n_steps | average rating |
|--------:|---------:|--------------:|-----:|--------:|---------------:|
|      50 |    386.1 |             7 | 2008 |      11 |              3 |
|      55 |    377.1 |             8 | 2008 |       6 |              3 |
|      45 |    326.6 |             9 | 2008 |       7 |              3 |
|      45 |    577.7 |             9 | 2008 |      11 |              5 |
|      25 |    386.9 |             9 | 2008 |       8 |              5 |

Baseline Model

To create a baseline model that we could compare other models against, we developed a linear regression model to predict the number of steps (n_steps) for recipes using two features: the year the recipe was released and the number of calories. Our model incorporates various components from the scikit-learn library to create a pipeline that performs preprocessing steps and fits a linear regression model.

In terms of feature representation, both features are numerical and quantitative. Time is usually treated as a continuous variable, but because we use only the year a recipe was released (rather than days, minutes, or seconds), we treat ‘year released’ as a discrete numerical variable. ‘Calories’, on the other hand, is continuous: it can take any value within a range, with infinitely many possible values between any two points. Additionally, because calorie counts vary widely depending on the kind of food a recipe makes, we standardized calories to keep the feature scales consistent.
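A minimal sketch of such a pipeline, assuming the cleaned dataframe `unique_recipe` shown above (the 80/20 split is stated in the text; the random seed is an illustrative choice):

```python
from sklearn.compose import ColumnTransformer
from sklearn.linear_model import LinearRegression
from sklearn.model_selection import train_test_split
from sklearn.pipeline import Pipeline
from sklearn.preprocessing import StandardScaler

# Standardize calories (its scale varies widely between recipes);
# pass the year column through unchanged.
preprocess = ColumnTransformer(
    transformers=[("scale_calories", StandardScaler(), ["calories"])],
    remainder="passthrough",
)

baseline = Pipeline([
    ("preprocess", preprocess),
    ("regress", LinearRegression()),
])

X = unique_recipe[["calories", "year"]]
y = unique_recipe["n_steps"]
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.2, random_state=42
)

baseline.fit(X_train, y_train)
print(baseline.score(X_train, y_train))  # R^2 on the training set
print(baseline.score(X_test, y_test))    # R^2 on the held-out 20%
```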

Moving on to the performance of our model, the scores were:

- Train R²: 0.027
- Test R²: 0.020

(Note: the testing set represents 20% of the data.)

Ultimately, we consider these scores very low. Although the training and testing scores are close in value, we do not consider this baseline model a satisfactory (“good”) fit for predicting the number of steps in a recipe. This suggests that the model captures little of the variability in the target variable (n_steps) from the given features.


Final Model

The final model we developed aimed to improve our prediction of the number of steps (n_steps) for recipes.

Feature Selection

In addition to the two features used in the baseline model (year and calories), the final model incorporates ‘minutes’ and ‘n_ingredients’, along with feature engineering steps to preprocess the data. The full feature set is therefore ‘minutes’, ‘calories’, ‘year’, and ‘n_ingredients’. We believe these features are beneficial for the prediction task because recipes that take longer to prepare or call for more ingredients plausibly require more steps, while calories and release year capture the scale of the dish and possible trends over time.

Modeling Algorithm and Hyperparameter Tuning

For our final model, we chose the MLPRegressor algorithm, a neural network-based regressor capable of capturing complex relationships in the data. The initial pipeline configuration included a hidden layer size of 20 neurons. We then conducted hyperparameter tuning using grid search to find the best combination of hyperparameters that maximizes the model’s performance.
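A sketch of that starting configuration (the scaling step and the solver settings are illustrative; the `regress` step name matches the hyperparameter prefixes below):

```python
from sklearn.neural_network import MLPRegressor
from sklearn.pipeline import Pipeline
from sklearn.preprocessing import StandardScaler

# Starting point: scale the numerical features, then fit a small MLP with a
# single hidden layer of 20 neurons. The 'regress' step name matches the
# 'regress__' prefixes in the hyperparameter grid below.
mlp_pipeline = Pipeline([
    ("scale", StandardScaler()),
    ("regress", MLPRegressor(hidden_layer_sizes=(20,), max_iter=1000, random_state=42)),
])
```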

The hyperparameters explored during grid search included ‘hidden_layer_sizes’, ‘activation’, ‘solver’, ‘alpha’, ‘learning_rate’, and ‘early_stopping’.

hyperparams = {
    'regress__hidden_layer_sizes': [(10,), (20,)],
    'regress__activation': ['relu', 'tanh'],
    'regress__solver': ['lbfgs', 'adam'],
    'regress__alpha': [0.0001, 0.001],
    'regress__learning_rate': ['constant', 'adaptive'],
    'regress__early_stopping': [True, False]
}

By searching over different combinations of these hyperparameters, we aimed to identify the optimal configuration that produces the best predictive performance.
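A sketch of how that search might be wired up, assuming the `mlp_pipeline` and `hyperparams` defined above; the five-fold cross-validation and the random seed are illustrative choices:

```python
from sklearn.model_selection import GridSearchCV, train_test_split

X = unique_recipe[["minutes", "calories", "year", "n_ingredients"]]
y = unique_recipe["n_steps"]
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.2, random_state=42
)

# Exhaustive search over the grid above, scored by mean squared error and
# refit on the full training set with the winning combination.
grid_search = GridSearchCV(
    mlp_pipeline,
    hyperparams,
    scoring="neg_mean_squared_error",
    cv=5,
    n_jobs=-1,
)
grid_search.fit(X_train, y_train)

print(grid_search.best_params_)
```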

Grid Search Results and Best Hyperparameters

The grid search was conducted, and the best hyperparameters were determined based on the mean squared error scores. The best hyperparameters found were as follows:

These hyperparameters represent the optimal configuration identified through grid search, aiming to minimize the model’s error and enhance its predictive capabilities.

The final model’s performance is an improvement over the baseline model, as it incorporates additional features and a more sophisticated modeling algorithm. The baseline model’s performance was relatively low, with scores of 0.027 for the train set and 0.020 for the test set. The final model, with the optimized hyperparameters identified through GridSearchCV, aims to enhance the predictive capabilities by leveraging the added features and adjusting the neural network’s architecture. The selected hyperparameters represent the best combination that maximizes the model’s performance based on the chosen scoring metric.

Final Model Performance

Using the best hyperparameters obtained from the grid search, we rebuilt the pipeline. It consists of the feature engineering steps and the MLPRegressor configured with the identified hyperparameters.
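Continuing the sketch above, the refit pipeline and its scores can be pulled directly from the fitted grid search:

```python
# GridSearchCV refits the winning combination on the full training set,
# so the final pipeline can be pulled straight from the fitted search.
final_model = grid_search.best_estimator_

print(final_model.score(X_train, y_train))  # R^2 on the training set
print(final_model.score(X_test, y_test))    # R^2 on the held-out 20%
```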

The final model was trained on the training dataset (testing set represents 20% of the data) and evaluated on both the training and testing datasets. The performance of the final model is as follows:

The train score represents the model’s performance on the training dataset, while the test score indicates the model’s performance on the unseen testing dataset. These scores reflect the accuracy of the model in predicting the number of steps (n_steps) for recipes based on the given features.

Model Performance Analysis

Comparing the final model’s performance to the baseline model, we observe an enormous improvement in predictive capability. The baseline model had much lower scores (Train Score: 0.027, Test Score: 0.020) than the final model. This indicates that the final model captures more of the variability in the target variable (n_steps) based on the selected features.

The enhancements in the final model’s performance can be attributed to the following factors:

- the addition of ‘minutes’ and ‘n_ingredients’ as features, giving the model more information about each recipe;
- the switch from a linear regressor to an MLPRegressor, which can capture more complex relationships in the data; and
- hyperparameter tuning via grid search, which selected the best-performing network configuration.

By incorporating these improvements, the final model demonstrates better accuracy in predicting the number of steps for recipes based on the given features.

Conclusion

In conclusion, we developed a final model using an MLPRegressor algorithm with optimized hyperparameters obtained through grid search. The model incorporated feature engineering steps to preprocess the data and improve its performance. The final model outperformed the baseline model in terms of accuracy, showcasing its enhanced predictive capabilities.

Although our final model exhibited substantial improvement over the baseline, its accuracy leaves room for other algorithms or features that could yield even higher scores. In terms of future directions, we aim to explore incorporating additional features such as tags and ingredients. Due to limitations in computational resources, we were unable to encode and analyze these features in the current model, but we acknowledge that incorporating them may provide valuable insights and improve predictive performance.
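For illustration, one way those list-valued columns could eventually be encoded (purely hypothetical: it assumes a raw dataframe with a `tags` column of string lists, which is not part of the cleaned data used here):

```python
import pandas as pd
from sklearn.preprocessing import MultiLabelBinarizer

# Hypothetical raw data with list-valued tags; not part of the cleaned dataset.
recipes = pd.DataFrame({
    "tags": [["60-minutes-or-less", "main-dish"], ["desserts", "easy"]],
})

# One binary column per distinct tag. With thousands of distinct tags this
# matrix becomes very wide, which is the computational cost noted above.
mlb = MultiLabelBinarizer()
tag_matrix = pd.DataFrame(mlb.fit_transform(recipes["tags"]), columns=mlb.classes_)
print(tag_matrix)
```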


Fairness Analysis

To analyze the fairness of our final model, we performed a permutation test on our data with 1,000 trials at a significance level of 0.05, guided by the following hypotheses:

- Null hypothesis: our model is fair; its R² scores for recipes with high average ratings and recipes with low average ratings are roughly the same, and any difference is due to random chance.
- Alternative hypothesis: our model is unfair; its R² scores for recipes with high average ratings differ from its scores for recipes with low average ratings.

To perform this permutation test, we split the data into two groups: recipes with a high rating (defined as an average rating greater than 3.5) and recipes with a low rating (an average rating of 3.5 or lower). We then scored the final model on each group using R² (we used R² rather than RMSE for the same reason stated under Framing the Problem) and took the difference between the two scores. The resulting value is our observed statistic.

Each permuted test statistic is calculated in the same manner:

R² of recipes with a high average rating − R² of recipes with a low average rating
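A sketch of how this permutation test might be implemented, assuming the fitted `final_model` and feature list from the earlier sketches and the `average rating` column shown in the data preview (the seed and the two-sided comparison are illustrative choices):

```python
import numpy as np

features = ["minutes", "calories", "year", "n_ingredients"]

def score_difference(df, model):
    """R^2 on high-rated recipes minus R^2 on low-rated recipes."""
    high = df[df["average rating"] > 3.5]
    low = df[df["average rating"] <= 3.5]
    return (model.score(high[features], high["n_steps"])
            - model.score(low[features], low["n_steps"]))

observed = score_difference(unique_recipe, final_model)

rng = np.random.default_rng(0)
diffs = []
for _ in range(1000):
    shuffled = unique_recipe.copy()
    # Shuffle the ratings so recipes land in the two groups at random.
    shuffled["average rating"] = rng.permutation(
        shuffled["average rating"].to_numpy()
    )
    diffs.append(score_difference(shuffled, final_model))

# p-value: the fraction of shuffled differences at least as extreme as observed.
p_value = np.mean(np.abs(diffs) >= abs(observed))
print(observed, p_value)
```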

The test statistics found by this permutation test are graphed below, with the red line representing the observed test statistic:

The p-value was calculated to be 0.101, so we fail to reject the null hypothesis at a significance level of 0.05.

Conclusion

Since the permutation test failed to reject the null hypothesis, it is plausible (although not definitive) that our model is fair: we found no evidence that it performs differently on highly rated and poorly rated recipes.