Non-Linear Programming: A Multiple Data Set Approach

Introduction

We have all been there. You are at the end of the school semester, and the only thing left to do is take that dreaded final exam. Like me, you would calculate the minimum exam score needed to receive a particular grade. For example, you might be able to score a 70% on your History exam and still get a B in the class. This is valuable information since you might decide to spend more time studying for other exams if you think a 70% is easily achievable.

In this article, we will solve this very straightforward optimization problem programmatically using R’s ROI package. In the first example, we will find the minimum grade needed on the final exam to receive a B in our History class.

In the second example, we will find the minimum final exam grade needed to receive a B (80%) for all four classes: History, Science, Math, & English. This article aims to show you how to solve similarly structured optimization problems with various data sets all in one function.

Linear v Non-Linear Solvers

Although the problems presented in this article are technically linear optimization problems, we will solve them using customized functions and non-linear solvers. The benefit of this approach is that it offers the most flexibility to optimization problems one faces in the wild.

Linear solver solutions are relatively easy to set up and understand. Linear solvers also find the optimal global solution without the risk of finding only locally optimal solutions. However, non-linear problem setups and solutions allow for conditionals (if-then statements) that are not allowed in linear solver solutions.

Conditional statements are prevalent in many business-type optimization problems. Therefore, the solutions presented below concentrate on setting up customized functions and non-linear solver solutions that offer the most flexibility.

R Versus Excel

Excel’s solver works great on single data sets and would work great to solve the minimum final exam grade needed to achieve a B in History. In fact, you could hypothetically set up 4 different solvers on 4 independent tabs for all four courses and solve each of them independently reasonably quickly as well.

However, consider these questions.
  • How efficient is Excel if we had to solve for 100 different data sets?
  • How do we update our tabs in Excel if our data changes? For example, what if another assignment is graded?
  • Why does Excel return a local solution that is 20% less optimal compared to R?
Excel might get bogged down and become very inefficient if you had to run 100 solvers independently on 100 tabs - even if you could program it using VBA. It is also hard to have new data flow into 100 different tabs efficiently. Lastly, Excel’s non-linear solutions are often 10-20% less optimal than more mature solvers.

However, there are some benefits of using Excel alongside R. For starters, the initial setup of a single solver can be performed much quicker in Excel. In addition, it is much easier to understand the problem visually in Excel. For example, you can color cells, highlight summaries, add labels, and much more.

To summarize, the initial setup for optimization problems should be performed in Excel. However, if you have new data continuously flowing in or are solving for multiple data sets, you should transition your solver into R.

ROI Package versus OMPR Package

There is an ompr package in R that allows for tidy linear programming and is simple to use. However, at this time, the ompr package does not support non-linear functions.

Although the ROI package has a steeper learning curve, you can quickly set up non-linear functions once you get comfortable with it. The ROI package also provides a great deal of flexibility for most business-related optimization problems.

As stated earlier, the problems we are solving in the article are technically linear, and we run the risk of possibly not finding the global optimal. Nevertheless, as mentioned earlier, we will solve the customized functions with non-linear methods to highlight the benefits of this approach.


Gratitude & References

If you want more detailed information about the topics discussed in this article, please check out Matt Dancho’s Learning Labs Pro: 15 & 16. If you are not a Learning Lab Pro member and love R or data science, you should definitely subscribe. As a disclaimer, I have no affiliation with BSU. I am just an avid student and a huge fan.


Load Libraries

Before we begin the analysis, let’s first load in a few libraries - including the ROI.plugin.alabama package.

# Solver Backend
library(ROI)
library(ROI.plugin.alabama)

# Core
library(tidyverse)


Part 1 | Section 1: Calculate Minimum Final Exam Score for History Class

Remember, our goal is to find the lowest possible final exam score that will allow us to get a B. To date, our overall class grade sits at 90% (45 points out of 50). The weight of the final exam is 50%. Therefore, we can score as low as a 70% on the final exam and still achieve a B in our overall class grade. See a summary of the math below.

(90% assignment total score x 0.50 wt) + (70% final exam score x 0.50 wt) = 80% overall grade ~ B

The current grades and summary stats are below. We will use this raw data to set up our functions later on.
  • Assignment #1: 12 pnts out of 15 ~ 80%
  • Assignment #2: 20 pnts out of 20 ~ 100%
  • Assignment #3: 13 pnts out of 15 ~ 87%


Part 1 | Section 2: Set Up History Scores in a Tibble

We will set up a data tibble for our current History course scores for the first 3 assignments. Remember, the total points earned was a total of 45 out of 50 points.

history_current_grade_tbl <- tibble(
  Subject       = rep(c("History"), times = c(3)),
  Assignment    = c("Assign_01", "Assign_02", "Assign_03"),
  Pnts_Earned   = c(12.00, 20.00, 13.00),
  Pnts_Possible = c(15.00, 20.00, 15.00))

history_current_grade_tbl







We also know that the final exam is worth 50 points. We will set up a variable below and use it later in our customized function.

Pnts_Possible_Final_Exam <- 50


Part 1 | Section 3: Set Up Function to Calculate Points Needed

Next, we will set up a customized function that will allow us to pass through an anonymous value for the points earned on the Final Exam. Once a value is plugged into the function, it will calculate the final grade. Remember, the final exam score needed to achieve our goal is “technically unknown” to this function.  With that said, we will test it out manually before plugging it into the non-linear solver.

calc_Final_Exam_Points <- function(Pnts_Earned_Final_Exam) {
  history_current_grade_tbl %>%
    
    # Sum up current points earned and sum up all current possible points
    summarise(Pnts_Earned_sum   = sum(Pnts_Earned),
              Pnts_Possible_sum = sum(Pnts_Possible)) %>%
    
    # Add Unknown Final Exam Pnts Earned to Current Pnts Earned
    mutate(Total_Pnts_Earned = Pnts_Earned_Final_Exam + Pnts_Earned_sum) %>%
    
    # Add Final Exam Possible Possible Pnts to Current Possible Pnts
    mutate(Total_Pnts_Possible = Pnts_Possible_Final_Exam + Pnts_Possible_sum) %>%
    
    # Calculate Total Grade for Class
    mutate(Final_Class_Grade = Total_Pnts_Earned/Total_Pnts_Possible) %>%
    
    # Pull out Final Class Grade
    pull(Final_Class_Grade)
}

When we plug in 35 final exam points earned into the function below, the final grade equals 80%.
In other words, we would need a 70% on the final (35 points earned / 50 points possible) to get an overall score of 80%. Therefore, we know the function is working correctly.

calc_Final_Exam_Points(35)



Part 1 | Section 4: Set Up a Single Data Set Optimization Objective


We still have not technically solved this problem using a solver solution. At this point, we have only created a function that allows us to calculate our scores manually with a customized function.

Therefore, let’s assume we didn’t know the answer and we needed to solve it programmatically. The number of outputs we will receive from the solver function equals one (1), so we will set our n variable = 1. We also know we want to achieve an 80% overall grade, so we can also set that variable up.

n <- 1
final_grade <- 0.80

Next, we will create a function that is described in detail below.
  • A single data set will be allowed to pass through the function. In this example, the data set is the history_current_grade_tbl which includes all of our History course scores to date.
  • The objective is to calculate the Final Exam points needed to achieve a B (80% overall grade).
  • The only constraint in the solver set is that the final grade must be equal to 80%.
  • Since this is a minimization problem, we set maximum = FALSE
  • Lastly, we can then solve the problem using the ROI_solve function and the alabama solver to return the results in a well-formatted tibble using dplyr functions.

single_data_set_optimization <- function(data = data) {

  # Set up solver (think of Excel's Solver Add-in)
  final_exam_grade_nlp <- OP(
    objective   = F_objective(F = calc_Final_Exam_Points, n = n, names = "Pnts_Earned_Final_Exam"),
    constraints = rbind(
      F_constraint(F = calc_Final_Exam_Points, dir = "==", rhs = final_grade)
    ),
    maximum = FALSE
  )
  
  # Solve for optimal solution
  sol <- ROI_solve(final_exam_grade_nlp, solver = "alabama", start = rep(1/n, n))
  
  # Return the optimal solution results in a tidy tibble format
  return(
    bind_cols(
      tibble(Final_Grade_Perc  = calc_Final_Exam_Points(sol$solution)),
      enframe(sol$solution) %>% spread(key = name, value = value),
      tibble(Pnts_Possible_Final_Exam = Pnts_Possible_Final_Exam)) %>%
      
      mutate(Final_Exam_Score_Needed = Pnts_Earned_Final_Exam/Pnts_Possible_Final_Exam)
    
  )
}

Now we can pass the grade tibble into the function and view the results.

history_grade_summary <- single_data_set_optimization(history_current_grade_tbl)

glimpse(history_grade_summary)







The formatted tibble returns the appropriate results we expected when calculating the solution manually. To receive a minimum overall grade of 80%, we need to score a 70% on the final exam. The setup of these functions will now allow us to programmatically figure out what we need to achieve on all of our final exams to get Bs in all of our classes.


Part 2 | Section 1: Calculate Minimum Final Exam Score for All Four Classes

In Part 2, we will now explore calculating the minimum final exam scores needed to get Bs in all four courses: History, Science, Math & English. A summary of our current grades to date is listed below.

To keep the examples simple, we assume only been 3 assignments have been given in each of the 4 classes, and all four final exams are worth 50 points.
  • Current History Grade: 90% (45.0 pnts out of 50)
  • Current Science Grade: 87% (43.5 pnts out of 50)
  • Current Math Grade: 76% (38.0 pnts out of 50)
  • Current English Grade: 68% (34.0 pnts out of 50)

Part 2 | Section 2: Set Up All Subjects’ Scores in a Tibble

We will set up a data tibble for all current scores for the first 3 assignments for all 4 subjects.

all_subjects_current_grade_tbl <- tibble(
  Subject       = rep(c("History", "Science", "Math", "English"), times = c(3, 3, 3, 3)),
  Assignment    = rep(c("Assign_01", "Assign_02", "Assign_03"), times = c(4)),
  Pnts_Earned   = c(12, 20, 13, 12, 20, 11.5, 10, 18, 10, 10, 14, 10),
  Pnts_Possible = c(15, 20, 15, 15, 20, 15, 15, 20, 15, 15, 20, 15))

all_subjects_current_grade_tbl















Part 2 | Section 3: Set Up Nested Tibble

One incredible functionality in R is the ability to nest() data sets. Nesting gives you the ability to apply functions across multiple data sets independently. In other words, think of it as a loop that allows you to perform functions on one set of data, print some results, and repeat these steps on other data sets. To account for the various independent data sets, we set up a nested tibble which includes the same information in the tibble above.

all_subjects_current_grade_nested_tbl <- all_subjects_current_grade_tbl %>%
  group_by(Subject) %>%
  nest()

all_subjects_current_grade_nested_tbl









We will apply the optimization function we created in Part 1 to the nested data that will allow us to calculate an optimal score for all four final exams independently.

For more information about nested functions and mapping, please check out R for Data Science.


Part 2 | Section 4: Set Up a Multiple Data Set Optimization Objective

As we did earlier, we will set n = 1 since we only receive one (1) output score for each subject. We also know we want to achieve an 80% overall grade for all subjects, so we can also set that variable up.

n <- 1
final_grade <- 0.80

The only difference in the code below is that we wrap another function around our original calc_Final_Exam_Points() function to allow mapping across multiple data sets.

all_subjects_nested_model <- function(nested_tbl) {
  
  calc_Final_Exam_Points <- function(Pnts_Earned_Final_Exam) {
    
    nested_tbl %>%
      
      # Sum up current points earned and sum up all current possible points
      summarise(Pnts_Earned_sum = sum(Pnts_Earned),
                Pnts_Possible_sum = sum(Pnts_Possible)) %>%
      
      # Add Unknown Final Exam Pnts Earned to Current Pnts Earned
      mutate(Total_Pnts_Earned = Pnts_Earned_Final_Exam + Pnts_Earned_sum) %>%
      
      # Add Final Exam Possible Possible Pnts to Current Possible Pnts
      mutate(Total_Pnts_Possible = Pnts_Possible_Final_Exam + Pnts_Possible_sum) %>%
      
      # Calculate Total Grade for Class
      mutate(Final_Class_Grade = Total_Pnts_Earned/Total_Pnts_Possible) %>% 
      
      # Pull out Final Class Grade
      pull(Final_Class_Grade)
  }
  
  # Set up solver (think of Excel's Solver Add-in)
  final_exam_grade_nlp <- OP(
    objective   = F_objective(F = calc_Final_Exam_Points, n = n, names = "Pnts_Earned_Final_Exam"),
    constraints = rbind(
      F_constraint(F = calc_Final_Exam_Points, dir = "==", rhs = final_grade)
    ),
    maximum = FALSE
  )
  
  # Solve for optimal solution
  sol <- ROI_solve(final_exam_grade_nlp, solver = "alabama", start = rep(1/n, n))
  
  # Return the optimal solution results in a tidy tibble format
  return(
    bind_cols(
      tibble(Final_Grade_Perc  = calc_Final_Exam_Points(sol$solution)),
      enframe(sol$solution) %>% spread(key = name, value = value),
      tibble(Pnts_Possible_Final_Exam = Pnts_Possible_Final_Exam)) %>%
      
      mutate(Final_Exam_Score_Needed = Pnts_Earned_Final_Exam/Pnts_Possible_Final_Exam)
  )
}

Now, we have to map the all_subjects_nested_model() function to the nested data set and unnest() the results. The final results are listed below in a concise tidy format.

receive_all_Bs_summary_tbl <- all_subjects_current_grade_nested_tbl %>% 
  mutate(model = map(data, all_subjects_nested_model)) %>%
  select(Subject, model) %>%
  unnest()

Let’s take a look at the final results for all 4 classes.

glimpse(receive_all_Bs_summary_tbl)









In summary, we would need to get the following minimum final exam scores to get all Bs on our report card for all four subjects.
  • History Final Exam Score ~ 70%
  • Science Final Exam Score ~ 73%
  • Math Final Exam Score ~ 84%
  • English Final Exam Score ~ 92%

Part 2 | Section 5: Calculate Score Needed to Get All As

Since we took the time and effort to set up dynamic functions and variables, we can see if it is feasible to get a 90% or an A in every course by changing the final_grade variable to 0.90. And then, all we need to do is re-run the same function and print the results.

final_grade <- 0.90

receive_all_As_summary_tbl <- all_subjects_current_grade_nested_tbl %>% 
  mutate(model = map(data, all_subjects_nested_model)) %>%
  select(Subject, model) %>%
  unnest()

glimpse(receive_all_As_summary_tbl)








We would need to get the following minimum final exam scores to get all As on our report card for all four subjects.
  • History Final Exam Score ~ 90%
  • Science Final Exam Score ~ 93%
  • Math Final Exam Score ~ 104%
  • English Final Exam Score ~ 112%
Unfortunately, unless extra credit was given on the Math and English exams, we could not get an A in either class since we would have to score above 100% for each final exam. However, As in History and Science are achievable if we scored a 90% and a 93% on those finals, respectively.

Summary and Final Thoughts

The examples presented are not especially difficult to solve manually or by using Excel’s solver. Both problems presented are technically linear and very simple in nature. Linear problems are often easier to understand compared to non-linear ones.

However, I believe in the non-linear and functional setup approach since non-linear problems are more likely to be presented in the wild. Non-linear problems allow conditions (if-then statements) to be set up in the functions that add another layer of flexibility to the optimization solutions.

Although the functions and the setup may seem like a lot to absorb, I truly believe the power comes from passing multiple data sets through one function multiple times to get various solutions. I hope you found some value in this article.

Popular posts from this blog

MySQL Part 1: Getting MySQL Set Up in goormIDE

Do Popular Market Index Returns Follow a Normal Distribution?