Authors: (1) Ankur Nath, Department of Computer Science and Engineering, Texas A&M University; (2) Alan Kuhnle, Department of Computer Science and Engineering, Texas A&M University. Table of Links Abstract & Introduction Related work Evaluation for Max-Cut Evaluation for SAT Summary and Outlook, References Supplementary Materials SUPPLEMENTARY MATERIALS Reproducibility We use the publicly available implementation of ECO-DQN[2] and GNNSAT[3] as our code base. To ensure a fair comparison, we employ the pretrained models from the original papers. However, for S2V-DQN applied to ER graphs with |V | = 60, we conduct training from scratch due to the inability to load the pretrained model with the hyperparameters provided in the original paper. We provide all our experimental results, code, and data at this link[4]. For training skewed graphs for the Max-Cut problem, we adopt the hyperparameters outlined in the original paper and modify the dataset while keeping other settings intact. Experimental Setup We run all experiments on two NVIDIA RTX A4000 (16GB memory) GPUs with Intel Xeon(R) w5-2445 CPUs at 3.10 GHz and 64 GB of memory Additional Tables and Plots In this section, you can find the tables and plots omitted in the main paper due to space constraints. Intra-episode Behavior Figure 4 and 5 present an analysis of the intra-episode behavior of ECO-DQN and SoftTabu agents across a diverse range of distributions. Generalisation on Small Instances Understanding the generalization of agents is crucial, as it determines their practical utility in real world applications where the data distribution may vary or evolve over time. Table 4 and 5 provide detailed analysis of how well these agents can adapt and perform in scenarios that go beyond their training data. Table 4: Generalisation of agents trained on ER graphs of size |V | = 40 to unseen graph sizes and structures. Table 5: Generalisation of agents trained on BA graphs of size |V | = 40 to unseen graph sizes and structures. Generalisation on Hard Instances and Real World Datasets Table 6 presents a performance analysis of agents that have been trained on skew graphs, specifically those with |V | = 200. The primary focus is on how these agents perform when faced with established benchmark tasks, with the best-performing results being emphasized and displayed in bold for clarity. Table 6: Average performance of agents trained on skew graphs of size |V | = 200 on known benchmarks (best in bold). Distributions of Actions on Hard Instances Evalution of SAT In Table 7, we compare the performance of learned heuristics alongside the WalkSAT algorithm. For WalkSAT, we have set the value of the parameter p to 0.5, following the experimental setup used in GNNSAT. Additionally, we have fine-tuned the algorithm to determine the optimal p value. Table 7: Performance of the learned heuristics and WalkSAT. For WalkSAT, we set the value of p = 0.5 following the experimental setup of GNNSAT and also tune the algorithm for the optimal value. In each cell, there are three metrics (from top to bottom): the ratio of the average number of steps, the median number of steps, and the percentage solved. *Values as reported in GNNSAT for reference. [2] Code available at: https://github.com/tomdbar/eco-dqn [3] Code available at: https://github.com/emreyolcu/sat [4] Code available at: https://tinyurl.com/52ykxtaj This paper is available on arxiv under CC 4.0 DEED license. Authors: (1) Ankur Nath, Department of Computer Science and Engineering, Texas A&M University; (2) Alan Kuhnle, Department of Computer Science and Engineering, Texas A&M University. Authors: Authors: (1) Ankur Nath, Department of Computer Science and Engineering, Texas A&M University; (2) Alan Kuhnle, Department of Computer Science and Engineering, Texas A&M University. Table of Links Abstract & Introduction Abstract & Introduction Related work Related work Evaluation for Max-Cut Evaluation for Max-Cut Evaluation for SAT Evaluation for SAT Summary and Outlook, References Summary and Outlook, References Supplementary Materials Supplementary Materials SUPPLEMENTARY MATERIALS Reproducibility We use the publicly available implementation of ECO-DQN[2] and GNNSAT[3] as our code base. To ensure a fair comparison, we employ the pretrained models from the original papers. However, for S2V-DQN applied to ER graphs with |V | = 60, we conduct training from scratch due to the inability to load the pretrained model with the hyperparameters provided in the original paper. We provide all our experimental results, code, and data at this link[4]. For training skewed graphs for the Max-Cut problem, we adopt the hyperparameters outlined in the original paper and modify the dataset while keeping other settings intact. Experimental Setup We run all experiments on two NVIDIA RTX A4000 (16GB memory) GPUs with Intel Xeon(R) w5-2445 CPUs at 3.10 GHz and 64 GB of memory Experimental Setup Additional Tables and Plots In this section, you can find the tables and plots omitted in the main paper due to space constraints. Intra-episode Behavior Figure 4 and 5 present an analysis of the intra-episode behavior of ECO-DQN and SoftTabu agents across a diverse range of distributions. Generalisation on Small Instances Understanding the generalization of agents is crucial, as it determines their practical utility in real world applications where the data distribution may vary or evolve over time. Table 4 and 5 provide detailed analysis of how well these agents can adapt and perform in scenarios that go beyond their training data. Table 4: Generalisation of agents trained on ER graphs of size |V | = 40 to unseen graph sizes and structures. Table 5: Generalisation of agents trained on BA graphs of size |V | = 40 to unseen graph sizes and structures. Generalisation on Hard Instances and Real World Datasets Table 6 presents a performance analysis of agents that have been trained on skew graphs, specifically those with |V | = 200. The primary focus is on how these agents perform when faced with established benchmark tasks, with the best-performing results being emphasized and displayed in bold for clarity. Table 6: Average performance of agents trained on skew graphs of size |V | = 200 on known benchmarks (best in bold). Distributions of Actions on Hard Instances Evalution of SAT In Table 7, we compare the performance of learned heuristics alongside the WalkSAT algorithm. For WalkSAT, we have set the value of the parameter p to 0.5, following the experimental setup used in GNNSAT. Additionally, we have fine-tuned the algorithm to determine the optimal p value. Table 7: Performance of the learned heuristics and WalkSAT. For WalkSAT, we set the value of p = 0.5 following the experimental setup of GNNSAT and also tune the algorithm for the optimal value. In each cell, there are three metrics (from top to bottom): the ratio of the average number of steps, the median number of steps, and the percentage solved. *Values as reported in GNNSAT for reference. [2] Code available at: https://github.com/tomdbar/eco-dqn [3] Code available at: https://github.com/emreyolcu/sat [4] Code available at: https://tinyurl.com/52ykxtaj This paper is available on arxiv under CC 4.0 DEED license. This paper is available on arxiv under CC 4.0 DEED license. available on arxiv