Authors:

(1) Haolong Li, Tongji University, with work done during an internship at ByteDance (furlongli322@gmail.com);

(2) Yu Ma, Seed Foundation, ByteDance (mayu.1231@bytedance.com);

(3) Yinqi Zhang, East China Normal University, with work done during an internship at ByteDance (zhang.inch@gmail.com);

(4) Chen Ye (Corresponding Author), ESSC Lab, Tongji University (yechen@tongji.edu.cn);

(5) Jie Chen, Seed Foundation, ByteDance, and Project Leader (chenjiexjtu@gmail.com).

Table of Links

Abstract and 1 Introduction
2 Problem Definition
2.1 Arithmetic Puzzle Problem
2.2 Data Synthesis
2.3 Datasets
3 Model
4 Experiments
4.1 Evaluation
4.2 Results
4.3 Case Studies
5 Conclusion
6 Limitations
7 Ethics Statement and References
A Appendix
A.1 Hyperparameter Settings
A.2 Evaluation of the Base Model
A.3 Case Study
A.4 Puzzle Visualization

Abstract

Large Language Models (LLMs) have shown excellent performance in language understanding, text generation, code synthesis, and many other tasks, while they still struggle with complex multi-step reasoning problems such as mathematical reasoning. In this paper, through a newly proposed arithmetic puzzle problem, we show that a model can perform well on multi-step reasoning tasks via fine-tuning on high-quality synthetic data. Specifically, experimental results with open-llama-3B on in-domain and out-of-domain test sets support this claim.

1 Introduction

Large Language Models (LLMs), as zero-shot and multi-task learners, have shown extraordinary capabilities on a wide range of natural language tasks (Vaswani et al., 2017; Schulman et al., 2017; Radford et al., 2019; Ziegler et al., 2019; Brown et al., 2020; Kojima et al., 2022; Park et al., 2023; Chowdhery et al., 2023; Rafailov et al., 2022; Chen et al., 2022). However, even the largest models struggle when faced with complex multi-step reasoning problems, such as mathematical and scientific reasoning (Koncel-Kedziorski et al., 2016; Cobbe et al., 2021; Hendrycks et al., 2021). GPT-4 (Achiam et al., 2023), LLaMA (Touvron et al., 2023a,b), Gemini (Team et al., 2023), Minerva (Lewkowycz et al., 2022), Llemma (Azerbayev et al., 2023), Mistral (Jiang et al., 2023), WizardMath (Luo et al., 2023), MAmmoTH (Yue et al., 2023), ToRA (Gou et al., 2023) and Deepseek (Bi et al., 2024; Guo et al., 2024; Lu et al., 2024) are among the models developed to push this multi-step reasoning capability forward.

In this paper, we approach these challenges by proposing a novel arithmetic puzzle problem. Specifically, the proposed puzzle requires multi-step reasoning to produce a correct solution. Furthermore, a data synthesis pipeline is designed to automatically generate a large amount of high-quality data for supervised fine-tuning (SFT), and an LLM based on open-llama-3B (Touvron et al., 2023a) is fine-tuned on this synthetic dataset.
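This excerpt does not pin down the puzzle's exact formulation, so the following is a minimal sketch under an assumed 24-game-style form: combine the given integers with basic operations to reach a target, recording the step-by-step trace so each solved instance can serve as a multi-step SFT sample. It only searches left-to-right operator chains (no parenthesization trees, a deliberate simplification), and all names and the sample format are illustrative rather than the authors' pipeline.

```python
# Hypothetical sketch of an arithmetic-puzzle instance and a brute-force
# solver that doubles as a synthetic-data generator for SFT.
from fractions import Fraction
from itertools import permutations, product

OPS = {"+": lambda a, b: a + b,
       "-": lambda a, b: a - b,
       "*": lambda a, b: a * b,
       "/": lambda a, b: a / b if b != 0 else None}

def solve(numbers, target):
    """Return a list of textual reasoning steps reaching `target`, or None."""
    target = Fraction(target)
    for perm in permutations(Fraction(n) for n in numbers):
        for ops in product(OPS, repeat=len(perm) - 1):
            acc, steps = perm[0], []
            for op, nxt in zip(ops, perm[1:]):
                res = OPS[op](acc, nxt)
                if res is None:  # skip division by zero
                    break
                steps.append(f"{acc} {op} {nxt} = {res}")
                acc = res
            else:
                if acc == target:
                    return steps
    return None

# One synthetic SFT sample: a prompt plus its multi-step solution trace.
steps = solve([3, 7, 2], 20)   # e.g. 3 + 7 = 10, 10 * 2 = 20
if steps is not None:
    print({"prompt": "Use 3, 7, 2 to reach 20.", "response": "\n".join(steps)})
```

Because the solver emits an explicit chain of intermediate results, every generated sample carries the multi-step structure the fine-tuned model is meant to learn.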
In addition, two out-of-domain benchmarks are designed, in the form of extending the numerical range and composing the components of the arithmetic puzzle problem. For a fair assessment, we evaluate our models with greedy sampling in the zero-shot setting, together with a well-designed verifier. Our data-scaling experiments show that as the amount of synthetic data grows, in-domain zero-shot pass@1 improves from 0.22 to 0.44, and out-of-domain zero-shot pass@1 improves from 0.14/0.17 to 0.33/0.35.

Our main contributions are as follows: (1) We propose a novel arithmetic puzzle problem, together with a data synthesis pipeline and out-of-domain benchmarks, to study the multi-step reasoning and extrapolation capabilities of LLMs fine-tuned on synthetic data. (2) Experiments show that scaling up the amount of high-quality synthetic data leads to performance gains on both the in-domain and out-of-domain test sets. (3) A comprehensive case study is conducted.

This paper is available on arxiv under the CC BY-NC-SA 4.0 (Attribution-NonCommercial-ShareAlike 4.0 International) license.
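For concreteness, the sketch below gives one possible reading of the evaluation protocol described in this section: greedy (deterministic) decoding yields a single sample per problem, so zero-shot pass@1 reduces to the fraction of problems whose completion the rule-based verifier accepts. The "a op b = c" line format, the `prompt` and `target` fields, and the `generate` callable are assumptions for illustration; a real verifier would additionally check that steps chain together and use exactly the given numbers.

```python
# Hedged sketch of zero-shot pass@1 with a rule-based verifier.
from fractions import Fraction

def verify(problem, completion):
    """Replay each 'a op b = c' step emitted by the model and require every
    intermediate result, and the final value, to be arithmetically correct."""
    ops = {"+": lambda a, b: a + b, "-": lambda a, b: a - b,
           "*": lambda a, b: a * b, "/": lambda a, b: a / b}
    value = None
    try:
        for line in completion.strip().splitlines():
            a, op, b, _, c = line.split()
            value = ops[op](Fraction(a), Fraction(b))
            if value != Fraction(c):
                return False  # reject any incorrect intermediate step
    except (KeyError, ValueError, ZeroDivisionError):
        return False  # malformed line, unknown operator, or division by zero
    return value == Fraction(problem["target"])

def pass_at_1(problems, generate):
    """Zero-shot pass@1 under greedy decoding: decoding is deterministic,
    so one sample per problem suffices and pass@1 is the verified-solve rate."""
    solved = sum(verify(p, generate(p["prompt"])) for p in problems)
    return solved / len(problems)
```

Under this reading, the reported in-domain improvement from 0.22 to 0.44 means the share of puzzles whose single greedy completion passes the verifier doubles as the synthetic training set grows.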