References

[1] Nir Shavit. Data structures in the multicore age. Communications of the ACM, 54(3):76--84, March 2011. [ bib | DOI | http ]
The advent of multicore processors as the standard computing platform will force major changes in software design.
[2] Carson Gross, Dillon Shaffer, and Matt Revelle. Hypermedia controls: Feral to formal. In Proceedings of the 35th ACM Conference on Hypertext and Social Media, Ht '24, pages 52--64, New York, NY, USA, 2024. Association for Computing Machinery. [ bib | DOI | http ]
A defining characteristic of hypermedia systems is the presence of hypermedia controls. In this paper we examine hypermedia controls as found "in the wild", in particular in the World Wide Web. These hypermedia controls are analyzed to derive a functional hypermedia mechanic that can be used to characterize them. This functional mechanic is used to create first an informal and then formal definition of the term "hypermedia control". Using this formal definition we then derive a generalization of the concept, referring contextually to the World Wide Web. We then examine two hypermedia technologies that implement this concept of generalized hypermedia controls: htmx, which does so in the context of the WWW and Hyperview, which does so in a mobile context.
Keywords: htmx,Hypermedia,Hypermedia Controls
[3] John P. A. Ioannidis. Why most published research findings are false. PLOS Medicine, 2(8):e124, August 2005. [ bib | DOI | http ]
Summary: There is increasing concern that most current published research findings are false. The probability that a research claim is true may depend on study power and bias, the number of other studies on the same question, and, importantly, the ratio of true to no relationships among the relationships probed in each scientific field. In this framework, a research finding is less likely to be true when the studies conducted in a field are smaller; when effect sizes are smaller; when there is a greater number and lesser preselection of tested relationships; where there is greater flexibility in designs, definitions, outcomes, and analytical modes; when there is greater financial and other interest and prejudice; and when more teams are involved in a scientific field in chase of statistical significance. Simulations show that for most study designs and settings, it is more likely for a research claim to be false than true. Moreover, for many current scientific fields, claimed research findings may often be simply accurate measures of the prevailing bias. In this essay, I discuss the implications of these problems for the conduct and interpretation of research.
[4] Lior Abraham, John Allen, Oleksandr Barykin, Vinayak Borkar, Bhuwan Chopra, Ciprian Gerea, Daniel Merl, Josh Metzler, David Reiss, Subbu Subramanian, Janet L. Wiener, and Okay Zed. Scuba: Diving into data at Facebook. Proc. VLDB Endow., 6(11):1057--1067, August 2013. [ bib | DOI | .pdf ]
Facebook takes performance monitoring seriously. Performance issues can impact over one billion users so we track thousands of servers, hundreds of PB of daily network traffic, hundreds of daily code changes, and many other metrics. We require latencies of under a minute from events occurring (a client request on a phone, a bug report filed, a code change checked in) to graphs showing those events on developers' monitors. Scuba is the data management system Facebook uses for most real-time analysis. Scuba is a fast, scalable, distributed, in-memory database built at Facebook. It currently ingests millions of rows (events) per second and expires data at the same rate. Scuba stores data completely in memory on hundreds of servers each with 144 GB RAM. To process each query, Scuba aggregates data from all servers. Scuba processes almost a million queries per day. Scuba is used extensively for interactive, ad hoc, analysis queries that run in under a second over live data. In addition, Scuba is the workhorse behind Facebook's code regression analysis, bug report monitoring, ads revenue monitoring, and performance debugging.
[5] Raul Castro Fernandez, Aaron J. Elmore, Michael J. Franklin, Sanjay Krishnan, and Chenhao Tan. How large language models will disrupt data management. Proc. VLDB Endow., 16(11):3302--3309, July 2023. [ bib | DOI | http ]
Large language models (LLMs), such as GPT-4, are revolutionizing software's ability to understand, process, and synthesize language. The authors of this paper believe that this advance in technology is significant enough to prompt introspection in the data management community, similar to previous technological disruptions such as the advents of the world wide web, cloud computing, and statistical machine learning. We argue that the disruptive influence that LLMs will have on data management will come from two angles. (1) A number of hard database problems, namely, entity resolution, schema matching, data discovery, and query synthesis, hit a ceiling of automation because the system does not fully understand the semantics of the underlying data. Based on large training corpora of natural language, structured data, and code, LLMs have an unprecedented ability to ground database tuples, schemas, and queries in real-world concepts. We will provide examples of how LLMs may completely change our approaches to these problems. (2) LLMs blur the line between predictive models and information retrieval systems with their ability to answer questions. We will present examples showing how large databases and information retrieval systems have complementary functionality.
[6] Michael R. Genesereth and Nils J. Nilsson. Logical Foundations of Artificial Intelligence. Morgan Kaufmann Publishers Inc., San Francisco, CA, USA, 1987. [ bib ]
[7] Marah Abdin, Sam Ade Jacobs, Ammar Ahmad Awan, Jyoti Aneja, Ahmed Awadallah, Hany Awadalla, Nguyen Bach, Amit Bahree, Arash Bakhtiari, Harkirat Behl, Alon Benhaim, Misha Bilenko, Johan Bjorck, Sébastien Bubeck, Martin Cai, Caio César Teodoro Mendes, Weizhu Chen, Vishrav Chaudhary, Parul Chopra, Allie Del Giorno, Gustavo de Rosa, Matthew Dixon, Ronen Eldan, Dan Iter, Abhishek Goswami, Suriya Gunasekar, Emman Haider, Junheng Hao, Russell J. Hewett, Jamie Huynh, Mojan Javaheripi, Xin Jin, Piero Kauffmann, Nikos Karampatziakis, Dongwoo Kim, Mahoud Khademi, Lev Kurilenko, James R. Lee, Yin Tat Lee, Yuanzhi Li, Chen Liang, Weishung Liu, Eric Lin, Zeqi Lin, Piyush Madan, Arindam Mitra, Hardik Modi, Anh Nguyen, Brandon Norick, Barun Patra, Daniel Perez-Becker, Thomas Portet, Reid Pryzant, Heyang Qin, Marko Radmilac, Corby Rosset, Sambudha Roy, Olli Saarikivi, Amin Saied, Adil Salim, Michael Santacroce, Shital Shah, Ning Shang, Hiteshi Sharma, Xia Song, Olatunji Ruwase, Xin Wang, Rachel Ward, Guanhua Wang, Philipp Witte, Michael Wyatt, Can Xu, Jiahang Xu, Sonali Yadav, Fan Yang, Ziyi Yang, Donghan Yu, Chengruidong Zhang, Cyril Zhang, Jianwen Zhang, Li Lyna Zhang, Yi Zhang, Yunan Zhang, and Xiren Zhou. Phi-3 technical report: A highly capable language model locally on your phone, 2024. [ bib | arXiv | http ]
[8] Harold Abelson and Gerald J. Sussman. Structure and Interpretation of Computer Programs. MIT Press, Cambridge, MA, USA, 2nd edition, 1996. [ bib ]
[9] Michael J. Accetta, Robert V. Baron, William J. Bolosky, David B. Golub, Richard F. Rashid, Avadis Tevanian, and Michael Young. Mach: A New Kernel Foundation for UNIX Development. In Proceedings of the USENIX Summer Technical Conference, pages 93--113, 1986. [ bib ]
[10] Tyler Akidau, Robert Bradshaw, Craig Chambers, Slava Chernyak, Rafael J. Fernández-Moctezuma, Reuven Lax, Sam McVeety, Daniel Mills, Frances Perry, Eric Schmidt, and Sam Whittle. The Dataflow Model: A Practical Approach to Balancing Correctness, Latency, and Cost in Massive-scale, Unbounded, Out-of-order Data Processing. Proc. VLDB Endow., 8(12):1792--1803, August 2015. [ bib | DOI | http ]
[11] Guillaume Alain and Yoshua Bengio. What regularized auto-encoders learn from the data generating distribution, 2014. [ bib | arXiv | http ]
Note: Their models are essentially predecessors of modern diffusion models, and the failure modes they showcase explain design choices that underpin modern DDPMs. Via https://x.com/norpadon/status/1828905291242467678.
[12] Zeyuan Allen-Zhu and Yuanzhi Li. Physics of language models: Part 3.3, knowledge capacity scaling laws, 2024. [ bib | arXiv | http ]
Keywords: LLM
[13] Thomas E. Anderson, Brian N. Bershad, Edward D. Lazowska, and Henry M. Levy. Scheduler Activations: Effective Kernel Support for the User-level Management of Parallelism. ACM Transactions on Computer Systems, 10(1):53--79, February 1992. [ bib ]
[14] Robert Frank. How not to buy happiness. Daedalus, 133:69--79, April 2004. [ bib | DOI ]
[15] Akari Asai, Zeqiu Wu, Yizhong Wang, Avirup Sil, and Hannaneh Hajishirzi. Self-RAG: Learning to retrieve, generate, and critique through self-reflection, 2023. [ bib | arXiv ]
Keywords: RAG
[16] James H Austin. Chase, Chance, and Creativity: The Lucky Art of Novelty. MIT Press, 2003. [ bib ]
[17] Nádila Azevedo, Gustavo Aquino, Leonardo Nascimento, Leonardo Camelo, Thiago Figueira, Joel Oliveira, Ingrid Figueiredo, André Printes, Israel Torné, and Carlos Figueiredo. A novel methodology for developing troubleshooting chatbots applied to ATM technical maintenance support. Applied Sciences, 13(11):6777, 2023. [ bib ]
[18] Dzmitry Bahdanau, Kyunghyun Cho, and Yoshua Bengio. Neural machine translation by jointly learning to align and translate, 2016. [ bib | arXiv ]
[19] Eirik Bakke and David R Karger. Expressive query construction through direct manipulation of nested relational results. In Proceedings of the 2016 International Conference on Management of Data, pages 1377--1392, 2016. [ bib | .pdf ]
[20] Sourav Banerjee, Ayushi Agarwal, and Saloni Singla. LLMs will always hallucinate, and we need to live with this, 2024. [ bib | arXiv | http ]
[21] Gaurav Banga, Peter Druschel, and Jeffrey C. Mogul. Resource containers: A new facility for resource management in server systems. In Proceedings of the USENIX Symposium on Operating Systems Design and Implementation, pages 45--58, 1999. [ bib ]
[22] Paul Barham, Boris Dragovic, Keir Fraser, Steven Hand, Tim Harris, Alex Ho, Rolf Neugebauer, Ian Pratt, and Andrew Warfield. Xen and the Art of Virtualization. In Proceedings of the ACM Symposium on Operating Systems Principles, pages 164--177, 2003. [ bib ]
[23] Titus Barik. Expressions on the nature and significance of programming and play. In 2017 IEEE Symposium on Visual Languages and Human-Centric Computing (VL/HCC), pages 145--153. IEEE, 2017. [ bib ]
[24] Loïc Barrault, Yu-An Chung, Mariano Cora Meglioli, David Dale, Ning Dong, Paul-Ambroise Duquenne, Hady Elsahar, Hongyu Gong, Kevin Heffernan, John Hoffman, et al. SeamlessM4T-Massively multilingual & multimodal machine translation. arXiv preprint arXiv:2308.11596, 2023. [ bib | arXiv ]
[25] Andrew Baumann, Paul Barham, Pierre-Evariste Dagand, Tim Harris, Rebecca Isaacs, Simon Peter, Timothy Roscoe, Adrian Schüpbach, and Akhilesh Singhania. The Multikernel: A new OS architecture for scalable multicore systems. In Proceedings of the ACM Symposium on Operating Systems Principles, pages 29--44, 2009. [ bib ]
[26] Peter Belcak and Roger Wattenhofer. Exponentially faster language modelling, 2023. [ bib | arXiv ]
[27] Fabrice Bellard. QEMU, a Fast and Portable Dynamic Translator. In Proceedings of the USENIX Annual Technical Conference, FREENIX Track, pages 41--46, 2005. [ bib ]
[28] Brian N. Bershad, Thomas E. Anderson, Edward D. Lazowska, and Henry M. Levy. Lightweight Remote Procedure Call. ACM Transactions on Computer Systems, 8(1):37--55, February 1990. [ bib ]
[29] Brian N. Bershad, Craig Chambers, Susan Eggers, Chris Maeda, Dylan McNamee, Stefan Savage, and Emin Gun Sirer. SPIN - an extensible microkernel for application-specific operating system services. Technical Report TR 94-03-03, University of Washington, 1994. [ bib ]
[30] Antoine Beugnard, Jean-Marc Jezequel, and Noel Plouzeau. Contract Aware Components, 10 years after. In Javier Camara, Carlos Canal, and Gwen Salaun, editors, WCSI, volume 37 of EPTCS, pages 1--11, 2010. [ bib | http ]
Keywords: dblp
[31] Antoine Beugnard, Jean-Marc Jézéquel, Noël Plouzeau, and Damien Watkins. Making Components Contract Aware. Computer, 32(7):38--45, July 1999. [ bib | DOI | http ]
[32] Paul Bilokon and Burak Gunduz. C++ design patterns for low-latency applications including high-frequency trading, 2023. [ bib | arXiv | http ]
[33] S. Blanco-Cuaresma and E. Bolmont. What can the programming language Rust do for astrophysics? In M. Brescia, S. G. Djorgovski, E. D. Feigelson, G. Longo, and S. Cavuoti, editors, Astroinformatics, volume 325 of IAU Symposium, pages 341--344, June 2017. [ bib | DOI ]
Keywords: exoplanets,N-Body,programming languages,Rust,simulations
[34] Barry W Boehm, James R Brown, and Myron Lipow. Quantitative evaluation of software quality. In Proceedings of the 2nd International Conference on Software Engineering, pages 592--605. IEEE Computer Society Press, 1976. [ bib ]
[35] Hans J. Boehm. Threads Cannot Be Implemented as a Library. ACM SIGPLAN Notices, 40(6):261--268, June 2005. [ bib | DOI | .pdf ]
In many environments, multi-threaded code is written in a language that was originally designed without thread support (e.g. C), to which a library of threading primitives was subsequently added. There appears to be a general understanding that this is not the right approach. We provide specific arguments that a pure library approach, in which the compiler is designed independently of threading issues, cannot guarantee correctness of the resulting code. We first review why the approach almost works, and then examine some of the surprising behavior it may entail. We further illustrate that there are very simple cases in which a pure library-based approach seems incapable of expressing an efficient parallel algorithm. Our discussion takes place in the context of C with Pthreads, since it is commonly used, reasonably well specified, and does not attempt to ensure type-safety, which would entail even stronger constraints. The issues we raise are not specific to that context.
Keywords: c,implementation,language,operatingsystem,programming,semantics,thread
[36] Jeff Bonwick. The Slab Allocator: An Object-Caching Kernel Memory Allocator. In Proceedings of the USENIX Summer Technical Conference, pages 87--98, 1994. [ bib ]
[37] Jeff Bonwick. ZFS: The Last Word in Filesystems, October 2005. [ bib | http ]
[38] Andrea Bracciali, Antonio Brogi, and Carlos Canal. A formal approach to component adaptation. Journal of Systems and Software, 74(1):45--54, 2005. [ bib ]
[39] Frederick P. Brooks. No Silver Bullet: Essence and Accidents of Software Engineering. Computer, 20(4):10--19, April 1987. [ bib | DOI | http ]
Keywords: software_engineering
[40] Tom B. Brown, Benjamin Mann, Nick Ryder, Melanie Subbiah, Jared Kaplan, Prafulla Dhariwal, Arvind Neelakantan, Pranav Shyam, Girish Sastry, Amanda Askell, Sandhini Agarwal, Ariel Herbert-Voss, Gretchen Krueger, Tom Henighan, Rewon Child, Aditya Ramesh, Daniel M. Ziegler, Jeffrey Wu, Clemens Winter, Christopher Hesse, Mark Chen, Eric Sigler, Mateusz Litwin, Scott Gray, Benjamin Chess, Jack Clark, Christopher Berner, Sam McCandlish, Alec Radford, Ilya Sutskever, and Dario Amodei. Language models are few-shot learners, 2020. [ bib | arXiv ]
[41] Sébastien Bubeck, Varun Chandrasekaran, Ronen Eldan, Johannes Gehrke, Eric Horvitz, Ece Kamar, Peter Lee, Yin Tat Lee, Yuanzhi Li, Scott Lundberg, Harsha Nori, Hamid Palangi, Marco Tulio Ribeiro, and Yi Zhang. Sparks of Artificial General Intelligence: Early experiments with GPT-4, 2023. [ bib ]
[42] Aydar Bulatov, Yuri Kuratov, and Mikhail S. Burtsev. Scaling Transformer to 1M tokens and beyond with RMT, 2023. [ bib ]
[43] William E. Byrd. Relational Programming in Minikanren: Techniques, Applications, and Implementations. PhD thesis, Indiana University, 2010. [ bib ]
[44] Luca Cardelli. Type systems. ACM Comput. Surv., 28(1):263--264, March 1996. [ bib | DOI | http ]
[45] CARP -- Common Address Redundancy Protocol. NetBSD Kernel Interfaces Manual, October 2003. [ bib ]
[46] K Mani Chandy. Caltech infospheres project overview: Information infrastructures for task forces. Computer Science, 256:80, 1996. [ bib ]
[47] K Mani Chandy and Leslie Lamport. Distributed snapshots: Determining global states of distributed systems. ACM Transactions on Computer Systems (TOCS), 3(1):63--75, 1985. [ bib ]
[48] K Mani Chandy. Sense and respond systems. In CMG-CONFERENCE-, volume 1, page 59. Computer Measurement Group; 1997, 2005. [ bib ]
[49] Jeffrey S. Chase, Henry M. Levy, Michael J. Feeley, and Edward D. Lazowska. Sharing and Protection in a Single-Address-Space Operating System. ACM Transactions on Computer Systems, 12(4):271--307, 1994. [ bib ]
[50] Himank Chaudhary. Skipping the boring parts of building a database using FoundationDB, September 2022. [ bib | http ]
[51] Mark Chen, Jerry Tworek, Heewoo Jun, Qiming Yuan, Henrique Ponde de Oliveira Pinto, Jared Kaplan, Harri Edwards, Yuri Burda, Nicholas Joseph, Greg Brockman, Alex Ray, Raul Puri, Gretchen Krueger, Michael Petrov, Heidy Khlaaf, Girish Sastry, Pamela Mishkin, Brooke Chan, Scott Gray, Nick Ryder, Mikhail Pavlov, Alethea Power, Lukasz Kaiser, Mohammad Bavarian, Clemens Winter, Philippe Tillet, Felipe Petroski Such, Dave Cummings, Matthias Plappert, Fotios Chantzis, Elizabeth Barnes, Ariel Herbert-Voss, William Hebgen Guss, Alex Nichol, Alex Paino, Nikolas Tezak, Jie Tang, Igor Babuschkin, Suchir Balaji, Shantanu Jain, William Saunders, Christopher Hesse, Andrew N. Carr, Jan Leike, Josh Achiam, Vedant Misra, Evan Morikawa, Alec Radford, Matthew Knight, Miles Brundage, Mira Murati, Katie Mayer, Peter Welinder, Bob McGrew, Dario Amodei, Sam McCandlish, Ilya Sutskever, and Wojciech Zaremba. Evaluating large language models trained on code. 2021. [ bib | arXiv ]
[52] Hailin Chen, Fangkai Jiao, Xingxuan Li, Chengwei Qin, Mathieu Ravaut, Ruochen Zhao, Caiming Xiong, and Shafiq Joty. ChatGPT's one-year anniversary: Are open-source large language models catching up?, 2023. [ bib | arXiv ]
[53] Li Chen, Penghao Wu, Kashyap Chitta, Bernhard Jaeger, Andreas Geiger, and Hongyang Li. End-to-end autonomous driving: Challenges and frontiers, 2023. [ bib | arXiv ]
[54] Zixiang Chen, Yihe Deng, Huizhuo Yuan, Kaixuan Ji, and Quanquan Gu. Self-play fine-tuning converts weak language models to strong language models, 2024. [ bib | arXiv ]
[55] James Cheney and Christian Urban. Nominal logic programming. ACM Transactions on Programming Languages and Systems (TOPLAS), 30(5):1--47, 2008. [ bib ]
[56] James Cheney and Christian Urban. Nominal Logic Programming. CoRR, abs/cs/0609062, 2006. [ bib ]
[57] Kewei Cheng, Jingfeng Yang, Haoming Jiang, Zhengyang Wang, Binxuan Huang, Ruirui Li, Shiyang Li, Zheng Li, Yifan Gao, Xian Li, Bing Yin, and Yizhou Sun. Inductive or deductive? Rethinking the fundamental reasoning abilities of LLMs, 2024. [ bib | arXiv | http ]
📌 Deductive reasoning presents a greater challenge than inductive reasoning for LLMs. While LLMs can often infer correct mapping functions inductively, they struggle to apply these functions deductively, especially for unfamiliar tasks.
👨‍🔧 Definition: Deductive reasoning is moving from general principles to specific conclusions, like applying given rules to solve problems, while inductive reasoning involves inferring general patterns or rules from specific observations. Deductive reasoning starts with a hypothesis and derives specific outcomes, whereas inductive reasoning formulates broad generalizations from individual instances.
📌 The paper introduces a novel framework called SolverLearner to isolate and evaluate pure inductive reasoning abilities of LLMs.
📌 SolverLearner uses a two-stage approach: 1) Function Proposal - LLMs learn an input-output mapping function from few-shot examples. 2) Function Execution - The learned function is applied through external code interpreters to solve test queries, removing LLM-based deductive reasoning.
📌 The study evaluates LLMs on four tasks: arithmetic in different bases, basic syntactic reasoning with altered word orders, spatial reasoning with modified coordinate systems, and cipher decryption.
📌 For each task, performance is compared across deductive settings (zero-shot and few-shot with explicit mapping functions) and inductive settings (few-shot without mapping functions and SolverLearner).
📌 Results show LLMs struggle with deductive reasoning, especially for "counterfactual" tasks rarely seen in pretraining. However, they demonstrate strong inductive abilities through SolverLearner, often achieving perfect performance.
📌 The effectiveness of inductive reasoning varies between models - GPT-4 consistently outperforms GPT-3.5 in learning correct input-output mappings.
Via https://x.com/rohanpaul_ai/status/1828778688151683227.
[58] François Chollet. On the measure of intelligence, 2019. [ bib | arXiv | http ]
[59] Andy Chou, Junfeng Yang, Benjamin Chelf, Seth Hallem, and Dawson Engler. An Empirical Study of Operating Systems Errors. In Proceedings of the ACM Symposium on Operating Systems Principles, pages 73--88, Banff, Alberta, Canada, 2001. ACM. [ bib ]
[60] Hyung Won Chung, Le Hou, Shayne Longpre, Barret Zoph, Yi Tay, William Fedus, Yunxuan Li, Xuezhi Wang, Mostafa Dehghani, Siddhartha Brahma, Albert Webson, Shixiang Shane Gu, Zhuyun Dai, Mirac Suzgun, Xinyun Chen, Aakanksha Chowdhery, Alex Castro-Ros, Marie Pellat, Kevin Robinson, Dasha Valter, Sharan Narang, Gaurav Mishra, Adams Yu, Vincent Zhao, Yanping Huang, Andrew Dai, Hongkun Yu, Slav Petrov, Ed H. Chi, Jeff Dean, Jacob Devlin, Adam Roberts, Denny Zhou, Quoc V. Le, and Jason Wei. Scaling instruction-finetuned language models, 2022. [ bib | arXiv ]
[61] Aaron Clauset, Cosma R. Shalizi, and M. E. J. Newman. Power-law distributions in empirical data. SIAM Review, 51(4):661--703, February 2009. [ bib | DOI | http ]
Power-law distributions occur in many situations of scientific interest and have significant consequences for our understanding of natural and man-made phenomena. Unfortunately, the detection and characterization of power laws is complicated by the large fluctuations that occur in the tail of the distribution -- the part of the distribution representing large but rare events -- and by the difficulty of identifying the range over which power-law behavior holds. Commonly used methods for analyzing power-law data, such as least-squares fitting, can produce substantially inaccurate estimates of parameters for power-law distributions, and even in cases where such methods return accurate answers they are still unsatisfactory because they give no indication of whether the data obey a power law at all. Here we present a principled statistical framework for discerning and quantifying power-law behavior in empirical data. Our approach combines maximum-likelihood fitting methods with goodness-of-fit tests based on the Kolmogorov-Smirnov statistic and likelihood ratios. We evaluate the effectiveness of the approach with tests on synthetic data and give critical comparisons to previous approaches. We also apply the proposed methods to twenty-four real-world data sets from a range of different disciplines, each of which has been conjectured to follow a power-law distribution. In some cases we find these conjectures to be consistent with the data while in others the power law is ruled out.
Keywords: machine-learning
[62] Jacques Cohen. Logic programming and constraint logic programming. ACM Comput. Surv., 28:257--259, March 1996. [ bib | DOI | http ]
Keywords: history,logic-programming,programming-languages,survey
[63] Adam M. Costello and George Varghese. Redesigning the BSD Callout and Timer Facilities. Technical Report WUCS-95-23, Washington University, 1995. [ bib ]
[64] Charles D. Cranor. Design and Implementation of the UVM Virtual Memory System. PhD thesis, Washington University, 1998. [ bib ]
[65] Steven E. Czerwinski, Ben Y. Zhao, Todd D. Hodes, Anthony D. Joseph, and Randy H. Katz. An Architecture for a Secure Service Discovery Service. In Proceedings of the 5th MobiCom, pages 24--35, 1999. [ bib ]
[66] Christopher Dabrowski, Kevin L Mills, and Stephen Quirolgico. A model-based analysis of first-generation service discovery systems. Technical report, DTIC Document, 2005. [ bib ]
[67] Damai Dai, Yutao Sun, Li Dong, Yaru Hao, Shuming Ma, Zhifang Sui, and Furu Wei. Why can GPT learn in-context? Language models implicitly perform gradient descent as meta-optimizers, 2023. [ bib | arXiv | http ]
[68] Siddhartha Dalal and Vishal Misra. The matrix: A Bayesian learning model for LLMs, 2024. [ bib | arXiv | http ]
[69] Tung-Lam Dao, Trung-Tu Nguyen, Cyril Deremble, Yves Lemperiere, Jean-Philippe Bouchaud, and Marc Potters. Tail Protection for Long Investors: Trend Convexity at Work, May 2016. [ bib | DOI | http ]
The performance of trend following strategies can be ascribed to the difference between long-term and short-term realized variance. We revisit this general result and show that it holds for various definitions of trend strategies. This explains the positive convexity of the aggregate performance of Commodity Trading Advisors (CTAs) which -- when adequately measured -- turns out to be much stronger than anticipated. We also highlight interesting connections with so-called Risk Parity portfolios. Finally, we propose a new portfolio of strangle options that provides a pure exposure to the long-term variance of the underlying, offering yet another viewpoint on the link between trend and volatility.
Keywords: Convexity,CTA,Option Hedging,Protection,Risk Parity,Tail Risk,Trend Following,Variance Swap,Volatility
[70] Benjamin H. Sigelman, Luiz André Barroso, Mike Burrows, Pat Stephenson, Manoj Plakal, Donald Beaver, Saul Jaspan, and Chandan Shanbhag. Dapper, a large-scale distributed systems tracing infrastructure. Technical report, Google, Inc., 2010. [ bib | .pdf ]
[71] Sayantan Das. Model Alignment Process, March 2024. [ bib | http ]
The alignment of generative models with human feedback has significantly improved the performance of natural language generation tasks. For large language models (LLMs), alignment methods like reinforcement learning from human feedback (RLHF) and direct preference optimization (DPO) have consistently worked better than just supervised fine-tuning (SFT) alone based on current
[72] Pawel Jakub Dawidek. Porting the ZFS file system to the FreeBSD operating system. In Proceedings of AsiaBSDCon, pages 97--103, 2007. [ bib ]
[73] DDE/DDEKit. [ bib | http ]
[74] Jeffrey Dean and Sanjay Ghemawat. MapReduce: Simplified data processing on large clusters. Communications of the ACM, 51(1):107--113, 2008. [ bib ]
[75] Luke Deller and Gernot Heiser. Linking Programs in a Single Address Space. In Proceedings of the USENIX Annual Technical Conference, pages 283--294, 1999. [ bib ]
[76] Trip Denton, Edward Jones, Srini Srinivasan, Ken Owens, and Richard W Buskens. NAOMI--an experimental platform for multi-modeling. In Model Driven Engineering Languages and Systems, pages 143--157. Springer, 2008. [ bib ]
[77] Tom De Smedt. Modeling Creativity. Uitgeverij UPA University Press Antwerp, 2013. [ bib ]
[78] Mathieu Desnoyers. Low-Impact Operating System Tracing. PhD thesis, Ecole Polytechnique de Montréal, December 2009. [ bib ]
[79] Mathieu Desnoyers, Paul E. McKenney, Alan S. Stern, Michel R. Dagenais, and Jonathan Walpole. User-Level Implementations of Read-Copy Update. IEEE Transactions on Parallel and Distributed Systems, 23(2):375--382, 2012. [ bib ]
[80] Jeff Dike. A user-mode port of the Linux kernel. In Proceedings of the Atlanta Linux Showcase, 2001. [ bib | .pdf ]
[81] Peter Dinda. The Minet TCP/IP Stack. Technical Report NWU-CS-02-08, Northwestern University Department of Computer Science, January 2002. [ bib ]
[82] Roland C. Dowdeswell and John Ioannidis. The CryptoGraphic Disk Driver. In Proceedings of the USENIX Annual Technical Conference, FREENIX Track, pages 179--186, 2003. [ bib ]
[83] Allen B Downey. Think Complexity: Complexity Science and Computational Modeling. O'Reilly Media, 2012. [ bib ]
[84] Richard Draves and Scott Cutshall. Unifying the User and Kernel Environments. Technical Report MSR-TR-97-10, Microsoft, 1997. [ bib ]
[85] Ulrich Drepper. What Every Programmer Should Know About Memory, 2007. [ bib | http ]
As CPU cores become both faster and more numerous, the limiting factor for most programs is now, and will be for some time, memory access. Hardware designers have come up with ever more sophisticated memory handling and acceleration techniques–such as CPU caches–but these cannot work optimally without some help from the programmer. Unfortunately, neither the structure nor the cost of using the memory subsystem of a computer or the caches on CPUs is well understood by most programmers. This paper explains the structure of memory subsystems in use on modern commodity hardware, illustrating why CPU caches were developed, how they work, and what programs should do to achieve optimal performance by utilizing them.
Keywords: hardware,linux,memory,unix
[86] James R. Driscoll, Neil Sarnak, Daniel D. Sleator, and Robert E. Tarjan. Making Data Structures Persistent, 1989. via silentbicycle on twitter. [ bib ]
[87] Adam Dunkels. Design and Implementation of the lwIP TCP/IP Stack. Technical report, Swedish Institute of Computer Science, 2001. [ bib ]
[88] Zakir Durumeric, James Kasten, David Adrian, J. Alex Halderman, Michael Bailey, Frank Li, Nicolas Weaver, Johanna Amann, Jethro Beekman, Mathias Payer, and Vern Paxson. The Matter of Heartbleed. In Proceedings of the 2014 Conference on Internet Measurement Conference, IMC '14, pages 475--488, New York, NY, USA, 2014. ACM. [ bib | DOI | http ]
Keywords: heartbleed,internet-wide scanning,openssl,security
[89] R. Kent Dybvig. Three Implementation Models for Scheme. PhD thesis, University of North Carolina, Chapel Hill, April 1987. Chapter 4 describes the essence of the Chez Scheme Version 1 run-time architecture. [ bib ]
[90] E2fsprogs: Ext2/3/4 Filesystem Utilities. [ bib | http ]
[91] Aggelos Economopoulos. A Peek at the DragonFly Virtual Kernel, 2007. [ bib | http ]
[92] Bradley Efron. A 250-year argument: Belief, behavior, and the bootstrap. Bulletin of the American Mathematical Society, 50(1):129--146, 2013. [ bib ]
[93] Bradley Efron. Bootstrap methods: Another look at the jackknife. The Annals of Statistics, pages 1--26, 1979. [ bib ]
[94] Ronen Eldan and Yuanzhi Li. TinyStories: How Small Can Language Models Be and Still Speak Coherent English?, 2023. [ bib ]
[95] David Ely, Stefan Savage, and David Wetherall. Alpine: A User-Level Infrastructure for Network Protocol Development. In Proceedings of the USENIX Symposium on Internet Technologies and Systems, pages 171--184, 2001. [ bib ]
[96] Azlen Elza. Azlen/azlen.me. [ bib | http ]
Source code of azlen.me, which implements a website in the style of Andy Matuschak's Evergreen Notes.
[97] Dawson R. Engler, M. Frans Kaashoek, and James O'Toole Jr. Exokernel: An Operating System Architecture for Application-Level Resource Management. In Proceedings of the ACM Symposium on Operating Systems Principles, pages 251--266, 1995. [ bib ]
[98] Matthias Felleisen. How to Design Programs: An Introduction to Programming and Computing. The MIT Press, 2001. [ bib ]
[99] Bryan Ford, Godmar Back, Greg Benson, Jay Lepreau, Albert Lin, and Olin Shivers. The Flux OSKit: A Substrate for OS and Language Research. In Proceedings of the ACM Symposium on Operating Systems Principles, pages 38--51, 1997. [ bib ]
[100] Bryan Ford and Russ Cox. Vx32: Lightweight User-level Sandboxing on the x86. In Proceedings of the USENIX Annual Technical Conference, pages 293--306, 2008. [ bib ]
[101] Forth: The programming language that writes itself: The Web Page. [ bib | .html ]
An exploration of the evolution and meaning of the Forth programming language and its context in history.
Keywords: forth,programming
[102] Ian Foster, Carl Kesselman, and Steven Tuecke. The anatomy of the grid: Enabling scalable virtual organizations. International Journal of High Performance Computing Applications, 15(3):200--222, 2001. [ bib ]
[103] Ian Foster, Carl Kesselman, Jeffrey M Nick, and Steven Tuecke. The physiology of the grid. Grid computing: making the global infrastructure a reality, pages 217--249, 2003. [ bib ]
[104] Armando Fox, Rean Griffith, A Joseph, R Katz, A Konwinski, G Lee, D Patterson, A Rabkin, and I Stoica. Above the clouds: A Berkeley view of cloud computing. Dept. Electrical Eng. and Comput. Sciences, University of California, Berkeley, Rep. UCB/EECS, 28, 2009. [ bib ]
[105] Keisuke Fukuda and Edward K Vogel. Human variation in overriding attentional capture. The Journal of Neuroscience, 29(27):8726--8733, 2009. [ bib ]
[106] Fuse-ext2. [ bib | http ]
[107] Dimuthu U Gamage, Lahiru S Gallege, James H Hill, and Rajeev R Raje. A Compositional Trust Model for Predicting the Trust Value of Software System QoS Properties. In Computational Science and Engineering (CSE), 2012 IEEE 15th International Conference On, pages 610--617. IEEE, 2012. [ bib ]
[108] Gregory R. Ganger, Dawson R. Engler, M. Frans Kaashoek, Hector M. Briceño, Russell Hunt, and Thomas Pinckney. Fast and Flexible Application-level Networking on Exokernel Systems. ACM Trans. Comput. Syst., 20(1):49--83, February 2002. [ bib | DOI | http ]
Keywords: Extensible systems,fast servers,network services,OS structure
[109] Yunfan Gao, Yun Xiong, Xinyu Gao, Kangxiang Jia, Jinliu Pan, Yuxi Bi, Yi Dai, Jiawei Sun, Meng Wang, and Haofen Wang. Retrieval-augmented generation for large language models: A survey, 2024. [ bib | arXiv ]
Keywords: GPT,LLM,RAG
[110] Ludovico Gardenghi, Michael Goldweber, and Renzo Davoli. View-OS: A New Unifying Approach Against the Global View Assumption. In Proceedings of the 8th International Conference on Computational Science, Part I, pages 287--296, 2008. [ bib ]
[111] Tal Garfinkel and Mendel Rosenblum. When Virtual is Harder than Real: Security Challenges in Virtual Machine Based Computing Environments. In Proceedings of the Workshop on Hot Topics in Operating Systems, 2005. [ bib | .pdf ]
[112] Jonas Geiping and Tom Goldstein. Cramming: Training a language model on a single GPU in one day, 2022. [ bib | arXiv | http ]
[113] Andrew Gelman. Induction and Deduction in Bayesian Data Analysis, 2011. [ bib ]
[114] Robert A. Gingell, Meng Lee, Xuong T. Dang, and Mary S. Weeks. Shared Libraries in SunOS. In Proceedings of the USENIX Summer Technical Conference, pages 375--390, 1987. [ bib ]
[115] Rohit Girdhar, Alaaeldin El-Nouby, Zhuang Liu, Mannat Singh, Kalyan Vasudev Alwala, Armand Joulin, and Ishan Misra. Imagebind: One embedding space to bind them all. In Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, pages 15180--15190, 2023. [ bib | http ]
[116] Albert Gu and Tri Dao. Mamba: Linear-time sequence modeling with selective state spaces, 2024. [ bib | arXiv | http ]
[117] Suriya Gunasekar, Yi Zhang, Jyoti Aneja, Caio César Teodoro Mendes, Allie Del Giorno, Sivakanth Gopi, Mojan Javaheripi, Piero Kauffmann, Gustavo de Rosa, Olli Saarikivi, Adil Salim, Shital Shah, Harkirat Singh Behl, Xin Wang, Sébastien Bubeck, Ronen Eldan, Adam Tauman Kalai, Yin Tat Lee, and Yuanzhi Li. Textbooks are all you need, 2023. [ bib | arXiv ]
[118] Andreas Gustafsson. NetBSD-current/i386 build status. [ bib | http ]
[119] Study Hacks. Knowledge Workers are Bad at Working (and Here's What to Do About It...), November 2012. [ bib | http ]
An Inconvenient Observation: Knowledge workers are bad at working. I say this because unlike every other skilled labor class in the history of skilled labor, ...
[120] Anthony Hall. Seven myths of formal methods. Software, IEEE, 7(5):11--19, 1990. [ bib ]
[121] Taekgyeong Han and Kwang Mong Sim. An ontology-enhanced cloud service discovery system. In Proceedings of the International MultiConference of Engineers and Computer Scientists, volume 1, 2010. [ bib ]
[122] Yaru Hao, Yutao Sun, Li Dong, Zhixiong Han, Yuxian Gu, and Furu Wei. Structured prompting: Scaling in-context learning to 1,000 examples, 2022. Via @dosco - https://x.com/dosco/status/1825683850980372761. [ bib | arXiv | http ]
[123] Stavros Harizopoulos, Taylor Hopper, Morton Mo, Shyam Sundar Chandrasekaran, Tongguang Chen, Yan Cui, Nandini Ganesh, Gary Helmling, Hieu Pham, and Sebastian Wong. Meta's next-generation realtime monitoring and analytics platform. Proceedings of the VLDB Endowment, 15(12):3522--3534, 2022. [ bib | .pdf ]
[124] Brian Harvey and Matthew Wright. Simply Scheme: Introducing Computer Science. The MIT Press, 1999. [ bib ]
[125] Gernot Heiser, Kevin Elphinstone, Jerry Vochteloo, Stephen Russell, and Jochen Liedtke. The Mungi Single-Address-Space Operating System. Software: Practice and Experience, 28(9):901--928, July 1998. [ bib ]
[126] Johannes Helander. Unix under Mach: The Lites Server. Master's thesis, Helsinki University of Technology, 1994. [ bib ]
[127] Pat Helland. Data on the Outside vs. Data on the Inside: Data kept outside SQL has different characteristics from data kept inside. Queue, 18(3):43--60, 2020. [ bib | http ]
[128] Dan Hendrycks, Collin Burns, Steven Basart, Andy Zou, Mantas Mazeika, Dawn Song, and Jacob Steinhardt. Measuring massive multitask language understanding, 2021. [ bib | arXiv ]
[129] James P. Hennessy, Damian L. Osisek, and Joseph W. Seigh II. Passive Serialization in a Multitasking Environment, February 1989. US Patent 4,809,168. [ bib ]
[130] Jorrit N. Herder, Herbert Bos, Ben Gras, Philip Homburg, and Andrew S. Tanenbaum. MINIX 3: A highly reliable, self-repairing operating system. ACM SIGOPS Operating Systems Review, 40(3):80--89, 2006. [ bib ]
[131] Mike Hibler, Robert Ricci, Leigh Stoller, Jonathon Duerig, Shashi Guruprasad, Tim Stack, Kirk Webb, and Jay Lepreau. Large-scale Virtualization in the Emulab Network Testbed. In Proceedings of the USENIX Annual Technical Conference, pages 113--128, 2008. [ bib ]
[132] Dan Hildebrand. An Architectural Overview of QNX. In Proceedings of the Workshop on Micro-kernels and Other Kernel Architectures, pages 113--126. USENIX Association, 1992. [ bib ]
[133] Jordan Hoffmann, Sebastian Borgeaud, Arthur Mensch, Elena Buchatskaya, Trevor Cai, Eliza Rutherford, Diego de Las Casas, Lisa Anne Hendricks, Johannes Welbl, Aidan Clark, Tom Hennigan, Eric Noland, Katie Millican, George van den Driessche, Bogdan Damoc, Aurelia Guy, Simon Osindero, Karen Simonyan, Erich Elsen, Jack W. Rae, Oriol Vinyals, and Laurent Sifre. Training compute-optimal large language models, 2022. [ bib | arXiv ]
[134] How to build a website without frameworks and tons of libraries. [ bib | http ]
A simple toolchain that Koding Kitty uses for building its web.
[135] Mei-Chen Hsueh, Timothy K. Tsai, and Ravishankar K. Iyer. Fault Injection Techniques and Tools. IEEE Computer, 30(4):75--82, 1997. [ bib ]
[136] Edward J. Hu, Yelong Shen, Phillip Wallis, Zeyuan Allen-Zhu, Yuanzhi Li, Shean Wang, Lu Wang, and Weizhu Chen. LoRA: Low-rank adaptation of large language models, 2021. [ bib | arXiv ]
[137] John Hughes. Why Functional Programming Matters. The Computer Journal, 32:98--107, 1984. [ bib ]
[138] Galen C. Hunt and James R. Larus. Singularity: Rethinking the software stack. ACM SIGOPS Operating Systems Review, 41(2):37--49, 2007. [ bib ]
[139] Joseph Idziorek, Alex Keyes, Colin Lazier, Somu Perianayagam, Prithvi Ramanathan, James Christopher Sorenson III, Doug Terry, and Akshat Vig. Distributed transactions at scale in Amazon DynamoDB. In 2023 USENIX Annual Technical Conference (USENIX ATC 23), pages 705--717, 2023. [ bib ]
Keywords: toread
[140] Jean-François Im, Kishore Gopalakrishna, Subbu Subramaniam, Mayank Shrivastava, Adwait Tumbde, Xiaotian Jiang, Jennifer Dai, Seunghyun Lee, Neha Pawar, Jialiang Li, et al. Pinot: Realtime OLAP for 530 million users. In Proceedings of the 2018 International Conference on Management of Data, pages 583--594, 2018. [ bib | .pdf ]
[141] Soyeong Jeong, Jinheon Baek, Sukmin Cho, Sung Ju Hwang, and Jong C. Park. Adaptive-RAG: Learning to adapt retrieval-augmented large language models through question complexity, 2024. [ bib | arXiv | http ]
[142] Albert Q. Jiang, Alexandre Sablayrolles, Arthur Mensch, Chris Bamford, Devendra Singh Chaplot, Diego de las Casas, Florian Bressand, Gianna Lengyel, Guillaume Lample, Lucile Saulnier, Lélio Renard Lavaud, Marie-Anne Lachaux, Pierre Stock, Teven Le Scao, Thibaut Lavril, Thomas Wang, Timothée Lacroix, and William El Sayed. Mistral 7B, 2023. [ bib | arXiv ]
[143] Xuxian Jiang and Dongyan Xu. SODA: A Service-On-Demand Architecture for Application Service Hosting Utility Platforms. In Proceedings of the 12th IEEE International Symposium on High Performance Distributed Computing, pages 174--183, 2003. [ bib ]
[144] Daniel Jurafsky and James H. Martin. Speech and Language Processing: An Introduction to Natural Language Processing, Computational Linguistics, and Speech Recognition with Language Models. 2024. Online manuscript released August 20, 2024. [ bib | http ]
[145] Neil Johnson. Two's Company, Three Is Complexity: A Simple Guide to the Science of All Sciences. Oneworld, Oxford, 2007. [ bib ]
[146] M. Frans Kaashoek, Dawson R. Engler, Gregory R. Ganger, Hector M. Briceno, Russell Hunt, David Mazières, Thomas Pinckney, Robert Grimm, John Jannotti, and Kenneth Mackenzie. Application Performance and Flexibility on Exokernel Systems. In Proceedings of the ACM Symposium on Operating Systems Principles, pages 52--65, 1997. [ bib ]
[147] Julie Kallini, Isabel Papadimitriou, Richard Futrell, Kyle Mahowald, and Christopher Potts. Mission: Impossible language models, 2024. [ bib | arXiv | http ]
[148] Poul-Henning Kamp and Robert N. M. Watson. Jails: Confining the omnipotent root. In Proceedings of SANE Conference, 2000. [ bib | .pdf ]
[149] Antti Kantee. Environmental Independence: BSD Kernel TCP/IP in Userspace. In Proceedings of AsiaBSDCon, pages 71--80, 2009. [ bib ]
[150] Antti Kantee et al. Flexible operating system internals: The design and implementation of the anykernel and rump kernels. 2012. [ bib ]
[151] Antti Kantee. Puffs - Pass-to-Userspace Framework File System. In Proceedings of AsiaBSDCon, pages 29--42, 2007. [ bib ]
[152] Antti Kantee. Rump Device Drivers: Shine On You Kernel Diamond. In Proceedings of AsiaBSDCon, pages 75--84, 2010. [ bib ]
[153] Antti Kantee. Rump File Systems: Kernel Code Reborn. In Proceedings of the USENIX Annual Technical Conference, pages 201--214, 2009. [ bib ]
[154] Jared Kaplan, Sam McCandlish, Tom Henighan, Tom B. Brown, Benjamin Chess, Rewon Child, Scott Gray, Alec Radford, Jeffrey Wu, and Dario Amodei. Scaling laws for neural language models, 2020. [ bib | arXiv ]
[155] Alfons Kemper. Technical Perspective: FoundationDB Performs Balancing Act. Commun. ACM, 66(6):96, May 2023. [ bib | DOI | http ]
[156] Brian Kernighan. Code Testing and Its Role in Teaching. ;login: The USENIX Magazine, 31(2):9--18, April 2006. [ bib ]
[157] Omar Khattab, Arnav Singhvi, Paridhi Maheshwari, Zhiyuan Zhang, Keshav Santhanam, Sri Vardhamanan, Saiful Haq, Ashutosh Sharma, Thomas T. Joshi, Hanna Moazam, Heather Miller, Matei Zaharia, and Christopher Potts. DSPy: Compiling declarative language model calls into self-improving pipelines, 2023. [ bib | arXiv ]
[158] Jaehyeon Kim, Jungil Kong, and Juhee Son. Conditional variational autoencoder with adversarial learning for end-to-end text-to-speech, 2021. [ bib | arXiv ]
[159] Alexander Kirillov, Eric Mintun, Nikhila Ravi, Hanzi Mao, Chloe Rolland, Laura Gustafson, Tete Xiao, Spencer Whitehead, Alexander C. Berg, Wan-Yen Lo, Piotr Dollár, and Ross Girshick. Segment anything, 2023. [ bib | arXiv ]
[160] Steve R. Kleiman. Vnodes: An Architecture for Multiple File System Types in Sun UNIX. In Proceedings of the USENIX Annual Technical Conference, pages 238--247, 1986. [ bib ]
[161] Nikoletta Koilia and Christoforos Kachris. Hardware Acceleration of LLMs: A comprehensive survey and comparison, 2024. [ bib | arXiv | http ]
[162] Shriram Krishnamurthi. Programming Languages: Application and Interpretation. Shriram Krishnamurthi, 2007. [ bib ]
[163] Maximilian Kuschewski, David Sauerwein, Adnan Alhomssi, and Viktor Leis. BtrBlocks: Efficient columnar compression for data lakes. Proceedings of the ACM on Management of Data, 1(2):1--26, 2023. [ bib ]
[164] Nathan Lambert, Valentina Pyatkin, Jacob Morrison, LJ Miranda, Bill Yuchen Lin, Khyathi Chandu, Nouha Dziri, Sachin Kumar, Tom Zick, Yejin Choi, Noah A. Smith, and Hannaneh Hajishirzi. RewardBench: Evaluating reward models for language modeling, 2024. [ bib | arXiv ]
Reward models (RMs) are at the crux of successful RLHF to align pretrained models to human preferences, yet there has been relatively little study that focuses on evaluation of those reward models. Evaluating reward models presents an opportunity to understand the opaque technologies used for alignment of language models and which values are embedded in them. To date, very few descriptors of capabilities, training methods, or open-source reward models exist. In this paper, we present RewardBench, a benchmark dataset and code-base for evaluation, to enhance scientific understanding of reward models. The RewardBench dataset is a collection of prompt-win-lose trios spanning chat, reasoning, and safety, to benchmark how reward models perform on challenging, structured and out-of-distribution queries. We created specific comparison datasets for RMs that have subtle, but verifiable reasons (e.g. bugs, incorrect facts) why one answer should be preferred to another. On the RewardBench leaderboard, we evaluate reward models trained with a variety of methods, such as the direct MLE training of classifiers and the implicit reward modeling of Direct Preference Optimization (DPO), and on a spectrum of datasets. We present many findings on propensity for refusals, reasoning limitations, and instruction following shortcomings of various reward models towards a better understanding of the RLHF process.
Keywords: LLM,Reward Models
[165] Leslie Lamport. Time, clocks, and the ordering of events in a distributed system. Communications of the ACM, 21(7):558--565, 1978. [ bib ]
[166] Butler W. Lampson. Hints for Computer System Design. ACM SIGOPS Operating Systems Review, 17(5):33--48, 1983. [ bib ]
[167] Jinhyuk Lee, Zhuyun Dai, Xiaoqi Ren, Blair Chen, Daniel Cer, Jeremy R. Cole, Kai Hui, Michael Boratko, Rajvi Kapadia, Wen Ding, Yi Luan, Sai Meher Karthik Duddu, Gustavo Hernandez Abrego, Weiqiang Shi, Nithi Gupta, Aditya Kusupati, Prateek Jain, Siddhartha Reddy Jonnalagadda, Ming-Wei Chang, and Iftekhar Naim. Gecko: Versatile text embeddings distilled from large language models, 2024. Explanation Thread: https://twitter.com/ZetaVector/status/1775513153201148063. [ bib | arXiv | http ]
Keywords: LLM,Text Embeddings
[168] Greg Lehey. Debugging kernel problems, 2006. [ bib | .pdf ]
[169] Ben Leslie, Carl van Schaik, and Gernot Heiser. Wombat: A Portable User-Mode Linux for Embedded Systems. In Proceedings of the 6th Linux.Conf.Au, 2005. [ bib | .html ]
[170] Joshua LeVasseur, Volkmar Uhlig, Jan Stoess, and Stefan Götz. Unmodified Device Driver Reuse and Improved System Dependability Via Virtual Machines. In Proceedings of the 6th USENIX Symposium on Operating Systems Design and Implementation, pages 17--30, 2004. [ bib ]
[171] Junyou Li, Qin Zhang, Yangbin Yu, Qiang Fu, and Deheng Ye. More agents is all you need, 2024. [ bib | arXiv | http ]
Keywords: llm
[172] Libguestfs: Tools for accessing and modifying virtual machine disk images. [ bib | http ]
[173] Jochen Liedtke. Improving IPC by Kernel Design. In Proceedings of the ACM Symposium on Operating Systems Principles, pages 175--188, 1993. [ bib ]
[174] Jochen Liedtke. On μ-Kernel Construction. In Proceedings of the ACM Symposium on Operating Systems Principles, pages 237--250, 1995. [ bib ]
[175] Yiming Lin, Madelon Hulsebos, Ruiying Ma, Shreya Shankar, Sepanta Zeigham, Aditya G. Parameswaran, and Eugene Wu. Towards accurate and efficient document analytics with large language models, 2024. [ bib | arXiv | http ]
[176] Benoit Liquet, Sarat Moka, and Yoni Nazarathy. Mathematical Engineering of Deep Learning. CRC Press, 2024. [ bib ]
[177] Ziming Liu, Yixuan Wang, Sachin Vaidya, Fabian Ruehle, James Halverson, Marin Soljačić, Thomas Y. Hou, and Max Tegmark. KAN: Kolmogorov-Arnold networks, 2024. [ bib | arXiv | http ]
[178] Ziming Liu, Yixuan Wang, Sachin Vaidya, Fabian Ruehle, James Halverson, Marin Soljačić, Thomas Y. Hou, and Max Tegmark. KAN: Kolmogorov-Arnold networks, 2024. [ bib | arXiv | http ]
[179] LLaMA Now Goes Faster on CPUs. Sapphire Rapids has been available in the public cloud, developers are still targeting AVX512 when they should be targeting VNNI and AMX: https://github.com/ggerganov/llama.cpp/issues/2555. Via https://news.ycombinator.com/item?id=39890570. [ bib | http ]
I wrote 84 new matmul kernels to improve llamafile CPU performance.
[180] Gilles Louppe. Understanding random forests: From theory to practice, 2015. [ bib | arXiv ]
[181] Hongyin Luo and Wei Sun. Addition is all you need for energy-efficient language models, 2024. [ bib | arXiv | http ]
[182] Xuezhe Ma, Xiaomeng Yang, Wenhan Xiong, Beidi Chen, Lili Yu, Hao Zhang, Jonathan May, Luke Zettlemoyer, Omer Levy, and Chunting Zhou. Megalodon: Efficient LLM pretraining and inference with unlimited context length, 2024. [ bib | arXiv | http ]
[183] Anil Madhavapeddy, Thomas Leonard, Magnus Skjegstad, Thomas Gazagnaire, David Sheets, Dave Scott, Richard Mortier, Amir Chaudhry, Balraj Singh, Jon Ludlam, Jon Crowcroft, and Ian Leslie. Jitsu: Just-in-time Summoning of Unikernels. In Proceedings of the 12th USENIX Conference on Networked Systems Design and Implementation, NSDI'15, pages 559--573, Berkeley, CA, USA, 2015. USENIX Association. [ bib | http ]
[184] Anil Madhavapeddy, Richard Mortier, Charalampos Rotsos, David Scott, Balraj Singh, Thomas Gazagnaire, Steven Smith, Steven Hand, and Jon Crowcroft. Unikernels: Library Operating Systems for the Cloud. SIGPLAN Not., 48(4):461--472, March 2013. [ bib | DOI | http ]
Keywords: functional programming,hypervisor,microkernel
[185] Anil Madhavapeddy, Richard Mortier, Charalampos Rotsos, David Scott, Balraj Singh, Thomas Gazagnaire, Steven Smith, Steven Hand, and Jon Crowcroft. Unikernels: Library Operating Systems for the Cloud. In Proceedings of the Eighteenth International Conference on Architectural Support for Programming Languages and Operating Systems, ASPLOS '13, pages 461--472, New York, NY, USA, 2013. ACM. [ bib | DOI | http ]
Keywords: functional programming,hypervisor,microkernel
[186] Per Martin-Löf. An intuitionistic theory of types, 1998. [ bib ]
[187] Joao Martins, Mohamed Ahmed, Costin Raiciu, Vladimir Olteanu, Michio Honda, Roberto Bifulco, and Felipe Huici. ClickOS and the Art of Network Function Virtualization. In Proceedings of the 11th USENIX Conference on Networked Systems Design and Implementation, NSDI'14, pages 459--473, Berkeley, CA, USA, 2014. USENIX Association. [ bib | http ]
[188] Michael Matz, Jan Hubička, Andreas Jaeger, and Mark Mitchell. System V Application Binary Interface, AMD64 Architecture Processor Supplement, Draft Version 0.99.5, 2010. [ bib | .pdf ]
[189] Jim Mauro and Richard McDougall. Solaris Internals: Core Kernel Architecture. Sun Microsystems, Inc., 2001. [ bib ]
[190] Steven McCanne and Van Jacobson. The BSD packet filter: A new architecture for user-level packet capture. In Proceedings of the USENIX Winter Technical Conference, pages 259--269, 1993. [ bib ]
[191] Paul E. McKenney. Exploiting Deferred Destruction: An Analysis of Read-Copy-Update Techniques in Operating System Kernels. PhD thesis, OGI School of Science and Engineering at Oregon Health and Sciences University, 2004. [ bib ]
[192] Marshall Kirk McKusick and Michael J. Karels. Design of a General Purpose Memory Allocator for the 4.3BSD UNIX Kernel. In Proceedings of the USENIX Summer Technical Conference, pages 295--304, 1988. [ bib ]
[193] Marshall Kirk McKusick, Keith Bostic, Michael J. Karels, and John S. Quarterman. The Design and Implementation of the 4.4BSD Operating System. Addison Wesley, 1996. [ bib ]
[194] Marshall K. McKusick, William N. Joy, Samuel J. Leffler, and Robert S. Fabry. A Fast File System for UNIX. Computer Systems, 2(3):181--197, 1984. [ bib ]
[195] Marshall Kirk McKusick and Jeffery Roberson. Journaled Soft-updates. In Proceedings of EuroBSDCon 2010, 2010. [ bib | .pdf ]
[196] Marshall Kirk McKusick and Gregory R. Ganger. Soft Updates: A Technique for Eliminating Most Synchronous Writes in the Fast Filesystem. In Proceedings of the USENIX Annual Technical Conference, pages 1--17, 1999. [ bib ]
[197] Blakeley B. McShane and David Gal. Blinding Us to the Obvious? The Effect of Statistical Training on the Evaluation of Evidence. Management Science, 62(6):1707--1718, 2016. [ bib | DOI | http ]
Statistical training helps individuals analyze and interpret data. However, the emphasis placed on null hypothesis significance testing in academic training and reporting may lead researchers to interpret evidence dichotomously rather than continuously. Consequently, researchers may either disregard evidence that fails to attain statistical significance or undervalue it relative to evidence that attains statistical significance. Surveys of researchers across a wide variety of fields (including medicine, epidemiology, cognitive science, psychology, business, and economics) show that a substantial majority does indeed do so. This phenomenon is manifest both in researchers’ interpretations of descriptions of evidence and in their likelihood judgments. Dichotomization of evidence is reduced though still present when researchers are asked to make decisions based on the evidence, particularly when the decision outcome is personally consequential. Recommendations are offered. This paper was accepted by Yuval Rottenstreich, judgment and decision making.
[198] Julio Merino. Automated Testing Framework. [ bib | http ]
[199] Luke Mewburn and Matthew Green. Build.sh: Cross-building NetBSD. In Proceedings of the USENIX BSD Conference, pages 47--56, 2003. [ bib ]
[200] Luke Mewburn. Private communication, April 2009. [ bib ]
[201] Maged M Michael and Michael L Scott. Simple, fast, and practical non-blocking and blocking concurrent queue algorithms. In Proceedings of the Fifteenth Annual ACM Symposium on Principles of Distributed Computing, pages 267--275, 1996. [ bib ]
[202] D. Lee Miller. WordLlama: Recycled token embeddings from large language models, 2024. [ bib | http ]
[203] S. P. Miller, B. C. Neuman, J. I. Schiller, and J. H. Saltzer. Kerberos Authentication and Authorization System. In Project Athena Technical Plan, 1988. [ bib ]
[204] Robert B. Miller. Response time in man-computer conversational transactions. In Proceedings of the Fall Joint Computer Conference, AFIPS (Fall, Part I), pages 267--277, San Francisco, California, 1968. [ bib ]
[205] Ronald G. Minnich and David J. Farber. The Mether System: Distributed Shared Memory for SunOS 4.0. Technical Report MS-CIS-93-24, University of Pennsylvania Department of Computer and Information Science, February 1993. [ bib ]
[206] Philipp Moritz, Robert Nishihara, Stephanie Wang, Alexey Tumanov, Richard Liaw, Eric Liang, Melih Elibol, Zongheng Yang, William Paul, Michael I. Jordan, and Ion Stoica. Ray: A distributed framework for emerging AI applications, 2018. [ bib | arXiv ]
[207] Sape J. Mullender, Guido van Rossum, Andrew S. Tanenbaum, Robbert van Renesse, and Hans van Staveren. Amoeba: A Distributed Operating System for the 1990s. Computer, 23(5):44--53, 1990. [ bib ]
[208] Derek G Murray, Frank McSherry, Rebecca Isaacs, Michael Isard, Paul Barham, and Martín Abadi. Naiad: A timely dataflow system. In Proceedings of the Twenty-Fourth ACM Symposium on Operating Systems Principles, pages 439--455, 2013. [ bib ]
[209] Madanlal Musuvathi and Dawson R. Engler. Model Checking Large Network Protocol Implementations. In Proceedings of the USENIX Symposium on Networked Systems Design and Implementation, pages 155--168, 2004. [ bib ]
[210] Mutt E-Mail Client. [ bib | http ]
[211] Sebastian Nanz and Carlo A. Furia. A Comparative Study of Programming Languages in Rosetta Code, 2014. [ bib | http ]
[212] Ndis -- NDIS Miniport Driver Wrapper. FreeBSD Kernel Interfaces Manual, March 2010. [ bib ]
[213] NetBSD Project. [ bib | http ]
[214] Nicholas Nethercote and Julian Seward. Valgrind: A Framework for Heavyweight Dynamic Binary Instrumentation. In Proceedings of the ACM SIGPLAN Conference on Programming Language Design and Implementation, pages 89--100, 2007. [ bib ]
[215] Thomas Neumann and Michael J Freitag. Umbra: A disk-based system with in-memory performance. In CIDR, volume 20, page 29, 2020. [ bib | .pdf ]
[216] David Niemi and Alain Knaff. Mtools, 2007. [ bib | http ]
[217] Bengt Nordström, Kent Petersson, and Jan M Smith. Programming in Martin-Löf's Type Theory, volume 200. Oxford University Press Oxford, 1990. [ bib ]
[218] Chris Okasaki. Purely Functional Data Structures. Technical report, CMU, 1998. [ bib ]
[219] OpenSSH. [ bib | http ]
[220] Thomas J. Ostrand and Elaine J. Weyuker. The Distribution of Faults in a Large Industrial Software System. SIGSOFT Softw. Eng. Notes, 27(4):55--64, July 2002. [ bib ]
[221] Norman W. Paton and Oscar Díaz. Active database systems. ACM Comput. Surv., 31(1):63--103, March 1999. [ bib | DOI | http ]
[222] Andrew Pavlo, Erik Paulson, Alexander Rasin, Daniel J. Abadi, David J. DeWitt, Samuel Madden, and Michael Stonebraker. A Comparison of Approaches to Large-scale Data Analysis. In Proceedings of the 2009 ACM SIGMOD International Conference on Management of Data, SIGMOD '09, pages 165--178, New York, NY, USA, 2009. ACM. [ bib | DOI | http ]
Keywords: benchmarks,mapreduce,parallel database
[223] Pedro Pedreira, Orri Erling, Konstantinos Karanasos, Scott Schneider, Wes McKinney, Satya R Valluri, Mohamed Zait, and Jacques Nadeau. The composable data management system manifesto. Proceedings of the VLDB Endowment, 16(10):2679--2685, 2023. [ bib | .pdf ]
[224] Tomas Petricek. Data exploration through dot-driven development. ECOOP 2017, page 21, 2017. Found it via Aditya's Nim Talk - https://www.youtube.com/watch?v=d2VRuZo2pdA. [ bib | http ]
[225] Rob Pike, Dave Presotto, Ken Thompson, and Howard Trickey. Plan 9 from Bell Labs. In Proceedings of the Summer UKUUG Conference, pages 1--9, 1990. [ bib ]
[226] Rob Pike. Systems Software Research is Irrelevant, 2000. [ bib | http ]
[227] Pkgsrc: The NetBSD Packages Collection. [ bib | http ]
[228] John R. Platt. Strong Inference. Science, 146(3642):347--353, 1964. [ bib | DOI | http ]
Keywords: stat_theory
[229] Robert Pollack. How to Believe a Machine-Checked Proof, 1997. [ bib ]
[230] Donald E. Porter, Silas Boyd-Wickizer, Jon Howell, Reuben Olinsky, and Galen C. Hunt. Rethinking the Library OS from the Top Down. In Proceedings of the International Conference on Architectural Support for Programming Languages and Operating Systems, pages 291--304, 2011. [ bib ]
[231] Neil Postman. Five things we need to know about technological change. Retrieved from http://www.sdca.org/sermons_mp3/2012/121229_postman_5Things.pdf, 1998. [ bib ]
[232] Vijayan Prabhakaran, Lakshmi N. Bairavasundaram, Nitin Agrawal, Haryadi S. Gunawi, Andrea C. Arpaci-Dusseau, and Remzi H. Arpaci-Dusseau. IRON File Systems. ACM SIGOPS Operating Systems Review, 39(5):206--220, 2005. [ bib ]
[233] Prashant Pradhan, Srikanth Kandula, Wen Xu, Anees Shaikh, and Erich Nahum. Daytona: A User-Level TCP Stack, 2002. [ bib | .pdf ]
[234] Ketaki A Pradhan, Lahiru S. Gallege, and Rajeev R Raje. MDE-URDS-A Mobile Device Enabled Service Discovery System. PhD thesis, Purdue University, 2011. [ bib ]
[235] Simon J.D. Prince. Understanding Deep Learning. The MIT Press, 2023. [ bib | http ]
[236] The Transport Layer Security (TLS) Protocol, 2008. RFC 5246. [ bib ]
[237] Pud -- Pass-to-Userspace Device. NetBSD Kernel Interfaces Manual, November 2007. [ bib ]
[238] Octavian Purdila, Lucian Adrian Grijincu, and Nicolae Tapus. LKL: The Linux Kernel Library. In Proceedings of the RoEduNet International Conference, pages 328--333, 2010. [ bib ]
[239] Chen Qian, Xin Cong, Wei Liu, Cheng Yang, Weize Chen, Yusheng Su, Yufan Dang, Jiahao Li, Juyuan Xu, Dahai Li, Zhiyuan Liu, and Maosong Sun. Communicative agents for software development, 2023. [ bib | arXiv | http ]
[240] Samyam Rajbhandari, Jeff Rasley, Olatunji Ruwase, and Yuxiong He. ZeRO: Memory optimizations toward training trillion parameter models, 2020. [ bib | arXiv ]
[241] Richard Rashid, Avadis Tevanian, Michael Young, David Golub, Robert Baron, David Black, William Bolosky, and Jonathan Chew. Machine-Independent Virtual Memory Management for Paged Uniprocessor and Multiprocessor Architectures. SIGARCH Computer Architecture News, 15(5):31--39, October 1987. [ bib ]
[242] Liliang Ren, Yang Liu, Yadong Lu, Yelong Shen, Chen Liang, and Weizhu Chen. Samba: Simple hybrid state space models for efficient unlimited context language modeling, 2024. [ bib | arXiv | http ]
[243] Sean Rhea, Eric Wang, Edmund Wong, Ethan Atkins, and Nat Storer. LittleTable: A Time-Series Database and Its Uses. In Proceedings of the 2017 ACM International Conference on Management of Data, SIGMOD '17, pages 125--138, New York, NY, USA, 2017. ACM. [ bib | DOI | http ]
Keywords: cloud computing,clustering,databases,internet of things,partitioning,time-series data
[244] Luigi Rizzo. Netmap: A Novel Framework for Fast Packet I/O. In Proceedings of the USENIX Annual Technical Conference, pages 101--112, August 2012. [ bib ]
[245] Daniel A. Roberts, Sho Yaida, and Boris Hanin. The Principles of Deep Learning Theory: An Effective Theory Approach to Understanding Neural Networks. Cambridge University Press, May 2022. [ bib | DOI | http ]
[246] Bhaskarjit Sarmah, Benika Hall, Rohan Rao, Sunil Patel, Stefano Pasquali, and Dhagash Mehta. HybridRAG: Integrating knowledge graphs and vector retrieval augmented generation for efficient information extraction, 2024. [ bib | arXiv | http ]
[247] Parth Sarthi, Salman Abdullah, Aditi Tuli, Shubh Khanna, Anna Goldie, and Christopher D. Manning. RAPTOR: Recursive abstractive processing for tree-organized retrieval, 2024. [ bib | arXiv ]
Keywords: LLM,RAG
[248] Margo I. Seltzer, Keith Bostic, Marshall K. McKusick, and Carl Staelin. An Implementation of a Log-Structured File System for UNIX. In Proceedings of the USENIX Winter Technical Conference, pages 307--326, 1993. [ bib ]
[249] Cosma Shalizi. The bootstrap. American Scientist, May 2010. [ bib | http ]
[250] Freda Shi, Xinyun Chen, Kanishka Misra, Nathan Scales, David Dohan, Ed Chi, Nathanael Schärli, and Denny Zhou. Large language models can be easily distracted by irrelevant context, 2023. [ bib | arXiv ]
Keywords: LLM
[251] Liang Shi, Zhengju Tang, and Zhi Yang. A survey on employing large language models for text-to-SQL tasks, 2024. [ bib | arXiv | http ]
[252] Chuck Silvers. UBC: An Efficient Unified I/O and Memory Caching Subsystem for NetBSD. In Proceedings of the USENIX Annual Technical Conference, FREENIX Track, pages 285--290, 2000. [ bib ]
[253] A Nonstandard for Transmission of IP Datagrams over Serial Lines: SLIP, 1988. RFC 1055. [ bib ]
[254] Slirp, the PPP/SLIP-on-terminal emulator. [ bib | http ]
[255] Christopher Small and Margo Seltzer. VINO: An Integrated Platform for Operating System and Database Research. Technical Report TR-30-94, Harvard, 1994. [ bib ]
[256] Smart Guy Productivity Pitfalls. [ bib | .html ]
Productivity is one of my pet topics, because it's always dogged me a bit, especially early in my career.  I'd pull long days and nights and...
[257] Brent Smith and Greg Linden. Two decades of recommender systems at Amazon.com. IEEE Internet Computing, 2017. [ bib | http ]
[258] David Smith, Joseph Samuel Myers, Craig S. Kaplan, and Chaim Goodman-Strauss. An aperiodic monotile, 2023. [ bib ]
[259] Keith A. Smith and Margo I. Seltzer. File System Aging---Increasing the Relevance of File System Benchmarks. SIGMETRICS Perform. Eval. Rev., 25(1):203--213, 1997. [ bib ]
[260] Stephen Soltesz, Herbert Pötzl, Marc E. Fiuczynski, Andy Bavier, and Larry Peterson. Container-based Operating System Virtualization: A Scalable, High-performance Alternative to Hypervisors. In Proceedings of the 2nd ACM SIGOPS/EuroSys European Conference on Computer Systems, pages 275--287, 2007. [ bib ]
[261] David I Spivak and Robert E Kent. Ologs: A categorical framework for knowledge representation. PLoS ONE, 7(1):e24274, 2012. [ bib ]
[262] Aarohi Srivastava, Abhinav Rastogi, Abhishek Rao, Abu Awal Md Shoeb, Abubakar Abid, Adam Fisch, Adam R. Brown, Adam Santoro, Aditya Gupta, Adrià Garriga-Alonso, Agnieszka Kluska, Aitor Lewkowycz, Akshat Agarwal, Alethea Power, Alex Ray, Alex Warstadt, Alexander W. Kocurek, Ali Safaya, Ali Tazarv, Alice Xiang, Alicia Parrish, Allen Nie, Aman Hussain, Amanda Askell, Amanda Dsouza, Ambrose Slone, Ameet Rahane, Anantharaman S. Iyer, Anders Andreassen, Andrea Madotto, Andrea Santilli, Andreas Stuhlmüller, Andrew Dai, Andrew La, Andrew Lampinen, Andy Zou, Angela Jiang, Angelica Chen, Anh Vuong, Animesh Gupta, Anna Gottardi, Antonio Norelli, Anu Venkatesh, Arash Gholamidavoodi, Arfa Tabassum, Arul Menezes, Arun Kirubarajan, Asher Mullokandov, Ashish Sabharwal, Austin Herrick, Avia Efrat, Aykut Erdem, Ayla Karakas, B. Ryan Roberts, Bao Sheng Loe, Barret Zoph, Bartlomiej Bojanowski, Batuhan Özyurt, Behnam Hedayatnia, Behnam Neyshabur, Benjamin Inden, Benno Stein, Berk Ekmekci, Bill Yuchen Lin, Blake Howald, Bryan Orinion, Cameron Diao, Cameron Dour, Catherine Stinson, Cedrick Argueta, César Ferri Ramírez, Chandan Singh, Charles Rathkopf, Chenlin Meng, Chitta Baral, Chiyu Wu, Chris Callison-Burch, Chris Waites, Christian Voigt, Christopher D. Manning, Christopher Potts, Cindy Ramirez, Clara E. Rivera, Clemencia Siro, Colin Raffel, Courtney Ashcraft, Cristina Garbacea, Damien Sileo, Dan Garrette, Dan Hendrycks, Dan Kilman, Dan Roth, Daniel Freeman, Daniel Khashabi, Daniel Levy, Daniel Moseguí González, Danielle Perszyk, Danny Hernandez, Danqi Chen, Daphne Ippolito, Dar Gilboa, David Dohan, David Drakard, David Jurgens, Debajyoti Datta, Deep Ganguli, Denis Emelin, Denis Kleyko, Deniz Yuret, Derek Chen, Derek Tam, Dieuwke Hupkes, Diganta Misra, Dilyar Buzan, Dimitri Coelho Mollo, Diyi Yang, Dong-Ho Lee, Dylan Schrader, Ekaterina Shutova, Ekin Dogus Cubuk, Elad Segal, Eleanor Hagerman, Elizabeth Barnes, Elizabeth Donoway, Ellie Pavlick, Emanuele Rodola, Emma Lam, Eric Chu, Eric Tang, Erkut Erdem, Ernie Chang, Ethan A. Chi, Ethan Dyer, Ethan Jerzak, Ethan Kim, Eunice Engefu Manyasi, Evgenii Zheltonozhskii, Fanyue Xia, Fatemeh Siar, Fernando Martínez-Plumed, Francesca Happé, Francois Chollet, Frieda Rong, Gaurav Mishra, Genta Indra Winata, Gerard de Melo, Germán Kruszewski, Giambattista Parascandolo, Giorgio Mariani, Gloria Wang, Gonzalo Jaimovitch-López, Gregor Betz, Guy Gur-Ari, Hana Galijasevic, Hannah Kim, Hannah Rashkin, Hannaneh Hajishirzi, Harsh Mehta, Hayden Bogar, Henry Shevlin, Hinrich Schütze, Hiromu Yakura, Hongming Zhang, Hugh Mee Wong, Ian Ng, Isaac Noble, Jaap Jumelet, Jack Geissinger, Jackson Kernion, Jacob Hilton, Jaehoon Lee, Jaime Fernández Fisac, James B. Simon, James Koppel, James Zheng, James Zou, Jan Kocoń, Jana Thompson, Janelle Wingfield, Jared Kaplan, Jarema Radom, Jascha Sohl-Dickstein, Jason Phang, Jason Wei, Jason Yosinski, Jekaterina Novikova, Jelle Bosscher, Jennifer Marsh, Jeremy Kim, Jeroen Taal, Jesse Engel, Jesujoba Alabi, Jiacheng Xu, Jiaming Song, Jillian Tang, Joan Waweru, John Burden, John Miller, John U. Balis, Jonathan Batchelder, Jonathan Berant, Jörg Frohberg, Jos Rozen, Jose Hernandez-Orallo, Joseph Boudeman, Joseph Guerr, Joseph Jones, Joshua B. Tenenbaum, Joshua S. Rule, Joyce Chua, Kamil Kanclerz, Karen Livescu, Karl Krauth, Karthik Gopalakrishnan, Katerina Ignatyeva, Katja Markert, Kaustubh D. Dhole, Kevin Gimpel, Kevin Omondi, Kory Mathewson, Kristen Chiafullo, Ksenia Shkaruta, Kumar Shridhar, Kyle McDonell, Kyle Richardson, Laria Reynolds, Leo Gao, Li Zhang, Liam Dugan, Lianhui Qin, Lidia Contreras-Ochando, Louis-Philippe Morency, Luca Moschella, Lucas Lam, Lucy Noble, Ludwig Schmidt, Luheng He, Luis Oliveros Colón, Luke Metz, Lütfi Kerem Senel, Maarten Bosma, Maarten Sap, Maartje ter Hoeve, Maheen Farooqi, Manaal Faruqui, Mantas Mazeika, Marco Baturan, Marco Marelli, Marco Maru, Maria Jose Ramírez Quintana, Marie Tolkiehn, Mario Giulianelli, Martha Lewis, Martin Potthast, Matthew L. Leavitt, Matthias Hagen, Mátyás Schubert, Medina Orduna Baitemirova, Melody Arnaud, Melvin McElrath, Michael A. Yee, Michael Cohen, Michael Gu, Michael Ivanitskiy, Michael Starritt, Michael Strube, Michal Swedrowski, Michele Bevilacqua, Michihiro Yasunaga, Mihir Kale, Mike Cain, Mimee Xu, Mirac Suzgun, Mitch Walker, Mo Tiwari, Mohit Bansal, Moin Aminnaseri, Mor Geva, Mozhdeh Gheini, Mukund Varma T, Nanyun Peng, Nathan A. Chi, Nayeon Lee, Neta Gur-Ari Krakover, Nicholas Cameron, Nicholas Roberts, Nick Doiron, Nicole Martinez, Nikita Nangia, Niklas Deckers, Niklas Muennighoff, Nitish Shirish Keskar, Niveditha S. Iyer, Noah Constant, Noah Fiedel, Nuan Wen, Oliver Zhang, Omar Agha, Omar Elbaghdadi, Omer Levy, Owain Evans, Pablo Antonio Moreno Casares, Parth Doshi, Pascale Fung, Paul Pu Liang, Paul Vicol, Pegah Alipoormolabashi, Peiyuan Liao, Percy Liang, Peter Chang, Peter Eckersley, Phu Mon Htut, Pinyu Hwang, Piotr Milkowski, Piyush Patil, Pouya Pezeshkpour, Priti Oli, Qiaozhu Mei, Qing Lyu, Qinlang Chen, Rabin Banjade, Rachel Etta Rudolph, Raefer Gabriel, Rahel Habacker, Ramon Risco, Raphaël Millière, Rhythm Garg, Richard Barnes, Rif A. Saurous, Riku Arakawa, Robbe Raymaekers, Robert Frank, Rohan Sikand, Roman Novak, Roman Sitelew, Ronan LeBras, Rosanne Liu, Rowan Jacobs, Rui Zhang, Ruslan Salakhutdinov, Ryan Chi, Ryan Lee, Ryan Stovall, Ryan Teehan, Rylan Yang, Sahib Singh, Saif M. Mohammad, Sajant Anand, Sam Dillavou, Sam Shleifer, Sam Wiseman, Samuel Gruetter, Samuel R. Bowman, Samuel S. Schoenholz, Sanghyun Han, Sanjeev Kwatra, Sarah A. Rous, Sarik Ghazarian, Sayan Ghosh, Sean Casey, Sebastian Bischoff, Sebastian Gehrmann, Sebastian Schuster, Sepideh Sadeghi, Shadi Hamdan, Sharon Zhou, Shashank Srivastava, Sherry Shi, Shikhar Singh, Shima Asaadi, Shixiang Shane Gu, Shubh Pachchigar, Shubham Toshniwal, Shyam Upadhyay, Shyamolima Debnath, Siamak Shakeri, Simon Thormeyer, Simone Melzi, Siva Reddy, Sneha Priscilla Makini, Soo-Hwan Lee, Spencer Torene, Sriharsha Hatwar, Stanislas Dehaene, Stefan Divic, Stefano Ermon, Stella Biderman, Stephanie Lin, Stephen Prasad, Steven T. Piantadosi, Stuart M. Shieber, Summer Misherghi, Svetlana Kiritchenko, Swaroop Mishra, Tal Linzen, Tal Schuster, Tao Li, Tao Yu, Tariq Ali, Tatsu Hashimoto, Te-Lin Wu, Théo Desbordes, Theodore Rothschild, Thomas Phan, Tianle Wang, Tiberius Nkinyili, Timo Schick, Timofei Kornev, Titus Tunduny, Tobias Gerstenberg, Trenton Chang, Trishala Neeraj, Tushar Khot, Tyler Shultz, Uri Shaham, Vedant Misra, Vera Demberg, Victoria Nyamai, Vikas Raunak, Vinay Ramasesh, Vinay Uday Prabhu, Vishakh Padmakumar, Vivek Srikumar, William Fedus, William Saunders, William Zhang, Wout Vossen, Xiang Ren, Xiaoyu Tong, Xinran Zhao, Xinyi Wu, Xudong Shen, Yadollah Yaghoobzadeh, Yair Lakretz, Yangqiu Song, Yasaman Bahri, Yejin Choi, Yichi Yang, Yiding Hao, Yifu Chen, Yonatan Belinkov, Yu Hou, Yufang Hou, Yuntao Bai, Zachary Seid, Zhuoye Zhao, Zijian Wang, Zijie J. Wang, Zirui Wang, and Ziyi Wu. Beyond the Imitation Game: Quantifying and extrapolating the capabilities of language models, 2023. [ bib | arXiv ]
[263] Leon Sterling and Ehud Shapiro. The Art of Prolog: Advanced Programming Techniques. MIT Press, Cambridge, MA, USA, 1986. [ bib ]
[264] Jack Sun, Daniel Fryer, Ashvin Goel, and Angela Demke Brown. Using Declarative Invariants for Protecting File-System Integrity. In Proceedings of the 6th Workshop on Programming Languages and Operating Systems, pages 6:1--6:5, Cascais, Portugal, 2011. [ bib ]
[265] Mirac Suzgun, Nathan Scales, Nathanael Schärli, Sebastian Gehrmann, Yi Tay, Hyung Won Chung, Aakanksha Chowdhery, Quoc V. Le, Ed H. Chi, Denny Zhou, and Jason Wei. Challenging BIG-Bench tasks and whether chain-of-thought can solve them, 2022. [ bib | arXiv ]
[266] Miklós Szeredi. FUSE: Filesystem in Userspace. [ bib | http ]
[267] Richard Colin Tait. Robert De Niro's Method: Acting, authorship and agency in the New Hollywood (1967-1980). 2013. [ bib ]
[268] Wei Tao, Yucheng Zhou, Wenqiang Zhang, and Yu Cheng. MAGIS: LLM-Based multi-agent framework for GitHub issue resolution, 2024. [ bib | arXiv | http ]
[269] Jason Thorpe. A Machine-Independent DMA Framework for NetBSD. In Proceedings of the USENIX Annual Technical Conference, FREENIX Track, pages 1--12, 1998. [ bib ]
[270] Thttpd -- tiny/turbo/throttling HTTP server. [ bib | http ]
[271] Omkar J Tilak and Rajeev R Raje. Temporal Interaction Contracts for Components in a Distributed System. In 11th IEEE International Enterprise Distributed Object Computing Conference (EDOC 2007), pages 339--339. IEEE, 2007. [ bib ]
[272] TIL: Mermaid Gantt diagrams are great for displaying distributed traces in Markdown - brycemecum.com. [ bib | http ]
[273] Tooltip. [ bib | .html ]
[274] Chris Torek. Device Configuration in 4.4BSD, December 1992. [ bib | .ps ]
[275] Hugo Touvron, Thibaut Lavril, Gautier Izacard, Xavier Martinet, Marie-Anne Lachaux, Timothée Lacroix, Baptiste Rozière, Naman Goyal, Eric Hambro, Faisal Azhar, Aurelien Rodriguez, Armand Joulin, Edouard Grave, and Guillaume Lample. LLaMA: Open and efficient foundation language models, 2023. [ bib | arXiv ]
Keywords: generative-ai,LLM
[276] Xinming Tu, James Zou, Weijie J. Su, and Linjun Zhang. What should data science education do with large language models?, 2023. [ bib | arXiv ]
[277] Justine Tunney. Redbean. [ bib | http ]
single file distributable web server
[278] Mikhail Tuzhilin and Dong Zhang. Graphs, algorithms and applications, 2023. [ bib | arXiv ]
[279] Leslie Valiant. Probably Approximately Correct: Nature's Algorithms for Learning and Prospering in a Complex World. Basic Books, 2013. [ bib ]
[280] Nicolas van Kempen, Hyuk-Je Kwon, Dung Tuan Nguyen, and Emery D. Berger. It's not easy being green: On the energy efficiency of programming languages, 2024. [ bib | arXiv | http ]
[281] Axel Van Lamsweerde. Requirements engineering in the year 00: A research perspective. In Proceedings of the 22nd International Conference on Software Engineering, pages 5--19. ACM, 2000. [ bib ]
[282] Peter Van Roy and Seif Haridi. Concepts, Techniques, and Models of Computer Programming. MIT Press, 2004. [ bib ]
[283] Peter Van Roy and Seif Haridi. Teaching programming broadly and deeply: The kernel language approach. Informatics Curricula and Teaching Methods, 245:53--62, 2002. [ bib ]
[284] Ashish Vaswani, Noam Shazeer, Niki Parmar, Jakob Uszkoreit, Llion Jones, Aidan N Gomez, Lukasz Kaiser, and Illia Polosukhin. Attention is All you Need. In I. Guyon, U. Von Luxburg, S. Bengio, H. Wallach, R. Fergus, S. Vishwanathan, and R. Garnett, editors, Advances in Neural Information Processing Systems, volume 30. Curran Associates, Inc., 2017. [ bib | .pdf ]
[285] Bandhav Veluri, Malek Itani, Tuochao Chen, Takuya Yoshioka, and Shyamnath Gollakota. Look once to hear: Target speech hearing with noisy examples, 2024. [ bib | arXiv | http ]
[286] Philip Wadler. A critique of Abelson and Sussman or why calculating is better than scheming. ACM SIGPLAN Notices, 22(3):83--94, 1987. [ bib | .pdf ]
[287] Philip Wadler. Linear types can change the world! In Programming Concepts and Methods, volume 3, page 5. Citeseer, 1990. [ bib ]
[288] Carl A. Waldspurger. Memory Resource Management in VMware ESX Server. ACM SIGOPS Operating Systems Review, 36(SI):181--194, December 2002. [ bib ]
[289] Alex Wang, Amanpreet Singh, Julian Michael, Felix Hill, Omer Levy, and Samuel R. Bowman. GLUE - A multi-task benchmark and analysis platform for natural language understanding, 2019. [ bib | arXiv ]
[290] Zhikui Wang, Xiaoyun Zhu, Pradeep Padala, and Sharad Singhal. Capacity and Performance Overhead in Dynamic Resource Allocation to Virtual Containers. In Proceedings of the IFIP/IEEE Symposium on Integrated Management, pages 149--158, May 2007. [ bib ]
[291] WAPBL -- Write Ahead Physical Block Logging File System Journaling. NetBSD Kernel Interfaces Manual, November 2010. [ bib ]
[292] Jerry Wei, Chengrun Yang, Xinying Song, Yifeng Lu, Nathan Hu, Dustin Tran, Daiyi Peng, Ruibo Liu, Da Huang, Cosmo Du, and Quoc V. Le. Long-form factuality in large language models, 2024. GitHub URL: https://github.com/google-deepmind/long-form-factuality. [ bib | http ]
[293] Hannes Weisbach, Björn Döbel, and Adam Lackorzynski. Generic User-Level PCI Drivers. In Proceedings of the 13th Real-Time Linux Workshop, Prague, Czech Republic, 2011. [ bib | .pdf ]
[294] Mark Weiser. The computer for the 21st century. Scientific American, 265(3):94--104, 1991. [ bib ]
[295] Gail Weiss, Yoav Goldberg, and Eran Yahav. Thinking Like Transformers. CoRR, abs/2106.06981, 2021. [ bib | arXiv | http ]
[296] David Wentzlaff and Anant Agarwal. Factored Operating Systems (fos): The Case for a Scalable Operating System for Multicores. ACM SIGOPS Operating Systems Review, 43(2):76--85, April 2009. [ bib ]
[297] Leland Wilkinson. The Grammar of Graphics (Statistics and Computing). Springer-Verlag New York, Inc., Secaucus, NJ, USA, 2005. [ bib ]
[298] David Woodhouse. JFFS: The Journalling Flash File System. In Proceedings of the Ottawa Linux Symposium, 2001. [ bib | .pdf ]
[299] Gary R. Wright and W. Richard Stevens. TCP/IP Illustrated, Volume 2. Addison Wesley, 1995. [ bib ]
[300] Qingyun Wu, Gagan Bansal, Jieyu Zhang, Yiran Wu, Shaokun Zhang, Erkang (Eric) Zhu, Beibin Li, Li Jiang, Xiaoyun Zhang, and Chi Wang. AutoGen: Enabling next-gen LLM applications via multi-agent conversation. Technical Report MSR-TR-2023-33, Microsoft, August 2023. [ bib | http ]
We present AutoGen, an open-source framework that allows developers to build LLM applications via multiple agents that can converse with each other to accomplish tasks. AutoGen agents are customizable, conversable, and can operate in various modes that employ combinations of LLMs, human inputs, and tools. Using AutoGen, developers can also flexibly define agent interaction behaviors. Both natural language and computer code can be used to program flexible conversation patterns for different applications. AutoGen serves as a generic infrastructure to build diverse applications of various complexities and LLM capacities. We provide many examples to build effective applications for domains ranging from mathematics, coding, question answering, operations research, online decision-making, entertainment, etc.
[301] Shijie Wu, Ozan Irsoy, Steven Lu, Vadim Dabravolski, Mark Dredze, Sebastian Gehrmann, Prabhanjan Kambadur, David Rosenberg, and Gideon Mann. BloombergGPT: A large language model for finance, 2023. [ bib | arXiv ]
[302] John Yang, Carlos E. Jimenez, Alexander Wettig, Shunyu Yao, Karthik Narasimhan, and Ofir Press. SWE-agent: Agent computer interfaces enable software engineering language models, 2024. [ bib | http ]
SWE-agent turns LMs (e.g. GPT-4) into software engineering agents that can fix bugs and issues in real GitHub repositories. We accomplish these results by designing simple LM-centric commands and specially-built input and output formats to make it easier for the LM to browse the repository, view, edit and execute code files. We call this Agent-Computer Interface (ACI) and build the SWE-agent repository to make it easy to iterate on ACI design for repository-level coding agents.
[303] Junfeng Yang, Can Sar, Paul Twohey, Cristian Cadar, and Dawson Engler. Automatically Generating Malicious Disks using Symbolic Execution. In Proceedings of the IEEE Symposium on Security and Privacy, pages 243--257, 2006. [ bib ]
[304] Keen You, Haotian Zhang, Eldon Schoop, Floris Weers, Amanda Swearngin, Jeffrey Nichols, Yinfei Yang, and Zhe Gan. Ferret-UI: Grounded mobile UI understanding with multimodal LLMs, 2024. [ bib | arXiv | http ]
Keywords: LLM
[305] Arnaud Ysmal and Antti Kantee. Fs-utils: File Systems Access Tools for Userland. In Proceedings of EuroBSDCon, 2009. [ bib | .pdf ]
[306] Zhengqing Yuan, Ruoxi Chen, Zhaoxu Li, Haolong Jia, Lifang He, Chi Wang, and Lichao Sun. Mora: Enabling generalist video generation via a multi-agent framework, 2024. [ bib | arXiv ]
[307] Weizhe Yuan, Richard Yuanzhe Pang, Kyunghyun Cho, Xian Li, Sainbayar Sukhbaatar, Jing Xu, and Jason Weston. Self-rewarding language models, 2024. [ bib | arXiv | http ]
We posit that to achieve superhuman agents, future models require superhuman feedback in order to provide an adequate training signal. Current approaches commonly train reward models from human preferences, which may then be bottlenecked by human performance level, and secondly these separate frozen reward models cannot then learn to improve during LLM training. In this work, we study Self-Rewarding Language Models, where the language model itself is used via LLM-as-a-Judge prompting to provide its own rewards during training. We show that during Iterative DPO training that not only does instruction following ability improve, but also the ability to provide high-quality rewards to itself. Fine-tuning Llama 2 70B on three iterations of our approach yields a model that outperforms many existing systems on the AlpacaEval 2.0 leaderboard, including Claude 2, Gemini Pro, and GPT-4 0613. While there is much left still to explore, this work opens the door to the possibility of models that can continually improve in both axes.
Keywords: LLM,Reward Models
[308] Yang Yu, Fanglu Guo, Susanta Nanda, Lap-chung Lam, and Tzi-cker Chiueh. A Feather-weight Virtual Machine for Windows Applications. In Proceedings of the ACM SIGPLAN/SIGOPS International Conference on Virtual Execution Environments, pages 24--34, 2006. [ bib ]
[309] Pamela Zave and Michael Jackson. Where do operations come from? A multiparadigm specification technique. IEEE Transactions on Software Engineering, 22(7):508--528, 1996. [ bib ]
[310] Marko Zec. Implementing a Clonable Network Stack in the FreeBSD Kernel. In Proceedings of the USENIX Annual Technical Conference, FREENIX Track, pages 137--150, 2003. [ bib ]
[311] ZFS Source Tour. [ bib | http ]
[312] Yanzhe Zhang, Ruiyi Zhang, Jiuxiang Gu, Yufan Zhou, Nedim Lipka, Diyi Yang, and Tong Sun. LLaVAR: Enhanced visual instruction tuning for text-rich image understanding, 2023. [ bib | arXiv ]
[313] Yanli Zhao, Andrew Gu, Rohan Varma, Liang Luo, Chien-Chin Huang, Min Xu, Less Wright, Hamid Shojanazeri, Myle Ott, Sam Shleifer, Alban Desmaison, Can Balioglu, Bernard Nguyen, Geeta Chauhan, Yuchen Hao, and Shen Li. PyTorch FSDP: Experiences on scaling fully sharded data parallel, 2023. [ bib | arXiv ]
[314] Xiaomin Zhuang, Yufan Jiang, Qiaozhi He, and Zhihua Wu. ChuXin: 1.6B technical report, 2024. [ bib | arXiv | http ]
